
Scientific Multimodal Large Model Intern-S1 (Codename: Scholar)

WAIC 2025 Conference: Intern-S1 (Codename "Scholar") Released#


Shanghai Artificial Intelligence Laboratory released and open-sourced its latest scientific multimodal large model, Intern-S1 (codenamed "Scholar"), at the WAIC 2025 conference.

Model Architecture#

Intern-S1 is built on a Mixture-of-Experts (MoE) architecture and comprises:

  • A language model with 235 billion parameters (Qwen3)
  • A vision encoder with 6 billion parameters (InternViT)
  • Total scale: 241 billion parameters
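For readers who want a feel for what using the open-source release looks like, the sketch below loads it with Hugging Face transformers. The repo id `internlm/Intern-S1`, the `trust_remote_code` flag, and the multi-GPU sharding setup are assumptions for illustration; the official model card is the authoritative reference.

```python
# Minimal sketch: loading the open-source Intern-S1 weights with Hugging Face
# transformers. The repo id "internlm/Intern-S1" is an assumption, and a
# 241B-parameter MoE model needs a multi-GPU node, so device_map="auto" is used
# to shard the weights across the available devices.
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "internlm/Intern-S1"  # assumed Hugging Face repo id

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to keep memory manageable
    device_map="auto",           # shard the MoE experts across GPUs
    trust_remote_code=True,
)
```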

Training Data and Capabilities#

  • A 5-trillion-token training corpus, more than half of which consists of specialized domain knowledge.
  • A 128K-token context window, enough to take in several top-conference papers at once and analyze them sequentially (see the rough estimate below).
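As a rough sanity check on that claim (the per-paper token count below is an assumption, not a figure from the post):

```python
# Back-of-the-envelope estimate of how many full papers fit into a 128K-token window.
# The ~15,000 tokens per paper is an assumed average for an 8-10 page conference
# paper; real counts depend on the tokenizer and the paper's length.
context_window = 128_000
tokens_per_paper = 15_000          # assumed average per top-conference paper
prompt_and_answer_budget = 8_000   # room reserved for instructions and the reply

papers_that_fit = (context_window - prompt_and_answer_budget) // tokens_per_paper
print(papers_that_fit)  # -> 8
```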

Data Analysis Capability#

  • Reads data trends in scientific charts and explains the underlying logic in conjunction with the accompanying text.
  • Understands what a figure shows, interprets the physical process it represents, and proposes the next experimental steps (see the prompting sketch below).
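In practice, this chart-analysis workflow reduces to a multimodal prompt: a figure image plus a question about the trend it shows. The sketch below reuses the `processor` and `model` from the loading example above and follows the generic Hugging Face vision-language chat convention; the exact Intern-S1 message format may differ, and the file name is made up.

```python
# Sketch: asking the model to read a trend from a scientific chart.
from PIL import Image

image = Image.open("figure3_conductivity_vs_temperature.png")  # hypothetical figure
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text",
             "text": "Describe the trend in this plot, explain the physical process "
                     "behind it, and suggest the next experiment to run."},
        ],
    }
]

# apply_chat_template / generate follow the common HF vision-language pattern;
# the real Intern-S1 interface should be checked against its model card.
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=512)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```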

Innovative Features#

Intern-S1 pioneered a cross-modal scientific analysis engine that adaptively tokenizes data from different modalities. Special sequences such as chemical formulas and protein sequences receive more efficient encoded representations, achieving over 70% compression and enabling the model to understand specialized, complex data.
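The post does not describe the tokenizer itself, so the snippet below is only a toy illustration of the idea: a vocabulary that recognizes whole chemical fragments emits far fewer tokens for a SMILES string than a near character-level baseline, which is where compression figures like the quoted 70%+ come from. The fragment list and the aspirin SMILES string are invented for the example.

```python
# Toy illustration (not the Intern-S1 tokenizer): why domain-aware tokenization
# compresses sequences such as SMILES chemical formulas.
import re

smiles = "CC(=O)OC1=CC=CC=C1C(=O)O"  # aspirin, used only as an example

# "Generic" baseline: near character-level splitting, one token per character.
generic_tokens = list(smiles)

# "Domain-aware" toy vocabulary: frequent multi-character fragments matched first,
# falling back to single characters.
fragment_pattern = re.compile(r"C\(=O\)O|C1=CC=CC=C1|=O|Cl|Br|\(|\)|.")
domain_tokens = fragment_pattern.findall(smiles)

compression = 1 - len(domain_tokens) / len(generic_tokens)
print(len(generic_tokens), len(domain_tokens), f"{compression:.0%}")  # 24 4 83%
```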

Training Paradigm#

To balance generality and specialization, Intern-S1 adopts a training paradigm that fuses general and specialized learning:

  • Use vast amounts of general scientific data to broaden the model's knowledge.
  • Train numerous domain expert models to generate highly readable, logically clear professional data, whose quality is verified by customized domain agents.

Through this closed loop that feeds verified data back into pre-training, Intern-S1 combines strong general reasoning with multiple top-tier specialized abilities, a breakthrough in which one model handles a wide range of specialized tasks.
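Read concretely, the loop is: expert models draft domain data, customized domain agents verify it, and only verified samples flow back into the pre-training mixture. The sketch below is hypothetical pseudocode of that loop; none of the names (`expert_models`, `domain_verifier`, `build_specialized_corpus`) come from the Intern-S1 report.

```python
# Hypothetical sketch of the closed "general-specialist" data loop described above.
def build_specialized_corpus(expert_models, domain_verifier, prompts):
    """Expert models draft professional data; a domain agent keeps verified samples."""
    accepted = []
    for prompt in prompts:
        for expert in expert_models:
            candidate = expert.generate(prompt)    # draft readable, logical domain data
            if domain_verifier.check(candidate):   # quality and logic verification
                accepted.append(candidate)
    return accepted

# The accepted samples are then merged back into the general pre-training corpus,
# closing the loop between broad scientific data and expert-generated data.
```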

Training Efficiency#

A large-scale multi-task reinforcement learning framework was introduced in the later stages of training. The algorithm centers on "mixed rewards": tasks whose outputs can be verified receive rewards from rules and validators. This system keeps training energy consumption at only about 1% of Grok 4's while performance remains competitive.
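The "mixed rewards" idea can be pictured as a reward router: tasks with checkable answers are scored by rules or validators. The fallback to a learned reward model for non-verifiable tasks in the sketch below is an assumption about how such a mixture is usually wired, not something stated in the post, and all names are hypothetical.

```python
# Sketch of a "mixed rewards" router for multi-task RL. Verifiable tasks (math,
# code, chemistry with known answers) get rule/validator rewards; the reward-model
# fallback for open-ended tasks is an assumption, not from the post.
from typing import Callable, Dict

def make_mixed_reward(
    validators: Dict[str, Callable[[str, str], float]],
    reward_model: Callable[[str, str], float],
) -> Callable[[str, str, str], float]:
    """Return a reward function that routes each sample by its task type."""
    def reward(task_type: str, prompt: str, response: str) -> float:
        if task_type in validators:            # verifiable: rule/validator reward
            return validators[task_type](prompt, response)
        return reward_model(prompt, response)  # assumed fallback: learned reward
    return reward

# Example: an exact-match validator for tasks with a known ground-truth answer.
answers = {"What is 12 * 12?": "144"}
validators = {"math": lambda p, r: 1.0 if r.strip() == answers.get(p) else 0.0}
mixed = make_mixed_reward(validators, reward_model=lambda p, r: 0.5)
print(mixed("math", "What is 12 * 12?", "144"))  # -> 1.0
```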

Conclusion#

Will Intern-S1 become the standard answer for scientific multimodality? It is too early to say, but it points to another path: not simply building ever-larger models to compete on parameter counts, but starting from real needs and tackling application scenarios that are genuinely hard yet valuable. This direction differs from the pursuit of general capability in recent years. Models such as GPT, Gemini, and Claude are mature at dialogue and code generation, yet they often produce unstable, illogical results when analyzing scientific plots or assisting with experimental design, and complex formulas read as gibberish to them.

Intern-S1, by contrast, takes on exactly these hard problems in scientific research, applying multimodality to literature analysis, experimental assistance, and other demanding, high-value scenarios, and opening a path toward the possibility of "specialized AI."
