February 11, 2026

Intern‑S1‑Pro: Trillion‑Scale Multimodal Scientific Reasoner Takes the Lead

Intern‑S1‑Pro, released by the InternLM team, is a trillion‑parameter mixture‑of‑experts (MoE) foundation model that targets scientific multimodal reasoning. Tagged with **image‑text‑to‑text**, it accepts visual inputs alongside textual prompts and generates detailed textual responses. The model scales to 1 T total parameters with 512 experts, activating 8 experts per token (≈22 B activated parameters), and is distributed under the Apache‑2.0 license.

Key technical innovations include straight‑through‑estimator (STE) routing with dense gradients for stable router training, grouped routing for balanced expert parallelism, and Fourier Position Encoding (FoPE) combined with an upgraded time‑series module that handles heterogeneous series ranging from 10⁰ to 10⁶ points in length. These design choices give Intern‑S1‑Pro state‑of‑the‑art performance on AI4Science benchmarks across the chemistry, materials, life‑science, and earth domains, while retaining strong general multimodal and text capabilities. The model also supports tool calling via an OpenAI‑compatible API, enabling it to invoke external functions such as weather look‑ups, and features a toggleable “thinking mode” that can be disabled for faster, non‑reasoning generation.
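The routing scheme above can be sketched in a few lines. The toy example below shows top‑8‑of‑512 expert selection in the STE style: a dense softmax over all experts (which would carry gradients to the router in a real framework) combined with a hard top‑k mask in the forward pass. This is an illustrative sketch, not the InternLM team's actual implementation; the real STE formulation and grouped‑routing logic may differ.

```python
import numpy as np

def ste_topk_route(logits, k=8):
    """Toy top-k expert routing in the straight-through-estimator style.

    Forward pass uses a hard top-k mask; in a real autodiff framework the
    dense softmax below is what supplies gradients to the router.
    """
    # Dense softmax over all experts (numerically stable).
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    # Hard top-k selection for the forward pass.
    topk_idx = np.argsort(logits, axis=-1)[..., -k:]
    mask = np.zeros_like(probs)
    np.put_along_axis(mask, topk_idx, 1.0, axis=-1)
    # Renormalize gate weights over the selected experts only.
    gates = probs * mask
    gates /= gates.sum(axis=-1, keepdims=True)
    return topk_idx, gates

# 4 tokens routed over 512 experts, 8 active per token.
logits = np.random.default_rng(0).normal(size=(4, 512))
idx, gates = ste_topk_route(logits, k=8)
```

Each token ends up with exactly 8 nonzero gate weights summing to 1, mirroring the ≈22 B activated‑parameter budget per token described above.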

For inference, the README advises using specialized LLM serving engines—LMDeploy, vLLM, or SGLang—because the native Hugging Face forward pass is impractical at this scale. Recommended sampling settings are `top_p=0.95`, `top_k=50`, and `temperature=0.8`. The repository provides example code for tool calling, as well as guidance on switching the thinking mode on or off through the `enable_thinking` flag in the chat template. A deployment guide and a technical report (arXiv:2508.15763) accompany the release, and the authors invite community interaction via Discord and WeChat.
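Putting those pieces together, a chat request to an OpenAI‑compatible endpoint served by one of those engines might look like the sketch below. The model name, the `extra_body` keys, and the `chat_template_kwargs` route for `enable_thinking` are assumptions modeled on common vLLM/LMDeploy conventions; consult the repository's deployment guide for the exact form.

```python
def build_request(messages, tools=None, enable_thinking=True):
    """Assemble an OpenAI-compatible chat-completions payload.

    Sampling values follow the README's recommendations; the model name
    and extra_body layout are placeholders for illustration.
    """
    payload = {
        "model": "internlm/Intern-S1-Pro",  # placeholder model identifier
        "messages": messages,
        "top_p": 0.95,        # recommended sampling settings
        "temperature": 0.8,
        "extra_body": {
            "top_k": 50,
            # Assumed route for the chat-template flag; verify against docs.
            "chat_template_kwargs": {"enable_thinking": enable_thinking},
        },
    }
    if tools:
        payload["tools"] = tools  # OpenAI-style tool/function definitions
    return payload

req = build_request(
    [{"role": "user", "content": "What is the melting point of gallium?"}],
    enable_thinking=False,
)
```

Disabling `enable_thinking` as shown trades the model's explicit reasoning trace for faster, direct generation.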

Intern‑S1‑Pro’s open‑source nature, extensive scientific reasoning abilities, and multimodal flexibility make it a valuable resource for researchers and developers building AI‑augmented scientific tools, multimodal assistants, or domain‑specific reasoning systems.

Project Ideas

  1. Create a chemistry assistant that answers questions about molecular structures by feeding in diagram images and receiving detailed textual explanations.
  2. Build a multimodal literature summarizer that ingests figure images and captions from scientific papers and generates concise abstracts.
  3. Develop a time‑series analysis chatbot for materials scientists that interprets long experimental data streams and suggests next‑step actions.
  4. Implement a weather‑aware planning tool that uses Intern‑S1‑Pro’s tool‑calling capability to fetch real‑time forecasts and incorporate them into itinerary suggestions.
  5. Deploy an AI lab‑experiment designer that takes a brief textual prompt, optionally a schematic image, and returns a step‑by‑step experimental protocol using the model’s scientific reasoning and thinking‑mode features.