MindSpore Transformers (MindFormers)


1. Introduction

The goal of the MindSpore Transformers suite is to build a full-process development suite for large model pre-training, fine-tuning, evaluation, inference, and deployment. It provides mainstream Transformer-based Large Language Models (LLMs) and Multimodal Models (MMs), and aims to help users easily realize the full process of large model development.

Based on MindSpore's built-in parallel technology and component-based design, the MindSpore Transformers suite has the following features:

  • One-click launch of single-card or multi-card pre-training, fine-tuning, evaluation, inference, and deployment for large models (see the illustrative commands after this list);
  • Rich multi-dimensional hybrid parallel capabilities with flexible and easy-to-use personalized configuration;
  • System-level deep optimization of large model training and inference, native support for efficient training and inference on ultra-large-scale clusters, and rapid fault recovery;
  • Configurable development of task components: any module, including the model network, optimizer, and learning rate policy, can be enabled through a unified configuration;
  • Real-time visualization of training accuracy and performance monitoring metrics.
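
As a sketch of this config-driven workflow, the commands below show how a single entry script and a unified YAML configuration could drive each stage. The script name, flags, and config paths are assumptions based on typical MindSpore Transformers usage and may differ in your version; see the documentation for the exact commands.

```shell
# Illustrative sketch only: entry script, flags, and config paths are assumptions;
# consult the documentation of your installed version for the verified commands.
python run_mindformer.py --config configs/<model>/pretrain_<model>.yaml --run_mode train
python run_mindformer.py --config configs/<model>/finetune_<model>.yaml --run_mode finetune
python run_mindformer.py --config configs/<model>/predict_<model>.yaml  --run_mode predict
```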

For details about MindSpore Transformers tutorials and API documents, see the MindSpore Transformers Documentation.

If you have any suggestions on MindSpore Transformers, contact us through an issue, and we will address it promptly.

If you're interested in MindSpore Transformers technology or wish to contribute code, we welcome you to join the MindSpore Transformers SIG.

Model List

The following table lists models supported by MindSpore Transformers.

| Model | Status | Specifications | Model Type | Model Architecture | Latest Version |
|-------|--------|----------------|------------|--------------------|----------------|
| Qwen3 | Recent Popular | 0.6B/1.7B/4B/8B/14B/32B | Dense LLM | Mcore | In-development version |
| Qwen3-MoE | Recent Popular | 30B-A3B/235B-A22B | Sparse LLM | Mcore | In-development version |
| DeepSeek-V3 | Recent Popular | 671B | Sparse LLM | Legacy | 1.6.0, In-development version |
| GLM4 | Recent Popular | 9B | Dense LLM | Legacy | 1.6.0, In-development version |
| Llama3.1 | Recent Popular | 8B/70B | Dense LLM | Legacy | 1.6.0, In-development version |
| Mixtral | Recent Popular | 8x7B | Sparse LLM | Legacy | 1.6.0, In-development version |
| Qwen2.5 | Recent Popular | 0.5B/1.5B/7B/14B/32B/72B | Dense LLM | Legacy | 1.6.0, In-development version |
| TeleChat2 | Recent Popular | 7B/35B/115B | Dense LLM | Legacy | 1.6.0, In-development version |
| CodeLlama | End of Life | 34B | Dense LLM | Legacy | 1.5.0 |
| CogVLM2-Image | End of Life | 19B | MM | Legacy | 1.5.0 |
| CogVLM2-Video | End of Life | 13B | MM | Legacy | 1.5.0 |
| DeepSeek-V2 | End of Life | 236B | Sparse LLM | Legacy | 1.5.0 |
| DeepSeek-Coder-V1.5 | End of Life | 7B | Dense LLM | Legacy | 1.5.0 |
| DeepSeek-Coder | End of Life | 33B | Dense LLM | Legacy | 1.5.0 |
| GLM3-32K | End of Life | 6B | Dense LLM | Legacy | 1.5.0 |
| GLM3 | End of Life | 6B | Dense LLM | Legacy | 1.5.0 |
| InternLM2 | End of Life | 7B/20B | Dense LLM | Legacy | 1.5.0 |
| Llama3.2 | End of Life | 3B | Dense LLM | Legacy | 1.5.0 |
| Llama3.2-Vision | End of Life | 11B | MM | Legacy | 1.5.0 |
| Llama3 | End of Life | 8B/70B | Dense LLM | Legacy | 1.5.0 |
| Llama2 | End of Life | 7B/13B/70B | Dense LLM | Legacy | 1.5.0 |
| Qwen2 | End of Life | 0.5B/1.5B/7B/57B/57B-A14B/72B | Dense/Sparse LLM | Legacy | 1.5.0 |
| Qwen1.5 | End of Life | 7B/14B/72B | Dense LLM | Legacy | 1.5.0 |
| Qwen-VL | End of Life | 9.6B | MM | Legacy | 1.5.0 |
| TeleChat | End of Life | 7B/12B/52B | Dense LLM | Legacy | 1.5.0 |
| Whisper | End of Life | 1.5B | MM | Legacy | 1.5.0 |
| Yi | End of Life | 6B/34B | Dense LLM | Legacy | 1.5.0 |
| YiZhao | End of Life | 12B | Dense LLM | Legacy | 1.5.0 |
| Baichuan2 | End of Life | 7B/13B | Dense LLM | Legacy | 1.3.2 |
| GLM2 | End of Life | 6B | Dense LLM | Legacy | 1.3.2 |
| GPT2 | End of Life | 124M/13B | Dense LLM | Legacy | 1.3.2 |
| InternLM | End of Life | 7B/20B | Dense LLM | Legacy | 1.3.2 |
| Qwen | End of Life | 7B/14B | Dense LLM | Legacy | 1.3.2 |
| CodeGeex2 | End of Life | 6B | Dense LLM | Legacy | 1.1.0 |
| WizardCoder | End of Life | 15B | Dense LLM | Legacy | 1.1.0 |
| Baichuan | End of Life | 7B/13B | Dense LLM | Legacy | 1.0 |
| Blip2 | End of Life | 8.1B | MM | Legacy | 1.0 |
| Bloom | End of Life | 560M/7.1B/65B/176B | Dense LLM | Legacy | 1.0 |
| Clip | End of Life | 149M/428M | MM | Legacy | 1.0 |
| CodeGeex | End of Life | 13B | Dense LLM | Legacy | 1.0 |
| GLM | End of Life | 6B | Dense LLM | Legacy | 1.0 |
| iFlytekSpark | End of Life | 13B | Dense LLM | Legacy | 1.0 |
| Llama | End of Life | 7B/13B | Dense LLM | Legacy | 1.0 |
| MAE | End of Life | 86M | MM | Legacy | 1.0 |
| Mengzi3 | End of Life | 13B | Dense LLM | Legacy | 1.0 |
| PanguAlpha | End of Life | 2.6B/13B | Dense LLM | Legacy | 1.0 |
| SAM | End of Life | 91M/308M/636M | MM | Legacy | 1.0 |
| Skywork | End of Life | 13B | Dense LLM | Legacy | 1.0 |
| Swin | End of Life | 88M | MM | Legacy | 1.0 |
| T5 | End of Life | 14M/60M | Dense LLM | Legacy | 1.0 |
| VisualGLM | End of Life | 6B | MM | Legacy | 1.0 |
| Ziya | End of Life | 13B | Dense LLM | Legacy | 1.0 |
| Bert | End of Life | 4M/110M | Dense LLM | Legacy | 0.8 |

End of Life indicates that the model has been removed from the main branch; it can still be used with the latest version that supports it.

The model maintenance strategy follows the Life Cycle And Version Matching Strategy of the corresponding latest supported version.

Model Level Introduction

Mcore architecture models are classified into five levels for training and five for inference, representing different maturity standards for model deployment. For the level of each model specification in the library, refer to the model documentation.

Training

  • Released: Passed testing team verification, with loss and grad norm accuracy meeting benchmark alignment standards under deterministic conditions;
  • Validated: Passed self-verification by the development team, with loss and grad norm accuracy meeting benchmark alignment standards under deterministic conditions;
  • Preliminary: Passed preliminary self-verification by developers, with complete functionality and usability, normal convergence of training, but accuracy not strictly verified;
  • Untested: Functionality is available but has not undergone systematic testing; accuracy and convergence are not verified; supported for user-defined development;
  • Community: Community-contributed MindSpore native models, developed and maintained by the community.

Inference

  • Released: Passed testing team acceptance, with evaluation accuracy aligned with benchmark standards;
  • Validated: Passed developer self-verification, with evaluation accuracy aligned with benchmark standards;
  • Preliminary: Passed preliminary self-verification by developers, with complete functionality and usable for testing; inference outputs are logically consistent but accuracy has not been strictly verified;
  • Untested: Functionality is available but has not undergone systematic testing; accuracy has not been verified; supported for user-defined development;
  • Community: Community-contributed MindSpore native models, developed and maintained by the community.

2. Installation

Version Mapping

Currently supported hardware includes Atlas 800T A2, Atlas 800I A2, and Atlas 900 A3 SuperPoD.

Python 3.11.4 is recommended for the current suite.

| MindSpore Transformers | MindSpore | CANN | Driver/Firmware |
|------------------------|-----------|------|-----------------|
| In-development version | In-development version | In-development version | In-development version |

Historical Version Supporting Relationships:

| MindSpore Transformers | MindSpore | CANN | Driver/Firmware |
|------------------------|-----------|------|-----------------|
| 1.6.0 | 2.7.0 | 8.2.RC1 | 25.2.0 |
| 1.5.0 | 2.6.0-rc1 | 8.1.RC1 | 25.0.RC1 |
| 1.3.2 | 2.4.10 | 8.0.0 | 24.1.0 |
| 1.3.0 | 2.4.0 | 8.0.RC3 | 24.1.RC3 |
| 1.2.0 | 2.3.0 | 8.0.RC2 | 24.1.RC2 |
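
As an optional, hedged check of the local environment against the mapping above (assuming the Ascend npu-smi tool is on the PATH and MindSpore is already installed):

```shell
# Report NPU device and driver information (npu-smi ships with the Ascend driver).
npu-smi info
# Verify that MindSpore is installed and runs on this environment, and print its version.
python -c "import mindspore; mindspore.run_check()"
python -c "import mindspore; print(mindspore.__version__)"
```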

Installation Using the Source Code

Currently, MindSpore Transformers can be compiled and installed from source. Run the following commands to install it:

```shell
# Clone the master branch, then build and install the package
git clone -b master https://gitee.com/mindspore/mindformers.git
cd mindformers
bash build.sh
```
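
A minimal post-install sanity check, assuming the installed package exposes a __version__ attribute:

```shell
# Confirm that the mindformers package can be imported from the current environment.
python -c "import mindformers; print(mindformers.__version__)"
```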

3. User Guide

MindSpore Transformers supports one-click distributed pre-training, supervised fine-tuning, and inference for large models. You can click a model's link in the Model List above to see its documentation.
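
The sketch below illustrates what a one-click multi-card launch might look like. The launcher script, argument order, and config path are assumptions drawn from typical MindSpore Transformers usage and may differ across versions; each model's documentation gives the verified command.

```shell
# Hedged example: launch an 8-card pre-training job on a single node.
# Script name, arguments, and config path are assumptions; check the model docs.
bash scripts/msrun_launcher.sh \
  "run_mindformer.py --config configs/<model>/pretrain_<model>.yaml --run_mode train" \
  8
```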

For more information about the functions of MindSpore Transformers, please refer to MindSpore Transformers Documentation.

4. Life Cycle And Version Matching Strategy

MindSpore Transformers version has the following five maintenance phases:

| Status | Duration | Description |
|--------|----------|-------------|
| Plan | 1-3 months | Plan features. |
| Develop | 3 months | Build features. |
| Preserve | 6 months | Incorporate all resolved issues and release new versions. |
| No Preserve | 0-3 months | Incorporate all resolved issues; there is no dedicated maintenance team and no plan to release a new version. |
| End of Life (EOL) | N/A | The branch is closed and no longer accepts any modifications. |

Preservation policy for released MindSpore Transformers versions:

| MindSpore Transformers Version | Corresponding Label | Current Status | Release Date | Subsequent Status | EOL Date |
|--------------------------------|---------------------|----------------|--------------|-------------------|----------|
| 1.7.0 | v1.7.0 | Preserve | 2025/10/27 | No Preserve expected from 2026/04/27 | 2026/07/27 |
| 1.6.0 | v1.6.0 | Preserve | 2025/07/29 | No Preserve expected from 2026/01/29 | 2026/04/29 |
| 1.5.0 | v1.5.0 | No Preserve | 2025/04/29 | End of Life expected from 2026/01/29 | 2026/01/29 |
| 1.3.2 | v1.3.2 | End of Life | 2024/12/20 | - | 2025/09/20 |
| 1.2.0 | v1.2.0 | End of Life | 2024/07/12 | - | 2025/04/12 |
| 1.1.0 | v1.1.0 | End of Life | 2024/04/15 | - | 2025/01/15 |

5. Disclaimer

  1. The content under the scripts/examples directory is provided as reference examples only and is not part of the commercially released product. Users who need it in production are responsible for transforming it into a product suitable for commercial use and for ensuring security protection; MindSpore Transformers assumes no responsibility for any resulting security problems.
  2. Regarding datasets, MindSpore Transformers only provides suggestions for datasets that can be used for training. MindSpore Transformers does not provide any datasets. Users who use any dataset for training must ensure the legality and security of the training data and assume the following risks:
    1. Data poisoning: Maliciously tampered training data may cause the model to produce bias, security vulnerabilities, or incorrect outputs.
    2. Data compliance: Users must ensure that data collection and processing comply with relevant laws, regulations, and privacy protection requirements.
  3. If you do not want your dataset to be mentioned in MindSpore Transformers, or if you want to update the description of your dataset in MindSpore Transformers, please submit an issue to Gitee, and we will remove or update the description of your dataset according to your issue request. We sincerely appreciate your understanding and contribution to MindSpore Transformers.
  4. Regarding model weights, users must verify the authenticity of downloaded and distributed model weights from trusted sources. MindSpore Transformers cannot guarantee the security of third-party weights. Weight files may be tampered with during transmission or loading, leading to unexpected model outputs or security vulnerabilities. Users should assume the risk of using third-party weights and ensure that weight files are verified for security before use.
  5. Regarding weights, vocabularies, scripts, and other files downloaded from sources like openmind, users must obtain them from trusted sources and verify their authenticity before distributing or using them. MindSpore Transformers cannot guarantee the security of third-party files; users assume the risks arising from unexpected functional issues, outputs, or security vulnerabilities when using these files.
  6. MindSpore Transformers saves weights or logs based on the path set by the user. Users should avoid using system file directories when configuring paths. If unexpected system issues arise due to improper path settings, users shall bear the risks themselves.

6. Contribution

We welcome contributions to the community. For details, see MindSpore Transformers Contribution Guidelines.

7. License

Apache 2.0 License
