MindSpore Transformers (MindFormers)
1. Introduction
The goal of the MindSpore Transformers suite is to build a full-process development suite for large model pre-training, fine-tuning, evaluation, inference, and deployment. It provides mainstream Transformer-based Large Language Models (LLMs) and Multimodal Models (MMs), and is intended to help users easily realize the full process of large model development.
Based on MindSpore's built-in parallel technology and component-based design, the MindSpore Transformers suite has the following features:
- One-click launch of single-card or multi-card pre-training, fine-tuning, evaluation, inference, and deployment workflows for large models;
- Rich multi-dimensional hybrid parallel capabilities with flexible and easy-to-use personalized configuration;
- System-level deep optimization of large model training and inference, with native support for efficient training and inference on ultra-large-scale clusters and rapid fault recovery;
- Configurable development of task components: any module, such as the model network, optimizer, or learning rate policy, can be enabled through a unified configuration (see the sketch after this list);
- Real-time visualization of training accuracy and performance monitoring metrics.
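As a concrete illustration of the configuration-driven, one-click workflow, the sketch below launches a distributed fine-tuning task. It is only a sketch: the launcher script, entry-script flags, and YAML path follow the usage patterns described in the MindSpore Transformers documentation and may differ across versions; the config file name is a placeholder.

```bash
# Hypothetical example: one-click 8-card fine-tuning driven by a single YAML config.
# The launcher script, flag names, and config path are assumptions; check the
# documentation of your installed version for the exact usage.
bash scripts/msrun_launcher.sh \
  "run_mindformer.py --config configs/llama3_1/finetune_llama3_1_8b.yaml --run_mode finetune --use_parallel True" 8
```

The model network, optimizer, learning rate policy, and parallel strategy are all selected in the YAML file, so switching tasks or models typically only requires editing or swapping the configuration.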
For details about MindSpore Transformers tutorials and API documents, see the MindSpore Transformers Documentation.
If you have any suggestions on MindSpore Transformers, contact us through an issue, and we will address it promptly.
Models List
The following table lists models supported by MindSpore Transformers.
| Model | Specifications | Model Type | Latest Version |
|---|---|---|---|
| DeepSeek-V3 | 671B | Sparse LLM | In-development version, 1.5.0 |
| GLM4 | 9B | Dense LLM | In-development version, 1.5.0 |
| Llama3.1 | 8B/70B | Dense LLM | In-development version, 1.5.0 |
| Qwen2.5 | 0.5B/1.5B/7B/14B/32B/72B | Dense LLM | In-development version, 1.5.0 |
| TeleChat2 | 7B/35B/115B | Dense LLM | In-development version, 1.5.0 |
| CodeLlama | 34B | Dense LLM | 1.5.0 |
| CogVLM2-Image | 19B | MM | 1.5.0 |
| CogVLM2-Video | 13B | MM | 1.5.0 |
| DeepSeek-V2 | 236B | Sparse LLM | 1.5.0 |
| DeepSeek-Coder-V1.5 | 7B | Dense LLM | 1.5.0 |
| DeepSeek-Coder | 33B | Dense LLM | 1.5.0 |
| GLM3-32K | 6B | Dense LLM | 1.5.0 |
| GLM3 | 6B | Dense LLM | 1.5.0 |
| InternLM2 | 7B/20B | Dense LLM | 1.5.0 |
| Llama3.2 | 3B | Dense LLM | 1.5.0 |
| Llama3.2-Vision | 11B | MM | 1.5.0 |
| Llama3 | 8B/70B | Dense LLM | 1.5.0 |
| Llama2 | 7B/13B/70B | Dense LLM | 1.5.0 |
| Mixtral | 8x7B | Sparse LLM | 1.5.0 |
| Qwen2 | 0.5B/1.5B/7B/57B/57B-A14B/72B | Dense/Sparse LLM | 1.5.0 |
| Qwen1.5 | 7B/14B/72B | Dense LLM | 1.5.0 |
| Qwen-VL | 9.6B | MM | 1.5.0 |
| TeleChat | 7B/12B/52B | Dense LLM | 1.5.0 |
| Whisper | 1.5B | MM | 1.5.0 |
| Yi | 6B/34B | Dense LLM | 1.5.0 |
| YiZhao | 12B | Dense LLM | 1.5.0 |
| Baichuan2 | 7B/13B | Dense LLM | 1.3.2 |
| GLM2 | 6B | Dense LLM | 1.3.2 |
| GPT2 | 124M/13B | Dense LLM | 1.3.2 |
| InternLM | 7B/20B | Dense LLM | 1.3.2 |
| Qwen | 7B/14B | Dense LLM | 1.3.2 |
| CodeGeex2 | 6B | Dense LLM | 1.1.0 |
| WizardCoder | 15B | Dense LLM | 1.1.0 |
| Baichuan | 7B/13B | Dense LLM | 1.0 |
| Blip2 | 8.1B | MM | 1.0 |
| Bloom | 560M/7.1B/65B/176B | Dense LLM | 1.0 |
| Clip | 149M/428M | MM | 1.0 |
| CodeGeex | 13B | Dense LLM | 1.0 |
| GLM | 6B | Dense LLM | 1.0 |
| iFlytekSpark | 13B | Dense LLM | 1.0 |
| Llama | 7B/13B | Dense LLM | 1.0 |
| MAE | 86M | MM | 1.0 |
| Mengzi3 | 13B | Dense LLM | 1.0 |
| PanguAlpha | 2.6B/13B | Dense LLM | 1.0 |
| SAM | 91M/308M/636M | MM | 1.0 |
| Skywork | 13B | Dense LLM | 1.0 |
| Swin | 88M | MM | 1.0 |
| T5 | 14M/60M | Dense LLM | 1.0 |
| VisualGLM | 6B | MM | 1.0 |
| Ziya | 13B | Dense LLM | 1.0 |
| Bert | 4M/110M | Dense LLM | 0.8 |
The model maintenance strategy follows the Life Cycle And Version Matching Strategy of the corresponding latest supported version.
2. Installation
Version Mapping
Currently, the Atlas 800T A2 training server is supported.
Python 3.11.4 is recommended for the current suite.
| MindSpore Transformers | MindSpore | CANN | Driver/Firmware | Image Link |
|---|---|---|---|---|
| In-development version | In-development version | In-development version | In-development version | Not applicable |
Version mapping for historical releases:
| MindSpore Transformers | MindSpore | CANN | Driver/Firmware | Image Link |
|---|---|---|---|---|
| 1.5.0 | 2.6.0-rc1 | 8.1.RC1 | 25.0.RC1 | Link |
| 1.3.2 | 2.4.10 | 8.0.0 | 24.1.0 | Link |
| 1.3.0 | 2.4.0 | 8.0.RC3 | 24.1.RC3 | Link |
| 1.2.0 | 2.3.0 | 8.0.RC2 | 24.1.RC2 | Link |
Installation Using the Source Code
Currently, MindSpore Transformers can be compiled and installed from source. Run the following commands to install it:
```bash
git clone -b dev https://gitee.com/mindspore/mindformers.git
cd mindformers
bash build.sh
```
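After installation, you can optionally verify that the package imports correctly. This is a minimal check, assuming the installed package is importable as mindformers and exposes a __version__ attribute:

```bash
# Verify that MindSpore Transformers was installed successfully
# (assumes the package is importable as `mindformers` and defines `__version__`)
python -c "import mindformers; print(mindformers.__version__)"
```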
3. User Guide
MindSpore Transformers supports one-click distributed pre-training, supervised fine-tuning, and inference tasks for large models. You can click the link of each model in the Models List to see the corresponding documentation.
For more information about the functions of MindSpore Transformers, please refer to MindSpore Transformers Documentation.
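For example, a single-card text generation task can typically be started directly from the entry script with a prediction config. This is a hedged sketch: the config path and flags are assumptions based on the documented command-line interface and may vary between versions.

```bash
# Hypothetical example: one-click single-card inference.
# The config path and flag names are assumptions; consult the model's
# documentation page linked from the Models List for the exact command.
python run_mindformer.py \
  --config configs/qwen2_5/predict_qwen2_5_7b.yaml \
  --run_mode predict \
  --predict_data "An increasing sequence: one,"
```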
4. Life Cycle And Version Matching Strategy
Each MindSpore Transformers version goes through the following five maintenance phases:
| Status | Duration | Description |
|---|---|---|
| Plan | 1-3 months | Plan features. |
| Develop | 3 months | Develop features. |
| Preserve | 6 months | Incorporate all resolved issues and release new versions. |
| No Preserve | 0-3 months | Incorporate all resolved issues; there is no full-time maintenance team and no plan to release a new version. |
| End of Life (EOL) | N/A | The branch is closed and no longer accepts any modifications. |
Preservation policy for released MindSpore Transformers versions:
| MindSpore Transformers Version | Corresponding Label | Current Status | Release Time | Subsequent Status | EOL Date |
|---|---|---|---|---|---|
| 1.5.0 | v1.5.0 | Preserve | 2025/04/29 | No preserve expected from 2025/10/29 | 2026/01/29 |
| 1.3.2 | v1.3.2 | Preserve | 2024/12/20 | No preserve expected from 2025/06/20 | 2025/09/20 |
| 1.2.0 | v1.2.0 | End of Life | 2024/07/12 | - | 2025/04/12 |
| 1.1.0 | v1.1.0 | End of Life | 2024/04/15 | - | 2025/01/15 |
5. Disclaimer
- The scripts/examples directory provides reference examples only; they are not part of the commercially released products and are for users' reference. If they need to be used, users are responsible for transforming them into products suitable for commercial use and for ensuring security protection. MindSpore assumes no responsibility for security problems arising from such use.
- With regard to datasets, MindSpore Transformers only suggests datasets that can be used for training; it does not provide any datasets. If you use these datasets for training, please comply with their licenses. MindSpore Transformers is not responsible for any infringement disputes that may arise from the use of the datasets.
- If you do not want your dataset to be mentioned in MindSpore Transformers, or if you want to update the description of your dataset in MindSpore Transformers, please submit an issue to Gitee, and we will remove or update the description of your dataset according to your issue request. We sincerely appreciate your understanding and contribution to MindSpore Transformers.
6. Contribution
We welcome contributions to the community. For details, see MindSpore Transformers Contribution Guidelines.