MindSpore Transformers (MindFormers)
1. Introduction
The goal of the MindSpore Transformers suite is to build a full-process development suite for large model pre-training, fine-tuning, evaluation, inference, and deployment. It provides mainstream Transformer-based Large Language Models (LLMs) and Multimodal Models (MMs), and is intended to help users easily realize the full process of large model development.
Based on MindSpore's built-in parallel technology and component-based design, the MindSpore Transformers suite has the following features:
- One-click launch of single-card or multi-card pre-training, fine-tuning, evaluation, inference, and deployment workflows for large models;
- Rich multi-dimensional hybrid parallel capabilities with flexible and easy-to-use personalized configuration;
- System-level deep optimization of large model training and inference, with native support for efficient training and inference on ultra-large-scale clusters and rapid fault recovery;
- Configurable development of task components: any module, such as the model network, optimizer, or learning rate policy, can be enabled through a unified configuration (see the sketch after this list);
- Real-time visualization of training accuracy and performance monitoring metrics.
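As a concrete illustration of the configuration-driven, one-click workflow, the sketch below launches a distributed fine-tuning task. It is only a sketch: the launcher script, entry-script flags, and YAML path follow the usage patterns described in the MindSpore Transformers documentation and may differ across versions; the config file name is a placeholder.

```bash
# Hypothetical example: one-click 8-card fine-tuning driven by a single YAML config.
# The launcher script, flag names, and config path are assumptions; check the
# documentation of your installed version for the exact usage.
bash scripts/msrun_launcher.sh \
  "run_mindformer.py --config configs/llama3_1/finetune_llama3_1_8b.yaml --run_mode finetune --use_parallel True" 8
```

The model network, optimizer, learning rate policy, and parallel strategy are all selected in the YAML file, so switching tasks or models typically only requires editing or swapping the configuration.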
For details about MindSpore Transformers tutorials and API documents, see the MindSpore Transformers Documentation.
If you have any suggestions on MindSpore Transformers, contact us through an issue, and we will address it promptly.
Models List
The following table lists models supported by MindSpore Transformers.
| Model | Specifications | Model Type | Latest Version |
|---|---|---|---|
| DeepSeek-V3 | 671B | Sparse LLM | In-development version, 1.5.0 |
| GLM4 | 9B | Dense LLM | In-development version, 1.5.0 |
| Llama3.1 | 8B/70B | Dense LLM | In-development version, 1.5.0 |
| Qwen2.5 | 0.5B/1.5B/7B/14B/32B/72B | Dense LLM | In-development version, 1.5.0 |
| TeleChat2 | 7B/35B/115B | Dense LLM | In-development version, 1.5.0 |
| CodeLlama | 34B | Dense LLM | 1.5.0 |
| CogVLM2-Image | 19B | MM | 1.5.0 |
| CogVLM2-Video | 13B | MM | 1.5.0 |
| DeepSeek-V2 | 236B | Sparse LLM | 1.5.0 |
| DeepSeek-Coder-V1.5 | 7B | Dense LLM | 1.5.0 |
| DeepSeek-Coder | 33B | Dense LLM | 1.5.0 |
| GLM3-32K | 6B | Dense LLM | 1.5.0 |
| GLM3 | 6B | Dense LLM | 1.5.0 |
| InternLM2 | 7B/20B | Dense LLM | 1.5.0 |
| Llama3.2 | 3B | Dense LLM | 1.5.0 |
| Llama3.2-Vision | 11B | MM | 1.5.0 |
| Llama3 | 8B/70B | Dense LLM | 1.5.0 |
| Llama2 | 7B/13B/70B | Dense LLM | 1.5.0 |
| Mixtral | 8x7B | Sparse LLM | 1.5.0 |
| Qwen2 | 0.5B/1.5B/7B/57B/57B-A14B/72B | Dense/Sparse LLM | 1.5.0 |
| Qwen1.5 | 7B/14B/72B | Dense LLM | 1.5.0 |
| Qwen-VL | 9.6B | MM | 1.5.0 |
| TeleChat | 7B/12B/52B | Dense LLM | 1.5.0 |
| Whisper | 1.5B | MM | 1.5.0 |
| Yi | 6B/34B | Dense LLM | 1.5.0 |
| YiZhao | 12B | Dense LLM | 1.5.0 |
| Baichuan2 | 7B/13B | Dense LLM | 1.3.2 |
| GLM2 | 6B | Dense LLM | 1.3.2 |
| GPT2 | 124M/13B | Dense LLM | 1.3.2 |
| InternLM | 7B/20B | Dense LLM | 1.3.2 |
| Qwen | 7B/14B | Dense LLM | 1.3.2 |
| CodeGeex2 | 6B | Dense LLM | 1.1.0 |
| WizardCoder | 15B | Dense LLM | 1.1.0 |
| Baichuan | 7B/13B | Dense LLM | 1.0 |
| Blip2 | 8.1B | MM | 1.0 |
| Bloom | 560M/7.1B/65B/176B | Dense LLM | 1.0 |
| Clip | 149M/428M | MM | 1.0 |
| CodeGeex | 13B | Dense LLM | 1.0 |
| GLM | 6B | Dense LLM | 1.0 |
| iFlytekSpark | 13B | Dense LLM | 1.0 |
| Llama | 7B/13B | Dense LLM | 1.0 |
| MAE | 86M | MM | 1.0 |
| Mengzi3 | 13B | Dense LLM | 1.0 |
| PanguAlpha | 2.6B/13B | Dense LLM | 1.0 |
| SAM | 91M/308M/636M | MM | 1.0 |
| Skywork | 13B | Dense LLM | 1.0 |
| Swin | 88M | MM | 1.0 |
| T5 | 14M/60M | Dense LLM | 1.0 |
| VisualGLM | 6B | MM | 1.0 |
| Ziya | 13B | Dense LLM | 1.0 |
| Bert | 4M/110M | Dense LLM | 0.8 |
The model maintenance strategy follows the Life Cycle And Version Matching Strategy of the corresponding latest supported version.
2. Installation
Version Mapping
Currently, the Atlas 800T A2 training server is supported.
Python 3.11.4 is recommended for the current suite.
| MindSpore Transformers | MindSpore | CANN | Driver/Firmware | Image Link |
|---|---|---|---|---|
| In-development version | In-development version | In-development version | In-development version | Not applicable |
Version mapping for historical releases:
| MindSpore Transformers | MindSpore | CANN | Driver/Firmware | Image Link |
|---|---|---|---|---|
| 1.5.0 | 2.6.0-rc1 | 8.1.RC1 | 25.0.RC1 | Link |
| 1.3.2 | 2.4.10 | 8.0.0 | 24.1.0 | Link |
| 1.3.0 | 2.4.0 | 8.0.RC3 | 24.1.RC3 | Link |
| 1.2.0 | 2.3.0 | 8.0.RC2 | 24.1.RC2 | Link |
Installation Using the Source Code
Currently, MindSpore Transformers can be compiled and installed from source. Run the following commands to install it:
```bash
git clone -b dev https://gitee.com/mindspore/mindformers.git
cd mindformers
bash build.sh
```
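After installation, you can optionally verify that the package imports correctly. This is a minimal check, assuming the installed package is importable as mindformers and exposes a __version__ attribute:

```bash
# Verify that MindSpore Transformers was installed successfully
# (assumes the package is importable as `mindformers` and defines `__version__`)
python -c "import mindformers; print(mindformers.__version__)"
```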
3. User Guide
MindSpore Transformers supports one-click distributed pre-training, supervised fine-tuning, and inference tasks for large models. You can click the link of each model in the Models List to see the corresponding documentation.
For more information about the functions of MindSpore Transformers, please refer to MindSpore Transformers Documentation.
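For example, a single-card text generation task can typically be started directly from the entry script with a prediction config. This is a hedged sketch: the config path and flags are assumptions based on the documented command-line interface and may vary between versions.

```bash
# Hypothetical example: one-click single-card inference.
# The config path and flag names are assumptions; consult the model's
# documentation page linked from the Models List for the exact command.
python run_mindformer.py \
  --config configs/qwen2_5/predict_qwen2_5_7b.yaml \
  --run_mode predict \
  --predict_data "An increasing sequence: one,"
```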
4. Life Cycle And Version Matching Strategy
Each MindSpore Transformers version goes through the following five maintenance phases:
| Status | Duration | Description |
|---|---|---|
| Plan | 1-3 months | Plan features. |
| Develop | 3 months | Develop features. |
| Preserve | 6 months | Incorporate all resolved issues and release new versions. |
| No Preserve | 0-3 months | Incorporate all resolved issues; there is no full-time maintenance team and no plan to release a new version. |
| End of Life (EOL) | N/A | The branch is closed and no longer accepts any modifications. |
Preservation policy for released MindSpore Transformers versions:
| MindSpore Transformers Version | Corresponding Label | Current Status | Release Time | Subsequent Status | EOL Date |
|---|---|---|---|---|---|
| 1.5.0 | v1.5.0 | Preserve | 2025/04/29 | No preserve expected from 2025/10/29 | 2026/01/29 |
| 1.3.2 | v1.3.2 | Preserve | 2024/12/20 | No preserve expected from 2025/06/20 | 2025/09/20 |
| 1.2.0 | v1.2.0 | End of Life | 2024/07/12 | - | 2025/04/12 |
| 1.1.0 | v1.1.0 | End of Life | 2024/04/15 | - | 2025/01/15 |
5. Disclaimer
- The scripts/examples directory provides reference examples only; they are not part of the commercially released products and are for users' reference. If they need to be used, users are responsible for transforming them into products suitable for commercial use and for ensuring security protection. MindSpore assumes no responsibility for security problems arising from such use.
- With regard to datasets, MindSpore Transformers only suggests datasets that can be used for training; it does not provide any datasets. If you use these datasets for training, please comply with their licenses. MindSpore Transformers is not responsible for any infringement disputes that may arise from the use of the datasets.
- If you do not want your dataset to be mentioned in MindSpore Transformers, or if you want to update the description of your dataset in MindSpore Transformers, please submit an issue to Gitee, and we will remove or update the description of your dataset according to your issue request. We sincerely appreciate your understanding and contribution to MindSpore Transformers.
6. Contribution
We welcome contributions to the community. For details, see MindSpore Transformers Contribution Guidelines.