mirror of https://gitee.com/mindspore/mindformers.git synced 2025-12-06 11:29:59 +08:00

MindSpore Transformers (MindFormers)

1. Introduction

MindFormers aims to provide a full-process development suite for foundation model training, fine-tuning, evaluation, inference, and deployment. It offers mainstream Transformer-based pre-trained models and state-of-the-art (SOTA) downstream task applications, and supports a range of parallelism features, so that users can train foundation models and carry out innovative R&D with less effort.

Based on MindSpore's built-in parallel technology and component-based design, the MindFormers suite has the following features:

Seamless switch from single-device to large-scale cluster training with just one line of code
Flexible and easy-to-use personalized parallel configuration
Automatic topology awareness, efficiently combining data parallelism and model parallelism strategies
One-click launch for single-device/multi-device training, fine-tuning, evaluation, and inference for any task
Support for users to configure any module in a modular way, such as optimizers, learning strategies, and network assembly
High-level usability APIs such as Trainer, pipeline, and AutoClass
Built-in SOTA weight auto-download and loading functionality
Seamless migration and deployment support for AI computing centers

For MindFormers tutorials and API documentation, see MindFormers Documentation.

If you have any suggestions for MindFormers, open an issue and we will address it promptly.

Supported Models

The following table lists models supported by MindFormers.

Model	Specifications	Model Type
Llama2	7B/13B/70B	Dense LLM
Llama3	8B/70B	Dense LLM
Llama3.1	8B/70B	Dense LLM
Qwen	7B/14B	Dense LLM
Qwen1.5	7B/14B/72B	Dense LLM
Qwen2	0.5B/1.5B/7B/57B/57B-A14B/72B	Dense/Sparse MoE LLM
Qwen-VL	9.6B	Multimodal
GLM2	6B	Dense LLM
GLM3	6B	Dense LLM
GLM3-32K	6B	Dense LLM
GLM4	9B	Dense LLM
CogVLM2-Video	13B	Multimodal
CogVLM2-Image	19B	Multimodal
InternLM	7B/20B	Dense LLM
InternLM2	7B/20B	Dense LLM
DeepSeek-Coder	33B	Dense LLM
DeepSeek-Coder-V1.5	7B	Dense LLM
DeepSeek-V2	236B	Sparse MoE LLM
CodeLlama	34B	Dense LLM
Mixtral	8x7B	Sparse MoE LLM
Baichuan2	7B/13B	Dense LLM
Yi	6B/34B	Dense LLM
GPT2	13B	Dense LLM
Whisper	1.5B	Multimodal

2. Installation

Version Mapping

Currently, the Atlas 800T A2 training server is supported.

Python 3.10 is recommended for the current suite.

MindFormers	MindPet	MindSpore	CANN	Driver/Firmware	Image Link
1.3.0	1.0.4	2.4.0		24.1.RC3	Link

The preceding software mapping is recommended for MindFormers. The CANN and firmware/driver versions must match the machine in use: identify the machine model, then select the versions built for its architecture.
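
Before building, it can help to confirm that the interpreter matches the recommended Python 3.10. The following is a minimal sketch, not part of MindFormers: the function name is_recommended_python and the interpreter name python3 are assumptions made for illustration.

```shell
#!/bin/sh
# Return success (0) if the given version string belongs to the 3.10 series,
# the Python version this README recommends.
is_recommended_python() {
  case "$1" in
    3.10|3.10.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Ask the interpreter for its own version. "python3" is an assumption;
# substitute whichever interpreter you build with.
current="$(python3 -c 'import platform; print(platform.python_version())' 2>/dev/null || echo unknown)"

if is_recommended_python "$current"; then
  echo "Python $current matches the recommended 3.10 series."
else
  echo "Python $current detected; this README recommends 3.10."
fi
```

The check only reports a mismatch and does not stop the build, because 3.10 is a recommendation rather than a stated hard requirement.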

Installation Using the Source Code

Currently, MindFormers is installed by building it from source. Run the following commands:

git clone -b r1.3.0 https://gitee.com/mindspore/mindformers.git
cd mindformers
bash build.sh

3. User Guide

MindFormers supports model pre-training, fine-tuning, inference, and evaluation. You can click a model name in Supported Models to view the document and complete the preceding tasks. The following describes the distributed startup mode and provides an example.

Launching model training and inference in distributed mode is recommended. The scripts/msrun_launcher.sh distributed launch script is currently the main way to launch models. For details about the msrun feature, see msrun Launching. The script takes the following input parameters.

Parameter	Required on Single-Node	Required on Multi-Node	Default Value	Description
WORKER_NUM	✓	✓	8	Total number of compute devices used on all nodes
LOCAL_WORKER	-	✓	8	Number of compute devices used on the current node
MASTER_ADDR	-	✓	127.0.0.1	IP address of the primary node to be started in distributed mode
MASTER_PORT	-	✓	8118	Port number bound for distributed startup
NODE_RANK	-	✓	0	Rank ID of the current node
LOG_DIR	-	✓	output/msrun_log	Log output path. If the path does not exist, it is created recursively.
JOIN	-	✓	False	Specifies whether to wait for all distributed processes to exit.
CLUSTER_TIME_OUT	-	✓	7200	Waiting time for distributed startup, in seconds.

Note: To launch on specific devices, set the environment variable ASCEND_RT_VISIBLE_DEVICES to their device IDs. For example, to use devices 2 and 3, run export ASCEND_RT_VISIBLE_DEVICES=2,3.
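
When devices are restricted this way, the worker count passed to the launcher would normally equal the number of visible devices. The sketch below derives that count from the variable and prints the resulting command as a dry run. The helper name count_devices is hypothetical, and the assumption that WORKER_NUM should match the visible-device count is ours, not a statement from the launcher's documentation.

```shell
#!/bin/sh
# Count the entries in a comma-separated device list such as "2,3".
count_devices() {
  old_ifs=$IFS
  IFS=,
  # Unquoted on purpose: split the list on commas into positional parameters.
  set -- $1
  IFS=$old_ifs
  echo "$#"
}

# Restrict the job to devices 2 and 3, as in the note above.
export ASCEND_RT_VISIBLE_DEVICES=2,3
worker_num="$(count_devices "$ASCEND_RT_VISIBLE_DEVICES")"

# Dry run: print the launch command instead of executing it.
printf 'bash scripts/msrun_launcher.sh "run_mindformer.py --config path/to/xxx.yaml --run_mode finetune" %s\n' "$worker_num"
```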

Single-Node Multi-Device

# 1. Single-node multi-device quick launch mode. Eight devices are launched by default.
bash scripts/msrun_launcher.sh "run_mindformer.py \
  --config {CONFIG_PATH} \
  --run_mode {train/finetune/eval/predict}"

# 2. Single-node multi-device quick launch mode. You only need to set the number of devices to be used.
bash scripts/msrun_launcher.sh "run_mindformer.py \
  --config {CONFIG_PATH} \
  --run_mode {train/finetune/eval/predict}" WORKER_NUM

# 3. Single-node multi-device custom launch mode.
bash scripts/msrun_launcher.sh "run_mindformer.py \
  --config {CONFIG_PATH} \
  --run_mode {train/finetune/eval/predict}" \
  WORKER_NUM MASTER_PORT LOG_DIR JOIN CLUSTER_TIME_OUT

Examples

# Single-node multi-device quick launch mode. Eight devices are launched by default.
bash scripts/msrun_launcher.sh "run_mindformer.py \
  --config path/to/xxx.yaml \
  --run_mode finetune"

# Single-node multi-device quick launch mode.
bash scripts/msrun_launcher.sh "run_mindformer.py \
  --config path/to/xxx.yaml \
  --run_mode finetune" 8

# Single-node multi-device custom launch mode.
bash scripts/msrun_launcher.sh "run_mindformer.py \
  --config path/to/xxx.yaml \
  --run_mode finetune" \
  8 8118 output/msrun_log False 300
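
Because the custom launch mode relies on argument order, a small wrapper that fills the positions from named values can reduce mistakes. This is a sketch only: build_single_node_cmd is a hypothetical helper, its defaults are copied from the parameter table above, and it prints the command rather than running it.

```shell
#!/bin/sh
# Build the single-node custom launch command from named values.
# Any variable left unset falls back to the default in the parameter table.
build_single_node_cmd() {
  config=$1
  run_mode=$2
  worker_num=${WORKER_NUM:-8}
  master_port=${MASTER_PORT:-8118}
  log_dir=${LOG_DIR:-output/msrun_log}
  join=${JOIN:-False}
  cluster_time_out=${CLUSTER_TIME_OUT:-7200}
  printf 'bash scripts/msrun_launcher.sh "run_mindformer.py --config %s --run_mode %s" %s %s %s %s %s\n' \
    "$config" "$run_mode" "$worker_num" "$master_port" "$log_dir" "$join" "$cluster_time_out"
}

# Dry run in a subshell so the override does not leak into the caller.
# Only CLUSTER_TIME_OUT is overridden here.
(CLUSTER_TIME_OUT=300; build_single_node_cmd path/to/xxx.yaml finetune)
```

With no other launcher variables set in the environment, the printed command should match the custom launch example above.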

Multi-Node Multi-Device

To run distributed training across multiple nodes, run the script on every node and set MASTER_ADDR to the IP address of the primary node. MASTER_ADDR must be the same on all nodes; only the NODE_RANK parameter differs from node to node.

# Multi-node multi-device custom launch mode.
bash scripts/msrun_launcher.sh "run_mindformer.py \
  --config {CONFIG_PATH} \
  --run_mode {train/finetune/eval/predict}" \
  WORKER_NUM LOCAL_WORKER MASTER_ADDR MASTER_PORT NODE_RANK LOG_DIR JOIN CLUSTER_TIME_OUT

Examples

# Node 0, with IP address 192.168.1.1, serves as the primary node. There are a total of 8 devices, with 4 devices allocated per node.
bash scripts/msrun_launcher.sh "run_mindformer.py \
  --config {CONFIG_PATH} \
  --run_mode {train/finetune/eval/predict}" \
  8 4 192.168.1.1 8118 0 output/msrun_log False 300

# Node 1, with IP address 192.168.1.2, has the same launch command as node 0, with the only difference being the NODE_RANK parameter.
bash scripts/msrun_launcher.sh "run_mindformer.py \
  --config {CONFIG_PATH} \
  --run_mode {train/finetune/eval/predict}" \
  8 4 192.168.1.1 8118 1 output/msrun_log False 300
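
Since the per-node commands differ only in NODE_RANK, they can be generated rather than typed by hand. The sketch below prints one command per node as a dry run. The function print_multi_node_cmds is hypothetical, the port, log directory, JOIN, and timeout values are copied from the example above, and it assumes every node contributes the same number of devices.

```shell
#!/bin/sh
# Print one launch command per node for a multi-node job (dry run only).
print_multi_node_cmds() {
  worker_num=$1      # total devices across all nodes
  local_worker=$2    # devices on each node
  master_addr=$3     # IP of the primary node, identical on every node
  config=$4
  run_mode=$5

  # Assumption: every node contributes the same number of devices.
  if [ "$local_worker" -le 0 ] || [ $((worker_num % local_worker)) -ne 0 ]; then
    echo "WORKER_NUM must be a positive multiple of LOCAL_WORKER" >&2
    return 1
  fi

  node_count=$((worker_num / local_worker))
  node_rank=0
  while [ "$node_rank" -lt "$node_count" ]; do
    printf '# run on node %s\n' "$node_rank"
    printf 'bash scripts/msrun_launcher.sh "run_mindformer.py --config %s --run_mode %s" %s %s %s 8118 %s output/msrun_log False 300\n' \
      "$config" "$run_mode" "$worker_num" "$local_worker" "$master_addr" "$node_rank"
    node_rank=$((node_rank + 1))
  done
}

# Two nodes with 4 devices each, primary node at 192.168.1.1.
print_multi_node_cmds 8 4 192.168.1.1 path/to/xxx.yaml finetune
```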

Single-Device Launch

MindFormers provides the run_mindformer.py script as the single-device launch method. This script can be used to complete the single-device training, fine-tuning, evaluation, and inference of a model based on the model configuration file.

# Command-line arguments passed to run_mindformer.py override the corresponding parameters in the model configuration file.
python run_mindformer.py --config {CONFIG_PATH} --run_mode {train/finetune/eval/predict}
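
A typo in --run_mode is easy to make, so a wrapper can validate the value before launching. This is an illustrative sketch: is_valid_run_mode and single_device_cmd are hypothetical helpers, the accepted modes are the four named in this README, and the command is printed rather than executed.

```shell
#!/bin/sh
# Return success if the argument is one of the run modes named in this README.
is_valid_run_mode() {
  case "$1" in
    train|finetune|eval|predict) return 0 ;;
    *) return 1 ;;
  esac
}

# Print (dry run) the single-device command after validating the run mode.
single_device_cmd() {
  config=$1
  run_mode=$2
  if ! is_valid_run_mode "$run_mode"; then
    echo "unsupported run_mode: $run_mode" >&2
    return 1
  fi
  printf 'python run_mindformer.py --config %s --run_mode %s\n' "$config" "$run_mode"
}

single_device_cmd path/to/xxx.yaml predict
```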

4. Contribution

We welcome contributions to the community. For details, see MindFormers Contribution Guidelines.

5. License

Apache 2.0 License

Description

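The goal of the MindSpore Transformers suite is to build a full-process development suite for foundation model pre-training, fine-tuning, inference, and deployment, providing the industry's mainstream Transformer-based large language models (LLMs) and multimodal understanding models (MMs). It aims to help users easily carry out the full foundation model development workflow.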
