Commit Graph

8114 Commits

Author SHA1 Message Date
Hsshuai
7db691b9f0 add testcase for trainer 2025-11-29 10:13:50 +08:00
i-robot
3ce5bcc657 !7741 [bugfix][master][MCore] Use multiprocessing to speed up the DeepSeek-V3 offline weight conversion script
Merge pull request !7741 from SaiYao/update_convert_dpsk_mcore_hf_251126
2025-11-29 01:23:42 +00:00
i-robot
9bd961cdfd !7728 [master][bugfix] Remove onehot recomputation
Merge pull request !7728 from 魏琢艺/onehot_fix
2025-11-28 11:31:43 +00:00
zyw_hw
3ef42f609c fix tokenizer case bug 2025-11-28 19:28:01 +08:00
i-robot
3db906e486 !7736 [bugfix] muon apply grouped lr.
Merge pull request !7736 from niujunhao/bugfix/muon_grouped_lr
2025-11-28 10:17:51 +00:00
yiyison
342a937efc Reject invalid device IDs 2025-11-28 17:12:29 +08:00
i-robot
ada331c1df !7725 [bugfix] fix input config in megatron dataset.
Merge pull request !7725 from niujunhao/bugfix/fix_megatron_config
2025-11-28 08:42:35 +00:00
i-robot
627aa3702f !7530 [master] Add pma test cases
Merge pull request !7530 from lan/pma_test
2025-11-28 08:29:58 +00:00
SaiYao
f92920e1a2 [MCore] Use multiprocessing to speed up the DeepSeek-V3 offline weight conversion script 2025-11-28 14:50:20 +08:00
魏琢艺
6326c428d2 remove onehot recompute 2025-11-28 11:23:54 +08:00
lanxiang
9212f2c983 Add test cases for pma 2025-11-28 10:16:04 +08:00
Yule100
634737e446 [Feature] Upgrade the transformers version 2025-11-27 20:50:46 +08:00
i-robot
7cf42f96af !7735 [bugfix][master] Add a retry mechanism so that directories created on shared storage can be read correctly
Merge pull request !7735 from hsshuai/bugfix/master/file_sys
2025-11-27 12:24:18 +00:00
i-robot
81e7296e76 !7693 [master][bugfix] Remove redundant communication and reordering in position embedding
Merge pull request !7693 from lzy0920232/code_bugfix_remove
2025-11-27 12:12:19 +00:00
Hsshuai
69cc5e15ab Enhance set_safe_mode_for_file_or_dir function with retry logic for file permission changes and remove unnecessary cache refresh in set_strategy_save_path. 2025-11-27 17:41:53 +08:00
niujunhao
261ba11cdc muon apply grouped lr. 2025-11-27 17:27:05 +08:00
i-robot
5367943249 !7712 [test case][master] Add test cases for the tokenizer base class
Merge pull request !7712 from hsshuai/test/master/tokenization_tests
2025-11-27 08:21:30 +00:00
i-robot
f00f6145be !7721 [master][mcore][bugfix] fix nope_layer_interval not being rejected when given an invalid value
Merge pull request !7721 from zhangyihui/master-nope
2025-11-27 07:19:50 +00:00
i-robot
becc76cd4b !7729 [master][CI gate] Update ms package
Merge pull request !7729 from zyw_hw/update_ms_url_1127
2025-11-27 06:20:32 +00:00
zyw_hw
c747d11b0d update ms pkg url 2025-11-27 11:19:15 +08:00
i-robot
08ba1272ef !7715 Add unit tests to context
Merge pull request !7715 from Jingwei Huang/upstream_master
2025-11-27 02:52:49 +00:00
i-robot
134c13be51 !7691 1. Title: Restrict position_embedding_type to one of 'rope', 'yarn', 'none', 'relative', 'learned_absolute' 2. Type: Bugfix: Fix Bug
Merge pull request !7691 from zzzkeke/new/modify
2025-11-27 01:59:13 +00:00
i-robot
4b2b0da025 !7720 [master][bugfix] Avoid excessive warnings caused by repeatedly creating communication groups
Merge pull request !7720 from 魏琢艺/group_update
2025-11-27 01:40:39 +00:00
zhangyihuiben
d4d973a654 [master][mcore][bugfix] fix nope_layer_interval not being rejected when given an invalid value 2025-11-26 20:22:56 +08:00
JingweiHuang
c9f21c7f66 Add unit tests to context 2025-11-26 18:55:21 +08:00
niujunhao
e18b6d59ed fix input config in megatron dataset. 2025-11-26 14:21:51 +08:00
魏琢艺
ff937e67bb avoid repeatedly creating groups 2025-11-26 10:33:39 +08:00
i-robot
9346440d73 !7716 [MCore] Use multiprocessing to speed up the Qwen3-series reverse conversion scripts
Merge pull request !7716 from SaiYao/fix_reverse_qwen3_mcore_251125
2025-11-25 12:11:57 +00:00
i-robot
7cb8adc3c6 !7719 Enhance FlashAttention: optimize max logits tracking and reduce max op…
Merge pull request !7719 from JavaZero/fix_muon_tnd
2025-11-25 11:38:45 +00:00
zyw_hw
f5f3009e97 fix profiler step issue 2025-11-25 19:24:03 +08:00
Hsshuai
70fb19f35d add test for tokenization 2025-11-25 19:04:20 +08:00
i-robot
8b0a977906 !7718 flash new made dir.
Merge pull request !7718 from niujunhao/bugfix/flash_dir
2025-11-25 10:53:30 +00:00
JavaZero
b5bbe1983e Enhance FlashAttention: optimize max logits tracking and reduce max operations and fix tnd layout 2025-11-25 18:44:21 +08:00
i-robot
1b70b5a9c7 !7710 fix toolalpaca case
Merge pull request !7710 from niujunhao/bugfix/fix_toolalpaca
2025-11-25 10:30:22 +00:00
i-robot
9755ec848f !7633 [master][bugfix] Add offset validation to prevent a stage from being assigned a negative number of layers
Merge pull request !7633 from kongziyi/fix_offset_master
2025-11-25 09:05:47 +00:00
niujunhao
68d664bdcf flash new made dir. 2025-11-25 17:04:20 +08:00
SaiYao
67742fd98a [MCore] Use multiprocessing to speed up the Qwen3-series reverse conversion scripts 2025-11-25 16:50:45 +08:00
i-robot
0a91666087 !7683 [master] Improve the error message for non-shared path validation
Merge pull request !7683 from 森镇/fix_shared_path_error
2025-11-25 06:41:56 +00:00
kongziyi
23206156e3 [master][bugfix] Add offset validation to avoid assigning a negative number of layers 2025-11-25 11:52:32 +08:00
i-robot
f639b49f4b !7713 [bugfix] fix iter num_parallel_workers in hf streaming load.
Merge pull request !7713 from niujunhao/bugfix/fix_hf_stream_workers
2025-11-25 02:24:52 +00:00
i-robot
df70d77d90 !7697 [master][bugfix] Fix the redundancy-free saving flow for weights 2.0
Merge pull request !7697 from AAA碧根果批发赵少/master
2025-11-24 12:49:02 +00:00
niujunhao
0f1e389c62 fix toolalpaca case. 2025-11-24 20:02:54 +08:00
i-robot
983771412e !7711 [Telechat2] Ignore the unused config option masked_softmax_fusion
Merge pull request !7711 from SaiYao/fix_telechat2_model
2025-11-24 12:02:40 +00:00
zzzkeke
ceeb710a20 The position_embedding_type must be one of: 'rope', 'yarn', 'none', 'relative', 'learned_absolute'. 2025-11-24 19:16:30 +08:00
niujunhao
4c5f9074cb fix iter num_parallel_workers in hf streaming load. 2025-11-24 17:44:35 +08:00
SaiYao
ef14ccf1de [Telechat2] Ignore unused config options 2025-11-24 16:40:58 +08:00
i-robot
fc9bdbe0c4 !7702 [safetensors] Fix the legacy flow failing to load HuggingFace weights correctly
Merge pull request !7702 from SaiYao/fix_load_hf_legacy_251122
2025-11-24 02:37:56 +00:00
senzhen
1857b5ae68 Improve the error message for non-shared path validation 2025-11-24 09:30:34 +08:00
yiyison
a37b706ac9 Bugfix for the saving flow 2025-11-24 09:25:13 +08:00
i-robot
c059070e1c !7701 [bugfix][master] Revert .copy to .value and fix the ms_custom_ops import issue
Merge pull request !7701 from hsshuai/bugfix/master/host_memory
2025-11-22 12:11:53 +00:00