8167 Commits

Author SHA1 Message Date
i-robot
8b580434fd !7583 【master】【train】【mcore】support dryrun with alltoall_deredundency
Merge pull request !7583 from husichao/master_dry_run
2025-11-13 02:56:25 +00:00
i-robot
a4444ad722 !7626 【master】【bugfix】Fix precision issue in the fused rope operator test case
Merge pull request !7626 from kongziyi/fix_fused_rope_ut_master
2025-11-12 09:50:39 +00:00
liuyanwei
d474f3aada 【master】【bugfix】Add test cases for the custom parameter-initialization variance feature 2025-11-12 16:57:09 +08:00
kongziyi
a4cca04e3d 【master】【bugfix】Fix precision issue in the fused rope operator test case 2025-11-12 16:05:45 +08:00
i-robot
7a7c393793 !7625 【bugfix】Fix the case where the optimizer retrieved in Trainer is None
Merge pull request !7625 from JavaZero/fix_muon_bug
2025-11-12 08:04:44 +00:00
i-robot
8777a0d892 !7608 【master】Support conversion and loading of offline-split weights
Merge pull request !7608 from 森镇/adapt_load_offline_split_ckpt_master
2025-11-12 07:37:10 +00:00
i-robot
dc5ae6a17b !7600 fix compute dtype in residual and rope.
Merge pull request !7600 from niujunhao/bugfix/fix_compute_dtype
2025-11-11 13:09:13 +00:00
senzhen
b07ac95277 Support conversion and loading of weights with other naming formats 2025-11-11 21:01:58 +08:00
JavaZero
86ee0bae29 Fix optimizer type check to ensure configuration is valid before accessing properties 2025-11-11 19:44:29 +08:00
i-robot
84535d0fb1 !7620 【bugfix】【master】Fix precision issue caused by redundant embedding communication when vocab_emb_dp=True
Merge pull request !7620 from hsshuai/bugfix/master/vocab_emb_dp
2025-11-11 11:35:06 +00:00
niujunhao
3b7a97f709 fix residual fp32 and rope dtype. 2025-11-11 15:50:54 +08:00
i-robot
6171b5828a !7616 Revert "del llama3_1"
Merge pull request !7616 from zyw_hw/revert_llama3_1
2025-11-11 06:32:31 +00:00
husichao
2a44886582 support dryrun with alltoall_deredundency 2025-11-11 10:02:30 +08:00
i-robot
e7b8fe4166 !7599 【master】【mcore】【feature】EPLB: support output of expert load statistics
Merge pull request !7599 from YinanF/eplb
2025-11-11 01:54:31 +00:00
i-robot
006a8c208c !7617 add quant-method ut
Merge pull request !7617 from hangq/master
2025-11-11 01:20:04 +00:00
YinanF
2ad47db864 【master】【mcore】【feature】EPLB: support output of expert load statistics 2025-11-10 21:47:37 +08:00
Hsshuai
63cf1fba97 fix precision issue when vocab_emb_dp=True
(cherry picked from commit 10bbf69209)
2025-11-10 21:11:19 +08:00
hangangqiang
413b11381b add quant-method ut 2025-11-10 17:39:57 +08:00
zyw_hw
7d7323d933 revert del llama3_1 2025-11-10 16:31:53 +08:00
i-robot
f06a946af2 !7555 【master】Add support for NoPE layers and update configuration parameters
Merge pull request !7555 from JavaZero/add_nope_interleave
2025-11-08 11:19:56 +00:00
kongziyi
6c5df3dfed 【master】【bugfix】Fix error raised when swap is enabled in mcore training 2025-11-08 15:51:11 +08:00
i-robot
6aaec30d17 !7596 【master】Mcore: add support for the Muon optimizer with DeepSeekV3
Merge pull request !7596 from JavaZero/mcore_add_muon_2
2025-11-08 02:56:04 +00:00
JavaZero
2a82110b0e Add support for NoPE layers and update configuration parameters 2025-11-08 10:47:59 +08:00
JavaZero
f907a99c44 add Muon 2025-11-07 17:47:24 +08:00
i-robot
05f9b48752 !7593 【master】【DFX】Fix issue when process-level fast recovery is combined with resume-from-checkpoint training
Merge pull request !7593 from zyw_hw/fix_tft_skip_load
2025-11-07 04:57:32 +00:00
i-robot
59c897ee17 !7580 Optimize inter-node communication merging and stream interruption
Merge pull request !7580 from liuyanwei/fix_move_to
2025-11-07 01:22:28 +00:00
i-robot
c0a8e8b456 !7597 add grouped lr limit.
Merge pull request !7597 from niujunhao/bugfix/fix_grouped_lr_limit
2025-11-07 01:22:18 +00:00
niujunhao
ab221d69ba add grouped lr limit. 2025-11-06 17:52:32 +08:00
i-robot
feb5b1ec69 !7575 【bugfix】【master】Change the logic by which the group routing configuration takes effect
Merge pull request !7575 from hsshuai/bugfix/master/group_limited
2025-11-06 08:15:06 +00:00
i-robot
91dbb7d3fc !7585 【master】Remove telechat2/infer
Merge pull request !7585 from zyw_hw/del_telechat2_infer
2025-11-06 06:50:05 +00:00
i-robot
c28c064cfe !7586 【master】Remove llama3_1
Merge pull request !7586 from zyw_hw/del_llama_3_1
2025-11-06 06:49:50 +00:00
i-robot
361bc85818 !7587 【master】Remove mixtral
Merge pull request !7587 from zyw_hw/del_mixtral
2025-11-06 06:49:18 +00:00
i-robot
1f740cb042 !7588 【master】Remove llm_boost
Merge pull request !7588 from zyw_hw/del_llm_boost
2025-11-06 06:49:02 +00:00
i-robot
75521b2218 !7595 【master】Add max_logits monitoring
Merge pull request !7595 from JavaZero/mcore_add_max_logits_monitor
2025-11-06 03:21:40 +00:00
JavaZero
2533249247 Muon: Support Muon optimizer + qk-clip + max_logits monitor
fix pylint

fix Lizard

fix pylint

remove linkk
2025-11-06 00:23:15 +08:00
zyw_hw
cf1b6a7868 reboot node skip load ckpt 2025-11-05 16:13:56 +08:00
i-robot
3b3e9b293f !7590 【master】Fix broken ms package link
Merge pull request !7590 from zyw_hw/fix_ms_pkg_url
2025-11-05 06:17:42 +00:00
zyw_hw
a4eb0316d4 fix ms package url 2025-11-05 11:46:10 +08:00
zyw_hw
4f34b0922a del llama3_1 2025-11-05 11:00:05 +08:00
zyw_hw
790867d726 del mixtral 2025-11-05 10:57:23 +08:00
zyw_hw
b2f23a45f7 del telechat2/infer 2025-11-05 10:28:36 +08:00
Hsshuai
7805009817 Refactor TopKRouter to enable group-limited routing based on num_groups configuration 2025-11-05 10:20:46 +08:00
i-robot
7b62206acb !7571 【master】【Mcore】bugfix: force_expert_balance when use expert_bias
Merge pull request !7571 from husichao/master_force
2025-11-05 02:01:28 +00:00
zyw_hw
902b50f3fd del llm_boost 2025-11-05 09:51:28 +08:00
i-robot
2a7f10d293 !7567 【master】【bugfix】Add null pointer check
Merge pull request !7567 from zyw_hw/add_nullstr_check
2025-11-04 01:04:29 +00:00
liuyanwei
ce82f46f42 Optimize inter-node communication merging and stream interruption 2025-11-03 20:44:29 +08:00
zyw_hw
7af2aa8522 add nullstr check 2025-11-03 17:01:38 +08:00
husichao
e1c8abc518 bugfix: force_expert_balance when use expert_bias 2025-11-03 14:19:44 +08:00
i-robot
cd1e17d522 !7563 Update set_ms_affinity
Merge pull request !7563 from AAA碧根果批发赵少/affinity
2025-10-31 09:04:50 +00:00
yiyison
31ffc657e5 Update set_ms_affinity 2025-10-31 15:02:12 +08:00