i-robot
|
8b580434fd
|
!7583 【master】【train】【mcore】support dryrun with alltoall_deredundency
Merge pull request !7583 from husichao/master_dry_run
|
2025-11-13 02:56:25 +00:00 |
|
i-robot
|
a4444ad722
|
!7626 【master】【bugfix】Fix precision issue in fused RoPE operator test cases
Merge pull request !7626 from kongziyi/fix_fused_rope_ut_master
|
2025-11-12 09:50:39 +00:00 |
|
liuyanwei
|
d474f3aada
|
【master】【bugfix】Add test cases for the custom parameter initialization variance feature
|
2025-11-12 16:57:09 +08:00 |
|
kongziyi
|
a4cca04e3d
|
【master】【bugfix】Fix precision issue in fused RoPE operator test cases
|
2025-11-12 16:05:45 +08:00 |
|
i-robot
|
7a7c393793
|
!7625 【bugfix】Fix the case where the optimizer obtained in Trainer is None
Merge pull request !7625 from JavaZero/fix_muon_bug
|
2025-11-12 08:04:44 +00:00 |
|
i-robot
|
8777a0d892
|
!7608 【master】Support converting and loading offline-split weights
Merge pull request !7608 from 森镇/adapt_load_offline_split_ckpt_master
|
2025-11-12 07:37:10 +00:00 |
|
i-robot
|
dc5ae6a17b
|
!7600 fix compute dtype in residual and rope.
Merge pull request !7600 from niujunhao/bugfix/fix_compute_dtype
|
2025-11-11 13:09:13 +00:00 |
|
senzhen
|
b07ac95277
|
Support converting and loading weights with other naming formats
|
2025-11-11 21:01:58 +08:00 |
|
JavaZero
|
86ee0bae29
|
Fix optimizer type check to ensure configuration is valid before accessing properties
|
2025-11-11 19:44:29 +08:00 |
|
i-robot
|
84535d0fb1
|
!7620 【bugfix】【master】Fix precision issue caused by redundant embedding communication when vocab_emb_dp=True
Merge pull request !7620 from hsshuai/bugfix/master/vocab_emb_dp
|
2025-11-11 11:35:06 +00:00 |
|
niujunhao
|
3b7a97f709
|
fix residual fp32 and rope dtype.
|
2025-11-11 15:50:54 +08:00 |
|
i-robot
|
6171b5828a
|
!7616 Revert "del llama3_1"
Merge pull request !7616 from zyw_hw/revert_llama3_1
|
2025-11-11 06:32:31 +00:00 |
|
husichao
|
2a44886582
|
support dryrun with alltoall_deredundency
|
2025-11-11 10:02:30 +08:00 |
|
i-robot
|
e7b8fe4166
|
!7599 【master】【mcore】【feature】EPLB: support outputting expert load statistics
Merge pull request !7599 from YinanF/eplb
|
2025-11-11 01:54:31 +00:00 |
|
i-robot
|
006a8c208c
|
!7617 add quant-method ut
Merge pull request !7617 from hangq/master
|
2025-11-11 01:20:04 +00:00 |
|
YinanF
|
2ad47db864
|
【master】【mcore】【feature】EPLB: support outputting expert load statistics
|
2025-11-10 21:47:37 +08:00 |
|
Hsshuai
|
63cf1fba97
|
fix precision issue when vocab_emb_dp=True
(cherry picked from commit 10bbf69209)
|
2025-11-10 21:11:19 +08:00 |
|
hangangqiang
|
413b11381b
|
add quant-method ut
|
2025-11-10 17:39:57 +08:00 |
|
zyw_hw
|
7d7323d933
|
revert del llama3_1
|
2025-11-10 16:31:53 +08:00 |
|
i-robot
|
f06a946af2
|
!7555 【master】Add support for NoPE layers and update configuration parameters
Merge pull request !7555 from JavaZero/add_nope_interleave
|
2025-11-08 11:19:56 +00:00 |
|
kongziyi
|
6c5df3dfed
|
【master】【bugfix】Fix error raised when swap is enabled in mcore training
|
2025-11-08 15:51:11 +08:00 |
|
i-robot
|
6aaec30d17
|
!7596 【master】Mcore: add support for the Muon optimizer with DeepSeekV3
Merge pull request !7596 from JavaZero/mcore_add_muon_2
|
2025-11-08 02:56:04 +00:00 |
|
JavaZero
|
2a82110b0e
|
Add support for NoPE layers and update configuration parameters
|
2025-11-08 10:47:59 +08:00 |
|
JavaZero
|
f907a99c44
|
add Muon
|
2025-11-07 17:47:24 +08:00 |
|
i-robot
|
05f9b48752
|
!7593 【master】【DFX】Fix issue when process-level fast recovery is used with resume-from-checkpoint training enabled
Merge pull request !7593 from zyw_hw/fix_tft_skip_load
|
2025-11-07 04:57:32 +00:00 |
|
i-robot
|
59c897ee17
|
!7580 Optimize merged inter-node communication to avoid stream stalls
Merge pull request !7580 from liuyanwei/fix_move_to
|
2025-11-07 01:22:28 +00:00 |
|
i-robot
|
c0a8e8b456
|
!7597 add grouped lr limit.
Merge pull request !7597 from niujunhao/bugfix/fix_grouped_lr_limit
|
2025-11-07 01:22:18 +00:00 |
|
niujunhao
|
ab221d69ba
|
add grouped lr limit.
|
2025-11-06 17:52:32 +08:00 |
|
i-robot
|
feb5b1ec69
|
!7575 【bugfix】【master】Change the logic that determines when the group-limited routing configuration takes effect
Merge pull request !7575 from hsshuai/bugfix/master/group_limited
|
2025-11-06 08:15:06 +00:00 |
|
i-robot
|
91dbb7d3fc
|
!7585 【master】Remove telechat2/infer
Merge pull request !7585 from zyw_hw/del_telechat2_infer
|
2025-11-06 06:50:05 +00:00 |
|
i-robot
|
c28c064cfe
|
!7586 【master】Remove llama3_1
Merge pull request !7586 from zyw_hw/del_llama_3_1
|
2025-11-06 06:49:50 +00:00 |
|
i-robot
|
361bc85818
|
!7587 【master】Remove mixtral
Merge pull request !7587 from zyw_hw/del_mixtral
|
2025-11-06 06:49:18 +00:00 |
|
i-robot
|
1f740cb042
|
!7588 【master】Remove llm_boost
Merge pull request !7588 from zyw_hw/del_llm_boost
|
2025-11-06 06:49:02 +00:00 |
|
i-robot
|
75521b2218
|
!7595 【master】Add max_logits monitoring
Merge pull request !7595 from JavaZero/mcore_add_max_logits_monitor
|
2025-11-06 03:21:40 +00:00 |
|
JavaZero
|
2533249247
|
Muon: Support Muon optimizer + qk-clip + max_logits monitor
fix pylint
fix Lizard
fix pylint
remove link
|
2025-11-06 00:23:15 +08:00 |
|
zyw_hw
|
cf1b6a7868
|
reboot node skip load ckpt
|
2025-11-05 16:13:56 +08:00 |
|
i-robot
|
3b3e9b293f
|
!7590 【master】Fix broken ms package link
Merge pull request !7590 from zyw_hw/fix_ms_pkg_url
|
2025-11-05 06:17:42 +00:00 |
|
zyw_hw
|
a4eb0316d4
|
fix ms package url
|
2025-11-05 11:46:10 +08:00 |
|
zyw_hw
|
4f34b0922a
|
del llama3_1
|
2025-11-05 11:00:05 +08:00 |
|
zyw_hw
|
790867d726
|
del mixtral
|
2025-11-05 10:57:23 +08:00 |
|
zyw_hw
|
b2f23a45f7
|
del telechat2/infer
|
2025-11-05 10:28:36 +08:00 |
|
Hsshuai
|
7805009817
|
Refactor TopKRouter to enable group-limited routing based on num_groups configuration
|
2025-11-05 10:20:46 +08:00 |
|
i-robot
|
7b62206acb
|
!7571 【master】【Mcore】bugfix: force_expert_balance when use expert_bias
Merge pull request !7571 from husichao/master_force
|
2025-11-05 02:01:28 +00:00 |
|
zyw_hw
|
902b50f3fd
|
del llm_boost
|
2025-11-05 09:51:28 +08:00 |
|
i-robot
|
2a7f10d293
|
!7567 【master】【bugfix】Add null pointer check
Merge pull request !7567 from zyw_hw/add_nullstr_check
|
2025-11-04 01:04:29 +00:00 |
|
liuyanwei
|
ce82f46f42
|
Optimize merged inter-node communication to avoid stream stalls
|
2025-11-03 20:44:29 +08:00 |
|
zyw_hw
|
7af2aa8522
|
add nullstr check
|
2025-11-03 17:01:38 +08:00 |
|
husichao
|
e1c8abc518
|
bugfix: force_expert_balance when use expert_bias
|
2025-11-03 14:19:44 +08:00 |
|
i-robot
|
cd1e17d522
|
!7563 Update set_ms_affinity
Merge pull request !7563 from AAA碧根果批发赵少/affinity
|
2025-10-31 09:04:50 +00:00 |
|
yiyison
|
31ffc657e5
|
Update set_ms_affinity
|
2025-10-31 15:02:12 +08:00 |
|