68 Commits

Author SHA1 Message Date
jzh
56b3cda738 !3127 [pytorch][bugfix] update icsl for weights_only
Merge pull request !3127 from jzh/210_aicsl
2025-08-11 15:58:41 +00:00
guozhihua
d13b6003f5 !3035 [pytorch][sh]change core commitid of deepseekv3
Merge pull request !3035 from guozhihua/2.1.0
2025-08-09 03:19:13 +00:00
jzh
6785f325f3 !3113 [pytorch][bugfix]fix some bug for icsl
Merge pull request !3113 from jzh/210_uicsl
2025-08-08 07:13:20 +00:00
qu_yueze
acc11f130c !2966 [pytorch][bugfix]fix vpp of V3_ckpt
Merge pull request !2966 from qu_yueze/2.1.0
2025-07-04 08:43:57 +00:00
shengjy
c6c3f3a39c !2826 [pytorch][sh]update mindspeed commit id for deepseek3
Merge pull request !2826 from shengjy/master
2025-06-17 06:37:21 +00:00
shengjy
f38bb1fd30 !2667 [pytorch][feature]dualpipe in sft
Merge pull request !2667 from shengjy/sft_dp
2025-06-16 08:12:31 +00:00
qu_yueze
64ae8b737b !2803 fix readme of V3
Merge pull request !2803 from qu_yueze/master
2025-06-10 11:11:53 +00:00
qu_yueze
d4a80afd20 !2787 fix bug of ckpt in V3
Merge pull request !2787 from qu_yueze/master
2025-06-09 02:29:47 +00:00
qu_yueze
379f824dad !2735 add lora-sft and fix scripts of qwen3
Merge pull request !2735 from qu_yueze/master
2025-05-29 10:53:12 +00:00
yanzhixiao
b2868282b0 !2722 fixing torch2.6, DeepSeek-V3 QLoRA loading checkpoint error
Merge pull request !2722 from yanzhixiao/bugfix-ckpt-load
2025-05-29 10:51:18 +00:00
qu_yueze
044f1b9dcc !2728 fix readme of ckpt of V3 and add some restrict
Merge pull request !2728 from qu_yueze/master
2025-05-29 03:02:57 +00:00
shengjy
9930a5057a !2710 Compatibility fixes
Merge pull request !2710 from shengjy/bf523
2025-05-27 09:58:38 +00:00
CY-Slightwind
5b7c6a030d !2674 [feat] QLoRA GMM Support
Merge pull request !2674 from CY-Slightwind/qlora
2025-05-22 12:30:51 +00:00
shengjy
89a231767e !2679 update deepseek3 scripts
Merge pull request !2679 from shengjy/master
2025-05-20 02:04:10 +00:00
丁子叉
b1ca6aaeae !2678 [dskv3]fix mtp_loss in sft and fix mg2hf when having multi mtp
Merge pull request !2678 from 丁子叉/master
2025-05-19 12:36:48 +00:00
jzh
0ceafd12c5 !2673 docs readme modify
Merge pull request !2673 from jzh/docs_0517
2025-05-19 04:04:38 +00:00
shengjy
1457cd83e8 !2672 Update poc deepseek3 A3 scripts
Merge pull request !2672 from shengjy/master
2025-05-16 09:35:02 +00:00
mhh001
6dce820728 !2669 DeepSeekV3 shell scripts update
Merge pull request !2669 from mhh001/v3shell_
2025-05-15 06:50:22 +00:00
mhh001
ac088f975c !2599 Adapt to core0.8.0 version
Merge pull request !2599 from mhh001/master
2025-05-09 08:51:18 +00:00
qu_yueze
922534a19d !2595 [fix] fix readme of 0day and some .sh
Merge pull request !2595 from qu_yueze/master
2025-04-27 11:37:54 +00:00
yuhui
0a46676b5e !2496 deepseek3 weight conversion: adapt to dualpipe
Merge pull request !2496 from yuhui/dualpipe
2025-04-23 01:22:08 +00:00
shengjy
fae98b77c5 !2470 [core-llm][dskv3]mtp loss scaler and fix expert bias dtype
Merge pull request !2470 from shengjy/mtp_loss_scaler
2025-04-16 06:06:45 +00:00
qu_yueze
e2b3d4e39d !2536 fix readme of ckpt of deepseekV3
Merge pull request !2536 from qu_yueze/master
2025-04-16 03:23:55 +00:00
shenjiarun
fa3ad62105 !2507 Rename all seq length naming K to k
Merge pull request !2507 from shenjiarun/master
2025-04-10 09:11:18 +00:00
qu_yueze
9679b65830 !2485 improve weight conversion readme
Merge pull request !2485 from qu_yueze/master
2025-04-08 06:08:59 +00:00
jzh
6480c709df !2495 fix missing ckpt_load, ckpt_save in some scripts
Merge pull request !2495 from jzh/master_dskv3sh
2025-04-01 12:39:52 +00:00
丁子叉
12207267bd !2481 [core-llm][dskv3] adjust long-run stability scripts, add performance scripts
Merge pull request !2481 from 丁子叉/master
2025-03-30 14:33:41 +00:00
guozhihua
7dd8173313 !2483 [core-llm][dskv3] A3: adjust long-run stability scripts, add performance scripts
Merge pull request !2483 from guozhihua/master
2025-03-30 13:53:34 +00:00
qu_yueze
57c893def2 !2465 deepseekV3 LORA to HF
Merge pull request !2465 from qu_yueze/master
2025-03-28 02:33:11 +00:00
jzh
120067f46a !2439 tune the best-performing full-parameter fine-tuning script, fix bugs in some scripts
Merge pull request !2439 from jzh/master
2025-03-26 08:52:36 +00:00
yuhui
f7ccd71ba8 !2378 deepseek3 weight conversion: support mla-mm-split
Merge pull request !2378 from yuhui/mla_split
2025-03-26 08:47:36 +00:00
guozhihua
278ff1c89d !2440 move the deepseek3 8-node scripts under test/poc/deepseek3
Merge pull request !2440 from guozhihua/master
2025-03-24 03:26:32 +00:00
yuhui
32406fec57 !2404 deepseek3 weight mg-to-hf conversion: fix postprocess logic
Merge pull request !2404 from yuhui/postprocess
2025-03-20 08:58:57 +00:00
yuhui
edc78042e4 !2405 add deepseek3 inference and evaluation scripts
Merge pull request !2405 from yuhui/generate
2025-03-18 11:34:16 +00:00
qu_yueze
c5e9a4eec7 !2412 fix merge lora ckpt of deepseekV3
Merge pull request !2412 from qu_yueze/master
2025-03-18 11:04:43 +00:00
guozhihua
b18e900d2c !2399 modify grok1 and deepseek3_60b_128die scripts
Merge pull request !2399 from guozhihua/master
2025-03-15 04:28:07 +00:00
yuhui
36b7119860 !2392 deepseek3 weight mg-to-hf conversion: add metadata
Merge pull request !2392 from yuhui/mg2hf
2025-03-13 11:43:09 +00:00
yuhui
a061c9f14a !2388 deepseek3 weight conversion: optimize arguments
Merge pull request !2388 from yuhui/ds3_convert
2025-03-12 10:36:16 +00:00
qu_yueze
82b3e00222 !2387 fix bug of mtp when mcore2hf of deepseek3
Merge pull request !2387 from qu_yueze/master
2025-03-12 09:41:08 +00:00
jzh
e6d441a17e !2352 fix some GPUS_PER_NODE to NPUS_PER_NODE
Merge pull request !2352 from jzh/master-v3example
2025-03-08 09:40:40 +00:00
qu_yueze
0202b40ae3 !2368 fix lora_path is None when merge lora_ckpt
Merge pull request !2368 from qu_yueze/master
2025-03-08 02:05:17 +00:00
qu_yueze
e1359cde7a !2366 deepseek3: add merging of lora weights and base weights
Merge pull request !2366 from qu_yueze/master
2025-03-07 12:34:59 +00:00
qu_yueze
e873148726 !2362 deepseek3 weight conversion: lora merge
Merge pull request !2362 from qu_yueze/master
2025-03-07 07:59:00 +00:00
yuhui
5bba420035 !2361 ds3 weight conversion: fix layer index logic in some vpp scenarios
Merge pull request !2361 from yuhui/vpp
2025-03-07 03:53:31 +00:00
yuhui
9d65eaef5d !2354 v3 weight mg-to-hf conversion: fix postprocess weight saving logic
Merge pull request !2354 from yuhui/mg2hf
2025-03-06 09:19:11 +00:00
mhh001
abedff3e6f !2331 update deepseekV3 training shell scripts
Merge pull request !2331 from mhh001/master
2025-03-06 01:12:49 +00:00
breeze623
910a03d225 !2339 deepseek v3: mg2hf for the attention part
Merge pull request !2339 from breeze623/master
2025-03-05 07:13:17 +00:00
yuhui
6951b2a66e !2333 deepseek3 weights: support mg-to-hf conversion
Merge pull request !2333 from yuhui/mg2hf
2025-03-04 12:05:36 +00:00
yuhui
6cc2ce4533 !2328 weight conversion: fix tp_extend_ep logic
Merge pull request !2328 from yuhui/ckpt_tp_extend_ep
2025-03-03 11:24:40 +00:00
yuhui
36bd5742b5 !2319 deepseek3 weight hf-to-mg conversion: support moe_tp_extend_ep
Merge pull request !2319 from yuhui/ckpt_tp_extend_ep
2025-03-03 02:16:07 +00:00