jzh
|
56b3cda738
|
!3127 [pytorch][bugfix] update icsl for weights_only
Merge pull request !3127 from jzh/210_aicsl
|
2025-08-11 15:58:41 +00:00 |
|
guozhihua
|
d13b6003f5
|
!3035 [pytorch][sh]change core commitid of deepseekv3
Merge pull request !3035 from guozhihua/2.1.0
|
2025-08-09 03:19:13 +00:00 |
|
jzh
|
6785f325f3
|
!3113 [pytorch][bugfix]fix some bug for icsl
Merge pull request !3113 from jzh/210_uicsl
|
2025-08-08 07:13:20 +00:00 |
|
qu_yueze
|
acc11f130c
|
!2966 [pytorch][bugfix]fix vpp of V3_ckpt
Merge pull request !2966 from qu_yueze/2.1.0
|
2025-07-04 08:43:57 +00:00 |
|
shengjy
|
c6c3f3a39c
|
!2826 [pytorch][sh]update mindspeed commit id for deepseek3
Merge pull request !2826 from shengjy/master
|
2025-06-17 06:37:21 +00:00 |
|
shengjy
|
f38bb1fd30
|
!2667 [pytorch][feature]dualpipe in sft
Merge pull request !2667 from shengjy/sft_dp
|
2025-06-16 08:12:31 +00:00 |
|
qu_yueze
|
64ae8b737b
|
!2803 fix readme of V3
Merge pull request !2803 from qu_yueze/master
|
2025-06-10 11:11:53 +00:00 |
|
qu_yueze
|
d4a80afd20
|
!2787 fix bug of ckpt in V3
Merge pull request !2787 from qu_yueze/master
|
2025-06-09 02:29:47 +00:00 |
|
qu_yueze
|
379f824dad
|
!2735 add lora-sft and fix scripts of qwen3
Merge pull request !2735 from qu_yueze/master
|
2025-05-29 10:53:12 +00:00 |
|
yanzhixiao
|
b2868282b0
|
!2722 fixing torch2.6, DeepSeek-V3 QLoRA loading checkpoint error
Merge pull request !2722 from yanzhixiao/bugfix-ckpt-load
|
2025-05-29 10:51:18 +00:00 |
|
qu_yueze
|
044f1b9dcc
|
!2728 fix readme of ckpt of V3 and add some restrict
Merge pull request !2728 from qu_yueze/master
|
2025-05-29 03:02:57 +00:00 |
|
shengjy
|
9930a5057a
|
!2710 Compatibility fixes
Merge pull request !2710 from shengjy/bf523
|
2025-05-27 09:58:38 +00:00 |
|
CY-Slightwind
|
5b7c6a030d
|
!2674 [feat] QLoRA GMM Support
Merge pull request !2674 from CY-Slightwind/qlora
|
2025-05-22 12:30:51 +00:00 |
|
shengjy
|
89a231767e
|
!2679 update deepseek3 scripts
Merge pull request !2679 from shengjy/master
|
2025-05-20 02:04:10 +00:00 |
|
丁子叉
|
b1ca6aaeae
|
!2678 [dskv3]fix mtp_loss in sft and fix mg2hf when having multi mtp
Merge pull request !2678 from 丁子叉/master
|
2025-05-19 12:36:48 +00:00 |
|
jzh
|
0ceafd12c5
|
!2673 docs readme modify
Merge pull request !2673 from jzh/docs_0517
|
2025-05-19 04:04:38 +00:00 |
|
shengjy
|
1457cd83e8
|
!2672 Update poc deepseek3 A3 scripts
Merge pull request !2672 from shengjy/master
|
2025-05-16 09:35:02 +00:00 |
|
mhh001
|
6dce820728
|
!2669 DeepSeekV3 shell scripts update
Merge pull request !2669 from mhh001/v3shell_
|
2025-05-15 06:50:22 +00:00 |
|
mhh001
|
ac088f975c
|
!2599 Adapt to core0.8.0 version
Merge pull request !2599 from mhh001/master
|
2025-05-09 08:51:18 +00:00 |
|
qu_yueze
|
922534a19d
|
!2595 [fix] fix readme of 0day and some .sh
Merge pull request !2595 from qu_yueze/master
|
2025-04-27 11:37:54 +00:00 |
|
yuhui
|
0a46676b5e
|
!2496 deepseek3 weight conversion: adapt to dualpipe
Merge pull request !2496 from yuhui/dualpipe
|
2025-04-23 01:22:08 +00:00 |
|
shengjy
|
fae98b77c5
|
!2470 [core-llm][dskv3]mtp loss scaler and fix expert bias dtype
Merge pull request !2470 from shengjy/mtp_loss_scaler
|
2025-04-16 06:06:45 +00:00 |
|
qu_yueze
|
e2b3d4e39d
|
!2536 fix readme of ckpt of deepseekV3
Merge pull request !2536 from qu_yueze/master
|
2025-04-16 03:23:55 +00:00 |
|
shenjiarun
|
fa3ad62105
|
!2507 Rename all seq length naming K to k
Merge pull request !2507 from shenjiarun/master
|
2025-04-10 09:11:18 +00:00 |
|
qu_yueze
|
9679b65830
|
!2485 Improve the weight conversion readme
Merge pull request !2485 from qu_yueze/master
|
2025-04-08 06:08:59 +00:00 |
|
jzh
|
6480c709df
|
!2495 Fix missing ckpt_load and ckpt_save in some scripts
Merge pull request !2495 from jzh/master_dskv3sh
|
2025-04-01 12:39:52 +00:00 |
|
丁子叉
|
12207267bd
|
!2481 [core-llm][dskv3] Adjust long-run stability scripts and add performance scripts
Merge pull request !2481 from 丁子叉/master
|
2025-03-30 14:33:41 +00:00 |
|
guozhihua
|
7dd8173313
|
!2483 [core-llm][dskv3] A3: adjust long-run stability scripts and add performance scripts
Merge pull request !2483 from guozhihua/master
|
2025-03-30 13:53:34 +00:00 |
|
qu_yueze
|
57c893def2
|
!2465 deepseekV3 LORA to HF
Merge pull request !2465 from qu_yueze/master
|
2025-03-28 02:33:11 +00:00 |
|
jzh
|
120067f46a
|
!2439 Tune the best-performing full-parameter fine-tuning script and fix bugs in some scripts
Merge pull request !2439 from jzh/master
|
2025-03-26 08:52:36 +00:00 |
|
yuhui
|
f7ccd71ba8
|
!2378 deepseek3 weight conversion: support mla-mm-split
Merge pull request !2378 from yuhui/mla_split
|
2025-03-26 08:47:36 +00:00 |
|
guozhihua
|
278ff1c89d
|
!2440 Move the deepseek3 8-node scripts under test/poc/deepseek3
Merge pull request !2440 from guozhihua/master
|
2025-03-24 03:26:32 +00:00 |
|
yuhui
|
32406fec57
|
!2404 deepseek3 weights mg-to-hf: fix postprocess logic
Merge pull request !2404 from yuhui/postprocess
|
2025-03-20 08:58:57 +00:00 |
|
yuhui
|
edc78042e4
|
!2405 Add deepseek3 inference and evaluation scripts
Merge pull request !2405 from yuhui/generate
|
2025-03-18 11:34:16 +00:00 |
|
qu_yueze
|
c5e9a4eec7
|
!2412 fix merge lora ckpt of deepseekV3
Merge pull request !2412 from qu_yueze/master
|
2025-03-18 11:04:43 +00:00 |
|
guozhihua
|
b18e900d2c
|
!2399 Modify the grok1 and deepseek3_60b_128die scripts
Merge pull request !2399 from guozhihua/master
|
2025-03-15 04:28:07 +00:00 |
|
yuhui
|
36b7119860
|
!2392 deepseek3 weights mg-to-hf: add metadata
Merge pull request !2392 from yuhui/mg2hf
|
2025-03-13 11:43:09 +00:00 |
|
yuhui
|
a061c9f14a
|
!2388 deepseek3 weight conversion: parameter optimization
Merge pull request !2388 from yuhui/ds3_convert
|
2025-03-12 10:36:16 +00:00 |
|
qu_yueze
|
82b3e00222
|
!2387 fix bug of mtp when mcore2hf of deepseek3
Merge pull request !2387 from qu_yueze/master
|
2025-03-12 09:41:08 +00:00 |
|
jzh
|
e6d441a17e
|
!2352 fix some GPUS_PER_NODE to NPUS_PER_NODE
Merge pull request !2352 from jzh/master-v3example
|
2025-03-08 09:40:40 +00:00 |
|
qu_yueze
|
0202b40ae3
|
!2368 fix lora_path is None when merge lora_ckpt
Merge pull request !2368 from qu_yueze/master
|
2025-03-08 02:05:17 +00:00 |
|
qu_yueze
|
e1359cde7a
|
!2366 deepseek3: add merging of lora weights and base weights
Merge pull request !2366 from qu_yueze/master
|
2025-03-07 12:34:59 +00:00 |
|
qu_yueze
|
e873148726
|
!2362 deepseek3 weight conversion: lora merge
Merge pull request !2362 from qu_yueze/master
|
2025-03-07 07:59:00 +00:00 |
|
yuhui
|
5bba420035
|
!2361 ds3 weight conversion: fix layer index logic in some vpp scenarios
Merge pull request !2361 from yuhui/vpp
|
2025-03-07 03:53:31 +00:00 |
|
yuhui
|
9d65eaef5d
|
!2354 v3 weights mg-to-hf: fix postprocess weight saving logic
Merge pull request !2354 from yuhui/mg2hf
|
2025-03-06 09:19:11 +00:00 |
|
mhh001
|
abedff3e6f
|
!2331 Update deepseekV3 training shell scripts
Merge pull request !2331 from mhh001/master
|
2025-03-06 01:12:49 +00:00 |
|
breeze623
|
910a03d225
|
!2339 deepseek v3: mg2hf for the attention part
Merge pull request !2339 from breeze623/master
|
2025-03-05 07:13:17 +00:00 |
|
yuhui
|
6951b2a66e
|
!2333 deepseek3 weights: support mg-to-hf conversion
Merge pull request !2333 from yuhui/mg2hf
|
2025-03-04 12:05:36 +00:00 |
|
yuhui
|
6cc2ce4533
|
!2328 Weight conversion: fix tp_extend_ep logic
Merge pull request !2328 from yuhui/ckpt_tp_extend_ep
|
2025-03-03 11:24:40 +00:00 |
|
yuhui
|
36bd5742b5
|
!2319 deepseek3 weights hf-to-mg: support moe_tp_extend_ep
Merge pull request !2319 from yuhui/ckpt_tp_extend_ep
|
2025-03-03 02:16:07 +00:00 |
|