i-robot
|
773328361c
|
!7779 【master】【bugfix】Fix spelling in documentation
Merge pull request !7779 from 森镇/code_docs_fix_spelling
|
2025-12-04 07:40:58 +00:00 |
|
i-robot
|
5cde08f9cc
|
!7787 【master】【mcore】【bugfix】Fix the incorrect path in the YAML file and Fix…
Merge pull request !7787 from zhangyihui/master-bugfix
|
2025-12-03 14:42:12 +00:00 |
|
zhangyihuiben
|
d261447482
|
【master】【mcore】【bugfix】Fix the incorrect path in the YAML file and Fix inconsistencies in formatting, capitalization, and typos in the documentation.
|
2025-12-03 16:59:15 +08:00 |
|
senzhen
|
a044bdbc35
|
Fix spelling in documentation
|
2025-12-03 14:46:56 +08:00 |
|
zxq
|
e2ee4478fb
|
【master】【bug-fix】Fix spelling errors in the documentation
|
2025-12-02 09:59:46 +08:00 |
|
i-robot
|
dc5ae6a17b
|
!7600 fix compute dtype in residual and rope.
Merge pull request !7600 from niujunhao/bugfix/fix_compute_dtype
|
2025-11-11 13:09:13 +00:00 |
|
niujunhao
|
3b7a97f709
|
fix residual fp32 and rope dtype.
|
2025-11-11 15:50:54 +08:00 |
|
Hsshuai
|
63cf1fba97
|
fix precision issue when vocab_emb_dp=True
(cherry picked from commit 10bbf69209)
|
2025-11-10 21:11:19 +08:00 |
|
zhangyihuiben
|
add4cda1e7
|
【master】【mcore】【bugfix】fix moe_aux_loss in qwen3_moe
|
2025-10-29 15:19:16 +08:00 |
|
i-robot
|
8b297995e9
|
!7525 【Docs】Add a link to the training-to-inference documentation in the Qwen3 series README
Merge pull request !7525 from SaiYao/code_docs_qwen3_t2i
|
2025-10-20 02:34:37 +00:00 |
|
SaiYao
|
a8cf6a293d
|
【Docs】Add a link to the training-to-inference documentation in the Qwen3 series README
|
2025-10-17 17:19:57 +08:00 |
|
zhangyihuiben
|
7abfc7e024
|
【master】【mcore】【bugfix】delete recompute_slice_activation
|
2025-10-17 10:48:45 +08:00 |
|
senzhen
|
f135e3b1cd
|
Fix qwen3 training error
|
2025-10-15 16:30:14 +08:00 |
|
i-robot
|
2289d16afb
|
!6996 【code_docs】glm4 documentation
Merge pull request !6996 from nashturing/glm_master_doc
|
2025-09-24 04:03:54 +00:00 |
|
JavaZero
|
6d172177d1
|
set vocab_emb_dp default to False
|
2025-09-19 17:09:51 +08:00 |
|
i-robot
|
80b13bc2c5
|
!7030 Add pretraining and fine-tuning template configurations
Merge pull request !7030 from lan/dev_model_yaml
|
2025-09-10 06:19:59 +00:00 |
|
lanxiang
|
3bf3a0d135
|
Add pretraining and fine-tuning template configurations
|
2025-09-10 11:43:33 +08:00 |
|
JavaZero
|
b7cfd27505
|
update parallel configuration for training scripts in Qwen3 README
|
2025-09-09 16:23:55 +08:00 |
|
i-robot
|
bac1b0c8d8
|
!7250 【docs】【master】Add qwen3 fine-tuning documentation
Merge pull request !7250 from JavaZero/code_docs_add_mcore_qwen3_sft_readme
|
2025-09-09 07:50:04 +00:00 |
|
JavaZero
|
6c8af8f670
|
add qwen3 sft readme
|
2025-09-09 11:14:10 +08:00 |
|
ccsszz
|
d25fa86623
|
mcore deepseek yaml add use_fused_mla switch
|
2025-09-08 19:42:56 +08:00 |
|
pengjingyou
|
8002390917
|
【bugfix】【infer】Router module selects the fused operator and the routing algorithm's activation function based on the configuration
|
2025-09-05 16:08:34 +08:00 |
|
JavaZero
|
770a9bae03
|
support qwen3 32b finetune
|
2025-09-04 11:40:05 +08:00 |
|
zhangyihuiben
|
0d8c4a0334
|
【master】【bugfix】Fix incorrect qwen3-moe configuration parameters
|
2025-09-02 15:11:08 +08:00 |
|
zhangyihuiben
|
6d6ab4b138
|
【master】【feature】qwen3-moe pretraining yaml + README.md
|
2025-09-01 21:33:20 +08:00 |
|
husichao
|
a139117591
|
add new MLP with interleaved weight layout
|
2025-08-22 19:39:20 +08:00 |
|
Xinrui Chen
|
0bba89b148
|
[Docs] Add model readme template
|
2025-08-20 17:37:43 +08:00 |
|
nashturing
|
cb8278bfd6
|
【code_docs】glm4_moe documentation
|
2025-08-18 20:52:11 +08:00 |
|
nashturing
|
1efb128a11
|
【code_docs】glm4 documentation
|
2025-08-14 17:14:34 +08:00 |
|
zxq
|
00d6fcd205
|
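Add Qwen3 model documentation (inference part), including model description, supported specifications, usage examples, and configuration recommendations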
|
2025-07-26 18:22:35 +08:00 |
|
i-robot
|
1ac1d70eae
|
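!6877 Add Qwen3 model documentation, including model description, supported specifications, usage examples, and configuration recommendations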
Merge pull request !6877 from JavaZero/code_docs_add_mcore_qwen3_readme
|
2025-07-25 08:11:39 +00:00 |
|
JavaZero
|
05766c8f0a
|
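Add Qwen3 model documentation, including model description, supported specifications, usage examples, and configuration recommendations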
|
2025-07-25 14:34:35 +08:00 |
|
i-robot
|
3af357374d
|
!6880 modify links
Merge pull request !6880 from 宦晓玲/code_docs_0723
|
2025-07-25 03:29:59 +00:00 |
|
huan
|
83a307e139
|
modify links
|
2025-07-24 09:35:55 +08:00 |
|
i-robot
|
78438775c3
|
!6860 【master】Simplify training configuration
Merge pull request !6860 from lan/dev_yaml_update
|
2025-07-23 10:21:53 +00:00 |
|
lanxiang
|
f5656852e8
|
Simplify training configuration
|
2025-07-22 10:44:24 +08:00 |
|
i-robot
|
d6b44f26e9
|
!6584 【dev】Add qwen3 to mcore
Merge pull request !6584 from JavaZero/mcore_qwen3
|
2025-07-16 08:03:57 +00:00 |
|
JavaZero
|
b18b9a6a4b
|
mcore add qwen3
|
2025-07-16 15:22:38 +08:00 |
|
zxq
|
dc386615df
|
【feature】【dev】Modify Qwen2 model configuration and yaml to adapt to run_mindformer
|
2025-07-14 14:39:13 +08:00 |
|
Yule100
|
352d8377af
|
Adapt deepseek3 to run_mindformer
|
2025-06-27 11:50:34 +08:00 |
|
sunyu-xuan
|
c915432b5a
|
optimize config and fix qwen3_moe name in comment
|
2025-06-26 20:04:21 +08:00 |
|
i-robot
|
19677dc409
|
!6465 【dev】Remove bert and the T5 minimal set
Merge pull request !6465 from yiyison/bert_off
|
2025-06-26 01:47:23 +00:00 |
|
i-robot
|
4371c76746
|
!6598 bugfix hf-tokenizer
Merge pull request !6598 from Yule100/bugfix_hf_tokenizer
|
2025-06-26 01:21:35 +00:00 |
|
Yule100
|
3a8a0ac796
|
bugfix hf-tokenizer
|
2025-06-25 19:59:36 +08:00 |
|
yiyison
|
e8b9bcc58d
|
Remove bert and t5 model code
|
2025-06-25 17:47:21 +08:00 |
|
i-robot
|
1b03cc9e38
|
!6371 【dev】【removal】Remove outdated APIs
Merge pull request !6371 from 魏琢艺/delete_api
|
2025-06-25 06:53:54 +00:00 |
|
i-robot
|
6fab43fbe5
|
!6614 [Models] Delete Llama2
Merge pull request !6614 from Xinrui Chen/dev-del-llama
|
2025-06-25 06:53:29 +00:00 |
|
Xinrui Chen
|
80b61f5668
|
[Models] Delete Llama2
|
2025-06-24 21:27:49 +08:00 |
|
qinsichun
|
74ee12de76
|
fix_generation_load
|
2025-06-24 18:50:17 +08:00 |
|
魏琢艺
|
7d75d5692f
|
delete api
|
2025-06-24 17:24:56 +08:00 |
|