Location: Beijing
Recruitment type: Campus recruitment
Resume submission: email shuangshuang.wei@zhipuai.cn
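Job description
1. Develop, optimize, and maintain the reinforcement learning training framework; continuously improve the framework and its training strategies according to business needs to raise model training efficiency
2. Analyze and locate performance bottlenecks in training, and apply targeted optimizations to improve training efficiency and stability
3. Follow technical progress in the industry, and keep integrating the latest training optimization strategies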
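Job requirements
1. Class of 2026 graduate with a master's degree or above in computer science or a related major; research area related to HPC & MLSys
2. Deep understanding of natural language processing, computer vision, and multimodal algorithms; familiarity with mainstream LLM architectures; experience with distributed training
3. Basic understanding of common RL training algorithms
4. Bonus: familiarity with common open-source inference frameworks such as vllm or sglang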
More information: an introduction to the team's work
GLM-4.5: Reasoning, Coding, and Agentic Abilities
https://z.ai/blog/glm-4.5
GLM-4.5 is built with 355 billion total parameters and 32 billion active parameters, and GLM-4.5-Air with 106 billion total parameters and 12 billion active parameters. Both are designed to unify reasoning, coding, and agentic capabilities in a single model, to meet the increasingly complex requirements of fast-growing agentic applications.
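To make the total-versus-active parameter figures above concrete, here is a minimal Python sketch that computes what share of each model's parameters is active. The parameter counts come from the paragraph above; the function name `active_fraction` and the `models` table are only illustrative.

```python
def active_fraction(total_b: float, active_b: float) -> float:
    """Return the share of parameters that are active (both counts in billions)."""
    return active_b / total_b


# (total, active) parameter counts in billions, as quoted in the post.
models = {
    "GLM-4.5": (355, 32),
    "GLM-4.5-Air": (106, 12),
}

for name, (total, active) in models.items():
    print(f"{name}: {active_fraction(total, active):.1%} of parameters active")
```

With these numbers, roughly 9.0% of GLM-4.5's parameters and 11.3% of GLM-4.5-Air's are active.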
slime: An SGLang-Native Post-Training Framework for RL Scaling
https://lmsys.org/blog/2025-07-09-slime/
We believe in RL. We believe RL is the final piece toward AGI.
If you feel the same way, you'll share our vision:
- Every field should be end-to-end RLed and every task should become an agent environment.
- Every RL run should last longer, and every model should scale larger.
- RL systems should integrate seamlessly with existing infrastructure, letting us focus on new ideas instead of boilerplate engineering.
That's why we present slime, a post-training framework designed to be:
- Versatile: with a fully customizable rollout interface and flexible training setups (colocated or decoupled, synchronous or asynchronous, RL or SFT cold start).
- Performant: integrating SGLang for inference and Megatron-LM for training, natively.
- Maintainable: with a lightweight codebase and smooth transition from Megatron pretraining to SGLang deployment.
In short, a post-training framework for RL scaling.
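The "synchronous or asynchronous" distinction mentioned above can be illustrated with a toy simulation. This is a sketch under stated assumptions, not slime's actual API: `toy_rollout`, `run_synchronous`, and `run_one_step_async` are hypothetical names, and "training" is reduced to incrementing a policy version counter. It only shows that overlapping rollout with training means a batch may come from slightly older weights.

```python
from typing import Callable, List, Tuple

Sample = Tuple[int, str]  # (policy version that generated it, text)
RolloutFn = Callable[[int, int], List[Sample]]


def toy_rollout(policy_version: int, batch_size: int) -> List[Sample]:
    """Hypothetical rollout: tag each sample with the generating policy version."""
    return [(policy_version, f"sample-{i}") for i in range(batch_size)]


def run_synchronous(rollout: RolloutFn, steps: int, batch_size: int) -> List[int]:
    """Generate, then train, strictly in turn; data is always on-policy."""
    version, staleness = 0, []
    for _ in range(steps):
        batch = rollout(version, batch_size)
        staleness.append(max(version - v for v, _ in batch))
        version += 1  # stand-in for one training step
    return staleness


def run_one_step_async(rollout: RolloutFn, steps: int, batch_size: int) -> List[int]:
    """Prefetch the next batch while 'training'; data may lag by one version."""
    version, staleness = 0, []
    pending = rollout(version, batch_size)  # first batch, generated up front
    for _ in range(steps):
        batch = pending
        # The next batch is generated with the weights available before this step ends.
        pending = rollout(version, batch_size)
        staleness.append(max(version - v for v, _ in batch))
        version += 1  # stand-in for one training step
    return staleness


print(run_synchronous(toy_rollout, 3, 2))
print(run_one_step_async(toy_rollout, 3, 2))
```

In this toy model the synchronous loop always trains on data from the current policy, while the overlapped loop trains on data that is at most one version old after the first step. That lag is the usual cost of overlapping generation with training.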
The journey of RL scaling has just begun, and slime is continuously evolving. In the next phase, we will focus on:
1. Collaborating with the SGLang team to explore optimal RL training strategies for large-scale MoE models.
2. Supporting broader post-training workflows, strengthening the pre-training-to-production bridge.
This is a mirrored post. Source: BYR Forum (北邮人论坛) / job-info / #975583, synced on 2025/8/14
Posted by the JobInfo bot
[Campus Recruitment] [Referral] [Zhipu] RL Training Framework Engineer - slime
zlccc