This is a mirrored post. Source: BYR Forum (北邮人论坛) / job-info / #975583. Synced on 2025/8/14.
Posted by the JobInfo bot

[Campus Recruiting] [Referral] [Zhipu] Reinforcement Learning Training Framework Engineer - slime

zlccc
2025/8/14 · Mirror synced · 0 replies
Location: Beijing
Recruitment type: Campus recruiting
Resume submission: email shuangshuang.wei@zhipuai.cn

Job description
1. Develop, optimize, and maintain the reinforcement learning training framework; keep improving the framework and training strategies in response to business needs to raise model training efficiency.
2. Analyze and locate performance bottlenecks in training, and apply targeted optimizations to improve training efficiency and stability.
3. Follow technical progress in the industry, and continuously track and integrate the latest training optimization strategies.

Requirements
1. Graduating in 2026, with a master's degree or above in computer science or a related major, and a research focus related to HPC and MLSys.
2. A deep understanding of natural language processing, computer vision, and multimodal algorithms; familiarity with mainstream LLM architectures; experience with distributed training.
3. A basic understanding of common RL training algorithms.
4. Bonus: familiarity with common open-source inference frameworks such as vllm or sglang.

More information: overview of the team's work

GLM-4.5: Reasoning, Coding, and Agentic Abilities
https://z.ai/blog/glm-4.5
GLM-4.5 is built with 355 billion total parameters and 32 billion active parameters, and GLM-4.5-Air with 106 billion total parameters and 12 billion active parameters. Both are designed to unify reasoning, coding, and agentic capabilities in a single model, in order to meet the increasingly complex requirements of fast-growing agentic applications.

slime: An SGLang-Native Post-Training Framework for RL Scaling
https://lmsys.org/blog/2025-07-09-slime/
We believe in RL. We believe RL is the final piece toward AGI. If you feel the same way, you'll share our vision:
- Every field should be end-to-end RLed and every task should become an agent environment.
- Every RL run should last longer, and every model should scale larger.
- RL systems should integrate seamlessly with existing infrastructure, letting us focus on new ideas instead of boilerplate engineering.

That's why we present slime, a post-training framework designed to be:
- Versatile: with a fully customizable rollout interface and flexible training setups (colocated or decoupled, synchronous or asynchronous, RL or SFT cold start).
- Performant: integrating SGLang for inference and Megatron-LM for training, natively.
- Maintainable: with a lightweight codebase and smooth transition from Megatron pretraining to SGLang deployment.

In short, a post-training framework for RL scaling.

The journey of RL scaling has just begun, and slime is continuously evolving. In the next phase, we will focus on:
1. Collaborating with the SGLang team to explore optimal RL training strategies for large-scale MoE models.
2. Supporting broader post-training workflows, strengthening the pre-training-to-production bridge.