BYR Achieve · Mirror Forum

Location: Beijing
Recruitment type: Campus recruitment
Resume submission: email shuangshuang.wei@zhipuai.cn

Job description
1. Develop, optimize, and maintain the reinforcement learning training framework; keep improving the framework and training strategies according to business needs to raise model training efficiency.
2. Analyze and locate performance bottlenecks in training, and apply targeted optimizations to improve training efficiency and stability.
3. Follow technical progress in the industry, and continuously track and integrate the latest training optimization strategies.

Job requirements
1. Graduating in 2026, with a master's degree or above in a computer-science-related major and a research area related to HPC & MLSys.
2. Deep understanding of natural language processing, computer vision, and multimodal algorithms; familiar with mainstream LLM architectures; experience with distributed training.
3. Basic understanding of common RL training algorithms.
4. Bonus: familiarity with commonly used open-source inference frameworks such as vllm or sglang.

More information: about the team's work

GLM-4.5: Reasoning, Coding, and Agentic Abilities
https://z.ai/blog/glm-4.5
GLM-4.5 is built with 355 billion total parameters and 32 billion active parameters, and GLM-4.5-Air with 106 billion total parameters and 12 billion active parameters. Both are designed to unify reasoning, coding, and agentic capabilities in a single model, in order to meet the increasingly complex requirements of fast-growing agentic applications.

slime: An SGLang-Native Post-Training Framework for RL Scaling
https://lmsys.org/blog/2025-07-09-slime/
We believe in RL. We believe RL is the final piece toward AGI. If you feel the same way, you'll share our vision:
- Every field should be end-to-end RLed and every task should become an agent environment.
- Every RL run should last longer, and every model should scale larger.
- RL systems should integrate seamlessly with existing infrastructure, letting us focus on new ideas instead of boilerplate engineering.

That's why we present slime, a post-training framework designed to be:
- Versatile - with a fully customizable rollout interface and flexible training setups (colocated or decoupled, synchronous or asynchronous, RL or SFT cold start).
- Performant - integrating SGLang for inference and Megatron-LM for training, natively.
- Maintainable - with a lightweight codebase and smooth transition from Megatron pretraining to SGLang deployment.

In short, a post-training framework for RL scaling.

The journey of RL scaling has just begun, and slime is continuously evolving. In the next phase, we will focus on:
1. Collaborating with the SGLang team to explore optimal RL training strategies for large-scale MoE models.
2. Supporting broader post-training workflows, strengthening the pre-training-to-production bridge.

[Campus Recruitment] [Referral] [Zhipu] Reinforcement Learning Training Framework Engineer - slime