2025-09-24-insights

Posted 2025-09-25, updated 2025-10-06, in Arxiv-Insights

MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipes

The latest iteration of our lab's MiniCPM-V line of work. This series focuses on image/video perception, and this release sets a new SOTA at the 8B scale.

Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation

This paper accelerates inference for BAGEL, and the algorithm is even lossless.

MobileRL: Online Agentic Reinforcement Learning for Mobile GUI Agents

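Zhipu's mobile RL work. ComputerRL came out a few days ago and MobileRL came out today; both titles are pretty blunt. On the RL algorithm side, the main idea is to prefer shorter trajectories, and online RL is run on top of that preference.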
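To make "prefer shorter trajectories" concrete, here is a minimal sketch of a length-aware reward. This is only an illustration, not the paper's actual formulation: the function name `shaped_reward`, the linear discount, and the `penalty` parameter are all assumptions made up for this note.

```python
def shaped_reward(success: bool, num_steps: int, max_steps: int,
                  penalty: float = 0.5) -> float:
    """Hypothetical length-aware reward (not taken from the paper).

    A failed trajectory gets 0. A successful one gets 1 minus a linear
    discount proportional to the fraction of the step budget it used.
    """
    if max_steps <= 0:
        raise ValueError("max_steps must be positive")
    if not success:
        return 0.0
    used = min(num_steps, max_steps) / max_steps
    return 1.0 - penalty * used


# Two successful rollouts of the same task: the shorter one scores higher.
short_traj = shaped_reward(success=True, num_steps=5, max_steps=20)
long_traj = shaped_reward(success=True, num_steps=20, max_steps=20)
failed_traj = shaped_reward(success=False, num_steps=3, max_steps=20)
```

With a reward of this shape, an online RL update would push the policy toward rollouts that finish the task in fewer steps, while a failure still ranks below any success as long as `penalty` stays under 1.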

Soft Tokens, Hard Truths

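A Meta paper on latent CoT. The authors combine hard, soft, and fuzzy modes, and for the first time train a latent CoT model directly, without distillation. They also find that on downstream tasks latent CoT achieves a higher best-of-N (BoN) score, which indicates higher diversity.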
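Why a higher BoN points to higher diversity can be seen from a toy calculation. The sketch below uses made-up correctness tables (not results from the paper) and two hypothetical helpers, `mean_accuracy` and `best_of_n`: both toy models have the same per-sample accuracy, but only the one whose samples differ across attempts gains from drawing more samples.

```python
def mean_accuracy(correct: list[list[bool]]) -> float:
    """Fraction of all samples that are correct (single-sample accuracy)."""
    total = sum(len(samples) for samples in correct)
    return sum(sum(samples) for samples in correct) / total


def best_of_n(correct: list[list[bool]], n: int) -> float:
    """Fraction of problems where at least one of the first n samples is correct."""
    return sum(any(samples[:n]) for samples in correct) / len(correct)


# Toy data: 4 problems x 4 samples each; True means the sample was correct.
# Low diversity: every sample for a problem behaves the same way.
low_diversity = [
    [True, True, True, True],
    [True, True, True, True],
    [False, False, False, False],
    [False, False, False, False],
]
# High diversity: same overall accuracy, but successes are spread over problems.
high_diversity = [
    [True, False, True, False],
    [False, True, False, True],
    [False, False, True, True],
    [True, True, False, False],
]

low_acc, high_acc = mean_accuracy(low_diversity), mean_accuracy(high_diversity)
low_bo4, high_bo4 = best_of_n(low_diversity, 4), best_of_n(high_diversity, 4)
```

Both tables have a mean accuracy of 0.5, yet best-of-4 is 0.5 for the low-diversity table and 1.0 for the high-diversity one, so a BoN gap at equal single-sample accuracy is a sign of more varied samples.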