2024-08-16-insights

Posted on 2024-08-18, updated on 2024-08-22, category: Arxiv-Insights

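It has been a while since my last update. My work arrangement changed somewhat, and I am resuming updates starting today. Hopefully I have not missed any key AGI papers over the past few weeks...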

DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

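DeepSeek's latest math model. Unlike the earlier DeepSeekMath, this one focuses on writing mathematical proofs in Lean 4. There has been some related prior work, but this area never seems to have been as popular as solving math word problems in natural language. I think this work is quite good: SFT data plus Monte-Carlo tree search at inference time.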
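
For readers who have not seen Lean 4, here is a rough idea of what "doing proofs in Lean 4" looks like. These are toy examples I wrote for illustration, not taken from the paper or its benchmarks. The point is that the proof assistant checks every step, which is what gives a model a verifiable signal, unlike a free-form natural-language solution.

```lean
-- Toy example (my own, not from the paper): the informal claim
-- "adding two natural numbers gives the same result in either order".
theorem toy_add_comm (a b : Nat) : a + b = b + a := by
  -- Lean checks this step; a wrong or incomplete proof is rejected with an error.
  exact Nat.add_comm a b

-- A concrete arithmetic fact, closed by evaluating both sides.
example : 2 + 3 = 5 := by
  rfl
```

A prover model is given the statement (everything up to `:= by`) and has to produce the tactic steps that follow. Whether Lean accepts or rejects them is the feedback the paper's title refers to.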