Have you ever wondered whether a "helpful" AI's goodwill might itself be its most dangerous vulnerability? In this episode, starting from several recent AI papers, we explore AI's "inner world": how it makes training more efficient by anticipating the future, how it forms internal "expert circles," how it falls into the "losing weight without losing fat" memory trap, and finally, what the mysterious "treasure map" charting its thought paths reveals. Ready? Let's open the AI black box together.
00:00:30 Why does it matter less than you think whether the answer is right or wrong?
00:05:59 Your AI is a "picky eater": the hidden pattern that speeds up large models
00:11:46 A slimming guide for large AI models: losing weight ≠ losing fat
00:17:49 Why is a "helpful" AI actually more dangerous?
00:22:34 AI's "treasure map": how can we read a machine's "inner world"?
Papers covered in this episode:
[LG] Reward Models Are Secretly Value Functions: Temporally Coherent Reward Modeling
[AI at Meta]
https://arxiv.org/abs/2604.22981
---
[LG] Scaling Multi-Node Mixture-of-Experts Inference Using Expert Activation Patterns
[Meta & Georgia Institute of Technology]
https://arxiv.org/abs/2604.23150
---
[LG] Parameter Efficiency Is Not Memory Efficiency: Rethinking Fine-Tuning for On-Device LLM Adaptation
[MIT CSAIL]
https://arxiv.org/abs/2604.22783
---
[CL] Jailbreaking Frontier Foundation Models Through Intention Deception
[CMU]
https://arxiv.org/abs/2604.24082
---
[AI] Domain-Filtered Knowledge Graphs from Sparse Autoencoder Features
[Stanford University]
https://arxiv.org/abs/2604.23829