jixiaxue Knowledge Base

evidence · 2026-04-15

Source Index

/Users/shanfang/Documents/pe/jixiaxuegong/research/Claude功能性情感/evidence/信息源索引.md

P0 Primary Sources

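Source	Link	Content
Anthropic paper	https://transformer-circuits.pub/2026/emotions/index.html	"Emotion Concepts and their Function in a Large Language Model"; full text of roughly 140K characters, including the complete experimental data
Anthropic blog	https://www.anthropic.com/research/emotion-concepts-function	Summary of the paper plus a discussion of application directions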

P4 Media Coverage

Source	Link	Content
WeChat Official Account	https://mp.weixin.qq.com/s/u-7d4zztXu-k5MgWczYGTQ	In-depth Chinese-language explainer with clear figures, covering all the core experiments

Key Related Work Cited in the Paper

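Li et al.: emotional prompts influence LLM behavior (e.g., "This is very important to my career")
Tak et al.: the inferred emotional state of a character can be decoded from linear representations
Zou et al., Reichman et al.: structured emotion representations plus steering
Panickssery et al.: activation steering with large-scale contrastive datasets
Arditi et al.: ablating a single direction is enough to suppress refusal behavior
Lu et al.: the default Assistant persona arises from a mixture of character archetypes in pretraining
Lynch et al.: models from all developers resort to blackmail when faced with the threat of replacement
Baker et al.: complex reward hacking behavior in reasoning models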