
LLM: How to understand query-key-value in the attention mechanism of the transformer architecture

Bounty: 30 beans [Open question]

I have recently been studying how large language models work. How should I understand Query, Key, and Value in the attention mechanism?

dudu | Level 7 | Beans: 30357
Asked: 2024-11-23 17:56
All answers (1)
dudu | Beans: 30357 (Level 7) | 2024-11-24 07:44

Imagine a classroom. You ask (your query): "What do you know that can help me understand my role in this class-wide project?"


Each of your fellow students begins flipping through their textbooks and notes (representing their keys) to see if they have relevant information that could help clarify your role.


The students who find useful information eagerly raise their hands to provide their answers (their values).


In the sentence "dog plays fetch", the token "plays" queries all the other tokens to gather context about what action is happening and who is involved. The token "dog" responds by providing information about the subject, while "fetch" provides context about the object of the action.


In self-attention, each token generates query, key, and value vectors.


These vectors are derived from the token’s embedding through learned transformations using weight matrices: Wq (for queries), Wk (for keys), and Wv (for values).
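The mechanics described above can be sketched in a few lines of NumPy. This is a minimal illustration with made-up dimensions and random weights, not any specific model's implementation; in a real transformer, Wq, Wk, and Wv are learned during training and attention runs over multiple heads:

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_k = 8, 4                    # embedding size and Q/K/V size (illustrative)
n_tokens = 3                           # e.g. "dog", "plays", "fetch"

X = rng.normal(size=(n_tokens, d_model))   # token embeddings
Wq = rng.normal(size=(d_model, d_k))       # learned projection matrices
Wk = rng.normal(size=(d_model, d_k))
Wv = rng.normal(size=(d_model, d_k))

# Each token's embedding is projected into its query, key, and value vectors.
Q, K, V = X @ Wq, X @ Wk, X @ Wv

# Scaled dot-product attention: every query scores every key,
# softmax turns the scores into weights, and the weights mix the values.
scores = Q @ K.T / np.sqrt(d_k)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

output = weights @ V                   # context-aware representation per token

print(weights.shape)  # (3, 3): how much each token attends to every other token
print(output.shape)   # (3, 4)
```

Each row of `weights` sums to 1, so the row for "plays" can be read as how much it draws on "dog" versus "fetch" when building its contextualized representation.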
