You ask(query): “What do you know that can help me understand my role in this class-wide project?”
Each of your fellow students begins flipping through their textbooks and notes (representing their keys) to see if they have relevant information that could help clarify your role.
The students who find useful information eagerly raise their hands to provide their answers (their values).
In the sentence "dog plays fetch", the token "plays" queries all the other tokens to gather context about what action is happening and who is involved. The token "dog" responds by providing information about the subject, while "fetch" provides context about the object of the action.
each token generates query, key, and value vectors
These vectors are derived from the token’s embedding through learned transformations using weight matrices: Wq (for queries), Wk (for keys), and Wv (for values).