Encoder-decoder, Q from decoder, K/V from encoder
Difficulty: Medium · Topic: Attention

Implement multi-head cross-attention (encoder-decoder attention).
• Q comes from the decoder, K and V come from the encoder
• No causal mask (all encoder positions visible)
Implement the function using only basic PyTorch operations, and debug it locally before submitting.
Try solving it yourself before looking at a solution.
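A minimal reference sketch of multi-head cross-attention, using only basic PyTorch operations. The function name, signature, and the choice to pass raw projection weight matrices are assumptions for illustration; the grader's expected signature may differ.

```python
import math
import torch

def cross_attention(dec, enc, w_q, w_k, w_v, w_o, num_heads):
    """Multi-head cross-attention: Q from the decoder, K/V from the encoder.

    Shapes (an assumed convention):
      dec: (batch, tgt_len, d_model)  decoder hidden states
      enc: (batch, src_len, d_model)  encoder hidden states
      w_q, w_k, w_v, w_o: (d_model, d_model) projection weights
    """
    b, tgt_len, d_model = dec.shape
    src_len = enc.shape[1]
    d_head = d_model // num_heads

    # Project, then split into heads: (batch, heads, len, d_head)
    q = (dec @ w_q).view(b, tgt_len, num_heads, d_head).transpose(1, 2)
    k = (enc @ w_k).view(b, src_len, num_heads, d_head).transpose(1, 2)
    v = (enc @ w_v).view(b, src_len, num_heads, d_head).transpose(1, 2)

    # Scaled dot-product attention; no causal mask, since every
    # encoder position is visible to every decoder position
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_head)
    attn = torch.softmax(scores, dim=-1)  # (b, heads, tgt_len, src_len)
    out = attn @ v                        # (b, heads, tgt_len, d_head)

    # Merge heads back together and apply the output projection
    out = out.transpose(1, 2).contiguous().view(b, tgt_len, d_model)
    return out @ w_o
```

Quick shape check: with `dec` of shape `(2, 5, 16)`, `enc` of shape `(2, 7, 16)`, and `num_heads=4`, the output has shape `(2, 5, 16)` — one vector per decoder position, attending over all 7 encoder positions.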
For interactive practice with auto-grading, run TorchCode locally: `pip install torch-judge`, then use `check("cross_attention")`.