In Transformer architectures, the _______ mechanism allows the model to focus on different parts of the input with different weights.

  • Self-Attention
  • Batch Normalization
  • Recurrent Layer
  • Convolutional Layer
The correct answer is Self-Attention. In Transformer architectures, self-attention lets the model weigh each input element according to its relevance to the others, so it can focus on the most informative parts of the sequence for a given context.
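To make the weighting step concrete, below is a minimal sketch of single-head scaled dot-product self-attention in plain NumPy. The function names, toy dimensions, and random projection matrices are illustrative assumptions, not any particular library's API.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_k) projection matrices."""
    q = x @ w_q                      # queries
    k = x @ w_k                      # keys
    v = x @ w_v                      # values
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)  # relevance of every position to every other
    weights = softmax(scores)        # attention weights; each row sums to 1
    return weights @ v               # output: weighted sum of value vectors

# Toy usage: 4 tokens, model width 8 (hypothetical sizes)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (4, 8)

Each output row is a mixture of all value vectors, with the mixing proportions given by the attention weights, which is exactly the "focus differently on different parts of the input" described above.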