Minimax Rates for Learning Pairwise Interactions in Attention-Style Models

We study the convergence rate of learning pairwise interactions in single-layer attention-style models, in which tokens interact through a weight matrix and a nonlinear activation function. We prove that the minimax rate is $M^{-\frac{2\beta}{2\beta+1}}$, where $M$ is the sample size and $\beta$ is the Hölder smoothness of the activation function. Importantly, this rate is independent of the embedding dimension $d$, the number of tokens $N$, and the rank $r$ of the weight matrix, provided that $rd \le (M/\log M)^{\frac{1}{2\beta+1}}$. These results highlight a fundamental statistical efficiency of attention-style models, even when the weight matrix and the activation are not separately identifiable. They also offer theoretical insight into attention mechanisms and guidance for training.
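To make the rate and its validity condition concrete, here is a small numerical sketch. It evaluates only the order of the rate and ignores all constants. It also assumes that $\log$ denotes the natural logarithm. The helper names (`minimax_rate`, `max_rank_times_dim`, `rate_applies`) are hypothetical illustrations and are not taken from the paper.

```python
import math


def minimax_rate(M: float, beta: float) -> float:
    """Order of the minimax rate M^(-2*beta/(2*beta+1)); constants are ignored.

    Only M and beta appear: the dimension d, token count N and rank r do not.
    """
    return M ** (-2.0 * beta / (2.0 * beta + 1.0))


def rd_bound(M: float, beta: float) -> float:
    """Right-hand side (M / log M)^(1/(2*beta+1)), assuming natural log and M > 1."""
    return (M / math.log(M)) ** (1.0 / (2.0 * beta + 1.0))


def max_rank_times_dim(M: float, beta: float) -> int:
    """Largest integer value of r*d that satisfies r*d <= rd_bound(M, beta)."""
    return math.floor(rd_bound(M, beta))


def rate_applies(M: float, beta: float, r: int, d: int) -> bool:
    """True if (r, d) satisfies the dimension condition for the given M and beta."""
    return r * d <= rd_bound(M, beta)


if __name__ == "__main__":
    M = 1e6
    for beta in (1.0, 2.0):
        print(f"beta={beta}: rate ~ {minimax_rate(M, beta):.3e}, "
              f"max r*d = {max_rank_times_dim(M, beta)}")
```

With $M = 10^6$ samples, a Lipschitz-type activation ($\beta = 1$) gives a rate of order $10^{-4}$ and admits $rd \le 41$. A smoother activation ($\beta = 2$) gives a faster rate, of order $10^{-4.8}$, but admits only $rd \le 9$. Greater smoothness therefore buys a faster rate at the cost of a tighter dimension condition.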

Session

Recent Advances in the Probabilistic Foundations of Statistics and Machine Learning

Date and Time

Tue, 06/02/2026 - 16:30 - Tue, 06/02/2026 - 17:00

Language of Oral Presentation

English

Language of Visual Aids

English