How Can Self-Attention Networks Recognize Dyck-n Languages?
We focus on the recognition of Dyck-n ($\mathcal{D}_n$) languages with
self-attention (SA) networks, which has been deemed to be a difficult task for
these networks. We compare the performance of two variants of SA, one with a
starting symbol (SA$^+$) and one without (SA$^-$). Our results show that SA$^+$
is able to generalize to longer sequences and deeper dependencies. For
$\mathcal{D}_2$, we find that SA$^-$ completely breaks down on long sequences
whereas the accuracy of SA$^+$ is 58.82$\%$. We find attention maps learned by
SA$^+$ to be amenable to interpretation and compatible with a
stack-based language recognizer. Surprisingly, the performance of SA networks
is on par with that of LSTMs, which provides evidence of the ability of SA to learn
hierarchies without recursion.
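The abstract notes that the attention maps learned by SA$^+$ are compatible with a stack-based language recognizer. For reference, below is a minimal Python sketch of such a recognizer for $\mathcal{D}_2$; it is not code from the paper, and the function name `is_dyck` and the choice of bracket alphabet are illustrative assumptions.

```python
from typing import Dict

# Bracket pairs for Dyck-2: two bracket types, here () and [].
PAIRS: Dict[str, str] = {")": "(", "]": "["}
OPENERS = set(PAIRS.values())

def is_dyck(sequence: str) -> bool:
    """Return True iff `sequence` is a well-formed Dyck-2 word."""
    stack = []
    for symbol in sequence:
        if symbol in OPENERS:
            stack.append(symbol)          # push every opening bracket
        elif symbol in PAIRS:
            if not stack or stack.pop() != PAIRS[symbol]:
                return False              # mismatched or unmatched closer
        else:
            return False                  # symbol outside the alphabet
    return not stack                      # all openers must be closed

# Examples: nesting is accepted, crossing and unclosed brackets are not.
assert is_dyck("([])[]")
assert not is_dyck("([)]")
assert not is_dyck("(()")
```

The single stack is what makes $\mathcal{D}_n$ a canonical test of hierarchical structure: recognizing it requires tracking arbitrarily deep nesting, which is the capability the paper probes in self-attention networks.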
| Reference Key | zhang2020how |
|---|---|
| Authors | Javid Ebrahimi; Dhruv Gelda; Wei Zhang |
| Journal | arXiv |
| Year | 2020 |

Use the reference key to autocite this article in SciMatic Manuscript Manager or Thesis Manager.