Transformers and Attention
Machine learning
Artificial intelligence
Thesis
Notes on the transformer architecture and the attention mechanism
This is the first post in a series where I take the notes from learning about the transformer architecture and synthesize them, both for my future self and in case they’re helpful for others. The transformer architecture was first introduced by the Google Brain team in the paper “Attention Is All You Need”.
These notes are based on the amazing YouTube videos by Grant Sanderson of 3Blue1Brown fame. The entire playlist can be found here.
Overview
Citation
BibTeX citation:
@online{gregory2024,
  author = {Gregory, Josh},
  title = {Transformers and {Attention}},
  date = {2024-05-04},
  url = {https://joshgregory42.github.io/posts/2022-10-24-my-blog-post/},
  langid = {en}
}
For attribution, please cite this work as:
Gregory, Josh. 2024. “Transformers and Attention.” May 4, 2024. https://joshgregory42.github.io/posts/2022-10-24-my-blog-post/.