
Capsellab/ng-video-lecture

nanogpt-lecture

Code created in the Neural Networks: Zero To Hero video lecture series, specifically the first lecture on nanoGPT. Published here as a GitHub repo so people can easily hack on it, walk through its git log history, etc.

NOTE: sadly I did not go into model initialization much in the video lecture, but it is quite important for good performance. The current code will train and work fine, but it converges more slowly because it starts off at a poor spot in weight space. Please see model.py in nanoGPT for the # init all weights comment, and especially how it calls the _init_weights function. Even more sadly, the code in this repo names and stores its modules a bit differently, so that code cannot be directly copy-pasted here. My current plan is to publish a supplementary video lecture covering these parts, and then push the exact code changes to this repo. For now I'm keeping the code as is, so that it matches almost exactly what we actually covered in the video.
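For a rough idea of what that initialization does, here is a minimal sketch in plain Python (no PyTorch; the function name init_linear is made up for illustration). nanoGPT's _init_weights draws linear and embedding weights from a normal distribution with mean 0 and standard deviation 0.02, and zeros the biases; the sketch below mimics that recipe for a single linear layer:

```python
import random

def init_linear(n_in, n_out, std=0.02, seed=0):
    """Initialize one linear layer's parameters GPT-2 style:
    weights ~ N(0, 0.02), biases set to zero."""
    rng = random.Random(seed)
    weight = [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weight, bias

weight, bias = init_linear(64, 64)
flat = [w for row in weight for w in row]
mean = sum(flat) / len(flat)
var = sum((w - mean) ** 2 for w in flat) / len(flat)
print(var ** 0.5)  # empirical std of the weights, close to 0.02
```

Starting with small weights like this keeps the pre-softmax logits and residual-stream activations small at initialization, which is part of why a properly initialized network converges faster than the code in this repo.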

License

MIT

Added step-by-step code along the original YouTube timeline

  • bigram.py : port of our code to a script (Section 7)
  • bigram2.py : inserting a single self-attention block into our network (Section 15)
  • bigram3.py : multi-headed self-attention (Section 16)
  • bigram4.py : feed-forward layer of the transformer block (Section 17)
  • bigram5.py : residual connections (Section 18)
  • bigram6.py : layernorm (and its relationship to our earlier batchnorm) (Section 19)
  • bigram7.py : full finished code, for reference (from gpt-dev.ipynb)
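As a rough illustration of the step bigram2.py takes, here is a minimal sketch of one causal self-attention head in plain Python (no PyTorch; the toy inputs and identity weight matrices are made up for illustration). Each position computes scaled dot-product scores between its query and the keys of all positions up to and including itself, and returns a softmax-weighted average of those positions' values:

```python
import math

def matmul(A, B):
    # (n x k) times (k x m) -> (n x m), on plain nested lists
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def causal_self_attention(x, Wq, Wk, Wv):
    """One attention head over a sequence x of token embeddings (T x C).
    Position t only attends to positions s <= t (the causal mask)."""
    T = len(x)
    head_size = len(Wq[0])
    q, k, v = matmul(x, Wq), matmul(x, Wk), matmul(x, Wv)
    out = []
    for t in range(T):
        # scaled dot-product scores against keys 0..t only
        scores = [sum(qi * ki for qi, ki in zip(q[t], k[s])) / math.sqrt(head_size)
                  for s in range(t + 1)]
        w = softmax(scores)  # attention weights over the past, sum to 1
        out.append([sum(w[s] * v[s][c] for s in range(t + 1))
                    for c in range(head_size)])
    return out

# toy example: 3 tokens, embedding dim 2, head size 2, identity weights
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
I = [[1.0, 0.0], [0.0, 1.0]]
y = causal_self_attention(x, I, I, I)
print(y[0])  # first position can only attend to itself -> equals x[0]
```

The multi-head version in bigram3.py runs several such heads in parallel on smaller head dimensions and concatenates their outputs; bigram4.py onward then wrap this in the feed-forward, residual, and layernorm machinery of a full transformer block.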
