In @{notebook}.ipynb, help me build intuition for {topic}.
Make the following intuitively obvious, building up from first principles: {either paste an excerpt from a research paper, or a link to docs, or a paper itself, or just mention a topic and the scope you’d like it to cover}
This should be in lesson format: start from basic intuitions and build up to complete functions. We should not be working with big functions; rather, small single cells should build up to a more complete solution. For example, start with the intuition behind each of the underlying principles.
In general, first introduce the concept and why it matters (for example, if we were trying to understand einops, show the most valuable version of an einops call in a real-world situation and explain, at a high level, what value it provides), then start from the most basic intuition. This intuition doesn't need to be obviously linked to the topic. For example, if we were demonstrating nonlinearity and the universal approximation theorem (UAT), we'd go: a parabola visualization → random noise added to the parabola → the MSE between the parabola and the noisy points we generated from it → sliders for adjusting the parabola's a, b, c and seeing how the fit and MSE change → but in practice we don't know the data is a parabola, so try a line instead → a line is bad at approximating a parabola, and adding more lines together just gives another line with a different slope → introduce the ReLU → explain nonlinearity in several intuitive and unexpected ways from first principles, because it's an important topic → show how two ReLUs reduce the loss greatly → show that more ReLUs reduce it further → introduce the UAT → answer the questions that arise from it (for example, why do we train deep neural nets if a network with a single hidden layer is theoretically enough?) → other cool intuitions behind the UAT.
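For instance, the parabola → line → ReLU steps above could be compressed into a sketch like the one below. It is a minimal illustration, not the required approach: it assumes closed-form least squares (torch.linalg.lstsq) in place of gradient-descent training, and the ReLU hinge locations are picked by hand rather than learned. In the actual notebook each step would be its own small cell.

```python
import torch

# Toy data: the "true" curve y = 0.5x^2 - x + 1, observed with Gaussian noise
x = torch.linspace(-3, 3, 200, dtype=torch.float64)
y = 0.5 * x**2 - x + 1 + 0.3 * torch.randn(200, dtype=torch.float64)

def mse(pred, target):
    return ((pred - target) ** 2).mean().item()

# Adding lines only ever gives another line: (2x + 1) + (-0.5x + 3) = 1.5x + 4
line_sum = (2 * x + 1) + (-0.5 * x + 3)
print("sum of two lines is still a line:", torch.allclose(line_sum, 1.5 * x + 4))

def fit(features):
    # Assumption: closed-form least squares stands in for gradient-descent training
    w = torch.linalg.lstsq(features, y.unsqueeze(1)).solution
    return (features @ w).squeeze(1)

ones = torch.ones_like(x)

# Best possible straight line: y ≈ w0 + w1 * x
line_feats = torch.stack([ones, x], dim=1)
mse_line = mse(fit(line_feats), y)

# Line plus two ReLU "hinges" at x = -1 and x = 1 (hand-picked, not learned)
relu_feats = torch.stack([ones, x, torch.relu(x - 1), torch.relu(-x - 1)], dim=1)
mse_two_relus = mse(fit(relu_feats), y)

# Line plus 11 hinges at -2.5, -2.0, ..., 2.5: more places for the fit to bend
knots = torch.arange(-5, 6, dtype=torch.float64) * 0.5
many_feats = torch.cat([line_feats, torch.relu(x.unsqueeze(1) - knots)], dim=1)
mse_many_relus = mse(fit(many_feats), y)

print(f"line MSE:     {mse_line:.3f}")
print(f"2 ReLUs MSE:  {mse_two_relus:.3f}")
print(f"11 ReLUs MSE: {mse_many_relus:.3f}")
```

Because each richer feature set contains the previous one, the training MSE can only go down; the point to land in the notebook is how sharply it drops once the fit is allowed to bend.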
An intuitive way to present concepts is often problem-based. For example, when trying to understand RMSNorm (widely used in modern LLMs), we'd first talk about the problems we run into when training neural nets → initialization schemes that try to fix them → BatchNorm, introduced to normalize the mean and std of activations across a batch → LayerNorm, which normalizes each example independently of the batch and works better for autoregressive models → RMSNorm → pre-norm and other normalization concepts beyond exactly what I asked for (more intuitions or things that help me understand the main concept, not random subjects) → etc. (This is purely an example of how an article can build up; don't overindex on it.) It's often worth briefly mentioning SOTA or experimental approaches to the concept being covered (for the example above, DeepNorm). If there are any cool math intuitions or valuable derivations, include them and break them down to the level of log rules (for example, with RoPE, show the breakdown of q^T R(theta2 - theta1) k; or for log_softmax, show how it just adds the same constant to every logit).
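As a rough illustration of the level of breakdown meant for those two derivations, a sketch like the following checks both identities numerically. It simplifies RoPE to a single 2D rotation block (real RoPE applies one such rotation per pair of dimensions, each at its own frequency), and the logits and vectors are made-up values.

```python
import math
import torch
import torch.nn.functional as F

# log_softmax(x)_i = x_i - logsumexp(x): the SAME constant is subtracted from every logit
logits = torch.tensor([2.0, 1.0, -0.5, 3.0])  # fake scores for 4 candidate next words
manual = logits - torch.logsumexp(logits, dim=-1)
print("matches F.log_softmax:", torch.allclose(manual, F.log_softmax(logits, dim=-1)))

# Consequence: shifting every logit by a constant changes nothing
shifted = F.log_softmax(logits + 10.0, dim=-1)
print("shift-invariant:", torch.allclose(shifted, F.log_softmax(logits, dim=-1), atol=1e-6))

# RoPE, simplified to one 2D pair: rotate q by theta1 (its position) and k by theta2
def rot(theta):
    c, s = math.cos(theta), math.sin(theta)
    return torch.tensor([[c, -s], [s, c]])

q = torch.tensor([1.0, 2.0])
k = torch.tensor([0.5, -1.0])
theta1, theta2 = 0.3, 1.1

# (R(t1) q)^T (R(t2) k) = q^T R(t1)^T R(t2) k = q^T R(-t1) R(t2) k = q^T R(t2 - t1) k
score = (rot(theta1) @ q) @ (rot(theta2) @ k)
relative = q @ rot(theta2 - theta1) @ k
print("only the relative angle matters:", torch.allclose(score, relative, atol=1e-6))
```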
Don't be afraid to use markdown cells to explain things in words or to show math breakdowns and intuitive summaries, but for any buildup of intuition, always create code cells with adequate comments and with print statements where they're useful. Simulations, interactive elements, and visualizations are always very helpful for these topics. For some sections, create an unfinished cell with a test problem (prefix all of its variables with test_ so they don't interfere with the main variables used in the notebook). It should have a) some scaffolding with a "# fill in code here" comment, and b) an assert statement or a comment explaining how to know you got it right.
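One possible layout for such a test cell is sketched below. The softmax task, the test_ variable names, and the None-guard around the asserts are illustrative choices, not a required format; the guard just keeps the cell runnable before the learner fills it in.

```python
import torch

# Test problem: compute softmax from scratch (every variable is prefixed with test_)
test_logits = torch.tensor([2.0, 1.0, 0.1])  # fake scores for the words "cat", "dog", "on"

test_probs = None  # fill in code here (hint: exponentiate, then divide by the sum)

# How you know you got it right: these checks pass once test_probs is filled in
if test_probs is None:
    print("Fill in test_probs above, then re-run this cell.")
else:
    assert torch.allclose(test_probs, torch.softmax(test_logits, dim=-1)), "Not quite, check the normalization"
    assert torch.isclose(test_probs.sum(), torch.tensor(1.0)), "Probabilities should sum to 1"
    print("Correct!")
```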
Remember that building intuition isn't always about showing code and explaining things purely related to the concept. For example, if we're learning about Boltzmann machines, it would be valuable to build up to the energy function being minimized not just through markdown, but through runnable code cells and visualizations that aren't obviously related to the topic.
Once a topic has been built from scratch, it's valuable to mention the shortcuts as well, for example using F.cross_entropy instead of rebuilding the function from scratch every time.
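For example, once cross-entropy has been built by hand, a cell like this sketch can show that the shortcut gives the same number. The logits and targets are made-up values; F.cross_entropy with its default mean reduction is equivalent to log_softmax followed by F.nll_loss.

```python
import torch
import torch.nn.functional as F

# Fake next-token logits: 4 positions, vocabulary of 5 words, plus the correct token ids
logits = torch.randn(4, 5)
targets = torch.tensor([1, 0, 4, 2])

# From scratch: log_softmax, pick the log-prob of each correct token, negate, average
log_probs = logits - torch.logsumexp(logits, dim=-1, keepdim=True)
manual_loss = -log_probs[torch.arange(len(targets)), targets].mean()

# Shortcut: a single call that does the same computation
builtin_loss = F.cross_entropy(logits, targets)
print(f"manual: {manual_loss.item():.4f}  F.cross_entropy: {builtin_loss.item():.4f}")
```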
When mentioning ideas from adjacent fields that the reader may not know (and that aren't worth diving into because they're too different), give a short intuition and the why, and say what to search for to learn more.
It also helps to build intuition by tying in a fundamental concept from elsewhere that the reader probably already knows. For example, for math-related concepts it's valuable to tie into normal distributions, complex numbers, RREF (row reduction), the unit circle, and the Pythagorean theorem.
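As one example of such a tie-in, the RMS in RMSNorm is the Pythagorean theorem in disguise. This sketch uses the same kind of fake 2D "The cat on" embeddings this prompt asks for; it assumes eps sits inside the square root (implementations differ) and omits RMSNorm's learned per-dimension gain.

```python
import torch

# 3 fake 2-dim embeddings for the words "The cat on" (2D so the triangle picture is literal)
embs = torch.randn(3, 2)

# Pythagorean theorem: the length of (a, b) is sqrt(a^2 + b^2)
lengths = embs.pow(2).sum(dim=-1).sqrt()
print("Pythagoras == vector norm:", torch.allclose(lengths, torch.linalg.norm(embs, dim=-1)))

# RMS is the same quantity with a mean instead of a sum, i.e. length / sqrt(d)
d = embs.shape[-1]
rms = embs.pow(2).mean(dim=-1).sqrt()
print("RMS == length / sqrt(d):", torch.allclose(rms, lengths / d**0.5))

# RMSNorm without its learned gain (assumption: eps inside the sqrt; implementations vary)
eps = 1e-6
normed = embs / torch.sqrt(embs.pow(2).mean(dim=-1, keepdim=True) + eps)
print("lengths after RMSNorm (all ~sqrt(2)):", torch.linalg.norm(normed, dim=-1))
```

So, up to eps, RMSNorm puts every embedding on a circle (a sphere in higher dimensions) of radius sqrt(d), and the learned gain then rescales each dimension.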
Assume that a beginner will question everything you show. So the why, all the way down to first principles, needs to be intuitive, not just the what.
Throughout the notebook, reuse variables where possible.
For any toy examples you provide, take the extra step to make them more realistic. For example, if we're working with embeddings, instead of just saying "here are our embeddings [[0.5, 1], [2, -1]]", it's far more valuable throughout the article to refer to something concrete, like:
```python
# these are 3 fake embeddings representing the words "The cat on"
embs = torch.randn(3, 2)
print(embs)
```
Don't be afraid to make cells that small, or even a single line; cells don't need many lines of code.
Prefer torch over numpy wherever it fits.
Assume we may want to edit the notebook in the future, so don't add positional markers such as "Part X" titles.
We have an @utils module: call utils.set_seed(42) once, at the top with the imports, to seed numpy, PyTorch, etc. for the entire notebook. Don't set seeds anywhere else.
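If it helps to know what such a helper typically does, here is a hypothetical stand-in. The real utils.set_seed may differ, so treat the body below as an assumption; it only shows the usual idea of seeding Python's random, NumPy, and PyTorch together.

```python
import random

import numpy as np
import torch

def set_seed(seed: int) -> None:
    """Hypothetical stand-in for utils.set_seed: seed every RNG the notebook touches."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds the PyTorch RNG on all devices, including CUDA

# Same seed, same "random" numbers
set_seed(42)
first_torch, first_np = torch.randn(3), np.random.rand(3)
set_seed(42)
second_torch, second_np = torch.randn(3), np.random.rand(3)
print(torch.equal(first_torch, second_torch), np.array_equal(first_np, second_np))
```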
Feel free to suggest splitting the notebook into multiple notebooks if the concept is too dense, or if a prerequisite topic is both dense enough and different enough from the topic at hand to warrant its own notebook. Base this on your sense of my current knowledge level, or do it if a) I explicitly ask, or b) you offer it as an option and I say yes.
Especially when focusing on an intuition, bias toward cells with a single line of code that can be executed and shown (sometimes we don't even need print statements, since the notebook displays the last expression), with markdown in between. For things like visualizations or markdown explanations, don't be afraid to be longer and just add comments, because that's a different kind of intuition. I trust you to balance this, since you're an expert teacher.