Commit 7638cb3

Add ::tabs-start markers to ML articles for interactive code blocks (#5654)
All 27 ML problem articles now have ::tabs-start/::tabs-end markers around Python code blocks, enabling the copy button and syntax highlighting in the Solution tab on neetcode.io.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

1 parent bcac58b commit 7638cb3

27 files changed: 243 additions & 0 deletions
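
Every diff below applies the same mechanical change: wrap each article's fenced Python block in the tab markers. Here is a minimal sketch of the resulting markdown (illustrative only, not a verbatim excerpt from any one file):

````markdown
### Implementation

::tabs-start
```python
# solution code for the article goes here
```
::tabs-end
````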

articles/backpropagation.md

Lines changed: 9 additions & 0 deletions
@@ -37,6 +37,7 @@ We run the forward pass to get $\hat{y}$, compute the delta term (error times si

### Implementation

+::tabs-start
```python
import numpy as np
from numpy.typing import NDArray
@@ -57,6 +58,8 @@ class Solution:

        return (dL_dw, dL_db)
```
+::tabs-end
+

### Walkthrough

@@ -87,18 +90,22 @@ The negative gradients mean: increase $w_0$, increase $w_1$, and increase $b$ to

The error is $\hat{y} - y$, not $y - \hat{y}$. Flipping it negates all gradients, making the model move away from the target.

+::tabs-start
```python
# Wrong: inverted error
error = y_true - y_hat

# Correct: prediction minus truth
error = y_hat - y_true
```
+::tabs-end
+

### Forgetting the Sigmoid Derivative

The sigmoid derivative is part of the chain. Without it, you are computing the gradient as if the activation were linear, which gives wrong weight updates.

+::tabs-start
```python
# Wrong: missing sigmoid derivative in the chain
delta = error # only the error, no activation derivative
@@ -107,6 +114,8 @@ delta = error # only the error, no activation derivative
sigmoid_deriv = y_hat * (1.0 - y_hat)
delta = error * sigmoid_deriv
```
+::tabs-end
+

---

articles/basics-of-pytorch.md

Lines changed: 9 additions & 0 deletions
@@ -29,6 +29,7 @@ Each method exercises a core PyTorch operation. We use `torch.reshape` for resha

### Implementation

+::tabs-start
```python
import torch
import torch.nn
@@ -53,6 +54,8 @@ class Solution:
        loss = torch.nn.functional.mse_loss(prediction, target)
        return torch.round(loss, decimals=4)
```
+::tabs-end
+

### Walkthrough

@@ -76,25 +79,31 @@ class Solution:

`dim=0` averages across rows (column-wise means), `dim=1` averages across columns (row-wise means). These are easy to confuse.

+::tabs-start
```python
# Wrong: averages across columns instead of rows
averaged = torch.mean(to_avg, dim=1)

# Correct: averages across rows (column means)
averaged = torch.mean(to_avg, dim=0)
```
+::tabs-end
+

### Mismatched Shapes for Concatenation

Concatenation along `dim=1` requires the same number of rows. Different row counts cause a runtime error.

+::tabs-start
```python
# Wrong: different number of rows (2 vs 3)
torch.cat((torch.zeros(2, 3), torch.zeros(3, 3)), dim=1)

# Correct: same number of rows
torch.cat((torch.zeros(2, 3), torch.zeros(2, 3)), dim=1)
```
+::tabs-end
+

---

articles/build-vocabulary.md

Lines changed: 9 additions & 0 deletions
@@ -32,6 +32,7 @@ Extract unique characters with `set()`, sort them, build two dictionaries with e

### Implementation

+::tabs-start
```python
from typing import Dict, List, Tuple

@@ -48,6 +49,8 @@ class Solution:
    def decode(self, ids: List[int], itos: Dict[int, str]) -> str:
        return ''.join(itos[i] for i in ids)
```
+::tabs-end
+

### Walkthrough

@@ -77,25 +80,31 @@ Round-trip: `decode(encode("hello")) = "hello"`.

Python sets have no guaranteed iteration order. Without sorting, the same text may produce different vocabularies on different runs.

+::tabs-start
```python
# Wrong: non-deterministic order
chars = list(set(text))

# Correct: sorted for reproducibility
chars = sorted(set(text))
```
+::tabs-end
+

### Building itos Incorrectly

The `itos` mapping must be the exact inverse of `stoi`. Building it independently can introduce mismatches.

+::tabs-start
```python
# Wrong: building independently, might not be exact inverse
itos = {i: ch for i, ch in enumerate(chars)}

# Correct: derive from stoi to guarantee inverse relationship
itos = {i: ch for ch, i in stoi.items()}
```
+::tabs-end
+

---

articles/code-gpt.md

Lines changed: 9 additions & 0 deletions
@@ -33,6 +33,7 @@ Compose all previously built components: embedding layers, a sequence of transfo

### Implementation

+::tabs-start
```python
import torch
import torch.nn as nn
@@ -135,6 +136,8 @@ class GPT(nn.Module):
        embedded = embedded + self.linear_network(self.second_norm(embedded)) # another skip connection
        return embedded
```
+::tabs-end
+

### Walkthrough

@@ -166,6 +169,7 @@ Each of the 5 positions outputs a distribution over 100 tokens, predicting the n

Without position embeddings, the model has no way to distinguish "cat sat" from "sat cat." The representations would be identical.

+::tabs-start
```python
# Wrong: no position information
embedded = self.word_embeddings(context)
@@ -177,11 +181,14 @@ positions = torch.arange(context.shape[1], device=context.device)
embedded = embedded + self.position_embeddings(positions)
output = self.transformer_blocks(embedded)
```
+::tabs-end
+

### Using nn.ModuleList Instead of nn.Sequential for Blocks

`nn.Sequential` chains modules automatically in `forward`. `nn.ModuleList` requires you to write the loop yourself. Both register parameters, but Sequential is cleaner here.

+::tabs-start
```python
# Works but requires manual loop
self.blocks = nn.ModuleList([TransformerBlock(...) for _ in range(N)])
@@ -191,6 +198,8 @@ self.blocks = nn.ModuleList([TransformerBlock(...) for _ in range(N)])
self.blocks = nn.Sequential(*[TransformerBlock(...) for _ in range(N)])
# forward: x = self.blocks(x)
```
+::tabs-end
+

---

articles/cross-entropy-loss.md

Lines changed: 9 additions & 0 deletions
@@ -36,6 +36,7 @@ For binary cross-entropy, we apply the formula directly: clip predictions with e

### Implementation

+::tabs-start
```python
import numpy as np
from numpy.typing import NDArray
@@ -55,6 +56,8 @@ class Solution:
        loss = -np.mean(np.sum(y_true * np.log(y_pred), axis=1))
        return round(loss, 4)
```
+::tabs-end
+

### Walkthrough

@@ -90,6 +93,7 @@ Average: $(0.35667 + 0.22314) / 2 = 0.28991$

Without epsilon clipping, $\log(0)$ produces $-\infty$ and breaks training.

+::tabs-start
```python
# Wrong: log(0) is undefined
loss = -np.mean(y_true * np.log(y_pred))
@@ -98,18 +102,23 @@ loss = -np.mean(y_true * np.log(y_pred))
y_pred = np.clip(y_pred, 1e-7, 1 - 1e-7)
loss = -np.mean(y_true * np.log(y_pred))
```
+::tabs-end
+

### Mixing Up Binary and Categorical

Binary cross-entropy expects 1D arrays (one probability per sample). Categorical expects 2D arrays (one probability per class per sample). Using the wrong one silently produces wrong gradients.

+::tabs-start
```python
# Wrong: using BCE formula on one-hot encoded multi-class data
loss = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Correct: for multi-class, sum over classes first, then average over samples
loss = -np.mean(np.sum(y_true * np.log(y_pred), axis=1))
```
+::tabs-end
+

---

articles/gpt-data-loader.md

Lines changed: 9 additions & 0 deletions
@@ -32,6 +32,7 @@ Sample random starting indices with `torch.randint`, then for each index use ten

### Implementation

+::tabs-start
```python
import torch
from torchtyping import TensorType
@@ -45,6 +46,8 @@ class Solution:
        y = torch.stack([data[i + 1:i + 1 + context_length] for i in ix])
        return x, y
```
+::tabs-end
+

### Walkthrough

@@ -73,25 +76,31 @@ At position 0 of batch 0, the model sees $[20]$ and must predict $30$. At positi

If you sample from $[0, \text{len}(\text{data}))$ instead of $[0, \text{len}(\text{data}) - C)$, starting positions near the end will cause index-out-of-bounds when extracting the target window.

+::tabs-start
```python
# Wrong: index can be too large, y slice goes past end
ix = torch.randint(len(data), (batch_size,))

# Correct: ensure room for context_length + 1 tokens
ix = torch.randint(len(data) - context_length, (batch_size,))
```
+::tabs-end
+

### Forgetting the +1 Offset for Targets

The target window starts one position after the input window. Without the offset, input and target are identical and the model learns nothing.

+::tabs-start
```python
# Wrong: target same as input
y = torch.stack([data[i:i + context_length] for i in ix])

# Correct: target shifted by 1
y = torch.stack([data[i + 1:i + 1 + context_length] for i in ix])
```
+::tabs-end
+

---

articles/gpt-dataset.md

Lines changed: 9 additions & 0 deletions
@@ -33,6 +33,7 @@ Split the raw text into words, sample random starting positions, and extract con

### Implementation

+::tabs-start
```python
import torch
from typing import List, Tuple
@@ -49,6 +50,8 @@ class Solution:
            Y.append(tokenized[idx+1:idx+1+context_length])
        return X, Y
```
+::tabs-end
+

### Walkthrough

@@ -77,6 +80,7 @@ Each target word is the next word after the corresponding input position.

The problem uses `torch.manual_seed(0)` for reproducibility. Using `random.randint` instead produces different indices and fails the test cases.

+::tabs-start
```python
# Wrong: different RNG, non-reproducible
import random
@@ -87,11 +91,14 @@ indices = [random.randint(0, len(tokenized) - context_length - 1) for _ in range
torch.manual_seed(0)
indices = torch.randint(low=0, high=len(tokenized) - context_length, size=(batch_size,)).tolist()
```
+::tabs-end
+

### Forgetting to Convert Tensor Indices to Python List

`torch.randint` returns a tensor. Using it directly for list slicing works, but `.tolist()` makes the code clearer and avoids potential type issues.

+::tabs-start
```python
# Works but less clear
indices = torch.randint(low=0, high=n, size=(batch_size,))
@@ -101,6 +108,8 @@ for idx in indices:
# Better: explicit conversion
indices = torch.randint(low=0, high=n, size=(batch_size,)).tolist()
```
+::tabs-end
+

---

articles/gradient-descent.md

Lines changed: 6 additions & 0 deletions
@@ -29,6 +29,7 @@ We start at some initial value and repeatedly apply the update rule. Each iterat

### Implementation

+::tabs-start
```python
class Solution:
    def get_minimizer(self, iterations: int, learning_rate: float, init: int) -> float:
@@ -40,6 +41,8 @@ class Solution:

        return round(minimizer, 5)
```
+::tabs-end
+

### Walkthrough

@@ -66,6 +69,7 @@ Each step multiplies $x$ by $(1 - 2\alpha) = 0.8$, so convergence is geometric.

A common mistake is computing the derivative but not actually subtracting it from the current value:

+::tabs-start
```python
# Wrong: derivative computed but minimizer never changes
derivative = 2 * minimizer
@@ -75,6 +79,8 @@ derivative = 2 * minimizer
derivative = 2 * minimizer
minimizer = minimizer - learning_rate * derivative
```
+::tabs-end
+

### Using the Wrong Derivative

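As a sanity check on a sweep like this, a small script can confirm the markers are balanced in every article. This is a hypothetical helper, not part of the commit; it assumes the `articles/*.md` layout shown above:

```python
import pathlib

# Hypothetical check (not part of this commit): every ::tabs-start
# in an article must be matched by a ::tabs-end.
for path in sorted(pathlib.Path("articles").glob("*.md")):
    text = path.read_text()
    starts = text.count("::tabs-start")
    ends = text.count("::tabs-end")
    assert starts == ends, f"{path}: {starts} starts vs {ends} ends"
    print(f"{path.name}: {starts} tabbed block(s)")
```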