
LayerNorm

Normalization, mean/variance statistics, affine transform

Medium · Fundamentals

Problem Description

Implement Layer Normalization from scratch.

$$\text{LayerNorm}(x) = \gamma \cdot \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta$$

where $\mu$ and $\sigma^2$ are the mean and variance computed over the last dimension.
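As a quick numeric sanity check of the formula (the values below are arbitrary), take $x = [1, 2, 3]$ with $\gamma = 1$ and $\beta = 0$: then $\mu = 2$ and $\sigma^2 = 2/3$.

```python
import torch

# Arbitrary example: x = [1, 2, 3] with an identity affine transform
x = torch.tensor([1.0, 2.0, 3.0])
gamma = torch.ones(3)
beta = torch.zeros(3)
eps = 1e-5

mu = x.mean()                # 2.0
var = x.var(unbiased=False)  # population variance: 2/3
out = gamma * (x - mu) / torch.sqrt(var + eps) + beta
print(out)  # approximately [-1.2247, 0.0000, 1.2247]
```

The output is zero-mean with unit (population) variance, as the formula promises.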

Signature

```python
def my_layer_norm(
    x: torch.Tensor,      # input
    gamma: torch.Tensor,  # scale (same size as last dim)
    beta: torch.Tensor,   # shift (same size as last dim)
    eps: float = 1e-5,
) -> torch.Tensor:
    ...
```

Rules

• Do NOT use F.layer_norm or torch.nn.LayerNorm

• Normalize over the last dimension only

• Must support autograd

Template

Implement the function below. Use only basic PyTorch operations.

```python
# ✏️ YOUR IMPLEMENTATION HERE
def my_layer_norm(x, gamma, beta, eps=1e-5):
    pass  # Replace this
```

Test Your Implementation

Use this code to debug before submitting.

```python
# 🧪 Debug
import torch

x = torch.randn(2, 8)
gamma = torch.ones(8)
beta = torch.zeros(8)

out = my_layer_norm(x, gamma, beta)
ref = torch.nn.functional.layer_norm(x, [8], gamma, beta)

print("Your output mean:", out.mean(dim=-1))  # should be ~0
print("Your output std: ", out.std(dim=-1))   # should be ~1
print("Match ref?      ", torch.allclose(out, ref, atol=1e-4))
```
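The rules also require autograd support; one way to verify it is `torch.autograd.gradcheck`, which compares autograd's analytic gradients against finite differences. A sketch of the pattern (the inline `my_layer_norm` here is a minimal stand-in for your own implementation; `gradcheck` wants double-precision inputs):

```python
import torch

def my_layer_norm(x, gamma, beta, eps=1e-5):
    # Minimal stand-in used only to demonstrate the gradcheck pattern
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, keepdim=True, unbiased=False)
    return gamma * (x - mean) / torch.sqrt(var + eps) + beta

# gradcheck needs float64 tensors with requires_grad=True
x = torch.randn(2, 4, dtype=torch.double, requires_grad=True)
gamma = torch.ones(4, dtype=torch.double, requires_grad=True)
beta = torch.zeros(4, dtype=torch.double, requires_grad=True)

ok = torch.autograd.gradcheck(my_layer_norm, (x, gamma, beta))
print("gradcheck passed:", ok)
```

If your implementation uses only differentiable PyTorch ops (no in-place surprises or `.detach()`), this check should pass without any extra work.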

Reference Solution

Try solving it yourself first! The reference implementation is below.

```python
# ✅ SOLUTION
import torch

def my_layer_norm(x, gamma, beta, eps=1e-5):
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, keepdim=True, unbiased=False)  # population variance
    x_norm = (x - mean) / torch.sqrt(var + eps)
    return gamma * x_norm + beta
```
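One subtlety worth calling out: `torch.Tensor.var` defaults to the unbiased (Bessel-corrected) estimator that divides by N − 1, while `F.layer_norm` normalizes by the population variance (dividing by N), so `unbiased=False` is needed to match the reference. A quick demonstration (tensor shapes arbitrary):

```python
import torch

x = torch.randn(2, 8)
mean = x.mean(dim=-1, keepdim=True)
eps = 1e-5

var_pop = x.var(dim=-1, keepdim=True, unbiased=False)  # divides by N
var_bessel = x.var(dim=-1, keepdim=True)               # divides by N - 1 (default)

ref = torch.nn.functional.layer_norm(x, [8], eps=eps)
match_pop = torch.allclose((x - mean) / torch.sqrt(var_pop + eps), ref, atol=1e-5)
match_bessel = torch.allclose((x - mean) / torch.sqrt(var_bessel + eps), ref, atol=1e-5)

print("population variance matches:", match_pop)      # True
print("Bessel variance matches:    ", match_bessel)   # False: off by sqrt(N/(N-1))
```

Forgetting `unbiased=False` is the most common reason an otherwise correct implementation fails the `allclose` check against `F.layer_norm`.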

Tips

Run Locally

For interactive practice with auto-grading, run TorchCode locally:
`pip install torch-judge`, then use `check("layernorm")`
