
pytorch

by benchflow-ai (GitHub)

Building and training neural networks with PyTorch. Use when implementing deep learning models, training loops, data pipelines, model optimization with torch.compile, distributed training, or deploying PyTorch models.


Compatible Agents

  • Claude Code: ~/.claude/skills/
  • Codex CLI: ~/.codex/skills/
  • Gemini CLI: ~/.gemini/skills/
  • OpenCode: ~/.opencode/skills/
  • OpenClaw: ~/.openclaw/skills/
  • GitHub Copilot: ~/.copilot/skills/
  • Cursor: ~/.cursor/skills/
  • Windsurf: ~/.codeium/windsurf/skills/
  • Cline: ~/.cline/skills/
  • Roo Code: ~/.roo/skills/
  • Kiro: ~/.kiro/skills/
  • Junie: ~/.junie/skills/
  • Augment Code: ~/.augment/skills/
  • Warp: ~/.warp/skills/
  • Goose: ~/.config/goose/skills/
SKILL.md

Train vs Eval Mode

  • model.train() enables dropout, BatchNorm updates β€” default after init
  • model.eval() disables dropout, uses running stats β€” MUST call for inference
  • Mode is sticky β€” train/eval persists until explicitly changed
  • model.eval() doesn't disable gradients β€” still need torch.no_grad()
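A minimal sketch of these rules, using a hypothetical two-layer model with dropout (the layer sizes are illustrative):

```python
import torch
import torch.nn as nn

# Dropout makes the train/eval difference observable
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))

# Modules start in training mode after construction
assert model.training

model.eval()            # mode is sticky: stays eval until train() is called
assert not model.training

# eval() alone does not stop autograd; wrap inference in no_grad()
x = torch.randn(2, 4)
with torch.no_grad():
    out = model(x)
assert not out.requires_grad
```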

Gradient Control

  • torch.no_grad() for inference β€” reduces memory, speeds up computation
  • loss.backward() accumulates gradients β€” call optimizer.zero_grad() before backward
  • zero_grad() placement matters β€” before forward pass, not after backward
  • .detach() to stop gradient flow β€” prevents memory leak in logging
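The bullets above combine into a standard training step; the model, data shapes, and learning rate below are placeholders:

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 3), torch.randn(8, 1)

opt.zero_grad()                              # clear stale gradients first
loss = nn.functional.mse_loss(model(x), y)
loss.backward()                              # gradients accumulate into .grad
opt.step()

# detach before logging so the graph is not kept alive
running_loss = loss.detach().item()
```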

Device Management

  • Model AND data must be on same device β€” model.to(device) and tensor.to(device)
  • .cuda() vs .to('cuda') β€” both work, .to(device) more flexible
  • CUDA tensors can't convert to numpy directly β€” .cpu().numpy() required
  • torch.device('cuda' if torch.cuda.is_available() else 'cpu') β€” portable code
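Put together, a portable device-handling sketch (the tensor shape is arbitrary):

```python
import torch

# Portable device selection: CUDA if present, else CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

t = torch.randn(2, 2).to(device)   # model.to(device) works the same way

# A CUDA tensor cannot convert to numpy directly; hop through .cpu()
arr = t.cpu().numpy()
```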

DataLoader

  • num_workers > 0 uses multiprocessing β€” Windows needs if __name__ == '__main__':
  • pin_memory=True with CUDA β€” faster transfer to GPU
  • Workers don't share state β€” random seeds differ per worker, set in worker_init_fn
  • Large num_workers can cause memory issues β€” start with 2-4, increase if CPU-bound
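A sketch of a DataLoader configured along these lines, using a toy TensorDataset. On Windows (or any spawn-based platform) the loader construction and iteration below would additionally need an `if __name__ == '__main__':` guard:

```python
import random
import torch
from torch.utils.data import DataLoader, TensorDataset

def worker_init_fn(worker_id):
    # DataLoader seeds torch per worker, but Python's random (and numpy)
    # need explicit per-worker seeding to avoid duplicated augmentations
    random.seed(torch.initial_seed() % 2**32)

ds = TensorDataset(torch.arange(100.0).unsqueeze(1))

loader = DataLoader(
    ds,
    batch_size=10,
    num_workers=2,                          # start small; raise only if CPU-bound
    pin_memory=torch.cuda.is_available(),   # only helps when copying to GPU
    worker_init_fn=worker_init_fn,
)
n = sum(batch[0].shape[0] for batch in loader)
```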

Saving and Loading

  • torch.save(model.state_dict(), path) β€” recommended, saves only weights
  • Loading: create model first, then model.load_state_dict(torch.load(path))
  • map_location for cross-device β€” torch.load(path, map_location='cpu') if saved on GPU
  • Saving whole model pickles code path β€” breaks if code changes
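The recommended round trip looks like this; the architecture and temp path are illustrative:

```python
import os
import tempfile
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
path = os.path.join(tempfile.mkdtemp(), 'model.pt')

# Save only the weights, not the pickled module
torch.save(model.state_dict(), path)

# Loading: build the same architecture first, then load weights.
# map_location='cpu' makes this work even if the file was saved on GPU.
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(path, map_location='cpu'))
assert torch.equal(model.weight, restored.weight)
```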

In-place Operations

  • In-place ops end with _ β€” tensor.add_(1) vs tensor.add(1)
  • In-place on leaf variable breaks autograd β€” error about modified leaf
  • In-place on intermediate can corrupt gradient β€” avoid in computation graph
  • tensor.data bypasses autograd β€” legacy, prefer .detach() for safety

Memory Management

  • Accumulating un-detached loss tensors leaks memory — .detach() or .item() metrics before logging
  • torch.cuda.empty_cache() releases cached memory β€” but doesn't fix leaks
  • Delete references and call gc.collect() β€” before empty_cache if needed
  • with torch.no_grad(): prevents graph storage β€” crucial for validation loop
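A leak-free validation loop following these rules; the model and synthetic batches are placeholders:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)
val_batches = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(3)]

model.eval()
total = 0.0
with torch.no_grad():                 # no graph is built, so memory stays flat
    for x, y in val_batches:
        loss = nn.functional.mse_loss(model(x), y)
        total += loss.item()          # .item() yields a plain float, holds no graph
```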

Common Mistakes

  • BatchNorm with batch_size=1 fails in train mode β€” use eval mode or track_running_stats=False
  • Loss function reduction default is 'mean' β€” may want 'sum' for gradient accumulation
  • cross_entropy expects logits β€” not softmax output
  • .item() to get a Python scalar from a 0-dim tensor — indexing a scalar loss with [0] is deprecated and now raises an error
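Two of these mistakes, shown concretely (shapes and class counts are arbitrary):

```python
import torch
import torch.nn.functional as F

# cross_entropy applies log_softmax internally: feed it raw logits.
# Passing softmax output silently produces wrong, overly flat losses.
logits = torch.randn(4, 10)
targets = torch.randint(0, 10, (4,))
loss = F.cross_entropy(logits, targets)               # default reduction='mean'
loss_sum = F.cross_entropy(logits, targets, reduction='sum')
scalar = loss.item()                                  # Python float for logging

# BatchNorm cannot compute batch statistics from a single sample in train mode
bn = torch.nn.BatchNorm1d(5)
bn.train()
bn_failed = False
try:
    bn(torch.randn(1, 5))
except ValueError:
    bn_failed = True      # "Expected more than 1 value per channel..."

bn.eval()                 # eval mode uses running stats, so batch_size=1 is fine
out = bn(torch.randn(1, 5))
```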

Source: https://github.com/benchflow-ai/SkillsBench#registry-terminal_bench_2.0-full_batch_reviewed-terminal_bench_2_0_torch-tensor-parallelism-environment-skills-pytorch

Content curated from original sources, copyright belongs to authors

