A clean PyTorch training loop with validation and checkpoints

The training loop is where research code either becomes maintainable or turns into a mess. I keep it explicit: train phase, validation phase, scheduler step, metric tracking, and checkpoint saving. That structure pays off immediately when experiments fail halfway through or need to be resumed on another machine.
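Here is a minimal sketch of that structure. It assumes a standard supervised setup where each batch is an `(inputs, targets)` pair. The helper names (`train_one_epoch`, `evaluate`, `save_checkpoint`, `fit`) and the checkpoint layout are hypothetical choices made for illustration, not a fixed convention. The layout used here is a `last.pt` written every epoch plus a `best.pt` written on validation improvement, each holding the model, optimizer, and scheduler state dicts, the epoch counter, the best validation loss, and the metric history. The scheduler is stepped once per epoch, which suits epoch-based schedulers such as `StepLR`.

```python
import os

import torch
from torch import nn


def train_one_epoch(model, loader, optimizer, loss_fn, device):
    """Train phase: training mode, gradients on, one optimizer step per batch."""
    model.train()
    total, count = 0.0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        total += loss.item() * x.size(0)  # weight by batch size for a true per-sample mean
        count += x.size(0)
    return total / count


@torch.no_grad()
def evaluate(model, loader, loss_fn, device):
    """Validation phase: eval mode, no gradient tracking, no parameter updates."""
    model.eval()
    total, count = 0.0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        total += loss_fn(model(x), y).item() * x.size(0)
        count += x.size(0)
    return total / count


def save_checkpoint(path, epoch, model, optimizer, scheduler, best_val, history):
    # Hypothetical layout: everything needed to resume, not just the weights.
    state = {
        "epoch": epoch,
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "scheduler": scheduler.state_dict(),
        "best_val": best_val,
        "history": history,
    }
    tmp_path = path + ".tmp"
    torch.save(state, tmp_path)
    os.replace(tmp_path, path)  # swap in the finished file so a crash mid-save leaves the old one


def fit(model, optimizer, scheduler, loss_fn, train_loader, val_loader,
        epochs, ckpt_dir, device="cpu", resume=False):
    os.makedirs(ckpt_dir, exist_ok=True)
    model.to(device)
    last_path = os.path.join(ckpt_dir, "last.pt")
    best_path = os.path.join(ckpt_dir, "best.pt")
    start_epoch, best_val, history = 0, float("inf"), []

    if resume and os.path.exists(last_path):
        # map_location lets a checkpoint written on a GPU load on a CPU-only machine.
        ckpt = torch.load(last_path, map_location=device)
        model.load_state_dict(ckpt["model"])
        optimizer.load_state_dict(ckpt["optimizer"])
        scheduler.load_state_dict(ckpt["scheduler"])
        start_epoch = ckpt["epoch"] + 1
        best_val, history = ckpt["best_val"], ckpt["history"]

    for epoch in range(start_epoch, epochs):
        train_loss = train_one_epoch(model, train_loader, optimizer, loss_fn, device)
        val_loss = evaluate(model, val_loader, loss_fn, device)
        scheduler.step()  # epoch-level scheduler: step after this epoch's optimizer steps
        history.append({"epoch": epoch, "train_loss": train_loss, "val_loss": val_loss})
        if val_loss < best_val:
            best_val = val_loss
            save_checkpoint(best_path, epoch, model, optimizer, scheduler, best_val, history)
        save_checkpoint(last_path, epoch, model, optimizer, scheduler, best_val, history)
    return history
```

To resume, rebuild the model, optimizer, and scheduler with the same configuration and call `fit(..., resume=True)`; training then continues from the epoch after the one stored in `last.pt`, with the metric history intact. A per-batch scheduler such as `OneCycleLR` would move `scheduler.step()` into `train_one_epoch`, and `ReduceLROnPlateau` expects the monitored metric, as in `scheduler.step(val_loss)`.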