For image work, I start with a compact CNN before reaching for heavy pretrained models. That baseline helps confirm whether labels, normalization, and augmentation are sane. It also makes failure cases easier to explain because the model architecture is still small enough to reason about directly.
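Below is a minimal sketch of what such a compact baseline could look like. It assumes PyTorch. The layer widths, the `SmallCNN` and `overfit_one_batch` names, and the single-batch overfitting check are illustrative choices, not a prescribed recipe. Overfitting one fixed batch is a common debugging heuristic: a small model that cannot push the loss down on a handful of examples usually points to a problem in the data pipeline or training loop, not to a lack of capacity.

```python
import torch
from torch import nn


class SmallCNN(nn.Module):
    """Hypothetical compact baseline: three conv blocks, global average pooling, linear head."""

    def __init__(self, in_channels: int = 3, num_classes: int = 10) -> None:
        super().__init__()

        def block(c_in: int, c_out: int) -> nn.Sequential:
            # conv -> batchnorm -> ReLU -> 2x spatial downsample
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1, bias=False),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
            )

        self.features = nn.Sequential(
            block(in_channels, 32),
            block(32, 64),
            block(64, 128),
        )
        # Global average pooling keeps the head independent of input resolution.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Linear(128, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = self.pool(x).flatten(1)  # (N, 128, 1, 1) -> (N, 128)
        return self.head(x)          # raw logits, (N, num_classes)


def overfit_one_batch(model: nn.Module, images: torch.Tensor, labels: torch.Tensor,
                      steps: int = 100, lr: float = 1e-3) -> tuple[float, float]:
    """Train repeatedly on one fixed batch; return (first_loss, last_loss).

    Hypothetical helper: the loss should fall steadily. If it does not, suspect
    labels, normalization, or the loop itself before blaming the architecture.
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    losses = []
    for _ in range(steps):
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses[0], losses[-1]


if __name__ == "__main__":
    torch.manual_seed(0)
    # Random stand-in data; in practice use one real, already-normalized batch.
    images = torch.randn(16, 3, 32, 32)
    labels = torch.randint(0, 10, (16,))
    first, last = overfit_one_batch(SmallCNN(), images, labels)
    print(f"loss: {first:.3f} -> {last:.3f}")
```

Because the model is only a few conv blocks, a failure here is cheap to inspect: you can look at per-layer activations or run the same check with augmentation switched off to isolate which stage of the pipeline is misbehaving.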