classification

Confusion matrix diagnostics for threshold selection

Thresholds are policy decisions disguised as numbers. I use confusion matrices to make those tradeoffs concrete for stakeholders: how many risky accounts we block, how many fraud attempts slip through, and how much manual review load is created. This
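
A minimal sketch of how a threshold sweep can turn confusion matrix counts into the stakeholder terms mentioned above. The labels, scores, thresholds, and the helper names `confusion_counts` and `threshold_report` are all hypothetical, made up for illustration and not taken from the original post. It assumes an account is flagged when its score is at or above the threshold.

```python
def confusion_counts(y_true, scores, threshold):
    """Return (tn, fp, fn, tp), flagging every score >= threshold as fraud."""
    tn = fp = fn = tp = 0
    for label, score in zip(y_true, scores):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1  # fraud that was caught
        elif flagged and label == 0:
            fp += 1  # legitimate account that was flagged
        elif label == 1:
            fn += 1  # fraud that slipped through
        else:
            tn += 1  # legitimate account left alone
    return tn, fp, fn, tp


def threshold_report(y_true, scores, thresholds):
    """Summarize each candidate threshold in operational terms."""
    rows = []
    for t in thresholds:
        tn, fp, fn, tp = confusion_counts(y_true, scores, t)
        rows.append({
            "threshold": t,
            "flagged": tp + fp,      # accounts blocked or sent to review
            "fraud_caught": tp,
            "fraud_missed": fn,
            "legit_flagged": fp,
        })
    return rows


# Hypothetical data: 1 = fraud, 0 = legitimate; scores are invented.
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.75, 0.80, 0.95]

report = threshold_report(y_true, scores, [0.3, 0.5, 0.7])
for row in report:
    print(row)
```

On this toy data, raising the threshold from 0.3 to 0.7 cuts the flagged accounts from 7 to 3 but lets 2 of the 4 fraud cases through, which is the kind of tradeoff the counts are meant to expose.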

Baseline classifiers in scikit-learn for fast benchmark setting

I like setting a few strong baselines before chasing complexity. A regularized logistic regression, a random forest, and a gradient boosting model usually tell me whether the problem is linearly separable, non-linear, or data-limited. Good baseline di
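
A rough sketch of what such a baseline comparison might look like in scikit-learn. The synthetic dataset from `make_classification`, the hyperparameters, the 5-fold split, and the ROC AUC metric are all assumptions chosen for illustration, not settings from the original post.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a real dataset (assumption for this sketch).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)

baselines = {
    # L2-regularized by default; scaling keeps the penalty comparable across features.
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(C=1.0, max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

results = {}
for name, model in baselines.items():
    # 5-fold cross-validated ROC AUC for each baseline.
    fold_scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    results[name] = (fold_scores.mean(), fold_scores.std())
    print(f"{name}: {results[name][0]:.3f} +/- {results[name][1]:.3f}")
```

If the linear model scores close to the tree ensembles, that hints the signal is mostly linear; a large gap in favor of the ensembles suggests non-linear structure; and uniformly weak, noisy scores across all three may point to a data limitation rather than a modeling one.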