Evaluation

Confusion matrix diagnostics for threshold selection

Thresholds are policy decisions disguised as numbers. I use confusion matrices to make those tradeoffs concrete for stakeholders: how many risky accounts we block, how many fraud attempts slip through, and how much manual review load is created. This turns an abstract cutoff into an operating decision the business can reason about.
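A minimal sketch of that diagnostic, using hypothetical scores and labels (1 = fraud): compute the confusion counts at a candidate threshold and translate them into the quantities stakeholders actually discuss.

```python
def confusion_counts(y_true, scores, threshold):
    """Count TP/FP/TN/FN at a given decision threshold (label 1 = fraud)."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}

# Hypothetical data: lowering the threshold catches more fraud
# but increases the blocked/review population.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
for t in (0.5, 0.35):
    c = confusion_counts(y_true, scores, t)
    blocked = c["tp"] + c["fp"]  # accounts we block (manual review load)
    missed = c["fn"]             # fraud attempts that slip through
    print(f"threshold={t}: blocked={blocked}, missed_fraud={missed}")
```

Presenting the table this way, in units of blocked accounts and missed fraud rather than abstract rates, is what makes the threshold a negotiable policy choice.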

Classification metrics beyond accuracy for imbalanced problems

Accuracy is a comforting but misleading metric when the positive class is rare. I care more about precision, recall, PR AUC, calibration, and how thresholding changes operational workload. The right metric depends on the cost of false negatives versus false positives.
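A small illustration of why accuracy flatters rare-positive problems, on a synthetic dataset with a 1% positive rate: a model that never fires scores 99% accuracy while having zero recall.

```python
def precision_recall_accuracy(y_true, y_pred):
    """Compute precision, recall, and accuracy from binary labels/predictions."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(y_true)
    return precision, recall, accuracy

# Synthetic 1%-positive dataset; the "always negative" model looks great
# on accuracy and catches nothing.
y_true = [1] * 1 + [0] * 99
always_negative = [0] * 100
p, r, a = precision_recall_accuracy(y_true, always_negative)
print(f"precision={p:.2f} recall={r:.2f} accuracy={a:.2f}")  # accuracy=0.99, recall=0.00
```

Precision and recall expose the failure immediately, which is why they anchor the metric conversation on imbalanced problems.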

Train/test split and stratified cross-validation done properly

Evaluation goes wrong when data splitting is treated like boilerplate. I stratify imbalanced targets, respect time order when necessary, and make sure preprocessing lives inside cross-validation. This is the difference between a model that looks good in offline evaluation and one that holds up in production.
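A minimal sketch of stratified fold assignment, written in plain Python for clarity (in practice, scikit-learn's StratifiedKFold plus a Pipeline handles this, and keeps preprocessing fitted only on each fold's training portion):

```python
import random
from collections import defaultdict

def stratified_folds(labels, k, seed=0):
    """Assign each example index to one of k folds, preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal each class's members round-robin so every fold gets its share.
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    return folds

# Synthetic 10%-positive dataset: each of 5 folds ends up with ~2 positives.
labels = [1] * 10 + [0] * 90
folds = stratified_folds(labels, 5)
for f in folds:
    print(sum(labels[i] for i in f), "positives of", len(f))
```

The key discipline is that anything fitted to data (scalers, encoders, imputers) must be refit inside each fold on the training indices only; fitting once on the full dataset leaks test information into training.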