preprocessing

OpenCV image preprocessing for OCR and vision pipelines

A lot of computer vision performance comes from cleaner inputs rather than larger models. I use OpenCV for resizing, denoising, thresholding, and contour extraction when preparing images for OCR or downstream classification. These classical steps often buy more accuracy than swapping in a heavier model, and they are far cheaper to debug.

Cleaning missing values and normalizing messy CSV exports

Real data arrives dirty. I usually start with missing-value audits, duplicate removal, explicit type conversion, and canonical text cleanup. The trick is to make each cleanup rule reproducible rather than burying it in notebook state. I prefer small, composable cleaning functions that take a frame and return a frame, so the whole sequence can be re-run from the raw export at any time.
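A small sketch of that style, using made-up column names. Each rule lives in one function rather than in scattered cell state:

```python
import pandas as pd

# A messy export: stray whitespace, casing, sentinel strings, a null key,
# and a duplicate row hiding behind the formatting.
raw = pd.DataFrame({
    "customer": ["  Alice ", "alice", "Bob", "Bob", None],
    "amount": ["10.5", "10.5", "n/a", "20", "30"],
    "signup": ["2023-01-05", "2023-01-05", "2023-02-01", "2023-02-01", "2023-03-10"],
})

def clean(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Canonical text cleanup: strip whitespace, normalize case.
    df["customer"] = df["customer"].str.strip().str.lower()
    # Explicit type conversion; sentinels like "n/a" become NaN.
    df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
    df["signup"] = pd.to_datetime(df["signup"])
    # Missing-value policy: rows without the key field are unusable.
    df = df.dropna(subset=["customer"])
    # Duplicate removal AFTER normalization, so "Alice " == "alice".
    df = df.drop_duplicates()
    return df

tidy = clean(raw)
```

Because `clean` is a pure function of the raw frame, rerunning the notebook from scratch always reproduces the same table.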

Encoding categorical variables without creating leakage

Categoricals are where good intentions become leakage. I use one-hot encoding for low-cardinality stable fields, ordinal encoders only when order is real, and frequency or target encoders with strict cross-validation boundaries. The encoder strategy should be fit on the training split only and merely applied to held-out data, never refit on the full dataset.
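A sketch of the fit-on-train, apply-on-test boundary with toy data. The column names are invented, and the frequency encoder here is hand-rolled for illustration:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "blue", "red"],
    "city": ["NY", "NY", "SF", "SF", "LA", "NY"],
    "y": [1, 0, 1, 0, 0, 1],
})
train, test = train_test_split(df, test_size=0.33, random_state=0)

# One-hot for a low-cardinality stable field; unseen levels at inference
# time become all-zero rows instead of raising.
ohe = OneHotEncoder(handle_unknown="ignore")
ohe.fit(train[["color"]])
Xt = ohe.transform(test[["color"]]).toarray()

# Frequency encoding: the statistics come from the TRAIN split only and
# are merely looked up on test -- that boundary is what prevents leakage.
freq = train["city"].value_counts(normalize=True)
train_city = train["city"].map(freq)
test_city = test["city"].map(freq).fillna(0.0)  # city never seen in train -> 0
```

Target encoders need the same discipline but per cross-validation fold, so each row's encoding is computed without its own label.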

ColumnTransformer pipelines that keep preprocessing honest

I push nearly all preprocessing into a Pipeline so training and inference paths share exactly the same logic. ColumnTransformer is the workhorse here because real-world tables mix numeric, categorical, boolean, and text fields. It gives you reproducible, column-aware transformations bundled into a single fitted object that travels with the model.
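A compact sketch of that shape with hypothetical columns: numeric and categorical fields get their own sub-pipelines, and the estimator sits behind both so `fit` and `predict` run identical preprocessing:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "age": [25, 32, None, 41, 29, 55],
    "income": [40_000, 60_000, 52_000, None, 48_000, 90_000],
    "plan": ["basic", "pro", "basic", "pro", "basic", "pro"],
    "churned": [0, 0, 1, 0, 1, 0],
})

# Per-dtype sub-pipelines: impute before scaling / encoding.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

pre = ColumnTransformer([
    ("num", numeric, ["age", "income"]),
    ("cat", categorical, ["plan"]),
])

model = Pipeline([("pre", pre), ("clf", LogisticRegression())])
model.fit(df.drop(columns="churned"), df["churned"])
preds = model.predict(df.drop(columns="churned"))
```

Because imputation, scaling, and encoding are all fitted inside the pipeline, cross-validation and deployment can never see preprocessing statistics computed on data they shouldn't.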