feature-engineering

Encoding categorical variables without creating leakage

Categoricals are where good intentions become leakage. I use one-hot encoding for low-cardinality, stable fields; ordinal encoders only when the order is real; and frequency or target encoders with strict cross-validation boundaries. The encoder strategy should be fit inside each training fold, never on the full dataset.
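A minimal sketch of the fold-boundary discipline, assuming scikit-learn's `KFold` and an illustrative `oof_target_encode` helper: each row's encoding is computed only from the training folds, so the row's own target never contributes to its feature.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

def oof_target_encode(df, cat_col, target_col, n_splits=5, seed=0):
    """Out-of-fold target encoding: category means are fit on the
    training folds only; unseen categories fall back to the global mean."""
    encoded = pd.Series(np.nan, index=df.index, dtype=float)
    global_mean = df[target_col].mean()
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, val_idx in kf.split(df):
        # Means come from the training rows of this fold only.
        fold_means = df.iloc[train_idx].groupby(cat_col)[target_col].mean()
        encoded.iloc[val_idx] = (
            df.iloc[val_idx][cat_col].map(fold_means).fillna(global_mean).values
        )
    return encoded

df = pd.DataFrame(
    {"city": ["a", "a", "b", "b", "a", "b"], "churn": [1, 0, 1, 1, 0, 0]}
)
df["city_te"] = oof_target_encode(df, "city", "churn", n_splits=3)
```

The same fit-on-train-only rule applies at inference: the production encoder is fit once on the full training set and frozen.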

SQL window functions for feature extraction and behavioral ranking

A surprising amount of feature engineering is best done in SQL before Python ever runs. ROW_NUMBER, LAG, rolling windows, and partitioned aggregates are ideal for deriving customer behavior signals close to the source. I use SQL here when it reduces the volume of data that has to leave the warehouse.
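A sketch of the pattern using SQLite (which supports window functions) and a hypothetical `events` table; the column names and data are illustrative, but the ROW_NUMBER/LAG/running-sum shape carries over to any warehouse dialect.

```python
import sqlite3

# Hypothetical events table: (customer_id, event_ts, amount).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE events (customer_id TEXT, event_ts TEXT, amount REAL);
INSERT INTO events VALUES
  ('c1', '2024-01-01', 10.0),
  ('c1', '2024-01-05', 25.0),
  ('c1', '2024-01-20', 5.0),
  ('c2', '2024-01-03', 40.0);
""")

rows = con.execute("""
SELECT
  customer_id,
  event_ts,
  ROW_NUMBER() OVER w  AS event_rank,     -- ordering within each customer
  LAG(event_ts) OVER w AS prev_event_ts,  -- previous event, for gap features
  SUM(amount) OVER (
    PARTITION BY customer_id ORDER BY event_ts
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  )                    AS running_spend   -- cumulative spend to date
FROM events
WINDOW w AS (PARTITION BY customer_id ORDER BY event_ts)
ORDER BY customer_id, event_ts
""").fetchall()
```

Each partition is one customer's ordered history, so the derived columns are per-customer behavioral signals rather than global statistics.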

Feature engineering for recency, frequency, and monetary behavior

Tabular models improve fast when you encode behavior rather than raw events. Recency, frequency, and monetary aggregates are durable baseline features for retention, fraud, and conversion use cases. I usually build them in pure pandas first, then port the logic to SQL once the feature definitions stabilize.
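The pandas-first version can be compact. This sketch assumes a hypothetical transactions frame with `customer_id`, `txn_date`, and `amount` columns, and computes recency against an explicit snapshot date rather than "now" so runs are reproducible.

```python
import pandas as pd

# Illustrative transaction data: (customer_id, txn_date, amount).
txns = pd.DataFrame({
    "customer_id": ["c1", "c1", "c2", "c2", "c2"],
    "txn_date": pd.to_datetime(
        ["2024-01-01", "2024-01-15", "2024-01-10", "2024-01-12", "2024-01-20"]
    ),
    "amount": [20.0, 35.0, 10.0, 15.0, 5.0],
})
as_of = pd.Timestamp("2024-01-31")  # fixed snapshot date for recency

# One row per customer: days since last purchase, purchase count, total spend.
rfm = txns.groupby("customer_id").agg(
    recency_days=("txn_date", lambda s: (as_of - s.max()).days),
    frequency=("txn_date", "count"),
    monetary=("amount", "sum"),
).reset_index()
```

Pinning `as_of` also makes the SQL port straightforward: the same three aggregates become a GROUP BY against a snapshot parameter.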

NumPy broadcasting for vectorized feature engineering

Good NumPy code replaces Python loops with array semantics that are easier to optimize and easier to benchmark. Broadcasting is the feature that makes those transformations elegant. I rely on it for normalization, distance calculations, and matrix-friendly similarity computations.
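Two of those uses in one minimal sketch: column-wise standardization, where a `(n, d)` array minus a `(d,)` mean broadcasts across rows, and pairwise Euclidean distances, where inserting axes with `None` makes a `(n, 1, d)` and a `(1, n, d)` view broadcast to `(n, n, d)` without any Python loop.

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # rows are samples

# Standardization: (3, 2) minus (2,) broadcasts the per-column stats row-wise.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Pairwise distances: (3, 1, 2) - (1, 3, 2) broadcasts to (3, 3, 2),
# then we reduce the feature axis to get a (3, 3) distance matrix.
diff = X[:, None, :] - X[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))
```

The intermediate `(n, n, d)` array is the memory cost of this elegance; for large `n`, chunking the outer axis keeps the same broadcasting logic within a fixed memory budget.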