pandas

Cleaning missing values and normalizing messy CSV exports

Real data arrives dirty. I usually start with missing-value audits, duplicate removal, explicit type conversion, and canonical text cleanup. The trick is to make each cleanup rule reproducible rather than burying it in notebook state. I prefer small, composable cleaning functions over one-off cell edits.
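A minimal sketch of that kind of cleanup function, with made-up column names (`email`, `amount`, `order_date`) standing in for a real export:

```python
import pandas as pd

def clean_orders(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Canonical text cleanup: trim and lowercase before deduplication,
    # so "A@X.COM " and "a@x.com" collapse to the same key.
    out["email"] = out["email"].str.strip().str.lower()
    # Explicit type conversion; bad values surface as NaN/NaT instead
    # of silently staying strings.
    out["amount"] = pd.to_numeric(out["amount"], errors="coerce")
    out["order_date"] = pd.to_datetime(out["order_date"], errors="coerce")
    # Deduplicate on the business key, then drop rows that failed parsing.
    out = out.drop_duplicates(subset=["email", "order_date"])
    out = out.dropna(subset=["amount"])
    return out

raw = pd.DataFrame({
    "email": [" A@X.COM", "a@x.com ", "b@y.com"],
    "amount": ["10.5", "10.5", "oops"],
    "order_date": ["2024-01-01", "2024-01-01", "2024-01-02"],
})
clean = clean_orders(raw)
```

Because the whole pipeline lives in one function of `DataFrame -> DataFrame`, it can be unit-tested and re-run from scratch, which is the reproducibility point above.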

Time series resampling and rolling windows in pandas

For operational metrics and forecasting features, I standardize timestamps first and then resample into stable windows. Rolling statistics like 7D means, lagged deltas, and volatility bands are easy wins for exploratory analysis. I avoid mixing timezone-aware and naive timestamps in the same pipeline, since comparisons between them fail and window boundaries shift unpredictably.
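A small sketch of that flow on synthetic data: a UTC-anchored index, a daily `resample` to enforce stable windows, then a 7-day rolling mean and a one-day lagged delta:

```python
import pandas as pd
import numpy as np

# Synthetic daily metric with an explicit, timezone-aware index.
idx = pd.date_range("2024-01-01", periods=14, freq="D", tz="UTC")
metrics = pd.DataFrame({"requests": np.arange(14, dtype=float)}, index=idx)

# Resample into stable daily windows; a no-op on this regular index,
# but it guards against gaps and irregular timestamps in real exports.
daily = metrics.resample("D").sum()

# Time-based rolling window (last 7 days) and a lagged delta.
daily["mean_7d"] = daily["requests"].rolling("7D").mean()
daily["delta_1d"] = daily["requests"].diff(1)
```

Using the `"7D"` offset (rather than `rolling(7)`) keys the window to elapsed time, so missing days shrink the window instead of silently widening it.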

GroupBy aggregations and pivot tables for business reporting

I reach for groupby when I need trustworthy aggregates that can power dashboards or analytical reports. Clear aggregation naming matters because these outputs frequently get joined back into feature tables or exported to BI systems. pivot_table is useful when the audience needs the same aggregates in a wide, spreadsheet-style layout.
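A sketch of both shapes on a toy sales table (column names are illustrative): named aggregations give the long output explicit column names, and `pivot_table` produces the wide view:

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["east", "east", "west", "west"],
    "product": ["a", "b", "a", "b"],
    "revenue": [100.0, 200.0, 50.0, 150.0],
})

# Named aggregations: output columns are self-describing, which matters
# when this frame is joined into feature tables or exported to BI.
by_region = sales.groupby("region", as_index=False).agg(
    total_revenue=("revenue", "sum"),
    order_count=("revenue", "size"),
)

# Wide, report-friendly layout of the same numbers.
wide = sales.pivot_table(
    index="region", columns="product", values="revenue", aggfunc="sum"
)
```

The named-aggregation form avoids the multi-level column headers that `agg({"revenue": ["sum", "size"]})` would produce, so downstream joins stay flat.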

pandas DataFrame essentials: loading, indexing, and selection

I treat pandas as the default workbench for structured data. The goal is to make loading explicit, indexes predictable, and selection operations readable under maintenance pressure. I prefer stable column naming, typed parsing for dates, and avoiding chained indexing in favor of a single .loc call.
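A compact sketch of those habits, loading from an in-memory CSV (the columns `user_id`, `signup_date`, `plan` are invented for the example):

```python
import io
import pandas as pd

csv = io.StringIO(
    "user_id,signup_date,plan\n"
    "u1,2024-01-05,pro\n"
    "u2,2024-02-10,free\n"
)

# Explicit loading: dates parsed at read time, low-cardinality text typed
# as category rather than left as object.
users = pd.read_csv(csv, parse_dates=["signup_date"], dtype={"plan": "category"})

# Predictable index: one deliberate set_index + sort, not implicit row numbers.
users = users.set_index("user_id").sort_index()

# One .loc call for filter + column selection; no chained indexing.
pro_users = users.loc[users["plan"] == "pro", ["signup_date"]]
```

The single `.loc[mask, columns]` call both reads clearly and sidesteps the chained-assignment ambiguity that `users[users["plan"] == "pro"]["signup_date"]` invites.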

Feature engineering for recency, frequency, and monetary behavior

Tabular models improve fast when you encode behavior rather than raw events. Recency, frequency, and monetary aggregates are durable baseline features for retention, fraud, and conversion use cases. I usually build them in pure pandas first, then port the definitions downstream once they stabilize.
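A minimal RFM sketch over an invented event table (`user_id`, `ts`, `amount`), computed relative to an explicit as-of date so the features are reproducible:

```python
import pandas as pd

events = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "ts": pd.to_datetime(["2024-01-01", "2024-01-10", "2024-01-05"]),
    "amount": [20.0, 30.0, 5.0],
})
as_of = pd.Timestamp("2024-01-15")  # fixed reference date, not "now"

# One groupby pass: last activity, event count, and total spend per user.
rfm = events.groupby("user_id").agg(
    last_seen=("ts", "max"),
    frequency=("ts", "size"),
    monetary=("amount", "sum"),
)
# Recency expressed in whole days since last activity.
rfm["recency_days"] = (as_of - rfm["last_seen"]).dt.days
rfm = rfm.drop(columns="last_seen")
```

Pinning `as_of` instead of calling `Timestamp.now()` keeps the feature table deterministic, which makes the later port to a production pipeline easier to validate.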

Merging datasets safely with join keys and validation

Merges are where silent data corruption often begins. I prefer explicit key audits, join cardinality validation, and indicator columns when investigating row loss or duplication. In production analytics, proving that a join is one_to_one or many_to_one up front is far cheaper than debugging inflated metrics later.
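A sketch of both safeguards on toy tables: `validate="many_to_one"` raises immediately if the right-hand key is not unique, and `indicator=True` surfaces rows that found no match:

```python
import pandas as pd

orders = pd.DataFrame({"order_id": [1, 2, 3], "user_id": ["u1", "u2", "u9"]})
users = pd.DataFrame({"user_id": ["u1", "u2"], "plan": ["pro", "free"]})

# validate= asserts the join cardinality: MergeError if users has
# duplicate user_id values; indicator= adds a _merge audit column.
joined = orders.merge(
    users, on="user_id", how="left",
    validate="many_to_one", indicator=True,
)

# Rows that failed to match the dimension table.
unmatched = joined[joined["_merge"] == "left_only"]
```

The `validate` check turns a silent fan-out (duplicated rows inflating every downstream sum) into a loud exception at the merge site, which is exactly where you want the failure.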