data-cleaning

Cleaning missing values and normalizing messy CSV exports

Real data arrives dirty. I usually start with missing-value audits, duplicate removal, explicit type conversion, and canonical text cleanup. The trick is to make each cleanup rule reproducible rather than burying it in notebook state. I prefer small,
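
As a rough illustration of the four steps named above (missing-value audit, duplicate removal, explicit type conversion, canonical text cleanup), here is a minimal sketch using only the Python standard library. Everything specific in it is an assumption made for the example, not something taken from the post: the column names (`name`, `city`, `amount`), the list of tokens treated as missing, the handling of thousands separators, and the helper names themselves.

```python
import csv
import io
from collections import Counter

# Assumption: these tokens mean "no value" in the export being cleaned.
MISSING_TOKENS = {"", "na", "n/a", "null", "none", "-"}


def canonical_text(value):
    """Trim, collapse internal whitespace, and lowercase a text cell."""
    return " ".join(value.split()).lower()


def is_missing(value):
    """True for absent cells and for cells holding a known missing token."""
    return value is None or canonical_text(value) in MISSING_TOKENS


def audit_missing(rows, columns):
    """Count missing cells per column; columns with none are omitted."""
    counts = Counter()
    for row in rows:
        for col in columns:
            if is_missing(row.get(col)):
                counts[col] += 1
    return dict(counts)


def to_float(value):
    """Explicit conversion: missing or unparseable cells become None."""
    if is_missing(value):
        return None
    try:
        # Assumption: commas are thousands separators, not decimal marks.
        return float(value.strip().replace(",", ""))
    except ValueError:
        return None


def clean_rows(rows):
    """Normalize each row, then drop rows that duplicate an earlier one."""
    seen = set()
    cleaned = []
    for row in rows:
        record = {
            "name": canonical_text(row["name"]),
            "city": None if is_missing(row["city"]) else canonical_text(row["city"]),
            "amount": to_float(row["amount"]),
        }
        key = tuple(record.values())
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned


# Hypothetical sample export, made up for this sketch.
RAW = """name,city,amount
 Alice  Smith ,Boston,"1,200.50"
alice smith,boston,1200.50
Bob Jones,N/A,
Cara Lee,  Denver ,abc
"""

rows = list(csv.DictReader(io.StringIO(RAW)))
missing_report = audit_missing(rows, ["name", "city", "amount"])
cleaned = clean_rows(rows)
```

Each rule lives in its own small function, so the same cleanup can be rerun on a fresh export without depending on notebook state. Note that deduplication runs after normalization here; running it on the raw rows would miss the two Alice rows, which differ only in spacing, case, and number formatting.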

Regular expressions for extracting structured entities from raw text

Regex is not glamorous, but it remains one of the fastest ways to turn messy text into useful structured fields. I use it for IDs, dates, codes, and log fragments before reaching for heavier NLP. The important part is making patterns specific enough t
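
To make the idea of pulling IDs, dates, and log fragments out of raw text concrete, here is a small sketch with Python's `re` module. The log layout, the `ORD-` followed by six digits ID format, and the function name `parse_line` are all hypothetical, chosen only to show how anchors, fixed digit counts, and word boundaries keep a pattern narrow.

```python
import re

# Assumed log layout: "<YYYY-MM-DD> <LEVEL> <free-text message>".
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\s+"
    r"(?P<level>INFO|WARN|ERROR)\s+"
    r"(?P<message>.*)$"
)

# Assumed ID format: "ORD-" plus exactly six digits. The \b anchors
# reject longer digit runs and IDs embedded inside other tokens.
ORDER_ID = re.compile(r"\bORD-(?P<number>\d{6})\b")


def parse_line(line):
    """Return the structured fields of one log line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["order_ids"] = [m.group(0) for m in ORDER_ID.finditer(fields["message"])]
    return fields


parsed = parse_line("2024-03-05 ERROR payment failed for ORD-004217 and ORD-12")
unparsed = parse_line("not a log line")
```

In this sketch `ORD-12` is skipped because it has too few digits, and a line that does not fit the layout yields `None` instead of a partially filled record. The date pattern only checks shape, so a value like `2024-13-45` would still match; validating it would need a separate parsing step.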