import pandas as pd

df = pd.read_csv('customers.csv')

# Canonicalize column names: strip whitespace, lowercase, snake_case.
df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_', regex=False)

# Canonical text cleanup on key fields.
df['email'] = df['email'].str.strip().str.lower()
df['country'] = df['country'].fillna('unknown').str.strip().str.upper()

# Explicit type conversion; malformed values become NaT/NaN instead of raising.
df['signup_date'] = pd.to_datetime(df['signup_date'], errors='coerce')
df['lifetime_value'] = pd.to_numeric(df['lifetime_value'], errors='coerce')

# Drop rows with no email first, then keep one record per email.
# Sorting by signup_date makes keep='last' deterministic: the most
# recent signup wins rather than whichever row the file happened to list last.
df = df[df['email'].notna()]
df = df.sort_values('signup_date').drop_duplicates(subset=['email'], keep='last')

# Impute missing lifetime values with the median, which is robust to outliers.
df['lifetime_value'] = df['lifetime_value'].fillna(df['lifetime_value'].median())

# Per-column missing-value ratios, worst first.
missing_summary = (
    df.isna()
    .mean()
    .sort_values(ascending=False)
    .rename('missing_ratio')
)
assert missing_summary.loc['email'] == 0, 'email should be fully populated'
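The `errors='coerce'` conversions above fail silently, so it is worth seeing what they actually do. A minimal sketch with synthetic rows (hypothetical values, not from `customers.csv`) showing how malformed dates and numbers turn into NaT/NaN, and how counting them gives a cheap drift alarm:

```python
import pandas as pd

# Hypothetical malformed rows to demonstrate errors='coerce'.
raw = pd.DataFrame({
    'signup_date': ['2023-01-15', 'not a date', '2023-06-30'],
    'lifetime_value': ['19.99', 'N/A', '250'],
})

dates = pd.to_datetime(raw['signup_date'], errors='coerce')
values = pd.to_numeric(raw['lifetime_value'], errors='coerce')

# Coercion failures become NaT/NaN; count them before imputing anything.
bad_dates = int(dates.isna().sum())
bad_values = int(values.isna().sum())
print(bad_dates, bad_values)  # 1 1
```

Logging these counts per ingest run makes a sudden spike in unparseable values visible immediately, rather than being averaged away by the median imputation.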
Real data arrives dirty. I usually start with missing-value audits, duplicate removal, explicit type conversion, and canonical text cleanup. The trick is to make each cleanup rule reproducible rather than burying it in notebook state. I prefer small, composable transformations and assertions that fail loudly when source feeds drift.
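The "small, composable transformations" idea can be sketched with `DataFrame.pipe`, which chains named cleanup steps instead of accumulating notebook state. The helper names here (`normalize_emails`, `require_unique`) are hypothetical, not part of the script above:

```python
import pandas as pd

def normalize_emails(df: pd.DataFrame) -> pd.DataFrame:
    # Pure function: returns a cleaned copy rather than mutating in place.
    out = df.copy()
    out['email'] = out['email'].str.strip().str.lower()
    return out

def require_unique(df: pd.DataFrame, col: str) -> pd.DataFrame:
    # Fail loudly the moment a source feed starts sending duplicates.
    dupes = int(df[col].duplicated().sum())
    assert dupes == 0, f'{dupes} duplicate values in {col!r}'
    return df

raw = pd.DataFrame({'email': [' A@x.com ', 'b@x.com']})
clean = raw.pipe(normalize_emails).pipe(require_unique, 'email')
print(clean['email'].tolist())  # ['a@x.com', 'b@x.com']
```

Because each step is a named function, the pipeline reads as documentation of the cleanup rules, and any step can be unit-tested against a hand-built DataFrame.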