import pandas as pd

df = pd.read_csv(
    'orders.csv',
    parse_dates=['created_at'],
    dtype={
        'customer_id': 'int64',
        'country': 'string',
        'status': 'category',
    },
)

# Normalize column names once, up front: strip whitespace,
# lowercase, and replace spaces with underscores.
df.columns = (
    df.columns.str.strip()
    .str.lower()
    .str.replace(' ', '_', regex=False)
)

# One .loc call with a boolean mask and an explicit column list;
# .copy() makes the result independent of df.
active_orders = df.loc[
    (df['status'] == 'paid') &
    (df['country'].isin(['US', 'CA'])),
    ['customer_id', 'created_at', 'total_amount']
].copy()

active_orders = active_orders.set_index('created_at').sort_index()

# DataFrame.last('30D') is deprecated (pandas 2.1+); slice the last
# 30 days relative to the newest timestamp explicitly instead.
cutoff = active_orders.index.max() - pd.Timedelta(days=30)
latest_orders = active_orders.loc[cutoff:]

print(latest_orders.head())
print(latest_orders.dtypes)
I treat pandas as the default workbench for structured data. The goal is to make loading explicit, indexes predictable, and selection readable under maintenance pressure. That means stable column naming, typed parsing for dates and categoricals at load time, and no chained indexing. Once a DataFrame is shaped this way, downstream feature engineering and model training get dramatically simpler.
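The chained-indexing point deserves a concrete illustration. A minimal sketch (the toy DataFrame below is my own, not from the pipeline above): chaining two `[]` lookups for an assignment may write to a temporary copy and trigger `SettingWithCopyWarning`, while a single `.loc` call selects rows and column in one indexing operation, so the write reliably lands in the original frame.

```python
import pandas as pd

# Hypothetical toy frame, just to demonstrate the pattern.
df = pd.DataFrame({
    'status': ['paid', 'open', 'paid'],
    'total_amount': [10.0, 20.0, 30.0],
})

# Chained form — df[df['status'] == 'paid']['total_amount'] = 0.0 — may
# operate on a copy, so the assignment can silently disappear.
# The single-.loc form below selects rows and the target column together:
df.loc[df['status'] == 'paid', 'total_amount'] = 0.0

print(df['total_amount'].tolist())  # → [0.0, 20.0, 0.0]
```

The same rule applies to reads when you intend to mutate later: take the subset with one `.loc` and an explicit `.copy()`, as in the order-filtering step above.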