import pandas as pd

# Load raw events and derive a month key (e.g. '2024-01') for grouping.
df = pd.read_parquet('events.parquet')
df['event_date'] = pd.to_datetime(df['event_date'])
df['month'] = df['event_date'].dt.to_period('M').astype(str)
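# One row per (month, channel), using named aggregation for explicit column names.
channel_summary = (
    df.groupby(['month', 'acquisition_channel'], as_index=False)
    .agg(
        sessions=('session_id', 'nunique'),
        users=('user_id', 'nunique'),
        revenue=('revenue', 'sum'),
        # Mean revenue per event row; this equals average order value
        # only if each row represents exactly one order.
        avg_order_value=('revenue', 'mean'),
    )
)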
revenue_matrix = pd.pivot_table(
    channel_summary,
    index='month',
    columns='acquisition_channel',
    values='revenue',
    fill_value=0,
    aggfunc='sum',
)
print(channel_summary.head())
print(revenue_matrix.tail())
I reach for groupby when I need reproducible aggregates to feed dashboards or analytical reports. Explicit aggregation names (sessions, users, revenue) matter because these outputs frequently get joined back into feature tables or exported to BI systems, where ambiguous or auto-generated column names tend to collide. pivot_table is useful when stakeholders want category-by-time summaries, such as revenue per channel per month, without manual spreadsheet work.
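
As a rough illustration of the "joined back into feature tables" step, here is a minimal sketch on a small, made-up events frame. The toy rows, the `features` frame, and the renamed column `channel_month_revenue` are assumptions for illustration only; the column names otherwise mirror the snippet above.

```python
import pandas as pd

# Hypothetical toy data standing in for events.parquet.
events = pd.DataFrame(
    {
        "event_date": pd.to_datetime(
            ["2024-01-05", "2024-01-20", "2024-01-22", "2024-02-03", "2024-02-10"]
        ),
        "acquisition_channel": ["email", "email", "paid", "email", "email"],
        "session_id": ["s1", "s2", "s3", "s4", "s5"],
        "user_id": ["u1", "u1", "u2", "u1", "u3"],
        "revenue": [10.0, 30.0, 20.0, 5.0, 15.0],
    }
)
events["month"] = events["event_date"].dt.to_period("M").astype(str)

# Named aggregation keeps output column names explicit and stable.
channel_summary = events.groupby(
    ["month", "acquisition_channel"], as_index=False
).agg(
    sessions=("session_id", "nunique"),
    users=("user_id", "nunique"),
    revenue=("revenue", "sum"),
)

# Join the aggregates back onto the event rows as features.
# The summed revenue is renamed so it does not collide with the row-level
# 'revenue' column; validate="many_to_one" raises if the summary ever
# contains duplicate (month, channel) keys.
features = events.merge(
    channel_summary.rename(columns={"revenue": "channel_month_revenue"}),
    on=["month", "acquisition_channel"],
    how="left",
    validate="many_to_one",
)

# Category-by-time view; fill_value=0 covers channel/month pairs with no rows
# (here, 'paid' has no events in 2024-02).
revenue_matrix = pd.pivot_table(
    channel_summary,
    index="month",
    columns="acquisition_channel",
    values="revenue",
    fill_value=0,
    aggfunc="sum",
)

print(features[["month", "acquisition_channel", "revenue", "channel_month_revenue"]])
print(revenue_matrix)
```

The left join keeps every event row, so the feature table has the same length as the input, and the `validate` argument acts as a cheap guard against a summary that is accidentally not unique per key.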