WITH ordered_events AS (
SELECT
customer_id,
event_time,
revenue,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY event_time DESC) AS event_rank,
LAG(event_time) OVER (PARTITION BY customer_id ORDER BY event_time) AS previous_event_time,
SUM(revenue) OVER (
PARTITION BY customer_id
ORDER BY event_time
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS rolling_7_event_revenue
FROM customer_events
)
SELECT
customer_id,
MAX(CASE WHEN event_rank = 1 THEN event_time END) AS latest_event_time,
AVG(EXTRACT(EPOCH FROM (event_time - previous_event_time)) / 3600.0) AS avg_hours_between_events,
MAX(rolling_7_event_revenue) AS max_recent_revenue
FROM ordered_events
GROUP BY customer_id;
A surprising amount of feature engineering is best done in SQL before Python ever runs. ROW_NUMBER, LAG, rolling windows, and partitioned aggregates are ideal for deriving customer behavior signals close to the source. I use SQL here when it reduces movement, ambiguity, and notebook-only logic.