ColumnTransformer pipelines that keep preprocessing honest

Dr. Elena Vasquez Apr 2026

from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.ensemble import RandomForestClassifier

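# Route each group of columns through its own preprocessing branch.
preprocessor = ColumnTransformer([
    # Numeric: fill gaps with the median, then standardize.
    ('num', Pipeline([
        ('imputer', SimpleImputer(strategy='median')),
        ('scaler', StandardScaler()),
    ]), ['age', 'income', 'days_since_last_login']),
    # Categorical: fill gaps with the mode; unseen categories encode as all zeros.
    ('cat', Pipeline([
        ('imputer', SimpleImputer(strategy='most_frequent')),
        ('encoder', OneHotEncoder(handle_unknown='ignore')),
    ]), ['country', 'plan_tier']),
    # Text: the column is named as a plain string (not a list) so the vectorizer
    # receives 1-D input. It expects strings, so fill missing text upstream.
    ('text', TfidfVectorizer(max_features=5000, ngram_range=(1, 2)), 'support_ticket_text'),
])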

model = Pipeline([
    ('preprocessor', preprocessor),
    ('classifier', RandomForestClassifier(n_estimators=300, random_state=42, n_jobs=-1)),
])

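I push nearly all preprocessing into a Pipeline so that training and inference share exactly the same logic. ColumnTransformer is the workhorse here because real-world tables mix numeric, categorical, boolean, and text fields, and each type needs its own treatment. Because the imputers, scaler, encoder, and vectorizer are fitted inside the pipeline, they learn their statistics only from the data passed to fit, and the whole model can be saved and reloaded as a single object instead of a set of fragile pre-fit artifacts managed by hand.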
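
To make the "same logic for training and inference" point concrete, here is a minimal sketch of the pipeline in use. The toy rows, the labels, the `build_model` helper, and the reduced `n_estimators=50` are all assumptions for illustration, not part of the original snippet. It fits on a handful of made-up records, cross-validates, and then predicts on new rows that contain a missing number and a country the encoder has never seen.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_model():
    """Hypothetical helper: rebuilds the pipeline from the snippet above."""
    preprocessor = ColumnTransformer([
        ('num', Pipeline([
            ('imputer', SimpleImputer(strategy='median')),
            ('scaler', StandardScaler()),
        ]), ['age', 'income', 'days_since_last_login']),
        ('cat', Pipeline([
            ('imputer', SimpleImputer(strategy='most_frequent')),
            ('encoder', OneHotEncoder(handle_unknown='ignore')),
        ]), ['country', 'plan_tier']),
        ('text', TfidfVectorizer(max_features=5000, ngram_range=(1, 2)),
         'support_ticket_text'),
    ])
    return Pipeline([
        ('preprocessor', preprocessor),
        # Fewer trees than the original (assumption) to keep the toy run fast.
        ('classifier', RandomForestClassifier(n_estimators=50, random_state=42)),
    ])


# Made-up training data; only the column names come from the snippet.
train = pd.DataFrame({
    'age': [25, 40, np.nan, 31, 52, 46, 29, 38],
    'income': [32000, 81000, 45000, np.nan, 99000, 67000, 38000, 72000],
    'days_since_last_login': [1, 45, 3, 60, 2, 30, 5, 90],
    'country': ['US', 'DE', 'US', 'FR', 'FR', 'DE', 'US', 'FR'],
    'plan_tier': ['free', 'pro', 'free', 'pro', 'free', 'pro', 'free', 'pro'],
    'support_ticket_text': [
        'love the product', 'cannot log in please cancel',
        'how do I export data', 'billing is wrong cancel my plan',
        'great support thanks', 'app keeps crashing want refund',
        'feature request for dark mode', 'cancel subscription now',
    ],
})
labels = np.array([0, 1, 0, 1, 0, 1, 0, 1])  # hypothetical churn flags

# Each fold refits every preprocessing step on that fold's training rows only.
scores = cross_val_score(build_model(), train, labels, cv=2)

model = build_model().fit(train, labels)

# New rows: a missing age and a country the encoder never saw during fit.
new_rows = pd.DataFrame({
    'age': [np.nan, 33],
    'income': [50000, 61000],
    'days_since_last_login': [70, 2],
    'country': ['BR', 'US'],
    'plan_tier': ['pro', 'free'],
    'support_ticket_text': ['please cancel my account', 'thanks for the help'],
})
predictions = model.predict(new_rows)
print(scores, predictions)
```

Passing the whole pipeline to `cross_val_score`, rather than a pre-transformed matrix, is what keeps the medians, scaling statistics, category lists, and TF-IDF vocabulary from seeing held-out rows. With eight rows the scores themselves mean nothing; the point is that nothing outside `model` has to be fitted, stored, or replayed at prediction time.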
