import great_expectations as gx
context = gx.get_context()
data_source = context.data_sources.add_pandas(name='training_data')
asset = data_source.add_dataframe_asset(name='churn_asset')
batch_definition = asset.add_batch_definition_whole_dataframe('full_dataframe')
# df is the pandas DataFrame holding the current training data
batch = batch_definition.get_batch(batch_parameters={'dataframe': df})
suite = context.suites.add(gx.ExpectationSuite(name='churn_training_suite'))
# In the GX 1.x fluent API, expectations are added to the suite (not called on the batch)
suite.add_expectation(gx.expectations.ExpectColumnValuesToNotBeNull(column='customer_id'))
suite.add_expectation(gx.expectations.ExpectColumnValuesToBeBetween(column='age', min_value=18, max_value=100))
suite.add_expectation(gx.expectations.ExpectColumnDistinctValuesToBeInSet(column='churned', value_set=[0, 1]))
suite.add_expectation(gx.expectations.ExpectColumnProportionOfUniqueValuesToBeBetween(column='customer_id', min_value=1.0, max_value=1.0))
# Run the suite against the batch; results.success is False if any expectation fails
results = batch.validate(suite)
Before retraining, I want hard guarantees that the data feed still looks structurally sane. Great Expectations gives teams a shared validation language that analysts, ML engineers, and data engineers can all inspect. I use it to codify invariants that should never silently drift.
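For readers without Great Expectations installed, here is what the four invariants amount to, expressed as plain pandas assertions. This is a sketch with made-up toy data (the column names match the suite; the values are hypothetical), not a replacement for running the suite against the live feed:

```python
import pandas as pd

# Toy feed with the same schema the suite guards (hypothetical values)
df = pd.DataFrame({
    'customer_id': ['c1', 'c2', 'c3'],
    'age': [34, 52, 29],
    'churned': [0, 1, 0],
})

# The four expectations, restated as pandas checks:
assert df['customer_id'].notna().all()               # no null customer IDs
assert df['age'].between(18, 100).all()              # ages within [18, 100]
assert set(df['churned'].unique()) <= {0, 1}         # label is strictly binary
assert df['customer_id'].nunique() / len(df) == 1.0  # every customer ID unique
```

The last check mirrors `ExpectColumnProportionOfUniqueValuesToBeBetween` with both bounds at 1.0: the ratio of distinct IDs to rows must be exactly 1, i.e. no duplicate customers in the training set.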