import great_expectations as gx
context = gx.get_context()
data_source = context.data_sources.add_pandas(name='training_data')
asset = data_source.add_dataframe_asset(name='churn_asset')
batch_definition = asset.add_batch_definition_whole_dataframe('full_dataframe')
# df is the pandas DataFrame holding the current training data
batch = batch_definition.get_batch(batch_parameters={'dataframe': df})
suite = context.suites.add(gx.ExpectationSuite(name='churn_training_suite'))
# In the GX 1.x fluent API, expectations are added to the suite (not called on the batch)
suite.add_expectation(gx.expectations.ExpectColumnValuesToNotBeNull(column='customer_id'))
suite.add_expectation(gx.expectations.ExpectColumnValuesToBeBetween(column='age', min_value=18, max_value=100))
suite.add_expectation(gx.expectations.ExpectColumnDistinctValuesToBeInSet(column='churned', value_set=[0, 1]))
suite.add_expectation(gx.expectations.ExpectColumnProportionOfUniqueValuesToBeBetween(column='customer_id', min_value=1.0, max_value=1.0))
# Run the suite against the batch; results.success is False if any expectation fails
results = batch.validate(suite)
Before retraining, I want hard guarantees that the data feed still looks structurally sane. Great Expectations gives teams a shared validation language that analysts, ML engineers, and data engineers can all inspect. I use it to codify invariants that should never silently drift.
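For readers without Great Expectations installed, here is what the four invariants amount to, expressed as plain pandas assertions. This is a sketch with made-up toy data (the column names match the suite; the values are hypothetical), not a replacement for running the suite against the live feed:

```python
import pandas as pd

# Toy feed with the same schema the suite guards (hypothetical values)
df = pd.DataFrame({
    'customer_id': ['c1', 'c2', 'c3'],
    'age': [34, 52, 29],
    'churned': [0, 1, 0],
})

# The four expectations, restated as pandas checks:
assert df['customer_id'].notna().all()               # no null customer IDs
assert df['age'].between(18, 100).all()              # ages within [18, 100]
assert set(df['churned'].unique()) <= {0, 1}         # label is strictly binary
assert df['customer_id'].nunique() / len(df) == 1.0  # every customer ID unique
```

The last check mirrors `ExpectColumnProportionOfUniqueValuesToBeBetween` with both bounds at 1.0: the ratio of distinct IDs to rows must be exactly 1, i.e. no duplicate customers in the training set.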