A/B testing analysis with confidence intervals and guardrails

Experiment analysis should not stop at a binary win-or-lose label. I calculate uplift, confidence intervals, and guardrail metrics like latency or refund rate before recommending rollout. The point of the analysis is decision quality, not statistical significance for its own sake.
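A minimal sketch of that calculation, using only the standard library. The conversion counts are hypothetical, and the interval is a simple Wald CI on the absolute difference; a real analysis would also cover the guardrail metrics.

```python
import math

def uplift_ci(conv_a, n_a, conv_b, n_b, z=1.96):
    """Relative uplift of B over A plus a 95% Wald CI on the absolute difference."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return {
        "uplift_rel": diff / p_a,
        "diff_abs": diff,
        "ci_low": diff - z * se,
        "ci_high": diff + z * se,
    }

# Hypothetical experiment: 4.8% vs 5.4% conversion on 10k users per arm.
result = uplift_ci(conv_a=480, n_a=10_000, conv_b=540, n_b=10_000)
```

If the interval straddles zero, as it does here despite a 12.5% relative uplift, the honest recommendation is usually to keep collecting data rather than ship.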

Hypothesis testing for product experiments in Python

I use hypothesis testing to quantify whether observed differences are likely noise or signal, but I keep the business context attached. A tiny p-value without practical effect size is not a win. The code should make assumptions visible: sample sizes, variance assumptions, and the significance threshold chosen before the data came in.
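For conversion-style metrics, a two-proportion z-test is often enough, and it fits in a few stdlib lines so every assumption is on the page. This is a sketch, not a substitute for a proper stats library:

```python
import math

def two_proportion_z_test(x_a, n_a, x_b, n_b):
    """Two-sided z-test for a difference in proportions, pooled variance."""
    p_a, p_b = x_a / n_a, x_b / n_b
    p_pool = (x_a + x_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value via the standard normal survival function:
    # 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

The pooled-variance form assumes the null of equal proportions; report the effect size next to the p-value, never instead of it.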

Jupyter notebook setup that stays reproducible and reviewable

Notebooks are great for exploration but dangerous when they become invisible production dependencies. I keep them reproducible by pinning environments, clearing stale state, and structuring them so rerunning from top to bottom works every time. If a result cannot survive a clean restart-and-run-all, it is not ready to be shared.
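One habit that helps is a dedicated first cell that records the interpreter and seeds randomness, so a rerun starts from a known state. A stdlib-only sketch (a real notebook would also seed numpy/torch and pin packages in a lockfile; the seed value is arbitrary):

```python
# First notebook cell: record the environment and seed stdlib randomness
# so a top-to-bottom rerun produces identical results.
import platform
import random
import sys

SEED = 42  # hypothetical project-wide seed
random.seed(SEED)

print("python", platform.python_version())
print("executable", sys.executable)
```

Printing the interpreter path catches the classic "wrong kernel" failure before it costs anyone an afternoon.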

Fine-tuning transformer models for domain text classification

Fine-tuning pays off when domain language differs from general web text and you have enough labeled examples to justify it. I keep the training recipe conservative: class weighting if needed, early stopping, mixed precision when available, and metrics computed on a held-out split that mirrors production data.
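Early stopping is the part of that recipe most worth getting exactly right, so here is a framework-agnostic sketch of the logic I wire into the training loop. It assumes a loss-like metric where lower is better; the interface is my own, not any library's:

```python
class EarlyStopping:
    """Stop training when a monitored metric stops improving.

    Call step(metric) once per epoch; it returns True when training
    should stop (no improvement for `patience` consecutive epochs).
    """

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")  # assumes lower-is-better, e.g. validation loss
        self.bad_epochs = 0

    def step(self, metric: float) -> bool:
        if metric < self.best - self.min_delta:
            self.best = metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

The `min_delta` knob matters in practice: without it, noise-level "improvements" keep a stalled run alive.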

Using Hugging Face transformers for modern NLP inference

I use transformers when the text task justifies contextual modeling and the serving budget can handle it. The fastest path to value is usually starting with pretrained checkpoints, measuring latency, and then deciding whether quantization, distillation, or a smaller checkpoint is worth the extra work.
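The latency-measurement step can be this simple. A stdlib harness with a stub callable standing in for the real model; in practice `predict` would wrap a transformers pipeline, and you would run far more than a handful of inputs:

```python
import statistics
import time

def measure_latency(predict, inputs, warmup=3):
    """Rough per-request latency check: warm up, then time each call in ms."""
    for x in inputs[:warmup]:
        predict(x)  # warmup calls absorb lazy initialization / caching
    timings = []
    for x in inputs:
        start = time.perf_counter()
        predict(x)
        timings.append((time.perf_counter() - start) * 1000.0)
    qs = statistics.quantiles(timings, n=20)  # 19 cut points: qs[9]=p50, qs[18]=p95
    return {"p50_ms": qs[9], "p95_ms": qs[18]}
```

The p95 is the number that decides whether quantization is worth it; the mean hides exactly the tail that users feel.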

Custom Datasets and DataLoaders for robust training input pipelines

Input pipelines are part of the model system, not an afterthought. I keep dataset classes deterministic, move expensive transforms into explicit stages, and use DataLoader settings that match hardware limits. Good batching and collation logic can remove throughput bottlenecks before anyone touches the model itself.
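Collation is where determinism is easiest to lose, so here is the shape of a padding collate function for variable-length token sequences. Plain lists stand in for tensors to keep the sketch self-contained; the torch version is the same logic over `torch.full` and `copy_`:

```python
def pad_collate(batch, pad_value=0):
    """Deterministic collate for variable-length sequences.

    Pads every sequence to the batch max length and returns the padded
    batch plus the original lengths, which downstream masking needs.
    """
    lengths = [len(seq) for seq in batch]
    max_len = max(lengths)
    padded = [seq + [pad_value] * (max_len - len(seq)) for seq in batch]
    return padded, lengths
```

Returning the lengths alongside the batch is the detail that keeps attention masks and packed sequences honest later.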

Transfer learning with pretrained torchvision backbones

Transfer learning is the right default when labeled data is limited and time matters. I usually freeze the backbone first, train the head, then selectively unfreeze deeper layers if the domain gap justifies it. This strategy converges faster and is much less likely to wreck the pretrained features.
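The freeze plan reduces to a name-prefix decision over the model's parameters. A sketch with plain strings standing in for `model.named_parameters()`; the parameter names and the `head.` prefix are hypothetical, and in torch the result would drive `param.requires_grad`:

```python
def select_trainable(param_names, unfreeze_prefixes=("head.",)):
    """Freeze-then-unfreeze plan: only parameters whose names start with
    one of `unfreeze_prefixes` are marked trainable."""
    return {
        name: any(name.startswith(p) for p in unfreeze_prefixes)
        for name in param_names
    }

# Stage 1: train the head only; stage 2 would widen the prefixes,
# e.g. ("head.", "backbone.layer4."), to unfreeze the deepest block.
plan = select_trainable(
    ["backbone.layer4.conv.weight", "head.weight", "head.bias"],
)
```

Expressing the plan as data rather than scattered `requires_grad = False` lines makes the staged unfreezing easy to review and log.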

Convolutional neural networks for image classification in PyTorch

For image work, I start with a compact CNN before reaching for heavy pretrained models. That baseline helps confirm whether labels, normalization, and augmentation are sane. It also makes failure cases easier to explain because the model architecture is small enough to hold in your head.
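Holding the architecture in your head mostly means being able to do the shape arithmetic. The standard formula, plus a sanity check on hypothetical 32x32 inputs:

```python
def conv2d_out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer:
    floor((W - K + 2P) / S) + 1."""
    return (size - kernel + 2 * padding) // stride + 1

# Hypothetical compact stack on 32x32 images:
s = conv2d_out_size(32, kernel=3, padding=1)   # "same" 3x3 conv keeps 32
s = conv2d_out_size(s, kernel=2, stride=2)     # 2x2 max-pool halves to 16
```

If the flatten layer's input size ever surprises you, this one-liner is faster than reading a stack trace about mismatched matrix shapes.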

A clean PyTorch training loop with validation and checkpoints

The training loop is where research code either becomes maintainable or turns into a mess. I keep it explicit: train phase, validation phase, scheduler step, metric tracking, and checkpoint saving. That structure pays off immediately when experiments need to be compared, resumed, or reproduced later.
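The skeleton of that structure, framework-agnostic so the control flow is visible. The three callables are hypothetical stand-ins for real train/eval/save code; a scheduler step would slot in right after validation:

```python
def run_training(train_epoch, validate, epochs, save_checkpoint):
    """Explicit loop: train, validate, track metrics, keep the best checkpoint."""
    best_val = float("inf")
    history = []
    for epoch in range(epochs):
        train_loss = train_epoch(epoch)
        val_loss = validate(epoch)
        history.append((epoch, train_loss, val_loss))
        if val_loss < best_val:      # checkpoint only on validation improvement
            best_val = val_loss
            save_checkpoint(epoch)
    return best_val, history
```

Returning the full history, not just the best value, is what makes runs comparable after the fact.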

PyTorch tensor basics and automatic differentiation

I treat PyTorch tensors like the main vocabulary of deep learning work. Understanding device placement, shape semantics, and autograd is more important than memorizing model classes. Once that foundation is solid, debugging training loops gets much easier.
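The fastest way I know to build intuition for what autograd returns is to check it against a central finite difference. A stdlib sketch of that check on a scalar function; in torch the comparison target would be `x.grad` after `loss.backward()`:

```python
def numeric_grad(f, x, eps=1e-6):
    """Central-difference estimate of df/dx — the value autograd should agree with."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# For f(x) = x**2 the analytic gradient is 2x, so at x=3 we expect ~6.
g = numeric_grad(lambda x: x * x, 3.0)
```

The same trick, vectorized, is how gradcheck-style utilities catch broken custom backward passes.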

Word embeddings with gensim for semantic similarity tasks

Dense embeddings help when lexical overlap is weak but semantic similarity matters. I use them for retrieval prototypes, clustering, and feature enrichment when transformer infrastructure is overkill. The main discipline is keeping training data clean and sanity-checking nearest neighbors before trusting the vectors.
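The similarity metric underneath all of those uses is cosine similarity. A stdlib version with plain lists standing in for gensim `KeyedVectors`, which exposes the same computation via `most_similar` and `similarity`:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two dense vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

Because it ignores magnitude, cosine is the right default for embeddings whose norms reflect frequency more than meaning.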

Text vectorization with TF-IDF for strong classical baselines

Before I fine-tune transformers, I almost always try a TF-IDF baseline. It is fast, interpretable, and often surprisingly competitive for moderate text classification tasks. If a linear model over sparse features is already good enough, that is usually the right place to stop.
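To keep the baseline's weighting from being a black box, here is the computation by hand: raw term counts times a smoothed idf. In real projects I reach for sklearn's `TfidfVectorizer`, whose defaults differ slightly (it also L2-normalizes rows); this sketch just makes the formula concrete:

```python
import math

def tfidf(docs):
    """Minimal TF-IDF: tf * (ln((1 + N) / (1 + df)) + 1), per tokenized doc."""
    n = len(docs)
    df = {}
    for doc in docs:
        for term in set(doc):            # document frequency, not term frequency
            df[term] = df.get(term, 0) + 1
    return [
        {term: doc.count(term) * (math.log((1 + n) / (1 + df[term])) + 1)
         for term in set(doc)}
        for doc in docs
    ]
```

Terms that appear everywhere get idf close to 1; rare terms get boosted, which is exactly why a linear model over these features is so interpretable.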