nlp

Word embeddings with gensim for semantic similarity tasks

Dense embeddings help when lexical overlap is weak but semantic similarity matters. I use them for retrieval prototypes, clustering, and feature enrichment when transformer infrastructure is overkill. The main discipline is keeping training data clean.

Using Hugging Face transformers for modern NLP inference

I use transformers when the text task justifies contextual modeling and the serving budget can handle it. The fastest path to value is usually starting with pretrained checkpoints, measuring latency, and then deciding whether quantization, distillation, or a smaller architecture is warranted.
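A sketch of the pretrained-checkpoint starting point using the `pipeline` API; the checkpoint named here is one small public sentiment model, chosen only as an example, and the first run downloads weights:

```python
from transformers import pipeline

# A small, publicly hosted sentiment checkpoint; swap in whatever
# checkpoint matches the actual task before measuring latency.
clf = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

result = clf("The onboarding flow was fast and painless.")[0]
print(result["label"], round(result["score"], 3))
```

Timing this call end to end (tokenization included) gives the baseline number that the quantize-or-distill decision should be made against.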

Text vectorization with TF-IDF for strong classical baselines

Before I fine-tune transformers, I almost always try a TF-IDF baseline. It is fast, interpretable, and often surprisingly competitive for moderate text classification tasks. If a linear model over sparse features is already good enough, that is usually what I ship.
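The baseline can be as small as a scikit-learn pipeline; the tiny labeled set here is made up purely to show the shape of the approach:

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical support tickets with two labels, just to illustrate the shape.
texts = [
    "refund my order", "where is my package", "charge appeared twice",
    "package never arrived", "billing error on invoice", "delivery is late",
]
labels = ["billing", "shipping", "billing", "shipping", "billing", "shipping"]

# Sparse TF-IDF features plus a linear model: cheap, interpretable baseline.
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)

print(clf.predict(["package arrived late"]))
```

The vectorizer's vocabulary and the model's coefficients together make every prediction inspectable, which is half the appeal over a fine-tuned transformer.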

Natural language processing with spaCy pipelines and custom rules

I like spaCy for production NLP because it balances performance, ergonomics, and deployability. It is especially good for entity extraction, rule-based matching, and clean token-level processing. I often pair learned models with explicit match patterns when precision matters.
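A minimal sketch of the rule-based side using spaCy's Matcher on a blank English pipeline (no model download needed); the "SKU followed by digits" pattern is a hypothetical example of a high-precision domain rule:

```python
import spacy
from spacy.matcher import Matcher

# Blank English pipeline: tokenizer only, no trained components required.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Hypothetical product-code rule: the token "SKU" followed by a number.
matcher.add("PRODUCT_CODE", [[{"LOWER": "sku"}, {"IS_DIGIT": True}]])

doc = nlp("Customer reported that SKU 1234 shipped without a manual.")
for match_id, start, end in matcher(doc):
    print(nlp.vocab.strings[match_id], doc[start:end].text)
```

In a fuller pipeline these matches can be promoted to entities alongside a statistical NER component, which is where the learned-plus-rules pairing pays off.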