Incident response runbook and diagnostic scripts

Essential diagnostic commands and runbook procedures for production incidents. Quickly triage high CPU usage, memory leaks, full disks, and network issues with structured investigation scripts. Includes severity classification, escalation procedures, and pos…
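
The high-CPU and disk-full checks can start from a small, dependency-free snapshot. This is a minimal Python sketch, not the runbook's own script. The function name `triage_snapshot` and the fields it returns are hypothetical choices, and load averages are only available on Unix-like systems.

```python
import os
import shutil


def triage_snapshot(path: str = "/") -> dict:
    """Collect a quick CPU-load and disk-usage snapshot (hypothetical helper)."""
    cpus = os.cpu_count() or 1
    try:
        # 1- and 15-minute load averages; os.getloadavg() is Unix-only.
        load1, _, load15 = os.getloadavg()
    except (AttributeError, OSError):
        load1 = load15 = float("nan")
    usage = shutil.disk_usage(path)
    return {
        "cpus": cpus,
        # Sustained values above 1.0 mean more runnable (on Linux, also
        # I/O-blocked) tasks than cores.
        "load1_per_cpu": load1 / cpus,
        "load15_per_cpu": load15 / cpus,
        "disk_used_pct": 100.0 * usage.used / usage.total,
        "disk_free_gb": usage.free / 1e9,
    }


if __name__ == "__main__":
    for key, value in triage_snapshot().items():
        print(f"{key:>15}: {value:.2f}" if isinstance(value, float) else f"{key:>15}: {value}")
```

Comparing the 1-minute and 15-minute figures shows whether a CPU spike is new or sustained before you dig into individual processes.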

Load testing APIs with k6 for performance validation

Write comprehensive load tests using k6 to validate API performance before production deployments. Define scenarios with ramping virtual users (VUs), set thresholds for response times and error rates, test specific endpoints, and integrate results with CI/CD pipelines.
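
k6 thresholds are pass/fail rules such as `http_req_duration: ['p(95)<500']` and `http_req_failed: ['rate<0.01']`. To make their meaning concrete, the Python sketch below evaluates the same two rules over recorded samples. It is an illustration only: the `Sample` type and `check_thresholds` function are hypothetical, and the nearest-rank percentile used here may differ slightly from k6's own calculation.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    duration_ms: float  # analogous to k6's http_req_duration
    failed: bool        # analogous to k6's http_req_failed


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile (k6 may interpolate differently)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def check_thresholds(samples: list[Sample], p95_ms: float = 500.0,
                     max_error_rate: float = 0.01) -> dict[str, bool]:
    """Evaluate equivalents of 'p(95)<500' and 'rate<0.01'."""
    if not samples:
        raise ValueError("no samples recorded")
    durations = [s.duration_ms for s in samples]
    error_rate = sum(s.failed for s in samples) / len(samples)
    return {
        "http_req_duration p(95)": percentile(durations, 95) < p95_ms,
        "http_req_failed rate": error_rate < max_error_rate,
    }
```

As in k6, a CI job would fail the build when any entry is `False`. k6 itself signals this by exiting with a non-zero status when a threshold is crossed.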

Container registry management and image lifecycle

Manage container images across registries including Docker Hub, AWS ECR, and GitHub Container Registry. Automate image tagging strategies, implement lifecycle policies for cleanup, scan for vulnerabilities with Trivy, and set up cross-region replication.
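
For ECR, a cleanup lifecycle policy is a JSON document of prioritized rules. The Python sketch below generates one that expires untagged images after a number of days and keeps only the newest release images. The 14-day and 30-image limits and the `v` tag prefix are assumptions for illustration, not recommendations from the original text.

```python
import json


def ecr_lifecycle_policy(untagged_days: int = 14, keep_releases: int = 30,
                         release_prefix: str = "v") -> str:
    """Build an ECR lifecycle policy document as JSON text."""
    policy = {
        "rules": [
            {
                "rulePriority": 1,
                "description": f"Expire untagged images after {untagged_days} days",
                "selection": {
                    "tagStatus": "untagged",
                    "countType": "sinceImagePushed",
                    "countUnit": "days",
                    "countNumber": untagged_days,
                },
                "action": {"type": "expire"},
            },
            {
                "rulePriority": 2,
                "description": f"Keep only the newest {keep_releases} release images",
                "selection": {
                    "tagStatus": "tagged",
                    "tagPrefixList": [release_prefix],  # e.g. tags like v1.4.2
                    "countType": "imageCountMoreThan",
                    "countNumber": keep_releases,
                },
                "action": {"type": "expire"},
            },
        ]
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    print(ecr_lifecycle_policy())
```

Saved to a file, the output could be applied with `aws ecr put-lifecycle-policy --repository-name my-app --lifecycle-policy-text file://policy.json`, where `my-app` is a placeholder repository name.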

AWS Lambda serverless functions with Terraform

Deploy serverless functions on AWS Lambda using Terraform. Configure API Gateway integration, CloudWatch logging, environment variables, and IAM roles. Package Python or Node.js handlers with dependencies, set up event triggers, and manage function versions.
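
On the handler side, a Python function behind an API Gateway Lambda proxy integration receives the request as an `event` dict and returns a dict with `statusCode`, `headers`, and `body`. The sketch below is a minimal example under that assumption. The `GREETING` environment variable and the `name` query parameter are hypothetical.

```python
import json
import os


def handler(event, context):
    """Minimal handler for an API Gateway Lambda proxy integration."""
    # GREETING is a hypothetical variable, e.g. set in the Terraform
    # aws_lambda_function resource's environment { variables = { ... } } block.
    greeting = os.environ.get("GREETING", "hello")
    # queryStringParameters is None when the request has no query string.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    # Anything printed ends up in the function's CloudWatch Logs log group.
    print(json.dumps({"path": event.get("path"), "name": name}))
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"{greeting}, {name}"}),
    }
```

If this file is packaged as `app.py`, the Terraform `aws_lambda_function` resource would point at it with `handler = "app.handler"`.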

HashiCorp Vault for secrets management in Kubernetes

Integrate HashiCorp Vault with Kubernetes for dynamic secrets management. Use the Vault Agent sidecar injector to automatically inject secrets into pods, configure the KV secrets engine, and set up Kubernetes authentication. Eliminate hardcoded secrets from…
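
The Vault Agent injector is driven by pod annotations. `vault.hashicorp.com/agent-inject` turns injection on, `vault.hashicorp.com/role` names the Kubernetes-auth role, and each `vault.hashicorp.com/agent-inject-secret-<file>` annotation renders a secret to `/vault/secrets/<file>`. The Python helper below builds that annotation set. The role name and secret paths in the usage are hypothetical. The `secret/data/...` form assumes a KV version 2 engine mounted at `secret/`.

```python
def vault_agent_annotations(role: str, secrets: dict[str, str]) -> dict[str, str]:
    """Build pod-template annotations for the Vault Agent sidecar injector.

    `secrets` maps an output file name to a Vault secret path; the agent
    renders each one to /vault/secrets/<file name> inside the pod.
    """
    annotations = {
        "vault.hashicorp.com/agent-inject": "true",
        # Vault role bound to the pod's service account via Kubernetes auth.
        "vault.hashicorp.com/role": role,
    }
    for filename, path in secrets.items():
        annotations[f"vault.hashicorp.com/agent-inject-secret-{filename}"] = path
    return annotations


if __name__ == "__main__":
    # Hypothetical role and KV v2 path (note the /data/ segment for KV v2).
    print(vault_agent_annotations("myapp", {"db-creds": "secret/data/myapp/db"}))
```

These annotations belong under a Deployment's `spec.template.metadata.annotations`, so the injector sees them when each pod is created.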

ArgoCD GitOps continuous deployment for Kubernetes

Implement GitOps with ArgoCD for declarative, git-driven Kubernetes deployments. Configure Application and ApplicationSet resources, automated sync policies, health checks, and multi-environment promotion. Keep your cluster state in sync with your Git repository.
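
An Argo CD `Application` ties a Git source (`repoURL`, `targetRevision`, `path`) to a destination cluster and namespace. With `syncPolicy.automated`, Argo CD keeps the cluster in step with Git. The Python sketch below generates one Application per environment. The app name `myapp`, the `overlays/<env>` layout, the repository URL, and the release tag are all hypothetical.

```python
import json


def argocd_application(env: str, repo_url: str, revision: str) -> dict:
    """Build an Argo CD Application for a hypothetical overlays/<env> path."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": f"myapp-{env}", "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {"repoURL": repo_url, "targetRevision": revision,
                       "path": f"overlays/{env}"},
            "destination": {"server": "https://kubernetes.default.svc",
                            "namespace": f"myapp-{env}"},
            "syncPolicy": {
                # prune deletes resources removed from Git; selfHeal reverts manual drift.
                "automated": {"prune": True, "selfHeal": True},
                "syncOptions": ["CreateNamespace=true"],
            },
        },
    }


if __name__ == "__main__":
    # Promotion: dev follows main while prod is pinned to a release tag.
    revisions = {"dev": "main", "prod": "v1.4.2"}
    repo = "https://github.com/example/deploy-config.git"
    apps = [argocd_application(env, repo, rev) for env, rev in revisions.items()]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": apps}, indent=2))
```

Promoting a release then becomes a Git change that bumps the prod `targetRevision`. An ApplicationSet can generate the same per-environment objects inside the cluster instead of from a script.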

AWS VPC and networking with Terraform

Build production-ready AWS VPC infrastructure using Terraform. Create public and private subnets across availability zones, configure NAT gateways, internet gateways, and route tables. Implement network ACLs and VPC flow logs for security and observability.
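
Spreading subnets across availability zones starts with carving the VPC CIDR into equal blocks. The Python sketch below does the same arithmetic as Terraform's `cidrsubnet(var.vpc_cidr, newbits, index)`, using the standard `ipaddress` module. The /20 subnet size and the AZ names are assumptions for illustration.

```python
import ipaddress
from itertools import islice


def plan_subnets(vpc_cidr: str, azs: list[str], new_prefix: int = 20) -> dict[str, dict[str, str]]:
    """Assign one public and one private subnet per AZ from the VPC CIDR."""
    needed = 2 * len(azs)
    network = ipaddress.ip_network(vpc_cidr)
    # Take only the blocks we need instead of materializing every subnet.
    blocks = list(islice(network.subnets(new_prefix=new_prefix), needed))
    if len(blocks) < needed:
        raise ValueError("VPC CIDR too small for the requested subnets")
    # Blocks 0..n-1 are public and n..2n-1 private, i.e. the same results as
    # cidrsubnet(vpc_cidr, new_prefix - prefixlen, i) and (..., i + n).
    return {
        az: {"public": str(blocks[i]), "private": str(blocks[i + len(azs)])}
        for i, az in enumerate(azs)
    }


if __name__ == "__main__":
    for az, nets in plan_subnets("10.0.0.0/16", ["us-east-1a", "us-east-1b", "us-east-1c"]).items():
        print(az, nets)
```

Keeping public and private blocks in separate index ranges leaves room to add an AZ later without renumbering existing subnets, as long as enough blocks remain.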

Kubernetes Jobs and CronJobs for batch workloads

Run one-off tasks and scheduled batch processing in Kubernetes with Job and CronJob resources. Configure parallelism, backoff limits, completion counts, and cron schedules. Handle cleanup policies and monitor job history for reliable batch operations.
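
The settings named above all live in a `batch/v1` CronJob manifest. The schedule and concurrency policy sit on the CronJob, history limits control how many finished Jobs are kept, and parallelism, completions, and `backoffLimit` sit in the Job template. The Python sketch below emits such a manifest as JSON, which `kubectl apply -f` accepts. The job name, image, command, and every numeric value are hypothetical.

```python
import json


def cronjob_manifest(name: str, schedule: str, image: str, command: list[str]) -> dict:
    """Build a batch/v1 CronJob manifest (values are illustrative)."""
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,               # standard 5-field cron syntax
            "concurrencyPolicy": "Forbid",      # skip a run while the previous one is active
            "successfulJobsHistoryLimit": 3,    # finished Jobs kept for inspection
            "failedJobsHistoryLimit": 1,
            "jobTemplate": {
                "spec": {
                    "completions": 1,
                    "parallelism": 1,
                    "backoffLimit": 4,          # Pod retries before the Job is marked failed
                    "template": {
                        "spec": {
                            "restartPolicy": "Never",  # Jobs require Never or OnFailure
                            "containers": [{"name": name, "image": image, "command": command}],
                        }
                    },
                }
            },
        },
    }


if __name__ == "__main__":
    m = cronjob_manifest("nightly-report", "0 2 * * *", "busybox:1.36", ["sh", "-c", "echo done"])
    print(json.dumps(m, indent=2))
```

`concurrencyPolicy: Forbid` is a conservative choice for jobs that must not overlap. `Allow` (the default) or `Replace` fit workloads that tolerate overlapping or superseded runs.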

Docker networking: bridge, host, and overlay networks

Master Docker networking modes and custom network creation. Understand bridge networks for container isolation, host mode for direct host networking, and overlay networks for multi-host Swarm communication. Configure DNS resolution, port mapping, and…
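
Port mapping uses the `docker run -p` syntax `[ip:][hostPort:]containerPort[/protocol]`. The Python parser below makes each part explicit. It is a hypothetical helper for illustration, not part of Docker. It handles IPv4 addresses and single ports only (no IPv6 brackets or port ranges), and it assumes TCP when no protocol is given, which matches Docker's default.

```python
from typing import NamedTuple, Optional


class PortMapping(NamedTuple):
    host_ip: Optional[str]      # None means all interfaces
    host_port: Optional[int]    # None means Docker picks a random host port
    container_port: int
    protocol: str


def parse_publish(spec: str) -> PortMapping:
    """Parse a `docker run -p` value: [ip:][hostPort:]containerPort[/protocol]."""
    body, _, protocol = spec.partition("/")
    parts = body.split(":")
    if len(parts) == 1:
        host_ip, host_port, container = None, None, parts[0]
    elif len(parts) == 2:
        host_ip, host_port, container = None, parts[0], parts[1]
    elif len(parts) == 3:
        host_ip, host_port, container = parts
    else:
        raise ValueError(f"unsupported publish spec (IPv6 not handled): {spec!r}")
    return PortMapping(
        host_ip=host_ip or None,
        host_port=int(host_port) if host_port else None,
        container_port=int(container),
        protocol=protocol or "tcp",
    )
```

For example, `parse_publish("127.0.0.1:5353:53/udp")` describes a container's UDP port 53 reachable only from the host's loopback interface on port 5353.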

Linux system administration essentials for DevOps

Linux system administration is fundamental to DevOps. ps, top, and htop monitor processes and overall system activity. systemctl manages systemd services: starting, stopping, enabling, and disabling them. df, du, lsblk, and mount cover disk and storage management. journalctl reads service logs from the systemd journal.
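
Scripts can wrap these commands too. The Python sketch below runs `df -P` and parses its POSIX-format output to flag nearly full mounts. The helper names `parse_df` and `mounts_over` and the 90% default threshold are hypothetical. Rows whose capacity column is not a number (some pseudo-filesystems) are skipped.

```python
import subprocess


def parse_df(output: str) -> list[tuple[str, int]]:
    """Return (mount point, percent used) pairs from `df -P` output."""
    rows = []
    for line in output.strip().splitlines()[1:]:  # skip the header line
        fields = line.split(None, 5)  # the mount point (last field) may contain spaces
        if len(fields) == 6:
            capacity = fields[4].rstrip("%")
            if capacity.isdigit():
                rows.append((fields[5], int(capacity)))
    return rows


def mounts_over(threshold: int = 90) -> list[tuple[str, int]]:
    """Run `df -P` and keep mounts at or above the threshold."""
    out = subprocess.run(["df", "-P"], capture_output=True, text=True, check=True).stdout
    return [(mount, pct) for mount, pct in parse_df(out) if pct >= threshold]


if __name__ == "__main__":
    for mount, pct in mounts_over():
        print(f"{pct:3d}% {mount}")
```

Using `-P` keeps each filesystem on a single line with fixed columns, which is why a simple whitespace split is enough here.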

Terraform state management and workspace strategies

Terraform state records the mapping between your configuration and the real infrastructure it manages. Remote state backends such as S3, GCS, or Terraform Cloud enable team collaboration. With the S3 backend, a DynamoDB table provides state locking to prevent concurrent modifications. The terraform_remote_state data source lets one configuration read the outputs of another.
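
Two details help when you share state between teams and workspaces. First, the S3 backend stores non-default workspaces under `<workspace_key_prefix>/<workspace>/<key>`, where the prefix defaults to `env:`. Second, `terraform output -json` prints each output as an object with `value`, `type`, and `sensitive` fields. The Python helpers below capture both. The key `network/terraform.tfstate` and the workspace name `staging` are hypothetical.

```python
import json


def s3_state_key(key: str, workspace: str = "default", workspace_key_prefix: str = "env:") -> str:
    """Object key where the S3 backend keeps state for a given workspace."""
    if workspace == "default":
        return key  # the default workspace uses the configured key as-is
    return f"{workspace_key_prefix}/{workspace}/{key}"


def output_values(output_json: str) -> dict:
    """Flatten `terraform output -json` ({name: {value, type, sensitive}}) to {name: value}."""
    return {name: meta["value"] for name, meta in json.loads(output_json).items()}


if __name__ == "__main__":
    print(s3_state_key("network/terraform.tfstate", "staging"))
```

Knowing the exact object key makes it easier to audit or restore a workspace's state in the bucket. The flattened outputs are the same values a `terraform_remote_state` data source would expose to another configuration.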

Kubernetes StatefulSets for stateful workloads

StatefulSets manage stateful applications requiring stable identities and persistent storage. Unlike Deployments, StatefulSets provide ordered Pod creation (pod-0, pod-1, pod-2) and stable network identifiers. Each Pod gets a predictable hostname via a headless Service.
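
A StatefulSet's Pods are named `<statefulset>-<ordinal>` and get DNS records through the headless Service named in the StatefulSet's `serviceName` field. The Python sketch below lists those stable addresses, for example to seed a clustered database's peer list. The names `web` and `nginx` are hypothetical, and `cluster.local` is assumed as the cluster domain because it is the common default.

```python
def statefulset_pod_dns(name: str, service: str, replicas: int,
                        namespace: str = "default",
                        cluster_domain: str = "cluster.local") -> list[str]:
    """Stable per-Pod DNS names for a StatefulSet behind a headless Service."""
    # Ordinals start at 0 and Pods are created in order: name-0, name-1, ...
    return [f"{name}-{i}.{service}.{namespace}.svc.{cluster_domain}" for i in range(replicas)]


if __name__ == "__main__":
    print(statefulset_pod_dns("web", "nginx", 3))
```

Because these names survive Pod rescheduling, peers can find each other without any extra service discovery layer.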