EXPLAIN and query plan optimization

4262 views

                  -- Basic EXPLAIN
EXPLAIN
SELECT * FROM users WHERE email = 'alice@example.com';

-- EXPLAIN with cost and row estimates
-- Output shows: Seq Scan on users (cost=0.00..15.50 rows=1 width=100)

-- EXPLAIN ANALYZE: Actual execution with timing
EXPLAIN ANALYZE
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;

-- PostgreSQL: Detailed output
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT * FROM orders WHERE user_id = 123;

-- MySQL: Extended format
EXPLAIN FORMAT=JSON
SELECT * FROM users WHERE created_at > '2024-01-01';

-- Example output analysis:
/*
Seq Scan on users  (cost=0.00..18.50 rows=10 width=100)
                   (actual time=0.012..0.034 rows=8 loops=1)
  Filter: (created_at > '2024-01-01')
  Rows Removed by Filter: 2
Planning Time: 0.123 ms
Execution Time: 0.456 ms

Key metrics:
- cost: Estimated cost (startup..total)
- rows: Estimated rows (10) vs actual rows (8)
- width: Average row size in bytes
- actual time: Real execution time
- loops: How many times node executed
*/

-- Index scan vs Sequential scan
-- Without index: Sequential Scan
EXPLAIN SELECT * FROM orders WHERE status = 'pending';
-- Seq Scan on orders (cost=0.00..50.00 rows=100)

CREATE INDEX idx_orders_status ON orders(status);

-- With index: Index Scan
EXPLAIN SELECT * FROM orders WHERE status = 'pending';
-- Index Scan using idx_orders_status (cost=0.28..8.30 rows=100)

-- Bitmap Index Scan: Combines multiple indexes
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE status = 'pending' AND created_at > CURRENT_DATE - INTERVAL '7 days';
-- Bitmap Heap Scan on orders
--   Recheck Cond: ...
--   -> BitmapAnd
--        -> Bitmap Index Scan on idx_orders_status
--        -> Bitmap Index Scan on idx_orders_created_at

                  -- Join strategies

-- Nested Loop: Good for small datasets
EXPLAIN ANALYZE
SELECT u.name, o.total
FROM users u
INNER JOIN orders o ON u.id = o.user_id
WHERE u.id = 123;
-- Nested Loop (cost=... rows=10)
--   -> Index Scan on users (cost=... rows=1)
--   -> Index Scan on orders (cost=... rows=10)

-- Hash Join: Better for large datasets
EXPLAIN ANALYZE
SELECT u.name, COUNT(o.id)
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;
-- Hash Left Join (cost=... rows=1000)
--   -> Seq Scan on users (cost=... rows=1000)
--   -> Hash (cost=... rows=5000)
--         -> Seq Scan on orders

-- Merge Join: For sorted data
EXPLAIN ANALYZE
SELECT *
FROM orders o1
INNER JOIN orders o2 ON o1.user_id = o2.user_id
WHERE o1.id < o2.id;
-- Merge Join (cost=...)

-- Force index usage (MySQL)
SELECT * FROM users USE INDEX (idx_email)
WHERE email LIKE 'test%';

-- Subquery vs JOIN comparison
-- Subquery (may not use index efficiently)
EXPLAIN ANALYZE
SELECT * FROM users
WHERE id IN (SELECT user_id FROM orders WHERE total > 100);

-- JOIN (usually better)
EXPLAIN ANALYZE
SELECT DISTINCT u.*
FROM users u
INNER JOIN orders o ON u.id = o.user_id
WHERE o.total > 100;

-- Update statistics for better plans
ANALYZE users;
ANALYZE orders;

-- PostgreSQL: Check statistics
SELECT
  schemaname,
  tablename,
  last_analyze,
  last_autoanalyze,
  n_live_tup,
  n_dead_tup
FROM pg_stat_user_tables;

-- Disable specific plan nodes for testing
SET enable_seqscan = OFF;  -- Force index usage
EXPLAIN SELECT * FROM users WHERE created_at > '2024-01-01';
SET enable_seqscan = ON;

-- Common plan node types:
-- - Seq Scan: Full table scan
-- - Index Scan: B-tree index lookup
-- - Index Only Scan: Covering index (no table access)
-- - Bitmap Index Scan: Multiple index combination
-- - Nested Loop: Join strategy for small datasets
-- - Hash Join: Join strategy for large datasets
-- - Merge Join: Join on sorted data
-- - Sort: ORDER BY operation
-- - Aggregate: GROUP BY operation
-- - Limit: LIMIT clause

-- Identify slow queries (PostgreSQL)
SELECT
  calls,
  total_time,
  mean_time,
  query
FROM pg_stat_statements
ORDER BY mean_time DESC
LIMIT 20;

EXPLAIN reveals database execution plans. I use EXPLAIN ANALYZE for actual runtime statistics. Understanding plan nodes—Seq Scan, Index Scan, Nested Loop, Hash Join—guides optimization. Cost estimates predict query expense. Rows estimates show expected result size. Inaccurate statistics cause poor plans—run ANALYZE regularly. Sequential scans aren't always bad for small tables. Index scans excel for selective queries. Bitmap scans combine multiple indexes. Hash joins suit large datasets. Nested loops work best with small inner tables. Sort operations indicate ORDER BY cost. Understanding execution plans is essential for performance tuning. EXPLAIN is the first step in fixing slow queries.

Maria Garcia

More from Maria Garcia

EXPLAIN and query plan optimization

0 Comments

More from Maria Garcia