sql-optimization
No description provided.
Unlock Deep Analysis
Use AI to visualize the workflow and generate a realistic output preview for this skill.
Powered by Fastest LLM
SQL Query Optimization Patterns
Comprehensive guide to optimizing SQL queries for performance, including EXPLAIN plan analysis, index design strategies, query rewriting patterns, and N+1 query detection. Works across PostgreSQL, MySQL, and SQLite.
Quick Reference
When to use this skill:
- Slow database queries (>100ms for simple queries, >1s for complex)
- High database CPU usage
- Analyzing EXPLAIN plans
- Designing database indexes
- Debugging N+1 query problems
- Optimizing JOIN operations
- Reducing table scans
Common triggers:
- "This query is too slow"
- "How do I optimize this SQL"
- "What indexes should I add"
- "Explain this EXPLAIN plan"
- "Fix N+1 queries"
- "Database CPU at 100%"
Typical improvements:
- 3 seconds → 50ms (60x faster)
- Full table scan → Index scan
- 1000 queries → 2 queries (N+1 elimination)
Part 1: Understanding EXPLAIN Plans
Reading PostgreSQL EXPLAIN
EXPLAIN ANALYZE
SELECT u.name, p.title
FROM users u
JOIN posts p ON u.id = p.user_id
WHERE u.created_at > '2024-01-01'
ORDER BY p.created_at DESC
LIMIT 10;
Key Metrics to Watch:
- Seq Scan (bad): Full table scan, reads every row
- Index Scan (good): Uses index, reads only needed rows
- Cost: Estimated computational cost (lower is better)
- Actual time: Real execution time in milliseconds
- Rows: Number of rows processed at each step
Bad EXPLAIN Example
Seq Scan on users u (cost=0.00..1234.00 rows=1000 width=50)
(actual time=0.123..45.678 rows=950 loops=1)
Filter: (created_at > '2024-01-01'::date)
Rows Removed by Filter: 50000
Planning Time: 0.234 ms
Execution Time: 3456.789 ms
Problems:
- Seq Scan: Reading entire table (50,950 rows)
- Rows Removed by Filter: Filtering after reading (wasteful)
- Execution Time: 3.5 seconds (way too slow)
Good EXPLAIN Example (After Index)
Index Scan using users_created_at_idx on users u
(cost=0.29..123.45 rows=950 width=50)
(actual time=0.012..3.456 rows=950 loops=1)
Index Cond: (created_at > '2024-01-01'::date)
Planning Time: 0.123 ms
Execution Time: 4.567 ms
Improvements:
- Index Scan: Using index (only reads needed rows)
- Index Cond: Filtering during index scan (efficient)
- Execution Time: 4.5ms (750x faster!)
Part 2: Index Design Strategies
When to Add an Index
✅ Add index when:
- Column frequently in WHERE clauses
- Column frequently in JOIN conditions
- Column frequently in ORDER BY
- Query does Seq Scan on large table (>10K rows)
- Query execution time >100ms
❌ Don't add index when:
- Table has <1000 rows (Seq Scan is fast enough)
- Column rarely queried
- Column has very low cardinality (e.g., boolean with 50/50 distribution)
- Table has heavy writes (indexes slow down INSERTs/UPDATEs)
Single-Column Indexes
-- Create index on frequently queried column
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_posts_user_id ON posts(user_id);
CREATE INDEX idx_orders_created_at ON orders(created_at);
Query that benefits:
-- Before index: Seq Scan (3000ms)
-- After index: Index Scan (5ms)
SELECT * FROM users WHERE email = 'user@example.com';
Composite Indexes (Multi-Column)
Order matters! Put most selective column first.
-- Good: Specific to general
CREATE INDEX idx_orders_user_status ON orders(user_id, status, created_at);
-- Query that uses all columns
SELECT * FROM orders
WHERE user_id = 123
AND status = 'pending'
ORDER BY created_at DESC;
Index usage rules:
-- ✅ Uses index (matches from left)
WHERE user_id = 123
WHERE user_id = 123 AND status = 'pending'
WHERE user_id = 123 AND status = 'pending' AND created_at > '2024-01-01'
-- ❌ Doesn't use index (skips first column)
WHERE status = 'pending'
WHERE created_at > '2024-01-01'
Partial Indexes (Filtered)
Index only subset of rows (smaller, faster):
-- Only index active users
CREATE INDEX idx_users_active_email
ON users(email)
WHERE status = 'active';
-- Query that uses partial index
SELECT * FROM users
WHERE email = 'user@example.com'
AND status = 'active'; -- Must include filter condition!
Benefits:
- Smaller index (faster scans, less storage)
- Only indexes rows you actually query
- Great for status fields with skewed distribution
Covering Indexes (INCLUDE)
Include columns in index to avoid table access:
-- Postgres: INCLUDE clause
CREATE INDEX idx_users_email_covering
ON users(email)
INCLUDE (name, created_at);
-- Query doesn't need to access table (all data in index)
SELECT name, created_at FROM users WHERE email = 'user@example.com';
Benefit: Index-Only Scan (even faster than Index Scan)
Part 3: Query Rewriting Patterns
Problem 1: SELECT * (Fetching Unnecessary Data)
❌ Bad (fetches all columns):
SELECT * FROM users WHERE id = 123;
✅ Good (fetches only needed columns):
SELECT id, name, email FROM users WHERE id = 123;
Impact:
- Smaller result set
- Less memory usage
- Faster network transfer
- Can use covering indexes
Problem 2: N+1 Queries
❌ Bad (1 query for posts + N queries for users):
# 1 query
posts = db.execute("SELECT * FROM posts LIMIT 10")
# N queries (10 separate queries!)
for post in posts:
user = db.execute("SELECT * FROM users WHERE id = ?", post.user_id)
print(f"{user.name}: {post.title}")
✅ Good (2 queries total):
# 1 query for posts
posts = db.execute("SELECT * FROM posts LIMIT 10")
# 1 query for all users
user_ids = [p.user_id for p in posts]
users = db.execute("SELECT * FROM users WHERE id IN (?)", user_ids)
users_by_id = {u.id: u for u in users}
# No additional queries
for post in posts:
user = users_by_id[post.user_id]
print(f"{user.name}: {post.title}")
Even Better (1 query with JOIN):
SELECT u.name, p.title
FROM posts p
JOIN users u ON p.user_id = u.id
LIMIT 10;
Problem 3: Implicit Type Conversion
❌ Bad (index not used due to implicit conversion):
-- user_id is INTEGER, but '123' is string
SELECT * FROM posts WHERE user_id = '123'; -- Seq Scan!
✅ Good (explicit type, index used):
SELECT * FROM posts WHERE user_id = 123; -- Index Scan
Problem 4: Function Calls in WHERE Clause
❌ Bad (can't use index):
-- Function prevents index usage
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
✅ Good (use functional index):
-- Create index on function result
CREATE INDEX idx_users_lower_email ON users(LOWER(email));
-- Now query can use index
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
✅ Even Better (store lowercase, regular index):
-- Store email in lowercase
CREATE INDEX idx_users_email ON users(email);
-- Query with lowercase value
SELECT * FROM users WHERE email = LOWER('user@example.com');
Problem 5: OR Conditions
❌ Bad (often does Seq Scan):
SELECT * FROM users
WHERE email = 'user@example.com'
OR username = 'johndoe';
✅ Good (use UNION, allows index usage):
SELECT * FROM users WHERE email = 'user@example.com'
UNION
SELECT * FROM users WHERE username = 'johndoe';
Problem 6: NOT IN with Subquery
❌ Bad (very slow, especially with large subquery):
SELECT * FROM users
WHERE id NOT IN (SELECT user_id FROM blocked_users);
✅ Good (use LEFT JOIN with NULL check):
SELECT u.*
FROM users u
LEFT JOIN blocked_users b ON u.id = b.user_id
WHERE b.user_id IS NULL;
Part 4: JOIN Optimization
JOIN Order Matters
Database optimizes join order, but you can help:
Small table first, large table second:
-- Good: Join small table (categories: 10 rows) to large table (products: 10M rows)
SELECT p.*
FROM categories c
JOIN products p ON c.id = p.category_id
WHERE c.name = 'Electronics';
Avoid Cartesian Products
❌ Bad (produces rows1 × rows2 × rows3):
SELECT *
FROM users, posts, comments; -- 100 × 10000 × 50000 = 50 billion rows!
✅ Good (proper JOIN conditions):
SELECT *
FROM users u
JOIN posts p ON u.id = p.user_id
JOIN comments c ON p.id = c.post_id;
EXISTS vs IN for Subqueries
Scenario: Find users who have at least one post
❌ Slower (IN loads all post IDs):
SELECT * FROM users
WHERE id IN (SELECT user_id FROM posts);
✅ Faster (EXISTS stops at first match):
SELECT * FROM users u
WHERE EXISTS (SELECT 1 FROM posts p WHERE p.user_id = u.id);
JOIN vs Subquery
Use JOIN when you need data from both tables:
SELECT u.name, p.title
FROM users u
JOIN posts p ON u.id = p.user_id;
Use Subquery when you only need filtering:
SELECT * FROM users
WHERE id IN (SELECT DISTINCT user_id FROM posts);
Part 5: Aggregation Optimization
COUNT(*) Optimization
Problem: COUNT(*) on large tables is slow
❌ Slow (counts all rows):
SELECT COUNT(*) FROM orders; -- Full table scan on 10M rows
✅ Faster (approximate count from statistics):
-- PostgreSQL: Use statistics (fast but approximate)
SELECT reltuples::bigint AS estimate
FROM pg_class
WHERE relname = 'orders';
✅ Also Good (exact count with index):
-- If you have an index, COUNT can use index-only scan
CREATE INDEX idx_orders_id ON orders(id);
SELECT COUNT(*) FROM orders; -- Index-only scan
GROUP BY Optimization
❌ Bad (groups after fetching all data):
SELECT user_id, COUNT(*)
FROM posts
GROUP BY user_id;
✅ Good (index helps grouping):
-- Create index on grouped column
CREATE INDEX idx_posts_user_id ON posts(user_id);
SELECT user_id, COUNT(*)
FROM posts
GROUP BY user_id; -- Can use index for grouping
HAVING vs WHERE
❌ Bad (filters after grouping):
SELECT user_id, COUNT(*) as post_count
FROM posts
GROUP BY user_id
HAVING user_id > 1000; -- Filters after grouping all users
✅ Good (filters before grouping):
SELECT user_id, COUNT(*) as post_count
FROM posts
WHERE user_id > 1000 -- Filters before grouping (can use index)
GROUP BY user_id;
Part 6: Pagination Optimization
OFFSET is Slow for Large Offsets
❌ Bad (OFFSET scans all skipped rows):
-- Page 1000 (skips 50,000 rows, then returns 50)
SELECT * FROM posts
ORDER BY created_at DESC
LIMIT 50 OFFSET 50000; -- Scans 50,050 rows!
✅ Good (keyset pagination with WHERE):
-- Save last seen ID from previous page
SELECT * FROM posts
WHERE created_at < '2024-01-15 10:30:00' -- Last seen timestamp
ORDER BY created_at DESC
LIMIT 50; -- Only fetches 50 rows
Implementation:
# Page 1
results = db.execute("""
SELECT id, created_at, title
FROM posts
ORDER BY created_at DESC, id DESC
LIMIT 50
""")
last_created_at = results[-1].created_at
last_id = results[-1].id
# Page 2 (using last seen values)
results = db.execute("""
SELECT id, created_at, title
FROM posts
WHERE (created_at, id) < (%s, %s)
ORDER BY created_at DESC, id DESC
LIMIT 50
""", last_created_at, last_id)
Benefits:
- Consistent performance for any page number
- No OFFSET overhead
- Works with millions of rows
Part 7: Database-Specific Patterns
PostgreSQL-Specific
1. EXPLAIN ANALYZE (shows actual execution):
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM users WHERE email = 'user@example.com';
2. pg_stat_statements (find slow queries):
-- Enable extension
CREATE EXTENSION pg_stat_statements;
-- Find slowest queries
SELECT
calls,
mean_exec_time,
total_exec_time,
query
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
3. VACUUM and ANALYZE:
-- Update statistics after bulk changes
VACUUM ANALYZE table_name;
MySQL-Specific
1. EXPLAIN FORMAT=JSON (detailed output):
EXPLAIN FORMAT=JSON
SELECT * FROM users WHERE email = 'user@example.com';
2. SHOW PROFILE (detailed timing):
SET profiling = 1;
SELECT * FROM users WHERE email = 'user@example.com';
SHOW PROFILE FOR QUERY 1;
3. Query Cache (deprecated in MySQL 8.0):
-- Old MySQL versions only
SHOW VARIABLES LIKE 'query_cache%';
SQLite-Specific
1. EXPLAIN QUERY PLAN:
EXPLAIN QUERY PLAN
SELECT * FROM users WHERE email = 'user@example.com';
2. ANALYZE (update statistics):
ANALYZE table_name;
3. Indexes are critical (SQLite has no query optimizer):
-- SQLite relies heavily on indexes
CREATE INDEX idx_users_email ON users(email);
Part 8: Monitoring and Diagnosis
Find Missing Indexes (PostgreSQL)
-- Queries doing sequential scans on large tables
SELECT
schemaname,
tablename,
seq_scan,
seq_tup_read,
idx_scan,
seq_tup_read / seq_scan AS avg_seq_read
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 25;
Find Unused Indexes (PostgreSQL)
-- Indexes that are never used
SELECT
schemaname,
tablename,
indexname,
idx_scan,
pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelname NOT LIKE '%_pkey' -- Exclude primary keys
ORDER BY pg_relation_size(indexrelid) DESC;
Check Index Bloat
-- Postgres: Check for bloated indexes
SELECT
schemaname,
tablename,
indexname,
pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;
-- Rebuild bloated indexes
REINDEX INDEX index_name;
Part 9: Common Patterns by Use Case
Pattern 1: Recent Records with Pagination
-- Create composite index
CREATE INDEX idx_posts_created_user ON posts(created_at DESC, user_id);
-- Query uses index efficiently
SELECT * FROM posts
WHERE user_id = 123
ORDER BY created_at DESC
LIMIT 20;
Pattern 2: Search Across Multiple Columns
-- Create GIN index for full-text search (Postgres)
CREATE INDEX idx_posts_search ON posts
USING gin(to_tsvector('english', title || ' ' || content));
-- Fast full-text search
SELECT * FROM posts
WHERE to_tsvector('english', title || ' ' || content) @@
to_tsquery('english', 'database & optimization');
Pattern 3: Range Queries
-- Create index on range column
CREATE INDEX idx_orders_created_at ON orders(created_at);
-- Query uses index
SELECT * FROM orders
WHERE created_at BETWEEN '2024-01-01' AND '2024-01-31'
ORDER BY created_at;
Pattern 4: Distinct Values with Filter
-- Create composite index
CREATE INDEX idx_posts_category_user ON posts(category_id, user_id);
-- Query uses index
SELECT DISTINCT category_id
FROM posts
WHERE user_id = 123;
Part 10: Quick Optimization Checklist
Diagnosis:
- Run EXPLAIN ANALYZE on slow queries
- Check for Seq Scan on tables >10K rows
- Look for missing indexes
- Identify N+1 query patterns
- Check query execution time (aim for <100ms)
Index Strategy:
- Index columns in WHERE clauses
- Index columns in JOIN conditions
- Index columns in ORDER BY
- Use composite indexes for multi-column queries
- Consider partial indexes for filtered queries
- Remove unused indexes
Query Rewriting:
- Select only needed columns (not SELECT *)
- Use JOIN instead of N+1 queries
- Use proper types (avoid implicit conversion)
- Avoid functions in WHERE (use functional indexes)
- Use UNION instead of OR for multiple conditions
- Use EXISTS instead of IN for large subqueries
Pagination:
- Use keyset pagination (not OFFSET) for large datasets
- Include stable sort column (id) in pagination
- Index pagination columns
Monitoring:
- Enable query logging for slow queries
- Monitor index usage
- Check for index bloat
- Regularly VACUUM ANALYZE (Postgres)
Resources
PostgreSQL:
- EXPLAIN docs: https://www.postgresql.org/docs/current/sql-explain.html
- Index types: https://www.postgresql.org/docs/current/indexes-types.html
- Performance tips: https://wiki.postgresql.org/wiki/Performance_Optimization
MySQL:
- EXPLAIN output: https://dev.mysql.com/doc/refman/8.0/en/explain-output.html
- Optimization: https://dev.mysql.com/doc/refman/8.0/en/optimization.html
Tools:
- pgAdmin (Postgres GUI)
- MySQL Workbench
- DBeaver (multi-database)
- explain.depesz.com (Postgres EXPLAIN visualizer)
Content curated from original sources, copyright belongs to authors
User Rating
USER RATING
WORKS WITH