SQL Query Optimization: Mastering Database Performance

Table of Contents

  1. Introduction to Query Optimization
  2. Understanding Query Execution
  3. Indexing Strategies
  4. Query Analysis and Profiling
  5. Join Optimization
  6. Subquery and CTE Optimization
  7. Performance Tuning Techniques
  8. Database-Specific Optimizations
  9. Monitoring and Maintenance
  10. Best Practices and Anti-Patterns

Introduction to Query Optimization {#introduction}

SQL query optimization is the process of improving database query performance: reducing execution time and resource consumption while increasing overall throughput. Efficient queries are crucial for application performance, especially as data volumes grow.

Why Query Optimization Matters

  • Performance: Faster query execution improves user experience
  • Scalability: Optimized queries handle larger datasets efficiently
  • Resource Utilization: Reduced CPU, memory, and I/O consumption
  • Cost Reduction: Lower cloud infrastructure costs
  • Concurrency: Better performance under high user loads

Query Performance Factors

-- Factors affecting query performance:
-- 1. Table size and data distribution
-- 2. Index availability and quality
-- 3. Query complexity and structure
-- 4. Hardware resources (CPU, memory, storage)
-- 5. Database configuration
-- 6. Concurrent load

-- Example: Impact of proper indexing
-- Without index: O(n) linear scan
SELECT * FROM users WHERE email = 'john@example.com';

-- With index: O(log n) tree traversal
CREATE INDEX idx_users_email ON users(email);
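The before/after effect is easy to observe in an execution plan. Here is a minimal sketch using SQLite (which ships with Python); the table is a stand-in for the `users` table above, and PostgreSQL's `EXPLAIN` output looks different but tells the same story:

```python
import sqlite3

# In-memory stand-in for the users table from the examples above
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(1000)],
)

def plan(sql):
    """Return the EXPLAIN QUERY PLAN detail text as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(r[-1] for r in rows)

query = "SELECT * FROM users WHERE email = 'user42@example.com'"

before = plan(query)  # full table scan, e.g. "SCAN users"
conn.execute("CREATE INDEX idx_users_email ON users(email)")
after = plan(query)   # index lookup, mentioning idx_users_email

print(before)
print(after)
```

The same lookup goes from scanning every row to a single index search; on a real server the difference is what turns a linear scan into the O(log n) traversal described above.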

Query Optimization Process

-- 1. Identify slow queries
-- 2. Analyze execution plans
-- 3. Identify bottlenecks
-- 4. Apply optimization techniques
-- 5. Test and measure improvements
-- 6. Monitor ongoing performance

Understanding Query Execution {#execution}

Query Execution Phases

-- 1. Parsing: Syntax and semantic validation
-- 2. Optimization: Candidate plan generation and cost estimation
-- 3. Compilation: Translating the chosen plan into an executable form
-- 4. Execution: Data retrieval and processing

-- Example execution plan analysis
EXPLAIN (ANALYZE, BUFFERS) 
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at >= '2023-01-01'
GROUP BY u.id, u.name
ORDER BY order_count DESC
LIMIT 10;

Reading Execution Plans

-- PostgreSQL execution plan example
/*
QUERY PLAN
Limit  (cost=1000.00..1000.25 rows=10 width=64)
  ->  Sort  (cost=1000.00..1000.50 rows=200 width=64)
        Sort Key: (count(o.id)) DESC
        ->  HashAggregate  (cost=800.00..950.00 rows=200 width=64)
              Group Key: u.id, u.name
              ->  Hash Left Join  (cost=400.00..750.00 rows=5000 width=32)
                    Hash Cond: (u.id = o.user_id)
                    ->  Seq Scan on users u  (cost=0.00..200.00 rows=1000 width=32)
                          Filter: (created_at >= '2023-01-01'::date)
                    ->  Hash  (cost=150.00..150.00 rows=10000 width=8)
                          ->  Seq Scan on orders o  (cost=0.00..150.00 rows=10000 width=8)
*/

Cost-Based Optimization

-- Database optimizers use statistics to estimate costs
-- Update statistics regularly for accurate estimates
ANALYZE users;
ANALYZE orders;

-- View table statistics
SELECT 
    schemaname,
    tablename,
    n_tup_ins,
    n_tup_upd,
    n_tup_del,
    last_analyze,
    last_autoanalyze
FROM pg_stat_user_tables
WHERE tablename IN ('users', 'orders');

Indexing Strategies {#indexing}

Index Types and Use Cases

B-Tree Indexes (Default)

-- Best for equality and range queries
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_date ON orders(order_date);

-- Composite indexes for multiple columns
CREATE INDEX idx_orders_user_date ON orders(user_id, order_date);

-- Query examples that benefit from B-tree indexes
SELECT * FROM users WHERE email = 'john@example.com';
SELECT * FROM orders WHERE order_date BETWEEN '2023-01-01' AND '2023-12-31';
SELECT * FROM orders WHERE user_id = 123 AND order_date > '2023-06-01';

Hash Indexes

-- PostgreSQL hash indexes for equality queries (crash-safe since PostgreSQL 10)
CREATE INDEX idx_users_status_hash ON users USING HASH(status);

-- Good for exact matches only
SELECT * FROM users WHERE status = 'active';

Partial Indexes

-- Index only rows that meet certain conditions
CREATE INDEX idx_active_users_email ON users(email) 
WHERE status = 'active';

CREATE INDEX idx_recent_orders ON orders(user_id) 
WHERE order_date >= '2023-01-01';

-- Significantly reduces index size and maintenance cost
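SQLite also supports partial indexes, which makes the behavior easy to demonstrate: the planner uses the index only when the query's WHERE clause implies the index's predicate. A sketch (table and index names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO users (email, status) VALUES (?, ?)",
    [(f"user{i}@example.com", "active" if i % 10 else "inactive")
     for i in range(1000)],
)
# Partial index: only rows with status = 'active' are indexed
conn.execute(
    "CREATE INDEX idx_active_users_email ON users(email) WHERE status = 'active'"
)

def plan(sql):
    return " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Query implies status = 'active': the partial index is usable
covered = plan(
    "SELECT * FROM users WHERE email = 'user7@example.com' AND status = 'active'"
)
# No status filter: the planner cannot prove the predicate, so it scans
uncovered = plan("SELECT * FROM users WHERE email = 'user7@example.com'")

print(covered)
print(uncovered)
```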

Expression Indexes

-- Index on computed values
CREATE INDEX idx_users_lower_email ON users(LOWER(email));
CREATE INDEX idx_orders_total_amount ON orders((quantity * unit_price));

-- Enables efficient queries on expressions
SELECT * FROM users WHERE LOWER(email) = 'john@example.com';
SELECT * FROM orders WHERE (quantity * unit_price) > 1000;

Covering Indexes

-- Include extra columns so the index alone can satisfy the query (PostgreSQL 11+)
CREATE INDEX idx_orders_covering ON orders(user_id, order_date) 
INCLUDE (total_amount, status);

-- Query can be satisfied entirely from index
SELECT total_amount, status 
FROM orders 
WHERE user_id = 123 AND order_date = '2023-06-15';
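SQLite has no INCLUDE clause, but a plain composite index that contains every column the query touches produces the same effect, and its plan output says so explicitly. A sketch (schema mirrors the orders examples above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INT, "
    "order_date TEXT, total_amount REAL)"
)
# All three referenced columns live in the index, so the base table
# never has to be visited for this query
conn.execute(
    "CREATE INDEX idx_orders_covering ON orders(user_id, order_date, total_amount)"
)

rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT total_amount FROM orders "
    "WHERE user_id = 123 AND order_date = '2023-06-15'"
).fetchall()
detail = " ".join(r[-1] for r in rows)
print(detail)  # plan reports a COVERING INDEX
```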

Index Optimization Strategies

-- 1. Selectivity: Create indexes on highly selective columns
-- Good: email (unique values)
CREATE INDEX idx_users_email ON users(email);

-- Poor: gender (low selectivity)
-- CREATE INDEX idx_users_gender ON users(gender); -- Avoid

-- 2. Column order in composite indexes
-- Put most selective column first
CREATE INDEX idx_orders_user_status_date ON orders(user_id, status, order_date);

-- 3. Avoid over-indexing
-- Each index adds overhead for INSERT/UPDATE/DELETE operations
-- Monitor index usage
SELECT 
    indexrelname,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_scan DESC;

-- Drop unused indexes
DROP INDEX IF EXISTS idx_unused_index;

Query Analysis and Profiling {#analysis}

Performance Monitoring Tools

PostgreSQL Query Analysis

-- Enable query logging
SET log_statement = 'all';
SET log_min_duration_statement = 1000; -- Log queries > 1 second

-- Use pg_stat_statements extension
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Find slowest queries
-- (on PostgreSQL 13+, the columns are total_exec_time, mean_exec_time, stddev_exec_time)
SELECT 
    query,
    calls,
    total_time,
    mean_time,
    stddev_time,
    rows
FROM pg_stat_statements
ORDER BY mean_time DESC
LIMIT 10;

-- Reset statistics
SELECT pg_stat_statements_reset();

EXPLAIN ANALYZE

-- Detailed execution analysis
EXPLAIN (ANALYZE, BUFFERS, VERBOSE) 
SELECT 
    u.name,
    u.email,
    COUNT(o.id) as order_count,
    SUM(o.total_amount) as total_spent
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.status = 'active'
  AND u.created_at >= '2023-01-01'
GROUP BY u.id, u.name, u.email
HAVING COUNT(o.id) > 5
ORDER BY total_spent DESC
LIMIT 20;

-- Key metrics to analyze:
-- - Actual Time vs Estimated Time
-- - Rows vs Estimated Rows  
-- - Buffer usage (shared hit, read, dirtied)
-- - Sort operations and memory usage

MySQL Query Profiling

-- Enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;

-- Profile a specific query (SHOW PROFILE is deprecated; prefer the Performance Schema)
SET profiling = 1;
SELECT * FROM users WHERE email = 'john@example.com';
SHOW PROFILES;
SHOW PROFILE FOR QUERY 1;

-- Performance Schema queries
SELECT 
    event_name,
    count_star,
    sum_timer_wait,
    avg_timer_wait
FROM performance_schema.events_statements_summary_by_digest
ORDER BY sum_timer_wait DESC
LIMIT 10;

Query Pattern Analysis

-- Identify common anti-patterns

-- 1. N+1 Query Problem
-- Bad: Multiple queries in a loop
SELECT * FROM users WHERE status = 'active';
-- For each user: SELECT * FROM orders WHERE user_id = ?

-- Good: Single query with JOIN
SELECT 
    u.*,
    o.id as order_id,
    o.total_amount
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.status = 'active';

-- 2. SELECT * antipattern
-- Bad: Retrieving unnecessary columns
SELECT * FROM users WHERE id = 123;

-- Good: Select only needed columns
SELECT id, name, email FROM users WHERE id = 123;

-- 3. Missing WHERE clauses
-- Bad: Full table scan
SELECT COUNT(*) FROM orders;

-- Good: Use appropriate filters
SELECT COUNT(*) FROM orders WHERE order_date >= '2023-01-01';
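The N+1 pattern above can be made concrete by counting round-trips. This sketch uses SQLite's statement trace hook to count executed statements; the schema and data are invented for the demonstration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, status TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INT, total_amount REAL);
    INSERT INTO users VALUES (1,'a','active'),(2,'b','active'),(3,'c','active');
    INSERT INTO orders VALUES (1,1,10.0),(2,1,20.0),(3,2,5.0);
""")

executed = []
conn.set_trace_callback(executed.append)  # record every statement run

# N+1: one query for the users, then one more per user
users = conn.execute("SELECT id FROM users WHERE status = 'active'").fetchall()
for (uid,) in users:
    conn.execute("SELECT * FROM orders WHERE user_id = ?", (uid,)).fetchall()
n_plus_one = len(executed)  # 1 + 3 = 4 statements

executed.clear()
# Single round-trip with a JOIN
conn.execute("""
    SELECT u.id, o.id, o.total_amount
    FROM users u LEFT JOIN orders o ON u.id = o.user_id
    WHERE u.status = 'active'
""").fetchall()
single = len(executed)  # 1 statement

print(n_plus_one, single)
```

With three users the loop already issues four statements; with ten thousand users it issues ten thousand and one, while the JOIN still issues exactly one.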

Join Optimization {#joins}

Join Types and Performance

Inner Joins

-- Most efficient when both tables have appropriate indexes
SELECT 
    u.name,
    o.order_date,
    o.total_amount
FROM users u
INNER JOIN orders o ON u.id = o.user_id
WHERE u.status = 'active'
  AND o.order_date >= '2023-01-01';

-- users.id is the primary key and indexed automatically;
-- add an index on the foreign-key side of the join
CREATE INDEX idx_orders_user_id ON orders(user_id);

Left/Right Outer Joins

-- Be careful with large result sets
SELECT 
    u.name,
    COUNT(o.id) as order_count  -- COUNT never returns NULL, so no COALESCE needed
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at >= '2023-01-01'
GROUP BY u.id, u.name;

-- Consider using EXISTS for existence checks
SELECT u.name
FROM users u
WHERE EXISTS (
    SELECT 1 FROM orders o 
    WHERE o.user_id = u.id 
    AND o.order_date >= '2023-01-01'
);

Join Algorithms

-- 1. Nested Loop Join
-- Good for small datasets, indexed joins
-- Avoid for large tables without indexes

-- 2. Hash Join
-- Good for large datasets, equality joins
-- Requires sufficient memory

-- 3. Sort Merge Join
-- Good for large datasets, range joins
-- Benefits from sorted data

-- Force a specific join algorithm for diagnosis (PostgreSQL; session-level, not for production)
SET enable_nestloop = OFF;
SET enable_hashjoin = ON;
SET enable_mergejoin = ON;
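The trade-off between the first two algorithms can be sketched in plain Python. This is a toy illustration of the cost model, not how any engine implements them:

```python
# Rows are plain tuples; key functions extract the join columns.

def nested_loop_join(left, right, left_key, right_key):
    """O(len(left) * len(right)) comparisons: fine when one side is
    tiny or the inner side has an index; ruinous for two large tables."""
    return [(l, r)
            for l in left
            for r in right
            if left_key(l) == right_key(r)]

def hash_join(left, right, left_key, right_key):
    """O(len(left) + len(right)): build a hash table on one side,
    probe with the other. Requires an equality join and enough memory."""
    table = {}
    for l in left:                         # build phase
        table.setdefault(left_key(l), []).append(l)
    out = []
    for r in right:                        # probe phase
        for l in table.get(right_key(r), []):
            out.append((l, r))
    return out

users = [(1, "alice"), (2, "bob")]
orders = [(10, 1), (11, 1), (12, 2)]       # (order_id, user_id)

a = nested_loop_join(users, orders, lambda u: u[0], lambda o: o[1])
b = hash_join(users, orders, lambda u: u[0], lambda o: o[1])
print(sorted(a) == sorted(b))  # same rows, very different cost profiles
```

A sort-merge join adds a third shape: sort both inputs on the join key, then zip through them once, which pays off when the data is already sorted or the join is a range condition.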

Join Optimization Techniques

-- 1. Join order optimization
-- Filter early, join late
SELECT 
    u.name,
    recent_orders.total_spent
FROM users u
INNER JOIN (
    SELECT 
        user_id,
        SUM(total_amount) as total_spent
    FROM orders
    WHERE order_date >= '2023-11-01'  -- Filter early
    GROUP BY user_id
) recent_orders ON u.id = recent_orders.user_id
WHERE u.status = 'active';

-- 2. Avoid Cartesian products
-- Always specify join conditions
-- Bad
SELECT * FROM users u, orders o WHERE u.status = 'active';

-- Good
SELECT * FROM users u
INNER JOIN orders o ON u.id = o.user_id
WHERE u.status = 'active';

-- 3. Use appropriate join types
-- EXISTS vs IN vs JOIN
-- EXISTS is often more efficient for existence checks
SELECT u.name
FROM users u
WHERE EXISTS (
    SELECT 1 FROM orders o
    WHERE o.user_id = u.id AND o.status = 'completed'
);

Subquery and CTE Optimization {#subqueries}

Subquery Types and Performance

Correlated vs Non-Correlated Subqueries

-- Non-correlated subquery (generally faster)
SELECT name
FROM users
WHERE id IN (
    SELECT user_id 
    FROM orders 
    WHERE order_date >= '2023-01-01'
);

-- Correlated subquery (conceptually run once per outer row, though optimizers often rewrite it as a semi-join)
SELECT name
FROM users u
WHERE EXISTS (
    SELECT 1 
    FROM orders o
    WHERE o.user_id = u.id 
    AND o.order_date >= '2023-01-01'
);

-- Often better to rewrite as JOIN
SELECT DISTINCT u.name
FROM users u
INNER JOIN orders o ON u.id = o.user_id
WHERE o.order_date >= '2023-01-01';

Scalar Subqueries

-- Scalar subquery in SELECT (can be expensive)
SELECT 
    u.name,
    u.email,
    (SELECT COUNT(*) FROM orders o WHERE o.user_id = u.id) as order_count
FROM users u;

-- Better: Use LEFT JOIN with aggregation
SELECT 
    u.name,
    u.email,
    COALESCE(o.order_count, 0) as order_count
FROM users u
LEFT JOIN (
    SELECT user_id, COUNT(*) as order_count
    FROM orders
    GROUP BY user_id
) o ON u.id = o.user_id;
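Both forms return identical results, which the rewrite depends on. A quick equivalence check in SQLite (tiny invented dataset):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INT);
    INSERT INTO users VALUES (1,'a'),(2,'b'),(3,'c');
    INSERT INTO orders VALUES (1,1),(2,1),(3,3);
""")

# Scalar subquery: one correlated COUNT per user row
scalar = conn.execute("""
    SELECT u.name,
           (SELECT COUNT(*) FROM orders o WHERE o.user_id = u.id) AS order_count
    FROM users u ORDER BY u.id
""").fetchall()

# Rewrite: aggregate once, then join
joined = conn.execute("""
    SELECT u.name, COALESCE(o.order_count, 0) AS order_count
    FROM users u
    LEFT JOIN (SELECT user_id, COUNT(*) AS order_count
               FROM orders GROUP BY user_id) o ON u.id = o.user_id
    ORDER BY u.id
""").fetchall()

print(scalar)          # [('a', 2), ('b', 0), ('c', 1)]
print(scalar == joined)
```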

Common Table Expressions (CTEs)

-- Simple CTE
WITH active_users AS (
    SELECT id, name, email
    FROM users
    WHERE status = 'active' AND created_at >= '2023-01-01'
),
user_orders AS (
    SELECT 
        u.id,
        u.name,
        COUNT(o.id) as order_count,
        SUM(o.total_amount) as total_spent
    FROM active_users u
    LEFT JOIN orders o ON u.id = o.user_id
    GROUP BY u.id, u.name
)
SELECT *
FROM user_orders
WHERE order_count > 5
ORDER BY total_spent DESC;

-- Recursive CTE for hierarchical data
WITH RECURSIVE category_hierarchy AS (
    -- Base case
    SELECT id, name, parent_id, 1 as level
    FROM categories
    WHERE parent_id IS NULL

    UNION ALL

    -- Recursive case
    SELECT c.id, c.name, c.parent_id, ch.level + 1
    FROM categories c
    INNER JOIN category_hierarchy ch ON c.parent_id = ch.id
)
SELECT * FROM category_hierarchy ORDER BY level, name;
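SQLite supports WITH RECURSIVE too, so the hierarchy walk above can be run end to end. A sketch with a small invented category tree:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT, parent_id INT);
    INSERT INTO categories VALUES
        (1, 'Electronics', NULL),
        (2, 'Laptops', 1),
        (3, 'Ultrabooks', 2),
        (4, 'Books', NULL);
""")

rows = conn.execute("""
    WITH RECURSIVE category_hierarchy AS (
        -- Base case: the roots
        SELECT id, name, parent_id, 1 AS level
        FROM categories WHERE parent_id IS NULL
        UNION ALL
        -- Recursive case: children of rows found so far
        SELECT c.id, c.name, c.parent_id, ch.level + 1
        FROM categories c
        JOIN category_hierarchy ch ON c.parent_id = ch.id
    )
    SELECT name, level FROM category_hierarchy ORDER BY level, name
""").fetchall()
print(rows)  # [('Books', 1), ('Electronics', 1), ('Laptops', 2), ('Ultrabooks', 3)]
```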

Window Functions vs Subqueries

-- Using subquery (less efficient)
SELECT 
    user_id,
    order_date,
    total_amount,
    (SELECT AVG(total_amount) 
     FROM orders o2 
     WHERE o2.user_id = o1.user_id) as user_avg
FROM orders o1;

-- Using window function (more efficient)
SELECT 
    user_id,
    order_date,
    total_amount,
    AVG(total_amount) OVER (PARTITION BY user_id) as user_avg
FROM orders;

-- Ranking with window functions
SELECT 
    user_id,
    order_date,
    total_amount,
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date DESC) as order_rank
FROM orders;
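Both window-function queries can be exercised in SQLite (window functions require SQLite 3.25+, bundled with all recent Pythons). Each partition's average is computed once and attached to every row, rather than re-aggregated per row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (user_id INT, order_date TEXT, total_amount REAL);
    INSERT INTO orders VALUES
        (1, '2023-01-01', 100.0),
        (1, '2023-02-01', 200.0),
        (2, '2023-01-15', 50.0);
""")

rows = conn.execute("""
    SELECT user_id, total_amount,
           AVG(total_amount) OVER (PARTITION BY user_id) AS user_avg,
           ROW_NUMBER() OVER (PARTITION BY user_id
                              ORDER BY order_date DESC) AS order_rank
    FROM orders
    ORDER BY user_id, order_rank
""").fetchall()

for r in rows:
    print(r)
# (1, 200.0, 150.0, 1)  <- newest order for user 1, partition avg 150
# (1, 100.0, 150.0, 2)
# (2, 50.0, 50.0, 1)
```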

Performance Tuning Techniques {#tuning}

Query Rewriting Techniques

Predicate Pushdown

-- Relies on the optimizer to push the filter into the orders scan (modern planners usually do)
SELECT u.name, o.total_amount
FROM users u
INNER JOIN orders o ON u.id = o.user_id
WHERE o.order_date >= '2023-01-01';

-- Explicitly filter before the join (useful when the optimizer cannot push the predicate, e.g. through complex views)
SELECT u.name, recent_orders.total_amount
FROM users u
INNER JOIN (
    SELECT user_id, total_amount
    FROM orders
    WHERE order_date >= '2023-01-01'  -- Filter pushed down
) recent_orders ON u.id = recent_orders.user_id;

Projection Pushdown

-- Bad: The subquery selects every column before the outer projection
SELECT name, email
FROM (
    SELECT * FROM users WHERE status = 'active'
) active_users;

-- Good: Select only needed columns
SELECT name, email
FROM users
WHERE status = 'active';

Aggregate Pushdown

-- Bad: Aggregate after join
SELECT 
    u.name,
    SUM(o.total_amount) as total_spent
FROM users u
INNER JOIN orders o ON u.id = o.user_id
WHERE u.status = 'active'
GROUP BY u.id, u.name;

-- Good: Pre-aggregate before join
SELECT 
    u.name,
    user_totals.total_spent
FROM users u
INNER JOIN (
    SELECT user_id, SUM(total_amount) as total_spent
    FROM orders
    GROUP BY user_id
) user_totals ON u.id = user_totals.user_id
WHERE u.status = 'active';

Partitioning Strategies

-- Range partitioning by date (the parent orders table must be created with PARTITION BY RANGE (order_date))
CREATE TABLE orders_2023 PARTITION OF orders
FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');

CREATE TABLE orders_2024 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- Hash partitioning by user_id
CREATE TABLE orders_p0 PARTITION OF orders
FOR VALUES WITH (MODULUS 4, REMAINDER 0);

CREATE TABLE orders_p1 PARTITION OF orders
FOR VALUES WITH (MODULUS 4, REMAINDER 1);

-- Partition pruning in action
SELECT * FROM orders 
WHERE order_date >= '2023-06-01' 
  AND order_date < '2023-07-01';
-- Only scans orders_2023 partition

Materialized Views

-- Create materialized view for expensive aggregations
CREATE MATERIALIZED VIEW user_order_summary AS
SELECT 
    u.id,
    u.name,
    u.email,
    COUNT(o.id) as order_count,
    SUM(o.total_amount) as total_spent,
    MAX(o.order_date) as last_order_date
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name, u.email;

-- Create index on materialized view
CREATE INDEX idx_user_order_summary_total_spent 
ON user_order_summary(total_spent DESC);

-- Refresh materialized view
REFRESH MATERIALIZED VIEW user_order_summary;

-- Use in queries
SELECT * FROM user_order_summary
WHERE total_spent > 1000
ORDER BY total_spent DESC;

Database-Specific Optimizations {#database-specific}

PostgreSQL Optimizations

-- Configuration tuning
-- postgresql.conf
shared_buffers = '256MB'          -- 25% of RAM
effective_cache_size = '1GB'      -- Available OS cache
work_mem = '4MB'                  -- Per operation memory
maintenance_work_mem = '64MB'     -- For maintenance operations
random_page_cost = 1.1            -- SSD optimization

-- VACUUM and ANALYZE
VACUUM ANALYZE users;             -- Reclaim space and update stats
REINDEX INDEX idx_users_email;    -- Rebuild fragmented index

-- Parallel query execution
SET max_parallel_workers_per_gather = 2;
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*) FROM large_table WHERE condition = 'value';

MySQL Optimizations

-- Configuration tuning
-- my.cnf
innodb_buffer_pool_size = 1G      -- 70-80% of RAM
innodb_log_file_size = 256M       -- Large redo logs
tmp_table_size = 128M             -- Temporary table size
-- Note: the query cache (query_cache_size) was removed in MySQL 8.0

-- Index hints (table-level syntax)
SELECT name, email 
FROM users USE INDEX (idx_users_email)
WHERE email = 'john@example.com';

-- Optimizer hints
SELECT /*+ JOIN_ORDER(u, o) */ 
    u.name, o.total_amount
FROM users u
INNER JOIN orders o ON u.id = o.user_id;

-- Partitioning
CREATE TABLE orders (
    id INT AUTO_INCREMENT,
    user_id INT,
    order_date DATE,
    total_amount DECIMAL(10,2),
    PRIMARY KEY (id, order_date)
)
PARTITION BY RANGE (YEAR(order_date)) (
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION p2024 VALUES LESS THAN (2025),
    PARTITION pmax VALUES LESS THAN MAXVALUE
);

SQL Server Optimizations

-- Index recommendations
SELECT 
    migs.avg_total_user_cost * (migs.avg_user_impact / 100.0) * (migs.user_seeks + migs.user_scans) AS improvement_measure,
    'CREATE INDEX [missing_index_' + CONVERT(varchar, mig.index_group_handle) + '_' + CONVERT(varchar, mid.index_handle) + ']'
    + ' ON ' + mid.statement + ' (' + ISNULL(mid.equality_columns,'') 
    + CASE WHEN mid.equality_columns IS NOT NULL AND mid.inequality_columns IS NOT NULL THEN ',' ELSE '' END
    + ISNULL(mid.inequality_columns, '') + ')' 
    + ISNULL(' INCLUDE (' + mid.included_columns + ')', '') AS create_index_statement
FROM sys.dm_db_missing_index_groups mig
INNER JOIN sys.dm_db_missing_index_group_stats migs ON migs.group_handle = mig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details mid ON mig.index_handle = mid.index_handle
ORDER BY improvement_measure DESC;

-- Query Store
ALTER DATABASE MyDatabase SET QUERY_STORE = ON;

-- Columnstore indexes for analytics
CREATE CLUSTERED COLUMNSTORE INDEX cci_orders_analytics
ON orders_analytics;

-- In-memory OLTP
CREATE TABLE users_memory (
    id INT IDENTITY PRIMARY KEY NONCLUSTERED,
    name NVARCHAR(100),
    email NVARCHAR(255),
    INDEX ix_email HASH (email) WITH (BUCKET_COUNT = 1000000)
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

Monitoring and Maintenance {#monitoring}

Performance Monitoring Queries

-- PostgreSQL monitoring
-- Long-running queries (active for more than 5 minutes)
SELECT 
    pid,
    now() - pg_stat_activity.query_start AS duration,
    query,
    state
FROM pg_stat_activity
WHERE state <> 'idle'
  AND (now() - pg_stat_activity.query_start) > interval '5 minutes';

-- Lock monitoring
SELECT 
    blocked_locks.pid AS blocked_pid,
    blocked_activity.usename AS blocked_user,
    blocking_locks.pid AS blocking_pid,
    blocking_activity.usename AS blocking_user,
    blocked_activity.query AS blocked_statement
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
    ON blocking_locks.locktype = blocked_locks.locktype
    AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
    AND blocking_locks.pid <> blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted
  AND blocking_locks.granted;

-- Index usage statistics
SELECT 
    schemaname,
    tablename,
    indexname,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;

Automated Monitoring Setup

-- Create monitoring views
CREATE VIEW slow_queries AS
SELECT 
    query,
    calls,
    total_time / 1000.0 AS total_seconds,
    mean_time / 1000.0 AS mean_seconds,
    stddev_time / 1000.0 AS stddev_seconds,
    rows,
    100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
FROM pg_stat_statements
WHERE mean_time > 1000  -- Queries averaging over 1 second (mean_exec_time on PostgreSQL 13+)
ORDER BY mean_time DESC;

-- Maintenance procedures
-- Note: VACUUM and REINDEX CONCURRENTLY refuse to run inside a
-- transaction block, so they cannot be called from a function;
-- schedule them separately (autovacuum tuning, vacuumdb, or cron)
CREATE OR REPLACE FUNCTION maintenance_routine()
RETURNS void AS $$
BEGIN
    -- Update table statistics
    ANALYZE;

    -- Log completion
    INSERT INTO maintenance_log (run_date, action, status)
    VALUES (NOW(), 'routine_maintenance', 'completed');
END;
$$ LANGUAGE plpgsql;

-- Schedule maintenance
SELECT cron.schedule('maintenance', '0 2 * * 0', 'SELECT maintenance_routine();');

Best Practices and Anti-Patterns {#best-practices}

Query Optimization Best Practices

-- 1. Always use appropriate WHERE clauses
-- Good
SELECT * FROM orders WHERE user_id = 123 AND order_date >= '2023-01-01';

-- Bad
SELECT * FROM orders; -- Full table scan

-- 2. Use LIMIT for large result sets
-- Good
SELECT * FROM users ORDER BY created_at DESC LIMIT 100;

-- Bad
SELECT * FROM users ORDER BY created_at DESC; -- Returns all rows

-- 3. Avoid functions in WHERE clauses on indexed columns
-- Bad
SELECT * FROM users WHERE YEAR(created_at) = 2023;

-- Good
SELECT * FROM users WHERE created_at >= '2023-01-01' AND created_at < '2024-01-01';

-- 4. Use appropriate data types
-- Good
CREATE TABLE events (
    id BIGINT PRIMARY KEY,
    event_date DATE,            -- Not TIMESTAMP if time not needed
    status SMALLINT,            -- Not INT for small values
    amount DECIMAL(10,2)        -- Not FLOAT for money
);

-- 5. Normalize appropriately (avoid over-normalization)
-- Good balance between normalization and performance

Common Anti-Patterns to Avoid

-- 1. SELECT * antipattern
-- Bad
SELECT * FROM users u
JOIN orders o ON u.id = o.user_id;

-- Good
SELECT u.name, u.email, o.total_amount, o.order_date
FROM users u
JOIN orders o ON u.id = o.user_id;

-- 2. N+1 query problem
-- Bad (in application code)
users = SELECT * FROM users WHERE status = 'active';
for user in users:
    orders = SELECT * FROM orders WHERE user_id = user.id;

-- Good
SELECT u.*, o.id as order_id, o.total_amount
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.status = 'active';

-- 3. Inefficient pagination
-- Bad
SELECT * FROM products ORDER BY name LIMIT 1000 OFFSET 50000;

-- Good (cursor-based pagination)
SELECT * FROM products 
WHERE name > 'last_seen_name'
ORDER BY name 
LIMIT 1000;
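The keyset (cursor-based) approach is easy to verify end to end. A sketch in SQLite with an invented products table; note that the sort key must be unique (or extended with a tiebreaker column) for the seek to be correct:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO products (name) VALUES (?)",
                 [(f"product-{i:03d}",) for i in range(10)])

PAGE = 4

def first_page():
    return conn.execute(
        "SELECT name FROM products ORDER BY name LIMIT ?", (PAGE,)
    ).fetchall()

def next_page(last_seen_name):
    # Seek past the last row of the previous page instead of using OFFSET,
    # so the database never has to skip over already-seen rows
    return conn.execute(
        "SELECT name FROM products WHERE name > ? ORDER BY name LIMIT ?",
        (last_seen_name, PAGE),
    ).fetchall()

page1 = first_page()
page2 = next_page(page1[-1][0])
print(page1)  # product-000 .. product-003
print(page2)  # product-004 .. product-007
```

With an index on the sort key, each page is an index seek plus a short scan, regardless of how deep into the result set the reader has paged.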

-- 4. Correlated subqueries when joins would be better
-- Bad
SELECT u.name,
    (SELECT COUNT(*) FROM orders o WHERE o.user_id = u.id) as order_count
FROM users u;

-- Good
SELECT u.name, COALESCE(o.order_count, 0) as order_count
FROM users u
LEFT JOIN (
    SELECT user_id, COUNT(*) as order_count
    FROM orders
    GROUP BY user_id
) o ON u.id = o.user_id;

Performance Testing and Benchmarking

-- Create test data for benchmarking
INSERT INTO users (name, email, status, created_at)
SELECT 
    'User ' || generate_series,
    'user' || generate_series || '@example.com',
    CASE WHEN random() > 0.1 THEN 'active' ELSE 'inactive' END,
    NOW() - (random() * interval '2 years')
FROM generate_series(1, 1000000);

-- Benchmark queries with timing (psql meta-command)
\timing on

-- Test different query approaches
EXPLAIN (ANALYZE, BUFFERS) 
SELECT COUNT(*) FROM users WHERE status = 'active';

EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*) FROM users WHERE status = 'active' AND created_at >= '2023-01-01';

-- Compare index strategies
CREATE INDEX idx_users_status ON users(status);
CREATE INDEX idx_users_status_date ON users(status, created_at);

-- Measure improvement
\timing off

This guide provides a foundation for optimizing SQL queries effectively. Remember that optimization is an iterative process: measure performance before and after each change, focus on the queries that matter most to your application, and balance query performance against maintainability.

The key to successful query optimization is understanding your data, monitoring query performance continuously, and applying the right techniques for your specific use case. Start with the basics (proper indexing and WHERE clauses) and gradually move to more advanced techniques as needed.