Advanced SQL Queries

16. What is a Subquery?

Subquery হল একটি SQL query যা অন্য একটি SQL query এর ভিতরে nested বা embedded থাকে। এটি "query within a query" নামেও পরিচিত। Subquery এর result main query তে ব্যবহৃত হয়।

SELECT column1, column2
FROM table1
WHERE column1 = (SELECT column1 FROM table2 WHERE condition);

Types of Subqueries:

1. Single Row Subquery:

-- Highest paid employee এর তথ্য
SELECT name, salary 
FROM employees 
WHERE salary = (SELECT MAX(salary) FROM employees);

2. Multiple Row Subquery:

-- IT department এর সব employees
SELECT name, salary 
FROM employees 
WHERE dept_id IN (SELECT dept_id FROM departments WHERE dept_name = 'IT');

3. Subquery in Different Clauses:

WHERE Clause:

SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

FROM Clause (Derived Table):

SELECT dept_name, avg_salary
FROM (
    SELECT dept_id, AVG(salary) as avg_salary
    FROM employees
    GROUP BY dept_id
) AS dept_avg
JOIN departments d ON dept_avg.dept_id = d.dept_id;

SELECT Clause:

SELECT 
    name,
    salary,
    (SELECT AVG(salary) FROM employees) AS company_avg_salary
FROM employees;

Difference between Subquery and Correlated Subquery?

Regular Subquery (Independent):

Regular subquery একবার execute হয় এবং তার result main query তে ব্যবহৃত হয়।

-- Simple subquery - executes once
SELECT name, salary
FROM employees
WHERE dept_id = (SELECT dept_id FROM departments WHERE dept_name = 'IT');

Execution Process:

Inner query execute: SELECT dept_id FROM departments WHERE dept_name = 'IT' → Returns 101
Main query execute: SELECT name, salary FROM employees WHERE dept_id = 101

Correlated Subquery (Dependent):

Correlated subquery main query এর প্রতিটি row এর জন্য execute হয়।

-- Correlated subquery - executes for each row
SELECT e1.name, e1.salary
FROM employees e1
WHERE e1.salary > (
    SELECT AVG(e2.salary) 
    FROM employees e2 
    WHERE e2.dept_id = e1.dept_id  -- References outer query
);

Execution Process:

Main query এর প্রতিটি row এর জন্য
Inner query execute হয় সেই row এর context এ
Result compare করে condition check করে

Performance Comparison:

বিষয়	Regular Subquery	Correlated Subquery
Execution	Once	For each outer row
Performance	Generally faster	Generally slower
Dependency	Independent	Dependent on outer query
Optimization	Easier to optimize	Harder to optimize

Real-world Examples:

Regular Subquery:

-- সবচেয়ে expensive product এর category
SELECT category_name
FROM categories
WHERE category_id = (
    SELECT category_id 
    FROM products 
    ORDER BY price DESC 
    LIMIT 1
);

Correlated Subquery:

-- Each department এর highest paid employee
SELECT e1.name, e1.salary, d.dept_name
FROM employees e1
JOIN departments d ON e1.dept_id = d.dept_id
WHERE e1.salary = (
    SELECT MAX(e2.salary)
    FROM employees e2
    WHERE e2.dept_id = e1.dept_id  -- Correlated part
);

When would you use EXISTS vs IN with subqueries?

EXISTS Operator:

EXISTS operator check করে যে subquery কোনো rows return করে কি না। এটি boolean result দেয় (TRUE/FALSE)।

-- Customers যাদের কমপক্ষে একটি order আছে
SELECT c.customer_name
FROM customers c
WHERE EXISTS (
    SELECT 1 
    FROM orders o 
    WHERE o.customer_id = c.customer_id
);

IN Operator:

IN operator specific values এর list এর সাথে match করে।

-- Same result, কিন্তু IN দিয়ে
SELECT customer_name
FROM customers
WHERE customer_id IN (
    SELECT DISTINCT customer_id 
    FROM orders
);

Performance Comparison:

EXISTS এর Advantages:

1. NULL Handling:

-- EXISTS doesn't care about NULL values
SELECT c.customer_name
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o 
    WHERE o.customer_id = c.customer_id 
    AND o.status IS NULL  -- NULL values don't affect EXISTS
);

2. Early Termination:

-- EXISTS stops after finding first match
-- More efficient for large datasets

3. Better for Correlated Queries:

-- Efficiently finds customers with orders in last 30 days
SELECT c.customer_name
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o
    WHERE o.customer_id = c.customer_id
    AND o.order_date >= DATE_SUB(CURRENT_DATE, INTERVAL 30 DAY)
);

IN এর Advantages:

1. Simple Value Lists:

-- Direct value comparison
SELECT * FROM products 
WHERE category_id IN (1, 2, 3, 4);

2. Readable for Simple Cases:

-- Easy to understand
SELECT * FROM employees 
WHERE dept_id IN (SELECT dept_id FROM departments WHERE location = 'Dhaka');

When to Use Which:

Scenario	Use	Reason
NULL values in subquery	EXISTS	IN returns NULL with NULL values
Large datasets	EXISTS	Early termination, better performance
Simple value lists	IN	More readable
Correlated subqueries	EXISTS	Better optimization
Multiple columns	EXISTS	IN doesn't support multiple columns well

Examples with Performance Impact:

Large Table Scenario:

-- Table: customers (1M rows), orders (10M rows)

-- EXISTS - faster (stops at first match)
SELECT COUNT(*)
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o 
    WHERE o.customer_id = c.customer_id
);
-- Execution time: ~2 seconds

-- IN - slower (processes all subquery results)
SELECT COUNT(*)
FROM customers c
WHERE c.customer_id IN (
    SELECT customer_id FROM orders
);
-- Execution time: ~8 seconds

Performance Comparison: JOIN vs Subquery?

JOIN Approach:

-- Using JOIN
SELECT DISTINCT c.customer_name, c.email
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= '2023-01-01';

Subquery Approach:

-- Using EXISTS subquery
SELECT c.customer_name, c.email
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o
    WHERE o.customer_id = c.customer_id
    AND o.order_date >= '2023-01-01'
);

Performance Analysis:

JOIN Advantages:

1. Better Optimization:

-- Query optimizer can choose best execution plan
-- Can use indexes efficiently
-- Better statistics available

2. Retrieving Related Data:

-- When you need data from both tables
SELECT c.customer_name, o.order_date, o.total_amount
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE o.status = 'completed';

3. Set-based Operations:

-- Better for aggregations
SELECT c.customer_name, COUNT(o.order_id) as order_count
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_name;

Subquery Advantages:

1. Existence Check:

-- Better when you only need to check existence
SELECT customer_name
FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id);

2. Complex Conditions:

-- Complex logic easier to read
SELECT product_name
FROM products p
WHERE price > (
    SELECT AVG(price) 
    FROM products p2 
    WHERE p2.category_id = p.category_id
);

Performance Benchmarks:

Test Setup: 100K customers, 1M orders

Query Type	Execution Time	Memory Usage
INNER JOIN	0.8 seconds	50MB
EXISTS Subquery	1.2 seconds	30MB
IN Subquery	3.5 seconds	80MB

Best Practices:

Use JOIN when:

Need data from multiple tables
Better performance for large datasets
Simple relationships

Use Subquery when:

Complex business logic
Existence checks
Avoiding duplicate results
Nested conditions

Example Decision Tree:

-- Need customer info + order details? → Use JOIN
SELECT c.name, o.order_date, o.amount
FROM customers c
JOIN orders o ON c.id = o.customer_id;

-- Only need customers who have orders? → Use EXISTS
SELECT c.name
FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);

-- Need customers with specific order criteria? → Use subquery
SELECT c.name
FROM customers c
WHERE c.id IN (
    SELECT customer_id FROM orders 
    WHERE total_amount > (SELECT AVG(total_amount) FROM orders)
);

17. What is HAVING vs WHERE Clause?

WHERE এবং HAVING উভয়ই SQL এ filtering এর জন্য ব্যবহৃত হয়, কিন্তু তারা different stages এ apply হয় এবং different purposes serve করে।

বিষয়	WHERE	HAVING
Usage	Individual rows filter	Groups filter
Applied	Before GROUP BY	After GROUP BY
Works with	Column values	Aggregate functions
Performance	Faster (fewer rows to process)	Slower (after aggregation)

WHERE Clause:

WHERE clause individual rows কে filter করে GROUP BY এবং aggregation এর আগে।

-- WHERE example: Individual row filtering
SELECT name, salary, department
FROM employees
WHERE salary > 50000;  -- Filters rows before any grouping

HAVING Clause:

HAVING clause groups কে filter করে GROUP BY এবং aggregation এর পরে।

-- HAVING example: Group filtering
SELECT department, AVG(salary) as avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 60000;  -- Filters groups after aggregation

Table Structure:

CREATE TABLE sales (
    id INT,
    salesperson VARCHAR(50),
    region VARCHAR(50),
    sale_amount DECIMAL(10,2),
    sale_date DATE
);

INSERT INTO sales VALUES
(1, 'আলী', 'ঢাকা', 15000, '2023-01-15'),
(2, 'ফাতেমা', 'ঢাকা', 25000, '2023-01-20'),
(3, 'করিম', 'চট্টগ্রাম', 18000, '2023-01-25'),
(4, 'রহিম', 'ঢাকা', 12000, '2023-02-01'),
(5, 'আয়েশা', 'চট্টগ্রাম', 22000, '2023-02-05');

WHERE Usage:

-- Individual sales > 15000 এর তথ্য
SELECT salesperson, region, sale_amount
FROM sales
WHERE sale_amount > 15000;  -- Filters individual rows

-- Result:
-- ফাতেমা, ঢাকা, 25000
-- করিম, চট্টগ্রাম, 18000
-- আয়েশা, চট্টগ্রাম, 22000

HAVING Usage:

-- Regions যাদের total sales > 30000
SELECT region, SUM(sale_amount) as total_sales
FROM sales
GROUP BY region
HAVING SUM(sale_amount) > 30000;  -- Filters groups

-- Result:
-- ঢাকা, 52000
-- চট্টগ্রাম, 40000

Combined Usage:

-- 2023 January এর sales, regions যাদের average > 15000
SELECT 
    region, 
    COUNT(*) as sale_count,
    AVG(sale_amount) as avg_sale
FROM sales
WHERE sale_date >= '2023-01-01' 
  AND sale_date < '2023-02-01'  -- WHERE: filters rows first
GROUP BY region
HAVING AVG(sale_amount) > 15000;  -- HAVING: filters groups after

-- Execution order:
-- 1. WHERE filters January sales only
-- 2. GROUP BY groups by region  
-- 3. HAVING filters regions with avg > 15000

Can HAVING be used without GROUP BY?

হ্যাঁ, কিন্তু limited cases এ। কিছু database systems এ HAVING without GROUP BY ব্যবহার করা যায়, কিন্তু এটি পুরো table কে একটি single group হিসেবে treat করে।

HAVING without GROUP BY:

-- Valid in some databases (MySQL, SQL Server)
SELECT COUNT(*) as total_employees
FROM employees
HAVING COUNT(*) > 100;

-- Equivalent to:
SELECT COUNT(*) as total_employees
FROM employees
WHERE (SELECT COUNT(*) FROM employees) > 100;

Practical Example:

-- Check if company has more than 50 employees
SELECT 'Company is large' as status
FROM employees
HAVING COUNT(*) > 50;

-- If condition false, no rows returned
-- If condition true, returns the message

Why it's rarely used:

-- More readable approach
SELECT 
    CASE 
        WHEN (SELECT COUNT(*) FROM employees) > 50 
        THEN 'Large Company' 
        ELSE 'Small Company' 
    END as company_size;

Can you use column aliases in HAVING clause?

এটি database-dependent। বেশিরভাগ modern databases এ column aliases HAVING clause এ ব্যবহার করা যায়।

Supported Databases:

MySQL:

SELECT 
    department,
    AVG(salary) as avg_salary
FROM employees
GROUP BY department
HAVING avg_salary > 60000;  -- ✓ Alias works

PostgreSQL:

SELECT 
    department,
    COUNT(*) as emp_count
FROM employees
GROUP BY department
HAVING emp_count > 5;  -- ✓ Alias works

SQL Server:

SELECT 
    region,
    SUM(sales) as total_sales
FROM sales_data
GROUP BY region
HAVING total_sales > 100000;  -- ✓ Alias works

Alternative Approach (Universal):

-- If aliases don't work, repeat the expression
SELECT 
    department,
    AVG(salary) as avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 60000;  -- Repeat the aggregate function

Execution Order এ Alias Usage:

-- SQL execution order:
-- 1. FROM - Table selection
-- 2. WHERE - Row filtering (aliases not available yet)
-- 3. GROUP BY - Grouping (aliases not available yet)  
-- 4. SELECT - Column calculation (aliases created here)
-- 5. HAVING - Group filtering (aliases available now)
-- 6. ORDER BY - Sorting (aliases available)

SELECT 
    department,
    AVG(salary) as dept_avg_salary
FROM employees
WHERE salary > 30000           -- ❌ Can't use dept_avg_salary here
GROUP BY department            -- ❌ Can't use dept_avg_salary here
HAVING dept_avg_salary > 55000 -- ✓ Can use alias here
ORDER BY dept_avg_salary DESC; -- ✓ Can use alias here

18. What is UNION vs UNION ALL?

UNION এবং UNION ALL operators দুই বা ততোধিক SELECT statements এর results combine করার জন্য ব্যবহৃত হয়। তাদের মধ্যে main difference হল duplicate handling এ।

UNION Operator:

UNION operator দুই বা ততোধিক SELECT statements এর results combine করে এবং duplicate rows remove করে দেয়।

-- UNION syntax
SELECT column1, column2 FROM table1
UNION
SELECT column1, column2 FROM table2;

UNION ALL Operator:

UNION ALL operator same কাজ করে কিন্তু duplicate rows keep করে।

-- UNION ALL syntax
SELECT column1, column2 FROM table1
UNION ALL
SELECT column1, column2 FROM table2;

Table Setup:

-- Dhaka customers
CREATE TABLE dhaka_customers (
    id INT,
    name VARCHAR(50),
    city VARCHAR(50)
);

INSERT INTO dhaka_customers VALUES
(1, 'রহিম', 'ঢাকা'),
(2, 'করিম', 'ঢাকা'),
(3, 'ফাতেমা', 'ঢাকা');

-- Chittagong customers  
CREATE TABLE chittagong_customers (
    id INT,
    name VARCHAR(50),
    city VARCHAR(50)
);

INSERT INTO chittagong_customers VALUES
(2, 'করিম', 'ঢাকা'),  -- Duplicate
(4, 'আলী', 'চট্টগ্রাম'),
(5, 'আয়েশা', 'চট্টগ্রাম');

UNION Example:

SELECT name, city FROM dhaka_customers
UNION
SELECT name, city FROM chittagong_customers;

-- Result (duplicates removed):
-- রহিম, ঢাকা
-- করিম, ঢাকা  (appears only once)
-- ফাতেমা, ঢাকা
-- আলী, চট্টগ্রাম
-- আয়েশা, চট্টগ্রাম

UNION ALL Example:

SELECT name, city FROM dhaka_customers
UNION ALL
SELECT name, city FROM chittagong_customers;

-- Result (all rows included):
-- রহিম, ঢাকা
-- করিম, ঢাকা
-- ফাতেমা, ঢাকা
-- করিম, ঢাকা  (duplicate included)
-- আলী, চট্টগ্রাম
-- আয়েশা, চট্টগ্রাম

When would you use UNION ALL instead of UNION?

1. Performance Reasons:

UNION ALL significantly faster কারণ duplicate checking এর প্রয়োজন নেই।

-- Performance test with large tables

-- UNION (slower - needs duplicate checking)
SELECT customer_id, name FROM customers_2022  -- 1M rows
UNION
SELECT customer_id, name FROM customers_2023; -- 1M rows
-- Execution time: ~15 seconds

-- UNION ALL (faster - no duplicate checking)  
SELECT customer_id, name FROM customers_2022
UNION ALL  
SELECT customer_id, name FROM customers_2023;
-- Execution time: ~3 seconds

2. When Duplicates are Wanted:

-- Sales report: want to see all transactions including duplicates
SELECT sale_date, amount, 'Online' as channel FROM online_sales
UNION ALL
SELECT sale_date, amount, 'Store' as channel FROM store_sales;

-- Result shows all sales, even if same date/amount

3. Data Migration/ETL:

-- Combining data from multiple sources
SELECT emp_id, name, 'HR_SYSTEM' as source FROM hr_employees
UNION ALL
SELECT emp_id, name, 'PAYROLL_SYSTEM' as source FROM payroll_employees
UNION ALL  
SELECT emp_id, name, 'BENEFITS_SYSTEM' as source FROM benefits_employees;

-- Need to see all records from all systems for analysis

4. Log/Audit Tables:

-- Combining different log tables
SELECT timestamp, user_id, action FROM login_logs
UNION ALL
SELECT timestamp, user_id, action FROM logout_logs
UNION ALL
SELECT timestamp, user_id, action FROM error_logs;

-- All log entries needed, duplicates are meaningful

Performance Difference Between Them:

UNION Performance Impact:

-- UNION process:
-- 1. Execute first SELECT
-- 2. Execute second SELECT  
-- 3. Combine results
-- 4. Sort/hash to identify duplicates (EXPENSIVE)
-- 5. Remove duplicates
-- 6. Return final result

UNION ALL Performance:

-- UNION ALL process:
-- 1. Execute first SELECT
-- 2. Execute second SELECT
-- 3. Combine results (SIMPLE APPEND)
-- 4. Return final result

Real Performance Numbers:

Test Setup: 2 tables with 500K rows each, 10% overlap

Operation	Execution Time	Memory Usage	CPU Usage
UNION	12.5 seconds	450MB	85%
UNION ALL	2.1 seconds	180MB	35%

When Performance Matters Most:

-- Large dataset combination
SELECT product_id, sales_amount FROM sales_q1_2023  -- 2M rows
UNION ALL  -- Choose UNION ALL for performance
SELECT product_id, sales_amount FROM sales_q2_2023  -- 2M rows
UNION ALL
SELECT product_id, sales_amount FROM sales_q3_2023  -- 2M rows
UNION ALL
SELECT product_id, sales_amount FROM sales_q4_2023; -- 2M rows

-- 8M total rows processed in ~5 seconds vs ~45 seconds with UNION

What is INTERSECT and EXCEPT?

INTERSECT Operator:

INTERSECT returns rows যা দুই SELECT statements এ common আছে।

-- Common customers between two regions
SELECT customer_id, name FROM dhaka_customers
INTERSECT
SELECT customer_id, name FROM premium_customers;

-- Returns only customers who are both in Dhaka AND premium

EXCEPT (or MINUS) Operator:

EXCEPT returns rows যা first SELECT এ আছে কিন্তু second SELECT এ নেই।

-- Customers in Dhaka but not premium
SELECT customer_id, name FROM dhaka_customers
EXCEPT
SELECT customer_id, name FROM premium_customers;

-- Returns Dhaka customers who are NOT premium

19. What is DISTINCT Keyword?

DISTINCT keyword SQL এ duplicate values remove করার জন্য ব্যবহৃত হয়। এটি SELECT statement এ unique values শুধুমাত্র একবার return করে।

SELECT DISTINCT column1, column2, ...
FROM table_name;

Without DISTINCT:

-- Sample data in employees table
SELECT department FROM employees;

-- Result (with duplicates):
-- IT
-- HR  
-- IT
-- Finance
-- HR
-- IT
-- Finance

With DISTINCT:

SELECT DISTINCT department FROM employees;

-- Result (without duplicates):
-- IT
-- HR
-- Finance

Can DISTINCT be used across multiple columns?

হ্যাঁ, DISTINCT multiple columns এর combination এর জন্য ব্যবহার করা যায়।

Multiple Column DISTINCT:

-- Unique combinations of city and state
SELECT DISTINCT city, state FROM customers;

-- Result:
-- ঢাকা, ঢাকা বিভাগ
-- চট্টগ্রাম, চট্টগ্রাম বিভাগ
-- সিলেট, সিলেট বিভাগ
-- কুমিল্লা, চট্টগ্রাম বিভাগ

Understanding Multi-column DISTINCT:

Sample Data:

CREATE TABLE orders (
    customer_id INT,
    product_id INT,
    order_date DATE,
    quantity INT
);

INSERT INTO orders VALUES
(1, 101, '2023-01-15', 2),
(1, 101, '2023-01-15', 3),  -- Different quantity, same customer+product+date
(1, 102, '2023-01-15', 1),  -- Different product
(2, 101, '2023-01-15', 2),  -- Different customer
(1, 101, '2023-01-16', 2);  -- Different date

Different DISTINCT Applications:

-- 1. Unique customers who placed orders
SELECT DISTINCT customer_id FROM orders;
-- Result: 1, 2

-- 2. Unique customer-product combinations
SELECT DISTINCT customer_id, product_id FROM orders;
-- Result: (1,101), (1,102), (2,101)

-- 3. Unique customer-product-date combinations
SELECT DISTINCT customer_id, product_id, order_date FROM orders;
-- Result: (1,101,'2023-01-15'), (1,102,'2023-01-15'), (2,101,'2023-01-15'), (1,101,'2023-01-16')

-- 4. All unique combinations (all columns)
SELECT DISTINCT customer_id, product_id, order_date, quantity FROM orders;
-- Result: All 5 rows (each is unique in this combination)

How does DISTINCT work internally?

Internal Processing Steps:

1. Sorting Method:

-- Database sorts all result rows
SELECT DISTINCT department, salary FROM employees;

-- Internal process:
-- 1. Retrieve all rows: (IT, 50000), (HR, 45000), (IT, 50000), (Finance, 55000)
-- 2. Sort by all columns: (Finance, 55000), (HR, 45000), (IT, 50000), (IT, 50000)  
-- 3. Remove consecutive duplicates: (Finance, 55000), (HR, 45000), (IT, 50000)
-- 4. Return result

2. Hashing Method:

-- Database uses hash table to track seen combinations
-- 1. Create empty hash table
-- 2. For each row, calculate hash of (department, salary)
-- 3. If hash not seen before, add to result and hash table
-- 4. If hash already exists, skip row
-- 5. Return accumulated results

Performance Implications:

Memory Usage:

-- DISTINCT needs memory to track unique values
SELECT DISTINCT customer_id FROM large_orders_table;  -- 10M rows

-- Memory usage approaches:
-- Sorting: Needs space to sort all customer_ids
-- Hashing: Needs space for hash table of unique customer_ids

Query Execution Plan:

-- Check execution plan
EXPLAIN SELECT DISTINCT department FROM employees;

-- Possible plans:
-- 1. Sort + Remove Duplicates
-- 2. Hash Aggregate  
-- 3. Index Scan (if covering index exists)

Optimization Techniques:

1. Index Usage:

-- If index exists on department column
CREATE INDEX idx_department ON employees (department);

-- DISTINCT can use index scan instead of full table scan
SELECT DISTINCT department FROM employees;
-- Uses index scan → much faster

2. Covering Index:

-- For multi-column DISTINCT
CREATE INDEX idx_dept_salary ON employees (department, salary);

SELECT DISTINCT department, salary FROM employees;
-- Uses covering index → no table access needed

3. Limit Results:

-- When you don't need all distinct values
SELECT DISTINCT category FROM products LIMIT 10;
-- Database can stop after finding 10 unique values

DISTINCT vs GROUP BY Performance:

-- These queries are often equivalent internally:

-- Using DISTINCT
SELECT DISTINCT department FROM employees;

-- Using GROUP BY
SELECT department FROM employees GROUP BY department;

-- Both might generate similar execution plans
-- GROUP BY sometimes optimized better with indexes

Best Practices:

1. Use Indexes:

-- Create appropriate indexes for DISTINCT queries
CREATE INDEX idx_category ON products (category);
SELECT DISTINCT category FROM products;  -- Fast with index

2. Limit Columns:

-- Bad: DISTINCT on many columns
SELECT DISTINCT * FROM large_table;  -- Slow, memory intensive

-- Good: DISTINCT on specific columns
SELECT DISTINCT category, brand FROM products;  -- Faster, less memory

3. Consider Alternatives:

-- Sometimes EXISTS is more efficient than DISTINCT
-- Instead of:
SELECT DISTINCT customer_id FROM orders;

-- Consider:
SELECT customer_id FROM customers c 
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id);

4. Monitor Performance:

-- Use EXPLAIN to understand execution plan
EXPLAIN SELECT DISTINCT department, location FROM employees;

-- Look for:
-- - Index usage
-- - Memory usage estimates  
-- - Sort operations
-- - Hash operations

16. What is a Subquery?​

Types of Subqueries:​

1. Single Row Subquery:​

2. Multiple Row Subquery:​

3. Subquery in Different Clauses:​

Difference between Subquery and Correlated Subquery?​

Regular Subquery (Independent):​

Correlated Subquery (Dependent):​

Performance Comparison:​

Real-world Examples:​

When would you use EXISTS vs IN with subqueries?​

EXISTS Operator:​

IN Operator:​

Performance Comparison:​

EXISTS এর Advantages:​

IN এর Advantages:​

When to Use Which:​

Examples with Performance Impact:​

Performance Comparison: JOIN vs Subquery?​

JOIN Approach:​

Subquery Approach:​

Performance Analysis:​

JOIN Advantages:​

Subquery Advantages:​

Performance Benchmarks:​

Best Practices:​

17. What is HAVING vs WHERE Clause?​

WHERE Clause:​

HAVING Clause:​

WHERE Usage:​

HAVING Usage:​

Combined Usage:​

Can HAVING be used without GROUP BY?​

HAVING without GROUP BY:​

Practical Example:​

Why it's rarely used:​

Can you use column aliases in HAVING clause?​

Supported Databases:​

Alternative Approach (Universal):​

Execution Order এ Alias Usage:​

18. What is UNION vs UNION ALL?​

UNION Operator:​

UNION ALL Operator:​

Table Setup:​

UNION Example:​

UNION ALL Example:​

When would you use UNION ALL instead of UNION?​

1. Performance Reasons:​

2. When Duplicates are Wanted:​

3. Data Migration/ETL:​

4. Log/Audit Tables:​

Performance Difference Between Them:​

UNION Performance Impact:​

UNION ALL Performance:​

Real Performance Numbers:​

When Performance Matters Most:​

What is INTERSECT and EXCEPT?​

INTERSECT Operator:​

EXCEPT (or MINUS) Operator:​

19. What is DISTINCT Keyword?​

Without DISTINCT:​

With DISTINCT:​

Can DISTINCT be used across multiple columns?​

Multiple Column DISTINCT:​

Understanding Multi-column DISTINCT:​

How does DISTINCT work internally?​

Internal Processing Steps:​

Performance Implications:​

Optimization Techniques:​

DISTINCT vs GROUP BY Performance:​

Best Practices:​

16. What is a Subquery?

Types of Subqueries:

1. Single Row Subquery:

2. Multiple Row Subquery:

3. Subquery in Different Clauses:

Difference between Subquery and Correlated Subquery?

Regular Subquery (Independent):

Correlated Subquery (Dependent):

Performance Comparison:

Real-world Examples:

When would you use EXISTS vs IN with subqueries?

EXISTS Operator:

IN Operator:

Performance Comparison:

EXISTS এর Advantages:

IN এর Advantages:

When to Use Which:

Examples with Performance Impact:

Performance Comparison: JOIN vs Subquery?

JOIN Approach:

Subquery Approach:

Performance Analysis:

JOIN Advantages:

Subquery Advantages:

Performance Benchmarks:

Best Practices:

17. What is HAVING vs WHERE Clause?

WHERE Clause:

HAVING Clause:

WHERE Usage:

HAVING Usage:

Combined Usage:

Can HAVING be used without GROUP BY?

HAVING without GROUP BY:

Practical Example:

Why it's rarely used:

Can you use column aliases in HAVING clause?

Supported Databases:

Alternative Approach (Universal):

Execution Order এ Alias Usage:

18. What is UNION vs UNION ALL?

UNION Operator:

UNION ALL Operator:

Table Setup:

UNION Example:

UNION ALL Example:

When would you use UNION ALL instead of UNION?

1. Performance Reasons:

2. When Duplicates are Wanted:

3. Data Migration/ETL:

4. Log/Audit Tables:

Performance Difference Between Them:

UNION Performance Impact:

UNION ALL Performance:

Real Performance Numbers:

When Performance Matters Most:

What is INTERSECT and EXCEPT?

INTERSECT Operator:

EXCEPT (or MINUS) Operator:

19. What is DISTINCT Keyword?

Without DISTINCT:

With DISTINCT:

Can DISTINCT be used across multiple columns?

Multiple Column DISTINCT:

Understanding Multi-column DISTINCT:

How does DISTINCT work internally?

Internal Processing Steps:

Performance Implications:

Optimization Techniques:

DISTINCT vs GROUP BY Performance:

Best Practices: