SQL 进阶
这篇讲 PostgreSQL 中比基础 CRUD 更强大的查询技巧:窗口函数、CTE(公共表表达式)、DISTINCT ON 等 PG 特有语法。
窗口函数
窗口函数在「结果集的某个窗口上」做计算,不像 GROUP BY 那样合并行,而是保留每一行:
-- 基本语法
SELECT
department,
employee,
salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rank
FROM employees;常用窗口函数
-- 准备测试数据
CREATE TABLE sales (
id SERIAL PRIMARY KEY,
salesperson VARCHAR(50),
region VARCHAR(20),
amount DECIMAL(10,2),
sale_date DATE
);
INSERT INTO sales (salesperson, region, amount, sale_date) VALUES
('张三', '华北', 15000, '2024-01-15'),
('李四', '华北', 12000, '2024-01-20'),
('王五', '华东', 18000, '2024-01-10'),
('赵六', '华东', 14000, '2024-01-25'),
('张三', '华南', 20000, '2024-02-01'),
('李四', '华南', 11000, '2024-02-05');-- ROW_NUMBER():每行唯一序号
SELECT salesperson, amount,
ROW_NUMBER() OVER (ORDER BY amount DESC) AS row_num
FROM sales;
-- RANK():排名(有间隔——同值占位)
SELECT salesperson, amount,
RANK() OVER (ORDER BY amount DESC) AS rank
FROM sales;
-- DENSE_RANK():排名(无间隔——同值不占位)
SELECT salesperson, amount,
DENSE_RANK() OVER (ORDER BY amount DESC) AS dense_rank
FROM sales;
-- SUM() + PARTITION BY:分区累计
SELECT salesperson, region, amount,
SUM(amount) OVER (PARTITION BY region ORDER BY sale_date) AS running_total
FROM sales;
-- LAG/LEAD:前后行访问
SELECT salesperson, amount, sale_date,
LAG(amount) OVER (ORDER BY sale_date) AS prev_amount,
amount - LAG(amount) OVER (ORDER BY sale_date) AS diff
FROM sales;CTE(公共表表达式)
CTE 让你把复杂查询拆成可读的步骤,像一个临时视图:
-- 基础 CTE
WITH ranked_sales AS (
SELECT
salesperson,
SUM(amount) AS total_sales,
RANK() OVER (ORDER BY SUM(amount) DESC) AS rank
FROM sales
GROUP BY salesperson
)
SELECT * FROM ranked_sales WHERE rank <= 3;-- 多个 CTE 串联(复杂查询拆为步骤)
WITH
monthly AS (
SELECT
salesperson,
DATE_TRUNC('month', sale_date) AS month,
SUM(amount) AS monthly_sales
FROM sales
GROUP BY salesperson, DATE_TRUNC('month', sale_date)
),
ranked AS (
SELECT *,
RANK() OVER (PARTITION BY month ORDER BY monthly_sales DESC) AS rank
FROM monthly
)
SELECT month, salesperson, monthly_sales
FROM ranked
WHERE rank = 1
ORDER BY month;CTE + 窗口函数是 PG 数据分析的组合利器。一个业务需求(如「各月各区域的销售冠军」)用 CTE 拆解:先按月汇总 → 再排名 → 最后取第一名。SQL 的可读性远高于层层嵌套的子查询。
DISTINCT ON
DISTINCT ON 是 PG 独有的语法,按指定列去重并保留每组的第一行:
-- 每个销售员最新的销售记录
SELECT DISTINCT ON (salesperson)
salesperson,
amount,
sale_date
FROM sales
ORDER BY salesperson, sale_date DESC;等价于用窗口函数写的:
SELECT salesperson, amount, sale_date
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY salesperson ORDER BY sale_date DESC) AS rn
FROM sales
) sub
WHERE rn = 1;DISTINCT ON 更简洁,但必须和 ORDER BY 配合——按相同列排序才能得到预期结果。
递归 CTE
处理树形结构(菜单、组织架构、文件目录):
-- 建表
CREATE TABLE categories (
id SERIAL PRIMARY KEY,
name VARCHAR(50),
parent_id INT REFERENCES categories(id)
);
INSERT INTO categories (id, name, parent_id) VALUES
(1, '电子产品', NULL),
(2, '手机', 1),
(3, '电脑', 1),
(4, '苹果', 2),
(5, '华为', 2),
(6, '笔记本', 3);
-- 递归查询:从"电子产品"往下找所有子分类
WITH RECURSIVE tree AS (
-- 基础情况:根节点
SELECT id, name, parent_id, 0 AS level
FROM categories
WHERE parent_id IS NULL
UNION ALL
-- 递归情况:子节点
SELECT c.id, c.name, c.parent_id, t.level + 1
FROM categories c
INNER JOIN tree t ON c.parent_id = t.id
)
SELECT REPEAT(' ', level) || name AS hierarchy
FROM tree;输出:
电子产品
手机
苹果
华为
电脑
笔记本聚合与 GROUP BY 进阶
-- GROUPING SETS:多维度汇总
SELECT
COALESCE(region, '所有地区') AS region,
COALESCE(salesperson, '所有销售') AS salesperson,
SUM(amount) AS total
FROM sales
GROUP BY GROUPING SETS ((region), (salesperson), ())
ORDER BY region, salesperson;
-- FILTER 从句(PG 特有的条件聚合语法)
SELECT
salesperson,
SUM(amount) FILTER (WHERE region = '华北') AS north_sales,
SUM(amount) FILTER (WHERE region = '华东') AS east_sales,
SUM(amount) FILTER (WHERE region = '华南') AS south_sales
FROM sales
GROUP BY salesperson;一句话小结
窗口函数让你在保留原行的情况下做排名和累计,CTE 把复杂查询拆成可读步骤,DISTINCT ON 是最简洁的去重方式,递归 CTE 处理树形数据。掌握这些,你的 SQL 就从「能查」跃升到「查得好」。下一篇讲 索引与性能优化。
最后更新于