聚合、分组和子查询
一、聚合函数
聚合函数把多行数据计算成一个值。
常用函数
基本语法
SELECT 聚合函数(字段) FROM 表名 WHERE 条件;
示例
-- 统计已发布文章数量
SELECT COUNT(*) AS total FROM posts WHERE status = 'published';
-- 统计总浏览量
SELECT SUM(view_count) AS total_views FROM posts WHERE status = 'published';
-- 统计平均浏览量
SELECT AVG(view_count) AS avg_views FROM posts WHERE status = 'published';
-- 查找浏览量最高和最低
SELECT MAX(view_count) AS max_views, MIN(view_count) AS min_views FROM posts;
AS total 是起别名,让结果列名更易读。
COUNT(*) 和 COUNT(字段) 的区别
-- 统计所有行数(包含 NULL)
SELECT COUNT(*) FROM users;
-- 只统计 age 不为 NULL 的行
SELECT COUNT(age) FROM users;
一般统计行数用 COUNT(*),想排除 NULL 用 COUNT(字段)。
多个聚合函数一起用
SELECT
COUNT(*) AS total,
SUM(view_count) AS total_views,
AVG(view_count) AS avg_views
FROM posts
WHERE status = 'published';
一条 SQL 就能同时拿到总数、总和、平均值。
二、GROUP BY 分组
什么时候用
"按 XX 分组统计"——比如按状态统计文章数、按分类统计文章数、按用户统计评论数。
基本语法
SELECT 分组字段, 聚合函数(字段)
FROM 表名
WHERE 条件
GROUP BY 分组字段;
示例:按状态统计文章数
SELECT status, COUNT(*) AS post_count
FROM posts
GROUP BY status;
结果:
status | post_count
-----------+----------
draft | 1
published | 3
archived | 1
GROUP BY status 把相同状态的行分成一组,COUNT(*) 分别统计每组有多少行。
分组字段必须出现在 SELECT 里
-- ✅ 正确:status 在 SELECT 和 GROUP BY 里都有
SELECT status, COUNT(*) FROM posts GROUP BY status;
-- ❌ 错误:title 不在 GROUP BY 里,也不在聚合函数里
SELECT status, title, COUNT(*) FROM posts GROUP BY status;
规则:SELECT 里出现的非聚合字段,必须出现在 GROUP BY 里。
按多个字段分组
-- 按用户和状态分组,统计每个用户在每种状态下的文章数
SELECT user_id, status, COUNT(*) AS post_count
FROM posts
GROUP BY user_id, status;
结果:
user_id | status | post_count
--------+-----------+----------
1 | draft | 2
1 | published | 5
2 | published | 3
三、HAVING 过滤分组结果
WHERE vs HAVING
-- WHERE:分组前先过滤掉非 published 的行
SELECT status, COUNT(*) FROM posts WHERE status = 'published' GROUP BY status;
-- HAVING:分组后只保留数量大于 2 的组
SELECT status, COUNT(*) AS cnt FROM posts GROUP BY status HAVING cnt > 2;
基本语法
SELECT 分组字段, 聚合函数(字段)
FROM 表名
WHERE 条件 -- 分组前过滤行
GROUP BY 分组字段
HAVING 聚合条件; -- 分组后过滤结果
示例:文章数大于 1 的分类
SELECT
c.name AS category_name,
COUNT(p.id) AS post_count
FROM categories AS c
LEFT JOIN posts AS p ON c.id = p.category_id
GROUP BY c.id, c.name
HAVING COUNT(p.id) > 1;
为什么不能把聚合条件写到 WHERE
-- ❌ 错误:WHERE 里不能用聚合函数
SELECT status, COUNT(*) AS cnt FROM posts GROUP BY status WHERE cnt > 2;
-- ✅ 正确:聚合条件放 HAVING
SELECT status, COUNT(*) AS cnt FROM posts GROUP BY status HAVING cnt > 2;
WHERE 执行时还没分组,聚合函数算不出来。HAVING 在分组之后执行,这时候才有聚合结果。
WHERE + GROUP BY + HAVING 完整流程
SELECT user_id, COUNT(*) AS post_count
FROM posts
WHERE status = 'published' -- ① 先过滤:只要已发布的
GROUP BY user_id -- ② 再分组:按用户分
HAVING COUNT(*) >= 3 -- ③ 最后过滤:只保留 >= 3 篇的用户
ORDER BY post_count DESC; -- ④ 排序
执行顺序:WHERE → GROUP BY → HAVING → ORDER BY → LIMIT
四、JOIN + GROUP BY 组合
实际统计经常需要跨表分组。
示例:按分类统计文章数
SELECT
c.name AS category_name,
COUNT(p.id) AS post_count
FROM categories AS c
LEFT JOIN posts AS p ON c.id = p.category_id
GROUP BY c.id, c.name
ORDER BY post_count DESC;
用 LEFT JOIN 是为了没有文章的分类也显示出来,数量为 0。
为什么用 COUNT(p.id) 而不是 COUNT(*):
LEFT JOIN 后,没有文章的分类也会产生一行。
COUNT(*) 会把这行也算进去 → 结果是 1,但实际是 0。
COUNT(p.id) 只统计 p.id 不为 NULL 的行 → 结果是 0,正确。
示例:按用户统计评论数
SELECT
u.username,
COUNT(cm.id) AS comment_count
FROM users AS u
LEFT JOIN comments AS cm ON u.id = cm.user_id
GROUP BY u.id, u.username
ORDER BY comment_count DESC;
示例:统计每篇文章的评论数
SELECT
p.id,
p.title,
COUNT(cm.id) AS comment_count
FROM posts AS p
LEFT JOIN comments AS cm ON p.id = cm.post_id
GROUP BY p.id, p.title
ORDER BY comment_count DESC;
模式都一样:LEFT JOIN 连表 → GROUP BY 分组 → COUNT 统计。
五、子查询
子查询就是 SQL 里面嵌套一个 SQL。
基本语法
SELECT 字段
FROM 表名
WHERE 字段 比较运算符 (SELECT 字段 FROM 表名 WHERE 条件);
括号里的子查询先执行,结果给外层查询用。
示例:查询浏览量高于平均值的文章
SELECT id, title, view_count
FROM posts
WHERE view_count > (
SELECT AVG(view_count) FROM posts
)
ORDER BY view_count DESC;
执行过程:
- 子查询
SELECT AVG(view_count) FROM posts 算出平均浏览量,假设是 100
- 外层查询变成
WHERE view_count > 100
示例:查询浏览量最高的文章
SELECT id, title, view_count
FROM posts
WHERE view_count = (
SELECT MAX(view_count) FROM posts
);
六、IN 子查询
子查询返回多个值时,用 IN 匹配。
基本语法
SELECT 字段
FROM 表名
WHERE 字段 IN (SELECT 字段 FROM 表名 WHERE 条件);
示例:查询有评论的文章
SELECT id, title
FROM posts
WHERE id IN (
SELECT DISTINCT post_id FROM comments
);
执行过程:
- 子查询找出所有有评论的
post_id,比如 [1, 3, 5]
- 外层查询变成
WHERE id IN (1, 3, 5)
IN 子查询 vs JOIN
同一个需求可以有不同写法:
-- 写法一:IN 子查询
SELECT id, title FROM posts
WHERE id IN (SELECT DISTINCT post_id FROM comments);
-- 写法二:JOIN
SELECT DISTINCT p.id, p.title
FROM posts AS p
INNER JOIN comments AS cm ON p.id = cm.post_id;
两种写法结果一样。数据量小时差别不大,数据量大时 JOIN 通常更快。
NOT IN:查询没有评论的文章
SELECT id, title
FROM posts
WHERE id NOT IN (
SELECT DISTINCT post_id FROM comments
);
注意:如果子查询结果里有 NULL,NOT IN 会返回空结果。加个 WHERE post_id IS NULL 过滤掉:
SELECT id, title FROM posts
WHERE id NOT IN (
SELECT post_id FROM comments WHERE post_id IS NOT NULL
);
七、EXISTS 子查询
EXISTS 只关心子查询有没有结果,不关心具体值。
基本语法
SELECT 字段
FROM 表名 AS 别名
WHERE EXISTS (
SELECT 1 FROM 另一个表 AS 别名 WHERE 关联条件
);
SELECT 1 是固定写法,因为 EXISTS 只看有没有行,不看值。
示例:查询有评论的文章
SELECT p.id, p.title
FROM posts AS p
WHERE EXISTS (
SELECT 1
FROM comments AS cm
WHERE cm.post_id = p.id
);
执行过程:对 posts 表的每一行,去 comments 表里找有没有 post_id 匹配的行。有 → 返回这行,没有 → 跳过。
EXISTS vs IN
-- IN:子查询先跑完,拿到所有结果,再匹配
WHERE id IN (SELECT post_id FROM comments)
-- EXISTS:外层每行都去子查询里找,找到就停
WHERE EXISTS (SELECT 1 FROM comments WHERE post_id = p.id)
NOT EXISTS:查询没有评论的文章
SELECT p.id, p.title
FROM posts AS p
WHERE NOT EXISTS (
SELECT 1
FROM comments AS cm
WHERE cm.post_id = p.id
);
比 NOT IN 更安全,不会受 NULL 影响。
八、综合示例
查询每个分类下已发布文章的评论数,只显示评论数 > 5 的分类
SELECT
c.name AS category_name,
COUNT(cm.id) AS comment_count
FROM categories AS c
LEFT JOIN posts AS p ON c.id = p.category_id AND p.status = 'published'
LEFT JOIN comments AS cm ON p.id = cm.post_id
GROUP BY c.id, c.name
HAVING COUNT(cm.id) > 5
ORDER BY comment_count DESC;
注意 p.status = 'published' 放在 ON 里而不是 WHERE 里。放 WHERE 会过滤掉没有已发布文章的分类,LEFT JOIN 就失效了。
查询评论数最多的文章
SELECT
p.id,
p.title,
COUNT(cm.id) AS comment_count
FROM posts AS p
LEFT JOIN comments AS cm ON p.id = cm.post_id
GROUP BY p.id, p.title
ORDER BY comment_count DESC
LIMIT 1;
九、写 SQL 的建议顺序
复杂 SQL 不要一口气写完,分步骤来:
① 先写主表 FROM
② 加 JOIN 连表
③ 加 WHERE 过滤行
④ 加 GROUP BY 分组
⑤ 加 HAVING 过滤分组结果
⑥ 加 SELECT 选字段
⑦ 加 ORDER BY 排序
⑧ 加 LIMIT 截取
SQL 执行顺序和写法顺序不一样:
FROM → JOIN → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT
先理解执行顺序,写 SQL 时才不会把条件放错位置。