聚合、分组和子查询

一、聚合函数

聚合函数把多行数据计算成一个值。

常用函数

函数含义示例
COUNT()统计行数统计有多少篇文章
SUM()求和统计总浏览量
AVG()平均值统计平均浏览量
MAX()最大值查找浏览量最高的文章
MIN()最小值查找最早发布的文章

基本语法

SELECT 聚合函数(字段) FROM 表名 WHERE 条件;

示例

-- 统计已发布文章数量
SELECT COUNT(*) AS total FROM posts WHERE status = 'published';

-- 统计总浏览量
SELECT SUM(view_count) AS total_views FROM posts WHERE status = 'published';

-- 统计平均浏览量
SELECT AVG(view_count) AS avg_views FROM posts WHERE status = 'published';

-- 查找浏览量最高和最低
SELECT MAX(view_count) AS max_views, MIN(view_count) AS min_views FROM posts;

AS total 是起别名,让结果列名更易读。

COUNT(*) 和 COUNT(字段) 的区别

-- 统计所有行数(包含 NULL)
SELECT COUNT(*) FROM users;

-- 只统计 age 不为 NULL 的行
SELECT COUNT(age) FROM users;
写法行为
COUNT(*)统计所有行,不管有没有 NULL
COUNT(字段)只统计该字段不为 NULL 的行

一般统计行数用 COUNT(*),想排除 NULL 用 COUNT(字段)

多个聚合函数一起用

SELECT
    COUNT(*) AS total,
    SUM(view_count) AS total_views,
    AVG(view_count) AS avg_views
FROM posts
WHERE status = 'published';

一条 SQL 就能同时拿到总数、总和、平均值。


二、GROUP BY 分组

什么时候用

"按 XX 分组统计"——比如按状态统计文章数、按分类统计文章数、按用户统计评论数。

基本语法

SELECT 分组字段, 聚合函数(字段)
FROM 表名
WHERE 条件
GROUP BY 分组字段;

示例:按状态统计文章数

SELECT status, COUNT(*) AS post_count
FROM posts
GROUP BY status;

结果:

status     | post_count
-----------+----------
draft      | 1
published  | 3
archived   | 1

GROUP BY status 把相同状态的行分成一组,COUNT(*) 分别统计每组有多少行。

分组字段必须出现在 SELECT 里

-- ✅ 正确:status 在 SELECT 和 GROUP BY 里都有
SELECT status, COUNT(*) FROM posts GROUP BY status;

-- ❌ 错误:title 不在 GROUP BY 里,也不在聚合函数里
SELECT status, title, COUNT(*) FROM posts GROUP BY status;

规则:SELECT 里出现的非聚合字段,必须出现在 GROUP BY 里。

按多个字段分组

-- 按用户和状态分组,统计每个用户在每种状态下的文章数
SELECT user_id, status, COUNT(*) AS post_count
FROM posts
GROUP BY user_id, status;

结果:

user_id | status    | post_count
--------+-----------+----------
1       | draft     | 2
1       | published | 5
2       | published | 3

三、HAVING 过滤分组结果

WHERE vs HAVING

关键字作用执行时机
WHERE过滤分组之前
HAVING过滤分组结果分组之后
-- WHERE:分组前先过滤掉非 published 的行
SELECT status, COUNT(*) FROM posts WHERE status = 'published' GROUP BY status;

-- HAVING:分组后只保留数量大于 2 的组
SELECT status, COUNT(*) AS cnt FROM posts GROUP BY status HAVING cnt > 2;

基本语法

SELECT 分组字段, 聚合函数(字段)
FROM 表名
WHERE 条件          -- 分组前过滤行
GROUP BY 分组字段
HAVING 聚合条件;    -- 分组后过滤结果

示例:文章数大于 1 的分类

SELECT
    c.name AS category_name,
    COUNT(p.id) AS post_count
FROM categories AS c
LEFT JOIN posts AS p ON c.id = p.category_id
GROUP BY c.id, c.name
HAVING COUNT(p.id) > 1;

为什么不能把聚合条件写到 WHERE

-- ❌ 错误:WHERE 里不能用聚合函数
SELECT status, COUNT(*) AS cnt FROM posts GROUP BY status WHERE cnt > 2;

-- ✅ 正确:聚合条件放 HAVING
SELECT status, COUNT(*) AS cnt FROM posts GROUP BY status HAVING cnt > 2;

WHERE 执行时还没分组,聚合函数算不出来。HAVING 在分组之后执行,这时候才有聚合结果。

WHERE + GROUP BY + HAVING 完整流程

SELECT user_id, COUNT(*) AS post_count
FROM posts
WHERE status = 'published'    -- ① 先过滤:只要已发布的
GROUP BY user_id               -- ② 再分组:按用户分
HAVING COUNT(*) >= 3           -- ③ 最后过滤:只保留 >= 3 篇的用户
ORDER BY post_count DESC;      -- ④ 排序

执行顺序:WHEREGROUP BYHAVINGORDER BYLIMIT


四、JOIN + GROUP BY 组合

实际统计经常需要跨表分组。

示例:按分类统计文章数

SELECT
    c.name AS category_name,
    COUNT(p.id) AS post_count
FROM categories AS c
LEFT JOIN posts AS p ON c.id = p.category_id
GROUP BY c.id, c.name
ORDER BY post_count DESC;

LEFT JOIN 是为了没有文章的分类也显示出来,数量为 0。

为什么用 COUNT(p.id) 而不是 COUNT(*)

LEFT JOIN 后,没有文章的分类也会产生一行。
COUNT(*) 会把这行也算进去 → 结果是 1,但实际是 0。
COUNT(p.id) 只统计 p.id 不为 NULL 的行 → 结果是 0,正确。

示例:按用户统计评论数

SELECT
    u.username,
    COUNT(cm.id) AS comment_count
FROM users AS u
LEFT JOIN comments AS cm ON u.id = cm.user_id
GROUP BY u.id, u.username
ORDER BY comment_count DESC;

示例:统计每篇文章的评论数

SELECT
    p.id,
    p.title,
    COUNT(cm.id) AS comment_count
FROM posts AS p
LEFT JOIN comments AS cm ON p.id = cm.post_id
GROUP BY p.id, p.title
ORDER BY comment_count DESC;

模式都一样:LEFT JOIN 连表 → GROUP BY 分组 → COUNT 统计。


五、子查询

子查询就是 SQL 里面嵌套一个 SQL。

基本语法

SELECT 字段
FROM 表名
WHERE 字段 比较运算符 (SELECT 字段 FROM 表名 WHERE 条件);

括号里的子查询先执行,结果给外层查询用。

示例:查询浏览量高于平均值的文章

SELECT id, title, view_count
FROM posts
WHERE view_count > (
    SELECT AVG(view_count) FROM posts
)
ORDER BY view_count DESC;

执行过程:

  1. 子查询 SELECT AVG(view_count) FROM posts 算出平均浏览量,假设是 100
  2. 外层查询变成 WHERE view_count > 100

示例:查询浏览量最高的文章

SELECT id, title, view_count
FROM posts
WHERE view_count = (
    SELECT MAX(view_count) FROM posts
);

六、IN 子查询

子查询返回多个值时,用 IN 匹配。

基本语法

SELECT 字段
FROM 表名
WHERE 字段 IN (SELECT 字段 FROM 表名 WHERE 条件);

示例:查询有评论的文章

SELECT id, title
FROM posts
WHERE id IN (
    SELECT DISTINCT post_id FROM comments
);

执行过程:

  1. 子查询找出所有有评论的 post_id,比如 [1, 3, 5]
  2. 外层查询变成 WHERE id IN (1, 3, 5)

IN 子查询 vs JOIN

同一个需求可以有不同写法:

-- 写法一:IN 子查询
SELECT id, title FROM posts
WHERE id IN (SELECT DISTINCT post_id FROM comments);

-- 写法二:JOIN
SELECT DISTINCT p.id, p.title
FROM posts AS p
INNER JOIN comments AS cm ON p.id = cm.post_id;

两种写法结果一样。数据量小时差别不大,数据量大时 JOIN 通常更快。

NOT IN:查询没有评论的文章

SELECT id, title
FROM posts
WHERE id NOT IN (
    SELECT DISTINCT post_id FROM comments
);

注意:如果子查询结果里有 NULLNOT IN 会返回空结果。加个 WHERE post_id IS NULL 过滤掉:

SELECT id, title FROM posts
WHERE id NOT IN (
    SELECT post_id FROM comments WHERE post_id IS NOT NULL
);

七、EXISTS 子查询

EXISTS 只关心子查询有没有结果,不关心具体值。

基本语法

SELECT 字段
FROM 表名 AS 别名
WHERE EXISTS (
    SELECT 1 FROM 另一个表 AS 别名 WHERE 关联条件
);

SELECT 1 是固定写法,因为 EXISTS 只看有没有行,不看值。

示例:查询有评论的文章

SELECT p.id, p.title
FROM posts AS p
WHERE EXISTS (
    SELECT 1
    FROM comments AS cm
    WHERE cm.post_id = p.id
);

执行过程:对 posts 表的每一行,去 comments 表里找有没有 post_id 匹配的行。有 → 返回这行,没有 → 跳过。

EXISTS vs IN

-- IN:子查询先跑完,拿到所有结果,再匹配
WHERE id IN (SELECT post_id FROM comments)

-- EXISTS:外层每行都去子查询里找,找到就停
WHERE EXISTS (SELECT 1 FROM comments WHERE post_id = p.id)
场景推荐
子查询结果集小IN
子查询结果集大,外层表小EXISTS
只判断是否存在EXISTS

NOT EXISTS:查询没有评论的文章

SELECT p.id, p.title
FROM posts AS p
WHERE NOT EXISTS (
    SELECT 1
    FROM comments AS cm
    WHERE cm.post_id = p.id
);

NOT IN 更安全,不会受 NULL 影响。


八、综合示例

查询每个分类下已发布文章的评论数,只显示评论数 > 5 的分类

SELECT
    c.name AS category_name,
    COUNT(cm.id) AS comment_count
FROM categories AS c
LEFT JOIN posts AS p ON c.id = p.category_id AND p.status = 'published'
LEFT JOIN comments AS cm ON p.id = cm.post_id
GROUP BY c.id, c.name
HAVING COUNT(cm.id) > 5
ORDER BY comment_count DESC;

注意 p.status = 'published' 放在 ON 里而不是 WHERE 里。放 WHERE 会过滤掉没有已发布文章的分类,LEFT JOIN 就失效了。

查询评论数最多的文章

SELECT
    p.id,
    p.title,
    COUNT(cm.id) AS comment_count
FROM posts AS p
LEFT JOIN comments AS cm ON p.id = cm.post_id
GROUP BY p.id, p.title
ORDER BY comment_count DESC
LIMIT 1;

九、写 SQL 的建议顺序

复杂 SQL 不要一口气写完,分步骤来:

① 先写主表 FROM
② 加 JOIN 连表
③ 加 WHERE 过滤行
④ 加 GROUP BY 分组
⑤ 加 HAVING 过滤分组结果
⑥ 加 SELECT 选字段
⑦ 加 ORDER BY 排序
⑧ 加 LIMIT 截取

SQL 执行顺序和写法顺序不一样:

FROM → JOIN → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT

先理解执行顺序,写 SQL 时才不会把条件放错位置。