【SQL进阶教程】外连接的用法

用外连接进行行列转换(1)(行->列):制作交叉表

这里有一张用于管理员工学习过的培训课程的表,如下所示。

nVUMLT.jpg

利用上面这张表生成下面这样一张交叉表(“课程学习记录一览表”)。

nVdrRA.jpg

1
2
3
4
5
6
7
8
9
-- 外连接
SELECT C0.name,
CASE WHEN C1.name IS NOT NULL THEN 'O' ELSE NULL END AS 'SQL入门',
CASE WHEN C2.name IS NOT NULL THEN 'O' ELSE NULL END AS 'UNIX基础',
CASE WHEN C3.name IS NOT NULL THEN 'O' ELSE NULL END AS 'JAVA中级'
FROM (SELECT DISTINCT name FROM Courses) C0
LEFT OUTER JOIN (SELECT name FROM Courses WHERE course = 'SQL入门') C1 ON C0.name = C1.name
LEFT OUTER JOIN (SELECT name FROM Courses WHERE course = 'UNIX基础') C2 ON C0.name = C2.name
LEFT OUTER JOIN (SELECT name FROM Courses WHERE course = 'JAVA中级') C3 ON C0.name = C3.name;

这种做法具有比较直观和易于理解的优点,但是因为大量用到了内嵌视图和连接操作,代码会显得很臃肿。而且,随着表头列数的增加,性能也会恶化。

一般情况下,外连接都可以用标量子查询替代,因此可以像下面这样写。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
-- 标量子查询
SELECT C0.name,
(SELECT 'O'
FROM Courses C1
WHERE course = 'SQL入门'
AND C0.name = C1.name) AS 'SQL入门',
(SELECT 'O'
FROM Courses C2
WHERE course = 'UNIX基础'
AND C0.name = C2.name) AS 'UNIX基础',
(SELECT 'O'
FROM Courses C3
WHERE course = 'JAVA中级'
AND C0.name = C3.name) AS 'JAVA中级'
FROM (SELECT DISTINCT name FROM Courses) C0;

这种做法不仅利于应对需求变更,对于需要动态生成SQL的系统也是很有好处的。缺点是性能不太好,目前在SELECT子句中使用标量子查询(或者关联子查询)的话,性能开销还是相当大的。

1
2
3
4
5
6
7
8
9
10
-- CASE表达式
SELECT name,
CASE WHEN SUM(CASE WHEN course = 'SQL入门' THEN 1 ELSE NULL END) = 1
THEN 'O' ELSE NULL END AS 'SQL入门',
CASE WHEN SUM(CASE WHEN course = 'UNIX基础' THEN 1 ELSE NULL END) = 1
THEN 'O' ELSE NULL END AS 'UNIX基础',
CASE WHEN SUM(CASE WHEN course = 'JAVA中级' THEN 1 ELSE NULL END) = 1
THEN 'O' ELSE NULL END AS 'JAVA中级'
FROM Courses
GROUP BY name;

用外连接进行行列转换(2)(列->行):汇总重复项于一列

假设存在下面这样一张表。

nVxrff.jpg

需要做的是将这张表转换成行格式的数据。不过,根据具体需求,有时需要把没有孩子的吉田也留在表里,像下面这张“员工子女列表”这样。

nVzPje.jpg

1
2
3
4
5
6
7
8
9
10
11
CREATE VIEW Children(child)
AS
SELECT child_1 FROM Personnel
UNION
SELECT child_2 FROM Personnel
UNION
SELECT child_3 FROM Personnel;

SELECT EMP.employee, CHILDREN.child
FROM Personnel EMP
LEFT OUTER JOIN Children ON CHILDREN.child IN (EMP.child_1, EMP.child_2, EMP.child_3);

在交叉表里制作嵌套式表侧栏

在生成统计表的工作中,经常会有制作嵌套式表头和表侧栏的需求。例如:表TblPop是一张按照县、年龄层级和性别统计的人口分布表,要求根据表TblPop生成交叉表“包含嵌套式表侧栏的统计表”。

nZ3nwd.jpg

nZ3Nwj.jpg

nZ3UTs.jpg

1
2
3
4
5
6
7
8
9
10
11
12
13
SELECT MASTER.age_class AS age_class,
MASTER.sex_cd AS sex_cd,
DATA.pop_tohoku AS pop_tohoku,
DATA.pop_kanto AS pop_kanto
FROM (SELECT age_class, sex_cd
FROM TblAge CROSS JOIN TblSex) MASTER
LEFT OUTER JOIN (SELECT age_class, sex_cd,
SUM(CASE WHEN pref_name IN ('秋田', '青森') THEN population ELSE NULL END) AS pop_tohoku,
SUM(CASE WHEN pref_name IN ('东京', '千叶') THEN population ELSE NULL END) AS pop_kanto
FROM TblPop
GROUP BY age_class, sex_cd) DATA
ON MASTER.age_class = DATA.age_class
AND MASTER.sex_cd = DATA.sex_cd;

技巧是对表TblAge和表TblSex进行交叉连接运算,生成笛卡尔积。

作为乘法运算的连接

有下面的商品主表和商品销售历史管理表。

nZ0fSJ.jpg

使用这两张表生成一张统计表,以商品为单位汇总出各自的销量。

nZDzIf.png

1
2
3
4
5
6
7
-- 通过在连接前聚合来创建一对一的关系
SELECT I.item_no, SH.total_qty
FROM Items I
LEFT OUTER JOIN (SELECT item_no, SUM(quantity) AS total_qty
FROM SalesHistory
GROUP BY item_no) SH
ON I.item_no = SH.item_no;

这种做法的确是正确的,代码也很容易理解。但是,如果从性能角度考虑,这条SQL语句还是有些问题的。比如临时视图SH的数据需要临时存储在内存里,还有就是虽然通过聚合将item_no变成了主键,但是SH上却不存在主键索引,因此也就无法利用索引优化查询。

1
2
3
4
5
-- 先进行一对多的连接再聚合
SELECT I.item_no, SUM(SH.quantity) AS total_qty
FROM Items I
LEFT OUTER JOIN SalesHistory SH ON I.item_no = SH.item_no
GROUP BY I.item_no;

这种做法代码更简洁,而且没有使用临时视图,所以性能也会有所改善。技巧是一对一或一对多关系的两个集合,在进行连接操作后行数不会(异常地)增加。

全外连接

全外连接是能够从这样两张内容不一致的表里,没有遗漏地获取全部信息的方法,所以也可以理解成“把两张表都当作主表来使用”的连接。

nZ2nF1.jpg

nZ2KW6.jpg

1
2
3
4
5
SELECT COALESCE(A.id, B.id) AS id,
A.name AS A_name,
B.name AS B_name
FROM Class_A A
FULL OUTER JOIN Class_B B ON A.id = B.id;

nZ2779.png

用外连接求差集:A-B

1
2
3
4
SELECT A.id AS id, A.name AS A_name
FROM Class_A A
LEFT OUTER JOIN Class_B B ON A.id = B.id
WHERE B.name IS NULL;

nZWfLF.png

用外连接求差集:B-A

1
2
3
4
SELECT B.id AS id, B.name AS B_name
FROM Class_A A
RIGHT OUTER JOIN Class_B B ON A.id = B.id
WHERE A.name IS NULL;

nZfpFA.png

用全外连接求异或集

SQL没有定义求异或集的运算符。如果用集合运算符,可以有两种方法。一种是(A UNION B) EXCEPT (A INTERSECT B),另一种是(A EXCEPT B) UNION (B EXCEPT A)。两种方法都比较麻烦,性能开销也会增大。

1
2
3
4
5
6
SELECT COALESCE(A.id, B.id) AS id,
COALESCE(A.name, B.name) AS name
FROM Class_A A
FULL OUTER JOIN Class_B B ON A.id = B.id
WHERE A.name IS NULL
OR B.name IS NULL:

nZoOyt.png

用外连接求集合的商

使用“HAVING子句的力量”里的表Items和表ShopItems,可以像下面这样写。

1
2
3
4
5
6
7
8
SELECT DISTINCT shop
FROM ShopItems SI1
WHERE NOT EXISTS (SELECT I.item
FROM Items I
LEFT OUTER JOIN ShopItems SI2
ON SI2.item = I.item
AND SI1.shop = SI2.shop
WHERE SI2.item IS NULL);

nZbh1H.png