【SQL进阶教程】用SQL处理数列

生成连续编号

思考一下如何使用SQL生成连续编号。例如生成0~99这100个连续编号。
首先生成一张存储了各个数位上数字的表“数字表”。

nUm90A.jpg

这样,就可以通过对两个Digits集合求笛卡尔积而得出0~99的数字。

1
2
3
4
SELECT D1.digit + (D2.digit * 10) AS seq
FROM Digits D1
CROSS JOIN Digits D2
ORDER BY seq;

nUmJcF.png

同样地,通过追加D3、D4等集合,不论多少位的数都可以生成。而且,如果只想生成从1开始,或者到542结束的数,只需要在WHERE子句中加入过滤条件就可以了。

1
2
3
4
5
6
SELECT D1.digit + (D2.digit * 10) + (D3.digit * 100) AS seq
FROM Digits D1
CROSS JOIN Digits D2
CROSS JOIN Digits D3
WHERE D1.digit + (D2.digit * 10) + (D3.digit * 100) BETWEEN 1 AND 542
ORDER BY seq;

通过将这个查询的结果存储在视图里,就可以在需要连续编号时通过简单的SELECT来获取需要的编号。

1
2
3
4
5
6
CREATE VIEW Sequence(seq)
AS
SELECT D1.digit + (D2.digit * 10) + (D3.digit * 100)
FROM Digits D1
CROSS JOIN Digits D2
CROSS JOIN Digits D3;

求全部的缺失编号

查找连续编号中的缺失编号。想要知道如何求出全部的缺失编号。
作为示例,假设存在下面这样一张编号有缺失的表。

nUUbKs.jpg

因为表中最小的值是1,最大的值是12,所以可以根据这个范围从序列视图中获取数。

1
2
3
4
5
6
7
-- EXCEPT
SELECT seq
FROM Sequence
WHERE seq BETWEEN 1 AND 12
EXCEPT
SELECT seq
FROM SeqTbl;
1
2
3
4
5
-- IN
SELECT seq
FROM Sequence
WHERE seq BETWEEN 1 AND 12
AND seq NOT IN (SELECT seq FROM SeqTbl);

nUd41g.png

可能像下面这么做性能会有所下降,但是通过扩展BETWEEN谓词的参数,可以动态地指定目标表的最大值和最小值。

1
2
3
4
5
6
7
SELECT seq
FROM Sequence
WHERE seq BETWEEN (SELECT MIN(seq) FROM SeqTbl)
AND (SELECT MAX(seq) FROM SeqTbl)
EXCEPT
SELECT seq
FROM SeqTbl;

这种写法在查询上限和下限未必固定的表时非常方便。如果在“seq”列上建立索引,那么极值函数的运行可以变得更快速。

三个人能坐得下吗

假设存在下面这样一张存储了火车座位预定情况的表。

nUc9kd.jpg

问题是,从1~15的座位编号中,找出连续3个空位的全部组合。假设所有的座位排成了一条直线。
希望得到的结果是下面四种。

  • 3~5
  • 7~9
  • 8~10
  • 9~11
1
2
3
4
5
6
7
SELECT S1.seat AS start_seat, '~', S2.seat AS end_seat
FROM Seats S1, Seats S2
WHERE S2.seat = S1.seat + (:head_cnt - 1)
AND NOT EXISTS (SELECT *
FROM Seats S3
WHERE S3.seat BETWEEN S1.seat AND S2.seat
AND S3.status <> '未预订');

其中,“:head_cnt”是表示需要的空位个数的参数。

接下来,看一下升级版,即发生换排的情况。假设这列火车每一排有5个座位。

nUhS78.jpg

要想解决换排的问题,除了要求序列内的所有座位全部都是空位,还需要加入“全部都在一排”这样一个条件。

1
2
3
4
5
6
7
8
SELECT S1.seat AS start_seat, '~', S2.seat AS end_seat
FROM Seats2 S1, Seats2 S2
WHERE S2.seat = S1.seat + (:head_cnt - 1)
AND NOT EXISTS (SELECT *
FROM Seats2 S3
WHERE S3.seat BETWEEN S1.seat AND S2.seat
AND (S3.status <> '未预订'
OR S3.row_id <> S1.row_id));

nUxEvV.png

最多能坐下多少人

这次要查询的是“按现在的空位状况,最多能坐下多少人”。换句话说,要求的是最长的序列。使用下面这张表Seats3。

nUxIx0.jpg

就这张表而言,长度为4的序列“2~5”就是答案。
针对表Seats3中的数据,要想保证从座位A到另一个座位B是一个序列,则下面的3个条件必须全部都满足。

  • 条件1:起点到终点之间的所有座位状态都是“未预订”。
  • 条件2:起点之前的座位状态不是“未预订”。
  • 条件3:终点之后的座位状态不是“未预订”。

可以先生成一张存储了所有可能序列的视图。有了这个视图之后,只需从中查找出最长的序列就可以了。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
CREATE VIEW Sequences(start_seat, end_seat, seat_cnt)
AS
SELECT S1.seat AS start_seat,
S2.seat AS end_seat,
S2.seat - S1.seat + 1 AS seat_cnt
FROM Seats3 S1, Seats3 S2
WHERE S1.seat <= S2.seat
AND NOT EXISTS (SELECT *
FROM Seats3 S3
WHERE (S3.seat BETWEEN S1.seat AND S2.seat
AND S3.status <> '未预订')
OR (S3.seat = S2.seat + 1
AND S3.status = '未预订')
OR (S3.seat = S1.seat - 1
AND S3.status = '未预订'));

napaaF.png

1
2
3
SELECT start_seat, '~', end_seat, seat_cnt
FROM Sequences
WHERE seat_cnt = (SELECT MAX(seat_cnt) FROM Sequences);

单调递增和单调递减

假设存在下面这样一张反映了某公司股价动态的表。

napzzn.jpg

这里,求一下股价单调递增的时间区间。

1
2
3
4
5
6
7
8
9
10
SELECT S1.deal_date AS start_date,
S2.deal_date AS end_date
FROM MyStock S1, MyStock S2
WHERE S1.deal_date < S2.deal_date
AND NOT EXISTS (SELECT *
FROM MyStock S3, MyStock S4
WHERE S3.deal_date BETWEEN S1.deal_date AND S2.deal_date
AND S4.deal_date BETWEEN S1.deal_date AND S2.deal_date
AND S3.deal_date < S4.deal_date
AND S3.price >= S4.price);

naeC9K.png

要把不需要的子集排除掉。使用极值函数很容易就能实现。

1
2
3
4
5
6
7
8
9
10
11
12
13
SELECT MIN(start_date) AS start_date, end_date
FROM (SELECT S1.deal_date AS start_date,
MAX(S2.deal_date) AS end_date
FROM MyStock S1, MyStock S2
WHERE S1.deal_date < S2.deal_date
AND NOT EXISTS (SELECT *
FROM MyStock S3, MyStock S4
WHERE S3.deal_date BETWEEN S1.deal_date AND S2.deal_date
AND S4.deal_date BETWEEN S1.deal_date AND S2.deal_date
AND S3.deal_date < S4.deal_date
AND S3.price >= S4.price)
GROUP BY S1.deal_date) TMP
GROUP BY end_date;

naZtSO.png

这段代码的关键在于最大限度地延伸起点和终点。