问题
每日人流量信息被记录在这三列信息中:序号 (id)、日期 (visit_date)、 人流量 (people)
每天只有一行记录,日期随着 id 的增加而增加。
编写一个 SQL 查询以找出每行的人数大于或等于 100 且 id 连续的三行或更多行记录。
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| id | int |
| visit_date | date |
| people | int |
+---------------+---------+
+------+------------+-----------+
| id | visit_date | people |
+------+------------+-----------+
| 1 | 2017-01-01 | 10 |
| 2 | 2017-01-02 | 109 |
| 3 | 2017-01-03 | 150 |
| 4 | 2017-01-04 | 99 |
| 5 | 2017-01-05 | 145 |
| 6 | 2017-01-06 | 1455 |
| 7 | 2017-01-07 | 199 |
| 8 | 2017-01-09 | 188 |
+------+------------+-----------+
思路
首先需要找出人数大于100的天数, 然后这个题目就和连续登录是一样的了,需要用row_number()排序。 不同的是这里不需要区分user,所以over从句里没有partition by,只有order by。
select id, visit_date, people, row_number() over (order by id)
from stadium
where people > 100;
输出
id | visit_date | people | row_number
----+------------+--------+------------
2 | 2017-01-02 | 109 | 1
3 | 2017-01-03 | 150 | 2
5 | 2017-01-05 | 145 | 3
6 | 2017-01-06 | 1455 | 4
7 | 2017-01-07 | 199 | 5
8 | 2017-01-08 | 188 | 6
接下来的步骤就是和连续登录一样,找出相同开始时间的组。具体参考SQL-连续n天登录 - 天马行空的小站 (1bittech.com)
不同的是这里因为要列出所有列,所以不能直接group by
select id, visit_date, people, id - rank::int as start_date
from (
select id, visit_date, people, row_number() over (order by id) as rank
from stadium
where people > 100) as large_number;
输出
id | visit_date | people | start_date
----+------------+--------+------------
2 | 2017-01-02 | 109 | 1
3 | 2017-01-03 | 150 | 1
5 | 2017-01-05 | 145 | 2
6 | 2017-01-06 | 1455 | 2
7 | 2017-01-07 | 199 | 2
8 | 2017-01-09 | 188 | 2
之后分组
select id, visit_date, people, count(1) over (partition by start_date)
from (
select id, visit_date, people, id - rank::int as start_date
from (
select id, visit_date, people, row_number() over (order by id) as rank
from stadium
where people > 100) as large_number
) as counted_table;
输出
id | visit_date | people | count
----+------------+--------+-------
2 | 2017-01-02 | 109 | 2
3 | 2017-01-03 | 150 | 2
5 | 2017-01-05 | 145 | 4
6 | 2017-01-06 | 1455 | 4
7 | 2017-01-07 | 199 | 4
8 | 2017-01-09 | 188 | 4
此时,我们只要过滤出count(连续登录天数)大于等于3的
select id, visit_date, people
from (
select id, visit_date, people, count(1) over (partition by start_date)
from (
select id, visit_date, people, id - rank::int as start_date
from (
select id, visit_date, people, row_number() over (order by id) as rank
from stadium
where people > 100) as large_number
) as counted_table
) as final_table
where count >= 3;
输出
id | visit_date | people
----+------------+--------
5 | 2017-01-05 | 145
6 | 2017-01-06 | 1455
7 | 2017-01-07 | 199
8 | 2017-01-09 | 188