问题

每日人流量信息被记录在这三列信息中:序号 (id)、日期 (visit_date)、 人流量 (people)

每天只有一行记录,日期随着 id 的增加而增加。

编写一个 SQL 查询以找出每行的人数大于或等于 100 且 id 连续的三行或更多行记录。

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| id            | int     |
| visit_date    | date    |
| people        | int     |
+---------------+---------+
+------+------------+-----------+
| id   | visit_date | people    |
+------+------------+-----------+
| 1    | 2017-01-01 | 10        |
| 2    | 2017-01-02 | 109       |
| 3    | 2017-01-03 | 150       |
| 4    | 2017-01-04 | 99        |
| 5    | 2017-01-05 | 145       |
| 6    | 2017-01-06 | 1455      |
| 7    | 2017-01-07 | 199       |
| 8    | 2017-01-09 | 188       |
+------+------------+-----------+

思路

首先需要找出人数大于100的天数, 然后这个题目就和连续登录是一样的了,需要用row_number()排序。 不同的是这里不需要区分user,所以over从句里没有partition by,只有order by。

select id, visit_date, people, row_number() over (order by id) 
from stadium 
where people > 100;

输出

 id | visit_date | people | row_number 
----+------------+--------+------------
  2 | 2017-01-02 |    109 |          1
  3 | 2017-01-03 |    150 |          2
  5 | 2017-01-05 |    145 |          3
  6 | 2017-01-06 |   1455 |          4
  7 | 2017-01-07 |    199 |          5
  8 | 2017-01-08 |    188 |          6

接下来的步骤就是和连续登录一样,找出相同开始时间的组。具体参考SQL-连续n天登录 - 天马行空的小站 (1bittech.com)

不同的是这里因为要列出所有列,所以不能直接group by

select id, visit_date, people, id - rank::int as start_date
from (
  select id, visit_date, people, row_number() over (order by id) as rank
  from stadium 
  where people > 100) as large_number;

输出

 id | visit_date | people | start_date 
----+------------+--------+------------
  2 | 2017-01-02 |    109 |          1
  3 | 2017-01-03 |    150 |          1
  5 | 2017-01-05 |    145 |          2
  6 | 2017-01-06 |   1455 |          2
  7 | 2017-01-07 |    199 |          2
  8 | 2017-01-09 |    188 |          2

之后分组

select id, visit_date, people, count(1) over (partition by start_date)
from (
  select id, visit_date, people, id - rank::int as start_date
  from (
    select id, visit_date, people, row_number() over (order by id) as rank
    from stadium 
    where people > 100) as large_number
) as counted_table;

输出

 id | visit_date | people | count 
----+------------+--------+-------
  2 | 2017-01-02 |    109 |     2
  3 | 2017-01-03 |    150 |     2
  5 | 2017-01-05 |    145 |     4
  6 | 2017-01-06 |   1455 |     4
  7 | 2017-01-07 |    199 |     4
  8 | 2017-01-09 |    188 |     4

此时,我们只要过滤出count(连续登录天数)大于等于3的

select id, visit_date, people
from (
  select id, visit_date, people, count(1) over (partition by start_date)
  from (
    select id, visit_date, people, id - rank::int as start_date
    from (
      select id, visit_date, people, row_number() over (order by id) as rank
      from stadium 
      where people > 100) as large_number
  ) as counted_table
) as final_table
where count >= 3;

输出

 id | visit_date | people 
----+------------+--------
  5 | 2017-01-05 |    145
  6 | 2017-01-06 |   1455
  7 | 2017-01-07 |    199
  8 | 2017-01-09 |    188