【MySQL】窗口函数 Lead 和 Lag 的运用(MySQL版本8+)以及 时间差函数 TIMESTAMPDIFF 的运用
力扣题
1、题目地址
2、模拟表
表:Signups
Column Name | Type |
---|---|
user_id | int |
time_stamp | datetime |
- user_id is the primary key for this table.
- 每行包含有关 ID 为 user_id 的用户的注册时间的信息。
表:Confirmations
Column Name | Type |
---|---|
user_id | int |
time_stamp | datetime |
action | ENUM |
- (user_id, time_stamp) is the primary key for this table.
- user_id is a foreign key with a reference to the Signups table.
- action is an ENUM of the type (‘confirmed’, ‘timeout’)
- 此表的每一行都表示 ID 为 user_id 的用户在 time_stamp 请求了确认消息,
- 并且该确认消息已被确认(‘confirmed’)或已过期(‘timeout’)。
3、要求
编写 SQL 查询以查找在 24 小时窗口内两次请求确认消息的用户的 ID。
两个正好相隔 24 小时的消息被认为是在窗口内。 该操作不会影响答案,只会影响请求时间。
以任意顺序返回结果表。
查询结果格式如下例:
Signups 表:
user_id | time_stamp |
---|---|
3 | 2020-03-21 10:16:13 |
7 | 2020-01-04 13:57:59 |
2 | 2020-07-29 23:09:44 |
6 | 2020-12-09 10:39:37 |
Confirmations 表:
user_id | time_stamp | action |
---|---|---|
3 | 2021-01-06 03:30:46 | timeout |
3 | 2021-01-06 03:37:45 | timeout |
7 | 2021-06-12 11:57:29 | confirmed |
7 | 2021-06-13 11:57:30 | confirmed |
2 | 2021-01-22 00:00:00 | confirmed |
2 | 2021-01-23 00:00:00 | timeout |
6 | 2021-10-23 14:14:14 | confirmed |
6 | 2021-10-24 14:14:13 | timeout |
输出:
user_id |
---|
2 |
3 |
6 |
解析:
用户 2 在彼此恰好 24 小时内请求了两条消息,因此我们将它们包括在内。
用户 3 在 6 分 59 秒内请求了两条消息,因此我们将它们包括在内。
用户 6 在 23 小时 59 分 59 秒内请求了两条消息,因此我们将它们包括在内。
用户 7 在 24 小时 1 秒内请求了两条消息,因此我们将它们从答案中排除。
4、代码编写
知识点(先看这个方便理解代码)
TIMESTAMPDIFF ⭐️
语法: TIMESTAMPDIFF (interval, datetime_expr1, datetime_expr2)
说明: 返回结果 datetime_expr2 - datetime_expr1
参数: interval 可以为 SECOND - 秒、MINUTE - 分、HOUR - 时、DAY - 日、MONTH - 月、YEAR - 年
mysql> SELECT TIMESTAMPDIFF(DAY,'2024-01-03','2024-01-01');
-> -2
mysql> SELECT TIMESTAMPDIFF(DAY,'2024-01-01','2024-01-03');
-> 2
Lead 和 Lag⭐️
语法:
Lead (列名, 偏移量, 超出记录窗口时的默认值)
Lag (列名, 偏移量, 超出记录窗口时的默认值)
注:如果使用 Lead (列名) 或 Lag (列名) 语法,则偏移量默认1,默认值是 null(无显示)
说明:
Lead:形象的理解就是把数据从下向上推,下端出现空格
Lag:形象的理解就是把数据从上向下推,上端出现空格
原始数据 (items):
item_id | item_brand |
---|---|
1 | Samsung |
2 | Lenovo |
3 | LG |
4 | HP |
Lead代码效果
SELECT *, Lead(item_brand, 1, 0) OVER (ORDER BY item_id) AS lead_data
FROM items
item_id | item_brand | lead_data |
---|---|---|
1 | Samsung | Lenovo |
2 | Lenovo | LG |
3 | LG | HP |
4 | HP | 0 |
Lag代码效果
SELECT *, Lag(item_brand, 1, 0) OVER (ORDER BY item_id) AS lag_data
FROM items
item_id | item_brand | lag_data |
---|---|---|
1 | Samsung | 0 |
2 | Lenovo | Samsung |
3 | LG | Lenovo |
4 | HP | LG |
1、TIMESTAMPDIFF 配合 Lead 写法
SELECT DISTINCT user_id
FROM (
SELECT user_id, TIMESTAMPDIFF(
SECOND,
time_stamp,
LEAD(time_stamp, 1, 0) OVER (PARTITION BY user_id ORDER BY time_stamp)
) AS diff
FROM Confirmations
) AS one
WHERE diff IS NOT NULL
AND diff <= 24*60*60
代码分析
SELECT user_id, TIMESTAMPDIFF(
SECOND,
time_stamp,
LEAD(time_stamp, 1, 0) OVER (PARTITION BY user_id ORDER BY time_stamp)
) AS diff
FROM Confirmations
Lead的效果是下向上推,例如
| 2 | 2021-01-22 00:00:00 |
| 2 | 2021-01-23 00:00:00 |
变成
| 2 | 2021-01-23 00:00:00 |
| 2 | null |
利用向上推的数据减去原先的数据,
上面SQL语句 TIMESTAMPDIFF 的第二个参数是 2021-01-22 00:00:00,第三个参数是 2021-01-23 00:00:00
下面出现空格,默认值null(无显示)没错,
| user_id | diff |
| ------- | ----- |
| 2 | 86400 |
| 2 | null |
| 3 | 419 |
| 3 | null |
| 6 | 86399 |
| 6 | null |
| 7 | 86401 |
| 7 | null |
要小于等于86400(一天)里面只有三个符合要求
| user_id | diff |
| ------- | ----- |
| 2 | 86400 |
| 3 | 419 |
| 6 | 86399 |
2、TIMESTAMPDIFF 配合 Lag 写法
SELECT DISTINCT user_id
FROM (
SELECT user_id, TIMESTAMPDIFF(
SECOND,
time_stamp,
LAG(time_stamp, 1, 0) OVER (PARTITION BY user_id ORDER BY time_stamp)
) AS diff
FROM Confirmations
) AS one
WHERE diff IS NOT NULL
AND ABS(diff) <= 24*60*60
Lag的效果是上向下推,例如
| 2 | 2021-01-22 00:00:00 |
| 2 | 2021-01-23 00:00:00 |
变成
| 2 | null |
| 2 | 2021-01-22 00:00:00 |
利用向上推的数据减去原先的数据,
上面SQL语句 TIMESTAMPDIFF 的第二个参数是 2021-01-23 00:00:00,第三个参数是 2021-01-22 00:00:00
上面出现空格,默认值null(无显示)没错,
| user_id | diff |
| ------- | ------ |
| 2 | null |
| 2 | -86400 |
| 3 | null |
| 3 | -419 |
| 6 | null |
| 6 | -86399 |
| 7 | null |
| 7 | -86401 |
要小于等于86400(一天)里面只有三个符合要求(需要取绝对值)
| user_id | diff |
| ------- | ----- |
| 2 | -86400 |
| 3 | -419 |
| 6 | -86399 |
你要简单写就把 TIMESTAMPDIFF 参数二和参数三换个位置就不用取绝对值了,主要还是演示多种情况,方便更好理解
SELECT DISTINCT user_id
FROM (
SELECT user_id, TIMESTAMPDIFF(
SECOND,
LAG(time_stamp, 1, 0) OVER (PARTITION BY user_id ORDER BY time_stamp),
time_stamp
) AS diff
FROM Confirmations
) AS one
WHERE diff IS NOT NULL
AND diff <= 24*60*60
注:ABS() 函数的效果和数学中的绝对值效果一样
3、单 TIMESTAMPDIFF 写法
有限制正范围的情况就只保证两个时间不相同即可(a.time_stamp != b.time_stamp 是为了不和自己连接,因为没有唯一键做区分)
SELECT DISTINCT a.user_id
FROM Confirmations a, Confirmations b
WHERE a.user_id = b.user_id
AND TIMESTAMPDIFF(SECOND, a.time_stamp, b.time_stamp) BETWEEN 0 AND 24*60*60
AND a.time_stamp != b.time_stamp
包含负范围的情况就还得控制算出的时间差是正的(b.time_stamp - a.time_stamp > 0)
SELECT DISTINCT a.user_id
FROM Confirmations a, Confirmations b
WHERE a.user_id = b.user_id
AND TIMESTAMPDIFF(SECOND, a.time_stamp, b.time_stamp) <= 86400
AND a.time_stamp < b.time_stamp