【MySQL】窗口函数 Lead 和 Lag 的运用(MySQL版本8+)以及 时间差函数 TIMESTAMPDIFF 的运用

力扣题

1、题目地址

1939. 主动请求确认消息的用户

2、模拟表

表:Signups

Column NameType
user_idint
time_stampdatetime
  • user_id is the primary key for this table.
  • 每行包含有关 ID 为 user_id 的用户的注册时间的信息。

表:Confirmations

Column NameType
user_idint
time_stampdatetime
actionENUM
  • (user_id, time_stamp) is the primary key for this table.
  • user_id is a foreign key with a reference to the Signups table.
  • action is an ENUM of the type (‘confirmed’, ‘timeout’)
  • 此表的每一行都表示 ID 为 user_id 的用户在 time_stamp 请求了确认消息,
  • 并且该确认消息已被确认(‘confirmed’)或已过期(‘timeout’)。

3、要求

编写 SQL 查询以查找在 24 小时窗口内两次请求确认消息的用户的 ID。
两个正好相隔 24 小时的消息被认为是在窗口内。 该操作不会影响答案,只会影响请求时间。

以任意顺序返回结果表。

查询结果格式如下例:

Signups 表:

user_idtime_stamp
32020-03-21 10:16:13
72020-01-04 13:57:59
22020-07-29 23:09:44
62020-12-09 10:39:37

Confirmations 表:

user_idtime_stampaction
32021-01-06 03:30:46timeout
32021-01-06 03:37:45timeout
72021-06-12 11:57:29confirmed
72021-06-13 11:57:30confirmed
22021-01-22 00:00:00confirmed
22021-01-23 00:00:00timeout
62021-10-23 14:14:14confirmed
62021-10-24 14:14:13timeout

输出:

user_id
2
3
6

解析:
用户 2 在彼此恰好 24 小时内请求了两条消息,因此我们将它们包括在内。
用户 3 在 6 分 59 秒内请求了两条消息,因此我们将它们包括在内。
用户 6 在 23 小时 59 分 59 秒内请求了两条消息,因此我们将它们包括在内。
用户 7 在 24 小时 1 秒内请求了两条消息,因此我们将它们从答案中排除。

4、代码编写

知识点(先看这个方便理解代码)

TIMESTAMPDIFF ⭐️

语法: TIMESTAMPDIFF (interval, datetime_expr1, datetime_expr2)

说明: 返回结果 datetime_expr2 - datetime_expr1

参数: interval 可以为 SECOND - 秒、MINUTE - 分、HOUR - 时、DAY - 日、MONTH - 月、YEAR - 年

mysql> SELECT TIMESTAMPDIFF(DAY,'2024-01-03','2024-01-01'); 
-> -2
mysql> SELECT TIMESTAMPDIFF(DAY,'2024-01-01','2024-01-03'); 
-> 2

参考:MySQL – 计算时间差

Lead 和 Lag⭐️

语法:
Lead (列名, 偏移量, 超出记录窗口时的默认值)
Lag (列名, 偏移量, 超出记录窗口时的默认值)
注:如果使用 Lead (列名) 或 Lag (列名) 语法,则偏移量默认1,默认值是 null(无显示)

说明:
Lead:形象的理解就是把数据从下向上推,下端出现空格
Lag:形象的理解就是把数据从上向下推,上端出现空格

原始数据 (items):

item_iditem_brand
1Samsung
2Lenovo
3LG
4HP

Lead代码效果

SELECT *, Lead(item_brand, 1, 0) OVER (ORDER BY item_id) AS lead_data
FROM items
item_iditem_brandlead_data
1SamsungLenovo
2LenovoLG
3LGHP
4HP0

Lag代码效果

SELECT *, Lag(item_brand, 1, 0) OVER (ORDER BY item_id) AS lag_data
FROM items
item_iditem_brandlag_data
1Samsung0
2LenovoSamsung
3LGLenovo
4HPLG

参考:sql中的窗口函数:lead,lag

1、TIMESTAMPDIFF 配合 Lead 写法

SELECT DISTINCT user_id
FROM (
    SELECT user_id, TIMESTAMPDIFF(
        SECOND, 
        time_stamp, 
        LEAD(time_stamp, 1, 0) OVER (PARTITION BY user_id ORDER BY time_stamp)
    ) AS diff
    FROM Confirmations
) AS one
WHERE diff IS NOT NULL
AND diff <= 24*60*60

代码分析

SELECT user_id, TIMESTAMPDIFF(
    SECOND, 
    time_stamp, 
    LEAD(time_stamp, 1, 0) OVER (PARTITION BY user_id ORDER BY time_stamp)
) AS diff
FROM Confirmations

Lead的效果是下向上推,例如

| 2       | 2021-01-22 00:00:00 |
| 2       | 2021-01-23 00:00:00 |

变成

| 2       | 2021-01-23 00:00:00 |
| 2       | null |

利用向上推的数据减去原先的数据,
上面SQL语句 TIMESTAMPDIFF 的第二个参数是 2021-01-22 00:00:00,第三个参数是 2021-01-23 00:00:00
下面出现空格,默认值null(无显示)没错,

| user_id | diff  |
| ------- | ----- |
| 2       | 86400 |
| 2       | null  |
| 3       | 419   |
| 3       | null  |
| 6       | 86399 |
| 6       | null  |
| 7       | 86401 |
| 7       | null  |

要小于等于86400(一天)里面只有三个符合要求

| user_id | diff  |
| ------- | ----- |
| 2       | 86400 |
| 3       | 419   |
| 6       | 86399 |

2、TIMESTAMPDIFF 配合 Lag 写法

SELECT DISTINCT user_id
FROM (
    SELECT user_id, TIMESTAMPDIFF(
        SECOND, 
        time_stamp, 
        LAG(time_stamp, 1, 0) OVER (PARTITION BY user_id ORDER BY time_stamp)
    ) AS diff
    FROM Confirmations
) AS one
WHERE diff IS NOT NULL
AND ABS(diff) <= 24*60*60

Lag的效果是上向下推,例如

| 2       | 2021-01-22 00:00:00 |
| 2       | 2021-01-23 00:00:00 |

变成

| 2       | null |
| 2       | 2021-01-22 00:00:00 |

利用向上推的数据减去原先的数据,
上面SQL语句 TIMESTAMPDIFF 的第二个参数是 2021-01-23 00:00:00,第三个参数是 2021-01-22 00:00:00
上面出现空格,默认值null(无显示)没错,

| user_id | diff   |
| ------- | ------ |
| 2       | null   |
| 2       | -86400 |
| 3       | null   |
| 3       | -419   |
| 6       | null   |
| 6       | -86399 |
| 7       | null   |
| 7       | -86401 |

要小于等于86400(一天)里面只有三个符合要求(需要取绝对值)

| user_id | diff  |
| ------- | ----- |
| 2       | -86400 |
| 3       | -419   |
| 6       | -86399 |

你要简单写就把 TIMESTAMPDIFF 参数二和参数三换个位置就不用取绝对值了,主要还是演示多种情况,方便更好理解

SELECT DISTINCT user_id
FROM (
    SELECT user_id, TIMESTAMPDIFF(
        SECOND, 
        LAG(time_stamp, 1, 0) OVER (PARTITION BY user_id ORDER BY time_stamp), 
        time_stamp
    ) AS diff
    FROM Confirmations
) AS one
WHERE diff IS NOT NULL
AND diff <= 24*60*60

注:ABS() 函数的效果和数学中的绝对值效果一样

3、单 TIMESTAMPDIFF 写法

有限制正范围的情况就只保证两个时间不相同即可(a.time_stamp != b.time_stamp 是为了不和自己连接,因为没有唯一键做区分)

SELECT DISTINCT a.user_id
FROM Confirmations a, Confirmations b
WHERE a.user_id = b.user_id
AND TIMESTAMPDIFF(SECOND, a.time_stamp, b.time_stamp) BETWEEN 0 AND 24*60*60
AND a.time_stamp != b.time_stamp

包含负范围的情况就还得控制算出的时间差是正的(b.time_stamp - a.time_stamp > 0)

SELECT DISTINCT a.user_id
FROM Confirmations a, Confirmations b
WHERE a.user_id = b.user_id
AND TIMESTAMPDIFF(SECOND, a.time_stamp, b.time_stamp) <= 86400
AND a.time_stamp < b.time_stamp