0

Does anyone know of a way to LAG in a specified order? In the example below, I provide a table with my current output [Lag (Current)] and my desired output [Lag (Desired)] columns. I am interested in creating a lag for the event time based on event time order, need to make sure that the Groups do not get out of order. The issue is that Group CL1 is currently not being lagged based on time, its based on the group, so that is where I am struggling.

Personal ID Event Time Groups Lag (Current) Lag (Desired)
99999999 4:18:58 PM GM1
99999999 4:21:03 PM GM1 4:18:58 PM 4:18:58 PM
99999999 4:21:42 PM CL1
99999999 4:25:04 PM CL1 4:21:42 PM 4:21:42 PM
99999999 4:25:40 PM GM2
99999999 4:25:45 PM GM2 4:25:40 PM 4:25:40 PM
99999999 4:26:14 PM CL1 4:25:04 PM
99999999 4:26:23 PM CL1 4:21:42 PM 4:21:42 PM

Here is the SQL I have tried.

SELECT LAG() OVER (PARTITION BY Personal ID, Groups ORDER BY Personal ID, Event Time)

Updated

Second example

The below seemed to work, but not sure if its correct, especially as more data passes through. It did work for this example and the one you provided above, I just added the groups in the the last w2.

WITH sample_table AS (
  SELECT '99999999' personal_id, '1:04:12 AM' event_time, 'CL1' groups UNION ALL
  SELECT '99999999' personal_id, '1:04:12 AM' event_time, 'GM2' UNION ALL
  SELECT '99999999' personal_id, '1:04:12 AM' event_time, 'FZ1' UNION ALL
  SELECT '99999999' personal_id, '8:13:42 AM' event_time, 'GM2' UNION ALL
  SELECT '99999999' personal_id, '8:13:42 AM' event_time, 'GM2' 
)

Expected Output enter image description here

Jaytiger
  • 11,626
  • 2
  • 5
  • 15
leaner007
  • 17
  • 5

1 Answers1

1

Consider below gaps and islands approach.

WITH sample_table AS (
  SELECT '99999999' personal_id, '4:18:58 PM' event_time, 'GM1' groups UNION ALL
  SELECT '99999999' personal_id, '4:21:03 PM' event_time, 'GM1' UNION ALL
  SELECT '99999999' personal_id, '4:21:42 PM' event_time, 'CL1' UNION ALL
  SELECT '99999999' personal_id, '4:25:04 PM' event_time, 'CL1' UNION ALL
  SELECT '99999999' personal_id, '4:25:40 PM' event_time, 'GM2' UNION ALL
  SELECT '99999999' personal_id, '4:25:45 PM' event_time, 'GM2' UNION ALL
  SELECT '99999999' personal_id, '4:26:14 PM' event_time, 'CL1' UNION ALL
  SELECT '99999999' personal_id, '4:26:23 PM' event_time, 'CL1' 
)
SELECT personal_id, event_time, groups, LAG(event_time) OVER w2 AS lag FROM (
  SELECT *, COUNT(1) OVER w1 - SUM(IF(flag, 1, 0)) OVER w1 AS part FROM (
    SELECT *, groups = LAG(groups) OVER w0 AS flag 
      FROM sample_table
    WINDOW w0 AS (PARTITION BY personal_id ORDER BY event_time)
  ) t WINDOW w1 AS (PARTITION BY personal_id ORDER BY event_time)
) t WINDOW w2 AS (PARTITION BY personal_id, part ORDER BY event_time);

Query results

enter image description here

Updated

WITH sample_table AS (
  SELECT '99999999' personal_id, '1:04:12 AM' event_time, 'CL1' groups UNION ALL
  SELECT '99999999' personal_id, '1:04:12 AM' event_time, 'GM2' UNION ALL
  SELECT '99999999' personal_id, '1:04:12 AM' event_time, 'FZ1' UNION ALL
  SELECT '99999999' personal_id, '8:13:42 AM' event_time, 'GM2' UNION ALL
  SELECT '99999999' personal_id, '8:13:43 AM' event_time, 'GM2'
)
SELECT personal_id, event_time, groups, LAG(event_time) OVER w2 AS lag FROM (
  SELECT *, COUNT(1) OVER w1 - SUM(IF(flag, 1, 0)) OVER w1 AS part FROM (
    SELECT *, groups = LAG(groups) OVER w0 AS flag 
      FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY personal_id, event_time) AS rn FROM sample_table) t
    WINDOW w0 AS (PARTITION BY personal_id ORDER BY event_time, rn)
  ) t WINDOW w1 AS (PARTITION BY personal_id ORDER BY event_time, rn)
) t WINDOW w2 AS (PARTITION BY personal_id, part ORDER BY event_time, rn);

Query results

enter image description here

Jaytiger
  • 11,626
  • 2
  • 5
  • 15
  • 1
    Wow, that was impressive. Exactly what I needed and I learned something new. Thank you! – leaner007 Jan 22 '23 at 00:52
  • SELECT '99999999' personal_id, '1:04:12 AM' event_time, 'CL1' groups UNION ALL SELECT '99999999' personal_id, '1:04:12 AM' event_time, 'GM2' UNION ALL SELECT '99999999' personal_id, '1:04:12 AM' event_time, 'FZ1' UNION ALL SELECT '99999999' personal_id, '8:13:42 AM' event_time, 'GM2' UNION ALL SELECT '99999999' personal_id, '8:13:42 AM' event_time, 'GM2' – leaner007 Jan 23 '23 at 00:05
  • Looks like this example does not work. Any way to fix it? – leaner007 Jan 23 '23 at 00:06
  • @leaner007, would you share the expected outoupt as well ? and one thing that I'd like to check is your last 2 rows has same `event_time`. is it correct ? (in case that, we can't apply `LAG` function) – Jaytiger Jan 23 '23 at 01:01
  • I have added the below. – leaner007 Jan 23 '23 at 01:26
  • @leaner007, you can remove your *answer* ;o). I've moved it to your question. And for the sample in your expected output, the current query will works. But for the sample in your query, it becomes different problem. Would you confirm that same *personal_id* and *groups* can have same *event_time* or not ? I'll get back to you later in my free time with more explanation based on your response. – Jaytiger Jan 23 '23 at 01:56
  • sorry, not for same *personal_id* and *groups*, just for *personal_id* can it have same *event_time* or not ? – Jaytiger Jan 23 '23 at 02:09
  • yes, one ID can have the same time stamp for different groups. The only reason why there are different timestamps in the output expected is because I could not replicate the expected output in SQL. So, the example I have provided does have the same time stamp. – leaner007 Jan 23 '23 at 03:08
  • kindly check the below link and updated answer as well. https://docs.google.com/spreadsheets/d/1PpZDLSmkGRKjtrZMGiunKhcKPHhkbXLePC1HAoyZ3c8/edit?usp=sharing – Jaytiger Jan 23 '23 at 07:21