I have a table of events (SQL Server 2000) with a start date/time and some other columns, some of which can be summed.
Id counter starttime 1 17 '23 jan 2003 11:20:00' 1 43 '23 jan 2003 11:24:45' 1 26 '23 jan 2003 11:32:23' 1 33 '23 jan 2003 12:15:20' 1 45 '23 jan 2003 12:18:43'
How can I group events which started within say 30 minutes of each other? The result I want is to sum the first 3, as they all occur within 30 minutes of the first event, then sum the last two as they are a new event set, and get the results :
Id counter starttime 1 86 '23 jan 2003 11:20:00' 1 78 '23 jan 2003 12:15:20'
I understand how to group by Id, and sum(counter) etc., but can't get my head around grouping by (or having) all events occurring within the 30 minute time frame from the first :-O.
NB the 30 minutes is an arbitrary number chosen to illustrate this example, the actual number will probably be different and need to be 'tuned'.
I can see how I could do this by running through the table in a cursor (or an external c# app etc.), and summing in the loop, but hope that it's possible without this overhead.
Short answer: it may be possible without the "overhead" of a cursor, but there will be other overheads that I'm sure you will not want, because they are several orders of magnitude larger.
Your requirement to "sum the first 3, as they all occur within 30 minutes of the first event" is quite easy to comprehend. Unfortunately, I think it's an example of petitio principii (to beg the question). We instinctively use process logic when considering problems of this sort. We cannot really help ourselves, because of our background in programming languages:
- start with the first record
- set the accumulator to the record's counter value, and save the time
- get the next record
- if there is no next record, skip to 7
- compare the saved time to this record's time
- if within 30 minutes, add the counter to the accumulator, and go back to 3
- output the accumulated value and time
- if this wasn't the last record, go to 2
This logic is very easy to implement with a scripting language and a cursor. In SQL Server, use Transact-SQL. You end up making one really efficient pass of the table, and Bob's your uncle.
Obtaining the same result using set-based logic, where you do not specify how to get what you want, but simply what you want, is fantastically more difficult in this particular situation.
First of all, we need to think of the groups. How do we group the individual rows?
If we wanted to group them according to fixed time slots, e.g. 11:00 to 11:30, 11:30 to 12:00, 12:00 to 12:30, and so on, then the solution is straightforward, and all we need is to "generate" the appropriate time slots and do a LEFT JOIN. See Grouping output by week (24 October 2002), which uses weeks instead of 30-minute intervals.
But I get the sense that your example doesn't want to use pre-set time slots, and instead, you wish to group the rows based on the time values in the data. This is hard. The query has to create groups based on the lowest time in the group being later than 30 minutes after the next lower time. I can foresee multiple subqueries. Perhaps a theta join. I'm getting a headache just thinking about where to start. No, let's not even go there. Use a cursor.
Dig Deeper on Oracle and SQL
Related Q&A from Rudy Limeback
Read SQL expert Rudy Limeback's advice for counting combinations in a table with SQL's GROUP BY clause Continue Reading
Read about the Mimer Validator, a tool used to verify your SQL code, in this tip from SQL expert Rudy Limeback. Continue Reading
Read an example of an SQL case expression from our SQL expert Rudy Limeback. Continue Reading