Delta queue design
So, planning to try getting things like ALU and auto joypad reads into a delta queue. If it works well, then do the same for NMI+IRQ. Not entirely sure how to design such a class.
---
Background info:
There's probably no more than ~10 events: DRAM refresh, DMA trigger, HDMA trigger, HDMA init trigger, NMI trigger, IRQ trigger, ALU step, Joypad poll step, etc. They don't stack, so we can make our queue a fixed size, no need for a resizable vector.
---
The only way I can think to implement it is like so:
List is sorted in linear order, next event to fire comes first. Each event tells you that it triggers +n clock cycles from the previous, eg:
queue {
{ 26, NMI }, //NMI trigger in 26 clocks ahead of current time
{ 14, ALU }, //ALU step in 40 clocks (26+14)
};
add_clocks(n) would be something like:
- subtract n from the top of the queue
loop:
- if queue(0) > 0, return
- trigger queue(0) event
- zero out queue(0) so it won't fire if queue wraps around? or mark the enum as inactive / empty?
- add the now-negative queue(0) to queue(1); advance queue one place
add_event and remove_event would be a major pain. I'd essentially have to rebuild the queue from scratch. At best, an O(log n) search to find where it should be inserted, and rebuild everything after that to avoid throwing off the relative timestamps. I would need to add and remove events at least 2-3x per scanline, minimum ... but then there's only ~10 of them or so. Maybe it won't matter much? What kind of complexity level does adding and removing from such a queue usually entail?
My table would just be a time index+enum. Triggering an event can switch on the enum to invoke the function call. So copying an enum for rebuilding shouldn't be too bad.
Advancing the queue would be easy enough at O(1) by using a ring-buffer index into it.
---
Is there a better way to implement this, possibly?
Re: Delta queue design
byuu wrote:add_event and remove_event would be a major pain. I'd essentially have to rebuild the queue from scratch. At best, an O(log n) search to find where it should be inserted, and rebuild everything after that to avoid throwing off the relative timestamps. I would need to add and remove events at least 2-3x per scanline, minimum ... but then there's only ~10 of them or so. Maybe it won't matter much? What kind of complexity level does adding and removing from such a queue usually entail?
Circular buffer structure.
(is what I decided on when faced with a similar issue)
Re: Delta queue design
grinvader wrote:Circular buffer structure.
I'm using a ring buffer to advance the queue for push/pop.
But each element relies on the previous element, as each timestamp is relative. In order to find where to insert, I'd sum the times of all elements from the beginning until they exceed the new element. Then I'd have to insert the new element in the middle and move all the old ones back.
A linked list wouldn't really help, as inserting a new timed element would throw off all the elements after it, so I'd have to recompute all of those anyway.
Just worried that inserting / deleting might add just as much overhead as maintaining a linear list and checking each one individually.
I use either a heap with absolute times or, if there is a very low number of fixed events, I statically select the closest event linearly on changes. That avoids looping/indirection overhead (the low constant factor outweighs the O(n) penalty for low n), at the cost of writing the selection function per instance.
Are there any properties of the delays you can take advantage of? For instance if the events never happen less than 4 clocks apart and the last event is always less than 256 clocks ahead, you could use a 64-entry array and do something like (startingIndex + delay / 4) % 64 to find the next spot for insertion (storing delay % 4 as the offset). You'll be walking through a somewhat sparse array, but at least you won't have to reorder the entries.
I use either a heap with absolute times
Won't that require decrementing and checking every item in the list? That's pretty much what I do now, just hard-coded for extra speed.
if there is a very low number of fixed events, I just statically select the closest event linearly on changes
How do you know which event is the closest without checking them all? Are you keeping your list sorted, but still using absolute times? That could work well with a linked list for O(log n) search + insertion + deletion.
You'd need to keep track of how many clocks you took off the first entry, and once it trips, take that counter off all the other ones in turn. Could be tricky, but workable.
For instance if the events never happen less than 4 clocks apart and the last event is always less than 256 clocks ahead
Not that I'm aware of. They're all very different types of timing events with varying levels of granularity.
Re: Delta queue design
You definitely want to use a linked-list to do the insertions in order to preserve the O(1) amortized time. It may seem like a binary search would be faster, but you lose the savings over a straight linked-list insertion when you have to move all the array elements to insert a new event.
byuu wrote:
Background info:
There's probably no more than ~10 events: DRAM refresh, DMA trigger, HDMA trigger, HDMA init trigger, NMI trigger, IRQ trigger, ALU step, Joypad poll step, etc. They don't stack, so we can make our queue a fixed size, no need for a resizable vector.
The only way I can think to implement it is like so:
List is sorted in linear order, next event to fire comes first. Each event tells you that it triggers +n clock cycles from the previous, eg:
queue {
{ 26, NMI }, //NMI trigger in 26 clocks ahead of current time
{ 14, ALU }, //ALU step in 40 clocks (26+14)
};
add_clocks(n) would be something like:
- subtract n from the top of the queue
loop:
- if queue(0) > 0, return
- trigger queue(0) event
- zero out queue(0) so it won't fire if queue wraps around? or mark the enum as inactive / empty?
- add the now-negative queue(0) to queue(1); advance queue one place
add_event and remove_event would be a major pain.
Advancing the queue would be easy enough at O(1) by using a ring-buffer index into it.
Is there a better way to implement this, possibly?
The O(1) amortized time idea goes like this:
1. Insert a new event at time t + n
2. Start walking the linked list from the head
3. For each linked list node with an event time less than t + n, remove one credit from the amortization pool
4. Insert the new event in the linked list
5. For each call to next_event, return the head of the linked list, advance to the next node and add one credit to the amortization pool.
You can see that the credit pool never goes negative, so everything takes amortized constant time.
If you use an array with a binary search, then your insertions will take O(log n) + O(n) time to find and insert the new event.
In terms of implementation, you can implement linked lists using a set of fixed-length arrays for speed. This can be optimized further using standard linked-list optimizations. This code is not tested and is just for discussion.
Code: Select all
// Arrays to implement linked list
//
// delta : time in the future to fire event
// event : type of event (ALU, NMI, etc.)
// next : index of next linked list node
// head : index of list head
// free_node : stack of free array slots
// free_idx : index of the top free slot
unsigned int delta[MAX_LEN];
unsigned int event[MAX_LEN];
int next[MAX_LEN];
unsigned int free_node[MAX_LEN] = {0, 1, 2, ..., MAX_LEN-1};
int head = -1;
unsigned int free_idx = MAX_LEN - 1;
void add_event( unsigned int evnt, unsigned int dlta )
{
int prev, curr;
unsigned int delta_sum = 0;
// special case inserting to an empty list (assume free_idx > 0)
if ( head == -1 )
{
head = free_node[free_idx--];
delta[head] = dlta;
event[head] = evnt;
next[head] = -1;
return;
}
// otherwise scan for the insertion point
curr = head;
prev = -1;
while ( curr >= 0 )
{
delta_sum += delta[curr];
if ( delta_sum >= dlta )
break;
prev = curr;
curr = next[curr];
}
// If the times are the same, merge the events
if ( delta_sum == dlta )
{
event[curr] |= evnt;
return;
}
// Otherwise insert a new event. After an event is inserted,
// we need to check if there is another event after the insertion
// point and decrement its delta value
// Special case an insertion before the head
if ( prev == -1 )
{
head = free_node[free_idx--];
delta[head] = dlta;
event[head] = evnt;
next[head] = curr;
}
else
{
// General insertion. Point the previous node to a new one
next[prev] = free_node[free_idx--];
// Advance to the new node and set its link and data fields
// The delta field calculation is a bit ugly since it depends on
// how we terminated the search loop, e.g. end-of-list or
// delta_sum > dlta. This could probably be cleaned up to
// avoid the branch.
prev = next[prev];
next[prev] = curr;
event[prev] = evnt;
if ( curr >= 0 )
delta[prev] = dlta - (delta_sum - delta[curr]);
else
delta[prev] = dlta - delta_sum;
}
// Adjust the value of delta[curr] if needed
if ( curr >= 0 )
delta[curr] = delta_sum - dlta;
}
void add_clocks( unsigned int dlta )
{
unsigned int delta_sum = 0;
// scan the linked list until delta_sum > dlta
int curr = head;
while ( curr >= 0 )
{
// Advance the delta counter
delta_sum += delta[curr];
if ( delta_sum > dlta )
break;
// process the event
do_event( event[curr] );
// delete the linked list node
free_node[++free_idx] = curr;
curr = next[curr];
}
// reset the start of the list
head = curr;
// optionally reset the next delta
if ( head >= 0 )
delta[head] = delta_sum - dlta;
}
Edit:
Updated example code to actually sum the relative deltas and make the implementation a bit more realistic.
Also, if you are querying the delta queue on single-cycle intervals, then the add_clocks() function is greatly simplified because dlta = 1, e.g.
Code: Select all
void add_clock(void)
{
// Just check and see if the head of the list is
// present with a delta value equal to one
if ( head < 0 )
return;
if ( --delta[head] == 0 )
{
// process the event
do_event( event[head] );
// delete the linked list node
free_node[++free_idx] = head;
head = next[head];
}
}
Last edited by lscharen on Wed Dec 31, 2008 5:41 am, edited 2 times in total.
agreed - linked list.
{ delta_time_to_expire, event, next*}
insertions are cheap, and you can't DO a binary search on a (classical) delta queue for insertions because you don't know the accumulated time delay on any particular event - only the time delta from the prior event.
If you expected your queue to have to get very long (rule #1 of choosing an algorithm: Know your domain), you could do an array of linked lists (somewhat like a hash table), and you could take advantage of precomputed "block" (referring to the linked lists) deltas to further reduce the insertion time. If you *really* wanted to do binary searches, you could reduce the cost of keeping the absolute times current by keeping semi-absolute times in each block (absolute times relative to the first non-active block), which would give you the ability to do a binary search. Then when the active block expires, you recompute absolute times only for the blocks, instead of each and every element in the queue.
But that's getting into a very high level of complexity that's generally not warranted for a delta queue.
edit: the biggest advantage that lscharen's implementation has over a classical linked-list implementation is locality. For the size of linked-list that byuu is talking about, there shouldn't be more than 2 cache misses in walking the list. A classical implementation would have many many more cache misses.
Last edited by DataPath on Wed Dec 31, 2008 1:35 am, edited 1 time in total.
I must be misunderstanding the case here, since it seems too easy to me.
Can't you keep the same queue you have, just have an extra integer alongside it to contain the offset to use when tallying up to where you need to be in the queue?
I use either a heap with absolute times
byuu wrote:Won't that require decrementing and checking every item in the list?
No, I keep a single cycle counter that is incremented and compared to the head. The event time is the count at which the event is supposed to occur.
EDIT: what do you mean by "the list"? A binary heap is usually implemented with an array.
if there is a very low number of fixed events, I just statically select the closest event linearly on changes
byuu wrote:How do you know which event is the closest without checking them all? Are you keeping your list sorted, but still using absolute times?
No, I mean it's a hard-coded linear search, but because of the low constant cost factor it's faster than O(log N) alternatives for low N. It only needs to be done every time an event is executed or scheduled.
lscharen, thank you very much for the example code.
Not following what you mean by "amortization pool credits", but I get the general idea of your code. The part I missed was that you don't have to recompute all elements after your insertion part, just one.
Example ... say we have the following queue:
[0] = { NMI, +26 }
[1] = { IRQ, +16 } //= +42 total
[2] = { DRAM, +10 } //= +52 total
We want to insert ALU at +32, so we do an O(n) summation and find that we insert after [0]. So first link ALU between [0] and [2] to get:
[0] = { NMI, +26 }
[1] = { ALU, ??? }
[2] = { IRQ, +16 }
[3] = { DRAM, +10 }
Then fix the times:
[1] = clocks(32) - [0] = 32 - 26 = +6
[2] = old IRQ delta - [1] = 16 - 6 = +10
[3] is still correct [+26, +6, +10, +10 = +52], no need to update it.
The O(n) search to find the insertion point sucks, but like DataPath mentioned (and I missed in the first post), the relative values mean we can't perform a binary search here. The linked list also rules out that possibility, anyway.
Can't you keep the same queue you have, just have an extra integer alongside it to contain the offset to use when tallying up to where you need to be in the queue?
Not entirely following what you mean, sorry.
No, I keep a single cycle counter that is incremented and compared to the head. The event time is the count at which the event is supposed to occur.
Ah, that brings up two problems for me:
1) you have to test each and every entry in the list when adding time, or keep the list sorted somehow.
2) exact cycle count timestamps that grow forever will eventually overflow, unless you subtract them all by the smallest value in the list periodically. Do you do something like that after each frame or so?
Perhaps it would be better to keep a sorted exact cycle position list rather than an exact relative cycle position list. Don't think it'll matter with a linked list approach, though.
EDIT: what do you mean by "the list"? A binary heap is usually implemented with an array.
I'm referring to an array, yes. But from lscharen's observation above, I'm leaning toward using a linked list now.
No, I mean it's a hard-coded linear search, but because of the low constant cost factor it's faster than O(log N) alternatives for low N. It only needs to be done every time an event is executed or scheduled.
That's what I'm worried about now. I don't have many events, so I don't know if this queue scheme will be worth the extra complication compared to hard-coded tests in the appropriate places.
Ah, that brings up two problems for me:
1) you have to test each and every entry in the list when adding time (at least when it's possible for those events to fire.)
No idea what you're talking about here. Are you perhaps confusing a heap with something else? With a binary heap adding cycles would be O(1), finding the head is O(1), adjusting or scheduling an event is O(log N), and removing the head is O(log N). Heaps are typically used for priority queues.
Yes, I do subtract all counters periodically to avoid overflow.
Using a linked list for a binary heap would be silly (slow).
That's what I'm worried about now. I don't have many events, so I don't know if this queue scheme will be worth the extra complication compared to hard-coded tests in the appropriate places.
Note that by hard-coded linear search I'm referring to something very different from what you're currently doing.
No idea what you're talking about here. Are you perhaps confusing a heap with something else?
Ah yes, I was thinking you had a linear array. Can't say I've ever used a binary heap before, so I had to read up on it.
The 2n+1, 2n+2 trick to index into the heap is cute.
With a binary heap adding cycles would be O(1)
... because they're absolute times, so you're using a min-heap to trigger that, removing the element, then repeating the compare. Got it.
finding the head is O(1)
Always heap[0], yes.
adjusting or scheduling an event is O(log N), and removing the head is O(log N)
Hmm. I'll have to work that out on paper. I'm visualizing a tree that branches in each direction, so you could add to one side indefinitely rather than both evenly. Probably just diagram->actual implementation confusion.
Obviously what you say about big O timing is correct.
Yes, I do subtract all counters periodically to avoid overflow.
Yes, subtract the current time from everything, since that will always be the lowest value; assuming you've polled for events after your last cycle add.
Note that by hard-coded linear search I'm referring to something very different from what you're currently doing.
I'll take your word for that, have enough here to work with at the moment ;)
Thank you very much for your help. Now I'm not sure which approach I should use ... linked list + relative timing, or binary heap + absolute timing + periodic subtraction to prevent overflow. Meh.
byuu wrote:2) exact cycle count timestamps that grow forever will eventually overflow, unless you subtract them all by the smallest value in the list periodically. Do you do something like that after each frame or so?
No need there. When comparing time stamps, instead of a < b always do (signed) a - b < 0. On most processor models, this amounts to checking the negative flag instead of the carry flag, so there should be no difference in performance.
byuu, I found this helpful illustration of array-based heap operations. I had never studied this structure before. Array-based heap lecture notes.pdf
Ecst's point is that if you ensure that all your times fall within 0 to UINT_MAX/2 of each other, you can entirely eliminate periodic adjustments of all times. Unsigned values wrap around by definition, so addition and subtraction work without change. Only ordered comparison needs special treatment:
Code: Select all
bool less_than( unsigned x, unsigned y )
{
return x - y > UINT_MAX / 2;
}
On a two's complement machine where conversion from unsigned to signed doesn't do any range check, which is the case on almost all machines these days, that can be optimized to the following (and I wouldn't be surprised if gcc does this automatically):
Code: Select all
bool less_than( unsigned x, unsigned y )
{
return (int) (x - y) < 0;
}
Don't try to use signed int everywhere and eliminate less_than; even though overflow of signed integers doesn't usually generate a trap on most machines these days, a compiler's optimizer might take advantage of that fact and rewrite the second "if" below into the first:
Code: Select all
int time_x, time_y;
if ( time_x < time_y ) // won't work if y has overflowed, but x hasn't
if ( time_x - time_y < 0 ) // optimizer might rewrite this as above
byuu wrote:lscharen, thank you very much for the example code.
Not following what you mean by "amortization pool credits", but I get the general idea of your code. The part I missed was that you don't have to recompute all elements after your insertion part, just one.
Yes, I was trying to get across the idea that, even though the worst-case insertion time is O(n), a sequence of insert/remove operations has amortized constant time. See http://en.wikipedia.org/wiki/Amortized_analysis for more information.
byuu wrote:Example ... say we have the following queue:
[0] = { NMI, +26 }
[1] = { IRQ, +16 } //= +42 total
[2] = { DRAM, +10 } //= +52 total
We want to insert ALU at +32, so we do an O(n) summation and find that we insert after [0]. So first link ALU between [0] and [2] to get:
[0] = { NMI, +26 }
[1] = { ALU, ??? }
[2] = { IRQ, +16 }
[3] = { DRAM, +10 }
Then fix the times:
[1] = clocks(32) - [0] = 32 - 26 = +6
[2] = old IRQ delta - [1] = 16 - 6 = +10
[3] is still correct [+26, +6, +10, +10 = +52], no need to update it.
The O(n) search to find the insertion point sucks, but like DataPath mentioned (and I missed in the first post), the relative values mean we can't perform a binary search here. The linked list also rules out that possibility, anyway.
Right. You only have to update the delta time of the event you are inserting and the delta of the next event. If you're not convinced of the O(1) expected performance of the queue, implement it and try it out. You may be correct that it would be faster to keep an array sorted if the number of events is bounded by a small number, e.g.
Code: Select all
void array_queue_insert( unsigned int time, unsigned int event )
{
// find the insertion point in the time-sorted array
int idx = binary_search( array, time );
// move the elements from idx to array_len back one slot
memmove( &array[idx + 1], &array[idx], (array_len - idx) * sizeof(array[0]) );
// insert the new event; each element holds its time and event type
array[idx].time = time;
array[idx].event = event;
array_len++;
}
Suppose you have N events, and that every time an event is executed it is rescheduled to execute after all the other events (but we don't know this, so we can't just stick it to the back of a FIFO). Then with a delta queue approach each event execution+rescheduling has O(N) complexity. Amortizing over the number of event executions like that looks like a fallacy to me, since you're not considering that all the events already in the queue need to be scheduled too.
lscharen, excellent point about only having to adjust the next delta, not all of them, when inserting before the end. sinamas, if you keep track of the absolute time of the last event, inserting an event past all the others can be done in constant time. I wrote a simple delta queue class and exercise program, which demonstrates both of the above points. Excerpt of the key parts shown below. Note that this demo doesn't associate anything with the events, just the times of the events themselves. Delta_Queue.cpp
Code: Select all
void Delta_Queue::insert( int time )
{
if ( time >= last )
{
events.push_back( time - last );
last = time;
}
else
{
events_t::iterator p = events.begin();
while ( p != events.end() )
{
if ( time <= *p )
{
*p -= time;
break;
}
time -= *p++;
}
events.insert( p, time );
}
}
void Delta_Queue::pop_front()
{
last -= front();
events.pop_front();
}
blargg wrote:sinamas, if you keep track of the absolute time of the last event, inserting an event past all the others can be done in constant time.
sinamas wrote:True, but we still get linear complexity on average if the common case position is before the end and has a mean proportional to N.
I have to agree with sinamas here. I took the javadoc comment from the page byuu linked to on its face, but I can't figure out a proof of amortized constant time for a linked-list delta queue. I even looked at the Java source code and it's basically the same implementation that I proposed earlier.
http://avrora.cvs.sourceforge.net/viewv ... iew=markup
In terms of efficient implementation, I'm starting to think that a min-heap using the 2s-complement trick might very well be faster. The lack of having to update a next field and guaranteed O(log n) insertion might make up for O(log n) removal.
blargg, good job in elegantly implementing a delta queue (and also in providing commentary on standards-compliant C/C++ implementation of "sliding" compares). Note that you can get rid of the subtractions in "*p -= time;" and "time -= *p++", and also the "last" variable if you use absolute time stamps instead of relative deltas (paradoxical as it may sound).
The misunderstanding regarding complexity seems to be the following: The O(1) for delta queues takes the number of events as constant, not depending on it as a variable, rather giving it the meaning of O(1) per event and time unit.
Facilitating comparability, let us consider the number n of events, the number t of cycles of the considered time interval, the frequencies (average number of occurrences per cycle) f_1, ..., f_n of the events, and the average delta queue insertion points (how many list items we have to iterate over when inserting the event) a_1, ..., a_n of each event. Note that a_i * f_i <= 1 (this is the key point of delta queues).
The binary heap is O(log(n) * (f_1 + ... + f_n) * t).
The delta queue is O((a_1 * f_1 + ... + a_n * f_n) * t). Using a_i * f_i <= 1, we can give the simpler, but less sharp estimate O(n * t) (this is the O(1) per event and time unit from above).
As a case study, further simplifying, assume that each event respawns as soon as it got killed and has constant delay, that f_1 > ... > f_n, that no events occur simultaneously, and that the time interval is sufficiently large. After some calculations, we find that a_1 * f_1 + ... + a_n * f_n = 2 * (0 * f_1 + 1 * f_2 + ... + (n - 1) * f_n). So it all boils down to log(n) * (f_1 + ... + f_n) vs. 0 * f_1 + 1 * f_2 + ... + (n - 1) * f_n. Just for fun, let f_i = 1 / i^q with q > 0 some real parameter. Interestingly, the binary heap wins for 0 < q < 2, while the delta queue wins for q > 2, with a draw for q = 2.
Informally, these heuristic results would mean the delta queue is best suited for situations where the event delays cover all different kinds of magnitudes, while the binary heap should be used in the remaining cases, and when guaranteed optimal (and good) worst case complexity is important. Note that (albeit not obvious, and maybe slowed by a constant factor) binary heaps can also feature logarithmic removal/rescheduling of events (e. g., in case IRQ settings change).
Interesting to consider would be a synergistic approach, with the at most logarithmic insertion of the binary heap, but cheaper insertion for events sooner to occur. Maybe a kind of "unbalanced" binary search tree with a bias for the right (with larger values).
Nevertheless, all these considerations will probably be in vain if the number of events is low enough for efficient hard-coded solutions. Still, a scalable approach would be interesting (e. g., if you, byuu, plan on potentially adding many future events, for whatever reason).
Last edited by ecst on Fri Jan 02, 2009 4:18 am, edited 2 times in total.