View unanswered posts | View active topics It is currently Sat Dec 07, 2019 8:08 am



This topic is locked, you cannot edit posts or make further replies.  [ 36 posts ]  Go to page 1, 2  Next
Delta queue design 
Author Message
Reply with quote
Post Delta queue design
Delta Queue

So, planning to try geting things like ALU and auto joypad reads into a delta queue. If it works well, then do the same for NMI+IRQ. Not entirely sure how to design such a class.

---

Background info:
There's probably no more than ~10 events: DRAM refresh, DMA trigger, HDMA trigger, HDMA init trigger, NMI trigger, IRQ trigger, ALU step, Joypad poll step, etc. They don't stack, so we can make our queue a fixed size, no need for a resizable vector.

---

The only way I can think to implement it is like so:
List is sorted in linear order, next event to fire comes first. Each event tells you that it triggers +n clock cycles from the previous, eg:
queue {
{ 26, NMI }, //NMI trigger in 26 clocks ahead of current time
{ 14, ALU }, //ALU step in 42 clocks (26+14)
};

add_clocks(n) would be something like:
- subtract n from the top of the queue
loop:
- if queue(0) > 0, return
- trigger queue(0) event
- zero out queue(0) so it won't fire if queue wraps around? or mark the enum as inactive / empty?
- add the now-negative queue(0) to queue(1); advance queue one place

add_event and remove_event would be a major pain. I'd essentially have to rebuild the queue from scratch. At best, an O(log n) search to find where it should be inserted, and rebuild everything after that to avoid throwing off the relative timestamps. I would need to add and remove events at least 2-3x per scanline, minimum ... but then there's only ~10 of them or so. Maybe it won't matter much? What kind of complexity level does adding and removing from such a queue usually entail?

My table would just be a time index+enum. Triggering an event can switch on the enum to invoke the function call. So copying an enum for rebuilding shouldn't be too bad.

Advancing the queue would be easy enough at O(1) by using a ring-buffer index into it.

---

Is there a better way to implement this, possibly?


Tue Dec 30, 2008 8:08 pm
ZSNES Shake Shake Prinny
User avatar

Joined: Wed Jul 28, 2004 4:15 pm
Posts: 5615
Location: PAL50, dood !
Reply with quote
Post Re: Delta queue design
byuu wrote:
add_event and remove_event would be a major pain. I'd essentially have to rebuild the queue from scratch. At best, an O(log n) search to find where it should be inserted, and rebuild everything after that to avoid throwing off the relative timestamps. I would need to add and remove events at least 2-3x per scanline, minimum ... but then there's only ~10 of them or so. Maybe it won't matter much? What kind of complexity level does adding and removing from such a queue usually entail?

Circular buffer structure.

(is what I decided on when faced with a similar issue)

_________________
皆黙って俺について来い!!
Code:
<jmr> bsnes has the most accurate wiki page but it takes forever to load (or something)

Pantheon: Gideon Zhi | CaitSith2 | Nach | kode54


Tue Dec 30, 2008 9:40 pm
Profile
Reply with quote
Post Re: Delta queue design
grinvader wrote:
Circular buffer structure.


I'm using a ring buffer to advance the queue for push/pop.

But each element relies on the previous element, as each timestamp is relative. In order to find where to insert, I'd sum the times of all elements from the beginning until they exceed the new element. Then I'd have to insert the new element in the middle and move all the old ones back.

A linked list wouldn't really help, as inserting a new timed element would throw off all the elements after it, so I'd have to recompute all of those anyway.

Just worried that inserting / deleting might add just as much overhead as maintaining a linear list and checking each one individually.


Tue Dec 30, 2008 10:05 pm
Gambatte Developer
Gambatte Developer

Joined: Fri Oct 21, 2005 4:03 pm
Posts: 157
Location: Norway
Reply with quote
Post 
I use either a heap with absolute times or, if there is a very low number of fixed events, I just statically select the closest event linearly on changes, avoiding looping/indirection overhead (the low constant factor benefit outweighs the O(n) penalty for low n) but having to write the selection function per instance.


Tue Dec 30, 2008 10:35 pm
Profile
Regular
User avatar

Joined: Tue Mar 07, 2006 10:32 am
Posts: 347
Location: The Netherlands
Reply with quote
Post 
Are there any properties of the delays you can take advantage of? For instance if the events never happen less than 4 clocks apart and the last event is always less than 256 clocks ahead, you could use a 64-entry array and do something like (startingIndex + delay / 4) % 64 to find the next spot for insertion (storing delay % 4 as the offset). You'll be walking through a somewhat sparse array, but at least you won't have to reorder the entries.


Tue Dec 30, 2008 10:51 pm
Profile
Reply with quote
Post 
Quote:
I use either a heap with absolute times


Won't that require decrementing and checking every item in the list? That's pretty much what I do now, just hard-coded for extra speed.

Quote:
if there is a very low number of fixed events, I just statically select the closest event linearly on changes


How do you know which event is the closest without checking them all? Are you keeping your list sorted, but still using absolute times? That could work well with a linked list for O(log n) search + insertion + deletion.

You'd need to keep track of how many clocks you took off the first entry, and once it trips, take that counter off all the other ones in turn. Could be tricky, but workable.

Quote:
For instance if the events never happen less than 4 clocks apart and the last event is always less than 256 clocks ahead


Not that I'm aware of. They're all very different types of timing events with varying levels of granularity.


Tue Dec 30, 2008 11:21 pm
Regular
User avatar

Joined: Tue Mar 07, 2006 10:32 am
Posts: 347
Location: The Netherlands
Reply with quote
Post 
byuu wrote:
Not that I'm aware of. They're all very different types of timing events with varying levels of granularity.

And the maximum delay between the first and last events in the queue is unreasonably large?


Tue Dec 30, 2008 11:57 pm
Profile
New Member

Joined: Tue Dec 30, 2008 11:37 pm
Posts: 7
Reply with quote
Post Re: Delta queue design
byuu wrote:

Background info:
There's probably no more than ~10 events: DRAM refresh, DMA trigger, HDMA trigger, HDMA init trigger, NMI trigger, IRQ trigger, ALU step, Joypad poll step, etc. They don't stack, so we can make our queue a fixed size, no need for a resizable vector.

The only way I can think to implement it is like so:
List is sorted in linear order, next event to fire comes first. Each event tells you that it triggers +n clock cycles from the previous, eg:
queue {
{ 26, NMI }, //NMI trigger in 26 clocks ahead of current time
{ 14, ALU }, //ALU step in 42 clocks (26+14)
};

add_clocks(n) would be something like:
- subtract n from the top of the queue
loop:
- if queue(0) > 0, return
- trigger queue(0) event
- zero out queue(0) so it won't fire if queue wraps around? or mark the enum as inactive / empty?
- add the now-negative queue(0) to queue(1); advance queue one place

add_event and remove_event would be a major pain.

Advancing the queue would be easy enough at O(1) by using a ring-buffer index into it.

Is there a better way to implement this, possibly?


You definitely want to use a linked-list to do the insertions in order to preserve the O(1) amortized time. It may seem like a binary search would be faster, but you lose the savings over a straight linked-list insertion when you have to move all the array elements to insert a new event.

The O(1) amortized time idea goes like this:

1. Insert a new event at time t + n
2. Start walking the linked list from the head
3. For each linked like node with an event time less than t + n, remove one credit from the amortization pool
4. Insert the new even in the linked list
5. For each call to next_event, return the head of the linked list, advance to the next node and add one credit from the amortization pool.

You can see that the credit pool will always be equal to zero when it becomes empty, so everything takes amortized constant time.

If you use an array with a binary search, then your insertions will take O(log n) + O(n) time to find and insert the new event.

In terms of implementation, you can implement linked lists using a set of fixed-length arrays for speed. This can be optimized further using standard linked-list optimizations. This code is not tested and is just for discussion.

Code:
// Arrays to implement linked list
//
// delta : time in the future to fire event
// event : type of event (ALU, NMI, etc.)
// next : index of next linked list node
// head : index of list head
// free_node : list of free array slots
// free_len : number of free slots

unsigned int delta[MAX_LEN];
unsigned int event[MAX_LEN];
int next[MAX_LEN];
unsigned int free_node[MAX_LEN] = {0, 1, 2, ..., MAX_LEN-1};
int head = -1;
unsigned int free_idx = MAX_LEN - 1;

void add_event( unsigned int evnt, unsigned int dlta )
{
   int prev, curr;
   unsigned int delta_sum = 0;

   // special case inserting to an empty list (assume free_idx > 0)
   if ( head == -1 )
   {
      head = free_node[free_idx--];
      delta[head] = dlta;
      event[head] = evnt;
      next[head] = -1;
      return;
   }

   // otherwise scan for the first empty slot
   curr = head;
   prev = -1;

   while ( curr >= 0 )
   {
      delta_sum += delta[curr];

      if ( delta_sum >= dlta )
         break;

      prev = curr;
      curr = next[curr];
   }

   // If the times are the same, merge the events
   if ( delta_sum == dlta )
   {
      event[curr] |= evnt;
      return;
   }

   // Otherwise insert a new event.  After an event is inserted,
   // we need to check if there is another event after the insertion
   // point and decrement its delta value

   // Special case an insertion before the head
   if ( prev == -1 )
   {
      head = free_node[free_idx--];
      delta[head] = dlta;
      event[head] = evnt;
      next[head] = curr;
   }
   else
   {
      // General insertion.  Point the previous node to a new one
      next[prev] = free_node[free_idx--];

      // Advance to the new node and set it's link and data fields
      // The delta field calculation is a bit ugly since it depends on
      // how we terminated the search loop, e.g. end-of-list or
      // delta_sum > dlta.  This could probably be cleaned up to
      // avoid the branch.
      prev = next[prev];

      next[prev] = curr;
      event[prev] = evnt;
     
      if ( curr >= 0 )
         delta[prev] = dlta - (delta_sum - delta[curr]);
      else
         delta[prev] = dlta - delta_sum;
   }

   // Adjust the value of delta[curr] if needed
   if ( curr >= 0 )
      delta[curr] = delta_sum - dlta;
}

void add_clocks( unsigned int dlta )
{
    unsigned int delta_sum = 0;

    // scan the linked list until delta_sum > dlta
    int curr = head;

    while ( curr >= 0 )
    {
        // Advance the delta counter
        delta_sum += delta[curr];
        if ( delta_sum > dlta )
           break;

        // process the event
        do_event( event[curr] );

        // delete the linked list node
        free_node[++free_idx] = curr;
        curr = next[curr];
    }

    // reset the start of the list
    head = curr;

    // optionally reset the next delta
    if ( head >= 0 )
       delta[head] = delta_sum - dlta;
}


That should be a start.

Edit:

Updated example code to actually sum the relative deltas and make the implementation a bit more realistic.

Also, if you are querying the delta queue on single-cycle intervals, then the add_clocks() function is greatly simplified because dlta = 1, e.g.

Code:
void add_clock(void)
{
    // Just check and see if the head of the list is
    // present with a delta value equal to one
    if ( head < 0 )
        return;

    if ( --delta[head] == 0 )
    {
        // process the event
        do_event( event[head] );

        // delete the linked list node
        free_node[++free_idx] = head;
        head = next[head];
    }
}


Last edited by lscharen on Wed Dec 31, 2008 5:41 am, edited 2 times in total.



Wed Dec 31, 2008 1:02 am
Profile
Lurker

Joined: Wed Jul 28, 2004 1:35 am
Posts: 128
Reply with quote
Post 
agreed - linked list.

{ delta_time_to_expire, event, next*}

insertions are cheap, and you can't DO a binary search on a (classical) delta queue for insertions because you don't know the accumulated time delay on any particular event - only the time delta from the prior event.

If you expected your queue to have to get very long (rule #1 of choosing an algorithm: Know your domain), you could do an array of linked lists (somewhat like a hash table), and you could take advantage of precomputed "block" (referring to the linked lists) deltas to further reduce the insertion time. If you *really* wanted to do binary searches, you could reduce the cost of keeping the absolute times current by keeping semi-absolute times in each block (absolute times relative to the first non-active block), which would give you the ability to do a binary search. Then when the active block expires, you recompute absolute times only for the blocks, instead of each and every element in the queue.

But that's getting into a very high level of complexity that's generally not warranted for a delta queue.

edit: the biggest advantage that lscharen's implementation has over a classical linked-list implementation is locality. For the size of linked-list that byuu is talking about, there shouldn't be more than 2 cache misses in walking the list. A classical implementation would have many many more cache misses.


Last edited by DataPath on Wed Dec 31, 2008 1:35 am, edited 1 time in total.



Wed Dec 31, 2008 1:23 am
Profile ICQ YIM
ZSNES Developer
ZSNES Developer
User avatar

Joined: Tue Jul 27, 2004 10:54 pm
Posts: 3902
Location: Solar powered park bench
Reply with quote
Post 
I must be misunderstanding the case here, since it seems too easy to me.

Can't you keep the same queue you have, just have an extra integer alongside it to contain the offset to use when tallying up to where you need to be in the queue?

_________________
May 9 2007 - NSRT 3.4, now with lots of hashing and even more accurate information! Go download it.
_____________
Insane Coding


Wed Dec 31, 2008 1:32 am
Profile WWW
Gambatte Developer
Gambatte Developer

Joined: Fri Oct 21, 2005 4:03 pm
Posts: 157
Location: Norway
Reply with quote
Post 
byuu wrote:
Quote:
I use either a heap with absolute times

Won't that require decrementing and checking every item in the list?

No, I keep a single cycle counter that is incremented and compared to the head. The event time is the count at which the event is supposed to occur.
EDIT: what do you mean by "the list"? A binary heap is usually implemented with an array.

byuu wrote:
Quote:
if there is a very low number of fixed events, I just statically select the closest event linearly on changes


How do you know which event is the closest without checking them all? Are you keeping your list sorted, but still using absolute times?

No, I mean it's a hard-coded linear search, but because of the low constant cost factor it's faster than O(log N) alternatives for low N. It only needs to be done every time an event is executed or scheduled.


Wed Dec 31, 2008 2:07 am
Profile
Reply with quote
Post 
lscharen, thank you very much for the example code.

Not following what you mean by "amortization pool credits", but I get the general idea of your code. The part I missed was that you don't have to recompute all elements after your insertion part, just one.

Example ... say we have the following queue:
[0] = { NMI, +26 }
[1] = { IRQ, +14 } //= +42 total
[2] = { DRAM, +10 } //= +52 total
We want to insert ALU at +32, so we do an O(n) summation and find that we insert at [0]. So first link ALU between [0] and [2] to get:
[0] = { NMI, +26 }
[1] = { ALU, ??? }
[2] = { IRQ, +14 }
[3] = { DRAM, +10 }
Then fix the times:
[1] = clocks(32) - [0] = +6
[2] = [0] + [2] - [1] = +10

[3] is still correct [+26, +6, +10, +10 = +52], no need to update it.

The O(n) search to find the insertion point sucks, but like DataPath mentioned (and I missed in the first post), the relative values means we can't perform a binary search here. The linked list also rules out that possibility, anyway.

Quote:
Can't you keep the same queue you have, just have an extra integer alongside it to contain the offset to use when tallying up to where you need to be in the queue?


Not entirely following what you mean, sorry.

Quote:
No, I keep a single cycle counter that is incremented and compared to the head. The event time is the count at which the event is supposed to occur.


Ah, that brings up two problems for me:
1) you have to test each and every entry in the list when adding time, or keep the list sorted somehow.
2) exact cycle count timestamps that grow forever will eventually overflow, unless you subtract them all by the smallest value in the list periodically. Do you do something like that after each frame or so?

Perhaps it would be better to keep a sorted exact cycle position list rather than an exact relative cycle position list. Don't think it'll matter with a linked list approach, though.

Quote:
EDIT: what do you mean by "the list"? A binary heap is usually implemented with an array.


I'm referring to an array, yes. But from lscharen's observation above, I'm leaning toward using a linked list now.

Quote:
No, I mean it's a hard-coded linear search, but because of the low constant cost factor it's faster than O(log N) alternatives for low N. It only needs to be done every time an event is executed or scheduled.


That's what I'm worried about now. I don't have many events, so I don't know if this queue scheme will be worth the extra complication compared to hard-coded tests in the appropriate places.


Wed Dec 31, 2008 11:33 pm
Gambatte Developer
Gambatte Developer

Joined: Fri Oct 21, 2005 4:03 pm
Posts: 157
Location: Norway
Reply with quote
Post 
Quote:
Ah, that brings up two problems for me:
1) you have to test each and every entry in the list when adding time (at least when its possible for those events to fire.)

No idea what you're talking about here. Are you perhaps confusing a heap with something else? With a binary heap adding cycles would be O(1), finding the head is O(1), adjusting or scheduling an event is O(log N), and removing the head is O(log N). Heaps are typically used for priority queues.
Yes, I do subtract all counters periodically to avoid overflow.

Using a linked list for a binary heap would be silly (slow).

Quote:
That's what I'm worried about now. I don't have many events, so I don't know if this queue scheme will be worth the extra complication compared to hard-coded tests in the appropriate places.

Note that by hard-coded linear search I'm referring to something very different from what you're currently doing.


Thu Jan 01, 2009 12:12 am
Profile
Reply with quote
Post 
Quote:
No idea what you're talking about here. Are you perhaps confusing a heap with something else?


Ah yes, I was thinking you had a linear array. Can't say I've ever used a binary heap before, so I had to read up on it.

The 2n+1, 2n+2 trick to index into the heap is cute.

Quote:
With a binary heap adding cycles would be O(1)


... because they're absolute times, so you're using a min-heap to trigger that, removing the element, then repeating the compare. Got it.

Quote:
finding the head is O(1)


Always heap[0], yes.

Quote:
adjusting or scheduling an event is O(log N), and removing the head is O(log N)


Hmm. I'll have to work that out on paper. I'm visualizing a tree that branches in each direction, so you could add to one side indefinitely rather than both evenly. Probably just diagram->actual implementation confusion.

Obviously what you say about big O timing is correct.

Quote:
Yes, I do subtract all counters periodically to avoid overflow.


Yes, subtract the current time from everything, since that will always be the lowest value; assuming you've polled for events after your last cycle add.

Quote:
Note that by hard-coded linear search I'm referring to something very different from what you're currently doing.


I'll take your word for that, have enough here to work with at the moment ;)

Thank you very much for your help. Now I'm not sure which approach I should use ... linked list + relative timing, or binary heap + absolute timing + periodic subtraction to prevent overflow. Meh.


Thu Jan 01, 2009 1:51 am
Rookie

Joined: Fri Nov 14, 2008 1:01 am
Posts: 13
Reply with quote
Post 
byuu wrote:
2) exact cycle count timestamps that grow forever will eventually overflow, unless you subtract them all by the smallest value in the list periodically. Do you do something like that after each frame or so?


No need there. When comparing time stamps, instead of a < b always do (signed) a - b < 0. On most processor models, this amounts to checking the negative flag instead of the carry flag, so there should be no difference in performance.


Thu Jan 01, 2009 2:11 am
Profile
Regular
User avatar

Joined: Thu Jun 30, 2005 1:54 pm
Posts: 327
Location: USA
Reply with quote
Post 
byuu, I found this helpful illustration of array-based heap operations. I had never studied this structure before. Array-based heap lecture notes.pdf

Ecst's point is that if you ensure that all your times fall within 0 to UINT_MAX/2 of each other, you can entirely eliminate periodic adjustments of all times. Unsigned values wrap around by definition, so addition and subtraction work without change. Only ordered comparison needs special treatment:
Code:
bool less_than( unsigned x, unsigned y )
{
    return x - y > UINT_MAX / 2;
}

On a two's complement machine where conversion from unsigned to signed doesn't do any range check, which is the case on almost all machines these days, that can be optimized to the following (and I wouldn't be surprised if gcc does this automatically):
Code:
bool less_than( unsigned x, unsigned y )
{
    return (int) (x - y) < 0;
}

Don't try to use signed int everywhere and eliminate less_than; even though overflow of signed integers doesn't usually generate a trap on most machines these days, a compiler's optimizer might take advantage of that fact and rewrite the second "if" below into the first:
Code:
int time_x, time_y;

if ( time_x < time_y ) // won't work if y has overflowed, but x hasn't

if ( time_x - time_y < 0 ) // optimizer might rewrite this as above


Thu Jan 01, 2009 4:09 am
Profile WWW
Gambatte Developer
Gambatte Developer

Joined: Fri Oct 21, 2005 4:03 pm
Posts: 157
Location: Norway
Reply with quote
Post 
Good point, ecst. I use that in my gettime/sleep abstraction code for instance, but haven't applied it to this for various reasons.


Thu Jan 01, 2009 5:41 am
Profile
Lurker

Joined: Wed Jul 28, 2004 1:35 am
Posts: 128
Reply with quote
Post 
Really, all this angst over refining the process should be dumped in favor of the most straightforward readable implementation, and optimized if it comes up in performance profiling.


Thu Jan 01, 2009 4:00 pm
Profile ICQ YIM
New Member

Joined: Tue Dec 30, 2008 11:37 pm
Posts: 7
Reply with quote
Post 
byuu wrote:
lscharen, thank you very much for the example code.

Not following what you mean by "amortization pool credits", but I get the general idea of your code. The part I missed was that you don't have to recompute all elements after your insertion part, just one.


Yes, I was trying to get across the idea that, even though the worst-case insertion time is O(n), a sequence of insert/remove operations have amortized constant time. See http://en.wikipedia.org/wiki/Amortized_analysis for more information.

Quote:
Example ... say we have the following queue:
[0] = { NMI, +26 }
[1] = { IRQ, +14 } //= +42 total
[2] = { DRAM, +10 } //= +52 total
We want to insert ALU at +32, so we do an O(n) summation and find that we insert at [0]. So first link ALU between [0] and [2] to get:
[0] = { NMI, +26 }
[1] = { ALU, ??? }
[2] = { IRQ, +14 }
[3] = { DRAM, +10 }
Then fix the times:
[1] = clocks(32) - [0] = +6
[2] = [0] + [2] - [1] = +10

[3] is still correct [+26, +6, +10, +10 = +52], no need to update it.

The O(n) search to find the insertion point sucks, but like DataPath mentioned (and I missed in the first post), the relative values means we can't perform a binary search here. The linked list also rules out that possibility, anyway.


Right. You only have to update the delta time of the event you are inserting and the delta of the next event. If you're not convinced of the O(1) expected performance of the queue, implement it and try it out. You may be correct that it would be faster to keep an array sorted if the number of events is bounded by a small number, e.g.

Code:
void array_queue_insert( unsigned int time, unsigned int event)
{
   // find the insertion point
   int idx = binary_search( array, time );
   
   // move the elements from idx+1 to array_len
   memmove( &array[idx + 1], &array[idx], (array_len - idx) * sizeof(int));

   // insert the new event
   array[idx] = event;
}


Thu Jan 01, 2009 6:38 pm
Profile
Gambatte Developer
Gambatte Developer

Joined: Fri Oct 21, 2005 4:03 pm
Posts: 157
Location: Norway
Reply with quote
Post 
Suppose you have N events, and that every time an event is executed it is rescheduled to execute after all the other events (but we don't know this, so we can't just stick it to the back of a FIFO). Then with a delta queue approach each event execution+rescheduling has O(N) complexity. Amortizing over the number of event executions like that looks like a fallacy to me, since you're not considering that all the events already in the queue need to be scheduled too.


Thu Jan 01, 2009 7:34 pm
Profile
Regular
User avatar

Joined: Thu Jun 30, 2005 1:54 pm
Posts: 327
Location: USA
Reply with quote
Post 
lscharen, excellent point about only having to adjust the next delta, not all of them, when inserting before the end. sinamas, if you keep track of the absolute time of the last event, inserting an event past all the others can done in constant time. I wrote a simple delta queue class and exercise program, which demonstrates both of the above points. Excerpt of the key parts shown below. Note that this demo doesn't associate anything with the events, just the times of the events themselves. Delta_Queue.cpp
Code:
void Delta_Queue::insert( int time )
{
    if ( time >= last )
    {
        events.push_back( time - last );
        last = time;
    }
    else
    {
        events_t::iterator p = events.begin();
        while ( p != events.end() )
        {
            if ( time <= *p )
            {
                *p -= time;
                break;
            }
           
            time -= *p++;
        }
           
        events.insert( p, time );
    }
}

void Delta_Queue::pop_front()
{
    last -= front();
    events.pop_front();
}


Fri Jan 02, 2009 1:41 am
Profile WWW
Gambatte Developer
Gambatte Developer

Joined: Fri Oct 21, 2005 4:03 pm
Posts: 157
Location: Norway
Reply with quote
Post 
blargg wrote:
sinamas, if you keep track of the absolute time of the last event, inserting an event past all the others can done in constant time.

True, but we still get linear complexity on average if the common case position is before the end and has a mean proportional to N.


Fri Jan 02, 2009 2:20 am
Profile
New Member

Joined: Tue Dec 30, 2008 11:37 pm
Posts: 7
Reply with quote
Post 
sinamas wrote:
blargg wrote:
sinamas, if you keep track of the absolute time of the last event, inserting an event past all the others can done in constant time.

True, but we still get linear complexity on average if the common case position is before the end and has a mean proportional to N.


I have to agree with sinamas here. I took the javadoc comment from the page byuu linked to on its face, but I can't figure out a proof of amortized constant time for a linked-list delta queue. I even looked at the Java source code and it's basically the same implementation that I proposed earlier.

http://avrora.cvs.sourceforge.net/viewvc/avrora/avrora/src/avrora/sim/clock/DeltaQueue.java?revision=1.1.1.1&view=markup

In terms of efficient implementation, I'm starting to think that a min-heap using the 2s-complement trick might very well be faster. The lack of having to update a next field and guaranteed O(log n) insertion might make up for O(log n) removal.


Fri Jan 02, 2009 2:56 am
Profile
Hazed

Joined: Sat Jan 28, 2006 7:21 am
Posts: 76
Reply with quote
Post 
This is the kind of talk that keeps me coming back. I always learn something new and realize how much more my skills need to improve :P


Fri Jan 02, 2009 3:05 am
Profile
Rookie

Joined: Fri Nov 14, 2008 1:01 am
Posts: 13
Reply with quote
Post 
blargg, good job in elegantly implementing a delta queue (and also in providing commentary on standards-compliant C/C++ implementation of "sliding" compares). Note that you can get rid of the subtractions in "*p -= time;" and "time -= *p++", and also the "last" variable if you use absolute time stamps instead of relative deltas (paradoxical as it may sound).

The misunderstanding regarding complexity seems to be the following: The O(1) for delta queues takes the number of events as constant, not depending on it as a variable, rather giving it the meaning of O(1) per event and time unit.

Facilitating comparability, let us consider the number n of events, the number t of cycles of the considered time interval, the frequencies (average number of occurences per cycle) f_1, ..., f_n of the events, and the average delta queue insertion points (how many list items we have to iterate over when inserting the event) a_1, ..., a_n of each event. Note that a_i * f_i <= 1 (this is the key point of delta queues).

The binary heap is O(log(n) * (f_1 + ... + f_n) * t).

The delta queue is O((a_1 * f_1 + ... + a_n * f_n) * t). Using a_i * f_i <= 1, we can give the simpler, but less sharp estimate O(n * t) (this is the O(1) per event and time unit from above).

As a case study, further simplifying, assume that each event respawns as soon as it got killed and has constant delay, that f_1 > ... > f_n, that no events occur simultaneously, and that the time interval is sufficiently large. After some calculations, we find that a_1 * f_1 + ... + a_n * f_n = 2 * (0 * f_1 + 1 * f_2 + ... + (n - 1) * f_n). So it all boils down to log(n) * (f_1 + ... + f_n) vs. 0 * f_1 + 1 * f_2 + ... + (n - 1) * f_n. Just for fun, let f_n = 1 / n^q with q > 0 some real parameter. Interestingly, the binary heap wins for 0 < q < 2, while the delta queue wins for q > 2, with a draw for q = 2.

Informally, these heuristic results would mean the delta queue is best suited for situations where the event delays cover all different kinds of magnitudes, while the binary heap should be used in the remaining cases, and when guaranteed optimal (and good) worst case complexity is important. Note that (albeit not obvious, and maybe slowed by a constant factor) binary heaps can also feature logarithmic removal/rescheduling of events (e. g., in case IRQ settings change).

Interesting to consider would be a synergistic approach, with the at most logarithmic insertion of the binary heap, but cheaper insertion for events sooner to occurs. Maybe a kind of "unbalanced" binary search tree with a bias for the right (with larger values).

Nevertheless, all these considerations will probably be in vain if the number of events is low enough for efficient hard-coded solutions. Still, a scalable approach would be interesting (e. g., if you, byuu, plan on potentially adding many future events, for whatever reason).


Last edited by ecst on Fri Jan 02, 2009 4:18 am, edited 2 times in total.



Fri Jan 02, 2009 3:11 am
Profile
Display posts from previous:  Sort by  
This topic is locked, you cannot edit posts or make further replies.   [ 36 posts ]  Go to page 1, 2  Next

Who is online

Users browsing this forum: No registered users and 10 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum

Search for:
Jump to:  
cron
Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group.
Designed by ST Software.