Fun timing stuff

byuu · Post by **byuu** » Wed Aug 03, 2005 12:54 am

So I noticed a flickering line in Metroid 3 and figured I'd look into it.
I have 98% accurate DMA/HDMA/IRQ/NMI/opcode timing (plenty of variance from real hardware though), and yet it turned out to be a timing issue.
It sets up a VHIRQ for 31 (forget the H, but it's halfway or so through the frame), that gets hit, then HDMA saps all of the cycles on the scanline, and the sta $212c ends up happening at V=32, H=1.5 or so. And since I make the scanline render at H=0, that's a problem.
So it's gotta be an IRQ thing or something, so I look into that. Fun IRQ things I've noticed today:

The first frame at reset triggers IRQs one scanline early. e.g. set a VIRQ for 2 and it triggers at H=?/V=1. Set it for 0 (when you're on 0) and it goes to the next frame, whereas any other frame the VIRQ triggers on the same scanline.
Like anomie said, there's ~5.5 dot delay after V=$4209, but this delay seems to magically disappear between 333-339, lest the IRQ would never trigger. It never bumps it to the next scanline as you'd expect. The exact delay changes every scanline. I get correct results ~90% of the time with a 5.5 dot delay, ~80% with 5.0, and everything else is even more off. 5.5 fixes the 5.0 errors, and 5.0 fixes the 5.5 errors.
A V+H/IRQ on the same scanline seems to wait an opcode after the sta $4200 before triggering. Probably something to do with all that rti/cli stuff anomie was talking about before.

Ok, fine. Maybe I can get NMI a little more accurate at least?
It triggers at V=$e1/$f0 / H=$3 (Hcycles >= 12). Oh, but of course it isn't that easy. Sometimes it just randomly totally misses $e1, and the NMI occurs on scanline $e2.
I got this by making a tight loop that keeps invoking NMI, latching the counters, saving to SRAM, and continuing. No bugs in the code, works fine through emulation. Just something causes NMI to trigger on the NEXT scanline sometimes.

Grr. Maybe I can just find a valid range for $212c writes and be done with it. Ah, great. $212c can be written ANY TIME, even mid frame. You can actually create some really neat effects by exploiting mid-scanline writes. The easiest (cleanest) way to emulate that would be to make the PPU a dot-based renderer, that or use a bajillion lookup tables for every register whilst rendering the scanline.
So the problem is that Metroid 3 actually ends up setting the $212c register on V=32/H=1/2, but rendering doesn't begin until H=22, so you don't notice it.
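
To make the scanline-versus-dot distinction concrete, here is a minimal C++ sketch of resolving a register value per dot from a log of mid-line writes. `RegWrite` and `value_at_dot` are hypothetical names for illustration, not bsnes code; the only figures taken from the post are the write landing at roughly H=1 to 2 and rendering starting at H=22.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical record of one register write during a scanline.
struct RegWrite {
  int dot;             // H position (in dots) at which the write lands
  std::uint8_t value;  // value written to the register (e.g. $212c)
};

// Returns the register value visible at `dot`, given the value held at the
// start of the line and the writes made on this line (sorted by dot).
std::uint8_t value_at_dot(std::uint8_t line_start,
                          const std::vector<RegWrite>& writes, int dot) {
  std::uint8_t current = line_start;
  for (const RegWrite& w : writes) {
    if (w.dot > dot) break;  // later writes are not visible yet
    current = w.value;
  }
  return current;
}
```

A renderer that samples the register at dot 0 still sees the old value, while one that samples at dot 22 sees the value written at dot 2, which is the difference described above.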

anomie · Post by **anomie** » Wed Aug 03, 2005 6:57 pm

byuusan wrote:Like anomie said, there's ~5.5 dot delay after V=$4209, but this delay seems to magically disappear between 333-339, lest the IRQ would never trigger. It never bumps it to the next scanline as you'd expect. The exact delay changes every scanline. I get correct results ~90% of the time with a 5.5 dot delay, ~80% with 5.0, and everything else is even more off. 5.5 fixes the 5.0 errors, and 5.0 fixes the 5.5 errors.
A V+H/IRQ on the same scanline seems to wait an opcode after the sta $4200 before triggering. Probably something to do with all that rti/cli stuff anomie was talking about before.

To summarize my IRQ timing notes, for anyone paying attention: First, IRQ can only occur between opcodes, i.e. we do the IRQ handler code instead of fetch-next-opcode. /IRQ is checked one CPU cycle earlier, which leads to a problem for CLI and friends. CLI is 2 CPU cycles: opcode fetch and adjust flag. However, the IRQ can't trigger right away after the CLI instruction, because the check occurs before the "adjust flag" cycle. Thus, we get interesting results: with "CLI ; SEI", the IRQ handler will be called after the SEI. RTI doesn't have this problem, since the flag adjustment is not the final cycle, so "CLI ; RTI" won't allow the IRQ to trigger at all.
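
The rule above can be sketched as a toy model in C++. This is only an illustration under stated assumptions: /IRQ is held asserted the whole time, the I flag is sampled just before each opcode's final cycle, and the `Op` enum, `first_irq_after`, and the `stacked_i` parameter (the I flag that RTI pulls from the stack) are hypothetical names, not taken from any emulator.

```cpp
#include <cstddef>
#include <vector>

enum class Op { CLI, SEI, RTI, NOP };

// Returns the index of the instruction after which the IRQ handler is
// entered, or -1 if the IRQ never gets through. Assumes /IRQ is asserted
// throughout and that the I flag is polled before the final cycle.
int first_irq_after(const std::vector<Op>& program, bool i_flag, bool stacked_i) {
  for (std::size_t n = 0; n < program.size(); ++n) {
    const Op op = program[n];
    // RTI restores the flags before its final cycle, so the poll sees them.
    if (op == Op::RTI) i_flag = stacked_i;
    const bool masked_at_poll = i_flag;  // sampled one cycle before the end
    // CLI and SEI change the flag in their final cycle, after the poll.
    if (op == Op::CLI) i_flag = false;
    if (op == Op::SEI) i_flag = true;
    if (!masked_at_poll) return static_cast<int>(n);
  }
  return -1;
}
```

With I initially set, `{CLI, SEI}` yields 1 (the handler runs after the SEI), and `{CLI, RTI}` with a stacked I flag of 1 yields -1, matching the two cases described in the post.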

The actual interrupt timing: if H=0, then the earliest the IRQ can occur is at (0,V-1)+1374. Otherwise, it's (0,V)+14+H*4; there's no correction for the PPU long dots. H=0x153 will not fire on the short scanline, or on the last scanline of any frame.
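
As a worked form of the timing rule above, here is a small C++ sketch. `IrqPoint` and `earliest_irq` are hypothetical names; the offsets 1374 and 14+H*4 are the ones quoted in the post, and a `line` of V-1 simply means "the scanline before the programmed one" (wrapping to the previous frame for V=0 is not modelled).

```cpp
// Earliest point at which the IRQ can occur, expressed as master cycles
// counted from H=0 of the given scanline.
struct IrqPoint {
  int line;   // scanline the offset is measured from
  int cycle;  // master cycles after H=0 of that scanline
};

constexpr IrqPoint earliest_irq(int v, int h) {
  // H=0 is timed from the previous scanline; otherwise from line V itself.
  return (h == 0) ? IrqPoint{v - 1, 1374} : IrqPoint{v, 14 + h * 4};
}
```

Note that 1374 equals 14 + 340*4, so the H=0 case reads like "H=340 on the previous line" under the same formula.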

Sometimes it just randomly totally misses $e1, and the NMI occurs on scanline $e2.

WTF?

Ah, great. $212c can be written ANY TIME, even mid frame.

I'm not surprised, since $2100 can be done the same way. Probably most registers can be written to some effect or another mid-scanline, ugh.

byuu · Post by **byuu** » Wed Aug 03, 2005 10:17 pm

Sometimes it just randomly totally misses $e1, and the NMI occurs on scanline $e2.
WTF?

See for yourself: http://byuu.org/files/nmi_timing.zip

I'll be available the rest of today if you want to stop by that IRC room and discuss this.

I was actually a bit incorrect before. NMI wasn't firing on $e2, NMI was not firing at all, and the - lda $4210 : bpl - loop didn't break out until scanline $e2.

Let me explain nmi_trigger.txt: a log file through emulation.
The first line //nnnn : bla tells you the position in the SRAM file, the L/H/E means low, high, exec. Low happens when the CPU is told to lower the NMI pin, high happens when $4210 is read, and exec happens between opcodes.
Low and high will happen mid-cycle (not on cycle edges, e.g. they can occur 2 cycles into the 6-cycle $4210 read cycle), whereas exec only happens on opcode edges.
Low obviously can only trigger 2 cycles into the read cycle, whereas high will happen on ANY possible cycle tick.
This may not be exactly how the SNES does it, but it does give you exact pinpoints of what's going on at exactly what master cycle.
Even if there were a bug in my program, this is still very very strange behavior that we should probably figure out if possible.

If you look at SRAM location 0x0098, you'll see that the NMI was skipped the entire frame, and doesn't execute until the next frame. What's worse is that the wait for $4210 bit 7 loop breaks out on the next scanline ($e2). Madness. Absolute madness.

Oh yeah, about the SRAM file itself: basically, the ROM will loop with the wait for $4210 bit 7 thing, and the NMI will trigger during or immediately after that loop ends. The NMI then writes the x/y position to SRAM, then it returns, and after that the main loop writes the x/y position to SRAM as well. To tell the two apart, I OR'ed the NMI x with $8000.
Now look at 0x0098 in the ufo.srm file, and you'll see that the NMI never happened here.
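
For anyone reading the SRAM dump, a minimal C++ sketch of telling the two kinds of record apart. The record layout (one 16-bit x followed by one 16-bit y) and the names `LatchRecord` and `decode_record` are assumptions for illustration; only the "NMI x is OR'ed with $8000" convention comes from the post. Masking to 9 bits is also an assumption, based on the H/V counters being 9 bits wide.

```cpp
#include <cstdint>

// One logged counter latch, as decoded from the SRAM file.
struct LatchRecord {
  bool from_nmi;    // true if written by the NMI handler (x had $8000 set)
  std::uint16_t h;  // latched H counter
  std::uint16_t v;  // latched V counter
};

LatchRecord decode_record(std::uint16_t x, std::uint16_t y) {
  return LatchRecord{
      (x & 0x8000) != 0,                     // marker bit set by the NMI routine
      static_cast<std::uint16_t>(x & 0x01ff),  // keep the 9 counter bits
      static_cast<std::uint16_t>(y & 0x01ff)};
}
```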

I'm not surprised, since $2100 can be done the same way. Probably most registers can be written to some effect or another mid-scanline, ugh.

The really bad part is that with a little timing magic (or some clipping windows), it would be trivial to create some "VDMA" effects (hah :P ).
It would be trivial to, say, create 32-pixel black color window clipping columns at the left, center, and right sides of the screen, and then make the left column disable BG2, and the right BG1, creating two separate images. I'm sure it has to be possible to create some totally unique effects using similar tricks as well. Assuming $2105 works mid-scanline, what about the left side being mode1, and the right side being mode7? You could show text scrolling on the left, and a mode7 graphic scrolling around on the right. Heheh. Maybe I should make a test program to torment emu authors :P

anomie · Post by **anomie** » Thu Aug 04, 2005 3:08 am

byuusan wrote:See for yourself: http://byuu.org/files/nmi_timing.zip

It looks more like the "latch when $4210 bit 7 set" test is sometimes passing twice... It may or may not be because you don't PHP/PLP in your NMI handler, and it may or may not be because of the oddity we've found where $4210 bit 7 is occasionally not cleared as expected in a "lda $4210 ; bpl -" loop like this.

Assuming $2105 works mid-scanline, what about the left side being mode1, and the right side being mode7? You could show text scrolling on the left, and a mode7 graphic scrolling around on the right. Heheh. Maybe I should make a test program to torment emu authors :P

Or if the Mode 7 registers can be altered mid-line (i'm really not too sure they can), you could do some really interesting Mode 7 effects.

byuu · Post by **byuu** » Thu Aug 04, 2005 6:49 am

anomie wrote:It looks more like the "latch when $4210 bit 7 set" test is sometimes passing twice... It may or may not be because you don't PHP/PLP in your NMI handler, and it may or may not be because of the oddity we've found where $4210 bit 7 is occasionally not cleared as expected in a "lda $4210 ; bpl -" loop like this.

Hmmm... I will update the test to write a counter for each test so I can determine if the $4210 bit 7 test is triggering twice. That makes a hell of a lot more sense, but it's still very strange. It seems to pass the second time at the start of $e2. Either the NMI interrupt handler eats almost exactly an entire scanline, or it isn't raising $4210 again until the next scanline. Easy to find out either way...

Why would php/plp help? NMI doesn't care about the status of any of the flags, and the pha/pla inside the loop should preserve the N flag (rather, the pla will restore the N flag to what it was before the NMI was triggered). Even if it didn't, then the error should occur during emulation as well, so clearly there is something not being emulated either way here.
At least it's good to know that the NMI isn't skipping the frame, that would be an extremely large problem to deal with.
Edit: Damn, the rti would screw up the flags, wouldn't it? So php/plp wouldn't help much. wai is probably the best thing to do here, but I want to get this odd behavior working correctly nonetheless :/

I'll have to work on it next week though, work and stuff :(

anomie · Post by **anomie** » Sat Aug 06, 2005 3:16 pm

byuusan wrote:Hmmm... I will update the test to write a counter for each test so I can determine if the $4210 bit 7 test is triggering twice. That makes a hell of a lot more sense, but it's still very strange. It seems to pass the second time at the start of $e2. Either the NMI interrupt handler eats almost exactly an entire scanline, or it isn't raising $4210 again until the next scanline. Easy to find out either way...

Well, what i see is first the NMI triggers latching at ($21,$e1), then the $4210 handler latches at ($ce,$e1), then the $4210 handler latches again at ($13,$e2). This happens every time the NMI latches at H=$21.

BTW, this test is very deterministic. I see the following results:

Code:

NMI  $4210
---  -----
$1e  $d7
$1f  $d0
$20  $d8
$21  $ce & $13 of line $e2
$22  $cf or $d3
$23  $d4

I suspect that there is a window when /NMI is being raised where reading $4210 won't clear it. Read before this, and $4210.7 will be clear, and after this window it will be cleared normally.

By this theory, we observe the non-clearing when /NMI goes right when the "LDA $4210 : BPL -" loop actually reads /NMI. The NMI itself then is probably occurring just after the BPL fails, so the return address on the stack should point to your LDA $2137 instruction ("LDA 3,S" should be the low byte of the address, and "LDY #1 : LDA (3,S),Y" should give you either $10 (the LDA $4210), $fb (the BPL), or $37 (the LDA $2137) assuming DB was set up correctly). If /NMI went any later, the BPL would pass and we'd point to the LDA $4210 instruction instead. Earlier _might_ point to LDA $2137, but more likely would point to the BPL instruction itself. Feel like logging the LDA (3,S),Y byte in the NMI routine?

Why would php/plp help? NMI doesn't care about the status of any of the flags, and the pha/pla inside the loop should preserve the N flag (rather, the pla will restore the N flag to what it was before the NMI was triggered).

Hrm... Yes, i think you're right. PLP shouldn't matter here.

Edit: Damn, the rti would screw up the flags, wouldn't it? So php/plp wouldn't help much.

Err, yeah, that's another reason why PHP/PLP wouldn't matter: the NMI/RTI automatically includes PHP/PLP. Bad me.

byuu · Post by **byuu** » Sat Aug 06, 2005 6:53 pm

I was starting to think that the problem was that /NMI was testing one cycle before the lda $4210 completed (like /IRQ), the $4210.7 read last cycle returns one, then at the end of the opcode the NMI is triggered and $4210.7 is set back to return one because the NMI just triggered for real.
So $4210.7 returns true twice, only when $4210 bit 7 is read during the cycle after the /NMI test.

Bad me.

At least you weren't dumb enough to think NMI missed a frame sometimes :P

byuu · Post by **byuu** » Tue Aug 09, 2005 3:19 am

I figured it out.

$4210 bit 7 is read and the bit is set, and then NMI routine latches at ($21,$e1) [file offset 0x90], then returns to the bpl instruction, which is false, then the counter is latched at ($ce,$e1), the routine goes back into the next 'wait for NMI' loop, and the first read has $4210 bit 7 set again, so the counter latches at ($13,$e2).

The cause of the phenomenon isn't that the NMI routine is latching at $21, but rather that the lda $4210 occurs at HC=2.
HC just being H cycles instead of H dots, so HC=2 == H=0.5.

I presently assume that the actual read from $4210/$2137 happens 2 master cycles into the 6-master-cycle read.

Given that, I notice that $4210 bit 7 is set when either of the following are true:
V==225/240 && HC>=2
V>225/240
At that time, the nmi_read flag is set, and subsequent reads return this bit as being clear until the next NMI triggers on the next frame.
There is one exception: When $4210 bit 7 is read when V==225/240 && HC==2 (e.g. the first possible cycle where $4210 bit 7 *CAN* be set), the nmi_read flag is not set, even though $4210 bit 7 is set. Therefore, two reads in a row result in $4210 bit 7 being set.
Note that the NMI pin that Charles MacDonald speaks of in snestech.txt proved to be irrelevant for emulation of NMI. The nmi_read flag I use is not raised at the start of a new frame, it is lowered.
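
The $4210 bit 7 behaviour listed above can be sketched as a small C++ model. This is a sketch under assumptions rather than the bsnes code: `NmiFlagModel`, `read_bit7`, and `new_frame` are hypothetical names, `hc` is taken to be the H cycle position at the moment the read actually samples the register, and `vblank_line` is 225 (or 240 with overscan).

```cpp
// Toy model of $4210 bit 7 and the nmi_read flag described above.
struct NmiFlagModel {
  int vblank_line = 225;  // 240 when overscan is enabled
  bool nmi_read = false;  // set once bit 7 has been read this frame

  // The flag is lowered (not raised) at the start of a new frame.
  void new_frame() { nmi_read = false; }

  // Returns the value of bit 7 for a read sampled at (v, hc).
  bool read_bit7(int v, int hc) {
    const bool in_range = (v == vblank_line && hc >= 2) || v > vblank_line;
    if (!in_range || nmi_read) return false;
    // Exception: a read at exactly HC==2 on the vblank line sees the bit
    // set but does not mark it as read, so the next read sees it again.
    if (!(v == vblank_line && hc == 2)) nmi_read = true;
    return true;
  }
};
```

Reading at (225, HC=2) and then again later on the same line returns set twice, which is the double pass seen in the log; any other first read returns set once and clear afterwards.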

The actual NMI itself probably tests one opcode cycle before an opcode edge, like IRQs, but I was able to emulate the behavior by testing for NMI only on opcode edges where V==225/240 && HC>=12, or V>225/240.

Why $4210 bit 7 is set before the NMI really triggers is beyond me. Probably just a tiny speedhack by the SNES designers since by the time the next opcode executes, NMI will have already triggered anyway.

I've updated the NMI timing archive from above:
http://byuu.org/files/nmi_timing.zip
...with a new SMC file that will test the results against a cached hardware result log. The screen turns blue if the timing matches hardware, and red if it does not. A quick way to test if an emulator implements the above correctly. Here's a WIP version of bsnes that can pass the above test, verifying my above assertions:
<edit: link removed>

TODO:
1) Try adding in an opcode to the $4210 bit 7 wait loop whose last cycle is 8 master cycles to see if NMI triggers one cycle early like IRQ.
2) Test what happens when H/DMA run over the NMI trigger (HDMA runs on V=224, right?), and even when crossing over entire frames.
3) Try triggering the above quirk, and then not reading $4210 bit 7 until after the start of a new frame and see if it's still set.
4) Just try reading $4210 bit 7 at the start of the next frame to see if the bit is still set.
5) IRQ/NMI interruptions? (NMI supposedly takes priority)
6) Try reading $4210 in 16-bit mode to see if results change

Edit: 1 causes timing errors, as expected. I tried to emulate it as 'hc >= (18 - last_opcode_cycle_count)', since I was using hc >= 12 before.
If I were testing on the cycle edge, then that code should've worked. I also tried using various values when the last cycle was 8 master cycles long (10, 14, etc.), but nothing worked. The farther away I got from 12, the more errors appeared.
The code I used for this was - lda $4210 : wdm #$00 : bpl -

6 also causes errors. I used - lda $4210 : asl #8 : bpl - to avoid any 8-master-cycle-ending opcodes.

So in other words, I'm still not even close. I find this so unbelievably tedious and pointless sometimes... :/

[later]
Hooray! I realized what I was doing wrong before with IRQ/NMI timing (NMI still has issues though).
I was using 30 - last_opcode_cycle_count before, so that if the last opcode cycle was 6 master cycles, HC would need to be 24. 8 -> 22, etc. But that's backwards. What I needed was 18 + last_opcode_cycle_count. 6->24, 8->26, etc.
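
The arithmetic of the two offsets, written out as a short C++ sketch (the function names are hypothetical; the constants are the ones quoted above):

```cpp
// Offset added to H*4 before comparing against the H cycle counter.
// last_cycle is the length, in master cycles, of the opcode's final cycle.
constexpr int irq_offset_old(int last_cycle) { return 30 - last_cycle; }  // backwards
constexpr int irq_offset_new(int last_cycle) { return 18 + last_cycle; }  // corrected
```

Both give 24 when the final cycle is 6 master cycles long, so they only disagree for 8-cycle endings (22 versus 26). That overlap may be why the old formula looked right in tests dominated by 6-cycle endings.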

So now for IRQ, I have the following, which is run at the edge of every opcode (an IRQ is executed when true is returned):

Code:

bool bCPU::irq_test() {
  int vpos, hpos;
  if(regs.p.i)return false; //no interrupt can occur with I flag set
  if(status.irq_pin == 0)return false; //same as above
  if(status.virq_enabled == false && status.hirq_enabled == false)return false;

  //calculate V/H positions required for IRQ to trigger
  vpos = status.virq_pos;
  hpos = (status.hirq_enabled) ? status.hirq_pos : 0;

  //positions that can never be latched
  if(vpos == 240 && hpos == 339 && interlace() == false && interlace_field() == 1)return false;
  if(vpos == 261 && hpos == 339 && interlace() == false)return false;
  if(vpos == 262 && interlace() == false)return false;
  if(vpos == 262 && hpos == 339)return false;
  if(vpos  > 262)return false;
  if(hpos  > 339)return false;

  if(hpos == 0) {
    hpos = status.cycle_count + 14;
  } else {
    hpos <<= 2;
    hpos += status.cycle_count + 18;
    //it should be OK to use the current line cycles/frame lines,
    //as the IRQ will only trigger on the correct scanline anyway...
    if(hpos >= time.line_cycles) {
      hpos -= time.line_cycles;
      vpos++;
      if(vpos >= time.frame_lines) {
        vpos = 0;
      }
    }
  }

  if(status.virq_enabled == true && vcounter() != vpos)return false;

  if(hcycles() >= hpos) {
    status.irq_triggered = true;
    status.irq_pin = 0;
    return true;
  }

  return false;
}

That will match 100% of about 200 different test positions I tried with that IRQ test program I sent you earlier. I don't doubt for a second that there's still a lot of unemulated stuff in IRQs, though.
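
To see the wrap-around in the routine above with concrete numbers, here is a standalone C++ sketch of just the target calculation. `IrqTarget` and `irq_target` are hypothetical names; the arithmetic mirrors the posted code, and the values 1364 (master cycles per line) and 262 (lines per frame) used in the example are assumptions for a normal NTSC non-interlaced line and frame.

```cpp
// Scanline and H cycle at which the IRQ test starts passing.
struct IrqTarget {
  int v;   // scanline to compare against the V counter
  int hc;  // H cycle threshold on that scanline
};

IrqTarget irq_target(int virq, int hirq, int cycle_count,
                     int line_cycles, int frame_lines) {
  int vpos = virq;
  int hpos;
  if (hirq == 0) {
    hpos = cycle_count + 14;
  } else {
    hpos = (hirq << 2) + cycle_count + 18;
    if (hpos >= line_cycles) {  // threshold spills onto the next scanline
      hpos -= line_cycles;
      if (++vpos >= frame_lines) vpos = 0;  // ...or onto the next frame
    }
  }
  return IrqTarget{vpos, hpos};
}
```

For example, V=32, H=339 with a 6-cycle final opcode cycle gives 339*4 + 24 = 1380, which is past 1364, so the test passes at HC >= 16 on scanline 33.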

The NMI code now works with HC>=6+last_opcode_cycle_count (6->12, 8->14), but only when all opcodes in the wait for NMI loop end with 6 or 8 cycles (e.g. - jmp - for 8, - bra - for 6, with nop/wdm added to the loop to test), but when you mix two (- nop : jmp -, - wdm #$00 : bra -), the results are off by one dot every now and then, which eventually causes more and more errors in the logged data...
16-bit lda $4210 is also still screwed.
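
The NMI condition described here, as a one-function C++ sketch. `nmi_edge_test` is a hypothetical name, `vblank_line` is 225 (240 with overscan), and the sketch deliberately ignores whether the NMI has already fired this frame; it also inherits the known off-by-one-dot errors for mixed 6/8-cycle loops mentioned above.

```cpp
// Evaluated on opcode edges only. last_cycle is the length, in master
// cycles, of the final cycle of the opcode that just finished.
constexpr bool nmi_edge_test(int v, int hc, int last_cycle, int vblank_line = 225) {
  return v > vblank_line || (v == vblank_line && hc >= 6 + last_cycle);
}
```

With a 6-cycle ending the threshold is HC >= 12, and with an 8-cycle ending it is HC >= 14, matching the 6->12 and 8->14 figures.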

anomie · Post by **anomie** » Tue Aug 09, 2005 5:09 pm

byuusan wrote:There is one exception: When $4210 bit 7 is read when V==225/240 && HC==2 (e.g. the first possible cycle where $4210 bit 7 *CAN* be set), the nmi_read flag is not set, even though $4210 bit 7 is set. Therefore, two reads in a row result in $4210 bit 7 being set.

That's what I suspected...

Note that the NMI pin that Charles MacDonald speaks of in snestech.txt proved to be irrelevant for emulation of NMI. The nmi_read flag I use is not raised at the start of a new frame, it is lowered.

That pin doesn't exist anyway. There is a NMI pin on the S-CPU chip, but it is connected to ground, not the PPU.
