Does this fix resolve any game issues?
Nope, it doesn't fix any game. It just improves the timing precision of ~15-20% of generated IRQs by almost exactly one microsecond (one millionth of a second).
By the way, will the test you invented to obtain this result give you any sort of edge for figuring out the NMI-related edge case problem below it?
Separate issue, I could probably handle that one now actually ... but yeah, I really need a break. I really didn't expect this IRQB problem to become one of the toughest issues I've faced to date.
How did you stumble upon this bug? Is it related to Mecarobot?
I was starting to write a Mecarobot code clone, and immediately noticed the IRQ timing was off. But I had so many tests on that before -- IRQ timing was supposed to be perfect! So yeah, five-day tangent to solve that first. Actually discovered the PIO delay while I was at it, too. But I'll worry about that when I worry about mul / div delays. I need a better mechanism for streamlining hardware delays.
I'm a big believer that you have to get the basic building blocks right before you mess with higher level problems, so that's why I wanted to solve this prior to Mecarobot's issue. Problem is, I'm completely worn out now.
I only vaguely understand it.
I'd be surprised if anyone still around understands what I'm saying, heh. It just helps to type things out; I usually come up with most of my ideas when trying to summarize what I know. And sometimes I get a lot of help, e.g. from dvdmth in this case.
I wasn't sure it would even be possible to test, but you've had a lot of experience writing tests, so I figured if there was a way, you'd find it.
anomie and I used to joke about crazy tests involving executing code out of memory data registers and MMIO. Never thought I'd actually end up using those tricks myself.
But it's really amazing the kinds of edge cases you can test with them. One that anomie always mentioned was trying to read from $2180 twice in a row. $2180 reads at 6 clock cycles, but standard WRAM accesses require 8 clock cycles. You can't use fixed channel DMA to test, because that always waits 8 clock cycles no matter what.
A fun test would be to execute lda $2180 from PC = $00217e. The CPU would read the operand's high byte, $21, from WRAM through $2180, and then immediately try to read $2180 again for the data. To keep the code from crashing (as you can't execute code on top of MMIO $2181+), you would have to trigger an IRQ immediately after that opcode completed, and then log the value in A right away at the start of your IRQ handler. If A is valid, then you really can access WRAM faster than normal this way. But most likely, you'd end up with some rather strange results.
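To make the byte layout concrete, here is a rough Python sketch (not emulator code, just bookkeeping) of which bus address each access of that instruction would hit. It assumes an 8-bit accumulator, so there is a single data read, and a data bank of $00, so the target resolves to $002180. The function name lda_2180_accesses is made up for illustration, and the sketch says nothing about how the bytes at $217e and $217f get there or about the actual cycle timing.

```python
# Illustrative only: lay out the three bytes of `lda $2180` (absolute
# addressing, opcode $AD) starting at PC = $00217e, and list the bus
# address of each access in order.
# Assumptions: 8-bit accumulator (one data read), data bank = $00.

LDA_ABS = 0xAD  # 65816 opcode for lda with absolute addressing


def lda_2180_accesses(pc=0x00217E, target=0x2180):
    """Return (address, description) pairs for each bus access, in order."""
    code = [LDA_ABS, target & 0xFF, target >> 8]  # bytes AD 80 21
    labels = ["opcode", "operand low byte", "operand high byte"]
    accesses = [
        (pc + i, f"fetch {labels[i]} ${code[i]:02x}") for i in range(3)
    ]
    # The data read goes to the target address itself.
    accesses.append((target, "data read into A"))
    return accesses


for address, what in lda_2180_accesses():
    print(f"${address:06x}: {what}")
```

The last two entries both land on $002180: the fetch of the $21 operand byte and the data read that follows it. That back-to-back pair is the double read the test is after.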
Now do the same thing with Uniracers' voodoo. :D
Realistically, I doubt that's going to happen. The amazing amount of time it took for just this small finding was humbling. Trying to tackle something like the S-PPU1/2 is most likely way out of my league. I don't have the kind of analytical mind blargg and co have, so I pretty much work by narrowing down possibilities. Since you can't execute your own code on the S-PPU, that becomes quite a lot more difficult.
And then there's the speed issue ...