NMIB / IRQB timing (applies to NES as well)

Archived bsnes development news, feature requests and bug reports. Forum is now located at http://board.byuu.org/
ZH/Franky

Post by ZH/Franky »

Verdauga Greeneyes wrote:
byuu wrote:And it only took a week of 6-8 hour days of testing to figure it out. The coolest part is that it would have been absolutely fucking impossible to figure this out if not for quite literally all of the previous work we've done in the past. That's what I want to write about in an article.
Six to eight hours a day? Wew! How do you have time/energy for anything else? :) Either way, congratulations! That's a major unknown-cause bug off your errata page just a day or two after you created it ;) By the way, will the test you invented to obtain this result give you any sort of edge for figuring out the NMI-related edge case problem below it? (not that I'm suggesting anything - you deserve a break!)
Ideally, 1/3 of our time is spent sleeping (8 hours). This leaves us 16 hours.
He said 6-8, so we'll just average it to 7. 16 - 7 = 9.

After all that work, he still has 9 hours left in the day (which I assume he spends either on free time doing other things or at work).

I think it's great that he spends that long. It shows commitment, and I deeply respect people who commit themselves to their dreams.
Last edited by ZH/Franky on Tue Apr 08, 2008 5:38 pm, edited 1 time in total.
grinvader
ZSNES Shake Shake Prinny
Posts: 5632
Joined: Wed Jul 28, 2004 4:15 pm
Location: PAL50, dood !

Post by grinvader »

Nice job on working that out, byuu.


Now do the same thing with Uniracers' voodoo. :D
Everyone shut up and follow me!!

Code: Select all

<jmr> bsnes has the most accurate wiki page but it takes forever to load (or something)
Pantheon: Gideon Zhi | CaitSith2 | Nach | kode54
FitzRoy
Veteran
Posts: 861
Joined: Wed Aug 04, 2004 5:43 pm
Location: Sloop

Post by FitzRoy »

Verdauga is just being whimsical, no need to break down byuu's sleep patterns :/

ps. this was a triumph
dvdmth
New Member
Posts: 6
Joined: Thu Feb 14, 2008 9:12 pm

Post by dvdmth »

I see my theory helped you to solve this problem. And to think that I started doubting myself an hour or so after I posted...

You were very clever in the way you were able to test it. I wasn't sure it would even be possible to test, but you've had a lot of experience writing tests, so I figured if there was a way, you'd find it. Great work!
ZH/Franky

Post by ZH/Franky »

ps. this was a triumph
What was a triumph?
Verdauga Greeneyes
Regular
Posts: 347
Joined: Tue Mar 07, 2006 10:32 am
Location: The Netherlands

Post by Verdauga Greeneyes »

Franky wrote:What was a triumph?
byuu figuring out this problem, I imagine. Also, the cake is a lie!
byuu

Post by byuu »

Does this fix resolve any game issues?
Nope, it doesn't fix any game. It just improves the timing precision of ~15-20% of IRQs generated by almost exactly a tenth of a microsecond (one ten-millionth of a second.)
By the way, will the test you invented to obtain this result give you any sort of edge for figuring out the NMI-related edge case problem below it?
Separate issue, I could probably handle that one now actually ... but yeah, I really need a break. I really didn't expect this IRQB problem to become one of the toughest issues I've faced to date.
How did you stumble upon this bug? Is it related to Mecarobot?
I was starting to write a Mecarobot code clone, and immediately noticed the IRQ timing was off. But I had so many tests on that before -- IRQ timing was supposed to be perfect! So yeah, five day tangent to solve that first. Actually discovered the PIO delay while I was at it, too. But I'll worry about that when I worry about mul / div delays. I need a better mechanism for streamlining hardware delays.

I'm a big believer that you have to get the basic building blocks right before you mess with higher level problems, so that's why I wanted to solve this prior to Mecarobot's issue. Problem is, I'm completely worn out now.
I only vaguely understand it
I'd be surprised if anyone still around understands what I'm saying, heh. It just helps to type things out, I usually come up with most of my ideas when trying to summarize what I know. And sometimes I get a lot of help, eg from dvdmth in this case.
I wasn't sure it would even be possible to test, but you've had a lot of experience writing tests, so I figured if there was a way, you'd find it.
anomie and I used to joke about crazy tests involving executing code out of memory data registers and MMIO. Never thought I'd actually end up using those tricks, myself.

But it's really amazing the kinds of edge cases you can test with them. One that anomie always mentioned was trying to read from $2180 twice in a row. $2180 reads at 6 clock cycles, but standard WRAM accesses require 8 clock cycles. You can't use fixed channel DMA to test, because that always waits 8 clock cycles no matter what.

A fun test would be to execute lda $2180 from PC = $00217e. It would read the $21 from WRAM, and then it would immediately try and read again. To keep the code from crashing (as you can't execute code on top of MMIO $2181+), you would have to trigger an IRQ immediately after that opcode completed, and then log the value in A right away at the start of your IRQ. If A is valid, then you really can access WRAM faster than normal here. But most likely, you'd end up with some rather strange results here.
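The bus traffic for that test can be sketched in a few lines of C++. This is only a toy model under stated assumptions: `ToyBus`, `run_lda_abs` and the WRAM contents are hypothetical names and values invented for illustration, the opcode bytes are simply assumed to be readable at $217e and $217f, and the model pretends that back-to-back $2180 reads always succeed, which is exactly what the hardware test would have to confirm or refute.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One logged bus read: the address and the byte returned.
struct Access {
  uint32_t addr;
  uint8_t data;
};

// Hypothetical toy bus: only the three addresses touched by the test exist.
struct ToyBus {
  uint8_t wram[2] = {0x21, 0x5a};  // assumed WRAM contents at the port pointer
  unsigned wram_addr = 0;          // stands in for the $2181-$2183 pointer
  std::vector<Access> log;

  uint8_t read(uint32_t addr) {
    uint8_t data = 0x00;
    if(addr == 0x00217e) data = 0xad;        // lda absolute opcode (assumed present)
    else if(addr == 0x00217f) data = 0x80;   // operand low byte (assumed present)
    else if(addr == 0x002180) {
      assert(wram_addr < 2);                 // the toy WRAM only holds two bytes
      data = wram[wram_addr++];              // a port read auto-increments the pointer
    }
    log.push_back({addr, data});
    return data;
  }
};

// Executes "lda absolute" (8-bit accumulator, bank 0) at pc and returns A.
uint8_t run_lda_abs(ToyBus& bus, uint32_t pc) {
  uint8_t opcode = bus.read(pc++);
  assert(opcode == 0xad);
  uint16_t lo = bus.read(pc++);
  uint16_t hi = bus.read(pc++);              // this operand fetch already reads $2180
  return bus.read(static_cast<uint16_t>(lo | (hi << 8)));  // and so does the data read
}
```

Calling `run_lda_abs(bus, 0x00217e)` on a fresh `ToyBus` logs four reads, the last two both at $2180, and in this idealized model A ends up holding the second WRAM byte. On hardware the interesting question is whether that second read returns valid data at all.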
Now do the same thing with Uniracers' voodoo. :D
Realistically, I doubt that's going to happen. The amazing amount of time it took for just this small finding was humbling. Trying to tackle something like the S-PPU1/2 is most likely way out of my league. I don't have the kind of analytical mind blargg and co have, so I pretty much work by narrowing down possibilities. Since you can't execute your own code on the S-PPU, that becomes quite a lot more difficult.

And then there's the speed issue ...
Last edited by byuu on Tue Apr 08, 2008 7:11 pm, edited 2 times in total.
FitzRoy
Veteran
Posts: 861
Joined: Wed Aug 04, 2004 5:43 pm
Location: Sloop

Post by FitzRoy »

dvdmth is the new dmv27. Stick around, dude.
Palin
Hazed
Posts: 96
Joined: Tue Nov 08, 2005 12:40 pm

Post by Palin »

I'm sure you're exhausted, but does this revelation mean a v0.031 release? :)

As others have said, I have the barest grasp on the issues you're tackling. Regardless, I love reading your "problem solving" threads and I greatly appreciate the work you're doing for the community.
King Of Chaos
Trooper
Posts: 394
Joined: Mon Feb 20, 2006 3:11 am
Location: Space

Post by King Of Chaos »

Somebody really needs to compile all this information on a single website dedicated to the SNES where developers can write articles on how things work on the actual hardware, so future would-be developers can get a good idea of what to do and what not to do. NES has sites like this, so why not the damn SNES?

Stuff like that might actually get MORE people interested in SNES development.
[url=http://www.eidolons-inn.net/tiki-index.php?page=Kega]Kega Fusion Supporter[/url] | [url=http://byuu.cinnamonpirate.com/]bsnes Supporter[/url] | [url=http://aamirm.hacking-cult.org/]Regen Supporter[/url]
creaothceann
Seen it all
Posts: 2302
Joined: Mon Jan 03, 2005 5:04 pm
Location: Germany

Post by creaothceann »

Like RHDN?
vSNES | Delphi 10 BPLs
bsnes launcher with recent files list
King Of Chaos
Trooper
Posts: 394
Joined: Mon Feb 20, 2006 3:11 am
Location: Space

Post by King Of Chaos »

Yes. Just more structured and maybe in a more wiki-type of format. =)
byuu

Post by byuu »

I'm sure you're exhausted, but does this revelation mean a v0.031 release?
I guess I can do that soon, if only to fix that god awful beeping from IsDialogMessage ...
Stuff like that might actually get MORE people interested in SNES development.
If this thread gets anyone interested in joining SNES development, then they are seriously masochistic. But then, you sort of have to be if you want to get anywhere.
Like RHDN?
That's not really suited for emulator development. Especially mixing in modern docs with antiques that are seriously flawed. I see that they at least added some general comments to the older docs to warn about this now, but I honestly don't think that goes far enough.

Don't get me wrong, I really respect Y0shi / koitsu going bananas with his documentation, and Qwertie` exposing the private parts of the SNES, but I would just as soon wish that people would move on from my docs in ~5-10 years when more suitable docs are created by future emulator authors. And yes, I know anomie's docs can be difficult to read for a beginner. It's still better to learn things the right way, as it's much harder to unlearn wrong information than it is to learn correct information.

And no, I don't have a better system than RHDN in mind at the moment.
Dullaron
Lurker
Posts: 199
Joined: Mon Mar 10, 2008 11:36 pm

Post by Dullaron »

Palin, I'd rather wait a little longer for a stable v0.031 release than have a broken one. I say give byuu all the time he wants to find the fix without rushing him.
Windows Vista Home Premium 32-bit / Intel Core 2 Quad Q6600 2.40 GHz / 3.00 GB RAM / Nvidia GeForce 8500 GT
Nightcrawler
Romhacking God
Posts: 922
Joined: Wed Jul 28, 2004 11:27 pm

Post by Nightcrawler »

byuu wrote:
Like RHDN?
That's not really suited for emulator development. Especially mixing in modern docs with antiques that are seriously flawed. I see that they at least added some general comments to the older docs to warn about this now, but I honestly don't think that goes far enough.

Don't get me wrong, I really respect Y0shi / koitsu going bananas with his documentation, and Qwertie` exposing the private parts of the SNES, but I would just as soon wish that people would move on from my docs in ~5-10 years when more suitable docs are created by future emulator authors. And yes, I know anomie's docs can be difficult to read for a beginner. It's still better to learn things the right way, as it's much harder to unlearn wrong information than it is to learn correct information.

And no, I don't have a better system than RHDN in mind at the moment.
User Reviews should be operational in a few weeks' time. That should help the issue even more. Though even without user reviews, if there are any documents YOU know to be inaccurate or whatever, you probably should note it in the description so others will know. Every entry is freely editable. ;)

RHDN isn't in the business of removing old documents from the archive, for various valid reasons, and removal seems to be what you would like. So, it's not everybody's cup of tea and never will be. Though perhaps an 'outdated' field recommending a better document, or something similar added to the database, might help in that direction. I don't know.
[url=http://transcorp.romhacking.net]TransCorp[/url] - Home of the Dual Orb 2, Cho Mahou Tairyku Wozz, and Emerald Dragon SFC/SNES translations.
[url=http://www.romhacking.net]ROMhacking.net[/url] - The central hub of the ROM hacking community.
King Of Chaos
Trooper
Posts: 394
Joined: Mon Feb 20, 2006 3:11 am
Location: Space

Post by King Of Chaos »

Perhaps an archive of older out of date documentation would suffice in that case?
Snark
Trooper
Posts: 376
Joined: Tue Oct 31, 2006 7:17 pm

Post by Snark »

byuu wrote:
Does this fix resolve any game issues?
Nope, it doesn't fix any game. It just improves the timing precision of ~15-20% of IRQs generated by almost exactly a tenth of a microsecond (one ten-millionth of a second.)
Good job on the findings and the test, byuu. I've said this before, but SNES emulation has come a looong way. Let's hope that in the future someone will give you a hand with the arduous task of emulating the dot-based SNES PPU.
blargg
Regular
Posts: 327
Joined: Thu Jun 30, 2005 1:54 pm
Location: USA

Post by blargg »

byuu wrote:One that anomie always mentioned was trying to read from $2180 twice in a row.
Don't DEC ABS,X and a few others read the source twice in a row? That seems a simpler approach.
byuu

Post by byuu »

Don't DEC ABS,X and a few others read the source twice in a row? That seems a simpler approach.
Not on the SNES, at least. DEC looks like this:

Code: Select all

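void sCPU::op_dec_addrx() {
  aa.l = op_readpc();                   // operand low byte
  aa.h = op_readpc();                   // operand high byte
  op_io();                              // internal cycle for the index add
  rd.l = op_readdbr(aa.w + regs.x.w);   // the only read of the low data byte
  if(!regs.p.m) rd.h = op_readdbr(aa.w + regs.x.w + 1);  // high byte, 16-bit mode only
  op_io();                              // internal modify cycle, not a second bus read
  if(regs.p.m) {
    op_dec_b();
  } else {
    op_dec_w();
    op_writedbr(aa.w + regs.x.w + 1, rd.h);  // 16-bit mode writes the high byte first
  }
  last_cycle();                         // interrupts are polled before the final cycle
  op_writedbr(aa.w + regs.x.w, rd.l);
}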
It seems the SNES tries to avoid duplicate reads due to the introduction of differing memory access speeds. It'd be slower when executing out of a fast memory region.

For the NES, it obviously didn't matter as much, since all cycles took the same amount of time.
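To put rough numbers on that, here is a small sketch of the commonly documented SNES access-speed map, compared against the fixed 6-clock internal cycle. The function names `access_clocks` and `dummy_read_penalty` are made up for this illustration, and the "dummy re-read" is purely hypothetical, since the code above suggests the SNES CPU uses an internal cycle instead.

```cpp
#include <cassert>
#include <cstdint>

// Master clocks for one CPU bus access, following the commonly documented
// SNES memory speed map. fast_rom models the FastROM setting for banks $80+.
unsigned access_clocks(uint32_t addr, bool fast_rom) {
  assert(addr <= 0xffffff);
  unsigned bank = addr >> 16;
  unsigned offset = addr & 0xffff;
  unsigned rom = (bank >= 0x80 && fast_rom) ? 6 : 8;

  if((bank & 0x7f) >= 0x40) return rom;  // banks $40-$7f and $c0-$ff
  if(offset >= 0x8000) return rom;       // ROM half of banks $00-$3f / $80-$bf
  if(offset < 0x2000) return 8;          // low WRAM mirror
  if(offset < 0x4000) return 6;          // PPU, APU and WRAM port registers
  if(offset < 0x4200) return 12;         // old-style joypad registers
  if(offset < 0x6000) return 6;          // CPU registers and DMA
  return 8;                              // $6000-$7fff expansion area
}

constexpr unsigned kIoClocks = 6;  // internal cycles always take 6 master clocks

// Extra clocks a hypothetical NES-style dummy re-read of data_addr would
// cost compared with the internal cycle the SNES CPU is believed to use.
unsigned dummy_read_penalty(uint32_t data_addr, bool fast_rom) {
  return access_clocks(data_addr, fast_rom) - kIoClocks;
}
```

Under these assumptions a repeated read of $7e0000 would cost 2 clocks more than the internal cycle, and one of $004016 would cost 6 more, while a repeated read of $2180 would cost nothing extra.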
blargg
Regular
Posts: 327
Joined: Thu Jun 30, 2005 1:54 pm
Location: USA

Post by blargg »

Even with 6502 emulation mode enabled?
byuu

Post by byuu »

Even with 6502 emulation mode enabled?
According to everything I've ever read on the subject, yes. But I'll admit that I've never tested this myself.

But that sounds like an easy enough test ...

Code: Select all

sec; xce
stz $2181; stz $2182; stz $2183
lda #$01; tax                      // initialize once, outside the loop
-; sta $7e0000,x; inc; inx; cpx #$08; bcc -
ldx #$00                           // x = 0 so the indexed access hits $2180 itself
inc $2180,x
ldx #$00
-; lda $7e0000,x; sta $700000,x; inx; cpx #$08; bcc -
See what's in save RAM now.
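
What the dump should look like under each hypothesis can be worked out ahead of time with a toy model of the $2180 port. This is a sketch only: `WramPort` and `simulate_inc_2180` are invented names, the fill pattern is my own choice (picked so the two outcomes are easy to tell apart, not the one used in the test above), and the assumption that a dummy-reading CPU would operate on the most recently read byte is a guess.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy model of the WRAM data port: every read and every write through
// $2180 advances the internal address by one.
struct WramPort {
  std::array<uint8_t, 8> wram{};
  unsigned addr = 0;  // stands in for the $2181-$2183 pointer

  uint8_t read() {
    assert(addr < wram.size());
    return wram[addr++];
  }
  void write(uint8_t value) {
    assert(addr < wram.size());
    wram[addr++] = value;
  }
};

// Models "inc $2180" with the port address starting at 0 and returns the
// resulting WRAM contents. dummy_read selects the NES-style hypothesis in
// which the operand is read twice before the write.
std::array<uint8_t, 8> simulate_inc_2180(bool dummy_read) {
  WramPort port;
  port.wram = {0x10, 0x20, 0x30, 0x40, 0x50, 0x60, 0x70, 0x80};  // assumed pattern
  uint8_t value = port.read();
  if(dummy_read) value = port.read();  // assumption: the last byte read is used
  port.write(static_cast<uint8_t>(value + 1));
  return port.wram;
}
```

Without a dummy read the second byte changes (0x20 becomes 0x11); with one, the second byte is untouched and the third changes instead (0x30 becomes 0x21).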

I'm too burned out to drag out the copier. Anyone else want to give it a try? :)

EDIT: index.
Last edited by byuu on Thu Apr 10, 2008 4:01 pm, edited 1 time in total.
blargg
Regular
Posts: 327
Joined: Thu Jun 30, 2005 1:54 pm
Location: USA

Post by blargg »

It's absolute indexed, not just plain absolute. But anyway, I tried it and I don't see any dummy reads, even in emulation mode. Nice to know about the WRAM register; useful for tests like this. Also pretty useful in general, as an extra auto-indexing memory pointer.
mozz
Hazed
Posts: 56
Joined: Mon Oct 10, 2005 3:12 pm
Location: Montreal, QC

Post by mozz »

byuu wrote:Final test complete. We nailed it!
Awesome work. I look forward to reading your article about this. I find your thoroughness and uncompromising attention to detail on bsnes to be an inspirational example, and the fact that this has been solved would prove the merits of your approach even if bsnes had not already proved it in spades. =)
byuu wrote: Anyway, the best part about this is how easy it is to emulate. There's no need for cycle counting, lookup tables, or any nonsense like that.

Instead, we just change the opcodes themselves.

CLC changes from:

Code: Select all

void op_clc() {
  last_cycle();
  op_io();
  regs.p.c = 0;
}
To:

Code: Select all

void op_clc() {
  last_cycle();
  if(event.irq) op_read(regs.pc.d);
  else op_io();
  regs.p.c = 0;
}
Do this for each of the ~20 or so immediate opcodes, and the effect is fully emulated. Best of all, this isn't just a theory to match observed results: hundreds of tests have narrowed down and ruled out every other possibility.

So yeah, case closed. Thank you everyone for the help.
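
For a feel of the size of the effect, the changed opcode can be reduced to a clock-counting toy. Everything here is an illustrative assumption rather than bsnes code: `ToyCpu`, `pc_read_clocks` and `clocks_to_ns` are invented names, only the final cycle of the opcode is counted, and the program is assumed to run from an 8-clock memory region.

```cpp
#include <cassert>

constexpr double kMasterClockHz = 21477272.0;  // NTSC master clock, approximately

struct ToyCpu {
  unsigned clocks = 0;          // master clocks spent in the final opcode cycle
  bool irq_pending = false;     // stands in for event.irq
  bool carry = true;
  unsigned pc_read_clocks = 8;  // assumed: code executes from a slow region

  void last_cycle() {}          // interrupt polling is left out of this sketch
  void op_io() { clocks += 6; }                     // internal cycle
  void op_read_pc() { clocks += pc_read_clocks; }   // bus read at the current PC

  void op_clc() {
    last_cycle();
    if(irq_pending) op_read_pc();  // a pending IRQ turns the cycle into a read
    else op_io();
    carry = false;
  }
};

// Converts master clocks to nanoseconds.
double clocks_to_ns(unsigned clocks) {
  return clocks * 1e9 / kMasterClockHz;
}
```

With these assumptions the final cycle takes 6 clocks normally and 8 clocks when an IRQ is pending. The 2-clock difference is roughly 93 ns, which is in line with the "one ten millionth of a second" figure quoted earlier in the thread.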
I would like to throw out a possibility here... I might be completely wrong because I don't have the detailed SNES emulation knowledge that you guys have.

However...

I remember writing a post on the nesdev board a year or two ago about how the 6502 instructions are essentially pipelined with a pipeline depth of 2. In other words, the NES CPU accesses memory in every cycle, AND it does some internal operations in every cycle. Due to this, instructions actually overlap by one cycle---the final "execution" cycle of the opcode, and the first memory read of the next instruction (i.e. the fetch of the opcode itself) actually occur at the same time in real hardware:

Code: Select all

InsnA-Mem1
InsnA-Mem2   InsnA-Exec1
InsnA-Mem3   InsnA-Exec2
InsnB-Mem1   InsnA-Exec3    <-- the overlap cycle
InsnB-Mem2   InsnB-Exec1
...          InsnB-Exec2
However, since we are emulating things sequentially anyways, we find it convenient to pretend that the "execution" work of a cycle occurs at the same time as the "memory" work of the PREVIOUS cycle.

Code: Select all

What actually happens on    |    However, we find it convenient
hardware looks like this:   |    to emulate it like this:
                            |
...                         |    ...           
InsnA-Mem1                  |    --------------    
--------------              |    InsnA-Mem1
InsnA-Exec1     <-- (same   |    InsnA-Exec1   
InsnA-Mem2      <-- time)   |    --------------
--------------              |    InsnA-Mem2    
InsnA-Exec2                 |    InsnA-Exec2   
InsnA-Mem3                  |    --------------
--------------              |    InsnA-Mem3    
InsnA-Exec3                 |    InsnA-Exec3   
InsnB-Mem1                  |    --------------
--------------              |    InsnB-Mem1    
InsnB-Exec1                 |    InsnB-Exec1   
InsnB-Mem2                  |    --------------
--------------              |    InsnB-Mem2    
InsnB-Exec2                 |    InsnB-Exec2   
...                         |    --------------
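
The two columns above can also be generated mechanically, which makes the one-cycle shift explicit. This is just a sketch of the diagram, not of any emulator: `Step`, `overlapped` and `sequential` are names invented here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One step of an instruction, e.g. {"InsnA", 2} for InsnA-Mem2 / InsnA-Exec2.
struct Step {
  std::string insn;
  int n;
};

using Cycle = std::pair<std::string, std::string>;  // {bus work, internal work}

std::string mem_name(const Step& s) { return s.insn + "-Mem" + std::to_string(s.n); }
std::string exec_name(const Step& s) { return s.insn + "-Exec" + std::to_string(s.n); }

// Hardware-style view: cycle t does bus step t and, at the same time,
// the internal work that belongs to bus step t-1.
std::vector<Cycle> overlapped(const std::vector<Step>& steps) {
  std::vector<Cycle> out;
  for(std::size_t t = 0; t <= steps.size(); t++) {
    std::string mem = t < steps.size() ? mem_name(steps[t]) : "";
    std::string exec = t > 0 ? exec_name(steps[t - 1]) : "";
    out.push_back({mem, exec});
  }
  return out;
}

// Emulator-style view: each step does its bus work, then its own internal work.
std::vector<Cycle> sequential(const std::vector<Step>& steps) {
  std::vector<Cycle> out;
  for(const Step& s : steps) out.push_back({mem_name(s), exec_name(s)});
  return out;
}
```

Feeding both functions three steps of InsnA followed by two of InsnB shows the overlap: in the hardware-style view, the cycle that fetches InsnB-Mem1 is the same cycle that performs InsnA-Exec3.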

I'm wondering if the same thing is also true of the SNES CPU and the way that bsnes or other SNES emulators do the work of each cycle. (I haven't checked the source code of any of them, I'm at work and I don't have time)

If so, could it explain why you appear to have this memory access in the "last cycle" of the opcodes, which is affected by the way the IRQ works? I.e. is it the case that on real hardware, the "first" cycle of the IRQ is the same cycle in which the "last" cycle's worth of execution is done for the previous instruction?

(After thinking about it for 10 seconds, it feels like it might be wrong... oh well, proceed to demolish this idea)
byuu

Post by byuu »

I'm wondering if the same thing is also true of the SNES CPU and the way that bsnes or other SNES emulators do the work of each cycle ... If so, could it explain why you appear to have this memory access in the "last cycle" of the opcodes, which is affected by the way the IRQ works?
The SNES is exactly the same as the NES. A two-stage pipeline is used.

I tried to think of a way to emulate this, and really couldn't come up with anything sensible. The main thing to keep in mind is that the work cycle and bus cycle do different things, so it doesn't matter if they don't execute at the same time.

The only time it matters is for IRQ testing. And there it's exactly as you surmise. The bus cycle has finished the opcode, but the last work cycle has not occurred. Hence, an opcode like sei will test IRQs before the last cycle that sets the interrupt flag. So if the I flag was clear, an IRQ will trigger.

The way bsnes emulates this is to stick last_cycle() before the last work cycle of each opcode. I'm not sure about other emulators. It's only needed for one or two games, so some may choose to omit this entirely.
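
The sei case can be shown with a minimal sketch. `ToyCpu`, `irq_line`, `irq_taken` and `op_sei_naive` are names invented for this example, and the real bsnes logic is more involved than this; the point is only the ordering of the poll relative to the flag write.

```cpp
#include <cassert>

struct ToyCpu {
  bool irq_line = false;   // an IRQ is being requested
  bool flag_i = false;     // interrupt-disable flag
  bool irq_taken = false;  // an IRQ will be serviced after this opcode

  // The interrupt poll sees the I flag as it was before the final work cycle.
  void last_cycle() { irq_taken = irq_line && !flag_i; }
  void op_io() {}          // timing is ignored in this sketch

  // Ordering described above: poll first, then do the work of the last cycle.
  void op_sei() {
    last_cycle();
    op_io();
    flag_i = true;
  }

  // Naive ordering, for contrast: the flag is set before the poll.
  void op_sei_naive() {
    op_io();
    flag_i = true;
    last_cycle();
  }
};
```

With `irq_line` asserted and the I flag clear, `op_sei()` still leaves `irq_taken` set, so the IRQ fires once even though sei has just executed. The naive ordering masks it.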
Last edited by byuu on Sat Apr 26, 2008 12:12 am, edited 1 time in total.
mozz
Hazed
Posts: 56
Joined: Mon Oct 10, 2005 3:12 pm
Location: Montreal, QC

Post by mozz »

byuu wrote:The SNES is exactly the same as the NES. A two-stage pipeline is used.

I tried to think of a way to emulate this, and really couldn't come up with anything sensible. The main thing to keep in mind is that the work cycle and bus cycle do different things, so it doesn't matter if they don't execute at the same time.

The only time it matters is for IRQ testing. And there it's exactly as you surmise. The bus cycle has finished the opcode, but the last work cycle has not occurred. Hence, an opcode like sei will test IRQs before the last cycle that sets the interrupt flag. So if the I flag was clear, an IRQ will trigger.

The way bsnes emulates this is to stick last_cycle() before the last work cycle of each opcode. I'm not sure about other emulators. It's only needed for one or two games, so some may choose to omit this entirely.
So just to make sure I understand this... You use last_cycle() to handle the testing of the IRQ flag at the right time, and the new code pattern, "if(event.irq) op_read(regs.pc.d); else op_io();" is to handle the fact that the first cycle of the IRQ reads memory.

And the reason this "special IRQ read logic" needs to be mixed into the opcode emulation is that on real hardware, they overlap for 1 cycle due to the 2-stage pipeline. And most of the time the next thing to happen (fetch next opcode) is totally predictable, but this IRQ case is one where it's different from the predicted thing.

[Edit: okay, your last post contains the answers to my questions, I just didn't read it carefully enough.]

Anyway, I'm super impressed that you got to the bottom of this whole thing. Good sleuthing. 8)