Whenever an NMI or IRQ triggers, some basic code executes to enter the interrupt (back up PC/P, load the vector, etc.).
The W65C816S datasheet notes that the first two cycles are I/O. We know that is false: an I/O cycle takes 6 clocks, so two of them would be 12.
We've always observed it to be 14 clocks, until today. Here's the basic routine:
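    void sCPU::op_irq() {
      op_read(regs.pc.d);                    // cycle 1: read at the current PC, result discarded
      op_io();                               // cycle 2: internal (I/O) cycle
      if(!regs.e) op_writestack(regs.pc.b);  // native mode only: push the program bank
      op_writestack(regs.pc.h);
      op_writestack(regs.pc.l);
      op_writestack(regs.e ? (regs.p & ~0x10) : regs.p);  // emulation mode pushes P with bit 4 (B) cleared
      rd.l = op_read(event.irq_vector + 0);
      regs.pc.b = 0x00;
      regs.p.i  = 1;
      regs.p.d  = 0;
      rd.h = op_read(event.irq_vector + 1);
      regs.pc.w = rd.w;
    }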
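To make the arithmetic explicit, here is a minimal sketch of the two readings of those first two cycles. The 6-clock I/O cost is from the text above; the 8-clock cost for a read at PC is an assumption that the code is running from a slow memory region, which is what the opfetch timings in the traces suggest. The names are hypothetical and not part of the emulator.

```cpp
#include <cassert>

// Assumed master-clock costs: 6 for an I/O cycle (stated above),
// 8 for an access to a slow region such as ROM at $008100.
constexpr int kIoClocks   = 6;
constexpr int kSlowClocks = 8;

// The datasheet wording taken literally: two I/O cycles.
constexpr int two_io_cycles() { return kIoClocks + kIoClocks; }

// What op_irq() does first: op_read(regs.pc.d), then op_io().
constexpr int read_then_io() { return kSlowClocks + kIoClocks; }

// Hypothetical helper that checks both totals against the post.
inline void check_irq_entry_arithmetic() {
    assert(two_io_cycles() == 12);  // what the datasheet would imply
    assert(read_then_io() == 14);   // what had always been observed
}
```

Neither reading produces 16, which is the case the rest of this post is about.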
However, when writing up some starter tests for Mecarobot, I hit some edge cases where the first two cycles of the IRQ were taking 16 clocks instead of 14.
I ruled out everything I could: it's not related to the FastROM setting, it isn't related to the speed of the most recent address on the bus, etc.
Here are my notes so far:
nop = 14 clocks [opfetch + i/o]
bit = 30 clocks [opfetch + aal + aah + read]
bpl = 22 clocks [opfetch + relfetch + i/o]
Cycle  VDA  VPA  RWB  AddrBus   DataBus
nop.0  1    1    1    PBR,PC    Opcode
nop.1  0    0    1    PBR,PC+1  I/O
-----
bit.0  1    1    1    PBR,PC    Opcode
bit.1  0    1    1    PBR,PC+1  AAL
bit.2  0    1    1    PBR,PC+2  AAH
bit.3  1    0    1/0  DBR,AA    Data Low
-----
bpl.0  1    1    1    PBR,PC    Opcode
bpl.1  0    1    1    PBR,PC+1  Offset
bpl.2  0    0    1    PBR,PC+1  I/O
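The per-opcode totals follow from adding up the cycles in the table. The sketch below does that sum, assuming 8 clocks for each fetch from the slow region the test code runs in and 6 clocks for both I/O cycles and the $4212 read. The enum and function names are made up for illustration.

```cpp
#include <cassert>
#include <initializer_list>

// Kinds of bus cycle that appear in the table above.
enum class Cycle { SlowFetch, FastRead, Io };

// Assumed master-clock cost of each kind of cycle.
inline int clocks(Cycle c) {
    return c == Cycle::SlowFetch ? 8 : 6;
}

// Total clocks for a sequence of cycles.
inline int total(std::initializer_list<Cycle> cycles) {
    int sum = 0;
    for (Cycle c : cycles) sum += clocks(c);
    return sum;
}

// nop: opfetch + i/o
inline int nop_clocks() {
    return total({Cycle::SlowFetch, Cycle::Io});
}

// bit $4212: opfetch + aal + aah + read of the fast register
inline int bit_4212_clocks() {
    return total({Cycle::SlowFetch, Cycle::SlowFetch, Cycle::SlowFetch, Cycle::FastRead});
}

// bpl: opfetch + relfetch + i/o
inline int bpl_clocks() {
    return total({Cycle::SlowFetch, Cycle::SlowFetch, Cycle::Io});
}
```

Under these assumptions the totals come out to 14, 30 and 22 clocks. The 22 for BPL agrees with the traces, where a BPL that fetches at H=16 is followed by the IRQ at H=38.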
HTIME latching results:
VTIME  Emulation  Hardware
20 =   24/6f      24/70     Slow
21 =   23/6e      23/6f     Slow
22 =   29/75      29/75     Fast
23 =   28/74      28/74     Fast
24 =   27/73      27/73     Fast
25 =   26/72      26/72     Fast
26 =   25/71      25/71     Fast
27 =   24/70      24/70     Fast
28 =   23/6f      23/6f     Fast
29 =   22/6e      22/6e     Fast
2a =   27/72      27/72     Fast
2b =   26/71      26/71     Fast
2c =   25/70      25/70     Fast
2d =   24/6f      24/6f     Fast
2e =   23/6e      23/6e     Fast
2f =   25/71      26/71     Slow
20 = slow
008104 bpl $8100 [$008100] A:0020 X:8000 Y:0000 S:01ff D:0000 DB:00 nVMxdiZC V: 31 H:1354
1354 = opfetch
1362 = rel fetch
6 = i/o
* IRQ transition @ 32, 10
008100 nop A:0020 X:8000 Y:0000 S:01ff D:0000 DB:00 nVMxdiZC V: 32 H: 12
12 = opfetch
20 = i/o
* IRQ @ 32, 26
21 = slow
008100 nop A:0021 X:8000 Y:0000 S:01ff D:0000 DB:00 nVMxdiZc V: 33 H: 8
8 = opfetch
16 = i/o
* IRQ transition @ 33, 10
* IRQ @ 33, 22
22 = fast
008100 nop A:0022 X:8000 Y:0000 S:01ff D:0000 DB:00 nVMxdizc V: 34 H: 4
4 = opfetch
12 = i/o
* IRQ transition @ 34, 10
008101 bit $4212 [$004212] A:0022 X:8000 Y:0000 S:01ff D:0000 DB:00 nVMxdizc V: 34 H: 18
18 = opfetch
26 = $12 fetch
34 = $42 fetch
42 = $4212 read
* IRQ @ 34, 48
23 = fast
008100 nop A:0023 X:8000 Y:0000 S:01ff D:0000 DB:00 nVMxdizc V: 35 H: 0
0 = opfetch
8 = i/o
* IRQ transition @ 35, 10
008101 bit $4212 [$004212] A:0023 X:8000 Y:0000 S:01ff D:0000 DB:00 nVMxdizc V: 35 H: 14
14 = opfetch
22 = $12 fetch
30 = $42 fetch
38 = $4212 read
* IRQ @ 35, 44
2a = fast
008101 bit $4212 [$004212] A:002a X:8000 Y:0000 S:01ff D:0000 DB:00 nVMxdizc V: 41 H:1350
1350 = opfetch
1358 = $12 fetch
2 = $42 fetch
10 = $4212 read
* IRQ transition @ 42, 10
008104 bpl $8100 [$008100] A:002a X:8000 Y:0000 S:01ff D:0000 DB:00 nvMxdizc V: 42 H: 16
16 = opfetch
24 = rel fetch
32 = i/o
* IRQ @ 42, 38
2f = slow
008104 bpl $8100 [$008100] A:002f X:8000 Y:0000 S:01ff D:0000 DB:00 nVMxdizc V: 46 H:1360
1360 = opfetch
4 = rel fetch
12 = i/o
* IRQ transition @ 47, 10
008100 nop A:002f X:8000 Y:0000 S:01ff D:0000 DB:00 nVMxdizc V: 47 H: 18
18 = opfetch
26 = i/o
* IRQ @ 47, 32
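When reading the traces, keep in mind that H wraps to the next scanline partway through some opcodes (for example 1362 = rel fetch followed by 6 = i/o). The sketch below reproduces that bookkeeping, assuming 1364 clocks per scanline, which is the value the wrapped entries in the traces imply. It ignores V wrapping at the end of a frame and any scanlines of a different length, and the helper names are hypothetical.

```cpp
#include <cassert>

// Assumed scanline length in master clocks (inferred from the traces:
// 1362 + 8 wraps to 6, 1358 + 8 wraps to 2, 1360 + 8 wraps to 4).
constexpr int kClocksPerLine = 1364;

// A point in time as shown in the traces: V counter and H counter.
struct Position {
    int v;
    int h;
};

// Step forward by a number of clocks, carrying into V when H wraps.
inline Position advance(Position p, int clocks) {
    int h = p.h + clocks;
    return Position{p.v + h / kClocksPerLine, h % kClocksPerLine};
}

// Clocks elapsed between two positions (to must not be before from).
inline int elapsed(Position from, Position to) {
    return (to.v - from.v) * kClocksPerLine + (to.h - from.h);
}

// Hypothetical check against the VTIME=20 trace: bpl starts at V=31,
// H=1354 and the following nop starts at V=32, H=12.
inline void check_trace_wrap() {
    Position io = advance(Position{31, 1362}, 8);
    assert(io.v == 32 && io.h == 6);
    assert(elapsed(Position{31, 1354}, Position{32, 12}) == 22);
}
```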
VTIME  Transition      Trigger  TriggerAddr  TransOp  Delay
20     i/o->opfetch    i/o      $008100      nop      8+8 (16)
21     opfetch->i/o    i/o      $008100      nop      8+8 (16)
22     opfetch->i/o    read     $004212      bit      8+6 (14)
23     i/o->opfetch    read     $004212      bit      8+6 (14)
2a     r4212->opfetch  i/o      $008105      bpl      8+6 (14)
2f     relfetch->i/o   i/o      $008100      nop      8+8 (16)
So whenever the IRQ begins after NOP, the first two cycles take 16 clocks, but whenever it begins after either BPL or BIT, they take 14 clocks.
This doesn't make a damn bit of sense, even taking the pipeline into consideration.
If we assumed it was related to the second-to-last cycle, NOP's would indeed be 8, and we could get 8+8 that way. But the second-to-last cycle of both BIT and BPL is a slow memory region fetch as well, so that doesn't work.
If we look at the very last cycle, NOP is an I/O cycle, but so is BPL; and BIT's last cycle reads from a fast memory region ($4212.) Both take 6 clock cycles.
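The two guesses above can be checked mechanically against the three opcodes. This is only a sketch: it assumes the varying term in the Delay column (the second number in 8+8 or 8+6) is the quantity to predict, and it uses per-cycle costs of 8 for slow fetches and 6 for I/O and the $4212 read. All names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One opcode that can precede the IRQ, with its per-cycle clock costs and
// the second term of the observed delay (8 for "8+8", 6 for "8+6").
struct Opcode {
    std::string name;
    std::vector<int> cycle_clocks;
    int observed;
};

inline std::vector<Opcode> observations() {
    return {
        {"nop", {8, 6}, 8},
        {"bit", {8, 8, 8, 6}, 6},
        {"bpl", {8, 8, 6}, 6},
    };
}

// Guess A: the extra time mirrors the second-to-last cycle of the opcode.
inline int second_to_last_cycle(const Opcode& op) {
    return op.cycle_clocks[op.cycle_clocks.size() - 2];
}

// Guess B: the extra time mirrors the very last cycle of the opcode.
inline int last_cycle(const Opcode& op) {
    return op.cycle_clocks.back();
}

// Count how many of the three observations a guess gets right.
template <typename Predict>
int count_matches(Predict predict) {
    int matches = 0;
    for (const Opcode& op : observations()) {
        if (predict(op) == op.observed) ++matches;
    }
    return matches;
}
```

With these inputs, guess A matches only NOP, and guess B matches BIT and BPL but not NOP, so neither accounts for all three cases.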
Note that I'm fairly confident it's the last opcode before entering the IRQ that is causing this issue. Both CLI and NOP end up with 16 clocks (I used CLI in my Mecarobot test and it was throwing the timing off; not enough to be the cause of the bug, but it's a start), and I have other tests. All of my other IRQ latch timing tests use - lda $4212 : bpl - or - bit $4212 : bpl -. This is why I always got perfect timing with them, of course: I went on the assumption that the time was always 14 clocks, and with those opcodes it was.
So, does anyone have any ideas why NOP/CLI take 16 clocks inside the IRQ, yet BIT/BPL take 14 clocks? Have any low-level NES emu authors run into this issue before?