Overload wrote:I also want to add in that cycle delay before MDMA starts.
...You want to add in the cycle delay before 3,4-methylenedioxymethamphetamine starts? o_O
Or perhaps you have a dvorak keyboard and/or were meaning HDMA? :)
That should be simple enough, HDMA has an 18 cycle pause at the start (if any channels are active), and 8 cycles for each active channel, as well as 16 additional cycles if that channel uses indirect mode. Info courtesy of anomie.
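For what it's worth, that rule is small enough to sketch in C. The function name and the two bitmask parameters are made up for illustration, and "cycles" is in whatever unit the figures above use:

```c
#include <stdint.h>

/* Hypothetical helper: HDMA overhead for one scanline, using the figures
   quoted above.
   active_mask:   bit n set = channel n is active
   indirect_mask: bit n set = channel n uses indirect mode */
unsigned hdma_overhead_cycles(uint8_t active_mask, uint8_t indirect_mask)
{
    if (active_mask == 0)
        return 0;                 /* no active channels: no pause at all */

    unsigned cycles = 18;         /* fixed pause at the start */
    for (int ch = 0; ch < 8; ch++) {
        if (active_mask & (1u << ch)) {
            cycles += 8;          /* every active channel */
            if (indirect_mask & (1u << ch))
                cycles += 16;     /* extra cost for indirect mode */
        }
    }
    return cycles;
}
```

With channels 0 and 1 active and only channel 1 indirect, this gives 18 + 8 + 8 + 16 = 50.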
How did you figure this? Actual tests, or just theory?
Actual tests. Set up the SNES at first power-on with:
clc : xce : rep #$20 : pea $2100 : pld
Now notice that if you use lda $37, you get latch value n.
Now if you use lda $2137, you get latch value n + 2.
If you use asl $2137, you also get latch value n + 2.
If you use lda $2136, you get latch value n + 4.
lda $2137 has more cycles than lda $37, so therefore it should have a higher latch value, which it does.
But at the same time, asl $2137 has more cycles than lda $2137 (because it's an RMW opcode), and yet they share the same latch value. That's because the read of $2137 happens at the same cycle within the opcode. lda $37 has one less memory read cycle (operand fetch) than lda $2137, which explains why its latch value is lower.
lda $2137 latches a lower value than lda $2136 because $2137 itself is read sooner in the opcode.
To further prove this, I have about 6-8 opcodes tested in my previous post near the bottom. Using my theory, the numbers match up to an SNES every time with direct page, absolute, indirect, indirect long, and absolute RMW opcodes.
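The pattern in those four results can be written down as a tiny model. This is only a sketch: it assumes the usual 65816 memory cycle order with a 16-bit accumulator and DL = 0 (opcode fetch, operand bytes, then low and high data reads), and the "2 latch units per cycle" factor is simply read off the numbers above rather than measured separately. All names are made up.

```c
#include <stdint.h>

typedef struct {
    const char *name;
    int         operand_bytes; /* operand fetch cycles after the opcode fetch */
    uint16_t    first_read;    /* effective address of the first data read */
} OpModel;

/* The four opcodes from the test above, with D = $2100. */
const OpModel tested_ops[4] = {
    { "lda $37",   1, 0x2137 },
    { "lda $2137", 2, 0x2137 },
    { "asl $2137", 2, 0x2137 }, /* RMW: the first reads match lda's */
    { "lda $2136", 2, 0x2136 },
};

/* 0-based index of the first cycle that reads `target` (cycle 0 is the
   opcode fetch), or -1 if neither data read touches it. */
int latch_cycle_index(const OpModel *op, uint16_t target)
{
    int first_data_cycle = 1 + op->operand_bytes;
    for (int i = 0; i < 2; i++) {           /* low byte, then high byte */
        if ((uint16_t)(op->first_read + i) == target)
            return first_data_cycle + i;
    }
    return -1;
}

/* Latch value relative to `baseline`, assuming 2 latch units per cycle. */
int latch_offset(const OpModel *op, const OpModel *baseline)
{
    return 2 * (latch_cycle_index(op, 0x2137) -
                latch_cycle_index(baseline, 0x2137));
}
```

Taking lda $37 as the baseline (n), the model gives offsets of 0, 2, 2 and 4 for the four opcodes, which is the same as the measurements.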
Yes, it is sometime during that cycle where the latch occurs. But there is nothing saying that the latch actually has to occur at the beginning of that cycle.
Nope. It could be anywhere from the very start of that cycle, to the very end of it. As far as I know, there's no way to find that out without knowing -exactly- what cycle the SNES starts on, or by using an oscillator as you stated. I don't even know what an oscillator/multimeter does, so I can't help you out there :)
Ok, one possible test, but it's a long shot. Set up two test ROMs that will begin the (final) $2137 read at a known cycle count. One should use LDA $2137, and one should use DMA. If it's really the end of the read cycle, the second will latch .5 dot later. OTOH, it could just be always cycle 6 and then both would be the same.
I see where you're going with this. A difficult test, but doable.
Nope. You have to update the cycle counter every CPU cycle, but you pretty much have to do this anyway with the 6-or-8-or-12-master-cycles-per-CPU-cycle deal.
Before, I had a routine that would decode any addressing type, and store the indirect and direct addresses, as well as set flags depending on the various conditions (DL!=0, bank boundary cross, etc.), at the start of the opcode. I'd then perform the entire opcode, then use a routine that would add the specified number of direct and indirect memory cycles to the cycle counter.
You only have to calculate the dot position when you read $2137 (or do the trick with $4201).
Not really, most opcodes will need it. For example, when reading from $4210/$4212, hblank status could change in the middle of the opcode. This is my theory with $4210 bit 7: if you read from $4210 and that read happens in the middle of the opcode, it returns $4210 bit 7 as being set. This then clears the NMI status bit, but then at the end of the opcode, it's set again automatically. The NMI is pulled low, and $4210 bit 7 (inverted NMI pin) is high again. This is all untested theory.
This could also be needed for registers like $2118. If you leave vblank in the middle of the opcode, the write might get rejected by the PPU. Again, untested but possible.
It doesn't seem too bad to just update the counter at the end of every opcode, and once during MMIO reads/writes ($2000-$5fff).
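A minimal sketch of that approach, assuming a hypothetical `pending` count of cycles that have elapsed inside the current opcode but have not yet been added to the global counter. The names are made up, `sync_count` exists only to make the behaviour visible, and the bank check is left out so that only the $2000-$5fff offset range is tested. Whether a register access should see the counter at the start or the end of its own cycle is exactly the open question in this thread; the sketch arbitrarily picks the start.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t counter;    /* cycles the rest of the system has seen */
    uint32_t pending;    /* cycles spent in this opcode, not yet committed */
    unsigned sync_count; /* number of syncs, for illustration only */
} Timing;

static bool is_mmio(uint16_t addr)
{
    return addr >= 0x2000 && addr <= 0x5fff;
}

static void timing_sync(Timing *t)
{
    t->counter += t->pending;
    t->pending = 0;
    t->sync_count++;
}

/* Call once per memory cycle with that cycle's length (6, 8 or 12). */
void timing_access(Timing *t, uint16_t addr, unsigned cycle_len)
{
    if (is_mmio(addr))
        timing_sync(t);      /* counter is current before the register is touched */
    t->pending += cycle_len; /* this cycle is committed at the next sync */
}

/* Call once when the opcode has finished. */
void timing_end_opcode(Timing *t)
{
    timing_sync(t);
}
```

An opcode that only touches ROM or WRAM syncs once, at the end; one that touches a register syncs one extra time per register access.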
Note that IRQs don't need the current dot position, you just have to know which cycle corresponds to where it's aimed in order to trigger it.
You mean by storing the cycle it should stop on in advance... that could be tricky because you'd have to calculate all the complexities like the 1360-cycle line, the longer dots, and the extra line in interlace mode, and hope the user doesn't change any settings that would affect this before the IRQ was invoked. I think it'd be easier to just compare it against the current dot position.
Sure you can. For example, any read to a ROM area or WRAM will very likely have no difference.
Yeah, but then you get into a bunch of nasty if/else structs.
if(memory_area(addr) == MEMAREA_ROM || memory_area(addr) == MEMAREA_WRAM) {
  do_word_opcode();
} else { //MEMAREA_MMIO
  do_byte_opcode();
  do_byte_opcode();
}
It would only be a few x86 clock cycles more to just always do the latter. Especially if do_byte_opcode is a define and not a function, something I avoid in order to keep executable size down. Not to mention it reduces code complexity.
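For reference, a sketch of what "always do the latter" could look like for a 16-bit read: two byte reads, each followed by its own cycle counter update, with no check on the memory area. `read_byte`, `add_cycles` and the flat `mem` array are stand-ins for illustration, not code from any real emulator.

```c
#include <stdint.h>

static uint8_t  mem[0x10000];  /* stand-in for the real bus */
static uint32_t cycle_counter; /* stand-in for the global cycle counter */

static uint8_t read_byte(uint16_t addr)
{
    return mem[addr];
}

static void add_cycles(unsigned n)
{
    cycle_counter += n;
}

/* Always split a word read into two byte reads, so the counter is
   correct at the moment each byte is actually fetched. */
uint16_t read_word(uint16_t addr, unsigned cycle_len)
{
    uint16_t lo = read_byte(addr);
    add_cycles(cycle_len);
    uint16_t hi = read_byte((uint16_t)(addr + 1));
    add_cycles(cycle_len);
    return (uint16_t)(lo | (hi << 8));
}
```

The same path then serves ROM, WRAM and MMIO alike, at the cost of one extra counter update per word.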
I have no idea. At one point I thought the IRQ delay was ~5.5 dots, but with your new testing method I'll be better able to re-measure it just like I did NMI.
I'd recommend waiting until I have this new cycle counter done. We could time things easier by just changing the NMI/IRQ start positions slightly and finding the nearest 2-master-cycle variance where things like $421x bits change.
I realize my current implementation of latching counters mid-opcode is not perfect to a real SNES for the aforementioned reasons, but if the results always match a real SNES perfectly, then it doesn't really matter, does it? I think the best way to find out where within that one cycle the counter is latched, would be to try writing to registers that can only be written to during vblank ($2118, ...) right near the transition between outside to inside vblank, and then reading that value and printing a message saying whether the write succeeded. And again, as long as that result matches in emulation to SNES hardware, it shouldn't matter if it isn't really correct, right?
The thing that really scares me is perfect PPU emulation. Now that's something that's really going to require some CPU power.
The PPU renders dot-by-dot, and this most likely would be needed to implement things like writing to $2100 while in the middle of the scanline.
I think it is possible, though. You would have to make the PPU check its dot position after every single cycle passed, and draw a new dot when needed. You'd obviously resort to using addition and static counters to move along each line, but my god is that going to be tough. I'll worry about that much, much later :)
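Very roughly, the counter side of that might look like the sketch below. It assumes a flat 4 cycles per dot and 340 dots per line, so it ignores the longer dots and the 1360-cycle line mentioned earlier, and `ppu_draw_dot` is a placeholder that only counts dots instead of rendering anything. All names are hypothetical.

```c
#define CYCLES_PER_DOT 4u   /* simplifying assumption: no longer dots */
#define DOTS_PER_LINE  340u /* simplifying assumption: every line is the same */

typedef struct {
    unsigned      pending;    /* cycles not yet turned into dots */
    unsigned      dot;        /* current horizontal position */
    unsigned      line;       /* current scanline */
    unsigned long dots_drawn; /* total dots produced, for illustration */
} Ppu;

static void ppu_draw_dot(Ppu *p)
{
    /* A real renderer would compute one pixel here from the register
       state at this exact moment (e.g. a mid-scanline $2100 write). */
    p->dots_drawn++;
    if (++p->dot == DOTS_PER_LINE) {
        p->dot = 0;
        p->line++;
    }
}

/* Call after every CPU cycle with that cycle's length. */
void ppu_add_cycles(Ppu *p, unsigned cycles)
{
    p->pending += cycles;
    while (p->pending >= CYCLES_PER_DOT) {
        p->pending -= CYCLES_PER_DOT;
        ppu_draw_dot(p);
    }
}
```

Only additions, subtractions and compares run per cycle; the expensive part would be whatever ends up inside `ppu_draw_dot`.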
Lastly, I have a question for Overload: I see that you're interested in adding sound support to Super Sleuth at some point. I'm aware you already have the spc700 (and dsp?) implemented, but just don't do the actual sound sample decoding and output.
I personally don't really know much (anything at all) about how writing sound to a sound card would work, and I don't know how adept you are at that. I could definitely use someone's assistance in implementing this, so would you possibly consider working with me side-by-side in trying to figure out how to go about adding real sound output to an emulator? We could add sound to both at the same time. No big deal if not, it can't be that hard... I know your emulator is written in Delphi, that shouldn't be a problem if we use APIs.
Also, it's good to hear you're planning on implementing these findings into Super Sleuth :D I do recommend waiting until we finish our testing though, as our results / findings seem to change daily, heh.