Latch timing

Nach
ZSNES Developer
Posts: 3904
Joined: Tue Jul 27, 2004 10:54 pm
Location: Solar powered park bench

Post by Nach »

byuusan wrote: Dr. Dobbs recently wrote a nice document about how processor speeds are starting to reach their theoretical limits. Anyone notice how the 2GHz processor was released in 2001 or so? We're still not at 4GHz.
I don't buy it.

There's more to CPU speed than its clock speed.

They'll also get around the limits eventually when they realize something. Remember when net over phone lines was never expected to break 56K?
anomie
Lurker
Posts: 151
Joined: Tue Dec 07, 2004 1:40 am

Post by anomie »

byuusan wrote:The good news is that I figured out how the opcode cycles factor into the latch positions returned by $2137, the bad news is that it actually happens in the middle of the opcode.
How did you figure this? Actual tests, or just theory?
The fourth cycle is where $2137 is actually read. The counter is latched right here. So you have cycle pos + 3 memory cycles (at pbr,pc (+1, +2)).
Yes, it is sometime during that cycle where the latch occurs. But there is nothing saying that the latch actually has to occur at the beginning of that cycle. And the problem is, I can think of no software test which will discriminate the two. The CPU can start at any cycle 130-136 and latch during the appropriate master cycle of the read to give us dot 212. Now, if someone can hook up a counting device to the master clock and to the PPU's /RESET and the CPU's /RESET and count the clock cycles difference between the two...

Ok, one possible test, but it's a long shot. Set up two test ROMS that will begin the (final) $2137 read at a known cycle count. One should use LDA $2137, and one should use DMA. If it's really the end of the read cycle, the second will latch .5 dot later. OTOH, it could just be always cycle 6 and then both would be the same.
How could the latch know where it would be after the latch register itself was read? I doubt the SNES would update the cycle position before actually reading $2137.)
There's no real "cycle position" as far as the SNES is concerned. When PPU2 decodes a read for address $37 (or the latch pin does the appropriate transition, BTW), it activates the latch circuit to read the current value of the dot counter. The dot counter is being updated continuously anyway, so the PPU knows when to output the color burst, when to output the blanking signals, and everything else.
This works exactly like you'd expect for indirect, R-M-W opcodes, etc.
The actual cycle that reads from (not writes to) $2137 (whether it be direct, indirect, or whatever) is where the latch occurs.
Good to have confirmation on that.
Keep in mind that you would have to recalculate the x/y dot positions, along with all of their quirks (longer/shorter dots, missing scanlines, etc) after each cycle. Or at the very least, after each opcode and before each MMIO read/write.
Nope. You have to update the cycle counter every CPU cycle, but you pretty much have to do this anyway with the 6-or-8-or-12-master-cycles-per-CPU-cycle deal. Maybe some opcodes will be slightly slower, since you may have to add 6 twice instead of just adding 12 once or something. You only have to calculate the dot position when you read $2137 (or do the trick with $4201). Note that IRQs don't need the current dot position, you just have to know which cycle corresponds to where it's aimed in order to trigger it.
Notice how I can't even perform a word-read/write without breaking the timing.
Sure you can. For example, any read to a ROM area or WRAM will very likely have no difference.
nor any offsets in the actual latch value returned (I remember someone saying that the latch position is off by 5.5 dots or so?)
I have no idea. At one point I thought the IRQ delay was ~5.5 dots, but with your new testing method I'll be better able to re-measure it just like I did NMI.
Overload
Hazed
Posts: 70
Joined: Sat Sep 18, 2004 12:47 am
Location: Australia

Post by Overload »

byuusan wrote: With all that said, I'm mildly curious. Does anyone plan to implement any/all of these findings into their emulators? Or is this just strictly going to be for research/documentation purposes? I realize how unfeasible a lot of these quirks are to implement.
I was going to put exact cycle latching into Super Sleuth. At the moment I just add the total cycle count at the beginning of each opcode. I must have rewritten my CPU core at least 20 times in the last 6 years; fortunately it's becoming easier to add more features to. I also want to add in that cycle delay before MDMA starts.

I really don't think you need a super fast computer to run a SNES emulator. I would have to say that the PPU is the most CPU hungry part of my SNES emulator.
Dmog
Lurker
Posts: 192
Joined: Tue Aug 31, 2004 6:03 pm

Post by Dmog »

zidanax wrote:Personally, I think it would be nice to have an emulator that's extremely accurate, even if it's so slow it won't play at full speed on my computer.
Agreed. Besides, your average computer today is much faster than the average computer that was around when ZSNES was young.

Byuusan once wrote: "Requiring more than 1000 times the original processing power to attain full speed emulation is, to me a travesty"

Well, IMO, that's the nature of (good) emulation. Snes9x's current requirement is around 1 GHz (if you want 60/60 fps)... That's already 200 to 300 times the speed of the original processor. It's never as simple as just comparing the host processor and the virtual, emulated processor. It's not simply about clock frequency.
Dmog
Lurker
Posts: 192
Joined: Tue Aug 31, 2004 6:03 pm

Post by Dmog »

byuusan wrote:The problem is that it likely won't play at full speed on any computers for a long time.
So? MAME already emulates some games that can't run at full speed, or even half of that, on any PC right now. And aside from some lame kiddies, everyone understands that.
Dr. Dobbs recently wrote a nice document about how processor speeds are starting to reach their theoretical limits.
That's interesting, I'll check it out.
Also, this won't really fix any games. Maybe one or two stubborn ones. I would wager quite a bit that virtually no games in existence care about most of this stuff we're finding out.
I believe emulation should be first and foremost about emulating hardware. Software comes second. Games may or may not be affected by those things. Bottom line, if you're emulating those things, you're sure you're doing things right.
I have all of my findings saved, and I remember them in my mind as well, so don't worry about losing anything. I want to complete my understanding of something before trying to document it.


And people in the future will be grateful. Look at C64 emulation. Back when people realized they had to write cycle-exact emulators to make the last few things work right, they didn't have PCs fast enough to run them. Now, we do, and we don't have to worry about accuracy.
Indeed. Unless the PCs of tomorrow really won't have faster processor frequencies (or better processor architecture or whatever)... which, at this time, seems unlikely.
byuu

Post by byuu »

Overload wrote:I also want to add in that cycle delay before MDMA starts.
...You want to add in the cycle delay before 3,4-methylenedioxymethamphetamine starts? o_O
Or perhaps you have a dvorak keyboard and/or were meaning HDMA? :)
That should be simple enough, HDMA has an 18 cycle pause at the start (if any channels are active), and 8 cycles for each active channel, as well as 16 additional cycles if that channel uses indirect mode. Info courtesy of anomie.
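The overhead figures quoted above can be written down as a small helper. This is only a sketch of the arithmetic as stated in this post (18 cycles if any channel is active, 8 per active channel, 16 more per indirect channel); the function name and the two channel-mask parameters are made up for illustration and do not come from any emulator.

```c
#include <stdint.h>

/* Hypothetical helper: overhead in cycles for one HDMA run, using the
   figures quoted in the post above. 'active' and 'indirect' are assumed
   8-bit masks where bit n describes channel n. */
static int hdma_overhead(uint8_t active, uint8_t indirect)
{
    int cycles;

    if (active == 0)
        return 0;                      /* no active channels, no pause */

    cycles = 18;                       /* fixed pause at the start */
    for (int ch = 0; ch < 8; ch++) {
        if (active & (1u << ch)) {
            cycles += 8;               /* per active channel */
            if (indirect & (1u << ch))
                cycles += 16;          /* extra for indirect mode */
        }
    }
    return cycles;
}
```

With one direct channel this gives 26 cycles, and with one indirect channel 42.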
How did you figure this? Actual tests, or just theory?
Actual tests. Set up the SNES at first power-on with:
clc : xce : rep #$20 : pea $2100 : pld
Now notice if you use lda $37, you get latch value n.
Now if you use lda $2137, you get latch value n + 2.
If you use asl $2137, you also get latch value n + 2.
If you use lda $2136, you get latch value n + 4.
lda $2137 has more cycles than lda $37, so therefore it should have a higher latch value, which it does.
But then at the same time, asl $2137 has more cycles than lda $2137 (because it's an RMW opcode), and yet they share the same latch value. That's because the read happens at the same cycle during the opcode. lda $37 has one less memory read cycle (operand fetch) than lda $2137, which explains why its latch value is lower.
lda $2137 latches a lower value than lda $2136 because $2137 itself is read sooner in the opcode.
To further prove this, I have about 6-8 opcodes tested in my previous post near the bottom. Using my theory, the numbers match up to an SNES every time with direct page, absolute, indirect, indirect long, and absolute RMW opcodes.
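The reasoning above amounts to counting the bus accesses that come before the one that actually reads $2137. A rough sketch of that count follows. It assumes code fetches cost 8 master cycles (SlowROM at power-on), MMIO reads cost 6, and the direct page low byte is zero so no extra cycle is added; the struct and function names are illustrative only. It says nothing about where inside the read cycle the latch happens.

```c
#include <assert.h>

/* Assumed access costs in master cycles (not stated in the post). */
enum { ROM_CYCLES = 8, MMIO_CYCLES = 6 };

/* Accesses that happen before the read of $2137 begins. */
typedef struct {
    const char *name;
    int fetches_before_read;  /* opcode + operand bytes fetched from ROM */
    int mmio_reads_before;    /* MMIO reads that precede the $2137 read  */
} LatchCase;

/* Master cycles from the start of the opcode to the start of the
   cycle that reads $2137. */
static int latch_offset(const LatchCase *c)
{
    assert(c->fetches_before_read >= 1);  /* at least the opcode byte */
    return c->fetches_before_read * ROM_CYCLES
         + c->mmio_reads_before * MMIO_CYCLES;
}

static const LatchCase lda_dp   = { "lda $37",   2, 0 };
static const LatchCase lda_abs  = { "lda $2137", 3, 0 };
static const LatchCase asl_abs  = { "asl $2137", 3, 0 };
static const LatchCase lda_2136 = { "lda $2136", 3, 1 }; /* reads $2136 first */
```

Under these assumptions lda $2137 and asl $2137 both start the read 8 master cycles (2 dots) later than lda $37, which is in line with the n and n + 2 results.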
Yes, it is sometime during that cycle where the latch occurs. But there is nothing saying that the latch actually has to occur at the beginning of that cycle.
Nope. It could be anywhere from the very start of that cycle, to the very end of it. As far as I know, there's no way to find that out without knowing -exactly- what cycle the SNES starts on, or by using an oscillator as you stated. I don't even know what an oscillator/multimeter does, so I can't help you out there :)
Ok, one possible test, but it's a long shot. Set up two test ROMS that will begin the (final) $2137 read at a known cycle count. One should use LDA $2137, and one should use DMA. If it's really the end of the read cycle, the second will latch .5 dot later. OTOH, it could just be always cycle 6 and then both would be the same.
I see where you're going with this. A difficult test, but doable.
Nope. You have to update the cycle counter every CPU cycle, but you pretty much have to do this anyway with the 6-or-8-or-12-master-cycles-per-CPU-cycle deal.
Before, I had a routine that would decode any addressing type, and store the indirect and direct addresses, as well as set flags depending on the various conditions (DL!=0, bank boundary cross, etc.), at the start of the opcode. I'd then perform the entire opcode, then use a routine that would add the specified number of direct and indirect memory cycles to the cycle counter.
You only have to calculate the dot position when you read $2137 (or do the trick with $4201).
Not really, most opcodes will need it. Such as reading from $4210/$4212, hblank status could change in the middle of the opcode. This is my theory with $4210 bit 7. If you read from $4210 and that read happens in the middle of the opcode, it returns $4210 bit 7 as being set. This then clears the NMI status bit, but then at the end of the opcode, it's set again automatically. The NMI is pulled low, and $4210 bit 7 (inverted NMI pin) is high again. This is all untested theory.
This could also be needed for registers like $2118. If you leave vblank in the middle of the opcode, the write might get rejected by the PPU. Again, untested but possible.
It doesn't seem too bad to just update the counter at the end of every opcode, and once during MMIO reads/writes ($2000-$5fff).
Note that IRQs don't need the current dot position, you just have to know which cycle corresponds to where it's aimed in order to trigger it.
You mean by storing the cycle it should stop on in advance... that could be tricky because you'd have to calculate all the complexities like the 1360-cycle line, the longer dots, and the extra line in interlace mode, and hope the user doesn't change any settings that would affect this before the IRQ was invoked. I think it'd be easier to just compare it against the current dot position.
Sure you can. For example, any read to a ROM area or WRAM will very likely have no difference.
Yeah, but then you get into a bunch of nasty if/else structs.
if(memory_area(addr) == MEMAREA_ROM || memory_area(addr) == MEMAREA_WRAM) {
  do_word_opcode();
} else { //MEMAREA_MMIO
  do_byte_opcode();
  do_byte_opcode();
}
It would only be a few x86 clock cycles more to just always do the latter. Especially if do_byte_opcode is a define and not a function, something I avoid in order to keep executable size down. Not to mention it reduces code complexity.
I have no idea. At one point I thought the IRQ delay was ~5.5 dots, but with your new testing method I'll be better able to re-measure it just like I did NMI.
I'd recommend waiting until I have this new cycle counter done. We could time things easier by just changing the NMI/IRQ start positions slightly and finding the nearest 2-master-cycle variance where things like $421x bits change.

I realize my current implementation of latching counters mid-opcode is not perfect to a real SNES for the aforementioned reasons, but if the results always match a real SNES perfectly, then it doesn't really matter, does it? I think the best way to find out where within that one cycle the counter is latched, would be to try writing to registers that can only be written to during vblank ($2118, ...) right near the transition between outside to inside vblank, and then reading that value and printing a message saying whether the write succeeded. And again, as long as that result matches in emulation to SNES hardware, it shouldn't matter if it isn't really correct, right?

The thing that really scares me is perfect PPU emulation. Now that's something that's really going to require some CPU power.
The PPU renders dot-by-dot, and this most likely would be needed to implement things like writing to $2100 while in the middle of the scanline.
I think it is possible, though. You would have to make the PPU check its dot position after every single cycle passed, and draw a new dot when needed. You'd obviously resort to using addition and static counters to move along each line, but my god is that going to be tough. I'll worry about that much, much later :)

Lastly, I have a question for Overload: I see that you're interested in adding sound support to Super Sleuth at some point. I'm aware you already have the spc700 (and dsp?) implemented, but just don't do the actual sound sample decoding and output.
I personally don't really know much (anything at all) about how writing sound to a sound card would work, and I don't know how adept you are at that. I could definitely use someone's assistance in implementing this, so would you possibly consider working with me side-by-side in trying to figure out how to go about adding real sound output to an emulator? We could add sound to both at the same time. No big deal if not, it can't be that hard... I know your emulator is written in Delphi, that shouldn't be a problem if we use APIs.
Also, it's good to hear you're planning on implementing these findings into Super Sleuth :D I do recommend waiting until we finish our testing though, as our results / findings seem to change daily, heh.
anomie
Lurker
Posts: 151
Joined: Tue Dec 07, 2004 1:40 am

Post by anomie »

byuusan wrote:Actual tests. Set up the SNES at first power-on with:
clc : xce : rep #$20 : pea $2100 : pld
Now notice if you use lda $37, you get latch value n.
Now if you use lda $2137, you get latch value n + 2.
If you use asl $2137, you also get latch value n + 2.
If you use lda $2136, you get latch value n + 4.
Sorry, i meant "how did you figure it latches at the beginning of the read cycle and not the end?"... This is to be expected, since it must latch in response to the read cycle that actually reads $2137. But confirmation is good.
Not really, most opcodes will need it. Such as reading from $4210/$4212, hblank status could change in the middle of the opcode.
That just needs the current cycle position at the time of the read. No need for the dot position.
This is my theory with $4210 bit 7. If you read from $4210 and that read happens in the middle of the opcode, it returns $4210 bit 7 as being set. This then clears the NMI status bit, but then at the end of the opcode, it's set again automatically. The NMI is pulled low, and $4210 bit 7 (inverted NMI pin) is high again. This is all untested theory.
I don't see how that would work. I do LDA $4210, and it's set. IIRC, if I then do LDA $4210 again, it's never set (we should test this and make sure). But occasionally if we do LDA $4210 and read set then NMI triggers at some point and does LDA $4210 again much later, it's set again. Which is weird...
This could also be needed for registers like $2118. If you leave vblank in the middle of the opcode, the write might get rejected by the PPU. Again, untested but possible.
Cycle position again, not dot position.
You mean by storing the cycle it should stop on in advance... that could be tricky because you'd have to calculate all the complexities like the 1360-cycle line, the longer dots, and the extra line in interlace mode, and hope the user doesn't change any settings that would affect this before the IRQ was invoked. I think it'd be easier to just compare it against the current dot position.
The longer dots aren't too hard, and really neither is the 1360 cycle line. V*1364, -4 if V>240 and it's the short-frame, and then add the appropriate H value (lookup table, or H*4+(short_line?0:(H>0x143?2:0)+(H>0x147?2:0))). Changing interlace or whatever, just recalculate.

Or you could check V against the current scanline, and only do H based on cycle count.
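The formula above can be sketched in C together with an inverse (cycle to dot). This is an illustration only: it assumes 340 dots per line, that dots 0x143 and 0x147 are the two long (6-cycle) dots on normal lines, and that the short line is line 240 of the short frame. The function names and the clamping used in the inverse are my own; they are not taken from any emulator.

```c
#include <assert.h>
#include <stdbool.h>

enum { CYCLES_PER_LINE = 1364, DOTS_PER_LINE = 340 };

/* Master cycle, relative to the start of the line, at which dot h begins.
   Follows H*4 + (short_line ? 0 : (H>0x143 ? 2 : 0) + (H>0x147 ? 2 : 0)). */
static int dot_to_hcycle(int h, bool short_line)
{
    assert(h >= 0 && h < DOTS_PER_LINE);
    if (short_line)
        return h * 4;
    return h * 4 + (h > 0x143 ? 2 : 0) + (h > 0x147 ? 2 : 0);
}

/* Inverse: which dot contains the given cycle of the line. */
static int hcycle_to_dot(int hcycle, bool short_line)
{
    int h;

    assert(hcycle >= 0 && hcycle < (short_line ? 1360 : CYCLES_PER_LINE));
    if (short_line)
        return hcycle / 4;
    if (hcycle < 0x144 * 4 + 2) {          /* up to the end of long dot 0x143 */
        h = hcycle / 4;
        return h > 0x143 ? 0x143 : h;
    }
    if (hcycle < 0x148 * 4 + 4) {          /* up to the end of long dot 0x147 */
        h = (hcycle - 2) / 4;
        return h > 0x147 ? 0x147 : h;
    }
    return (hcycle - 4) / 4;
}

/* Cycle within the frame for position (h, v): V*1364, minus 4 once the
   1360-cycle line has passed on the short frame, plus the H offset. */
static long frame_cycle(int v, int h, bool short_frame)
{
    long base = (long)v * CYCLES_PER_LINE - ((short_frame && v > 240) ? 4 : 0);
    return base + dot_to_hcycle(h, short_frame && v == 240);
}
```

If interlace or the frame parity changes, the caller would simply call frame_cycle() again with the new settings, as suggested above.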
Yeah, but then you get into a bunch of nasty if/else structs.
if(memory_area(addr) == MEMAREA_ROM || memory_area(addr) == MEMAREA_WRAM) {
  do_word_opcode();
} else { //MEMAREA_MMIO
  do_byte_opcode();
  do_byte_opcode();
}
It would only be a few x86 clock cycles more to just always do the latter. Especially if do_byte_opcode is a define and not a function, something I avoid in order to keep executable size down. Not to mention it reduces code complexity.
Maybe your memory architecture is different, but snes9x's version of do_byte_opcode() already has to check if it's a PPU register, a CPU register, a special chip register, or a bunch of other things. do_word_opcode() does 16-bit reads when possible, and does two byte reads when necessary (the caller just calls do_word_opcode() and lets that handle whether to do two do_byte_opcode()s).
I realize my current implementation of latching counters mid-opcode is not perfect to a real SNES for the aforementioned reasons, but if the results always match a real SNES perfectly, then it doesn't really matter, does it?
OTOH, you can't claim that it latches at the beginning of the read cycle versus the end, just that it latches at some point during those 6 master cycles...
I think the best way to find out where within that one cycle the counter is latched, would be to try writing to registers that can only be written to during vblank ($2118, ...) right near the transition between outside to inside vblank, and then reading that value and printing a message saying whether the write succeeded.
Maybe... but maybe we'd just get that VRAM writes are enabled at [1.5,225] versus [0,225] or something. I'd have to really look at it to find out which combinations of $2137-beginning-or-end and $2118-beginning-or-end might be unambiguous.
And again, as long as that result matches in emulation to SNES hardware, it shouldn't matter if it isn't really correct, right?
That's true, "if a tree falls in the woods and no one can hear it, does it matter if it makes a sound?"
The thing that really scares me is perfect PPU emulation. Now that's something that's really going to require some CPU power.
The PPU renders dot-by-dot, and this most likely would be needed to implement things like writing to $2100 while in the middle of the scanline.
I think it is possible, though. You would have to make the PPU check its dot position after every single cycle passed, and draw a new dot when needed. You'd obviously resort to using addition and static counters to move along each line, but my god is that going to be tough. I'll worry about that much, much later :)
It's not that bad. Unless someone writes $2100 (or whichever other registers are effective) outside H-Blank, you can do things exactly the same as now. When they do write, only then must you render the partial scanline. No need at all to render one pixel every 4 emulated master cycles.

You could even buffer the changes ("$2100 set to N at [X,Y]") and play back that buffer when you render the screen later to save even more context switches.
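A minimal sketch of that buffering idea, for a single register, might look like the following. Everything here is hypothetical (the RegLog type, the fixed-size log, the 256-dot line, and using the raw register value as the "pixel"); it only shows the replay logic: whole spans are filled at once, and a line is split only where a logged write lands.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_WIDTH 256
#define MAX_WRITES 64

/* One logged write: "register set to value at [x,y]". */
typedef struct { int x, y; uint8_t value; } RegWrite;

typedef struct {
    RegWrite writes[MAX_WRITES];
    size_t count;     /* writes logged so far this frame */
    size_t next;      /* next write to replay */
    uint8_t current;  /* register value currently in effect */
} RegLog;

/* Record a write; writes are assumed to arrive in screen order. */
static int log_write(RegLog *log, int x, int y, uint8_t value)
{
    if (log->count >= MAX_WRITES)
        return 0;
    log->writes[log->count++] = (RegWrite){ x, y, value };
    return 1;
}

/* Render line y (lines must be rendered top to bottom). Each output
   "pixel" is just the register value in effect at that dot. */
static void render_line(RegLog *log, int y, uint8_t out[LINE_WIDTH])
{
    int x = 0;
    while (x < LINE_WIDTH) {
        /* Apply every write that took effect at or before [x,y]. */
        while (log->next < log->count) {
            const RegWrite *w = &log->writes[log->next];
            if (w->y > y || (w->y == y && w->x > x))
                break;
            log->current = w->value;
            log->next++;
        }
        /* The span runs to the next write on this line, or to the end. */
        int end = LINE_WIDTH;
        if (log->next < log->count && log->writes[log->next].y == y
            && log->writes[log->next].x < LINE_WIDTH)
            end = log->writes[log->next].x;
        for (; x < end; x++)
            out[x] = log->current;
    }
}
```

A write logged during H-Blank (x of 256 or more) never splits a line in this sketch; it simply takes effect at the start of the next one.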
burning shadow
Rookie
Posts: 32
Joined: Wed Aug 25, 2004 1:55 pm
Location: spb, ru

Post by burning shadow »

zidanax wrote:Personally, I think it would be nice to have an emulator that's extremely accurate, even if it's so slow it won't play at full speed on my computer.
I absolutely agree.
burning shadow
Rookie
Posts: 32
Joined: Wed Aug 25, 2004 1:55 pm
Location: spb, ru

Post by burning shadow »

byuusan wrote:Also, this won't really fix any games. Maybe one or two stubborn ones. I would wager quite a bit that virtually no games in existence care about most of this stuff we're finding out.
I guess you can write two cores. One will work in opcode-by-opcode mode, another in cycle-by-cycle. Why not?

I'm dreaming of emu which is as accurate as possible.. :)
byuu

Post by byuu »

That just needs the current cycle position at the time of the read. No need for the dot position.
So hblank occurs at the same cycle position on every scanline? Well I guess that makes sense, since it's at dot 278 or something like that...
The cycle->dot calculation really isn't all that complex either. Especially with your formula.
hscan_pos = ((hcycle_pos) - ((ppu.interlace == false && ppu.interlace_frame == 1 && vscan_pos == 240) ? 0 : (hcycle_pos > (0x143 << 2) ? 2 : 0) + ((hcycle_pos + 2) > (0x147 << 2) ? 2 : 0))) >> 2;
Should work. That almost looks like Perl code, I love C/C++ :D
Or you could check V against the current scanline, and only do H based on cycle count.
I like this idea best. Avoid the sticky 1360 lines + extra interlace lines altogether.
Maybe your memory architecture is different, but snes9x's version of do_byte_opcode() already has to check if it's a PPU register, a CPU register, a special chip register, or a bunch of other things. do_word_opcode() does 16-bit reads when possible, and does two byte reads when necessary (the caller just calls do_word_opcode() and lets that handle whether to do two do_byte_opcode()s).
No, it's the same. Very well encapsulated. And again you're right. I'm nowhere near as good an optimizer as I thought I was :/
The only other problem would be that I might have to mask the (address + 1) on a few addressing types that wrap around the bank/page/whatever. I haven't really worked those out, need to read over the results you posted on that a while back. The 65816 PDF is really vague about that. It always lists things like: DBR,AA+1; which to me looks like AA is the low-order 16-bits, and DBR is always the same. But most of the time, AA+1 can be >= 0x10000, which would change the DBR.
Speaking of which, I have a question about addr,x and addr,y opcode cycles:

/************************
 *** 0xb9: lda addr,y ***
 ************************
cycles:
  [1 ] pbr,pc         ; operand
  [2 ] pbr,pc+1       ; aal
  [3 ] pbr,pc+2       ; aah
* [3a] dbr,aah,aal+yl ; io [4]
  [4 ] dbr,aa+y       ; data low
  [4a] dbr,aa+y+1     ; data high [1]
*/
How would one implement 3a (condition 4)?
"4) Add one cycle for indexing across page boundaries, or X=0. When X=1 in emulation mode, this cycle contains invalid addresses."
Ok, so if you do lda addr,x; then you need to do cycle 3a when X is 16-bit, or if aal+yl > 0xff? Or is it (aah,aal+yl) > 0x10000?
I was under the impression a page was 256 bytes, and a bank 65536.
But page boundary stuff only applies to emulation mode, right?
OTOH, you can't claim that it latches at the beginning of the read cycle versus the end, just that it latches at some point during those 6 master cycles...
Yeah, you can only do so much with software-only tests... I admit this is my biggest limitation. I don't even know how to use a soldering iron.
That's true, "if a tree falls in the woods and no one can hear it, does it matter if it makes a sound?"
You know that makes a hell of a lot more sense with the "does it matter if it..." part in there?
I always used to ask myself why that was so philosophical. Of course it would make a sound, but who cares since no one hears it?
It's not so much a matter that I don't care that we don't know for sure how the latches work, it's just that if I can't do any better, and it's bit-perfect, it should be ok. It would still be nice to know though, if we can think of an appropriate test to find out.
It's not that bad. Unless someone writes $2100 (or whichever other registers are effective) outside H-Blank, you can do things exactly the same as now. When they do write, only then must you render the partial scanline. No need at all to render one pixel every 4 emulated master cycles.
That's the thing: We only know this is needed for $2100. Isn't there something about writing to $2102/$2103/$2104 mid-scanline as well? Like with Uniracers? Super James Pond 2 was the other weird OAM one, but that already plays fine for me. I guess because I used Charles MacDonald's notes for OAM accesses.
There could be more registers, we won't know until we test each and every one. I'll work on this once we have h/v/blank start/end positions done.
You could even buffer the changes ("$2100 set to N at [X,Y]") and play back that buffer when you render the screen later to save even more context switches.
This is probably the best thing to do. Shouldn't require too much memory.
anomie
Lurker
Posts: 151
Joined: Tue Dec 07, 2004 1:40 am

Post by anomie »

byuusan wrote:The only other problem would be that I might have to mask the (address + 1) on a few addressing types that wrap around the bank/page/whatever. I haven't really worked those out, need to read over the results you posted on that a while back.
You have to watch out for other boundaries too, e.g. LDX $1FFF (bank 0) would read a byte of WRAM and a byte of Open Bus. snes9x just treats every 0x1000 being a potential boundary and needing the special treatment.
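As a rough illustration of that boundary check (not snes9x's actual code), a word read can take a fast path only when both bytes fall inside the same 0x1000 block of plain memory, and otherwise fall back to two byte reads. The sketch below models only the low 8 KB of WRAM in bank 0 and treats everything else as open bus; the array, the open_bus variable, and both function names are assumptions made for the example.

```c
#include <stdint.h>

static uint8_t wram[0x2000];     /* assumed: WRAM visible at $0000-$1FFF */
static uint8_t open_bus = 0x00;  /* stand-in for the last value on the bus */

/* Byte read: the slow, fully general path. */
static uint8_t read_byte(uint32_t addr)
{
    uint16_t offset = (uint16_t)(addr & 0xffff);
    if (offset < 0x2000)
        return wram[offset];
    return open_bus;             /* everything else is unmapped here */
}

/* Word read: fast path unless the access might straddle a boundary. */
static uint16_t read_word(uint32_t addr)
{
    uint16_t offset = (uint16_t)(addr & 0xffff);

    /* Every 0x1000 is treated as a potential boundary. */
    if ((offset & 0x0fff) != 0x0fff && offset < 0x2000)
        return (uint16_t)(wram[offset] | (wram[offset + 1] << 8));

    /* Possible boundary: read the two bytes separately. */
    uint8_t lo = read_byte(addr);
    uint8_t hi = read_byte(addr + 1);
    return (uint16_t)(lo | (hi << 8));
}
```

With this model, a word read at $1FFF returns the WRAM byte in the low half and the open bus value in the high half. Bank wrapping of addr + 1 is deliberately left out.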
"4) Add one cycle for indexing across page boundaries, or X=0. When X=1 in emulation mode, this cycle contains invalid addresses."
Hmmm... my version says index across page boundaries, or write, or X=0.

Anyway, yes, add an IO if it's a write, or X=0, or AAL+XL>0xff. Emulation mode doesn't matter here. I guess the address math unit can realize when there's no carry so it can skip that cycle.

Please verify if you feel like it.
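The rule as stated here (still awaiting verification on hardware) fits in a few lines. The function and parameter names below are invented for the sketch; index_8bit stands for the X flag being 1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Does addr,x / addr,y need the extra I/O cycle (3a)?
   Per the post above: yes if it's a write, or the index registers are
   16-bit (X=0), or adding the low bytes carries out of the page. */
static bool needs_index_io(bool is_write, bool index_8bit,
                           uint16_t base, uint16_t index)
{
    if (is_write)
        return true;
    if (!index_8bit)
        return true;
    return (base & 0xff) + (index & 0xff) > 0xff;  /* AAL + XL > 0xff */
}
```

For example, an 8-bit-index read of $12F0,x with X = $20 crosses a page and takes the extra cycle, while $1200,x with the same index does not.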
That's the thing: We only know this is needed for $2100. Isn't there something about writing to $2102/$2103/$2104 mid-scanline as well? Like with Uniracers?
That's something different. OAM address invalidation.
Super James Pond 2 was the other weird OAM one, but that already plays fine for me. I guess because I used Charles MacDonald's notes for OAM accesses.
I don't know about his notes, the problem there IIRC was OAM writes to the low table weren't being done properly.
There could be more registers, we won't know until we test each and every one.
But still my point holds. Most games don't write registers that make a visible difference during the rendering of a scanline. And even if they do, we still don't have to go dot-by-dot for everything. Say that the game writes $21xx at [54,128]. We can do nothing until that point, and then render the whole screen up to [53,128] in one fell swoop. And if there are no more writes to $21xx, we can very likely render [0,129] through the end of the screen in one swoop too. Only [54,128]-[255,128] would have to be done dot-by-dot, and depending on just what $21xx does maybe not even that much.
Starman Ghost
Trooper
Posts: 535
Joined: Wed Jul 28, 2004 3:26 am

Post by Starman Ghost »

Nach wrote: There's more to CPU speed than its clock speed.
That's the magic behind AMD cpu's.
byuu

Post by byuu »

Anyway, yes, add an IO if it's a write, or X=0, or AAL+XL>0xff. Emulation mode doesn't matter here. I guess the address math unit can realize when there's no carry so it can skip that cycle.

Please verify if you feel like it.
Will do. I want to verify the rep/sep extra cycle, too (it should be I/O, but the doc says it's a program cycle, even though there isn't a third byte to read).
And even if they do, we still don't have to go dot-by-dot for everything.
Yeah, there's lots of ways to speed things up like this, it just seems less faithful to the original hardware, even if it is 100% perfectly functionally equivalent. But you're right in how it should be implemented in emulators.

---

I finally finished my CPU core rewrite, so I'll start testing some more stuff now (within the next few days). I should be able to get exact cycle positions for V/HBlank start/end.
They may not be correct if it turns out that the counter is latched mid-cycle, or if the starting cycle position turns out to be wrong, but we can adjust the results if we find out those are wrong later.
anomie
Lurker
Posts: 151
Joined: Tue Dec 07, 2004 1:40 am

Post by anomie »

Back from vacation/sickness now.

Here's something interesting, discovered way back when. As you know, IRQ is level-triggered, so if you forget to read $4211 in your IRQ handler routine it'll re-trigger as soon as you clear the I flag (probably with your RTI). However, one game (Marko's Magic Football, IIRC) sets I and does some processing with DB=$7e or $7f for a few frames, leaving an IRQ pending during most of that. And the NMI routine for some reason does a CLI before RTI-ing. If you trigger IRQ after the CLI, the IRQ routine will read $7e4211 rather than PPU register $4211, and you're in an infinite loop... Obviously, the real SNES doesn't trigger IRQ or the game wouldn't do that.

I've tested all combinations of using PLP, RTI, CLI, and REP to clear the I flag with all combinations of PLP, RTI, SEI, SEP, BRK, and COP to immediately set it again. The results: PLP/RTI, CLI/RTI, REP/RTI, PLP/BRK, CLI/BRK, REP/BRK, PLP/COP, CLI/COP, and REP/COP will not allow IRQ to trigger, everything else will (including RTI/RTI, RTI/BRK, and RTI/COP). The only other possibility I can think of is using NMI to set I...

[later]

More testing, $4211 bit 7 this time.

V-IRQ acts just like HV-IRQ with H=0. H-IRQ triggers every line, while HV-IRQ triggers only on line V.

With H=0, $4211 bit 7 gets set 1374 master cycles after dot 0.0 of the previous scanline. Otherwise, the bit gets set H*4+14 master cycles after dot 0.0 of the current scanline. In both cases, keep in mind that the two long dots throw off the calculation.

Oh, and note that an IRQ set for (153,240) will not trigger on the short frame in non-interlace mode. And for some reason, an IRQ set for dot 153 on the last scanline of the frame (261, or 262 for interlace mode long frames) will not trigger.

Writing $4200 only disables a pending IRQ if the write is disabling IRQs. When enabling IRQs, the IRQ still seems to trigger as long as the write cycle setting $4200 completes before the 'trigger point' cycle. For example, an IRQ set for (2,1) will trigger when the STA $4200 completes at (5,1), but (5.5,1) is too late. H=0 is an exception: if the write completes at (3.0,0) it will still go.

[later]

The delay between $4211 bit 7 and the earliest the IRQ handler may be called seems to work just like NMI: if the previous instruction was pure FastROM, delay is 6 master cycles. Mixed Slow/Fast must wait 8, pure Slow waits 10, and so on.

[later]

$213f bit 7 seems to be toggled at [1.0,0]. Odd choice.

And BTW, it looks to me like the NI short frame is the one with $213f bit 7 = 1, while the interlace long frame is the one with $213f bit 7 = 0. Is that what you got?
byuu

Post by byuu »

anomie wrote:$213f bit 7 seems to be toggled at [1.0,0]. Odd choice.
hblank begins at 1.0,[0-262] as well. It could just be that the PPU updates its register status one dot late. We need some really tight tests that try to write to VRAM during that first dot of a scanline, to see if it really is in hblank or not. It'd be best to try on the first line of vblank and use $2118/$2119. It would also be necessary to match this to the exact write cycle during that opcode.
anomie wrote:And BTW, it looks to me like the NI short frame is the one with $213f bit 7 = 1, while the interlace long frame is the one with $213f bit 7 = 0. Is that what you got?
Yes. $213f bit 7 [0 = even frame, 1 = odd frame]
Interlaced even frames have one extra scanline (so interlace $213f.7 = 0 has the extra line), and non-interlaced odd frames are the ones missing one dot on scanline 240 (so non-interlace $213f.7 = 1).
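
A small Python sketch of that rule, using the line numbers quoted in this thread (last scanline 261, or 262 on interlace long frames). The function name and the returned dictionary are just for illustration.

```python
def frame_shape(interlace, field):
    """Describe a frame from the interlace setting and $213f bit 7 (field).

    field: 0 = even frame, 1 = odd frame.
    """
    if field not in (0, 1):
        raise ValueError("field must be 0 or 1")
    # Interlaced even frames carry one extra scanline.
    long_frame = bool(interlace) and field == 0
    # Non-interlaced odd frames are missing one dot on scanline 240.
    short_line_240 = (not interlace) and field == 1
    return {
        "last_scanline": 262 if long_frame else 261,
        "short_line_240": short_line_240,
    }
```

For instance, frame_shape(False, 1) reports last scanline 261 with the short line 240, and frame_shape(True, 0) reports last scanline 262 with no short line.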

---

By the way, I have some bad news: I just started working again, full-time, and as it's a rather dead-end laborious job, it's sapping all of my motivation and energy away. I'll try and work on things when I can, but expect to see a lot less from me for the next few months :(
I'm moving when my current lease expires and I'll be looking for a better paying part-time job, which should give me a lot more time again.
anomie
Lurker
Posts: 151
Joined: Tue Dec 07, 2004 1:40 am

Post by anomie »

byuusan wrote:By the way, I have some bad news: I just started working again, full-time, and as it's a rather dead-end laborious job, it's sapping all of my motivation and energy away. I'll try and work on things when I can, but expect to see a lot less from me for the next few months :(
Ick, i know how that is...
Kagerato
Lurker
Posts: 153
Joined: Mon Aug 09, 2004 1:40 am

Post by Kagerato »

That's terrible. This kind of research is so intensive that not many people are willing to do it, and it looks like byuusan and anomie were making good progress here.

Even though I hardly understand what either of you are saying most of the time, and don't fully comprehend all the implications of a correct cycle-by-cycle emulator, I still feel this research is important.
Nightcrawler
Romhacking God
Posts: 922
Joined: Wed Jul 28, 2004 11:27 pm

Post by Nightcrawler »

anomie wrote:
byuusan wrote:By the way, I have some bad news: I just started working again, full-time, and as it's a rather dead-end laborious job, it's sapping all of my motivation and energy away. I'll try and work on things when I can, but expect to see a lot less from me for the next few months :(
Ick, i know how that is...
Yes.. I feel your pain. Working leaves limited time and even in your limited time, you don't feel motivated to take advantage of it because your motivation and energy has been sucked away before you get home. On the plus side, with a labor intensive job, it will help you stay in shape. It's so easy to get plump when you sit on your ass all day at work.

On a side note, what is somebody with your skills doing in a labor intensive job? You have the programming ability, learning ability, and smarts to do better. I understand this can be a rather personal question, and we all have our reasons and problems, so you don't have to share if you don't want to.
TransCorp (http://transcorp.romhacking.net) - Home of the Dual Orb 2, Cho Mahou Tairyku Wozz, and Emerald Dragon SFC/SNES translations.
ROMhacking.net (http://www.romhacking.net) - The central hub of the ROM hacking community.
kieran_
Mugwump
Posts: 824
Joined: Fri Jul 30, 2004 9:05 pm

Post by kieran_ »

That sucks, Byuusan. I had no idea what all of these technical posts were about, but you seemed to be a contributor who actually knew his stuff. Good luck.
anomie
Lurker
Posts: 151
Joined: Tue Dec 07, 2004 1:40 am

Post by anomie »

Nightcrawler wrote:On a side note, what is somebody with your skills doing in a labor intensive job? You have the programming ability, learning ability, and smarts to do better. I understand this can be a rather personal question, and we all have our reasons and problems, so you don't have to share if you don't want to.
Me, or byuusan? As for me, I seem to lack whatever quality it is that makes employers say "I want to hire that guy instead of one of the other 5000 applicants for my one open position".

[other results]

Trying to time HDMA now (to verify my previous results), and things are strange...

I set up one simple Mode 0 HDMA. Everything works fine up to [227.5,0]. Adding 2 master cycles latches either [285,0] or [285.5,0]. Adding 2 more master cycles latches dot 287/287.5. Two more gives 288/288.5. Adding two more goes back to 287/287.5! And so on, i haven't found a pattern yet. And things get worse for later scanlines, presumably because each HDMA's weirdness is cumulative.

The best i can figure, it seems like the HDMA delay somehow has to do with just when in the instruction HDMA triggers, and possibly even the instruction executing. How fun...
byuu

Post by byuu »

Nightcrawler wrote:On the plus side, with a labor intensive job, it will help you stay in shape. It's so easy to get plump when you sit on your ass all day at work.
Well, I gained 20lbs in a year and a half at AOL's HelpDesk. 35k/year to take simple employee phone calls while surfing the web. But that only puts me at 130lbs, so that was probably a good thing (reason for leaving was outsourcing, by the way).
Nightcrawler wrote:Working leaves limited time and even in your limited time, you don't feel motivated to take advantage of it because your motivation and energy has been sucked away before you get home.
This never happened to me at AOL. I had zero problems with free time as I was never physically nor mentally exhausted upon arriving home from work there.
Nightcrawler wrote:On a side note, what is somebody with your skills doing in a labor intensive job? You have the programming ability, learning ability, and smarts to do better.
1) I live in Ohio.
And even though 1 should have answered your question already:
2) Even though I was overqualified for nearly all of the positions I applied for at the forty companies I went to, I lack the social networking skills and college degrees to get past the HR lackeys.

I appreciate the pity, though. I'll get something eventually, this is just money in my pocket while I keep searching for something better. Damn shame about the timing though, I was this close to getting Chrono Trigger to play. Now I'll probably have to put it on hold for a few months.
anomie wrote:I set up one simple Mode 0 HDMA. Everything works fine up to [227.5,0]. Adding 2 master cycles latches either [285,0] or [285.5,0]. Adding 2 more master cycles latches dot 287/287.5. Two more gives 288/288.5. Adding two more goes back to 287/287.5! And so on, i haven't found a pattern yet. And things get worse for later scanlines, presumably because each HDMA's weirdness is cumulative.
Ack. I've just been starting HDMA at dot 278 or above (after the current opcode completes). Why am I not surprised this would turn out to be complicated, as well?
anomie wrote:The best i can figure, it seems like the HDMA delay somehow has to do with just when in the instruction HDMA triggers, and possibly even the instruction executing. How fun...
I figured HDMA would wait until the current instruction had completed, like NMIs and IRQs.
A long shot here, but I'm curious. Does the SNES ever raise the /ABORT line on its own, or is that only doable via external device? Perhaps some things that act weird with certain opcodes (like PLP/RTI, CLI/RTI, etc. that you mentioned above) are a result of the SNES aborting the opcode part-way through?
Nightcrawler
Romhacking God
Posts: 922
Joined: Wed Jul 28, 2004 11:27 pm

Post by Nightcrawler »

byuusan wrote:
On the plus side, with a labor intensive job, it will help you stay in shape. It's so easy to get plump when you sit on your ass all day at work.
Well, I gained 20lbs in a year and a half at AOL's HelpDesk. 35k/year to take simple employee phone calls while surfing the web. But that only puts me at 130lbs, so that was probably a good thing (reason for leaving was outsourcing, by the way).
Wow.. you're a small guy. You weighed 110 pounds? How tall are you?
Working leaves limited time and even in your limited time, you don't feel motivated to take advantage of it because your motivation and energy has been sucked away before you get home.
This never happened to me at AOL. I had zero problems with free time as I was never physically nor mentally exhausted upon arriving home from work there.
I think it has to do with your schedule and if you like what you do. I have worked labor intensive jobs and been LESS tired than I am when I come home from my current non laboring job. I think it's the hours. They just don't agree with my natural schedule. That and I hate my job, so my mental frame of mind is always poor and I think we definitely have a connection between our physical and mental health.
On a side note, what is somebody with your skills doing in a labor intensive job? You have the programming ability, learning ability, and smarts to do better.
1) I live in Ohio.
And even though 1 should have answered your question already:
2) Even though I was overqualified for nearly all of the positions I applied for at the forty companies I went to, I lack the social networking skills and college degrees to get past the HR lackeys.
Ah.. I see. Any reason you didn't pursue college? Not that it matters much, I'm a pretty strong believer that college is overrated and a waste of money.
I appreciate the pity, though. I'll get something eventually, this is just money in my pocket while I keep searching for something better. Damn shame about the timing though, I was this close to getting Chrono Trigger to play. Now I'll probably have to put it on hold for a few months.
That's understandable. I think many people work where they work just to 'hold them over' until something better comes up. In fact... I bet some people do that their whole lives ALWAYS looking for something better.. and then they get that and look for something better. That's human nature. We all want something better. You have limited time yes, but you're not dead yet, so you can still do some testing if you want to. :) Don't despair, just keep truckin'.


anomie:

Yeah.. it can be tough sometimes, but keep trying. I hate seeing stupid a'holes in the work field that end up being your boss who don't know the first thing about what you're doing when there are so many bright talented people working below the level they should be at, who are unrecognized for their skill.
byuu

Post by byuu »

Wow.. you're a small guy. You weighed 110 pounds? How tall are you?
5' 11, skinny is a better word for it.
I think it has to do with your schedule and if you like what you do. That and I hate my job, so my mental frame of mind is always poor and I think we definitely have a connection between our physical and mental health.
Agree, and agree.
Ah.. I see. Any reason you didn't pursue college?
Be warned, I'm very cynical. That said:
I don't agree with college. For the specific fields I'm interested in, it's overrated and overvalued. The computer science stuff I saw from a roommate who was in college for it was a joke.
Would you rather hire a worker who spent his whole life programming and loved doing it, or someone who got into the profession at 20 looking to earn big money, but who doesn't really care about it?
As far as pursuing college anyway: I didn't have straight A's in high school, and since I'm a white male, that means I can't get a free scholarship. I can't afford to take a few years off work while I go to college, and I don't want to pay back student loans the rest of my life. Even if I did go to college, comp. sci. would be a waste of my time (what could they really teach me at this point?), and incredibly stupid (since most jobs in this field are being outsourced anyway). The only other thing I'd like to learn in college is Japanese or Mandarin, because I'm unable to fully teach myself a second language on my own. But that won't get me a job. At least, not one that pays enough to cover student loans and cost of living at the same time.
You have limited time yes, but you're not dead yet
You could always ask the cart master to do something about that for you. Five golden stars to whoever gets the reference.

Anyway, while always fun to rant about this stuff, it is fairly offtopic. If you'd like to continue discussing this, would you mind sending me an e-mail, please? We could use AIM too, but I'd need your screenname first.
Dmog
Lurker
Posts: 192
Joined: Tue Aug 31, 2004 6:03 pm

Post by Dmog »

Nightcrawler wrote:
byuusan wrote:
On the plus side, with a labor intensive job, it will help you stay in shape. It's so easy to get plump when you sit on your ass all day at work.
Well, I gained 20lbs in a year and a half at AOL's HelpDesk. 35k/year to take simple employee phone calls while surfing the web. But that only puts me at 130lbs, so that was probably a good thing (reason for leaving was outsourcing, by the way).
Wow.. you're a small guy. You weighed 110 pounds? How tall are you?
That all depends on one's height I guess. Saw a lightweight boxing match (not live) yesterday. Well, I guess it was more of a one way fight and the other guy was just a moving punching bag. Anyway, the lightweight champion Juan Diaz weighs about 130lbs (at 5'6'') and doesn't look so small.
Oblivion
What?
Posts: 177
Joined: Wed Jul 28, 2004 1:32 pm
Location: You'd want to know, wouldn't you?

Post by Oblivion »

byuusan wrote:
You have limited time yes, but you're not dead yet
You could always ask the cart master to do something about that for you. Five golden stars to whoever gets the reference.
Monty Python and the Holy Grail. *takes the five golden stars and leaves dev forum, having nothing to contribute*
Everything I say is a lie.