Latch timing

byuu

Post by byuu »

Nitpick: You should say 339,240 or $153,$f0 :P
Then.. one would expect that, while the latches report normal-length dots for the 'short' line on odd NI frames, the 5A22 still treats them as long dots?
Seems perfectly logical. A way to test this could be to set an IRQ for dot 320, and measure that latch against an IRQ for dot 328. You'd have to be extremely careful in counting all the cycles used in the WAI instruction, the IRQ invocation, and the following latching instruction; but it should be possible to determine if that's the case.
anomie
Lurker
Posts: 151
Joined: Tue Dec 07, 2004 1:40 am

Post by anomie »

TRAC wrote:Do you mean them as the 3 stages of:
Stage 1: 128:1 (T0, T1) or 16:1 (T2) scaler.
Stage 2: 1-256 'divisor', based on a 0-255 wraparound counter and a post-increment comparator.
Stage 3: The 4-bit counter for output ticks from the comparator stage.
Yes, that's exactly what I was trying to say. I'm going to copy that ;)
TRAC wrote:Also, about the 4-bit counter - do we know if it wraps around, or saturates?
They wrap.
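For reference, the three stages agreed on above could be modelled roughly like this. It's only a sketch: the struct and function names are made up for illustration, and an "input tick" stands for whatever clock feeds stage 1, which isn't pinned down here. A target of 0 behaves as 256 because the comparison happens after the increment.

```c
#include <stdint.h>

/* Hypothetical model of one SPC700 timer; all names are illustrative. */
typedef struct {
    unsigned prescale;  /* stage 1 ratio: 128 for T0/T1, 16 for T2 */
    unsigned pre_count; /* input ticks seen since the last stage 1 output */
    uint8_t  counter;   /* stage 2: 0-255 wraparound counter */
    uint8_t  target;    /* stage 2 target; 0 behaves as 256 */
    uint8_t  out;       /* stage 3: 4-bit output counter */
} spc_timer;

/* Advance the timer by one input tick. */
static void spc_timer_tick(spc_timer *t)
{
    if (++t->pre_count < t->prescale)
        return;                         /* stage 1 has not produced a tick yet */
    t->pre_count = 0;

    t->counter++;                       /* wraps 255 -> 0 */
    if (t->counter == t->target) {      /* compared after the increment */
        t->counter = 0;
        t->out = (t->out + 1) & 0x0F;   /* wraps 15 -> 0, no saturation */
    }
}
```

With T2 and a target of 1, stage 3 steps once every 16 input ticks and is back at 0 after 256 of them.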
TRAC wrote:Then.. one would expect that, while the latches report normal-length dots for the 'short' line on odd NI frames, the 5A22 still treats them as long dots?
Well, the 5A22's IRQ timer actually treats all dots on the scanline as 4 master cycles no matter what. H=0 is special, it occurs 1374 master cycles after dot 0 of the previous scanline. Otherwise, the IRQ timer goes off 14+H*4 cycles after dot 0 of the current scanline. Yes, that means there is a 'gap' between H=0 and H=1 on normal scanlines. Feel free to double check these numbers, verification is good.

Interestingly, it is also impossible to trigger an IRQ on the very last dot of the frame (H=339 V=261 (or 262 for interlace long frames), or presumably 311/312 for PAL)...
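To make the H=0 / H=1 gap concrete, here is a small sketch of the IRQ timer figures quoted above. The function name is hypothetical, and it assumes the previous scanline has the normal length of 1364 master cycles, so treat it as an illustration of the arithmetic rather than verified hardware behaviour.

```c
/* Assumption: the previous scanline is a normal one, 1364 master cycles long. */
enum { NORMAL_SCANLINE_CYCLES = 1364 };

/* Master cycles from dot 0 of the current scanline until the IRQ for
   horizontal position h fires, using the figures quoted above (h in 0..339). */
static int hirq_offset(int h)
{
    if (h == 0)
        return 1374 - NORMAL_SCANLINE_CYCLES; /* 1374 after the previous line's dot 0 */
    return 14 + h * 4;
}
```

Under that assumption H=0 lands 10 cycles into the line and H=1 at 18, so the gap is 8 cycles where every other step is 4.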
Zuzma

Post by Zuzma »

Whatever you guys are doing, it's great! All those changes TRAC made to SNEeSe's CVS seem to make a few games more stable and less crashy (TOP). I can tell the timing is still off just by looking at some games I own but it sure is getting a lot closer. :) Thanks for all your hard work guys.
TRAC
SNEeSe Developer
Posts: 25
Joined: Sun Nov 14, 2004 12:46 pm

Post by TRAC »

anomie wrote:Yes, that's exactly what I was trying to say. I'm going to copy that ;)
Hurrah!
anomie wrote:
TRAC wrote:Also, about the 4-bit counter - do we know if it wraps around, or saturates?
They wrap.
This has been verified, then?
anomie wrote:Well, the 5A22's IRQ timer actually treats all dots on the scanline as 4 master cycles no matter what. H=0 is special, it occurs 1374 master cycles after dot 0 of the previous scanline. Otherwise, the IRQ timer goes off 14+H*4 cycles after dot 0 of the current scanline. Yes, that means there is a 'gap' between H=0 and H=1 on normal scanlines. Feel free to double check these numbers, verification is good.
Hmm. It's just that I'm trying to find some reasoning for the 'no IRQ on H=339 on short scanline' case, and for both that and consistent 4-mcycle timer resolution to be true, something seems... missing...
anomie
Lurker
Posts: 151
Joined: Tue Dec 07, 2004 1:40 am

Post by anomie »

TRAC wrote:This has been verified, then?
The wrapping? Yes. I set up a wait loop that effectively counts down from ~0x100f0, with timer #2 set for a target of 1, and still no saturation.
TRAC wrote:Hmm. It's just that I'm trying to find some reasoning for the 'no IRQ on H=339 on short scanline' case, and for both that and consistent 4-mcycle timer resolution to be true, something seems... missing...
I suspect some oddity with the quicker end-of-HBlank signal causes 339 to get cut off. And something similar with end-of-VBlank cutting the last dot IRQ off for the end of the frame.
anomie
Lurker
Posts: 151
Joined: Tue Dec 07, 2004 1:40 am

Post by anomie »

More SPC700 info tonight. On reset, the SPC700 side of the IO registers ($00f4-7) does get reset. I set things up with

Code:

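        ldx #$0200      ; address of the test code in SPC700 RAM
        stx $2142       ; ports 2-3: address handed to the IPL
        stz $2141       ; port 1 = 0: ask the IPL to jump there rather than transfer
        lda #$cc
        sta $2140       ; port 0 = $cc: kick off the IPL command
        stp             ; halt the 5A22 until reset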
And on reset $2140-1 were displaying $bbaa rather than what my code at $0200 would do.

$00f1 bits 4 and 5 clear the ports whenever you write 1, not just on a 0->1 transition. No effect on the 5A22 side of things.
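In emulator terms, the $00f1 behaviour might look something like the sketch below. The struct and function names are hypothetical. Which bit clears which pair of ports isn't stated above; bit 4 for $f4/$f5 and bit 5 for $f6/$f7 is the usual documented layout and is assumed here. "No effect on the 5A22 side" is read as "the values the 5A22 sees are untouched".

```c
#include <stdint.h>

/* Hypothetical port model: in[] is what the SPC700 reads at $f4-$f7,
   out[] is what the 5A22 reads at $2140-$2143. */
typedef struct {
    uint8_t in[4];
    uint8_t out[4];
} apu_ports;

/* SPC700 write to $00f1. Level-triggered: the clear happens on every
   write with the bit set, not only on a 0 -> 1 transition. */
static void spc_write_f1(apu_ports *p, uint8_t value)
{
    if (value & 0x10)               /* assumed: bit 4 clears $f4/$f5 */
        p->in[0] = p->in[1] = 0;
    if (value & 0x20)               /* assumed: bit 5 clears $f6/$f7 */
        p->in[2] = p->in[3] = 0;
    /* out[] is deliberately untouched: no effect on the 5A22 side. */
}
```

Since the clear is an event on the write rather than a held state, nothing here stops the 5A22 from writing the ports again afterwards while the bits are left at 1.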

$00f8-9 still do nothing I can detect. I checked timers to see if they did anything there (and no sign of mythical speed bits in there either, unless they double-speed the timers too), no hidden IRQ, no change to STOP or SLEEP, nothing. But they're convenient to hold an indirect address if you need to ;)

And i've found that the DSP will properly wrap $ffff->$0000 when decoding BRR samples, either mid-sample or in between samples.

BTW, who has the best known BRR formulas?
TRAC
SNEeSe Developer
Posts: 25
Joined: Sun Nov 14, 2004 12:46 pm

Post by TRAC »

anomie wrote:More SPC700 info tonight. On reset, the SPC700 side of the IO registers ($00f4-7) does get reset. I set things up with

Code:

        ldx #$0200 
        stx $2142 
        stz $2141 
        lda #$cc 
        sta $2140 
        stp 
And on reset $2140-1 were displaying $bbaa rather than what my code at $0200 would do.
I recently did some tests that verified that, at least for $2140-1, the 5A22 read-side of those registers is also cleared on reset. Not until enough time has passed for the IPL RAM clear to finish and write to the ports does anything nonzero appear on those ports. The time I measured before the SSMP starts executing the IPL after reset seems to be ABOUT 5 SSMP instruction cycles (fractional component involved, plus inaccuracies due to async clocks causing communications variances).
anomie wrote:$00f1 bits 4 and 5 clear the ports whenever you write 1, not just on a 0->1 transition. No effect on the 5A22 side of things.
Then, it is true that the ports are writable from the 5A22 side when $00f1.d4-5 are written and left high?
anomie wrote:And i've found that the DSP will properly wrap $ffff->$0000 when decoding BRR samples, either mid-sample or in between samples.
Very interesting, I don't believe that had been properly determined before now.
anomie wrote:BTW, who has the best known BRR formulas?
If I'm not mistaken, the best BRR formulae came from FatlXception/Brad Martin (libopenspc), though tracking down the last release of that, I believe 20030109, is not easily done. SNEeSe BRR code is greatly based on the results of his research that were either revealed by him on IRC and/or put in libopenspc.
anomie
Lurker
Posts: 151
Joined: Tue Dec 07, 2004 1:40 am

Post by anomie »

TRAC wrote:The time I measured before the SSMP starts executing the IPL after reset seems to be ABOUT 5 SSMP instruction cycles (fractional component involved, plus inaccuracies due to async clocks causing communications variances).
Only 5? hrm...
TRAC wrote:Then, it is true that the ports are writable from the 5A22 side when $00f1.d4-5 are written and left high?
Yes. In my tests, all my individual routines have a "go back to IPL" command so the next test can be uploaded. And that command stores #$b0 to $f1 and jumps to $ffc0, so those bits are 1 the whole time the IPL is transferring.
TRAC wrote:Very interesting, I don't believe that had been properly determined before now.
It really wasn't any harder than getting the thing to play a sound in the first place. Except that I made about 3 stupid bugs in the process. ;)
TRAC wrote:If I'm not mistaken, the best BRR formulae came from FatlXception/Brad Martin (libopenspc), though tracking down the last release of that, I believe 20030109, is not easily done. SNEeSe BRR code is greatly based on the results of his research that were either revealed by him on IRC and/or put in libopenspc.
I suspect someone put those into snes9x at some point too. I may check SNEeSe and see if they agree.

[later]

They agree, except for two things. For invalid headers, snes9x uses ~0x7FF, while SNEeSe uses ~0xFFF. And in the Method 2 and 3 filters, snes9x at one point uses "- (prev1 >> 1)" while SNEeSe uses "+ (-last2 >> 1)". Any comments?
TRAC
SNEeSe Developer
Posts: 25
Joined: Sun Nov 14, 2004 12:46 pm

Post by TRAC »

anomie wrote:They agree, except for two things. For invalid headers, snes9x uses ~0x7FF, while SNEeSe uses ~0xFFF. And in the Method 2 and 3 filters, snes9x at one point uses "- (prev1 >> 1)" while SNEeSe uses "+ (-last2 >> 1)". Any comments?
SNEeSe is incorrect on the invalid headers, will get a fix in CVS shortly. The BRR code in SNEeSe agrees (mostly) with libopenspc, and since Brad Martin is the one who did the bulk of the research in figuring out the details, I'd expect it's more correct. Unfortunately, I am not certain as to how much of the details of even libopenspc are exactly correct - in particular, in some places it shift-rights a negated value and then adds; in others, it shift-rights the original value and then subtracts. Due to the rounding differences, these are not the same. I never have asked him about it, though.
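The rounding difference is easy to show in isolation. The sketch below is not taken from libopenspc, snes9x or SNEeSe; the helper names are made up, and it only demonstrates that the two orderings disagree for odd inputs when the shift is an arithmetic (flooring) one.

```c
/* Arithmetic shift right by one that floors toward negative infinity,
   written so it doesn't rely on implementation-defined >> of negatives. */
static int asr1(int x)
{
    return x >= 0 ? x >> 1 : ~(~x >> 1);
}

/* "- (prev >> 1)": shift the original value, then subtract. */
static int shift_then_negate(int prev)
{
    return -asr1(prev);
}

/* "+ (-prev >> 1)": negate first, then shift, then add. */
static int negate_then_shift(int prev)
{
    return asr1(-prev);
}
```

For prev = 5 the first form contributes -2 and the second -3; for even values both agree. So the two styles can differ by one whenever the previous sample is odd.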
byuu

Post by byuu »

Ok, I'm pretty much at an impasse with DMA timing... I'm not going to be able to finish this without help.

Here are my latest test results:
http://setsuna.the2d.com/files/dma_delay_op.txt

I've optimized the results and removed all the extra tests I did to verify some things, so that it's easier to read.

This is my current implementation of the timing:

Code:

/*
  speed 0 = 6+6
  speed 1 = 8+6
  speed 2 = 8+8
*/
byte dma_base_init_timing_table[36] = {
//ring:
// 1,  2,  3
// |   |   |
// v   v   v

//pattern 0
  14, 12, 16, //speed 0
  20, 18, 22, //speed 1
  24, 24, 24, //speed 2

//pattern 1
  20, 18, 22, //speed 0
  20, 18, 16, //speed 1
  16, 16, 16, //speed 2

//pattern 2
  20, 18, 16, //speed 0
  14, 18, 16, //speed 1
  16, 16, 16, //speed 2

//pattern 3
  14, 18, 16, //speed 0
  14, 12, 16, //speed 1
  16, 16, 16  //speed 2
};

  if(value != 0x00) {
    bytes_transferred = 0;
    for(i=0;i<8;i++) {
      if(value & (1 << i)) {
        if(dma_channel[i].transfer_size == 0) {
          bytes_transferred += 65536;
        } else {
          bytes_transferred += dma_channel[i].transfer_size;
        }
      }
    }

    pattern = snes_time->dma_delay_pos >> 1;
    ring    = (bytes_transferred - 1) % 3; //subtract one to cast (1, 2, 3) -> (0, 1, 2)
    speed   = 0; //force fastrom 6+6 speed (for testing)

    index = (pattern * 9) + (speed * 3) + ring;

    snes_time->add_cpu_cycles(dma_base_init_timing_table[index]);
  }
I'm presently only handling the FastROM speeds. As I've mentioned before, I believe the next two cycles affects how long the base initialization is, and FastROM is always 6+6. I would need to make a lookup table for SlowROM, so that's why I don't have that in yet.

The ring code is pretty simple. It's just # of bytes transferred mod 3. I'm thinking that DMA is based around 24-bit transfers. It'd be like a memcpy on a PC: memcpy tries to use 32-bit transfers, but if the transfer isn't divisible by four, then it has to copy a word, and maybe a byte as well. I think the SNES is doing the same thing. Thusly, I don't think # of bytes transferred affects -base- timing, but that the extra time is added somewhere near the end of the DMA transfer. But you get the same result if you add the added delay to the beginning anyway.

I do know that the type of transfer ($43x0), the destination register ($43x1), and the channel specified for the transfer (0-7) do not affect base init. timing. Number of channels does.

The pattern thing I explained before, that's based on the cycle counter. It's (cycle counter >> 1) % 4

The above code works for every possible one channel transfer, but is off by 0 - 2 cycles when two or more channels are active.

I'm completely unable to find any logic to explain this, and I don't think I have the intelligence required to figure out how to add yet another factor into this (such as # of channels used for the transfer), as that would greatly increase the size of my table above. My current table took me three days to solve for.

What we need to do is try and figure out what the hell is going on with the table I have above (which I'm 99% sure is correct), and then apply that logic to what happens with more than one active channel.

Unfortunately, figuring this out exactly is pretty important to me. We need to understand DMA if we hope to understand HDMA timing, NMI+DMA timing, etc., and if I start half-assing my emulation by doing something like just always specifying an 18-cycle delay for each DMA transfer, then I'll end up with an unstable foundation. The more I emulate, the more the core bugs will cause problems with the higher-level code. And in the end, I end up with a glorified zsnes9x, with none of the features or speed :/

Here's a build of bsnes with the above code in place, which could be used to test things. And here is the test ROM I'm using for my tests.
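For anyone who wants to poke at the table lookup without the rest of the emulator, the index calculation from this post can be pulled out into a standalone function. This is a sketch: cycle_counter and total_bytes are hypothetical parameter names standing in for the emulator state used in the post, and total_bytes is assumed to be the sum over all enabled channels with a transfer size of 0 already counted as 65536.

```c
/* Index into the 36-entry table: 4 patterns x 3 speeds x 3 rings.
   total_bytes must be at least 1, which holds whenever a channel is enabled. */
static unsigned dma_table_index(unsigned long cycle_counter,
                                unsigned long total_bytes,
                                unsigned speed)
{
    unsigned pattern = (unsigned)((cycle_counter >> 1) % 4);
    unsigned ring    = (unsigned)((total_bytes - 1) % 3); /* 1,2,3 -> 0,1,2 */
    return pattern * 9 + speed * 3 + ring;
}
```

For example, a 3-byte transfer started with the cycle counter at 2 and speed 0 selects pattern 1, ring 2, which is entry 11 of the table.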
anomie
Lurker
Posts: 151
Joined: Tue Dec 07, 2004 1:40 am

Post by anomie »

Ack, we lost the DMA timing info! I guess i'll have to rewrite it... Anything else we're missing?

It seems to me that the DMA timing involves two different clocks. First, once $420b is written the S-CPU has time to complete one more CPU cycle before pausing for DMA (typically, this is the opcode load for the instruction after "STA $420B"). The cycle to be executed after the pause determines the CPU Clock Speed for the DMA transfer. This is the source of Byuu's "Ring".

The second clock is the DMA Clock, which always runs at 8 master cycles (starting from Reset, presumably. If you latch $2137 4 master cycles into the 6-cycle read, the clock starts at 0 i think). This is the source of Byuu's "pattern".

Starting from the pause, wait 2-8 master cycles to sync with the DMA Clock. Then do the transfer (i tried to DMA from $2137 to see when the actual transfer occurred within the DMA, but I couldn't get things to add up nicely). Then, wait 2-6 or 2-8 (or 2-12, i suppose) master cycles to sync back to the CPU Clock before continuing with the S-CPU's next instruction.

HDMA, both Init and Transfer, work in the same way: pause at the appropriate position (at the end of a CPU cycle), sync to the DMA Clock, do whatever needs doing, and sync back to the CPU Clock to continue.
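Here is one way the two-clock model might be written down. It's a sketch with several assumptions: the function and parameter names are made up, master-cycle time is counted from a point where the DMA Clock is at 0, the length of the transfer itself is passed in because it isn't nailed down above, and the CPU Clock phase is measured from the point of the pause.

```c
/* Cycles to wait until the next edge of a clock with the given period.
   Returns a full period when already aligned, so the result is never 0. */
static unsigned long sync_wait(unsigned long t, unsigned long period)
{
    return period - (t % period);
}

/* Total master cycles the S-CPU is stalled, from the pause until it resumes.
   cpu_cycle_len is 6, 8 or 12, taken from the cycle that follows the pause. */
static unsigned long dma_stall(unsigned long pause_time,
                               unsigned long transfer_cycles,
                               unsigned long cpu_cycle_len)
{
    unsigned long t = pause_time;
    t += sync_wait(t, 8);                           /* sync to the DMA Clock */
    t += transfer_cycles;                           /* the transfer itself */
    t += sync_wait(t - pause_time, cpu_cycle_len);  /* sync back to the CPU Clock */
    return t - pause_time;
}
```

Because all the cycle lengths involved are even, the first wait comes out as 2, 4, 6 or 8 and the second as 2 up to the CPU cycle length, which matches the ranges given above.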

Some more info on [H]DMA, BTW: They cannot access $4300-$437f, or $420b/c. Reads all return Open Bus, and writes have no effect.

When enabling HDMA mid-frame, there will be no transfer for the first line. After that, the Repeat bit takes effect as normal. BTW, the same thing happens if you just write $43xA mid-transfer, it will write or not for that line regardless of what you've just written, but after that the new Repeat bit takes effect. IOW, Repeat is checked at the end of the HDMA Transfer to determine whether or not to transfer for the next line.

And one more thing that we've never known before: $43x0 bit 7 does affect HDMA. Indirect mode is more useful, it will read the HDMA table as normal and write to rather than read from the specified indirect address. Direct mode will work though, it will read the $43xA values and try to write to (rather than read from) the data values in the HDMA table. Obviously, your HDMA table should be in RAM to make use of this. At least DMA does seem to ignore bit 6, and HDMA does seem to still ignore bits 3 and 4. Tests with those bits set gave identical results as those with the bit clear. And if bit 5 (or $43xB/F) does anything to either DMA or HDMA, I can't find any sign of it (sorry TRAC).
DMV27
New Member
Posts: 9
Joined: Thu Jan 27, 2005 5:03 pm

Post by DMV27 »

anomie wrote:Ack, we lost the DMA timing info! I guess i'll have to rewrite it... Anything else we're missing?
I saved a copy of the thread before the forum went down. I'll re-post everything that was lost later tonight.
byuu

Post by byuu »

Hopefully that includes my pseudo-code as well. I've been working on a small pet project recently, so I haven't had a chance to test out the DMA timing myself.
Really glad to hear that the DMA info (mostly) solves HDMA timing as well, though.
TRAC
SNEeSe Developer
Posts: 25
Joined: Sun Nov 14, 2004 12:46 pm

Post by TRAC »

(edited for clarity)

anomie, (re: SPC700) you mentioned that ADDW/SUBW set H flag on high byte; is Z flag set on high byte only also, or on the full result?

Also, is Z flag set on the high byte only or the full result for INCW/DECW?

Regarding registers FD-FF (timer counters) being reset on 'write'; have you tested them by writing to them using mov dp,dp? I have a suspicion (due to the timings) that nearly all other write opcodes (except PERHAPS mov dp,YA and likely mov (X),(Y)) read the destination address before writing to it...

Lastly, what all information/test data do you have on DIV? Is there any chance I could get that from you?
Last edited by TRAC on Tue Jun 07, 2005 11:17 pm, edited 1 time in total.
anomie
Lurker
Posts: 151
Joined: Tue Dec 07, 2004 1:40 am

Post by anomie »

TRAC wrote:anomie, (re: SPC700) you mentioned that ADDW/SUBW set H flag on high byte; is Z flag set on high byte only also, or on the full result?

Also, is Z flag set on the high byte only or the full result for INCW/DECW?
Full result for both.
TRAC wrote:Regarding registers FD-FF (timer counters) being reset on 'write'; have you tested them by writing to them using mov dp,dp? I have a suspicion (due to the timings) that nearly all other write opcodes (except PERHAPS mov dp,YA and likely mov (X),(Y)) read the destination address before writing to it...
You're right, of the MOV ops, only MOV (X)+,A and MOV dd,ds don't read the dest. BTW, i don't seem to have a listing for a MOV (X), (Y) opcode. And MOVW d,YA only reads the low byte of dest.
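As a quick illustration of the ADDW flag answer ("full result" for Z, high byte for H), a flag calculation along those lines could look like the sketch below. The type and function names are hypothetical, only Z, H and C are modelled, and H is taken to mean the carry out of bit 3 of the high byte (bit 11 of the 16-bit sum).

```c
#include <stdint.h>

/* Hypothetical result record for a 16-bit ADDW YA,word. */
typedef struct {
    uint16_t result;
    int z;  /* zero: set from the full 16-bit result */
    int h;  /* half-carry: carry out of bit 11, i.e. bit 3 of the high byte */
    int c;  /* carry out of bit 15 */
} addw_result;

static addw_result spc_addw(uint16_t ya, uint16_t word)
{
    uint32_t sum = (uint32_t)ya + word;
    addw_result r;
    r.result = (uint16_t)sum;
    r.z = (r.result == 0);
    r.h = ((ya ^ word ^ sum) & 0x1000) != 0; /* bit 12 of a^b^sum is the carry into bit 12 */
    r.c = sum > 0xFFFF;
    return r;
}
```

Note that $00FF + $0001 leaves the low byte at zero but does not set Z, since the full result is $0100.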
TRAC
SNEeSe Developer
Posts: 25
Joined: Sun Nov 14, 2004 12:46 pm

Post by TRAC »

anomie wrote:You're right, of the MOV ops, only MOV (X)+,A and MOV dd,ds don't read the dest. BTW, i don't seem to have a listing for a MOV (X), (Y) opcode. And MOVW d,YA only reads the low byte of dest.
Yeah, my bad. :oops: I've been going at this code too long, there is no MOV (X),(Y).
anomie
Lurker
Posts: 151
Joined: Tue Dec 07, 2004 1:40 am

Post by anomie »

TRAC wrote:Yeah, my bad. :oops: I've been going at this code too long, there is no MOV (X),(Y).
No problem. With the state of SPC700 documentation, i wouldn't have been surprised if there were such a mode misnamed as something else.
byuu

Post by byuu »

Seriously... I'd love to get my hands on some info to get a cycle-based SPC700 core started, much like in SNEeSe 0.84. I take it the implementation is mostly all theoretical at this point?
But anomie, your SPC docs are really good. They are in fact the only ones to explain registers like $f1 at all...