SPC700


byuu

Post by byuu »

Some tests I did today:
http://byuu.org/spc/spc_decadjust.zip
This validates the behavior of daa/das with C/H set and clear, for a total of 8 * 256 tests, 2 bytes per test (one result, one PSW value).
Overload's formula is perfect.

ZSNES has problems with both the results and flags, but daa seems to be more correct-ish than das.

http://byuu.org/spc/spc_movwdpyat0.zip
http://byuu.org/spc/spc_movwdpyat2.zip
The suffix of each ZIP name indicates the timer used. I used both timer 0 (more tests = more precision) and timer 2 (why not?) to run movw dp,ya multiple times and see how many APU clock cycles the instruction takes. I ran the test 128 times, because I can.
sfsound.txt says 4, anomie says 5.

Taking t0 as an example, I get these results on a real SNES:

Code:

00000000 0A0A 0A0A 0A0A 0A0A 0A0A 0A0A 0A0A 0A0A
00000010 0A0A 0A0A 0A0A 0A0A 0A0A 0A0A 0A0A 0A0A
00000020 0A0A 0A0A 0A0A 0A0A 0A0A 0A0A 0A0A 0A0A
00000030 0A0A 0A0A 0A0A 0A0A 0A0A 0A0A 0A0A 0A0A
00000040 0A0A 0A0A 0A0A 0A0A 0A0A 0A0A 0A0A 0A0A
00000050 0A0A 0B0A 0A0A 0A0A 0A0A 0A0A 0A0A 0A0A
00000060 0A0A 0A0A 0A0A 0A0A 0A0A 0A0A 0A0A 0A0A
00000070 0A0A 0A0A 0A0A 0A0B 0A0A 0A0A 0A0A 0A0A

In each test I ran movw dp,ya 256 times. Timer 0 ticks once every 128 APU clock cycles, so:
256*4=1024 cycles/128= 8 ticks
256*5=1280 cycles/128=10 ticks

t2 confirms the same thing.
Result: movw dp,ya is 5 cycles. sfsound.txt is incorrect. anomie wins three cookies.
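
The arithmetic above can be written out as a small check. This is only a sketch: it assumes timer 0 advances once every 128 APU clock cycles (the divisor used in the calculation above) and ignores any loop overhead around the instruction. The names `expected_ticks` and `check_movw_timing` are made up for illustration.

```c
#include <assert.h>

/* Assumption: timer 0 ticks once per 128 APU clock cycles. */
enum { CYCLES_PER_T0_TICK = 128 };

/* Hypothetical helper: timer ticks elapsed while one instruction is
   executed `runs` times at `cycles_per_insn` cycles each. */
unsigned expected_ticks(unsigned runs, unsigned cycles_per_insn)
{
    return (runs * cycles_per_insn) / CYCLES_PER_T0_TICK;
}

/* Compare both candidate cycle counts with the 0x0A seen in the dump. */
void check_movw_timing(void)
{
    assert(expected_ticks(256, 4) == 0x08); /* sfsound.txt's count */
    assert(expected_ticks(256, 5) == 0x0A); /* anomie's count */
}
```

Only the 5-cycle figure lands on the 0x0A that dominates the dump.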

As expected, ZSNES is off in center field with its timer results.
byuu

Post by byuu »

What the hell...

http://byuu.org/spc/spc_adcflags.zip
http://byuu.org/spc/spc_sbcflags.zip
http://byuu.org/spc/spc_cmpflags.zip

ZSNES fails spc_sbcflags.zip, but passes the other two.
I also tested adc/sbc CPU flags because something's screwy about the overflow flag. ZSNES passes both. Check this out:

Code:

[adc]
APU: x^y ^ y^r
int16 r = x + y + carry;
  overflow = !!(~(x ^ y) & (y ^ r) & 0x80);

CPU: x^y ^ x^r
int16 r = x + y + carry;
  overflow = !!(~(x ^ y) & (x ^ r) & 0x80);

[sbc]
APU: x^y ^ x^r
int16 r = x - y - !carry;
  overflow = !!((x ^ y) & (x ^ r) & 0x80);

CPU: x^y ^ x^r
int16 r = x - y - !carry;
  overflow = !!((x ^ y) & (x ^ r) & 0x80);

SBC works as expected on both the CPU and the APU, but look at ADC. On the CPU, you use x^y ^ x^r, but on the APU, you use x^y ^ y^r. Why?
I tested this and confirmed the results: when I swapped the APU formula, the APU test failed, and when I swapped the CPU formula, the CPU test failed.

Is there a bug in the APU processor? Every overflow algorithm I've seen has used x^y ^ x^r...

I also assume the behavior is the same for 16-bit adc/sbc/addw/subw? I'm sure as hell not testing 65536*65536 possible results to find out, nor am I going to make a test program that checks the values itself whilst running.
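
Short of a hardware run, an overflow formula can at least be checked against the arithmetic definition of signed overflow. The sketch below does that for the APU-style ADC expression exactly as posted above. It says nothing about what real hardware does, and all function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* APU-style ADC overflow, as written in the post above. */
int adc_overflow_apu(uint8_t x, uint8_t y, int carry)
{
    int r = x + y + carry;
    return !!(~(x ^ y) & (y ^ r) & 0x80);
}

/* Reference: does the signed 8-bit sum leave the range -128..127? */
int adc_overflow_reference(uint8_t x, uint8_t y, int carry)
{
    int sx = x < 128 ? x : x - 256; /* reinterpret as signed */
    int sy = y < 128 ? y : y - 256;
    int sum = sx + sy + carry;
    return sum < -128 || sum > 127;
}

/* Count disagreements over every 8-bit operand pair and carry state. */
int count_adc_mismatches(void)
{
    int mismatches = 0;
    for (int carry = 0; carry <= 1; carry++)
        for (int x = 0; x < 256; x++)
            for (int y = 0; y < 256; y++)
                if (adc_overflow_apu((uint8_t)x, (uint8_t)y, carry) !=
                    adc_overflow_reference((uint8_t)x, (uint8_t)y, carry))
                    mismatches++;
    return mismatches;
}
```

The 8-bit case is only 2 * 256 * 256 combinations, so brute force is cheap. The same approach for 16-bit operands would hit the 65536*65536 problem mentioned above.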
TRAC
SNEeSe Developer
Posts: 25
Joined: Sun Nov 14, 2004 12:46 pm
Contact:

Post by TRAC »

Overload, re: your SPC flag test program...

I was trying to debug why it was freezing on SNEeSe, obviously a SNEeSe problem since it works MOST of the time on the real hardware, when I noticed a shortcoming in the test source.

Sometimes, the test program seems to freeze on my SNES + copier. Most of the time, it does not. I suspect the cause is how the comm handles end-of-test in some cases.

In spc_adc_test, when a value is read off $F7 (high byte of movw ya, $F6) that has the high bit set, you end the test, and wait for the value to change, then move to the next test. However, the APU manual mentions what can happen when an I/O port read coincides with a write from the other end - that the value may not be exactly either the old value or the new value. If this happens on that movw, then the wait-for-change will end immediately (since the new value has stabilized), and then the next test will abort (high bit set), and you'll be in a proper end-of-test wait loop - but one test too late. Throw in a movw before the cmpw and the risk of that happening disappears.
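
The idea of not trusting a single port read can be sketched in C. Note that this is a variant, not TRAC's exact fix: TRAC suggests one extra movw, while the sketch below keeps re-reading until two consecutive reads agree. The accessor type and the fake port are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accessor: returns the byte currently visible on an I/O port. */
typedef uint8_t (*port_read_fn)(void);

/* Re-read until two consecutive reads agree, so that a value caught
   in the middle of a write from the other side is thrown away. */
uint8_t read_port_stable(port_read_fn read_port)
{
    uint8_t prev = read_port();
    for (;;) {
        uint8_t cur = read_port();
        if (cur == prev)
            return cur;
        prev = cur;
    }
}

/* Fake port for illustration: the first read returns a torn value with
   the high bit set, every later read returns the settled value. */
int fake_reads = 0;
uint8_t fake_port(void)
{
    return (fake_reads++ == 0) ? 0x81 : 0x01;
}

void check_stable_read(void)
{
    fake_reads = 0;
    assert(read_port_stable(fake_port) == 0x01); /* torn 0x81 is ignored */
}
```

With a single read, the torn 0x81 would have looked like an end-of-test marker.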

I don't know necessarily if that's the reason it freezes on my unit, however, as I haven't explored that. This is something to be aware of, though.
mozz
Hazed
Posts: 56
Joined: Mon Oct 10, 2005 3:12 pm
Location: Montreal, QC

Post by mozz »

byuusan wrote: Some tests I did today:
http://byuu.org/spc/spc_decadjust.zip
This validates the behavior of daa/das with C/H set and clear, for a total of 8 * 256 tests, 2 bytes per test (one result, one PSW value).
Overload's formula is perfect.

ZSNES has problems with both the results and flags, but daa seems to be more correct-ish than das.

ZSNES uses the x86 DAA/DAS instructions and doesn't do any work to emulate those except setting up flags. [as of 0909 wip]

As near as I can tell, bsnes uses the algorithm Overload posted. However, the algorithms described for DAA/DAS in the Intel instruction set reference doc appear to be completely equivalent to Overload's (as long as the x86 Auxiliary-carry flag and Carry flag are set to the SPC's H flag and C flag before the DAA/DAS is executed).

Here are the algorithms described in that doc, in pseudo-C code:

Code:

/* Note: AF = auxiliary carry flag (x86 half-carry flag) */
/****** DAA ******/

if (((AL & 0x0F) > 9) || AF == 1) {
    AL = AL + 6;
    CF = CF | CarryFromLastAddition; /* CF OR carry from AL = AL + 6 */
    AF = 1;
} else {
    AF = 0;
}
if (((AL & 0xF0) > 0x90) || CF == 1) {
    AL = AL + 0x60;
    CF = 1;
} else {
    CF = 0;
}

/****** DAS ******/

if (((AL & 0x0F) > 9) || AF == 1) {
    AL = AL - 6;
    CF = CF | BorrowFromLastSubtraction; /* CF OR borrow from AL = AL - 6 */
    AF = 1;
} else {
    AF = 0;
}
if ((AL > 0x9F) || CF == 1) {
    AL = AL - 0x60;
    CF = 1;
} else {
    CF = 0;
}

Now if you compare these algorithms with Overload's, they seem equivalent? The order of the if-tests is switched around, and the x86 handles AF=1 as a forced carry/borrow into the upper digit, but an assembler implementation that sets AF=H and CF=C and uses x86 DAA/DAS should work fine, should it not? (And since zsnes basically does this... is there something wrong with the flag-manipulating part of the zsnes code for these insns?)

[EDIT: AHA... I think I notice something... In order to match Overload's, you have to set up the x86 AF flag to have the value of the SPC's H flag *before* doing the DAA/DAS (no need to write it back after). I've corrected this above.
I use SAHF to set this flag and the carry flag at the same time, as that is easiest with my flags representation.]


Also regarding DIV, I was looking with puzzlement at byuu's implementation of it in bsnes v0.012, which seemed to be different from both zsnes and snes9x. I was wondering where that code came from... I now realize it's blargg's algorithm (cool :)) and there's this experimental evidence to back it up. Since zsnes and snes9x seem to both use a different algorithm (do a plain unsigned divide, and special-case X=0) and since I tested plain unsigned divide against blargg's algorithm and got completely different results from blargg's algorithm in many many cases, I currently believe that the DIV implementation in zsnes and snes9x produces wrong results. Is it just that there are no SPC programs out there that rely on this instruction's results in the overflow cases?
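
To get a feel for how large the "overflow cases" are, here is a sketch that only classifies inputs. It does not reproduce blargg's algorithm or hardware behaviour. It just asks when the true quotient YA / X no longer fits in the 8-bit A register, treating X = 0 as not fitting. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Does the true quotient YA / X fail to fit in 8 bits?
   X == 0 is counted as "does not fit". */
int div_quotient_overflows(uint16_t ya, uint8_t x)
{
    if (x == 0)
        return 1;
    return (ya / x) > 0xFF;
}

/* Equivalent test without a division: compare Y (high byte of YA) with X. */
int div_quotient_overflows_fast(uint16_t ya, uint8_t x)
{
    return (ya >> 8) >= x;
}

/* Count how many of the 65536 * 256 input pairs are overflow cases. */
unsigned long count_div_overflow_cases(void)
{
    unsigned long n = 0;
    for (unsigned ya = 0; ya <= 0xFFFF; ya++)
        for (unsigned x = 0; x <= 0xFF; x++)
            n += (unsigned long)div_quotient_overflows((uint16_t)ya, (uint8_t)x);
    return n;
}

void check_div_examples(void)
{
    assert(div_quotient_overflows(0x0200, 1) == 1); /* 512 does not fit */
    assert(div_quotient_overflows(0x00FF, 1) == 0); /* 255 fits */
}
```

By this count a little over half of all (YA, X) pairs are overflow cases, so a plain unsigned divide and a hardware-accurate algorithm have plenty of room to disagree.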
Last edited by mozz on Mon Oct 10, 2005 8:11 pm, edited 4 times in total.
mozz
Hazed
Posts: 56
Joined: Mon Oct 10, 2005 3:12 pm
Location: Montreal, QC

Post by mozz »

Regarding the cycle counts of the CBNE D and CBNE DX instructions ($2E and $DE).

bsnes v0.012 implements both of these instructions as 5/7 cycles: ifetch, fetch, fetch [+ add X], direct read, compare to A followed by 2 more cycles for a taken branch.

However, one of the SPC documents I have somewhere lists CBNE DX as 6/8 cycles, and the SPC cycle table in snes9x has values 5 and 6 for CBNE D and CBNE DX, respectively.

My own hunch is that CBNE DX should be 6/8, with the 3rd cycle just adding X to the direct offset and the 4th cycle being the direct read. This is consistent with other SPC instructions which use direct X addressing, where the address calculation gets its own cycle.

Does anyone know which is correct?
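
For reference, the two candidate breakdowns can be laid out side by side. This is a sketch of the description above, not a verified bus trace: the step labels and their order are assumptions made for illustration, and only the totals (5/7 versus 6/8) come from the discussion.

```c
#include <assert.h>
#include <stddef.h>

/* Per-cycle steps as described above; labels are illustrative only. */
static const char *const cbne_d_steps[] = {
    "opcode fetch", "fetch dp offset", "fetch branch offset",
    "direct page read", "compare with A",
};
static const char *const cbne_dx_steps[] = {
    "opcode fetch", "fetch dp offset", "fetch branch offset",
    "add X to dp offset", "direct page read", "compare with A",
};

/* Extra cycles when the branch is taken, per the 5/7 and 6/8 figures. */
enum { BRANCH_TAKEN_EXTRA = 2 };

/* Total cycles for CBNE d (indexed == 0) or CBNE d+X (indexed != 0). */
unsigned cbne_cycles(int indexed, int taken)
{
    size_t base = indexed
        ? sizeof cbne_dx_steps / sizeof cbne_dx_steps[0]
        : sizeof cbne_d_steps / sizeof cbne_d_steps[0];
    return (unsigned)base + (taken ? BRANCH_TAKEN_EXTRA : 0);
}

void check_cbne_totals(void)
{
    assert(cbne_cycles(0, 0) == 5 && cbne_cycles(0, 1) == 7);
    assert(cbne_cycles(1, 0) == 6 && cbne_cycles(1, 1) == 8);
}
```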
byuu

Post by byuu »

My CBNE DP,X is most likely wrong. Thanks.

Yeah, the DIV algorithm is from blargg. I tested a few overflow cases and only blargg's gave the correct results (well, anomie's probably does too, I didn't try it). However, I haven't tested all possible cases against the real hardware, but I believe blargg has.
I doubt any programs rely on the overflow exceptions...

I'll fix CBNE DP,X tonight.
anomie
Lurker
Posts: 151
Joined: Tue Dec 07, 2004 1:40 am

Post by anomie »

mozz wrote: However, the algorithms described for DAA/DAS in the Intel instruction set reference doc appear to be completely equivalent to Overload's (as long as the x86 Auxiliary-carry flag and Carry flag are set to the SPC's H flag and C flag before the DAA/DAS is executed).

Try A=0xFA to 0xFF (with ch clear) for DAA, and 0x9A-0x9F with CH set and 0x9A-0xA5 with C set and h clear for DAS.

mozz wrote: if (((AL & 0x0F) > 9) || AF == 1) {

Shouldn't that be AF==0 in DAS? And similarly for CF?

mozz wrote: I currently believe that the DIV implementation in zsnes and snes9x produces wrong results.

We all believe that. No one ever systematically tested DIV until recently.

mozz wrote: My own hunch is that CBNE DX should be 6/8, with the 3rd cycle just adding X to the direct offset and the 4th cycle being the direct read. This is consistent with other SPC instructions which use direct X addressing, where the address calculation gets its own cycle.

Assuming you're not counting the opcode fetch as Cycle 1, then that sounds about right. My own tests have indicated 6/8 for CBNE d+X,r.

byuusan wrote: (well, anomie's probably does too, I didn't try it)

It does. At one point, I ran every possible value through both algorithms and got identical results.
mozz
Hazed
Posts: 56
Joined: Mon Oct 10, 2005 3:12 pm
Location: Montreal, QC

Post by mozz »

byuusan wrote: My CBNE DP,X is most likely wrong. Thanks.

Most welcome.

anomie wrote: Try A=0xFA to 0xFF (with ch clear) for DAA, and 0x9A-0x9F with CH set and 0x9A-0xA5 with C set and h clear for DAS.

Thanks anomie, that was instructive. I guess I will abandon my attempt to use the x86 DAA/DAS instructions for these opcodes.

[EDIT: or not! I think the x86 pseudocode handles that case, and gives the same result as Overload's pseudocode. Notice that they OR in the carry from the AL=AL+6 step. If you pass in 0xFA, the x86 algorithm will add 6 to it and then OR in the carry, causing the 0x60 to be added as well, even though AL now contains 0x00. It doesn't matter what value the C flag ends up with, after all. However, with DAS you have to make sure C and H are inverted when you load them into CF and AF, because of the difference in handling borrows between SPC and x86. And you still need to produce a result for the C flag. All in all, easier to not use the x86 version.

So for example, on a DAA of 0xFA with C=H=0, both algorithms yield 0x60 with C=1. On a DAS of 0x99 with C=1,H=1 (i.e. CF=0,AF=0 in the x86 alg), both algorithms yield 0x99 with C=1 (i.e. CF=0). I'm 95% convinced they are equivalent, maybe I should just write a little test program to make sure.]
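
The two worked examples can be replayed with a direct C transcription of the x86 pseudo-code quoted earlier in the thread. This is a sketch under stated assumptions: it follows the quoted pseudo-code rather than real x86 silicon, and whether it matches SPC700 hardware for every input is exactly what is being debated here. The struct and function names are invented, and only AL and the carry are meant to be read from the DAS wrapper's result.

```c
#include <assert.h>
#include <stdint.h>

/* Register state used by the quoted pseudo-code. */
struct bcd_state { uint8_t al; int cf; int af; };

/* DAA as in the quoted pseudo-code. */
struct bcd_state x86_daa(struct bcd_state s)
{
    if ((s.al & 0x0F) > 9 || s.af) {
        unsigned sum = s.al + 6u;
        s.cf = s.cf | (sum > 0xFF);      /* CF OR carry out of AL + 6 */
        s.al = (uint8_t)sum;
        s.af = 1;
    } else {
        s.af = 0;
    }
    if ((s.al & 0xF0) > 0x90 || s.cf) {
        s.al = (uint8_t)(s.al + 0x60);
        s.cf = 1;
    } else {
        s.cf = 0;
    }
    return s;
}

/* DAS as in the quoted pseudo-code. */
struct bcd_state x86_das(struct bcd_state s)
{
    if ((s.al & 0x0F) > 9 || s.af) {
        s.cf = s.cf | (s.al < 6);        /* CF OR borrow out of AL - 6 */
        s.al = (uint8_t)(s.al - 6);
        s.af = 1;
    } else {
        s.af = 0;
    }
    if (s.al > 0x9F || s.cf) {
        s.al = (uint8_t)(s.al - 0x60);
        s.cf = 1;
    } else {
        s.cf = 0;
    }
    return s;
}

/* Hypothetical SPC-side wrapper for DAS: the SPC C and H flags mean
   "no borrow", so they are inverted going in, and C is inverted coming out. */
struct bcd_state spc_das_via_x86(uint8_t a, int c, int h)
{
    struct bcd_state s = { a, !c, !h };
    s = x86_das(s);
    s.cf = !s.cf;                        /* back to SPC carry polarity */
    return s;
}

/* The two worked examples from the post above. */
void check_worked_examples(void)
{
    struct bcd_state in = { 0xFA, 0, 0 };
    struct bcd_state r = x86_daa(in);
    assert(r.al == 0x60 && r.cf == 1);   /* DAA of 0xFA, C=H=0 */
    r = spc_das_via_x86(0x99, 1, 1);
    assert(r.al == 0x99 && r.cf == 1);   /* DAS of 0x99, C=H=1 */
}
```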
anomie wrote:
if (((AL & 0x0F) > 9) || AF == 1) {
Shouldn't that be AF==0 in DAS? And similarly for CF?

Nope. The x86 CF and AF are 1 for a borrow, 0 for not-borrow. This is the opposite of the way the SPC flags C and H work with subtraction. I'd better double-check my handling of the H and C flags in my SBC opcodes... *sigh*
byuu

Post by byuu »

If you're that desperate to squeeze out performance that you can't afford a couple lines of C inside the rarely-used SPC700 DAA/DAS opcodes, then you have more serious problems to worry about.

It's cool to try and speed things up and all, but seriously... you won't even notice the difference. It isn't worth worrying about.

If this is purely academic, then I guess what the hell... have fun.
mozz
Hazed
Posts: 56
Joined: Mon Oct 10, 2005 3:12 pm
Location: Montreal, QC

Post by mozz »

byuusan wrote: If you're that desperate to squeeze out performance that you can't afford a couple lines of C inside the rarely-used SPC700 DAA/DAS opcodes, then you have more serious problems to worry about.

It's cool to try and speed things up and all, but seriously... you won't even notice the difference. It isn't worth worrying about.

If this is purely academic, then I guess what the hell... have fun.

It's mostly just for fun. x86 DAA/DAS instructions probably don't perform well on modern processors anyway (though better than the equivalent sequence of simple x86 instructions). It's just that many x86 instructions generate the flags needed for corresponding SPC instructions and I wanted to push this to the limit with DAA/DAS. :lol:

My core will be in assembly and optimized for speed on P2/P3/PM, except for a major design choice of using a jump through a register to chain together *two* handlers for a lot of the opcodes. This is a size optimization and I have no idea what the performance hit is. I have reason to believe it is small on P2/P3, but it might be slow on Athlons and P4s, I don't know. My core will also do accurate cycle counting (at least an accurate count must be available at the beginning of any memory access cycle which can access ports). There is no real reason to write something like this except that (a) I want to, and (b) it doesn't seem to have been done before.
TRAC
SNEeSe Developer
Posts: 25
Joined: Sun Nov 14, 2004 12:46 pm
Contact:

Post by TRAC »

IIRC, DAA/DAS on x86 can have a 'double carry/borrow' into the high nybble for invalid values, and the 6502-family and SPC700 do not have that issue. Because of this, for accurate outputs of SPC700 DAA/DAS, x86 DAA/DAS is useless. Also, because of the differences in results, the flags are not the same, either. x86 AAA/AAS CAN be useful, however (see implementation in SNEeSe).
mozz
Hazed
Posts: 56
Joined: Mon Oct 10, 2005 3:12 pm
Location: Montreal, QC

Post by mozz »

TRAC wrote: IIRC, DAA/DAS on x86 can have a 'double carry/borrow' into the high nybble for invalid values, and the 6502-family and SPC700 do not have that issue. Because of this, for accurate outputs of SPC700 DAA/DAS, x86 DAA/DAS is useless. Also, because of the differences in results, the flags are not the same, either. x86 AAA/AAS CAN be useful, however (see implementation in SNEeSe).

Ok, that is good to know. Thanks TRAC.
creaothceann
Seen it all
Posts: 2302
Joined: Mon Jan 03, 2005 5:04 pm
Location: Germany
Contact:

Post by creaothceann »

mozz wrote: x86 DAA/DAS instructions probably don't perform well on modern processors anyway

Keep the next generation (64-bit) CPUs in mind as well... :wink:
vSNES | Delphi 10 BPLs
bsnes launcher with recent files list
grinvader
ZSNES Shake Shake Prinny
Posts: 5632
Joined: Wed Jul 28, 2004 4:15 pm
Location: PAL50, dood !

Post by grinvader »

Infamous bump because it's too tiny to make a new thread about it.

Code:

<MathOnNapkins> anyone here know anything about register $F0 on the spc?
anomie wrote:

Code:

$00f0 -w TEST - Testing functions
        ssssTRrt
        
        ssss = CPU speed control (doesn't affect timer rate)
          0 = normal CPU rate
          1 = 3/5 normal rate
          2 = 3/9 normal rate
          3 = 3/17 normal rate
          4 = 3/4 normal rate
          5 = 3/6 normal rate
          6 = 3/10 normal rate
          7 = 3/18 normal rate
          8 = 3/6 normal rate
          9 = 3/8 normal rate
          A = 3/12 normal rate
          B = 3/20 normal rate
          C = 3/10 normal rate
          D = 3/12 normal rate
          E = 3/16 normal rate
          F = 3/24 normal rate
        ** Settings other than 0 may lock up the SPC700! **

Code:

<MathOnNapkins> all I really wanted to say is I figured out the numerical pattern for the speed settings
<MathOnNapkins> I doubt most emulators even implement the different speed settings
<MathOnNapkins> but you never know
<MathOnNapkins> instead of ssss, it should be ttss, and the speed is 3 / (2^(ss+1) + 2^tt)

Those are 'power' ^, btw.

Using that ttss notation, it obviously gives

Code:

speed_denom = (1<<(ss+1)) + (1<<tt);

No idea what we currently know/do about it. Thank MON for all that.
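
The formula is easy to check against anomie's table. The sketch below copies the sixteen denominators from the table (with "normal" taken as 3/3) and compares them with the ttss formula. The function and array names are made up for this example.

```c
#include <assert.h>

/* Denominators from anomie's table, indexed by the 4-bit speed field.
   Entry 0 ("normal CPU rate") is taken as 3/3. */
static const unsigned table_denom[16] = {
    3, 5, 9, 17,   4, 6, 10, 18,   6, 8, 12, 20,   10, 12, 16, 24
};

/* MathOnNapkins' formula: tt is the high two bits, ss the low two bits. */
unsigned speed_denom(unsigned field)
{
    unsigned tt = (field >> 2) & 3;
    unsigned ss = field & 3;
    return (1u << (ss + 1)) + (1u << tt);
}

/* Number of table entries the formula fails to reproduce. */
unsigned count_formula_mismatches(void)
{
    unsigned bad = 0;
    for (unsigned i = 0; i < 16; i++)
        if (speed_denom(i) != table_denom[i])
            bad++;
    return bad;
}

void check_speed_formula(void)
{
    assert(count_formula_mismatches() == 0);
}
```

All sixteen entries agree, so the ttss reading reproduces the table exactly.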
Everyone shut up and follow me!!

Code:

<jmr> bsnes has the most accurate wiki page but it takes forever to load (or something)
Pantheon: Gideon Zhi | CaitSith2 | Nach | kode54
byuu

Post by byuu »

Neat, thanks for the info.

I used to emulate it, but I took it out when I was removing the S-DSP enslavement from the S-SMP for blargg's new 1.024MHz S-DSP core. I believe anomie said that the speed setting doesn't affect the S-DSP, which would be really odd, since the S-DSP interleaves S-SMP reads with its own reads. That's probably why the thing crashes when you try setting other speeds: bus conflicts probably start feeding the SMP data from the DSP accesses.

As of now, no emulator does anything with the speed control. I do support most of the other bits of the test register at the moment. I was thinking of adding a pedantic mode setting for programmers using emulators, to crash hard on things like DMA<>HDMA conflict, reading mul/div registers early, etc -- this could go in there as well. Just forcefully deadlock the SMP whenever you change the speed setting. The idea is that since we can't at this time emulate these hardware bugs properly, the least we can do is make it not look like what they're trying to do is okay.
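
A minimal sketch of that pedantic-mode idea might look like the following. Everything here is hypothetical: the struct, its fields, and the function name are invented for illustration and are not taken from bsnes. The only detail borrowed from the thread is that the speed control sits in the upper four bits of $F0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical emulator state; none of these names come from bsnes. */
struct smp_state {
    bool pedantic;     /* "pedantic mode" switch for programmers */
    bool deadlocked;   /* once set, the SMP core stops executing */
    uint8_t test_reg;  /* last value written to $F0 */
};

/* Handle a write to the TEST register ($F0). The upper four bits are
   the speed control; in pedantic mode any change to them deadlocks. */
void smp_write_test(struct smp_state *smp, uint8_t value)
{
    uint8_t old_speed = smp->test_reg >> 4;
    uint8_t new_speed = value >> 4;

    smp->test_reg = value;
    if (smp->pedantic && new_speed != old_speed)
        smp->deadlocked = true;
}

void check_pedantic_deadlock(void)
{
    struct smp_state smp = { true, false, 0x00 };
    smp_write_test(&smp, 0x0A);  /* speed bits unchanged */
    assert(!smp.deadlocked);
    smp_write_test(&smp, 0x1A);  /* speed bits changed */
    assert(smp.deadlocked);
}
```

Outside pedantic mode the write is simply stored, which matches the current "do nothing with the speed bits" behaviour described above.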
creaothceann
Seen it all
Posts: 2302
Joined: Mon Jan 03, 2005 5:04 pm
Location: Germany
Contact:

Post by creaothceann »

That's a nice idea. :)
neo_bahamut1985
-Burninated-
Posts: 871
Joined: Mon Sep 10, 2007 11:33 pm
Location: Unspecified

Post by neo_bahamut1985 »

Yes! Some new news about the SPC700 core!
I'm the man who's going to take you down! Pleased to meet you! You are already dead...