bsnes v0.042 released

Archived bsnes development news, feature requests and bug reports. Forum is now located at http://board.byuu.org/
byuu

Post by byuu »

I mentioned I wouldn't be posting a new WIP for a while so that I could work on something in secret. That way in case it didn't work out, nobody would be bummed out. Imagine my surprise when it only took me two days to get this far ...

[Images: six screenshots]
(I removed the title-bar text for the sake of the screenshot aesthetic. Check the WIP yourself if you don't believe it.)

Kirby's Dream Land 3 and Dragon Ball Z: Hyper Dimension are fully playable. Note that most games aren't playable, and most of the chip's added features are missing.

Speed took a ~3-5% hit for non-SA1 games due to all the new co-processor thread synchronization primitives that you can't really hide from inlined, super-intensive sections of the scheduler code.

As of now (and this will change), SA-1 games run ~60% slower than normal games, meaning you'll really want at least an E4500, but preferably an E8400, and no filters.

The most impressive part is that I emulate this at the bus/clock level, meaning that if both the S-CPU and SA-1 access RAM at the same time, they'll see each other's changes and stay perfectly in sync. I even emulated the bus conflict resolution of the SA-1 memory controller. So in terms of accuracy, this is akin to the cycle-level S-PPU. It's the "theoretical worst case": the most processor-intensive, lowest-level emulation achievable.

I believe it was _Demo_ who speculated that it'd take at least a 10GHz processor to achieve this. Then again, it's been so long I could be attributing the quote to the wrong person. Don't even remember the exact words anymore. Anyone recall?

This gives us insight into the kind of performance we can expect from the cycle-PPU (also runs at 10.74MHz) and SuperFX. For SA-1+cycle S-PPU, it would appear that there is no processor on the market that can maintain full speed with that combo yet, heh. By the time I get around to S-PPU, there most likely will be though.

Lastly, don't bug me about SuperFX support because of this. This SA-1 support is a simple subclass of the core S-CPU that already existed in cycle-perfect, bug-free form; plus a memory mapper and ALU. Lots more to go, and even then, this is easily multiple times less work than the SuperFX is going to be.
franpa
Gecko snack
Posts: 2374
Joined: Sun Aug 21, 2005 11:06 am
Location: Australia, QLD

Post by franpa »

<3 Byuu for making SMRPG work :P
Core i7 920 @ 2.66GHZ | ASUS P6T Motherboard | 8GB DDR3 1600 RAM | Gigabyte Geforce 760 4GB | Windows 10 Pro x64
lordmissus
Ignorant Child
Posts: 326
Joined: Mon Apr 06, 2009 10:10 pm
Location: 1984

Post by lordmissus »

Byuu, I used to consider you a god. Now I consider you a double-god.
Jipcy
Veteran
Posts: 768
Joined: Thu Feb 03, 2005 8:18 pm

Post by Jipcy »

Amazing! It's a real thrill to see some more progress on actual emulation. I imagine it feels good for you too.

I can't wait to see where this goes.
[url=http://zsnes-docs.sf.net]Official ZSNES Docs[/url] | [url=http://zsnes-docs.sf.net/nsrt]NSRT Guide[/url] | [url=http://endoftransmission.net/phpBB3/viewtopic.php?t=394]Using a Wiimote w/ emulators[/url]
Palin
Hazed
Posts: 96
Joined: Tue Nov 08, 2005 12:40 pm

Post by Palin »

Very impressive.

I had given up hope of seeing those games running in BSNES.
I.S.T.
Zealot
Posts: 1325
Joined: Tue Nov 27, 2007 7:03 am

Post by I.S.T. »

YESSS

As a side note, one of the few games to use the SA-1 was a friggin' Power Rangers Zeo game.

Sort of odd that a game that could have been crapped out the door uses one of the best SNES enhancement chips...
Clements
Randomness
Posts: 1172
Joined: Wed Jul 28, 2004 4:01 pm
Location: UK

Post by Clements »

It's amazing to see these games going in bsnes. Accurate SA-1 emulation is sorely needed, regardless of speed.
tetsuo55
Regular
Posts: 307
Joined: Sat Mar 04, 2006 3:17 pm

Post by tetsuo55 »

60% slower isn't bad considering the SA-1 is basically an overclocked snes CPU
byuu

Post by byuu »

68fps in Kirby 3's most intensive part on a stock 2.4GHz E4600. What a terrible value compared to the E8400 for $25 more. Throw in +15% for profiling.
franpa wrote: <3 Byuu for making SMRPG work
It doesn't run just yet. Stops after Mario jumps at the screen. Probably needs IRQ support.

And I'm going to be goddamn pissed if it locks up after gaining a level :P
Jipcy wrote: It's a real thrill to see some more progress on actual emulation. I imagine it feels good for you too.
Definitely. While a really good GUI is important too, it's not why I got into this. Love working on real stuff for a change.
I.S.T. wrote: Sort of odd that a game that could have been crapped out the door uses one of the best SNES enhancement chips...
I'm kind of in shock as to how little the SA-1 chips are even utilized. I'm missing 2/3rds of the chips' added features, and I only see one spot in Kirby 3 that isn't perfect.

It's kind of like those V8 engines in old-person cars like the Grand Marquis. It boggles the mind.
tetsuo55 wrote: 60% slower isn't bad considering the SA-1 is basically an overclocked snes CPU
It's "four" times faster, until you count the added overhead. Instructions that modify the program counter incur two extra cycles, a branch to an odd address another two, BW-RAM access another two, and simultaneous S-CPU + SA-1 ROM access another two. So it's effectively something like ~7.85MHz. That is, unless you like programming with 2,048 bytes of memory.

And yeah, the speed is amazing. Not because of the SA-1's clock in and of itself (next gen goes to 100-400MHz), but because of the number of cooperative thread switches needed to keep the two perfectly in sync at all times. I'm not going to bother myself, but one could easily relax the timing to opcode-level syncing (take the sync() out of addclocks_cop() and call it after each opcode dispatch instead) and probably only need half the added power (~30% more.)
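
As a rough illustration of that trade-off, the sketch below counts how often a coprocessor would yield to the scheduler under each policy. Everything in it is hypothetical: the `Coprocessor` struct, its members, and the opcode lengths are stand-ins rather than bsnes code, and it assumes each bus cycle costs two master clocks.

```cpp
#include <cstdint>

// Hypothetical stand-in for a coprocessor thread; not the actual bsnes class.
struct Coprocessor {
  bool syncPerClock;    // true: sync inside addClocks(); false: sync once per opcode
  std::uint32_t clock;  // master clocks consumed so far
  std::uint32_t syncs;  // number of times we yielded to the scheduler

  constexpr explicit Coprocessor(bool perClock)
      : syncPerClock(perClock), clock(0), syncs(0) {}

  // Stand-in for a cooperative thread switch back to the S-CPU.
  constexpr void sync() { ++syncs; }

  constexpr void addClocks(std::uint32_t n) {
    clock += n;
    if (syncPerClock) sync();  // clock-level: check in after every bus cycle
  }

  // Runs one opcode that takes busCycles bus cycles of 2 master clocks each.
  constexpr void runOpcode(std::uint32_t busCycles) {
    for (std::uint32_t i = 0; i < busCycles; ++i) addClocks(2);
    if (!syncPerClock) sync();  // opcode-level: check in once per dispatch
  }
};

// Executes the same three made-up opcodes (2, 3 and 6 bus cycles long)
// and reports how many scheduler syncs that required.
constexpr std::uint32_t syncsFor(bool perClock) {
  Coprocessor cop(perClock);
  cop.runOpcode(2);
  cop.runOpcode(3);
  cop.runOpcode(6);
  return cop.syncs;
}
```

With these made-up opcodes, `syncsFor(true)` is 11 and `syncsFor(false)` is 3. The relaxed policy saves thread switches at the cost of no longer observing bus state in the middle of an opcode.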

I also have some more speed-ups in mind: don't need to sync on bus acknowledge, I/O, or S-CPU access to things the SA-1 can't see like WRAM.

And there may be more speed hits, depending on how the H/V counter works. Neither ZSNES nor Snes9X implements the counters at all (well, ZSNES adds 23 to the counter in a function called SA1Debug, but it's not handling things like H/V vs linear mode, or counter reset), so I guess nothing uses them. It's also not technically possible to emulate it the same way as the S-CPU does, via V/Hblank pins from S-PPU2 -> S-CPU, because those pins aren't connected to the cart bus. Neither is the S-CPU /NMI line, so no idea how NMIs work there either.

So, who knows what the final speed will look like. But it's an order of magnitude better than I thought it would be.
Last edited by byuu on Tue Apr 07, 2009 4:59 pm, edited 2 times in total.
Verdauga Greeneyes
Regular
Posts: 347
Joined: Tue Mar 07, 2006 10:32 am
Location: The Netherlands

Post by Verdauga Greeneyes »

Wow, sweet. What a great thing to come home to :) I'm not sure how to read your first sentence in the post above - how does 68fps on the E4600 compare to your E8400? I'd be interested in seeing what performance will look like on the i7.. if you're looking at future CPUs that's your best bet for now. (might be best to wait for a finished and profiled version before making serious comparisons though)
byuu

Post by byuu »

Damn ... I still can't believe I'm running Kirby 3 at full speed on my work PC (it used to be a P4 1.4GHz for the last three years. Just got the E4600 system upgrade.) Wonder how quick I'd get fired for bringing in my Xbox 360 controller ;)

Oh and that cat is fucking awesome:
Image

For those interested, a look at the architecture (src/chip/sa1):

I decided against the Star Ocean approach of manually transforming addresses against the memory map settings. Instead, I use Bus::map() to dynamically change the layout when needed. I actually create a secondary bus (named SA1Bus) to handle the different address bus layouts between the two chips.

Though definitely overkill for the normal S-CPU, the SA-1 utilizes every last bit of flexibility in Bus::map() to pull this off. I use the size delimiter for 00-3f|80-bf:6000-7fff BW-RAM page mirroring, the built-in max size wrapping with Nach's mirror() function so that no matter what you set the MMC to, it won't access memory out of bounds, and I actually override the MMIO dispatch table for 00-3f|80-bf:3000-37ff for faster S-CPU side IRAM access. Both chips share the MMIO table between 00-3f|80-bf:2200-23ff, and the SA-1 has the rest of the MMIO range unmapped.

The SA-1 CPU and the S-CPU opcode cores both inherit from CPUcore (src/cpu/core). Nice thing is that both modules build in 1/10th of a second for changes, and the 2-3 second core only has to be built once. After I port the S-SMP to the new pre-processor, I'll make an SMPcore class. So if anyone wants to try something crazy, they can change:

Code:

class SA1 : public CPUcore
to:

Code:

class SA1 : public SMPcore
... and have a 10.74MHz S-SMP co-processor :D

Done more for consistency than for need, of course.

I deferred S-CPU reset vector fetch until emulation begins, allowing mapping after CPU::power/reset() to work. 100% of the SA-1 for both busses is mapped in SA1Bus::init(). Will probably handle BS-X this way in the future.

The current bus collision detection is pretty simple. CPUcore gains regs.bus, which holds the current address bus value. It's set before a memory read / write begins, and cleared for I/O access.

The SA-1 checks to see what area it's accessing for its timing. If it's a shared area, it checks the S-CPU regs.bus value. If they point to the same logical device, you get a speed penalty added immediately.
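
A minimal sketch of that check, under stated assumptions: `Device`, `classify()`, `kBusIdle` and `sa1AccessClocks()` are hypothetical names, the address ranges are a simplified view of the SA-1 memory map, and the base cost and penalty of two clocks each are illustrative figures rather than measured timings. Only the idea of comparing against the S-CPU's `regs.bus` value comes from the description above.

```cpp
#include <cstdint>

// Logical devices both processors can reach (simplified; an assumption).
enum class Device { None, Rom, BwRam, Iram };

// Hypothetical sentinel meaning "regs.bus was cleared for an I/O cycle".
constexpr std::uint32_t kBusIdle = 0xffffffffu;

// Maps a 24-bit address to the device behind it.
constexpr Device classify(std::uint32_t addr) {
  if (addr == kBusIdle) return Device::None;
  const std::uint32_t bank = (addr >> 16) & 0xff;
  const std::uint32_t offset = addr & 0xffff;
  if (bank >= 0xc0) return Device::Rom;                     // c0-ff:0000-ffff
  if (bank >= 0x40 && bank <= 0x4f) return Device::BwRam;   // 40-4f:0000-ffff
  if ((bank & 0x7f) <= 0x3f) {                              // 00-3f|80-bf
    if (offset >= 0x8000) return Device::Rom;               // :8000-ffff
    if (offset >= 0x6000) return Device::BwRam;             // :6000-7fff
    if (offset >= 0x3000 && offset <= 0x37ff) return Device::Iram;  // :3000-37ff
  }
  return Device::None;  // WRAM, MMIO, unmapped: nothing the two chips contend for
}

// Clocks charged to the SA-1 for one access, given what the S-CPU has on its bus.
constexpr unsigned sa1AccessClocks(std::uint32_t sa1Addr, std::uint32_t cpuBus) {
  unsigned clocks = 2;  // assumed base cost of one SA-1 cycle
  const Device target = classify(sa1Addr);
  if (target != Device::None && target == classify(cpuBus)) {
    clocks += 2;        // assumed penalty when both chips hit the same device
  }
  return clocks;
}
```

For example, `sa1AccessClocks(0x008000, 0x7e0000)` models the SA-1 reading ROM while the S-CPU runs from WRAM and costs 2 clocks, whereas `sa1AccessClocks(0x008000, 0xc01234)` models both chips in ROM and costs 4.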

The more proper way to do this would be to cut sCPU::bus_write() in half like I do with bus_read(), and only hold the regs.bus value for half of each cycle (~5.38MHz.) Then when the SA-1 tries to access the same chip, it should instead keep waiting until the S-CPU is not asserting the bus. Problem is this will slow down non-SA1 games by quite a bit (maybe 10%?) for no gain to them. Might be worth making bus_read/write a function pointer. Make up the penalty by not setting regs.bus at all in the normal method.

Still, I believe the end result will be ~99% the same, timing wise. The most important part, the major speed difference between SA-1 running in ROM while S-CPU runs in WRAM vs both running in ROM, is emulated well either way.

This is where syncing at the clock/bus-level really shines: without it, you wouldn't be able to reliably detect this edge case. The best you could do would be to look at cpu.regs.pc, which would fail to catch penalties from WRAM code accessing ROM.

Given this kind of performance ... I'm starting to think there'd be no point to true multi-threading of the individual components. We'd need at least a hexa-core, and by then one core is already fast enough. It'd only benefit cycle-PPU+SA-1 (assuming it even worked, and the mutexes didn't kill performance), but really ... that's not meant for gaming anyway. Probably not worth the trouble when otherwise the cothreaded model is more than sufficient.

I recall Snes9X required an absolute top-of-the-line system at first, and when SA-1 support first appeared. And ten years later you can run it on cell phones. Let's hope Moore's Law keeps up ;)

I wish I knew more about the Genesis architecture. If it syncs at the same speed, then Steve Snake would be right about not needing multi-threading there either. Still, Nemesis' is the emulator to watch for multi-threaded processor design.
Verdauga Greeneyes wrote: how does 68fps on the E4600 compare to your E8400?
E8400 gets ~87fps at stock 3GHz. Easy to push those things to 3.6GHz, too. The Core i7 will be slower due to anemic stock clock and L2 cache. E8400 is a much better value for single-threaded apps.

That's also the absolute worst-case and without PGO. Average SA-1 speed would be ~110fps + ~15% for E8400.
FitzRoy
Veteran
Posts: 861
Joined: Wed Aug 04, 2004 5:43 pm
Location: Sloop

Post by FitzRoy »

Awesome! Can't wait to play KDL3 for its Jun Ishikawa score.

I've tested a lot of troublemakers after the opcode rewrite, as I'm sure you have. Seems fine, but I'll still want to retest the whole library again at some point, maybe a year from now.
adventure_of_link
Locksmith of Hyrule
Posts: 3634
Joined: Sun Aug 08, 2004 7:49 am
Location: 255.255.255.255

Post by adventure_of_link »

Wouldn't get your hopes up about SA-1 games on Snes9X and it running on handheld devices, byuu. The PSP port (even though it's ages old) barely manages to get 5-6 FPS. Surprisingly, the Wii port gets a playable speed (however, I feel that on the Wiimote+Nunchuk the timed hits are off somewhat in Super Mario RPG).

Other than this, nice job Byuu :D
<Nach> so why don't the two of you get your own room and leave us alone with this stupidity of yours?
NSRT here.
ShadowFX
Regular
Posts: 265
Joined: Thu Jul 29, 2004 8:55 am
Location: The Netherlands

Post by ShadowFX »

Awesome progress! I'll be waiting for more to come...
[i]"Change is inevitable; progress is optional"[/i]
grinvader
ZSNES Shake Shake Prinny
Posts: 5632
Joined: Wed Jul 28, 2004 4:15 pm
Location: PAL50, dood !

Post by grinvader »

in before "about fucking time"




still, about fucking time
now work on superfx2





with love,
grin
皆黙って俺について来い!!

Code:

<jmr> bsnes has the most accurate wiki page but it takes forever to load (or something)
Pantheon: Gideon Zhi | CaitSith2 | Nach | kode54
Killa B
♥ Love Freak FlonneZilla ♥
Posts: 111
Joined: Sun Apr 01, 2007 12:59 am
Location: USA

Post by Killa B »

:(
Image

That is awesome, though. :P

I feel silly for asking this, but are the WIPs publicly available, or do you just use them for taking progress screenshots?
Last edited by Killa B on Tue Apr 07, 2009 6:57 pm, edited 2 times in total.
Dullaron
Lurker
Posts: 199
Joined: Mon Mar 10, 2008 11:36 pm

Post by Dullaron »

Wow. I'm so happy byuu. You just can't stop working on bsnes. You want more. :D
Window Vista Home Premium 32-bit / Intel Core 2 Quad Q6600 2.40Ghz / 3.00 GB RAM / Nvidia GeForce 8500 GT
Canar

Post by Canar »

byuu wrote:Oh and that cat is fucking awesome:
Image
Hi. :D
byuu

Post by byuu »

Pop Star area is screwed in Kirby, heh. At least it's usable.
FitzRoy wrote: Seems fine, but I'll still want to retest the whole library again at some point, maybe a year from now.
I broke something in WIP03, forgot to update status.wai_lock to regs.wai. But I didn't post that one even privately, so. Thanks for testing :D
FitzRoy wrote: Seems fine, but I'll still want to retest the whole library again at some point, maybe a year from now.
Too soon. Should wait for perfect PPU + SuperFX, then just test every single game and document all issues.
grinvader wrote: still, about fucking time
now work on superfx2
I really believed it'd require at least 5-8GHz for full speed. I'm still in shock as to how easy this part has been. Worst is yet to come of course.

And SuperFX2 gives us something to look forward to in the future. If I support that, then what will people clamor for? Quick-move Shogi Match with Nidan Rank-holder Morita II? :(
Killa B wrote: That is awesome, though.
Well it's only been two days so far. Hoping it just needs S-CPU IRQ notification.
Canar wrote: Hi.
Win :D
DancemasterGlenn
Veteran
Posts: 637
Joined: Sat Apr 21, 2007 8:05 pm

Post by DancemasterGlenn »

What?? A change in bsnes that we're not all arguing about? It must not have anything to do with the gui...

Congrats byuu, looking forward to seeing you progress on this new front!
I bring the trouble.
grinvader
ZSNES Shake Shake Prinny
Posts: 5632
Joined: Wed Jul 28, 2004 4:15 pm
Location: PAL50, dood !

Post by grinvader »

Canar wrote:
byuu wrote:Oh and that cat is fucking awesome:
Image
Hi. :D
OBJECTION !!!

That cat is named Nago. Who are you and why are you failing to impersonate a fictional triplejumping feline which lacks guns on its hood and hence sucks... wait, think I got lost a bit there. Starting over...

OBJECTION !
Triplejumps are for weenies, arr. Real men clear the whole game solo. Well, except for the sekritz to reach 100%.

This post is totally not biased by my exploding hate for miserable dwarven felines.
byuu

Post by byuu »

Hmm. I could really use help from someone who understands the SA-1 ...

$2200:
CPU -> SA-1 IRQ
CPU -> SA-1 NMI

$2201:
SA-1 -> CPU IRQ enable

$2209:
SA-1 -> CPU IRQ

$220a:
CPU -> SA-1 IRQ enable
CPU -> SA-1 NMI enable

So $2200 enables IRQs from the SNES to generate IRQs on the SA-1. $220a acts as a mask to block the triggering (but not state transitions) for $2200.

Likewise, $2209 enables IRQs from the SA-1 to generate IRQs on the SNES. $2201 is the mask here.

Obviously the SA-1 cannot send NMIs to the CPU, because there's no /NMI line on the cart connector; just an /IRQ line. So what's with $2205,$2206 CNV? How does the SA-1 generate an internal IRQ?

$2210:
Timer mode: linear or H/V
H/V IRQ mode: none, H, V, H+V

$2211:
Reset timer.

Since S-PPU2 X/Y aren't connected to the cart bus, I don't see how the SA-1 can know its H/V position. So I take it writing to $2211 resets the H/V counter as well ...

So what happens if you set linear timer mode and enable H and/or V IRQs?

And this will cause drifting by not supporting the 'short' scanline on non-interlace fields. -120 clocks per second. I guess that gives you a skew of one second per emulated 180,000 seconds; so probably not a big deal. But a game would definitely want to sync the counters every now and then ... how annoying.
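
The drift estimate can be checked with a few lines of arithmetic. The sketch assumes the usual NTSC figures, which are not stated in the post itself: a 21,477,272Hz master clock, 1364 clocks per normal scanline, one 1360-clock scanline every other frame in non-interlace mode, and roughly 60 frames per second. The function names are made up for illustration.

```cpp
#include <cstdint>

constexpr std::uint32_t kMasterClockHz = 21477272;    // NTSC master clock (assumed)
constexpr std::uint32_t kClocksSavedPerShortLine = 4; // 1364 - 1360 (assumed)
constexpr std::uint32_t kFramesPerSecond = 60;        // approximate

// The short scanline only occurs on every other frame.
constexpr std::uint32_t driftClocksPerSecond() {
  return kClocksSavedPerShortLine * kFramesPerSecond / 2;
}

// Emulated seconds that pass before the counter is off by one full second.
constexpr std::uint32_t secondsPerSecondOfSkew() {
  return kMasterClockHz / driftClocksPerSecond();
}
```

Under those assumptions `driftClocksPerSecond()` is 120 and `secondsPerSecondOfSkew()` is 178977, which agrees with the rough figure of one second per 180,000 seconds.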

$230c,$230d:
Variable bit-length read port. Fixed mode seems absolutely useless, adjusting the address will take almost as long as masking the data yourself.

So then does auto-increment work after reads to $230c or $230d? What if you set a bit-size of 4? Do you still need to read both 230c+230d to increment that?

-----

Super Mario RPG:

Code:

* w2200 = 00
- CPU -> SA1 IRQ = false
- CPU -> SA1 NMI = false

* w2201 = 80
- SA1 -> CPU IRQ enable = true

"SA-1 CPU IRQEN: IRQ enable/disable from the SA-1 CPU"

* w2209 = 87
- SA1 -> CPU IRQ = true

"Super NES CPU IRQ: IRQ from SA-1 CPU to Super NES CPU"

* w220a = 90
- CPU -> SA1 IRQ enable = true
- CPU -> SA1 NMI enable = true

"Super NES CPU IRQEN: IRQ control from Super NES CPU to SA-1 CPU"
The only thing enabled according to my understanding is SA-1 -> CPU IRQs. But the IRQ timer register is never written to, and defaults to off.

So the SA-1 would never trigger an SNES IRQ (for it has no means to), and the CPU wouldn't trigger the SA-1 (for it was explicitly disabled / masked in $2200.)
Last edited by byuu on Tue Apr 07, 2009 11:13 pm, edited 1 time in total.
lordmissus
Ignorant Child
Posts: 326
Joined: Mon Apr 06, 2009 10:10 pm
Location: 1984

Post by lordmissus »

Hey byuu, I have a post on my site about your recent efforts with the SA-1
http://narf.byethost16.com/articles/byuu_sa1.html
Keep up the good work man.
byuu

Post by byuu »

Neat. Kirby 3's Pop Star effect relied upon not clearing MA after multiplication and sigma multiplication (i.e. cumulative addition).

Image

Unrelated: division by zero should set MR to 0, not MA << 16.
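
A small sketch of those two behaviours, with loud caveats: `ArithState` and the three functions are hypothetical, the register widths and the quotient/remainder packing in `divide()` are assumptions, and the sign handling and accumulator width of the real hardware are not modelled. Whether MB is cleared after an operation is not stated above, so the sketch leaves it alone. Only "MA survives multiplication and cumulative sum" and "division by zero yields MR = 0" come from the observations above.

```cpp
#include <cstdint>

// Illustrative register file for the arithmetic unit; field widths are assumptions.
struct ArithState {
  std::int16_t ma;  // multiplicand / dividend
  std::int16_t mb;  // multiplier / divisor (treated as unsigned when dividing)
  std::int64_t mr;  // result
};

constexpr ArithState multiply(ArithState s) {
  s.mr = static_cast<std::int64_t>(s.ma) * s.mb;
  return s;  // MA is deliberately left untouched
}

constexpr ArithState accumulate(ArithState s) {
  s.mr += static_cast<std::int64_t>(s.ma) * s.mb;
  return s;  // MA is left untouched here as well
}

constexpr ArithState divide(ArithState s) {
  const std::uint16_t divisor = static_cast<std::uint16_t>(s.mb);
  if (divisor == 0) {
    s.mr = 0;  // division by zero: result is 0, not MA << 16
    return s;
  }
  const std::int32_t quotient = s.ma / divisor;
  const std::int32_t remainder = s.ma % divisor;
  // Assumed layout: remainder in the upper 16 bits, quotient in the lower 16.
  s.mr = (static_cast<std::int64_t>(remainder & 0xffff) << 16) | (quotient & 0xffff);
  return s;
}
```

Because MA is preserved, `accumulate(accumulate(ArithState{3, 4, 0}))` adds 3 * 4 twice, which is the kind of repeated use a game could rely on.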

Still no idea what the hell is up with BW-RAM $60-6f:0000-ffff. I get the projection from 4-bit / 16-bit ... but I don't see why conversion between bitmap and bitplane form is needed if this acts like an automatic translator between the two formats.

Now given, banks $6x have padding, but games could write to $6x and read back / transfer from $4x ...

---

How incredibly annoying. Snes9x:

Code:

case 0x2200: {
  SA1.Waiting = (byte & 0x60) != 0;
  //SA1.Executing = !SA1.Waiting && SA1.S9xOpcodes;

  if (!(byte & 0x20) && (Memory.FillRAM [0x2200] & 0x20)) {
    S9xSA1Reset ();
  }
  if (byte & 0x80) {
    Memory.FillRAM [0x2301] |= 0x80;
    if (Memory.FillRAM [0x220a] & 0x80) {
      SA1.Flags |= IRQ_FLAG;
      SA1.IRQActive |= SNES_IRQ_SOURCE;
      SA1.Executing = !SA1.Waiting && SA1.S9xOpcodes;
    }
  }
  ...
} break;

case 0x220a: {
  if (((byte ^ Memory.FillRAM [0x220a]) & 0x80) && (Memory.FillRAM [0x2301] & byte & 0x80)) {
    //if(byte & 0x80 != [0x220a] & 0x80) {
    //  if([0x2301] & 0x80) {
    //    if(byte & 0x80) {
    //      ...
    //    }
    //  }
    //}

    //if( (data & 0x80) && !(r220a & 0x80) && (r2301 & 0x80) ) { ... }
    //220a: 0->1 w/2301==1

    SA1.Flags |= IRQ_FLAG;
    SA1.IRQActive |= SNES_IRQ_SOURCE;
    //SA1.Executing = !SA1.Waiting;
  }
...
So $2200.d7 = 1 (CPU -> SA1 IRQ enable) sets IRQ_FLAG + SNES_IRQ_SOURCE.

But transitioning $220a.d7 from 0->1 also sets IRQ_FLAG + SNES_IRQ_SOURCE.

I guess technically $2301.d7 won't be 1 until it's enabled. But since it's not cleared, you can turn $2200.d7 off, and still trigger IRQs with this code.
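
To make that edge case concrete, here is a stripped-down model of the two register writes quoted above. It is a sketch, not Snes9x code: `Sa1IrqModel` and the function names are made up, `irqPending` stands in for IRQ_FLAG + SNES_IRQ_SOURCE, and it assumes the written byte is stored to the register after the handler runs.

```cpp
#include <cstdint>

// Simplified model of the three registers involved; names are illustrative.
struct Sa1IrqModel {
  std::uint8_t r2200;  // last value written to $2200
  std::uint8_t r220a;  // last value written to $220a
  std::uint8_t r2301;  // status latch read back at $2301
  bool irqPending;     // stands in for IRQ_FLAG + SNES_IRQ_SOURCE
};

constexpr Sa1IrqModel write2200(Sa1IrqModel m, std::uint8_t byte) {
  if (byte & 0x80) {
    m.r2301 = static_cast<std::uint8_t>(m.r2301 | 0x80);  // latch is set here...
    if (m.r220a & 0x80) m.irqPending = true;
  }
  m.r2200 = byte;  // ...but nothing clears it when d7 is later written as 0
  return m;
}

constexpr Sa1IrqModel write220a(Sa1IrqModel m, std::uint8_t byte) {
  const bool d7Changed = ((byte ^ m.r220a) & 0x80) != 0;
  const bool latchedAndEnabled = (m.r2301 & byte & 0x80) != 0;
  if (d7Changed && latchedAndEnabled) m.irqPending = true;
  m.r220a = byte;
  return m;
}

// Turn $2200.d7 on, turn it off again, then enable $220a.d7.
constexpr bool staleLatchFiresIrq() {
  const Sa1IrqModel start{0, 0, 0, false};
  const Sa1IrqModel raised = write2200(start, 0x80);
  const Sa1IrqModel lowered = write2200(raised, 0x00);
  const Sa1IrqModel enabled = write220a(lowered, 0x80);
  return enabled.irqPending;
}
```

In this model `staleLatchFiresIrq()` is true: the IRQ fires on the $220a write even though $2200.d7 was already written back to 0, because the $2301 latch was never cleared.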
earbenT
New Member
Posts: 5
Joined: Thu Aug 14, 2008 3:34 am

Post by earbenT »

Quite an encouraging development indeed, and it'll be interesting to see what the final performance hit is like. Outstanding work, byuu!