Taking dynarecs one step further: Just-in-time assembly?

byuu

Post by byuu »

The more accurate comparison would have been no$GBA
I always wonder how you guys gauge accuracy in closed source emulators. For all you know, there could be game-specific hacks for every single title. How would you know? You trust an author who already has something to hide?
I.S.T.
Zealot
Posts: 1325
Joined: Tue Nov 27, 2007 7:03 am

Post by I.S.T. »

I said more accurate, not totally accurate...

>.>

Edit: Given his insanely detailed GBA documentation( http://nocash.emubase.de/gbatek.htm ), I doubt he halfassed the emu completely...

Of course, I could be wrong.
grinvader
ZSNES Shake Shake Prinny
Posts: 5632
Joined: Wed Jul 28, 2004 4:15 pm
Location: PAL50, dood !

Post by grinvader »

byuu wrote:You said: "(observe for example, GBA emulator compatibility is far higher compared to the SNES)."

Not: "(observe for example, VBA emulator compatibility is far higher compared to ZSNES)."

Hence, I wasn't mincing anything with my words. If the latter was what you meant, how was I to predict your misspoken statement?
Hmm, now you're thinking with port... err, with a simili-Nach wyriwym sentence parsing. ;)
皆黙って俺について来い!!

Code: Select all

<jmr> bsnes has the most accurate wiki page but it takes forever to load (or something)
Pantheon: Gideon Zhi | CaitSith2 | Nach | kode54
byuu

Post by byuu »

I said more accurate, not totally accurate...
That's still in line with what I was saying. How do you know it's more accurate, then?
Hmm, now you're thinking with port... err, with a simili-Nach wyriwym sentence parsing. ;)
Oh god, dear god no ... I've been talking to Nach so much that ... no! Lies! I will hear no more of this! >_<
I.S.T.
Zealot
Posts: 1325
Joined: Tue Nov 27, 2007 7:03 am

Post by I.S.T. »

byuu wrote:
I said more accurate, not totally accurate...
That's still in line with what I was saying. How do you know it's more accurate, then?
NOTE: THE FOLLOWING POST MAY CONTAIN FRANPA LEVELS OF BULLSHIT

Wouldn't hacks for every single GBA game take up a huge amount of space in the exe? I compared the size of no$GBA's exe (the latest free version, 2.6) to zsnes' exe.

Now, when you consider the fact that it also runs DS games with fairly high compatibility... It doesn't add up to me.

Of course, as mentioned in the disclaimer, I could very well be talking out my ass.
grinvader
ZSNES Shake Shake Prinny
Posts: 5632
Joined: Wed Jul 28, 2004 4:15 pm
Location: PAL50, dood !

Post by grinvader »

Hacks are usually small. Very small - it's just a matter of adding/skipping checks to do something different in specific cases, so it's somewhere around 20 bytes of binary per hack, tops, before upx'ing the hell out of it.

Hackish solutions are horribly inaccurate yet very simple, which is why they're used - if it were harder to hack through than to do stuff right, no one would do it.

@byuu: consider it a good thing ? ;)
I.S.T.
Zealot
Posts: 1325
Joined: Tue Nov 27, 2007 7:03 am

Post by I.S.T. »

Even so, hacks for every single game would add up to over 150 KB, given that the GBA has over 1000 (possibly up to 2000) unique games, and that no$GBA also supports at least 100 DS games.

Edit: Might be a bit off here... Now, given 20 bytes per hack, and 2803 GBA dumps so far, that adds up to 56,060 bytes. However, considering that there are over 2000 DS games...
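
The arithmetic can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the 20 bytes per hack is the guess from earlier in the thread, the game counts are the rough figures quoted above, and "one hack per game" is an assumption made purely for the estimate. It says nothing about what no$GBA actually contains.

```python
# Rough size estimate for per-game hacks, using the figures quoted in this thread.
BYTES_PER_HACK = 20   # the "20 bytes per hack, tops" guess
GBA_DUMPS = 2803      # GBA dump count quoted above
DS_GAMES = 2000       # "over 2000 DS games" (rough figure)


def hack_bytes(games, per_hack=BYTES_PER_HACK):
    """Total bytes if every game needed exactly one hack of `per_hack` bytes."""
    return games * per_hack


gba_total = hack_bytes(GBA_DUMPS)
both_total = hack_bytes(GBA_DUMPS + DS_GAMES)
print(gba_total, both_total)
```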
creaothceann
Seen it all
Posts: 2302
Joined: Mon Jan 03, 2005 5:04 pm
Location: Germany

Post by creaothceann »

NO$GMB was written in Assembler, so I'd be wary of size comparisons.
vSNES | Delphi 10 BPLs
bsnes launcher with recent files list
Deathlike2
ZSNES Developer
Posts: 6747
Joined: Tue Dec 28, 2004 6:47 am

Post by Deathlike2 »

Measuring hacks vs program size is not a valid comparison.
Continuing FF4 Research... ( http://slickproductions.org/forum/index.php?board=13.0 )
I.S.T.
Zealot
Posts: 1325
Joined: Tue Nov 27, 2007 7:03 am

Post by I.S.T. »

I didn't think so (hell, I put up a franpa alert), but it still doesn't seem possible that an emulator that has OpenGL-accelerated 3D, a software version of the 3D code, and GBA emulation would have hacks for every single fucking game and still only be 150 KB.
grinvader
ZSNES Shake Shake Prinny
Posts: 5632
Joined: Wed Jul 28, 2004 4:15 pm
Location: PAL50, dood !

Post by grinvader »

Gonna insist on the fact you probably overestimate what a hack is.
I.S.T.
Zealot
Posts: 1325
Joined: Tue Nov 27, 2007 7:03 am

Post by I.S.T. »

No, I'd be surprised if a hack took more than 15 bytes, to be honest...
AamirM
Regen Developer
Posts: 533
Joined: Sun Feb 17, 2008 8:01 am

Post by AamirM »

Hi,

I wrote this on another forum some time ago and now copy pasting it here
...I tried to write an M68000 dynamic recompiler for my Nugen (Neogeo, CPS1/2) and soon-to-be-released Regen (Sega Genesis) emus, but soon realized that it was not worth it. The increased speed is there, but not by much (beyond an 800 MHz Pentium) compared to the fastest interpreted 68k emu, A68K, and its implementation is much harder than an interpreter's. If I remember correctly, the author of the Generator emulator got similar results after writing a 68k recompiler for ARM processors. You can read his report at www.squish.net.

stay safe,

AamirM
I am sure you can write a much faster emulator, in less time, by optimizing the video and sound hardware emulation than by writing a CPU JIT compiler or dynarec. In Regen, the largest share of the time (24% as reported by gprof) is taken not even by the VDP but by the YM2612 emulation. But if you still want to go on, I would recommend that you try libjit (google it) to implement it, as it will save you much time.

stay safe,

AamirM
tcaudilllg2
Hazed
Posts: 77
Joined: Fri Mar 21, 2008 12:52 am

Post by tcaudilllg2 »

I perceive that my aims are being misinterpreted by most as dynarec as defined by past efforts. Let me be clear about what I'm trying to accomplish: I'm aiming for dynamic translation. You know how Babelfish translates text from one spoken language to another? That's what I'm aiming for, but in terms of computer programming. A faithful translation, I theorize, would incur minimal speed loss between platforms. Indeed, the more features are common between platforms, the greater the performance of the translated code.

But I would like to see how you implemented your dynarec, AamirM.
I.S.T.
Zealot
Posts: 1325
Joined: Tue Nov 27, 2007 7:03 am

Post by I.S.T. »

BTW, I'd like to retract my statements... You see, I was under the impression that no$GBA does not use UPX. I found out it does(Reading a post on this forum...). >.>
AamirM
Regen Developer
Posts: 533
Joined: Sun Feb 17, 2008 8:01 am

Post by AamirM »

tcaudilllg2 wrote:I'm aiming for dynamic translation. You know how Babelfish translates text from one spoken language to another? That's what I'm aiming for, but in terms of computer programming.
Are you trying to do something like this ?

"move.l x, y" (M68000)

gets translated to:

"mov y, x" (x86)

So you are doing text processing?

Sorry, if I misinterpreted.
AamirM
Regen Developer
Posts: 533
Joined: Sun Feb 17, 2008 8:01 am

Post by AamirM »

Hi,

Here is what I did in my dynarec. It's just an overview, and my terminology may be different:

There were two parts to it, the frontend and the backend. The frontend mainly contained a basic interpreter (not using a jump table), code to detect a code block (a code block can start at any instruction but ends when a PC-modifying opcode is encountered) and some other things. The backend is the compiler. It compiles the block (not individual instructions) to the native CPU's instructions to, in theory, perform the same operation. On x86, the native code was created on the stack.

The interpreter was there so that the dynarec would still run on a CPU for which there was no backend. Secondly, it was used to run a block a specific number of times before that block was recompiled (for self-modifying code). Lastly, it was used to verify that the compiled and interpreted code were indeed doing the same thing. Code block information was kept in a list which specified each block's range and some other things that were used mainly for optimizations. All memory accesses went through a special function to see if a block was being modified, in which case that block was invalidated and its code flushed. This is a pretty brute-force method of handling self-modifying code. The optimization part was there to do things such as dead flag calculation removal, but I failed miserably at it.
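
To make the structure above concrete, here is a minimal Python sketch of the same idea. It is not the Regen/Nugen code: the three-opcode guest ISA, the `Dynarec` class, `COMPILE_THRESHOLD` and every other name are made up for illustration, the "backend" just folds a block into a host closure instead of emitting native code, and a guest program that loops forever would never return from `run`.

```python
# Toy guest ISA, invented for this sketch:
#   ("addi", n) adds n to the accumulator
#   ("jmp", t)  sets the PC to t   (PC-modifying, so it ends a block)
#   ("halt",)   stops execution    (also ends a block)
BLOCK_ENDERS = {"jmp", "halt"}
COMPILE_THRESHOLD = 2  # interpreted runs before a block is compiled (arbitrary)


def find_block_end(code, start):
    """Frontend: scan from `start` to the first PC-modifying opcode (exclusive end)."""
    pc = start
    while code[pc][0] not in BLOCK_ENDERS:
        pc += 1
    return pc + 1


def interpret(code, start, end, state):
    """Run the block one instruction at a time; return the next PC (None on halt)."""
    for instr in code[start:end]:
        if instr[0] == "addi":
            state["acc"] += instr[1]
        elif instr[0] == "jmp":
            return instr[1]
    return None  # the block ended with "halt"


def compile_block(code, start, end):
    """Backend stand-in: fold the whole block into a single host closure."""
    total = sum(instr[1] for instr in code[start:end] if instr[0] == "addi")
    last = code[end - 1]
    next_pc = last[1] if last[0] == "jmp" else None

    def native(state):
        state["acc"] += total
        return next_pc

    return native


class Dynarec:
    def __init__(self, code):
        self.code = code
        self.blocks = {}  # start PC -> {"end", "runs", "native"}

    def write_code(self, addr, instr):
        """Every code write passes through here so stale blocks get flushed."""
        self.code[addr] = instr
        stale = [s for s, b in self.blocks.items() if s <= addr < b["end"]]
        for start in stale:
            del self.blocks[start]

    def run(self, pc, state):
        while pc is not None:
            blk = self.blocks.get(pc)
            if blk is None:
                blk = {"end": find_block_end(self.code, pc), "runs": 0, "native": None}
                self.blocks[pc] = blk
            if blk["native"] is None and blk["runs"] >= COMPILE_THRESHOLD:
                blk["native"] = compile_block(self.code, pc, blk["end"])
            blk["runs"] += 1
            if blk["native"] is not None:
                pc = blk["native"](state)
            else:
                pc = interpret(self.code, pc, blk["end"], state)
        return state


program = [("addi", 1), ("addi", 2), ("jmp", 3), ("addi", 10), ("halt",)]
dyn = Dynarec(program)
results = [dyn.run(0, {"acc": 0})["acc"] for _ in range(4)]  # interpreted first, compiled later
dyn.write_code(1, ("addi", 5))                                # flushes the block starting at PC 0
patched = dyn.run(0, {"acc": 0})["acc"]
```

The first two runs of each block go through the interpreter and the later ones through the compiled closure, so identical results across all four runs are a crude version of the "verify compiled against interpreted" step. The write to address 1 throws away only the block that covers it.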

stay safe,

AamirM
tcaudilllg2
Hazed
Posts: 77
Joined: Fri Mar 21, 2008 12:52 am

Post by tcaudilllg2 »

AamirM wrote:
tcaudilllg2 wrote:I'm aiming for dynamic translation. You know how Babelfish translates text from one spoken language to another? That's what I'm aiming for, but in terms of computer programming.
Are you trying to do something like this ?

"move.l x, y" (M68000)

gets translated to:

"mov y, x" (x86)

So you are doing text processing?

Sorry, if I misinterpreted.
That, and adding in hardware simulation code in between the instructions where it would be invoked. (if the source platform and the target were markedly dissimilar.)

The idea is this:
- identify instruction, translate it to its equivalent on the target platform
- code the target platform to mimic the behavior of the source platform's hardware in response to the instruction. For example, a sprite flip bit on a console would need to be reproduced on a PC the same way it is on an emulator: by rearranging the sprite's data.

I wonder if your code would perform better without the self-modification checks. Absolutely the quest for accuracy would kill any performance gains otherwise obtained. (or so I suspect) That, and if you aren't translating as much of the hardware functionality as you can, you're not going to get good gains.
grinvader
ZSNES Shake Shake Prinny
Posts: 5632
Joined: Wed Jul 28, 2004 4:15 pm
Location: PAL50, dood !

Post by grinvader »

tcaudilllg2 wrote:For example, a sprite flip bit on a console would need to be reproduced on a PC the same way it is on an emulator: by rearranging the sprite's data.
... flipping doesn't rearrange anything. That's THE POINT. You just read the same data, but in a different direction.
If flipping tiles rearranged their data, you'd lose all the advantage of reusing the same data over and over (and you'd fill the OAM in no time).
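
A minimal Python sketch of "same data, different read direction". The function names and the tiny 3x2 tile are made up for illustration (real tiles are larger and stored as bitplanes); the only point is that the stored tile is never touched, just the index used to read it.

```python
def read_row(row, hflip):
    """Return one tile row as it reaches the screen; only the read index changes."""
    width = len(row)
    return [row[width - 1 - x] if hflip else row[x] for x in range(width)]


def draw_tile(tile, hflip=False, vflip=False):
    """Return the on-screen pixels for `tile` without modifying the stored data."""
    height = len(tile)
    return [read_row(tile[height - 1 - y] if vflip else tile[y], hflip)
            for y in range(height)]


tile = [[1, 2, 3],
        [4, 5, 6]]
h_flipped = draw_tile(tile, hflip=True)
v_flipped = draw_tile(tile, vflip=True)
```

Any number of sprites can point at the same `tile` with different flip flags, which is the reuse being described.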
tcaudilllg2
Hazed
Posts: 77
Joined: Fri Mar 21, 2008 12:52 am

Post by tcaudilllg2 »

grinvader wrote:
tcaudilllg2 wrote:For example, a sprite flip bit on a console would need to be reproduced on a PC the same way it is on an emulator: by rearranging the sprite's data.
... flipping doesn't rearrange anything. That's THE POINT. You just read the same data, but in a different direction.
If flipping tiles rearranged their data, you'd lose all the advantage of reusing the same data over and over (and you'd fill the OAM in no time).
Indeed. But if you were going to port a SNES game to PC, would you still read the tile data from a different direction? Not likely, you'd probably flip it around in a buffer, and then replace the tile with its flipped version before blasting it to the display. How do you keep a record of the flip? By setting a flag which you correspond to the flip bit on the original hardware, and have the port refer to the bit by that.
grinvader
ZSNES Shake Shake Prinny
Posts: 5632
Joined: Wed Jul 28, 2004 4:15 pm
Location: PAL50, dood !

Post by grinvader »

tcaudilllg2 wrote:But if you were going to port a SNES game to PC, would you still read the tile data from a different direction? Not likely, you'd probably flip it around in a buffer, and then replace the tile with its flipped version before blasting it to the display.
Hmm, no.
I'd do exactly as it's done originally - read the tile data from the other way into the output display. No temporary buffer (waste of bytes, waste of time).
How do you keep a record of the flip? By setting a flag which you correspond to the flip bit on the original hardware, and have the port refer to the bit by that.
That's about right.
AamirM
Regen Developer
Posts: 533
Joined: Sun Feb 17, 2008 8:01 am

Post by AamirM »

Hi,

Hmmm.....that sounds more like a static recompiler and simulator hybrid. So you are in effect simulating rather than emulating. Doesn't sound nice but I have been wrong before.

stay safe,

AamirM

P.S. Sorry if I misinterpreted again.
tcaudilllg2
Hazed
Posts: 77
Joined: Fri Mar 21, 2008 12:52 am

Post by tcaudilllg2 »

Looked up the SNES PPU. Every little switch has its own little register, as far as I can tell. Should be able to make a translator of PPU calls quite fast indeed.
tcaudilllg2
Hazed
Posts: 77
Joined: Fri Mar 21, 2008 12:52 am

Post by tcaudilllg2 »

On the matter of the PPU:
- if the memory address is immediate, in-code switches are not necessary to determine the PPU function to process
- if the memory address is accessed from a register, then full emulation (not simulation) of the entire external system is probably required, because we don't know what the contents of that register are.

Example from x86 ASM:

Code: Select all

mov ax, 0003h ; AH = 00h (set video mode), AL = 03h (mode number)
int 10h       ; BIOS video services
In this case, we could get a pretty even translation. INT 10h takes its function number in AH and the mode in AL, so we know to take whatever is in AX as the parameter for the video mode change. ...Hmm, when I conceived of this project, I was thinking in terms of immediate addressing. I never actually considered indirect addressing, or what problems it would present.

The question is one of how much can be determined. To make the determinations, rules are necessary:
- the registers must be considered as variables in the produced source (that has already been established). If the code says, &H03 goes in AX, then the register most often used for purposes of interrupt addressing gets the value 0x03.
- the functions of the targeted machine must only use register variables to the extent that the source machine does.

The tricky part is that 0x03 may not be the text mode interrupt on the host machine, in which case you need a correspondence table to know what 0x03 maps to. When the interrupt is actually called, you leave 0x03 as it is... but you make the call with the value which corresponds to 0x03 on the target device.
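
A rough Python sketch of such a correspondence table. The host mode names, the `translate_set_mode` helper and the register dictionary are all hypothetical; only the guest mode numbers (0x03 for 80x25 text, 0x13 for 320x200 graphics) are real BIOS values.

```python
# Hypothetical correspondence table: guest video-mode number -> host mode name.
# The host names are invented for this sketch.
GUEST_TO_HOST_MODE = {
    0x03: "text_80x25",
    0x13: "graphics_320x200",
}


def translate_set_mode(guest_regs, host_set_mode):
    """Handle a guest 'set video mode' call without touching the guest registers."""
    guest_mode = guest_regs["ax"] & 0xFF  # AL holds the mode number
    host_mode = GUEST_TO_HOST_MODE.get(guest_mode)
    if host_mode is None:
        raise NotImplementedError(f"no host equivalent for mode {guest_mode:#04x}")
    host_set_mode(host_mode)  # call the host with the corresponding value
    return host_mode


calls = []
regs = {"ax": 0x0003}
translate_set_mode(regs, calls.append)
```

The guest still sees 0x03 in AX afterwards; only the call made on the host side uses the translated value.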

Interrupts (and in/out operations) are simple... memory is more complicated.

When we write to, say, $2103, we are doing with memory what on the PC would be done with output operations. This brings with it a problem, in that we must treat RAM both as memory and hardware output. The solution is to equate the hardware RAM regions with the hardware itself. [more later]
creaothceann
Seen it all
Posts: 2302
Joined: Mon Jan 03, 2005 5:04 pm
Location: Germany

Post by creaothceann »

I'm not sure I understand you here...
tcaudilllg2 wrote:The question is one of how much can be determined. To make the determinations, rules are necessary:
- the registers must be considered as variables in the produced source (that has already been established). If the code says, &H03 goes in AX, then the register most often used for purposes of interrupt addressing gets the value 0x03.
- the functions of the targeted machine must only use register variables to the extent that the source machine does.

The tricky part is that 0x03 may not be the text mode interrupt on the host machine, in which case you need a correspondence table to know what 0x03 maps to. When the interrupt is actually called, you leave 0x03 as it is... but you make the call with the value which corresponds to 0x03 on the target device.
Note that the target machine typically has completely different hardware from the emulated machine, so direct translations of "function arguments" won't suffice. Instead you'll have to simulate the hardware too.
tcaudilllg2 wrote:When we write to, say, $2103, we are doing with memory what on the PC would be done with output operations. This brings with it a problem, in that we must treat RAM both as memory and hardware output. The solution is to equate the hardware RAM regions with the hardware itself.
You should treat the read/write accesses as a CPU output facility ("in"/"out" with x86 ASM), and RAM/ROM only as components that are mapped into the address space.
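
A minimal Python sketch of that arrangement, assuming a made-up `Bus` class: every CPU write goes through one function, which hands the value either to a hardware handler or to plain RAM depending on what is mapped at the address. The RAM size is arbitrary, and the handler attached to $2103 below is a placeholder that just records the value; it does not model the real register.

```python
class Bus:
    """Hypothetical address-space dispatcher for CPU writes."""

    def __init__(self, ram_size=0x2000):
        self.ram = bytearray(ram_size)  # plain RAM mapped at 0x0000 upward
        self.write_handlers = {}        # address -> callable(value)

    def map_register(self, addr, handler):
        """Attach simulated hardware to one address."""
        self.write_handlers[addr] = handler

    def write(self, addr, value):
        handler = self.write_handlers.get(addr)
        if handler is not None:
            handler(value & 0xFF)          # behaves like an "out" to the hardware
        elif addr < len(self.ram):
            self.ram[addr] = value & 0xFF  # ordinary memory store
        # writes to unmapped addresses are ignored in this sketch


bus = Bus()
register_log = []
bus.map_register(0x2103, register_log.append)  # placeholder hardware handler
bus.write(0x0010, 0x7F)   # lands in RAM
bus.write(0x2103, 0x01)   # lands in the handler, not in RAM
```

With this layout RAM is never asked to double as hardware output. The CPU core only ever calls `write`, and the mapping decides what sits behind each address.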