


bsnes v0.038 released 
Post bsnes v0.038 released
byuu.org wrote:
2008-12-15 - bsnes v0.038 released

The main change for this release is what I talked about in my last post. Because of this, I can finally time exact cycle positions for writes to take effect within the S-PPU core. So far, I've only added OAM reset at Y=240,H=10 and OBSEL fetching at H=1152. The latter hasn't been verified on hardware, but it does fix the single black scanlines evident in the intros to Mega Lo Mania and Winter Olympics. Previously, I had a setting named ppu.hack.obj_cache ... this essentially cached OBSEL at H=512, and without it, at H=512+1364. The setting was needed to fix these games, but would then cause sprite flickering in other games, such as Ninja Warriors and Lord of the Rings. With the new timing, all of these games work correctly with the same timing.

I should note, this is still a scanline-based renderer at its core. I'm not aiming for, nor do I believe it is possible to, obtain 100% perfect rendering compatibility with this approach. But I will continue to expand its cycle-level capabilities, as it will no doubt be much faster than a true, fully cycle-based S-PPU renderer.

All of that said, the extra state logging, decoupling of timing code in the most critical section of the emulator, etc means that a small speed hit was inevitable. I mitigated it as much as I could, but it appears that Core processors suffer a ~6% speed hit from the previous version. Oddly enough, AMD processors seem to be largely unaffected by the change.

I know these speed hits continue to stack, but that's the nature of the beast. I've added a link to tukuyomi's SNES emulator archive. If you're not able to get full speed, I'd strongly recommend using an older version of the emulator. v017 in particular is nearly twice as fast as the current version, while still being very close to bug-free.

On the bright side, the new synchronization model is 100% compatible between both the scanline renderer and a future cycle renderer. That will allow me to avoid a lot of timing code duplication, and it will also allow me to continue to offer a scanline renderer in future builds. And once I get the cycle renderer perfected, I would like to team up with some other people and work on a fast and accurate emulator.

Lastly, I'm changing up how the emulator is distributed. The readme and license files are now embedded inside the executable, accessible from the help menu. As it is no longer required to include these text files, I can distribute the executable itself directly, à la uTorrent. For sites that mirror bsnes, but do not want to host it as a direct EXE file, feel free to put it inside a ZIP archive (along with a language locale file, if you wish).

</wall of text>

Changelog:
* eliminated S-DD1 DMA enslavement to the S-CPU; this allows the S-DD1 to behave more like the real chip, and it also simplifies the S-CPU DMA module
* eliminated S-PPU enslavement to the S-CPU; all processor cores now run independently of each other
* added cycle-level S-PPU timing for OAM address reset and OBSEL; fixes scanline glitches in Mega Lo Mania and Winter Olympics
* removed ppu.hack.* settings, as they are no longer needed due to the above changes
* corrected VRAM tiledata cache bug; fixes Super Buster Bros v1.0 reset glitch
* added memory export and trace logging key bindings to user interface
* removed WAV logging (to trim the emulation core)
* embedded readme and license texts inside executable
* simplified S-CPU, S-SMP flag register handling
* source code cleanup for S-CPU timing module
* GUI-Linux: added style improvements to the listbox and combo box controls
* GUI-Linux: finally added filetype filter support to the file open dialog
* GUI-all: shrunk configuration panel [FitzRoy]
* GUI-all: modified paths panel descriptions for clarity [FitzRoy]


Regressions, bug reports, opinions on the new direct-executable distribution method, etc welcome.

Also, please be mindful of the rule[s] before commenting.


Mon Dec 15, 2008 8:37 am
Trooper
User avatar

Joined: Tue Oct 31, 2006 7:17 pm
Posts: 376
Post Re: bsnes v0.038 released
Looking mighty promising for the future renderer

_________________
I want to fry~~ Sky Hiiiiiiiiigh~
Let's go-o-o-O~ togeda~


Mon Dec 15, 2008 9:14 am
Profile
Post 
So, I noticed this comment from v037 shortly before release:

Code:
  //note: this should actually occur at V=225,HC=10.
  //this is a limitation of the scanline-based renderer.
  //... OAM reset stuff


Easy enough now. Got that in right before release. For those curious how the new PPU scheduling works:

Code:
void bPPU::enter() {
  loop:
  //H =    0 (initialize)
  scanline();
  if(ppucounter.ppuvcounter() == 0) frame();
  add_clocks(10);

  //H =   10 (OAM address reset)
  if(ppucounter.ppuvcounter() == (!overscan() ? 225 : 240)) {
    if(regs.display_disabled == false) {
      regs.oam_addr = regs.oam_baseaddr << 1;
      regs.oam_firstsprite = (regs.oam_priority == false) ? 0 : (regs.oam_addr >> 2) & 127;
    }
  }
  add_clocks(502);

  //H =  512 (render)
  render_scanline();
  add_clocks(640);

  //H = 1152 (cache OBSEL)
  cache.oam_basesize   = regs.oam_basesize;
  cache.oam_nameselect = regs.oam_nameselect;
  cache.oam_tdaddr     = regs.oam_tdaddr;
  add_clocks(ppucounter.ppulineclocks() - 1152);  //seek to start of next scanline

  goto loop;
}


Bla bla "goto is evil", whatever. Replace it with while(true) {} if it lets you sleep better.
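
For anyone who prefers the loop form, here is a minimal, self-contained sketch of the same per-scanline schedule written with while. It is not the bsnes source: Event, run_scanlines, and the fixed 1364-clock line length are assumptions made for illustration, and the real add_clocks() hands control to the other processor threads rather than just bumping a counter.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record of which step ran at which horizontal clock position.
struct Event {
  unsigned v;        // scanline number
  unsigned h;        // horizontal clock position within the scanline
  std::string what;  // label of the step that ran
};

// Walks `lines` scanlines using the same checkpoints as the loop above
// (H = 0, 10, 512, 1152), with a structured loop in place of goto.
// Assumption: every scanline is `lineclocks` long (1364 by default).
std::vector<Event> run_scanlines(unsigned lines, unsigned lineclocks = 1364) {
  assert(lineclocks > 1152);
  std::vector<Event> log;
  unsigned v = 0, h = 0;
  auto add_clocks = [&](unsigned n) { h += n; };

  while(v < lines) {
    log.push_back({v, h, "scanline"});         // H =    0 (initialize)
    add_clocks(10);
    log.push_back({v, h, "oam_reset_check"});  // H =   10 (OAM address reset)
    add_clocks(502);
    log.push_back({v, h, "render"});           // H =  512 (render)
    add_clocks(640);
    log.push_back({v, h, "cache_obsel"});      // H = 1152 (cache OBSEL)
    add_clocks(lineclocks - 1152);             // seek to start of next scanline
    h -= lineclocks;                           // wrap back to H = 0
    v++;
  }
  return log;
}
```

Calling run_scanlines(2) yields eight events, four per line, at H = 0, 10, 512 and 1152. The only structural difference from the goto version is one extra level of indentation.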

Also, here is the big timing change that causes the large speed hit:

Code:
  alwaysinline void tick() {
    history.ppudiff += 2;  //this is new
    status.hcounter += 2;

    if(status.hcounter >= 1360 && status.hcounter == lineclocks()) {
      //this part is hit once in 680 calls, so optimizing it won't help much
      status.hcounter = 0;
      status.vcounter++;
      if((region() == 0 && interlace() == false && status.vcounter == 262)
      || (region() == 0 && interlace() == true  && status.vcounter == 263)
      || (region() == 0 && interlace() == true  && status.vcounter == 262 && status.field == 1)
      || (region() == 1 && interlace() == false && status.vcounter == 312)
      || (region() == 1 && interlace() == true  && status.vcounter == 313)
      || (region() == 1 && interlace() == true  && status.vcounter == 312 && status.field == 1)
      ) {
        status.vcounter = 0;
        status.field = !status.field;
      }

      scanline();
    }

    history.index = (history.index + 1) & 2047;
    history.field   [history.index] = status.field;  //this is new
    history.vcounter[history.index] = status.vcounter;
    history.hcounter[history.index] = status.hcounter;
  }


Lines not marked "new" were in the old code.

May be some minor optimizations possible ... but such simplistic code really shouldn't be affecting speed much at all. Called roughly 10.7 million times a second (half the ~21.48MHz NTSC master clock), and those two lines eat up the 6-10% of time lost from the last release.
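
To make the history buffer idea concrete, here is a stripped-down sketch of a counter that records its position on every tick and can report where it was some number of ticks ago. The names CounterHistory and ago() are hypothetical, and the sketch assumes fixed 1364-clock lines and 262-line frames, which the real counter does not.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified counter with a 2048-entry history ring,
// mirroring the index = (index + 1) & 2047 pattern in the code above.
// Assumptions: every scanline is 1364 clocks and every frame 262 lines.
struct CounterHistory {
  struct Sample { uint16_t vcounter, hcounter; };

  uint16_t vcounter = 0, hcounter = 0;
  unsigned index = 0;
  Sample ring[2048] = {};

  // Advance two clocks and record the new position in the ring.
  void tick() {
    hcounter += 2;
    if(hcounter == 1364) {
      hcounter = 0;
      if(++vcounter == 262) vcounter = 0;
    }
    index = (index + 1) & 2047;
    ring[index] = {vcounter, hcounter};
  }

  // Position as it was `ticks` calls to tick() ago (0 = current).
  Sample ago(unsigned ticks) const {
    assert(ticks < 2048);  // anything older has been overwritten
    return ring[(index - ticks) & 2047];
  }
};
```

A processor that is running behind can then ask for the counter values at its own point in time instead of the position of the processor that ran ahead. The cost is the extra stores on every tick, which is the overhead discussed above.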


Mon Dec 15, 2008 9:57 am
Inmate

Joined: Thu Jan 11, 2007 4:28 am
Posts: 1485
Location: Salem, Oregon
Post 
It's great to see such progress towards the cycle-based renderer, and I didn't even notice the speed hit on my system. Congrats on another release, byuu; it seems like this is a breakthrough of sorts to allow some further tweaks in the weeks to follow. Great stuff.

_________________
byuu wrote:
Seriously, what kind of asshole makes an old-school 2D emulator that requires a Core 2 to get full speed? >:(


Mon Dec 15, 2008 11:44 am
Profile WWW
Seen it all
User avatar

Joined: Mon Jan 03, 2005 5:04 pm
Posts: 2302
Location: Germany
Post 
Question: Did you ever find a game/module that uses the "FirstSprite+Y priority" feature?

Nice release btw. :P I'll post a localized locale.cfg later.

_________________
vSNES | Delphi 10 BPLs
bsnes launcher with recent files list


Mon Dec 15, 2008 11:49 am
Profile WWW
Regular
User avatar

Joined: Thu Jul 29, 2004 8:55 am
Posts: 265
Location: The Netherlands
Post 
A bug in (probably) the GUI prevents me from using the diaeresis (e+¨).

"Export data:" = "Geëxporteerde gegevens:"

Just a heads-up before I post the updated locale.cfg for Dutch.

_________________
"Change is inevitable; progress is optional"


Mon Dec 15, 2008 1:11 pm
Profile
Seen it all
User avatar

Joined: Mon Jan 03, 2005 5:04 pm
Posts: 2302
Location: Germany
Post 
http://rapidshare.com/files/173544573/b ... locale.rar

Some other things:
- Long key names are cut off by the joypad image (pic). How about doubling the space for the text, and centering it there vertically?
- The notes in the advanced section of the settings window are not translated. Add them to locale.cfg?
- Real boolean values ("bool") in the advanced section? This would allow the user to toggle them with a doubleclick onto the list item.
- Disable the "Set" button in the advanced section if the value in the edit control is equal to the currently set value? (This is only a cosmetic issue.)

_________________
vSNES | Delphi 10 BPLs
bsnes launcher with recent files list


Mon Dec 15, 2008 1:31 pm
Profile WWW
Regular
User avatar

Joined: Tue Mar 07, 2006 10:32 am
Posts: 347
Location: The Netherlands
Post 
By the way, now that bsnes includes the readme in the executable, is there a call for this to be translated as well?


Mon Dec 15, 2008 1:31 pm
Profile
Veteran

Joined: Wed Aug 04, 2004 5:43 pm
Posts: 861
Location: Sloop
Post 
Verdauga Greeneyes wrote:
By the way, now that bsnes includes the readme in the executable, is there a call for this to be translated as well?


That might be asking too much of translators to do the readme and license and advanced text. But at least this text can be easily copied and pasted.


Mon Dec 15, 2008 3:32 pm
Profile
Lurker

Joined: Wed Jul 28, 2004 1:35 am
Posts: 128
Post 
The readme ought to be completely reasonable to localize.

The license, for some very good reasons, probably should not be localized.


Mon Dec 15, 2008 4:17 pm
Profile ICQ YIM
Post 
Quote:
I didn't even notice the speed hit on my system


And this is why I hate modern processors so much :P
Pentium IV 1.7GHz takes a 25% speed hit from the last version (51->41)
E8400 takes a 7% speed hit (152->142)
E4600 takes a 4% speed hit (118->113)
Athlon 3500+ takes no speed hit at all

That kind of difference for two new addition statements ... it's very annoying.

Quote:
it seems like this is a breakthrough of sorts to allow some further tweaks in the weeks to follow


Easily the biggest since the S-DSP by blargg :D
Someone recently mentioned he couldn't understand bsnes' scheduling system ... it's getting hard even for me to follow. Really cool the way it all comes together and works as expected.

Quote:
A bug in (probably) the GUI prevents me from using the diaeresis (e+¨).


I was able to get this to work. Make sure the file format is UTF-8, as diaeresis letters are > U+007F.
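
To illustrate why the encoding matters: in UTF-8, ë (U+00EB) is stored as the two bytes 0xC3 0xAB, whereas a single-byte code page such as Windows-1252 stores it as the lone byte 0xEB, which is not valid UTF-8 on its own. A small sketch of the two-byte encoding (the helper name is hypothetical and it only handles code points below U+0800):

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode a code point below U+0800 as UTF-8.
std::string utf8_encode_small(unsigned cp) {
  assert(cp < 0x800);
  std::string out;
  if(cp < 0x80) {
    out += static_cast<char>(cp);                  // ASCII: one byte
  } else {
    out += static_cast<char>(0xC0 | (cp >> 6));    // lead byte 110xxxxx
    out += static_cast<char>(0x80 | (cp & 0x3F));  // continuation 10xxxxxx
  }
  return out;
}
```

So a locale.cfg saved in a legacy code page will contain a byte sequence that a UTF-8 reader cannot decode as the intended letter.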

Quote:
Question: Did you ever find a game/module that uses the "FirstSprite+Y priority" feature?


Nope, I've never seen a game use it, and anomie's description was too vague. Hoping it will be supported transparently when re-writing the PPU.

Quote:
Some other things:


Thanks for the suggestions. I can add most of them.
I'd recommend shortening the joypad description for the time being, at least.

Quote:
That might be asking too much of translators to do the readme and license and advanced text. But at least this text can be easily copied and pasted.


Yeah, I didn't want to bug the translators with giant walls of text. Really, nobody should even need to use the advanced panel, and the readme is too big (plus we're still planning to re-do it or whatever.)

I'm sure it's already annoying getting blank locale files for each new version. Hopefully people are mostly just adding the missing strings to v037a's locale, rather than starting over each time :/


Mon Dec 15, 2008 5:45 pm
Regular
User avatar

Joined: Thu Jul 29, 2004 8:55 am
Posts: 265
Location: The Netherlands
Post 
byuu wrote:
Quote:
A bug in (probably) the GUI prevents me from using the diaeresis (e+¨).


I was able to get this to work. Make sure the file format is UTF-8, as diaeresis letters are > U+007F.

Thank you, it is working properly now.

_________________
"Change is inevitable; progress is optional"


Mon Dec 15, 2008 5:59 pm
Profile
Regular
User avatar

Joined: Tue Mar 07, 2006 10:32 am
Posts: 347
Location: The Netherlands
Post 
byuu wrote:
Hopefully people are mostly just adding the missing strings to v037a's locale, rather than starting over each time :/

You could consider making a diff for the translators. But as long as new strings can be added to the end of the locale file it shouldn't matter much.

Regarding the readme, it should be easier to translate than the shorter strings in many ways even if it is longer, but I agree we should at least wait until we're happy with the English version.


Mon Dec 15, 2008 6:32 pm
Profile
Seen it all
User avatar

Joined: Mon Jan 03, 2005 5:04 pm
Posts: 2302
Location: Germany
Post 
byuu wrote:
I'd recommend shortening the joypad description for the time being, at least.

Done. :D

http://rapidshare.com/files/173687599/b ... le__v2.rar
(result)

EDIT: fixed download

byuu wrote:
I'm sure it's already annoying getting blank locale files for each new version. Hopefully people are mostly just adding the missing strings to v037a's locale, rather than starting over each time :/

I just open the previous locale.cfg and the new one in Notepad++ and switch between them. Copying lines or even entire sections is easy.

_________________
vSNES | Delphi 10 BPLs
bsnes launcher with recent files list


Last edited by creaothceann on Tue Dec 16, 2008 2:06 am, edited 1 time in total.



Mon Dec 15, 2008 6:51 pm
Profile WWW
Regular
User avatar

Joined: Thu Jul 29, 2004 8:55 am
Posts: 265
Location: The Netherlands
Post 
Dutch translation for v0.038 ready:

DOWNLOAD!

_________________
"Change is inevitable; progress is optional"


Mon Dec 15, 2008 7:21 pm
Profile
Veteran

Joined: Wed Aug 04, 2004 5:43 pm
Posts: 861
Location: Sloop
Post 
byuu wrote:
Pentium IV 1.7GHz takes a 25% speed hit from the last version (51->41)


I wouldn't worry about ancient processor classes that couldn't get full speed to begin with. In two years, even the cheapest netbook will probably get full speed. That tells me that a massive writedown (beyond the IRQ trick) for PC architecture probably wouldn't be worth the trouble. Popular ultra-portables like the PSP are where a higher-caliber SNES emulator is really needed. I'm not really sure when the next iteration of handhelds is.


Mon Dec 15, 2008 7:26 pm
Profile
Post 
Anyone want to help come up with some better names for the new PPUcounter class?

Code:
class PPUcounter {
  //I like these two names ... they convey that S-CPU runs ahead of S-PPU
  void tick();  //advance S-CPU
  void tock();  //advance S-PPU

  bool field();  //S-CPU current field value (0 = even, 1 = odd)
  uint16 vcounter();  //S-CPU current vertical counter
  uint16 hcounter();  //S-CPU current horizontal counter
  uint16 hdot();  //S-CPU current horizontal dot (pixel) position
  uint16 lineclocks();  //S-CPU number of clocks on this scanline

  bool field(unsigned n);  //S-CPU field value 'n' clocks ago [history buffer]
  uint16 vcounter(unsigned n);
  uint16 hcounter(unsigned n);

  bool ppufield();  //S-PPU current field value
  uint16 ppuvcounter();
  uint16 ppuhcounter();
  uint16 ppulineclocks();
} ppucounter;


Because yeah, ppucounter.ppuvcounter() looks like ass :(

A shame, I could easily do something like this:

Code:
uint16 PPUcounter::vcounter(unsigned n = 0) {
  return co_active() == thread_cpu ? cpu_vcounter(n) : ppu_vcounter(n);
}


Eg automatically detect the active thread and dynamically change what vcounter() returns transparently. Kind of like thread local storage for cooperative threads.

But the code is so sensitive that a simple comparison would cause a speed hit.

Anyway ... need something minimalist, clean, and with the least amount of repetition. The counters are part of the PPU, and I can technically subclass this inside the main PPU class, so that ppucounter.* becomes ppu.*.

Quote:
Popular ultra-portables like the PSP are where a higher-caliber SNES emulator is really needed. I'm not really sure when the next iteration of handhelds is.


Yeah, we really don't have a sweet spot between portability, compatibility and speed right now. 9x uses an older opcode-based model, ZSNES uses x86 assembler and SNESGT is closed source.

I'd really like to team up with some other people (maybe AamirM? ;) and work on something like that once I finish the cycle renderer. Would be fun working in a group via SVN or something.


Last edited by byuu on Mon Dec 15, 2008 8:12 pm, edited 1 time in total.



Mon Dec 15, 2008 8:02 pm
ZSNES Shake Shake Prinny
User avatar

Joined: Wed Jul 28, 2004 4:15 pm
Posts: 5615
Location: PAL50, dood !
Post 
byuu wrote:
Bla bla "goto is evil", whatever. Replace it with while(true) {} if it lets you sleep better.

I don't understand why you don't use it, really. Syntactically identical, takes less space to boot, and not using goto: win-win-win. ^^
Goto is not evil, since it's only a way to write a jump.
It usually indicates either a really special event where it's the only right thing to do, or the classic case that gave it a bad reputation (silly code). Typically you can write code that's 'aware' of the unavoidable jump and end up removing it altogether, so eventually gotos became synonymous with lazy/bad code.
And there's no shortage of lazy/bad coders, so the picture stuck. Every once in a while a skilled coder will use it, but that's not enough to bring it back from shameland where the morons made it drift.

Quote:
Code:
big OR block
Interesting. Probably can condense it a fair bit with some effort. 1/600 is still a lot with millions of calls per seconds, nay ?

_________________
皆黙って俺について来い!!
Code:
<jmr> bsnes has the most accurate wiki page but it takes forever to load (or something)

Pantheon: Gideon Zhi | CaitSith2 | Nach | kode54


Mon Dec 15, 2008 8:11 pm
Profile
Post 
Quote:
I don't understand why you don't use it, really. Syntactically identical, takes less space to boot, and not using goto: win-win-win. ^^


The extra indentation in the main loop for absolutely no reason is just annoying, really. Especially in bigger modules like the S-DSP.

Used to have it where the cothread system would automatically re-enter the thread entry point upon return from it (rather than showing undefined behavior as now.) But yeah, that would be even more confusing to outsiders looking at the code.

Quote:
Interesting. Probably can condense it a fair bit with some effort. 1/600 is still a lot with millions of calls per second, nay?


An extra two or three compares ~16,000x a second (once per scanline). But if it helps, I'll be happy to optimize it.

Code:
if((region() == 0 && interlace() == false && status.vcounter == 262)
|| (region() == 0 && interlace() == true  && status.vcounter == 263)
|| (region() == 0 && interlace() == true  && status.vcounter == 262 && status.field == 1)
|| (region() == 1 && interlace() == false && status.vcounter == 312)
|| (region() == 1 && interlace() == true  && status.vcounter == 313)
|| (region() == 1 && interlace() == true  && status.vcounter == 312 && status.field == 1)
)

vs:

if(status.vcounter == (313 - !region() * 50) - (!interlace() | (interlace() & status.field)))


Blech, absolutely evil x.x

I'd have to use more grouping layers in the if statement. It'd be harder to read, but should be doable. I'd honestly hope the compiler would optimize the redundant checks out on its own, but it probably doesn't.
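
A quick way to gain confidence in the condensed form is to compare it against the original over every input. The sketch below does that; the function names are hypothetical and the conditions are copied from the two versions above, with plain arguments in place of region(), interlace() and status.*.

```cpp
#include <cassert>

// The six-way test from tick(), with plain arguments.
bool frame_end_long(int region, bool interlace, bool field, int vcounter) {
  return (region == 0 && interlace == false && vcounter == 262)
      || (region == 0 && interlace == true  && vcounter == 263)
      || (region == 0 && interlace == true  && vcounter == 262 && field == 1)
      || (region == 1 && interlace == false && vcounter == 312)
      || (region == 1 && interlace == true  && vcounter == 313)
      || (region == 1 && interlace == true  && vcounter == 312 && field == 1);
}

// The condensed single-compare form.
bool frame_end_short(int region, bool interlace, bool field, int vcounter) {
  return vcounter == (313 - !region * 50) - (!interlace | (interlace & field));
}

// Counts inputs where the two forms disagree, over both regions, both
// interlace and field settings, and vcounter 0..319.
int count_mismatches() {
  int mismatches = 0;
  for(int region = 0; region <= 1; region++)
  for(int interlace = 0; interlace <= 1; interlace++)
  for(int field = 0; field <= 1; field++)
  for(int v = 0; v < 320; v++) {
    if(frame_end_long(region, interlace, field, v)
    != frame_end_short(region, interlace, field, v)) mismatches++;
  }
  return mismatches;
}
```

count_mismatches() returns 2: the forms differ only for interlace on, field 1, and vcounter 263 (or 313 for region 1). As far as the tick() code above shows, that state should not be reachable, because with field 1 the counter has already been reset one line earlier, so the two versions should behave the same in practice.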

I really want to understand why a simple math op, even 10 million times a second, is causing such a tremendous speed loss. You wouldn't think that'd cause a modern processor to even blink.

Seriously, this is half of bsnes' speed problem in 2kb of code. I can accept that I'm a bad programmer, sure. If someone can find a way to get it running faster, I'll happily merge their changes. I'm at a loss.


Last edited by byuu on Mon Dec 15, 2008 8:53 pm, edited 1 time in total.



Mon Dec 15, 2008 8:18 pm
Seen it all
User avatar

Joined: Mon Jan 03, 2005 5:04 pm
Posts: 2302
Location: Germany
Post 
byuu wrote:
The extra indentation in the main loop for absolutely no reason

It's a visual clue. I'd prefer "while" for that reason.


EDIT: How about this code?

Code:
DestV := 262;
if region then Inc(DestV, 50);
if (not interlace) then begin
        if (status.vcounter = DestV)                            then goto proceed;
end else begin
        if (status.vcounter = DestV + 1)                        then goto proceed;
        if (status.vcounter = DestV    ) and (status.field = 1) then goto proceed;
end;
goto skip;
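
For readers who don't read Delphi, roughly the same logic in C++. This is a hypothetical rendering, not code from bsnes: the function name is made up, region is assumed to be 0 for the 262-line case and 1 for the 312-line case as in tick(), and the goto proceed / goto skip pair becomes a boolean return.

```cpp
#include <cassert>

// Hypothetical C++ rendering of the Delphi snippet above.
// Returns true where the Delphi code would "goto proceed".
bool frame_end_staged(int region, bool interlace, bool field, int vcounter) {
  assert(region == 0 || region == 1);
  int dest_v = 262;
  if(region) dest_v += 50;                   // 312 for region 1
  if(!interlace) return vcounter == dest_v;  // progressive: one fixed length
  if(vcounter == dest_v + 1) return true;    // interlace: the longer field
  return vcounter == dest_v && field;        // interlace: the shorter field
}
```

Region and interlace are each tested once, so the common non-interlaced case costs two branches and one compare.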

_________________
vSNES | Delphi 10 BPLs
bsnes launcher with recent files list


Last edited by creaothceann on Mon Dec 15, 2008 9:15 pm, edited 2 times in total.



Mon Dec 15, 2008 8:53 pm
Profile WWW
Post 
Again byuu, you are great. I now, yet again, have a new version of bsnes to (finish) Shin Megami Tensei on. I'm almost at the end.
Very soon, your wonderful emulator will be running Shin Megami Tensei 2.
Yes, SMT is my favourite SNES game.

A great reason to compile a new version of bsnes on my linux box too (on Windows I have .037a, but on Linux I'm still using .034; that's about to change).


Mon Dec 15, 2008 9:07 pm
Gambatte Developer
Gambatte Developer

Joined: Fri Oct 21, 2005 4:03 pm
Posts: 157
Location: Norway
Post 
Try putting this part:
Code:
      //this part is hit once in 680 calls, so optimizing it won't help much
      status.hcounter = 0;
      status.vcounter++;
      if((region() == 0 && interlace() == false && status.vcounter == 262)
      || (region() == 0 && interlace() == true  && status.vcounter == 263)
      || (region() == 0 && interlace() == true  && status.vcounter == 262 && status.field == 1)
      || (region() == 1 && interlace() == false && status.vcounter == 312)
      || (region() == 1 && interlace() == true  && status.vcounter == 313)
      || (region() == 1 && interlace() == true  && status.vcounter == 312 && status.field == 1)
      ) {
        status.vcounter = 0;
        status.field = !status.field;
      }
 
      scanline();

in a separate non-inline function.

Consider merging counters into a single counter and use offsets from this counter for particular counters if the offsets don't need to be calculated every tick.

If possible, use a single counter and a single "next necessary update count" variable to batch updates. A hierarchical event system may be even better.

I'm guessing these things may be impractical with the current architecture or that they would require compromising clarity and maintainability. I'm just airing my thoughts here in case you'd find them helpful.

I don't have time to get detailed at the minute (or look at lots of bsnes code), but, if there's anything, feel free to ask and I'll try to answer as best as I can when I have time.


Last edited by sinamas on Mon Dec 15, 2008 9:41 pm, edited 1 time in total.



Mon Dec 15, 2008 9:36 pm
Profile
Rookie
User avatar

Joined: Mon Aug 02, 2004 5:14 am
Posts: 39
Post 
http://kuro-hitsuji.net/~tukuyomi/stuff ... french.zip
French locale for bsnes v0.038.
I left some strings untranslated because of my lack of programming experience. If someone has the knowledge, feel free to modify my locale file.
Lines 175-179:
Code:
"Export memory" = "Export memory"
"Toggle S-CPU tracing" = "Toggle S-CPU tracing"
"Toggle S-CPU trace mask" = "Toggle S-CPU trace mask"
"Toggle S-SMP tracing" = "Toggle S-SMP tracing"
"Toggle S-SMP trace mask" = "Toggle S-SMP trace mask"

Line 226:
Code:
"Export data:" = "Export data:"


Mon Dec 15, 2008 9:39 pm
Profile WWW
Seen it all
User avatar

Joined: Mon Jan 03, 2005 5:04 pm
Posts: 2302
Location: Germany
Post 
<offtopic>
Version 2 of the locale.cfg I posted was the wrong one; if you got that one please clear your caches and download again.
</offtopic>

_________________
vSNES | Delphi 10 BPLs
bsnes launcher with recent files list


Mon Dec 15, 2008 10:10 pm
Profile WWW
Post 
Quote:
EDIT: How about this code?


Haven't learned Moonspeak yet, sorry :P j/k

Quote:
Very soon, your wonderful emulator will be running Shin Megami Tensei 2.


Just gotta work out the bugs :P

Quote:
Yes, SMT is my favourite SNES game.


Ugh, those games have horrible Huffman tables.

My favorites are SMT: DC Black+White, too bad they weren't for the SNES, or compatible with the SGB.

Quote:
Try putting this part in a separate non-inline function.


tick() is only called in one place, but okay. I'll post results in a bit.

EDIT: no luck :(
With the E4600, set frameskip to 9 to rule out S-PPU overhead. I get 160.5fps either way. The same thing with the above black magic vcounter position testing.

Quote:
Consider merging counters into a single counter and use offsets from this counter for particular counters if the offsets don't need to be calculated every tick.


I wanted to do that, but unfortunately both the hcounter clocks per scanline, and vcounter scanlines per frame, change based on field and interlace settings.

I tried a trick where I treated all scanlines as 1364, rather than 1360 and 1364, and skipped an extra four on said scanline ... it helped speed up IRQ testing, but I couldn't get all of my edge case test ROMs to pass :/

Quote:
If possible, use a single counter and a single "next necessary update count" variable to batch updates.


Batching could certainly work. I still need to write out each counter value in the history buffer due to aforementioned complexity in calculating the counter positions, so I'm not sure how much of a performance advantage that would offer ...

But batching is pretty easy by just detecting when the clocks to add will wrap past the end of the scanline, and then splitting that case into two batches. That may be worth it. It would also require range testing the IRQ latch positions as well (I test against the counter positions after each tick() call) ... which is what's given me nightmares in the past.
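
The splitting part on its own might look like the sketch below. BatchCounter and scanlines_started are hypothetical names, line and frame lengths are fixed at 1364 and 262 for simplicity, and the hard part (range-testing the IRQ latch positions) is deliberately left out.

```cpp
#include <cassert>

// Hypothetical batched counter: advances many clocks per call and splits
// the batch wherever it would cross the end of a scanline.
// Assumptions: fixed 1364-clock lines and 262-line frames; the real
// counter varies both with region, interlace, and field.
struct BatchCounter {
  unsigned vcounter = 0, hcounter = 0;
  unsigned scanlines_started = 0;  // stands in for the scanline() callback

  void add_clocks(unsigned clocks) {
    const unsigned lineclocks = 1364;
    assert(hcounter < lineclocks);
    while(clocks > 0) {
      unsigned remaining = lineclocks - hcounter;  // clocks left on this line
      if(clocks < remaining) {  // batch stays inside the current line
        hcounter += clocks;
        return;
      }
      clocks -= remaining;      // first part: run to the end of the line
      hcounter = 0;
      if(++vcounter == 262) vcounter = 0;
      scanlines_started++;
      // the loop continues with whatever is left, on the new line
    }
  }
};
```

Starting from H = 1360, add_clocks(10) is handled as a batch of 4 that ends the line followed by a batch of 6 on the next one. Any IRQ test would have to be run against each of those two ranges rather than against a single counter value.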

Quote:
I'm guessing these things may be impractical with the current architecture or that they would require compromising clarity and maintainability.


It's more a problem of difficulty with the batch / range testing stuff. There were a good 50 or so pages in the old bsnes thread where I was trying to get that working and failed.

Given the relative obscurity of the code, and the tremendous impact to speed, if we could get it working I'd go with the speed over clarity in this particular case. I'm reasonably confident I have the IRQ timing perfected at this point.

Quote:
I don't have time to get detailed at the minute (or look at lots of bsnes code), but, if there's anything, feel free to ask and I'll try to answer as best as I can when I have time.


Thanks, I appreciate the examples above.

I was hoping there'd be a way to use the same approach and still speed it up, but I realize that's fairly unlikely. Going to require a radically different approach to accelerate.


Last edited by byuu on Mon Dec 15, 2008 10:23 pm, edited 2 times in total.



Mon Dec 15, 2008 10:14 pm