Creator of Ootake has a message to all SNES/NES emu authors

kick
Trooper
Posts: 550
Joined: Wed Mar 01, 2006 8:47 pm

Post by kick »

The author of the free, open-source PC Engine emulator Ootake has just released the latest version, which includes an important message to all SNES/NES/Genesis/GBx emulator authors out there.

You can read more about it here:
http://www.ouma.jp/ootake/delay.html
(but before you go there, please read the text message below)

Here's the text included with the latest release of the emulator:

+ This will be an important talk for the future of "PC Engine(TG16)"
emulator. Please read though it is my not good English Language.
+ "Ootake" is not satisfied with "the game was able to be started".
"The game can enjoy even ending" is a target. Therefore, Ootake sticks to
detailed reproduction. Especially, it sticks to the reproduction of
"Reactive speed of the Joypad".
+ Please do seriously and compare the Shooting & Action game by
"MagicEngine(Charge)" and "Ootake(Free)". Perhaps, almost playing by
"Ootake(the delay after the pad is operated is minimum)" will be able to
take a high score. I think the difference of the operation feeling to be
felt.
+ This greatly influences "Played happiness", too. Though it becomes
a severe opinion, it will not be able to enjoy the game enough by
"MagicEngine(the delay from the pad operation to the reaction is large)".
The delay also has danger of making it to the one to which even
"Evaluation of the game" was mistaken. Therefore, "Ootake" is checked
always severely "Whether it is possible to enjoy it by the sense similar
to a real machine or not?". Because it is a respect of minimum to a past
masterpiece "PC Engine(TG16)" game.
About the delay measures of "Ootake"
- http://www.ouma.jp/ootake/delay.html
+ There are no feelings that criticize "MagicEngine". I think that it did
wonderful work. But, it greatly delayed the development of "PC Engine"
emulator compared with other machine(NES,SNES,GENESIS,etc.) emulators.
Because the monopolized (even the BAD ROM-image could be played)
"MagicEngine" was "Source closed-door".
Other machine(NES,SNES,GENESIS,etc.) emulators has developed greatly by an
excellent source code and information on hot people.
+ Fortunately now, "Mednafen(Free)" and "Ootake(Free)" exist. Those
reproduction level is higher than "MagicEngine". And, they keep disclosing
the source and information.
I feel that it is time when "MagicEngine" also discloses the source and
information if "PC Engine(TG16) is loved. This is my selfish hope.
Because the continuance of development can be handed over to young people
who love retro game.
"Even future generations keep telling an old masterpiece with perfect
reproduction." I think that this is essence of the emulator.
+ Thank you for reading.

Kitao Nakamura
PC-Engine / TurboGrafx16 emulator Ootake

Now, emu authors can click here: http://www.ouma.jp/ootake/delay.html and read up on how Ootake solved this issue and how the same thing can be achieved in other emulators as well.

The future is in OpenSource and accuracy-focused emulation :D
[i]Have a nice kick in da nutz[/i] @~@* c//
blargg
Regular
Posts: 327
Joined: Thu Jun 30, 2005 1:54 pm
Location: USA

Post by blargg »

Summary:
  • To reduce input latency, don't read state of controller until emulated system does. But, some games continuously read controller, so limit frequency of reads by ignoring reads sooner than some amount of time after previous read.
  • To further reduce latency, break emulation into chunks of 1/240 second, and avoid emulating more than 1/240 second within 1/240 second of host time.
  • Defining the following uses a lower-latency DirectInput version on Windows XP SP2: #define DIRECTINPUT_VERSION 0x0500
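
The first point can be sketched in Python as follows. This is only an illustration under assumptions, not Ootake's actual code: `LazyPad`, `read_host`, and the minimum interval are hypothetical, and the clock is injected so the logic can be exercised without real hardware.

```python
from typing import Callable, Optional


class LazyPad:
    """Reads the host controller only when the emulated system asks for it."""

    def __init__(self, read_host: Callable[[], int], min_interval_s: float,
                 clock: Callable[[], float]) -> None:
        self._read_host = read_host            # hypothetical host-input function
        self._min_interval_s = min_interval_s  # reads sooner than this are ignored
        self._clock = clock                    # host time source, in seconds
        self._last_time: Optional[float] = None
        self._cached = 0                       # button bitmask from the last real read
        self.host_reads = 0                    # how often the host was really polled

    def emulated_read(self) -> int:
        """Called from the emulated joypad port read handler."""
        now = self._clock()
        if self._last_time is None or now - self._last_time >= self._min_interval_s:
            # Enough host time has passed: fetch a fresh state.
            self._cached = self._read_host()
            self._last_time = now
            self.host_reads += 1
        # Otherwise the game is polling continuously: reuse the previous state.
        return self._cached
```

A game that polls in a tight loop then costs at most one host read per `min_interval_s`, while a game that polls once per frame always gets a fresh state.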
byuu

Post by byuu »

Kitao Nakamura wrote:
+ There are no feelings that criticize "MagicEngine". I think that it did
wonderful work. But, it greatly delayed the development of "PC Engine"
emulator compared with other machine(NES,SNES,GENESIS,etc.) emulators.
Because the monopolized (even the BAD ROM-image could be played)
"MagicEngine" was "Source closed-door".
Other machine(NES,SNES,GENESIS,etc.) emulators has developed greatly by an
excellent source code and information on hot people.
Ah, someone who agrees with my opinions on open v closed source :)
blargg wrote:
* To reduce input latency, don't read state of controller until emulated system does. But, some games continuously read controller, so limit frequency of reads by ignoring reads sooner than some amount of time after previous read.
That was the problem I had, a game polling nonstop would add serious overhead. This trick would help. However, I'm not convinced that it makes a huge difference in responsiveness, as you typically won't see any changes until the next frame anyway.

Perhaps a compromise would be to forcefully poll once every N scanline(s), rather than once every frame. A much simpler change, and it shouldn't have much impact unless your input method eats up way too much CPU time.
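
A minimal sketch of that compromise, assuming a scanline-based main loop. The names `run_frame`, `poll_host`, and `run_scanline` are hypothetical, and the 262-line frame and the interval of 8 are illustrative values only.

```python
from typing import Callable

SCANLINES_PER_FRAME = 262  # NTSC-style frame; an assumption for illustration


def run_frame(poll_host: Callable[[], None],
              run_scanline: Callable[[int], None],
              poll_every: int = 8) -> int:
    """Emulates one frame, polling host input every `poll_every` scanlines.

    Returns the number of host polls made during the frame.
    """
    polls = 0
    for line in range(SCANLINES_PER_FRAME):
        if line % poll_every == 0:
            poll_host()        # refresh the input state the game will see
            polls += 1
        run_scanline(line)     # emulate CPU and video for this scanline
    return polls
```

Setting `poll_every` to the frame height gives the usual once-per-frame behaviour, so the cost of the change can be measured by varying a single parameter.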
blargg wrote:
To further reduce latency, break emulation into chunks of 1/240 second, and avoid emulating more than 1/240 second within 1/240 second of host time.
This one also seems kind of odd. I see his general premise about the large vsync points where no input is parsed, but at the same time, if the frame doesn't update but every 1/60th of a second, is this really even noticeable?

Unfortunately, this one would be much trickier with an emulator that synchronizes to the audio playback rate, rather than video playback rate. Still doable, I know. It'd have quite an impact on speed, too, especially as I continue emulating during the vertical refresh time.
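
For reference, the 1/240-second chunking could look roughly like the sketch below. It assumes a video-synchronized emulator with a sleep-based throttle; `run_paced` and `emulate_chunk` are hypothetical names, and real hosts may not sleep this precisely.

```python
import time
from typing import Callable

CHUNK_S = 1 / 240  # chunk length from the technique described above


def run_paced(emulate_chunk: Callable[[int], None], chunks: int,
              clock: Callable[[], float] = time.perf_counter,
              sleep: Callable[[float], None] = time.sleep) -> None:
    """Emulates `chunks` slices, never running ahead of host time."""
    start = clock()
    for i in range(chunks):
        emulate_chunk(i)                      # emulate 1/240 s of guest time
        deadline = start + (i + 1) * CHUNK_S  # host time at which this slice ends
        remaining = deadline - clock()
        if remaining > 0:
            sleep(remaining)                  # wait out the rest of the slice
```

Deadlines are computed from the start time rather than from the previous slice, so sleep inaccuracy does not accumulate over a frame.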
blargg wrote:
Defining the following uses a lower-latency DirectInput version on Windows XP SP2: #define DIRECTINPUT_VERSION 0x0500
Well, that one's easy enough. Would be nice to have empirical proof of that, though.
blargg

Post by blargg »

byuu wrote:
* To reduce input latency, don't read state of controller until emulated system does. But, some games continuously read controller, so limit frequency of reads by ignoring reads sooner than some amount of time after previous read.
However, I'm not convinced that it makes a huge difference in responsiveness, as you typically won't see any changes until the next frame anyway.
Delaying the input read until the emulated system accesses it is to reduce latency from input to emulator, not emulator to display.
byuu wrote:
Perhaps a compromise would be to forcefully poll once every N scanline(s), rather than once every frame. A much simpler change, and it shouldn't have much impact unless your input method eats up way too much CPU time.
Why not simply wait until the emulated system reads (and return previous input if the last host read was more recent than a certain time)?
To further reduce latency, break emulation into chunks of 1/240 second, and avoid emulating more than 1/240 second within 1/240 second of host time.
The point of this one seems to be to avoid having the next frame emulated just after the current one is displayed. Seems it'd be even better to wait for vsync, display the current frame, then wait slightly less than 1/60 second before emulating the next frame. That way, it will be emulated just before it needs to be displayed, allowing input to be read at the last possible moment.

The general point of both these techniques is to reduce the average latency, even if only by a few milliseconds, since every bit helps.
byuu wrote:
Defining the following uses a lower-latency DirectInput version on Windows XP SP2: #define DIRECTINPUT_VERSION 0x0500
Well, that one's easy enough. Would be nice to have empirical proof of that, though.
Yep, someone needs to write a simple fake emulator that fakes emulation and shows the latency from input to display, so the above techniques can be objectively benchmarked.
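
As a starting point, such a fake emulator can be modelled purely on paper. The sketch below rests on a simplified model that is an assumption, not a measurement: host input is sampled once per frame at a fixed offset after vsync, the resulting frame is shown at the next vsync, and presses are spread evenly over one frame period. Host driver and display delays are ignored, and `average_latency_ms` is a hypothetical name.

```python
def average_latency_ms(frame_ms: float, sample_offset_ms: float,
                       presses: int = 10_000) -> float:
    """Average press-to-display latency under the simplified model."""
    total = 0.0
    for k in range(presses):
        # Evenly spaced press times within one frame period.
        press = (k + 0.5) * frame_ms / presses
        if press <= sample_offset_ms:
            shown = frame_ms       # caught by this frame's sample
        else:
            shown = 2 * frame_ms   # has to wait for the next frame's sample
        total += shown - press
    return total / presses
```

In this model the average works out to 1.5 frame periods minus the sample offset, so sampling right after vsync costs about 1.5 frames and each millisecond the sample is delayed removes one millisecond of average latency.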
Exophase
Hazed
Posts: 77
Joined: Mon Apr 28, 2008 10:54 pm

Post by Exophase »

Both of my emulators update input state every 1/60th of a second (when vsync is emulated), and I have yet to hear a single complaint from a single person that the latency is bad in any game. I've heard a lot of suggestions that the polling rate needs to be higher than this, but of course most games just poll once per frame anyway (and typically after the vblank IRQ). I understand this is less "accurate", but not in the sense that accuracy is usually striven for. I think that this is a highly psychological problem.

On another topic, I'm not really satisfied with what Ootake calls "detailed reproduction" or "modeling after a real machine", when a lot of things are done via hacks and guess implementations (things that "just work") that are far from how the real machine works. If this is his goal then I suggest he start writing some tests to determine VDC delays, because I'm all but certain that that's what's behind most of the problems in the PCE emulators that aren't per-game-hacking around it (even Popful Mail). I also don't think that he should criticize the accuracy of other emulators (particularly based on assumptions due to closed source) until he changes his approach to emulation substantially.

I haven't heard from others that Magic Engine is less responsive. Is it not possible that particular machine parameters are affecting this for him? We need more information before we start burning ME at the stake for charging money (a precedent set when it was by far the best PCE emulator for years).

On the topic of open source, I don't know if Magic Engine going open would have helped PCE emulation. I actually believe that Ootake being open has helped little, or at least it has helped me little. Actually, I would say that it has pushed me in the wrong direction before because of its hacks, and since I can't understand the Japanese comments I don't really know what to make of things (not that this is his fault of course, but restricting developers to those who know Japanese creates a small minority). I know that David Michel (author of ME) has helped others with PCE details in the past, and to me this is more valuable, as are the efforts of Charles MacDonald and his testing + documentation than his PCE emulator (which is open, but very old/incomplete).

My PCE emulator is also not open source, but I am trying to document the things I encounter, including the results of tests run on hardware (and these tests can be obtained from me as well), which I also think is potentially more useful than Ootake's source.
kick

Post by kick »

The latest version of Ootake is out and once again it comes bundled with another chapter of the multi-volume bestseller "Okure Monogatari" :D

************************
The saga continues...
************************

Please let me do a little an important talk for the game field. It is
continuation of the story (Input Delay(Lag) problem) written last time.
+ The delay influences people who do not feel the delay, too. Even if you
cannot feel the delay, the difficulty of the game goes up certainly, and
the pleasure of the game certainly falls. As a result, there might be a
case that goes away from the game (Or, the played frequency decreases.
Without recognizing the cause for myself).
+ "Golf game" is the most remarkable to feel the delay. Please play
"Naxat Open"(the masterpiece golf game of "PC Engine") by full-screen(for
easy to watch). And, watch the power meter(small circle) of the screen
seriously when you swing. In the state, push the button of the pad when
the power meter(small circle) came right above person's head. When playing
with the display monitor where delay hardly exists, in "Ootake", the meter
(small circle) stops at once if the button is pushed.
But, in "MagicEngine(input-lag is large)", the meter(small circle) stops
in the place that shifted to the right a little.
If you do not push the button (do not shot) after this, this test can be
repeated. Please try repeatedly. And, you must feel the difference of
"Play Sense".
+ When you recognize the input-lag, if you push the button ahead of time, it
becomes a temporary solution. However, you will not feel good feelings
in the method. True interest of the action game is ruined.
Though it is not desirable, the timing of the golf game can become
accustomed. However, the timing of the baseball and tennis game becomes
unnatural late. Because up, down, left, and right movement is necessary.
Even the timing of other actions and shooting games is similar.
+ As a result, I strongly worry about the misunderstanding (difficult and
not interesting for the delay) of the game.
* Example of measures on emulator author side
-> http://www.ouma.jp/ootake/delay.html
* Measures on user side PC environment
-> http://www.ouma.jp/ootake/delay-solution.html
+ Therefore, please read above-mentioned "Measures on user side PC
environment", and make a good PC environment if possible. And, in
"Ootake", enjoy playing "PC Engine(TG16)" games sincerely.

+ Thank you for reading.

Kitao Nakamura,
PC Engine / TurboGrafx16 emulator Ootake
byuu

Post by byuu »

Exophase wrote:
On the topic of open source, I don't know if Magic Engine going open would have helped PCE emulation ... I know that David Michel (author of ME) has helped others with PCE details in the past, and to me this is more valuable, as are the efforts of Charles MacDonald and his testing + documentation than his PCE emulator (which is open, but very old/incomplete).
It's hard to say for sure.

I really think code is more important than documentation. It's far, far easier to make mistakes in documentation, as you lack any form of empirical unit tests. Whereas with improper source code, all but very subtle flaws will cause very visible bugs. Of course, I'd take an expert's notes over the source code of someone who did not perform his tests on real hardware.

I also believe that the PCE suffered more due to its lack of popularity. Even with the SNES being as huge as it is, there are currently zero people running tests on real hardware to improve emulation. The PCE, like the Jaguar, had the odds stacked against it from the beginning.

I'm always disappointed that people would rather hide secrets than share them, but what can you do? There are real assholes out there who take advantage of such generosity and cause people to not want to release code. It's completely their choice and needs to be respected. But if you're going to complain, step up and help out first, at least. Thankfully, the Ootake guy is doing that. But he needs to keep in mind (as do I) that, e.g., the ME author owes the world nothing.
kick wrote:
The latest version of Ootake is out and once again it comes bundled with another chapter of the multi-volume bestseller "Okure Monogatari"
... alright, what the hell.

How about I make a little ADD-style test program. Have it flash letters on the screen. When you see an 'X', you push a certain button. The idea is to push the button as soon as you see the letter. We then record when the emulator acknowledged the keypress.

Do this test for about five minutes. Once with an emulator that polls every frame, and once with the same exact emulator, modified to poll every scanline.

Do it as a blind test: do not provide source code, so that nobody knows which is which.

Store the results in RAM, and then compare the results. If one scores significantly better in the vast majority of cases, then we know it matters. If not, then we know it does not.
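
One way the recorded times could be compared, sketched under assumptions: a simple permutation test on the difference of mean reaction times between the two builds. The function name, the number of rounds, and the fixed seed are all hypothetical choices, and nothing here is real measured data.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_p_value(a: Sequence[float], b: Sequence[float],
                        rounds: int = 2000, seed: int = 0) -> float:
    """Estimates how likely a mean difference this large is by chance alone.

    `a` and `b` are reaction times (for example in milliseconds) recorded
    with the two emulator builds. A small result suggests a real difference.
    """
    rng = random.Random(seed)              # fixed seed keeps runs repeatable
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)                # pretend the build labels are random
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (rounds + 1)       # add-one so the estimate is never 0
```

Because human reaction times vary by far more than a frame, many presses per build would probably be needed before a difference of a few milliseconds shows up.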

Any perceived flaws in this idea that I'm not considering?

(I would honestly prefer someone else do this, so that people without Core 2's could test. But if nobody's interested, I'll try and give it a shot.)
Exophase

Post by Exophase »

Sounds like a good idea, but first, I have a better plan to debunk what he's saying:

Super Star Soldier, a game he claims is sensitive to this:

Code:

reading io (pc e15b), select 1, clr 0, frame 3c0 scanline 101
reading io (pc e16b), select 0, clr 0, frame 3c0 scanline 101
reading io (pc e15b), select 1, clr 0, frame 3c0 scanline 101
reading io (pc e16b), select 0, clr 0, frame 3c0 scanline 101
reading io (pc e15b), select 1, clr 0, frame 3c0 scanline 102
reading io (pc e16b), select 0, clr 0, frame 3c0 scanline 102
reading io (pc e15b), select 1, clr 0, frame 3c0 scanline 102
reading io (pc e16b), select 0, clr 0, frame 3c0 scanline 102
reading io (pc e15b), select 1, clr 0, frame 3c0 scanline 102
reading io (pc e16b), select 0, clr 0, frame 3c0 scanline 102
reading io (pc e15b), select 1, clr 0, frame 3c1 scanline 101
reading io (pc e16b), select 0, clr 0, frame 3c1 scanline 101
reading io (pc e15b), select 1, clr 0, frame 3c1 scanline 101
reading io (pc e16b), select 0, clr 0, frame 3c1 scanline 101
reading io (pc e15b), select 1, clr 0, frame 3c1 scanline 102
reading io (pc e16b), select 0, clr 0, frame 3c1 scanline 102
reading io (pc e15b), select 1, clr 0, frame 3c1 scanline 102
reading io (pc e16b), select 0, clr 0, frame 3c1 scanline 102
reading io (pc e15b), select 1, clr 0, frame 3c1 scanline 102
reading io (pc e16b), select 0, clr 0, frame 3c1 scanline 102
reading io (pc e15b), select 1, clr 0, frame 3c2 scanline 101
reading io (pc e16b), select 0, clr 0, frame 3c2 scanline 101
reading io (pc e15b), select 1, clr 0, frame 3c2 scanline 101
reading io (pc e16b), select 0, clr 0, frame 3c2 scanline 101
reading io (pc e15b), select 1, clr 0, frame 3c2 scanline 102
reading io (pc e16b), select 0, clr 0, frame 3c2 scanline 102
reading io (pc e15b), select 1, clr 0, frame 3c2 scanline 102
reading io (pc e16b), select 0, clr 0, frame 3c2 scanline 102
reading io (pc e15b), select 1, clr 0, frame 3c2 scanline 102
reading io (pc e16b), select 0, clr 0, frame 3c2 scanline 102
Naxat Open, the other game he pointed out, follows the same story:

Code:

reading io (pc ee48), select 1, clr 0, frame 6dd scanline 90
reading io (pc ee58), select 0, clr 0, frame 6dd scanline 90
reading io (pc ee48), select 1, clr 0, frame 6dd scanline 91
reading io (pc ee58), select 0, clr 0, frame 6dd scanline 91
reading io (pc ee48), select 1, clr 0, frame 6dd scanline 91
reading io (pc ee58), select 0, clr 0, frame 6dd scanline 91
reading io (pc ee48), select 1, clr 0, frame 6dd scanline 91
reading io (pc ee58), select 0, clr 0, frame 6dd scanline 91
reading io (pc ee48), select 1, clr 0, frame 6dd scanline 91
reading io (pc ee58), select 0, clr 0, frame 6dd scanline 91
reading io (pc ee48), select 1, clr 0, frame 6de scanline 90
reading io (pc ee58), select 0, clr 0, frame 6de scanline 90
reading io (pc ee48), select 1, clr 0, frame 6de scanline 90
reading io (pc ee58), select 0, clr 0, frame 6de scanline 90
reading io (pc ee48), select 1, clr 0, frame 6de scanline 90
reading io (pc ee58), select 0, clr 0, frame 6de scanline 90
reading io (pc ee48), select 1, clr 0, frame 6de scanline 91
reading io (pc ee58), select 0, clr 0, frame 6de scanline 91
reading io (pc ee48), select 1, clr 0, frame 6de scanline 91
reading io (pc ee58), select 0, clr 0, frame 6de scanline 91
It looks like it's read several times per frame, but actually it's just polling all of the controller states - notice how it always happens within two scanlines. Sometimes it happens a little later, but never takes more than two scanlines, and this is probably just the code taking a certain amount of time. In other words, the polling is not distributed throughout the frame.
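
The clustering can be checked mechanically. The sketch below parses log lines in the format shown above and reports, per frame, the first and last scanline on which the I/O port was read and the number of reads. It assumes frame numbers are hexadecimal and scanline numbers are decimal, which the log itself does not state; `scanline_span_per_frame` is a hypothetical helper.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

LINE = re.compile(
    r"reading io \(pc ([0-9a-f]+)\), select (\d), clr (\d), "
    r"frame ([0-9a-f]+) scanline (\d+)"
)


def scanline_span_per_frame(log_text: str) -> Dict[int, Tuple[int, int, int]]:
    """Maps frame number -> (first scanline, last scanline, read count)."""
    lines_by_frame: Dict[int, List[int]] = defaultdict(list)
    for match in LINE.finditer(log_text):
        frame = int(match.group(4), 16)    # assumed hexadecimal
        scanline = int(match.group(5))     # assumed decimal
        lines_by_frame[frame].append(scanline)
    return {frame: (min(v), max(v), len(v))
            for frame, v in lines_by_frame.items()}
```

If reads were spread through the frame, the first and last scanline would be far apart; for the logs above they differ by at most one.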

But let's say it were and "method 1" were implemented. In an efficient emulator the emulation may only take 10% of the actual frame time. So polling at the point where it's read isn't going to really make a difference, unless you continue to follow a heuristic of when it'll be read. I think blargg already touched on this.

The only method remaining is the "b" method. However, getting precise timing at 4x the frame rate can start to get quite cumbersome, even with high-resolution timers. That is, unless idle loops are used, but I'm sure we don't want to resort to such a thing. Even then, this places much higher real-time requirements on the code.

If the time of polling really is an issue, then maybe you can track it in the emulator and try to distribute the frame so the waits are made around that point, and no more than that. I still think none of this is really necessary, though (and his real issue is with driver or hardware problems).

He also said that the multi-synchronization would improve audio and timer emulation. I don't see how these things are relevant. There should only be one variable when it comes to audio, and that's global latency. This might be more noticeable for sound effects that have a clearer frame of reference, but it'll always be there and it'll always be the same amount, depending on what the audio buffering is like (and it'll be impossible to have it anywhere near zero latency). Suggesting that decreasing emulation quanta improves accuracy of PSG emulation means that he probably tied the audio generation to the audio output, when these things can be kept separate. Then you can update the generation as low level as you want (every clock cycle if you really feel like it) without having to wait any amounts, as it'll all be read off of the buffer it's generated to when it's needed.
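
A minimal sketch of that separation, assuming a simple FIFO between the sound core and the host audio callback. `AudioFifo` and its methods are hypothetical names, and a real implementation would also need thread safety and a size limit.

```python
from collections import deque
from typing import Deque, Iterable, List


class AudioFifo:
    """Buffers generated samples until the host audio output asks for them."""

    def __init__(self) -> None:
        self._samples: Deque[int] = deque()

    def generate(self, samples: Iterable[int]) -> None:
        """Called by the sound core at whatever granularity it likes."""
        self._samples.extend(samples)

    def pull(self, count: int, fill: int = 0) -> List[int]:
        """Called by the host audio output; pads with `fill` on underrun."""
        return [self._samples.popleft() if self._samples else fill
                for _ in range(count)]
```

The generation side can run per clock cycle or per frame without changing what the output side sees; only the amount buffered, and therefore the global latency, differs.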
blargg

Post by blargg »

Exophase wrote:
But let's say it were and "method 1" were implemented. In an efficient emulator the emulation may only take 10% of the actual frame time. So polling at the point where it's read isn't going to really make a difference, unless you continue to follow a heuristic of when it'll be read.
Yep. If you read host input before starting emulation of the frame, you'd be reading about 1 msec earlier than if you waited until you got to the point where the emulated system read. As you say, method 2 would make more of a difference, moving the time you emulate the read closer to when it actually occurs, thus delaying it by 4 to 8 msec. The best idea that's come from this is to emulate the frame as late as possible, since the host is always going to introduce more latency than you'd like. To do this, you need to be able to get feedback as to how close you were to finishing emulation of the frame too late, so you can emulate the next one slightly earlier if you went over this time.
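
The feedback step might be sketched like this. It is only one possible policy under assumptions: the function name, the target margin, the step size, and the maximum delay are hypothetical values that would need tuning on a real host.

```python
def next_start_delay_ms(current_delay_ms: float, finish_margin_ms: float,
                        target_margin_ms: float = 2.0, step_ms: float = 0.5,
                        max_delay_ms: float = 14.0) -> float:
    """Chooses how long to wait after vsync before emulating the next frame.

    finish_margin_ms is the time left between finishing emulation of the
    last frame and its display deadline; a negative value means it was late.
    """
    if finish_margin_ms < target_margin_ms:
        # Too close, or late: start earlier by the whole shortfall at once.
        new_delay = current_delay_ms - (target_margin_ms - finish_margin_ms)
    elif finish_margin_ms > target_margin_ms + step_ms:
        # Plenty of slack: creep later in small steps.
        new_delay = current_delay_ms + step_ms
    else:
        new_delay = current_delay_ms      # close enough to the target
    return min(max(new_delay, 0.0), max_delay_ms)
```

Backing off quickly and advancing slowly is a deliberate asymmetry here, since a missed frame is far more visible than a couple of milliseconds of extra latency.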

But all this is speculation, so it would be best to find ways of objectively measuring latency, and in the same manner on the console as the emulator. Below are the tests I wrote for the NES, and how to apply them.

Time_Latency.nes times overall latency from joypad input to audio output. When run, it clicks at a regular interval and your task is to press a button in sync with the sound (any button on the joypad will work). After you establish a good rhythm and maintain it for the last four presses, you can stop pressing the button and read the result on screen, which is the average of the last four presses. It's best to close your eyes to avoid visual distraction (especially trying to read the time on screen, which could influence you to adjust your timing to get a particular value). I noticed that I normally pressed the button early, so I had to consciously delay a bit until the sound of the button press was in sync with the click. When that occurred, there was a clear audible difference in my head that I was in sync; it was as if the click and my button press became one (experiment and you'll hear what I mean).

Audio_Latency.nes times the relative latency of audio as compared to video. When run, it clicks and flashes a box at a regular interval. Use the joypad to adjust the delay until the sound and image seem synchronized. Up and down make coarse adjustments, and left and right make fine adjustments. Hold the direction to adjust the delay value shown, as it changes fairly slowly. Adjust the value past what seems correct, so you get a feel for how much of a difference there is between synchronized and unsynchronized. Unfortunately this test is harder to fine-tune, due to the human visual and auditory systems being somewhat independent regarding timing.

Both tests display the value in milliseconds relative to a NES. Time_Latency gives an emulator's input+audio latency, and the difference of Time_Latency - Audio_Latency gives the emulator's input+video latency. On my NES emulator with vsync disabled on a 76 Hz CRT monitor, Time_Latency gave 58 msec and Audio_Latency gave 20 msec. This means that joypad input to audio output latency is 58 msec greater than a NES, and joypad input to video output latency is 58 - 20 = 38 msec greater than a NES. For reference, a video frame is 16.7 msec long.
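
The arithmetic above can be written down as a small helper, shown here as a sketch. The function names are hypothetical; the 16.7 msec frame length is the figure quoted above.

```python
FRAME_MS = 16.7  # length of one video frame, as quoted above


def input_video_latency_ms(time_latency_ms: float,
                           audio_latency_ms: float) -> float:
    """Input+video latency = (input+audio latency) - (audio vs. video latency)."""
    return time_latency_ms - audio_latency_ms


def latency_in_frames(latency_ms: float) -> float:
    """Expresses a latency in video frames."""
    return latency_ms / FRAME_MS
```

With the readings quoted above, `input_video_latency_ms(58, 20)` gives 38 msec, which is a little over two frames.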