byuu wrote:
I've already tried to use vsync. I can do that just fine, but then I get differing numbers of samples each frame. The difference is so severe that resampling each batch results in audible pitch differences. If I buffer out the differences, that just results in bigger pitch changes, just less frequently.

Sorry to rehash this, but I don't understand. OK, you get a different number of samples each frame. All that's needed is a stable average, like 32040 samples per second. A buffer can even out the flow to an even 534 per frame. So you may get 32000 samples in one frame, then none for 58 frames, then 40 on the 60th frame.
The average number of samples waiting in the buffer determines latency. The number needed depends on how much the number of samples per frame varies. If you get exactly 534 samples every frame, then the buffer doesn't need to have any extra samples after each frame. If you sometimes get 500 samples for one frame and 568 samples on the next, then the buffer must have 34 extra samples on average. Since the buffer size needed depends on other factors too, it's usually best to allow it to be adjusted by the user, or automatically increase it if it's running empty too often. To increase it, you either run the emulator faster for a moment, or stop reading samples from it for a frame or two.
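To make the buffering idea concrete, here is a minimal sketch of such a FIFO in Python. The names SampleFifo and simulate are made up for illustration, and padding with silence on underrun is just one possible policy; a real emulator would more likely run faster or skip a read, as described above.

```python
from collections import deque

class SampleFifo:
    """Hypothetical FIFO that turns a bursty sample stream into fixed-size reads."""

    def __init__(self):
        self.queue = deque()
        self.underruns = 0  # how many reads found too few samples waiting

    def write(self, samples):
        self.queue.extend(samples)

    def read(self, count):
        """Return exactly `count` samples, padding with silence on underrun."""
        available = min(count, len(self.queue))
        out = [self.queue.popleft() for _ in range(available)]
        if available < count:
            self.underruns += 1
            out.extend([0] * (count - available))  # assumption: pad with zeros
        return out

def simulate(prefill, produced_per_frame, per_frame=534):
    """Feed uneven frames in, read `per_frame` out each frame, count underruns."""
    fifo = SampleFifo()
    fifo.write([0] * prefill)  # extra samples kept waiting to absorb the jitter
    for produced in produced_per_frame:
        fifo.write([1] * produced)
        fifo.read(per_frame)
    return fifo.underruns

# Emulator alternates between 500 and 568 samples per frame (average 534).
frames = [500, 568] * 30
print(simulate(34, frames))  # 34 samples of headroom
print(simulate(0, frames))   # no headroom
```

With 34 samples of headroom the reads never run dry. With none, the very first read underruns, and the padding it inserts then acts as the missing headroom for the frames that follow.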
That covers getting a consistent number of samples every frame. Then, you want to resample to a different rate. That works independently of the above buffering. Instead of feeding the 534 samples directly to the sound driver with its rate set at 32040 Hz, you'd feed them to the resampler, which then resamples as much as possible and feeds it to the sound card.
The resampler might not be able to resample as much as requested, due to each output sample needing more than one input sample (even when resampling to a higher rate). For example, if you were resampling to 2 times the input rate using linear interpolation, you wouldn't be able to generate 10 output samples when given 5 input samples, because the last output sample would depend on the as yet unknown 6th input sample. The solution is for the resampler to generate as many output samples as it can, then save the rest of the input for next time (or tell the caller how many samples it's completely done with, so the caller can simply keep the last sample in its buffer until next time).
1-2-3-4-5 input samples
123456789 output samples
5-6-7-8-9-0 input samples for next call that supplies 5 new samples
-0123456789 output samples
So after 10 input samples, you have 19 output samples, and the last input sample saved until next time.
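Here is a rough sketch of a resampler that behaves like the diagram above, using linear interpolation and a fixed ratio. LinearResampler and its fields are hypothetical names, and it uses Fraction for the position only so the arithmetic is exact in the example; real code would probably use fixed-point or floating point.

```python
from fractions import Fraction

class LinearResampler:
    """Hypothetical streaming linear-interpolation resampler with a fixed ratio."""

    def __init__(self, ratio):
        self.step = 1 / Fraction(ratio)  # input samples advanced per output sample
        self.pos = Fraction(0)           # next output position, relative to pending[0]
        self.pending = []                # input samples not yet finished with

    def process(self, samples):
        """Generate as many output samples as possible; keep the rest of the input."""
        buf = self.pending + list(samples)
        out = []
        while True:
            i = int(self.pos)            # index of the input sample at or before pos
            frac = self.pos - i
            if frac == 0:
                if i >= len(buf):
                    break                # that input sample hasn't arrived yet
                out.append(float(buf[i]))
            else:
                if i + 1 >= len(buf):
                    break                # needs the next, as yet unknown, input sample
                out.append(float(buf[i] + (buf[i + 1] - buf[i]) * frac))
            self.pos += self.step
        # Drop the input that no future output can depend on.
        done = min(int(self.pos), len(buf))
        self.pending = buf[done:]
        self.pos -= done
        return out

r = LinearResampler(2)
first = r.process([1, 2, 3, 4, 5])    # 9 output samples
second = r.process([6, 7, 8, 9, 10])  # 10 more, 19 in total
print(len(first), len(second), r.pending)
```

The ratio is set once and never touched by process(), so splitting the same input into different-sized calls gives the same output stream.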
The key here is that the resampler's ratio is not affected by how many samples it's resampling in a given call. A flawed approach would be to calculate how many output samples could be generated from a given number of input samples, round that to the nearest integer, then adjust the resample ratio for that call based on this rounded value. For example, using the above example, you would change the ratio slightly from 2 (shrinking the spacing between output samples from 1/2 to 4/9 of an input sample), so that the 10th output sample doesn't need anything beyond the 5th input sample. This means that the first and last output samples match the first and last input samples, which is wrong.
It gets worse if the ratio is something like 1.5. For 5 input samples, 7.5 should be output. If you rounded that to 7, the ratio would become 1.4. If later you had 10 input samples to generate 15 output samples, the ratio would be 1.5, but you'd still have the issue with the sampling points being slightly off.
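To put numbers on the 1.5 case, this sketch compares where the output samples land (measured in input samples) for the two approaches. Both function names are made up for illustration, and the flawed version is only my reading of the per-call approach described above, with the output count rounded down.

```python
from fractions import Fraction
from math import floor

def fixed_ratio_positions(chunks, ratio):
    """Output sample positions when the step never changes between calls."""
    total = sum(chunks)
    step = 1 / Fraction(ratio)
    positions = []
    pos = Fraction(0)
    while pos <= total - 1:  # stop once the next input sample would be needed
        positions.append(pos)
        pos += step
    return positions

def per_call_ratio_positions(chunks, ratio):
    """Flawed: round the output count for each call, then rescale the ratio to fit."""
    positions = []
    start = 0
    for n in chunks:
        m = floor(n * Fraction(ratio))  # 7.5 becomes 7
        step = Fraction(n, m)           # effective ratio is m/n, here 1.4
        positions.extend(start + k * step for k in range(m))
        start += n
    return positions

ratio = Fraction(3, 2)
good = fixed_ratio_positions([5, 5], ratio)
bad = per_call_ratio_positions([5, 5], ratio)
print(good[1] - good[0])  # step of 2/3 of an input sample
print(bad[1] - bad[0])    # step of 5/7 of an input sample
```

With calls of 5 input samples the flawed version steps by 5/7 instead of 2/3, and with a single call of 10 input samples it steps by 2/3, so the pitch depends on how the input happens to be split up.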
I can supply test code that does the buffering and resampling, so we can find out whether this is the problem, or something else.