I wrote a bit about how the filter works:
A SNES pixel's effect is spread out on the TV screen, due to the way NTSC encoding and decoding work. At some point I realized that the NTSC operation was linear, which meant that the result of running two adjacent pixels A and B through the NTSC filter is the same as running A and black through the filter, running black and B through it separately, then summing the two outputs. This opened the door to precalculating each possible pixel appearance in isolation and then combining them when running the filter.
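The linearity claim is easy to check on a toy example. The sketch below is not the NTSC encoder/decoder; it swaps in a small made-up FIR filter (`toy_filter` and its taps are assumptions for illustration only), since any linear filter obeys the same superposition rule.

```python
def toy_filter(row, taps=(-0.1, 0.3, 0.6, 0.3, -0.1)):
    """Convolve a row of pixel intensities with a small FIR kernel.

    Stand-in for the NTSC encode/decode chain; the taps are made up.
    """
    out = [0.0] * (len(row) + len(taps) - 1)
    for i, value in enumerate(row):
        for j, tap in enumerate(taps):
            out[i + j] += value * tap
    return out

A, B = 0.8, 0.5  # two adjacent pixel intensities; 0.0 is black

together = toy_filter([A, B])        # A and B filtered in one pass
a_alone = toy_filter([A, 0.0])       # A next to black
b_alone = toy_filter([0.0, B])       # black next to B
summed = [x + y for x, y in zip(a_alone, b_alone)]

# Superposition: filtering both at once matches the sum of the parts
# (up to floating-point rounding).
assert all(abs(t - s) < 1e-12 for t, s in zip(together, summed))
```

Note that the negative taps make some outputs negative, which is the same reason the real kernels have to hold signed values.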
For each possible SNES color, a single pixel of that color on a black background is generated at each combination of the three column alignments and three scanline burst phases, totaling 9 possible appearances (see the earlier post about the burst phases):
  012 012 012
0 X-- -X- --X
1 --- --- ---
2 --- --- ---

0 --- --- ---
1 X-- -X- --X
2 --- --- ---

0 --- --- ---
1 --- --- ---
2 X-- -X- --X
A SNES pixel affects many output pixels to the left and right, due to filtering in the TV, so for each of the above combinations, the 14 non-zero output pixels are captured into a kernel. An important subtlety is that the output pixels are signed RGB values, since a SNES pixel can both increase and decrease the RGB values of nearby output pixels. So the kernel contains the signed red, green, and blue components for each of the 14 output pixels, centered around the original SNES pixel.
There are 32768 SNES colors available. Each kernel uses 56 bytes (14 pixels * 4 bytes per pixel), so if a set of 9 kernels were made for every color, almost 16 megabytes of memory would be needed for the table, and almost 300000 kernels would need to be calculated during initialization. This would take quite a while, since NTSC signal processing is somewhat complex. I found that I could reduce both the red and blue channels from 32 possible intensities to 16 without affecting the image noticeably. This reduced the number of colors to 8192, one fourth of the original.
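The arithmetic behind those figures can be spelled out as below. The `reduced_index` helper is a hypothetical layout, assuming the reduction simply drops the low bit of red and blue; the post does not say how the table is actually indexed.

```python
COLORS_FULL = 32 * 32 * 32      # 5 bits each of red, green, blue
KERNELS_PER_COLOR = 3 * 3       # column alignments * burst phases
KERNEL_BYTES = 14 * 4           # 14 output pixels, 4 bytes each

full_kernels = COLORS_FULL * KERNELS_PER_COLOR
full_bytes = full_kernels * KERNEL_BYTES

COLORS_REDUCED = 16 * 32 * 16   # red and blue cut to 16 levels
reduced_kernels = COLORS_REDUCED * KERNELS_PER_COLOR
reduced_bytes = reduced_kernels * KERNEL_BYTES

def reduced_index(r5, g5, b5):
    """Map 5-bit components to a reduced-table index (hypothetical layout)."""
    return ((r5 >> 1) << 9) | (g5 << 4) | (b5 >> 1)

print(full_kernels, full_bytes / 2**20)        # 294912 15.75
print(reduced_kernels, reduced_bytes / 2**20)  # 73728 3.9375
```

So the reduced table needs about 4 megabytes and about 74000 kernels instead of almost 16 megabytes and almost 300000.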
Finally, when running the filter on a full SNES image, the appropriate kernels for each SNES pixel are summed together. Below, SNES pixels are numbered from 0 upwards, where 0 is the first pixel on the line.
  |----- 0 -----|----- 6 -----|---- 12 -----|------------ ...
    |----- 1 -----|----- 7 -----|---- 13 -----|---------- ...
      |----- 2 -----|----- 8 -----|---- 14 -----|-------- ...
         |----- 3 -----|----- 9 -----|---- 15 -----|----- ...
           |----- 4 -----|---- 10 -----|---- 16 -----|--- ...
+            |----- 5 -----|---- 11 -----|---- 17 -----|- ...
-------------------------------------------------------------
  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX ... output pixels
               *
As you can see, any given output pixel depends on the nearby 6 SNES pixels. For example, the starred output pixel is the sum of k0 [13], k1 [11], k2 [9], k3 [6], k4 [4], and k5 [2], where k0-k5 are the kernels for pixels 0-5, and k [0] is the first pixel of a given kernel.
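Here is a rough sketch of that summing step, in plain Python rather than the optimized form. The kernel start positions (0, 2, 4, 7, 9, 11, then 14, ...) are inferred from the indices in the example above, and all the function names are made up for illustration.

```python
KERNEL_SIZE = 14

def kernel_start(n):
    """Output pixel where SNES pixel n's kernel begins.

    Inferred from the example: three SNES pixels span seven output pixels.
    """
    return (n // 3) * 7 + (0, 2, 4)[n % 3]

def contributors(x, num_pixels):
    """List (n, i) pairs where kernel n's entry i lands on output pixel x."""
    pairs = []
    for n in range(num_pixels):
        i = x - kernel_start(n)
        if 0 <= i < KERNEL_SIZE:
            pairs.append((n, i))
    return pairs

def blend_row(kernels):
    """Sum per-pixel kernels (14 signed (r, g, b) tuples each) into one row."""
    if not kernels:
        return []
    width = kernel_start(len(kernels) - 1) + KERNEL_SIZE
    out = [[0, 0, 0] for _ in range(width)]
    for n, kernel in enumerate(kernels):
        start = kernel_start(n)
        for i, (r, g, b) in enumerate(kernel):
            px = out[start + i]
            px[0] += r
            px[1] += g
            px[2] += b
    return out

print(contributors(13, 18))
# [(0, 13), (1, 11), (2, 9), (3, 6), (4, 4), (5, 2)]
```

The printed pairs match the starred output pixel: k0 [13], k1 [11], k2 [9], k3 [6], k4 [4], and k5 [2].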
One additional step needs to be performed. The final RGB values can be less than zero or greater than the maximum, so they need to be clamped, otherwise they will wrap around and look bad. Clamping turns negative values into zero and excess values into the maximum. After clamping, the RGB components are converted to the packed format used by the graphics card.
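A minimal sketch of clamping and packing, one component at a time. The 0-255 component range and the 16-bit RGB 5-6-5 output format are assumptions picked for the example; the real code encodes the values so the clamp can be done much more cheaply than this.

```python
def clamp(value, maximum=255):
    """Turn negative values into 0 and excess values into the maximum."""
    return max(0, min(maximum, value))

def pack_rgb565(r, g, b):
    """Clamp signed components (assumed 0-255 scale) and pack as RGB 5-6-5."""
    r, g, b = clamp(r), clamp(g), clamp(b)
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

# Without clamping, keeping only the low 8 bits makes values wrap around.
print(-20 & 0xFF, 300 & 0xFF)           # 236 44
print(clamp(-20), clamp(300))           # 0 255
print(hex(pack_rgb565(300, -20, 128)))  # 0xf810
```

The wrapped values show why skipping the clamp looks bad: a slightly-too-dark component comes out nearly full brightness, and a slightly-too-bright one comes out dark.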
It's quite an intensive calculation, even with the table optimization. I put a lot of work into encoding the RGB values into the kernels so that they could be efficiently summed and clamped with a minimum of calculation. Even kernel initialization needed significant optimization to prevent it from taking several seconds. This has come a long way from the original NES NTSC code and algorithm given to me by NewRisingSun, which ran at around 2 frames per second on my machine. The optimized table implementation now runs 75 times faster at 150 frames per second.