Here's something I frequently do in my SNES projects during development:
Latching the scanline counter once, right before entering the NMI/IRQ wait loop, and using that value as the Y position of a sprite.
The further down the screen that sprite sits, the more time the CPU needed to compute the last frame.
A no-brainer, really, and a very quick way to estimate the current effective CPU load and track down the causes of slowdowns.
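The arithmetic behind this trick can be modeled in a few lines. The sketch below assumes NTSC timing in the 224-line display mode (262 scanlines per frame, lines 1 to 224 visible). PAL and overscan modes use different numbers. It covers only the math, not the register access. On real hardware the counter is latched by reading SLHV ($2137) and read back through OPVCT ($213D). The helper names and the `start` parameter are hypothetical illustrations, not part of the original technique. The plain version simply uses the raw latched line as the sprite's Y.

```python
# Assumed NTSC timing in 224-line mode: 262 scanlines per frame,
# lines 1..224 visible. PAL and overscan modes differ.
LINES_PER_FRAME = 262
LAST_VISIBLE_LINE = 224


def elapsed_lines(start: int, end: int) -> int:
    """Scanlines between two latched V-counter readings, allowing for
    wrap-around from the last line of one frame to line 0 of the next."""
    return (end - start) % LINES_PER_FRAME


def load_percent(start: int, end: int) -> float:
    """Approximate CPU load as a percentage of one frame's time."""
    return 100.0 * elapsed_lines(start, end) / LINES_PER_FRAME


def marker_y(end: int) -> int:
    """Sprite Y for the load marker: the latched line itself (9-bit counter),
    pinned to the last visible line so the marker stays on screen in vblank."""
    return min(end & 0x1FF, LAST_VISIBLE_LINE)


if __name__ == "__main__":
    # Hypothetical readings: frame work began at line 0,
    # and the wait loop was reached at line 131.
    print(elapsed_lines(0, 131), load_percent(0, 131), marker_y(131))
```

Using a start reading as well as an end reading is a design choice. It keeps the estimate meaningful when the frame's work begins somewhere other than line 0, and the modulo handles the wrap past the end of the frame.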
I've seen a couple of games use that in debug mode, too.
Also, if you're looking for CPU-intensive stuff on the SNES, in the sense of the most effective use of the available CPU cycles, have a look at the demoscene and cracktros, a long-forgotten little corner of SNES history.