So, here’s the deal with gamma: phosphor.
If you pump twice as much energy into a piece of phosphor, you might get more or fewer than twice the photons out of it, depending on how much energy you started with. Which is to say: phosphor’s light output is non-linear with respect to its input.
You generally have a handful of TV cameras and studios, and millions (or more) of TV sets, so rather than building a circuit into every TV to compensate for the non-linearity of its phosphor (which would add to the cost of every set), TV standards like NTSC and PAL specify which particular type and variety of phosphor should be used in the manufacture of TVs, so they’ll all have approximately the same degree of non-linearity in their displays. That way the compensation can be done once, in the TV studio or better yet in the TV camera itself, and everything will still look as it should.
As it happens, if you express signal strength as a number between 0 and 1, then the non-linearity of phosphor can be modelled with the exponentiation function x^y (that’s “x to the power of y”), where x is the signal strength and y is a measure of exactly how non-linear the phosphor we’re talking about happens to be. This “measure of non-linearity” is typically referred to by the Greek letter “gamma”, and the NTSC standard (to pick one example) declares that all TVs should be made out of phosphor whose non-linearity can be modelled with a gamma of 2.2. If you want to get an idea what an NTSC display is going to do to an image, you can run x^2.2 over the R, G, B components of each pixel. If you have an image in a linear colour space and you want to prepare it to be displayed on an NTSC display, you can run x^(1/2.2) over it (that’s x to the power of (1 divided by 2.2), or x to the power of 0.4545…) and then the phosphor physics will (effectively) run x^2.2 over it later to reverse the process.
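To make that concrete, here’s a minimal sketch in Python of both directions (the function names are purely my own invention, and each colour component is assumed to be a float between 0 and 1):

    GAMMA = 2.2  # nominal display gamma assumed by the NTSC standard

    def decode_gamma(value, gamma=GAMMA):
        """Simulate what the phosphor does to a signal: x to the power of gamma."""
        return value ** gamma

    def encode_gamma(value, gamma=GAMMA):
        """Prepare a linear-light value for display: x to the power of (1 / gamma)."""
        return value ** (1.0 / gamma)

    # The two cancel out: correct the value first, then let the phosphor "decode" it.
    print(decode_gamma(0.5))                # ~0.218 - what a display does to a raw 0.5
    print(decode_gamma(encode_gamma(0.5)))  # ~0.5   - corrected first, so it displays correctly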
The major problem with this scheme is that if you have an image that’s ready to be displayed on a phosphor-based display (with the gamma correction of 0.4545… already applied), you can’t reliably do any mathematical operations on the pixel data - for example, a signal of 0.5 displays at only about 22% of full brightness, noticeably darker than “halfway between black and white”, and trying to average colours will skew the result towards whichever colour is darker rather than giving a proper average. If you want to do any kind of calculation on the signal at all, like simulating how light from nearby phosphors merges, or alpha-blending, or scaling an image up or down with any kind of interpolation, you need to un-correct the pixel data you get (by raising it to the power of 2.2), do your calculation, then re-correct the pixel data (by raising it to the power of 0.4545…).
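As a rough illustration of that un-correct / calculate / re-correct dance (reusing the made-up helpers from the sketch above), here’s how you might average two gamma-corrected grey values properly:

    def average_corrected(a, b, gamma=2.2):
        """Average two gamma-corrected values (floats in 0..1) by going via linear light."""
        lin_a = a ** gamma                 # un-correct: back to linear light
        lin_b = b ** gamma
        lin_avg = (lin_a + lin_b) / 2.0    # do the maths in linear space
        return lin_avg ** (1.0 / gamma)    # re-correct for display

    # Naively averaging black (0.0) and white (1.0) in gamma space gives 0.5, which
    # displays at only ~22% brightness; going via linear space gives ~0.73, which
    # actually comes out looking like a mid grey.
    print((0.0 + 1.0) / 2.0)            # 0.5
    print(average_corrected(0.0, 1.0))  # ~0.73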
If you start with purely generated colour data (say, you wrote a raytracer or something), then you’ll need to gamma-correct it to make it display correctly on an NTSC display (or an sRGB display like any modern computer monitor, since sRGB was invented to describe existing CRT displays). However, anything created interactively by somebody looking at a screen (artwork made in Photoshop, or sprite data, or whatever) is probably already gamma-corrected (just because the person making it would have picked colours that looked good to them on their sRGB display), and so is pretty much anything from a camera (video cameras, because they’re designed to output a signal a TV can display, and digital still cameras, because most computers can only reliably display sRGB images, so that’s what they make).
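As an aside, sRGB’s real transfer curve isn’t quite a pure power law - it has a short linear segment near black and then uses an exponent of 2.4 - but overall it behaves close to a plain gamma of 2.2. Here’s a sketch of the standard encode direction, if you did want to prepare raytracer output for an sRGB display:

    def srgb_encode(linear):
        """Convert a linear-light component (0..1) to its sRGB-encoded value."""
        if linear <= 0.0031308:
            return 12.92 * linear                     # linear ramp near black
        return 1.055 * linear ** (1.0 / 2.4) - 0.055  # power-law segment

    print(srgb_encode(0.5))  # ~0.735, close to 0.5 ** (1 / 2.2), which is ~0.730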
This article is mostly based on knowledge absorbed by osmosis from various posters on byuu’s message board, but I would like to mention blargg’s posts in various threads about his NTSC filters.
Naturally, there’s a Wikipedia page on Gamma Correction, which is probably more accurate than anything written here, even if I think my explanation is more readable.
The origin-story of sRGB I’d heard was fact-checked against the original sRGB proposal.