Chroma Subsampling
The human eye is far less sensitive to changes in chrominance than to changes in luminance. Since YUV already has a separate luma channel, the next logical step is to lower the resolution of the chroma samples. In other words, a single chroma sample can be used for multiple pixels. In fact, some YUV implementations sample chroma at 1/4 the resolution of luma. Since chroma accounts for 2/3 of the total bits per pixel (bpp) when nothing is subsampled, this can save a lot of space. DVD video, for example, uses 4:2:0 sampling (commonly handled in software as YV12), which has only one U and one V value per block of four pixels. This cuts the uncompressed size in half, from 24bpp to 12bpp. Each variation on chroma sampling is considered a separate color space, but they can all be described as being in the YUV color space.
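The 24bpp and 12bpp figures can be checked with a little arithmetic. The following is only a sketch, assuming 8 bits per sample; the function name bits_per_pixel is made up for this illustration.

```python
from fractions import Fraction

def bits_per_pixel(chroma_fraction, bits_per_sample=8):
    """Average bits per pixel for one luma plane plus two chroma planes.

    chroma_fraction is the chroma resolution relative to luma
    (1 = no subsampling, 1/4 = one chroma sample per four pixels).
    Assumes every sample has the same bit depth.
    """
    luma = bits_per_sample
    chroma = 2 * bits_per_sample * Fraction(chroma_fraction)  # U plus V
    return luma + chroma

print(bits_per_pixel(1))               # 24: full-resolution chroma
print(bits_per_pixel(Fraction(1, 4)))  # 12: one U and one V per four pixels
```

With no subsampling the two chroma channels contribute 16 of the 24 bits, which is the 2/3 share mentioned above.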
Each YUV color space can be described with a series of three numbers, representing the relationship between the number of Y samples and the number of U and V samples. The primary YUV color spaces you're likely to see are 4:2:0, 4:2:2, and 4:1:1. While I've seen explanations for a supposed system that can be used to understand what these numbers mean, I can't verify the accuracy of these explanations, and in the end it's easier to simply memorize what each standard set of numbers means.
4:2:2
In 4:2:2 YUV each pair of chroma samples (one U and one V) is shared by two horizontally adjacent pixels. The vertical luma and chroma resolutions are identical. MPEG-2 and MPEG-4 both technically support 4:2:2 YUV, but I'm not aware of any encoder or standalone player format that takes advantage of this.
4:2:0
YUV with a chroma subsampling of 4:2:0 shares chroma samples across both horizontally adjacent and vertically adjacent pixels. Each 2x2 block of pixels contains only a single U/V chroma sample.
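As a rough illustration, here is one way a full-resolution chroma plane could be reduced to one sample per 2x2 block. Averaging the four values is an assumption made for this sketch; real encoders may use other filters. The name subsample_420 is hypothetical, and the plane is assumed to have even width and height.

```python
def subsample_420(plane):
    """Reduce a chroma plane (list of rows) to one sample per 2x2 block.

    Assumes even width and height, and uses a plain average of the
    four values in each block (an assumption, not a standard).
    """
    height, width = len(plane), len(plane[0])
    out = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            block = (plane[y][x], plane[y][x + 1],
                     plane[y + 1][x], plane[y + 1][x + 1])
            row.append(sum(block) // 4)
        out.append(row)
    return out

u_plane = [[100, 102, 50, 50],
           [104, 106, 50, 50]]
print(subsample_420(u_plane))  # [[103, 50]]
```

The same function would be run once for the U plane and once for the V plane; the Y plane is left alone.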
4:1:1
Much like 4:2:2, 4:1:1 YUV uses the same vertical resolution for chroma as for luma. Unlike 4:2:2, the horizontal chroma resolution is only 1/4 that of the luma, meaning each group of four horizontally adjacent pixels shares a single U/V pair.
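The three schemes above can be summarized by how much each one divides the chroma resolution horizontally and vertically. This is a small sketch; the table and function names are invented for the example, and rounding up at the frame edges is an assumption.

```python
# (horizontal divisor, vertical divisor) applied to the chroma planes
SUBSAMPLING = {
    "4:2:2": (2, 1),  # shared by 2 horizontally adjacent pixels
    "4:2:0": (2, 2),  # shared by a 2x2 block
    "4:1:1": (4, 1),  # shared by 4 horizontally adjacent pixels
}

def chroma_plane_size(width, height, scheme):
    """Width and height of each chroma plane for a given luma size."""
    h_div, v_div = SUBSAMPLING[scheme]
    # Ceiling division: assume partial groups at the edges get a sample
    return (-(-width // h_div), -(-height // v_div))

for scheme in SUBSAMPLING:
    print(scheme, chroma_plane_size(720, 480, scheme))
```

For a 720x480 frame this gives 360x480 for 4:2:2, 360x240 for 4:2:0, and 180x480 for 4:1:1.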
Sample Locations
Although it would be possible to simply repeat the same chroma information for each pixel that shares a pair of chroma samples, the quality is much better if the chroma is assigned to a specific point and the values between any two points (samples) are interpolated. Interpolation simply means taking two points and mathematically determining the values of the points in between. Depending on the algorithm involved, more points can be considered for greater accuracy. For example, consider three points in a line. If the first point has a red value of 0, the last has a red value of 255, and the middle a value of 127, you can be fairly certain that the colors in between them are even gradations. In reality, the logic required to interpolate the additional pixels is much more complicated than that, but the basic idea is the same.
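The three-point example can be sketched with the simplest possible method, linear interpolation between neighboring samples. This is only an illustration of the basic idea, not what any particular decoder does; the function name upsample_row_2x is made up, and the input is assumed to hold at least one sample.

```python
def upsample_row_2x(samples):
    """Insert one linearly interpolated value between each pair of samples.

    Uses integer arithmetic and rounds the midpoint to the nearest value.
    Assumes a non-empty list of integer samples.
    """
    out = []
    for left, right in zip(samples, samples[1:]):
        out.append(left)
        out.append((left + right + 1) // 2)  # rounded midpoint
    out.append(samples[-1])
    return out

print(upsample_row_2x([0, 127, 255]))  # [0, 64, 127, 191, 255]
```

The two new values, 64 and 191, fall halfway between their neighbors, which is the even gradation described above.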
Packed vs. Planar
There are two ways for a YUV-encoded picture to be stored in a file. The most common is a packed format. This simply means that the luma and chroma samples are stored next to each other in the file. For example, a 4:2:2 encoded frame could have the following order:
|U|Y|V|Y|U|Y|V|Y|U|Y|V|Y|U|Y|V|Y|
Each chroma sample is located at the exact same coordinate of the frame as the corresponding Y sample. Each group of pixels that shares a single chroma sample is called a macro pixel.
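To make the packed layout concrete, here is a sketch that splits a byte string in the U, Y, V, Y order shown above into per-pixel values. It assumes 8 bits per sample and simply repeats the chroma for both pixels of a macro pixel rather than interpolating; the function name unpack_uyvy is hypothetical.

```python
def unpack_uyvy(data):
    """Split packed U Y V Y bytes into per-pixel (Y, U, V) tuples.

    Each 4-byte macro pixel holds two luma samples and one U/V pair,
    so it yields two pixels that share the same chroma.
    """
    if len(data) % 4 != 0:
        raise ValueError("expected a whole number of 4-byte macro pixels")
    pixels = []
    for i in range(0, len(data), 4):
        u, y0, v, y1 = data[i:i + 4]
        pixels.append((y0, u, v))
        pixels.append((y1, u, v))
    return pixels

print(unpack_uyvy(bytes([128, 16, 128, 235])))
# [(16, 128, 128), (235, 128, 128)]
```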
The alternative, planar format, changes the order so that all the Y information for a given frame is followed by all the U information for that frame, which is followed by all the V information for the frame. Think of it as starting with a surface that's backlit with a single white lamp for each pixel in a frame. Then a sheet of translucent blue material with the chroma samples and interpolated points in between is put over the top. Finally a translucent red sheet, with chroma samples and interpolated points, is put on top of the blue. When all three layers are in place you have your final picture.
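In terms of bytes, a planar frame is just three blocks laid end to end. The sketch below works out where each plane would start for a 4:2:0 frame stored in Y, U, V order, assuming 8 bits per sample and even frame dimensions. The function name is invented for the example, and some real formats store V before U.

```python
def planar_420_offsets(width, height):
    """Return {plane: (byte offset, size in bytes)} for Y, U, V order.

    Assumes 8 bits per sample and even width and height.
    """
    y_size = width * height
    c_size = (width // 2) * (height // 2)  # one sample per 2x2 block
    return {
        "Y": (0, y_size),
        "U": (y_size, c_size),
        "V": (y_size + c_size, c_size),
    }

layout = planar_420_offsets(720, 480)
print(layout)
total = sum(size for _, size in layout.values())
print(total * 8 / (720 * 480))  # 12.0 bits per pixel
```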
So why bother with a planar format? It has one major advantage over packed formats. Since luma is never subsampled, each luma point is always at an exact pixel location. Chroma, however, doesn't necessarily need to be lined up perfectly with a single pixel. With a planar format, chroma samples can be placed on a grid that centers them between pixels. For example, a 4:2:0 encoded frame could make use of this by putting each chroma sample in the center, where the four pixels it applies to meet.
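The centered placement can be expressed as a coordinate calculation. This sketch assumes luma pixel centers sit at integer coordinates and that each chroma sample is placed exactly in the middle of its 2x2 block; actual standards differ in where they put the sample, and the function name is made up.

```python
def chroma_sample_position_420(cx, cy):
    """Position, in luma pixel coordinates, of chroma sample (cx, cy).

    Chroma sample (cx, cy) covers luma pixels with x in {2*cx, 2*cx + 1}
    and y in {2*cy, 2*cy + 1}; the point where they meet is half a
    pixel in from the block's top-left pixel center.
    """
    return (2 * cx + 0.5, 2 * cy + 0.5)

print(chroma_sample_position_420(0, 0))  # (0.5, 0.5)
print(chroma_sample_position_420(3, 1))  # (6.5, 2.5)
```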