Dithering is a process that masks the quantization, or "rounding", errors that occur when going from a higher bit depth to a lower one; it is not usually used when going the other way. It is essentially adding a small amount of random noise to the data so that the rounding errors are not systematic. It really came about because 16 bits is not enough to fully represent what we hear, especially at very low volumes. The DAWs that sound engineers use usually work in at least 24 bits these days, but when mastering for a 16-bit CD they need to deal with the resulting quantization errors.
Consider going from 24 bits to 16 bits. You can simply truncate, throwing away the last 8 bits, or you can use them to round to the nearest 16-bit value. Either one is a systematic approach which, in the digital world, creates an error that is correlated with the signal, and that correlation is often heard as some sort of regular distortion in the final analog signal. The idea of dithering is to add some random noise in the process, so the last bit gets set randomly rather than systematically. The dither adds noise, but it is not regular, so the ear does not pick up on it so readily.
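Here is a minimal sketch in Python/NumPy of the difference. The TPDF (triangular) dither of plus or minus one 16-bit LSB is my assumption; real mastering tools use various noise shapes:

```python
import numpy as np

rng = np.random.default_rng(0)

# A very quiet 24-bit signal: a low-level 1 kHz sine stored as integers.
n = np.arange(48000)
x24 = np.round(100.0 * np.sin(2 * np.pi * 1000 * n / 48000)).astype(np.int64)

# Truncation: always rounds the same way, so the error tracks the signal.
trunc16 = x24 >> 8

# Dither: add roughly +/- 1 LSB of triangular noise (one 16-bit LSB is
# 256 in 24-bit units) before truncating, so the last bit is random.
tpdf = rng.integers(0, 256, n.size) + rng.integers(0, 256, n.size) - 255
dith16 = (x24 + tpdf) >> 8
```

The error in the truncated version follows the sine, which is exactly the kind of regular distortion the ear picks out; the error in the dithered version is just low-level hiss.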
Dithering is also used when calculations are done on 16-bit data, such as applying filters or other DSP operations. These often involve multiplication, which creates a lot more digits, and the result has to be reduced back to 16 bits. Dithering turns the systematic error in that reduction into a random one.
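As a sketch of that case (the function name and the dither shape are my own illustration, not any particular DAW's method), even a simple gain change produces fractional values that have to be pushed back into 16 bits somehow:

```python
import numpy as np

rng = np.random.default_rng(1)

def apply_gain_16bit(samples16, gain, dither=True):
    # Do the math in higher precision, then requantize back to 16 bits.
    y = samples16.astype(np.float64) * gain
    if dither:
        # Add +/- 1 LSB of triangular noise before rounding, so the
        # rounding error is random rather than tied to the signal.
        y += rng.uniform(-0.5, 0.5, y.size) + rng.uniform(-0.5, 0.5, y.size)
    return np.clip(np.round(y), -32768, 32767).astype(np.int16)
```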
When you are at 24 bits, the last few bits are well below normal hearing levels, and the noise in the analog electronics is usually larger than the 24th bit. So, when working in 24 bits, dithering is not normally used. However, in the quest for more and more resolution, some people do dither 24-bit data, hoping it will help.
Dithering is a complicated issue. Like upsampling, there is a theoretical side and a practical side. Pretty much everyone does it when manipulating 16-bit data (normalization, for example, or mastering CDs), and some people use it for 24-bit data. But, in general, it is used when going from a higher bit depth to a lower one in order to eliminate systematic errors in the data.
Search on dithering when upsampling and you will find a lot of discussions on this topic. Wikipedia has a good general article on dithering. The Etymology section is very interesting. It talks about how dithering was used in early computers to make the results more accurate.
Note that dithering is also very often used in digital photography and digital video. For example, if a device (like a phone or a camera) has a limited color palette, dither can make the pictures look smoother. The hard banding edges of the limited palette get randomly smoothed out, giving a picture that appears to have more color resolution than the device actually has.
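A quick sketch of the same idea for images (the number of levels and the plain uniform noise are arbitrary choices on my part; real devices often use fancier patterns like ordered or error-diffusion dither):

```python
import numpy as np

rng = np.random.default_rng(2)

def reduce_levels(gray, levels=4, dither=True):
    # Quantize an 8-bit grayscale image (values 0..255) down to `levels` shades.
    step = 255.0 / (levels - 1)
    g = gray.astype(np.float64)
    if dither:
        # Random noise of half a step either way hides the hard band edges.
        g += rng.uniform(-step / 2, step / 2, g.shape)
    return (np.clip(np.round(g / step), 0, levels - 1) * step).astype(np.uint8)
```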
Hope that helps.