A few scanning tips


JPEG - Joint Photographic Experts Group

(.JPG file extension, pronounced Jay Peg). This is the right format for photo images which must be very small files, for example, for web sites or for email. JPG is often used on digital camera memory cards, but RAW or TIF format may be offered too, as a way to avoid JPG. The JPG file is wonderfully small, often compressed to perhaps only 1/10 of the size of the original data, which is a good thing when modems are involved. However, this fantastic compression efficiency comes with a high price. JPG uses lossy compression (lossy meaning "with losses to quality"): some image quality is lost when the JPG data is compressed and saved, and this quality can never be recovered.

File compression methods for most other file formats are lossless, and lossless means "fully recoverable". Lossless compression always returns the original data, bit-for-bit identical without any question about differences (losses). We are used to saving data to a file, and getting it all back when we next open that file. Our Word and Excel documents, our Quicken data, any data at all, we cannot imagine NOT getting back exactly the original data. TIF, PNG, GIF, BMP and most other image file formats are lossless too. This integrity requirement does limit efficiency, limiting compression of photo image data to maybe only 10% to 40% reduction in practice (graphics can be smaller). But most compression methods have full lossless recoverability as the first requirement.
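To make "bit-for-bit identical" concrete, here is a minimal sketch using Python's standard zlib module as a stand-in for any lossless method. The sample data is made up for illustration; it is not image data from this page.

```python
import zlib

# Made-up stand-in for any file's contents (a document, a spreadsheet, pixel data).
original = bytes(range(256)) * 64

compressed = zlib.compress(original, 9)   # 9 = strongest zlib compression level
restored = zlib.decompress(compressed)

# Lossless means the round trip gives back exactly the bytes we started with.
print(len(original), "bytes compressed to", len(compressed), "bytes")
print("identical after round trip:", restored == original)
```

However hard the data is squeezed, the comparison on the last line always comes out True. That is the guarantee JPG gives up.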

JPG files don't work that way. JPG is a big exception. JPG compression is not lossless. JPG compression is lossy. Lossy means "with losses" to image quality. JPG compression has very high efficiency (relatively tiny files) because it is intentionally designed to be lossy, designed to give very small files without the requirement for full recoverability. JPG modifies the image pixel data (color values) to be more convenient for its compression method. Tiny detail that doesn't compress well (minor color changes) can be ignored (not retained). This allows amazing size reductions on the remainder, but when we open the file and expand the data to access it again, it is no longer the same data as before. This lost data is like lost purity or integrity. It can vary in degree, it can be fairly good, but it is always unrecoverable corruption of the data. This makes JPG quite different from all the other usual file format choices. This will sound preachy, but if your use is critical, you need a really good reason to use JPG.
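The idea of ignoring minor color changes can be sketched with a toy example. This is NOT the real JPG algorithm (real JPG applies its rounding to a mathematically transformed version of the image, not to raw pixel values); the quantize function, the step size, and the pixel values below are all hypothetical, chosen only to show why discarded detail cannot come back.

```python
def quantize(values, step):
    """Round each value to the nearest multiple of step. This is the lossy part."""
    return [round(v / step) * step for v in values]

# Made-up row of nearly flat tones with minor variations.
pixels = [200, 201, 203, 202, 198, 199, 201, 200]

stored = quantize(pixels, step=8)

print("original:", pixels)
print("stored:  ", stored)                 # every value becomes 200
print("same data:", stored == pixels)      # False
```

Eight identical values compress far better than eight slightly different ones, but nothing in the stored row says which pixel was 198 and which was 203. That information is simply gone.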

There are times and places this compromise is an advantage. Web pages and email files need to be very small, to be fast through the modem, and some uses may not need maximum quality. In some cases, we are willing to compromise quality for size, sacrificing for the greater good. And this is the purpose of JPG. There is no magic answer providing both high compression and high quality. We don't get something for nothing, and the small size has a cost in quality. Still, mild quality losses may sometimes be acceptable for less critical purposes. The sample JPG images on the next page show the kind of problem to expect from excessive compression.

Even worse, more quality is lost every time the JPG file is compressed and saved again, so editing a JPG image and saving it again is always a questionable decision. You should instead just discard the old JPG file and start over from your archived lossless TIF master, saving that change as the new JPG copy you need. JPG compression can be selected to be better quality in a larger file, or to be lesser quality in a smaller file. When you save a JPG file, your FILE - SAVE AS dialog box should have an option for the degree of file compression.

Many programs (for example Photoshop or Elements) call this setting JPG Quality. Other programs (Paint Shop Pro and Corel) call it JPG Compression, which is the same thing, except Quality runs numerically in the opposite direction from Compression. High Quality corresponds to Low Compression. Typical values might be 85 Quality, or 15 Compression. These numbers are relative and have no absolute meaning. Compression in one program will vary from another even at the same number. The number is also not a percentage of anything, and Quality 100 does NOT mean no compression; it is just an arbitrary starting point. JPG will always compress, and Quality 90 is not so different from Quality 100 in practice. There's very little improvement over 95.

Digital cameras offer JPG quality choices too. Large image files do fill memory cards fast. You can buy more and larger cards, or you can compromise by sacrificing image quality for small file size (but I hope you won't go overboard with this). The camera menu will have two kinds of resizing choices. One size choice actually creates a smaller image size (pixels), resampled smaller from the original standard size of the CCD chip, for example perhaps to half size in pixel dimensions. The correct image size in pixels is related to your goal for using the image. For example you may need enough pixels to print 8x10 inches on paper (6 megapixels), or you may only want a small image for video screen viewing (1 megapixel).

Regardless of that selected image size in pixels, the camera menu will also offer a smaller file size choice in bytes, related to quality, via JPG file compression. This menu will offer a best quality setting which is the largest file, and maybe intermediate sizes, and a smallest but worst quality choice. Digital cameras typically offer three JPG file size choices of Fine (about 1/4 size in bytes), Normal (about 1/8 size in bytes), or Basic (about 1/16 size in bytes), comparing compressed file size to the uncompressed size. The best (largest) JPG file size will still contain JPG artifacts, but very mild, essentially undetectable, vastly better than the smallest file choice. Even better, some cameras also offer a RAW or TIF format to bypass JPG problems altogether. These images may be large, but memory cards are becoming less expensive, and larger or multiple cards are by far the best quality solution.
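As a rough sketch of what those three choices mean for a memory card, the arithmetic might look like this. The 1/4, 1/8 and 1/16 fractions are the approximate figures quoted above; the 6 megapixel image, the 3 bytes per pixel of 24 bit RGB, the 1 GB card, and the function names are assumptions made only for this example. Real files vary from image to image.

```python
# Approximate fractions of the uncompressed size, as quoted in the text.
SETTINGS = {"Fine": 1 / 4, "Normal": 1 / 8, "Basic": 1 / 16}
BYTES_PER_PIXEL = 3          # assumes ordinary 24 bit RGB
CARD_BYTES = 1_000_000_000   # hypothetical 1 GB card

def estimated_file_bytes(megapixels, fraction):
    """Rough JPG file size: uncompressed bytes times the setting's fraction."""
    uncompressed = megapixels * 1_000_000 * BYTES_PER_PIXEL
    return uncompressed * fraction

def shots_per_card(card_bytes, megapixels, fraction):
    """How many files of the estimated size fit on the card (rounded down)."""
    return int(card_bytes // estimated_file_bytes(megapixels, fraction))

for name, fraction in SETTINGS.items():
    size_mb = estimated_file_bytes(6, fraction) / 1_000_000
    count = shots_per_card(CARD_BYTES, 6, fraction)
    print(f"{name}: about {size_mb:.2f} million bytes per image, about {count} images per card")
```

Each step down roughly doubles the number of images on the card, and that is all it buys. The quality cost is the subject of the rest of this page.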

With either scanner or camera images, individual image JPG file sizes will vary somewhat, because detail in the individual image greatly affects compressibility. Large featureless areas (skies, walls, etc.) compress much better (smaller) than images containing a lot of tiny detail all over (a tree full of leaves). Therefore images of the same size in pixels and using the same JPG quality setting, but with differing image content, will vary in JPG file size, with extremes perhaps over a 2 to 1 range around the average size.
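The effect of image content can be seen with any compressor. The sketch below uses Python's standard zlib module (a lossless method, NOT JPG) on two made-up byte strings of the same length: one perfectly flat, one random noise. Both are far more extreme than any real photo, so the difference shown is much larger than the 2 to 1 range mentioned above; only the direction of the effect carries over.

```python
import random
import zlib

random.seed(0)   # fixed seed so the run is repeatable
n = 100_000      # same amount of data for both samples

flat = bytes([180]) * n                                  # featureless, like a clear sky
busy = bytes(random.randrange(256) for _ in range(n))    # detail everywhere, like leaves

flat_size = len(zlib.compress(flat))
busy_size = len(zlib.compress(busy))

print("flat sample compresses to", flat_size, "bytes")
print("busy sample compresses to", busy_size, "bytes")
```

Same data size and same compressor settings, yet the two results are nowhere near each other, purely because of the content.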

Since each image varies a little (regarding image detail density), it compresses a bit differently, so the file size is only a crude indicator of JPG quality, but it is still a rough guide. For ordinary color images (24 bit RGB; all RGB JPG files are 24 bits), the uncompressed image size when opened in memory is always 3 bytes per pixel. For example, an image size of 6000x4000 pixels is 24 megapixels, and therefore by definition, when uncompressed (when opened), this memory size is 3X that in bytes, or 72 million bytes (68.7 MB). That is simply how large 24 bit data is at 6000x4000 pixels. The compressed JPG file size will be smaller (same pixel count, but compressed into fewer bytes in the file). A High quality JPG file might be compressed to 30% to 40% of the uncompressed size (bytes). A lesser quality JPG file might be only 10% of that image's size when open in memory, which might be a general ballpark for a fair tradeoff of quality vs. file size for color images of web page quality (but not best quality).
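The arithmetic in that example is easy to check. This small sketch (the function name is just for illustration) reproduces the 6000x4000 numbers and the rough 40%, 30% and 10% file sizes; the percentages are the ballpark figures from the text, not a prediction for any particular image.

```python
def uncompressed_bytes(width, height, bytes_per_pixel=3):
    """Memory size of an opened image; 3 bytes per pixel for 24 bit RGB."""
    return width * height * bytes_per_pixel

size = uncompressed_bytes(6000, 4000)
print(size, "bytes")                   # 72000000 bytes
print(round(size / 2**20, 1), "MB")    # 68.7 MB, counting 1 MB as 1024 x 1024 bytes

# Ballpark JPG file sizes at the rough percentages mentioned in the text.
for percent in (40, 30, 10):
    print(f"{percent}% of uncompressed: {size * percent // 100:,} bytes")
```

So the 10% ballpark for this 24 megapixel image would be a JPG file of roughly 7.2 million bytes.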

The 10% size is not very precise (it varies, since data compresses differently), and it only refers roughly to a common image size, since each individual image varies a little. A little larger file is better image quality. Color compresses better than grayscale, so grayscale files don't shrink as much. These are very rough guidelines; your image, your photo program, your purpose, and your personal criteria or tolerance will all be a little different. Normally, we ought to prefer high quality images. The smallest possible JPG is NOT a plus.

It is difficult to describe the JPG quality losses, except by seeing an example image (next page). JPG does not discard pixels. Instead it changes the color detail of some pixels in an abstract mathematical way. JPG is mathematically complex and requires considerable CPU processing power to decompress an image. JPG also allows several parameters, and programs don't all use the same JPG rules. Some programs take shortcuts to smaller JPG with less quality (browsers for example), and other programs prefer larger JPG with better quality. Final image quality can depend on the image details, on the degree of compression, on the method used by the compressing JPG program, and on the method used by the viewing JPG program.

Continued, JPEG Artifacts (and JPG artifacts are also covered in more detail here).

Copyright © 1997-2010 by Wayne Fulton - All rights are reserved.
