A few scanning tips

www.scantips.com


Color Bit-Depth, & Memory Cost of Images

Large images consume large memory and make our computers struggle. Memory cost for an image is computed from the image size.

For a 6x4 inch image printed at 300 dpi, the image size is calculated as:

  (6 inches × 300 dpi) × (4 inches × 300 dpi) = 1800 × 1200 pixels

1800 × 1200 pixels is 2,160,000 pixels (about 2.2 megapixels).

The memory cost for this RGB color image is:

  1800 × 1200 × 3 = 6.48 million bytes.

The last "× 3" is for 3 bytes of RGB color information per pixel for 24 bit color (3 RGB values per pixel, one 8-bit byte for each RGB value, which totals 24 bit color).
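The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `image_memory_bytes` and its parameters are made up for this example, and it assumes uncompressed data with a whole number of bytes per pixel.

```python
def image_memory_bytes(width_in, height_in, dpi, bytes_per_pixel=3):
    """Return (width_px, height_px, total_bytes) for an uncompressed image.

    bytes_per_pixel=3 assumes normal 24-bit RGB color.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px, height_px, width_px * height_px * bytes_per_pixel


w, h, size = image_memory_bytes(6, 4, 300)
print(f"{w} x {h} pixels = {w * h:,} pixels")  # 1800 x 1200 pixels = 2,160,000 pixels
print(f"{size:,} bytes")                       # 6,480,000 bytes
```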

The compressed JPG file will be smaller (maybe 10% of that size), selected by our choice for JPG Quality, but the smaller it is, the worse the image quality. The larger it is, the better the image quality. If uncompressed, it is three bytes per pixel.

Different color modes have different size data values, as shown below:

Image Type          Bytes per pixel
1 bit Line art      1/8 byte per pixel (1 bit per pixel, 8 bits per byte)
8 bit Grayscale     1 byte per pixel
16 bit Grayscale    2 bytes per pixel
24 bit RGB          3 bytes per pixel (most common for photos, for example JPG)
32 bit CMYK         4 bytes per pixel (for Prepress)
48 bit RGB          6 bytes per pixel

File compression techniques can make this data smaller while it is stored in the file, but it comes back out of the file uncompressed, with the original number of bytes, when opened in memory. JPG artifacts (from lossy compression) just mean that the pixels may not all be their original color, but there is the same number of pixels when uncompressed.
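A small sketch of that idea, using Python's standard `zlib` module as a stand-in. Note the assumptions: zlib is lossless (it is not JPG compression), and the all-black 1800 × 1200 image data is made up for the example. The point is only that the byte count in memory is the same after decompressing, however small the stored data was.

```python
import zlib

width, height, bytes_per_pixel = 1800, 1200, 3

# Made-up image data: every byte zero, which is an all-black 24-bit image.
raw = bytes(width * height * bytes_per_pixel)

compressed = zlib.compress(raw)          # smaller, as when stored in a file
restored = zlib.decompress(compressed)   # back to full size, as when opened

print(len(raw))                    # 6480000
print(len(compressed) < len(raw))  # True
print(len(restored))               # 6480000
```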

Converting Memory Values
of Bytes, KB, MB, GB, TB

Bytes (B)
Kilobytes (KB)
Megabytes (MB)
Gigabytes (GB)
Terabytes (TB)

Each unit here is 1024 times the unit above it. A converted value shown as 1e-7 means to move the decimal point 7 places to the left, so 1e-7 is 0.0000001.

The factor of 1024 is binary, which is how memory computes byte addresses; however, humans normally use 1000 for their stuff. Specifically, megapixels and the 500 GB disk drive we buy correctly use 1000, but memory chips (including SSD) use 1024.
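A minimal sketch of such a conversion, assuming binary steps of 1024 by default. The names `UNITS` and `convert_all` are hypothetical, chosen for this example; pass `base=1000` to count the decimal way instead.

```python
UNITS = ["B", "KB", "MB", "GB", "TB"]


def convert_all(value, unit, base=1024):
    """Convert `value` given in `unit` into every unit in UNITS.

    base=1024 gives binary units; base=1000 gives decimal units.
    """
    n_bytes = value * base ** UNITS.index(unit)
    return {u: n_bytes / base ** i for i, u in enumerate(UNITS)}


for u, v in convert_all(1, "GB").items():
    print(f"{v:g} {u}")
```

Very small results print in the same scientific notation, for example `1e-07`.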

About Megabytes

The memory size of images is often shown in megabytes. You may notice a little discrepancy from the number you calculate with WxHx3 bytes. This is because, for memory sizes, "megabytes" and "millions of bytes" are not quite the same units.

Memory sizes in terms like KB, MB, GB, and TB count in units of 1024 bytes, whereas humans count thousands in units of 1000.

A million of anything is 1000x1000 = 1,000,000, powers of 10, or 10^6. But binary units normally are used for memory sizes, powers of 2, where one kilobyte is 1024 bytes, and one megabyte is 1024x1024 = 1,048,576 bytes, or 2^20. So a number like 10 million bytes is 10,000,000 / (1024x1024) = 9.54 megabytes. One binary megabyte holds nearly 5% more bytes than one million, so there are about 5% fewer megabytes.
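The same numbers, worked as a short Python sketch. The constant and function names are made up for illustration.

```python
MILLION = 10 ** 6   # decimal "mega", 1000 x 1000
BINARY_MB = 2 ** 20  # binary megabyte, 1024 x 1024


def to_binary_mb(n_bytes):
    """Convert a byte count to binary megabytes."""
    return n_bytes / BINARY_MB


print(BINARY_MB)                                  # 1048576
print(round(to_binary_mb(10_000_000), 2))         # 9.54
print(round((BINARY_MB / MILLION - 1) * 100, 2))  # 4.86 (percent more bytes)
```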

Binary 1024 units are necessarily used for memory, and also, computer operating systems like to arbitrarily use it for file sizes. All else (megapixels, disk size, etc) use normal 1000 units.

Specifications for megapixels in digital cameras, and disk drive size in gigabytes, are both correctly advertised as multiples of decimal thousands: millions are 1000x1000, and gigabytes are 1000x1000x1000. That is a larger number of MB or GB than if counting by units of 1024, but this is NOT cheating. It is extremely natural, because humans do in fact count in powers of 10.

However, after formatting the disk, the computer operating system has notions to count it in binary GB. The device manufacturer did advertise it correctly, and formatting did NOT make the disk smaller, the units just changed (in computer lingo, 1K became counted as 1024 bytes instead of 1000 bytes). This is why we buy a 500 GB disk drive (sold as 1000's, the actual real count, the decimal way humans count), and it does mean 500,000,000,000 bytes, and we do get them all. But then we format it, and then we see about 465 gigabytes of binary file space (using 1024). All precisely correct, 500 GB / (1.024 x 1.024 x 1.024) = 465.661 GB. But users who don't understand this switch assume the disk manufacturer cheated them somehow. Instead, no, the disk just counted in decimal, same way as we humans do. No crime in that, mega does mean million, and we do count in decimal (powers of 10 instead of 2). It is the operating system that confuses us, calling it something different.
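The 500 GB example can be checked with a short sketch. The function name is hypothetical; it only restates the division described above.

```python
def advertised_to_binary_gb(decimal_gb):
    """Convert an advertised (decimal, 1000-based) GB size to binary GB."""
    n_bytes = decimal_gb * 1000 ** 3  # the actual byte count we bought
    return n_bytes / 1024 ** 3        # the same bytes, counted in binary GB


print(f"{advertised_to_binary_gb(500):.3f}")  # 465.661
```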

However, memory chips (also including SSD and Compact Flash and SD cards and USB flash sticks, which are all memory chips) are different, and construction requires the use of binary kilobytes (1024) or megabytes (1024x1024) or gigabytes (1024x1024x1024). Also (for no good reason) file sizes are usually in binary 1024 units. Doing this for file sizes is debatable, but there are good necessary technical reasons for memory chips to use binary numbers, because each address bit is a power of two. The sequence 1,2,4,8,16,32,64,128,256,512,1024... makes it extremely impractical (unthinkable) to build a 1000 byte memory chip. It simply would not come out even. The binary address lines count to 1024, so it is necessary to add the other 24 bytes to fill it up. But there is no good reason today for file sizes to be in binary. It is just an unnecessary complication, yet counting in binary 1024 units is still done on them. If we have a file of actual size 20,001 bytes, the operating system will call it 19.532 KB.
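A small sketch of why address lines lead to powers of two, and of the 20,001 byte example. The helper name `addressable_locations` is made up for illustration.

```python
def addressable_locations(address_bits):
    """Number of distinct addresses that a given count of address bits can select."""
    return 2 ** address_bits


for bits in (8, 9, 10):
    print(bits, "address bits ->", addressable_locations(bits), "locations")

# The highest address of a 1000 byte chip would be 999, which already
# needs 10 address bits, and 10 bits can select 1024 locations.
bits_needed = (1000 - 1).bit_length()
print(bits_needed)                                # 10
print(addressable_locations(bits_needed) - 1000)  # 24 addresses left over

# The file size example: actual bytes shown as binary KB.
print(f"{20_001 / 1024:.3f} KB")                  # 19.532 KB
```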

The definition of the unit prefix "Mega" absolutely has always meant millions (decimal factors of 1000) - and of course it still does mean 1000, it does NOT mean 1024. However, memory chips are necessarily dimensioned in binary units (factors of 1024), and they simply incorrectly appropriated the terms kilo and mega, years ago... so we do use it that way. In the early days, when memory chips were tiny, it was more important to think of file sizes in binary, when they had to fit. Since then though, chips have become huge, and we don't sweat a few bytes now.

And also, with the goal to preserve the actual decimal meanings of Mega and Kilo, new binary prefixes Ki and Mi and Gi (kibi, mebi, gibi) were defined by the IEC for the binary powers of 1024, but they seem ignored, they have not caught on. So, this still complicates things today. Memory chips are binary of course, but there is absolutely no reason why our computer operating system still does this, regarding file sizes. Humans count in decimal.

Note that you will see different numbers in different units for the same file size dimension:

  1. Photo editors normally show image size in binary units, either KB (uncompressed bytes divided by 1024) or MB (bytes divided by 1024 twice). Some editors (Irfanview) show both numbers, the binary representation and actual decimal byte count.

    The number we need to know is the image size in pixels. Then image size in bytes is (width in pixels) x (height in pixels) and then x 3 (for 3 bytes per pixel, if normal 24 bit color). That is the real decimal data size in bytes. For the binary numbers, divide the bytes by 1024 for KB, or by 1024 twice for MB. After that, you can go back to decimal units (thousands, millions, or billions of bytes) by multiplying by 1.024 (once for KB, or twice for MB, or three times for GB).

  2. The Windows Explorer shows the file size in units of KB (bytes divided by 1024 once). But a larger factor is that the file is probably compressed, so the file on disk is likely smaller than the image size.
  3. The Windows command line DIR command shows exact decimal file size in bytes. The operating system records file size in decimal bytes, but it tends to show humans the value in binary KB or MB. I cannot think of any reason why that convention is retained today.
  4. Right clicking the file in the Windows Explorer (file explorer) and selecting Properties will show size in KB or MB, and also in bytes. Two sizes are shown, the actual file size, and the slightly larger space occupied because the disk has to allocate space in clusters (probably 4096 byte units for NTFS today).
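The two sizes in item 4 can be sketched as follows. This is a simplified model, not how Windows itself computes the value: it assumes a 4096 byte cluster (the text says only "probably") and that every file occupies a whole number of clusters. The function name is hypothetical.

```python
def size_and_size_on_disk(n_bytes, cluster=4096):
    """Return (binary KB, bytes allocated on disk) for a file of n_bytes.

    Assumes space is allocated in whole clusters of `cluster` bytes.
    """
    kb = n_bytes / 1024
    clusters = (n_bytes + cluster - 1) // cluster  # round up to whole clusters
    return kb, clusters * cluster


kb, on_disk = size_and_size_on_disk(20_001)
print(f"{kb:.3f} KB")  # 19.532 KB
print(on_disk)         # 20480 bytes (5 clusters of 4096)
```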

Example: for a 4000 x 2500 pixel image (24-bit RGB is most common):

4000 x 2500 pixels = 4000x2500 = 10 megapixels

4000x2500 x 3 = 30 million bytes (if 24-bit RGB)

30,000,000 bytes / (1024 x 1024) = 28.61 megabytes (MB)

This is simply how large the data is for ANY 24-bit 10 megapixel image, but JPG files compress it smaller (only while in the file).

57.220 MB if 48-bit RGB (6 bytes RGB per pixel)

38.147 MB if 32-bit CMYK (4 bytes CMYK per pixel)

28.610 MB if 24-bit RGB (3 bytes RGB per pixel)

19.073 MB if 16-bit GrayScale (2 bytes per pixel)

9.537 MB if 8-bit GrayScale (1 byte per pixel)

1.192 MB if 1-bit Line Art (1 bit per pixel)
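The list above can be reproduced with a short sketch. The names `BYTES_PER_PIXEL` and `size_mb` are made up for this example; `Fraction` is used so that 1-bit line art (1/8 byte per pixel) stays exact.

```python
from fractions import Fraction

# Bytes per pixel for each image type, as in the list above.
BYTES_PER_PIXEL = {
    "48-bit RGB": 6,
    "32-bit CMYK": 4,
    "24-bit RGB": 3,
    "16-bit GrayScale": 2,
    "8-bit GrayScale": 1,
    "1-bit Line Art": Fraction(1, 8),
}


def size_mb(width_px, height_px, bytes_per_pixel):
    """Uncompressed image size in binary megabytes (bytes / 1024 / 1024)."""
    return float(width_px * height_px * bytes_per_pixel) / (1024 * 1024)


for name, bpp in BYTES_PER_PIXEL.items():
    print(f"{size_mb(4000, 2500, bpp):7.3f} MB if {name}")
```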

Scanning any 6x4 inch photo occupies an amount of memory that depends on the scan resolution. I hope you realize that extreme resolution rapidly becomes impossible.
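A rough sketch of the memory cost of a 6x4 inch scan at several resolutions, assuming 24-bit RGB (3 bytes per pixel). The list of resolutions is an assumption picked for illustration, and `scan_memory_bytes` is a hypothetical helper name.

```python
def scan_memory_bytes(width_in, height_in, dpi, bytes_per_pixel=3):
    """Uncompressed memory cost in bytes of scanning an area at `dpi`."""
    return round(width_in * dpi) * round(height_in * dpi) * bytes_per_pixel


# Resolutions chosen only as examples.
for dpi in (75, 150, 300, 600, 1200, 2400, 4800, 9600):
    n = scan_memory_bytes(6, 4, dpi)
    print(f"{dpi:5d} dpi  {n:>14,} bytes  {n / 1024 ** 2:10.1f} MB")
```

At 9600 dpi this comes to 6,635,520,000 bytes, which is more than 6 binary gigabytes for one small photo.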

When people ask how to fix memory errors when scanning photos or documents at 9600 dpi, the answer is "don't do that" unless you have 8 gigabytes of memory, and a 9600 dpi scanner, and a special reason. It is normally correct to scan at 300 dpi to reprint at original size (600 dpi can help line art scans, but not color or grayscale).

Notice that when you increase resolution, the size formula above multiplies the memory cost by that resolution number twice, in both width and height. The memory cost for an image increases as the square of the resolution. The square of say 300 dpi is a pretty large number (more than double the square of 200).
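The square relationship can be shown with a one-line function. The name `memory_scale` is made up for this sketch.

```python
def memory_scale(old_dpi, new_dpi):
    """Factor by which memory cost changes when resolution changes."""
    return (new_dpi / old_dpi) ** 2


print(memory_scale(300, 600))  # 4.0
print(memory_scale(300, 900))  # 9.0
print(memory_scale(200, 300))  # 2.25
```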

Scan resolution and print resolution are two very different things. The idea is that we might scan about 1x1 inch of film at say 2400 dpi, and then print it 8x size at 300 dpi, at 8x8 inches. We always want to print photos at about 300 dpi; greater scan resolution is only for enlargement purposes.
The enlargement factor is Scanning resolution / printing resolution. A scan at 600 dpi will print 2x size at 300 dpi.
Emphasizing, unless it is small film to be enlarged, you do not want a high resolution scan of letter size paper. You may want a 300 dpi scan to reprint it at original size.
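The enlargement factor as a small sketch. The function names are hypothetical, and the 300 dpi default for printing follows the text.

```python
def enlargement_factor(scan_dpi, print_dpi=300):
    """Scanning resolution divided by printing resolution."""
    return scan_dpi / print_dpi


def print_size_inches(width_in, height_in, scan_dpi, print_dpi=300):
    """Printed size of a scanned original, in inches."""
    factor = enlargement_factor(scan_dpi, print_dpi)
    return width_in * factor, height_in * factor


print(enlargement_factor(600))        # 2.0
print(print_size_inches(1, 1, 2400))  # (8.0, 8.0)
```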

When we double the scan resolution, memory cost goes up 4 times. Multiply resolution by 3 and the memory cost increases 9 times, etc. So this seems a very clear argument to use only the amount of resolution we actually need to improve the image results for the job purpose. More than that is waste. It's often even painful. Well, virtual pain.  <grin>


Copyright © 1997-2016 by Wayne Fulton - All rights are reserved.
