XIF File Format (Pagis Pro 2.0 and 3.0)

What is the XIF file format?

Scanning documents (for example, magazine articles) is difficult, because there are many factors to consider. Our goals pull us one way for one reason, and the other way for another. For documents (as opposed to photographs), these properties are desirable:

So what do we do? We have both needs in the same document, and conflicting concerns about the purpose of the scan. If we scan the text at 300 dpi in line art mode, the images are horrible. If we scan at 150 dpi in color for the images, a full-page color image can be over 6 megabytes uncompressed, and the text won't be very crisp. We can apply heavy JPG compression to reduce the size, but then the text becomes very poor. Documents often have both text and pictures on the same page, so the more JPG compression we specify to shrink the pictures, the more JPG artifacts we get around the text. All these factors oppose each other. A conventional scan must treat the text and the pictures on the same page in the same way, even though they have extremely different needs, so we must compromise on one setting.
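The 6 megabyte figure is easy to check. The sketch below assumes an 8.5 x 11 inch (US Letter) page, since the page size is not stated here, and computes the raw, uncompressed size of a scan with each row padded to a whole byte. The function name `raw_bytes` is illustrative only.

```python
def raw_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of a scanned page, each row padded to whole bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    row_bytes = (width_px * bits_per_pixel + 7) // 8  # round each row up to a byte boundary
    return row_bytes * height_px

# Assumed page size: 8.5 x 11 inch (US Letter).
color_150 = raw_bytes(8.5, 11, 150, 24)    # 150 dpi, 24-bit color
line_art_300 = raw_bytes(8.5, 11, 300, 1)  # 300 dpi, 1-bit line art

print(color_150)     # 6311250
print(line_art_300)  # 1052700
```

At 150 dpi the page is 1275 x 1650 pixels at 3 bytes each, about 6.3 million bytes. The same page as 300 dpi line art is only about 1 million bytes, even though it has four times as many pixels.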

What we really need is a way to handle the text differently than the pictures in the same image.

The ScanSoft Pagis XIF file format (eXtended Image Format) was designed specifically to solve this problem for storing document images, and is ideal for the purpose. XIF is an extension of the TIFF specification. It separates the document image into four layer components, which allows each layer to be compressed with the method best suited to it, achieving a very small file size. The manual shows it this way:

The key to the XIF file format is that the document is literally separated into multiple modes, at multiple resolutions, in multiple layers of the same file.
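To picture the layered idea, here is a simplified two-layer model: a 1-bit text mask at high resolution laid over a color picture layer at a lower resolution. This is a hypothetical illustration only. The function `composite` and the two-layer structure are assumptions, not the actual XIF layout, which has four layer components. Where a text bit is set, the output pixel is black; elsewhere the picture pixel is enlarged by nearest-neighbour sampling. Areas with no picture are assumed to be white in the picture layer.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]

def composite(text_mask: List[List[int]], picture: List[List[RGB]], scale: int) -> List[List[RGB]]:
    """Render a hypothetical two-layer page at the text layer's resolution.

    text_mask: 1-bit layer (1 = black text pixel) at the high resolution.
    picture:   24-bit color layer at 1/scale of that resolution.
    """
    page = []
    for y, mask_row in enumerate(text_mask):
        out_row = []
        for x, bit in enumerate(mask_row):
            if bit:
                out_row.append((0, 0, 0))  # text layer wins: solid black, no JPG artifacts
            else:
                # Each picture pixel covers a scale x scale block of output pixels.
                out_row.append(picture[y // scale][x // scale])
        page.append(out_row)
    return page

page = composite(
    text_mask=[[0, 0, 0], [0, 1, 0], [0, 0, 0]],
    picture=[[(255, 0, 0)]],  # one red picture pixel covering a 3 x 3 area
    scale=3,
)
print(page[1])  # [(255, 0, 0), (0, 0, 0), (255, 0, 0)]
```

With `scale=3`, this mirrors the 300 dpi text over 100 dpi pictures arrangement described in this article.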

For XIF files, Pagis typically scans the TEXT layer as 300 dpi Line art. This resolution is intended for printing, which has higher requirements than a video monitor screen. Pagis typically scans the PICTURE layer at 100 dpi, in 24-bit color or 8-bit Grayscale. You choose these resolution values, and you can use a higher resolution for the images if needed (the 100 dpi default is in the right ballpark for printing on a 600 dpi laser printer). The picture layer in XIF files is compressed with JPG compression, which is optimal for continuous-tone images such as photographs.

However, JPG is far from optimal for text: the dark, smudgy artifacts it leaves around sharp edges are exceptionally bad. Therefore the XIF text layer is separated, scanned, and stored as Line art, and that layer is NOT JPG compressed, so it is not affected by JPG artifacts. Line art compresses very well via other methods. It is already stored 8 pixels to a byte (1 bit per pixel, versus 24 bits for color), so even before any compression, Line art is only 1/24 the size in bytes of a color image of the same pixel dimensions. JPG compression is more typically 1/12 size, and even that level adds bad artifacts to text documents.
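To make the 8-pixels-to-a-byte point concrete, this sketch packs a row of line art pixels into bytes, most significant bit first, padding the end of the row to a whole byte. That is the usual TIFF arrangement for bilevel rows. The function name is illustrative, and treating 1 as black is an assumption.

```python
def pack_line_art(row):
    """Pack a row of 1-bit pixels (0 or 1) into bytes, 8 pixels per byte, MSB first."""
    packed = bytearray()
    for start in range(0, len(row), 8):
        byte = 0
        for i, bit in enumerate(row[start:start + 8]):
            if bit:
                byte |= 0x80 >> i  # leftmost pixel goes in the highest bit
        packed.append(byte)  # a short final chunk leaves its low bits as 0 (padding)
    return bytes(packed)

row = [1, 0, 1, 1, 0, 0, 0, 1, 1, 1]  # 10 pixels
print(pack_line_art(row).hex())  # b1c0

# Same row width stored as line art versus 24-bit color (3 bytes per pixel).
width = 2400  # 8 inches at 300 dpi
print(len(pack_line_art([0] * width)), width * 3)  # 300 7200
```

A 2400-pixel row takes 300 bytes as line art versus 7200 bytes as 24-bit color, the 1/24 ratio mentioned above.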

Pagis does not require that you use XIF files. It will handle most other file formats, but XIF is a strong choice for storing documents compactly. I would suggest XIF for documents, but for your regular photo scans, you should know that XIF images are JPG compressed. Again, XIF is optional, though ideal for documents. A file is not even required: you can scan directly from Pagis into your own photo editor if desired.

So the XIF advantages include small document files that still contain clear Line art text without JPG artifacts. Embedded pictures in the same document are included in the same XIF file with JPG compression.

There are strong advantages, and two disadvantages (life is like that). One, most other programs cannot view XIF files, so there is a dependency on Pagis to view and print them. However, Pagis can always convert these images to most other formats when desired, and there is a free downloadable XIF viewer at the Pagis web site that can be sent along with XIF images if necessary. The Pagis Inbox provides a bar of icons for your own programs; you just drag and drop a document onto a program's icon, and Pagis automatically converts it to your chosen format for that program.

And two, the Pagis Pro user interface automatically makes two scanning passes on the document, one in color mode and another in line art mode for the text. That is a plus, but it does not include descreen for moire in the images, because Pagis cannot enable that remotely. You can instead elect to use your scanner's native TWAIN driver interface, and then select descreen. However, Pagis Pro then makes only one pass, in color mode. If writing an XIF file, it still converts the text to line art, and the result is good, but not exactly like scanning in line art mode.
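How Pagis turns a single color pass into line art is not described here, and it is noted later that some gray text color is preserved, so Pagis evidently does more than this. As a generic illustration only, the simplest conversion is a fixed global threshold on brightness. The sketch below assumes BT.601 luma weights and a threshold of 128; both, and the function name `to_line_art`, are assumptions.

```python
def to_line_art(pixels, threshold=128):
    """Convert rows of (R, G, B) pixels to 1-bit rows: 1 = black, 0 = white.

    Uses a single fixed brightness threshold (an assumption, not Pagis's method).
    """
    result = []
    for row in pixels:
        out_row = []
        for r, g, b in row:
            luma = 0.299 * r + 0.587 * g + 0.114 * b  # BT.601 brightness estimate
            out_row.append(1 if luma < threshold else 0)
        result.append(out_row)
    return result

scan = [[(20, 20, 30), (240, 235, 230), (90, 90, 90), (200, 60, 60)]]
print(to_line_art(scan))  # [[1, 0, 1, 1]]
```

The reddish pixel (200, 60, 60) falls below the threshold and becomes solid black. This shows one way a fixed threshold is a compromise: every tone is forced to pure black or pure white.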

This is a scan of a catalog page, a typical color document.

This first one was a conventional scan (using Photoshop, NOT Pagis Pro) at 100 dpi in Color mode. The image was compressed to a JPG file for a smaller file size. It is shown here as a screen capture enlarged to 2X size. It looks like a typical JPG image; JPG artifacts are tough on text characters.

This text is not like Line art. It is 24-bit color mode, so the text is not black and the background is not white (I really did set the histogram contrast carefully). The text is anti-aliased in Grayscale and Color modes: the gray pixels are added to smooth the jagged edges at 100 dpi.
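One simple way to see where gray edge pixels come from is to model a lower-resolution scan as averaging blocks of a sharper image. This is a box-filter sketch, not any particular scanner's method. Three rows of a 300 dpi black-and-white edge (0 = black, 255 = white) are averaged in 3 x 3 blocks to give 100 dpi. The function name `box_downsample` is illustrative.

```python
def box_downsample(gray, factor):
    """Average each factor x factor block of a grayscale image (0 = black, 255 = white).

    Any leftover rows or columns that do not fill a whole block are dropped.
    """
    h, w = len(gray), len(gray[0])
    out = []
    for by in range(0, h - h % factor, factor):
        row = []
        for bx in range(0, w - w % factor, factor):
            block = [gray[y][x] for y in range(by, by + factor) for x in range(bx, bx + factor)]
            row.append(round(sum(block) / len(block)))
        out.append(row)
    return out

# A sharp edge at 300 dpi: 4 black pixels, then 5 white pixels, repeated for 3 rows.
hi_res = [[0] * 4 + [255] * 5 for _ in range(3)]
print(box_downsample(hi_res, 3))  # [[0, 170, 255]]
```

The middle block straddles the edge, so it comes out as an intermediate gray (170) rather than pure black or white.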

All of the dark smudge speckles hovering around the text are JPG compression artifacts. They appear around any sharp edge, and how many appear depends on the amount of JPG compression specified. This one uses comparatively mild compression.

This next scan was made with Pagis Pro into an XIF file, so the pictures are at 100 dpi Color mode and the text is 300 dpi Line art mode. Again, this is a 2X size screen capture. Uncompressed, 24-bit color images are 24 times the size in bytes of Line art images with the same pixel dimensions, so Line art is both a considerable size savings and a large quality boost for text. Line art at 300 dpi will print well, with smaller jaggies.
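To put rough numbers on the size savings, this sketch compares the raw, uncompressed sizes of the two layers described here (300 dpi line art text and 100 dpi 24-bit color pictures) with a single 300 dpi color scan of the whole page. It assumes an 8.5 x 11 inch page and ignores both the compression applied to each layer and any other XIF layers. The helper name `raw_bytes` is illustrative.

```python
def raw_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of one image layer, rows padded to whole bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px

PAGE = (8.5, 11)  # assumed US Letter page, in inches

text_layer = raw_bytes(*PAGE, 300, 1)          # 300 dpi line art text
picture_layer = raw_bytes(*PAGE, 100, 24)      # 100 dpi 24-bit color pictures
single_color_300 = raw_bytes(*PAGE, 300, 24)   # conventional 300 dpi color scan

print(text_layer, picture_layer, text_layer + picture_layer, single_color_300)
# 1052700 2805000 3857700 25245000
```

The 300 dpi text layer has nine times as many pixels as the 100 dpi picture layer, so per page it is about 1/2.7 the size of the picture layer rather than 1/24. The bigger gain is not needing 300 dpi color to get crisp text: about 3.9 million bytes for both layers versus about 25 million, before compression.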

The black text is free from JPG artifacts because it is in a separate line art layer. The background is white because it is line art. However, the white text in the box is part of the picture layer, so it does have JPG artifacts. The picture portions are automatically detected as such, and are still 24-bit color and JPG compressed. The text is Line art and the picture is 24-bit JPG, all in one XIF file. 100 dpi is not as good as 150 dpi for the image; we could specify 200 dpi if it is important.

Here is the text that OCR produced from that image.

4 Scanner Tools for 1 Low Pricel
Pagis Pro
1 Turn paper into accurate digital documentr
with TextBridge Pro OCR software
2 Scan, organize, and find documents with rage
3 Make quick color copies with Pagis Copier
4 Import and edit images with PhotoSuite

There are lots of options. You can configure Pagis to use the scanner's native TWAIN driver interface, and then use the Descreen filter for Moire. However, Pagis then does not make a second Line art scan for the text. Pagis does enhance that text, and it is considerably better than the Photoshop attempt, but it is not the same as Line art. Actually, some of the original gray text color is carefully preserved.

