A few scanning tips


ScanSoft TextBridge Pro OCR

TextBridge Pro is an excellent state-of-the-art OCR (Optical Character Recognition) program that converts a scanned image of text into editable text characters suitable for your word processor. TextBridge Pro can be run standalone, from the menu of most word processors, or integrated from Pagis Pro. It will scan the page itself, or use previously scanned image files in BMP, PCX, DCX, TIF, XIF, and Winfax Pro formats (there is even a scheduler to do large runs overnight). It can process many pages of images, and can run your scanner's Automatic Document Feeder too, if you have one. It can output files in most word processor file formats, and in PostScript EPS and HTML too. Or you can route the text directly into a word processor via a Send To menu.

A page from the TextBridge Help File:

Types of Documents TextBridge Can Recognize

TextBridge Recognition technology retains accurate OCR and format results for a wide range of documents:

This is the initial screen of the standalone mode.

Basically, that is all there is to it, steps 1, 2, 3: Get Pages, Recognize Text, Save As to a file. If you use Auto, it scans each page and asks MORE PAGES or DONE? If you use the 1, 2, 3 steps, you can do additional manual processing, such as marking certain zones as text or image type. To do additional pages, you simply do GET PAGES again, for a step sequence like 1,2, 1,2, 1,2, 3. The arrow at the right side of the Get Pages button lets you specify From Scanner, Scanner Feeder, or From File.

There are many options to improve your results too. Before you begin, the menu PROCESS - PAGE TYPE brings up this next screen to select your document type. Not shown here (scroll down to the next row) are the Newspaper and Table types. You can specify "Any Page", which means TextBridge will figure it out, or you can give it a hint. The Toolbar above indicates that "Any Page" is currently selected, and the small arrow there also brings up the next screen below.

Then for that selection, the Settings button will offer more choices. You can specify paper orientation, or TextBridge will figure it out and rotate it properly. It will also straighten crooked pages. You can give TextBridge hints about print quality. You can specify if you want the original formatting retained, columns, tables, etc. You can save custom settings for use again on similar pages. You can define a "Template" that saves and reuses zones (text or image) for similarly formatted pages.

And the Scanner tab has settings too, to describe the paper document and format. Table cells can be delimited with tabs for spreadsheets if desired.
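To illustrate what tab-delimited table output means, here is a minimal Python sketch (not TextBridge's own code) that writes rows of cells with a tab between cells and one line per row, which is the form a spreadsheet will import directly. The function name table_to_tsv and the sample rows are made up for this illustration.

```python
import csv
import io

def table_to_tsv(rows):
    """Join each row's cells with tabs, one table row per line (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()

# Sample table, invented for the example
rows = [["Item", "Qty"], ["Paper", "500"]]
print(table_to_tsv(rows))
```

Pasting that output into a spreadsheet puts each cell in its own column, because the tab character is the default column separator for pasted text.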

Scanning resolution is determined automatically for the document type you have specified. You can see the resolution and mode (B&W, Gray, Color) that will be used, but you can change the resolution only by selecting a different page type.

And in particular, you can use the CUSTOM button there to optionally scan the document in two passes, for example, 300 dpi line art for the text portion and 100 dpi color mode to retain the images in the document. When you mark the Regions for OCR, the appropriate scan pass will be used.
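The two-pass idea can be pictured as a simple lookup from region type to scan settings. The Python sketch below only illustrates the concept, using the example values above (300 dpi line art for text, 100 dpi color for images). The names SCAN_PASSES and pass_for_region are hypothetical and are not part of TextBridge.

```python
# Hypothetical settings table, using the example values from the text
SCAN_PASSES = {
    "text": {"dpi": 300, "mode": "line art"},
    "image": {"dpi": 100, "mode": "color"},
}

def pass_for_region(region_type):
    """Return the scan settings for a region marked as 'text' or 'image'."""
    return SCAN_PASSES[region_type]

# Regions as they might be marked on one page (invented for the example)
regions = ["text", "image", "text"]
for region in regions:
    settings = pass_for_region(region)
    print(region, settings["dpi"], settings["mode"])
```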

Then languages and user dictionaries and training data can be specified. There are 56 languages you can install from the CD (0.5 megabytes each). See the list. OCR recognizes character patterns (like a, b, c) from the pixel patterns in the image. This is no easy job for a computer, so OCR also resolves possible errors by recognizing complete words from the selected dictionary, like a spell checker (you can add your own words in your own user dictionary too). And you can "train" it, adding instructions showing: "when you get this, it should be that".

For example, if the pixel decoding deciphered the word "exarnple", the dictionary shows there is a valid word "example" that is a more suitable candidate, and it is automatically corrected. Plus, the manual proofing correction utility is quite good too (below).
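The dictionary idea can be sketched in a few lines of Python. This is only an illustration of the general technique, not how TextBridge actually works internally: it uses the standard difflib module to pick the closest dictionary word, and a small "training" table for fixed corrections. The names correct_word, DICTIONARY, and TRAINING, and the 0.7 similarity cutoff, are assumptions made for the example.

```python
import difflib

# Tiny stand-in dictionary and training table, invented for the example
DICTIONARY = ["example", "scanner", "document", "resolution"]
TRAINING = {"docurnent": "document"}  # "when you get this, it should be that"

def correct_word(word, dictionary=DICTIONARY, training=TRAINING, cutoff=0.7):
    """Return a corrected word, or the word unchanged if nothing is close enough."""
    if word in training:
        return training[word]       # a trained correction wins outright
    if word in dictionary:
        return word                 # already a valid word
    # Otherwise take the most similar dictionary word above the cutoff
    matches = difflib.get_close_matches(word, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else word

print(correct_word("exarnple"))
```

Here "exarnple" is close enough to "example" to be replaced, while a word with no near match in the dictionary is left alone for manual proofing.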

The menu TOOLS - INSTANT ACCESS CONTROL PANEL will let you select from a list of programs to add TextBridge to their menu. These are typically word processor or spreadsheet programs, and the OCR output will appear back in documents in those programs.

There will then be a new menu item FILE - TEXTBRIDGE in those programs, which brings up this dialog box to perform the OCR, basically the same thing as in the standalone version.


