Document Number 2204
Optical Character Recognition Technology Information
May 1993

In the Beginning . . .

Although optical character recognition (OCR) has only recently been popularized, OCR, or at least the concept of OCR, has existed since the beginning of the nineteenth century. In 1809, the first patents for reading devices to aid the blind were awarded. These inventions were the first real "seeds" of OCR's development.

The next 100 years saw numerous advances in optical scanning. One important invention was the "retina scanner," which used a mosaic of photocells in an image transmission system. Another important milestone in the evolution of OCR was the invention of the "Nipkow Disk," a sequential scanning disk which made possible the technique of line-by-line analysis of images, as well as other future innovations. For example, the principle of Nipkow's sequential scanning process is used in the operation of modern television cameras and is incorporated in many current OCR systems.

Shortly before World War I, the first true "readers," or machines that were able to convert printed characters into another form, were made commercially available. In 1912, Emmanuel Goldberg patented a machine which directly read characters and converted those symbols into standard telegraph code. Goldberg's machine read typed messages, converted them to paper tape, and then used the tape to transmit telegraphic messages over wires without human intervention. His invention demonstrated a practical application of OCR.

Around the same time, but independently, Fournier D'Albe invented an OCR device called the "Optophone." The Optophone was a hand-held scanner that optically scanned printed material and produced a series of audible tones while being moved along a page. Each tone corresponded to a specific letter or character, which allowed a visually impaired person to interpret written material.
In the late 1920's, AT&T patented systems which scanned messages and encoded them into Morse code for telegraphic transmission. Emmanuel Goldberg was responsible for yet another significant development in 1931, when he patented a device that searched photographic transparencies of data records and attempted to match them against a template of the desired search pattern. The principle behind this system was that once a match was located, the coincidence of patterns would completely block a light source from a detection device, specifically a photocell. This was the beginning of "template matching," a technique that was applied in the first working character readers, which appeared in the 1950's.

In the mid 1940's, the birth of the electronic data processing industry created the need for a productive method of data entry. Although IBM entered the optical scanning field in 1938 and was awarded various OCR-related patents, including one for a "Light Sensitive Device," the computer pioneer made no attempt to market commercial OCR devices until after 1960.

A nonscientific article in the mid 1950's introduced the public to the first potential commercial application of OCR technology and equipment -- an invention named "Gismo." Developed by a Department of Defense engineer, Gismo was capable of reading 23 letters of the alphabet as produced by a standard typewriter. Gismo could also understand Morse code, read musical notations, and even read aloud from printed pages. The inventor was quoted as saying that once "Gismo" got into production, the machine would have about 99.9-percent reading accuracy and would sell for approximately $1000.00. Of course, this was all theory in 1950. But it generated substantial enthusiasm and pointed to a bright future for OCR.

Shortly after "Gismo" captured the public's attention, the same engineer founded Intelligent Machines Research Corporation (IMR).
IMR developed and applied OCR technology to the problems and needs of commercial data processing. The company went on to achieve a major first in OCR with the installation of a commercial OCR reader at Reader's Digest in New York in 1954. The initial reader was used to convert typewritten documents (sales reports) into punched cards for input into the subscription department computer. This equipment enabled the magazine to reduce order processing time from one month to a little more than a day. The Reader's Digest scanner, often cited as "paying for itself" twice each year, had already read its billionth character by September of 1959.

Numerous other companies were early adopters of OCR: First National City Bank, New York (processing travelers' checks); National Biscuit Company (converting sales records to cards); AT&T (dividend checks and stockholder records); Ohio Bell Telephone; Arizona Public Service Company; Atlantic City Electric (cash accounting); and numerous government agencies, including the U.S. Post Office. At this time, most OCR systems were combined hardware and software devices that cost hundreds of thousands of dollars and were restricted to reading two specialized fonts: OCR A and OCR B.

"Matrix matching" dominated OCR technology during the 1970's. In matrix matching systems, the software compares small parts of each scanned bit-image to bit-patterns stored in a library and finds the stored character most similar to the scanned bit-pattern. However, the large variety of fonts, type sizes, and styles created a major problem for matrix matching. For example, an Italic "A" has a different pattern from a Roman "A," even within the same size and type family. Because of this, a matrix-matching OCR system must either have an enormous library of bit-patterns (which requires a time-consuming search for each match) or be limited to matching a few type styles.
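The matrix matching idea described above can be illustrated with a minimal sketch. This is not the method of any particular product: the 5x5 bitmaps, the `LIBRARY` table, and the function names are all hypothetical, and similarity is assumed here to mean the number of differing pixels.

```python
# Minimal matrix-matching sketch. The bitmaps and names are hypothetical,
# chosen only for illustration; '#' marks an inked pixel.
LIBRARY = {
    "A": [
        ".###.",
        "#...#",
        "#####",
        "#...#",
        "#...#",
    ],
    "L": [
        "#....",
        "#....",
        "#....",
        "#....",
        "#####",
    ],
    "T": [
        "#####",
        "..#..",
        "..#..",
        "..#..",
        "..#..",
    ],
}


def mismatches(a, b):
    """Count the pixels that differ between two equal-sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def match_character(scanned, library=LIBRARY):
    """Return the library character whose bitmap differs least from the scan."""
    return min(library, key=lambda ch: mismatches(scanned, library[ch]))


# A scanned "A" with one pixel dropped from the crossbar.
NOISY_A = [
    ".###.",
    "#...#",
    "####.",
    "#...#",
    "#...#",
]

print(match_character(NOISY_A))
```

Every stored pattern is compared against the scan, so the cost of each match grows with the size of the library. A second typeface for the same letter would need its own entry, which is the scaling problem the text describes.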
Matrix matching systems are commonly referred to as "trainable," since they allow the user to "train" the program to recognize different fonts. Generally, after a document has been scanned, the program separates out what it believes to be character images and asks the user to identify each image. It then stores each bitmap as the assigned character in its library and matches later images against that collection of bitmaps in order to identify characters. This process is very time-consuming, given the number of fonts available today.

In 1974, a company named Kurzweil was formed to extend the capabilities of OCR to fonts other than the set fonts. The company's initial goal was to enable blind people to hear written documents through OCR software and voice synthesis. A new technology was sorely needed, since matrix matching was becoming increasingly difficult as word processors and laser printers gave rise to a rapid proliferation of fonts and heavily kerned, touching text.

The technology pioneered by Kurzweil for the blind was called "OmniFont." OmniFont, also known as "feature extraction," looks at the features of a character to recognize it, instead of looking at the entire letter and matching it to a letter in its library. The features of each character are matched to the features of a known character. For example, a figure characterized by two slanted lines with a horizontal line across the center is an "A." A vertical line with a circle attached on the lower right-hand side is a "b." If the circle is on the other side, it is a "d." OmniFont works on most normal fonts because most fonts, as different as they are, share the same features.

The major benefits of OmniFont over matrix matching are speed and the ability to read most normal fonts. The increase in speed is the result of minimizing the samples table in relation to the volume of fonts supported.
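The feature-based approach described above can be sketched as follows. This is only an illustration under stated assumptions, not Kurzweil's or Caere's actual algorithm: the feature names, the `FEATURE_TABLE`, and the crude stem-and-bowl detector are hypothetical, and the detector handles only the "b" versus "d" case from the text.

```python
# Feature-extraction sketch. Feature names, table, and bitmaps are
# hypothetical; '#' marks an inked pixel.
FEATURE_TABLE = {
    frozenset({"left_slant", "right_slant", "center_bar"}): "A",
    frozenset({"vertical_stem", "bowl_lower_right"}): "b",
    frozenset({"vertical_stem", "bowl_lower_left"}): "d",
}


def extract_features(bitmap):
    """Detect a full-height vertical stem and which side the lower bowl is on."""
    height, width = len(bitmap), len(bitmap[0])
    # Ink count per column; the stem is assumed to be the most inked column.
    col_ink = [sum(row[x] == "#" for row in bitmap) for x in range(width)]
    stem_x = max(range(width), key=lambda x: col_ink[x])
    features = set()
    if col_ink[stem_x] == height:
        features.add("vertical_stem")
    # Compare the ink on each side of the stem in the lower part of the glyph.
    lower = bitmap[height // 2:]
    right = sum(row[x] == "#" for row in lower for x in range(stem_x + 1, width))
    left = sum(row[x] == "#" for row in lower for x in range(stem_x))
    if right > left:
        features.add("bowl_lower_right")
    elif left > right:
        features.add("bowl_lower_left")
    return features


def recognize(features):
    """Look up a feature set in the generic table; '?' means no match."""
    return FEATURE_TABLE.get(frozenset(features), "?")


SMALL_B = [
    "#...",
    "#...",
    "###.",
    "#..#",
    "###.",
]

LARGE_D = [
    "....#",
    "....#",
    "....#",
    ".####",
    "#...#",
    "#...#",
    ".####",
]

print(recognize(extract_features(SMALL_B)))
print(recognize(extract_features(LARGE_D)))
```

The two glyphs differ in size and shape, yet both are resolved through the same three-entry table, since only the extracted features are looked up rather than the pixels themselves.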
A matrix matching table can include multiple samples of each character and can be updated by the user training it. OmniFont uses only a table of generic features, which does not increase in size and makes the search process much quicker.

In 1976, DEST pioneered an OCR solution for the business and office market, and in 1980 it introduced a product called the Workless Station. The company claimed that the Workless Station garnered 65% of the flatbed scanner market. However, the product was specialized and not for the mass commercial market.

In 1988, Caere brought OCR to the mass commercial market with the OmniPage product -- an OmniFont OCR package aimed at the rapidly expanding flatbed scanner market. What had cost many thousands of dollars and ran only on expensive hardware was now offered to owners of personal computers with flatbed scanners.

OCR on Every Desktop

Scanners -- the electronic "partner" of OCR -- give "eyes" to the computer by providing a bridge between the analog world of everyday reality and the digital world of the computer. But before Caere revolutionized the scanner market with the introduction of OmniFont technology, flatbed scanners were seen as devices for capturing images, not text. Today, scanners are seen as both graphics and text solutions.

Until recently, quality images could only be captured and digitized with extremely expensive flatbed and sheetfed scanners. However, the same functionality and sophistication are now available in the smaller, more affordable hand-held scanner. As a result, hand-held scanners have evolved from a tech toy of the computer hobbyist into integral, productive desktop tools for business people as well as home users. As this evolution takes place, users are demanding capabilities beyond image capture, as they purchase hand-held scanners to create complex documents that incorporate both text and graphics.
Flatbed scanners are already able to perform optical character recognition (OCR) at a high level of speed and accuracy; the challenge lies in bringing this capability to the hand-held scanner.

In 1988, Logitech introduced the first ScanMan hand-held scanner and brought scanning to the individual desktop. The unit was intended for graphics scanning and limited to 200 dpi hardware resolution. In addition, it was difficult to scan straight with this early model, which contained only one set of rollers. Thus, OCR was not a recommended use for the scanner. What's more, initial OCR packages for hand-held scanners were expensive and, in many cases, too slow and inaccurate to truly enhance individual productivity.

ScanMan Plus for DOS, introduced by Logitech in late 1989, paved the way for OCR in the hand-held environment. With its 400 dpi hardware resolution, extra set of rollers, straightedge head design, and scanning speed indicator, ScanMan Plus enabled users to control their scans and achieve a level of resolution necessary for OCR.

The first version of CatchWord, a DOS-based OCR software package by Logitech, followed the introduction of ScanMan Plus. CatchWord marked the second stage in the evolution of Logitech hand-held scanners into highly functional, multipurpose input devices. CatchWord used OmniFont technology, giving hand-held scanners the flexibility to capture a wide range of fonts and styles. CatchWord could also capture full pages of text by stitching together two scans of the page.

In 1992, Logitech introduced CatchWord Pro for Windows. CatchWord Pro for Windows represented a new generation of OCR software that kept the special requirements of hand-held scanners in mind.

Logitech is now directly partnering with the acknowledged market leader in OCR software for the personal computer and the founder of OmniFont technology -- Los Gatos, Calif.-based Caere Corporation.
Caere is tailoring its popular OmniPage Direct product for use with Logitech's Windows-based ScanMan hand-held scanners. The application -- OmniPage Direct for Logitech -- is positioned as an affordable basic utility designed to meet the needs of ScanMan users who wish to capture a few pages of text to incorporate into other documents.

Acknowledgment

Much of the history of OCR was obtained with permission from the book The History of OCR by Herbert Schantz. Herbert Schantz is the Director and Vice President of the Recognition Technology Users Group and a member of the OCR/Scanner/Fax Association. He has written many papers and given numerous presentations on the theory, economics, and application of OCR dating back to 1969. Logitech would like to thank him for writing such an exciting and informative book on a subject about which little has been written.