Optical Character Recognition, or OCR, is an essential tool in transitioning to digital document management. OCR may seem like a recent invention, but the first form of OCR was actually invented in 1913. The Optophone, invented by scientist Edmund Edward Fournier d’Albe, was a device designed to help the blind read. It used photosensors to convert printed text into audible sound. Unfortunately, the first iteration of this device was incredibly slow, reading only a single word per minute. It wasn’t until about 1974 that OCR technology was developed that could read and convert multiple fonts at an efficient speed.
The techniques employed by OCR technology have remained almost unchanged over the years. While OCR has become faster and more efficient, the basic functionality has remained the same. A document is converted to an image, and the OCR program uses pre-built rules to try to match each character to an alphabetical glyph. It’s a simple matter of identifying and matching. With the advent of machine learning, OCR is undergoing its first major evolution since the 70s. Instead of relying on preprogrammed character sets, OCR can begin to identify new characters by training on large volumes of data. This means that OCR can learn to recognize any number of unique symbols and characters from nearly any language, which eliminates the need to painstakingly program in each individual character.
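To make the "identifying and matching" step concrete, here is a minimal Python sketch of rule-based glyph matching. It is an illustration only, not how any particular OCR product works: the tiny 5x3 bitmaps in `TEMPLATES` and the helper names `similarity`, `match_glyph`, and `read_line` are all hypothetical, and a real engine would first have to segment the page image into individual characters.

```python
# Hypothetical 5x3 bitmap templates ("1" = ink, "0" = blank).
# A real rule-based engine would store many more glyphs, per typeface.
TEMPLATES = {
    "I": ["111", "010", "010", "010", "111"],
    "L": ["100", "100", "100", "100", "111"],
    "O": ["111", "101", "101", "101", "111"],
    "T": ["111", "010", "010", "010", "010"],
}


def similarity(glyph, template):
    """Return the fraction of pixels that agree between two same-sized bitmaps."""
    pairs = [
        (g, t)
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    ]
    return sum(g == t for g, t in pairs) / len(pairs)


def match_glyph(glyph, templates=TEMPLATES):
    """Return (character, score) for the stored template closest to the glyph."""
    best = max(templates, key=lambda ch: similarity(glyph, templates[ch]))
    return best, similarity(glyph, templates[best])


def read_line(glyphs):
    """Match each pre-segmented glyph in turn and join the results."""
    return "".join(match_glyph(g)[0] for g in glyphs)


# A scanned "L" with one stray pixel of noise still matches the "L" template.
noisy_l = ["100", "100", "110", "100", "111"]
print(match_glyph(noisy_l)[0])
```

The limitation is visible in the sketch: any character missing from `TEMPLATES` can never be recognized, which is the gap that training on large volumes of data is meant to close.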
1913 – Scientist Edmund Edward Fournier d’Albe invents the Optophone reading device.
1960s – The US Postal Service uses OCR technology in the form of an automated address reader based on a device developed by David Shepard.
1974 – Kurzweil Computer Products develops the first OCR system that can read multiple fonts. Previously, devices depended on special typefaces designed specifically for machine reading.
1980s-90s – OCR technology becomes a staple in the airline and retail industries and is used to scan price tags, passports, and more.
2000s – OCR technology moves to the cloud and becomes available to the masses online through tools such as Adobe Acrobat.
These advances are significant because digitization for its own sake isn’t always a good investment. First, there has to be a value proposition behind converting records to a digital format. Having an unorganized mass of digital records isn’t much better than having a dusty archive of misfiled paper records. In either situation, the valuable information remains inaccessible, lost in a sea of pages without any indication of what is important. However, by incorporating hyper automation tools like machine learning, OCR can do more than just digitally translate text. When powered by deep learning, it can be used to find meaning and insights within your documents. OCR then becomes a vehicle for the intelligent application of your business data. The ability to distill your records into usable insights can give your business a powerful advantage.
OCR is in many ways the forefather of hyper automation and is considered one of the first expressions of digital intelligence. This technology proved that computers can indeed mimic skills many consider inherently human, such as reading. Over the years, these tools and methods have expanded and become more refined. The world of records management is seeing the genesis of many technologies that will completely change the way we deal with information. This is particularly true in the field of artificial intelligence. OCR continues to play a valuable role in bridging the gap between the analog and digital worlds. As OCR evolves to include deep learning tools, its relevance and impact may continue to shape business intelligence and content management for years to come.
DOMA has been implementing the latest OCR technologies for our customers for over 20 years. We are focused on delivering superior value to our customers by vetting and deploying only the most effective tools and processes. DOMA has followed the field closely and has unmatched expertise in integrating OCR into your workflow, or applying it to our document conversion services.
Today, machine-learning-powered tools like Amazon Textract allow us to capture complex information without losing context. Instead of simply translating each character, Textract can identify tables and other structured forms of information and digitize them in a way that retains their meaning.
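As a rough illustration of what "retaining meaning" looks like in practice, the Python sketch below rebuilds a table as rows and columns from Textract-style output. It is a sketch under stated assumptions, not DOMA's production workflow: `analyze_tables`, `child_ids`, and `tables_from_blocks` are hypothetical helper names, `SAMPLE_BLOCKS` is hand-written data rather than real service output, and the live call requires the `boto3` package plus AWS credentials.

```python
def analyze_tables(image_bytes):
    """Ask Amazon Textract for table structure (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the parsing below runs without AWS access

    client = boto3.client("textract")
    response = client.analyze_document(
        Document={"Bytes": image_bytes},
        FeatureTypes=["TABLES"],
    )
    return response["Blocks"]


def child_ids(block):
    """Collect the IDs of a block's CHILD relationships."""
    return [
        block_id
        for rel in block.get("Relationships", [])
        if rel["Type"] == "CHILD"
        for block_id in rel["Ids"]
    ]


def tables_from_blocks(blocks):
    """Rebuild each TABLE block as a list of rows, each row a list of cell strings."""
    by_id = {b["Id"]: b for b in blocks}
    tables = []
    for table in (b for b in blocks if b["BlockType"] == "TABLE"):
        cells = {}
        for cell_id in child_ids(table):
            cell = by_id[cell_id]
            if cell["BlockType"] != "CELL":
                continue
            words = [
                by_id[w]["Text"]
                for w in child_ids(cell)
                if by_id[w]["BlockType"] == "WORD"
            ]
            # RowIndex and ColumnIndex are 1-based positions within the table.
            cells[(cell["RowIndex"], cell["ColumnIndex"])] = " ".join(words)
        if not cells:
            tables.append([])
            continue
        n_rows = max(row for row, _ in cells)
        n_cols = max(col for _, col in cells)
        tables.append(
            [[cells.get((r, c), "") for c in range(1, n_cols + 1)]
             for r in range(1, n_rows + 1)]
        )
    return tables


def _cell(cell_id, row, col, word_id):
    return {"Id": cell_id, "BlockType": "CELL", "RowIndex": row, "ColumnIndex": col,
            "Relationships": [{"Type": "CHILD", "Ids": [word_id]}]}


# Hand-written sample in the shape of a Textract response, for illustration only.
SAMPLE_BLOCKS = [
    {"Id": "t1", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2", "c3", "c4"]}]},
    _cell("c1", 1, 1, "w1"), _cell("c2", 1, 2, "w2"),
    _cell("c3", 2, 1, "w3"), _cell("c4", 2, 2, "w4"),
    {"Id": "w1", "BlockType": "WORD", "Text": "Item"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Qty"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Paper"},
    {"Id": "w4", "BlockType": "WORD", "Text": "2"},
]

print(tables_from_blocks(SAMPLE_BLOCKS))
```

Because each cell keeps its row and column position, the value "2" stays attached to "Paper" and the "Qty" header instead of becoming one more word in a flat stream of text.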
As the field evolves, we are continuously updating our roster of tools to help you maintain a competitive edge. DOMA can help you build value with OCR through data-driven business insights, process automation, and more.
DOMA Technologies (DOMA) is a software development and digital transformation company whose mission is to change customer lives by lightening their workload through faster and more targeted access to their data. Since 2000, our team of 200+ experts has helped businesses navigate all aspects of the digital world. We are a dedicated strategic partner for the federal government and private sector clients at every stage of their unique digital transformation journey.