Automatically name scanned documents


With OCR, you can make your image-based file fully text-searchable or extract data from a zone for indexing.

With most data capture solutions, creating a full-text index is as simple as selecting "searchable PDF" as the output file format. This uses OCR technology to create a PDF file with two layers: an image layer, and a text layer that can be used for full-text searching. Your application may label this option as "make searchable", "apply OCR", "text-under-image" or "searchable PDF".

Setting a scan condition*1 (Paper Size, Image Processing, Paper Supply, Sub Area, Imprinter, Control Sheet, Rotate, Duplex, Automatic crop, Deskew, Length control).

With zonal OCR, specific document areas are identified for OCR capture. Additionally, drag-and-drop OCR allows an operator to highlight document text, which is automatically OCR'd and dropped into index fields. Learn more about automated indexing at ImageRamp.
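Zonal OCR can be pictured as filtering the words an OCR engine returns by their position on the page. The sketch below is a hypothetical illustration, not ImageRamp's API: it assumes the engine has already produced words with pixel bounding boxes (the `Word` and `Zone` types are made up for this example) and keeps only the words that fit entirely inside one rectangular zone, joined in reading order.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Word:
    """A recognized word and its bounding box in page pixels (hypothetical OCR output)."""
    text: str
    left: int
    top: int
    width: int
    height: int


@dataclass(frozen=True)
class Zone:
    """A rectangular region of interest, in the same pixel coordinates as the words."""
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, word: Word) -> bool:
        # A word counts as inside the zone only if its whole box fits within it.
        return (word.left >= self.left
                and word.top >= self.top
                and word.left + word.width <= self.right
                and word.top + word.height <= self.bottom)


def extract_zone_text(words: List[Word], zone: Zone) -> str:
    """Join the words inside the zone, sorted top-to-bottom, then left-to-right."""
    inside = [w for w in words if zone.contains(w)]
    inside.sort(key=lambda w: (w.top, w.left))
    return " ".join(w.text for w in inside)


if __name__ == "__main__":
    page = [Word("Ticket", 10, 10, 60, 20), Word("TKT-001", 300, 12, 80, 20)]
    print(extract_zone_text(page, Zone(left=280, top=0, right=400, bottom=100)))  # prints: TKT-001
```

Requiring the whole box to fit is a deliberately strict choice; a real tool might instead accept words whose boxes mostly overlap the zone, which tolerates slight misalignment between scans.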


Optical Character Recognition and Indexing

Besides scanned images, PDF print streams are another method used to produce the source data for invoice runs or other AP/AR functions, and they can then be mined for data and document splits. Existing text-based office documents, such as spreadsheets and word-processing documents, can also be mined using these techniques.

Regular expression scripts (regex) have found their way into this arena, providing a powerful tool to help identify keywords or the exact string of text to capture. A script can look for words with specific characters, lengths, character types, or preceding keywords.

Regex can also play a role in index field validation. If an inventory item should contain three alphabetic characters followed by five digits, advanced indexing solutions can use regex to recognize this pattern and reject any document whose items do not meet this rule. Such a document can be tagged for manual inspection before further index processing is done.
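A minimal sketch of the keyword lookup and the inventory-item rule using Python's standard `re` module. It assumes each document's index fields arrive as a dictionary; the field name `item` and the helpers `find_after_keyword` and `validate_document` are hypothetical. `re.fullmatch` with `[A-Za-z]{3}[0-9]{5}` accepts exactly three letters followed by five digits, and anything else is tagged for manual inspection instead of being indexed.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

# Rule from the text: three alphabetic characters followed by five digits.
ITEM_PATTERN = re.compile(r"[A-Za-z]{3}[0-9]{5}")


@dataclass
class IndexResult:
    fields: Dict[str, str]
    needs_manual_review: bool
    reason: str = ""


def find_after_keyword(text: str, keyword: str) -> Optional[str]:
    """Return the first whitespace-delimited token that follows `keyword`, if any."""
    match = re.search(re.escape(keyword) + r"\s*(\S+)", text)
    return match.group(1) if match else None


def validate_document(fields: Dict[str, str]) -> IndexResult:
    """Accept the document only if its (hypothetical) 'item' field matches the rule."""
    item = fields.get("item", "")
    if ITEM_PATTERN.fullmatch(item):
        return IndexResult(fields, needs_manual_review=False)
    # Tag for manual inspection rather than silently indexing a bad value.
    return IndexResult(fields, needs_manual_review=True,
                       reason=f"item {item!r} is not three letters followed by five digits")
```

For example, `find_after_keyword("Item: ABC12345 Qty 4", "Item:")` locates the value by its preceding keyword, and passing the result to `validate_document` accepts it, while a value such as `AB123456` would be flagged for review.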


How to extract Zonal OCR text to name and split files?

Extracting content from existing scanned documents can save significant time when done right. How can we obtain the highest degree of automation out of our scanned documents?

Delivery Tickets Use Case

In this scenario, a company scans several delivery tickets at the same time. The tickets contain a unique text identifier, including a date and three part numbers, enclosed in square brackets. Since these are newly scanned, OCR processing is required to add intelligence to the scanned image files. Each delivery note has a specific string of text contained within square brackets ("[" and "]").

The goal is to automate the naming of the resulting delivery tickets. Scanning each document individually is costly, so splitting the batch into separate documents is a desired output.

Some of the issues commonly encountered in this process are:

Inconsistency in the location of the text.

OCR tends to misread numbers as letters (1 as I, 5 as S, 8 as B and 0 as O).

In this article we will walk through the steps for achieving the highest degree of automation with ImageRamp.

Step 1 - OCR Fine Tuning - One of the first things to do is to set the OCR engine to recognize only specific characters. All text identified in the region of interest will then be limited to these characters. You can enter any specific characters or use the built-in character types found in the OCR Settings panel.

Step 2 - Regular Expressions - We can also take advantage of the uniqueness of the text by using a regular expression. This offers a way to pinpoint the exact text patterns we want to extract. In this script we look for the opening and closing square brackets and extract everything in between.

Step 3 - Define our Zone - Now select the icon to define a rectangular zone from which the OCR engine will extract text. All text will use the character definitions set earlier, and the regular expression will extract just the specific text we want.

Step 4 - File Naming and Paths - We can finish the document type configuration by defining how the file is to be named, using barcode, system or text extraction keywords, and by defining where the files will come from and where they will go.

Step 5 - Processing the Splits - Once this is set up, we can load individual files or set up folder watching to process files. Here we've loaded a 17-page scan and processed the splits in accordance with our newly defined delivery notes document type.

Step 6 - The Processed Output - We can now see the results of our processing, with all numeric values extracted from the OCR zones.

Files that contain text can be mined using various data mining techniques. If your documents are scanned pages, you can use full-page (text) OCR tools to turn them into files that can take advantage of this.
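The logic behind Steps 1, 2, 4 and 5 can be approximated in plain Python. This is a hypothetical sketch, not ImageRamp's implementation. It assumes OCR has already produced the zone text for each page, and that the bracketed identifier contains only digits and separators (an illustrative whitelist, as is the `.pdf` naming scheme). It repairs the letter/digit confusions listed above inside the brackets, extracts the identifier with a regular expression, starts a new document at every page that carries an identifier, and names each document after it. The sample identifiers in the tests are made up.

```python
import re
from typing import List, Optional, Tuple

# Step 1 (sketch): the OCR letter/digit confusions named in the text.
DIGIT_FIXES = str.maketrans({"I": "1", "S": "5", "B": "8", "O": "0"})

# Step 2 (sketch): capture everything between an opening and a closing square bracket.
BRACKETED = re.compile(r"\[([^\[\]]+)\]")

# Hypothetical whitelist: the identifier is assumed to hold only digits, spaces, '/' and '-'.
ALLOWED = re.compile(r"[0-9][0-9 /-]*")


def extract_identifier(zone_text: str) -> Optional[str]:
    """Return the cleaned bracketed identifier from one page's zone text, or None."""
    match = BRACKETED.search(zone_text)
    if match is None:
        return None
    # Only the bracketed text is translated, so ordinary words on the page are untouched.
    candidate = match.group(1).strip().translate(DIGIT_FIXES)
    return candidate if ALLOWED.fullmatch(candidate) else None


def file_name_for(identifier: str) -> str:
    """Step 4 (sketch): turn the identifier into a file-system-safe name."""
    return re.sub(r"[^0-9A-Za-z-]+", "_", identifier).strip("_") + ".pdf"


def split_batch(zone_texts: List[str]) -> List[Tuple[str, List[int]]]:
    """Step 5 (sketch): start a new document at every page that carries an identifier.

    Pages without a valid identifier are appended to the current document; pages
    before the first identified page go to 'unidentified.pdf' for manual review.
    """
    documents: List[Tuple[str, List[int]]] = []
    for page_number, text in enumerate(zone_texts, start=1):
        identifier = extract_identifier(text)
        if identifier is not None:
            documents.append((file_name_for(identifier), [page_number]))
        elif documents:
            documents[-1][1].append(page_number)
        else:
            documents.append(("unidentified.pdf", [page_number]))
    return documents
```

Applying the confusion table only inside the brackets matters: run over free text, it would turn words such as "BOX" into "80X", which is one reason to restrict the recognized characters per zone as Step 1 describes. A page whose bracketed text fails the whitelist is treated here as a continuation page; a production setup might flag it for review instead.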