PDF Image + Searchable Text Conversion: A PDF Image + Searchable Text file
(formerly known as PDF plus hidden text) contains a bitmapped image of the
original and a hidden layer of searchable text. The conversion process involves
scanning the hardcopy original, performing OCR (Optical Character Recognition)
to capture the text of the document, and distilling the two layers into a PDF
searchable image file. Although the text can be searched, hyperlinks and
bookmarks are not fully functional in this format. As with PDF image only, PDF
searchable image files are only as legible as the original. PDF searchable image
files also have the largest file size of the three types, which can be a
significant issue if the PDF document is bound for the Internet.
Pages are displayed as images, so visual accuracy is inherently high: the
reader sees the page exactly as it was scanned.
Text resulting from an OCR (Optical Character
Recognition) process may be “bonded” to the originating image to create a
PDF/Searchable Image file. When you search for words or phrases, they will be
highlighted in the image.
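As a rough illustration of how search highlighting can work, the hidden text layer can be thought of as a list of OCR words, each carrying the position it occupies on the page image. The sketch below is a simplified model, not the internal structure of a PDF file: the `Word` class, the `find_highlights` function, and the pixel coordinates are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR word plus its bounding box on the page image (hypothetical model)."""
    text: str
    x: int       # left edge, in image pixels
    y: int       # top edge, in image pixels
    width: int
    height: int


def find_highlights(words, phrase):
    """Return one list of (x, y, width, height) boxes per match of `phrase`."""
    targets = phrase.lower().split()
    if not targets:
        return []
    matches = []
    # Slide a window of len(targets) words across the hidden text layer.
    for start in range(len(words) - len(targets) + 1):
        window = words[start:start + len(targets)]
        # Compare case-insensitively, ignoring trailing punctuation.
        if [w.text.lower().strip(".,;:!?") for w in window] == targets:
            matches.append([(w.x, w.y, w.width, w.height) for w in window])
    return matches


# Made-up sample data standing in for the OCR output of one scanned page.
page_words = [
    Word("Invoice", 100, 50, 140, 30),
    Word("number", 250, 50, 130, 30),
    Word("12345.", 390, 50, 120, 30),
]
hits = find_highlights(page_words, "invoice number")
```

A viewer would then draw a highlight rectangle over the image for each box in `hits`. The search runs against the OCR text while the reader only ever sees the scanned page.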
This background text makes the file searchable, but its
accuracy depends on the quality of your originals and other factors. For
the background text, you have two options:
- PDF Image + Text (Raw or uncorrected OCR text)
- PDF Image + Text (Corrected or proof-read)
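One possible way to choose between the two options is to proofread a small sample by hand and compare it with the raw OCR output. The sketch below uses the similarity ratio from Python's standard `difflib` module as an approximate accuracy score. The function names, the sample strings, and the 0.98 threshold are assumptions made for illustration, not figures from any particular OCR service.

```python
from difflib import SequenceMatcher


def character_accuracy(ocr_text, reference_text):
    """Approximate accuracy: similarity ratio in [0, 1] between OCR text and a proofread reference."""
    if not ocr_text and not reference_text:
        return 1.0
    matcher = SequenceMatcher(None, ocr_text, reference_text, autojunk=False)
    return matcher.ratio()


def needs_correction(ocr_text, reference_text, threshold=0.98):
    """True if the sample falls below the (assumed) acceptable accuracy threshold."""
    return character_accuracy(ocr_text, reference_text) < threshold


# Made-up sample: the OCR engine confused the letter "l" with the digit "1".
reference = "Delivery challan"
raw_ocr = "De1ivery cha1lan"

accuracy = character_accuracy(raw_ocr, reference)
print(f"Approximate accuracy: {accuracy:.1%}")
print("Proofreading recommended:", needs_correction(raw_ocr, reference))
```

If the score on a representative sample stays above the chosen threshold, raw text may be sufficient. Otherwise the corrected option is likely the safer choice.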
For many applications, the raw conversion with
uncorrected text is accurate enough. For clients needing higher accuracy rates,
Suntec will correct and proofread the OCR output. This process is often vital
for documents containing italicized characters and small text, or for
poor-quality original documents.
PDF/Searchable Image files may be indexed for full-text
retrieval by any search engine capable of indexing PDF files. Typical applications include:
- business records
- academic journals
- advertising and promotional materials
- historical materials
- handwritten materials, including color or grayscale images
PDF/Searchable Image is used globally by governments and businesses
for electronic storage and retrieval of:
- Business records
- CD-ROM publishing
- Electronic publishing
- Manufacturing and design documentation
- Online content / intranet content
- Records retention / legacy data conversion
- Delivery challans, shipping notes, and invoices
PDF File Type Comparison

| | Image | Image + Searchable Text | PDF Normal (Formatted Text & Graphics) |
| --- | --- | --- | --- |
| Accuracy | Very high (page is retained as image) | Very high (page is retained as image) | High (in effect, re-authoring the document) |
| Text searchability | No | Yes | Yes |
| File size | Large (typically 40-50 KB at 300 dpi without grayscale or color images) | Large (typically 50-60 KB at 300 dpi without grayscale or color images) | Small (typically 4-6 KB per page for simple documents) |
| Typical application | Budget-friendly archiving | Full-text search for bitonal files | Tiny but rich files, great for the web |
| Cost | Low | Medium | High |
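To give a feel for what the file sizes in the comparison table above mean for a whole archive, the sketch below multiplies a page count by the typical size ranges. It assumes the KB figures for the two image-based types are per page, as is stated for PDF Normal. The dictionary keys and function name are hypothetical, and the 10,000-page archive is an invented example.

```python
# Typical size ranges in KB, copied from the comparison table.
# Assumption: the image-based figures are per page, like the PDF Normal figure.
SIZE_KB_PER_PAGE = {
    "image": (40, 50),
    "image_plus_text": (50, 60),
    "normal": (4, 6),
}


def estimate_archive_mb(pages, pdf_type):
    """Return a (low, high) size estimate in MB for an archive of `pages` pages."""
    low_kb, high_kb = SIZE_KB_PER_PAGE[pdf_type]
    return pages * low_kb / 1024, pages * high_kb / 1024


# Invented example: a 10,000-page bitonal archive scanned at 300 dpi.
for pdf_type in SIZE_KB_PER_PAGE:
    low_mb, high_mb = estimate_archive_mb(10_000, pdf_type)
    print(f"{pdf_type}: roughly {low_mb:.0f} to {high_mb:.0f} MB")
```

Under these assumptions, the searchable text layer adds a modest amount per page over image-only files, while both image-based types are roughly ten times larger than PDF Normal.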