


What to do when a PDF document is converted to garbled characters and symbols?

Some imported PDF documents may return garbled text when you view them in the parsing rule editor or process them with existing parsing rules. When you see unreadable gibberish symbols as shown in the screenshot below, you are likely dealing with a corrupted PDF file. More specifically, your PDF document is probably missing important information about font character mapping, which is what allows the glyphs drawn on the page to be translated back into text characters. One reason for this can be that the document was produced incorrectly. Another common reason is that the character mapping information was deliberately obfuscated as a protection mechanism to prevent readers from copying and pasting the text data. Lastly, it is also possible that Optical Character Recognition (OCR) with low accuracy was applied to your document before it was uploaded to Docparser.
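If you want a rough check of whether text extracted from a PDF shows this kind of damage before you build parsing rules on it, the sketch below may help. It is not part of Docparser: the function names `garbled_ratio` and `looks_garbled` and the 0.3 threshold are assumptions made for illustration. It only counts characters that are typical extraction debris (private-use code points, unassigned code points, control characters, and the Unicode replacement character), so it will miss cases where a broken character mapping produces ordinary-looking but wrong letters.

```python
import unicodedata


def garbled_ratio(text: str) -> float:
    """Return the share of non-whitespace characters that look like extraction debris."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    suspicious = 0
    for c in chars:
        category = unicodedata.category(c)
        # Co = private use, Cn = unassigned, Cc = control character.
        # U+FFFD is the replacement character inserted when decoding fails.
        if category in {"Co", "Cn", "Cc"} or c == "\ufffd":
            suspicious += 1
    return suspicious / len(chars)


def looks_garbled(text: str, threshold: float = 0.3) -> bool:
    """Flag text whose debris ratio exceeds the (hypothetical) threshold."""
    return garbled_ratio(text) > threshold


if __name__ == "__main__":
    sample = "\ue000\ue001\ue002 ok"  # three private-use characters plus two letters
    print(garbled_ratio(sample), looks_garbled(sample))
```

A document that is flagged this way probably needs to be re-exported from its source application or run through a fresh OCR pass rather than parsed as-is.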
