A very basic shell script that I'm using to gauge the quality of already OCRed PDFs. Takes the filename of a PDF as a parameter, and prints the total word count, the count of words not known by aspell, and the percentage of unknown words. A good PDF (exported straight from the original source) likely has an unknown rate of around 5%, while a poorly OCRed scan of questionable quality may be 20% or higher.
Requires pdftotext and aspell.
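The script itself isn't included here, so what follows is only a minimal sketch of the idea under stated assumptions, not the original. It assumes pdftotext (from poppler-utils) and aspell with a dictionary for the document's language are installed. The function names `count_words`, `count_unknown`, `unknown_pct` and `ocr_check` are hypothetical. It extracts the text to stdout, counts whitespace-separated words with `wc -w`, counts the words printed by `aspell list`, and computes the percentage with awk, because plain `sh` arithmetic is integer-only.

```shell
#!/bin/sh
# Sketch: gauge OCR quality of a PDF by the share of words aspell does not know.
# Assumes pdftotext (poppler-utils) and aspell with a suitable dictionary are
# installed; all function names here are hypothetical.

# Count whitespace-separated words on stdin (tr strips BSD wc padding).
count_words() {
  wc -w | tr -d ' '
}

# Count words on stdin that aspell does not recognize; "aspell list" prints
# one unknown word per line, using the default dictionary.
count_unknown() {
  aspell list | wc -l | tr -d ' '
}

# Print 100 * unknown / total with one decimal place.
unknown_pct() {
  awk -v u="$1" -v t="$2" 'BEGIN { printf "%.1f", 100 * u / t }'
}

ocr_check() {
  # "-" as the output file makes pdftotext write the text to stdout.
  text=$(pdftotext "$1" -) || return 1
  total=$(printf '%s\n' "$text" | count_words)
  if [ "$total" -eq 0 ]; then
    echo "no text layer found in $1" >&2
    return 1
  fi
  unknown=$(printf '%s\n' "$text" | count_unknown)
  echo "words:   $total"
  echo "unknown: $unknown"
  echo "percent: $(unknown_pct "$unknown" "$total")%"
}

# Run only when given exactly one PDF filename.
if [ "$#" -eq 1 ]; then
  ocr_check "$1"
fi
```

Saved as, say, `ocr-check.sh` (a hypothetical name), it would be run as `sh ocr-check.sh scan.pdf`. Note that aspell and `wc -w` tokenize text differently (numbers and punctuation, for instance), so the percentage is a rough signal for comparing PDFs rather than an exact error rate.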
PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools that help users train better models and apply them into practice.
It has online support, and it seems to be a promising project!