See the release notes for details on the latest changes. OCRmyPDF uses Tesseract for OCR and relies on Tesseract's language packs. Linux users can often find distribution packages that provide language packs: ...
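Because OCR quality depends on the right Tesseract language packs being installed, it can help to check for them before starting a job. The following is a minimal sketch, not part of OCRmyPDF itself: it assumes the `tesseract` binary is on `PATH` and that `tesseract --list-langs` prints a header line followed by one language code per line. The helper names (`parse_tesseract_langs`, `missing_langs`, `installed_tesseract_langs`) are hypothetical and chosen only for illustration.

```python
import shutil
import subprocess


def parse_tesseract_langs(listing):
    """Parse the text printed by `tesseract --list-langs` into a set of codes."""
    langs = set()
    for line in listing.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ..." header.
        if not line or line.lower().startswith("list of available languages"):
            continue
        langs.add(line)
    return langs


def missing_langs(wanted, installed):
    """Return the codes from an 'eng+deu'-style string that are not installed."""
    return [code for code in wanted.split("+") if code and code not in installed]


def installed_tesseract_langs():
    """Ask the local tesseract binary which language packs it can see.

    Returns an empty set when tesseract is not found on PATH.
    """
    if shutil.which("tesseract") is None:
        return set()
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Assumption: depending on the Tesseract version, the listing may appear
    # on stdout or stderr, so both streams are parsed.
    return parse_tesseract_langs(result.stdout + "\n" + result.stderr)
```

A typical use would be to call `missing_langs("eng+deu", installed_tesseract_langs())` and install whatever it reports before running `ocrmypdf -l eng+deu input.pdf output.pdf`. Keeping the parsing separate from the subprocess call makes the logic easy to check without Tesseract installed.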
A local tool for extracting text from scanned PDF documents using OCR (Optical Character Recognition). It runs entirely on your machine; no data is sent to external servers.
pdf-ocr-extractor/
├── app.py # ...