OpenBSD ports

The print/ocrmypdf port

ocrmypdf-16.6.0 – add an OCR text layer to scanned PDF files

Description

OCRmyPDF adds an OCR text layer to scanned PDF files, so that their text
can be searched or copied and pasted.

- Generates a searchable PDF/A file from a regular PDF
- Places OCR text accurately below the image to ease copy and paste
- Keeps the exact resolution of the original embedded images
- When possible, inserts OCR information as a "lossless" operation
  without disrupting any other content
- Optimizes PDF images, often producing files smaller than the input file
- If requested, deskews and/or cleans the image before performing OCR
- Validates input and output files
- Distributes work across all available CPU cores
- Uses the Tesseract OCR engine to recognize more than 100 languages
  (run "pkg_info -Q tesseract" to find the language packs you can install)
- Keeps your private data private
- Scales properly to handle files with thousands of pages
- Battle-tested on millions of PDFs
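
Once language packs are installed, it can be useful to confirm that
Tesseract actually sees them before starting a long OCR run. The sketch
below is not part of the port. It assumes that "tesseract --list-langs"
prints a header line followed by one language code per line, and the
names missing_langs and TESSERACT are hypothetical helpers introduced
here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print each requested language code that Tesseract
# does not list as installed. TESSERACT may be overridden for testing.
TESSERACT=${TESSERACT:-tesseract}

missing_langs() {
    # Some Tesseract versions write the list to stderr, so capture both.
    installed=$("$TESSERACT" --list-langs 2>&1)
    for lang in "$@"; do
        # -x: match the whole line, -F: treat the code as a fixed string
        printf '%s\n' "$installed" | grep -qxF -- "$lang" \
            || printf '%s\n' "$lang"
    done
}
```

For example, "missing_langs eng fra" prints nothing when both languages
are available, and prints the code of each one that is not.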

ocrmypdf                      # it's a scriptable command line program
   -l eng+fra                 # it supports multiple languages
   --rotate-pages             # it can fix pages that are misrotated
   --deskew                   # it can deskew crooked PDFs!
   --title "My PDF"           # it can change output metadata
   --jobs 4                   # it uses multiple cores by default
   --output-type pdfa         # it produces PDF/A by default
   input_scanned.pdf          # takes PDF input (or images)
   output_searchable.pdf      # produces validated PDF output
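
The annotated listing above is illustrative and will not run if pasted
into a shell as is, because each option sits on its own line without a
continuation. A minimal sketch of a runnable equivalent follows. It
builds the argument list one option at a time so each can keep its
comment. The function name ocr_scan and the OCRMYPDF override variable
are hypothetical, introduced only for this example.

```shell
#!/bin/sh
# Hypothetical wrapper around ocrmypdf using the options shown above.
# OCRMYPDF may be overridden to substitute another command for a dry run.
OCRMYPDF=${OCRMYPDF:-ocrmypdf}

ocr_scan() {
    src=$1
    dst=$2
    set --                              # start from an empty argument list
    set -- "$@" -l eng+fra              # recognize English and French
    set -- "$@" --rotate-pages          # fix pages that are misrotated
    set -- "$@" --deskew                # straighten crooked scans
    set -- "$@" --title "My PDF"        # set the output title metadata
    set -- "$@" --jobs 4                # limit the run to four workers
    set -- "$@" --output-type pdfa      # request PDF/A output
    "$OCRMYPDF" "$@" "$src" "$dst"
}
```

Usage: ocr_scan input_scanned.pdf output_searchable.pdf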

WWW: https://ocrmypdf.readthedocs.io/
Categories: lang/python print

Library dependencies

Build dependencies

Run dependencies