Using ocrmypdf to batch OCR PDFs and convert to PDF/A on the processing server

Step-by-step guide

  1. Log on to the processing server, or clone the ingest-processing-workflow repo from GitHub.
  2. cd to, or open a command-line shell in, the ingest-processing-workflow directory.
    1. On the server: cd /opt/lib/ingest-processing-workflow

    2. On your local machine: the directory where you cloned the git repo
  3. Activate the ocr virtualenv
    1. pyenv activate ocr

  4. Run the OCR script
    1. python3 ocr.py [package_id] -p [relative path from derivatives]
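
For readers who want a feel for what a batch OCR pass to PDF/A involves, here is a minimal sketch. It is not the contents of ocr.py, which this page does not show. The function names (build_ocr_command, plan_batch, run_batch) and the input/output directory layout are hypothetical, chosen only for illustration. The sketch assumes the ocrmypdf command-line tool is on the PATH and uses its --output-type pdfa option to request PDF/A output.

```python
import subprocess
from pathlib import Path
from typing import List


def build_ocr_command(src: Path, dest: Path) -> List[str]:
    """Return an ocrmypdf command line that writes a PDF/A copy of src to dest."""
    # Hypothetical helper: only the ocrmypdf executable name and the
    # --output-type pdfa option come from the real tool.
    return ["ocrmypdf", "--output-type", "pdfa", str(src), str(dest)]


def plan_batch(input_dir: Path, output_dir: Path) -> List[List[str]]:
    """Build one command per PDF found under input_dir (searched recursively)."""
    commands = []
    for src in sorted(input_dir.rglob("*.pdf")):
        # Mirror the input folder layout under output_dir (an assumed layout).
        dest = output_dir / src.relative_to(input_dir)
        commands.append(build_ocr_command(src, dest))
    return commands


def run_batch(input_dir: Path, output_dir: Path) -> None:
    """Run each planned command, creating output folders as needed."""
    for cmd in plan_batch(input_dir, output_dir):
        # The last element of each command is the destination PDF path.
        Path(cmd[-1]).parent.mkdir(parents=True, exist_ok=True)
        # check=True raises CalledProcessError if ocrmypdf exits with an error.
        subprocess.run(cmd, check=True)
```

Calling plan_batch first and printing the result is a cheap way to review which files would be processed before run_batch starts any OCR work.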