Using ocrmypdf to batch OCR PDFs and convert to PDF/A on the processing server
Step-by-step guide
- Log on to the processing server, or clone the ingest-processing-workflow repo from GitHub.
- cd to, or open a command line shell in, the ingest-processing-workflow directory.
- On the server:
cd /opt/lib/ingest-processing-workflow
- On your local machine: the directory where you cloned the git repo
- Activate the ocr virtualenv
pyenv activate ocr
- Run the OCR script
python3 ocr.py [package_id] -p [relative path from derivatives]
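For orientation, here is a minimal sketch of what a batch OCR run with ocrmypdf can look like. It is not the contents of ocr.py, which this page does not show. The function names build_ocr_commands and run_batch, and the directory arguments, are hypothetical. The sketch assumes the ocrmypdf command is on the PATH of the active virtualenv and uses its --output-type pdfa and --skip-text options.

```python
from pathlib import Path
import subprocess


def build_ocr_commands(src_dir, out_dir):
    """Return one ocrmypdf command (as an argument list) per PDF in src_dir.

    Hypothetical helper for illustration; not part of ocr.py.
    """
    src, out = Path(src_dir), Path(out_dir)
    commands = []
    # Note: "*.pdf" is case sensitive on Linux, so "SCAN.PDF" is not matched.
    for pdf in sorted(src.glob("*.pdf")):
        commands.append([
            "ocrmypdf",
            "--output-type", "pdfa",  # write the result as PDF/A
            "--skip-text",            # do not OCR pages that already contain text
            str(pdf),                 # input file
            str(out / pdf.name),      # output file, same name in out_dir
        ])
    return commands


def run_batch(src_dir, out_dir):
    """Run ocrmypdf on every PDF in src_dir and return the inputs that failed."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for cmd in build_ocr_commands(src_dir, out_dir):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            failed.append(cmd[-2])  # second-to-last argument is the input path
    return failed
```

Calling run_batch with an input and an output directory would process each PDF in turn and return the list of input files for which ocrmypdf exited with a non-zero status. Building the commands separately from running them makes it possible to print and check them before starting a long batch.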