
Tools to help process digital files in \\Romeo\processing


Log on with:

ssh railsdev

Creating image derivatives with convertImages.py

convertImages.py creates compressed images (JPG, PNG) and PDFs from master images.

  1. Log on to the processing server, or clone the ingest-processing-workflow repo from GitHub.
  2. cd into (or open a command-line shell in) the ingest-processing-workflow directory:
    1. On the server: cd /opt/lib/ingest-processing-workflow
    2. Locally: the Git repo directory you cloned on your machine
  3. Run: python3 convertImages.py <package ID> -i <input format> -o <output format>
    • Examples:
      • python3 convertImages.py apap301_h4fMLPL48CuxFPLpYxTmkL -i tif -o jpg
      • python3 convertImages.py ua950.009_qUQbs7GYhzmB3uL3yjH5uX -i tif -o pdf
      • python3 convertImages.py ua809_JxkK2VWVFu7F8VWaTe72BG -i pdf -o pdf
  4. (Optional) A -p flag with a subpath limits the input to that path, relative to the masters directory:
    • Example:
      • python3 convertImages.py ua802.011_xMHVAto2AuzLfd2NtP9STY -i tif -o pdf -p TIFFs/edited
      • This will only convert files in: [package]\masters\TIFFs\edited
  5. Files will be created in the package's \derivatives subfolder:
    1. The directory structure of \masters is duplicated.
    2. For PDF outputs, all input images in the same folder are joined into a single PDF, in filesystem order.
    3. (Server only for now) With PDF inputs and PDF outputs, the PDFs in each folder are combined into a single PDF, in filesystem order.
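The output layout described in steps 4 and 5 can be sketched in Python as below. This is not convertImages.py's actual code: it assumes a package folder containing masters and derivatives subfolders, matches input files by the given extension (case-sensitively on Linux), and uses sorted file names as a stand-in for filesystem order. The function names plan_derivatives and group_for_pdf are hypothetical. The sketch only plans output paths and PDF page groups; it does not convert any images.

```python
from collections import defaultdict
from pathlib import Path


def plan_derivatives(package_dir, in_ext, out_ext, subpath=None):
    """Map each master file to the derivative path it would get (hypothetical sketch).

    Assumes masters live in <package>/masters and derivatives are written to
    <package>/derivatives with the same relative folder structure.
    """
    package = Path(package_dir)
    masters = package / "masters"
    # Like the -p flag: optionally limit the input to a subpath of masters.
    root = masters / subpath if subpath else masters
    plan = {}
    # rglob is case-sensitive on Linux, so "tif" will not match "TIF".
    for src in sorted(root.rglob(f"*.{in_ext}")):
        if src.is_file():
            rel = src.relative_to(masters)  # keeps e.g. TIFFs/edited/...
            plan[src] = package / "derivatives" / rel.with_suffix(f".{out_ext}")
    return plan


def group_for_pdf(plan):
    """Group input files by folder; each folder would become one PDF.

    Pages are ordered by sorted file name, standing in for filesystem order.
    """
    groups = defaultdict(list)
    for src in sorted(plan):
        groups[src.parent].append(src)
    return dict(groups)
```

For example, plan_derivatives(package, "tif", "pdf", subpath="TIFFs/edited") would map masters\TIFFs\edited\a.tif to derivatives\TIFFs\edited\a.pdf, mirroring the -p example above.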

Dependencies:

Arranging Digitized and Born-Digital Materials in ArchivesSpace

  • Use asInventory (asUpload.exe) to enter initial description in ArchivesSpace
  • Use asInventory (asDownload.exe) to export the same description from ArchivesSpace with the new identifiers
    • Be sure to use the whole spreadsheet, even if you are not adding digital objects for all items; otherwise the order will be altered. Just leave the DAO field blank for those items.
  • Export the changes you made in ArchivesSpace to ArcLight

  • Place a copy of the exported spreadsheet in the package's \metadata directory

    \\Romeo\SPE\processing\<collectionID>\<packageID>\metadata


  • Use listFiles.py to make a .txt file of all derivatives

    python3 listFiles.py apap015_CijY985mDUy6hdLSPPYqRR

  • Use the derivatives.txt file in the package root to copy, paste, and arrange the derivatives' relative paths into the DAO column of the exported asInventory spreadsheet



  • If new lines are added, use asInventory to upload the additions, and re-download a new spreadsheet. This will create ASpace IDs for the new records.

  • Run buildHyraxUpload.py with the package ID as an argument to create the Hyrax upload .tsv file

    sudo python3 buildHyraxUpload.py ua950.012_Xf5xzeim7n4yE6tjKKHqLM

  • Add a Resource Type and a License to every object in the Hyrax upload .tsv file, plus a Rights Statement where the License is "Unknown"
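The listFiles.py step above produces derivatives.txt in the package root. A minimal standard-library sketch of that idea follows. The function name write_derivatives_list is hypothetical, and the sketch assumes paths are listed relative to the package's derivatives folder, with forward slashes, sorted by name; the real script's output format may differ.

```python
from pathlib import Path


def write_derivatives_list(package_dir):
    """Write <package>/derivatives.txt with one derivative path per line (hypothetical sketch).

    Paths are relative to <package>/derivatives and use forward slashes.
    """
    package = Path(package_dir)
    derivatives = package / "derivatives"
    paths = sorted(
        p.relative_to(derivatives).as_posix()
        for p in derivatives.rglob("*")
        if p.is_file()  # skip the folders themselves
    )
    out = package / "derivatives.txt"
    out.write_text(("\n".join(paths) + "\n") if paths else "", encoding="utf-8")
    return paths
```

Listing relative paths rather than absolute ones keeps the entries short enough to paste straight into the DAO column.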

Resource Types:

      • Audio
      • Bound Volume
      • Dataset
      • Document
      • Image
      • Map
      • Mixed Materials (Avoid)
      • Pamphlet
      • Periodical
      • Slides
      • Video
      • Other (Avoid)

Licenses:

      • BY-NC-ND: https://creativecommons.org/licenses/by-nc-nd/4.0/
      • Public Domain: http://creativecommons.org/publicdomain/mark/1.0/
      • Unknown
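A quick check of the upload .tsv against the Resource Type and License values listed above, and against the rule that a Rights Statement is needed when the License is "Unknown", can catch typos before the import. This is a hypothetical sketch: the column headers Resource Type, License, and Rights Statement, and the assumption that licenses are entered as the URLs above (or the literal Unknown), are guesses, not the documented buildHyraxUpload.py format.

```python
import csv
import io

RESOURCE_TYPES = {
    "Audio", "Bound Volume", "Dataset", "Document", "Image", "Map",
    "Mixed Materials", "Pamphlet", "Periodical", "Slides", "Video", "Other",
}
DISCOURAGED = {"Mixed Materials", "Other"}  # marked "(Avoid)" above
LICENSES = {
    "https://creativecommons.org/licenses/by-nc-nd/4.0/",
    "http://creativecommons.org/publicdomain/mark/1.0/",
    "Unknown",
}


def check_rows(tsv_text):
    """Return (row_number, message) problems found in the upload TSV text."""
    problems = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for n, row in enumerate(reader, start=2):  # row 1 is the header
        rtype = (row.get("Resource Type") or "").strip()
        lic = (row.get("License") or "").strip()
        rights = (row.get("Rights Statement") or "").strip()
        if rtype not in RESOURCE_TYPES:
            problems.append((n, f"unknown resource type {rtype!r}"))
        elif rtype in DISCOURAGED:
            problems.append((n, f"avoid resource type {rtype!r}"))
        if lic not in LICENSES:
            problems.append((n, f"unknown license {lic!r}"))
        elif lic == "Unknown" and not rights:
            problems.append((n, "license is Unknown but no rights statement"))
    return problems
```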

Rights Statements (if License is "Unknown"):

  • Place a copy of all derivatives to be uploaded in \\Lincoln\Library\ESPYderivatives\files
    • (Optional) Use a collection ID subfolder if convenient: \\Lincoln\Library\ESPYderivatives\files\apap101

  • Move the Hyrax Upload .tsv file to the Hyrax import directory: \\Lincoln\Library\ESPYderivatives\import

  • Upload the files to Hyrax using the Batch Upload to Hyrax documentation (starting with Step 4)
    Note: All files will be public by default

  • When the import is finished, copy the completed .tsv file from \\Lincoln\Library\ESPYderivatives\complete back to the package's \metadata directory

  • Run updateASpace.py with the package ID as an argument to add the correct URIs back into the ASpace export

    python3 updateASpace.py ua680_FbBxaYn8Jm9tBxuXsQ6R3L

  • Don't forget to unpublish and republish the collection to make sure it is exported!

  • Use asInventory to re-import the ASpace spreadsheet with the correct URIs back into ArchivesSpace

  • Run packageAIP.py with the package ID as an argument to combine the processing package and the SIP into an AIP
    • Must be run from the processing server, not a local machine
    • If running without flags, use the packageAIP function from /etc/profile.d/processingFunctions.sh
      • This logs to \\Romeo\SPE\processing\log\<collection ID>
    • Use a -u flag to use the master files from the processing package instead of the files in the SIP
      • If you use -u, you must also delete the SIP manually, after examining it, with safeRemoveSIP.py
    • Use a -n flag for no derivatives; this packages only the master files

    sudo python3 packageAIP.py ua950.012_Xf5xzeim7n4yE6tjKKHqLM

    packageAIP ua950.012_Xf5xzeim7n4yE6tjKKHqLM


    sudo python3 packageAIP.py -u ua435_LUUaFPvezhmdcwwnVX3drV

    sudo python3 packageAIP.py -n ua950.009_qUQbs7GYhzmB3uL3yjH5uX

    python3 safeRemoveSIP.py ua435_LUUaFPvezhmdcwwnVX3drV
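Several paths on this page are built from a package ID. Judging from the example IDs, a package ID is a collection ID followed by an underscore and a random suffix (e.g. ua950.012_Xf5xzeim7n4yE6tjKKHqLM belongs to collection ua950.012). The sketch below derives the package's metadata folder and the packageAIP log folder under that assumption; collection_id and package_paths are hypothetical helpers, not taken from the ingest-processing-workflow repo.

```python
from pathlib import PureWindowsPath

# UNC root taken from the paths on this page.
PROCESSING_ROOT = PureWindowsPath(r"\\Romeo\SPE\processing")


def collection_id(package_id):
    """Return the text before the last '_' (assumed to be the collection ID)."""
    collection, sep, suffix = package_id.rpartition("_")
    if not sep or not collection or not suffix:
        raise ValueError(f"not a package ID: {package_id!r}")
    return collection


def package_paths(package_id):
    """Build the metadata and log folders for a package (hypothetical helper)."""
    coll = collection_id(package_id)
    return {
        "metadata": PROCESSING_ROOT / coll / package_id / "metadata",
        "log_dir": PROCESSING_ROOT / "log" / coll,
    }
```

PureWindowsPath is used so the UNC paths format the same way whether the helper runs on the processing server or on a local machine.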

Replacing /masters with Masters in SIP

The masters directory in /processing is a duplicate of the SIP. If you edit or delete the masters