
Digitization Workflows

On this page

  1. Digitization Workflows
    1. Introduction
    2. Reference/Patron Motivated Digitization
    3. Planned, In-House Digitization
    4. Copyright Analysis
    5. File Naming Conventions
      1. Documents, Photographs, Negatives, Books and Other Flat or Paper-Based Materials
      2. Audio and Video Materials
      3. Bulk File Renaming
    6. File Organization
    7. Quality Control

Introduction

Digitization processes vary with the format being digitized and the organizational process through which the materials are digitized. There are three avenues for digitization: digitization conducted by reference staff at patron request, planned in-house digitization, and planned external vendor digitization. Planned digitization activities include strategic digitization of at-risk materials; digitization of materials that are part of a broader strategy to increase access; digitization of materials loaned explicitly for that purpose; and digitization of materials outside the scope of reference staff work, such as audio or video materials or higher-resolution photograph digitization. All major digitization projects must be completed in coordination with the Digital Archivist to ensure proper preservation of files.

Digitization conducted by reference staff members is usually done for original works on paper and at lower resolutions. This resolution is often all that users need to conduct their research. Such responsive digitization takes time and, to minimize handling of the materials and repetitive work, should ideally not be repeated more than necessary. Following the steps outlined in this section allows reference scans to be placed in the digital repository and used more widely.

Reference/Patron Motivated Digitization

  • Reference/patron motivated digitization begins when a patron requests that a reference staff member digitize materials from a collection. All digitization work that can be completed by reference staff, such as digitization of 2-D documents, should adhere to the following sections:

    • File Naming Conventions
    • File Organization
    • Quality Control
      • For quality control checks, the correct file location for this work may vary based on the request. OneDrive, thumb drives, the I:\ drive, or other carriers work equally well to transmit materials to the Digital Archivist as long as the file naming conventions and file organization guidelines are followed.
      • Once digitization is complete, add the files to I:\DLC WV Collection!Public Services and Reference\Recent Scans for the Digital Archivist to move into storage.
  • All digitization work that cannot be completed by reference staff, such as that of audio and video materials or preservation-quality photograph or slide digitization, but for which equipment is available to support in-house digitization should follow these steps:

    • A reference staff member or the Photographs Manager must submit a Reformatting Request Form (accessible to WVRHC employees only but shared in Appendix 2) for the item, being sure to indicate that the item is a patron request.
      • The Digital Archivist or Photographs Manager will respond with an anticipated completion time for digitizing the materials. The Digital Archivist or another assigned employee will digitize the materials and relay them to the reference staff member.

Planned, In-House Digitization

Planned, in-house digitization is done at or near preservation quality (as limited by available equipment) under the supervision of the Digital Archivist. Digitization activities occur through two paths: small and large scale.

For small-scale digitization of select items within a collection, the party requesting digitization submits the Reformatting Request Form (accessible to WVRHC employees only but shared in Appendix 2). Digitization work must adhere to the File Naming Conventions, File Organization, and Quality Control considerations outlined in this section as well as the information in the Planned, In-House Digitization section.

Large-scale digitization activities, or activities that involve scanning complex materials (such as loans for digitization) or large portions of collections, will only begin once the Digital Archivist has completed a Digitization Project Planning document and coordinated with the Photographs Manager or individual conducting the digitization. This document will include information related to deadlines, filename adaptations, metadata, rights, promotional planning, broader project management considerations, and more. Taking an approach that emphasizes project management considerations minimizes miscommunication and increases the quality of the final product.

Both large-scale and small-scale digitization projects will adhere to the following:

Items queued for digitization must be given to or retrieved by the Digital Archivist or Photographs Manager. When removing items from a collection, fill out a Removal Sheet and place half of the sheet in the location the item(s) originally came from and the other half of the sheet with the item itself. Items currently being digitized are listed on the Digital Archives Work Tasks board (accessible to WVRHC employees only but shared in Appendix 2).

Digitization of 2-D materials smaller than 11x17 inches will be conducted in the WVRHC. Digitization of all other materials will be conducted in the Digitization Lab. Instructions for using equipment for digitization can be found on the Digitization Lab Confluence page (used internally by WVRHC and WVU Libraries). The Digitization Standards Document should be consulted to determine the settings to be used when digitizing media. All formats should be digitized at preservation quality, except when the Supported Formats and Equipment section notes that digitization can only be performed at a lower level.

If working in the Digitization Lab, materials and storage devices are NOT to be left in the lab between digitization sessions. Materials are to be returned to a secure workstation within the WVRHC when scanning is not in progress. If applicable to the project, description activities as outlined in the Description section should occur at this point during computer processing. Ensure that all file naming and organization done in the Lab is sufficient to allow description activities to be conducted elsewhere if necessary (see the Bulk File Renaming section). Conduct post-processing and Quality Control activities as applicable. Instructions can be found in steps 7-10 of the Export and Convert Photos section of the Digitization Lab documentation (only available internally to WVU Libraries employees; the steps document batch changes to image files using FastStone Image Viewer).

Notify the Digital Archivist when digitization activities are complete and digitized files are ready to be saved to preservation storage or transmitted to reference staff for user access. The digital files are uploaded by the Digital Archivist to the shared drive under the respective collection folder, where a checksum and file manifest will be generated for the materials. The Digital Archivist will create the directory structure found at Z:\Working Files\Resources_For_Born_Digital_Processing\TemplateAccessioningFileStructure for the collection and the file manifest with checksums. They will remove irrelevant folders and documents from the template where applicable. Files for the digitized materials will be stored in the appropriate folders:

If the collection is processed, the files are uploaded to Z:\A&M\ to a folder named with the collection’s four-digit A&M number. If the collection is in the backlog, the files are uploaded to Z:\Working Files\Backlog\ to a folder named with the collection’s accession number, formatted as Accession-YYYY-CollNum.
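The checksum and file manifest step could be sketched as follows. This is a minimal illustration only: the source does not say which tool or checksum algorithm the Digital Archivist uses, so SHA-256, the CSV layout, and the function names `sha256_of`, `build_manifest`, and `write_manifest` are all assumptions.

```python
import csv
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """List every file under root as (relative path, checksum) rows."""
    root = Path(root)
    rows = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rows.append((path.relative_to(root).as_posix(), sha256_of(path)))
    return rows


def write_manifest(rows, out_path):
    """Write the rows to a CSV file (assumed layout: file, sha256)."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file", "sha256"])
        writer.writerows(rows)
```

Re-running `build_manifest` later and comparing the rows against the saved manifest is one way to confirm that files have not changed in storage.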

Materials will be shelved and the Removal Sheets returned to the Digital Archivist once completed.

Planned vendor digitization actions are outlined in the External Vendor Digitization section.

Copyright Analysis

This section contains additional information on the approach to copyright, digitization, and digital access of materials. It is intended to augment the copyright section of the Digitization Project Planning document for collections undergoing digitization with the intention of submission to a digital repository.

To determine whether materials are in the public domain, use Cornell’s “Copyright Term and the Public Domain in the United States” page. For materials that will be digitized and made available under a fair use argument, complete the Cornell Fair Use Checklist and save the Checklist in the folder for the relevant collection on the Z: drive. Include the name of the individual completing the checklist and the date in the file name along with the string “FairUseChecklist.”

To determine which RightsStatements.org statement to use for upload into the digital repository, use the “Rights Review: An approach to applying Rights Statements from RightsStatements.org” document from the University of Minnesota.

File Naming Conventions

Documents, Photographs, Negatives, Books and Other Flat or Paper-Based Materials

Digitized materials should follow this general file naming format:

  • CollectionNumber_BoxNumber_FolderNumber_ItemNumber
  • Example usage: 1970_b01_f01_01.tif

The collection number refers to the collection the item is from. Similarly, the box and folder number refer to the box and folder from which the item originates. The item number corresponds to the number of items that have been scanned from a particular folder. For instance, if you scan one photo from a folder, it would be considered item number 01. Scanning a second photo from that folder would make that second photo number 02.

For multipage materials, insert an additional number after the item number to indicate the order in which the files should be viewed to understand the item. This can be trickier for difficult formats like scrapbooks. Consult the Digital Archivist if you are unsure about the file naming convention.

An example for multipage materials is as follows:

  • Page 1: 1970_b01_f01_01_001.tif
  • Page 2: 1970_b01_f01_01_002.tif
  • Page 3: 1970_b01_f01_01_003.tif

For a PDF of all of the tif files in the prior example:

  • Discrete item: 1970_b01_f01_01.pdf

This general file naming convention will apply to materials of all types within a folder. Where folder numbers are not present for minimally processed materials, use an abbreviated version of the folder name to substitute for the folder number. Every other aspect of the naming convention remains the same.

If the items being digitized are unfoldered and only in a box, use the following, with the item number following the sequential numbering scheme outlined previously:

  • CollectionNumber_BoxNumber_unfoldered_ItemNumber
  • Example usage: 1970_b01_unfoldered_01.tif
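As an illustration of the convention above, the following sketch assembles a file name from its parts. The function name `flat_file_name` and its parameters are hypothetical, and the zero-padding widths (two digits for box, folder, and item, three for page) are inferred from the examples rather than stated as a rule.

```python
def flat_file_name(collection, box, folder, item, page=None, ext="tif"):
    """Build a name such as 1970_b01_f01_01.tif.

    folder may be an int (numbered folder), a string (abbreviated
    folder name for minimally processed materials), or None
    (unfoldered items in a box).
    """
    if folder is None:
        folder_part = "unfoldered"
    elif isinstance(folder, int):
        folder_part = f"f{folder:02d}"
    else:
        folder_part = str(folder)

    parts = [str(collection), f"b{box:02d}", folder_part, f"{item:02d}"]
    if page is not None:
        # Multipage materials get an extra viewing-order number.
        parts.append(f"{page:03d}")
    return "_".join(parts) + "." + ext


print(flat_file_name(1970, 1, 1, 1))          # 1970_b01_f01_01.tif
print(flat_file_name(1970, 1, 1, 1, page=2))  # 1970_b01_f01_01_002.tif
print(flat_file_name(1970, 1, None, 1))       # 1970_b01_unfoldered_01.tif
```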

Audio and Video Materials

File names for audio and video materials should be formatted similarly to 2-D materials and include the identifier for the audio or video item using the following template:

  • CollectionNumber_BoxNumber_FolderNumber(if applicable)_Identifier(minus collection number)

Examples include:

  • 4050_10_vhs_023.mov
  • 4050_9_10_rtr_011.mp4
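The audio and video template can be sketched the same way. The function name `av_file_name` is hypothetical, and the assumption that a full item identifier starts with the collection number followed by an underscore (for example `4050_rtr_011`) is a guess used only to illustrate the "minus collection number" step.

```python
def av_file_name(collection, box, identifier, folder=None, ext="mov"):
    """Build a name such as 4050_10_vhs_023.mov."""
    # Assumed identifier shape: drop a leading "<collection>_" if present.
    prefix = f"{collection}_"
    if identifier.startswith(prefix):
        identifier = identifier[len(prefix):]

    parts = [str(collection), str(box)]
    if folder is not None:
        # The folder number is included only where applicable.
        parts.append(str(folder))
    parts.append(identifier)
    return "_".join(parts) + "." + ext


print(av_file_name(4050, 10, "vhs_023"))
print(av_file_name(4050, 9, "4050_rtr_011", folder=10, ext="mp4"))
```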

Bulk File Renaming

Renaming files by hand can be cumbersome. If you find yourself doing a lot of manual renaming, tell the Digital Archivist what changes the files need in order to meet the conventions outlined here, and they can work with you to identify an automated solution. For instance, the Digital Archivist uses the Bulk Rename Utility to perform complex renaming tasks in bulk, saving time and maximizing accuracy. This program is also installed in the Digitization Lab.
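For simple cases, a short script is one possible alternative to a dedicated tool. The sketch below is not the Bulk Rename Utility workflow; it is a hypothetical example that numbers the files in one folder in sorted-name order, and it assumes that order matches the intended item order. The function names `plan_renames` and `apply_renames` are illustrative.

```python
from pathlib import Path


def plan_renames(folder, prefix, ext=".tif"):
    """Return (old_path, new_path) pairs numbering files 01, 02, ..."""
    files = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ext
    )
    return [
        (p, p.with_name(f"{prefix}_{i:02d}{ext}"))
        for i, p in enumerate(files, start=1)
    ]


def apply_renames(plan):
    """Rename the files, refusing to overwrite an existing file."""
    for old, new in plan:
        if old == new:
            continue  # already correctly named
        if new.exists():
            raise FileExistsError(f"{new} already exists")
        old.rename(new)
```

Reviewing the output of `plan_renames(folder, "1970_b01_f01")` before calling `apply_renames` gives a dry run, which helps catch ordering mistakes before any file is touched.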

File Organization

The top-level folder for all digitization work should be the collection number, or A&M number. Below that level, the directory organization will depend on whether the digital files are for simple objects, compound objects, or books. Simple objects are a single item, with each item having one file. Compound objects are objects that, when digitized, consist of two or more files to be used together. They can be documents, books, a strip of negatives that is digitized with each negative as a separate file, photographs, postcards with a front and back, or more.

What follows is a sample file organization featuring simple and compound objects. Items with entries nested beneath them are folders; items ending in a file extension are files:

  • 4387
    • 4387_Box1_Album1 (describing a folder containing a compound object, an album, from collection 4387 box 1. The album is in folder 1 and is the first object in the folder being scanned.)
      • 4387_b01_f01_01_001.tif
      • 4387_b01_f01_01_002.tif
      • 4387_b01_f01_01_003.tif
      • 4387_b01_f01_01_004.tif
      • 4387_b01_f01_01_005.tif
      • 4387_b01_f01_01_006.tif
    • 4387_Box3_LooseMaterials (describing loose materials in a box)
      • 4387_b03_unfoldered_01.tif (example simple object)
      • 4387_b03_unfoldered_02.tif (example simple object)
      • 4387_b03_unfoldered_03_01.tif (example compound object)
      • 4387_b03_unfoldered_03_02.tif (example compound object)
    • 4387_Box7_Correspondence1of3 (describing materials in an unnumbered folder)
      • 4387_b07_Correspondence1of3_01_001.tif (example compound object)
      • 4387_b07_Correspondence1of3_01_002.tif (example compound object)
      • 4387_b07_Correspondence1of3_01_003.tif (example compound object)
      • 4387_b07_Correspondence1of3_02.tif (example simple object)

An equally valid file organization is to simply place all materials directly under the top level folder, like so:

  • 4387
    • 4387_b01_f01_01_001.tif (example compound object)
    • 4387_b01_f01_01_002.tif (example compound object)
    • 4387_b01_f01_01_003.tif (example compound object)
    • 4387_b01_f01_01_004.tif (example compound object)
    • 4387_b01_f01_01_005.tif (example compound object)
    • 4387_b01_f01_01_006.tif (example compound object)
    • 4387_b03_unfoldered_01.tif (example simple object)
    • 4387_b03_unfoldered_02.tif (example simple object)
    • 4387_b03_unfoldered_03_01.tif (example compound object)
    • 4387_b03_unfoldered_03_02.tif (example compound object)
    • 4387_b07_Correspondence1of3_01_001.tif (example compound object)
    • 4387_b07_Correspondence1of3_01_002.tif (example compound object)
    • 4387_b07_Correspondence1of3_01_003.tif (example compound object)
    • 4387_b07_Correspondence1of3_02.tif (example simple object)

The only difference between the two structures above is human readability. Sometimes additional foldering, as seen in the first example, can aid in tracking how complete a digitization project is.
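Because the file names carry the object structure, the flat layout can still be grouped into objects programmatically. The sketch below is a hypothetical illustration: the regular expression is inferred from the sample names and assumes the folder portion contains no underscores.

```python
import re
from collections import defaultdict

# collection _ bNN _ folder _ item [ _ page ] . extension
NAME = re.compile(r"^(\d+)_b(\d+)_([^_]+)_(\d+)(?:_(\d+))?\.\w+$")


def group_objects(file_names):
    """Map each object (collection_box_folder_item) to its sorted files."""
    objects = defaultdict(list)
    for name in file_names:
        match = NAME.match(name)
        if match is None:
            raise ValueError(f"{name} does not follow the naming convention")
        coll, box, folder, item, _page = match.groups()
        objects[f"{coll}_b{box}_{folder}_{item}"].append(name)
    return {key: sorted(names) for key, names in objects.items()}


def classify(names):
    """One file is a simple object; two or more form a compound object."""
    return "compound" if len(names) > 1 else "simple"
```

Counting the groups returned by `group_objects` is one way to track how many objects have been scanned when no subfolders are used.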

Quality Control

The Digitization Project Planning document, completed at the beginning of each major digitization project, incorporates a quality control step with an associated checklist to be completed both during and after digitization. Procedures for audio, video, and image file quality control are included.

Quality control for reference or small-scale digitization projects includes:

  • Files are saved in the correct file format and can be opened and viewed properly
  • Files are saved in the correct location
  • The number of files and file names are accurate
  • The files are not skewed or illegible
  • There are not any cropping issues or cropping issues have been flagged to the Digital Archivist

To verify the above for image files, a visual check is possible in File Explorer: select View (on the upper toolbar), then Extra Large Icons (in the layout section of the toolbar).
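Some of the checklist items can also be checked mechanically before the visual review. The following is a minimal sketch with a hypothetical function name, `basic_qc`; it covers only file count, file format (by extension), and empty files. It cannot detect skew, illegibility, or cropping problems, which still require looking at the images.

```python
from pathlib import Path


def basic_qc(folder, expected_count, allowed_suffixes=(".tif",)):
    """Return a list of problems found in a folder of scans."""
    problems = []
    files = sorted(p for p in Path(folder).iterdir() if p.is_file())

    if len(files) != expected_count:
        problems.append(f"expected {expected_count} files, found {len(files)}")

    for path in files:
        if path.suffix.lower() not in allowed_suffixes:
            problems.append(f"{path.name}: unexpected file format")
        elif path.stat().st_size == 0:
            problems.append(f"{path.name}: file is empty")
    return problems
```

An empty list means none of these basic checks failed; any entries returned name the files to flag.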

For video and audio files, it is necessary to open the files and verify that they play back correctly. Due to time constraints, this is only possible for a sample of items. Aim for approximately 10% of the total digitized files.
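One way to pick the sample is to draw it at random so that it is not biased toward the first files digitized. This is a sketch under assumptions: the function name `qc_sample` is hypothetical, and rounding up with a minimum of one file is a choice made here, not a stated rule.

```python
import math
import random


def qc_sample(file_names, fraction=0.10, seed=None):
    """Pick roughly `fraction` of the files at random for playback checks."""
    names = sorted(file_names)
    if not names:
        return []
    # Round up so that small batches still get at least one file checked.
    count = max(1, math.ceil(len(names) * fraction))
    return random.Random(seed).sample(names, count)
```

Passing a fixed `seed` makes the selection repeatable, which can be useful if the same sample needs to be re-checked later.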

Future area of work: utilize quality control tools similar to those described at https://paradisec-archive.github.io/PARADISEC_workflows/10_quality_control.html.

Flag all errors to the Digital Archivist, being sure to include the name(s) of the affected files so they can be identified for re-digitization or troubleshooting.