Projects

Court Transcripts
Scanarkist digitized a 30-year archive of 500,000 pages of bound court transcripts.  Material quality ranged from pale carbon-copy blue text to black-and-white text on paper of varying manufacture.  Images were captured in 24-bit color with conversion to enhanced grayscale to ensure text legibility.  Each bound volume included up to a dozen transcripts, which were extracted separately.  Title, timestamp and certification pages were identified and integrated with court-provided metadata including docket number, name, and date.  Images were output to FADGI standards, including color masters, grayscale derivatives in JPEG and TIFF format, and a PDF/A-1b document for each of the 4,300 transcripts.  Transcripts are hosted in the Nainuwa digital library system.

Research Library
In a multi-year project, Scanarkist implemented an enterprise solution for digitizing, to preservation standards, a research library of 100,000 volumes dating from the mid-19th century onward.  Scanarkist conceptualized and implemented systems and custom software for digitization, metadata enrichment, data transfer and web hosting for discovery and presentation.  All elements of a book were scanned, including front and back covers, spines, slipcases, jackets, foldouts, and inserts.  Additional physical characteristics such as handwritten annotations extending deep into the book’s gutter or out to the page edge were carefully handled to ensure complete integrity of the final image.  Up to 20,000 pages were scanned daily, post-processed to cropped derivative images, OCR’d, and uploaded to the web application on a continuous 24×7 basis.  The web finding aid features full free-text and metadata search and bandwidth-independent responsive display of book pages on desktop and mobile devices.

Newsletters
An archive of newsletters spanning a century in print and microfilm form had been digitized to various formats by several vendors over a period of years.  Some of the print material had yet to be digitized.  The already digitized material consisted of images in JPEG and TIFF format and PDF documents, only some of which were searchable.  In some cases a single PDF was hundreds of pages long and contained many newsletters.  Scanarkist was hired to digitize the remaining newsletters and organize everything into a cohesive archive with each newsletter separated into its own PDF file.  We created a chronological index identifying the file location and format of every newsletter.  This required indexing the PDFs by page number for the start and end of each newsletter, including in some cases reversing the order where materials had been digitized back to front.  Each newsletter page was extracted from the PDF as a single JPEG or compiled from the original JPEGs.  In some cases two pages had originally been digitized as a single image, requiring that they be split into two separate files.  The print material was digitized, with master files output in TIFF format and derivatives in JPEG.  All JPEGs were processed through OCR, and a master set of searchable PDFs was produced, one PDF per newsletter.
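
The page-range index described above can be sketched roughly as follows.  This is a minimal illustration under stated assumptions, not the software used on the project: the IndexEntry fields, the file name, the dates, and the page_order and split_spread helpers are all hypothetical, and the split assumes the two pages meet at the horizontal midpoint of the scan.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class IndexEntry:
    """One newsletter inside a multi-newsletter PDF (all fields are hypothetical)."""
    issue_date: str              # ISO date, so a plain string sort is chronological
    source_pdf: str              # file in which the newsletter was found
    start_page: int              # 1-based, inclusive
    end_page: int                # 1-based, inclusive
    back_to_front: bool = False  # True if the vendor scanned last page first


def page_order(entry: IndexEntry) -> list[int]:
    """Return the source PDF page numbers in reading order."""
    pages = list(range(entry.start_page, entry.end_page + 1))
    return pages[::-1] if entry.back_to_front else pages


def split_spread(width: int, height: int) -> tuple[tuple[int, int, int, int],
                                                   tuple[int, int, int, int]]:
    """Crop boxes (left, top, right, bottom) for the two halves of a two-page scan.

    Assumes the gutter sits at the horizontal midpoint of the image.
    """
    mid = width // 2
    return (0, 0, mid, height), (mid, 0, width, height)


# Illustrative entries only; not real project data.
index = [
    IndexEntry("1957-03-01", "batch_b.pdf", 13, 16, back_to_front=True),
    IndexEntry("1957-02-01", "batch_b.pdf", 9, 12),
]
chronological = sorted(index, key=lambda e: e.issue_date)
```

Keeping the reversal as a flag on the index entry, rather than rewriting the source PDF, leaves the vendor files untouched and makes the correction easy to audit.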

Publisher Backlist
Scanarkist digitized modern books from a publisher backlist to exacting specifications for a high-volume print-on-demand operation.  A single file for each book was output in PDF format, combining 600-dpi black/white text with 300-dpi half-tone images.  Adobe Photoshop was used to enhance the printability of half-tone images.  PDF files were output from Adobe Acrobat integrated with a custom software application to produce documents compliant with the PDF/X-1:2001 standard.

Artist Archive
Scanarkist provided a complete service including metadata capture, digitization, and web hosting for a historical archive consisting of 100,000 documents and photographs.  Materials were received in file boxes indexed by category and organized into file folders with text descriptions.  Scanarkist indexed materials by description and organized them into document collections.  Each image was post-processed into tiled images allowing 11 zoom levels.  A custom web application was developed for discovery and presentation on the client’s intranet.
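
For readers unfamiliar with tiled images, the sketch below shows one common way an 11-level zoom pyramid can be laid out.  The source states only that there were 11 zoom levels; the halving of resolution at each level, the 256-pixel tile size, and the pyramid_levels function name are assumptions made for illustration, not a description of the application that was built.

```python
import math


def pyramid_levels(width, height, levels=11, tile=256):
    """Describe each zoom level of a power-of-two tile pyramid.

    Returns a list of (level, width_px, height_px, tile_cols, tile_rows),
    ordered from the smallest overview (level 0) to full resolution.
    Assumes each level halves the resolution of the one above it.
    """
    out = []
    for level in range(levels):
        scale = 2 ** (levels - 1 - level)      # 1 at full resolution
        w = max(1, math.ceil(width / scale))
        h = max(1, math.ceil(height / scale))
        out.append((level, w, h, math.ceil(w / tile), math.ceil(h / tile)))
    return out


# Example: a hypothetical 4096 x 2048 pixel master image.
for row in pyramid_levels(4096, 2048):
    print(row)
```

A viewer built on such a pyramid only requests the tiles visible at the current zoom level, so a large master image can be browsed without downloading it whole.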

Photo Archive
Scanarkist digitized a photo archive consisting of 30,000 photos (black/white and color), glass negatives, and transparencies dating from the 19th century to the present.  The materials had been organized to presentation standards in containers customized to their size.  Scanarkist indexed all materials by catalog number, photographic technique, and other attributes handwritten on the envelopes and backs of photos.  Duplicate photos were identified and not scanned.  Epson flatbed scanners were used for digitization at resolutions from 400 dpi to 4800 dpi depending on original size and material type.  The final output of digital photos was published to the client’s intranet.