Digital Catalog Job Scheduling
Batch scheduling workflows vary widely in both number and kind. One valuable category is the scheduling and processing of digital media into online catalogs.
The nature of digital media varies widely, from audio files to digital photos and movies to PDF-formatted contracts and legal documents. Frequently this media is processed and loaded into a digital catalog or archive for digital subscription services. As just one example, many theme parks integrate video and photographs of a guest’s visit into a digital scrapbook for their review and purchase on the park’s website. Yet the activities performed during such digital media cataloging and archiving are often common and repeated, and these activities need to occur on a recurring basis.
These repeatable activities can be expressed as workflows, and often as a set of dependent workflows. Workflows that involve digital media almost always contain a common set of activities. Following the best-practice workflow steps listed below can save time and frustration:
Register: A registration or arrival event that signals receipt of digital media files available for processing. This could be detecting the arrival of a file or the receipt of a message via a queue or incoming REST call. The registration event may also provide metadata that directs how the media is to be cataloged.
Retrieve: The Register event may provide a URI, file location, or file name that supplies the information required to initiate retrieval of the media file. Often SFTP is used, but other file transfer protocols may be involved, especially when very large files need to be transferred.
Decrypt / Decompress: Decryption or decompression is performed upon the media file in preparation for its processing.
Validate Against Standards: Validation of the digital media and its metadata is performed. Media must be tested to ensure they can be parsed and conform to agreed-upon standards or published standards. Such standards define elements such as required fields, allowed values, range checks, and media size checks.
Duplicate Check: Perform duplicate checking to ensure that the metadata or digital media has not been received previously. If the media or its metadata fail this check, a notification is often sent to the sender of the file.
Acknowledge Receipt: An acknowledgment is sent indicating the file has been received successfully.
Validate for Use: An enterprise-specific validation of the metadata and media is executed using the enterprise’s specific business rules and validations, above and beyond those called for in the general ‘Validate Against Standards’ step above. While the general validation assures the file conforms to an external standard, the enterprise may have additional constraints (e.g., required fields, allowed values, list restrictions) that must be applied against the metadata or media.
Preprocess: Other processes may also be executed against the metadata and media to determine its completeness or fitness for use. In digital content management systems, processes may execute that validate media format and content (e.g., TIFF tag validation, valid PDF form content, ensuring images are not too light or too dark, rotating images to correct their orientation, transcoding of media into different media formats).
Review / Moderate: In instances where either the metadata or the media is found to not meet enterprise standards, the metadata or media is submitted for some form of review (either automated or manual). For some media types, this review operation may be referred to as ‘moderating’, as in moderating for inappropriate content.
Depending upon the specific application, files that fail this step may be submitted to a repair or editing process where the media and metadata can be corrected or otherwise edited to allow it to pass this review.
Reject: At some point in the process files may need to be rejected back to the sender. Such rejection generally involves the creation of a rejection notice with a set of rejection reasons provided to the sender so that they can resolve the issues and resend the file.
Publish: Content from the metadata and media often has to be combined in some manner to feed other systems, such as web sites or backend billing and posting systems. This may involve database load processes, web services, and the execution of enterprise-specific application code.
Archive: In many instances, the metadata and media are indexed and stored in long-lived data retention and storage systems, in some cases up to many years.
Acknowledge Process Success: An acknowledgment is sent indicating the media has been processed successfully.
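The steps above can be sketched as an ordered pipeline in which any step may send the file to the Reject path. The following is a minimal illustration, not a definitive implementation: it covers only the Validate Against Standards and Duplicate Check steps, and the names MediaItem, Rejected, STEPS, and process, the required "title" field, and the use of a SHA-256 digest to detect duplicates are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class MediaItem:
    """A received media file plus the metadata from its Register event."""
    name: str
    payload: bytes
    metadata: dict
    rejection_reasons: list = field(default_factory=list)


class Rejected(Exception):
    """Raised by a step to route the item to the Reject path."""


def validate_against_standards(item, seen_digests):
    # Hypothetical standard: a non-empty payload and a "title" field.
    if not item.payload:
        raise Rejected("empty media payload")
    if "title" not in item.metadata:
        raise Rejected("missing required field: title")


def duplicate_check(item, seen_digests):
    # Assumption: byte-identical content counts as a duplicate.
    digest = hashlib.sha256(item.payload).hexdigest()
    if digest in seen_digests:
        raise Rejected("duplicate media")
    seen_digests.add(digest)


# Steps run in this order; a real workflow would list all of them.
STEPS = [validate_against_standards, duplicate_check]


def process(item, seen_digests):
    """Run each step in order and return 'published' or 'rejected'."""
    for step in STEPS:
        try:
            step(item, seen_digests)
        except Rejected as exc:
            # Collect the reason so it can be sent back to the sender.
            item.rejection_reasons.append(str(exc))
            return "rejected"
    return "published"
```

Keeping each step as a separate function with a shared signature means enterprise-specific steps such as Validate for Use or Preprocess could be appended to STEPS without changing the driver loop.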
While the workflow described above is consistent and repeatable, implementing it is far from trivial. Beyond the execution of the workflow itself, one must address:
- Monitoring potentially thousands of these processes concurrently
- Restart and recovery concerns
- Ad hoc actions, which are needed in many instances
- Scaling, load-balancing, clustering, and failover, which are key attributes of large-scale implementations of this workflow
- Service level agreement (SLA) monitoring and notification (e.g., via email, text, or integration with enterprise service management tools) of exceptional events and SLA violations
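As one small illustration of the SLA monitoring concern above, the sketch below flags jobs that are still running after an agreed time limit. It is a simplified example under stated assumptions: the function name find_sla_violations and the shape of the jobs mapping are hypothetical, and a real scheduler would also track jobs that finished late and would send notifications rather than just return a list.

```python
from datetime import datetime, timedelta


def find_sla_violations(jobs, now, sla):
    """Return the ids of unfinished jobs that have run longer than `sla`.

    `jobs` maps a job id to (started_at, finished_at); finished_at is
    None while the job is still running.
    """
    return sorted(
        job_id
        for job_id, (started_at, finished_at) in jobs.items()
        if finished_at is None and now - started_at > sla
    )


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)
    jobs = {
        "catalog-001": (datetime(2024, 1, 1, 11, 0), None),
        "catalog-002": (datetime(2024, 1, 1, 11, 50), None),
    }
    # Only catalog-001 has been running for more than 30 minutes.
    print(find_sla_violations(jobs, now, timedelta(minutes=30)))
```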
Batch scheduling the cataloging of digital media is one of the many forms of workflow occurring in enterprises today. One size does not fit all workflows. Selecting the appropriate platform for this mission-critical workflow is essential to addressing the wide range of digital content processing occurring in today’s enterprises.