Batch Job Orchestration: Batch Job Scheduling + File Processing

Batch Job Scheduling

Batch job scheduling is a key and often overlooked facility of modern enterprise systems. Although it is considered ‘low-tech’ in this age of streaming video, mobile, big data, and high-speed messaging, much of an enterprise’s day-to-day processing still takes place via scheduled batch jobs.

Batch job scheduling is frequently addressed on a fragmented, siloed, application-by-application basis. Each application has an isolated view of its jobs: the activities and resources needed for that application’s processing. Job schedules and dependencies are often buried in manual procedures, myriad task schedulers, and batch and process scripts.

From Batch Job Scheduling to Batch Job Orchestration

As system complexity continues to increase, the need for centralized control and orchestration of an enterprise’s batch jobs grows ever more pressing. Such centralized platforms are often referred to as ‘orchestration’ platforms.

“An orchestration platform is dedicated to the effective maintenance and execution of business process logic. Modern-day orchestration environments are expected to support sophisticated and complex service composition logic that can result in long-running runtime activities.” (Source: SOA Patterns definition of orchestration.)

The Benefits of Batch Job Orchestration

Those who have moved from basic job and task scheduling to batch job orchestration have seen significant benefits across enterprise activities, such as:

  • Improved throughput
  • More effective resource utilization
  • Improved consistency in backup and restoration
  • Faster recovery from disaster and quicker failover
  • More timely satisfaction of service requests
  • Increased visibility of incident management and event response
  • Better audits of data movement

Batch job orchestration sits within the domain of IT process automation. See the research report “IT Process Automation” by Michael Biddick for an examination of these areas and for supporting research on IT process automation.

In addition, proponents cite research showing a litany of opportunities for savings:

  • Savings due to potentially using fewer or less costly personnel
  • Savings realized by reducing or eliminating manual, unnecessary or repeating processes
  • Savings from no longer missing SLAs and from reduced customer dissatisfaction
  • Savings on meeting regulatory and other compliance requirements
  • Savings through purchasing fewer or smaller quantities of software and licenses
  • Savings on annual software maintenance costs
  • Savings on homegrown software development, testing, and maintenance
  • Savings through use of fewer hardware, virtual, database, or OS platforms

Batch Job Scheduling + File Processing = Batch Job Orchestration

Batch job scheduling involves initiating and controlling jobs. Because jobs often process files, a batch job orchestration platform should include facilities that address fundamental file processing activities.

Enterprises constantly grapple with disparate industry-standard and proprietary file formats, and with varied channels for receiving and delivering files. Challenges surface in managing the receipt, processing, conversion, and delivery of this diversity. Providing a consistent means of addressing this diversity is a key tenet of file orchestration.

File processing involves many activities. A non-exhaustive list of these actions includes:

Split         Extract          Replicate
Convert       Reformat         Match/Merge
Concatenate   Aggregate        Validate
Compare       Parse            Summarize
Archive       Sort             Load
Generate      Ingest           Inject
Copy          Transform        Delete
Rename        Encrypt/Decrypt  Version
Schedule      Detect           Transfer
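
As an illustration of two of the actions listed above, Split and Concatenate, here is a minimal Python sketch. The function names `split_file` and `concatenate_files` and the part-naming scheme are hypothetical and do not come from any particular orchestration product. The sketch reads the whole file into memory, which is only reasonable for small files.

```python
from pathlib import Path
from typing import List


def split_file(source: Path, lines_per_part: int, out_dir: Path) -> List[Path]:
    """Split a file into numbered parts of at most `lines_per_part` lines."""
    if lines_per_part < 1:
        raise ValueError("lines_per_part must be at least 1")
    # Work on bytes so that line endings are preserved exactly.
    lines = source.read_bytes().splitlines(keepends=True)
    parts: List[Path] = []
    for start in range(0, len(lines), lines_per_part):
        number = start // lines_per_part
        # Hypothetical naming scheme: batch.txt -> batch.part0000.txt, ...
        part = out_dir / f"{source.stem}.part{number:04d}{source.suffix}"
        part.write_bytes(b"".join(lines[start:start + lines_per_part]))
        parts.append(part)
    return parts


def concatenate_files(parts: List[Path], target: Path) -> None:
    """Concatenate the given parts, in order, into a single target file."""
    target.write_bytes(b"".join(part.read_bytes() for part in parts))
```

Splitting a file and then concatenating the parts in order should reproduce the original bytes, which makes the pair easy to validate.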

In a manner similar to a web application server, a batch job orchestration platform wraps each of the above file actions with:

  • retry and error handling capabilities,
  • a repository that maintains each of the above actions as a pattern that can be reused in multiple circumstances,
  • load balancing and distributed processing,
  • centralized monitoring, and
  • service level tracking and alerting.
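
To make the first item in this list concrete, the following Python sketch wraps an arbitrary file action with retry and error handling. The name `run_with_retry` and its parameters are assumptions made for illustration; a real platform would add logging, backoff policies, and alerting.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_retry(action: Callable[[], T], attempts: int = 3,
                   delay_seconds: float = 0.0) -> T:
    """Run `action`, retrying on failure up to `attempts` times in total."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as error:  # a real platform would catch narrower errors
            last_error = error
            if attempt < attempts:
                time.sleep(delay_seconds)  # wait before the next attempt
    # Every attempt failed: surface one error, keeping the last cause attached.
    raise RuntimeError(f"action failed after {attempts} attempts") from last_error
```

Because the wrapper accepts any zero-argument callable, the same retry policy can be applied to a transfer, a conversion, or any other file action without changing the action itself.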

A batch job orchestration platform also addresses the ‘no one size fits all’ problem. An approach that works well for small text files may not suit very large (gigabytes in size) binary files containing images or digital media. Any effective batch job orchestration platform must provide a robust toolset of capabilities to address the diversity of processing it may encounter.

Key attributes of a batch job orchestration platform include:

  • Load balancing of work
  • Clustering across multiple physical and virtual processors
  • Fine-grained partitioning and allocation of work across a cluster – allowing work to be processed on machines most appropriate for the job and file processing required
  • Parallel execution of processes
  • The facility to delegate tasks to other resources
  • Built-in and user-defined facilities for file transfer, file processing, and file validation
  • File-by-file process monitoring
  • Automated retry and recovery of file and job processing exceptions
  • Centralized storage and management of workflows
  • Secure processing of files and fine-grained access controls to the workflows and processing engines that govern job and file processing
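
Two of these attributes, parallel execution and file-by-file process monitoring, can be sketched in a few lines of Python using the standard library’s thread pool. The function `process_files` and its status strings are hypothetical; this is a single-machine illustration, not a clustered implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def process_files(names: Iterable[str], handler: Callable[[str], None],
                  workers: int = 4) -> Dict[str, str]:
    """Run `handler` on each file name in parallel and report a status per file."""

    def run_one(name: str) -> str:
        # Capture failures per file so one bad file does not stop the batch.
        try:
            handler(name)
            return "ok"
        except Exception as error:
            return f"failed: {error}"

    name_list = list(names)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the inputs.
        statuses = list(pool.map(run_one, name_list))
    return dict(zip(name_list, statuses))
```

The returned dictionary gives a monitoring view at the level of individual files, which is the granularity at which retries and alerts would typically be triggered.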

Taken together, these facilities make up the business value of a batch job orchestration platform, value that is not otherwise achieved with existing application servers or homegrown solutions.

An Example

Consider a common batch job orchestration – that of routing a payload of information (such as a payments file or a report) to a destination based on routing information maintained in an enterprise database. Such orchestrations are common in numerous systems, such as the delivery of disclosure forms for financial transactions and the delivery of statements from a healthcare system.

[Figure: Flux batch job orchestration workflow example]
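
The routing orchestration described above can also be sketched in code. The Python example below uses an in-memory SQLite database as a stand-in for the enterprise database. The `routes` table, its columns, the example destinations, and the `deliver` callback are all assumptions made for illustration and are not taken from any product.

```python
import sqlite3
from typing import Callable


def build_routing_table(conn: sqlite3.Connection) -> None:
    """Create and populate a hypothetical routing table."""
    conn.execute(
        "CREATE TABLE routes ("
        "payload_type TEXT PRIMARY KEY, destination TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO routes VALUES (?, ?)",
        [
            ("payments", "sftp://bank.example/inbound"),
            ("statement", "sftp://portal.example/statements"),
        ],
    )


def route_payload(conn: sqlite3.Connection, payload_type: str, payload: bytes,
                  deliver: Callable[[str, bytes], None]) -> str:
    """Look up the destination for `payload_type` and hand the payload to `deliver`."""
    row = conn.execute(
        "SELECT destination FROM routes WHERE payload_type = ?",
        (payload_type,),
    ).fetchone()
    if row is None:
        raise LookupError(f"no route configured for {payload_type!r}")
    destination = row[0]
    deliver(destination, payload)  # the actual transfer is supplied by the caller
    return destination
```

Keeping the lookup separate from the delivery step means the same workflow can route a payments file over one channel and a statement over another, with only the routing data changing.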

In Summary

Batch job orchestration supports reliable and repeatable business processes involving batch jobs and files. A batch job orchestration platform standardizes and provides a repository for a catalog of common processes that are then designed into executable workflows supporting robust orchestrations. A batch job orchestration platform that incorporates file processing has significant value for a wide variety of enterprise needs.