Workload Automation and File Processing
Workload automation (WLA) takes many forms, ranging from scheduling recurring jobs such as database tasks, through DevOps tasks involving software builds and deployments, to provisioning virtual machines and containers for software rollouts. Another significant area of workload automation is file processing.
The job scheduling of file-related activities can vary from very simple to highly complex. Sometimes the scheduled moving of a file from a remote location to a local file system is all that is required. But as the complexity of processing increases, the number of actions that need to be performed against files quickly expands. Consider the following list of file actions as a starting point:
aggregate, compare, concatenate, convert, copy, create, decrypt, delete, detect, encrypt, extract, generate, inject, load, match, merge, move, publish, replicate, sort, split, transcode, transfer, transform, translate
Coordinating and integrating these individual actions into useful functionality is a key goal of automating file processing. Some examples of such functionality include:
- Receipt and Acknowledgment: Register the receipt of a file and send an acknowledgment of receipt.
- Publishing: Files are accumulated over time or until a threshold is reached, at which time the accumulated files are archived or otherwise processed.
- Forwarder: Files are forwarded, with their contents unchanged, to another location for subsequent processing.
- Ingester: Incoming files are ‘ingested’ into a database, information warehouse, or archive, perhaps after some transcoding, repair of reject records, or other kinds of activity.
- Workflow Provisioning: File workflow provisioning, sometimes called onboarding, is a mechanism to collect information regarding a customer and process that information to set up a subscription delivered in the form of files.
File Receipt and Acknowledgment
A common file-related function is to register the arrival (i.e., receipt) of a file in a system, for example in a database, and then send an acknowledgment of receipt. This sounds easy at first, but further analysis exposes additional complexity. For example, what started out as simply registering receipt of a file and sending an acknowledgment expands into:
- Incoming files are located in different directories on multiple servers. Some of these locations are simple local file shares, while others are FTP and SFTP servers.
- Incoming files may be encrypted, requiring decryption that varies by customer or file exchange partner.
- Duplicate checking must be performed on incoming files to ensure that a file sent more than once is not processed two or more times. This duplicate checking differs based on the incoming file type and the sender.
- Not only must incoming files be registered and acknowledged, but the lack of arrival of a file by a specified time or event must generate a notification to operations staff and the late sender.
- The routing and format of the acknowledgment is queried from a customer contacts database.
- The acknowledgment may need to be sent as a file, a web service call, or as an encrypted email.
- Outgoing files may need to be encrypted.
- Late arriving files need to be acknowledged back to the customer as being late, and the content of the acknowledgment will differ on these late arriving files. In addition, late arriving files need to have an alert email sent to operations staff so that they can follow up with the sender.
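Two of the requirements above, duplicate checking and late-arrival detection, can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the `ReceiptRegistry` class, its in-memory dictionary standing in for the receipt database, and the choice of a SHA-256 content hash per sender as the duplicate rule are all assumptions made for the example. As noted above, real duplicate rules differ by file type and sender.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ReceiptRegistry:
    """Hypothetical in-memory stand-in for the receipt database."""
    # Maps (sender, content digest) to the time the file first arrived.
    seen: dict = field(default_factory=dict)

    def register(self, sender: str, content: bytes,
                 arrived: datetime, deadline: datetime) -> dict:
        """Record a file arrival and return the data needed for the acknowledgment."""
        digest = hashlib.sha256(content).hexdigest()
        key = (sender, digest)
        if key in self.seen:
            # Assumed rule: same sender plus same content means a duplicate.
            return {"sender": sender, "status": "duplicate", "digest": digest}
        self.seen[key] = arrived
        # A late file is still registered, but its acknowledgment differs.
        status = "late" if arrived > deadline else "on_time"
        return {"sender": sender, "status": status, "digest": digest}


registry = ReceiptRegistry()
cutoff = datetime(2024, 1, 1, 17, 0)
first = registry.register("acme", b"payload", datetime(2024, 1, 1, 16, 0), cutoff)
again = registry.register("acme", b"payload", datetime(2024, 1, 1, 16, 5), cutoff)
late = registry.register("acme", b"other", datetime(2024, 1, 1, 18, 0), cutoff)
```

The returned status is what a later step would use to choose the acknowledgment content and to decide whether operations staff should be alerted.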
Another file-related function involves accumulating files over the course of some interval (e.g., some number of hours or minutes, or until some number of files is present, or until some total size of files is achieved). These files are validated, sometimes concatenated, processed against other systems (possibly by transforming the file or matching the file against a database) and then published into reports, forms, and alternate file formats – for submission to other systems or third parties.
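The accumulation rule described here (publish after some interval, some number of files, or some total size) can be sketched as a small Python class. The `Accumulator` name, its fields, and the use of plain numeric timestamps in seconds are assumptions for illustration; a workload automation tool would normally supply its own triggers for this.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Accumulator:
    """Hypothetical batch accumulator with count, size, and age thresholds."""
    max_files: int
    max_bytes: int
    max_age_seconds: float
    files: list = field(default_factory=list)  # (name, size_in_bytes) tuples
    started_at: Optional[float] = None         # time the first file of the batch arrived

    def add(self, name: str, size: int, now: float) -> None:
        if self.started_at is None:
            self.started_at = now
        self.files.append((name, size))

    def should_publish(self, now: float) -> bool:
        """True when any one of the three thresholds has been reached."""
        if not self.files:
            return False
        total_bytes = sum(size for _, size in self.files)
        return (len(self.files) >= self.max_files
                or total_bytes >= self.max_bytes
                or now - self.started_at >= self.max_age_seconds)

    def drain(self) -> list:
        """Hand the accumulated batch to the publishing step and reset."""
        batch, self.files, self.started_at = self.files, [], None
        return batch


acc = Accumulator(max_files=3, max_bytes=1000, max_age_seconds=60)
acc.add("a.dat", 100, now=0)
acc.add("b.dat", 950, now=20)
if acc.should_publish(now=20):   # the size threshold is reached here
    batch = acc.drain()
```

Keeping the threshold check separate from `drain` lets a scheduler poll `should_publish` on a timer as well as after each arrival.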
A file forwarder sends files, with their contents unchanged, to another location for subsequent processing. There are many instances where remote offices or branches accumulate payment files constructed by their in-branch or back-office counter solutions. Forwarding these files to a central office, either individually or aggregated into larger files on a timed basis, with controls to ensure all files are successfully transferred and then balanced with the central office, is quite a common file function. File forwarders also provide multi-destination routing, transmitting files to multiple locations (e.g., to a main site and a backup or archive site) when needed.
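A rough Python sketch of multi-destination forwarding with a basic transfer control follows. It assumes, purely for illustration, that the destinations are local directories (standing in for a main site and a backup site) and that comparing SHA-256 checksums is an acceptable check that contents arrived unchanged. The `forward` and `sha256_of` helpers are hypothetical names, and balancing with the central office is out of scope.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def forward(source: Path, destinations: list) -> dict:
    """Copy source unchanged to every destination directory.

    Returns a mapping of destination to True when the copied file's
    checksum matches the source, so a caller can flag failed transfers.
    """
    expected = sha256_of(source)
    results = {}
    for dest_dir in map(Path, destinations):
        dest_dir.mkdir(parents=True, exist_ok=True)
        target = dest_dir / source.name
        shutil.copy2(source, target)  # copies contents and file metadata
        results[str(dest_dir)] = sha256_of(target) == expected
    return results


# Usage with temporary directories standing in for the main and backup sites.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    payments = root / "payments.dat"
    payments.write_bytes(b"100.00\n250.50\n")
    outcome = forward(payments, [root / "main", root / "backup"])
    backup_copy = (root / "backup" / "payments.dat").read_bytes()
```

For FTP or SFTP destinations the copy step would be replaced by the relevant transfer client, while the verify-after-transfer pattern stays the same.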
A file ingester takes incoming files and ‘ingests’ them by transforming them into a format suitable for loading into another system, generally a database, information warehouse, or archive. Ingesting increases in complexity when errors are encountered and new processes need to be initiated, sometimes in parallel with the database load itself, to repair files and records that then need to be merged back into files before the ingestion can be completed.
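The split between loadable records and reject records can be sketched in Python as follows. The CSV layout, the `id` and `amount` column names, and the `ingest` function are assumptions made for this example. The sketch only separates rejects for a later repair step; it does not perform the repair or the merge.

```python
import csv
import io


def ingest(text: str, required=("id", "amount")):
    """Split CSV records into rows ready to load and rejects needing repair.

    Line numbers assume a header on line 1 and one record per line.
    """
    loaded, rejects = [], []
    reader = csv.DictReader(io.StringIO(text))
    for line_no, row in enumerate(reader, start=2):
        try:
            if any(not row.get(column) for column in required):
                raise ValueError("missing required field")
            # Transform step: convert the amount to a number before loading.
            loaded.append({"id": row["id"], "amount": float(row["amount"])})
        except ValueError as exc:
            # Keep the original row and the reason so it can be repaired later.
            rejects.append({"line": line_no, "row": row, "reason": str(exc)})
    return loaded, rejects


sample = "id,amount\n1,10.5\n2,abc\n,7\n3,4\n"
loaded, rejects = ingest(sample)
```

Here `loaded` holds the rows that could go straight to the target system, and `rejects` records the line and reason for each row that a repair process would need to fix and merge back.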
File Workflow Provisioning
File workflow provisioning, sometimes called onboarding, is a mechanism to collect information regarding a customer or consumer and process that information to set up a route or subscription to information delivered in the form of files. Often the provisioning information is an incoming file itself, but it can also be provided via interactive input from an operator or via incoming messages delivered, for instance, via web service calls. Workflow provisioning allows the creation of consumer and exchange partner subscriptions to the enterprise’s data offerings and often requires coordination with other enterprise systems such as billing.
Workload automation and file processing fit nicely together to form the foundation of a robust set of services for many enterprises. The challenge lies in organizing your use of the features and functions of your workload automation tools in a coordinated manner to yield reliable and scalable functionality. Taking the time to carefully architect and design your file processing environment, from discrete actions and activities through to the services and functions exposed to your customers, pays significant dividends in reliability, flexibility, and the clarity of your services.