When utilizing a batch job scheduler like Flux, how does one process multiple files concurrently?
Many batch schedulers execute workflows that process files one at a time, each from beginning to end. When many files are ready at once, they all queue behind the same workflow, and each file must wait for the prior file to finish before it can proceed.
But if one separates workflows into those that listen for incoming work, those that schedule that work, and those that perform it, new opportunities arise.
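To put rough numbers on the backlog, here is a minimal sketch in plain Python (not Flux code). It assumes every file takes the same time to process, and the function name `total_time` and the figures used are purely illustrative.

```python
import math


def total_time(num_files: int, minutes_per_file: float, workers: int = 1) -> float:
    """Wall-clock minutes to finish all files when `workers` files run at once.

    Assumes every file takes the same time, which real workloads rarely do.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    # Files are processed in waves of up to `workers` files each.
    waves = math.ceil(num_files / workers)
    return waves * minutes_per_file


print(total_time(12, 5.0))     # one file at a time: 60.0
print(total_time(12, 5.0, 4))  # four files at a time: 15.0
```

Under these assumptions, twelve five-minute files take an hour when processed one after another, but only fifteen minutes when four run side by side.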
Let’s look closer at a scenario used by many payment vendors that concurrently process incoming payment files for posting to their back-end systems. A “listening workflow” listens for files and moves them to what can be called the landing zone. A “scheduling workflow” waits for files to arrive in the landing zone. As each file arrives, the scheduling workflow starts a separate “processing workflow” to handle that file.
Within the listening workflow, a File Move action moves files to the landing zone. The scheduling workflow watches for these moved files. It then creates a processing workflow for each file and passes the filename to that workflow. When the processing workflow is created, the filename or some metadata extracted from the file is appended to the workflow’s name, so that graphical monitors or a dashboard can show exactly which file each workflow is processing.
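The three roles can be sketched outside of Flux in plain Python. This is only an analogy under stated assumptions, not Flux's API: the function names `listen`, `schedule`, and `process`, the `process-payments-` name prefix, the directory layout, and the sample files are all hypothetical, and a thread pool stands in for the concurrently running processing workflows.

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def listen(inbox: Path, landing_zone: Path) -> None:
    """Listening role: move every waiting file into the landing zone."""
    for path in sorted(inbox.iterdir()):
        if path.is_file():
            shutil.move(str(path), str(landing_zone / path.name))


def process(job_name: str, path: Path) -> str:
    """Processing role: stand-in for the real work done on one file."""
    line_count = len(path.read_text().splitlines())
    return f"{job_name}: {line_count} line(s)"


def schedule(landing_zone: Path, max_workers: int = 4) -> list[str]:
    """Scheduling role: start one named processing job per landed file."""
    files = sorted(p for p in landing_zone.iterdir() if p.is_file())
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            # The file's stem is appended to the job name so that a
            # monitor could tell the concurrent jobs apart.
            pool.submit(process, f"process-payments-{path.stem}", path)
            for path in files
        ]
        # Collect results in submission order once each job finishes.
        return [future.result() for future in futures]


def demo() -> list[str]:
    """Run the three roles once against two hypothetical payment files."""
    with tempfile.TemporaryDirectory() as root:
        inbox, landing_zone = Path(root, "inbox"), Path(root, "landing")
        inbox.mkdir()
        landing_zone.mkdir()
        (inbox / "bank_a.csv").write_text("1,10.00\n2,20.00\n")
        (inbox / "bank_b.csv").write_text("3,30.00\n")
        listen(inbox, landing_zone)
        return schedule(landing_zone)


if __name__ == "__main__":
    for line in demo():
        print(line)
```

Because each file gets its own job, a slow file no longer holds up the files behind it; the `max_workers` value caps how many run at once.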
The following Flux workflow outlines this process graphically. The Initiate Process per File starts many processing workflows to handle the concurrent processing of files. These processing workflows can be as complex as required to process the file and are not shown in detail here.
Such reusable processes are an integral feature of Flux.
Contact firstname.lastname@example.org for additional examples or further details.