Batch Job Schedulers: Doing More Than Simply Scheduling
From fraud detection to payment clearing and digital media translation, many SaaS providers add value to batch-scheduled data feeds supplied by their customers. They also use a wide range of methods to collect and transfer that data from their customers, and those methods are often slow, error-prone, and difficult to monitor centrally.
The following outlines a use case that combines Flux’s batch job scheduling, RESTful services, and file transfer capabilities to create a data transfer platform that supports a SaaS provider’s growth and flexibility at a reasonable cost.
Some Current Data Collection Methods
For context, some current approaches are outlined here:
- Approach 1: copy via a shared database
  1. The customer routinely pushes data into a shared database (usually daily)
  2. The SaaS provider routinely takes a snapshot of the tables using a database batch utility run via a scheduled macro on the customer’s premises
  3. The data is moved from the customer’s premises to the provider’s premises via SFTP
  4. The data is loaded into duplicate temp tables on the SaaS provider’s side
  5. The data is then moved from the temp tables into the SaaS provider’s processing engine and processed
- Approach 2: read-only database view
  1. The customer provides a read-only database view of “live data” instead of a copy of the data in a shared database
  2. The SaaS provider routinely takes a snapshot of the database view using a database batch utility run via a scheduled macro on the customer’s premises
  3. Steps 3–5 of Approach 1 follow
- Approach 3: read-only extract
  1. The customer provides a read-only extract of “live data” instead of a copy of the data in a shared database
  2. Steps 3–5 of Approach 1 follow
Limitations of the Above Approaches
- Loading database tables involves clearing and reinserting all records on the SaaS provider’s side. The SaaS provider’s system risks breaking if a user is in the system during the reload. Mitigating this requires limiting access to the system during specific times, which breaks 24x7 availability.
- The outlined process requires that the entire dataset be reloaded every time before any analysis can run.
SaaS providers are looking for a reusable program or workflow that will meet the following requirements:
- Utilize a modern, scalable solution
- Preferably does not rely on database-specific tools
- Configurable source and destination through JSON / XML / variables:
- Table(s) *
- Field(s) *
- Allow multiple criteria to be passed for filtering, each as a field name, an operator (=, >, or <), and a value
- Support a “minimum” number of records being available before the process will commence (and log an error if the count is below that number)
- Support multiple data providers such as Microsoft SQL Server, Oracle, MySQL, and Postgres.
- Support calling one or more stored procedures after a table or all tables have been imported
- Support auditing and calling a logging service at all levels of the process:
- Database connection
- Table available
- Minimum number of records available
- Transfer commencement
- Number of records transferred successfully
* Allow the default source and destination names to be the same, as well as supporting different source and destination names if required
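As a concrete illustration, the requirements above could map onto a runtime configuration such as the following. This is a hypothetical sketch only; every key name and value shown is an assumption for illustration, not an established Flux format:

```json
{
  "source": {
    "vendor": "postgres",
    "jdbcUrl": "jdbc:postgresql://db.customer.example:5432/erp",
    "username": "flux_reader"
  },
  "tables": [
    {
      "name": "invoices",
      "destination": "invoices_stage",
      "fields": ["invoice_id", "amount", "updated_at"],
      "criteria": [
        { "field": "updated_at", "operator": ">", "value": "2024-01-01" }
      ],
      "minimumRecords": 100
    }
  ],
  "postLoadProcedures": ["refresh_invoice_summary"]
}
```

Omitting `destination` for a table or field could default it to the source name, satisfying the footnoted requirement above.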
A Flux Solution
Such requirements are ideally suited for a solution such as Flux. Numerous service providers have built similar solutions using Flux’s capabilities, small footprint, and ease of deployment on multiple platforms.
For these requirements, a Flux solution would comprise the following: each SaaS provider’s customer would install a specially packaged Flux engine, self-contained with all required componentry (Java virtual machine, internal database, etc.). The exact same package would be installed at each customer site. The customer would be required to provide a server with open ports to send and receive REST calls via HTTPS, and to perform SFTP file transfers.
This Flux package would run four workflows:
- Workflow 1 would run on JVM startup: the customer Flux engine would ‘phone home’ to the SaaS provider to:
  a. Tell the SaaS provider’s host that the customer engine was running
  b. Download the customer’s specific runtime configuration, which would contain customer-specific variables such as server name, database name and access credentials, table names, and database queries. These ‘directives’ would be placed on the SaaS provider’s server in a directory structure organized by customer, within a set of version-controlled database tables, or even inside a Git repository.
  c. Download and install any new or modified workflows or custom Java actions for the customer engine
  d. Start sending a heartbeat of status information back to the SaaS provider’s server
- Workflow 2 would periodically run a SaaS provider custom Java action that would extract changed, new, and deleted data using JDBC and store that data as JSON, XML, or CSV. The Java action would also create control and summary metrics associated with each extract to ensure the data was complete and valid for transfer; candidate metrics include record count, extract size, extract start time, and extract duration. This workflow could also be manually initiated via a REST call from the SaaS provider to the customer engine, or via a web service call controlled by a Flux workflow on the SaaS provider’s home server.
- Workflow 3 would, on a schedule set in the runtime configuration or when a file becomes available to send, initiate a workflow to transmit the data created in Workflow 2 to the SaaS provider’s home server. The workflow would log data transfer metrics locally and send them to the home server in the status messages from Workflow 1, where they would be logged and tracked in a file transfer table for summary data collection. This workflow could also be manually initiated via a REST call from the SaaS provider’s server to the customer engine, or via a web service call controlled by a Flux workflow on the SaaS provider’s server.
- Workflow 4 would be a housekeeping workflow running on the customer engine to archive or delete files and to generate status information regarding engine performance, memory, and CPU utilization.
- The customer engine would expose a subset of Flux’s REST API, allowing the SaaS provider’s home server to load and execute workflows on demand if required.
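To make Workflow 2 concrete, its extract-and-metrics step could look roughly like the sketch below. The rows are assumed to have been fetched already via JDBC (for example, a `SELECT ... WHERE updated_at > ?` query driven by the runtime configuration); the class, method, and record names are illustrative assumptions, not Flux APIs:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Sketch of Workflow 2's extract step: write already-fetched rows to CSV
// and produce the control metrics the home server uses to validate the
// transfer. All names here are hypothetical, not part of Flux.
public class ExtractWriter {

    // Control metrics shipped alongside each extract.
    record Control(long recordCount, long byteSize, long durationMillis) {}

    static Control writeCsv(List<String[]> rows, Path out) throws IOException {
        long start = System.currentTimeMillis();
        StringBuilder sb = new StringBuilder();
        for (String[] row : rows) {
            sb.append(String.join(",", row)).append('\n');
        }
        Files.writeString(out, sb.toString());
        return new Control(rows.size(), Files.size(out),
                System.currentTimeMillis() - start);
    }
}
```

The control record would travel with the extract (Workflow 3) so the home server can compare expected and actual record counts before loading.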
The SaaS provider’s home server would be configured to execute a workflow to:
- Poll a directory and verify/validate the transferred data
- Run a Java action or database-specific process actions to load the data
- Execute any required stored procedures after the load
- Collect metrics regarding the run
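The verify/validate step above could be as simple as checking the transferred file against its control metrics, assuming each transfer arrives with a control file whose first line holds the expected record count (an assumed format for illustration, not a Flux convention):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Sketch of the home server's validation step: compare the transferred
// data file against its control record before loading. The control-file
// format and the class name are assumptions.
public class TransferValidator {
    static boolean isValid(Path dataFile, Path controlFile) throws IOException {
        long expected = Long.parseLong(Files.readAllLines(controlFile).get(0).trim());
        try (Stream<String> lines = Files.lines(dataFile)) {
            return lines.count() == expected;
        }
    }
}
```

A failed check would be logged as an error rather than loaded, matching the minimum-records requirement listed earlier.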
To support the heartbeat in Workflow 1 (item 1d above), the SaaS provider would need to create and deploy a simple web service for processing incoming logging information from the customer engines.
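A minimal version of such a logging web service could be sketched with the JDK’s built-in `com.sun.net.httpserver` package. The `/logs` path and the in-memory store are assumptions for illustration; a production service would persist the messages to a database:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the SaaS provider's logging endpoint that
// receives status/heartbeat messages POSTed by customer engines.
public class LoggingService {
    // In-memory store of received log messages (a real service would
    // write these to a database table instead).
    static final List<String> received =
            Collections.synchronizedList(new ArrayList<>());

    // Start the service; pass port 0 to let the OS pick a free port.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/logs", exchange -> {
            String body = new String(exchange.getRequestBody().readAllBytes(),
                    StandardCharsets.UTF_8);
            received.add(body);
            byte[] ok = "accepted".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, ok.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(ok);
            }
        });
        server.start();
        return server;
    }
}
```

The customer engines would POST their heartbeat and transfer metrics to this endpoint over HTTPS in a real deployment.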
The provisioning of data feeds from a customer to a SaaS provider is an essential component of many SaaS business models. Numerous Flux customers have utilized this quick-to-market approach, as outlined above, to get their services up and running quickly and with a high degree of flexibility and control.
For further details on our innovative use cases, contact firstname.lastname@example.org.