Remote File Processor

Joseph Huckaby edited this page Feb 27, 2026 · 1 revision

This guide shows you how to use xyOps to set up an automatic remote file processor: a system that monitors remote servers for files, and automatically picks up and processes them. It covers two methods for accomplishing this, and explains the tradeoffs.

Important

These features require xyOps v1.0.10 or newer.

Method 1: Monitors and Alerts

This method uses the xyOps Monitoring & Alert features to construct a remote file monitor: an alert triggers when new files appear, and that alert can launch a job on the target server to grab the files.

As a prerequisite, make sure you read our Monitors and Alerts docs, to get a basic understanding of the xyOps monitoring and alert systems.

Create Monitor Plugin

The first thing we need to do is create a custom Monitor Plugin. These are super easy to make, as they can be written in any language, and the output format is totally freeform.

Monitor Plugins run every minute on the target servers, and the output can be used to populate monitor graphs, and/or trigger alerts.

For this example we'll be using a simple shell script. All we need to do is count the number of files waiting to be processed, and output the count. So set the "Executable" to bash (or a shell of your choice), and then use the following for the "Script" source:

ls -1 /opt/pickup/ | wc -l

This is using ls -1 to list files in a particular folder (/opt/pickup/). The -1 argument to ls produces a simple output format (one file per line). It then pipes this to wc with the -l argument, which counts the number of lines (i.e. files). The output of our script then becomes a single number, which will be 0 if no files are waiting, or some positive integer if files are found.

You can of course customize this to look for particular file types using a glob, or even look for a single file by using its full path.
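For example, here are two variations on the monitor script. The `*.csv` glob, the `ready.flag` file name, and the `PICKUP_DIR` variable are placeholders for illustration -- substitute your own paths:

```shell
#!/bin/sh
# Monitor script variations (paths below are examples, not part of xyOps).
PICKUP_DIR="${PICKUP_DIR:-/opt/pickup}"

# Count only files matching a glob; stderr is silenced so an empty
# match prints 0 instead of an "ls: no such file" error.
ls -1 "$PICKUP_DIR"/*.csv 2>/dev/null | wc -l

# Or watch for one specific file by its full path, printing 1 or 0:
[ -f "$PICKUP_DIR/ready.flag" ] && echo 1 || echo 0
```

Either way, the script's output remains a single number, which is all the alert expression below needs.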

A few more points about setting up the Monitor Plugin:

  • Set the "Format" to "Text", as in this case we don't need the output to be parsed as JSON or XML.
  • Set the "Server Groups" so you can limit which of your servers the monitor will run on.
  • Copy the unique alphanumeric Plugin ID that is auto-assigned when the Plugin is saved. It will look like this: pmm2tezgxya9bidp.
  • Once you create the Plugin, you can re-edit it and click the "Test..." button to make sure the output is what you expect.

Create Alert

The next thing we need to do is create an Alert, with a special expression that triggers on our custom monitor Plugin output. Here is how you set this up:

First, make sure the new alert is targeting the same servers as our Monitor Plugin.

Next, set the Alert Expression to the following (replace the Plugin ID with yours):

integer( commands.pmm2tezgxya9bidp ) > 0

So, there's a lot going on here. This is an xyOps Expression, which is a subset of JavaScript. Using this you can trigger the alert based on any values available in the Server Monitor Data structure.

The commands object is where all the Monitor Plugin outputs are stored per server. Note that you can click the database search icon on the right side of the alert expression text field to explore all of the available server data properties you can use.

So, this is taking the text output of our Monitor Plugin (commands.pmm2tezgxya9bidp in this example), casting it to an integer, and then using it in a mathematical expression. If the integer result is greater than zero (meaning there are files waiting), the expression evaluates to true, and our alert will trigger.

For the "Message" text box, you can enter something like:

There are {{ commands.pmm2tezgxya9bidp }} files waiting for pickup!

Here we are using the output of our Monitor Plugin again, this time using {{ mustache syntax }}, so the value is dynamically substituted into the message text. When the alert fires, the message will be something like:

There are 4 files waiting for pickup!

At this point you can pause, save, and test the alert. Drop a file into the folder on your server, make sure the alert triggers (remember, the monitor system only runs once per minute), remove the file, and then make sure the alert clears on the next minute.
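A quick way to exercise this from a shell on the target server (the test file name is arbitrary, and `PICKUP_DIR` is just a convenience variable for the folder used above):

```shell
#!/bin/sh
# Exercise the alert by hand. Override PICKUP_DIR if your path differs.
PICKUP_DIR="${PICKUP_DIR:-/opt/pickup}"

# Drop a dummy file to trip the monitor on its next one-minute pass:
touch "$PICKUP_DIR/alert-test.txt"

# After confirming the alert fired in the UI, remove the file; the
# alert should clear on the following minute:
rm "$PICKUP_DIR/alert-test.txt"
```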

Exclusive Actions

xyOps has an optional checkbox for alerts called "Exclusive Actions". When this is checked, the alert will run only the actions specified in the alert itself, and not inherit any actions from the server groups, or the universal action set.

This is important in the context of our file processor system, because we probably do not want the default actions to run when files are detected on our servers. This kind of alert is not a "bad" alert requiring user attention, as is the case with the usual CPU, memory or disk alerts. In our case, the alert firing is a "good" thing! We have files waiting!

So it is recommended that you enable "Exclusive Actions", so the default actions do not run for our alert (e.g. take a server snapshot, send an email, fire a web hook, etc.), and only our actions run (see next section).

Run Event Action

So, we have an alert that fires when files are waiting. The next thing to do is configure an alert action to run when the alert fires. In our case we want the Run Event action, to run an event, which will grab all the files and upload them to xyOps.

In the dialog where you configure the Run Event action, make sure you check these two checkboxes:

  • Target Alert Server
    • This will launch the job directly on the server where the alert triggered. This is important because we need to grab the files that are waiting for pickup.
  • Clear Alert on Job Completion
    • This will automatically clear the alert as soon as the files are picked up. This is important because the alert won't clear normally until the monitoring system runs on the next minute, and by that time, a new file may have been dropped onto the server!

For the event itself, once again we can write a very small and simple shell script. All it needs to do is tell xyOps to grab all the files in our pickup directory, attach them to the job, and delete them when they're successfully uploaded. You can do all this with a single-line script:

echo '{ "xy":1, "files":[{ "path": "/opt/pickup/*", "delete": true }] }'

We're telling xyOps to attach all the files in the pickup directory to the job, and delete them upon completion. The /opt/pickup/* is a glob, which xyOps detects and expands to individual files at runtime. See Output Files for more details on this.
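If the script grows beyond a one-liner, a quoted heredoc emits the exact same control document and is easier to read and extend than a single echo (this is purely a formatting convenience; the JSON payload is unchanged):

```shell
#!/bin/sh
# Emit the same xyOps control document via a quoted heredoc.
# The 'EOF' quoting prevents the shell from expanding the glob or
# any $variables inside the JSON.
cat <<'EOF'
{ "xy": 1,
  "files": [
    { "path": "/opt/pickup/*", "delete": true }
  ]
}
EOF
```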

It is recommended that you remove the Max Concurrent Jobs limit from your event (or set it to a very high number), so that in the case of multiple servers, all files can be collected in parallel. For actually processing the files, see the next section.

Concurrency

For handling concurrency, the recommendation is to "chain" the file collection job into another one, which then performs the actual processing. The reason is that you want the file "pickup" job to be quick and parallel, so it runs on all your servers and also clears the alert (to get ready for new files).

You can easily have the pickup job launch another one by attaching a Run Event action with the "On Success" condition. The files will be passed along automatically, and the new job will receive them as inputs. The new job can have Max Concurrent Jobs and/or Max Queue limits to control concurrency and queuing.

The secondary job can also be a Workflow, where you can use something like the Split Controller to split the files up and run separate sub-jobs for processing each file. In the workflow editor, if you wire your manual launcher node directly into your split controller, the input files will be passed to it.

Downsides

  • The monitoring system only runs Monitor Plugins once per minute on each server. If you need to detect and process files faster than that, this method won't work.
  • The file detection alerts, while generally short-lived, do cause some UI "noise" for logged-in users. The alert counter will show in the header in a red box, and each alert generates a visible yellow notification.

Read on for a different method that overcomes these downsides.

Method 2: Silent Workflow

This method shows how you can construct a Workflow that actively goes out to each server scanning for files, and if any are found, it can trigger additional jobs to process the files. The neat trick here is that, using the Quiet Modifier, the workflow itself can be marked as "invisible", so it is not shown in the UI at all, and also "ephemeral", so that it auto-deletes itself upon completion (unless files are detected).

Also, you can schedule the workflow to run as frequently as you wish, even multiple times per minute, by using an Interval trigger.

Here is how the finished workflow should look:

Workflow Diagram

Getting Started

The first thing you should start with is a Manual Run trigger node. You can skip the scheduler and quiet modifiers for now -- only add those once everything is tested and confirmed working.

Multiplex Controller

The next thing you should wire up is a Multiplex Controller. This will "fan out" and run multiple sub-jobs in parallel, specifically one for each server in the target group(s). Each job will be retargeted to run on a specific server in the set.

For the event itself, we can write a very small and simple shell script. All it needs to do is tell xyOps to grab all the files in our pickup directory, attach them to the job, and delete them when they're successfully uploaded. You can do all this with a single-line script:

echo '{ "xy":1, "files":[{ "path": "/opt/pickup/*", "delete": true }] }'

We're telling xyOps to attach all the files in the pickup directory to the job, and delete them upon completion. The /opt/pickup/* is a glob, which xyOps detects and expands to individual files at runtime. See Output Files for more details on this.

It is recommended that you omit a Max Concurrent Jobs limit from your event node (or set it to a very high number), so that in the case of multiple servers, all files can be collected in parallel.

Decision Controller

The next thing we need to wire up is a Decision Controller. This allows us to conditionally launch a secondary processing job, if and only if files were found.

An important thing to note here is that the multiplex controller will spawn multiple collection jobs, and each one will run the decision controller upon success. So everything from this point onward happens in parallel, once for each server in your target group(s).

Tip

If we wanted to run something after all the multiplex jobs complete, we would use the special On Continue condition instead. But in our case we want to run things after every multiplexed job.

For the decision expression, all we need to do is check if the previous job had any output files:

count(files) > 0

This is an xyOps Expression, which is a subset of JavaScript. Using this you can make the decision based on any values available in the previous Job Data Structure. In our case we want to count the number of elements in the Job.files array. With our expression language we have to use the count() helper function, because arrays don't have a length property.

So the idea here is that our decision controller will only pass control on to the next node if the previous job produced output files.

Run Event Action

The final node we need to wire up is a Run Event action. This may seem odd, because we could just add an Event Node or Job Node right into the workflow for processing our files. The problem is that the workflow is going to be running in invisible and ephemeral mode, but we want file processing to be a normal, visible job (and one that doesn't auto-delete itself).

So that's why we're launching the event via a "Run Event" action, so it "breaks out" of the workflow context, runs on its own, and doesn't run invisibly or ephemerally.

The files collected from the previous job will be passed into the new job as inputs. It is up to you how you want to process the files, but as a test, you can select the built-in Test Plugin, then go examine the job results to see the files passed in to it.

Concurrency

For handling concurrency on the file processing, all you need to do is attach Max Concurrent Jobs and/or Max Queue limits onto the event.

This should be done outside of the workflow, i.e. on the event itself.

Quiet Modifier

Once everything is tested and working, you can add a Quiet Modifier node to the workflow. This is a "schedule modifier", which alters the workflow (and its sub-jobs) when it is launched via a scheduler trigger. Inside the quiet modifier configuration, you have two options:

  • Invisible: Upcoming, queued and running workflow jobs are completely hidden from the UI.
  • Ephemeral: Auto-delete all workflow jobs upon completion (no permanent storage).

So the idea here is twofold. Since you will likely schedule the workflow to run very frequently (possibly even multiple times per minute) to check for files, you probably don't want your UI cluttered with upcoming, running, and completed jobs. The quiet modifier handles this in two ways. First, "invisible" mode means that the workflow, and all sub-jobs within, run completely "invisibly" to the UI. They aren't shown while running, they aren't counted in the "Active Jobs" header widget counter, nor are they shown in any "upcoming jobs" prediction views. Second, ephemeral mode means that all the jobs will auto-delete themselves upon completion, i.e. they won't take up any space in your database.

Now, you may be asking, "what about when files are detected and processed? I want to see THOSE jobs!". Absolutely! That's why we launch the file processing job as a Run Event action. This means that the job will run "outside" of the workflow context, and thus will not inherit the invisible or ephemeral properties. In effect, the processing job will run as a normal, visible job.

It should also be noted that if any job produces output files (like the collection job), it also "escapes" ephemeral mode, so it is not deleted upon completion. This is because the files are "attached" to the job that produced them, so it needs to stick around long enough for the files to be processed. A nice side effect is that you can trace a file processing job back to the server where the files originated.

Note: The invisible and ephemeral modes only activate if the workflow is triggered via a schedule. This is by design, as they are "schedule modifiers". So if you manually run the workflow for testing purposes, it will not be invisible or ephemeral.

Downsides

  • This method is a bit more complex to set up versus the Monitors and Alerts method described above.