Mitto v2.9 Sneak Peek - System Job

New to Mitto as of 2.9 is the system job.

This job monitors system resources on the Mitto host machine.

We’ve had the opportunity to see this job in action on some Zuar-hosted deployments of Mitto for about a month, and it’s been super helpful in alerting us to resource issues. We gain visibility into memory and disk usage as well as any service failures. Further, in the event of a resource issue, this job logs a history of which jobs and sequences were running right before the failure.

By making use of Mitto’s scheduling and webhooks, we post to our Slack channel on job failure, so we get notifications in real time as system resources cross the thresholds defined in a job config like the one below:

{
    "json_file": "system_status",
    "max_disk": 90,
    "max_memory": 95,
    "write_json": "true"
}

A system job with this configuration will fail if disk usage goes above 90% or if memory usage goes above 95%. It will also fail if any services (like gunicorn) stop running. With write_json set to true, the job writes to a JSONL file every time it runs, logging all currently running jobs and sequences along with the memory and disk usage.
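
To make the failure rule concrete, here is a minimal Python sketch of the threshold comparison described above. This is not Mitto's implementation: the function name check_thresholds and the way the usage percentages are passed in are assumptions for illustration. Only the max_disk and max_memory keys come from the config shown earlier.

```python
def check_thresholds(config, disk_pct, memory_pct):
    """Return a list of threshold violations; an empty list means healthy.

    Hypothetical helper for illustration only, not part of Mitto.
    """
    problems = []
    # The job fails when usage goes *above* the limit, so compare strictly.
    if disk_pct > config["max_disk"]:
        problems.append(f"disk usage {disk_pct}% exceeds {config['max_disk']}%")
    if memory_pct > config["max_memory"]:
        problems.append(f"memory usage {memory_pct}% exceeds {config['max_memory']}%")
    return problems


# Same limits as the job config above; the usage numbers are made up.
config = {"max_disk": 90, "max_memory": 95}
print(check_thresholds(config, disk_pct=92.5, memory_pct=60.0))
```

In this sketch a reading exactly at the limit (for example, 90% disk) does not count as a failure, which matches the "goes above" wording. Whether Mitto treats the boundary the same way is not something this post confirms.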

We also add a webhook on the system job to POST to Slack on job failure with this body:

{
    "text": "*SYSTEM MONITOR WARNING:* - *memory:* ${job['status']['kvp']['current memory usage']} *disk:* ${job['status']['kvp']['current disk usage']} - https://${system['fqdn']}"
}
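
If you want to preview what Slack will receive, the sketch below fills in the same message by hand in Python. It is only an approximation of the substitution Mitto performs on the webhook body: the helper names, the sample kvp values, the hostname, and the webhook URL are all hypothetical. The request is built but never sent.

```python
import json
import urllib.request


def build_slack_payload(job, system):
    """Mimic the webhook body above using plain Python string formatting."""
    kvp = job["status"]["kvp"]
    text = (
        "*SYSTEM MONITOR WARNING:* - "
        f"*memory:* {kvp['current memory usage']} "
        f"*disk:* {kvp['current disk usage']} "
        f"- https://{system['fqdn']}"
    )
    return {"text": text}


def build_request(webhook_url, payload):
    """Prepare (but do not send) a JSON POST for a Slack incoming webhook."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical sample values; the real kvp value format may differ.
job = {"status": {"kvp": {"current memory usage": "96%",
                          "current disk usage": "71%"}}}
system = {"fqdn": "mitto.example.com"}

payload = build_slack_payload(job, system)
request = build_request("https://hooks.slack.com/services/EXAMPLE", payload)
print(json.dumps(payload, indent=4))
```

Passing the prepared request to urllib.request.urlopen would actually send it. That step is left out here so the example runs without network access.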