Thank you for the awesome question @BlueFlamingo!
You should be able to solve this use case with the following:
-
Monitor job - This job will test that your SQL statement returns results. Depending on the setup, a webhook may or may not be used.
-
Query to CSV job - This job will create a CSV with the results of a SQL query.
-
Email job - This job will email the CSV created above to you.
-
Sequence with these 3 jobs in it.
Example Data Setup
Create two tables:
test.parts_a
drop table if exists test.parts_a;
create table if not exists test.parts_a (part varchar, units integer);
insert into test.parts_a (part, units) VALUES ('Part A', 100), ('Part B', 150), ('Part C', 5);
Results:
part |
units |
Part A |
100 |
Part B |
150 |
Part C |
5 |
test.parts_b
drop table if exists test.parts_b;
create table if not exists test.parts_b (part varchar, units integer);
insert into test.parts_b (part, units) VALUES ('Part A', 99), ('Part B', 150), ('Part D', 5);
Results:
part |
units |
Part A |
99 |
Part B |
150 |
Part D |
5 |
Test SQL Statement
We need to compare these two tables in a SQL statement to see where the tables don’t match:
SELECT
COALESCE(a.part, b.part) AS part
, a.units AS units_from_a
, b.units AS units_from_b
FROM
test.parts_a a
FULL OUTER JOIN test.parts_b b
ON b.part = a.part
WHERE
a.units != b.units
OR a.units IS NULL
OR b.units IS NULL
Results:
part |
units_from_a |
units_from_b |
Part A |
100 |
99 |
Part C |
5 |
NULL |
Part D |
NULL |
5 |
1. Monitor Job Config
For the Mitto monitor job, we have a couple options based on how we want the other jobs to interact with this one:
- If we want the monitor job to FAIL when the SQL statement returns results, then we want the SQL statement to return
TRUE
. Having the job fail is is helpful because it fits in with the normal concept of jobs that when they fail something is wrong. This does complicate our job setup slightly as we have to use a webhook to start the sequence.
- If we want the monitor job to SUCCEED when the SQL statement returns results, then we want the SQL statement to return
FALSE
. Having the job succeed is helpful if we want to use a normal sequence with 3 jobs and schedule the job. This method doesn’t require webhooks for job interaction. The potentially annoying result with this setup is that the monitor job will mostly be in a failed state.
Here are the two potential monitor job configs:
- SQL statement returns
TRUE
and monitor job FAILS:
{
"dbo": "postgresql://localhost/analytics",
"sql": [
"WITH test AS (SELECT COALESCE(a.part, b.part) AS part, a.units AS units_from_a, b.units AS units_from_b FROM test.parts_a a FULL OUTER JOIN test.parts_b b ON b.part = a.part WHERE a.units != b.units OR a.units IS NULL OR b.units IS NULL) SELECT COUNT(*) > 0 FROM test;"
]
}
- SQL statement returns
FALSE
and monitor job SUCCEEDS:
{
"dbo": "postgresql://localhost/analytics",
"sql": [
"WITH test AS (SELECT COALESCE(a.part, b.part) AS part, a.units AS units_from_a, b.units AS units_from_b FROM test.parts_a a FULL OUTER JOIN test.parts_b b ON b.part = a.part WHERE a.units != b.units OR a.units IS NULL OR b.units IS NULL) SELECT COUNT(*) = 0 FROM test;"
]
}
The only difference between the SQL here is COUNT(*) > 0
or COUNT(*) = 0
after the WITH statement.
2. Query to CSV job config
Mitto IO jobs can take input data from a database and pipe the output data to a file.
Here’s the IO job config using our example above:
{
"input": {
"dbo": "postgresql://localhost/analytics",
"query": [
"SELECT",
" COALESCE(a.part, b.part) AS part",
" , a.units AS units_from_a",
" , b.units AS units_from_b",
"FROM",
" test.parts_a a",
" FULL OUTER JOIN test.parts_b b",
" ON b.part = a.part",
"WHERE",
" a.units != b.units",
" OR a.units IS NULL",
" OR b.units IS NULL"
],
"use": "query.io#QueryInput"
},
"output": {
"delimiter": ",",
"path": "/var/mitto/data/parts_mismatch.csv",
"use": "call:mitto.iov2#tocsv"
},
"steps": [
{
"transforms": [
{
"include_empty_columns": true,
"rename_columns": false,
"use": "mitto.iov2.transform#ExtraColumnsTransform"
}
],
"use": "mitto.iov2.steps#Input"
},
{
"transforms": [
{
"use": "mitto.iov2.transform#FlattenTransform"
}
],
"use": "mitto.iov2.steps#Output"
}
]
}
This job creates a file named parts_mismatch.csv
. You can see this file on Mitto’s files page.
3. Email CSV file
Mitto can send emails (with attachments) via email jobs.
Here’s the relevant part of the email job config:
{
"attachments": [
"/var/mitto/data/parts_mismatch.csv"
],
...
}
4. Sequence
Put these three jobs together in a sequence.
- Monitor job
- Query to CSV IO job
- Email job
If you decided to use monitor job config #2, then just set the sequence to continue on error false and schedule it. The monitor job is first in the sequence and if it succeeds (there is data in the results of your monitor query), then the CSV is created and the email is sent. If the monitor job fails, the sequence won’t continue, the CSV won’t be created, and the email isn’t sent.
If you decided to use monitor job config #1, there are a few additional steps.
- Schedule the monitor job instead of the sequence.
- Add a webhook to the monitor job. The webhook will start the sequence when the monitor job fails
- Don’t schedule the sequence. The monitor job is scheduled and starts the sequence when it fails via it’s webhook.
- Disable the monitor job in the sequence. We don’t want this job to run when the sequence runs.
(OPTIONAL) Monitor Job Webhook Config
Create a webhook on the monitor job.
When the monitor job fails, it will send a POST request to the sequence and start it. You need to put the id
of the schedule into the URL of the webhook. For example if the sequence’s id is 123
, the URL will be https://localhost:7127/v2/jobs/123/:actions
.