Notify me when SQL query returns 0 rows

BlueFlamingo · February 22, 2021, 11:22pm

Question: How do I get an email alert if there is data returned from an SQL query?

Background:
I have a SQL query that aggregates and compares data across two tables to ensure they reconcile.

If there is any mismatch – there will be rows of data generated showing data that is in one table and not in the other.

If everything reconciles – i.e. the same data resides in both tables – there will be NO DATA returned.

I want to get an email (containing the data that doesn’t match) – only when there are rows generated.

What is the best way to do this?

Example
Table 1: Contains Part # and Units On Hand
Part A, 100 Units
Part B, 150 Units
Part C, 5 Units

Table 2: A DIFFERENT data set also containing Part # and Units On Hand information
Part A, 99 Units
Part B, 150 Units
Part D, 5 Units

SQL returns: Part #, Units On Hand in Table A, Units On Hand in Table B
Part A, 100 Units, 99 Units
Part C, 5 Units, NULL
Part D, NULL, 5 Units
I want an email with this table’s data

anon68878319 · February 23, 2021, 6:58pm

Thank you for the awesome question @BlueFlamingo!

You should be able to solve this use case with the following:

Monitor job - This job will test that your SQL statement returns results. Depending on the setup, a webhook may or may not be used.
Query to CSV job - This job will create a CSV with the results of a SQL query.
Email job - This job will email the CSV created above to you.
Sequence with these 3 jobs in it.

Example Data Setup

Create two tables:
test.parts_a

drop table if exists test.parts_a;
create table if not exists test.parts_a (part varchar, units integer);
insert into test.parts_a (part, units) VALUES ('Part A', 100), ('Part B', 150), ('Part C', 5);

Results:

part	units
Part A	100
Part B	150
Part C	5

test.parts_b

drop table if exists test.parts_b;
create table if not exists test.parts_b (part varchar, units integer);
insert into test.parts_b (part, units) VALUES ('Part A', 99), ('Part B', 150), ('Part D', 5);

Results:

part	units
Part A	99
Part B	150
Part D	5

Test SQL Statement

We need to compare these two tables in a SQL statement to see where the tables don’t match:

SELECT
    COALESCE(a.part, b.part) AS part
  , a.units                  AS units_from_a
  , b.units                  AS units_from_b
FROM
    test.parts_a a
    FULL OUTER JOIN test.parts_b b
        ON b.part = a.part
WHERE
     a.units != b.units
  OR a.units IS NULL
  OR b.units IS NULL

Results:

part	units_from_a	units_from_b
Part A	100	99
Part C	5	NULL
Part D	NULL	5

1. Monitor Job Config

For the Mitto monitor job, we have a couple options based on how we want the other jobs to interact with this one:

If we want the monitor job to FAIL when the SQL statement returns results, then we want the SQL statement to return TRUE. Having the job fail is is helpful because it fits in with the normal concept of jobs that when they fail something is wrong. This does complicate our job setup slightly as we have to use a webhook to start the sequence.
If we want the monitor job to SUCCEED when the SQL statement returns results, then we want the SQL statement to return FALSE. Having the job succeed is helpful if we want to use a normal sequence with 3 jobs and schedule the job. This method doesn’t require webhooks for job interaction. The potentially annoying result with this setup is that the monitor job will mostly be in a failed state.

Here are the two potential monitor job configs:

SQL statement returns TRUE and monitor job FAILS:

{
    "dbo": "postgresql://localhost/analytics",
    "sql": [
        "WITH test AS (SELECT COALESCE(a.part, b.part) AS part, a.units AS units_from_a, b.units AS units_from_b FROM test.parts_a a FULL OUTER JOIN test.parts_b b ON b.part = a.part WHERE a.units != b.units OR a.units IS NULL OR b.units IS NULL) SELECT COUNT(*) > 0 FROM test;"
    ]
}

SQL statement returns FALSE and monitor job SUCCEEDS:

{
    "dbo": "postgresql://localhost/analytics",
    "sql": [
        "WITH test AS (SELECT COALESCE(a.part, b.part) AS part, a.units AS units_from_a, b.units AS units_from_b FROM test.parts_a a FULL OUTER JOIN test.parts_b b ON b.part = a.part WHERE a.units != b.units OR a.units IS NULL OR b.units IS NULL) SELECT COUNT(*) = 0 FROM test;"
    ]
}

The only difference between the SQL here is COUNT(*) > 0 or COUNT(*) = 0 after the WITH statement.

2. Query to CSV job config

Mitto IO jobs can take input data from a database and pipe the output data to a file.

Here’s the IO job config using our example above:

{
    "input": {
        "dbo": "postgresql://localhost/analytics",
        "query": [
            "SELECT",
            "    COALESCE(a.part, b.part) AS part",
            "  , a.units                  AS units_from_a",
            "  , b.units                  AS units_from_b",
            "FROM",
            "    test.parts_a a",
            "    FULL OUTER JOIN test.parts_b b",
            "        ON b.part = a.part",
            "WHERE",
            "     a.units != b.units",
            "  OR a.units IS NULL",
            "  OR b.units IS NULL"
        ],
        "use": "query.io#QueryInput"
    },
    "output": {
        "delimiter": ",",
        "path": "/var/mitto/data/parts_mismatch.csv",
        "use": "call:mitto.iov2#tocsv"
    },
    "steps": [
        {
            "transforms": [
                {
                    "include_empty_columns": true,
                    "rename_columns": false,
                    "use": "mitto.iov2.transform#ExtraColumnsTransform"
                }
            ],
            "use": "mitto.iov2.steps#Input"
        },
        {
            "transforms": [
                {
                    "use": "mitto.iov2.transform#FlattenTransform"
                }
            ],
            "use": "mitto.iov2.steps#Output"
        }
    ]
}

This job creates a file named parts_mismatch.csv. You can see this file on Mitto’s files page.

3. Email CSV file

Mitto can send emails (with attachments) via email jobs.

Here’s the relevant part of the email job config:

{
  "attachments": [
    "/var/mitto/data/parts_mismatch.csv"
  ],
  ...
}

4. Sequence

Put these three jobs together in a sequence.

Monitor job
Query to CSV IO job
Email job

If you decided to use monitor job config #2, then just set the sequence to continue on error false and schedule it. The monitor job is first in the sequence and if it succeeds (there is data in the results of your monitor query), then the CSV is created and the email is sent. If the monitor job fails, the sequence won’t continue, the CSV won’t be created, and the email isn’t sent.

If you decided to use monitor job config #1, there are a few additional steps.

Schedule the monitor job instead of the sequence.
Add a webhook to the monitor job. The webhook will start the sequence when the monitor job fails
Don’t schedule the sequence. The monitor job is scheduled and starts the sequence when it fails via it’s webhook.
Disable the monitor job in the sequence. We don’t want this job to run when the sequence runs.

(OPTIONAL) Monitor Job Webhook Config

Create a webhook on the monitor job.

URL = http://localhost:7127/v2/jobs/{schedule_id}/:actions
Event = JOB_FAILURE
Method = POST
Content Type = application/json
Body = { "action": "START" }

When the monitor job fails, it will send a POST request to the sequence and start it. You need to put the id of the schedule into the URL of the webhook. For example if the sequence’s id is 123, the URL will be https://localhost:7127/v2/jobs/123/:actions.

BlueFlamingo · February 24, 2021, 6:23pm

How do you recommend combining the SQL in Step #1 and Step #2 into a single job (table?) to avoid having to make a single SQL update in TWO jobs?

In Step #1, you are only adding a COUNT check to see if it fails or not.
In Step #2, you are using the same SQL, but showing the granular details.

I think it would be better to have the SQL syntax result in a table (?), then have step #1 do a count on that table and step #2 report the table…

For example, if there was a THIRD column I wanted to check, say “Warehouse Location” – I would have to make the exact same SQL update in TWO sets of code – you know that will be an issue

anon68878319 · February 25, 2021, 8:54pm

You make some great points here. I completely agree that creating a table with a SQL job can simplify the SQL statements for #1 and #2 above.

The SQL job would look like this:

DROP TABLE IF EXISTS test.parts_discrepancies
;

CREATE TABLE IF NOT EXISTS test.parts_discrepancies AS
SELECT
    COALESCE(a.part, b.part) AS part
  , a.units                  AS units_from_a
  , b.units                  AS units_from_b
FROM
    test.parts_a a
    FULL OUTER JOIN test.parts_b b
                    ON b.part = a.part
WHERE
     a.units != b.units
  OR a.units IS NULL
  OR b.units IS NULL
;

Then the monitor job’s SQL (#1) could be simplified to:

WITH test as (SELECT * FROM test.parts_discrepancies) SELECT COUNT(*) > 0 FROM test;

or (#2)

WITH test as (SELECT * FROM test.parts_discrepancies) SELECT COUNT(*) = 0 FROM test;

The query to CSV job’s SQL is also simplified to:

SELECT * FROM test.parts_discrepancies;

Now you only have to change one job (the new SQL job) and the monitor job and query to csv jobs still work without any changes.

Add the SQL job to the beginning of the sequence.