I think I need a Stored Procedure?

I have a use case where new client tables are being created daily and I need a way to automate the creation of a union job to include new client tables as they are created. I can use the following query to search for all tables based on their name:

        SELECT *
        FROM "INFORMATION_SCHEMA"."TABLES"
        WHERE
            TABLE_SCHEMA = 'CLIENTS'
          /*Update these fields*/
            AND TABLE_NAME ilike  'amazon_mws_%^_AllListingsReport' ESCAPE '^'
            AND TABLE_TYPE = 'BASE TABLE'
    ),
    queries AS (
        SELECT *,
            $$SELECT *, 'TABLE_NAME' as table_name FROM SCHEMA."TABLE_NAME"$$                           AS sql_template,
            replace(sql_template, 'SCHEMA', table_info.TABLE_SCHEMA)      AS sql_with_schema,
            replace(sql_with_schema, 'TABLE_NAME', table_info.TABLE_NAME) AS sql_with_table,
            sql_with_table                                                AS final_sql_statement
        FROM table_info
    )
    SELECT
    listagg(final_sql_statement,'\nUNION ALL\n') WITHIN GROUP ( ORDER BY TABLE_SCHEMA, TABLE_CATALOG)
    FROM queries;```

This generates a select sql statment that unions all tables together. Is there a way I can then use Mitto to grab and use that select statement so that I don’t have to copy and paste it into a different SQL job?

Here’s a simple example of my understanding of what you are trying to do.
All of the below SQL is using PostgreSQL.

STEP 1 - Create two tables to later union.
NOTE: These tables have the same number of columns and those columns’ data types match.

DROP TABLE IF EXISTS test.union_single_table_a
;

CREATE TABLE IF NOT EXISTS test.union_single_table_a (
    id      SERIAL,
    comment VARCHAR
)
;

INSERT INTO
    test.union_single_table_a (comment)
VALUES
    ('this is the first row of table a'),
    ('this is the second row of table a')
;

DROP TABLE IF EXISTS test.union_single_table_b
;

CREATE TABLE IF NOT EXISTS test.union_single_table_b (
    id      SERIAL,
    comment VARCHAR
)
;

INSERT INTO
    test.union_single_table_b (comment)
VALUES
    ('this is the first row of table b')
;

Here’s what these two tables look like:
test.union_single_table_a

id,comment
1,this is the first row of table a
2,this is the second row of table a

test.union_single_table_b

id,comment
1,this is the first row of table b

STEP 2 - Run SQL procedure to dynamically union these tables.

DO
$$
    DECLARE
        _query RECORD;
    BEGIN
        CREATE SCHEMA IF NOT EXISTS test;
        DROP TABLE IF EXISTS test.union_tables_with_proc;
        CREATE TABLE IF NOT EXISTS test.union_tables_with_proc (
            table_schema VARCHAR,
            table_name   VARCHAR,
            id           INTEGER,
            comment      VARCHAR
        );
        FOR _query IN (
            SELECT
                t.table_schema
              , t.table_name
            FROM information_schema.tables t
            WHERE
                  t.table_schema = 'test'
              AND table_name LIKE 'union_single_table_%'
            ORDER BY
                table_schema
              , table_name
        )
            LOOP
                RAISE INFO 'The schema is %, the table is %', _query.table_schema, _query.table_name;
                EXECUTE
                                'INSERT INTO test.union_tables_with_proc (table_schema, table_name, id, comment)
                                     select
                                        $1 AS table_schema,
                                        $2 AS table_name,
                                        id as id,
                                        comment as comment
                                     from ' || QUOTE_IDENT(_query.table_schema) || '.' || QUOTE_IDENT(_query.table_name)
                    USING QUOTE_IDENT(_query.table_schema), QUOTE_IDENT(_query.table_name);
            END LOOP;
    END
$$
LANGUAGE plpgsql
;

Here’s the resulting data:

table_schema,table_name,id,comment
test,union_single_table_a,1,this is the first row of table a
test,union_single_table_a,2,this is the second row of table a
test,union_single_table_b,1,this is the first row of table b

You can drop this SQL procedure into a Mitto SQL job and run it.

Mitto also has a Union job, but it currently only supports hard coding table names. We’ve got an improvement ticket to make this dynamic based on a SQL statement.

2 Likes

Here is something that I got to work.

create or replace procedure DynmanicTableUnion(sqlCommand String)
    returns string
    language JavaScript
as
$$
/**
 * Stored procedure to execute multiple SQL statements generated from a SQL query
 * Note that this procedure will always use the column named "SQL_COMMAND"
 *
 * @param {String} sqlCommand: The SQL query to run to generate one or more SQL commands
 * @return {String}: A string containing all the SQL commands executed, each separated by a newline.
 */
      cmd1_dict = {sqlText: SQLCOMMAND};
      stmt = snowflake.createStatement(cmd1_dict);

      rs = stmt.execute();

      var s = '';

      while (rs.next())  {
          cmd2_dict = {sqlText: rs.getColumnValue("SQL_COMMAND")};
          stmtEx = snowflake.createStatement(cmd2_dict);
          stmtEx.execute();
          s += rs.getColumnValue(1) + "\n";
          }

      return s;

$$;

Essentially, what this person was doing is creating a SQL statement to “delete all tables that end with “TEST”” and then passing that statement to a stored procedure to execute.

Here is what this process looks like for us in SQL.

/*create a table with a single row of text that contains the query you want to pass to the stored procedure*/
drop table if exists TARGETSCHEMA.__SQL_COMMAND_RESTOCK_INV_REPORT;
create table TARGETSCHEMA.__SQL_COMMAND_RESTOCK_INV_REPORT as
with
table_info as (
 SELECT *
        FROM "INFORMATION_SCHEMA"."TABLES"
        WHERE
            TABLE_SCHEMA = 'CLIENTS'
          /*Update these fields*/
            AND TABLE_NAME ilike  'amazon_mws_%^_RestockInventoryReport' ESCAPE '^'
            AND TABLE_TYPE = 'BASE TABLE'
    ),
    queries AS (
        SELECT *,
            $$SELECT *, 'TABLE_NAME' as table_name FROM SCHEMA."TABLE_NAME"$$                           AS sql_template,
            replace(sql_template, 'SCHEMA', table_info.TABLE_SCHEMA)      AS sql_with_schema,
            replace(sql_with_schema, 'TABLE_NAME', table_info.TABLE_NAME) AS sql_with_table,
            sql_with_table                                                AS final_sql_statement
        FROM table_info
    )
    SELECT
     concat('Create table TARGETSCHEMA.RestockInventoryReport_union as \n', listagg( final_sql_statement,'\nUNION ALL\n') WITHIN GROUP ( ORDER BY TABLE_SCHEMA, TABLE_CATALOG)) as SQL_COMMAND
    FROM queries;

/*Drop the existing union table, if exists*/
drop table if exists TARGETSCHEMA.RestockInventoryReport_union;

/*Call the stored procedure with the above query which created the table*/
call  CLIENTS.DynmanicTableUnion(select * from TARGETSCHEMA.__SQL_COMMAND_RESTOCK_INV_REPORT );

/*Drop the sql command statement*/
drop table if exists TARGETSCHEMA.__SQL_COMMAND_RESTOCK_INV_REPORT;

Although this process works I am recieving a javascript error (which will show in Mitto as a failed job run).

snowflake.connector.errors.ProgrammingError: 100183 (P0000): 019a8368-003e-ad14-0000-093105afe1fa: Execution error in store procedure DYNMANICTABLEUNION:
Given column name/index does not exist: SQL_COMMAND
At ResultSet.getColumnValue, line 17 position 35

I have tried troubleshooting but to no avail.

1 Like

If you know your way around JavaScript let me know because I would love to not throw this error every time we run it.