Mitto Store Jobs - How do I Extract 2 Level JSON Node to Database using PluckV2 Transform?

Consider the following JSON. It is an anonymized extract from Shopify. It represents an order placed on the Shopify platform.

The use case is to obtain the tax_line information and associate it to the parent order_id of 6444275957783and the line item IDs of 6444275990551 and 6444275957783.

{
	"id": 2983914373143,
	"email": "Test@gmail.com",
	"closed_at": "2021-01-02T14:50:01-06:00",
	"created_at": "2021-01-02T14:49:59-06:00",
	"updated_at": "2021-01-02T14:55:01-06:00",
	"line_items": [{
			"id": 6444275957783,
			"quantity": 1,
			"product_id": 4498623,
			"name": "Test Product 1",
			"grams": 9,
			"price": "48.00",
			"total_discount": "0.00",
			"pre_tax_price": "48.00"
		},
		{
			"tax_lines": [{
					"title": "TN COUNTY TAX",
					"price": "1.08",
					"rate": 0.0225
				},
				{
					"title": "TN STATE TAX",
					"price": "3.36",
					"rate": 0.07
				},
				{
					"title": "TN CITY TAX",
					"price": "0.00",
					"rate": 0
				}
			]
		},
		{
			"id": 6444275990551,
			"quantity": 1,
			"product_id": 1580599,
			"name": "Test Product 2",
			"grams": 4,
			"price": "52.00",
			"total_discount": "0.00"
		},
		{
			"tax_lines": [{
					"title": "STATE TAX",
					"price": "3.64",
					"rate": 0.07
				},
				{
					"title": "CITY TAX",
					"price": "0.00",
					"rate": 0
				},
				{
					"title": "COUNTY TAX",
					"price": "1.17",
					"rate": 0.0225
				}
			]
		}
	]
}

Using Mitto Store jobs it is straightforward to get the top level Order information. Contained within the JSON is the second level “line_items” node. It is also straightforward to get this information using a standard Mitto Store job.

The “line_items” node is referenced in the jpath and the parent order_id and order_updated_at fields are referenced in the “members” section of the “input” node. The orders__line_items table will be output with extra order_id and order_updated_at fields so that the child line_items can be linked up to the parent id.

{
   "input":{
      "jpath":"$.line_items[*]",
      "members":[
         {
            "name":"order_id",
            "value":"$.id"
         },
         {
            "name":"order_updated_at",
            "value":"$.updated_at"
         }
      ],
      "name":"shopify_orders",
      "use":"mitto.iov2.input#StoreInput"
   },
   "output":{
      "dbo":"postgresql://localhost/analytics",
      "schema":"shopify",
      "tablename":"orders__line_items",
      "use":"call:mitto.iov2.db#todb"
   },
   "steps":[
      {
         "column":"order_updated_at",
         "use":"mitto.iov2.steps#MaxTimestamp"
      },
      {
         "use":"mitto.iov2.steps.upsert#SetUpdatedAt"
      },
      {
         "transforms":[
            {
               "use":"mitto.iov2.transform#ExtraColumnsTransform"
            },
            {
               "use":"mitto.iov2.transform#ColumnsTransform"
            }
         ],
         "use":"mitto.iov2.steps#Input"
      },
      {
         "use":"mitto.iov2.steps#CreateTable"
      },
      {
         "use":"mitto.iov2.steps.upsert#CreateTempTable"
      },
      {
         "transforms":[
            {
               "use":"mitto.iov2.transform#FlattenTransform"
            }
         ],
         "use":"mitto.iov2.steps#Output"
      },
      {
         "key":"id",
         "use":"mitto.iov2.steps.upsert#SyncTempTable"
      },
      {
         "use":"mitto.iov2.steps#CollectMeta"
      }
   ]
}

This will create a shopify.orders__line_items table that will contain the following information:

Extracting the “tax_lines” Node

However for each of the line_item nodes there is a third le. In this example we will look at the “tax_lines” node. For each line item these is an individual tax charged. To get this third level of information out of the JSON and to be able to reference the tax_line to the parent line_item, the Pluck V2 method needs to be used.

To set this up we again use the “use”: “mitto.iov2.input#StoreInput” input method.

"input":{
   "jpath":"$.line_items[*]",
   "members":[
      {
         "name":"order_id",
         "value":"$.id"/*Order ID 2983914373143*/
      },
      {
         "name":"order_updated_at",
         "value":"$.updated_at""/*Updated at 2021-01-02T14":"55":"01-06":00*/
      }
   ],
   "name":"shopify_orders",
   "use":"mitto.iov2.input#StoreInput"
},

The “members” section of the input node specifies our grandparent IDs (and any other important fields, in this case, “updated_at”). These are highest level fields in the JSON.

These specified fields are then referenced in the “transforms” node below.

The “members” node in the “transforms” node makes reference to the “order_id” and “order_updated_at” that was created in the “input” “member” node.

Then a new reference to the line item id (from $.line_items[*].$.id) is then made.


{
    "input": {
        "jpath": "$.line_items[*]",
        "members": [
            {
                "name": "order_id",
                "value": "$.id"
            },
            {
                "name": "order_updated_at",
                "value": "$.updated_at"
            }
        ],
        "name": "shopify_orders",
        "use": "mitto.iov2.input#StoreInput"
    },
    "output": {
        "dbo": "postgresql://localhost/analytics",
        "schema": "shopify",
        "tablename": "orders__line_items__tax_lines2",
        "use": "call:mitto.iov2.db#todb"
    },
    "steps": [
        {
            "transforms": [
                {
                    "jpath": "$.tax_lines[*]",
                    "members": [
                        {/* Reference to the input member node above*/
                            "name": "order_id", 
                            "value": "$.order_id"
                        },
                        {/* Reference to the input member node above*/
                            "name": "order_updated_at",
                            "value": "$.order_updated_at"
                        },
                        {/* New Reference the line item id ie $.line_items[*].$.id*/
                            "name": "order_line_item_id",
                            "value": "$.id"
                        }
                    ],
                    "use": "mitto.iov2.transform#PluckV2Transform"
                },
                {
                    "use": "mitto.iov2.transform#ExtraColumnsTransform"
                },
                {
                    "use": "mitto.iov2.transform#ColumnsTransform"
                }
            ],
            "use": "mitto.iov2.steps#Input"
        },
        {
            "use": "mitto.iov2.steps#CreateTable"
        },
        {
            "transforms": [
                {
                    "use": "mitto.iov2.transform#FlattenTransform"
                }
            ],
            "use": "mitto.iov2.steps#Output"
        },
        {
            "use": "mitto.iov2.steps#CollectMeta"
        }
    ]
}

The output of this table is as follows:

1 Like

It may also be worth noting that when using a Pluck Transform, you can actually include any necessary members from the store in the output table within the members set of the pluck.

For example, I have 2 fields that are being returned from an API as an array but will never have more than 1 value (custom_field_1, custom_field_2). We can create a store as an input job that targets the top level of data and use a Pluck Transform to extract any child members within the results set.

Data

{
   "id": "123",
   "updated_at":"2022-12-21",
   "fields":  {
        "date": "2022-12-01",
         "custom_field_1": [
              "United States"
          ],
         "custom_field_2": [
              "Marketing"
          ]
   }
}

Because the 2 custom fields are nested arrays, Mitto will not automatically pull them out.
With a combination of SDLs, Pluck Transform, and Ignores, we can generate a child table with the custom fields and the primary id to be joined back to the parent table:

Steps:

steps: [
    {
      use: mitto.iov2.steps#Input
      transforms: [
        {
          jpath: $.fields.custom_field_1[0]    # get the first value of the custom field for every record
          members: [
            {
              name: parent_id
              value: $.id
            }
            {
              name: department
              value: $.fields.custom_field_2[0]    # get the first value of the custom field for every record
            }
          ]
          use: mitto.iov2.transform#PluckV2Transform
        }
        {
          ignores: [
            $.__value__    # this field is a duplicate of location so we can ignore it
            $.__index__    # unnecessary attempt to index the table that can be ignored
          ]
          use: mitto.iov2.transform#ExtraColumnsTransform
        }
        {
          use: mitto.iov2.transform#ColumnsTransform
        }
      ]
    }
    {
      use: mitto.iov2.steps#CreateTable
    }
    {
      use: mitto.iov2.steps#Output
      transforms: [
        {
          use: mitto.iov2.transform#FlattenTransform
        }
      ]
    }
    {
      use: mitto.iov2.steps#CollectMeta
    }
  ]

We also need an SDL to modify the name of the custom_field_1 column:

sdl: {
    columns: [
      {
        __field__: __value__
        __jpath__: $.__value__
        length: 20
        name: location
        type: String
      }
    ]
  }

Parent table:

id updated_at fields__date
123 2022-12-21 2022-12-01

Child table:

parent_id location department
123 United States Marketing

You can now easily build a relationship between the tables using id = parent_id

1 Like