Mitto v2.9 introduces two new transforms for IO jobs:
- IncludeTransform: specify the exact columns to include from the data piped from the input.
- ExcludeTransform: specify the exact columns to exclude from the data piped from the input.
Potential use cases:
- Narrow down a wide dataset when you know you don’t need all the fields
- Remove fields with personally identifiable information
As these are transforms and part of the steps of an IO job, they work with any input (e.g. APIs, databases, and files).
IncludeTransform Example
Let’s take a simple CSV as an example:
id,name
1,Justin
2,Matt
A standard IO job using this CSV as an input and a database table as an output would result in this database table:
__index__ | id | name |
---|---|---|
1 | 1 | Justin |
2 | 2 | Matt |
By adding the IncludeTransform and specifying only id in the keys, the output table becomes this:
id |
---|
1 |
2 |
Only the id column is included.
Here’s the job’s modified “input” step:
"steps": [
    {
        "use": "mitto.iov2.steps#Input",
        "transforms": [
            {
                "use": "mitto.iov2.transform#IncludeTransform",
                "keys": [
                    "id"
                ]
            },
            {
                "use": "mitto.iov2.transform#ExtraColumnsTransform"
            },
            {
                "use": "mitto.iov2.transform#ColumnsTransform"
            }
        ]
    },
    ...
]
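To make the effect of keys concrete, here is a minimal Python sketch of the include behavior applied to the sample CSV. It is not Mitto's implementation: the include_keys helper and the in-memory CSV_TEXT are assumptions made purely for illustration, and Mitto's handling of the __index__ column is not modeled.

```python
import csv
import io

# The sample CSV from this post, held in memory for the sketch.
CSV_TEXT = "id,name\n1,Justin\n2,Matt\n"


def include_keys(records, keys):
    """Keep only the listed keys in each record.

    Hypothetical helper that mimics the idea of an include filter;
    it is not part of Mitto.
    """
    wanted = set(keys)
    return [{k: v for k, v in rec.items() if k in wanted} for rec in records]


# Each CSV row becomes a dict keyed by column name.
records = list(csv.DictReader(io.StringIO(CSV_TEXT)))
included = include_keys(records, ["id"])
print(included)  # [{'id': '1'}, {'id': '2'}]
```

Every column not named in keys is dropped, which is why the output table above contains id alone.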
ExcludeTransform Example
Using the same CSV, adding the ExcludeTransform and specifying only name in the keys produces this output table:
__index__ | id |
---|---|
1 | 1 |
2 | 2 |
The name column is excluded.
Here’s the job’s modified “input” step:
"steps": [
    {
        "use": "mitto.iov2.steps#Input",
        "transforms": [
            {
                "use": "mitto.iov2.transform#ExcludeTransform",
                "keys": [
                    "name"
                ]
            },
            {
                "use": "mitto.iov2.transform#ExtraColumnsTransform"
            },
            {
                "use": "mitto.iov2.transform#ColumnsTransform"
            }
        ]
    },
    ...
]
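The exclude behavior can be sketched the same way, here framed around the use case of removing a field with personally identifiable information. Again, this is an illustrative Python sketch rather than Mitto's code: exclude_keys and CSV_TEXT are hypothetical names, and the __index__ column that Mitto keeps in the output table is not modeled.

```python
import csv
import io

# The sample CSV from this post, held in memory for the sketch.
CSV_TEXT = "id,name\n1,Justin\n2,Matt\n"


def exclude_keys(records, keys):
    """Drop the listed keys from each record and keep everything else.

    Hypothetical helper that mimics the idea of an exclude filter;
    it is not part of Mitto.
    """
    unwanted = set(keys)
    return [{k: v for k, v in rec.items() if k not in unwanted} for rec in records]


# Each CSV row becomes a dict keyed by column name.
records = list(csv.DictReader(io.StringIO(CSV_TEXT)))
excluded = exclude_keys(records, ["name"])
print(excluded)  # [{'id': '1'}, {'id': '2'}]
```

An exclude list tends to suit wide datasets where only a few fields must go, while an include list is the safer choice when new, unreviewed columns may appear in the input later.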