Before Mitto v2.9, IO jobs could output data from APIs, databases, and files to relational databases (the typical behavior) and to delimited flat files (e.g. CSV, TSV).
Mitto v2.9 introduces two new file outputs for IO jobs: JSON and JSON lines.
This introduces several new use cases for IO jobs:
- Output raw API data directly to JSON or JSON lines. This is especially useful when exploring data from a new API, because it helps you understand the structure of potentially nested data.
- Output database data directly to JSON or JSON lines
- Convert files (e.g. CSV, TSV, JSON, JSON lines) to JSON or JSON lines
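For readers new to the two formats, the following Python sketch illustrates how they differ. It is not Mitto code, and the sample records are hypothetical rather than taken from any Zuar data set. A JSON file holds one document (here, a single array), while a JSON lines file holds one JSON object per line.

```python
import json

# Hypothetical sample records with a nested field (not real Zuar data).
records = [
    {"name": "Rex", "species": "dog", "tags": {"color": "brown"}},
    {"name": "Tom", "species": "cat", "tags": {"color": "gray"}},
]


def to_json(rows):
    """Serialize all rows as one JSON document (a single array)."""
    return json.dumps(rows, indent=2)


def to_jsonl(rows):
    """Serialize each row as its own JSON object, one object per line."""
    return "\n".join(json.dumps(row) for row in rows) + "\n"


def from_jsonl(text):
    """Parse JSON lines text back into a list of rows, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    print(to_json(records))
    print(to_jsonl(records), end="")
```

Because every line of a JSON lines file is a complete object, such a file can be read or appended to one record at a time, whereas a JSON file generally has to be parsed as a whole.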
Example Use Case
This example demonstrates using Mitto to download a .json and a .jsonl file from a public GitHub repository with a Mitto curl job, piping that data through Mitto, and outputting the data as a JSON or JSON lines file.
curl job
Here are the two files we will be downloading with Mitto:
- zuar_pets.json (zuarbase/data on GitHub, master branch)
- zuar_pets.jsonl (zuarbase/data on GitHub, master branch)
Here are the two curl job configs:
{
  url: https://raw.githubusercontent.com/zuarbase/data/master/zuar_pets.json
  args: [
    -s
    -b
    /tmp/cookies
    -L
    -O
    -f
  ]
}
{
  url: https://raw.githubusercontent.com/zuarbase/data/master/zuar_pets.jsonl
  args: [
    -s
    -b
    /tmp/cookies
    -L
    -O
    -f
  ]
}
End result: Two new files in Mitto’s file manager.
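The flags in the curl job configs are standard curl options. The Python sketch below annotates each one and assembles an equivalent command line. How Mitto invokes curl internally is not described here, so the `build_curl_command` helper and the argument order it produces are assumptions made for illustration only.

```python
import shlex

# Meaning of each curl flag used in the job configs (standard curl options).
FLAG_NOTES = {
    "-s": "silent mode: no progress meter",
    "-b": "read cookies from the file named by the next argument",
    "-L": "follow HTTP redirects",
    "-O": "save the download under its remote file name",
    "-f": "fail on HTTP errors instead of saving the error page",
}


def build_curl_command(config):
    """Return an argv list for curl. Hypothetical helper, not Mitto's code."""
    return ["curl", *config["args"], config["url"]]


config = {
    "url": "https://raw.githubusercontent.com/zuarbase/data/master/zuar_pets.json",
    "args": ["-s", "-b", "/tmp/cookies", "-L", "-O", "-f"],
}

command = build_curl_command(config)

if __name__ == "__main__":
    # Print the command as it could be typed in a shell.
    print(shlex.join(command))
```

The `-O` flag is why the downloaded files keep the names zuar_pets.json and zuar_pets.jsonl.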
IO job - JSON input with JSON output
Here’s the IO job that takes the zuar_pets.json file, pipes it through Mitto, and outputs it as zuar_pets_tojson.json:
{
  input: {
    use: flatfile.iov2#JsonInput
    source: /var/mitto/data/zuar_pets.json
  }
  output: {
    path: /var/mitto/data/zuar_pets_tojson.json
    use: call:mitto.iov2#tojson
  }
  steps: [
    {
      transforms: [
        {
          use: mitto.iov2.transform#ExtraColumnsTransform
          rename_columns: false
          include_empty_columns: true
          include_nested_json: true
        }
      ]
      use: mitto.iov2.steps#Input
    }
    {
      transforms: [
        {
          use: mitto.iov2.transform#FlattenTransform
        }
      ]
      use: mitto.iov2.steps#Output
    }
  ]
}
Two critical job config pieces here:
- The output's use references the new tojson code, and the path references the output file Mitto will create.
- The ExtraColumnsTransform transform step includes a new parameter, include_nested_json: true.
End result - We end up with the exact same JSON file we started with as a new file.
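One way to check this result outside Mitto is to compare the parsed contents of the input and output files. The Python sketch below assumes that "the same" means equal parsed JSON values rather than byte-identical files; the `same_json` helper is hypothetical, and the paths in the usage note are the ones from the job config above.

```python
import json


def same_json(path_a, path_b):
    """Return True when both files parse to equal JSON values.

    Whitespace and object key order are ignored; array order is not.
    """
    with open(path_a, encoding="utf-8") as a, open(path_b, encoding="utf-8") as b:
        return json.load(a) == json.load(b)
```

For example, `same_json("/var/mitto/data/zuar_pets.json", "/var/mitto/data/zuar_pets_tojson.json")` would be expected to return True if the job reproduced the input data.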
IO job - JSON lines input with JSON lines output
Here’s the IO job that takes the zuar_pets.jsonl file, pipes it through Mitto, and outputs it as zuar_pets_tojsonl.jsonl:
{
  input: {
    use: flatfile.iov2#JsonlInput
    source: /var/mitto/data/zuar_pets.jsonl
  }
  output: {
    path: /var/mitto/data/zuar_pets_tojsonl.jsonl
    use: call:mitto.iov2#tojsonl
  }
  steps: [
    {
      transforms: [
        {
          use: mitto.iov2.transform#ExtraColumnsTransform
          rename_columns: false
          include_empty_columns: true
          include_nested_json: true
        }
      ]
      use: mitto.iov2.steps#Input
    }
    {
      transforms: [
        {
          use: mitto.iov2.transform#FlattenTransform
        }
      ]
      use: mitto.iov2.steps#Output
    }
  ]
}
Differences here:
- The input's use is JsonlInput instead of JsonInput.
- The output's use is tojsonl instead of tojson.
- The output's path ends in jsonl instead of json.
End result - We end up with the exact same JSON lines file we started with as a new file.
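A similar check can be sketched for JSON lines files by comparing them record by record. As before, this is an illustration under the assumption that "the same" means equal parsed records in the same order; `read_jsonl` and `same_jsonl` are hypothetical helpers, not part of Mitto.

```python
import json
from itertools import zip_longest


def read_jsonl(path):
    """Yield one parsed record per non-empty line of a JSON lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


def same_jsonl(path_a, path_b):
    """Return True when both files hold equal records in the same order."""
    missing = object()  # sentinel that never equals a parsed record
    pairs = zip_longest(read_jsonl(path_a), read_jsonl(path_b), fillvalue=missing)
    return all(a == b for a, b in pairs)
```

Because the files are read one line at a time, this comparison does not need to load either file fully into memory, which is one practical reason to prefer JSON lines for large outputs.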