Is there a way to exclude the index columns from IO Jobs?

pmcgavick · November 2, 2023, 3:09pm

We have a number of regex jobs running that pull from a growing number of different CSV files. We try to avoid duplicate data as much as we can, but we often end up with repeated rows. The regex job should remove these in its union step, but because it also imports the index of each row from the CSV it came from, otherwise identical rows no longer match and are not eliminated by the union. Is there a way within the regex IO job to tell it not to generate the index column? I couldn't find the answer in the documentation, so I came here.
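To make the problem concrete, here is a minimal Python sketch of why an index column defeats de-duplication. It assumes rows arrive as dicts and that the union behaves like a set-style "distinct" union. The column name __index__ is taken from the reply in this thread, but the sample data and the union_distinct helper are hypothetical and are not how Mitto implements its union.

```python
# Hypothetical rows from two CSV files. "__index__" stands in for the
# per-file row index column (an assumption about how it is populated).
file_a = [
    {"__index__": 0, "id": "1", "name": "alice"},
    {"__index__": 1, "id": "2", "name": "bob"},
]
file_b = [
    {"__index__": 0, "id": "3", "name": "carol"},
    {"__index__": 1, "id": "1", "name": "alice"},  # same data as file_a row 0
]


def union_distinct(*tables):
    """Set-style union: keep the first occurrence of each distinct row."""
    seen = set()
    out = []
    for table in tables:
        for row in table:
            key = tuple(sorted(row.items()))  # whole row, index included
            if key not in seen:
                seen.add(key)
                out.append(row)
    return out


merged = union_distinct(file_a, file_b)
print(len(merged))  # 4: the two "alice" rows differ only in __index__
```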

greg.rossi · November 17, 2023, 4:27pm

Hey @pmcgavick, there is a pretty simple way to ignore the index column. In the job configuration, under the input step, you can add an ignores parameter to the step that uses mitto.iov2.transform#ExtraColumnsTransform. It should look something like this:

       {
         ignores: [
           $.__index__
         ],
         use: mitto.iov2.transform#ExtraColumnsTransform
       }
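As a rough mental model of what the ignores setting achieves, the Python sketch below drops the ignored column from every row before the union, after which the repeated rows collapse. This is only an illustration under the same assumptions as a plain dict-per-row model: drop_ignored and union_distinct are hypothetical helpers, and matching plain column names here is a simplification of the $.__index__ path used in the real config.

```python
# Hypothetical rows from two CSV files, with a per-file "__index__" column.
file_a = [
    {"__index__": 0, "id": "1", "name": "alice"},
    {"__index__": 1, "id": "2", "name": "bob"},
]
file_b = [
    {"__index__": 0, "id": "3", "name": "carol"},
    {"__index__": 1, "id": "1", "name": "alice"},
]


def drop_ignored(rows, ignores):
    """Return copies of rows without the ignored column names."""
    return [{k: v for k, v in row.items() if k not in ignores} for row in rows]


def union_distinct(*tables):
    """Set-style union: keep the first occurrence of each distinct row."""
    seen = set()
    out = []
    for table in tables:
        for row in table:
            key = tuple(sorted(row.items()))
            if key not in seen:
                seen.add(key)
                out.append(row)
    return out


ignores = {"__index__"}
merged = union_distinct(drop_ignored(file_a, ignores), drop_ignored(file_b, ignores))
print(len(merged))  # 3: alice, bob, carol
```

With the index column removed, the two "alice" rows are identical, so the union keeps only one of them.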

Let me know how this works, or if you need further assistance!