Mitto Regex job - Debugging with Python

Building on @anon68878319's post on mismatched Regex jobs, we recently started working on a command-line Python script to help debug Regex jobs.

The script shows any mismatches in header column names and positions, as well as information about the files that match your Regex pattern.
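The core comparison can be sketched with Python's standard `csv` module. This is a minimal illustration, not the script's actual code. It assumes every matching file is a delimited text file whose first row is the header and treats the first file as the reference. The helper names `read_header` and `compare_headers` are hypothetical.

```python
import csv
from pathlib import Path


def read_header(path, encoding="utf-8", delimiter=","):
    """Return the first row of a delimited file as a list of column names."""
    with open(path, newline="", encoding=encoding) as f:
        return next(csv.reader(f, delimiter=delimiter), [])


def compare_headers(paths, encoding="utf-8", delimiter=","):
    """Compare each file's header with the first file's header (hypothetical helper).

    Returns {file_name: {"missing": [...], "extra": [...], "moved": [...]}}, where
    "moved" lists shared columns that sit at a different position.
    """
    headers = {Path(p).name: read_header(p, encoding, delimiter) for p in paths}
    reference = next(iter(headers.values()), [])
    report = {}
    for name, header in headers.items():
        report[name] = {
            "missing": [c for c in reference if c not in header],
            "extra": [c for c in header if c not in reference],
            "moved": [
                c for c in header
                if c in reference and header.index(c) != reference.index(c)
            ],
        }
    return report
```

Duplicate column names are only compared by their first occurrence here, which a real tool would need to handle more carefully.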

It takes one mandatory positional argument, the Regex pattern. All flags are described in the help output (--help), shown below:

$ ./regex-tester.py -h
usage: regex-tester [-h] [--encoding ENCODING] [--workdir WORKDIR] [--delimiter DELIMITER] [--json] [--table] [--files] [--meta] [regex]

regex tester

positional arguments:
  regex                 regular expression (use quotes!)

optional arguments:
  -h, --help            show this help message and exit
  --encoding ENCODING, -e ENCODING
                        optional: include file encoding.
                        defaults to utf-8
  --workdir WORKDIR, -w WORKDIR
                        optional: use a working dir other than the current dir
  --delimiter DELIMITER, -d DELIMITER
                        define a delimiter other than `,`
  --json, -j            print json object
  --table, -t           print table
  --files, -f           only show matching files
  --meta, -m            show matching files including meta data
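The help screen above has the shape that Python's `argparse` module prints. The parser below is a hypothetical reconstruction from that help text, not code from the repo. The defaults for `--workdir` (the current directory) and `--delimiter` (a comma) are inferred from the help strings. `nargs="?"` is what makes `argparse` print the pattern as `[regex]` in the usage line.

```python
import argparse


def build_parser():
    """Hypothetical argparse setup mirroring the flags in the help output."""
    parser = argparse.ArgumentParser(prog="regex-tester", description="regex tester")
    parser.add_argument("regex", nargs="?", help="regular expression (use quotes!)")
    parser.add_argument("--encoding", "-e", default="utf-8",
                        help="optional: include file encoding. defaults to utf-8")
    parser.add_argument("--workdir", "-w", default=".",
                        help="optional: use a working dir other than the current dir")
    parser.add_argument("--delimiter", "-d", default=",",
                        help="define a delimiter other than `,`")
    # Boolean switches: present means True, absent means False.
    parser.add_argument("--json", "-j", action="store_true", help="print json object")
    parser.add_argument("--table", "-t", action="store_true", help="print table")
    parser.add_argument("--files", "-f", action="store_true",
                        help="only show matching files")
    parser.add_argument("--meta", "-m", action="store_true",
                        help="show matching files including meta data")
    return parser
```

For example, `build_parser().parse_args([".+clean.csv", "-w", "example-files", "-t"])` returns a namespace with `table=True` and `json=False`.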

As mentioned, this is a work in progress (it is still untested on Windows, for example), but anyone running Python 3 is welcome to clone the repo and try it out.

Show help and exit
$ ./regex-tester.py -h

Show matching files using working dir example-files/
$ ./regex-tester.py ".+clean.csv" -w example-files -f
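File matching can be sketched as a directory listing filtered by the pattern. This minimal sketch assumes the pattern is applied to bare file names with `re.fullmatch`. The script may instead use `re.search` or match full paths, and `matching_files` is a hypothetical name.

```python
import re
from pathlib import Path


def matching_files(pattern, workdir="."):
    """List file names in workdir that fully match pattern (assumed matching rule)."""
    regex = re.compile(pattern)
    return sorted(
        p.name for p in Path(workdir).iterdir()
        if p.is_file() and regex.fullmatch(p.name)
    )
```

Because `.` matches any character, `".+clean.csv"` also matches a name such as `mar-clean_csv`. Escaping the dot (`".+clean\.csv"`) restricts it to a literal `.csv` extension.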

Show matching files using working dir example-files/ with meta info about file encoding and number of header columns
$ ./regex-tester.py ".+clean.csv" -w example-files -m
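A column-count check is a quick way to spot the odd file out before comparing names. The sketch below is hypothetical, and `header_column_counts` and `odd_ones_out` are illustrative names. It reads each header with the encoding you pass in rather than detecting one. It then flags files whose count differs from the most common count.

```python
import csv
from collections import Counter
from pathlib import Path


def header_column_counts(paths, encoding="utf-8", delimiter=","):
    """Map each file name to the number of columns in its header row."""
    counts = {}
    for p in paths:
        with open(p, newline="", encoding=encoding) as f:
            counts[Path(p).name] = len(next(csv.reader(f, delimiter=delimiter), []))
    return counts


def odd_ones_out(counts):
    """Return files whose column count differs from the most common count."""
    if not counts:
        return []
    usual, _ = Counter(counts.values()).most_common(1)[0]
    return sorted(name for name, n in counts.items() if n != usual)
```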

Show header names and any missing header columns, using utf-8-sig encoding
$ ./regex-tester.py ".+test-ca.csv" -e utf-8-sig -w example-files
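Why utf-8-sig matters: files saved by Excel as "CSV UTF-8" (and by some other Windows tools) start with a UTF-8 byte order mark (BOM). Reading such a file as plain `utf-8` keeps the BOM as an invisible `\ufeff` character on the first header name. A column that looks like `id` then no longer compares equal to `id`. Python's `utf-8-sig` codec strips the BOM. The sample bytes below are made up for illustration.

```python
import csv
import io

# A header row as written by a tool that prepends a UTF-8 BOM (bytes EF BB BF).
raw = b"\xef\xbb\xbfid,name\n1,Ada\n"

with_utf8 = next(csv.reader(io.StringIO(raw.decode("utf-8"))))
with_sig = next(csv.reader(io.StringIO(raw.decode("utf-8-sig"))))

print(with_utf8)  # ['\ufeffid', 'name']
print(with_sig)   # ['id', 'name']
```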

Use a different delimiter
$ ./regex-tester.py ".+clean.csv" -w example-files -d "|"
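The `-d` value is quoted because `|` is the pipe operator in the shell. Unquoted, the command line would be split into a pipeline. In Python, a custom delimiter is usually handed to the standard `csv` module, as in this small sketch with an invented sample row. Commas inside fields are then left alone. Note that `csv` only accepts single-character delimiters; whether the script itself uses `csv` is an assumption.

```python
import csv
import io

sample = "id|name|note\n1|Ada|uses, commas\n"

# Split on "|" instead of the default ","; the comma in the last field survives.
rows = list(csv.reader(io.StringIO(sample), delimiter="|"))
print(rows[0])  # ['id', 'name', 'note']
print(rows[1])  # ['1', 'Ada', 'uses, commas']
```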

Show a table with header names and positions
$ ./regex-tester.py ".+clean.csv" -w example-files -d "|" -t
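A table like this lines up each file's header by position, which makes shifted columns easy to spot. The exact layout of the script's -t output isn't shown in this post, so the following is only a plain-text approximation. `header_table` is a hypothetical helper that takes already-parsed headers and pads shorter ones with `-`.

```python
from itertools import zip_longest


def header_table(headers):
    """Render {file_name: [column, ...]} as a plain-text table, one row per position."""
    rows = [["pos", *headers]]
    # zip_longest walks all headers in parallel, filling gaps with "-".
    for pos, cols in enumerate(zip_longest(*headers.values(), fillvalue="-")):
        rows.append([str(pos), *cols])
    widths = [max(len(r[i]) for r in rows) for i in range(len(rows[0]))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(r, widths)).rstrip() for r in rows
    )


print(header_table({"a.csv": ["id", "name", "email"], "b.csv": ["id", "email"]}))
```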

Output a JSON object and pipe it into jq
$ ./regex-tester.py ".+clean.csv" -w example-files -d "|" -j | jq
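JSON output is what makes piping into jq possible. The JSON layout of -j isn't documented here, so the structure below is purely hypothetical. It only shows the general pattern of serialising a per-file report with the standard `json` module so that another tool can consume it.

```python
import json

# Hypothetical report shape; the real script's JSON layout may differ.
report = {
    "pattern": ".+clean.csv",
    "files": {
        "jan-clean.csv": {"columns": ["id", "name", "email"]},
        "feb-clean.csv": {"columns": ["id", "email"]},
    },
}

# indent=2 gives readable output; jq would re-format it anyway.
print(json.dumps(report, indent=2))
```

For a document shaped like this, a filter such as `jq '.files | keys'` would list the file names.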

There are setup and usage directions in the README.
