Building on @anon68878319's post on mismatched Regex jobs, we recently started working on a command-line Python debugging script to help troubleshoot Regex jobs.
This script will show you any mismatches in header columns and header positions, as well as information about files matching your Regex pattern.
It takes one mandatory positional arg, which is the Regex pattern. All flags are described below in the help output (--help).
$ ./regex-tester.py -h
usage: regex-tester [-h] [--encoding ENCODING] [--workdir WORKDIR] [--delimiter DELIMITER] [--json] [--table] [--files] [--meta] [regex]

regex tester

positional arguments:
  regex                 regular expression (use quotes!)

optional arguments:
  -h, --help            show this help message and exit
  --encoding ENCODING, -e ENCODING
                        optional: include file encoding.
                        defaults to utf-8
  --workdir WORKDIR, -w WORKDIR
                        optional: use a working dir other than the current dir
  --delimiter DELIMITER, -d DELIMITER
                        define a delimiter other than `,`
  --json, -j            print json object
  --table, -t           print table
  --files, -f           only show matching files
  --meta, -m            show matching files including meta data
As mentioned, this is a work in progress (still untested on Windows, for example), but anyone running Python 3 is welcome to clone the repo and try it out.
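For anyone curious what a header check like this roughly involves, here is a minimal sketch. It is not the code from the repo: the function names `read_header` and `compare_headers`, the choice to match the pattern against the whole file name, and the use of the first matching file as the reference are all assumptions made for illustration.

```python
import csv
import re
from pathlib import Path


def read_header(path, encoding="utf-8", delimiter=","):
    """Return the first row of a delimited file as a list of column names."""
    with open(path, newline="", encoding=encoding) as handle:
        return next(csv.reader(handle, delimiter=delimiter), [])


def compare_headers(pattern, workdir=".", encoding="utf-8", delimiter=","):
    """Compare headers of all files in workdir whose names match pattern.

    Assumption: the first matching file (sorted by name) is the reference,
    and the pattern must match the entire file name.
    """
    regex = re.compile(pattern)
    files = sorted(
        p for p in Path(workdir).iterdir()
        if p.is_file() and regex.fullmatch(p.name)
    )
    if not files:
        return {}
    reference = read_header(files[0], encoding, delimiter)
    report = {}
    for path in files[1:]:
        header = read_header(path, encoding, delimiter)
        report[path.name] = {
            # Columns in the reference that this file lacks.
            "missing": [c for c in reference if c not in header],
            # Columns this file has that the reference does not.
            "extra": [c for c in header if c not in reference],
            # Shared columns that sit at a different position.
            "moved": [
                c for c in header
                if c in reference and header.index(c) != reference.index(c)
            ],
        }
    return report
```

Calling `compare_headers(r".+clean\.csv", workdir="example-files")` would return one entry per matching file after the first, listing missing, extra, and repositioned columns relative to that first file.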
Show help and exit
$ ./regex-tester.py -h
Show matching files using working dir example-files/
$ ./regex-tester.py ".+clean.csv" -w example-files -f
Show matching files using working dir example-files/, with meta info about file encoding and number of header columns
$ ./regex-tester.py ".+clean.csv" -w example-files -m
Show header names and anything missing using utf-8-sig encoding
$ ./regex-tester.py ".+test-ca.csv" -e utf-8-sig -w example-files
Use a different delimiter
$ ./regex-tester.py ".+clean.csv" -w example-files -d "|"
Show a table with header names and positions
$ ./regex-tester.py ".+clean.csv" -w example-files -d "|" -t
Output a json object and pipe into jq
$ ./regex-tester.py ".+clean.csv" -w example-files -d "|" -j | jq
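On the utf-8-sig example above: a likely reason to reach for that encoding is a byte order mark (BOM) at the start of a file, which some spreadsheet exports add. Here is a small standalone sketch of the effect using only Python's standard library. The sample bytes and the `header` helper are made up for illustration and are not taken from the script.

```python
import csv
import io

# Hypothetical CSV payload that starts with a UTF-8 byte order mark (BOM).
raw = b"\xef\xbb\xbfid,name\n1,alice\n"


def header(data, encoding):
    """Decode the bytes with the given encoding and return the first CSV row."""
    text = io.StringIO(data.decode(encoding), newline="")
    return next(csv.reader(text))


plain = header(raw, "utf-8")         # BOM stays attached to the first column name
stripped = header(raw, "utf-8-sig")  # BOM is dropped while decoding

print(plain)
print(stripped)
```

With plain utf-8 the first column name comes back as `'\ufeffid'` rather than `'id'`, so it would look like a mismatch against a file without a BOM even though the two headers appear identical on screen.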
There are setup and usage directions in the README.