distinct
diffit analyse distinct
provides a way to report on rows from a diffit
engine extract that
only appear in either one of the left
or right
target data sources.
diffit analyse distinct usage message.
Usage: diffit analyse distinct [OPTIONS] PARQUET_PATH
Spark DataFrame list rows source data.
╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────────────╮
│ * parquet_path TEXT Path to Spark Parquet: input [required] │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --orientation -O [left|right] Limit analysis orientation to either "left" or "right" │
│ * --key -k TEXT Analysis column to act as a unique constraint [default: None] │
│ [required] │
│ --descending -D Change output ordering to descending │
│ --counts-only -C Only output counts │
│ --hits -H INTEGER Rows to display [default: 20] │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Example: Analyse Rows Unique to Each Spark DataFrame
Note
The following examples source sample diffit
engine output data that can be found at
docker/files/parquet/analysis
.
The key setting, col01, acts as the GROUP BY predicate.
venv/bin/diffit analyse distinct --key col01 docker/files/parquet/analysis
Combined diffit analyse distinct output.
### Analysing distinct rows from "left" source DataFrame
+-----+-----+-----+----------+
|col01|col02|col03|diffit_ref|
+-----+-----+-----+----------+
+-----+-----+-----+----------+
### Analysing distinct rows from "right" source DataFrame
+-----+-----------+-----------+----------+
|col01|col02 |col03 |diffit_ref|
+-----+-----------+-----------+----------+
|9 |col02_val09|col03_val09|right |
+-----+-----------+-----------+----------+
A Diffit extract can be limited with the --orientation
switch. For example, to only show
distinct Diffit extract records from the right
data source: