
Commit 5a8191e: docs update
Parent: aa5ca0a

5 files changed: 85 additions & 43 deletions


checks.yaml (8 additions, 12 deletions)

```diff
@@ -1,38 +1,34 @@
 version: "1"
 validations:
+  # https://clickhouse.com/docs/getting-started/example-datasets/nyc-taxi
   - dataset: ch@[nyc_taxi.trips_small]
+    # common pre-filter for every check, e.g. to run daily check only for yesterday
     where: "pickup_datetime > '2014-01-01'"
     checks:
       - id: row_count > 0
-        description: "data is present" # optional
+        description: "data should be present" # optional
         on_fail: error # optional (error, warn), default "error"

       - id: row_count between 100 and 30000
-        description: "data is not too big"
-        on_fail: error
+        description: "expected rows count"
+        on_fail: warn

       - id: null_count(pickup_ntaname) == 0
-        description: "no nulls in column" # optional
-        on_fail: error
+        description: "no nulls are allowed in column: pickup_ntaname"

       - id: min(pickup_datetime) < now() - interval 3 day
-        description: "min check"
-        on_fail: error
+        description: "min(pickup_datetime) should not be earlier than 3 days"

       - id: stddevPop(trip_distance) < 100_000
         description: "check stddev value"
-        on_fail: error

       - id: sum(fare_amount) <= 10_000_000
         description: "sum of value"
-        on_fail: error

       - id: countIf(trip_id == 1) == 1
         description: "check trip id"
-        on_fail: warn

       - id: raw_query
-        description: "some raw query description here"
-        on_fail: error
+        description: "raw query quality test"
         query: |
           select countIf(trip_distance == 0) > 0 from {{table}} where 1=1
```
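The `raw_query` check above embeds a `{{table}}` placeholder that is filled in with the dataset's table name before the query runs. As a rough illustration of that substitution (not dbqctl's actual implementation; the helper name and table literal here are hypothetical), a plain string replacement is enough:

```go
package main

import (
	"fmt"
	"strings"
)

// renderRawQuery fills the {{table}} placeholder of a raw_query check.
// Sketch only: dbqctl's real rendering logic may differ.
func renderRawQuery(query, table string) string {
	return strings.ReplaceAll(query, "{{table}}", table)
}

func main() {
	q := "select countIf(trip_distance == 0) > 0 from {{table}} where 1=1"
	// Prints the query with the placeholder resolved to the dataset's table.
	fmt.Println(renderRawQuery(q, "nyc_taxi.trips_small"))
}
```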

cmd/root.go (2 additions, 2 deletions)

```diff
@@ -25,8 +25,8 @@ import (
 var verbose bool

 var rootCmd = &cobra.Command{
-	Use:   "dbq",
-	Short: "dbq is a CLI tool for profiling data and running quality checks across various data sources",
+	Use:   "dbqctl",
+	Short: "dbqctl is a CLI tool for profiling data and running quality checks across various data sources",
 }

 func Execute() {
```

cmd/version.go (1 addition, 1 deletion)

```diff
@@ -28,7 +28,7 @@ const (
 func NewVersionCommand() *cobra.Command {
 	cmd := &cobra.Command{
 		Use:   "version",
-		Short: "Prints dbq version",
+		Short: "Prints dbqctl and core lib version",
 		Run: func(cmd *cobra.Command, args []string) {
 			fmt.Printf("DataBridge Quality CLI: %s\n", DbqCtlVersion)
 			fmt.Printf("DataBridge dbqcore lib version: %s\n", dbqcore.GetDbqCoreLibVersion())
```

go.mod (1 addition, 1 deletion)

```diff
@@ -10,7 +10,7 @@ require (
 	gopkg.in/yaml.v3 v3.0.1
 )

-//replace github.com/DataBridgeTech/dbqcore => ../dbqcore
+replace github.com/DataBridgeTech/dbqcore => ../dbqcore

 require (
 	github.com/ClickHouse/ch-go v0.65.1 // indirect
```
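Uncommenting this `replace` directive makes `go build` resolve dbqcore from a sibling checkout instead of the published module, a common setup while co-developing a library and its CLI. It assumes a `../dbqcore` directory exists on every machine that builds the module, so it is typically reverted before tagging a release. A sketch of the implied workspace layout (directory names beyond `../dbqcore` are assumptions):

```
workspace/
├── dbqcore/    # local library checkout, used via the replace directive
└── dbqctl/     # this repository
    └── go.mod  # replace github.com/DataBridgeTech/dbqcore => ../dbqcore
```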

readme.md (73 additions, 27 deletions)

````diff
@@ -3,20 +3,22 @@
 `dbqctl` is a free, open-source data quality CLI checker that provides a set of tools to profile, validate and test data in your data warehouse or databases.
 It is designed to be flexible, fast, easy to use and integrate seamlessly into your existing workflow.

-## Features
+---

-data profiling
+## Features

-v1 supported checks
----
-row_count > 10
-null_count(col) == 0
-avg(col) <= 24.2
-max(col) < 1000
-min(col) == 0
-sum(col) > 0
-stddevPop(col) between 1 and 100_000_000
-custom
+- Effortless dataset import: pull in datasets (e.g. tables) from your chosen DWH with filters
+- Comprehensive data profiling: gain instant insights into your data with automatic profiling, including:
+  - Data Types
+  - Null & Blank Counts
+  - Min/Max/Avg Values
+  - Standard Deviation
+  - Most Frequent Values
+- Data quality checks with built-in support for:
+  - Row Count
+  - Null Count
+  - Average, Max, Min, Sum
+- Flexible custom SQL checks: you can define and run your own SQL-based quality rules to meet unique business requirements.

 ## Supported databases
 - [ClickHouse](https://clickhouse.com/)
@@ -25,11 +27,11 @@ custom

 ### Installation

-Download the latest binaries from [GitHub Releases](https://github.com/DataBridgeTech/dbq/releases).
+Download the latest binaries from [GitHub Releases](https://github.com/DataBridgeTech/dbqctl/releases).

 ### Configuration

-Create dbq configuration file (default lookup directory is $HOME/.dbq.yaml or ./dbq.yaml). Alternatively,
+Create `dbqctl` configuration file (default lookup directory is $HOME/.dbq.yaml or ./dbq.yaml). Alternatively,
 you can specify configuration during the launch via `--config` parameter:

 ```bash
@@ -56,18 +58,51 @@ datasources:

 ```yaml
 # checks.yaml
-
+version: "1"
+validations:
+  # https://clickhouse.com/docs/getting-started/example-datasets/nyc-taxi
+  - dataset: ch@[nyc_taxi.trips_small]
+    # common pre-filter for every check, e.g. to run daily check only for yesterday
+    where: "pickup_datetime > '2014-01-01'"
+    checks:
+      - id: row_count > 0
+        description: "data should be present" # optional
+        on_fail: error # optional (error, warn), default "error"
+
+      - id: row_count between 100 and 30000
+        description: "expected rows count"
+        on_fail: warn
+
+      - id: null_count(pickup_ntaname) == 0
+        description: "no nulls are allowed in column: pickup_ntaname"
+
+      - id: min(pickup_datetime) < now() - interval 3 day
+        description: "min(pickup_datetime) should not be earlier than 3 days"
+
+      - id: stddevPop(trip_distance) < 100_000
+        description: "check stddev value"
+
+      - id: sum(fare_amount) <= 10_000_000
+        description: "sum of value"
+
+      - id: countIf(trip_id == 1) == 1
+        description: "check trip id"
+
+      - id: raw_query
+        description: "raw query quality test"
+        query: |
+          select countIf(trip_distance == 0) > 0 from {{table}} where 1=1
 ```

 ### Commands

 ```bash
-$ dbq help
+$ dbqctl help

-dbq is a CLI tool for profiling data and running quality checks across various data sources
+dbqctl is a CLI tool for profiling data and running quality checks across various data sources

 Usage:
-  dbq [command]
+  dbqctl [command]

 Available Commands:
   check       Runs data quality checks defined in a configuration file against a datasource
@@ -76,19 +111,30 @@ Available Commands:
   import      Connects to a data source and imports all available tables as datasets
   ping        Checks if the data source is reachable
   profile     Collects dataset`s information and generates column statistics
-  version     Prints dbq version
+  version     Prints dbqctl and core lib version

 Flags:
       --config string   config file (default is $HOME/.dbq.yaml or ./dbq.yaml)
-  -h, --help            help for dbq
+  -h, --help            help for dbqctl
   -v, --verbose         Enables verbose logging

-Use "dbq [command] --help" for more information about a command.
+Use "dbqctl [command] --help" for more information about a command.
 ```

-### Quick start
-- dqb ping cnn-id
-- dbq import cnn-id --filter "reporting.*" --cfg checks.yaml --update-cfg
-- dbq check --cfg checks.yaml
-- dbq --config /Users/artem/code/dbq/dbq.yaml import
-- dbq profile --datasource cnn-id --dataset table_name
+### Quick usage examples
+```bash
+# check connection to datasource
+$ dbqctl ping cnn-id
+
+# automatically import datasets from datasource with applied filter and in-place update config file
+$ dbqctl import cnn-id --filter "reporting.*" --cfg checks.yaml --update-cfg
+
+# run checks from checks.yaml file
+$ dbqctl check --cfg checks.yaml
+
+# override default dbqctl config file
+$ dbqctl --config /Users/artem/code/dbq/dbq.yaml import
+
+# run dataset profile to collect general stats
+$ dbqctl profile --datasource cnn-id --dataset table_name
+```
````
