Gable’s CLI provides commands for checking changes to data assets that would violate an existing contract. You can run these checks locally or incorporate them into your CI/CD pipeline. As with Registering Data Assets, Gable relies on having access to a local instance of the data asset, with the proposed changes applied, in order to evaluate it for contract violations. To validate changes with the CLI, you need to provide the Gable API endpoint for your organization, as well as a valid API key. The API key should be treated as a secret value and stored accordingly. Check out the Gable CLI on PyPI for setup instructions and more information.
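As a minimal setup sketch, the endpoint and key can be provided via environment variables. The variable names `GABLE_API_ENDPOINT` and `GABLE_API_KEY` are assumptions here; confirm the exact names against the Gable CLI docs on PyPI for your installed version:

```shell
# Install the Gable CLI.
pip install gable

# Assumed environment variable names -- verify against the CLI's setup docs.
export GABLE_API_ENDPOINT="https://your-org.gable.ai"
export GABLE_API_KEY="<your-api-key>"   # secret: inject from your CI/CD secret store, never commit it
```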

Checking Relational Databases

As with Registering Data Assets, Gable relies on having access to a local “proxy” database to check for contract violations. It’s important that the contract check runs against an instance of the database with the proposed changes applied, so that breaking changes are caught before they’re merged and deployed to production. A proxy database is a database instance, accessible locally or in your CI/CD environment, whose schema reflects what the database’s schema would be if the proposed changes were merged. The proxy concept also removes the need to grant access to your production database, and eliminates any possibility of impacting its performance.

A proxy database can be a local Docker container, a Docker container spun up in your CI/CD workflow, or a database instance in a test/staging environment. If you already start a database Docker container in your CI/CD workflows for integration testing, Gable can be configured to use that same container at the end of the test run.

When using a proxy database, you specify the production host/port/schema as well as those of the proxy. The production information is required to compute the unique data asset resource name for each discovered table, which is used to find any contracts associated with the database’s tables.
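A sketch of a check against a throwaway Postgres proxy container follows. The `--proxy-*` and production flag names are assumptions based on the proxy/production pairing described above, and the hostnames, database name, and migration path are placeholders; confirm the exact options with the CLI’s built-in help before use:

```shell
# Spin up a disposable proxy database and apply the proposed schema changes to it.
docker run -d --name gable-proxy -p 5432:5432 -e POSTGRES_PASSWORD=postgres postgres:15
psql -h localhost -U postgres -f ./migrations/proposed_changes.sql  # hypothetical migration file

# Check the proxy's schema against contracts registered for the production tables.
# Production host/port/schema are only used to compute data asset resource names;
# no connection is made to the production database.
gable data-asset check \
  --source-type postgres \
  --host prod-db.example.com --port 5432 --db orders --schema public \
  --proxy-host localhost --proxy-port 5432 --proxy-db orders --proxy-schema public \
  --proxy-user postgres --proxy-password postgres
```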

Checking Protobuf/Avro/JSON Schema Files

Checking a service’s Protobuf, Avro, or JSON schema files for contract violations is straightforward: the only requirement is having the service’s git repository checked out locally. The CLI supports checking multiple files, specified either as a space-delimited list (file1.proto file2.proto) or as a glob pattern. The check command must be run within the repository’s directory, as it uses the repo’s git information to construct the unique resource name for the data assets it discovers in order to find any contracts associated with the file(s).
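The invocation might look like the sketch below. The `--files` flag name and the schema paths are assumptions for illustration; check the CLI’s help output for the exact option:

```shell
# Run from inside the service's git repository so the CLI can read the repo info
# it needs to construct data asset resource names.
cd my-service

# Check an explicit space-delimited list of files (flag name assumed; see --help).
gable data-asset check --source-type protobuf --files ./protos/orders.proto ./protos/users.proto

# Or check files matching a glob pattern.
gable data-asset check --source-type avro --files "./schemas/**/*.avsc"
```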

Static Code Analysis

Using static code analysis, Gable can check data-generating code across your codebase to ensure compliance with existing data contracts. Following the examples below, you can use the Gable CLI to have your code bundled, transmitted, and analyzed by Gable for native type detection and checking against data contracts. Please note that bundling and transmitting your code is necessary for Gable’s static analysis, but rest assured that your code will not be persisted on Gable’s servers. In future releases, we will add the ability to run the static code analysis entirely within your CI/CD pipeline.

Python

When checking a Python project, it’s important to specify the project’s entry point, which is the root directory of your project. This allows Gable to correctly identify and bundle the project for analysis. Additionally, specifying the emitter function and event name key tells Gable how your project emits data, ensuring accurate tracking and management.

Check Python Options

  • --source-type - The source type; set to python for Python projects
  • --project-root - The project’s entry point (its root directory), used for proper bundling
  • --emitter-file-path - The path to the file containing the emitter function
  • --emitter-function - The name of the emitter function
  • --emitter-payload-parameter - The name of the emitter function’s parameter containing the event payload
  • --event-name-key - The event property used to distinguish event types
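Putting the options above together, a sketch of a Python check might look like this. The project layout, emitter names, and event key are hypothetical placeholders, not values the CLI expects:

```shell
# Hypothetical project: an emitter at ./my_project/events.py defining
#   def emit_event(event: dict): ...
# where event["event_type"] distinguishes the event types.
gable data-asset check \
  --source-type python \
  --project-root ./my_project \
  --emitter-file-path my_project/events.py \
  --emitter-function emit_event \
  --emitter-payload-parameter event \
  --event-name-key event_type
```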

PySpark

Check PySpark Options

  • --source-type - Set to pyspark for PySpark projects
  • --project-root - The directory containing the PySpark job to be analyzed
  • --spark-job-entrypoint - The command to execute the Spark job, including any arguments
  • --connection-string - Connection string to the Hive metastore
  • --csv-schema-file - Path to a CSV file containing the schemas of upstream tables, formatted with columns table_name, column_name, and column_type
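A sketch of a PySpark check using a CSV schema file for the upstream tables follows. The job path, arguments, and CSV location are hypothetical placeholders:

```shell
# Hypothetical job: ./jobs/build_orders.py reads upstream tables whose schemas are
# described in upstream_schemas.csv (columns: table_name, column_name, column_type).
# Use --connection-string instead if a Hive metastore is reachable from the check environment.
gable data-asset check \
  --source-type pyspark \
  --project-root ./jobs \
  --spark-job-entrypoint "build_orders.py --env staging" \
  --csv-schema-file ./jobs/upstream_schemas.csv
```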

TypeScript

Check TypeScript Options

Required
  • --source-type - Set to typescript to check events in TypeScript code
  • --project-root - The directory containing the TypeScript project to be analyzed
Supported Libraries
  • --library - The natively supported library used to publish data, usually events
User Defined Function
  • --emitter-file-path src/lib/events.ts - The path to the file containing the UDF
  • --emitter-function trackEvent - The name of the UDF
  • --emitter-payload-parameter eventProperties - The name of the function parameter representing the event payload
  • --emitter-name-parameter eventName - [Optional] The name of the function parameter representing the event name. Use either this option, or --event-name-key __event_name. See above examples.
  • --event-name-key __event_name - [Optional] The name of the event property representing the event name. Use either this option, or --emitter-name-parameter eventName. See above examples.
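The two TypeScript modes can be sketched as follows, using the option values shown above. The library name in the first command is a hypothetical placeholder; consult the CLI’s help for the list of natively supported libraries:

```shell
# Natively supported library mode ("segment" is a hypothetical library name).
gable data-asset check --source-type typescript --project-root . --library segment

# User-defined emitter function mode, matching the option values listed above.
gable data-asset check \
  --source-type typescript \
  --project-root . \
  --emitter-file-path src/lib/events.ts \
  --emitter-function trackEvent \
  --emitter-payload-parameter eventProperties \
  --emitter-name-parameter eventName
```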