Gable’s CLI provides methods for checking changes to data assets that would violate an existing contract. You can either run these checks locally or incorporate them into your CI/CD pipeline. Similar to Registering Data Assets, Gable relies on having access to a local instance of the data asset with the proposed changes applied in order to evaluate it for contract violations. To use the CLI to validate changes, you need to provide the CLI with your organization’s Gable API endpoint, as well as a valid API key. The API key should be treated as a secret value and stored accordingly. Check out the Gable CLI on PyPi for setup instructions and more information.

Checking Relational Databases

Similar to Registering Data Assets, Gable relies on having access to a local “proxy” database to check for contract violations. It’s important that the contract check runs against an instance of the database with the proposed changes applied, in order to catch breaking changes before they’re merged and deployed to production. A proxy database is a database instance, accessible locally or in your CI/CD environment, whose schema reflects what the database would look like if the proposed changes were merged. The proxy database concept also removes the need to grant access to your production database, and eliminates any possibility of impacting your production database’s performance.

A proxy database can be a local Docker container, a Docker container spun up in your CI/CD workflow, or a database instance in a test/staging environment. If you already start a database Docker container in your CI/CD workflows for integration testing, Gable can be configured to use that same container at the end of the test run.

When using a proxy database, you specify both the production host/port/schema and those of the proxy. The production information is required to compute the unique data asset resource name for each discovered table, which is used to find any contracts associated with the database’s tables.
In this example, a local Docker Postgres instance is created and database migrations are applied with proposed changes. Gable connects to the local Postgres instance and validates the changes.
# Start a local Postgres Docker container
docker run --name serviceone_proxy -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres

# Apply the service's migrations, this assumes you have a yarn script that knows to apply
# the migrations to the local Postgres instance when passed the `--test` flag
yarn migrate --test

# Check Data Assets for Contract Violations
gable data-asset check \
    --source-type postgres \
    --host 'service-one.aaaaaaaaaaaa.eu-west-1.rds.amazonaws.com' --port 5432  \
    --db prod_serviceone --schema public --proxy-host 'localhost' --proxy-port 5432 \
    --proxy-db test_serviceone --proxy-user postgres --proxy-password postgres

Checking Protobuf/Avro/JSON Schema Files

Checking a service’s Protobuf, Avro, or JSON Schema files for contract violations is straightforward: the only requirement is having the service’s git repository checked out locally. The CLI supports checking multiple files, specified either as a space-delimited list (file1.proto file2.proto) or as a glob pattern. The check command must be run within the repository’s directory, as it uses the repo’s git information to construct the unique resource name for each data asset it discovers in order to find any contracts associated with the file(s).
In this example, Gable inspects the protobuf files of serviceone for contract violations.
# Check Data Assets for Contract Violations
gable data-asset check \
    --source-type protobuf --files ./proto/*.proto

Static Code Analysis

Using static code analysis, Gable can check data-generating code across your codebase to ensure compliance with existing data contracts. Following the examples below, you can use the Gable CLI to have your code bundled, transmitted, and analyzed by Gable for native type detection and checking against data contracts. Please note that bundling and transmitting your code is necessary for Gable’s static analysis, but rest assured that your code will not be persisted on Gable servers. In future releases, we will add the ability to run the static code analysis entirely within your CI/CD pipeline.

Python

gable data-asset check --source-type python \
    --project-root <project_root> \
    --emitter-file-path <path_to_your_emitter_function> \
    --emitter-function <emitter_function_name> \
    --emitter-payload-parameter <payload_param> \
    --event-name-key <key>
When checking a Python project, it’s important to specify the project root (the top-level directory of your project), which allows Gable to correctly identify and bundle the project for analysis. Additionally, specifying the emitter function and event name key tells Gable how your project emits data, ensuring accurate tracking and contract matching.

Check Python Options

  • --source-type - Set to python for Python projects
  • --project-root - The root directory of the project, used to bundle it for analysis
  • --emitter-file-path - The path to the file containing the emitter function
  • --emitter-function - The name of the emitter function
  • --emitter-payload-parameter - The name of the emitter function parameter representing the event payload
  • --event-name-key - The property of the event payload that distinguishes event types

PySpark

gable data-asset check --source-type pyspark \
    --project-root . \
    --spark-job-entrypoint job.py \
    --connection-string hive://localhost:10000

Check PySpark Options

  • --source-type - Set to pyspark for PySpark projects
  • --project-root - The directory containing the PySpark job to be analyzed
  • --spark-job-entrypoint - The command to execute the Spark job, including any arguments
  • --connection-string - Connection string to the Hive metastore
  • --csv-schema-file - Path to a CSV file containing the schema of upstream tables, formatted with columns table_name, column_name, and column_type
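For reference, a schema file passed via --csv-schema-file might look like the following (the table and column names here are illustrative):

```csv
table_name,column_name,column_type
orders,order_id,bigint
orders,created_at,timestamp
customers,customer_id,bigint
customers,email,string
```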

Typescript

gable data-asset check --source-type typescript \
  --project-root . \
  --library segment
In this example, a parameter of the UDF (eventName) is used to set the event name when publishing.
Example Event Publishing UDF
// src/lib/events.ts
// Note: the event name is passed in as a parameter to the function
function trackEvent(eventName: string, eventProperties: object) {
  ...
}
Check command using --emitter-name-parameter
gable data-asset check --source-type typescript \
  --project-root . \
  --emitter-file-path src/lib/events.ts \
  --emitter-function trackEvent  \
  --emitter-name-parameter eventName \
  --emitter-payload-parameter eventProperties
In this example, the event name is a property of the event payload.
Example Event Publishing UDF
// src/lib/events.ts
// Note: the event name is a property of the event payload
function trackEvent(eventProperties: object) {
  const eventName = eventProperties['__event_name'];
  ...
}
Check command using --event-name-key
gable data-asset check --source-type typescript \
  --project-root . \
  --emitter-file-path src/lib/events.ts \
  --emitter-function trackEvent  \
  --event-name-key __event_name \
  --emitter-payload-parameter eventProperties

Check Typescript Options

Required
  • --source-type - Set to typescript to check events in Typescript
  • --project-root - The directory containing the Typescript project to be analyzed
Supported Libraries
  • --library - The natively supported library used to publish data, usually events
User Defined Function
  • --emitter-file-path src/lib/events.ts - The path to the file containing the UDF
  • --emitter-function trackEvent - The name of the UDF
  • --emitter-payload-parameter eventProperties - The name of the function parameter representing the event payload
  • --emitter-name-parameter eventName - [Optional] The name of the function parameter representing the event name. Use either this option, or --event-name-key __event_name. See above examples.
  • --event-name-key __event_name - [Optional] The name of the event payload property representing the event name. Use either this option, or --emitter-name-parameter eventName. See above examples.