Gable’s CLI provides commands for checking changes to data assets that would violate an existing contract. You can run these checks locally or incorporate them into your CI/CD pipeline. As with Registering Data Assets, Gable relies on having access to a local instance of the data asset, with the proposed changes applied, in order to evaluate it for contract violations. To validate changes with the CLI, you need to provide the Gable API endpoint for your organization, as well as a valid API key. The API key should be treated as a secret value and stored accordingly. Check out the Gable CLI on PyPI for setup instructions and more information.
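As a minimal setup sketch, the endpoint and key can be provided via environment variables. The variable names `GABLE_API_ENDPOINT` and `GABLE_API_KEY` are assumptions here; confirm the exact names against the Gable CLI docs on PyPI for your installed version:

```shell
# Install the Gable CLI.
pip install gable

# Assumed environment variable names -- verify against the CLI's setup docs.
export GABLE_API_ENDPOINT="https://your-org.gable.ai"
export GABLE_API_KEY="<your-api-key>"   # secret: inject from your CI/CD secret store, never commit it
```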

Checking Relational Databases

As with Registering Data Assets, Gable relies on having access to a local “proxy” database to check for contract violations. It’s important that the contract check runs against an instance of the database with the proposed changes applied, so that breaking changes are caught before they’re merged and deployed to production. A proxy database is a database instance, accessible locally or in your CI/CD environment, whose schema reflects what the database’s schema would be if the proposed changes were merged. The proxy concept also removes the need to grant access to your production database, and eliminates any possibility of impacting its performance.

A proxy database can be a local Docker container, a Docker container spun up in your CI/CD workflow, or a database instance in a test/staging environment. If you already start a database Docker container in your CI/CD workflows for integration testing, Gable can be configured to use that same container at the end of the test run.

When using a proxy database, you specify the production host/port/schema as well as those of the proxy. The production information is required to compute the unique data asset resource name for each discovered table, which is used to find any contracts associated with the database’s tables.
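A sketch of a check against a throwaway Postgres proxy container follows. The `--proxy-*` and production flag names are assumptions based on the proxy/production pairing described above, and the hostnames, database name, and migration path are placeholders; confirm the exact options with the CLI’s built-in help before use:

```shell
# Spin up a disposable proxy database and apply the proposed schema changes to it.
docker run -d --name gable-proxy -p 5432:5432 -e POSTGRES_PASSWORD=postgres postgres:15
psql -h localhost -U postgres -f ./migrations/proposed_changes.sql  # hypothetical migration file

# Check the proxy's schema against contracts registered for the production tables.
# Production host/port/schema are only used to compute data asset resource names;
# no connection is made to the production database.
gable data-asset check \
  --source-type postgres \
  --host prod-db.example.com --port 5432 --db orders --schema public \
  --proxy-host localhost --proxy-port 5432 --proxy-db orders --proxy-schema public \
  --proxy-user postgres --proxy-password postgres
```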

Checking Protobuf/Avro/JSON Schema Files

Checking a service’s Protobuf, Avro, or JSON schema files for contract violations is straightforward: the only requirement is having the service’s git repository checked out locally. The CLI supports checking multiple files, specified either as a space-delimited list (file1.proto file2.proto) or as a glob pattern. The check command must be run within the repository’s directory, as it uses the repo’s git information to construct the unique resource name for the data assets it discovers in order to find any contracts associated with the file(s).
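The invocation might look like the sketch below. The `--files` flag name and the schema paths are assumptions for illustration; check the CLI’s help output for the exact option:

```shell
# Run from inside the service's git repository so the CLI can read the repo info
# it needs to construct data asset resource names.
cd my-service

# Check an explicit space-delimited list of files (flag name assumed; see --help).
gable data-asset check --source-type protobuf --files ./protos/orders.proto ./protos/users.proto

# Or check files matching a glob pattern.
gable data-asset check --source-type avro --files "./schemas/**/*.avsc"
```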

Static Code Analysis

Using static code analysis, Gable can check data-generating code across your codebase to ensure compliance with existing data contracts. Following the examples below, you can use the Gable CLI to have your code bundled, transmitted, and analyzed by Gable for native type detection and checking against data contracts. Please note that bundling and transmitting your code is necessary for Gable’s static analysis, but rest assured that your code will not be persisted on Gable’s servers. In future releases, we will add the ability to run the static code analysis entirely within your CI/CD pipeline.

Python

When checking a Python project, it’s important to specify the project’s entry point, which is the root directory of your project. This allows Gable to correctly identify and bundle the project for analysis. Additionally, specifying the emitter function and event name key tells Gable how your project emits data, ensuring accurate tracking and management.

Check Python Options

  • --source-type - The source type; set to python for Python projects
  • --project-root - The project’s entry point (its root directory), used for proper bundling
  • --emitter-file-path - The path to the file containing the emitter function
  • --emitter-function - The name of the emitter function
  • --emitter-payload-parameter - The name of the emitter function’s parameter containing the event payload
  • --event-name-key - The event property used to distinguish event types
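Putting the options above together, a sketch of a Python check might look like this. The project layout, emitter names, and event key are hypothetical placeholders, not values the CLI expects:

```shell
# Hypothetical project: an emitter at ./my_project/events.py defining
#   def emit_event(event: dict): ...
# where event["event_type"] distinguishes the event types.
gable data-asset check \
  --source-type python \
  --project-root ./my_project \
  --emitter-file-path my_project/events.py \
  --emitter-function emit_event \
  --emitter-payload-parameter event \
  --event-name-key event_type
```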

PySpark

Check PySpark Options

  • --source-type - Set to pyspark for PySpark projects
  • --project-root - The directory containing the PySpark job to be analyzed
  • --spark-job-entrypoint - The command to execute the Spark job, including any arguments
  • --connection-string - Connection string to the Hive metastore
  • --csv-schema-file - Path to a CSV file containing the schemas of upstream tables, formatted with columns table_name, column_name, and column_type
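A sketch of a PySpark check using a CSV schema file for the upstream tables follows. The job path, arguments, and CSV location are hypothetical placeholders:

```shell
# Hypothetical job: ./jobs/build_orders.py reads upstream tables whose schemas are
# described in upstream_schemas.csv (columns: table_name, column_name, column_type).
# Use --connection-string instead if a Hive metastore is reachable from the check environment.
gable data-asset check \
  --source-type pyspark \
  --project-root ./jobs \
  --spark-job-entrypoint "build_orders.py --env staging" \
  --csv-schema-file ./jobs/upstream_schemas.csv
```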

TypeScript

Check TypeScript Options

Required
  • --source-type - Set to typescript to check events in TypeScript code
  • --project-root - The directory containing the TypeScript project to be analyzed
Supported Libraries
  • --library - The natively supported library used to publish data, usually events
User Defined Function
  • --emitter-file-path src/lib/events.ts - The path to the file containing the UDF
  • --emitter-function trackEvent - The name of the UDF
  • --emitter-payload-parameter eventProperties - The name of the function parameter representing the event payload
  • --emitter-name-parameter eventName - [Optional] The name of the function parameter representing the event name. Use either this option, or --event-name-key __event_name. See above examples.
  • --event-name-key __event_name - [Optional] The name of the event property representing the event name. Use either this option, or --emitter-name-parameter eventName. See above examples.
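The two TypeScript modes can be sketched as follows, using the option values shown above. The library name in the first command is a hypothetical placeholder; consult the CLI’s help for the list of natively supported libraries:

```shell
# Natively supported library mode ("segment" is a hypothetical library name).
gable data-asset check --source-type typescript --project-root . --library segment

# User-defined emitter function mode, matching the option values listed above.
gable data-asset check \
  --source-type typescript \
  --project-root . \
  --emitter-file-path src/lib/events.ts \
  --emitter-function trackEvent \
  --emitter-payload-parameter eventProperties \
  --emitter-name-parameter eventName
```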