In Gable, data contracts are defined as YAML files that follow the specification below. The Gable platform uses these contracts to automatically enforce the constraints they define throughout the lifecycle of your data pipelines.

Each contract must contain the following components:

  • spec-version: The version of Gable’s data contract specification.
    • type: string
    • constraints: Must be a valid semantic version. The only currently supported version is 0.1.0.
  • namespace: A unique named collection of contracts. This field is meant to be a user-defined space that has meaning within your organization.
    • type: string
    • constraints: (?=.*[a-zA-Z0-9])[A-Za-z0-9\.\_] (Must include at least one alphanumeric character; can include the special characters . and _)
  • name: The name of the contract which, when combined with the namespace, uniquely identifies the contract.
    • type: string
    • constraints: (?=.*[a-zA-Z0-9])[a-zA-Z0-9_] (Must include at least one alphanumeric character; can include the special character _)
  • doc: Documentation for the contract. It can be written using YAML’s multi-line syntax.
    • type: string
  • dataAssetResourceName: The unique resource name of the data asset this contract covers. See Data Assets for more details.
    • type: string
    • constraints: Must be a valid resource name in the format <type>://<source>:<dataset>
  • owner: The email address of the team or individual who owns the data contract.
    • type: email (string)
  • schema: The schema definition of the data asset. Gable uses a flexible type system for contract schemas; see the section on the type specification for more details. At minimum, each field contains a name and a type. Types can be primitives or complex types such as unions, lists, and nested objects.
    • type: list[Type]
    • constraints: Constraints can be applied to fields in the contract schema to ensure data quality. Each constraint defines a condition that the data must satisfy. The supported constraints are described below:
    • Warning: Contract constraints are currently supported only for S3-based assets.
      • GREATER_THAN:
        • Description: Ensures that the field value is greater than a specified value.
        • Applicable To: Number, Time
        • Usage:
          constraints:
            - greaterThan: 10
          
      • GREATER_THAN_OR_EQUAL_TO:
        • Description: Ensures that the field value is greater than or equal to a specified value.
        • Applicable To: Number, Time
        • Usage:
          constraints:
            - greaterThanOrEqualTo: 10
          
      • LESS_THAN:
        • Description: Ensures that the field value is less than a specified value.
        • Applicable To: Number, Time
        • Usage:
          constraints:
            - lessThan: 100
          
      • LESS_THAN_OR_EQUAL_TO:
        • Description: Ensures that the field value is less than or equal to a specified value.
        • Applicable To: Number, Time
        • Usage:
          constraints:
            - lessThanOrEqualTo: 100
          
      • IS_NULL:
        • Description: Ensures that the field value is null.
        • Applicable To: Number, String, Time, Bytes, DataStructure, Other
        • Usage:
          constraints:
            - isNull: true
          
      • IS_NULL_THRESHOLD:
        • Description: Ensures that the fraction of null values does not exceed a specified threshold.
        • Applicable To: Number, String, Time, Bytes, DataStructure, Other
        • Usage:
          constraints:
            - isNullThreshold: 0.1
          
      • IS_NOT_EMPTY:
        • Description: Ensures the value is not empty.
        • Applicable To: String, DataStructure
        • Usage:
          constraints:
            - isNotEmpty: true
          
      • LENGTH:
        • Description: Ensures that the length of the field value is equal to a specified length.
        • Applicable To: String, DataStructure
        • Usage:
          constraints:
            - length: 10
          
      • LENGTH_GREATER_THAN:
        • Description: Ensures that the length of the field value is greater than a specified length.
        • Applicable To: String, DataStructure
        • Usage:
          constraints:
            - lengthGreaterThan: 5
          
      • LENGTH_GREATER_THAN_OR_EQUAL_TO:
        • Description: Ensures the length of the value is greater than or equal to a specified length.
        • Applicable To: String, DataStructure
        • Usage:
          constraints:
            - lengthGreaterThanOrEqualTo: 5
          
      • LENGTH_LESS_THAN:
        • Description: Ensures the length of the value is less than a specified length.
        • Applicable To: String, DataStructure
        • Usage:
          constraints:
            - lengthLessThan: 10
          
      • LENGTH_LESS_THAN_OR_EQUAL_TO:
        • Description: Ensures the length of the value is less than or equal to a specified length.
        • Applicable To: String, DataStructure
        • Usage:
          constraints:
            - lengthLessThanOrEqualTo: 10
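
The constraint descriptions above can be read as simple predicates over a column of values. The following is a minimal Python sketch of that reading, not Gable's implementation. The function name `check_column` and the `VALUE_CHECKS` table are hypothetical, and the sketch assumes that null values are skipped by every constraint except `isNull` and `isNullThreshold`, and that `isNullThreshold` is an upper bound on the fraction of null values.

```python
from typing import Any, Callable, Dict, List

# Hypothetical per-value predicates, keyed by the constraint names used in the YAML.
# Each takes (value, argument) and returns True when the value satisfies the constraint.
VALUE_CHECKS: Dict[str, Callable[[Any, Any], bool]] = {
    "greaterThan": lambda v, arg: v > arg,
    "greaterThanOrEqualTo": lambda v, arg: v >= arg,
    "lessThan": lambda v, arg: v < arg,
    "lessThanOrEqualTo": lambda v, arg: v <= arg,
    "isNotEmpty": lambda v, arg: (len(v) > 0) == arg,
    "length": lambda v, arg: len(v) == arg,
    "lengthGreaterThan": lambda v, arg: len(v) > arg,
    "lengthGreaterThanOrEqualTo": lambda v, arg: len(v) >= arg,
    "lengthLessThan": lambda v, arg: len(v) < arg,
    "lengthLessThanOrEqualTo": lambda v, arg: len(v) <= arg,
}


def check_column(values: List[Any], constraints: List[Dict[str, Any]]) -> List[str]:
    """Return the names of the constraints that the column violates."""
    failed = []
    for constraint in constraints:
        # Each constraint is a single-key mapping, e.g. {"greaterThan": 10}.
        name, arg = next(iter(constraint.items()))
        if name == "isNullThreshold":
            # Assumption: the threshold bounds the fraction of null values.
            null_fraction = sum(v is None for v in values) / len(values) if values else 0.0
            ok = null_fraction <= arg
        elif name == "isNull":
            # isNull: true requires null values; isNull: false forbids them.
            ok = all((v is None) == arg for v in values)
        else:
            # Assumption: null values are ignored by all other constraints.
            check = VALUE_CHECKS[name]
            ok = all(check(v, arg) for v in values if v is not None)
        if not ok:
            failed.append(name)
    return failed


column = [5, 12, None, 40]
rules = [{"greaterThan": 10}, {"lessThanOrEqualTo": 100}, {"isNullThreshold": 0.1}]
print(check_column(column, rules))  # ['greaterThan', 'isNullThreshold']
```

In this example the value 5 violates `greaterThan: 10`, and one null out of four values (0.25) exceeds the 0.1 threshold, while `lessThanOrEqualTo: 100` holds for every non-null value.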
          

Example Data Contract Spec

Below is an example of a data contract that enforces the schema for OneBusAway, an open source project providing public APIs for transit information:
spec-version: 0.1.0
name: VehicleStatus
namespace: OneBusAway
dataAssetResourceName: postgres://gable.prod.rds.aws.com:5432:onebusaway.transit.vehicle_status
doc: Contract representing the status of a vehicle in OneBusAway's system.
owner: chadgable@gable.ai
schema:
  - name: vehicle_id
    doc: The id of the vehicle
    type: string32
    constraints:
      - length: 32
      - isNull: false
      - isNotEmpty: true
  - name: trip_id
    doc: (Optional) The id of the vehicle's current trip.
    type: union
    types: ['null', 'string32']
    default: 'null'
    constraints:
      - isNullThreshold: 0.8
  - name: status
    doc: The status of the vehicle.
    type: enum
    symbols: ['SCHEDULED', 'IN_PROGRESS']
    constraints:
      - isNullThreshold: 0.3
      - length: 1
  - name: location
    doc: (Optional) The last known location of the vehicle.
    type: union
    types:
      - type: 'null'
      - type: struct
        alias: Location
        name: location
        doc: A geographic location
        fields:
          - name: latitude
            doc: The latitude of the location
            type: float64
            constraints:
              - isNull: false
          - name: longitude
            doc: The longitude of the location
            type: float64
            constraints:
              - isNull: false
    constraints:
      - isNullThreshold: 0.45
  - name: last_location_update_time
    doc: The last known real-time update from the transit vehicle containing a location update (in milliseconds since the Unix epoch)
    type: date64
    constraints:
      - isNull: false
      - lessThanOrEqualTo: today
  - name: last_update_time
    doc: The last known real-time update from the transit vehicle (in milliseconds since the Unix epoch)
    type: date64
    constraints:
      - isNull: false
      - lessThanOrEqualTo: today
  - name: transaction_id
    doc: The unique identifier for each transaction
    type: uuid
    constraints:
      - isNotEmpty: true
      - isNull: false
  - name: customer_id
    doc: The unique identifier for each customer
    type: string32
    constraints:
      - length: 32
      - isNotEmpty: true
      - isNull: false
  - name: transaction_amount
    doc: The amount for the transaction
    type: float64
    constraints:
      - greaterThan: 0
      - isNull: false
  - name: discount
    doc: The discount applied to the transaction
    type: float32
    constraints:
      - greaterThanOrEqualTo: 0
      - lessThanOrEqualTo: 1
      - isNull: false
  - name: transaction_date
    doc: The date and time when the transaction occurred
    type: timestamp64
    constraints:
      - isNull: false
      - lessThanOrEqualTo: today
  - name: delivery_date
    doc: The expected delivery date for the transaction
    type: date64
    constraints:
      - isNullThreshold: 0.1
      - greaterThan: transaction_date
  - name: customer_feedback
    doc: Feedback provided by the customer
    type: union
    types: ['null', 'string']
    constraints:
      - lengthLessThanOrEqualTo: 1000
      - isNullThreshold: 0.7
  - name: product_category
    doc: The category of the product sold
    type: enum
    symbols: ['ELECTRONICS', 'FURNITURE', 'CLOTHING', 'TOYS', 'FOOD']
    constraints:
      - isNull: false
  - name: quantity_sold
    doc: The number of units sold in the transaction
    type: int32
    constraints:
      - greaterThan: 0
      - isNull: false
  - name: unit_price
    doc: The price per unit of the product
    type: float64
    constraints:
      - greaterThanOrEqualTo: 0.01
      - isNull: false
  - name: sales_region
    doc: The region where the sales occurred
    type: string
    constraints:
      - lengthGreaterThanOrEqualTo: 3
      - lengthLessThanOrEqualTo: 20
      - isNotEmpty: true
      - isNull: false
  - name: return_reason
    doc: The reason for the return of the product, if applicable
    type: union
    types: ['null', 'string']
    constraints:
      - lengthGreaterThan: 0
      - isNullThreshold: 0.95
  - name: return_date
    doc: The date when the product was returned, if applicable
    type: union
    types: ['null', 'date64']
    constraints:
      - greaterThanOrEqualTo: transaction_date
      - isNullThreshold: 0.95
  - name: sales_representative
    doc: The name of the sales representative who handled the transaction
    type: string
    constraints:
      - lengthGreaterThan: 0
      - lengthLessThanOrEqualTo: 50
      - isNotEmpty: true
      - isNull: false
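
The top-level fields of a contract like the one above can be checked before the file is submitted. The following is a minimal Python sketch of such a check, not Gable's own validation. The function name `invalid_fields` is hypothetical. The sketch assumes that the documented character classes for `namespace` and `name` are meant to match the whole string (so a `+` quantifier and full-string matching are added), and that in `<type>://<source>:<dataset>` the dataset is the text after the last colon.

```python
import re
from typing import Dict, List

# The only spec version the specification lists as supported.
SUPPORTED_SPEC_VERSIONS = {"0.1.0"}

# Assumption: the documented patterns apply to the entire string, hence the added "+".
NAMESPACE_RE = re.compile(r"(?=.*[a-zA-Z0-9])[A-Za-z0-9._]+")
NAME_RE = re.compile(r"(?=.*[a-zA-Z0-9])[a-zA-Z0-9_]+")

# Assumption: <dataset> contains no colon, so <source> may include a port number.
RESOURCE_RE = re.compile(r"(?P<type>[A-Za-z0-9_]+)://(?P<source>.+):(?P<dataset>[^:]+)")


def invalid_fields(contract: Dict[str, object]) -> List[str]:
    """Return the top-level fields that are missing or fail their documented constraint."""
    checks = {
        "spec-version": lambda v: v in SUPPORTED_SPEC_VERSIONS,
        "namespace": lambda v: NAMESPACE_RE.fullmatch(v) is not None,
        "name": lambda v: NAME_RE.fullmatch(v) is not None,
        "dataAssetResourceName": lambda v: RESOURCE_RE.fullmatch(v) is not None,
    }
    # A missing field is treated as an empty string, which fails every check.
    return [field for field, ok in checks.items() if not ok(str(contract.get(field, "")))]


# Top-level values taken from the example contract above.
example = {
    "spec-version": "0.1.0",
    "namespace": "OneBusAway",
    "name": "VehicleStatus",
    "dataAssetResourceName": (
        "postgres://gable.prod.rds.aws.com:5432:onebusaway.transit.vehicle_status"
    ),
}
print(invalid_fields(example))  # []
```

Under these assumptions the resource name in the example splits into the type `postgres`, the source `gable.prod.rds.aws.com:5432`, and the dataset `onebusaway.transit.vehicle_status`.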