
Lineage builder


The lineage builder package allows you to create lineage between any source and any target asset.

Use cases

  • Add your own arbitrary lineage connections, which render in lineage like any others
  • Inject lineage for transformations from a tool that Atlan does not mine out of the box

Configuration

Source

  • Import lineage from: select how you want to provide the input file to be processed (see below for the required format)

    Direct upload: upload a CSV file containing the lineage directly.

    • Lineage file: the CSV file containing lineage details to load.

    Limited file sizes

    This option is generally limited to ~10-20MB for each file. For anything larger, use object storage.

    Object storage: retrieve the lineage file from cloud object storage.

    • Prefix (path): the directory (path) within the object store from which to fetch the file containing lineage metadata.
    • Object key (filename): the object key (filename), including its extension, within the object store and prefix.
    • Cloud object store: the object store from which to fetch the file containing lineage metadata.

      Amazon S3:

      • AWS access key: your AWS access key.
      • AWS secret key: your AWS secret key.
      • Region: your AWS region.
      • Bucket: your AWS bucket.

      Reusing Atlan's backing S3 store

      When your Atlan tenant is deployed in AWS, you can leave all of these blank to reuse the backing store of Atlan itself. You can also set up a cross-account bucket policy for Atlan to have access to your S3 bucket and the objects within it, and leave these blank.

      Google Cloud Storage (GCS):

      • Project ID: the ID of your GCP project.
      • Service account JSON: your service account credentials, as JSON.
      • Bucket: your GCS bucket.

      Reusing Atlan's backing GCS store

      When your Atlan tenant is deployed in GCP, you can leave all of these blank to reuse the backing store of Atlan itself.

      Azure Data Lake Storage (ADLS):

      • Azure client ID: the unique application (client) ID assigned to your app by Azure AD when the app was registered.
      • Azure client secret: your Azure client secret (its actual value, not its identifier).
      • Azure tenant ID: the unique identifier of the Azure Active Directory instance.
      • Storage account name: name of your storage account.
      • Container: your ADLS container.

      Reusing Atlan's backing ADLS store

      When your Atlan tenant is deployed in Azure, you can leave all of these blank to reuse the backing store of Atlan itself.
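Direct upload is generally limited to about 10-20MB per file. As a minimal sketch (not part of the package), the Python below suggests which import method to use with a conservative 10MB cut-off, and shows one assumed way that Prefix (path) and Object key (filename) combine into a full object path. The helper names and the 10MB threshold are illustrative assumptions.

```python
import os

# Assumption: 10 MB, the conservative end of the ~10-20MB range that the
# direct upload option is generally limited to.
UPLOAD_LIMIT_BYTES = 10 * 1024 * 1024


def choose_import_method(size_bytes: int, limit_bytes: int = UPLOAD_LIMIT_BYTES) -> str:
    """Suggest 'upload' for small files and 'object storage' for larger ones."""
    if size_bytes < 0:
        raise ValueError("file size cannot be negative")
    return "upload" if size_bytes <= limit_bytes else "object storage"


def choose_for_local_file(path: str) -> str:
    """Apply the same rule to a file on local disk."""
    return choose_import_method(os.path.getsize(path))


def object_path(prefix: str, object_key: str) -> str:
    """Hypothetical helper: join Prefix (path) and Object key (filename) with '/'."""
    prefix = prefix.strip("/")
    return f"{prefix}/{object_key}" if prefix else object_key
```

For example, `object_path("lineage/daily/", "lineage.csv")` gives `lineage/daily/lineage.csv`.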

Options

  • Unknown asset handling: how you want to handle source and target assets in the input file that do not match any assets in Atlan.

    Skip: any source or target asset in the input file that does not match any asset in Atlan is skipped (the lineage rows that reference it are not loaded).

    Create partial assets: for any source or target asset in the input file that does not match any asset in Atlan, create a "partial" asset. These assets only appear in lineage; they cannot be found through search or viewed in the asset sidebar. They can later be "resolved" into full assets, which are then discoverable and visible in the asset sidebar.

    Will not work for child assets

    The lineage builder package can create partial assets for lineage at the container level (table, view, materialized view), but not at the child (field) level (column, etc.). If you want to create field-level lineage using partial assets, you must first create those field-level partial assets using an alternative such as the relational assets builder.

    Create full assets: for any source or target asset in the input file that does not match any asset in Atlan, create a full asset. These assets behave like any other: they are discoverable through search and appear in the asset sidebar.

  • Fail on errors: whether an invalid value in a field should cause the import to fail (Yes) or log a warning, skip that value, and proceed (No).

  • Case-sensitive match for assets: whether attempts to match assets should be done case-sensitively (Yes) or case-insensitively (No).
  • Field separator: the single character that is used to separate values in the input file, typically either a comma (,) or a semicolon (;).
  • Batch size: the maximum number of rows of input to process per underlying API call.
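To make the matching and batching options concrete, here is a minimal Python sketch of how they could interact when a file is processed. It is not the package's implementation: the `Options` fields, the "skip"/"partial"/"full" values and the default batch size are hypothetical, and matching is reduced to comparing identity strings.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional


@dataclass
class Options:
    """Hypothetical stand-in for the package options described above."""
    unknown_assets: str = "skip"  # "skip", "partial" or "full"
    case_sensitive: bool = True
    separator: str = ","
    batch_size: int = 20  # arbitrary value for the sketch


def find_asset(identity: str, existing: List[str], opts: Options) -> Optional[str]:
    """Return the existing identity that matches, honouring case sensitivity."""
    if opts.case_sensitive:
        return identity if identity in existing else None
    wanted = identity.lower()
    for candidate in existing:
        if candidate.lower() == wanted:
            return candidate
    return None


def resolve(identity: str, existing: List[str], opts: Options) -> Optional[Dict[str, str]]:
    """Decide what happens to one source or target asset from a row."""
    match = find_asset(identity, existing, opts)
    if match is not None:
        return {"identity": match, "status": "existing"}
    if opts.unknown_assets == "skip":
        return None  # caller drops the whole lineage row
    return {"identity": identity, "status": opts.unknown_assets}


def batches(text: str, opts: Options) -> Iterator[List[Dict[str, str]]]:
    """Parse CSV text with the configured separator; yield at most batch_size rows at a time."""
    reader = csv.DictReader(io.StringIO(text), delimiter=opts.separator)
    batch: List[Dict[str, str]] = []
    for row in reader:
        batch.append(row)
        if len(batch) == opts.batch_size:
            yield batch
            batch = []
    if batch:
        yield batch
```

With `case_sensitive=False`, an identity such as `db/sch/TBL` would match an existing `DB/SCH/tbl`; with the default, it would not.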

What it does

Each row of the CSV file supplied represents a single lineage process to be created (or updated).

  • If Create full assets or Create partial assets is enabled, the package will make a first pass through the file to create any assets that do not already exist in Atlan (as either fully-discoverable, or partial assets that only appear in lineage).
  • The package will then process each row of the CSV file supplied to:
    • Create a lineage process for that row, if none already exists with the given characteristics (Transformation Connection and Transformation Identity); or
    • Update any existing lineage process that matches the row's given characteristics (Transformation Connection and Transformation Identity).
  • Finally, the package will retrieve any persistent cache of assets (used for constructing lineage in other packages) and add any new or updated assets to it.
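Because processes are matched on Transformation Connection and Transformation Identity, re-running the package with the same file updates existing processes rather than duplicating them. The sketch below models that create-or-update rule with a plain dictionary standing in for Atlan; the storage, the field names and the choice to overwrite inputs and outputs on update are assumptions for illustration only.

```python
from typing import Dict, Tuple

# (Transformation Connection, Transformation Identity): the matching key.
ProcessKey = Tuple[str, str]


def upsert_process(store: Dict[ProcessKey, dict], row: Dict[str, str]) -> str:
    """Create a process for the row, or update the one that shares its key."""
    key = (row["Transformation Connection"], row["Transformation Identity"])
    process = {
        "name": row["Transformation Name"],
        # This sketch simply replaces inputs and outputs on update.
        "inputs": [row["Source Identity"]],
        "outputs": [row["Target Identity"]],
    }
    if key in store:
        store[key].update(process)
        return "updated"
    store[key] = process
    return "created"
```

Calling `upsert_process` twice with rows that share both key columns leaves a single entry in `store`, carrying the values from the second row.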

CSV file

Connections must pre-exist

This package will not create any connections — the connections referenced must already exist before running the package.
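Since connections are never created for you, it can help to check the file before running the package. This hypothetical pre-flight check assumes you already have the set of existing (connector, connection name) pairs from Atlan, gathered by whatever means you prefer, and reports every row that references a pair not in that set.

```python
from typing import Dict, Iterable, List, Set, Tuple

# (connector column, connection column) pairs from the CSV format described here.
CONNECTION_COLUMNS = (
    ("Source Connector", "Source Connection"),
    ("Target Connector", "Target Connection"),
    ("Transformation Connector", "Transformation Connection"),
)


def missing_connections(rows: Iterable[Dict[str, str]],
                        existing: Set[Tuple[str, str]]) -> List[Tuple[int, str, str]]:
    """Return (row number, connector, connection) for each unknown connection reference."""
    problems: List[Tuple[int, str, str]] = []
    for number, row in enumerate(rows, start=1):
        for connector_column, connection_column in CONNECTION_COLUMNS:
            pair = (row.get(connector_column, ""), row.get(connection_column, ""))
            if pair not in existing:
                problems.append((number, pair[0], pair[1]))
    return problems
```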

Detailed information on the columns in the CSV file:

Source Type

Required. Type of the source (input, upstream) asset in the lineage.

Source Connector

Required. Type of connector for that asset (e.g. the kind of source system).

Source Connection

Required. Name of the connection in Atlan that should contain the asset. (If the asset does not exist and creation is permitted, this is the connection in which the source asset will be created.)

Source Identity

Required. Unique identity of the source asset in Atlan within the connection. This is essentially its qualifiedName, excluding the connection portion of the qualifiedName (which will instead be calculated for you from the preceding fields).
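As an illustration of how the full qualifiedName can be derived, the sketch below assumes Atlan's usual convention that an asset's qualifiedName is its connection's qualifiedName followed by `/` and the identity. The lookup table and the example connection qualifiedName are hypothetical; the same rule would apply to Target Identity and Transformation Identity.

```python
from typing import Dict, Tuple


def asset_qualified_name(connection_qns: Dict[Tuple[str, str], str],
                         connector: str, connection: str, identity: str) -> str:
    """Prefix the identity with the qualifiedName of its (connector, connection).

    Raises KeyError when the connection is unknown, mirroring the rule that
    connections must already exist.
    """
    connection_qn = connection_qns[(connector, connection)]
    return f"{connection_qn}/{identity.lstrip('/')}"


# Hypothetical connection qualifiedName, for illustration only.
CONNECTION_QNS = {("snowflake", "production"): "default/snowflake/1234567890"}
```

Under these assumptions, identity `ANALYTICS/PUBLIC/ORDERS` in the `production` Snowflake connection becomes `default/snowflake/1234567890/ANALYTICS/PUBLIC/ORDERS`.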

Source Name

Required. Simple name of the source asset.

Target Type

Required. Type of the target (output, downstream) asset in the lineage.

Target Connector

Required. Type of connector for that asset (e.g. the kind of target system).

Target Connection

Required. Name of the connection in Atlan that should contain the asset. (If the asset does not exist and creation is permitted, this is the connection in which the target asset will be created.)

Target Identity

Required. Unique identity of the target asset in Atlan within the connection. This is essentially its qualifiedName, excluding the connection portion of the qualifiedName (which will instead be calculated for you from the preceding fields).

Target Name

Required. Simple name of the target asset.

Transformation Connector

Required. Type of connector for the transformation or data movement process itself (e.g. the kind of system).

Transformation Connection

Required. Name of the connection in Atlan that should contain the lineage process (that is, where the lineage process will be created or updated).

Transformation Identity

Required. Unique identity of the transformation (lineage) process in Atlan within the connection. This is essentially its qualifiedName, excluding the connection portion of the qualifiedName (which will instead be calculated for you from the preceding fields).

Transformation Name

Required. Simple name to use when displaying the lineage process.
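A minimal sketch of producing a file with the required columns using Python's standard `csv` module follows. It assumes the header names match the column titles above exactly; the connector values, connection names and identities in `example_row` are made-up placeholders. Rows carrying additional columns would need those columns added to `fieldnames`.

```python
import csv
import io
from typing import Dict, Iterable

# Required columns, in the order they are described above (header names assumed).
REQUIRED_COLUMNS = [
    "Source Type", "Source Connector", "Source Connection", "Source Identity", "Source Name",
    "Target Type", "Target Connector", "Target Connection", "Target Identity", "Target Name",
    "Transformation Connector", "Transformation Connection",
    "Transformation Identity", "Transformation Name",
]


def write_lineage_csv(rows: Iterable[Dict[str, str]], separator: str = ",") -> str:
    """Render rows as CSV text, refusing rows with a blank required value."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=REQUIRED_COLUMNS,
                            delimiter=separator, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        missing = [column for column in REQUIRED_COLUMNS if not row.get(column)]
        if missing:
            raise ValueError(f"missing required values: {missing}")
        writer.writerow(row)
    return buffer.getvalue()


# Placeholder values: one table feeding another through a hypothetical job.
example_row = {
    "Source Type": "Table", "Source Connector": "snowflake",
    "Source Connection": "production", "Source Identity": "ANALYTICS/RAW/ORDERS",
    "Source Name": "ORDERS",
    "Target Type": "Table", "Target Connector": "snowflake",
    "Target Connection": "production", "Target Identity": "ANALYTICS/MART/FCT_ORDERS",
    "Target Name": "FCT_ORDERS",
    "Transformation Connector": "airflow", "Transformation Connection": "orchestration",
    "Transformation Identity": "load_orders/build_fct_orders",
    "Transformation Name": "Build FCT_ORDERS",
}

if __name__ == "__main__":
    print(write_lineage_csv([example_row]), end="")
```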

Additional columns

You can also supply any number of additional columns. These are loaded as attributes on the lineage process itself. Each one must either follow the format described in the asset export (basic) package, or use a column name that matches one of the lineage process properties or more general asset properties.
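When you add extra columns, it may help to keep them apart from the required ones. This sketch splits a parsed row into required lineage fields and extra process attributes. The `description` column used below is only an example and is assumed to match an asset property; ignoring empty cells is a choice made for the sketch, not documented package behavior.

```python
from typing import Dict, Tuple

# The fourteen required columns described above (header names assumed).
REQUIRED_COLUMNS = {
    "Source Type", "Source Connector", "Source Connection", "Source Identity", "Source Name",
    "Target Type", "Target Connector", "Target Connection", "Target Identity", "Target Name",
    "Transformation Connector", "Transformation Connection",
    "Transformation Identity", "Transformation Name",
}


def split_row(row: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Separate required lineage fields from extra process attributes."""
    required = {key: value for key, value in row.items() if key in REQUIRED_COLUMNS}
    # This sketch ignores empty extra cells rather than clearing attributes.
    extras = {key: value for key, value in row.items()
              if key not in REQUIRED_COLUMNS and value != ""}
    return required, extras
```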

How it works

  1. If asset creation is permitted, runs a first pass through the supplied CSV file to create any missing assets within their respective connections.
  2. Then creates (or updates) a single lineage process per row of the supplied CSV file.
  3. Finally, updates any persistent connection cache with the assets that were created or updated.
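The first pass only needs to create each missing asset once, even if it appears in many rows. A minimal sketch of collecting those assets in file order follows; it assumes assets are keyed on type, connector, connection and identity, and it ignores the case-sensitivity option for simplicity.

```python
from typing import Dict, List, Set, Tuple

# Assumed key for an asset: (type, connector, connection, identity).
AssetKey = Tuple[str, str, str, str]


def assets_to_create(rows: List[Dict[str, str]], existing: Set[AssetKey]) -> List[AssetKey]:
    """Collect each missing source or target asset once, in the order first seen."""
    seen: Set[AssetKey] = set()
    ordered: List[AssetKey] = []
    for row in rows:
        for side in ("Source", "Target"):
            key = (row[f"{side} Type"], row[f"{side} Connector"],
                   row[f"{side} Connection"], row[f"{side} Identity"])
            if key not in existing and key not in seen:
                seen.add(key)
                ordered.append(key)
    return ordered
```

An asset that is the target of one row and the source of the next is therefore created only once.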