Lineage and Asset Loader

This package creates lineage and (optionally) assets based on an input CSV file that contains source-to-target mappings.

Use Cases

  • ELT/ETL is performed in a tool for which Atlan has no native connector, but source-to-target mapping information can be extracted or produced from the tool.
  • Lineage cannot be extracted from any tool, but the source-to-target mapping information can be generated through simple logic or manual effort.

Pre-requisites

Input

Each instance of the custom package workflow has one input mapping file. The file is a CSV with fields that identify/map the source and target assets to one another.

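The fields in the input mapping file fall into the sets described in the sections below: source identifiers, target identifiers, asset creation controllers, and lineage metadata fields.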

The precise fields that are required for Source and Target sets depend on the asset types.
Currently, the package supports Relational Database and S3 types only. Each file can have only one source type and one target type.

Source Identifiers

Regardless of the source type, the following two fields are required:

  • SOURCE_TYPE: Object type of the source asset that the lineage is to be created for. Must be "Table" for database type and "S3 Object" for S3 type.
  • SOURCE_CONN: Qualified Name of the connection where the source assets reside or will be created

Use the following fields when the source assets are from a Relational Database.

  • SOURCE_DB: Name of the database where the source table object is found
  • SOURCE_SCHEMA: Name of the schema where the source table object is found
  • SOURCE_TABLE: Name of the source table object

Use the following fields when the source assets are from S3.

  • SOURCE_BUCKET: Name of the bucket where the S3 objects are found
  • SOURCE_BUCKET_ARN: ARN of the bucket
  • SOURCE_OBJECT: Name (key) of the S3 object
  • SOURCE_OBJECT_ARN: ARN of the S3 object

Target Identifiers

Regardless of the target type, the following two fields are required:

  • TARGET_TYPE: Object type of the target asset that the lineage is to be created for. Must be "Table" for database type and "S3 Object" for S3 type.
  • TARGET_CONN: Qualified Name of the connection where the target assets reside or will be created

Use the following fields when the target assets are from a Relational Database.

  • TARGET_DB: Name of the database where the target table object is found
  • TARGET_SCHEMA: Name of the schema where the target table object is found
  • TARGET_TABLE: Name of the target table object

Use the following fields when the target assets are from S3.

  • TARGET_BUCKET: Name of the bucket where the S3 objects are found
  • TARGET_BUCKET_ARN: ARN of the bucket
  • TARGET_OBJECT: Name (key) of the S3 object
  • TARGET_OBJECT_ARN: ARN of the S3 object
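The identifier rules above can be checked before a file is handed to the workflow. The following is a minimal sketch, not part of the package: the helper `missing_fields` is hypothetical, and it assumes that every field listed for a given type must be non-empty, which the field lists imply but do not state outright.

```python
# Field names come from the lists above; the helper itself is hypothetical.
COMMON_SUFFIXES = ("TYPE", "CONN")
SUFFIXES_BY_TYPE = {
    "Table": ("DB", "SCHEMA", "TABLE"),
    "S3 Object": ("BUCKET", "BUCKET_ARN", "OBJECT", "OBJECT_ARN"),
}


def missing_fields(row, side):
    """Return the identifier fields that are absent or blank on one side of a row.

    `row` is a dict for one CSV line; `side` is "SOURCE" or "TARGET".
    Assumption: every field listed for the asset type is mandatory.
    """
    asset_type = (row.get(f"{side}_TYPE") or "").strip()
    if asset_type not in SUFFIXES_BY_TYPE:
        raise ValueError(
            f"{side}_TYPE must be one of {sorted(SUFFIXES_BY_TYPE)}, got {asset_type!r}"
        )
    names = [f"{side}_{s}" for s in COMMON_SUFFIXES + SUFFIXES_BY_TYPE[asset_type]]
    return [name for name in names if not (row.get(name) or "").strip()]
```

Calling `missing_fields(row, "SOURCE")` and `missing_fields(row, "TARGET")` for each row gives a quick list of blanks to fix before uploading the file.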

Asset creation controllers

  • CREATE_SOURCE_IF_NOT_EXISTS: Controls whether the workflow creates the source asset referenced by the source identifier fields, or requires it to already exist in order to generate the lineage. Valid values are "TRUE" and "FALSE".
  • CREATE_TARGET_IF_NOT_EXISTS: Controls whether the workflow creates the target asset referenced by the target identifier fields, or requires it to already exist in order to generate the lineage. Valid values are "TRUE" and "FALSE".

Lineage metadata fields

  • DESCRIPTION: Description to be saved on the lineage/process asset that connects the source/target objects on that row of the mapping file.
  • EXPRESSION: SQL/Expression to be saved on the lineage/process asset.

Templates

Workflow Setup

Credentials

Input Method

  • Input: The method by which the package will access the input file. Currently, the only option is S3 Bucket.

S3 Input Option Parameters

  • AWS Access Key: AWS Access Key used to gain access to the S3 bucket where the mapping file is located.
  • AWS Access Secret: AWS Access Secret used to gain access to the S3 bucket where the mapping file is located.
  • S3 Bucket Name: Name of the S3 bucket where the input file is located.
  • Mapping Filename/Key: Name of the CSV mapping file/key in S3 including the prefix.
  • S3 Region: AWS Region where the S3 bucket is located.

Configuration

  • Connection QN: Qualified Name of the connection where the lineage/process assets will be created. NOTE: This connection must be created via the API, separately from running the workflow.
  • Name: Name of the custom metadata set created/used by the workflow to store the reference info about this source. The same set can be used for multiple instances of this package.
  • Instance Name: Name of the custom metadata property that will store the identity of the workflow.
  • Instance Unique ID: Unique identifier to be stored on each asset that is created by this workflow. MUST BE UNIQUE TO THE WORKFLOW.

How it works

  • The Custom Metadata set/property used for identifying the workflow may be pre-created, or it will be created by the workflow. It is recommended to let the workflow create it as it will be "locked" in the UI so that it cannot be inadvertently modified.
  • Every unique asset in the input file (database, schema, table, S3 bucket, S3 object, etc.) that has the "CREATE_SOURCE/TARGET_IF_NOT_EXISTS" field set to "TRUE" will be created by the workflow.
  • If an asset has the "CREATE_SOURCE/TARGET_IF_NOT_EXISTS" field set to "FALSE", the workflow will not create the asset. It must already exist in Atlan under the specified connection if the lineage represented by that row is to be generated.
  • Lineage will be created in Atlan for every row in the input file for which both the source and target assets exist in Atlan and are active (either created by the workflow or pre-existing).
  • If the Description or Expression lineage metadata are updated in the input mapping file after the lineage was created by the workflow, they will be updated on the process asset in a subsequent run.
  • Every asset (including lineage) will have the Custom Metadata property identified in the configuration set to the Instance Unique ID value, so that the workflow can locate the assets it authored in subsequent runs and deprecate them if needed.
  • If assets or lineage previously created by the workflow are no longer found in the input mapping file, the workflow will deprecate them (archive the assets, purge/delete the lineage).
  • If, on a subsequent run, the "CREATE_SOURCE/TARGET_IF_NOT_EXISTS" field is set to "FALSE" for an asset that was previously created by the workflow, the asset (and corresponding lineage) will be deprecated.
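
The create/deprecate rules above amount to a set comparison between the current mapping file and what the workflow authored on earlier runs. The sketch below models that comparison only and is not the package's actual implementation. `Row`, `plan_run`, and the string asset identifiers are hypothetical, and it assumes an asset is created if any row flags it "TRUE".

```python
from typing import NamedTuple


class Row(NamedTuple):
    source: str           # identifier of the source asset on this row
    target: str           # identifier of the target asset on this row
    create_source: bool   # CREATE_SOURCE_IF_NOT_EXISTS == "TRUE"
    create_target: bool   # CREATE_TARGET_IF_NOT_EXISTS == "TRUE"


def plan_run(rows, preexisting, created_before, lineage_before):
    """Work out what one run would create, archive, and purge.

    preexisting:    active assets in Atlan that this workflow did not author
    created_before: assets this workflow authored on earlier runs
    lineage_before: (source, target) pairs this workflow authored earlier
    """
    # Assets the file asks the workflow to own (assumption: any TRUE flag wins).
    wanted = {r.source for r in rows if r.create_source}
    wanted |= {r.target for r in rows if r.create_target}
    # Assets that will be active once the run completes.
    active = set(preexisting) | wanted
    # Lineage is generated only when both ends are active.
    lineage = {(r.source, r.target) for r in rows
               if r.source in active and r.target in active}
    return {
        "create_assets": wanted - set(created_before),
        "archive_assets": set(created_before) - wanted,
        "create_lineage": lineage - set(lineage_before),
        "purge_lineage": set(lineage_before) - lineage,
    }


rows = [
    Row("s3/raw/orders.csv", "DB.SALES.ORDERS", True, False),
    Row("DB.SALES.ORDERS", "DB.SALES.ORDERS_AGG", False, True),
]
plan = plan_run(
    rows,
    preexisting={"DB.SALES.ORDERS"},
    created_before={"s3/raw/orders.csv", "s3/raw/old.csv"},
    lineage_before={("s3/raw/orders.csv", "DB.SALES.ORDERS"),
                    ("s3/raw/old.csv", "DB.SALES.ORDERS")},
)
print(plan)
```

In this example `s3/raw/old.csv` no longer appears in the file, so it lands in `archive_assets` and its lineage lands in `purge_lineage`, while `DB.SALES.ORDERS_AGG` is new and lands in `create_assets`.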