
Lineage generator (no transformations)

The lineage generator (no transformations) package automatically detects assets with the same (or similar) name between two connections and creates the lineage between them.

To avoid creating lineage blindly, the package provides an option to preview the output. The typical workflow is:

  1. Ask the package to generate the lineage preview.
  2. If happy with the output, ask the package to generate the lineage on Atlan.

The package also provides a method to delete lineage created by the package itself.

Configuration

  • Source asset type: type name of the lineage input assets (sources).
  • Source qualified name prefix: qualified name prefix of the lineage input assets (sources).
  • Target asset type: type name of the lineage output assets (targets).
  • Target qualified name prefix: qualified name prefix of the lineage output assets (targets).
  • Case sensitive match: whether to match asset names using a case sensitive logic, default: No.
  • Match on schema: whether to include the schema name when matching source and target assets. The option is ignored if either "Source asset type" or "Target asset type" is neither a relational type (Table, View, Materialized View or Column) nor a MongoDB Collection. Default: No.
  • Output type, default: Preview Lineage:
    • Preview Lineage: to generate a csv with the lineage preview.
    • Generate Lineage: to generate the lineage on Atlan.
    • Delete Lineage: to delete the lineage on Atlan.
  • Generate lineage on child assets: whether to generate the lineage on the child assets specified on Source asset type and Target asset type, default: No.
  • Regex to match characters to replace (optional): if assets are renamed between the source and the target in a way that a regex pattern can identify, use this field to specify the pattern matching the characters to be replaced.
  • Regex with replacements characters (optional): if assets are renamed between the source and the target in a way that a regex pattern can identify, use this field to specify the replacement characters.
  • Regex to match characters to replace on the schema (optional): if schemas are renamed between the source and the target in a way that a regex pattern can identify, use this field to specify the pattern matching the characters to be replaced. Applicable only if "Match on schema" is "Yes".
  • Regex with replacements characters on the schema (optional): if schemas are renamed between the source and the target in a way that a regex pattern can identify, use this field to specify the replacement characters. Applicable only if "Match on schema" is "Yes".
  • Match prefix (optional): prefix to add to source assets to match with target ones.
  • Match suffix (optional): suffix to add to source assets to match with target ones.
  • File advanced separator (applicable to file based assets only) (optional): separator used to split the qualified name. E.g. if the separator is equal to /: default/s3/1707397085/arn:aws:s3:::mybucket/prefix/myobject.csv -> [default,s3,1707397085,arn:aws:s3:::mybucket,prefix,myobject.csv]
  • File advanced position (applicable to file based assets only) (optional): number of substrings (created using File advanced separator) to use for the asset match, counted from right to left. In the example for File advanced separator, a value of 3 gives -> [arn:aws:s3:::mybucket,prefix,myobject.csv]
  • Process connection (optional): connection for the process assets. If blank the process assets will be assigned to the source assets connection.

Warning

If "Match on schema" is "Yes" and the same regex replacement applies to both the schema and the child assets, both pairs of regex fields must be filled in with the same rules.

What it does

  1. Retrieve the list of assets that match the asset types (Source asset type and Target asset type) and the qualified name prefix (Source qualified name prefix and Target qualified name prefix) defined in the configuration.
  2. Generate the matching key starting from each asset qualifiedName:

    Relational assets: remove from the qualifiedName the parts related to connection, database and schema.

    • If "Match on schema" is No
      • default/snowflake/1697720934/DATABASE/SCHEMA/TABLE -> TABLE
      • default/snowflake/1697720934/DATABASE/SCHEMA/TABLE/COLUMN -> TABLE/COLUMN
    • If "Match on schema" is Yes
      • default/snowflake/1697720934/DATABASE/SCHEMA/TABLE -> SCHEMA/TABLE
      • default/snowflake/1697720934/DATABASE/SCHEMA/TABLE/COLUMN -> SCHEMA/TABLE/COLUMN
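
The relational examples above can be reproduced with a few lines of string handling. This is only a sketch, not the package's actual code: the function name `relational_matching_key` is hypothetical, and it assumes the qualifiedName always has the shape connection (three segments) / database / schema / table [/ column].

```python
def relational_matching_key(qualified_name: str, match_on_schema: bool = False) -> str:
    """Hypothetical helper: derive the matching key of a relational asset."""
    parts = qualified_name.split("/")
    # parts[0:3] -> connection (e.g. default/snowflake/1697720934)
    # parts[3]   -> database
    # parts[4]   -> schema (kept only when matching on schema)
    start = 4 if match_on_schema else 5
    return "/".join(parts[start:])


table = "default/snowflake/1697720934/DATABASE/SCHEMA/TABLE"
column = "default/snowflake/1697720934/DATABASE/SCHEMA/TABLE/COLUMN"

print(relational_matching_key(table))                         # TABLE
print(relational_matching_key(column))                        # TABLE/COLUMN
print(relational_matching_key(table, match_on_schema=True))   # SCHEMA/TABLE
print(relational_matching_key(column, match_on_schema=True))  # SCHEMA/TABLE/COLUMN
```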

    Salesforce assets: remove from the qualifiedName the parts related to connection and organization.

    • default/salesforce/1697720934/ORGANIZATION/OBJECT -> OBJECT
    • default/salesforce/1697720934/ORGANIZATION/OBJECT/FIELD -> OBJECT/FIELD

    MongoDB Collections: remove from the qualifiedName the parts related to connection and database.

    • If "Match on schema" is No
      • default/mongodb/1697720934/DATABASE/COLLECTION -> COLLECTION
    • If "Match on schema" is Yes
      • default/mongodb/1697720934/DATABASE/COLLECTION -> DATABASE/COLLECTION
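
The Salesforce and MongoDB rules above follow the same idea: drop a fixed number of leading segments. A minimal sketch, assuming the qualifiedName layouts shown in the examples; both function names are hypothetical and not part of the package.

```python
def salesforce_matching_key(qualified_name: str) -> str:
    """Hypothetical helper: drop connection (3 segments) and organization."""
    return "/".join(qualified_name.split("/")[4:])


def mongodb_matching_key(qualified_name: str, match_on_schema: bool = False) -> str:
    """Hypothetical helper: drop the connection, and the database unless
    "Match on schema" is enabled (the database then plays the schema role)."""
    parts = qualified_name.split("/")
    start = 3 if match_on_schema else 4
    return "/".join(parts[start:])


print(salesforce_matching_key("default/salesforce/1697720934/ORGANIZATION/OBJECT/FIELD"))
# OBJECT/FIELD
print(mongodb_matching_key("default/mongodb/1697720934/DATABASE/COLLECTION"))
# COLLECTION
print(mongodb_matching_key("default/mongodb/1697720934/DATABASE/COLLECTION", True))
# DATABASE/COLLECTION
```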

    File based assets (e.g. S3 or ADLS objects): fetch from the qualifiedName the part related to the object.

    • If either "File advanced separator" or "File advanced position" is empty:
      • default/s3/1707397085/arn:aws:s3:::mybucket/prefix/myobject.csv -> myobject
      • default/adls/123456789/myaccount/mycontainer/myobject.csv -> myobject
    • Else:
      • if the "File advanced separator" is equal to /: default/s3/1707397085/arn:aws:s3:::mybucket/prefix/myobject.csv -> [default,s3,1707397085,arn:aws:s3:::mybucket,prefix,myobject.csv]
      • if "File advanced position" is equal to 3 -> [arn:aws:s3:::mybucket,prefix,myobject.csv]
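
The file based rules above can be sketched as follows. This is an illustration under assumptions, not the package's implementation: the function name `file_matching_key` is hypothetical, the default branch assumes the extension is stripped with `posixpath.splitext` (so only the last extension is removed), and the advanced branch returns the list of substrings exactly as shown in the example, since the text does not say how they are combined into the final key.

```python
import posixpath
from typing import List, Optional, Union


def file_matching_key(
    qualified_name: str,
    separator: Optional[str] = None,
    position: Optional[int] = None,
) -> Union[str, List[str]]:
    """Hypothetical helper: derive the matching key of a file based asset."""
    if not separator or not position:
        # Default: keep only the object name, without its extension.
        object_name = qualified_name.rsplit("/", 1)[-1]
        return posixpath.splitext(object_name)[0]
    # Advanced: split on the separator and keep the last `position` substrings.
    return qualified_name.split(separator)[-position:]


s3 = "default/s3/1707397085/arn:aws:s3:::mybucket/prefix/myobject.csv"
adls = "default/adls/123456789/myaccount/mycontainer/myobject.csv"

print(file_matching_key(s3))    # myobject
print(file_matching_key(adls))  # myobject
print(file_matching_key(s3, separator="/", position=3))
# ['arn:aws:s3:::mybucket', 'prefix', 'myobject.csv']
```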

    Fetch from the qualifiedName the part related to the Power BI Table and Power BI Column.

    • default/powerbi/1704228169/ac169fd3-6091-459f-b012-35e5cb7aa9c1/14631399-c4eb-409b-a149-693301bb9c1a/TABLE -> TABLE
    • default/powerbi/1704228169/ac169fd3-6091-459f-b012-35e5cb7aa9c1/14631399-c4eb-409b-a149-693301bb9c1a/TABLE/COLUMN -> TABLE/COLUMN
  3. If defined in the configuration, apply the regular expression replacement logic to the source asset matching keys.

  4. If defined in the configuration, add the match prefix and/or suffix to the source asset matching keys.
  5. If case-insensitive matching is configured, convert all asset matching keys to upper case.
  6. Use the matching keys to match source and target assets.

    Matching logic

    • Salesforce Objects are matched with Tables/Views/Materialized Views and Salesforce Fields with Columns.
    • MongoDB Collections are matched with Tables/Views/Materialized Views.
    • S3, ADLS and GCS Objects are matched with Tables/Views/Materialized Views.
    • Tables/Views/Materialized Views are matched with Tables/Views/Materialized Views.
    • S3, ADLS and GCS Objects are matched with S3, ADLS and GCS Objects.
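
Steps 3 to 6 can be pictured as a small normalisation and lookup pipeline. The sketch below is an assumption about how such logic could look, not the package's code: `normalise_source_key` and `match_assets` are hypothetical names, the prefix and suffix are assumed to be added around the whole key, and the qualified names in the example are made up.

```python
import re
from typing import Dict, List, Optional, Tuple


def normalise_source_key(key: str, pattern: Optional[str] = None, replacement: str = "",
                         prefix: str = "", suffix: str = "",
                         case_sensitive: bool = False) -> str:
    """Hypothetical helper: apply steps 3 to 5 to one source matching key."""
    if pattern:
        key = re.sub(pattern, replacement, key)      # step 3: regex replacement
    key = f"{prefix}{key}{suffix}"                   # step 4: prefix / suffix
    return key if case_sensitive else key.upper()    # step 5: upper case


def match_assets(source_keys: Dict[str, str], target_keys: Dict[str, str],
                 case_sensitive: bool = False, **rules: str) -> List[Tuple[str, str, str]]:
    """Step 6: pair sources and targets whose normalised keys are equal.

    Both dicts map a qualifiedName to its raw matching key.
    """
    targets_by_key: Dict[str, List[str]] = {}
    for qualified_name, key in target_keys.items():
        key = key if case_sensitive else key.upper()
        targets_by_key.setdefault(key, []).append(qualified_name)

    matches = []
    for qualified_name, key in source_keys.items():
        key = normalise_source_key(key, case_sensitive=case_sensitive, **rules)
        for target in targets_by_key.get(key, []):
            matches.append((qualified_name, target, key))
    return matches


sources = {"default/snowflake/1/DB/RAW/stg_orders": "stg_orders"}
targets = {"default/snowflake/2/DB/CORE/ORDERS": "ORDERS"}
print(match_assets(sources, targets, pattern=r"^stg_", replacement=""))
# [('default/snowflake/1/DB/RAW/stg_orders', 'default/snowflake/2/DB/CORE/ORDERS', 'ORDERS')]
```

Indexing the targets by key first keeps the lookup to one dictionary access per source asset, which matters when both connections contain many assets.
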
  7. Depending on the output type defined in the configuration, one of the following is executed:

    Preview Lineage: a CSV file listing the matching assets is generated with the following columns:

    • source_guid
    • source_qualified_name
    • source_asset_type
    • target_guid
    • target_qualified_name
    • target_asset_type
    • full_name (aka matching key)
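
To make the preview layout concrete, the sketch below writes one row with the columns listed above using Python's standard `csv` module. It is an illustration only: `write_preview` is a hypothetical name, and the GUIDs and qualified names in the sample row are placeholders, not real output of the package.

```python
import csv
import io
from typing import Dict, Iterable, TextIO

PREVIEW_COLUMNS = [
    "source_guid", "source_qualified_name", "source_asset_type",
    "target_guid", "target_qualified_name", "target_asset_type",
    "full_name",
]


def write_preview(rows: Iterable[Dict[str, str]], handle: TextIO) -> None:
    """Hypothetical helper: write matched assets in the preview layout."""
    writer = csv.DictWriter(handle, fieldnames=PREVIEW_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


# Placeholder values for illustration only.
sample_row = {
    "source_guid": "source-guid-placeholder",
    "source_qualified_name": "default/snowflake/1697720934/DATABASE/SCHEMA/TABLE",
    "source_asset_type": "Table",
    "target_guid": "target-guid-placeholder",
    "target_qualified_name": "default/snowflake/1697720935/DATABASE/SCHEMA/TABLE",
    "target_asset_type": "Table",
    "full_name": "TABLE",
}

buffer = io.StringIO()
write_preview([sample_row], buffer)
print(buffer.getvalue())
```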

    How to download the file

    The file with the lineage preview can be downloaded from the Argo workflow log screen.

    Warning

    The file with the lineage preview is always generated, even if Preview Lineage is not selected as the output type.

    Generate Lineage: process assets (lineage) are created between the assets matched in (6).

    Warning

    If Process connection is blank, the process assets will be assigned to the connection of the source assets.

    Delete Lineage: process assets (lineage) between the assets matched in (6) are deleted.

    Warning

    Only processes (lineage) created using this package are deleted.