Lineage generator (no transformations)

The lineage generator (no transformations) package automatically detects assets with the same (or similar) name between two connections and creates the lineage between them.

To avoid letting the package create lineage blindly, an option to preview the output is provided. The typical way to use this package is:

Ask the package to generate the lineage preview.
If happy with the output, ask the package to generate the lineage on Atlan.

The package also provides a method to delete lineage created by the package itself.

Configuration

Source asset type: type name of the lineage input assets (sources).
Source qualified name prefix: qualified name prefix of the lineage input assets (sources).
Target asset type: type name of the lineage output assets (targets).
Target qualified name prefix: qualified name prefix of the lineage output assets (targets).
Case sensitive match: whether to match asset names using case-sensitive logic, default: No.
Match on schema: whether to include the schema name when matching source and target assets. The option is ignored if either "Source asset type" or "Target asset type" is neither a relational type (Table, View, Materialized View, Calculation View or Column) nor a MongoDB Collection, default: No.
Output type, default: Preview Lineage:
- Preview Lineage: to generate a CSV file with the lineage preview.
- Generate Lineage: to generate the lineage on Atlan.
- Delete Lineage: to delete the lineage on Atlan.
Generate lineage on child assets: whether to generate the lineage on the child assets specified on Source asset type and Target asset type, default: No.
Regex to match characters to replace (optional): if assets are renamed between the source and the target in a way that a regex pattern can identify, use this field to identify the characters to be replaced. Multiple patterns can be defined by separating them with @@@@@ (the number of patterns must match the number defined in the Regex with replacements characters parameter).
Regex with replacements characters (optional): if assets are renamed between the source and the target in a way that a regex pattern can identify, use this field to specify the replacement characters. Multiple patterns can be defined by separating them with @@@@@ (the number of patterns must match the number defined in the Regex to match characters to replace parameter).
Regex to match characters to replace on the schema (optional): if the schema is renamed between the source and the target in a way that a regex pattern can identify, use this field to identify the characters to be replaced. Applicable only if "Match on schema" is "Yes".
Regex with replacements characters on the schema (optional): if the schema is renamed between the source and the target in a way that a regex pattern can identify, use this field to specify the replacement characters. Applicable only if "Match on schema" is "Yes".
Regex to match characters to replace on the name + schema together (optional): if the combined name + schema is renamed between the source and the target in a way that a regex pattern can identify, use this field to identify the characters to be replaced. Applicable only if "Match on schema" is "Yes". It overrides any other regex defined.
Regex with replacements characters on the name + schema together (optional): if the combined name + schema is renamed between the source and the target in a way that a regex pattern can identify, use this field to specify the replacement characters. Applicable only if "Match on schema" is "Yes". It overrides any other regex defined.
Match prefix (optional): prefix to add to source assets to match with target ones.
Match suffix (optional): suffix to add to source assets to match with target ones.
File advanced separator (applicable to file based assets only) (optional): separator used to split the qualified name. E.g. if the separator is equal to /: default/s3/1707397085/arn:aws:s3:::mybucket/prefix/myobject.csv -> [default,s3,1707397085,arn:aws:s3:::mybucket,prefix,myobject.csv]
File advanced position (applicable to file based assets only) (optional): number of substrings (created using File advanced separator) to use for the asset match, counted from right to left. In the above example, if the value is equal to 3 -> [arn:aws:s3:::mybucket,prefix,myobject.csv]
Process connection (optional): connection for the process assets. If blank the process assets will be assigned to the source assets connection.
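
As a rough illustration of how the paired regex parameters could be interpreted, the sketch below splits both values on @@@@@ and applies each pattern with its replacement in turn. The function name `apply_replacements` and the sequential application order are assumptions made for this example; they are not taken from the package's code.

```python
import re

# Separator documented for defining multiple patterns in one field.
SEPARATOR = "@@@@@"


def apply_replacements(key: str, patterns: str, replacements: str) -> str:
    """Apply each regex pattern with its paired replacement (hypothetical helper)."""
    pattern_list = patterns.split(SEPARATOR)
    replacement_list = replacements.split(SEPARATOR)
    # The two parameters must define the same number of entries.
    if len(pattern_list) != len(replacement_list):
        raise ValueError("patterns and replacements must have the same number of entries")
    # Assumption: pairs are applied one after the other, in the order given.
    for pattern, replacement in zip(pattern_list, replacement_list):
        key = re.sub(pattern, replacement, key)
    return key


# Two patterns, both replaced with an empty string.
print(apply_replacements("STG_ORDERS_V1", r"^STG_@@@@@_V\d+$", "@@@@@"))  # -> ORDERS
```

Note that an empty replacement is expressed by leaving the corresponding slot empty, so two empty replacements are written as a single @@@@@.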

Warning

If "Match on schema" is "Yes" and the same regex replacement applies to both the schema and the child assets, both pairs of regex fields must be filled in with the same rules.

What it does

Retrieve the list of assets that match the asset types (Source asset type and Target asset type) and the qualified name prefix (Source qualified name prefix and Target qualified name prefix) defined in the configuration.
Generate the matching key starting from each asset qualifiedName:
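The rule applied depends on the asset type. The rules below are listed in this order: Tables, Views, Materialized Views, Calculation Views and Columns; Salesforce Objects and Fields; MongoDB Collections; S3, ADLS and GCS Objects; Power BI Tables and Columns; Kafka Topic; Looker Field and Looker View.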
Tables, Views, Materialized Views, Calculation Views and Columns: remove from the qualifiedName the part related to connection, database and schema.
- If "Match on schema" is No
  
  default/snowflake/1697720934/DATABASE/SCHEMA/TABLE -> TABLE
  
  default/snowflake/1697720934/DATABASE/SCHEMA/TABLE/COLUMN -> TABLE/COLUMN
- If "Match on schema" is Yes
  
  default/snowflake/1697720934/DATABASE/SCHEMA/TABLE -> SCHEMA/TABLE
  
  default/snowflake/1697720934/DATABASE/SCHEMA/TABLE/COLUMN -> SCHEMA/TABLE/COLUMN
Salesforce Objects and Fields: remove from the qualifiedName the part related to connection and organization.
- default/salesforce/1697720934/ORGANIZATION/OBJECT -> OBJECT
- default/salesforce/1697720934/ORGANIZATION/OBJECT/FIELD -> OBJECT/FIELD
MongoDB Collections: remove from the qualifiedName the part related to connection and database.
- If "Match on schema" is No
  
  default/mongodb/1697720934/DATABASE/COLLECTION -> COLLECTION
- If "Match on schema" is Yes
  
  default/mongodb/1697720934/DATABASE/COLLECTION -> DATABASE/COLLECTION
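
The prefix-stripping rules above can be sketched as dropping a fixed number of leading segments from the qualified name. This is a minimal sketch, assuming that segments are separated by / and that asset names do not themselves contain /; the function names are hypothetical and not part of the package.

```python
# A connection prefix such as default/snowflake/1697720934 spans three segments.
CONNECTION_SEGMENTS = 3


def strip_leading_segments(qualified_name: str, segments_to_drop: int) -> str:
    """Drop the first `segments_to_drop` '/'-separated parts of a qualified name."""
    return "/".join(qualified_name.split("/")[segments_to_drop:])


def relational_key(qualified_name: str, match_on_schema: bool = False) -> str:
    # Drop connection + database, and also the schema unless matching on it.
    extra = 1 if match_on_schema else 2
    return strip_leading_segments(qualified_name, CONNECTION_SEGMENTS + extra)


def salesforce_key(qualified_name: str) -> str:
    # Drop connection + organization.
    return strip_leading_segments(qualified_name, CONNECTION_SEGMENTS + 1)


def mongodb_key(qualified_name: str, match_on_schema: bool = False) -> str:
    # Drop the connection, and also the database unless "Match on schema" is Yes.
    extra = 0 if match_on_schema else 1
    return strip_leading_segments(qualified_name, CONNECTION_SEGMENTS + extra)


print(relational_key("default/snowflake/1697720934/DATABASE/SCHEMA/TABLE/COLUMN"))  # -> TABLE/COLUMN
print(mongodb_key("default/mongodb/1697720934/DATABASE/COLLECTION", match_on_schema=True))  # -> DATABASE/COLLECTION
```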
S3, ADLS and GCS Objects: fetch from the qualifiedName the part related to the object.
- If either "File advanced separator" or "File advanced position" is empty:
  
  default/s3/1707397085/arn:aws:s3:::mybucket/prefix/myobject.csv -> myobject
  
  default/adls/123456789/myaccount/mycontainer/myobject.csv -> myobject
- Else:
  
  if the "File advanced separator" is equal to /: default/s3/1707397085/arn:aws:s3:::mybucket/prefix/myobject.csv -> [default,s3,1707397085,arn:aws:s3:::mybucket,prefix,myobject.csv]
  
  if "File advanced position" is equal to 3 -> [arn:aws:s3:::mybucket,prefix,myobject.csv]
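
The two file-based behaviours can be illustrated with a short sketch. It assumes the default rule keeps the last path segment without its extension (as in the myobject.csv examples), and it returns the advanced result as a list of substrings because the way the package joins them into a key is not stated here. The function name `file_key` is hypothetical.

```python
import posixpath
from typing import List, Optional, Union


def file_key(
    qualified_name: str,
    separator: Optional[str] = None,
    position: Optional[int] = None,
) -> Union[str, List[str]]:
    """Derive the matching key of a file-based asset (illustrative only)."""
    if not separator or not position:
        # Default rule: object name only, extension removed.
        object_name = posixpath.basename(qualified_name)
        return posixpath.splitext(object_name)[0]
    # Advanced rule: split on the separator, keep `position` parts from the right.
    return qualified_name.split(separator)[-position:]


qn = "default/s3/1707397085/arn:aws:s3:::mybucket/prefix/myobject.csv"
print(file_key(qn))          # -> myobject
print(file_key(qn, "/", 3))  # -> ['arn:aws:s3:::mybucket', 'prefix', 'myobject.csv']
```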
Power BI Tables and Columns: fetch from the qualifiedName the part related to the Power BI Table and Power BI Column.
- default/powerbi/1704228169/ac169fd3-6091-459f-b012-35e5cb7aa9c1/14631399-c4eb-409b-a149-693301bb9c1a/TABLE -> TABLE
- default/powerbi/1704228169/ac169fd3-6091-459f-b012-35e5cb7aa9c1/14631399-c4eb-409b-a149-693301bb9c1a/TABLE/COLUMN -> TABLE/COLUMN
Kafka Topic: fetch from the qualifiedName the part related to the topic.
Looker Field and Looker View: fetch from the qualifiedName the part related to the field name only.
- default/looker/1713912275/tile/42/distribution_centers.location -> location
If defined in the configuration, apply the regular expression replacement logic to the source asset matching keys.
If defined in the configuration, add a prefix to the source asset matching keys.
If defined in the configuration, convert all asset matching keys to upper case.
Use the matching keys to match source and target assets.
Matching logic
- Salesforce Objects are matched with Tables/Views/Materialized Views and Salesforce Fields with Columns.
- MongoDB Collections are matched with Tables/Views/Materialized Views.
- S3, ADLS and GCS Objects are matched with Tables/Views/Materialized Views.
- Tables/Views/Materialized Views/Calculation Views are matched with Tables/Views/Materialized Views/Calculation Views.
- S3, ADLS and GCS Objects are matched with S3, ADLS and GCS Objects.
- Looker Fields are matched with Columns.
- Looker Views are matched with Tables/Views/Materialized Views.
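
The prefix, upper-case and matching steps can be sketched as a simple join on the matching key. This is a minimal sketch under stated assumptions: upper-casing is applied when "Case sensitive match" is No, the suffix is appended in the same step as the prefix, and the inputs are plain dictionaries from qualified name to matching key. The function `match_assets` is hypothetical, and asset-type compatibility checks are left out.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def match_assets(
    source_keys: Dict[str, str],
    target_keys: Dict[str, str],
    prefix: str = "",
    suffix: str = "",
    case_sensitive: bool = False,
) -> List[Tuple[str, str, str]]:
    """Return (source_qualified_name, target_qualified_name, matching_key) triples."""

    def normalise(key: str) -> str:
        # Assumption: keys are upper-cased when the match is not case sensitive.
        return key if case_sensitive else key.upper()

    # Index the targets by matching key so each source needs a single lookup.
    targets_by_key = defaultdict(list)
    for qualified_name, key in target_keys.items():
        targets_by_key[normalise(key)].append(qualified_name)

    matches = []
    for qualified_name, key in source_keys.items():
        # Prefix and suffix are added to the source keys only.
        full_key = normalise(f"{prefix}{key}{suffix}")
        for target_name in targets_by_key.get(full_key, []):
            matches.append((qualified_name, target_name, full_key))
    return matches
```

For example, a source key `orders` with the prefix `stg_` would match a target key `STG_ORDERS` when the match is not case sensitive, and would not match it when it is.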
Depending on the output type defined in the configuration (Preview Lineage, Generate Lineage or Delete Lineage), different logic is executed:
Preview Lineage: a CSV file listing the matching assets is generated with the following columns:
- source_guid
- source_qualified_name
- source_asset_type
- target_guid
- target_qualified_name
- target_asset_type
- full_name (aka matching key)
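
Before switching to Generate Lineage, it can help to summarise the preview. The sketch below counts matches per source and target asset type. It assumes the file has a header row using the column names listed above; the sample rows, GUIDs and qualified names are made up for illustration, and `summarise_preview` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# Hypothetical preview content; real files will contain actual GUIDs and names.
SAMPLE = (
    "source_guid,source_qualified_name,source_asset_type,"
    "target_guid,target_qualified_name,target_asset_type,full_name\n"
    "guid-1,default/snowflake/1/DB/RAW/ORDERS,Table,"
    "guid-2,default/snowflake/2/DB/CORE/ORDERS,Table,ORDERS\n"
    "guid-3,default/snowflake/1/DB/RAW/CUSTOMERS,Table,"
    "guid-4,default/snowflake/2/DB/CORE/CUSTOMERS,View,CUSTOMERS\n"
)


def summarise_preview(csv_text: str) -> Counter:
    """Count previewed matches per (source type, target type) pair."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(
        (row["source_asset_type"], row["target_asset_type"]) for row in reader
    )


for (source_type, target_type), count in summarise_preview(SAMPLE).items():
    print(f"{source_type} -> {target_type}: {count}")
```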
How to download the file

The file with the lineage preview can be downloaded from the Argo workflow log screen.

Warning

The file with the lineage preview is always generated, even if Preview Lineage is not selected as the output type.
Generate Lineage: process assets (lineage) are created between the assets matched in (5).

Warning

If Process connection is blank the process assets will be assigned to the source assets connection.

Delete Lineage: process assets (lineage) between the assets matched in (5) are deleted.

Warning

Only processes (lineage) created using this package are deleted.