Lineage builder¶
The lineage builder package allows you to create lineage between any source and any target asset.
Use cases¶
- Add your own arbitrary lineage connections, to render in lineage like any others
- Inject lineage for transformations from a tool Atlan does not mine out-of-the-box
Configuration¶
Source¶
- Import lineage from: select how you want to provide the input file to be processed (see below for the required format)
Directly upload a CSV file containing the lineage.
- Lineage file: the CSV file containing lineage details to load.
Limited file sizes
This option is generally limited to ~10-20MB for each file. For anything larger, use object storage.
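As a rough illustration of the size guidance above, the following sketch checks a local file against a threshold before you decide how to supply it. The 10MB threshold is an assumption taken from the lower end of the stated range, and the function name is hypothetical; neither is part of the package.

```python
import os

# Assumption: use the lower end of the ~10-20MB guidance as a conservative limit.
DIRECT_UPLOAD_LIMIT_BYTES = 10 * 1024 * 1024


def suggest_import_method(path: str, limit: int = DIRECT_UPLOAD_LIMIT_BYTES) -> str:
    """Suggest 'direct upload' for small files, otherwise 'object storage'."""
    size = os.path.getsize(path)
    return "direct upload" if size <= limit else "object storage"
```

For example, calling `suggest_import_method("lineage.csv")` on a file of a few kilobytes would suggest a direct upload.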
Retrieve the lineage file from cloud object storage.
- Prefix (path): the directory (path) within the object store from which to fetch the file containing lineage metadata.
- Object key (filename): the object key (filename), including its extension, within the object store and prefix.
- Cloud object store: the object store from which to fetch the file containing lineage metadata.
- AWS access key: your AWS access key.
- AWS secret key: your AWS secret key.
- Region: your AWS region.
- Bucket: your AWS bucket.
Reusing Atlan's backing S3 store
When your Atlan tenant is deployed in AWS, you can leave all of these blank to reuse the backing store of Atlan itself. You can also set up a cross-account bucket policy for Atlan to have access to your S3 bucket and the objects within it, and leave these blank.
- Project ID: the ID of your GCP project.
- Service account JSON: your service account credentials, as JSON.
- Bucket: your GCS bucket.
Reusing Atlan's backing GCS store
When your Atlan tenant is deployed in GCP, you can leave all of these blank to reuse the backing store of Atlan itself.
- Azure client ID: the unique application (client) ID assigned to your app by Azure AD when the app was registered.
- Azure client secret: your Azure client secret (its actual value, not its identifier).
- Azure tenant ID: the unique identifier of the Azure Active Directory instance.
- Storage account name: name of your storage account.
- Container: your ADLS container.
Reusing Atlan's backing ADLS store
When your Atlan tenant is deployed in Azure, you can leave all of these blank to reuse the backing store of Atlan itself.
Options¶
- Unknown asset handling: how you want to handle source and target assets in the input file that do not match any assets in Atlan
Any source or target assets in the input file that do not match any asset in Atlan will be skipped (those lineage rows will not be loaded).
For any source or target asset in the input file that does not match any asset in Atlan, create a "partial" asset. These assets only appear in lineage, and cannot be searched or detailed through the sidebar. They can be later "resolved" into full assets which are then discoverable and visible in the asset sidebar.
Will not work for child assets
The lineage builder package can create partial assets for lineage at the container level (table, view, materialized view), but not at the child (field) level (column, etc). If you want to create field-level lineage using partial assets, you must first create those field-level partial assets using an alternative such as the relational assets builder.
For any source or target asset in the input file that does not match any asset in Atlan, create a full asset. These assets will behave like any other: they will be discoverable through search and appear in the asset sidebar.
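The three behaviours described above can be sketched as a small planning function. This is only an illustration of the decision logic, not the package's implementation: the mode labels (`skip`, `partial`, `full`), the function name, and the sample identities are all hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical labels for the three behaviours; the actual option names in
# the package UI may differ.
SKIP, PARTIAL, FULL = "skip", "partial", "full"


def plan_rows(
    rows: Iterable[Tuple[str, str]],
    known_assets: Set[str],
    mode: str,
) -> Tuple[List[Tuple[str, str]], Dict[str, str]]:
    """Decide which (source, target) rows to load and which assets to create.

    Returns the rows that would be loaded, plus a mapping of each unknown
    asset identity to the kind of asset ("partial" or "full") to create.
    """
    if mode not in (SKIP, PARTIAL, FULL):
        raise ValueError(f"unknown mode: {mode}")
    to_load: List[Tuple[str, str]] = []
    to_create: Dict[str, str] = {}
    for source, target in rows:
        missing = [a for a in (source, target) if a not in known_assets]
        if missing and mode == SKIP:
            continue  # the whole lineage row is dropped
        for asset in missing:
            to_create[asset] = mode
        to_load.append((source, target))
    return to_load, to_create


rows = [("db/raw/orders", "db/clean/orders"), ("db/raw/orders", "db/mart/sales")]
known = {"db/raw/orders", "db/clean/orders"}
print(plan_rows(rows, known, SKIP))
print(plan_rows(rows, known, PARTIAL))
```

In the skip case only the first row survives; in the other two cases both rows are kept and the unknown target is queued for creation.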
- Fail on errors: whether an invalid value in a field should cause the import to fail (Yes) or log a warning, skip that value, and proceed (No).
- Case-sensitive match for assets: whether attempts to match assets should be done case-sensitively (Yes) or case-insensitively (No).
- Field separator: the single character that is used to separate values in the input file, typically either a comma (,) or a semicolon (;).
- Batch size: the maximum number of rows of input to process per underlying API call.
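To make the field separator and batch size options more concrete, here is a minimal sketch that parses delimited text with a single-character separator and groups the rows into batches. It uses only the Python standard library; the function names and sample data are hypothetical, and how the package itself batches rows internally is an assumption.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def read_rows(text: str, field_separator: str = ",") -> List[Dict[str, str]]:
    """Parse delimited text into one dictionary per row."""
    if len(field_separator) != 1:
        raise ValueError("field separator must be a single character")
    reader = csv.DictReader(io.StringIO(text), delimiter=field_separator)
    return list(reader)


def batches(rows: Iterable[Dict[str, str]], batch_size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield consecutive groups of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch size must be at least 1")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk


sample = (
    "Source Name;Target Name\n"
    "ORDERS;ORDERS_CLEAN\n"
    "CUSTOMERS;CUSTOMERS_CLEAN\n"
    "ITEMS;ITEMS_CLEAN\n"
)
rows = read_rows(sample, field_separator=";")
sizes = [len(batch) for batch in batches(rows, batch_size=2)]
print(sizes)  # [2, 1]
```

With three rows and a batch size of 2, the rows would be sent in two calls rather than three.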
What it does¶
Each row of the CSV file supplied represents a single lineage process to be created (or updated).
- If Create full assets or Create partial assets is enabled, the package will make a first pass through the file to create any assets that do not already exist in Atlan (as either fully-discoverable, or partial assets that only appear in lineage).
- The package will then process each row of the CSV file supplied to:
  - Create a lineage process for that row, if none already exists with the given characteristics (Transformation Connection and Transformation Identity),
  - Update any existing lineage process that matches the row's given characteristics (Transformation Connection and Transformation Identity).
- Finally, the package will retrieve any persistent cache of assets (used for constructing lineage in other packages) and add any new or updated assets to it.
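The create-or-update behaviour described above amounts to an upsert keyed on Transformation Connection and Transformation Identity. The sketch below simulates that with an in-memory dictionary standing in for Atlan; it is an illustration under that assumption rather than the package's actual code, and the function name and sample values are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str]


def upsert_processes(
    existing: Dict[Key, Dict[str, str]],
    rows: Iterable[Dict[str, str]],
) -> Tuple[int, int]:
    """Create or update one lineage process per row; return (created, updated)."""
    created = updated = 0
    for row in rows:
        # The two columns that identify a lineage process.
        key = (row["Transformation Connection"], row["Transformation Identity"])
        if key in existing:
            existing[key].update(row)  # refresh attributes on the match
            updated += 1
        else:
            existing[key] = dict(row)
            created += 1
    return created, updated


store: Dict[Key, Dict[str, str]] = {}
sample_rows = [
    {"Transformation Connection": "etl", "Transformation Identity": "job-1", "Transformation Name": "Load orders"},
    {"Transformation Connection": "etl", "Transformation Identity": "job-2", "Transformation Name": "Load items"},
    {"Transformation Connection": "etl", "Transformation Identity": "job-1", "Transformation Name": "Load orders v2"},
]
result = upsert_processes(store, sample_rows)
print(result)  # (2, 1)
```

Because the third row repeats the key of the first, it updates the existing entry instead of creating a duplicate.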
CSV file¶
Connections must pre-exist
This package will not create any connections — the connections referenced must already exist before running the package.
Detailed information on the columns in the CSV file:
Source Type¶
Required. Type of the source (input, upstream) asset in the lineage.
Source Connector¶
Required. Type of connector for that asset (e.g. the kind of source system).
Source Connection¶
Required. Name of the connection in Atlan that should contain the asset. (If the asset does not exist and creation is permitted, this is the connection in which the source asset will be created.)
Source Identity¶
Required. Unique identity of the source asset in Atlan within the connection. This is essentially its qualifiedName, excluding the connection portion of the qualifiedName (which will instead be calculated for you from the preceding fields).
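To illustrate how an identity relates to a full qualifiedName, the sketch below joins a connection's qualifiedName with the identity value from the CSV. The connection value shown (`default/snowflake/1234567890`) and the identity are hypothetical examples, and the exact way the package resolves the connection portion is an assumption here.

```python
def build_qualified_name(connection_qualified_name: str, identity: str) -> str:
    """Join the connection portion and the identity with a single slash."""
    return f"{connection_qualified_name.rstrip('/')}/{identity.lstrip('/')}"


# Hypothetical values, for illustration only.
connection_qn = "default/snowflake/1234567890"
identity = "ANALYTICS/SALES/ORDERS"
full_qn = build_qualified_name(connection_qn, identity)
print(full_qn)  # default/snowflake/1234567890/ANALYTICS/SALES/ORDERS
```

In other words, the identity column carries only the part after the connection, so the same CSV can be reused across tenants where the connection portion differs.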
Source Name¶
Required. Simple name of the source asset.
Target Type¶
Required. Type of the target (output, downstream) asset in the lineage.
Target Connector¶
Required. Type of connector for that asset (e.g. the kind of target system).
Target Connection¶
Required. Name of the connection in Atlan that should contain the asset. (If the asset does not exist and creation is permitted, this is the connection in which the target asset will be created.)
Target Identity¶
Required. Unique identity of the target asset in Atlan within the connection. This is essentially its qualifiedName, excluding the connection portion of the qualifiedName (which will instead be calculated for you from the preceding fields).
Target Name¶
Required. Simple name of the target asset.
Transformation Connector¶
Required. Type of connector for the transformation or data movement process itself (e.g. the kind of system).
Transformation Connection¶
Required. Name of the connection in Atlan that should contain the lineage process. (Where the lineage process will be created or updated.)
Transformation Identity¶
Required. Unique identity of the transformation (lineage) process in Atlan within the connection. This is essentially its qualifiedName, excluding the connection portion of the qualifiedName (which will instead be calculated for you from the preceding fields).
Transformation Name¶
Required. Simple name to use when displaying the lineage process.
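Since every column described so far is required, it can help to check a file before running the package. The following is a minimal pre-flight sketch using the Python standard library; the column names come from this page, while the function name and message wording are hypothetical and the package's own validation may behave differently.

```python
import csv
import io
from typing import List

REQUIRED_COLUMNS = [
    "Source Type", "Source Connector", "Source Connection",
    "Source Identity", "Source Name",
    "Target Type", "Target Connector", "Target Connection",
    "Target Identity", "Target Name",
    "Transformation Connector", "Transformation Connection",
    "Transformation Identity", "Transformation Name",
]


def find_problems(text: str, field_separator: str = ",") -> List[str]:
    """List missing required columns, or empty required values per line."""
    reader = csv.DictReader(io.StringIO(text), delimiter=field_separator)
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in REQUIRED_COLUMNS if c not in header]
    if problems:
        return problems
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        for column in REQUIRED_COLUMNS:
            if not (row[column] or "").strip():
                problems.append(f"line {line_no}: empty value for {column}")
    return problems
```

An empty list means every required column is present and populated on every row.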
.. (remaining columns) ..¶
You can also supply any number of additional columns. These will be loaded as attributes on the lineage process itself, and must follow the format described in the asset export (basic) package or use a column name that matches one of the lineage process properties or more general asset properties.
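Putting the columns together, the sketch below writes a one-row lineage file with the required columns plus one additional column. All values are hypothetical (the connection names, identities, and types are made up for illustration), and the extra `description` column is assumed to match a general asset property.

```python
import csv
import io
from typing import Dict, List

REQUIRED_COLUMNS = [
    "Source Type", "Source Connector", "Source Connection",
    "Source Identity", "Source Name",
    "Target Type", "Target Connector", "Target Connection",
    "Target Identity", "Target Name",
    "Transformation Connector", "Transformation Connection",
    "Transformation Identity", "Transformation Name",
]

# Hypothetical example row: one table feeding another via a nightly job.
row = {
    "Source Type": "Table",
    "Source Connector": "snowflake",
    "Source Connection": "production",
    "Source Identity": "ANALYTICS/RAW/ORDERS",
    "Source Name": "ORDERS",
    "Target Type": "Table",
    "Target Connector": "snowflake",
    "Target Connection": "production",
    "Target Identity": "ANALYTICS/CLEAN/ORDERS_CLEAN",
    "Target Name": "ORDERS_CLEAN",
    "Transformation Connector": "snowflake",
    "Transformation Connection": "production",
    "Transformation Identity": "jobs/clean_orders",
    "Transformation Name": "Clean orders",
    "description": "Nightly cleanup of raw orders",  # additional column (assumed)
}


def to_csv(rows: List[Dict[str, str]], extra_columns: List[str], field_separator: str = ",") -> str:
    """Serialise rows with the required columns first, then any extras."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=REQUIRED_COLUMNS + extra_columns,
        delimiter=field_separator,
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


text = to_csv([row], ["description"])
print(text)
```

Writing `text` to a file such as `lineage.csv` would give an input in the shape described in this section.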
How it works
- If asset creation is permitted, runs a first pass through the supplied CSV file to create any assets within their respective connections.
- Then proceeds to create a single lineage process per row of the supplied CSV file.
- Finally, updates any persistent connection cache with the assets that were created or updated.