Relational assets builder¶
The relational assets builder package allows you to create (and update) net-new relational assets: connections, databases, schemas, tables, views, materialized views and columns.
Use cases¶
- Ingest metadata for relational assets where there is not (yet) an out-of-the-box crawler
Configuration¶
Source¶
- Import file from: select how you want to provide the input file to be processed (the format must match the expected CSV file format)
Directly upload a CSV file containing the assets.
- Assets file: the CSV file containing the details of the assets to load.
Limited file sizes
This option is generally limited to ~10MB for each file. For anything larger, use object storage.
Retrieve the relational assets file from cloud object storage.
- Prefix (path): the directory (path) within the object store from which to fetch the file containing relational assets metadata.
- Object key (filename): the object key (filename), including its extension, within the object store and prefix.
- Cloud object store: the object store from which to fetch the file containing relational assets.
- AWS access key: your AWS access key.
- AWS secret key: your AWS secret key.
- Region: your AWS region.
- Bucket: your AWS bucket.
Reusing Atlan's backing S3 store
When your Atlan tenant is deployed in AWS, you can leave all of these blank to reuse the backing store of Atlan itself. You can also set up a cross-account bucket policy for Atlan to have access to your S3 bucket and the objects within it, and leave these blank.
- Project ID: the ID of your GCP project.
- Service account JSON: your service account credentials, as JSON.
- Bucket: your GCS bucket.
Reusing Atlan's backing GCS store
When your Atlan tenant is deployed in GCP, you can leave all of these blank to reuse the backing store of Atlan itself.
- Azure client ID: the unique application (client) ID assigned to your app by Azure AD when the app was registered.
- Azure client secret: your Azure client secret (its actual value, not its identifier).
- Azure tenant ID: the unique identifier of the Azure Active Directory instance.
- Storage account name: name of your storage account.
- Container: your ADLS container.
Reusing Atlan's backing ADLS store
When your Atlan tenant is deployed in Azure, you can leave all of these blank to reuse the backing store of Atlan itself.
Semantics¶
- Input handling: how to handle assets in the CSV file that do not exist in Atlan
Create a full-fledged asset that can be discovered and maintained like other assets in Atlan. (Any existing assets will be updated.)
Create a partial asset. Partial assets cannot be discovered in Atlan and are only visible in lineage. (Any existing assets will be updated.)
Only update assets that already exist in Atlan, and do not create any asset, of any kind.
Does not apply to related READMEs and links
READMEs and links in Atlan are technically separate assets — but these will still be created, even in Update only mode.
- Delta handling: how changes should be detected and handled (if at all)
Will delete any assets that were in a previous input file, but are no longer in the most recent input file provided. (Will also update any existing assets with the details found in the input file, and create any net-new assets found in the input file.)
Delta calculated from files
Be aware that the delta is calculated by comparing the input files, not the assets that currently exist in the connection within Atlan. This will only work when all management of the connection's assets is done through this package. (This also restricts the package to only allowing an input file with all of its assets in the same connection — you will receive an error if you attempt to load an input file with assets in multiple connections with this mode of delta handling.)
- Removal type: how assets not found in the input file should be removed:
- Archive (recoverable): will mark each asset as soft-deleted. They will no longer appear in the UI, but can be recovered if needed.
- Purge (cannot be recovered): will permanently delete each such asset. They will no longer appear in the UI, and there is no way to recover them.
Will only create and update any assets that appear in the input file. Any assets that exist within the connection in Atlan, but are no longer in the input file provided, will be left as-is in Atlan.
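As a rough illustration of the file-based delta described above, the following sketch compares a previous input file with the current one and lists the assets that appear only in the previous file. The `asset_key` and `removed_assets` helpers, the choice of identifying columns, and the sample values (including the `Database` type name and `postgres` connector type) are assumptions made for this example; they are not the package's actual implementation.

```python
import csv
import io

# Columns assumed (for this sketch) to identify an asset uniquely within a file.
KEY_COLUMNS = ["typeName", "connectionName", "connectorType",
               "databaseName", "schemaName", "entityName", "columnName"]


def asset_key(row):
    """Build a hashable identity for a CSV row (hypothetical helper)."""
    return tuple((row.get(col) or "").strip() for col in KEY_COLUMNS)


def removed_assets(previous_csv, current_csv):
    """Return keys present in the previous file but absent from the current one."""
    previous = {asset_key(r) for r in csv.DictReader(io.StringIO(previous_csv))}
    current = {asset_key(r) for r in csv.DictReader(io.StringIO(current_csv))}
    return previous - current


previous_file = (
    "typeName,connectionName,connectorType,databaseName\n"
    "Connection,sales,postgres,\n"
    "Database,sales,postgres,orders\n"
    "Database,sales,postgres,legacy\n"
)
current_file = (
    "typeName,connectionName,connectorType,databaseName\n"
    "Connection,sales,postgres,\n"
    "Database,sales,postgres,orders\n"
)

# Only the "legacy" database would be a candidate for archive or purge.
to_remove = removed_assets(previous_file, current_file)
```

Because the comparison only looks at the two files, an asset created in the connection by some other means would never show up in `to_remove`, which is why this mode only works when the package manages all of the connection's assets.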
Options¶
- Remove attributes, if empty: by default, any value that is blank in the CSV file will simply be ignored. If you would prefer the value to instead be overwritten on the asset in Atlan (i.e. that value removed from the asset), you can select the field here. Any fields selected here whose values are empty in the file will result in the value being removed from the asset in Atlan.
Atlan tags are always overwritten
Note that the Atlan tags (`atlanTags`) column will always be replaced wholesale. Therefore, if the Atlan tags column is empty in the CSV input, but the asset in Atlan has tags assigned, after running the import that asset will no longer have any tags assigned.
- Fail on errors: whether an invalid value in a field should cause the import to fail (`Yes`) or log a warning, skip that value, and proceed (`No`).
- Field separator: the single character that is used to separate values in the input file, typically either a comma (`,`) or a semicolon (`;`).
- Batch size: the maximum number of rows of input to process per underlying API call.
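To make the field separator and batch size options concrete, here is a minimal sketch of reading a semicolon-separated file and grouping its rows into batches. The `read_batches` function and the sample data are hypothetical; how the package itself batches rows is not described here.

```python
import csv
import io
from itertools import islice


def read_batches(text, field_separator=",", batch_size=2):
    """Yield lists of at most `batch_size` rows parsed with the given separator."""
    reader = csv.DictReader(io.StringIO(text), delimiter=field_separator)
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch


sample = (
    "typeName;connectionName;connectorType\n"
    "Connection;sales;postgres\n"
    "Connection;finance;postgres\n"
    "Connection;hr;postgres\n"
)

# Three rows with a batch size of 2 give one full batch and one partial batch.
batches = list(read_batches(sample, field_separator=";", batch_size=2))
```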
What it does¶
For each row in the CSV file, the package applies the values from that row to the corresponding asset in Atlan.
- Any values that are empty in the CSV will be ignored, unless they were specified in the `Remove attributes, if empty` configuration, in which case they will be removed from the asset in Atlan.
- Any attributes for which there can be multiple values (`ownerUsers`, `ownerGroups`, `assignedTerms`, `atlanTags`, `links`, `starredDetails`) will be replaced on the asset in Atlan. For example, if the asset in Atlan had 3 `ownerUsers` and the CSV file only lists 1, then after importing the asset in Atlan will have only 1 user in `ownerUsers` (the one from the CSV input file).
- If `Full replacement` delta handling is used, the package will then compare the input file from this run to the most recent input file used from a previous run.
  - Any assets found in the previous file that do not exist in this run's input file will then be deleted from Atlan.
  - This run's input file will be uploaded to Atlan's backing store for comparison purposes against any future runs.
Must be used from the beginning
Note that due to the file-based comparisons, delta handling must be used from the beginning for it to be effective. (It will automatically detect the very first time it is run and the delta comparison will become a no-op on the initial load.)
- Finally, the package will retrieve any persistent cache of assets (used for constructing lineage in other packages) and add any new or updated assets to it, while removing any assets from it that were deleted by the delta handling.
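The rules above for empty and multi-valued cells can be sketched as follows. This is a simplified model under stated assumptions: an asset is represented as a plain dictionary, and `apply_row` is a hypothetical helper rather than anything the package exposes.

```python
# Attributes that can hold multiple values, as listed above.
MULTI_VALUED = {"ownerUsers", "ownerGroups", "assignedTerms",
                "atlanTags", "links", "starredDetails"}


def apply_row(existing, row, remove_if_empty=()):
    """Return a copy of `existing` updated with the values from a CSV `row`."""
    updated = dict(existing)
    for field, raw in row.items():
        if raw == "":
            # atlanTags is always replaced wholesale, even when empty.
            if field == "atlanTags" or field in remove_if_empty:
                updated.pop(field, None)
            continue  # any other empty value is ignored
        if field in MULTI_VALUED:
            # Multi-valued cells are newline-delimited and replace the old list.
            updated[field] = raw.split("\n")
        else:
            updated[field] = raw
    return updated


asset = {"description": "Old text", "ownerUsers": ["ann", "bob", "cy"],
         "atlanTags": ["PII"], "certificateStatus": "DRAFT"}
row = {"description": "", "ownerUsers": "ann", "atlanTags": "",
       "certificateStatus": ""}

result = apply_row(asset, row, remove_if_empty={"certificateStatus"})
```

In this example the empty `description` is ignored, `ownerUsers` shrinks from three users to one, the tags are cleared, and `certificateStatus` is removed only because it was selected for removal.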
CSV file¶
Each typeName requires a different minimal set of details
The type of asset loaded for each row in the CSV depends on the value in the `typeName` column. Different types require different information: for example, a `Connection` only requires a `connectionName` and `connectorType`, while a column requires: `connectionName`, `connectorType`, `databaseName`, `schemaName`, `entityName`, and `columnName`.
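A small pre-flight check along these lines can catch rows that lack the details their type needs before you run the package. Only `Connection` is named as a type in the text above; the other type names in this sketch (`Database`, `Schema`, `Table`, `View`, `MaterialisedView`, `Column`) and the `missing_fields` helper are assumptions for illustration.

```python
BASE = ["connectionName", "connectorType"]

# Type names other than "Connection" are assumed here for illustration.
REQUIRED = {
    "Connection": BASE,
    "Database": BASE + ["databaseName"],
    "Schema": BASE + ["databaseName", "schemaName"],
    "Table": BASE + ["databaseName", "schemaName", "entityName"],
    "View": BASE + ["databaseName", "schemaName", "entityName"],
    "MaterialisedView": BASE + ["databaseName", "schemaName", "entityName"],
    "Column": BASE + ["databaseName", "schemaName", "entityName", "columnName"],
}


def missing_fields(row):
    """List the required columns that are blank or absent for this row's type."""
    required = REQUIRED.get(row.get("typeName", ""))
    if required is None:
        raise ValueError(f"Unknown typeName: {row.get('typeName')!r}")
    return [name for name in required if not (row.get(name) or "").strip()]


connection = {"typeName": "Connection", "connectionName": "sales",
              "connectorType": "postgres"}
column = {"typeName": "Column", "connectionName": "sales",
          "connectorType": "postgres", "databaseName": "orders",
          "schemaName": "public", "entityName": "", "columnName": "id"}

problems = missing_fields(column)  # the column row lacks its entityName
```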
The following describes the expected format of the CSV file. The ordering of the columns in the file does not matter, but the column names must exactly match those listed below.
typeName¶
Required. Type of the asset. This determines what kind of asset will be created or updated by the row of the CSV.
connectionName¶
Required. Name of the connection. This is any arbitrary name you want to use to group assets for a particular source, for example a system or application name.
connectorType¶
Required. Type of connector used to represent the source. View the available options and associated icons, specifically the all-lowercase value under the `Raw REST API` tab.
databaseName¶
Technical name of the database, as you would crawl it from source. This is required for database assets and any assets contained within a database.
schemaName¶
Technical name of the schema, as you would crawl it from source. This is required for schema assets and any assets contained within a schema.
entityName¶
Technical name of the table, view, or materialized view, as you would crawl it from source. This is required for table, view and materialized view assets, and any assets contained within them.
columnName¶
Technical name of the column, as you would crawl it from source. This is required for columns.
dataType¶
Data type of a column. You can specify this as a conceptual type (like `string`) or a full SQL type (like `NVARCHAR(255)` or `DECIMAL(5,2)`).
displayName¶
An optional name you can give to the asset to override how it is displayed in the Atlan UI. If present, this will be shown in the UI instead of `name`.
description¶
Explanation of the asset, possibly crawled from a source system.
userDescription¶
Explanation of the asset, as entered or confirmed by a user through the Atlan UI. If present, this will be shown in the UI instead of `description`.
ownerUsers¶
Individual users who are owners of the asset. Each user should be separated by a newline within the cell.
ownerGroups¶
Groups of users who are owners of the asset. Each group should be separated by a newline within the cell.
certificateStatus¶
Certificate on the asset. Must either be empty or one of:
- `VERIFIED`
- `DRAFT`
- `DEPRECATED`
certificateStatusMessage¶
An optional message that can be associated with the certificate (only used if `certificateStatus` is non-empty).
announcementType¶
Type of announcement on the asset. Must either be empty or one of:
- `information`
- `warning`
- `issue`
announcementTitle¶
Heading line for the announcement on the asset (only used if `announcementType` is non-empty).
announcementMessage¶
An optional detailed message that can be associated with the announcement (only used if `announcementType` is non-empty).
assignedTerms¶
Business terms that are assigned to the asset. Each term should be separated by a newline within the cell, and formatted as:
Term Name@@@Glossary Name
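If you generate or validate this column programmatically, a cell in this format can be split back into term and glossary names as sketched below. The `parse_terms` helper and the sample term names are hypothetical.

```python
def parse_terms(cell):
    """Split a newline-delimited cell into (term, glossary) pairs."""
    pairs = []
    for line in cell.split("\n"):
        line = line.strip()
        if not line:
            continue
        term, sep, glossary = line.partition("@@@")
        if not sep or not term or not glossary:
            raise ValueError(f"Expected 'Term Name@@@Glossary Name', got {line!r}")
        pairs.append((term, glossary))
    return pairs


cell = "Revenue@@@Finance\nCustomer ID@@@Sales"
terms = parse_terms(cell)
```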
atlanTags¶
Atlan tags that are assigned to the asset. Each tag should be separated by a newline within the cell, and formatted as one of:
- `Tag Name`, for tags that should be directly assigned and should not be propagated
- `Tag Name>>FULL`, for tags that should be directly assigned to the asset and propagated down their hierarchy and through lineage
- `Tag Name>>HIERARCHY_ONLY`, for tags that should be directly assigned to the asset and only be propagated down their hierarchy (not through lineage)
- `Tag Name<<PROPAGATED`, for tags that have been propagated to the asset.
Propagated tags will be ignored on import
Any tag marked propagated (`Tag Name<<PROPAGATED`) will be ignored by an import. Only those tags that are directly applied will be imported, though of course any tags applied up-hierarchy or upstream that are marked to propagate will still propagate accordingly.
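The tag formats above can be interpreted as in the following sketch, which also drops propagated tags in the way an import would. The `parse_tag` and `parse_tags` helpers, the `"NONE"` label for unpropagated tags, and the sample tag names are assumptions made for this example.

```python
def parse_tag(entry):
    """Return (tag_name, propagation) for one line of an atlanTags cell.

    Returns None for tags marked as propagated, since those are ignored on import.
    """
    if entry.endswith("<<PROPAGATED"):
        return None
    name, sep, mode = entry.partition(">>")
    if not sep:
        return (name, "NONE")  # "NONE" is this sketch's label for no propagation
    if mode not in ("FULL", "HIERARCHY_ONLY"):
        raise ValueError(f"Unknown propagation mode: {mode!r}")
    return (name, mode)


def parse_tags(cell):
    """Parse a newline-delimited atlanTags cell, dropping propagated tags."""
    parsed = (parse_tag(line.strip()) for line in cell.split("\n") if line.strip())
    return [tag for tag in parsed if tag is not None]


cell = "PII\nConfidential>>FULL\nFinance>>HIERARCHY_ONLY\nInherited<<PROPAGATED"
tags = parse_tags(cell)
```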
links¶
List of resources (links) assigned to the asset. Each link should be separated by a newline within the cell, and formatted as embedded JSON:
{"name":"linkName","link":"https://www.example.com"}
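Cells that hold several JSON objects separated by newlines are easy to get wrong by hand, because both the quotes and the newlines need CSV escaping. The sketch below builds such a cell with Python's `json` and `csv` modules and reads it back; the link names, URLs, and the other sample values are illustrative only.

```python
import csv
import io
import json

links = [
    {"name": "Runbook", "link": "https://www.example.com/runbook"},
    {"name": "Dashboard", "link": "https://www.example.com/dashboard"},
]

# One compact JSON object per line, all inside a single cell.
links_cell = "\n".join(json.dumps(item, separators=(",", ":")) for item in links)

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["typeName", "connectionName",
                                            "connectorType", "links"])
writer.writeheader()
writer.writerow({"typeName": "Connection", "connectionName": "sales",
                 "connectorType": "postgres", "links": links_cell})

# Reading the row back shows the embedded newlines and quotes survive.
row = next(csv.DictReader(io.StringIO(buffer.getvalue())))
parsed = [json.loads(line) for line in row["links"].split("\n")]
```

Letting the `csv` module do the quoting avoids hand-escaping the double quotes inside each JSON object.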
readme¶
Richly-formatted, detailed documentation for the asset. This should be an HTML-formatted string containing everything that would be inside `<body></body>`, without the `<body></body>` wrapping.
starredDetailsList¶
Details about users who have starred the asset. Each starred asset detail entry should be separated by a newline within the cell, and formatted as embedded JSON:
{"assetStarredBy":"someone","assetStarredAt":1698769268966}
{{CM}}::{{Attribute}}¶
Values for any custom metadata. You simply need to name the column after the custom metadata and attribute name. For example, if the custom metadata is `RACI` and the attribute is `Responsible`, the column name must be `RACI::Responsible`. The values for the column follow the same rules as above: for multiple values use an in-cell newline to delimit them.
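When building the header row in code, the custom metadata naming rule can be captured in two small helpers, sketched here. Both `cm_column` and `split_cm_column` are hypothetical names introduced for this example.

```python
def cm_column(custom_metadata, attribute):
    """Build the CSV column name for a custom metadata attribute."""
    return f"{custom_metadata}::{attribute}"


def split_cm_column(column):
    """Return (custom_metadata, attribute), or None for a non-custom column."""
    name, sep, attribute = column.partition("::")
    return (name, attribute) if sep else None


header = ["typeName", "connectionName", cm_column("RACI", "Responsible")]

# Map each custom metadata column in the header to its two parts.
custom = {col: split_cm_column(col) for col in header if split_cm_column(col)}
```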
How it works
The package reads the CSV file and creates a number of parallel batches to submit the updates in several passes:
- The first pass preprocesses certain information in the CSV file, for example: the relative position of columns (to determine their ordering), counts of child objects (columns per table, tables per schema, etc.), and so on.
Assumes a particular ordering for columns
Note that the import assumes your input CSV is ordered such that any columns for a table or view are the rows listed immediately below that table or view.
- The second pass will create and update the assets themselves, noting any related assets (like links and READMEs) that may also need to be updated, created, or deleted.
- The third pass will load the related assets and process any deletions.
- Finally, the package will update any persistent connection cache with the assets that were created, updated, and removed.
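The first pass can be pictured roughly as in the sketch below, which derives each column's position and the child counts from rows ordered as described (columns listed immediately below their table or view). The `preprocess` and `make_row` functions and the sample names are assumptions for illustration, not the package's code.

```python
from collections import Counter


def preprocess(rows):
    """Compute column positions and child counts in one pass over ordered rows."""
    column_order = {}          # (db, schema, entity, column) -> 1-based position
    child_counts = Counter()   # parent path -> number of direct children
    position = 0
    for row in rows:
        db, schema = row.get("databaseName", ""), row.get("schemaName", "")
        entity, column = row.get("entityName", ""), row.get("columnName", "")
        if column:
            position += 1
            column_order[(db, schema, entity, column)] = position
            child_counts[(db, schema, entity)] += 1
        elif entity:
            position = 0  # a new table or view restarts the column numbering
            child_counts[(db, schema)] += 1
        elif schema:
            child_counts[(db,)] += 1
    return column_order, child_counts


def make_row(db, schema="", entity="", column=""):
    """Build a minimal row dictionary (hypothetical helper for the example)."""
    return {"databaseName": db, "schemaName": schema,
            "entityName": entity, "columnName": column}


rows = [
    make_row("orders"),
    make_row("orders", "public"),
    make_row("orders", "public", "customers"),
    make_row("orders", "public", "customers", "id"),
    make_row("orders", "public", "customers", "email"),
    make_row("orders", "public", "invoices"),
    make_row("orders", "public", "invoices", "total"),
]
column_order, child_counts = preprocess(rows)
```

Because the position counter resets whenever a table or view row is seen, a column row that is separated from its parent would be numbered against the wrong table, which is why the input ordering matters.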