Asset import

The asset import package loads metadata from a CSV file that matches the format of one extracted using either of the asset export packages (basic or advanced).

Use cases

  • Bulk load details from the CSV file into Atlan, to enrich assets in Atlan
  • Create new assets in bulk in Atlan, based on the details in the CSV file
  • Migrate manual enrichment done to one set of data assets to another (for example, when migrating from one data warehouse to another but retaining the same data structures)

Configuration

Source

  • Import metadata from: select how you want to provide the input file(s) to be processed (the format must match the CSV file format produced by an export)

    Directly upload CSV file(s) containing the metadata.

    Limited file sizes

    This option is generally limited to ~10MB for each file. For anything larger, use object storage.

    Retrieve the metadata files from cloud object storage.

    • Cloud object store: the object store from which to fetch the metadata file(s).

      • AWS access key: your AWS access key.
      • AWS secret key: your AWS secret key.
      • Region: your AWS region.
      • Bucket: your AWS bucket.

      Reusing Atlan's backing S3 store

      When your Atlan tenant is deployed in AWS, you can leave all of these blank to reuse the backing store of Atlan itself. You can also set up a cross-account bucket policy for Atlan to have access to your S3 bucket and the objects within it, and leave these blank.

      • Project ID: the ID of your GCP project.
      • Service account JSON: your service account credentials, as JSON.
      • Bucket: your GCS bucket.

      Reusing Atlan's backing GCS store

      When your Atlan tenant is deployed in GCP, you can leave all of these blank to reuse the backing store of Atlan itself.

      • Azure client ID: the unique application (client) ID assigned to your app by Azure AD when the app was registered.
      • Azure client secret: your Azure client secret (its actual value, not its identifier).
      • Azure tenant ID: the unique identifier of the Azure Active Directory instance.
      • Storage account name: name of your storage account.
      • Container: your ADLS container.

      Reusing Atlan's backing ADLS store

      When your Atlan tenant is deployed in Azure, you can leave all of these blank to reuse the backing store of Atlan itself.

Assets

  • Assets file: the CSV file containing enriched details, to load, for assets.
  • Prefix (path): the directory (path) within the object store from which to fetch the file containing asset metadata.
  • Object key (filename): the object key (filename), including its extension, within the object store and prefix.
  • Input handling: how to handle assets in the CSV file that do not exist in Atlan

    Create a full-fledged asset that can be discovered and maintained like other assets in Atlan.

    Create a "partial" asset. These are only shown in lineage, and cannot be discovered through search. These are useful when you want to represent a placeholder for an asset that you lack full context about, but also do not want to ignore completely.

    Only update assets that already exist in Atlan, and do not create any asset, of any kind.

    Does not apply to related READMEs and links

    READMEs and links in Atlan are technically separate assets — but these will still be created, even in Update only mode.

  • Options: optional settings to optimize how assets are loaded.

    By default, these settings will be applied:

    • All blank fields in the input file will be ignored.
    • Any invalid value in a field will cause the import to fail rather than proceeding.
    • Assets will be matched case-sensitively.
    • Type names in the input file will be strictly adhered to.
    • Comma (,) is the expected field separator.
    • A maximum of 20 records will be processed per underlying API request.
    • Remove attributes, if empty: by default, any value that is blank in the CSV file will simply be ignored. If you would prefer the value to instead be overwritten on the asset in Atlan (i.e. that value removed from the asset), you can select the field here. Any fields selected here whose values are empty in the file will result in the value being removed from the asset in Atlan.
    • Fail on errors: whether an invalid value in a field should cause the import to fail (Yes) or log a warning, skip that value, and proceed (No).
    • Case-sensitive match for updates: whether attempts to match assets should be done case-sensitively (Yes) or case-insensitively (No).
    • Table/view agnostic?: whether the import should strictly adhere to the types Table, View and MaterializedView in the input CSV, or allow these to be matched interchangeably.
    • Field separator: character used to separate fields in the input file.
    • Batch size: maximum number of records to attempt to process per underlying API request.

    Atlan tags are always overwritten

    Note that the Atlan tags (atlanTags) column will always be replaced wholesale. Therefore, if the Atlan tags column is empty in the CSV input, but the asset in Atlan has tags assigned, after running the import that asset will no longer have any tags assigned. If you want to avoid this, be sure that the atlanTags column does not exist in your CSV file at all, or that if it does it has the complete set of tags you want to be applied to every row.
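One way to act on the warning above is to strip the atlanTags column from the file before importing, so existing tags in Atlan are left alone. The following is a minimal sketch using only the Python standard library; the helper name drop_column is hypothetical and is not part of the package.

```python
import csv
import io


def drop_column(csv_text: str, column: str = "atlanTags", delimiter: str = ",") -> str:
    """Return the CSV text with the named column removed (no-op if it is absent)."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    # Keep every header except the one being dropped.
    fields = [f for f in (reader.fieldnames or []) if f != column]
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=fields,
        delimiter=delimiter,
        extrasaction="ignore",  # silently skip the dropped column's values
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()
```

If you use a non-default field separator, pass the same character as delimiter here that you configure in the Field separator option.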

Glossaries

  • Glossaries file: the CSV file containing enriched details, to load, for glossaries, terms and categories.
  • Prefix (path): the directory (path) within the object store from which to fetch the file containing glossary metadata.
  • Object key (filename): the object key (filename), including its extension, within the object store and prefix.
  • Input handling: how to handle glossaries, categories and terms in the CSV file that do not exist in Atlan

    Create a full-fledged glossary, category or term that can be discovered and maintained like others in Atlan.

    Only update glossaries, categories and terms that already exist in Atlan, but do not create any new ones.

    Does not apply to related READMEs and links

    READMEs and links in Atlan are technically separate assets — but these will still be created, even in Update only mode.

  • Options: optional settings to optimize how glossaries, terms and categories are loaded.

    By default, these settings will be applied:

    • All blank fields in the input file will be ignored.
    • Any invalid value in a field will cause the import to fail rather than proceeding.
    • Comma (,) is the expected field separator.
    • A maximum of 20 records will be processed per underlying API request.
    • Remove attributes, if empty: by default, any value that is blank in the CSV file will simply be ignored. If you would prefer the value to instead be overwritten on the asset in Atlan (i.e. that value removed from the asset), you can select the field here. Any fields selected here whose values are empty in the file will result in the value being removed from the asset in Atlan.

      Atlan tags are always overwritten

      Note that the Atlan tags (atlanTags) column will always be replaced wholesale. Therefore, if the Atlan tags column is empty in the CSV input, but the asset in Atlan has tags assigned, after running the import that asset will no longer have any tags assigned.

    • Fail on errors: whether an invalid value in a field should cause the import to fail (Yes) or log a warning, skip that value, and proceed (No).

    • Field separator: character used to separate fields in the input file.
    • Batch size: maximum number of records to attempt to process per underlying API request.

Data products

  • Data products file: the CSV file containing enriched details, to load, for data domains and products.
  • Prefix (path): the directory (path) within the object store from which to fetch the file containing data domain and data product metadata.
  • Object key (filename): the object key (filename), including its extension, within the object store and prefix.
  • Input handling: how to handle data domains and products in the CSV file that do not exist in Atlan

    Create a full-fledged data domain or data product that can be discovered and maintained like others in Atlan.

    Only update data domains and data products that already exist in Atlan, but do not create any new ones.

    Does not apply to related READMEs and links

    READMEs and links in Atlan are technically separate assets — but these will still be created, even in Update only mode.

  • Options: optional settings to optimize how data domains and products are loaded.

    By default, these settings will be applied:

    • All blank fields in the input file will be ignored.
    • Any invalid value in a field will cause the import to fail rather than proceeding.
    • Comma (,) is the expected field separator.
    • A maximum of 20 records will be processed per underlying API request.
    • Remove attributes, if empty: by default, any value that is blank in the CSV file will simply be ignored. If you would prefer the value to instead be overwritten on the asset in Atlan (i.e. that value removed from the asset), you can select the field here. Any fields selected here whose values are empty in the file will result in the value being removed from the asset in Atlan.

      Atlan tags are always overwritten

      Note that the Atlan tags (atlanTags) column will always be replaced wholesale. Therefore, if the Atlan tags column is empty in the CSV input, but the asset in Atlan has tags assigned, after running the import that asset will no longer have any tags assigned.

    • Fail on errors: whether an invalid value in a field should cause the import to fail (Yes) or log a warning, skip that value, and proceed (No).

    • Field separator: character used to separate fields in the input file.
    • Batch size: maximum number of records to attempt to process per underlying API request.

What it does

For each row in the CSV file, the package applies the values from that row to the matching asset in Atlan.

  • If running in Create and update mode, the package will make multiple passes over the CSV file to attempt to load any parent assets prior to any contained child assets — to ensure assets are created in top-down order. (In Update only mode this is unnecessary, as all assets must already exist.)

    Relationships in create and update mode

    When using Create and update mode, ensure you specify all relationships from the child asset to its parent asset, and not the other way around. (This is necessary since the dependency ordering is top-down, ensuring parents exist before children are processed.)

  • Any values that are empty in the CSV will be ignored, unless they were specified in the Remove attributes, if empty configuration — in which case they will be removed from the asset in Atlan.

  • Any attributes for which there can be multiple values (ownerUsers, ownerGroups, assignedTerms, atlanTags, links, starredDetails, seeAlso, preferredTerms, synonyms, antonyms, translatedTerms, validValuesFor, classifies and any custom metadata attributes) will be replaced on the asset in Atlan. For example, if the asset in Atlan had 3 ownerUsers and the CSV file lists only 1, then after the import the asset in Atlan will have only 1 user in ownerUsers (the one from the CSV input file).
  • Finally, the package will retrieve any persistent cache of assets (used for constructing lineage in other packages) and add any new or updated assets to it.
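The rules above for blank values, removal and multi-valued attributes can be pictured with a small model. This is only an illustration of the described semantics, not the package's implementation; apply_row, MULTI_VALUED and the plain dictionaries standing in for assets are all hypothetical.

```python
# Attributes the text lists as multi-valued (custom metadata omitted for brevity).
MULTI_VALUED = {
    "ownerUsers", "ownerGroups", "assignedTerms", "atlanTags", "links",
    "starredDetails", "seeAlso", "preferredTerms", "synonyms", "antonyms",
    "translatedTerms", "validValuesFor", "classifies",
}


def apply_row(existing: dict, row: dict, remove_if_empty: set) -> dict:
    """Model how one CSV row changes an asset, per the rules described above."""
    result = dict(existing)
    for field, value in row.items():
        if value == "":
            # Blank cells are ignored unless selected for removal;
            # atlanTags is always replaced, so a blank cell clears it.
            if field in remove_if_empty or field == "atlanTags":
                result.pop(field, None)
            continue
        if field in MULTI_VALUED:
            # Multi-valued cells are newline-separated and replace the old list.
            result[field] = value.split("\n")
        else:
            result[field] = value
    return result
```

For instance, an asset with three ownerUsers and a row listing one ends up with that single owner, while a blank description leaves the existing description untouched.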

Assets CSV file

qualifiedName and type must match

For non-glossary assets, the unique combination of the qualifiedName and typeName columns in the CSV file must exactly match an asset in Atlan. This may require you to change the qualifiedName values, for example, to use a different connection in a scenario like migrating assets from one data tool to another.
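For the migration scenario just mentioned, the qualifiedName values usually need their connection portion swapped. The sketch below assumes that a connection's qualified name is the leading path of each asset's qualifiedName (as in default/snowflake/1234567890/DB/SCHEMA/TABLE_NAME); the function name and the target connection used in the usage note are hypothetical.

```python
def retarget_qualified_name(qualified_name: str, old_connection: str, new_connection: str) -> str:
    """Swap the connection prefix of a qualifiedName, leaving other names untouched."""
    if qualified_name == old_connection:
        return new_connection
    prefix = old_connection + "/"
    if qualified_name.startswith(prefix):
        # Keep the database/schema/table path, replace only the connection part.
        return new_connection + "/" + qualified_name[len(prefix):]
    return qualified_name
```

Called with old_connection "default/snowflake/1234567890" and new_connection "default/databricks/9876543210", the example name becomes default/databricks/9876543210/DB/SCHEMA/TABLE_NAME. The connectionQualifiedName and connectorType columns would need the matching change.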

Detailed information about the fields in the CSV file:

qualifiedName

Required. Unique name of the asset. This combined with typeName will be unique in Atlan.

typeName

Required. Type of the asset.

name

Required. Name of the asset. This will be the technical name, as crawled from a source system.

connectionQualifiedName

Required. Qualified name of the connection for the asset. Without this, discovery filters will not work for the asset.

connectorType

Required. Name of the connector type for the asset (for example: snowflake). Without this, discovery filters will not work for the asset.

displayName

An optional name you can give to the asset to override how it is displayed in the Atlan UI. If present, this will be shown in the UI instead of name.

description

Explanation of the asset, possibly crawled from a source system.

userDescription

Explanation of the asset, as entered or confirmed by a user through the Atlan UI. If present, this will be shown in the UI instead of description.

ownerUsers

Individual users who are owners of the asset. Each user should be separated by a newline within the cell.

ownerGroups

Groups of users who are owners of the asset. Each group should be separated by a newline within the cell.

certificateStatus

Certificate on the asset. Must either be empty or one of:

  • VERIFIED
  • DRAFT
  • DEPRECATED

certificateStatusMessage

An optional message that can be associated with the certificate (only used if certificateStatus is non-empty).

announcementType

Type of announcement on the asset. Must either be empty or one of:

  • information
  • warning
  • issue

announcementTitle

Heading line for the announcement on the asset (only used if announcementType is non-empty).

announcementMessage

An optional detailed message that can be associated with the announcement (only used if announcementType is non-empty).

assignedTerms

Business terms that are assigned to the asset. Each term should be separated by a newline within the cell, and formatted as:

Term Name@@@Glossary Name
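Multi-valued cells such as assignedTerms contain newlines, so the cell must be quoted in the CSV. A minimal sketch follows, assuming you build the file with Python's csv module (which quotes such cells automatically); the helper names are hypothetical.

```python
import csv
import io


def terms_cell(terms):
    """Format (term name, glossary name) pairs as one newline-separated cell."""
    return "\n".join(f"{term}@@@{glossary}" for term, glossary in terms)


def to_csv(rows, fieldnames):
    """Serialize rows to CSV text; cells with embedded newlines are quoted."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

The same pattern applies to ownerUsers, ownerGroups and the other newline-separated columns.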

atlanTags

Atlan tags that are assigned to the asset. Each tag should be separated by a newline within the cell, and formatted as one of:

  • Tag Name, for tags that should be directly assigned and should not be propagated
  • Tag Name>>FULL for tags that should be directly assigned to the asset and propagated down their hierarchy and through lineage
  • Tag Name>>HIERARCHY_ONLY for tags that should be directly assigned to the asset and only be propagated down their hierarchy (not through lineage)
  • Tag Name<<PROPAGATED for tags that have been propagated to the asset.

    Propagated tags will be ignored on import

    Any tag marked propagated (Tag Name<<PROPAGATED) will be ignored by an import. Only those tags that are directly applied will be imported, though of course any tags applied up-hierarchy or upstream that are marked to propagate will still propagate accordingly.
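To preview which tags an import would actually apply from a cell, the formats listed above can be parsed as follows. This is a sketch for plain (non-source) tags only; parse_tags is a hypothetical helper, and None is used here to mean "no propagation".

```python
def parse_tags(cell: str):
    """Return (tag name, propagation mode) pairs, skipping propagated tags."""
    tags = []
    for line in cell.split("\n"):
        line = line.strip()
        if not line or line.endswith("<<PROPAGATED"):
            # Propagated tags are ignored on import.
            continue
        name, sep, mode = line.partition(">>")
        tags.append((name.strip(), mode if sep else None))
    return tags
```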

For source tags (with values), you can extend the tag name portion as follows:

  • Tag Name {{connectorType/connectorName@@sourceTagLocation??key=value}}, where:
    • connectorType is the type of the source tag (snowflake, dbt, etc)
    • connectorName is the name of the connection for the source the tag is synced from
    • sourceTagLocation is the path within that connection where the source tag exists
    • key is an optional key for the associated value for the tag
    • value is the value for the associated tag

For example, this will associate the Confidential Atlan tag, which is synced with the CONFIDENTIAL Snowflake tag in the ANALYTICS database's WIDE_WORLD_IMPORTERS schema and is synced through the development Snowflake connection. It has a value of Not Restricted in Snowflake, and the tag itself should be fully-propagated.

Confidential {{snowflake/development@@ANALYTICS/WIDE_WORLD_IMPORTERS/CONFIDENTIAL??=Not Restricted}}>>FULL
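If you generate these cells programmatically, the pieces described above can be assembled with a small helper. The sketch below only does string formatting; source_tag and its parameter names are hypothetical.

```python
def source_tag(tag_name, connector_type, connector_name, location, value, key="", propagation=None):
    """Build a source-tag cell value in the documented format."""
    detail = f"{connector_type}/{connector_name}@@{location}??{key}={value}"
    text = tag_name + " {{" + detail + "}}"
    if propagation:
        # For example FULL or HIERARCHY_ONLY.
        text += ">>" + propagation
    return text
```

Calling it with the values from the example above (an empty key and propagation "FULL") reproduces that line exactly.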

links

List of resources (links) assigned to the asset. Each link should be separated by a newline within the cell, and formatted as embedded JSON:

{"typeName":"Link","attributes":{"name":"linkName","link":"https://www.example.com"}}
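Writing the embedded JSON by hand is error-prone, so it can help to generate it. A minimal sketch with Python's json module follows; links_cell is a hypothetical helper, and the compact separators simply mirror the example above.

```python
import json


def links_cell(links):
    """Format (name, url) pairs as newline-separated embedded JSON objects."""
    return "\n".join(
        json.dumps(
            {"typeName": "Link", "attributes": {"name": name, "link": url}},
            separators=(",", ":"),  # compact form, one object per line
        )
        for name, url in links
    )
```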

readme

Richly-formatted, detailed documentation for the asset. This should be an HTML-formatted string containing everything that would be inside <body></body>, without the <body></body> wrapping.

starredDetails

Details about users who have starred the asset. Each starred asset detail entry should be separated by a newline within the cell, and formatted as embedded JSON:

{"assetStarredBy":"someone","assetStarredAt":1698769268966}

{CM}::{attribute}

Any number of columns using a :: separator in their heading represent custom metadata.

  • The {CM} portion must give the name of the custom metadata
  • The {attribute} portion must give the name of an attribute within the custom metadata.

Both are the human-readable names.

For multi-valued custom metadata attributes, each value should be separated by a newline within the cell. Date values should be provided as an epoch-style timestamp (purely numeric).
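As an illustration, a column heading and a date value could be produced like this. The text only says the timestamp is purely numeric; this sketch assumes milliseconds since the Unix epoch (consistent with the assetStarredAt example above), which you should verify against an exported file. The helper names and the custom metadata names in the usage note are hypothetical.

```python
from datetime import datetime, timezone


def cm_header(custom_metadata: str, attribute: str) -> str:
    """Build a custom metadata column heading from human-readable names."""
    return f"{custom_metadata}::{attribute}"


def epoch_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime to epoch milliseconds (assumed unit)."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time surprises")
    return int(moment.timestamp() * 1000)
```

For example, cm_header("Data Quality", "Last Checked") gives the heading Data Quality::Last Checked.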

.. (remaining columns) ..

For an import, you can also supply any number of additional columns. These will be loaded as attributes on the asset on that row, and should contain the type of data expected for that attribute. (If a particular attribute does not apply to the type of asset on that row, its cell should be left blank.)

You can find a listing of all attributes available, per asset type, through the full model reference. (Search within the diagram for the asset type of interest, or browse for it under Types along the navigation bar on the left.)

For attributes that specify a relationship to another asset, use this format:

TypeName@qualifiedName

For example:

Table@default/snowflake/1234567890/DB/SCHEMA/TABLE_NAME
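A quick check that a relationship cell follows this format can catch mistakes before an import. The sketch below splits on the first @ only, since the qualifiedName portion may itself contain further characters; parse_reference is a hypothetical helper.

```python
def parse_reference(cell: str):
    """Split 'TypeName@qualifiedName' into its two parts, validating both exist."""
    type_name, sep, qualified_name = cell.partition("@")
    if not sep or not type_name or not qualified_name:
        raise ValueError(f"expected TypeName@qualifiedName, got: {cell!r}")
    return type_name, qualified_name
```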

Glossaries CSV file

Matching existing glossary assets

For glossary assets, the matching to existing assets is as follows:

  • For glossaries, qualifiedName is ignored and name is used.
  • For terms, qualifiedName is ignored and the unique combination of name and anchor is used.
  • For categories, qualifiedName is ignored and the unique combination of name, parentCategory and anchor is used.
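The matching rules above amount to a per-type identity key, which is handy for spotting duplicate rows in a file before importing. The sketch below assumes the typeName values AtlasGlossary, AtlasGlossaryTerm and AtlasGlossaryCategory, which are not stated in this section; check them against an exported file. match_key is a hypothetical helper.

```python
def match_key(row: dict):
    """Return the tuple of columns that identifies a glossary object."""
    type_name = row["typeName"]
    if type_name == "AtlasGlossary":  # assumed type name
        return (type_name, row["name"])
    if type_name == "AtlasGlossaryTerm":  # assumed type name
        return (type_name, row["name"], row["anchor"])
    if type_name == "AtlasGlossaryCategory":  # assumed type name
        return (type_name, row["name"], row.get("parentCategory", ""), row["anchor"])
    raise ValueError(f"not a glossary type: {type_name}")
```

Two rows with different qualifiedName values but the same key would refer to the same object.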

Detailed information about the fields in the CSV file:

qualifiedName

Purely for your own reference, this is ignored during any import (and can therefore be empty).

typeName

Required. Type of the asset.

name

Required. Name of the asset.

anchor

Required. Name of the glossary in which the term or category is contained.

parentCategory

Parent category when the category on the row is a subcategory. The categories should use @ as a path-delimiter, and should have @@@ followed by the glossary name at the end. For example, if the parent category is called Lowest, which itself is a subcategory of Middle, itself a subcategory of Top:

Top@Middle@Lowest@@@Glossary Name
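The path format above can be built and taken apart with two small helpers. This is a sketch of the string handling only; both function names are hypothetical.

```python
def category_path(path, glossary: str) -> str:
    """Join category names (top-most first) and append the glossary name."""
    return "@".join(path) + "@@@" + glossary


def parse_category_path(cell: str):
    """Split a category path cell into (list of category names, glossary name)."""
    path, sep, glossary = cell.rpartition("@@@")
    if not sep or not path or not glossary:
        raise ValueError(f"expected Category@Path@@@Glossary Name, got: {cell!r}")
    return path.split("@"), glossary
```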

categories

Categories in which a term is organized. Each category should be separated by a newline within the cell, and formatted as indicated above: @ as a path-delimiter and @@@ as the glossary delimiter.

displayName

An optional name you can give to the asset to override how it is displayed in the Atlan UI. If present, this will be shown in the UI instead of name.

description

Explanation of the asset, as a fallback. For example, if you want to pre-populate the description for the asset but allow users to override it through Atlan's UI.

userDescription

Explanation of the asset, as entered or confirmed by a user through the Atlan UI. If present, this will be shown in the UI instead of description.

ownerUsers

Individual users who are owners of the asset. Each user should be separated by a newline within the cell.

ownerGroups

Groups of users who are owners of the asset. Each group should be separated by a newline within the cell.

certificateStatus

Certificate on the asset. Must either be empty or one of:

  • VERIFIED
  • DRAFT
  • DEPRECATED

certificateStatusMessage

An optional message that can be associated with the certificate (only used if certificateStatus is non-empty).

announcementType

Type of announcement on the asset. Must either be empty or one of:

  • information
  • warning
  • issue

announcementTitle

Heading line for the announcement on the asset (only used if announcementType is non-empty).

announcementMessage

An optional detailed message that can be associated with the announcement (only used if announcementType is non-empty).

atlanTags

Atlan tags that are assigned to the asset. Each tag should be separated by a newline within the cell, and formatted as one of:

  • Tag Name, for tags that should be directly assigned and should not be propagated
  • Tag Name>>FULL for tags that should be directly assigned to the asset and propagated down their hierarchy and through lineage
  • Tag Name>>HIERARCHY_ONLY for tags that should be directly assigned to the asset and only be propagated down their hierarchy (not through lineage)
  • Tag Name<<PROPAGATED for tags that have been propagated to the asset.

    Propagated tags will be ignored on import

    Any tag marked propagated (Tag Name<<PROPAGATED) will be ignored by an import. Only those tags that are directly applied will be imported, though of course any tags applied up-hierarchy or upstream that are marked to propagate will still propagate accordingly.

    No source tags for glossaries

    Note that source tags can only be related to physical assets, so you should not attempt to assign them to glossary objects.

links

List of resources (links) assigned to the asset. Each link should be separated by a newline within the cell, and formatted as embedded JSON:

{"typeName":"Link","attributes":{"name":"linkName","link":"https://www.example.com"}}

readme

Richly-formatted, detailed documentation for the asset. This should be an HTML-formatted string containing everything that would be inside <body></body>, without the <body></body> wrapping.

starredDetails

Details about users who have starred the asset. Each starred asset detail entry should be separated by a newline within the cell, and formatted as embedded JSON:

{"assetStarredBy":"someone","assetStarredAt":1698769268966}

seeAlso

Business terms that will show as related terms in the UI. Each term should be separated by a newline within the cell, and formatted as:

Term Name@@@Glossary Name

preferredTerms

Business terms that will show as recommended terms in the UI. Each term should be separated by a newline within the cell, and formatted as:

Term Name@@@Glossary Name

synonyms

Business terms that will show as synonyms in the UI. Each term should be separated by a newline within the cell, and formatted as:

Term Name@@@Glossary Name

antonyms

Business terms that will show as antonyms in the UI. Each term should be separated by a newline within the cell, and formatted as:

Term Name@@@Glossary Name

translatedTerms

Business terms that will show as translations for a term in the UI. Each term should be separated by a newline within the cell, and formatted as:

Term Name@@@Glossary Name

validValuesFor

Business terms that will show as valid values for a term in the UI. Each term should be separated by a newline within the cell, and formatted as:

Term Name@@@Glossary Name

classifies

Business terms that will show as classified by a term in the UI. Each term should be separated by a newline within the cell, and formatted as:

Term Name@@@Glossary Name

{CM}::{attribute}

Any number of columns using a :: separator in their heading represent custom metadata.

  • The {CM} portion must give the name of the custom metadata
  • The {attribute} portion must give the name of an attribute within the custom metadata.

Both are the human-readable names.

For multi-valued custom metadata attributes, each value should be separated by a newline within the cell. Date values should be provided as an epoch-style timestamp (purely numeric).

Data products CSV file

Matching existing data product assets

For data product assets, the matching to existing assets is as follows:

  • For (sub)domains, qualifiedName is ignored and the combination of name and parentDomain is used.
  • For data products, qualifiedName is ignored and the unique combination of name and dataDomain is used.

Detailed information about the fields in the CSV file:

qualifiedName

Purely for your own reference, this is ignored during any import (and can therefore be empty).

typeName

Required. Type of the asset.

name

Required. Name of the asset.

parentDomain

Name of the parent domain in which the subdomain is contained.

dataDomain

Path of the data (sub)domain in which the data product is contained. The domains should use @ as a path-delimiter. For example, if the parent domain is called Lowest, which itself is a subdomain of Middle, itself a subdomain of Top:

Top@Middle@Lowest

assetCoverImage

Image to use as the cover image for the asset.

assetThemeHex

A hexadecimal RGB value to specify the color of the theme for the asset.

assetIcon

Name of the Phosphor icon to use to represent the asset, in the form PhIconName.

displayName

An optional name you can give to the asset to override how it is displayed in the Atlan UI. If present, this will be shown in the UI instead of name.

description

Explanation of the asset, as a fallback. For example, if you want to pre-populate the description for the asset but allow users to override it through Atlan's UI.

userDescription

Explanation of the asset, as entered or confirmed by a user through the Atlan UI. If present, this will be shown in the UI instead of description.

ownerUsers

Individual users who are owners of the asset. Each user should be separated by a newline within the cell.

ownerGroups

Groups of users who are owners of the asset. Each group should be separated by a newline within the cell.

certificateStatus

Certificate on the asset. Must either be empty or one of:

  • VERIFIED
  • DRAFT
  • DEPRECATED

certificateStatusMessage

An optional message that can be associated with the certificate (only used if certificateStatus is non-empty).

announcementType

Type of announcement on the asset. Must either be empty or one of:

  • information
  • warning
  • issue

announcementTitle

Heading line for the announcement on the asset (only used if announcementType is non-empty).

announcementMessage

An optional detailed message that can be associated with the announcement (only used if announcementType is non-empty).

atlanTags

Atlan tags that are assigned to the asset. Each tag should be separated by a newline within the cell, and formatted as one of:

  • Tag Name, for tags that should be directly assigned and should not be propagated
  • Tag Name>>FULL for tags that should be directly assigned to the asset and propagated down their hierarchy and through lineage
  • Tag Name>>HIERARCHY_ONLY for tags that should be directly assigned to the asset and only be propagated down their hierarchy (not through lineage)
  • Tag Name<<PROPAGATED for tags that have been propagated to the asset.

    Propagated tags will be ignored on import

    Any tag marked propagated (Tag Name<<PROPAGATED) will be ignored by an import. Only those tags that are directly applied will be imported, though of course any tags applied up-hierarchy or upstream that are marked to propagate will still propagate accordingly.

    No source tags for data products

    Note that source tags can only be related to physical assets, so you should not attempt to assign them to data product objects.

links

List of resources (links) assigned to the asset. Each link should be separated by a newline within the cell, and formatted as embedded JSON:

{"typeName":"Link","attributes":{"name":"linkName","link":"https://www.example.com"}}

readme

Richly-formatted, detailed documentation for the asset. This should be an HTML-formatted string containing everything that would be inside <body></body>, without the <body></body> wrapping.

starredDetails

Details about users who have starred the asset. Each starred asset detail entry should be separated by a newline within the cell, and formatted as embedded JSON:

{"assetStarredBy":"someone","assetStarredAt":1698769268966}

daapCriticality

Criticality of the data product. Must either be empty or one of:

  • High
  • Medium
  • Low

daapSensitivity

Sensitivity of the data product. Must either be empty or one of:

  • Public
  • Internal
  • Confidential

daapVisibility

Visibility of the data product. Must either be empty or one of:

  • Private
  • Protected (shows as Restricted in the UI)
  • Public

daapVisibilityUsers

List of usernames of users who should be able to see this data product. Each should be separated by a newline within the cell.

daapVisibilityGroups

List of group aliases (internal Atlan group names) of groups who should be able to see this data product. Each should be separated by a newline within the cell.

dataProductAssetsPlaybookFilter

JSON-based DSL specifying the criteria to retain for the UI-based filtering rule(s) to select the assets for the data product.

dataProductAssetsDSL

Required. JSON-based Elasticsearch DSL specifying the criteria for selecting which assets are part of the data product.

{CM}::{attribute}

Any number of columns using a :: separator in their heading represent custom metadata.

  • The {CM} portion must give the name of the custom metadata
  • The {attribute} portion must give the name of an attribute within the custom metadata.

Both are the human-readable names.

For multi-valued custom metadata attributes, each value should be separated by a newline within the cell. Date values should be provided as an epoch-style timestamp (purely numeric).

How it works

For the assets file, the package reads the CSV file and creates a number of parallel batches to submit the updates in several passes:

  1. The first pass will make any updates to the assets themselves, noting any related assets (like links and READMEs) that may also need to be updated, created, or deleted.
  2. The second pass will load the related assets and process any deletions.
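The batching idea can be sketched as follows: rows are read with the configured field separator and grouped into chunks no larger than the batch size (20 by default, per the options above). This is an illustration only and does not reproduce the package's parallel submission; batches is a hypothetical helper.

```python
import csv
import io
from itertools import islice


def batches(csv_text: str, batch_size: int = 20, delimiter: str = ","):
    """Yield lists of at most batch_size rows read from the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch
```

Each yielded batch would correspond to one underlying API request.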

The glossaries file is more complicated. Since there can be so many different inter-relations between the various objects, they must be loaded in a number of different passes to ensure the dependencies exist before creating relationships amongst them.

  1. Reads from the CSV file and creates a number of parallel batches for submitting updates to glossaries only.
  2. As above, does a first pass updating the glossaries themselves and noting related assets (like links and READMEs), followed by a second pass for these related assets.
  3. Then reads categories from the CSV level-by-level (multiple passes) to create top-level categories first, then subcategories, and so on down the hierarchy, still applying the same parallelism and multi-pass logic between the categories themselves and their related assets (links and READMEs).
  4. Then reads terms from the CSV and creates or updates these without any of the term-to-term relationships, using the same multi-pass logic between the terms themselves and their related assets (links and READMEs).
  5. Finally updates the terms with any term-to-term relationships in a final pass.
  6. At the end, updates any persistent connection cache with the assets that were created and updated.
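The level-by-level category loading in step 3 can be pictured by grouping category rows by the depth of their parentCategory path, so each level is processed only after the one above it. This is a sketch of the ordering idea, not the package's code; category_levels is a hypothetical helper operating on rows already read into dictionaries.

```python
from collections import defaultdict


def category_levels(rows):
    """Group category names by hierarchy depth, top-level categories first."""
    levels = defaultdict(list)
    for row in rows:
        parent = row.get("parentCategory", "")
        if parent:
            # Drop the '@@@Glossary Name' suffix, then count path segments.
            path = parent.rpartition("@@@")[0]
            depth = len(path.split("@"))
        else:
            depth = 0
        levels[depth].append(row["name"])
    return [levels[depth] for depth in sorted(levels)]
```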