Asset import

The asset import package loads metadata from a CSV file that matches the format of one extracted using either of the asset export packages (basic or advanced).

Use cases

  • Bulk load details from the CSV file into Atlan, to enrich assets in Atlan
  • Create new assets in bulk in Atlan, based on the details in the CSV file
  • Migrate manual enrichment done to one set of data assets to another (for example, when migrating from one data warehouse to another but retaining the same data structures)

Configuration

Assets

  • Import assets from: select how you want to provide the input file to be processed (the format must match the CSV file format produced by an export)

    Directly upload a CSV file containing the assets. Note that this is generally limited to ~10-20MB.

    • Assets file: the CSV file containing enriched details, to load, for assets.

    Specify the details of an S3 object, for which there is no size limit.

    • S3 region: the S3 region where the input file is hosted. If you are using the S3 storage back-end for the Atlan tenant itself, you can leave this blank.
    • S3 bucket: the S3 bucket where the input file is hosted. If you are using the S3 storage back-end for the Atlan tenant itself, you can leave this blank.
    • S3 object key: the complete object key for the S3 object — including any prefix ("directory" structure) and the filename itself.

    When not using Atlan's S3 back-end storage

    When using your own (rather than Atlan's) S3 storage for the input, you must first set up a cross-account bucket policy for Atlan to have access to your S3 bucket and the object within it.

  • Remove attributes, if empty: by default, any value that is blank in the CSV file will simply be ignored. If you would prefer the value to instead be overwritten on the asset in Atlan (i.e. that value removed from the asset), you can select the field here. Any fields selected here whose values are empty in the file will result in the value being removed from the asset in Atlan.

    Atlan tags are always overwritten

    Note that the Atlan tags (atlanTags) column will always be replaced wholesale. Therefore, if the Atlan tags column is empty in the CSV input, but the asset in Atlan has tags assigned, after running the import that asset will no longer have any tags assigned.

  • Input handling: how to handle assets in the CSV file that do not exist in Atlan

    Create a full-fledged asset that can be discovered and maintained like other assets in Atlan.

    Create a "partial" asset. These are only shown in lineage, and cannot be discovered through search. These are useful when you want to represent a placeholder for an asset that you lack full context about, but also do not want to ignore completely.

    Only update assets that already exist in Atlan, and do not create any asset, of any kind.

    Does not apply to related READMEs and links

    READMEs and links in Atlan are technically separate assets — but these will still be created, even in Update only mode.

  • Fail on errors: whether an invalid value in a field should cause the import to fail (Yes) or log a warning, skip that value, and proceed (No).

  • Case-sensitive match for updates: whether attempts to match assets should be done case-sensitively (Yes) or case-insensitively (No).
  • Table/view agnostic?: whether the import should strictly adhere to the types Table, View and MaterializedView in the input CSV, or allow these to be matched interchangeably.
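The two matching options above can be pictured as a normalization step applied to each row before looking up an existing asset. The following is a minimal sketch only: `match_key`, `TABLE_LIKE` and the `"TableLike"` bucket are hypothetical names for illustration, not part of the package.

```python
from typing import Tuple

# Hypothetical illustration; the package's real matching logic may differ.
TABLE_LIKE = {"Table", "View", "MaterializedView"}


def match_key(
    qualified_name: str,
    type_name: str,
    case_sensitive: bool = True,
    table_view_agnostic: bool = False,
) -> Tuple[str, str]:
    """Build the lookup key for a CSV row under the two matching options."""
    if not case_sensitive:
        # Case-insensitive matching: compare names in a single case.
        qualified_name = qualified_name.lower()
    if table_view_agnostic and type_name in TABLE_LIKE:
        # Treat Table, View and MaterializedView as interchangeable.
        type_name = "TableLike"
    return (qualified_name, type_name)
```

With `table_view_agnostic=True`, a row typed as `View` produces the same key as an asset typed as `Table` with the same qualifiedName; other types, such as `Column`, are unaffected.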

Glossaries

  • Import glossaries, categories and terms from: select how you want to provide the input file to be processed (the format must match the CSV file format produced by an export)

    Directly upload a CSV file containing the glossary information. Note that this is generally limited to ~10-20MB.

    • Glossaries file: the CSV file containing enriched details, to load, for glossaries, terms and categories.

    Specify the details of an S3 object, for which there is no size limit.

    • S3 region: the S3 region where the input file is hosted. If you are using the S3 storage back-end for the Atlan tenant itself, you can leave this blank.
    • S3 bucket: the S3 bucket where the input file is hosted. If you are using the S3 storage back-end for the Atlan tenant itself, you can leave this blank.
    • S3 object key: the complete object key for the S3 object — including any prefix ("directory" structure) and the filename itself.

    When not using Atlan's S3 back-end storage

    When using your own (rather than Atlan's) S3 storage for the input, you must first set up a cross-account bucket policy for Atlan to have access to your S3 bucket and the object within it.

  • Remove attributes, if empty: by default, any value that is blank in the CSV file will simply be ignored. If you would prefer the value to instead be overwritten on the asset in Atlan (i.e. that value removed from the asset), you can select the field here. Any fields selected here whose values are empty in the file will result in the value being removed from the asset in Atlan.

    Atlan tags are always overwritten

    Note that the Atlan tags (atlanTags) column will always be replaced wholesale. Therefore, if the Atlan tags column is empty in the CSV input, but the asset in Atlan has tags assigned, after running the import that asset will no longer have any tags assigned.

  • Input handling: how to handle glossaries, categories and terms in the CSV file that do not exist in Atlan

    Create a full-fledged glossary, category or term that can be discovered and maintained like others in Atlan.

    Only update glossaries, categories and terms that already exist in Atlan, but do not create any new ones.

    Does not apply to related READMEs and links

    READMEs and links in Atlan are technically separate assets — but these will still be created, even in Update only mode.

  • Fail on errors: whether an invalid value in a field should cause the import to fail (Yes) or log a warning, skip that value, and proceed (No).
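The Fail on errors option (available for both assets and glossaries) amounts to a choice between raising on a bad value and logging a warning while skipping only that value. A minimal sketch, assuming a hypothetical `parse_row` helper and a hypothetical `exampleCount` numeric field; neither comes from the package:

```python
import logging

logger = logging.getLogger("asset_import_sketch")


def parse_row(row: dict, parsers: dict, fail_on_errors: bool) -> dict:
    """Parse each field of a CSV row; handle invalid values per the option."""
    parsed = {}
    for field, raw in row.items():
        parser = parsers.get(field, str)  # default: keep the value as text
        try:
            parsed[field] = parser(raw)
        except ValueError as err:
            if fail_on_errors:
                # Yes: an invalid value aborts the import.
                raise ValueError(f"Invalid value for {field!r}: {raw!r}") from err
            # No: warn, skip just this value, and carry on with the row.
            logger.warning("Skipping invalid value for %s: %r", field, raw)
    return parsed


row = {"name": "ORDERS", "exampleCount": "not-a-number"}
lenient = parse_row(row, {"exampleCount": int}, fail_on_errors=False)
```

In the lenient case the rest of the row is still applied; only the invalid value is dropped.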

What it does

For each row in the CSV file, the package applies the values from that row to the matching asset in Atlan.

  • Any values that are empty in the CSV will be ignored, unless they were specified in the Remove attributes, if empty configuration — in which case they will be removed from the asset in Atlan.
  • Any attributes for which there can be multiple values (ownerUsers, ownerGroups, assignedTerms, atlanTags, links, starredDetails, seeAlso, preferredTerms, synonyms, antonyms, translatedTerms, validValuesFor, classifies and any custom metadata attributes) will be replaced on the asset in Atlan, not appended to. For example, if the asset in Atlan had 3 ownerUsers and the CSV file only lists 1, then after importing, the asset in Atlan will have only 1 user in ownerUsers (the one from the CSV input file).
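The rules above (blank values ignored unless selected for removal, multi-valued attributes replaced rather than merged, and atlanTags always replaced wholesale) can be sketched as a small merge function. This is an illustration under assumptions: `apply_row` is a hypothetical name, and assets are modelled as plain dictionaries rather than real Atlan objects.

```python
def apply_row(existing: dict, row: dict, remove_if_empty: set) -> dict:
    """Return an asset's attributes after applying one CSV row (sketch)."""
    result = dict(existing)
    for field, value in row.items():
        is_empty = value in ("", None, [])
        if is_empty:
            # Blank values are ignored unless the field was selected under
            # "Remove attributes, if empty"; atlanTags is always replaced,
            # so a blank atlanTags column clears the tags.
            if field in remove_if_empty or field == "atlanTags":
                result.pop(field, None)
            continue
        # Non-empty values replace what is there, including whole lists.
        result[field] = value
    return result


existing = {"description": "Orders", "ownerUsers": ["a", "b", "c"], "atlanTags": ["PII"]}
row = {"description": "", "ownerUsers": ["a"], "atlanTags": []}
default_result = apply_row(existing, row, remove_if_empty=set())
removal_result = apply_row(existing, row, remove_if_empty={"description"})
```

In `default_result` the description survives, ownerUsers shrinks to the single user from the file, and the tags are gone; in `removal_result` the description is removed as well.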

For assets: qualifiedName and type must match

For non-glossary assets, this relies on the unique combination of the qualifiedName and typeName columns in the CSV file exactly matching an asset in Atlan. This may require you to change the qualifiedName values, for example, to use a different connection in a scenario like migrating assets from one data tool to another.

  • For glossaries, qualifiedName is ignored and name is used.
  • For terms, qualifiedName is ignored and the unique combination of name and anchor is used.
  • For categories, qualifiedName is ignored and the unique combination of name, parentCategory and anchor is used.
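
The matching rules above can be summarized as one identity key per row. A minimal sketch, assuming the glossary type names are `AtlasGlossary`, `AtlasGlossaryTerm` and `AtlasGlossaryCategory` (an assumption, since this page does not list them) and using a hypothetical `identity_key` helper:

```python
def identity_key(row: dict) -> tuple:
    """Return the columns that identify an existing object for a CSV row."""
    type_name = row["typeName"]
    if type_name == "AtlasGlossary":
        return (type_name, row["name"])
    if type_name == "AtlasGlossaryTerm":
        return (type_name, row["name"], row["anchor"])
    if type_name == "AtlasGlossaryCategory":
        # Top-level categories have no parent, so default to an empty string.
        return (type_name, row["name"], row.get("parentCategory", ""), row["anchor"])
    # Every other asset type: qualifiedName plus typeName.
    return (type_name, row["qualifiedName"])
```

Note that for glossary objects the qualifiedName column plays no part in the key, so stale values there do not prevent a match.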

How it works

For the assets file, the package reads from the CSV file and creates a number of parallel batches for submitting the updates in several passes:

  1. The first pass will make any updates to the assets themselves, noting any related assets (like links and READMEs) that may also need to be updated, created, or deleted.
  2. The second pass will load the related assets and process any deletions.
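
The two passes above can be sketched with a thread pool. This is a rough illustration, not the package's implementation: `load_assets`, `batches` and the `save_batch` callback are hypothetical, rows are modelled as dictionaries with an `asset` and an optional `related` list, and deletions are left out for brevity.

```python
from concurrent.futures import ThreadPoolExecutor


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def load_assets(rows, save_batch, batch_size=20, workers=4):
    """Save assets in parallel batches, then their related assets."""
    related = []

    def first_pass(batch):
        # Pass 1: save the assets and note related assets (links, READMEs).
        save_batch([r["asset"] for r in batch])
        return [rel for r in batch for rel in r.get("related", [])]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for found in pool.map(first_pass, batches(rows, batch_size)):
            related.extend(found)
        # Pass 2: starts only after every first-pass batch has finished.
        list(pool.map(save_batch, batches(related, batch_size)))
    return len(related)
```

Collecting the first-pass results before submitting the second pass is what guarantees that an asset exists before its link or README is attached to it.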

The glossaries file is more complicated. Since there can be so many different inter-relations between the various objects, they must be loaded in a number of different passes to ensure the dependencies exist before creating relationships amongst them.

  1. Reads from the CSV file and creates a number of parallel batches for submitting updates to glossaries only.
  2. As above, does a first pass updating the glossaries themselves and noting related assets (like links and READMEs), followed by a second pass for these related assets.
  3. Then reads categories from the CSV level-by-level (multiple passes) to create top-level categories first, then subcategories, and so on down the hierarchy, still applying the same parallelism and multi-pass logic between the categories themselves and their related assets (links and READMEs).
  4. Then reads terms from the CSV and creates or updates these without any of the term-to-term relationships, using the same multi-pass logic between the terms themselves and their related assets (links and READMEs).
  5. Finally updates the terms with any term-to-term relationships in a final pass.
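
The ordering described above can be sketched as a load plan: glossaries first, categories grouped by depth, then terms, then term-to-term relationships. The sketch below is illustrative only. `category_levels` and `load_order` are hypothetical names, the type names are assumed, category names are assumed unique within the file, and the parent references are assumed to contain no cycles.

```python
from collections import defaultdict


def category_levels(categories):
    """Group category names by depth so parents load before children."""
    parent_of = {c["name"]: c.get("parentCategory") or None for c in categories}

    def depth(name):
        d = 0
        while parent_of.get(name):  # walk up until a top-level category
            name = parent_of[name]
            d += 1
        return d

    levels = defaultdict(list)
    for c in categories:
        levels[depth(c["name"])].append(c["name"])
    return [levels[d] for d in sorted(levels)]


def load_order(rows):
    """Return the ordered list of (stage, names) for a glossaries file."""
    by_type = defaultdict(list)
    for r in rows:
        by_type[r["typeName"]].append(r)
    plan = [("glossaries", [g["name"] for g in by_type["AtlasGlossary"]])]
    for i, level in enumerate(category_levels(by_type["AtlasGlossaryCategory"])):
        plan.append((f"categories level {i}", level))
    terms = [t["name"] for t in by_type["AtlasGlossaryTerm"]]
    plan.append(("terms without relationships", terms))
    plan.append(("term-to-term relationships", terms))
    return plan
```

Each stage in the plan would itself run the two-pass logic (objects first, then links and READMEs); deferring term-to-term relationships to the last stage ensures both ends of every relationship already exist.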