
Asset import

Source code · Sample CSV file

The asset import package loads metadata from a CSV file that matches the format of one extracted using either of the asset export packages (basic or advanced).

Use cases

  • Bulk load details from the CSV file to enrich existing assets in Atlan
  • Create new assets in bulk in Atlan, based on the details in the CSV file
  • Migrate manual enrichment done to one set of data assets to another (for example, when migrating from one data warehouse to another but retaining the same data structures)

Configuration

Source

  • Import metadata from: select how you want to provide the input file(s) to be processed (the format must match the CSV file format produced by an export)

    Directly upload CSV file(s) containing the metadata.

    Limited file sizes

    This option is generally limited to ~10-20MB for each file. For anything larger, use object storage.

    Alternatively, retrieve the metadata file(s) from cloud object storage.

    • Cloud object store: the object store from which to fetch the metadata file(s).

      • AWS access key: your AWS access key.
      • AWS secret key: your AWS secret key.
      • Region: your AWS region.
      • Bucket: your AWS bucket.

      Reusing Atlan's backing S3 store

      When your Atlan tenant is deployed in AWS, you can leave all of these blank to reuse the backing store of Atlan itself. You can also set up a cross-account bucket policy for Atlan to have access to your S3 bucket and the objects within it, and leave these blank.

      • Project ID: the ID of your GCP project.
      • Service account JSON: your service account credentials, as JSON.
      • Bucket: your GCS bucket.

      Reusing Atlan's backing GCS store

      When your Atlan tenant is deployed in GCP, you can leave all of these blank to reuse the backing store of Atlan itself.

      • Azure client ID: the unique application (client) ID assigned to your app by Azure AD when the app was registered.
      • Azure client secret: your Azure client secret (its actual value, not its identifier).
      • Azure tenant ID: the unique identifier of the Azure Active Directory instance.
      • Storage account name: name of your storage account.
      • Container: your ADLS container.

      Reusing Atlan's backing ADLS store

      When your Atlan tenant is deployed in Azure, you can leave all of these blank to reuse the backing store of Atlan itself.

Assets

  • Assets file: the CSV file containing the enriched asset details to load.
  • Prefix (path): the directory (path) within the object store from which to fetch the file containing assets metadata.
  • Object key (filename): the object key (filename), including its extension, within the object store and prefix.
  • Input handling: how to handle assets in the CSV file that do not exist in Atlan

    • Create a full-fledged asset that can be discovered and maintained like other assets in Atlan.
    • Create a "partial" asset. Partial assets are only shown in lineage and cannot be discovered through search. They are useful as placeholders for assets you lack full context about but do not want to ignore completely.
    • Only update assets that already exist in Atlan, and do not create any new assets of any kind.

    Does not apply to related READMEs and links

    READMEs and links in Atlan are technically separate assets — but these will still be created, even in Update only mode.

  • Options: optional settings to optimize how assets are loaded.

    By default, these settings will be applied:

    • All blank fields in the input file will be ignored.
    • Any invalid value in a field will cause the import to fail rather than proceeding.
    • Assets will be matched case-sensitively.
    • Type names in the input file will be strictly adhered to.
    • Comma (,) is the expected field separator.
    • A maximum of 20 records will be processed per underlying API request.
    • Remove attributes, if empty: by default, any value that is blank in the CSV file is simply ignored. Select fields here to instead have a blank value in the input file remove the corresponding value from the asset in Atlan.
    • Fail on errors: whether an invalid value in a field should cause the import to fail (Yes) or log a warning, skip that value, and proceed (No).
    • Case-sensitive match for updates: whether attempts to match assets should be done case-sensitively (Yes) or case-insensitively (No).
    • Table/view agnostic?: whether the import should strictly adhere to the types Table, View and MaterializedView in the input CSV, or allow these to be matched interchangeably.
    • Field separator: character used to separate fields in the input file.
    • Batch size: maximum number of records to attempt to process per underlying API request.

    Atlan tags are always overwritten

    Note that the Atlan tags (atlanTags) column will always be replaced wholesale. Therefore, if the Atlan tags column is empty in the CSV input but the asset in Atlan has tags assigned, after running the import that asset will no longer have any tags assigned. To avoid this, either remove the atlanTags column from your CSV file entirely, or make sure each row contains the complete set of tags you want applied to that asset.
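
For example, if you want existing tag assignments in Atlan preserved, you could strip the atlanTags column from the exported file before importing it. The following is a minimal sketch using only the Python standard library; the file names are placeholders.

```python
import csv

# Placeholder file names; substitute your own export and the cleaned copy.
SOURCE = "asset-export.csv"
CLEANED = "asset-import-without-tags.csv"

with open(SOURCE, newline="", encoding="utf-8") as infile, \
        open(CLEANED, "w", newline="", encoding="utf-8") as outfile:
    reader = csv.DictReader(infile)
    # Keep every column except atlanTags, so the import never touches
    # the tags already assigned to assets in Atlan.
    kept = [column for column in reader.fieldnames if column != "atlanTags"]
    writer = csv.DictWriter(outfile, fieldnames=kept, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
```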

Glossaries

  • Glossaries file: the CSV file containing the enriched details to load for glossaries, terms and categories.
  • Prefix (path): the directory (path) within the object store from which to fetch the file containing glossaries metadata.
  • Object key (filename): the object key (filename), including its extension, within the object store and prefix.
  • Input handling: how to handle glossaries, categories and terms in the CSV file that do not exist in Atlan

    • Create a full-fledged glossary, category or term that can be discovered and maintained like others in Atlan.
    • Only update glossaries, categories and terms that already exist in Atlan, and do not create any new ones.

    Does not apply to related READMEs and links

    READMEs and links in Atlan are technically separate assets — but these will still be created, even in Update only mode.

  • Options: optional settings to optimize how glossaries, terms and categories are loaded.

    By default, these settings will be applied:

    • All blank fields in the input file will be ignored.
    • Any invalid value in a field will cause the import to fail rather than proceeding.
    • Comma (,) is the expected field separator.
    • A maximum of 20 records will be processed per underlying API request.
    • Remove attributes, if empty: by default, any value that is blank in the CSV file is simply ignored. Select fields here to instead have a blank value in the input file remove the corresponding value from the asset in Atlan.

      Atlan tags are always overwritten

      Note that the Atlan tags (atlanTags) column will always be replaced wholesale. Therefore, if the Atlan tags column is empty in the CSV input, but the asset in Atlan has tags assigned, after running the import that asset will no longer have any tags assigned.

    • Fail on errors: whether an invalid value in a field should cause the import to fail (Yes) or log a warning, skip that value, and proceed (No).

    • Field separator: character used to separate fields in the input file.
    • Batch size: maximum number of records to attempt to process per underlying API request.

Data products

  • Data products file: the CSV file containing the enriched details to load for data domains and data products.
  • Prefix (path): the directory (path) within the object store from which to fetch the file containing data domain and product metadata.
  • Object key (filename): the object key (filename), including its extension, within the object store and prefix.
  • Input handling: how to handle data domains and products in the CSV file that do not exist in Atlan

    • Create a full-fledged data domain or data product that can be discovered and maintained like others in Atlan.
    • Only update data domains and data products that already exist in Atlan, and do not create any new ones.

    Does not apply to related READMEs and links

    READMEs and links in Atlan are technically separate assets — but these will still be created, even in Update only mode.

  • Options: optional settings to optimize how data domains and products are loaded.

    By default, these settings will be applied:

    • All blank fields in the input file will be ignored.
    • Any invalid value in a field will cause the import to fail rather than proceeding.
    • Comma (,) is the expected field separator.
    • A maximum of 20 records will be processed per underlying API request.
    • Remove attributes, if empty: by default, any value that is blank in the CSV file is simply ignored. Select fields here to instead have a blank value in the input file remove the corresponding value from the asset in Atlan.

      Atlan tags are always overwritten

      Note that the Atlan tags (atlanTags) column will always be replaced wholesale. Therefore, if the Atlan tags column is empty in the CSV input, but the asset in Atlan has tags assigned, after running the import that asset will no longer have any tags assigned.

    • Fail on errors: whether an invalid value in a field should cause the import to fail (Yes) or log a warning, skip that value, and proceed (No).

    • Field separator: character used to separate fields in the input file.
    • Batch size: maximum number of records to attempt to process per underlying API request.

What it does

For each row in the CSV file, the package applies the values from that row to the corresponding asset in Atlan, as illustrated by the sketch after the list below.

  • Any values that are empty in the CSV will be ignored, unless they were specified in the Remove attributes, if empty configuration — in which case they will be removed from the asset in Atlan.
  • Any attributes for which there can be multiple values (ownerUsers, ownerGroups, assignedTerms, atlanTags, links, starredDetails, seeAlso, preferredTerms, synonyms, antonyms, translatedTerms, validValuesFor, classifies and any custom metadata attributes) will be replaced on the asset in Atlan. For example, if the asset in Atlan had 3 ownerUsers and the CSV file only lists 1, then after importing the asset in Atlan will have only 1 user in ownerUsers (the one from the CSV input file).
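
The following is a minimal sketch of that merge behavior (not the package's actual implementation), treating an asset as a plain dictionary of attributes; the attribute names other than ownerUsers are purely illustrative.

```python
# Illustrative sketch of the documented merge semantics, not the
# package's actual code: an "asset" here is just a dict of attributes.
def apply_row(asset: dict, row: dict, remove_if_empty: set) -> dict:
    updated = dict(asset)
    for field, value in row.items():
        if value == "":
            # Blank values are ignored, unless the field was selected
            # under "Remove attributes, if empty".
            if field in remove_if_empty:
                updated.pop(field, None)
            continue
        # Non-blank values replace whatever is on the asset, including
        # multi-valued attributes, which are replaced wholesale.
        updated[field] = value
    return updated

existing = {"ownerUsers": ["alice", "bob", "carol"], "certificateStatus": "DRAFT"}
row = {"ownerUsers": ["dave"], "certificateStatus": ""}

print(apply_row(existing, row, remove_if_empty=set()))
# {'ownerUsers': ['dave'], 'certificateStatus': 'DRAFT'}
print(apply_row(existing, row, remove_if_empty={"certificateStatus"}))
# {'ownerUsers': ['dave']}
```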

For assets: qualifiedName and type must match

For non-glossary assets, matching relies on the unique combination of the qualifiedName and typeName columns in the CSV file exactly matching an asset in Atlan. This may require you to change the qualifiedName values, for example to use a different connection when migrating assets from one data tool to another (see the sketch after the list below).

  • For glossaries, qualifiedName is ignored and name is used.
  • For terms, qualifiedName is ignored and the unique combination of name and anchor is used.
  • For categories, qualifiedName is ignored and the unique combination of name, parentCategory and anchor is used.
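
For example, when migrating enrichment from one connection to another, you could rewrite the connection portion of each qualifiedName before importing. The snippet below is a minimal sketch using the Python standard library; the file names and connection prefixes are purely hypothetical.

```python
import csv

# Hypothetical values: replace with your own export file and the
# qualifiedName prefixes of the source and target connections.
SOURCE = "old-connection-export.csv"
TARGET = "new-connection-import.csv"
OLD_PREFIX = "default/snowflake/1700000000"
NEW_PREFIX = "default/databricks/1710000000"

with open(SOURCE, newline="", encoding="utf-8") as infile, \
        open(TARGET, "w", newline="", encoding="utf-8") as outfile:
    reader = csv.DictReader(infile)
    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        # Point the row at the new connection; typeName and the rest of
        # the qualifiedName stay the same so the data structures match.
        if row["qualifiedName"].startswith(OLD_PREFIX):
            row["qualifiedName"] = NEW_PREFIX + row["qualifiedName"][len(OLD_PREFIX):]
        writer.writerow(row)
```
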
How it works

For the assets file, the package reads the CSV file and creates a number of parallel batches, submitting the updates in several passes (sketched after this list):

  1. The first pass will make any updates to the assets themselves, noting any related assets (like links and READMEs) that may also need to be updated, created, or deleted.
  2. The second pass will load the related assets and process any deletions.
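
A minimal sketch of that two-pass shape (illustrative only, not the package's actual code) could look like the following; the row key used for related assets and the save function are placeholders.

```python
from itertools import islice

BATCH_SIZE = 20  # mirrors the default batch size described above

def batches(rows, size=BATCH_SIZE):
    """Yield successive fixed-size batches from a list of rows."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk

def save_batch(kind, batch):
    """Placeholder for whatever call actually persists a batch in Atlan."""
    print(f"saving {len(batch)} {kind}(s)")

def import_assets(rows):
    related = []  # links and READMEs noted during the first pass
    # Pass 1: update the assets themselves, batch by batch.
    for batch in batches(rows):
        save_batch("asset", batch)
        related.extend(r for row in batch for r in row.get("relatedAssets", []))
    # Pass 2: create or update the related assets noted above
    # (deletions of related assets would also be processed here).
    for batch in batches(related):
        save_batch("related asset", batch)
```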

The glossaries file is more complicated. Since there can be so many different inter-relationships between the various objects, they must be loaded in a number of passes to ensure the dependencies exist before relationships are created amongst them (the sketch after this list illustrates the ordering):

  1. Reads from the CSV file and creates a number of parallel batches for submitting updates to glossaries only.
  2. As above, does a first pass updating the glossaries themselves and noting related assets (like links and READMEs), followed by a second pass for these related assets.
  3. Then reads categories from the CSV level-by-level (multiple passes) to create top-level categories first, then subcategories, and so on down the hierarchy, still applying the same parallelism and multi-pass logic between the categories themselves and their related assets (links and READMEs).
  4. Then reads terms from the CSV and creates or updates these without any of the term-to-term relationships, using the same multi-pass logic between the terms themselves and their related assets (links and READMEs).
  5. Finally updates the terms with any term-to-term relationships in a final pass.
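
As an illustration of the category and term ordering only (not the package's actual code), the sketch below groups categories by their depth in the hierarchy so that parents are always created before their children, and defers term-to-term relationships to a final pass. The '@' separator assumed for parentCategory paths is purely illustrative.

```python
from collections import defaultdict

# Illustrative rows; in practice these come from the glossaries CSV.
categories = [
    {"name": "Finance", "parentCategory": ""},
    {"name": "Revenue", "parentCategory": "Finance"},
    {"name": "Recurring revenue", "parentCategory": "Finance@Revenue"},
]

def depth(row):
    """Depth in the category hierarchy: 0 for top-level categories.
    Assumes parent paths are joined with '@' (illustrative only)."""
    parent = row["parentCategory"]
    return 0 if not parent else parent.count("@") + 1

# Group categories level-by-level, so each pass only creates categories
# whose parents already exist from an earlier pass.
levels = defaultdict(list)
for row in categories:
    levels[depth(row)].append(row)

for level in sorted(levels):
    print(f"category pass {level}: {[row['name'] for row in levels[level]]}")

# Terms would then be created without any term-to-term relationships
# (seeAlso, preferredTerms, synonyms, ...), and one final pass would add
# those relationships once every term already exists.
```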