Cube assets builder

The cube assets builder package allows you to create (and update) net-new cube assets: connections, cubes, dimensions, hierarchies, fields, and subfields.

One cube per input file

Note that this package is designed to work when there is only a single cube (and all its dimensions, hierarchies, and fields) defined per input file.

Use cases

  • Ingest metadata for cube assets where there is not (yet) an out-of-the-box crawler

Configuration

Assets

  • Import assets from: select how you want to provide the input file to be processed (the format must match the expected CSV file format)

    Directly upload a CSV file containing the assets. Note that this is generally limited to ~10-20MB.

    • Assets file: the CSV file containing details to load, for assets.

    Specify the details of an S3 object, for which there is no size limit.

    • S3 region: the S3 region where the input file is hosted. If you are using the S3 storage back-end for the Atlan tenant itself, you can leave this blank.
    • S3 bucket: the S3 bucket where the input file is hosted. If you are using the S3 storage back-end for the Atlan tenant itself, you can leave this blank.
    • S3 object key: the complete object key for the S3 object — including any prefix ("directory" structure) and the filename itself.

    When not using Atlan's S3 back-end storage

    When using your own (rather than Atlan's) S3 storage for the input, you must first set up a cross-account bucket policy so that Atlan has access to your S3 bucket and the object within it.

  • Input handling: how to handle assets in the CSV file that do not exist in Atlan

    Create a full-fledged asset that can be discovered and maintained like other assets in Atlan. (Any existing assets will be updated.)

    Only update assets that already exist in Atlan, and do not create any asset, of any kind.

    Does not apply to related READMEs and links

    READMEs and links in Atlan are technically separate assets — but these will still be created, even in Update only mode.

  • Options: further ways to configure the utility

    Configures the utility with these defaults:

    Parameter                      Value
    Remove attributes, if empty    None
    Fail on errors                 No
    Field separator                ,
    Batch size                     20
    • Remove attributes, if empty: by default, any value that is blank in the CSV file will simply be ignored. If you would prefer the value to instead be overwritten on the asset in Atlan (i.e. that value removed from the asset), you can select the field here. Any fields selected here whose values are empty in the file will result in the value being removed from the asset in Atlan.

      Atlan tags are always overwritten

      Note that the Atlan tags (atlanTags) column will always be replaced wholesale. Therefore, if the Atlan tags column is empty in the CSV input, but the asset in Atlan has tags assigned, after running the import that asset will no longer have any tags assigned.

    • Fail on errors: whether an invalid value in a field should cause the import to fail (Yes) or log a warning, skip that value, and proceed (No).

    • Field separator: specify a single character that is used to separate the fields in the input file, for example a , or a ;.

    • Batch size: override the number of assets that should be sent to Atlan to be saved per API request.
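
To illustrate the Field separator and Batch size options, here is a minimal sketch that parses CSV text with a configurable separator and groups the rows into batches. The function names `read_rows` and `batches` and the sample values are hypothetical. This is not the package's own code, only an illustration of what the two settings control.

```python
import csv
import io
from itertools import islice


def read_rows(csv_text, field_separator=","):
    """Parse CSV text into a list of dicts keyed by column name."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=field_separator)
    return list(reader)


def batches(rows, batch_size=20):
    """Yield successive lists containing at most batch_size rows."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk


# Placeholder values, using ';' as the field separator.
sample = "typeName;connectionName;connectorType\nConnection;Finance;my-connector\n"
rows = read_rows(sample, field_separator=";")
batch_sizes = [len(b) for b in batches(list(range(45)), batch_size=20)]
```

With 45 rows and a batch size of 20, this sketch would produce three batches (20, 20, and 5 rows), corresponding to three save requests.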

Semantics

  • Delta handling: how changes should be detected and handled (if at all)

    Will delete any assets that were in a previous input file, but are no longer in the most recent input file provided. (Will also update any existing assets with the details found in the input file, and create any net-new assets found in the input file.)

    Delta calculated from files

    Be aware that the delta is calculated by comparing the input files, not the assets that currently exist in the cube within Atlan. This will only work when all management of the cube is done through this package.

    • Removal type: how assets not found in the input file should be removed:

      • Archive (recoverable): will mark each asset as soft-deleted. They will no longer appear in the UI, but can be recovered if needed.
      • Purge (cannot be recovered): will permanently delete each such asset. They will no longer appear in the UI, and there is no way to recover them.

    Will only create and update any assets that appear in the input file. Any assets that exist within the cube in Atlan, but are no longer in the input file provided, will be left as-is in Atlan.
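
The file-to-file comparison described above can be sketched as a set difference between the assets named in the previous file and those named in the latest file. The choice of identity columns, the `CubeDimension` type name, and the function names are assumptions made for illustration; the package may identify assets differently.

```python
import csv
import io

# Assumption: these columns together identify an asset within an input file.
IDENTITY_COLUMNS = (
    "typeName", "connectionName", "connectorType", "cubeName",
    "cubeDimensionName", "cubeHierarchyName", "fieldName",
    "parentFieldQualifiedName",
)


def identities(csv_text):
    """Return the set of asset identities found in one input file."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        tuple((row.get(col) or "") for col in IDENTITY_COLUMNS)
        for row in reader
    }


def assets_to_remove(previous_csv, latest_csv):
    """Assets present in the previous file but absent from the latest one."""
    return identities(previous_csv) - identities(latest_csv)


header = "typeName,connectionName,connectorType,cubeName,cubeDimensionName\n"
previous = header + (
    "Cube,Finance,my-connector,Sales,\n"
    "CubeDimension,Finance,my-connector,Sales,Time\n"
    "CubeDimension,Finance,my-connector,Sales,Region\n"
)
latest = header + (
    "Cube,Finance,my-connector,Sales,\n"
    "CubeDimension,Finance,my-connector,Sales,Time\n"
)
removed = assets_to_remove(previous, latest)
```

Because only the two files are compared, an asset created in the cube by some other means would never appear in `removed`, which is why this approach relies on the package managing the whole cube.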

What it does

For each row in the CSV file, the package applies the values in that row to the corresponding asset in Atlan.

  • Any values that are empty in the CSV will be ignored, unless they were specified in the Remove attributes, if empty configuration — in which case they will be removed from the asset in Atlan.
  • Any attributes for which there can be multiple values (ownerUsers, ownerGroups, assignedTerms, atlanTags, links, starredDetails) will be replaced on the asset in Atlan. For example, if the asset in Atlan had 3 ownerUsers and the CSV file only lists 1, then after importing the asset in Atlan will have only 1 user in ownerUsers (the one from the CSV input file).
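
These rules can be sketched on plain dictionaries as follows. The `apply_row` function and the dictionary representation of an asset are hypothetical, chosen only to make the semantics concrete. The sketch also treats atlanTags as always overwritten, as noted under the Remove attributes, if empty option.

```python
# Attributes that can hold multiple values, as listed above.
MULTI_VALUED = {
    "ownerUsers", "ownerGroups", "assignedTerms",
    "atlanTags", "links", "starredDetails",
}
# The atlanTags column is always replaced wholesale, even when empty.
ALWAYS_OVERWRITTEN = {"atlanTags"}


def apply_row(asset, row, remove_if_empty=frozenset()):
    """Return a copy of asset with the values from one CSV row applied."""
    updated = dict(asset)
    for column, raw in row.items():
        value = (raw or "").strip()
        if not value:
            # Empty values are ignored unless configured for removal.
            if column in remove_if_empty or column in ALWAYS_OVERWRITTEN:
                updated.pop(column, None)
            continue
        if column in MULTI_VALUED:
            # Multi-valued attributes are replaced, not merged.
            updated[column] = value.split("\n")
        else:
            updated[column] = value
    return updated


existing = {
    "description": "Old text",
    "ownerUsers": ["alice", "bob", "carol"],
    "atlanTags": ["PII"],
    "certificateStatus": "DRAFT",
}
row = {"description": "", "ownerUsers": "alice", "atlanTags": "", "certificateStatus": ""}
result = apply_row(existing, row, remove_if_empty={"certificateStatus"})
```

In this example the empty description is ignored, ownerUsers shrinks from three users to one, and both atlanTags and certificateStatus are removed.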

Each typeName requires a different minimal set of details

The type of asset loaded for each row in the CSV depends on the value in the typeName column. Different types require different information: for example, a Connection only requires a connectionName and connectorType, while a field requires: connectionName, connectorType, cubeName, cubeDimensionName, cubeHierarchyName, and fieldName.
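
A pre-flight check for these minimal details might look like the sketch below. Only the Connection and field requirements are stated above; the type names `Cube`, `CubeDimension`, `CubeHierarchy`, and `CubeField`, and the requirements for the intermediate types, are assumptions inferred from the column descriptions in the CSV file section.

```python
BASE = ("connectionName", "connectorType")

# Assumption: type names and intermediate requirements are inferred,
# not confirmed by the package documentation.
REQUIRED_COLUMNS = {
    "Connection": BASE,
    "Cube": BASE + ("cubeName",),
    "CubeDimension": BASE + ("cubeName", "cubeDimensionName"),
    "CubeHierarchy": BASE + ("cubeName", "cubeDimensionName", "cubeHierarchyName"),
    "CubeField": BASE + (
        "cubeName", "cubeDimensionName", "cubeHierarchyName", "fieldName",
    ),
}


def missing_columns(row):
    """List the required columns that are blank or absent in a row."""
    type_name = row.get("typeName") or ""
    if type_name not in REQUIRED_COLUMNS:
        raise ValueError(f"Unknown typeName: {type_name!r}")
    return [
        col for col in REQUIRED_COLUMNS[type_name]
        if not (row.get(col) or "").strip()
    ]


field_row = {
    "typeName": "CubeField",
    "connectionName": "Finance",
    "connectorType": "my-connector",
    "cubeName": "Sales",
    "cubeDimensionName": "Time",
    "fieldName": "Quarter",
}
problems = missing_columns(field_row)
```

Here `problems` would contain `cubeHierarchyName`, since the field row does not say which hierarchy it belongs to.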

CSV file

The following describes the expected format of the CSV file. The ordering of the columns in the file does not matter, but each column name must exactly match one of the names listed below.

typeName

Required. Type of the asset. This determines what kind of asset will be created or updated by the row of the CSV.

connectionName

Required. Name of the connection. This is any arbitrary name you want to use to group assets for a particular source, for example a system or application name.

connectorType

Required. Type of connector used to represent the source. View the available options and associated icons, specifically the all-lowercase value under the Raw REST API tab.

cubeName

Technical name of the cube, as you would crawl it from source. This is required for cube assets and any assets contained within a cube.

cubeDimensionName

Technical name of the dimension, as you would crawl it from source. This is required for dimension assets and any assets contained within a dimension.

cubeHierarchyName

Technical name of the hierarchy, as you would crawl it from source. This is required for hierarchy assets, and any assets contained within them.

fieldName

Technical name of the field, as you would crawl it from source. This is required for fields.

parentFieldQualifiedName

Path of the parent field in which a subfield is nested, using ~ as a delimiter.
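
As a small illustration of the ~ delimiter, the sketch below derives a nesting level from a parent path. The function name and the convention that a top-level field is level 1 are assumptions; the package may number levels differently.

```python
def field_level(parent_field_path):
    """Nesting level of a field, counting a top-level field as level 1."""
    if not parent_field_path:
        return 1
    # Each ~-separated segment is one ancestor field.
    return len(parent_field_path.split("~")) + 1


top_level = field_level("")
nested = field_level("Year~Quarter")
```

A subfield whose parent path is `Year~Quarter` has two ancestors, so under this convention it sits at level 3.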

displayName

An optional name you can give to the asset to override how it is displayed in the Atlan UI. If present, this will be shown in the UI instead of name.

description

Explanation of the asset, possibly crawled from a source system.

userDescription

Explanation of the asset, as entered or confirmed by a user through the Atlan UI. If present, this will be shown in the UI instead of description.

ownerUsers

Individual users who are owners of the asset. Each user should be separated by a newline within the cell.

ownerGroups

Groups of users who are owners of the asset. Each group should be separated by a newline within the cell.
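
If you generate the input file programmatically, a standard CSV writer will quote cells that contain newlines for you. The following is a minimal sketch using Python's `csv` module; the user names and other values are placeholders.

```python
import csv
import io

columns = ["typeName", "connectionName", "connectorType", "cubeName", "ownerUsers"]

buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=columns)
writer.writeheader()
writer.writerow({
    "typeName": "Cube",
    "connectionName": "Finance",
    "connectorType": "my-connector",
    "cubeName": "Sales",
    # Multiple owners are separated by a newline inside the one cell.
    "ownerUsers": "\n".join(["alice", "bob"]),
})
csv_text = buffer.getvalue()

# Reading the text back recovers the in-cell newline intact.
rows = list(csv.DictReader(io.StringIO(csv_text, newline="")))
owners = rows[0]["ownerUsers"].split("\n")
```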

certificateStatus

Certificate on the asset. Must either be empty or one of:

  • VERIFIED
  • DRAFT
  • DEPRECATED

certificateStatusMessage

An optional message that can be associated with the certificate (only used if certificateStatus is non-empty).

announcementType

Type of announcement on the asset. Must either be empty or one of:

  • information
  • warning
  • issue

announcementTitle

Heading line for the announcement on the asset (only used if announcementType is non-empty).

announcementMessage

An optional detailed message that can be associated with the announcement (only used if announcementType is non-empty).

assignedTerms

Business terms that are assigned to the asset. Each term should be separated by a newline within the cell, and formatted as:

Term Name@@@Glossary Name
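
To make the format concrete, here is a sketch of how such a cell could be split into term and glossary names. The function name is hypothetical and the term names are made up.

```python
def parse_assigned_terms(cell):
    """Split a cell into (term name, glossary name) pairs."""
    terms = []
    for line in cell.splitlines():
        line = line.strip()
        if not line:
            continue
        term, separator, glossary = line.partition("@@@")
        if not separator or not term or not glossary:
            raise ValueError(f"Expected 'Term Name@@@Glossary Name', got {line!r}")
        terms.append((term, glossary))
    return terms


terms = parse_assigned_terms("Revenue@@@Finance\nChurn Rate@@@Metrics")
```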

atlanTags

Atlan tags that are assigned to the asset. Each tag should be separated by a newline within the cell, and formatted as one of:

  • Tag Name, for tags that should be directly assigned and should not be propagated
  • Tag Name>>FULL for tags that should be directly assigned to the asset and propagated down their hierarchy and through lineage
  • Tag Name>>HIERARCHY_ONLY for tags that should be directly assigned to the asset and only be propagated down their hierarchy (not through lineage)
  • Tag Name<<PROPAGATED for tags that have been propagated to the asset.

    Propagated tags will be ignored on import

    Any tag marked propagated (Tag Name<<PROPAGATED) will be ignored by an import. Only those tags that are directly applied will be imported, though of course any tags applied up-hierarchy or upstream that are marked to propagate will still propagate accordingly.
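
The four tag formats can be sketched as a small parser that keeps directly assigned tags and skips propagated ones. The function name and the `NONE` label for non-propagating tags are hypothetical, used here only for illustration.

```python
def parse_atlan_tags(cell):
    """Return (tag name, propagation mode) pairs for directly assigned tags."""
    tags = []
    for line in cell.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.endswith("<<PROPAGATED"):
            # Propagated tags are ignored on import.
            continue
        name, separator, mode = line.partition(">>")
        if not separator:
            tags.append((name, "NONE"))  # directly assigned, not propagated
        elif mode in ("FULL", "HIERARCHY_ONLY"):
            tags.append((name, mode))
        else:
            raise ValueError(f"Unrecognized propagation mode in {line!r}")
    return tags


cell = "PII\nConfidential>>FULL\nInternal>>HIERARCHY_ONLY\nSensitive<<PROPAGATED"
tags = parse_atlan_tags(cell)
```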

links

List of resources (links) assigned to the asset. Each link should be separated by a newline within the cell, and formatted as embedded JSON:

{"name":"linkName","link":"https://www.example.com"}
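
A cell in this format can be read one line at a time as JSON. The sketch below assumes each link occupies exactly one line; the function name and the second link are hypothetical.

```python
import json


def parse_links(cell):
    """Return (name, URL) pairs from a cell of newline-separated JSON objects."""
    links = []
    for line in cell.splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        links.append((entry["name"], entry["link"]))
    return links


cell = (
    '{"name":"linkName","link":"https://www.example.com"}\n'
    '{"name":"Runbook","link":"https://www.example.com/runbook"}'
)
links = parse_links(cell)
```

Note that inside a CSV file the double quotes in the JSON must themselves be escaped by doubling them, which a standard CSV writer does automatically.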

readme

Richly-formatted, detailed documentation for the asset. This should be an HTML-formatted string containing everything that would be inside <body></body>, without the <body></body> wrapping.

starredDetailsList

Details about users who have starred the asset. Each starred asset detail entry should be separated by a newline within the cell, and formatted as embedded JSON:

{"assetStarredBy":"someone","assetStarredAt":1698769268966}
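
The following sketch reads such a cell and converts the timestamp to a date. It assumes that `assetStarredAt` is expressed in milliseconds since the Unix epoch, which fits the magnitude of the sample value but is not stated above. The function name is hypothetical.

```python
import json
from datetime import datetime, timezone


def parse_starred_details(cell):
    """Return (user, starred-at datetime in UTC) pairs from a cell."""
    details = []
    for line in cell.splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        # Assumption: the timestamp is in epoch milliseconds.
        starred_at = datetime.fromtimestamp(
            entry["assetStarredAt"] / 1000, tz=timezone.utc
        )
        details.append((entry["assetStarredBy"], starred_at))
    return details


cell = '{"assetStarredBy":"someone","assetStarredAt":1698769268966}'
details = parse_starred_details(cell)
```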

{{CM}}::{{Attribute}}

Values for any custom metadata. You simply need to name the column after the custom metadata and attribute name. For example, if the custom metadata is RACI and the attribute is Responsible, the column name must be RACI::Responsible. The values for the column follow the same rules as above: for multiple values use an in-cell newline to delimit them.
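
As an illustration of the column naming rule, the sketch below picks out custom metadata columns from a header row and splits each into its custom metadata name and attribute name. The function name is hypothetical.

```python
def custom_metadata_columns(header):
    """Map each 'CM::Attribute' column to its (CM name, attribute name) pair."""
    mapping = {}
    for column in header:
        cm_name, separator, attribute = column.partition("::")
        if separator and cm_name and attribute:
            mapping[column] = (cm_name, attribute)
    return mapping


header = ["typeName", "connectionName", "RACI::Responsible", "RACI::Accountable"]
mapping = custom_metadata_columns(header)
```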

How it works

The package reads from the CSV file and creates a number of parallel batches for submitting the updates in several passes:

  1. The first pass preprocesses certain information in the CSV file, for example: the parent paths of fields (to determine their level), counts of child objects (fields per hierarchy, hierarchies per dimension, etc.), and so on.
  2. The second pass will create and update assets themselves, noting any related assets (like links and READMEs) that may also need to be updated, created, or deleted.
  3. The third pass will load the related assets and process any deletions of related assets.
  4. Finally, if configured to run as a Full replacement for delta handling, the package will compare the previous input file loaded for this cube with the latest input file provided, to determine which assets to delete.
  5. It will then delete any such assets.
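
The child counts gathered in the first pass can be sketched with simple counters. This is only an illustration of the idea, not the package's implementation; the type names `CubeHierarchy` and `CubeField` and the function name are assumptions.

```python
from collections import Counter


def count_children(rows):
    """Count hierarchies per dimension and fields per hierarchy."""
    hierarchies_per_dimension = Counter()
    fields_per_hierarchy = Counter()
    for row in rows:
        dimension = (row.get("cubeName"), row.get("cubeDimensionName"))
        # Assumption: these typeName values identify hierarchies and fields.
        if row.get("typeName") == "CubeHierarchy":
            hierarchies_per_dimension[dimension] += 1
        elif row.get("typeName") == "CubeField":
            hierarchy = dimension + (row.get("cubeHierarchyName"),)
            fields_per_hierarchy[hierarchy] += 1
    return hierarchies_per_dimension, fields_per_hierarchy


rows = [
    {"typeName": "CubeHierarchy", "cubeName": "Sales",
     "cubeDimensionName": "Time", "cubeHierarchyName": "Calendar"},
    {"typeName": "CubeHierarchy", "cubeName": "Sales",
     "cubeDimensionName": "Time", "cubeHierarchyName": "Fiscal"},
    {"typeName": "CubeField", "cubeName": "Sales", "cubeDimensionName": "Time",
     "cubeHierarchyName": "Calendar", "fieldName": "Year"},
    {"typeName": "CubeField", "cubeName": "Sales", "cubeDimensionName": "Time",
     "cubeHierarchyName": "Calendar", "fieldName": "Quarter"},
]
hierarchy_counts, field_counts = count_children(rows)
```

Counting up front means the totals are known before any asset is saved, so each parent can be written with its child count in the second pass.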