
Azure Data Lake Storage crawler

The Azure Data Lake Storage crawler fetches assets from Azure Data Lake Storage and publishes them to Atlan for discovery. It crawls the following assets:

  • Account
  • Container
  • Object

Configuration

Credentials

  • Azure Client ID: the unique application (client) ID that Azure AD assigned to your app when it was registered.
  • Azure Client Secret: the client secret generated for that registered app.
  • Azure Tenant ID: the unique identifier (directory ID) of the Azure Active Directory tenant in which the app is registered.
  • Storage Account Name: the name of the Azure storage account to crawl.
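The four credential fields fit together as an Azure AD service-principal login scoped to one storage account. The sketch below is a hypothetical holder for them (the class and method names are illustrative, not part of the package) that checks for blanks, applies Azure's storage account naming rule (3 to 24 characters, lowercase letters and digits only), and builds the account endpoint. It assumes the standard ADLS Gen2 DFS endpoint; the crawler may use a different one.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class AdlsCredentials:
    """Hypothetical container for the four credential fields described above."""

    client_id: str  # Azure Client ID (application ID from the app registration)
    client_secret: str  # Azure Client Secret
    tenant_id: str  # Azure Tenant ID (Azure AD directory ID)
    storage_account_name: str  # Storage Account Name

    def validate(self) -> None:
        # Every field is required, so fail fast on blank values.
        for field_name, value in vars(self).items():
            if not value.strip():
                raise ValueError(f"{field_name} must not be empty")
        # Azure storage account names: 3-24 characters, lowercase letters and digits only.
        if not re.fullmatch(r"[a-z0-9]{3,24}", self.storage_account_name):
            raise ValueError("storage_account_name is not a valid Azure storage account name")

    def account_url(self) -> str:
        # Assumption: the standard ADLS Gen2 (DFS) endpoint for the account.
        return f"https://{self.storage_account_name}.dfs.core.windows.net"


# With the Azure SDK for Python installed, such values are typically used like this:
#   from azure.identity import ClientSecretCredential
#   from azure.storage.filedatalake import DataLakeServiceClient
#   cred = ClientSecretCredential(creds.tenant_id, creds.client_id, creds.client_secret)
#   service = DataLakeServiceClient(creds.account_url(), credential=cred)
```

Validating locally before a run surfaces typos in the account name early, rather than as an authentication or DNS failure during the crawl.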

Metadata

  • Container prefix: publish to Atlan only the containers whose names start with this prefix. Leave empty to publish all containers.
  • Object prefix: publish to Atlan only the objects whose names start with this prefix. Leave empty to publish all objects.
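The two prefix filters can be pictured as a simple name-prefix match in which an empty prefix keeps everything. This is a minimal sketch, assuming case-sensitive matching, that the object prefix is compared against the object's full path inside its container, and that a container is still published even when none of its objects match; the function name and the dict-of-lists input shape are hypothetical.

```python
from typing import Dict, List


def filter_assets(
    containers: Dict[str, List[str]],
    container_prefix: str = "",
    object_prefix: str = "",
) -> Dict[str, List[str]]:
    """Keep containers and objects whose names start with the given prefixes.

    `containers` maps a container name to the object paths it holds
    (a hypothetical shape used only for illustration).
    """
    result: Dict[str, List[str]] = {}
    for container, objects in containers.items():
        # str.startswith("") is always True, so an empty prefix keeps everything.
        if not container.startswith(container_prefix):
            continue
        result[container] = [obj for obj in objects if obj.startswith(object_prefix)]
    return result
```

For example, with containers "raw-sales" and "curated", a container prefix of "raw" keeps only "raw-sales", while an object prefix of "2024/" keeps only objects under that path in every container.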

Configurations

  • Connection: name of the connection that will be created in Atlan.

Warning

The connection name must be unique across all Azure Data Lake Storage connections.

What it does

The package performs the following steps:

  • Create a connection in Atlan. If the connection already exists, this step is skipped.
  • Fetch the list of containers in the storage account.
  • For each container, fetch the list of objects.
  • Publish containers and objects into Atlan.
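The four steps above can be sketched as a small pipeline. This is an illustration only, not the package's implementation: the `run_crawl` function, the set of existing connection names, and the three callables standing in for the Azure listing calls and the Atlan publish call are all hypothetical.

```python
from typing import Callable, Dict, List, Set


def run_crawl(
    connection_name: str,
    existing_connections: Set[str],
    list_containers: Callable[[], List[str]],
    list_objects: Callable[[str], List[str]],
    publish: Callable[[str, Dict[str, List[str]]], None],
) -> bool:
    """Run the crawl steps; return True if a new connection was created."""
    # Step 1: create the connection unless one with this name already exists.
    created = connection_name not in existing_connections
    if created:
        existing_connections.add(connection_name)
    # Step 2: fetch the containers in the storage account.
    containers = list_containers()
    # Step 3: for each container, fetch its objects.
    assets = {container: list_objects(container) for container in containers}
    # Step 4: publish containers and objects under the connection.
    publish(connection_name, assets)
    return created
```

Because step 1 only checks the connection name, running the package twice with the same name reuses the existing connection, which is why the name must be unique across Azure Data Lake Storage connections.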

Warning

Containers and objects that are deleted or archived in Azure Data Lake Storage are automatically archived in Atlan as well.
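One way to picture this behaviour is a set difference between the assets published by the previous run and those found by the current crawl. This is a minimal sketch, assuming each asset is identified by a hypothetical "connection/container/object" name; it is not a description of how Atlan actually tracks or archives assets.

```python
from typing import Dict, List, Set


def qualified_names(connection: str, assets: Dict[str, List[str]]) -> Set[str]:
    """Build hypothetical unique names for every container and object."""
    names: Set[str] = set()
    for container, objects in assets.items():
        names.add(f"{connection}/{container}")
        names.update(f"{connection}/{container}/{obj}" for obj in objects)
    return names


def assets_to_archive(previous: Set[str], current: Set[str]) -> Set[str]:
    # Anything published by an earlier run but missing from this crawl
    # is treated as removed upstream and becomes a candidate for archiving.
    return previous - current
```

For example, if a previous run published container "c2" and object "c1/b.csv" and the current crawl no longer finds them, both (plus every object under "c2") end up in the archive set.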