
BigID Crawler

The BigID Crawler package crawls BigID for a specified set of data sources and imports classification and policy-related metadata into assets in Atlan in the form of tags, custom metadata and announcements.

Tags associated with assets in BigID will be added as sourceTags to the associated asset in Atlan.

Attributes associated with an asset in BigID are concatenated into a comma-separated list and stored in the specified property of the specified custom metadata set. That custom metadata property is then added to the associated asset in Atlan.

If policy violations are found on an asset in BigID, an Issue announcement is created on the associated asset in Atlan. The announcement title is BigID Policy Violation(s) Detected, and its body lists the policy violations.

Configuration

Credential

Workflow Name

Enter a unique name to help recognize and manage the custom workflow.

Host FQDN

The fully qualified domain name of the BigID host to be crawled. This field is required.

Personal Access Token Value

A personal access token for the specified BigID host. This field is required. See BigID token creation below for how to create a personal access token.

SSL Certificate

Enter the SSL certificate if BigID uses one.

Connection

Connection Name

The name of the connection to associate with the source tags the workflow creates. This field is required.

Connection Admins

The users that will be administrators of the connection. This field is required.

Mapping

Use this section to map an Atlan connection to one or more BigID data sources. You can specify up to three mappings. Each Atlan connection and each BigID data source should be used in only one mapping.

Atlan Connection

This field lists the available Atlan connections. Select the one that contains assets associated with the BigID datasources. This field is required.

BigID Datasources

This drop-down lists the data sources available on the BigID host specified earlier. Select at least one data source. This field is required.
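The mapping rules above (at most three mappings, each Atlan connection and each BigID data source used once, at least one data source per mapping) can be checked before running the workflow. The following is a minimal sketch, assuming the mappings are held as a hypothetical list of (Atlan connection qualifiedName, BigID data source names) pairs; it is not how the workflow itself validates input.

```python
from typing import Dict, List, Set, Tuple

MAX_MAPPINGS = 3  # limit stated in the Mapping section


def validate_mappings(mappings: List[Tuple[str, List[str]]]) -> List[str]:
    """Return human-readable problems with (Atlan connection, BigID data sources) pairs."""
    problems: List[str] = []
    if len(mappings) > MAX_MAPPINGS:
        problems.append(f"at most {MAX_MAPPINGS} mappings are allowed, got {len(mappings)}")

    seen_connections: Set[str] = set()
    source_owner: Dict[str, str] = {}  # BigID data source -> connection that first used it
    for connection, sources in mappings:
        if connection in seen_connections:
            problems.append(f"connection {connection!r} is used in more than one mapping")
        seen_connections.add(connection)

        if not sources:
            problems.append(f"connection {connection!r} has no BigID data source selected")
        for source in sources:
            # setdefault returns the existing owner if this source was already mapped
            owner = source_owner.setdefault(source, connection)
            if owner != connection:
                problems.append(f"data source {source!r} is mapped to both {owner!r} and {connection!r}")
    return problems
```

An empty result means the mappings satisfy the documented constraints; each string describes one violation.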

Configuration

Attribute Custom Metadata

Enter the name of the custom metadata set that will be used to store a list of BigID attributes. The custom metadata set must already exist in Atlan. This field is required.

Attribute Custom Metadata Property

Enter the name of the property, within the specified custom metadata set, that will store the list of BigID attributes. The property must already exist in Atlan. This field is required.
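As a compact summary of the fields described so far, the sketch below collects them in a hypothetical dataclass and reports which required fields are empty. The class and field names are illustrative only, and the check does not confirm that the custom metadata set and property already exist in Atlan.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BigIDCrawlerConfig:
    """Hypothetical container for the Credential, Connection and Configuration fields."""
    host_fqdn: str
    token: str
    connection_name: str
    connection_admins: List[str]
    attribute_cm_set: str
    attribute_cm_property: str
    workflow_name: str = ""
    ssl_certificate: Optional[str] = None  # only needed if the BigID host uses one

    def missing_required(self) -> List[str]:
        """Return the names of fields documented as required that are empty."""
        required = {
            "host_fqdn": self.host_fqdn,
            "token": self.token,
            "connection_name": self.connection_name,
            "connection_admins": self.connection_admins,
            "attribute_cm_set": self.attribute_cm_set,
            "attribute_cm_property": self.attribute_cm_property,
        }
        return [name for name, value in required.items() if not value]
```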

BigID token creation

  1. Go to Settings -> Access Management. Select the Roles tab and click on Add New Role.
  2. Enter Role Name as "Atlan Integration" and select "root" as the Scope. Enter an appropriate Role Description as desired.
  3. Under Role Permissions, select the following:
    a. Catalog - See screenshot
    b. Data Sources - Read
    c. Policies - Read
  4. Once done, click on Save.
  5. Select the System Users tab and click on Add New User.
  6. Add the details as needed and click on Connect Roles. Select Atlan Integration from the list of displayed roles. Click on Save.
  7. In the Tokens section, click on Generate to create a token for use during the workflow configuration.

What it does

Extraction

  1. For the specified data sources, get the list of all Catalog objects. For relational data sources, these are typically top-level assets such as tables and views. Object-store data sources are supported as well.
  2. For each object in the list:
    1. Get the details of the Catalog object using the URL-encoded fully qualified name from the object listing. These details include the Tags and Attributes associated with the object itself.
    2. For relational objects, get the details of the Columns associated with Tables/Views. Details thus retrieved will include Tags and Attributes associated with Columns.
  3. Retrieve the list of all violated policies. Iterating over the list, use the policyName to get the list of all Catalog objects in violation of the given policy across the specified datasources.
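The extraction steps above can be sketched as a loop over a client object. This is a minimal sketch under stated assumptions: `BigIDCatalog` and its method names are a hypothetical interface, not BigID API calls, and the `fullyQualifiedName` and `relational` keys in the object listing are assumed names. The only concrete technique shown is URL-encoding the fully qualified name with `urllib.parse.quote`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol
from urllib.parse import quote


class BigIDCatalog(Protocol):
    """Hypothetical client interface; these method names are not BigID API calls."""

    def list_objects(self, datasource: str) -> List[dict]: ...
    def get_object(self, encoded_fqn: str) -> dict: ...
    def get_columns(self, encoded_fqn: str) -> List[dict]: ...
    def list_violated_policies(self) -> List[str]: ...
    def objects_violating(self, policy_name: str, datasources: List[str]) -> List[str]: ...


@dataclass
class Extraction:
    objects: Dict[str, dict] = field(default_factory=dict)          # fqn -> object details
    columns: Dict[str, List[dict]] = field(default_factory=dict)    # fqn -> column details (relational only)
    violations: Dict[str, List[str]] = field(default_factory=dict)  # fqn -> violated policy names


def extract(client: BigIDCatalog, datasources: List[str]) -> Extraction:
    result = Extraction()
    # Steps 1-2: list Catalog objects per data source, then fetch details for each one.
    for datasource in datasources:
        for obj in client.list_objects(datasource):
            fqn = obj["fullyQualifiedName"]  # assumed key name
            encoded = quote(fqn, safe="")    # URL-encode, including any '/' in object-store paths
            result.objects[fqn] = client.get_object(encoded)
            if obj.get("relational"):        # assumed flag marking tables and views
                result.columns[fqn] = client.get_columns(encoded)
    # Step 3: for each violated policy, record which objects violate it.
    for policy_name in client.list_violated_policies():
        for fqn in client.objects_violating(policy_name, datasources):
            result.violations.setdefault(fqn, []).append(policy_name)
    return result
```

Keeping the client behind a small interface makes the loop easy to exercise with an in-memory stub before wiring it to a real BigID host.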

Processing

Map Assets

  1. Look up the Atlan assets that match the Catalog objects, using the fully qualified name (which, for object-store assets, maps to the path).
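One way to picture the lookup is to rewrite each BigID name into a candidate Atlan qualifiedName and test it against the known Atlan assets. This is a minimal sketch, assuming that a BigID relational name starts with the data source name followed by `.`, and that Atlan path segments are separated by `/`. It follows the rule given under assets.csv (the data source name is replaced by the connection qualifiedName). Object-store paths would need different splitting and are not handled here. The function names and the example connection qualifiedName are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set


def to_atlan_qualified_name(bigid_fqn: str, ds_to_connection: Dict[str, str], sep: str = ".") -> Optional[str]:
    """Swap the leading BigID data source name for the mapped Atlan connection qualifiedName."""
    datasource, _, rest = bigid_fqn.partition(sep)
    connection_qn = ds_to_connection.get(datasource)
    if connection_qn is None or not rest:
        return None  # data source not mapped, or nothing after the data source name
    return "/".join([connection_qn, *rest.split(sep)])  # assumed '/' separator in Atlan


def match_assets(bigid_fqns: Iterable[str], ds_to_connection: Dict[str, str], atlan_qns: Set[str]) -> Dict[str, str]:
    """Return BigID fqn -> Atlan qualifiedName for names that resolve to an existing Atlan asset."""
    matches: Dict[str, str] = {}
    for fqn in bigid_fqns:
        qn = to_atlan_qualified_name(fqn, ds_to_connection)
        if qn is not None and qn in atlan_qns:
            matches[fqn] = qn
    return matches
```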

Tags

  1. Identify the list of distinct Tags across all Catalog Objects and Columns (for relational datasources).
  2. Retrieve the list of existing BigID Tags across the assets in the mapped Connections.
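Collecting the distinct tags is a set union over objects and columns. The sketch below assumes each object or column record carries a `tags` list of tag-name strings (an assumed shape). Comparing the result with the existing tags to find the ones that still need creating is also an assumption about how the two lists are used; this page does not state it.

```python
from itertools import chain
from typing import Iterable, List, Set


def distinct_tags(objects: Iterable[dict], columns: Iterable[dict]) -> Set[str]:
    """Collect every distinct tag name on Catalog objects and (for relational sources) columns."""
    return {tag for item in chain(objects, columns) for tag in item.get("tags", [])}


def tags_to_create(found: Set[str], existing: Set[str]) -> List[str]:
    """Tags found in BigID that are not yet present in the mapped connections, sorted for stable output."""
    return sorted(found - existing)
```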

Attributes

  1. Create a list of the BigID attributes for each object or column.
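The attribute list ends up as a single comma-separated value in the chosen custom metadata property. The sketch below builds that value for one object or column. The `{set: {property: value}}` shape, the `", "` separator, and the removal of blanks and duplicates are assumptions made for illustration.

```python
from typing import Dict, List


def attribute_custom_metadata(attributes: List[str], cm_set: str, cm_property: str) -> Dict[str, Dict[str, str]]:
    """Build {custom metadata set: {property: "a, b"}} for one object or column."""
    # Assumption: drop empty names and duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(a for a in attributes if a))
    return {cm_set: {cm_property: ", ".join(unique)}}
```

For example, `attribute_custom_metadata(["email", "ssn"], "BigID", "Attributes")` yields `{"BigID": {"Attributes": "email, ssn"}}`.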

Policies

  1. For each asset with one or more active policy violations, create an announcement of type Issue on that asset.
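The announcement fields for one asset depend only on whether any policies were violated. The sketch below uses the title and the column names given under assets.csv. Formatting the message as one `- policy` line per violation is an assumption, because this page only says the body contains a list of the violations.

```python
from typing import Dict, List

ANNOUNCEMENT_TITLE = "BigID Policy Violation(s) Detected"  # title as given on this page


def announcement_fields(violated_policies: List[str]) -> Dict[str, str]:
    """Return the announcement columns for one asset; all empty when nothing is violated."""
    if not violated_policies:
        return {"announcementType": "", "announcementTitle": "", "announcementMessage": ""}
    # Assumed message format: one "- <policy name>" line per violation.
    message = "\n".join(f"- {policy}" for policy in violated_policies)
    return {
        "announcementType": "Issue",
        "announcementTitle": ANNOUNCEMENT_TITLE,
        "announcementMessage": message,
    }
```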

Output

Two CSV files are produced, which are then imported into Atlan using the Atlan asset import workflow.

tags.csv

This file contains all the tags found in the processing step. The format of the file can be found here.

assets.csv

This file follows the asset import format documented here. For each object, table, or column found in the BigID catalog(s), a record is produced with the following information:

  1. qualifiedName - produced by replacing the catalog name (the BigID data source name) with the qualifiedName of the mapped Atlan connection.
  2. typeName - the appropriate typeName for the type of object found.
  3. name - the name of the object.
  4. attributes - the list of attributes associated with the object in BigID.
  5. tags - a list of the tags associated with the object in BigID.
  6. announcementType - Issue if policy violations were found for the object in BigID; otherwise empty.
  7. announcementTitle - BigID Policy Violation(s) Detected if policy violations were found for the object in BigID; otherwise empty.
  8. announcementMessage - a list of the policy violations if any were found for the object in BigID; otherwise empty.
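A record set with these columns can be written with Python's standard `csv.DictWriter`. This is a minimal sketch: the header uses the column names exactly as listed above, which should be checked against the asset import format before use. How multiple tags or attributes are packed into one cell is left to the caller.

```python
import csv
import io
from typing import Dict, Iterable

# Column names as listed in this section; verify them against the asset import format.
ASSET_COLUMNS = [
    "qualifiedName", "typeName", "name", "attributes", "tags",
    "announcementType", "announcementTitle", "announcementMessage",
]


def assets_csv_text(rows: Iterable[Dict[str, str]]) -> str:
    """Render asset records as CSV text; missing columns are written as empty cells."""
    buffer = io.StringIO()
    # restval fills absent keys with ""; unknown keys raise ValueError (extrasaction defaults to "raise").
    writer = csv.DictWriter(buffer, fieldnames=ASSET_COLUMNS, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Leaving the announcement columns out of a row produces the empty cells this section describes for assets without policy violations.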

Input

The tags.csv and assets.csv will then be imported via the asset-import workflow.