BigID Crawler
The BigID Crawler package crawls BigID for a specified set of data sources and imports their classification and policy-related metadata into the corresponding Atlan assets as tags, custom metadata, and announcements.
Tags associated with assets in BigID are added as sourceTags to the associated asset in Atlan.
Attributes associated with assets in BigID are concatenated into a comma-separated list and stored in the specified property of the configured custom metadata set on the associated asset in Atlan.
If policy violations are found on an asset in BigID, an Issue announcement is created on the associated asset in Atlan. The title of the announcement is "BigID Policy Violation(s) Detected", and the body contains a list of the policy violations.
Configuration
Credential
Workflow Name
Enter a unique name to help recognize and manage the custom workflow.
Host FQDN
The fully qualified domain name of the BigID host to be crawled. This field is required.
Personal Access Token Value
A personal access token that can be used to access the specified BigID host. This field is required. (See BigID token creation for how to create a personal access token.)
SSL Certificate
If the BigID host uses an SSL certificate, enter it here.
Connection
Connection Name
The name of the Atlan connection that the created source tags will be associated with. This field is required.
Connection Admins
The users who will be administrators of the connection. This field is required.
Mapping
This section maps Atlan connections to BigID data sources. Each mapping pairs one Atlan connection with one or more BigID data sources, and up to three mappings can be specified. Each Atlan connection and each BigID data source should be used in only one mapping.
Atlan Connection
This field lists the available Atlan connections. Select the connection that contains the assets corresponding to the selected BigID data sources. This field is required.
BigID Datasources
A drop-down list of the data sources available on the BigID host specified in the credential. Select at least one data source. This field is required.
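The mapping rules above (at most three mappings, each Atlan connection and each BigID data source used in only one mapping, at least one data source per mapping) can be checked with a small validator. This is a minimal sketch for illustration: the `Mapping` class and `validate_mappings` function are hypothetical names, not part of the package.

```python
from dataclasses import dataclass

MAX_MAPPINGS = 3  # limit stated in the Mapping section


@dataclass(frozen=True)
class Mapping:
    """Hypothetical model of one mapping: an Atlan connection to BigID data sources."""
    atlan_connection: str               # Atlan connection qualifiedName
    bigid_datasources: tuple[str, ...]  # BigID data source names


def validate_mappings(mappings: list[Mapping]) -> list[str]:
    """Return a list of problems; an empty list means the mappings are valid."""
    problems: list[str] = []
    if len(mappings) > MAX_MAPPINGS:
        problems.append(f"at most {MAX_MAPPINGS} mappings are allowed, got {len(mappings)}")
    seen_connections: set[str] = set()
    seen_datasources: set[str] = set()
    for m in mappings:
        if not m.bigid_datasources:
            problems.append(f"{m.atlan_connection}: select at least one BigID data source")
        if m.atlan_connection in seen_connections:
            problems.append(f"connection used more than once: {m.atlan_connection}")
        seen_connections.add(m.atlan_connection)
        for ds in m.bigid_datasources:
            if ds in seen_datasources:
                problems.append(f"data source used more than once: {ds}")
            seen_datasources.add(ds)
    return problems
```

Returning a list of messages rather than raising on the first error lets every problem in the form be reported at once.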
Configuration
Attribute Custom Metadata
Enter the name of the custom metadata set that will be used to store the list of BigID attributes. The custom metadata set must already exist in Atlan. This field is required.
Attribute Custom Metadata Property
Enter the name of the property, within the specified custom metadata set, that will be used to store the list of BigID attributes. The custom metadata property must already exist in Atlan. This field is required.
BigID token creation¶
- Go to Settings -> Access Management. Select the Roles tab and click Add New Role.
- Enter "Atlan Integration" as the Role Name and select "root" as the Scope. Enter a Role Description as desired.
- Under Role Permissions, select the following:
  a. Catalog - see screenshot
  b. Data Sources - Read
  c. Policies - Read
- Click Save.
- Select the System Users tab and click Add New User.
- Enter the user details as needed and click Connect Roles. Select Atlan Integration from the list of displayed roles, then click Save.
- In the Tokens section, click Generate to create a token for use during the workflow configuration.
What it does
Extraction
- For the specified data sources, get the list of all Catalog objects. For relational data sources, these are typically top-level assets such as tables and views. Object-store data sources are also supported.
- Iterate over the object list:
  - Get the details of each Catalog object using its URL-encoded fully qualified name from the object listing. These details include the tags and attributes associated with the object itself.
  - For relational objects, get the details of the columns of each table or view. These details include the tags and attributes associated with each column.
- Retrieve the list of all violated policies. For each policy in the list, use its policyName to get all Catalog objects, across the specified data sources, that violate the policy.
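The object and column steps above can be sketched as a nested loop. This is a minimal sketch, not the package's implementation: `CatalogClient` and its methods are a hypothetical wrapper (no real BigID endpoint paths are shown), the `fullyQualifiedName` key is an assumed field name, and which data sources are relational is passed in explicitly. Only the URL encoding uses a real library call, `urllib.parse.quote`.

```python
from typing import Any, Protocol
from urllib.parse import quote


class CatalogClient(Protocol):
    """Hypothetical wrapper around the BigID Catalog API; method names are illustrative only."""

    def list_objects(self, datasource: str) -> list[dict[str, Any]]: ...
    def get_object(self, encoded_fqn: str) -> dict[str, Any]: ...
    def get_columns(self, encoded_fqn: str) -> list[dict[str, Any]]: ...


def encode_fqn(fqn: str) -> str:
    """URL-encode a fully qualified name for a request path ('/' and spaces are escaped)."""
    return quote(fqn, safe="")


def extract(client: CatalogClient, datasources: list[str],
            relational: set[str]) -> list[dict[str, Any]]:
    """Fetch details for every Catalog object, plus column details for relational sources."""
    results: list[dict[str, Any]] = []
    for ds in datasources:
        for obj in client.list_objects(ds):
            # "fullyQualifiedName" is an assumed key name in the listing response.
            encoded = encode_fqn(obj["fullyQualifiedName"])
            details = client.get_object(encoded)  # object-level tags and attributes
            if ds in relational:
                # column-level tags and attributes for tables/views
                details["columns"] = client.get_columns(encoded)
            results.append(details)
    return results
```

Passing `safe=""` matters here: by default `quote` leaves `/` unescaped, which would split an object-store path into several URL path segments.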
Processing
Map Assets
- Match each Catalog object to an Atlan asset using the object's fully qualified name (for object-store assets, this corresponds to the object path).
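One way to picture the matching is to rewrite each BigID fully qualified name into an Atlan qualifiedName, replacing the leading data source name with the mapped connection's qualifiedName, and then look it up among the existing Atlan assets. This sketch assumes the BigID name starts with `<datasource><sep>` and that its remaining parts map one-to-one onto `/`-separated Atlan segments; both functions are hypothetical helpers, and the real naming rules may differ per connector.

```python
def to_atlan_qualified_name(bigid_fqn: str, datasource: str,
                            connection_qn: str, sep: str = ".") -> str:
    """Swap the leading BigID data source name for the Atlan connection qualifiedName.

    Assumes the BigID name looks like '<datasource><sep><part><sep>...' and that the
    remaining parts map one-to-one onto '/'-separated Atlan qualifiedName segments.
    """
    prefix = datasource + sep
    if not bigid_fqn.startswith(prefix):
        raise ValueError(f"{bigid_fqn!r} does not belong to data source {datasource!r}")
    parts = bigid_fqn[len(prefix):].split(sep)
    return "/".join([connection_qn, *parts])


def match_assets(bigid_fqns: list[str], datasource: str, connection_qn: str,
                 atlan_qns: set[str], sep: str = ".") -> dict[str, str]:
    """Return {BigID FQN: Atlan qualifiedName} for objects found in Atlan; others are skipped."""
    matched: dict[str, str] = {}
    for fqn in bigid_fqns:
        qn = to_atlan_qualified_name(fqn, datasource, connection_qn, sep)
        if qn in atlan_qns:
            matched[fqn] = qn
    return matched
```

For object-store data sources the separator would be `/` (the path), for example `sep="/"`. Names whose parts themselves contain the separator would need a different parsing rule.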
Tags
- Identify the distinct tags across all Catalog objects and, for relational data sources, their columns.
- Retrieve the existing BigID tags on the assets in the mapped connections.
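Collecting the distinct tags amounts to a set union over objects and, for relational data sources, their columns. A minimal sketch, assuming each object is a dictionary with a `tags` list of tag names and an optional `columns` list whose entries carry their own `tags` (these key names and the `distinct_tags` function are hypothetical).

```python
from typing import Any, Iterable


def distinct_tags(objects: Iterable[dict[str, Any]]) -> list[str]:
    """Collect the distinct tag names on objects and, where present, their columns."""
    found: set[str] = set()
    for obj in objects:
        found.update(obj.get("tags", []))
        # Only relational objects are assumed to carry a "columns" list.
        for col in obj.get("columns", []):
            found.update(col.get("tags", []))
    return sorted(found)  # sorted for a stable, reproducible order
```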
Attributes
- Build the list of BigID attributes for each object or column.
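The attribute list ends up as a single comma-separated string in the custom metadata property, as described at the start of this page. This sketch assumes attributes arrive as plain strings. The `", "` separator and the removal of blanks and duplicates (keeping first occurrence) are illustrative choices, not documented behavior, and `attributes_value` is a hypothetical name.

```python
def attributes_value(attributes: list[str]) -> str:
    """Join BigID attribute names into one comma-separated custom metadata value."""
    # Strip whitespace, skip blanks, and drop duplicates while keeping first-seen order
    # (dict keys preserve insertion order).
    unique = list(dict.fromkeys(a.strip() for a in attributes if a.strip()))
    return ", ".join(unique)
```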
Policies
- For each asset with one or more active policy violations, create an announcement of type Issue on that asset.
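The policy extraction step yields, per policy, the objects that violate it, whereas an announcement is per asset, so the mapping needs inverting. A minimal sketch, assuming the policy hits arrive as a `{policyName: [object FQN, ...]}` dictionary and that the message lists one policy name per line (the actual message layout is not specified here). Both function names are hypothetical; the title string is the one given in this document.

```python
from collections import defaultdict

ANNOUNCEMENT_TITLE = "BigID Policy Violation(s) Detected"


def violations_by_object(policy_hits: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert {policyName: [object FQNs]} into {object FQN: [policyNames]}."""
    by_object: defaultdict[str, list[str]] = defaultdict(list)
    for policy, fqns in policy_hits.items():
        for fqn in fqns:
            by_object[fqn].append(policy)
    return dict(by_object)


def announcement_for(policies: list[str]) -> dict[str, str]:
    """Return the announcement fields for one asset; all empty if it has no violations."""
    if not policies:
        return {"announcementType": "", "announcementTitle": "", "announcementMessage": ""}
    return {
        "announcementType": "Issue",
        "announcementTitle": ANNOUNCEMENT_TITLE,
        # Assumed message format: one policy name per line, sorted for stable output.
        "announcementMessage": "\n".join(sorted(policies)),
    }
```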
Output
The workflow outputs two CSV files, which are then imported into Atlan using the asset-import workflow.
tags.csv
This file is produced from all the tags found in the processing step. The format of the file can be found here.
assets.csv
This file uses the asset import format documented here. For each object, table, or column found in the BigID catalog(s), one record is produced with the following columns:
1. qualifiedName - produced by replacing the catalog name (the BigID data source name) with the Atlan connection qualifiedName.
2. typeName - the appropriate typeName for the type of object found.
3. name - the name of the object.
4. attributes - the list of attributes associated with the object in BigID.
5. tags - the list of tags associated with the object in BigID.
6. announcementType - Issue if policy violations were found for the object in BigID; otherwise empty.
7. announcementTitle - BigID Policy Violation(s) Detected if policy violations were found for the object in BigID; otherwise empty.
8. announcementMessage - the list of policy violations if any were found for the object in BigID; otherwise empty.
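A minimal sketch of writing assets.csv with Python's standard `csv` module, using the eight columns listed above in that order. The header names the asset-import workflow actually expects, and how multi-valued cells such as tags should be delimited, are assumptions to check against the asset import format documentation; `write_assets_csv` is a hypothetical helper and the sample row is made up.

```python
import csv
import io
from typing import TextIO

# Column order follows the list above; exact header names are an assumption.
COLUMNS = ["qualifiedName", "typeName", "name", "attributes", "tags",
           "announcementType", "announcementTitle", "announcementMessage"]


def write_assets_csv(rows: list[dict[str, str]], out: TextIO) -> None:
    """Write a header plus one record per BigID object, table, or column."""
    # restval="" leaves any column a row does not supply empty, e.g. the
    # announcement columns for objects without policy violations.
    writer = csv.DictWriter(out, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    buf = io.StringIO()
    write_assets_csv([{
        "qualifiedName": "default/snowflake/1700000000/DB/SCHEMA/ORDERS",
        "typeName": "Table",
        "name": "ORDERS",
        "attributes": "email, phone",  # quoted automatically because it contains a comma
        "tags": "PII",
    }], buf)
    print(buf.getvalue())
```

`DictWriter` raises `ValueError` for keys not in `COLUMNS`, which catches misspelled column names early.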
Input
The tags.csv and assets.csv files are then imported into Atlan via the asset-import workflow.