The Lake Formation tag sync package adds Lake Formation tags as custom metadata properties on assets.

Use cases

  • Updates custom metadata in Atlan according to AWS Lake Formation tag output that has been exported into one or more files (iftag_association_xxxx.json).

Configuration

Source

  • Import lake tags from: indicates from where the tag and mapping files will be retrieved.

    Retrieve the tag and mapping files from cloud object storage.

    • Cloud object store: the object store from which to fetch the metadata file(s).

      For S3:

      • AWS access key: your AWS access key.
      • AWS secret key: your AWS secret key.
      • Region: your AWS region.
      • Bucket: your AWS bucket.

      Reusing Atlan's backing S3 store

      When your Atlan tenant is deployed in AWS, you can leave all of these blank to reuse the backing store of Atlan itself. You can also set up a cross-account bucket policy for Atlan to have access to your S3 bucket and the objects within it, and leave these blank.

      For GCS:

      • Project ID: the ID of your GCP project.
      • Service account JSON: your service account credentials, as JSON.
      • Bucket: your GCS bucket.

      Reusing Atlan's backing GCS store

      When your Atlan tenant is deployed in GCP, you can leave all of these blank to reuse the backing store of Atlan itself.

      For ADLS:

      • Azure client ID: the unique application (client) ID assigned to your app by Azure AD when the app was registered.
      • Azure client secret: your Azure client secret (its actual value, not its identifier).
      • Azure tenant ID: the unique identifier of your Azure Active Directory instance.
      • Storage account name: the name of your storage account.
      • Container: your ADLS container.

      Reusing Atlan's backing ADLS store

      When your Atlan tenant is deployed in Azure, you can leave all of these blank to reuse the backing store of Atlan itself.

  • Prefix (path): directory (path) within the object store from which to retrieve the files containing the data.

  • Options: optional settings to optimize how assets are loaded.

    By default, these settings will be applied:

    • All blank fields in the input file will be ignored.
    • Any invalid value in a field will cause the import to fail rather than proceeding.
    • Assets will be matched case-sensitively.
    • Type names in the input file will be strictly adhered to.
    • Comma (,) is the expected field separator.
    • A maximum of 20 records will be processed per underlying API request.

    The following settings can be adjusted:

    • Fail on errors: whether an invalid value in a field should cause the import to fail (Yes) or log a warning, skip that value, and proceed (No).
    • Batch size: maximum number of records to attempt to process per underlying API request.

What it does

It processes the files retrieved from object storage, applying custom metadata properties to assets based on the information found in one or more tag association files. The following three types of files are expected.

  1. The tag association files: JSON files in the format documented in the AWS CLI reference docs. The name of each tag association file must start with iftag_association. At least one tag association file is required.
  2. The connection mapping file, which must be named connection_map.json. This is a simple dictionary stored in a JSON file. It specifies the mapping between the DatabaseName values that appear in the tag association file(s) and the database connections in Atlan. The value of DatabaseName found in the tag association file will be split at the first underscore character; the value to the left of the underscore will be used as the key to look up the value in connection_map.json. The value in connection_map.json should be the fully qualified connection name in Atlan (see the sketch after this list).
  3. The metadata mapping file, which must be named metadata_map.json. This is a simple dictionary stored in a JSON file. Each key in the dictionary is a value associated with a TagKey in the tag association file. Each value in the dictionary is the human-readable custom metadata set name, followed by a double colon (::) and the human-readable property name.
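
To make the connection lookup concrete, here is a minimal Python sketch of the split-and-lookup behavior described above. The function name resolve_table_name and the logging details are illustrative assumptions, not the package's actual internals:

import json
import logging

logger = logging.getLogger(__name__)

def resolve_table_name(database_name: str, table_name: str,
                       connection_map: dict) -> str | None:
    # Split DatabaseName at the first underscore: the part before it is the
    # key into connection_map, the part after it is the schema name.
    prefix, _, schema = database_name.partition("_")
    connection = connection_map.get(prefix)
    if connection is None:
        # Tables whose prefix has no connection mapping are skipped with a warning
        logger.warning("No connection mapping for '%s'; skipping", prefix)
        return None
    return f"{connection}/{schema}/{table_name}"

with open("connection_map.json") as f:
    connection_map = json.load(f)

# "dev_sch" -> connection key "dev", schema "sch"
print(resolve_table_name("dev_sch", "tbl1", connection_map))
# default/minisql/1719861283/db_test/sch/tbl1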

If a TagKey refers to a custom metadata property of type options, any missing options specified under TagValues will be created automatically.
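
As a rough illustration of that option-gathering step (collect_tag_values is a hypothetical name; the package's internals may differ), the values could be gathered like this:

from collections import defaultdict

def collect_tag_values(table_list: list[dict]) -> dict[str, set[str]]:
    # Gather every TagValue seen for each TagKey across the database-,
    # table-, and column-level tags in one tag association file.
    values: dict[str, set[str]] = defaultdict(set)
    for entry in table_list:
        tags = list(entry.get("LFTagOnDatabase", []))
        tags += entry.get("LFTagsOnTable", [])
        for column in entry.get("LFTagsOnColumns", []):
            tags += column.get("LFTags", [])
        for tag in tags:
            values[tag["TagKey"]].update(tag["TagValues"])
    return values

# For the example file below this would yield
# {"security_classification": {"public", "private"}}. Any of these values not
# already defined as an option on the mapped options-type property would then
# be created before the import runs.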

Missing Keys

For any Table entry with a DatabaseName for which an entry cannot be found in the connection map, the entry will be ignored and a warning logged. Likewise, for any TagKey for which an entry cannot be found in the metadata mapping file, the entry will be ignored and a warning logged.

How it works

For example, given the following file(s):

iftag_association_1.json
 {
  "TableList": [
    {
      "Table": {
        "CatalogId": "614518280298",
        "DatabaseName": "dev_sch",
        "Name": "tbl1"
      },
      "LFTagOnDatabase": [
        {
          "CatalogId": "614518280298",
          "TagKey": "security_classification",
          "TagValues": [
            "public"
          ]
        }
      ],
      "LFTagsOnTable": [
        {
          "CatalogId": "614518280298",
          "TagKey": "security_classification",
          "TagValues": [
            "public"
          ]
        }
      ],
      "LFTagsOnColumns": [
        {
          "Name": "col1",
          "LFTags": [
            {
              "CatalogId": "614518280298",
              "TagKey": "security_classification",
              "TagValues": [
                "private"
              ]
            }
          ]
        }
      ]
    }
  ]
} 
and the following connection mapping file:
connection_map.json
{"dev":"default/minisql/1719861283/db_test"}
and the following metadata map file:
metadata_map.json
{
  "security_classification": "Classifications::Security Classification"
}
After loading connection_map.json and metadata_map.json, the program starts processing the tag association files. Each tag association file is loaded in turn, and the program first creates any missing options. It does this by gathering all the TagValues associated with each TagKey, looking up the TagKey in the metadata mapping file, and checking whether the associated custom metadata property is of type options. If it is, the program verifies that all the TagValues specified are defined options; any that are missing will be added.

For each tag association file processed, a CSV file containing the custom metadata properties to be associated with each table or column will be produced, in the same CSV format produced by an export. The CSV file will then be imported using the normal import logic.

For example, given the three files shown above, each entry in the TableList of iftag_association_1.json will be processed. In this example there is only one entry, but in practice there would be many more. The first entry is for the table with the DatabaseName dev_sch. dev_sch will be split at the underscore, giving the two strings dev and sch. The string dev will be used as the key for the connection lookup, and the string sch will be used as the schema name. Looking up the key dev in the connection map returns the value default/minisql/1719861283/db_test. The schema is appended to this value, along with the table name tbl1 found in the tag association file, resulting in the fully qualified name of the table: default/minisql/1719861283/db_test/sch/tbl1.

Under LFTagsOnTable there is only one TagKey entry; in reality there could be more. The value of TagKey is security_classification. In the custom metadata map, the key security_classification is associated with Classifications::Security Classification. The program will use the values specified under TagValues as the value of this property.

A row will be created in the CSV file for each table in the tag association file, containing the fully qualified table name along with values for all the TagKey entries associated with the table. A row will also be created in the CSV for each column listed under LFTagsOnColumns, following the same logic for determining the values of the custom metadata properties.
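
To tie the walkthrough together, here is a hedged end-to-end sketch of how one TableList entry becomes rows for the CSV. The function rows_for_entry, the dictionary row shape, and the column qualified-name convention (table qualified name followed by /column name) are assumptions for illustration; the real CSV layout follows the format produced by an asset export:

import json

def rows_for_entry(entry: dict, connection_map: dict, metadata_map: dict):
    # Build one row per table and per tagged column; rows are shown here as
    # dictionaries rather than the actual export-format CSV columns.
    table = entry["Table"]
    prefix, _, schema = table["DatabaseName"].partition("_")
    table_qn = f"{connection_map[prefix]}/{schema}/{table['Name']}"

    def properties(tags):
        # Map each TagKey through metadata_map to its "Set::Property" name,
        # pairing it with the TagValues; unmapped keys are skipped.
        return {metadata_map[t["TagKey"]]: t["TagValues"]
                for t in tags if t["TagKey"] in metadata_map}

    yield {"qualifiedName": table_qn, **properties(entry.get("LFTagsOnTable", []))}
    for column in entry.get("LFTagsOnColumns", []):
        # Assumed column naming convention: <table qualified name>/<column name>
        yield {"qualifiedName": f"{table_qn}/{column['Name']}",
               **properties(column.get("LFTags", []))}

with open("iftag_association_1.json") as f:
    table_list = json.load(f)["TableList"]
with open("connection_map.json") as f:
    connection_map = json.load(f)
with open("metadata_map.json") as f:
    metadata_map = json.load(f)

for row in rows_for_entry(table_list[0], connection_map, metadata_map):
    print(row)
# {'qualifiedName': 'default/minisql/1719861283/db_test/sch/tbl1',
#  'Classifications::Security Classification': ['public']}
# {'qualifiedName': 'default/minisql/1719861283/db_test/sch/tbl1/col1',
#  'Classifications::Security Classification': ['private']}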