Lake Formation tag sync¶
The Lake Formation tag sync package adds AWS Lake Formation tags as custom metadata properties on assets in Atlan.
Use cases¶
- Updates custom metadata in Atlan according to AWS Lake Formation tag output that has been exported into one or more files (`iftag_association_xxxx.json`).
Configuration¶
Source¶
- Import lake tags from: indicates from where the tag and mapping files will be retrieved.
    - Retrieve the tag and mapping files from cloud object storage.
- Cloud object store: the object store from which to fetch the metadata file(s).
- AWS access key: your AWS access key.
- AWS secret key: your AWS secret key.
- Region: your AWS region.
- Bucket: your AWS bucket.
Reusing Atlan's backing S3 store
When your Atlan tenant is deployed in AWS, you can leave all of these blank to reuse the backing store of Atlan itself. You can also set up a cross-account bucket policy for Atlan to have access to your S3 bucket and the objects within it, and leave these blank.
- Project ID: the ID of your GCP project.
- Service account JSON: your service account credentials, as JSON.
- Bucket: your GCS bucket.
Reusing Atlan's backing GCS store
When your Atlan tenant is deployed in GCP, you can leave all of these blank to reuse the backing store of Atlan itself.
- Azure client ID: the unique application (client) ID assigned to your app by Azure AD when the app was registered.
- Azure client secret: your Azure client secret (its actual value, not its identifier).
- Azure tenant ID: the unique identifier of the Azure Active Directory instance.
- Storage account name: name of your storage account.
- Container: your ADLS container.
Reusing Atlan's backing ADLS store
When your Atlan tenant is deployed in Azure, you can leave all of these blank to reuse the backing store of Atlan itself.
- Prefix (path): directory (path) within the object store from which to retrieve the files containing the data.
- Options: optional settings to optimize how assets are loaded.
By default, these settings will be applied:
- All blank fields in the input file will be ignored.
- Any invalid value in a field will cause the import to fail rather than proceeding.
- Assets will be matched case-sensitively.
- Type names in the input file will be strictly adhered to.
- Comma (`,`) is the expected field separator.
- A maximum of 20 records will be processed per underlying API request.

To change these defaults:

- Fail on errors: whether an invalid value in a field should cause the import to fail (`Yes`) or log a warning, skip that value, and proceed (`No`).
- Batch size: maximum number of records to attempt to process per underlying API request.
What it does¶
It will process the files retrieved from object storage, applying custom metadata properties to assets based on the information found in one or more tag association files. The following three types of files are expected.
- The tag association files: JSON files in the format documented in the AWS CLI reference documentation. The name of each tag association file should start with `iftag_association`. At least one tag association file is required.
- The connection mapping file, which must be named `connection_map.json`. This is a simple dictionary stored in a JSON file. It specifies the mapping between the `DatabaseName` that appears in the tag association file(s) and the database connection in Atlan. The value of `DatabaseName` found in the tag association file will be split at the first underscore character, and the value to the left of the underscore will be used as the key to look up in `connection_map.json`. The value in `connection_map.json` should be the fully qualified connection name in Atlan.
- The metadata mapping file, which must be named `metadata_map.json`. This is a simple dictionary stored in a JSON file. The key in the dictionary is a value associated with a `TagKey` in the tag association file. The value in the dictionary is the human-readable custom metadata set name followed by a double colon and the human-readable name of the desired property, as illustrated in the sketch after this list.
If a `TagKey` refers to a custom metadata property of type `option`, any missing options specified under `TagValues` will be automatically created.
Missing Keys
For any `Table` entry with a `DatabaseName` for which an entry cannot be found in the connection map, the entry will be ignored and a warning logged. Likewise, for any `TagKey` for which an entry cannot be found in the metadata mapping file, the entry will be ignored and a warning logged.
How it works¶
So, for example, given the following file(s):

`iftag_association_1.json`:

```json
{
"TableList": [
{
"Table": {
"CatalogId": "614518280298",
"DatabaseName": "dev_sch",
"Name": "tbl1"
},
"LFTagOnDatabase": [
{
"CatalogId": "614518280298",
"TagKey": "security_classification",
"TagValues": [
"public"
]
}
],
"LFTagsOnTable": [
{
"CatalogId": "614518280298",
"TagKey": "security_classification",
"TagValues": [
"public"
]
}
],
"LFTagsOnColumns": [
{
"Name": "col1",
"LFTags": [
{
"CatalogId": "614518280298",
"TagKey": "security_classification",
"TagValues": [
"private"
]
}
]
}
]
}
]
}
```

`connection_map.json`:

```json
{"dev": "default/minisql/1719861283/db_test"}
```

`metadata_map.json`:

```json
{
  "security_classification": "Classifications::Security Classification"
}
```

Given a tag association file, `connection_map.json`, and `metadata_map.json` such as these, the program will start processing the tag association files.
A tag association file will be loaded, and the program will first create any missing options. It does this by gathering all the `TagValues` associated with each `TagKey`. It looks up the value of the `TagKey` in the metadata mapping file and checks whether the associated metadata property is of type `option`. If it is, it verifies that all the `TagValues` specified exist as options; any that are missing will be added.
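A minimal sketch of this gathering step, assuming the tag association file has already been parsed as JSON; the `existing_options` dictionary is a hypothetical stand-in for looking up a property's current options in Atlan, not the package's actual API:

```python
import json
from collections import defaultdict

with open("iftag_association_1.json") as f:
    assoc = json.load(f)

# Gather every TagValue seen for each TagKey, across database-, table-,
# and column-level tags.
values_by_key = defaultdict(set)
for entry in assoc["TableList"]:
    tags = list(entry.get("LFTagOnDatabase", [])) + list(entry.get("LFTagsOnTable", []))
    for column in entry.get("LFTagsOnColumns", []):
        tags.extend(column.get("LFTags", []))
    for tag in tags:
        values_by_key[tag["TagKey"]].update(tag["TagValues"])

# Hypothetical stand-in: the options that already exist on each
# option-typed custom metadata property.
existing_options = {"security_classification": {"public"}}

for tag_key, seen in values_by_key.items():
    missing = seen - existing_options.get(tag_key, set())
    if missing:
        print(f"would create options {sorted(missing)} for {tag_key}")
```

With the example file above, this would report that the option `private` needs to be created for `security_classification`.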
For each tag association file processed, a CSV file will be produced in the same format as a CSV produced by an asset export, containing the metadata properties to be associated with each table or column. The CSV file will then be imported using the normal import logic.
So, for example, given the three files shown above, the file `iftag_association_1.json` will be read and each entry in `TableList` processed. In this example there is only one entry, but in practice there would be many more. The first entry is for the `Table` with the `DatabaseName` `dev_sch`. `dev_sch` would be split on the underscore, giving the two strings `dev` and `sch`. The string `dev` would be used as the key for the connection lookup, and the string `sch` would be used as the schema name. The key `dev` is looked up in the connection map, and the value returned would be `default/minisql/1719861283/db_test`. The schema would be appended to this value, along with the table name `tbl1` found in the tag association file, resulting in the fully qualified name of the table: `default/minisql/1719861283/db_test/sch/tbl1`.
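Sketched in Python, under the same assumptions (the `connection_map` contents come from the example above, and the warning behavior mirrors the Missing Keys note earlier; this is illustrative, not the package's code):

```python
import logging

connection_map = {"dev": "default/minisql/1719861283/db_test"}

def qualified_table_name(table):
    """Build the fully qualified Atlan name for one Table entry."""
    # Split DatabaseName at the first underscore: the left half keys into
    # the connection map, the right half becomes the schema name.
    prefix, _, schema = table["DatabaseName"].partition("_")
    connection = connection_map.get(prefix)
    if connection is None:
        # Mirrors the "Missing Keys" behavior: ignore the entry, log a warning.
        logging.warning("no connection mapping for %s; entry ignored", table["DatabaseName"])
        return None
    return f"{connection}/{schema}/{table['Name']}"

table = {"CatalogId": "614518280298", "DatabaseName": "dev_sch", "Name": "tbl1"}
print(qualified_table_name(table))  # default/minisql/1719861283/db_test/sch/tbl1
```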
Under `LFTagsOnTable` we only have one `TagKey` entry; in reality there could be more. The value of `TagKey` is `security_classification`. In the custom metadata map we find the key `security_classification` associated with `Classifications::Security Classification`. The program will use the value specified under `TagValues` as the value of this property.
A row will be created in the CSV file for each `Table` in the tag association file, containing the fully qualified table name along with values for all the `TagKey` entries associated with the table. A row will also be created in the CSV for each column under `LFTagsOnColumns`, following the same logic for determining the values of the custom metadata properties.
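To make the shape of the output concrete, here is a hedged sketch of the rows the example above would yield; the exact column layout of the export-format CSV is an assumption for illustration, not documented on this page:

```python
import csv

with open("iftag_association_1_output.csv", "w", newline="") as f:
    writer = csv.writer(f)  # comma is the expected field separator
    # Assumed header: qualified name, type, then one column per mapped property.
    writer.writerow(["qualifiedName", "typeName", "Classifications::Security Classification"])
    # One row for the table, valued from LFTagsOnTable ...
    writer.writerow(["default/minisql/1719861283/db_test/sch/tbl1", "Table", "public"])
    # ... and one row per tagged column, valued from LFTagsOnColumns.
    writer.writerow(["default/minisql/1719861283/db_test/sch/tbl1/col1", "Column", "private"])
```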