Snowflake Asset description reverse sync¶
The Asset description reverse sync package enables users to reverse-sync updated descriptions of assets from Atlan back to Snowflake by adding them as comments on the corresponding objects within Snowflake. The package provides a stored procedure that users can execute to update the comments on Snowflake objects.
In-Scope: Snowflake
Mode: Offline
Prerequisites for Execution¶
- Things to note before running the workflow
Important
- Atlan must be the sole platform for managing data asset descriptions. Using alternative methods to add descriptions may result in loss of content due to overwrites.
- Ensure that the view definitions in Atlan are up to date if descriptions of View Columns are to be enriched.
- To successfully execute the artifacts generated by this package on Snowflake, the Snowflake user account performing the operation must have the necessary roles and permissions to update comments on the relevant Snowflake objects.
- Users can use the Dry Run option after loading the objects data when calling procedure.sql to preview the changes that will be made to the objects before actually executing the reverse-sync, for example: call procedure('YES');
- The snowflake.account_usage.<object-type-view-name>.<object-type>_owner column must not be empty for comments on the objects to be updated successfully. Objects can include tables, schemas, databases, functions, etc. For example, you can use the following query to check the owner of a table (see also the pre-flight check sketch after this list): select table_name, table_owner from snowflake.account_usage.tables where table_catalog = '<yourDatabaseName>';
- To retrieve all roles available to the current user, run the SQL below on Snowflake.
SELECT REPLACE(VALUE, '"') AS CURRENT_ROLE FROM TABLE(FLATTEN(input => PARSE_JSON(CURRENT_AVAILABLE_ROLES())));
- To grant OWNERSHIP privileges on an object to an existing role, run the SQL below on Snowflake.
GRANT OWNERSHIP ON <object_type> <object_name> TO ROLE <role_name> COPY CURRENT GRANTS;
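Before running the procedure, it can help to confirm that no in-scope objects are missing an owner. The query below is a minimal pre-flight sketch, assuming a database named MY_DB and checking views only; adapt the account_usage view and owner column for other object types.
-- Hypothetical pre-flight check; MY_DB is a placeholder for your database name.
-- Any rows returned are views whose comments cannot be updated until an owner is assigned.
select table_catalog, table_schema, table_name
from snowflake.account_usage.views
where table_catalog = 'MY_DB'
  and table_owner is null;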
Credentials¶
In this section, you will configure the output destination where the generated artifacts are stored.
Warning
The generated files can be downloaded from the Atlan UI; follow the steps in the Downloading Artifacts section.
- The bucket used to store the output is the default S3 bucket attached to the tenant.
- Atlan API Token: this is required to generate the dynamic dropdowns (see the Asset types in scope screen).
- S3 Bucket Name
- S3 Folder Path: the folder path with a trailing slash, for example atlan/output/ (make sure to add a slash / at the end of the folder name).
- Region: the S3 bucket region.
- For example, if the bucket name is Atlan_bucket, the path is reverse_description_sync/, and the region is eu-west-2, navigate to Atlan_bucket and open the reverse_description_sync directory. Inside, you will find two SQL files: procedure.sql and loader.sql.
Configuration¶
- Extraction mode - Defaults to Offline.
- Connection - Select the Snowflake connection you want to use for syncing descriptions from Atlan to Snowflake.
- Asset types in scope - Choose the asset types you want to include in the reverse-sync operation, such as ['Table', 'Column']. This selection ensures that only Tables and Columns with updated user descriptions are included in the process.
- Recompile Views? - If YES is selected, the workflow will handle updating descriptions for Columns in Views, which requires recompiling the updated View definition SQL and could result in momentary unavailability of the impacted Views (see the sketch after this list). If NO is selected, the workflow will NOT handle updating descriptions for Columns in Views.
- Interval for Run - Select the time interval in which the user-description was updated on Atlan for the selected assets.
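To illustrate why recompiling Views can cause momentary unavailability: in Snowflake, column comments on a View are applied by recreating the View definition. The statement below is only a sketch; the object names, column names, and descriptions are placeholders, and the actual statements are generated by the stored procedure from your Atlan metadata.
-- Illustrative only: recreating a view to apply column comments briefly replaces the view.
create or replace view MY_DB.MY_SCHEMA.CUSTOMER_VW (
    customer_id   comment 'Unique identifier of the customer',
    customer_name comment 'Full legal name of the customer'
) as
select customer_id, customer_name
from MY_DB.MY_SCHEMA.CUSTOMERS;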
Audit Trail¶
Database Name and Schema Name are used to create two types of tables during the execution of the stored procedure:
- tbl_atlan_reverse_sync_inventory (Temporary Table) - This temporary table exists only for the duration of the current database session and stores the intermediate data required for updating comments. Once the session ends or the user logs out, the table is automatically dropped, so it leaves no persistent footprint.
- tbl_atlan_reverse_sync_audit (Persistent Table) - This table is created only if it does not already exist in the environment and persists across sessions. It is created the first time the stored procedure is executed, or whenever it is not present in the current environment, and it retains data beyond the current session so that essential information is available for future operations.
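After a run, you can review what was recorded in the persistent audit table. A minimal example, assuming MY_DB and MY_SCHEMA are the database and schema supplied in the configuration (the exact column layout is defined by the generated loader.sql):
-- Review audit records persisted by previous runs of the stored procedure.
select *
from MY_DB.MY_SCHEMA.tbl_atlan_reverse_sync_audit;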
Downloading Artifacts¶
The workflow generates two SQL files - loader.sql and procedure.sql. The former handles the creation of the temporary/audit tables and populates them appropriately, while the latter handles the actual reverse-sync of descriptions for the assets in scope.
Download Files
- You can download the files from the Atlan UI: Atlan download reference.
- You can use the Atlan API/SDK to download the files from the object store as illustrated here. The keys would be <s3-prefix-value-provided-during-configuration>/loader.sql and <s3-prefix-value-provided-during-configuration>/procedure.sql.
Executing Artifacts¶
Steps to execute
Execute on Snowflake UI
- First, create the stored procedure on Snowflake. To do this, copy the content of procedure.sql and execute it on Snowflake. This creates the procedure, which remains in the Snowflake environment until explicitly deleted.
- Once this is done, copy all the content from loader.sql and paste it into a separate Snowflake worksheet.
- Select the appropriate database and schema at the top.
- Scroll to the end of the file, where you will find the CALL statement. If you want to execute the stored procedure in Dry-run mode, pass YES as the parameter; otherwise, pass NO.
- After that, select all the statements and execute them (see the sketch after these steps).
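As a rough sketch of the worksheet flow described above (the database, schema, and procedure name are placeholders; the actual INSERT and CALL statements come from the generated loader.sql and procedure.sql):
-- Illustrative worksheet flow only; the real statements are in the generated files.
use database MY_DB;
use schema MY_SCHEMA;
-- ... INSERT statements from loader.sql populate the inventory table here ...
call procedure('YES');  -- 'YES' = dry run (preview only); pass 'NO' to apply the comments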
Execute using SnowSQL
- Install SnowSQL on your machine and configure it. Refer to the SnowSQL Installation and Configuration Guide for detailed instructions.
- If your Snowflake account uses SSO for authentication, use the following commands to authenticate and execute the artifacts:
snowsql -a <account_id> -u <username> --authenticator externalbrowser -r <role> -w <warehouse> -d <database> -s <schema> -f procedure.sql
snowsql -a <account_id> -u <username> --authenticator externalbrowser -r <role> -w <warehouse> -d <database> -s <schema> -f loader.sql
- If you use a password to log in, execute the following commands:
snowsql -a <account_name> -u <username> -r <role> -w <warehouse> -d <database> -s <schema> -f procedure.sql
snowsql -a <account_name> -u <username> -r <role> -w <warehouse> -d <database> -s <schema> -f loader.sql
What it does¶
- Gets assets whose user-description was updated in the selected time range and updates it as a comment on Snowflake (the sketch after the How it works steps shows the general form of these updates).
How it works
- Extract assets from the selected Snowflake connection.
- Transform the data and convert it into a single Python dictionary.
- Create the main stored procedure and an SQL file containing multiple insert statements, along with a call statement for the main stored procedure, using this data and a predefined Jinja template.
- Based on the selected output method, the file is either uploaded to an S3 bucket or made available for download from an Argo artifact.
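The statements the stored procedure ultimately issues are standard Snowflake comment updates. A minimal sketch of their general form (the object and column names are placeholders; the real statements are generated from your Atlan metadata):
-- Illustrative only: the general shape of the comment updates applied during reverse-sync.
comment on table MY_DB.MY_SCHEMA.CUSTOMERS is 'Master list of customers, maintained by the data team';
alter table MY_DB.MY_SCHEMA.CUSTOMERS alter column CUSTOMER_ID comment 'Unique identifier of the customer';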