Snowflake Asset description reverse sync¶
The Asset description reverse sync package enables users to reverse-sync updated asset descriptions from Atlan back to Snowflake by adding them as comments on the corresponding Snowflake objects. For offline mode, the package provides a stored procedure that users can execute to update the comments on Snowflake objects. For online mode, users only need to configure the UI; the workflow takes care of everything else.
In-Scope: Snowflake
Mode: Offline, Online (Direct)
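For context, the end result of the sync is expected to be equivalent to running Snowflake COMMENT statements such as the following (object names are illustrative; the actual statements come from the generated artifacts):
COMMENT ON TABLE MY_DB.MY_SCHEMA.MY_TABLE IS 'Description updated in Atlan';
COMMENT ON COLUMN MY_DB.MY_SCHEMA.MY_TABLE.CUSTOMER_ID IS 'Description updated in Atlan';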
Prerequisites¶
- Things to note before running the workflow
Important
- Atlan must be the sole platform for managing data asset descriptions. Using alternative methods to add descriptions may result in loss of content due to overwrites.
- Ensure that the view definitions in Atlan are up to date if descriptions of View Columns are to be enriched.
- To successfully execute the artifacts generated by this package on Snowflake, the Snowflake user account performing the operation must have the necessary roles and permissions to update comments on the relevant Snowflake objects.
- The snowflake.account_usage.<object-type-view-name>.<object-type>_owner column must not be empty for comments to be updated successfully. Objects can include tables, schemas, databases, functions, etc. For example, you can use the following query to check the owner of a table:
select table_name, table_owner from snowflake.account_usage.tables where table_catalog = '<yourDatabaseName>';
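The same kind of check applies to other object types; for example, a sketch for schemas, assuming the standard snowflake.account_usage.schemata view:
select schema_name, schema_owner from snowflake.account_usage.schemata where catalog_name = '<yourDatabaseName>';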
- To retrieve all roles possessed by a given user, run the SQL below on Snowflake.
SELECT REPLACE(VALUE, '"') as CURRENT_ROLE FROM TABLE(FLATTEN(input => PARSE_JSON(CURRENT_AVAILABLE_ROLES())))
- To grant OWNERSHIP privileges on an object to an existing role, run the SQL below on Snowflake.
GRANT OWNERSHIP ON <object_type> <object_name> TO ROLE <role_name> COPY CURRENT GRANTS;
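For example, a filled-in version of the grant, using illustrative object and role names:
GRANT OWNERSHIP ON TABLE MY_DB.MY_SCHEMA.MY_TABLE TO ROLE ATLAN_SYNC_ROLE COPY CURRENT GRANTS;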
- Things to note before running the workflow in online mode:
- Make sure the user and role used for crawling the connection have permission to create and update objects in Snowflake; if you plan to use existing connection credentials, the Argo workflow uses the stored credential GUIDs to perform the update.
- If the stored user does not have the required permissions, you can provide new credentials for the existing connection by selecting the connection type as New.
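As a rough sketch, the grants for the online-mode user could look like the following; the exact set of required privileges depends on the objects being synced, and the role, warehouse, and object names here are illustrative:
GRANT USAGE ON WAREHOUSE MY_WH TO ROLE ATLAN_SYNC_ROLE;
GRANT USAGE ON DATABASE MY_DB TO ROLE ATLAN_SYNC_ROLE;
GRANT USAGE, CREATE TABLE ON SCHEMA MY_DB.MY_SCHEMA TO ROLE ATLAN_SYNC_ROLE;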
Configuration¶
- Workflow Name - Enter a unique name to help you quickly recognise and manage your custom workflow.
- Connection Type - Select New if you want the connection to use new credential details that are not part of an existing Snowflake crawler; otherwise, select Existing.
- Mode - Choose whether you want to perform an online sync or an offline sync.
- Output - If Preview is selected, all the .sql files will be generated, allowing you to validate whether the sync will be correct. Once you're confident with the results, you can proceed by selecting Sync.
Filter¶
This screen allows you to apply filters to the assets. Each filterable attribute has an operator and a value. The operator is always a dropdown menu, while the value can be either a dropdown or a free-text field. The package considers a filter only if both the operator and the value are filled. The available filters are:
- Connection - Select the connection whose descriptions you want to sync to Snowflake.
- Asset Type
- Qualified Name
- Unique Identifier
- Recompile Views? - If YES is selected, the workflow updates descriptions for Columns in Views. This requires recompiling the updated view definition SQL, potentially resulting in brief unavailability of the impacted views (see the sketch after this list). If NO is selected, the workflow won't update descriptions for Columns in Views.
- Run Interval - Select the time interval in which the user-description was updated on Atlan for the selected assets.
- Dry run - Select whether to perform a dry run before making any updates to the objects.
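To illustrate why recompilation is needed: in Snowflake, column comments on a view are part of the view definition, so updating them means recreating the view. A minimal sketch with hypothetical view, column, and table names:
CREATE OR REPLACE VIEW MY_DB.MY_SCHEMA.MY_VIEW (
    ID COMMENT 'Primary identifier',
    NAME COMMENT 'Customer name'
) AS
SELECT ID, NAME FROM MY_DB.MY_SCHEMA.MY_TABLE;
The view is briefly unavailable while the CREATE OR REPLACE runs, which is the unavailability the option above refers to.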
Audit trail¶
Database Name and Schema Name are used to create two types of tables during the execution of the stored procedure:
- tbl_atlan_reverse_sync_inventory - This table stores intermediate data required for updating comments and is deleted immediately after the operation is completed. This approach manages temporary data efficiently without leaving a persistent footprint in the database.
- tbl_atlan_reverse_sync_audit - This table is created only if it does not already exist in the environment (for example, the first time the stored procedure is executed). It is a persistent table that remains in the database across sessions and stores data that needs to persist beyond the current run, ensuring that essential information is retained for future operations.
By using these tables appropriately, the stored procedure manages both temporary and persistent data effectively, facilitating accurate updates and maintaining necessary records.
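Since tbl_atlan_reverse_sync_audit persists across runs, you can query it afterwards to review past syncs; a minimal sketch, assuming illustrative database and schema names (the table's exact columns are defined by the generated create artifact):
SELECT * FROM MY_DB.MY_SCHEMA.tbl_atlan_reverse_sync_audit;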
Downloading artifacts¶
- The workflow generates several files: loader, procedure, callsp, drop, create, and load-online.
- The create folder contains two .sql files used to create the tbl_atlan_reverse_sync_inventory and tbl_atlan_reverse_sync_audit tables.
- The loader folder contains an .sql file that can be executed to load data into the tbl_atlan_reverse_sync_inventory table.
- The procedure folder includes the stored procedure that updates comments on the objects.
- The callsp folder contains an .sql file used to call the stored procedure based on the configuration.
- The drop folder includes an .sql file used to drop the tbl_atlan_reverse_sync_inventory table.
- The load-online folder (used only for online sync) contains .sql files that insert data into the tbl_atlan_reverse_sync_inventory table.
Download Files
- You can download the files from the Atlan UI.
- You can use the Atlan API/SDK to download the files from the object store as illustrated here. The key would be <s3-prefix-value-provided-during-configuration>/loader.sql and <s3-prefix-value-provided-during-configuration>/procedure.sql.
Executing artifacts¶
- Steps to execute offline
Execute on Snowflake UI
- Run both CREATE statements located in the create folder (inside the create artifact). This will create the tbl_atlan_reverse_sync_inventory and tbl_atlan_reverse_sync_audit tables.
- Next, create the stored procedure in Snowflake. To do this, copy the contents of procedure.sql (located inside the procedure artifact) and execute it in Snowflake. This will create the procedure in the Snowflake environment, where it will remain until explicitly deleted.
- Once this is done, copy all the contents of loader.sql (located inside the loader artifact) and paste them into a separate Snowflake worksheet.
- Select the appropriate database and schema at the top and execute all the insert statements.
- callsp contains the call statement. Once all the data is inserted, you can execute the call statement, and it will perform the update.
- Finally, drop the tbl_atlan_reverse_sync_inventory table by executing the drop statement inside the drop artifact. A sketch of this worksheet order is shown below.
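A minimal sketch of that worksheet order, assuming illustrative database and schema names (the actual statements come from the generated artifacts):
-- set context for the worksheet
USE DATABASE MY_DB;
USE SCHEMA MY_SCHEMA;
-- 1. run both CREATE statements from the create artifact
-- 2. run the contents of procedure.sql to create the stored procedure
-- 3. run the INSERT statements from loader.sql
-- 4. run the CALL statement from callsp.sql
-- 5. clean up the temporary inventory table with the drop artifact
DROP TABLE IF EXISTS tbl_atlan_reverse_sync_inventory;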
Execute using SnowSQL
- Install SnowSQL on your machine and configure it. Refer to the SnowSQL Installation and Configuration Guide for detailed instructions.
- If your Snowflake account uses SSO for authentication, use the following commands to authenticate and execute the artifacts:
snowsql -a <account_id> -u <username> --authenticator externalbrowser -r <Role> -w <warehouse> -d <database> -s MY_SCHEMA -f procedure.sql
snowsql -a <account_id> -u <username> --authenticator externalbrowser -r <Role> -w <warehouse> -d <database> -s MY_SCHEMA -f loader.sql
- If you use a password to log in, execute the following commands:
snowsql -a <account_name> -u <username> -r <role> -w <warehouse> -d <database> -s <schema> -f procedure.sql
snowsql -a <account_name> -u <username> -r <role> -w <warehouse> -d <database> -s <schema> -f loader.sql
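The remaining artifacts can be executed the same way, assuming the files inside the callsp and drop artifacts are named callsp.sql and drop.sql:
snowsql -a <account_name> -u <username> -r <role> -w <warehouse> -d <database> -s <schema> -f callsp.sql
snowsql -a <account_name> -u <username> -r <role> -w <warehouse> -d <database> -s <schema> -f drop.sql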
- Steps to execute online (direct)
Details
- In direct execution, the only requirement is that the user and role have the necessary update and insert permissions.
- Once the necessary permissions are provided, select the desired input from the UI and run the workflow.
What it does¶
- Gets assets whose user-description was updated in the selected time range and applies that description as a comment on the corresponding Snowflake object.
How it works offline
- Extract assets from the selected Snowflake connection.
- Transform the data and convert it into a single Python dictionary.
- Create the main stored procedure and an SQL file containing multiple insert statements, along with a call statement for the main stored procedure, using this data and a predefined Jinja template.
- Based on the selected output method, the file is either uploaded to an S3 bucket or made available for download from an Argo artifact.
How it works online (direct)
- Extract assets from the selected Snowflake connection.
- Use the Argo template to establish the connection and perform the description update operation on the assets.