
Snowflake Asset description reverse sync

The Asset description reverse sync package enables users to reverse-sync updated asset descriptions from Atlan back to Snowflake by adding them as comments on the corresponding objects within Snowflake. For offline mode, this package provides a stored procedure that users can execute to update the comments on Snowflake objects. For online mode, the user only needs to configure the workflow in the UI; everything else is taken care of by the workflow.

In-Scope: Snowflake

Mode: Offline, Online (Direct)

Prerequisites

  • Things to note before running the workflow

    Important

    • Atlan must be the sole platform for managing data asset descriptions. Using alternative methods to add descriptions may result in loss of content due to overwrites.
    • Ensure that the view definitions in Atlan are up to date if descriptions of View Columns are to be enriched.
  • To successfully execute the artifacts generated by this package on Snowflake, the Snowflake user account performing the operation must have the necessary roles and permissions to update comments on the relevant Snowflake objects.

  • The snowflake.account_usage.<object-type-view-name>.<object-type>_owner column must not be empty for comments to be updated successfully on objects. Objects can include tables, schemas, databases, functions, etc. For example, you can use the following query to check the owner of a table -
    select table_name, table_owner from snowflake.account_usage.tables where table_catalog = '<yourDatabaseName>';
    
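    Similarly, to check the owner of a view (the account usage view for views also exposes a table_owner column):
    select table_name, table_owner from snowflake.account_usage.views where table_catalog = '<yourDatabaseName>';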
  • To retrieve all roles possessed by a given user, run the SQL below on Snowflake.
    SELECT REPLACE(VALUE, '"') as CURRENT_ROLE FROM TABLE(FLATTEN(input => PARSE_JSON(CURRENT_AVAILABLE_ROLES())));
    
  • To grant OWNERSHIP privileges on an object to an existing role, run the SQL below on Snowflake.
    GRANT OWNERSHIP ON <object_type> <object_name> TO ROLE <role_name> COPY CURRENT GRANTS;
    
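    For example, with hypothetical object and role names:
    GRANT OWNERSHIP ON TABLE MY_DB.MY_SCHEMA.MY_TABLE TO ROLE REVERSE_SYNC_ROLE COPY CURRENT GRANTS;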
  • Things to note before running the workflow in online mode.
  • If you plan to use existing connection credentials, make sure the user and role used for crawling the connection have permission to create and update objects in Snowflake, as the Argo workflow uses the stored credential GUIDs to perform the update.
  • If the stored user does not have the required permissions, you can provide new credentials for the existing connection by selecting New as the connection type.

Configuration

  • Workflow Name - Enter a unique name to help you quickly recognise and manage your custom workflow.
  • Connection Type - Select New if you want the connection to use new credential details that are not part of an existing Snowflake crawler; otherwise, select Existing.
  • Mode - Choose whether you want to perform an online sync or an offline sync.
  • Output - If Preview is selected, all the .sql files are generated so you can validate that the sync will be correct. Once you're confident in the results, you can proceed by selecting Sync.

Filter

This screen allows you to apply filters on the assets. Each filterable attribute has an operator and a value. The operator is always a dropdown menu, while the value can be either a dropdown or a free-text field. The package considers a filter only if both the operator and the value are filled. The available filters are:

  • Connection - Select the connection whose descriptions you want to sync to Snowflake.
  • Asset Type
  • Qualified Name
  • Unique Identifier

  • Recompile Views? - If YES is selected, the workflow updates descriptions for Columns in Views. This requires recompiling the updated view definition SQL, potentially resulting in brief unavailability of the impacted views (see the sketch after this list). If NO is selected, the workflow won't update descriptions for Columns in Views.
  • Run Interval - Select the time interval within which the user-description was updated on Atlan for the selected assets.
  • Dry run - Choose whether to perform a dry run before making any updates to the objects.
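
Because view column comments are part of the view definition, the recompile re-issues the view DDL with the updated comments. A minimal sketch of the kind of statement involved, using hypothetical database, schema, view, and column names:

    CREATE OR REPLACE VIEW MY_DB.MY_SCHEMA.MY_VIEW (
        CUSTOMER_ID COMMENT 'Unique customer identifier, synced from Atlan'
    ) AS
    SELECT CUSTOMER_ID FROM MY_DB.MY_SCHEMA.CUSTOMERS;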

Audit trail

Database Name and Schema Name are used to create two types of tables during the execution of the stored procedure:

  • tbl_atlan_reverse_sync_inventory - This table is created to store intermediate data required for updating comments and is deleted immediately after the operation is completed. This approach helps manage temporary data efficiently without leaving any persistent footprint in the database.

  • tbl_atlan_reverse_sync_audit - This table is created only if it does not already exist in the environment. It is a persistent table that remains in the database across sessions: it is created the first time the stored procedure is executed (or whenever it is missing from the current environment) and stores data that must persist beyond the current session, ensuring that essential information is retained for future operations. Together, the two tables let the stored procedure manage both temporary and persistent data, facilitating accurate updates and maintaining necessary records.
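
    The authoritative DDL for both tables ships in the generated create artifact. As a rough sketch of the shape the audit table might take (the column names here are assumptions for illustration, not the actual schema):

    -- Hypothetical illustration only; use the .sql files from the create artifact.
    CREATE TABLE IF NOT EXISTS tbl_atlan_reverse_sync_audit (
        object_type  VARCHAR,
        object_name  VARCHAR,
        new_comment  VARCHAR,
        synced_at    TIMESTAMP_LTZ
    );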

Downloading artifacts

  • The workflow generates several artifact folders: loader, procedure, callsp, drop, create, and load-online.

  • The create folder contains two .sql files used to create the tbl_atlan_reverse_sync_inventory and tbl_atlan_reverse_sync_audit tables.

  • The loader folder contains a .sql file that can be executed to load data into the tbl_atlan_reverse_sync_inventory table.

  • The procedure folder includes the stored procedure that updates comments on the objects.

  • The callsp folder contains a .sql file used to call the stored procedure based on the configuration.

  • The drop folder includes a .sql file used to drop the tbl_atlan_reverse_sync_inventory table.

  • The load-online folder (used only for online sync) contains .sql files that insert data into the tbl_atlan_reverse_sync_inventory table.
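
    The insert statements themselves are generated by the workflow. As a hypothetical illustration of what a single loader row might look like (the column names are assumed for illustration):

    -- Hypothetical illustration; the real statements are in loader.sql.
    INSERT INTO tbl_atlan_reverse_sync_inventory (object_type, object_name, new_comment)
    VALUES ('TABLE', 'MY_DB.MY_SCHEMA.MY_TABLE', 'Customer master table; description synced from Atlan');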

    Download Files

    • You can download the files from the Atlan UI
    • You can use the Atlan API/SDK to download the files from the object store as illustrated here. The keys would be <s3-prefix-value-provided-during-configuration>/loader.sql and <s3-prefix-value-provided-during-configuration>/procedure.sql.

Executing artifacts

  • Steps to execute offline

    Execute on Snowflake UI
    • Run both CREATE statements located in the create folder (inside the create artifact). This will create the tbl_atlan_reverse_sync_inventory and tbl_atlan_reverse_sync_audit tables.
    • Next, create the stored procedure in Snowflake. To do this, copy the contents of procedure.sql (located inside the procedure artifact) and execute it in Snowflake. This will create the procedure in the Snowflake environment, where it will remain until explicitly deleted.
    • Once this is done, copy all the contents of loader.sql (located inside the loader artifact) and paste them into a separate Snowflake worksheet.
    • Select the appropriate database and schema at the top and execute all the insert statements. The callsp artifact contains the call statement; once all the data is inserted, execute the call statement and it will perform the update (see the sketch after these steps).
    • Finally, drop the tbl_atlan_reverse_sync_inventory table by executing the DROP statement inside the drop artifact.
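
    The procedure name and arguments come from the generated callsp.sql; a hypothetical call might look like:

    -- Hypothetical illustration; the real call statement is in the callsp artifact.
    CALL sp_atlan_reverse_sync('MY_DB', 'MY_SCHEMA');
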
    Execute using SnowSQL
    • Install SnowSQL on your machine and configure it. Refer to the SnowSQL Installation and Configuration Guide for detailed instructions.
    • If your Snowflake account uses SSO for authentication, use the following commands to authenticate and execute the artifacts:
      snowsql -a <account_id> -u <username> --authenticator externalbrowser -r <role> -w <warehouse> -d <database> -s <schema> -f procedure.sql
      snowsql -a <account_id> -u <username> --authenticator externalbrowser -r <role> -w <warehouse> -d <database> -s <schema> -f loader.sql
    • If you use a password for login, execute the following commands:
      snowsql -a <account_name> -u <username> -r <role> -w <warehouse> -d <database> -s <schema> -f procedure.sql
      snowsql -a <account_name> -u <username> -r <role> -w <warehouse> -d <database> -s <schema> -f loader.sql
  • Steps to execute online (direct)

    Details
    • In direct execution, the only requirement is that the user and role have the necessary update and insert permissions.
    • Once the necessary permissions are provided, select the desired input from the UI and run the workflow.

What it does

  • Gets the assets whose user-description was updated in the selected time range and applies each updated description as a comment on the corresponding Snowflake object (see the sketch below).
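
    On Snowflake, each update is ordinary comment DDL. A sketch of the kind of statements involved, with hypothetical object names:

    -- Hypothetical illustration of the kind of statements issued.
    COMMENT ON TABLE MY_DB.MY_SCHEMA.MY_TABLE IS 'Customer master table; description synced from Atlan';
    ALTER TABLE MY_DB.MY_SCHEMA.MY_TABLE MODIFY COLUMN CUSTOMER_ID COMMENT 'Unique customer identifier';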

How it works offline

  • Extract assets from the selected Snowflake connection.
  • Transform the data and convert it into a single Python dictionary.
  • Create the main stored procedure and an SQL file containing multiple insert statements, along with a call statement for the main stored procedure, using this data and a predefined Jinja template.
  • Based on the selected output method, the file is either uploaded to an S3 bucket or made available for download from an Argo artifact.

How it works direct

  • Extract assets from the selected Snowflake connection.
  • Use the Argo template to establish the connection and perform the description update operation on the assets.