
Dataplex Custom Metadata Enricher

The Dataplex Custom Metadata Enricher package retrieves Dataplex Data Quality Scan results and publishes them to the corresponding BigQuery Table and Column assets as Custom Metadata.

Note: If a table has multiple Data Quality Scan jobs, only the most recent job is considered by the package.
Future Enhancement: Select the latest job whose "Job_labels" field is populated with the input parameter "{Primary}". The job label will be considered only when multiple jobs are associated with the same table.
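
As a rough illustration of the "most recent job" selection, the sketch below ranks jobs per scanned table by job_start_time and keeps only the newest one. It assumes the google-cloud-bigquery client, and the data_source.table_id column is a placeholder for whichever field in your DQ Scan output table identifies the scanned table; it is not part of this package's contract.

```python
from google.cloud import bigquery

# Sketch only: "data_source.table_id" stands in for the column in the
# DQ Scan output table that identifies the scanned BigQuery table.
LATEST_JOB_SQL = """
WITH ranked AS (
  SELECT
    *,
    DENSE_RANK() OVER (
      PARTITION BY data_source.table_id   -- one partition per scanned table
      ORDER BY job_start_time DESC        -- newest job first
    ) AS job_recency
  FROM `{dq_output_table}`
)
-- The output table holds one row per rule per job, so DENSE_RANK keeps
-- every rule row belonging to the most recent job of each table.
SELECT * FROM ranked WHERE job_recency = 1
"""

def latest_job_rows(client: bigquery.Client, dq_output_table: str):
    """Return the rule-level rows of the most recent DQ Scan job per table."""
    query = LATEST_JOB_SQL.format(dq_output_table=dq_output_table)
    return list(client.query(query).result())
```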

Custom Metadata Definition

Custom Metadata Name: Data Quality Scan Results

Properties: The following custom metadata properties will be created, and the scan results will be mapped to the corresponding custom metadata property for each associated asset.

Each entry below lists the custom metadata property, its applicable asset type (in parentheses) and the value mapped to it.

Scan Results (Column):
The following attributes are exported from the BigQuery output table and added as text to the custom metadata for each Data Quality Scan job on the table.

  • Job Name - data_quality_scan.scan_id
  • Job Result - job_quality_result.passed

For each rule:

  • Rule Dimension - rule_dimension (dimension name, e.g. Completeness, Uniqueness)
  • Rule Name - rule_name
  • Rule Description - rule_description
  • Rule Type - rule_type
  • Rule Evaluation Type - rule_evaluation_type
  • Rule Column - rule_column
  • Rule Threshold Percent - rule_threshold_percent
  • Rule Parameters - rule_parameters
  • Rule Passed - rule_passed
  • Rule Rows Evaluated - rule_rows_evaluated
  • Rule Rows Passed - rule_rows_passed
  • Rule Rows Percentage - rule_rows_passed_percent
  • Rule Rows Null - rule_rows_null
  • Rule Failed Records Query - rule_failed_records_query

Scan Results (Table):
For each DQ Scan job on the table:

  • Job Name - data_quality_scan.scan_id
  • Job Result - job_quality_result.passed
  • Job URL - calculated
  • Job Score - job_quality_result.score
  • Job Start Time - job_start_time
  • Job End Time - job_end_time
  • Job Dimension Result - job_dimension_result
  • Job Rows Scanned - job_rows_scanned
  • Job Error Details - for all failed rules, a list with:
      Rule Dimension - rule_dimension (dimension name)
      Rule Name - rule_name
      Rule Description - rule_description
      Rule Type - rule_type
      Rule Evaluation Type - rule_evaluation_type
      Rule Column - rule_column
      Rule Threshold Percent - rule_threshold_percent
      Rule Parameters - rule_parameters
      Rule Passed - rule_passed
      Rule Rows Evaluated - rule_rows_evaluated
      Rule Rows Passed - rule_rows_passed
      Rule Rows Percentage - rule_rows_passed_percent
      Rule Rows Null - rule_rows_null
      Rule Failed Records Query - rule_failed_records_query

Job Name (Table): data_quality_scan.scan_id
Job Result (Table): job_quality_result.passed
Job URL (Table): calculated from the columns data_quality_scan.project_id, data_quality_scan.location, data_quality_scan.data_scan_id and data_quality_job_id
Job Score (Table): job_quality_result.score
Job Start Time (Table): job_start_time
Job End Time (Table): job_end_time
Job Completeness Dimension Score (Table): retrieved from job_dimension_result, e.g. {"COMPLETENESS":{"passed":true,"score":100},"CONSISTENCY":{"passed":false,"score":0},"UNIQUENESS":{"passed":false,"score":50},"VALIDITY":{"passed":false,"score":50}}. The dimension score is populated only if it exists in the result; otherwise it is left as N/A (blank).
Job Consistency Dimension Score (Table): retrieved from job_dimension_result
Job Uniqueness Dimension Score (Table): retrieved from job_dimension_result
Job Freshness Dimension Score (Table): retrieved from job_dimension_result
Job Volume Dimension Score (Table): retrieved from job_dimension_result
Job Accuracy Dimension Score (Table): retrieved from job_dimension_result
Job Rows Scanned (Table): job_rows_scanned
Average Number of Rules Passed (Table): calculated as the count of rules where rule_passed = True / total number of rules * 100
Average Number of Rules Failed (Table): calculated as the count of rules where rule_passed = False / total number of rules * 100
Total Number of Rules (Table): total number of rules for this table
Job Error Details (Table): list of all failed rules with the following details:
  • Rule Dimension - rule_dimension (dimension name)
  • Rule Name - rule_name
  • Rule Description - rule_description
  • Rule Type - rule_type
  • Rule Evaluation Type - rule_evaluation_type
  • Rule Column - rule_column
  • Rule Threshold Percent - rule_threshold_percent
  • Rule Parameters - rule_parameters
  • Rule Passed - rule_passed
  • Rule Rows Evaluated - rule_rows_evaluated
  • Rule Rows Passed - rule_rows_passed
  • Rule Rows Percentage - rule_rows_passed_percent
  • Rule Rows Null - rule_rows_null
  • Rule Failed Records Query - rule_failed_records_query
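
A hedged sketch of how the calculated table-level values above could be derived from the output columns. The column names follow the definitions above; the function names, dimension list and rounding are illustrative, not part of the package's contract.

```python
import json

# Dimensions that have a dedicated "Job ... Dimension Score" property above.
DIMENSIONS = ["COMPLETENESS", "CONSISTENCY", "UNIQUENESS",
              "FRESHNESS", "VOLUME", "ACCURACY"]

def dimension_scores(job_dimension_result):
    """Extract one score per dimension from the job_dimension_result JSON.

    Example input:
    {"COMPLETENESS":{"passed":true,"score":100},"UNIQUENESS":{"passed":false,"score":50}}
    Dimensions absent from the result stay None, i.e. N/A (blank).
    """
    parsed = json.loads(job_dimension_result) if job_dimension_result else {}
    return {dim: parsed.get(dim, {}).get("score") for dim in DIMENSIONS}

def rule_pass_stats(rule_rows):
    """Derive the calculated rule-count properties from one job's rule rows."""
    total = len(rule_rows)
    passed = sum(1 for row in rule_rows if row.get("rule_passed"))
    return {
        "Total Number of Rules": total,
        "Average Number of Rules Passed": round(passed / total * 100, 2) if total else None,
        "Average Number of Rules Failed": round((total - passed) / total * 100, 2) if total else None,
    }
```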

Authentication

Service Account Authentication

  • Authentication is supported via Google Service Account.
  • Service Account JSON file can be uploaded via the UI.

Roles and Permissions Required:

The following role should be granted to the Service Account:

  • BigQuery Data Viewer (roles/bigquery.dataViewer)

This is required for reading data from BigQuery and populating it in Atlan.

    Note: Granular permissions can be applied to restrict Data Viewer access to the Data Quality Scan Results output Table.
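
For example, a dry-run query is one way to confirm that the uploaded Service Account can read the DQ Scan output table before a full crawl. This is a minimal sketch assuming the google-cloud-bigquery client; the file path and table name are placeholders.

```python
from google.cloud import bigquery

def check_dq_table_access(sa_json_path: str, dq_output_table: str) -> None:
    """Dry-run a query to verify Data Viewer access on the DQ output table."""
    client = bigquery.Client.from_service_account_json(sa_json_path)
    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    # A dry run validates the table reference and permissions without scanning data;
    # it typically raises google.api_core.exceptions.Forbidden if access is missing.
    client.query(f"SELECT * FROM `{dq_output_table}` LIMIT 1", job_config=job_config)
    print(f"Service account can read {dq_output_table}")

# Placeholder values for illustration:
# check_dq_table_access("service-account.json", "my-project.my_dataset.dq_scan_results")
```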

Configuration

Credentials

  • Organization ID: Google organization ID (required for Dataplex tags)
  • Service Account JSON: Google service account JSON
  • Query: Dataplex query to extract tags (add the default "NA" for DQ Scans)
  • Connection: Select the target BigQuery connection

Data Quality Scan Configuration

  • Data Quality Scan Results: Toggle to import Data Quality Scan Results.
  • DQ Scan Output Table Name: Enter the name of the BigQuery table holding the Data Quality Scan Results. The expected format is "project.dataset.table".
  • Lookback Period: Define the lookback period (in days) to crawl Data Quality Scan results; see the query sketch below.
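
A minimal sketch of how the output table name and lookback period might translate into the crawl query, assuming the google-cloud-bigquery client; the function name and example values are illustrative.

```python
from google.cloud import bigquery

def fetch_recent_scan_rows(client: bigquery.Client, dq_output_table: str, lookback_days: int):
    """Fetch DQ Scan rows whose job started within the lookback window."""
    sql = f"""
        SELECT *
        FROM `{dq_output_table}`  -- expected format: project.dataset.table
        WHERE job_start_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL @lookback_days DAY)
    """
    job_config = bigquery.QueryJobConfig(
        query_parameters=[bigquery.ScalarQueryParameter("lookback_days", "INT64", lookback_days)]
    )
    return list(client.query(sql, job_config=job_config).result())

# Illustrative call: crawl the last 7 days of results.
# rows = fetch_recent_scan_rows(bigquery.Client(), "my-project.my_dataset.dq_scan_results", 7)
```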

What it does

Data Quality Scans

  • Retrieves Data Quality Scan results from the defined BigQuery table.
  • Creates or updates the custom metadata structure defined under the section "Custom Metadata Definition".
  • Publishes data quality results to custom metadata properties on tables and columns (see the mapping sketch below).
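
To make the mapping concrete, here is a hedged sketch of how one job's failed rules could be folded into the Job Error Details payload described under "Custom Metadata Definition". The field names follow that section; the function itself is illustrative, and the actual publishing of these values to Atlan custom metadata is handled by the package and not shown here.

```python
# Rule-level fields published for each failed rule (see "Custom Metadata Definition").
RULE_FIELDS = [
    "rule_dimension", "rule_name", "rule_description", "rule_type",
    "rule_evaluation_type", "rule_column", "rule_threshold_percent",
    "rule_parameters", "rule_passed", "rule_rows_evaluated",
    "rule_rows_passed", "rule_rows_passed_percent", "rule_rows_null",
    "rule_failed_records_query",
]

def job_error_details(rule_rows):
    """Collect every failed rule of a single job into the Job Error Details list."""
    return [
        {field: row.get(field) for field in RULE_FIELDS}
        for row in rule_rows
        if not row.get("rule_passed")
    ]
```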