Dataplex Custom Metadata Enricher¶
The Dataplex Custom Metadata Enricher package retrieves Dataplex Data Quality Scan results and publishes to the BigQuery Table and Column assets as Custom Metadata.
Note: If a table has multiple Data Quality Scan jobs, only the most recent job will be considered by the package.
Future Enhancement: Select the latest job that has the "Job_labels" field populated with the input parameter "{Primary}". Job label will be considered only in cases where multiple jobs are associated with the same table.
Custom Metadata Definition¶
Custom Metadata Name: Data Quality Scan Results Properties: The following custom metadata properties will be created and the scan results will be mapped to the corresponding custom metadata property for each associated asset.
Custom Metadata Property | Applicable Asset Type | Value |
---|---|---|
Scan Results | Column | Following Attributes are exported from the BigQuery Output Table and added as text to the custom metadata for each Data Quality Scan job on the table. Job Name - data_quality_scan.scan_id Job Result - job_quality_result.passed For each rule: Rule Dimension - rule_dimension (Dimension Name e.g. Completeness, Uniqueness) Rule Name - rule_name Rule Description - rule_description Rule Type - rule_type Rule Evaluation Type - rule_evaluation_type Rule Column - rule_column Rule Threshold Percent - rule_threshold_percent Rule parameters - rule_parameters Rule Passed - rule_passed Rule Rows Evaluated - rule_rows_evaluated Rule Rows Passed - rule_rows_passed Rule Rows Percentage - rule_rows_passed_percent Rule Rows Null - rule_rows_null Rule Failed Records Query - rule_failed_records_query |
Scan Results | Table | For each DQ Scan job on the table: Job Name - data_quality_scan.scan_id Job Result - job_quality_result.passed Job URL - calculated Job Score - job_quality_result.score Job Start Time Job End Time Job Dimension result - job_dimension_result Job Rows Scanned - job_rows_scanned Job Error Details: [ for all failed rules {Rule Dimension - rule_dimension (Dimension Name) Rule Name - rule_name Rule Description - rule_description Rule Type - rule_type Rule Evaluation Type - rule_evaluation_type Rule Column - rule_column Rule Threshold Percent - rule_threshold_percent Rule parameters - rule_parameters Rule Passed - rule_passed Rule Rows Evaluated - rule_rows_evaluated Rule Rows Passed - rule_rows_passed Rule Rows Percentage - rule_rows_passed_percent Rule Rows Null - rule_rows_null Rule Failed Records Query - rule_failed_records_query} |
Job Name | Table | data_quality_scan.scan_id |
Job Result | Table | job_quality_result.passed |
Job URL | Table | Calculated from columns: data_quality_scan.project_id data_quality_scan.location data_quality_scan.data_scan_id data_quality_job_id |
Job Score | Table | job_quality_result.score |
Job Start Time | Table | job_start_time |
Job End Time | Table | job_end_time |
Job Completeness Dimension Score | Table | Retrieved from job_dimension_result e.g. {"COMPLETENESS":{"passed":true,"score":100},"CONSISTENCY":{"passed":false,"score":0},"UNIQUENESS":{"passed":false,"score":50},"VALIDITY":{"passed":false,"score":50}} Populate Dimension score only if it exists , else leave N/A (Blank) |
Job Consistency Dimension Score | Table | Retrieved from job_dimension_result |
Job Uniqueness Dimension Score | Table | Retrieved from job_dimension_result |
Job Freshness Dimension Score | Table | Retrieved from job_dimension_result |
Job Volume Dimension Score | Table | Retrieved from job_dimension_result |
Job Accuracy Dimension Score | Table | Retrieved from job_dimension_result |
Job Rows Scanned | Table | job_rows_scanned |
Average Number of Rules Passed (Calculated = Count of all rules that passed for this table) | Table | Sum of rule_passed where rule_passed = True / Total no. of rules * 100 |
Average Number of Rules Failed (Calculated - Count of all rules that failed for this table) | Table | Sum of rule_passed where rule_passed = False / Total no. of rules * 100 |
Total Number of Rules | Table | Total no. of rules for this table |
Job Error Details | Table | Job Error Details: List of all failed rules with the following details: [{Rule Dimension - rule_dimension (Dimension Name) Rule Name - rule_name Rule Description - rule_description Rule Type - rule_type Rule Evaluation Type - rule_evaluation_type Rule Column - rule_column Rule Threshold Percent - rule_threshold_percent Rule parameters - rule_parameters Rule Passed - rule_passed Rule Rows Evaluated - rule_rows_evaluated Rule Rows Passed - rule_rows_passed Rule Rows Percentage - rule_rows_passed_percent Rule Rows Null - rule_rows_null Rule Failed Records Query - rule_failed_records_query}] |
Authentication¶
Service Account Authentication¶
- Authentication is supported via Google Service Account.
- Service Account JSON file can be uploaded via the UI.
Roles and Permissions Required:¶
The following permissions should be granted to the Service Account: - BigQuery Data Viewer (roles/bigquery.dataViewer)
This is required for Reading Data from BigQuery and populating on Atlan.
Note: Granular permissions can be applied to restrict Data Viewer access to the Data Quality Scan Results output Table.
Configuration¶
Credentials¶
Organization ID
: Google organization ID (Required for Dataplex Tags)Service Account JSON
: Google service account JSONQuery
: Dataplex query to extract tags. (Add default "NA" for DQ Scans)Connection
: Select the Target BigQuery connection
Data Quality Scan Configuration¶
Data Quality Scan Results
: Toggl to import Data Quality Scan ResultsDQ Scan Output Table Name
: Enter the name of the BigQuery Table holding the Data Quality Scan Results. Expected format is: "project.dataset.table"Lookback Period
: Define the lookback period (in days) to crawl Data Quality Scan results.
What it does?¶
Data Quality Scans¶
- Retrieves Data Quality Scan results from the defined BigQuery table.
- Creates or updates the custom metadata structure defined under the section "Custom Metadata Definition".
- Publishes data quality results to custom metadata properties on tables and columns.