Google Cloud Storage crawler¶
The Google Cloud Storage crawler fetches assets from Google Cloud Storage and publishes them to Atlan for discovery. The assets crawled are:
- Buckets
- Objects
Configuration¶
Credentials¶
- Project id: Google Cloud project id that contains the buckets.
- Service Account JSON key: follow this article to create a service account JSON key.
The following permissions must be granted to the role assigned to the Service Account: storage.buckets.list and storage.objects.list.
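Before entering the credentials, it can help to confirm that the key file really is a service account key for the intended project. The following is a minimal offline sketch, not part of the crawler: the function name `check_service_account_key` is hypothetical, and the check relies only on the `type`, `project_id`, `private_key`, and `client_email` fields that Google includes in service account JSON keys. It does not verify the IAM permissions themselves.

```python
import json
from typing import List

# Fields present in a Google service account JSON key.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(key_json: str, expected_project_id: str) -> List[str]:
    """Return a list of problems found in the key; an empty list means none were found."""
    try:
        key = json.loads(key_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["top-level JSON value is not an object"]

    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in key]
    if key.get("type") != "service_account":
        problems.append("type is not 'service_account'")
    if "project_id" in key and key["project_id"] != expected_project_id:
        problems.append("project_id does not match the configured Project id")
    return problems
```

An empty result only means the file has the expected shape; whether the role carries the permissions listed above still has to be checked in Google Cloud IAM.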
Metadata¶
- Bucket prefix: publish to Atlan only the buckets whose names start with the prefix specified in this parameter. Leave empty to include all buckets.
- Object prefix: publish to Atlan only the objects whose names start with the prefix specified in this parameter. Leave empty to include all objects.
- Object delimiter (applicable only if the inventory report is NOT selected): restricts the results to the objects directly under the given prefix, similar to listing only the "files" in a "folder" such as "public/". Without a delimiter, the entire tree under the prefix is returned. For example, given these blobs:
- a/1.txt
- a/b/2.txt
If prefix='a/' and no delimiter is set, the following blobs will be published to Atlan:
- a/1.txt
- a/b/2.txt
However, if prefix='a/' and delimiter='/', only the file directly under 'a/' will be published to Atlan:
- a/1.txt
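The prefix and delimiter behaviour described above can be emulated over an in-memory list of names. This is a simplified sketch for illustration only: the function `list_blobs` is hypothetical and does not call Google Cloud Storage, and it ignores the "sub-folder" prefixes that the real listing API reports separately when a delimiter is set.

```python
from typing import Iterable, List


def list_blobs(names: Iterable[str], prefix: str = "", delimiter: str = "") -> List[str]:
    """Emulate prefix/delimiter filtering over an in-memory list of blob names."""
    selected = []
    for name in names:
        if not name.startswith(prefix):
            continue
        remainder = name[len(prefix):]
        # With a delimiter, skip names that continue into a deeper "folder".
        if delimiter and delimiter in remainder:
            continue
        selected.append(name)
    return selected


blobs = ["a/1.txt", "a/b/2.txt"]
print(list_blobs(blobs, prefix="a/"))                 # ['a/1.txt', 'a/b/2.txt']
print(list_blobs(blobs, prefix="a/", delimiter="/"))  # ['a/1.txt']
```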
- Bucket exclusion list: comma-separated list of buckets to be excluded.
- Use inventory report:
- Inventory bucket name: bucket where the inventory is stored.
- Inventory prefix: prefix within the inventory bucket where the inventory is located.
- Inventory file format: file format used to generate the inventory report, CSV or Parquet.
Warning
The following must be granted to the Service Account: the permissions storage.buckets.list and storage.objects.list, and the role roles/storage.objectViewer.
- Build abstraction layer: whether to build an abstraction layer on top of the files. Default: No.
- Publish as-is patterns: comma-separated list of patterns to be published as-is (without the abstraction layer). Applicable only if Build abstraction layer = Yes.
- Regex to match characters to replace: regular expression that matches the characters to replace. It acts on the full file name (without the bucket prefix).
- Regex with replacement characters: regular expression with the replacement characters. It acts on the full file name (without the bucket prefix).
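To illustrate how the two regex parameters might work together, here is a minimal sketch. It assumes the crawler applies them the way Python's `re.sub(pattern, replacement, name)` does and that files sharing the same abstracted name are grouped together; the documentation does not confirm either point. The function `build_abstraction`, the pattern `\d+`, the replacement `N`, and the sample file names are all hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List


def build_abstraction(names: Iterable[str], match_regex: str, replacement: str) -> Dict[str, List[str]]:
    """Group full file names by the name obtained after the regex replacement."""
    groups = defaultdict(list)
    for name in names:
        # Assumption: the replacement behaves like re.sub on the full file name.
        abstract_name = re.sub(match_regex, replacement, name)
        groups[abstract_name].append(name)
    return dict(groups)


files = ["sales/2023-01-15/part-0001.csv", "sales/2023-01-16/part-0002.csv"]
print(build_abstraction(files, r"\d+", "N"))
# {'sales/N-N-N/part-N.csv': ['sales/2023-01-15/part-0001.csv', 'sales/2023-01-16/part-0002.csv']}
```

In this sketch both dated files collapse into one abstracted name, and the original file names stay available to be attached to it.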
Configurations¶
- Connection: name of the connection that will be created in Atlan.
Warning
The connection name must be unique across all Google Cloud Storage connections.
What it does¶
The package performs the following steps:
- Create a connection in Atlan. If the connection already exists, this step is skipped.
- Fetch the list of buckets in the Google Cloud project (filtered by the bucket prefix defined in the configuration).
- For each bucket, fetch the list of objects (filtered by the object prefix defined in the configuration).
- Compute the abstraction and add the file names as the object description (if Build abstraction layer = Yes).
- Publish buckets and objects into Atlan.
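The bucket selection in the second step can be pictured with a small sketch. It assumes that the bucket prefix and the bucket exclusion list are combined as "starts with the prefix and is not excluded"; the function `select_buckets` and the bucket names are hypothetical, and the package's actual implementation may differ.

```python
from typing import Iterable, List


def select_buckets(bucket_names: Iterable[str], bucket_prefix: str = "", exclusion_list: str = "") -> List[str]:
    """Keep buckets that start with the prefix and are not in the comma-separated exclusion list."""
    # Split the exclusion list on commas and ignore surrounding whitespace and empty entries.
    excluded = {name.strip() for name in exclusion_list.split(",") if name.strip()}
    return [
        name
        for name in bucket_names
        if name.startswith(bucket_prefix) and name not in excluded
    ]


buckets = ["prod-raw", "prod-tmp", "dev-raw"]
print(select_buckets(buckets, bucket_prefix="prod-", exclusion_list="prod-tmp"))  # ['prod-raw']
print(select_buckets(buckets))  # ['prod-raw', 'prod-tmp', 'dev-raw']
```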
Warning
Buckets and Objects deleted/archived in Google Cloud Storage are automatically archived in Atlan as well.