S3 Crawler

The S3 Crawler package catalogs an S3 bucket in Atlan, ingesting all S3 objects in the bucket or, optionally, only those under a given prefix. Note: the package is cataloging-only; it does not capture lineage.

The assets crawled are:

  • Buckets
  • Objects

Configuration

Connection

Provide a connection name to associate with the catalog.

Credentials

Two authentication models are available: IAM user credentials and IAM role delegation.

For IAM user credentials, provide the AWS Access Key and Secret Key for an IAM user that has access to the S3 bucket. The policy below lists the permissions required.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "s3:GetBucketLocation",
        "s3:ListBucket",
        "s3:GetObject",
        "s3:GetEncryptionConfiguration"
      ],
      "Resource": [
        "arn:aws:s3:::<s3_bucket>",
        "arn:aws:s3:::<s3_bucket>/*"
      ]
    }
  ]
}
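
Before attaching a policy, it can help to confirm that it covers the four actions listed above. The following is a minimal sketch, not part of the S3 Crawler package: the helper name `missing_actions` is hypothetical, and the check is deliberately simplified (it ignores Deny statements, conditions, and resource scoping, and only compares Action patterns).

```python
from __future__ import annotations

import fnmatch

# The four actions listed in the policy above.
REQUIRED_ACTIONS = (
    "s3:GetBucketLocation",
    "s3:ListBucket",
    "s3:GetObject",
    "s3:GetEncryptionConfiguration",
)


def missing_actions(policy: dict) -> list[str]:
    """Return the required actions that no Allow statement matches.

    Simplification: Deny statements, conditions, and resource scoping
    are ignored; only the Action patterns are inspected.
    """
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]

    patterns: list[str] = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # Action may be a string or a list
            actions = [actions]
        # Compare in lowercase so that action-name casing does not matter.
        patterns.extend(action.lower() for action in actions)

    return [
        required
        for required in REQUIRED_ACTIONS
        if not any(fnmatch.fnmatchcase(required.lower(), p) for p in patterns)
    ]
```

An empty result means every required action is matched by at least one Allow statement; wildcard patterns such as `s3:Get*` are treated as matches.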

For role delegation, Atlan assumes a role in your AWS account. To configure it:

  • Raise a support ticket to get the ARN of the Node Instance Role for your Atlan EKS cluster.

  • Create a new policy in your AWS account with the following permissions:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "VisualEditor0",
          "Effect": "Allow",
          "Action": [
            "s3:GetBucketLocation",
            "s3:ListBucket",
            "s3:GetObject",
            "s3:GetEncryptionConfiguration"
          ],
          "Resource": [
            "arn:aws:s3:::<s3_bucket>",
            "arn:aws:s3:::<s3_bucket>/*"
          ]
        }
      ]
    }
    
  • Create a new role in your AWS account by following the steps in the AWS Identity and Access Management User Guide.

  • When prompted for policies, attach the policy created earlier to this role.

  • When prompted, create a trust relationship for the role using the following trust policy. (Replace <atlan_nodeinstance_role_arn> with the ARN received from Atlan support.)

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "<atlan_nodeinstance_role_arn>"
          },
          "Action": "sts:AssumeRole",
          "Condition": {}
        }
      ]
    }
    
  • Now, reach out to Atlan support with:

    • The name of the role you created above.
    • The ID of the AWS account where the role was created.

Warning

Wait until the support team confirms the account is allowlisted to assume the role before setting up the workflow.
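
The trust policy from the role-delegation steps can also be generated rather than edited by hand, which avoids leaving the placeholder in place. This is a sketch under stated assumptions: `build_trust_policy` is a hypothetical helper, the ARN check is only a loose shape test, and the ARN in the usage line is a made-up placeholder, not a real Atlan value.

```python
import json
import re

# Loose shape check for an IAM role ARN. The partition may be
# aws, aws-cn, or aws-us-gov; the account ID is 12 digits.
_ROLE_ARN_RE = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/.+$")


def build_trust_policy(node_instance_role_arn: str) -> dict:
    """Build the trust policy shown above for the given principal ARN."""
    if not _ROLE_ARN_RE.match(node_instance_role_arn):
        raise ValueError(
            f"not an IAM role ARN: {node_instance_role_arn!r}"
        )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": node_instance_role_arn},
                "Action": "sts:AssumeRole",
                "Condition": {},
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder ARN for illustration only; use the one Atlan support provides.
    example_arn = "arn:aws:iam::123456789012:role/example-node-instance-role"
    print(json.dumps(build_trust_policy(example_arn), indent=2))
```

Passing the unreplaced `<atlan_nodeinstance_role_arn>` placeholder raises a `ValueError`, so the mistake is caught before the policy reaches AWS.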

Bucket Details

Specify the S3 Bucket name (without the s3:// prefix), Prefix, and Region. To catalog all assets in the bucket, including those at the root, leave Prefix empty.
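
If bucket names are copied from S3 URIs, a small helper can strip the s3:// scheme before the value is entered. This is an illustrative sketch only: `normalize_bucket_name` is a hypothetical function, and the pattern reflects the general S3 naming rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit) rather than any validation Atlan itself performs.

```python
import re

# General-purpose S3 bucket naming rules: 3-63 characters, lowercase
# letters, digits, dots, and hyphens, starting and ending with a
# letter or digit. Further restrictions exist and are not checked here.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def normalize_bucket_name(value: str) -> str:
    """Strip an optional s3:// scheme and trailing slash, then validate."""
    name = value.strip()
    if name.startswith("s3://"):
        name = name[len("s3://"):]
    name = name.rstrip("/")
    if not _BUCKET_RE.match(name):
        # A value such as "s3://bucket/some/path" also lands here,
        # because the path belongs in the Prefix field instead.
        raise ValueError(f"not a valid bucket name: {value!r}")
    return name
```

For example, `normalize_bucket_name("s3://my-bucket/")` returns `"my-bucket"`, while a value that still contains a path is rejected so that the path can be entered as the Prefix.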

What it does

The package performs the following steps:

  • Gathers basic information on the bucket, including versioning and encryption details.
  • Retrieves a list of objects in the bucket, based on the prefix (if provided), and the associated attributes.
  • Creates a new connection upon the first run and ingests the bucket and objects identified.
  • For subsequent runs, compares the object listing from the bucket against the asset catalog in Atlan, then adds, updates, or removes assets as needed to reconcile the difference.
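
The comparison in the last step can be pictured as a set difference between two listings. The sketch below is an assumption about the general technique, not the package's actual implementation: the `Delta` type and `compute_delta` function are hypothetical, and each object is represented by a key mapped to some fingerprint (for instance an ETag or last-modified value) that changes when the object changes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Delta:
    to_add: list[str]     # in the bucket, not yet in the catalog
    to_update: list[str]  # in both, but the fingerprint differs
    to_remove: list[str]  # in the catalog, no longer in the bucket


def compute_delta(listed: dict[str, str], cataloged: dict[str, str]) -> Delta:
    """Compare a bucket listing with the catalog, keyed by object key.

    Both mappings go from object key to a fingerprint string; which
    attribute serves as the fingerprint is an assumption of this sketch.
    """
    listed_keys = set(listed)
    cataloged_keys = set(cataloged)
    return Delta(
        to_add=sorted(listed_keys - cataloged_keys),
        to_update=sorted(
            key
            for key in listed_keys & cataloged_keys
            if listed[key] != cataloged[key]
        ),
        to_remove=sorted(cataloged_keys - listed_keys),
    )
```

Keys present on both sides with identical fingerprints appear in none of the three lists, so an unchanged bucket yields an empty delta.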