An Importer requires a data source. This article explains how to add Amazon S3 as a data source for importing data.
Key terms for Amazon S3
Amazon S3-related terms
- AWS Account ID: An AWS Account ID is a unique 12-digit number assigned to each Amazon Web Services (AWS) account.
- External ID: A user-defined string that serves as a security mechanism used to authenticate and secure access from third-party entities.
- ARN: An Amazon Resource Name (ARN) uniquely identifies an AWS resource. An ARN is required to specify a resource unambiguously across all of AWS, such as in IAM policies, Amazon Relational Database Service (Amazon RDS) tags, and API calls.
- Bucket: An Amazon S3 bucket is a storage container where you can store and manage data in the form of objects. It serves as the top-level container for your data in S3 and acts like a folder in which you organize and store your files.
- CSV Path Prefix: A path prefix refers to the logical organization of objects within a bucket. A prefix is the portion of the key that comes before the object name.
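To make the prefix definition concrete, here is a minimal sketch that splits an object key into its prefix and object name. The key `imports/2024/accounts.csv` and the helper name `split_key` are hypothetical, chosen only for illustration.

```python
def split_key(key: str) -> tuple[str, str]:
    """Split an S3 object key into (prefix, object name)."""
    # Everything up to and including the last "/" is treated as the prefix.
    prefix, sep, name = key.rpartition("/")
    return prefix + sep, name


# Hypothetical key, for illustration only.
prefix, name = split_key("imports/2024/accounts.csv")
print(prefix)  # imports/2024/
print(name)    # accounts.csv
```

A key with no `/` has an empty prefix, which corresponds to an object stored at the top level of the bucket.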
Adding Amazon S3 as a data source
Step 1: Initiating the creation of a data source in your Persona dashboard
Click Imports in the Dashboard’s navigation bar. Then click on + Add new data source in the top right corner of the Imports page.
When creating a data source for use in Importers, you are required to select a source from which to import data. In this case, you would select S3.
Click S3 to display Persona’s AWS Account ID (a 12-digit identification number). An External ID is also auto-generated. You will need both values to complete the setup of your new data source.
Now that you have the External ID and Persona's AWS Account ID, keep this browser window open and proceed to Step 2 in a separate browser window. In the new window, follow the directions in Step 2 below to create an AWS Reader Role that allows Persona to connect to the Amazon S3 bucket containing the data you want to bring into your Persona instance.
Step 2: Creating an AWS Reader Role in the AWS IAM console
In the AWS IAM console, create a new role. After selecting AWS account as the Trusted entity type, copy and paste the AWS Account ID and External ID from the data source form in Step 1.
On the “Add permissions” step, grant the role AmazonS3ReadOnlyAccess.
Name and create your role.
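For reference, a role that trusts another AWS account with an External ID typically carries a trust policy shaped like the one built below. This is a sketch, not Persona's or AWS's exact output: the account ID `123456789012` and the External ID `example-external-id` are placeholders, the helper name `build_trust_policy` is hypothetical, and the policy the AWS console generates for you may differ in detail.

```python
import json


def build_trust_policy(persona_account_id: str, external_id: str) -> dict:
    """Build an IAM trust policy that lets another account assume the role."""
    # AWS Account IDs are 12-digit numbers; reject anything else early.
    if not (persona_account_id.isascii()
            and persona_account_id.isdigit()
            and len(persona_account_id) == 12):
        raise ValueError("AWS Account ID must be a 12-digit number")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The trusted (third-party) account that may assume the role.
                "Principal": {"AWS": f"arn:aws:iam::{persona_account_id}:root"},
                "Action": "sts:AssumeRole",
                # The role can only be assumed when this External ID is supplied.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Placeholder values; use the ones shown in your data source form.
policy = build_trust_policy("123456789012", "example-external-id")
print(json.dumps(policy, indent=2))
```

Comparing the role's "Trust relationships" tab against this shape can help confirm that both values from Step 1 were pasted correctly.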
Step 3: Finish creating the data source in your Persona dashboard
The Reader role created in Step 2 has its own Amazon Resource Name (ARN), which serves as its unique identifier. Return to the browser window with the Persona dashboard open and paste the ARN into the data source form from Step 1.
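Before pasting, it can help to check that the copied value is actually a role ARN. The sketch below assumes the standard IAM role ARN layout, `arn:<partition>:iam::<account-id>:role/<role-name>`; the role name `persona-reader` and the helper `parse_role_arn` are hypothetical. Note that the account ID inside this ARN belongs to your own AWS account, where the role lives, not to Persona.

```python
import re

# Assumed layout of an IAM role ARN; the role part may include a path.
ROLE_ARN = re.compile(
    r"arn:(aws|aws-cn|aws-us-gov):iam::(\d{12}):role/([\w+=,.@/-]+)"
)


def parse_role_arn(arn: str) -> dict:
    """Return the parts of an IAM role ARN, or raise ValueError."""
    match = ROLE_ARN.fullmatch(arn.strip())
    if match is None:
        raise ValueError(f"not an IAM role ARN: {arn!r}")
    partition, account_id, role = match.groups()
    return {"partition": partition, "account_id": account_id, "role": role}


# Hypothetical ARN, for illustration only.
print(parse_role_arn("arn:aws:iam::123456789012:role/persona-reader"))
```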
To complete the connection, enter the S3 region, Bucket name, and CSV Path Prefix on the data source form. Once that information has been entered, click Connect.
On the next page, you will see the CSVs found in the bucket under the CSV Path Prefix you provided. From this list, select all of the CSVs you want to import into Persona.
Example of the list of CSVs found in the Amazon S3 bucket during data source creation.
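To anticipate which files will appear in that list, the sketch below filters a set of object keys by prefix and `.csv` extension. How Persona matches files internally is not documented here, so the matching rule (prefix match plus a case-insensitive `.csv` suffix) is an assumption, and the keys are hypothetical.

```python
def find_csvs(keys: list[str], prefix: str) -> list[str]:
    """Return the keys under `prefix` that look like CSV files."""
    # Assumed rule: key starts with the prefix and ends in ".csv".
    return sorted(
        key for key in keys
        if key.startswith(prefix) and key.lower().endswith(".csv")
    )


# Hypothetical bucket contents, for illustration only.
bucket_keys = [
    "imports/accounts.csv",
    "imports/transactions.CSV",
    "imports/selfie_photo_1.jpeg",
    "archive/old_accounts.csv",
]
print(find_csvs(bucket_keys, "imports/"))
```

If an expected CSV is missing from the list in the dashboard, a mismatch between the CSV Path Prefix and the object's actual key is a likely place to look first.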
Data source-specific considerations
Every data source has nuances specific to how the data is stored. To further normalize data and ensure Importers function successfully, there are some considerations that you may need to understand.
Importing images using an Amazon S3 data source
Make sure that each image is located in the same bucket as the imported CSV file.
Each image in the bucket has its own object_key that uniquely references the image in the bucket.
The CSV file from Step 3 for your import should include a column containing the object_key of the image you want to import. During the import process, the referenced image will be downloaded and attached to the corresponding Account or Transaction.
Example
In the Importer configuration below, we are importing data from a CSV file named image_import.csv.
ref_id,name_first,name_last,phone_number,birth_date,email,file_id
account_1,John,Smith,+14151111111,1967-01-01,john_smith@withpersona.com,selfie_photo_1.jpeg
When viewing the Importer in the Persona dashboard, each row in the CSV file represents a new Account in Persona, including details such as the reference_id, name, phone number, birth date, and email. The last column in the CSV contains the object_key of an image stored in the same bucket as the CSV. This image will be downloaded and attached as the Account's selfie photo.
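Because every referenced image must sit in the same bucket as the CSV, a quick pre-import check can catch missing files. The sketch below reads the sample CSV above and reports any `file_id` values that are absent from a set of bucket keys. The helper name `missing_images` and the bucket contents are hypothetical, and this check is a local convenience rather than part of Persona's import process.

```python
import csv
import io

# The sample CSV from this article.
SAMPLE_CSV = (
    "ref_id,name_first,name_last,phone_number,birth_date,email,file_id\n"
    "account_1,John,Smith,+14151111111,1967-01-01,"
    "john_smith@withpersona.com,selfie_photo_1.jpeg\n"
)


def missing_images(csv_text: str, bucket_keys: set[str],
                   column: str = "file_id") -> list[str]:
    """Return object keys referenced in `column` that are not in the bucket."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column] for row in reader if row[column] not in bucket_keys]


# Hypothetical bucket contents, for illustration only.
keys_in_bucket = {"image_import.csv", "selfie_photo_1.jpeg"}
print(missing_images(SAMPLE_CSV, keys_in_bucket))  # []
```

An empty list means every image referenced in the CSV was found among the given keys.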
Plans Explained
Amazon S3 data source for Importers by plan
| | Startup Program | Essential Plan | Growth Plan | Enterprise Plan |
|---|---|---|---|---|
| Amazon S3 as an available data source | Available | Available | Available | Available |