# Export data to a data warehouse Automate data exports from Stripe to Snowflake, Redshift, Databricks, or BigQuery. Data Pipeline supports data exports from Stripe to: - [Snowflake](https://docs.snowflake.com/en/user-guide/intro-regions.html) (deployed on AWS, Azure, or Google Cloud) - [Amazon Redshift](https://docs.aws.amazon.com/awsconsolehelpdocs/latest/gsg/select-region.html) - [Databricks](https://docs.databricks.com/aws/en/resources/supported-regions) - [BigQuery](https://cloud.google.com/bigquery/docs/locations) (Public preview) ## Availability Data Pipeline is available in the following AWS regions. BigQuery uses [GCP regions](https://docs.stripe.com/data/access-data-in-warehouse/data-warehouses.md#bigquery-regions). | AWS Region | Snowflake | Amazon Redshift RA3 (with encryption) | Databricks | | --- | --- | --- | --- | | us-west-2 (Oregon) | ✓ Supported | ✓ Supported | ✓ Supported | | us-east-2 (Ohio) | ✓ Supported | ✓ Supported | ✓ Supported | | us-east-1 (N. Virginia) | ✓ Supported | ✓ Supported | ✓ Supported | | us-west-1 (N. California) | ✓ Supported | ✓ Supported | ✓ Supported | | ca-central-1 (Central Canada) | ✓ Supported | ✓ Supported | ✓ Supported | | sa-east-1 (São Paulo) | ✓ Supported | ✓ Supported | ✓ Supported | | eu-central-1 (Frankfurt) | ✓ Supported | ✓ Supported | ✓ Supported | | eu-central-2 (Zurich) | ✓ Supported | ❌ Unsupported | ✓ Supported | | eu-west-1 (Ireland) | ✓ Supported | ✓ Supported | ✓ Supported | | eu-west-2 (London) | ✓ Supported | ✓ Supported | ✓ Supported | | eu-west-3 (Paris) | ✓ Supported | ✓ Supported | ✓ Supported | | eu-north-1 (Stockholm) | ✓ Supported | ✓ Supported | ✓ Supported | | me-south-1 (Bahrain) | ✓ Supported | ✓ Supported | ✓ Supported | | ap-southeast-1 (Singapore) | ✓ Supported | ✓ Supported | ✓ Supported | | ap-southeast-2 (Sydney) | ✓ Supported | ✓ Supported | ✓ Supported | | ap-southeast-3 (Jakarta) | ✓ Supported | ❌ Unsupported | ✓ Supported | | ap-northeast-1 (Tokyo) | ✓ Supported | ✓ Supported | ✓ Supported | | ap-northeast-2 (Seoul) | ✓ Supported | ✓ Supported | ✓ Supported | | ap-northeast-3 (Osaka) | ✓ Supported | ❌ Unsupported | ✓ Supported | | ap-east-1 (Hong Kong) | ✓ Supported | ✓ Supported | ✓ Supported | ### BigQuery regions (Public preview) BigQuery uses GCP regions instead of AWS regions. Data Pipeline supports all [BigQuery locations](https://cloud.google.com/bigquery/docs/locations#supported_locations) except `asia-south1` and `asia-south2` due to data localization requirements. ## Get started 1. In the Stripe Dashboard, go to **Reporting** > [Data management](https://dashboard.stripe.com/data-management). 2. Complete the [onboarding steps](https://docs.stripe.com/data/access-data-in-warehouse/data-warehouses.md#sdp-onboarding-video). You can only connect one warehouse account to your Stripe account at a time. After you accept the data share and finish onboarding, your core Stripe data is available in your warehouse within 12 hours. After the initial load, your data [refreshes regularly](https://docs.stripe.com/data/data-pipeline/data-freshness.md). ## Onboard #### Snowflake [Watch on YouTube](https://www.youtube.com/watch?v=Ric7L5BCSQM) ### Link your Snowflake account First, send all your up-to-date Stripe data and reports through the Dashboard: 1. On the [Data Pipeline settings](https://dashboard.stripe.com/settings/stripe-data-pipeline) page in the Dashboard, click **Sign up**. 2. From the drawer, select **Snowflake**, then click **Continue**. 3. Enter your Snowflake [Account Identifier](https://docs.snowflake.com/en/user-guide/admin-account-identifier.html), and then select the cloud provider on which your Snowflake account is deployed (AWS, Azure, or GCP). 4. Select your region, and then click **Next**. 5. Copy the SQL from the code block and insert it in a SQL worksheet in Snowflake warehouse and run the query to retrieve the unique value. Enter the value in the text box and click **Subscribe**. ### Access your data share in Snowflake After your core Stripe data becomes available (typically within 12 hours), access your data from your Snowflake account: 1. Navigate to your Snowflake account to accept the Stripe data share. 2. If your Snowflake cloud provider is deployed on AWS in one of these seven regions: - `us-east-1`: accept share from account `GSWUDFY_STRIPE_AWS_US_EAST_1` - `us-west-2`: accept share from account `JZA07263` - `us-east-2`: accept share from account `VM70738` - `ca-central-1`: accept share from account `TD26747` - `eu-central-1`: accept share from account `GSWUDFY-STRIPE_AWS_EU_CENTRAL_1` - `eu-west-1`: accept share from account `GSWUDFY-STRIPE_AWS_EU_WEST_1` - `ap-southeast-2`: accept share from account `KB26084` In Snowflake, have a user with `ACCOUNTADMIN` access go to **Data** > **Shared Data**. Under **Ready to Get**, go to the share called `SHARE_[ACCOUNT_IDENTIFIER]` from the corresponding Stripe account. Then, click **Get shared data** to accept the share. 3. If your Snowflake cloud provider isn’t AWS, or if your AWS region isn’t one of the seven regions listed above: 1. Your share comes from the `JZA07263` account as a private listing. 2. In Snowflake, have a user with `ACCOUNTADMIN` access go to **Data Sharing** > **External Sharing**. 3. Locate the share from `JZA07263` and follow the instructions as prompted by Snowflake. 4. In the modal that opens, give the database a name (for example, “Stripe”), select the roles to grant access to (for example, `SYSADMIN`), then click **Get Data**. 5. Confirm that you can view your Stripe data in **Data From Direct Shares** and **Databases**. You can now query your Stripe data directly in Snowflake. #### Amazon Redshift RA3 [Watch on YouTube](https://www.youtube.com/watch?v=cp8684Lyi0M) Before you begin, confirm your cluster is compatible with Data Pipeline. Only RA3 node types with [cluster encryption](https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-db-encryption.html) enabled are compatible with this method of data sharing. ### Link your Amazon Redshift account First, send all your up-to-date Stripe data and reports through the Dashboard: 1. In the [Data Pipeline settings](https://dashboard.stripe.com/settings/stripe-data-pipeline) page in the Dashboard, click **Sign up**. 2. From the modal, select **Redshift**, then click **Continue**. 3. Enter your Amazon [Account ID](https://docs.aws.amazon.com/general/latest/gr/acct-identifiers.html) and your AWS region, then click **Continue**. 4. Confirm the information, then click **Sign up** to start the creation of your data share and your data should be available in Redshift within 12 hours. If you don’t see an incoming data share after 12 hours, verify that cluster encryption is enabled [in the AWS console](https://docs.aws.amazon.com/redshift/latest/mgmt/configuring-db-encryption-console.html). Amazon Redshift only displays shares in compatible clusters. ### Access your data share in Amazon Redshift Next, after 12 hours, access your data share from your Redshift account: 1. Navigate to your Amazon Redshift account to accept the Stripe data share. In Amazon Redshift, have an administrator go to the account-level **Datashares** tab in the console’s left navigation. 2. In the **From other accounts** section, go to a share entitled `share_[ACCOUNT_ID]`. Check the box next to the data share, then click **Associate**. 3. In the modal that opens, select the namespaces that have access to this data. If you want all Amazon Redshift clusters in this region to have access to this data, select **Associate the entire AWS account**. 4. Select the accepted share to open the Share details page, then click **Connect to database**. Next, connect to one of your existing databases, as you would in the Redshift query console, then click **Create database from datashare**. 5. In the following modal, give the database a name (for example, “Stripe”). You can now query your Stripe directly in Amazon Redshift. > When using Redshift, consider the following: > > - The Redshift UI displays incoming data shares in either an account-level view or a cluster-level view. When accepting the data share from Stripe, make sure to view all shares for your account by using the console’s left navigation. - Redshift doesn’t support some UTF-8 characters. If your data contains unsupported UTF-8 codepoints, the data load replaces them with `?`. #### Databricks ### Link your Databricks account First, send all your up-to-date Stripe data and reports through the Dashboard: 1. On the [Data Pipeline settings](https://dashboard.stripe.com/settings/stripe-data-pipeline) page in the Dashboard, click **Sign up**. 2. From the drawer, select **Databricks**, then click **Continue**. 3. In your [Databricks console](https://accounts.cloud.databricks.com), click **New > Query**. 4. Copy and run the following query: ```sql select current_metastore(); ``` 5. Copy the result. It has the format `cloud:region:uuid` (for example, `aws:us-east-2:a1b2c3d4-e5f6-7890-abcd-ef1234567890`). 6. Paste the sharing identifier in the text field and click **Continue**. ### Access your data share in Databricks After Stripe sends the data share to your Databricks account, access it from your Databricks console: 1. In your Databricks console, click **New > Query** and run the following command to grant yourself provider access: ```sql GRANT USE PROVIDER ON METASTORE TO `your_email`; ``` Replace `your_email` with your Databricks account email address. 2. Navigate to **Catalog > Delta Sharing > Shared with me**. 3. Click the **stripe** share. 4. Click **Mount to catalog**. 5. Select **Create a new catalog** and enter a unique catalog name. 6. Click **Create**. 7. Verify that your Stripe data is accessible in the new catalog. ### Verify your account After mounting the catalog, complete the verification step in the Dashboard: 1. In your Databricks console, click **New > Query**. 2. Copy and run the verification query displayed in the Dashboard. The query retrieves a test value from your newly mounted catalog. 3. Copy the query result and paste it in the verification field in the Dashboard. 4. Click **Activate Databricks** to complete the setup. #### BigQuery [Public preview] ### Before you begin Make sure you have: - Permission to create service accounts and GCS buckets in the project - No existing BigQuery datasets named `stripe`, `stripe_testmode`, or `account_validation` in your project #### Create a service account 1. Go to **IAM & Admin** > **Service Accounts** > **Create Service Account**. 2. Name it clearly (for example, `stripe-bigquery-service-account`), then click **Create and Continue**. 3. At the project level, assign these roles: - `Storage Insights Collector Service`: allows Stripe to verify your GCS bucket’s region - `BigQuery Job User`: allows Stripe to run BigQuery load jobs - `BigQuery Data Editor`: allows Stripe to create datasets (`stripe`, `stripe_testmode`, `account_validation`) 4. Click **Done**. 5. Open the service account, then go to **Keys > Add Key > Create new key > JSON**. Download and store securely. You upload this to Stripe in a later step. #### Create a GCS bucket 1. Go to **Cloud Storage** > **Buckets** > **Create**. 2. Set a name that identifies it as Stripe data (for example, `your-company-stripe-intermediate`). 3. For **Location type**, select a region that matches your BigQuery dataset region. We recommend **us (multiple regions in United States)**. Click **Continue**. 4. For **Storage class**, select **Standard**. 5. For **Access control**, select **Uniform** and enable **Prevent public access**. 6. For **Data protection**, select **Soft delete policy** > **Use default retention duration**. Disable **Bucket retention policy** and **Object retention**. Click **Create**. 7. Open the bucket, then go to **Permissions** > **Grant access**. 8. Add the service account email you used earlier. 9. Assign the following roles: - `Storage Object Creator`: Stripe writes files to this bucket - `Storage Object User`: Stripe reads and deletes files after loading into BigQuery > Intermediate files in the GCS bucket aren’t PGP-encrypted. 10. Click **Save**. #### Create a lifecycle policy 1. Open the bucket, then click **Lifecycle**. 2. Click **Add a rule**. 3. Select **Delete Object** as the action. 4. Set **Age** to `1` (24 hours). This auto-deletes intermediate files from the GCS bucket after Stripe loads them into BigQuery. 5. Click **Create**. #### Enable Data Access audit logs (recommended) 1. Go to **IAM & Admin** > **Audit Logs**. 2. Find **Google Cloud Storage**, then enable **Data Read** and **Data Write**. 3. Click **Save**. > Google bills for Data Access audit logs. Review expected volume with your GCP administrator. ### Link your BigQuery account 1. On the [Data Pipeline settings](https://dashboard.stripe.com/settings/stripe-data-pipeline) page in the Dashboard, click **Sign up**. 2. From the drawer, select **BigQuery**, then click **Continue**. 3. Fill in the following fields: - **GCP project name**: The Google Cloud project where your GCS bucket and BigQuery dataset live - **GCS bucket name**: The dedicated intermediate bucket you created earlier - **BigQuery dataset region**: Use the same region as your GCS bucket to avoid cross-region egress costs (for example, `us`) - **Service account JSON key**: The JSON file you downloaded earlier 4. Click **Next**. ### Verify your account 1. In the [BigQuery console](https://console.cloud.google.com/bigquery), find the `account_validation` dataset (created automatically by Stripe). 2. Run the following query: ```sql SELECT value FROM `YOUR_PROJECT_ID.account_validation.penny_test_values` WHERE merchant = 'YOUR_MERCHANT_ID' ORDER BY created_at DESC LIMIT 1; ``` 3. Replace `YOUR_PROJECT_ID` with your GCP project ID and `YOUR_MERCHANT_ID` with your Stripe account ID (for example, `acct_xxxxx`). 4. Copy the value returned by the query and paste it into the verification text box in the Dashboard. 5. Click **Activate BigQuery** to complete the setup. ### Change the warehouse account To switch to a different warehouse account: 1. Turn off Data Pipeline on the [settings page](https://dashboard.stripe.com/settings/stripe-data-pipeline). 2. [Sign up again](https://docs.stripe.com/data/access-data-in-warehouse/data-warehouses.md#get-started) using the new warehouse account details. To add another Stripe account to the same warehouse: 1. Follow the [signup steps](https://docs.stripe.com/data/access-data-in-warehouse/data-warehouses.md#get-started) for the additional Stripe account. 2. Use the same account identifier from your initial setup. To find it, go to the [settings page](https://dashboard.stripe.com/settings/stripe-data-pipeline) and locate **ID** under **Connected data warehouse**. ## Query Stripe data in your data warehouse In Snowflake, Amazon Redshift, Databricks, and BigQuery, your data is available as secure views. To query your data, follow the steps below. #### Snowflake View your available Stripe data by navigating to **Views** in the database you created. For each table, you can also see the available columns by clicking on the table and navigating to **Columns**. #### Amazon Redshift RA3 View all your available Stripe data by navigating to **Tables and views** in the database you created. #### Databricks View your available Stripe data by navigating to **Catalog** in Databricks. Select the catalog you created during onboarding, then browse the available schemas and views. #### BigQuery [Public preview] View your available Stripe data in the [BigQuery console](https://console.cloud.google.com/bigquery). Select your project, then browse the `stripe` and `stripe_testmode` datasets to see available tables and views. ## Financial reports in Data Pipeline Financial reports aren’t available for Amazon Redshift, Databricks, or BigQuery. To facilitate your financial close, you can access Stripe’s [reports](https://docs.stripe.com/stripe-reports.md) directly in your data warehouse. Financial report templates have a `FINANCIAL_REPORT` prefix and are available as views in your data warehouse. ![](https://b.stripecdn.com/docs-statics-srv/assets/finrep.eb725f745cb57d3e03e813f96b3e8071.png) ### Generating financial reports in Snowflake Generating financial reports from Data Pipeline requires setting a few custom variables. These are the same variables you set when generating the report through the Dashboard or API: - `START_DATE` (varchar)—The starting date of the report (inclusive). - `END_DATE` (varchar)—The ending date of the report (exclusive). - `TIMEZONE` (varchar)—The time zone of non-UTC datetime columns. You can format your dates with varying levels of precision: - `START_DATE` = ‘2021-09-01’; - `START_DATE` = ‘2021-09-01 00:00:00’; - `START_DATE` = ‘2021-09-01 00:00:00.000’; To set these variables and run the report query: 1. Create a new worksheet. 2. Set the database schema and required variables to your desired values. ```sql -- set schema based on the name you gave your Stripe database use schema db_name.stripe; -- set financial report template variables set (TIMEZONE, START_DATE, END_DATE) = ('UTC', '2021-09-01', '2021-10-01'); ``` Run these lines of code separately before attempting to query tables that require them. Otherwise, you might receive an error that a session variable doesn’t exist. If you use the [Snowflake Connector for Python](https://docs.snowflake.com/en/user-guide/python-connector.html), set the session parameter `TIMEZONE` with the `ALTER SESSION SET TIMEZONE = 'UTC'` command. 3. After running the code that sets the necessary variables, query the view of the report you want to generate. For example, running: ```sql select * from FINANCIAL_REPORT_BALANCE_CHANGE_FROM_ACTIVITY_ITEMIZED; ``` Returns the same results that the itemized balance change from the activity report displays in the Dashboard or through the API: ![](https://b.stripecdn.com/docs-statics-srv/assets/report.601f8e6f124d2dbf1adc74388fa58025.png) > #### Need support for a different file format? > > If you want to upload files with a different structure or in a custom format, contact [Stripe support](https://support.stripe.com/contact/email?topic=third_party_integrations&subject=Stripe%20Data%20Pipeline%20\(SDP\)).