BigQuery
Add credentials
Before you begin, you'll need to create a service account key. Please read Google Cloud's documentation on how to create one.
Once you're finished, follow these steps:
- Create a new pipeline or open an existing pipeline.
- Expand the left side of your screen to view the file browser.
- Scroll down and click on a file named `io_config.yaml`.
- Enter the following keys and values under the key named `default` (you can have multiple profiles; add it under whichever is relevant to you).
- Note: you only need to add the keys under `GOOGLE_SERVICE_ACC_KEY` or the value for the key `GOOGLE_SERVICE_ACC_KEY_FILEPATH` (both are not required simultaneously; if you use `GOOGLE_SERVICE_ACC_KEY_FILEPATH`, please delete `GOOGLE_SERVICE_ACC_KEY` from `io_config.yaml`).
```yaml
version: 0.1.1
default:
  GOOGLE_SERVICE_ACC_KEY:
    type: service_account
    project_id: project-id
    private_key_id: key-id
    private_key: "-----BEGIN PRIVATE KEY-----\nyour_private_key\n-----END PRIVATE KEY-----\n"
    client_email: your_service_account_email
    auth_uri: "https://accounts.google.com/o/oauth2/auth"
    token_uri: "https://accounts.google.com/o/oauth2/token"
    auth_provider_x509_cert_url: "https://www.googleapis.com/oauth2/v1/certs"
    client_x509_cert_url: "https://www.googleapis.com/robot/v1/metadata/x509/your_service_account_email"
  GOOGLE_SERVICE_ACC_KEY_FILEPATH: "/path/to/your/service/account/key.json"
```
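If you use the `GOOGLE_SERVICE_ACC_KEY` option, the values come directly from the JSON key file you downloaded from Google Cloud. Here is a minimal sketch (not part of Mage) that checks a downloaded key file contains the fields referenced above — the sample file written here uses placeholder values, never real credentials:

```python
import json
import tempfile

# Fields referenced by the GOOGLE_SERVICE_ACC_KEY section of io_config.yaml.
REQUIRED_FIELDS = [
    'type',
    'project_id',
    'private_key_id',
    'private_key',
    'client_email',
    'auth_uri',
    'token_uri',
    'auth_provider_x509_cert_url',
    'client_x509_cert_url',
]


def missing_fields(key_path: str) -> list:
    """Return any expected fields absent from the service account key file."""
    with open(key_path) as f:
        key = json.load(f)
    return [field for field in REQUIRED_FIELDS if field not in key]


# Placeholder key file for illustration only -- never commit real keys.
sample = {field: 'placeholder' for field in REQUIRED_FIELDS}
with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as f:
    json.dump(sample, f)
    key_path = f.name

print(missing_fields(key_path))  # [] when every field is present
```

Running a check like this before starting the pipeline can save a round of cryptic authentication errors.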
Required permissions
- `bigquery.datasets.create`
- `bigquery.datasets.get`
- `bigquery.jobs.create`
- `bigquery.readsessions.create`
- `bigquery.readsessions.getData`
- `bigquery.tables.create`
If you're running queries against an existing BigQuery dataset, make sure your account also has the "BigQuery Data Editor" role for that dataset.
Using SQL blocks
- Create a new pipeline or open an existing pipeline.
- Add a data loader, transformer, or data exporter block.
- Select `SQL`.
- Under the `Data provider` dropdown, select `BigQuery`.
- Under the `Profile` dropdown, select `default` (or the profile you added credentials under).
- Next to the `Database` label, enter the database name you want this block to save data to.
- Next to the `Save to schema` label, enter the schema name you want this block to save data to.
- Under the `Write policy` dropdown, select `Replace` or `Append` (please see the SQL blocks guide for more information on write policies).
- Enter this test query: `SELECT 1`.
- Run the block.
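The two write policies differ in what happens to rows already in the target table. A rough sketch of the semantics using plain pandas (the actual writes are performed by BigQuery; `existing` and `incoming` are made-up example frames):

```python
import pandas as pd

# Hypothetical contents of the target table before the block runs.
existing = pd.DataFrame({'id': [1, 2], 'value': ['a', 'b']})
# Rows produced by the block's query.
incoming = pd.DataFrame({'id': [3], 'value': ['c']})

# Replace: the target table ends up containing only the new rows.
replaced = incoming.copy()

# Append: the new rows are added after the existing ones.
appended = pd.concat([existing, incoming], ignore_index=True)

print(len(replaced))  # 1
print(len(appended))  # 3
```

Use `Replace` for full refreshes and `Append` for incremental loads.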
Using Python blocks
- Create a new pipeline or open an existing pipeline.
- Add a data loader, transformer, or data exporter block (the code snippet below is for a data loader).
- Select `Generic (no template)`.
- Enter this code snippet (note: change the `config_profile` from `default` if you have a different profile):

```python
from mage_ai.settings.repo import get_repo_path
from mage_ai.io.bigquery import BigQuery
from mage_ai.io.config import ConfigFileLoader
from os import path
from pandas import DataFrame

if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader


@data_loader
def load_data_from_big_query(**kwargs) -> DataFrame:
    query = 'SELECT 1'
    config_path = path.join(get_repo_path(), 'io_config.yaml')
    config_profile = 'default'

    return BigQuery.with_config(ConfigFileLoader(config_path, config_profile)).load(query)
```
- Run the block.
Export a dataframe
Here is an example code snippet to export a dataframe to BigQuery:
```python
from mage_ai.settings.repo import get_repo_path
from mage_ai.io.bigquery import BigQuery
from mage_ai.io.config import ConfigFileLoader
from pandas import DataFrame
from os import path

if 'data_exporter' not in globals():
    from mage_ai.data_preparation.decorators import data_exporter


@data_exporter
def export_data_to_big_query(df: DataFrame, **kwargs) -> None:
    """
    Template for exporting data to a BigQuery warehouse.
    Specify your configuration settings in 'io_config.yaml'.

    Docs: https://docs.mage.ai/design/data-loading#bigquery
    """
    table_id = 'your-project.your_dataset.your_table_name'
    config_path = path.join(get_repo_path(), 'io_config.yaml')
    config_profile = 'default'

    BigQuery.with_config(ConfigFileLoader(config_path, config_profile)).export(
        df,
        table_id,
        if_exists='replace',  # Specify resolution policy if table name already exists
        overwrite_types=None,  # Specify the column types to overwrite in a dictionary
    )
```
To overwrite a column type when running a Python export block, specify the column name and type in the `overwrite_types` dict in the data exporter config.
You can find the supported types in this doc. Here is an example:
```python
BigQuery.with_config(ConfigFileLoader(config_path, config_profile)).export(
    df,
    table_id,
    if_exists='replace',  # Specify resolution policy if table name already exists
    overwrite_types={'col_name': 'STRING'},  # Specify the column types to overwrite
)
```
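When a table has many columns, you can build the `overwrite_types` dict programmatically from a DataFrame's dtypes. A minimal sketch — the `PANDAS_TO_BQ` mapping below is an illustrative assumption, so verify each type name against the supported-types doc before relying on it:

```python
import pandas as pd

# Illustrative pandas-dtype -> BigQuery-type mapping (an assumption for this
# sketch; check the supported types in the docs before using it).
PANDAS_TO_BQ = {
    'object': 'STRING',
    'int64': 'INT64',
    'float64': 'FLOAT64',
    'bool': 'BOOL',
    'datetime64[ns]': 'TIMESTAMP',
}


def build_overwrite_types(df: pd.DataFrame) -> dict:
    """Map every column to a BigQuery type name, defaulting to STRING."""
    return {
        col: PANDAS_TO_BQ.get(str(dtype), 'STRING')
        for col, dtype in df.dtypes.items()
    }


df = pd.DataFrame({'name': ['a'], 'count': [1], 'score': [0.5]})
print(build_overwrite_types(df))
# {'name': 'STRING', 'count': 'INT64', 'score': 'FLOAT64'}
```

The resulting dict can be passed directly as the `overwrite_types` argument shown above.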