Google BigQuery is a web service from Google that is used for handling and analyzing big data. It’s part of the Google Cloud Platform.
To access BigQuery through Python, we’ll be using the google-cloud-bigquery package, which is a client library for Google BigQuery. This allows us to interact with BigQuery through Python programs, including creating, executing, and evaluating BigQuery queries.
Prerequisites
You need to have:
- Python installed on your local system or server.
- A Google Cloud Platform account.
- BigQuery API enabled on your Google Cloud Platform project.
- Appropriate permissions on the Google Cloud Platform for the operations you are going to perform.
Setting Up Google Cloud SDK
- Create a Google Cloud Platform (GCP) project. You can follow this link to create a new project: https://console.cloud.google.com/projectcreate
- Enable BigQuery API for your project. You can follow this link to enable it: https://console.cloud.google.com/flows/enableapi?apiid=bigquery
- Install Google Cloud SDK on your local system. Follow the instructions here: https://cloud.google.com/sdk/docs/install
Setting Up Authentication
Google Cloud APIs use the OAuth 2.0 protocol for authentication and authorization. The simplest way for authentication is to set up service account keys, download the JSON file, and point to it with an environment variable.
- In the Cloud Console, go to the Service Accounts page: https://console.cloud.google.com/iam-admin/serviceaccounts
- Click on Create Service Account.
- In the Service account name field, enter a name, and click on Create.
- Grant the service account BigQuery roles that gives it required permissions like BigQuery Data Viewer, BigQuery Job User, etc. (Depending on the needs of your application, you might grant additional roles).
- Click Continue, then click Done.
- Click on the service account that you created.
- Click Keys.
- Click Add key, then click Create new key.
- Click JSON. This will create a new JSON key, download it to your computer, and display a confirmation dialog.
- Click Close.
Now, point the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the JSON file.
export GOOGLE_APPLICATION_CREDENTIALS="[PATH]"
Replace [PATH]
with the path of the JSON file that contains your service account key.
Installing Google Cloud BigQuery Python Client
Before proceeding, make sure to have pip installed in your Python environment. You can install the BigQuery client library using pip:
pip install google-cloud-bigquery
Executing Queries using Python
Here is a simple Python code to execute a select query:
from google.cloud import bigquery
# Construct a BigQuery client object.
client = bigquery.Client()
query = """
SELECT name, SUM(number) as people_country
FROM `freshers-in.login-data.usa`
GROUP BY name
ORDER BY people_country DESC
LIMIT 10
"""
query_job = client.query(query) # Make an API request.
print("The query data:")
for row in query_job:
# Row values can be accessed by field name or index.
print("name={}, count={}".format(row[0], row["people_country"]))