Use Google Cloud Platform
Google Cloud Platform (GCP) is a suite of cloud computing services.
Get access to GCP
If you are a member of GDS staff, you will already have access to GCP through your GDS Google Account.
You can use GCP through:
- the cloud console
- the command line on your local machine
- code that runs on your local machine
- code that runs on GCP
You will need specific roles and permissions to use each of the GCP services.
Get a specific role or permissions to use GCP
A role is a set of permissions. You should only have the specific role or permissions you need to use the GCP service you require. See the Google Cloud documentation on understanding roles and the IAM permissions reference for more information.
If your code fails to run, check the error message to see if there was a role or permission error.
If you need to run code unsupervised (without logging into your Google account) then you need a service account. It is unlikely that you will need a service account for personal use.
Contact the Data Engineering community on Slack to ask for a role, permission or service account.
Use GCP through the cloud console
You can use GCP from the cloud console by opening one of the following URLs in a web browser:
- the cloud console
- the cloud shell
Use GCP through the command line on your local machine
You can use the Google Cloud CLI to:
- interact directly with GCP
- run code that interacts with GCP
Install the Google Cloud CLI on your local machine.
Run
gcloud init
in the command line to configure the Google Cloud CLI. Authorise the Google Cloud CLI to use your Google account when prompted in a separate window.Once configuration is complete, you and your code are authenticated for an hour. You can refresh your authentication by running
gcloud auth login
in the command line.
Use GCP through code that runs on your local machine
Follow the documentation on using GCP through the command line on your local machine.
Run
gcloud auth login --update-adc
in the command line.Authorise gcloud to use your Google account when prompted in a separate window.
Running this command also creates a file in a location that it prints at the command line, ~/.config/gcloud/application_default_credentials.json
.
Code that you run will use this file to authenticate itself on GCP.
Check that you can now use GCP through code that runs on your local machine
You should check that you can now use GCP through code that runs on your local machine.
If you cannot, this is likely to be an authentication issue.
Get the BigQuery Job User role for your GDS Google account by contacting the Data Engineering community on Slack.
Install the BigQuery API client library for Python by running
pip install --upgrade google-cloud-bigquery
in the command line.You should do this in a new python virtual environment.
Run
gcloud auth login --update-adc
. The flag--update-adc
allows code that you run to use your credentials.Create a file called
bigquery.py
that contains the following code:from google.cloud import bigquery # Construct a BigQuery client object. client = bigquery.Client(project="govuk-bigquery-analytics") query = """ SELECT name, SUM(number) as total_people FROM `bigquery-public-data.usa_names.usa_1910_2013` WHERE state = 'TX' GROUP BY name, state ORDER BY total_people DESC LIMIT 20 """ query_job = client.query(query) # Make an API request. print("The query data:") for row in query_job: # You can access row values by field name or index. print("name={}, count={}".format(row[0], row["total_people"]))
Run
python bigquery.py
in the command line. You should see the following output:The query data: name=James, count=272793 name=John, count=235139 name=Michael, count=225320 name=Robert, count=220399 name=David, count=219028 name=Mary, count=209893 name=William, count=173092 name=Jose, count=157362 name=Christopher, count=144196 name=Maria, count=131056 name=Charles, count=126509 name=Daniel, count=117470 name=Richard, count=109888 name=Juan, count=109808 name=Jennifer, count=98696 name=Joshua, count=90679 name=Elizabeth, count=90465 name=Joseph, count=89097 name=Matthew, count=88464 name=Joe, count=87977
If you do not see this output, then you currently cannot use GCP through code that runs on your local machine.
Contact the Data Engineering community on Slack to ask for help.
Access files on Google Drive
Follow the documentation on using GCP through code that runs on your local machine.
Run
gcloud auth login --enable-gdrive-access --update-adc
in the command line. The flag--enable-gdrive-access
allows code to access files on Google Drive.Install the legacy Google API Python Client by running
pip install google-api-python-client
in the command line.You should do this in a new python virtual environment.
No newer client library supports Google Drive, Sheets or Docs.
Create a file called
sheets.py
that contains the following code:from googleapiclient.discovery import build from googleapiclient.errors import HttpError from pprint import pprint from argparse import ArgumentParser parser = ArgumentParser() parser.add_argument("-s", "--sheet-id", dest="SHEET_ID", help="ID of a Google Sheet") parser.add_argument("-r", "--range", dest="RANGE", help="Range of cells in A1 or R1C1 notation") args = parser.parse_args() # The ID and range of a sample spreadsheet. # SHEET_ID = '1ThBiDsAtMhfwvPXW39siZkZPdFS3FW_IGgQuET4YjnM' # RANGE = 'scores!A:E' try: service = build('sheets', 'v4') # Call the Sheets API result = service.spreadsheets().values().get( spreadsheetId=args.SHEET_ID, range=args.RANGE).execute() rows = result.get('values', []) if not rows: print('No data found.') # print('{0} rows retrieved.'.format(len(rows))) pprint(rows) except HttpError as err: print(err)
Run the code in this file by running
python sheets.py -s=SHEET_ID -r=RANGE
in the command line, where:
- `SHEET_ID` is the ID of a sheet that that you can access
- `RANGE` is the name of a tab and a range of cells, such as `Sheet1\!A1:E5`.
You will probably need to escape the exclamation mark with a backslash, as shown in the example.
Running this code prints the values of some cells of this file sheet in the command line.
Use GCP through code that runs on GCP
You can use the GCP through code that runs on the GCP itself.
This is the same as running code unsupervised or automatically as part of a service, without needing to log into your Google account.
Therefore you need a service account. Contact the Data Engineering community on Slack to ask for a service account, saying what roles and permissions it requires.
Contact the Data Engineering community on Slack to ask them to attach the service account to the resource in GCP that the code will run in.
You should not need to modify your code, because this code will automatically find and use the service account credentials.
Access files on Google Drive, such as Sheets and Docs, using a service account
In Google Drive, share the files with the email address of the service account.
Impersonate a service account on your local device
Contact the Data Engineering community on Slack to get a service account that grants your personal account the role Service Account Token Creator, which includes the permission iam.serviceAccounts.getAccessToken.
Note the email address of the service account.
Follow the documentation on using GCP through the command line on your local machine.
Run
gcloud auth application-default login --impersonate-service-account=EMAIL_ADDRESS
on the command line, whereEMAIL_ADDRESS
is the email of the service account, ending in.iam.gserviceaccount.com
.
The Google Cloud client libraries will now use the service account.
Use GCP from code that runs on Google Colab
Paste and run the following code into a cell in a Colab notebook.
from google.colab import auth
# Authenticate the user - follow the link and the prompts to get an authentication token
auth.authenticate_user()
Avoid using a credentials file
You should not use a credentials file because they pose a security risk.
The Google Cloud client libraries will automatically find the credentials that it needs, whether you are running code on your local device or in GCP itself.