Use Google Cloud Platform

Google Cloud Platform (GCP) is a suite of cloud computing services.

Get access to GCP

If you are a member of GDS staff, you will already have access to GCP through your GDS Google Account.

You can use GCP through:

  • the cloud console
  • the command line on your local machine
  • code that runs on your local machine
  • code that runs on GCP

You will need specific roles and permissions to use each of the GCP services.

Get a specific role or permissions to use GCP

A role is a set of permissions. You should only have the specific role or permissions you need to use the GCP service you require. See the Google Cloud documentation on understanding roles and the IAM permissions reference for more information.

If your code fails to run, check the error message to see if there was a role or permission error.
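
As a rough illustration of that check, the following sketch sorts an error into "authentication", "permission" or "other" using its HTTP status code and message text. The function name diagnose_error and the keywords it looks for are hypothetical and not part of any Google library. It assumes that GCP usually reports a missing role or permission as HTTP 403, and missing or expired credentials as HTTP 401.

```python
def diagnose_error(status_code, message):
    """Guess why a GCP request failed (a heuristic, hypothetical helper)."""
    text = message.lower()
    # HTTP 401: the request did not carry valid credentials.
    if status_code == 401 or "unauthenticated" in text:
        return "authentication"
    # HTTP 403: the credentials were valid but lacked a role or permission.
    if status_code == 403 or "permission" in text or "access denied" in text:
        return "permission"
    return "other"


# An illustrative message only, not real output from GCP.
print(diagnose_error(403, "Access Denied: user does not have the required permission"))
```

If the result is "permission", ask for the role or permission that the message names. If it is "authentication", try refreshing your credentials first.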

If you need to run code unattended (without logging in to your Google account), you need a service account. You are unlikely to need a service account for personal use.

Contact the Data Engineering community on Slack to ask for a role, permission or service account.

Use GCP through the cloud console

You can use GCP from the cloud console by opening one of the following URLs in a web browser:

Use GCP through the command line on your local machine

You can use the Google Cloud CLI to:

  • interact directly with GCP
  • run code that interacts with GCP

  1. Install the Google Cloud CLI on your local machine.

  2. Run gcloud init in the command line to configure the Google Cloud CLI. Authorise the Google Cloud CLI to use your Google account when prompted in a separate window.

  3. Once configuration is complete, you and your code are authenticated for an hour. You can refresh your authentication by running gcloud auth login in the command line.

Use GCP through code that runs on your local machine

  1. Follow the documentation on using GCP through the command line on your local machine.

  2. Run gcloud auth login --update-adc in the command line.

    Authorise gcloud to use your Google account when prompted in a separate window.

Running this command also creates a credentials file and prints its location in the command line. The location is usually ~/.config/gcloud/application_default_credentials.json.

Code that you run will use this file to authenticate itself on GCP.
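
If you want to confirm that the file was created, a minimal sketch like the following could help. It assumes the default location given above, which applies to Linux and macOS. The names DEFAULT_ADC_PATH and adc_file_exists are hypothetical.

```python
from pathlib import Path

# Default location on Linux and macOS, as given in the text above.
DEFAULT_ADC_PATH = (
    Path.home() / ".config" / "gcloud" / "application_default_credentials.json"
)


def adc_file_exists(path=DEFAULT_ADC_PATH):
    """Return True if a credentials file exists at the given path."""
    return Path(path).is_file()


if adc_file_exists():
    print(f"Found credentials file at {DEFAULT_ADC_PATH}")
else:
    print("No credentials file found. Run: gcloud auth login --update-adc")
```

This only checks that the file exists. It does not check that the credentials inside it are still valid.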

Check that you can now use GCP through code that runs on your local machine

Follow these steps to run a test query against a public BigQuery dataset.

If the test fails, this is likely to be an authentication issue.

  1. Get the BigQuery Job User role for your GDS Google account by contacting the Data Engineering community on Slack.

  2. Install the BigQuery API client library for Python by running pip install --upgrade google-cloud-bigquery in the command line.

    You should do this in a new Python virtual environment.

  3. Run gcloud auth login --update-adc. The flag --update-adc allows code that you run to use your credentials.

  4. Create a file called bigquery.py that contains the following code:

    from google.cloud import bigquery
    
    # Construct a BigQuery client object.
    client = bigquery.Client(project="govuk-bigquery-analytics")
    
    query = """
      SELECT name, SUM(number) as total_people
      FROM `bigquery-public-data.usa_names.usa_1910_2013`
      WHERE state = 'TX'
      GROUP BY name, state
      ORDER BY total_people DESC
      LIMIT 20
    """
    
    query_job = client.query(query) # Make an API request.
    
    print("The query data:")
    for row in query_job:
      # You can access row values by field name or index.
      print("name={}, count={}".format(row[0], row["total_people"]))
    
  5. Run python bigquery.py in the command line. You should see the following output:

    The query data:
    name=James, count=272793
    name=John, count=235139
    name=Michael, count=225320
    name=Robert, count=220399
    name=David, count=219028
    name=Mary, count=209893
    name=William, count=173092
    name=Jose, count=157362
    name=Christopher, count=144196
    name=Maria, count=131056
    name=Charles, count=126509
    name=Daniel, count=117470
    name=Richard, count=109888
    name=Juan, count=109808
    name=Jennifer, count=98696
    name=Joshua, count=90679
    name=Elizabeth, count=90465
    name=Joseph, count=89097
    name=Matthew, count=88464
    name=Joe, count=87977
    

If you do not see this output, then you currently cannot use GCP through code that runs on your local machine.

Contact the Data Engineering community on Slack to ask for help.

Access files on Google Drive

  1. Follow the documentation on using GCP through code that runs on your local machine.

  2. Run gcloud auth login --enable-gdrive-access --update-adc in the command line. The flag --enable-gdrive-access allows code to access files on Google Drive.

  3. Install the legacy Google API Python Client by running pip install google-api-python-client in the command line.

    You should do this in a new Python virtual environment.

    No newer client library supports Google Drive, Sheets or Docs.

  4. Create a file called sheets.py that contains the following code:

    from googleapiclient.discovery import build
    from googleapiclient.errors import HttpError
    from pprint import pprint
    from argparse import ArgumentParser
    
    parser = ArgumentParser()
    parser.add_argument("-s", "--sheet-id", dest="SHEET_ID",
              help="ID of a Google Sheet")
    parser.add_argument("-r", "--range", dest="RANGE",
              help="Range of cells in A1 or R1C1 notation")
    args = parser.parse_args()
    
    # The ID and range of a sample spreadsheet.
    # SHEET_ID = '1ThBiDsAtMhfwvPXW39siZkZPdFS3FW_IGgQuET4YjnM'
    # RANGE = 'scores!A:E'
    
    try:
      service = build('sheets', 'v4')
    
      # Call the Sheets API
      result = service.spreadsheets().values().get(
        spreadsheetId=args.SHEET_ID, range=args.RANGE).execute()
      rows = result.get('values', [])
    
      if not rows:
        print('No data found.')
      else:
        # print('{0} rows retrieved.'.format(len(rows)))
        pprint(rows)
    except HttpError as err:
      print(err)
    
  5. Run the code in this file by running python sheets.py -s=SHEET_ID -r=RANGE in the command line, where:

    • SHEET_ID is the ID of a sheet that you can access
    • RANGE is the name of a tab and a range of cells, such as Sheet1\!A1:E5

You will probably need to escape the exclamation mark with a backslash, as shown in the example, because shells such as bash treat it as a special character.

Running this code prints the values of the cells in that range of the sheet in the command line.
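
The values are printed as a list of rows, where each row is a list of cell values. If the first row of your range is a header, a sketch like the following could turn the rows into dictionaries keyed by column name. The function rows_to_records and the example values are hypothetical. The padding step is there because the Sheets API leaves out trailing empty cells, so rows can be shorter than the header.

```python
def rows_to_records(rows):
    """Turn Sheets rows (a list of lists) into a list of dictionaries.

    The first row is treated as the header. Short rows are padded with
    empty strings so that every record has every column.
    """
    if not rows:
        return []
    header, *data = rows
    records = []
    for row in data:
        padded = list(row) + [""] * (len(header) - len(row))
        records.append(dict(zip(header, padded)))
    return records


# Made-up values in the same shape that sheets.py prints.
example_rows = [["name", "score", "comment"], ["Ada", "9", "good"], ["Bob", "7"]]
print(rows_to_records(example_rows))
```

In sheets.py, you could call rows_to_records(rows) in place of pprint(rows).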

Use GCP through code that runs on GCP

You can use GCP through code that runs on GCP itself.

This is the same as running code unsupervised or automatically as part of a service, without needing to log into your Google account.

Therefore you need a service account. Contact the Data Engineering community on Slack to ask for a service account, saying what roles and permissions it requires.

Contact the Data Engineering community on Slack to ask them to attach the service account to the resource in GCP that the code will run in.

You should not need to modify your code, because this code will automatically find and use the service account credentials.

Access files on Google Drive, such as Sheets and Docs, using a service account

In Google Drive, share the files with the email address of the service account.

Impersonate a service account on your local device

  1. Contact the Data Engineering community on Slack to get a service account that grants your personal account the role Service Account Token Creator, which includes the permission iam.serviceAccounts.getAccessToken.

    Note the email address of the service account.

  2. Follow the documentation on using GCP through the command line on your local machine.

  3. Run gcloud auth application-default login --impersonate-service-account=EMAIL_ADDRESS in the command line, where EMAIL_ADDRESS is the email address of the service account, ending in .iam.gserviceaccount.com.

The Google Cloud client libraries will now use the service account.
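
To avoid mistyping the command in step 3, you could build it with a sketch like the following. The function impersonation_command and the example email address are hypothetical. The check on the ending of the address follows the description above. Some default service accounts that Google creates use other domains, so you might need to relax the check.

```python
import shlex

SERVICE_ACCOUNT_SUFFIX = ".iam.gserviceaccount.com"


def impersonation_command(email):
    """Build the gcloud command from step 3 for a service account email."""
    if "@" not in email or not email.endswith(SERVICE_ACCOUNT_SUFFIX):
        raise ValueError(f"Not a service account email address: {email!r}")
    return [
        "gcloud", "auth", "application-default", "login",
        f"--impersonate-service-account={email}",
    ]


# Hypothetical address, for illustration only.
command = impersonation_command("my-service@my-project.iam.gserviceaccount.com")
print(shlex.join(command))
```

You can copy the printed command into the command line, or pass the list to subprocess.run.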

Use GCP from code that runs on Google Colab

Paste the following code into a cell in a Colab notebook and run it.

    from google.colab import auth

    # Authenticate the user - follow the link and the prompts to get an authentication token
    auth.authenticate_user()
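
If the same notebook code also needs to run outside Colab, a sketch like the following could guard the Colab-only step. It assumes that the google.colab module can only be found when the code runs in Colab. The function name running_in_colab is hypothetical.

```python
import importlib.util


def running_in_colab():
    """Return True if the google.colab module can be found."""
    try:
        return importlib.util.find_spec("google.colab") is not None
    except ModuleNotFoundError:
        # Raised when the parent "google" package is not installed at all.
        return False


if running_in_colab():
    from google.colab import auth

    # Follow the link and the prompts to get an authentication token.
    auth.authenticate_user()
else:
    print("Not running in Colab: use gcloud auth login --update-adc instead.")
```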

Avoid using a credentials file

You should not use a credentials file because it poses a security risk.

The Google Cloud client libraries will automatically find the credentials that they need, whether you are running code on your local device or in GCP itself.
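
One way to spot a credentials file that is still in use is to check the GOOGLE_APPLICATION_CREDENTIALS environment variable, which the client libraries read before looking anywhere else. The following is a minimal sketch. The function name credentials_file_override is hypothetical, and a value in this variable does not always mean that a service account key file is in use.

```python
import os


def credentials_file_override(environ=os.environ):
    """Return the path set in GOOGLE_APPLICATION_CREDENTIALS, or None."""
    return environ.get("GOOGLE_APPLICATION_CREDENTIALS") or None


path = credentials_file_override()
if path:
    print(f"Warning: GOOGLE_APPLICATION_CREDENTIALS points to {path}")
else:
    print("No credentials file is configured through the environment.")
```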

This page was last reviewed on 2 August 2024. It needs to be reviewed again on 2 February 2025.