Observable Framework View source

Google BigQuery data loader

Here’s a JavaScript data loader that fetches the confirmed_cases metric from the publicly available BigQuery Covid19 dataset through the Google BigQuery API by using the Google BigQuery Data Node.js Client.

import {csvFormat} from "d3-dsv";
import {runQuery} from "./google-bigquery.js";

const rows = await runQuery(`
  SELECT
    FORMAT_TIMESTAMP('%Y-%m-%d', date) as date,
    confirmed_cases
  FROM
    \`bigquery-public-data.covid19_italy.data_by_province\`
  WHERE
    name = "Lombardia"
    AND province_name = "Lecco"
    AND date BETWEEN '2020-05-01 00:00:00 UTC' AND '2020-05-15 00:00:00 UTC'
  GROUP BY 1,2
  ORDER BY 1 ASC;
`);

process.stdout.write(csvFormat(rows));

This data loader depends on @google-cloud/bigquery, d3-dsv, and dotenv, which we reference in package.json.

The data loader uses a helper file, google-bigquery.js, which is a thin wrapper on the @google-cloud/bigquery package. This reduces the amount of boilerplate you need to run a report.

import "dotenv/config";
import {BigQuery} from "@google-cloud/bigquery";

const {BQ_PROJECT_ID, BQ_CLIENT_EMAIL, BQ_PRIVATE_KEY} = process.env;

if (!BQ_PROJECT_ID) throw new Error("missing BQ_PROJECT_ID");
if (!BQ_CLIENT_EMAIL) throw new Error("missing BQ_CLIENT_EMAIL");
if (!BQ_PRIVATE_KEY) throw new Error("missing BQ_PRIVATE_KEY");

const bigQueryClient = new BigQuery({
  projectId: BQ_PROJECT_ID,
  credentials: {
    client_email: BQ_CLIENT_EMAIL,
    private_key: BQ_PRIVATE_KEY
  }
});

export async function runQuery(query) {
  return (await bigQueryClient.query({query}))[0];
}

For the data loader to authenticate with the Google BigQuery API, you will need to set several environment variables containing secret credentials. If you use GitHub, you can use secrets in GitHub Actions to set environment variables; other platforms provide similar functionality for continuous deployment. For local development, we use the dotenv package, which allows environment variables to be defined in a .env file which lives in the project root and looks like this:

BQ_PROJECT_ID="123456789-abc"
BQ_CLIENT_EMAIL="[email protected]"
BQ_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----\nxxxxxxxxxx-----END PRIVATE KEY-----\n"

The .env file should not be committed to your source code repository; keep your credentials secret.

See the Google BigQuery API Quickstart for how to create the service account needed to access the Google BigQuery API after it has been enable in the GCP console. The following Google codelab might be helpful walk-through before getting started. Although the data in this example lives in the public database bigquery-public-data.covid19_italy, you still need to set up a service account and authenticate into one of your projects to access it.

The above data loader lives in data/covidstats_it.csv.js, so we can load the data as data/covidstats_it.csv.

const covidStats = FileAttachment("data/covidstats_it.csv").csv({typed: true});

The covidStats table has two columns: date and confirmed_cases. We can display the table using Inputs.table.

Inputs.table(covidStats)

Lastly, we can pass the table to Plot.plot.

Plot.plot({
  y: {
    nice: 2,
    grid: true,
    label: "Confirmed cases"
  },
  marks: [
    Plot.lineY(covidStats, {
      x: "date",
      y: "confirmed_cases",
      stroke: "steelblue",
      marker: true,
      tip: true
    })
  ]
})