
R data loader to generate CSV

Here’s an R data loader that performs k-means clustering on penguin bill measurements (culmen depth and culmen length), then writes the result as CSV to standard output.

# Attach libraries (must be installed)
library(readr)
library(dplyr)
library(tidyr)

# Data access, wrangling and analysis
penguins <- read_csv("docs/data-files/penguins.csv") |>
  drop_na(culmen_depth_mm, culmen_length_mm)

penguin_kmeans <- penguins |>
  select(culmen_depth_mm, culmen_length_mm) |>
  scale() |>
  kmeans(centers = 3)

penguin_clusters <- penguins |>
  mutate(cluster = penguin_kmeans$cluster)

# Convert data frame to delimited string, then write to standard output
cat(format_csv(penguin_clusters))

To run this data loader, you’ll need R installed, along with the readr, dplyr, and tidyr packages. Each can be installed from the R console, e.g. with install.packages("dplyr").
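The loader calls scale() before kmeans so that both measurements contribute on a comparable footing. As a rough illustration in JavaScript (not part of the loader itself), the default behavior of scale() on a single column can be sketched as below. The function name standardize and the example values are made up for this sketch.

```javascript
// Standardize one numeric column the way R's scale() does by default:
// subtract the mean, then divide by the sample standard deviation (n - 1).
function standardize(values) {
  const n = values.length;
  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  const variance =
    values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / (n - 1);
  const sd = Math.sqrt(variance);
  return values.map((v) => (v - mean) / sd);
}

// Made-up example values, not taken from the penguins dataset.
const depths = [2, 4, 6];
const scaledDepths = standardize(depths);
```

After standardizing, each column has mean 0 and standard deviation 1, so the distance calculations in k-means are not dominated by whichever measurement happens to have the larger numeric range.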

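The above data loader lives in data/penguin-kmeans.csv.R. Framework runs the loader and serves its standard output under the same name without the .R extension, so the data can be loaded as data/penguin-kmeans.csv. You can access the output in a Markdown page using the FileAttachment.csv method: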

const penguinKmeans = FileAttachment("data/penguin-kmeans.csv").csv({typed: true});
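The typed: true option asks for column types to be inferred, so that values such as body_mass_g arrive as numbers rather than strings. The sketch below is only a rough approximation of that idea in plain JavaScript; the actual inference handles more cases (dates and booleans, for instance) and proper CSV quoting. The helper names coerceValue and parseTypedCsv and the sample rows are hypothetical.

```javascript
// Rough approximation of typed parsing: number-like strings become numbers,
// empty strings become null, and everything else stays a string.
function coerceValue(value) {
  const trimmed = value.trim();
  if (trimmed === "") return null;
  const number = Number(trimmed);
  return Number.isNaN(number) ? trimmed : number;
}

// Naive CSV parsing for illustration only: splits on commas and newlines
// and does not handle quoted fields.
function parseTypedCsv(text) {
  const [headerLine, ...lines] = text.trim().split("\n");
  const columns = headerLine.split(",");
  return lines.map((line) => {
    const cells = line.split(",");
    return Object.fromEntries(
      columns.map((name, i) => [name, coerceValue(cells[i])])
    );
  });
}

// Made-up rows that reuse column names from the loader's output.
const sampleCsv = "species,body_mass_g,cluster\nAdelie,3750,2\nGentoo,5000,1";
const typedRows = parseTypedCsv(sampleCsv);
```

Without numeric types, a chart would likely treat body_mass_g and flipper_length_mm as categories rather than as continuous quantities.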

We can display the contents of penguinKmeans with Inputs.table:

Inputs.table(penguinKmeans)

We can pass the data to Plot.plot to make a scatterplot of penguin size (body mass against flipper length), where each point is drawn as its assigned cluster number and colored by penguin species.

Plot.plot({
  color: {
    legend: true,
    range: ["lightseagreen", "orchid", "darkorange"]
  },
  marks: [
    Plot.text(penguinKmeans, {
      text: "cluster",
      x: "body_mass_g",
      y: "flipper_length_mm",
      fill: "species",
      fontWeight: 600
    })
  ]
})
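The chart invites a visual comparison between cluster assignments and species. To put numbers on that comparison, one option is to tally rows per species and cluster. The following is a minimal sketch in plain JavaScript; the function name crossTabulate and the sample rows are made up for illustration.

```javascript
// Count rows for each (species, cluster) pair.
// Returns a Map of species -> Map of cluster -> count.
function crossTabulate(rows) {
  const counts = new Map();
  for (const {species, cluster} of rows) {
    const byCluster = counts.get(species) ?? new Map();
    byCluster.set(cluster, (byCluster.get(cluster) ?? 0) + 1);
    counts.set(species, byCluster);
  }
  return counts;
}

// Made-up rows with the same column names as the loader's output.
const sampleRows = [
  {species: "Adelie", cluster: 1},
  {species: "Adelie", cluster: 1},
  {species: "Adelie", cluster: 3},
  {species: "Gentoo", cluster: 2}
];
const tally = crossTabulate(sampleRows);
```

In a page, the same function could be applied to penguinKmeans in place of sampleRows. Keep in mind that cluster numbers are arbitrary labels: kmeans starts from random centers, so the number attached to a given group can differ between runs of the loader.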