A quick look at museums per capita

Tags: R-code

I saw a cool tweet from Scott Pilkington today about museums per capita, and Aimee Whitcroft had some interesting follow up questions about countries similar in population to New Zealand…this was far too good a procrastination opportunity!

<!DOCTYPE html>

Museums per capita

Here's a nice random fact for you: NZ has a high number of museums per capita. 2013 was 1:9,500.

Same year, UK was 1:17,000, the US 1:24,000. 2008 Australia was 1:18,000.

~55,000 museums globally, approx one for every 146,000 people. NZ is well ahead of the bell curve!
— Scott Pilkington (@spil030) March 24, 2019

library(tidyverse)
library(rvest)
library(gapminder)

Getting some comparison countries

I decided to grab some countries with populations within a million of New Zealands. The easiest dataset I new with loads of population data is in the lovely gapminder package. Unfortunately the most recent population counts are for 2007, but this attempt is so back-of-the-envelope anyway, and more just a chance to play, so I’m going to just go with that. Also provides GDP per capita which I think is interesting for arts and culture funding.

my_pop = gapminder %>% 
  filter(year == max(year)) %>% 
  mutate(NZpop = ifelse(country == "New Zealand", pop, NA)) %>% 
  mutate(NZpop = as.numeric(NZpop)) %>% 
  arrange(NZpop) %>% 
  fill(NZpop) %>% 
  mutate(`Difference from NZ` = pop-NZpop) %>% 
  filter(abs(`Difference from NZ`) < 1000000)

Examples of getting number of museums from Wikipedia lists

Below are a couple very messy examples of scraping the individual pages on Wikipedia:

NZ = read_html("https://en.wikipedia.org/wiki/List_of_museums_in_New_Zealand") %>% 
  html_nodes("ul") %>% 
  html_text() %>% 
  as.data.frame() %>% 
  .[3:22,] %>% 
  as.data.frame() %>% 
  rename(name = 1) %>% 
  separate_rows(name, sep = "\\\n") 

nrow(NZ)

## [1] 138

ireland = read_html("https://en.wikipedia.org/wiki/List_of_museums_in_the_Republic_of_Ireland") %>% 
  html_nodes("table") %>% 
  .[[2]] %>% 
  html_table(fill = TRUE)

nrow(ireland)

## [1] 244

norway = read_html("https://en.wikipedia.org/wiki/List_of_museums_in_Norway") %>% 
  html_nodes("ul") %>% 
  html_text() %>% 
  as.data.frame() %>% 
  .[3:22,] %>% 
  data_frame() %>% 
  rename(name = 1) %>% 
  separate_rows(name, sep = "\\\n")
nrow(norway)

## [1] 150

oman = read_html("https://en.wikipedia.org/wiki/List_of_museums_in_Oman") %>% 
  html_nodes("ul") %>% 
  html_text() %>% 
  as.data.frame() %>% 
  .[1,] %>% 
  data_frame() %>% 
  rename(name = 1) %>% 
  separate_rows(name, sep = "\\\n")
nrow(oman)

## [1] 11

panama = read_html("https://en.wikipedia.org/wiki/List_of_museums_in_Panama") %>% 
  html_nodes("table") %>% 
  .[[1]] %>% 
  html_table(fill = TRUE)
nrow(panama)

## [1] 20

puertorico = read_html("https://en.wikipedia.org/wiki/List_of_museums_in_Puerto_Rico") %>% 
  html_nodes("table") %>% 
  .[[2]] %>% 
  html_table(fill = TRUE)
nrow(puertorico)

## [1] 82

singapore = read_html("https://en.wikipedia.org/wiki/List_of_museums_in_Singapore") %>% 
  html_nodes("ul") %>% 
  html_text() %>% 
  as.data.frame() %>% 
  .[2:4,] %>% 
  data_frame() %>% 
  rename(name = 1) %>% 
  separate_rows(name, sep = "\\\n")
nrow(singapore)

## [1] 42

uruguay = read_html("https://en.wikipedia.org/wiki/List_of_museums_in_Uruguay") %>% 
  html_nodes("ul") %>% 
  html_text() %>% 
  as.data.frame() %>% 
  .[1,] %>% 
  data_frame() %>% 
  rename(name = 1) %>% 
  separate_rows(name, sep = "\\\n")
nrow(uruguay)

## [1] 10

The ideal would be to use the main page to get all the hyperlinks with something like this:

get_links = read_html("https://en.wikipedia.org/wiki/List_of_museums_by_country") %>% 
  html_nodes("a") %>% 
  html_attr("href") %>% 
  data.frame() %>% 
  rename(links = 1) %>% 
  filter(grepl("wiki/List_of_museums_in", links)) %>% 
  mutate(links = paste0("https://en.wikipedia.org/", links))

…and then scrape each of these, but aside from there being several ways these pages are formatted, you also have some lists just under the country title, not linking to a new page, and then you have some missing (I think Republic of Congo’s list of museums is missing from this main page). Ugh.

Making a graph

So here is a quick look at the number of museums per 100,000 people. There are quite a few limitations on the data of course, but interesting none the less, I hope.

# numbers mostly by hand from wikipedia

museums = read_csv("museums.csv")

tidy_museums = museums %>% 
  mutate(`Museums per 100,000` = `N museums from Wikipedia`/pop*100000) %>% 
  filter(!is.na(country))

tidy_museums %>% 
  mutate(country = fct_reorder(country, `Museums per 100,000`)) %>% 
  ggplot(aes(x = `Museums per 100,000`, y = country, size = gdpPerCap, color = continent)) +
  geom_point() +
  scale_size_continuous(name = "GDP per capita (2007)") +
  scale_color_discrete(name = "Continent") +
  ylab(label = "Country") +
  xlab(label = "Museums per 100,000 people") +
  theme_minimal() +
  ggtitle("Museums per 100,000 people") +
  labs(caption = "More info: Countries were included based on being +/- 1 million the population of\nNew Zealand in 2007, based on Gapminder data. Museum counts were\ndone by hand based on lists on Wikipedia and may be wildly wrong.")

Code and data available at https://github.com/elb0/museums-per-capita.

Written on March 26, 2019

A quick look at museums per capita

Museums per capita

Getting some comparison countries

Examples of getting number of museums from Wikipedia lists

Making a graph

Related Posts

Learning tools and resources 18 May 2020

Resources and other notes from eCOTS 2020 18 May 2020

My new job 14 Mar 2019