class: center, middle, inverse, title-slide .title[ # Visualizing spatial data I ] .author[ ### MACS 40700
University of Chicago ] --- class: inverse, middle # Announcements --- class: inverse, middle # For today: - API key: https://docs.stadiamaps.com/tutorials/getting-started-in-r-with-ggmap/#set-up-api-key-authentication ```r # install ggmap if necessary # install.packages("ggmap") usethis::use_course("MACS40700/spatial") ``` --- ## Setup ``` r # load packages library(leaflet) library(geofacet) library(rnaturalearth) library(maps) library(ggmap) library(sf) library(tidyverse) # set default theme for ggplot2 ggplot2::theme_set(ggplot2::theme_minimal(base_size = 16)) # set default figure parameters for knitr knitr::opts_chunk$set( fig.width = 8, fig.asp = 0.618, fig.retina = 2, dpi = 150 ) ``` --- ## Geospatial visualizations * Earliest form of information visualizations * Geospatial data visualizations * [Google Maps](https://www.google.com/maps) --- ## Not that Jon Snow <img src="https://media.giphy.com/media/3ohzdUi5U8LBb4GD4s/giphy.gif" width="80%" style="display: block; margin: auto;" /> --- ## Dr. John Snow <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/2/27/Snow-cholera-map-1.jpg/2183px-Snow-cholera-map-1.jpg" width="50%" style="display: block; margin: auto;" /> .footnote[Source: [Wikipedia](https://en.wikipedia.org/wiki/John_Snow)] --- ## Designing modern maps * Depict spatial features * Incorporate additional attributes and information * Major features * Scale * Projection * Symbols --- ## Scale * Proportion between distances and sizes on a map and their actual distances and sizes on Earth * Small-scale map * Large-scale map --- ## Mapping options: * maps -- map data * ggmap -- nice way to render but need API * leaflet -- more interactive * plotly -- soon-to-be new friend :) * sf -- ways to visualize objects --- ## Large-scale map
--- ## Small-scale map
--- ## Projection * Process of taking a three-dimensional object and visualizing it on a two-dimensional surface * No 100% perfect method for this * Always introduces distortions -- * Properties of projection methods 1. Shape 1. Area 1. Angles 1. Distance 1. Direction --- ## Conformal projections <img src="index_files/figure-html/mercator-1.png" width="80%" style="display: block; margin: auto;" /> --- ## Equal-area projections <img src="index_files/figure-html/equal-area-1.png" width="80%" style="display: block; margin: auto;" /> --- ## Mollweide <img src="index_files/figure-html/mollweide-1.png" width="80%" style="display: block; margin: auto;" /> --- ## Gall-Peters (AKA variant of equal area cylindrical) <img src="index_files/figure-html/gp -1.png" width="80%" style="display: block; margin: auto;" /> --- class: inverse, middle # raster maps --- ## `ggmap` - Package for drawing maps using `ggplot2` and **raster** map tiles - Static image files generated by mapping services - Focus on incorporating data into existing maps - Severely limits ability to change the appearance of the geographic map - Don't have to worry about the maps, just the data to go on top - **NEED API** --- ## `leaflet` (has both R and javascript availabilty) - similar to `ggmap` but no API needed! --- # Bounding boxes: `leaflet` .panel[.panel-name[Code] ``` r leaflet() %>% fitBounds(-87.936287, 41.679835, -87.447052,42.060835) %>% addMarkers(lng=-87.597086, lat=41.785958, popup="You are here") %>% setView(lng = -87.597, lat = 41.787, zoom = 8) %>% addTiles() ``` ] .panel[.panel-name[Output]
] --- ## Level of detail: zoom = 12 vs 10 .pull-left[
] .pull-right[
] --- ## Bounding box: `ggmap` .panel[.panel-name[Code] ``` r *chi_bb <- c( left = -87.936287, bottom = 41.679835, right = -87.447052, top = 42.060835) chicago <- get_stadiamap( bbox = chi_bb, zoom = 11) ggmap(chicago) ``` ] .panel[.panel-name[Plot] <img src="index_files/figure-html/unnamed-chunk-6-1.png" width="70%" style="display: block; margin: auto;" /> ] --- ## Identifying bounding box > Use [bboxfinder.com](http://bboxfinder.com/#0.000000,0.000000,0.000000,0.000000) to determine the exact longitude/latitude coordinates for the bounding box you wish to obtain. --- ## Types of map tiles: Leaflet
--- ## B&W ``` r m %>% addProviderTiles(providers$CartoDB.Positron) ```
--- ## Nat Geo ``` r m %>% addProviderTiles(providers$Esri.NatGeoWorldMap) ```
``` r leaflet() %>% setView(lng = -87.624003, lat = 41.876823, zoom = 8) %>% addTiles()%>% addWMSTiles( "http://mesonet.agron.iastate.edu/cgi-bin/wms/nexrad/n0r.cgi", layers = "nexrad-n0r-900913", options = WMSTileOptions(format = "image/png", transparent = TRUE), attribution = "Weather data © 2012 IEM Nexrad" ) ```
--- ## Types of map tiles: `ggmap` <img src="index_files/figure-html/stamen-maptype-1.png" width="80%" style="display: block; margin: auto;" /> --- class: inverse, middle <img src="https://media.giphy.com/media/oOK9AZGnf9b0c/giphy.gif" width="80%" style="display: block; margin: auto;" /> --- class: inverse, middle # Example: using `ggmap` --- ## Import crime data * City of Chicago open data portal * [Crime data from 2017](https://data.cityofchicago.org/Public-Safety/Crimes-2017/d62x-nvdr) ``` r crimes <- read_csv("data/chicago-crimes.csv") glimpse(crimes) ``` ``` ## Rows: 268,862 ## Columns: 22 ## $ ID <dbl> 11227287, 11227583, 11227293, 11227634, 1122750… ## $ `Case Number` <chr> "JB147188", "JB147595", "JB147230", "JB147599",… ## $ Date <chr> "10/08/2017 03:00:00 AM", "03/28/2017 02:00:00 … ## $ Block <chr> "092XX S RACINE AVE", "026XX W 79TH ST", "060XX… ## $ IUCR <chr> "0281", "0620", "0810", "0281", "1754", "0810",… ## $ `Primary Type` <chr> "CRIM SEXUAL ASSAULT", "BURGLARY", "THEFT", "CR… ## $ Description <chr> "NON-AGGRAVATED", "UNLAWFUL ENTRY", "OVER $500"… ## $ `Location Description` <chr> "RESIDENCE", "OTHER", "RESIDENCE", "HOTEL/MOTEL… ## $ Arrest <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE… ## $ Domestic <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE… ## $ Beat <chr> "2222", "0835", "0313", "0122", "1033", "1432",… ## $ District <chr> "022", "008", "003", "001", "010", "014", "001"… ## $ Ward <dbl> 21, 18, 20, 42, 12, 32, 2, 8, 39, 40, 8, 44, 47… ## $ `Community Area` <dbl> 73, 70, 42, 32, 30, 22, 32, 44, 13, 1, 46, 6, 4… ## $ `FBI Code` <chr> "02", "05", "06", "02", "02", "06", "11", "14",… ## $ `X Coordinate` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,… ## $ `Y Coordinate` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,… ## $ Year <dbl> 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017,… ## $ `Updated On` <chr> "02/11/2018 03:57:41 PM", "02/11/2018 03:57:41 … ## $ Latitude <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,… ## $ Longitude <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,… ## $ Location <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,… ``` --- ## Plot high-level map of crime .panelset[ .panel[.panel-name[Code] ``` r *ggmap(chicago) ``` ] .panel[.panel-name[Plot] <img src="index_files/figure-html/import-chicago-1.png" width="70%" style="display: block; margin: auto;" /> ] ] --- ## Using `geom_point()` .panelset[ .panel[.panel-name[Code] ``` r ggmap(chicago) + * geom_point( * data = crimes, * mapping = aes( * x = Longitude, * y = Latitude * ) * ) ``` ] .panel[.panel-name[Plot] <img src="index_files/figure-html/plot-crime-point-1.png" width="70%" style="display: block; margin: auto;" /> ] ] --- ## Using `geom_point()` .panelset[ .panel[.panel-name[Code] ``` r ggmap(chicago) + geom_point( data = crimes, mapping = aes( x = Longitude, y = Latitude ), * size = .25, * alpha = .01 ) ``` ] .panel[.panel-name[Plot] <img src="index_files/figure-html/plot-crime-point-alpha-1.png" width="70%" style="display: block; margin: auto;" /> ] ] --- ## Using `geom_density_2d()` .panelset[ .panel[.panel-name[Code] ``` r ggmap(chicago) + * geom_density_2d( data = crimes, mapping = aes( x = Longitude, y = Latitude ) ) ``` ] .panel[.panel-name[Plot] <img src="index_files/figure-html/kde-contour-1.png" width="70%" style="display: block; margin: auto;" /> ] ] --- ## Using `stat_density_2d()` .panelset[ .panel[.panel-name[Code] ``` r ggmap(chicago) + * stat_density_2d( data = crimes, mapping = aes( x = Longitude, y = Latitude, * fill = after_stat(level) ), * geom = "polygon" ) ``` ] .panel[.panel-name[Plot] <img src="index_files/figure-html/kde-fill-1.png" width="70%" style="display: block; margin: auto;" /> ] ] https://ggplot2.tidyverse.org/reference/geom_density_2d.html --- ## Using `stat_density_2d()` .panelset[ .panel[.panel-name[Code] ``` r ggmap(chicago) + stat_density_2d( data = crimes, mapping = aes( x = Longitude, y = Latitude, fill = after_stat(level) ), * alpha = .2, * bins = 25, geom = "polygon" ) ``` ] .panel[.panel-name[Plot] <img src="index_files/figure-html/plot-crime-density-1.png" width="70%" style="display: block; margin: auto;" /> ] ] --- ## Using `stat_density_2d()` but make it ~**FANCY**~ .panelset[ .panel[.panel-name[Code] ``` r ggmap(chicago) + stat_density_2d( data = crimes, mapping = aes( x = Longitude, y = Latitude, fill = after_stat(level) ), alpha = .2, bins = 25, geom = "polygon" ) + scale_fill_viridis_c(option = "inferno") ``` ] .panel[.panel-name[Plot] <img src="index_files/figure-html/plot-crime-density-f-1.png" width="70%" style="display: block; margin: auto;" /> ] ] --- ## Looking for variation .panelset[ .panel[.panel-name[Code] ``` r ggmap(chicago) + stat_density_2d( data = crimes %>% filter(`Primary Type` %in% c( "BURGLARY", "MOTOR VEHICLE THEFT", "NARCOTICS", "ROBBERY" )), mapping = aes( x = Longitude, y = Latitude, fill = after_stat(level) ), alpha = .4, bins = 10, geom = "polygon" ) + * facet_wrap(facets = vars(`Primary Type`), labeller = label_wrap_gen(10)) ``` ] .panel[.panel-name[Plot] <img src="index_files/figure-html/plot-crime-wday-1.png" width="70%" style="display: block; margin: auto;" /> ] ] --- # API keys: what are they and how to store? --- ## API Keys APIs: application programming interface. They **provide the opportunity for users to respectfully access data in a structured format**. You can think of like when you order through an app -- you can check off the items you want and then you receive them in a tidy package. This is the same idea here -- you can choose from a menu of data options (e.g. variables) and then you receive them in a structured format (maybe JSON, maybe for mapping, etc.). --- ## How to store API keys: * In document: e.g. register_stadiamaps(apikey) * In RProfile * In R envion For more: https://cran.r-project.org/web/packages/httr/vignettes/secrets.html .task[code: `file.edit("~/.Renviron")`] The file looks something like `VAR1 = value1` `VAR2 = value2` And you can access the values in R using `Sys.getenv():` .task[ `Sys.getenv("VAR1")`] This would then be used, something like: `apikey <- Sys.getenv("VAR1")` --- ## Exercise using `ggmap`: needs API key - City of Chicago data portal - 311 service requests - API key: https://docs.stadiamaps.com/tutorials/getting-started-in-r-with-ggmap/#set-up-api-key-authentication ```r # install ggmap if necessary # install.packages("ggmap") usethis::use_course("MACS40700/spatial") ```
--- class: middle, inverse # Geofaceting --- <img src="index_files/figure-html/geofacet-state-1.png" width="90%" style="display: block; margin: auto;" /> --- ## Daily US vaccine data by state .small[ ``` r us_state_vaccinations <- read_csv(here::here("data/us_state_vaccinations.csv")) ``` ``` r glimpse(us_state_vaccinations) ``` ``` ## Rows: 30,692 ## Columns: 16 ## $ date <date> 2021-01-12, 2021-01-13, 2021-01-1… ## $ location <chr> "Alabama", "Alabama", "Alabama", "… ## $ total_vaccinations <dbl> 78134, 84040, 92300, 100567, NA, N… ## $ total_distributed <dbl> 377025, 378975, 435350, 444650, NA… ## $ people_vaccinated <dbl> 70861, 74792, 80480, 86956, NA, NA… ## $ people_fully_vaccinated_per_hundred <dbl> 0.15, 0.19, NA, 0.28, NA, NA, NA, … ## $ total_vaccinations_per_hundred <dbl> 1.59, 1.71, 1.88, 2.05, NA, NA, NA… ## $ people_fully_vaccinated <dbl> 7270, 9245, NA, 13488, NA, NA, NA,… ## $ people_vaccinated_per_hundred <dbl> 1.45, 1.53, 1.64, 1.77, NA, NA, NA… ## $ distributed_per_hundred <dbl> 7.69, 7.73, 8.88, 9.07, NA, NA, NA… ## $ daily_vaccinations_raw <dbl> NA, 5906, 8260, 8267, NA, NA, NA, … ## $ daily_vaccinations <dbl> NA, 5906, 7083, 7478, 7498, 7509, … ## $ daily_vaccinations_per_million <dbl> NA, 1205, 1445, 1525, 1529, 1531, … ## $ share_doses_used <dbl> 0.207, 0.222, 0.212, 0.226, NA, NA… ## $ total_boosters <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA… ## $ total_boosters_per_hundred <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA… ``` ] .footnote[ Source: https://ourworldindata.org/us-states-vaccinations ] --- ## Facet by location .panelset.sideways[ .panel[.panel-name[Code] ``` r ggplot( us_state_vaccinations, aes(x = date, y = people_fully_vaccinated_per_hundred) ) + geom_area() + * facet_wrap(facets = vars(location)) ``` ] .panel[.panel-name[Plot] <img src="index_files/figure-html/unnamed-chunk-16-1.png" width="95%" style="display: block; margin: auto;" /> ] ] --- ## Data cleaning ``` r us_state_vaccinations <- us_state_vaccinations %>% mutate(location = if_else(location == "New York State", "New York", location)) %>% filter(location %in% c(state.name, "District of Columbia")) ``` --- ## Geofacet by state Using `geofacet::facet_geo()`: .panelset.sideways[ .panel[.panel-name[Code] ``` r ggplot(us_state_vaccinations, aes(x = date, y = people_fully_vaccinated_per_hundred)) + geom_area() + * facet_geo(facets = vars(location)) + labs( x = NULL, y = NULL, title = "Covid-19 vaccination rate in the US", subtitle = "Daily number of people fully vaccinated, per hundred", caption = "Source: Our World in Data" ) ``` ] .panel[.panel-name[Plot] <img src="index_files/figure-html/unnamed-chunk-18-1.png" width="95%" style="display: block; margin: auto;" /> ] ] --- ## Geofacet by state, with improvements .panelset.sideways[ .panel[.panel-name[Plot] <img src="index_files/figure-html/unnamed-chunk-19-1.png" width="95%" style="display: block; margin: auto;" /> ] .panel[.panel-name[Code] .small[ ``` r ggplot(us_state_vaccinations, aes(x = date, y = people_fully_vaccinated_per_hundred, group = location)) + geom_area() + facet_geo(facets = vars(location)) + * scale_y_continuous( * limits = c(0, 100), * breaks = c(0, 50, 100), * minor_breaks = c(25, 75) * ) + * scale_x_date(breaks = c(ymd("2021-01-01", "2021-07-01", "2022-01-01")), date_labels = "%b-%y") + labs( x = NULL, y = NULL, title = "Covid-19 vaccination rate in the US", subtitle = "Daily number of people fully vaccinated, per hundred", caption = "Source: Our World in Data" ) + theme( * strip.text.x = element_text(size = 7), * axis.text = element_text(size = 8), plot.title.position = "plot" ) ``` ] ] ] --- ## Bring in 2020 Presidential election results ``` r election_2020 <- read_csv(here::here("data/us-election-2020.csv")) ``` ``` r election_2020 ``` ``` ## # A tibble: 51 × 5 ## state electoal_votes biden trump win ## <chr> <dbl> <dbl> <dbl> <chr> ## 1 Alabama 9 0 9 Republican ## 2 Alaska 3 0 3 Republican ## 3 Arizona 11 11 0 Democrat ## 4 Arkansas 6 0 6 Republican ## 5 California 55 55 0 Democrat ## 6 Colorado 9 9 0 Democrat ## 7 Connecticut 7 7 0 Democrat ## 8 Delaware 3 3 0 Democrat ## 9 District of Columbia 3 3 0 Democrat ## 10 Florida 29 0 29 Republican ## # ℹ 41 more rows ``` --- ## Geofacet by state, color by presidential election result .tiny[ .panelset.sideways[ .panel[.panel-name[Code] ``` r us_state_vaccinations %>% left_join(election_2020, by = c("location" = "state")) %>% ggplot(aes(x = date, y = people_fully_vaccinated_per_hundred)) + * geom_area(aes(fill = win)) + facet_geo(facets = vars(location)) + * scale_y_continuous(limits = c(0, 100), breaks = c(0, 50, 100), minor_breaks = c(25, 75)) + scale_x_date(breaks = c(ymd("2021-01-01", "2021-07-01", "2022-01-01")), date_labels = "%b") + * scale_fill_manual(values = c("#2D69A1", "#BD3028")) + labs( x = NULL, y = NULL, title = "Covid-19 vaccination rate in the US", subtitle = "Daily number of people fully vaccinated, per hundred", caption = "Source: Our World in Data", fill = "2020 Presidential\nElection" ) + theme( strip.text.x = element_text(size = 7), axis.text = element_text(size = 8), plot.title.position = "plot", * legend.position = c(0.93, 0.15), * legend.text = element_text(size = 9), * legend.title = element_text(size = 11), * legend.background = element_rect(color = "gray", size = 0.5) ) ``` ] .panel[.panel-name[Plot] <img src="index_files/figure-html/unnamed-chunk-24-1.png" width="95%" style="display: block; margin: auto;" /> ] ] ] --- # Leaflet vs ggmap * ggmap connects to google open street data * leaflet has interactive factor --- # Recap * Lots we can do spatially * Maps can require a LOT and options vary / update all the time