class: center, middle, inverse, title-slide .title[ # Deep dive: coordinates + facets ] .author[ ### MACS 40700
University of Chicago ] --- class: middle, inverse # Agenda: * Coordinate systems * Aspect ratios * 🙅♀️ Pie charts * Coord maps * Themes --- ## Announcements * Grading up-to-date * Feedback: formatting, technical details * A2 coming up! (Jan 23) --- ## Setup .small[ ``` r # load packages library(tidyverse) library(knitr) library(openintro) library(palmerpenguins) library(ggrepel) library(ggwaffle) #devtools::install_github("liamgilbey/ggwaffle") library(broom) # set default theme for ggplot2 ggplot2::theme_set(ggplot2::theme_minimal(base_size = 16)) # set default figure parameters for knitr knitr::opts_chunk$set( fig.width = 8, fig.asp = 0.618, fig.retina = 2, dpi = 150, out.width = "60%" ) # dplyr print min and max options(dplyr.print_max = 6, dplyr.print_min = 6) ``` ] --- class: middle, inverse # Coordinate systems --- ## Coordinate systems: purpose - Combine the two position aesthetics (`x` and `y`) to produce a 2d position on the plot: - linear coordinate system: horizontal and vertical coordinates - polar coordinate system: angle and radius - maps: latitude and longitude - Draw axes and panel backgrounds in coordination with the faceter coordinate systems --- ## Coordinate systems: types 1. **Linear coordinate systems:** preserve the shape of geoms -- 2. **Non-linear coordinate systems:** can change the shapes -- a straight line may no longer be straight. The closest distance between two points may no longer be a straight line. --- ## Coordinate systems: Linear **Linear coordinate systems:** preserve the shape of geoms - `coord_cartesian()`: the default Cartesian coordinate system, where the 2d position of an element is given by the combination of the x and y positions. - `coord_flip()`: Cartesian coordinate system with x and y axes flipped *(won't be using much now that geoms can take aesthetic mappings in x and y axes)* - `coord_fixed()`: Cartesian coordinate system with a fixed aspect ratio. *(useful only in limited circumstances)* --- ## Coordinate systems: Non-linear **Non-linear coordinate systems:** can change the shapes -- a straight line may no longer be straight. The closest distance between two points may no longer be a straight line. - `coord_trans()`: Apply arbitrary transformations to x and y positions, after the data has been processed by the stat - `coord_polar()`: Polar coordinates - `coord_map()` / `coord_quickmap()` / `coord_sf()`: Map projections --- ## Setting limits: what the plots say Identify the differences: focus on the range of the `x` and `y` axes and the plots.
.tiny[ ``` r p <- ggplot(palmerpenguins::penguins, aes(x = flipper_length_mm, y = body_mass_g, color = species)) + geom_point() + geom_smooth() p + labs(title = "Plot 1") p + scale_x_continuous(limits = c(190, 220)) + scale_y_continuous(limits = c(4000, 5000)) + labs(title = "Plot 2") p + xlim(190, 220) + ylim(4000, 5000) + labs(title = "Plot 3") p + coord_cartesian(xlim = c(190,220), ylim = c(4000, 5000)) + labs(title = "Plot 4") ``` <img src="index_files/figure-html/set-limits-1.png" width="20%" /><img src="index_files/figure-html/set-limits-2.png" width="20%" /><img src="index_files/figure-html/set-limits-3.png" width="20%" /><img src="index_files/figure-html/set-limits-4.png" width="20%" /> ] --- ## Setting limits: what the plots say (plot closeup) <img src="index_files/figure-html/unnamed-chunk-3-1.png" width="35%" /><img src="index_files/figure-html/unnamed-chunk-3-2.png" width="35%" /><img src="index_files/figure-html/unnamed-chunk-3-3.png" width="35%" /><img src="index_files/figure-html/unnamed-chunk-3-4.png" width="35%" /> --- ## Setting limits: what the warnings say .tiny[ ``` r p <- ggplot(palmerpenguins::penguins, aes(x = flipper_length_mm, y = body_mass_g, color = species)) + geom_point() + geom_smooth() p + labs(title = "Plot 1") ## Warning: Removed 2 rows containing non-finite outside the scale range ## (`stat_smooth()`). ## Warning: Removed 2 rows containing missing values or values outside the scale range ## (`geom_point()`). p + scale_x_continuous(limits = c(190, 220)) + scale_y_continuous(limits = c(4000, 5000)) + labs(title = "Plot 2") ## Warning: Removed 235 rows containing non-finite outside the scale range ## (`stat_smooth()`). ## Warning: Removed 235 rows containing missing values or values outside the scale range ## (`geom_point()`). p + xlim(190, 220) + ylim(4000, 5000) + labs(title = "Plot 3") ## Warning: Removed 235 rows containing non-finite outside the scale range ## (`stat_smooth()`). ## Removed 235 rows containing missing values or values outside the scale range ## (`geom_point()`). p + coord_cartesian(xlim = c(190,220), ylim = c(4000, 5000)) + labs(title = "Plot 4") ## Warning: Removed 2 rows containing non-finite outside the scale range ## (`stat_smooth()`). ## Warning: Removed 2 rows containing missing values or values outside the scale range ## (`geom_point()`). ``` ] --- ## Setting limits - Setting scale limits: Any data outside the limits is thrown away - `scale_*_continuous()`, `xlim` and `ylim` arguments - `xlim()` and `ylim()` - Setting coordinate system limits: Use all the data, but only display a small region of the plot (zooming in) - `coord_cartesian()`, `xlim` and `ylim` arguments --- # DIDN'T WE SHOW THE OPPOSITE LAST WEEK?!! .panelset[ .panel[.panel-name[Breaks] .pull-left[ ``` r library(c3s2datasets) ggplot(scorecard, aes(x = netcost, y = avgfacsal)) + geom_point(alpha = 0.5) + scale_x_continuous( name = "Net cost of attendance", breaks = seq(from = 0, to = 30000, by = 10000) ) ``` ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-5-1.png" width="90%" /> ] ] .panel[.panel-name[Limits] .pull-left[ ``` r library(c3s2datasets) ggplot(scorecard, aes(x = netcost, y = avgfacsal)) + geom_point(alpha = 0.5) + scale_x_continuous( name = "Net cost of attendance" ) + xlim(0,30000) ``` ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-6-1.png" width="90%" /> ] ] ] --- ## Fixing aspect ratio with `coord_fixed()` Useful when having an aspect ratio of 1 makes sense, e.g. scores on two tests (reading and writing) on the same scale (0 to 100 points) .tiny[ ``` r ggplot(hsb2, aes(x = read, y = write)) + geom_point() + geom_smooth(method = "lm") + geom_abline(intercept = 0, slope = 1, linetype = "dashed", color = "gray") + labs(title = "Not fixed") ggplot(hsb2, aes(x = read, y = write)) + geom_point() + geom_smooth(method = "lm") + geom_abline(intercept = 0, slope = 1, linetype = "dashed", color = "gray") + coord_fixed() + labs(title = "Fixed") ``` <img src="index_files/figure-html/unnamed-chunk-7-1.png" width="20%" /><img src="index_files/figure-html/unnamed-chunk-7-2.png" width="20%" /> ] --- # Plot closeup <img src="index_files/figure-html/unnamed-chunk-8-1.png" width="50%" /><img src="index_files/figure-html/unnamed-chunk-8-2.png" width="50%" /> --- ## Fixing aspect ratio with `tune::coord_obs_pred()` .panelset[ .panel[.panel-name[Code] ``` r data(solubility_test, package = "modeldata") p <- ggplot(solubility_test, aes(x = solubility, y = prediction)) + geom_abline(lty = 2) + geom_point(alpha = 0.5) ``` ] .panel[.panel-name[`coord_cartesian()`] ``` r p + coord_cartesian() ``` <img src="index_files/figure-html/pred-default-1.png" width="60%" /> ] .panel[.panel-name[`coord_fixed()`] ``` r p + coord_fixed() ``` <img src="index_files/figure-html/pred-fixed-1.png" width="60%" /> ] .panel[.panel-name[`coord_obs_pred()`] ``` r p + tune::coord_obs_pred() ``` <img src="index_files/figure-html/pred-tune-1.png" width="60%" /> ] ] --- ## Transformations .tiny[ ``` r p<- ggplot(penguins, aes(x = bill_depth_mm, y = body_mass_g)) + geom_point() + geom_smooth(method = "lm") p + labs(title = "Plot 1") p + scale_x_log10() + scale_y_log10() + labs(title = "Plot 2") p + coord_trans(x = "log10", y = "log10") + labs(title = "Plot 3") *ggplot(penguins, aes(x = log(bill_depth_mm, base = 10), y = log(body_mass_g, base = 10))) + geom_point() + geom_smooth(method = "lm") + labs(title = "Plot 4") ``` <img src="index_files/figure-html/transformations-1.png" width="25%" /><img src="index_files/figure-html/transformations-2.png" width="25%" /><img src="index_files/figure-html/transformations-3.png" width="25%" /><img src="index_files/figure-html/transformations-4.png" width="25%" /> ] --- ## Pie charts and bullseye charts with `coord_polar()` .small[ ``` r ggplot(penguins, aes(x = 1, fill = species)) + geom_bar() + labs(title = "Stacked bar chart") ggplot(penguins, aes(x = 1, fill = species)) + geom_bar() + * coord_polar(theta = "y") + labs(title = "Pie chart") ggplot(penguins, aes(x = 1, fill = species)) + geom_bar() + * coord_polar(theta = "x") + labs(title = "Bullseye chart") ``` <img src="index_files/figure-html/unnamed-chunk-9-1.png" width="30%" /><img src="index_files/figure-html/unnamed-chunk-9-2.png" width="30%" /><img src="index_files/figure-html/unnamed-chunk-9-3.png" width="30%" /> ] --- class: middle, inverse # aside: about pie charts... --- ## Pie charts .task[ What do you know about pie charts and data visualization best practices? Love 'em or lose 'em? ] <img src="index_files/figure-html/unnamed-chunk-10-1.png" width="45%" /><img src="index_files/figure-html/unnamed-chunk-10-2.png" width="45%" /> --- ## Pie charts: when to love 'em, when to lose 'em .pull-left-narrow[ ❤️ For categorical variables with few levels, pie charts can sort of work well ] .pull-right-wide[ <img src="index_files/figure-html/unnamed-chunk-11-1.png" width="50%" /><img src="index_files/figure-html/unnamed-chunk-11-2.png" width="50%" /> ] --- ## Pie charts: when to love 'em, when to lose 'em .pull-left-narrow[ 💔 For categorical variables with many levels, pie charts are difficult to read ] .pull-right-wide[ <img src="index_files/figure-html/unnamed-chunk-12-1.png" width="50%" /><img src="index_files/figure-html/unnamed-chunk-12-2.png" width="50%" /> ] --- ## Waffle charts - Like with pie charts, work best when the number of levels represented is low - Unlike pie charts, easier to compare proportions that represent non-simple fractions .panelset[ .panel[.panel-name[Homeownership - Raw] <img src="index_files/figure-html/unnamed-chunk-13-1.png" width="50%" /> ] .panel[.panel-name[Homeownership - Transformed] <img src="index_files/figure-html/unnamed-chunk-14-1.png" width="50%" /> ] .panel[.panel-name[Loan Status - Raw] <img src="index_files/figure-html/unnamed-chunk-15-1.png" width="50%" /> ] .panel[.panel-name[Loan Status - Transformed] <img src="index_files/figure-html/unnamed-chunk-16-1.png" width="50%" /> ] ] --- ## Waffle charts: making of .panelset[ .panel[.panel-name[Code] ``` r *library(ggwaffle) penguins %>% waffle_iron(aes_d(group = species)) %>% ggplot(aes(x, y, fill = group)) + * geom_waffle() + labs(fill = NULL, title = "Penguin species") ``` ] .panel[.panel-name[Plot] <img src="index_files/figure-html/waffle-penguin-1.png" width="60%" /> ] ] --- ## Waffle charts: enhanced theme .panelset[ .panel[.panel-name[Code] ``` r penguins %>% waffle_iron(aes_d(group = species), rows = 10) %>% ggplot(aes(x, y, fill = group)) + geom_waffle() + labs(fill = NULL, title = "Penguin species") + * coord_equal() + * scale_fill_waffle() + * theme_waffle() ``` ] .panel[.panel-name[Plot] <img src="index_files/figure-html/waffle-penguin-theme-1.png" width="60%" /> ] ] --- class: middle, inverse # back to coordinate systems... --- ## `coord_quickmap()` .pull-left[ - Approximation sets aspect ratio: 1m of latitude & 1m of longitude are same distance in middle of plot - Reasonable for smaller regions - Fast ] .pull-right[ .panelset[ .panel[.panel-name[Cartesian] ``` r ggplot(map_data("italy"), aes(long, lat, group = group)) + geom_polygon(fill = "white", color = "#008c45") +labs(x = NULL, y = NULL) ``` <img src="index_files/figure-html/unnamed-chunk-19-1.png" width="80%" /> ] .panel[.panel-name[Quickmap] ``` r ggplot(map_data("italy"), aes(long, lat, group = group)) + geom_polygon(fill = "white", color = "#008c45") + labs(x = NULL, y = NULL) + coord_quickmap() ``` <img src="index_files/figure-html/unnamed-chunk-20-1.png" width="80%" /> ] ] ] --- ## `coord_map()` - Uses the [**mapproj**](https://cran.r-project.org/package=mapproj) package - Uses [Mercator projection](https://en.wikipedia.org/wiki/Mercator_projection) by default, with many other options via `mapproj::mapproject() ` - Slower than `coord_quickmap()` --- ## `coord_map()` .panelset[ .panel[.panel-name[Cartesian] ``` r ggplot(map_data("state"), aes(long, lat, group = group)) + geom_polygon(fill = "white", color = "#3c3b6e") + labs(x = NULL, y = NULL) ``` <img src="index_files/figure-html/unnamed-chunk-21-1.png" width="45%" /> ] .panel[.panel-name[Mercator] ``` r ggplot(map_data("state"), aes(long, lat, group = group)) + geom_polygon(fill = "white", color = "#3c3b6e") + labs(x = NULL, y = NULL) + coord_map() ``` <img src="index_files/figure-html/unnamed-chunk-22-1.png" width="45%" /> ] .panel[.panel-name[Stereographic] ``` r ggplot(map_data("state"), aes(long, lat, group = group)) + geom_polygon(fill = "white", color = "#3c3b6e") + labs(x = NULL, y = NULL) + coord_map(projection = "stereographic") ``` <img src="index_files/figure-html/unnamed-chunk-23-1.png" width="45%" /> ] ] --- ## Try it out! .task[Create a map of a region or state you have visited] -- ``` r ggplot(map_data("county", "iowa"), aes(long, lat, group = group)) + geom_polygon(fill = "gray75", color = "black") + labs(x = NULL, y = NULL) + coord_map() ``` <img src="index_files/figure-html/unnamed-chunk-24-1.png" width="60%" /> --- class: middle, inverse # Facets: what if you want to break it down into sub-plots? --- ## `facet_*()` .pull-left[ - `facet_wrap()` - "wraps" a 1d ribbon of panels into 2d - generally for faceting by a single variable - `facet_grid()` for faceting - produces a 2d grid of panels defined by variables which form the rows and columns - generally for faceting by two variables - `facet_null()`: a single plot, the default ] .pull-right[ <img src="images/grammar-of-graphics.png" width="60%" /> ] --- ## Free the scales! .small[ ``` r p <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) + geom_point() p + facet_wrap(vars(species)) + labs(title = "Same scales") p + facet_wrap(vars(species), scales = "free") + labs(title = "Free scales") ``` <img src="index_files/figure-html/free-all-scales-1.png" width="50%" /><img src="index_files/figure-html/free-all-scales-2.png" width="50%" /> ] --- ## Free some scales .small[ ``` r p + facet_wrap(vars(species), scales = "free_x") + labs(title = "Free x scale") p + facet_wrap(vars(species), scales = "free_y") + labs(title = "Free y scale") ``` <img src="index_files/figure-html/free-some-scales-1.png" width="50%" /><img src="index_files/figure-html/free-some-scales-2.png" width="50%" /> ] --- .task[ Freeing the y scale improves the display, but it's still not satisfying. What's wrong with it? ] .small[ ``` r ggplot(penguins, aes(y = species, x = body_mass_g, fill = species)) + geom_boxplot(show.legend = FALSE) + facet_grid(island ~ .) + labs(title = "Same scale and spacing") ggplot(penguins, aes(y = species, x = body_mass_g, fill = species)) + geom_boxplot(show.legend = FALSE) + facet_grid(island ~ ., scales = "free_y") + labs(title = "Free y scale, same spacing") ``` <img src="index_files/figure-html/free-scales-not-spaces-1.png" width="50%" /><img src="index_files/figure-html/free-scales-not-spaces-2.png" width="50%" /> ] --- ## Free spaces ``` r ggplot(penguins, aes(y = species, x = body_mass_g, fill = species)) + geom_boxplot(show.legend = FALSE) + facet_grid(island ~ ., scales = "free_y", space = "free") + labs(title = "Free y scale and spacing") ``` <img src="index_files/figure-html/free-spaces-1.png" width="60%" /> --- ## Highlighting across facets: Option 1 ``` r penguins_sans_species <- penguins %>% select(-species) ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) + geom_point(data = penguins_sans_species, color = "gray") + geom_point(aes(color = species)) + facet_wrap(vars(species)) ``` <img src="index_files/figure-html/unnamed-chunk-26-1.png" width="50%" /> --- ## Highlighting across facets: Option 2 ``` r ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) + #geom_point(data = penguins_sans_species, color = "gray") + geom_point(aes(color = species)) + facet_wrap(vars(species)) + gghighlight::gghighlight() ``` <img src="index_files/figure-html/unnamed-chunk-27-1.png" width="50%" /> --- # Discussion * When might it (not) be appropriate to use the following: * facets * free x * free y * free space * highlighting --- # Discussion: TS .pull-left[ <img src="images/ts_sentiment_arc.png" width="95%" /> ] .pull-right[ <img src="images/ts_sentiment_arc_facet.png" width="95%" /> ] --- class: middle, inverse # Themes --- ## Complete themes ``` r p + theme_gray() + labs(title = "Gray") p + theme_void() + labs(title = "Void") p + theme_dark() + labs(title = "Dark") ``` <img src="index_files/figure-html/unnamed-chunk-30-1.png" width="33%" /><img src="index_files/figure-html/unnamed-chunk-30-2.png" width="33%" /><img src="index_files/figure-html/unnamed-chunk-30-3.png" width="33%" /> --- ## Themes from ggthemes ``` r library(ggthemes) p + theme_fivethirtyeight() + labs(title = "FiveThirtyEight") p + theme_economist() + labs(title = "Economist") p + theme_wsj() + labs(title = "Wall Street Journal") ``` <img src="index_files/figure-html/unnamed-chunk-31-1.png" width="30%" /><img src="index_files/figure-html/unnamed-chunk-31-2.png" width="30%" /><img src="index_files/figure-html/unnamed-chunk-31-3.png" width="30%" /> --- ## Themes and color scales from ggthemes ``` r library(ggthemes) p + aes(color = species) + scale_color_wsj() + theme_wsj() + labs(title = "Wall Street Journal") ``` <img src="index_files/figure-html/unnamed-chunk-32-1.png" width="40%" /> --- ## Modifying theme elements ``` r p + labs(title = "Palmer penguins") + theme( plot.title = element_text(color = "red", face = "bold", family = "Comic Sans MS"), plot.background = element_rect(color = "red", fill = "mistyrose") ) ``` <img src="index_files/figure-html/unnamed-chunk-33-1.png" width="60%" /> --- # Custom themes! * Can tweak existing * Can google for your own (e.g. topic + ggplot) * Example: [TSwift custom themes](https://taylor.wjakethompson.com/articles/plotting) * [Color brewer](https://r-graph-gallery.com/38-rcolorbrewers-palettes.html) --- # examples: starting with TS and blues ``` r library(taylor) library(ggplot2) library(RColorBrewer) reputation <- subset(taylor_album_songs, album_name == "reputation") reputation$track_name <- factor(reputation$track_name, levels = reputation$track_name) # Define the number of colors you want nb.cols <- 17 blues17 <- colorRampPalette(brewer.pal(8, "Blues"))(nb.cols) ``` .footnote[[source](https://www.datanovia.com/en/blog/easy-way-to-expand-color-palettes-in-r/)] --- # Same plot, different ways! .panelset[ .panel[.panel-name[Blues Code] ``` r p <- ggplot(reputation, aes(x = valence, y = track_name, fill = track_name)) + geom_col(show.legend = FALSE) + expand_limits(x = c(0, 1)) + labs(y = NULL) + theme_minimal() + scale_fill_manual(values = blues17) p ``` ] .panel[.panel-name[Blues Plot] <img src="index_files/figure-html/blues-1.png" width="60%" /> ] .panel[.panel-name[Custom Code] ``` r p + scale_fill_taylor_d(album = "reputation") ``` ] .panel[.panel-name[Custom Plot] <img src="index_files/figure-html/tv-1.png" width="60%" /> ] ] --- # Other options: alternate themes * https://emilhvitfeldt.github.io/paletteer/ * https://matthewbjane.github.io/ThemePark/ * https://datavizs24.classes.andrewheiss.com/resource/colors.html --- # Recap * Coord systems: can change to linear/nonlinear * Coord_polar also interesting * Geographic layer possible * Themes: delightful