class: center, middle, inverse, title-slide .title[ # Introduction to data visualization ] .author[ ### MACS 40700
University of Chicago ] --- class: middle, inverse # Course Details --- ## Teaching team ### Instructor * Jean Clipperton - clipperton@uchicago.edu --- ## Themes: what, why, and how - **What:** the plot - Specific types of visualizations for a particular purpose (e.g., maps for spatial data, Sankey diagrams for proportions, etc.) - Tooling to produce them (e.g., specific R packages) -- - **How:** the process - Start with a design (sketch + pseudo code) - Pre-process data (e.g., wrangle, reshape, join, etc.) - Map data to aesthetics - Make visual encoding decisions (e.g., address accessibility concerns) - Post-process for visual appeal and annotation -- - **Why:** the theory - Tie together "how" and "what" through the grammar of graphics --- class: middle, inverse # Course components --- ## Course website .center[.large[[https://macs40700.netlify.app/](https://macs40700.netlify.app/)]] .center[ <iframe width="900" height="450" src="https://macs40700.netlify.app/" frameborder="0" style="background:white;"></iframe> ] --- ## Lectures - Build on readings - Attendance *and engagement* expected - A little bit of everything: - Traditional lecture - Live coding + demos - Short exercises + solution discussion --- ## Announcements - Posted on Ed, be sure to check regularly - I'll assume that you've read an announcement by the next "business" day .center[ <iframe width="900" height="450" src="https://edstem.org/us/courses/70315/discussion/" frameborder="0" style="background:white;"></iframe> ] --- class: middle, inverse # Assessments --- ## Assessments - Homework assignments - Accessed on GitHub, submitted on Canvas, individual - Final Project - Accessed on GitHub, submitted on Canvas, individual or team-based --- ## Teams: UP TO YOU - Final project - You can opt in to group work OR work independently - Team-based submissions may be up to three people and each person must clearly explain their contributions to the project both descriptively and within a % (e.g. 
I did x while my partner did y and I contributed z% to the project (but more detail!)). While the percentages don't have to match exactly, they should be in the same general ballpark.
- Expectations and roles
  - Everyone is expected to contribute equal *effort*
  - Everyone is expected to understand *all* code turned in
  - Individual contribution evaluated by peer evaluation, commits, etc.

---

## Grading

|Assignment   |Type           |Value  | n        |Due                  |
|:------------|:--------------|:------|:---------|---------------------|
|Assignments  |Individual-ish |60%    | 5*       | ~ Every other week  |
|Final choice |Choice         |40%    | 1 .fn[*] | Exam week           |

.footnote[[*] check-ins and proposal are part of this grade]

---

class: middle, inverse

# Course policies

---

## Collaboration policy

- Only work that is clearly designated as team work should be completed collaboratively (Projects)
- Homework assignments must be completed individually. You may not directly share answers / code with others; however, you are welcome to discuss the problems in general and ask for advice.

---

## Sharing / reusing code policy

- A huge volume of code is available on the web, and many tasks may have solutions posted
- Unless explicitly stated otherwise, this course's policy is that you may make use of any online resources (e.g. RStudio Community, StackOverflow, etc.) but you must explicitly cite where you obtained any code you directly use or use as inspiration in your solution(s).
- Any recycled code that is discovered and is not explicitly cited will be treated as plagiarism, regardless of its source
- AI: if you use AI, you need to include a statement about what you asked, your original code, and the issues you fixed / resolved.
--

.task[**If you don't understand what the code is doing and are not prepared to explain it in detail, you should not submit it.**]

---

class: middle, inverse

# Course Tools

---

## RStudio

- Local R installations
- [Software setup instructions](https://macs40700.netlify.app/setup/#option-2---install-the-software-locally)

---

## GitHub

.center[.large[https://classroom.github.com/a/4G-lKClM]]

- GitHub classroom for the course
- All of your work and your membership (enrollment) in the organization are private
- Each assignment is a private repo on GitHub; I distribute the assignments on GitHub and you push your work there (then submit the repo link on Canvas)
- Feedback on assignments is in Canvas

---

## Username advice

.task[in case you don't yet have a GitHub account...]

Some brief advice about selecting your account names (particularly for GitHub):

- Incorporate your actual name! People like to know who they’re dealing with, and it makes your username easier for people to guess or remember
- Reuse your username from other PROFESSIONAL contexts, e.g., Twitter or Slack
- Pick a username you will be comfortable revealing to your future boss
- Shorter is better than longer, but be as unique as possible
- Make it timeless. Avoid highlighting your current university, employer, place of residence, or a year (birth, graduation, etc.)

---

## Ed Discussions

.center[.midi[https://edstem.org/us/courses/91046]]

<br>

- Online forum for asking and answering questions
- Allows for code snippets
- Connected to Canvas
- Ask **and answer** questions related to course logistics, assignments, etc. here
- Personal questions (e.g., extensions, illnesses, specific code, etc.) should be sent via private message

---

# Workflow

Here is your workflow for class:

* Start on the website: read over the assignment description.
* Click the github link to accept the assignment: this will create a repo with the proper format/setup for the assignment.
The permissions are set so that it will be private to all except you and me/the TA * Connect to the github repo and push your work to it * When complete, go to Canvas, submit the github repo link. (we can't push grades from github to Canvas, unfortunately) --- class: center middle inverse # Data, truth, and beauty --- # Just show me the data! -- .pull-left[ ``` r head(my_data, 10) ``` ``` ## # A tibble: 10 × 2 ## x y ## <dbl> <dbl> ## 1 55.4 97.2 ## 2 51.5 96.0 ## 3 46.2 94.5 ## 4 42.8 91.4 ## 5 40.8 88.3 ## 6 38.7 84.9 ## 7 35.6 79.9 ## 8 33.1 77.6 ## 9 29.0 74.5 ## 10 26.2 71.4 ``` ] -- .pull-right[ ``` r mean(my_data$x) ``` ``` *## [1] 54.26327 ``` ``` r mean(my_data$y) ``` ``` *## [1] 47.83225 ``` ``` r cor(my_data$x, my_data$y) ``` ``` *## [1] -0.06447185 ``` ] --- class: center # oh no: all these data have the same summary stats! --- # Raw data is not enough <img src="index_files/figure-html/datasaurus-graph-static-1.png" width="80%" style="display: block; margin: auto;" /> --- # Humans love patterns .center[ <div class="figure" style="text-align: center"> <img src="images/01/pattern-processing.png" alt="Pattern processing" width="80%" /> <p class="caption">Pattern processing</p> </div> .footnote[https://doi.org/10.3389/fnins.2014.00265] ] --- # (Sometimes we love them too much) -- .center[ .box-inv-3.sp-after[**Pareidolia**: seeing patterns that aren't there.] 
] -- .pull-left[ <img src="images/01/pareidolia-1.jpg" width="80%" style="display: block; margin: auto;" /> ] .pull-right[ <img src="images/01/pareidolia-3.jpg" width="80%" style="display: block; margin: auto;" /> ] --- # Beauty is necessary to see patterns .pull-left[ <img src="images/01/amount-diffs-table.png" alt="Amount donated table" width="80%" style="display: block; margin: auto;" /> ] -- .pull-right[ <img src="images/01/amount-diffs.png" alt="Amount donated graph" width="80%" style="display: block; margin: auto;" /> ] --- class: center middle inverse # Beautiful visualizations --- # What makes a great visualization? .midi[ - Truthful - Functional - Beautiful - Insightful - Enlightening ] .footnote[Alberto Cairo, *The Truthful Art*] --- Alberto Cairo, *The Truthful Art*: > 1. It is truthful, as it’s based on thorough and honest research. > > 2. It is functional, as it constitutes an accurate depiction of the data, and it’s built in a way that lets people do meaningful operations based on it (seeing change in time). > > 3. It is beautiful, in the sense of being attractive, intriguing, and even aesthetically pleasing for its intended audience—scientists, in the first place, but the general public, too. > > 4. It is insightful, as it reveals evidence that we would have a hard time seeing otherwise. > > 5. It is enlightening because if we grasp and accept the evidence it depicts, it will change our minds for the better. --- # What makes a great visualization? > Graphical excellence is the **well-designed presentation of interesting data**—a matter of substance, of statistics, and of design … [It] consists of complex ideas communicated with clarity, precision, and efficiency. … [It] is that which **gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space** … [It] is nearly always multivariate … And graphical excellence requires **telling the truth about the data**. 
.footnote[Edward Tufte, *The Visual Display of Quantitative Information*, p. 51] --- # What makes a great visualization? .midi[ - Good aesthetics - No substantive issues - No perceptual issues - Honesty + good judgment ] .footnote[Kieran Healy, *Data Visualization: A Practical Introduction*] --- # What's wrong? .left-column[ - Aesthetic issues - Substantive issues - Perceptual issues - Honesty + judgment issues ] .right-column[ <img src="images/01/pie-genus.png" width="75%" style="display: block; margin: auto;" /> ] ??? - Aesthetic issues - Substantive issues - Perceptual issues - Honesty + judgment issues --- # What's wrong? .left-column[ - Aesthetic issues - Substantive issues - Perceptual issues - Honesty + judgment issues ] .right-column[ <img src="images/01/changing-face-of-america.png" width="60%" style="display: block; margin: auto;" /> ] ??? - Aesthetic issues - Substantive issues - Perceptual issues - Honesty + judgment issues --- # What's wrong? .left-column[ - Aesthetic issues - Substantive issues - Perceptual issues - Honesty + judgment issues ] .right-column[ <img src="images/death_penalty.jpg" width="60%" style="display: block; margin: auto;" /> ] --- # What's right? .left-column[ - Aesthetic issues - Substantive issues - Perceptual issues - Honesty + judgment issues ] .right-column[ <img src="index_files/figure-html/flatten-the-curve-1.png" width="80%" style="display: block; margin: auto;" /> ] --- class: center, middle, inverse # Diving in! --- # Data visualization: Grammar of Graphics * Basics of visualization * Palmer penguins (!) 
* Parts of a graph: * aesthetics * color * shape * size * alpha (transparency) * faceting * Prettying up --- ## Data visualization > *"The simple graph has brought more information to the data analyst's mind than any other device."* > > John Tukey - Data visualization is the creation and study of the visual representation of data - Many tools for visualizing data -- R is one of them - Many approaches/systems within R for making data visualizations -- **ggplot2** is one of them, and that's what we're going to use --- ## ggplot2 `\(\in\)` tidyverse .pull-left[ <img src="images/ggplot2-part-of-tidyverse.png" width="80%" style="display: block; margin: auto;" /> ] .pull-right[ - **ggplot2** is tidyverse's data visualization package - `gg` in "ggplot2" stands for Grammar of Graphics - Inspired by the book **Grammar of Graphics** by Leland Wilkinson ] --- ## Grammar of Graphics .pull-left-narrow[ A grammar of graphics is a tool that enables us to concisely describe the components of a graphic ] .pull-right-wide[ <img src="images/grammar-of-graphics.png" width="60%" style="display: block; margin: auto;" /> ] .footnote[ Source: [BloggoType](http://bloggotype.blogspot.com/2016/08/holiday-notes2-grammar-of-graphics.html) ] --- ## Hello ggplot2! - `ggplot()` is the main function in ggplot2 - Plots are constructed in layers - Structure of the code for plots can be summarized as - The ggplot2 package comes with the tidyverse - For help with ggplot2, see [ggplot2.tidyverse.org](http://ggplot2.tidyverse.org/) --- class: middle, inverse # Why do we visualize? --- ## Why do we visualize? 1. 
Discover patterns that may not be obvious from numerical summaries --- ## Anscombe's quartet --- ## Summary statistics for Anscombe's quartet ``` ## # A tibble: 4 × 6 ## set mean_x mean_y sd_x sd_y r ## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 I 9 7.50 3.32 2.03 0.816 ## 2 II 9 7.50 3.32 2.03 0.816 ## 3 III 9 7.5 3.32 2.03 0.816 ## 4 IV 9 7.50 3.32 2.03 0.817 ``` --- ## Scatterplots for Anscombe's quartet <img src="index_files/figure-html/quartet-plot-1.png" width="100%" style="display: block; margin: auto;" /> --- ## Why do we visualize? 1. Discover patterns that may not be obvious from numerical summaries 2. Convey information in a way that is otherwise difficult/impossible to convey --- ### Impact of Omicron variant on unvaccinated populations <img src="images/covid-hong-kong.jpeg" width="60%" style="display: block; margin: auto;" /> .footnote[ Source: [John Burn-Murdoch](https://twitter.com/jburnmurdoch/status/1503420660869214213) ] --- ### COVID-19 vaccination in US Counties <img src="images/nytimes-us-covid-vaccine.png" width="50%" style="display: block; margin: auto;" /> .footnote[ Source: [New York Times](https://www.nytimes.com/interactive/2020/us/covid-19-vaccine-doses.html), March 28, 2022. ] --- class: middle, inverse # Let's see it in action --- ## ggplot2 `\(\in\)` tidyverse .pull-left[ <img src="images/ggplot2-part-of-tidyverse.png" width="80%" style="display: block; margin: auto;" /> ] .pull-right[ - **ggplot2** is tidyverse's data visualization package - Structure of the code for plots can be summarized as ] --- ## Data: Palmer Penguins Measurements for penguin species, island in Palmer Archipelago, size (flipper length, body mass, bill dimensions), and sex. 
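
A column-by-column overview like the one on this slide can be produced with `glimpse()`. This is a minimal sketch, assuming the palmerpenguins and dplyr packages are installed; it is not necessarily the exact code used to build the slide.

``` r
library(palmerpenguins) # provides the `penguins` data frame
library(dplyr)          # provides glimpse()

# One line per column: name, type, and the first few values
glimpse(penguins)
```
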
.pull-left[
<img src="images/penguins.png" width="80%" style="display: block; margin: auto;" />
]

.pull-right[

```
## Rows: 344
## Columns: 8
## $ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
## $ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
## $ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
## $ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
## $ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
## $ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
## $ sex               <fct> male, female, female, NA, female, male, female, male…
## $ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
```
]

---

# Penguins and you: common datasets

``` r
# uncomment to install:
# install.packages("palmerpenguins")
library(palmerpenguins) # attaches the package; `penguins` is then available
head(penguins)
```

```
## # A tibble: 6 × 8
##   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
## 1 Adelie  Torgersen           39.1          18.7               181        3750
## 2 Adelie  Torgersen           39.5          17.4               186        3800
## 3 Adelie  Torgersen           40.3          18                 195        3250
## 4 Adelie  Torgersen           NA            NA                  NA          NA
## 5 Adelie  Torgersen           36.7          19.3               193        3450
## 6 Adelie  Torgersen           39.3          20.6               190        3650
## # ℹ 2 more variables: sex <fct>, year <int>
```

---

.panelset[
.panel[.panel-name[Plot]
<img src="index_files/figure-html/unnamed-chunk-20-1.png" width="70%" style="display: block; margin: auto;" />
]
.panel[.panel-name[Code]

``` r
ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm, y = bill_length_mm,
                     color = species)) +
  geom_point() +
  labs(title = "Bill depth and length",
       subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
       x = "Bill depth (mm)", y = "Bill length (mm)",
       color = "Species")
```
]
]

---

class: middle, inverse

# Coding out loud

---

.small[
> **Start with the `penguins` data frame**
]

.pull-left[

``` r
*ggplot(data = penguins)
```
]

.pull-right[
<img src="index_files/figure-html/unnamed-chunk-21-1.png" width="100%" style="display: block; margin: auto;" /> ] --- .small[ > Start with the `penguins` data frame, > **map bill depth to the x-axis** ] .pull-left[ ``` r ggplot(data = penguins, * mapping = aes(x = bill_depth_mm)) ``` ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-22-1.png" width="100%" style="display: block; margin: auto;" /> ] --- .small[ > Start with the `penguins` data frame, > map bill depth to the x-axis > **and map bill length to the y-axis.** ] .pull-left[ ``` r ggplot(data = penguins, mapping = aes(x = bill_depth_mm, * y = bill_length_mm)) ``` ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-23-1.png" width="100%" style="display: block; margin: auto;" /> ] --- .small[ > Start with the `penguins` data frame, > map bill depth to the x-axis > and map bill length to the y-axis. > **Represent each observation with a point** ] .pull-left[ ``` r ggplot(data = penguins, mapping = aes(x = bill_depth_mm, y = bill_length_mm)) + * geom_point() ``` ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-24-1.png" width="100%" style="display: block; margin: auto;" /> ] --- .small[ > Start with the `penguins` data frame, > map bill depth to the x-axis > and map bill length to the y-axis. > Represent each observation with a point > **and map species to the color of each point.** ] .pull-left[ ``` r ggplot(data = penguins, mapping = aes(x = bill_depth_mm, y = bill_length_mm, * color = species)) + geom_point() ``` ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-25-1.png" width="100%" style="display: block; margin: auto;" /> ] --- .small[ > Start with the `penguins` data frame, > map bill depth to the x-axis > and map bill length to the y-axis. > Represent each observation with a point > and map species to the color of each point. 
> **Title the plot "Bill depth and length"** ] .pull-left[ ``` r ggplot(data = penguins, mapping = aes(x = bill_depth_mm, y = bill_length_mm, color = species)) + geom_point() + * labs(title = "Bill depth and length") ``` ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-26-1.png" width="100%" style="display: block; margin: auto;" /> ] --- .small[ > Start with the `penguins` data frame, > map bill depth to the x-axis > and map bill length to the y-axis. > Represent each observation with a point > and map species to the color of each point. > Title the plot "Bill depth and length", > **add the subtitle "Dimensions for Adelie, Chinstrap, and Gentoo Penguins"** ] .pull-left[ ``` r ggplot(data = penguins, mapping = aes(x = bill_depth_mm, y = bill_length_mm, color = species)) + geom_point() + labs(title = "Bill depth and length", * subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins") ``` ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-27-1.png" width="100%" style="display: block; margin: auto;" /> ] --- .small[ > Start with the `penguins` data frame, > map bill depth to the x-axis > and map bill length to the y-axis. > Represent each observation with a point > and map species to the color of each point. 
> Title the plot "Bill depth and length", > add the subtitle "Dimensions for Adelie, Chinstrap, and Gentoo Penguins", > **label the x and y axes as "Bill depth (mm)" and "Bill length (mm)", respectively** ] .pull-left[ ``` r ggplot(data = penguins, mapping = aes(x = bill_depth_mm, y = bill_length_mm, color = species)) + geom_point() + labs(title = "Bill depth and length", subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins", * x = "Bill depth (mm)", y = "Bill length (mm)") ``` ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-28-1.png" width="100%" style="display: block; margin: auto;" /> ] --- .small[ > Start with the `penguins` data frame, > map bill depth to the x-axis > and map bill length to the y-axis. > Represent each observation with a point > and map species to the color of each point. > Title the plot "Bill depth and length", > add the subtitle "Dimensions for Adelie, Chinstrap, and Gentoo Penguins", > label the x and y axes as "Bill depth (mm)" and "Bill length (mm)", respectively, > **label the legend "Species"** ] .pull-left[ ``` r ggplot(data = penguins, mapping = aes(x = bill_depth_mm, y = bill_length_mm, color = species)) + geom_point() + labs(title = "Bill depth and length", subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins", x = "Bill depth (mm)", y = "Bill length (mm)", * color = "Species") ``` ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-29-1.png" width="100%" style="display: block; margin: auto;" /> ] --- .small[ > Start with the `penguins` data frame, > map bill depth to the x-axis > and map bill length to the y-axis. > Represent each observation with a point > and map species to the color of each point. 
> Title the plot "Bill depth and length", > add the subtitle "Dimensions for Adelie, Chinstrap, and Gentoo Penguins", > label the x and y axes as "Bill depth (mm)" and "Bill length (mm)", respectively, > label the legend "Species", > **and add a caption for the data source.** ] .pull-left[ ``` r ggplot(data = penguins, mapping = aes(x = bill_depth_mm, y = bill_length_mm, color = species)) + geom_point() + labs(title = "Bill depth and length", subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins", x = "Bill depth (mm)", y = "Bill length (mm)", color = "Species", * caption = "Source: Palmer Station LTER / palmerpenguins package") ``` ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-30-1.png" width="100%" style="display: block; margin: auto;" /> ] --- .small[ > Start with the `penguins` data frame, > map bill depth to the x-axis > and map bill length to the y-axis. > Represent each observation with a point > and map species to the color of each point. > Title the plot "Bill depth and length", > add the subtitle "Dimensions for Adelie, Chinstrap, and Gentoo Penguins", > label the x and y axes as "Bill depth (mm)" and "Bill length (mm)", respectively, > label the legend "Species", > and add a caption for the data source. 
> **Finally, use a discrete color scale that is designed to be perceived by viewers with common forms of color blindness.** ] .pull-left[ ``` r ggplot(data = penguins, mapping = aes(x = bill_depth_mm, y = bill_length_mm, color = species)) + geom_point() + labs(title = "Bill depth and length", subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins", x = "Bill depth (mm)", y = "Bill length (mm)", color = "Species", caption = "Source: Palmer Station LTER / palmerpenguins package") + * scale_color_viridis_d() ``` ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-31-1.png" width="100%" style="display: block; margin: auto;" /> ] --- .panelset[ .panel[.panel-name[Plot] <img src="index_files/figure-html/unnamed-chunk-32-1.png" width="70%" style="display: block; margin: auto;" /> ] .panel[.panel-name[Code] ``` r ggplot(data = penguins, mapping = aes(x = bill_depth_mm, y = bill_length_mm, color = species)) + geom_point() + labs(title = "Bill depth and length", subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins", x = "Bill depth (mm)", y = "Bill length (mm)", color = "Species", caption = "Source: Palmer Station LTER / palmerpenguins package") + scale_color_viridis_d() ``` ] .panel[.panel-name[Narrative] .pull-left-wide[ .small[ Start with the `penguins` data frame, map bill depth to the x-axis and map bill length to the y-axis. Represent each observation with a point and map species to the color of each point. Title the plot "Bill depth and length", add the subtitle "Dimensions for Adelie, Chinstrap, and Gentoo Penguins", label the x and y axes as "Bill depth (mm)" and "Bill length (mm)", respectively, label the legend "Species", and add a caption for the data source. Finally, use a discrete color scale that is designed to be perceived by viewers with common forms of color blindness. ] ] ] ] --- ## Argument names .tip[ You can omit the names of the first two arguments when building plots with `ggplot()`. 
] .pull-left[ ] .pull-right[ ``` r ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm, color = species)) + geom_point() + scale_color_viridis_d() ``` ] --- class: middle, inverse # Aesthetics --- ## Aesthetics options Commonly used characteristics of plotting characters that can be **mapped to a specific variable** in the data are - `color` - `shape` - `size` - `alpha` (transparency) --- ## Color .pull-left[ ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-33-1.png" width="100%" style="display: block; margin: auto;" /> ] --- ## Shape Mapped to a different variable than `color` .pull-left[ ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-34-1.png" width="100%" style="display: block; margin: auto;" /> ] --- ## Shape Mapped to same variable as `color` .pull-left[ ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-35-1.png" width="100%" style="display: block; margin: auto;" /> ] --- ## Size .pull-left[ ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-36-1.png" width="100%" style="display: block; margin: auto;" /> ] --- ## Alpha .pull-left[ ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-37-1.png" width="100%" style="display: block; margin: auto;" /> ] --- .pull-left[ **Mapping** <img src="index_files/figure-html/unnamed-chunk-38-1.png" width="100%" style="display: block; margin: auto;" /> ] .pull-right[ **Setting** <img src="index_files/figure-html/unnamed-chunk-39-1.png" width="100%" style="display: block; margin: auto;" /> ] --- ## Mapping vs. setting - **Mapping:** Determine the size, alpha, etc. of points based on the values of a variable in the data - goes into `aes()` - **Setting:** Determine the size, alpha, etc. of points **not** based on the values of a variable in the data - goes into `geom_*()` (this was `geom_point()` in the previous example, but we'll learn about other geoms soon!) 
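
The distinction can be sketched in code. This is a minimal example, assuming the ggplot2 and palmerpenguins packages are installed; the particular choices here (mapping `body_mass_g` to size and `flipper_length_mm` to alpha, setting `size = 2` and `alpha = 0.5`) and the object names `p_mapped` and `p_set` are illustrative, not necessarily the code behind the plots on the previous slide.

``` r
library(ggplot2)
library(palmerpenguins)

# Mapping: size and alpha vary with variables in the data,
# so they go inside aes()
p_mapped <- ggplot(penguins,
                   aes(x = bill_depth_mm, y = bill_length_mm,
                       size = body_mass_g, alpha = flipper_length_mm)) +
  geom_point()

# Setting: every point gets the same fixed size and alpha,
# so they go in geom_point(), outside aes()
p_set <- ggplot(penguins,
                aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point(size = 2, alpha = 0.5)
```

Printing `p_mapped` should draw points whose size and transparency differ from penguin to penguin (with legends for both), while `p_set` draws uniform points and needs no legend.
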
--- class: middle, inverse # Faceting --- ## Faceting - Smaller plots that display different subsets of the data - Useful for exploring conditional relationships and large data --- .panelset[ .panel[.panel-name[Plot] <img src="index_files/figure-html/unnamed-chunk-40-1.png" width="65%" style="display: block; margin: auto;" /> ] .panel[.panel-name[Code] ``` r ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm)) + geom_point() + * facet_grid(rows = vars(species), cols = vars(island)) ``` ] ] --- ## Various ways to facet .question[ In the next few slides describe what each plot displays. Think about how the code relates to the output. **Note:** The plots in the next few slides do not have proper titles, axis labels, etc. because we want you to figure out what's happening in the plots. But you should always label your plots! ] --- <img src="index_files/figure-html/unnamed-chunk-41-1.png" width="65%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/unnamed-chunk-42-1.png" width="65%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/unnamed-chunk-43-1.png" width="65%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/unnamed-chunk-44-1.png" width="80%" style="display: block; margin: auto;" /> --- <img src="index_files/figure-html/unnamed-chunk-45-1.png" width="80%" style="display: block; margin: auto;" /> --- ## Faceting summary - `facet_grid()`: - 2 dimensional grid - `rows = vars(<VARIABLE>), cols = vars(<VARIABLE>)` - Alternative: `rows ~ cols` - `facet_wrap()`: 1 dimensional ribbon wrapped according to number of rows and columns specified or available plotting area --- ## Facet and color .panelset[ .panel[.panel-name[Plot] <img src="index_files/figure-html/unnamed-chunk-46-1.png" width="65%" style="display: block; margin: auto;" /> ] .panel[.panel-name[Code] ``` r ggplot( penguins, aes(x = bill_depth_mm, y = bill_length_mm, * color = species)) + geom_point() + 
facet_grid(species ~ sex) + * scale_color_viridis_d() ``` ] ] --- ## Facet and color, no legend .panelset[ .panel[.panel-name[Plot] <img src="index_files/figure-html/unnamed-chunk-47-1.png" width="65%" style="display: block; margin: auto;" /> ] .panel[.panel-name[Code] ``` r ggplot( penguins, aes(x = bill_depth_mm, y = bill_length_mm, color = species)) + geom_point() + facet_grid(species ~ sex) + * scale_color_viridis_d(guide = "none") ``` ] ] --- class: middle, inverse # Take a sad plot, and make it better --- The American Association of University Professors (AAUP) is a nonprofit membership association of faculty and other academic professionals. [This report](https://www.aaup.org/sites/default/files/files/AAUP_Report_InstrStaff-75-11_apr2013.pdf) by the AAUP shows trends in instructional staff employees between 1975 and 2011, and contains an image very similar to the one given below. <img src="images/staff-employment.png" width="70%" style="display: block; margin: auto;" /> --- Each row in this dataset represents a faculty type, and the columns are the years for which we have data. The values are percentage of hires of that type of faculty for each year. ``` ## # A tibble: 5 × 12 ## faculty_type `1975` `1989` `1993` `1995` `1999` `2001` `2003` `2005` `2007` ## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 Full-Time Tenu… 29 27.6 25 24.8 21.8 20.3 19.3 17.8 17.2 ## 2 Full-Time Tenu… 16.1 11.4 10.2 9.6 8.9 9.2 8.8 8.2 8 ## 3 Full-Time Non-… 10.3 14.1 13.6 13.6 15.2 15.5 15 14.8 14.9 ## 4 Part-Time Facu… 24 30.4 33.1 33.2 35.5 36 37 39.3 40.5 ## 5 Graduate Stude… 20.5 16.5 18.1 18.8 18.7 19 20 19.9 19.5 ## # ℹ 2 more variables: `2009` <dbl>, `2011` <dbl> ``` .footnote[https://uchicago.box.com/s/eqk73widao74ysdd172ob81jac38ecjx] --- ## Recreate the visualization In order to recreate this visualization we need to first reshape the data to have one variable for faculty type and one variable for year. 
In other words, we will convert the data from the wide format to the long format. But before we do so...

.task[
If the long data will have a row for each year/faculty type combination, and there are 5 faculty types and 11 years of data, how many rows will the data have?
]

---

class: center, middle

<img src="images/pivot.gif" width="80%" style="display: block; margin: auto;" />

---

## `pivot_*()` function

<img src="https://github.com/gadenbuie/tidyexplain/raw/main/images/tidyr-pivoting.gif" width="50%" style="display: block; margin: auto;" />

---

## `pivot_longer()`

- The first argument is `data` as usual.
- The second argument, `cols`, is where you specify which columns to pivot into longer format -- in this case all columns except for `faculty_type`
- The third argument, `names_to`, is a string specifying the name of the column to create from the data stored in the column names of `data` -- in this case `year`
- The fourth argument, `values_to`, is a string specifying the name of the column to create from the data stored in cell values -- in this case `percentage`

---

## Pivot instructor data

.small[

```
## # A tibble: 55 × 3
##    faculty_type              year  percentage
##    <chr>                     <chr>      <dbl>
##  1 Full-Time Tenured Faculty 1975        29
##  2 Full-Time Tenured Faculty 1989        27.6
##  3 Full-Time Tenured Faculty 1993        25
##  4 Full-Time Tenured Faculty 1995        24.8
##  5 Full-Time Tenured Faculty 1999        21.8
##  6 Full-Time Tenured Faculty 2001        20.3
##  7 Full-Time Tenured Faculty 2003        19.3
##  8 Full-Time Tenured Faculty 2005        17.8
##  9 Full-Time Tenured Faculty 2007        17.2
## 10 Full-Time Tenured Faculty 2009        16.8
## # ℹ 45 more rows
```
]

---

.question[
This doesn't look quite right. How would you fix it?
]

.small[
<img src="index_files/figure-html/unnamed-chunk-53-1.png" width="80%" style="display: block; margin: auto;" />
]

---

.small[
<img src="index_files/figure-html/unnamed-chunk-54-1.png" width="80%" style="display: block; margin: auto;" />
]

---

## Some improvement...
.small[ <img src="index_files/figure-html/unnamed-chunk-55-1.png" width="60%" style="display: block; margin: auto;" /> ] --- ## More improvement .small[ <img src="index_files/figure-html/unnamed-chunk-56-1.png" width="85%" style="display: block; margin: auto;" /> ] --- ## Goal: even more improvement! .task[ I want to achieve the following look but I have no idea how! ] <img src="images/sketch.png" width="70%" style="display: block; margin: auto;" /> --- ## Asking good questions - Describe what you want - Describe where you are - Create a minimal **repr**oducible **ex**ample: `reprex::reprex()` --- .panelset[ .panel[.panel-name[Plot] <img src="index_files/figure-html/instructor-lines-1.png" width="100%" style="display: block; margin: auto;" /> ] .panel[.panel-name[Code] ``` r library(scales) staff_long %>% * mutate( * part_time = if_else(faculty_type == "Part-Time Faculty", * "Part-Time Faculty", "Other Faculty"), * year = as.numeric(year) * ) %>% ggplot(aes(x = year, y = percentage/100, group = faculty_type, color = part_time)) + geom_line() + * scale_color_manual(values = c("gray", "red")) + * scale_y_continuous(labels = label_percent(accuracy = 1)) + theme_minimal() + labs( title = "Instructional staff employment trends", x = "Year", y = "Percentage", color = NULL ) + * theme(legend.position = "bottom") ``` ]] --- ## Practice: Penguin challenge Choose one of the following plots and explain why you think it is the best representation of the data. Describe a research question it might help you answer. Post on Ed -- be specific and reference the readings when possible. 
.panelset[
.panel[.panel-name[Facets]
<img src="index_files/figure-html/unnamed-chunk-58-1.png" width="45%" style="display: block; margin: auto;" />
]
.panel[.panel-name[Alpha]
<img src="index_files/figure-html/unnamed-chunk-59-1.png" width="45%" style="display: block; margin: auto;" />
]
.panel[.panel-name[Basic]
<img src="index_files/figure-html/unnamed-chunk-60-1.png" width="45%" style="display: block; margin: auto;" />
]
]

---

class: inverse

# Assignment 1

You need to find a graph and **critique** it (don't totally trash it -- this is an academic exercise).

If you want, you can try to improve it, provided you can get your hands on similar data. But if not, that's OK!

---

# Speaking of: doing well on assignments

<img src="https://geekd-out.com/wp-content/uploads/2018/08/sugar-rush-featured-image.jpg" width="65%" style="display: block; margin: auto;" />

---

# Recap

Parts of a graph:

* aesthetics
  * color
  * shape
  * size
  * alpha (transparency)
* faceting

---

class: middle, inverse

# Before Thursday

---

## Before Thursday

- Create a GitHub account if you don't have one
- Start (or even complete!!) the [*FIRST ASSIGNMENT*](https://macs40700.netlify.app/assignments/assign1/)
- Read the [syllabus](https://macs40700.netlify.app/course-syllabus/)
- Complete the [reading](https://macs40700.netlify.app/)
- Make sure you have working installations of R and RStudio

---

# Reminder: Workflow

Here is your workflow for class:

* Start on the website: read over the assignment description.
* Click the github link to accept the assignment: this will create a repo with the proper format/setup for the assignment. The permissions are set so that it will be private to all except you and me
* Connect to the github repo and push your work to it
* When complete, go to Canvas, submit the github repo link. (we can't push grades from github to Canvas, unfortunately)