  • 00:00

    [MUSIC PLAYING] [DATA VISUALIZATION] [case study] [Making Small Multiples in R to Compare the Effect of the Coronavirus Pandemic Within the UK]

  • 00:17

    ALEX REPPEL: My name is Alex Reppel, and I'm a Reader in Marketing at Royal Holloway, University of London. [Alex Reppel, Reader in Marketing] [Royal Holloway, University of London] This is the second part of a two-part series on small multiples. In part 1, we learned that this form of visualization is particularly useful when we want to present a lot of information and compare this information

  • 00:40

    ALEX REPPEL [continued]: consistently and efficiently. In this second part, I'm going to demonstrate how to create a small multiple using R and the data visualization library ggplot2. This video is a fairly quick walk-through and does not cover every step along the way.

  • 01:03

    ALEX REPPEL [continued]: A detailed description is available in the supplementary material for this video. [Introduction] [Aim of Data Visualization] From the beginning of the coronavirus pandemic in 2020, small multiples were used to compare outbreaks

  • 01:23

    ALEX REPPEL [continued]: across countries. Here you see a collection of examples from different websites. In the previous video, we looked at one example in more detail: John Burn-Murdoch's small multiple of mortality rates for different countries, which he developed for the Financial Times.

  • 01:45

    ALEX REPPEL [continued]: The aim of this part of the series is to create a small multiple of mortality rates for England and Wales across different age groups. As an additional challenge, we will reproduce some of the aesthetics, as well as the overall visual appeal, of the visualization developed

  • 02:08

    ALEX REPPEL [continued]: by John Burn-Murdoch for the Financial Times. [Customization] For this, we have to specify a number of variables, such as a title and acknowledgments. Reproducing the aesthetics of an existing visualization

  • 02:31

    ALEX REPPEL [continued]: makes this exercise a little more ambitious, and we have to specify a few additional options: a subtitle that allows us to highlight individual words (you can see that here in the example), the ability to highlight specific weeks

  • 02:54

    ALEX REPPEL [continued]: in the final graph, and a custom color scheme. Defining a title and acknowledgments is fairly straightforward. But highlighting individual words in the subtitle is a little more challenging, because we want to use the subtitle of the final graph

  • 03:16

    ALEX REPPEL [continued]: as a replacement for a legend. The visualization library we use can't do that by default, which means we have to find a workaround that includes some HTML code as well as placeholders to insert color labels and dates dynamically later on.
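The transcript does not show the customization variables, so the following is a minimal R sketch of them and not the code from the video. It assumes the subtitle is an HTML string whose `%s` placeholders are filled in later with `sprintf()`. All of the following are hypothetical: the names `plot_title`, `subtitle_template`, `highlight_weeks` and `colours`, the wording, the week numbers and the hex colours.

```r
# Hypothetical customization variables; names, wording, weeks, and colours
# are illustrative assumptions, not the values used in the video.
plot_title <- "Deaths registered weekly in England and Wales, by age group"

# <span> tags colour individual words so the subtitle can act as a legend;
# each %s is a placeholder that sprintf() fills in later (colours, dates).
subtitle_template <- paste0(
  "Deaths in <span style='color:%s'>2020</span> compared with the ",
  "<span style='color:%s'>2010-2019 average</span>, ",
  "weeks starting %s to %s"
)

# Week numbers to label on the x-axis (hypothetical choice)
highlight_weeks <- c(1, 10, 20, 30, 40, 50)

# Custom colour scheme, looked up by name when building layers
colours <- c(current = "#cc0a11", average = "#595959",
             historic = "#bfbfbf", excess = "#f2a3a3")

# Count the placeholders that still need values
n_placeholders <- lengths(gregexpr("%s", subtitle_template, fixed = TRUE))
n_placeholders
```

Keeping the placeholders separate from the values means that the colours and dates are defined in one place. They are only inserted into the subtitle once the data has been processed.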

  • 03:37

    ALEX REPPEL [continued]: And you can see that here in the subtitle variable. Another customization is to highlight individual weeks in our graph. The variable Highlight Weeks contains the number of each week we want to highlight in the final graph, and we specify that here. The last customization we have to do

  • 03:59

    ALEX REPPEL [continued]: is to specify the colors we will be using in our graph. ggplot2 offers a wide variety of built-in themes and color schemes, so this is yet another step that isn't strictly required for creating good-looking graphs. [Data Files]

  • 04:21

    ALEX REPPEL [continued]: The data we will be using was downloaded from the Office for National Statistics (ONS). The data file extracted from the ONS data needs to be read into a variable. This is accomplished with the read.csv function. By printing a few rows of our data set, we can confirm that it is now available.
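Here is a minimal sketch of the loading step. It uses `read.csv()` with an inline `text =` string so that it runs without the real ONS file. The columns, the date format and the numbers are invented for illustration. With a real file, you would pass a path instead, for example a hypothetical `read.csv("deaths.csv")`.

```r
# Hypothetical extract in the shape of an ONS weekly deaths file; the column
# names, date format, and numbers are invented for illustration.
csv_text <- "week,date,0-44,45-64,65-74
1,03/01/2020,550,1200,1800
2,10/01/2020,560,1250,1850
3,17/01/2020,540,1190,1790"

# check.names = FALSE keeps labels such as "45-64" instead of "X45.64";
# stringsAsFactors = FALSE keeps the dates as plain text for now
deaths_wide <- read.csv(text = csv_text, check.names = FALSE,
                        stringsAsFactors = FALSE)

# Print the first rows to confirm the data was read
head(deaths_wide, 2)
```

Note that by default `read.csv()` rewrites column names that are not syntactic R names. `check.names = FALSE` matters here because the age-group labels are reused later as panel titles.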

  • 04:45

    ALEX REPPEL [continued]: All is well, and we can now move on to manipulating the data to generate all the remaining elements for our graph. [Data Manipulation] Specifically, we have to convert dates from text into date objects.

  • 05:05

    ALEX REPPEL [continued]: We have to tidy the data set, and we have to create custom labels for our x-axis. Dates are currently stored as text strings, but it is much easier to work with dates as date objects. We do this using the as.Date function.
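Below is a short sketch of the date conversion and of the four-day shift described next. It assumes that the dates are stored as day/month/year text. That format is an assumption about the file, not something the video states. Date objects count days, so plain subtraction moves each Friday back to the Monday of the same week.

```r
# Hypothetical week-ending dates stored as text, as read from the CSV file;
# the day/month/year format is an assumption about the ONS file
date_text <- c("03/01/2020", "10/01/2020", "17/01/2020")

# Convert the text into Date objects
week_ending <- as.Date(date_text, format = "%d/%m/%Y")

# Dates are stored as day counts, so subtracting 4 moves each Friday back to
# the Monday of the same week
week_starting <- week_ending - 4

# wday: 0 = Sunday, 1 = Monday, ..., 5 = Friday (independent of locale)
data.frame(week_ending, week_starting,
           ending_wday   = as.POSIXlt(week_ending)$wday,
           starting_wday = as.POSIXlt(week_starting)$wday)
```

The weekday is checked with `as.POSIXlt()$wday` rather than `weekdays()`, because `weekdays()` returns names in the session's language.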

  • 05:28

    ALEX REPPEL [continued]: Once dates are converted into date objects, R knows what to do with them. We can, for example, move each individual date stored in our data set back four days. And you can see that all we have to do is subtract four from each of the dates. We're doing this because the dates in the ONS data set

  • 05:53

    ALEX REPPEL [continued]: are set to a Friday, which is the last weekday of each week. By subtracting four days, we arrive at the Monday of the same week, which is the label we want to use in our visualization. While not strictly necessary, this step demonstrates the flexibility we have in terms of manipulating data within R

  • 06:15

    ALEX REPPEL [continued]: as opposed to, say, changing date labels manually in a spreadsheet program. Earlier, we confirmed that our data was successfully loaded. However, there is still a small issue with our data that we have to fix. The problem is that some of the values we want to visualize

  • 06:41

    ALEX REPPEL [continued]: are stored in the labels of several columns. You can see here that the labels of separate columns are the age groups that we want to use later on to create

  • 07:05

    ALEX REPPEL [continued]: our small multiple. The format you see here is often referred to as wide format. This is opposed to what is often referred to as long format, where no values are stored in column labels. While I find the wide format much easier

  • 07:26

    ALEX REPPEL [continued]: to work with manually in a spreadsheet program, for example, the visualization library ggplot2 disagrees and prefers data to be stored in long, or so-called tidy, format. Luckily, converting our data between the two

  • 07:46

    ALEX REPPEL [continued]: is straightforward. And you can see that here, where we gather the data, extracting the values from the column labels and putting them into a variable. Once converted, our data set no longer contains values in column labels, which is exactly what we want.
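The narration mentions gathering the data, which suggests the tidyr function `gather()`. Here is a minimal sketch of that wide-to-long conversion on a hypothetical two-week extract. The column names `age_group` and `deaths` are illustrative choices.

```r
library(tidyr)

# Hypothetical wide-format data: values (age groups) sit in the column labels
deaths_wide <- data.frame(
  year    = c(2020, 2020),
  week    = c(1, 2),
  `0-44`  = c(550, 560),
  `45-64` = c(1200, 1250),
  check.names = FALSE
)

# gather() moves every column except year and week into two new columns:
# age_group (the former column label) and deaths (the cell value)
deaths_long <- gather(deaths_wide, key = "age_group", value = "deaths",
                      -year, -week)
deaths_long
```

In current tidyr, `gather()` is superseded by `pivot_longer()`. An equivalent call would be `pivot_longer(deaths_wide, cols = -c(year, week), names_to = "age_group", values_to = "deaths")`, which returns a tibble.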

  • 08:06

    ALEX REPPEL [continued]: And we can see that here in the second table. Now all that's left is to generate labels for the x-axis. Instead of listing the week numbers, we want to show actual dates. We do this by extracting the dates for 2020 from our data set that correspond to the weeks

  • 08:29

    ALEX REPPEL [continued]: we want to highlight. Because we converted dates into date objects earlier, it is now much easier for us to adjust the format in which they are displayed on the x-axis. We use a format that shows the day of the month on one line and the first three letters of the month underneath it.
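The following sketch shows the label step, assuming the week-starting Monday dates are already available as Date objects. The 2020 weeks and the highlighted week numbers are hypothetical. The format string `"%d\n%b"` puts the day of the month on the first line and the abbreviated month name underneath. Note that `%b` follows the session's locale.

```r
# Hypothetical 2020 weeks with their week-starting (Monday) dates
weeks_2020 <- data.frame(
  week = 1:12,
  date = seq(as.Date("2019-12-30"), by = "week", length.out = 12)
)

# Hypothetical choice of weeks to label on the x-axis
highlight_weeks <- c(1, 5, 9)

# Keep only the dates that correspond to the highlighted weeks
label_dates <- weeks_2020$date[weeks_2020$week %in% highlight_weeks]

# Day of the month on the first line, abbreviated month name underneath
x_labels <- format(label_dates, "%d\n%b")
x_labels
```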

  • 08:54

    ALEX REPPEL [continued]: The previous data manipulations were applied to the entire data set. The final one requires a distinction between current and historic data, that is, between data from the most recent year and data from previous years. We can quickly separate the two using the subset function and move on to calculate the average number of cases

  • 09:17

    ALEX REPPEL [continued]: per week for previous years. And you see that here. While not particularly elegant, splitting the data into two separate data sets is a pragmatic way of creating that average. We can now also calculate the last missing bit of information we will need to create our graph.
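Here is a base-R sketch of the split-and-average step on a small hypothetical data set. It also computes the per-week minimum that the narration calls YMIN. The lowercase `ymin` column and the other names are illustrative assumptions. For each week and age group, `ymin` is the lower of the 2020 value and the historic average.

```r
# Hypothetical long-format data: 2018, 2019 and 2020, two age groups, 3 weeks
deaths_long <- expand.grid(week = 1:3, year = c(2018, 2019, 2020),
                           age_group = c("45-64", "65-74"),
                           stringsAsFactors = FALSE)
# expand.grid varies week fastest, then year, then age group
deaths_long$deaths <- c(100, 110, 120,  90, 100, 110,  95, 130, 100,
                        200, 210, 220, 190, 200, 210, 180, 260, 230)

# Split current and historic data with subset()
current  <- subset(deaths_long, year == 2020)
historic <- subset(deaths_long, year < 2020)

# Average weekly deaths over previous years, per week and age group
average <- aggregate(deaths ~ week + age_group, data = historic, FUN = mean)
names(average)[names(average) == "deaths"] <- "average"

# Join the average onto the current year; ymin is the lower of the two values,
# so a shaded area from ymin up to 2020 only has height where 2020 is higher
combined <- merge(current, average, by = c("week", "age_group"))
combined$ymin <- pmin(combined$deaths, combined$average)
combined <- combined[order(combined$age_group, combined$week), ]
combined
```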

  • 09:40

    ALEX REPPEL [continued]: And that is, for each week, the lower of two values: the value for the current year or the historic average. We store that in a variable called YMIN. This is required for a key feature of our graph, the highlighted area where the number of cases

  • 10:02

    ALEX REPPEL [continued]: for the current year is above the average number of historic cases. We are now finally ready to generate the graph. [Generating the Graph] In ggplot2, a graph is initiated with the ggplot function.

  • 10:24

    ALEX REPPEL [continued]: The following code tells that function to use the historic data set as its default and to show weeks on the x-axis and case counts on the y-axis. This creates a frame on which our graph can be built layer by layer. Now that we have the frame, we can begin building our graph by adding layers to it.
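Here is a minimal sketch of the frame, using a small hypothetical historic data set. `ggplot()` only sets the default data and the aesthetic mapping. Until a layer is added, printing the object draws the axes and nothing else.

```r
library(ggplot2)

# Hypothetical historic data: weekly deaths for two previous years
historic <- data.frame(
  year   = rep(c(2018, 2019), each = 3),
  week   = rep(1:3, times = 2),
  deaths = c(100, 110, 120, 90, 100, 110)
)

# Default data plus mapping: weeks on x, deaths on y; no layers yet,
# so printing it shows only an empty frame
p <- ggplot(historic, aes(x = week, y = deaths))
print(p)
```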

  • 10:48

    ALEX REPPEL [continued]: However, before we do that, we quickly add the title, subtitle, and caption to our frame. Adding labels to a graph is very simple with ggplot2. Unfortunately, it gets a little more complicated because of the customization we want to achieve. For this, we first have to prepare the subtitle

  • 11:11

    ALEX REPPEL [continued]: and caption text. Remember the subtitle variable from the beginning of this exercise? It's shown here again, and it looks unwieldy. And the following code block doesn't really make it any better. But using a combination of HTML code and what is known as string formatting,

  • 11:34

    ALEX REPPEL [continued]: we can generate a subtitle that uses some formatting and can thus be used to replace a legend. The acknowledgments also require a bit of formatting. Once this is done, we can add the title, the subtitle, and the acknowledgments to the frame. And voilà.
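Here is a sketch of the string-formatting step, under stated assumptions. `sprintf()` fills each `%s` placeholder in turn, and `labs()` attaches the title, subtitle and caption. The template, colours, dates and wording are hypothetical. The transcript does not say how the HTML is rendered. Plain ggplot2 prints the tags literally, so this sketch assumes the separate ggtext package and uses its `element_markdown()` only if that package is installed.

```r
library(ggplot2)

# Hypothetical colours and date range; the video's own values may differ
colours <- c(current = "#cc0a11", historic = "#595959")
first_week <- as.Date("2020-01-06")
last_week  <- as.Date("2020-04-20")

subtitle_template <- paste0(
  "Deaths in <span style='color:%s'>2020</span> compared with the ",
  "<span style='color:%s'>2010-2019 average</span>, weeks starting %s to %s"
)

# String formatting: sprintf() replaces each %s with the next argument
subtitle <- sprintf(subtitle_template,
                    colours[["current"]], colours[["historic"]],
                    format(first_week, "%d %B"), format(last_week, "%d %B"))

# Acknowledgments, assembled the same way
caption <- sprintf("Source: %s. Graphic inspired by %s.",
                   "Office for National Statistics",
                   "John Burn-Murdoch, Financial Times")

p <- ggplot(data.frame(week = 1:3, deaths = c(1, 2, 3)),
            aes(x = week, y = deaths)) +
  labs(title = "Deaths in England and Wales by age group",
       subtitle = subtitle, caption = caption)

# Assumption: ggtext renders the <span> colours; skipped if not installed
if (requireNamespace("ggtext", quietly = TRUE)) {
  p <- p + theme(plot.subtitle = ggtext::element_markdown())
}
print(p)
```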

  • 11:57

    ALEX REPPEL [continued]: Printing the graph shows us an empty frame. This is exactly as it should be, because we haven't yet specified how data should be placed on this empty frame. But we can see that our acknowledgments are there, our title is there, and the subtitle now includes all the information we want

  • 12:19

    ALEX REPPEL [continued]: to show, such as dates and the color codes we will be using to highlight individual words. So before we add data to our frame, we first convert it into a small multiple. And this step really demonstrates the power

  • 12:39

    ALEX REPPEL [continued]: of the ggplot2 library. All that is required is the facet_wrap function together with three arguments. First, the variable used to distinguish individual graphs; in our example, that is age group. Second, the number of rows used to organize individual graphs,

  • 13:01

    ALEX REPPEL [continued]: which we have set to two. And finally, whether the same or different scales should be used for each individual graph. We specify a free scale, which means that each individual graph uses its own scale. And this is exactly what we see here: a small multiple consisting of a grid of six graphs for the six

  • 13:25

    ALEX REPPEL [continued]: age groups. The numbers on the y-axis differ for each of the six graphs because we have used a free scale. [Additional Layers] Now that we have established the frame for our small multiple, we can begin adding data to it.
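The faceting step described above can be sketched with `facet_wrap()` and its three arguments: the faceting variable, `nrow = 2`, and `scales = "free"`. The six age-group labels and the numbers are hypothetical. They are scaled very differently so that the free scales are visible.

```r
library(ggplot2)

# Hypothetical long-format data: six age groups with very different magnitudes
age_groups <- c("Under 45", "45-64", "65-74", "75-84", "85-89", "90+")
deaths_long <- expand.grid(week = 1:10, age_group = age_groups,
                           stringsAsFactors = FALSE)
deaths_long$deaths <- deaths_long$week +
  100 * match(deaths_long$age_group, age_groups)
# A factor keeps the panels in age order instead of alphabetical order
deaths_long$age_group <- factor(deaths_long$age_group, levels = age_groups)

# One panel per age group in two rows; "free" gives every panel its own
# x and y ranges ("free_y" would free only the y-axis)
p <- ggplot(deaths_long, aes(x = week, y = deaths)) +
  facet_wrap(~ age_group, nrow = 2, scales = "free")
print(p)
```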

  • 13:47

    ALEX REPPEL [continued]: The beauty of the ggplot2 library is the clarity with which graphs are built. ggplot2 applies a layered grammar of graphics, which means that a graph can be thought of as a collection of layers, each one drawn over the previous one.

  • 14:08

    ALEX REPPEL [continued]: The first layer shows historic data as a series of line graphs. Each line represents the number of weekly cases for one year between 2010 and 2019. The way these lines are arranged does not allow for a direct comparison of historic data.
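A sketch of this first layer follows: one thin line per historic year. Mapping `group = year` stops ggplot2 from joining all years into a single zig-zag line. The data, the colour and the line width are hypothetical. `linewidth` is the argument name in ggplot2 3.4.0 and later; older versions use `size`.

```r
library(ggplot2)

# Hypothetical historic data: three previous years, two age groups, 10 weeks
historic <- expand.grid(week = 1:10, year = 2017:2019,
                        age_group = c("65-74", "75-84"),
                        stringsAsFactors = FALSE)
historic$deaths <- 100 + historic$week * 2 + (historic$year - 2017) * 5 +
  ifelse(historic$age_group == "75-84", 200, 0)

# Layer 1: one thin grey line per historic year within each panel
p <- ggplot(historic, aes(x = week, y = deaths)) +
  facet_wrap(~ age_group, nrow = 1, scales = "free") +
  geom_line(aes(group = year), colour = "grey80", linewidth = 0.3)
print(p)
```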

  • 14:29

    ALEX REPPEL [continued]: But we can broadly see how the cases have fluctuated over the years. Much more insightful is the average number of cases over all previous years. This is, once again, shown as a line graph, although this time with a thicker line. Now we come to one of the most interesting and powerful

  • 14:50

    ALEX REPPEL [continued]: aspects of the original inspiration for this exercise, where we saw a shaded area indicating excess mortality during the outbreak. Replicating this feature means that we have to color the area where cases for the current year

  • 15:11

    ALEX REPPEL [continued]: stay above the historic average. We do this by specifying a maximum and a minimum value on the y-axis for each week. The maximum value is always the value for the current year. The minimum value is either the value for the current year or that of the historic average, whichever

  • 15:35

    ALEX REPPEL [continued]: is lower. We have already calculated these values and can access them with the YMIN variable. We can now clearly see the difference between the case count for 2020 and the historic average. The fourth layer is added for visual effect, once again inspired by John Burn-Murdoch's original.

  • 15:57

    ALEX REPPEL [continued]: Here we draw a white line above the area graph of the previous layer. The line is slightly thicker than the line in the next layer, which creates a pleasing visual effect of a subtle white shadow underneath the actual line for the current year. Because it doesn't yet look like much,

  • 16:19

    ALEX REPPEL [continued]: we move swiftly on to the next layer. A red line graph representing the case count for the current year is drawn on top of the previous white line. The result is the impression that layers three to five form a single visual entity: a filled area chart with a distinct outline and a subtle white shadow underneath.
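Layers two to five, as described above, can be sketched as follows:

- a thicker average line;
- a ribbon from `ymin` up to the 2020 value;
- a wider white line;
- the red 2020 line on top.

The ribbon has zero height wherever 2020 is at or below the average, so only excess deaths are shaded. The per-week numbers, colours and widths are hypothetical, and the historic year lines (layer one) are left out for brevity.

```r
library(ggplot2)

# Hypothetical per-week summary for one age group: deaths in 2020, the
# historic average, and ymin = the lower of the two
summary_2020 <- data.frame(
  age_group = "75-84",
  week      = 1:8,
  deaths    = c(100, 98, 95, 130, 180, 150, 110, 96),
  average   = c(105, 102, 100, 100, 99, 98, 97, 96)
)
summary_2020$ymin <- pmin(summary_2020$deaths, summary_2020$average)

p <- ggplot(summary_2020, aes(x = week)) +
  facet_wrap(~ age_group, scales = "free") +
  # Historic average as a thicker grey line
  geom_line(aes(y = average), colour = "grey40", linewidth = 1) +
  # Shaded excess: from ymin up to the 2020 value
  geom_ribbon(aes(ymin = ymin, ymax = deaths), fill = "#f2a3a3") +
  # Wider white line underneath, seen as a thin white outline
  geom_line(aes(y = deaths), colour = "white", linewidth = 1.6) +
  # The 2020 line in red, drawn last so it sits on top
  geom_line(aes(y = deaths), colour = "#cc0a11", linewidth = 1)
print(p)
```

Because layers are drawn in the order they are added, the white line must be added before the red one for the shadow effect to work.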

  • 16:44

    ALEX REPPEL [continued]: And this is exactly what we wanted. The final layer is yet another layer added for aesthetic reasons. It is a straight horizontal line that meets the y-axis at 0. And the graph is now ready. [Themes]

  • 17:07

    ALEX REPPEL [continued]: The ggplot2 library includes a number of attractive themes to ensure visual consistency and to enhance the appeal of the final output. Here we see one of those themes applied to our layered graph. However, we want to replicate our original inspiration

  • 17:28

    ALEX REPPEL [continued]: as closely as possible, which requires us to customize our theme. While straightforward, this step can take some time to get right, as is evident from the fairly long list of customization options that we have added, as you can see here.
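The video's theme options are not listed in the transcript, so here is a hedged sketch of the pattern: start from a built-in theme and override individual elements with `theme()`. The sketch also adds the final zero baseline with `geom_hline()`. The specific choices are assumptions, including the pale pink background meant to evoke the Financial Times style.

```r
library(ggplot2)

# Hypothetical data for two age groups
df <- data.frame(age_group = rep(c("65-74", "75-84"), each = 5),
                 week = rep(1:5, times = 2),
                 deaths = c(5, 6, 7, 6, 5, 50, 60, 70, 60, 50))

p <- ggplot(df, aes(x = week, y = deaths)) +
  facet_wrap(~ age_group, nrow = 1, scales = "free") +
  geom_line(colour = "#cc0a11") +
  # Final layer: horizontal baseline at zero
  geom_hline(yintercept = 0, colour = "grey30") +
  # Built-in theme as a starting point ...
  theme_minimal(base_size = 11) +
  # ... then individual overrides (all values are illustrative assumptions)
  theme(
    plot.background    = element_rect(fill = "#fff1e5", colour = NA),
    panel.grid.minor   = element_blank(),
    panel.grid.major.x = element_blank(),
    strip.text         = element_text(face = "bold", hjust = 0),
    plot.title         = element_text(face = "bold"),
    plot.caption       = element_text(hjust = 0, colour = "grey40"),
    axis.title         = element_blank()
  )
print(p)
```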

  • 17:49

    ALEX REPPEL [continued]: But the result is worth the effort. And finally, our subtitle looks as expected, with individual words highlighted to correspond with the data shown in each of the six charts. [Scales] However, we still have to improve

  • 18:11

    ALEX REPPEL [continued]: the labels used on both axes. At the moment, ggplot2 has applied default labels, which are, in a sense, best guesses. There's nothing wrong with them, but we want to replace the labels on the x-axis with dates. So when we apply this, you can see here

  • 18:33

    ALEX REPPEL [continued]: that the dates shown on the x-axis correspond with the weeks we said earlier we wanted to highlight. For the y-axis, we specify up to three values: a maximum value, roughly half of that maximum, and zero at the bottom.
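Here is a sketch of both scale adjustments, under stated assumptions. The x-axis uses the highlighted week numbers as breaks and the formatted dates as labels. The y-axis uses a breaks function, which ggplot2 calls with each panel's axis range when scales are free, so every panel gets its own zero, midpoint and top break. The data, weeks and the helper `y_breaks` are hypothetical.

```r
library(ggplot2)

# Hypothetical data: two age groups with different magnitudes, 12 weeks
deaths_long <- data.frame(
  age_group = rep(c("65-74", "75-84"), each = 12),
  week      = rep(1:12, times = 2),
  deaths    = c(seq(40, 62, by = 2), seq(150, 205, by = 5))
)

# Hypothetical highlighted weeks and their Monday dates as two-line labels
highlight_weeks <- c(1, 5, 9)
week_starting <- seq(as.Date("2019-12-30"), by = "week", length.out = 12)
x_labels <- format(week_starting[highlight_weeks], "%d\n%b")

# Zero, about half the upper limit, and the upper limit (rounded down)
y_breaks <- function(limits) c(0, floor(limits[2] / 2), floor(limits[2]))

p <- ggplot(deaths_long, aes(x = week, y = deaths)) +
  facet_wrap(~ age_group, nrow = 1, scales = "free") +
  geom_line(colour = "#cc0a11") +
  scale_x_continuous(breaks = highlight_weeks, labels = x_labels) +
  scale_y_continuous(breaks = y_breaks, limits = c(0, NA))
print(p)
```

Setting `limits = c(0, NA)` anchors every y-axis at zero. The upper limit is still taken from each panel's own data.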

  • 18:55

    ALEX REPPEL [continued]: Our small multiple is now complete. While it does not include all the annotations of the original inspiration, we did manage to include several of them. We added a customized subtitle. We added dates as labels on the x-axis.

  • 19:16

    ALEX REPPEL [continued]: And we added an area chart to highlight the difference between the case count for 2020 and the historic average. In the first part of this video series on small multiples, I quoted Hadley Wickham, who states that fixed scales make it easier to see patterns across panels.

  • 19:40

    ALEX REPPEL [continued]: Free scales make it easier to see patterns within panels. We can confirm that by looking at our result. Using a separate scale for each chart allows us to see patterns within panels. How different would our small multiple look if we applied the same scale to all charts?

  • 20:04

    ALEX REPPEL [continued]: To demonstrate the difference between a free scale and a fixed scale, I have manually adjusted the y-axis to fit the highest value across the entire data set. The result is a y-axis that is fixed to that value across all charts.
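Here is a sketch of this comparison on hypothetical data with one small and one large age group. Giving the y-scale explicit limits from the overall maximum forces every panel onto the same axis, mirroring the manual adjustment described above. Dropping `scales` from `facet_wrap()` (its default is `"fixed"`) would also share the axis, but its range would then come from the data rather than starting at zero.

```r
library(ggplot2)

# Hypothetical data: one small and one large age group
deaths_long <- data.frame(
  age_group = rep(c("45-64", "85+"), each = 10),
  week      = rep(1:10, times = 2),
  deaths    = c(20 + 1:10, 400 + 10 * (1:10))
)

base <- ggplot(deaths_long, aes(x = week, y = deaths)) + geom_line()

# Free y-scales: each panel zooms in on its own range (patterns within panels)
p_free <- base + facet_wrap(~ age_group, scales = "free_y")

# Fixed y-axis: explicit limits from the overall maximum apply to every panel
# (patterns across panels); the small group flattens near the bottom
y_max <- max(deaths_long$deaths)
p_fixed <- base +
  facet_wrap(~ age_group, scales = "free_y") +
  scale_y_continuous(limits = c(0, y_max))

print(p_free)
print(p_fixed)
```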

  • 20:25

    ALEX REPPEL [continued]: The outcome of this change is quite striking. [Summary] The aim of this exercise was to show how easy it is to create aesthetically pleasing data visualizations using R and the ggplot2 library.

  • 20:46

    ALEX REPPEL [continued]: A considerable level of complexity was added by a desire to reproduce some of the aesthetics and the overall visual appeal of John Burn-Murdoch's small multiple for the Financial Times. When I first started learning R not too long ago,

  • 21:08

    ALEX REPPEL [continued]: it was particularly the ease with which I was able to create small multiples that impressed me the most. So I hope you have enjoyed this video. While the final graph we have created may not be as insightful as some of the examples I showed you at the beginning of this video, I nevertheless

  • 21:29

    ALEX REPPEL [continued]: hope that it encourages you to explore R and the ggplot2 library when creating your own data visualizations.


Video Info

Series Name: Alex Reppel on Small Multiples

Episode: 2

Publisher: SAGE Publications, Ltd.

Publication Year: 2021

Video Type: Video Case

Methods: Data visualization, Data management, R packages

Keywords: charts (data visualization); communication aids; coronaviruses; customization; data management; data manipulation; data visualisation; database design; graphical presentation of data; pandemics; Scales of measurement; Statistical packages; variables (research)

Segment Info

Segment Num.: 1

Persons Discussed:

Events Discussed:

Keywords:

Abstract

Dr. Alex Reppel, a reader in marketing at Royal Holloway, University of London, discusses using R and ggplot to create and customize small multiples to visualize the effect of the coronavirus pandemic.
