How-to Guide
Introduction

In this guide, you will learn how to create a line chart in R using the tidyverse set of R packages. Readers are provided links to the example dataset and encouraged to replicate this example. An additional practice example is suggested at the end of this guide. The example assumes you have downloaded the relevant data files to a folder on your computer and that you are using the RStudio IDE. The relevant code should, however, work in other environments too.

Contents

1. Line Chart

2. An Example in R: Central American Greenhouse Gas Emissions 2000–2012

• 2.1 The R Procedure
• 2.1.1 Preparing the Data
• 2.1.2 Plotting the Data
• 2.2 Exploring the Output

1. Line Chart

The line chart is often considered the default option in statistical graphics and is probably the most commonly used chart type. Like the vertical bar chart, it can be used for displaying continuous data, and it is typically used to visualize data spanning a specific period of time. A line chart consists of individual data points connected by a line, where the line approximates the values falling between recorded points. Most often the points are joined by straight lines, but curved or stepped lines are sometimes used, the latter when changes in value are abrupt. The points themselves are usually marked with dots, though when the chart includes a large number of data points they are usually left unmarked to avoid cluttering the graphic. Markers are also not used with stepped or curved line charts. A line chart should always have a quantitative scale on both the x- and y-axes.
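To make two of these variants concrete, here is a minimal ggplot2 sketch. It is an illustration only, not part of the tutorial script. It draws the same series twice: once with straight segments and dot markers, and once as an unmarked stepped line. The values are the approximate Honduras readings tabulated for Figure 1, used only as sample input.

```r
library(ggplot2)

# Approximate Honduras values read off Figure 1 (kt of CO2 equivalent);
# used only to illustrate chart variants, not as source data
honduras <- data.frame(
  Year = 2000:2012,
  Emissions = c(21500, 13000, 15000, 22000, 15500, 22500, 17000,
                19500, 20000, 20000, 20000, 20000, 20500)
)

# Straight line segments with each recorded point marked (the common default)
straight <- ggplot(honduras, aes(x = Year, y = Emissions)) +
  geom_line() +
  geom_point()

# Stepped line without markers, suited to values that change abruptly
stepped <- ggplot(honduras, aes(x = Year, y = Emissions)) +
  geom_step()

print(straight)
print(stepped)
```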

2. An Example in R: Central American Greenhouse Gas Emissions 2000–2012

Figure 1 shows a basic line chart of Central American countries’ greenhouse gas emissions during the period 2000–2012. The chart is generated from World Bank data using tidyverse, a set of R packages.

The chart shows kt of CO2 equivalent of emissions by Guatemala, Honduras, Nicaragua, Panama, El Salvador, Costa Rica, and Belize. The horizontal axis ranges from 2000 to 2012 in increments of one year. The vertical axis ranges from 0 to 70,000 in increments of 10,000. The approximate data from the chart are tabulated below.

Year  Guatemala  Honduras  Nicaragua  Panama  El Salvador  Costa Rica  Belize
2000  70000      21500     14500      9000    10500        10000       2000
2001  21500      13000     15000      11000   12500        11000       1000
2002  31000      15000     15000      10000   12500        10000       2000
2003  55500      22000     17000      10500   12500        10500       4000
2004  25000      15500     15500      10000   12500        10500       2000
2005  42000      22500     17000      10000   12500        10000       2500
2006  27500      17000     1500       12000   13000        10500       2000
2007  34000      19500     15500      11500   13000        11000       1500
2008  33000      20000     16000      15500   13000        10500       2000
2009  38000      20000     16000      15750   13000        11000       2000
2010  30000      20000     16000      15750   13000        12500       2000
2011  30500      20000     16000      16000   13000        13000       2000
2012  31000      20500     16000      16000   13000        13000       2000

Text at the bottom of the chart reads, “Source: World Bank, 2020.”

Figure 1. Line Chart of Greenhouse Gas Emissions in Central America
2.1 The R Procedure

R is free, open-source software and a computing platform for statistical analysis with many charting options. R is not based on a graphical interface with pull-down menus; instead, you enter lines of code that execute functions and operations built into R or into add-on packages. It is best to save your code in a plain text file, which R users generally call a script file. We provide a script file with this example that executes all of the operations described here. If you are not familiar with R, we suggest you start with the introductory manual, An Introduction to R (http://cran.R-project.org/doc/manuals/R-release/R-intro.html).

For this example, we are using RStudio, a free, open-source integrated development environment for R that makes working with R easier.

In this example, we write our code in an R script, found in the top left of the four panes in RStudio. This means that all actions are recorded and kept for further use, which helps you trace back the steps and decisions made in the analysis. To run the code, either press Ctrl + Enter (Command + Enter on a Mac) on each line of code, or highlight the line(s) of code you wish to run and click Run. (Code can also be typed into the Console pane in the bottom left, pressing Enter at the end of each line. This does not record your actions, however.)

Creating the line chart requires installing some packages and loading them as libraries. If you do not have them already, install them either by typing install.packages("packagename") in the console or by using the menu item Install Packages… under Tools. If something is missing when you run the script, you will get an error message; install the missing package, and everything should work after that.

The packages used in the code shown in this guide are tidyverse (which includes readr, dplyr, tidyr, and ggplot2) and ggrepel.

Install these as needed and save the tutorial CSV data file world-bank-emissions.csv to a folder on your computer. The example uses a folder called sage-dataset in your user (home) folder, with the data file in a subfolder called tables.

Begin by selecting the first four lines of the script file and pressing Ctrl + Enter (Command + Enter on a Mac); this loads the necessary libraries.
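The script file itself is not reproduced in this guide, so the following is only a sketch of what its opening lines might look like. It assumes the packages needed are the two named in this guide, tidyverse and ggrepel. It installs whichever of them is missing and then loads both.

```r
# Packages assumed from this guide: tidyverse (readr, dplyr, tidyr, ggplot2,
# ...) and ggrepel. This mirrors, but is not copied from, the script file.
pkgs <- c("tidyverse", "ggrepel")

# Install only the packages that are not already available
to_install <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(to_install) > 0) install.packages(to_install)

# Attach the libraries for the current session
library(tidyverse)
library(ggrepel)
```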

2.1.1 Preparing the Data

We will begin by reading our data into a data frame called greenhouse_emissions.
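A minimal sketch of this step follows. It assumes readr's read_csv() and the folder layout described above; the exact call in the provided script may differ. Because the data file is not bundled here, the runnable part writes a tiny stand-in file with the same kind of header row. This shows why later code can refer to columns as "Country Name" and "2000": read_csv() keeps such headers unchanged, whereas base R's read.csv() rewrites them into syntactic names by default.

```r
library(readr)

# With the tutorial file, the call might be (path assumed from the folder
# layout described above):
# greenhouse_emissions <- read_csv("~/sage-dataset/tables/world-bank-emissions.csv")

# Self-contained stand-in with the same kind of header row. The indicator
# fields are placeholders; the two values are approximate Belize readings
# from Figure 1.
demo_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Country Name,Country Code,Indicator Name,Indicator Code,2000,2001",
  "Belize,BLZ,placeholder,placeholder,2000,1000"
), demo_path)

# read_csv() keeps the headers exactly as written ("Country Name", "2000", ...);
# cols() with no arguments lets readr guess the column types quietly
demo <- read_csv(demo_path, col_types = cols())
print(colnames(demo))

# Base read.csv() makes them syntactic instead ("Country.Name", "X2000", ...)
print(names(read.csv(demo_path)))
```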

You can look at the data table by typing View(greenhouse_emissions) into the Console pane in the bottom left of the interface or by opening the data table from the Environment pane in the top right. You can also view just the column names with the command colnames(greenhouse_emissions) (Figure 2).

The window consists of two tabs: linechart-in-worldbank-2020-project and greenhouse_emissions. The second tab is selected. The data table shows the following columns: Country Name, Country Code, Indicator Name, Indicator Code, 1970, 1971, 1973, and 1974. Text at the bottom of the window reads, “Showing 1 to 23 entries of 264 entries, 47 total columns.”

Figure 2.

First, we make a copy of the original dataset in case we later want to create several different subsets…

GE <- greenhouse_emissions

… and, for ease of use, we rename one column to a shorter name:

GE <- GE %>% rename("Name" = "Country Name")

Next, we select the columns and rows we want to visualize in our chart: the years 2000–2012 and the countries of Central America, chosen by name. We use select() for columns and filter() for rows. You could also choose rows by value, for example, with %>% filter(`2012` > 10000000); the backticks are required because the column name starts with a digit.

GE <- GE %>%
  select("Name", "2000":"2012") %>%
  filter(Name %in% c("Guatemala", "Belize", "Panama", "Costa Rica", "Nicaragua", "Honduras", "El Salvador"))
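As a sketch of the filter-by-value option, the following uses a small stand-in for GE. It has a single 2012 column holding the approximate 2012 values tabulated for Figure 1. The threshold of 15000 is arbitrary, chosen only so that the filter visibly drops some rows.

```r
library(dplyr)

# Stand-in for GE after select(): approximate 2012 values from Figure 1
# (kt of CO2 equivalent); the real GE has one column per year, 2000 to 2012
ge_2012 <- tibble(
  Name = c("Guatemala", "Honduras", "Nicaragua", "Panama",
           "El Salvador", "Costa Rica", "Belize"),
  `2012` = c(31000, 20500, 16000, 16000, 13000, 13000, 2000)
)

# A column name starting with a digit must be wrapped in backticks;
# the threshold is an arbitrary illustrative value
high_emitters <- ge_2012 %>% filter(`2012` > 15000)
print(high_emitters$Name)
```

With these values, the filter keeps Guatemala, Honduras, Nicaragua, and Panama.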

If we open the GE data frame now, it should look something like Figure 3.

The screenshot shows two tabs: linechart-in-worldbank-2020-project and GE. The second tab is selected. The table shows the following columns: Name, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, and 2008.

Figure 3.

We now reshape the table into long format for easier plotting, gathering the individual year columns into a single Year column and their values into an Emissions column (Figure 4).

GE <- GE %>% gather(Year, Emissions, -Name)
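gather() still works, but tidyr (from version 1.0.0) describes it as superseded by pivot_longer(). This minimal sketch uses a two-country stand-in for GE built from approximate Figure 1 values. It shows that both calls produce the same long table, apart from row order.

```r
library(tidyr)
library(dplyr)

# Tiny wide table shaped like GE before reshaping
# (approximate Figure 1 values, two countries, three years)
ge_wide <- tibble(
  Name = c("Belize", "Panama"),
  `2000` = c(2000, 9000),
  `2001` = c(1000, 11000),
  `2002` = c(2000, 10000)
)

# gather(), as used in the text: one row per country-year pair
long_gather <- ge_wide %>% gather(Year, Emissions, -Name)

# pivot_longer(), the newer equivalent
long_pivot <- ge_wide %>%
  pivot_longer(-Name, names_to = "Year", values_to = "Emissions")

# In both results Year is a character column
print(class(long_pivot$Year))
```

Either way, Year comes out as text rather than numbers, which is why the plotting code later treats the x-axis as discrete (scale_x_discrete()).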

The screenshot shows two tabs: linechart-in-worldbank-2020-project and GE. The second tab is selected. The table shows the following columns: Name, Year, and Emissions.

Figure 4.
2.1.2 Plotting the Data

Now that we have our data ready, we can begin plotting the line chart. We will begin by initializing a ggplot plot and adding series lines and points.

lineChart <- ggplot(GE, aes(x = Year, y = Emissions, group = Name, color = Name, label = Name)) +
  geom_line() +
  geom_point()
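The group = Name mapping matters here because Year is a character column. By default, ggplot2 groups observations by the interaction of all discrete variables in the plot (label excepted). With a discrete x, that would put every point in a group of its own and leave geom_line() nothing to connect. This small sketch uses a two-country stand-in built from approximate Figure 1 values and checks that the line layer ends up with one group per country.

```r
library(ggplot2)

# Long-format stand-in for GE (approximate Figure 1 values);
# Year is character, as it is after gather()
ge_long <- data.frame(
  Name = rep(c("Belize", "Panama"), each = 3),
  Year = rep(c("2000", "2001", "2002"), times = 2),
  Emissions = c(2000, 1000, 2000, 9000, 11000, 10000)
)

# group = Name tells geom_line() to join each country's points,
# overriding the default grouping by every discrete variable
p <- ggplot(ge_long, aes(x = Year, y = Emissions, group = Name, color = Name)) +
  geom_line() +
  geom_point()

# Number of line groups in the first layer: one per country
print(length(unique(layer_data(p, 1)$group)))
```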

We also apply the highly visible Dark2 color palette and a minimal chart theme at this point. To browse the available palettes and further documentation on colors, search for “scale_color_brewer” in the Help pane. You can check the current state of your plot at any time by typing lineChart in the Console (Figure 5).

lineChart<- lineChart + theme_minimal() + scale_color_brewer(palette = "Dark2")

The chart shows the emissions by Guatemala, Honduras, Nicaragua, Panama, El Salvador, Costa Rica, and Belize. The horizontal axis is labeled Year and ranges from 2000 to 2012 in increments of one year. The vertical axis is labeled Emissions and ranges from 0 to 70,000 in increments of 10,000. The plotted values are the same as those tabulated for Figure 1.
Figure 5.

Next, we blank out the vertical gridlines and the panel background, hide the legend (the series will be labeled directly), and add a generous plot margin, with extra room on the right, for visual breathing space. If some of your chart elements are cut off by the edge of the plot area, you can experiment with the scale expansion parameters in scale_x_discrete(expand = c(0.1, 0.6)). If you want to give your x- and y-axes labels, you would do so at the end of this step with ylab("Y AXIS TEXT") + xlab("X AXIS TEXT"); in this case, we prefer to leave them blank.

lineChart <- lineChart + scale_x_discrete(expand = c(0.1, 0.6)) +
  theme(legend.position = "none",               # series are labeled directly instead
        panel.grid.major.x = element_blank(),   # remove vertical gridlines
        panel.background = element_blank(),
        plot.margin = unit(c(30, 80, 30, 30), "points")) +  # top, right, bottom, left
  ylab("") + xlab("")
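The expand argument takes a multiplicative and an additive constant, and both are applied to each end of the axis. If the right-hand labels need more room than the left side, ggplot2's expansion() helper (available from ggplot2 3.3.0) accepts separate lower and upper values. In this sketch, the upper additive value of 2 is an arbitrary illustrative choice. The data are a stand-in built from approximate Figure 1 values.

```r
library(ggplot2)

# Stand-in long data (approximate Figure 1 values), Year as character
ge_long <- data.frame(
  Name = rep(c("Belize", "Panama"), each = 2),
  Year = rep(c("2011", "2012"), times = 2),
  Emissions = c(2000, 2000, 16000, 16000)
)

# Same mult (0.1) and lower add (0.6) as in the text, but extra space
# on the right only; expansion() returns lower mult, lower add, upper mult, upper add
asymmetric <- expansion(mult = 0.1, add = c(0.6, 2))
print(asymmetric)

p <- ggplot(ge_long, aes(x = Year, y = Emissions, group = Name)) +
  geom_line() +
  scale_x_discrete(expand = asymmetric)
print(p)
```

On the tutorial's plot object, the equivalent would be scale_x_discrete(expand = expansion(mult = 0.1, add = c(0.6, 2))) in place of expand = c(0.1, 0.6).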

Next, we add our chart title, subtitle, and source information:

lineChart <- lineChart + labs(title = "Total greenhouse gas emissions, 2000-2012",
                              subtitle = "kt of CO2 equivalent",
                              caption = "Source: World Bank, 2020") +
  theme(plot.title = element_text(face = "bold", size = 14, color = "black"))

Finally, we add direct text labels for each series:

lineChart <- lineChart + geom_text_repel(aes(label = ifelse(Year == "2012", Name, "")),
                                         direction = "y",
                                         segment.alpha = 0.2,
                                         nudge_x = 2,
                                         nudge_y = 2)

For the text labels, we use the ggrepel package’s geom_text_repel() function to avoid overlapping labels. With future datasets, you can also experiment with the more traditional geom_text() if the series labels are in no danger of overlapping. The conditional ifelse() expression blanks every point label except those for the final year, 2012. Most of the other parameters are fairly self-explanatory: nudge_x and nudge_y move each label a set distance away from its data point, direction = "y" allows labels to move only vertically when they repel one another, and segment.alpha makes the connecting line segments faint. Another useful parameter, box.padding (not set here), controls the padding around each label and hence how far apart labels are kept. For further documentation on these repelling labels, search for “ggrepel” in the Help pane.
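If you try the plain geom_text() route mentioned above, one option is to pass only the final-year rows to the text layer instead of blanking the other labels with ifelse(). This is a sketch, not the approach used in the script. The same data argument also works with geom_text_repel(). The stand-in data use approximate 2010–2012 values from Figure 1.

```r
library(ggplot2)
library(dplyr)

# Long-format stand-in for GE (approximate Figure 1 values)
ge_long <- tibble(
  Name = rep(c("Belize", "Panama"), each = 3),
  Year = rep(c("2010", "2011", "2012"), times = 2),
  Emissions = c(2000, 2000, 2000, 15750, 16000, 16000)
)

# Only the 2012 rows reach the text layer, so no labels need blanking;
# hjust = 0 left-aligns each label and nudge_x shifts it right of the last point
p <- ggplot(ge_long, aes(x = Year, y = Emissions, group = Name, color = Name)) +
  geom_line() +
  geom_text(data = filter(ge_long, Year == "2012"),
            aes(label = Name), hjust = 0, nudge_x = 0.1)

# One label per country in the text layer
print(nrow(layer_data(p, 2)))
```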

Now that we have added all our elements to the lineChart variable, we can plot our finished chart (Figure 6).

lineChart

The chart shows kt of CO2 equivalent of emissions by Guatemala, Honduras, Nicaragua, Panama, El Salvador, Costa Rica, and Belize. The horizontal axis ranges from 2000 to 2012 in increments of one year. The vertical axis ranges from 0 to 70,000 in increments of 10,000. The plotted values are the same as those tabulated for Figure 1.

Text at the bottom of the chart reads, “Source: World Bank, 2020.”

Figure 6. Our Completed Line Chart

If you want to fine-tune your plot visually, you can export it from R as a PDF (see the Export menu above your plot) and open it in a vector graphics editor of your choice, such as Adobe Illustrator or Inkscape. The image in Figure 7 has been slightly tweaked in this way for improved legibility and cleaner alignment. Many of these small adjustments can also be made within R, such as drawing a bolder zero baseline with lineChart <- lineChart + geom_hline(yintercept = 0, color = "black", linewidth = 0.5). Because geom_hline() spans the full width of the panel, it needs no x positions, which would otherwise have to match the discrete Year axis; in ggplot2 versions before 3.4.0, use size instead of linewidth.
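As an alternative to the Export menu, ggplot2's ggsave() can write a plot straight to a PDF file for editing in Illustrator or Inkscape. This minimal sketch uses a small stand-in plot. The file name, the temporary folder, and the 8 by 5 inch size are arbitrary choices. With the tutorial objects, the call would be ggsave("line-chart.pdf", plot = lineChart, width = 8, height = 5).

```r
library(ggplot2)

# Small stand-in plot (approximate Belize values from Figure 1)
belize <- data.frame(Year = 2000:2003, Emissions = c(2000, 1000, 2000, 4000))
p <- ggplot(belize, aes(x = Year, y = Emissions)) +
  geom_line()

# Save as a vector PDF; width and height are in inches by default
out_file <- file.path(tempdir(), "line-chart.pdf")
ggsave(out_file, plot = p, width = 8, height = 5)
```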

The chart is the same as Figure 6, with minor visual adjustments. The plotted values are the same as those tabulated for Figure 1.

Text at the bottom of the chart reads, “Source: World Bank, 2020.”

Figure 7. The Finished Line Chart After Some Tweaking in Illustrator
2.2 Exploring the Output

The line chart created in this demonstration shows how widely emissions vary within a relatively limited geographical area, among countries of relatively similar size in terms of area. Guatemala differs markedly from the other Central American countries: even in the years when the gap is smallest, it emits roughly half again as much as the next-largest emitter, Honduras. This is at least partly a reflection of Guatemala’s relatively large population and economic output, but it may also reflect variables such as how electricity is generated.
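The “half again as much” comparison can be checked with a little arithmetic on the approximate values tabulated for Figure 1. This sketch divides Guatemala’s value by Honduras’s for each year and reports the smallest ratio. Because the inputs are chart readings, the result is approximate too.

```r
# Approximate values for 2000-2012 as tabulated for Figure 1
# (kt of CO2 equivalent, read off the chart)
years <- 2000:2012
guatemala <- c(70000, 21500, 31000, 55500, 25000, 42000, 27500,
               34000, 33000, 38000, 30000, 30500, 31000)
honduras  <- c(21500, 13000, 15000, 22000, 15500, 22500, 17000,
               19500, 20000, 20000, 20000, 20000, 20500)

# Ratio of Guatemala's emissions to Honduras's in each year
ratio <- guatemala / honduras

# Smallest ratio and the year in which it occurs
print(round(min(ratio), 2))
print(years[which.min(ratio)])
```

With these values, the smallest ratio is 1.5 (30,000 / 20,000), in 2010.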

Several countries appear to show emission peaks in both 2003 and 2005, while emissions since then have mostly stabilized. The data themselves do not hint at the reason for these peaks, but they do highlight some interesting time periods for further study.
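The 2003 and 2005 peaks can also be picked out programmatically. This sketch again uses the approximate values tabulated for Figure 1, for the years 2002–2006 only. It treats a year as a peak for a country when that country’s value is strictly higher than in both neighboring years. That definition is our assumption for illustration.

```r
# Approximate values for 2002-2006 as tabulated for Figure 1
# (kt of CO2 equivalent, read off the chart); one row per country
emissions <- rbind(
  Guatemala     = c(31000, 55500, 25000, 42000, 27500),
  Honduras      = c(15000, 22000, 15500, 22500, 17000),
  Nicaragua     = c(15000, 17000, 15500, 17000,  1500),
  Panama        = c(10000, 10500, 10000, 10000, 12000),
  `El Salvador` = c(12500, 12500, 12500, 12500, 13000),
  `Costa Rica`  = c(10000, 10500, 10500, 10000, 10500),
  Belize        = c( 2000,  4000,  2000,  2500,  2000)
)
colnames(emissions) <- as.character(2002:2006)

# TRUE for each country whose value in `year` is strictly higher than
# in both the preceding and the following year
is_peak <- function(m, year) {
  y    <- as.character(year)
  prev <- as.character(year - 1)
  nxt  <- as.character(year + 1)
  m[, y] > m[, prev] & m[, y] > m[, nxt]
}

both_peaks <- rownames(emissions)[is_peak(emissions, 2003) & is_peak(emissions, 2005)]
print(both_peaks)
```

With these values, Guatemala, Honduras, Nicaragua, and Belize peak in both years.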