How-to Guide for R
Introduction

In this guide, you will learn how to compute closeness centrality in the statistical software R, using a practical example to illustrate the process. Links to the example dataset are provided, and you are encouraged to replicate the example. An additional practice exercise is suggested at the end of this guide. The example assumes that the data files are stored in the working directory being used by R.

Contents
  • 1 Closeness
  • 2 An Example in R: Centrality in the Marriage Network
    • 2.1 The R Procedure
    • 2.2 Exploring the R Output
  • 3 Your Turn
1 Closeness

Closeness is a centrality measure for nodes in a network. In other words, it ranks nodes based on their positions in a network. The method assumes that nodes which are closer to other nodes in a network are more central in the network. Technically, the closeness of a node is the inverse of the average distance between the node and every other node in the network; hence, nodes with high closeness are likely to be in the center of a network.
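
As a compact restatement of the definition above, the closeness of a node v in a connected network with n nodes can be written as follows. The symbols C, n, and d are notation introduced here for illustration; d(v, u) stands for the length of the shortest path between v and u.

```latex
C(v) = \frac{1}{\frac{1}{n-1}\sum_{u \neq v} d(v, u)} = \frac{n-1}{\sum_{u \neq v} d(v, u)}
```

For example, a node that is one step away from each of the other n - 1 nodes has an average distance of 1 and therefore a closeness of 1, the largest possible value under this definition.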

2 An Example in R: Centrality in the Marriage Network

This example introduces the closeness centrality measure using a network of Renaissance Florentine families from around 1430. Specifically, we examine the closeness centrality of the Florentine families in their marriage network. The families are the nodes, and the marriage ties between families are the edges of the network.

This example uses a subset of data from the Florentine Families dataset collected by Padgett (1994) and made publicly available by UCINET (https://sites.google.com/site/ucinetsoftware/datasets/padgettflorentinefamilies). The network is undirected since marriage ties are mutual. It includes 16 nodes and 20 edges.

2.1 The R Procedure

R is free, open-source software and a computing platform well suited for statistical analysis. R does not operate with pull-down menus. Rather, you must submit lines of code that execute functions and operations built into R. It is best to save your code in a simple text file, which R users generally refer to as a script file. We provide a script file with this example that executes all of the operations described here. If you are not familiar with R, we suggest you start with the introduction manual located at http://cran.r-project.org/doc/manuals/r-release/R-intro.html.

For this example, we must first load the node table and the edge table into R. Using the network files provided, the code looks like this (assuming the data files are already saved in your working directory):

nodes = read.csv('dataset-florentine-1994-subset1-nodes.csv')
edges = read.csv('dataset-florentine-1994-subset1-edges.csv')
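
If read.csv reports that a file cannot be found, the files are probably not in the working directory. The sketch below is one way to check this before loading; the variable names data_files and files_found are introduced here for illustration, and the file names are the ones used in this guide.

```r
# File names are taken from this guide; adjust them if your copies are named differently
data_files <- c("dataset-florentine-1994-subset1-nodes.csv",
                "dataset-florentine-1994-subset1-edges.csv")

print(getwd())                           # directory R searches for relative file names
files_found <- file.exists(data_files)   # one TRUE/FALSE per file
names(files_found) <- data_files
print(files_found)

# Load and inspect the tables only when both files are present
if (all(files_found)) {
  nodes <- read.csv(data_files[1])
  edges <- read.csv(data_files[2])
  str(nodes)   # column names and types of the node table
  str(edges)   # column names and types of the edge table
}
```

If either entry is FALSE, move the files into the directory printed by getwd() or change the working directory with setwd().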

Now the node table and edge table are read in as data frames. To perform any analysis, we need to turn them into a network object. Two packages in R are commonly used for network analysis: igraph and statnet. Statnet is useful for statistical modeling of networks and is introduced in the SAGE Research Methods Dataset on Exponential Random Graph Models. In this example, we use igraph, which is well suited to computations on networks.

We need to load the igraph package in order to use it. If you don’t have igraph installed, loading it will produce an error. In that case, run the following code to install it first:

install.packages('igraph')

Once it is installed successfully, or if it was already installed, you can load it like this:

library('igraph')
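
The two steps above can also be combined so that a script installs igraph only when it is missing. This is a minimal sketch using base R's requireNamespace; it assumes an internet connection and a configured CRAN mirror if the installation is actually triggered.

```r
# Install igraph only if it is not already available, then load it
if (!requireNamespace("igraph", quietly = TRUE)) {
  install.packages("igraph")
}
library(igraph)
```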

Next, we can turn the node and edge tables into a network object by the following command:

G = graph_from_data_frame(d=edges, vertices=nodes, directed=FALSE)

The function graph_from_data_frame reads the first column of the node table as the node ID and the first two columns of the edge table as the two endpoints of each edge. Any further column in the node table is stored as a node attribute, and any further column in the edge table is stored as an edge attribute. Here, we want to manually specify the name of each node using the “label” column in the node table. This can be done with the following code:

V(G)$name = as.character(nodes$label)

You can set other attributes for the nodes in the same way. The benefit of naming the nodes in this example is that we can refer to them directly by name (instead of by ID) in further analysis in igraph.
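
To illustrate how columns become attributes and how names can then be used, here is a small sketch on made-up tables. The column names id, label, wealth, from, to, and year, and all values, are hypothetical and are not taken from the Florentine files.

```r
library(igraph)

# Hypothetical toy tables (not the Florentine data)
toy_nodes <- data.frame(id = 1:3, label = c("A", "B", "C"), wealth = c(10, 20, 30))
toy_edges <- data.frame(from = c(1, 2), to = c(2, 3), year = c(1420, 1425))

toy_G <- graph_from_data_frame(d = toy_edges, vertices = toy_nodes, directed = FALSE)

# The first node column becomes "name"; the remaining columns become attributes
print(vertex_attr_names(toy_G))   # "name" "label" "wealth"
print(edge_attr_names(toy_G))     # "year"

# Replace the ID-based names with the labels, as done for G in this guide
V(toy_G)$name <- as.character(toy_nodes$label)

# Nodes can now be addressed by name
print(degree(toy_G, v = "B"))          # B has two ties
print(neighbors(toy_G, "B")$name)      # the nodes tied to B
```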

First, the network can be plotted for visual inspection with the following command:

plot(G)

The closeness centrality of each node in this network can be calculated with:

closeness(G, normalized=TRUE)

Setting the parameter “normalized” to TRUE makes closeness the inverse of the average distance from each node to the other nodes, instead of the inverse of the sum of those distances.
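
The effect of “normalized” can be checked by hand on a small graph. The sketch below uses a made-up path graph a - b - c - d (not the Florentine data) and compares igraph's output with values computed directly from the shortest-path distances. A connected graph is used on purpose, since the treatment of unreachable nodes can differ between igraph versions.

```r
library(igraph)

# Hypothetical toy path graph a - b - c - d (connected, undirected)
toy_edges <- data.frame(from = c("a", "b", "c"), to = c("b", "c", "d"))
toy <- graph_from_data_frame(toy_edges, directed = FALSE)

n <- vcount(toy)                       # number of nodes
dist_sum <- rowSums(distances(toy))    # summed shortest-path lengths from each node

raw        <- closeness(toy)                     # inverse of the summed distance
normalized <- closeness(toy, normalized = TRUE)  # inverse of the average distance

# On a connected graph both comparisons should return TRUE
print(all.equal(unname(raw), unname(1 / dist_sum)))
print(all.equal(unname(normalized), unname((n - 1) / dist_sum)))

print(normalized)   # a = 0.50, b = 0.75, c = 0.75, d = 0.50
```

The two middle nodes are closest to everyone else on average, so they receive the highest values.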

2.2 Exploring the R Output

R returns the result of each command above immediately. We summarize the results here.

The closeness centrality for each node is shown in Table 1. We can see that the Medici family is the most central one, with a closeness centrality of 0.37, followed by Ridolfi. Albizzi and Tornabuoni are tied for third place. Note that Ridolfi is not among the top three in terms of degree and PageRank centrality (see the SAGE Research Methods Dataset on PageRank), but it is close to many other families.

Table 1: Closeness Centrality of Each Node.

Family          Closeness
Acciaiuoli      0.28
Albizzi         0.33
Barbadori       0.31
Bischeri        0.29
Castellani      0.29
Ginori          0.26
Guadagni        0.33
Strozzi         0.31
Lamberteschi    0.25
Medici          0.37
Pazzi           0.23
Peruzzi         0.28
Pucci           0.06
Ridolfi         0.34
Salviati        0.29
Tornabuoni      0.33
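
A ranking like the one discussed for Table 1 is easier to read when the values are rounded and sorted. The following is a minimal sketch; rank_closeness is a hypothetical helper name introduced here, and the star graph with node names a to e is made-up data used so the example runs on its own.

```r
library(igraph)

# Hypothetical helper: normalized closeness, rounded to two decimals, highest first
rank_closeness <- function(graph) {
  sort(round(closeness(graph, normalized = TRUE), 2), decreasing = TRUE)
}

# Toy star graph with made-up node names; "a" is the hub
star <- make_star(5, mode = "undirected")
V(star)$name <- letters[1:5]

ranked <- rank_closeness(star)
print(ranked)   # the hub "a" comes first
```

Applied to the network built in this guide, the call would be rank_closeness(G).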

3 Your Turn

Download this sample data to see whether you can replicate these results. Repeat the process after removing the most central node and check how the ranking changes.
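
One possible way to approach the second part of the exercise is sketched below on a toy graph rather than on the Florentine data. The helper name remove_most_central and the path graph a - b - c - d - e are assumptions made for illustration. Note that removing a node can split a network into separate pieces, and how closeness treats unreachable nodes may depend on the installed igraph version, so no expected values are shown for the reduced graph.

```r
library(igraph)

# Hypothetical helper: drop the node with the highest normalized closeness
remove_most_central <- function(graph) {
  cl <- closeness(graph, normalized = TRUE)
  top <- names(which.max(cl))      # name of the most central node
  delete_vertices(graph, top)
}

# Toy path graph a - b - c - d - e; the middle node "c" is the most central
toy_edges <- data.frame(from = c("a", "b", "c", "d"), to = c("b", "c", "d", "e"))
toy <- graph_from_data_frame(toy_edges, directed = FALSE)

reduced <- remove_most_central(toy)
print(V(reduced)$name)                        # "c" is no longer present
print(closeness(reduced, normalized = TRUE))  # closeness in the reduced graph
```

With the network from this guide, the same steps would be applied to G instead of toy.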