How-to Guide for R

Introduction

In this guide, you will learn how to compute betweenness centrality in the statistical software R, using a practical example to illustrate the process. Links to the example dataset are provided, and you are encouraged to replicate the example. An additional practice exercise is suggested at the end of this guide. The example assumes that the data files are stored in the working directory used by R.

Contents

- 1 Betweenness
- 2 An Example in R: Brokerage in the Network of Marriage
- 2.1 The R Procedure
- 2.2 Exploring the R Output
- 3 Your Turn

1 Betweenness

Betweenness is a centrality measure for both nodes and edges in a network. In other words, it ranks nodes or edges based on their positions in the network. The measure assumes that nodes or edges sitting on many shortest paths are important because they “control” the flow of information or resources between other nodes and have access to otherwise disconnected groups. Betweenness is therefore a function of the shortest paths in the network, and nodes or edges with high betweenness serve as gatekeepers or bridges between other nodes.
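To make the idea concrete before turning to the real data, here is a minimal sketch on a hypothetical five-node toy network (the node names A to E are made up and are not part of the Florentine dataset). It assumes the igraph package, which is introduced in Section 2.1, is already installed.

```r
library(igraph)

# Hypothetical toy network (not the Florentine data): a small tree
#   A - B - C - D
#           |
#           E
toy_edges <- data.frame(from = c("A", "B", "C", "C"),
                        to   = c("B", "C", "D", "E"),
                        stringsAsFactors = FALSE)
toy <- graph_from_data_frame(toy_edges, directed = FALSE)

# Number of shortest paths between other node pairs that pass through each node
node_btw <- betweenness(toy, directed = FALSE)
print(node_btw)

# Number of shortest paths between node pairs that use each edge
edge_btw <- edge_betweenness(toy, directed = FALSE)
print(edge_btw)
```

In this tree, C lies on the only path for five pairs of other nodes (A-D, A-E, B-D, B-E, D-E), so it receives the highest node betweenness, while the leaf nodes A, D, and E receive zero.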

2 An Example in R: Brokerage in the Network of Marriage

This example introduces the betweenness centrality measure using a network of Renaissance Florentine families from around 1430. Specifically, we examine the betweenness centrality of the Florentine families in their marriage network. The families are the nodes, and marriage ties between families are the edges of the network.

This example uses a subset of data from the Florentine Families dataset collected by Padgett (1994) and made publicly available by UCINET (https://sites.google.com/site/ucinetsoftware/datasets/padgettflorentinefamilies). The network is undirected since marriage ties are mutual. It includes 16 nodes and 20 edges.

2.1 The R Procedure

R is free, open-source software and a computing platform well suited to statistical analysis. R does not operate with pull-down menus. Rather, you submit lines of code that execute functions and operations built into R. It is best to save your code in a simple text file, which R users generally call a script file. We provide a script file with this example that executes all of the operations described here. If you are not familiar with R, we suggest you start with the introduction manual located at http://cran.r-project.org/doc/manuals/r-release/R-intro.html.

For this example, we must first load the node table and the edge table into R. Using the network files provided, the code looks like this (assuming the data files are already saved in your working directory):

nodes = read.csv('dataset-florentine-1994-subset1-nodes.csv')
edges = read.csv('dataset-florentine-1994-subset1-edges.csv')
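If read.csv() reports that a file cannot be opened, the files are most likely not in the working directory. The following sketch, which uses only base R functions, is one way to check; the file names are the ones used in this guide.

```r
# Folder in which R is currently looking for files
print(getwd())

# File names as used in this guide
files <- c("dataset-florentine-1994-subset1-nodes.csv",
           "dataset-florentine-1994-subset1-edges.csv")

# TRUE for each file that is present in the working directory
found <- file.exists(files)
print(setNames(found, files))

if (!all(found)) {
  message("Missing file(s): move them into the folder printed above, ",
          "or point R at their folder with setwd().")
}
```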

The node table and edge table are now read in as data frames. To perform any analysis, we need to turn them into a network object. Two R packages are commonly used for network analysis: igraph and statnet. Statnet is useful for statistical modeling of networks and is introduced in the SAGE Research Methods Dataset on Exponential Random Graph Models. In this example, we use igraph, which is well suited to computations on networks.

We need to load the igraph package in order to use it. If igraph is not installed, loading it will fail with an error. In that case, run the following code to install it first:

install.packages('igraph')

Once it is installed, you can load it like this:

library('igraph')
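In a script that will be run more than once, the two steps can be combined so that the package is installed only when it is missing. This is a common base R idiom rather than part of the script file provided with this example.

```r
# Install igraph only if it is not already available, then load it
if (!requireNamespace("igraph", quietly = TRUE)) {
  install.packages("igraph")
}
library(igraph)
```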

Next, we can turn the node and edge tables into a network object by the following command:

G = graph_from_data_frame(d=edges, vertices=nodes, directed=FALSE)

Any column after the first one in the node table is used as a node attribute, and any column after the second in the edge table is used as an edge attribute. Here, we want to set the name of each node manually, using the “label” column in the node table. This can be done with the following code:

V(G)$name = as.character(nodes$label)

You can set other attributes for the nodes similarly. The benefit of naming the nodes in this example is that we can refer to them directly by name (instead of by ID) in further analysis in igraph.
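The following sketch illustrates how table columns become attributes and what naming the nodes makes possible. The tables are hypothetical: the column names id, group, and kind and all values are assumptions made for illustration, and only the “label” column is taken from the description above.

```r
library(igraph)

# Hypothetical toy tables, not the Florentine files
toy_nodes <- data.frame(id    = 1:3,
                        label = c("A", "B", "C"),
                        group = c("x", "x", "y"),
                        stringsAsFactors = FALSE)
toy_edges <- data.frame(from = c(1L, 2L),
                        to   = c(2L, 3L),
                        kind = c("m", "m"),
                        stringsAsFactors = FALSE)

toy <- graph_from_data_frame(d = toy_edges, vertices = toy_nodes,
                             directed = FALSE)

# Columns after the first (nodes) or second (edges) are stored as attributes
print(vertex_attr_names(toy))
print(edge_attr_names(toy))

# Rename the nodes by their label, then refer to them by name
V(toy)$name <- as.character(toy_nodes$label)
print(neighbors(toy, "B")$name)
print(degree(toy, v = "B"))
```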

The betweenness centrality of each node in this network can be calculated with the following command:

betweenness(G, directed=FALSE)
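Because the result is a named numeric vector, it can be sorted to produce a ranking. The sketch below shows this on a hypothetical toy tree (nodes A to E, not the Florentine data). It also uses the normalized argument of betweenness(), which rescales the scores by the number of node pairs so that they are comparable across networks of different sizes.

```r
library(igraph)

# Hypothetical toy tree: A - B - C - D, with E also attached to C
toy_edges <- data.frame(from = c("A", "B", "C", "C"),
                        to   = c("B", "C", "D", "E"),
                        stringsAsFactors = FALSE)
toy <- graph_from_data_frame(toy_edges, directed = FALSE)

# Rank the nodes from most to least central
btw <- betweenness(toy, directed = FALSE)
ranked <- sort(btw, decreasing = TRUE)
print(round(ranked, 2))

# Normalized scores fall between 0 and 1
btw_norm <- betweenness(toy, directed = FALSE, normalized = TRUE)
print(round(sort(btw_norm, decreasing = TRUE), 2))
```

The same sort() call can be applied to the result for G to see the ranking of the Florentine families.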

Similarly, the betweenness centrality of each edge in this network can be calculated with the following command:

edge_betweenness(G, directed=FALSE)

This command returns a numeric vector with one value per edge, in the order in which the edges are stored in the network. To see which edge each value belongs to, we can print the pair of nodes for each edge:

E(G)
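Matching the two printouts by eye is error-prone, so it can help to combine the endpoints and the scores in one table. The sketch below does this with igraph's ends() function on a hypothetical toy tree (nodes A to E, not the Florentine data); the column names of the resulting table are arbitrary choices.

```r
library(igraph)

# Hypothetical toy tree: A - B - C - D, with E also attached to C
toy_edges <- data.frame(from = c("A", "B", "C", "C"),
                        to   = c("B", "C", "D", "E"),
                        stringsAsFactors = FALSE)
toy <- graph_from_data_frame(toy_edges, directed = FALSE)

ebtw <- edge_betweenness(toy, directed = FALSE)

# ends() returns a two-column matrix with the node names of each edge,
# in the same order as the scores in ebtw
endpoints <- ends(toy, E(toy))

edge_table <- data.frame(from = endpoints[, 1],
                         to = endpoints[, 2],
                         betweenness = ebtw,
                         stringsAsFactors = FALSE)

# Most central edge first
edge_table <- edge_table[order(-edge_table$betweenness), ]
print(edge_table)
```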

2.2 Exploring the R Output

R returns the result of each command above immediately. The results are summarized below.

The betweenness centrality for each node is shown in Table 3. The Medici family is the most central, with a betweenness centrality of 47.50, followed by Guadagni and Albizzi. Note that Albizzi is not among the top three in terms of degree or PageRank centrality (see the SAGE Research Methods Dataset on PageRank), but it is important as a broker between other families. Strozzi, on the other hand, has a large PageRank centrality because it connects to many other important families, but for the same reason the paths through it are “short-circuited” by those families, and its betweenness centrality is therefore small.

Table 3: Betweenness Centrality of Each Node.

Family | Betweenness |
---|---|
Acciaiuoli | 0 |
Albizzi | 19.33 |
Barbadori | 8.50 |
Bischeri | 9.50 |
Castellani | 5.00 |
Ginori | 0 |
Guadagni | 23.17 |
Strozzi | 9.33 |
Lamberteschi | 0 |
Medici | 47.50 |
Pazzi | 0 |
Peruzzi | 2.00 |
Pucci | 0 |
Ridolfi | 10.33 |
Salviati | 13.00 |
Tornabuoni | 8.33 |

The betweenness centrality for each edge is shown in Table 4.

Table 4: Betweenness Centrality of Each Edge.

Edge | Betweenness |
---|---|
Acciaiuoli–Medici | 14.00 |
Albizzi–Ginori | 14.00 |
Albizzi–Guadagni | 16.33 |
Albizzi–Medici | 22.33 |
Barbadori–Castellani | 12.50 |
Barbadori–Medici | 18.50 |
Bischeri–Guadagni | 17.17 |
Bischeri–Peruzzi | 7.50 |
Bischeri–Strozzi | 8.33 |
Castellani–Peruzzi | 6.00 |
Castellani–Strozzi | 5.50 |
Guadagni–Lamberteschi | 14.00 |
Guadagni–Tornabuoni | 12.83 |
Medici–Ridolfi | 15.33 |
Medici–Salviati | 26.00 |
Medici–Tornabuoni | 12.83 |
Pazzi–Salviati | 14.00 |
Peruzzi–Strozzi | 4.50 |
Ridolfi–Strozzi | 14.33 |
Ridolfi–Tornabuoni | 5.00 |
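As a cross-check that does not depend on the CSV files, the network can also be rebuilt from the 20 edges listed in Table 4. The sketch below transcribes those edges by hand and adds Pucci, which appears in Table 3 but in none of the edges, as an isolated node. The object names fl_edges, fl_nodes, and fl are arbitrary, and this is not how the provided script file builds the network.

```r
library(igraph)

# The 20 marriage ties, transcribed from Table 4
fl_edges <- data.frame(
  from = c("Acciaiuoli", "Albizzi", "Albizzi", "Albizzi", "Barbadori",
           "Barbadori", "Bischeri", "Bischeri", "Bischeri", "Castellani",
           "Castellani", "Guadagni", "Guadagni", "Medici", "Medici",
           "Medici", "Pazzi", "Peruzzi", "Ridolfi", "Ridolfi"),
  to   = c("Medici", "Ginori", "Guadagni", "Medici", "Castellani",
           "Medici", "Guadagni", "Peruzzi", "Strozzi", "Peruzzi",
           "Strozzi", "Lamberteschi", "Tornabuoni", "Ridolfi", "Salviati",
           "Tornabuoni", "Salviati", "Strozzi", "Strozzi", "Tornabuoni"),
  stringsAsFactors = FALSE
)

# Pucci has no tie in Table 4, so it must be listed explicitly as a node
fl_nodes <- data.frame(
  name = sort(unique(c(fl_edges$from, fl_edges$to, "Pucci"))),
  stringsAsFactors = FALSE
)

fl <- graph_from_data_frame(d = fl_edges, vertices = fl_nodes,
                            directed = FALSE)

# Node betweenness, most central family first
fl_btw <- betweenness(fl, directed = FALSE)
print(round(sort(fl_btw, decreasing = TRUE), 2))
```

If the transcription is correct, the printed values should agree with Table 3, with Medici at 47.50.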

3 Your Turn

Download this sample data to see whether you can replicate these results. Repeat the process after removing the most central node or edge and check how the ranking changes.
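The removal step can be sketched with igraph's delete_vertices() and delete_edges() functions. The example below uses a hypothetical toy network (nodes A to E) rather than the Florentine data, so that the exercise itself is left to you; the same calls should work on G.

```r
library(igraph)

# Hypothetical toy network: A - B - C, plus a triangle C - D - E
toy_edges <- data.frame(from = c("A", "B", "C", "C", "D"),
                        to   = c("B", "C", "D", "E", "E"),
                        stringsAsFactors = FALSE)
toy <- graph_from_data_frame(toy_edges, directed = FALSE)

# Remove the node with the highest betweenness, then recompute the ranking
btw <- betweenness(toy, directed = FALSE)
top_node <- names(which.max(btw))
toy_no_node <- delete_vertices(toy, top_node)
print(sort(betweenness(toy_no_node, directed = FALSE), decreasing = TRUE))

# Remove the edge with the highest betweenness, then recompute the ranking
ebtw <- edge_betweenness(toy, directed = FALSE)
toy_no_edge <- delete_edges(toy, E(toy)[which.max(ebtw)])
print(sort(betweenness(toy_no_edge, directed = FALSE), decreasing = TRUE))
```

Note that both functions return a new network object and leave the original unchanged, so the before and after rankings can be compared side by side.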