Social network analysis is a powerful method for analyzing complex social networks. However, finding and gathering data suitable for social network analysis is often complex and time-consuming. This is particularly the case for clandestine networks such as terrorist organizations. In our research project, we used social network analysis to gather and analyze publicly available data from the United Nations Security Council on the Al-Qaeda terrorist network. We generated a new dataset that allowed us to map and analyze diverse features of the Al-Qaeda group structure. In this methods case, we introduce some of the key concepts and measures of social network analysis, such as “node,” “tie/edge,” and different “centrality” measures, and we provide insight into how we generated and analyzed the data, thereby fostering a basic understanding of social network analysis. Furthermore, we discuss the strengths and weaknesses of social network analysis in analyzing terrorist networks and reflect on some of the advantages and shortcomings of our dataset. We also provide practical advice on several topics relevant to using social network analysis, and we suggest software, websites, and references to help start further social network analysis research projects. This research methods case should be most helpful for advanced students and researchers who want to use social network analysis in general and for researching terrorist networks in particular.
By the end of this case, students should be able to
- Have a basic understanding of how to use social network analysis to explore complex (terrorist) networks
- Define core concepts of social network analysis such as “node,” “tie/edge,” and “centrality”
- Discuss the strengths and limitations of performing social network analysis on terrorist networks
- Understand how to tackle the challenges of data generation and data analysis for researching covert networks
- Use concepts and methods of social network analysis to formulate their own social network research projects
What can we learn from social network analysis (SNA) about the structure and characteristics of Al-Qaeda? The main goal of our research project was to analyze issues that were disputed in the literature on terrorist networks. For instance, what is the basic structure of the Al-Qaeda network? Is it a strong global network or a frail localized organization? Who are the key individuals, and what are their roles within the network? Who could serve as a second-tier successor to Osama bin Laden after his death? What are the strengths, weaknesses, and future potential of SNA for researching terrorist networks? All of these questions seemed highly relevant for gaining a deeper understanding of Al-Qaeda’s structure as an organization. However, we faced one crucial problem: There were no adequate data available to answer them.
To answer our research questions, we used data extracted from the United Nations Security Council (UNSC) Al-Qaeda Sanctions List to generate a new dataset suitable for SNA. In generating and analyzing a new dataset, this research project addresses several shortcomings in the literature on transnational terrorist networks. First, there was a lack of large-n data to answer the questions posed. When we were designing this study, the largest social network dataset on terrorist networks consisted of 172 individuals (Sageman, 2004). Most previous studies focused on micro-networks, that is, local terror networks responsible for key attacks, including the 9/11 cell (Krebs, 2002), the group orchestrating the 2002 Bali Bombings (Koschade, 2006), or the Malian Al-Qaeda network (Walther & Christopoulos, 2014). Instead, we introduced a new large-scale dataset from a mostly unexplored source into a debate that is often informed by anecdotal evidence rather than reliable data. Second, this large-n dataset allowed us to analyze the structure and patterns of the Al-Qaeda terrorist network and to uncover whether and how local or regional cliques were in fact embedded in a larger terrorist network. The Al-Qaeda Sanctions List, produced and regularly updated by the UNSC, was a highly useful data source for this purpose. Surprisingly, this large and publicly available data source had been mostly ignored in terrorism studies. This was particularly puzzling, as the Al-Qaeda sanctions regime of the UNSC is a core multilateral counter-terrorism instrument. Third, we critically assessed the advantages and disadvantages of our dataset and indicated avenues for future research.
This methods case discusses the methodological and practical advantages as well as challenges we encountered in the analysis of network data derived from the Al-Qaeda Sanctions List.
Regardless of the method, any scientific study should be theory-driven or problem-driven from the outset, not data-driven. It should also build on the state-of-the-art literature and be able to relate its results back to previous findings. Hence, the challenge lies in uncovering a research puzzle that is theoretically relevant, and ideally also policy-relevant, before starting with data generation, even though the latter consumes much of the time in the entire research process.
In our case, we found references and sweeping assumptions about the structure of the Al-Qaeda network in public debates that relied on anecdotal evidence. For instance, we read commentaries that noted the hierarchical structure of Al-Qaeda or highlighted different Al-Qaeda franchises. However, without reliable data and systematic research, these claims seemed intuitive but were hard to sustain on closer inspection. Although we observed that terrorism scholars had started to critically evaluate the plausibility of such claims, the structure of Al-Qaeda required further scientific analysis underpinned by empirical evidence on a larger scale.
Research on terrorist networks does not lack articles and studies, but it often suffers from a lack of theory-driven work. In addition, SNA as a methodology seems difficult to apply to causal analysis due to a lack of adequate data; so far, it remains a largely descriptive method. To avoid falling into the trap of another data-driven study, we started by exploring low- and middle-range hypotheses presented in the terrorist network literature. For instance, Marc Sageman’s (2008) argument that smaller sub-groups are strongly connected through kinship and friendship was useful in this regard. Building on such arguments, our new dataset enabled us to explore these and other perspectives on a large-n basis.
SNA allows us to study how individuals relate to each other, how larger cliques of individuals and groups are connected, or how specific individuals relate to the network as a whole. SNA focuses on relationships between the units of a network. Hence, SNA rests on relational data, that is, data that put its research objects “in relation to” each other. Networks are “social” because they are composed of social actors such as individuals, firms, or states. A network consists of two or more social actors, which are called “nodes.” In most networks, not all nodes will be connected to each other. The relationship between a pair of nodes is called a “tie” or “edge.” Ties can take different forms, for instance, family ties, trade relationships, or financing a terrorist attack. In the best-case scenario, we have as much information as possible about the direction and content of a tie. For example, “A is the brother of B” is a tie that goes in both directions, whereas “A financed B’s terror attack” is unidirectional. “Centrality” is another key concept in SNA, used to assess the importance of particular nodes within a network (see Robins, 2015, pp. 17–38). In general, greater centrality of a node signals greater relevance of this node for the network. Different kinds of centrality include degree, closeness, and betweenness centrality.
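To make these terms concrete, here is a minimal sketch in Python using only the standard library (the study itself used R and Gephi). The actors A, B, and C and their ties are hypothetical toy data, not entries from the dataset: a kinship tie is stored in both directions, a financing tie only from source to target, and degree centrality is taken as the number of distinct nodes a node is tied to, ignoring direction.

```python
from collections import defaultdict


def build_network(ties):
    """Build out- and in-neighbour sets from (source, target, directed) triples."""
    out_nbrs = defaultdict(set)
    in_nbrs = defaultdict(set)
    for source, target, directed in ties:
        out_nbrs[source].add(target)
        in_nbrs[target].add(source)
        if not directed:
            # A symmetric tie ("A is the brother of B") runs both ways.
            out_nbrs[target].add(source)
            in_nbrs[source].add(target)
    return out_nbrs, in_nbrs


# Hypothetical toy ties: (source, target, is_directed)
ties = [
    ("A", "B", False),  # "A is the brother of B": undirected
    ("A", "C", True),   # "A financed C's terror attack": directed A -> C
]
out_nbrs, in_nbrs = build_network(ties)
nodes = set(out_nbrs) | set(in_nbrs)

# Degree centrality: number of distinct nodes a node is tied to, ignoring direction.
degree = {n: len(out_nbrs[n] | in_nbrs[n]) for n in sorted(nodes)}
print(degree)  # {'A': 2, 'B': 1, 'C': 1}
```

Node A has the highest degree because it takes part in both ties, regardless of their direction.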
Any scholarly work on terrorist networks faces a fundamental problem for data generation: secrecy. These networks are covert organizations whose survival depends on operating invisibly. In addition, counter-terrorism and law enforcement agencies hold intelligence-based information but have little incentive to make such data publicly available, so as not to endanger investigations. This is different for the UNSC Sanctions List, because its purpose is to use sanctions to constrain terrorist activities. Therefore, the effective implementation of the sanctions depends vitally on publicly sharing as much information as possible on the sanctions targets, for instance, to identify their financial assets.
For SNA, we required data about the relationships between the individuals and groups within the Al-Qaeda network. Fortunately, the UNSC list provided just that. One of the authors had worked with the UNSC data in a different, qualitative research project, so we already knew a lot about how the UNSC compiled the data, what the data included, and how the data were structured. The data include information about ties between a given individual or group and the broader Al-Qaeda network, as well as other individuals or groups within the network. In short, the UNSC data proved to be an ideal source for SNA of the Al-Qaeda network. Still, we had to navigate a few issues when designing our study.
The UNSC Al-Qaeda Sanctions List is publicly available on the Council’s website. The UNSC lists both individuals (e.g., Ayman al-Zawahiri) and groups (e.g., Al-Qaeda in the Islamic Maghreb) suspected of association with Al-Qaeda. All UN member states are legally required to prohibit the travel of, freeze any financial assets of, and prevent the transfer of arms to any listed individuals and entities. Two different document types are relevant here: (1) the Al-Qaeda Sanctions List and (2) the so-called “narrative summaries” for each listed individual or entity, which contain detailed descriptions of why the individual or entity meets the listing criteria. The narrative summaries increase the robustness and reliability of the data, as the sanctions list does not cover all ties of an individual (see below).
Table 1 shows the entry for Ayman al-Zawahiri, the leader of Al-Qaeda, on the Al-Qaeda Sanctions List. The sanctions list contains a vast amount of information on names, aliases, nationalities, passport numbers, locations, and biographical details, sometimes even physical attributes. This is remarkable given the clandestine nature of terrorist organizations. Each individual or group has a unique UN identifier (e.g., “QDe.004” for “Al-Qaeda” or “QDi.006” for Ayman al-Zawahiri).
Table 1. Example list entry, ISIL (Da’esh) and Al-Qaeda Sanctions List.
Name: 1: AIMAN 2: MUHAMMED 3: RABI 4: AL-ZAWAHIRI
Name (original script): أيمن محمد ربيع الظواهري
Title: (a) Doctor (b) Dr.
DOB: June 19, 1951, POB: Giza, Egypt
Good quality a.k.a.: (a) Ayman Al-Zawahari (b) Ahmed Fuad Salim (c) Al Zawahry Aiman Mohamed Rabi Abdel Muaz (d) Al Zawahiri Ayman (e) Abdul Qader Abdul Aziz Abdul Moez Al Doctor (f) Al Zawahry Aiman Mohamed Rabi (g) Al Zawahry Aiman Mohamed Rabie (h) Al Zawahry Aiman Mohamed Robi (i) Dhawahri Ayman (j) Eddaouahiri Ayman (k) Nur Al Deen Abu Mohammed (l) Ayman Al-Zawahari (m) Ahmad Fuad Salim Low quality a.k.a.: (a) Abu Fatma (b) Abu Mohammed
Passport no: (a) Egypt number 1084010 (b) 19820215
Listed on: January 25, 2001 (amended on July 2, 2007; July 18, 2007; August 13, 2007; December 16, 2010; May 22, 2015)
Other information: Leader of Al-Qaeda (QDe.004). Former operational and military leader of Egyptian Islamic Jihad (QDe.003), was a close associate of Osama bin Laden (deceased). Believed to be in the Afghanistan/Pakistan border area. Review pursuant to Security Council resolution 1822 (2008) was concluded on June 21, 2010. INTERPOL-UN Security Council Special Notice web link: https://www.interpol.int/en/notice/search/un/4487197
Source: https://scsanctions.un.org/en/?keywords=Al Qaeda (last accessed July 11, 2018).
For SNA purposes, the last section is of prime importance. Under “Other information,” the Council details the number and types of relationships of the listed entry. In our example, Ayman al-Zawahiri has three ties: (1) Leader of Al-Qaeda (QDe.004), (2) Former operational and military leader of Egyptian Islamic Jihad (QDe.003), and (3) Close associate of Osama bin Laden. These include ties to groups (Al-Qaeda, Egyptian Islamic Jihad) and to an individual (Osama bin Laden).
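A minimal Python sketch of identifier-based tie extraction, applied to the “Other information” text from Table 1. The pattern is an assumption inferred from the two identifier examples given here (“QDe.” for entities, “QDi.” for individuals, followed by three digits); it is not the authors’ R code. Note that only two of the three ties are recovered: the tie to Osama bin Laden appears in this text without an identifier, so a pure identifier search misses it, and such ties would need to be coded by other means.

```python
import re

# Assumed identifier format: "QD", then "e" (entity) or "i" (individual),
# a dot, and three digits, e.g. "QDe.004" or "QDi.006".
UN_ID = re.compile(r"\bQD[ei]\.\d{3}\b")

other_information = (
    "Leader of Al-Qaeda (QDe.004). Former operational and military leader of "
    "Egyptian Islamic Jihad (QDe.003), was a close associate of Osama bin Laden "
    "(deceased). Believed to be in the Afghanistan/Pakistan border area."
)


def extract_ties(own_id, text):
    """Return the sorted identifiers mentioned in text, excluding the entry's own."""
    return sorted(set(UN_ID.findall(text)) - {own_id})


ties = extract_ties("QDi.006", other_information)
print(ties)  # ['QDe.003', 'QDe.004']
```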
In the next step, we used this information to compile the dataset. The UNSC Al-Qaeda Sanctions List is available in XML, HTML, and PDF formats. First, we used the open-source software package R to scrape data from the XML file. Then, we combined the list and the narrative summaries and determined the ties between the actors automatically by searching for unique UN identifiers with a regular-expression pattern. Finally, the data were recoded and exported into GEXF format. The data quality of the lists themselves was high, with only a few misspelled identifiers. These were corrected within the script to allow for reproducibility. In the end, this process provided us with a dataset of 305 list entries, of which 237 were individuals and 68 were groups, with 1,315 relationships between them (see Figure 1). Other data formats, such as GraphML, are also suitable for SNA.
On the left side is the node table; on the right is the tie table.
The data generation process posed five challenges. First, we discovered that the ties of individuals and groups in the sanctions list did not always match the ties listed in the narrative summaries of the same individuals and groups, even though both were produced by the UNSC. The reason for this is unclear. We therefore decided to include every tie mentioned in either the list or the narrative summary, which required cross-checking all 305 list entries.
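The decision to keep every tie found in either source amounts to a set union per entry. The Python sketch below also reports the ties that appear in only one source, which is what a cross-check would flag. The identifiers P1, P2, G1, and G2 are hypothetical placeholders, not real list entries.

```python
def merge_tie_sources(list_ties, summary_ties):
    """Union of ties from the sanctions list and the narrative summaries.

    Each argument maps an entry's identifier to the set of identifiers it is
    tied to. Returns the merged mapping and, per entry, the ties that appear
    in only one of the two sources.
    """
    merged, discrepancies = {}, {}
    for entry in sorted(set(list_ties) | set(summary_ties)):
        from_list = list_ties.get(entry, set())
        from_summary = summary_ties.get(entry, set())
        merged[entry] = from_list | from_summary
        only_one_source = from_list ^ from_summary  # symmetric difference
        if only_one_source:
            discrepancies[entry] = sorted(only_one_source)
    return merged, discrepancies


# Hypothetical entries: the summary for P1 mentions a tie the list omits.
list_ties = {"P1": {"G1"}, "P2": {"G1"}}
summary_ties = {"P1": {"G1", "G2"}, "P2": {"G1"}}
merged, discrepancies = merge_tie_sources(list_ties, summary_ties)
print(discrepancies)  # {'P1': ['G2']}
```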
The second challenge concerned how to deal with information on the direction of ties. The UNSC does provide information on the nature of the ties for many entries, such as “brother of Abdul Rahim Ba’aysir”; “provided financial, material, and technological support for Al Qaeda”; or “received money from Ansar al-Islam (QDe.098) to conduct attacks in Kirkuk and Ninveh in Iraq.” However, because the nature of the ties and the information on them varied so broadly, coding it would have required resources beyond our capacities. Ultimately, we decided against using information on the nature of ties. As a result, our data were undirected and omitted information on the duration, strength, and nature of ties.
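A small Python sketch of what this simplification does: coded tie records with a relation label and a direction are collapsed into unique undirected, unweighted edges. The records and the identifiers P1, P2, and G1 are hypothetical; four records reduce to two edges, and the relation text is discarded.

```python
def to_undirected_edges(coded_ties):
    """Collapse (source, target, relation) records into unique undirected edges.

    The relation text and the direction are deliberately dropped, and repeated
    ties between the same pair count once (no weights). Self-ties are skipped.
    """
    edges = set()
    for source, target, _relation in coded_ties:
        if source != target:
            edges.add(tuple(sorted((source, target))))
    return sorted(edges)


# Hypothetical coded records (source, target, relation).
coded = [
    ("P1", "P2", "brother of"),
    ("P2", "P1", "brother of"),
    ("G1", "P1", "received money from"),
    ("P1", "G1", "member of"),
]
print(to_undirected_edges(coded))  # [('G1', 'P1'), ('P1', 'P2')]
```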
The third challenge related to the timeliness of the data. As the Sanctions List is updated on a rolling basis, we were faced with the question of whether we should update the list as the project moved on. However, to keep the project manageable and within our available resources, we decided to work with the original dataset we scraped in October 2012.
The fourth challenge was missing network data, an acute obstacle in SNA because even a single missing tie can considerably affect the structure of a network (Robins, 2015, pp. 116–118). The dataset comprised not only network ties but also information such as nationalities, addresses, or passport numbers. Yet, much of this information was understandably incomplete, as we were studying a clandestine terrorist network. As we saw no feasible way to impute missing data and could not reasonably assume that the data were missing at random, we decided to focus only on the most relevant and complete information.
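A toy illustration in Python of why a single missing tie matters. The network is hypothetical: two triangles joined by one bridging tie, C-D. If that one tie goes unrecorded, a breadth-first search finds two separate components instead of one connected network.

```python
from collections import deque


def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as sorted node lists."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:  # breadth-first search from start
            node = queue.popleft()
            component.append(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(component))
    return components


# Hypothetical toy network: two triangles joined by one bridging tie, C-D.
nodes = ["A", "B", "C", "D", "E", "F"]
full_ties = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"),
             ("D", "E"), ("E", "F"), ("D", "F")]
observed_ties = [tie for tie in full_ties if tie != ("C", "D")]  # bridge unrecorded

print(len(connected_components(nodes, full_ties)))      # 1
print(len(connected_components(nodes, observed_ties)))  # 2
```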
The fifth challenge concerned analyzing a two-mode network. The UNSC lists both individuals and entities, and the latter (at least in some cases) comprise individuals. Consequently, our data and mappings contained both individuals and group actors such as Al-Qaeda, which are aggregates of individuals. For these groups, we could not discern whether their connections to other individuals or groups within the network were based on connections of individuals contained in the graph or of unknown individuals. We decided to keep them in the data and treat the whole as a one-mode network, as they “behave” like individuals and their inclusion mitigated some of the challenges connected to missing data. Moreover, group-level ties were not entirely replicated by individual-level ties.
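The following Python sketch shows, on a hypothetical network, one way group nodes can mitigate missing data in a one-mode treatment. Individuals P1 and P3 are connected only through the group node G1; with groups kept, they are three steps apart, whereas dropping group nodes leaves them unreachable. Keeping a node-type attribute, as assumed here, would still allow individuals and groups to be told apart later.

```python
from collections import deque


def shortest_distance(adj, source, target):
    """Breadth-first shortest-path length in an undirected graph, or None."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for neighbour in adj.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return None


def build_adjacency(edges, drop=frozenset()):
    """Undirected adjacency sets, skipping any tie that touches a dropped node."""
    adj = {}
    for a, b in edges:
        if a in drop or b in drop:
            continue
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


# Hypothetical one-mode network mixing individuals (P*) and a group node (G1).
edges = [("P1", "G1"), ("P2", "G1"), ("P2", "P3")]
node_type = {"P1": "individual", "P2": "individual", "P3": "individual", "G1": "group"}

with_groups = build_adjacency(edges)
groups = {n for n, kind in node_type.items() if kind == "group"}
without_groups = build_adjacency(edges, drop=groups)
print(shortest_distance(with_groups, "P1", "P3"))     # 3
print(shortest_distance(without_groups, "P1", "P3"))  # None
```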
The SNA itself was performed using the open-source software package Gephi. As Gephi allows only rudimentary data editing, we first prepared the data in R in GEXF format and then imported them into Gephi. A number of alternative software packages are available. However, few match Gephi’s combination of usability, broad functionality, and customizable visualization; it can both visualize network data and calculate a broad range of network metrics. Alternative software with similar functions includes Graphviz and NetworKit.
In our study, we focused mainly on three aspects. First, we relied on visualizations in the form of network mappings to uncover the structure and characteristics of the Al-Qaeda terrorist network. Second, we combined these network mappings with several network metrics: size (total number of nodes), density (the number of ties divided by the total number of possible ties), and three measures of centrality. These centrality measures are degree centrality (the total number of a node’s ties), closeness centrality (nodes with a higher closeness centrality need fewer steps to connect to any other node), and betweenness centrality (the importance of a node for other nodes to connect to each other). For all centrality metrics, we calculated the mode and mean values. Key to our choice of metrics was the aim of bolstering the visual analysis with reliable metrics, so as to make more robust claims and avoid misinterpreting the mappings. Third, in addition to the metrics, we examined the role of cliques within the network. Cliques are sets of highly interconnected nodes that form cohesive clusters compared with the rest of the network.
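The metrics named above can be sketched in plain Python for an unweighted, undirected graph. This is an illustration, not Gephi’s implementation: betweenness follows Brandes’ shortest-path counting, closeness is taken as the number of reachable nodes divided by the sum of their distances, and Gephi or other packages may normalize these measures differently. The four-node network is hypothetical.

```python
from collections import deque
from statistics import mean, multimode


def density(n_nodes, n_ties):
    """Undirected density: observed ties divided by the n(n-1)/2 possible ties."""
    return n_ties / (n_nodes * (n_nodes - 1) / 2)


def centralities(adj):
    """Degree, closeness, and betweenness for an unweighted, undirected graph.

    adj maps each node to the set of its neighbours.
    """
    nodes = sorted(adj)
    degree = {v: len(adj[v]) for v in nodes}
    closeness = {}
    betweenness = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s, counting shortest paths to every node.
        dist = {s: 0}
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        preds = {v: [] for v in nodes}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        total = sum(dist.values())
        closeness[s] = (len(dist) - 1) / total if total else 0.0
        # Accumulate pair dependencies, farthest nodes first (Brandes).
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                betweenness[w] += delta[w]
    # In an undirected graph each pair is counted once from each endpoint.
    betweenness = {v: b / 2 for v, b in betweenness.items()}
    return degree, closeness, betweenness


# Hypothetical toy network: a path A - B - C - D with an extra tie B - D.
adj = {"A": {"B"}, "B": {"A", "C", "D"}, "C": {"B", "D"}, "D": {"B", "C"}}
degree, closeness, betweenness = centralities(adj)
print(degree, betweenness)
print(mean(degree.values()), multimode(degree.values()))  # mean and mode of degree
```

As a purely arithmetic illustration of the density formula, and assuming the 1,315 relationships are counted as distinct undirected ties among the 305 nodes, `density(305, 1315)` evaluates to about 0.028; this is not a figure reported by the study.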
In terms of analysis and presentation of findings, one practical issue with SNA is how to adequately illustrate fine-grained structures in a limited number of plots. The goal here was to ensure readability while maximizing the amount of information conveyed in the mappings.
This challenge is even more pressing if the network is large, as in our case. Different variables or dimensions can, for example, be represented by the size and color of a node or its label, and the network as a whole can be laid out according to different layout algorithms (see Figures 2 and 3). Although the options for visualization are plentiful, readability should be the prime concern, which limits the feasible possibilities. For example, overly fine color gradations representing differences in the various metrics can easily get lost in a plot, and dense clustering of nodes makes it hard to distinguish individuals or cliques within the network.
Source: Authors’ illustration based on the Force–Atlas Algorithm, node size according to betweenness centrality.
Source: Authors’ illustration based on Fruchterman–Reingold Algorithm, node size according to betweenness centrality.
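Gephi provides Fruchterman–Reingold as a built-in layout. To show what a force-directed layout does, here is a textbook-style Python sketch of the Fruchterman–Reingold idea: all nodes repel each other with force k²/d, tied nodes attract with force d²/k, and each step is capped by a cooling “temperature.” The geometric cooling factor, the frame clamping, and the toy network are simplifying assumptions; this is not Gephi’s implementation. Because the result depends on the random starting positions (the `seed` parameter) as well as on the algorithm, the same network can look quite different from run to run.

```python
import math
import random


def fruchterman_reingold(nodes, edges, width=1.0, height=1.0, iterations=50, seed=0):
    """Force-directed layout after Fruchterman and Reingold; returns {node: (x, y)}.

    nodes must be a list; edges is a list of (u, v) pairs.
    """
    rng = random.Random(seed)
    pos = {v: [rng.uniform(0, width), rng.uniform(0, height)] for v in nodes}
    k = math.sqrt(width * height / len(nodes))  # ideal distance between nodes
    temperature = width / 10
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsion between every pair of nodes: f_r(d) = k^2 / d.
        for i, v in enumerate(nodes):
            for u in nodes[i + 1:]:
                dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[v][0] += dx / d * f
                disp[v][1] += dy / d * f
                disp[u][0] -= dx / d * f
                disp[u][1] -= dy / d * f
        # Attraction along ties: f_a(d) = d^2 / k.
        for v, u in edges:
            dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[v][0] -= dx / d * f
            disp[v][1] -= dy / d * f
            disp[u][0] += dx / d * f
            disp[u][1] += dy / d * f
        # Move each node by at most the temperature, keeping it inside the frame.
        for v in nodes:
            dx, dy = disp[v]
            length = math.hypot(dx, dy) or 1e-9
            step = min(length, temperature)
            pos[v][0] = min(width, max(0.0, pos[v][0] + dx / length * step))
            pos[v][1] = min(height, max(0.0, pos[v][1] + dy / length * step))
        temperature *= 0.95  # simple geometric cooling (an assumption)
    return {v: tuple(p) for v, p in pos.items()}


# Hypothetical toy network: two triangles joined by a bridging tie C-D.
nodes = ["A", "B", "C", "D", "E", "F"]
ties = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"),
        ("D", "E"), ("E", "F"), ("D", "F")]
layout_a = fruchterman_reingold(nodes, ties, seed=0)
layout_b = fruchterman_reingold(nodes, ties, seed=1)  # same network, other start
```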
In our original research article, we opted for a first plot displaying the entire network (see Stollenwerk, Dörfler, & Schibberges, 2016). The size of each node and its written name signaled the node’s degree centrality, with larger nodes and names indicating higher degree centrality. We used node color to show betweenness centrality, with a lighter color representing lower betweenness centrality. By encoding each dimension separately, we were able to compress as much information as possible into one graph while retaining analytical clarity. In a second plot, we again depicted the network structure, but this time each node displayed a number representing nationality. This allowed us to analyze different cliques within the network and their nationalities. In mapping this information, we faced several challenges, for instance, how many names to plot, since the names of nodes with lower centrality appear in a smaller, potentially unreadable font.
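The encoding described above (size from degree, lighter color for lower betweenness) is normally configured in Gephi’s interface. As a stand-alone Python sketch of the same idea, centrality values can be rescaled linearly to a size range and to a grey level. The size range of 5 to 40 and the grey range (0.85 for the lowest betweenness, 0.15 for the highest, so no node turns fully white) are arbitrary assumptions, and the input values are hypothetical.

```python
def rescale(values, lo, hi):
    """Linearly map a {node: value} dict onto [lo, hi]; constant input maps to lo."""
    vmin, vmax = min(values.values()), max(values.values())
    span = vmax - vmin
    return {n: lo + (v - vmin) / span * (hi - lo) if span else lo
            for n, v in values.items()}


def grey_hex(level):
    """Hex colour for a grey level in [0, 1]: 0 is black, 1 is white."""
    g = round(level * 255)
    return f"#{g:02x}{g:02x}{g:02x}"


def node_styles(degree, betweenness, min_size=5.0, max_size=40.0):
    """Size from degree centrality; lighter grey for lower betweenness centrality."""
    size = rescale(degree, min_size, max_size)
    lightness = rescale(betweenness, 0.85, 0.15)  # assumed grey range
    return {n: {"size": size[n], "color": grey_hex(lightness[n])} for n in degree}


# Hypothetical centrality values for three nodes.
styles = node_styles({"A": 1, "B": 3, "C": 2}, {"A": 0.0, "B": 2.0, "C": 0.0})
print(styles["B"])  # {'size': 40.0, 'color': '#262626'}
```

A linear scale is the simplest choice; with strongly skewed centrality distributions, a few hubs absorb most of the size range, which is one reason fine differences can get lost in a plot.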
Graphical illustrations of social networks can be easily misinterpreted. Comparing Figures 2 and 3, the exact same network has been visualized using two different algorithms, but could the same results be found in these depictions? To draw valid conclusions from these visualizations, we combined the visual analysis with three tables depicting three types of centrality metrics. This allowed us to assess the role of key individuals. For instance, which individuals were important for connecting other nodes with each other as brokers (betweenness centrality)? Which individuals could connect most easily with other nodes within the network (closeness centrality)?
There are several practical lessons to be learned from this research project. An obvious but intriguing point is that a lot of free and publicly available data that may be useful for SNA already exist. However, the data may not be in a shape or format that is directly usable for SNA. As coding and preparing data for SNA are time-consuming, this work should be built into the schedule of every research project that aims to analyze social structures with SNA. In our case, we were surprised to see that terrorism studies had largely neglected the UNSC Al-Qaeda Sanctions List despite its size, detail, and availability. Hence, the time and effort spent preparing and analyzing the data were worthwhile, and other researchers should be encouraged to pursue similar SNA projects.
Another lesson is that researchers should think carefully about the limitations of their network data and discuss them transparently. For instance, our dataset excludes information on the direction, duration, strength, or nature of the ties. It is also important to question the extent to which the data are biased. How does this affect the results of the study? What are possible remedies for such biases? In our case, sanctions listings of the UNSC are politically influenced, as the data rest on the capacity and willingness of states to submit listing proposals along with sufficient evidence of an individual’s or entity’s association with Al-Qaeda. Although this mechanism does not require the evidentiary standards of criminal investigations, the number of states able and willing to produce information on suspected terrorists is limited. Moreover, some states may be unwilling to put forward cases, while others may be overly eager to propose names, for instance, to repress opposition under the guise of counter-terrorism. States might also pursue only certain types of listings (e.g., primarily financiers of attacks).
A third lesson is that it is highly productive to combine the strengths and expertise of different researchers. A research project like ours requires a diverse set of skills and knowledge, particularly with respect to theory and methods. With regard to theory, it was helpful that some of the authors were experts in terrorism studies, while others were UN experts. The preparation and analysis of the data, in turn, required not only methodological knowledge of how to conduct SNA but also hands-on skills in specific software packages such as R and Gephi. To complement individual skillsets, students should be encouraged to collaborate with fellow students and researchers. Besides, collaborative research is often simply more fun than doing everything by yourself. Hence, it might be useful to think about the level of skills and expertise necessary before starting such a project. For example, what expertise do I have in the fields connected to the study I want to conduct? What software skills are necessary for such a research project? What is the key audience for my research project? What are the resources and the time horizon for my project?
Despite the shortcomings discussed, the data of the UNSC Al-Qaeda Sanctions List have unique strengths. They are not just a new and publicly available data source for SNA; they are also relatively up-to-date. Building on the state-of-the-art literature, the dataset we compiled from the Sanctions List represents one of the first attempts to explore large-n data on terrorist networks with SNA tools. It is unique in including both individuals and groups on two distinct levels. This allows for exploring network structures and mapping relations among nodes across these two levels, that is, how individuals relate to groups and vice versa. Overall, SNA is a rich methodology for exploring research questions on large-n datasets with a focus on relationships between network members. SNA allows for in-depth studies of networks with broad coverage, not only of Al-Qaeda but of any kind of social network. As such, the applicability of SNA to diverse fields of study is a key strength of the method.
Our research project on the Al-Qaeda network using the Al-Qaeda Sanctions List could be further developed and extended in many directions. Scraping the same data source at different points in time would create a longitudinal social network dataset. One could, for example, show how the network structure develops over time or what effect the elimination of single nodes, for example through successful counter-terrorism efforts, has on the network. In addition, one could explore the information on the nature of the ties more closely. However, the UNSC data likely have their greatest potential when combined with other qualitative or quantitative data. This way, the shortcomings stemming from the political nature of sanctions listings could be compensated for, and additional aspects of the Al-Qaeda network not reflected in the data could be uncovered.
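Both extensions can be prototyped with a few lines of Python. The sketch below, on hypothetical data, (1) lists the ties added and removed between two scrapes of the list and (2) simulates node elimination by deleting each node in turn and recording the size of the largest remaining connected component, which is one common, though not the only, way to measure the damage. The hub H, the nodes A to D, and the two snapshots are illustrative assumptions.

```python
from collections import deque


def largest_component_size(adj, removed=frozenset()):
    """Size of the largest connected component after deleting the removed nodes."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best


def removal_impact(adj):
    """Largest-component size after deleting each node in turn, most damaging first."""
    impact = {v: largest_component_size(adj, {v}) for v in adj}
    return sorted(impact.items(), key=lambda item: (item[1], item[0]))


def snapshot_diff(old_edges, new_edges):
    """Ties added and removed between two scrapes (edges as sorted node pairs)."""
    old, new = set(old_edges), set(new_edges)
    return sorted(new - old), sorted(old - new)


# Hypothetical network: hub H tied to A, B, C; C also tied to D.
adj = {"H": {"A", "B", "C"}, "A": {"H"}, "B": {"H"}, "C": {"H", "D"}, "D": {"C"}}
print(removal_impact(adj)[:2])  # [('H', 2), ('C', 3)]
print(snapshot_diff([("A", "B")], [("A", "B"), ("B", "C")]))  # ([('B', 'C')], [])
```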
Beyond our dataset, SNA based on public sources such as news media may also be useful for other research topics. For instance, SNA could be particularly useful for exploring the spike in foreign terrorist fighters traveling abroad (e.g., Reynolds & Hafez, 2017) and for analyzing their ties to terrorist networks. Such data could also help uncover the quickly changing structure of the Islamic State of Iraq and the Levant (ISIL), from its creation through its rapid expansion and retreat to the relocation of smaller ISIL cliques to third countries. Above all, future SNA research may focus on testing theories of social networks in general and of terrorist networks in particular. Recently developed approaches such as Advanced Exponential Random Graph models or Bayesian network models are a first step in this direction.
- What are the advantages and disadvantages of social network analysis (SNA) when compared with other methods?
- What are the pros and cons of using SNA on terrorist networks?
- Can you think of other data suitable for SNA, for instance, on counter-terrorism, disarmament, economics, trade, or environment?
- What are some of the drawbacks of combining actors of different aggregation levels within a network?
- What are the strengths and weaknesses of publicly available SNA data? What are potential sources of bias?
- Which ethical dilemmas do you see when researching terrorist networks?
Social Network Analysis Textbook
Social Network Analysis in International Relations
Social Network Analysis and the Study of Terrorist Networks
Social Network Analysis Methods