This case presents a bibliometric and visualization method applied to a dataset of 729 documents published in the collaborative economy research field. Four steps are described in detail: (1) the delimitation of the field of study; (2) the selection of databases, keywords, and search criteria; (3) the extraction, cleaning, and formatting of the data; and (4) the co-citation analysis and visualization. The method validation section shows the results obtained by applying our methodological procedure to an author network analysis and a source title network analysis. This study is unique in that it couples a co-citation analysis with a network visualization and applies them to the rapidly growing research area of the collaborative economy. More specifically, the methodology offers three key advances: (1) The data have depth. The dataset comprises not only journal articles, as is usual in bibliometric studies, but also conference papers (e.g., proceedings), books, editorials, gray literature, and book chapters. (2) The data were extracted from two databases (Scopus and Web of Science) instead of one, as is commonly done in bibliometric studies. While the additional database provided more information, it also required more work on the extraction, cleaning, and formatting of the data. (3) VOSviewer was our main analytical tool, performing both the co-citation analysis and the network visualizations.
By the end of this case, students should be able to
- Perform the delimitation of the research field under study
- Select databases, keywords, and search criteria
- Extract, clean, and format data extracted from both Scopus and Web of Science databases
- Conduct a co-citation analysis and visualization of the data using VOSviewer
The field of the collaborative economy has expanded exponentially since 2010. Although a number of literature reviews have covered specific business models such as car-sharing, peer-to-peer business models, crowdsourcing, or specific platforms (e.g., Uber, Airbnb), little retrospective work has examined this evolution as a whole. The project aimed to apply rigorous bibliometric methods and network analysis, combining the Scopus and Web of Science databases, to provide fresh insights into the evolution of the collaborative economy research field as well as its increasing coverage of sustainability-related topics.
Bibliometrics is a quantitative process that investigates the formal properties of knowledge domains by extracting data from published documents using statistical analysis (Agarwal et al., 2016; Mora, Bolici, & Deakin, 2017). It is considered a discrete, objective, low-cost, and reliable approach for analyzing different aspects from publications, journals, scientists, and communities (Zhao & Strotmann, 2015).
The bibliometric procedure enables researchers to describe or evaluate the intellectual structure of a field of study, the diffusion of knowledge, the relationship between academics, and their use of the scientific literature. In addition, bibliometrics allows comparisons over time to assess the history of a specific research domain. Bibliometric analyses are also easily replicable as the data come from search engines widely used by the scientific community.
Nowadays, bibliometrics is used by several leaders to quantitatively evaluate research fields. Performance indicators such as the h-index or the impact factor of a journal sometimes completely replace traditional qualitative peer assessment, which fuels debate about how scientific production should be evaluated (Agarwal et al., 2016; Haustein & Larivière, 2015).
We follow the formal bibliometric procedure outlined by Zhao and Strotmann (2015) and perform a co-citation analysis, which is the most popular form of interrelationship analysis in bibliometrics, as it “counts the number of documents that have cited two objects together” (p. 5). This approach is well suited to pinpointing the connections between references in the literature.
Our methodological framework for this process is based on Zhao and Strotmann’s (2015) book Analysis and Visualization of Citation Networks. Although the general ideas are the same, our process differed slightly. More precisely, it relies on two databases whereas Zhao and Strotmann (2015) use only one, and it makes use of VOSviewer, which these authors do not mention.
After a conventional bibliometric analysis, which is essentially a frequency analysis of the properties of the bibliographic data, we initiated a co-citation analysis. Co-citation analysis is a more dynamic type of study, as it compiles the number of documents that have cited two objects together (Zhao & Strotmann, 2015). The method therefore also examines documents published outside our period of analysis (i.e., 2010–2017), such as in 1978, for example. Consequently, seminal papers in the field that were not in our dataset for various reasons (e.g., not in the timeframe, not available in the databases, not specifically related to the collaborative economy) were also considered (Persson, 1994). To acquire the data, we followed a four-step method.
First, we delimited the research field under study. This field can be broad (e.g., health sciences), focused on a specific science (e.g., medicine), a discipline (e.g., cardiology), a sub-discipline (e.g., pediatric cardiology), and so on (e.g., pediatric cardiology for preterm birth). Our search field was the collaborative economy. Then, we circumscribed the search field within a timeframe. In our case, the timeframe ran from 2010 to November 2017. The year 2010 was chosen because media coverage, consultant reports, and conferences, as well as venture capital and hedge fund investments in collaborative platforms, increased tremendously at that time. November 2017 was chosen simply because it was when we were conducting our study, and we wanted access to the most recent publications in the domain.
Finally, we chose which language(s) should be retrieved. We decided to keep documentation in English only, as the majority of the influential publications in the collaborative economy domain are in this language. Optionally, the types of documents may also be chosen at this step (e.g., journal articles, books, etc.). However, we feel it is easier to settle this matter once the first searches are made. This way, we can see what is possible to obtain and adjust our search criteria accordingly.
During Phase 2, the selection of databases, keywords, and search criteria, we decided to include data from both Web of Science and Scopus. Recent studies such as Mongeon and Paul-Hus (2016) demonstrated that the coverage of the two databases is not the same. Thus, the results of bibliometric analyses may vary depending on the database used. Following Zhao and Strotmann (2015), a
good approach might be to supplement results retrieved from a citation database with additional publications (which are then indexed by researchers in the same format as the downloaded records) in order to reach the desired level of completeness for the study at hand. (p. 66)
The databases used were Scopus and Web of Science because they offer citation metadata, which is necessary to accomplish a co-citation analysis. Keywords used were “sharing economy,” “collaborative economy,” and “collaborative consumption.” The search criteria were the title, abstract, keywords search for Scopus and the topic search for Web of Science.
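As a rough illustration of how the three keywords could be turned into search strings, the sketch below builds one query per database. It assumes that the default "title, abstract, keywords" search in Scopus corresponds to the `TITLE-ABS-KEY` field code and that the Web of Science topic search corresponds to the `TS=` field tag; the helper names are hypothetical, and the timeframe and language limits are assumed to be applied separately in each database interface.

```python
# Hypothetical helpers: a sketch of the search strings, not the authors' exact queries.
KEYWORDS = ["sharing economy", "collaborative economy", "collaborative consumption"]


def or_clause(keywords):
    """Join exact-phrase keywords with OR, e.g. "a b" OR "c d"."""
    return " OR ".join(f'"{kw}"' for kw in keywords)


def scopus_query(keywords):
    """Title/abstract/keywords search (assumed to map to TITLE-ABS-KEY)."""
    return f"TITLE-ABS-KEY({or_clause(keywords)})"


def wos_query(keywords):
    """Topic search (assumed to map to the TS= field tag)."""
    return f"TS=({or_clause(keywords)})"


print(scopus_query(KEYWORDS))
print(wos_query(KEYWORDS))
```

Keeping the keyword list in one place makes it easy to test whether an extra synonym retrieves documents that the three initial keywords miss.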
Data were extracted in plain text from Scopus and Web of Science. We used a representative sample of the documentation published in the subject studied. After the duplicates and the non-relevant entries were removed, the dataset went from 1,056 to 729 entries. Then, we formatted the data from Scopus in the style of Web of Science for facilitating the analysis. Additional considerations pertaining to this stage will be discussed in the “Research Practicalities” section.
The co-citation approach works on pairs. If two items of the same type (e.g., authors, article titles, journals) are cited in the same document, each earns a mention and the two become linked. For example, if A cites B and C, B and C become linked. This is the most common approach when analyzing a citation network. However, it is quite complex to set up and requires computer programs to automate the process. For example, if an article has more than one author, each author must receive a mention when the article is paired with another document. On a scale of a few hundred documents, pairing authors becomes a real puzzle. This is why we imported the standardized data into VOSviewer. This program allows us to do both the co-citation analysis and the visualization at the same time.
Furthermore, network analysis through co-citation analysis requires using either distance-based or graph-based mapping techniques (Perianes-Rodriguez, Waltman, & van Eck, 2016). In distance-based maps, a smaller distance between two items reflects a stronger relation between them (Van Eck, Waltman, Dekker, & Van Den Berg, 2010). In graph-based maps, the distance between two items need not reflect the strength of the association between them (Fahimnia, Sarkis, & Davarzani, 2015). Items are distributed uniformly, and the most connected nodes move to the center of the network while the more isolated nodes move to the borders (Van Eck et al., 2010). Both approaches have pros and cons. However, Van Eck et al. (2010) suggest that with graph-based mapping, it is more difficult to see the strength of the relation between two items, and clusters of related items may be difficult to detect. Given that we seek to find clusters of publications to identify research themes within the collaborative economy (CE), we favor distance-based maps.
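The pairing logic described above can be sketched in a few lines of Python. This is only a minimal illustration on toy data (the document labels A to F are hypothetical), not the procedure VOSviewer uses internally: each citing document contributes one count to every pair of items in its reference list.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count, for every pair of cited items, how many documents cite both."""
    counts = Counter()
    for refs in reference_lists:
        # Deduplicate within one document and sort so (B, C) and (C, B) are one pair.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts


# Toy data: citing document -> cited items.
docs = {
    "A": ["B", "C"],
    "D": ["B", "C", "E"],
    "F": ["C", "E"],
}
counts = cocitation_counts(docs.values())
print(counts[("B", "C")])  # number of documents citing both B and C
```

The same function works for authors or source titles, provided each reference list is first converted to the item type of interest.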
Past research has shown that the visualization of similarities (VOS) mapping technique outperforms other distance-based algorithms (e.g., multidimensional scaling, VxOrd, Kopcsa-Schiebel) (Van Eck et al., 2010). VOSviewer is based on Van Eck, Waltman, and Van den Berg’s (2005) VOS. It is a clustering technique that is used to “provide a low-dimensional visualization in which objects are located in such a way that the distance between any pair of objects reflects their similarity as accurately as possible” (Van Eck & Waltman, 2007, p. 299). VOS minimizes the weighted sum of the squared Euclidean distances between all pairs of objects, and the greater the similarity between two objects, the greater the weight given to their squared distance. Following Van Eck and Waltman (2007, p. 2), let there be n objects (i.e., citations), denoted by 1, …, n, and an n × n similarity matrix S whose element $s_{ij}$ denotes the similarity between objects i and j. Let X be an n × m matrix, where m denotes the number of dimensions of the space that is used, containing the coordinates of the objects 1, …, n; the vector $x_i$ denotes the ith row of X and contains the coordinates of object i. The objective function to be minimized in VOS is expressed as follows:

$$E(X; S) = \sum_{i<j} s_{ij} \, \lVert x_i - x_j \rVert^{2} \qquad (1)$$

where ||.|| denotes the Euclidean norm. The objective function is minimized subject to the constraint shown in (2), in which the distances are not squared:

$$\sum_{i<j} \lVert x_i - x_j \rVert = 1 \qquad (2)$$
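To make the objective function and its constraint concrete, the following sketch evaluates both for a tiny layout. It is an illustration only and does not perform the minimization that VOSviewer carries out. It assumes that the constraint fixes the sum of the unsquared pairwise distances to 1; the function names and the toy matrices are hypothetical.

```python
import math
from itertools import combinations


def vos_objective(X, S):
    """Weighted sum of squared Euclidean distances over all pairs i < j."""
    return sum(
        S[i][j] * math.dist(X[i], X[j]) ** 2
        for i, j in combinations(range(len(X)), 2)
    )


def vos_constraint(X):
    """Sum of the (unsquared) Euclidean distances over all pairs i < j."""
    return sum(math.dist(X[i], X[j]) for i, j in combinations(range(len(X)), 2))


def rescale(X):
    """Scale the layout so that the constraint sum equals 1 (assumed form)."""
    total = vos_constraint(X)
    return [[coord / total for coord in row] for row in X]


# Toy example: 3 objects in 2 dimensions, symmetric similarities.
S = [[0, 2, 0],
     [2, 0, 1],
     [0, 1, 0]]
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]

print(vos_objective(X, S))          # objective before rescaling
print(vos_constraint(rescale(X)))   # constraint value after rescaling
```

Because similar objects carry larger weights, any layout that places them far apart is penalized more heavily, which is what pulls related items together on the map.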
The VOS mapping technique is fully integrated in the VOSviewer software. Thus, no additional computer program (e.g., Pajek) is needed for constructing VOS maps (Van Eck et al., 2010).
We compiled the data with BibExcel (Persson, Danell, & Schneider, 2009), Excel, and Notepad++ and did the co-citation analysis with VOSviewer (Van Eck & Waltman, 2007; Van Eck et al., 2010; Van Eck et al., 2005) (see Tables 1 and 2). The thresholds for the clusters were 30 for the authors and 20 for the source titles (Shaw, 1985).
|Table 1. Authors with more than 30 citations in the dataset and their associated cluster.|
Belk, R. W.
Schor, J. B.
John, N. A.
Lamberton, C. P.
Shaheen, S. A.
Tussyadiah, I. P.
Edelman, B. G.
Mont, O. K.
Martin, C. J.
Albinsson, P. A.
Ozanne, L. K.
|Table 2. Source titles with more than 20 citations in the dataset and their associated cluster.|
Belk, R. (2014a). Sharing versus pseudo-sharing in web 2.0. The Anthropologist, 18(1), 7–23.
Benkler, Y. (2004). Sharing nicely: On shareable goods and the emergence of sharing as a modality of economic production. The Yale Law Journal, 114(2), 273–358.
Benkler, Y. (2006). The wealth of networks: How social production transforms markets and freedom. New Haven, CT: Yale University Press.
Botsman, R., & Rogers, R. (2010b). What’s mine is yours: The rise of collaborative consumption. New York, NY: HarperCollins.
Gansky, L. (2010). The mesh: Why the future of business is sharing. New York, NY: Portfolio Penguin.
John, N. (2012). Sharing and web 2.0: The emergence of a keyword. New Media and Society, 15(2), 167–182.
John, N. (2013). The social logics of sharing. The Communication Review, 16(3), 113–131.
Lessig, L. (2008). Remix: Making art and commerce thrive in the hybrid economy. New York, NY: Penguin Books.
Ostrom, E. (1990). Governing the commons. Cambridge, UK: Cambridge University Press.
Putnam, R. (2000). Bowling alone: The collapse and revival of American community. New York, NY: Simon & Schuster.
Rifkin, J. (2014). The zero marginal cost society: The Internet of things, the collaborative commons, and the eclipse of capitalism. New York, NY: Palgrave Macmillan.
Albinsson, P., & Perera, B. (2012). Alternative marketplaces in the 21st century: Building community through sharing events. Journal of Consumer Behaviour, 11, 303–315.
Bardhi, F., & Eckhardt, G. (2012). Access-based consumption: The case of car sharing. Journal of Consumer Research, 39, 881–898.
Belk, R. (2010). Sharing. Journal of Consumer Research, 36, 715–734.
Belk, R. (2007). Why not share rather than own. The Annals of the American Academy of Political and Social Science, 611(1), 126–140.
Felson, M., & Spaeth, J. (1978). Community structure and collaborative consumption: A routine activity approach. American Behavioral Scientist, 21, 614–624.
Lamberton, C., & Rose, R. (2012). When is ours better than mine? A framework for understanding and altering participation in commercial sharing systems. Journal of Marketing, 76, 109–125.
Leismann, K., Schmitt, M., Rohn, H., & Baedeker, C. (2013). Collaborative consumption: Towards a resource-saving consumption culture. Resources, 2(3), 184–203.
Ozanne, L., & Ballantine, P. (2010). Sharing as a form of anti-consumption? An examination of toy library users. Journal of Consumer Behaviour, 9, 485–498.
Belk, R. (2014b). You are what you can access: Sharing and collaborative consumption online. Journal of Business Research, 67, 1595–1600.
Botsman, R., & Rogers, R. (2010a). Beyond Zipcar: Collaborative consumption. Harvard Business Review, 80(10), 30.
Cohen, B., & Kietzmann, J. (2014). Ride on! Mobility business models for the sharing economy. Organization and Environment, 27, 279–296.
Ert, E., Fleischer, A., & Magen, N. (2016). Trust and reputation in the sharing economy: The role of personal photos in Airbnb. Tourism Management, 55, 62–73.
Guttentag, D. (2015). Airbnb: Disruptive innovation and the rise of an informal tourism accommodation sector. Current Issues in Tourism, 18, 1192–1217.
Hamari, J., Sjöklint, M., & Ukkonen, A. (2016). The sharing economy: Why people participate in collaborative consumption. Journal of the Association for Information Science and Technology, 67, 2047–2059.
Möhlmann, M. (2015). Collaborative consumption: Determinants of satisfaction and the likelihood of using a sharing economy option again. Journal of Consumer Behaviour, 14, 193–207.
Heinrichs, H. (2013). Sharing economy: A potential new pathway to sustainability. GAIA - Ecological Perspectives for Science and Society, 22(4), 228.
Martin, C. (2016). The sharing economy: A pathway to sustainability or a nightmarish form of neoliberal capitalism? Ecological Economics, 121, 149–159.
Sundararajan, A. (2016). The sharing economy: The end of employment and the rise of crowd-based capitalism. Cambridge, MA: MIT Press.
One key decision that we had to make concerned the specific technique to use for the bibliometric analysis. Generally, researchers have the choice between two main techniques: bibliographic coupling (BC) and co-citation analysis.
BC refers to “the number of cited references that two objects have in common” (Zhao & Strotmann, 2015, p. 5). BC excels in studying the recent research activities of a research field. In contrast, co-citation analysis examines the past intellectual influences on the field, or the knowledge base of the field (Persson, 1994). Whereas BC maps the citing publications in the dataset, co-citation analysis maps the publications cited by the dataset (Zhao & Strotmann, 2015). Using BC would only identify influential authors and publications within our dataset, whereas co-citation analysis is more encompassing. It includes authors or publications that have been influential in the field of CE although they are not included in the dataset (Zhao & Strotmann, 2015). A publication may be missing from the dataset because it is not stored in the databases used, does not include the defined search terms, or was published outside the timeframe under study. Given the superiority of co-citation analysis in mapping extra-sample influential publications and its past-oriented nature, co-citation analysis was favored over BC for our study.
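The contrast between the two techniques can be seen on a toy example. In the sketch below, which uses hypothetical labels (P1 to P3 for publications in the dataset, R1 to R3 for the references they cite), bibliographic coupling links the citing publications, whereas co-citation links the cited references, which may lie outside the dataset.

```python
from collections import Counter
from itertools import combinations


def bibliographic_coupling(citing):
    """Link two citing documents by the number of references they share."""
    strengths = {}
    for a, b in combinations(sorted(citing), 2):
        shared = set(citing[a]) & set(citing[b])
        if shared:
            strengths[(a, b)] = len(shared)
    return strengths


def cocitation(citing):
    """Link two cited references by the number of documents citing both."""
    counts = Counter()
    for refs in citing.values():
        counts.update(combinations(sorted(set(refs)), 2))
    return counts


# Toy data: publication in the dataset -> references it cites.
citing = {
    "P1": ["R1", "R2"],
    "P2": ["R2", "R3"],
    "P3": ["R1", "R2"],
}
bc = bibliographic_coupling(citing)   # nodes are P1, P2, P3
cc = cocitation(citing)               # nodes are R1, R2, R3
print(bc)
print(cc)
```

The nodes of the coupling network are always members of the dataset, while the nodes of the co-citation network are whatever the dataset cites, which is why the latter can surface influential work published before the timeframe.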
In addition, bibliometrics can be integrated with other methods, such as visualization mapping, for offering new and unique insights. Visualization mapping is part of the science of networks, a multidisciplinary field of research and is increasingly used with bibliometric analysis. By using it with co-citation analysis, visualization allows us to examine the characteristics, structures, and evolution of a field of research. Several computer programs can be used for this, but our preference went to VOSviewer given its popularity among researchers.
However, this bibliometric/visualization approach needs metadata from a set of publications that are related in one way or another (i.e., subject, affiliation, etc.), together with all their citations. Thus, it is necessary to use a search engine that retrieves the citations of a document. To date, only Web of Science (Thomson Reuters) and Scopus (Elsevier) offer this possibility. We thus decided to merge the data extracted from both databases into a single dataset.
We followed the process mentioned above (delimitation of the field of study; selection of databases, keywords, and search criteria; extraction, cleaning, and formatting; and finally, co-citation analysis and visualization) and obtained insightful results. Here are some of the results that demonstrate the validity of our method.
Table 1 shows the authors with more than 30 citations in the dataset, as well as their associated cluster. Each of them was assigned (x, y) coordinates by VOSviewer with the algorithm described earlier. The results of the authors’ visualization analysis are shown in Figure 1.
Table 2 displays the source titles with more than 20 citations in the dataset as well as their associated cluster. These were also assigned (x, y) coordinates by VOSviewer. The results of the source title visualization analysis are visible in Figure 2.
In both cases, we observe meaningful information for scholars interested in the field of CE. For example, Figure 1 shows at a glance who is connected to whom and can reveal allegiances, leaders, or excluded authors. Figure 2 shows instead which documents are the most popular in the field of CE and which share a similar approach or methodology.
For more details about the results, please go to the main article published in the Journal of Cleaner Production.
In this section, we provide an in-depth reflection on the specific method that we used in the research project as well as the lessons that were learned while adapting Zhao and Strotmann’s (2015) methodology for conducting a bibliometric analysis.
There are a number of lessons learned from Phase 2 of selection of databases, keywords, and search criteria.
First, it is preferable to limit the number of keywords retained to find relevant publications on the subject. It is preferable to use a few influential keywords instead of the many different and often under-used keywords that abound in the literature.
More specifically, once the databases were chosen, we defined the search keywords. In our case, we limited our consideration to “sharing economy,” “collaborative economy,” and “collaborative consumption.” Although many synonyms are used in this subject, such as gig economy, platform economy, peer economy, app economy, or access-based consumption, our tests with different search queries indicated that they were not relevant for finding new documentation, as most of them were included in publications which already comprised the initial keywords.
Second, the search criteria must be considered. Rather than relying on a single process for retrieving information, searching with both controlled (subject) and natural (title, abstract, keywords, etc.) vocabulary offers more variety in the results (Fidel, 1991; Savoy, 2005).
We used the title, abstract, keywords search for Scopus and the topic search for Web of Science. In both cases, it was the default way of searching.
Third, a bibliometric analysis should be fed with a great diversity of publications beyond journal articles.
In fact, when the first searches were made, we concluded that many journal articles, conference papers, books, book chapters, editorials, and gray literature¹ were relevant to our field of study. Thus, all of these were considered in the creation of our core set.
The extraction, cleaning, and formatting phase of the project was characterized by three main considerations.
First, it is preferable to use a representative sample of the literature rather than attempting to catch all publications in a field of research because peripheral research will be included if the core sample cites them in sufficient occurrences.
When we were satisfied with the results of our keywords and search criteria, we extracted the data. By the very nature of the co-citation analysis, “outsiders” will be taken into consideration if our core set cites them. For example, some of the most cited authors in our set were outsiders (e.g., Rachel Botsman, Yochai Benkler, Lisa Gansky).
Second, by using Web of Science and Scopus together, a higher amount of work and precaution is necessary in relation to the extraction, cleaning, and formatting of the data.
Our goal was to bring the Web of Science and Scopus records into a common format so that they could be analyzed with VOSviewer. At the time of our research, both databases offered plain-text export. However, the two formats differed, and both VOSviewer and BibExcel were only able to read plain text from Web of Science. Thus, we needed to translate the plain-text sample from Scopus into the Web of Science format. The field tags were easily standardized with regular expressions in Notepad++. However, the complexity came with the formatting of authors, affiliations, countries, and citations (about 26,000). VOSviewer needs two pieces of information to be written identically to create a relationship between them. For example, if the metadata of one document says it was published in England and another says United Kingdom, they will not be linked together when they should be. It was the same for the names of the authors, affiliations, and the titles of documents. All initials, dots, commas, and spaces needed to be in the same order to create a relationship. This task took several hours of work with the help of BibExcel and Notepad++.
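The kind of standardization described above can be sketched as follows. This is an illustrative Python equivalent, not the BibExcel and Notepad++ workflow actually used: the alias table, the function names, and the choice of a lowercase "surname initials" matching key are all assumptions made for the example.

```python
import re

# Hypothetical alias table: map spelling variants to one canonical country name.
COUNTRY_ALIASES = {
    "england": "United Kingdom",
    "uk": "United Kingdom",
    "united kingdom": "United Kingdom",
}


def normalize_country(raw):
    """Return the canonical country name, or the trimmed input if unknown."""
    key = raw.strip().rstrip(".").lower()
    return COUNTRY_ALIASES.get(key, raw.strip())


def author_key(raw):
    """Build a matching key such as 'belk rw' from variants of an author name.

    Limitation: without a comma, the surname is assumed to be the first word,
    so multi-word surnames need a comma ("Van Eck, N. J.").
    """
    raw = raw.strip()
    if "," in raw:
        surname, rest = raw.split(",", 1)
    else:
        surname, _, rest = raw.partition(" ")
    initials = []
    for token in re.findall(r"[A-Za-z]+", rest):
        if token.isupper() and len(token) <= 3:
            initials.extend(token)      # "RW" -> R, W
        else:
            initials.append(token[0])   # "Russell" -> R
    return f"{surname.strip().lower()} {''.join(initials).lower()}"


print(normalize_country("England"))
print(author_key("Belk, R. W."), author_key("Belk R.W."))
```

Comparing keys rather than raw strings means that differences in dots, commas, and spacing no longer prevent two mentions of the same author from being linked.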
Third, all publications need to be reviewed for relevance when doing the formatting.
About a dozen publications were removed because they were not relevant to the study although they contained the words under study (i.e., sharing economy, collaborative economy, and collaborative consumption) in their title, abstract, or keywords. More specifically, we determined the relevance of each publication by reading through its title and keywords. If we were still unclear about the appropriateness of the publication after going through the title and keywords, we read through the whole abstract. At the same time, we were confronted with many publications that appeared in both databases. These duplicates were removed from the data extracted from Web of Science, as its format offers less information than that of Scopus (Yong-Hak, 2013). After the removal of non-relevant documentation and duplicates, our dataset went from 1,056 to 729 entries. These 729 observations constituted our final sample.
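The duplicate handling described above can be sketched as a merge that keeps the Scopus record whenever a publication appears in both exports. The sketch assumes that duplicates are detected by comparing normalized titles, which is an assumption for illustration; the record structure (dictionaries with "title" and "source" keys) and the function names are likewise hypothetical.

```python
import re


def title_key(title):
    """Lowercase the title and collapse punctuation and spacing for matching."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def merge_records(scopus, wos):
    """Merge two record lists, preferring the Scopus copy of any duplicate."""
    merged = {title_key(rec["title"]): rec for rec in scopus}
    for rec in wos:
        # setdefault keeps the existing (Scopus) record if the title is known.
        merged.setdefault(title_key(rec["title"]), rec)
    return list(merged.values())


# Toy records, invented for the example.
scopus = [{"title": "Sharing economy: A review", "source": "scopus"}]
wos = [
    {"title": "Sharing Economy - a review.", "source": "wos"},
    {"title": "Another paper", "source": "wos"},
]
merged = merge_records(scopus, wos)
print(len(merged), [rec["source"] for rec in merged])
```

Title matching will miss duplicates whose titles were transcribed differently, so a manual relevance and duplicate check of the merged list remains necessary.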
An important point when doing a visualization network is the threshold for citations.
Generally, the researcher has control of the thresholds above which documents will be retrieved (McCain, 1990; Shaw, 1985). When the threshold is too low, a “giant component” may appear where most of the items are related to each other (Small, 2009). This result may be interesting as it demonstrates the cohesion of a group. However, it then becomes impossible to observe distinctiveness or particular associations. We faced this issue ourselves when we applied thresholds that were too low. Conversely, applying a very high threshold may break meaningful relationships and alter the results. According to Shaw (1985), a threshold between 3 and 35–40 may be statistically significant depending on the situation. However, it is difficult to justify a specific threshold on the basis of similar studies because of the singularity of our own dataset (e.g., number of documents, associations between them, subject studied, etc.). After some experiments with different thresholds, we concluded that cutoff values of 30 citations for the author visualization and 20 citations for the source title visualization were the best for visualizing meaningful clusters.
The project presented in this case pertained to the application of bibliometric tools to the study of the collaborative economy research field. Zhao and Strotmann’s (2015) classic procedure was adapted to analyze data extracted from two different databases (i.e., Scopus and Web of Science). The overarching objective was to provide meaningful bibliometric analytics as well as visualization of the evolution of the research field of the collaborative economy. The project was conducted in four phases, as follows:
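Experimenting with thresholds amounts to checking how many items survive each candidate cutoff. The sketch below shows this on invented counts (the author labels and numbers are toy data, not results from the study) and assumes that an item is kept when its citation count is greater than or equal to the cutoff.

```python
def apply_threshold(citation_counts, minimum):
    """Keep only the items cited at least `minimum` times (assumed inclusive)."""
    return {item: n for item, n in citation_counts.items() if n >= minimum}


def survivors_by_threshold(citation_counts, thresholds):
    """Report how many items remain for each candidate threshold."""
    return {t: len(apply_threshold(citation_counts, t)) for t in thresholds}


# Toy counts, invented for the example.
toy_counts = {"Author A": 45, "Author B": 30, "Author C": 12, "Author D": 3}
print(survivors_by_threshold(toy_counts, [3, 20, 30, 40]))
```

A table of this kind makes it easier to spot the range in which the network is neither a single giant component nor reduced to a handful of isolated items.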
- The first stage consisted in the delimitation of the field of study. More specifically, the search field was the collaborative economy. The timeframe was between January 2010 and November 2017. All documentation was retrieved in English only. Types of documents extracted were journal articles, conference papers, books, book chapters, editorials, and gray literature.
- The second phase referred to the selection of databases, keywords, and search criteria. The databases used were Scopus and Web of Science because they offer citation metadata, which is necessary to accomplish a co-citation analysis. Keywords used were “sharing economy,” “collaborative economy,” and “collaborative consumption.” The search criteria were the title, abstract, keywords search for Scopus, and the topic search for Web of Science.
- The third step related to the extraction, cleaning, and formatting of the data. Data were extracted in plain text from Scopus and Web of Science. We used a representative sample of the documentation published in the subject studied. After the duplicates and the non-relevant entries were removed, the dataset went from 1,056 to 729 entries. Then, we formatted the data from Scopus in the style of Web of Science to facilitate the analysis.
- Finally, the bibliometric analysis was conducted. We compiled the data with BibExcel, Excel, and Notepad++ and did the co-citation analysis with VOSviewer. The thresholds for the clusters were 30 for the authors and 20 for the source titles.
This process enabled us to develop a meaningful representation of the evolution of the field of the collaborative economy.
1 Gray literature consists of institutional reports (e.g., European Commission), business magazines, and newspapers (e.g., The Economist), as well as consultant reports (e.g., PwC).
- Discuss the four steps for conducting a bibliometric analysis.
- In your own words, how does the co-citation approach work?
- Discuss the pros and cons of using bibliographic coupling on one hand and co-citation analysis on the other to perform a bibliometric analysis.
- What is the procedure to merge data extracted from both Scopus and Web of Science?
- To what extent should a researcher always follow strict thresholds such as the threshold for citations when doing a visualization network?
Co-citation analysis and visualization of a citation network