Systematic review skills are required in almost all areas of health services research. This case study describes a systematic review originally undertaken for a master's dissertation in health economics and health policy. In this project, we conducted a systematic review using novel searching techniques, applied a qualitative research method to fragments of text extracted from published literature, and analyzed the results both pictorially and statistically. We detail the steps involved in conducting a systematic review, and describe the qualitative discourse analysis and quantitative graphical and statistical analyses that we undertook. We also describe the further steps we needed to take to ensure that the results were publishable in a peer-reviewed journal. Reading this case study will give students insight into the processes that are standard in any systematic review, and encouragement to try novel techniques.
By the end of this case, students should be able to
- Understand the steps involved in conducting a systematic review of the literature
- Recognize different electronic sources of material for a literature review
- Understand the differing requirements for a dissertation project and a peer-reviewed publication
- Be aware of the potential uses of text-based analysis
Project Overview and Context
At the end of a project, it is very common for researchers to recommend that further research is required. However, research incurs costs to society, so it is important that those studies yielding the most benefits are prioritized. Formal statistical methods have been developed to assess the potential value of conducting further research; for example, the so-called “expected value of perfect information” (EVPI) puts a monetary value on eliminating uncertainty around decision making through conducting further research. If the cost of undertaking more research to reduce uncertainty around a decision (e.g., to fund a new health care program) is more than the EVPI, there is little justification for doing the study. In other words, given what we know already, is it potentially worth doing another study, or should decision makers just make a final decision based on the current evidence?
Value of information techniques have been increasingly employed in a health care research context since the publication of a key methodology paper by Karl Claxton and John Posnett (1996). However, although the development of statistical methods for conducting value of information studies continues year on year, our understanding of how authors interpret the results, and how the results are actually used in practical decision making, is more limited.
This project aimed to find all examples of applied EVPI calculations in a health care context published up until 2010 and to examine the authors' advice on whether further research should be undertaken or not. We proposed to analyze the quoted figures to see whether there was an empirical "threshold" value at which authors appeared to recommend or not recommend further research and to see whether there were any other factors driving authors' recommendations. However, the project was very much grounded in the interpretation of results rather than the statistical technique itself; further information about applying value of information methodology (and a detailed description of the statistical methods used to analyze the obtained information) can be found in the bibliography but is not the focus of this research project.
Dr Joanna Thorn undertook this project as a dissertation for a master’s degree in Health Economics and Health Policy at the University of Birmingham, UK, during the latter half of 2011. The project was conceived by Dr Lazaros Andronis, who conducts research in the field of value of information studies (an area of complex statistical research). However, the project also involved qualitative research techniques to answer questions about researchers’ interpretation of their own work. Therefore, Professor Joanna Coast, who has extensive expertise in qualitative research techniques, was the second supervisor.
At the point of undertaking the review as a student at the University of Birmingham, Joanna Thorn was working as a research associate at the University of Bristol, which allowed access to journal subscriptions through two libraries. Although the project started out as a master’s dissertation, we aimed to publish the results in a peer-reviewed journal, which necessitated further work beyond the MSc submission; the subsequent statistical analysis was carried out by Lazaros Andronis. The project was adopted by the MRC Network of Hubs for Trials Methodology Research (ConDuCT Hub, Grant G0800800), which provided further resources to fund access to required articles and disseminate its findings through a presentation at the Health Economists’ Study Group meeting, Oxford, June 2012.
The developing nature of the project was a good introduction to the realities of the research environment. Although in hindsight we might have done some things differently from the start to facilitate publication plans, the project was exploratory in nature and conducted under stringent time constraints, which necessitated a flexible approach.
A Systematic Review of the Literature
A systematic review of the literature is a common first step in health service and medical research and is a requirement of many PhDs. Assessing what is already known on a subject prevents duplication of research efforts, and distilling the results on a particular topic from a large number of studies into a manageable overview is of considerable value to others, including researchers and policy makers. A scientific and systematic approach is intended to make a study repeatable and ensure relevant literature is not missed because of a biased approach by a researcher; published information can be contradictory, and it is easy to select only the material that fits with a pre-conceived view. Guidelines have been written to steer researchers in the right direction; for example, the Cochrane Collaboration and the Centre for Reviews and Dissemination have published guides to conducting systematic reviews, and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement aims to standardize the reporting of systematic reviews.
A systematic review is a structured approach to identifying and summarizing relevant material in the literature. It involves electronic searches of bibliographic databases and other sources to locate potentially relevant abstracts, sifting through these abstracts to pick out those that are likely to be relevant, and acquiring full texts to read for a final decision on relevance and inclusion in the study. Relevant data are then extracted from each of the included articles, and an analysis (which could take many forms) is undertaken.
We undertook a systematic review of the literature to collate the known information on applied EVPI values, that is, studies in which an actual EVPI value had been calculated. We drew up a written plan of the steps we intended to take (a protocol) in advance of starting the review; this is good practice and was very helpful as the project proceeded.
Decisions on which articles to include, referred to as the “inclusion criteria,” need to be made at the beginning of the study and preferably recorded in the protocol. For example, the time period over which to search must balance the need to identify relevant material with the need not to be overwhelmed by irrelevant material. We opted to search a fixed period that started from just before the point we believed relevant material might have been published (based on the publication of a key methodology paper in 1996 by Karl Claxton and John Posnett). We fixed the end point just prior to some political changes that might have affected authors’ interpretation of their work. For pragmatic reasons, we also chose to include only material published in English. Material published in the gray literature was also excluded as we felt that it was less likely to have any effect on decision makers. The key point here is to record and be able to justify any inclusion or exclusion criteria.
Searching for Material
We constructed searches in standard bibliographic databases for health services research. As each database has a different focus, it is good practice to search more than one to increase the amount of relevant material identified. Medline is a very commonly used database with pan-medical coverage, and students would need to have a very good reason not to include it for a health-related project. However, we also searched EMBASE to include additional titles, CINAHL for a nursing focus, Web of Science for a broader scientific emphasis, and the Cochrane Library to access material explicitly related to health economics. We considered searching BIOSIS Previews but rejected it as the coverage is pre-clinical. Students undertaking systematic reviews should consider carefully which databases contain material that is relevant to their research question.
We used a combination of terms to cover the different phrases relevant to value of information studies and also scoped out the likely returns from various additional searches by taking samples of 50 results, rejecting those for which no new relevant material was identified.
EVPI calculations are not always referred to in the abstract or keywords of an article, which means that it is hard to construct a standard bibliographic database search that is sufficiently specific. Fortunately, however, EVPI is a very specific term—articles using that exact phrase are reasonably likely to be relevant to the research question. It is also an explicit phrase—there are no widely accepted alternative ways to describe this type of study. We decided that it was worthwhile attempting to search full texts, avoiding excessive noise by using very limited and specific terms ("EVPI" or "expected value of perfect information" only). We sought material using the search functionality available on publisher websites; selecting the key health economics journals enabled us to identify the main publishers to search. However, there is no consistency in the search functionality available from different publishers, and the supporting documentation provided is very variable, so developing appropriate search syntaxes was largely done by trial and error for each site. Finally, we also used Google Scholar to identify potentially relevant material. Scholar was chosen because it is freely available, and Xiaotian Chen (2010) had supplied evidence to suggest that coverage is good. However, there are some practical limitations associated with using Google Scholar as a research tool: only the first thousand results are returned, so searches must be specified effectively, and it is not possible to download multiple abstracts simultaneously, which means that screening must be carried out through Google itself, following links to abstracts where appropriate.
Identifying Relevant Material
Having identified a body of abstracts that might potentially be relevant to our research question (about 2,500), we developed a bespoke database as an effective means of managing the data. Microsoft Access is a widely available and user-friendly database that is adequate for managing most literature reviews. The database was constructed to allow bibliographic data to be stored in it (downloaded from reference management software).
We then undertook a screening exercise; Joanna Thorn went through all the titles or abstracts and identified articles she felt should definitely be included in the review according to the inclusion criteria given above and articles for which it was not possible to tell from the abstract and title alone. A 10% sample was also screened independently by Lazaros Andronis; dual screening for at least some of the abstracts is good practice to ensure that accuracy and consistency are achieved. Highlighting the key search terms (EVPI or “expected value of perfect information”) in the stored abstracts helped with efficient screening of material.
For all articles that passed the first screening round (about 500), we then set about collating a set of full text versions. Acquiring the material necessary for the review was a lengthy and time-consuming process. As a student with no access to funds for inter-library loans, Joanna relied as much as possible on the subscriptions held by the libraries at the universities of Bristol and Birmingham, and the generosity of researchers in sharing authors' copies of articles or confirming that their study had not included an EVPI calculation. However, one of the challenges of contacting corresponding authors is that many have moved on and are no longer at the address given; many of the emails we sent resulted in bounce-backs or a lack of reply. We therefore also used other strategies to overcome the inability to obtain a full text; for example, until 2015, the NHS Economic Evaluation Database (EED) published reviews of economic evaluations, and some articles were identified as being irrelevant by the lack of mention of value of information in the EED record.
These full texts were then examined more closely to identify material that was actually relevant to the research question, that is, articles that described applied EVPI studies. Technological solutions were employed to weed out irrelevant material; for example, multiple pdf files were searched for keywords such as “perfect” and the context examined to see whether it was relevant. If you choose to take this approach, beware of pdf files that are not searchable! For articles that were not searchable, or the context was unclear, we read the full text. By the end of the identification phase of the study, we had identified 86 relevant articles.
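The keyword-and-context check described above can be sketched as follows. This sketch assumes the article text has already been extracted to plain text; the function name and context window are our own illustration, not the tool actually used in the study.

```python
import re

def keyword_contexts(text, keyword="perfect", window=60):
    """Return short snippets of text around each occurrence of the keyword,
    so a reviewer can judge relevance without reading the whole article."""
    snippets = []
    for m in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
        start = max(0, m.start() - window)
        end = min(len(text), m.end() + window)
        snippets.append(text[start:end].replace("\n", " "))
    return snippets

sample = "... the expected value of perfect information was estimated at ..."
hits = keyword_contexts(sample)  # one snippet containing the phrase in context
```

If a file yields no hits, it can usually be set aside; ambiguous contexts still require reading the full text, as described above.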
Extracting and Classifying Information
We extracted specific items of data from each of the 86 relevant articles, including information about the article itself (e.g., when it was published), information about the study (such as where it was carried out and who funded it), and, of course, the actual EVPI values cited. We then classified the extracted data into groups where possible; for example, we classified the funding according to whether the source was from industry, academia, government, or charity.
It is common practice when conducting systematic reviews to assess the methodological quality of the included studies. However, in this particular study, we were interested in the recommendations that the authors made based on the EVPI value derived in their study, regardless of whether the study itself had been carried out well. Given this focus, and in the absence of a suitable quality assessment guide, we decided that a formal assessment of methodological quality was not necessary.
We also extracted data for the key focus of this study—the authors’ interpretation of their results. Brief verbatim excerpts of text describing the recommendations made by authors on the basis of their results were extracted into the Access database. We then classified the authors’ research recommendations into six categories according to how strongly they recommended conducting further research (or not).
We further analyzed the extracted text excerpts using a method based on discourse analysis, a technique that is covered thoroughly in Alan Bryman’s (2004) textbook on social research methods. Discourse analysis is a form of qualitative study that takes a language-based approach to examining reasons for presenting information in a particular manner; other useful sources of background reading include papers by Mary Dixon-Woods (2001) and Michael Traynor (2006). The texts we were examining all came from published, peer-reviewed material, that is, they were written by scientists for scientists. Bearing this in mind, we examined the extracted texts analytically to look for patterns in the use of language and repeated the process several times. We then highlighted key phrases and grouped them into similar descriptive modes.
This process led us to code the extracts according to how confident the authors were in their recommendations for further research. For example, some authors used unequivocal language, such as “should not fund more research” or “should be willing to invest,” whereas other authors used more equivocal words such as “possible” or “potential.” These descriptors were used to classify the excerpts on an ordinal scale of confidence in the recommendation, ranging from “unequivocal” through “valuable,” “justified,” “probably,” to the least confident “possibly.” This classification schema differed from the original positive or negative recommendation schema; both positive and negative recommendations could fall into the same confidence category.
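As a rough sketch, part of this confidence coding can be expressed as phrase matching against the ordinal scale. The cue lists below are invented examples rather than the coding frame actually used in the study, and ambiguous excerpts would still need manual qualitative coding.

```python
# Ordinal confidence scale from the discourse analysis, most to least confident.
CONFIDENCE_SCALE = ["unequivocal", "valuable", "justified", "probably", "possibly"]

# Illustrative cue phrases only; the real coding was done by close reading.
CUES = {
    "unequivocal": ["should not fund more research", "should be willing to invest"],
    "probably": ["probable", "likely"],
    "possibly": ["possible", "potential"],
}

def classify_confidence(excerpt):
    """Return the most confident category whose cue phrase appears in the
    excerpt, or None if the excerpt needs manual coding."""
    text = excerpt.lower()
    for level in CONFIDENCE_SCALE:
        for cue in CUES.get(level, []):
            if cue in text:
                return level
    return None
```

Note that, as in the study, this confidence scale is independent of whether the recommendation itself is positive or negative: "should not fund more research" and "should be willing to invest" both code as unequivocal.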
Following advice that an important initial step in any analysis is to plot and inspect results (see, for example, the textbook Basic Econometrics by Damodar Gujarati and Dawn Porter, 2009), we opted to use a visual graphical analysis to get some first insights into the factors affecting the recommendation. We sorted the EVPI values into size order and plotted them on bar charts using color to differentiate between the different recommendations, as shown, for example, in Figure 1. This enabled us to see whether any patterns existed; for example, from the graphs, it appeared that studies carried out in the Netherlands and the United States resulted in higher proportions of positive recommendations. The graphs also clearly confirmed that, as expected, lower EVPI values led to more recommendations against further research.
Figure 1 Example of graphical approach to analysis.
We repeated the graphical analysis using different colors to differentiate between the confidences with which a recommendation was made (using the classifications from the text analysis). This suggested that, for example, studies carried out in the United Kingdom had recommendations that were typically less confident than those carried out elsewhere. However, it is important to note that although the graphical approach allowed us to quickly generate suggestions for factors that might have an impact, it did not allow us to state that these factors were statistically significant.
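The data preparation behind this kind of plot can be sketched as follows. The EVPI values and recommendation categories below are invented for illustration; the resulting arrays would feed a standard bar chart (e.g., in matplotlib) rather than reproduce Figure 1 itself.

```python
# Invented example records: EVPI value plus the recommendation category
# that determines each bar's colour.
studies = [
    {"evpi": 4_200_000, "recommendation": "for further research"},
    {"evpi": 150_000, "recommendation": "against further research"},
    {"evpi": 900_000, "recommendation": "against further research"},
    {"evpi": 2_600_000, "recommendation": "for further research"},
]

# Sort into size order, then derive parallel height and colour arrays.
bars = sorted(studies, key=lambda s: s["evpi"])
heights = [s["evpi"] for s in bars]
colours = ["green" if s["recommendation"] == "for further research" else "red"
           for s in bars]
# heights and colours would then be passed to a plotting call such as
# plt.bar(range(len(bars)), heights, color=colours), if matplotlib is available.
```

The same preparation, with colours keyed to the confidence classifications instead, produces the second set of graphs described above.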
Reporting the Project
Keeping clear and extensive notes of the procedures undertaken (which is far easier to do as you go along, and should include your reasoning for any research decisions as well as summaries of conversations) contributed substantially to the ease of writing up. In particular, it is necessary to be able to report the exact searches undertaken, and the dates on which they were carried out, so reliable records must be kept.
It is also important to identify and report any limitations that may have affected the results of a project. In this case study, the limitations included the subjective nature of some of the choices of extracted data; for example, some studies calculated multiple EVPI values, only one of which was extracted for the analysis.
Although the project was conceived and undertaken in the context of a master’s dissertation, we planned to submit it to a peer-reviewed journal. It had not been possible to locate full texts for a small number of abstracts. We were reasonably confident that these articles were not relevant to the research question, and simply reporting this was acceptable in the context of a master’s dissertation. However, for publication in a peer-reviewed journal, the omissions needed to be remedied. The project was adopted by the MRC Network of Hubs for Trials Methodology Research (ConDuCT Hub) at the University of Bristol which gave us access to inter-library loans. We were, therefore, able to locate all the necessary material to complete the project (and breathed a sigh of relief that, as we had predicted, none of the “missing” material was relevant and, therefore, did not change the existing results!).
Adoption by the ConDuCT Hub also allowed us to submit a paper to the Health Economists’ Study Group meeting. This conference follows an unusual format in that the author does not present their own work—a discussant presents and critiques the work, and the audience is then engaged in a discussion around the work. The format leads to extremely useful feedback being given; our discussant reviewed the work in detail and, together with the audience, gave us lots of ideas to think about. However, the key message to come out of the session was that the dual approach to classification was confusing for the readers. Therefore, we decided to separate the classifications and present only the original schema based on the actual positive or negative research recommendation in the publication; disentangling the schemas required some careful work but resulted in a much clearer report.
The mixed nature of the work led to some difficulty in identifying a suitable home for it; the advanced statistical subject matter of the review rendered it unsuitable for a general journal. However, the focus of the project was on the interpretation rather than the methods themselves. We opted to submit it to Medical Decision Making, a journal considered to be a prime channel for disseminating value of information research, as its readers would understand the statistical techniques we were assessing. However, an obvious criticism of our visual graphical analysis was that a statistical approach would be more robust, and a reviewer did indeed pick up on this as an area for improvement.
Therefore, we developed a logistic regression model to test formally whether different factors had any bearing on the probability of a positive recommendation for further research. Logistic regression is a statistical technique commonly used to model the effect of various "explanatory" variables on a "dependent" variable that can take only one of two values (here, recommend or do not recommend further research). We explored whether factors such as the country in which the study was carried out, or the type of funder, affected the likelihood of a positive recommendation for further research. To arrive at a model that captures the effect of relevant factors while remaining free from unhelpful "noise" caused by irrelevant variables, we built and tested different models using formal model selection (stepwise selection) and "goodness-of-fit" methods (likelihood ratio tests and the Akaike and Bayesian information criteria). The best model showed that the EVPI value was the only factor with a significant effect on the likelihood of a positive research recommendation, and it also allowed us to identify formally an empirical threshold EVPI value at about £1.5 million. After other pertinent suggestions from referees had been addressed, the article was published.
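A minimal sketch of this modelling step is given below, assuming invented data (a single explanatory variable standing in for log EVPI, and binary recommendations) and a toy gradient-ascent fit in place of a statistics package. The AIC calculation illustrates the kind of goodness-of-fit comparison used in model selection; none of the numbers correspond to the study's actual results.

```python
import math

def fit_logistic(xs, ys, lr=0.01, iters=20_000):
    """Fit P(y=1) = 1 / (1 + exp(-(a + b*x))) by gradient ascent on the
    log-likelihood. A toy stand-in for a proper statistics package."""
    a = b = 0.0
    n = len(xs)
    for _ in range(iters):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += y - p
            grad_b += (y - p) * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b

def aic(xs, ys, a, b, k=2):
    """Akaike information criterion: 2k - 2*log-likelihood; lower is better."""
    ll = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(a + b * x)))
        ll += y * math.log(p) + (1 - y) * math.log(1 - p)
    return 2 * k - 2 * ll

# Invented data: a covariate (e.g., log EVPI in £m) and whether further
# research was recommended (1) or not (0).
xs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
ys = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
a, b = fit_logistic(xs, ys)
model_aic = aic(xs, ys, a, b)
# A positive slope b means higher values of the covariate make a positive
# recommendation more likely; the fitted model implies an empirical
# threshold where P = 0.5, at x = -a / b.
```

Competing models (with and without each candidate factor) would be compared on criteria such as this AIC, which is how stepwise selection narrows down to the best-fitting model.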
Practical Lessons Learned
Writing a good bibliographic database search is a challenge. You must pull out all the relevant material (i.e., the search must be sensitive) without overwhelming yourself with irrelevant material (i.e., the search must also be specific). To strike this balance, we spent considerable time early in the project scoping the likely returns from candidate searches, which made constructing the final searches much easier. In constructing our bibliographic database searches, it was very helpful to consult library staff; this is their area of expertise, and they can help write a search that is as specific and sensitive as possible.
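The trade-off between sensitivity and specificity can be made concrete with a small sketch; the counts below are invented for illustration rather than taken from the review.

```python
def sensitivity(found_relevant, total_relevant):
    """Proportion of all relevant articles that the search retrieved."""
    return found_relevant / total_relevant

def specificity(excluded_irrelevant, total_irrelevant):
    """Proportion of irrelevant articles that the search correctly excluded."""
    return excluded_irrelevant / total_irrelevant

# A hypothetical search that retrieves 45 of 50 known relevant articles
# while excluding 1,900 of 2,000 irrelevant ones:
search_sensitivity = sensitivity(45, 50)       # 0.9
search_specificity = specificity(1900, 2000)   # 0.95
```

Broadening a search typically raises sensitivity at the cost of specificity, and vice versa; scoping sample returns, as described above, is one practical way to judge where a given search sits.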
One of the issues associated with searching publisher websites is that there is no control over the way in which the functionality might change. This was brought home to us more recently when attempting to rerun some of the searches; in some cases, the website interface had changed entirely, and journal mergers and acquisitions meant that some sites had changed owners. This lack of control is also a particular problem with using Google Scholar as a research tool; although it appears to perform excellently in terms of locating relevant material, the algorithms are opaque and subject to change without warning or acknowledgment.
The requirements for submission for a degree are different to the requirements for publication in a peer-reviewed journal, as was demonstrated by the comments made by referees when we submitted the article. Journal publication requires all the loose ends to be tied up, and suggestions from referees can lead to further work. The MSc dissertation was a report of everything undertaken, but we were more selective in presenting material in the journal article.
The key lesson the student (Joanna Thorn) learned from the study was to record everything! Extensive notes on what you did and when you did it make it very much easier to write up at the end of the project. In hindsight, explicitly noting down the reasoning behind each decision and, in particular, the decisions not to include or do something, would have been valuable.
In this case study, we have described a systematic review of the medical literature with analyses based on both qualitative and quantitative methodologies. By systematically identifying all the relevant material, we were able to determine a threshold EVPI value at which further research is recommended. We used qualitative research techniques to explore the confidence with which authors made their recommendations. The project was successfully submitted for a master’s degree, and parts of it were presented at a national conference and were subsequently published in a peer-reviewed journal article. The project also raised a number of interesting issues suitable for further research. For example, through contact with authors, it became apparent that EVPI studies are not always reported; it would be interesting to look at both the extent to which this happens, and the reasons behind it.
The steps we took in conducting the review were as follows:
- To develop a protocol;
- To search bibliographic databases and publisher websites;
- To screen the resulting abstracts for relevance;
- To acquire full texts of potentially relevant material;
- To extract information from each included article;
- To analyze the data both qualitatively (using text analysis techniques) and quantitatively (via logistic regression methods);
- To report the study.
Although we excluded the text analysis from the final publication, carrying out a more speculative piece of research was valuable in its own right. At the outset, we didn’t know whether anything useful could be gained from analyzing published text in this manner, and it was interesting to find that a coherent classification system could be devised. The project was engaging to work on, with a “quirky” feel to it, and we successfully avoided a tickbox mentality to conducting a systematic review by employing novel techniques and using the results creatively. Simply following a tickbox recipe for research rather than thinking through the issues and developing solutions would have led to a rather duller piece of research. The diverse expertise of the two supervisors was an essential contributor to successfully bringing the project to degree submission and, ultimately, publication. The further questions raised as the project progressed are left open for future work.
Exercises and Discussion Questions
- The need to conduct systematic reviews is well established in medical research. Can you think of any criticisms of systematic review methodology?
- Describe some of the challenges of using both qualitative and quantitative methods in a single project.
- What features of a reported systematic review would convince you that it had been carried out competently?
- Describe the circumstances in which searching full text might be appropriate or inappropriate.
- Why might you choose to conduct a qualitative text analysis?
Cardiff University, Systematic Review Methodologies: http://www.cardiff.ac.uk/specialist-unit-for-review-evidence/resources/systematic-review-methodologies