Having worked for some time with young people who offend, I questioned the effectiveness of interventions because the young people may not have had enough emotional understanding to process the sessions. For example, how can a young person who demonstrates anger management difficulties learn control without an appreciation of how their own emotions work, or even what they might be? In my PhD, I set out to discover, using an existing questionnaire validated for use with young people, whether young people in the criminal justice system had a level of emotional intelligence related in some way to their propensity to offend. Because I had to rely on colleagues to complete the questionnaire with young people, its delivery could not be standardised, and the returned questionnaires showed serious discrepancies. Statistical analyses revealed where these inconsistencies lay and helped mitigate their effects for the purposes of my study. However, my discoveries also formed the basis of important principles for using this kind of questionnaire with young people, especially those with more communication difficulties and lower literacy levels than the general population.
By the end of the case, you should:
- Understand the difficulties that some young people might encounter with the design of a Likert-scale questionnaire
- Be able to anticipate potential difficulties, and mitigate these with a robust administration protocol
- Understand the importance of reliability and validity in research tools
- Understand how to test responses to ensure that they show reliability
I began this study in 2006, at which time no one else appeared to have looked at the relationship between young people who offend and emotional intelligence (EI). My motivation was that the governing body of our organisation, the Youth Justice Board, would only allow interventions to be used if they had a solid evidence base. On one hand, this is sound reasoning; on the other hand, what if an area had never even been considered for research? Lack of evidence is certainly not proof of a lack of worth; so I set out to produce the evidence myself, in order to open the way for work to be done with young people to develop their EI.
Lack of previous research had a knock-on effect on the data-gathering process, because there were very few validated instruments to measure EI in young people (and none for young people who have offended). I needed a measure which would concur with my chosen model of EI, Mayer and Salovey's Four Branch Model. One measure was available: the Adolescent Swinburne University Emotional Intelligence Test (ASUEIT), which had been developed in an Australian university by Palmer, Gignac, Ekermans and Stough. This was a self-report questionnaire consisting of 54 questions, answered by selecting a number on a Likert scale between 1 and 5 (where 1 indicates that the statement seldom applies to the respondent). As this was the only available measurement tool, I had little choice but to use it, and hope that it worked!
A Likert-scale questionnaire is a questionnaire with a series of statements or questions, which respondents rate according to the intensity of their reaction. For example, a Likert item might state: I love walking dogs; the options might be 1 = strongly disagree, 2 = slightly disagree, 3 = neither agree nor disagree, 4 = slightly agree, 5 = strongly agree. Sometimes the somewhat neutral middle answer (3 = neither agree nor disagree) is not made available because it can give rise to an easy, non-committal answer, in which case the scale is considered a forced-choice scale. Likert scales elicit a numerical response, so they can be analysed as scale data (which assumes that the numbers are exact intervals on a scale), but they can also be analysed as categorical data, because it would be difficult to be sure that the step from 4 to 5 represents exactly the same interval of feeling as the step from 1 to 2. Given the potential for error, a forced-choice scale is often preferable.
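The difference between treating Likert responses as scale (interval) data and as categorical (ordinal) data can be illustrated with a short Python sketch; the responses below are invented purely for demonstration:

```python
# Hypothetical responses to the Likert item "I love walking dogs" (1-5 scale).
from statistics import mean, median
from collections import Counter

responses = [4, 5, 3, 2, 5, 4, 4, 1, 3, 5]

# Interval (scale) treatment: the arithmetic mean assumes the gap between
# 4 and 5 is exactly the same size as the gap between 1 and 2.
print(mean(responses))    # 3.6

# Ordinal (categorical) treatment: the median and frequency counts make
# no equal-interval assumption about the scale points.
print(median(responses))  # 4.0
print(Counter(responses))
```

Reporting a median or frequency table is the more cautious choice when, as here, there is no guarantee that respondents experience the scale points as evenly spaced.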
In my research project, I wanted to gather at least 100 responses to enable effective quantitative analyses, which seemed likely to be completed in approximately 12 months, based on the numbers of young people coming through the courts. I needed to rely on colleagues to use the ASUEIT with the young people with whom they worked and return the completed questionnaire to me in a sealed envelope (for confidentiality reasons). To standardise the administration of the questionnaire, I gave a presentation to all staff, indicating how they should explain the questionnaire to the young people for the best results. I based the information in this presentation on the results of a small pilot, which attempted to identify what the young people struggled to understand. I offered explanations for those elements with which young people had difficulty.
The young people who took part in this study were all working with the Youth Offending Service after having been convicted of criminal offences. They were aged between 11 and 18 years, and 75% were male. Because I had set the parameters to include all young people who had been given a specific order from the court during the data-gathering period, strictly speaking I was using a population rather than a sample (as there was no further selection criterion, random or otherwise, applied to the eligible young people). Young people who offend have a much higher level of communication difficulties than the general population, as identified in a research project by Gregory and Bryan (2009). They are also more likely to have missed mainstream schooling, having been excluded, refused to attend, or been allocated to a special school for social, emotional and behavioural difficulties. Such difficulties can have an adverse effect on their education, and in particular on their ability to understand a questionnaire which has been validated for secondary school-age young people. In addition, difficulties with concentration (and a higher level of diagnosed attention deficit hyperactivity disorder) can make it challenging for these young people to complete a Likert-scale questionnaire successfully.
It actually took 2 years to get the target number of responses through my co-workers. By the end of this data-gathering period I had received 101 responses, although 1 was not sufficiently completed for inclusion, leaving a database of 100 young people. While inputting some of the questionnaires into the database, I noticed some trends which I found concerning. One of the young people had written on his questionnaire ‘I am not dum and most of these are Repeat [sic]’, illustrating one of the difficulties with this questionnaire design. To ensure reliability and consistency, questionnaires are usually designed to include several similar questions, which can make them feel repetitive. Although repetition was also a common theme in the pilot feedback, there was very little I could do about it because altering the questionnaire would reduce its validity (validity is whether a test is measuring what it purports to measure, and reliability is whether responses to the questions are consistent, both internally, and over time). There were also comments about the length of the questionnaire which, at 54 items, was long for a population of young people who have difficulties with concentration. One worker commented, ‘I have to say that one of the young people that I filled the questionnaire in with found it a bit long and therefore did not take it seriously and just ticked any answer’. This was something I was able to address at the end of the study with a new EI model, which negated the need for some of the questions; however, at the time of the data-gathering phase, all the items needed to remain.
It became evident from the returned questionnaires that some of the young people had not necessarily completed them with well-considered responses. This may have been for several reasons: they may not have understood a question and been afraid to ask for clarification; they may have become bored with the questionnaire and ceased to consider their responses; or they may have been trying to answer in a socially acceptable way rather than reflecting their actual opinion. With a Likert-scale questionnaire, respondents can answer a question regardless of whether or not they understand it. Some returned questionnaires had a large number of middle responses (i.e. 3 = neither agree nor disagree), and in general these increased towards the end of the questionnaire, supporting the probability that it was too long for this client group. A run of middle responses might mean that the young person was indecisive, but could also mean that they could not be bothered to think about their answers and chose middle values as an easy and quick option. This raises the issue of consent: young people, being in a lower position of authority than the researchers, may feel unable to refuse consent yet be reluctant to involve themselves, so quick, unconsidered completion may be their way of withdrawing it.
I noticed that one respondent, with whom I was very familiar as I had previously worked with him, had completed the questionnaire in a way which gave him an unrealistically high EI level, one which I seriously questioned. This caused me to take a closer look at his responses, which were very ‘black and white’: mainly either 5 (strongly agree) or 1 (strongly disagree). I concluded from my experience of working with him that his polarised answers reflected a concrete level of thinking, where a more sophisticated thinker would likely have been more circumspect. This casts doubt on the reliability of the questionnaire with these young people. Young people may not have a mature enough awareness of their own abilities with emotions to answer this questionnaire accurately, and because the multiple-choice design allows them to give an answer, even a random one, the researcher has no way to identify whether this has happened.
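Patterns such as runs of mid-point answers or heavily polarised responding can be screened for during data entry. The Python sketch below is illustrative only: the respondents, responses and any thresholds are invented, and a flagged pattern would prompt closer inspection of that questionnaire rather than automatic exclusion:

```python
# Screening returned Likert questionnaires for patterns that may signal
# careless completion: long runs of the mid-point, and answers clustered
# at the extremes ('black and white' responding).

def longest_midpoint_run(responses, midpoint=3):
    """Length of the longest consecutive run of mid-point answers."""
    longest = current = 0
    for r in responses:
        current = current + 1 if r == midpoint else 0
        longest = max(longest, current)
    return longest

def polarisation(responses):
    """Proportion of answers at either extreme of a 1-5 scale."""
    return sum(r in (1, 5) for r in responses) / len(responses)

# Invented example: a respondent who drifts to the mid-point near the end.
bored = [4, 2, 5, 3, 3, 3, 3, 3, 3, 3]
# Invented example: a respondent answering only in 'black and white' terms.
polarised = [5, 1, 5, 5, 1, 1, 5, 1, 5, 5]

print(longest_midpoint_run(bored))  # 7
print(polarisation(polarised))      # 1.0
```

Screening of this kind cannot prove that a respondent answered carelessly; it only identifies questionnaires that warrant a second look, ideally alongside any comments from the worker who administered them.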
A main concern with the Likert-scale questionnaire in this population, which only surfaced during the statistical analyses, was the reverse-scoring of some of the questions. To test for internal reliability, Likert-scale questionnaires frame some questions positively and some negatively. For example, the statement ‘I can always tell other people how I feel’ can be negatively phrased as ‘I cannot tell other people how I feel’. If the response to the first question is 5 (strongly agree), then the response to the second is expected to be 1 (strongly disagree). A well-written questionnaire will have several sets of similar (but not identical) questions, one positively and one negatively written. Consistency can be identified by calculating a reliability coefficient, in this case Cronbach's alpha (which examines all of the possible combinations of questions to see whether they are answered in a consistent way; see Field, 2009). For consistency, an alpha score (which ranges between 0 and 1, where 1 would only occur if responses to all the questions were perfectly consistent, and 0 if they were completely unrelated) should be greater than .7, but the alpha scores of the responses with which I was working ranged between .2 and .59 (for the different branches of EI), all of which were unacceptable. Scoring the negatively framed items without the usual reversal raised the alpha scores to between .68 and .82, which suggests that the young people may have been confused by the negative phrasing: unsure which end of the scale they really intended, they may have chosen the wrong one. On the advice of a respected psychology statistician at the university, I presented the findings in their non-reversed state, because the alpha scores were then acceptable, and I acknowledged this in the analysis methodology.
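The effect of reverse-scoring on Cronbach's alpha can be demonstrated with a minimal Python sketch. The data below are not the ASUEIT responses; they are invented to mimic the pattern I found, in which respondents appear to answer a negatively phrased item as if it were positively phrased:

```python
# Cronbach's alpha: alpha = k/(k-1) * (1 - sum(item variances)/variance(totals))
from statistics import variance

def cronbach_alpha(items):
    """items: a list of per-item response lists, all of equal length."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent totals
    item_vars = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_vars / variance(totals))

def reverse(item, top=5, bottom=1):
    """Reverse-score a 1-5 Likert item (5 -> 1, 4 -> 2, and so on)."""
    return [top + bottom - r for r in item]

# Invented data: two positively phrased items, and one negatively phrased
# item which respondents have answered as if it were positive.
pos1 = [5, 4, 2, 5, 1, 4]
pos2 = [5, 5, 2, 4, 1, 3]
neg  = [4, 5, 1, 5, 2, 4]

# Applying the standard reversal to the negative item wrecks consistency,
# because these respondents did not actually flip their answers:
print(cronbach_alpha([pos1, pos2, reverse(neg)]))  # low (negative here)

# Scoring the answers exactly as given yields an acceptable alpha:
print(cronbach_alpha([pos1, pos2, neg]))           # above .9
```

With real data a result like this does not tell you which scoring is ‘true’; it tells you that the responses to the negatively phrased items are inconsistent with the scale's intended design, which is itself an important finding about how the instrument behaves with the population.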
A specific note about the use of a self-report questionnaire to measure abilities: the model of EI I chose was based on a set of abilities that someone might be capable of (e.g. the ability to identify emotion in other people), known as ‘ability EI’. This was in an effort to reduce the personality factors in the measure, because I felt having too many factors would cloud the purity of the model (other models openly include personality traits and are known as ‘trait EI’ models). However, it is debatable whether it is possible to measure someone's ability to do something based on their opinion rather than on practical demonstration; therefore, it is entirely possible that while I set out to measure ability EI, I ended up measuring trait EI. It is important to remember that a Likert scale will reflect the views of the respondent, not necessarily their actual ability, which gives further guidance to the researcher regarding when this type of questionnaire is appropriate (and how it can be interpreted). It is also important not to become so attached to your hypotheses and models that you dismiss issues that arise and call into question the validity of what you set out to do!
Many of the difficulties I encountered could have been mitigated by insisting on a robust and detailed administration protocol to which my co-workers were required to adhere when using the questionnaire with young people. Had I offered a script which included an explanation of every negatively phrased question as it came up (e.g. ‘If you think ____, then you need to select this end of the scale, but if you think ____, then use the other end’), the young people would have been more able to answer confidently, knowing which part of the scale they really intended to select. Had I insisted that the questionnaire be completed over two sessions, questionnaire fatigue would have been less likely. (It is important to note that making changes to a validated research tool will compromise its validity, meaning that questions cannot be removed even if you feel that there are too many.) Of course, had I been able to administer all the questionnaires myself, their consistency would have been much easier to control.
However, none of these measures would have addressed the difficulties which young people at a concrete stage of thinking might have, especially with regard to dealing with shades of opinion or answering questions about concepts and ideas, which are intangible. This type of questionnaire may simply be unsuitable for certain types of young people (e.g. one returned questionnaire was almost completely blank, but included a cover note from the case worker explaining that the respondent has Asperger's syndrome and cannot understand emotions enough to be able to consider the questions), which needs to be considered at the start of a research project, when research tools are being selected. Chronological age appropriateness does not guarantee cognitive age appropriateness.
It is therefore vital to consider carefully the research participants prior to data gathering, and to identify any potential issues. Are there specific vulnerabilities which might compromise the reliability of a research tool with a certain group? Will there be difficulties with administration of the tool if multiple data gatherers are used? How can these difficulties be mitigated? Is there something else available which might be more appropriate than the initially identified option?
- List some of the advantages and disadvantages of using a Likert-scale questionnaire.
- When conducting research with vulnerable young people, how can the researcher make sure that informed consent obtained on a consent form means that the young person understood and actually agreed to take part? (Put yourself in the place of a young person who has been told they have to work with someone, and consider how easy it would be to tell that worker you did not want to do a certain exercise.)
- What might a vulnerable young person do to withdraw their consent implicitly from a process (i.e. without saying that they do not want to take part)? How might the researcher identify when that young person is withdrawing consent?
- Consider the following categories of research participants and research inquiries and discuss whether the use of a 5-point Likert-scale questionnaire would be appropriate:
- young people aged 16 and 17 years who attend a local youth group, looking into their opinion about youth provision in the area;
- primary school-age children, to see how much they enjoy school;
- young people aged 12 to 15 years who are in a youth custody institution, to find out about their experiences of bullying inside;
- a group of 15- and 16-year-olds, to find out how good they are at mathematics;
- university students, to find out about their experiences with their courses;
- a group of 8-year-olds, to find out how happy they are in their home environment.
- If some of the above options are felt to be inappropriate, discuss what measures, if any, could be put in place to mitigate the concerns.
- What would cause you to question the validity of a questionnaire?
- What would cause you to question the reliability of a questionnaire?