QUALITATIVE RESEARCH: AN ESSENTIAL PART OF
STATISTICAL COGNITION RESEARCH
PAV KALINOWSKI
Statistical Cognition Laboratory, School of Psychological Science, La Trobe University
p.kalinowski@latrobe.edu.au
JERRY LAI
Statistical Cognition Laboratory, School of Psychological Science, La Trobe University
kj2lai@students.latrobe.edu.au
FIONA FIDLER
Statistical Cognition Laboratory, School of Psychological Science, La Trobe University
f.fidler@latrobe.edu.au
GEOFF CUMMING
Statistical Cognition Laboratory, School of Psychological Science, La Trobe University
g.cumming@latrobe.edu.au
ABSTRACT
Our research in statistical cognition uses both qualitative and quantitative methods.
A mixed method approach makes our research more comprehensive, and provides us
with new directions, unexpected insights, and alternative explanations for previously
established concepts. In this paper, we review four statistical cognition studies that
used mixed methods and explain the contributions of both the quantitative and
qualitative components. The four studies concern statistical reporting
practices in medical journals, an intervention aimed at improving psychologists’
interpretations of statistical tests, the extent to which interpretations improve when
results are presented with confidence intervals (CIs) rather than p-values, and
graduate students’ misconceptions about CIs. Finally, we discuss the concept of
scientific rigour and outline guidelines for maintaining rigour that should apply
equally to qualitative and quantitative research.
Keywords: Statistics education research; Mixed methods; Scientific rigour;
Qualitative analysis
1. MIXED METHODS IN STATISTICAL COGNITION
Statistical cognition refers to “the cognitive processes, representations, and activities
involved in acquiring and using statistical knowledge,” as well as the research program
that investigates these processes (Beyth-Marom, Fidler, & Cumming, 2008, p. 22). In this
way statistical cognition is similar to the discipline of cognition, which refers to both
mental processes and the body of research investigating these processes. In this paper we
describe how both quantitative and qualitative methods are used together in our statistical
cognition research program.
Statistics Education Research Journal, 9(2), 22-34, http://www.stat.auckland.ac.nz/serj
International Association for Statistical Education (IASE/ISI), November, 2010
Regardless of whether research is quantitative or qualitative, we believe that
researchers should describe the context of their work and their preconceptions and
assumptions. For this reason, we begin this paper by stating that we are advocates of
statistical reform in psychology: We believe that the dichotomous thinking
associated with Null Hypothesis Significance Testing (NHST) has damaged the progress
of psychology and that estimation-based techniques, namely effect sizes and confidence
intervals (CIs), are better tools for statistical communication. However, we also believe
that statistical reform should be evidence-based. As such, we believe that advocates of
reform should provide empirical evidence that the alternatives to NHST that they promote
are better communicators of inferential information and less prone to misinterpretation
and misuse. Our statistical cognition program has produced some evidence in favour of
CIs (e.g., Fidler & Loftus, 2009), but the four studies recounted here show that collecting
such evidence is by no means straightforward!
Qualitative research is essential in fulfilling the goals of the statistical cognition
program in at least two ways. First, it helps achieve fuller and more complete descriptions
of phenomena. We illustrate this in the first two of our four studies: Fidler, Thomason,
Cumming, Finch, and Leeman (2004) used a mixed approach to examine the effect of
error bars in result interpretation in medical journals. Faulkner (2005) used interviews to
explore students’ preference and efficiency in interpreting CIs and NHST.
Secondly, qualitative methods may be very useful in suggesting new directions for
research. Our exploratory studies, open-ended questions, and interviews have yielded
unexpected and novel insights and have led to new research programs. Again, two studies
are offered as examples: Coulson, Healy, Fidler, and Cumming (2010) produced
unexpected results when comparing researchers’ interpretations of NHST and CIs, which
led to a new research program. Kalinowski (2010) explored student misconceptions of
CIs using both qualitative and quantitative methods.
Of course, qualitative methods have more to offer than just these two features (more
complete description and new directions). In our account of the four studies that follows
we will also illustrate how qualitative methods have helped correct our misinterpretations
of quantitative results, and in other cases provided triangulation. Statistical reasoning is
often fragile, and quantitative methods can fail to capture subtleties and layered
misconceptions. For example, a quantitative survey may provide an indication of how
many students have a false belief about some statistical concept, but not necessarily how
they arrived at that false belief, or which other statistical concepts might be implicated.
Qualitative methods can help us access processes and the mental models at work in the
formation of misconceptions.
Finally, we will address the issue of robustness in qualitative research. Qualitative
methods are often wrongly associated with terms such as subjective or biased. In reality,
research judgment is an integral and important part of both quantitative and qualitative
methods. In the final section of this paper we will explicate established guidelines
(namely those of Elliott, Fisher, & Rennie, 1999) for maintaining rigour in qualitative
research and argue that the same standards should also be expected of quantitative
research.
2. ACHIEVING MORE COMPLETE DESCRIPTIONS OF PHENOMENA:
FIDLER ET AL. (2004)
As mentioned above, one major goal of statistical reform in psychology is the
replacement of NHST p-values with CIs. A common way to examine reform progress is
via journal surveys on the prevalence of reporting practices (e.g., Cumming et al., 2007;
Thompson & Snyder, 1997). Such surveys provide quantitative estimates of the extent or
lack of change in statistical practice.
In psychology, such journal surveys have consistently demonstrated little change in
response to reformers’ calls for downplaying NHST. In medicine, by contrast, changes
have been reasonably dramatic, starting in the mid-1980s when several journal editors
enforced new reporting policies. Fidler et al. (2004) investigated changes in medicine by
surveying statistical practices in two medical journals, the American Journal of Public
Health (AJPH) and Epidemiology. Both journals were subject to strict editorial policies
from then-editor Kenneth Rothman that eschewed p-values and encouraged use of CIs.
Quantitative. The quantitative component of this study recorded the proportion of
articles reporting p-values versus CIs. Results revealed a dramatic increase in the uptake
of CIs under Rothman’s editorship—from 10% pre-Rothman (1983) to 60% at the peak
of his influence (1987). There was a corresponding drop in p-value reporting: from 63%
in 1982 to just 6% in 1986–1989. In Epidemiology, the influence of Rothman’s policy
was even more striking: 94% of articles reported CIs in 2000 and none reported p-values.
From the quantitative survey alone it seemed that statistical reform in medicine had been
quite successful.
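A survey proportion of this kind can itself be reported with an interval estimate. The following is a minimal sketch, assuming a Wilson score interval is an acceptable choice; the function name `wilson_interval` and the article counts are hypothetical and are not taken from Fidler et al. (2004), who report percentages only.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (about 95% when z = 1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts for illustration only (NOT from the study):
# 60 of 100 surveyed articles report a CI.
low, high = wilson_interval(60, 100)
print(f"proportion = 0.60, approx. 95% CI [{low:.3f}, {high:.3f}]")
```

The width of the resulting interval depends on the number of articles surveyed, which is why the counts, and not only the percentages, matter when comparing years.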
Qualitative. The qualitative component examined the interpretation of results, in
particular, how the increase in CI reports changed the way authors discussed their results.
Did they now reflect on the width of the CI and talk about issues of statistical
power/precision (we know they didn’t with p-values!)?
Conclusion. Results from the qualitative analysis revealed that, despite the frequent
reporting of CIs, instances of CI interpretation were rare. Of the articles reporting CIs,
the vast majority still made their interpretations in NHST terms: They continued to make
references to the null hypothesis and to discuss results in terms of significant and/or
non-significant. In many ways, the discussion sections of these papers were identical to those
in p-value papers. In other words, CIs had been reported (added to tables, text, and
occasionally figures) to clear editorial hurdles, but they had made little impact on how
researchers thought about and interpreted their results. The discrepancy between the
proportion of reporting (the quantitative component of the study) and instances of
interpretation (the qualitative component of the study) revealed that the seemingly
successful statistical reform in medicine was in fact relatively superficial.
In this study the use of mixed methods revealed a more complete picture: Medical
researchers conformed to the new reporting policy and included CIs in their papers, but
there had been no substantial cognitive change from dichotomous NHST thinking to CI
estimation-based thinking. Fidler et al. (2004) concluded that “editors can lead
researchers to confidence intervals, but can’t make them think” (p. 119).
3. ACCESS TO PROCESSES AND REASONING: FAULKNER (2005)
Qualitative methods help describe complex mental processes and reasoning that are
difficult to examine with quantitative methods alone. Faulkner (2005) provides an
example. Faulkner aimed to improve probationary psychologists’ interpretation of the
outcomes of randomized controlled trials (RCTs). The study was again motivated by the
argument that CIs are easier to understand than NHST, and can elicit more
comprehensive and adequate interpretations (e.g., Schmidt, 1996; Schmidt & Hunter,
1997). Thirty-five probationary psychologists took part in a teaching intervention, which
consisted of one-to-one tutorials on how to interpret various RCT outcomes. In some
RCT scenarios results were presented as NHST p-values and in others exactly the same
results were presented as CIs. Immediately after the intervention, the participants
completed two tasks. First, the participants rated their preference for each of the two
presentation styles on Likert scales (quantitative). Second, they wrote short
interpretations of results of some new RCT scenarios in their own words (qualitative).
Quantitative. Students rated their preference for NHST or CI presentation on a 7-
point Likert scale (e.g., 1=strongly prefer CI format, 4=indifferent, 7=strongly prefer
NHST format). Overall, 75% of participants expressed a preference (i.e., strongly,
somewhat, or slightly preferred) for the CI format. Only a minority of participants
(25%) had any level of preference for the NHST format.
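The collapsing of 7-point ratings into preference categories described above can be sketched as follows. This is an illustration only: the function names and the ratings are hypothetical, and the cut-points (1 to 3 as preferring CIs, 4 as indifferent, 5 to 7 as preferring NHST) are assumed from the scale anchors given in the text.

```python
from collections import Counter

CATEGORIES = ("prefers CI", "indifferent", "prefers NHST")


def preference_category(rating: int) -> str:
    """Map a 7-point rating (1 = strongly prefer CI, 7 = strongly prefer NHST) to a category."""
    if not 1 <= rating <= 7:
        raise ValueError(f"rating must be between 1 and 7, got {rating}")
    if rating < 4:
        return "prefers CI"
    if rating > 4:
        return "prefers NHST"
    return "indifferent"


def preference_shares(ratings: list[int]) -> dict[str, float]:
    """Return the share of ratings that fall in each preference category."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    counts = Counter(preference_category(r) for r in ratings)
    return {category: counts[category] / len(ratings) for category in CATEGORIES}


# Hypothetical ratings for illustration only (NOT Faulkner's data).
ratings = [1, 2, 2, 3, 3, 1, 5, 6]
print(preference_shares(ratings))
```

Collapsing the scale in this way discards the strength of preference, so the category shares are best read alongside the full distribution of ratings.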
Qualitative. In their own words, students wrote short interpretations of RCT results
presented as either CIs or NHST p-values. We coded and analysed their texts. In
our analysis of qualitative data, we considered the comprehensiveness, structure, and
quality of their descriptions.
For comprehensiveness, we looked at the number of descriptions containing the
following five components: (1) the direction of effect, (2) effect size, (3) clinical
significance, (4) difference between groups/statistical significance, and (5)
power/precision (interval width). To analyse structure we looked at how similar the
students’ responses were to one another. Was there a routine answer, or a lot of variation in their
responses? Finally, for quality we examined whether qualifying and linking statements
were used to make conceptual connections between the five components in the
comprehensiveness list above.
For both NHST and CI presentations of results, students’ descriptions were
surprisingly comprehensive, with above 90% of students mentioning components (1) to
(4). The only substantial difference between the presentation formats was in how often
students mentioned (5) power/precision. When results were presented as NHST, only
70% of students made mention of power/precision; when results were presented as CIs,
97% of students did.
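Once each written description has been coded for the five components, the mention rates per presentation format reduce to a simple tally. The sketch below assumes each description is stored as a pair of format label and set of coded components; the component labels, the function name `mention_rates`, and the coded data are all hypothetical and are not Faulkner's coding scheme or results.

```python
from collections import defaultdict

# Hypothetical short labels for the five components listed in the text.
COMPONENTS = (
    "direction",
    "effect_size",
    "clinical_significance",
    "statistical_significance",
    "precision",
)


def mention_rates(coded: list[tuple[str, set[str]]]) -> dict[str, dict[str, float]]:
    """Share of descriptions mentioning each component, per presentation format."""
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for fmt, mentioned in coded:
        unknown = mentioned - set(COMPONENTS)
        if unknown:
            raise ValueError(f"unknown component codes: {sorted(unknown)}")
        totals[fmt] += 1
        for component in mentioned:
            hits[fmt][component] += 1
    return {
        fmt: {c: hits[fmt][c] / totals[fmt] for c in COMPONENTS}
        for fmt in totals
    }


# Hypothetical coded descriptions for illustration only.
first_four = set(COMPONENTS[:4])
all_five = set(COMPONENTS)
coded = [
    ("NHST", first_four),
    ("NHST", all_five),
    ("CI", all_five),
    ("CI", all_five),
]
rates = mention_rates(coded)
print(rates["NHST"]["precision"], rates["CI"]["precision"])
```

A tally of this kind summarises only what was mentioned; the structure and quality analyses still require reading the descriptions themselves.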
The analysis of structure revealed that participants generally resorted to a rigid
interpretational routine when presented with NHST. CI descriptions in comparison were
more varied in both content and order. Table 1 provides some typical examples of
interpretations of the two formats.
As mentioned, when assessing quality we looked for qualifying and linking
statements that reflected conceptual connections between the components listed above. In
other words, we searched students’ answers for any extra elements within the NHST and
CI descriptions that were not part of the tutorial instructions. Qualifying statements
included statements such as “a large effect size is good” or “clinical significance of 50%
is encouraging.” Examples of linking statements included “effect size is large leading to a
clinically significant effect” and “non-statistically significant results were due to low
power.” Examples of overall conclusions included “therapy has a good effect overall”
and “I would use Therapy A because it appeared to have a greater effect.” On average,
these extra elements were found in 90% of descriptions of CI results, compared to only
15% of descriptions of NHST results. In sum, the qualitative analysis in Faulkner’s
(2005) study supported the argument that CIs can elicit better, more insightful
interpretations.