A Hard Test of Individual Heterogeneity in Response Scale Usage: Evidence From Qatar
A common approach to correcting for interpersonal differences in response category thresholds in surveys is the use of anchoring vignettes. Here we present results from the first applications of anchoring vignettes in Qatar and, to our knowledge, the Arab world. We extend previous findings both geographically and substantively to show that a range of social and demographic variables account for important variation in response scale use in the domains of economic well-being and political efficacy, and that this variation leads to substantively misleading conclusions when not appropriately modeled. Qatar’s exceptionally homogeneous citizenry presents a uniquely hard test of response scale heterogeneity, and our results suggest that potentially obfuscating differences in individual reporting styles are even more ubiquitous than previously known.
When surveys are used to measure complex questions and concepts, the issue of interpersonal incomparability must be addressed. Individuals understand concepts and questions differently: A Yemeni’s moderate economy may appear as destitution to a Kuwaiti; the political openness of a typical Latvian may seem illiberal to most Swedes. The question of how to account for such interpersonal heterogeneity in response scale usage, also called differential item functioning (DIF), continues to attract considerable scholarly attention (e.g., Aldrich & McKelvey, 1977; Alvarez & Nagler, 2004; Brady, 1985; King, Murray, Salomon, & Tandon, 2004; King & Wand, 2007; Stegmueller, 2011).
Since its introduction in political science more than a decade ago by King and colleagues (2004), the use of anchoring vignettes to correct for DIF in survey responses has spread to diverse areas of social, economic, and health research (e.g., Bratton, 2010; Chevalier & Fielding, 2011; Hopkins & King, 2010; Kapteyn, Smith, & van Soest, 2007; King & Wand, 2007; Kristensen & Johansson, 2008; Paccagnella, 2013; Rice, Robone, & Smith, 2011; Salomon, Tandon, & Murray, 2004; Wand, 2013). The approach measures and controls for individual differences in response scale use by first asking general self-assessment questions. These self-assessments are then supplemented with follow-up vignettes that portray relevant aspects of the lives of hypothetical individuals, which respondents rate according to the same scale. Because the vignettes are anchored to concrete cases, variability in assessment can be attributed directly to differences in the subjective scales used by respondents, offering both an individual-level measure of, and method of correction for, DIF. Given obvious cross-country disparities in social, economic, and political experiences, much of the resulting research agenda has used anchoring vignettes as a way to adjust for differences in understanding concepts across distinct national populations. Such cross-group comparison is expected to introduce DIF on account of often unspecified underlying differences in “culture,” frequently operationalized as a simple dummy variable. A common outcome is that, after accounting for variability in response category thresholds, anomalous or curious findings (for instance, higher self-ratings of political efficacy among citizens of a nondemocratic state than among those of a democracy; King et al., 2004) are shown to be spurious.
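To make the logic of the correction concrete, the following is a minimal sketch of the nonparametric recoding described by King et al. (2004), in which a self-assessment is re-expressed relative to the same respondent's vignette ratings. It is an illustration under stated assumptions rather than the estimation procedure used in this study: the function name `relative_rank`, the 1-5 rating scale, and all example ratings are hypothetical, and the vignettes are assumed to be listed from the lowest to the highest intended level of the concept.

```python
def relative_rank(self_rating, vignette_ratings):
    """Recode a self-assessment relative to one respondent's vignette ratings.

    vignette_ratings must be ordered from the vignette intended to depict the
    lowest level of the concept to the one depicting the highest. With J
    vignettes the recoded scale runs from 1 (below every vignette) to 2J + 1
    (above every vignette). The result is an interval (low, high): it
    collapses to a single value when the respondent's ratings are consistent
    and widens when vignettes are tied or ranked out of order.
    """
    ratings = list(vignette_ratings)
    lower = [float("-inf")] + ratings
    upper = ratings + [float("inf")]

    matches = []
    # Odd values: the self-rating falls strictly between adjacent vignettes.
    for j, (lo, hi) in enumerate(zip(lower, upper)):
        if lo < self_rating < hi:
            matches.append(2 * j + 1)
    # Even values: the self-rating equals the rating given to vignette j.
    for j, z in enumerate(ratings, start=1):
        if self_rating == z:
            matches.append(2 * j)
    return min(matches), max(matches)


# Hypothetical respondents who give the same raw self-rating (3 on a 1-5
# scale) but rate the same two vignettes very differently.
respondent_a = relative_rank(3, [1, 2])  # places self above both vignettes
respondent_b = relative_rank(3, [4, 5])  # places self below both vignettes
inconsistent = relative_rank(2, [3, 2])  # vignettes ranked out of order

print(respondent_a, respondent_b, inconsistent)
```

The two hypothetical respondents report identical raw scores yet land at opposite ends of the recoded scale, which is the kind of threshold difference the vignettes are designed to expose. Parametric alternatives instead model the thresholds as functions of respondent covariates, and the sketch above does not attempt that.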
In practice, then, the original theoretical concern over interpersonal incomparability bias in surveys has largely proceeded instead as investigation of intergroup incomparability, understating or ignoring the effects of response scale heterogeneity within culturally cohesive populations, that is, among individuals qua individuals. Yet more recent anchoring vignette applications, most notably in the area of health, have demonstrated that even basic demographic factors such as sex, education, and work status can affect how people use survey response scales, and that this individual-level heterogeneity can lead to misleading conclusions if not modeled appropriately (Angelini, Cavapozzi, & Paccagnella, 2012; Grol-Prokopczyk, 2014; Grol-Prokopczyk, Freese, & Hauser, 2011). There is also evidence, again from the field of health, that scale use may vary over time among individuals (Angelini, Cavapozzi, & Paccagnella, 2011). However, it remains unclear to what extent these findings apply to other, nonhealth domains of research, or outside the context of the United States and Western Europe where extant studies have been conducted. Here we extend the analysis of the individual sources of DIF through the first application of anchoring vignettes in the Persian Gulf emirate of Qatar (and, to our knowledge, the Arab Middle East), administered in three original and nationally representative surveys conducted during 2013–2015. Beyond geographical extension to a new world region, we also expand thematically on the list of topics studied with a view to understanding inter-individual differences in survey scale usage, examining for the first time the issue of self-assessed economic well-being in addition to feelings of political efficacy. We find that a range of social and demographic variables (age, sex, education, and social class) account for important variation in response scale use even within the cohesive cultural group represented by Qatari nationals.
Qatar’s exceptionally high (citizen) homogeneity thus presents a uniquely hard test of individual heterogeneity in response behavior, and our positive findings demonstrate that demographically linked scale use differences are even more ubiquitous than previously known.