Analyzing Differential Item Functioning With
Cognitive Diagnosis Models
tlroxbur@uncg.edu
UNC Greensboro
Matthew Burke
mjburke@uncg.edu
The purpose of this study is to investigate Differential Item Functioning (DIF)
in a test of English language proficiency using an approach recently developed
by Zhang (2006, dissertation) based on one particular form of a Cognitive
Diagnosis Model (CDM). Differential Item Functioning occurs when the
probability of a correct response to a test item is different for two groups,
despite the groups having the same level of ability. DIF is an indication of an
unfair test, in which something other than the ability to be measured by a test
is influencing the observed performance. Cognitive Diagnosis Models represent
examinee ability as a profile of dichotomous skills. The primary advantage of a
CDM approach over Classical Test Theory or Item Response Theory is that it can
provide diagnostic information to students and teachers concerning which of the
skills required by a test have been mastered. This allows for the
identification of which skills have not been mastered, thus allowing for
directed instruction to address those skills specifically. Additionally, CDMs
have been used as a new method (Zhang, 2006) to improve the identification of
unfair test items. The current study applies this new method to a large-scale,
standardized test of language proficiency, the Examination for the Certificate
of Proficiency in English (ECPE). Data from the 2003 ECPE will be analyzed to
determine whether or not DIF is present in any of the ECPE’s 30 items. Results
will be discussed in terms of test fairness.
Objectives:
The purpose of this study is to investigate Differential Item Functioning (DIF)
in a test of English language proficiency using an approach recently developed
by Zhang (2006, dissertation) based on one particular form of a Cognitive
Diagnosis Model (CDM). The CDM in question is the deterministic input, noisy
“and” gate (DINA) model developed by Haertel (1989), which postulates that the
likelihood of a correct response by an examinee depends upon mastery of all
skills required by a test item. Zhang (2006) showed that
the CDM approach to assessing DIF outperformed traditional methods. This study
expands upon the investigation performed by Zhang (2006) by extending it to
another dataset. Data from the 2003 Examination for the Certificate of
Proficiency in English (ECPE) will be analyzed to determine whether or not DIF
is present in any of the ECPE’s 30 items. The results of the CDM approach to
DIF will also be compared to those of traditional methods of assessing DIF, in
an attempt to further demonstrate the effectiveness and benefits of an approach
based on CDMs.
Theoretical Framework:
Cognitive Diagnosis Models (such as DINA) represent examinee ability as a
profile of dichotomous skills. The attribute mastery profile for each examinee
is a vector of ones and zeroes representing mastery/non-mastery of each of the
skills required by test items. Additionally, a test can be described by a
Q-Matrix (Tatsuoka, 1985), which is a “blueprint” indicating which particular
skills are required by each item on a test. The Q-Matrix is an attribute by
test item matrix of ones and zeroes, where a one indicates that an attribute is
required by an item and a zero indicates that it is not. The Q-Matrix, in
conjunction with an examinee’s abilities (as represented by the attribute
mastery profile), provides information concerning the individual’s chances of
answering items on the test correctly. The DINA model describes the
relationship between the Q-Matrix and the examinee attribute mastery profile in
an all-or-none fashion (Haertel, 1989). If the examinee possesses all of the
requisite attributes, they have a high probability of answering the item
correctly. Lacking any one of the requisite attributes reduces the probability
of a correct response to chance (or guessing) levels. The primary advantage of
a CDM approach over Classical Test Theory or Item Response Theory is that it
provides diagnostic information to students and teachers concerning which of
the skills required by a test have been mastered and which have not, allowing
instruction to be directed at the non-mastered skills specifically. In
addition, CDMs provide information as to the appropriateness of the Q-Matrix
specification (Henson & Templin, 2006).
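The all-or-none response rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the Q-Matrix below is hypothetical (it is not the ECPE Q-Matrix), and the slip and guess values are made-up numbers standing in for the "high probability" and "chance level" of the usual DINA parameterization.

```python
# Minimal DINA sketch. All numbers below are hypothetical illustrations.

# Hypothetical Q-Matrix, attribute by item: 3 attributes (rows), 4 items (columns).
Q = [
    [1, 0, 1, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]

# Assumed item parameters: slip = P(incorrect | all required skills mastered),
# guess = P(correct | at least one required skill missing).
SLIP = [0.10, 0.10, 0.20, 0.20]
GUESS = [0.20, 0.25, 0.20, 0.25]


def has_all_required(alpha, item):
    """True if the mastery profile alpha covers every attribute item requires."""
    return all(alpha[k] >= Q[k][item] for k in range(len(Q)))


def dina_prob(alpha, item):
    """Probability of a correct response to one item under the DINA rule."""
    if has_all_required(alpha, item):
        return 1.0 - SLIP[item]
    return GUESS[item]


def item_probs(alpha):
    """Probabilities of a correct response for every item on the test."""
    return [dina_prob(alpha, j) for j in range(len(Q[0]))]
```

For example, `item_probs((1, 0, 0))` gives a high probability only for the first item, because every other item requires at least one attribute the examinee has not mastered.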
Differential Item Functioning occurs when the probability of a correct response
to an item is different for two groups, despite the groups having the same
level of ability. DIF is an indication of an unfair test, in which something
other than the ability to be measured by a test is influencing the observed
performance. In other words, some of the latent space has not been specified
within the confines of the test (Ackerman & Evans, 1994). Sinharay (2004) and
Zhang (2006) offer a new definition of DIF in the context of a CDM: DIF occurs
when the probability of a correct response differs for two groups matched
according to all possible instantiations of the attribute mastery profile.
Many methods have been formulated to detect items that exhibit differential
functioning, including, but not limited to, the Mantel-Haenszel (MH) procedure
(see Holland & Thayer, 1988), a logistic regression (LR) approach (Swaminathan
& Rogers, 1990), the simultaneous item bias test (SIBTEST; Shealy & Stout,
1993), and the calculation of the area between two item characteristic curves
(Raju, 1988). Zhang (2006) has provided yet another way to assess the fairness
of test
items within the framework of a CDM.
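The CDM-based definition of DIF given above can be written compactly. The notation is an assumption introduced here for illustration and is not taken from Sinharay (2004) or Zhang (2006): X_j is the scored response to item j, alpha is an attribute mastery profile over K skills, and G is group membership (R for the reference group, F for the focal group).

```latex
\text{Item } j \text{ exhibits DIF if, for some } \boldsymbol{\alpha} \in \{0,1\}^{K},
\qquad
P\left(X_j = 1 \mid \boldsymbol{\alpha},\, G = R\right)
\neq
P\left(X_j = 1 \mid \boldsymbol{\alpha},\, G = F\right).
```

Under this reading, the matching variable is the full profile alpha rather than a single summary score.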
Methods:
The current study will use the method developed by Zhang (2006) to assess the
degree of DIF present in the 30 items of the ECPE. Two traditional methods, MH
and SIBTEST, will also be implemented to assess DIF, and their results will be
compared to those of the approach based on CDMs. MH and SIBTEST match groups on
total test scores, whereas Zhang’s (2006) approach matches the groups on all
observed attribute mastery profiles. The Q-Matrix for the ECPE is provided by
Henson (manuscript submitted for publication). Significance tests for MH will
be performed via a chi-square statistic, and for SIBTEST, two-tailed z tests
will be used.
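As a rough illustration of how the matching variable enters a traditional DIF statistic, the sketch below computes the continuity-corrected Mantel-Haenszel chi-square and the common odds ratio for a single item. The function name, the "ref" group label, and any data passed in are hypothetical. The strata argument can hold total test scores or attribute mastery profiles (as tuples), which is meant only to show the idea of profile matching. It is not Zhang's (2006) implementation, and estimating the profiles themselves is outside its scope.

```python
from collections import defaultdict


def mantel_haenszel(responses, groups, strata):
    """Continuity-corrected MH chi-square and common odds ratio for one item.

    responses: 0/1 scores on the studied item, one per examinee
    groups:    "ref" for reference-group examinees; any other label is focal
    strata:    hashable matching value per examinee (total score or profile)
    """
    # One 2x2 table per stratum: [ref correct, ref wrong, focal correct, focal wrong]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for x, g, s in zip(responses, groups, strata):
        row = 0 if g == "ref" else 2
        col = 0 if x == 1 else 1
        tables[s][row + col] += 1

    sum_a = sum_expected = sum_var = num = den = 0.0
    for a, b, c, d in tables.values():
        t = a + b + c + d
        if t < 2:
            continue  # the variance term is undefined for a single examinee
        n_ref, n_foc = a + b, c + d
        m_right, m_wrong = a + c, b + d
        sum_a += a
        sum_expected += n_ref * m_right / t
        sum_var += n_ref * n_foc * m_right * m_wrong / (t * t * (t - 1))
        num += a * d / t
        den += b * c / t

    chi2 = (abs(sum_a - sum_expected) - 0.5) ** 2 / sum_var if sum_var > 0 else float("nan")
    odds_ratio = num / den if den > 0 else float("inf")
    return chi2, odds_ratio
```

The resulting chi-square is referred to a chi-square distribution with one degree of freedom. Swapping the strata from total scores to profiles changes only which examinees are treated as comparable, not the arithmetic.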
The determination of the reference and focal groups will be based upon
examinees’ native language. Examinees will be assigned to groups according to
whether the native language of their indicated country of origin is a Romance
language. This will be done because Romance languages share more commonalities
with the English language than non-Romance languages do. If DIF is present for
an item, then this may be an indication that the item is sensitive to the
structure shared by Romance languages rather than to English in particular.
Data Source:
These data are from a 2003 administration of the Examination for the
Certificate of Proficiency in English (ECPE). The ECPE is an advanced test of
English language ability developed by the English Language Institute at the
University of Michigan. This sample includes 2,922 examinees from countries in
Europe, Africa, and Asia.
Results/Educational Contributions:
Results are not yet available as statistical analyses are currently underway.
The results will be available for discussion by the time of the meeting.
The contributions of this research to the field of educational measurement are:
increased knowledge of the functioning of the ECPE as a language assessment
device, a further understanding of DIF and some of its traditional
formulations, new insight into alternative methods of assessing DIF, further
understanding of the applications and usefulness of CDMs, and an additional
practical application of Cognitive Diagnosis Models to the field of language
testing. CDMs are relatively new in the history of standardized testing, and
much work remains to characterize how they function. This research aims to
provide some additional insights into a burgeoning field of useful models.