Analyzing Differential Item Functioning With Cognitive Diagnosis Models

Tiese Roxbury
tlroxbur@uncg.edu
UNC Greensboro
Matthew Burke
mjburke@uncg.edu

The purpose of this study is to investigate Differential Item Functioning (DIF) in a test of English language proficiency using an approach recently developed by Zhang (2006, dissertation) based on one particular form of a Cognitive Diagnosis Model (CDM). Differential Item Functioning occurs when the probability of a correct response to a test item is different for two groups, despite the groups having the same level of ability. DIF is an indication of an unfair test, in which something other than the ability to be measured by a test is influencing the observed performance. Cognitive Diagnosis Models represent examinee ability as a profile of dichotomous skills. The primary advantage of a CDM approach over Classical Test Theory or Item Response Theory is that it can provide diagnostic information to students and teachers concerning which of the skills required by a test have been mastered. This allows for the identification of which skills have not been mastered, thus allowing for directed instruction to address those skills specifically. Additionally, CDMs have been used as a new method (Zhang, 2006) to improve the identification of unfair test items. The current study applies this new method to a large-scale, standardized test of language proficiency, the Examination for the Certificate of Proficiency in English (ECPE). Data from the 2003 ECPE will be analyzed to determine whether or not DIF is present in any of the ECPE’s 30 items. Results will be discussed in terms of test fairness.

Objectives:

The purpose of this study is to investigate Differential Item Functioning (DIF) in a test of English language proficiency using an approach recently developed by Zhang (2006, dissertation) based on one particular form of a Cognitive Diagnosis Model (CDM). The CDM in question is the deterministic input, noisy “and” gate (DINA) model developed by Haertel (1989), which postulates that the likelihood of a correct response by an examinee depends upon mastery of all skills required by a test item. Zhang (2006) showed that the CDM approach to assessing DIF outperformed traditional methods. This study extends the investigation performed by Zhang (2006) to another dataset. Data from the 2003 Examination for the Certificate of Proficiency in English (ECPE) will be analyzed to determine whether or not DIF is present in any of the ECPE’s 30 items. The results of the CDM approach to DIF will also be compared to those of traditional methods of assessing DIF, in an attempt to further demonstrate the effectiveness and benefits of an approach based on CDMs.

Theoretical Framework:
Cognitive Diagnosis Models (such as DINA) represent examinee ability as a profile of dichotomous skills. The attribute mastery profile for each examinee is a vector of ones and zeroes representing mastery or non-mastery of each of the skills required by the test items. Additionally, a test can be described by a Q-Matrix (Tatsuoka, 1985), a “blueprint” indicating which particular skills are required by each item on a test. The Q-Matrix is an attribute-by-item matrix of ones and zeroes, where a one indicates that an attribute is required by an item and a zero indicates that it is not. The Q-Matrix, in conjunction with an examinee’s abilities (as represented by the attribute mastery profile), provides information concerning the individual’s chances of answering items on the test correctly. The DINA model describes the relationship between the Q-Matrix and the examinee attribute mastery profile in an all-or-none fashion (Haertel, 1989). If the examinee possesses all of the requisite attributes, they have a high probability of answering the item correctly. Lacking any one of the requisite attributes reduces the probability of a correct response to chance (or guessing) levels. The primary advantage of a CDM approach over Classical Test Theory or Item Response Theory is that it provides diagnostic information to students and teachers concerning which of the skills required by a test have been mastered. This identifies the skills that have not been mastered and so allows for directed instruction to address those skills specifically. In addition, CDMs provide information as to the appropriateness of the Q-Matrix specification (Henson & Templin, 2006).
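The all-or-none response rule described above can be illustrated with a minimal Python sketch. It assumes the usual DINA parameterization, in which each item has a "slip" probability (a master of all required attributes answers incorrectly) and a "guess" probability (a non-master answers correctly). The function name `dina_prob`, the parameter names, and the toy Q-Matrix values are hypothetical and are not taken from the ECPE or from Zhang (2006).

```python
import numpy as np

def dina_prob(alpha, q, slip, guess):
    """Probability of a correct response to each item under a DINA-style rule.

    alpha : length-K vector of 0/1 attribute mastery indicators
    q     : K x J attribute-by-item Q-Matrix of 0/1 entries
    slip  : length-J vector, P(incorrect | all required attributes mastered)
    guess : length-J vector, P(correct | at least one required attribute missing)
    """
    alpha = np.asarray(alpha)
    q = np.asarray(q)
    # eta[j] is True only when every attribute required by item j is mastered
    eta = np.all(alpha[:, None] >= q, axis=0)
    return np.where(eta,
                    1.0 - np.asarray(slip, dtype=float),
                    np.asarray(guess, dtype=float))

# Hypothetical toy test: 3 attributes (rows), 2 items (columns).
# Item 0 requires attributes 0 and 1; item 1 requires attribute 1 only.
Q = np.array([[1, 0],
              [1, 1],
              [0, 0]])
slip = [0.1, 0.2]     # assumed values for illustration
guess = [0.2, 0.25]   # assumed values for illustration

print(dina_prob([1, 1, 0], Q, slip, guess))  # has mastered everything both items require
print(dina_prob([0, 1, 0], Q, slip, guess))  # lacks attribute 0, so item 0 falls to guessing
```

In this toy example, an examinee who lacks even one required attribute is treated the same as one who lacks all of them, which is the "and" gate behavior the DINA model postulates.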
Differential Item Functioning occurs when the probability of a correct response to an item is different for two groups, despite the groups having the same level of ability. DIF is an indication of an unfair test, in which something other than the ability to be measured by a test is influencing the observed performance. In other words, some of the latent space has not been specified within the confines of the test (Ackerman & Evans, 1994). Sinharay (2004) and Zhang (2006) offer a new definition for DIF in the context of a CDM: DIF occurs when the probability of a correct response differs for two groups matched according to all possible instantiations of the attribute mastery profile. Many methods have been formulated to detect items that exhibit differential functioning, including, but not limited to, the Mantel-Haenszel (MH) procedure (see Holland & Thayer, 1988), a logistic regression (LR) approach (Swaminathan & Rogers, 1990), the simultaneous item bias test (SIBTEST; Shealy & Stout, 1993), and the area between two item characteristic curves (Raju, 1988). Zhang (2006) has provided yet another way to assess the fairness of test items within the framework of a CDM.
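The CDM-based definition of DIF given above can be written compactly. The notation here is assumed for illustration and is not taken from Sinharay (2004) or Zhang (2006): let $X_j$ be the scored response to item $j$, $\boldsymbol{\alpha}$ an attribute mastery profile, and $G \in \{R, F\}$ the reference or focal group. Item $j$ exhibits DIF if, for at least one profile $\boldsymbol{\alpha}$,

```latex
P(X_j = 1 \mid \boldsymbol{\alpha},\, G = R) \;\neq\; P(X_j = 1 \mid \boldsymbol{\alpha},\, G = F)
```

If the DINA model is assumed to hold in both groups with the same Q-Matrix, the response probability depends on $\boldsymbol{\alpha}$ only through whether all attributes required by item $j$ have been mastered. Under that assumption, the condition reduces to a group difference in either the success probability of examinees who have mastered all required attributes or the guessing-level probability of those who have not.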

Methods:
The current study will use the method developed by Zhang (2006) to assess the degree of DIF present in the 30 items of the ECPE. Two traditional methods (MH and SIBTEST) will also be implemented to assess DIF, and their results will be compared to those of the approach based on CDMs. MH and SIBTEST compare groups matched on total test scores, whereas Zhang’s (2006) approach matches the groups on all observed attribute mastery profiles. The Q-Matrix for the ECPE is provided by Henson (manuscript submitted for publication). Significance tests for MH will be performed via a chi-square statistic, and for SIBTEST, two-tailed z tests will be used.
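For readers unfamiliar with the MH procedure, the following Python sketch shows one common formulation: examinees are stratified on a matching variable (here, total test score), a 2 x 2 group-by-response table is formed in each stratum, and the tables are pooled into a common odds ratio and a continuity-corrected chi-square statistic with one degree of freedom. This is a minimal illustration under stated assumptions, not the implementation used in this study. The function name `mantel_haenszel`, its arguments, and the coding of the reference group as 0 and the focal group as 1 are hypothetical.

```python
import numpy as np

def mantel_haenszel(correct, group, matching):
    """Mantel-Haenszel common odds ratio and chi-square for one studied item.

    correct  : 0/1 scored responses to the studied item
    group    : 0 = reference group, 1 = focal group (assumed coding)
    matching : matching variable, e.g. total test score
    """
    correct = np.asarray(correct)
    group = np.asarray(group)
    matching = np.asarray(matching)

    num = den = 0.0                # pieces of the common odds ratio
    a_sum = e_sum = v_sum = 0.0    # observed, expected, variance of reference-correct count

    for s in np.unique(matching):
        m = matching == s
        a = int(np.sum((group[m] == 0) & (correct[m] == 1)))  # reference, correct
        b = int(np.sum((group[m] == 0) & (correct[m] == 0)))  # reference, incorrect
        c = int(np.sum((group[m] == 1) & (correct[m] == 1)))  # focal, correct
        d = int(np.sum((group[m] == 1) & (correct[m] == 0)))  # focal, incorrect
        n = a + b + c + d
        # Strata with only one group (or a single examinee) carry no information
        if n < 2 or (a + b) == 0 or (c + d) == 0:
            continue
        num += a * d / n
        den += b * c / n
        a_sum += a
        e_sum += (a + b) * (a + c) / n
        v_sum += (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))

    odds_ratio = num / den if den > 0 else float("inf")
    chi_square = ((abs(a_sum - e_sum) - 0.5) ** 2 / v_sum) if v_sum > 0 else float("nan")
    return odds_ratio, chi_square
```

The resulting chi-square value would be compared against the critical value for one degree of freedom (about 3.84 at the .05 level). An odds ratio near 1 suggests that matched reference and focal examinees have similar odds of answering the item correctly.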
The reference and focal groups will be determined by examinees’ native language. Groups will be split according to whether the native language of their indicated country of origin is a Romance language. This split is used because Romance languages share more commonalities with English than non-Romance languages do. If DIF is present for an item, this may indicate that the item is sensitive to structures shared across the Romance languages rather than to English in particular.

Data Source:
These data are from a 2003 administration of the Examination for the Certificate of Proficiency in English (ECPE). The ECPE is an advanced test of English language ability developed by the English Language Institute at the University of Michigan. This sample includes 2,922 examinees from countries in Europe, Africa, and Asia.

Results/Educational Contributions:
Results are not yet available as statistical analyses are currently underway. The results will be available for discussion by the time of the meeting.

The contributions of this research to the field of educational measurement are: increased knowledge of the functioning of the ECPE as a language assessment device, a further understanding of DIF and some of its traditional formulations, new insight into alternative methods of assessing DIF, further understanding of the applications and usefulness of CDMs, and an additional practical application of Cognitive Diagnosis Models to the field of language testing. In the history of standardized testing, CDMs are relatively new on the scene, and much work remains to be done to demonstrate the characteristics of their functioning. This research is intended to provide some additional insights into a burgeoning field of useful models.