Effectiveness of Mantel-Haenszel and logistic regression statistics in detecting differential item functioning under different conditions
Abstract/ Overview
Differential Item Functioning (DIF) occurs when individuals of the same ability level, but from two different groups, have different probabilities of answering a test item correctly. The groups may be defined by gender, race or disability. DIF can be detected by methods such as the Mantel-Haenszel (MH) and Logistic Regression (LR) statistics, which classify DIF items as negligible, moderate or large. Conditions such as sample size, ability distribution and test length may have a significant effect on DIF detection. A conceptual measurement model indicated that person achievement is made observable through a set of items that vary in their locations on the latent variable. The purpose of this study was to determine the effect of different conditions on the detection of DIF using the MH and LR statistics. The objectives of the study were to determine the effect of different conditions on the effect size and the number of DIF detections, and to compare the effect of different conditions on the number of DIF detections using the MH and LR statistics. A factorial research design was used. The independent variables were sample size, ability distribution and test length. The dependent variables were the effect sizes and the number of DIF detections. The population of the study was 2000. A stratified random sampling technique was used, with the reference and focal groups as the stratifying criteria and sample sizes of 20, 60 and 1000, based on typical examinee numbers in a classroom or a school. WinGen3 software was used to generate dichotomous data with 1000 replications to reduce sampling variance. Two ability distribution conditions were established, with tests of 10, 30 and 50 items selected according to the number of items often observed on personality inventories and achievement tests. A pilot study was conducted. Face validity was established by experts, and a reliability coefficient of 0.75 was obtained using the Kuder-Richardson method.
ANOVA was used for analysis at the .05 level of significance, and line graphs aided interpretation. The findings showed that sample size had a significant effect on the effect size for Type B items using the MH statistic and for Type A items using the LR statistic. Ability distribution had a significant effect on Type C items using the MH statistic but no effect using the LR statistic. Test length had no effect on any DIF type. Ability distribution contributed to the number of DIF items of all types detected using both statistics. The MH statistic detected more Type C items than the LR statistic, while the LR statistic detected more Type A and B items than the MH statistic. It was therefore concluded that the effect of sample size and ability distribution depended on the DIF statistic used, that test length had no significant effect on effect size under either statistic, and that the number of DIF items detected depended on the ability distribution. MH detected more Type C items than Type A and B items, while LR detected more Type A and B items than Type C items. It was recommended that test developers use the MH statistic to detect Type C items and the LR statistic for Type A and B items. The findings may be used by test developers to decide which items to include in or omit from a test, so that the test presented to examinees is free of bias.
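To make the MH classification concrete, the following is a minimal sketch, not the study's own procedure. It assumes that the Type A, B and C labels follow the common ETS delta-scale convention (negligible, moderate and large DIF), and it classifies on magnitude only, whereas the full ETS rules also require significance tests. The function names and the example counts are hypothetical.

```python
import math


def mh_common_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio across matched score levels.

    Each stratum is a tuple (a, b, c, d) of counts at one total-score level:
    a = reference correct, b = reference incorrect,
    c = focal correct,     d = focal incorrect.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty score levels
        num += a * d / n
        den += b * c / n
    if num == 0 or den == 0:
        raise ValueError("odds ratio is undefined for these counts")
    return num / den


def mh_delta(alpha):
    """Convert the common odds ratio to the ETS delta scale (MH D-DIF)."""
    return -2.35 * math.log(alpha)


def classify_by_magnitude(delta):
    """Simplified A/B/C label using |delta| thresholds only (assumption:
    the significance conditions of the full ETS rules are omitted)."""
    size = abs(delta)
    if size < 1.0:
        return "A"  # negligible
    if size < 1.5:
        return "B"  # moderate
    return "C"      # large


# Hypothetical counts for one item at a single score level.
alpha = mh_common_odds_ratio([(30, 10, 10, 30)])
label = classify_by_magnitude(mh_delta(alpha))
```

With very small groups, such as the sample size of 20 used in the study, many score levels contain few or no examinees, which is one reason the estimate can be unstable under that condition.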
Collections
- School of Education [69]