Validity evidence for the AP® Biology examination: An HLM study poster

John S. Klaric jsklaric@uncg.edu University of North Carolina - Greensboro
W. David Scales wdscales@uncg.edu

The Advanced Placement (AP®) program, sponsored by the College Board, provides a gateway for introducing high school students to collegiate coursework (College Board, 2005). Successful participation and high achievement in AP® programs in science and mathematics are related to higher grades in these subjects than those earned by students who did not participate in AP® programs (Morgan & Ramist, 1998). Previous research has indicated that academic success can be measured by college grades in the second course of an introductory series and by the probability of ultimate college graduation (Morgan & Klaric, 2008). These authors found that participation in AP® programs is related to increases in the predicted probability of college graduation, even when controlling for student ability level as measured by SAT scores. The present work focuses specifically on the AP® Biology Examination. Because students were nested within academic institutions, hierarchical linear modeling (HLM) techniques were used to provide additional validity evidence for the prediction of student academic success from performance on this examination. SAT scores and gender were used as covariates in Level 1 (student-level) modeling. Previous research had suggested a relationship between academic achievement and the size of academic institutions (Bonesronning, 1996); therefore, the role of Biology department size in predicting such success was investigated as a Level 2 (school-level) variable. Complete results from this study will be presented during the poster session.

INTRODUCTION

The Advanced Placement (AP®) program, sponsored by the College Board, provides a method for introducing high school students to collegiate coursework (College Board, 2005). Successful participation and high achievement in AP® programs in science and mathematics are related to higher grades in these subjects than those earned by students who did not participate in AP® programs (Morgan & Ramist, 1998). Achievement in AP® programs is assessed with both multiple-choice and free-response items (Bridgeman & Morgan, 1994). Previous research has indicated that academic success can be measured by college grades in the second course of an introductory series and by the probability of ultimate college graduation (Morgan & Klaric, 2007). The predictive capability of AP examination performance was maintained when SAT scores and gender were used as model covariates (Morgan & Klaric, 2007).

Traditional parametric analyses of outcome variables have focused on identifying possible variance sources after exposure to categorical experimental conditions. Possible bias introduced into these "level 1" variance sources by differential membership in superordinate organizational groups (level 2 units, such as academic institutions) has generally been ignored or accounted for with complex weighting procedures. Univariate statistical analyses have since been designed that simultaneously model variance sources at both level 1 and level 2. Because these multilevel, or hierarchical, analyses have until recently assumed that such variance sources are additive in their effects, the approach has been termed hierarchical linear modeling (HLM; Raudenbush & Bryk, 2002).

This research attempts to provide additional validity evidence for the prediction of student academic success from performance on the AP Biology examination. SAT scores and gender were covariates. Because students were nested within academic institutions, data were analyzed with HLM techniques. Previous research had suggested a relationship between academic achievement and the size of academic institutions (Bonesronning, 1996); therefore, the role of Biology department size in predicting such success was investigated as a level 2 variable.

METHODS

Source Data

The College Board provided scholastic information on over 70,000 students enrolled in 1994 nested within 27 academic institutions.  The institutions and students were carefully selected.

Research Questions

The following research questions are proposed for this investigation: 
1. Is there significant variation among schools sampled in the prediction of performance during the second of a two-course introductory Biology sequence?
2. If there is significant variation among schools in this prediction, what is the magnitude of that unconditional variation?
3. Do Level 2 variables (Biology Department size, institution size) account for a portion of that unconditional variation?
4. With school level variables held constant, do AP Biology examination scores predict performance during the second course of a Biology introductory sequence?  If these are predictive when used alone, is the predictive power maintained when SAT examination scores and gender are accounted for?
5. Again with school level variables held constant, does performance during the first course of a Biology introductory sequence also predict second course performance?  Is the predictive power maintained when SAT examination scores and gender are accounted for? How does this prediction compare to AP examination scores?
6. When the variation in Biology Department size is accounted for at the school level, do the coefficient estimates or their standard errors at level 1 change substantially from those seen in the analyses for the previous questions?

Questions 1 and 2 can be partially answered in this proposal. The remaining questions will be addressed in the final paper.

Preparation of Data Sets: 
Step 1.  Datasets for each institution were built from the raw data provided; there was one record for each student.  These AP datasets contained information on each AP examination taken, a collegiate course history, and demographic information.  A second dataset for each institution was created that made the college transcript information more readily available.  In this course dataset, there were multiple records for each student with one record for each course on the transcript.  Each course record contained a unique student identifier, course name, grade earned, and date of taking the course.
Steps 2-4.  Student datasets were created that summarized AP examination scores and college transcript data.  These datasets were then modified to include the variables required for HLM modeling.  The methods used will be described at the poster session.
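
The restructuring described in Step 1, from one record per student to one record per course, might be sketched as follows. This is an illustration only: the class names, field names, grade scale, and term format are hypothetical, since the variable names in the original datasets are not given here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class StudentRecord:
    """One record per student (hypothetical layout of the AP dataset)."""
    student_id: str
    ap_scores: Dict[str, int]               # exam name -> AP score
    courses: List[Tuple[str, float, str]]   # (course name, grade, term)


@dataclass
class CourseRecord:
    """One record per course on the transcript (the course dataset)."""
    student_id: str
    course_name: str
    grade: float   # assumed grade-point scale
    term: str      # assumed "YYYY-MM" format so that terms sort by date


def to_course_records(students: List[StudentRecord]) -> List[CourseRecord]:
    """Flatten student-level records into one row per course taken."""
    rows = [
        CourseRecord(s.student_id, name, grade, term)
        for s in students
        for name, grade, term in s.courses
    ]
    # Order each student's courses chronologically.
    rows.sort(key=lambda r: (r.student_id, r.term))
    return rows


# Invented example records, not study data.
students = [
    StudentRecord("S001", {"Biology": 4},
                  [("BIO 102", 3.3, "1995-01"), ("BIO 101", 3.0, "1994-09")]),
    StudentRecord("S002", {}, [("BIO 101", 2.7, "1994-09")]),
]
course_rows = to_course_records(students)
```

Keeping the student identifier on every course row is what allows the course-level data to be summarized back to one record per student in the later steps.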
Analytic Strategies:  A two-level modeling strategy was employed to account for variance arising from the nesting of student records within different institutions.  Because biological concepts can be taught by several departments within a single institution, a three-level hierarchical model was also considered.
Questions 1 and 2:  A one-way ANOVA model with random effects was used to address these questions.
The level 1 model was:

$$Y_{ij} = \beta_{0j} + r_{ij}, \qquad (1)$$

where $Y_{ij}$ is the grade earned in the second of a two-course Biology sequence by student $i$ at school $j$, $\beta_{0j}$ is the unconditional mean grade earned in that course at school $j$, and $r_{ij}$ is an error term for student $i$ at school $j$ that is assumed, for this and the remaining equations, to be independently and normally distributed, $r_{ij} \sim N(0, \sigma^2)$.

The level 2 model was:

$$\beta_{0j} = \gamma_{00} + u_{0j}, \qquad (2)$$

where $\beta_{0j}$ is the level 1 intercept (the mean grade earned at school $j$), $\gamma_{00}$ is the grand mean across all schools, and $u_{0j}$ is an error term for school $j$ that is assumed to be independently and normally distributed, $u_{0j} \sim N(0, \tau_{00})$.  Using the variances of the error distributions at each level ($\sigma^2$ and $\tau_{00}$), an unconditional intraclass correlation coefficient was also computed.
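
To illustrate what the two variance components in the one-way random-effects model represent, the sketch below simulates grades from the level 1 and level 2 equations and recovers the components with the classical ANOVA (method-of-moments) estimator for a balanced design. This is not the estimation method used in the study, since HLM software typically relies on maximum likelihood and the study data are unbalanced. The simulation settings are arbitrary illustrative values, not study data.

```python
import random
from statistics import mean


def anova_variance_components(groups):
    """Method-of-moments estimates for a balanced one-way random-effects design.

    groups: list of equal-length lists of grades, one list per school.
    Returns (grand_mean, between_school_variance, within_school_variance).
    """
    n_schools = len(groups)
    n_per_school = len(groups[0])
    if any(len(g) != n_per_school for g in groups):
        raise ValueError("this sketch requires a balanced design")
    school_means = [mean(g) for g in groups]
    grand_mean = mean(school_means)
    ms_between = (n_per_school
                  * sum((m - grand_mean) ** 2 for m in school_means)
                  / (n_schools - 1))
    ms_within = (sum((y - m) ** 2
                     for g, m in zip(groups, school_means) for y in g)
                 / (n_schools * (n_per_school - 1)))
    # E[MS_between] = sigma^2 + n * tau_00, so solve for tau_00 (floored at 0).
    between_var = max(0.0, (ms_between - ms_within) / n_per_school)
    return grand_mean, between_var, ms_within


# Simulate from the two-level model with assumed parameter values.
rng = random.Random(0)
true_grand_mean, true_between_var, true_within_var = 2.8, 0.15, 0.87
simulated = []
for _ in range(19):                                   # schools (level 2 units)
    school_effect = rng.gauss(0, true_between_var ** 0.5)
    simulated.append([
        true_grand_mean + school_effect + rng.gauss(0, true_within_var ** 0.5)
        for _ in range(200)                           # students per school
    ])

estimates = anova_variance_components(simulated)
print("grand mean, between-school variance, within-school variance:", estimates)
```

With only 19 simulated schools, the between-school variance estimate is noticeably less stable than the within-school estimate, which rests on several thousand student records.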
Questions 3-6:  It is proposed that these questions be addressed with a subsequent series of HLM equations.  These equations have been developed and will be presented in the final paper.

PRELIMINARY RESULTS

Question 1.  Is there significant variation among schools in the prediction of performance during the second of a two-course introductory Biology sequence?

Results from the analysis of this first question, assessed with a one-way ANOVA with random effects for 3,581 students at level 1 nested within 19 institutions at level 2, are shown in Table 2.  Across all institutions, a mean grade of approximately 2.8 is earned in the second of a two-course introductory Biology sequence; however, analysis of the variance components indicates that significant variation in grades earned exists among the institutions examined ($\chi^2$(df = 18) = 446.77, p < .001).  This finding indicates that it is not reasonable to assume that all institutions have the same mean, and that a multilevel analytic approach is warranted.
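
For readers unfamiliar with this test, one common form of the homogeneity statistic for school means (of the kind described by Raudenbush & Bryk, 2002) is the sum over schools of n_j times the squared deviation of the school mean from the overall mean, divided by the pooled within-school variance, referred to a chi-square distribution with J - 1 degrees of freedom. The sketch below is an assumption about the general approach, not a reproduction of the software computation behind Table 2, and the toy grades are invented.

```python
def homogeneity_chi_square(groups):
    """Chi-square statistic for H0: all school means are equal.

    groups: list of lists of grades, one list per school (may be unbalanced).
    Returns (statistic, degrees_of_freedom).
    """
    n_schools = len(groups)
    n_total = sum(len(g) for g in groups)
    school_means = [sum(g) / len(g) for g in groups]
    overall_mean = sum(sum(g) for g in groups) / n_total
    # Pooled within-school variance.
    ss_within = sum((y - m) ** 2
                    for g, m in zip(groups, school_means) for y in g)
    pooled_var = ss_within / (n_total - n_schools)
    statistic = sum(len(g) * (m - overall_mean) ** 2
                    for g, m in zip(groups, school_means)) / pooled_var
    return statistic, n_schools - 1


# Invented toy data: two schools with clearly different means.
toy_stat, toy_df = homogeneity_chi_square([[2.0, 4.0], [6.0, 8.0, 10.0]])

# Standard tabled critical value of chi-square with 18 df at alpha = .001.
CRITICAL_DF18_P001 = 42.312
reported_statistic = 446.77   # value reported in the text for 19 institutions
exceeds_critical = reported_statistic > CRITICAL_DF18_P001
```

The reported statistic is roughly ten times the tabled critical value, which is consistent with the stated p < .001.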

Table 2.  One-Way ANOVA, Random Effects

Fixed Effect                      Coefficient (SE)          t Ratio     p
Intercept ($\gamma_{00}$)         2.814 (0.091)             30.850      <.001

Random Effects                    Variance Component (SD)   $\chi^2$    df    p
Intercept ($u_{0j}$)              0.148 (0.385)             446.77      18    <.001
$r_{ij}$ (pred. $\sigma^2$)       0.870 (0.933)

Model Fit
Deviance                          9725.42
Number of estimated parameters    2

Note: SE = standard error; SD = standard deviation (the square root of the variance component).

Question 2.  If there is significant variation among schools in this prediction, what is the magnitude of that unconditional variation?
An intraclass correlation coefficient (ICC), giving the proportion of variance in grades that is due to differences among institutions, can be computed from the variance components of the random effects analysis in Table 2.  The magnitude of the ICC, and its meaning, will be presented during the poster session.
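
As a rough illustration of that computation, the standard unconditional ICC is the between-institution variance divided by the total variance. The sketch below applies this formula to the rounded variance components printed in Table 2. The function name is hypothetical, and the value the authors present may differ slightly because of rounding.

```python
def intraclass_correlation(between_var: float, within_var: float) -> float:
    """Proportion of total variance lying between level 2 units."""
    if between_var < 0 or within_var < 0:
        raise ValueError("variance components must be non-negative")
    total = between_var + within_var
    if total == 0:
        raise ValueError("total variance is zero")
    return between_var / total


# Rounded variance components as printed in Table 2.
icc = intraclass_correlation(0.148, 0.870)
print(round(icc, 3))  # 0.145
```

Under these rounded inputs, roughly 15% of the variance in second-course grades would lie between institutions and the remainder among students within institutions.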

SIGNIFICANCE

This study is proposed to determine whether the validity of the AP Biology examination established in previous work can be replicated using HLM techniques.  The preliminary analyses for Questions 1 and 2 show that HLM is both feasible and necessary for these data.  Subject to the limitation described below, HLM can be used to support the validity of AP programs.

One factor that may limit the ability to generalize the present findings is that HLM analyses assume a random sample from the population of Level 2 units, in this case academic institutions.  Institutions were not randomly sampled; they were selected from a list of institutions according to a priori specifications.

REFERENCES

Bonesronning, H. (1996). Student composition and school performance: Evidence from Norway. Education Economics, 4, 11-31.

Bridgeman, B., & Morgan, R. (1994). Relationships between differential performance on multiple-choice and essay sections of selected AP exams and measures of performance in high school and college. Downloaded on 4/19/2006 from http://www.collegeboard.com/research/home.

Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.

College Board. (2006). Mission, purpose, goals. Downloaded on 4/19/2006 from http://apcentral.collegeboard.com/article/0,,150-157-0-2155,00.html.

Morgan, R., & Klaric, J. (2006). Comparative analyses of AP examinees with non-AP students. Paper presented at the 2006 Advanced Placement Annual Conference, Orlando, FL.

Morgan, R., & Ramist, L. (1998). Advanced Placement students in college: An investigation of course grades at 21 colleges. Downloaded on 4/19/2006 from http://www.collegeboard.com/research/home/.

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Thousand Oaks, CA: Sage Publishing.

Snijders, T. A. B., & Bosker, R. J. (2003). Multilevel analysis: An introduction to basic and advanced multilevel modeling. Thousand Oaks, CA: Sage Publishing.