Validity evidence for the AP® Biology examination:
An HLM study poster
John S. Klaric
jsklaric@uncg.edu University of North Carolina - Greensboro
W. David Scales wdscales@uncg.edu
The Advanced Placement (AP®) program, sponsored by the College Board,
provides a gateway for introducing high school students to collegiate
coursework (College Board, 2005). Successful participation and high
achievement in AP® programs in science and mathematics are related to
higher grades in these subjects than those earned by students not in
AP® programs (Morgan & Ramist, 1998). Previous research has indicated
that academic
success can be measured by college grades in the second course of an
introductory series and by the probability of ultimate college
graduation (Morgan & Klaric, 2008). These authors found that
participation in AP® programs is related to an increase in the
predicted probability of college graduation even when controlling for
student ability level, as measured by SAT scores. The present work
focuses on the AP® Biology Examination specifically. Because students
were nested within academic institutions, hierarchical linear modeling (HLM)
techniques were used to provide additional validity evidence for
the prediction of student academic success by performance on this
examination. SAT scores and gender were used as covariates in Level 1
(student-level) modeling. Previous research had suggested a
relationship between academic achievement and size of academic
institutions (Bonesronning, 1996); therefore, the role of Biology
departmental size in predicting such success was investigated as a
Level 2 (school-level) variable. Complete results from this study will be
presented during the poster session.
INTRODUCTION
The Advanced Placement (AP®) program, sponsored by the College Board,
provides a method for introducing high school students to collegiate
coursework (College Board, 2005). Successful participation and high
achievement in AP® programs in science and mathematics are related to
higher grades in these subjects than those earned by students not in
AP® programs (Morgan & Ramist, 1998). Achievement in AP® programs is
assessed with both multiple-choice and free-response items (Bridgeman &
Morgan, 1994).
Previous research has indicated that academic success can be measured by
college grades in the second course of an introductory series and by the
probability of ultimate college graduation (Morgan & Klaric, 2007). The
predictive capability of AP examination performance was maintained when
SAT scores and gender were used as model covariates (Morgan & Klaric,
2007).
Traditional parametric analyses of outcome variables have focused on
identifying possible variance sources after exposure to categorical
experimental conditions. Possible bias introduced to these "level 1"
variance sources by differential membership in superordinate
organizational groups (level 2 units such as academic institutions) has
generally been ignored or accounted for with complex weighting
procedures. Univariate statistical analyses have been designed that
simultaneously model variance sources from both level 1 and level 2
units. Because these multilevel, or hierarchical, analyses have until
recently assumed that such variance sources are additive in their
effects, the approach has been termed hierarchical linear modeling (HLM;
Raudenbush & Bryk, 2002). This research attempts to provide additional
validity evidence for the prediction of student academic success by
performance on the AP Biology examination. SAT scores and gender were
covariates. Because students were nested within academic institutions,
data were analyzed with HLM techniques. Previous research had suggested
a relationship between academic achievement and size of academic
institutions (Bonesronning, 1996); therefore, the role of Biology
departmental size in predicting such success was investigated as a
level 2 variable.
METHODS
Source Data
The College Board provided scholastic information on over 70,000
students enrolled in 1994 nested within 27 academic institutions. The
institutions and students were carefully selected.
Research Questions
The following research questions are proposed for this investigation:
1. Is there significant variation among schools sampled in the
prediction of performance during the second of a two-course introductory
Biology sequence?
2. If there is significant variation among schools in this
prediction, what is the magnitude of that unconditional variation?
3. Do Level 2 variables (Biology Department size, institution size) account for
a portion of that unconditional variation?
4. With school level variables held constant, do AP Biology
examination scores predict performance during the second course of a
Biology introductory sequence? If these are predictive when used alone,
is the predictive power maintained when SAT examination scores and
gender are accounted for?
5. Again with school level variables held constant, does
performance during the first course of a Biology introductory sequence
also predict second course performance? Is the predictive power
maintained when SAT examination scores and gender are accounted for?
How does this prediction compare to AP examination scores?
6. When the variation in Biology Department size is accounted for
at the school level, do the coefficient estimates or their standard
errors at level one change substantially from those seen during analyses
for previous questions?
Questions 1 and 2 can be partially answered in this proposal. The
remaining questions will be addressed in the final paper.
Preparation of Data Sets:
Step 1. Datasets for each institution were built from the raw data
provided; there was one record for each student. These AP datasets
contained information on each AP examination taken, a collegiate course
history, and demographic information. A second dataset for each
institution was created that made the college transcript information
more readily available. In this course dataset, there were multiple
records for each student with one record for each course on the
transcript. Each course record contained a unique student identifier,
course name, grade earned, and date of taking the course.
Steps 2-4. Student datasets were created that summarized AP examination
scores and college transcript data. These datasets were modified to
include variables required for HLM modeling. Methods used will be
described at the poster session.
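The restructuring described in these steps can be illustrated with a minimal sketch. It assumes a hypothetical record layout built from the four fields named in Step 1 (student identifier, course name, grade, date) and a hypothetical rule of keeping each student's earliest attempt at each course; the course names, the 0.0-4.0 grade scale, and the function name are illustrative and are not taken from the study.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CourseRecord:
    """One transcript row; field names are hypothetical."""
    student_id: str
    course_name: str
    grade_points: float  # assumed 0.0-4.0 scale
    date_taken: date


def second_course_grades(records, first_course, second_course):
    """Return {student_id: (first grade, second grade)} for students who
    took both courses, using the earliest attempt at each course."""
    earliest = defaultdict(dict)  # student_id -> {course_name: CourseRecord}
    for rec in records:
        if rec.course_name not in (first_course, second_course):
            continue  # ignore courses outside the two-course sequence
        kept = earliest[rec.student_id].get(rec.course_name)
        if kept is None or rec.date_taken < kept.date_taken:
            earliest[rec.student_id][rec.course_name] = rec
    summary = {}
    for student_id, courses in earliest.items():
        if first_course in courses and second_course in courses:
            summary[student_id] = (
                courses[first_course].grade_points,
                courses[second_course].grade_points,
            )
    return summary


# Toy transcript rows (invented for illustration only).
records = [
    CourseRecord("s1", "BIO 101", 3.0, date(1994, 9, 1)),
    CourseRecord("s1", "BIO 102", 3.7, date(1995, 1, 15)),
    CourseRecord("s1", "CHM 101", 2.0, date(1994, 9, 1)),
    CourseRecord("s2", "BIO 101", 2.3, date(1994, 9, 1)),
]
print(second_course_grades(records, "BIO 101", "BIO 102"))
```

Only student s1 appears in the summary, because s2 has no record for the second course. This mirrors the reduction from multiple course records per student to one analysis record per student.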
Analytic Strategies:
A two-level modeling strategy was employed in order to account for
variance arising from the nesting of student records within different
institutions. Because biological concepts can be taught by several
departments within a specific institution, using a three-level
hierarchical model was considered.
Questions 1 and 2: A one-way ANOVA model with random effects was used
to address these questions.
The level 1 model was:
$$Y_{ij} = \beta_{0j} + r_{ij}, \qquad (1)$$
where $Y_{ij}$ is the grade earned in the second of a two-course
Biology sequence by student $i$ at school $j$, $\beta_{0j}$ is the
unconditional mean grade earned in that course at school $j$, and
$r_{ij}$ is an error term for student $i$ at school $j$ that is assumed,
for this and the remaining equations, to be independently and normally
distributed, $r_{ij} \sim N(0, \sigma^2)$.
The level 2 model was:
$$\beta_{0j} = \gamma_{00} + u_{0j}, \qquad (2)$$
where $\beta_{0j}$ is the mean grade earned at school $j$ (the intercept
from the Level 1 regression), $\gamma_{00}$ is the grand mean across all
schools, and $u_{0j}$ is an error term for school $j$ that is assumed to
be independently and normally distributed, $u_{0j} \sim N(0, \tau_{00})$.
Using the variances of the error distributions at each level
($\sigma^2$ and $\tau_{00}$), an unconditional intraclass correlation
coefficient was also computed.
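As a rough illustration of how the two error variances defined above and the resulting intraclass correlation can be estimated, the sketch below applies the classical method-of-moments (ANOVA) estimators for a one-way random-effects design with unequal group sizes. This is an assumption made for illustration: HLM software normally estimates these quantities by maximum likelihood, so the numbers need not match, and the function name and toy grades are hypothetical.

```python
from statistics import fmean


def random_effects_anova(groups):
    """Method-of-moments estimates for a one-way random-effects ANOVA.

    groups maps a school label to its list of student grades.
    Returns (grand_mean, sigma2, tau00, icc).
    """
    sizes = {j: len(ys) for j, ys in groups.items()}
    n_total = sum(sizes.values())
    n_groups = len(groups)
    # Sample grand mean (size-weighted), not a precision-weighted estimate.
    grand_mean = sum(sum(ys) for ys in groups.values()) / n_total
    means = {j: fmean(ys) for j, ys in groups.items()}

    ss_between = sum(sizes[j] * (means[j] - grand_mean) ** 2 for j in groups)
    ss_within = sum((y - means[j]) ** 2 for j, ys in groups.items() for y in ys)
    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)

    # Effective group size; equals the common n when the design is balanced.
    n0 = (n_total - sum(n * n for n in sizes.values()) / n_total) / (n_groups - 1)

    sigma2 = ms_within  # Level 1 (within-school) variance
    tau00 = max((ms_between - ms_within) / n0, 0.0)  # Level 2 variance, floored at 0
    total = tau00 + sigma2
    icc = tau00 / total if total > 0 else 0.0
    return grand_mean, sigma2, tau00, icc


# Toy data: two schools with two students each (invented values).
toy = {"A": [2.0, 3.0], "B": [3.0, 4.0]}
grand_mean, sigma2, tau00, icc = random_effects_anova(toy)
print(grand_mean, sigma2, tau00, round(icc, 3))
```

For the toy data the between-school mean square is 1.0 and the within-school mean square is 0.5, so the estimated school-level variance is 0.25 and one third of the total variance lies between schools.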
Questions 3-6. It is proposed that these questions be addressed with a
subsequent series of HLM equations. These equations have been
developed, and will be presented in the final paper.
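For orientation only, one plausible form such equations could take is sketched below in the standard notation of Raudenbush and Bryk (2002). It is not the authors' specification: the predictor names, the choice of fixed (non-random) slopes, and the use of a single Level 2 predictor are all assumptions made for illustration.

```latex
\begin{align}
% Level 1: student i in school j; predictor names are placeholders
Y_{ij} &= \beta_{0j} + \beta_{1j}\,\mathrm{AP}_{ij}
        + \beta_{2j}\,\mathrm{SAT}_{ij}
        + \beta_{3j}\,\mathrm{Gender}_{ij} + r_{ij},
       & r_{ij} &\sim N(0, \sigma^2) \\
% Level 2: the school intercept depends on Biology department size
\beta_{0j} &= \gamma_{00} + \gamma_{01}\,\mathrm{DeptSize}_{j} + u_{0j},
       & u_{0j} &\sim N(0, \tau_{00}) \\
% Slopes assumed constant across schools (an assumption of this sketch)
\beta_{qj} &= \gamma_{q0}, & q &= 1, 2, 3
\end{align}
```

Under this sketch, the share of between-school variance explained by department size would be read from the drop in $\tau_{00}$ relative to the unconditional model.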
PRELIMINARY RESULTS:
Question 1. Is there significant variation among schools in the
prediction of performance during the second of a two-course introductory
Biology sequence?
Results from the analysis of this first question, assessed with a
one-way ANOVA with random effects for 3581 students at Level 1 nested
within 19 institutions at Level 2, are shown in Table 2. Across all
institutions, the mean grade earned in the second course of a two-course
introductory Biology sequence is approximately 2.8; however, analysis of
variance components indicates that significant variation in grades
earned exists among the institutions examined ($\chi^2$(df = 18) =
446.77, p < .001). This finding indicates that it is not reasonable to
assume that all institutions have the same mean, and that a multilevel
analytic approach is warranted.
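Table 2. One-Way ANOVA, Random Effects

Fixed Effect                  Coefficient (SE)          t ratio            p
Intercept ($\gamma_{00}$)     2.814 (0.091)             30.850             <.001

Random Effects                Variance component (SD)   $\chi^2$    df     p
Intercept ($u_{0j}$)          0.148 (0.385)             446.77      18     <.001
$r_{ij}$ (pred. $\sigma^2$)   0.870 (0.933)

Model Fit
Deviance                      9725.42
# of estimated parameters     2

Note: SE = standard error; SD = standard deviation. The values in
parentheses beside the variance components are their square roots
(standard deviations), not standard errors.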
Question 2. If there is significant variation among schools in this
prediction, what is the magnitude of that unconditional variation?
An intraclass correlation coefficient (ICC) showing the proportion of
variance in grades due to differences among institutions can be based on
the random effects analysis in Table 2. The magnitude of the ICC, and
its meaning, will be presented during the poster session.
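As a worked illustration of the arithmetic, the ICC is the between-institution variance divided by the total variance. The sketch below simply plugs in the two variance components printed in Table 2; the function name is hypothetical, and the result is only what those printed values imply, not the figure the authors report at the poster session.

```python
def intraclass_correlation(tau00, sigma2):
    """Share of total variance lying between Level 2 units."""
    return tau00 / (tau00 + sigma2)


TAU00 = 0.148   # between-institution variance component (Table 2, intercept)
SIGMA2 = 0.870  # within-institution variance component (Table 2, residual)

icc = intraclass_correlation(TAU00, SIGMA2)
print(f"ICC = {icc:.3f}")  # prints "ICC = 0.145"
```

Read this way, the printed components imply that roughly 15% of the variance in second-course grades lies between institutions and the remainder lies among students within institutions.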
SIGNIFICANCE
This study is proposed to determine if the validity of the Biology AP
examination established in previous work is replicable using HLM
techniques. The preliminary analyses for Questions 1 and 2 indicate
that HLM is feasible and necessary for these data. Given the
limitations summarized below, HLM can be used to
support the validity of AP programs.
One factor that may limit the ability to generalize the present findings
is that HLM analyses assume a random sample from the population of Level
2 units, in this case academic institutions. Institutions were not
randomly sampled; they were selected from a list of institutions
according to a priori specifications.
REFERENCES:
Bonesronning, H. (1996). Student composition and school performance:
Evidence from Norway. Education Economics, 4, 11-31.
Bridgeman, B., & Morgan, R. (1994). Relationships between differential
performance on multiple-choice and essay sections of selected AP exams
and measures of performance in high school and college. Downloaded on
4/19/2006 from http://www.collegeboard.com/research/home.
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied
multiple regression/correlation analysis for the behavioral sciences
(3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
College Board. (2006). Mission, purpose, goals. Downloaded on 4/19/2006
from
http://apcentral.collegeboard.com/article/0,,150-157-0-2155,00.html.
Morgan, R., & Klaric, J. (2006). Comparative analyses of AP examinees
with non-AP students. Paper presented at the 2006 Advanced Placement
Annual Conference, Orlando, FL.
Morgan, R., & Ramist, L. (1998). Advanced Placement students in
college: An investigation of course grades at 21 colleges. Downloaded
on 4/19/2006 from http://www.collegeboard.com/research/home/.
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models:
Applications and data analysis methods (2nd ed.). Thousand Oaks,
CA: Sage Publishing.
Snijders, T. A. B., & Bosker, R. J. (2003). Multilevel analysis: An
introduction to basic and advanced multilevel modeling. Thousand Oaks,
CA: Sage Publishing.