Effects of Performance Assessment on the Achievement and Motivation of Graduate Students
Dawson R. Hancock
The University of North Carolina at Charlotte
This
study explored the impact of performance assessment versus traditional
paper-and-pencil assessment on graduate students' achievement and motivation to
learn while enrolled in a 16-lesson course on program evaluation methods.
Students in two sections of the course were exposed to the same content and
instructional methods, with one exception: students in one section were exposed
to performance assessment during which they demonstrated their knowledge and
skills while conducting a program evaluation in a 4th grade classroom, whereas
students in the other section were provided with a written scenario of that 4th
grade program and were required to answer questions on a traditional
paper-and-pencil test about program evaluation methods. Results revealed that
students exposed to performance assessment achieved somewhat higher scores on
the final examination and demonstrated significantly higher levels of
motivation to learn than did students evaluated by traditional paper-and-pencil
tests. Qualitative analysis of the students' written expressions about the
course and of their comments from group interviews revealed possible
explanations for these outcomes.
Educators know that meaningful learning requires the active
participation of the learner (Fetsco and McClure, 2005). Yet influencing students'
motivation to learn remains a formidable challenge in many classrooms. Despite
the contrasting views of motivation offered by traditional theoretical models -
behavioral (Skinner, 1953; Walker, 1996), humanistic (Deci et al., 1991;
Maslow, 1970; Rogers and Freiberg, 1994), cognitive (Graham, 1991; Piaget,
1952; Vroom, 1964; Weiner, 1992), and sociocultural (Lave, 1988; Lave and
Wenger, 1991) - most educators agree that motivation is a mental process that
activates, sustains, and maintains behavior (Alderman, 1999; Pintrich and
Schunk, 2002; Reeve, 1996). In academic environments, motivation to learn is
often viewed as a student's tendency to find academic activities meaningful and
worthwhile while deriving the intended benefits of those activities (Brophy,
1998).
Although the various
motivation theories offer insights into how students' motivation may evolve and
be influenced in various contexts, our understanding of the factors that impact
people's motivation to learn remains incomplete (Hancock, 2004). Most
dispositional and situational variables influence students' motivation
individually and interactively. Because motivation and learning are so
interrelated (Pintrich and Schunk, 2002), researchers regularly seek better
understanding of the differential effects of various classroom variables on
students' motivation to learn (Beck, 2004; Deci and Ryan, 2002).
Evaluation of students' achievement in
specific content areas is one classroom variable often explored by researchers.
Evaluation can have both beneficial and deleterious effects on student
learning. Brophy (1988) found that teachers' evaluative feedback can direct
students' future academic efforts by identifying topics that have been learned
and those in need of additional attention. Stipek (2002) suggested that
evaluation of a student's performance can often lead to enhanced student
confidence. Stiggins (2001) and Weiner (1992) contended that when students know
what is needed to improve their abilities, they perceive control over
achievement outcomes, which often enhances their learning. However, Bracey
(2002) suggested that over-emphasis on evaluation can negatively impact
students' critical thinking skills and distract from learning about content not
evaluated. Although effectively administered evaluation generally enhances
student achievement, some researchers have discovered that various types of
evaluation impact student achievement differently (Linn and Gronlund, 2005;
Oosterhof, 2003).
Two types of
evaluation often used in classrooms are the selected response format and the
constructed response format. In selected response formats, test-takers
encounter questions with two or more possible options and must select a
response from the possible choices (McMillan, 2007). Typical selected response
item formats are multiple choice, true/false, and matching. These items are
scored objectively in that there is a single best or correct answer that the
test-taker is expected to select (Stiggins, 2001). In constructed response
formats, test-takers must create their own answers to test questions.
Because constructed response items do not include a list of options from which
a test-taker must select the correct or best response, scoring can be much more
subjective (Popham, 2008). Researchers have demonstrated that although either
format can be used to measure any learning outcome (McMillan, 2007),
constructed response items that assess reasoning and deep understanding cannot
generally assess the amount of knowledge and simple understanding that can be
assessed with selected response items (Ennis, 1987; Quellmalz and Hoskyn,
1997).
One type of constructed
response format, performance assessment, has received significant attention in
recent years. Performance assessment involves observing and assessing students'
behavior while the behavior is underway (Hanna and Dettmer, 2004). Students are
required to demonstrate acquisition of knowledge and skills in one or more
content areas rather than answer questions about their knowledge and skills in
those areas (Kane et al., 1999). In most cases, a performance assessment
presents a hands-on task that calls for applying knowledge and skills from
several learning targets and uses clearly defined criteria to evaluate how well
the student has achieved this application. Students involved in performance
assessments may be asked to
demonstrate their achievement by engaging in individual or group activities,
producing an extended written or spoken answer, or creating a specific product
(Nitko, 2004).
Researchers
(Christie, 2003; Linn and Gronlund, 2005; Onwuegbuzie, 2000; Stiggins, 2001;
Wiggins, 1993) have demonstrated various advantages of performance assessment.
Collectively, these researchers have found that performance assessment can: (a)
clarify the meaning of complex learning targets; (b) assess students' ability
to take action; (c) determine the extent to which students integrate knowledge,
skills, and abilities; (d) allow teachers to assess the thought processes
students use as well as the products they produce; and (e) be consistent with
modern learning theory.
Although
positive outcomes of performance assessment have been demonstrated at the
primary and secondary levels, few studies to date have examined the impact of
performance assessment in graduate-level environments. In addition, few studies
have explored the impact of performance assessment on the motivation of
graduate students to learn. Because student motivation is such an important but
poorly understood issue in higher education, the current study was conducted to
investigate the impact of performance assessment on graduate students'
achievement and motivation to learn.
Methods
Participants
Forty-seven
graduate students enrolled in a research course titled Program Evaluation Methods
at a state-supported university of approximately 21,000 students in the
southeast United States participated in this study. The average age of the
students was 32.4 years (SD = 7.6). Sixty-four percent were female. Although
all students were pursuing a master's degree in an education-related
discipline, their academic backgrounds and career goals varied.
Procedures
All
participants in the study were enrolled in one of two sections of a
one-semester course designed to examine principles, strategies, and techniques
of program evaluation in order to identify and apply criteria that would
indicate a program's value, quality, utility, effectiveness, and significance.
Upon completion of the course, participants were expected to understand
concepts essential to the conduct of program evaluations, know the basic steps
of conducting program evaluations, and be able to critique program evaluation
studies. One section (Section 1) had 22 students and the other section (Section
2) had 25 students. Prior to the beginning of the semester, students
self-enrolled in one of the sections. One section was taught on the university
campus and the other was taught at a distance education site approximately 27
miles away. The students in these sections had taken all of their courses at
their respective locations; as a result, they did not know or communicate with
the students in the other section. Although students were not assigned randomly
to the two sections, the sections were relatively matched based on the
students' sex, age, and academic backgrounds. The same professor taught both
sections to ensure uniformity of instruction.
During
lesson one of the course, the course requirements and objectives were explained
to the students. This discussion was the same in both sections, with one
exception. In Section 1, students were told that in addition to the final
written examination, their knowledge and skills would be evaluated twice during
the semester through a process called performance assessment, a means of
evaluating their performance on authentic tasks in the field. In Section 2,
students were told that in addition to the final written examination, their
knowledge and skills would be evaluated twice during the semester through a
traditional paper-and-pencil test composed of short-answer, constructed
response items. Students in each section were unaware of the evaluation
practices employed in the other section.
During
the subsequent 15 weekly lessons (2 hours and 50 minutes each) of the course,
except for lessons 6 and 12, the professor taught the material to both sections
in exactly the same manner using the Socratic method of inquiry. Activities
differed between the sections during lessons 6 and 12 in that the Section 1
students were required to demonstrate their understanding of the material
addressed thus far in the course by conducting a program evaluation in a local
school, whereas the Section 2 students were required to demonstrate their
understanding of the material addressed thus far in the course by completing a
traditional paper-and-pencil test in the classroom. The learning objectives
measured during lessons 6 and 12 were the same in each section.
During
lessons 6 and 12, in accordance with the principles of performance assessment,
students in Section 1 were required to observe the activities of a local
school's program designed to infuse the arts into the 4th grade curriculum and
to complete a protocol outlining the nature of those observations. These
students also interviewed selected classroom and art teachers in order to
discern their impressions of the program. Finally, these students synthesized
the information that they had collected in order to determine the extent to
which this program's goals were being accomplished.
During
lessons 6 and 12, students in Section 2 were provided a written scenario
describing the local school's program designed to infuse the arts into the 4th
grade curriculum. Sitting at their desks in the university classroom, these
students answered short-answer, constructed response questions on a traditional
paper-and-pencil test regarding the kinds of issues they would want to observe
in the 4th grade classroom if they were conducting a program evaluation. These
students also recorded on the test the kinds of questions they would ask in an
interview and the kinds of outcomes that might be synthesized from the
observations and interviews.
During
lesson 16, students in both sections were given the same professor-created,
criterion-referenced final examination consisting of twelve short-answer
constructed response items and six multiple-part essay questions.
Specific features were expected in the students' answers and points were
awarded when features were present. The content validity of the examination was
established when an external program evaluation expert determined that all
items and questions were valid and aligned with the course objectives.
During
the last portion of lesson 16, students in both sections were asked to complete
the motivation section of the Motivated Strategies for Learning Questionnaire
(MSLQ) (Pintrich et al., 1991) with two additional items. The MSLQ is a
self-report instrument designed to assess higher education students'
motivational orientations and learning strategies in a college course. The
motivation section consists of thirty-one items in six sub-scales that assess
students' goals and value beliefs for a course, their beliefs about their
skills to succeed in a course, and their anxiety about tests in a course.
Normally administered in a classroom, the motivation section of the MSLQ takes
ten to fifteen minutes to complete. Students respond to the items using a
seven-point Likert scale where only the first and seventh points are anchored
('not at all true of me' to 'very true of me'). Examples of items include: (a)
It is important for me to learn the course material of this class; (b) If I
can, I want to get better grades in this class than most of the other students;
and (c) If I try hard enough, then I will understand the course material. An
individual's motivation score is determined by computing the mean of the items
in the MSLQ's motivation section.
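The scoring rule described above can be sketched in a few lines. This is a minimal illustration, assuming each student's responses arrive as a list of 31 integers on the 1 to 7 scale; the function name `motivation_score` and the example responses are hypothetical and do not come from the study's data.

```python
from statistics import mean

N_ITEMS = 31                    # items in the motivation section, per the text
SCALE_MIN, SCALE_MAX = 1, 7     # seven-point Likert scale


def motivation_score(responses):
    """Return one student's motivation score: the mean of the item responses."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    if any(not (SCALE_MIN <= r <= SCALE_MAX) for r in responses):
        raise ValueError("responses must lie on the 1-7 scale")
    return mean(responses)


# Hypothetical responses, invented purely for illustration.
example_responses = [5] * 20 + [6] * 11
print(round(motivation_score(example_responses), 2))
```

Because the score is a simple mean, it stays on the same 1 to 7 scale as the individual items, which is why the group means reported later fall near 5.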
The
psychometric properties of the MSLQ made the instrument acceptable for use in
this study. Supported by the National Center for Research to Improve Post-Secondary
Teaching and Learning (NCRIPTAL), development of the MSLQ began as a research
project on college student learning and teaching. Data collected from over 1700
college students were used to revise and ultimately construct the version of
the MSLQ used in this study. The resulting sub-scales were empirically derived
on the basis of item and factor analysis. Internal consistency coefficients for
the motivation sub-scales have ranged from 0.62 to 0.93. With respect to
predictive validity, five of the six motivation sub-scales have shown
significant correlations with final grade (r > 0.13, alpha = 0.05)
when administered to 380 college students (Pintrich et al., 1993). The MSLQ's
content validity has been supported through extensive literature on college
student learning and teaching (Gable, 1998).
In
addition to completing the motivation section of the MSLQ, students in both
sections were asked to respond individually to two items designed to help
researchers identify factors that may have influenced the students' motivation
to learn. The two items were: (a) Describe two or three things that you liked
about this course; and (b) Describe two or three things that you disliked about
this course. Students took approximately 15 minutes to complete the motivation
section of the MSLQ and the two additional items. After the students completed
the MSLQ and the two additional items, the professor conducted a semi-structured
group interview
of the students in each section in order to debrief them regarding their
participation in this study and to solicit their opinions about the evaluation
strategies used in the course. Each group interview lasted approximately 22
minutes and was guided by a four-item interview protocol. The interviews were
audiotape-recorded and then transcribed verbatim. Questions on the protocol
were: How did you feel about the evaluation practices used in this course? Did
the evaluation practices used in this course differ from those typically used
in your other courses? How? Would the evaluation procedures used in this course
work in undergraduate courses? Why? Would you recommend that the evaluation
procedures used in this course be used in your other graduate courses? Why?
Design
In
this quasi-experimental study, the independent variable was the manner in which
graduate students were evaluated in a course titled Program Evaluation Methods.
In one section, students' knowledge and skills were evaluated through a process
called performance assessment - assessment based on individual performance on
authentic tasks in a particular content domain. In another section, students'
knowledge and skills were evaluated through a traditional paper-and-pencil
examination composed of short-answer, constructed response questions. The
dependent variables were: (a) student achievement as measured on a
professor-generated, criterion-referenced final examination; and (b) motivation
to learn as measured using the MSLQ with two additional items and two group
interviews.
After
administering the final examination and the MSLQ, researchers calculated a
multivariate analysis of variance (MANOVA). MANOVA is often used when examining
the impact of an independent variable (with two levels) on two dependent
variables (that are assumed to be related to each other). While it would have
been possible to conduct univariate tests (i.e. one for each dependent
variable), this would have caused Type I error inflation. MANOVA allowed
researchers to examine the impact of the evaluation method (performance
assessment versus traditional paper-and-pencil) on both dependent variables
(achievement and motivation to learn) simultaneously. Furthermore, using
techniques of qualitative research analysis (Bogdan and Biklen, 2003; Wolcott,
2001), researchers evaluated the students' written responses to the two items
at the end of the MSLQ regarding what they liked and disliked about the course
and the transcribed comments from the two semi-structured group interviews.
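The Type I error inflation mentioned above can be illustrated with a short calculation. The sketch below makes the simplifying assumption that the separate tests are independent, which the study's two dependent variables are not assumed to be, so the figure is indicative only. The function name `familywise_error_rate` is hypothetical.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across n independent tests,
    each run at significance level alpha."""
    return 1 - (1 - alpha) ** n_tests


# Two univariate tests (achievement, motivation), each at alpha = 0.05.
rate = familywise_error_rate(0.05, 2)
print(f"{rate:.4f}")
```

Under that assumption, running two tests at 0.05 each raises the chance of at least one spurious finding to just under 0.10, which is the inflation a single multivariate test avoids.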
Results
A
one-way MANOVA was calculated examining the effects of evaluation method
(performance assessment and traditional paper-and-pencil) on achievement and
motivation to learn. The most widely used multivariate test, Wilks' Lambda,
revealed a significant effect (Lambda (2, 44) = 4.90, p < 0.05). Follow-up
univariate tests indicated that achievement was not significantly influenced by
evaluation method (F (1, 45) = 3.06, p > 0.05). That is, students exposed to
performance assessment practices during lessons 6 and 12 did not score
differently in achievement (M = 84.32, SD = 5.41) than did students exposed to
traditional paper-and-pencil tests (M = 81.32, SD = 6.36). Motivation to learn,
however, was significantly impacted by the evaluation method (F (1, 45) = 6.15,
p < 0.05). That is, students exposed to performance assessment practices
during lessons 6 and 12 were significantly more motivated to continue to learn
(M = 5.12, SD = 0.20) than were the students exposed to traditional
paper-and-pencil tests (M = 4.95, SD = 0.27).
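As a rough check on the follow-up tests, a two-group univariate F statistic can be approximated from the reported means, standard deviations, and section sizes (22 and 25). This is a sketch under stated assumptions: it uses a pooled-variance F computed from rounded summary statistics, not the raw data or the authors' software, and the function name `f_from_summary` is hypothetical.

```python
def f_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Two-group one-way ANOVA F from summary statistics.

    Returns (F, df_between, df_within). With two groups, F equals the
    square of the pooled-variance t statistic.
    """
    df_within = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df_within
    f_value = (m1 - m2) ** 2 / (pooled_var * (1 / n1 + 1 / n2))
    return f_value, 1, df_within


# Section 1 (performance assessment, n = 22) vs Section 2 (n = 25).
f_achievement, df1, df2 = f_from_summary(84.32, 5.41, 22, 81.32, 6.36, 25)
f_motivation, _, _ = f_from_summary(5.12, 0.20, 22, 4.95, 0.27, 25)

print(f"achievement: F({df1}, {df2}) = {f_achievement:.2f}")
print(f"motivation:  F({df1}, {df2}) = {f_motivation:.2f}")
```

The recomputed values come out near 2.99 and 5.88, close to but not exactly the reported 3.06 and 6.15, a gap that plausibly reflects rounding in the published means and standard deviations. Both sets of values fall on the same side of the 0.05 critical value for 1 and 45 degrees of freedom (roughly 4.06), so the pattern of a non-significant achievement effect and a significant motivation effect is unchanged.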
Using
qualitative research analysis techniques suggested by Bogdan and Biklen (2003)
and Wolcott (2001), researchers analyzed the students' written responses to the
two items regarding what they liked and disliked about the course and their
transcribed comments from the two semi-structured group interviews. This
analysis involved organizing the data into manageable units, synthesizing them,
searching for trends or themes in the data, and determining their relative
levels of importance. Specifically, the students' comments were read in their
entirety several times to gain a general sense of how they felt about the
course. These readings led to patterned regularities (i.e. patterns of
responses that appeared more frequently than others) specifically related to
the students' opinions about the evaluation procedures in the course. Commonly
expressed views among Section 1 students exposed to performance assessment in
lessons 6 and 12 were represented by the following comments: 'Grading at the
school was done in a realistic manner'; 'I was able to demonstrate how much I
really knew'; and 'Showing what I could do was much better than writing about
what I could do'. Commonly expressed views among Section 2 students exposed to
paper-and-pencil tests in lessons 6 and 12 were represented by the following
comments: 'The tests required too much memorization and regurgitation of
facts'; 'I knew more than I could show on the exam'; and 'Grading was like it
is in most of our courses'.
Discussion
Students
exposed to performance assessment twice during the course scored somewhat but
not significantly higher on the final examination than did students exposed
twice during the course to traditional paper-and-pencil tests. In addition,
students exposed twice to performance assessments were significantly more
motivated to learn at the end of the course than were students who had been
exposed twice to traditional paper-and-pencil assessments. Qualitative analysis
of the two items designed to assess students' likes and dislikes about the
course and of the comments from the group interviews revealed possible explanations
for these outcomes.
First,
students in traditional learning environments often complain that they know
more than what they are allowed to express on a written examination. These
comments were expressed by many of the students exposed to the traditional
paper-and-pencil assessments in this study. In contrast, the students in this
study exposed to performance assessment said that they valued the opportunity
to demonstrate their knowledge and skills about program evaluation. These
students stated that they were able to express the full range of their
understanding of program evaluation methods and that as a result, the professor
gained a fair and complete perspective of their knowledge and abilities.
Second,
students experiencing performance assessment wanted to continue to learn as a
result of their involvement in this course. An extraordinarily large number of
comments expressed in response to the two items appended to the MSLQ (i.e. What did you
like about this course? What did you dislike about this course?) involved the
students' positive feelings toward having their knowledge and skills evaluated
while they conducted a program evaluation in a local school during lessons 6
and 12. Previous research (Merrill, 2001; van den Hurk, 2006) has demonstrated
that active involvement in the learning process motivates students to want to
continue to learn. This study suggests that active involvement in the
evaluation process may likewise contribute to students' motivation to learn.
Third,
students exposed to performance assessment in this study suggested that their
evaluations were much more concrete and less abstract than they had experienced
in traditional learning environments. The opportunity to observe activities and
conduct interviews in a local 4th grade classroom in order to collect data with
which to evaluate how well a program was accomplishing its objectives was far
preferable to reading a scenario in a university classroom and
commenting on how one would conduct the program evaluation. Several students
stated that the realism of this type of assessment improved their understanding
of program evaluation concepts, which helped them perform better on the final
examination.
Fourth,
the students in the performance assessment condition likely reflected more
frequently and earnestly on their experience than did their colleagues exposed
to the traditional paper-and-pencil evaluation format. Researchers (Oosterhof,
2001; Wiggins, 1993) have discovered that reflection on one's evaluation
activities can prompt greater appreciation for one's effort. In this study,
students exposed to the performance assessment condition in an environment in
which their performance was not normally assessed may have become more
cognitively engaged in the process. As a result, their interest and enthusiasm
in the endeavor may have increased.
Fifth,
students evaluated by performance assessment were perhaps more motivated to
learn as a result of the authentic setting in which they demonstrated their
knowledge and skills. As opposed to the students being evaluated by traditional
paper-and-pencil tests in a university classroom, the performance assessment
students were in a local elementary school classroom observing real-world
activities designed to infuse the arts into the 4th grade curriculum. Their
interaction with teachers and 4th grade students likely energized these
university students and made them more interested in performing their assigned
activities. A number of these students commented that their discussions with
the 4th graders and their teachers made the experience more realistic and
rewarding.
Conclusions
Substantial
research suggests that meaningful learning requires the active participation of
the learner (Brophy, 1998; Pintrich and Schunk, 2002). Therefore, to maximize their
effectiveness, professors must understand the factors that motivate their
students to become actively involved in learning. Professors must the use their
knowledge of these factors to construct classroom activities that maximize
students' involvement in the learning process in order to enhance the students'
motivation to learn. The current study demonstrates that performance assessment
is one classroom activity that achieves this outcome.
Specifically,
the results of this study demonstrate the contribution of performance
assessment toward enhancing graduate students' motivation to learn. When
compared to the performance of students being tested through traditional
paper-and-pencil assessments, students tested in real-world environments in
which they could demonstrate their knowledge and skills performed somewhat
(though not significantly) better and were significantly more motivated to
continue to learn. While many higher
education environments cling to traditional testing procedures, the results of
this study suggest that professors may want to consider the merits of
performance assessment as a means by which to enhance student motivation to
learn. Allowing students to demonstrate their understanding of course content
by applying their knowledge in practical, realistic situations may be worthy of
consideration.