Effects of Performance Assessment on the Achievement and Motivation of Graduate Students

Dawson R. Hancock
The University of North Carolina at Charlotte

This study explored the impact of performance assessment versus traditional paper-and-pencil assessment on graduate students' achievement and motivation to learn while enrolled in a 16-lesson course on program evaluation methods. Students in two sections of the course were exposed to the same content and instructional methods, with one exception: students in one section were exposed to performance assessment during which they demonstrated their knowledge and skills while conducting a program evaluation in a 4th grade classroom, whereas students in the other section were provided with a written scenario of that 4th grade program and were required to answer questions on a traditional paper-and-pencil test about program evaluation methods. Results revealed that students exposed to performance assessment achieved somewhat higher scores on the final examination and demonstrated significantly higher levels of motivation to learn than did students evaluated by traditional paper-and-pencil tests. Qualitative analysis of the students' written expressions about the course and of their comments from group interviews revealed possible explanations for these outcomes.

Educators know that meaningful learning requires the active participation of the learner (Fetsco and McClure, 2005). Yet influencing students' motivation to learn remains a formidable challenge in many classrooms. Despite the contrasting views of motivation offered by traditional theoretical models - behavioral (Skinner, 1953; Walker, 1996), humanistic (Deci et al., 1991; Maslow, 1970; Rogers and Freiberg, 1994), cognitive (Graham, 1991; Piaget, 1952; Vroom, 1964; Weiner, 1992), and sociocultural (Lave, 1988; Lave and Wenger, 1991) - most educators agree that motivation is a mental process that activates, sustains, and maintains behavior (Alderman, 1999; Pintrich and Schunk, 2002; Reeve, 1996). In academic environments, motivation to learn is often viewed as a student's tendency to find academic activities meaningful and worthwhile while deriving the intended benefits of those activities (Brophy, 1998).

Although the various motivation theories offer insights into how students' motivation may evolve and be influenced in various contexts, our understanding of the factors that impact people's motivation to learn remains incomplete (Hancock, 2004). Most dispositional and situational variables influence students' motivation individually and interactively. Because motivation and learning are so interrelated (Pintrich and Schunk, 2002), researchers regularly seek better understanding of the differential effects of various classroom variables on students' motivation to learn (Beck, 2004; Deci and Ryan, 2002).

Evaluation of students' achievement in specific content areas is one classroom variable often explored by researchers. Evaluation can have both beneficial and deleterious effects on student learning. Brophy (1988) found that teachers' evaluative feedback can direct students' future academic efforts by identifying topics that have been learned and those in need of additional attention. Stipek (2002) suggested that evaluation of a student's performance can often lead to enhanced student confidence. Stiggins (2001) and Weiner (1992) contended that when students know what is needed to improve their abilities, they perceive control over achievement outcomes, which often enhances their learning. However, Bracey (2002) suggested that over-emphasis on evaluation can negatively impact students' critical thinking skills and distract from learning about content not evaluated. Although effectively administered evaluation generally enhances student achievement, some researchers have discovered that various types of evaluation impact student achievement differently (Linn and Gronlund, 2005; Oosterhof, 2003).

Two types of evaluation often used in classrooms are the selected response format and the constructed response format. In selected response formats, test-takers encounter questions with two or more possible options and must select a response from the possible choices (McMillan, 2007). Typical selected response item formats are multiple choice, true/false, and matching. These items are scored objectively in that there is a single best or correct answer that the test-taker is expected to select (Stiggins, 2001). In constructed response formats, test-takers must create their own answers to test questions. Because constructed response items do not include a list of options from which a test-taker must select the correct or best response, scoring can be much more subjective (Popham, 2008). Researchers have demonstrated that although either format can be used to measure any learning outcome (McMillan, 2007), constructed response items that assess reasoning and deep understanding cannot generally assess the amount of knowledge and simple understanding that can be assessed with selected response items (Ennis, 1987; Quellmalz and Hoskyn, 1997).

One type of constructed response format, performance assessment, has received significant attention in recent years. Performance assessment involves observing and assessing students' behavior while the behavior is underway (Hanna and Dettmer, 2004). Students are required to demonstrate acquisition of knowledge and skills in one or more content areas rather than answer questions about their knowledge and skills in those areas (Kane et al., 1999). In most cases, a performance assessment presents a hands-on task requiring students to do an activity that requires application of knowledge and skills from several learning targets and uses clearly defined criteria to evaluate how well the student has achieved this application. Students involved in performance assessments may be asked to demonstrate their achievement by engaging in individual or group activities, producing an extended written or spoken answer, or creating a specific product (Nitko, 2004).

Researchers (Christie, 2003; Linn and Gronlund, 2005; Onwuegbuzie, 2000; Stiggins, 2001; Wiggins, 1993) have demonstrated various advantages of performance assessment. Collectively, these researchers have found that performance assessment can: (a) clarify the meaning of complex learning targets; (b) assess students' ability to take action; (c) determine the extent to which students integrate knowledge, skills, and abilities; (d) allow teachers to assess the thought processes students use as well as the products they produce; and (e) be consistent with modern learning theory. 

Although positive outcomes of performance assessment have been demonstrated at the primary and secondary levels, few studies to date have examined the impact of performance assessment in graduate-level environments. In addition, few studies have explored the impact of performance assessment on the motivation of graduate students to learn. Because student motivation is such an important but poorly understood issue in higher education, the current study was conducted to investigate the impact of performance assessment on graduate students' achievement and motivation to learn.

Methods

Participants

Forty-seven graduate students enrolled in a research course titled Program Evaluation Methods at a state-supported university of approximately 21,000 students in the southeast United States participated in this study. The average age of the students was 32.4 years (SD = 7.6). Sixty-four per cent were females. Although all students were pursuing a master's degree in an education-related discipline, their academic backgrounds and career goals varied.

Procedures

All participants in the study were enrolled in one of two sections of a one-semester course designed to examine principles, strategies, and techniques of program evaluation in order to identify and apply criteria that would indicate a program's value, quality, utility, effectiveness, and significance. Upon completion of the course, participants were expected to understand concepts essential to the conduct of program evaluations, know the basic steps of conducting program evaluations, and be able to critique program evaluation studies. One section (Section 1) had 22 students and the other section (Section 2) had 25 students. Prior to the beginning of the semester, students self-enrolled in one of the sections. One section was taught on the university campus and the other was taught at a distance education site approximately 27 miles away. The students in these sections had taken all of their courses at their respective locations; as a result, they did not know or communicate with the students in the other section. Although students were not assigned randomly to the two sections, the sections were relatively matched based on the students' sex, age, and academic backgrounds. The same professor taught both sections to ensure uniformity of instruction.

During lesson one of the course, the course requirements and objectives were explained to the students. This discussion was the same in both sections, with one exception. In Section 1, students were told that in addition to the final written examination, their knowledge and skills would be evaluated twice during the semester through a process called performance assessment, a means of evaluating their performance on authentic tasks in the field. In Section 2, students were told that in addition to the final written examination, their knowledge and skills would be evaluated twice during the semester through a traditional paper-and-pencil test composed of short-answer, constructed response items. Students in each section were unaware of the evaluation practices employed in the other section.

During the subsequent 15 weekly lessons (2 hours and 50 minutes each) of the course, except for lessons 6 and 12, the professor taught the material to both sections in exactly the same manner using the Socratic method of inquiry. Activities differed between the sections during lessons 6 and 12 in that the Section 1 students were required to demonstrate their understanding of the material addressed thus far in the course by conducting a program evaluation in a local school, whereas the Section 2 students were required to demonstrate their understanding of the material addressed thus far in the course by completing a traditional paper-and-pencil test in the classroom. The learning objectives measured during lessons 6 and 12 were the same in each section. 

During lessons 6 and 12, in accordance with the principles of performance assessment, students in Section 1 were required to observe the activities of a local school's program designed to infuse the arts into the 4th grade curriculum and to complete a protocol outlining the nature of those observations. These students also interviewed selected classroom and art teachers in order to discern their impressions of the program. Finally, these students synthesized the information that they had collected in order to determine the extent to which this program's goals were being accomplished.

During lessons 6 and 12, students in Section 2 were provided a written scenario describing the local school's program designed to infuse the arts into the 4th grade curriculum. Sitting at their desks in the university classroom, these students answered short answer, constructed response questions on a traditional paper-and-pencil test regarding the kinds of issues they would want to observe in the 4th grade classroom if they were conducting a program evaluation. These students also recorded on the test the kinds of questions they would ask in an interview and the kinds of outcomes that might be synthesized from the observations and interviews. 

During lesson 16, students in both sections were given the same professor-created, criterion-referenced final examination consisting of twelve short-answer constructed response items and six multiple-part essay questions.  Specific features were expected in the students' answers and points were awarded when features were present. The content validity of the examination was established when an external program evaluation expert determined that all items and questions were valid and aligned with the course objectives.

During the last portion of lesson 16, students in both sections were asked to complete the motivation section of the Motivated Strategies for Learning Questionnaire (MSLQ) (Pintrich et al., 1991) with two additional items. The MSLQ is a self-report instrument designed to assess higher education students' motivational orientations and learning strategies in a college course. The motivation section consists of thirty-one items in six sub-scales that assess students' goals and value beliefs for a course, their beliefs about their skills to succeed in a course, and their anxiety about tests in a course. Normally administered in a classroom, the motivation section of the MSLQ takes ten to fifteen minutes to complete. Students respond to the items using a seven-point Likert scale where only the first and seventh points are anchored ('not at all true of me' to 'very true of me'). Examples of items include: (a) It is important for me to learn the course material of this class; (b) If I can, I want to get better grades in this class than most of the other students; and (c) If I try hard enough, then I will understand the course material. An individual's motivation score is determined by computing the mean of the items in the MSLQ's motivation section.
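The scoring rule described above (the mean of the motivation-section items on a seven-point scale) can be sketched in a few lines. This is only an illustration: the function name, the validation checks, and the example responses are assumptions introduced here, not part of the MSLQ materials or of the study's data, and the sketch does not attempt any sub-scale scoring.

```python
from statistics import mean

SCALE_MIN, SCALE_MAX = 1, 7  # seven-point Likert scale described in the text
N_ITEMS = 31                 # number of items in the MSLQ motivation section


def motivation_score(responses):
    """Return the mean of one student's motivation-section responses."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    if any(not SCALE_MIN <= r <= SCALE_MAX for r in responses):
        raise ValueError("each response must fall between 1 and 7")
    return mean(responses)


# Hypothetical responses for one student (not data from the study).
example_responses = [5] * 20 + [6] * 11
example_score = motivation_score(example_responses)
print(round(example_score, 2))
```

Because the score is a simple mean, it stays on the original 1-to-7 scale, which is why the section means reported later in the paper can be read directly against the scale anchors.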

The psychometric properties of the MSLQ made the instrument acceptable for use in this study. Supported by the National Center for Research to Improve Post-Secondary Teaching and Learning (NCRIPTL), development of the MSLQ began as a research project on college student learning and teaching. Data collected from over 1700 college students were used to revise and ultimately construct the version of the MSLQ used in this study. The resulting sub-scales were empirically derived on the basis of item and factor analysis. Internal consistency coefficients for the motivation sub-scales have ranged from 0.62 to 0.93. With respect to predictive validity, five of the six motivation sub-scales have shown significant correlations with final grade (r > 0.13, alpha = 0.05) when administered to 380 college students (Pintrich et al., 1993). The MSLQ's content validity has been supported through extensive literature on college student learning and teaching (Gable, 1998).

In addition to completing the motivation section of the MSLQ, students in both sections were asked to respond individually to two items designed to help researchers identify factors that may have influenced the students' motivation to learn. The two items were: (a) Describe two or three things that you liked about this course; and (b) Describe two or three things that you disliked about this course. Students took approximately 15 minutes to complete the motivation section of the MSLQ and the two additional items. After the students had completed the MSLQ and the two additional items, the professor conducted a semi-structured group interview with the students in each section in order to debrief them regarding their participation in this study and to solicit their opinions about the evaluation strategies used in the course. Each group interview lasted approximately 22 minutes and was guided by a four-item interview protocol. The interviews were audiotape-recorded and then transcribed verbatim. The questions on the protocol were: (a) How did you feel about the evaluation practices used in this course? (b) Did the evaluation practices used in this course differ from those typically used in your other courses? How? (c) Would the evaluation procedures used in this course work in undergraduate courses? Why? (d) Would you recommend that the evaluation procedures used in this course be used in your other graduate courses? Why?

Design

In this quasi-experimental study, the independent variable was the manner in which graduate students were evaluated in a course titled Program Evaluation Methods. In one section, students' knowledge and skills were evaluated through a process called performance assessment - assessment based on individual performance on authentic tasks in a particular content domain. In the other section, students' knowledge and skills were evaluated through a traditional paper-and-pencil examination composed of short-answer, constructed response questions. The dependent variables were: (a) student achievement as measured on a professor-generated, criterion-referenced final examination; and (b) motivation to learn as measured using the MSLQ with two additional items and two group interviews.

After administering the final examination and the MSLQ, researchers conducted a multivariate analysis of variance (MANOVA). MANOVA is often used when examining the impact of an independent variable (with two levels) on two dependent variables (that are assumed to be related to each other). While it would have been possible to conduct univariate tests (i.e. one for each dependent variable), doing so would have inflated the Type I error rate. MANOVA allowed researchers to examine the impact of the evaluation method (performance assessment versus traditional paper-and-pencil) on both dependent variables (achievement and motivation to learn) simultaneously. Furthermore, using techniques of qualitative research analysis (Bogdan and Biklen, 2003; Wolcott, 2001), researchers evaluated the students' written responses to the two items at the end of the MSLQ regarding what they liked and disliked about the course and the transcribed comments from the two semi-structured group interviews.
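The concern about Type I error inflation can be illustrated with a short calculation. The sketch below uses the textbook familywise error formula under the simplifying assumption that the tests are independent; since the two dependent variables in this study are assumed to be related, the actual rate would differ somewhat, so the figure is illustrative only. The function name is introduced here for the example.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one Type I error across n independent tests."""
    return 1 - (1 - alpha) ** n_tests


# Two separate univariate tests, each at the 0.05 level used in this study.
rate_two_tests = familywise_error_rate(0.05, 2)
print(f"{rate_two_tests:.4f}")
```

Under that assumption, running two separate tests at the 0.05 level raises the chance of at least one false positive to nearly 0.10, which is the inflation that a single multivariate test is meant to guard against.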

Results

A one-way MANOVA was calculated examining the effects of evaluation method (performance assessment and traditional paper-and-pencil) on achievement and motivation to learn. The most widely used multivariate test, Wilks' Lambda, revealed a significant effect (Lambda (2, 44) = 4.90, p < 0.05). Follow-up univariate tests indicated that achievement was not significantly influenced by evaluation method (F (1, 45) = 3.06, p > 0.05). That is, students exposed to performance assessment practices during lessons 6 and 12 did not score significantly differently in achievement (M = 84.32, SD = 5.41) than did students exposed to traditional paper-and-pencil tests (M = 81.32, SD = 6.36). Motivation to learn, however, was significantly impacted by the evaluation method (F (1, 45) = 6.15, p < 0.05). That is, students exposed to performance assessment practices during lessons 6 and 12 were significantly more motivated to continue to learn (M = 5.12, SD = 0.20) than were the students exposed to traditional paper-and-pencil tests (M = 4.95, SD = 0.27).
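As a rough check, the follow-up univariate F ratios can be approximately reconstructed from the reported means and standard deviations together with the section sizes given in the Procedures (22 students in Section 1, 25 in Section 2). The sketch below is not the authors' analysis: the function name is introduced here, the inputs are rounded summary statistics, and the critical value is an approximate figure from standard F tables, so the results land near but not exactly on the reported 3.06 and 6.15.

```python
def f_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Two-group one-way ANOVA F ratio from means, SDs, and group sizes."""
    df_within = n1 + n2 - 2
    # Pooled within-group variance (mean square within).
    ms_within = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df_within
    grand_mean = (n1 * m1 + n2 * m2) / (n1 + n2)
    # With two groups the between-groups df is 1, so SS between = MS between.
    ms_between = n1 * (m1 - grand_mean) ** 2 + n2 * (m2 - grand_mean) ** 2
    return ms_between / ms_within, (1, df_within)


# Approximate tabled critical value for F(1, 45) at the 0.05 level.
F_CRITICAL = 4.06

# Section 1 = performance assessment (n = 22); Section 2 = paper-and-pencil (n = 25).
f_ach, df_ach = f_from_summary(84.32, 5.41, 22, 81.32, 6.36, 25)
f_mot, df_mot = f_from_summary(5.12, 0.20, 22, 4.95, 0.27, 25)

print(f"achievement: F{df_ach} = {f_ach:.2f}, significant: {f_ach > F_CRITICAL}")
print(f"motivation:  F{df_mot} = {f_mot:.2f}, significant: {f_mot > F_CRITICAL}")
```

The recomputed ratios fall on the same sides of the critical value as the reported ones: the achievement difference does not reach significance, while the motivation difference does.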

Using qualitative research analysis techniques suggested by Bogdan and Biklen (2003) and Wolcott (2001), researchers analyzed the students' written responses to the two items regarding what they liked and disliked about the course and their transcribed comments from the two semi-structured group interviews. This analysis involved organizing the data into manageable units, synthesizing them, searching for trends or themes in the data, and determining their relative levels of importance. Specifically, the students' comments were read in their entirety several times to gain a general sense of how they felt about the course. These readings led to patterned regularities (i.e. patterns of responses that appeared more frequently than others) specifically related to the students' opinions about the evaluation procedures in the course. Commonly expressed views among Section 1 students exposed to performance assessment in lessons 6 and 12 were represented by the following comments: 'Grading at the school was done in a realistic manner'; 'I was able to demonstrate how much I really knew'; and 'Showing what I could do was much better than writing about what I could do'. Commonly expressed views among Section 2 students exposed to paper-and-pencil tests in lessons 6 and 12 were represented by the following comments: 'The tests required too much memorization and regurgitation of facts'; 'I knew more than I could show on the exam'; and 'Grading was like it is in most of our courses'.

Discussion

Students exposed to performance assessment twice during the course scored somewhat but not significantly higher on the final examination than did students exposed twice during the course to traditional paper-and-pencil tests. In addition, students exposed twice to performance assessments were significantly more motivated to learn at the end of the course than were students who had been exposed twice to traditional paper-and-pencil assessments. Qualitative analysis of the two items designed to assess students' likes and dislikes about the course and of the comments from the group interviews revealed possible explanations for these outcomes.

First, students in traditional learning environments often complain that they know more than they are allowed to express on a written examination. Many of the students exposed to the traditional paper-and-pencil assessments in this study expressed this complaint. In contrast, the students in this study exposed to performance assessment said that they valued the opportunity to demonstrate their knowledge and skills about program evaluation. These students stated that they were able to express the full range of their understanding of program evaluation methods and that as a result, the professor gained a fair and complete perspective of their knowledge and abilities.

Second, students experiencing performance assessment wanted to continue to learn as a result of their involvement in this course. An extraordinarily large number of the comments written in response to the two additional items administered with the MSLQ (i.e. What did you like about this course? What did you dislike about this course?) involved the students' positive feelings toward having their knowledge and skills evaluated while they conducted a program evaluation in a local school during lessons 6 and 12. Previous research (Merrill, 2001; van Den Hurk, 2006) has demonstrated that active involvement in the learning process motivates students to want to continue to learn. This study suggests that active involvement in the evaluation process may likewise contribute to students' motivation to learn.

Third, students exposed to performance assessment in this study suggested that their evaluations were much more concrete and less abstract than those they had experienced in traditional learning environments. The opportunity to observe activities and conduct interviews in a local 4th grade classroom in order to collect data with which to evaluate how well a program was accomplishing its objectives was far preferable to reading a scenario in a university classroom and commenting on how one would conduct the program evaluation. Several students stated that the realism of this type of assessment improved their understanding of program evaluation concepts, which helped them perform better on the final examination.

Fourth, the students in the performance assessment condition likely reflected more frequently and earnestly on their experience than did their colleagues exposed to the traditional paper-and-pencil evaluation format. Researchers (Oosterhof, 2001; Wiggins, 1993) have discovered that reflection on one's evaluation activities can prompt greater appreciation for one's effort. In this study, students exposed to the performance assessment condition in an environment in which their performance was not normally assessed may have become more cognitively engaged in the process. As a result, their interest and enthusiasm in the endeavor may have increased.

Fifth, students evaluated by performance assessment were perhaps more motivated to learn as a result of the authentic setting in which they demonstrated their knowledge and skills. As opposed to the students being evaluated by traditional paper-and-pencil tests in a university classroom, the performance assessment students were in a local elementary school classroom observing real-world activities designed to infuse the arts into the 4th grade curriculum. Their interaction with teachers and 4th grade students likely energized these university students and made them more interested in performing their assigned activities. A number of these students commented that their discussions with the 4th graders and their teachers made the experience more realistic and rewarding.

Conclusions

Significant research suggests that meaningful learning requires the active participation of the learner (Brophy, 1998; Pintrich and Schunk, 2002). Therefore, to maximize their effectiveness, professors must understand the factors that motivate their students to become actively involved in learning. Professors must then use their knowledge of these factors to construct classroom activities that maximize students' involvement in the learning process in order to enhance the students' motivation to learn. The current study demonstrates that performance assessment is one classroom activity that achieves this outcome.

Specifically, the results of this study demonstrate the contribution of performance assessment toward enhancing graduate students' motivation to learn. When compared to students being tested through traditional paper-and-pencil assessments, students tested in real-world environments in which they could demonstrate their knowledge and skills performed somewhat, though not significantly, better on the final examination and were significantly more motivated to continue to learn. While many higher education environments cling to traditional testing procedures, the results of this study suggest that professors may want to consider the merits of performance assessment as a means by which to enhance student motivation to learn. Allowing students to demonstrate their understanding of course content by applying their knowledge in practical, realistic situations may be worthy of consideration.