Standards-based grading in Math 1560, Calculus I


In August 2018 I had the privilege of attending a workshop organized by the American Institute of Mathematics related to their Curated Courses initiative. The workshop was primarily focused on the Curated Courses programme, and on Open Educational Resources in general, but there was time for some very useful discussion and presentations on the teaching of undergraduate mathematics.

Through this workshop, I learned from Steven Clontz and others about using a standards-based approach to assessment in mathematics. After years of frustration with students who routinely ignore all the carefully crafted feedback they receive on tests and assignments, I was inspired to try it myself. Use of standards-based grading in calculus has already been discussed at length (and with greater eloquence) by Kate Owens, but I'll make an attempt at relating my own experiences nonetheless.


My own adventure in standards-based grading took place in our Math 1560 Calculus I course. This was a class of over 200 students, divided into two sections (both taught by me). Before we get into the details, I should begin with a disclaimer. Most of the proponents of SBG seem to be concentrated in smaller liberal arts universities, where they teach 20-30 students at a time. Scaling things to a class 10 times that size is... a challenge. And a whole lot of work. Does it pay off? Stay tuned to find out!


Basic details (the setup)



The basic premise of standards-based grading (SBG) is that students are allowed to continue to work on a particular outcome until they've mastered it. I used Steven Clontz's "MRIF" system, tied to a 3-point scale: an M on a standard means the student has mastered that outcome; this was worth 3 points. A student with minor errors (and in particular, unsatisfactory presentation and use of notation) got an R (2 points). They were allowed to fix their mistakes and submit a revision to bring their grade up to a 3 (M). If there were serious flaws in the solution (I for "issues", or 1 point), the student had to try a new problem. If a student failed to submit work for a standard (or submitted particularly poor work), they got an F (0 points) on that standard. The standards I used for my course can be found on our course outline.
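
For concreteness, here is a minimal sketch of the scale and the revision rules in code. The point values are the ones above; the code itself is purely illustrative, not anything I actually used to run the course.

    # The MRIF scale, and what a student does next after each grade.
    POINTS = {"M": 3, "R": 2, "I": 1, "F": 0}

    def next_step(grade):
        """What happens after a grade is assigned on a standard."""
        if grade == "M":
            return "done: the standard is mastered"
        if grade == "R":
            return "revise the same problem to bring the grade up to an M"
        return "attempt a new problem for this standard"  # an I or an F

    for g in "MRIF":
        print(g, POINTS[g], "->", next_step(g))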

Most SBG practitioners go all-in with this assessment method: it's a flipped classroom, with daily quizzes tied to learning outcomes (a.k.a. standards). Student grades are based on the number of standards they've mastered by the end of the course. I didn't go quite so far, in part because I didn't think it would be received well by my colleagues (or the administration) if I took a large first year service course and taught it without tests or a final exam. Instead, I went with the following breakdown of assessed work:

  • 20% for in-class exercises (tied to standards)
  • 10% for online homework assignments (WeBWorK)
  • 7% each for 5 chapter tests
  • 35% for a cumulative final exam

The tests were two-stage tests: students wrote first individually (worth 5%) and then as a group (worth 2%, unless their individual grade was higher, in which case the individual grade was counted instead). The final exam was a traditional three-hour cumulative exam. Students knew in advance that test and exam questions would be tied to the standards. This gave them two study resources for the final exam: their tests, and their in-class exercises.
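
Putting the weights together, a rough sketch of how a course grade comes out of this breakdown looks something like the following. All grades are on a 0-1 scale, and this is just an illustration of the weighting, not the actual gradebook code.

    def test_component(individual, group):
        # One chapter test: 5% individual, plus 2% for the group stage,
        # using the individual grade instead when it was higher.
        return 0.05 * individual + 0.02 * max(group, individual)

    def course_grade(standards, webwork, tests, final_exam):
        # tests is a list of five (individual, group) pairs.
        return (0.20 * standards
                + 0.10 * webwork
                + sum(test_component(i, g) for i, g in tests)
                + 0.35 * final_exam)

    # For example: strong standards work, five middling tests, a decent final.
    tests = [(0.8, 0.9), (0.7, 0.85), (0.9, 0.95), (0.6, 0.8), (0.75, 0.9)]
    print(course_grade(0.95, 0.9, tests, 0.7))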


Logistical notes


I mentioned already that this was a lot of work. How much? Here is a picture of all the work I collected by the end of term. This includes in-class assignments, tests, and the final exam. All the work is still sitting in my office, because everything was scanned and uploaded to Crowdmark and graded online. Without the efficiencies delivered by Crowdmark, I don't think it would have been possible to do this in a class of over 200. (More on using Crowdmark for SBG below.)

The other thing that made this possible was that I had some external funding from PIMS via the Callysto project, which I'm involved with as the Lethbridge site supervisor. (This money was intended as a sort of grading relief, to free up more of my time for the Callysto project. It's possible I ended up making more work for myself in the end, but at least, as intended, I didn't have to grade the tests myself.)
I used this money to hire a recent graduate from our Masters program, who was biding his time until heading down to Australia to do his PhD. (Good luck, Forrest!) His ability to use Crowdmark to quickly and efficiently provide feedback on student submissions (around 600 pages per week on average, not counting revisions!) is the only reason we were able to make this work.


The course, in action

Students had two main learning resources available to them: an open textbook, and a YouTube channel with screencast videos I created that cover all the content in the course. (I did the videos for my 2017 offering of the course.) In class, we'd usually do a brief discussion, followed by a short lecture portion where I did examples. The last 30 minutes was reserved for students to work on their standards. There were two of us in the room to provide help during this portion of the class, but with as many as 100 students in the room, getting to everyone was a challenge. (The afternoon section became fairly self-sufficient by the midpoint of the semester, with students relying on each other for help first. This never materialized in the morning section.)

I would post ahead of time on our course website to let students know which standards they'd be able to work on in each class, so that (in theory) they could prepare accordingly. In class, I would give students a list of around 6 standards to choose from, with the intent that they would choose the one or two they were best prepared for, and then they would submit their work at the end.
Of course, many chose not to prepare; in at least some cases, this was based on the belief that lecture should be the first place one encounters a topic. Despite my best efforts to explain the system, not everyone understood. Many were under the impression that they had to complete all six problems rather than one or two, and were upset both about the lack of time and about the fact that I wouldn't let them simply copy down the problems to be done as homework. This was probably the biggest battle I had to fight with the students.

An aside: given the class size, printing costs became an issue. I think many people who use SBG issue each student an individualized quiz at the start of each class, with problems selected based on their progress to date. I didn't have time to individualize, and I needed to use as little paper as possible. The best solution I came up with was to give each student an exercise booklet, with one page for each standard. New problems were provided each class on a rolling schedule: each day, one or two new standards would be added, and one or two would be dropped, with a typical standard available for three classes. This contributed to some of the confusion, since many students were unwilling to submit incomplete work (despite the promise that they'd be allowed to complete it as a revision), so they would copy down the question, take it home, and submit it the next class. I had to start enforcing the requirement that they do the problem I was providing that day, and not the one from the day before. (This led to a certain amount of resentment among some students...)
In cases where students got behind, or weren't able to complete a standard before I stopped providing problems in class, I allowed them to drop by during office hours to request a new problem.
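
In case it helps anyone planning something similar, the rolling schedule is easy to sketch in code. The parameters match the typical numbers above (one or two new standards per class, each available for three classes), but the code itself is just illustrative.

    WINDOW = 3      # number of classes a standard stays available
    PER_CLASS = 2   # new standards introduced each class ("one or two")

    def rolling_schedule(standards, num_classes):
        """Which standards are on offer at each class meeting."""
        schedule = []
        for day in range(num_classes):
            start = day * PER_CLASS
            # everything introduced within the last WINDOW classes is still open;
            # once the schedule is rolling, that's about six standards per class
            offered = standards[max(0, start - (WINDOW - 1) * PER_CLASS) : start + PER_CLASS]
            schedule.append(offered)
        return schedule

    # e.g. 20 standards spread over 10 classes
    for day, offered in enumerate(rolling_schedule([f"S{i}" for i in range(1, 21)], 10), start=1):
        print(day, offered)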

After class, I would scan all the collected work, and upload it to Crowdmark. My grader knew when to expect new work, and was generally very good about getting it graded quickly, so that students could determine what revisions needed to be done for the next class.


Results and comparison


For reference, I taught the same course in Fall 2017. Again, we had two sections, and about the same number of students. In Fall 2017, I also had videos available for viewing before class, but the entire class time was used for examples and discussion (more of a lecture format). We also used two-stage testing in Fall 2017. The main difference was that instead of 20% of the grade coming from in-class standards, we had 10% of the grade for written take-home assignments, and 10% of the grade for tutorial assignments. Results for Fall 2017 were as follows:

Math 1560, Fall 2017


  • Class average: 73%; class median: 75.5%
  • Exam average: 61%; exam median: 64%
  • Grade distribution:
    • A+: 13 students (cutoff: 95%)
    • A: 36 students (cutoff: 87%)
    • A-: 12 students (cutoff: 84%)
    • B+: 16 students (cutoff: 82%)
    • B: 36 students (cutoff: 75%)
    • B-: 9 students (cutoff: 72%)
    • C+: 11 students (cutoff: 69%)
    • C: 15 students (cutoff: 64%)
    • C-: 18 students (cutoff: 61%)
    • D+: 10 students (cutoff: 57%)
    • D: 18 students (cutoff: 48%)
    • F: 18 students (4 did not write the exam)

Math 1560, Fall 2018


  • Class average: 78.5%; class median: 83.5%
  • Exam average: 72.5%; exam median: 75.5%
  • Grade distribution:
    • A+: 23 students (cutoff: 97%)
    • A: 41 students (cutoff: 91%)
    • A-: 16 students (cutoff: 88%)
    • B+: 17 students (cutoff: 85%)
    • B: 24 students (cutoff: 79%)
    • B-: 9 students (cutoff: 76%)
    • C+: 12 students (cutoff: 73%)
    • C: 14 students (cutoff: 67%)
    • C-: 9 students (cutoff: 64%)
    • D+: 7 students (cutoff: 61%)
    • D: 8 students (cutoff: 55%)
    • F: 23 students (5 did not write the exam)

Overall, grades were higher, which I expected. This is why the letter grade cutoffs for 2018 were mostly 3 to 4 percent higher than in 2017. I really thought I was safe with a 97% threshold for an A+, but even at 98% I would have had a dozen A+ students! Allowing revisions made it easier to earn high grades on standards, but in 2017, tutorial grades were essentially participation-based, and probably averaged slightly higher than the grades on standards. The number of A grades is up sharply, while B grades are down. (I'd like to think that this system allowed B students to become A students.) The number of grades in the C- to D range is also down, which is good, but the number who failed was actually slightly worse. (If the passing grade were dropped from 55 to 50, I'd have the same number of F grades in both years, while still having fewer D grades.)

The biggest pleasant surprise was the shift in final exam grades. The 2017 and 2018 exams were very similar in terms of length, format, types of questions, and difficulty. The average and median exam grades in 2018 were both 11.5 percentage points higher than in 2017.

Conclusions... sort of


The exam results alone would be enough to convince me to try this again, or even push things a little further towards a fully flipped classroom. (I'd even be tempted to flip this right over, and use the tutorial as a lecture to introduce concepts, and run the classes entirely hands-on!) Sadly, logistics, money, and reality will probably get in the way. First, I don't ever expect to have external funding for grading support again (or a highly capable grader looking for something to do before leaving to do his PhD). Second, future access to Crowdmark is doubtful. We haven't been able to secure institutional funding for it, and instructors can't be expected to cover the cost out of their own pockets. Finally, even with everything falling into place this semester, this was still a lot more work than last year, even accounting for the fact that I took time last year to record all the videos! If I try it again, it will be with a much smaller class.

Thoughts on SBG and Crowdmark

We ran the in-class standards exercises as an "administered assessment". This means that I created booklets in advance, and produced QR-coded copies for each student, as one would for a test done using Crowdmark. The difference is that a test is collected all at once, while the standards were collected bit-by-bit throughout the semester. This isn't exactly how the system is designed to work, but it functioned all right. Of course, students lost or destroyed their booklets, but once everyone was matched to their booklet, I was able to take the big PDF from Crowdmark, split it into individual booklets, and give each student access to their booklet on Moodle, in case they needed to reprint any part of it.
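
Splitting the combined PDF is straightforward to script. Here is a minimal sketch of the sort of thing I mean, assuming every booklet has the same fixed number of pages; the file names and the choice of pypdf are just for illustration, not necessarily how it was done.

    from pypdf import PdfReader, PdfWriter

    PAGES_PER_BOOKLET = 30  # hypothetical booklet length

    reader = PdfReader("all_booklets.pdf")
    num_pages = len(reader.pages)
    for start in range(0, num_pages, PAGES_PER_BOOKLET):
        writer = PdfWriter()
        for i in range(start, min(start + PAGES_PER_BOOKLET, num_pages)):
            writer.add_page(reader.pages[i])
        # one output file per student booklet
        with open(f"booklet_{start // PAGES_PER_BOOKLET + 1:03d}.pdf", "wb") as out:
            writer.write(out)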

One big letdown was that we were not able to get LTI integration with Moodle working this time, apparently due to some changes in our server settings that we weren't able to sort out. (This was on us, not Crowdmark. Over the summer there was hope that Arts and Sciences would find money for a Crowdmark site license, so we waited for a decision. And waited, until the semester was almost upon us. At the end of August I concluded that no money was forthcoming and decided to proceed on my own. But by the time everything was set up, it was already September, and when we hit technical issues, everyone was too busy to sort them out.)
Lack of LTI meant that students needed to keep their email link, and set up an account. To add to the confusion, since we had LTI the year before, students were still presented with an option to sign into Crowdmark using their U of L credentials, which didn't work!
In hindsight, I should have proceeded on my own much earlier in the summer. Had I been able to get everything set up correctly ahead of time, I would have been able to avoid a lot of the extra work I wound up dealing with.


What worked well

As usual, efficient grading was a life-saver. I set things up so that each standard was a question worth 3 points (according to the scale described above). My grader was able to go through one standard at a time, quickly give a grade, and add feedback. Being able to reuse comments meant that feedback could be more detailed than on paper. (However, we intentionally left things somewhat vague so that students had to figure out what they needed to fix.)

Most students who needed to do revisions were able to print off pages with a grade of 2, make their changes right on the page, and hand it back in. If they needed more room, they could always reprint the blank page using their copy on Moodle. Some students lacked access to (or money for) printing. Fortunately, there was an easy workaround: students could use regular notepaper, and copy down the human-readable hexadecimal serial number from the top of the page. (Entering the codes manually was a bit of work, but manageable.)

The system was perfect for students who wanted to do their revisions during office hours. I could pull up their booklet on Crowdmark, let them tell me what needed to be corrected, and then instantly update the grade. I'd be tempted to insist that all revisions be done this way if not for the logistics of getting 200 students into office hours.

Best of all was probably the fact that I could upload work on Tuesday after class, and students would have access to their graded work online by the end of the day on Wednesday, so they knew if they needed to redo a standard in Thursday's class.

This isn't related to SBG, but it also bears mentioning: if the grad student assigned to grade your tests leaves to visit family overseas and doesn't come back, they don't get out of their assignment, since grading is online and can be done from anywhere!


What didn't


Aside from the LTI issues mentioned above, the main trouble was that I was using a format intended for a one-time assessment (namely, tests), and applying it to an ongoing assessment. One of the big limitations that resulted from this is that the "Send grades to students" button can only be used once. In this case, as soon as a student had submitted the cover page (so I could match them to their booklet) and at least one page of work, I would release the grades to them; this gave them access to their graded work, so they could find out whether any revisions were needed. It would have been nice to be able to repeat the email periodically, to remind them that new graded work was available.
I think this would have been a non-issue with LTI, since students would have the link to their standards in Moodle, and would quickly get into the habit of checking it before each class.

Managing revisions was the biggest challenge. At first, I thought that filtering evaluations to grades less than 3 would let us quickly scan through, spot revised work, and update it. This quickly got out of control, since updating a grade on Crowdmark results in both the old and new grades being stored -- if they are done by different graders -- and both show up in the filtering. (I had a second student marker in charge of checking the revisions.)

It turned out that there was an easy low-tech solution: have students write down their booklet number, and the numbers of any standards they were submitting revisions for, on a sheet of paper. I could then pass that list on to my markers, so they knew which pages to regrade.
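
If one wanted to digitize that step, it would take only a few lines. Here is a minimal sketch assuming the handwritten entries get typed into a small CSV; the file and column names are hypothetical.

    import csv
    from collections import defaultdict

    # Each row of the CSV is one revision request: booklet number, standard number.
    worklist = defaultdict(set)
    with open("revisions.csv", newline="") as f:
        for row in csv.DictReader(f):
            worklist[int(row["booklet"])].add(int(row["standard"]))

    # Print the pages to regrade, grouped by booklet.
    for booklet in sorted(worklist):
        print(booklet, sorted(worklist[booklet]))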

The printing interface for students on Crowdmark could be improved. Many of them ran into trouble when they wanted to print out individual pages of graded work so that they could do their revisions. Many resorted to taking screenshots and printing those, and unfortunately, the screenshots were often of such low resolution that neither the QR code nor the human-readable equivalent could be read (by machine or human). This led to a lot more manual processing than was perhaps necessary.
The "student factor" came up a few times with revisions. Probably the most common (and most problematic) was students not realizing that every page was coded differently, and either using the wrong page for a submission, or using the wrong code on a revision (resulting in their work getting uploaded to the wrong place, and in some cases, overwriting existing work).

Students would often get confused when submitting revisions, because once the revision was uploaded, it would replace their original submission, but the original grading and annotations remained in place until a grader was able to get to the revision. But since we were operating well outside the intended use for the platform, it's not surprising that these issues arose.


Wish list musings


In some ways, Crowdmark's assignment submission process, in which students are responsible for uploading their own work, might have worked better. But because I wanted this to be work done in class, I wanted students handing things in before they left, and since not everyone has a smartphone, it would not have been fair to expect them to upload on the spot.

The best-case scenario would probably build on the administered assessment. Students get QR-coded pages to submit. Printing off work for revisions works well enough that the QR code isn't corrupted on reprinting (or at least, is corrupted less frequently), so that revisions can easily be rescanned. The biggest features, for both students and graders (a rough sketch of what I have in mind follows the list), would be if:

  1. page uploads had a "history" feature, so that instead of revisions replacing the original submission, they supplement it.
  2. there was some means of indicating when a page that is already graded gets modified (i.e. because a revision has been added). This would let graders easily keep track of what work needs to be revised. Perhaps a green "graded" square in the overview could turn yellow to indicate a "revised" status.
  3. there was an indicator for students when new graded work was available, so they do not have to go in and look through the whole assessment for updates.
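
To make the first two items concrete, here is a purely hypothetical sketch of the kind of page record I have in mind. This is not how Crowdmark actually stores things; it is just an illustration of the behaviour I'd want.

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class Page:
        uploads: List[str] = field(default_factory=list)  # history, oldest first
        grade: Optional[int] = None
        status: str = "ungraded"  # ungraded / graded / revised

        def add_upload(self, scan):
            self.uploads.append(scan)        # keep the old version (item 1)
            if self.status == "graded":
                self.status = "revised"      # flag the page for the grader (item 2)

        def set_grade(self, points):
            self.grade = points
            self.status = "graded"

    page = Page()
    page.add_upload("scan_v1.png")
    page.set_grade(2)                 # an R
    page.add_upload("scan_v2.png")    # student submits a revision
    print(page.status, page.uploads)  # revised, with both versions kept
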
I'm still not sure I'd ever try SBG again in a class of 200, but I think something like Crowdmark -- with a bit of tweaking -- could make it feasible, if the marker budget is there.

In a perfect world, all this gets tied into something like Steven Clontz's SBG app. We build a big bank of appropriate problems for each standard, and as grades are fed into the system, it generates the next set of problems for each student. But in a big class with limited resources, I'm still not sure what the best method is for the students' solutions (presumably done on paper) to be fed back into the system for grading.
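
I don't know the internals of that app, but the selection step itself is simple enough to sketch. Something like the following would do, with entirely hypothetical names and data, and grades on the 0-3 scale from earlier.

    import random

    def next_problems(grades, bank, used):
        """Pick an unused problem for each standard the student hasn't mastered.

        grades: {standard: points on the 0-3 scale}
        bank:   {standard: list of problem ids}
        used:   {standard: set of problem ids already assigned to this student}
        """
        assignment = {}
        for standard, points in grades.items():
            if points >= 3:
                continue  # mastered; nothing more to assign
            available = [p for p in bank.get(standard, [])
                         if p not in used.get(standard, set())]
            if available:
                assignment[standard] = random.choice(available)
        return assignment

    bank = {"limits": ["L1", "L2", "L3"], "chain rule": ["C1", "C2"]}
    grades = {"limits": 2, "chain rule": 3}
    print(next_problems(grades, bank, {"limits": {"L1"}}))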