# dividing up a large class into discussion sections using integer programming

I am team teaching a freshman course on engineering grand challenges with five other instructors. Each of the six instructors teaches a “theme” (mine is about systems, critical infrastructure, and logistics) to a subgroup of students. We must partition the students into six themes twice, since we repeat our themes in the second half of the semester with another group of students (we call these modules). To facilitate the assignments, students can submit a rank ordering of their top four choices. I was asked to use optimization to make the pairings. I thought about manually making the matches, but I could see that it would be a lot of work to redo the matchings if someone wanted to make some changes later on. So I put together a quick optimization model to do the work for me.

Here are the constraints:

• Theme class sizes must be between 21 and 24 students
• Each student should get one of their top two choices (this is not guaranteed)
• Every student needs to be assigned to a theme in each module
• Every student needs to be assigned to different themes across the two modules
• Two groups of students are in first year interest groups (FIGs) and should all be assigned to the same first theme, regardless of preferences. This is a hard-but-flexible constraint (is that a thing?). These students needed to be assigned to the same theme, but I could choose any theme for them.

I had about two days to make the assignments and possibly field some requests to change the assignments. Integer programming was appealing because it would allow me to quickly make changes to the assignments and do a “what if” analysis. I could quickly see that it was going to be impossible to fill some of the themes based on student preferences. Luckily, some students did not submit preferences, so I could use these students to “pad” the assignments while maintaining a good solution. The objective function captures the quality of the assignments based on student preferences, so I would always have a feasible solution. I assigned weights for each student-theme pair as the objective function coefficients. I assigned the first choice theme a weight of 8, the second choice theme a weight of 4, a third choice theme a weight of 2, and a fourth choice theme a weight of 1. I assigned all other student-theme pairs a weight of zero. The goal was to mostly assign students to first and second preferred choices, so I chose weights that would “encourage” this. After solving the model I could easily check to see how many students were assigned to one of their top two matches (we promise to do this, if its possible).

The FIGs was a bit tricky. This constraint made it hard to eyeball a good solution and motivated the use of integer programming. I wanted to be able to manually test out different theme assignments for the FIGs and possibly have the flexibility to change the assignments really quickly if one of the instructors did or did not want a FIG. I manually selected themes for the FIG students by fixing variables and then optimized the assignment of everyone else. I could have treated the FIG assignments as a decision instead of an input.

Here are the parameters:

• N = 138 students
• T = 6 themes
• M = 2 modules
• $w_{nt}$: preference weight for student n and theme t (either 0, 1, 2, 4, or 8).

Here are the binary decision variables:

• $x_{ntm}$: 1 if student n is assigned to theme t in module m.

The integer programming model is as follows.

maximize $\sum_{n,t,m} w_{nt} x_{ntm}$

subject to $21 \le \sum_n x_{ntm} \le 24, t=1,...,T,\ m=1,2$ $x_{nt1} + x_{nt2} \le 1, n=1,...,N, t=1,...,T$ $\sum_t x_{ntm} = 1,\ i=1,...,N,\ m=1,2$ $x_{ntm} \in \{0,1\}, n=1,...,N,\ t=1,...,T,\ m=1,2$

The objective function maximizes the value of the assignments. The first constraint sets class sizes between 21 and 24. The second constraint ensures that a student is not assigned to the same theme across both modules. The third constraint ensures that each student is assigned to one of the themes in both modules. I do not require the model to assign students to one of their top two choices, and therefore, a feasible solution is always easy to find. However, I checked afterward to see if students got what they wanted (according to my results, they did!). What isn’t in this model is fixed variables associated with FIG student assignments, but that is straightforward to change.

The model was easy to set up except for setting the weights $w_{nt}$ based on student preferences. This information came from a survey we conducted on the course management system. Downloading the data in a spreadsheet did not give us a flat file, so it took some extra parsing to get the right data. We would have needed to parse the survey data even if we made the assignments manually, so this was unavoidable. I would have liked to ensure diversity in teams somehow, but I did not have the data for this. Next time.

All in all, this was fun, and I would recommend the use of integer programming. It took me longer to write this post than to set up and solve the integer program to assign students to themes 🙂

#### 8 responses to “dividing up a large class into discussion sections using integer programming”

• Greg Glockner (@gglockner)

How well did it solve? This would also lend itself well to column generation methods.

• Laura Albert McLay

It solved really quickly. I popped it into Matlab’s IP solver and it even solved quickly there too (I was surprised). I had 138*6*2 = binary 1656 variables (not many) and some symmetry.

• prubin73

I wonder if constraint programming might do as well on this problem? Also, why the 138 in 138*6*2 binary variables. Wouldn’t all the students in each FIG share a single n index?

• Laura Albert McLay

It probably would, but the overarching goal of the project was to save me time, so I did what was quick and easy for me.

And yes, to be completely accurate, the actual variables was smaller than 1656 in this case because I fixed the FIG student decision variables.

• matforddavid

Did you consider treating this as a network flow problem? I did similar problems of student assignment that way using the out-of-kilter algorithm – the need to do two modules might mean solving two separate problems, and the interaction between the flow problems might be a little tricky.

• Ryan J. O'Neil

Could you post (a version of) the data? I’d kind of like to see how well a CP solver does on it.

• opensolverblog

My students Oscar Dowson and Michael Fairley created a SolverStudio Excel model using PuLP & CBC to partition 571 engineering students into project groups that were balanced by gender, ethnicity, major/specialisation and academic ability. They were in the class being partitioned, and so did this modelling work because they wanted to be treated fairly. Their model is now used annually by the course organisers; being in Excel (and not Matlab!) made this adoption much easier.
Their model is freely available at: https://github.com/odow/group-allocator
There is a paper describing this is at: https://secure.orsnz.org.nz/conf48/program/Papers/nzsaorsnz2014_paper_48.pdf
PS: They won an ORSNZ Young Practitioner Prize for their contribution.

• prubin73

Some years back I wrote an open source app to form teams (http://paritybuilder.sourceforge.net/), which ended up turning into a journal article (http://www.tandfonline.com/doi/abs/10.1080/0740817X.2014.953643#.Vgqk-Je49ko). It handles a few things you didn’t mention but does *not* handle the between-modules constraints. The interesting thing was that, after starting with a MIP model and migrating to a genetic algorithm (the underlying problem being somewhat nonlinear), I found that a very basic problem-specific heuristic (similar to 2-opt) did as well if not better, with less bulky code. As a MIP guy, I was crushed. 🙂