
higher ed meets analytics

The Wall Street Journal reports that legislation will be introduced to identify which universities provide the most bang for the buck in terms of students’ employment prospects.  See more here. With college tuition debt surpassing 1 trillion dollars and unemployment levels being relatively high after graduation, this is a serious issue. It’s alarming to know that employment information is not generally available to prospective students when they select a university.

There are three serious challenges in identifying “valuable” universities or programs:

1. Identification of meaningful metrics.

2. Data collection and analysis to support the metrics in 1.

3. Communication of what is learned to prospective students.

I’m not going to discuss metrics (challenge #1) directly in this post, but it’s a serious issue. After all, bad rubrics can lead to an explosion in the majors that score well on those rubrics, whether it’s justified or not. Let’s assume that we can agree on some basic metrics, e.g., student loan default rates.

Instead, I’ll discuss the challenges in data collection and communication, since these are core analytics problems. No progress can be made toward recognizing valuable universities and degree programs without analytics. How should a university collect data, analyze the data, and report the results to prospective students?

Data collection is a problem for most universities: the students who report back salary information are usually those who have jobs, so it’s almost impossible to infer what unemployment rates are. Some states, including my present home state of Virginia [Link], are collecting information about students on a large scale, so missing data will not be such a big problem. But there is still room for improvement:

Last year, Virginia lawmakers began requiring the State Council of Higher Education for Virginia to produce annual reports on the wages of college graduates 18 months and five years after they receive their degrees. Beginning this year, the reports must also include average student loan debt.

The state data have shortcomings. Paychecks for the same job can vary widely by location. Salary data don’t reflect self-employed graduates or those who work for the U.S. government or move to another state.

More analytics questions: How should data be analyzed regarding graduates who have gone on to graduate school? Students who are self-employed? What if many graduates go on to live in a city with a high cost of living and are paid more than their peers who live in more affordable places?

Employment rates are a function of major as well as university (as well as other factors, of course). Assessing by major and university introduces new challenges. Small programs–like entomology and maybe operations research programs–are going to be hard to assess. They will likely be sensitive to outliers that can skew expected values and missing data. We may not be able to say a whole lot about entomology majors at a university due to too few data points. Can we infer whether this major is a good investment based on other factors?
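One way to think about inferring from "other factors" is to shrink a small program's observed rate toward a pooled rate, such as the university-wide rate. Here is a minimal sketch of that idea using a simple beta-binomial-style estimate. The function name, the prior_strength parameter, and all of the counts are made up for illustration; they are not real data, and this is only one of many ways to handle small samples.

```python
def shrunk_rate(successes, n, pooled_rate, prior_strength=20.0):
    """Blend a program's observed rate with a pooled rate.

    prior_strength acts like a number of "pseudo-graduates" who behave
    like the pooled population (a hypothetical tuning parameter).
    With few observations the estimate stays near pooled_rate; with
    many observations it approaches successes / n.
    """
    if n < 0 or successes < 0 or successes > n:
        raise ValueError("need 0 <= successes <= n")
    return (successes + prior_strength * pooled_rate) / (n + prior_strength)


# Hypothetical numbers: 2 of 4 entomology graduates employed,
# versus 200 of 400 graduates of a large program, with a
# university-wide employment rate of 90%.
small_program = shrunk_rate(2, 4, pooled_rate=0.9)
large_program = shrunk_rate(200, 400, pooled_rate=0.9)
print(f"small program: {small_program:.3f}")
print(f"large program: {large_program:.3f}")
```

Both programs have the same raw rate of 50%, but the small program's estimate is pulled most of the way toward the pooled rate because four data points tell us very little.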

All of the tools I looked at for evaluating universities reported a single metric that reeked of expected values. There are few attempts to report a range or the uncertainty with the single metric. I suspect that this can be improved. Online retailers have replaced the average rating (based on 1-5 stars) with a confidence level based on the total number of reviews submitted. The confidence level is still conveyed as a single scalar value, but it’s more meaningful. How confident are we about the few entomology majors?
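To make the confidence idea concrete, here is a small sketch using the Wilson score interval for a proportion, one common way to account for sample size. I am not claiming this is what any particular retailer or ranking tool uses. The function name and the counts below are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%).

    Returns (lower, upper). With no observations we know nothing,
    so the interval is the whole range [0, 1].
    """
    if n == 0:
        return (0.0, 1.0)
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(
        p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)
    ) / denom
    return (max(0.0, center - half_width), min(1.0, center + half_width))


# Hypothetical counts: both programs report 90% of graduates employed,
# but one has 10 graduates in the data and the other has 1000.
for label, employed, total in [("small program", 9, 10),
                               ("large program", 900, 1000)]:
    low, high = wilson_interval(employed, total)
    print(f"{label}: {employed}/{total} employed, "
          f"interval {low:.2f} to {high:.2f}")
```

Reporting the lower end of the interval gives a single scalar that penalizes small samples, which is the spirit of the retailer-style confidence rating. Reporting both ends conveys the range.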

Another concern with a single metric is that it does not convey what has happened over time. I see room for analytics here to recommend, say, when it’s worthwhile to consider law school again after there has been a substantial decline in law school admissions [Link]. This is a big issue since loan default rates (one possible metric) have gone up everywhere over the last few years, but at different rates (see this college, where default rates increased from 10% to 20% in three years; others are less troubling). Trends over time are important.

In terms of conveying information, this is hard to do at a university-wide level. Having said that, I like the chart below of public 4-year universities in Illinois. Having grown up in Illinois and knowing quite a bit about the public universities, I would personally rank the public universities there in ascending order of their student loan default rates, regardless of major.

I’m less inclined to do so in Virginia, where some metrics such as average salary can be misleading. For example, George Mason University graduates can earn quite a lot because they often get jobs in DC, where cost of living is through the roof. They are not necessarily better off than students who get jobs at, say, Virginia Tech.
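As a rough illustration of why raw salaries can mislead, here is a sketch that deflates a nominal salary by a cost-of-living index. The function name, the index values (100 = a reference location), and the salaries are all hypothetical. A real analysis would need an actual regional price index and would have to decide where each graduate actually lives.

```python
def cost_of_living_adjusted(salary, cost_of_living_index):
    """Express a salary in reference-location dollars.

    cost_of_living_index is relative to a reference location at 100,
    so an index of 140 means prices are 40% higher than the reference.
    """
    if cost_of_living_index <= 0:
        raise ValueError("index must be positive")
    return salary * 100.0 / cost_of_living_index


# Hypothetical graduates: one in an expensive metro area,
# one in a more affordable town.
metro = cost_of_living_adjusted(70_000, 140)
town = cost_of_living_adjusted(55_000, 95)
print(f"metro graduate, adjusted: {metro:,.0f}")
print(f"town graduate, adjusted:  {town:,.0f}")
```

With these made-up numbers, the graduate with the lower paycheck comes out ahead after adjustment, which is exactly the kind of reversal an average-salary metric hides.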

Conveying information at the university level may be too coarse. I’ve checked out quite a few online tools for assessing the quality of different universities. The level of aggregation is sometimes alarming. This online tool does very coarse ratings at the state level. This is meaningless, because there are bad and good places to get a degree in every state, and they should not be aggregated. Still, some aggregation is necessary. This is an area where analytics can be useful: at what level should we report outcomes? At the state level, university level, college level, department level, or some other level (e.g., different regions or industries where graduates may get jobs)?

What role do you think analytics will – or should – play in evaluating universities?

Student loan default rates in Illinois

big data and operations research

Sheldon Jacobson and Edwin Romeijn, the OR and SES/MES program directors at NSF, respectively, talked about the role of operations research in the bigger picture of scientific research at the INFORMS Computing Society Conference in Santa Fe last week. Quite often, program managers at funding agencies dole out advice on how to get funded. This is useful, but it doesn’t answer the more fundamental question of why they can only fund so many projects.

Sheldon and Edwin answered this question by noting that OR competes for scientific research dollars with every other scientific discipline. One way to both improve our funding rates and to give back to our field is to make a case for how operations research should get a bigger slice of the research funding pie.

Sheldon specifically mentioned OR’s role in “big data.” Most of us work or do research where data plays an integral role, and it seems like this is a great opportunity for our field. I’ve been thinking about the difference between “data” and “big data” in terms of operations research. Big data was a popular term in 2012, even though there is no good definition of how “big” or diverse the data must be before the data become “big data.” NSF had a call for proposals for core techniques for big data. The call summarized how they define big data:

The phrase “big data” in this solicitation refers to large, diverse, complex, longitudinal, and/or distributed data sets generated from instruments, sensors, Internet transactions, email, video, click streams, and/or all other digital sources available today and in the future.

I like this definition of big data, since it acknowledges that the challenges do not only lie in the size of the data; complex data in multiple formats and data that change rapidly are also included.

I ultimately decided not to write a proposal for this solicitation, but I did earmark it as something to think about for the future. This call required the innovation to be on the big data side, meaning that projects that utilize big data in new applications would not be funded. Certainly, OR models and methods benefit from a data-rich environment, since it leads to new OR models and methods. Here, data is mainly used as a starting point from which to explore new areas. But this means that there is no innovation on the big data side. Instead, the innovation will be on the OR side. Does big data in OR mean that we will continue to do what we have been doing well, just with bigger data?

This is an open question for our field: how will big data fundamentally change what we do in operations research?

My previous post on whether analytics is necessarily data-driven and whether analytics includes optimization can be viewed as a step towards an answer to this question. But I’m not close to coming up with an answer to this question. Please let me know what you think.