Tag Archives: data

big data and operations research

Sheldon Jacobson and Edwin Romeijn, the OR and SES/MES program directors at NSF, respectively, talked about the role of operations research in the bigger picture of scientific research at the INFORMS Computing Society Conference in Santa Fe last week. Program managers at funding agencies often dole out advice on how to get funded. This is useful, but it doesn’t answer the more fundamental question of why they can fund only so many projects.

Sheldon and Edwin answered this question by noting that OR competes for scientific research dollars with every other scientific discipline. One way both to improve our funding rates and to give back to our field is to make the case that operations research should get a bigger slice of the research funding pie.

Sheldon specifically mentioned OR’s role in “big data.” Most of us work or do research where data plays an integral role, and it seems like this is a great opportunity for our field. I’ve been thinking about the difference between “data” and “big data” in terms of operations research. Big data was a popular term in 2012, even though there is no good definition of how “big” or diverse the data must be before it becomes “big data.” NSF had a call for proposals for core techniques for big data. The call summarized how they define big data:

The phrase “big data” in this solicitation refers to large, diverse, complex, longitudinal, and/or distributed data sets generated from instruments, sensors, Internet transactions, email, video, click streams, and/or all other digital sources available today and in the future.

I like this definition of big data, since it acknowledges that the challenges do not lie only in the size of the data; it also covers complex data in multiple formats and data that changes rapidly.

I ultimately decided not to write a proposal for this solicitation, but I did earmark it as something to think about for the future. This call required that the innovation be on the big data side, meaning that projects that merely use big data in new applications would not be funded. Certainly, OR benefits from a data-rich environment, since it leads to new OR models and methods. Here, data is mainly used as a starting point from which to explore new areas. But this means that there is no innovation on the big data side; instead, the innovation is on the OR side. Does big data in OR mean that we will continue to do what we have been doing well, just with bigger data?

This is an open question for our field: how will big data fundamentally change what we do in operations research?

My previous post on whether analytics is necessarily data-driven and whether analytics includes optimization can be viewed as a step toward an answer, but I’m not close to having one. Please let me know what you think.

grocery store market share

Thanks to all for your positive feedback about my post about breakfast cereals.  My mom said that she was flattered, especially after a community of OR geeks admitted that she was onto something good with her breakfast cereal rule.

A recent article in the local paper generated a lot of discussion between my husband and me.  The article reported the market share of different grocers in the area.  We immediately thought of the big three in our area (Ukrop’s, Food Lion, and Kroger).  We suspected that together they held the vast majority of the market (like 80%).

We were wrong! A portion of the market share is as follows (I am going off of memory for the ones not in boldface):

  • Food Lion 19.34%
  • Ukrop’s 17.58%
  • Wal-Mart 12.14%
  • Kroger 11.38%
  • CVS 6%
  • 711 3%
  • Walgreens 3%

First of all, the top grocery store has less than a 20% market share.  We observed that Ukrop’s dominated the market (it is *the* grocery store in Richmond), but even when they led the market, they had less than 20% of the market share.  (Incidentally, the fictional Kay Scarpetta shops at Ukrop’s.)  We are loyal Kroger shoppers (they have a healthy OR department and, more importantly, great deals on day-old bread) and assumed they were second or third, but they are actually fourth on the list.  The top three combine for only about half of the market share.
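For anyone who wants to check the arithmetic, here is a minimal Python sketch that ranks the shares listed above and adds up the top three. It assumes the numbers exactly as written in the list; the last three are from memory and the list is only partial, so the totals are rough. The variable names are just for illustration.

```python
# Market shares in percent, copied from the list above.
# Assumption: the last three values are from-memory approximations,
# and the list does not cover the whole market.
shares = {
    "Food Lion": 19.34,
    "Ukrop's": 17.58,
    "Wal-Mart": 12.14,
    "Kroger": 11.38,
    "CVS": 6.0,
    "711": 3.0,
    "Walgreens": 3.0,
}

# Rank the stores from largest to smallest share.
ranked = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)

top_three_total = sum(share for _, share in ranked[:3])
kroger_rank = [name for name, _ in ranked].index("Kroger") + 1
listed_total = sum(shares.values())

print(f"Top grocer: {ranked[0][0]} at {ranked[0][1]:.2f}%")
print(f"Top three combined: {top_three_total:.2f}%")
print(f"Kroger rank: {kroger_rank}")
print(f"Share covered by the listed stores: {listed_total:.2f}%")
```

The top three come to roughly 49%, which is the "about half" mentioned above, and the seven listed stores together cover only about 72% of the market.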

We were amazed by the diversity of places where people buy their groceries.  Convenience stores and drugstores account for something like 20% of the market share.  We could not imagine buying 20% of our groceries at drugstores (even though Walgreens has wicked good deals on nuts and dried fruit).  But if you don’t have a car, the drugstore may be the only convenient place to shop.

I did an exercise in one of my classes in which students assigned probabilities to things that occur “frequently” and “rarely” to get an idea of how we quantify normal and rare events.  The estimates were all over the map.  It depends on your background: people who study rare events may consider an event with a probability of 1% to be “frequent.”  But even within an area, I bet the guesses are all over the place.  In time, I would like to assemble a collection of surprising data sets to use in my classes, and I will get there eventually.

The article was a pleasant reminder that real data is often more diverse and messier than we believe.  People are not good at estimating probabilities or proportions (I’ll stop there, since I am not an expert at eliciting expert judgment, and some of you are familiar with how to mitigate some of these problems).

Have you been surprised by looking at real numbers?