A score was incorrectly added to a database that is used to rank football teams in the BCS system, which led to two rank reversals in the rankings (between the #10 and #11 teams and between the #17 and #18 teams). Many sports journalists are talking about all of the implications of the error (What if the error affected the top two ranked teams?!?). Only one of the six data sources is publicly available, so the extent of the data inaccuracies, or of any errors in the actual BCS formula, is not clear.
What is interesting is that this is being reported as a math error, but from what I have read, it is a data entry error. I was actually a little disappointed when I read about what happened, because I initially thought that the BCS algorithm was incorrectly used (it wasn’t, as far as anyone seems to know, but there is room for improvement there).
I have three thoughts:
- Sports journalists apparently do not understand the difference between mathematical formulas and the data used to populate those formulas. Throwing out the BCS formula for ranking teams would not eliminate this type of error, since data from experts (votes, points awarded, etc.) could be incorrectly tabulated and entered into a database. Such errors would be more difficult to spot than a missing score, since they would not be obvious.
- Having incorrect, missing, or inaccurate data is a part of life for many of us who analyze data. In every other type of industry or sector, people make big decisions with inaccurate and incomplete information, and life apparently goes on. When is the BCS data “good enough?” In NCAA football, accurate and complete data should be available for the BCS formula, and as a football fan, I do hope that every effort is made to collect good data. One missing data point isn’t evidence that there is a systematic problem. On the contrary, it sounds like the data set is pretty accurate and complete.
- How can we come to reasonable conclusions about the BCS system when the journalists who are supposed to inform us get the important stuff wrong? This story has become a non-story for me, as I have been reading articles written by people who are not really responding to the issue at hand (one inaccurate data point). There are surely other issues with the BCS system (such as not enough oversight), but replacing the BCS formula with another system would not necessarily imply that there would be more oversight.
The secrecy of the BCS formula seems to be one of the reasons for its unpopularity, yet it produces rankings that are virtually identical to other ranking systems. Is the bias against the BCS a function of how most people don’t understand math well? If the BCS formula were more transparent, would that even be a good thing? Do you consider a single data error a big deal?
But I could be wrong, of course.
December 10th, 2010 at 5:48 pm
From an academic standpoint, it would be nice if the BCS computer ranking methodologies were publicly known. At this time, the Colley Matrix is the only method that is publicly available (and therefore easily reproducible). An irony is that the Colley Matrix is where this data error occurred. The media and general public wouldn’t gain much from openly reproducible methods, but at least the technically inclined would be able to verify the results.
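To make the reproducibility point concrete, here is a minimal sketch of the Colley rating system as it is publicly described: solve C r = b, where each diagonal entry of C is 2 plus the number of games a team played, each off-diagonal entry is minus the number of games between the two teams, and b is 1 + (wins - losses) / 2. The four teams and their results are hypothetical, invented only for illustration; this is not the BCS data set or the actual missing score. The function name `colley_ratings` is also just a label chosen for this sketch.

```python
from fractions import Fraction


def colley_ratings(teams, games):
    """Return Colley ratings as exact fractions.

    teams: list of team names (hypothetical in this sketch).
    games: list of (winner, loser) pairs.
    """
    idx = {t: k for k, t in enumerate(teams)}
    n = len(teams)
    # Start from C = 2I and b = 1, then add each game's contribution.
    C = [[Fraction(2 if i == j else 0) for j in range(n)] for i in range(n)]
    b = [Fraction(1) for _ in range(n)]
    for winner, loser in games:
        wi, li = idx[winner], idx[loser]
        C[wi][wi] += 1
        C[li][li] += 1
        C[wi][li] -= 1
        C[li][wi] -= 1
        b[wi] += Fraction(1, 2)
        b[li] -= Fraction(1, 2)
    # Gauss-Jordan elimination. The Colley matrix is symmetric positive
    # definite, so the pivots are nonzero without row swaps.
    for col in range(n):
        pivot = C[col][col]
        C[col] = [v / pivot for v in C[col]]
        b[col] /= pivot
        for row in range(n):
            if row != col and C[row][col] != 0:
                f = C[row][col]
                C[row] = [rv - f * cv for rv, cv in zip(C[row], C[col])]
                b[row] -= f * b[col]
    return {t: b[idx[t]] for t in teams}


teams = ["A", "B", "C", "D"]
complete = [("A", "C"), ("B", "D"), ("C", "D")]  # C beat D
with_gap = [("A", "C"), ("B", "D")]              # the C-D score was never entered

for label, games in (("complete", complete), ("missing C-D", with_gap)):
    ratings = colley_ratings(teams, games)
    print(label, {t: str(r) for t, r in ratings.items()})
```

In this toy schedule, A and B never play in the C-D game, yet leaving it out changes how they compare: with the complete data A rates 37/56 and B rates 33/56, while with the score missing both sit at 5/8. Anyone with the game results can rerun the calculation and spot that kind of discrepancy, which is the practical benefit of an open method.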
December 11th, 2010 at 2:06 am
Stephen, I couldn’t agree with you more when it comes to having the ranking methodologies publicly known. Cloaking the process in secrecy is unsettling, considering that the BCS allocates millions of dollars to many universities who are in the business of doing things openly and transparently. I find it inconsistent.
December 16th, 2010 at 1:40 pm
If my understanding is right, the data error was that the final score of a Div. II game was not entered into one of the computer polls. How interesting it is that the outcome of a game only remotely related to the Top 10 teams does in fact affect a Top 10 ranking.
I agree it would be nice if the BCS was more open and transparent. Are they trying to deter betting?