The most lucrative single-day prize for any contestant in Jeopardy’s history was indirectly based on data mining and simulation. Roger Craig, who has a PhD in computer science, used data-mining algorithms to train himself on a database of practice questions. His source data was the Jeopardy Archive, which contains every question and answer from past Jeopardy episodes. He parsed the whole site to create one large data set composed of unstructured text data. He then reverse engineered the game to identify which categories of questions to study based on how valuable those questions are in the game. He randomly sampled from the set of practice questions and tried to answer them correctly. His answers were used to predict which questions he would get right or wrong and to identify which subjects to study.
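The details of Craig's pipeline aren't public, but the self-quizzing step he describes can be sketched in a few lines. The clue records, field names, and sample data below are hypothetical stand-ins for whatever he actually parsed out of the archive; the idea is simply to sample questions at random and turn the right/wrong log into per-category accuracy estimates.

```python
import random
from collections import defaultdict

# Hypothetical clue records; in practice each would be parsed from an
# archived game page (category, dollar value, clue text, correct response).
clues = [
    {"category": "FOOD", "value": 200, "clue": "This citrus fruit ...", "response": "What is a lemon?"},
    {"category": "ART", "value": 1000, "clue": "This Dutch painter ...", "response": "Who is Vermeer?"},
    {"category": "FOOD", "value": 400, "clue": "This pasta shape ...", "response": "What is penne?"},
]

def sample_quiz(clues, n, seed=None):
    """Randomly sample n clues for one self-quiz session."""
    return random.Random(seed).sample(clues, n)

def estimate_accuracy(results):
    """Turn a log of (category, was_correct) pairs into per-category accuracy."""
    tally = defaultdict(lambda: [0, 0])  # category -> [right, attempted]
    for category, correct in results:
        tally[category][0] += int(correct)
        tally[category][1] += 1
    return {cat: right / attempted for cat, (right, attempted) in tally.items()}
```

Feeding the accuracy estimates back into the next round of sampling is what lets the process focus on the categories that are actually weak.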
Roger Craig clustered the training questions based on their category and their dollar value. For example, low-valued questions are often about food, whereas high-valued questions are often about art. He then constructed a nonlinear algorithm to identify the optimal “path” to beating the average Jeopardy contestant. The algorithm used simulation, based on the probability that his “predicted self” would get each kind of question correct. He focused on improving his performance on the high-valued questions by studying the topics where his knowledge was shaky.
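Craig's actual model hasn't been published, but the "predicted self" idea lends itself to a simple Monte Carlo sketch: given an estimated accuracy per category, simulate many games and average the score. The board layout, categories, and scoring rule (gain the clue's value when right, lose it when wrong) are assumptions for illustration.

```python
import random

def simulate_game(board, accuracy, trials=1000, seed=0):
    """Monte Carlo estimate of the expected score for a 'predicted self'.

    board:    list of (category, dollar_value) clues.
    accuracy: dict mapping category -> probability of responding correctly.
    Scoring assumption: a correct response gains the clue's value,
    an incorrect one loses it.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        score = 0
        for category, value in board:
            p = accuracy.get(category, 0.5)  # unknown categories: coin flip
            score += value if rng.random() < p else -value
        total += score
    return total / trials
```

Re-running the simulation with slightly improved accuracy in one category gives a rough measure of how much studying that category is worth, which is exactly the quantity an optimizer needs.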
Training for Jeopardy was also a knapsack problem: Roger Craig had a limited amount of time to study. One way to use that time effectively was to cap the time spent on each question. This had the side benefit of preparing him to answer as quickly as possible, improving his odds of being the first to buzz in with the correct response (phrased, as Jeopardy requires, in the form of a question). He then used his algorithm to identify which topics would improve his score the most.
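The knapsack framing above maps directly onto the classic 0/1 knapsack dynamic program: each topic costs some study hours and promises some expected score gain, and the budget is the total hours available. The topic names, hour costs, and gain figures in the usage example are invented for illustration, not Craig's numbers.

```python
def plan_study(topics, budget):
    """0/1 knapsack: pick topics maximizing expected score gain within a time budget.

    topics: list of (name, hours, expected_gain) tuples.
    budget: total study hours available (integer).
    Returns (best_gain, list_of_chosen_topic_names).
    """
    # dp[t] = (best gain achievable with t hours, topics chosen to get it)
    dp = [(0, [])] * (budget + 1)
    for name, hours, gain in topics:
        # Iterate hours downward so each topic is used at most once.
        for t in range(budget, hours - 1, -1):
            candidate = dp[t - hours][0] + gain
            if candidate > dp[t][0]:
                dp[t] = (candidate, dp[t - hours][1] + [name])
    return dp[budget]
```

For example, `plan_study([("opera", 3, 1200), ("geography", 2, 600), ("potent potables", 2, 500)], budget=5)` selects opera and geography for an expected gain of 1800, since the remaining topic no longer fits in the budget.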
The video below shows how Roger Craig prepared for Jeopardy. If you have discovered more details about his specific model and implementation, please leave a comment.