I was recently asked for some of the code that I used in a paper. First of all, I should firmly state that people should share code. Openly sharing ideas is, after all, the hallmark of academic research.
I am not often asked for code, and I had a few reactions to a recent request, which was made by a student. My first reaction was that of a professor: should I give the student a fish or teach the student how to fish? Usually, when I am asked for code, the request is extremely short with no context given, so it is hard for me to gauge how hard the student has tried to get the program to work. Was there a typo in the paper that is causing a problem? Where are they stuck? What types of programming errors are they getting? Are they simply being lazy? I just don’t know.
This particular request concerned code that was about 20 lines long. Most of my projects are longer, usually comprising hundreds or thousands of lines of code (I don’t really count, but they can get hairy). I found myself wondering if the requester, a graduate student, had really thought about how to run the code or was just emailing me. I exchanged a few emails with the requester before sending my code to make sure that I wasn’t doing someone’s homework for them.
Another recent request asked for code in a paper that I did the computational work for when I was in graduate school. I looked at the pseudo-code in the paper draft and wrote the code. It took me a day or so to get my code working, but it wasn’t particularly painful. In this case, I was confident that the paper was clear and unambiguous about how to write the code.
My second concern about sharing code is–and I’m being honest here–my code is one giant hack. I am not a software programmer. I know what good, elegant code looks like, and it’s not mine. I often have to cobble together multiple programs to solve a problem from beginning to end. I often write a script to run many copies of the same program with different inputs. I always write code for analyzing the solutions and creating figures. Over the years, I have gotten good at making my code readable to me, so that I can come back to it after months or years and figure out what I did. But that’s not the same thing as being readable to someone else. This is a long way of saying that I’m a little embarrassed about sharing my code with others. Maybe I’m just prudent and am being too hard on myself. But I am married to a software programmer, so I am very aware of how high the bar really is for “good” code.
Having someone look at my code is like inviting someone into my house before straightening up first. It’s one thing to show my messy code to a collaborator, but it’s another thing to show my messy code to a stranger. Sharing papers and tech reports is different: they are polished, so they are OK to share. This can be somewhat addressed by commenting code better. I always start off commenting code well, but during the fog of debugging, my code usually gets a little out of control, and it’s hard to rein in after a while. (I’ve seen other people’s code. I have some good programming habits; my code could be much, much worse.)
However, as I am learning in my discrete optimization course this semester, even simple programming assignments such as implementing the Secretary Problem Markov decision process model can be incredibly difficult for PhD students. They can benefit from looking at my code. My homework solution code isn’t as wild and unruly as my research code. I’m getting used to sharing my code for the homework solutions.
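For readers who have not seen the model, here is a minimal sketch of the classical version of the Secretary Problem, solved by backward induction. It assumes the textbook setup (n candidates arrive in random order, and the goal is to maximize the probability of picking the single best one); it is an illustration only, not the homework solution from the course, and the function name `solve_secretary` is made up for this example.

```python
from fractions import Fraction


def solve_secretary(n):
    """Backward induction for the classical secretary problem.

    Assumes n candidates in uniformly random order and a reward of 1
    only if the overall best candidate is selected.

    Returns (win_probability, first_position), where first_position is
    the earliest position at which a best-so-far candidate is accepted.
    """
    if n < 1:
        raise ValueError("n must be at least 1")

    value = Fraction(0)   # value after rejecting all n candidates
    first_position = n

    # Work backward from the last candidate to the first.
    for position in range(n, 0, -1):
        # If the candidate at this position is the best so far,
        # accepting wins with probability position / n.
        accept = Fraction(position, n)
        if accept >= value:
            first_position = position
        # The candidate is the best so far with probability 1 / position;
        # otherwise the only sensible action is to continue.
        value = (Fraction(1, position) * max(accept, value)
                 + Fraction(position - 1, position) * value)

    return value, first_position


if __name__ == "__main__":
    probability, start = solve_secretary(4)
    print(f"win probability {probability}, accept from position {start}")
```

Exact fractions are used here so that the comparison between accepting and continuing is not affected by floating-point rounding; for large n, plain floats would be faster.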
On a related note, this post by Panos Ipeirotis reflects on how to make code more robust to changes, since old code often does not run if it relies on old libraries. Dr. Ipeirotis is a computer scientist, and it sounds like he writes more elegant code than I do. I’m still at square one, meaning that I try to make my code readable to someone else.
How self-conscious are you about sharing your code?
March 10th, 2011 at 3:37 pm
Been there. If I think the code is something that could be developed as a homework assignment, I would not share. They really should be developing their own. However, if it is something I developed that I finally got to work well and I don’t plan on marketing, why not share?
March 10th, 2011 at 4:17 pm
Dr. Ipeirotis’s solution turns the code into a black box. The user can run the code, but can’t read it. Requests I get usually are from people who are struggling to do something complicated (typically with CPLEX) and would like to learn from source code that did something similar. A black box does them no good.
I’m in the same boat you are in terms of being an amateur coder. I also suffer from a tendency to over-engineer some things. I think I’ve actually heard compilers giggle at me once or twice. Someone looking for a clue, though, will find that clue despite the inartistic expression of the code. If they’re polite, they won’t smirk about it, either.
March 10th, 2011 at 4:29 pm
Thank you Laurence and Paul. It’s good to know that I’m not alone. In my post, I forgot to stress the point that although we are amateur programmers, our code does indeed work. That, of course, is important when conducting research.
Paul, I hadn’t thought about the Black Box aspect of making code so usable that it no longer needs to be understood. That is an excellent point! (and spoken like a professor, always looking to teach someone something).
I have been using some code that I wrote in grad school for making assignments in my discrete optimization course. Amazingly, I was able to easily understand and edit the code after all of these years to suit my teaching needs.
March 10th, 2011 at 4:33 pm
Paul, you are unfair in the comment. The code *is* available, as I explicitly mention in the post. The code from that particular example is available at Google Code (http://code.google.com/p/panos-ipeirotis/) and you can see it at http://code.google.com/p/panos-ipeirotis/source/browse/#svn%2Ftrunk%2Fsrc%2Fcom%2Fipeirotis%2Freadability
I tend to put my code online, not because I am proud of my code, but because by making my code easy to use, I get more people to use (and cite 🙂) my work. It is almost like academic bribery: I save you time, you give me a citation back for using my code/data 🙂
March 10th, 2011 at 4:35 pm
Thanks for the clarification, Panos (and for your tips on academic bribery).
March 10th, 2011 at 4:48 pm
@Panos: Sorry, I did not mean to imply that you failed to make source code available by other means. I just meant that Google App Engine (or similar alternative) provides the functionality but not the learning opportunity.
March 10th, 2011 at 6:20 pm
Great post Laura – I’m incredibly self-conscious about my code, but I’m starting to accept more and more that I’m going to have to get over it and start publishing code. There’s a growing movement in my field (mostly from industry) that if academics want their work taken seriously in a general context we need to start releasing source. As nice as it is to think that “the paper speaks for itself”, the metrics that we use in our result sections are often not entirely what industry needs to assess in order to decide if the technique is useful for them. That might be specific to Autonomous Agents and Game AI, but I’m beginning to lean towards the view that in order to have significant impact we should all be releasing source code as a requirement of publishing – whilst we peer review ideas for relevance, originality and significance, I rarely see anyone verifying that the results are actually replicable – in fact increasingly I’m noticing papers that omit key parameter assignments before launching into results analysis so they are not replicable AT ALL! Publishing source alongside the paper, perhaps through some sort of “academic sourceforge” or something would mean that we can ensure replicability of results and it becomes a totally accepted practice that benefits the whole community – and as Panos rightly points out helps circulate your technique and (ideally) gets people using it and referencing it. And if everyone was doing it, I think we’d lose a lot of the hang-ups we have about letting people see our untidy house (love that analogy!)
March 10th, 2011 at 7:45 pm
In the spirit of what Panos and Luke have said, I point to the third paragraph on the COIN-OR home page.
I’ve had the experience of trying to benchmark a new algorithm against previously published algorithms, only to discover that the other authors’ code is not available (and the description in the published paper may not go into implementation details). Besides the PITA of coding their work myself, I run the risk that gains I demonstrate are due to my errant coding of their algorithm rather than the wonderfulness of my algorithm.
March 11th, 2011 at 2:18 am
“My second concern about sharing code is–and I’m being honest here–my code is one giant hack”
So what? You are not a programmer by profession and therefore are excused 🙂 Heck, everyone writes ugly hacks that just get the work done. By not sharing your code, you are missing out on the opportunity of finding help that will make it better (in the sense of not being an ugly hack anymore).
These days I’m dealing with a piece of code written by a physicist. Yes, it is ugly. But you know what? It works! Can I make it look better? I think so, and I try to. Who will benefit if I succeed? We all will.
March 14th, 2011 at 4:00 am
My most public code is the GPL’d OpenSolver for Excel. I felt nervous about my first release, but have now had enough downloads and comments to be pretty sure that it works ok for most people (or perhaps fails so badly that the user gives up immediately!). We are still falling into traps that I’m sure more experienced Excel programmers avoid. For example, I’ve only just appreciated the full complexities of internationalisation; the next version will fix a glitch that shows up only on non-English systems. (These systems display values in the local number format, such as 1,34, but still require Solver parameters – for example the branch and bound tolerance – to be accessed as US-style numeric strings; VBA doesn’t do this by default.) I had assumed my code was going unread until I learnt last week of one user who has made significant customisations; I guess his lack of complaints is a sign that our house is not *too* untidy. Even so, we probably need to make more small releases, even when we know there are still problems, and perhaps accompany each release with a “mea culpa” on the web pages so users know where caution is required.
Andrew
March 14th, 2011 at 4:28 pm
Andrew, Thanks for sharing your OpenSolver code. I am looking forward to checking it out.
March 19th, 2011 at 8:09 pm
Interesting post, Dr. McLay.
Sharing code can be a tricky situation, especially when it comes to students. You want the student to learn, but at the same time give them some direction. Here is where I think pseudo-code would be helpful. The students get to implement it in their favorite programming language. Also, if initial assignments are written with a similar modeling pattern, students will start understanding the methodology and catch up. Then future assignments will be a useful tool to test their learning.
With fellow researchers though, code should become more open source. There is no point in re-inventing the wheel. If people can find a better implementation of the algorithm, that is great. They should publish the changes, and the entire community could and will benefit.
Just my 2 cents!