The National Science Foundation (NSF) has started to require a data management plan for each new proposal. The plan requires investigators to make their data (including figures, tables, and code) available to encourage collaboration. An excellent idea! NSF specifically mentions that investigators are required to document their data, not just make it available, so that others can use it, thus creating new opportunities for scientific research. NSF’s data management plan is similar to NIH’s public access policy, which requires that publications from NIH-funded research be publicly available through PubMed Central within twelve months of publication.
The research world is moving toward a place where investigators are required to share data and code. I once wrote about my insecurities surrounding sharing my code. While I don’t have insecurities about sharing my data (except trying to find extra time to document data better), I do need to think about creating a system for posting my research materials. I’m not sure what the solution should look like.
What is the best place online for sharing research materials? How should code be stored and formatted? Tables, figures, data, and code have different formats. It’s best if they are all stored (or accessed) in the same location.
My university does not have the best tools for sharing data (at least not that I know of). Just updating my web site is a pain. For my research group, I use my university’s BlackBoard page, which holds my code, papers, references, slides, and whatever else I get tired of emailing students. However, my BlackBoard site is a closed system that does not give guests access, even guests at my university, so it cannot be used to share data with the public.
Dropbox may be a good place to store many documents, provided that a separate page links to all of the stored data, although I am loath to use my precious Dropbox space for storing data. Slideshare and Scribd are good places for sharing slides and technical reports, respectively. Code can be zipped and uploaded elsewhere. But storing each type of file in a different account on a different site would not exactly facilitate sharing information with others (and keeping track of all the different logins and passwords would be no fun for me). I could, however, create a Google Sites page to manage the information so that it can be accessed from a single page.
How do you share your data? How do you find time to document your data?
March 16th, 2011 at 6:24 pm
I currently store my data (of which there is very little) on my personal university web site. Documentation is essentially chopping a description out of the corresponding publication and converting it to HTML or plain text (the latter stored as a README in the zip archive). Visibility (or lack thereof) to search engines aside, I’m not sure how long it will outlive my employment. So I’m waiting to see if a professional society (INFORMS) or some other entity with high survivability offers up free hosting of research data (and code).
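The README-in-a-zip approach described above could be scripted along these lines. This is only a minimal sketch: the function name build_archive, the file name README.txt, and the sample data file are hypothetical, chosen for illustration.

```python
import io
import zipfile


def build_archive(readme_text, data_files):
    """Return the bytes of a zip archive holding README.txt plus data files.

    data_files maps archive member names to their contents (bytes).
    All names here are illustrative assumptions, not an established layout.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # The description lifted from the publication goes in as plain text.
        archive.writestr("README.txt", readme_text)
        for name, payload in data_files.items():
            archive.writestr(name, payload)
    return buffer.getvalue()


# Hypothetical description and data file, for illustration only.
readme = "Toy instance set. Columns: from, to, cost.\n"
archive_bytes = build_archive(
    readme,
    {"instances/toy.csv": b"from,to,cost\na,b,1\n"},
)
```

Writing the archive to a file is then a matter of saving archive_bytes to disk; keeping the README inside the archive means the documentation travels with the data wherever the zip ends up.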
In my case, there’s some old (and probably useless) code involved, so perhaps I could justify creating a project at SourceForge, GitHub or similar and plant it there.
March 18th, 2011 at 1:06 pm
Hi Dr. McLay,
I have started storing my data in XML format.
The drawbacks of using XML are: 1) you have to know a little about XML, 2) it is a little more work to put the data in this format, and 3) large XML datasets take up a lot of memory.
The benefits of using XML are: 1) XML is widely used (e.g., XHTML and RSS are based on XML) and therefore there are a lot of packages for using XML with C/C++/Java/etc., 2) XML is easy to document using XML Schema or XML Document Type Definitions (DTDs), and 3) it would be easy to standardize for particular problems (e.g., I don’t think it would be hard to create an XML Schema or XML DTD for shortest path problem datasets).
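To give a rough idea of what a shortest path dataset in XML might look like, here is a minimal sketch. The element and attribute names (shortestPathInstance, arc, from, to, cost) are hypothetical and not taken from any existing standard, and the toy network is made up. The sketch reads the instance with Python's standard xml.etree.ElementTree module and solves it with Dijkstra's algorithm, which assumes nonnegative arc costs.

```python
import heapq
import xml.etree.ElementTree as ET

# Hypothetical layout: every element and attribute name is illustrative only.
SAMPLE = """
<shortestPathInstance name="toy" source="a" target="d">
  <description>Four-node toy network with nonnegative arc costs.</description>
  <arc from="a" to="b" cost="1"/>
  <arc from="b" to="c" cost="2"/>
  <arc from="a" to="c" cost="5"/>
  <arc from="c" to="d" cost="1"/>
</shortestPathInstance>
"""


def load_instance(xml_text):
    """Parse the XML text into (adjacency dict, source node, target node)."""
    root = ET.fromstring(xml_text.strip())
    graph = {}
    for arc in root.findall("arc"):
        tail, head = arc.get("from"), arc.get("to")
        graph.setdefault(tail, []).append((head, float(arc.get("cost"))))
    return graph, root.get("source"), root.get("target")


def shortest_distance(graph, source, target):
    """Dijkstra's algorithm; returns infinity if the target is unreachable."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for nxt, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return float("inf")


graph, source, target = load_instance(SAMPLE)
print(shortest_distance(graph, source, target))  # 4.0 via a -> b -> c -> d
```

The description element is where the documentation would live, so the data and its explanation stay in one file; a schema or DTD could then enforce that every arc carries its three attributes.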
University of Arkansas
March 20th, 2011 at 1:39 pm
Google just launched a service similar to Google Docs but for data.