One such conversion, for example, turns the gene symbol SEPT2, short for Septin 2, into “2-Sep.” Likewise, MARCH1 (Membrane-Associated Ring Finger (C3HC4) 1, E3 Ubiquitin Protein Ligase) is rendered as “1-Mar.” The scientists wrote: “Furthermore, RIKEN identifiers were described to be automatically converted to floating point numbers (i.e. from accession ‘2310009E13’ to ‘2.31E+13’).”

The conversion takes place without the researcher noticing and culminates in research papers with errors in their supplementary files, sometimes contributing to unverifiable data or errors in subsequent calculations.
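A quick way to see whether a gene list has already been hit is to scan it for values that look like Excel's output rather than gene symbols. The sketch below is only illustrative: the two regular expressions and the `looks_mangled` helper are my own assumptions about what the mangled values look like (day-month strings such as "2-Sep" and scientific notation such as "2.31E+13"), not anything taken from the paper.

```python
import re

# Assumed patterns for Excel-style output; adjust for your locale and data.
DATE_LIKE = re.compile(
    r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$",
    re.IGNORECASE,
)
SCI_LIKE = re.compile(r"^\d(\.\d+)?E\+\d+$", re.IGNORECASE)


def looks_mangled(symbol: str) -> bool:
    """Return True if the value resembles a date or scientific notation."""
    s = symbol.strip()
    return bool(DATE_LIKE.match(s) or SCI_LIKE.match(s))


# Examples taken from the excerpt above: originals next to their mangled forms.
symbols = ["SEPT2", "2-Sep", "MARCH1", "1-Mar", "2310009E13", "2.31E+13"]
flagged = [s for s in symbols if looks_mangled(s)]
print(flagged)  # ['2-Sep', '1-Mar', '2.31E+13']
```

This only flags suspicious values. It cannot recover the original symbol, since "2-Sep" could have started life as more than one gene name.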

paper: http://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1044-7
Excel's options can't be changed to fix the problem permanently ... the best work-around is probably to make a template with all cells set to "text". Then reset any columns where calculations are needed, or use a different template if doing calculations.
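The same "everything is text unless I say otherwise" idea works outside Excel too. Here is a minimal sketch using Python's standard `csv` module, which hands back every field as a string; the column names `gene` and `count` and the sample rows are made up for illustration.

```python
import csv
import io

# Hypothetical CSV content; in practice this would be a file on disk.
raw = "gene,count\nSEPT2,12\nMARCH1,7\n2310009E13,3\n"

# csv.DictReader leaves every field as a str, so identifiers are untouched.
rows = list(csv.DictReader(io.StringIO(raw)))

# Convert only the column that actually needs arithmetic.
for row in rows:
    row["count"] = int(row["count"])

genes = [row["gene"] for row in rows]
total = sum(row["count"] for row in rows)
print(genes, total)
```

The design choice is the same as the template trick: identifiers stay as text by default, and numeric conversion is opt-in per column.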

Pretty annoying, since it's an option a lot of people want, and it would be simple to implement program-wide. Most of the cell formatting features work fine on their own, so having to turn all of them off often creates other problems, when all that's needed is a checkbox saying "never change the data into a date".


Excel is a good tool for exploring data, but I don't think it is suitable for producing research results. It is very hard to do any sort of quality control or unit testing of stats code or results. Having said that, I'm not sure many academic researchers have much of an idea about testing the validity of how they produce their results, i.e. software testing.
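For anyone wondering what unit testing stats code even looks like, here is a small sketch: put the calculation in a function and check it against a case you can work out by hand. The `summarize` function and the test data are hypothetical, just to show the shape of it.

```python
import statistics


def summarize(values):
    """Return (mean, sample standard deviation) for a list of numbers."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    return statistics.mean(values), statistics.stdev(values)


def test_summarize_known_answer():
    # Hand-checked case: mean is 40/8 = 5, sample variance is 32/7.
    mean, sd = summarize([2, 4, 4, 4, 5, 5, 7, 9])
    assert mean == 5
    assert abs(sd - (32 / 7) ** 0.5) < 1e-9


def test_summarize_rejects_tiny_input():
    try:
        summarize([1])
    except ValueError:
        return
    raise AssertionError("expected ValueError for a single value")


test_summarize_known_answer()
test_summarize_rejects_tiny_input()
```

A test runner such as pytest would pick up the `test_` functions automatically; calling them directly as above works without any extra tooling.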

Another problem is that simply loading data can lead to issues unless there are checks. I think one paper had an age of 32000+, which clearly points to a problem converting unsigned to signed integers during loading. Checks should be put in place.
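The kind of check meant here can be very small. A sketch, assuming ages are already loaded as integers and that anything outside 0 to 120 is implausible (both the bounds and the sample values are my own picks, not from any paper):

```python
def validate_ages(ages, low=0, high=120):
    """Return (index, value) pairs for ages outside the range [low, high]."""
    return [(i, a) for i, a in enumerate(ages) if not (low <= a <= high)]


# Hypothetical loaded column with two bad values.
ages = [34, 51, 32767, 27, -1]
bad = validate_ages(ages)
print(bad)  # [(2, 32767), (4, -1)]
```

Running something like this right after loading, and refusing to continue if `bad` is non-empty, would catch the obviously impossible values before they reach any calculation.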

Some economists also made spreadsheet errors in a paper that other economists used in support of austerity.