
Excel Created Major Typos in 20% of Scientific Papers on Genes


For example, one mistaken conversion turns the gene symbol SEPT2, short for Septin 2, into “2-Sep.” Likewise, MARCH1 (aka Membrane-Associated Ring Finger (C3HC4) 1, E3 Ubiquitin Protein Ligase) is rendered as “1-Mar.” The scientists wrote: “Furthermore, RIKEN identifiers were described to be automatically converted to floating point numbers (i.e. from accession ‘2310009E13’ to ‘2.31E+19’).”

The conversion happens without the researcher noticing and ends up as errors in the supplementary files of published papers, sometimes making the data unverifiable or introducing errors into subsequent calculations.
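A rough way to spot values that have already been converted in a supplementary gene list is to scan for the two leftovers described above: day-month strings such as "2-Sep" and scientific notation such as "2.31E+19". This is a sketch assuming English month abbreviations and Excel's default display of those values. The regular expressions are an illustrative heuristic, not the method used in the paper, and other date display formats will slip through. An unconverted RIKEN accession like "2310009E13" is not flagged because it has no decimal point.

```python
import re

# English month abbreviations as Excel displays them (assumption: other
# locales and display formats would need different patterns).
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Day-month strings such as "2-Sep" or "1-Mar", left behind when a symbol became a date.
DATE_LIKE = re.compile(rf"^\d{{1,2}}-({MONTHS})$", re.IGNORECASE)

# Scientific notation such as "2.31E+19", left behind when an identifier became a float.
# Requiring a decimal point and an explicit sign avoids matching raw IDs like "2310009E13".
SCI_LIKE = re.compile(r"^\d\.\d+E[+-]\d+$", re.IGNORECASE)


def find_mangled(symbols):
    """Return (index, value) pairs for entries that look like Excel conversions."""
    hits = []
    for i, value in enumerate(symbols):
        v = value.strip()
        if DATE_LIKE.match(v) or SCI_LIKE.match(v):
            hits.append((i, v))
    return hits


if __name__ == "__main__":
    sample = ["2-Sep", "TP53", "1-Mar", "2.31E+19", "2310009E13"]
    print(find_mangled(sample))
```

Any hit is only a candidate; it still has to be checked against the original gene list.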

paper: http://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1044-7
Gene name errors are widespread in the scientific literature
Genome Biology 2016, 17:177
DOI: 10.1186/s13059-016-1044-7
© The Author(s). 2016
Published: 23 August 2016
Excel's options can't be changed to fix the problem permanently, so the best work-around is probably to make a template with all cells set to "text", then reset any columns where calculations are needed, or use a different template when doing calculations.

Pretty annoying, since it's an option a lot of people want, and it would be simple to implement program-wide. Most of the cell formatting features work fine on their own, so having to turn all of them off often creates other problems, when a single checkbox saying "never change the data into a date" would do.
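Before opening a file in Excel, it can help to know which identifier columns contain values Excel is likely to reinterpret, so those columns can be set to text first. Below is a minimal sketch, assuming a CSV file and a caller-supplied list of identifier columns; the function names, column names and sample rows are hypothetical. The patterns only approximate the SEPT2/MARCH1 and RIKEN cases from this thread, and Excel's actual parsing rules are broader and locale-dependent. Python's `csv` module itself never converts fields, so reading the file this way is safe.

```python
import csv
import io
import re

# Month name or abbreviation followed by a day number, e.g. SEPT2, MARCH1.
# Heuristic only: Excel's real date parsing is broader and locale-dependent.
DATE_LIKE = re.compile(
    r"^(JAN(UARY)?|FEB(RUARY)?|MAR(CH)?|APR(IL)?|MAY|JUNE?|JULY?|AUG(UST)?"
    r"|SEP(T|TEMBER)?|OCT(OBER)?|NOV(EMBER)?|DEC(EMBER)?)\d{1,2}$",
    re.IGNORECASE,
)

# Plain or scientific-notation numbers, e.g. 2310009E13.
NUMBER_LIKE = re.compile(r"^[+-]?(\d+\.?\d*|\.\d+)(E[+-]?\d+)?$", re.IGNORECASE)


def conversion_risk(value):
    """Return 'date', 'number', or None for a single cell value."""
    v = value.strip()
    if DATE_LIKE.match(v):
        return "date"
    if NUMBER_LIKE.match(v):
        return "number"
    return None


def risky_cells(csv_text, id_columns):
    """List (line, column, value, risk) for identifier cells Excel may convert."""
    reader = csv.DictReader(io.StringIO(csv_text))  # every field stays a str
    found = []
    for line, row in enumerate(reader, start=2):  # line 1 is the header
        for col in id_columns:
            risk = conversion_risk(row[col])
            if risk:
                found.append((line, col, row[col], risk))
    return found


if __name__ == "__main__":
    # Made-up sample rows for illustration.
    sample = (
        "gene,riken_id,score\n"
        "SEPT2,2310009E13,0.5\n"
        "TP53,XYZ001,1.2\n"
        "MARCH1,XYZ002,3.0\n"
    )
    for hit in risky_cells(sample, ["gene", "riken_id"]):
        print(hit)
```

The `score` column is deliberately left out of `id_columns`: it is meant to be numeric, which matches the advice to reset only the columns where calculations are needed.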


Excel is a good tool for exploring data, but I don't think it is suitable for producing research results. It is very hard to do any sort of quality control or unit testing of stats code or results. That said, I'm not sure many academic researchers have much idea of software testing, i.e. checking the validity of how they produce their results.

Another problem is that simply loading data can introduce errors unless there are checks. I think one paper had an age of 32000+, which is clearly an issue with converting unsigned to signed integers in the load process. Checks should be put in place to catch this kind of thing.
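A load-time check of this kind can be as simple as parsing each value explicitly and flagging anything outside a plausible range. This is a sketch assuming a CSV with an "age" column; the function name, the 0 to 120 limits and the sample rows are hypothetical choices for illustration.

```python
import csv
import io

# Hypothetical plausibility limits; choose them to suit the study.
AGE_RANGE = (0, 120)


def check_ages(csv_text, column="age", valid=AGE_RANGE):
    """Return (line, raw_value, problem) for every missing or implausible age."""
    lo, hi = valid
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line, row in enumerate(reader, start=2):  # line 1 is the header
        raw = row[column].strip()
        try:
            age = int(raw)
        except ValueError:
            problems.append((line, raw, "not an integer"))
            continue
        if not lo <= age <= hi:
            problems.append((line, raw, "out of range"))
    return problems


if __name__ == "__main__":
    # Made-up sample rows for illustration.
    sample = "id,age\n1,34\n2,32767\n3,\n4,-1\n"
    for problem in check_ages(sample):
        print(problem)
```

Returning the problems instead of silently dropping rows leaves the decision (fix, exclude, or query the source) with the researcher.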

Some economists made spreadsheet errors in a paper that other economists cited in support of austerity.