Hi Valentijn,
I have not read through what everyone else has written here, so I might be repeating what someone else has said. I might also have misunderstood some things.
With those caveats, these are my first thoughts on what you asked in your initial post:
1. One piece of data per cell
Trying to load too much information into the same column (by using exclamation marks, bolded text, asterisks, colors, or whatever) makes it confusing. Spread the information out so that each cell only provides one key bit of data.
In other words, have more columns. There is no inherent benefit in having as few columns as possible (if the chart is allowed to fill an entire printed page or computer screen, and isn't required to fit into a tiny 2-inch-square space in a journal article or something).
You can easily fit 6 or 8 columns in a chart that covers an A4/letter-size printed page, and it would be easier for a novice to absorb the information presented that way.
As an example, the chart I made for my 23andme results that I attached to my thread has 7 columns (actually, my chart's got 20 columns in real life, but those first 7 were the pertinent ones that I wanted to convey to the people who viewed my thread).
http://forums.phoenixrising.me/inde...hts-and-pointers-on-my-23andme-results.23837/
I don't think it looks too busy or overwhelming. Obviously, I didn't design it to provide all the information that you want to provide in your chart, so I'm not suggesting mine as any sort of template, merely using it as an example of how adding more columns can make this genetic information easier to understand.
2. Words instead of symbols
Do not be afraid of using words to describe characteristics. Not everyone "thinks" in symbols and can automatically translate from a symbol to the idea that it represents: seeing exclamation points, asterisks, bold font, pluses and minuses, highlight colors, etc. only confuses them and slows them down, rather than speeding up their processing of the data that is being presented to them.
If your chart is going to be used not just by a small group of repeat customers (such as medical doctors or genetic counselors, who would use it for many of their patients and would quickly grasp the meaning of the symbols) but by a wide range of one-time customers (people who get their genotypes tested and just want to use your product once, for themselves only), it is good to make everything in the chart as self-explanatory as possible, so that the user doesn't have to consult a separate list/page/paragraph that translates the symbols into words.
People who are not used to interpreting large amounts of information are more comfortable with words than symbols or numbers. Even those who are comfortable using symbols and numbers can become confused by their meaning, as evidenced by how many people get confused by what Yasko's "+/-" indicators really mean and have to think it through several times before understanding how to read them.
Instead of using a "+" symbol to mean that something is more likely, is higher, is beneficial (or any of the other things that a plus symbol might stand for), just put the actual words "high" or "low", or "beneficial" or "negative", or "up" or "down", or "benign" or "risky", etc. That way, the meaning is clear.
Additionally, and this personally drives me bananas when it comes to Excel, spreadsheets read a lot of symbols as the start of a formula. That makes symbols a pain to copy and paste into, or type into, a spreadsheet: instead of treating the entry as plain text, the spreadsheet imposes an unwanted format on it (and sometimes Excel applies a formatting rule and actually replaces the value that was originally typed into the cell before the user even knows what has happened). Not everyone knows that putting an apostrophe (a single quotation mark) before a symbol will protect it from Excel's meddling, and even those who do know can make mistakes, so data can disappear or be altered without notice.
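If the chart is generated by a program rather than typed by hand, the apostrophe trick can be applied automatically. This is only a rough sketch: the function name and the list of trigger characters are my own assumptions, and how a spreadsheet treats the apostrophe can differ between typing into a cell and importing a file, so test it with your own setup.

```python
# Characters that commonly make a spreadsheet treat typed input as a formula.
# This list is an assumption for illustration; adjust it for your spreadsheet.
FORMULA_TRIGGERS = ("=", "+", "-", "@")


def protect_for_spreadsheet(value: str) -> str:
    """Prefix an apostrophe when the text starts with a formula trigger."""
    if value.startswith(FORMULA_TRIGGERS):
        return "'" + value
    return value


# Symbols get protected; plain words pass through unchanged.
for raw in ["+/-", "-", "up", "=A1"]:
    print(repr(raw), "->", repr(protect_for_spreadsheet(raw)))
```

Using words such as "up" or "down" in the first place avoids the problem entirely, since they never need protecting.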
An Example of 1 & 2:
You can add a column entitled "regulation". The possible results in that column might be (in words): "up", "down", "neutral", "not applicable", or "no data".
Now, if you would rather not use descriptive words and wish to use symbols, you can still have several columns within the spreadsheet to make the information easier to understand. E.g., you can have a "regulation" column and use an up arrow for up regulation, a down arrow for down regulation, a sideways arrow (with 2 heads on it) for no change in regulation, etc.
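To make the "regulation" column idea concrete, here is a small sketch in Python. The five allowed words come from my example above; the function names and the arrow mapping are made up for illustration and are not taken from any existing tool.

```python
# The only values the hypothetical "regulation" column may contain.
ALLOWED_REGULATION = {"up", "down", "neutral", "not applicable", "no data"}

# Optional arrow rendering for people who prefer symbols to words.
ARROWS = {"up": "↑", "down": "↓", "neutral": "↔"}


def check_regulation(value: str) -> str:
    """Normalize a cell value and reject anything outside the allowed words."""
    cleaned = value.strip().lower()
    if cleaned not in ALLOWED_REGULATION:
        raise ValueError(f"unexpected regulation value: {value!r}")
    return cleaned


def as_arrow(word: str) -> str:
    """Return the arrow for a word, or the word itself if it has no arrow."""
    return ARROWS.get(check_regulation(word), check_regulation(word))


print(check_regulation(" Up "))
print(as_arrow("neutral"))
print(as_arrow("no data"))
```

Keeping the words as the stored value and treating the arrows as an optional display layer means the same data can feed either style of chart.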
3. Colors
As I wrote above, I think it would be better to indicate everything in words instead of symbols. Beyond the basic problems I think are inherent in using symbols for a "lay" audience, I think using colors in an important graphic provides an additional layer of unnecessary complexity, because many times something can only be printed out or photocopied in black-and-white (such as at a doctor's office or library) and the colors disappear into indiscernible shades of grey, or even obscure entirely the very text they are meant to highlight.
However, if you are going to use colors to signify anything, I would suggest not replicating ANY of the Yasko (GeneticGenie etc.) colors, because seeing those in your chart would only confuse people, especially if you mean something different with a color than Yasko means with the same color (which I think is your intention regarding the color red: that you intend for it to imply something different than Yasko implies with red).
In other words, in your chart, do not use the color red at all. (Or yellow.)
Instead of red, for a color that indicates "warmth", perhaps orange would be suitable. (And apropos for your location!)
4. Info
I would recommend including for each SNP the info that I did in my chart, and potentially a lot more: the gene, the variation (that's not the right word, I think, so use whatever is the name of that particular concept), the rs id, the possible alleles, the majority alleles, the minority alleles, and the tested person's alleles. You could also include the percentages of worldwide population for the alleles, or get fancier yet and note the allele percentages of the "ethnic" group or country that the person who is being tested is from. Then add several further columns that explain the tested person's alleles and what that result actually means: if the regulation is constricted or overdriven, if it's a risk and they should be worried about it, if there are medications or lifestyle behaviors that can address it, what other mutations might work together with that one to produce a problematic health result, etc. etc.
I'm just brainstorming, but I think it would be really useful to have some or all of this summarized in one chart.
Maybe you could have one output which is just a 3-column chart (if you need a chart that is very narrow and succinct for some reason), and a second output which is the larger chart that gives a one-stop-shop of all the main info the user might want to keep top-of-mind about her/his genetic results.
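Here is one way the two outputs could come from the same data, sketched in Python. Everything in it is hypothetical: the field names are just my column suggestions from above, the choice of which 3 columns go in the narrow chart is arbitrary, and the row values are obvious placeholders rather than real genetic data.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class SnpRow:
    """One SNP per row, one piece of data per field (hypothetical layout)."""
    gene: str
    rs_id: str
    possible_alleles: str
    majority_allele: str
    minority_allele: str
    tested_alleles: str
    regulation: str
    meaning: str


# The wide chart uses every field; the narrow chart picks three of them.
FULL_COLUMNS = [f.name for f in fields(SnpRow)]
NARROW_COLUMNS = ["gene", "rs_id", "tested_alleles"]


def to_csv(rows, columns):
    """Write the chosen columns of each row as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buf.getvalue()


# Placeholder values only; not a real gene, rs id, or result.
rows = [SnpRow("GENE1", "rs0000000", "A/G", "A", "G", "AG", "no data", "placeholder")]

print(to_csv(rows, NARROW_COLUMNS))
print(to_csv(rows, FULL_COLUMNS))
```

The point of the design is that the narrow chart is just a view of the wide one, so the two can never disagree with each other.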
To make it really useful as a translation tool, you could put how Yasko reports the allele, how 23andme reports it, how the government or global genome database (or whatever) reports it, etc.
23andme does this in their raw data area when they show how the user's allele would show up in dbSNP.
5. Rare SNPs
As you will remember, today a comment you wrote on another thread led me to discover that I have a comparatively rare allele (which will be known in future biology textbooks as the "Craigbell mutation").
Rare things can give so much information. They can be the missing link for people who have examined every other "normal" cause of their illness/concern but still haven't solved it.
I think too much is made of probabilities and expecting most people to fall into a normal category. So many of my health problems were overlooked or pooh-poohed by healthcare workers because they weren't in the top 3 most expected results or didn't fit the main two expected patterns. Rather than asking why the situation/lab reading/symptom was different than expected, they devalued the situation and said it was probably really somehow normal, and left it at that. [My creaking memory is whispering something to me about Type I and Type II errors, but I'm not sure if that's what I'm talking about here or not.]
Besides, my definition of "rare" for health conditions would be more like 1 in 50,000, not 1 in 100.
I don't know how many rare SNPs you are talking about including or leaving out, but including them, if you already have the data on them, and if they might have a real effect on the individual's health, could become one of the main differentiators of your chart, versus the others out there.