The past decade in human genetic disease has been dominated by genome-wide association studies and, more recently, by sequencing human genomes. The goal is to discover disease genes by identifying variants that cause or increase the risk of disease. In genome-wide association studies, patients with disease are compared with unaffected controls. DNA is collected from both groups, and approximately a million genetic variants are surveyed in their genomes. Variants that are at a higher frequency in patients than in controls are associated with an increased risk of disease. More than a thousand disease-associated variants have been identified from over 400 genome-wide association studies, but much is still unknown about the genetic heritability of disease. For example, genome-wide association studies have identified 40 markers associated with height, but these markers only account for ∼5% of height’s heritability (Maher, 2008). Despite studying thousands of individuals, <10% of heritability is explained for most diseases (Manolio et al., 2009).
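The case-control comparison described above can be illustrated for a single variant. The following is a minimal sketch, assuming a simple allelic test (Pearson chi-square on a 2 × 2 table of allele counts); the function name `allelic_association` and the counts are hypothetical and are not taken from any of the cited studies, which may use other statistical models.

```python
import math


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 degree of freedom) on allele counts.

    Arguments are counts of the variant (alt) and reference (ref) allele
    in patients (case) and unaffected controls (ctrl). All four counts
    are assumed to be non-zero in this sketch.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    total = sum(row_sums)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the upper-tail probability of the
    # chi-square distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))

    # Odds ratio > 1 means the variant allele is enriched in patients.
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi2, p_value, odds_ratio


# Hypothetical counts: 1,000 patients and 1,000 controls (2,000 alleles each).
chi2, p_value, odds_ratio = allelic_association(600, 1400, 500, 1500)
print(f"chi2={chi2:.2f}, p={p_value:.2g}, odds ratio={odds_ratio:.2f}")
```

In a genome-wide study a test of this kind would be repeated for each of the approximately one million surveyed variants, so the resulting p-values need correction for multiple testing before any variant is called associated.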
Individual human genomes have been sequenced, and there are approximately 3 million to 4 million variations with respect to the reference genome (Frazer et al., 2009). It is thought that some of these variants will cause phenotypic differences that can lead to disease or apparent physical traits. It is estimated that 3–8% of the human genome is functional (Siepel et al., 2005), so it is unlikely that all the variation in the 3 Gb human genome will lead to phenotypic differences. Rather, functional variants may be localized to the 90–240 Mb of human genome that contains transcribed coding genes, regulatory elements, RNA genes, and other functional elements.
By contrast, the human microbiome has extensive diversity. Each location (skin, mouth, intestine, etc.) has its own metagenome. Recent studies have suggested that healthy individuals have up to 15,000 species-level phylotypes in their gastrointestinal tracts as determined by 16S rRNA sequencing (>97% identity) (Peterson et al., 2008) (Fig. 1.1 of this chapter), and that the two major phylogenetic groups present are the Firmicutes and Bacteroidetes. The average genome size of sequenced organisms from these groups is 3.4 Mb (Liolios et al., 2008), and the percentage of these genomes occupied by protein-coding genes is approximately 92%. Therefore, the functional part of the gastrointestinal microbiome can be estimated to be approximately 47,000 Mb (15,000 × 3.4 × 0.92), which is more than two orders of magnitude greater than the above-mentioned estimate of the functional part of the human genome.
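The arithmetic behind this comparison can be reproduced in a few lines. The sketch below uses only the figures quoted in the text; the helper name `functional_megabases` is hypothetical, and the calculation assumes, as the estimate above implicitly does, that each phylotype contributes one non-overlapping genome of average size.

```python
def functional_megabases(genome_mb, functional_fraction, n_genomes=1):
    """Functional sequence in Mb: genomes x size (Mb) x functional fraction."""
    return n_genomes * genome_mb * functional_fraction


# Human genome: 3 Gb (3,000 Mb), of which 3-8% is estimated to be functional.
human_low = functional_megabases(3000, 0.03)
human_high = functional_megabases(3000, 0.08)

# Gastrointestinal microbiome: up to 15,000 phylotypes, 3.4 Mb average
# genome size, approximately 92% protein-coding.
microbiome = functional_megabases(3.4, 0.92, n_genomes=15000)

# Fold difference against the upper and lower human estimates.
fold_min = microbiome / human_high
fold_max = microbiome / human_low

print(f"human functional genome: {human_low:.0f}-{human_high:.0f} Mb")
print(f"microbiome functional sequence: {microbiome:.0f} Mb")
print(f"fold difference: {fold_min:.0f}x to {fold_max:.0f}x")
```

Under these inputs the fold difference ranges from roughly 200 to roughly 500, depending on which end of the 3–8% human estimate is used.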