Updated 31 March 2010
The goal of the MSGene database is to serve as a comprehensive, unbiased,
publicly available and regularly updated field-synopsis of published genetic
association studies performed on multiple sclerosis phenotypes. Eligible publications are
identified following systematic searches of scientific literature databases, as
well as the tables of contents of journals in genetics, neurology, and
immunology. Data selected for display summarize key characteristics of the
investigated study cohorts (e.g.,
gene overview), as well as genotype distributions in cases and controls (e.g.,
polymorphism details). For eligible polymorphisms with genotype data in at least
four case-control samples, continuously updated random-effects meta-analyses
are presented, complemented by cumulative random-effects meta-analyses (see
meta-analysis methods). Note that data obtained from family-based studies
are not included in the meta-analyses, as crude odds ratios cannot be readily
calculated from overall genotype distributions. However, these studies and
their qualitative results are still listed on the gene-summary pages of the
MSGene website (see
Table 2 for example).
To ensure the highest degree of scientific objectivity, only studies
published in peer-reviewed journals available in English are considered for
inclusion in the database. In particular, this precludes the inclusion of
data presented only in abstract form, e.g. at scientific meetings. We
encourage authors of original reports fulfilling the above criteria to submit their data as soon as
their work is accepted for publication.
For more details on inclusion criteria, literature searches, data-management
procedures, statistical analyses, and online database structure, please see Bertram et
al. (2007) and Allen et al. (2008).
2. The "Top Results" List
In an effort to facilitate the
identification of the most promising meta-analysis results available in
MSGene, a continuously updated list displaying the most strongly associated
genes ("Top Results") has been added to the MSGene homepage. The
list includes genes/loci which contain at least
one variant showing a nominally significant summary OR in the analysis of all
studies (“All”), or those limited to samples of a specific ethnicity
(e.g. “Caucasian”). The nominally significant meta-analyses are then graded based on interim guidelines for the grading of the epidemiological credibility of genetic association studies recently developed
by the Human Genome Epidemiology Network (HuGENet; Ioannidis et al, 2008).
In the "Top Results" list, genes are ranked based on the genetic variant with the best overall HuGENet/Venice grade; for genes with identical grades, ranking is based on P-value; for genes with identical grade and P-value, ranking is based on effect size (OR).
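The three-level ranking rule can be illustrated with a short sort. This is a minimal sketch, not the MSGene implementation: the `GeneResult` record, the gene names, and the choice to measure effect size as fold deviation from OR = 1 (so that protective and risk effects are comparable) are assumptions made for the example.

```python
from typing import List, NamedTuple

class GeneResult(NamedTuple):
    gene: str           # hypothetical gene label
    credibility: str    # "strong", "moderate" or "weak" for the gene's best variant
    p_value: float      # P-value of the summary OR
    odds_ratio: float   # summary OR

# Lower number = better grade (assumed ordering of the three credibility levels).
GRADE_ORDER = {"strong": 0, "moderate": 1, "weak": 2}

def effect_size(odds_ratio: float) -> float:
    """Fold deviation from the null, so OR = 0.5 and OR = 2.0 count equally (assumption)."""
    return max(odds_ratio, 1.0 / odds_ratio)

def rank_top_results(results: List[GeneResult]) -> List[GeneResult]:
    """Sort by grade, then by ascending P-value, then by descending effect size."""
    return sorted(
        results,
        key=lambda r: (GRADE_ORDER[r.credibility], r.p_value, -effect_size(r.odds_ratio)),
    )

if __name__ == "__main__":
    demo = [
        GeneResult("GENE_A", "moderate", 1e-5, 1.30),
        GeneResult("GENE_B", "strong", 1e-3, 1.20),
        GeneResult("GENE_C", "strong", 1e-3, 0.70),
        GeneResult("GENE_D", "strong", 1e-8, 1.10),
    ]
    for rank, r in enumerate(rank_top_results(demo), start=1):
        print(rank, r.gene, r.credibility, r.p_value, r.odds_ratio)
```

Using a tuple as the sort key keeps the tie-breaking order explicit: a later element is consulted only when all earlier ones are equal.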
HuGENet "Venice criteria"
We rate overall epidemiological credibility
as ‘strong’ if associations received three A grades, ‘moderate’ if they received at least
one B grade but no C grades, and ‘weak’ if they received a C grade in any of the
three assessment criteria. While we believe that this list represents an up-to-date
summary of particularly promising multiple sclerosis candidate genes that warrant follow-up
with high priority, we note that many of these may still represent false-positive
Briefly, each meta-analyzed association in MSGene is graded on the basis
of the amount of evidence, consistency of replication, and protection from bias.
For amount of evidence, we assign the grade ‘A’ when the total number of minor
alleles of cases and controls combined in the meta-analyses exceeds 1,000, ‘B’ when
it is between 100 and 1,000, and ‘C’ when it is less than 100. For consistency of replication, we assign the grade ‘A’ for I² point estimates <25%, ‘B’ for I² values of
25–50%, and ‘C’ for I² values >50%. Note that this criterion does not apply to meta-analyses with a P-value <1×10⁻⁷ after exclusion of the initial study or studies, as described in Khoury et al. (2009).
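The two numeric criteria above translate directly into threshold checks. The following is a sketch under stated assumptions: the function names are hypothetical, and returning `None` when the consistency criterion "does not apply" is one possible way to represent that case, not something the guidelines prescribe.

```python
from typing import Optional

def grade_amount_of_evidence(n_minor_alleles: int) -> str:
    """Grade by the total number of minor alleles in cases and controls combined."""
    if n_minor_alleles > 1000:
        return "A"
    if n_minor_alleles >= 100:
        return "B"
    return "C"

def grade_consistency(i_squared: float,
                      p_excluding_initial: Optional[float] = None) -> Optional[str]:
    """Grade by the I² point estimate (in percent).

    Returns None when the criterion does not apply, i.e. when the P-value
    after exclusion of the initial study or studies is below 1e-7
    (representing this case as None is an assumption of this sketch).
    """
    if p_excluding_initial is not None and p_excluding_initial < 1e-7:
        return None
    if i_squared < 25.0:
        return "A"
    if i_squared <= 50.0:
        return "B"
    return "C"

if __name__ == "__main__":
    print(grade_amount_of_evidence(2400), grade_consistency(12.0))
    print(grade_amount_of_evidence(640), grade_consistency(38.0))
    print(grade_amount_of_evidence(80), grade_consistency(71.0))
```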
For protection from bias, the guidelines propose consideration of various potential
sources of bias, including errors in phenotypes, genotypes, confounding (population
stratification) and errors or biases at the meta-analysis level (publication
and other selection biases). A grade A implies that there is probably no bias that
can affect the presence of the association, grade B that there is no demonstrable
bias but important information is missing for its appraisal, and grade C that there
is evidence for potential or clear bias that can invalidate the association. Errors
and biases are also considered in the framework of the observed summary OR.
Whenever the summary OR deviates less than 1.15-fold from the null in meta-analyses
based on published data, we acknowledge that occult publication and selective
reporting biases alone may invalidate the association, regardless of the presence
or absence of other biases, and therefore assign a grade of C. When the summary
OR deviates more than 1.15-fold from the null, we assign a grade of C when the
modified regression test (Harbord et al, 2006) or the excess test suggests the possibility of publication bias
or significance-chasing bias, or when the association is no longer nominally statistically
significant upon exclusion of the initial study or of studies violating Hardy-Weinberg equilibrium (HWE).
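Only part of the bias criterion is mechanical; the choice between grades A and B depends on expert appraisal. The sketch below covers just the rules that force a grade of C, together with the overall credibility rating defined earlier in this section. All function and parameter names are hypothetical, and treating a deviation of exactly 1.15-fold as "not less than 1.15" is an assumption, since the text leaves that boundary open.

```python
def fold_deviation(summary_or: float) -> float:
    """Fold deviation of a summary OR from the null value of 1."""
    if summary_or <= 0:
        raise ValueError("an odds ratio must be positive")
    return max(summary_or, 1.0 / summary_or)

def forced_grade_c(summary_or: float,
                   bias_test_flagged: bool,
                   robust_to_exclusions: bool,
                   threshold: float = 1.15) -> bool:
    """Return True when the rules above assign grade C for protection from bias.

    bias_test_flagged:    the modified regression test or the excess test
                          suggests publication or significance-chasing bias.
    robust_to_exclusions: the association stays nominally significant after
                          excluding the initial study or HWE-violating studies.
    """
    if fold_deviation(summary_or) < threshold:
        return True  # small effects based on published data are graded C
    return bias_test_flagged or not robust_to_exclusions

def overall_credibility(amount: str, consistency: str, bias: str) -> str:
    """Combine the three grades into 'strong', 'moderate' or 'weak'."""
    grades = (amount, consistency, bias)
    if any(g not in ("A", "B", "C") for g in grades):
        raise ValueError("grades must be 'A', 'B' or 'C'")
    if "C" in grades:
        return "weak"
    if "B" in grades:
        return "moderate"
    return "strong"

if __name__ == "__main__":
    print(forced_grade_c(1.10, False, True))    # small effect
    print(forced_grade_c(1.30, False, True))    # no automatic downgrade
    print(overall_credibility("A", "B", "A"))
```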
3. Database Organization and Methods
For all polymorphisms with minor allele
frequencies in healthy controls >1%, and for which case-control genotype
data are available in four or more independent samples, crude odds ratios (ORs)
and 95 percent confidence intervals (CIs) are calculated from the reported
allele distributions for each study. Summary ORs and 95 percent CIs are calculated using the
DerSimonian and Laird (1986) random-effects model (using the 'rmeta' package in R). This procedure is
done including all studies irrespective of ethnicity (denoted by "All
Studies" on the meta-analysis figures), and for all ethnic groups with independent genotype data in at least three populations. Whenever applicable, the results of a number of sensitivity analyses are also displayed, e.g. after exclusion of the initial study or after exclusion of studies in which a violation of Hardy-Weinberg equilibrium (HWE; calculated using the 'HardyWeinberg' package in R) was detected. Overlapping
samples (of which usually only the largest is included), studies with missing
data, or control samples deviating from HWE are indicated on the meta-analysis
graphs. Please note that when only a few studies are included in the meta-analyses
(i.e. fewer than ~10), the random-effects model may yield summary ORs and
confidence bounds that are slightly anti-conservative.
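The per-study quantities mentioned above can be sketched in a few lines. This is an independent Python illustration, not the R code used for MSGene: the Woolf (log-scale) confidence interval and the uncorrected one-degree-of-freedom chi-square test for HWE are assumptions, as the text does not state which variants are used, and all function names are hypothetical.

```python
import math

def allele_counts(n_major_hom: int, n_het: int, n_minor_hom: int):
    """Convert genotype counts into (minor, major) allele counts."""
    return 2 * n_minor_hom + n_het, 2 * n_major_hom + n_het

def crude_allelic_or(case_minor, case_major, ctrl_minor, ctrl_major, z=1.959964):
    """Crude allelic OR with a Woolf-type 95% CI (assumed CI method)."""
    counts = (case_minor, case_major, ctrl_minor, ctrl_major)
    if min(counts) <= 0:
        raise ValueError("all four allele counts must be positive")
    log_or = math.log((case_minor * ctrl_major) / (case_major * ctrl_minor))
    se = math.sqrt(sum(1.0 / c for c in counts))
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)

def hwe_chi_square(n_major_hom: int, n_het: int, n_minor_hom: int):
    """Chi-square test (1 df, no continuity correction) for deviation from HWE."""
    n = n_major_hom + n_het + n_minor_hom
    p = (2 * n_major_hom + n_het) / (2.0 * n)   # major allele frequency
    q = 1.0 - p
    if p <= 0.0 or q <= 0.0:
        raise ValueError("HWE cannot be tested for a monomorphic marker")
    observed = (n_major_hom, n_het, n_minor_hom)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

if __name__ == "__main__":
    # Hypothetical genotype counts (major homozygotes, heterozygotes, minor homozygotes).
    cases, controls = (120, 150, 50), (160, 130, 30)
    or_, lo, hi = crude_allelic_or(*allele_counts(*cases), *allele_counts(*controls))
    print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
    print("HWE in controls: chi2 = %.2f, P = %.3f" % hwe_chi_square(*controls))
```

In this sketch the HWE check is applied to the control sample only, matching the note above that control samples deviating from HWE are flagged on the graphs.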
To allow a visual assessment of the change in summary OR over time, cumulative meta-analyses are displayed for each of the polymorphisms eligible for meta-analysis. Cumulative meta-analyses are only displayed for the ethnic subgroup (e.g. 'All' or 'Caucasian') with the overall best-ranking summary OR in the random-effects meta-analyses.
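The random-effects pooling and its cumulative variant can be sketched as follows. This is a plain-Python illustration of the DerSimonian and Laird (1986) moment estimator, not the 'rmeta' code used for MSGene; ordering studies by publication year and the input layout `(year, log OR, variance)` are assumptions of the example.

```python
import math

def dersimonian_laird(log_ors, variances, z=1.959964):
    """Random-effects summary OR with 95% CI, tau² and I² (in percent)."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))   # Cochran's Q
    df = len(log_ors) - 1
    c = sw - sum(wi * wi for wi in w) / sw
    # Between-study variance, truncated at zero; undefined for a single study.
    tau2 = max(0.0, (q - df) / c) if df > 0 and c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(ws * y for ws, y in zip(w_star, log_ors)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100.0 if df > 0 and q > 0 else 0.0
    return {
        "or": math.exp(pooled),
        "ci_low": math.exp(pooled - z * se),
        "ci_high": math.exp(pooled + z * se),
        "tau2": tau2,
        "i2": i2,
    }

def cumulative_meta_analysis(studies):
    """Re-pool after each added study; `studies` holds (year, log_or, variance)."""
    ordered = sorted(studies, key=lambda s: s[0])
    steps = []
    for k in range(1, len(ordered) + 1):
        subset = ordered[:k]
        summary = dersimonian_laird([s[1] for s in subset], [s[2] for s in subset])
        steps.append((subset[-1][0], summary))
    return steps

if __name__ == "__main__":
    # Hypothetical studies: (publication year, log OR, variance of log OR).
    demo = [(2003, math.log(1.8), 0.09), (2005, math.log(1.3), 0.04),
            (2007, math.log(1.1), 0.02), (2009, math.log(1.2), 0.01)]
    for year, s in cumulative_meta_analysis(demo):
        print(year, f"OR = {s['or']:.2f} ({s['ci_low']:.2f}-{s['ci_high']:.2f})")
```

Each row of the cumulative output is the summary of all studies published up to and including that year, which is what makes a drift in the summary OR over time visible.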
Inclusion of Genome-wide Association Studies (GWAS)
For the systematic inclusion of data from GWAS and other
large-scale studies, we have devised the following step-wise
protocol, which we believe captures the most relevant genetic information
without the need to include every data-point from these studies. Please visit this page to see a summary of all published large-scale
studies currently included in MSGene.
Stage I: Represents the inclusion
of genes and polymorphisms “featured” or
highlighted by the authors of the GWAS or other large-scale study, usually because they show
some degree of genetic association after completion of all analyses, e.g.
testing multiple independent samples. These genes and polymorphisms probably
represent the most important findings of each GWAS and are
therefore included in MSGene with highest priority. Genomic loci that do not map within any known gene are represented by a surrogate name specifying the cytogenetic location (e.g. “GWA_13q31.3”).
For studies that have made their genotype data publicly available, we will also make use of “non-featured” genotype
distributions, i.e. of polymorphisms not believed to be strongly associated with MS in
the original publications:
Stage II: Will add GWAS and large-scale study
genotype data for polymorphisms already available in MSGene, i.e. usually derived
from candidate gene studies. GWAS data for
such overlapping polymorphisms will be added to the gene-specific entries and,
if applicable, included and displayed in the meta-analyses. This stage adds valuable information to the existing
MSGene meta-analyses as it is derived from assessments that are largely
unbiased with respect to gene function, in contrast to most conventional
candidate gene studies. Note that genotype data from large-scale association studies using a "pooled" genotyping approach will not be considered for these analyses due to the sometimes substantial variability of genotype and allele frequencies when compared to subject-level genotyping.
Stage III: Focuses on published meta-analyses of the existing GWAS datasets. Genes and loci resulting from these analyses are treated equivalently to the "featured genes" of Stage I (above). Genotype data on the gene-specific pages will be extracted from the primary GWAS (if their data are publicly available) and displayed alongside a "GWAS meta-analysis" entry. This stage also entails the inclusion of more complex genetic analyses, e.g. those jointly analyzing large numbers of polymorphisms at different loci based on assumptions regarding the functional interconnection of these loci, such as in the form of "pathways". To the degree that it can be achieved in this context, these pathway-based results are labeled as such and stored in a separate "unmapped" section of the database.
***Please note that GWAS genotype data and allele frequencies cannot be displayed online due to the necessary data protection policy (see ref. Homer et al., 2008, for explanation), unless provided in the original publications. The respective SNP entries are labelled as "Either no data provided, or data otherwise not eligible for inclusion". However, GWAS genotype data are included in the MSGene meta-analyses and rounded ORs and CIs are displayed on the respective graphs where applicable.***
Association studies on mitochondrial genes
Studies assessing a potential association between MS and genetic variants in the mitochondrial (mt) genome are subject to the same inclusion criteria as studies investigating markers from the nuclear genome, and are displayed on a separate "chromosome graph" (which is adapted from imagery on the "Mito Map" website [http://www.mitomap.org/]). Owing to the specific characteristics of human mt-inheritance (e.g. its multicopy nature and the high frequency of somatic mutation events) and the innate heterogeneity of mt-association studies, however, genotype data from these studies are not included in MSGene and are therefore not subject to meta-analysis.
Allen NC, Bagade S, McQueen MB, Ioannidis JP, Kavvoura FK, Khoury MJ, Tanzi RE, Bertram L (2008) "Systematic meta-analyses and field synopsis of genetic association studies in schizophrenia: the SzGene database." Nat Genet 40(7):827-34.
Bertram L, McQueen MB, Mullin K, Blacker D, Tanzi RE (2007)
"Systematic meta-analyses of Alzheimer disease genetic association
studies: the AlzGene database." Nat Genet 39(1):17-23.
DerSimonian R, Laird N (1986) "Meta-analysis in clinical trials." Control Clin
Trials 7(3):177-88.
Harbord RM, Egger M, Sterne JA (2006) "A modified test for small-study effects in
meta-analyses of controlled trials with binary endpoints." Stat Med 25:3443–3457.
Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, Pearson JV, Stephan DA, Nelson SF, Craig DW (2008) "Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays." PLoS Genet 4(8):e1000167.
Ioannidis JP, Boffetta P, Little J, O'Brien TR, Uitterlinden AG, Vineis P, Balding DJ, Chokkalingam A, Dolan SM, Flanders WD, Higgins JP, McCarthy MI, McDermott DH, Page GP, Rebbeck TR, Seminara D, Khoury MJ (2008) "Assessment of cumulative evidence on genetic associations: interim guidelines." Int J Epidemiol 37(1):120-32.
Khoury MJ, Bertram L, Boffetta P, Butterworth AS, Chanock SJ, Dolan SM, Fortier I, Garcia-Closas M, Gwinn M, Higgins JP, Janssens AC, Ostell J, Owen RP, Pagon RA, Rebbeck TR, Rothman N, Bernstein JL, Burton PR, Campbell H, Chokkalingam A, Furberg H, Little J, O'Brien TR, Seminara D, Vineis P, Winn DM, Yu W, Ioannidis JP (2009) "Genome-wide association studies, field synopses, and the development of the knowledge base on genetic variation and human diseases." Am J Epidemiol 170(3):269-79.