In the genomic era, large amounts of genotype and phenotype data are publicly available as ‘big genomic data’. Genome-wide genotype information has provided valuable insights into the genetic basis of complex human diseases. It is increasingly recognised that whole-genome approaches, which use all or most genetic variants across the genome simultaneously, are useful for analysing complex diseases. These approaches link individuals who are not related in the conventional (pedigree) sense but who can still be compared, because they share parts of their genome by descent from ancestors many generations back. This is a paradigm shift: it enables a design-free experiment for population genetic analyses that does not require pedigrees or related individuals. Combined with advanced statistical methods, whole-genome approaches are a promising tool for dissecting the genetic architecture of complex diseases and maximising the accuracy of risk prediction, leading to effective precision medicine.
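How "unrelated" individuals can be compared through shared genome is usually made concrete with a genomic relationship matrix (GRM). The sketch below computes the standardised-genotype estimator G = ZZ'/m, which matches the off-diagonal elements used by GCTA (GCTA adjusts the diagonal slightly differently). It is an illustration under these assumptions, not the group's code; the function name `genomic_relationship_matrix` and the toy genotypes are hypothetical.

```python
import numpy as np


def genomic_relationship_matrix(genotypes: np.ndarray) -> np.ndarray:
    """Return an n x n GRM from an n x m matrix of 0/1/2 allele counts.

    Hypothetical helper: allele frequencies are estimated from the sample
    and monomorphic SNPs (p = 0 or 1) are dropped to avoid dividing by zero.
    """
    x = np.asarray(genotypes, dtype=float)
    p = x.mean(axis=0) / 2.0                          # frequency of the counted allele
    keep = (p > 0.0) & (p < 1.0)                      # drop monomorphic SNPs
    x, p = x[:, keep], p[keep]
    z = (x - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))  # standardise each SNP
    return z @ z.T / z.shape[1]                       # average over retained SNPs


# Toy data: 4 individuals x 5 SNPs (made-up allele counts).
geno = np.array([[0, 1, 2, 1, 0],
                 [1, 1, 2, 0, 0],
                 [2, 0, 1, 1, 1],
                 [0, 2, 0, 1, 2]])
G = genomic_relationship_matrix(geno)
print(np.round(G, 3))
```

Because the allele frequencies come from the same sample, every row of G sums to zero; positive off-diagonal entries flag pairs that share more alleles than expected from those frequencies, even when no pedigree is recorded.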

We are currently developing advanced whole-genome methods for causative variant detection, genotype-environment interaction analysis and dissection of the dynamic genetic architecture of complex traits, to maximise the accuracy of individual risk prediction.

Available software

MTG2 is a computer program that implements a multivariate linear mixed model to fit complex covariance structures constructed from genomic information, i.e. a multivariate version of GCTA REML. It gives residual maximum likelihood (REML) estimates of genetic and environmental variances and covariances across multiple traits, and computes best linear unbiased predictions (BLUP) for quantifying genetic merit or genetic risk. MTG2 uses the direct average information (AI) algorithm. Recently, we combined the direct AI algorithm with an eigen-decomposition of the genomic relationship matrix, as first proposed by Thompson and Shaw (1990). We applied this procedure to real data with univariate, multivariate and random regression linear mixed models with a single genetic covariance structure, and demonstrated that computational efficiency can increase more than 1,000-fold compared with standard REML software based on mixed model equations (MME). In addition, random regression models and reaction norm models are available in both univariate and multivariate frameworks. MTG2 also provides many other functions for complex trait analysis and statistical genetics (see the table of contents in the manual).
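To see why an eigen-decomposition of the genomic relationship matrix speeds up REML (the idea the paragraph above attributes to Thompson and Shaw, 1990), consider the univariate model y = Xb + g + e with var(y) = σ²_g G + σ²_e I. If G = U diag(s) U', then var(U'y) = diag(σ²_g s + σ²_e), so after one O(n³) decomposition every likelihood evaluation needs only diagonal operations. The NumPy sketch below is a hypothetical illustration on simulated data, not MTG2's code: it does not implement the direct AI algorithm, multivariate or random regression models, and all function names are made up. It checks the rotated REML log-likelihood and BLUP of genetic values against dense-matrix versions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated toy data (hypothetical): n individuals, m SNPs.
n, m = 200, 1000
geno = rng.binomial(2, rng.uniform(0.05, 0.5, m), size=(n, m)).astype(float)
p = geno.mean(axis=0) / 2.0
keep = (p > 0) & (p < 1)                                   # drop monomorphic SNPs
Z = (geno[:, keep] - 2 * p[keep]) / np.sqrt(2 * p[keep] * (1 - p[keep]))
G = Z @ Z.T / Z.shape[1]                                   # genomic relationship matrix

X = np.column_stack([np.ones(n), rng.normal(size=n)])     # intercept + one covariate
u = rng.normal(0.0, np.sqrt(0.4 / Z.shape[1]), Z.shape[1])  # SNP effects, sg2 = 0.4
y = X @ np.array([1.0, 0.5]) + Z @ u + rng.normal(0.0, np.sqrt(0.6), n)

# One-off O(n^3) step: G = U diag(s) U', then rotate y and X.
s, U = np.linalg.eigh(G)
Xr, yr = U.T @ X, U.T @ y


def reml_loglik_rotated(sg2, se2, s, Xr, yr):
    """REML log-likelihood (up to a constant); var(U'y) is diagonal."""
    d = sg2 * s + se2                          # diagonal of var(U'y)
    Xw = Xr / d[:, None]                       # D^-1 X*
    XtVX = Xr.T @ Xw                           # X' V^-1 X
    b = np.linalg.solve(XtVX, Xw.T @ yr)       # GLS estimate of fixed effects
    r = yr - Xr @ b                            # rotated residuals
    return -0.5 * (np.log(d).sum() + np.linalg.slogdet(XtVX)[1] + r @ (r / d))


def blup_rotated(sg2, se2, s, U, Xr, yr):
    """BLUP of genetic values, g_hat = sg2 * G V^-1 (y - X b_hat)."""
    d = sg2 * s + se2
    Xw = Xr / d[:, None]
    b = np.linalg.solve(Xr.T @ Xw, Xw.T @ yr)
    return U @ (sg2 * s * (yr - Xr @ b) / d)   # rotate back to the original order


def dense_reference(sg2, se2, G, X, y):
    """Same REML log-likelihood and BLUP using dense n x n inverses (check only)."""
    V = sg2 * G + se2 * np.eye(len(y))
    Vi = np.linalg.inv(V)
    XtVX = X.T @ Vi @ X
    b = np.linalg.solve(XtVX, X.T @ Vi @ y)
    P = Vi - Vi @ X @ np.linalg.solve(XtVX, X.T @ Vi)
    ll = -0.5 * (np.linalg.slogdet(V)[1] + np.linalg.slogdet(XtVX)[1] + y @ P @ y)
    return ll, sg2 * G @ Vi @ (y - X @ b)


ll_fast = reml_loglik_rotated(0.4, 0.6, s, Xr, yr)
g_fast = blup_rotated(0.4, 0.6, s, U, Xr, yr)
ll_slow, g_slow = dense_reference(0.4, 0.6, G, X, y)
print(ll_fast, ll_slow)                        # should agree up to rounding error
```

In an iterative REML optimiser (MTG2 uses the direct AI algorithm for this), the decomposition is computed once and only the cheap rotated functions are called at each iteration. The shortcut relies on a single genetic covariance structure and residuals with covariance proportional to the identity; two or more relationship matrices generally cannot be diagonalised by the same rotation.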

Access MTG2 here
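Because MTG2 reports genetic and environmental (co)variances across traits, it may help to see how these enter a multivariate genomic model and how the genetic correlation follows from them. The block below is the standard bivariate form for two traits measured on the same n individuals, with y_1 and y_2 stacked trait by trait; the notation is ours (G is the n × n genomic relationship matrix, Σ_g and Σ_e are 2 × 2 genetic and residual covariance matrices), and MTG2's own models may be more general.

```latex
\operatorname{var}\begin{pmatrix}\mathbf{y}_1\\ \mathbf{y}_2\end{pmatrix}
  = \boldsymbol{\Sigma}_g \otimes \mathbf{G} + \boldsymbol{\Sigma}_e \otimes \mathbf{I}_n,
\qquad
\boldsymbol{\Sigma}_g = \begin{pmatrix}\sigma^2_{g_1} & \sigma_{g_{12}}\\ \sigma_{g_{12}} & \sigma^2_{g_2}\end{pmatrix},
\quad
\boldsymbol{\Sigma}_e = \begin{pmatrix}\sigma^2_{e_1} & \sigma_{e_{12}}\\ \sigma_{e_{12}} & \sigma^2_{e_2}\end{pmatrix},
\qquad
r_g = \frac{\sigma_{g_{12}}}{\sqrt{\sigma^2_{g_1}\,\sigma^2_{g_2}}}
```

For example, with hypothetical estimates σ²_g1 = 0.4, σ²_g2 = 0.1 and σ_g12 = 0.1, r_g = 0.1 / √(0.4 × 0.1) = 0.1 / 0.2 = 0.5.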


Current research projects

  • Advanced whole-genome approaches for causative variant detection and individual risk prediction of complex traits in human populations

    (PI: Associate Professor Hong Lee)

    The genomics era has revealed the true complexity of genetic traits, but it also holds promise for personalised genomic medicine, in which diagnosis and treatment are tailored to individuals based on profiles recorded in their genomes. A more feasible and realistic approach is 'precision medicine', in which individuals are classified into treatment-relevant subgroups based on profiles that incorporate information from both genomic and environmental risk factors.

    This project aims to develop advanced statistical methods to better detect causative variants and to better predict an individual's risk of disease. We have pioneered whole-genome methods and propose to improve upon them in several ways, including a flexible Bayesian framework to elucidate the genetic architecture of complex traits and a linear mixed model to capture currently undetected genetic variance. We will apply our new methods to large data sets, including next-generation sequencing data. Our methods may lead to individual disease-risk predictions that have clinical utility.

  • Multivariate whole genome estimation and prediction analysis of genomics data for complex diseases

    (PI: Associate Professor Hong Lee)

    Complex diseases are caused by a combination of multiple genetic and environmental effects, which may also affect other traits and diseases. The extent of these shared (pleiotropic) genetic effects is expressed by the genetic correlation, which is often high between diseases and traits. This implies that fitting multiple diseases and traits jointly in one model is important to shed light on the etiology of complex diseases. Genomic data, combined with advanced statistical tools, provide a plausible strategy for identifying the latent mechanisms shared across diseases and for increasing the accuracy of genomic risk prediction. In this project, we develop multivariate whole-genome estimation and prediction analysis of genomic data for complex diseases, which may lead to improved and personalised treatments.

  • Development of general statistical analysis package for the national livestock genetic evaluation

    (PI: Associate Professor Hong Lee)

    The aim of this project is to develop a versatile software package for a next-generation genetic evaluation system. We will develop and implement appropriate statistical models and methods to improve prediction accuracy using pedigree and genomic information, considering various methods including single-step approaches and genotype-by-environment interaction models. Applications to real data will validate the usefulness of the software package, which could be a paradigm shift in genetic evaluation systems. The outcome of this project will enable multivariate analysis of carcass traits, computationally efficient genetic evaluation (more than 100,000 genomes can be analysed simultaneously), and a method integrating complex pedigree and genomic information that can be applied to various breeds and species. These cutting-edge technologies will be implemented as computer code and used in further R&D projects.
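The livestock project does not state which single-step formulation the package will use. As a hedged illustration only, the widely used single-step GBLUP (Aguilar et al., 2010; Christensen and Lund, 2010) integrates pedigree and genomic information by replacing the inverse pedigree relationship matrix A⁻¹ in the mixed model equations with H⁻¹. Here animals are ordered with non-genotyped animals first, G is the genomic relationship matrix of the genotyped animals, and A₂₂ is the pedigree relationship block for those same animals:

```latex
\mathbf{H}^{-1} = \mathbf{A}^{-1} +
\begin{pmatrix}
\mathbf{0} & \mathbf{0}\\
\mathbf{0} & \mathbf{G}^{-1} - \mathbf{A}_{22}^{-1}
\end{pmatrix}
```

In practice G is usually adjusted, for example blended as (1 − w)G + wA₂₂ with a small w, so that it is invertible and on a scale compatible with A₂₂.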