Document Type

Article

Publication Date

6-13-2014

Abstract

A major use of the 1000 Genomes Project (1000GP) data is genotype imputation in genome-wide association studies (GWAS). Here we develop a method to estimate haplotypes from low-coverage sequencing data that can take advantage of single-nucleotide polymorphism (SNP) microarray genotypes on the same samples. First, the SNP array data are phased to build a backbone (or 'scaffold') of haplotypes across each chromosome. We then phase the sequence data 'onto' this haplotype scaffold. This approach can take advantage of relatedness between sequenced and non-sequenced samples to improve accuracy. We use this method to create a new 1000GP haplotype reference set for use by the human genetics community. Using a set of validation genotypes at SNPs and bi-allelic indels, we show that these haplotypes have lower genotype discordance and improved imputation performance into downstream GWAS samples, especially at low-frequency variants. © 2014 Macmillan Publishers Limited. All rights reserved.

Publication Source (Journal or Book title)

Nature Communications

