Paleo-Eskimo and Siberians

Tatiana Tatarinova



Tracing individuals to the place where their DNA was formed at the population level is a formidable challenge in genetic anthropology, population genetics, and personalized medicine [1]. The vast progress in developing resources for identifying candidate gene loci for medical care and drug development [2] has been largely unmatched in biogeography and ancestry inference. Only in the past decade have researchers begun harnessing high-throughput genetic data to improve our understanding of global patterns of genetic variation and their correlation with geography. This is not surprising, because genetic variation is largely determined by a demographic history of inbreeding or admixture, which often varies between geographic regions. Although interest in biogeographic methods has grown in the past few years, only a few computational tools exist, particularly for the analysis of mixed individuals [3-7]. These methods can be local (focusing on the origin of chromosomal segments), such as Lanc-CSV [8], LAMP-LD [9], and MULTIMIX [10]; global (averaging ancestral proportions across the genome), such as ADMIXTURE [11], STRUCTURE [12, 13], and reAdmix [7]; or both, such as HAPMIX [14] and LAMP [9, 15]. Some popular applications are PCA-based. For humans, PCA was shown to be accurate within 700 kilometers in Europe [3]. Spatial Ancestry Analysis (SPA and SPAMIX) [16] is an advanced tool that explicitly models allele frequencies. SPAMIX is reported to have an accuracy of 550 km for admixtures of two ancestries. Algorithms like mSpectrum [18], HAPMIX [14], and LAMP [9] achieve good accuracy at continental resolution [18] but do not achieve country-level resolution.

Related tools like BEAST [17], STRUCTURE [13], and Lagrange [18] are either inapplicable to autosomal data or cannot be used to study recent admixture in humans, animals, and plants. Looking at the Y chromosome and mtDNA alone is insufficient for detailed biogeographic analysis, since closely related populations have similar distributions of haplogroups. To address these limitations, we recently developed an admixture-based tool, Geographic Population Structure (GPS), that can accurately infer the ancestral origin of unmixed individuals [19]. GPS infers the geographical origin of individuals by comparing their “genetic signatures” to those of reference populations known to exhibit low mobility in the recent past. Its accuracy was demonstrated by assigning 83% of worldwide individuals to their country of origin and 65% to a particular region of that country. Applied to over 200 Sardinian villagers, GPS placed 25% of them in their villages and ≈ 50% within 50 kilometers of their villages. However, contemporary individuals are not necessarily sedentary: they often migrate to different areas and have offspring of mixed geographical origin. GPS would incorrectly place such offspring at the midpoint between the parental origins, a result unsuitable for pharmacology, forensics, and genealogy; GPS is therefore not equipped to handle mixed individuals. Moreover, individuals often have an indication of at least one of their possible origins, which could be used to improve the prediction, but existing tools are not designed to consider such information. To address these limitations, we developed reAdmix [7], a tool that models individuals as a mix of populations and can use user input to improve prediction accuracy.
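The limitation described above can be illustrated with a toy sketch. It is not the GPS or reAdmix implementation: the reference panel, the population names, and both function names are hypothetical, and a plain grid search stands in for the Differential Evolution optimizer named in [7]. Each population is represented by a vector of admixture proportions (a "genetic signature"). A single-origin assignment picks the nearest reference signature, while a mixture model searches for the pair of populations and the mixing fraction that best reproduce the signature.

```python
from itertools import combinations
from math import dist

# Hypothetical reference panel: population -> admixture proportions
# over three putative ancestral components (each row sums to 1).
REFERENCE = {
    "PopA": (0.80, 0.15, 0.05),
    "PopB": (0.10, 0.80, 0.10),
    "PopC": (0.05, 0.15, 0.80),
    "PopD": (0.45, 0.45, 0.10),
}


def nearest_population(signature, reference=REFERENCE):
    """Single-origin assignment: the reference closest in Euclidean distance."""
    return min(reference, key=lambda name: dist(signature, reference[name]))


def best_two_way_mix(signature, reference=REFERENCE, steps=20):
    """Grid search over population pairs and mixing fractions.

    Returns (error, pop1, pop2, fraction_of_pop1) for the best-fitting mix.
    """
    best = None
    for p, q in combinations(sorted(reference), 2):
        for k in range(steps + 1):
            w = k / steps
            mix = tuple(w * a + (1 - w) * b
                        for a, b in zip(reference[p], reference[q]))
            candidate = (dist(signature, mix), p, q, w)
            if best is None or candidate < best:
                best = candidate
    return best


# Simulated offspring of one PopA parent and one PopC parent.
child = tuple((a + c) / 2
              for a, c in zip(REFERENCE["PopA"], REFERENCE["PopC"]))

print(nearest_population(child))   # PopD: neither parental population
print(best_two_way_mix(child)[1:])  # ('PopA', 'PopC', 0.5)
```

In this toy panel the averaged signature of the offspring lies closest to an unrelated population, so the single-origin assignment is misleading, whereas the two-way search recovers both parental populations and their equal contributions.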

After demonstrating the accuracy of reAdmix on simulated datasets, we applied the algorithm to individuals of presumed Ket origin. The Kets, an ethnic group in the Yenisei River basin, Russia, are considered the last nomadic hunter-gatherers of Siberia, and the Ket language has no transparent affiliation with any language family. We collected data from 46 unrelated Ket samples and 42 samples from neighboring ethnic groups (Uralic-speaking Nganasans, Enets, and Selkups). We compared the GenoChip SNP array data for the Ket, Selkup, Nganasan, and Enets populations to a worldwide collection of populations based on 130K ancestry-informative markers [20]. We applied the GPS and reAdmix algorithms to infer the provenance of the samples and confirm self-reported ethnic origin. Combining the output of the two algorithms, we identified a subset of non-admixed Kets among self-identified Ket individuals and nominated two individuals for whole-genome sequencing. Analysis of these carefully selected individuals enabled us to establish that Kets belong to a group of modern populations closest to an ancient source of Siberian ancestry in Saqqaq.



  1. Tishkoff, S.A. and K.K. Kidd, Implications of biogeography of human populations for 'race' and medicine. Nature Genetics, 2004. 36(11 Suppl): p. S21-7.
  2. Takeda, J., et al., H-InvDB in 2013: an omics study platform for human functional gene and transcript discovery. Nucleic Acids Res, 2013. 41(Database issue): p. D915-9.
  3. Novembre, J., et al., Genes mirror geography within Europe. Nature, 2008. 456(7218): p. 98-101.
  4. Yang, W.Y., et al., A model-based approach for analysis of spatial structure in genetic data. Nature Genetics, 2012. 44(6): p. 725-31.
  5. François, O., et al., Principal component analysis under population genetic models of range expansion and admixture. Molecular Biology and Evolution, 2010. 27(6): p. 1257-68.
  6. Rannala, B. and Z. Yang, Improved reversible jump algorithms for Bayesian species delimitation. Genetics, 2013. 195: p. 245-253.
  7. Kozlov, K., et al., Differential Evolution approach to detect recent admixture. BMC Genomics, 2015. 16 Suppl 8: p. S9.
  8. Brown, R. and B. Pasaniuc, Enhanced methods for local ancestry assignment in sequenced admixed individuals. PLoS Comput Biol, 2014. 10(4): p. e1003555.
  9. Sankararaman, S., et al., Estimating local ancestry in admixed populations. Am J Hum Genet, 2008. 82(2): p. 290-303.
  10. Churchhouse, C. and J. Marchini, Multiway admixture deconvolution using phased or unphased ancestral panels. Genet Epidemiol, 2013. 37(1): p. 1-12.
  11. Alexander, D.H., J. Novembre, and K. Lange, Fast model-based estimation of ancestry in unrelated individuals. Genome Res, 2009. 19(9): p. 1655-64.
  12. Falush, D., M. Stephens, and J.K. Pritchard, Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics, 2003. 164(4): p. 1567-87.
  13. Falush, D., M. Stephens, and J.K. Pritchard, Inference of population structure using multilocus genotype data: dominant markers and null alleles. Molecular Ecology Notes, 2007. 7(4): p. 574-578.
  14. Price, A.L., et al., Sensitive detection of chromosomal segments of distinct ancestry in admixed populations. PLoS Genet, 2009. 5(6): p. e1000519.
  15. Liu, Y., et al., Softwares and methods for estimating genetic ancestry in human populations. Hum Genomics, 2013. 7: p. 1.
  16. Yang, W.-Y., et al., Spatial localization of recent ancestors for admixed individuals. G3, 2014. 4(12): p. 2505-18.
  17. Drummond, A.J. and A. Rambaut, BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol, 2007. 7: p. 214.
  18. Ree, R.H. and S.A. Smith, Maximum likelihood inference of geographic range evolution by dispersal, local extinction, and cladogenesis. Syst Biol, 2008. 57(1): p. 4-14.
  19. Elhaik, E., et al., Geographic population structure analysis of worldwide human populations infers their biogeographical origins. Nat Commun, 2014. 5.
  20. Elhaik, E., et al., The GenoChip: A New Tool for Genetic Anthropology. Genome Biology and Evolution, 2013.


Tatiana Tatarinova


Tatiana Tatarinova graduated from the Department of Theoretical Physics at the Moscow Engineering Physics Institute (MEPhI), continued her studies at the University of Utah, USA, and then completed her education in the Department of Mathematics at the University of Southern California, where she received her PhD. While in graduate school, Tatiana worked at the plant biotech startup Ceres, where she devised algorithms for the analysis of regulatory regions and for the annotation of plant genomes. She then moved to the United Kingdom, where she headed the Computational Biology group at the University of Glamorgan (Wales). In 2013, Tatiana received an offer to work at the University of Southern California and returned to California. The bioinformatics laboratory she leads develops new algorithms for functional genomic analysis, human and plant population genetics, and work with ancient DNA. Since 2014, the Tatarinova laboratory has collaborated closely with Russian scientists at Moscow State University (MSU), IITP, IOGen, and Skolkovo and has taken part in educational programs. Tatiana holds a black belt (4th dan) in Shotokan karate and practices yoga and singing.