Sunday, November 13, 2011

Genetic Evidence Regarding the Peopling of Siberia

In modern times, the expanse of Siberia is dominated by speakers of Slavic languages (including Russian and Ukrainian), Altaic languages (including languages of the Mongolic, Turkic and Tungusic language families) and Uralic languages (including languages of the Samoyedic language family). However, several language families and language isolates found throughout Siberia tell of a more distant history of the region's population. These include the Chukotko-Kamchatkan languages of eastern Siberia, the Yukaghir languages of northeastern Siberia, the Nivkh and Ainu language isolates of Sakhalin, and the Ket language (the only surviving language of the Yeniseian language family).

Additionally, based on the theory that the Americas were populated by Siberians who crossed the Bering Strait, one would expect that languages ancestral to the Eskimo-Aleut, Na-Dené and Amerind language families of the indigenous Americans must at one time have been spoken in Siberia. In 2008, historical linguist Edward Vajda proposed that the Na-Dené and Yeniseian language families are in fact subphyla of a larger Dené-Yeniseian language stock, a proposal that has since been favorably received by a number of historical linguists, though it is not universally accepted. A similar proposal by Michael Fortescue in 1998 suggests that the Uralic, Chukotko-Kamchatkan, Yukaghir and Eskimo-Aleut families are in fact branches of a larger Uralo-Siberian language family.

This article explores the genetic differences and similarities between the modern speakers of the Slavic, Uralo-Siberian, Altaic, Nivkh, Ainu, Dené-Yeniseian and Amerind language families, with the goal of understanding how, when and why their ancestors came to inhabit the harsh expanse of Siberia.

The Big Picture: Haplogroup K(xLT) vs. Haplogroup C

In terms of the Y-DNA haplogroups exhibited by modern inhabitants of Siberia, subclades of haplogroup C and of haplogroup K(xLT) dominate the region. These two distantly related haplogroups (sharing a most recent common ancestor roughly 60,000 years before present) are believed to have spread across Eurasia independently, although they ultimately met in Siberia, forming the basis of the current population of both Siberia and the Americas. A lack of haplogroup C among the speakers of Amerind languages suggests either a bottleneck in the migration through the Americas, or that the ancestors of the first Siberians to migrate to the Americas lived in an eastern Siberian region that had, up to that point, been populated by groups carrying K(xLT) but not by groups carrying C. I find the latter to be the more convincing theory, and propose that an examination of the peopling of Siberia should begin with the ratio of C to K(xLT) among present populations.

Most Recent Arrivals: Slavic Speakers

The spread of Slavic languages throughout Siberia is known to be recent. Based on linguistic research alone, the Slavic languages probably split off from the rest of the Indo-European language family less than 4,000 years ago, and most Slavic languages are spoken not in Siberia but in Eastern Europe. Speakers of Slavic languages exhibit a high occurrence of Y-DNA haplogroup R1a, which is associated with Indo-European languages and is a sister clade of R1b, itself associated with the Vasconian languages of the Iberian Peninsula. The Slavic people do not generally exhibit haplogroup C. Any population speaking a Slavic language in Siberia would therefore be the result of either recent migration (as would be indicated by the dominance of R1a) or recent adoption of the Slavic language by that population, i.e. language shift. It should be noted, however, that R1a is a clade of K(xLT), indicating a distant relation to some of the other Siberian ethnicities, albeit one dating back some 40,000 years before present, and through a most recent common ancestor who probably resided in Central Asia, not Siberia.

Populations With Large Percentages of Haplogroup C

According to a 2004 study by Kristiina Tambets, et al., speakers of Tungusic languages (Evens and Evenks, in this study) exhibited high proportions of haplogroup C. Evens exhibited 74.2% haplogroup C; Evenks exhibited 67.7%. The dominant K(xLT) haplogroup making up the remainder of these populations was haplogroup N, perhaps a result of recent or ancient admixture with Uralo-Siberian-speaking populations. A 2005 study by Miroslava Derenko demonstrated that Mongolic-speaking Kalmyks exhibited 70.6% haplogroup C; Buryats exhibited 63.9% haplogroup C; and Mongolians exhibited 53.8%. In each of these populations, the next largest component of the Y-DNA mixture consisted of haplogroup N (although Mongolians exhibited a similar proportion of haplogroup O, likely due to a long period of close contact with Chinese populations). Given the time-depth of the separation of the Tungusic and Mongolic languages within the Altaic family, the haplogroup N component probably represents a uniform superstratum that appeared in Altaic populations soon after their arrival in Siberia. While Turkic-speaking Siberian populations exhibit both haplogroups C and N, they exhibit haplogroup C in far lesser proportions, probably due to admixture from various populations as a result of centuries of migrations and reverse migrations along the Silk Road. The dominant clade of haplogroup C among Altaic speakers is C3, which is believed to have first occurred approximately 12,000 years ago. C3 is also present in over 11% of the hypothetically Macro-Altaic speakers of the Korean language, and the related C1 sister clade, also approximately 12,000 years old, is frequent in hypothetically Macro-Altaic Japanese speakers. Perhaps the Macro-Altaic languages of the Japanese and Korean populations split off from the rest of the Altaic languages very close to the time subclades C1 and C3 occurred.

Other Y-DNA admixtures in Japanese and Koreans are not discussed here, as they are not, strictly speaking, Siberian populations. The Ainu of Siberia and Japan, however, will be discussed in a subsequent section of this article. Suffice it to say that populations of Proto-Macro-Altaic speakers arrived in Siberia, probably in two migrations (C1 along the coast and C3 further inland), around 10,000 years ago, and soon after, the inland population began interbreeding with earlier Uralo-Siberian populations exhibiting haplogroup N.
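
For readers who want the figures side by side, the haplogroup C frequencies quoted above can be tabulated with a few lines of Python. This is only an illustrative sketch: the percentages are the ones cited in this section, while the dictionary name and the two helper functions are hypothetical names introduced here. Note that the "remainder" is simply everything that is not haplogroup C; it is not broken down by haplogroup, so it should not be read as a K(xLT) frequency.

```python
# Haplogroup C frequencies (percent) exactly as quoted in the text above.
# The names below are illustrative; they do not come from the cited studies.
HAPLOGROUP_C_PERCENT = {
    "Evens (Tungusic)": 74.2,
    "Evenks (Tungusic)": 67.7,
    "Kalmyks (Mongolic)": 70.6,
    "Buryats (Mongolic)": 63.9,
    "Mongolians (Mongolic)": 53.8,
}


def rank_by_frequency(freqs):
    """Return (population, percent) pairs sorted from highest to lowest."""
    return sorted(freqs.items(), key=lambda item: item[1], reverse=True)


def non_c_remainder(freqs):
    """Return the share of each population that is NOT haplogroup C.

    This lumps together every other haplogroup (N, O, Q, R1a, ...).
    """
    return {pop: round(100.0 - pct, 1) for pop, pct in freqs.items()}


if __name__ == "__main__":
    remainder = non_c_remainder(HAPLOGROUP_C_PERCENT)
    for pop, pct in rank_by_frequency(HAPLOGROUP_C_PERCENT):
        print(f"{pop:<24} C: {pct:5.1f}%   non-C: {remainder[pop]:5.1f}%")
```

Sorting this way makes it easy to see that the Tungusic and Mongolic samples all sit above 50% haplogroup C, with the Evens highest and the Mongolians lowest.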

Subdivisions of Haplogroup K(xLT)

Around 40,000 years before present, K(xLT) was probably already divided into four subclades, namely M, NO, P and S. Haplogroups M and S are not of particular importance to this discussion, as they generally appear only in island populations of the southern Pacific, such as New Guinea. Approximately 30,000 years ago, haplogroup O split off from NO, and descendants of the individual who first carried that mutation include much of the population of East and Southeast Asia (including Sino-Tibetan, Tai-Kadai, Austronesian, Hmong-Mien and Austro-Asiatic speakers). Haplogroup N probably first occurred around 25,000 years before present, and its N1 subclades (primarily N1c) probably arrived in Siberia around 15,000 years ago and spread across the region via Uralo-Siberian-speaking populations. Haplogroup Q (the predominant Siberian subclade of haplogroup P) probably predated haplogroup N in Siberia, and will be discussed subsequently.

Evidence strongly suggests that the progenitor of haplogroup N spoke a language ancestral to the Uralic languages, as haplogroup N is found in high proportions in all areas where Uralic languages are spoken, particularly in populations of Nenets (97.3%), Nganasans (92.1%) and Khants (76.6%). See also the Tambets study, cited above. Selkups appear to be outliers, exhibiting only 6.9% haplogroup N but 66.4% haplogroup Q, which indicates a likely language shift by a Q population that assimilated into Uralic culture. According to a 2001 study by Jeffrey Lell, et al., 50.6% of Yupiks (Eskimo-Aleut speakers) are also of haplogroup N, as are 58.3% of Chukchis (Chukotko-Kamchatkan speakers), which supports the Uralo-Siberian hypothesis. I was not able to find any statistics regarding the levels of N among Yukaghir speakers. Although Koryaks speak a Chukotko-Kamchatkan language, they exhibit only 22.2% haplogroup N, and appear to have interbred heavily with Altaic-speaking populations. The same is true of the Itelmens, who exhibit only 11% haplogroup N, and have large components of both Altaic and Slavic admixture (C3 and R1a, respectively).

Origin of the Ainu

It would appear that the Ainu may be remnants of the first migration from Africa to Asia, as they exhibit haplogroup D (specifically D2), which is a sister clade of haplogroup E (common among Nilo-Saharan populations in Africa). The Ainu also exhibit a small percentage of haplogroup C3, suggesting admixture from an unrelated population in Siberia, probably the Nivkhs, who co-habited with the Ainu on Sakhalin Island prior to the Ainu migration to Japan, and who exhibit C3 at a rate of 47%. If the Ainu language were found to have any recognizable relation to any of the other Siberian languages, it would have to be a result of language shift, as the Ainu are not believed to be closely related to any other Siberian population, with the exception of the Nivkh admixture mentioned above. While the Ainu are the only population in Siberia with a strong correspondence to haplogroup D, other populations with considerable proportions of haplogroup D include Negrito populations in southern Asia, especially aboriginal Andaman Islanders. Perhaps mtDNA analysis can explain the difference in appearance between the Ainu and Negrito populations. If the founding population of the Ainu consisted mostly of males, and if those males interbred early on with females from another Siberian population (such as the Nivkhs), then one would expect to see haplogroup D in the Y-DNA, but mostly common Siberian haplogroups in the mtDNA.

Haplogroup Q

With the possible exception of the paternal ancestors of the Ainu, it would appear that the earliest settlers of Siberia were populations exhibiting haplogroup Q. In fact, haplogroup Q underlies all of the previously discussed populations in low frequencies (possibly excepting the Ainu, for whom I have not found any data relating to haplogroup Q frequency). This leaves us with one haplogroup with which to explain the divergence of the Nivkh, Dené-Yeniseian and Amerind languages, whose speakers all exhibit high proportions of haplogroup Q. Lell's study puts the Nivkh population at 35% Q. Kets (Dené-Yeniseian) have a 93.7% incidence of Q according to Tambets.

Michael Fortescue has suggested that the Nivkh language may be connected to a Native American language family referred to by Edward Sapir in 1929 as the Mosan language family (a subgroup of the traditional Amerind grouping that includes the Salishan, Wakashan and Chimakuan languages of the Pacific Northwest). According to Wikipedia, however, it has not yet even been proven whether the Mosan language family is in fact monophyletic. I would suggest, based on the Y-DNA evidence, that a Paleosiberian language ancestral to the Nivkh, Dené-Yeniseian and Amerind languages was probably once spoken in eastern Siberia, prior to the arrival of the Uralo-Siberian, Altaic and Slavic speakers. I am not suggesting at this point that Amerind is a monophyletic group, and in fact there seems to be some evidence that some Amerind languages are more closely related to Nivkh than they are to other Amerind languages. The key to reconstructing a Paleosiberian proto-language will be first studying, classifying and reconstructing the proto-languages of the language families of the Americas, and in particular Joseph Greenberg's elusive Amerind languages.

2 comments:

Andrew Oh-Willeke said...

Your date for Slavic is far too old. The Slavic expansion is historically dated to after the fall of the Roman empire during the Middle Ages starting ca. 600 CE (about 1400 years ago). It probably traces more distant roots to Illyrian and other Balkan languages. There was probably a meaningful demographic component to this expansion, not just language shift.

The predecessors of Slavic in much of its territory, however, were also probably Indo-European, from an early Eneolithic era expansion, except in the most Northern areas where they would have been Uralic language speakers.

The antiquity of Uralic is unclear as well. Linguistic evidence would suggest a proto-Uralic language comparable in age to proto-Indo-European, give or take a millennium or so. While the circumpolar N1 haplogroup is associated with Uralic today, the bearers of that haplogroup could very well have spoken a Paleosiberian language instead.

Andrew Oh-Willeke said...

Also, the Ainu are pretty closely associated with the Jomon people of Japan from ca. 30,000 years ago.