The Siddis (Afro-Indians) are a tribal population whose members live in coastal Karnataka, Gujarat, and in some parts of Andhra Pradesh. Historical records indicate that the Portuguese brought the Siddis to India from Africa about 300–500 years ago; however, there is little information about their more precise ancestral origins. Here, we perform a genome-wide survey to understand the population history of the Siddis. Using hundreds of thousands of autosomal markers, we show that they have inherited ancestry from Africans, Indians, and possibly Europeans (Portuguese). Additionally, analyses of the uniparental (Y-chromosomal and mitochondrial DNA) markers indicate that the Siddis trace their ancestry to Bantu speakers from sub-Saharan Africa. We estimate that the admixture between the African ancestors of the Siddis and neighboring South Asian groups probably occurred in the past eight generations (∼200 years ago), consistent with historical records.
Siddis, or Habshis, are a unique tribe that has African ancestry and lives in South Asia. They are mainly found in three Indian states (Gujarat, Karnataka, and Andhra Pradesh), and according to the latest census, their total population size is about 0.25 million.1 The first documented record of Siddis in India dates to 1100 AD, when the Siddis settled in Western India.2,3 By the thirteenth century, substantial numbers of Siddis were being imported by the Nawabs and the Sultans of India to serve as soldiers and slaves. The major influx of Siddis occurred during the 17th–19th centuries, when the Portuguese brought them as slaves to India.2 Previous genetic studies have shown that the Siddis have ancestry from up to three continental groups: Africans, Europeans, and South Asians.2,4,5 Some genetic studies have suggested that they are most closely related to Africans.3,6 However, the specific African group to which the Siddis trace their ancestry remains unknown. To obtain a high-resolution genome-wide perspective of ancestry, we analyzed data from three Siddi groups (from Karnataka and Gujarat) by genotyping them with ∼850,000 autosomal and sex-linked markers. Applying statistical methods, we have estimated the contributions of various continental ancestries to the Siddi genome and investigated the likely source of the ancestral populations and the timing of the admixture events.
Blood samples (about 10 ml from each individual) were collected from Gujarat and Karnataka in India. Specifically, we collected samples from 60 Siddis (unrelated and healthy males) and 90 individuals belonging to the nearby tribal populations (Charan and Bharwad) of the Junagarh district of Gujarat, and from 94 Siddis (65 males and 29 females) and 178 individuals belonging to neighboring tribal populations (Medar, Gram Vokkal, Kare Vokkal, and Korova) from the Uttara Kannad district of Karnataka. Informed written consent was obtained from all the donors. This project was approved by the Institutional Ethical Committee of the Centre for Cellular and Molecular Biology, Hyderabad, India. We genotyped 16 Siddi samples on Affymetrix (SNP 6.0) arrays by using standard protocols. We removed four duplicate samples and restricted the analysis to SNPs that had <5% missing data (846,418 SNPs). Two merged datasets were created for further analysis. Dataset I contained Siddi data that were merged with data from the International Haplotype Map (HapMap) phase 3 dataset7 (n = 1,115 samples from 11 populations genotyped on an Illumina 1M array) and the India Project (n = 132 individuals from 25 groups genotyped on an Affymetrix 6.0 array). The merged dataset contained 574,197 SNPs. Dataset I was used for the principal-components analysis reported in Figure 1. Dataset II contained the Siddi data merged with three other datasets: the Population Reference Sample (POPRES)8 (n = 3,845 samples from 37 populations genotyped on an Affymetrix 500K array), the International Haplotype Map (HapMap) phase 3 dataset7 (n = 1,115 samples from 11 populations genotyped on an Illumina 1M array), and the India Project9 (n = 132 individuals from 25 groups genotyped on an Affymetrix 6.0 array). The merged dataset contained 257,840 SNPs. Dataset II was used for estimating the admixture proportions and dates of admixture.
Figure 1. African Ancestries in Siddis
To explore patterns of population structure in the Siddis and to test their genetic affinity to other groups worldwide, we analyzed autosomal data from 12 individuals from three Siddi groups (six individuals from Karnataka and six individuals from Gujarat), 128 individuals from 16 Indian groups (Mala, Madiga, Kurumba, Bhil, Kamsali, Satnami, Vysya, Naidu, Lodi, Tharu, Velama, Srivastava, Meghaval, Vaish, Kashmiri Pandit, and Hallaki), and 300 individuals from three HapMap populations (Yoruba from Ibadan, Nigeria [YRI], Utah residents with Northern and Western European ancestry [CEU], and Han Chinese from Beijing, China [CHB]).7,9 The 16 Indian groups were chosen because they spanned a high degree of diversity within India. It has previously been shown that most Indian populations have ancestry from two highly divergent groups: an Ancestral North Indian (ANI) population that is closely related to West Eurasians and an Ancestral South Indian (ASI) population that is not related to any population outside India. The ANI ancestry proportion lies within the range of 39%–71% across the 16 groups chosen.9 The ANI and ASI have been inferred to have been highly differentiated at the time they mixed, and Reich et al. (2009)9 estimated that the average allele frequency differentiation, FST (ANI, ASI), is ∼0.09.
We performed principal-components analysis (PCA) on the autosomal SNP data with the EIGENSOFT software.10 A plot of the first and second principal components (PCs) suggests that the Siddis have ancestry from Africans as well as Eurasians (Figure 1A). Like other Indian populations, Siddis have both ANI and ASI ancestry, but they lie off the main cline of ANI-ASI admixture and are closely related to African individuals (Figure 1A). The average allele frequency differentiation between the two Siddi groups (Karnataka and Gujarat) is relatively high; FST (Siddi_Karnataka-1, Siddi_Gujarat) = 0.02 (Table S1, available online), suggesting that the populations differ substantially, possibly as a result of endogamy, different ancestral origins, or admixture with different local South Asian groups. However, the diversity in the Siddis is not correlated with geography in our small sample; the individuals from the Karnataka-2 group are genetically closer to the Gujarat Siddis (FST [Siddi_Karnataka-2, Siddi_Gujarat] = 0.002) than to the other group from Karnataka (FST [Siddi_Karnataka-1, Siddi_Karnataka-2] = 0.026) (Table S1). This suggests that the members of Karnataka-2 might be recent migrants from Gujarat or that the ancestors of one of the Karnataka samples might have experienced a very strong recent founder event.
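As an illustration of how pairwise FST values such as those quoted above can be computed from allele frequencies, the following minimal sketch uses the Hudson estimator in its ratio-of-averages form. This is an assumption made for illustration: the text does not state which estimator produced the values in Table S1, and the function name `hudson_fst` and the toy frequencies are hypothetical.

```python
import numpy as np

def hudson_fst(p1, p2, n1, n2):
    """Hudson FST (ratio of averages) between two populations.

    p1, p2: per-SNP sample allele frequencies in each population.
    n1, n2: number of sampled chromosomes (2 x individuals) in each population.
    Illustrative only; not necessarily the estimator used for Table S1.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    # Numerator: squared frequency difference, corrected for sampling noise.
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    # Denominator: probability that two alleles, one from each population, differ.
    den = p1 * (1 - p2) + p2 * (1 - p1)
    # Sum over SNPs before dividing (ratio of averages).
    return num.sum() / den.sum()

# Toy example: two SNPs fixed for opposite alleles give complete differentiation.
fst_fixed = hudson_fst([1.0, 1.0], [0.0, 0.0], n1=12, n2=12)
# Identical frequencies give a slightly negative value after the sampling correction.
fst_same = hudson_fst([0.5, 0.5], [0.5, 0.5], n1=12, n2=12)
```

With only a handful of individuals per group, the sampling correction in the numerator is not negligible, which is one reason small-sample FST values should be read with their standard errors.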
Previous genetic studies with traditional biochemical and autosomal markers have suggested that the Siddis have ancestry from up to three distinct ancestral groups: Africans, Indians, and Europeans.3,6 To formally test whether the Siddis have ancestry from each of these three ancestral populations, we used a regression method proposed by Patterson et al. (2010).11 This method allowed us to model the allele frequency of the admixed Siddis as a linear combination of the allele frequencies in the ancestral populations and to build optimal models with and without each ancestral population and then compute the error between our model and the data. For example, to test whether the Siddis contain genetic admixture from Africans, we built two models with the data; one included Africans as the ancestral population, and another excluded Africans from the model. Applying this method to the Siddi Gujarat samples, we observed that there is strong evidence that the Siddis have African ancestry (Z score >> 25), but the genetic variation in Africans does not fully explain the underlying genetic data in the Siddis (Table S2A). Next, we assessed whether a two-way model or three-way mixture model provides a better fit to the data. Table S2B shows that a two-way model of African + Portuguese or African + Mala (or any other group that has high ASI ancestry) provides a poor fit to the data. However, the model of African + Vaish (or any other group that has high ANI ancestry) provides just as good a fit to the data as a three-way model of African + any Indian population + Portuguese (Table S2B). This suggests that the Siddis have some West-Eurasian-related (ANI or Portuguese) ancestry, in addition to their African and ASI ancestry. However, the size of our dataset prevents our methods from being sensitive enough to differentiate between ANI and Portuguese ancestry. 
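The idea of modeling admixed allele frequencies as a linear combination of ancestral frequencies, and of comparing models with and without a given ancestral population, can be sketched as follows. This is a simplified illustration on synthetic, noise-free data, not the published implementation of Patterson et al.; the helper `fit_mixture`, the simulated frequencies, and the 67%/33% mixture are hypothetical, and the weights are not constrained to sum to one.

```python
import numpy as np

def fit_mixture(target, sources):
    """Least-squares weights for target ~ sum_k weights[k] * sources[k].

    Returns the weights and the root-mean-square error of the fit.
    """
    X = np.column_stack(sources)
    weights, *_ = np.linalg.lstsq(X, target, rcond=None)
    residual = target - X @ weights
    return weights, float(np.sqrt(np.mean(residual ** 2)))

rng = np.random.default_rng(0)
n_snps = 5000
# Hypothetical ancestral allele frequencies (independent across SNPs).
african = rng.uniform(0.05, 0.95, n_snps)
indian = rng.uniform(0.05, 0.95, n_snps)
# Synthetic admixed population: 67% African and 33% Indian ancestry, no drift.
admixed = 0.67 * african + 0.33 * indian

# Model that includes the African source versus a model that excludes it.
w_full, err_full = fit_mixture(admixed, [african, indian])
w_reduced, err_reduced = fit_mixture(admixed, [indian])
```

In this toy setting the full model recovers the simulated weights and has essentially zero error, whereas the model without the African source fits markedly worse; the same contrast between nested models underlies the tests summarized in Table S2.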
To represent the ancestral non-African population of the Siddis, we combined the data from 16 Indian groups and the Portuguese (“ICP”). To test the robustness of our models, we analyzed Siddi Karnataka samples with the models built from the Siddi Gujarat samples and showed that the models provided a good fit to the data (Table S2C).
Applying the regression-style method to all three Siddi groups with YRI and ICP as the ancestral populations, we estimated that the Siddis have on average ∼67% African ancestry (Table 1). We obtained qualitatively similar results when we used East Africans (HapMap Luhya [LWK]) in place of YRI (Table 1 and Table S2D).
Table 1. Estimation of Ancestry Proportions in the Siddis
To characterize the temporal impact of admixture and to develop a historical interpretation of the results, we needed not only to qualitatively demonstrate a history of admixture but also to quantitatively estimate a date for the admixture event. We applied the ROLLOFF method,12 which utilizes information related to admixture linkage disequilibrium (LD) to estimate the time since admixture. This method capitalizes on the fact that the genome of an admixed population contains chromosomal segments from ancestral populations, whose lengths are inversely proportional to the time since admixture. By modeling the decay of the LD in the admixed individuals and weighting it by the allele frequency differentiation in the ancestral populations (such that the statistics are only sensitive to admixture LD), we can precisely estimate the time since the admixture event. Simulations have suggested that this method is robust even when only poor surrogates of the ancestral populations are available and that it can estimate the date of admixture up to 300 generations ago.12
Applying ROLLOFF to the Siddis (combining data from all three groups, namely Siddi_Karnataka-1, Siddi_Karnataka-2, and Siddi_Gujarat, to increase the power), we observed an approximately exponential decay of the weighted correlation with distance, which provides strong evidence of admixture (Figure 2). By using the least-squares method to fit an exponential curve to this pattern, we estimated an average date of ∼eight generations, or 200 years (if one assumes a generation time of 25 years13). This approximately coincides with the historical date of arrival of most African ancestors of the Siddis to India. To show that combining the data from the admixed groups does not substantially change the results, we ran ROLLOFF separately for each admixed group and obtained qualitatively similar results (within two standard errors) for Siddi_Gujarat and Siddi_Karnataka-1. Because of the limited number of samples, we were not able to perform the analysis for the Siddi_Karnataka-2 group (ROLLOFF analysis requires at least four samples). In addition, changing the African ancestral group to East African Luhya did not change the estimated date of admixture (Figure S1).
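The step from a decay curve to a date can be illustrated with a short sketch. It assumes the weighted correlation decays as A·exp(−n·d), where d is genetic distance in Morgans and n is the number of generations since admixture, and it uses a log-linear least-squares fit on synthetic, noise-free values in place of ROLLOFF's own fitting procedure. The function name `fit_admixture_date` and the simulated curve are hypothetical.

```python
import numpy as np

def fit_admixture_date(dist_morgans, weighted_corr, years_per_generation=25):
    """Estimate generations since admixture from an exponential LD decay.

    Fits log(corr) = log(A) - n * d by ordinary least squares, so all
    correlation values must be positive. Illustrative sketch only.
    """
    d = np.asarray(dist_morgans, dtype=float)
    y = np.asarray(weighted_corr, dtype=float)
    slope, intercept = np.polyfit(d, np.log(y), 1)
    generations = -slope  # decay rate per Morgan equals generations since admixture
    return generations, generations * years_per_generation

# Synthetic decay curve for an admixture event eight generations ago.
d = np.linspace(0.005, 0.5, 100)   # genetic distance in Morgans
corr = 0.1 * np.exp(-8 * d)        # amplitude 0.1 is arbitrary
gens, years = fit_admixture_date(d, corr)
```

On these simulated values the fit returns eight generations, which the 25-year generation time converts to 200 years. Real weighted correlations are noisy and can be negative at large distances, so a direct non-linear fit is the more appropriate choice in practice.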
Figure 2. ROLLOFF Analysis of Siddis
To gain insight into the most likely source of the African ancestry in Siddis, we examined paternally inherited Y-chromosomal biallelic markers as well as maternally inherited mtDNA markers. Analysis of data from uniparentally inherited markers can provide information about population genetic relatedness, including probable ancestral source populations and information related to admixture events. We genotyped 32 Y-chromosomal biallelic markers (viz. M94, M60, M182, M168, M130, M145, M96, M75, M2, M89, M82, M304, M172, M9, M70, M11, M45, M207, M173, M17, M124, M201, M170, M70, M147, M189, M214, M52, M33, M356, P36, and P2) in 125 Siddis and 268 individuals (all males) from nearby Indian groups. We combined our data with published data from 2,301 individuals belonging to 56 different groups from the African continent and 667 individuals from 16 populations from Gujarat, Karnataka, Maharashtra, and Andhra Pradesh in India (Document S2).14–26
We observed that the Y-chromosomal haplogroups B2-M182 and E1b1a-M2, which are characteristic of African ancestry, were present at high frequencies in the Siddis but not in other Indians. Moreover, about 70% of the Siddi male lineages fall into haplogroups generally characteristic of African populations (Figure 3A), thus confirming the results from the autosomal DNA markers (Figure 1B). The remaining 30% were C∗-M130- and M89-derived Indian or Near-Eastern lineages (H1a-M82, H2-Apt-H2, J2-M172, L-M11, and P∗-M45). The populations neighboring the Siddis were found to harbor only these Asian-specific haplogroups. It is interesting to note that none of the African paternal lineages were observed among the neighboring Indian groups, whereas Indian-specific lineages were detected in Siddi individuals. This suggests primarily unidirectional paternal gene flow from Indian populations to the Siddis (Figure 2B).
Figure 3. Y-Chromosomal and mtDNA Haplogroups in Siddis
To learn more about the source of the African paternal lineages, we performed PCA with a merge of our Y-chromosomal dataset (Siddis and neighboring Indian groups) with data from 2,301 individuals from 56 African populations (Document S2). A plot of the first and second PCs showed that the Siddis cluster with Bantu-speaking populations of sub-Saharan Africa (Figure S2A). Previous studies have proposed that the E3a (currently known as E1b1a), E2, and B2 haplogroups are associated with the Bantu expansions within Africa.21,22,27 The presence of these haplogroups in the Siddis suggests that their ancestors might have been part of this expansion. To investigate this possibility, we typed 17 Y-STRs by using multiplex PCR and the Y-filer kit (Applied Biosystems, Foster City, USA) in reaction volumes of 10 μl with 1 U of AmpliTaq Gold DNA polymerase (Applied Biosystems, Foster City, USA), 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 1.5 mM MgCl2, 250 μM dNTPs, 3.0 μM of each primer (forward primers were fluorescently labeled), and 1 ng of DNA template. Thermal cycling conditions were as follows: (1) 95°C for 11 min, (2) 30 cycles as follows: 94°C for 1 min, 61°C for 1 min, and 72°C for 1 min, (3) 60°C for 80 min, and (4) 25°C hold. The PCR amplicons along with GS500 LIZ (as a size standard) were run in the ABI 3730 DNA Analyzer (Applied Biosystems, Foster City, USA). The raw data were analyzed with the GeneMapper v4.0 software program (Applied Biosystems, Foster City, USA).
We excluded two DYS385 loci from the current analyses because they could not be distinguished via the typing method employed. We renamed locus DYS389I as DYS389b and calculated DYS389a by subtracting DYS389I from DYS389II. We constructed median-joining networks with ten common loci (Figure S3) for the two major African haplogroups (E1b1a-M2 and B2-M182) that are present at high frequencies in Siddis. We supplemented our dataset with other published data that included African samples.28,29 The TMRCA (time to most recent common ancestor) was estimated with the ρ statistic (the mean number of mutations from the assumed root), for which a 25-year generation time was used, and the TD statistic (for both, a mutation rate of 6.9 × 10⁻⁴ per STR per generation was assumed).30 The majority of the Siddi haplotypes were found shared on otherwise Bantu-specific branches and were present all over the tree (Figure S3). In addition, the Gujarat and Karnataka Siddis were highly diverged and did not share any haplotypes. These results support the autosomal observation of high FST differentiation among Siddis from Gujarat and Karnataka. Although the majority of the Siddi haplotypes were scattered in the network, we found that all haplogroup B2 Gujarat Siddis formed a cluster and coalesced to their most recent common ancestor 2.4 ± 1 Kya (thousand years ago). The sharing of haplotypes suggests relatedness among the samples. This is similar to the results seen in the autosomal analyses of the Siddi_Gujarat and Siddi_Karnataka-2 samples.
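The ρ-based dating described above can be sketched as follows. The sketch assumes a strict stepwise mutation model, so that the absolute difference in repeat counts at a locus counts the mutations separating a haplotype from the root, and it applies the same mutation rate to every locus. The function name `rho_tmrca` and the toy haplotypes are hypothetical and are not taken from the study data.

```python
import numpy as np

def rho_tmrca(haplotypes, root, mutation_rate=6.9e-4, years_per_generation=25):
    """TMRCA from the rho statistic for Y-STR haplotypes.

    haplotypes: (n_samples, n_loci) array of repeat counts.
    root: (n_loci,) repeat counts of the assumed root haplotype.
    Assumes single-step mutations and one rate shared by all loci.
    """
    haps = np.asarray(haplotypes)
    root = np.asarray(root)
    # rho: mean number of mutational steps separating each sample from the root.
    rho = np.abs(haps - root).sum(axis=1).mean()
    n_loci = haps.shape[1]
    # Expected steps per generation across all loci is mutation_rate * n_loci.
    generations = rho / (mutation_rate * n_loci)
    return rho, generations * years_per_generation

# Toy data: four haplotypes typed at ten loci, three one-step changes in total.
root = np.full(10, 12)
haps = np.tile(root, (4, 1))
haps[0, 0] += 1
haps[1, 3] -= 1
haps[2, 7] += 1
rho, tmrca_years = rho_tmrca(haps, root)
```

For this toy cluster ρ is 0.75 mutations per lineage, and dividing by the per-haplotype mutation rate and multiplying by the 25-year generation time gives the age in years. The estimate scales inversely with the assumed mutation rate, so the choice of rate dominates the uncertainty.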
The male effective population size was estimated with BATWING31,32 via a demographic model that assumes a period of constant size followed by exponential growth (the prior probabilities for the other parameters used in the model were set as previously described).26 A random subset of 40 samples was analyzed after 10⁶ to 10⁸ MCMC cycles, and we obtained the same posterior probability for effective population size (N) as that obtained after 10⁷ cycles. The effective population size of the African ancestors of Siddis brought to India during the slave trade was estimated as ∼1,400 individuals (Table S3 and Figure S4).
To gain insight into the maternal lineages and to test the directionality of maternal gene flow in the Siddis, we assayed the hypervariable region I (HVRI) of mtDNA in 153 Siddis and 269 individuals from the nearby Indian populations (accession numbers JN022021-JN022442). These sequences were compared with the revised Cambridge Reference Sequence (rCRS),33 and variations were scored. We assigned haplogroups on the basis of HVRI variations and further confirmed these by genotyping the coding-region mutations published to date.34 The mtDNA haplogroup distributions in the Siddis are shown in Figure 3B, Document S3, and Figure S5.
PCA plots of the combined dataset (Document S3) showed that the African-specific mtDNA haplogroups were present at high frequency in the Siddis; these results were similar to the observations from the autosomal and paternal lineages (Figure S2B). The African-specific haplogroup L was present at a frequency of 53% and 24% in Siddis from Gujarat and Karnataka, respectively. Previous studies have suggested that the L0a, L2a, L3b, and L3e haplogroups are associated with the Bantu expansion.35–39 Haplogroup L2a (including L2a1) was observed in the Siddis along with rare sublineages of L2, which further supports the conclusion that the ancestors of the Siddis were most likely African Bantus (Figure S5). The L0d lineage, which is now largely confined to the Khoisan-speaking South African populations but which was possibly more widespread in the past,40 was also observed in two Siddi individuals from Gujarat state. The presence of Indian-specific sublineages of M and N (R and U, which include M2, M3, M5, M6, M33, M35, M39, M57, R8, R30, and U2 haplogroups) is indicative of recent admixture with indigenous Indian populations (Figure S5).26 In addition, haplogroup T, which is widespread in Southern and Western Europe41 and is also present at a low frequency in some South Asian groups,42 was present in four Siddi individuals (Figure S5). This suggests maternal gene flow from a West Eurasian ancestral source, perhaps Portuguese or Indian. Consistent with the Y-chromosomal results, there is no evidence of African haplogroups in the neighboring Indian populations, thus confirming the hypothesis of unidirectional gene flow to Siddi individuals from contemporary Indian populations (Figure 1B).
To further explore the evidence of sub-Saharan ancestry, we analyzed data for the G6PD (MIM 305900) variants in Siddis along with 26 ethnic populations from India. The A− variant, which provides protection against malarial infection and is estimated to have originated in sub-Saharan Africa between 3,840 and 11,760 YBP,43 was observed only in Siddis (10%) and not in any other Indian populations (Table S4). This further strengthens the evidence for the sub-Saharan ancestry of the Siddis.
In conclusion, our combined analysis of genetic variation in the Siddis, involving high-resolution sex-linked and autosomal markers, provides strong evidence of African ancestry together with unidirectional gene flow from local Indian groups to the Siddis. The directionality of gene flow supports the complex genetic structuring among Indian populations, which are highly influenced by social norms. We have traced the likely ancestry of Indian Siddis to sub-Saharan African Bantus. The ancestry proportions based on the analysis of autosomal and Y-chromosomal markers are similar, whereas mtDNA markers reveal a higher proportion of South Asian lineages among Siddi individuals. The model that emerges from our results is as follows: During the course of the Bantu expansion, African farmers settled in East Africa. Later, during the 15th to 17th centuries, this region was predominantly ruled by the Portuguese. They brought some Africans to India as slaves and sold them to local Nawabs and Sultans. The descendants of these Africans, admixed with neighboring populations, comprise the present-day Siddi population of India (Figure 4).