Association Analysis in Soybean
Abstract
Association analysis is a new approach for locating genes or alleles of interest. The feasibility of whole-genome association analysis depends on several factors, including the level of linkage disequilibrium (LD) and the magnitude of population structure in a population. The goal of this study was to evaluate the success of whole-genome association analysis in soybean germplasm accessions using DNA markers across the whole genome. First, the extent of LD and the presence of population structure were estimated. Second, whole-genome association analysis was performed to locate the genes or alleles controlling flower color and pubescence color, as well as seed protein quantitative trait loci (QTLs), in 319 soybean [Glycine max (L.) Merr.] germplasm accessions. The soybean germplasm accessions had a relatively low level of LD, which, as indicated by r², declined very rapidly to 0.8 in less than 4 Kbp, as well as a highly diverse population structure. Despite the low LD and the strong population structure, whole-genome case-control analysis successfully detected the 65-bp insertion in the GmF3'5'H (GenBank acc. AY117551) gene controlling purple vs. white flower color, as well as a single-base deletion in the F3'H (GenBank acc. AB191404) gene controlling tawny vs. gray pubescence color. However, 28 gray pubescence lines did not contain the deletion, suggesting that a second mutation also determines the pubescence color alteration. In the case of seed protein QTLs, whole-genome regression analysis detected one of four previously reported seed protein QTLs, located on linkage group (LG) E, and a new seed protein QTL on LG K. The three other previously reported seed protein QTLs, on LGs A1, I, and M, were not detected. It is unclear why association analysis failed to detect these three QTLs.
However, possible reasons include incomplete adjustment for population structure, lack of statistical power, an inadequate number of genetic markers in light of the low level of LD, and the limited power of association analysis to detect alleles with relatively modest genetic effects.
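
The abstract summarizes LD with the r² statistic. As a minimal illustration of how that quantity is commonly computed for a pair of biallelic markers, the sketch below uses the standard definition r² = D² / (pA(1 − pA) pB(1 − pB)), where D = pAB − pA·pB. It is not the analysis pipeline used in the study: the function name `ld_r2` and the 0/1 allele coding are hypothetical, and the code assumes each accession contributes one phased haplotype, which is a simplification that is roughly reasonable for highly inbred lines.

```python
import math


def ld_r2(locus_a, locus_b):
    """Return r^2 between two biallelic loci.

    Each argument is a sequence of 0/1 allele codes, one entry per
    haplotype (hypothetical coding; assumes phased, fully homozygous lines).
    """
    if len(locus_a) != len(locus_b) or len(locus_a) == 0:
        raise ValueError("loci must be non-empty and of equal length")

    n = len(locus_a)
    p_a = sum(locus_a) / n  # frequency of allele 1 at the first locus
    p_b = sum(locus_b) / n  # frequency of allele 1 at the second locus
    # Frequency of haplotypes carrying allele 1 at both loci.
    p_ab = sum(1 for a, b in zip(locus_a, locus_b) if a == 1 and b == 1) / n

    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium, D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        # At least one locus is monomorphic, so r^2 is undefined.
        return math.nan
    return d * d / denom
```

Computing this value for marker pairs at increasing physical distance and plotting r² against distance gives the kind of LD decay curve that the abstract refers to.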