Software tool helps make genome-wide association studies more diverse



A software package called Tractor, developed by researchers at Massachusetts General Hospital, can help diversify genome-wide association studies.

Numerous genome-wide association studies (GWASs) have been conducted over the past 20 years, generating a large amount of data on the genetic variants that contribute to different diseases. This information is increasingly used to develop genetic risk scores that predict a person's likelihood of developing diseases ranging from cancer to Alzheimer's disease.

However, the data collected in these studies come largely from people of white European ancestry, so the resulting risk scores are less accurate for people of other ancestries, such as those of Asian or African descent.

“If you build disease risk models on the available data and try to extrapolate them to diverse populations, the accuracy of predicting who will get sick is reduced,” says Elizabeth Atkinson, Ph.D., lead author of the paper describing the research, published in Nature Genetics, and a researcher in the Analytical and Translational Genetics Unit at Massachusetts General Hospital.

“These errors exacerbate existing health disparities, in part because we cannot find specific genetic variants that could contribute to a higher risk of a particular disease in various populations.”

Part of the reason studies have so far been limited to these homogeneous groups is that patterns of genetic variation differ between ancestry groups, shaped by human migrations thousands of years ago, and these differences have been difficult for researchers to correct for in statistical analyses. For example, a person of African descent is likely to carry much more genetic variation than a person of European descent, and these differences can be difficult to account for when analyzing genomic data.

This is the problem the freely available Tractor software was developed to address. It is designed to allow diverse groups to be included in GWASs without making data analysis difficult or inaccurate.

“Different ancestry groups have genetic variants that occur at different frequencies due to the demographic history of populations,” says Atkinson.

“Failure to account for ancestry in a GWAS can lead to false positive results, or to genetic variants whose effects cancel out and are dismissed as unimportant. So until now it has been easier to exclude people with mixed ancestry from GWASs to avoid being confused by different patterns of genetic variation.”

Tractor uses genomic information to identify the ancestral origin of each segment of the genome of the individuals included in a GWAS and labels each segment accordingly. This allows researchers to estimate the ancestry-specific effect sizes of different variants more accurately.
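To make the labeling step concrete, here is a minimal Python sketch of how an individual's genotype at one site can be split into ancestry-specific allele counts once each haplotype carries a local-ancestry label. It assumes phased haplotypes with alleles coded 0/1 and ancestry labels coded as integers 0 to n_anc - 1 (as a local ancestry inference tool might produce). The function name `ancestry_dosages` and these encodings are hypothetical illustrations, not Tractor's actual code or file formats.

```python
from typing import List, Tuple


def ancestry_dosages(
    haps: List[Tuple[int, int]],
    anc: List[Tuple[int, int]],
    n_anc: int = 2,
) -> Tuple[List[List[int]], List[List[int]]]:
    """Split each individual's genotype at one site by local ancestry.

    haps: per individual, the allele (0/1) on each of the two phased haplotypes.
    anc:  per individual, the ancestry label (0..n_anc-1) of each haplotype
          at this site (hypothetical encoding, for illustration only).
    Returns (dosages, counts): for each individual, the number of alt alleles
    carried on haplotypes of each ancestry, and the number of haplotypes of
    each ancestry.
    """
    dosages: List[List[int]] = []
    counts: List[List[int]] = []
    for (h1, h2), (a1, a2) in zip(haps, anc):
        d = [0] * n_anc  # alt-allele count per ancestry
        c = [0] * n_anc  # haplotype count per ancestry
        for allele, a in ((h1, a1), (h2, a2)):
            c[a] += 1
            d[a] += allele
        dosages.append(d)
        counts.append(c)
    return dosages, counts


# Three individuals at one site.
dos, cnt = ancestry_dosages([(1, 0), (1, 1), (0, 1)], [(0, 1), (1, 1), (0, 0)])
# dos -> [[1, 0], [0, 2], [1, 0]]
# cnt -> [[1, 1], [0, 2], [2, 0]]
```

A standard GWAS would see only the total dosage (1, 2 and 1 here); keeping the per-ancestry counts separate is what makes ancestry-specific effect estimates possible.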

Atkinson and her colleagues tested the software in simulations and on population samples with different ancestries, and found that it was accurate and boosted the statistical power of GWAS. It also allows researchers to estimate how much a specific variant influences disease risk in different populations by estimating ancestry-specific effect sizes, which is not possible in a standard GWAS.

“Instead of getting a weighted average of the disease risk effect size for a particular gene variant, Tractor can determine how large or small the effect of a variant is in different ancestry groups,” Atkinson explains. “This will be informative for building genetic risk scores in diverse populations.”
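The contrast Atkinson draws between a single averaged effect and ancestry-specific effects can be illustrated with a toy simulation. The sketch below, which assumes NumPy, simulates two-way admixed individuals in which a hypothetical variant raises a quantitative trait only on haplotypes of ancestry 0. It then fits (a) a standard regression on total allele dosage and (b) a Tractor-style regression on the number of ancestry-1 haplotypes plus ancestry-specific dosages. The allele frequencies, effect sizes, sample size and function names are all made up for illustration, and the model omits covariates such as principal components that a real analysis would include; it is not Tractor's implementation.

```python
import numpy as np


def simulate(n=50_000, beta=(0.5, 0.0), freq=(0.3, 0.6), seed=0):
    """Toy two-way admixed sample (all parameters are hypothetical)."""
    rng = np.random.default_rng(seed)
    # Local ancestry (0 or 1) of each individual's two haplotypes,
    # drawn 50/50 as a simplifying assumption.
    anc = rng.integers(0, 2, size=(n, 2))
    # The allele's frequency depends on the ancestry of the haplotype carrying it.
    p = np.asarray(freq)[anc]
    haps = (rng.random((n, 2)) < p).astype(int)
    d0 = ((anc == 0) * haps).sum(axis=1)  # alt alleles on ancestry-0 haplotypes
    d1 = ((anc == 1) * haps).sum(axis=1)  # alt alleles on ancestry-1 haplotypes
    la1 = (anc == 1).sum(axis=1)          # number of ancestry-1 haplotypes
    # Trait: the variant acts with a different effect size in each ancestry.
    y = beta[0] * d0 + beta[1] * d1 + rng.normal(0.0, 1.0, n)
    return d0, d1, la1, y


def ols(X, y):
    """Ordinary least squares with an intercept; returns the slope coefficients."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]


d0, d1, la1, y = simulate()
# (a) Standard GWAS-style model: one effect for the total dosage.
pooled = ols(d0 + d1, y)[0]
# (b) Tractor-style model: local-ancestry count plus ancestry-specific dosages.
la_coef, beta0_hat, beta1_hat = ols(np.column_stack([la1, d0, d1]), y)
print(f"pooled: {pooled:.3f}  ancestry 0: {beta0_hat:.3f}  ancestry 1: {beta1_hat:.3f}")
```

With these settings the ancestry-specific estimates should land close to the simulated values of 0.5 and 0, while the pooled estimate falls between them, understating the effect in ancestry 0 and overstating it in ancestry 1.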

