Its adaptability and wide distribution are thanks to the diversity of its genetic material. The genomes of different naturally selected strains of A. thaliana, called accessions, differ in more than small ways: hundreds of genes are absent in some accessions and present in extra copies in others. This genetic flexibility leads to enormous phenotypic variety, with different accessions of the plant showing different stress responses, disease resistance, germination and flowering behavior, and other traits. It also makes A. thaliana an ideal plant in which to study the genetic basis of phenotypic variation.
The 1001 Genomes Project, a collaboration among thirteen research institutions worldwide, including the University of Chicago, was launched in 2008 with the goal of exploring whole-genome variation in at least 1001 A. thaliana accessions. The first phase of the project was recently completed with the successful sequencing of 1,135 genomes and publication of the results in the journal Cell. Researchers used this data to draw conclusions about A. thaliana’s population structure and about how the plant has evolved and migrated.
Professor Joy Bergelson, PhD, of the Department of Ecology and Evolution collaborated with the CRI to carry out the University of Chicago’s contribution to this study and publication. Sequencing was made possible by our new HPC cluster, Tarbell, on which the analysis required over 350,000 CPU hours. Our HPC System Administrator, Mike Jarsulic, worked with 1001 Genomes researchers to solve problems and optimize performance, while Director of Bioinformatics Jorge Andrade, PhD, coordinated data collection and nomenclature and contributed to the development of bioinformatics pipelines for analyzing the genomes.
A major practical motivation for the study was to enable genome-wide association studies (GWAS) using the nearly complete genotypic information made available by sequencing such a large collection of accessions. GWAS is a technique that tests genetic variants across many individuals to determine whether they are associated with phenotypic traits, and these studies become significantly more powerful with larger sample sizes. A. thaliana, with its many inbred strains adapted to different environments, offers a particularly well-suited set of diverse genomes on which to use GWAS techniques.
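For readers curious about what an association test looks like in practice, the following is a minimal sketch in Python and not the pipeline used in the study. It assumes each accession carries one of two alleles (coded 0 or 1) at every variant, which is a simplification that suits inbred lines, and it compares the average trait value between the two allele groups using Welch's t statistic. The function names, the data layout, and the toy flowering-time values are all hypothetical and invented for illustration. Real GWAS analyses also correct for population structure and for testing many variants at once, which this sketch omits.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for two samples; returns None if undefined."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    if se == 0:
        return None  # no variation within either group, statistic undefined
    return (mean(group_a) - mean(group_b)) / se


def association_scan(genotypes, phenotype):
    """Run one test per variant and return {variant_name: t statistic}.

    genotypes: {variant_name: [0 or 1 per accession]}  (hypothetical layout)
    phenotype: [trait value per accession], in the same accession order
    """
    scores = {}
    for variant, alleles in genotypes.items():
        ref = [p for a, p in zip(alleles, phenotype) if a == 0]
        alt = [p for a, p in zip(alleles, phenotype) if a == 1]
        # A sample variance needs at least two accessions per allele group.
        if len(ref) < 2 or len(alt) < 2:
            continue
        t = welch_t(alt, ref)
        if t is not None:
            scores[variant] = t
    return scores


# Toy data, invented for illustration: days to flowering for six accessions.
flowering_days = [30.0, 32.0, 31.0, 45.0, 47.0, 46.0]
genotypes = {
    "variant_1": [0, 0, 0, 1, 1, 1],  # allele tracks the late-flowering group
    "variant_2": [0, 1, 0, 1, 0, 1],  # allele is unrelated to the trait
}
scores = association_scan(genotypes, flowering_days)
```

In this toy example, variant_1 receives a much larger statistic than variant_2 because its allele separates early-flowering from late-flowering accessions. With more accessions, each group mean is estimated more precisely, which is one way to see why larger sample sizes make these studies more powerful.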
The impact of having sequenced these genomes will go far beyond the findings in this publication. Because it is possible to produce unlimited numbers of genetically identical A. thaliana plants under whatever environmental conditions researchers choose, having this genomic data available will enable new studies on how environment and genotype interact, a question relevant not only to evolutionary biology and plant breeding but to human genetics as well.