The Genome Analysis Centre (TGAC) welcomed participants to its one-day training workshop on biological analysis through sequence similarity searching, “Effective Similarity Searching - What BLAST does, Why it works”.
To advance our understanding of genes, how they function and how they have evolved, it can be highly beneficial to search for sequence similarities across diverse DNA and protein samples. Similarities can be used, for instance, to understand evolutionary relationships and help build phylogenetic trees, or to locate genes known to be involved in the development of a desired genetic trait, such as disease resistance. Conducting these comparisons is a complex process and requires bioinformatics tools such as the Basic Local Alignment Search Tool (BLAST).
BLAST threshold
Often the programme of choice for researchers, BLAST runs an algorithm that identifies regions of nucleotides in DNA, or amino acids in proteins, whose similarity exceeds a chosen threshold. As well as being able to alter this threshold, researchers can choose which of the many available databases of ‘target sequences’ the sequence of interest, or ‘query sequence’, will be compared against. By analysing the results of these searches, researchers can gain insight into a wide array of biological information.
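To make the idea of a threshold more concrete, the short Python sketch below shows one common way such a cut-off can be expressed: as an expectation value (E-value) derived from an alignment's raw score using Karlin-Altschul statistics. It is an illustration only, not BLAST's own code. The function names are hypothetical, the values of `LAMBDA` and `K` are assumed for the example (real values depend on the scoring matrix and gap penalties in use), and the sketch uses the full query and database lengths, whereas BLAST applies corrected "effective" lengths.

```python
import math

# Assumed statistical parameters for illustration only; in practice they
# depend on the scoring matrix and gap penalties chosen for the search.
LAMBDA = 0.267
K = 0.041


def bit_score(raw_score, lam=LAMBDA, k=K):
    """Convert a raw alignment score into a normalised bit score."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(raw_score, query_len, db_len, lam=LAMBDA, k=K):
    """Expected number of chance alignments scoring at least raw_score.

    Simplification: uses the full lengths rather than effective lengths.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


def passes_threshold(raw_score, query_len, db_len, max_evalue=1e-3):
    """Return True if the alignment's E-value is at or below the cut-off."""
    return expect_value(raw_score, query_len, db_len) <= max_evalue


if __name__ == "__main__":
    # A hypothetical 300-residue query searched against 10^8 residues.
    for score in (50, 100, 200):
        e = expect_value(score, 300, 10**8)
        print(f"raw score {score}: bits={bit_score(score):.1f}, E={e:.3g}, "
              f"reported={passes_threshold(score, 300, 10**8)}")
```

Because the E-value scales with the size of the database searched, the same alignment score becomes less significant as the database grows, which is one reason the choice of database matters.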
Evolutionary context
The workshop, led by Professor William Pearson of the University of Virginia, explored the biological and statistical concepts that make similarity searching a valuable tool. The group discussed the nature of homology (shared ancestry of sequences), the powerful insights it can provide and how similarity scores can be used to infer its existence. Participants also explored strategies for applying BLAST effectively and tailoring it to their future research. This included understanding the effects of modifying the BLAST search parameters and appreciating when it can be beneficial to do so.
“To get a better sense of what BLAST does well and when it might not do things as well, it’s really important to put it in an evolutionary context - to think about the biology that is causing this to work; you need to understand that you’re always moving back and forth on an evolutionary tree,” said Professor William Pearson, Biochemistry and Molecular Genetics, University of Virginia.
Scoring matrices
Attendee Ben White, PhD student at TGAC, commented: “I was interested in learning more about BLAST as I’ll be working on data from unreferenced crop species; looking for markers associated with resistant genes. The course made me rethink a lot of what I’d been previously taught about BLAST, and FASTA; highlighting the importance of doing simple things like selecting appropriate databases and thinking about the scoring matrices being used. In my research, I’ll be sure to follow the key principles from the course of using protein databases and expectation values, not percent identity, for my future similarity searches with BLAST/FASTA.”
Many thanks to all those who attended and facilitated this workshop.