Automated Clone Evaluation (ACE) - Automated high-throughput sequence validation

Development Team: Michael Fiacco and Preston Hunter

 

Increasing interest in elucidating the function of proteins encoded by numerous sequenced genomes has led to the development of the functional proteomics field. For functional protein studies to be relevant, cloned plasmids must be accurate copies of the intended gene, which can only be determined by full-length sequencing and annotation. Whereas the molecular assembly of protein expression clones is readily automated and routinely accomplished in high throughput, sequence verification of these clones is still largely performed manually, a time-consuming process. The ultimate goal of validation is to determine whether a given plasmid clone matches its reference sequence sufficiently to be “acceptable” for use in protein expression experiments. Given the accelerating increase in the availability of tens of thousands of unverified clones, there is a strong demand for rapid, efficient, and accurate software that automates clone validation.


 

 

We developed an Automated Clone Evaluation (ACE) system, which is a comprehensive, multi-platform, web-based plasmid sequence verification software package (Taycher et al. 2007). ACE automates the clone verification process by defining each clone sequence as a list of multidimensional discrepancy objects (see table for details), each describing a difference between the clone and its expected sequence, including the resulting polypeptide consequences. To evaluate clones automatically, this list can be compared against user-defined acceptance criteria that specify the allowable number of discrepancies of each type. This strategy allows users to re-evaluate the same set of clones against different acceptance criteria as needed. ACE manages the entire sequence validation process, including contig management, identifying and annotating discrepancies, determining whether discrepancies correspond to polymorphisms, and clone finishing (Chang and LaBaer 2005) (see figure below for typical clone sequence verification workflow). ACE provides an integrated environment that relieves the user of routine process management tasks, such as bookkeeping of all project- and clone-related information, re-entering of parameters and criteria, and history tracking. It automates all steps of sequence verification, including primer design, sequence contig assembly, gap mapping, demarcating low confidence regions in sequence coverage, and identifying polymorphisms.
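The comparison of a discrepancy list against acceptance criteria can be illustrated with a minimal Python sketch. It is not ACE's actual implementation: the `Discrepancy` class, the `evaluate_clone` function, and the discrepancy type names ("silent", "missense") are hypothetical, and each discrepancy is reduced to a single type for brevity, whereas ACE describes discrepancies along several dimensions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Discrepancy:
    """Hypothetical, simplified discrepancy record (not ACE's real schema)."""
    position: int  # 1-based position in the reference sequence
    kind: str      # illustrative type label, e.g. "silent" or "missense"


def evaluate_clone(discrepancies, max_allowed):
    """Compare a clone's discrepancy list against acceptance criteria.

    max_allowed maps a discrepancy type to the maximum count tolerated.
    Assumption: a type missing from max_allowed is not tolerated at all.
    Returns (accepted, violations), where violations maps each offending
    type to the number of discrepancies observed for it.
    """
    counts = Counter(d.kind for d in discrepancies)
    violations = {
        kind: n for kind, n in counts.items() if n > max_allowed.get(kind, 0)
    }
    return (not violations), violations


# The same clone evaluated against two different sets of criteria.
clone = [Discrepancy(120, "silent"), Discrepancy(456, "missense")]
strict = {"silent": 1}
lenient = {"silent": 2, "missense": 1}

print(evaluate_clone(clone, strict))
print(evaluate_clone(clone, lenient))
```

Because the discrepancy list is stored separately from the criteria, re-evaluating a clone only requires calling the function again with a different `max_allowed` mapping; no re-analysis of the sequence is needed.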

Designed to manage thousands of clones simultaneously, ACE uses an Oracle database to store information about clones at various completion stages, project processing parameters, and acceptance criteria. The software has been used successfully to evaluate over 250,000 clones at the LaBaer lab. Its use has allowed a dramatic reduction in the amount of time and labor required to evaluate clone sequences and has decreased the number of missed sequence discrepancies, which commonly occur during manual evaluation. In addition, ACE has helped to reduce the number of sequencing reads needed to achieve adequate coverage for making decisions on clones.


 

 


Block diagram of typical clone sequence verification workflow. The diagram illustrates a typical workflow for the full-length sequence validation of a protein coding clone used in the LaBaer lab. A project begins with end read sequencing of one or more clonal isolates per target. End reads are acquired, assigned to their corresponding clone (End Read Processor), and then processed by the assembler to determine whether end reads alone are sufficient to yield a complete contig assembly (Assembly Wrapper). Whether or not the assembly yielded a single contig covering the full-length CDS, clone contig(s) are analyzed to detect differences or “discrepancies” compared with the reference/target sequence (Discrepancy Finder). ACE can compare any discrepancies with one or more sequence databases, such as GenBank, to determine whether they correspond to naturally occurring polymorphisms (Polymorphism Finder). During the final decision process, users can optionally configure the software to avoid penalizing clones for discrepancies that represent polymorphisms (Figure 2). If more than one isolate exists for a given clone, an optional module (Isolate Ranker) can rank isolates based on user-defined preferences specified in the form of penalties associated with different types of discrepancies. Clones that failed to assemble into a single contig covering the CDS can be scanned to find the remaining gaps in sequence coverage (Gap Mapper). In addition, sequence regions of low confidence can be analyzed to demarcate their boundaries (Low Confidence Regions Finder). Subsequently, clones with low confidence regions or gaps in sequence coverage can be processed to define appropriate internal sequencing primers to cover those regions (Primer Designer). At any stage during the clone verification process, a set of clones can be processed by the Decision Tool to determine how far each clone has progressed in the analysis pipeline and its acceptance/rejection status.
Processing steps in parentheses and italics are optional.
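
Two of the steps in this workflow, gap mapping and isolate ranking, lend themselves to a short illustration. The Python sketch below is a simplified reading of the description above, not ACE's code: the function names `find_gaps` and `rank_isolates`, the interval representation of contigs, the `(type, is_polymorphism)` tuples, and the penalty values are all assumptions made for the example.

```python
def find_gaps(cds_length, contigs):
    """Return uncovered regions of the CDS as 1-based inclusive (start, end) pairs.

    contigs: (start, end) intervals, 1-based inclusive, giving where each
    assembled contig aligns on the reference CDS (hypothetical input format).
    """
    gaps = []
    next_uncovered = 1  # first CDS position not yet known to be covered
    for start, end in sorted(contigs):
        if start > next_uncovered:
            gaps.append((next_uncovered, min(start - 1, cds_length)))
        next_uncovered = max(next_uncovered, end + 1)
        if next_uncovered > cds_length:
            break
    if next_uncovered <= cds_length:
        gaps.append((next_uncovered, cds_length))
    return gaps


def rank_isolates(isolates, penalties, ignore_polymorphisms=False):
    """Order isolate names from lowest to highest total penalty.

    isolates: maps an isolate name to a list of (type, is_polymorphism) tuples.
    penalties: maps a discrepancy type to a user-defined penalty.
    Assumption: types without a penalty entry contribute 0; ties are broken
    alphabetically by isolate name.
    """
    def score(discrepancies):
        return sum(
            penalties.get(kind, 0)
            for kind, is_polymorphism in discrepancies
            if not (ignore_polymorphisms and is_polymorphism)
        )

    return sorted(isolates, key=lambda name: (score(isolates[name]), name))


# Two overlapping contigs plus a third leave one internal gap in a 1000-bp CDS.
print(find_gaps(1000, [(1, 400), (350, 600), (800, 1000)]))

# Illustrative penalties; isolate1 carries a missense change that matches a
# known polymorphism, isolate2 carries two silent changes.
isolates = {
    "isolate1": [("missense", True)],
    "isolate2": [("silent", False), ("silent", False)],
}
penalties = {"silent": 1, "missense": 10}
print(rank_isolates(isolates, penalties))
print(rank_isolates(isolates, penalties, ignore_polymorphisms=True))
```

In this sketch the gap coordinates returned by `find_gaps` are the kind of regions that would then be passed to primer design, and the `ignore_polymorphisms` flag mirrors the option of not penalizing discrepancies that represent polymorphisms.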

 


Publications

Taycher E, Rolfs A, Hu Y, Zuo D, Mohr SE, Williamson J, LaBaer J. A novel approach to sequence validating protein expression clones with automated decision making. BMC Bioinformatics. 2007 Jun 13;8:198. PMID: 17567908

Chang CY, LaBaer J. DNA polymorphism detector: an automated tool that searches for allelic matches in public databases for discrepancies found in clone or cDNA sequences. Bioinformatics. 2005 May 1;21(9):2133-5. Epub 2005 Feb 2. PMID: 15695424