>, followed immediately by the name of a sequence,
followed by one or more lines of sequence data, followed by
another > line, etc. For example:
>gene1 from Human
ATTATTGTATG....
TTCTGATCCCT....
>gene2 from Human
....
>gene3 from Mouse
....
It is important that the word immediately following a
> sign be unique.
Thus, the following would probably not work.
>gene1 from Human
ATTATTGTATG....
TTCTGATCCCT....
>gene1 from Mouse
....
>gene1 from Rat
....
For especially long sequences, or where you expect the number of domains to be extremely large, you may be better off downloading the ABA source code and running it locally; for very large tasks (e.g., aligning genomes), many of the tasks are easily automated and parallelized, but we cannot currently provide those services through the web site.
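The requirement that the word immediately following each > sign be unique can be checked before uploading a file. The following is a minimal sketch in Python, not part of AliWABA; the function name duplicate_names and the sample sequences are made up for illustration.

```python
from collections import Counter


def duplicate_names(fasta_text):
    """Return the sequence names (first word after '>') that occur more than once."""
    names = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            words = line[1:].split()
            # An empty header line counts as the empty name "".
            names.append(words[0] if words else "")
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical input: names gene1, gene2, gene3 are all distinct.
good = """>gene1 from Human
ATTATTGTATG
>gene2 from Human
TTCTGATCCCT
>gene3 from Mouse
ATTATTGTATG
"""

# Hypothetical input: gene1 is reused, so this file would probably not work.
bad = """>gene1 from Human
ATTATTGTATG
>gene1 from Mouse
TTCTGATCCCT
"""

print(duplicate_names(good))  # []
print(duplicate_names(bad))   # ['gene1']
```

An empty result means every name is unique; any names listed should be renamed before the file is submitted.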
If, after clicking Align in the previous step, a blank page loads, it is likely that your sequences
were not alignable. For example, sequences that are too short for reasonable alignments or that are
corrupted in some way may cause an error in the alignment program. In these cases, please mail your
sequence file to aba AT aba DOT nbcr DOT net and we will diagnose the problem.
Upon success, AliWABA will display a (possibly large) image of a graph. The notation of this graph
can be somewhat cryptic, so an explanation is in order. As with all ABA graphs, vertices do not
represent subsequences of the alignment, but intersections of subsequences. Edges represent
local alignments between 1 or more sequences. In the graphical representation of the ABA graph
used by AliWABA, vertices are numbered with an integer; this integer can be used to choose a path
in the graph (see below). Edges are labelled with three numbers, in the format a,b(c).
The first number, a, denotes the "length" of the edge (that is, the length of the local
alignment induced by the edge). The second number, b, is simply an edge identifier and can
generally be ignored. The final number (in parentheses) represents the multiplicity of that edge, or the number
of sequences participating in the local alignment. For example, the edge label 424,44(2)
describes a local alignment of length 424 between two sequences; recall that a local alignment between
two sequences has its own length, which may differ from the length of either participating substring
because of indels.
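The a,b(c) label format described above can be split into its three parts with a short script. This is only a sketch: the names EdgeLabel and parse_edge_label are hypothetical and are not part of ABA or AliWABA.

```python
import re
from typing import NamedTuple


class EdgeLabel(NamedTuple):
    length: int        # a: length of the local alignment
    edge_id: int       # b: edge identifier, can generally be ignored
    multiplicity: int  # c: number of sequences in the local alignment


# Matches labels of the form a,b(c), tolerating stray whitespace.
_LABEL = re.compile(r"^\s*(\d+)\s*,\s*(\d+)\s*\(\s*(\d+)\s*\)\s*$")


def parse_edge_label(label):
    """Parse an edge label such as '424,44(2)' into its three integers."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not an a,b(c) edge label: {label!r}")
    return EdgeLabel(*(int(group) for group in match.groups()))


label = parse_edge_label("424,44(2)")
print(label.length, label.edge_id, label.multiplicity)  # 424 44 2
```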
As a way to mark the sequences, the "source" vertex of each sequence is usually drawn with a red box around it and the name of the sequence written above the box. This can sometimes lead to confusion because the label for one sequence may be drawn in the box for another sequence; you can resolve this by selecting the edges for a given query sequence (see below).
Usually the ABA graph will be very large. You can change the size of the image or the compactness of vertices in the image by adjusting view options, as described below.
The primary operation that can be done from this view is to select edges, whose sequence data can then be either viewed or annotated.
At the bottom of the graph image are two sections with parameters that you may specify. The first section, Edge Selection, allows you to pick edges, paths, or sequences that you may be interested in exploring further. It is generally the case that an ABA graph contains a few high multiplicity edges that may represent interesting alignments, and many low multiplicity (or unique) edges that act as a sort of sequence glue that is less interesting. Often, you want to identify the high multiplicity (and especially, long and high multiplicity) edges and scrutinize them further, by retrieving the sequences along those edges or by annotating them against known domain databases.
There are five ways to specify a set of edges for which you want more information. To activate any particular method, click on its radio button and fill in the necessary parameters; pressing the "Change Selection" button will update the image (selected paths are marked in red), while pressing the "Reset Selection" button will forget any selections you've made.
(a,b,c,d,...)
containing 2 or more vertices. Multiple sequences can be activated at once
as (a,b,c,d...)+(x,y,z...). Once an edge is selected, it is always selected, so
(1,2,3,4,5)+(2,3,4) is the same as (1,2,3,4,5), not (1,2)+(4,5).
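The statement that (1,2,3,4,5)+(2,3,4) is the same as (1,2,3,4,5) follows if a selection is treated as a set of edges. The sketch below assumes that consecutive vertices in a parenthesized list denote the edge between them; path_edges is a hypothetical helper, not the parser AliWABA uses.

```python
import re


def path_edges(expr):
    """Expand '(a,b,c)+(x,y)' into the set of (u, v) edges between consecutive vertices."""
    edges = set()
    for group in re.findall(r"\(([^()]*)\)", expr):
        vertices = [int(v) for v in group.split(",") if v.strip()]
        if len(vertices) < 2:
            raise ValueError("a path needs 2 or more vertices")
        # Selecting an edge twice has no further effect, so a set is enough.
        edges.update(zip(vertices, vertices[1:]))
    return edges


print(path_edges("(1,2,3,4,5)+(2,3,4)") == path_edges("(1,2,3,4,5)"))  # True
print(path_edges("(1,2,3,4,5)+(2,3,4)") == path_edges("(1,2)+(4,5)"))  # False
```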
When a selection is already present on the graph, the natural operation to perform when adding a new selection
is to AND them together (that is, take their union, so that every edge in either selection is selected), which is the default operation. You can, however, take the intersection,
which will result in a smaller selection; this feature is primarily for completeness, as it is almost always
easier to construct the intersection by specifying its exact parts. There are cases, however, in which you will
need to remove edges from a selection. In these cases, you can choose Curr-New, which takes
the existing selection, computes the selection that you have specified in the form, and then removes the new
selection from the current selection and returns the difference. (Symmetrically, you can do this the opposite
way, where you select a small set of edges, then specify a bigger set and ask for New-Curr.)
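The ways of combining a current and a new selection described above behave like ordinary set operations on sets of edges. The following sketch models them with Python sets; the function name combine and the mode strings are assumptions made for illustration, not the names used by the web form.

```python
def combine(current, new, mode="union"):
    """Combine two edge selections; 'union' models the default behavior."""
    operations = {
        "union": current | new,         # default: every edge in either selection
        "intersection": current & new,  # only edges present in both selections
        "curr-new": current - new,      # Curr-New: remove the new edges from the current ones
        "new-curr": new - current,      # New-Curr: remove the current edges from the new ones
    }
    return operations[mode]


current = {(1, 2), (2, 3), (3, 4)}
new = {(2, 3), (3, 4), (4, 5)}

print(sorted(combine(current, new)))                  # [(1, 2), (2, 3), (3, 4), (4, 5)]
print(sorted(combine(current, new, "intersection")))  # [(2, 3), (3, 4)]
print(sorted(combine(current, new, "curr-new")))      # [(1, 2)]
print(sorted(combine(current, new, "new-curr")))      # [(4, 5)]
```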
One case where this is useful is when you want to investigate only the low multiplicity edges near two
vertices, one with high fan-in and the other with high fan-out. Suppose you have vertices A and
B, where A has in-degree 10 and B has out-degree 10, and
the edge (A,B) has multiplicity 10. To get all the fan-in and fan-out edges adjacent
to A or B, first select edge (A,B) by explicitly describing it.
Then extend the current selection by 1 edge on each end. Finally, explicitly specify edge (A,B) again,
choose Curr-New, and press Change Selection.
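The fan-in/fan-out example can be traced on a toy graph. This sketch assumes that extending a selection by one edge adds every edge that touches a vertex of an already selected edge; extend_by_one and the vertex names are hypothetical, and the real extension rule in AliWABA may differ.

```python
def extend_by_one(selection, graph_edges):
    """Add every graph edge that shares a vertex with a selected edge."""
    touched = {vertex for edge in selection for vertex in edge}
    neighbors = {e for e in graph_edges if e[0] in touched or e[1] in touched}
    return selection | neighbors


A, B = "A", "B"
fan_in = {(f"in{i}", A) for i in range(10)}    # 10 edges entering A
fan_out = {(B, f"out{i}") for i in range(10)}  # 10 edges leaving B
graph_edges = fan_in | fan_out | {(A, B)}

current = {(A, B)}                             # step 1: select edge (A,B)
current = extend_by_one(current, graph_edges)  # step 2: extend by 1 edge on each end
current = current - {(A, B)}                   # step 3: Curr-New with (A,B) as the new selection

print(len(current))                   # 20
print(current == fan_in | fan_out)    # True
```

After the three steps, only the 20 fan-in and fan-out edges remain selected, and the high multiplicity edge (A,B) has been removed.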
ABA graphs can be quite large. We provide two modes of drawing the graph:
Additionally, clicking on the graph itself, regardless of size, will display the graph at full resolution. Note: Firefox, and other browsers, may choose to display a thumbnail of this image, but if you click the image again it will be displayed at full resolution.
Each edge in an ABA graph represents a local alignment. By selecting edges and pressing
the Display FASTA for selected edges button, you will be presented with a
new FASTA file that contains the segments of the sequences that aligned along those edges.
This file can then be input into a multiple alignment tool of your choice to see a more
detailed alignment. The alignments performed by ABA to construct the ABA graph may be coarser
than the sequences actually warrant, so as to reduce spurious alignments that could
cause a different graph topology. By displaying the FASTA for sequences along an edge, you
can perform a more detailed or exact alignment.
Because ABA (and, by extension, AliWABA) is a tool for exploring the domain organization of
biosequences, it is sensible to check domains that you might have found in the ABA graph
against known domain databases. In particular, we have enabled searching the Conserved Domain Database
(www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=cdd)
for selected edges. By performing an edge selection and then pressing the Annotate selected edges
button, you will be presented with a BLAST report that describes any significant (E-value < 0.001) hits to
CDD. Note: This feature may take several minutes to run.
The GraphViz DOT file used to construct the image can be downloaded from a link at the bottom of the ABA Graph viewing page. This may be helpful if an ABA graph is needed in a figure for publication or a presentation, or if you would like to simplify the ABA graph by removing any short edges that you have some a priori reason to remove.
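Removing short edges from the downloaded DOT file can be scripted. The sketch below assumes that each edge appears on its own line with a label attribute in the a,b(c) format, for example 1 -> 2 [label="424,44(2)"]; the actual layout of the file produced by AliWABA may differ, so inspect the file first. The name drop_short_edges and the sample graph are hypothetical.

```python
import re

# Assumed edge line shape: <tail> -> <head> [label="a,b(c)"];
EDGE_LINE = re.compile(r'->.*label\s*=\s*"(\d+),\d+\(\d+\)"')


def drop_short_edges(dot_text, min_length):
    """Return the DOT text without edge lines whose length a is below min_length."""
    kept = []
    for line in dot_text.splitlines():
        match = EDGE_LINE.search(line)
        if match and int(match.group(1)) < min_length:
            continue  # skip this short edge
        kept.append(line)
    return "\n".join(kept)


# Made-up example graph with one long edge and one short edge.
example = '''digraph aba {
  1 -> 2 [label="424,44(2)"];
  2 -> 3 [label="12,45(1)"];
}'''

print(drop_short_edges(example, 50))
```

The filtered text can be saved and rendered again with GraphViz. Vertices left without edges are not removed by this sketch.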
The following features are being considered. Your input would be extremely helpful in determining their relative priorities or suggesting new ones.