This tutorial walks you through a series of steps that illustrate
the domain organization of POU-domain transcription factors, as described
in the following article.
AliWABA: Alignment on the Web with an A-Bruijn Approach
Neil Jones, Degui Zhi, and Ben Raphael
Submitted (BibTeX)
- Open the main ABA page. Clicking on this
link opens the same page as clicking "Run AliWABA", but in a new window,
so that you can read the tutorial while you carry out the steps.
- You will be aligning a set of about 10 proteins, so select "This is Protein" in
the sequence type box, like so:

- Copy the following sequence data from this page and paste it into the Sequences text
box on the AliWABA page.
- Click Align and wait for the results window to appear:
- The results window is divided into two panes: the left-hand
pane contains a summary of the alignment and the right-hand pane contains
the ABA graph. Scroll around the ABA graph to get a feel for its
structure. Notice that there are a few vertices (for example, 1 and 3)
that have high fan-in and a vertex with high fan-out (5), but keep in
mind that vertices in the ABA graph do not correspond to any portion
of a sequence. More importantly, there are several long edges with
high multiplicity: (3,4) and (5,6). Both of these edges are listed
as (potential) domains in the left hand pane.
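
The features described in this step (fan-in, fan-out, and long edges with high multiplicity) can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the edge list, the multiplicities, the lengths, and the thresholds are invented toy values, not AliWABA output, and the function names are hypothetical. Fan-in and fan-out are taken here to mean the number of distinct incoming and outgoing edges.

```python
from collections import defaultdict

# Hypothetical toy graph: (source, target, multiplicity, length in residues).
# These numbers are invented for illustration and are not AliWABA output.
EDGES = [
    (0, 1, 2, 15),
    (2, 1, 3, 20),
    (1, 3, 4, 10),
    (7, 3, 3, 12),
    (3, 4, 10, 150),
    (4, 5, 9, 8),
    (5, 6, 8, 60),
    (5, 8, 1, 25),
    (5, 9, 1, 30),
]


def degree_summary(edges):
    """Count distinct incoming and outgoing edges for every vertex."""
    fan_in = defaultdict(int)
    fan_out = defaultdict(int)
    for src, dst, _multiplicity, _length in edges:
        fan_out[src] += 1
        fan_in[dst] += 1
    return dict(fan_in), dict(fan_out)


def candidate_domains(edges, min_multiplicity, min_length):
    """Return edges that are both long and shared by many sequences."""
    return [
        (src, dst)
        for src, dst, multiplicity, length in edges
        if multiplicity >= min_multiplicity and length >= min_length
    ]


if __name__ == "__main__":
    fan_in, fan_out = degree_summary(EDGES)
    print("fan-in:", fan_in)
    print("fan-out:", fan_out)
    # Thresholds are arbitrary choices for this toy example.
    print("candidate domains:", candidate_domains(EDGES, 5, 50))
```

With these toy values, only the two long, heavily shared edges pass the filter, which mirrors how such edges stand out when you scan the graph by eye.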
- Select edge (3,4), which has high multiplicity and, based
on the graph drawing algorithms, appears to be a central bottleneck
in the ABA graph. Scroll to the bottom of the right-hand pane,
choose the radio selector next to the words "Explicit path", and
type
(3,4) into the text box, like so:

Click on the "Change Selection" button.
The edge from node 3 to node 4 should now be colored red.
- At the very bottom of the right hand pane, press the button
that says "Query edges against CDD." You will be given a verbatim
report from RPSBlast, which examines each sequence that aligns along
the selected edge against the Conserved Domain Database. This operation
may take a minute or two, depending on server load.
- Inspecting the RPSBlast report should give you the impression
that edge (3,4) corresponds to a POU domain and a homeobox. If you had protein sequences
of unknown function, selecting the high-multiplicity edges
and querying them against CDD in this way might suggest what they do.
- To get back to the graph from the RPSBlast report, simply
click on the link in the upper left-hand corner that says "Current A-Bruijn Graph".
Clicking on your browser's Back button may or may not pop up a dialog asking you to
resend form data. It doesn't matter if you resend form data to this
application, but the dialog itself can be annoying after a few iterations.
- One of the advantages of the ABA graph is that it helps
to identify potentially shuffled domains. In the left hand pane, locate
the table describing potentially shuffled domains. There is one
entry: OCT1_MOUSE and PIT1_MOUSE. To
get an idea of how these are related, we'll select the path corresponding
to these two sequences:
- First, clear the graph of the previous selection (actually, the
edge will just be selected again in the next step, so this isn't strictly
necessary). Click on the "Reset Graph" button in the lower right pane.
- Scroll to the bottom of the right-hand pane and select
the radio selector next to the word "Sequence", and then choose
OCT1_MOUSE
from the adjacent selection field. Click on the "Change Selection" button.

- Your result should look like:

- Repeat for
PIT1_MOUSE, making sure that you are
taking the union of the selections (a full description of the options
can be found in the User's Guide on this site).
- Your result should look like:

- Notice that there seems to be a triangle of tangled red
edges, as suggested by the shuffled domain list in the left pane. This
pattern indicates shuffling: a local similarity in PIT1 appears before
it appears in OCT1, relative to the POU domain (edge (3,4) is the POU
domain, a fact you verified by querying that edge against
CDD).
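
The idea of shuffling described in this step can be sketched in Python as an order comparison. This is only an illustration under stated assumptions, not the method AliWABA uses: each sequence is reduced to the ordered list of shared blocks it passes through, each block is assumed to occur at most once per sequence, and the block labels and orderings below are hypothetical toy values rather than data read from the graph.

```python
from itertools import combinations


def shuffled_pairs(path_a, path_b):
    """Return pairs of shared blocks whose relative order differs.

    Each path is the ordered list of block labels that one sequence
    traverses. Assumes every label occurs at most once per path.
    """
    position_in_b = {block: index for index, block in enumerate(path_b)}
    # Shared blocks, listed in the order they occur in path_a.
    shared = [block for block in path_a if block in position_in_b]
    inverted = []
    for first, second in combinations(shared, 2):
        # 'first' precedes 'second' in path_a; check whether path_b agrees.
        if position_in_b[first] > position_in_b[second]:
            inverted.append((first, second))
    return inverted


if __name__ == "__main__":
    # Hypothetical toy orderings: "POU" stands for the POU-domain edge and
    # "X" for the second region of local similarity. The other labels are
    # made-up blocks unique to each sequence.
    seq_one = ["a1", "POU", "X", "a2"]
    seq_two = ["b1", "X", "POU"]
    print(shuffled_pairs(seq_one, seq_two))
```

Two sequences whose shared blocks occur in the same order give an empty list, so a non-empty result corresponds to the kind of tangled pattern you see in the graph.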
- To get a better idea of how PIT1 and OCT1 are related, we will
realign those two sequences independently of the others. Their sequences are
Click on the link on the left labelled "New alignment".
- Realign the two sequences above, remembering to
set the Sequence Type to Protein.
- The resulting pairwise ABA graph shows quite
clearly that a 62-residue sequence exists in both and
appears shuffled with respect to the POU domain. Keeping in mind
that the vertex numbers in this graph have absolutely nothing
to do with the vertex numbers in the previous graph you generated,
select edge
(3,4) as you did before.
- A pairwise alignment of the sequences along this
edge can be found by selecting the "Display multiple alignment for selection"
button at the bottom. The ClustalW alignment is shown, and after a few
moments, a button labelled "JalView" will appear in the upper
left corner of that page. Click this button (presuming you have
Java installed) and the alignment will be shown graphically,
in a way that displays the conservation of each column in the alignment, like so:
- Querying the same edge against CDD will likely yield
no results. To our knowledge, this region of similarity has not been
reported before. We are not claiming that it is biologically
functional, just that it makes for a good example of how to use
AliWABA.