This tutorial walks you through a series of steps that illuminate the domain organization of POU-domain transcription factors, as described in the following article.

AliWABA: Alignment on the Web with an A-Bruijn Approach
Neil Jones, Degui Zhi, and Ben Raphael
Submitted

  1. Open the main ABA page. This link opens the same page as clicking "Run AliWABA", but in a new window, so that you can read the tutorial while you carry out the steps.
  2. You will be aligning a set of about 10 proteins, so select "This is Protein" in the sequence type box, like so:
  3. Copy the following sequence data from this page and paste it into the Sequences text box on the AliWABA page.
  4. Click Align and wait for the results window to appear:
  5. The results window is divided into two panes: the left-hand pane contains a summary of the alignment and the right-hand pane contains the ABA graph. Scroll around the ABA graph to get a feel for its structure. Notice that a few vertices (for example, 1 and 3) have high fan-in and one vertex (5) has high fan-out, but keep in mind that vertices in the ABA graph do not correspond to any portion of a sequence. More importantly, there are several long edges with high multiplicity: (3,4) and (5,6). Both of these edges are listed as (potential) domains in the left-hand pane.
  6. Select edge (3,4), which has high multiplicity and, judging by the graph layout, appears to be a central bottleneck in the ABA graph. Scroll to the bottom of the right-hand pane, choose the radio selector next to the words "Explicit path", and type (3,4) into the text box, like so:

    Click on the "Change Selection" button. The edge from node 3 to node 4 should now be colored red.
  7. At the very bottom of the right hand pane, press the button that says "Query edges against CDD." You will be given a verbatim report from RPSBlast, which examines each sequence that aligns along the selected edge against the Conserved Domain Database. This operation may take a minute or two, depending on server load.
  8. Inspecting the RPSBlast report should give you the impression that edge (3,4) is a POU domain and a homeobox. If you had protein sequences of unknown function, selecting the high-multiplicity edges and querying them against RPSBlast might suggest what they do.
  9. To get back to the graph from the RPSBlast report, simply click on the link in the upper left-hand corner that says "Current A-Bruijn Graph". Clicking your browser's Back button may or may not pop up a dialog asking you to resend form data. It doesn't matter if you resend form data to this application, but the dialog itself can become annoying after a few iterations.
  10. One of the advantages of the ABA graph is that it helps to identify potentially shuffled domains. In the left hand pane, locate the table describing potentially shuffled domains. There is one entry, OCT1_MOUSE and PIT1_MOUSE. To get an idea of how these are related, we'll select the path corresponding to these two sequences:
    1. First, clear the graph of the previous selection (actually, the edge will just be selected again in the next step, so this isn't strictly necessary). Click on the "Reset Graph" button in the lower right pane.
    2. Scroll to the bottom of the right-hand pane and select the radio selector next to the word "Sequence", and then choose OCT1_MOUSE from the adjacent selection field. Click on the "Change Selection" button.
    3. Your result should look like:
    4. Repeat for PIT1_MOUSE, making sure that you are taking the union (a full description of the options can be found in the User's Guide on this site) of the selections.
    5. Your result should look like:
  11. Notice that there seems to be a triangle of tangled red edges, as suggested by the shuffled domain list in the left pane. This pattern indicates shuffling: a local similarity in PIT1 appears before it appears in OCT1, relative to the POU domain (edge (3,4) is the POU domain, a fact you verified by querying that edge against CDD).
  12. To get a better idea of how PIT1 and OCT1 are related, we will realign those two sequences independent of the others. Their sequences are

    Click on the link on the left labelled "New alignment".
  13. Realign the two sequences above, remembering to set the Sequence Type to Protein.
  14. The resulting pairwise ABA graph shows quite clearly that a 62-residue sequence exists in both proteins and appears shuffled with respect to the POU domain. Keeping in mind that the vertex numbers in this graph have absolutely nothing to do with the vertex numbers in the previous graph you generated, select edge (3,4) as you did before.
  15. A pairwise alignment of the sequences along this edge can be found by selecting the "Display multiple alignment for selection" button at the bottom. The ClustalW alignment is shown and, after a few moments, a button labelled "JalView" will appear in the upper left corner of that page. Click this button (presuming you have Java installed) and the alignment will be shown graphically, in a way that displays the conservation of each column in the alignment, like so:
  16. Querying the same edge against CDD will likely yield no results. To our knowledge, this region of similarity has not been reported before. We are not claiming that it is biologically functional, just that it makes for a good example of how to use AliWABA.
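
The graph features used in the steps above (edge multiplicity, vertex fan-in and fan-out, and a pair of shared edges traversed in opposite order by two sequences) can be illustrated with a short Python sketch. This is not AliWABA's implementation: the sequence names, the edge lists in `paths`, and the three helper functions are all hypothetical, and each sequence is simply assumed to be an ordered list of the graph edges it traverses.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data (not taken from the tutorial's alignment):
# each sequence is the ordered list of graph edges (u, v) it traverses.
paths = {
    "SEQ_A": [(1, 3), (3, 4), (4, 5), (5, 6)],
    "SEQ_B": [(2, 3), (3, 4), (4, 5), (5, 6)],
    "SEQ_C": [(1, 5), (5, 6), (6, 3), (3, 4)],
}


def edge_multiplicity(paths):
    """Count how many sequences traverse each edge (each sequence counted once)."""
    counts = Counter()
    for edges in paths.values():
        counts.update(set(edges))
    return counts


def fan_in_out(paths):
    """Return (fan_in, fan_out) counters over the distinct edges of the graph."""
    distinct = {edge for edges in paths.values() for edge in edges}
    fan_in = Counter(v for _, v in distinct)
    fan_out = Counter(u for u, _ in distinct)
    return fan_in, fan_out


def shuffled_pairs(paths):
    """Find shared edges that two sequences traverse in opposite order.

    Assumes each edge occurs at most once per sequence; if an edge repeats,
    only its last position is considered.
    """
    found = []
    for a, b in combinations(sorted(paths), 2):
        pos_a = {edge: i for i, edge in enumerate(paths[a])}
        pos_b = {edge: i for i, edge in enumerate(paths[b])}
        shared = sorted(set(pos_a) & set(pos_b))
        for x, y in combinations(shared, 2):
            # Opposite signs mean x precedes y in one sequence and follows it in the other.
            if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0:
                found.append((a, b, x, y))
    return found


multiplicity = edge_multiplicity(paths)
fan_in, fan_out = fan_in_out(paths)
shuffled = shuffled_pairs(paths)

print("multiplicity of (3, 4):", multiplicity[(3, 4)])
print("fan-in of vertex 3:", fan_in[3])
print("shuffled candidates:", shuffled)
```

In this toy data, edges (3, 4) and (5, 6) are traversed by all three sequences, and SEQ_C visits them in the opposite order from SEQ_A and SEQ_B, which is the kind of pattern the tutorial's "potentially shuffled domains" table points to.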