Alignment of Sequences

Sequences can be aligned using Clustal Omega found here:

Analyzing alignment results

When I align sequences, I find it is often difficult to navigate to a specific location in the alignment, for two reasons. The first is that in the default output format of Clustal Omega, each sequence is wrapped across many lines. If you search for "AATTTCCGGCA" you may therefore not find it, because in the output the sequence may be broken across lines, something like this: "AATTTC CGGCA"

This issue can be resolved by setting the output format of Clustal Omega to Vienna. This format puts each whole sequence on one line, which I find much easier to deal with.
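If an alignment has already been saved in a wrapped format, the same one-line-per-sequence layout can be produced after the fact. The following is a minimal sketch, assuming the input is FASTA-style text (a ">name" header line followed by wrapped sequence lines). The function name unwrap_fasta and the sample data are made up for illustration and are not part of Clustal Omega.

```python
def unwrap_fasta(text):
    """Join the wrapped lines of each FASTA-style record into one string.

    Assumes each record starts with a '>' header line; the first word
    after '>' is used as the sequence name (an assumption of this sketch).
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            words = line[1:].split()
            name = words[0] if words else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    # Concatenate the pieces so each sequence sits on a single line.
    return {n: "".join(parts) for n, parts in records.items()}


# Hypothetical wrapped input, for illustration only.
wrapped = """>seq1
AATTTC
CGGCA
>seq2
AATT-C
CGG-A
"""

unwrapped = unwrap_fasta(wrapped)
for seq_name, seq in unwrapped.items():
    print(seq_name, seq)
```

Once each sequence is on one line, a plain text search for "AATTTCCGGCA" is no longer defeated by line breaks.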

The second reason is that a sequence in the alignment can be interrupted by any number of "-" characters, which indicate gaps inserted so that it lines up with another sequence. For example, the desired sequence "AATTTCCGGCA" could be in a form like this: "AAT--TTCC-GGCA". This problem can be circumvented by searching for the sequence with a regular expression that will match regardless of the position or number of "-".

To use regular expressions, use something like the website regexr (http://www.gskinner.com/RegExr/) or Notepad 2. Note that Notepad 2 does not support "grouping", so something like regexr must be used to create at least the initial regular expression from the original sequence you want to search for. For example, paste the sequence "AATTTCCGGCA" (quotations should be omitted) into regexr, switch to the find and replace mode, enter "(\w)" (without the quotations) into the find box, and "$1\-*" into the replace box.

The "(\w)" turns each word character "\w" into a group, indicated with the parentheses "()". In the replace box, the "$1" calls the group back (only one group was defined, which is why "1" is used). The "\-" indicates that a "-" should be placed after the group. The "\" escapes the "-" so that the regex engine does not interpret it as a range within a character class; outside a character class the escape is not strictly required, but it is harmless. The "*" indicates that the previous token (the "-") can occur 0 or more times and still yield a match. The final output of the replace function is therefore a string that can itself be used as a regular expression to search another string. The output looks like this: A\-*A\-*T\-*T\-*T\-*C\-*C\-*G\-*G\-*C\-*A\-*
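The same find-and-replace step can be reproduced in a script, which may be handy when there are many sequences to convert. This is a minimal sketch using Python's standard re module. The function name gap_tolerant_pattern and the gapped test string are made up for illustration.

```python
import re


def gap_tolerant_pattern(sequence):
    """Append '\\-*' after every word character, mirroring the
    '(\\w)' -> '$1\\-*' find-and-replace described above."""
    # A function is used as the replacement so the backslash is inserted
    # literally, without relying on replacement-template escape rules.
    return re.sub(r"(\w)", lambda m: m.group(1) + r"\-*", sequence)


pattern = gap_tolerant_pattern("AATTTCCGGCA")
print(pattern)

# Hypothetical gapped text, for illustration only.
gapped = "GG-AAT--TTCC-GGCA-TT"
match = re.search(pattern, gapped)
print(match.group(0) if match else "no match")
```

Note that because the pattern ends in "\-*", a match also swallows any "-" characters that directly follow the last base.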

Now the output in the Vienna format from Clustal Omega can be pasted into Notepad 2. For this example I'll paste a gapped form of the sequence, "AAT--TTCC-GGCA". I will then press Ctrl+H to bring up the find and replace dialog and check the "Regular expression search" box to turn it on. I will then enter the regular expression search string created above, "A\-*A\-*T\-*T\-*T\-*C\-*C\-*G\-*G\-*C\-*A\-*", and hit Find Next. Notepad 2 then highlights the matching text, and I can now see where the sequence I was interested in lies in the alignment output.
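The lookup can also be done without an editor, reporting the position of the match instead of highlighting it. This is a minimal sketch, assuming the aligned sequence is available as a single-line string. The function name find_in_alignment and the example row are made up for illustration. Unlike the pattern above, this version puts "-*" only between bases, so the reported span ends at the last base rather than including trailing gaps.

```python
import re


def find_in_alignment(aligned_row, query):
    """Return (start, end) of the first gap-tolerant match of query in
    aligned_row, as 0-based indices with end exclusive, or None."""
    # Allow any number of '-' between consecutive bases of the query.
    pattern = "-*".join(re.escape(base) for base in query)
    match = re.search(pattern, aligned_row)
    if match is None:
        return None
    return match.start(), match.end()


# Hypothetical one-line aligned sequence, for illustration only.
row = "CC--AAT--TTCC-GGCA--GT"
span = find_in_alignment(row, "AATTTCCGGCA")
if span is not None:
    start, end = span
    print("columns", start, "to", end - 1, ":", row[start:end])
```

The indices count alignment columns, gaps included, so they can be used to jump to the same columns in the other aligned sequences.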

email to Hojoon about online program that can do shotgun type alignment 7-30-13