anti_matter
Junior Member

Posts: 3
Registered: 5-8-2002
Member Is Offline
Mood: No Mood
|
posted on 5-8-2002 at 04:08 PM |
|
|
About Smith-Waterman algorithm
Hello!!
I have two questions to ask:
#1. Why zero is included in this step of SW algorithm:
Hij=max{Hi-1, j-1 +s(ai,bj), max{Hi-k,j -Wk}, max{Hi, j-l -Wl}, 0}
Can any negative number be used as well as a measure of no similarity ? along with filling the zeroth row and zeroth column with the same negative
number.
#2. Why some programs like ClustalW always uses a scoring matrix with positive substitution values? i.e they convert by adding the modulas of the
least number to all of the numbers.
|
|
|
GQuievryn
Post Master
   
Posts: 89
Registered: 3-4-2002
Location: Boston Area
Member Is Offline
Mood: No Mood
|
posted on 5-8-2002 at 06:45 PM |
|
|
The zero is included for LOCAL sequence alignments. If you had two sequences that you wanted to align from beginning to end, you would NOT include the
zero. You can think of the zero as allowing you to start anywhere in the matrix.
The numbers themselves really don't have much relavance, as they depend on the parameters. For local alignments, you usually choose the highest
number in the matrix to start the recursive reconstruction from. So a really low negative number just means that there aren't many matches. I'm not
sure why you would necessary use these numbers for anything such as a measure of similarity.
For your second question, I'm not entirely sure what you are asking. Are you talking about sequence alignment or protein alignment (using scoring
matrix such as PAM)? Can you give an example of what you are talking about?
|
|
|
anti_matter
Junior Member

Posts: 3
Registered: 5-8-2002
Member Is Offline
Mood: No Mood
|
posted on 5-13-2002 at 07:58 PM |
|
|
for the first question:
one must have also at least some positive numbers in the scoring matrix to have an alignment. Moreover do we need that zero if all the values in the
scoring matrix are positive?
for the second question:consider the following scoring matrix:
X 3
Y 1 2
Z -2 0 4
X Y Z
The least number is -2, so some programs like clustalw converts this matrix( unless you use -negativematrix option) as follows
X 5
Y 3 4
Z 0 2 6
X Y Z
i.e they add | -2 | = 2 to all of the elements of the first matrix and get all positive values in the second.
|
|
|
GQuievryn
Post Master
   
Posts: 89
Registered: 3-4-2002
Location: Boston Area
Member Is Offline
Mood: No Mood
|
posted on 5-14-2002 at 02:32 PM |
|
|
It would impossible for ALL the numbers to be positive, as some of the matrix positions represent all insertions or deletions. However, if the
majority of the elements are positive, the algorithm will return the global alignment of the sequences - this is because the sequences are so similar
that the best alignment is reached by starting at the beginning and going to the end. In this case, you could use the global alignment algorithm
(i.e. get rid of the zero). However, imagine the case were you have 2 sequences that have "random" bases at the beginning and end of both sequences
and somewhere in the middle there is a string of matches. What the local alignment algorithm (i.e. the zero) does is to start the alignment for the
place where the sequences match and ignore everything that came before. When reconstructing the alignment, you look for the greatest value in the
matrix (which will be at the end of the matching region) and trace back until you reach the zero. Now you found the match but ignored the fact that
the beginning and ending of the sequences didn't match. If you did not include the zero, the algorithm would look for the best match from beginning
to end. If the matching region is a relatively small part of the entire sequence, you won't find it in the global alignment. Does this make
sense?
For the second part, I don't understand your notation: what are X, Y, and Z? Is this a substitution penalty? Is this value subtracted or added to
the score?
|
|
|
anti_matter
Junior Member

Posts: 3
Registered: 5-8-2002
Member Is Offline
Mood: No Mood
|
posted on 5-14-2002 at 03:52 PM |
|
|
Sorry!!!!
I think I have used "scoring matrix" and "substitution matrix" interchangably.
In fact I wanted to mean substitution matrices like PAM or BLOSUM. Hope you can explain better now.
Regards
|
|
|