BioPlanet Bioinformatics
BioPlanet
Bioinformatics Forums
General Chat

All Boards are FREE to post
(see our privacy policy)

Not logged in [Login - Register]
  Home     Forums     Jobs     CVs     Chat     Companies  
  Recruiters     Courses     Links     Books     StockIndex     Tutorial  
go to bottom

Printable Version | Subscribe | Add to Favorites  
Author: Subject: About Smith-Waterman algorithm
anti_matter
Junior Member
**




Posts: 3
Registered: 5-8-2002
Member Is Offline

Mood: No Mood

[*] posted on 5-8-2002 at 04:08 PM
About Smith-Waterman algorithm


Hello!!

I have two questions to ask:
#1. Why zero is included in this step of SW algorithm:

Hij=max{Hi-1, j-1 +s(ai,bj), max{Hi-k,j -Wk}, max{Hi, j-l -Wl}, 0}

Can any negative number be used as well as a measure of no similarity ? along with filling the zeroth row and zeroth column with the same negative number.

#2. Why some programs like ClustalW always uses a scoring matrix with positive substitution values? i.e they convert by adding the modulas of the least number to all of the numbers.

View user's profile View All Posts By User
GQuievryn
Post Master
*****




Posts: 89
Registered: 3-4-2002
Location: Boston Area
Member Is Offline

Mood: No Mood

[*] posted on 5-8-2002 at 06:45 PM


The zero is included for LOCAL sequence alignments. If you had two sequences that you wanted to align from beginning to end, you would NOT include the zero. You can think of the zero as allowing you to start anywhere in the matrix.

The numbers themselves really don't have much relavance, as they depend on the parameters. For local alignments, you usually choose the highest number in the matrix to start the recursive reconstruction from. So a really low negative number just means that there aren't many matches. I'm not sure why you would necessary use these numbers for anything such as a measure of similarity.

For your second question, I'm not entirely sure what you are asking. Are you talking about sequence alignment or protein alignment (using scoring matrix such as PAM)? Can you give an example of what you are talking about?
View user's profile Visit user's homepage View All Posts By User
anti_matter
Junior Member
**




Posts: 3
Registered: 5-8-2002
Member Is Offline

Mood: No Mood

[*] posted on 5-13-2002 at 07:58 PM


for the first question:
one must have also at least some positive numbers in the scoring matrix to have an alignment. Moreover do we need that zero if all the values in the scoring matrix are positive?

for the second question:consider the following scoring matrix:

X 3
Y 1 2
Z -2 0 4
X Y Z

The least number is -2, so some programs like clustalw converts this matrix( unless you use -negativematrix option) as follows

X 5
Y 3 4
Z 0 2 6
X Y Z
i.e they add | -2 | = 2 to all of the elements of the first matrix and get all positive values in the second.
View user's profile View All Posts By User
GQuievryn
Post Master
*****




Posts: 89
Registered: 3-4-2002
Location: Boston Area
Member Is Offline

Mood: No Mood

[*] posted on 5-14-2002 at 02:32 PM


It would impossible for ALL the numbers to be positive, as some of the matrix positions represent all insertions or deletions. However, if the majority of the elements are positive, the algorithm will return the global alignment of the sequences - this is because the sequences are so similar that the best alignment is reached by starting at the beginning and going to the end. In this case, you could use the global alignment algorithm (i.e. get rid of the zero). However, imagine the case were you have 2 sequences that have "random" bases at the beginning and end of both sequences and somewhere in the middle there is a string of matches. What the local alignment algorithm (i.e. the zero) does is to start the alignment for the place where the sequences match and ignore everything that came before. When reconstructing the alignment, you look for the greatest value in the matrix (which will be at the end of the matching region) and trace back until you reach the zero. Now you found the match but ignored the fact that the beginning and ending of the sequences didn't match. If you did not include the zero, the algorithm would look for the best match from beginning to end. If the matching region is a relatively small part of the entire sequence, you won't find it in the global alignment. Does this make sense?

For the second part, I don't understand your notation: what are X, Y, and Z? Is this a substitution penalty? Is this value subtracted or added to the score?
View user's profile Visit user's homepage View All Posts By User
anti_matter
Junior Member
**




Posts: 3
Registered: 5-8-2002
Member Is Offline

Mood: No Mood

[*] posted on 5-14-2002 at 03:52 PM


Sorry!!!!
I think I have used "scoring matrix" and "substitution matrix" interchangably.

In fact I wanted to mean substitution matrices like PAM or BLOSUM. Hope you can explain better now.

Regards
View user's profile View All Posts By User
  Go To Top
BioBanner - free advertising for BioScience web sites. BioBanner - free advertising for BioScience web sites.

Powered by XMB 1.9.11
Developed By The XMB Group © 2001-2009