A "semi-probabilistic" alignment algorithm that combines ideas from Smith-Waterman and probabilistic alignment is proposed and studied in detail. The score statistics of this "hybrid" algorithm are predicted to be of the universal Gumbel form, with the key Gumbel parameter λ taking on a fixed asymptotic value for a wide variety of scoring parameters. We have also characterized the "extremal ensemble", i.e., the collection of sequence pairs exhibiting the similarities to which a given scoring system is most sensitive. Based on this extremal ensemble, a simple recipe is given for computing the "relative entropy" and, from it, the correction to λ due to finite sequence length. This allows us to assign p-values to the alignment results for arbitrary scoring parameters and gap costs. The predictions compare well with direct numerical simulations for a broad range of sequence lengths with various choices of the substitution scores and affine gap parameters.
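
As a rough illustration of how a Gumbel-form score distribution turns an alignment score into a p-value, the sketch below uses the standard extreme-value expression P(S >= s) = 1 - exp(-K m n exp(-lam s)). This is a minimal sketch and not the authors' implementation: the function name `gumbel_pvalue`, the prefactor `K`, and all numeric values are assumptions chosen for illustration, `lam` stands for the key Gumbel parameter mentioned in the abstract, and the finite-length correction described above is not included.

```python
import math


def gumbel_pvalue(score, lam, K, m, n):
    """Return P(S >= score) for a Gumbel-distributed alignment score.

    lam  : Gumbel decay parameter (hypothetical value supplied by the caller)
    K    : Gumbel prefactor (hypothetical value supplied by the caller)
    m, n : lengths of the two aligned sequences
    """
    # Expected number of chance alignments scoring at least `score`.
    expected_hits = K * m * n * math.exp(-lam * score)
    # 1 - exp(-x), computed with expm1 to stay accurate when x is tiny.
    return -math.expm1(-expected_hits)


if __name__ == "__main__":
    # Placeholder parameters, not taken from the paper.
    for s in (10.0, 20.0, 30.0):
        print(s, gumbel_pvalue(s, lam=1.0, K=0.1, m=300, n=300))
```

For large scores the expected number of chance hits is small and the p-value is approximately equal to it, which is why the decay parameter dominates significance estimates.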

Authors: Yu Y.K., Bundschuh R., Hwa T.  Pages: 330 Year: 2002 
Tags: statistical ensemble significance bundschuh hybrid gapped hwa extremal alignment 