conserv                package:bio3d                R Documentation

_S_c_o_r_e _R_e_s_i_d_u_e _C_o_n_s_e_r_v_a_t_i_o_n _A_t _E_a_c_h _P_o_s_i_t_i_o_n _i_n _a_n _A_l_i_g_n_m_e_n_t

_D_e_s_c_r_i_p_t_i_o_n:

     Quantifies residue conservation in a given protein sequence
     alignment by calculating the degree of amino acid variability in
     each column of the alignment.

_U_s_a_g_e:

     conserv(x, method = c("similarity","identity","entropy22","entropy10"),
             sub.matrix = c("bio3d", "blosum62", "pam30", "other"),
             matrix.file = NULL, normalize.matrix = TRUE)

_A_r_g_u_m_e_n_t_s:

       x: an alignment list object with 'id' and 'ali' components,
          similar to that generated by 'read.fasta'. 

  method: the conservation assessment method; one of "similarity",
          "identity", "entropy22" or "entropy10".

sub.matrix: the substitution matrix used to score conservation; one of
          "bio3d", "blosum62", "pam30" or "other".

matrix.file: the file name of an arbitrary user-supplied matrix.

normalize.matrix: logical, if TRUE the matrix is normalized prior to
          assessing conservation.

_D_e_t_a_i_l_s:

     To assess the level of sequence conservation at each position in
     an alignment, the "similarity", "identity", and "entropy" per
     position can be calculated.

     The "similarity" is defined as the average of the similarity
     scores of all pairwise residue comparisons for that position in
     the alignment, where the similarity score between any two residues
     is the score value between those residues in the chosen
     substitution matrix "sub.matrix".
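
     As a rough illustration of the averaging described above, the
     sketch below scores a single alignment column against a tiny
     made-up matrix. The function name 'pairwise_similarity' and the
     values in 'toy_matrix' are assumptions for illustration only; they
     are not taken from bio3d or from any real substitution matrix, and
     gap handling is ignored.

```r
# Toy symmetric substitution matrix (made-up values, NOT a real matrix)
aa <- c("A", "G", "S")
toy_matrix <- matrix(c(1.0, 0.5, 0.6,
                       0.5, 1.0, 0.4,
                       0.6, 0.4, 1.0),
                     nrow = 3, dimnames = list(aa, aa))

# Average the matrix score over all pairwise residue comparisons
# in one alignment column (a character vector of residues).
pairwise_similarity <- function(column, sub_matrix) {
  pairs <- combn(column, 2)  # 2 x choose(n, 2) matrix of residue pairs
  # Look up each pair by (row name, column name)
  mean(sub_matrix[cbind(pairs[1, ], pairs[2, ])])
}

sim_conserved <- pairwise_similarity(c("G", "G", "G"), toy_matrix)
sim_mixed     <- pairwise_similarity(c("A", "G", "S"), toy_matrix)
```

     With these toy values a fully conserved column averages to the
     diagonal score, while the mixed column averages the three
     off-diagonal scores.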

     The "identity", i.e. the preference for a specific amino acid to be
     found at a certain position, is assessed by averaging the identity
     scores resulting from all possible pairwise comparisons at that
     position in the alignment, where all identical residue comparisons
     are given a score of 1 and all other comparisons are given a value
     of 0.
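
     The pairwise identity average can be sketched in a few lines of
     base R. This is a minimal illustration of the definition above,
     not the code used by 'conserv'; the function name
     'pairwise_identity' is hypothetical and gap characters are not
     treated specially.

```r
# Average pairwise identity for one alignment column
# (a character vector holding one residue per sequence).
pairwise_identity <- function(column) {
  pairs <- combn(column, 2)       # all possible pairwise comparisons
  mean(pairs[1, ] == pairs[2, ])  # identical pairs score 1, others 0
}

id_conserved <- pairwise_identity(c("G", "G", "G", "G"))
id_mixed     <- pairwise_identity(c("G", "G", "A", "S"))
```

     Here the fully conserved column scores 1, and the mixed column
     scores 1/6 because only one of its six residue pairs is identical.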

     "Entropy" is based on Shannon's information entropy. See the
     'entropy' function for further details.
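
     For orientation, a Shannon entropy score for one column might be
     computed as in the sketch below, rescaled so that a conserved
     column scores 1. The function name 'column_entropy' and the
     'alphabet_size' argument are assumptions for illustration; the
     exact alphabets and gap handling behind "entropy22" and
     "entropy10" are documented with the 'entropy' function and are
     not reproduced here.

```r
# Shannon entropy of one alignment column, rescaled to 1 - H / H_max,
# where H_max = log2(alphabet_size) is the largest possible entropy.
column_entropy <- function(column, alphabet_size) {
  freq <- table(column) / length(column)  # observed residue frequencies
  h <- -sum(freq * log2(freq))            # Shannon entropy in bits
  1 - h / log2(alphabet_size)
}

# Toy 4-letter alphabet so that both extremes are easy to check
ent_conserved <- column_entropy(c("G", "G", "G", "G"), alphabet_size = 4)
ent_diverse   <- column_entropy(c("A", "C", "G", "T"), alphabet_size = 4)
```

     In this toy setting the conserved column scores 1 and the column
     with four equally frequent residues scores 0.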

     Note that the returned scores are normalized so that conserved
     columns score 1 and diverse columns score 0.

_V_a_l_u_e:

     Returns a numeric vector of conservation scores, one per
     alignment position.

_N_o_t_e:

     Each of these conservation scores has particular strengths and
     weaknesses.  For example, entropy elegantly captures amino acid
     diversity but fails to account for stereochemical similarities. By
     employing a combination of scores and taking the union of their
     respective conservation signals we expect to achieve a more
     comprehensive analysis of sequence conservation (Grant, 2007).
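
     One simple way to take such a union is to flag every position
     that passes a cutoff under at least one score. The sketch below
     uses made-up score vectors and a hypothetical cutoff of 0.7;
     neither comes from the cited work, and in practice the vectors
     would be the output of 'conserv' run with different methods.

```r
# Made-up scores for five alignment positions (illustrative only)
similarity_scores <- c(0.95, 0.40, 0.85, 0.30, 0.20)
entropy_scores    <- c(0.90, 0.75, 0.35, 0.25, 0.80)

cutoff <- 0.7  # hypothetical threshold, chosen only for this example

# Union of signals: conserved under at least one of the two scores
conserved_any <- (similarity_scores >= cutoff) | (entropy_scores >= cutoff)
conserved_positions <- which(conserved_any)
```

     With these toy values positions 1, 2, 3 and 5 are flagged, while
     position 4 falls below the cutoff under both scores.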

_A_u_t_h_o_r(_s):

     Barry Grant

_R_e_f_e_r_e_n_c_e_s:

     Grant, B.J. et al. (2006) _Bioinformatics_ *22*, 2695-2696.

     Grant, B.J. et al. (2007) _J. Mol. Biol._ *368*, 1231-1248.

_S_e_e _A_l_s_o:

     'read.fasta', 'read.fasta.pdb'

_E_x_a_m_p_l_e_s:

     ## Read an example alignment
     aln <- read.fasta(system.file("examples/kinesin_xray.fa",package="bio3d"))

     ## Score conservation
     conserv(x=aln$ali, method="similarity", sub.matrix="bio3d")
     ##conserv(x=aln$ali,method="entropy22", sub.matrix="other")

