seqaln                 package:bio3d                 R Documentation

_S_e_q_u_e_n_c_e _A_l_i_g_n_m_e_n_t _w_i_t_h _M_U_S_C_L_E

_D_e_s_c_r_i_p_t_i_o_n:

     Create multiple alignments of amino acid or nucleotide sequences
     according to the method of Edgar.

_U_s_a_g_e:

     seqaln(aln, id=NULL, exepath="", file="aln.fa", protein=TRUE,
            seqgroup=FALSE, refine=FALSE, extra.args="")

_A_r_g_u_m_e_n_t_s:

     aln: a sequence character matrix, as obtained from 'seqbind', or
          an alignment list object as obtained from 'read.fasta'. 

      id: a vector of sequence names to serve as sequence identifers. 

 exepath: path to the 'MUSCLE' program on your system (i.e. the
          directory where 'MUSCLE' is stored). 

    file: name of 'FASTA' output file to which alignment should be
          written. 

 protein: logical, if TRUE the input sequences are assumed to be
          protein not DNA or RNA. 

seqgroup: logical, if TRUE similar sequences are grouped together in
          the output. 

  refine: logical, if TRUE the input sequences are assumed to already
          be aligned, and only tree dependent refinement is performed. 

extra.args: a single character string containing extra command line
          arguments for the alignment program. 

_D_e_t_a_i_l_s:

     Sequence alignment attempts to arrange the sequences of protein,
     DNA or RNA, to highlight regions of shared similarity that may
     reflect functional, structural, and/or evolutionary relationships
     between the sequences.

     Aligned sequences are represented as rows within a matrix. Gaps
     ('-') are inserted between the aminoacids or nucleotides so that
     equivalent characters are positioned in the same column.

     This function calls the 'MUSCLE' program, to perform a multiple
     sequence alignment, which MUST BE INSTALLED on your system and in
     the search path for executables.

     If you have a large number of input sequences (a few thousand), or
     they are very long, the default settings may be too slow for
     practical use. A good compromise between speed and accuracy is to
     run just the first two iterations of the 'MUSCLE' algorithm by
     setting the 'extra.args' argument to "-maxiters 2".

     You can set 'MUSCLE' to improve an existing alignment by setting
     'refine' to TRUE.

     To inspect the sequence clustering used by 'MUSCLE' to produce
     alignments, include "-tree2 tree.out" in the 'extra.args'
     argument. You can then load the "tree.out" file with the
     'read.tree' function from the 'ape' package.

_V_a_l_u_e:

     A list with two components: 

     ali: an alignment character matrix with a row per sequence and a
          column per equivalent aminoacid/nucleotide. 

     ids: sequence names as identifers.

_N_o_t_e:

     A system call is made to the 'MUSCLE' program, which must be
     installed on your system and in the search path for executables.

_A_u_t_h_o_r(_s):

     Barry Grant

_R_e_f_e_r_e_n_c_e_s:

     Grant, B.J. et al. (2006) _Bioinformatics_ *22*, 2695-2696.

     'MUSCLE' is the work of Edgar: Edgar (2004) _Nuc. Acid. Res._
     *32*, 1792-1797.

     Full details of the 'MUSCLE' algorithm, along with download and
     installation instructions can be obtained from:
      <URL: http://www.drive5.com/muscle>.

_S_e_e _A_l_s_o:

     'read.fasta', 'read.fasta.pdb', 'seqbind'

_E_x_a_m_p_l_e_s:

     ## Not run: 
     ##-- Read a folder/directory of PDB files
     pdb.path <- "my_dir_of_pdbs"
     files  <- list.files(path=pdb.path ,
                          pattern="[.pdb]",
                          full.names=TRUE)

     ##-- Extract sequences
     raw <- NULL
     for(i in 1:length(files)) {
       pdb <- read.pdb(files[i])
       raw <- seqbind(raw, aa321(pdb$atom[pdb$calpha,"resid"]))
     }

     ##-- Align sequences
     aln <- seqaln(raw, id=basename(files),
                   file="seqaln.fa")

     ##-- Read Aligned PDBs
     pdbs <- read.fasta.pdb(aln, pdb.path=pdb.path, pdbext = "")

     ## Do further analysis...



     ##-- For identical sequences with masking use a custom matrix
     aln <- list(ali=seqbind(c("X","C","X","X","A","G","K"),
                             c("C","-","A","X","G","X","X","K")),
                 id=c("a","b"))

     temp <- seqaln(aln=aln, file="temp.fas", protein=TRUE,
                    extra.args= paste("-matrix",
                     system.file("matrices/custom.mat", package="bio3d"),
                     "-gapopen -3.0 ",
                     "-gapextend -0.5",
                     "-center 0.0") )

     ## End(Not run)

