| seqaln {bio3d} | R Documentation |
Create multiple alignments of amino acid or nucleotide sequences according to the method of Edgar.
seqaln(aln, id=NULL, exepath="", file="aln.fa", protein=TRUE,
seqgroup=FALSE, refine=FALSE, extra.args="")
aln |
a sequence character matrix, as obtained from
seqbind, or an alignment list object as obtained from
read.fasta. |
id |
a vector of sequence names to serve as sequence identifers. |
exepath |
path to the ‘MUSCLE’ program on your system (i.e. the directory where ‘MUSCLE’ is stored). |
file |
name of ‘FASTA’ output file to which alignment should be written. |
protein |
logical, if TRUE the input sequences are assumed to be protein not DNA or RNA. |
seqgroup |
logical, if TRUE similar sequences are grouped together in the output. |
refine |
logical, if TRUE the input sequences are assumed to already be aligned, and only tree dependent refinement is performed. |
extra.args |
a single character string containing extra command line arguments for the alignment program. |
Sequence alignment attempts to arrange the sequences of protein, DNA or RNA, to highlight regions of shared similarity that may reflect functional, structural, and/or evolutionary relationships between the sequences.
Aligned sequences are represented as rows within a matrix. Gaps (‘-’) are inserted between the aminoacids or nucleotides so that equivalent characters are positioned in the same column.
This function calls the ‘MUSCLE’ program, to perform a multiple sequence alignment, which MUST BE INSTALLED on your system and in the search path for executables.
If you have a large number of input sequences (a few thousand), or they are
very long, the default settings may be too slow for practical
use. A good compromise between speed and accuracy is to run just the
first two iterations of the ‘MUSCLE’ algorithm by setting the
extra.args argument to “-maxiters 2”.
You can set ‘MUSCLE’ to improve an existing alignment by setting
refine to TRUE.
To inspect the sequence clustering used by ‘MUSCLE’ to produce
alignments, include “-tree2 tree.out” in the extra.args
argument. You can then load the “tree.out” file with the
‘read.tree’ function from the ‘ape’ package.
A list with two components:
ali |
an alignment character matrix with a row per sequence and a column per equivalent aminoacid/nucleotide. |
ids |
sequence names as identifers. |
A system call is made to the ‘MUSCLE’ program, which must be installed on your system and in the search path for executables.
Barry Grant
Grant, B.J. et al. (2006) Bioinformatics 22, 2695–2696.
‘MUSCLE’ is the work of Edgar: Edgar (2004) Nuc. Acid. Res. 32, 1792–1797.
Full details of the ‘MUSCLE’ algorithm, along with download and
installation instructions can be obtained from:
http://www.drive5.com/muscle.
read.fasta, read.fasta.pdb,
seqbind
## Not run:
##-- Read a folder/directory of PDB files
pdb.path <- "my_dir_of_pdbs"
files <- list.files(path=pdb.path ,
pattern="[.pdb]",
full.names=TRUE)
##-- Extract sequences
raw <- NULL
for(i in 1:length(files)) {
pdb <- read.pdb(files[i])
raw <- seqbind(raw, aa321(pdb$atom[pdb$calpha,"resid"]))
}
##-- Align sequences
aln <- seqaln(raw, id=basename(files),
file="seqaln.fa")
##-- Read Aligned PDBs
pdbs <- read.fasta.pdb(aln, pdb.path=pdb.path, pdbext = "")
## Do further analysis...
##-- For identical sequences with masking use a custom matrix
aln <- list(ali=seqbind(c("X","C","X","X","A","G","K"),
c("C","-","A","X","G","X","X","K")),
id=c("a","b"))
temp <- seqaln(aln=aln, file="temp.fas", protein=TRUE,
extra.args= paste("-matrix",
system.file("matrices/custom.mat", package="bio3d"),
"-gapopen -3.0 ",
"-gapextend -0.5",
"-center 0.0") )
## End(Not run)