Create comparisons between DNA segments
comparisons_from_dna_segs.Rd
Create a list of comparison objects from a list of dna_seg objects or files by parsing (and, if needed, executing) sequence alignments. If the sequence alignment result files already exist, they will be parsed. If not, DIAMOND or a BLAST program can be executed to generate the sequence alignment results between the dna_segs, following the order of dna_segs. Executing DIAMOND or BLAST requires that the command-line implementations of these tools are installed.
Usage
comparisons_from_dna_segs(
  dna_segs = NULL,
  seg_labels = NULL,
  files = NULL,
  mode = "full",
  tool = "blast",
  algorithm = "blastp",
  sensitivity = "default",
  output_path = NULL,
  all_vs_all = FALSE,
  filt_high_evalue = NULL,
  filt_low_per_id = NULL,
  filt_length = "auto",
  use_cache = TRUE,
  verbose = FALSE,
  ...
)
Arguments
- dna_segs
A list of dna_seg objects to create comparisons between. Either dna_segs or files must be provided.
- seg_labels
A character vector containing DNA segment labels.
- files
A character vector containing file paths to FASTA or GenBank files. The comparisons will be made between these files. Either dna_segs or files must be provided.
- mode
Determines how the comparisons will be filtered: "besthit", "bidirectional", or "full". If mode is "besthit", only the best hit is kept for each query (see best_hit). If mode is "bidirectional", hits are only kept if they are the best hit for their query in both directions (see bidirectional_best_hit). "full" means that all sequence alignment results are kept.
- tool
Choice of sequence alignment tool. Either "blast" or "diamond".
- algorithm
Choice of BLAST algorithm to run. One of: "blastp", "blastp-fast", "blastp-short", "tblastx", "blastn", "blastn-short", "megablast", or "dc-megablast".
- sensitivity
Choice of sensitivity option when running DIAMOND. One of: "fast", "default", "mid-sensitive", "sensitive", "more-sensitive", "very-sensitive", or "ultra-sensitive".
- output_path
Path to the folder that will contain the output files. Both the sequence alignment results and the FASTA files used to generate them will be stored here.
- all_vs_all
Logical. If TRUE, sequence alignments will be performed for every combination of the inputs, instead of only the ones necessary for plotting. Note that this can take a long time, so use with caution.
- filt_high_evalue
A number. Filters out all comparisons with an e-value higher than this value (no filtering when left as NULL).
- filt_low_per_id
A number. Filters out all comparisons with a percentage identity lower than this value (no filtering when left as NULL).
- filt_length
A number indicating the minimum length required for hits, or "auto". If "auto", it is determined based on the choice of tool and algorithm (150 for DIAMOND or any blastp algorithm, 450 for tblastx, 900 for any blastn algorithm).
- use_cache
Logical. If FALSE, the function never checks for existing files. This includes the FASTA files used as input for the sequence alignment, the database files used by DIAMOND and BLAST, and the sequence alignment results themselves.
- verbose
Logical. If TRUE, reports timings when creating new files.
- ...
Further arguments passed on to the functions that execute the sequence alignment tools (run_blast and run_diamond).
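The "auto" rule for filt_length amounts to a small lookup on tool and algorithm. The following is a minimal sketch of that rule, not the package's own code: the helper name auto_filt_length is hypothetical, and treating "megablast" and "dc-megablast" as blastn algorithms is an assumption.

```r
# Hypothetical helper: NOT part of the package API.
# Sketches the documented "auto" minimum hit length rule for filt_length.
auto_filt_length <- function(tool = "blast", algorithm = "blastp") {
  # DIAMOND always uses the protein-level threshold
  if (tool == "diamond") {
    return(150)
  }
  # BLAST protein searches
  if (algorithm %in% c("blastp", "blastp-fast", "blastp-short")) {
    return(150)
  }
  # Translated nucleotide vs translated nucleotide search
  if (algorithm == "tblastx") {
    return(450)
  }
  # Assumption: megablast and dc-megablast count as blastn algorithms
  if (algorithm %in% c("blastn", "blastn-short", "megablast", "dc-megablast")) {
    return(900)
  }
  stop("Unknown algorithm: ", algorithm)
}

auto_filt_length("diamond")            # 150
auto_filt_length("blast", "tblastx")   # 450
auto_filt_length("blast", "megablast") # 900
```

Passing an explicit number as filt_length bypasses this lookup entirely.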
Details
Unless use_cache is set to FALSE, this function will look for the files it requires using a combination of the seg_labels (if these are provided) and the names of the dna_segs or files that were provided as input. If it cannot find sequence alignment results named in the form "query_subject" (in other words, "dna_seg1_dna_seg2"), it will run DIAMOND or BLAST to generate these results. The FASTA files required as input for the sequence alignment are located in the same way.
If output_path is left as NULL, the current working directory will be used instead.
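The "query_subject" naming scheme and the output_path fallback can be illustrated by building the names that would be looked up for each pair of segments. This is a sketch only, not the package's code: the helper name expected_result_paths is hypothetical, no file extension is appended because the one actually used is not documented here, and it assumes that without all_vs_all only consecutive segments (in the order given) are compared, while all_vs_all compares every unordered pair.

```r
# Hypothetical helper: NOT part of the package API.
# Builds "query_subject" paths for the pairs of segments to be compared.
expected_result_paths <- function(labels, output_path = NULL, all_vs_all = FALSE) {
  # Fall back to the current working directory, as documented for output_path
  if (is.null(output_path)) {
    output_path <- getwd()
  }
  n <- length(labels)
  if (n < 2) {
    return(character(0))
  }
  if (all_vs_all) {
    # Assumption: every unordered pair of inputs
    pairs <- utils::combn(n, 2)
  } else {
    # Assumption: only neighbouring segments, i.e. the ones that are plotted
    pairs <- rbind(seq_len(n - 1), seq(2, n))
  }
  queries <- labels[pairs[1, ]]
  subjects <- labels[pairs[2, ]]
  file.path(output_path, paste0(queries, "_", subjects))
}

expected_result_paths(c("genome1", "genome2", "genome3"), "output/diamond")
# "output/diamond/genome1_genome2" "output/diamond/genome2_genome3"
```

With all_vs_all = TRUE, the same call would also include "output/diamond/genome1_genome3", which is why the number of alignments grows quickly with the number of inputs.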
Examples
if (FALSE) { # \dontrun{
## Comparisons from a vector of GenBank files using DIAMOND
comparisons <- comparisons_from_dna_segs(
  files = c("genome1.gb", "genome2.gb", "genome3.gb"),
  tool = "diamond",
  output_path = "output/diamond",
  sensitivity = "very-sensitive",
  verbose = TRUE
)
} # }