Trim a tree to remove unused sequences
trim_tree.Rd
Takes a (phylogenetic) tree and removes all tree tips that are not found in a given set of names.
Arguments
- dna_segs
Either a character vector containing dna_seg labels, or a list of dna_seg objects.
- tree
A (phylogenetic) tree, in the form of a phylo or phylog object, or a character string containing a file path to a Newick tree format file.
- exact_match
Logical. If TRUE, dna_seg labels will need to match the labels of the tree exactly. If exact_match = FALSE, tree tip labels only need to contain the dna_seg labels for a match to be found (e.g. the dna_seg label "seq_1" will match tree tip label "E_coli_seq_1.fa").
Details
This function takes a character vector of dna_seg labels, either directly or by extracting them from a list of dna_segs, through the dna_segs argument. Each label is queried to find matching tree tip labels, and any tree tip label without a match will be removed. If multiple matches are found for a single dna_seg label, an error is returned that shows the offending dna_seg label.
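The matching rule described above can be illustrated with a minimal base R sketch. This is not the package's implementation: match_tips is a hypothetical helper, and it assumes that non-exact matching is a literal substring test (grepl with fixed = TRUE) rather than a regular expression match.

```r
# Hypothetical helper illustrating the matching rule; not part of the package.
# Assumption: with exact_match = FALSE, a tip matches when it contains the
# label as a literal substring.
match_tips <- function(labels, tips, exact_match = FALSE) {
  keep <- character(0)
  for (label in labels) {
    hits <- if (exact_match) {
      tips[tips == label]
    } else {
      tips[grepl(label, tips, fixed = TRUE)]
    }
    # More than one matching tip for a single label is treated as an error
    if (length(hits) > 1) {
      stop("Multiple tree tips match dna_seg label: ", label)
    }
    keep <- c(keep, hits)
  }
  keep
}

tips <- c("seq_1_B_bacilliformis", "seq_2_B_grahamii",
          "seq_3_B_henselae", "seq_4_B_quintana")
kept <- match_tips(c("seq_1", "seq_2", "seq_3"), tips)
dropped <- setdiff(tips, kept)  # tips that would be trimmed from the tree
```

Under these assumptions, kept holds the three tips that contain a label and dropped holds the one tip without a match. A label such as "seq" would be contained in every tip and would therefore stop with an error.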
Examples
## Generate data
names <- c("seq_1", "seq_2", "seq_3")
tree_str <- paste0("(seq_1_B_bacilliformis:0.5,",
"(seq_2_B_grahamii:0.1,",
"(seq_3_B_henselae:0.1,",
"seq_4_B_quintana:0.2):0.1):0.1);"
)
tree <- ade4::newick2phylog(tree_str)
## Filter tree
tree$tre
#> [1] "(seq_1_B_bacilliformis,(seq_2_B_grahamii,(seq_3_B_henselae,seq_4_B_quintana)I1)I2)Root;"
tree <- trim_tree(dna_segs = names, tree = tree)
tree$tre
#> [1] "(seq_1_B_bacilliformis,(seq_2_B_grahamii,seq_3_B_henselae)I2)Root;"