
Takes a (phylogenetic) tree and removes all tree tips that are not found in a given set of names.

Usage

trim_tree(dna_segs, tree, exact_match = FALSE)

Arguments

dna_segs

Either a character vector containing dna_seg labels, or a list of dna_seg objects.

tree

A (phylogenetic) tree, in the form of a phylo or phylog object, or a character string containing the path to a file in Newick tree format.

exact_match

Logical. If TRUE, dna_seg labels must match the tree tip labels exactly. If FALSE, a tree tip label only needs to contain a dna_seg label for a match to be found (e.g. the dna_seg label "seq_1" will match the tree tip label "E_coli_seq_1.fa").
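
The difference between the two matching modes can be illustrated in base R. This is only a sketch of the behaviour described above, not the package's own code: the use of `==` and of `grepl()` with `fixed = TRUE` is an assumption, and the labels are made up for illustration.

```r
# Hypothetical labels, mirroring the example in the description above
label <- "seq_1"
tips  <- c("E_coli_seq_1.fa", "E_coli_seq_2.fa", "seq_1")

# exact_match = TRUE: the tip label must equal the dna_seg label
exact_hits <- tips == label

# exact_match = FALSE: the tip label only needs to contain the dna_seg label
# (fixed = TRUE treats the label as a literal string, an assumption here)
partial_hits <- grepl(label, tips, fixed = TRUE)

tips[exact_hits]    # only "seq_1"
tips[partial_hits]  # "E_coli_seq_1.fa" and "seq_1"
```

Because containment is less strict than equality, a short label can hit more than one tip, as it does here.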

Value

A phylog object containing the filtered tree.

Details

This function takes a character vector of dna_seg labels through the dna_segs argument, either supplied directly or extracted from a list of dna_seg objects. Each label is queried to find matching tree tip labels, and any tree tip without a match will be removed. If multiple matches are found for a single dna_seg label, an error is raised that shows the offending dna_seg label.
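
The selection logic described here can be sketched in base R as follows. The helper `tips_to_keep()` is hypothetical and not part of the package, and the literal substring matching is an assumption; the actual implementation may differ.

```r
# Hypothetical helper (not a package function): returns the tip labels to keep
tips_to_keep <- function(labels, tips, exact_match = FALSE) {
  keep <- character(0)
  for (label in labels) {
    hits <- if (exact_match) {
      tips[tips == label]
    } else {
      # Assumed literal substring matching
      tips[grepl(label, tips, fixed = TRUE)]
    }
    # More than one matching tip for a single label is treated as an error
    if (length(hits) > 1) {
      stop("Multiple tree tips match dna_seg label: ", label)
    }
    keep <- c(keep, hits)
  }
  keep
}

tips <- c("seq_1_B_bacilliformis", "seq_2_B_grahamii",
          "seq_3_B_henselae", "seq_4_B_quintana")
kept <- tips_to_keep(c("seq_1", "seq_2", "seq_3"), tips)
setdiff(tips, kept)  # tips that would be removed: "seq_4_B_quintana"
```

Under substring matching, a label such as "seq_1" would also match a tip named "seq_10_...", which is one situation where the multiple-match error, or exact_match = TRUE, becomes relevant.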

Author

Mike Puijk

Examples

## Generate data
names <- c("seq_1", "seq_2", "seq_3")
tree_str <- paste0("(seq_1_B_bacilliformis:0.5,",
                   "(seq_2_B_grahamii:0.1,",
                   "(seq_3_B_henselae:0.1,",
                   "seq_4_B_quintana:0.2):0.1):0.1);"
                   )
tree <- ade4::newick2phylog(tree_str)

## Filter tree
tree$tre
#> [1] "(seq_1_B_bacilliformis,(seq_2_B_grahamii,(seq_3_B_henselae,seq_4_B_quintana)I1)I2)Root;"
tree <- trim_tree(dna_segs = names, tree = tree)
tree$tre
#> [1] "(seq_1_B_bacilliformis,(seq_2_B_grahamii,seq_3_B_henselae)I2)Root;"