Reorder dna_segs or labels to match a tree
permute_dna_segs takes a list of dna_seg objects or a character vector of
dna_seg labels and reorders them to match a given (phylogenetic) tree.
Arguments
- dna_segs
Either a character vector containing dna_seg labels, or a list of dna_seg objects.
- tree
A (phylogenetic) tree, in the form of a phylo or phylog object, or a character string containing a file path to a Newick tree format file.
- exact_match
Logical. If TRUE, dna_seg labels will need to match the labels of the tree exactly. If exact_match = FALSE, tree tip labels only need to contain the dna_seg labels for a match to be found (e.g. the dna_seg label "seq_1" will match tree tip label "E_coli_seq_1.fa").
- return_old_labels
Logical. If TRUE, then the dna_seg labels will be returned using the original names provided by the dna_segs argument. Only relevant when exact_match = FALSE, as this option can cause dna_seg labels to be changed to match the tree tip labels.
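The difference between the two matching modes can be illustrated with plain base R string operations. This is only a sketch of the behaviour described for exact_match, not the package's internal code; in particular, treating the label as a literal string (fixed = TRUE) rather than a regular expression is an assumption.

```r
# Illustration only: base R string matching, not the package's internal code.
seg_label  <- "seq_1"
tip_labels <- c("E_coli_seq_1.fa", "E_coli_seq_2.fa")

# Analogue of exact_match = TRUE: the label must equal a tip label.
exact_hits <- tip_labels[tip_labels == seg_label]

# Analogue of exact_match = FALSE: a tip label only needs to contain the label.
# fixed = TRUE (an assumption) treats the label as a literal string.
partial_hits <- tip_labels[grepl(seg_label, tip_labels, fixed = TRUE)]

exact_hits    # character(0)
partial_hits  # "E_coli_seq_1.fa"
```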
Value
A list of dna_seg objects or a character vector of dna_seg
labels, matching the input given in the dna_segs argument.
If return_old_labels = TRUE, a list with 2 named elements is
returned instead: dna_segs (the same return value as above)
and old_labels (a character vector of the original labels, sorted
in the same way).
Details
This function takes a character vector of dna_seg labels, either directly
or by extracting them from a list of dna_segs, through the dna_segs
argument. Each label is compared against the tree tip labels, and the
labels are sorted into the order in which their matches appear in the tree.
If exactly 1 match is found and exact_match = FALSE, the dna_seg label is
updated to the matching tree tip label. If multiple matches are found, an
error is raised that shows the offending dna_seg label.
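The procedure described above could be sketched in base R roughly as follows. The helper reorder_labels_sketch and its arguments are hypothetical names introduced for illustration; it takes the tip labels as a plain character vector in tree order, and it stops when no match is found, which is an assumption since the behaviour for unmatched labels is not described here.

```r
# Hypothetical helper for illustration; not the package's implementation.
# `tip_labels` is assumed to be a character vector of tip labels in tree order.
reorder_labels_sketch <- function(labels, tip_labels, exact_match = FALSE) {
  # For each label, find the position of the single tree tip it matches.
  positions <- vapply(labels, function(label) {
    hits <- if (exact_match) {
      which(tip_labels == label)
    } else {
      which(grepl(label, tip_labels, fixed = TRUE))
    }
    if (length(hits) > 1) {
      stop("Multiple tree tips match dna_seg label: ", label)
    }
    if (length(hits) == 0) {
      # Assumption: unmatched labels are treated as an error in this sketch.
      stop("No tree tip matches dna_seg label: ", label)
    }
    hits
  }, integer(1), USE.NAMES = FALSE)

  # Sort by tip position so the labels follow the tree order.
  ord <- order(positions)
  list(
    new_labels = tip_labels[positions[ord]],  # labels renamed to tip labels
    old_labels = labels[ord]                  # original labels, same order
  )
}

tips <- c("seq_1_B_bacilliformis", "seq_2_B_grahamii",
          "seq_3_B_henselae", "seq_4_B_quintana")
res <- reorder_labels_sketch(c("seq_2", "seq_3", "seq_1", "seq_4"), tips)
res$old_labels  # "seq_1" "seq_2" "seq_3" "seq_4"
```

A label such as "seq" would match every tip in this example and therefore stop with an error naming that label.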
Examples
## Generate data
seg_labels <- c("seq_2", "seq_3", "seq_1", "seq_4")
tree_str <- paste0("(seq_1_B_bacilliformis:0.5,",
                   "(seq_2_B_grahamii:0.1,",
                   "(seq_3_B_henselae:0.1,",
                   "seq_4_B_quintana:0.2):0.1):0.1);")
tree <- ade4::newick2phylog(tree_str)
## Reorder and rename dna_seg labels to match tree
seg_labels
#> [1] "seq_2" "seq_3" "seq_1" "seq_4"
seg_labels <- permute_dna_segs(dna_segs = seg_labels, tree = tree)
seg_labels
#> [1] "seq_1_B_bacilliformis" "seq_2_B_grahamii" "seq_3_B_henselae"
#> [4] "seq_4_B_quintana"