Takes a list of dna_seg objects, or a character vector of dna_seg labels, and reorders them to follow the order of the tips in a given (phylogenetic) tree.

Usage

permute_dna_segs(
  dna_segs,
  tree,
  exact_match = FALSE,
  return_old_labels = FALSE
)

Arguments

dna_segs

Either a character vector containing dna_seg labels, or a list of dna_seg objects.

tree

A (phylogenetic) tree, in the form of a phylo or phylog object, or a character string containing a file path to a Newick tree format file.

exact_match

Logical. If TRUE, dna_seg labels must match the tree tip labels exactly. If FALSE, a tree tip label only needs to contain a dna_seg label for a match to be found (e.g. the dna_seg label "seq_1" will match tree tip label "E_coli_seq_1.fa").
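The difference between the two matching modes can be illustrated with a small base R sketch. It is not taken from the package source: it assumes that "contain" means a plain substring test, and the tip labels below are hypothetical.

```r
# Hypothetical tree tip labels and a dna_seg label, for illustration only.
tip_labels <- c("E_coli_seq_1.fa", "E_coli_seq_2.fa", "E_coli_seq_10.fa")
label <- "seq_1"

# Exact matching: the label must equal a tip label in full.
exact_hits <- tip_labels[tip_labels == label]

# Containment matching, assumed here to be a plain substring test.
partial_hits <- tip_labels[grepl(label, tip_labels, fixed = TRUE)]

exact_hits
partial_hits
```

Under this assumption "seq_1" is also a substring of "E_coli_seq_10.fa", so a short label can match more than one tip. Labels that are unique substrings of the tip labels avoid this ambiguity.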

return_old_labels

Logical. If TRUE, the original labels provided through the dna_segs argument are returned alongside the result. Only relevant when exact_match = FALSE, because only then can dna_seg labels be changed to match the tree tip labels.

Value

A list of dna_seg objects or a character vector of dna_seg labels, matching the input given in the dna_segs argument.

If return_old_labels = TRUE, a list with 2 named elements is returned instead: dna_segs, the same return value as described above, and old_labels, a character vector of the original labels, sorted in the same way.

Details

This function obtains a character vector of dna_seg labels through the dna_segs argument, either directly or by extracting them from a list of dna_seg objects. Each label is compared against the tree tip labels, and the labels are then sorted to match the order in which they are found in the tree. If exactly 1 match is found and exact_match = FALSE, the dna_seg label is updated to match the tree tip label. If multiple matches are found, an error is raised that shows the offending dna_seg label.
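The procedure described above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation. The helper name sketch_permute_labels is hypothetical, tip_labels is assumed to be a character vector already in tree order, and the error on zero matches is an assumption, since that case is not described here. Unlike permute_dna_segs, the sketch always returns both elements.

```r
# Hypothetical helper; a base R sketch, not the genoPlotR source.
# `tip_labels` is assumed to be a character vector of tree tip labels,
# already in the order in which they appear in the tree.
sketch_permute_labels <- function(labels, tip_labels, exact_match = FALSE) {
  # For each dna_seg label, find the position of its matching tree tip.
  tip_index <- vapply(labels, function(lab) {
    hits <- if (exact_match) {
      which(tip_labels == lab)
    } else {
      which(grepl(lab, tip_labels, fixed = TRUE))
    }
    if (length(hits) > 1) {
      stop("Multiple tree tips match dna_seg label: ", lab)
    }
    if (length(hits) == 0) {
      # Assumption: the documentation does not cover the no-match case.
      stop("No tree tip matches dna_seg label: ", lab)
    }
    hits
  }, integer(1), USE.NAMES = FALSE)

  # Sort the labels by the position of their matching tip.
  ord <- order(tip_index)
  old_labels <- labels[ord]
  # Labels are only renamed to the tip labels when matching is not exact.
  new_labels <- if (exact_match) old_labels else tip_labels[tip_index[ord]]
  list(dna_segs = new_labels, old_labels = old_labels)
}

tips <- c("seq_1_B_bacilliformis", "seq_2_B_grahamii",
          "seq_3_B_henselae", "seq_4_B_quintana")
res <- sketch_permute_labels(c("seq_2", "seq_3", "seq_1", "seq_4"), tips)
res$dna_segs
res$old_labels
```

Here res$dna_segs holds the renamed labels in tree order, and res$old_labels holds the original labels in that same order, mirroring the two elements returned when return_old_labels = TRUE.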

Author

Mike Puijk

Examples

## Generate data
seg_labels <- c("seq_2", "seq_3", "seq_1", "seq_4")
tree_str <- paste0("(seq_1_B_bacilliformis:0.5,",
                   "(seq_2_B_grahamii:0.1,",
                   "(seq_3_B_henselae:0.1,",
                   "seq_4_B_quintana:0.2):0.1):0.1);")
tree <- ade4::newick2phylog(tree_str)

## Reorder and rename dna_seg labels to match tree
seg_labels
#> [1] "seq_2" "seq_3" "seq_1" "seq_4"
seg_labels <- permute_dna_segs(dna_segs = seg_labels, tree = tree)
seg_labels
#> [1] "seq_1_B_bacilliformis" "seq_2_B_grahamii"      "seq_3_B_henselae"     
#> [4] "seq_4_B_quintana"