Edit dna_seg features

Edits the features of one or more dna_segs by supplying a set of IDs, together with the new values to assign to the features that match those IDs.

Usage

edit_dna_segs(
  dna_seg_input,
  ids,
  seg_labels = NULL,
  id_tags = c("name", "locus_id"),
  fixed = FALSE,
  verbose = FALSE,
  ...
)

Arguments

dna_seg_input: Either a single dna_seg or a list of dna_seg objects.
ids: Either a character string (specifying a file path), or a data.frame object. Contains the information on how to edit the dna_segs, and must contain at least an id column. See details.
seg_labels: Either NULL or a character vector of the same length as dna_seg_input. If ids contains a seg_label column, changes are made only in the dna_segs with the corresponding labels. These labels are taken both from the seg_labels argument and from the dna_segs themselves, so seg_labels can be used to provide an alternative set of names.
id_tags: A character vector of dna_seg column names in which to look for the values of the id column of ids.
fixed: Logical. If TRUE, values from the id column must exactly match the values found in the dna_segs. If FALSE (the default), grep is used instead, so the values are treated as regular expressions.
verbose: Logical. If TRUE, generates a warning when no dna_seg can be found for a label in the seg_label column of ids, and when any of the columns named in id_tags cannot be found in the dna_segs.
...: Arguments to pass to fread, which is used when the ids argument refers to a file.
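To illustrate what the fixed argument changes, the base R sketch below contrasts exact matching with pattern matching on the feature names used in the examples. It only mimics the documented behaviour with %in% and grepl(); it is an assumption about, not a copy of, the package's internal code.

```r
# Feature names taken from the examples on this page
feature_names <- c("1A", "1B", "1C", "2A", "2C", "2B", "3B", "3A", "3C")

# Mimics fixed = TRUE: the id must equal a value exactly
exact_hits <- feature_names[feature_names %in% "1A"]

# Mimics fixed = FALSE: the id is a regular expression, so "A"
# matches every name that contains an A
pattern_hits <- feature_names[grepl("A", feature_names)]

# Regular expressions also allow anchored patterns, e.g. names ending in B
anchored_hits <- feature_names[grepl("B$", feature_names)]

print(exact_hits)     # "1A"
print(pattern_hits)   # "1A" "2A" "3A"
print(anchored_hits)  # "1B" "2B" "3B"
```

Because pattern matching is the default, characters such as `.` or `(` in an ID are interpreted as regular expression syntax; fixed = TRUE is the safer choice when matching exact identifiers such as locus tags.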

Value

Either a single dna_seg object or a list of dna_seg objects, matching the form of dna_seg_input.

Details

If ids is a character string, it is assumed to be a file path, and the file is read using fread. Otherwise, it must be a data.frame or data.table object with a mandatory id column. Each value (ID) in the id column is searched for in the dna_seg columns named by the id_tags argument, and every matching row is updated; the remaining columns of ids (such as fill in the examples below) supply the new values. Including a seg_label column in ids restricts the search for each ID to the dna_seg with that label.
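The update rules above can be sketched in base R on a named list of plain data.frames. The helper edit_segs_sketch() is hypothetical and is not the genoPlotR implementation; it assumes that every column of ids other than id and seg_label holds new values, and that seg_label values refer to the list names.

```r
# Hypothetical helper mimicking the documented update rules; not genoPlotR code.
edit_segs_sketch <- function(segs, ids, id_tags = c("name", "locus_id"),
                             fixed = FALSE) {
  value_cols <- setdiff(names(ids), c("id", "seg_label"))
  for (i in seq_len(nrow(ids))) {
    # A seg_label column restricts the search to the labelled segment
    targets <- if ("seg_label" %in% names(ids)) ids$seg_label[i] else names(segs)
    for (label in intersect(targets, names(segs))) {
      seg <- segs[[label]]
      hit <- rep(FALSE, nrow(seg))
      # Look for the id in every id_tags column present in this segment
      for (tag in intersect(id_tags, names(seg))) {
        matched <- if (fixed) seg[[tag]] %in% ids$id[i] else grepl(ids$id[i], seg[[tag]])
        hit <- hit | matched
      }
      if (any(hit)) {
        for (col in value_cols) {
          if (!col %in% names(seg)) seg[[col]] <- NA  # add a missing column
          seg[[col]][hit] <- ids[[col]][i]
        }
      }
      segs[[label]] <- seg
    }
  }
  segs
}

segs <- list(
  "Genome 1" = data.frame(name = c("1A", "1B", "1C"), fill = "grey80",
                          stringsAsFactors = FALSE),
  "Genome 3" = data.frame(name = c("3B", "3A", "3C"), fill = "grey80",
                          stringsAsFactors = FALSE)
)
ids <- data.frame(id = c("A", "B"), fill = c("red", "blue"),
                  seg_label = "Genome 1", stringsAsFactors = FALSE)
edited <- edit_segs_sketch(segs, ids)
print(edited[["Genome 1"]]$fill)  # "red" "blue" "grey80"
print(edited[["Genome 3"]]$fill)  # "grey80" "grey80" "grey80"
```

Looping over the rows of ids means that later rows overwrite earlier ones when several IDs match the same feature; whether the real function resolves such overlaps the same way is not stated here.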

This function can be used to alter dna_seg attributes en masse, by providing IDs that match very general attributes, such as their color or the presence of a certain word in their function. It can also be used to modify very specific features by matching on unique attributes such as locus tags.
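The contrast between broad and specific IDs can be seen on a toy feature table. The product and locus_id values below are made up for illustration, and the commented-out call assumes the dna_segs contain such columns; the sketch only uses base R to show how many features each kind of ID would reach, then writes the ids table to a tab-separated file that could be passed as a file path.

```r
# Toy feature table; the locus_id and product values are hypothetical
features <- data.frame(
  locus_id = c("ABC_0001", "ABC_0002", "ABC_0003", "ABC_0004"),
  product  = c("transposase", "DNA polymerase", "transposase",
               "hypothetical protein"),
  stringsAsFactors = FALSE
)

# A broad id: every feature whose product mentions "transposase"
broad <- grepl("transposase", features$product)

# A specific id: a single locus tag, matched exactly
specific <- features$locus_id %in% "ABC_0002"

print(sum(broad))     # 2
print(sum(specific))  # 1

# An ids table combining both kinds of id, saved as a tab-separated file
ids <- data.frame(id = c("transposase", "ABC_0002"),
                  fill = c("orange", "purple"), stringsAsFactors = FALSE)
ids_file <- tempfile(fileext = ".tsv")
write.table(ids, ids_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Assumed call, if the dna_segs had product and locus_id columns:
# edit_dna_segs(dna_segs, ids = ids_file, id_tags = c("product", "locus_id"))
```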

Author

Mike Puijk

Examples

## Prepare feature names for three dna_segs
names1 <- c("1A", "1B", "1C")
names2 <- c("2A", "2C", "2B")
names3 <- c("3B", "3A", "3C")

## Make dna_segs
dna_seg1 <- dna_seg(data.frame(name = names1,
                               start = (1:3) * 3,
                               end = (1:3) * 3 + 2,
                               strand = rep(1, 3)))
dna_seg2 <- dna_seg(data.frame(name = names2,
                               start = (1:3) * 3,
                               end = (1:3) * 3 + 2,
                               strand = rep(1, 3)))
dna_seg3 <- dna_seg(data.frame(name = names3,
                               start = (1:3) * 3,
                               end = (1:3) * 3 + 2,
                               strand = rep(1, 3)))
dna_segs <- list("Genome 1" = dna_seg1,
                 "Genome 2" = dna_seg2,
                 "Genome 3" = dna_seg3)

## Colors before using edit_dna_segs
lapply(dna_segs, function(x) x[, .(name, fill)])
#> $`Genome 1`
#>      name   fill
#>    <char> <char>
#> 1:     1A grey80
#> 2:     1B grey80
#> 3:     1C grey80
#> 
#> $`Genome 2`
#>      name   fill
#>    <char> <char>
#> 1:     2A grey80
#> 2:     2C grey80
#> 3:     2B grey80
#> 
#> $`Genome 3`
#>      name   fill
#>    <char> <char>
#> 1:     3B grey80
#> 2:     3A grey80
#> 3:     3C grey80
#> 

## Add colors based on exact feature names
id_fixed <- c("1A", "1B", "2A", "2B")
fill_fixed <- c("red", "blue", "red", "blue")
dna_segs1 <- edit_dna_segs(dna_seg_input = dna_segs,
                           ids = data.frame(id = id_fixed,
                                            fill = fill_fixed),
                           fixed = TRUE)
lapply(dna_segs1, function(x) x[, .(name, fill)])
#> $`Genome 1`
#>      name   fill
#>    <char> <char>
#> 1:     1A    red
#> 2:     1B   blue
#> 3:     1C grey80
#> 
#> $`Genome 2`
#>      name   fill
#>    <char> <char>
#> 1:     2A    red
#> 2:     2C grey80
#> 3:     2B   blue
#> 
#> $`Genome 3`
#>      name   fill
#>    <char> <char>
#> 1:     3B grey80
#> 2:     3A grey80
#> 3:     3C grey80
#> 

## Add colors based on the presence of a string in the feature names
id_grep <- c("A", "B")
fill_grep <- c("red", "blue")
dna_segs2 <- edit_dna_segs(dna_seg_input = dna_segs,
                           ids = data.frame(id = id_grep, fill = fill_grep))
lapply(dna_segs2, function(x) x[, .(name, fill)])
#> $`Genome 1`
#>      name   fill
#>    <char> <char>
#> 1:     1A    red
#> 2:     1B   blue
#> 3:     1C grey80
#> 
#> $`Genome 2`
#>      name   fill
#>    <char> <char>
#> 1:     2A    red
#> 2:     2C grey80
#> 3:     2B   blue
#> 
#> $`Genome 3`
#>      name   fill
#>    <char> <char>
#> 1:     3B   blue
#> 2:     3A    red
#> 3:     3C grey80
#> 

## Use a seg_label column to add colors only to specific dna_segs
id_labels <- c("Genome 1", "Genome 2")
dna_segs3 <- edit_dna_segs(dna_seg_input = dna_segs,
                           ids = data.frame(id = rep(id_grep, times = 2),
                                            fill = rep(fill_grep, times = 2),
                                            seg_label = rep(id_labels, each = 2)),
                           seg_labels = names(dna_segs))
lapply(dna_segs3, function(x) x[, .(name, fill)])
#> $`Genome 1`
#>      name   fill
#>    <char> <char>
#> 1:     1A    red
#> 2:     1B   blue
#> 3:     1C grey80
#> 
#> $`Genome 2`
#>      name   fill
#>    <char> <char>
#> 1:     2A    red
#> 2:     2C grey80
#> 3:     2B   blue
#> 
#> $`Genome 3`
#>      name   fill
#>    <char> <char>
#> 1:     3B grey80
#> 2:     3A grey80
#> 3:     3C grey80
#>