Make unique IDs for dna_segs
Generates unique identifiers (IDs) for the features of each dna_seg. They can
be based on the values of existing columns, or generated from scratch.
Arguments
- dna_seg_input
Either a single dna_seg or a list of dna_seg objects.
- old_id
Either a character vector naming dna_seg columns, or NULL. The IDs will be generated based on the dna_seg columns provided, or generated from scratch if this argument is NULL.
- new_id
A character string. The generated IDs will be stored in the dna_seg column given by this argument. A new column is created if it does not already exist in the dna_segs.
Value
Either a single dna_seg object or a list of dna_seg objects,
matching the input given using dna_seg_input.
Details
This function generates unique identifiers for dna_segs. Having unique
identifiers is necessary for certain other functions, such as converting a
dna_seg into a FASTA file, as most tools that make use of FASTA files
require unique headers for each sequence in the FASTA file.
If old_id is left as NULL, the generated IDs are simply the row numbers of
the features. If old_id refers to one or multiple dna_seg columns,
then the values of those columns are concatenated, separated by "_". A
number is then appended to each concatenated value, also separated by "_". It
starts at 1 for each combination of values and increases by one each time the
same combination is found again. See the examples below.
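The numbering rule described above can be sketched in base R. This is only an illustration of the described behaviour, not the package's implementation: `sketch_ids` and the `features` data frame are hypothetical names, it works on a plain data.frame rather than a dna_seg, and returning the row numbers as character strings is an assumption.

```r
# Hypothetical helper (not part of the package): mimics the ID rule
# described in Details on a plain data.frame.
sketch_ids <- function(df, old_id = NULL) {
  if (is.null(old_id)) {
    # No columns given: fall back to row numbers
    # (stored as character here, which is an assumption)
    return(as.character(seq_len(nrow(df))))
  }
  # Concatenate the values of the chosen columns, separated by "_"
  key <- do.call(paste, c(unname(df[old_id]), sep = "_"))
  # Count occurrences within each combination, in row order
  n <- ave(seq_along(key), key, FUN = seq_along)
  paste(key, n, sep = "_")
}

features <- data.frame(
  name = c("A", "A", "B", "B", "B", "C"),
  type = c("gene", "gene", "gene", "protein", "gene", "gene")
)

ids_one <- sketch_ids(features, old_id = "name")
ids_two <- sketch_ids(features, old_id = c("name", "type"))
ids_none <- sketch_ids(features)
```

The counter restarts for every distinct combination, which is why adding a second column to old_id changes which rows share a counter.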
Examples
## Prepare dna_seg
names1 <- c("A", "A", "B", "B", "B", "C")
types1 <- c("gene", "gene", "gene", "protein", "gene", "gene")
## Make dna_seg
dna_seg_raw <- dna_seg(data.frame(name = names1,
                                  start = (1:6) * 3,
                                  end = (1:6) * 3 + 2,
                                  strand = rep(1, 6),
                                  type = types1))
## Generate IDs based on 1 column
dna_seg_edit <- make_unique_ids(dna_seg_input = dna_seg_raw,
                                old_id = "name")
dna_seg_edit[, .(name, type, id)]
#>      name    type     id
#>    <char>  <char> <char>
#> 1:      A    gene    A_1
#> 2:      A    gene    A_2
#> 3:      B    gene    B_1
#> 4:      B protein    B_2
#> 5:      B    gene    B_3
#> 6:      C    gene    C_1
## Generate IDs based on multiple columns
dna_seg_edit <- make_unique_ids(dna_seg_input = dna_seg_raw,
                                old_id = c("name", "type"))
dna_seg_edit[, .(name, type, id)]
#>      name    type          id
#>    <char>  <char>      <char>
#> 1:      A    gene    A_gene_1
#> 2:      A    gene    A_gene_2
#> 3:      B    gene    B_gene_1
#> 4:      B protein B_protein_1
#> 5:      B    gene    B_gene_2
#> 6:      C    gene    C_gene_1