Create comparisons between dna_segs by reading in a file of groupings
read_orthogroup.Rd
Functions to create comparisons between dna_seg objects by parsing (ortho)groups from a file.
Usage
read_orthogroup_from_file(
  file,
  dna_segs,
  fileType,
  seg_labels = NULL,
  id_tag = "locus_id",
  group_name = "orthogroup",
  alter_dna_segs = TRUE,
  verbose = FALSE,
  ...
)
read_orthogroup_from_orthomcl(file, dna_segs, ...)
read_orthogroup_from_orthofinder(file, dna_segs, ...)
read_orthogroup_from_mmseqs2(file, dna_segs, ...)
read_orthogroup_from_diamond(file, dna_segs, ...)
Arguments
- file
A character string containing a file path.
- dna_segs
A list of dna_seg objects.
- fileType
A character string containing the file format to parse. Must be one of: "orthomcl", "orthofinder", "mmseqs2", or "diamond".
- seg_labels
Only for use with "orthofinder" format files, a character string of dna_seg labels. See details.
- id_tag
A character string with a dna_seg column. The (gene) names taken from the (ortho)groups file will be matched to the names found in this dna_seg column.
- group_name
A character string containing a column name. This column will contain the group names found in the (ortho)groups file and will be added to the comparisons, as well as the dna_segs when alter_dna_segs is set to TRUE.
- alter_dna_segs
Logical. If TRUE, a group column will be added to each dna_seg containing the groupings found in the (ortho)groups file.
- verbose
Logical. If TRUE, will report a warning when the column specified by group_name is already present in the dna_segs and will therefore be overwritten. Has no effect unless alter_dna_segs is set to TRUE.
- ...
Arguments to pass to fread and read_orthogroup_from_file.
Value
With alter_dna_segs = TRUE, a list with 2 named elements: dna_segs and comparisons, which are both lists containing the dna_seg and comparison objects, respectively.
With alter_dna_segs = FALSE, a list of comparison objects.
Details
read_orthogroup_from_orthomcl, read_orthogroup_from_orthofinder, read_orthogroup_from_mmseqs2, and read_orthogroup_from_diamond are all convenience functions for read_orthogroup_from_file.
This function was designed to build a list of comparisons from a list of dna_segs and a file that contains orthologous groups of genes (orthogroups). However, it can in principle be used for any grouping of (genetic) elements on a genomic track. For instance, a group could represent an operon, pathway, or general function. The comparisons are created by linking the values of a dna_seg column (specified by id_tag) that belong to the same group.
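The linking idea described above can be sketched in base R. This is an illustration only and is not the package's implementation: it uses no genoPlotR calls, the data frames track1 and track2 are hypothetical stand-ins for dna_segs, and the two input lines copy the "GroupName: member member ..." layout printed for the OrthoMCL example file. It assumes group names and members contain no colons.

```r
# Illustration only: base R, hypothetical stand-ins for dna_segs.
group_lines <- c(
  "GroupA: 1_FeatA1 1_FeatA2 2_FeatA 3_FeatA",
  "GroupB: 1_FeatB 2_FeatB 3_FeatB"
)

# Split each line into a group name and its whitespace-separated members.
parts <- strsplit(group_lines, ":", fixed = TRUE)
members <- lapply(parts, function(p) strsplit(trimws(p[2]), "[[:space:]]+")[[1]])
group_table <- data.frame(
  orthogroup = rep(vapply(parts, function(p) trimws(p[1]), character(1)),
                   lengths(members)),
  name = unlist(members),
  stringsAsFactors = FALSE
)

# Two feature tables; "name" plays the role of the id_tag column.
track1 <- data.frame(name = c("1_FeatA1", "1_FeatA2", "1_FeatB"),
                     start = c(1, 501, 1501), end = c(400, 900, 2200))
track2 <- data.frame(name = c("2_FeatA", "2_FeatB"),
                     start = c(1, 501), end = c(400, 1200))

# Attach the group of each feature by matching on the id column.
track1_grouped <- merge(track1, group_table, by = "name")
track2_grouped <- merge(track2, group_table, by = "name")

# Pair every feature of track 1 with every feature of track 2 in the same group.
links <- merge(track1_grouped, track2_grouped, by = "orthogroup",
               suffixes = c("1", "2"))
links <- links[, c("start1", "end1", "start2", "end2", "orthogroup")]
print(links)
```

Each row of links pairs one feature from each track that shares a group, which is why a group with two members on one track yields two rows.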
Because "orthofinder" format files contain one column for each input genome, this function will attempt to query only the dna_seg whose label matches the column. dna_seg labels are determined automatically, but they can also be provided using the seg_labels argument. If the labels cannot be matched, the function continues without matching dna_seg names and queries each dna_seg for each column.
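The label-to-column matching can be pictured with a small base R sketch. It is an assumption-laden illustration and not the package's code: the table content, the genome column names (genome1, genome2), and the labels are all made up, and the layout assumed is the usual Orthogroups.tsv one (a first column named Orthogroup, then one tab-separated column per genome, with members separated by commas).

```r
# Illustration only: hypothetical Orthogroups.tsv-style content.
tsv_text <- paste(
  "Orthogroup\tgenome1\tgenome2",
  "GroupA\t1_FeatA1, 1_FeatA2\t2_FeatA",
  "GroupB\t1_FeatB\t2_FeatB",
  sep = "\n"
)
ortho <- read.delim(text = tsv_text, stringsAsFactors = FALSE,
                    check.names = FALSE)

# Hypothetical labels, standing in for dna_seg labels or seg_labels.
seg_labels <- c("genome1", "genome2", "genome3")

# Labels that correspond to a genome column, and labels that do not.
genome_cols <- setdiff(names(ortho), "Orthogroup")
matched <- intersect(seg_labels, genome_cols)
unmatched <- setdiff(seg_labels, genome_cols)

# For a matched label, collect group members from that column only.
ids_for <- function(label) {
  cells <- strsplit(ortho[[label]], ",[[:space:]]*")
  data.frame(orthogroup = rep(ortho$Orthogroup, lengths(cells)),
             name = unlist(cells),
             stringsAsFactors = FALSE)
}
per_label <- lapply(setNames(matched, matched), ids_for)
print(per_label)
print(unmatched)
```

In this sketch a label without a matching column simply ends up in unmatched; the fallback described above, where every dna_seg is queried for every column, is not reproduced here.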
Examples
## Generate data
names1 <- c("1_FeatA1", "1_FeatA2", "1_FeatB")
names2 <- c("2_FeatA", "2_FeatB")
names3 <- c("3_FeatA", "3_FeatB")
df1 <- data.frame(name = names1, start = c(1, 501, 1501),
                  end = c(400, 900, 2200), strand = c(1, 1, 1))
df2 <- data.frame(name = names2, start = c(1, 501),
                  end = c(400, 1200), strand = c(1, 1))
df3 <- data.frame(name = names3, start = c(1, 501),
                  end = c(400, 1200), strand = c(1, 1))
## Create list of dna_segs
dna_segs <- list(dna_seg(df1), dna_seg(df2), dna_seg(df3))
## Read feature groups from OrthoFinder (Orthogroups.tsv) format
file <- system.file('extdata/OrthoFinder_format.tsv', package = 'genoPlotR')
full_data <- read_orthogroup_from_file(file = file, dna_segs = dna_segs,
                                       fileType = "orthofinder",
                                       id_tag = "name")
#> Error in read_orthogroup_from_file(file = file, dna_segs = dna_segs, fileType = "orthofinder", id_tag = "name"): Could not find file at "".
## Plot data
plot_gene_map(dna_segs = full_data$dna_segs,
              comparisons = full_data$comparisons,
              global_color_scheme = "uniform",
              alpha_comparisons = 0.5)
#> Error: object 'full_data' not found
## Examples of these groups in the different supported formats:
OrthoFinder_file <- system.file('extdata/OrthoFinder_format.tsv',
                                package = 'genoPlotR')
OrthoMCL_file <- system.file('extdata/OrthoMCL_format.txt',
                             package = 'genoPlotR')
MMSeqs2_or_DIAMOND_file <- system.file('extdata/MMseqs2_DIAMOND_format.tsv',
                                       package = 'genoPlotR')
cat(readLines(OrthoFinder_file), sep = "\n")
#> Warning: file("") only supports open = "w+" and open = "w+b": using the former
#>
cat(readLines(OrthoMCL_file), sep = "\n")
#> GroupA: 1_FeatA1 1_FeatA2 2_FeatA 3_FeatA
#> GroupB: 1_FeatB 2_FeatB 3_FeatB
cat(readLines(MMSeqs2_or_DIAMOND_file), sep = "\n")
#> Warning: file("") only supports open = "w+" and open = "w+b": using the former
#>