Create comparisons between dna_segs by reading in a file of groupings
read_orthogroup.Rd
Functions to create comparisons between dna_seg objects by parsing
(ortho)groups from a file.
Usage
read_orthogroup_from_file(
file,
dna_segs,
fileType,
seg_labels = NULL,
id_tag = "locus_id",
group_name = "orthogroup",
alter_dna_segs = TRUE,
verbose = FALSE,
...
)
read_orthogroup_from_orthomcl(file, dna_segs, ...)
read_orthogroup_from_orthofinder(file, dna_segs, ...)
read_orthogroup_from_mmseqs2(file, dna_segs, ...)
read_orthogroup_from_diamond(file, dna_segs, ...)
Arguments
- file
A character string containing a file path.
- dna_segs
A list of dna_seg objects.
- fileType
A character string containing the file format to parse. Must be one of: "orthomcl", "orthofinder", "mmseqs2", or "diamond".
- seg_labels
Only for use with "orthofinder" format files, a character string of dna_seg labels. See details.
- id_tag
A character string with a dna_seg column. The (gene) names taken from the (ortho)groups file will be matched to the names found in this dna_seg column.
- group_name
A character string containing a column name. This column will contain the group names found in the (ortho)groups file and will be added to the comparisons, as well as the dna_segs when alter_dna_segs is set to TRUE.
- alter_dna_segs
Logical. If TRUE, a group column will be added to each dna_seg containing the groupings found in the (ortho)groups file.
- verbose
Logical. If TRUE, will report a warning when the column specified by group_name is already present in the dna_segs and will therefore be overwritten. Has no effect unless alter_dna_segs is set to TRUE.
- ...
Arguments to pass to fread and read_orthogroup_from_file.
Value
With alter_dna_segs = TRUE, a list with 2
named elements: dna_segs and comparisons, which are both lists containing
the dna_seg and comparison objects, respectively.
With alter_dna_segs = FALSE, a list of comparison objects.
Details
read_orthogroup_from_orthomcl, read_orthogroup_from_orthofinder,
read_orthogroup_from_mmseqs2, and read_orthogroup_from_diamond are
all just convenience functions for read_orthogroup_from_file.
This function was designed to create a list of comparisons from a list of
dna_segs and a file that contains orthologous groups of genes
(orthogroups). However, it could theoretically be used for any grouping of
(genetic) elements on a genomic track. For instance, a group could represent
an operon, pathway, or general function. This function creates the
comparisons by linking together columns from dna_segs (specified by
id_tag).
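The linking step described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: plain data frames stand in for dna_seg objects, the group membership is written out by hand, and the helper assign_group is hypothetical.

```r
# Simplified stand-ins for two dna_seg objects (plain data frames).
seg1 <- data.frame(name = c("1_FeatA1", "1_FeatA2", "1_FeatB"),
                   start = c(1, 501, 1501), end = c(400, 900, 2200),
                   stringsAsFactors = FALSE)
seg2 <- data.frame(name = c("2_FeatA", "2_FeatB"),
                   start = c(1, 501), end = c(400, 1200),
                   stringsAsFactors = FALSE)

# Group membership as it might look after parsing an (ortho)groups file.
groups <- list(GroupA = c("1_FeatA1", "1_FeatA2", "2_FeatA", "3_FeatA"),
               GroupB = c("1_FeatB", "2_FeatB", "3_FeatB"))

# Hypothetical helper: look up each feature's group via the id column.
assign_group <- function(seg, groups, id_tag = "name") {
  membership <- stack(groups)  # columns: values (feature id), ind (group name)
  idx <- match(seg[[id_tag]], membership$values)
  seg$orthogroup <- as.character(membership$ind[idx])
  seg
}

seg1 <- assign_group(seg1, groups)
seg2 <- assign_group(seg2, groups)

# Pair every feature in seg1 with every feature in seg2 sharing its group.
links <- merge(seg1, seg2, by = "orthogroup", suffixes = c("1", "2"))
links <- links[, c("start1", "end1", "start2", "end2", "orthogroup")]
print(links)
```

Each row of links connects one feature in the first segment to one feature in the second, which is roughly the information a comparison between two adjacent dna_segs carries.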
Because "orthofinder" format files contain one column for each input
genome, this function will attempt to
query only the dna_seg whose label matches the column. dna_seg labels
will be determined automatically, but they can also be provided using the
seg_labels argument. If the labels cannot be matched, it will continue
without matching dna_seg names, querying each dna_seg for each column.
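The label matching and its fallback can be illustrated with a small base R sketch. The table, the column names genome1 and genome2, and the labels are all hypothetical, and the logic is a simplified guess at the behaviour described above rather than the package's actual code.

```r
# In-memory stand-in for an OrthoFinder-style table: one column per genome,
# with comma-separated feature names in each cell.
ortho <- data.frame(
  Orthogroup = c("GroupA", "GroupB"),
  genome1 = c("1_FeatA1, 1_FeatA2", "1_FeatB"),
  genome2 = c("2_FeatA", "2_FeatB"),
  stringsAsFactors = FALSE
)

# Hypothetical labels, given in the same order as the dna_segs.
seg_labels <- c("genome2", "genome1")

# Try to find one genome column per label.
col_idx <- match(seg_labels, names(ortho))
if (anyNA(col_idx)) {
  # Fallback: labels could not be matched, so each segment is queried
  # against every genome column.
  cols_per_seg <- rep(list(names(ortho)[-1]), length(seg_labels))
} else {
  cols_per_seg <- as.list(names(ortho)[col_idx])
}
names(cols_per_seg) <- seg_labels

# Feature names that would be searched for in the first segment.
cells <- unlist(ortho[cols_per_seg[[1]]], use.names = FALSE)
ids_seg1 <- unlist(strsplit(cells, ",\\s*"))
print(ids_seg1)
```

Restricting each segment to its own column avoids searching for names that belong to other genomes. The fallback trades that saving for robustness when labels and column names differ.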
Examples
## Generate data
names1 <- c("1_FeatA1", "1_FeatA2", "1_FeatB")
names2 <- c("2_FeatA", "2_FeatB")
names3 <- c("3_FeatA", "3_FeatB")
df1 <- data.frame(name = names1, start = c(1, 501, 1501),
end = c(400, 900, 2200), strand = c(1, 1, 1))
df2 <- data.frame(name = names2, start = c(1, 501),
end = c(400, 1200), strand = c(1, 1))
df3 <- data.frame(name = names3, start = c(1, 501),
end = c(400, 1200), strand = c(1, 1))
## Create list of dna_segs
dna_segs <- list(dna_seg(df1), dna_seg(df2), dna_seg(df3))
## Read feature groups from OrthoFinder (Orthogroups.tsv) format
file <- system.file('extdata/OrthoFinder_format.tsv', package = 'genoPlotR')
full_data <- read_orthogroup_from_file(file = file, dna_segs = dna_segs,
fileType = "orthofinder",
id_tag = "name")
#> Error in read_orthogroup_from_file(file = file, dna_segs = dna_segs, fileType = "orthofinder", id_tag = "name"): Could not find file at "".
## Plot data
plot_gene_map(dna_segs = full_data$dna_segs,
comparisons = full_data$comparisons,
global_color_scheme = "uniform",
alpha_comparisons = 0.5)
#> Error: object 'full_data' not found
## Examples of these groups in the different supported formats:
OrthoFinder_file <- system.file('extdata/OrthoFinder_format.tsv',
package = 'genoPlotR')
OrthoMCL_file <- system.file('extdata/OrthoMCL_format.txt',
package = 'genoPlotR')
MMSeqs2_or_DIAMOND_file <- system.file('extdata/MMseqs2_DIAMOND_format.tsv',
package = 'genoPlotR')
cat(readLines(OrthoFinder_file), sep = "\n")
#> Warning: file("") only supports open = "w+" and open = "w+b": using the former
#>
cat(readLines(OrthoMCL_file), sep = "\n")
#> GroupA: 1_FeatA1 1_FeatA2 2_FeatA 3_FeatA
#> GroupB: 1_FeatB 2_FeatB 3_FeatB
cat(readLines(MMSeqs2_or_DIAMOND_file), sep = "\n")
#> Warning: file("") only supports open = "w+" and open = "w+b": using the former
#>
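As a rough illustration of what parsing the OrthoMCL-style lines printed above involves, the following base R sketch turns them into a lookup table of feature names and group names. The lines are copied from that output. The parsing approach and the names lookup and members are assumptions made for this sketch and do not show how genoPlotR reads the file.

```r
# Lines in the "GroupName: member member ..." layout shown above.
lines <- c("GroupA: 1_FeatA1 1_FeatA2 2_FeatA 3_FeatA",
           "GroupB: 1_FeatB 2_FeatB 3_FeatB")

# Split each line into the group name and the member string.
parts <- strsplit(lines, ":\\s*")
group_names <- vapply(parts, function(p) p[1], character(1))

# Split the member string on whitespace to get individual feature names.
members <- lapply(parts, function(p) strsplit(p[2], "\\s+")[[1]])
names(members) <- group_names

# Flatten to a two-column table: feature name and its group.
lookup <- stack(members)
names(lookup) <- c("name", "orthogroup")
lookup$orthogroup <- as.character(lookup$orthogroup)
print(lookup)
```

A table like this can be matched against the column named by id_tag (here, name) to assign a group to each feature.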