
Functions to create comparisons between dna_seg objects by parsing (ortho)groups from a file.

Usage

read_orthogroup_from_file(
  file,
  dna_segs,
  fileType,
  seg_labels = NULL,
  id_tag = "locus_id",
  group_name = "orthogroup",
  alter_dna_segs = TRUE,
  verbose = FALSE,
  ...
)

read_orthogroup_from_orthomcl(file, dna_segs, ...)

read_orthogroup_from_orthofinder(file, dna_segs, ...)

read_orthogroup_from_mmseqs2(file, dna_segs, ...)

read_orthogroup_from_diamond(file, dna_segs, ...)

Arguments

file

A character string containing a file path.

dna_segs

A list of dna_seg objects.

fileType

A character string containing the file format to parse. Must be one of: "orthomcl", "orthofinder", "mmseqs2", or "diamond".

seg_labels

Only used with "orthofinder" format files. A character string of dna_seg labels. See Details.

id_tag

A character string naming a dna_seg column. The (gene) names taken from the (ortho)groups file are matched against the values found in this dna_seg column.

group_name

A character string containing a column name. This column will contain the group names found in the (ortho)groups file and will be added to the comparisons, as well as the dna_segs when alter_dna_segs is set to TRUE.

alter_dna_segs

Logical. If TRUE, a group column will be added to each dna_seg containing the groupings found in the (ortho)groups file.

verbose

Logical. If TRUE, will report a warning when the column specified by group_name is already present in the dna_segs and will therefore be overwritten. Has no effect unless alter_dna_segs is set to TRUE.

...

Arguments to pass to fread and read_orthogroup_from_file.

Value

With alter_dna_segs = TRUE, a list with 2 named elements: dna_segs and comparisons, which are both lists containing the dna_seg and comparison objects, respectively.

With alter_dna_segs = FALSE, a list of comparison objects.
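
Because the shape of the return value depends on alter_dna_segs, downstream code may want to normalize it before plotting. The following is a minimal sketch in base R, not part of the package: split_orthogroup_result is a hypothetical helper, and the small data frames only stand in for real dna_seg and comparison objects.

```r
# Hypothetical helper (not part of genoPlotR): normalize the two documented
# return shapes into a list with `dna_segs` and `comparisons` elements.
split_orthogroup_result <- function(res, dna_segs) {
  has_both <- !is.null(names(res)) &&
    all(c("dna_segs", "comparisons") %in% names(res))
  if (has_both) {
    # Shape documented for alter_dna_segs = TRUE
    res[c("dna_segs", "comparisons")]
  } else {
    # Shape documented for alter_dna_segs = FALSE: a plain list of
    # comparisons, so the caller's own dna_segs are passed through unchanged
    list(dna_segs = dna_segs, comparisons = res)
  }
}

# Placeholder data standing in for real dna_seg and comparison objects
segs  <- list(data.frame(name = "a"), data.frame(name = "b"))
comps <- list(data.frame(start1 = 1, end1 = 400, start2 = 1, end2 = 400))

out_true  <- split_orthogroup_result(list(dna_segs = segs, comparisons = comps),
                                     segs)
out_false <- split_orthogroup_result(comps, segs)
```

Either way, the result can then be accessed as out$dna_segs and out$comparisons.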

Details

read_orthogroup_from_orthomcl, read_orthogroup_from_orthofinder, read_orthogroup_from_mmseqs2, and read_orthogroup_from_diamond are convenience wrappers around read_orthogroup_from_file.

This function builds a list of comparisons from a list of dna_segs and a file that contains orthologous groups of genes (orthogroups). It can in principle be used for any grouping of (genetic) elements on a genomic track. For instance, a group could represent an operon, a pathway, or a general function. The comparisons are created by linking features across dna_segs: the values in the dna_seg column specified by id_tag are matched to the names in the (ortho)groups file, and features that belong to the same group are linked together.
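
The linking idea can be illustrated with a small base R sketch. This is not the package's implementation, only an approximation under stated assumptions: the two OrthoMCL-style lines are copied from the example file shown under Examples, plain data frames stand in for dna_seg objects, and the name column plays the role of id_tag.

```r
# OrthoMCL-style lines, copied from the example file shown under Examples
lines <- c("GroupA: 1_FeatA1 1_FeatA2 2_FeatA 3_FeatA",
           "GroupB: 1_FeatB 2_FeatB 3_FeatB")

# Build a lookup table with one row per (group, member id)
parts   <- strsplit(lines, ":", fixed = TRUE)
members <- lapply(parts, function(p) strsplit(trimws(p[2]), " +")[[1]])
groups  <- data.frame(
  orthogroup = rep(vapply(parts, function(p) p[1], character(1)),
                   lengths(members)),
  id = unlist(members),
  stringsAsFactors = FALSE
)

# Plain data frames standing in for two adjacent dna_segs
seg1 <- data.frame(name = c("1_FeatA1", "1_FeatA2", "1_FeatB"),
                   start = c(1, 501, 1501), end = c(400, 900, 2200))
seg2 <- data.frame(name = c("2_FeatA", "2_FeatB"),
                   start = c(1, 501), end = c(400, 1200))

# Attach the group to each feature by matching on the id column
seg1$orthogroup <- groups$orthogroup[match(seg1$name, groups$id)]
seg2$orthogroup <- groups$orthogroup[match(seg2$name, groups$id)]

# Link every pair of features in the two segments that share a group
links <- merge(seg1, seg2, by = "orthogroup", suffixes = c("1", "2"))
links <- links[, c("start1", "end1", "start2", "end2", "orthogroup")]
```

In this sketch, both GroupA features of the first segment are linked to the single GroupA feature of the second, so one group can yield several links between the same pair of segments.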

Because "orthofinder" format files contain one column for each genome that was used as input, this function attempts to query only the dna_seg whose label matches a given column. dna_seg labels are determined automatically, but they can also be provided using the seg_labels argument. If the labels cannot be matched to the columns, the function continues without matching dna_seg names and queries each dna_seg for each column.
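
The matching and fallback behaviour described above can be sketched in base R as follows. This is an illustration only, not the package's code: columns_for_label is a hypothetical helper, the genome column names are made up, and the table layout (one row per orthogroup, one column per genome, members separated by ", ") is an assumption based on the usual OrthoFinder Orthogroups.tsv format.

```r
# Assumed OrthoFinder-style layout: one row per orthogroup, one column per
# genome, members separated by ", ". Genome names here are hypothetical.
ortho <- data.frame(
  Orthogroup = c("GroupA", "GroupB"),
  genome1 = c("1_FeatA1, 1_FeatA2", "1_FeatB"),
  genome2 = c("2_FeatA", "2_FeatB"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: choose the genome columns to search for one dna_seg
columns_for_label <- function(label, ortho) {
  genome_cols <- setdiff(names(ortho), "Orthogroup")
  if (!is.null(label) && label %in% genome_cols) {
    label          # label matches a column: search only that column
  } else {
    genome_cols    # no match: fall back to searching every column
  }
}

matched   <- columns_for_label("genome2", ortho)
unmatched <- columns_for_label("unlabelled", ortho)
```

Restricting the search to the matching column avoids scanning every genome for every dna_seg; the fallback keeps the result usable when labels and column names differ.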

Author

Mike Puijk

Examples

## Generate data
names1 <- c("1_FeatA1", "1_FeatA2", "1_FeatB")
names2 <- c("2_FeatA", "2_FeatB")
names3 <- c("3_FeatA", "3_FeatB")
df1 <- data.frame(name = names1, start = c(1, 501, 1501),
                  end = c(400, 900, 2200), strand = c(1, 1, 1))
df2 <- data.frame(name = names2, start = c(1, 501),
                  end = c(400, 1200), strand = c(1, 1))
df3 <- data.frame(name = names3, start = c(1, 501),
                  end = c(400, 1200), strand = c(1, 1))

## Create list of dna_segs
dna_segs <- list(dna_seg(df1), dna_seg(df2), dna_seg(df3))

## Read feature groups from OrthoFinder (Orthogroups.tsv) format
file <- system.file('extdata/OrthoFinder_format.tsv', package = 'genoPlotR')
full_data <- read_orthogroup_from_file(file = file, dna_segs = dna_segs,
                                       fileType = "orthofinder", 
                                       id_tag = "name")

## Plot data
plot_gene_map(dna_segs = full_data$dna_segs,
              comparisons = full_data$comparisons,
              global_color_scheme = "uniform",
              alpha_comparisons = 0.5)

## Examples of these groups in the different supported formats:
OrthoFinder_file <- system.file('extdata/OrthoFinder_format.tsv',
                                package = 'genoPlotR')
OrthoMCL_file <- system.file('extdata/OrthoMCL_format.txt',
                             package = 'genoPlotR')
MMSeqs2_or_DIAMOND_file <- system.file('extdata/MMseqs2_DIAMOND_format.tsv',
                                       package = 'genoPlotR')

cat(readLines(OrthoFinder_file), sep = "\n")
cat(readLines(OrthoMCL_file), sep = "\n")
#> GroupA: 1_FeatA1 1_FeatA2 2_FeatA 3_FeatA
#> GroupB: 1_FeatB 2_FeatB 3_FeatB
cat(readLines(MMSeqs2_or_DIAMOND_file), sep = "\n")