
Functions to parse comparison objects from tabular data or BLAST output.

Usage

read_comparison_from_file(file, fileType, ...)

read_comparison_from_files(files, fileType, seg_labels = NULL, ...)

read_comparison_from_blast(
  file,
  sort_by = NULL,
  filt_high_evalue = NULL,
  filt_low_per_id = NULL,
  filt_length = NULL,
  color_scheme = NULL,
  reverse = 0
)

read_comparison_from_tab(file, header = TRUE, filt_length = NULL, reverse = 0)

Arguments

file

A character string containing a file path, or a file connection.

fileType

A character string containing the file format to parse. Must be either "blast" or "tab".

...

Further arguments to pass to read_comparison_from_blast or read_comparison_from_tab, see arguments below.

files

A list or character vector containing file paths. Supports wildcard expansion (e.g. *.txt).

seg_labels

A character vector containing dna_seg labels. When provided, this function will search for file names that match these dna_seg labels. For example, with seg_labels = c("seg1", "seg2", "seg3"), it will look for "seg1_seg2" and "seg2_seg3" among the file names provided by the files argument, parse those files, and ignore all other files provided.

sort_by

A character string containing the name of a column to sort by. Must be a numeric column.

filt_high_evalue

A numeric value. Comparisons with an e-value higher than this value are filtered out (no filtering when left as NULL).

filt_low_per_id

A numeric value. Comparisons with a percentage identity lower than this value are filtered out (no filtering when left as NULL).

filt_length

A numeric value. Comparisons whose alignments are shorter than this value are filtered out (no filtering when left as NULL).

color_scheme

A color scheme to apply. Possible values include grey, red_blue, and NULL (which applies no color scheme). See gradient_color_scheme for more details.

reverse

A number that controls which sides of the comparison are reversed. If reverse = 1, the first side is reversed. If reverse = 2, the second side is reversed. If reverse < 1 (including the default, 0), no side is reversed. If reverse > 2, both sides are reversed.

header

Logical. If TRUE, the first line will be parsed as a header containing column names.
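
As an illustration of the seg_labels pairing described above, the following base R sketch derives the expected pair names from consecutive labels and selects the matching files. The file names are hypothetical, and matching on the file name without its extension is an assumption made for this sketch; the package's own matching logic may differ.

```r
# Hypothetical dna_seg labels and candidate files (not shipped with the package)
seg_labels <- c("seg1", "seg2", "seg3")
files <- c("seg1_seg2.tab", "seg2_seg3.tab", "seg1_seg3.tab")

# Consecutive labels are paired: seg1_seg2, seg2_seg3
pair_names <- paste(head(seg_labels, -1), tail(seg_labels, -1), sep = "_")

# Assumption: a file matches when its name without extension equals a pair name
base_names <- tools::file_path_sans_ext(basename(files))
matched <- files[base_names %in% pair_names]
print(matched)
```

In this sketch, seg1_seg3.tab is ignored because seg1 and seg3 are not adjacent in seg_labels.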

Value

A list of comparison objects for read_comparison_from_files, and a single comparison object otherwise.

Details

To make comparisons from tabular files, the columns in these files must include start1, end1, start2, and end2. If the file has no header (header = FALSE), these columns are expected in that order, and an optional fifth column is parsed as the color of the comparison and stored in the col column. Additional columns can be included whether or not the file has a header.
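
To make the headerless layout concrete, here is a minimal sketch that writes a small file with the four position columns followed by a color column, then inspects it with base R. The values and colors are made up, and the tab separator is an assumption of this sketch rather than something stated above.

```r
# Write a small headerless table: start1, end1, start2, end2, then a color
tab_path <- tempfile(fileext = ".tab")
writeLines(c("50\t500\t1899\t2034\tred",
             "800\t1100\t2732\t2508\tblue"), tab_path)

# Inspect the layout with base R; the column names mirror the documented order
raw <- read.table(tab_path, header = FALSE, sep = "\t",
                  col.names = c("start1", "end1", "start2", "end2", "col"),
                  stringsAsFactors = FALSE)
print(raw)
```

A file like this would then be passed to read_comparison_from_tab with header = FALSE.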

BLAST output must be provided in the default tabular format (-outfmt 6), which contains the following columns in this order: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, and bitscore.

The resulting comparison object will contain these columns under the following names, in the same order: name1, name2, per_id, aln_len, mism, gaps, start1, end1, start2, end2, e_value, and bit_score.
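
The renaming can be sketched in base R as a positional mapping from the -outfmt 6 names to the comparison column names, assuming qstart and qend become start1 and end1, and sstart and send become start2 and end2. The BLAST hit below is made up, and the filter thresholds are arbitrary; the subsetting only emulates what filt_high_evalue, filt_low_per_id, and filt_length are documented to do, and is not the package's implementation.

```r
# Standard -outfmt 6 column names, in order
blast_cols <- c("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                "qstart", "qend", "sstart", "send", "evalue", "bitscore")
# Names used in the resulting comparison object, in the same order
comp_cols <- c("name1", "name2", "per_id", "aln_len", "mism", "gaps",
               "start1", "end1", "start2", "end2", "e_value", "bit_score")

# One made-up BLAST hit, for illustration only
hit <- "BH\tBQ\t95.5\t1200\t50\t4\t100\t1299\t2300\t1101\t1e-50\t2000"
blast <- read.table(text = hit, sep = "\t", col.names = blast_cols,
                    stringsAsFactors = FALSE)
names(blast) <- comp_cols[match(names(blast), blast_cols)]

# Emulate the documented filters with plain subsetting (arbitrary thresholds)
kept <- subset(blast, e_value <= 1e-10 & per_id >= 90 & aln_len >= 500)
print(kept)
```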

Author

Lionel Guy, Jens Roat Kultima, Mike Puijk

Examples


## Read comparison from tabular data
tab_file <- system.file('extdata/comparison2.tab', package = 'genoPlotR')
tab_comp <- read_comparison_from_tab(tab_file)
data.frame(tab_comp)
#>   start1 end1 start2 end2 direction
#> 1     50  500   1899 2034         1
#> 2    800 1100   2732 2508        -1

## Load dna_segs
data(barto)
bh <- barto$dna_segs[[3]]
bq <- barto$dna_segs[[4]]

## Read comparison from BLAST output
blast <- system.file('extdata/BH_vs_BQ.blastn.tab', package = 'genoPlotR')
blast_comp <- read_comparison_from_blast(blast, color_scheme = "red_blue")

## Plot
plot_gene_map(dna_segs = list(BH = bh, BQ = bq),
              comparisons = list(blast_comp),
              xlims = list(c(1, 50000), c(1, 50000)))