Calculate the nuclear fraction statistic by quantifying intron/exon overlap

This function calculates the nuclear fraction score by parsing all of the reads in the provided BAM file and determining their overlap with intron and exon intervals obtained from the provided annotation file. It is more flexible than nuclear_fraction_tags() because it does not require the BAM file to already contain tags describing whether reads align to intronic or exonic regions. It is also slower, because that overlap has to be computed for every read.

nuclear_fraction_annotation(
  annotation_path,
  annotation_format = "auto",
  bam,
  bam_index = paste0(bam, ".bai"),
  barcodes,
  cell_barcode_tag = "CB",
  cores = future::availableCores() - 1,
  tiles = 1000,
  verbose = TRUE
)

Arguments

annotation_path	character, the path to the annotation file
annotation_format	character. Can be one of "auto", "gff3" or "gtf". This is passed to the 'format' argument of GenomicFeatures::makeTxDbFromGFF(). This should generally be left as "auto".
bam	character, the path to the BAM file
bam_index	character, the path to the input BAM file index
barcodes	character, either a vector of barcode names or the path to a barcodes.tsv.gz file containing the cell barcodes. If providing the cell barcodes as a vector, make sure that the format matches the one in the BAM file, e.g. be mindful of integers appended to the end of the barcode sequence.
cell_barcode_tag	character, defines the BAM tag which contains the cell barcode, e.g. "CB"
cores	numeric, parsing of the BAM file can be run in parallel using furrr::future_map() with the requested number of cores. Setting `cores=1` will cause future_map() to run sequentially.
tiles	integer, the number of tiles to split transcripts into so that reads can be processed in chunks, which speeds up processing of the BAM file. Default is 1000.
verbose	logical, whether or not to print progress
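
The barcode format advice above can be checked before calling the function. The following is a minimal sketch in base R, assuming the barcodes file holds one barcode per line; read_barcodes() is a hypothetical helper written for illustration and is not part of DropletQC. The two barcodes are taken from the example output on this page.

```r
# Hypothetical helper (not part of DropletQC): read a gzipped file with
# one barcode per line into a character vector.
read_barcodes <- function(path) {
  con <- gzfile(path, open = "rt")
  on.exit(close(con))
  readLines(con)
}

# Write a small temporary barcodes file so the sketch is self-contained.
tmp <- tempfile(fileext = ".tsv.gz")
con <- gzfile(tmp, open = "wt")
writeLines(c("AAAAGTCACTTACTTG-1", "AAAAGTGGATCTCTAA-1"), con)
close(con)

barcodes <- read_barcodes(tmp)

# Flag barcodes that end in "-<integer>", so the vector can be compared
# against the format of the cell barcode tag in the BAM file.
has_suffix <- grepl("-[0-9]+$", barcodes)
all(has_suffix)
```

If the barcodes in the BAM file carry the suffix and the supplied vector does not (or the reverse), the two will not match, so it is worth checking before passing the vector as `barcodes`.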

Value

data.frame. Returns a data frame containing the nuclear fraction score. This is the fraction of reads that are intronic:

nuclear fraction = # intronic reads / (# intronic reads + # exonic reads)

The row names of the returned data frame will match the order and name of the supplied barcodes.
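
The formula above can be illustrated with a few made-up read counts. This is only a sketch of the arithmetic, not the package's implementation: the nuclear_fraction() helper, the cell names and the counts are hypothetical, and returning NA for a barcode with no counted reads is an assumption made for this example.

```r
# Hypothetical helper: apply the nuclear fraction formula to per-barcode
# counts of intronic and exonic reads.
nuclear_fraction <- function(intronic, exonic) {
  total <- intronic + exonic
  # Assumption for this sketch: barcodes with no counted reads get NA.
  ifelse(total > 0, intronic / total, NA_real_)
}

# Made-up counts for three barcodes.
counts <- data.frame(
  intronic = c(90, 40, 0),
  exonic = c(10, 60, 0),
  row.names = c("cell_a", "cell_b", "cell_c")
)

nf <- data.frame(
  nuclear_fraction = nuclear_fraction(counts$intronic, counts$exonic),
  row.names = rownames(counts)
)
nf
#>        nuclear_fraction
#> cell_a              0.9
#> cell_b              0.4
#> cell_c               NA
```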

Examples

nf3 <- nuclear_fraction_annotation(
 annotation_path = system.file("extdata/outs/chr1.gff3",
  package = "DropletQC"),
 bam = system.file("extdata/outs/possorted_genome_bam.bam",
  package = "DropletQC"),
 barcodes = system.file(
 "extdata/outs/filtered_feature_bc_matrix/barcodes.tsv.gz",
  package = "DropletQC"),
 tiles = 1, cores = 1, verbose = FALSE)
head(nf3)
#>                    nuclear_fraction
#> AAAAGTCACTTACTTG-1        0.9032698
#> AAAAGTGGATCTCTAA-1        0.4032761
#> AAAGCAGTTACGAAGA-1        0.3957704
#> AACGACTTCAATATGT-1        0.4004525
#> AACGGCGTCATCTGGA-1        0.8845109
#> AAGCAGGGGTCGCGAA-1        0.3929376