This function calculates the nuclear fraction score for each cell by parsing every read in the supplied BAM file and determining whether it overlaps intron or exon intervals derived from the supplied annotation file. It is more flexible than nuclear_fraction_tags() because it does not require the BAM file to already contain tags describing whether reads align to intronic or exonic regions. However, it is also slower, because this information has to be calculated.
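To make the overlap step more concrete, here is a minimal base R sketch that classifies toy read intervals as exonic, intronic or neither. It is not DropletQC's implementation. The helper names `overlaps_any()` and `classify_reads()` are hypothetical, and the rule that a read touching any exon counts as exonic (even if it also overlaps an intron) is an assumption made for illustration.

```r
# Hypothetical helpers, not part of DropletQC. All intervals are 1-based,
# closed, and assumed to lie on the same chromosome and strand.
overlaps_any <- function(start, end, iv_start, iv_end) {
  # TRUE for each read [start, end] that overlaps at least one interval
  vapply(seq_along(start), function(i) {
    any(start[i] <= iv_end & end[i] >= iv_start)
  }, logical(1))
}

classify_reads <- function(reads, exons, introns) {
  in_exon   <- overlaps_any(reads$start, reads$end, exons$start, exons$end)
  in_intron <- overlaps_any(reads$start, reads$end, introns$start, introns$end)
  # Assumption: exon overlap takes priority over intron overlap
  ifelse(in_exon, "exonic", ifelse(in_intron, "intronic", "other"))
}

# Toy gene: two exons separated by one intron
exons   <- data.frame(start = c(100, 500), end = c(200, 600))
introns <- data.frame(start = 201, end = 499)
reads   <- data.frame(start = c(120, 300, 450, 700),
                      end   = c(170, 350, 520, 750))

read_class <- classify_reads(reads, exons, introns)
read_class
```

The third read spans the intron/exon boundary and is counted as exonic under the assumed priority rule; the fourth falls outside the gene entirely.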

nuclear_fraction_annotation(
  annotation_path,
  annotation_format = "auto",
  bam,
  bam_index = paste0(bam, ".bai"),
  barcodes,
  cell_barcode_tag = "CB",
  cores = future::availableCores() - 1,
  tiles = 1000,
  verbose = TRUE
)

Arguments

annotation_path

character, path to the annotation file (GFF3 or GTF)

annotation_format

character, one of "auto", "gff3" or "gtf". This is passed to the 'format' argument of GenomicFeatures::makeTxDbFromGFF(). This should generally be left as "auto".

bam

character, path to the BAM file

bam_index

character, path to the index of the input BAM file. By default this is the BAM file path with ".bai" appended.

barcodes

character, either a vector of cell barcodes or the path to a barcodes.tsv.gz file containing the cell barcodes. If you supply the barcodes as a vector, make sure their format matches the one in the BAM file, e.g. be mindful of whether integers (such as "-1") are appended to the end of the barcode sequence.

cell_barcode_tag

character, the BAM tag that contains the cell barcode, e.g. "CB"

cores

numeric, the BAM file can be parsed in parallel using furrr::future_map() with the requested number of cores. Setting cores = 1 makes future_map() run sequentially. The default uses all available cores except one.

tiles

integer, to speed up processing of the BAM file, transcripts can be split into this many tiles so that reads are processed in chunks (default = 1000).

verbose

logical, whether or not to print progress
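The barcodes argument accepts either a file path or a vector, and the vector must use the same format as the BAM file's cell barcode tag. The minimal base R sketch below reads a gzipped barcodes file (assumed to hold one barcode per line, as in the example data) and shows how a vector lacking the "-1" suffix might be brought into line. The helper `read_barcodes()` and the toy file are hypothetical and only for illustration.

```r
# Hypothetical helper: read one barcode per line from a gzipped text file
read_barcodes <- function(path) {
  con <- gzfile(path, open = "rt")
  on.exit(close(con))
  readLines(con)
}

# Write a small example barcodes file to a temporary location
bc_file <- tempfile(fileext = ".tsv.gz")
out <- gzfile(bc_file, open = "wt")
writeLines(c("AAAAGTCACTTACTTG-1", "AAAAGTGGATCTCTAA-1"), out)
close(out)

barcodes <- read_barcodes(bc_file)

# A vector supplied without the "-1" suffix would not match CB tags that
# end in "-1"; appending the suffix makes the two formats agree.
bare_barcodes  <- c("AAAAGTCACTTACTTG", "AAAAGTGGATCTCTAA")
fixed_barcodes <- paste0(bare_barcodes, "-1")
all(fixed_barcodes %in% barcodes)
```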

Value

data.frame. Returns a data frame containing the nuclear fraction score, which is simply the fraction of reads that are intronic:

nuclear fraction = # intronic reads / (# intronic reads + # exonic reads)

The row names of the returned data frame match the names and order of the supplied barcodes.
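As a worked illustration of the formula and of the row-name convention, the base R sketch below computes the score from made-up intronic and exonic counts for three made-up barcodes. The helper `calc_nuclear_fraction()` is hypothetical, and returning NA for a barcode with no counted reads is an assumption of this sketch, not documented DropletQC behavior.

```r
# Hypothetical helper: intronic / (intronic + exonic), NA when no reads
calc_nuclear_fraction <- function(intronic, exonic) {
  total <- intronic + exonic
  out <- rep(NA_real_, length(total))
  ok <- total > 0
  out[ok] <- intronic[ok] / total[ok]
  out
}

# Made-up per-barcode read counts
counts <- data.frame(
  intronic  = c(90, 40, 0),
  exonic    = c(10, 60, 0),
  row.names = c("BARCODE_A-1", "BARCODE_B-1", "BARCODE_C-1")
)

# One row per barcode, in the order the barcodes were supplied
nf <- data.frame(
  nuclear_fraction = calc_nuclear_fraction(counts$intronic, counts$exonic),
  row.names = rownames(counts)
)
nf
```

Here the first barcode scores 90 / (90 + 10) = 0.9 and the second 40 / (40 + 60) = 0.4.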

Examples

nf3 <- nuclear_fraction_annotation(
  annotation_path = system.file("extdata/outs/chr1.gff3", package = "DropletQC"),
  bam = system.file("extdata/outs/possorted_genome_bam.bam", package = "DropletQC"),
  barcodes = system.file(
    "extdata/outs/filtered_feature_bc_matrix/barcodes.tsv.gz",
    package = "DropletQC"
  ),
  tiles = 1,
  cores = 1,
  verbose = FALSE
)
head(nf3)
#>                    nuclear_fraction
#> AAAAGTCACTTACTTG-1        0.9032698
#> AAAAGTGGATCTCTAA-1        0.4032761
#> AAAGCAGTTACGAAGA-1        0.3957704
#> AACGACTTCAATATGT-1        0.4004525
#> AACGGCGTCATCTGGA-1        0.8845109
#> AAGCAGGGGTCGCGAA-1        0.3929376