nuclear_fraction_annotation.Rd
This function calculates the nuclear fraction score by parsing
all of the reads in the provided BAM file and determining their overlap with
intron and exon intervals obtained from the provided annotation file. It is
a more flexible function than nuclear_fraction_tags() because it doesn't
rely on the BAM file already containing tags describing whether reads are
aligned to intronic or exonic regions, but it is also slower, because this
information needs to be calculated.
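As a rough illustration of what "determining overlap with intron and exon intervals" involves, here is a minimal base R sketch. It is not the package's implementation: the interval coordinates, the `overlaps_any()` helper and the rule that an intron overlap takes precedence over an exon overlap are all assumptions made for this example.

```r
# Toy intervals on a single chromosome (1-based, inclusive); values are made up
exons   <- data.frame(start = c(100, 300), end = c(150, 350))
introns <- data.frame(start = 151, end = 299)
reads   <- data.frame(start = c(110, 200, 340, 500),
                      end   = c(140, 240, 349, 530))

# Hypothetical helper: TRUE if the read [start, end] overlaps any interval in `iv`
overlaps_any <- function(start, end, iv) {
  any(start <= iv$end & end >= iv$start)
}

in_exon   <- mapply(overlaps_any, reads$start, reads$end,
                    MoreArgs = list(iv = exons))
in_intron <- mapply(overlaps_any, reads$start, reads$end,
                    MoreArgs = list(iv = introns))

# Assumed precedence for this sketch: intron overlap first, then exon
reads$class <- ifelse(in_intron, "intronic",
                      ifelse(in_exon, "exonic", "neither"))
print(reads)
```

Doing this for every read is what makes the annotation-based approach slower than reading pre-computed tags from the BAM file.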
nuclear_fraction_annotation(
  annotation_path,
  annotation_format = "auto",
  bam,
  bam_index = paste0(bam, ".bai"),
  barcodes,
  cell_barcode_tag = "CB",
  cores = future::availableCores() - 1,
  tiles = 1000,
  verbose = TRUE
)
annotation_path | character, should be a character vector pointing to the annotation file |
---|---|
annotation_format | character. Can be one of "auto", "gff3" or "gtf". This is passed to the 'format' argument of GenomicFeatures::makeTxDbFromGFF(). This should generally be left as "auto". |
bam | character, should be a character vector pointing to the BAM file |
bam_index | character, the path to the input bam file index |
barcodes | character, either a vector of barcode names or the path to a file barcodes.tsv.gz containing the cell barcodes. If providing the cell barcodes as a vector, make sure that the format matches the one in the BAM file, e.g. be mindful of any integers appended to the end of the barcode sequence. |
cell_barcode_tag | character, defines the BAM tag which contains the cell barcode, e.g. "CB" |
cores | numeric, parsing of the BAM file can be run in parallel using furrr::future_map() with the requested number of cores. Setting |
tiles | integer, to speed up the processing of the BAM file we can split transcripts up into tiles and process reads in chunks, default=1000. |
verbose | logical, whether or not to print progress |
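The note on barcode format can be checked before calling the function. The sketch below is a base R illustration only: `add_suffix()` is a hypothetical helper, not part of the package, and the "-1" suffix is an assumption based on the barcodes shown in the example output; check what the cell barcode tag in your own BAM file actually contains.

```r
# Hypothetical helper: append a suffix such as "-1" to barcodes that lack one
add_suffix <- function(barcodes, suffix = "-1") {
  has_suffix <- grepl("-[0-9]+$", barcodes)   # already ends in "-<integer>"?
  barcodes[!has_suffix] <- paste0(barcodes[!has_suffix], suffix)
  barcodes
}

# One bare barcode and one that already carries the suffix
bam_style <- add_suffix(c("AAAAGTCACTTACTTG", "AAAAGTGGATCTCTAA-1"))
print(bam_style)
```

The resulting vector could then be passed as the `barcodes` argument.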
data.frame. Returns a data frame containing the nuclear fraction score. This is just the fraction of reads that are intronic:
nuclear fraction = # intronic reads / (# intronic reads + # exonic reads)
The row names of the returned data frame will match the order and name of the supplied barcodes.
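To make the formula concrete, here is a small worked example in base R. The read counts and barcode names are invented for illustration, and returning NA for a barcode with no intronic or exonic reads is an assumption of this sketch, not documented behaviour of the function.

```r
# Hypothetical per-barcode read counts (illustrative values, not package output)
counts <- data.frame(
  intronic  = c(90, 40, 0),
  exonic    = c(10, 60, 0),
  row.names = c("AAAA-1", "CCCC-1", "GGGG-1")
)

# nuclear fraction = intronic / (intronic + exonic)
total <- counts$intronic + counts$exonic
nf <- data.frame(
  # Assumption: barcodes with no counted reads get NA rather than NaN
  nuclear_fraction = ifelse(total > 0, counts$intronic / total, NA_real_),
  row.names = rownames(counts)
)
print(nf)
```

As in the returned data frame, the row names follow the order of the supplied barcodes.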
nf3 <- nuclear_fraction_annotation(
  annotation_path = system.file("extdata/outs/chr1.gff3", package = "DropletQC"),
  bam = system.file("extdata/outs/possorted_genome_bam.bam", package = "DropletQC"),
  barcodes = system.file(
    "extdata/outs/filtered_feature_bc_matrix/barcodes.tsv.gz",
    package = "DropletQC"),
  tiles = 1,
  cores = 1,
  verbose = FALSE)
head(nf3)
#>                    nuclear_fraction
#> AAAAGTCACTTACTTG-1        0.9032698
#> AAAAGTGGATCTCTAA-1        0.4032761
#> AAAGCAGTTACGAAGA-1        0.3957704
#> AACGACTTCAATATGT-1        0.4004525
#> AACGGCGTCATCTGGA-1        0.8845109
#> AAGCAGGGGTCGCGAA-1        0.3929376