nuclear_fraction_tags.Rd
This function uses the region type tags in a provided BAM file to calculate the nuclear fraction statistic for each input cell barcode. The statistic is the fraction of a barcode's intronic and exonic reads that are intronic:
nuclear fraction = # intronic reads / (# intronic reads + # exonic reads)
The row names of the returned data frame will match the order and names of the supplied barcodes. At a minimum, you only need to provide the path to a directory containing Cell Ranger output (outs).
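As a rough illustration of the formula, the sketch below computes the statistic by hand for a single barcode. The tag values are made up for the example; "E" and "N" follow the function's default exon_tag and intron_tag, and "I" is assumed here to mark an intergenic read, which the formula leaves out of both counts.

```r
# Toy region-type tags for the reads of one barcode (made-up values).
# "E" = exonic and "N" = intronic match the function defaults;
# "I" is assumed to mean intergenic and is not counted.
region_tags <- c("E", "N", "N", "E", "N", "I", "N")

n_intronic <- sum(region_tags == "N")  # 4 reads
n_exonic   <- sum(region_tags == "E")  # 2 reads

# nuclear fraction = # intronic / (# intronic + # exonic)
nuclear_fraction <- n_intronic / (n_intronic + n_exonic)
nuclear_fraction
```

This is only a hand calculation of the ratio; it is not how the function reads the BAM file.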
nuclear_fraction_tags(
  outs = NULL,
  bam = NULL,
  bam_index = paste0(bam, ".bai"),
  barcodes = NULL,
  cores = future::availableCores() - 1,
  tiles = 100,
  cell_barcode_tag = "CB",
  region_type_tag = "RE",
  exon_tag = "E",
  intron_tag = "N",
  verbose = TRUE
)
outs | character, the path to the 'outs' directory created by Cell Ranger. We assume outs is structured this way: ├── filtered_feature_bc_matrix │ ├── barcodes.tsv.gz │ ├── features.tsv.gz │ └── matrix.mtx.gz ├── possorted_genome_bam.bam ├── possorted_genome_bam.bam.bai ├── raw_feature_bc_matrix │ ├── barcodes.tsv.gz │ ├── features.tsv.gz │ └── matrix.mtx.gz Note that there will probably be other files in the directory as well. These can be ignored, as the function requires only three files: possorted_genome_bam.bam, possorted_genome_bam.bam.bai and filtered_feature_bc_matrix/barcodes.tsv.gz. This is the only required argument for the function. If your directory structure doesn't match the one created by Cell Ranger, you can provide the file paths directly using the bam, bam_index and barcodes arguments. |
---|---|
bam | character, the path to the input bam file. Not required if an 'outs' directory is provided. |
bam_index | character, the path to the input bam file index. Not required if an 'outs' directory is provided. |
barcodes | character, either a vector of barcode names or the path to the barcodes.tsv.gz file output by Cell Ranger. If providing the cell barcodes as a vector, make sure that the format matches the one in the BAM file - e.g. be mindful if there are integers appended to the end of the barcode sequence. This argument isn't required if an 'outs' directory is provided - the function will just look for "barcodes.tsv.gz" in outs/filtered_feature_bc_matrix. |
cores | numeric, runs the function in parallel using furrr::future_map() with the requested number of cores. Setting |
tiles | numeric, to speed up the processing of the BAM file we can split the genome up into tiles and process reads in chunks |
cell_barcode_tag | character, the BAM tag containing the cell barcode sequence |
region_type_tag | character, the BAM tag containing the region type |
exon_tag | character, the character string that defines a read as exonic |
intron_tag | character, the character string that defines a read as intronic |
verbose | logical, whether or not to print progress |
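
When the directory layout differs from the Cell Ranger one, the bam, bam_index and barcodes arguments described above can be supplied directly. The following is a minimal sketch: the directory and file names are hypothetical, and the "-1" suffix is an assumption about how the barcodes appear in the BAM file, so check it against your own data before relying on it.

```r
# Hypothetical, non-standard layout; replace with your own paths.
data_dir <- "my_project/alignments"
bam_path <- file.path(data_dir, "sample1.bam")
bai_path <- paste0(bam_path, ".bai")  # mirrors the default for bam_index

# Barcodes supplied as a vector. Here we assume the BAM file stores
# barcodes with a "-1" suffix and add it only where it is missing.
raw_barcodes <- c("AAAAGTCACTTACTTG", "AAAAGTGGATCTCTAA-1")
barcodes <- ifelse(grepl("-[0-9]+$", raw_barcodes),
                   raw_barcodes,
                   paste0(raw_barcodes, "-1"))

# Only call the function if the files and the package are available.
if (file.exists(bam_path) && file.exists(bai_path) &&
    requireNamespace("DropletQC", quietly = TRUE)) {
  nf <- DropletQC::nuclear_fraction_tags(
    bam = bam_path,
    bam_index = bai_path,
    barcodes = barcodes,
    cores = 1
  )
}
```

Normalising the suffix up front is one way to avoid a mismatch between the supplied barcodes and those stored in the BAM file.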
data.frame, the function returns a 1-column data frame containing the calculated nuclear fraction statistic for each input barcode. The order and names of the rows will match those of the input cell barcodes.
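
Because the row names are the cell barcodes, the result can be joined to other per-cell data by name. The sketch below rebuilds a small result by hand, using values from the example output on this page, and adds it to a hypothetical metadata table (cell_meta and n_umi are invented for illustration) whose rows are in a different order.

```r
# Stand-in for the returned data frame: one column, barcodes as row names.
nf <- data.frame(
  nuclear_fraction = c(0.9032698, 0.4032761, 0.0000000),
  row.names = c("AAAAGTCACTTACTTG-1", "AAAAGTGGATCTCTAA-1",
                "AAACACGTTCTCATCG-1")
)

# Hypothetical per-cell metadata with the same barcodes in another order.
cell_meta <- data.frame(
  n_umi = c(800, 1200, 5300),
  row.names = c("AAACACGTTCTCATCG-1", "AAAAGTCACTTACTTG-1",
                "AAAAGTGGATCTCTAA-1")
)

# Index by row name so each value lands on the matching barcode.
cell_meta$nuclear_fraction <- nf[rownames(cell_meta), "nuclear_fraction"]
cell_meta
```

Indexing by row name rather than by position keeps the values aligned even when the two tables are ordered differently.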
nf1 <- nuclear_fraction_tags(
  outs = system.file("extdata", "outs", package = "DropletQC"),
  tiles = 1, cores = 1, verbose = FALSE)
head(nf1)
#>                    nuclear_fraction
#> AAAAGTCACTTACTTG-1        0.9032698
#> AAAAGTGGATCTCTAA-1        0.4032761
#> AAAGCAGTTACGAAGA-1        0.3957704
#> AACGACTTCAATATGT-1        0.4004525
#> AACGGCGTCATCTGGA-1        0.8845109
#> AAGCAGGGGTCGCGAA-1        0.3929376

nf2 <- nuclear_fraction_tags(
  bam = system.file("extdata", "outs", "possorted_genome_bam.bam", package = "DropletQC"),
  barcodes = c("AAAAGTCACTTACTTG-1",
               "AAAAGTGGATCTCTAA-1",
               "AAACACGTTCTCATCG-1"),
  tiles = 1, cores = 1, verbose = FALSE)
nf2
#>                    nuclear_fraction
#> AAAAGTCACTTACTTG-1        0.9032698
#> AAAAGTGGATCTCTAA-1        0.4032761
#> AAACACGTTCTCATCG-1        0.0000000