Identifies a suitable nuclear fraction cut-off point to guide the identification of empty droplets. The function calculates a kernel density estimate of the input nuclear fraction scores and locates the trough that follows the first peak. The first peak is assumed to represent the population of empty droplets.
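The trough search described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's own code: find_first_trough() is a hypothetical helper, the default bandwidth of stats::density() is assumed, and the nuclear fraction scores are simulated.

```r
# Hypothetical helper (not part of the package): return the x position of the
# first trough that follows the first peak of a kernel density estimate.
find_first_trough <- function(x, n = 512) {
  d <- stats::density(x, n = n)   # default Gaussian kernel and bandwidth
  slope <- sign(diff(d$y))        # +1 where the curve rises, -1 where it falls
  turn <- diff(slope)             # negative at peaks, positive at troughs
  peaks <- which(turn < 0) + 1
  troughs <- which(turn > 0) + 1
  if (length(peaks) == 0) return(NA_real_)
  after_first_peak <- troughs[troughs > peaks[1]]
  if (length(after_first_peak) == 0) return(NA_real_)
  d$x[after_first_peak[1]]
}

# Simulated scores (assumption): a low-scoring "empty droplet" population
# and a higher-scoring "cell" population.
set.seed(1)
nf <- c(rnorm(300, mean = 0.03, sd = 0.01), rnorm(700, mean = 0.4, sd = 0.1))
cutoff <- find_first_trough(nf)
cutoff
```

On real data the density curve can be noisier than in this simulation, so a production version would likely need some guard against spurious small peaks.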

identify_empty_drops(
  nf_umi,
  nf_rescue = 0.05,
  umi_rescue = 1000,
  include_plot = FALSE,
  plot_name = NULL,
  plot_path = NULL,
  plot_width = 18,
  plot_height = 13,
  pdf_png = "png"
)

Arguments

nf_umi

data frame with two columns: the nuclear fraction estimates in the first column and the total UMI count for each barcode in the second column

nf_rescue

numeric, a rescue parameter defining a minimum nuclear fraction score between zero and one. This is used in combination with umi_rescue to identify cells that were misidentified as empty droplets

umi_rescue

integer, a rescue parameter defining a minimum UMI count. This is used in combination with nf_rescue to identify cells that were misidentified as empty droplets

include_plot

logical, whether or not to produce a plot illustrating how the nuclear fraction threshold was identified and which barcodes have been called as empty droplets. In the plot of nuclear fraction vs log10(UMI counts), empty droplets are expected to occupy the lower left corner of the plot.

plot_name

character, file name for the plot; if provided, the plot is saved under this name

plot_path

character, directory for the plot; if provided, the plot is saved to this path

plot_width

numeric, plot width in cm

plot_height

numeric, plot height in cm

pdf_png

character, either "png" or "pdf"
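How nf_rescue and umi_rescue might interact with the nuclear fraction cut-off can be sketched as follows. This is a guess at the logic based on the argument descriptions above, not the function's actual code: the cut-off value nf_cutoff, the toy data, and the use of strict comparisons are all assumptions made for illustration.

```r
# Toy input (assumption): nuclear fraction first, UMI count second.
nf_umi <- data.frame(
  nuclear_fraction = c(0.01, 0.08, 0.02, 0.30),
  umi_count = c(600, 2500, 5000, 1200)
)

nf_cutoff <- 0.10    # assumed, already-chosen nuclear fraction cut-off
nf_rescue <- 0.05    # default from the usage above
umi_rescue <- 1000   # default from the usage above

# Barcodes below the cut-off are candidate empty droplets.
below_cutoff <- nf_umi$nuclear_fraction < nf_cutoff

# A candidate is rescued only if it exceeds BOTH rescue thresholds.
rescued <- below_cutoff &
  nf_umi$nuclear_fraction > nf_rescue &
  nf_umi$umi_count > umi_rescue

nf_umi$cell_status <- ifelse(below_cutoff & !rescued, "empty_droplet", "cell")
nf_umi
```

In this sketch the second barcode falls below the cut-off but is kept as a cell because it clears both rescue thresholds, while the third has a high UMI count but too low a nuclear fraction to be rescued.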

Value

data frame, the input data frame with an additional column, cell_status, identifying each barcode as a "cell" or "empty_droplet"

Examples

data("qc_examples")
gbm <- qc_examples[qc_examples$sample=="GBM",]
gbm.ed <- gbm[,c("nuclear_fraction_droplet_qc","umi_count")]
gbm.ed <- identify_empty_drops(nf_umi = gbm.ed)
head(gbm.ed)
#>                    nuclear_fraction_droplet_qc umi_count cell_status
#> AAACCCAAGGCGATAC-1                   0.1947243      2226        cell
#> AAACCCAAGGCTGTAG-1                   0.2766798      1063        cell
#> AAACCCACAAGTCCCG-1                   0.1843824     17883        cell
#> AAACCCACAGATGCGA-1                   0.2919902      8172        cell
#> AAACCCACAGGTGAGT-1                   0.3295617      9057        cell
#> AAACCCAGTCTTGCGG-1                   0.3795893      5612        cell
table(gbm.ed$cell_status)
#>
#>          cell empty_droplet
#>          5296           220
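
The counts reported by table() can be turned into proportions to see what share of barcodes was called empty. A minimal sketch, re-entering the two counts from the output above by hand rather than calling the function:

```r
# Counts copied from the table() output above.
counts <- c(cell = 5296, empty_droplet = 220)

# Share of each class among all barcodes in the sample.
prop <- counts / sum(counts)
round(100 * prop, 1)
```

With the returned data frame at hand, prop.table(table(gbm.ed$cell_status)) would give the same proportions directly.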