API

Pre-processing

panopticon.preprocessing.generate_antibody_prediction(loom, raw_antibody_counts_df=None, antibodies=None, pseudocount=1, overwrite=False, only_generate_zscore=False, group_ca=None, hashtags=None, enforce_unimodality_of_positive_component=False, maximum_recursion_to_enforce_unimodality=4, enforce_positive_minimum_greater_than_negative_maximum=False, verbose=False)

This approach takes some inspiration from the dsb approach: https://doi.org/10.1101/2020.02.24.963603. However, there is no use of isotypes, so it amounts only to “step 1” of that procedure. This routine can also take into account the distribution of antibody/feature barcode counts from empty droplets using the optional argument raw_antibody_counts_df. This routine uses a 2-component Gaussian-mixture model on z-scored log1p antibody counts (with or without background correction) to predict whether a given cell is “positive” or “negative” for the feature barcode in question.

Parameters

loom (LoomConnection) – The LoomConnection object on which to make antibody predictions.

raw_antibody_counts_df (pandas.DataFrame object, or NoneType) – Antibody/feature barcode counts from empty droplets, used for background correction. If None, no background correction is applied. (Default value = None)

antibodies (str, or iterable of strings) – Iterable of strings, each of which should be a column attribute of loom holding the raw counts of a particular feature barcode. (Default value = None)

pseudocount (int) – Indicates the pseudocount of the feature barcode to use when taking the log(count+pseudocount) (Default value = 1)

overwrite (bool) – If True, will overwrite an existing prediction. If False, will raise an Exception if a prediction has previously been generated. (Default value = False)

only_generate_zscore (bool) – If True, will only generate log1pseudo(counts) and z-score thereof, but will not compute a Gaussian mixture model nor make subsequent prediction. (Default value = False)

Return type

None (all outputs are written as column attributes to LoomConnection loom)
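
As a rough illustration of the approach described above (not panopticon's implementation), the sketch below z-scores log(count + pseudocount) values and fits a two-component, one-dimensional Gaussian mixture by expectation-maximization in plain Python. The helper names zscore_log_counts, normal_pdf and fit_two_gaussians, the toy counts, the fixed iteration count, and the 0.5 posterior cutoff are all assumptions made for this example; no background correction is applied.

```python
import math
import statistics


def zscore_log_counts(counts, pseudocount=1):
    """Z-score log(count + pseudocount) across cells."""
    logged = [math.log(c + pseudocount) for c in counts]
    mean = statistics.fmean(logged)
    sd = statistics.pstdev(logged)
    return [(v - mean) / sd for v in logged]


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and s.d. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def fit_two_gaussians(values, n_iter=100, min_sigma=1e-3):
    """EM for a 2-component 1-D Gaussian mixture (hypothetical helper).

    Returns (means, sigmas, weights, resp), where resp[i] is the posterior
    probability that values[i] belongs to the component initialized at the
    maximum value (treated here as the "positive" component).
    """
    means = [min(values), max(values)]  # start the components at the extremes
    sigmas = [statistics.pstdev(values)] * 2
    weights = [0.5, 0.5]
    resp = [0.5] * len(values)
    for _ in range(n_iter):
        # E-step: posterior probability of the "positive" component
        resp = []
        for v in values:
            p_neg = weights[0] * normal_pdf(v, means[0], sigmas[0])
            p_pos = weights[1] * normal_pdf(v, means[1], sigmas[1])
            resp.append(p_pos / (p_neg + p_pos))
        # M-step: update weight, mean and s.d. of each component
        for k in (0, 1):
            r = resp if k == 1 else [1.0 - x for x in resp]
            total = sum(r)
            weights[k] = total / len(values)
            means[k] = sum(ri * v for ri, v in zip(r, values)) / total
            var = sum(ri * (v - means[k]) ** 2 for ri, v in zip(r, values)) / total
            sigmas[k] = max(math.sqrt(var), min_sigma)  # guard against collapse
    return means, sigmas, weights, resp


# Toy counts (invented): eight low-count cells followed by four high-count cells.
counts = [0, 1, 2, 1, 0, 3, 2, 1, 150, 200, 180, 220]
zscores = zscore_log_counts(counts)
means, sigmas, weights, resp = fit_two_gaussians(zscores)
positive = [r > 0.5 for r in resp]
```

In this toy case the two populations are well separated, so the last four cells are called positive. Real data with overlapping populations would need more care with initialization and convergence.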

panopticon.preprocessing.generate_cell_and_gene_quality_metrics(loom, layername, gene_ra='gene', ribosomal_qc=False, mitochondrial_qc=False, ribosomal_gene_mask=None, mitochondrial_gene_mask=None, verbose=False)

Calculates multiple QC-related quantities and writes them to the LoomConnection instance specified in loom:

nCountsForCell: absolute number of non-normalized counts for a given cell (across all genes)

nCountsForGene: absolute number of non-normalized counts for a given gene (across all cells)

RibosomalRelativeMeanAbsolutionDeviation: this is madrp / meanrp, where madrp is the mean absolute deviation of ribosomal protein genes (RP*) and meanrp is the mean expression of ribosomal protein genes (RP*) (means, mads computed on a cell-by-cell basis).

RibosomalMaxOverMean: this is maxrp / meanrp, where maxrp is the maximum RP* expression over all RP* genes; meanrp is as above (max, mean on a cell-by-cell basis).

MitochondrialMean: this is the mean expression of genes from the mitochondrial genome.

MitochondrialRelativeMeanAbsoluteDeviation: this is madmt / meanmt, where madmt is the mean absolute deviation of mitochondrial genes (MT*) and meanmt is the mean expression of mitochondrial protein genes (MT*) (means, mads computed on a cell-by-cell basis).

AllGeneRelativeMeanAbsoluteDeviation: this is madall / meanall, where madall is the MAD over all genes, and meanall is the mean over all genes (mad, mean on a cell-by-cell basis).

nGene: number of unique genes expressed (>0) on a cell-by-cell basis (also known as complexity).

nCell: number of unique cells expressing a gene (>0), on a gene-by-gene basis.

Parameters

loom (LoomConnection) – The LoomConnection instance upon which cell and gene quality metrics will be annotated.

layername (str) – The name of the layer upon which cell and gene quality metrics will be calculated.

gene_ra (str) – The row attribute used for genes (recommended to use the HUGO names for genes, as this is used by default to determine which genes are mitochondrial or ribosomal) (Default value = ‘gene’)

ribosomal_qc (bool) – If True, will compute ribosomal-based QC metrics (RibosomalRelativeMeanAbsolutionDeviation, RibosomalMaxOverMean) (Default value = False)

mitochondrial_qc (bool) – If True, will compute mitochondrial-based QC metrics (MitochondrialMean, MitochondrialRelativeMeanAbsoluteDeviation) (Default value = False)

ribosomal_gene_mask (boolean mask with length equal to loom.shape[0]) – Specifies the indices (via Boolean mask) of the rows corresponding to ribosomal genes. If None, will generate mask from all genes whose name starts with ‘RP’ or ‘Rp’ (Default value = None)

mitochondrial_gene_mask (boolean mask with length equal to loom.shape[0]) – Specifies the indices (via Boolean mask) of the rows corresponding to mitochondrial genes. If None, will generate mask from all genes whose name starts with ‘mt.’ (Default value = None)

verbose (bool) – If True, will print notification whenever a QC calculation is completed. (Default value = False)

Return type

Calculated qualities are written to LoomConnection loom. Returns NoneType object.
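
To make the definitions above concrete, here is a minimal sketch on a tiny genes-by-cells count matrix held in plain Python lists. The function names qc_metrics and relative_mean_absolute_deviation, the toy matrix, and the choice to take the mean absolute deviation about the mean are assumptions for illustration; panopticon itself operates on a LoomConnection layer.

```python
def qc_metrics(matrix):
    """Basic QC counts for a genes x cells matrix (list of gene rows)."""
    n_genes, n_cells = len(matrix), len(matrix[0])
    # Transpose so that each entry holds all gene counts for one cell
    cells = [[matrix[g][c] for g in range(n_genes)] for c in range(n_cells)]
    return {
        "nCountsForCell": [sum(col) for col in cells],
        "nCountsForGene": [sum(row) for row in matrix],
        "nGene": [sum(v > 0 for v in col) for col in cells],
        "nCell": [sum(v > 0 for v in row) for row in matrix],
    }


def relative_mean_absolute_deviation(values):
    """Mean absolute deviation about the mean, divided by the mean.

    Undefined (division by zero) when the mean is zero.
    """
    mean = sum(values) / len(values)
    mad = sum(abs(v - mean) for v in values) / len(values)
    return mad / mean


# Toy matrix (invented): 3 genes (rows) x 3 cells (columns)
toy = [
    [5, 0, 2],
    [0, 0, 1],
    [3, 4, 0],
]
metrics = qc_metrics(toy)
first_cell_rmad = relative_mean_absolute_deviation([row[0] for row in toy])
```

Passing only the rows selected by a ribosomal or mitochondrial gene mask to relative_mean_absolute_deviation would give the analogue of the per-cell ribosomal or mitochondrial ratios described above.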

panopticon.preprocessing.generate_count_normalization(loom, raw_count_layername, output_layername, denominator=10000, out_of_core_cell_threshold=20000, batch_size=512)

Generates a new layer with log2(transcripts per denominator). By default this will produce a log2(TP10k+1) layer; adjusting the denominator will permit e.g. a log2(TP100k+1) or log2(TPM+1), etc.

Parameters

loom – The LoomConnection instance upon which a count normalized layer will be computed.

raw_count_layername – The layername corresponding to raw counts, or normalized log counts (counts of TPM).

output_layername – The desired output layername.

denominator – The denominator for transcript count normalization (e.g. transcripts per denominator). (Default value = 10000)
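
The transformation can be sketched for a single cell as follows. This assumes that "transcripts per denominator" means each count is divided by the cell's total count and multiplied by denominator before taking log2(x + 1); the function name log2_transcripts_per and the toy counts are hypothetical, and the out-of-core batching used for large files is not shown.

```python
import math


def log2_transcripts_per(counts_for_cell, denominator=10000):
    """Return log2(count / total * denominator + 1) for one cell's counts."""
    total = sum(counts_for_cell)
    return [math.log2(c / total * denominator + 1) for c in counts_for_cell]


# Toy cell (invented) with 8 total counts; denominator=8 keeps the numbers readable
toy_cell = [1, 3, 0, 4]
small = log2_transcripts_per(toy_cell, denominator=8)
tp10k = log2_transcripts_per(toy_cell)  # default: log2(TP10k + 1)
```

With denominator=1000000 the same function would give a log2(TPM+1)-style value under the same assumption.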

panopticon.preprocessing.generate_gene_variances(loom, layername)

Computes the variance (across cells) of genes, and assigns this to the row attribute GeneVar

Parameters

loom (LoomConnection) – The LoomConnection instance upon which gene variance will be calculated

layername (str) – The name of the layer of the LoomConnection upon which the gene variance will be calculated.

Return type

Writes a new variable GeneVar as a row attribute to LoomConnection loom. Returns NoneType object.

panopticon.preprocessing.generate_guide_rna_prediction(loom, guide_rnas, nguide_ca='nGuide', nguide_reads_ca='nGuideReads', cell_prediction_summary_ca='CellGuidePrediction', overwrite=False, only_generate_log2=False, ncell_threshold_for_guide=10, nguide_threshold_for_cell=10)

This approach is inspired by Replogle et al. 2020 (https://doi.org/10.1038/s41587-020-0470-y). However, instead of a Gaussian/Poisson mixture, this routine uses a Poisson/Poisson mixture. This routine uses the pomegranate package (https://github.com/jmschrei/pomegranate).

Parameters

loom (LoomConnection) – A LoomConnection object upon which guide rna predictions will be made

guide_rnas (iterable of strings) – A list or other iterable of strings, each corresponding to a column attribute of loom indicating the raw counts of a given guide RNA over cells

nguide_ca (str) – QC metric, indicating the name of the column attribute to use to indicate the number of predicted guide RNAs for a cell (Default value = ‘nGuide’)

nguide_reads_ca (str) – QC metric, indicating the name of the column attribute to use to indicate the total number of guide RNA reads for a cell (Default value = ‘nGuideReads’)

cell_prediction_summary_ca (str) – Indicates the name of the column attribute to use to indicate a summary of positively-predicted guide RNAs for a cell (Default value = ‘CellGuidePrediction’)

overwrite (bool) – If False, will raise exception if requested column attributes have already been written. If True, will overwrite existing column attributes. (Default value = False)

only_generate_log2 (bool) – If True, will generate log2 guide RNA counts, but will not apply any mixture model prediction. (Default value = False)

ncell_threshold_for_guide (int) – Threshold for the number of cells wherein guide should have nonzero counts for mixture model to attempt prediction. (Default value = 10)

nguide_threshold_for_cell (int) – Threshold for the number of guides to be detected in a given cell to attempt to make a prediction for that particular cell. (Default value = 10)

panopticon.preprocessing.generate_standardized_layer(loom, layername, variance_axis='cell', batch_size=512, out_of_core_cell_threshold=20000)

Parameters

loom – The LoomConnection instance upon which the standardized layer will be added.

layername – The name of the layer which will be standardized.

variance_axis – The axis over which the standardization will proceed (i.e. over cells or over genes). (Default value = ‘cell’)

batch_size – (Default value = 512)

out_of_core_cell_threshold – (Default value = 20000)

panopticon.preprocessing.get_cluster_specific_greater_than_cutoff_mask(loom, metric, cluster_level, default_cutoff, exception_dict={})

Parameters

loom –

metric –

cluster_level –

default_cutoff –

exception_dict – (Default value = {})

panopticon.preprocessing.get_clustering_based_outlier_prediction(loom, max_cluster_fraction_break_threshold=0.99, cluster_proportion_minimum=0.01)

Parameters

loom –

max_cluster_fraction_break_threshold – (Default value = 0.99)

cluster_proportion_minimum – (Default value = 0.01)

Analysis

panopticon.analysis.conditional_simpson(x, x_conditional, x_total, with_replacement=False)

For computing simpson index directly from counts (or frequencies, if with_replacement=True), where the first selected element is conditional on some feature

Parameters

x –

with_replacement – (Default value = False)

x_conditional –

x_total –

panopticon.analysis.generate_clustering(loom, layername, starting_clustering_depth=0, n_clustering_iterations=3, max_clusters='cbrt_rule', mode='pca', n_components=50, silhouette_threshold=0.1, clusteringcachedir='clusteringcachedir/', out_of_core_batch_size=1024, min_subclustering_size=100, first_round_leiden=False, optimized_leiden=True, leiden_nneighbors=100, leiden_iterations=10, incremental_pca_threshold=10000, show_dendrogram=False, linkage='average', verbose=False, minimum_second_to_first_cluster_ratio=0.001)

Parameters

loom (LoomConnection object) –

final_clustering_depth – The clustering iteration on which to terminate; final_clustering_depth=3 will assign values through column attribute ClusteringIteration3. (Default value = 3)

starting_clustering_depth – The clustering iteration on which to begin; starting_clustering_depth=0 will assign values to column attribute ClusteringIteration0. (Default value = 0)

max_clusters – (Default value = ‘cbrt_rule’)

layername –

mode – (Default value = ‘pca’)

silhouette_threshold – (Default value = 0.1)

clusteringcachedir – (Default value = ‘clusteringcachedir/’)

n_components – (Default value = 50)

out_of_core_batch_size – (Default value = 1024)

n_clustering_iterations – (Default value = 3)

min_subclustering_size – (Default value = 100)

first_round_leiden – (Default value = False)

leiden_nneighbors – (Default value = 100)

leiden_iterations – (Default value = 10)

incremental_pca_threshold – (Default value = 10000)

panopticon.analysis.generate_diffusion_coordinates(loom, layername, sigma, n_coordinates=10, verbose=False, metric='euclidean')

Parameters

loom –

layername –

sigma –

n_coordinates – (Default value = 10)

verbose – (Default value = False)

metric – (Default value = ‘euclidean’)

panopticon.analysis.generate_embedding(loom, layername, min_dist=0.0001, n_neighbors=30, n_epochs=1000, metric='correlation', random_state=None, n_pca_components=None, mode='pca')

Parameters

loom (LoomConnection object) –

pca_type – (Default value = ‘log_tpm’)

layername –

min_dist – (Default value = 0.0001)

n_neighbors – (Default value = 30)

n_epochs – (Default value = 1000)

metric – (Default value = ‘correlation’)

random_state – (Default value = None)

pca_cols_to_use – (Default value = None)

components_to_use – (Default value = None)

mode – (Default value = ‘pca’)

n_pca_components – (Default value = None)

panopticon.analysis.generate_incremental_pca(loom, layername, batch_size=512, n_components=50, min_size_for_incrementalization=5000)

Computes a principal component analysis (PCA) over a layer of interest. Defaults to incremental PCA (using IncrementalPCA from sklearn.decomposition) but will switch to conventional PCA for LoomConnections with cell numbers below min_size_for_incrementalization. Will write the n_components principal components as row attributes:

(layer) PC (PC number, 1-indexed)

The following are written as attributes:

NumberPrincipalComponents_(layername): this is simply n_components.

PCExplainedVariancedRatio_(layername): this is explained_variance_ratio_ from the PCA model.

Will also run panopticon.analysis.generate_pca_loadings.

Parameters

loom – The LoomConnection instance upon which PCA will be calculated.

layername – The layer of the loom file over which the PCs will be computed.

batch_size – (Default value = 512)

n_components – (Default value = 50)

min_size_for_incrementalization – (Default value = 5000)

panopticon.analysis.generate_malignancy_score(loom, layername, cell_sort_key='CellSort', patient_id_key='patient_ID', malignant_sort_label='45neg', cell_name_key='cellname')

For calculating malignancy scores for cells based on inferred CNV. This subroutine isn’t terribly future proof. S Markson 6 June 2020.

Parameters

loom (LoomConnection object) –

layername –

cell_sort_key – (Default value = ‘CellSort’)

patient_id_key – (Default value = ‘patient_ID’)

malignant_sort_label – (Default value = ‘45neg’)

cell_name_key – (Default value = ‘cellname’)

panopticon.analysis.generate_masked_module_score(loom, layername, cellmask, genelist, ca_name, nbins=10, ncontrol=5, gene_ra='gene')

Parameters

loom – Name of loom object of interest.

layername – Layername on which the module score will be calculated.

cellmask – Mask over cells over which the score will be calculated (None for all cells).

genelist – List of gene names in signature.

ca_name – Desired name of signature to be made into a column attribute.

nbins – (Default value = 10)

ncontrol – (Default value = 5)

gene_ra – (Default value = ‘gene’)

panopticon.analysis.generate_nmf_and_loadings(loom, layername, nvargenes=2000, n_components=100, verbose=False)

Parameters

loom (LoomConnection object) –

layername –

nvargenes – (Default value = 2000)

n_components – (Default value = 100)

verbose – (Default value = False)

panopticon.analysis.generate_pca_loadings(loom, layername, dosparse=False, batch_size=1024)

Parameters

loom (LoomConnection object) –

layername –

dosparse – (Default value = False)

batch_size – (Default value = 1024)

panopticon.analysis.get_cluster_differential_expression(loom, layername, cluster_level=None, ident1=None, ident2=None, mask1=None, mask2=None, verbose=False, ident1_downsample_size=None, ident2_downsample_size=None, min_cluster_size=0, gene_alternate_name=None)

Parameters

loom (LoomConnection object) –

cluster_level – (Default value = None)

layername –

ident1 – (Default value = None)

ident2 – (Default value = None)

verbose – (Default value = False)

ident1_downsample_size – (Default value = None)

ident2_downsample_size – (Default value = None)

mask1 – (Default value = None)

mask2 – (Default value = None)

min_cluster_size – (Default value = 0)

gene_alternate_name – (Default value = None)

panopticon.analysis.get_cluster_embedding(loom, layername, cluster, min_dist=0.01, n_neighbors=None, verbose=False, mask=None, genemask=None, n_components_pca=50)

Parameters

loom (LoomConnection object) –

layername –

cluster –

min_dist – (Default value = 0.01)

n_neighbors – (Default value = None)

verbose – (Default value = False)

mask – (Default value = None)

genemask – (Default value = None)

n_components_pca – (Default value = 50)

panopticon.analysis.get_cluster_enrichment_dataframes(x, y, data, weights=None)

Parameters

x –

y –

data –

panopticon.analysis.get_cluster_markers(loom, layername, cluster_level)

Parameters

loom (LoomConnection object) –

layername –

cluster_level –

panopticon.analysis.get_cosine_self_similarity(loom, layername, cluster, self_mean=None)

Parameters

loom (LoomConnection object) –

layername –

cluster –

self_mean – (Default value = None)

panopticon.analysis.get_dictionary_of_cluster_means(loom, layername, clustering_level)

Parameters

loom (LoomConnection object) –

layername –

clustering_level –

panopticon.analysis.get_differential_expression_custom(X1, X2, genes, axis=0)

Parameters

X1 –

X2 –

genes –

axis – (Default value = 0)

panopticon.analysis.get_differential_expression_dict(loom, layername, output=None, downsample_size=500, starting_iteration=0, final_iteration=3, min_cluster_size=50, gene_alternate_name=None, verbose=True)

Runs get_cluster_differential_expression over multiple clustering iterations (from ClusteringIteration(x) to ClusteringIteration(y), inclusive, where x = starting_iteration and y = final_iteration), where ident1 is a cluster and ident2 is the set of all other clusters which differ only in the terminal iteration (e.g. if there are clusters 0-0, 0-1, 0-2, 1-0, and 1-1, differential expression will compare 0-0 with 0-1 and 0-2, 0-1 with 0-0 and 0-2, etc.). Outputs a dictionary of these differential expression results, with key equal to ident1.

Parameters

loom (LoomConnection object) –

layername – Layer key of loom, over which differential expression will be computed.

output – Optional filename to which a .pkl object will be written with dictionary output, or an xlsx, with each key assigned to a separate sheet. (Default value = None)

downsample_size – Number of cells from each cluster to downsample to prior to running differential expression. (Default value = 500)

starting_iteration – If 0, will start with ClusteringIteration0, for example. (Default value = 0)

final_iteration – If 3, will continue to ClusteringIteration3, for example. (Default value = 3)

min_cluster_size – Minimum size of clusters to consider (if one of the clusters is below this threshold, will output nan instead of a differential expression dataframe for that particular key). (Default value = 50)

gene_alternate_name – (Default value = None)

verbose – (Default value = True)
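
The choice of ident2 described above (all other clusters that differ only in the terminal iteration) can be sketched as follows. This assumes cluster labels are hyphen-separated strings such as 0-1, as in the example; split_label and sibling_clusters are hypothetical helper names, not part of panopticon.

```python
def split_label(label):
    """Split '0-1-2' into its parent path ('0', '1') and terminal part '2'."""
    parts = label.split("-")
    return tuple(parts[:-1]), parts[-1]


def sibling_clusters(cluster, all_clusters):
    """Other clusters sharing the same parent path as `cluster`."""
    parent, _ = split_label(cluster)
    return sorted(
        c for c in set(all_clusters)
        if c != cluster and split_label(c)[0] == parent
    )


clusters = ["0-0", "0-1", "0-2", "1-0", "1-1"]
# ident1 -> list of clusters that would together make up ident2
comparisons = {c: sibling_clusters(c, clusters) for c in clusters}
```

For top-level labels without a hyphen, the parent path is empty, so every other top-level cluster counts as a sibling.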

panopticon.analysis.get_differential_expression_over_continuum(loom, layer, mask, covariate, method='spearman', gene_alternate_name=None)

Parameters

loom (LoomConnection object) –

layer –

mask –

covariate –

method – (Default value = ‘spearman’)

panopticon.analysis.get_enrichment_score(genes, geneset, scores=None, presorted=False, return_es_curve=False, return_pvalue=False, n_pvalue_permutations=1000)

Returns an enrichment score (ES) in the manner of Subramanian et al. 2005 (https://doi.org/10.1073/pnas.0506580102).

Parameters

genes –

geneset –

scores – (Default value = None)

presorted – (Default value = False)

return_es_curve – (Default value = False)

return_pvalue – (Default value = False)

n_pvalue_permutations – (Default value = 1000)
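
For orientation, the running-sum statistic of Subramanian et al. can be sketched as below. This is a simplified illustration, not panopticon's code: it assumes the gene list is already ranked, uses an unweighted step when scores is None and an absolute-score weight (exponent 1) otherwise, and omits the permutation p-value. The function name enrichment_score is hypothetical.

```python
def enrichment_score(ranked_genes, geneset, scores=None):
    """Return (ES, running-sum curve) for a ranked gene list.

    Assumes at least one gene inside and one gene outside the gene set,
    and a nonzero total weight over the genes in the set.
    """
    geneset = set(geneset)
    hits = [g in geneset for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    if scores is None:
        weights = [1.0] * len(ranked_genes)  # unweighted, KS-like statistic
    else:
        weights = [abs(s) for s in scores]
    hit_norm = sum(w for w, h in zip(weights, hits) if h)
    running, curve = 0.0, []
    for w, h in zip(weights, hits):
        # Step up at gene-set members, step down otherwise
        running += w / hit_norm if h else -1.0 / n_miss
        curve.append(running)
    es = max(curve, key=abs)  # maximum deviation from zero, sign retained
    return es, curve


ranked = list("ABCDEF")
es_top, curve_top = enrichment_score(ranked, {"A", "B"})
es_bottom, curve_bottom = enrichment_score(ranked, {"E", "F"})
```

A gene set concentrated at the top of the ranking yields a positive score and one concentrated at the bottom yields a negative score.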

panopticon.analysis.get_metafield_breakdown(loom, cluster, field, complexity_cutoff=0, mask=None)

Parameters

loom (LoomConnection object) –

cluster –

field –

complexity_cutoff – (Default value = 0)

mask – (Default value = None)

panopticon.analysis.get_module_score_from_matrix(X, signature_mask, nbins=10, ncontrol=5)

generates a module score (a la Seurat’s AddModuleScore, see Tirosh 2016) on a matrix, with a mask. I don’t call this directly (S Markson 3 June 2020).

Parameters

X (matrix) –

signature_mask – Indices corresponding to signature.

nbins – Number of quantile bins to use. (Default value = 10)

ncontrol – Number of genes in each matched quantile. (Default value = 5)

layername –

cellmask –

panopticon.analysis.get_module_score_matrix(loom, layername, cellmask, signature_mask, nbins=10, ncontrol=5)

generates a module score (a la Seurat’s AddModuleScore, see Tirosh 2016) on a matrix, with a mask. I don’t call this directly (S Markson 3 June 2020).

Parameters

alldata (matrix) –

signature_mask – Indices corresponding to signature.

nbins – Number of quantile bins to use. (Default value = 10)

ncontrol – Number of genes in each matched quantile. (Default value = 5)

loom –

layername –

cellmask –

panopticon.analysis.get_patient_averaged_table(loom, patient_key='patient_ID', column_attributes=[], n_cell_cutoff=0)

Parameters

loom (LoomConnection object) –

patient_key – (Default value = ‘patient_ID’)

column_attributes – (Default value = [])

n_cell_cutoff – (Default value = 0)

panopticon.analysis.get_pca_loadings_matrix(loom, layername, n_components=None)

Parameters

loom (LoomConnection object) –

layername – Corresponding layer from which to retrieve PCA loadings matrix.

components_to_use – (Default value = None)

n_components – (Default value = None)

panopticon.analysis.get_subclustering(X, score_threshold, max_clusters=50, min_input_size=10, silhouette_threshold=0.2, regularization_factor=0.01, clusteringcachedir='clusteringcachedir/', show_dendrogram=False, linkage='average', silhouette_score_sample_size=None, verbose=False, minimum_second_to_first_cluster_ratio=0.001)

Parameters

embedding –

score_threshold –

max_clusters – (Default value = 50)

X –

min_input_size – (Default value = 10)

silhouette_threshold – (Default value = 0.2)

regularization_factor – (Default value = 0.01)

clusteringcachedir – (Default value = ‘clusteringcachedir/’)

panopticon.analysis.hutcheson_t(x, y)

Parameters

x –

y –

panopticon.analysis.scrna2tracer_mapping(scrna_cellnames, tracer_cellnames)

Parameters

scrna_cellnames –

tracer_cellnames –

panopticon.analysis.simpson(x, with_replacement=False)

For computing simpson index directly from counts (or frequencies, if with_replacement=True)

Parameters

x –

with_replacement – (Default value = False)
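
As a point of reference, the two textbook forms of the Simpson index can be sketched as below: the probability that two draws belong to the same category, either without replacement (from integer counts) or with replacement (from counts or frequencies). The name simpson_index is hypothetical, and this is assumed, not verified, to match what panopticon.analysis.simpson computes.

```python
def simpson_index(x, with_replacement=False):
    """Probability that two draws fall in the same category."""
    total = sum(x)
    if with_replacement:
        # Works for counts or frequencies: sum of squared proportions
        return sum((n / total) ** 2 for n in x)
    # Requires integer counts with total > 1
    return sum(n * (n - 1) for n in x) / (total * (total - 1))


without = simpson_index([2, 2])                      # 4 / 12
with_repl = simpson_index([2, 2], with_replacement=True)  # 0.25 + 0.25
single = simpson_index([4])                          # one category only
```

The two forms converge as the total count grows; they differ most for small samples.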

Visualization

panopticon.visualization.cluster_differential_expression_heatmap(loom, layer, clusteringlevel, diffex={}, output=None, min_cluster_size=2, figsize=(5, 5), cbar_label=None, n_top_genes=10, gene_sort_criterion='CommonLanguageEffectSize', vmin=None, vmax=None, cmap='coolwarm', average_over_clusters=True, return_fig_ax=False, custom_gene_list=None, gene_ra='gene', rotate=False, cluster_blacklist=[])

Generates a heatmap, with expression of marker genes displayed in heatmap form. Can also be used with hand-picked genes using the custom_gene_list argument, as well as with custom labels by setting clusteringlevel to be a column attribute representing the clusters of interest. When using a custom set of genes, this command will automatically cluster those genes using the seaborn clustermap command.

Parameters

loom (LoomConnection) – LoomConnection object for which the differential expression over clusters will be computed.

layer (str) – Specified loom layer to be used as values for heatmap.

clusteringlevel (int or str) – Specifies column attribute of loom to be used as clusters. Typical values might be ClusteringIteration0. However, any column attribute can be used if the diffex dictionary object is pre-computed.

diffex (dict) – Specifies pre-computed output of get_cluster_differential_expression. Differential expression for clusters that don’t exist as keys in diffex will be computed on-the-fly. (Default value = {})

output (NoneType or str) – If str, then will write heatmap to file with filename output. (Default value = None)

min_cluster_size (int) – Minimum number of cells in cluster for cluster to be included in plot. (Default value = 2)

figsize (tuple) – Size of figure, in inches. (Default value = (5, 5))

cbar_label (str) – Specifies the label of colorbar axis. (Default value = None)

n_top_genes (int) – Specifies the number of top marker genes for each cluster to be displayed in heatmap. (Default value = 10)

gene_sort_criterion (str) – (Default value = ‘CommonLanguageEffectSize’)

vmin (float) – Sets the minimum value for heatmap/clustermap. None indicates no minimum value. (Default value = None)

vmax (float) – Sets the maximum value for heatmap/clustermap. None indicates no maximum value. (Default value = None)

cmap (str) – Sets the colormap to be used for heatmap/clustermap. Must be a valid matplotlib colormap. (Default value = ‘coolwarm’)

average_over_clusters (bool) – If set to True, will average expression over all cells of a given cluster. If False, will group clusters together, without averaging. (Default value = True)

return_fig_ax (bool) – If set to True, will return a tuple (fig, ax) for the figure and axis respectively with the heatmap of interest. If a seaborn clustermap was generated, will instead return the full seaborn.clustermap output, from which figure and axis objects can be accessed. (Default value = False)

custom_gene_list (list, ndarray or NoneType) – If not None, specifies a custom gene list to use for heatmap/clustermap. (Default value = None)

gene_ra (str) – Specifies the row attribute of LoomConnection loom indicating the gene name. (Default value = ‘gene’)

panopticon.visualization.cluster_enrichment_heatmap(x, y, data, show=True, output=None, fig=None, cax=None, ax=None, side_annotation=True, heatmap_shading_key='FractionOfCluster', annotation_key='Counts', annotation_fmt='.5g', figsize=(5, 5), weights=None)

Produces a heatmap indicating the fraction of cell clusters across groups. For example, if there are m experimental groups and n clusters of cells, will produce a heatmap with n rows and m columns. heatmap_shading_key can be any field of the named tuple output of panopticon.analysis.get_cluster_enrichment_dataframes. These include:

If heatmap_shading_key = “FractionOfCluster”, heatmap color will be row-normalized; that is, it will indicate the fraction of cells in a cluster that are in groups.

If heatmap_shading_key = “FractionOfGroup”, heatmap color will be column-normalized; that is, it will indicate the fraction of cells in a group that are in a given cluster.

If heatmap_shading_key = “Counts”, heatmap color will depict raw counts.

If heatmap_shading_key = “PhiCoefficient”, heatmap color will depict phi-coefficients (described below).

If heatmap_shading_key = “FishersExactP”, heatmap color will depict Fisher’s exact test p-values (described below).

P-values and phi-coefficients are computed by constructing the contingency matrices as follows:

a    b
c    d

where a represents counts (not normalized) in cluster and in group, b counts not in cluster but in group, c counts in cluster but not in group, and d counts neither in group nor in cluster. This is most intuitive for two groups, but can be computed in all cases (margins of the contingency matrix will be unchanged). P-values are computed via scipy.stats.fisher_exact, and effect sizes by phi coefficient (panopticon.utilities.phi_coefficient).

If there are only two groups, side annotation can also be used in order to display counts, normalized counts, Fisher’s exact p-values and phi-coefficients all on one plot (see https://doi.org/10.1101/2021.08.25.456956, Fig. 4c, f and Fig. 5c). This will only work in the two-group case, however.
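
The contingency counts and the phi coefficient for one (group, cluster) pair can be sketched as below. The layout assumed here is a = in group and in cluster, b = in group but not in cluster, c = in cluster but not in group, d = in neither; the helper names contingency_counts and phi, and the toy labels, are hypothetical. The standard 2x2 formula is used and is not taken from panopticon.utilities.phi_coefficient.

```python
import math


def contingency_counts(groups, clusters, group, cluster):
    """Count cells into the 2x2 table (a, b, c, d) for one group/cluster pair."""
    a = b = c = d = 0
    for g, k in zip(groups, clusters):
        if g == group and k == cluster:
            a += 1
        elif g == group:
            b += 1
        elif k == cluster:
            c += 1
        else:
            d += 1
    return a, b, c, d


def phi(a, b, c, d):
    """Phi coefficient of a 2x2 table; undefined if any margin is zero."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom


# Toy data (invented): one entry per cell
groups = ["ctrl", "ctrl", "ctrl", "trt", "trt", "trt"]
clusters = ["0", "0", "1", "1", "1", "1"]
table = contingency_counts(groups, clusters, group="ctrl", cluster="0")
effect = phi(*table)
```

The same four counts, arranged as [[a, b], [c, d]], are the form of input that a Fisher's exact test on a 2x2 table expects.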

Parameters

x (str) – column of data indicating the group (e.g. experimental group)

y (str) – column of data indicating the cell cluster

data (pandas.DataFrame) – pandas.DataFrame object with the columns named by x and y

show (bool) – (Default value = True)

output (NoneType or str) – (Default value = None)

fig (matplotlib.figure.Figure or None) – (Default value = None)

cax (matplotlib.axes._subplots.AxesSubplot or None) – (Default value = None)

ax (matplotlib.axes._subplots.AxesSubplot) – (Default value = None)

side_annotation (bool) – (Default value = True)

heatmap_shading_key (str) – (Default value = ‘FractionOfCluster’)

annotation_key (str) – (Default value = ‘Counts’)

annotation_fmt (str) – (Default value = ‘.5g’)

Return type

None.

See also

panopticon.analysis.get_cluster_enrichment_dataframes
routine for generating dataframes used in this visualization

scipy.stats.fisher_exact
Fisher’s exact test

panopticon.utilities.phi_coefficient
phi coefficient

panopticon.visualization.data_to_grid_kde(x, y, xmin=None, xmax=None, ymin=None, ymax=None, px=100)

Parameters

x (Vector) –

y (Vector) –

px – (Default value = 100)

xmin – (Default value = None)

xmax – (Default value = None)

ymin – (Default value = None)

ymax – (Default value = None)

panopticon.visualization.plot_cluster_umap(loom, layername, cluster, mask=None, plot_output=None, label_clusters=True, complexity_cutoff=0, downsample_to=500, blacklist=[])

Parameters

loom –

cluster (Cluster, or list of clusters) –

sublayers – (Default value = 1)

plot_output – (Default value = None)

label_clusters – (Default value = True)

complexity_cutoff – (Default value = 0)

mask – (Default value = None)

downsample_to – (Default value = 500)

blacklist – (Default value = [])

layername –

panopticon.visualization.plot_density(x, y, ax=None, cmap=<matplotlib.colors.ListedColormap object>)

Parameters

x –

y –

ax – (Default value = None)

cmap – (Default value = plt.cm.twilight_r)

panopticon.visualization.plot_differential_density(x, y, mask1, mask2, ax=None, cmap=<matplotlib.colors.LinearSegmentedColormap object>)

Parameters

x –

y –

mask1 –

mask2 –

ax – (Default value = None)

cmap – (Default value = plt.cm.RdBu_r)

panopticon.visualization.plot_subclusters(loom, layername, cluster, sublayers=1, plot_output=None, label_clusters=True, complexity_cutoff=0, downsample_to=500, blacklist=[])

Parameters

loom –

cluster (Cluster, or list of clusters) –

sublayers – (Default value = 1)

plot_output – (Default value = None)

label_clusters – (Default value = True)

complexity_cutoff – (Default value = 0)

downsample_to – (Default value = 500)

blacklist – (Default value = [])

layername –

panopticon.visualization.repertoire_plot(x=None, y=None, data=None, hue=None, ax=None, fig=None, show=False, output=None, pre_normalize_by_cohort=False, normalize=False, piechart=False, ylabel='', color_palette=None, stack_order='agnostic', smear=False, annotate_simpson=False, weights=None, colorkey_col=None)

Repertoire plot, designed for plotting cluster compositions or TCR repertoires as stacked bar plots or pies, with stack height indicating the size of a given TCR clone. See https://doi.org/10.1101/2021.08.25.456956, Fig. 3e. In this context, input should consist of a dataframe (‘data’), with each row representing a cell. Argument ‘y’ should be a column of ‘data’ representing the cell’s clone or other grouping of cells. Argument ‘x’ should be a column of ‘data’ representing the sample whence the cell came.

Parameters

x (str) – Column of data indicating sample. (Default value = None)

y (str) – Column of data indicating clone. (Default value = None)

data – pandas.DataFrame object, with necessary columns specified by arguments x, y. (Default value = None)

hue – Optional column of data indicating additional grouping of samples. (Default value = None)

ax – matplotlib.axes._subplots.AxesSubplot object or array thereof, optional. (Default value = None)

fig – matplotlib.figure.Figure object, optional. (Default value = None)

show – If True, will run matplotlib.show() upon completion. (Default value = False)

output – Argument to matplotlib.savefig. (Default value = None)

normalize – (Default value = False)

piechart – (Default value = False)

ylabel – (Default value = ‘’)

legend – (Default value = False)

color_palette – (Default value = None)

stack_order – (Default value = ‘agnostic’)

panopticon.visualization.swarmviolin(data, x, y, hue=None, ax=None, split=False, dodge=True, alpha=0.2, noswarm=False, violinplot_kwargs={}, swarmplot_kwargs={}, swarm_downsample_percentage=None, annotate_hue_pvalues=False, annotate_hue_effect_size=False, annotate_hue_n=False, annotate_hue_pvalue_fmt_str='p: {0:.2f}', annotate_hue_effect_size_fmt_str='es: {0:.2f}', annotate_hue_n_fmt_str='n: {}, {}', paired_hue_matching_col=None, effect_size='cohensd', pvalue='mannwhitney', custom_annotation_dict={}, custom_annotation_fontsize=6)

Parameters

data (pandas DataFrame in a form that would be acceptable input to seaborn violinplot, swarmplot) –

x –

y (column of data to be used for violin, swarmplot y argument) –

hue (column of data to be used for violin, swarmplot hue argument) – (Default value = None)

ax (matplotlib axis) – (Default value = None)

split (split argument to be passed to violin, swarmplot) – (Default value = False)

alpha (alpha to be used for seaborn violinplot) – (Default value = 0.2)

violinplot_kwargs – (Default value = {})

swarmplot_kwargs – (Default value = {})

swarm_downsample_percentage – (Default value = None)

annotate_hue_pvalues – (Default value = False)

annotate_hue_pvalue_fmt_str – (Default value = ‘p: {0:.2f}’)

noswarm – (Default value = False)

annotate_hue_effect_size – (Default value = False)

annotate_hue_effect_size_fmt_str – (Default value = ‘es: {0:.2f}’)

effect_size (If annotating the effect size between hues, this will set the relevant means of calculation. Must be one of 'cohensd' or 'cles' for Cohen's d, or common language effect size, respectively.) – (Default value = ‘cohensd’)

panopticon.visualization.volcano(diffex, ax=None, gene_column='gene', pval_column='pvalue', effect_size_col='CommonLanguageEffectSize', left_name='', right_name='', genemarklist=[], neglogpval_importance_threshold=5, title='', output=None, positions=None, show=True, gene_label_offset_scale=1, no_effect_line=0)

Parameters

diffex –

ax – (Default value = None)

gene_column – (Default value = ‘gene’)

pval_column – (Default value = ‘pvalue’)

left_name (Genes toward the left in the volcano plot, which are upregulated in group2 of 'diffex') – (Default value = ‘’)

right_name (Genes toward the right in the volcano plot, which are upregulated in group1 of 'diffex') – (Default value = ‘’)

genemarklist – (Default value = [])

effect_size_importance_threshold – (Default value = 0.5)

neglogpval_importance_threshold – (Default value = 5)

title – (Default value = ‘’)

output – (Default value = None)

positions – (Default value = None)

show – (Default value = True)

gene_label_offset_scale – (Default value = 1)

Utilities

panopticon.utilities.cohensd(g1, g2)

Returns Cohen’s d, the effect size of group 1 values (g1) relative to group 2 values (g2).

Parameters

g1 (group 1 values (list or numpy vector)) –

g2 (group 2 values (list or numpy vector)) –
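As an illustration only, the following is a minimal sketch of one common definition of Cohen's d (difference of means divided by the pooled sample standard deviation). The function name cohens_d_sketch is hypothetical, and this is not necessarily the exact formula used by panopticon.utilities.cohensd.

```python
import math
import statistics


def cohens_d_sketch(g1, g2):
    """Cohen's d with a pooled standard deviation (hypothetical helper)."""
    n1, n2 = len(g1), len(g2)
    # Sample variances (ddof = 1) of each group.
    var1 = statistics.variance(g1)
    var2 = statistics.variance(g2)
    # Pooled standard deviation, weighting each variance by its degrees of freedom.
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (statistics.mean(g1) - statistics.mean(g2)) / pooled_sd


d = cohens_d_sketch([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
```

Under this definition the sign is positive when group 1 has the larger mean, and swapping the two groups flips the sign.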

panopticon.utilities.combine_misshaped_looms(looms, combined_output_loomname, filename_suffix='_reshaped.loom', key_ras=['gene'], key_ra_for_combining='gene', layers_to_copy=[''], verbose=False)

Parameters

looms –

combined_output_loomname –

filename_suffix – (Default value = ‘_reshaped.loom’)

key_ras – (Default value = [‘gene’])

key_ra_for_combining – (Default value = ‘gene’)

layers_to_copy – (Default value = [‘’])

verbose – (Default value = False)

panopticon.utilities.convert_10x_h5(path_10x_h5, output_file, labelkey=None, label='', genes_as_ca=[], gene_whitelist=None, output_type='loom', write_chunked=False, chunk_size=512)

Parameters

path_10x_h5 –

output_file –

labelkey – (Default value = None)

label – (Default value = ‘’)

genes_as_ca – (Default value = [])

gene_whitelist – (Default value = None)

output_type – (Default value = ‘loom’)

panopticon.utilities.convert_h5ad(h5ad, output_loom, convert_obsm=True, convert_varm=True, convert_uns=True, convert_layers=True, write_chunked=False, chunk_size=512)

Parameters

h5ad –

output_loom –

convert_obsm – (Default value = True)

convert_varm – (Default value = True)

convert_uns – (Default value = True)

convert_layers – (Default value = True)

panopticon.utilities.create_excel_spreadsheet_from_differential_expression_dict(diffdict, filename)

Parameters

diffdict –

filename –

panopticon.utilities.create_gsea_txt_and_cls(loom, layername, output_prefix, phenotypes, cellmask=None, gene_ra='gene', cellname_ca='cellname')

Parameters

loom –

layername –

output_prefix –

phenotypes –

cellmask – (Default value = None)

gene_ra – (Default value = ‘gene’)

cellname_ca – (Default value = ‘cellname’)

panopticon.utilities.create_single_cell_portal_compatible_files(loom, layers=None, cellname='cellname', genename='gene', metadata_dict={}, gene_common_name='gene_common_name', coordinate_1='log2(TP10k+1) PCA UMAP embedding 1', coordinate_2='log2(TP10k+1) PCA UMAP embedding 2', clustering_ca_list=[], groupvsnumeric_dict={})

Parameters

loom –

layers – (Default value = None)

cellname – (Default value = ‘cellname’)

genename – (Default value = ‘gene’)

gene_common_name – (Default value = ‘gene_common_name’)

coordinate_1 – (Default value = ‘log2(TP10k+1) PCA UMAP embedding 1’)

coordinate_2 – (Default value = ‘log2(TP10k+1) PCA UMAP embedding 2’)

panopticon.utilities.create_split_exon_gtf(input_gtf, output_gtf, gene)

Parameters

input_gtf –

output_gtf –

gene –

panopticon.utilities.deintify(df_init)

Parameters

df_init –

panopticon.utilities.generate_ca_frequency(loom, ca, blacklisted_ca_values=[], exclude_blacklisted_in_denominator=True, second_ca=None, output_name=None, overwrite=False, output_counts_name=None)

Parameters

loom –

ca –

blacklisted_ca_values – (Default value = [])

second_ca – (Default value = None)

output_name – (Default value = None)

overwrite – (Default value = False)

exclude_blacklisted_in_denominator – (Default value = True)

output_counts_name – (Default value = None)

panopticon.utilities.get_UMI_curve_from_10x_h5(path_10x_h5, save_to_file=None)

Parameters

path_10x_h5 –

save_to_file – (Default value = None)

panopticon.utilities.get_alpha_concave_hull_polygon(xcoords, ycoords, alpha=0.1, buffer=1)

Much credit to https://thehumangeo.wordpress.com/2014/05/12/drawing-boundaries-in-python/

Parameters

xcoords –

ycoords –

alpha – (Default value = 0.1)

buffer – (Default value = 1)

panopticon.utilities.get_cellphonedb_compatible_counts_and_meta(loom, layername, celltype_ca, gene_ra='gene', cellname_ca='cellname', return_df=False, output_prefix=None, mouse_to_human=False)

Parameters

loom –

layername –

celltype_ca –

gene_ra – (Default value = ‘gene’)

cellname_ca – (Default value = ‘cellname’)

return_df – (Default value = False)

output_prefix – (Default value = None)

mouse_to_human – (Default value = False)

panopticon.utilities.get_clumpiness(distances, clusteringcachedir='/tmp', verbose=False)

Parameters

distances –

clusteringcachedir – (Default value = ‘/tmp’)

verbose – (Default value = False)

panopticon.utilities.get_cluster_differential_expression_heatmap_df(loom, layer, clusteringlevel, diffex={}, gene_name='gene', cell_name='cellname')

Parameters

loom –

layer –

clusteringlevel –

diffex – (Default value = {})

gene_name – (Default value = ‘gene’)

cell_name – (Default value = ‘cellname’)

panopticon.utilities.get_complement_contigency_tables(df)

Parameters

df –

panopticon.utilities.get_cross_column_attribute_heatmap(loom, ca1, ca2, normalization_axis=None)

Parameters

loom –

ca1 –

ca2 –

normalization_axis – (Default value = None)

panopticon.utilities.get_dsb_normalization(cell_antibody_counts, empty_droplet_antibody_counts, use_isotype_control=True, denoise_counts=True, isotype_control_name_vec=None, define_pseudocount=False, pseudocount_use=10, quantile_clipping=False, quantile_clip=[0.001, 0.9995], return_stats=False)

Parameters

cell_antibody_counts –

empty_droplet_antibody_counts –

use_isotype_control – (Default value = True)

denoise_counts – (Default value = True)

isotype_control_name_vec – (Default value = None)

define_pseudocount – (Default value = False)

pseudocount_use – (Default value = 10)

quantile_clipping – (Default value = False)

quantile_clip – (Default value = [0.001, 0.9995])

return_stats – (Default value = False)

panopticon.utilities.get_igraph_from_adjacency(adjacency, directed=None)

This is taken from scanpy._utils.__init__.py as of 12 August 2021

Get igraph graph from adjacency matrix.

Parameters

adjacency –

directed – (Default value = None)

panopticon.utilities.get_outlier_removal_mask(xcoords, ycoords, nth_neighbor=10, quantile=0.9)

Parameters

xcoords –

ycoords –

nth_neighbor – (Default value = 10)

quantile – (Default value = .9)

panopticon.utilities.get_umap_from_matrix(X, random_state=17, verbose=True, min_dist=0.001, n_neighbors=20, metric='correlation')

Parameters

X –

random_state – (Default value = 17)

verbose – (Default value = True)

min_dist – (Default value = 0.001)

n_neighbors – (Default value = 20)

metric – (Default value = ‘correlation’)

panopticon.utilities.get_valid_gene_info(genes: List[str], release=106, species='homo sapiens', gene_info_threshold=0.95, include_X_chromosome=False) → Tuple[List[str], List[int], List[int], List[int]]

Returns gene locations for all genes in the requested Ensembl release (S. Markson, 3 June 2020).

Parameters

genes (List[str]) –

release – (Default value = 106)

species – (Default value = ‘homo sapiens’)

panopticon.utilities.import_check(package, statement_upon_failure, standard_prefix=True)

Parameters

package –

statement_upon_failure –

standard_prefix – (Default value = True)

panopticon.utilities.intify(df_init)

Parameters

df_init –

panopticon.utilities.phi_coefficient(contingency_table)

Returns the phi-coefficient for a contingency table.

Parameters

contingency_table – Contingency table, identical in format to the input of scipy.stats.fisher_exact.
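For reference, the standard phi coefficient of a 2x2 contingency table [[a, b], [c, d]] is (ad - bc) / sqrt((a+b)(c+d)(a+c)(b+d)). The sketch below implements that textbook formula; the function name phi_coefficient_sketch is hypothetical, and the handling of a zero denominator is an assumption rather than documented behavior of panopticon.utilities.phi_coefficient.

```python
import math


def phi_coefficient_sketch(table):
    """Textbook phi coefficient for a 2x2 table (hypothetical helper)."""
    (a, b), (c, d) = table
    # Product of the two row totals and the two column totals.
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        # Assumption: an empty row or column leaves phi undefined.
        return float("nan")
    return (a * d - b * c) / denominator


perfect = phi_coefficient_sketch([[10, 0], [0, 10]])
independent = phi_coefficient_sketch([[5, 5], [5, 5]])
```

The value ranges from -1 (perfect negative association) through 0 (no association) to 1 (perfect positive association).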

panopticon.utilities.recover_meta(db, do_deint=False)

Parameters

db –

do_deint – (Default value = False)

panopticon.utilities.seurat_to_loom(seuratrds, patient_id_column, celltype_column, complexity_column, loomfile)

Parameters

seuratrds –

patient_id_column –

celltype_column –

complexity_column –

loomfile –

panopticon.utilities.tcr_levenshtein_distance(tra1=None, tra2=None, trb1=None, trb2=None)

Parameters

tra1 – (Default value = None)

tra2 – (Default value = None)

trb1 – (Default value = None)

trb2 – (Default value = None)

panopticon.utilities.we_can_pickle_it(thing, thingname: str)

Parameters

thing –

thingname (str) –

panopticon.utilities.we_can_unpickle_it(thingname: str)

Parameters

thingname (str) –

Windowed-Mean Expression (WME)

wme.py

wme

panopticon.wme.convert_to_sparse(dense_file, sparse_file=None, genes_not_present=False, genelist_file=None, delimiter='\t')

Parameters

dense_file –

sparse_file – (Default value = None)

genelist_file – (Default value = None)

delimiter – (Default value = ‘\t’)

panopticon.wme.get_list_of_gene_windows(genes, window_size=200, window_step=50, release=106, species='homo sapiens')

This function will, given a set of genes, return a list of lists, where each inner list is a window of window_size genes that are contiguous along the genome.

Parameters

genes (list of str) – The list of genes that will be used to generate a list of gene windows (list of lists).

window_step (int) – How many genes over each window will be “shifted” from the previous. (Default value = 50)

window_size (int) – The size of the windows. (Default value = 200)

release (int) – The ensembl release which will be used to sort the genes into windows of contiguous genes along the genome. (Default value = 106)

Return type

A list of lists of strings. Each element of this list will have length window_size.
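The roles of window_size and window_step can be sketched as a plain sliding window. The example below assumes the genes have already been sorted by genomic position (the real function uses an Ensembl release for that) and ignores chromosome boundaries; the name gene_windows_sketch is hypothetical.

```python
def gene_windows_sketch(sorted_genes, window_size=200, window_step=50):
    """Sliding windows over an already position-sorted gene list (hypothetical helper)."""
    last_start = len(sorted_genes) - window_size
    # Each window starts window_step genes after the previous one,
    # and only full-length windows are kept.
    return [
        sorted_genes[start:start + window_size]
        for start in range(0, last_start + 1, window_step)
    ]


genes = [f"g{i}" for i in range(10)]
windows = gene_windows_sketch(genes, window_size=4, window_step=2)
```

With ten genes, a window size of 4, and a step of 2, the windows start at positions 0, 2, 4, and 6, so consecutive windows overlap by two genes.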

panopticon.wme.get_ranks(mean_window_expressions)

Parameters

mean_window_expressions –

panopticon.wme.get_windowed_mean_expression(loom, list_of_gene_windows, patient_column='Patient_ID', patient=0, cell_type_column=None, cell_type=None, complexity_column='nGene', complexity_cutoff=0, upper_cut=5, log2=False)

Deprecated (S. Markson, 4 June 2020).

Parameters

genes –

expression_data –

patient –

complexity_cutoff – (Default value = 0)

cell_type_col_name – (Default value = ‘cell.type’)

patient_col_name – (Default value = ‘patient_ID’)

complexity_col_name – (Default value = ‘nGene’)

metadata –

list_of_gene_windows –

cell_type – (Default value = None)

patient_columns – (Default value = ‘Patient_ID’)

cell_type_column – (Default value = None)

panopticon.wme.robust_mean_windowed_expressions(genes, list_of_gene_windows, expression_data, upper_cut=5, windsor=False, use_tqdm=True, tqdm_desc='computing WME')

Produces an arithmetic mean over expression in windows determined by list_of_gene_windows. The highest-expression genes in each window are discarded. This could be made more memory-friendly by implementing a map function over expression_data, but that has not been done yet (S. Markson, 4 June 2020).

Parameters

genes –

list_of_gene_windows –

expression_data –

upper_cut – (Default value = 5)

windsor – (Default value = False)

tqdm_desc – (Default value = ‘computing WME’)
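The idea of a mean that discards the highest-expression genes in each window can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: expression_data is taken to be a genes-by-cells nested list, the top upper_cut values are dropped separately for each cell, and the windsor option is not modeled. The name robust_window_means_sketch is hypothetical.

```python
def robust_window_means_sketch(genes, list_of_gene_windows, expression_data, upper_cut=5):
    """Per-window, per-cell mean after dropping the top upper_cut values (hypothetical helper)."""
    row_of = {gene: i for i, gene in enumerate(genes)}
    n_cells = len(expression_data[0])
    result = []
    for window in list_of_gene_windows:
        if upper_cut >= len(window):
            raise ValueError("upper_cut must be smaller than the window size")
        rows = [expression_data[row_of[gene]] for gene in window]
        means = []
        for cell in range(n_cells):
            # Sort this cell's values within the window and drop the largest ones.
            values = sorted(row[cell] for row in rows)
            kept = values[:len(values) - upper_cut]
            means.append(sum(kept) / len(kept))
        result.append(means)
    return result


genes = ["a", "b", "c", "d"]
expression = [[1, 10], [2, 20], [3, 30], [100, 0]]  # rows: genes, columns: cells
wme = robust_window_means_sketch(genes, [["a", "b", "c", "d"]], expression, upper_cut=1)
```

In this toy input the outlier value 100 in the first cell is discarded, so that cell's window mean is computed from 1, 2, and 3 only.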

DNA

panopticon.dna.multireference_dna_correspondence(loom: Any, loomquery: Any, *segmentations: Any) → Any

Parameters

loom (Any) –

loomquery (Any) –

*segmentations (Any) –

panopticon.dna.segmentation_to_copy_ratio_dict(genes: List[str], segmentation: Any, chrom_col: str = 'chrom', start_col: str = 'chromStart', end_col: str = 'chromEnd', score_col: str = 'copyRatio', log2: bool = False) → Any

Parameters

genes (List[str]) –

segmentation (Any) –

chrom_col (str) – (Default value = ‘chrom’)

start_col (str) – (Default value = ‘chromStart’)

end_col (str) – (Default value = ‘chromEnd’)

score_col (str) – (Default value = ‘copyRatio’)

log2 (bool) – (Default value = False)

Clustering

panopticon.clustering.kt_cluster(mean_window_expression_ranks: ndarray, t: int = 4) → Any

Clusters points (ideally mean window expression vectors) using the distance 1 - KT, where KT is the Kendall tau correlation between those vectors.

Parameters

mean_window_expression_ranks (np.ndarray) –

t (int) – (Default value = 4)
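To make the 1 - KT distance concrete, here is a small pure-Python sketch that computes Kendall's tau (the tau-a variant, which assumes no ties) and the resulting pairwise distance matrix. The function names are hypothetical, and the clustering step itself, including the meaning of t, is deliberately left out because it is not described here.

```python
from itertools import combinations


def kendall_tau_a(u, v):
    """Kendall's tau-a between two equal-length vectors without ties (hypothetical helper)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(u)), 2):
        # A pair is concordant when both vectors order items i and j the same way.
        sign = (u[i] - u[j]) * (v[i] - v[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(u) * (len(u) - 1) // 2
    return (concordant - discordant) / n_pairs


def kt_distance_matrix(vectors):
    """Pairwise 1 - KT distances: 0 for identical rankings, 2 for reversed rankings."""
    return [[1 - kendall_tau_a(u, v) for v in vectors] for u in vectors]


distances = kt_distance_matrix([[1, 2, 3, 4], [2, 3, 5, 9], [4, 3, 2, 1]])
```

The first two vectors rank their entries identically and are at distance 0, while the third is fully reversed and sits at the maximum distance of 2 from both.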

panopticon.clustering.leiden_with_silhouette_score(X, leiden_nneighbors, skip_silhouette=False, leiden_iterations=10)

Parameters

X –

leiden_nneighbors –

skip_silhouette – (Default value = False)

leiden_iterations – (Default value = 10)

panopticon.clustering.silhouette_optimized_leiden(X, min_neighbors=2, initial_intermediate=128, max_neighbors=1024, verbose=True)

Parameters

X –

min_neighbors – (Default value = 2)

initial_intermediate – (Default value = 128)

max_neighbors – (Default value = 1024)

verbose – (Default value = True)

Legacy

panopticon.legacy.create_subsetted_loom(loom, output_loom_filename, cellmask)

Deprecated.

Will create a new loom file with cells specified according to a Boolean vector mask.

Parameters

loom (LoomConnection object which will be subsetted) –

output_loom_filename (string denoting the path and filename of the output loom file.) –

cellmask (Boolean numpy vector with length equal to the number of cells in "loom") –

panopticon.legacy.create_subsetted_loom_space_efficient(loom, output_loom_filename, cellmask, batch_size=1024)

Deprecated.

Will create a new loom file with cells specified according to a Boolean vector mask.

Parameters

loom (LoomConnection object which will be subsetted) –

output_loom_filename (string denoting the path and filename of the output loom file.) –

cellmask (Boolean numpy vector with length equal to the number of cells in "loom") –

batch_size (Size (number of cells) to add to output file at a time (default: 1024)) –

panopticon.legacy.create_subsetted_loom_with_genemask(loom, output_loom, cellmask, genemask)

Deprecated.

Parameters

loom –

output_loom –

cellmask –

genemask –

panopticon.legacy.get_gsea_with_selenium(diffex, email='s')

If you aren’t Sam, probably don’t use this.

Parameters

diffex –

panopticon.legacy.get_module_score_loom(loom, signature_name, querymask=None, nbins=10, ncontrol=5)

Calculates a module score over a loom file. This routine is deprecated; use generate masked module score instead (S. Markson, 3 June 2020).

Deprecated since version 0.1.

Parameters

loom (loom object on which to calculate score) –

signature_name (Name of signature (pre-loaded into loom object) over which to calculate score) –

nbins (Number of quantile bins to use) – (Default value = 10)

ncontrol (Number of genes in each matched quantile) – (Default value = 5)

querymask – (Default value = None)