Getting started
Making a panopticon-friendly loom file
To get started, find some scRNA data, or download some public data e.g.
from the Broad Institute Single Cell
Portal. One good
starting place is Tirosh et al., Science
2016.
Most of the time, scRNA data is shared as a big dense .txt, .tsv
or .csv matrix with each column corresponding to a cell, and each
row corresponding to a gene, and the entries representing gene
expression, normalized in some way. Of course, sometimes the rows are
cells and columns are genes, so keep your head on a swivel. Usually
there will be some metadata wherein each row gives some extra info
corresponding to each cell.
The basic data object used by panopticon is a
loom file, with some expectations for what
certain columns attributes and row attributes are named. Cell names
belong in the cellname column attribute, gene names belong in the
gene row attribute; complexity (the number of unique genes expressed
by a cell) is the column attribute nGene—the number of unique
cells expressing a gene is nCell. Beyond that, you’re pretty
vogelfrei.
For help in getting from whatever data you have to what you can run other panopticon commands on, run
panopticon book scrna-wizard
This tool should guide you through a series of prompts to make a fully functioning panopticon-friendly loom file. If it seems to be failing, it’s usually because you have some input data with a formatting edge case that I haven’t yet thought of. Try to resolve that, or, if it persists, message me.
Data normalization, clustering, and exploration
9 times out of 10 I’ll start out with something like the following,
where the '' layer represents raw transcript counts.
import loompy
from panopticon.preprocessing import generate_count_normalization
from panopticon.analysis import generate_incremental_pca, generate_embedding, generate_clustering, generate_masked_module_score
import matplotlib.pyplot as plt
import numpy as np
db = loompy.connect("WhateverYouNamedIt.loom")
layername = 'log2(TP100k+1)'
# Generates a new layer with log2(transcripts per
# denominator).
generate_count_normalization(db, '', layername)
generate_incremental_pca(db, layername)
generate_embedding(db, layername)
generate_clustering(db, layername, clustering_depth=3)
fig, ax = plt.subplots(figsize=(6,6))
for cluster in np.unique(db.ca['ClusteringIteration0']):
mask = db.ca['ClusteringIteration0'] == cluster
ax.scatter(db.ca['PCA UMAP embedding 1'][mask], db.ca['PCA UMAP embedding 2'][mask], label=cluster)
plt.legend()
plt.show()
This will plot your basic UMAP of all your cells, with cells clusters and colored based on the first iteration of the panopticon’s agglomerative iterative subclustering procedure.
Making a split-exon gtf file
Suppose that you have a .gtf file (hereafter nameofyourgtf.gtf). If you want to create a different .gtf file where a particular gene (say, GeneOfInterest) has been replaced with separate “genes” corresponding to different exons of that gene, you run the following command on the command line:
panopticon create-split-exon-gtf nameofyourgtf.gtf nameofyouroutputgtf.gtf GeneOfInterest
If there are multiple such genes (GeneOfInterest1, GeneOfInterest2…), you may “split” them with the command:
panopticon create-split-exon-gtf nameofyourgtf.gtf nameofyouroutputgtf.gtf GeneOfInterest1 GeneOfInterest2
and so on.