spateo.io.bgi

IO functions for BGI stereo technology.

Attributes

`VERSIONS`
`COUNT_COLUMN_MAPPING`

Functions

`read_bgi_as_dataframe`(→ pandas.DataFrame)	Read a BGI read file as a pandas DataFrame.
`dataframe_to_labels`(→ numpy.ndarray)	Convert a BGI dataframe that contains cell labels to a labels matrix.
`dataframe_to_filled_labels`(→ numpy.ndarray)	Convert a BGI dataframe that contains cell labels to a (filled) labels matrix.
`read_bgi_agg`(→ anndata.AnnData)	Read BGI read file to calculate total number of UMIs observed per coordinate.
`read_bgi`(→ anndata.AnnData)	Read BGI read file as AnnData.

Module Contents

spateo.io.bgi.VERSIONS

spateo.io.bgi.COUNT_COLUMN_MAPPING

spateo.io.bgi.read_bgi_as_dataframe(path: str, label_column: str | None = None) → pandas.DataFrame

Read a BGI read file as a pandas DataFrame.

Parameters:

path: Path to read file.
label_column: Column name containing positive cell labels.

Returns:

Pandas DataFrame with the following standardized column names:

gene: Gene name/ID (whatever was used in the original file).
x, y: X and Y coordinates.
total, spliced, unspliced: Counts for each RNA species. The latter two are only present if they are in the original file.
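The standardized layout can be illustrated without the library itself. The following is a minimal sketch, assuming a hypothetical tab-separated read file whose raw column names are `geneID`, `x`, `y`, and `MIDCounts`; both the raw names and the rename mapping are assumptions made for illustration, not the documented input format.

```python
import io

import pandas as pd

# Hypothetical raw read file contents; the raw column names are assumptions.
raw = io.StringIO(
    "geneID\tx\ty\tMIDCounts\n"
    "GeneA\t10\t20\t3\n"
    "GeneB\t10\t21\t1\n"
    "GeneA\t11\t20\t2\n"
)

# Hypothetical mapping from raw names to the standardized names listed above.
rename = {"geneID": "gene", "MIDCounts": "total"}

df = pd.read_csv(raw, sep="\t").rename(columns=rename)
print(df.columns.tolist())
```

With the real function, the equivalent frame would come from calling read_bgi_as_dataframe() on the path to the read file.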

spateo.io.bgi.dataframe_to_labels(df: pandas.DataFrame, column: str, shape: Tuple[int, int] | None = None) → numpy.ndarray

Convert a BGI dataframe that contains cell labels to a labels matrix.

Parameters:

df: Read dataframe, as returned by read_bgi_as_dataframe().
column: Column that contains cell labels as positive integers. Any labels that are non-positive are ignored.

Returns:

Labels matrix
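To make the conversion concrete, here is a minimal sketch of how a labels matrix could be built from such a dataframe. `labels_from_dataframe` is a hypothetical stand-in written for illustration, not the library implementation; the default shape (just large enough to cover the maximum coordinates) and the orientation (rows index x, columns index y) are both assumptions.

```python
import numpy as np
import pandas as pd


def labels_from_dataframe(df, column, shape=None):
    """Hypothetical stand-in for illustration; not the library implementation."""
    # Assumption: without an explicit shape, cover the maximum coordinates.
    if shape is None:
        shape = (int(df["x"].max()) + 1, int(df["y"].max()) + 1)
    labels = np.zeros(shape, dtype=int)
    # Non-positive labels are ignored, as documented above.
    positive = df[df[column] > 0]
    # Assumption: rows index x and columns index y.
    labels[positive["x"].to_numpy(), positive["y"].to_numpy()] = positive[column].to_numpy()
    return labels


df = pd.DataFrame(
    {
        "gene": ["A", "B", "A", "C"],
        "x": [0, 0, 1, 2],
        "y": [0, 1, 1, 2],
        "cell": [1, 1, 2, 0],  # 0 marks a read that belongs to no cell
    }
)
labels = labels_from_dataframe(df, "cell")
print(labels)
```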

spateo.io.bgi.dataframe_to_filled_labels(df: pandas.DataFrame, column: str, shape: Tuple[int, int] | None = None) → numpy.ndarray

Convert a BGI dataframe that contains cell labels to a (filled) labels matrix.

Parameters:

df: Read dataframe, as returned by read_bgi_as_dataframe().
column: Column that contains cell labels as positive integers. Any labels that are non-positive are ignored.

Returns:

Labels matrix

spateo.io.bgi.read_bgi_agg(path: str, stain_path: str | None = None, binsize: int = 1, gene_agg: Dict[str, List[str] | Callable[[str], bool]] | None = None, prealigned: bool = False, label_column: str | None = None, version: typing_extensions.Literal['stereo'] = 'stereo') → anndata.AnnData

Read BGI read file to calculate total number of UMIs observed per coordinate.

Parameters:

path: Path to read file.
stain_path: Path to nuclei staining image. Must have the same coordinate system as the read file.
binsize: Size of pixel bins.
gene_agg: Dictionary of layer keys to gene names to aggregate. For example, {'mito': ['list', 'of', 'mitochondrial', 'genes']} will yield an AnnData with a layer named "mito" with the aggregate total UMIs of the provided gene list. As the signature indicates, a value may also be a callable that takes a gene name and returns a bool.
prealigned: Whether the stain image is already aligned with the minimum x and y RNA coordinates.
label_column: Column that contains already-segmented cell labels.
version: BGI technology version. Currently only used to set the scale and scale units of each unit coordinate. This may change in the future.

Returns:

An AnnData object containing the UMIs per coordinate and the nucleus staining image, if provided. The total UMIs are stored as a sparse matrix in .X, and spliced and unspliced counts (if present) are stored in .layers['spliced'] and .layers['unspliced'] respectively. The nuclei image is stored as a Numpy array in .layers['nuclei'].
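The per-coordinate aggregation and the gene_agg layers described above can be sketched with plain NumPy. This is an illustration under stated assumptions, not the library code: `aggregate_counts` and `gene_mask` are hypothetical helpers, bins are assumed to be formed by integer division of each coordinate, and dense arrays are used for readability where the real result is sparse.

```python
import numpy as np


def aggregate_counts(x, y, counts, binsize=1, shape=None):
    """Sum counts per binned coordinate. Hypothetical helper for illustration."""
    # Assumption: bins are formed by integer division of each coordinate.
    bx = np.asarray(x) // binsize
    by = np.asarray(y) // binsize
    if shape is None:
        shape = (int(bx.max()) + 1, int(by.max()) + 1)
    total = np.zeros(shape, dtype=int)
    # np.add.at accumulates correctly when the same bin appears more than once.
    np.add.at(total, (bx, by), counts)
    return total


def gene_mask(genes, selector):
    """Select genes by a list of names or by a predicate on the gene name."""
    if callable(selector):
        return np.array([bool(selector(g)) for g in genes])
    return np.isin(genes, list(selector))


genes = np.array(["mt-Nd1", "Actb", "mt-Co1", "Actb"])
x = np.array([0, 1, 2, 3])
y = np.array([0, 0, 3, 3])
counts = np.array([2, 5, 1, 4])

# Total UMIs per 2x2 bin, analogous to what ends up in .X.
X = aggregate_counts(x, y, counts, binsize=2)

# One extra layer per gene_agg key; here the value is a callable.
gene_agg = {"mito": lambda g: g.startswith("mt-")}
layers = {}
for key, selector in gene_agg.items():
    m = gene_mask(genes, selector)
    layers[key] = aggregate_counts(x[m], y[m], counts[m], binsize=2, shape=X.shape)

print(X)
print(layers["mito"])
```

Passing the shape of the total matrix to each layer keeps all layers aligned even when the selected genes do not reach the largest coordinates.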

spateo.io.bgi.read_bgi(path: str, binsize: int | None = None, segmentation_adata: anndata.AnnData | None = None, labels_layer: str | None = None, labels: numpy.ndarray | str | None = None, seg_binsize: int = 1, label_column: str | None = None, add_props: bool = True, version: typing_extensions.Literal['stereo'] = 'stereo') → anndata.AnnData

Read BGI read file as AnnData.

Parameters:

path: Path to read file.
binsize: Size of pixel bins. Should only be provided when labels (i.e. the segmentation_adata and labels arguments) are not used.
segmentation_adata: AnnData containing segmentation results.
labels_layer: Layer name in segmentation_adata containing labels.
labels: Numpy array or path to numpy array saved with np.save that contains labels.
seg_binsize: Bin size used in cell segmentation. Used in conjunction with labels, and overwritten when labels_layer and segmentation_adata are both not None.
label_column: Column that contains already-segmented cell labels. If this column is present, it takes precedence.
add_props: Whether or not to compute label properties, such as area, bounding box, centroid, etc.
version: BGI technology version. Currently only used to set the scale and scale units of each unit coordinate. This may change in the future.

Returns:

Bins x genes or labels x genes AnnData.
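The parameter notes above imply two ways of calling read_bgi: with binsize (bins x genes) or with labels (labels x genes). The sketch below encodes that choice in a hypothetical helper, `read_bgi_kwargs`, which is not part of the library. The error it raises is this helper's own check based on the binsize note, and the file names are placeholders.

```python
def read_bgi_kwargs(binsize=None, segmentation_adata=None, labels_layer=None,
                    labels=None, seg_binsize=1):
    """Hypothetical helper: build keyword arguments consistent with the notes above."""
    uses_labels = segmentation_adata is not None or labels is not None
    # The binsize note says it should only be provided when labels are not used.
    if binsize is not None and uses_labels:
        raise ValueError("binsize should only be provided when labels are not used")
    if uses_labels:
        kwargs = {"seg_binsize": seg_binsize}
        if segmentation_adata is not None:
            kwargs.update(segmentation_adata=segmentation_adata, labels_layer=labels_layer)
        else:
            kwargs["labels"] = labels
        return kwargs
    return {"binsize": binsize} if binsize is not None else {}


# Bins x genes: aggregate reads into 20x20 pixel bins.
bin_kwargs = read_bgi_kwargs(binsize=20)

# Labels x genes: labels saved with np.save ("labels.npy" is a placeholder path).
label_kwargs = read_bgi_kwargs(labels="labels.npy", seg_binsize=2)

# With spateo installed, the call would then look like this (placeholder path):
# from spateo.io.bgi import read_bgi
# adata = read_bgi("path/to/reads.tsv", **bin_kwargs)
print(bin_kwargs, label_kwargs)
```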