spateo.io.bgi

IO functions for BGI stereo technology.

Module Contents

Functions

read_bgi_as_dataframe(→ pandas.DataFrame)

Read a BGI read file as a pandas DataFrame.

dataframe_to_labels(→ numpy.ndarray)

Convert a BGI dataframe that contains cell labels to a labels matrix.

dataframe_to_filled_labels(→ numpy.ndarray)

Convert a BGI dataframe that contains cell labels to a (filled) labels matrix.

read_bgi_agg(→ anndata.AnnData)

Read BGI read file to calculate total number of UMIs observed per coordinate.

read_bgi(→ anndata.AnnData)

Read BGI read file as AnnData.

Attributes

spateo.io.bgi.VERSIONS
spateo.io.bgi.COUNT_COLUMN_MAPPING
spateo.io.bgi.read_bgi_as_dataframe(path: str, label_column: str | None = None) → pandas.DataFrame

Read a BGI read file as a pandas DataFrame.

Parameters:
path

Path to read file.

label_column

Column name containing positive cell labels.

Returns:

Pandas DataFrame with the following standardized column names:
  • gene: Gene name/ID (whatever was used in the original file)

  • x, y: X and Y coordinates

  • total, spliced, unspliced: Counts for each RNA species.

    The latter two are present only if they are in the original file.
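As a rough illustration of working with the standardized columns, the sketch below builds a small stand-in DataFrame by hand rather than reading a real file. The gene names and counts are made up, and the DataFrame is only assumed to resemble what read_bgi_as_dataframe() returns.

```python
import pandas as pd

# Hand-built stand-in for the output of read_bgi_as_dataframe().
# Gene names and counts are invented for illustration only.
df = pd.DataFrame(
    {
        "gene": ["GeneA", "GeneA", "GeneB"],
        "x": [10, 11, 10],
        "y": [5, 5, 6],
        "total": [2, 1, 4],
    }
)

# spliced/unspliced exist only if the original file had them,
# so check for the columns before relying on them.
has_velocity_counts = {"spliced", "unspliced"}.issubset(df.columns)

# Total UMIs per gene, summed over all coordinates.
umis_per_gene = df.groupby("gene")["total"].sum().to_dict()
```

Checking for the optional columns up front avoids a KeyError on files that only carry total counts.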

spateo.io.bgi.dataframe_to_labels(df: pandas.DataFrame, column: str, shape: Tuple[int, int] | None = None) → numpy.ndarray

Convert a BGI dataframe that contains cell labels to a labels matrix.

Parameters:
df

Read dataframe, as returned by read_bgi_as_dataframe().

column

Column that contains cell labels as positive integers. Any labels that are non-positive are ignored.

Returns:

Labels matrix
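To make the idea of a labels matrix concrete, here is a minimal NumPy sketch of the conversion. It is not spateo's implementation: the helper name labels_matrix_sketch, the choice of x as the row axis, and the way a missing shape is inferred from the largest coordinate are all assumptions made for illustration.

```python
import numpy as np

def labels_matrix_sketch(xs, ys, labels, shape=None):
    """Place each positive label at its (x, y) position; skip non-positive labels."""
    xs = np.asarray(xs, dtype=int)
    ys = np.asarray(ys, dtype=int)
    labels = np.asarray(labels, dtype=int)

    # Non-positive labels are ignored, as described for the `column` parameter.
    keep = labels > 0
    xs, ys, labels = xs[keep], ys[keep], labels[keep]

    if shape is None:
        # Assumption: make the matrix just large enough for the largest coordinate.
        shape = (int(xs.max()) + 1, int(ys.max()) + 1) if labels.size else (0, 0)

    matrix = np.zeros(shape, dtype=int)  # 0 means "no label"
    matrix[xs, ys] = labels              # assumption: x indexes rows, y indexes columns
    return matrix

demo = labels_matrix_sketch(xs=[0, 1, 2, 2], ys=[0, 1, 1, 2], labels=[1, 1, 0, 2])
```

In this toy input the entry at (2, 1) carries label 0, so it is dropped and the corresponding matrix cell stays 0.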

spateo.io.bgi.dataframe_to_filled_labels(df: pandas.DataFrame, column: str, shape: Tuple[int, int] | None = None) → numpy.ndarray

Convert a BGI dataframe that contains cell labels to a (filled) labels matrix.

Parameters:
df

Read dataframe, as returned by read_bgi_as_dataframe().

column

Column that contains cell labels as positive integers. Any labels that are non-positive are ignored.

Returns:

Labels matrix

spateo.io.bgi.read_bgi_agg(path: str, stain_path: str | None = None, binsize: int = 1, gene_agg: Dict[str, List[str] | Callable[[str], bool]] | None = None, prealigned: bool = False, label_column: str | None = None, version: typing_extensions.Literal['stereo'] = 'stereo') → anndata.AnnData

Read BGI read file to calculate total number of UMIs observed per coordinate.

Parameters:
path

Path to read file.

stain_path

Path to nuclei staining image. Must have the same coordinate system as the read file.

binsize

Size of pixel bins.

gene_agg

Dictionary of layer keys to gene names to aggregate. For example, {'mito': ['list', 'of', 'mitochondrial', 'genes']} will yield an AnnData with a layer named "mito" with the aggregate total UMIs of the provided gene list.

prealigned

Whether the stain image is already aligned with the minimum x and y RNA coordinates.

label_column

Column that contains already-segmented cell labels.

version

BGI technology version. Currently only used to set the scale and scale units of each unit coordinate. This may change in the future.

Returns:

An AnnData object containing the UMIs per coordinate and the nucleus staining image, if provided. The total UMIs are stored as a sparse matrix in .X, and spliced and unspliced counts (if present) are stored in .layers['spliced'] and .layers['unspliced'] respectively. The nuclei image is stored as a Numpy array in .layers['nuclei'].
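The interaction of binsize and gene_agg can be sketched in plain Python. This is an illustration of the idea only, not how spateo computes it: the helper aggregate_umis is hypothetical, coordinates are binned by plain integer division with no offset, results are kept in dictionaries instead of sparse matrices, and a callable in gene_agg is assumed to act as a predicate on the gene name (the signature allows Callable[[str], bool], but the parameter description only covers gene lists). Gene names and counts are made up.

```python
from collections import defaultdict

def aggregate_umis(records, binsize=1, gene_agg=None):
    """Sum UMIs per bin. records: iterable of (gene, x, y, total) tuples.

    Returns {layer_key: {(bin_x, bin_y): umis}}, with "X" holding all genes.
    """
    gene_agg = gene_agg or {}
    layers = {"X": defaultdict(int)}
    for key in gene_agg:
        layers[key] = defaultdict(int)

    for gene, x, y, total in records:
        # Assumption: bins are formed by integer division of the raw coordinates.
        bin_xy = (x // binsize, y // binsize)
        layers["X"][bin_xy] += total
        for key, selector in gene_agg.items():
            # A selector is either a list of gene names or a predicate on the name.
            matches = selector(gene) if callable(selector) else gene in selector
            if matches:
                layers[key][bin_xy] += total
    return {key: dict(counts) for key, counts in layers.items()}

records = [
    ("mt-Nd1", 0, 0, 2),
    ("Actb", 1, 1, 3),
    ("mt-Co1", 2, 0, 1),
    ("Gapdh", 3, 3, 4),
]
result = aggregate_umis(
    records,
    binsize=2,
    gene_agg={
        "mito": lambda gene: gene.startswith("mt-"),
        "housekeeping": ["Actb", "Gapdh"],
    },
)
```

With binsize=2, the first two records fall into the same bin, so their counts are summed in the "X" layer while each gene_agg layer only accumulates the genes it selects.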

spateo.io.bgi.read_bgi(path: str, binsize: int | None = None, segmentation_adata: anndata.AnnData | None = None, labels_layer: str | None = None, labels: numpy.ndarray | str | None = None, seg_binsize: int = 1, label_column: str | None = None, add_props: bool = True, version: typing_extensions.Literal['stereo'] = 'stereo') → anndata.AnnData

Read BGI read file as AnnData.

Parameters:
path

Path to read file.

binsize

Size of pixel bins. Should only be provided when labels (i.e. the segmentation_adata and labels arguments) are not used.

segmentation_adata

AnnData containing segmentation results.

labels_layer

Layer name in segmentation_adata containing labels.

labels

Numpy array or path to numpy array saved with np.save that contains labels.

seg_binsize

Bin size used in cell segmentation. Used in conjunction with labels, and overridden when both labels_layer and segmentation_adata are provided.

label_column

Column that contains already-segmented cell labels. If this column is present, it takes precedence.

add_props

Whether or not to compute label properties, such as area, bounding box, centroid, etc.

version

BGI technology version. Currently only used to set the scale and scale units of each unit coordinate. This may change in the future.

Returns:

Bins x genes or labels x genes AnnData.
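A possible way to call read_bgi is sketched below. The wrapper load_bgi is hypothetical and not part of spateo; it only forwards keyword arguments that appear in the signature above and enforces the documented advice that binsize should not be combined with labels. The file paths in the comments are placeholders.

```python
def load_bgi(path, binsize=None, segmentation_adata=None, labels_layer=None,
             labels=None, seg_binsize=1, label_column=None):
    """Hypothetical thin wrapper around spateo.io.bgi.read_bgi."""
    uses_labels = segmentation_adata is not None or labels is not None
    if binsize is not None and uses_labels:
        # The parameter description says binsize should only be provided
        # when the segmentation_adata and labels arguments are not used.
        raise ValueError("Pass either binsize or labels/segmentation_adata, not both.")

    # Imported lazily so the wrapper can be defined without spateo installed.
    from spateo.io.bgi import read_bgi

    return read_bgi(
        path,
        binsize=binsize,
        segmentation_adata=segmentation_adata,
        labels_layer=labels_layer,
        labels=labels,
        seg_binsize=seg_binsize,
        label_column=label_column,
    )

# Example calls (placeholder paths, shown as comments because they need real data):
# adata = load_bgi("path/to/reads.tsv", binsize=50)                  # bins x genes
# adata = load_bgi("path/to/reads.tsv", labels="path/to/labels.npy")  # labels x genes
```

Validating the arguments before the import keeps the mutually exclusive modes explicit and fails fast on a mixed call.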