Cyto utilities

Functions enabling smooth interaction with CellProfiler and DeepProfiler output formats.

pycytominer.cyto_utils.cells module

class pycytominer.cyto_utils.cells.SingleCells(sql_file, strata=['Metadata_Plate', 'Metadata_Well'], aggregation_operation='median', output_file=None, compartments=['cells', 'cytoplasm', 'nuclei'], compartment_linking_cols={'cells': {'cytoplasm': 'ObjectNumber'}, 'cytoplasm': {'cells': 'Cytoplasm_Parent_Cells', 'nuclei': 'Cytoplasm_Parent_Nuclei'}, 'nuclei': {'cytoplasm': 'ObjectNumber'}}, merge_cols=['TableNumber', 'ImageNumber'], image_cols=['TableNumber', 'ImageNumber', 'Metadata_Site'], add_image_features=False, image_feature_categories=None, features='infer', load_image_data=True, image_table_name='image', subsample_frac=1, subsample_n='all', subsampling_random_state=None, fields_of_view='all', fields_of_view_feature='Metadata_Site', object_feature='Metadata_ObjectNumber', default_datatype_float=<class 'numpy.float64'>)

Bases: object

This is a class to interact with single cell morphological profiles. Interaction includes aggregation, normalization, and output.

sql_file

SQLite connection pointing to the single cell database. The string prefix must be “sqlite:///”.

Type:

str

strata

The columns to group by when aggregating single cells.

Type:

list of str, default [“Metadata_Plate”, “Metadata_Well”]

aggregation_operation

Operation to perform single cell aggregation.

Type:

str, default “median”

output_file

If specified, the location to write the file.

Type:

str, default None

compartments

list of compartments to process.

Type:

list of str, default [“cells”, “cytoplasm”, “nuclei”]

compartment_linking_cols

Dictionary identifying how to merge columns across tables.

Type:

dict, default noted below

merge_cols

Columns indicating how to merge image and compartment data.

Type:

list of str, default [“TableNumber”, “ImageNumber”]

image_cols

Columns to select from the image table.

Type:

list of str, default [“TableNumber”, “ImageNumber”, “Metadata_Site”]

add_image_features

Whether to add image features to the profiles.

Type:

bool, default False

image_feature_categories

list of categories of features from the image table to add to the profiles.

Type:

list of str, optional

features

list of features that should be loaded or aggregated.

Type:

str or list of str, default “infer”

load_image_data

Whether or not the image data should be loaded into memory.

Type:

bool, default True

image_table_name

The name of the table inside the SQLite file of image measurements.

Type:

str, default “image”

subsample_frac

The fraction of single cells to select (0 < subsample_frac <= 1).

Type:

float, default 1

subsample_n

The number of samples to subsample. Do not specify both subsample_frac and subsample_n.

Type:

str or int, default “all”

subsampling_random_state

The random state used to initialize subsampling.

Type:

str or int, default None

fields_of_view

list of fields of view to aggregate.

Type:

list of int, str, default “all”

fields_of_view_feature

Name of the fields of view feature.

Type:

str, default “Metadata_Site”

object_feature

Object number feature.

Type:

str, default “Metadata_ObjectNumber”

default_datatype_float

NumPy floating-point datatype to use for load_compartment and the resulting dataframes. This parameter may be used to assist with performance-related issues by reducing the memory required for floating-point data. For example, using np.float32 instead of np.float64 for this parameter will reduce memory consumed by float columns by roughly 50%. Please note: using any datatype other than np.float64 is experimentally unverified.

Type:

type
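
The roughly 50% memory figure can be checked with a minimal sketch. It uses the standard-library array module as a stand-in for NumPy arrays (an assumption made to keep the example dependency-free; the type codes "d" and "f" are 8-byte and 4-byte floats on common platforms) and also shows the precision given up by the narrower type. The feature values are hypothetical.

```python
from array import array

# Hypothetical feature values; any list of floats works.
values = [0.1 * i for i in range(1000)]

# "d" is a C double (comparable to np.float64);
# "f" is a C float (comparable to np.float32).
as_float64 = array("d", values)
as_float32 = array("f", values)

bytes64 = as_float64.itemsize * len(as_float64)
bytes32 = as_float32.itemsize * len(as_float32)
memory_ratio = bytes32 / bytes64

# The narrower type cannot represent 0.1 as closely as a double can.
rounding_error = abs(as_float32[1] - values[1])

print(f"float64: {bytes64} bytes, float32: {bytes32} bytes")
print(f"rounding error for 0.1 stored as float32: {rounding_error:.2e}")
```

The halved footprint comes at the cost of a small rounding error per value, which is one reason to validate results before switching away from np.float64.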

Notes

The argument compartment_linking_cols is designed to work with CellProfiler output, as curated by cytominer-database. The default is:

{
    "cytoplasm": {
        "cells": "Cytoplasm_Parent_Cells",
        "nuclei": "Cytoplasm_Parent_Nuclei",
    },
    "cells": {"cytoplasm": "ObjectNumber"},
    "nuclei": {"cytoplasm": "ObjectNumber"},
}
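
As a rough illustration of how a linking dictionary of this shape can drive a join, the sketch below matches cytoplasm rows to cells rows on merge_cols plus the linking columns. It is a plain-Python stand-in, not the library's merge code; the helper link_compartments and the toy measurements are hypothetical.

```python
linking_cols = {
    "cytoplasm": {
        "cells": "Cytoplasm_Parent_Cells",
        "nuclei": "Cytoplasm_Parent_Nuclei",
    },
    "cells": {"cytoplasm": "ObjectNumber"},
    "nuclei": {"cytoplasm": "ObjectNumber"},
}
merge_cols = ["TableNumber", "ImageNumber"]

# Toy single-cell rows (hypothetical values).
cytoplasm_rows = [
    {"TableNumber": 1, "ImageNumber": 1, "Cytoplasm_Parent_Cells": 1,
     "Cytoplasm_Parent_Nuclei": 1, "Cytoplasm_Area": 50.0},
    {"TableNumber": 1, "ImageNumber": 1, "Cytoplasm_Parent_Cells": 2,
     "Cytoplasm_Parent_Nuclei": 2, "Cytoplasm_Area": 70.0},
]
cells_rows = [
    {"TableNumber": 1, "ImageNumber": 1, "ObjectNumber": 1, "Cells_Area": 120.0},
    {"TableNumber": 1, "ImageNumber": 1, "ObjectNumber": 2, "Cells_Area": 150.0},
]


def link_compartments(left_rows, right_rows, left_name, right_name):
    """Inner-join two compartments on merge_cols plus their linking columns."""
    left_key = linking_cols[left_name][right_name]
    right_key = linking_cols[right_name][left_name]
    # Index the right-hand rows by (merge columns..., linking value).
    index = {
        tuple(row[col] for col in merge_cols) + (row[right_key],): row
        for row in right_rows
    }
    merged = []
    for row in left_rows:
        key = tuple(row[col] for col in merge_cols) + (row[left_key],)
        match = index.get(key)
        if match is not None:
            merged.append({**match, **row})
    return merged


merged = link_compartments(cytoplasm_rows, cells_rows, "cytoplasm", "cells")
for row in merged:
    print(row["Cells_Area"], row["Cytoplasm_Area"])
```

Each entry names the column in one compartment that points at the object in the other, so the same helper works in either direction.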

aggregate_compartment(compartment, compute_subsample=False, compute_counts=False, add_image_features=False, n_aggregation_memory_strata=1)

Aggregate morphological profiles. Uses pycytominer.aggregate()

Parameters:
  • compartment (str) – Compartment to aggregate.

  • compute_subsample (bool, default False) – Whether or not to subsample.

  • compute_counts (bool, default False) – Whether or not to compute the number of objects in each compartment and the number of fields of view per well.

  • add_image_features (bool, default False) – Whether or not to add image features.

  • n_aggregation_memory_strata (int, default 1) – Number of unique strata to pull from the database into working memory at once. Typically 1 is fastest. A larger number uses more memory. For example, if aggregating by “well”, then n_aggregation_memory_strata=1 means that one “well” will be pulled from the SQLite database into memory at a time.

Returns:

DataFrame of aggregated profiles.

Return type:

pandas.core.frame.DataFrame
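
The idea behind aggregating by strata can be sketched without the library. The example below groups toy single-cell rows by the default strata and takes the median of each feature; the helper aggregate_by_strata is hypothetical and is not the implementation used by pycytominer.aggregate().

```python
from collections import defaultdict
from statistics import median

strata = ["Metadata_Plate", "Metadata_Well"]
features = ["Cells_x", "Nuclei_y"]

# Toy single-cell measurements for two wells on one plate.
single_cells = [
    {"Metadata_Plate": "X", "Metadata_Well": "a", "Cells_x": 0.1, "Nuclei_y": 0.5},
    {"Metadata_Plate": "X", "Metadata_Well": "a", "Cells_x": 0.3, "Nuclei_y": 0.3},
    {"Metadata_Plate": "X", "Metadata_Well": "a", "Cells_x": 0.8, "Nuclei_y": 0.1},
    {"Metadata_Plate": "X", "Metadata_Well": "b", "Cells_x": 0.4, "Nuclei_y": -0.8},
    {"Metadata_Plate": "X", "Metadata_Well": "b", "Cells_x": 0.2, "Nuclei_y": 1.2},
    {"Metadata_Plate": "X", "Metadata_Well": "b", "Cells_x": -0.5, "Nuclei_y": -0.5},
]


def aggregate_by_strata(rows, strata, features):
    """Collapse rows sharing the same strata values into one median profile."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[col] for col in strata)].append(row)
    aggregated = []
    for key, members in groups.items():
        profile = dict(zip(strata, key))
        for feature in features:
            profile[feature] = median(member[feature] for member in members)
        aggregated.append(profile)
    return aggregated


profiles = aggregate_by_strata(single_cells, strata, features)
for profile in profiles:
    print(profile)
```

One profile per unique combination of strata values comes out, which is why the choice of strata determines the level (for example, well) of the aggregated profiles.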

aggregate_profiles(compute_subsample=False, output_file=None, compression_options=None, float_format=None, n_aggregation_memory_strata=1, **kwargs)

Aggregate and merge compartments. This is the primary entry to this class.

Parameters:
  • compute_subsample (bool, default False) – Whether or not to compute subsample. compute_subsample must be specified to perform subsampling. The function aggregate_profiles(compute_subsample=True) will apply subsetting even if subsample is initialized.

  • output_file (str, optional) – The name of a file to output. We recommend that, if provided, the output file be suffixed with “_augmented”.

  • compression_options (str, optional) – Compression arguments as input to pandas.to_csv() with pandas version >= 1.2.

  • float_format (str, optional) – Decimal precision to use in writing output file.

  • n_aggregation_memory_strata (int, default 1) – Number of unique strata to pull from the database into working memory at once. Typically 1 is fastest. A larger number uses more memory.

Returns:

If output_file is None, returns a pandas DataFrame; otherwise, writes to file and returns the file path.

Return type:

pandas.core.frame.DataFrame or str

count_cells(compartment='cells', count_subset=False)

Determine how many cells are measured per well.

Parameters:
  • compartment (str, default "cells") – Compartment to subset.

  • count_subset (bool, default False) – Whether or not to count the number of cells as specified by the strata groups.

Returns:

DataFrame of cell counts in the experiment.

Return type:

pandas.core.frame.DataFrame

count_sql_table_rows(table)

Count total number of rows for a table.

get_sql_table_col_names(table)

Get column names from the database.
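
For orientation, the two database helpers above correspond to queries of the following kind. This is a sketch using the standard-library sqlite3 module on an in-memory database with a hypothetical cells table; the class itself may issue different SQL.

```python
import sqlite3

# Build a small in-memory database standing in for a single-cell SQLite file.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cells ("
    "TableNumber INTEGER, ImageNumber INTEGER, "
    "ObjectNumber INTEGER, Cells_Area REAL)"
)
conn.executemany(
    "INSERT INTO cells VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 120.0), (1, 1, 2, 150.0), (1, 2, 1, 98.5)],
)

# Count the total number of rows in the table.
row_count = conn.execute("SELECT COUNT(*) FROM cells").fetchone()[0]

# List column names; the second field of each PRAGMA row is the column name.
col_names = [row[1] for row in conn.execute("PRAGMA table_info(cells)")]

conn.close()
print(row_count, col_names)
```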

get_subsample(df=None, compartment='cells', rename_col=True)

Apply the subsampling procedure.

Parameters:
  • df (pandas.core.frame.DataFrame) – DataFrame of a single cell profile.

  • compartment (str, default "cells") – The compartment to process.

  • rename_col (bool, default True) – Whether or not to rename the columns.

Returns:

Nothing is returned.

Return type:

None

load_compartment(compartment)

Creates the compartment dataframe.

Note: makes use of default_datatype_float attribute for setting a default floating point datatype.

Parameters:

compartment (str) – The compartment to process.

Returns:

Compartment dataframe.

Return type:

pandas.core.frame.DataFrame

load_image(image_table_name=None)

Load image table from sqlite file

Returns:

Nothing is returned.

Return type:

None

merge_single_cells(compute_subsample: bool = False, sc_output_file: str | None = None, compression_options: str | None = None, float_format: str | None = None, single_cell_normalize: bool = False, normalize_args: dict | None = None, platemap: str | DataFrame | None = None, **kwargs)

Given the linking columns, merge single cell data. Normalization is also supported.

Parameters:
  • compute_subsample (bool, default False) – Whether or not to compute subsample.

  • sc_output_file (str, optional) – The name of a file to output.

  • compression_options (str, optional) – Compression arguments as input to pandas.to_csv() with pandas version >= 1.2.

  • float_format (str, optional) – Decimal precision to use in writing output file.

  • single_cell_normalize (bool, default False) – Whether or not to normalize the single cell data.

  • normalize_args (dict, optional) – Additional arguments passed as input to pycytominer.normalize().

  • platemap (str or pd.DataFrame, default None) – Optional platemap, given as a file path or a pd.DataFrame, applied to the results via annotate.

Returns:

If sc_output_file is None, returns a pandas DataFrame; otherwise, writes to file and returns the file path.

Return type:

pandas.core.frame.DataFrame or str

set_output_file(output_file)

Setting operation to conveniently rename output file.

Parameters:

output_file (str) – New output file name.

Returns:

Nothing is returned.

Return type:

None

set_subsample_frac(subsample_frac)

Setting operation to conveniently update the subsample fraction.

Parameters:

subsample_frac (float, default 1) – Fraction of single cells to select (0 < subsample_frac <= 1).

Returns:

Nothing is returned.

Return type:

None

set_subsample_n(subsample_n)

Setting operation to conveniently update the subsample n.

Parameters:

subsample_n (int, default "all") – The number of samples to subsample. Do not specify both subsample_frac and subsample_n.

Returns:

Nothing is returned.

Return type:

None

set_subsample_random_state(random_state)

Setting operation to conveniently update the subsample random state.

Parameters:

random_state (int, optional) – The random state used to initialize subsampling.

Returns:

Nothing is returned.

Return type:

None

split_column_categories(col_names)

Split a list of column names into feature and metadata columns lists.

subsample_profiles(df, rename_col=True)

Sample a Pandas DataFrame given subsampling information.

Parameters:
  • df (pandas.core.frame.DataFrame) – DataFrame of a single cell profile.

  • rename_col (bool, default True) – Whether or not to rename the columns.

Returns:

A subsampled pandas dataframe of single cell profiles.

Return type:

pandas.core.frame.DataFrame
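
The subsampling options (a fraction, a fixed count, and a random state) interact as sketched below. This is a standard-library approximation with a hypothetical helper, subsample_rows; it is not the pandas-based code the class uses, and the rounding of the fractional count is an assumption.

```python
import random


def subsample_rows(rows, frac=1, n="all", random_state=None):
    """Pick a reproducible random subset by fraction or by count, not both."""
    if frac != 1 and n != "all":
        raise ValueError("Specify only one of frac and n.")
    if n != "all":
        k = min(int(n), len(rows))
    else:
        # Assumption: the fractional count is rounded to the nearest integer.
        k = int(round(frac * len(rows)))
    # A dedicated generator keeps the draw reproducible for a given seed.
    rng = random.Random(random_state)
    return rng.sample(rows, k)


# Eight toy cells identified by a hypothetical object number.
cells = [{"Metadata_ObjectNumber": i} for i in range(1, 9)]

by_fraction = subsample_rows(cells, frac=0.25, random_state=42)
repeat = subsample_rows(cells, frac=0.25, random_state=42)
by_count = subsample_rows(cells, n=3, random_state=0)

print(len(by_fraction), by_fraction == repeat, len(by_count))
```

Fixing the random state makes repeated runs select the same cells, which matters when the same subsample must be applied to several compartments.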

pycytominer.cyto_utils.collate module

pycytominer.cyto_utils.collate.collate(batch, config, plate, base_directory='../..', column=None, munge=False, csv_dir='analysis', aws_remote=None, aggregate_only=False, tmp_dir='/tmp', overwrite=False, add_image_features=True, image_feature_categories=['Granularity', 'Texture', 'ImageQuality', 'Threshold'], printtoscreen=True)

Collate the CellProfiler-created CSVs into a single SQLite file by calling cytominer-database

Parameters:
  • batch (str) – Batch name to process

  • config (str) – Config file to pass to cytominer-database

  • plate (str) – Plate name to process

  • base_directory (str, default "../..") – Base directory for subdirectories containing CSVs, backends, etc; in our preferred structure, this is the “workspace” directory

  • column (str, optional, default None) – An existing column to be explicitly copied to a new column called Metadata_Plate if no Metadata_Plate column already explicitly exists

  • munge (bool, default False) – Whether munge should be passed to cytominer-database. If True, cytominer-database will expect a single all-object CSV and will split each object into its own table

  • csv_dir (str, default 'analysis') – The directory under the base directory where the analysis CSVs will be found. If running the analysis pipeline, this should nearly always be “analysis”

  • aws_remote (str, optional, default None) – A remote AWS prefix. If set, CSV files will be synced down from it at the beginning of the run, and SQLite files will be synced up to it at the end

  • aggregate_only (bool, default False) – Whether to perform only the aggregation of existent SQLite files and bypass previous collation steps

  • tmp_dir (str, default '/tmp') – The temporary directory to be used by cytominer-database for output

  • overwrite (bool, optional, default False) – Whether or not to overwrite a SQLite file that already exists in the temporary directory

  • add_image_features (bool, optional, default True) – Whether or not to add the image features to the profiles

  • image_feature_categories (list, optional, default ['Granularity', 'Texture', 'ImageQuality', 'Threshold']) – The list of image feature groups to be used by add_image_features during aggregation

  • printtoscreen (bool, optional, default True) – Whether or not to print output to the terminal

pycytominer.cyto_utils.collate.run_check_errors(cmd)

Run a system command, and exit if an error occurred, otherwise continue

pycytominer.cyto_utils.features module

Utility functions to manipulate CellProfiler features

pycytominer.cyto_utils.features.convert_compartment_format_to_list(compartments: list[str] | str) → list[str]

Converts compartment to a list.

Parameters:

compartments (list of str or str) – Cell Painting compartment(s).

Returns:

compartments – List of Cell Painting compartments.

Return type:

list of str

pycytominer.cyto_utils.features.count_na_features(population_df, features)

Given a population dataframe and features, count the number of NA values per feature.

Parameters:
  • population_df (pandas.core.frame.DataFrame) – DataFrame of profiles.

  • features (list of str) – Features present in the population dataframe.

Return type:

Dataframe of NA counts per feature

pycytominer.cyto_utils.features.drop_outlier_features(population_df, features='infer', samples='all', outlier_cutoff=500)

Exclude a feature if its min or max absolute value is greater than the threshold.

Parameters:
  • population_df (pandas.core.frame.DataFrame) – DataFrame that includes metadata and observation features.

  • features (list of str or str, default "infer") – Features present in the population dataframe. If “infer”, then assume CellProfiler feature conventions (start with “Cells_”, “Nuclei_”, or “Cytoplasm_”)

  • samples (str, default "all") – List of samples to perform operation on. The function uses a pd.DataFrame.query() function, so you should structure samples in this fashion. An example is “Metadata_treatment == ‘control’” (include all quotes). If “all”, use all samples to calculate.

  • outlier_cutoff (int or float, default 500) – Threshold to remove features if their absolute values are greater

Returns:

outlier_features – Features greater than the threshold.

Return type:

list of str
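
The exclusion rule stated above (flag a feature when its largest absolute value exceeds the cutoff) can be sketched in a few lines. The helper find_outlier_features and the toy profiles are hypothetical; this is not the library's pandas implementation.

```python
def find_outlier_features(rows, features, outlier_cutoff=500):
    """Return features whose largest absolute value exceeds the cutoff."""
    outliers = []
    for feature in features:
        largest = max(abs(row[feature]) for row in rows)
        if largest > outlier_cutoff:
            outliers.append(feature)
    return outliers


# Toy profiles: Cells_x has an extreme negative value, Nuclei_y does not.
profiles = [
    {"Cells_x": 0.1, "Nuclei_y": 0.5},
    {"Cells_x": -750.0, "Nuclei_y": 0.3},
    {"Cells_x": 0.8, "Nuclei_y": 499.0},
]

outlier_features = find_outlier_features(profiles, ["Cells_x", "Nuclei_y"])
print(outlier_features)
```

Taking the absolute value first means that a large negative minimum and a large positive maximum are treated the same way.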

pycytominer.cyto_utils.features.get_blocklist_features(blocklist_file='/home/docs/checkouts/readthedocs.org/user_builds/pycytominer/checkouts/551/pycytominer/cyto_utils/../data/blocklist_features.txt', population_df=None)

Get a list of blocklist features.

Parameters:
  • blocklist_file (path-like object) – Location of the dataframe with features to exclude.

  • population_df (pandas.core.frame.DataFrame, optional) – Profile dataframe used to subset blocklist features.

Returns:

blocklist_features – Features to exclude from downstream analysis.

Return type:

list of str

pycytominer.cyto_utils.features.infer_cp_features(population_df, compartments=['Cells', 'Nuclei', 'Cytoplasm'], metadata=False, image_features=False)

Given CellProfiler output data read as a DataFrame, output feature column names as a list.

Parameters:
  • population_df (pandas.core.frame.DataFrame) – DataFrame from which features are to be inferred.

  • compartments (list of str, default ["Cells", "Nuclei", "Cytoplasm"]) – Compartments from which Cell Painting features were extracted.

  • metadata (bool, default False) – Whether or not to infer metadata features. If metadata is set to True, find column names that begin with the Metadata_ prefix. This convention is expected by CellProfiler defaults.

  • image_features (bool, default False) – Whether or not the profiles contain image features.

Returns:

features – List of Cell Painting features.

Return type:

list of str
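
Prefix-based inference of this kind can be sketched as follows. The helper infer_features_by_prefix is hypothetical and covers only the compartment and Metadata_ prefixes described above; the library function has additional behavior (for example, image features) that is not reproduced here.

```python
def infer_features_by_prefix(
    columns, compartments=("Cells", "Nuclei", "Cytoplasm"), metadata=False
):
    """Select column names by the CellProfiler naming convention."""
    if metadata:
        return [col for col in columns if col.startswith("Metadata_")]
    # str.startswith accepts a tuple, so one call checks every compartment.
    prefixes = tuple(f"{compartment}_" for compartment in compartments)
    return [col for col in columns if col.startswith(prefixes)]


columns = [
    "Metadata_Plate",
    "Metadata_Well",
    "Cells_AreaShape_Area",
    "Nuclei_Intensity_MeanIntensity_DNA",
    "Cytoplasm_Texture_Contrast_RNA_3_0",
    "ObjectNumber",
]

features = infer_features_by_prefix(columns)
metadata_cols = infer_features_by_prefix(columns, metadata=True)
print(features)
print(metadata_cols)
```

Columns that match neither convention, such as ObjectNumber here, fall into neither list.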

pycytominer.cyto_utils.features.label_compartment(cp_features, compartment, metadata_cols)

Assign a compartment label to each feature as a prefix.

Parameters:
  • cp_features (list of str) – All features being used.

  • compartment (str) – Measured compartment.

  • metadata_cols (list) – Columns that should be considered metadata.

Returns:

cp_features – Recoded column names with appropriate metadata and compartment labels.

Return type:

list of str

pycytominer.cyto_utils.load module

pycytominer.cyto_utils.load.infer_delim(file: str)

Sniff the delimiter in the given file

Parameters:

file (str) – File name

Return type:

The delimiter used in the file (typically a tab or a comma)
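
One common way to sniff a delimiter is the standard-library csv.Sniffer, sketched below on in-memory text. Whether infer_delim works exactly this way is not stated here; the helper sniff_delimiter and the restriction to comma and tab candidates are assumptions.

```python
import csv


def sniff_delimiter(text_sample, candidates=",\t"):
    """Guess the delimiter of a text sample, considering only the candidates."""
    return csv.Sniffer().sniff(text_sample, delimiters=candidates).delimiter


tab_sample = "Metadata_Plate\tMetadata_Well\tCells_x\nX\ta\t0.1\nX\tb\t0.4\n"
comma_sample = "Metadata_Plate,Metadata_Well,Cells_x\nX,a,0.1\nX,b,0.4\n"

tab_delim = sniff_delimiter(tab_sample)
comma_delim = sniff_delimiter(comma_sample)
print(repr(tab_delim), repr(comma_delim))
```

Restricting the candidates avoids false positives from characters such as underscores that also recur in column names.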

pycytominer.cyto_utils.load.is_path_a_parquet_file(file: str | PurePath) → bool

Checks if the provided file path is a parquet file.

Identify parquet files by inspecting the file extensions. If the file does not end with parquet, this will return False, else True.

Parameters:

file (Union[str, pathlib.PurePath]) – path to parquet file

Returns:

Returns True if the file path contains .parquet, else it will return False

Return type:

bool

Raises:
  • TypeError – Raised if the file parameter is neither a str nor a path object

  • FileNotFoundError – Raised if the provided file path does not exist
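
An extension check of the kind described can be sketched with pathlib. The helper has_parquet_suffix is hypothetical; it omits the existence check that raises FileNotFoundError, and treating the suffix case-insensitively is an assumption.

```python
from pathlib import PurePath


def has_parquet_suffix(file):
    """Return True when the final file extension is .parquet."""
    if not isinstance(file, (str, PurePath)):
        raise TypeError("file must be a str or a path object")
    # Assumption: compare case-insensitively, so ".PARQUET" also matches.
    return PurePath(file).suffix.lower() == ".parquet"


print(has_parquet_suffix("profiles.parquet"))
print(has_parquet_suffix("profiles.csv.gz"))
print(has_parquet_suffix(PurePath("data") / "profiles.PARQUET"))
```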

pycytominer.cyto_utils.load.load_npz_features(npz_file, fallback_feature_prefix='DP', metadata=True)

Load an npz file storing features and, sometimes, metadata.

The function will first search the .npz file for a metadata column called “Metadata_Model”. If the field exists, the function uses this entry as the feature prefix. If it doesn’t exist, use the fallback_feature_prefix.

If the npz file does not exist, this function returns an empty dataframe.

Parameters:
  • npz_file (str) – file path to the compressed output (typically DeepProfiler output)

  • fallback_feature_prefix (str) – a string to prefix all features [default: “DP”].

Returns:

df – pandas DataFrame of profiles

Return type:

pandas.core.frame.DataFrame

pycytominer.cyto_utils.load.load_npz_locations(npz_file, location_x_col_index=0, location_y_col_index=1)

Load an npz file storing locations and, sometimes, metadata.

The function will first search the .npz file for a metadata column called “locations”. If the field exists, the function uses this entry as the feature prefix.

If the npz file does not exist, this function returns an empty dataframe.

Parameters:
  • npz_file (str) – file path to the compressed output (typically DeepProfiler output)

  • location_x_col_index (int) – index of the x location column (which column in DP output has X coords)

  • location_y_col_index (int) – index of the y location column (which column in DP output has Y coords)

Returns:

df – pandas DataFrame of profiles

Return type:

pandas.core.frame.DataFrame

pycytominer.cyto_utils.load.load_platemap(platemap, add_metadata_id=True)

Unless a dataframe is provided, load the given platemap dataframe from path or string

Parameters:
  • platemap (pandas dataframe) – location or actual pandas dataframe of platemap file

  • add_metadata_id (bool) – Whether “Metadata_” should be prepended to all platemap column names

Returns:

platemap – pandas DataFrame of profiles

Return type:

pandas.core.frame.DataFrame

pycytominer.cyto_utils.load.load_profiles(profiles)

Unless a dataframe is provided, load the given profile dataframe from path or string

Parameters:

profiles ({str, pathlib.Path, pandas.DataFrame}) – file location or actual pandas dataframe of profiles

Returns:

pandas DataFrame of profiles

Raises:

FileNotFoundError – Raised if the provided profile does not exist

pycytominer.cyto_utils.modz module

pycytominer.cyto_utils.modz.modz(population_df, replicate_columns, features='infer', method='spearman', min_weight=0.01, precision=4)

Collapse replicates into a consensus signature using a weighted transformation

Parameters:
  • population_df (pandas.core.frame.DataFrame) – DataFrame that includes metadata and observation features.

  • replicate_columns (str, list) – a string or list of column(s) in the population dataframe that indicate replicate level information

  • features (list, default "infer") – A list of strings corresponding to feature measurement column names in the population_df DataFrame. All features listed must be found in population_df. Defaults to “infer”. If “infer”, then assume CellProfiler features are those prefixed with “Cells”, “Nuclei”, or “Cytoplasm”.

  • method (str, default "spearman") – The correlation metric to use.

  • min_weight (float, default 0.01) – The minimum correlation; all non-negative values below it are clipped up to this minimum.

  • precision (int, default 4) – The number of significant digits to round weights to.

Returns:

modz_df – Consensus signatures with metadata for all replicates in the given DataFrame

Return type:

pandas.core.frame.DataFrame

pycytominer.cyto_utils.modz.modz_base(population_df, method='spearman', min_weight=0.01, precision=4)

Perform a modified z score transformation.

This code is modified from cmapPy (see https://github.com/cytomining/pycytominer/issues/52). Note that this will apply the transformation to the FULL population_df. See modz() for replicate level procedures.

Parameters:
  • population_df (pandas.core.frame.DataFrame) – DataFrame that includes metadata and observation features.

  • method (str, default "spearman") – The correlation metric to use.

  • min_weight (float, default 0.01) – The minimum correlation; all non-negative values below it are clipped up to this minimum.

  • precision (int, default 4) – The number of significant digits to round weights to.

Returns:

modz_df – modz transformed dataframe - a consensus signature of the input data weighted by replicate correlation

Return type:

pandas.core.frame.DataFrame
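
The general idea of a consensus weighted by replicate correlation can be sketched as below. Several details are assumptions rather than documented behavior: Pearson correlation is used for brevity instead of the default spearman, each replicate's weight is its mean clipped correlation with the other replicates, and weights are rounded to decimal places. The helpers pearson and weighted_consensus are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length profiles."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def weighted_consensus(replicates, min_weight=0.01, precision=4):
    """Combine replicates, down-weighting those that disagree with the rest."""
    n = len(replicates)
    raw_weights = []
    for i in range(n):
        # Correlate replicate i with every other replicate, clipping from below.
        clipped = [
            max(pearson(replicates[i], replicates[j]), min_weight)
            for j in range(n)
            if j != i
        ]
        raw_weights.append(sum(clipped) / (n - 1))
    total = sum(raw_weights)
    # Normalize so the weights sum to roughly one, then round.
    weights = [round(w / total, precision) for w in raw_weights]
    n_features = len(replicates[0])
    consensus = [
        sum(weights[i] * replicates[i][f] for i in range(n))
        for f in range(n_features)
    ]
    return consensus, weights


# Two concordant replicates and one discordant replicate (toy values).
replicates = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 2.9, 4.2],
    [4.0, 1.0, 3.5, 0.5],
]

consensus, weights = weighted_consensus(replicates)
print(weights)
print(consensus)
```

The discordant third replicate receives only the clipped minimum contribution, so the consensus stays close to the two replicates that agree.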

pycytominer.cyto_utils.output module

Utility functions to compress output data

pycytominer.cyto_utils.output.check_compression_method(compression: str)

Ensure compression options are set properly

Parameters:

compression (str) – The category of compression options available

Returns:

Asserts available options

Return type:

None

pycytominer.cyto_utils.output.output(df: DataFrame, output_filename: str, output_type: str | None = 'csv', sep: str = ',', float_format: str | None = None, compression_options: str | dict[str, Any] | None = {'method': 'gzip', 'mtime': 1}, **kwargs) → str

Given an output file and compression options, write file to disk

Parameters:
  • df (pandas.core.frame.DataFrame) – a pandas dataframe that will be written to file

  • output_filename (str) – location of file to write

  • output_type (str, default "csv") – type of output file to create

  • sep (str) – file delimiter

  • float_format (str, default None) – Decimal precision to use in writing output file as input to pd.DataFrame.to_csv(float_format=float_format). For example, use “%.3g” for 3 significant digits.

  • compression_options (str or dict, default {"method": "gzip", "mtime": 1}) – Contains compression options as input to pd.DataFrame.to_csv(compression=compression_options). Requires pandas version >= 1.2.

Returns:

returns output_filename

Return type:

str

Examples

import pandas as pd
from pycytominer.cyto_utils import output

data_df = pd.concat(
    [
        pd.DataFrame(
            {
                "Metadata_Plate": "X",
                "Metadata_Well": "a",
                "Cells_x": [0.1, 0.3, 0.8],
                "Nuclei_y": [0.5, 0.3, 0.1],
            }
        ),
        pd.DataFrame(
            {
                "Metadata_Plate": "X",
                "Metadata_Well": "b",
                "Cells_x": [0.4, 0.2, -0.5],
                "Nuclei_y": [-0.8, 1.2, -0.5],
            }
        ),
    ]
).reset_index(drop=True)

output_file = "test.csv.gz"
output(
    df=data_df,
    output_filename=output_file,
    sep=",",
    compression_options={"method": "gzip", "mtime": 1},
    float_format=None,
)
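
The default compression_options fixes the gzip mtime header field to 1. A plausible motivation, stated here as an assumption rather than documented behavior, is byte-for-byte reproducible output files. The sketch below demonstrates the effect with the standard-library gzip module; the helper gzip_bytes is hypothetical.

```python
import gzip
import io


def gzip_bytes(payload, mtime):
    """Compress bytes in memory with an explicit gzip header timestamp."""
    buffer = io.BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode="wb", mtime=mtime) as handle:
        handle.write(payload)
    return buffer.getvalue()


csv_text = b"Metadata_Plate,Metadata_Well,Cells_x\nX,a,0.1\nX,b,0.4\n"

first = gzip_bytes(csv_text, mtime=1)
second = gzip_bytes(csv_text, mtime=1)
other = gzip_bytes(csv_text, mtime=2)

print(first == second)  # same timestamp, identical archives
print(first == other)   # different timestamp, different header bytes
```

Without a fixed mtime, the header records the current time, so compressing identical data twice yields files with different checksums.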

pycytominer.cyto_utils.output.set_compression_method(compression: str | dict | None) → dict[str, Any]

Set the compression options

Parameters:

compression (str or dict) – Contains compression options as input to pd.DataFrame.to_csv(compression=compression_options). pandas version >= 1.2.

Returns:

A formatted dictionary expected by output()

Return type:

compression, dict

pycytominer.cyto_utils.util module

Miscellaneous utility functions

pycytominer.cyto_utils.util.check_aggregate_operation(operation)

Confirm that the input operation for aggregation is currently supported.

Parameters:

operation (str) – Aggregation operation to provide.

Returns:

Correctly formatted operation method.

Return type:

str

pycytominer.cyto_utils.util.check_compartments(compartments)

Checks if the input compartments are noncanonical compartments.

Parameters:

compartments (list of str) – Input compartments.

Returns:

Nothing is returned.

Return type:

None

pycytominer.cyto_utils.util.check_consensus_operation(operation)

Confirm that the input operation for consensus is currently supported.

Parameters:

operation (str) – Consensus operation to provide.

Returns:

Correctly formatted operation method.

Return type:

str

pycytominer.cyto_utils.util.check_correlation_method(method)

Confirm that the input method is currently supported.

Parameters:

method (str) – The correlation metric to use.

Returns:

Correctly formatted correlation method.

Return type:

str

pycytominer.cyto_utils.util.check_fields_of_view(data_fields_of_view, input_fields_of_view)

Confirm that the input list of fields of view is a subset of the list of fields of view in the image table.

Parameters:
  • data_fields_of_view (list of int) – Fields of view in the image table.

  • input_fields_of_view (list of int) – Input fields of view.

Returns:

Nothing is returned.

Return type:

None

pycytominer.cyto_utils.util.check_fields_of_view_format(fields_of_view)

Confirm that the input fields of view are valid.

Parameters:

fields_of_view (list of int) – List of integer fields of view.

Returns:

Correctly formatted fields_of_view variable.

Return type:

str or list of int

pycytominer.cyto_utils.util.check_image_features(image_features, image_columns)

Confirm that the input list of image features are present in the image table

Parameters:
  • image_features (list of str) – Input image features to extract from the image table.

  • image_columns (list of str) – Columns in the image table

Returns:

Nothing is returned.

Return type:

None

pycytominer.cyto_utils.util.extract_image_features(image_feature_categories, image_df, image_cols, strata)

Confirm that the input list of image feature categories is present in the image table and then extract those features.

Parameters:
  • image_feature_categories (list of str) – Input image feature groups to extract from the image table.

  • image_df (pandas.core.frame.DataFrame) – Image dataframe.

  • image_cols (list of str) – Columns to select from the image table.

  • strata (list of str) – The columns to group by when aggregating single cells.

Returns:

  • image_features_df (pandas.core.frame.DataFrame) – Dataframe with extracted image features.

  • image_feature_categories (list of str) – Correctly formatted image feature categories.

pycytominer.cyto_utils.util.get_default_compartments()

Returns default compartments.

Returns:

Default compartments.

Return type:

list of str

pycytominer.cyto_utils.util.get_pairwise_correlation(population_df, method='pearson')

Given a population dataframe, calculate all pairwise correlations.

Parameters:
  • population_df (pandas.core.frame.DataFrame) – Includes metadata and observation features.

  • method (str, default "pearson") – Which correlation matrix to use to test cutoff.

Returns:

Features to exclude from the population_df.

Return type:

list of str

pycytominer.cyto_utils.util.load_known_metadata_dictionary(metadata_file='/home/docs/checkouts/readthedocs.org/user_builds/pycytominer/checkouts/551/pycytominer/cyto_utils/../data/metadata_feature_dictionary.txt')

From a tab separated text file (two columns: [“compartment”, “feature”]), load previously known metadata columns per compartment.

Parameters:

metadata_file (str, optional) – File location of the metadata text file. Uses a default dictionary if you do not specify.

Returns:

Compartment (keys) mappings to previously known metadata (values).

Return type:

dict
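
Reading a two-column, tab-separated file into a mapping from compartment to features can be sketched as follows. The file contents are hypothetical (the real default file is not reproduced here), the presence of a header row is an assumption, and the helper load_metadata_dictionary is not the library function.

```python
import csv
import io
from collections import defaultdict

# Hypothetical contents of a metadata dictionary file with a header row.
metadata_text = (
    "compartment\tfeature\n"
    "cells\tObjectNumber\n"
    "cells\tImageNumber\n"
    "nuclei\tObjectNumber\n"
)


def load_metadata_dictionary(handle):
    """Group the feature column by the compartment column."""
    mapping = defaultdict(list)
    for record in csv.DictReader(handle, delimiter="\t"):
        mapping[record["compartment"]].append(record["feature"])
    return dict(mapping)


known_metadata = load_metadata_dictionary(io.StringIO(metadata_text))
print(known_metadata)
```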

pycytominer.cyto_utils.write_gct module

Transform profiles into a gct (Gene Cluster Text) file. A gct is a tab-delimited text file that traditionally stores gene expression data. File format description: https://clue.io/connectopedia/gct_format

Modified from the cytominer_scripts “write_gct” script, written in R: https://github.com/broadinstitute/cytominer_scripts/blob/master/write_gct.R

pycytominer.cyto_utils.write_gct.write_gct(profiles, output_file, features='infer', meta_features='infer', feature_metadata=None, version='#1.3')

Convert profiles to a .gct file

Parameters:
  • profiles (pandas.core.frame.DataFrame) – DataFrame of profiles.

  • output_file (str) – If provided, will write gct to file.

  • features (list) – A list of strings corresponding to feature measurement column names in the profiles DataFrame. All features listed must be found in profiles. Defaults to “infer”. If “infer”, then assume features are from CellProfiler output and prefixed with “Cells”, “Nuclei”, or “Cytoplasm”.

  • meta_features (list) – A list of strings corresponding to metadata column names in the profiles DataFrame. All features listed must be found in profiles. Defaults to “infer”. If “infer”, then assume metadata features are those prefixed with “Metadata”

  • feature_metadata (pandas.core.frame.DataFrame, default None)

  • version (str, default "#1.3") – Important for gct loading into Morpheus

Returns:

Writes gct to file

Return type:

None
