
Operators

Operator blocks transform molecules within a workflow. Most are 1:1 transforms; some (conformer generation, ionization, stereo enumeration) produce multiple outputs per input.

Block properties to keep in mind when building workflows:

  • Required Inputs — file paths or text values that must be provided before running the workflow. Pass them as constructor keyword arguments in Python (MyBlock(key="value")), or via the MCP agent using run_workflow set_inputs.
  • Output Properties — named properties attached to each output molecule when the block is included. Downstream blocks (filters, selectors, score blocks) can reference these by name.
  • Mutable Parameters — numeric or categorical settings tuned automatically by Bayesian optimization. Set defaults at construction; the optimizer adjusts them during optimize_workflow.

Note: By convention, constructor signatures list only the extra keyword arguments explicitly. However, all required inputs and mutable parameters can also be passed as keyword arguments.

Standardization

cmxflow.operators.standardize.MoleculeStandardizeBlock(canonicalize_tautomers: bool = False, **kwargs: Any)

Bases: MoleculeBlock

Standardize molecules for drug discovery preprocessing.

Applies a standard pipeline: metal disconnection, normalization, salt/fragment removal, and charge neutralization. Optionally canonicalizes tautomers.

Example
workflow.add(
    MoleculeSourceBlock(),
    MoleculeStandardizeBlock(),
    MoleculeSinkBlock()
)

Initialize the MoleculeStandardizeBlock.

Parameters:

  • canonicalize_tautomers (bool, default False): Whether to canonicalize tautomers.
  • **kwargs (Any): Additional keyword arguments passed to set_inputs.

Deduplication

cmxflow.operators.dedup.MoleculeDeduplicateBlock(**kwargs: Any)

Bases: MoleculeBlock

Remove duplicate molecules from a stream based on canonical SMILES.

Keeps the first occurrence and discards subsequent duplicates. Uses RDKit canonical SMILES as the deduplication key. Cannot be parallelized because it relies on shared mutable state.
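The first-occurrence rule can be sketched in plain Python. This is a minimal illustration, not cmxflow's implementation. It assumes each item already carries its canonical SMILES string (in the block itself, RDKit computes that key), and the function name `dedup_first` is hypothetical. The `seen` set is the shared mutable state mentioned above. If the stream were split across workers, each worker would hold its own set, so two copies of a molecule landing in different workers would both survive.

```python
from typing import Callable, Hashable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def dedup_first(items: Iterable[T], key: Callable[[T], Hashable]) -> Iterator[T]:
    """Yield items whose key has not been seen before; the first occurrence wins."""
    seen: set = set()  # shared mutable state consulted by every item in the stream
    for item in items:
        k = key(item)
        if k in seen:
            continue  # later duplicate: discard it
        seen.add(k)
        yield item


# Hypothetical (name, canonical SMILES) records.
records = [("ethanol", "CCO"), ("ethanol_copy", "CCO"), ("methanol", "CO")]
print(list(dedup_first(records, key=lambda r: r[1])))
# → [('ethanol', 'CCO'), ('methanol', 'CO')]
```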

Example
workflow.add(
    MoleculeSourceBlock(),
    MoleculeDeduplicateBlock(),
    MoleculeSinkBlock()
)

Initialize the de-duplication block.

RDKit Methods

cmxflow.operators.method.RDKitBlock(method: Callable[[Chem.Mol], Any] | str, name: str | None = None, **kwargs)

Bases: MoleculeBlock

Apply an arbitrary RDKit method to each molecule in the stream.

The method can be a callable or a dot-separated string path (e.g., "rdkit.Chem.Descriptors.MolWt"). If the method returns a Mol, it replaces the molecule; if it returns a scalar (int, float, str, bool), the value is stored as a molecule property; if it returns None, the molecule is filtered out.
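The following sketch shows the two mechanisms described above: resolving a dot-separated path and dispatching on the return type. It makes several assumptions for illustration:

- A molecule record is a plain dict with `"mol"` and `"props"` keys.
- The molecule class is passed in as a parameter, so the code runs without RDKit.
- `resolve_method`, `apply_method` and `FakeMol` are hypothetical names.
- Raising an error on any other return type is a guess, since the text above only covers Mol, scalar and None results.

```python
import importlib
from typing import Any, Callable, Optional

SCALAR_TYPES = (int, float, str, bool)


def resolve_method(path: str) -> Callable[..., Any]:
    """Resolve "pkg.module.attr" by importing the longest importable module prefix."""
    parts = path.split(".")
    for i in range(len(parts) - 1, 0, -1):
        try:
            obj: Any = importlib.import_module(".".join(parts[:i]))
        except ImportError:
            continue  # this prefix is not a module; try a shorter one
        for attr in parts[i:]:
            obj = getattr(obj, attr)  # walk the remaining attributes
        return obj
    raise ImportError(f"cannot resolve {path!r}")


def apply_method(record: dict, method: Callable[[Any], Any], mol_type: type,
                 name: Optional[str] = None) -> Optional[dict]:
    """Apply method to record["mol"] and dispatch on the type of the result."""
    result = method(record["mol"])
    if result is None:
        return None  # molecule is filtered out of the stream
    if isinstance(result, mol_type):
        return {**record, "mol": result}  # returned molecule replaces the input
    if isinstance(result, SCALAR_TYPES):
        key = name or method.__name__  # explicit name wins over the method name
        return {**record, "props": {**record["props"], key: result}}
    raise TypeError(f"unsupported return type: {type(result).__name__}")  # assumption


class FakeMol:
    """Hypothetical stand-in for rdkit.Chem.Mol."""


sqrt = resolve_method("math.sqrt")  # stdlib stand-in for an RDKit descriptor path
print(apply_method({"mol": 9.0, "props": {}}, sqrt, mol_type=FakeMol))
# → {'mol': 9.0, 'props': {'sqrt': 3.0}}
```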

Output Properties
  • <method_name>: Scalar result stored as a molecule property. The key is the method name (or the explicit name argument).
Example
workflow.add(
    MoleculeSourceBlock(),
    RDKitBlock("rdkit.Chem.Descriptors.MolWt"),
    MoleculeSinkBlock()
)

Initialize with an RDKit method.

Parameters:

  • method (Callable[[Mol], Any] | str, required): RDKit method as a callable or a string path (e.g., "rdkit.Chem.Descriptors.MolWt").
  • name (str | None, default None): Optional property name for scalar results. Defaults to the method name extracted from the callable or path.

Filtering

cmxflow.operators.filter.SubstructureFilterBlock(**kwargs: Any)

Bases: MoleculeBlock

Filter molecules based on substructure matches.

Molecules are flagged using SMARTS patterns and/or built-in RDKit filter catalogs (PAINS, BRENK, NIH, ZINC). Uses OR logic: a molecule is flagged if it matches any pattern or catalog entry. The mode input controls whether matching molecules are removed or kept.

Required Inputs
  • query (text): Space-separated catalog names and/or SMARTS patterns.
  • mode (text): "remove" (default) drops matches; "keep" retains only matches.
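The OR logic and the two modes reduce to a single boolean test per molecule. The sketch below is a minimal illustration with these assumptions:

- Query tokens that exactly match a listed catalog name are treated as catalogs; every other token is treated as SMARTS.
- Matching is delegated to caller-supplied predicates. These are hypothetical stand-ins for RDKit's `FilterCatalog` and substructure matching, so the code runs without RDKit.
- Raising an error on an unknown mode is an assumption.

```python
from typing import Callable, Iterable

CATALOGS = {"PAINS", "BRENK", "NIH", "ZINC"}


def split_query(query: str) -> tuple:
    """Split a space-separated query into (catalog names, SMARTS patterns)."""
    tokens = query.split()
    catalogs = [t for t in tokens if t in CATALOGS]
    smarts = [t for t in tokens if t not in CATALOGS]
    return catalogs, smarts


def keep_molecule(mol: object, checks: Iterable[Callable[[object], bool]],
                  mode: str = "remove") -> bool:
    """Return True if the molecule should stay in the stream."""
    flagged = any(check(mol) for check in checks)  # OR logic: any single hit flags it
    if mode == "remove":
        return not flagged  # drop flagged molecules
    if mode == "keep":
        return flagged  # retain only flagged molecules
    raise ValueError(f"unknown mode: {mode!r}")


# String stand-ins for molecules and for pattern matches.
checks = [lambda m: "N" in m, lambda m: "S" in m]
print(split_query("PAINS BRENK [N+](=O)[O-]"))
print(keep_molecule("CCO", checks), keep_molecule("CCN", checks, mode="keep"))
```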
Example
workflow.add(
    MoleculeSourceBlock(),
    SubstructureFilterBlock(query="PAINS BRENK", mode="remove"),
    MoleculeSinkBlock()
)

Initialize the SubstructureFilterBlock.

cmxflow.operators.filter.PropertyFilterBlock(**kwargs: Any)

Bases: MoleculeBlock

Filter molecules based on numeric property conditions.

Conditions are given in the filters input and combined with AND logic: a molecule must satisfy every condition to pass. Supported syntax: simple comparisons (MW>200), reversed comparisons (200<MW), range expressions (200<MW<500), and comma-separated multiple conditions. Supported operators: <, >, <=, >=, ==, !=.

Required Inputs
  • filters (text): Comma-separated filter expressions (e.g. "200<MolWt<500, logP>0").
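One way to implement the syntax above is to split each comma-separated expression on its operators and treat every adjacent (left, operator, right) triple as one comparison. This handles simple, reversed and range forms uniformly.

This is a minimal sketch, not the block's parser. `parse_filters` and `passes` are hypothetical names. Failing a molecule that lacks a referenced property is an assumption.

```python
import operator
import re
from typing import Mapping

OPS = {"<=": operator.le, ">=": operator.ge, "==": operator.eq,
       "!=": operator.ne, "<": operator.lt, ">": operator.gt}
# Two-character operators come first so "<=" is not split as "<" followed by "=".
OP_RE = re.compile(r"(<=|>=|==|!=|<|>)")

Comparison = tuple[str, str, str]


def parse_filters(text: str) -> list[Comparison]:
    """Turn "200<MolWt<500, logP>0" into a flat list of (left, op, right) triples."""
    comparisons: list[Comparison] = []
    for expr in text.split(","):
        parts = [p.strip() for p in OP_RE.split(expr.strip())]
        if parts == [""]:
            continue  # skip empty expressions such as a trailing comma
        if len(parts) < 3 or len(parts) % 2 == 0 or "" in parts:
            raise ValueError(f"malformed filter: {expr!r}")
        for i in range(0, len(parts) - 2, 2):  # a<b<c yields (a,<,b) and (b,<,c)
            comparisons.append((parts[i], parts[i + 1], parts[i + 2]))
    return comparisons


def _value(token: str, props: Mapping[str, float]) -> float:
    try:
        return float(token)  # numeric literal
    except ValueError:
        return float(props[token])  # property name; raises KeyError if missing


def passes(props: Mapping[str, float], comparisons: list[Comparison]) -> bool:
    """AND logic: every comparison must hold."""
    try:
        return all(OPS[op](_value(a, props), _value(b, props)) for a, op, b in comparisons)
    except KeyError:
        return False  # assumption: a missing property fails the filter


print(parse_filters("200<MolWt<500, logP>0"))
```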
Example
workflow.add(
    MoleculeSourceBlock(),
    RDKitBlock("rdkit.Chem.Descriptors.MolWt"),
    PropertyFilterBlock(filters="200<MolWt<500"),
    MoleculeSinkBlock()
)

Initialize the PropertyFilterBlock.

Selection

cmxflow.operators.select.PropertyHeadBlock(**kwargs: Any)

Bases: PropertySelectBlock

Return molecules with the highest values of a specified property.

Collects all input molecules, sorts by the specified property in descending order, and yields the top N (highest values first).

Required Inputs
  • property (text): Name of the molecule property to sort by.
  • count (text): Number of molecules to return. 0 or empty returns all, sorted.
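Head and tail selection amount to a full sort followed by a slice. This is why the block must collect the whole stream before yielding anything.

The sketch below is minimal and rests on these assumptions:

- Each molecule is represented by a dict of its properties.
- `select_by_property` is a hypothetical name.
- Passing `descending=False` gives the PropertyTailBlock behavior.
- Molecules that lack the property are dropped.

```python
from typing import Any, Iterable, Mapping, Optional


def select_by_property(records: Iterable[Mapping[str, Any]], prop: str,
                       count: Optional[str], descending: bool = True) -> list:
    """Sort by prop and return the first `count` records (all of them if 0 or empty)."""
    n = int(count) if count and count.strip() else 0  # count arrives as text
    ranked = sorted((r for r in records if prop in r),  # assumption: skip records without prop
                    key=lambda r: float(r[prop]), reverse=descending)
    return ranked if n <= 0 else ranked[:n]


mols = [{"name": "a", "MolWt": 180.0}, {"name": "b", "MolWt": 350.0},
        {"name": "c", "MolWt": 46.0}, {"name": "d"}]
print([m["name"] for m in select_by_property(mols, "MolWt", "2")])
# → ['b', 'a']
```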
Example
workflow.add(
    MoleculeSourceBlock(),
    RDKitBlock("rdkit.Chem.Descriptors.MolWt"),
    PropertyHeadBlock(property="MolWt", count="10"),
    MoleculeSinkBlock()
)

Initialize the PropertyHeadBlock.

cmxflow.operators.select.PropertyTailBlock(**kwargs: Any)

Bases: PropertySelectBlock

Return molecules with the lowest values of a specified property.

Collects all input molecules, sorts by the specified property in ascending order, and yields the bottom N (lowest values first).

Required Inputs
  • property (text): Name of the molecule property to sort by.
  • count (text): Number of molecules to return. 0 or empty returns all, sorted.
Example
workflow.add(
    MoleculeSourceBlock(),
    RDKitBlock("rdkit.Chem.Descriptors.MolWt"),
    PropertyTailBlock(property="MolWt", count="10"),
    MoleculeSinkBlock()
)

Initialize the PropertyTailBlock.

2D Similarity

cmxflow.operators.sim2d.MoleculeSimilarityBlock(**kwargs: Any)

Bases: MoleculeBlock

Compute 2D fingerprint similarity against a set of query molecules.

For each input molecule, computes the maximum fingerprint similarity across all query molecules and attaches the score and best-matching query name as properties.

Required Inputs
  • queries (file): Path to query molecule file (SDF, SMILES, etc.).
Output Properties
  • max_similarity: Maximum similarity score to any query molecule.
  • most_similar_query: Name or index of the most similar query molecule.
Example
workflow.add(
    MoleculeSourceBlock(),
    MoleculeSimilarityBlock(queries="reference_ligands.sdf"),
    MoleculeSinkBlock()
)
Mutable Parameters
  • fingerprint_type: Fingerprint algorithm (morgan, rdkit, maccs, atom_pair, topological_torsion).
  • similarity_metric: Similarity function (tanimoto, dice, cosine, sokal, russel).
  • radius: Morgan fingerprint radius (1–4).
  • nbits: Fingerprint bit length (512–4096).
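For binary fingerprints, each listed metric is a simple function of four counts:

- a: on-bits in the input fingerprint
- b: on-bits in the query fingerprint
- c: on-bits in common
- n: total bit length

The sketch below represents a fingerprint as a set of on-bit indices and takes the maximum over all queries, as described above. It is an illustration, not the block's code. The formulas follow the common binary definitions, with Sokal as c/(2a+2b-3c) and Russel as c/n. They are intended to match RDKit's `DataStructs` similarity functions but should be checked against RDKit before relying on exact values.

The following details are assumptions:

- Returning 0.0 when a denominator is zero.
- The 2048-bit default.
- The names `similarity` and `max_similarity`.

```python
import math
from typing import Callable

# a, b: on-bits in each fingerprint; c: on-bits in common; n: total bit length.
METRICS: dict[str, Callable[[int, int, int, int], float]] = {
    "tanimoto": lambda a, b, c, n: c / (a + b - c),
    "dice": lambda a, b, c, n: 2 * c / (a + b),
    "cosine": lambda a, b, c, n: c / math.sqrt(a * b),
    "sokal": lambda a, b, c, n: c / (2 * a + 2 * b - 3 * c),
    "russel": lambda a, b, c, n: c / n,
}


def similarity(fa: frozenset, fb: frozenset, metric: str = "tanimoto", nbits: int = 2048) -> float:
    a, b, c = len(fa), len(fb), len(fa & fb)
    try:
        return METRICS[metric](a, b, c, nbits)
    except ZeroDivisionError:
        return 0.0  # assumption: empty fingerprints score 0


def max_similarity(fp: frozenset, queries: dict, metric: str = "tanimoto",
                   nbits: int = 2048) -> tuple:
    """Return (max_similarity, most_similar_query); the first query wins ties."""
    name, qfp = max(queries.items(), key=lambda kv: similarity(fp, kv[1], metric, nbits))
    return similarity(fp, qfp, metric, nbits), name


A, B = frozenset({1, 2, 3}), frozenset({2, 3, 4})
print({m: round(similarity(A, B, m, 8), 3) for m in METRICS})
print(max_similarity(A, {"q1": B, "q2": A}))
```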

Initialize the similarity search block.

3D Similarity

cmxflow.operators.sim3d.Molecule3DSimilarityBlock(**kwargs: Any)

Bases: MoleculeBlock

Compute 3D molecular similarity against a set of query molecules.

Both input and query molecules must have pre-existing 3D conformers. For each input molecule, computes maximum similarity across all conformer pairs and attaches the result as properties.

Required Inputs
  • query (file): Path to query molecule file with 3D conformers.
Output Properties
  • similarity_3d: Maximum 3D similarity score to any query conformer.
  • most_similar_query_3d: Name of the most similar query molecule.
  • similarity_3d_method: Similarity method used.
  • similarity_3d_conf_id: Conformer ID that gave the best similarity.
Example
workflow.add(
    MoleculeSourceBlock(),
    EnumerateStereoBlock(),
    ConformerGenerationBlock(),
    MoleculeAlignBlock(query="reference.sdf"),
    Molecule3DSimilarityBlock(query="reference.sdf"),
    MoleculeSinkBlock()
)
Mutable Parameters
  • method: Similarity method (shape_tanimoto, shape_tversky, usr, usrcat).
  • tversky_alpha: Tversky alpha parameter (0.0–1.0).
  • tversky_beta: Tversky beta parameter (0.0–1.0).
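The Tversky parameters weight the parts of each shape that the other shape does not cover. With both set to 1 the index reduces to Tanimoto; with both at 0.5 it reduces to Dice.

The sketch below illustrates this under the following assumptions. It is not the block's volume-overlap implementation.

- Shapes are represented as sets of occupied grid cells.
- The standard set-based Tversky index is used.
- The score is the maximum over all input/query conformer pairs, as described above.
- The reported conformer ID is the input conformer's index.
- `tversky` and `best_conformer_pair` are hypothetical names.

```python
from itertools import product
from typing import Sequence


def tversky(a: frozenset, b: frozenset, alpha: float, beta: float) -> float:
    """|A & B| / (|A & B| + alpha * |A - B| + beta * |B - A|)."""
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    return common / denom if denom else 0.0


def best_conformer_pair(input_confs: Sequence[frozenset], query_confs: Sequence[frozenset],
                        alpha: float = 0.5, beta: float = 0.5) -> tuple:
    """Maximum over all (input, query) conformer pairs; returns (score, input conformer index)."""
    best = (-1.0, -1)
    for (i, ci), cq in product(enumerate(input_confs), query_confs):
        s = tversky(ci, cq, alpha, beta)
        if s > best[0]:  # strict '>' keeps the earliest pair on ties
            best = (s, i)
    return best


A, B = frozenset({1, 2, 3, 4}), frozenset({3, 4, 5, 6})
print(tversky(A, B, 1.0, 1.0), tversky(A, B, 0.5, 0.5))
print(best_conformer_pair([B, A], [frozenset({1, 2, 3, 4})], 1.0, 1.0))
```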

Initialize the 3D similarity block.

Ionization

cmxflow.operators.ionize.IonizeMoleculeBlock(ph_min: float = 6.4, ph_max: float = 8.4, **kwargs: Any)

Bases: MoleculeBlock

Generate pH-dependent ionization states using dimorphite_dl.

This is a 1:N transform: one input molecule can yield multiple protonation variants. Includes automatic correction for tertiary amide nitrogens that dimorphite_dl incorrectly protonates.

Example
workflow.add(
    MoleculeSourceBlock(),
    IonizeMoleculeBlock(),
    MoleculeSinkBlock()
)
Mutable Parameters
  • precision: pH precision window around min/max (0.1–3.0).
  • max_variants: Maximum number of ionization variants per molecule (1–128).

Initialize the IonizeMoleculeBlock.

Parameters:

  • ph_min (float, default 6.4): Minimum pH for protonation.
  • ph_max (float, default 8.4): Maximum pH for protonation.
  • **kwargs (Any): Additional keyword arguments passed to set_inputs.

Stereoisomers

cmxflow.operators.confgen.EnumerateStereoBlock()

Bases: MoleculeBlock

Enumerate all stereoisomers of each input molecule.

This is a 1:N transform that yields all possible stereoisomers for each input molecule. Properties from the input molecule are copied to each output stereoisomer.
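The 1:N pattern can be sketched without a chemistry toolkit, including copying the parent's properties onto every output. The sketch models a molecule as a mapping of stereocenters to labels, with None marking an unspecified center. This toy model is an assumption for illustration, and `enumerate_stereo` is a hypothetical name.

Real enumeration, for example RDKit's `EnumerateStereoisomers`, goes further than this sketch:

- It also handles double-bond geometry.
- It can drop duplicate isomers that arise from symmetry.

```python
from itertools import product
from typing import Any, Iterator


def enumerate_stereo(mol: dict) -> Iterator[dict]:
    """Yield one copy of mol per R/S assignment of its unspecified stereocenters."""
    centers: dict[str, Any] = mol["stereo"]
    open_centers = [c for c, label in centers.items() if label is None]
    # n unspecified centers give 2**n outputs; n == 0 yields the molecule once.
    for labels in product(("R", "S"), repeat=len(open_centers)):
        assigned = {**centers, **dict(zip(open_centers, labels))}
        # Each output gets its own copy of the parent's properties.
        yield {"stereo": assigned, "props": dict(mol["props"])}


parent = {"stereo": {"C2": "R", "C5": None, "C7": None}, "props": {"name": "cpd1"}}
for iso in enumerate_stereo(parent):
    print(iso["stereo"], iso["props"])
```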

Example
workflow.add(
    MoleculeSourceBlock(),
    EnumerateStereoBlock(),
    MoleculeSinkBlock()
)

Initialize the stereoisomer enumeration block.

Conformer Generation

cmxflow.operators.confgen.ConformerGenerationBlock(**kwargs: Any)

Bases: MoleculeBlock

Generate 3D conformers using RDKit's ETKDGv3 algorithm.

Molecules must have fully specified stereochemistry before conformer generation. Use EnumerateStereoBlock upstream to resolve any unspecified stereocenters.

Example
workflow.add(
    MoleculeSourceBlock(),
    EnumerateStereoBlock(),
    ConformerGenerationBlock(),
    MoleculeSinkBlock()
)
Mutable Parameters
  • numConfs: Number of conformers to generate (1–100).
  • pruneRmsThresh: RMS threshold for pruning similar conformers (0.0–3.0).
  • useRandomCoords: Use random initial coordinates instead of distance geometry.

Initialize the conformer generation block.

Alignment

cmxflow.operators.align.MoleculeAlignBlock(**kwargs: Any)

Bases: MoleculeBlock

Align input molecule conformers to a set of 3D reference molecules.

For each input molecule, all conformers are aligned to all reference conformers using the selected method. The single best-scoring conformer (highest shape Tanimoto after alignment) is retained and returned. Input molecules must already have 3D conformers.

Required Inputs
  • query (file): Path to reference molecule file with 3D conformers.
Output Properties
  • alignment_shape_similarity: Shape Tanimoto similarity of the best-aligned conformer.
  • alignment_score: RMSD of the best alignment.
  • alignment_reference: Name of the reference molecule used.
  • alignment_method: Alignment algorithm used.
  • alignment_ref_index: Index of the reference molecule used.
  • alignment_mcs: MCS SMARTS pattern (only present when alignment_method is mcs).
Example
workflow.add(
    MoleculeSourceBlock(),
    EnumerateStereoBlock(),
    ConformerGenerationBlock(),
    MoleculeAlignBlock(query="reference.sdf"),
    MoleculeSinkBlock()
)
Mutable Parameters
  • alignment_method: Alignment algorithm (crippen_o3a, mmff_o3a, mcs).

Initialize the molecular alignment block.

Docking

cmxflow.operators.dock.MoleculeDockBlock(**kwargs: Any)

Bases: MoleculeBlock

MoleculeBlock for docking ligands into protein binding sites.

Performs pose optimization using the Vinardo scoring function with configurable term weights. Supports both rigid-body and flexible (torsional) docking. An optional electrostatic complementarity (EC) term can be enabled via the w_ec parameter to reward charge complementarity.

Required Inputs
  • receptor (file): Path to receptor PDB file.
Output Properties
  • docking_initial_pose_score: Score before optimization.
  • docking_score: Final optimized score (Vinardo + EC adjustment).
  • docking_ec: Electrostatic complementarity value (0.0 when w_ec=0).
  • docking_converged: Whether optimization converged.
Example
workflow.add(
    MoleculeSourceBlock(),
    EnumerateStereoBlock(),
    ConformerGenerationBlock(),
    MoleculeAlignBlock(query="reference.sdf"),
    MoleculeDockBlock(receptor="protein.pdb"),
    MoleculeSinkBlock()
)
Mutable Parameters
  • w_gauss1: Vinardo Gaussian attractive term weight.
  • w_repulsion: Vinardo repulsion term weight.
  • w_hydrophobic: Vinardo hydrophobic term weight.
  • w_hbond: Vinardo hydrogen bond term weight.
  • w_ec: Weight for electrostatic complementarity term (0 = disabled).
  • max_iterations: Maximum optimization iterations.
  • box_size: Translation search box size in Angstroms.
  • rigid: If True, only rigid-body optimization (no torsions).

Initialize the molecular docking block.

Clustering

cmxflow.operators.cluster.RepresentativeClusterBlock(**kwargs: Any)

Bases: MoleculeBlock

Assign molecules to clusters using streaming leader clustering.

For each molecule, computes an ECFP4 fingerprint (or Murcko scaffold fingerprint) and compares it against all existing cluster representatives via Tanimoto similarity. If the best similarity meets the threshold, the molecule joins that cluster; otherwise a new cluster is created. All molecules pass through annotated with cluster metadata.
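The streaming leader scheme can be sketched in a few lines. This is a minimal illustration, not the block's code, and rests on these assumptions:

- Fingerprints are modeled as sets of on-bit indices instead of RDKit ECFP4 bit vectors.
- Output keys mirror the documented output properties, but the function names are hypothetical.
- A molecule joins the best-scoring representative, and the earliest cluster wins ties.
- A new leader's similarity is reported as 1.0.
- The default threshold of 0.5 is arbitrary.

Because clusters depend on arrival order, reordering the input can change the result.

```python
from typing import Iterable, Iterator


def tanimoto(a: frozenset, b: frozenset) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def leader_cluster(items: Iterable[tuple], threshold: float = 0.5) -> Iterator[dict]:
    """Annotate each (smiles, fingerprint) pair with cluster metadata as it streams past."""
    reps: list[tuple] = []  # (representative SMILES, fingerprint), one per cluster
    for smiles, fp in items:
        best_id, best_sim = -1, -1.0
        for cid, (_, rep_fp) in enumerate(reps):
            sim = tanimoto(fp, rep_fp)
            if sim > best_sim:
                best_id, best_sim = cid, sim
        if best_sim < threshold:  # no representative is similar enough
            reps.append((smiles, fp))  # this molecule leads a new cluster
            best_id, best_sim = len(reps) - 1, 1.0
        yield {"smiles": smiles, "cluster_id": best_id,
               "cluster_representative": reps[best_id][0],
               "cluster_similarity": best_sim}


data = [("A", frozenset({1, 2, 3, 4})), ("B", frozenset({1, 2, 3, 5})), ("C", frozenset({7, 8, 9}))]
for row in leader_cluster(data, threshold=0.5):
    print(row)
```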

Output Properties
  • cluster_id: Integer index of the assigned cluster.
  • cluster_representative: SMILES of the cluster's representative molecule.
  • cluster_similarity: Tanimoto similarity to the cluster representative.
Example
workflow.add(
    MoleculeSourceBlock(),
    RepresentativeClusterBlock(),
    MoleculeSinkBlock()
)
Mutable Parameters
  • threshold: Tanimoto similarity threshold for cluster assignment (0.05–0.95).
  • scaffold: If True, cluster by Murcko scaffold fingerprint instead of full molecule.

Initialize the representative cluster block.