Operators

Operator blocks transform molecules within a workflow. Most are 1:1 transforms; some (conformer generation, ionization, stereo enumeration) produce multiple outputs per input.

Block properties to keep in mind when building workflows:

  • Required Inputs — file paths or text values that must be provided before running the workflow. Pass them as constructor keyword arguments in Python (MyBlock(key="value")), or via the MCP agent using run_workflow set_inputs.
  • Output Properties — named properties attached to each output molecule when the block is included. Downstream blocks (filters, selectors, score blocks) can reference these by name.
  • Mutable Parameters — numeric or categorical settings tuned automatically by Bayesian optimization. Set defaults at construction; the optimizer adjusts them during optimize_workflow.

Note: By convention, constructor signatures explicitly list only the extra keyword arguments. All required inputs and mutable parameters can still be passed as keyword arguments.

Standardization

cmxflow.operators.standardize.MoleculeStandardizeBlock(canonicalize_tautomers: bool = False, **kwargs: Any)

Bases: MoleculeBlock

Standardize molecules for drug discovery preprocessing.

Applies a standard pipeline: metal disconnection, normalization, salt/fragment removal, and charge neutralization. Optionally canonicalizes tautomers.

Example
workflow.add(
    MoleculeSourceBlock(),
    MoleculeStandardizeBlock(),
    MoleculeSinkBlock()
)

Initialize the MoleculeStandardizeBlock.

Parameters:

Name                    Type  Description                                         Default
canonicalize_tautomers  bool  Whether to canonicalize tautomers.                  False
**kwargs                Any   Additional keyword arguments passed to set_inputs.  {}

Deduplication

cmxflow.operators.dedup.MoleculeDeduplicateBlock(**kwargs: Any)

Bases: MoleculeBlock

Remove duplicate molecules from a stream based on canonical SMILES.

Keeps the first occurrence and discards subsequent duplicates. Uses RDKit canonical SMILES as the deduplication key. Cannot be parallelized because it relies on shared mutable state.

Example
workflow.add(
    MoleculeSourceBlock(),
    MoleculeDeduplicateBlock(),
    MoleculeSinkBlock()
)

Initialize the de-duplication block.
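
The keep-first behaviour can be sketched in plain Python. This is an illustration only, not the cmxflow implementation: deduplicate and its key argument are hypothetical names, and the strings stand in for the canonical SMILES that the block derives with RDKit. The seen set plays the role of the shared mutable state that rules out parallel execution.

```python
from typing import Callable, Hashable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def deduplicate(
    items: Iterable[T], key: Callable[[T], Hashable] = lambda x: x
) -> Iterator[T]:
    """Yield each item whose key has not been seen before (hypothetical helper)."""
    seen = set()  # shared state: every item must be checked against it in order
    for item in items:
        k = key(item)
        if k in seen:
            continue  # later duplicate: discard
        seen.add(k)
        yield item  # first occurrence: keep


# Toy stream; the strings are assumed to be canonical SMILES already.
stream = ["CCO", "c1ccccc1", "CCO", "CC", "c1ccccc1"]
unique = list(deduplicate(stream))
```

Because the result depends on arrival order, splitting the stream across workers would need a shared or merged key set, which is consistent with the block not being parallelizable.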

RDKit Methods

cmxflow.operators.method.RDKitBlock(method: Callable[[Chem.Mol], Any] | str, name: str | None = None, **kwargs)

Bases: MoleculeBlock

Apply an arbitrary RDKit method to each molecule in the stream.

The method can be a callable or a dot-separated string path (e.g., "rdkit.Chem.Descriptors.MolWt"). If the method returns a Mol, it replaces the molecule; if it returns a scalar (int, float, str, bool), the value is stored as a molecule property; if it returns None, the molecule is filtered out.

Output Properties
  • <method_name>: Scalar result stored as a molecule property. The key is the method name (or the explicit name argument).
Example
workflow.add(
    MoleculeSourceBlock(),
    RDKitBlock("rdkit.Chem.Descriptors.MolWt"),
    MoleculeSinkBlock()
)

Initialize with an RDKit method.

Parameters:

Name    Type                        Description                                                                                              Default
method  Callable[[Mol], Any] | str  RDKit method as callable or string path (e.g., "rdkit.Chem.Descriptors.MolWt").                          required
name    str | None                  Optional property name for scalar results. Defaults to the method name extracted from callable or path.  None
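
The return-value rules described for RDKitBlock (Mol replaces, scalar is stored, None filters) can be sketched without RDKit. Everything here is an assumption made for illustration: ToyMol, resolve_method and apply_method are hypothetical names, the dotted-path lookup via importlib is one plausible way to resolve a string path, and raising TypeError for other return types is a choice of this sketch rather than documented behaviour.

```python
import importlib
from typing import Any, Callable, Optional, Union


class ToyMol:
    """Hypothetical stand-in for an RDKit Mol: a payload plus a property dict."""

    def __init__(self, payload: Any) -> None:
        self.payload = payload
        self.props: dict = {}


def resolve_method(method: Union[str, Callable]) -> Callable:
    """Return a callable, importing it first if a dot-separated path was given."""
    if callable(method):
        return method
    module_name, _, attr = method.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def apply_method(
    mol: ToyMol, method: Union[str, Callable], name: Optional[str] = None
) -> Optional[ToyMol]:
    """Apply the method and dispatch on the type of its return value."""
    func = resolve_method(method)
    key = name or getattr(func, "__name__", "result")
    result = func(mol)
    if result is None:
        return None  # molecule is filtered out
    if isinstance(result, ToyMol):
        return result  # returned molecule replaces the input
    if isinstance(result, (int, float, str, bool)):
        mol.props[key] = result  # scalar is stored as a property
        return mol
    # Assumption of this sketch: anything else is rejected.
    raise TypeError(f"unsupported return type: {type(result).__name__}")


def payload_length(mol: ToyMol) -> int:
    return len(mol.payload)


annotated = apply_method(ToyMol("CCO"), payload_length)
renamed = apply_method(ToyMol("CCO"), payload_length, name="n_chars")
dropped = apply_method(ToyMol("CCO"), lambda m: None)
sqrt = resolve_method("math.sqrt")  # string paths resolve to real callables
```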

Filtering

cmxflow.operators.filter.SubstructureFilterBlock(**kwargs: Any)

Bases: MoleculeBlock

Filter molecules based on substructure matches.

Molecules are flagged using SMARTS patterns and/or built-in RDKit filter catalogs (PAINS, BRENK, NIH, ZINC). Uses OR logic: a molecule is flagged if it matches any pattern or catalog entry. The mode input controls whether matching molecules are removed or kept.

Required Inputs
  • query (text): Space-separated catalog names and/or SMARTS patterns.
  • mode (text): "remove" (default) drops matches; "keep" retains only matches.
Example
workflow.add(
    MoleculeSourceBlock(),
    SubstructureFilterBlock(query="PAINS BRENK", mode="remove"),
    MoleculeSinkBlock()
)

Initialize the SubstructureFilterBlock.
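
A minimal sketch of the OR logic and the mode switch, using toy predicates in place of real SMARTS or catalog matching. The function name substructure_filter and the substring tests are hypothetical; a substring test on a SMILES string is not a substructure search and only stands in for one here.

```python
from typing import Callable, Iterable, Iterator, Sequence


def substructure_filter(
    mols: Iterable[str],
    patterns: Sequence[Callable[[str], bool]],
    mode: str = "remove",
) -> Iterator[str]:
    """Flag a molecule if ANY pattern matches, then drop or keep the flagged ones."""
    if mode not in ("remove", "keep"):
        raise ValueError(f"mode must be 'remove' or 'keep', got {mode!r}")
    for mol in mols:
        flagged = any(pattern(mol) for pattern in patterns)  # OR logic
        if flagged == (mode == "keep"):
            yield mol


# Toy predicates standing in for SMARTS patterns or catalog entries.
has_nitro = lambda smi: "[N+](=O)[O-]" in smi
has_bromine = lambda smi: "Br" in smi

mols = ["CCO", "CC[N+](=O)[O-]", "CCBr"]
clean = list(substructure_filter(mols, [has_nitro, has_bromine], mode="remove"))
hits = list(substructure_filter(mols, [has_nitro, has_bromine], mode="keep"))
```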

cmxflow.operators.filter.PropertyFilterBlock(**kwargs: Any)

Bases: MoleculeBlock

Filter molecules based on numeric property conditions.

Conditions are specified in the filters input text using AND logic — a molecule must satisfy every condition to pass. Supported syntax: simple comparisons (MW>200), reversed comparisons (200<MW), range expressions (200<MW<500), and comma-separated multiple conditions. Supported operators: <, >, <=, >=, ==, !=.

Required Inputs
  • filters (text): Comma-separated filter expressions (e.g. "200<MolWt<500, logP>0").
Example
workflow.add(
    MoleculeSourceBlock(),
    RDKitBlock("rdkit.Chem.Descriptors.MolWt"),
    PropertyFilterBlock(filters="200<MolWt<500"),
    MoleculeSinkBlock()
)

Initialize the PropertyFilterBlock.
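
The filter syntax can be illustrated with a small evaluator. This is a sketch under stated assumptions, not the parser cmxflow uses: passes_filters is a hypothetical name, properties are read from a plain dict, and a missing property raises KeyError, which may differ from the block's behaviour. Simple, reversed and range expressions are all handled as a chain of adjacent comparisons.

```python
import operator
import re
from typing import Dict

OPS = {
    "<": operator.lt, ">": operator.gt, "<=": operator.le,
    ">=": operator.ge, "==": operator.eq, "!=": operator.ne,
}
# Two-character operators come first so "<=" is not split into "<" and "=".
OP_PATTERN = re.compile(r"(<=|>=|==|!=|<|>)")


def _operand(token: str, props: Dict[str, float]) -> float:
    """A token is either a numeric literal or the name of a property."""
    try:
        return float(token)
    except ValueError:
        return float(props[token])


def passes_filters(filters: str, props: Dict[str, float]) -> bool:
    """Return True only if every comma-separated condition holds (AND logic)."""
    for condition in filters.split(","):
        parts = [p.strip() for p in OP_PATTERN.split(condition)]
        if len(parts) not in (3, 5):  # a OP b, or a OP b OP c
            raise ValueError(f"cannot parse condition: {condition!r}")
        values = [_operand(tok, props) for tok in parts[0::2]]
        ops = parts[1::2]
        for left, op, right in zip(values, ops, values[1:]):
            if not OPS[op](left, right):
                return False
    return True


props = {"MolWt": 350.0, "logP": 2.1}
in_range = passes_filters("200<MolWt<500, logP>0", props)
too_heavy = passes_filters("200<MolWt<500", {"MolWt": 600.0})
```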

Selection

cmxflow.operators.select.PropertyHeadBlock(**kwargs: Any)

Bases: PropertySelectBlock

Return molecules with the highest values of a specified property.

Collects all input molecules, sorts by the specified property in descending order, and yields the top N (highest values first).

Required Inputs
  • property (text): Name of the molecule property to sort by.
  • count (text): Number of molecules to return. 0 or empty returns all, sorted.
Example
workflow.add(
    MoleculeSourceBlock(),
    RDKitBlock("rdkit.Chem.Descriptors.MolWt"),
    PropertyHeadBlock(property="MolWt", count="10"),
    MoleculeSinkBlock()
)

Initialize the PropertyHeadBlock.

cmxflow.operators.select.PropertyTailBlock(**kwargs: Any)

Bases: PropertySelectBlock

Return molecules with the lowest values of a specified property.

Collects all input molecules, sorts by the specified property in ascending order, and yields the bottom N (lowest values first).

Required Inputs
  • property (text): Name of the molecule property to sort by.
  • count (text): Number of molecules to return. 0 or empty returns all, sorted.
Example
workflow.add(
    MoleculeSourceBlock(),
    RDKitBlock("rdkit.Chem.Descriptors.MolWt"),
    PropertyTailBlock(property="MolWt", count="10"),
    MoleculeSinkBlock()
)

Initialize the PropertyTailBlock.
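
Both selection blocks amount to a sort followed by a slice. The sketch below uses dicts as stand-ins for molecules; select_by_property and its highest flag are hypothetical names, and count is taken as text to mirror the (text) input type listed above.

```python
from typing import Dict, List


def select_by_property(
    mols: List[Dict], prop: str, count: str = "", highest: bool = True
) -> List[Dict]:
    """Sort by a property and return the first N; 0 or empty returns everything."""
    ranked = sorted(mols, key=lambda m: float(m[prop]), reverse=highest)
    text = str(count).strip()
    n = int(text) if text else 0
    return ranked if n == 0 else ranked[:n]


mols = [
    {"name": "a", "MolWt": 180.2},
    {"name": "b", "MolWt": 46.1},
    {"name": "c", "MolWt": 310.4},
]
head = [m["name"] for m in select_by_property(mols, "MolWt", "2", highest=True)]
tail = [m["name"] for m in select_by_property(mols, "MolWt", "2", highest=False)]
everything = [m["name"] for m in select_by_property(mols, "MolWt", "")]
```

Sorting needs the complete input, so blocks of this kind have to collect the whole stream before yielding anything.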

2D Similarity

cmxflow.operators.sim2d.MoleculeSimilarityBlock(**kwargs: Any)

Bases: MoleculeBlock

Compute 2D fingerprint similarity against a set of query molecules.

For each input molecule, computes the maximum fingerprint similarity across all query molecules and attaches the score and best-matching query name as properties.

Required Inputs
  • queries (file): Path to query molecule file (SDF, SMILES, etc.).
Output Properties
  • max_similarity: Maximum similarity score to any query molecule.
  • most_similar_query: Name or index of the most similar query molecule.
Example
workflow.add(
    MoleculeSourceBlock(),
    MoleculeSimilarityBlock(queries="reference_ligands.sdf"),
    MoleculeSinkBlock()
)
Mutable Parameters
  • fingerprint_type: Fingerprint algorithm (morgan, rdkit, maccs, atom_pair, topological_torsion).
  • similarity_metric: Similarity function (tanimoto, dice, cosine, sokal, russel).
  • radius: Morgan fingerprint radius (1–4).
  • nbits: Fingerprint bit length (512–4096).

Initialize the similarity search block.
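
The two output properties can be illustrated with Tanimoto similarity on sets of on-bits. This is a sketch only: tanimoto and best_query are hypothetical helpers, the bit sets are made up, and the real block computes RDKit fingerprints and supports other metrics.

```python
from typing import Dict, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Shared on-bits divided by total distinct on-bits (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def best_query(fp: Set[int], queries: Dict[str, Set[int]]) -> Tuple[float, str]:
    """Return (max_similarity, most_similar_query) for one input fingerprint."""
    name = max(queries, key=lambda q: tanimoto(fp, queries[q]))
    return tanimoto(fp, queries[name]), name


# Made-up fingerprints: each set lists the indices of the bits that are on.
queries = {"ref_a": {1, 2, 3, 4}, "ref_b": {1}}
max_similarity, most_similar_query = best_query({1, 2, 3}, queries)
```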

3D Similarity

cmxflow.operators.sim3d.Molecule3DSimilarityBlock(**kwargs: Any)

Bases: MoleculeBlock

Compute 3D molecular similarity against a set of query molecules.

Both input and query molecules must have pre-existing 3D conformers. For each input molecule, computes maximum similarity across all conformer pairs and attaches the result as properties.

Required Inputs
  • query (file): Path to query molecule file with 3D conformers.
Output Properties
  • similarity_3d: Maximum 3D similarity score to any query conformer.
  • most_similar_query_3d: Name of the most similar query molecule.
  • similarity_3d_method: Similarity method used.
  • similarity_3d_conf_id: Conformer ID that gave the best similarity.
Example
workflow.add(
    MoleculeSourceBlock(),
    EnumerateStereoBlock(),
    ConformerGenerationBlock(),
    MoleculeAlignBlock(query="reference.sdf"),
    Molecule3DSimilarityBlock(query="reference.sdf"),
    MoleculeSinkBlock()
)
Mutable Parameters
  • method: Similarity method (shape_tanimoto, shape_tversky, usr, usrcat).
  • tversky_alpha: Tversky alpha parameter (0.0–1.0).
  • tversky_beta: Tversky beta parameter (0.0–1.0).

Initialize the 3D similarity block.

Ionization

cmxflow.operators.ionize.IonizeMoleculeBlock(ph_min: float = 6.4, ph_max: float = 8.4, **kwargs: Any)

Bases: MoleculeBlock

Generate pH-dependent ionization states using dimorphite_dl.

This is a 1:N transform: one input molecule can yield multiple protonation variants. Includes automatic correction for tertiary amide nitrogens that dimorphite_dl incorrectly protonates.

When the input carries a 3D conformer, the conformer is preserved. dimorphite_dl only changes formal charges and hydrogen counts on an unchanged heavy-atom skeleton, so the protonation state is transferred back onto the original 3D heavy atoms (matched exactly by atom-map number, not by substructure search), and any hydrogens added during protonation get coordinates derived from the heavy-atom geometry. Inputs without a 3D conformer take the plain SMILES path.

Example
workflow.add(
    MoleculeSourceBlock(),
    IonizeMoleculeBlock(),
    MoleculeSinkBlock()
)
Mutable Parameters
  • precision: pH precision window around min/max (0.1–3.0).
  • max_variants: Maximum number of ionization variants per molecule (1–128).

Initialize the IonizeMoleculeBlock.

Parameters:

Name      Type   Description                                         Default
ph_min    float  Minimum pH for protonation.                         6.4
ph_max    float  Maximum pH for protonation.                         8.4
**kwargs  Any    Additional keyword arguments passed to set_inputs.  {}

Stereoisomers

cmxflow.operators.confgen.EnumerateStereoBlock()

Bases: MoleculeBlock

Enumerate all stereoisomers of each input molecule.

This is a 1:N transform that yields all possible stereoisomers for each input molecule. Properties from the input molecule are copied to each output stereoisomer.

Example
workflow.add(
    MoleculeSourceBlock(),
    EnumerateStereoBlock(),
    MoleculeSinkBlock()
)

Initialize the stereoisomer enumeration block.

Conformer Generation

cmxflow.operators.confgen.ConformerGenerationBlock(**kwargs: Any)

Bases: MoleculeBlock

Generate 3D conformers using RDKit's ETKDGv3 algorithm.

Molecules must have fully specified stereochemistry before conformer generation. Use EnumerateStereoBlock upstream to resolve any unspecified stereocenters.

Example
workflow.add(
    MoleculeSourceBlock(),
    EnumerateStereoBlock(),
    ConformerGenerationBlock(),
    MoleculeSinkBlock()
)
Mutable Parameters
  • numConfs: Number of conformers to generate (1–100).
  • pruneRmsThresh: RMS threshold for pruning similar conformers (0.0–3.0).
  • useRandomCoords: Use random initial coordinates instead of distance geometry.

Initialize the conformer generation block.

Alignment

cmxflow.operators.align.MoleculeAlignBlock(**kwargs: Any)

Bases: MoleculeBlock

Align input molecule conformers to a set of 3D reference molecules.

For each input molecule, all conformers are aligned to all reference conformers using the selected method. The single best-scoring conformer (highest shape Tanimoto after alignment) is retained and returned. Input molecules must already have 3D conformers.

Required Inputs
  • query (file): Path to reference molecule file with 3D conformers.
Output Properties
  • alignment_shape_similarity: Shape Tanimoto similarity of the best-aligned conformer.
  • alignment_score: RMSD of the best alignment.
  • alignment_reference: Name of the reference molecule used.
  • alignment_method: Alignment algorithm used.
  • alignment_ref_index: Index of the reference molecule used.
  • alignment_mcs: MCS SMARTS pattern (only present when method is mcs).
Example
workflow.add(
    MoleculeSourceBlock(),
    EnumerateStereoBlock(),
    ConformerGenerationBlock(),
    MoleculeAlignBlock(query="reference.sdf"),
    MoleculeSinkBlock()
)
Mutable Parameters
  • alignment_method: Alignment algorithm (crippen_o3a, mmff_o3a, mcs).

Initialize the molecular alignment block.
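
The all-against-all search that keeps a single best conformer can be sketched as a maximum over conformer and reference pairs. The names below are hypothetical, and plain numbers with a toy score replace real conformers and the shape Tanimoto computed after alignment.

```python
from itertools import product
from typing import Callable, Sequence, Tuple


def best_alignment(
    conformers: Sequence, references: Sequence, score: Callable
) -> Tuple[int, int, float]:
    """Score every (conformer, reference) pair and return the best one."""
    pairs = product(range(len(conformers)), range(len(references)))
    conf_idx, ref_idx = max(
        pairs, key=lambda p: score(conformers[p[0]], references[p[1]])
    )
    return conf_idx, ref_idx, score(conformers[conf_idx], references[ref_idx])


# Toy data: numbers stand in for conformers, closeness stands in for shape overlap.
toy_score = lambda c, r: 1.0 / (1.0 + abs(c - r))
conf_idx, ref_idx, best = best_alignment([1.0, 4.0, 9.0], [3.5, 10.0], toy_score)
```

The cost grows with the product of the two conformer counts, which is worth keeping in mind when raising numConfs upstream.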

Docking

cmxflow.operators.dock.MoleculeDockBlock(score_components: bool = True, score_strain: bool = False, **kwargs: Any)

Bases: MoleculeBlock

MoleculeBlock for docking ligands into protein binding sites.

Performs pose optimization using an empirical scoring function with configurable parameters. Supports both rigid-body and flexible (torsional) docking. Electrostatic complementarity (EC) is evaluated once on the final pose and reported as docking_ec — a standalone score, never part of the search.

Two modes, both requiring a receptor and a site_reference:

  • Free docking (index_poses=False, default): multi-start search (n_starts) anchored on the site-reference centroid.
  • Scaffold-indexed docking (index_poses=True): the first molecule of each Bemis-Murcko scaffold is docked fully and its core pose cached; later siblings transfer that pose and run a single constrained local search — much faster for congeneric series, and series-consistent. The site_reference ligand is seeded as the first scaffold entry, so its experimentally grounded pose is the preferred template.
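
The scaffold-indexed mode can be pictured as a cache keyed by scaffold. The sketch below is a simplified illustration, not the block's code: dock_series, full_dock and transfer_dock are hypothetical callables, the cache is an in-memory dict rather than a persistent index, and seeding from the site_reference ligand is left out.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def dock_series(
    mols: Iterable[Tuple[str, str]],
    full_dock: Callable[[str], str],
    transfer_dock: Callable[[str, str], str],
) -> List[Tuple[str, str, str]]:
    """Dock (name, scaffold) pairs, reusing the first pose found per scaffold."""
    cache: Dict[str, str] = {}
    results = []
    for name, scaffold in mols:
        if scaffold not in cache:
            pose = full_dock(name)  # first of its scaffold: full search
            cache[scaffold] = pose
            results.append((name, "full", pose))
        else:
            pose = transfer_dock(name, cache[scaffold])  # sibling: reuse template
            results.append((name, "transfer", pose))
    return results


# Toy callables that only record what would have happened.
series = [("lig1", "indole"), ("lig2", "indole"), ("lig3", "pyridine")]
results = dock_series(
    series,
    full_dock=lambda name: f"pose:{name}",
    transfer_dock=lambda name, template: f"{template}->{name}",
)
modes = [mode for _, mode, _ in results]
```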
Required Inputs
  • receptor (file): Path to receptor PDB file.
  • site_reference (file): Molecule file (.sdf, .mol2, etc.) whose heavy-atom centroid defines the binding site center. Sobol restart samples are anchored to this pocket, so molecules dock from a freshly generated conformer — no preceding alignment step is required. It is the reference template seeded into the scaffold index when index_poses=True. Technically optional: omit only for MCS/overlay refinement workflows where the input pose is already in the binding site (the search then recenters on the input conformer position).
Output Properties
  • docking_initial_pose_score: Score before optimization.
  • docking_score: Final optimized score (empirical + EC adjustment, plus ligand strain when score_strain=True).
  • docking_empirical: Pure empirical score (without EC term).
  • docking_ec: Electrostatic complementarity of the final pose, in [-1, 1] (0.0 only when EC protein data is unavailable).
  • docking_strain: Ligand strain penalty — intramolecular energy added vs the input conformer (>=0). Reported regardless of score_strain.
  • docking_converged: Whether optimization converged.
When score_components=True (default), the block also writes:
  • docking_gauss1: Gaussian term contribution to docking_score.
  • docking_repulsion: Repulsion term contribution to docking_score.
  • docking_hydrophobic: Hydrophobic term contribution to docking_score.
  • docking_hbond: H-bond term contribution to docking_score.
  • docking_n_rot: Torsional entropy energetic term (w_rot * N_rot).
  • docking_scoring_function: Scoring weights used, for reproducibility.
Example
# site_reference recenters the search, so a fresh conformer docks
# directly — no MoleculeAlignBlock required.
workflow.add(
    MoleculeSourceBlock(),
    EnumerateStereoBlock(),
    ConformerGenerationBlock(),
    MoleculeDockBlock(
        receptor="protein.pdb",
        site_reference="crystal_ligand.sdf",
    ),
    MoleculeSinkBlock(),
)
Mutable Parameters
  • w_gauss1: Vinardo Gaussian attractive term weight.
  • w_repulsion: Vinardo repulsion term weight.
  • w_hydrophobic: Vinardo hydrophobic term weight.
  • w_hbond: Vinardo hydrogen bond term weight.
  • w_rot: Torsional entropy divisor weight (0=pure Vinardo, 0.02=smina default).
  • n_starts: Number of L-BFGS-B restarts. 1 = local minimize from the input pose only. For blind docking (with site_reference), use 1+2^k for ideal Sobol balance: 3, 5, 9, 17, 33, 65. Row 0 always minimizes from the aligned pose; rows 1+ sample the binding site box.
  • basin_hops: Iterated-local-search refinement steps per restart (0 = single minimize). Higher finds lower-energy poses at more cost.
  • max_iterations: Maximum L-BFGS-B iterations per restart.
  • box_size: Translation search box half-width in Angstroms (default 5.0). Centred on site_reference centroid when provided, otherwise on the input conformer position.
  • rigid: If True, only rigid-body optimization (no torsions).
  • index_poses: If True, scaffold-indexed (template) docking (see above). A mode toggle, not a search dimension — freeze it during optimization.
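
The recommended n_starts values follow the pattern 1 + 2^k: one start for row 0 plus a power-of-two number of Sobol samples. A small helper can list them or round a requested value up to the next one. Both function names are hypothetical and not part of cmxflow.

```python
from typing import List


def balanced_starts(max_k: int = 6) -> List[int]:
    """Values of 1 + 2**k for k = 1..max_k."""
    return [1 + 2**k for k in range(1, max_k + 1)]


def next_balanced(n: int) -> int:
    """Smallest value of the form 1 + 2**k (k >= 1) that is >= n; 1 stays 1."""
    if n <= 1:
        return 1  # a single start means local minimization from the input pose
    k = 1
    while 1 + 2**k < n:
        k += 1
    return 1 + 2**k


recommended = balanced_starts()
rounded = next_balanced(10)
```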

Initialize the molecular docking block.

Parameters:

Name Type Description Default
score_components bool

If True (default), write per-term weighted score components as SDF properties on each docked molecule.

True
score_strain bool

If True, add the ligand strain penalty (intramolecular energy added vs the input conformer, >=0) into docking_score and into multistart selection. Default False keeps docking_score purely intermolecular (smina-comparable). The strain value is always written as docking_strain regardless.

False
**kwargs Any

Passed to set_inputs. Accepts the inputs receptor and site_reference (file paths) and any mutable parameter by name (n_starts, basin_hops, max_iterations, box_size, rigid, score weights, and index_poses). index_poses is a bool (default False): when True the block runs in scaffold-indexed (template) docking mode -- the first molecule of each Bemis-Murcko scaffold is docked fully and its scaffold pose cached at ./.cmxflow/scaffold_index.db; later molecules sharing that scaffold transfer the cached pose and run a single constrained local search (faster for congeneric series, and series-consistent). The cache persists across runs and is reused. Cache keys are namespaced by the docking parameters and the receptor/reference paths, so changing score weights or search settings, or pointing at a different target/site, never reuses a stale pose. index_poses is a mode toggle: leave it out of the optimized parameter space.

{}

Clustering

cmxflow.operators.cluster.RepresentativeClusterBlock(**kwargs: Any)

Bases: MoleculeBlock

Assign molecules to clusters using streaming leader clustering.

For each molecule, computes an ECFP4 fingerprint (or Murcko scaffold fingerprint) and compares it against all existing cluster representatives via Tanimoto similarity. If the best similarity meets the threshold, the molecule joins that cluster; otherwise a new cluster is created. All molecules pass through annotated with cluster metadata.

Output Properties
  • cluster_id: Integer index of the assigned cluster.
  • cluster_representative: SMILES of the cluster's representative molecule.
  • cluster_similarity: Tanimoto similarity to the cluster representative.
Example
workflow.add(
    MoleculeSourceBlock(),
    RepresentativeClusterBlock(),
    MoleculeSinkBlock()
)
Mutable Parameters
  • threshold: Tanimoto similarity threshold for cluster assignment (0.05–0.95).
  • scaffold: If True, cluster by Murcko scaffold fingerprint instead of full molecule.

Initialize the representative cluster block.
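
Streaming leader clustering, as described for this block, can be sketched with sets of on-bits in place of ECFP4 fingerprints. The names, the threshold value of 0.7, and the choice to report a similarity of 1.0 for a molecule that founds a new cluster are assumptions of this sketch.

```python
from typing import List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def leader_cluster(
    fingerprints: List[Set[int]], threshold: float = 0.7
) -> List[Tuple[int, float]]:
    """Return (cluster_id, similarity_to_representative) for each input, in order."""
    representatives: List[Set[int]] = []
    assignments = []
    for fp in fingerprints:
        sims = [tanimoto(fp, rep) for rep in representatives]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            assignments.append((best, sims[best]))  # join the closest cluster
        else:
            representatives.append(fp)  # become the leader of a new cluster
            assignments.append((len(representatives) - 1, 1.0))
    return assignments


# Made-up fingerprints: the second is close to the first, the third is unrelated.
assignments = leader_cluster([{1, 2, 3}, {1, 2, 3, 4}, {7, 8}], threshold=0.7)
```

Each molecule is compared only against cluster representatives, so the result depends on input order: the first molecule seen in a region of chemical space becomes its leader.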