cmxflow
Composable cheminformatics workflows with Bayesian optimization.
cmxflow is a Python framework for building and optimizing cheminformatics pipelines. Chain together molecular operations as blocks, then let Bayesian optimization find the best parameters for your task.
Two Usage Modes
cmxflow can be used in two ways:
- An Agentic Tool: exposed through an MCP server, so LLM agents such as Claude can build and optimize workflows conversationally
- A Programmatic API: for direct Python usage in scripts and notebooks
Installation
Base environment
conda config --set solver libmamba
conda env create -f conda.yml
conda activate cmxflow
poetry install
MCP server (for use with Claude)
See Using with Claude for details.
Optional: PyMOL
Required only for 3D structure visualization (view_structures MCP tool):
Quick Start
Programmatic API
from cmxflow import Workflow
from cmxflow.sources import MoleculeSourceBlock
from cmxflow.operators import MoleculeSimilarityBlock, RDKitBlock
from cmxflow.sinks import MoleculeSinkBlock
# Build a workflow (everything is composable, add as many blocks as you want)
workflow = Workflow()
workflow.add(
    MoleculeSourceBlock(),                                   # Reader
    MoleculeSimilarityBlock(queries="crystal_ligand.sdf"),   # 2D Similarity
    RDKitBlock("rdkit.Chem.Descriptors.MolWt"),              # Arbitrary rdkit method
    MoleculeSinkBlock(),                                     # Writer
)
# Run it
workflow("molecules.sdf", "results.sdf")
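The RDKitBlock above is given a dotted path string rather than a function object. How cmxflow resolves that string internally is not documented here, so the following is only a minimal sketch of one common approach, using the standard library. The helper name resolve_dotted_path is hypothetical and is not part of cmxflow, and stdlib paths stand in for the RDKit one so the snippet runs without RDKit installed.

```python
import importlib
from typing import Any, Callable


def resolve_dotted_path(path: str) -> Callable[..., Any]:
    """Resolve a string such as "package.module.func" to the callable it names.

    Hypothetical helper (not a cmxflow function): import the longest
    importable module prefix, then walk the remaining names as attributes.
    """
    parts = path.split(".")
    # Try the longest module prefix first, then progressively shorter ones.
    for split in range(len(parts) - 1, 0, -1):
        module_name = ".".join(parts[:split])
        try:
            obj: Any = importlib.import_module(module_name)
        except ImportError:
            continue  # this prefix is not a module; try a shorter one
        for attr in parts[split:]:
            obj = getattr(obj, attr)  # raises AttributeError if a name is missing
        if not callable(obj):
            raise TypeError(f"{path!r} does not resolve to a callable")
        return obj
    raise ImportError(f"could not import any module prefix of {path!r}")


# Standard-library stand-ins for a path like "rdkit.Chem.Descriptors.MolWt".
sqrt = resolve_dotted_path("math.sqrt")
join = resolve_dotted_path("os.path.join")
print(sqrt(16.0))
```

Passing a string instead of a function keeps a workflow description serializable, which is one plausible reason for this design, though the source does not say so.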
Agentic (e.g., via Claude)
I need to build a ligand-based virtual screening workflow. I'm not sure if 2D or 3D is better. Can you optimize two workflows? I want to see the results of 2D first. The benchmark is in "benchmark.parquet" with hits labeled in the "active" column and the query is in "reference.sdf".
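The prompt above refers to a benchmark whose hits are labeled in an "active" column. To optimize a workflow against such a benchmark, the ranking it produces has to be reduced to a single score. Which metric cmxflow uses is not stated here. The sketch below assumes ROC AUC, a common choice for virtual screening, and computes it in plain Python on made-up scores and labels.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Return the probability that a random active outscores a random inactive.

    Ties count as half a win. This pairwise form is O(n_actives * n_inactives),
    which is fine for illustration but slow for large benchmarks.
    """
    actives = [s for s, is_active in zip(scores, labels) if is_active]
    inactives = [s for s, is_active in zip(scores, labels) if not is_active]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive molecule")
    wins = 0.0
    for a in actives:
        for d in inactives:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(inactives))


# Hypothetical similarity scores and "active" labels for six benchmark molecules.
scores = [0.91, 0.85, 0.40, 0.78, 0.30, 0.10]
active = [True, False, True, True, False, False]
auc = roc_auc(scores, active)
print(f"ROC AUC: {auc:.3f}")
```

A value of 1.0 means every active is ranked above every inactive, and 0.5 corresponds to a random ordering.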
See the Block Catalog for all available blocks, or the examples/basic_usage.ipynb notebook for a full tutorial covering similarity search, optimization, and parallel execution.