About me

Hi, my name is Andrew Dalke <dalke@dalkescientific.com>. You’ve probably used some of my software before. I make my living as a software developer in cheminformatics. I consult and write custom software, sell copies of chemfp, and teach Python training classes that help computational chemists get up to speed.

I am the primary author of mmpdb. That package builds upon some of the cheminformatics expertise I’ve developed over the last 20 years.


For example, mmpdb includes code to manipulate SMILES strings at the syntax level, rather than through a chemistry toolkit. I really enjoy the SMILES language. I co-authored the OpenSMILES specification, and have written tools like smiview to help work with SMILES strings. I co-authored a preprint with Noel O’Boyle on DeepSMILES, a postfix form of SMILES, and wrote a fast DeepSMILES-to-SMILES converter.

Maximum common substructure

Roche funded the original mmpdb development work, and contributed it to the RDKit under an open source license. This isn’t the first time they have done something like that. A few years earlier, they funded me to develop a maximum common substructure implementation, which was also contributed to the RDKit project. That implementation can find the MCS of a group of compounds, which is quite useful if you want to depict a series of structures with a common core.


In cheminformatics, I’m probably best known for my chemfp package, which implements high-performance cheminformatics similarity search. If you haven’t heard of it, you can read the documentation, download the no-cost/open source version for Python 2.7 today (or install it using pip install chemfp), or contact me if you want to buy or trial the commercial version, which also supports Python 3.

Cheminformatics toolkits

I mostly use the RDKit and OpenEye toolkits these days. When I started in cheminformatics, I developed the PyDaylight toolkit, which provided a high-level Python API around a SWIG interface to Daylight’s low-level C toolkit API. I published details about it in Dr. Dobb’s Journal.


Almost everything I do is written in Python. I started using Python full-time in 1998. I used to be very active in the Python community and presented at many Python conferences. I even had commit rights to the CPython core.

Bioinformatics and structural biology

I used to work in bioinformatics before I worked full-time in cheminformatics. I co-founded the Biopython project, was on the board of the Open Bioinformatics Foundation, and helped organize the Bioinformatics Open Source Conference.

Finally, I started working with molecular software back in 1992, with a summer project at Florida State University to parallelize CHARMm. I later joined Klaus Schulten’s group at the University of Illinois, where I was first the junior co-author of VMD, then its main developer for nearly two years. I also contributed to the original version of NAMD.

Where am I?

I am based in Trollhättan, Sweden, and work through my Swedish registered company, Andrew Dalke Scientific AB.