mmpdb crowdfunding project

How can we raise money to fund open source software development in cheminformatics? It’s a hard question. Simple donations don’t work – companies might not even have a mechanism to make donations. Consultant-based funding doesn’t work that well either, because developing a general-purpose tool costs two or three times as much as developing one that meets only the specialized needs of a single client, and few clients are willing to subsidize the rest of the field. Proprietary software development solves the problem by getting many people to pay for the same product. Can we learn from the success of proprietary software to raise the funds needed to improve open source software?


This is an experiment to see if a crowdfunding consortium can be used to fund the matched molecular pair program “mmpdb”. The deadline to join is 15 February 2020!

J. Chem. Inf. Model. 2018, 58, 902–910.
ACS Editors’ Choice

Join now


Interruption due to COVID

mmpdb development has been on hold since February 2020. The extra child minding and general upheaval of life have meant that I had to focus on chemfp.

mmpdb development is expected to resume in mid-February 2021, once chemfp 3.5 is released.

Why is mmpdb interesting?

Many people have written software to find matched molecular pairs. I know of one company with several different in-house versions, and I doubt they are unique.

Some of these companies are looking to switch to mmpdb for internal use because it has several features beyond what most other tools have.

  • mmpdb can handle large data sets.
  • mmpdb uses the fragment-and-indexing approach of Hussain and Rea. The fragmentation step is parallelized. When re-fragmenting an updated compound dataset, the previous fragmentation information can be reused as a cache, rather than re-computing all of the fragments.
  • The fragmentation is fully canonical, resulting in fully canonical transforms. (The original Hussain and Rea method could identify up to 6 different, equivalent transforms for 3-cut fragmentations.)
  • Chiral structures are handled through “up-enumeration”, where all 3^n chiral, inverted chiral, and achiral forms (up to uniqueness) are fragmented and indexed.
  • The local chemical environment may affect a transform. The fragmentation step computes circular fingerprints around the attachment points, up to a radius of 5 bonds, which are used during indexing to identify transforms at different levels of environment specificity.
  • If physical property information is available, then indexing will generate “property rules”. These contain overall statistics and a list of the associated pairs, for each environment radius.
  • The indexing results are stored in a relational database; currently SQLite-only. This crowdfunding effort will add Postgres support.
For more details, read the paper mmpdb: An Open-Source Matched Molecular Pair Platform for Large Multiproperty Data Sets, which was made an ACS Editors’ Choice.
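
To make the up-enumeration count concrete, here is a minimal sketch of where the 3^n comes from, assuming n is the number of chiral centers being enumerated. It is not mmpdb's code and does no chemistry: the state names and the up_enumerate helper are hypothetical, chosen only to illustrate the combinatorics.

```python
from itertools import product

# Hypothetical labels for this sketch: each center is either kept as drawn,
# inverted, or stripped of its stereo annotation.
STATES = ("as_drawn", "inverted", "achiral")


def up_enumerate(num_centers):
    """Return every way to assign one of the three states to each center."""
    return list(product(STATES, repeat=num_centers))


forms = up_enumerate(2)
print(len(forms))  # 3**2 == 9 assignments for two centers
for form in forms:
    print(form)
```

In this toy model every assignment is distinct. For real molecules, symmetry can make some assignments describe the same structure, which is what the “up to uniqueness” qualifier covers.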

How do you explain “crowdfunding” to accounting?

Don’t. (Unless you really want to.) Instead, tell them that you are going to purchase a new version of mmpdb with the following features:
  • Postgres support, as an alternative to the existing SQLite support;
  • A new ‘mmpdb proprulecat’ command to export the property rules in the database (transformation details and property statistics of the associated pairs) in CSV form, along with a fragment SMILES for the given environment.
You will receive these within two weeks of sending me a purchase order.

In addition, your purchase includes membership in the crowdfunding consortium. As more people join and additional funding goals are met, I will continue to improve mmpdb, and you will get those improvements as part of your membership.

If EUR 23 000 is raised, I will contribute the new code upstream to the main mmpdb repository for anyone to use by 1 October 2020. If EUR 50 000 is raised, I will contribute the new code upstream “immediately.” The deadline for joining the consortium is 15 February 2020 – join now!

Why should you fund mmpdb?

If you want to use mmpdb in-house, then you probably also want someone to support it, and improve it. While you can do that yourself – it is open-source and available for free – it’s cheaper for you if you can share that cost with other people.

Of course, it’s even cheaper if other people pay for mmpdb development, and you get the result for free. This crowdfunding effort uses a delayed release model to incentivize you to pay for mmpdb now, rather than wait most of a year for the new features.

For the long term, people have asked for new features like a web interface to mmpdb, or support for categorical properties and multi-valued properties. The current mmpdb code base is not set up for this sort of growth. It needs some cleanup, and has only a basic test suite.

If you want those sorts of interesting long-term features, then you should fund this effort now to set the groundwork for future growth and show that crowdfunding can be the way to get there.

How do I join?

Send me an email saying that you are interested. Once I get a purchase order, you are a member.

Here are the suggested consortium membership rates:

  • Academics - EUR 1 000 (no warranty and only limited support)
  • Industry - EUR 5 000 (includes 9 months of support)
See the sample quote for the details your purchasing department may want to know.

Feel free to pay more if you want! We can also discuss the price to add other specific features you may want me to develop, as well as possible features to add in future crowdfunding efforts.

If invoicing doesn’t work for you, I can arrange a PayPal transaction, with a 5% overhead.