Chemhacktica tutorial

Contents

Introduction

There are several related but distinct applications available through the Chemhacktica website. These applications and their corresponding use cases are listed in the table below.

ApplicationBrief DescriptionUse Case
Interactive Path Planning Multi-step retrosynthesis (manual) Interactively explore synthetic pathways comprised of multiple one-step retrosynthesis predictions
Tree-Builder Multi-step retrosynthesis (automatic) Identifying synthetic pathways that can be used to synthesize a target compound via a series of reactions (allows for creation of pathways from buyable chemicals to a target)
Synthesis Predictor Reaction evaluation considering conditions and impurities Predicting whether or not a target compound of interest will be generated when specified precursors react with one another, identifying conditions that are most likely to promote creation of the specified target compound, and identifying common types of impurities that may form during the reaction
SCScore Evaluator Estimate synthetic complexity Predicting the synthetic complexity of a molecule [1-5 scale] using a trained neural network model
Buyable Look-Up Database of commercially-available compounds Determining whether or not a chemical is listed in our Buyables database and, if so, for what price (in USD per gram)
Drawing Structures from SMILES/SMARTS Converting molecule/reaction SMILES strings or template SMARTS strings to skeletal formulas

Generally, the website takes target compound inputs as SMILES strings, although there are some compounds for which the common name (aspirin, ibuprofen, etc.) can be entered in the input window and converted to a SMILES string automatically. For more information about SMILES strings, click here. Compound SMILES strings can be obtained by first drawing the molecule of interest in ChemDraw, selecting the molecule with the selector tool, right-clicking, and choosing 'Molecule', 'Copy As', and finally, 'SMILES'. Alternatively, much of the website also supports drawing compounds of interest directly. This can be done by clicking the Draw links adjacent to input text fields, where applicable.

Interactive Path Planning

The interactive path planner uses one-step retrosynthesis tools to identify precursor molecules that can be combined to generate a target compound via a single reaction.

Usage

  • There is a guided tutorial built into the user interface. Click the "?" button to get started on the interactive path planning page

Options

  • Reaction templates have been culled from the Reaxys database. The templates are applied to the target molecule in an order determined by a machine learning prediction. There are two options for template prioritization: Num. Templates and Max. Cum. Prob. The search will terminate either when:
    • the maximum number of templates have been applied to the target, regardless of template application success
    • the maximum cumulative probability of the template scores (from the machine learning model) has been surpassed
  • Precursor recommendations are selected and sorted based on a scoring function. There are three options for precursor scoring:
    • Heuristic: crude synthetic accessibility score
    • Relevance+Heuristic: combination of heuristic and template relevance
    • SCScore: learned synthetic complexity score
  • In addition, a minimum plausibility may be specified. The plausibility metric represents the likelihood that a pair of candidate precursors can be successfully used to generate the target compound under any set of conditions.

Output

The one-step retrosynthesis tool generates a list of recommended precursor pairs shown on the right side of the page. The pairs are rank-ordered based on the scoring function specified by the user. For each pair, the following information is displayed:

  1. Skeletal structures
  2. "Score": the value of the score defined by the chosen scoring function
  3. "# Examples": the number of entries in Reaxys that support the template(s) that was (were) applied to generate the precursors
  4. "Template score": the relevance score (or maximum thereof) assigned by the neural network to the template(s) that was (were) applied to generate the precursors
  5. "Plausibility": the score given to the reaction from the fast filter model
  6. "Reaction cluster": each precursor is assigned a cluster according to similarity calculated through the clustering method selected in the user settings. By default, only the top ranked precursor from each cluster is shown in the list, but the cluster icon (four squares) can be used to view other suggestions in the cluster.

Clicking on a reaction node in the graph visualization (on the left side of the page - circles with numbers) changes the information displayed on the right side of the page. For each reaction, the following information is displayed:

  1. Reaction smiles
  2. Reaction skeletal image
  3. A link to evaluate the reaction in a new tab - this link takes you to the Interactive Reaction Evaluator
  4. "Retroscore": the value of the score defined by the chosen scoring function
  5. "Template score": the relevance score (or maximum thereof) assigned by the neural network to the template(s) that was (were) applied to generate the precursors
  6. "Plausibility": the score given to the reaction from the fast filter model
  7. "Num. Examples": the number of entries in Reaxys that support the template(s) that was (were) applied to generate the precursors
  8. "Supporting templates": a list of links to view the templates (and possibly reaction precedents) used to generate this precursor

Example

The interactive tutorial built into the Interactive Path Planning page will walk you through an example use case. Click the "?" button to get started on the Interactive Path Planning page.

Tree Builder

The Tree Builder is used to identify precursor molecules that can be combined to generate a target compound via a series of reactions (a “tree”). This is a multi-step retrosynthetic planner.

Usage

  1. Specify target compound:
    • Enter SMILES string of target in “Target compound” field
    • OR
    • Click “Draw” link adjacent to “Target compound” field and draw target compound in the window that appears
  2. Confirm that parsed structure matches intended target
  3. Select expansion settings (see Options, below)
  4. Select stopping criteria (see Options, below)
  5. Select evaluation settings (see Options, below)
  6. Click Start

Options

Expansion settings

  • The maximum depth setting allows you to specify the maximum number of sequential synthetic steps you would like to see in the results
    • The tree-builder roughly follows a “depth-first” search, which means that it prioritizes expanding existing chemical nodes at depth k to depth k+1 over collecting additional option-nodes at depth k. Thus, setting the maximum depth too high may preclude the creation of several good disconnections at a particular depth
    • We recommend setting the maximum depth equal to 1 + the length of the shortest anticipated route
  • The minimum retrosynthetic template count setting allows you to specify the minimum number of instances in Reaxys that support the templates that will be applied in your tree expansion. We recommend leaving this setting at the default of zero (effectively: no minimum), unless you have some motivation to exclusively use popular chemistry
  • The tree search relies on truncated branching: after a chemical node is expanded to generate a list of potential precursors, a heuristic scoring function is used to select the most promising precursors from the list. The maximum branching factor specifies the number of precursors to select during this truncation. The heuristic prioritizes buyability and, to a lesser extent, structural simplicity
  • Expansion time (s): this setting allows you to specify the amount of time for which you would like the expansion to run
  • For information about template prioritization, num. templates, max. cum. prob, and precursor scoring: see One-Step Retro Options

Stop Criteria

  • Maximum chemical price ($/g): this setting allows you to specify the maximum price for chemicals that the algorithm deems buyable
  • Chemical property logic (optional): changing this setting from the default ‘None’ to ‘OR’ or ‘AND’ brings up four new text fields that allow you to specify the maximum number of carbon, nitrogen, oxygen, and hydrogen atoms you would like the starting materials to contain. The ‘OR’ option will allow the algorithm to terminate the expansion of a chemical node either if the compound is known to be buyable or if it satisfies the maximum atom criteria specified. The ‘AND’ option will prevent the algorithm from terminating an expansion unless both the buyability and heavy atom criteria are satisfied. Setting the maximum count for a particular atom type to an arbitrarily large number (e.g., 1000, which is the default for each) effectively places no constraint on the algorithm in relation to that atom type.
  • Chemical popularity logic (optional): changing this setting from the default ‘None’ to ‘OR’ will allow the algorithm to terminate the expansion of a chemical node either if the compound is known to be buyable or if it appears with a high frequency as either a reactant or product in the Reaxys database. If a compound appears frequently in Reaxys, it is likely that a synthetic chemist would know how to make it.
    • Specifying this option can be very useful for lengthy syntheses. After an initial expansion (with this option set to OR) from a target compound is complete, those starting materials that are not buyable can be copied and expanded to buyable chemicals on their own.

Evaluation settings

  • Minimum plausibility: this setting allows you to specify the minimum plausibility of each reaction in your expansion. Reactions that are proposed by the retrosynthetic algorithm but do not meet this minimum will be removed from the expansion.
  • Manual forward prediction: this setting defines the forward prediction approach that will be implemented if a user clicks the 'Evaluate' link on a particular proposed synthetic tree in the output. This does not affect the outcome of the retrosynthetic expansion itself.
  • Pay close attention to chirality: we recommend keeping this box checked. This prevents the application of achiral templates to chiral molecules, or chiral templates to achiral molecules, although it is somewhat more computationally expensive to do so.

Output

The tree-builder produces a rank-ordered list of recommended synthetic routes. Compounds that are buyable are outlined in green; those that are not are outlined in orange. Between each target compound and its retrospective precursor(s) is a number that specifies the number of examples of reactions in Reaxys that support the template(s) that was (were) applied to make the proposed disconnection.

You can store the results of an expansion and review them later. To do this after an expansion is complete, scroll to the top of the website and click the "My Results" tab in the navigation bar. In the list that appears, click the “Save this page” link to save your results. From this list, you can also access results that you have saved previously (to do this, click "See saved results" instead of "Save this page").

Additionally, once a retrosynthetic expansion is performed, the user has the option to ban a chemical or reaction that appears in the tree if it is, e.g., known to be patented. Then, the user may re-run the expansion to obtain recommendations that exclude the chemical(s) or reaction(s) that were deemed undesirable. The list of compounds and reactions that you have banned can be accessed by clicking the "My Banlist" link in the navigation bar at the top of the website. For more information, see the Example below.

Example

A set of proposed synthetic routes to the compound Fluconazole is easily created using the default tree-builder settings. To get started, click the Tree Builder module option (accessible from the Modules drop-down at the top of the Chemhacktica website). This will bring up a field where you can enter the SMILES string of your target (in this case, Fluconazole):

When you enter Fluconazole’s name or SMILES string in the "Target compound" field, the various algorithm settings appear, along with the parsed structure (which you should confirm is correct):

Click Start. Once the expansion is complete, a summary of the results of the expansion appears, followed by a rank-ordered list of potential synthetic routes. The summary of the expansion results shows the number of chemicals (or, at half-depths, reactions) that were found by the algorithm during the expansion.

The first recommendation in the list of synthetic routes is Fluconazole itself, because it can be purchased for less than the maximum chemical price that was specified as part of the stop criteria. Since Fluconazole is commercially available, it is outlined in green:

The second option is a linear synthesis consisting of two reactions. The epoxide intermediate is outlined in orange because it is not buyable:

Hovering over each of the compounds pulls up a pale-yellow window with additional information, including the compound’s SMILES string, frequency of appearance as a reactant or product in Reaxys, and the options to “ban” or “hide all” (see Tree Builder - Output). For compounds that are commercially available, the price per gram is listed (non-purchasable compounds show the words “not buyable”). For example, for the non-purchasable epoxide intermediate:

As described in Tree Builder - Output, compounds and reactions may be banned, if desired. To do this, for example, for the epoxide intermediate in Option 2, mouse over the compound, and click the ban link in the pale-yellow window that appears. Another pop-up will appear with a text field where you can specify the reason for banning the chemical (for your records only). After you click OK, another window will appear, confirming that the compound with the specified SMILES string was banned. Click OK again.

To review the compounds and reactions that you’ve banned, scroll to the top of the webpage and click the "My Banlist" tab. You will be taken to a new webpage that shows a list of the chemicals you’ve banned. For now, only the epoxide intermediate appears. You have the option to temporarily disable compounds on your banlist by clicking the checkmark under the Active column on the left; you may also permanently delete compounds on your banlist by clicking the trash icon under the Delete column on the right

Now that we’ve banned this chemical, it is possible to return to the Tree Builder tab and rerun the Fluconazole expansion. This time, none of the trees that the algorithm recommends will contain the banned chemical.

Synthesis Predictor

The Synthesis Predictor module now combines what were previously four separate modules: context recommendation, forward prediction, reaction evaluator, and impurity predictor. The interface is divided into three parts: condition (context) recommendation, synthesis prediction, and impurity prediction

Usage

  • There is a guided tutorial built into the user interface. Click the "?" button to get started on the Synthesis Predictor page

Options

There are four different models that are used in various stages of the workflow described below. Only one version of each model is currently available, however we are always training and testing new models that may get added in the future

  • Condition recommednation model: Predicts the best set of reaction conditions given reactants and products
  • Forward prediction model: Predicts the most likely major product given reactants (and conditions)
  • Reaction scoring model: Predicts a reaction plausibility score given reactants and products
  • Atom-mapping model: Predicts atom mapping for a reaction

Output

The context recommender produces a list of recommendations for reaction conditions. Each recommendation contains a subset of the following features: reagent, catalyst, solvent (1 or 2), and temperature. We define reagents as compounds that do not contribute heavy atoms to the product. In some cases, the distinction between a reagent and a catalyst in Reaxys is ambiguous, such that the distinction between these categories in the recommendations may be ambiguous as well, although data cleaning has been performed to help mitigate this. Note that the context recommender will not produce an error if the proposed chemical transformation is not achievable via a single reaction.

The forward predictor produces a list of candidate reaction outcomes that are rank-ordered by a probability score that reflects the model’s confidence that each outcome will appear. The probabilities for a set of candidates should not be interpreted together to comprise an expected product distribution, nor should an individual probability be interpreted as an expected yield; rather, a probability for a given outcome purely reflects the model’s confidence regarding whether or not that compound will appear at all (the model is not trained on specific yield-related data).

The impurity predictor produces a list of common types of impurities that could form given the reactants, products, and reaction conditions.

Example

The interactive tutorial built into the Synthesis Predictor page will walk you through an example use case. Click the "?" button to get started on the Synthesis Predictor page.

Loading...