Software for predicting metabolism

Examples of predictions by BioTransformer are illustrated in Fig. Details of the evaluation are available in Additional file 5. Examples of human non-gut microbial metabolites predicted by BioTransformer. ADMET Predictor is a software tool that predicts sites of metabolism and the resulting metabolites upon CYP-catalyzed biotransformation. Details of the evaluation are available in Additional file 6.

The respective structures were retrieved from ContaminantDB [ 41 ]. In addition, BioTransformer predicted three more metabolites for the degradation of Disulfoton.

All three metabolites resulted from the correctly used biotransformation rule bt, which was applied at three different sites of metabolism, producing two metabolites in each case. This was designed to simulate a real case involving the MS-based experimental analysis of epicatechin metabolites produced by rats upon a five-day treatment with epicatechin, as done by two of the co-authors of this manuscript (CM and JF).

Epicatechin is an important compound from the chemical class of flavanols, and is known to exhibit cardiovascular health benefits [85, 86, 87]. It is a major component of cocoa extracts, and is also abundant in apples, grapes, berries, and tea. Briefly, rats were fed a standardized diet supplemented with epicatechin for 5 days. Spot urines were sampled after the supplementation period and compared to the spot urines sampled under the same conditions after 9 days of the same diet without epicatechin.

Monoisotopic masses were generated by subtracting 1. Details regarding the data extraction process are provided in Additional file 2. These masses exhibited marked increases in intensity after epicatechin supplementation compared to baseline. The human supertransformer option (superbio) was used to facilitate putative compound identification. From the monoisotopic masses that were extracted, BMIT identified 37 possible metabolites of epicatechin corresponding to 20 unique masses.

These masses do not correspond to adducts or isomers and may therefore be considered parent ions (Additional file 7: Table S1). To acquire additional support for the identity of the predicted metabolites, the scientific literature was searched manually to collect structural data regarding epicatechin metabolites reported in previous experimental studies of both humans and rats.

A total of 56 single- and multi-step metabolites of epicatechin, corresponding to 37 monoisotopic masses, were identified (Additional file 7: Tables S1 and S2). Of the 37 predicted metabolites matching our experimental data, 22 matched 11 unique and previously reported monoisotopic masses. Among those, 18 compounds corresponded to previously reported metabolites. For the nine other experimental masses that had matches with BMIT predictions, 15 possible metabolites never previously reported were obtained.

Figure 9 shows examples of the suggested epicatechin metabolites with their masses, as identified in our study. A complete list of predicted epicatechin metabolites, along with the metabolic pathways leading to each metabolite, is available in Additional file 8. Moreover, metadata e. Identification of predicted metabolites of epicatechin in humans, which are assumed to be nearly identical for rats.

The figure illustrates: (a) metabolites correctly identified by BMIT, and corresponding to masses (in Da) observed in our experimental study; (b) metabolites correctly identified by BMIT, and corresponding to masses observed exclusively in previous studies; and (c) a previously reported metabolite of epicatechin not identified by BMIT.

We also tested whether BMIT could identify any of the remaining 38 known metabolites corresponding to 26 unique masses previously reported, but not observed in our study, or not selected by our data treatment parameters.

The 26 unique masses were provided to BioTransformer as input, and the identification was performed using the same mass tolerance as before 0. BMIT was able to suggest 28 molecules for 19 unique masses. Among those, 21 compounds corresponding to 18 unique monoisotopic masses had previously been reported as epicatechin metabolites (Additional file 7: Table S2). Figure 9 illustrates a number of epicatechin metabolites exclusively reported in previous studies, which were correctly identified by BMIT Fig.

Overall, BMIT was able to suggest 39 epicatechin metabolites that were previously reported in the literature, 18 of which were observed in our study. Moreover, BMIT suggested 28 epicatechin metabolites that had not been reported in previous studies (17 corresponding to masses that do not match previously reported ones, and 11 extra structures matching previously known masses).
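The mass-matching step behind this kind of identification can be illustrated with a minimal sketch. It is not BioTransformer's implementation: the function name `match_masses`, the `tol_da` parameter, and the toy masses are all hypothetical, and the tolerance value is an arbitrary placeholder rather than the one used in the study.

```python
from collections import defaultdict


def match_masses(observed, candidates, tol_da):
    """Group candidate metabolite names by the observed mass they match.

    observed   -- iterable of monoisotopic masses (Da) extracted from MS data
    candidates -- iterable of (name, monoisotopic_mass) for predicted metabolites
    tol_da     -- absolute mass tolerance in Da (hypothetical parameter)
    """
    matches = defaultdict(list)
    for obs in observed:
        for name, mass in candidates:
            # A candidate matches when it lies within the tolerance window.
            if abs(mass - obs) <= tol_da:
                matches[obs].append(name)
    return dict(matches)


# Toy values for illustration only; they are not data from the study.
toy_candidates = [
    ("metabolite_A", 100.000),
    ("metabolite_B", 100.004),
    ("metabolite_C", 250.000),
]
toy_hits = match_masses([100.002, 300.000], toy_candidates, tol_da=0.005)
```

Because isomeric structures share a monoisotopic mass, one observed mass can map to several candidates, which is why the number of suggested metabolites exceeds the number of unique masses.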

BioTransformer is a software tool that combines both a knowledge-based approach and a machine learning approach to predict the metabolism of small molecules, and to assist in metabolite identification.

The knowledge-based system consists of a biotransformation database (MetXBioDB), a knowledgebase (the reaction knowledgebase), and a reasoning engine.

MetXBioDB is a unique resource that is freely available, and covers a wide range of enzymatic reactions that take place in human tissues, the human gut and the environment (soil and water microflora). For each biotransformation, at least one scientific source or reference is provided.

One potential application of MetXBioDB is in the design of biotransformation rules with narrow specificity, which can be used for in silico metabolism prediction. Although it covers a large number of enzymatic reactions, it is clear that more data is needed in order to cover an even larger set of reactions e. Moreover, users could benefit from data about the different sites of metabolism for each specific biotransformation, as it would serve as a training set for the development of models for the prediction of sites of metabolism (SoMs).

For the current version of MetXBioDB, the intent was simply to provide an easily readable and comprehensible data set. However, providing MetXBioDB in a database format that can be parsed and queried in a more sophisticated way (e.g. SQL) would make the database much more useful to a broader range of users. We welcome and encourage contributions in regard to the curation, improvement, and expansion of this resource.

In our first test, BioTransformer was evaluated against Meteor Nexus v. Meteor Nexus is a commercially available software tool that is considered to be the gold standard for predicting biotransformations of xenobiotics. It is worth noting that BioTransformer relies heavily on the selective nature of the biotransformation rules and other structural constraints, in addition to its implementation of relative reasoning.

On the other hand, Meteor Nexus combines the continuous absolute scoring of biotransformations with relative reasoning, providing binned data for different levels of reasoning through a more dynamic scoring system.

Overall, the performance of BioTransformer suggests that the freely accessible BioTransformer tool could be used to assist scientists in various drug discovery and environmental safety studies. The better performance, compared to the first test, can be partly explained by the fact that some endobiotics, such as sphingo- and glycerophospholipids, follow very classical and well-known metabolic pathways (Additional file 2: Fig. S3), which were encoded in the reaction knowledgebase. Nevertheless, these results show that BioTransformer was also able to accurately predict the metabolism of compounds with a more complex metabolism Fig. In fact, BioTransformer was able to correctly predict the human and human gut metabolism of polyphenols (e.g. epicatechin), and pharmaceuticals e.

This is very promising, as little is known about gut microbial metabolism of those classes of compounds. Even for the well-studied, and biologically relevant class of polyphenols, a lot of experimental work is needed to validate the metabolic pathways for hundreds of known compounds.

BioTransformer could be used to provide accurate suggestions about the identity of their metabolites and propose metabolic pathways, which could then in turn be validated experimentally.

Overall, the first three tests demonstrate BioTransformer's ability to accurately predict human and human gut microbial metabolism for a very diverse set of metabolites, covering endogenous metabolites, pharmaceuticals and personal care products, food compounds, as well as other exogenous compounds.

In particular, BioTransformer is open access, and it covers a much wider range of chemical substrates and metabolic biotransformations. These results suggest that BioTransformer could also be used to accurately predict environmental microbial metabolism. This task tacitly relies on the metabolism prediction task, and BioTransformer was able to suggest 37 metabolites matching 20 masses from a list of monoisotopic masses extracted from the MS analysis of urine samples collected after exposure to epicatechin (Additional file 7: Table S1).

Of those, 18 metabolites were identified as previously known metabolites. Twenty-six monoisotopic masses matching 36 reported epicatechin metabolites were not observed in our experimental study. This variation in the observed metabolites may be caused by different experimental settings and analytical conditions e.

For example, rats are expected to perform less sulfonation of epicatechin than humans [87]. In a second run, BMIT was used to search for metabolites corresponding to monoisotopic masses that were observed in previous studies but not in our experimental dataset. In this test it was able to correctly identify another 21 known epicatechin metabolites.

Overall, BMIT was able to predict 39 out of 56 previously reported compounds. The discrepancy between the number of metabolites suggested by BMIT and the number of previously reported metabolites could be explained by several factors. First, ten of the known epicatechin metabolites not predicted by BMIT (3 masses observed in our study) are products of a 2-step conjugation, but the superbio option simulates only one conjugation step, as a single step is often sufficient to make a molecule stable and hydrophilic enough for excretion, based on experimental data from MetXBioDB.

Second, in some cases e. Often, the same reaction (especially conjugations) can occur at several locations within a molecule, thus producing regioisomers. These examples illustrate a common problem in metabolism prediction: the identification of the correct sites of metabolism. We believe that increasing the number of true positives, as well as reducing the number of false positives, could be achieved by integrating models that more accurately predict sites of metabolism.

It is worth mentioning that these are only putative predicted metabolites, and that the results of BMIT must be validated experimentally through further MS-based investigations. However, it was beyond the scope of this particular experimental study to fully investigate the metabolism of epicatechin. Indeed, we believe that complementary analytical platforms such as GC-MS would be necessary to cover the whole chemical space of epicatechin metabolites.

Epicatechin is metabolized in the liver, and more extensively by the gut microbiome. In addition, the predictions generated by BMPT could be very useful for suspect-screening analysis, and thereby permit faster non-targeted data analysis and more facile putative compound identification. This makes it particularly useful for the wide-ranging applications seen in metabolomics and other small molecule studies.

Furthermore, the accuracy, coverage, precision and recall of BioTransformer appear to be as good as, or even much better than, some of the most highly regarded metabolic prediction systems now available. It is also notable that BioTransformer, unlike most of its competitors, is freely available. A more extensive analysis of a much larger set of query compounds would likely better illustrate the strengths and weaknesses of BioTransformer. While BioTransformer has a number of strengths, we believe that certain improvements could still be made to the program.

In particular, adding an option for absolute reasoning would give BioTransformer the ability to select candidates with a set cut-off score. As many xenobiotics as well as endogenous compounds are known to be metabolized in the gut [75, 89, 90, 91, 92], it will be important to further expand the coverage of gut microbial metabolism in BioTransformer.

We plan to make these improvements in upcoming versions of BioTransformer. Over the longer term we are hoping to integrate more machine learning prediction models e. This integration depends mostly on the amount of data available as machine learning hinges on having large and diverse training sets to optimize its performance.

Given that the number of experimentally confirmed biotransformations is still quite low for the systems of interest, it is likely that this will take a number of years to complete.

In this work, we have presented BioTransformer, a freely available, open access software tool that supports the rapid, accurate, comprehensive prediction of metabolism of small molecules in both mammals and environmental microorganisms. BioTransformer can also assist in metabolite identification using experimental MS data. BioTransformer can be used either as a command-line tool or as an imported library.

Moreover, BioTransformer is also freely accessible as a web service at www. The web service provides users with the possibility to manually or programmatically submit queries, and retrieve data generated by the BioTransformer software tool.

Within mammals, we have shown that BioTransformer was able to accurately predict single-step biotransformations for a diverse set of xenobiotics, including drugs, pesticides, and food compounds.

The reactions that BioTransformer predicts cover Phase I and Phase II metabolism in mammals, as well as the human gut microbial metabolism. Unlike most other metabolic prediction tools, BioTransformer also supports the prediction of metabolism of small molecules by environmental microbes. The integration of environmental metabolism with endogenous human and gut microbial metabolism allows BioTransformer to address many of the predictive metabolic needs of metabolomics or exposomics researchers, which tend to span a much wider range than, say, drug researchers, food chemists or environmental scientists.

Despite its strengths, BioTransformer is not without some limitations. Addressing these would certainly make the program much more flexible, more accurate, and more comprehensive.

On my MacPro it utilised multiple cores. The output is stored in a folder entitled "metabolitepredictionresults" created in the user's home folder.

This folder will contain the predictions in the form of one or more SDF files, whereby each SDF file corresponds to up to input molecules.

The input molecules are included as the first record in the output file(s), so that each input molecule is followed by its predicted metabolites. The resulting output file contains a large number of predicted metabolites together with a score, a description of the type of reaction and the InChI. This looks to be really comprehensive and would be very useful for those involved in metabolite ID. Running the prediction with aspirin as the input highlights a variety of non-CYP mediated metabolic pathways.

I tried a range of other molecules and GLORYx was really very impressive in identifying potential metabolites. Way2Drug offers a web service for predicting sites of metabolism, details of which have been published DOI. However, I would not recommend that you use it for proprietary molecules. All major classes of metabolic reactions (aliphatic and aromatic hydroxylation, N- and O-glucuronidation, N-, S- and C-oxidation, and N- and O-dealkylation) are evaluated.

Global solutions are intended to predict the metabolism of any molecule exposed to a complex biological system. This type of solution is often rule-based and uses an extensive database of known biotransformations. Programs such as MetaDrug Expert Opin.

Drug Metab. The transformations described include, among many others: C-, N-, S- and P-oxidation, including dealkylation, hydroxylation, double bond peroxidation and quinone formation; reduction (nitro, carbonyl, azo, sulphur); hydrolysis (esters, amides, phosphates, epoxides); glucuronidation; sulphation; glutathione conjugation; methyl transferases; and amino acid conjugation.

Other programs using similar approaches include Meteor, Pure Appl. CypReact is a command-line tool that is extremely fast and ideal for quickly evaluating a batch of compounds. See also the section on Metabolism.

Predicting Metabolism

All drugs are subject to metabolic processes; in general these processes serve to increase the polarity of molecules in an effort to increase excretion.

These weighting factors thereby maintain the same ratio as described previously in ref 15 but are scaled such that the final priority score better reflects a probability-like concept, with values ranging from 0 to 1.

The final priority score of a predicted metabolite is thereby the product of the maximum SoM probability and the weighting factor corresponding to the priority level, common or uncommon, of the reaction type. The final assignment of a priority level to the reaction rules was determined rationally.
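As a minimal sketch of the scoring rule just described (maximum SoM probability multiplied by a weighting factor for the reaction type), the following function may help. The function name and the two default weights are hypothetical placeholders; the actual GLORYx weighting factors are not given in this text.

```python
def priority_score(som_probabilities, is_common, w_common=1.0, w_uncommon=0.2):
    """Score one predicted metabolite.

    som_probabilities -- predicted SoM probabilities of the heavy atoms
                         matched by the reaction rule on the parent molecule
    is_common         -- True if the reaction type is designated common
    w_common, w_uncommon -- hypothetical weights, NOT the published values
    """
    if not som_probabilities:
        raise ValueError("at least one SoM probability is required")
    weight = w_common if is_common else w_uncommon
    # The score is the product of the highest SoM probability and the weight,
    # so it stays in [0, 1] as long as both factors do.
    return max(som_probabilities) * weight
```

With weights no larger than 1, a metabolite from an uncommon reaction can never outrank a metabolite from a common reaction at the same site.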

The phase 2 rules corresponding to the five main phase 2 enzyme families were designated common, while the others (glycination, phosphorylation, and dephosphorylation) were designated uncommon. Predicted metabolites were compared to the known metabolites from either the reference data set or the test data set using InChIs that were generated without stereochemistry information. Spontaneous oxidation from an aldehyde to a carboxylic acid was considered during the evaluation process, as in GLORY (see ref 15), but only for predicted metabolites that were the product of a phase 1 reaction rule.

Note that this applies only to the validation and does not affect the predicted metabolites that are provided to the users of GLORYx. When SyGMa is run with a single metabolism Scenario object specifying both phase 1 and phase 2, the rule sets for the phases are applied sequentially, that is, the first rule set listed (phase 1) is applied first, and then the second rule set (phase 2) is applied to the parent compound as well as the products of the first rule set.

This behavior corresponds to a different research question than the one posed in our evaluation, so SyGMa was instead run twice for each molecule in the test set, once using only the phase 1 rules and then separately using only the phase 2 rules.
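The difference between the sequential mode and two separate single-phase runs can be seen in a toy sketch. It does not use the SyGMa API: rules are modelled as plain Python functions on strings, all names are hypothetical, and merging the two runs as a set union is an assumption about how results are pooled.

```python
def apply_rules(rules, molecules):
    """Apply every rule to every molecule and collect all products."""
    products = set()
    for mol in molecules:
        for rule in rules:
            products |= rule(mol)
    return products


def sequential(parent, phase1, phase2):
    """Phase 1 first, then phase 2 on the parent AND the phase 1 products."""
    p1 = apply_rules(phase1, {parent})
    p2 = apply_rules(phase2, {parent} | p1)
    return p1 | p2


def separate_runs(parent, phase1, phase2):
    """Each phase applied to the parent only; results pooled as a union."""
    return apply_rules(phase1, {parent}) | apply_rules(phase2, {parent})


# Toy string-based "rules" standing in for real reaction rules.
def hydroxylate(mol):
    return {mol + "+OH"}


def glucuronidate(mol):
    return {mol + "+GlcA"}
```

For a parent `"X"`, the sequential mode also yields the two-step product `"X+OH+GlcA"`, whereas the separate runs yield only single-step products.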

The predictions from both runs were combined. The concept of GLORYx is that SoMs, or rather the probability of each heavy atom being a SoM, are predicted with FAME 3, and, building on these predictions, a set of reaction rules is applied in order to generate the structures of predicted metabolites for both phase 1 and phase 2 metabolism.

We have previously determined, for our earlier CYP-focused metabolite prediction tool GLORY, that using the predicted SoM probabilities as a hard cutoff to determine whether or not to apply a reaction rule at a given position is not a particularly effective approach, except if the goal is to simply reduce the number of predictions. This reference data set was used to examine phase 1 and phase 2 metabolism separately to make sure each phase could be handled satisfactorily on its own as well as to determine how to best combine predictions for both phases.

Considering both phase 1 and phase 2 metabolism, and using the data preparation process described in Methods , we collected metabolite data for parent molecules from DrugBank and parent molecules from MetXBioDB. Of these parent molecules, are identical, not considering stereochemistry, meaning there are parent molecules total from both sources combined. The metabolites for the overlapping parent molecules were consolidated when forming the reference data set. Within this overlap, of metabolites were present in both data sets.

Neither data source includes annotations regarding whether any given metabolite data were collected in an in vivo or an in vitro study. Beyond noting the amount of overlap between the two data sources, we wanted to examine the chemical space covered by each, in terms of the parent molecules. For DrugBank, an analysis focused specifically on the compounds for which there is metabolite data has also not yet been undertaken.

When performing this analysis, we retained the overlapping parent molecules in both data sets. In addition, we noted a shift in the distributions, whereby DrugBank has a median molecular weight of while MetXBioDB has a median of only The mean values are not compared due to the presence of an outlier with a molecular weight of Da (semaglutide) in the DrugBank data. However, for clog P the median values of the two distributions are very similar, at 3. (A) Distribution of molecular weight. (B) Distribution of clog P.

(C) Histogram of the number of metabolites per parent molecule in terms of percentage of parent molecules. The percentage of the total variance explained by each of the first two principal components is included in the axis labels. In the context of metabolite prediction, it is especially interesting to compare the ratio of parent molecules and metabolites recorded in a data set, as this ratio can give an indication of the comprehensiveness of the metabolism data (metabolism data are generally incomplete; more metabolites are typically known for compounds of high relevance, in particular approved drugs).

In both cases, the majority of parent molecules have only one known metabolite. From the PCA we see that there is a large amount of overlap between the two data sets, which is unsurprising given that most of the molecules in the DrugBank data set are also included in the MetXBioDB data set.

However, we also see that there are portions of the chemical space populated by parent molecules from DrugBank but not from MetXBioDB, which is consistent with the results from the comparison of the distributions of molecular weight and clog P.

In particular, molecule size seems to influence the first principal component, while polarity seems to influence the second principal component. Interestingly, the five data points (two from DrugBank, three from MetXBioDB) in the far right portion of the PCA plot correspond to the five largest molecules included in the calculation, all of which have a molecular weight between and Da (the outlier with a molecular weight of over Da was not included in the PCA).

These five molecules consist of four macrocyclic peptides (including cyclosporine) and one nonmacrocyclic peptide (angiotensin II). Whereas the above chemical space analysis included all valid metabolite data from DrugBank and MetXBioDB, a further data preprocessing step was performed for the formation of the final reference data set used for the evaluation of the metabolite structure prediction approach.

All metabolism data corresponding to parent molecules contained in the test set were removed from the reference data set. This removal resulted in a final reference data set containing parent molecules and a total of metabolites. The reference data set was further separated into two subsets, corresponding to phase 1 and phase 2 metabolism. The phase 1 subset contains parent molecules and metabolites, and the phase 2 subset contains parent molecules and their metabolites (Table 1).

Note that some of the phase 2 metabolites do not correspond to any of the listed enzyme families, just as some of the phase 1 metabolites are not formed by CYPs. The two separate subsets of the reference data set were used to analyze the performance of GLORYx for phase 1 and phase 2 individually, because there are slightly different considerations for each metabolism phase. In addition, the entire reference data set was used to analyze the combined prediction of both phase 1 and phase 2 metabolites.

Note that GLORYx is unable to process two parent molecules in the phase 1 subset of the reference data set and one parent molecule in the phase 2 subset. Both of the phase 1 parent molecules contain a Se atom, which FAME 3 cannot handle (partial charges cannot be calculated; see Methods for a list of allowed element types). Because no SoM predictions can be made, no metabolites are predicted.

The parent molecule in the phase 2 subset cannot be processed because it contains a nitrogen atom with a state that FAME 3 does not recognize. This is the case regardless of which FAME 3 model is used. The fundamental concept of our approach to predicting metabolites is to integrate machine learning-based SoM prediction in order to score the predicted metabolites. Each predicted metabolite was scored using the maximum SoM probability among all heavy atoms of the parent molecule that were matched by the reaction rule leading to that metabolite.

In this case, the score was therefore equal to this SoM probability; no weighting based on reaction type was used. SyGMa, on the other hand, ranks its predictions based on probability scores that are calculated using the occurrence ratios of each reaction rule in the Metabolite database.

Given the same reaction rules, SyGMa with its reaction probability score-based ranking performed slightly better than our SoM probability-based ranking, with an AUC of 0. This supposition is consistent with the observation in by Kirchmair et al. Rank-based ROC curves for the evaluation of metabolite prediction performance on the reference data set. The ranks are calculated based on the priority scores of the predicted metabolites for each parent molecule.

Weighted rules refer to the weighting of the SoM probability-based score based on whether the reaction type is designated common or uncommon. The scoring approach that is based on both SoM probability and reaction probability is achieved by a simple multiplication of the two components. (C) Comparison of the ranking performance of GLORYx for combined prediction of metabolites for phase 1 and phase 2 metabolism, using different SoM prediction approaches to score the predicted metabolites.

When combining the rule sets, the overlap of the rules from the two different sources is handled in a straightforward manner. Duplicate metabolite predictions are combined by retaining the highest priority score. The addition of the CYP reaction rules from GLORY resulted in a substantial jump in recall (the portion of known metabolites that were successfully predicted, also known as sensitivity) from 0.

The precision (the portion of predictions that match known metabolites), on the other hand, was halved, as the number of total predicted metabolites more than doubled, from over 10, to nearly 25, Note that only a fraction of the metabolites generated by organisms is experimentally observed and reported in the scientific literature and databases, for a number of reasons e.
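The duplicate handling and the two metrics defined above can be sketched as follows. This is an illustration rather than the GLORYx code: function names are hypothetical, and structures are represented by opaque identifiers (for example stereochemistry-free InChI strings, as used in the evaluation).

```python
def merge_duplicates(predictions):
    """Collapse duplicate predictions, keeping the highest priority score.

    predictions -- iterable of (structure_id, priority_score) pairs
    """
    best = {}
    for structure_id, score in predictions:
        if structure_id not in best or score > best[structure_id]:
            best[structure_id] = score
    return best


def precision_recall(predicted, known):
    """Return (precision, recall) for sets of structure identifiers.

    precision = matched predictions / all predictions
    recall    = matched predictions / all known metabolites
    """
    predicted, known = set(predicted), set(known)
    true_positives = len(predicted & known)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(known) if known else 0.0
    return precision, recall
```

Adding rules can only add predictions, so recall cannot fall, while precision drops whenever the added predictions are mostly unmatched; this is the trade-off described for the combined rule set.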

Nevertheless, the number of predicted metabolites is enormous, so it is crucial that metabolite prediction methods are able to rank their predictions in a meaningful way. To examine the ranking performance of GLORYx using the combined rule set, we first used only the SoM probability to score and rank the predicted metabolites, as described above.

This nonweighted scoring approach resulted in an AUC of 0. Note that even though the sets of predicted metabolites are different in this case, the ranking ability of each approach can still be compared using the ROC curves and AUC.
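To make the AUC comparison concrete, here is a minimal sketch of an AUC computed directly from scores and labels via pairwise comparison (the Mann-Whitney formulation). It is a generic score-based AUC, offered as a simplification; the evaluation described here builds its ROC curves from per-parent ranks, which this sketch does not reproduce. The function name is hypothetical.

```python
def roc_auc(scores, labels):
    """Probability that a known metabolite outscores a non-matching prediction.

    scores -- priority scores of the predicted metabolites
    labels -- truthy where the prediction matches a known metabolite
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

Because the AUC depends only on how matches and non-matches are ordered, it can be compared between methods even when they produce different sets of predicted metabolites.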

We then applied the concept of weighting reaction rules that we first developed for GLORY, namely applying a simple common vs uncommon distinction between reaction types and generating the priority score for a predicted metabolite by multiplying the SoM probability by a factor corresponding to whether or not the reaction that led to that particular predicted metabolite was designated common (see ref 15 for details).

The result of this weighting of the rule sets was a jump in AUC to 0. This means that predicted metabolites are compared across different parent molecules in the reference data set in terms of their priority scores. Here, it is important to note that the original publication of SyGMa implied that its score was only intended to be used to compare likelihoods of predicted metabolites of the same parent molecule, and the evaluation in that publication only considered the ranking per parent molecule.

In silico metabolism prediction is the cheminformatic task of automatically predicting the set of metabolic byproducts produced from a specified molecule and a set of enzymes or reactions. Here we describe a novel machine-learned in silico cytochrome P450 (CYP) metabolism prediction suite, called CyProduct, that accurately predicts metabolic byproducts for a specified molecule and a human CYP isoform.

CyProduct predicts metabolic biotransformation products for each of the nine most important human CYP enzymes. Title: CyProduct: A software tool for accurately predicting the byproducts of human cytochrome P450 metabolism.


