by Keyword: Mass spectrometry
Rodríguez-Pérez, R., Fernández, L., Marco, S., (2018). Overoptimism in cross-validation when using partial least squares-discriminant analysis for omics data: a systematic study Analytical and Bioanalytical Chemistry 410, (23), 5981-5992
Advances in analytical instrumentation have provided the possibility of examining thousands of genes, peptides, or metabolites in parallel. However, the cost and time-consuming data acquisition process causes a generalized lack of samples. From a data analysis perspective, omics data are characterized by high dimensionality and small sample counts. In many scenarios, the analytical aim is to differentiate between two different conditions or classes combining an analytical method plus a tailored qualitative predictive model using available examples collected in a dataset. For this purpose, partial least squares-discriminant analysis (PLS-DA) is frequently employed in omics research. Recently, there has been growing concern about the uncritical use of this method, since it is prone to overfitting and may aggravate problems of false discoveries. In many applications involving a small number of subjects or samples, predictive model performance estimation is only based on cross-validation (CV) results with a strong preference for reporting results using leave-one-out (LOO). The combination of PLS-DA for high dimensionality data and small sample conditions, together with a weak validation methodology, is a recipe for unreliable estimations of model performance. In this work, we present a systematic study about the impact of the dataset size, the dimensionality, and the CV technique used on PLS-DA overoptimism when performance estimation is done in cross-validation. Firstly, by using synthetic data generated from the same probability distribution and with assigned random binary labels, we have obtained a dataset where the true classification rate (CR) is 50%. As expected, our results confirm that internal validation provides overoptimistic estimations of the classification accuracy (i.e., overfitting). We have characterized the CR estimator in terms of bias and variance depending on the internal CV technique used and the sample-to-dimensionality ratio.
In small sample conditions, due to the large bias and variance of the estimator, the occurrence of extremely good CRs is common. We have found that overfitting peaks when the sample size in the training subset approaches the feature vector dimensionality minus one. In these conditions, the models are neither under- nor overdetermined with a unique solution. This effect is particularly intense for LOO and peaks higher in small sample conditions. Overoptimism decreases beyond this point, where the abundance of noisy features produces a regularization effect leading to less complex models. In terms of overfitting, our study ranks CV methods as follows: Bootstrap produces the most accurate estimator of the CR, followed by bootstrapped Latin partitions, random subsampling, K-Fold, and finally, the very popular LOO provides the worst results. Simulation results are further confirmed in real datasets from mass spectrometry and microarrays.
JTD Keywords: Metabolomics, Mass spectrometry, Microarrays, Chemometrics, Data analysis, Classification, Method validation
Taghadomi-Saberi, S., Garcia, S. M., Masoumi, A. A., Sadeghi, M., Marco, S., (2018). Classification of bitter orange essential oils according to fruit ripening stage by untargeted chemical profiling and machine learning Sensors 18, (6), 1922
The quality and composition of bitter orange essential oils (EOs) strongly depend on the ripening stage of the citrus fruit. The concentration of volatile compounds and, consequently, their organoleptic perception vary. While this can be detected by trained humans, we propose an objective approach for assessing the bitter orange from the volatile composition of its EO. The method is based on the combined use of headspace gas chromatography–mass spectrometry (HS-GC-MS) and artificial neural networks (ANN) for predictive modeling. Data obtained from the analysis of HS-GC-MS were preprocessed to select relevant peaks in the total ion chromatogram as input features for ANN. Results showed that key volatile compounds have enough predictive power to accurately classify the EOs according to their ripening stage for different applications. A sensitivity analysis detected the key compounds to identify the ripening stage. This study provides a novel strategy for the quality control of bitter orange EO without subjective methods.
JTD Keywords: Bitter orange essential oil, Headspace gas chromatography–mass spectrometry, Artificial neural network, Foodomics, Chemometrics, Feature selection
Oller-Moreno, Sergio, Cominetti, Ornella, Galindo, Antonio Núñez, Irincheeva, Irina, Corthésy, John, Astrup, Arne, Saris, Wim H. M., Hager, Jörg, Kussmann, Martin, Dayon, Loïc, (2018). The differential plasma proteome of obese and overweight individuals undergoing a nutritional weight loss and maintenance intervention PROTEOMICS - Clinical Applications 12, (1), 1600150
Purpose: The nutritional intervention program “DiOGenes” focuses on how obesity can be prevented and treated from a dietary perspective. We generated differential plasma proteome profiles in the DiOGenes cohort to identify proteins associated with weight loss and maintenance and explore their relation to body mass index, fat mass, insulin resistance and sensitivity. Experimental Design: Relative protein quantification was obtained at baseline and after combined weight loss/maintenance phases using isobaric tagging and MS/MS. A Welch t-test determined proteins differentially present after intervention. Protein relationships with clinical variables were explored using univariate linear models, considering collection center, gender and age as confounding factors. Results: 473 subjects were measured at baseline and end of the intervention; 39 proteins were longitudinally differential. Proteins with the largest changes were sex hormone-binding globulin, adiponectin, C-reactive protein, calprotectin, serum amyloid A, and proteoglycan 4 (PRG4), whose association with obesity and weight loss is known. We identified new putative biomarkers for weight loss/maintenance. Correlation between PRG4 and proline-rich acidic protein 1 (PRAP1) variation and Matsuda insulin sensitivity increment was shown. Conclusions and Clinical Relevance: MS-based proteomic analysis of a large cohort of non-diabetic overweight and obese individuals concomitantly identified known and novel proteins associated with weight loss and maintenance.
JTD Keywords: Biomarker, Diabetes, Large-scale study, Mass spectrometry, Obesity, Proteomics
Moles, E., Marcos, J., Imperial, S., Pozo, O. J., Fernàndez-Busquets, X., (2017). 2-picolylamine derivatization for high sensitivity detection of abscisic acid in apicomplexan blood-infecting parasites Talanta 168, 130-135
We have developed a new liquid chromatography-electrospray ionization tandem mass spectrometry methodology based on 2-picolylamine derivatization and positive ion mode detection for abscisic acid (ABA) identification. The selected reaction leads to the formation of an amide derivative which contains a highly active pyridyl group. The enhanced ionization allows for a 700-fold increase over commonly monitored unmodified ABA, which in turn leads to excellent limits of detection and quantification values of 0.03 and 0.15 ng mL⁻¹, respectively. This method has been validated in the highly complex matrix of a red blood cell extract. In spite of the high sensitivity achieved, ABA could not be detected in Plasmodium falciparum-infected red blood cells, suggesting that, if present, it will be found either in ultratrace amounts or as brief bursts at defined time points within the intraerythrocytic cycle and/or in the form of a biosynthetic analogue.
JTD Keywords: Abscisic acid, Apicomplexa, Liquid chromatography-electrospray ionization tandem mass spectrometry, Malaria, Picolylamine, Plasmodium falciparum
Gimenez-Oya, V., Villacanas, O., Fernàndez-Busquets, X., Rubio-Martinez, J., Imperial, S., (2009). Mimicking direct protein-protein and solvent-mediated interactions in the CDP-methylerythritol kinase homodimer: a pharmacophore-directed virtual screening approach Journal of Molecular Modeling 15, (8), 997-1007
The 2C-methylerythritol 4-phosphate (MEP) pathway for the biosynthesis of isopentenyl pyrophosphate and its isomer dimethylallyl pyrophosphate, which are the precursors of isoprenoids, is present in plants, in the malaria parasite Plasmodium falciparum and in most eubacteria, including pathogenic agents. However, the MEP pathway is absent from fungi and animals, which have exclusively the mevalonic acid pathway. Given the characteristics of the MEP pathway, its enzymes represent potential targets for the generation of selective antibacterial, antimalarial and herbicidal molecules. We have focussed on the enzyme 4-(cytidine 5'-diphospho)-2-C-methyl-D-erythritol kinase (CMK), which catalyses the fourth reaction step of the MEP pathway. A molecular dynamics simulation was carried out on the CMK dimer complex, and protein-protein interactions analysed, considering also water-mediated interactions between monomers. In order to find small molecules that bind to CMK and disrupt dimer formation, interactions observed in the dynamics trajectory were used to model a pharmacophore used in database searches. Using an intensity-fading matrix-assisted laser desorption/ionisation time-of-flight mass spectrometry approach, one compound was found to interact with CMK. The data presented here indicate that a virtual screening approach can be used to identify candidate molecules that disrupt the CMK-CMK complex. This strategy can contribute to speeding up the discovery of new antimalarial, antibacterial, and herbicidal compounds.
JTD Keywords: Solvent-mediated interactions, Protein-protein interactions, Molecular dynamics, Drug design, Intensity-fading MALDI-TOF mass spectrometry