WO2025012793A1 - Method for identifying one or more biomarkers relating to mental health disorders and biomarkers determined by this method - Google Patents

Method for identifying one or more biomarkers relating to mental health disorders and biomarkers determined by this method

Info

Publication number
WO2025012793A1
Authority
WO
WIPO (PCT)
Prior art keywords
biomarker
mental health
mdd
health disorder
dataset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/IB2024/056625
Other languages
English (en)
Inventor
Rifat HAMOUDI
Amal BOUZID
Maksim SHARAEV
Evgeny BURNAEV
Alexander Bernstein
Zubrikhina MARIA
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Skoltech University
University of Sharjah
Original Assignee
Skoltech University
University of Sharjah
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Skoltech University, University of Sharjah
Publication of WO2025012793A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/40 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for data related to laboratory analysis, e.g. patient specimen analysis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/10 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/158 Expression markers

Definitions

  • the present disclosure provides a method for biomarker analysis. More particularly, the present disclosure provides a method for identifying biomarker(s) associated with mental health disorder(s) such as but not limited to Major Depressive Disorder (MDD).
  • MDD Major Depressive Disorder
  • the identified biomarkers allow early and efficient detection of mental health disorder(s) by non-invasive methods.
  • Major Depressive Disorder is among the most prevalent chronic, complex psychiatric disorders. Depression affects over 280 million people globally according to the World Health Organization (WHO, 2023), and is the second leading cause of disability worldwide after cancer, while being predicted to be the leading cause by 2030. In the post-COVID-19 era, mental and behavioral disorders are reported to have become more severe, possibly due to the pandemic's effects on healthcare and the economy worldwide.
  • MDD is a heterogeneous disorder, resulting from a complex interaction of social, psychological, environmental and genetic factors. Depression does not only affect the mental and psychological aspects of the individual’s health, but also affects physical health by disturbing the heart, kidney, nervous system, and immune system. It is characterized by the presence of depressed moods, functional impairments, a loss of interest in activities, fatigue, sleep disturbances, and psychomotor retardation or agitation. This negatively affects the patient’s productivity, self-perception, and self-esteem, resulting in an impaired quality of life which can lead to suicidal ideation and attempt. The WHO reported that more than 700,000 individuals worldwide die as a result of suicide each year, with depression being a leading cause (2023). Thus, MDD has grown into a major public health problem that needs urgent attention.
  • MDD affects the functional activity of the brain which can be identified by analyzing data on aberrations in the neuronal network. Moreover, the heterogeneity of depressive disorders may also be related to neuronal plasticity correlated with different depressive symptomatology. Besides, it is suggested that MDD results from systemic changes in the signaling and biochemical pathways involved in the regulation of moods, cognitive functions and disposition. Therefore, a comprehensive understanding of affected or associated biological pathways involved in MDD is of high importance to reveal the molecular mechanism of MDD and to identify accurate targets for MDD diagnosis.
  • the present disclosure sets out to compare transcriptomics data of MDD patients to healthy controls using an integrative approach reliant on bioinformatics and ML based techniques, to identify key biomarkers that can aid in the early diagnosis and prediction of the onset of depression in patients.
  • the present disclosure provides a method for identifying biomarker(s) for detection of a mental health disorder, the method comprising: receiving a dataset comprising one or more input features, wherein the one or more input features includes at least one of biological parameters and clinical parameters; preprocessing the dataset to obtain a normalized and regularized dataset; subjecting the preprocessed dataset to an integrative analysis engine, wherein the integrative analysis engine comprises a biomarker identification model and a bioinformatics analysis model, wherein: the biomarker identification model is implemented to create an environment such that a learning agent, associated with the environment, is configured to: determine a similarity score for each of input features of the preprocessed dataset, wherein the biomarker identification model is trained by correlating training feature(s), derived from a repository dataset, with a corresponding predetermined similarity score, wherein the repository dataset includes a set of healthy reference parameters associated with one or more healthy controls, and a set of diseased reference parameters associated with one or more disease
  • implementing the biomarker identification model comprises applying a machine learning (ML) model, based on the preprocessed dataset, to create an environment, wherein the learning agent is associated with the environment, the learning agent being configured to: determine patterns and relationships among the one or more input features that are indicative of the mental health disorder(s); compute a similarity score of the determined patterns and relationships based on related contribution to the detection of the mental health disorder(s); and identify, based on the associated similarity scores, one or more target input features that are indicative of the mental health disorder(s).
  • ML machine learning
  • the biomarker identification model comprises two or more machine learning (ML) models, to create the environment, wherein the learning agent is associated with the environment, the learning agent being configured to: select genes altered by the mental health disorder(s) based on individual outputs of the two or more ML models; perform hyperparameter optimization for the individual outputs of the two or more ML models; perform cross-validation on the individual outputs of the two or more ML models; and identify, based on the cross-validation, the first set of biomarker(s) that are indicative of the mental health disorder.
  • the aforesaid method comprises: implementing a training agent in the environment of the biomarker identification model, wherein the training agent is configured to train the learning agent using information of the set of healthy reference parameters associated with the one or more healthy controls, and the set of diseased reference parameters associated with the one or more diseased subjects from the repository dataset.
  • the similarity score of the biomarker(s) is determined by: identifying patterns and relationships among the one or more input features that are indicative of the mental health disorder.
  • the dataset comprises biological and clinical data, wherein the biological data includes genetic, proteomic, metabolic, and neuroimaging information, and the clinical data includes behavioral and psychological parameters.
  • the dataset may be specific to the mental health disorder(s) for which the biomarker(s) is to be determined.
  • preprocessing of the dataset comprises: performing data analysis on the dataset according to a data normalization rule to obtain normalized and regularized dataset, wherein the data normalization rule comprises a data classification rule and one or more data filtering conditions.
  • the mental health disorder(s) is selected from a group comprising major depressive disorder (MDD), anxiety, schizophrenia, attention-deficit hyperactivity disorder (ADHD) and bipolar disorder.
  • ADHD attention-deficit hyperactivity disorder
  • the dataset further comprises contextual features, wherein the contextual features are associated with one or more biological parameters and clinical parameters related to the mental health disorder(s).
  • the above method further comprises: generating a mapping profile of the determined biomarkers with corresponding brain regions and structural alterations thereof; associating the mapping profile with cognitive functions and varied range of behaviors.
  • the mental health disorder is MDD and wherein determined biomarker(s) is selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, NRG1, OLFM4, SERPING1, TCN1, and THBS1 or any combination thereof.
  • the biological sample is a liquid biopsy sample.
  • the liquid biopsy sample is selected from a group comprising blood sample, saliva sample,
  • the level of biomarker(s) is determined by PCR.
  • determining the level of biomarker(s) in a biological sample as described above for use in one or more of: a) diagnosing MDD; b) diagnosing and treating MDD; c) determining genetic predisposition to MDD; and d) monitoring response to MDD therapy in patients diagnosed with MDD.
  • a method for diagnosing mental health disorder(s) comprising analyzing a biological sample from a subject to confirm differential expression level of one or more biomarker(s) compared to a healthy control, wherein the biomarker(s) is identified by the method as described above.
  • the mental health disorder is MDD; and wherein the one or more biomarker(s) is selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, NRG1, OLFM4, SERPING1, TCN1, and THBS1.
  • the therapy includes administration of Venlafaxine to the subject.
  • a method for identifying genetic predisposition of a subject to mental health disorder(s) comprising analyzing a biological sample from a subject to confirm differential expression level of one or more biomarker(s) compared to a healthy control, wherein the biomarker(s) is identified by the method as described above.
  • kits for determining differential expression level of one or more biomarker(s) identified by the method as described above in a biological sample, for detection of mental health disorder(s), comprising at least one amplification primer and/or at least one probe that hybridizes to nucleotide(s) encoding the one or more biomarker(s).
  • kits for determining differential expression level of one or more biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, NRG1, OLFM4, SERPING1, TCN1, and THBS1 in a biological sample comprising at least one amplification primer and/or at least one probe that hybridizes to nucleotide(s) encoding the one or more biomarker(s).
  • the amplification primers for NRG1 gene expression evaluation comprise Forward:5'-TCGTGGAATCAAACGCTACA-3' and Reverse:5'-ACTCCCCTCCATTCACACA-3'.
  • biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, NRG1, OLFM4, SERPING1, TCN1, and THBS1 or any combination thereof, for the diagnosis of MDD.
  • Figure 2 depicts the top 20 enriched pathways in MDD patients compared to healthy controls.
  • Figure 3 depicts Venn diagram showing the overlap between the significantly identified genes by machine learning models and GSEA-leading gene analysis in MDD patients compared to healthy controls.
  • Figure 4 depicts network intersections of brain regions involving A) up-regulated and
  • Figure 5 depicts gene expression profiling of the NRG1 gene in brain samples A) Subcortex, B) Left cortex from healthy samples.
  • Figure 6 depicts increased expression of NRG1 gene in MDD patients compared to Healthy controls.
  • The Y-axis shows expression in arbitrary units.
  • The Y-axis shows relative expression.
  • Figure 7 depicts the general workflow of MDD-related Biomarkers identification based on Bioinformatics and ML-approach to connect Transcriptomic changes and Brain regions.
  • mental health disorder refers to health conditions characterized by a clinically significant disturbance in an individual's cognition, emotional regulation, or behaviour or a combination of these. This includes a wide range of conditions that affect mood, thinking and behaviour such as but not limited to clinical depression, anxiety disorder, bipolar disorder, Attention-deficit/hyperactivity disorder (ADHD), Schizophrenia, Obsessive compulsive disorder (OCD), Post traumatic stress disorder (PTSD) and the like.
  • OCD Obsessive compulsive disorder
  • PTSD Post traumatic stress disorder
  • input features or equivalent phrases bearing the same meaning in the context of the present disclosure are intended to refer to parameters and/or components of the dataset introduced into the integrative analysis engine of the present disclosure.
  • the dataset comprising the input features may be subjected to preprocessing before being processed by the integrative analysis engine of the present disclosure.
  • the preprocessing may involve one or more rounds of normalization and/or regularization.
  • normalization and equivalent terms refer to a data preprocessing technique employed to standardize the values of features in a dataset to a common scale, to transform the features to a similar scale and/or distribution. Normalization aids in improving the performance and accuracy of machine learning models by eliminating the potential biases and distortions caused by the different scales or distribution of features in the dataset.
  • regularization refers to a data preprocessing technique that prevents a model from overfitting, which may occur when a model not only recognizes the inherent pattern within the training data but also incorporates the noise, potentially leading to subpar performance on fresh, unobserved data. Regularization may also include filtering to impose a set of regularity conditions on the features of the dataset.
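As an illustration of the normalization defined above, the following is a minimal sketch of one common choice, z-score standardization, which brings features measured on different scales to zero mean and unit variance. The disclosure does not prescribe this particular technique; the function name `zscore` and the sample values are hypothetical.

```python
from statistics import mean, pstdev

def zscore(values):
    """Rescale one feature to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no signal; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Two hypothetical features recorded on very different scales.
expression = [120.0, 80.0, 100.0, 140.0, 60.0]
age = [30.0, 45.0, 50.0, 25.0, 40.0]

scaled = {"expression": zscore(expression), "age": zscore(age)}
```

After scaling, both features contribute on a comparable footing, so a downstream model is not dominated by whichever feature happens to have the larger raw range.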
  • integrative analysis engine refers to a module provided in the present disclosure that integrates functionalities of a bioinformatics model and a Machine Learning (ML)-based biomarker identification model to perform step(s) of the present subject matter.
  • bioinformatics analysis model refers to a data analysis module reliant on one or more bioinformatics tools that predicts biomarkers indicative of mental health disorder by performing a comparative analysis to determine the anomaly between datasets pertaining to diseased and control subjects.
  • biomarker identification model refers to a Machine Learning (ML) based model that is trained to determine the anomaly between datasets pertaining to diseased and control subjects and identify potential biomarkers that help in early diagnosis of mental health disorders or identification of genetic predisposition to mental health disorders.
  • artificial intelligence generally refers to machines or computers that can perform tasks in a manner that is “intelligent”, that is, not repetitive, rote or pre-programmed, by mimicking cognitive functions and reasoning and by incorporating various data perception and decision-making techniques.
  • the term “trained” or “training” in the context of the ML model in the present disclosure refers to a process of providing input data points and corresponding target values to at least one model, from which the at least one model examines and learns underlying patterns and relationships between the input data points and the corresponding target values.
  • the ML model may perform said examination and learning by allocating certain scores or weights and biases to refine related predictive and/or deterministic processing in order to produce accurate responses to fresh or new input queries.
  • models can be linear models, transformers, or neural networks such as convolutional neural networks (CNNs), Bayesian Neural Networks (BNN), and the like.
  • CNNs convolutional neural networks
  • BNN Bayesian Neural Networks
  • the model is trained on one or more of the data sets.
  • machine learning refers to a type of learning in which the machine (e.g., computer program or engine) can employ supervised or unsupervised learning techniques based on assessment and/or computation being performed on an input dataset. Machine learning may utilize such techniques in order to perform required predictions and/or determinations.
  • neural net refers to an interconnected network of computational nodes.
  • the nodes are organized into a plurality of layers in which each layer comprises one or more nodes.
  • the neural network is configured to receive an input and perform computation on the input in order to learn and solve complex relationships and identify patterns in the data associated with the input. Based on the computation, the neural network may generate a predictive and/or deterministic output.
  • exemplary is used herein to mean serving as an example, instance, or illustration. Any embodiment or implementation of the present subject matter described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments.
  • the present disclosure provides a bioinformatics and artificial intelligence (AI) reliant method, addressing the need to find means to mine publicly available datasets pertaining to mental health disorder patients to identify key biomarkers that can aid in the diagnosis and prediction of the onset of such mental health disorders.
  • the present disclosure provides a method reliant on integrative analysis between the bioinformatics and ML approaches to analyze and identify the most significant genes that display differential expression between datasets pertaining to subjects diagnosed with or predisposed to mental health disorder and healthy datasets.
  • a method for identifying biomarkers for detection of a mental health disorder comprising: receiving a dataset comprising one or more input features, wherein the one or more input features includes at least one of biological parameters and clinical parameters; preprocessing the dataset to obtain a normalized and regularized dataset; subjecting the preprocessed dataset to an integrative analysis engine, wherein the integrative analysis engine comprises a biomarker identification model and a bioinformatics analysis model, wherein: the biomarker identification model is implemented to create an environment such that a learning agent, associated with the environment, is configured to: determine a similarity score for each of input features of the preprocessed dataset, wherein the biomarker identification model is trained by correlating training feature(s), derived from a repository dataset, with a corresponding predetermined similarity score, wherein the repository dataset includes a set of healthy reference parameters associated with one or more healthy controls, and a set of diseased reference parameters associated with one or more diseased subjects; and identify, based on the
  • Figure 1 provides a broad overview of the method of the present disclosure.
  • the dataset is a publicly available dataset, preferably comprising information on biological parameters and clinical parameters of one or more subjects previously diagnosed with mental health disorder(s).
  • the dataset accordingly, may be selected based on the mental health disorder(s) for which the biomarker(s) is to be determined.
  • the data set may be a publicly available dataset, preferably comprising information on biological parameters and clinical parameters of one or more subjects previously diagnosed with MDD.
  • the data set may be a publicly available dataset, preferably comprising information on biological parameters and clinical parameters of one or more subjects previously diagnosed with Schizophrenia.
  • the biological parameters include but are not limited to genetic information of subjects previously diagnosed with mental health disorder(s).
  • non-limiting examples of such biological parameters include genomic data, genetic data, transcriptomic data, proteomics data, metabolic data and neuroimaging data.
  • the biological parameters include transcriptomic data.
  • the dataset comprises transcriptomic data of one or more subjects previously diagnosed with mental health disorder(s).
  • the dataset comprises transcriptomic data of multiple subjects previously diagnosed with mental health disorder(s) as derived from a publicly available database.
  • the clinical parameters include but are not limited to information on cognition, emotional regulation and behaviour of subjects previously diagnosed with mental health disorder(s).
  • the clinical parameters include behavioural data.
  • the dataset comprises behavioural data of multiple subjects previously diagnosed with mental health disorder(s) as derived from a publicly available database.
  • the dataset comprises behavioural data of subjects previously diagnosed with mental health disorder(s).
  • preprocessing of the dataset comprises: performing data analysis on the dataset according to a data normalization rule to obtain normalized and regularized dataset, wherein the data normalization rule comprises a data classification rule and one or more data filtering conditions.
  • obtaining the regularized dataset may involve filtering the dataset.
  • the filtering is selected from non-specific and/or adaptive filtering.
  • non-specific filtering may be carried out to extract the common variant probes, while adaptive filtering may be applied to identify probes with a predetermined cutoff value for MAS5 and GCRMA, for instance a MAS5 value > 50 and a coefficient of variation > 10% in GCRMA. The processed probes may then be intersected across the dataset.
  • the preprocessing of the dataset to obtain a normalized and regularized dataset may be performed by one or more known method(s) and/or algorithm(s) such as but not limited to MAS5, PLIER, Li-Wong, RMA, GCRMA, justRMA and justGCRMA.
  • the preprocessing of the dataset is performed by MAS5 and/or GCRMA.
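The adaptive filtering and intersection step described above can be sketched as follows. This is an illustration under stated assumptions, not the disclosure's implementation: the MAS5 cutoff is applied to the mean signal across samples, the coefficient of variation is the population standard deviation divided by the mean of the GCRMA values, and all probe IDs, values, and function names are hypothetical.

```python
from statistics import mean, pstdev

def passes_filter(mas5, gcrma, min_signal=50.0, min_cv=0.10):
    """Keep a probe if its mean MAS5 signal and GCRMA variability exceed the cutoffs."""
    if mean(mas5) <= min_signal:
        return False
    centre = mean(gcrma)
    if centre == 0:
        return False  # CV is undefined for a zero mean
    cv = pstdev(gcrma) / abs(centre)
    return cv > min_cv

def filter_probes(study):
    """study maps probe ID -> (MAS5 values, GCRMA values) across samples."""
    return {p for p, (mas5, gcrma) in study.items() if passes_filter(mas5, gcrma)}

# Hypothetical probes from two studies.
study_a = {
    "probe_1": ([120.0, 90.0, 150.0], [6.0, 8.0, 10.0]),   # bright and variable
    "probe_2": ([20.0, 30.0, 25.0], [4.0, 6.0, 8.0]),      # signal too low
    "probe_3": ([200.0, 210.0, 190.0], [9.0, 9.1, 8.9]),   # too little variation
}
study_b = {
    "probe_1": ([100.0, 60.0, 80.0], [5.0, 7.0, 9.0]),
    "probe_3": ([300.0, 100.0, 200.0], [6.0, 9.0, 12.0]),
}

# Intersect the probes that survive filtering in every study.
shared = filter_probes(study_a) & filter_probes(study_b)
```

Only probes that pass both cutoffs in every study remain in `shared`, which mirrors the idea of intersecting processed probes across the dataset.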
  • implementing the biomarker identification model comprises applying a machine learning (ML) model, based on the preprocessed dataset, to create an environment, wherein the learning agent is associated with the environment, the learning agent being configured to: determine patterns and relationships among the one or more input features that are indicative of the mental health disorder; compute a similarity score of the determined patterns and relationships based on related contribution to the detection of the mental health disorder; and identify, based on the associated similarity scores, one or more target input features that are indicative of the mental health disorder.
  • the environment created by the ML model facilitates optimization for the ML model, and provides robust deployment capabilities for the ML model.
  • the learning agent included in the environment is an intelligent agent which is deployed in the environment for efficiently training, optimizing, and deploying the ML model.
  • the learning agent is configured to initialize the ML model by selecting a suitable architecture for the ML model and continuous evaluation techniques to assess the model performance. Further, the learning agent fine-tunes the ML model based on the evaluation and deploys the ML model such that the ML model meets the required performance efficiency.
  • the learning agent may continuously monitor the model performance by adaptive feedback mechanisms and further optimize the ML model parameters to achieve the desired performance efficiency.
  • implementing the biomarker identification model comprises applying a machine learning (ML) model, based on the preprocessed dataset, to create an environment, wherein the learning agent is associated with the environment, the learning agent being configured to: determine patterns and relationships among the one or more input features that are indicative of the mental health disorder and the healthy reference parameters associated with one or more healthy controls; compute a similarity score of the determined patterns and relationships based on related contribution to the detection of the mental health disorder; and identify, based on the associated similarity scores, one or more target input features that are indicative of the mental health disorder.
  • the similarity score is computed on the basis of similarity to parameters of the diseased subject and/or deviation of the input parameters from parameters of the healthy control. For example, a high similarity score between an input parameter and the parameters of the diseased subjects may be indicative of the mental health disorder. Alternatively, a low similarity score between an input parameter and the parameters of the diseased subjects may indicate that the input parameter relates to a healthy subject not suffering from the mental health disorder.
  • the biomarker identification model comprises two or more machine learning (ML) models, to create the environment, wherein the learning agent is associated with the environment, the learning agent being configured to: select genes altered by the mental health disorder based on individual outputs of the two or more ML models; perform hyperparameter optimization for the individual outputs of the two or more ML models; perform cross-validation on the individual outputs of the two or more ML models; and identify, based on the cross-validation, the first set of biomarkers that are indicative of the mental health disorder(s).
  • the hyperparameter optimization may be performed on the hyperparameters associated with the ML models.
  • hyperparameter and equivalent terms refer to parameters associated with the ML model which control the behaviour and structure of the ML model. Examples of hyperparameters may include, but are not limited to, a number of total layers in the network associated with the ML model, a regularization strength in some of the ML models, a learning rate associated with the ML model, and the like.
  • the hyperparameter optimization may be performed for the individual outputs of the two or more ML models in order to determine a combination of hyperparameters that may result in the best performance of each of the ML models. Therefore, hyperparameter optimization is critical in achieving an optimal performance efficiency of the ML models.
  • cross-validation may be performed on the individual outputs of the two or more ML models in order to assess an eligibility of individual output of each of the ML models for being generalized to an independent dataset.
  • the input parameters provided to each of the ML models to obtain the respective individual outputs may be split into multiple subsets, for training the model on some of the subsets (also referred to hereinafter as the training subset) and for validating or evaluating the trained model using the remaining subset (also referred to hereinafter as the validation subset).
  • cross-validation may be performed multiple times by splitting the parameters for each ML model into different combinations of the training subset and the validation subset.
  • by performing the cross-validation by splitting parameters for each of the ML models into the training subset and the validation subset, the overall reliability of the individual outputs of the ML models may be assessed. In addition, performing cross-validation for the individual outputs may prevent potential problems, such as overfitting and underfitting, in the ML models. Further, cross-validation aids in determining optimal hyperparameters and selecting the best performing ML model(s).
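As a non-limiting illustration of the repeated splitting into training and validation subsets described above, the following minimal sketch generates k-fold splits using only the Python standard library. The function name `k_fold_splits`, the fold count and the fixed shuffling seed are assumptions made for this example and are not taken from the disclosure.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (training_indices, validation_indices) pairs for k-fold cross-validation.

    Hypothetical helper: the disclosure does not prescribe a particular splitting scheme.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(training), sorted(validation)


# Each sample is used for validation exactly once across the five splits.
splits = list(k_fold_splits(n_samples=10, k=5))
for training, validation in splits:
    print("train:", training, "validate:", validation)
```

Each candidate combination of hyperparameters could then be scored on every split and the scores averaged, so that the combination chosen is the one that generalizes best rather than the one that fits a single split.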
  • the two or more machine learning (ML) models are selected from but not limited to Logistic Regression (LR), Random Forest (RF), XGBoost (XGB), Support Vector Machine Classifier (SVM), PCA, SelectKBest, and K-Nearest Neighbors (KNN).
  • the two or more machine learning (ML) models are selected from a group comprising PCA, SelectKBest, LR, and RF.
  • the hyperparameter optimization is performed on the basis of prediction performance calculated for each of the models. For example, in case it is identified that the prediction performance of a model does not fulfill predefined performance criteria, the combination of the hyperparameters may be reconfigured or optimized until the prediction performance of the model fulfills the performance criteria.
  • the prediction performance of each classification model is estimated by accuracy, F1 score and ROC-AUC measures which are known in the art.
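For reference, the three measures named above can be computed as in the following minimal sketch, which uses their standard definitions for a binary classifier. The labels, scores and the 0.5 decision threshold are toy values chosen for this example only and do not come from the disclosure.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative (ties count half)."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Toy data: 1 = diseased subject, 0 = healthy control.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), roc_auc(y_true, scores))
```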
  • the method comprises: implementing a training agent in the environment of the biomarker identification model, wherein the training agent is configured to train the learning agent using information of the set of healthy reference parameters associated with the one or more healthy controls, and the set of diseased reference parameters associated with the one or more diseased subjects from the repository dataset.
  • the training agent is configured to receive data from the repository dataset.
  • the training agent may be configured to receive and aggregate data from a plurality of data sources.
  • the training agent may also be configured to perform normalization and regularization of the dataset and further segment or split the dataset into a training subset and a validation subset.
  • the training agent may also segment the dataset into a test subset which may be employed for testing of the performance of the ML model.
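The normalization and segmentation steps attributed to the training agent may be sketched as follows. This is a minimal illustration only: z-score scaling is assumed as the normalization, the 70/15/15 split proportions and the names `z_score_columns` and `split_dataset` are hypothetical, and the regularization step is not illustrated because the disclosure does not specify it.

```python
import random
import statistics


def z_score_columns(rows):
    """Scale every feature column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    # A constant column has zero spread; divide by 1.0 instead to avoid division by zero.
    stdevs = [statistics.pstdev(c) or 1.0 for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stdevs)] for row in rows]


def split_dataset(rows, train=0.7, validation=0.15, seed=0):
    """Shuffle the rows and return (training, validation, test) subsets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_validation = round(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_validation],
            shuffled[n_train + n_validation:])


# Toy dataset: 20 samples with two features each.
rows = [[float(i), float(2 * i)] for i in range(20)]
normalized = z_score_columns(rows)
train_set, validation_set, test_set = split_dataset(normalized)
print(len(train_set), len(validation_set), len(test_set))
```

In practice the scaling statistics would usually be estimated on the training subset alone and then applied to the validation and test subsets, so that no information leaks from held-out samples into training.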
  • the bioinformatics analysis model may rely on any known bioinformatics techniques to compare the input dataset pertaining to subject suffering from or suspected to be suffering from a mental health disorder.
  • the bioinformatics analysis model is a structured technique employed to determine the biomarkers indicative of mental health disorder by performing a comparative analysis to determine the anomaly between datasets pertaining to diseased and control subjects.
  • the bioinformatics analysis model may be configured to receive one or more batches of data pertaining to diseased subjects and healthy controls.
  • the bioinformatics analysis model employs gene enrichment techniques on the received batches and identifies significantly enriched pathways indicative of the mental health disorder. Further, based on the identified enriched pathways, leading genes may be identified for each batch.
  • the leading genes identified from each of the batches may be processed to obtain a subset of overlapping genes.
  • the bioinformatics analysis model may filter the overlapping genes considering an upper threshold for upregulation and a lower threshold for upregulation, to obtain a subset of core genes. Therefore, based on the above approaches, the bioinformatics analysis model may identify the subset of core genes which are indicative of the mental health disorder.
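The overlap and threshold filtering described above may be sketched as follows. This is a minimal illustration under stated assumptions: the thresholds are interpreted as cut-offs on log2 fold change, the gene names are placeholders, and all numerical values are invented for the example rather than taken from the disclosure.

```python
def core_genes(leading_genes_per_batch, log2_fold_change,
               upper_threshold=1.0, lower_threshold=-1.0):
    """Return genes shared by every batch whose fold change passes a threshold.

    Hypothetical helper; threshold semantics are an assumption of this sketch.
    """
    # Genes that appear among the leading genes of every batch.
    overlapping = set.intersection(*(set(batch) for batch in leading_genes_per_batch))
    # Keep only genes that are strongly up- or down-regulated.
    return sorted(
        gene for gene in overlapping
        if log2_fold_change[gene] >= upper_threshold
        or log2_fold_change[gene] <= lower_threshold
    )


# Placeholder leading genes from three batches and made-up fold changes.
batches = [
    ["GENE_A", "GENE_B", "GENE_C"],
    ["GENE_B", "GENE_C", "GENE_D"],
    ["GENE_C", "GENE_B", "GENE_E"],
]
fold_changes = {"GENE_A": 2.0, "GENE_B": 1.5, "GENE_C": 0.2, "GENE_D": -2.0, "GENE_E": 0.1}

core = core_genes(batches, fold_changes)
print(core)
```

Here only GENE_B and GENE_C are common to all three batches, and GENE_C is then dropped because its fold change lies between the two thresholds.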
  • the bioinformatics analysis model may rely on a gene set enrichment analysis (GSEA) to reveal differentially expressed genes in mental health disorder patients and identify pathways they are associated with.
  • the bioinformatics analysis model identifies the activated and enriched cellular pathways in MDD patients in comparison to HCs. Without intending to be limited by theory, the bioinformatics analysis model reveals that MDD patients are mainly enriched in pathways related to immune response, inflammatory response, neurodegeneration pathways and cerebellar atrophy pathways.
  • the aforesaid method of the present disclosure helps identify the most significant and strongest differential expression between datasets of subjects having mental health disorder(s) and healthy controls.
  • the mental health disorder is selected from but not limited to major depressive disorder (MDD), anxiety, schizophrenia, attention-deficit hyperactivity disorder (ADHD) and bipolar disorder.
  • the mental health disorder is MDD.
  • the present disclosure provides application of an integrative analysis based method reliant on a mixture of bioinformatics and ML approaches to mine transcriptomics data of patients with MDD compared to HCs.
  • envisaged herein is a method for identifying biomarker(s) for detection of MDD, the method comprising: receiving a dataset comprising one or more input features, wherein the one or more input features includes at least one of biological parameters and clinical parameters; preprocessing the dataset to obtain a normalized and regularized dataset; subjecting the preprocessed dataset to an integrative analysis engine, wherein the integrative analysis engine comprises a biomarker identification model and a bioinformatics analysis model, wherein: the biomarker identification model is implemented to create an environment such that a learning agent, associated with the environment, is configured to: determine a similarity score for each of input features of the preprocessed dataset, wherein the biomarker identification model is trained by correlating training feature(s), derived from a repository dataset, with a corresponding predetermined similarity score, wherein the repository dataset includes a set of healthy reference parameters associated with one or more healthy controls, and a set of diseased reference parameters associated with one or more diseased subjects; and
  • the identification of such biomarkers associated with MDD would provide a better and earlier preventative intervention, and insight into patients' responses to anti-depressant therapy, further aiding in monitoring the effectiveness of treatment in patients.
  • the identification of specific depression-related biomarkers is crucial to aid in the diagnosis, prognosis, treatment and monitoring of patients with depression which may result in earlier diagnosis, as well as earlier intervention, and substantially improve the patients’ outcomes.
  • the dataset further comprises contextual features, wherein the contextual features are associated with one or more biological parameters and clinical parameters related to the mental health disorder(s).
  • the input features being pre-processed may, prior to being subjected to the integrative analysis engine, be further enriched with the contextual features.
  • the contextual features may be obtained from disparate data sources. Said disparate data sources may include, but are not limited to, electronic health record (EHR) databases, Electronic Health Information Exchanges (HIE), public health data sources, and the like. Other suitable forms of historical data may also be employed as contextual features. Enriching the input features with the contextual features may further improve the accuracy of identifying biomarker(s) for detection of a mental health disorder.
  • the contextual features may be selected from a group comprising data from a different population, data from a population before and after a natural calamity or event such as but not limited to a pandemic or an epidemic.
  • the above-described method of the present disclosure is equipped to provide a comparison between the contextual datasets and the other dataset(s) processed by the method of the present disclosure previously, simultaneously or subsequently.
  • the biomarker identification model may include a classifier for predicting and classifying the input features into predefined classes or categories. Further, the biomarker identification model may iteratively regenerate or refine the classifier based on a determination that an output obtained by the classifier does not meet predefined receiver operating characteristic (ROC) curve criteria. For example, if the output of the classifier does not meet the predefined ROC curve criteria, the biomarker identification model may utilize a different subset of input parameters. Further, the biomarker identification model may allocate an associated weight to the output. Subsequently, if the classifier does not meet the predefined ROC curve criteria, the biomarker identification model may adjust or decrease the associated weight of the respective output until a generated output of the classifier meets the predefined ROC curve criteria.
  • the biomarker identification model may utilize a different subset of input parameters along with a subset of contextual features until a generated output of the classifier meets the predefined ROC curve criteria.
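The iterative refinement against ROC curve criteria may be sketched as follows. This is a deliberately simplified illustration and not the classifier of the disclosure: the "classifier" is a stand-in that averages the selected features, the criterion is assumed to be a minimum ROC-AUC, and the function names, feature names and values are all hypothetical.

```python
from itertools import combinations


def roc_auc(labels, scores):
    """Probability that a random positive is scored above a random negative (ties count half)."""
    positives = [s for t, s in zip(labels, scores) if t == 1]
    negatives = [s for t, s in zip(labels, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def first_subset_meeting_criterion(samples, labels, feature_names, min_auc):
    """Try feature subsets from smallest to largest until the ROC-AUC criterion is met."""
    for size in range(1, len(feature_names) + 1):
        for subset in combinations(feature_names, size):
            # Stand-in classifier output: mean of the selected feature values.
            scores = [sum(sample[name] for name in subset) / size for sample in samples]
            auc = roc_auc(labels, scores)
            if auc >= min_auc:
                return subset, auc
    return None, None


# Toy samples: two diseased subjects (label 1) followed by two healthy controls (label 0).
samples = [
    {"f1": 0.9, "f2": 0.8},
    {"f1": 0.2, "f2": 0.7},
    {"f1": 0.5, "f2": 0.3},
    {"f1": 0.1, "f2": 0.4},
]
labels = [1, 1, 0, 0]

subset, auc = first_subset_meeting_criterion(samples, labels, ["f1", "f2"], min_auc=0.9)
print(subset, auc)
```

The feature f1 alone gives an ROC-AUC of 0.75 and is rejected, after which f2 alone separates the two classes and satisfies the assumed criterion.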
  • the top identified mental health disorder related biomarkers are capable of modeling multivariate associations between brain plasticity and behavioural disorders.
  • the aforesaid method of the present disclosure further comprises: generating a mapping profile of the determined biomarkers with corresponding brain regions and structural alterations thereof; associating the mapping profile with cognitive functions and varied range of behaviours.
  • the mental health disorder is MDD, wherein the determined biomarker(s) is selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, NRG1, OLFM4, SERPING1, TCN1, and THBS1 or any combination thereof.
  • where the mental health disorder is MDD, mapping the identified MDD-related biomarkers across the human brain regions revealed a significant overlap of differentially expressed genes across the brain regions. The mapping network analysis showed close interconnections between the different frontal brain regions involved, suggesting that the identified genes or biomarkers are significantly associated with cognitive functions which are probably altered in MDD.
  • the above-described method of the present disclosure may be further employed for confirming reproducibility of expression changes of mental health disorder related biomarkers across different populations.
  • a method for confirming reproducibility of expression changes of mental health disorder related biomarkers across different populations comprising: receiving a dataset comprising one or more input features, wherein the one or more input features includes at least one of biological parameters and clinical parameters; preprocessing the dataset to obtain a normalized and regularized dataset; subjecting the preprocessed dataset to an integrative analysis engine, wherein the integrative analysis engine comprises a biomarker identification model and a bioinformatics analysis model, wherein: the biomarker identification model being implemented to create an environment such that a learning agent, associated with the environment, is configured to: determine a similarity score for each of input features of the preprocessed dataset, wherein the biomarker identification model is trained by correlating training feature(s), derived from a repository dataset, with a corresponding predetermined similarity score, wherein the repository dataset includes a set of healthy reference parameters associated with one or more healthy controls, and a set of diseased reference parameters associated with one or more diseased
  • the above-described method of the present disclosure may be further employed for confirming reproducibility of expression changes of mental health disorder related biomarkers in the same population before and after a natural calamity or any event of potential evolutionary significance such as but not limited to a pandemic or an epidemic.
  • the above-described method of the present disclosure as described herein generates a database containing or comprising input and/or output data.
  • the method of the present disclosure is versatile in its intended application and can be relied upon to determine or identify patterns and relationships between a plurality of biomarkers pertaining to diseased subjects associated with a mental health disorder (for example, MDD patients) and healthy controls.
  • the determined or identified patterns and relationships may be indicative of one or more biomarker(s) associated with the mental health disorder.
  • examples of such applications include but are not limited to the use of the method as described above for early detection of a mental health disorder. Accordingly, further envisaged herein are devices, software, systems, and other methods that leverage the capabilities of artificial intelligence or machine learning techniques for biomarker analysis as described above to make predictions with respect to biomarkers indicative of mental health disorder(s), such as but not limited to MDD.
  • the aforesaid technical advancements and practical applications of the disclosed method may be attributed to efficiently implementing the bioinformatics analysis model and efficiently training the biomarker identification model to draw correlations between biomarker(s) associated with diseased subjects and healthy controls, and to obtain the desired output pertaining to biomarker(s) characteristic of specific mental health disorder(s) by employing a suitable input dataset.
  • the present disclosure further envisages a method of determining level of one or more biomarker(s) determined by the aforesaid method for identifying biomarker(s) for detection of mental health disorder(s), in a biological sample obtained from a subject.
  • the subject may be a subject suspected to be suffering from a mental health disorder.
  • the mental health disorder is MDD.
  • the mental health disorder is MDD and the biomarker(s) determined by the method for identifying biomarker(s) for detection of MDD is selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, NRG1, OLFM4, SERPING1, TCN1, and THBS1 or any combination thereof, in a biological sample obtained from a subject.
  • the method of determining level of one or more biomarker(s) as described above may include determination of the level of any of the following biomarker(s) or combinations thereof as set out in Table A in the biological sample, wherein ‘X’ represents presence of the specific component of the combination - Table A
  • the above table is a depiction of individual biomarker(s) and combinations of biomarker(s) indicative of MDD or predisposition thereto. Each of the above combinations 1-148 may thus be relied upon for the diagnosis of MDD or for predicting predisposition to MDD.
  • NRG1 coding for neuregulin 1
  • EGF epidermal growth factor
  • NRG1 is a particularly important biomarker for the detection of MDD.
  • NRG1 is therefore an important liquid-biopsy biomarker for MDD diagnosis and prognosis.
  • NRG1 further may serve as a predictive biomarker of the response to pharmacological antidepressant treatment since NRG1 changes expression profile in MDD patients compared to healthy control before and after therapy for MDD.
  • the therapy for MDD comprises administration of anti-depressant drugs such as but not limiting to MDD.
  • the method of the present disclosure for identifying biomarker(s) particularly shows that an up-regulation of NRG1 in MDD patients compared to HCs, in liquid biopsy samples such as but not limited to blood and saliva samples, results in deficits in the response to a stimulus, hyper-agitation and perturbation in the working memory.
  • the method of determining level of one or more biomarker(s) determined by the aforesaid method for identifying biomarker(s) for detection of mental health disorder(s), in a biological sample obtained from a subject comprises determining the level of biomarker NRG1 and one or more additional biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, OLFM4, SERPING1, TCN1, and THBS1.
  • the method of determining level of one or more biomarker(s) determined by the aforesaid method for identifying biomarker(s) for detection of mental health disorder(s), in a biological sample obtained from a subject comprises determining the level of biomarker NRG1 and two or more additional biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, OLFM4, SERPING1, TCN1, and THBS1.
  • the method of determining level of one or more biomarker(s) determined by the aforesaid method for identifying biomarker(s) for detection of mental health disorder(s), in a biological sample obtained from a subject comprises determining the level of biomarker NRG1 and three or more additional biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, OLFM4, SERPING1, TCN1, and THBS1.
  • the method of determining level of one or more biomarker(s) determined by the aforesaid method for identifying biomarker(s) for detection of mental health disorder(s), in a biological sample obtained from a subject comprises determining the level of biomarker NRG1 and four or more additional biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, OLFM4, SERPING1, TCN1, and THBS1.
  • the method of determining level of one or more biomarker(s) determined by the aforesaid method for identifying biomarker(s) for detection of mental health disorder(s), in a biological sample obtained from a subject comprises determining the level of biomarker NRG1 and five or more additional biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, OLFM4, SERPING1, TCN1, and THBS1.
  • the method of determining level of one or more biomarker(s) determined by the aforesaid method for identifying biomarker(s) for detection of mental health disorder(s), in a biological sample obtained from a subject comprises determining the level of biomarker NRG1 and six or more additional biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, OLFM4, SERPING1, TCN1, and THBS1.
  • the method of determining level of one or more biomarker(s) determined by the aforesaid method for identifying biomarker(s) for detection of mental health disorder(s), in a biological sample obtained from a subject comprises determining the level of biomarker NRG1 and seven or more additional biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, OLFM4, SERPING1, TCN1, and THBS1.
  • the method of determining level of one or more biomarker(s) determined by the aforesaid method for identifying biomarker(s) for detection of mental health disorder(s), in a biological sample obtained from a subject comprises determining the level of biomarker NRG1 and eight or more additional biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, OLFM4, SERPING1, TCN1, and THBS1.
  • the method of determining level of one or more biomarker(s) determined by the aforesaid method for identifying biomarker(s) for detection of mental health disorder(s), in a biological sample obtained from a subject comprises determining the level of biomarkers CEACAM8, CLEC12B, DEFA4, HP, LCN2, OLFM4, SERPING1, TCN1, and THBS1.
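The panels formed by NRG1 together with a minimum number of additional biomarkers may be enumerated as in the following minimal sketch. The function name `nrg1_panels` is hypothetical, and the enumeration is purely combinatorial: it is not intended to reproduce the numbering or the selection of combinations set out in Table A.

```python
from itertools import combinations

# The nine additional biomarkers named alongside NRG1 in the disclosure.
ADDITIONAL_BIOMARKERS = [
    "CEACAM8", "CLEC12B", "DEFA4", "HP", "LCN2",
    "OLFM4", "SERPING1", "TCN1", "THBS1",
]


def nrg1_panels(min_additional):
    """Yield every panel made of NRG1 plus at least `min_additional` other biomarkers."""
    for size in range(min_additional, len(ADDITIONAL_BIOMARKERS) + 1):
        for extra in combinations(ADDITIONAL_BIOMARKERS, size):
            yield ("NRG1",) + extra


panels_with_one_or_more = list(nrg1_panels(1))
panels_with_eight_or_more = list(nrg1_panels(8))
print(len(panels_with_one_or_more), len(panels_with_eight_or_more))
```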
  • the biological sample is a liquid biopsy sample.
  • the liquid biopsy sample is selected from a group comprising blood sample, saliva sample.
  • the level of biomarker(s) in the biological sample is determined by technique(s) that allows quantitative gene expression evaluation of the biomarker(s) in the biological sample.
  • the level of biomarker(s) in the biological sample is determined by techniques such as but not limited to PCR.
  • the biological sample is analyzed for expression of biomarker(s) including NRG1 and the primers for NRG1 gene expression evaluation include Forward: 5'-TCGTGGAATCAAACGCTACA-3' and Reverse: 5'-ACTCCCCTCCATTCACACA-3'.
  • the method of determining level of one or more biomarker(s) determined by the aforesaid method for identifying biomarker(s) for detection of mental health disorder(s), in a biological sample obtained from a subject comprises determining the level of biomarker NRG1 in the biological sample, wherein the level of biomarker(s) is determined by PCR and the primers for NRG1 gene expression evaluation include Forward: 5'-TCGTGGAATCAAACGCTACA-3' and Reverse: 5'-ACTCCCCTCCATTCACACA-3'.
  • the method of determining level of one or more biomarker(s) determined by the aforesaid method for identifying biomarker(s) for detection of mental health disorder(s), in a biological sample obtained from a subject comprises determining the level of biomarker NRG1 and one or more additional biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, OLFM4, SERPING1, TCN1, and THBS1, wherein the level of biomarker(s) is determined by PCR and the primers for NRG1 gene expression evaluation include Forward: 5'-TCGTGGAATCAAACGCTACA-3' and Reverse: 5'-ACTCCCCTCCATTCACACA-3'.
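Basic properties of the NRG1 primer pair given above may be checked as in the following minimal sketch. The sequences are those stated in the disclosure; the helper functions are illustrative, and the melting temperature uses the Wallace rule (2 °C per A or T, 4 °C per G or C), which is only a rough rule-of-thumb estimate and is an assumption of this example rather than a method stated in the disclosure.

```python
FORWARD_PRIMER = "TCGTGGAATCAAACGCTACA"
REVERSE_PRIMER = "ACTCCCCTCCATTCACACA"


def gc_fraction(sequence):
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in sequence) / len(sequence)


def wallace_tm(sequence):
    """Rough melting temperature estimate in degrees Celsius (Wallace rule)."""
    at = sum(base in "AT" for base in sequence)
    gc = sum(base in "GC" for base in sequence)
    return 2 * at + 4 * gc


def reverse_complement(sequence):
    """Sequence of the complementary strand, read 5' to 3'."""
    return sequence.translate(str.maketrans("ACGT", "TGCA"))[::-1]


for name, primer in (("forward", FORWARD_PRIMER), ("reverse", REVERSE_PRIMER)):
    print(name, len(primer), round(gc_fraction(primer), 3), wallace_tm(primer))
```

A check of this kind can help catch transcription errors in a primer sequence before it is ordered, for instance an unexpected length or a large mismatch between the estimated melting temperatures of the two primers.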
  • the above method may find application in the diagnosis and monitoring of mental health disorder(s) such as MDD.
  • envisaged herein is the method as described above for use in one or more of: a) diagnosing mental health disorder(s); b) diagnosing and treating mental health disorder(s); c) determining genetic predisposition to mental health disorder(s); and d) monitoring response to MDD therapy in patients diagnosed with mental health disorder(s); wherein the mental health disorder(s) is associated with the biomarker(s) whose level is identified.
  • envisaged herein is the method for determining the level of biomarkers CEACAM8, CLEC12B, DEFA4, HP, LCN2, OLFM4, SERPING1, TCN1, and THBS1 in a biological sample for use in one or more of: a) diagnosing MDD; b) diagnosing and treating MDD; c) determining genetic predisposition to MDD; and d) monitoring response to MDD therapy in patients diagnosed with MDD.
  • the present disclosure further provides a method for diagnosing mental health disorder(s), the said method comprising analyzing a biological sample from a subject to confirm presence or absence of the biomarker(s) identified by the above-mentioned method of the present disclosure for identification of biomarker(s) associated with the mental health disorder(s).
  • the method for diagnosing mental health disorder(s) comprises analyzing a biological sample from a subject to confirm differential expression level of one or more biomarker(s) compared to a healthy control, wherein the biomarker(s) is identified by the method for identifying biomarker(s) for detection of mental health disorder(s) as described above.
  • the method for identifying biomarker(s) for detection of mental health disorder(s) provides a cut off value for relative gene expression level for the identified biomarker(s) vs. healthy control, wherein the cut off value for the biomarker(s) is correlated to a specific mental health disorder(s).
  • the method for diagnosing mental health disorder(s) comprises analyzing a biological sample from a subject to confirm if the expression level of the biomarker(s) associated with the mental health disorder(s) meets pre-determined cut off value for relative gene expression level for the said biomarker(s) vs. healthy control.
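The comparison of a relative gene expression level against a pre-determined cut off value may be sketched as follows. The disclosure does not state how relative expression is calculated; this minimal example assumes the commonly used 2^(-ΔΔCt) calculation for quantitative PCR, with a hypothetical reference (housekeeping) gene, made-up Ct values and a made-up cut off value, and it assumes that up-regulation is the direction of interest.

```python
def relative_expression(ct_target_subject, ct_reference_subject,
                        ct_target_control, ct_reference_control):
    """Fold change of the target gene in the subject versus the healthy control (2^-ddCt)."""
    delta_ct_subject = ct_target_subject - ct_reference_subject
    delta_ct_control = ct_target_control - ct_reference_control
    return 2 ** -(delta_ct_subject - delta_ct_control)


def meets_cutoff(fold_change, cutoff):
    """True when the relative expression reaches the cut off value (up-regulation assumed)."""
    return fold_change >= cutoff


# Made-up Ct values: the target amplifies two cycles earlier in the subject than in the control.
fold_change = relative_expression(
    ct_target_subject=24.0, ct_reference_subject=18.0,
    ct_target_control=26.0, ct_reference_control=18.0,
)
HYPOTHETICAL_CUTOFF = 2.0  # illustrative value only, not taken from the disclosure

print(fold_change, meets_cutoff(fold_change, HYPOTHETICAL_CUTOFF))
```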
  • the mental health disorder is MDD.
  • MDD is characterized by a differential level of one or more biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, NRG1, OLFM4, SERPING1, TCN1, and THBS1 compared to a healthy control.
  • the present disclosure provides a method for diagnosing MDD, the said method comprising analyzing a biological sample from a subject to confirm differential expression level of one or more biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, NRG1, OLFM4, SERPING1, TCN1, and THBS1 compared to a healthy control.
  • the present disclosure provides a method for diagnosing MDD, the said method comprising analyzing a biological sample from a subject to confirm differential expression level of biomarker NRG1 and one or more biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, OLFM4, SERPING1, TCN1, and THBS1 compared to a healthy control.
  • the present disclosure provides a method for diagnosing MDD, the said method comprising analyzing a biological sample from a subject to confirm differential expression level of a single biomarker or any combination of biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, OLFM4, SERPING1, TCN1, and THBS1 compared to a healthy control, as set out in Table A.
  • the present disclosure provides a method for diagnosing MDD, the said method comprising analyzing a biological sample from a subject to confirm if the expression level of the biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, NRG1, OLFM4, SERPING1, TCN1, and THBS1 meets a pre-determined cut off value for relative gene expression level for the said biomarker(s) vs. healthy control.
  • the method of diagnosis is a non-invasive method.
  • the biological sample is a liquid biopsy sample.
  • the liquid biopsy sample is selected from a group comprising blood sample, saliva sample.
  • the diagnosis is performed by subjecting the biological sample to analysis by technique(s) that allow quantitative gene expression evaluation of the biomarker(s) in the biological sample.
  • the diagnosis is performed by subjecting the biological sample to analysis by techniques such as but not limited to PCR.
  • the biological sample is analyzed for differential expression of biomarker(s) including NRG1 and the primers for NRG1 gene expression evaluation include Forward: 5'-TCGTGGAATCAAACGCTACA-3' and Reverse: 5'-ACTCCCCTCCATTCACACA-3'.
  • the present disclosure further provides a method for identifying genetic predisposition of a subject to mental health disorder(s), the said method comprising analyzing a biological sample from a subject to confirm presence or absence of the biomarker(s) identified by the above-mentioned method of the present disclosure for identification of biomarker(s) associated with or known to play a role in genetic predisposition to the mental health disorder(s).
  • the method for identifying genetic predisposition of a subject to mental health disorder(s) comprises analyzing a biological sample from a subject to confirm differential expression level of one or more biomarker(s) compared to a healthy control, wherein the biomarker(s) is identified by the method for identifying biomarker(s) for detection of mental health disorder(s) as described above.
  • the method for identifying biomarker(s) for detection of mental health disorder(s) provides a cut off value for relative gene expression level for the identified biomarker(s) vs. healthy control, wherein the cut off value for the biomarker(s) is correlated to genetic predisposition to a specific mental health disorder(s).
  • the method for identifying genetic predisposition of a subject to mental health disorder(s) comprises analyzing a biological sample from a subject to confirm if the expression level of the biomarker(s) associated with the mental health disorder(s) meets pre-determined cut off value for relative gene expression level for the said biomarker(s) vs. healthy control.
  • the mental health disorder is MDD.
  • MDD is characterized by a differential level of one or more biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, NRG1, 374 OLFM4, SERPING1, TCN1, and THBS1 compared to a healthy control.
  • the present disclosure provides a method for identifying genetic predisposition of a subject to MDD, the said method comprising analyzing a biological sample from a subject to confirm differential expression level of one or more biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, NRG1, 374 OLFM4, SERPING1, TCN1, and THBS1 compared to a healthy control.
  • the present disclosure provides a method for identifying genetic predisposition of a subject to MDD, the said method comprising analyzing a biological sample from a subject to confirm differential expression level of biomarker NRG1 and one or more biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, 374 OLFM4, SERPING1, TCN1, and THBS1 compared to a healthy control.
  • the present disclosure provides a method for identifying genetic predisposition of a subject to MDD, the said method comprising analyzing a biological sample from a subject to confirm differential expression level of a single biomarker or any combination of biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, 374 OLFM4, SERPING1, TCN1, and THBS1 compared to a healthy control, as set out in Table A.
  • the present disclosure provides a method for diagnosing MDD, the said method comprising analyzing a biological sample from a subject to confirm if the expression level of the biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, NRG1, 374 OLFM4, SERPING1, TCN1, and THBS1 meets a pre-determined cut off value for relative gene expression level for the said biomarker(s) vs. healthy control.
  • the method of identifying genetic predisposition of a subject to mental health disorder(s) is a non-invasive method.
  • the biological sample is a liquid biopsy sample.
  • the liquid biopsy sample is selected from a group comprising blood sample and saliva sample.
  • the identification of genetic predisposition to mental health disorder(s) is performed by subjecting the biological sample from a subject to analysis by technique(s) that allow quantitative gene expression evaluation of the biomarker(s) in the biological sample.
  • the identification of genetic predisposition is performed by subjecting the biological sample to analysis by techniques such as but not limited to PCR.
  • the biological sample is analyzed for differential expression of biomarker(s) including NRG1 and the primers for NRG1 gene expression evaluation include Forward: 5'-TCGTGGAATCAAACGCTACA-3' and Reverse: 5'-ACTCCCCTCCATTCACACA-3'.
  • a method of treating a subject expressing biomarker(s) identified by the method for identifying biomarker(s) for detection of mental health disorder(s) as described above comprising administering therapy for the mental health disorder(s) to the subject after being determined to express a differential level of one or more of the biomarker(s) associated with the mental health disorder(s), compared to a healthy control.
  • provided in the present disclosure is a method of treating a subject expressing biomarker(s) identified by the method for identifying biomarker(s) for detection of mental health disorder(s) as described above, comprising administering therapy for the mental health disorder(s) to the subject after being determined to express a differential level of one or more of the biomarker(s) associated with the mental health disorder(s), compared to a healthy control.
  • the method comprises treating a subject expressing biomarker(s) identified by the method for identifying biomarker(s) for detection of mental health disorder(s) as described above, comprising administering therapy for the mental health disorder(s) to the subject after analyzing a biological sample from the subject to confirm differential expression level of one or more biomarker(s) compared to a healthy control.
  • envisaged herein is a method of treating a subject diagnosed with a mental health disorder(s) by the method of diagnosis as described above, comprising administering therapy for the mental health disorder(s) to the subject after the diagnosis.
  • envisaged herein is a method of diagnosing and treating mental health disorder(s) in a subject, comprising:
  • the therapy administered is suitable for treatment of the mental health disorder(s) characterized by differential level of the biomarker(s) as identified by the method for identifying biomarker(s) for detection of mental health disorder(s), wherein the said method provides a correlation between the biomarker(s) and differential level thereof vs. healthy control, and the mental health disorder(s).
  • the method for identifying biomarker(s) for detection of mental health disorder(s) provides a cut off value for relative gene expression level for the identified biomarker(s) vs. healthy control, wherein the cut off value for the biomarker(s) is correlated to a specific mental health disorder(s).
  • the method of treating a patient expressing biomarker(s) identified by the method for identifying biomarker(s) for detection of mental health disorder(s) as described above comprises administering therapy for the mental health disorder(s) to the patient after a biological sample from the patient is determined to meet the pre-determined cut off value for relative gene expression level for the said biomarker(s) vs. healthy control.
  • the mental health disorder is MDD.
  • MDD is characterized by a differential level of one or more biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, NRG1, 374 OLFM4, SERPING1, TCN1, and THBS1 compared to a healthy control.
  • a method of treating a patient for MDD comprising administering MDD therapy to a patient after a biological sample from the patient is determined to express differential level of one or more biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, NRG1, 374 OLFM4, SERPING1, TCN1, and THBS1 compared to a healthy control.
  • the present disclosure provides a method of treating a patient for MDD comprising administering MDD therapy to a patient after being determined to express differential level of one or more biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, NRG1, 374 OLFM4, SERPING1, TCN1, and THBS1 compared to a healthy control.
  • a method of diagnosing and treating MDD in a subject comprising:
  • the present disclosure provides a method of treating a patient for MDD comprising administering MDD therapy to a patient after being determined to express differential level of biomarker NRG1 and one or more biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, 374 OLFM4, SERPING1, TCN1, and THBS1 compared to a healthy control.
  • a method of diagnosing and treating MDD in a subject comprising:
  • analyzing a biological sample from the subject to confirm differential expression level of biomarker NRG1 and one or more biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, 374 OLFM4, SERPING1, TCN1, and THBS1 compared to a healthy control;
  • the therapy may include administration of drugs suitable for the treatment of MDD.
  • the therapy may include administration of Venlafaxine to the patient.
  • envisaged herein is a method of treating a patient for MDD comprising administering Venlafaxine to a patient after being determined to express differential level of one or more biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, NRG1, 374 OLFM4, SERPING1, TCN1, and THBS1 compared to a healthy control.
  • a method of treating a patient for MDD comprising administering Venlafaxine to a patient after being determined to express differential level of biomarker NRG1 and one or more biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, 374 OLFM4, SERPING1, TCN1, and THBS1 compared to a healthy control.
  • a method of diagnosing and treating MDD in a subject comprising: - analyzing a biological sample from the subject to confirm differential expression level of one or more biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, NRG1, 374 OLFM4, SERPING1, TCN1, and THBS1 compared to a healthy control;
  • administering Venlafaxine as therapy for MDD to the diagnosed subject.
  • a method of diagnosing and treating MDD in a subject comprising:
  • analyzing a biological sample from the subject to confirm differential expression level of biomarker NRG1 and one or more biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, 374 OLFM4, SERPING1, TCN1, and THBS1 compared to a healthy control;
  • administering Venlafaxine as therapy for MDD to the diagnosed subject.
  • the present disclosure further provides a method of monitoring response to treatment in a patient a) expressing biomarker(s) identified by the method for identifying biomarker(s) for detection of mental health disorder(s) as described above and b) treated for the mental health disorder(s).
  • the present disclosure provides a method of monitoring response to treatment for MDD in a patient wherein the treatment is provided after a biological sample from the patient is determined to express differential level of one or more biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, NRG1, 374 OLFM4, SERPING1, TCN1, and THBS1 compared to a healthy control.
  • the present disclosure provides a method of monitoring response to treatment for MDD in a patient wherein the treatment is provided after a biological sample from the patient is determined to express differential level of one or more biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, 374 OLFM4, SERPING1, TCN1, and THBS1.
  • the present disclosure provides a method of monitoring response to Venlafaxine as treatment for MDD in a patient wherein the treatment is provided after a biological sample from the patient is determined to express differential level of one or more biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, NRG1, 374 OLFM4, SERPING1, TCN1, and THBS1 compared to a healthy control.
  • a method of monitoring response to Venlafaxine as a treatment for MDD in a subject comprising:
  • the present disclosure provides a method of monitoring response to Venlafaxine as a treatment for MDD in a patient wherein the treatment is provided after a biological sample from the patient is determined to express differential level of one or more biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, 374 OLFM4, SERPING1, TCN1, and THBS1.
  • a method of monitoring response to Venlafaxine as a treatment for MDD in a subject comprising:
  • analyzing a biological sample from the subject to confirm differential expression level of biomarker NRG1 and one or more biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, 374 OLFM4, SERPING1, TCN1, and THBS1 compared to a healthy control;
  • envisaged herein is use of the method for identifying biomarker(s) for detection of mental health disorder(s) for providing a correlation between the biomarker(s), their expression level(s) in a biological sample and mental health disorder(s).
  • use of the method for identifying biomarker(s) for detection of mental health disorder(s) for studying reproducibility of biomarker expression levels across different populations.
  • the present disclosure further provides use of the method for identifying biomarker(s) for detection of mental health disorder(s), for analyzing progression of mental health disorder(s) in subject(s).
  • use of the method for identifying biomarker(s) for detection of mental health disorder(s), for analyzing response to treatment in subject(s).
  • the present disclosure also provides use of the method for identifying biomarker(s) for detection of mental health disorder(s), for analyzing the impact of environmental and/or social events on the expression of biomarker(s) characterizing mental health disorder(s).
  • the present disclosure further provides use of the biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, NRG1, 374 OLFM4, SERPING1, TCN1, and THBS1 or any combination thereof, for the diagnosis of MDD.
  • use of biomarker NRG1 for the diagnosis of MDD.
  • use of biomarker NRG1 in combination with at least one biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, NRG1, 374 OLFM4, SERPING1, TCN1, and THBS1 or any combination thereof, for the diagnosis of MDD.
  • the present disclosure also provides use of the biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, NRG1, 374 OLFM4, SERPING1, TCN1, and THBS1 or any combination thereof, for preparing suitable probes/primers for incorporating in a point-of-care system or kit for the diagnosis of MDD.
  • the present disclosure further provides a panel of biomarker(s) that can aid in the diagnosis and/or monitoring of patients with mental health disorder(s).
  • the said biomarker(s) are those identified by the method for identifying biomarker(s) for detection of mental health disorder(s), as described above.
  • the present disclosure provides a panel of biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, NRG1, 374 OLFM4, SERPING1, TCN1, and THBS1 that can aid in the diagnosis and monitoring of patients with MDD.
  • a panel of 10 biomarkers that can aid in the diagnosis and/or monitoring of patients with MDD, wherein the biomarkers are selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, NRG1, 374 OLFM4, SERPING1, TCN1, and THBS1 or any combination thereof.
  • NRG1 exhibits higher expression in the amygdala and hippocampus brain subregions and is suggested as a particularly important non-invasive liquid-biopsy biomarker for the diagnosis of MDD patients. Accordingly, in some embodiments, provided herein is a panel of biomarkers that can aid in the diagnosis and/or monitoring of patients with MDD, wherein the biomarkers include NRG1 in combination with one or more biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, 374 OLFM4, SERPING1, TCN1, and THBS1 or any combination thereof.
  • the present disclosure further provides a kit for determining differential expression level of one or more biomarker(s) determined by the method for identifying biomarker(s) for detection of mental health disorder(s), as described above, in a biological sample, for detection of mental health disorder(s), comprising at least one amplification primer and/or at least one probe that specifically hybridizes to nucleotide(s) encoding the one or more biomarker(s).
  • the mental health disorder is MDD.
  • the biomarker(s) determined by the method for identifying biomarker(s) for detection of MDD is selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, NRG1, 374 OLFM4, SERPING1, TCN1, and THBS1 or any combination thereof.
  • kits for determining differential expression level of one or more biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, NRG1, 374 OLFM4, SERPING1, TCN1, and THBS1 in a biological sample comprising at least one amplification primer and/or at least one probe that specifically hybridizes to nucleotide(s) encoding the one or more biomarker(s).
  • the said kit finds application in the diagnosis and/or monitoring of MDD.
  • kits for diagnosis and/or monitoring of MDD comprising at least one amplification primer and/or at least one probe that specifically hybridizes to nucleotide(s) encoding one or more biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, NRG1, 374 OLFM4, SERPING1, TCN1, and THBS1 in a biological sample for determining differential level of expression of the biomarker(s) in the biological sample vs. healthy control.
  • kits for diagnosis and/or monitoring of MDD comprising at least one amplification primer and/or at least one probe that specifically hybridizes to nucleotide(s) encoding one or more biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, NRG1, 374 OLFM4, SERPING1, TCN1, and THBS1 in a biological sample for determining differential level of expression of the biomarker(s) in the biological sample vs. healthy control, wherein the primers for NRG1 expression evaluation include Forward: 5'-TCGTGGAATCAAACGCTACA-3' and Reverse: 5'- ACTCCCCTCCATTCACACA-3'.
  • the kit finds utility in diagnosis of one or more mental health disorder(s) such as but not limited to MDD.
  • the kit finds utility in analyzing progression of one or more mental health disorder(s) such as but not limited to MDD in subject(s).
  • the kit finds utility in analyzing response to treatment in subject(s) being treated for mental health disorder(s) such as but not limited to MDD.
  • the kit finds utility in analyzing the impact of environmental and/or social events on the expression of biomarker(s) characterizing mental health disorder(s) such as but not limited to MDD.
  • a point-of-care system or kit for the diagnosis of MDD comprising at least one amplification primer and/or at least one probe that specifically hybridizes to the nucleotide(s) encoding one or more biomarker(s) selected from a group comprising CEACAM8, CLEC12B, DEFA4, HP, LCN2, NRG1, 374 OLFM4, SERPING1, TCN1, and THBS1 in a biological sample for determining differential level of expression of the biomarker(s) in the biological sample vs.
  • the claimed steps are not routine, conventional, or well-known aspects in the art, as the claimed steps provide the aforesaid solutions to the technical problems existing in the conventional technologies. Further, the claimed steps clearly bring an improvement in determination of biomarker(s) that may be relied upon for accurate and early diagnosis of mental health disorder(s) such as but not limited to MDD.
  • the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/) was searched for publicly available transcriptomic datasets of MDD patients and HCs to choose the appropriate gene expression datasets.
  • the selection criteria included studies for Homo sapiens and MDD patients recruited as per international practice guidelines in psychiatry. Studies with no healthy control samples or with a large batch effect were excluded.
  • the GSE98793 dataset, which met all the inclusion criteria, was selected and processed. The samples were processed in two batches, for which batch information was extracted from the phenotypic data. This dataset consists of high-resolution gene expression sets retrieved from whole blood samples of 128 MDD patients and 64 healthy controls (HCs).
  • the Affymetrix Human Genome U133-Plus 2.0 gene expression gene-chip was used.
  • the MDD patients were diagnosed if at least two episodes of depression satisfying DSM-IV or ICD-10 criteria were identified using the semi-structured Schedule for Clinical Assessment in Neuropsychiatry (SCAN).
  • the demographic and clinical details of the GSE98793 dataset are listed in Supplementary Table S1.
  • the datasets GSE99725, GSE76826, GSE38206 and GSE32280, used as external confirmation sets, are available from Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo).
  • the machine learning pipeline, including source code, is uploaded to a publicly available repository at GitHub (https://github.com/maryjis/autoNeuro).
  • Raw Microarray Normalization and Adaptive Filtering
  • the Affymetrix microarray covers more than 54,000 probes.
  • the raw CEL files were downloaded and analyzed using an in-house R script pipeline. First, logarithmic transformation and batch effect correction were performed to eliminate the experimental effect on the datasets.
  • MAS5 and GCRMA packages from the Bioconductor project (https://www.bioconductor.org/) were applied. Non-specific filtering was carried out to extract the common variant probes, while adaptive filtering was applied to identify probes with a MAS5 value > 50 and a coefficient of variation > 10% in GCRMA. Then, processed probes were intersected across all MDD patients and HCs to identify common variant probes. The filtered common probes between all samples from the two batches of the dataset were combined into a data matrix and mapped to the gene list using Broad Institute software (https://www.gsea-msigdb.org/gsea/index.jsp).
  • the significantly identified pathways were then subjected to the normal GSEA with a significance cut-off of p < 0.05.
  • ML analysis was performed on the same mapped gene expression list from the GSE98793 dataset using the scikit-learn (sklearn) v1.2.1 library in Python v3.9.
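  • The adaptive filtering step described above can be illustrated with a minimal Python sketch. This is not the in-house R pipeline referred to in the text; the function names, the dictionary layout, the use of the per-probe mean MAS5 value, and the population standard deviation in the coefficient of variation are all assumptions made for illustration. Only the two thresholds (MAS5 > 50, coefficient of variation > 10%) come from the description.

```python
from statistics import mean, pstdev

MAS5_MIN = 50.0  # MAS5 threshold stated in the text
CV_MIN = 0.10    # coefficient-of-variation threshold (10%) stated in the text


def coefficient_of_variation(values):
    """Return standard deviation divided by mean (0.0 if the mean is zero)."""
    m = mean(values)
    return pstdev(values) / m if m else 0.0


def adaptive_filter(mas5, gcrma):
    """Keep probes that pass both cut-offs.

    mas5, gcrma: dicts mapping a probe ID to a list of per-sample values.
    Comparing the *mean* MAS5 value to the threshold is an assumption.
    """
    kept = []
    for probe, signal in mas5.items():
        if probe not in gcrma:
            continue
        passes_signal = mean(signal) > MAS5_MIN
        passes_variation = coefficient_of_variation(gcrma[probe]) > CV_MIN
        if passes_signal and passes_variation:
            kept.append(probe)
    return sorted(kept)


def common_probes(*probe_lists):
    """Intersect the filtered probe lists obtained for each group or batch."""
    sets = [set(p) for p in probe_lists]
    return sorted(set.intersection(*sets)) if sets else []
```

  • In this sketch, `adaptive_filter` would be run separately for each group or batch, and `common_probes` would then intersect the resulting lists, mirroring the intersection across MDD patients and HCs described above.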
  • FS feature selection
  • the ML pipeline selection involves FS and hyperparameter optimization for different ML models including Logistic Regression (LR), Random Forest (RF), XGBoost (XGB), Support Vector Machine Classifier (SVM) and K-Nearest Neighbors (KNN) with 10-fold cross-validation (Figure 1C).
  • the F1 score is the harmonic mean of precision and recall.
  • the precision is the fraction of correctly predicted subjects of a certain class among all subjects that the classification model assigned to this class.
  • the recall is the fraction of correctly predicted subjects of a certain class among all subjects that belong to this class.
  • the F1-macro score was used, where all per-class F1 scores were averaged.
  • ROC-AUC is the area under the receiver operating characteristic (ROC) curve, which evaluates the quality of the model without being tied to a specific threshold. To build the ROC curve, the true positive rate (TPR) is plotted against the false positive rate (FPR), and the area under the resulting curve is then calculated.
  • Transcriptomic data consist of a large number of genes (features), far more than the number of samples, which leads to model overfitting in classification tasks.
  • the FS procedure is applied to reduce the dimensionality of the data while preserving significant information. Because the number of features should be equal to or smaller than the sample size, two options were chosen: in the first, the number of selected features was equal to the number of samples in the dataset; in the second, the number of selected features was 80% of the number of samples.
  • Several methods in the FS procedure (Figure 1C) were chosen, including PCA and the SelectKBest method with ANOVA F-value as a score function.
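  • The metric definitions above can be made concrete with a short, dependency-free Python sketch. It is an illustration of the definitions only, not the evaluation code used in the study (which relied on scikit-learn); the function names are hypothetical, and binary labels 0/1 are assumed for the ROC-AUC function.

```python
def precision_recall_f1(y_true, y_pred, label):
    """Per-class precision, recall and F1 for the given class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def f1_macro(y_true, y_pred):
    """Unweighted average of the per-class F1 scores."""
    labels = sorted(set(y_true))
    return sum(precision_recall_f1(y_true, y_pred, c)[2] for c in labels) / len(labels)


def roc_auc(y_true, scores):
    """Area under the TPR-vs-FPR curve, by trapezoidal integration.

    y_true holds 0/1 labels; scores are the predicted scores for class 1.
    """
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    tp = fp = 0
    prev_tpr = prev_fpr = 0.0
    area = 0.0
    i = 0
    while i < len(ranked):
        j = i
        # Lower the threshold past all samples sharing the same score.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        tpr, fpr = tp / pos, fp / neg
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2
        prev_tpr, prev_fpr = tpr, fpr
        i = j
    return area
```

  • Because the threshold is swept over every distinct score, the resulting area does not depend on any single decision threshold, which is the property the description highlights.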
  • ML methods such as LR and RF were chosen to extract important features from trained models.
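  • As a rough illustration of the feature-selection options above, the following pure-Python sketch scores each feature with a one-way ANOVA F-value and keeps the top k, with k set either to the number of samples or to 80% of it. It is a hand-written stand-in for the idea behind SelectKBest with an ANOVA score function, not scikit-learn's implementation; the function names and the floor rounding of the 80% budget are assumptions.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-value of a single feature across the class labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_k_best(X, y, k):
    """Return the column indices of the k features with the highest F-value.

    X is a list of samples (rows), each a list of feature values.
    """
    n_features = len(X[0])
    scores = [anova_f([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def feature_budgets(n_samples):
    """The two options from the text: k = n_samples and k = 80% of n_samples.

    Rounding the 80% option down is an assumption of this sketch.
    """
    return n_samples, (n_samples * 4) // 5
```

  • For a dataset of 192 samples, `feature_budgets` would give 192 and 153 features as the two candidate values of k under the stated rounding assumption.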
  • Table 1 List of Transcriptomic datasets used in this study.
  • AHBA Allen Human Brain Atlas
  • Figure 7 depicts the general workflow of MDD-related Biomarkers identification based on Bioinformatics and ML-approach to connect Transcriptomic changes and Brain regions.
  • the identified potential MDD biomarker was further assessed in an independent cohort with a particular genetic background from the Kazakh population.
  • Saliva samples of 12 well-characterized MDD patients and 8 HCs were collected at Al-Farabi Kazakh National University, Kazakhstan. The study was approved by the Ethics Committee of the Al-Farabi Kazakh National University (IRB-AO83 and IRB-A267). Inclusion criteria were a first-time diagnosis of MDD by a psychiatrist (based on ICD-10) and no medication or treatment. Exclusion criteria were substance or alcohol abuse and use of antipsychotic medications or treatment. After signing the consent form, all participants completed the Inventory of Depressive Symptomatology (IDS). Subjects with IDS scores > 20 were interviewed by a psychiatrist for diagnosis. Saliva samples were collected from all participants.
  • mRNA gene expression evaluation using qRT-PCR
  • the top candidate biomarker NRG1 was selected for further quantitative gene expression evaluation between MDD patients and HCs from the Kazakh population. Saliva samples were stored in RNAlater stabilization solution at +4 °C until analysis. Total RNA was extracted using the RNeasy Mini kit (Qiagen, Germany), according to the manufacturer’s protocol. RNA purity and quantity were assessed using Nanodrop2000 (ThermoFisher Scientific, USA). Total RNA was transcribed using a High Capacity cDNA Reverse Transcription Kit (Applied Biosystems, USA) according to the manufacturer’s instructions. The primers for NRG1 gene expression evaluation were Forward: 5'-TCGTGGAATCAAACGCTACA-3' and Reverse: 5'-ACTCCCCTCCATTCACACA-3'.
  • 18S rRNA was used as a housekeeping gene with the primers Forward: 5'- TCGCTCCACCAACTAAGAAC-3' and Reverse: 5'-TGACTCAACACGGGAAAC-3'.
  • the mRNA expression of the candidate and reference genes was quantified in MDD patients and HCs using Maxima SYBR Green/ROX qPCR Master Mix (ThermoFisher Scientific) on the QuantStudio3 system (Applied Biosystems). The amplification for each sample was carried out in two biological replicates and two technical replicates, then the mean Cq value (quantitation cycle) was used to determine the fold expression change in MDD patients compared to the HCs. The relative expression levels of the target gene were calculated using the 2^(-ΔΔCt) method.
  • GSEA was performed. It is a computational method that identifies whether an a priori defined set of genes shows statistically significant and coherent differences between two groups across several thousand pre-defined datasets including complex molecular mechanisms, immunologic signatures, biological processes, molecular activities and cellular structures. The enrichment analysis showed that a total of 3262 and 2186 significant pathways (p < 0.05) were identified in batch 1 and batch 2, respectively.
  • the GSEA revealed that the DEGs in MDD patients are mainly enriched in pathways related to immune response, immune effector processes, signaling by cytokines, inflammatory response, proinflammatory and profibrotic mediators, cellular responses to stimuli, neurodegeneration pathways, cerebellar atrophy pathway, and neuroactive ligand-receptor interactions. The details of all significant pathways are listed in Supplementary Table 2.
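  • The 2^(-ΔΔCt) calculation mentioned above can be sketched as follows. The function names are hypothetical, and averaging replicate Cq values before subtraction is an assumption consistent with the use of the mean Cq value described in the text; the numbers in the usage note are made-up illustrations, not study data.

```python
from statistics import mean


def delta_ct(target_cq, reference_cq):
    """Mean Cq of the target gene minus mean Cq of the reference gene."""
    return mean(target_cq) - mean(reference_cq)


def fold_change(case_target, case_reference, control_target, control_reference):
    """Relative expression of the target gene in cases vs. controls.

    Each argument is a list of replicate Cq values. The result is
    2 ** (-ddCt), where ddCt = dCt(cases) - dCt(controls).
    """
    dd_ct = delta_ct(case_target, case_reference) - delta_ct(
        control_target, control_reference
    )
    return 2 ** (-dd_ct)
```

  • For example, with illustrative Cq values of 24 (target) and 10 (reference) in cases and 25 and 10 in controls, ΔΔCt is -1 and the fold change is 2, i.e. a two-fold up-regulation relative to controls.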
  • the leading-edge analysis was also performed for all identified significant gene sets in each subgroup of MDD and HCs. Subsequently, the gene frequency was calculated based on the occurrence of a gene amongst all enriched leading-edge core genes from the significant over-represented gene sets in each batch. Overall, 353 frequent genes resulted based on the merged data of MDD patients compared to HCs (Supplementary Table 3). These identified genes could be considered the most informative genes that play or control potential biological roles among the over-enriched pathways in MDD compared to HCs.
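  • The gene-frequency step above amounts to counting how many significant gene sets list each gene among their leading-edge core genes. A minimal Python sketch, under stated assumptions: the input layout (gene-set name mapped to its leading-edge genes), the function names, and the `min_count` threshold are hypothetical, since the description does not state the cut-off used to arrive at the 353 frequent genes.

```python
from collections import Counter


def leading_edge_gene_frequency(leading_edge_sets):
    """Count, for each gene, the number of gene sets whose leading edge contains it.

    leading_edge_sets: dict mapping a gene-set name to a list of gene symbols.
    A gene listed twice within one set is counted once for that set.
    """
    counts = Counter()
    for genes in leading_edge_sets.values():
        counts.update(set(genes))
    return counts


def frequent_genes(counts, min_count):
    """Return the genes occurring in at least min_count leading-edge sets."""
    return sorted(gene for gene, count in counts.items() if count >= min_count)
```

  • Running the count per batch and then merging the results would correspond to the per-batch calculation described above.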
  • Table 2 Filtered Top leading genes in MDD patients compared to healthy controls.
  • the final model’s score is 0.76 ± 0.11 and ROC-AUC is 0.82 ± 0.12.
  • the classification performance of the best-built classifier trained on the merged discovery dataset, which consists of the best LR model with LR as the feature selection method, was further evaluated in four well-characterized independent transcriptomics datasets from different populations.
  • Table 5 The classification performance comparison (F1-macro) with and without retraining compared to a baseline classifier in external transcriptomics datasets.
  • NRG1, neuregulin 1 was the top frequently altered gene in three out of four different MDD populations.
  • ML on 80% of the entire combined datasets, including the discovery and external datasets, with performance evaluation on the 20% holdout, showed that the best-built classifier trained on the merged discovery dataset, which consists of the best LR model with LR as the feature selection method, achieved a 0.58 ± 0.05 F1-macro score on 5-fold cross-validation, which is better than the baseline with a 0.38 ± 0.04 F1-macro score.
  • the NRG1 gene was among the best-performing genes for distinguishing between MDD patients and HC groups among the independent populations with diverse ethnicities. Taken together, these results suggest NRG1 as a potential robust biomarker to differentiate between MDD patients and HCs under different depression-related conditions as well as to follow up the MDD patients’ remission.
  • NRG1 is a liquid biopsy and brain-related biomarker
  • the NRG1 expression was assessed among transcriptomic data of six post-mortem brains from healthy donors without a known history of neuropsychiatric or neuropathological disorders.
  • NRG1 is expressed in the left cortex and subcortex with variable intensity among the six studied samples, suggesting that NRG1 is a brain-related biomarker that could be informative for a mental disorder such as depression. Due to the inaccessibility of brain tissues, saliva biomarkers are considered the ultimate non-invasive alternative for mental health-related diseases.
  • To assess NRG1 as a potent MDD-related biomarker in saliva samples, NRG1 expression was investigated using quantitative RT-PCR in 12 patients with MDD and 8 HCs from the Kazakh population. The latter has a particular population ethnicity and is different from the other populations that were explored in the discovery and external datasets.
  • the relative quantification of NRG1 gene expression after normalization to the reference gene 18S rRNA showed an up-regulation by a factor of 1.48 in MDD patients compared to HCs (Figure 6).

Abstract

The present disclosure relates to a method for biomarker analysis. More particularly, the present invention relates to a method for identifying biomarkers associated with mental health disorders such as, but not limited to, major depressive disorder (MDD). The method according to the present invention is based on an integrative analysis combining machine learning and bioinformatics techniques. The method makes it possible to efficiently exploit publicly available datasets on patients with mental disorders in order to identify biomarkers that can assist in the diagnosis and prediction of the onset of such mental health disorders. The present invention also relates to the biomarker(s) identified by said method, which enable(s) early and effective detection of mental health disorders through non-invasive methods and the application of such biomarker(s).
PCT/IB2024/056625 2023-07-08 2024-07-08 Method for identifying one or more biomarkers relating to mental health disorders, and biomarkers determined by this method Pending WO2025012793A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202363525664P 2023-07-08 2023-07-08
US63/525,664 2023-07-08

Publications (1)

Publication Number Publication Date
WO2025012793A1 2025-01-16

Family

ID=94214893

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2024/056625 Pending WO2025012793A1 (fr) 2023-07-08 2024-07-08 Method for identifying one or more biomarkers relating to mental health disorders, and biomarkers determined by this method

Country Status (1)

Country Link
WO (1) WO2025012793A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN121281845A (zh) * 2025-12-08 2026-01-06 Zhejiang University Method and system for detecting colorectal cancer liver metastasis based on multi-omics deep learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019140380A1 (fr) * 2018-01-12 2019-07-18 Kymera Therapeutics, Inc. Protein degraders and uses thereof
EP3819386A1 (fr) * 2019-11-08 2021-05-12 Alcediag Diagnosis of mood disorders using blood RNA editing biomarkers

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BOUZID AMAL, ALMIDANI ABDULRAHMAN, ZUBRIKHINA MARIA, KAMZANOVA ALTYNGUL, ILCE BURCU YENER, ZHOLDASSOVA MANZURA, YUSUF AYESHA M., B: "Integrative bioinformatics and artificial intelligence analyses of transcriptomics data identified genes associated with major depressive disorders including NRG1", NEUROBIOLOGY OF STRESS, vol. 26, 1 September 2023 (2023-09-01), pages 100555, XP093267414, ISSN: 2352-2895, DOI: 10.1016/j.ynstr.2023.100555 *
LEVCHENKO ANASTASIA, VYALOVA NATALIA M., NURGALIEV TIMUR, POZHIDAEV IVAN V., SIMUTKIN GERMAN G., BOKHAN NIKOLAY A., IVANOVA SVETLA: "NRG1, PIP4K2A, and HTR2C as Potential Candidate Biomarker Genes for Several Clinical Subphenotypes of Depression and Bipolar Disorder", FRONTIERS IN GENETICS, FRONTIERS RESEARCH FOUNDATION, SWITZERLAND, vol. 11, Switzerland , XP093267416, ISSN: 1664-8021, DOI: 10.3389/fgene.2020.00936 *
ZHAO SHU, BAO ZHIWEI, ZHAO XINYI, XU MENGXIANG, LI MING D., YANG ZHONGLI: "Identification of Diagnostic Markers for Major Depressive Disorder Using Machine Learning Methods", FRONTIERS IN NEUROSCIENCE, FRONTIERS RESEARCH FOUNDATION, CH, vol. 15, CH , XP093267415, ISSN: 1662-453X, DOI: 10.3389/fnins.2021.645998 *

Similar Documents

Publication Publication Date Title
Zhao et al. Identification of diagnostic markers for major depressive disorder using machine learning methods
Lee et al. Prediction of Alzheimer’s disease using blood gene expression data
US20190108915A1 (en) Disease monitoring from insurance claims data
Motelow et al. Sub-genic intolerance, ClinVar, and the epilepsies: A whole-exome sequencing study of 29,165 individuals
Laing et al. Identifying and validating blood mRNA biomarkers for acute and chronic insufficient sleep in humans: a machine learning approach
Ramaswamy et al. Feature selection for Alzheimer’s gene expression data using modified binary particle swarm optimization
Bouzid et al. Integrative bioinformatics and artificial intelligence analyses of transcriptomics data identified genes associated with major depressive disorders including NRG1
Chandrashekar et al. DeepGAMI: deep biologically guided auxiliary learning for multimodal integration and imputation to improve genotype–phenotype prediction
Liu et al. Identification of crucial genes for predicting the risk of atherosclerosis with system lupus erythematosus based on comprehensive bioinformatics analysis and machine learning
Liu et al. Digital phenotyping from wearables using AI characterizes psychiatric disorders and identifies genetic associations
US20230348980A1 (en) Systems and methods of detecting a risk of alzheimer's disease using a circulating-free mrna profiling assay
Breen et al. Systematic review of blood transcriptome profiling in neuropsychiatric disorders: guidelines for biomarker discovery
WO2025012793A1 (fr) Method for identifying one or more biomarkers relating to mental health disorders, and biomarkers determined by this method
CA2885634A1 (fr) Device for detecting a dynamic network biomarker, detection method, and detection program
Kim et al. Deep learning-based brain age prediction in patients with schizophrenia spectrum disorders
Natarajan et al. A novel method for bioinformatics analysis in gene expression profiling framework for personalized healthcare applications
AU2021100434A4 (en) A system and method for predicting bipolar disorder and schizophrenia based on non-overlapping genetic phenotypes
Casalino et al. Evaluation of cognitive impairment in pediatric multiple sclerosis with machine learning: an exploratory study of miRNA expressions
Tassi et al. Clinical stratification of Major Depressive Disorder in the UK Biobank: A gene-environment-brain Topological Data Analysis
Tassi et al. Gene–environment–brain topology reveals clinical subtypes of depression in UK Biobank
Lopez-Rincon et al. Modelling asthma patients’ responsiveness to treatment using feature selection and evolutionary computation
Shafik et al. GENETIC BIOMARKERS DETECTION FOR ALZHEIMER’S DISEASE
Wei et al. NetMoST: A network-based machine learning approach for subtyping schizophrenia using polygenic SNP allele biomarkers
Liu et al. Outcome-guided disease subtyping for high-dimensional omics data
CN120536571B (zh) 基于基因标志物组合诊断或预测阿尔茨海默症的系统、设备或介质

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 24838982; Country of ref document: EP; Kind code of ref document: A1)
WWE WIPO information: entry into national phase (Ref document number: P2026-00091; Country of ref document: AE)
WWE WIPO information: entry into national phase (Ref document number: 2024838982; Country of ref document: EP)