CN119069003A - A prediction system and electronic device for congenital heart disease combined with intellectual disability - Google Patents
- Publication number
- CN119069003A (application number CN202411127026.2A)
- Authority
- CN
- China
- Prior art keywords
- heart disease
- sample
- congenital heart
- model
- prediction
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/30—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Public Health (AREA)
- Life Sciences & Earth Sciences (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Biomedical Technology (AREA)
- Databases & Information Systems (AREA)
- Epidemiology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biotechnology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Theoretical Computer Science (AREA)
- Primary Health Care (AREA)
- Biophysics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Evolutionary Biology (AREA)
- Pathology (AREA)
- Genetics & Genomics (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Molecular Biology (AREA)
- Chemical & Material Sciences (AREA)
- Analytical Chemistry (AREA)
- Artificial Intelligence (AREA)
- Bioethics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention discloses a system and an electronic device for predicting congenital heart disease combined with intellectual disability, and also provides a corresponding prediction method. The method comprises: obtaining expression data of target genes of a sample to be tested, wherein the target genes are any one or more of FKBP8, FAM118A, and SPTB; and performing classification prediction based on the expression data of the target genes to obtain a classification result indicating whether the sample to be tested is a sample of congenital heart disease combined with intellectual disability. The method offers high accuracy, high detection sensitivity, high specificity, and low cost, has a wide range of application, and supports clinicians in promptly adopting more personalized management plans for patients with congenital heart disease combined with intellectual disability.
Description
Technical Field
The invention belongs to the technical field of developing disease prediction methods, and particularly relates to a prediction system and an electronic device for congenital heart disease combined with intellectual disability; more specifically, it relates to a prediction method, a prediction system, an electronic device, and a computer-readable storage medium for congenital heart disease combined with intellectual disability.
Background
Congenital heart disease combined with intellectual disability is a form of congenital heart disease (CHD). Its possible causes include chromosomal abnormalities (for example, arising from exposure of the pregnant woman to radioactive substances, prolonged exposure to chemical substances, or viral infection during pregnancy), intrauterine infections (for example, maternal infection with rubella, coxsackievirus, or mumps), and prolonged exposure to harmful substances during pregnancy (for example, heavy metals, pesticides, or benzene). At present, no prediction method or product capable of effectively diagnosing or predicting congenital heart disease combined with intellectual disability is known in the art.
Disclosure of Invention
In view of the above, the present invention aims to provide a prediction system and an electronic device for congenital heart disease combined with intellectual disability.
The invention also provides a prediction method for congenital heart disease combined with intellectual disability, comprising: obtaining expression data of target genes of a sample to be tested, wherein the target genes are any one or more of FKBP8, FAM118A, and SPTB; and performing classification prediction based on the expression data of the target genes to obtain a classification result indicating whether the sample to be tested is a sample of congenital heart disease combined with intellectual disability.
The prediction method provided by the invention offers high accuracy, high detection sensitivity and specificity, and low cost, has a wide range of application, and supports clinicians in promptly adopting more personalized management plans for patients with congenital heart disease combined with intellectual disability.
To achieve the above objectives, the invention adopts the following technical solutions:
Prediction method for congenital heart disease combined with intellectual disability
The invention first provides a prediction method for congenital heart disease combined with intellectual disability, comprising the following steps:
obtaining expression data of target genes of a sample to be tested, wherein the target genes are any one or more of FKBP8, FAM118A, and SPTB;
performing classification prediction based on the expression data of the target genes to obtain a classification result indicating whether the sample to be tested is a sample of congenital heart disease combined with intellectual disability;
wherein, if the expression level of one or more of the target genes FKBP8 and SPTB is lower than the corresponding threshold and/or the expression level of FAM118A is higher than its threshold, the sample to be tested is classified as a sample of congenital heart disease combined with intellectual disability;
and if the expression level of one or more of the target genes FKBP8 and SPTB is higher than the corresponding threshold and/or the expression level of FAM118A is lower than its threshold, the sample to be tested is classified as a sample without congenital heart disease combined with intellectual disability.
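The threshold rule above can be illustrated with a minimal sketch. The per-gene thresholds are hypothetical placeholders (no numeric cut-offs are disclosed), and because the rule combines genes with "and/or" without fixing how conflicting genes are resolved, this sketch assumes that all measured genes must agree and otherwise returns an indeterminate result (`None`).

```python
from typing import Dict, Mapping, Optional

# Hypothetical placeholder thresholds: the method does not disclose numeric cut-offs.
THRESHOLDS: Dict[str, float] = {"FKBP8": 1.0, "SPTB": 1.0, "FAM118A": 1.0}

# Direction of change in case samples, as described by the method:
# FKBP8 and SPTB below threshold, FAM118A above threshold.
CASE_DIRECTION: Dict[str, str] = {"FKBP8": "low", "SPTB": "low", "FAM118A": "high"}


def points_to_case(gene: str, level: float, thresholds: Mapping[str, float]) -> bool:
    """True if this gene's level points towards a CHD + ID sample.

    A level exactly equal to the threshold is treated as non-case here;
    the source does not say how ties are handled.
    """
    if CASE_DIRECTION[gene] == "low":
        return level < thresholds[gene]
    return level > thresholds[gene]


def classify(expression: Mapping[str, float],
             thresholds: Mapping[str, float] = THRESHOLDS) -> Optional[str]:
    """Classify a sample from any subset of FKBP8, FAM118A and SPTB.

    Assumption: all measured genes must agree; conflicting genes give None
    (indeterminate), a rule the source does not specify.
    """
    measured = [g for g in CASE_DIRECTION if g in expression]
    if not measured:
        raise ValueError("none of the target genes was measured")
    votes = [points_to_case(g, expression[g], thresholds) for g in measured]
    if all(votes):
        return "CHD+ID"
    if not any(votes):
        return "non-CHD+ID"
    return None


print(classify({"FKBP8": 0.5, "SPTB": 0.4, "FAM118A": 2.0}))  # CHD+ID
```

Requiring agreement is only one way to read "and/or"; a single-gene rule or a majority vote would be equally consistent with the text and can be swapped in by changing the final three branches.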
Further, the classification result is obtained based on a prediction model;
the prediction model is constructed by: obtaining target gene expression data of training set samples and the clinical characteristics corresponding to each sample, wherein the clinical characteristics indicate whether the sample comes from a patient with congenital heart disease combined with intellectual disability or from a patient with congenital heart disease without intellectual disability; extracting the target gene expression data from the training set; and inputting the target gene expression data into a machine learning model to construct the prediction model.
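As a minimal sketch of the model-construction step, the following fits a logistic regression (one of the model types the method names) by plain batch gradient descent, using the three target genes as features. The training values are invented toy numbers that only follow the stated directions of change; they are not data from this invention, and a practical implementation would more likely use an established library with held-out validation.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Predicted probability that sample x is a CHD + ID sample."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical toy training set; feature order: [FKBP8, FAM118A, SPTB].
X_TRAIN = [[0.2, 1.8, 0.3], [0.4, 1.5, 0.2], [0.3, 2.0, 0.4],   # CHD + ID (label 1)
           [1.6, 0.3, 1.5], [1.8, 0.5, 1.7], [1.5, 0.2, 1.9]]   # CHD without ID (label 0)
Y_TRAIN = [1, 1, 1, 0, 0, 0]

W, B = fit_logistic(X_TRAIN, Y_TRAIN)
print(predict_proba(W, B, [0.3, 1.7, 0.3]), predict_proba(W, B, [1.7, 0.4, 1.6]))
```

Logistic regression is used here only because it is short to write from scratch and yields a probability that can be thresholded; any of the other listed model types could take its place.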
Further, the machine learning model includes a linear regression model, a logistic regression model, a random forest model, a Lasso regression model, a neural network model, a decision tree model, a perceptron model, a support vector machine model, and/or a naive Bayes model.
Further, the target gene expression data comprise mRNA expression data of the target genes or protein expression data of the target genes;
the mRNA expression data are obtained by high-throughput sequencing, RT-PCR, qRT-PCR, or in situ hybridization;
the protein expression data are obtained by immunoblotting, immunohistochemistry, mass spectrometry, or co-immunoprecipitation.
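Where the mRNA data come from qRT-PCR, one widely used way to turn cycle-threshold (Ct) values into relative expression levels is the Livak 2^-ΔΔCt method. Which quantification scheme the invention uses is not stated, so this sketch is an assumption; it also assumes roughly 100% amplification efficiency for both genes and a reference (housekeeping) gene chosen by the user.

```python
from statistics import mean
from typing import Sequence


def fold_change_ddct(target_ct: Sequence[float], ref_ct: Sequence[float],
                     calib_target_ct: Sequence[float],
                     calib_ref_ct: Sequence[float]) -> float:
    """Relative expression of a target gene by the Livak 2^-ΔΔCt method.

    target_ct / ref_ct: replicate Ct values of the target and reference gene
    in the test sample; calib_*: the same for the calibrator (control) group.
    Assumes roughly 100% PCR efficiency for both genes.
    """
    delta_ct_sample = mean(target_ct) - mean(ref_ct)
    delta_ct_calib = mean(calib_target_ct) - mean(calib_ref_ct)
    delta_delta_ct = delta_ct_sample - delta_ct_calib
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: relative to the reference gene, the target appears
# one cycle later than in the calibrator, i.e. about half the expression.
print(fold_change_ddct([25.9, 26.1], [18.1, 17.9], [24.9, 25.1], [18.0, 18.0]))  # about 0.5
```

A fold change below 1 for FKBP8 or SPTB, or above 1 for FAM118A, would then correspond to the direction the method associates with case samples, once a threshold is chosen on this scale.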
Further, the sample to be tested includes a blood sample, a serum sample, a plasma sample, a tissue sample, and/or a cell sample.
Prediction system for congenital heart disease combined with intellectual disability
The invention also provides a prediction system for congenital heart disease combined with intellectual disability, comprising:
a data acquisition unit, configured to obtain expression data of target genes of a sample to be tested, wherein the target genes are any one or more of FKBP8, FAM118A, and SPTB;
an analysis and prediction unit, configured to perform classification prediction based on the expression data of the target genes to obtain a classification result indicating whether the sample to be tested is a sample of congenital heart disease combined with intellectual disability;
wherein, if the expression level of one or more of the target genes FKBP8 and SPTB is lower than the corresponding threshold and/or the expression level of FAM118A is higher than its threshold, the sample to be tested is classified as a sample of congenital heart disease combined with intellectual disability;
if the expression level of one or more of the target genes FKBP8 and SPTB is higher than the corresponding threshold and/or the expression level of FAM118A is lower than its threshold, the sample to be tested is classified as a sample without congenital heart disease combined with intellectual disability;
the classification result is obtained based on a prediction model;
the prediction model is constructed by obtaining target gene expression data of training set samples and the clinical characteristics corresponding to each sample, wherein the clinical characteristics indicate whether the sample comes from a patient with congenital heart disease combined with intellectual disability or from a patient with congenital heart disease without intellectual disability, extracting the target gene expression data from the training set, and inputting the target gene expression data into a machine learning model to construct the prediction model;
and an output unit, configured to output the classification result.
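The three units of the system can be wired together roughly as follows. This is a hypothetical sketch: the class names, the in-memory dictionary standing in for the data source, and the toy decision rule (placeholder cut-offs of 1.0) are illustrative assumptions, not the structure disclosed in the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Mapping

TARGET_GENES = ("FKBP8", "FAM118A", "SPTB")


@dataclass
class DataAcquisitionUnit:
    """Obtains target-gene expression data for a sample (hypothetical in-memory store)."""
    store: Mapping[str, Mapping[str, float]]  # sample_id -> {gene: expression level}

    def get_expression(self, sample_id: str) -> Dict[str, float]:
        record = self.store[sample_id]
        # Keep only the target genes that were actually measured.
        return {g: record[g] for g in TARGET_GENES if g in record}


@dataclass
class AnalysisPredictionUnit:
    """Wraps any trained model that returns True for a CHD + ID sample."""
    model: Callable[[Mapping[str, float]], bool]

    def classify(self, expression: Mapping[str, float]) -> str:
        return "CHD+ID" if self.model(expression) else "non-CHD+ID"


class OutputUnit:
    """Formats and prints the classification result."""

    def output(self, sample_id: str, result: str) -> str:
        line = f"{sample_id}: {result}"
        print(line)
        return line


@dataclass
class PredictionSystem:
    data_unit: DataAcquisitionUnit
    analysis_unit: AnalysisPredictionUnit
    output_unit: OutputUnit

    def run(self, sample_id: str) -> str:
        expression = self.data_unit.get_expression(sample_id)
        result = self.analysis_unit.classify(expression)
        return self.output_unit.output(sample_id, result)


def toy_model(expr: Mapping[str, float]) -> bool:
    # Hypothetical stand-in for a trained prediction model.
    return expr["FKBP8"] < 1.0 and expr["FAM118A"] > 1.0


system = PredictionSystem(
    DataAcquisitionUnit({"S1": {"FKBP8": 0.4, "FAM118A": 1.9, "SPTB": 0.5},
                         "S2": {"FKBP8": 1.6, "FAM118A": 0.3, "SPTB": 1.4}}),
    AnalysisPredictionUnit(toy_model),
    OutputUnit(),
)
system.run("S1")  # prints "S1: CHD+ID"
```

Keeping the model behind a plain callable lets the same analysis unit host a threshold rule or any trained classifier without changing the other two units.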
Further, the machine learning model includes a linear regression model, a logistic regression model, a random forest model, a Lasso regression model, a neural network model, a decision tree model, a perceptron model, a support vector machine model, and/or a naive Bayes model.
Electronic equipment
The invention also provides an electronic device comprising a memory and a processor, wherein the memory is configured to store program instructions, and the processor is configured to invoke the program instructions and, when the program instructions are executed, perform the steps of the prediction method for congenital heart disease combined with intellectual disability described above.
Computer readable storage medium
The invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the prediction method for congenital heart disease combined with intellectual disability described above.
New use of reagents for detecting the expression levels of target genes FKBP8, FAM118A, and/or SPTB
The invention also provides the use of a reagent for detecting the expression levels of the target genes FKBP8, FAM118A, and/or SPTB in a sample to be tested in the preparation of a product for diagnosing congenital heart disease combined with intellectual disability.
Additional advantages, objects, and features of the application will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and drawings.
It will be appreciated by those skilled in the art that the objects and advantages that can be achieved with the present application are not limited to the above-described specific ones, and that the above and other objects that can be achieved with the present application will be more clearly understood from the following detailed description.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a prediction method for congenital heart disease combined with intellectual disability according to an embodiment of the invention;
Fig. 2 is a schematic diagram of a prediction system for congenital heart disease combined with intellectual disability according to an embodiment of the invention;
Fig. 3 is a schematic diagram of an electronic device for predicting congenital heart disease combined with intellectual disability according to an embodiment of the invention;
FIG. 4 is an example of sequencing reads in FASTQ format;
FIG. 5 is a histogram of sequencing quality;
FIG. 6 shows the per-sequence average quality score;
FIG. 7 is a plot of the correlation analysis among mRNA samples;
FIG. 8 shows a volcano plot and a heat map (top 100) of the differentially expressed genes;
FIG. 9 shows the enrichment results for the differentially expressed genes (GO enrichment on the left, KEGG enrichment on the right);
FIG. 10 shows the differential expression results of FKBP8, FAM118A, and SPTB, wherein A denotes samples from patients with congenital heart disease with normal nutrition and intellectual development, and B denotes samples from patients with congenital heart disease with delayed intellectual development;
FIG. 11 shows the diagnostic efficacy validation results of FKBP8, FAM118A, and SPTB individually for predicting congenital heart disease combined with intellectual disability;
FIG. 12 shows the diagnostic efficacy validation results of each pairwise combination of FKBP8, FAM118A, and SPTB for predicting congenital heart disease combined with intellectual disability;
FIG. 13 shows the diagnostic efficacy validation results of the combination of FKBP8, FAM118A, and SPTB for predicting congenital heart disease combined with intellectual disability.
Detailed Description
In order to enable those skilled in the art to better understand the present invention, the technical solutions of the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings.
In some of the flows described in the specification and claims of the present invention and in the above figures, a plurality of operations appearing in a particular order are included, but it should be clearly understood that the operations may be performed in other than the order in which they appear herein or in parallel, the sequence numbers of the operations such as S101, S102, etc. are merely used to distinguish between the various operations, and the sequence numbers themselves do not represent any order of execution. In addition, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel.
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments according to the invention without any creative effort, are within the protection scope of the invention.
Fig. 1 is a schematic flow chart of a prediction method for congenital heart disease combined with mental retardation according to an embodiment of the invention. Specifically, the method comprises the following steps:
S101, obtaining expression data of a target gene of a sample to be tested, wherein the target gene is any one or more of FKBP8, FAM118A and SPTB;
In one embodiment, the test sample is derived from a subject.
In one embodiment, the subject includes a mammal and a non-mammal. Examples of mammals include, but are not limited to, any member of the class mammalia, humans, non-human primates such as chimpanzees and other apes and monkeys, farm animals such as cows, horses, sheep, goats, pigs, domestic animals such as rabbits, dogs and cats, laboratory animals including rodents such as rats, mice and guinea pigs, and the like. Examples of non-mammals include, but are not limited to, birds, fish, or other non-mammals, and the like.
In one embodiment, the subject is a human.
In one embodiment, the test sample refers to a composition obtained or derived from a target subject comprising cellular entities and/or other molecular entities to be characterized and/or identified, e.g. based on physical, biochemical, chemical and/or physiological characteristics. The sample may be obtained from blood and other fluid samples of biological origin and tissue samples of the subject, such as biopsy tissue samples or tissue cultures or cells derived therefrom. The source of the tissue sample may be solid tissue, such as tissue from fresh, frozen and/or preserved organs or tissue samples, biopsy tissue or aspirates, blood or any blood component, body fluids, cells from any time of pregnancy or development of the individual, or plasma.
In one embodiment, the sample to be tested comprises a subject-derived blood sample, a tissue sample, a blood-derived cell sample, a serum sample, a plasma sample, a lymph sample, a synovial fluid sample, a cell extract sample, and/or any combination thereof.
In one embodiment, the sample to be tested is a blood sample from a subject.
In one embodiment, the expression data of the target gene includes mRNA expression level data of the target gene or protein expression level data of the target gene.
In one embodiment, the mRNA expression level data is mRNA expression level data obtained by high throughput sequencing techniques, RT-PCR, qRT-PCR, or in situ hybridization techniques. Any method or technique that can be used to detect the amount of mRNA expression corresponding to a gene can be used in the present invention.
In one embodiment, the protein expression level data is protein expression level data obtained by mass spectrometry, immunoblotting, immunohistochemistry, or co-immunoprecipitation. Any method or technique that can be used to detect the amount of protein expression corresponding to a gene can be used in the present invention.
In one embodiment, the target gene expression data is mRNA expression level data of a target gene, and the mRNA expression level data of the target gene can be obtained by a high-throughput sequencing technology, and in a specific embodiment of the present invention, the high-throughput sequencing technology is an Illumina high-throughput sequencing platform.
In one embodiment, the target gene is any one of FKBP8, FAM118A and SPTB or a combination of several of them; specifically, the target gene comprises any single one of FKBP8, FAM118A and SPTB, any two of them in combination, or all three in combination.
S102, performing classification prediction based on the expression data of the target gene to obtain a classification result indicating whether the sample to be tested is a sample of congenital heart disease combined with mental retardation;
In one embodiment, the classification result is obtained based on a prediction model;
the method for constructing the prediction model comprises: obtaining target gene expression data of training set samples and the clinical characteristics corresponding to the samples, wherein the clinical characteristics comprise patients with congenital heart disease combined with mental retardation and patients with congenital heart disease with normal intellectual development; extracting the target gene expression data from the training set; and inputting the target gene expression data into a machine learning model to construct the prediction model, thereby obtaining a constructed prediction model.
In one embodiment, the machine learning model includes a linear regression model, a logistic regression model, a random forest model, a Lasso regression model, a neural network model, a decision tree model, a perceptron model, a support vector machine model, and/or a naive bayes model.
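As a minimal sketch, assuming Python with NumPy and scikit-learn (neither is named in this description), two of the listed model families can be compared by stratified cross-validated AUC on a samples-by-genes expression matrix. The function names, the fold count, the hyperparameters and the synthetic data are illustrative assumptions, not the models or data actually used in this embodiment.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def candidate_models(seed=0):
    """Two of the model families listed above, as scikit-learn estimators (hypothetical choice)."""
    return {
        # Standardize expression values before the linear model.
        "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=seed),
    }


def cross_validated_auc(X, y, n_splits=3, seed=0):
    """Mean ROC AUC of each candidate model under stratified k-fold cross-validation.

    X: samples x target genes (columns assumed to be FKBP8, FAM118A, SPTB).
    y: 1 = congenital heart disease with mental retardation, 0 = normal intellectual development.
    """
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return {
        name: float(np.mean(cross_val_score(model, X, y, cv=cv, scoring="roc_auc")))
        for name, model in candidate_models(seed).items()
    }


# Synthetic, clearly separated data (not study data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([-1.0, 1.0, -1.0], 0.3, size=(10, 3)),  # cases: FKBP8, SPTB low; FAM118A high
               rng.normal([1.0, -1.0, 1.0], 0.3, size=(10, 3))])  # controls
y = np.array([1] * 10 + [0] * 10)
print(cross_validated_auc(X, y))
```

With a study this small, cross-validated AUC is mainly a sanity check; any of the other listed models can be added to `candidate_models` in the same way.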
In one embodiment, the target gene expression data includes mRNA expression amount data of the target gene or protein expression amount data of the target gene;
the mRNA expression quantity data are mRNA expression quantity data obtained by a high-throughput sequencing technology, RT-PCR, qRT-PCR or in situ hybridization technology;
The protein expression amount data are obtained by an immunoblotting method, an immunohistochemical method, a mass spectrometry method or a co-immunoprecipitation method.
In one embodiment, the target gene expression data is mRNA expression level data of a target gene, and the mRNA expression level data of the target gene can be obtained by a high-throughput sequencing technology, and in a specific embodiment of the present invention, the high-throughput sequencing technology is an Illumina high-throughput sequencing platform.
In one embodiment, the sample to be tested comprises a blood sample, a serum sample, a plasma sample, a tissue sample, and/or a cell sample.
In one embodiment, the method further comprises determining whether the sample to be tested is a congenital heart disease combined with mental retardation sample based on the following criteria:
If the expression level of any one or more of the target genes FKBP8 and SPTB is lower than a threshold value and/or the expression level of FAM118A is higher than a threshold value, a classification result that the sample to be tested is a sample of congenital heart disease combined with mental retardation is obtained;
If the expression level of any one or more of the target genes FKBP8 and SPTB is higher than a threshold value and/or the expression level of FAM118A is lower than a threshold value, a classification result that the sample to be tested is not a sample of congenital heart disease combined with mental retardation is obtained.
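The criteria above can be sketched as a per-gene vote, assuming Python. The directions come from the text (FKBP8 and SPTB lower, FAM118A higher in the mental retardation group), and the example cutoffs are the single-gene values reported in the ROC verification of this description; they are tied to that study's count data and normalization and are shown only for illustration. The sketch adopts one reading of "and/or" (a sample is called positive if any measured gene falls on the disease side of its cutoff); `marker_votes`, `classify` and the constant names are hypothetical.

```python
from typing import Dict, Mapping

# Direction of change in the mental retardation group, as stated in the criteria above.
HIGHER_IN_CASES = {"FKBP8": False, "FAM118A": True, "SPTB": False}

# Single-gene cutoffs reported in the ROC verification (sequencing count scale).
# Illustrative only: they depend on that study's data and normalization.
EXAMPLE_CUTOFFS = {"FKBP8": 22311.0, "FAM118A": 715.5, "SPTB": 15.0}


def marker_votes(expression: Mapping[str, float],
                 cutoffs: Mapping[str, float] = EXAMPLE_CUTOFFS) -> Dict[str, bool]:
    """True for each measured gene whose level lies on the disease side of its cutoff."""
    votes = {}
    for gene, value in expression.items():
        if gene not in HIGHER_IN_CASES or gene not in cutoffs:
            raise KeyError(f"no direction or cutoff defined for {gene}")
        if HIGHER_IN_CASES[gene]:
            votes[gene] = value > cutoffs[gene]
        else:
            votes[gene] = value < cutoffs[gene]
    return votes


def classify(expression: Mapping[str, float],
             cutoffs: Mapping[str, float] = EXAMPLE_CUTOFFS) -> str:
    """'Any marker' reading of the criteria: positive if at least one measured gene votes positive."""
    if any(marker_votes(expression, cutoffs).values()):
        return "congenital heart disease combined with mental retardation"
    return "not congenital heart disease combined with mental retardation"


print(classify({"FKBP8": 20000, "FAM118A": 800, "SPTB": 10}))
print(classify({"FKBP8": 30000, "FAM118A": 500, "SPTB": 20}))
```

A stricter reading (all measured genes must agree) would replace `any` with `all`; in the embodiments, how the genes are combined is ultimately determined by the constructed prediction model.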
In one embodiment, the performance of the constructed prediction model may be further verified; that is, a data set containing target gene expression data of patients with congenital heart disease combined with mental retardation and of patients with congenital heart disease with normal intellectual development may be taken, and the performance of the constructed prediction model verified on that data set. The verification showed that the constructed prediction model can be effectively used to predict congenital heart disease combined with mental retardation.
In one embodiment, the inventors collected blood samples from patients with congenital heart disease combined with mental retardation and from patients with congenital heart disease with normal intellectual development, and performed sequencing analysis to verify the expression of the above target genes and their diagnostic efficacy for congenital heart disease combined with mental retardation. The clinical information of both patient groups is shown in Table 1 below.
TABLE 1 Clinical information of congenital heart disease patients with mental retardation and congenital heart disease patients with normal intellectual development
In one embodiment, the cDNA library is sequenced on an Illumina high-throughput sequencing platform based on sequencing-by-synthesis (SBS) technology, which yields large numbers of high-quality reads; the reads or bases produced by the sequencing platform are referred to as raw data. To facilitate analysis, the raw image data produced by Illumina sequencing are converted into sequence data in FASTQ format by base calling, giving the most primary sequencing data files. A FASTQ file records the bases and the quality scores of each read. As shown in FIG. 4, a FASTQ file is organized by sequencing read, with each read occupying 4 lines: the first and third lines consist of a sequence identifier and the read name (ID) (the first line begins with "@" and the third line begins with "+"; the ID on the third line may be omitted, but the "+" may not); the second line is the base sequence; and the fourth line gives the sequencing quality score of the base at each corresponding position.
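A minimal FASTQ reader following the four-line layout described above might look like the following Python sketch. It assumes one sequence line and one quality line per read (no line wrapping); `read_fastq` and `FastqRecord` are hypothetical names.

```python
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    read_id: str   # line 1 without the leading "@"
    sequence: str  # line 2: base calls
    quality: str   # line 4: one quality character per base


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from 4-line FASTQ input (an open text file or any iterable of lines)."""
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines, e.g. at the end of the file
        try:
            sequence, separator, quality = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError(f"truncated record after {header!r}") from None
        if not header.startswith("@"):
            raise ValueError(f"line 1 must start with '@': {header!r}")
        if not separator.startswith("+"):
            raise ValueError(f"line 3 must start with '+': {separator!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


example = "@read1\nACGT\n+\nIIII\n@read2\nGGC\n+read2\nI#5\n"
for record in read_fastq(example.splitlines()):
    print(record.read_id, record.sequence, record.quality)
```

Because an open text file iterates over its lines, the same function can be applied to `open(path)` or, for compressed files, `gzip.open(path, "rt")`.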
The Illumina sequencer runs 2 flowcells; each flowcell has 8 lanes, each lane contains 2 columns, and each column contains 60 tiles, each of which is in turn seeded with different clusters. The information encoded in the generated sequence identifiers is explained in Table 2.
TABLE 2 Interpretation of sequencing file identifiers
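The contents of Table 2 are not reproduced here. As an assumption, the Python sketch below parses the widely used CASAVA 1.8+ style identifier (`@instrument:run:flowcell:lane:tile:x:y read:filtered:control:index`), which carries the flowcell, lane and tile information described above; the exact fields in this study's files may differ, and `parse_illumina_header` and `IlluminaReadId` are hypothetical names.

```python
from typing import NamedTuple


class IlluminaReadId(NamedTuple):
    instrument: str
    run_number: int
    flowcell_id: str
    lane: int
    tile: int
    x: int               # cluster x-coordinate within the tile
    y: int               # cluster y-coordinate within the tile
    read: int            # mate number (1 or 2) for paired-end runs
    is_filtered: bool    # True when flagged "Y", i.e. the read did not pass filter
    control_number: int
    index: str           # sample index (barcode) sequence or sample number


def parse_illumina_header(header: str) -> IlluminaReadId:
    """Parse '@instrument:run:flowcell:lane:tile:x:y read:filtered:control:index'."""
    if header.startswith("@"):
        header = header[1:]
    name, _, comment = header.partition(" ")
    fields = name.split(":")
    if len(fields) != 7:
        raise ValueError(f"expected 7 colon-separated fields before the space, got {len(fields)}")
    instrument, run, flowcell, lane, tile, x, y = fields
    read, filtered, control, index = comment.split(":", 3)
    return IlluminaReadId(instrument, int(run), flowcell, int(lane), int(tile), int(x), int(y),
                          int(read), filtered == "Y", int(control), index)


print(parse_illumina_header("@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG"))
```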
The quality score of each base in a read is encoded as a character: subtracting 33 from the character's ASCII value gives the corresponding sequencing quality value. Typically, base quality values range from 0 to 40, i.e., the corresponding characters range from "!" (0+33) to "I" (40+33). If E denotes the sequencing error rate and Q denotes the Illumina HiSeq/MiSeq base quality value, they are related by Equation 1: $Q = -10\log_{10}(E)$. The error rate of sequencing reads increases as sequencing proceeds (see Table 3); this is caused by the consumption of chemical reagents during sequencing and is a common feature of Illumina high-throughput sequencing platforms.
TABLE 3 simple correspondence between sequencing error Rate and sequencing quality value
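Equation 1 and the Phred+33 encoding can be illustrated with the Python sketch below; the helper names are hypothetical, and the offset of 33 is taken from the text above.

```python
import math
from typing import List

PHRED_OFFSET = 33  # Phred+33: quality value = ASCII code - 33


def decode_qualities(quality: str, offset: int = PHRED_OFFSET) -> List[int]:
    """Convert a FASTQ quality string to integer Phred scores Q."""
    return [ord(ch) - offset for ch in quality]


def error_rate(q: float) -> float:
    """Invert Equation 1, Q = -10 * log10(E), to get the error probability E."""
    return 10 ** (-q / 10)


def phred_quality(e: float) -> float:
    """Equation 1: Q = -10 * log10(E)."""
    return -10 * math.log10(e)


scores = decode_qualities("!+5?I")  # characters for Q0, Q10, Q20, Q30, Q40
print(scores, [error_rate(q) for q in scores])
```

Each 10-point increase in Q corresponds to a tenfold drop in error probability, so Q30 means an expected 1 error per 1000 bases.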
In one embodiment, the present application performs data volume statistics on the raw sequence data; the results are shown in Table 4.
In one embodiment, the present application performs quality control on the sequencing data. Raw sequencing data may contain sequencing adapter sequences, low-quality reads and other redundant sequences, which can seriously affect the quality of subsequent analysis. To ensure the accuracy of subsequent bioinformatic analysis, the raw sequencing data are first filtered to obtain high-quality sequencing data (clean data). Cutadapt can find and remove adapters, primers, poly-A tails and other redundant sequences in reads; the sequencing data were cleaned using Cutadapt (version 3.5, default parameters).
In one embodiment, the present application performs statistics on the data after quality control: MultiQC is used to aggregate the quality control results, and data volume statistics are computed for the sequences after quality control.
In one embodiment, data quality is evaluated after quality control. Illumina sequencing is a second-generation sequencing technology, and a single run can generate billions of reads. Because the quality of each read cannot be displayed individually for such massive data, the quality of all sequencing reads is assessed statistically, which gives an intuitive, global view of the sequencing quality and library construction quality of each sample. Sequencing-related quality assessment was performed on the data of each sample. The average quality value at each base position of the reads is shown in FIG. 5, and the distribution of average sequence quality scores against the number of reads is shown in FIG. 6.
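The statistics behind FIG. 5 and FIG. 6 can be sketched in Python, assuming Phred+33 quality strings, as a per-position mean quality and a histogram of per-read average quality. This is an illustrative reimplementation with hypothetical function names, not the code used by MultiQC or any other tool named here.

```python
from collections import Counter
from typing import Dict, Iterable, List


def per_position_mean_quality(qualities: Iterable[str], offset: int = 33) -> List[float]:
    """Mean Phred score at each read position (reads may differ in length after trimming)."""
    totals: List[int] = []
    counts: List[int] = []
    for quality in qualities:
        for pos, ch in enumerate(quality):
            if pos == len(totals):  # first read reaching this position
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(ch) - offset
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


def mean_quality_histogram(qualities: Iterable[str], offset: int = 33) -> Dict[int, int]:
    """Number of reads at each per-read average Phred score (rounded down)."""
    hist: Counter = Counter()
    for quality in qualities:
        if quality:
            hist[sum(ord(ch) - offset for ch in quality) // len(quality)] += 1
    return dict(sorted(hist.items()))


quals = ["II#", "I5"]  # toy quality strings: I = Q40, 5 = Q20, # = Q2
print(per_position_mean_quality(quals))
print(mean_quality_histogram(quals))
```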
In one embodiment, the present application performs gene quantification: reads are aligned to the genome with HISAT2 (version 2.1.0), BAM files are sorted and indexed with SAMtools (version 1.9), BAM file quality control is performed with QualiMap (version 2.2.2), and gene quantification is performed with featureCounts (version 1.6.4). The reference genome version is GRCh38.primary_assembly.genome.fa.
TABLE 4 sequencing data statistics table
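The gene quantification steps described above can be laid out as command lines in Python. This sketch assumes paired-end reads, a prebuilt HISAT2 index of GRCh38.primary_assembly.genome.fa and a GTF annotation file, none of which is specified in the text; `quantification_commands` and all file names are hypothetical, and only common options of each tool are used.

```python
from pathlib import Path
from typing import List


def quantification_commands(sample: str, read1: str, read2: str, hisat2_index: str,
                            gtf: str, outdir: str = "results", threads: int = 8) -> List[List[str]]:
    """Command lines for align -> sort/index -> BAM QC -> gene counting (not executed here)."""
    out = Path(outdir)
    sam = str(out / f"{sample}.sam")
    bam = str(out / f"{sample}.sorted.bam")
    return [
        # HISAT2: align the cleaned paired-end reads to the GRCh38 index.
        ["hisat2", "-p", str(threads), "-x", hisat2_index, "-1", read1, "-2", read2, "-S", sam],
        # SAMtools: coordinate-sort, then index the sorted BAM.
        ["samtools", "sort", "-@", str(threads), "-o", bam, sam],
        ["samtools", "index", bam],
        # QualiMap: BAM-level quality control report.
        ["qualimap", "bamqc", "-bam", bam, "-outdir", str(out / f"{sample}_qualimap")],
        # featureCounts: gene-level counts; in the 1.6.x series, -p counts fragments for paired-end data.
        ["featureCounts", "-T", str(threads), "-p", "-a", gtf,
         "-o", str(out / f"{sample}.counts.txt"), bam],
    ]


for cmd in quantification_commands("S01", "S01_R1.clean.fq.gz", "S01_R2.clean.fq.gz",
                                   "index/grch38_primary", "annotation.gtf"):
    print(" ".join(cmd))
```

To actually run the pipeline, create the output directory first and pass each list to `subprocess.run(cmd, check=True)` so that a failing step stops the chain.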
The correlation of gene expression levels between samples is an important indicator of the reliability of the experiment and of whether the samples were reasonably selected. Biological replicates are necessary in biological experiments and serve two main purposes: first, to show that the experimental procedure is reproducible rather than incidental; second, to make subsequent differential gene analysis more reliable. The closer the correlation coefficient is to 1, the more similar the expression patterns of the samples. If the samples include biological replicates, the correlation coefficients between the replicates are generally required to be high. The ENCODE project recommends that the square of the Pearson correlation coefficient ($R^2$) be greater than 0.92 (under ideal sampling and experimental conditions); in a specific embodiment, $R^2$ between biological replicates is required to be at least 0.8. Intra-group and inter-group correlation coefficients are calculated from the count values of all genes/transcripts of the samples and drawn as a heat map, which visually displays sample differences between groups and replicate consistency within groups. The sample correlation heat map of this embodiment is shown in FIG. 7.
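The replicate check can be sketched with NumPy: compute the squared Pearson correlation between every pair of samples from the count matrix and flag same-group pairs below the 0.8 threshold given above. The function names and the genes-by-samples layout are assumptions; the $R^2$ matrix is the kind of matrix a correlation heat map such as FIG. 7 displays.

```python
from typing import List, Sequence, Tuple

import numpy as np


def pearson_r2_matrix(counts) -> np.ndarray:
    """counts: genes x samples matrix of count values; returns the samples x samples R^2 matrix."""
    r = np.corrcoef(np.asarray(counts, dtype=float), rowvar=False)  # columns are samples
    return r ** 2


def low_replicate_pairs(counts, groups: Sequence[str],
                        min_r2: float = 0.8) -> List[Tuple[int, int, float]]:
    """Same-group (replicate) sample pairs whose R^2 falls below min_r2."""
    r2 = pearson_r2_matrix(counts)
    flagged = []
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            if groups[i] == groups[j] and r2[i, j] < min_r2:
                flagged.append((i, j, float(r2[i, j])))
    return flagged


counts = [[10, 11, 100],
          [20, 21, 5],
          [30, 33, 60],
          [40, 39, 1]]  # 4 genes x 3 samples (toy values)
print(low_replicate_pairs(counts, ["A", "A", "A"]))
```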
In one embodiment, the present application performs differential expression analysis. Determining whether the expression level of a gene differs between samples is one of the core parts of transcriptome analysis. After the gene expression levels are obtained, the expression data can be analyzed statistically to screen for genes that differ significantly between samples. The analysis has two main steps: (1) the raw read counts are first normalized, mainly to correct for sequencing depth; (2) the hypothesis-test probability (P-value) is calculated with a statistical model. Differential expression analysis was performed using DESeq2 with the screening criteria pvalue < 0.05 and |log2FoldChange| > 1. Screening by these criteria yielded 362 genes differentially expressed between patients with congenital heart disease combined with mental retardation and patients with congenital heart disease with normal intellectual development, of which 145 were up-regulated and 217 were down-regulated; the volcano plot and heat map are shown in FIG. 8.
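The two steps above can be sketched with NumPy. `median_of_ratios_size_factors` reimplements the median-of-ratios normalization that DESeq2 uses for sequencing-depth correction; DESeq2's dispersion estimation and Wald test are not reproduced, so the P-values and log2 fold changes are assumed to come from DESeq2 itself. `select_degs` then applies the screening criteria pvalue < 0.05 and |log2FoldChange| > 1. All function names are hypothetical.

```python
from typing import Iterable, List, Tuple

import numpy as np


def median_of_ratios_size_factors(counts) -> np.ndarray:
    """Per-sample size factors by the median-of-ratios method (counts: genes x samples)."""
    counts = np.asarray(counts, dtype=float)
    usable = np.all(counts > 0, axis=1)                     # genes counted in every sample
    log_counts = np.log(counts[usable])
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)   # per-gene geometric mean, log scale
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))


def normalize(counts) -> np.ndarray:
    """Divide each sample's counts by its size factor to correct for sequencing depth."""
    counts = np.asarray(counts, dtype=float)
    return counts / median_of_ratios_size_factors(counts)


def select_degs(results: Iterable[Tuple[str, float, float]],
                p_cut: float = 0.05, lfc_cut: float = 1.0) -> Tuple[List[str], List[str]]:
    """results: (gene, log2FoldChange, pvalue) rows. Returns (up-regulated, down-regulated)."""
    up, down = [], []
    for gene, lfc, p in results:
        if p < p_cut and abs(lfc) > lfc_cut:
            (up if lfc > 0 else down).append(gene)
    return up, down


counts = [[10, 20], [100, 200], [5, 10], [0, 3]]  # toy data: sample 2 sequenced twice as deeply
print(median_of_ratios_size_factors(counts))
print(select_degs([("A", 2.0, 0.01), ("B", -1.5, 0.001), ("C", 0.5, 0.001), ("D", 3.0, 0.2)]))
```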
In one embodiment, the present application further performs functional enrichment analysis on the genes differentially expressed between patients with congenital heart disease combined with mental retardation and patients with congenital heart disease with normal intellectual development, using the DAVID database (https://david.ncifcrf.gov/tools.jsp) for GO functional enrichment and KEGG pathway enrichment analysis. The screening criterion was pvalue < 0.05; the enrichment results are shown in FIG. 9 and Table 5.
TABLE 5 KEGG enrichment results
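Enrichment tools of this kind score the over-representation of a GO term or KEGG pathway among the differentially expressed genes with a hypergeometric (one-sided Fisher exact) test; DAVID itself reports the EASE score, a conservative modification of that test, which is not reproduced here. A minimal Python sketch of the plain test, with a hypothetical function name:

```python
from math import comb


def enrichment_p_value(k: int, background: int, category: int, n_degs: int) -> float:
    """One-sided hypergeometric P(X >= k).

    background: annotated background genes (N); category: background genes in the
    GO term / KEGG pathway (K); n_degs: differentially expressed genes (n);
    k: differentially expressed genes that fall in the term.
    """
    total = comb(background, n_degs)
    tail = sum(comb(category, i) * comb(background - category, n_degs - i)
               for i in range(max(k, 0), min(category, n_degs) + 1))
    return tail / total


# Toy example: all 5 selected genes fall in a 5-gene term out of a 10-gene background.
print(enrichment_p_value(5, 10, 5, 5))  # equals 1/252
```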
In one embodiment, the present application screened the differentially expressed genes, including FKBP8, FAM118A and SPTB, between the two patient groups, and found that FKBP8, FAM118A and SPTB are significantly differentially expressed between patients with congenital heart disease combined with mental retardation and patients with congenital heart disease with normal intellectual development (see FIG. 10).
In one embodiment, the present application further verifies the diagnostic efficacy of the screened differentially expressed genes FKBP8, FAM118A and SPTB for predicting congenital heart disease combined with mental retardation. Receiver operating characteristic (ROC) analysis was performed with the R package "pROC" (version 1.15.0), and the area under the ROC curve (AUC), which ranges from 0 to 1, was calculated to evaluate the accuracy of each single differentially expressed gene, each two-gene combination and the three-gene combination in predicting congenital heart disease combined with mental retardation. When assessing a single gene, its expression level was analyzed directly, and the level at the point with the maximum Youden index was selected as the cutoff value. When assessing the two-gene and three-gene combinations, logistic regression was performed on the expression levels of the genes, the probability that each patient has congenital heart disease combined with mental retardation was calculated from the fitted regression model, probability thresholds were determined, and the accuracy, specificity, sensitivity and other metrics of each combination were calculated at the determined thresholds.
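The single-gene and combined-gene analyses above can be sketched in Python with NumPy. This is not pROC: `auc` uses the Mann-Whitney estimate, `youden_cutoff` scans midpoints between consecutive observed values (pROC's default thresholds are also midpoints) and treats higher scores as cases, and `fit_logistic` is a small gradient-descent logistic regression with a weak L2 penalty, an assumption added so the weights stay finite when the groups are perfectly separated. All names and the toy data are hypothetical.

```python
import numpy as np


def auc(scores, labels) -> float:
    """Mann-Whitney estimate of ROC AUC: P(case score > control score), ties count 1/2."""
    s, y = np.asarray(scores, dtype=float), np.asarray(labels, dtype=bool)
    diff = s[y][:, None] - s[~y][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximizing the Youden index.

    Samples with score > cutoff are called cases; candidate cutoffs are midpoints
    between consecutive distinct scores. Returns None if all scores are equal.
    """
    s, y = np.asarray(scores, dtype=float), np.asarray(labels, dtype=bool)
    values = np.unique(s)
    best = None
    for cut in (values[:-1] + values[1:]) / 2:
        called = s > cut
        sens, spec = called[y].mean(), (~called[~y]).mean()
        if best is None or sens + spec > best[1] + best[2]:
            best = (float(cut), float(sens), float(spec))
    return best


def fit_logistic(X, labels, l2=0.01, lr=0.1, n_iter=5000):
    """Weakly L2-penalized logistic regression by gradient descent; returns a probability function."""
    X, y = np.asarray(X, dtype=float), np.asarray(labels, dtype=float)
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant columns

    def design(A):
        A = (np.asarray(A, dtype=float) - mu) / sd     # standardize with training statistics
        return np.column_stack([np.ones(len(A)), A])   # prepend intercept column

    Z, w = design(X), np.zeros(X.shape[1] + 1)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ w))
        grad = Z.T @ (p - y) / len(y)
        grad[1:] += l2 * w[1:]                          # intercept is not penalized
        w -= lr * grad
    return lambda A: 1.0 / (1.0 + np.exp(-design(A) @ w))


# Toy data (not study data): two genes, both higher in cases.
y = [0, 0, 0, 1, 1, 1]
X = [[1, 0.5], [2, 1.0], [2.5, 0.8], [3, 2.0], [4, 2.5], [5, 3.0]]
prob = fit_logistic(X, y)(X)
print(auc(prob, y), youden_cutoff(prob, y))
```

For a gene that is lower in the mental retardation group (FKBP8 or SPTB), pass the negated expression values to `youden_cutoff` and negate the returned cutoff; samples below that cutoff are then called cases. For a gene combination, the fitted probabilities take the place of raw expression values.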
The verification results are shown in FIGS. 11-13. Each of the single genes FKBP8, FAM118A and SPTB showed high diagnostic efficacy for predicting congenital heart disease combined with mental retardation: FKBP8 had an AUC of 0.786, a sensitivity of 80%, a specificity of 71.4% and a cutoff value of 22311.000; FAM118A had an AUC of 0.829, a sensitivity of 80%, a specificity of 92.9% and a cutoff value of 715.500; and SPTB had an AUC of 0.800, a sensitivity of 60%, a specificity of 92.9% and a cutoff value of 15.000. Each two-gene combination also showed high diagnostic efficacy: FKBP8+FAM118A had an AUC of 0.957, a sensitivity of 80%, a specificity of 100% and a cutoff value of 0.682; FKBP8+SPTB had an AUC of 0.857, a sensitivity of 80%, a specificity of 92.9% and a cutoff value of 0.441; and FAM118A+SPTB had an AUC of 0.986, a sensitivity of 100%, a specificity of 92.9% and a cutoff value of 0.199. The three-gene combination FKBP8+FAM118A+SPTB had an AUC of 1.000, a sensitivity of 100%, a specificity of 100% and a cutoff value of 0.500.
These verification results show that FKBP8, FAM118A and/or SPTB have high diagnostic efficacy for predicting congenital heart disease combined with mental retardation, which demonstrates the effectiveness (high diagnostic efficacy, accuracy, sensitivity and specificity) of the prediction method provided by the present application; the method can therefore be used to effectively predict congenital heart disease combined with mental retardation.
S103, if the expression level of any one or more of the target genes FKBP8 and SPTB is lower than a threshold value and/or the expression level of FAM118A is higher than the threshold value, obtaining a classification result that the sample to be tested is a sample of congenital heart disease combined with mental retardation;
If the expression level of any one or more of the target genes FKBP8 and SPTB is higher than a threshold value and/or the expression level of FAM118A is lower than the threshold value, obtaining a classification result that the sample to be tested is not a sample of congenital heart disease combined with mental retardation.
In one embodiment, the classification result is obtained based on a prediction model, and the method for constructing the prediction model is as described above.
In one embodiment, as described above, the present application collected blood samples from patients with congenital heart disease combined with mental retardation and from patients with congenital heart disease with normal intellectual development, and performed sequencing analysis to obtain target gene expression data for these two clinical characteristics. The clinical information of the two patient groups is shown in Table 1. The target gene expression data are extracted and input into a machine learning model to construct the prediction model, thereby obtaining a constructed prediction model.
In one embodiment, the machine learning model comprises a machine learning model as previously described.
In one embodiment, the threshold value is obtained within the prediction model, i.e., it is part of the prediction model: once the prediction model has been determined by the construction method described above, the threshold value is also determined. Based on the determined threshold value, the prediction model predicts whether the sample to be tested is a sample of congenital heart disease combined with mental retardation. Specifically, if the expression level of any one or more of the target genes FKBP8 and SPTB is lower than the threshold value and/or the expression level of FAM118A is higher than the threshold value, the classification result is that the sample to be tested is a sample of congenital heart disease combined with mental retardation; if the expression level of any one or more of the target genes FKBP8 and SPTB is higher than the threshold value and/or the expression level of FAM118A is lower than the threshold value, the classification result is that the sample to be tested is not a sample of congenital heart disease combined with mental retardation.
Fig. 2 is a schematic diagram of a prediction system for congenital heart disease combined with mental retardation according to an embodiment of the invention, specifically, the prediction system includes:
A data acquisition unit, configured to obtain expression data of a target gene of a sample to be tested, wherein the target gene is any one or more of FKBP8, FAM118A and SPTB;
An analysis and prediction unit, configured to perform classification prediction based on the expression data of the target gene to obtain a classification result indicating whether the sample to be tested is a sample of congenital heart disease combined with mental retardation;
If the expression level of any one or more of the target genes FKBP8 and SPTB is lower than a threshold value and/or the expression level of FAM118A is higher than a threshold value, a classification result that the sample to be tested is a sample of congenital heart disease combined with mental retardation is obtained;
If the expression level of any one or more of the target genes FKBP8 and SPTB is higher than a threshold value and/or the expression level of FAM118A is lower than the threshold value, a classification result that the sample to be tested is not a sample of congenital heart disease combined with mental retardation is obtained;
The classification result is obtained based on a prediction model;
The prediction model is constructed by obtaining target gene expression data of training set samples and the clinical characteristics corresponding to the samples, wherein the clinical characteristics comprise patients with congenital heart disease combined with mental retardation and patients with congenital heart disease with normal intellectual development, extracting the target gene expression data from the training set, and inputting the target gene expression data into a machine learning model to construct the prediction model, thereby obtaining a constructed prediction model;
A result output unit, configured to output the classification result.
In one embodiment, the target gene expression data includes mRNA expression amount data of the target gene or protein expression amount data of the target gene;
the mRNA expression quantity data are mRNA expression quantity data obtained by a high-throughput sequencing technology, RT-PCR, qRT-PCR or in situ hybridization technology;
The protein expression amount data are obtained by an immunoblotting method, an immunohistochemical method, a mass spectrometry method or a co-immunoprecipitation method.
In one embodiment, the machine learning model includes a linear regression model, a logistic regression model, a random forest model, a Lasso regression model, a neural network model, a decision tree model, a perceptron model, a support vector machine model, and/or a naive bayes model.
In one embodiment, the prediction model predicts, based on a determined threshold value, whether the sample to be tested is a sample of congenital heart disease combined with mental retardation.
Fig. 3 is a schematic diagram of an electronic device for predicting congenital heart disease combined with mental retardation according to an embodiment of the invention. Specifically, the electronic device includes a memory and a processor; the memory is used to store program instructions, and the processor is used to call the program instructions and, when they are executed, to perform the steps of the prediction method for congenital heart disease combined with mental retardation described above.
An embodiment of the invention also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the prediction method for congenital heart disease combined with mental retardation described above are implemented.
An embodiment of the invention also provides the use of a reagent for detecting the expression level of the target genes FKBP8, FAM118A and/or SPTB in a sample to be tested in the preparation of a product for diagnosing congenital heart disease combined with mental retardation.
In one embodiment, the product comprises a detection kit, a detection chip or a detection reagent strip.
In one embodiment, the detection kit, detection chip or detection reagent strip comprises a reagent for detecting the expression level of the target genes FKBP8, FAM118A and/or SPTB in a test sample derived from a subject.
In one embodiment, the reagent comprises a reagent for detecting the mRNA expression level of FKBP8, FAM118A and/or SPTB in the test sample, a reagent for detecting the protein expression level of FKBP8, FAM118A and/or SPTB in the test sample, and/or a reagent for detecting the number of FKBP8, FAM118A and/or SPTB positive expressing cells in the test sample.
In one embodiment, the reagent for detecting the mRNA expression level of FKBP8, FAM118A and/or SPTB in the test sample comprises a primer for specifically amplifying FKBP8, FAM118A and/or SPTB, and/or a probe for specifically recognizing FKBP8, FAM118A and/or SPTB.
In one embodiment, the reagent for detecting the protein expression level of FKBP8, FAM118A and/or SPTB in the test sample comprises an antibody that specifically binds to a protein encoded by FKBP8, FAM118A and/or SPTB, a peptide that specifically binds to a protein encoded by FKBP8, FAM118A and/or SPTB, an aptamer that specifically binds to a protein encoded by FKBP8, FAM118A and/or SPTB, a small molecule compound that specifically binds to a protein encoded by FKBP8, FAM118A and/or SPTB, and/or an affinity protein that specifically binds to a protein encoded by FKBP8, FAM118A and/or SPTB.
In one embodiment, the reagent for detecting the number of FKBP8, FAM118A and/or SPTB-positive expressing cells in the test sample comprises a reagent for detecting the number of FKBP8, FAM118A and/or SPTB-positive expressing cells by an immunohistochemical assay.
In one embodiment, the term primer is used interchangeably with amplification primer and refers to a nucleic acid fragment of 5-100 nucleotides; preferably, the primer or amplification primer comprises 15-30 nucleotides and is capable of initiating an enzymatic reaction (e.g., an enzymatic amplification reaction). In a specific embodiment of the invention, the primer is a primer for specifically amplifying the gene FKBP8, FAM118A and/or SPTB.
In one embodiment, the probe refers to a molecule that binds to a specific sequence or subsequence or other portion of another molecule. In a specific embodiment of the present invention, the probe refers to a probe that specifically recognizes FKBP8, FAM118A and/or SPTB. Unless otherwise indicated, a probe generally refers to a polynucleotide probe that is capable of binding to another polynucleotide (often referred to as a target polynucleotide) by complementary base pairing. Depending on the stringency of the hybridization conditions, the probe is able to bind to a target polynucleotide that lacks complete sequence complementarity with the probe. Hybridization means include, but are not limited to, solution phase, solid phase, mixed phase or in situ hybridization assays. Exemplary probes in the present invention include gene-specific DNA oligonucleotide probes, such as microarray probes immobilized on a microarray substrate, quantitative nuclease protection test probes, probes attached to molecular barcodes, and probes immobilized on beads.
In one embodiment, the detection kit comprises an RT-PCR detection kit, an ELISA detection kit, a protein chip detection kit, a rapid detection kit, a DNA chip detection kit, an immunohistochemical detection kit, or an MRM (multiple reaction monitoring) detection kit.
In one embodiment, the detection kit may further comprise the elements necessary for reverse transcription polymerase chain reaction (RT-PCR). The RT-PCR detection kit comprises a pair of primers specific for the gene encoding the marker protein. Each primer is an oligonucleotide with a nucleic acid sequence specific for that gene and may be about 7-50 bp in length, more particularly about 10-39 bp. The kit may also comprise primers specific for the nucleic acid sequence of a control gene. Preferably, the RT-PCR detection kit further comprises test tubes or other suitable vessels, reaction buffers (at various pH values and magnesium concentrations), deoxynucleotides (dNTPs), enzymes (e.g., Taq polymerase and reverse transcriptase), deoxyribonuclease inhibitors, ribonuclease inhibitors, DEPC-treated water and sterile water.
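The text states that the kit may include primers for a control gene but does not describe how the control signal is used. One widely used option, assumed here for illustration rather than taken from the invention, is the 2^-ΔΔCt relative quantification method. In this method, the target gene's threshold cycle (Ct) is normalized to the control gene in both the test sample and a reference sample. The sketch assumes near-100% and equal amplification efficiency for both genes. The function name and the Ct values in the usage line are hypothetical.

```python
def relative_expression(ct_target_sample: float, ct_control_sample: float,
                        ct_target_reference: float, ct_control_reference: float) -> float:
    """Fold change of a target gene by the 2^-ΔΔCt method.

    Assumes near-100% amplification efficiency for both the target and
    the control gene, so each PCR cycle doubles the product.
    """
    delta_sample = ct_target_sample - ct_control_sample            # ΔCt, test sample
    delta_reference = ct_target_reference - ct_control_reference   # ΔCt, reference
    delta_delta = delta_sample - delta_reference                   # ΔΔCt
    return 2.0 ** (-delta_delta)


# Usage with made-up Ct values: the target crosses threshold 2 cycles earlier
# (relative to the control) in the test sample, i.e. about 4-fold higher.
print(relative_expression(24.0, 18.0, 26.0, 18.0))  # 4.0
```

If the amplification efficiencies differ noticeably between the two genes, an efficiency-corrected model would be more appropriate than this simple doubling assumption.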
In one embodiment, the detection kit may contain the elements necessary for performing a DNA chip assay. The DNA chip kit may comprise a substrate to which a gene, a cDNA or an oligonucleotide corresponding to a fragment thereof is bound, together with reagents, agents and enzymes for preparing fluorescently labeled probes. The substrate may also carry a control gene, or a cDNA or oligonucleotide corresponding to a fragment thereof.
In one embodiment, the detection kit disclosed herein may contain the elements necessary for performing an ELISA. The ELISA detection kit may comprise antibodies specific for the marker proteins (the proteins encoded by FKBP8, FAM118A and/or SPTB according to the invention). The antibodies have high selectivity and affinity for the marker proteins, do not cross-react with other proteins, and may be monoclonal, polyclonal or recombinant. The ELISA detection kit may also comprise antibodies specific for a control protein. In addition, it may comprise reagents for detecting the bound antibody, e.g., a labeled secondary antibody, a chromophore, an enzyme (e.g., conjugated to an antibody) and its substrate, or other substances capable of binding to the antibody.
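The text does not say how the signal from the bound antibody is converted into a protein amount. A common approach, assumed here for illustration, is to read the sample's optical density (OD) off a standard curve built from known concentrations. For brevity, the Python sketch below uses piecewise-linear interpolation in place of the four- or five-parameter logistic fits that many ELISA protocols use. The function name and the standard values are hypothetical.

```python
from bisect import bisect_left


def concentration_from_od(od: float, standards: list[tuple[float, float]]) -> float:
    """Estimate analyte concentration from an OD reading.

    `standards` holds (concentration, od) pairs from a dilution series.
    Uses piecewise-linear interpolation and assumes OD rises monotonically
    with concentration; readings outside the standard range raise
    ValueError instead of being extrapolated.
    """
    points = sorted(standards, key=lambda p: p[1])  # order by OD
    ods = [p[1] for p in points]
    if not ods[0] <= od <= ods[-1]:
        raise ValueError("OD outside the standard curve range")
    i = bisect_left(ods, od)
    if ods[i] == od:  # reading coincides with a standard
        return points[i][0]
    (c0, y0), (c1, y1) = points[i - 1], points[i]
    return c0 + (od - y0) * (c1 - c0) / (y1 - y0)


# Usage with a made-up standard series (concentration units are arbitrary).
standards = [(0.0, 0.05), (10.0, 0.25), (20.0, 0.45), (40.0, 0.85)]
print(round(concentration_from_od(0.35, standards), 2))  # 15.0
```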
In one embodiment, the detection chip, also referred to as a biochip or array, is a solid support bearing attached nucleic acid or peptide probes. The array typically comprises a plurality of different nucleic acid or peptide probes attached to the surface of a substrate at different, known locations. Such arrays, also known as microarrays, can typically be produced by mechanical synthesis methods or by light-directed synthesis methods that combine photolithography with solid-phase synthesis. The array may comprise a planar surface, or the probes may be carried on beads, gels, polymer surfaces, fibers such as optical fibers, glass or any other suitable substrate. The array may be packaged so that it can be used for diagnosis or other manipulation as a fully functional device. Microarrays are ordered arrays of hybridization array elements, such as polynucleotide probes (e.g., oligonucleotides) or binding agents (e.g., antibodies), on a substrate. The substrate may be a solid substrate, for example a glass or silica slide, beads or a fiber optic binder, or a semi-solid substrate, for example a nitrocellulose membrane. The nucleotide sequences may be DNA, RNA or any combination thereof.
In one embodiment, the detection chip comprises a gene chip and a protein chip. The gene chip comprises a solid-phase carrier and oligonucleotide probes immobilized on the carrier in an ordered arrangement. The oligonucleotide probes specifically correspond to part or all of the sequences of FKBP8, FAM118A and/or SPTB. The protein chip comprises a solid-phase carrier and antibodies or ligands, immobilized on the carrier, that are specific for the proteins encoded by FKBP8, FAM118A and/or SPTB.
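Because each probe sits at a known location on the carrier, reading a chip amounts to mapping spot positions back to genes. The Python sketch below averages replicate spots per gene and normalizes the result to a control spot. This is a minimal illustration under stated assumptions: the layout and intensity dictionaries, the `"CONTROL"` label and the mean-ratio normalization are hypothetical choices, not a procedure given in the text.

```python
from collections import defaultdict
from statistics import mean


def summarize_array(layout: dict[tuple[int, int], str],
                    intensities: dict[tuple[int, int], float],
                    control: str = "CONTROL") -> dict[str, float]:
    """Average spot intensities per gene and normalize to the control probe.

    layout: (row, col) spot position -> gene name of the probe at that spot.
    intensities: (row, col) -> measured fluorescence intensity.
    Returns {gene: mean gene intensity / mean control intensity}.
    Raises KeyError if no control spot is present in the layout.
    """
    by_gene: dict[str, list[float]] = defaultdict(list)
    for position, gene in layout.items():
        by_gene[gene].append(intensities[position])
    control_mean = mean(by_gene.pop(control))
    return {gene: mean(values) / control_mean for gene, values in by_gene.items()}


# Usage with a made-up 2x2 layout: FKBP8 spotted twice, SPTB once, one control.
layout = {(0, 0): "FKBP8", (0, 1): "FKBP8", (1, 0): "SPTB", (1, 1): "CONTROL"}
intensities = {(0, 0): 200.0, (0, 1): 400.0, (1, 0): 150.0, (1, 1): 100.0}
print(summarize_array(layout, intensities))  # {'FKBP8': 3.0, 'SPTB': 1.5}
```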
The results of this verification embodiment show that assigning an inherent weight to an indication may moderately improve the performance of the method relative to the default settings.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working procedures of the systems, apparatuses and units described above are the same as the corresponding procedures in the foregoing method embodiments and are not repeated herein.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or of another form.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units. That is, they may be located in one place or distributed across a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
Those of ordinary skill in the art will appreciate that all or part of the steps of the methods in the above embodiments may be implemented by a program instructing related hardware. The program may be stored in a computer-readable storage medium, which may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.
The computer device provided by the present invention has been described in detail above. However, those skilled in the art will appreciate that the foregoing description is not intended to limit the invention, the scope of which is defined by the appended claims.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411127026.2A (CN119069003A) | 2024-08-16 | 2024-08-16 | A prediction system and electronic device for congenital heart disease combined with intellectual disability |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN119069003A | 2024-12-03 |
Family ID: 93639975
Family Applications (1)
| Application Number | Status | Priority Date | Filing Date | Title |
|---|---|---|---|---|
| CN202411127026.2A (CN119069003A) | Pending | 2024-08-16 | 2024-08-16 | A prediction system and electronic device for congenital heart disease combined with intellectual disability |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN119069003A |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119068973A * | 2024-08-16 | 2024-12-03 | Dalian Women and Children's Medical Center (Group) [大连市妇女儿童医疗中心(集团)] | A prediction method, system, electronic device and storage medium for congenital heart disease combined with intellectual disability |
| CN120636531A * | 2025-06-03 | 2025-09-12 | Beijing Normal University [北京师范大学] | Developmental dynamic expression pattern prediction method, model, system and database structure based on gene characteristics |
- 2024-08-16: CN202411127026.2A filed in CN (CN119069003A), status: active, pending
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119068973A * | 2024-08-16 | 2024-12-03 | Dalian Women and Children's Medical Center (Group) [大连市妇女儿童医疗中心(集团)] | A prediction method, system, electronic device and storage medium for congenital heart disease combined with intellectual disability |
| CN119068973B * | 2024-08-16 | 2025-04-18 | Dalian Women and Children's Medical Center (Group) [大连市妇女儿童医疗中心(集团)] | A prediction method, system, electronic device and storage medium for congenital heart disease combined with intellectual disability |
| CN120636531A * | 2025-06-03 | 2025-09-12 | Beijing Normal University [北京师范大学] | Developmental dynamic expression pattern prediction method, model, system and database structure based on gene characteristics |
Legal Events
| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |