HK1239740B - Biomarkers for rheumatoid arthritis and usage thereof - Google Patents

Biomarkers for rheumatoid arthritis and usage thereof

Info

Publication number
HK1239740B
Authority
HK
Hong Kong
Prior art keywords
con
seq
nucleotide sequence
rheumatoid arthritis
biomarkers
Prior art date
Application number
HK17113274.1A
Other languages
Chinese (zh)
Other versions
HK1239740A1 (en
Inventor
冯强
张东亚
贾慧珏
王东辉
王俊
Original Assignee
深圳华大基因科技有限公司
深圳华大生命科学研究院
Filing date
Publication date
Application filed by 深圳华大基因科技有限公司, 深圳华大生命科学研究院
Publication of HK1239740A1
Publication of HK1239740B

Description

Biomarkers for rheumatoid arthritis and their uses

CROSS-REFERENCE TO RELATED APPLICATIONS

None

Technical Field

The present invention relates to the field of biomedicine, and in particular to biomarkers and methods for predicting the risk of microbiota-associated diseases, especially rheumatoid arthritis (RA).

Background Art

Rheumatoid arthritis (RA) is a debilitating autoimmune disease that affects tens of millions of people worldwide and increases mortality in patients through its cardiovascular and other systemic complications, yet its cause remains unclear. Infectious pathogens have long been implicated in RA. However, the identity and pathogenicity of RA-associated pathogens are largely unknown, and the recent recognition that the human body is a super-organism hosting trillions of beneficial and harmful microorganisms further complicates the problem. Although disease-modifying antirheumatic drugs (DMARDs) have successfully alleviated the condition of many RA patients, insufficient understanding of the factors that trigger or promote the disease has hindered the development of specific and more effective treatments. Investigations of the microbiota have also revealed probiotics that prevent or alleviate RA.

RA is believed to begin at some other body site and to remain latent for years before the onset of joint inflammation. The gut microbiota is a key environmental factor in human health, with established roles in obesity, diabetes, colon cancer, and other conditions. Besides its role in nutrition and xenobiotic metabolism, the microbiota of the distal gut interacts with the neuro-immune-endocrine system and the bloodstream to affect the whole body. The gut microbiota is stably associated with a given individual, which increases its value in disease-related studies. The heterogeneity of the gut microbiota across the population suggests that treatment should be individualized according to the gut microbiota, whose role in drug activation or inactivation, immune modulation, and so on remains largely unclear. Compared with the gut microbiota, the oral microbiota is relatively understudied; the Human Microbiome Project (HMP) sampled only about 100 healthy individuals for WGS (Human Microbiome Project Consortium. A framework for human microbiome research. Nature 486, 215–21 (2012), incorporated herein by reference). Although dental and salivary samples are easier to obtain in an outpatient setting than fecal samples, metagenomic analysis of the role of the oral microbiota in disease has been lacking. It is also unknown to what extent oral and gut microbial disease markers are concordant in their identity or function.

Summary of the Invention

The embodiments of the present disclosure seek to solve, at least to some extent, at least one of the problems existing in the prior art.

The present invention is based on the following findings of the inventors:

Assessment and characterization of the gut microbiota has become a major research area for human diseases, including rheumatoid arthritis (RA). To analyze the gut microbial content of RA patients, the inventors carried out a Metagenome-Wide Association Study (MGWAS) protocol (Qin, J. et al. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature 490, 55–60 (2012), incorporated herein by reference) based on deep shotgun sequencing of microbial DNA from 212 individuals. Starting from RA-associated gene markers, the inventors used random forest models to identify and validate gut, dental, and salivary marker panels (29 gut MLGs, 28 dental MLGs, and 19 salivary MLGs). To assess the risk of RA intuitively from these 29 gut MLGs, 28 dental MLGs, and 19 salivary MLGs, the inventors computed the probability of disease separately for each panel with a random forest model based on the relative abundance profiles of the MLG markers in the training set. The inventors' data provide insight into the features of the gut, dental, and salivary metagenomes associated with RA risk, serve as a paradigm for future studies of the pathophysiological role of these metagenomes in other related diseases, and suggest the potential usefulness of microbiota-based methods for assessing whether an individual is at risk of the disease.

The RA-associated gut microbiota (29 gut MLGs, 28 dental MLGs, and 19 salivary MLGs) is believed to be valuable for improving the detection of RA at an early stage, for the following reasons. First, the markers of the present invention are specific and sensitive. Second, fecal analysis offers accuracy, safety, affordability, and patient compliance, and fecal samples are transportable. Tests based on the polymerase chain reaction (PCR) are comfortable and non-invasive, so people are more likely to take part in a given screening program. Third, the markers of the present invention can also be used as a tool for therapeutic monitoring of RA patients, to detect the response to treatment.

In one aspect, a biomarker panel for predicting a microbiota-associated disease in a subject is provided. According to an embodiment of the present disclosure, the biomarker panel consists of gut biomarkers, dental biomarkers, salivary biomarkers, or microorganisms whose genomic DNA comprises at least part of the sequences of SEQ ID NOs: 1 to 9319, wherein

the gut biomarkers include Bifidobacterium dentium, RA-2633, Enterococcus sp., RA-781, Gordonibacter pamelaeae, RA-3396, RA-6638, RA-2441, RA-527, Clostridium sp., RA-2637, Citrobacter sp., Eubacterium sp., Citrobacter sp., RA-3215, Con-1722, Con-4360, Con-4212, Con-1261, Bifidobacterium bifidum, Klebsiella pneumoniae, Con-1423, Veillonella sp., Con-4095, Con-4103, Con-1735, Con-1710, Con-1832, Con-1170,

the dental biomarkers include RA-10848, RA-9842, RA-9941, RA-9938, RA-10684, RA-9998, Con-7913, Con-20702, Con-11, Con-8169, Con-1708, Con-7847, Con-5233, Con-791, Con-5566, Con-4455, Con-13169, Con-6088, Con-5554, Con-14781, Con-2466, Con-483, Con-2562, Con-4701, Con-4824, Con-5030, Con-757, Con-530, and

the salivary biomarkers include RA-27683, RA-9651, RA-13621, RA-27616, Con-6908, Con-305, Con-1559, Con-1374, Con-6746, Campylobacter rectus, Con-1141, Con-20, Streptococcus sp., Con-1238, Con-1073, Con-636, Con-1, Porphyromonas gingivalis, Lactococcus sp.,

or a microorganism whose genomic DNA comprises at least part of the sequences of SEQ ID NOs: 1 to 9319.

Optionally, the biomarker panel consists of at least one of the species listed in Table 2-2, preferably of at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 100% of the species listed in Table 2-2.

According to an embodiment of the present disclosure, the gut biomarkers include at least part of the sequences of SEQ ID NOs: 1 to 9319 as set out in Table 5.

According to an embodiment of the present disclosure, the gut biomarkers include Bifidobacterium dentium JCVIHMP022, Prevotella copri CB7, DSM 18205, Enterococcus faecium E980, Ruminococcus obeum A2-162, Gordonibacter pamelaeae 7-10-1-bT, DSM 19378, Ruminococcus bromii L2-63, Eubacterium ventriosum ATCC 27560, Klebsiella oxytoca KCTC 1686, Clostridium asparagiforme DSM 15981, Prevotella copri CB7, DSM 18205, Citrobacter freundii 4_7_47CFAA, Eubacterium sp. 3_1_31, Citrobacter sp. 30_2, Clostridium sp. 7_2_43FAA, Roseburia intestinalis M50/1, Dialister invisus DSM 15470, Bacteroides plebeius M12, DSM 17135, Bifidobacterium bifidum S17, Klebsiella pneumoniae NTUH-K2044, Veillonella sp. oral taxon 158F0412, Comamonas testosteroni KF-1, Klebsiella pneumoniae NTUH-K2044, Veillonella atypica ACS-134-V-Col7a, Streptococcus australis ATCC 700641, Parabacteroides merdae ATCC 43184,

the dental biomarkers include Actinomyces sp. oral taxon 180F0310, Rothia mucilaginosa DY-18, Actinomyces graevenitzii C83, Actinomyces odontolyticus ATCC 17982, Veillonella atypica ACS-134-V-Col7a, Actinomyces sp. F0384, Actinomyces sp. oral taxon 848F0332, Neisseria mucosa M26, ATCC 25996, Actinomyces sp. oral taxon 448F0400, Tannerella forsythensis ATCC 43037, Actinomyces sp. oral taxon 448F0400, Neisseria bacilliformis ATCC BAA-1200, Synergistetes bacterium SGP1, Lautropia mirabilis ATCC 51599, Capnocytophaga gingivalis ATCC 33624, Cardiobacterium hominis ATCC 15826, Capnocytophaga gingivalis ATCC 33624, Lautropia mirabilis ATCC 51599, Johnsonella ignava ATCC 51276, Propionibacterium freudenreichii shermanii CIRM-BIA1, Treponema denticola ATCC 35405, Fusobacterium sp. oral taxon 370F0437, Lautropia mirabilis ATCC 51599, Eikenella corrodens ATCC 23834, Selenomonas noxia ATCC 43541, Porphyromonas levii DSM 23370, Bulleidia extructa W1219,

the salivary biomarkers include Gemella haemolysans ATCC 10379, Veillonella atypica ACS-049-V-Sch6, Actinomyces odontolyticus ATCC 17982, Actinomyces odontolyticus ATCC 17982, Treponema denticola ATCC 35405, Actinomyces sp. oral taxon 448F0400, Treponema vincentii ATCC 35580, Streptococcus australis ATCC 700641, Campylobacter rectus RM3267, CCUG 20446, Actinomyces sp. oral taxon 171F0337, Treponema denticola ATCC 35405, Streptococcus sanguinis VMC66, Actinomyces sp. oral taxon 448F0400, Actinomyces sp. oral taxon 448F0400, Neisseria bacilliformis ATCC BAA-1200, Burkholderia mallei PRL-20, Porphyromonas gingivalis TDC60, Lactococcus lactis lactis KF147.

In another aspect of the present disclosure, a biomarker panel for predicting a microbiota-associated disease in a subject is provided. According to an embodiment of the present disclosure, the biomarker panel consists of gut biomarkers, dental biomarkers, and salivary biomarkers, wherein

the gut biomarkers include at least part of the sequences of SEQ ID NOs: 1 to 9319.

According to an embodiment of the present disclosure, the disease is rheumatoid arthritis or a related disease.

In another aspect of the present disclosure, a kit for determining the above gene marker panel is provided, comprising primers for PCR amplification designed according to the DNA sequences listed below:

the gut biomarkers include at least part of the sequences of SEQ ID NOs: 1 to 9319.

In another aspect of the present disclosure, a kit for determining the above gene marker panel is provided, comprising one or more probes designed according to the genes listed below: the gut biomarkers include at least part of the sequences of SEQ ID NOs: 1 to 9319.

In another aspect of the present disclosure, there is provided a use of the above gene marker panel for predicting the risk of rheumatoid arthritis or a related disease in a test subject, comprising:

(1) collecting a sample from the test subject;

(2) determining, in the sample obtained in step (1), the relative abundance information of each biomarker of the biomarker panel according to any one of claims 1 to 5;

(3) obtaining a probability of rheumatoid arthritis by comparing, with a multivariate statistical model, the relative abundance information of each biomarker of the test subject against a training data set,

wherein a probability of rheumatoid arthritis greater than a threshold indicates that the test subject has, or is at risk of developing, rheumatoid arthritis or a related disease.

According to an embodiment of the present disclosure, the training data set is constructed with a multivariate statistical model from the relative abundance information of each biomarker in a plurality of subjects with rheumatoid arthritis and a plurality of normal subjects; optionally, the multivariate statistical model is a random forest model.

According to an embodiment of the present disclosure, the training data set is a matrix in which each row represents a biomarker of the biomarker panel according to any one of claims 1 to 5, each column represents a sample, and each cell holds the relative abundance of that biomarker in that sample; the disease state of the samples is a vector in which 1 denotes rheumatoid arthritis and 0 denotes control.
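The matrix layout described above can be illustrated with a minimal Python sketch. The marker names, sample names, and abundance values below are hypothetical placeholders rather than values from the disclosure; only the orientation (rows are biomarkers, columns are samples, labels are 1 for RA and 0 for control) follows the text.

```python
# Hypothetical marker and sample identifiers; abundances are made up for illustration.
markers = ["MLG_A", "MLG_B", "MLG_C"]
samples = ["S1", "S2", "S3", "S4"]

# Rows are biomarkers, columns are samples, cells are relative abundances.
abundance = [
    [0.012, 0.000, 0.009, 0.001],  # MLG_A
    [0.000, 0.004, 0.000, 0.006],  # MLG_B
    [0.003, 0.003, 0.002, 0.004],  # MLG_C
]

# Disease state per sample (same order as the columns): 1 = RA, 0 = control.
status = [1, 0, 1, 0]


def check_shape(matrix, n_markers, n_samples, labels):
    """Return True when the matrix and label vector have consistent dimensions."""
    return (
        len(matrix) == n_markers
        and all(len(row) == n_samples for row in matrix)
        and len(labels) == n_samples
    )


def to_sample_rows(matrix):
    """Transpose so that each row is one sample's feature vector."""
    return [list(column) for column in zip(*matrix)]


def profile_of(sample_id):
    """Look up one sample's abundance profile as a marker -> value mapping."""
    j = samples.index(sample_id)
    return {marker: abundance[i][j] for i, marker in enumerate(markers)}
```

Many statistical libraries expect one row per sample, so a transposition step such as `to_sample_rows` is often needed before a model is fitted on a matrix stored in this orientation.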

According to an embodiment of the present disclosure, the relative abundance information of each of Bifidobacterium dentium, RA-2633, Enterococcus sp., RA-781, Gordonibacter pamelaeae, RA-3396, RA-6638, RA-2441, RA-527, Clostridium sp., RA-2637, Citrobacter sp., Eubacterium sp., Citrobacter sp., RA-3215, Con-1722, Con-4360, Con-4212, Con-1261, Bifidobacterium bifidum, Klebsiella pneumoniae, Con-1423, Veillonella sp., Con-4095, Con-4103, Con-1735, Con-1710, Con-1832, and Con-1170 (for example, that of Bifidobacterium dentium JCVIHMP022, Prevotella copri CB7, DSM 18205, Enterococcus faecium E980, Ruminococcus obeum A2-162, Gordonibacter pamelaeae 7-10-1-bT, DSM 19378, Ruminococcus bromii L2-63, Eubacterium ventriosum ATCC 27560, Klebsiella oxytoca KCTC 1686, Clostridium asparagiforme DSM 15981, Prevotella copri CB7, DSM 18205, Citrobacter freundii 4_7_47CFAA, Eubacterium sp. 3_1_31, Citrobacter sp. 30_2, Clostridium sp. 7_2_43FAA, Roseburia intestinalis M50/1, Dialister invisus DSM 15470, Bacteroides plebeius M12, DSM 17135, Bifidobacterium bifidum S17, Klebsiella pneumoniae NTUH-K2044, Veillonella sp. oral taxon 158F0412, Comamonas testosteroni KF-1, Klebsiella pneumoniae NTUH-K2044, Veillonella atypica ACS-134-V-Col7a, Streptococcus australis ATCC 700641, and Parabacteroides merdae ATCC 43184) is obtained from the relative abundance information of SEQ ID NOs: 1 to 9319.
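This passage states that each marker's relative abundance is derived from the relative abundances of the underlying sequences, but it does not say how the values are combined. The sketch below assumes, purely for illustration, that a marker's abundance is the mean of its member genes, with undetected genes counted as zero. The membership table, gene identifiers, and values are hypothetical.

```python
from statistics import mean

# Hypothetical marker-to-gene membership; in the source the genes are SEQ ID NOs.
mlg_members = {
    "MLG_X": ["gene_1", "gene_2", "gene_3"],
    "MLG_Y": ["gene_4", "gene_5"],
}

# Made-up per-gene relative abundances for one sample (gene_5 was not detected).
gene_abundance = {"gene_1": 0.002, "gene_2": 0.004, "gene_3": 0.006, "gene_4": 0.001}


def mlg_abundance(members, genes):
    """Average the member genes of each marker; undetected genes count as 0.

    The mean is an assumed aggregation rule, not one stated in the source.
    """
    return {
        mlg: mean(genes.get(gene_id, 0.0) for gene_id in gene_ids)
        for mlg, gene_ids in members.items()
    }


profile = mlg_abundance(mlg_members, gene_abundance)
```

A median or a trimmed mean could be substituted in `mlg_abundance` without changing the rest of the flow, if robustness to outlying genes is a concern.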

According to an embodiment of the present disclosure, the training data set is at least one of Table 8-1 and Table 8-2, and a probability of rheumatoid arthritis of at least 0.5 indicates that the test subject has, or is at risk of developing, rheumatoid arthritis or a related disease.
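The decision rule above (a probability of at least 0.5 indicates RA or RA risk) can be sketched as follows. In random forest classifiers the class probability is commonly estimated as the fraction of trees voting for that class; this sketch imitates that with a handful of hand-written one-split "trees". The marker names, cutoffs, and trees are hypothetical, and a real model would learn its trees from the training data set rather than have them written by hand.

```python
from typing import Callable, Dict, List

Profile = Dict[str, float]
Tree = Callable[[Profile], int]


def stump(marker: str, cutoff: float, ra_if_above: bool) -> Tree:
    """Build a one-split tree that votes 1 (RA) or 0 (control)."""
    def vote(profile: Profile) -> int:
        above = profile.get(marker, 0.0) > cutoff
        return int(above == ra_if_above)
    return vote


# Hypothetical forest: two markers, each probed at two cutoffs.
forest: List[Tree] = [
    stump("RA_enriched_marker", 0.005, True),
    stump("control_enriched_marker", 0.002, False),
    stump("RA_enriched_marker", 0.010, True),
    stump("control_enriched_marker", 0.004, False),
]


def ra_probability(profile: Profile, trees: List[Tree]) -> float:
    """Estimate the RA probability as the fraction of trees voting RA."""
    return sum(tree(profile) for tree in trees) / len(trees)


def classify(profile: Profile, trees: List[Tree], threshold: float = 0.5) -> bool:
    """Apply the 'at least threshold' rule from the text."""
    return ra_probability(profile, trees) >= threshold


subject = {"RA_enriched_marker": 0.008, "control_enriched_marker": 0.001}
control = {"RA_enriched_marker": 0.001, "control_enriched_marker": 0.006}
```

With these made-up inputs, three of the four trees vote RA for `subject`, so its estimated probability is 0.75 and it is classified as RA or at risk, while `control` receives no RA votes.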

In another aspect of the present disclosure, there is provided a use of the above gene markers in the preparation of a kit for predicting the risk of rheumatoid arthritis or a related disease in a test subject, comprising:

(1) collecting a sample from the test subject;

(2) determining, in the sample obtained in step (1), the relative abundance information of each biomarker of the biomarker panel according to any one of claims 1 to 5;

(3) obtaining a probability of rheumatoid arthritis by comparing, with a multivariate statistical model, the relative abundance information of each biomarker of the test subject against a training data set,

wherein a probability of rheumatoid arthritis greater than a threshold indicates that the test subject has, or is at risk of developing, rheumatoid arthritis or a related disease.

According to an embodiment of the present disclosure, the training data set is constructed with a multivariate statistical model from the relative abundance information of each biomarker in a plurality of subjects with rheumatoid arthritis and a plurality of normal subjects; optionally, the multivariate statistical model is a random forest model.

According to an embodiment of the present disclosure, the training data set is a matrix in which each row represents a biomarker of the biomarker panel according to any one of claims 1 to 5, each column represents a sample, and each cell holds the relative abundance of that biomarker in that sample; the disease state of the samples is a vector in which 1 denotes rheumatoid arthritis and 0 denotes control.

According to an embodiment of the present disclosure, the relative abundance information of each of Bifidobacterium dentium, RA-2633, Enterococcus sp., RA-781, Gordonibacter pamelaeae, RA-3396, RA-6638, RA-2441, RA-527, Clostridium sp., RA-2637, Citrobacter sp., Eubacterium sp., Citrobacter sp., RA-3215, Con-1722, Con-4360, Con-4212, Con-1261, Bifidobacterium bifidum, Klebsiella pneumoniae, Con-1423, Veillonella sp., Con-4095, Con-4103, Con-1735, Con-1710, Con-1832, and Con-1170 (for example, that of Bifidobacterium dentium JCVIHMP022, Prevotella copri CB7, DSM 18205, Enterococcus faecium E980, Ruminococcus obeum A2-162, Gordonibacter pamelaeae 7-10-1-bT, DSM 19378, Ruminococcus bromii L2-63, Eubacterium ventriosum ATCC 27560, Klebsiella oxytoca KCTC 1686, Clostridium asparagiforme DSM 15981, Prevotella copri CB7, DSM 18205, Citrobacter freundii 4_7_47CFAA, Eubacterium sp. 3_1_31, Citrobacter sp. 30_2, Clostridium sp. 7_2_43FAA, Roseburia intestinalis M50/1, Dialister invisus DSM 15470, Bacteroides plebeius M12, DSM 17135, Bifidobacterium bifidum S17, Klebsiella pneumoniae NTUH-K2044, Veillonella sp. oral taxon 158F0412, Comamonas testosteroni KF-1, Klebsiella pneumoniae NTUH-K2044, Veillonella atypica ACS-134-V-Col7a, Streptococcus australis ATCC 700641, and Parabacteroides merdae ATCC 43184) is obtained from the relative abundance information of SEQ ID NOs: 1 to 9319.

According to an embodiment of the present disclosure, the training data set is at least one of Table 8-1 and Table 8-2, and a probability of rheumatoid arthritis of at least 0.5 indicates that the test subject has, or is at risk of developing, rheumatoid arthritis or a related disease.

In another aspect of the present disclosure, there is provided a method for diagnosing whether a subject has, or is at risk of developing, a microbiota-associated abnormal state, comprising:

determining the relative abundance of the above biomarkers in a sample from the subject, and

determining, based on the relative abundance, whether the subject has, or is at risk of developing, a microbiota-associated abnormal state.

According to an embodiment of the present disclosure, the method comprises:

(1) collecting a sample from the test subject;

(2) determining, in the sample obtained in step (1), the relative abundance information of each biomarker of the biomarker panel according to any one of claims 1 to 5;

(3) obtaining a probability of rheumatoid arthritis by comparing, with a multivariate statistical model, the relative abundance information of each biomarker of the test subject against a training data set,

wherein a probability of rheumatoid arthritis greater than a threshold indicates that the test subject has, or is at risk of developing, rheumatoid arthritis or a related disease.

According to an embodiment of the present disclosure, the training data set is constructed with a multivariate statistical model from the relative abundance information of each biomarker in a plurality of subjects with rheumatoid arthritis and a plurality of normal subjects; optionally, the multivariate statistical model is a random forest model.

According to an embodiment of the present disclosure, the training data set is a matrix in which each row represents a biomarker of the biomarker panel according to any one of claims 1 to 5, each column represents a sample, and each cell holds the relative abundance of that biomarker in that sample; the disease state of the samples is a vector in which 1 denotes rheumatoid arthritis and 0 denotes control.

According to an embodiment of the present disclosure, the relative abundance information of each of Bifidobacterium dentium, RA-2633, Enterococcus sp., RA-781, Gordonibacter pamelaeae, RA-3396, RA-6638, RA-2441, RA-527, Clostridium sp., RA-2637, Citrobacter sp., Eubacterium sp., Citrobacter sp., RA-3215, Con-1722, Con-4360, Con-4212, Con-1261, Bifidobacterium bifidum, Klebsiella pneumoniae, Con-1423, Veillonella sp., Con-4095, Con-4103, Con-1735, Con-1710, Con-1832, and Con-1170 (for example, that of Bifidobacterium dentium JCVIHMP022, Prevotella copri CB7, DSM 18205, Enterococcus faecium E980, Ruminococcus obeum A2-162, Gordonibacter pamelaeae 7-10-1-bT, DSM 19378, Ruminococcus bromii L2-63, Eubacterium ventriosum ATCC 27560, Klebsiella oxytoca KCTC 1686, Clostridium asparagiforme DSM 15981, Prevotella copri CB7, DSM 18205, Citrobacter freundii 4_7_47CFAA, Eubacterium sp. 3_1_31, Citrobacter sp. 30_2, Clostridium sp. 7_2_43FAA, Roseburia intestinalis M50/1, Dialister invisus DSM 15470, Bacteroides plebeius M12, DSM 17135, Bifidobacterium bifidum S17, Klebsiella pneumoniae NTUH-K2044, Veillonella sp. oral taxon 158F0412, Comamonas testosteroni KF-1, Klebsiella pneumoniae NTUH-K2044, Veillonella atypica ACS-134-V-Col7a, Streptococcus australis ATCC 700641, and Parabacteroides merdae ATCC 43184) is obtained from the relative abundance information of SEQ ID NOs: 1 to 9319.

根据本公开的实施方式,训练数据集为表8-1和表8-2的至少之一,且类风湿性关节炎的概率为至少0.5表明待测受试者患有类风湿性关节炎或相关疾病或者有风险发展类风湿性关节炎或相关疾病。According to an embodiment of the present disclosure, the training data set is at least one of Table 8-1 and Table 8-2, and a probability of rheumatoid arthritis of at least 0.5 indicates that the subject to be tested suffers from rheumatoid arthritis or a related disease or is at risk of developing rheumatoid arthritis or a related disease.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

本公开的这些和其它的方面和优点从以下结合附图的描述中将变得明显和更容易理解,其中:These and other aspects and advantages of the present disclosure will become apparent and more readily understood from the following description taken in conjunction with the accompanying drawings, in which:

图1肠道或口腔MLG允许从健康对照中分类RA患者。(a,d,f)由未治疗的RA病例和无关的正常对照组成的粪便(a)、牙齿(d)和唾液(f)的训练集的ROC曲线(对于粪便、牙齿和唾液样本,分别为n=157,100,94)。圆点标记了最佳阈值概率的假阳性率和真阳性率。(b)对由彼此具有血缘关系或不具有血缘关系的17个对照和17个RA病例组成的粪便测试集进行分类。(c,e,g)对DMARD治疗后的粪便(c)、牙齿(e)和唾液(g)的RA样本进行分类(对于粪便、牙齿和唾液样本,分别为n=40,38,24)。根据欧洲抗风湿联盟(EULAR)标准,DAS28<2.6表明症状缓解。所有样本的分类结果列于表12。Figure 1. Gut or oral MLG allows classification of RA patients from healthy controls. (a, d, f) Receiver operating characteristic (ROC) curves for a training set of stool (a), teeth (d), and saliva (f) samples from untreated RA cases and unrelated normal controls (n = 157, 100, and 94 for stool, teeth, and saliva samples, respectively). Dots mark the false-positive and true-positive rates at the optimal threshold probability. (b) Classification of a stool test set consisting of 17 controls and 17 RA cases who were either related or unrelated. (c, e, g) Classification of stool (c), teeth (e), and saliva (g) samples of RA after DMARD treatment (n = 40, 38, and 24 for stool, teeth, and saliva samples, respectively). According to the European League Against Rheumatism (EULAR) criteria, a DAS28 < 2.6 indicates symptom remission. Classification results for all samples are listed in Table 12.

具体实施方式DETAILED DESCRIPTION

实施例Example

本文所使用的术语具有本发明相关领域的普通技术人员通常理解的含义。术语,如“一”、“一个”和“该”并非旨在仅指单数实体,而是包含采用具体实施方式来说明的一般类别。除了如在权利要求中概述的之外,本文中的术语用于描述本发明的具体实施方式,但是它们的用法不限制本发明。The terms used herein have the meanings commonly understood by persons of ordinary skill in the art to which this invention relates. Terms such as "a," "an," and "the" are not intended to refer to only singular entities, but rather encompass general categories that are described using specific embodiments. Except as outlined in the claims, the terms herein are used to describe specific embodiments of the invention, but their usage does not limit the invention.

实施方式Implementation Method

实施例1.鉴别和验证用于评估类风湿性关节炎风险的生物标记物Example 1. Identification and validation of biomarkers for assessing risk of rheumatoid arthritis

1.材料和方法1. Materials and Methods

1.1样本采集和DNA提取1.1 Sample collection and DNA extraction

本发明人采集了一共212名个体的粪便样本(表1-1,粪便样本、牙菌斑样本和唾液样本),包含训练集(n=157,77未治疗的RA病例和80名健康对照)和测试集(对于相关病例-对照对,n=34,即8个有血缘关系的病例-对照对和9个不具有血缘关系的病例-对照对;对于DMARD-治疗的RA患者,n=21)。The present inventors collected stool samples from a total of 212 individuals (Table 1-1, stool samples, dental plaque samples and saliva samples), including a training set (n = 157, 77 untreated RA cases and 80 healthy controls) and a test set (for related case-control pairs, n = 34, i.e., 8 related case-control pairs and 9 unrelated case-control pairs; for DMARD-treated RA patients, n = 21).

粪便样本是在北京协和医院采集,冷冻运输并如前所述在BGI-深圳(深圳华大基因)进行提取(Qin, J.等人. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature 490, 55–60 (2012),通过引用并入本文)。牙菌斑是用眼科镊子从牙齿表面刮取的直到具有3μl的体积。将样本转移至200μl含有10mM Tris、1mM EDTA、0.5%吐温20和200μg/ml蛋白酶K(Fermentas)的1×裂解缓冲液并在55℃下孵育2小时。在95℃下孵育10分钟终止裂解,并在运输前将样本冷冻在-80℃。按照针对粪便样本的方案进行DNA提取。对于唾液,将100μl唾液加入到100μl的2×裂解缓冲液中,擦拭后咽壁并加入到同一试管中,然后如牙齿样本一样对样品进行裂解和提取。Fecal samples were collected at Peking Union Medical College Hospital, transported frozen, and extracted at BGI-Shenzhen as previously described (Qin, J. et al. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature 490, 55–60 (2012), incorporated herein by reference). Dental plaque was scraped from the tooth surface with ophthalmic forceps until a volume of 3 μl was obtained. The sample was transferred to 200 μl of 1× lysis buffer containing 10 mM Tris, 1 mM EDTA, 0.5% Tween 20 and 200 μg/ml proteinase K (Fermentas) and incubated at 55°C for 2 hours. Lysis was terminated by incubation at 95°C for 10 minutes, and the samples were frozen at -80°C before transportation. DNA was extracted according to the protocol used for fecal samples. For saliva, 100 μl of saliva was added to 100 μl of 2× lysis buffer, a swab of the posterior pharyngeal wall was added to the same tube, and the sample was then lysed and extracted in the same way as the dental samples.

根据2010ACR/EULAR分类标准在北京协和医院对RA进行诊断。根据标准程序,在受试者到医院初诊时采集所有表型信息。招募18至65岁之间,疾病持续时间至少6周,至少1处关节肿胀和3处关节压痛的RA患者。如果患者具有慢性严重感染史、任何当前感染或任何类型的癌症,则将他们排除在外。将孕妇或哺乳期妇女排除在外。告知所有患者具有不孕的风险并将想要孩子的患者排除在外。尽管一些患者已经患RA多年,但他们是未用DMARD的,因为他们在就诊北京协和医院之前没有在当地医院被诊断患有RA,而且他们仅服用止痛药来缓解RA症状。RA was diagnosed at Peking Union Medical College Hospital according to the 2010 ACR/EULAR classification criteria. All phenotypic information was collected at the time of the subjects' initial visit to the hospital according to standard procedures. RA patients aged between 18 and 65 years with disease duration of at least 6 weeks, at least 1 swollen joint and 3 tender joints were recruited. Patients were excluded if they had a history of chronic serious infection, any current infection or any type of cancer. Pregnant or breastfeeding women were excluded. All patients were informed of the risk of infertility and patients who wanted to have children were excluded. Although some patients had suffered from RA for many years, they were DMARD-naive because they had not been diagnosed with RA at a local hospital before visiting Peking Union Medical College Hospital and they only took analgesics to relieve RA symptoms.

根据标准程序,在受试者到医院初诊时采集所有表型信息。212个用于肠道微生物基因目录构建的样本中仅有21个来自DMARD-治疗的患者的粪便样本且在这篇文章中没有进行分析。All phenotypic information was collected at the subjects' initial hospital visit according to standard procedures. Of the 212 samples used to construct the gut microbial gene catalog, only 21 were stool samples from DMARD-treated patients, and these 21 samples were not analyzed in this article.

这项研究得到了北京协和医院和深圳华大基因的机构审查委员会的批准。This study was approved by the institutional review boards of Peking Union Medical College Hospital and BGI-Shenzhen.

表1-1.用于基因目录构建的样本Table 1-1. Samples used for gene catalog construction

1.2宏基因组测序和组装1.2 Metagenomic Sequencing and Assembly

如前所述(Qin等人.2012,supra),在Illumina平台上进行双末端宏基因组测序(插入片段350bp,序列长度100bp),对测序读段进行质量控制并采用SOAPdenovo v2.04将测序读段重新组装成重叠群(Luo, R.等人. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1, 18 (2012),通过引用并入本文)。宿主污染的平均率对粪便样本来说为0.37%,对牙齿样本来说为5.55%,对唾液样本为40.85%。Paired-end metagenomic sequencing (insert size 350 bp, read length 100 bp) was performed on the Illumina platform as previously described (Qin et al. 2012, supra). The sequencing reads were quality controlled and assembled de novo into contigs using SOAPdenovo v2.04 (Luo, R. et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1, 18 (2012), incorporated herein by reference). The average rate of host contamination was 0.37% for fecal samples, 5.55% for dental samples, and 40.85% for saliva samples.

1.3基因目录构建1.3 Gene catalog construction

利用GeneMark v2.7d对经过组装的重叠群的基因进行预测。采用BLAT(Kent, W.J. BLAT--the BLAST-like alignment tool. Genome Res. 12, 656–64 (2002),通过引用并入本文)以90%重叠和95%同一性(不允许洞的存在)的阈值去除冗余基因,对于212个粪便样本(含有21个DMARD-治疗的样本)形成3,800,011个基因的非冗余基因目录,对于203个口腔样品(105个牙菌斑样本和98个唾液样本)形成3,234,997个基因的目录。利用BLAT(95%的同一性,90%重叠)将来自粪便样本的基因目录并入已有的包含430万个基因的肠道微生物参考目录中(Qin等人.2012,supra),形成包含590万个基因的最终目录。采用与出版的T2D论文(Qin等人,2012,同上)中相同的程序通过将高质量测序读段与肠道或口腔参考基因目录进行比对来确定基因的相对丰度。Genes were predicted from the assembled contigs using GeneMark v2.7d. Redundant genes were removed using BLAT (Kent, W.J. BLAT--the BLAST-like alignment tool. Genome Res. 12, 656–64 (2002), incorporated herein by reference) with thresholds of 90% overlap and 95% identity (no gaps allowed). This yielded a non-redundant catalog of 3,800,011 genes for the 212 stool samples (including 21 DMARD-treated samples) and a catalog of 3,234,997 genes for the 203 oral samples (105 dental plaque samples and 98 saliva samples). The gene catalog from the stool samples was merged into an existing gut microbial reference catalog of 4.3 million genes (Qin et al. 2012, supra) using BLAT (95% identity, 90% overlap), giving a final catalog of 5.9 million genes. The relative abundance of genes was determined by aligning high-quality sequencing reads to the gut or oral reference gene catalog, using the same procedure as in the published T2D paper (Qin et al., 2012, supra).

1.4分类注释和丰度计算1.4 Taxonomic annotation and abundance calculation

利用先前详述的内部流程(pipeline)(Qin等人,2012,同上)根据IMG数据库(v400)对预测基因进行分类分配,70%重叠和65%同一性分配至门,85%同一性分配至属,95%同一性分配至种。从分类群基因的相对丰度计算分类群的相对丰度。The predicted genes were assigned to taxonomies according to the IMG database (v400) using a previously described internal pipeline (Qin et al., 2012, supra), with 70% overlap and 65% identity assigned to phylum, 85% identity assigned to genus, and 95% identity assigned to species. The relative abundance of taxa was calculated from the relative abundance of taxa genes.
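The identity and overlap cut-offs above can be sketched as a small rule. This is an illustration under stated assumptions, not the in-house pipeline itself: the function names are hypothetical, the thresholds are treated as inclusive, the 70% overlap requirement is applied at every rank, and taxon abundance is taken to be the sum of the abundances of the genes assigned to that taxon.

```python
def assign_rank(identity, overlap):
    """Return the deepest rank supported by one alignment, or None.

    identity and overlap are percentages. Inclusive thresholds are an
    assumption; the source states only the cut-off values.
    """
    if overlap < 70.0:
        return None
    if identity >= 95.0:
        return "species"
    if identity >= 85.0:
        return "genus"
    if identity >= 65.0:
        return "phylum"
    return None


def taxon_abundance(gene_abundance, gene_taxon):
    """Sum gene relative abundances per taxon (summation is an assumption)."""
    totals = {}
    for gene, value in gene_abundance.items():
        taxon = gene_taxon.get(gene)
        if taxon is not None:
            totals[taxon] = totals.get(taxon, 0.0) + value
    return totals
```

Genes without a taxonomic assignment are simply skipped in `taxon_abundance`, so the per-taxon totals need not add up to 1.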

通过Wilcoxon秩和检验(其中p<0.05)确定患者和健康对照之间分类群的相对丰度的显著差异。Significant differences in the relative abundance of taxa between patients and healthy controls were determined by Wilcoxon rank sum test (with p < 0.05).
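For orientation, a rank-sum test of this kind can be sketched in a few lines. The version below is a simplified illustration that uses the normal approximation without tie or continuity correction; it is not the statistical software used in the study, and its p-values will differ slightly from those of an exact test on small samples. The function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_p_value(cases, controls):
    """Two-sided p-value from the normal approximation of the U statistic."""
    n1, n2 = len(cases), len(controls)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must contain at least one value")
    ranks = average_ranks(list(cases) + list(controls))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2.0))
```

A taxon would be flagged as differing between patients and controls when the returned value is below 0.05, the cut-off stated above.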

1.5宏基因组关联分析(MGWAS)1.5 Metagenome-wide association study (MGWAS)

对于粪便微生物群的病例-对照比较,去除在少于6个样本(n=157)中检测到的基因导致具有3,110,085个基因的集。83,858个基因在对照和病例之间在相对丰度方面显示出差异(p<0.01,Wilcoxon秩和检验,FDR=0.3285)。根据这些标记物基因在所有样本中的丰度变化将它们聚类成MLG(Qin等人,2012,同上)。对于构建牙齿MLG,从2,247,835个基因(存在于至少6个样本中,n=105)中选择209820个标记物基因(p<0.01,Wilcoxon秩和检验,FDR=0.072)。对于唾液MLG,本发明人从2,404,726个基因(存在于至少6个样本中,n=98)中选择206399个标记物基因(p<0.01,Wilcoxon秩和检验,FDR=0.088)。For case-control comparisons of fecal microbiota, removal of genes detected in less than 6 samples (n=157) resulted in a set of 3,110,085 genes. 83,858 genes showed differences in relative abundance between controls and cases (p<0.01, Wilcoxon rank sum test, FDR=0.3285). These marker genes were clustered into MLGs based on their abundance changes in all samples (Qin et al., 2012, supra). For the construction of tooth MLGs, 209,820 marker genes were selected from 2,247,835 genes (present in at least 6 samples, n=105) (p<0.01, Wilcoxon rank sum test, FDR=0.072). For salivary MLG, the present inventors selected 206,399 marker genes (p<0.01, Wilcoxon rank sum test, FDR=0.088) from 2,404,726 genes (present in at least 6 samples, n=98).
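The prevalence filter applied before the association test can be sketched as below. The sketch assumes that a gene counts as "detected" in a sample when its relative abundance is greater than zero; the source does not define detection more precisely, and the function name is hypothetical.

```python
def filter_by_prevalence(gene_profiles, min_samples=6):
    """Keep genes detected (abundance > 0) in at least min_samples samples.

    gene_profiles maps a gene identifier to its list of per-sample
    relative abundances.
    """
    kept = {}
    for gene, abundances in gene_profiles.items():
        detected = sum(1 for value in abundances if value > 0)
        if detected >= min_samples:
            kept[gene] = abundances
    return kept
```

With `min_samples=6`, a gene present in exactly 6 samples is retained and a gene present in 5 is dropped, matching the "fewer than 6 samples" wording above.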

如先前所述(Qin等人,2012,同上),根据分类学和它们的组成基因的相对丰度进行分类分配和丰度分析。简言之,分配到种需要将MLG中的超过90%的基因与种的基因组比对时,具有超过95%的同一性,70%的查询重叠。将MLG分配至属要求其超过80%的基因与基因组比对,其中在DNA和蛋白序列中具有85%的同一性。示出与从所有基因计算的与基因组的平均同一性仅用于参考。根据MLG在所有样本中的丰度之间的Kendall相关性而不管病例-对照状态将MLG进一步聚类,并且同现网络通过Cytoscape 3.0.2可视化。Taxonomic assignment and abundance analysis were performed as previously described (Qin et al., 2012, supra) based on taxonomy and the relative abundance of their constituent genes. Briefly, assignment to species required that more than 90% of the genes in the MLG had more than 95% identity when aligned to the species genome, with 70% query overlap. Assignment of an MLG to a genus required that more than 80% of its genes aligned to the genome, with 85% identity in both DNA and protein sequences. The average identity to the genome calculated from all genes is shown for reference only. The MLGs were further clustered based on the Kendall correlation between their abundance in all samples regardless of case-control status, and the co-occurrence network was visualized using Cytoscape 3.0.2.
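The species and genus rules for an MLG can be written as a short decision function. This is a sketch under assumptions: the input format (one tuple per MLG gene aligned to a single candidate genome) and the function name are hypothetical, and the handling of values exactly at a threshold is a guess, since the source gives only the cut-offs.

```python
def assign_mlg(alignments, n_genes):
    """Assign an MLG to 'species', 'genus' or None for one candidate genome.

    alignments: one (dna_identity, protein_identity, query_overlap) tuple,
    in percent, for each MLG gene that aligned to the genome.
    n_genes: total number of genes in the MLG.
    """
    if n_genes <= 0:
        raise ValueError("an MLG must contain at least one gene")
    species_hits = sum(
        1 for dna, _prot, ovl in alignments if dna > 95.0 and ovl >= 70.0
    )
    genus_hits = sum(
        1 for dna, prot, _ovl in alignments if dna >= 85.0 and prot >= 85.0
    )
    if species_hits / n_genes > 0.90:
        return "species"
    if genus_hits / n_genes > 0.80:
        return "genus"
    return None
```

The species rule is checked first, so an MLG that satisfies both rules is reported at the deeper rank.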

1.6基于MLG的分类器1.6 MLG-based classifier

利用训练群组(表1-2)的MLG丰度谱对随机森林模型(R 2.14,randomForest 4.6-7软件包)(Liaw, Andy & Wiener, Matthew. Classification and Regression by randomForest, R News (2002),第2/3期,第18页,通过引用并入本文)进行训练以选择MLG标记物的最佳集。在一个以上测试集上对该模型进行测试并计算预测误差。A random forest model (R 2.14, randomForest 4.6-7 package) (Liaw, Andy & Wiener, Matthew. Classification and Regression by randomForest, R News (2002), Vol. 2/3, p. 18, incorporated herein by reference) was trained on the MLG abundance profiles of the training cohort (Table 1-2) to select the optimal set of MLG markers. The model was then tested on one or more test sets and the prediction error was calculated.

关于随机森林模型,采用2.14版本的R中打包的“随机森林4.6-7软件包”,输入为训练数据集(即训练样本中选择的MLG的相对丰度谱)、样本疾病状态(训练样本的样本疾病状态为向量,1代表RA,0代表对照)和测试集(只是测试集中选择的MLG的相对丰度谱)。然后本发明人采用来自R软件的随机森林软件包的随机森林函数构建分类,并采用预测函数来预测测试集。输出为预测结果(患病概率,阈值为0.5,且如果患病概率≥0.5,则受试者有风险患有RA)。For the random forest model, the randomForest 4.6-7 package in R version 2.14 was used. The inputs were the training data set (i.e., the relative abundance profiles of the selected MLGs in the training samples), the sample disease status (a vector for the training samples in which 1 represents RA and 0 represents a control), and the test set (only the relative abundance profiles of the selected MLGs in the test samples). The inventors then built the classifier with the randomForest function of the randomForest package in R and used the predict function to predict the test set. The output was the prediction result (the probability of disease, with a threshold of 0.5: if the probability of disease is ≥0.5, the subject is at risk of having RA).
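The final decision step, applied to probabilities that a fitted model has already produced, can be sketched as follows. The sketch does not train a random forest; it only illustrates the 0.5 threshold and a prediction error computed as the fraction of misclassified samples, which is an assumed definition. The names `call_status` and `error_rate` are hypothetical.

```python
RA_THRESHOLD = 0.5  # threshold stated in the description


def call_status(probability, threshold=RA_THRESHOLD):
    """Return 1 (at risk of RA) if probability >= threshold, else 0 (control)."""
    return 1 if probability >= threshold else 0


def error_rate(probabilities, true_status, threshold=RA_THRESHOLD):
    """Fraction of samples whose thresholded call differs from the true status."""
    if not probabilities or len(probabilities) != len(true_status):
        raise ValueError("need one true status per predicted probability")
    wrong = sum(
        1
        for p, y in zip(probabilities, true_status)
        if call_status(p, threshold) != y
    )
    return wrong / len(true_status)
```

Because the comparison is `>=`, a sample with a predicted probability of exactly 0.5 is called at risk, in line with the "≥0.5" wording above.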

表1-2.训练集的样本信息(选自表1-1中的用于基因目录构建的样本)Table 1-2. Sample information of the training set (selected from the samples used for gene catalog construction in Table 1-1)

2.结果2. Results

基于微生物群的RA患者的鉴定和验证Microbiome-based identification and validation of RA patients

为了进一步说明RA相关的微生物群的诊断或预后价值,本发明人首先基于肠道MLG构建随机森林疾病分类器。采用来自对照和病例的85个肠道MLG标记物(至少100个基因)中的29个肠道MLG标记物的模型给出了训练集(n=157)(图1a、表2-1、表2-2、表5、表8-1、表8-2)中最低的预测误差和接受者操作特征(ROC)曲线下面积(AUC)为0.977。关于由具有血缘关系的病例-对照对和不具有血缘关系的病例-对照对(n=34,表1-3)组成的测试集,整体错误率为32%(图1b,表11)且AUC为0.706。因此,基于肠道MLG的模型对训练集和适用情况下对测试集的效能堪比或超过现有的基于RA血清标记物的分类器的效能(Van der Helm-van Mil, A.H.M. Risk estimation in rheumatoid arthritis-from bench to bedside. Nat. Rev. Rheumatol. (2014). doi:10.1038/nrrheum.2013.215,通过引用并入本文)。To further illustrate the diagnostic or prognostic value of the RA-associated microbiota, the present inventors first constructed a random forest disease classifier based on gut MLGs. A model using 29 of the 85 gut MLG markers (each with at least 100 genes) from controls and cases gave the lowest prediction error on the training set (n = 157), with an area under the receiver operating characteristic (ROC) curve (AUC) of 0.977 (Figure 1a, Table 2-1, Table 2-2, Table 5, Table 8-1, Table 8-2). On the test set consisting of related and unrelated case-control pairs (n = 34, Table 1-3), the overall error rate was 32% (Figure 1b, Table 11) and the AUC was 0.706. Thus, the performance of the gut MLG-based model on the training set and, where applicable, on the test set was comparable to or better than that of existing classifiers based on RA serum markers (Van der Helm-van Mil, A.H.M. Risk estimation in rheumatoid arthritis-from bench to bedside. Nat. Rev. Rheumatol. (2014). doi:10.1038/nrrheum.2013.215, incorporated herein by reference).
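For readers unfamiliar with the AUC figures quoted above, the quantity can be computed directly from predicted probabilities and true status as the probability that a randomly chosen case scores higher than a randomly chosen control. The sketch below uses this pairwise definition with ties counted as one half; it is an illustration with a hypothetical function name, not the software used to produce Figure 1.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of cases and controls.

    labels: 1 = RA case, 0 = control. Ties between a case and a control
    contribute 0.5.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))
```

An AUC of 1.0 means every case outranks every control, while 0.5 corresponds to a classifier that ranks cases and controls no better than chance.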

类似地,选自171个牙齿MLG(至少100个基因)的28个MLG(表3-1,表3-2,表6,表9-1,表9-2)在训练集中给出0.864的AUC(图1d)。选自142个唾液MLG(至少100个基因)的19个MLG(表4-1,表4-2,表7,表10-1,表10-2)给出0.898的AUC(图1f)。这些结果表明粪便、牙齿和唾液微生物标记物对诊断RA都非常有用。Similarly, 28 MLGs selected from 171 dental MLGs (at least 100 genes) (Table 3-1, Table 3-2, Table 6, Table 9-1, Table 9-2) gave an AUC of 0.864 in the training set (Figure 1d). 19 MLGs selected from 142 salivary MLGs (at least 100 genes) (Table 4-1, Table 4-2, Table 7, Table 10-1, Table 10-2) gave an AUC of 0.898 (Figure 1f). These results indicate that fecal, dental, and salivary microbial markers are all very useful for diagnosing RA.

此外,对经DMARD治疗的患者样本(表1-3)测试肠道和牙齿MLG分类器仍然将它们中的大部分鉴定为RA患者,而具有低疾病活性的牙齿样本(DAS28)更常被归类为健康的(图1c,1e,表12),说明牙齿微生物群如实地表明了DMARD治疗的效果。此外,来自经DMARD治疗的患者的唾液样本通常被分类为对照,可能是由于DMARD对唾液微生物群的直接调节(图1g,表12)。总之,结果表明肠道和口腔MLG可以区分有效和无效治疗并且促进对治疗策略的评估。Furthermore, when the gut and dental MLG classifiers were tested on samples from DMARD-treated patients (Table 1-3), most samples were still identified as RA, whereas dental samples from patients with low disease activity (as measured by DAS28) were more often classified as healthy (Figures 1c and 1e, Table 12), suggesting that the dental microbiota faithfully reflects the effect of DMARD treatment. In addition, saliva samples from DMARD-treated patients were usually classified as controls, possibly because DMARDs directly modulate the salivary microbiota (Figure 1g, Table 12). Taken together, these results suggest that gut and oral MLGs can distinguish effective from ineffective treatment and facilitate the evaluation of therapeutic strategies.

表1-3测试集的样本信息Table 1-3 Sample information of the test set

表5. 29个肠道最佳标记物的SEQ IDTable 5. SEQ IDs of 29 optimal intestinal markers

MLG ID         SEQ ID NO:    基因数 Number of genes
mlg_id:2441    1~159         159
mlg_id:4103    160~304       145
mlg_id:4212    305~709       405
mlg_id:1047    710~856       147
mlg_id:1735    857~1536      680
mlg_id:4360    1537~1646     110
mlg_id:1796    1647~1798     152
mlg_id:3396    1799~2071     273
mlg_id:2472    2072~2309     238
mlg_id:1261    2310~2991     682
mlg_id:1832    2992~3093     102
mlg_id:6638    3094~3214     121
mlg_id:1722    3215~3353     139
mlg_id:1423    3354~3455     102
mlg_id:1170    3456~3558     103
mlg_id:3215    3559~3739     181
mlg_id:4095    3740~4381     642
mlg_id:2637    4382~4754     373
mlg_id:905     4755~4885     131
mlg_id:4111    4886~6743     1858
mlg_id:1710    6744~6862     119
mlg_id:2633    6863~7113     251
mlg_id:819     7114~7425     312
mlg_id:4158    7426~7736     311
mlg_id:527     7737~7854     118
mlg_id:784     7855~8048     194
mlg_id:2473    8049~8758     710
mlg_id:781     8759~8869     111
mlg_id:5       8870~9319     450
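Because each MLG in Table 5 occupies a contiguous block of SEQ ID NOs, the gene count of a row equals last minus first plus one, and a sequence number can be mapped back to its MLG with a sorted-range lookup. The sketch below illustrates this with the first three rows of Table 5 only; the function names are hypothetical.

```python
import bisect

# (first SEQ ID NO, last SEQ ID NO, MLG ID): first three rows of Table 5.
RANGES = [(1, 159, 2441), (160, 304, 4103), (305, 709, 4212)]
STARTS = [first for first, _last, _mlg in RANGES]


def gene_count(first, last):
    """Number of sequences in an inclusive SEQ ID NO range."""
    return last - first + 1


def mlg_for_seq_id(seq_id):
    """Return the MLG ID whose range contains seq_id, or None if not covered."""
    index = bisect.bisect_right(STARTS, seq_id) - 1
    if index < 0:
        return None
    first, last, mlg = RANGES[index]
    return mlg if first <= seq_id <= last else None
```

For example, the range 305~709 gives 709 - 305 + 1 = 405 genes, which agrees with the gene count listed for mlg_id:4212.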

表6. 28个牙齿最佳标记物的SEQ IDTable 6. SEQ IDs of 28 optimal tooth markers

表7. 19个唾液最佳标记物的SEQ IDTable 7. SEQ IDs of 19 optimal salivary markers

MLG ID          SEQ ID NO:     基因数 Number of genes
mlg_id:1238     1~126          126
mlg_id:1559     127~231        105
mlg_id:6908     232~360        129
mlg_id:1141     361~519        159
mlg_id:6746     520~697        178
mlg_id:1        698~5680       4983
mlg_id:27683    5681~5851      171
mlg_id:1374     5852~6032      181
mlg_id:13       6033~8482      2450
mlg_id:1073     8483~9597      1115
mlg_id:29       9598~10469     872
mlg_id:636      10470~11246    777
mlg_id:9651     11247~11383    137
mlg_id:305      11384~11485    102
mlg_id:12       11486~14228    2743
mlg_id:20       14229~16239    2011
mlg_id:2831     16240~17605    1366
mlg_id:13621    17606~18115    510
mlg_id:27616    18116~9319     123

因此,本发明人给基于RA相关的基因标记物通过随机森林模型已经鉴别出并验证了标记物组(29个肠道MLG\28个牙齿MLG\19个唾液MLG)。并且本发明人已经构建出基于这些RA相关的肠道微生物群来评估RA疾病的风险的RA分类器。Therefore, the present inventors have identified and validated a marker panel (29 intestinal MLGs, 28 dental MLGs, and 19 salivary MLGs) based on RA-related gene markers using a random forest model. Furthermore, the present inventors have constructed an RA classifier that assesses the risk of RA disease based on these RA-related intestinal microbiota.

尽管已经示出和描述了示例性实施例,但是本领域技术人员应当理解,上述实施例不能被解释为限制本公开,并且可以在不脱离本公开的精神、原理和范围的情况下对实施例进行改变、替换和修改。Although exemplary embodiments have been shown and described, those skilled in the art should understand that the above embodiments should not be construed as limiting the present disclosure, and that changes, substitutions, and modifications may be made to the embodiments without departing from the spirit, principle, and scope of the present disclosure.

Claims (10)

1.一种用于确定生物标记物组的试剂盒,所述试剂盒中包括针对所述生物标记物组的探针和/或引物,所述生物标记物组由肠道生物标记物组成,所述肠道生物标记物包括齿双歧杆菌(Bifidobacterium dentium)、RA-2633、肠球菌属(Enterococcus sp.)、RA-781、Gordonibacter pamelaeae、RA-3396、RA-6638、RA-2441、RA-527、梭状芽孢杆菌属(Clostridium sp.)、RA-2637、柠檬酸杆菌属(Citrobacter sp.)、真杆菌属(Eubacterium sp.)、柠檬酸杆菌属、RA-3215、Con-1722、Con-4360、Con-4212、Con-1261、两歧双歧杆菌(Bifidobacterium bifidum)、肺炎克雷伯菌(Klebsiella pneumoniae)、Con-1423、韦荣氏球菌属(Veillonella sp.)、Con-4095、Con-4103、Con-1735、Con-1710、Con-1832和Con-1170,所述肠道微生物标记物由包括SEQ ID NO:1~SEQ ID NO:9319的序列组成,其中,RA-2633的核苷酸序列如SEQ ID NO: 6863~7113所示,RA-781的核苷酸序列如SEQ ID NO: 8759~8869所示,RA-3396的核苷酸序列如SEQ ID NO: 1799~2071所示,RA-6638的核苷酸序列如SEQ ID NO: 3094~3214所示,RA-2441的核苷酸序列如SEQ ID NO: 1~159所示,RA-527的核苷酸序列如SEQ ID NO: 7737~7854所示,RA-2637的核苷酸序列如SEQ ID NO: 4382~4754所示,RA-3215的核苷酸序列如SEQ ID NO: 3559~3739所示,Con-1722的核苷酸序列如SEQ ID NO: 3215~3353所示,Con-4360的核苷酸序列如SEQ ID NO: 1537~1646所示,Con-4212的核苷酸序列如SEQ ID NO: 305~709所示,Con-1261的核苷酸序列如SEQ ID NO: 2310~2991所示,Con-1423的核苷酸序列如SEQ ID NO: 3354~3455所示,Con-4095的核苷酸序列如SEQ ID NO: 3740~4381所示,Con-4103的核苷酸序列如SEQ ID NO: 160~304所示,Con-1735的核苷酸序列如SEQ ID NO: 857~1536所示,Con-1710的核苷酸序列如SEQ ID NO: 6744~6862所示,Con-1832的核苷酸序列如SEQ ID NO: 2992~3093所示,Con-1170的核苷酸序列如SEQ ID NO: 3456~3558所示。1. A kit for determining a biomarker panel, the kit comprising probes and/or primers for the biomarker panel, the biomarker panel consisting of intestinal biomarkers, the intestinal biomarkers comprising Bifidobacterium dentium, RA-2633, Enterococcus sp., RA-781, Gordonibacter pamelaeae, RA-3396, RA-6638, RA-2441, RA-527, Clostridium sp., RA-2637, Citrobacter sp., Eubacterium sp., Citrobacter sp., RA-3215, Con-1722, Con-4360, Con-4212, Con-1261, Bifidobacterium bifidum, Klebsiella pneumoniae, Con-1423, Veillonella sp., Con-4095, Con-4103, Con-1735, Con-1710, Con-1832 and Con-1170, the intestinal microbial markers consisting of sequences comprising SEQ ID NO: 1 to SEQ ID NO: 9319, wherein the nucleotide sequences of RA-2633 are shown in SEQ ID NOs: 6863~7113, those of RA-781 in SEQ ID NOs: 8759~8869, those of RA-3396 in SEQ ID NOs: 1799~2071, those of RA-6638 in SEQ ID NOs: 3094~3214, those of RA-2441 in SEQ ID NOs: 1~159, those of RA-527 in SEQ ID NOs: 7737~7854, those of RA-2637 in SEQ ID NOs: 4382~4754, those of RA-3215 in SEQ ID NOs: 3559~3739, those of Con-1722 in SEQ ID NOs: 3215~3353, those of Con-4360 in SEQ ID NOs: 1537~1646, those of Con-4212 in SEQ ID NOs: 305~709, those of Con-1261 in SEQ ID NOs: 2310~2991, those of Con-1423 in SEQ ID NOs: 3354~3455, those of Con-4095 in SEQ ID NOs: 3740~4381, those of Con-4103 in SEQ ID NOs: 160~304, those of Con-1735 in SEQ ID NOs: 857~1536, those of Con-1710 in SEQ ID NOs: 6744~6862, those of Con-1832 in SEQ ID NOs: 2992~3093, and those of Con-1170 in SEQ ID NOs: 3456~3558.

2.根据权利要求1所述的试剂盒,其中所述肠道生物标记物包括齿双歧杆菌JCVIHMP022、普氏菌(Prevotella copri)CB7、DSM 18205、屎肠球菌(Enterococcus faecium)E980、卵形瘤胃球菌(Ruminococcus obeum)A2-162、Gordonibacter pamelaeae 7-10-1-bT、DSM 19378、布氏瘤胃球菌(Ruminococcus bromii)L2-63、凸腹真杆菌(Eubacterium ventriosum)ATCC 27560、产酸克雷伯菌(Klebsiella oxytoca)KCTC 1686、Clostridium asparagiforme DSM 15981、弗氏柠檬酸杆菌(Citrobacter freundii)4_7_47CFAA、真杆菌属(Eubacterium sp.)3_1_31、柠檬酸杆菌属(Citrobacter sp.)30_2、梭状芽孢杆菌属(Clostridium sp.)7_2_43FAA、罗氏弧菌(Roseburia intestinalis)M50/1、Dialister invisus DSM 15470、Bacteroides plebeius M12、DSM 17135、两歧双歧杆菌(Bifidobacterium bifidum)S17、肺炎克雷伯菌(Klebsiella pneumoniae)NTUH-K2044、韦荣氏球菌属口腔分类群(Veillonella sp. oral taxon 158)F0412、睾丸酮丛毛单胞菌(Comamonas testosteroni)KF-1、非典型韦荣球菌(Veillonella atypica)ACS-134-V-Col7a、澳大利亚链球菌(Streptococcus australis)ATCC 700641、Parabacteroides merdae ATCC 43184。2. The kit according to claim 1, wherein the intestinal biomarkers comprise Bifidobacterium dentium JCVIHMP022, Prevotella copri CB7, DSM 18205, Enterococcus faecium E980, Ruminococcus obeum A2-162, Gordonibacter pamelaeae 7-10-1-bT, DSM 19378, Ruminococcus bromii L2-63, Eubacterium ventriosum ATCC 27560, Klebsiella oxytoca KCTC 1686, Clostridium asparagiforme DSM 15981, Citrobacter freundii 4_7_47CFAA, Eubacterium sp. 3_1_31, Citrobacter sp. 30_2, Clostridium sp. 7_2_43FAA, Roseburia intestinalis M50/1, Dialister invisus DSM 15470, Bacteroides plebeius M12, DSM 17135, Bifidobacterium bifidum S17, Klebsiella pneumoniae NTUH-K2044, Veillonella sp. oral taxon 158 F0412, Comamonas testosteroni KF-1, Veillonella atypica ACS-134-V-Col7a, Streptococcus australis ATCC 700641 and Parabacteroides merdae ATCC 43184.

3.一种用于确定生物标记物组的试剂盒,包括用于PCR扩增和根据如在权利要求1中所述的肠道生物标记物设计的引物组。3. A kit for determining a biomarker panel, comprising a primer set for PCR amplification designed according to the intestinal biomarkers as defined in claim 1.

4.一种用于确定生物标记物组的试剂盒,包括一种以上根据如在权利要求1中所述的肠道生物标记物设计的探针组。4. A kit for determining a biomarker panel, comprising one or more probe sets designed according to the intestinal biomarkers as defined in claim 1.

5.生物标记物组在制备用于预测待测受试者类风湿性关节炎的风险的试剂盒的用途,所述生物标记物组由肠道生物标记物组成,所述肠道生物标记物包括齿双歧杆菌(Bifidobacterium dentium)、RA-2633、肠球菌属(Enterococcus sp.)、RA-781、Gordonibacter pamelaeae、RA-3396、RA-6638、RA-2441、RA-527、梭状芽孢杆菌属(Clostridium sp.)、RA-2637、柠檬酸杆菌属(Citrobacter sp.)、真杆菌属(Eubacterium sp.)、柠檬酸杆菌属、RA-3215、Con-1722、Con-4360、Con-4212、Con-1261、两歧双歧杆菌(Bifidobacterium bifidum)、肺炎克雷伯菌(Klebsiella pneumoniae)、Con-1423、韦荣氏球菌属(Veillonella sp.)、Con-4095、Con-4103、Con-1735、Con-1710、Con-1832和Con-1170,所述肠道微生物标记物由包括SEQ ID NO:1~SEQ ID NO:9319的序列组成,其中,RA-2633的核苷酸序列如SEQ ID NO: 6863~7113所示,RA-781的核苷酸序列如SEQ ID NO: 8759~8869所示,RA-3396的核苷酸序列如SEQ ID NO: 1799~2071所示,RA-6638的核苷酸序列如SEQ ID NO: 3094~3214所示,RA-2441的核苷酸序列如SEQ ID NO: 1~159所示,RA-527的核苷酸序列如SEQ ID NO: 7737~7854所示,RA-2637的核苷酸序列如SEQ ID NO: 4382~4754所示,RA-3215的核苷酸序列如SEQ ID NO: 3559~3739所示,Con-1722的核苷酸序列如SEQ ID NO: 3215~3353所示,Con-4360的核苷酸序列如SEQ ID NO: 1537~1646所示,Con-4212的核苷酸序列如SEQ ID NO: 305~709所示,Con-1261的核苷酸序列如SEQ ID NO: 2310~2991所示,Con-1423的核苷酸序列如SEQ ID NO: 3354~3455所示,Con-4095的核苷酸序列如SEQ ID NO: 3740~4381所示,Con-4103的核苷酸序列如SEQ ID NO: 160~304所示,Con-1735的核苷酸序列如SEQ ID NO: 857~1536所示,Con-1710的核苷酸序列如SEQ ID NO: 6744~6862所示,Con-1832的核苷酸序列如SEQ ID NO: 2992~3093所示,Con-1170的核苷酸序列如SEQ ID NO: 3456~3558所示;所述用途包括:(1)从所述待测受试者采集样本;(2)确定步骤(1)中获得的所述样本中所述的生物标记物组的各个生物标记物的相对丰度信息;(3)通过采用多元统计模型将待测受试者的各个生物标记物的所述相对丰度信息与训练数据集进行比较获得类风湿性关节炎的概率,其中所述类风湿性关节炎的概率大于阈值表明所述待测受试者患有所述类风湿性关节炎或者有风险发展所述类风湿性关节炎。5. Use of a biomarker panel in the preparation of a kit for predicting the risk of rheumatoid arthritis in a subject to be tested, the biomarker panel consisting of intestinal biomarkers, the intestinal biomarkers comprising Bifidobacterium dentium, RA-2633, Enterococcus sp., RA-781, Gordonibacter pamelaeae, RA-3396, RA-6638, RA-2441, RA-527, Clostridium sp., RA-2637, Citrobacter sp., Eubacterium sp., Citrobacter sp., RA-3215, Con-1722, Con-4360, Con-4212, Con-1261, Bifidobacterium bifidum, Klebsiella pneumoniae, Con-1423, Veillonella sp., Con-4095, Con-4103, Con-1735, Con-1710, Con-1832 and Con-1170, the intestinal microbial markers consisting of sequences comprising SEQ ID NO: 1 to SEQ ID NO: 9319, wherein the nucleotide sequences of RA-2633 are shown in SEQ ID NOs: 6863~7113, those of RA-781 in SEQ ID NOs: 8759~8869, those of RA-3396 in SEQ ID NOs: 1799~2071, those of RA-6638 in SEQ ID NOs: 3094~3214, those of RA-2441 in SEQ ID NOs: 1~159, those of RA-527 in SEQ ID NOs: 7737~7854, those of RA-2637 in SEQ ID NOs: 4382~4754, those of RA-3215 in SEQ ID NOs: 3559~3739, those of Con-1722 in SEQ ID NOs: 3215~3353, those of Con-4360 in SEQ ID NOs: 1537~1646, those of Con-4212 in SEQ ID NOs: 305~709, those of Con-1261 in SEQ ID NOs: 2310~2991, those of Con-1423 in SEQ ID NOs: 3354~3455, those of Con-4095 in SEQ ID NOs: 3740~4381, those of Con-4103 in SEQ ID NOs: 160~304, those of Con-1735 in SEQ ID NOs: 857~1536, those of Con-1710 in SEQ ID NOs: 6744~6862, those of Con-1832 in SEQ ID NOs: 2992~3093, and those of Con-1170 in SEQ ID NOs: 3456~3558; the use comprising: (1) collecting a sample from the subject to be tested; (2) determining the relative abundance information of each biomarker of the biomarker panel in the sample obtained in step (1); and (3) obtaining a probability of rheumatoid arthritis by comparing the relative abundance information of each biomarker of the subject to be tested with a training data set using a multivariate statistical model, wherein a probability of rheumatoid arthritis greater than a threshold indicates that the subject to be tested has rheumatoid arthritis or is at risk of developing rheumatoid arthritis.

6.根据权利要求5所述的用途,其中所述训练数据集是采用所述多元统计模型基于多个患有类风湿性关节炎的受试者和多个正常受试者的各个生物标记物的相对丰度信息构建的。6. The use according to claim 5, wherein the training data set is constructed using the multivariate statistical model based on the relative abundance information of each biomarker in a plurality of subjects with rheumatoid arthritis and a plurality of normal subjects.

7.根据权利要求6所述的用途,其特征在于,所述多元统计模型为随机森林模型。7. The use according to claim 6, wherein the multivariate statistical model is a random forest model.

8.根据权利要求6所述的用途,其中所述训练数据集为矩阵,其中各行代表根据权利要求1至3中任一项所述的生物标记物组的各个生物标记物,各列代表样本,各个单元代表所述样本中的生物标记物的相对丰度谱,且样本疾病状态为向量,其中1表示类风湿性关节炎且0表示对照。8. The use according to claim 6, wherein the training data set is a matrix in which each row represents one biomarker of the biomarker panel according to any one of claims 1 to 3, each column represents a sample, and each cell represents the relative abundance profile of the biomarker in the sample, and the sample disease status is a vector in which 1 represents rheumatoid arthritis and 0 represents a control.

9.根据权利要求8所述的用途,其中齿双歧杆菌(Bifidobacterium dentium)、RA-2633、肠球菌属(Enterococcus sp.)、RA-781、Gordonibacter pamelaeae、RA-3396、RA-6638、RA-2441、RA-527、梭状芽孢杆菌属(Clostridium sp.)、RA-2637、柠檬酸杆菌属(Citrobacter sp.)、真杆菌属(Eubacterium sp.)、柠檬酸杆菌属、RA-3215、Con-1722、Con-4360、Con-4212、Con-1261、两歧双歧杆菌(Bifidobacterium bifidum)、肺炎克雷伯菌(Klebsiella pneumoniae)、Con-1423、韦荣氏球菌属(Veillonella sp.)、Con-4095、Con-4103、Con-1735、Con-1710、Con-1832和Con-1170中的每一个的相对丰度信息是根据SEQ ID NO:1~SEQ ID NO:9319的相对丰度信息获得的。9. The use according to claim 8, wherein the relative abundance information of each of Bifidobacterium dentium, RA-2633, Enterococcus sp., RA-781, Gordonibacter pamelaeae, RA-3396, RA-6638, RA-2441, RA-527, Clostridium sp., RA-2637, Citrobacter sp., Eubacterium sp., Citrobacter sp., RA-3215, Con-1722, Con-4360, Con-4212, Con-1261, Bifidobacterium bifidum, Klebsiella pneumoniae, Con-1423, Veillonella sp., Con-4095, Con-4103, Con-1735, Con-1710, Con-1832 and Con-1170 is obtained from the relative abundance information of SEQ ID NO: 1 to SEQ ID NO: 9319.

10.根据权利要求6所述的用途,其中所述训练数据集为表8-1和表8-2的至少之一,且所述类风湿性关节炎的概率为至少0.5表明所述待测受试者患有所述类风湿性关节炎或者有风险发展所述类风湿性关节炎。10. The use according to claim 6, wherein the training data set is at least one of Table 8-1 and Table 8-2, and a probability of rheumatoid arthritis of at least 0.5 indicates that the subject to be tested has rheumatoid arthritis or is at risk of developing rheumatoid arthritis.
HK17113274.1A 2014-09-30 Biomarkers for rheumatoid arthritis and usage thereof HK1239740B (en)

Publications (2)

Publication Number Publication Date
HK1239740A1 HK1239740A1 (en) 2018-05-11
HK1239740B true HK1239740B (en) 2022-03-18
