AT527861A1

AT527861A1 - DETERMINING A CANCER RISK SCORE

Info

Publication number: AT527861A1
Application number: ATA51055/2023A
Authority: AT
Inventors: Strasser Patrick; Schneider Maximilian
Original assignee: Strasser Patrick; Schneider Maximilian
Priority date: 2023-12-22
Filing date: 2023-12-22
Publication date: 2025-07-15
Also published as: WO2025133230A1

Abstract

A computer-implemented method for determining a cancer risk score for an individual comprises: - obtaining 1H NMR spectroscopy data (300a-c) for a biofluid sample of an individual at a computing device (100), wherein the 1H NMR spectroscopy data (300a-c) is indicative of an NMR signal intensity as a function of chemical shift; - evaluating the 1H NMR spectroscopy data (300a-c) with at least one trained machine learning model (114) of the computing device (100) with respect to a molecular profile (310a-c) of the biofluid sample, wherein the molecular profile includes all hydrogen peaks in the 1H NMR spectroscopy data that can be assigned to one or more of a metabolite, a protein, an amino acid, a micromolecule, and a macromolecule contained in the biofluid sample; - classifying the molecular profile (310a-c), based on the evaluation with the at least one trained machine learning model (114), into at least a first class and a second class of molecular profiles, wherein the first class is representative of molecular profiles (310a) associated with healthy individuals and the second class is representative of molecular profiles (310b, c) associated with individuals who have a cancerous and/or precancerous disease; - determining, based on the classification of the molecular profile, a cancer risk score that indicates a probability of cancer occurring in the individual.

Description

Our Ref.: W07717 AT/CHP

DETERMINING A CANCER RISK SCORE

TECHNICAL FIELD

[0001] The present invention relates generally to the field of computer-implemented oncology. In particular, the present invention relates to a computer-implemented method for determining a cancer risk for an individual. The present invention further relates to a computing device configured to carry out the method, a computer program instructing the computing device to carry out the method, and a computer-readable medium storing such a computer program.

BACKGROUND

[0002] Cancerous diseases, or cancer in general, affect millions of people and are among the leading causes of death worldwide. The most common or widespread cancers are breast, lung, colon and rectum, and prostate cancer, but almost any organ of the human body can be affected by cancer, such as the liver, pancreas, kidneys, and others. Typically, the mortality rate from cancer can be reduced if the cancer is detected and treated early. When cancer is detected early, it is more likely to respond to treatment and less likely to spread to other tissue, which can lead to a higher probability of survival with lower morbidity, as well as to less costly treatment.

[0003] In many cases, cancer, cancerous diseases, or precancerous conditions can be detected or diagnosed based on laboratory test results of a biofluid sample of an individual or patient, such as blood, urine, or cerebrospinal fluid (CSF). Currently established and widely used methods for analyzing biofluid samples include, for example, blood chemistry tests, which measure the amount of certain substances in a blood sample, and complete blood count tests, which measure the number of red and white blood cells and platelets in a blood sample. Other cancer detection or analysis methods include tumor marker tests, which measure substances produced by cancer cells or by other body cells in response to cancer, and urinalysis, which describes the color and content of a urine sample. Other approaches use liquid biopsy in combination with next-generation sequencing technologies for cancer detection.

[0004] In recent years, many developments have also been made toward computer-implemented or automated methods for detecting or assessing cancer, for example to enable data-driven and/or individualized patient care based on laboratory test results of a biofluid sample of an individual, as described above. In general, artificial intelligence (AI) and machine learning (ML), a subset of AI that enables computers to learn from training data, have proven to be useful tools in oncology and can demonstrably improve the accuracy of medical care and patient outcomes.

[0005] However, the AI- and ML-based methods currently used to detect cancer based on measurements of biofluid samples usually focus on specific substances present in a biofluid sample and are therefore usually limited to the detection of specific types of cancer. Furthermore, with at least some of the cancer detection methods currently in use, early detection of cancerous diseases or even precancerous conditions is hardly possible because, for example, the corresponding cancer signal may not be detectable in the early stages of cancer.

SUMMARY

[0006] It may therefore be desirable to provide an improved computer-implemented method and a corresponding computing device for determining a cancer risk that indicates the probability of cancer occurring in an individual. The method and computing device described herein may, in particular, allow or enable early detection of a cancerous disease (cancer) and/or a precancerous disease (precancerous condition).

[0007] This is achieved by the subject matter of the independent claims, with further embodiments being included in the dependent claims and the following description.

[0008] Aspects of the present disclosure relate to a computer-implemented method for determining a cancer risk score for an individual, to a computing device configured to perform a method for determining a cancer risk score for an individual, to a corresponding computer program, and to a computer-readable medium on which such a computer program is stored. Any disclosure presented herein with reference to one aspect of the present disclosure also applies to any other aspect of the present disclosure.

[0009] In one aspect of the present disclosure, a computer-implemented method for determining, calculating, and/or estimating a cancer risk score for an individual is provided. The method comprises:

- obtaining hydrogen-1 nuclear magnetic resonance (1H NMR) spectroscopy data for a biofluid sample of an individual at a computing device, wherein the 1H NMR spectroscopy data is indicative of an NMR signal intensity as a function of chemical shift;

- evaluating the 1H NMR spectroscopy data with at least one trained machine learning model of the computing device with respect to a molecular profile of the biofluid sample, wherein the molecular profile includes all hydrogen peaks in the 1H NMR spectroscopy data that can be associated with and/or are associated with one or more of a metabolite, a protein, an amino acid, a micromolecule, and a macromolecule contained in the biofluid sample;

- classifying the molecular profile, based on the evaluation with the at least one trained machine learning model, into at least a first class and a second class of molecular profiles, wherein the first class is representative of molecular profiles associated with healthy individuals and the second class is representative of molecular profiles associated with individuals who have a cancerous and/or precancerous disease; and

- determining, based on the classification of the molecular profile, a cancer risk score that indicates a probability of cancer occurring in the individual.

[0010] Cancer can affect the overall molecular composition of one or more biofluids of an individual, such as the composition of blood, urine, cerebrospinal fluid, and other biofluids. This influence may include changes in the proportion of specific proteins, metabolites, micromolecules, and/or macromolecules that are either released directly by the cancer cells or whose release is directly or indirectly induced by them.

[0011] The method described here uses highly detailed 1H NMR (nuclear magnetic resonance) spectroscopy data, also called high-frequency NMR spectroscopy data or proton NMR spectroscopy data, obtained from a biofluid sample of a patient or individual, in combination with machine learning (ML) to analyze and/or evaluate the molecular profile of the biofluid sample, for example to detect the above-mentioned cancer-induced changes in the biofluid sample. The 1H NMR spectroscopy data and/or the molecular profile contained therein can provide information about the chemical environment of each hydrogen atom (1H) in the biofluid sample. This means that, depending on the structure of the molecule containing the hydrogen atom, a different chemical environment may be present, which changes the chemical shift of that specific hydrogen atom. For these reasons, high-frequency NMR or 1H NMR spectroscopy data of a biofluid sample can provide an extremely high information density that can be used to distinguish between healthy individuals and patients with a cancerous and/or precancerous disease.

[0012] Precancerous diseases generally refer to health conditions or lesions that have an increased risk of developing into cancer but are not yet classified as cancer. The accompanying cellular changes are often abnormal but not completely uncontrolled, as is the case with cancer. An example of a precancerous condition is dysplasia of the cervix, known as cervical intraepithelial neoplasia (CIN), which can lead to cervical cancer. An example of a cancerous disease is invasive cervical cancer, in which the cancer cells divide uncontrollably and can invade the surrounding tissue.
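The four steps of the method in [0009] can be sketched in code. The following is a minimal illustration only and not the claimed implementation: the `Spectrum` type, the function names, the toy logistic model, its weights, and the 0.5 threshold are all hypothetical, and a real system would use a model trained on reference spectra.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Spectrum:
    """1H NMR signal intensity sampled as a function of chemical shift."""
    ppm: Sequence[float]        # chemical shift axis, in ppm
    intensity: Sequence[float]  # signal intensity at each ppm value


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


@dataclass(frozen=True)
class ToyModel:
    """Stand-in for a trained model: one weight per spectral point (hypothetical)."""
    weights: Sequence[float]
    bias: float

    def evaluate(self, spectrum: Spectrum) -> float:
        # Step 2: evaluate the whole profile, i.e. every point of the spectrum.
        if len(spectrum.intensity) != len(self.weights):
            raise ValueError("spectrum length does not match the model")
        z = self.bias + sum(w * x for w, x in zip(self.weights, spectrum.intensity))
        return sigmoid(z)  # probability of the second class


def classify(p_second_class: float, threshold: float = 0.5) -> str:
    # Step 3: first class = healthy, second class = cancerous/precancerous.
    return "second_class" if p_second_class >= threshold else "first_class"


def cancer_risk_score(model: ToyModel, spectrum: Spectrum) -> tuple[str, float]:
    # Step 4: in this sketch the score is the model's second-class probability.
    p = model.evaluate(spectrum)
    return classify(p), p


# Step 1: obtain the data (hard-coded here; all values are made up).
spectrum = Spectrum(ppm=[4.0, 3.0, 2.0, 1.0], intensity=[0.2, 1.5, 0.3, 0.9])
model = ToyModel(weights=[0.5, 1.0, -0.5, 0.2], bias=-1.0)
label, score = cancer_risk_score(model, spectrum)
print(label, round(score, 3))
```

Using the class probability directly as the score is only one possible choice; the text leaves the exact mapping from classification to score open.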

[0013] Accordingly, the term precancerous disease in the context of the present disclosure may refer to diseases that are either direct precursors of cancer, such as IPMN, a tumor (growth) of the pancreas that often later develops into cancer, or to diseases that increase the risk of cancer, such as chronic pancreatitis.

[0014] While currently used cancer detection methods are generally limited to individual signals that can be attributed to a specific tumor marker or to specific signals in the respective measurement, such as a specific molecule, metabolite, or lipoprotein, the method according to the present disclosure takes into account all signatures in the 1H NMR spectroscopy data. In this way, the entire molecular profile of an individual's biofluid sample can be considered, analyzed, and evaluated, enabling reliable discrimination between healthy individuals and patients or individuals with cancer.
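To make the contrast in [0014] concrete, the sketch below compares a single-marker readout with a feature vector built from the whole spectrum. Uniform binning of the ppm axis is used here as an assumption: it is a common way to turn a spectrum into features, but the source does not specify any particular feature extraction. The function names, bin width, marker window, and data are hypothetical.

```python
from typing import Sequence


def bin_spectrum(ppm: Sequence[float], intensity: Sequence[float],
                 lo: float, hi: float, width: float) -> list[float]:
    """Sum intensities into uniform ppm bins covering [lo, hi)."""
    n_bins = round((hi - lo) / width)
    bins = [0.0] * n_bins
    for shift, value in zip(ppm, intensity):
        if lo <= shift < hi:
            # Clamp the index to guard against floating-point edge cases.
            idx = min(int((shift - lo) / width), n_bins - 1)
            bins[idx] += value
    return bins


def single_marker(ppm: Sequence[float], intensity: Sequence[float],
                  center: float, half_width: float) -> float:
    """Integrate only a narrow window around one expected peak."""
    return sum(v for s, v in zip(ppm, intensity) if abs(s - center) <= half_width)


# Made-up spectrum with a handful of points.
ppm = [0.9, 1.3, 1.35, 3.2, 5.2, 7.4]
intensity = [2.0, 5.0, 4.0, 1.0, 0.5, 0.2]

marker = single_marker(ppm, intensity, center=1.33, half_width=0.05)
features = bin_spectrum(ppm, intensity, lo=0.0, hi=10.0, width=1.0)
print("single marker:", marker)
print("whole-spectrum features:", features)
```

The single-marker value discards every signal outside its window, whereas the binned vector keeps a contribution from each region of the spectrum and can be passed to a classifier as a whole.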

[0015] Due to the high information content of the 1H NMR spectroscopy data and the fact that cancer can affect the molecular profile and metabolism of the individual at a very early stage, various changes in the molecular profile that are caused by or associated with cancer can be detected at an early stage of a cancerous disease or even a precancerous disease, thus enabling early detection and treatment.

[0016] Furthermore, the method described here is not limited to a specific type of cancer or cancerous disease, but can advantageously be used to detect various types of cancer and/or even to distinguish between different types of cancer, cancerous diseases, and/or precancerous conditions.

[0017] Furthermore, the computer-implemented method described herein can be considered or used as a screening method that can be integrated into a normal or routine examination of patients or individuals, e.g., when a biofluid sample is taken from the individual during a routine checkup at a healthcare provider or in a hospital. In addition, 1H NMR is well suited to high-throughput screening approaches, is relatively inexpensive, and has high reproducibility. Therefore, the method described here can assist healthcare providers in the early detection or identification of individuals at risk for cancer and/or precancerous conditions. As a result, the method described here can enable reliable and accurate risk stratification of an individual with respect to having a cancerous disease, having a precancerous disease, and/or being healthy or not suffering from cancer. This, in turn, can enable better and more efficient use of healthcare resources as well as cost savings.

[0018] In the context of the present disclosure, the term "individual" may refer generally to a vertebrate, including animals and humans. The term "individual" can be used here synonymously or interchangeably with subject or patient.

[0019] The 1H NMR spectroscopy data used herein may refer to data acquired and/or generated using a high-frequency NMR spectrometer, e.g., at a frequency of about 500 MHz or higher, preferably at 600 MHz or higher. The 1H NMR spectroscopy data, also referred to as a 1H NMR spectrum, may be acquired, for example, using a pulse sequence similar to CPMGPR1D.
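The spectrometer frequency matters because chemical shift is a frequency offset normalized by the operating frequency: a difference of 1 ppm at a spectrometer frequency of f MHz corresponds to f Hz. The short sketch below works this out for the 500 MHz and 600 MHz cases mentioned above; the function name and the 0.01 ppm peak spacing are illustrative assumptions.

```python
def ppm_to_hz(delta_ppm: float, spectrometer_mhz: float) -> float:
    """Frequency offset in Hz for a chemical-shift difference given in ppm."""
    # 1 ppm of a carrier frequency of f MHz equals f Hz.
    return delta_ppm * spectrometer_mhz


# Separation in Hz of two peaks that are 0.01 ppm apart.
separation_hz = {mhz: ppm_to_hz(0.01, mhz) for mhz in (500.0, 600.0)}
for mhz, hz in separation_hz.items():
    print(f"{mhz:.0f} MHz: {hz:.2f} Hz")
```

The same ppm difference therefore spans more hertz at higher field, which is one reason closely spaced signals are easier to separate on higher-frequency instruments.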
[0020] In general, the 1H NMR spectroscopy data may include or indicate the free induction decay (FID), which may be measured as raw 1H NMR spectroscopy data in a 1H NMR spectroscopy measurement of the biofluid sample. The measured FID can optionally be converted into a spectrum by Fourier transformation, and further optionally referenced and/or converted into a chemical shift in parts per million (ppm), for example, based on normalization to a reference peak in the spectrum, such as the lactate peak or the anomeric D-glucose doublet. Accordingly, the term "1H NMR spectroscopy data indicative of an NMR signal intensity as a function of chemical shift" includes both 1H NMR spectroscopy data expressed as a function of chemical shift and 1H NMR spectroscopy data expressed as an FID or as another quantity from which the NMR signal intensity as a function of chemical shift can be derived.

[0021] As used herein, obtaining the 1H NMR spectroscopy data may comprise accessing and/or retrieving the 1H NMR spectroscopy data. For example, the 1H NMR spectroscopy data may be accessed and/or retrieved in at least one memory or data storage of the computing device performing the method of the present disclosure, from a memory or data storage of another computing device, and/or from a remote data storage (a database, a further storage, a cloud storage, or the like). Accordingly, retrieving the 1H NMR spectroscopy data may optionally comprise downloading the 1H NMR spectroscopy data from an external computing device or data storage.

[0022] Additionally or alternatively, obtaining the 1H NMR spectroscopy data may comprise receiving the 1H NMR spectroscopy data, e.g., from a computing device other than the computing device accessing the data. Accordingly, obtaining the 1H NMR spectroscopy data may comprise one or more of the following steps: receiving the 1H NMR spectroscopy data, storing the 1H NMR spectroscopy data in the memory or data storage of the computing device, and retrieving the 1H NMR spectroscopy data by the computing device.

[0023] As used herein, evaluating the 1H NMR spectroscopy data with the at least one trained machine learning model of the computing device with respect to the molecular profile of the biofluid sample may comprise processing and/or analyzing the 1H NMR spectroscopy data with the aid of the at least one trained ML model with respect to all hydrogen peaks contained in the 1H NMR spectroscopy data. Such evaluation of the 1H NMR spectroscopy data with respect to the molecular profile may comprise evaluating and/or analyzing the molecular profile of the biofluid sample with respect to one or more molecular reference profiles of one or more reference individuals. For example, the at least one ML model can be trained based on 1H NMR spectroscopy reference data of one or more reference individuals.
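The optional referencing step mentioned in paragraph [0020] can be illustrated with a minimal sketch. It assumes the spectrum is already available as frequency offsets in Hz with matching intensities; the function names `hz_to_ppm` and `reference_axis`, the search window, and the reference position of 1.33 ppm are illustrative assumptions and are not taken from the disclosure. Picking the tallest point in a window is a simplification of locating a reference peak such as a doublet.

```python
def hz_to_ppm(offsets_hz, spectrometer_mhz):
    """Convert frequency offsets (Hz) to chemical shift (ppm).

    By definition, 1 ppm corresponds to spectrometer_mhz Hz,
    so the conversion is a plain division.
    """
    return [offset / spectrometer_mhz for offset in offsets_hz]


def reference_axis(ppm_axis, intensities, search_window, reference_ppm):
    """Shift the ppm axis so the tallest point inside search_window
    lands at reference_ppm (hypothetical helper, simplified peak picking)."""
    low, high = search_window
    candidates = [i for i, ppm in enumerate(ppm_axis) if low <= ppm <= high]
    if not candidates:
        raise ValueError("no data points inside the search window")
    peak_index = max(candidates, key=lambda i: intensities[i])
    offset = reference_ppm - ppm_axis[peak_index]
    return [ppm + offset for ppm in ppm_axis]


# Toy data: five points around an assumed reference peak, 600 MHz spectrometer.
offsets_hz = [780.0, 790.0, 800.0, 810.0, 820.0]
intensities = [0.1, 0.4, 2.0, 0.5, 0.1]

ppm = hz_to_ppm(offsets_hz, spectrometer_mhz=600.0)
# 1.33 ppm is an assumed, illustrative reference position.
referenced = reference_axis(ppm, intensities,
                            search_window=(1.2, 1.5), reference_ppm=1.33)
```

The shift is applied uniformly, so the spacing between data points is unchanged and only the origin of the axis moves.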
[0024] Based on the evaluation of the molecular profile and/or the 1H NMR spectroscopy data of the biofluid sample, the at least one trained machine learning model can perform a binary classification or multi-class classification into at least two classes of molecular profiles, namely at least the first class of molecular profiles associated with healthy individuals and the second class of molecular profiles associated with a cancerous and/or pre-cancerous condition. Accordingly, the at least one machine learning model can refer to a classification model or algorithm trained according to a machine learning approach.

[0025] As used herein, determining the cancer risk score may also comprise calculating and/or assessing the risk and/or probability of the occurrence and/or presence of a cancerous and/or pre-cancerous condition in the individual. This may comprise assessing, calculating and/or determining the probability of the occurrence and/or presence of cancer or a cancerous and/or pre-cancerous condition in the individual. In other words, determining the cancer risk score may comprise, for example, calculating and/or determining the probability of the occurrence of cancer in the individual based on or using the at least one trained machine learning model of the computing device.

[0026] The cancer risk score used herein may refer to a numerical measure that indicates the determined risk and/or probability of the occurrence and/or presence of a cancerous and/or pre-cancerous condition in an individual. The cancer risk score may be given on an arbitrary scale that ranges from a minimum value, e.g., zero or 0, up to a maximum value, e.g., one or 100. Any other scale, including relative and absolute scales, may be used to represent the cancer risk score. It should be noted that a plurality of cancer risk scores may be calculated, e.g., using a plurality of different machine learning models or algorithms, as described in more detail below.
[0027] Optionally, the cancer risk score may be output by the computing device, for example, on a user interface of the computing device. Further, optionally, additional or contextual information associated with the cancer risk score may be determined by the computing device and may optionally be provided as output from the computing device. Such contextual information may include a risk level or risk stage, such as low risk, moderate risk, or high risk, for a cancerous and/or pre-cancerous condition occurring and/or present in the individual. Alternatively or additionally, the contextual information may include an indication of a cancer type and/or an estimated cancer stage.
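As a minimal sketch of the score scale in paragraph [0026] and the risk levels in paragraph [0027], the following maps a class probability onto a 0 to 100 scale and then onto a coarse risk level. The function names and the cut-off values 33 and 66 are hypothetical and chosen only for illustration; the disclosure does not specify thresholds.

```python
def cancer_risk_score(p_cancer, scale_max=100.0):
    """Map a probability in [0, 1] onto a scale from 0 to scale_max."""
    if not 0.0 <= p_cancer <= 1.0:
        raise ValueError("p_cancer must lie in [0, 1]")
    return p_cancer * scale_max


def risk_level(score, thresholds=(33.0, 66.0)):
    """Translate a 0-100 score into a coarse risk level.

    The thresholds are illustrative assumptions, not values from the source.
    """
    low_cut, high_cut = thresholds
    if score < low_cut:
        return "low risk"
    if score < high_cut:
        return "moderate risk"
    return "high risk"


score = cancer_risk_score(0.25)   # probability reported by a classifier
level = risk_level(score)         # contextual information for the output
```

Keeping the scaling and the level assignment in separate functions makes it straightforward to swap in a different scale or different cut-offs.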
[0028] The computing device described herein can refer to any data processing device having one or more processors for data processing. The computing device can be embodied as a standalone computing device, as a server, and/or as a computer network with a plurality of cooperating computing devices, such as a cloud computing system or server system. Alternatively or additionally, the computing device can be embodied at least partially as a mobile device, such as a smartphone, a tablet computer, a notebook, or the like.
[0029] According to one embodiment, classifying the molecular profile comprises determining and/or calculating a classification result indicating a probability for at least one of the first class and the second class, wherein the cancer risk score is determined based on the classification result. Accordingly, the cancer risk score can comprise or correspond to the classification result calculated or generated by the at least one ML model. Classifying the molecular profile may, for example, involve drawing a conclusion as to whether the molecular profile of the biofluid sample belongs to the first class, the second class, and optionally to one or more further classes. Alternatively or additionally, classifying the molecular profile may comprise calculating a probability for the molecular profile of the biofluid sample to belong to the first class of molecular profiles, to the second class of molecular profiles, and optionally to one or more further classes of molecular profiles. In other words, the at least one machine learning model may be trained to determine a probability for a binary classification and/or multi-class classification of the molecular profile of the biofluid sample. It should be noted that in the case of a binary classification, only the classification result of one of the first and second classes may be calculated, and the classification result of the other of the first and second classes may be calculated based thereon, e.g., by subtracting the determined classification result from one or 100%.

[0030] According to one embodiment, the biofluid sample is selected from the group consisting of a blood serum sample, a blood plasma sample, a blood sample, a urine sample, and a cerebrospinal fluid sample. In other words, the biofluid sample may comprise one or more of a blood serum sample, a blood plasma sample, a blood sample, a urine sample, and a cerebrospinal fluid sample.
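The binary case noted in paragraph [0029] can be illustrated with a short sketch: if a model reports only the probability of the second class, the probability of the first class follows by subtraction from one. For the multi-class case, the sketch uses a softmax over raw class scores, which is a common way to obtain class probabilities but is an assumption here and is not named in the disclosure. All function names are hypothetical.

```python
import math


def binary_class_probabilities(p_second_class):
    """Derive both class probabilities from the one that was computed."""
    if not 0.0 <= p_second_class <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return {
        "first_class": 1.0 - p_second_class,   # healthy-associated profiles
        "second_class": p_second_class,        # cancer-associated profiles
    }


def softmax(scores):
    """Turn raw class scores into probabilities that sum to one.

    Assumed multi-class mechanism; subtracting the maximum keeps
    math.exp numerically stable.
    """
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


binary = binary_class_probabilities(0.2)
multi = softmax([2.0, 0.5, -1.0])   # e.g. healthy, cancerous, further class
```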
[0031] Depending on the type of cancer or cancerous disease, the composition of one or more biofluids of an individual may be altered, modified, and/or transformed, wherein such changes may optionally be induced at different stages of the cancerous disease. For the majority of cancers and cancerous diseases, it can be assumed that the composition or molecular composition of an individual's blood changes, for example due to altered metabolism and altered protein composition, which may change the proportion of one or more proteins, metabolites, amino acids, micromolecules and/or macromolecules that are either released directly into the blood by the cancer cells or whose release is induced directly or indirectly by the cancer, e.g., via biological networks. In these cases, a blood sample, a blood serum sample, and/or a blood plasma sample can be used as the biofluid sample to generate the 1H NMR spectroscopy data and determine the cancer risk score. Other cancer types or cancerous diseases, such as tumors of the bladder, kidney, or urinary tract, can, on the other hand, alter the molecular composition of the urine at an early cancer stage, and a urine sample can be used as the biofluid sample to generate the 1H NMR spectroscopy data and determine the cancer risk score, which can enable early cancer detection. Still other cancer types or cancerous diseases, such as brain tumors, can alter the molecular composition of the cerebrospinal fluid, and a cerebrospinal fluid sample can be used as the biofluid sample to generate the 1H NMR spectroscopy data and determine the cancer risk score. Accordingly, the use of one or more of a blood serum sample, a blood plasma sample, a blood sample, a urine sample, and a cerebrospinal fluid sample can enable the earliest possible diagnosis or cancer detection.

[0032] According to one embodiment, the 1H NMR spectroscopy data is indicative of an NMR signal intensity in binned increments of chemical shift, each increment having a width of less than or equal to about 0.02 ppm, preferably less than or equal to about 0.01 ppm, more preferably less than or equal to about 0.006 ppm, for example, about 0.00016 ppm. It should be noted that the aforementioned ppm values and ranges apply to each incremental chemical shift bin discussed or disclosed herein. In general, the use of such fine-grained binning of the 1H NMR spectrum can increase the information content of the 1H NMR spectroscopy data and thus enable the identification of fine-grained changes in the molecular profile caused by cancer or a cancerous disease at a very early stage of the cancer.
[0033] According to one embodiment, the method further comprises converting the 1H NMR spectroscopy data into a binned data structure based on assigning the NMR signal intensity to binned chemical shift increments, each increment having a width of less than or equal to about 0.02 ppm, preferably less than or equal to about 0.01 ppm, more preferably less than or equal to about 0.006 ppm, for example, about 0.00016 ppm. Alternatively or additionally, the method may further comprise binning the 1H NMR spectroscopy data into a plurality of incremental chemical shift bins, each bin having a width of less than or equal to about 0.02 ppm, preferably less than or equal to about 0.01 ppm, more preferably less than or equal to about 0.006 ppm, for example, about 0.00016 ppm. Optionally, the binned 1H NMR spectroscopy data may be normalized to a median of at least a subset of the bins. An advantage of such normalization may be a reduction in the variability between different pre-analytical approaches.
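The binning and optional median normalization described in paragraphs [0032] and [0033] can be sketched as follows. This is a minimal illustration under stated assumptions: intensities falling into the same bin are summed (the disclosure does not say whether a sum, mean, or integral is used), the function names `bin_spectrum` and `median_normalize` are hypothetical, and the toy data uses a coarse 0.01 ppm bin width for readability.

```python
import math
from statistics import median


def bin_spectrum(ppm_axis, intensities, ppm_min, ppm_max, bin_width):
    """Assign each intensity to a fixed-width chemical shift bin.

    Intensities in the same bin are summed (an assumed aggregation).
    Points outside [ppm_min, ppm_max) are ignored.
    """
    n_bins = int(round((ppm_max - ppm_min) / bin_width))
    bins = [0.0] * n_bins
    for ppm, value in zip(ppm_axis, intensities):
        index = math.floor((ppm - ppm_min) / bin_width)
        if 0 <= index < n_bins:
            bins[index] += value
    return bins


def median_normalize(bins, subset=None):
    """Divide every bin by the median of all bins or of a chosen subset."""
    selected = bins if subset is None else [bins[i] for i in subset]
    reference = median(selected)
    if reference == 0:
        raise ValueError("median of the selected bins is zero")
    return [value / reference for value in bins]


# Toy spectrum: five points, binned at 0.01 ppm between 0.00 and 0.04 ppm.
ppm_axis = [0.005, 0.015, 0.016, 0.025, 0.035]
intensities = [1.0, 2.0, 3.0, 4.0, 8.0]

binned = bin_spectrum(ppm_axis, intensities, 0.0, 0.04, 0.01)
normalized = median_normalize(binned)
```

After normalization the median of the selected bins equals one, so spectra from differently prepared samples are placed on a comparable intensity scale.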
[0034] In an exemplary implementation, a biofluid sample may be analyzed with a high-frequency NMR spectrometer, e.g., at or above about 500 MHz, with a pulse sequence similar to CPMGPR1D. The free induction decay (FID) may be measured and Fourier transformed into a spectrum, and the Fourier-transformed spectrum may be normalized to a reference peak, such as lactate, the anomeric D-glucose doublet, or another peak. Further, the baseline may be corrected, and the spectrum may then be converted into binned increments of less than or equal to 0.02 ppm, preferably less than or equal to about 0.01 ppm, more preferably less than or equal to about 0.006 ppm, for example, about 0.00016 ppm. This makes it possible to obtain a considerable amount of data for deciphering the molecular profile of the biofluid sample and provides information about all hydrogen atoms present, wherein each bin may represent a specific position in the 1H NMR spectrum or in the spectroscopy data.

[0035] According to one embodiment, the at least one trained machine learning model is a machine-learned classifier configured to process, as input data, the 1H NMR spectroscopy data in a data structure of binned chemical shift increments. In other words, the at least one ML model may be trained or machine-learned to receive and/or process the 1H NMR spectroscopy data in a data structure of binned chemical shift increments and to provide, as output, a classification result and/or the at least one cancer risk score, which may represent, include, and/or correspond to the classification result.

[0036] According to one embodiment, the at least one machine learning model is trained and/or configured to identify at least one molecular signature of a cancerous and/or pre-cancerous condition in the molecular profile of the biofluid sample and/or in the 1H NMR spectroscopy data. In general, the ML model can be trained and/or configured to identify and/or determine one or more molecular signatures of cancer and/or pre-cancerous conditions in the 1H NMR spectroscopy data. The one or more molecular signatures can be identified at any position or location in the spectrum, for example, in one or more chemical shift bins. Furthermore, a molecular signature of a cancerous and/or pre-cancerous condition can comprise any alteration and/or change in the molecular profile of the biofluid sample compared to one or more molecular profiles of healthy individuals. Such a molecular signature, or such an alteration and/or change of the molecular profile compared to the molecular profiles of healthy individuals, may, for example, include an additional hydrogen peak in the 1H NMR spectroscopy data caused by a cancerous and/or pre-cancerous condition, a hydrogen peak with increased height or size, a hydrogen peak with reduced height or size, a change in the width of a hydrogen peak, an overlap of several hydrogen peaks, the removal of a hydrogen peak, a shift in ppm of one or more hydrogen peaks, or a combination thereof.
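To make concrete what a change "compared to one or more molecular profiles of healthy individuals" in paragraph [0036] can look like at the level of individual bins, the following sketch flags bins of a sample that deviate strongly from a set of healthy reference profiles. It is only a simple statistical stand-in (a per-bin z-score) and not the trained ML model of the disclosure; the function name `deviating_bins` and the threshold of 3 are assumptions for illustration.

```python
from statistics import mean, stdev


def deviating_bins(sample_bins, healthy_profiles, z_threshold=3.0):
    """Return (bin_index, z_score) for bins far from the healthy reference.

    healthy_profiles is a list of binned spectra with the same length as
    sample_bins. Bins with zero spread in the reference are skipped,
    because a z-score is undefined there.
    """
    flagged = []
    for index, value in enumerate(sample_bins):
        reference = [profile[index] for profile in healthy_profiles]
        spread = stdev(reference)
        if spread == 0:
            continue
        z_score = (value - mean(reference)) / spread
        if abs(z_score) >= z_threshold:
            flagged.append((index, z_score))
    return flagged


# Toy data: three healthy reference profiles with three bins each.
healthy = [
    [1.0, 2.0, 0.5],
    [1.2, 2.1, 0.6],
    [0.8, 1.9, 0.4],
]
sample = [1.0, 3.5, 0.5]   # bin 1 shows a strongly increased intensity

flagged = deviating_bins(sample, healthy)
```

A positive z-score corresponds to an increased peak and a negative one to a reduced peak; changes such as peak overlap or shifts in ppm would show up as coordinated deviations across neighboring bins.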
Dementsprechend kann das mindestens ein ML-Modell trainiert werden, um eine oder mehrere krebsbedingte Veränderungen in einem oder mehreren Wasserstoffpeaks in Bezug auf ein oder mehrere molekulare Profile von gesunden Individuen zu identifizieren. [0039] Gemäß einer Ausführungsform umfasst das Identifizieren der mindestens einen molekularen Signatur die Identifizierung eines oder mehrerer Bins und/oder eines Bereichs der chemischen Verschiebung in den 1H-NMR-Spektroskopiedaten, die mit einer Überlappung von Wasserstoffpeaks verbunden sind, die einer Vielzahl von Metaboliten, Proteinen, Aminosäuren, Mikromolekülen und Makromolekülen in der Biofluidprobe zugeordnet werden können. Mit anderen Worten: Das ML-Modell kann so trainiert werden, dass es eine Überlappung von Wasserstoffpeaks identifiziert, die durch eine Krebserkrankung und/oder eine Krebsvorstufe hervorgerufen werden können und die einen oder mehrere Bins der chemischen Verschiebung in den 1H-NMRSpektroskopiedaten umfassen können. Dies kann es ermöglichen, komplexe Muster molekularer Signaturen zu identifizieren, die durch die Krebserkrankung und/oder die Krebsvorstufe verursacht werden, und/oder komplexe Muster molekularer Signaturen bei der Berechnung des Krebsrisikos zu berücksichtigen. Dadurch kann die allgemeine Vielseitigkeit des Verfahrens erhöht werden, um beispielsweise die Bestimmung verschiedener Arten von Krebs und/oder Krebsvorstufen zu ermöglichen. [0040] Gemäß einer Ausführungsform umfasst das Identifizieren der mindestens einen molekularen 10 [0037] According to one embodiment, the determined cancer risk score can be used as a computational biomarker that indicates a pathogenic molecular signature and/or molecular signatures of a cancerous and/or pre-cancerous disease in the molecular profile of the biofluid sample. In other words, the determined cancer risk score can be a computer-implemented biomarker that informs about pathogenic molecular signatures in the individual's biofluid sample. 
As described above, one or more different molecular signatures can be used to calculate cancer risk. Therefore, different computational biomarkers in the form of different cancer risk scores for different types of cancer, cancers, and/or pre-cancerous conditions can also be calculated according to the method described herein. Accordingly, the method described herein can be applied or used for many different types of cancers and/or pre-cancerous conditions, thereby providing a versatile approach or method for cancer detection and/or cancer risk assessment. [0038] According to one embodiment, identifying the at least one molecular signature comprises identifying one or more bins in the 1H NMR spectroscopy data that are associated with one or more hydrogen peaks and/or contain one or more hydrogen peaks that indicate a cancer-induced alteration of the molecular profile compared to molecular profiles of healthy individuals and/or reference individuals. Based on the identified one or more bins in the 1H NMR spectroscopy data, the cancer risk score can be calculated. Accordingly, the at least one ML model can be trained to identify one or more cancer-related alterations in one or more hydrogen peaks with respect to one or more molecular profiles of healthy individuals. [0039] According to one embodiment, identifying the at least one molecular signature comprises identifying one or more bins and/or a chemical shift range in the 1H NMR spectroscopy data that are associated with an overlap of hydrogen peaks that can be attributed to a variety of metabolites, proteins, amino acids, micromolecules, and macromolecules in the biofluid sample. In other words, the ML model can be trained to identify an overlap of hydrogen peaks that can be caused by a cancer and/or a precancerous condition and that can comprise one or more chemical shift bins in the 1H NMR spectroscopy data. 
This can make it possible to identify complex patterns of molecular signatures caused by the cancer and/or the precancerous condition, and/or to take such complex patterns into account when calculating the cancer risk. This can increase the overall versatility of the method, for example by enabling the determination of different types of cancer and/or precancerous conditions.

[0040] According to one embodiment, identifying the at least one molecular signature comprises identifying one or more bins and/or a chemical shift range in the 1H NMR spectroscopy data that cannot be assigned to individual metabolites, proteins, amino acids, micromolecules, and macromolecules in the biofluid sample. Accordingly, molecular signatures that cannot be assigned to and/or are not associated with individual metabolites, proteins, amino acids, micromolecules, and macromolecules in the biofluid sample can also be taken into account when calculating the cancer risk score. This, too, can increase the overall versatility of the method, for example by enabling the determination of various types of cancer and/or precancerous conditions.

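To illustrate what identifying altered bins without assigning them to individual compounds might look like, the following minimal sketch flags bins whose intensity deviates strongly from a set of healthy reference spectra. It uses a plain per-bin z-score as a simple statistical stand-in for the trained ML model of the description; the function name `altered_bins`, the threshold of 3.0, and the toy data are assumptions made for illustration only.

```python
from statistics import mean, stdev


def altered_bins(sample, healthy_refs, z_threshold=3.0):
    """Return indices of bins whose intensity deviates strongly from healthy references.

    sample:       binned intensities of one spectrum.
    healthy_refs: binned spectra (same bin layout) from healthy individuals.
    z_threshold:  hypothetical cut-off, not taken from the description.
    """
    flagged = []
    for i, value in enumerate(sample):
        ref_values = [ref[i] for ref in healthy_refs]
        mu = mean(ref_values)
        sigma = stdev(ref_values)
        if sigma == 0:
            continue  # no spread in the reference; skip to avoid dividing by zero
        if abs(value - mu) / sigma > z_threshold:
            flagged.append(i)
    return flagged


# Toy data: four chemical shift bins, three healthy reference spectra.
healthy = [
    [1.0, 2.0, 0.5, 3.0],
    [1.1, 2.1, 0.4, 3.1],
    [0.9, 1.9, 0.6, 2.9],
]
sample = [1.0, 2.0, 1.5, 3.0]
flagged = altered_bins(sample, healthy)
print(flagged)
```

Because the comparison works on bin indices only, it does not require knowing which metabolite or macromolecule produced the signal in a flagged bin.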
[0041] According to one embodiment, the at least one molecular signature is associated with a cancer-related change in the proportion of one or more hydrogen peaks in the 1H NMR spectroscopy data that are associated with one or more metabolites, proteins, amino acids, micromolecules, and macromolecules in the biofluid sample relative to a biofluid sample from one or more healthy individuals. A change in the proportion of a hydrogen peak can relate to one or more of the following quantities: height of the peak, size of the peak, integral of the peak, area of the peak, mean value of the peak, center of the peak, location or position of the peak in the 1H NMR spectrum, or a combination thereof. In general, by identifying changes in the proportion of one or more hydrogen peaks to classify the molecular profile into at least the first and second classes, the cancer risk score can be accurately and reliably determined.

[0042] According to one embodiment, evaluating the 1H NMR spectroscopy data with respect to the molecular profile comprises analyzing all hydrogen peaks in the 1H NMR spectroscopy data that are associated with one or more metabolites, proteins, amino acids, micromolecules, and macromolecules in the biofluid sample. By analyzing all hydrogen peaks in the molecular profile, several or all molecular signatures induced by the cancerous and/or pre-cancerous condition in a molecular profile can be taken into account when determining the cancer risk score, thereby improving the accuracy and robustness of the determination and increasing the versatility of the determination.

[0043] According to one embodiment, evaluating the 1H NMR spectroscopy data with respect to the molecular profile comprises detecting a pattern of hydrogen peaks associated with a cancer-related change in the proportion of one or more hydrogen peaks in the 1H NMR spectroscopy data compared to a biofluid sample of one or more healthy individuals.
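Some of the peak quantities listed in [0041] (height, area or integral, and center) can be computed from a sampled spectrum as in the following sketch. The function name `peak_descriptors`, the trapezoidal integration, and the intensity-weighted definition of the center are illustrative assumptions; the description does not prescribe how these quantities are obtained.

```python
def peak_descriptors(ppm, intensity, lo, hi):
    """Height, area, and intensity-weighted center of the signal within [lo, hi] ppm.

    Assumes ppm is sorted in ascending order and intensities are non-negative.
    """
    pts = [(x, y) for x, y in zip(ppm, intensity) if lo <= x <= hi]
    if len(pts) < 2:
        raise ValueError("need at least two points in the window")
    total = sum(y for _, y in pts)
    if total == 0:
        raise ValueError("no signal in the window")
    height = max(y for _, y in pts)
    # Trapezoidal rule for the area (integral) under the peak.
    area = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    # Intensity-weighted mean position as a simple estimate of the peak center.
    center = sum(x * y for x, y in pts) / total
    return {"height": height, "area": area, "center": center}


# Toy triangular peak centered at 1.2 ppm.
ppm = [1.0, 1.1, 1.2, 1.3, 1.4]
intensity = [0.0, 1.0, 2.0, 1.0, 0.0]
d = peak_descriptors(ppm, intensity, 1.0, 1.4)
print(d)
```

Comparing such descriptors between a sample and healthy references is one way a change in a peak could be expressed numerically before it is passed to a classifier.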
Alternatively or additionally, the at least one machine learning model can be trained to detect a pattern of hydrogen peaks associated with a cancer-related change in the proportion of one or more hydrogen peaks in the 1H NMR spectroscopy data compared to a biofluid sample of one or more healthy individuals. For example, complex correlations between cancers and molecular signatures in an individual’s molecular profile can occur, leading to patterns of hydrogen peaks in the 1H NMR spectroscopy data that can be detected in order to determine the cancer risk score. By detecting such patterns of hydrogen peaks, the cancer risk score can be calculated with high accuracy, reliability, and robustness.

[0044] According to one embodiment, obtaining the 1H NMR spectroscopy data comprises acquiring raw 1H NMR spectroscopy data (1H NMR spectroscopy raw data) with an NMR spectrometer at a frequency above about 500 MHz. The raw 1H NMR spectroscopy data may include or indicate the free induction decay as measured in 1H NMR spectroscopy.
Accordingly, acquiring raw 1H NMR spectroscopy data with an NMR spectrometer at a frequency above about 500 MHz may comprise measuring the free induction decay with an NMR spectrometer.

[0045] In an exemplary implementation, the method performs further spectral processing of the 1H NMR spectroscopy raw data and/or peak alignment of the 1H NMR spectroscopy raw data. This may allow for correcting shifts in the position or chemical shifts of hydrogen peaks of different biofluid samples, for example, such that hydrogen peaks associated with a particular metabolite but measured on different biofluid samples are located at the same position in the corresponding 1H NMR spectroscopy data. In this way, the comparability of 1H NMR spectroscopy data of different biofluid samples can be improved.

[0046] According to one embodiment, the spectral processing of the 1H NMR spectroscopy raw data comprises one or more of chemical shift referencing, phase adjustment, and baseline correction of the 1H NMR spectroscopy raw data. Accordingly, one or more of these preprocessing steps can be applied to the 1H NMR spectroscopy raw data, which can increase the comparability of the 1H NMR spectroscopy data of different biofluid samples and improve the robustness and reproducibility of the determination of the cancer risk score with respect to different biofluid samples.

[0047] According to one embodiment, the method further comprises one or more of: normalization, scaling, binning, and filtering of the 1H NMR spectroscopy raw data and/or the 1H NMR spectroscopy data. This can also increase the comparability of 1H NMR spectroscopy data of different biofluid samples and improve the robustness and reproducibility of the determination of the cancer risk score with respect to different biofluid samples.

[0048] According to one embodiment, the at least one machine learning model for classification is trained with one or more statistical machine learning algorithms.
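The binning and normalization steps named in [0047] can be sketched as follows. This is a minimal illustration under stated assumptions: equal-width bins, summation of intensities within each bin, and normalization to unit total intensity are choices made here, and the function names `bin_spectrum` and `normalize_total_area` are hypothetical. The description itself does not specify bin widths or a normalization scheme.

```python
def bin_spectrum(ppm, intensity, lo, hi, width):
    """Sum intensities into equal-width chemical shift bins covering [lo, hi)."""
    n_bins = round((hi - lo) / width)
    bins = [0.0] * n_bins
    for x, y in zip(ppm, intensity):
        if lo <= x < hi:
            # Clamp the index to guard against floating-point edge cases.
            idx = min(int((x - lo) / width), n_bins - 1)
            bins[idx] += y
    return bins


def normalize_total_area(bins):
    """Scale binned intensities so that they sum to 1."""
    total = sum(bins)
    if total == 0:
        raise ValueError("spectrum has no signal to normalize")
    return [b / total for b in bins]


# Toy spectrum: four data points binned into two bins of width 1.0 ppm.
ppm = [0.25, 0.75, 1.25, 1.75]
intensity = [1.0, 2.0, 3.0, 2.0]
bins = bin_spectrum(ppm, intensity, lo=0.0, hi=2.0, width=1.0)
normalized = normalize_total_area(bins)
print(bins, normalized)
```

After this step every sample is represented by a vector of the same length and scale, which is what makes spectra from different biofluid samples directly comparable as classifier input.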
In general, statistical machine learning algorithms can use statistical methods to develop models that can learn from data and make predictions or decisions based on what they have learned. Exemplary, non-limiting statistical machine learning algorithms include a voting classifier, an ensemble method, and logistic regression.

[0049] A voting classifier is a machine learning model that trains on an ensemble of numerous models and predicts an output, or at least one class, based on the highest probability of the chosen class. Ensemble methods, on the other hand, are techniques that aim to improve the accuracy of model results by combining multiple models instead of using a single model. The combined models can thus increase the accuracy of the classification results obtained. Furthermore, logistic regression refers to a statistical method that can be used to build machine learning models in which the dependent variable is dichotomous and/or binary. Accordingly, logistic regression can be used in particular for binary classification into two classes, such as the first and second classes of molecular profiles.

[0050] According to one embodiment, classifying the molecular profile into at least the first class and the second class of molecular profiles comprises classifying the molecular profile into at least a healthy class of molecular profiles associated with healthy individuals and a non-healthy or diseased class of molecular profiles associated with non-healthy or diseased individuals, based on the evaluation of the 1H NMR spectroscopy data with respect to the molecular profile of the biofluid sample with a first trained machine learning model. Furthermore, the method comprises, after determining that the molecular profile of the biofluid sample is associated with or lies within the non-healthy or diseased class of molecular profiles, classifying the molecular profile into a non-cancer class of molecular profiles associated with non-cancerous individuals and a cancer class of molecular profiles associated with cancerous and/or pre-cancerous individuals, based on the evaluation of the 1H NMR spectroscopy data with respect to the molecular profile of the biofluid sample with a second trained machine learning model.
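The voting classifier and logistic regression described in [0049] can be illustrated with a small sketch. It assumes "soft" voting, that is, averaging the class probabilities of several models and choosing the class with the highest averaged probability, which is one reading of the description. The logistic regression coefficients below are made-up values standing in for trained models, and the function names are hypothetical.

```python
import math


def logistic_probability(weights, bias, features):
    """P(class = 1) from a logistic regression model with the given coefficients."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def soft_vote(class_probabilities):
    """Average per-model class probabilities; return (winning class, averages)."""
    n_models = len(class_probabilities)
    n_classes = len(class_probabilities[0])
    avg = [sum(p[c] for p in class_probabilities) / n_models for c in range(n_classes)]
    winner = max(range(n_classes), key=lambda c: avg[c])
    return winner, avg


# Illustrative input: a two-bin normalized spectrum.
features = [0.375, 0.625]
# Three stand-in models as (weights, bias); the values are not from the description.
models = [([2.0, -1.0], 0.0), ([0.0, 4.0], -2.0), ([-1.0, 1.0], 0.5)]

probs = []
for weights, bias in models:
    p1 = logistic_probability(weights, bias, features)
    probs.append([1.0 - p1, p1])  # [P(class 0), P(class 1)]

label, avg = soft_vote(probs)
print(label, avg)
```

Here class 0 could stand for the first class of molecular profiles and class 1 for the second; the averaged probability of class 1 is the kind of quantity from which a risk score could be derived.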
Here, the first trained machine learning model and the second machine learning model differ from each other. In particular, the first and second machine learning models may differ from each other with respect to the training of the respective models. For example, the first ML model can be trained to provide a binary classification of the molecular profile of the biofluid sample into the healthy class of molecular profiles associated with healthy individuals and the non-healthy or diseased class of molecular profiles associated with non-healthy or diseased individuals, while the second ML model can be trained to provide a binary classification of the molecular profile of the biofluid sample into the non-cancer class of molecular profiles associated with non-cancerous individuals and the cancer class of molecular profiles associated with cancerous and/or pre-cancerous individuals. Alternatively or additionally, different types of ML models can be used for the first and second ML models, such as different statistical ML models or algorithms. Alternatively or additionally, different classifications can be used for the first and second ML models, such as a classification related to the cancer stage, e.g., metastatic and non-metastatic, and/or a classification related to a sub-disease. Alternatively or additionally, the first and second ML models can differ in one or more parameters, the chosen algorithm, and the feature engineering methods. For example, principal component analysis (PCA) and/or t-distributed stochastic neighbor embedding (tSNE) can be used to generate features based on unsupervised learning for the first and/or second ML model.

[0051] Accordingly, different ML models can be used or consulted one after another to determine the cancer risk score. Applying a multi-stage approach to determining the cancer risk score can generally increase the accuracy and robustness of the overall method for determining the cancer risk score.
The versatility of the overall method for calculating the cancer risk score can also be increased.

[0052] According to one embodiment, the method further comprises determining, based on the classification of the molecular profile of the biofluid sample with the first trained machine learning model, a first cancer risk score indicating a probability that the molecular profile is associated with or lies in the healthy class and/or the non-healthy or diseased class. The method further comprises determining, based on the classification of the molecular profile of the biofluid sample with the second trained machine learning model, a second cancer risk score indicating a probability that the molecular profile is associated with or lies in the non-cancer class and/or the cancer class of the molecular profile, and determining the cancer risk score based on the first and second cancer risk scores. For example, the classification result and/or the second cancer risk score of the second ML model can be used as or form the cancer risk score. Alternatively or additionally, both the first and the second cancer risk score can be combined, e.g., in accordance with a predefined metric, to determine the cancer risk score. Alternatively or additionally, a calibration classifier can be used to provide refined probabilities for the classification results or the first and/or second cancer risk scores.

[0053] According to one embodiment, the method further comprises, after determining that the molecular profile of the biofluid sample is associated with or lies in the cancer class of molecular profiles, classifying the molecular profile into a plurality of classes of molecular profiles based on the evaluation of 1H NMR spectroscopy data relating to the molecular profile of the biofluid sample with a third trained machine learning model, each class being associated with a particular type of cancer and/or pre-cancer. Optionally, a third cancer risk score can be determined, which indicates a probability that the molecular profile is associated with a specific type of cancerous and/or precancerous disease. Accordingly, a type of cancer, cancerous and/or precancerous disease can be identified using the method described here.
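The multi-stage classification of [0050] to [0053] can be sketched as a cascade in which each later model is evaluated only if the previous stage crosses a decision threshold. Several parts of this sketch are assumptions rather than features of the description: the threshold of 0.5, the product of the first and second scores as the "predefined metric", the fallback to the first score when stage two is skipped, and the names `run_cascade` and `combine_scores`. The stage callables stand in for trained models and return fixed, illustrative probabilities.

```python
def run_cascade(profile, stage_one, stage_two, stage_three, threshold=0.5):
    """Run up to three classification stages on one molecular profile.

    Each stage is a callable standing in for a trained model:
      stage_one(profile)   -> P(not healthy)
      stage_two(profile)   -> P(cancer | not healthy)
      stage_three(profile) -> dict mapping cancer type to probability
    Later stages run only if the previous stage reaches the threshold.
    """
    result = {"first_score": stage_one(profile), "second_score": None, "type_scores": None}
    if result["first_score"] >= threshold:
        result["second_score"] = stage_two(profile)
        if result["second_score"] >= threshold:
            result["type_scores"] = stage_three(profile)
    return result


def combine_scores(first_score, second_score):
    """Hypothetical combination metric: P(not healthy) * P(cancer | not healthy).

    If stage two did not run, the first score is returned as an upper bound,
    assuming every cancer profile is also a non-healthy profile.
    """
    if second_score is None:
        return first_score
    return first_score * second_score


# Stand-in models returning fixed probabilities (illustrative values only).
result = run_cascade(
    profile=[0.375, 0.625],
    stage_one=lambda p: 0.8,
    stage_two=lambda p: 0.5,
    stage_three=lambda p: {"type_a": 0.7, "type_b": 0.3},
)
risk_score = combine_scores(result["first_score"], result["second_score"])
likely_type = max(result["type_scores"], key=result["type_scores"].get)
print(risk_score, likely_type)
```

Other combinations are equally consistent with the description, for example using the second score alone as the cancer risk score or passing the stage outputs through a calibration step before combining them.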

[0054] The third ML model may differ from one or both of the first and second ML models. In particular, the third ML model may differ from the first and/or second ML models in terms of training.
In particular, the third ML model may be trained to provide a multi-class classification result for classifying the molecular profile into multiple classes of molecular profiles, each class being associated with a specific type of cancerous and/or pre-cancerous condition, while the first and second ML models may be trained to provide a binary classification result, as described above. Optionally, a different type of ML model may be used for the third ML model than for the first and second ML models. Alternatively or additionally, different classifications may be used for the first, second and third ML models, such as a classification related to the cancer stage, e.g., metastatic and non-metastatic, and/or a classification related to a sub-disease. Alternatively or additionally, the first, second and third ML models may differ in one or more parameters, the chosen algorithm, and the feature engineering methods. For example, principal component analysis (PCA) and/or t-distributed stochastic neighbor embedding (tSNE) may be used to generate features based on unsupervised learning for at least one of the first, second and third ML models.

[0055] Another aspect of the present disclosure relates to a computing device configured to perform the steps of the method as described above and below. It is emphasized that each feature, function, element and/or step described herein with reference to the method may be a feature, function and/or element of the computing device and vice versa.

[0056] The computing device comprises one or more processors for data processing. Furthermore, the computing device may comprise a data store and/or a memory for storing data, such as the 1H NMR spectroscopy data, one or more cancer risk scores, and/or other data.
Furthermore, the data store and/or the memory may store software instructions and/or a computer program that, when executed by a computing device, instructs the computing device to perform steps of the method described herein and hereinafter.

[0057] Furthermore, the at least one machine learning model, e.g., one or more of the first ML model, the second ML model, and the third ML model, may be implemented in software and/or hardware on the computing device.

[0058] In an exemplary implementation, the computing device may comprise one or more communication interfaces configured to communicate with one or more remote devices, e.g., an external data store or database, and/or one or more remote computing devices. At least part of the 1H NMR spectroscopy data can be received via the communication interface from an external data store or database. Alternatively or additionally, the calculated cancer risk score or related information may be transmitted to one or more remote devices or stored in the external data store or database.

[0059] Another aspect of the present disclosure relates to a computer program that, when executed by a computing device, instructs the computing device to perform steps of the method described above and below.

[0060] Another aspect of the present disclosure relates to a non-transitory computer-readable medium having stored thereon a computer program that, when executed by a computing device, instructs the computing device to perform steps of the method described above and below.

BRIEF DESCRIPTION OF THE DRAWINGS

[0061] Exemplary embodiments are further described with reference to the figures, wherein:

[0062] Figure 1 shows a computing device according to an exemplary embodiment;

[0063] Figure 2 shows 1H NMR spectroscopy data illustrating the steps of a computer-implemented method for determining a cancer risk score;

[0064] Figure 3A shows a flowchart illustrating the steps of a computer-implemented method for determining a cancer risk score according to an exemplary embodiment;

[0065] Figures 3B to 3D each show 1H NMR spectroscopy data that can be used as input for the method shown in Figure 3A; and

[0066] Figures 4 to 6 each show a flowchart illustrating the steps of a computer-implemented method for determining a cancer risk score according to exemplary embodiments.

[0067] The figures are only schematic and not to scale. In principle, identical or similar parts, elements and/or steps are provided with identical or similar reference numerals in the figures.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

[0068] Figure 1 shows a computing device or computing system 100 configured to determine a cancer risk score according to an exemplary embodiment.

[0069] The computing device 100 includes a processing circuit 110 with one or more processors 112 for data processing.

[0070] The computing device 100 further includes at least one machine learning model 114. The ML model 114 may be implemented in software and/or hardware and may be part of the processing circuit 110 of the computing device 100. The computing device 100 may optionally comprise a plurality of machine learning models 114, such as a first, a second and a third ML model, as described in more detail with reference to Figure 6 below. The optional plurality of ML models is shown collectively in Figure 1 with the reference numeral 114.

[0071] The computing device 100 further comprises at least one data store 116 and/or a memory 116 for storing data and/or software instructions, for example in the form of a computer program, for instructing the computing device 100 to carry out steps of the method for determining the cancer risk score, as described above and below.

[0072] The computing device shown in Figure 1 further comprises a user interface 118 for outputting data and/or information, such as the cancer risk score, to a user of the computing device 100 and/or for control of the computing system 100 by the user.

[0073] Furthermore, the computing system 100 comprises a communication interface 120 for communicatively and/or operatively coupling the computing device 100 to an external computing device 500 and/or for coupling the computing system 100 to one or more external data sources 500, for example, to receive or obtain 1H NMR spectroscopy data from them and/or to transmit data to them, such as the cancer risk score.

[0074] Figure 2 shows 1H NMR spectroscopy data 200 to illustrate the steps of a computer-implemented method for determining a cancer risk score. In particular, Figure 2 shows the NMR signal intensity in arbitrary units as a function of the chemical shift in parts per million (ppm).

[0075] Figure 2 shows an exemplary molecular profile 210a of a healthy individual (solid line) compared to a molecular profile 210b of an individual with a cancerous and/or pre-cancerous disease (dashed line). The molecular profiles 210a, b comprise or consist of all hydrogen peaks of the 1H NMR spectroscopy data 200.

[0076] Furthermore, the molecular profiles 210a, b shown in Figure 2 comprise groups or peaks 220 that can be assigned to metabolites, proteins, amino acids, micromolecules, and macromolecules that are assigned to and/or associated with the hydrogen peaks shown in the respective molecular profiles 210a, b.

[0077] As can be seen from the comparison of the molecular profile 210a with the molecular profile 210b, the molecular profiles 210a, b of healthy individuals and of individuals with a cancerous and/or pre-cancerous disease differ in several ranges of the chemical shift. Such differences may be referred to herein as molecular signatures associated with a cancerous and/or pre-cancerous disease. Based on these differences and/or molecular signatures, the ML model 114 and/or the computing device 100 may determine the cancer risk score, as described in more detail above and below.
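The comparison of the two profiles in Figure 2 can be illustrated with a minimal sketch. It is not the method of the disclosure, in which a trained ML model evaluates the profiles; it only shows how chemical-shift ranges in which two profiles differ could be listed. The function name `differing_ranges`, the fixed `threshold`, and the sample values are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def differing_ranges(
    shifts_ppm: Sequence[float],
    healthy: Sequence[float],
    diseased: Sequence[float],
    threshold: float,
) -> List[Tuple[float, float]]:
    """Return (start_ppm, end_ppm) ranges where the two profiles differ.

    A point counts as differing when the absolute intensity difference
    exceeds `threshold` (a hypothetical, user-chosen value).
    """
    if not (len(shifts_ppm) == len(healthy) == len(diseased)):
        raise ValueError("all inputs must have the same length")

    ranges: List[Tuple[float, float]] = []
    start = None  # index where the current differing run began
    for i, (h, d) in enumerate(zip(healthy, diseased)):
        differs = abs(d - h) > threshold
        if differs and start is None:
            start = i
        elif not differs and start is not None:
            ranges.append((shifts_ppm[start], shifts_ppm[i - 1]))
            start = None
    if start is not None:  # run extends to the end of the spectrum
        ranges.append((shifts_ppm[start], shifts_ppm[-1]))
    return ranges


# Made-up intensities on a coarse chemical-shift grid.
shifts = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
profile_a = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
profile_b = [1.0, 1.5, 1.6, 1.0, 1.0, 2.0]
signature_ranges = differing_ranges(shifts, profile_a, profile_b, threshold=0.3)
```

With these made-up values, the ranges 1.0 to 1.5 ppm and 3.0 ppm are reported as differing.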


[0078] For example, a peak 222 attributable to choline phospholipid is present in the molecular profile 210b of the individual with cancer, whereas this peak is absent in the profile 210a of the healthy individual. The proportions of the taurine peak 220, the glutamine peak 224, the glutamine and glutamate peak 226, the amino acid peaks 228, the lipid and fatty acid peaks 230, 232, and others also differ between the molecular profile 210a of the healthy individual and the molecular profile 210b of the individual with a cancerous and/or pre-cancerous condition. One or more of such differences may be used by the ML model 114 and/or the computing device 100 to determine the cancer risk score, as described in more detail above and below.

[0079] Figure 3A shows a flowchart illustrating the steps of a computer-implemented method for determining a cancer risk score according to an exemplary embodiment. Figures 3B to 3D each show 1H NMR spectroscopy data 300a, 300b, 300c that can be used as input to the method of Figure 3A. In particular, Figures 3B to 3D each show the NMR signal intensity in arbitrary units as a function of chemical shift in parts per million (ppm). The 1H NMR spectroscopy data 300a of Figure 3B correspond to or show a molecular profile 310a associated with a healthy individual. The 1H NMR spectroscopy data 300b of Figure 3C correspond to or show a molecular profile 310b associated with an individual with a pre-cancerous condition, such as pancreatitis, and the 1H NMR spectroscopy data 300c of Figure 3D correspond to or show a molecular profile 310c associated with an individual with a cancerous disease, such as pancreatic cancer. In the following, reference may be made generally to Figures 1 to 3D.

[0080] The method comprises a step S1 of obtaining 1H NMR spectroscopy data 300a-c for a biofluid sample of an individual at a computing device 100, wherein the 1H NMR spectroscopy data 300a-c indicate an NMR signal intensity as a function of chemical shift. Optionally, step S1 may comprise accessing the 1H NMR spectroscopy data 300a-c at, and/or retrieving them from, a data store 116 of the computing device 100 and/or an external computing device 500.
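Step S1 can be pictured with a small sketch, given under stated assumptions: the `Spectrum` container, the `obtain_spectrum` function, the dictionary standing in for the data store 116, and the callable standing in for the external computing device 500 are all hypothetical names chosen for illustration and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class Spectrum:
    """NMR signal intensity (arbitrary units) as a function of chemical shift (ppm)."""

    shifts_ppm: Tuple[float, ...]
    intensities: Tuple[float, ...]

    def __post_init__(self) -> None:
        if len(self.shifts_ppm) != len(self.intensities):
            raise ValueError("shifts_ppm and intensities must have equal length")


def obtain_spectrum(
    sample_id: str,
    local_store: Dict[str, Spectrum],
    fetch_remote: Optional[Callable[[str], Optional[Spectrum]]] = None,
) -> Spectrum:
    """Look up a spectrum locally first, then fall back to an external source."""
    spectrum = local_store.get(sample_id)
    if spectrum is None and fetch_remote is not None:
        spectrum = fetch_remote(sample_id)
    if spectrum is None:
        raise KeyError(f"no spectrum available for sample {sample_id!r}")
    return spectrum


# Hypothetical usage: one sample is stored locally, another only remotely.
local = {"sample-1": Spectrum((0.5, 1.0), (10.0, 12.0))}
remote = {"sample-2": Spectrum((0.5, 1.0), (9.0, 15.0))}
first = obtain_spectrum("sample-1", local, remote.get)
second = obtain_spectrum("sample-2", local, remote.get)
```

Checking the local store first is only one possible ordering; the disclosure leaves open whether the data come from the data store, an external device, or both.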
[0081] In a further step S2, the method comprises evaluating and/or analyzing the 1H NMR spectroscopy data 300a-c with at least one trained machine learning model 114 of the computing device with respect to a molecular profile 310a-c of the biofluid sample, wherein the molecular profile 310a-c contains all hydrogen peaks in the 1H NMR spectroscopy data 300a-c that can be assigned to one or more of a metabolite, a protein, an amino acid, a micromolecule, and a macromolecule contained in the biofluid sample.

[0082] The method further comprises a step S3 of classifying the molecular profile 310a-c, on the basis of the evaluation with the at least one trained machine learning model 114, into at least a first class and a second class of molecular profiles 310a-c, wherein the first class is representative of molecular profiles 310a associated with healthy individuals, and the second class is representative of molecular profiles 310b, c associated with individuals having a cancerous and/or pre-cancerous condition.

[0083] In a further step S4, the method comprises determining, based on the classification of the molecular profile 310a-c, a cancer risk score indicating a probability of cancer occurrence in the individual. Optionally, step S4 may comprise calculating a probability of cancer occurrence in the individual.
Furthermore, the cancer risk score and/or the context information may optionally be output by the computing device 100, e.g., at the user interface 118.

[0084] As mentioned above, each of the 1H NMR spectroscopy data 300a-c shown in Figures 3B-3D may be used as input to determine a corresponding cancer risk score for the corresponding individual and/or biofluid sample. Using the spectrum 300a of a healthy individual, the cancer risk score may indicate a high probability, greater than 50%, that the molecular profile 310a is associated with the first class of molecular profiles of a healthy or non-cancerous individual. On the other hand, using the 1H NMR spectroscopy data 300b, c of Figures 3C and 3D, the cancer risk score may indicate a high probability, greater than 50%, that the molecular profile 310b, c is associated with the second class of molecular profiles of an individual with a cancerous and/or pre-cancerous condition. Optionally, different classes for cancerous and pre-cancerous conditions may be used, and the cancer risk score may indicate whether the individual has a cancer, such as pancreatic cancer (Figure 3D), or a pre-cancerous condition, such as pancreatitis (Figure 3C). For example, a cancer risk score above 0.5 may indicate a cancer, while a cancer risk score below 0.5 may indicate a healthy individual and/or an individual with a pre-cancerous condition.

[0085] With the method described here, for example, stage I-IV pancreatic cancer can be distinguished from chronic pancreatitis in a binary classification with an accuracy of over 95% under 10-fold cross-validation.

[0086] The method for determining the cancer risk score, related aspects, and advantages are summarized below. The method described here can accurately detect cancer, cancerous conditions, and/or pre-cancerous conditions that are usually diagnosed only when it is too late and survival rates are low. The method involves a combination of high-frequency 1H NMR (nuclear magnetic resonance) spectroscopy and machine learning applied to a biofluid sample, e.g., a blood sample. As a result, a cancer risk score is generated for the various cancerous and/or pre-cancerous conditions, enabling earlier detection compared to other known methods. In contrast to currently used or known techniques, the method described here can derive information about the individual's health status not only from known cancer-related signals or molecular signatures, but also from molecular signatures that have not yet been identified. Furthermore, the method described here is fast, scalable, and cost-effective, making it well suited for industrial use.

[0087] Cancer cells can influence the overall molecular composition of biofluids, such as blood, due to their altered metabolism and protein composition.
This influence can include changes in the proportion of specific proteins, metabolites, micro- and/or macromolecules that are either released directly by the cancer cells or whose release is directly or indirectly induced by them, e.g., via biological networks, as described, for example, with reference to Figure 2. To detect this change, high-frequency 1H NMR spectroscopy and/or the 1H NMR spectroscopy data 200, 300a-c are used, which can provide information about the chemical environment of each hydrogen atom in the biofluid sample and thus indirectly provide information about each individual molecule and its concentration. Depending on the structure of the molecule containing the corresponding hydrogen atom, a different chemical environment may be present, changing the chemical shift of that specific hydrogen atom. For these reasons, the 1H NMR spectroscopy data 200, 300a-c of the biofluid sample can provide an extremely high information density that can be used to distinguish between healthy individuals and individuals with a cancerous and/or pre-cancerous disease.

[0088] Currently, analysis methods are limited to signals that can be assigned to a specific molecule, e.g., a metabolite or a lipoprotein. The great advantage of the method described here may be that all hydrogen peaks in the 1H NMR spectrum are taken into account, thus allowing the entire molecular profile 310a-c of the individual's biofluid sample to be considered. Even in early stages, cancer cells can influence the metabolism and molecular profile, which can be reliably detected using the method of the present disclosure.

[0089] Early cancer detection is one of the greatest challenges in oncology and is therefore a dynamic, rapidly growing field. The method described here can be used to screen high-risk patients or individuals with predispositions to cancer characteristics.
Therefore, patients or individuals can provide a biofluid sample, e.g., a blood sample, to a physician, healthcare provider, or hospital, which can then be analyzed using the method described here.

[0090] For NMR analysis, the biofluid samples can be analyzed using a high-frequency NMR spectrometer with a pulse sequence similar to CPMGPR1D. The free induction decay (FID) can be converted by Fourier transformation into a spectrum, which is normalized to a reference peak (lactate, anomeric D-glucose doublet, etc.); the baseline can be corrected; and the spectrum can be converted into binned increments of less than or equal to 0.02 ppm, preferably less than or equal to about 0.01 ppm, more preferably less than or equal to about 0.006 ppm, for example about 0.00016 ppm, as shown in Figures 3B to 3D. This makes it possible to obtain a considerable amount of data for decoding the molecular composition or profile 310a-c of the biofluid sample and can provide information about all hydrogen atoms present, where each bin can represent a specific position in the spectrum.
[0091] While most other approaches to cancer or multicancer detection use liquid biopsy in combination with next-generation sequencing technologies, the method according to the present disclosure focuses not on nucleotide sequences, but on the molecular profile 310a-c of the individual's biofluid sample using 1H NMR spectroscopy, machine learning, and optionally biological reasoning. In particular, hydrogen peaks that cannot be attributed to specific metabolites, proteins, amino acids, micro- and/or macromolecules, but rather to an overlap of multiple molecules, proteins, nucleic acids, and/or other components or constituents of the biofluid sample, can be used to determine the cancer risk score. For this reason, the method according to the present disclosure can provide improved sensitivity and specificity of over 95% (in both preliminary and pilot study data), compared to approximately 60-80% for other approaches for detecting multiple cancers or tumor markers with even lower accuracy. [0092] The method according to the present disclosure can enable the discrimination of healthy and diseased patients or individuals with a cancer and/or pre-cancer disease. To train the at least one ML model 114, 1-HNMR spectroscopy data 200, 300a-c of a training dataset of preferably a plurality of individuals can be normalized. Subsequently, feature selection and engineering for features of the at least one ML model 114 can be performed. For example, one or more ranges in and/or bins of the chemical shift may be selected based on the normalized 1H NMR spectroscopy data 300a-c as features of the ML model 114 that provide a difference between healthy and cancer/pre-cancer samples and thus may represent the molecules, proteins, amino acids, micro- and/or macromolecules that reflect or indicate changes between healthy or non-cancerous individuals and cancer/pre-cancerous individuals. 
[0093] Optionally, methods such as principal components analysis (PCA) may be used to create additional constructed and/or extracted features for the at least one ML model 114. The identified features may include information from metabolites, amino acids, proteins, micro- and/or macromolecules. After feature selection, for example, by a p-value-based filter, the at least one ML model 114 may be trained using the selected, constructed and/or extracted features. [0094] A machine learning approach is used to train the at least one ML model 114 that can predict the molecular profile 310a-c of the healthy/non-cancerous and the diseased/cancerous/pre-cancerous patients or individuals. The use of the molecular profile 310a-c can enable the detection of cancer signals or signatures from the individual's biofluid sample. The at least one ML model 114

can be validated by cross-validation and optionally tested on a separate test dataset of 1H NMR spectroscopy data 200, 300a-c of one or more further individuals who differ, for example, from the individuals considered in the training dataset.

[0095] Accordingly, the capability of high-frequency NMR spectroscopy may be combined with a well-designed machine learning and feature identification approach to detect highly complex correlations in the 1H NMR spectroscopy data 200, 300a-c that may be characteristic of the individual's biofluid sample and/or the corresponding molecular profile 310a-c, which may enable the earliest possible detection of a cancerous and/or pre-cancerous disease.

[0096] For example, by outputting and/or calculating one or more classification results and/or probabilities for the respective classes, a cancer risk score may be determined, which may indicate the extent to which the molecular profile 310a-c resembles a particular health condition. Based on the cancer risk score, information can be provided about how likely the predicted health condition is.

[0097] Optionally, the at least one ML model 114 can be adapted by adjusting the cancer risk score cut-off values for groups of patients or individuals and for desired false-positive and/or false-negative rates, for example on the basis of the area under the receiver operating characteristic curve (AUC-ROC).

[0098] Figure 4 shows a flowchart illustrating the steps of a computer-implemented method for determining a cancer risk score according to an exemplary embodiment. The method of Figure 4 can be performed by the computing device 100 described with reference to Figure 1. In particular, Figure 4 illustrates the training of the ML model 114 of the computing device 100.

[0099] In a first step 400, raw 1H NMR spectroscopy data is acquired at an NMR frequency of about 500 MHz or above, e.g., with a pulse sequence similar to CPMG and/or CPMGPR1D.

[0100] In step 402, the 1H NMR spectroscopy data may be aligned and/or spectral processing may be performed, such as one or more of chemical shift referencing, phase adjustment or correction, and baseline correction of the raw 1H NMR spectroscopy data.

[0101] In a further step 404, the raw 1H NMR spectroscopy data may be divided into incremental chemical shift bins, each bin having a width of less than or equal to about 0.02 ppm, preferably less than or equal to about 0.01 ppm, more preferably less than or equal to about 0.006 ppm, for example about 0.00016 ppm.

[0102] In step 406, the binned 1H NMR spectroscopy data may be normalized to a median of the spectroscopy data.

[0103] In step 408, labels may be assigned to the binned 1H NMR spectroscopy data, e.g., 0 and 1 for a binary classification.
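
Steps 404 to 408 can be illustrated with a minimal sketch. It assumes that a processed spectrum is available as two equally long sequences of chemical shifts (in ppm) and intensities. The function names, the summation of intensities within a bin, the 0 to 10 ppm window, the use of the per-spectrum median, and the class names are illustrative assumptions and are not specified in the disclosure.

```python
from statistics import median


def bin_spectrum(ppm, intensity, bin_width=0.01, ppm_min=0.0, ppm_max=10.0):
    """Step 404 (sketch): sum intensities into fixed-width chemical-shift bins.

    The ppm window and the choice of a sum per bin are assumptions.
    """
    n_bins = round((ppm_max - ppm_min) / bin_width)
    bins = [0.0] * n_bins
    for shift, value in zip(ppm, intensity):
        if ppm_min <= shift < ppm_max:
            # Clamp the index to guard against floating-point round-up.
            idx = min(int((shift - ppm_min) / bin_width), n_bins - 1)
            bins[idx] += value
    return bins


def median_normalize(bins):
    """Step 406 (sketch): divide every bin by the median bin intensity."""
    m = median(bins)
    if m == 0:
        raise ValueError("median intensity is zero; cannot normalize")
    return [b / m for b in bins]


# Step 408 (sketch): binary labels; the class names are hypothetical.
LABELS = {"healthy": 0, "cancer_or_precancer": 1}


def make_training_row(ppm, intensity, class_name, **bin_kwargs):
    """Return (features, label) for one labelled spectrum."""
    features = median_normalize(bin_spectrum(ppm, intensity, **bin_kwargs))
    return features, LABELS[class_name]
```

Each row produced this way could then serve as one training example for the feature selection and model training steps; a narrower `bin_width` yields more, finer-grained features per spectrum.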

[0104] In step 410, feature selection, identification, and/or extraction for features of the ML model 114 may be performed as described above. For example, KBest may be used for feature selection.

[0105] In optional step 412, bin features may be discarded based on their ppm range, unsupervised clustering features may be computed, and undersampling and/or oversampling may be applied.

[0106] In step 414, scaling may be performed, e.g., with StandardScaler.

[0107] In step 416, the ML model 114 may be trained for binary or multi-class classification using a statistical machine learning algorithm, such as VotingClassifier, ensemble methods, logistic regression, or others.

[0108] In step 418, the trained ML model can be used at inference to generate a cancer risk score, which indicates, for example, a probability between 0 and 1 for a binary classification. The cancer risk score can act as a computer-implemented biomarker that informs about pathogenic molecular signatures in the biofluid sample of an individual.

[0109] Figure 5 shows a flowchart illustrating the steps of a computer-implemented method for determining a cancer risk score according to an exemplary embodiment. The method of Figure 5 can be performed by the computing device 100 described with reference to Figure 1. In particular, Figure 5 illustrates the inference of the ML model 114 of the computing device 100.

[0110] In a first step 500, raw 1H NMR spectroscopy data is acquired at an NMR frequency of about 500 MHz or above, e.g., with a pulse sequence similar to CPMG and/or CPMGPR1D.

[0111] In step 502, the 1H NMR spectroscopy data may be aligned and/or spectral processing may be performed, such as one or more of chemical shift referencing, phase adjustment or correction, and baseline correction of the raw 1H NMR spectroscopy data.

[0112] In a further step 504, the raw 1H NMR spectroscopy data may be divided into incremental chemical shift bins, each bin having a width of less than or equal to about 0.02 ppm, preferably less than or equal to about 0.01 ppm, more preferably less than or equal to about 0.006 ppm, for example about 0.00016 ppm.

[0113] In step 506, the binned 1H NMR spectroscopy data can be normalized to a median of the spectroscopy data.

[0114] In step 508, scaling can be performed, e.g., with StandardScaler.

[0115] In step 510, the trained ML model 114, which was trained, e.g., according to the method of Figure 4, can be used to generate a cancer risk score, which indicates, for example, a probability between 0 and 1 for a binary classification. The cancer risk score can serve as a computer-implemented biomarker that informs about pathogenic molecular signatures in the biofluid sample of an individual. For example, one of the
1H NMR spectroscopy data 310a-c shown in Figures 3B to 3D can be used as input in step 510 to calculate or determine the corresponding cancer risk score.

[0116] Figure 6 shows a flowchart illustrating the steps of a computer-implemented method for determining a cancer risk score according to an exemplary embodiment. The method of Figure 6 can be performed by the computing device 100 described with reference to Figure 1. Furthermore, the method shown in Figure 6 can be used to supplement step S3 of the method shown in Figure 3. Accordingly, the method of Figure 6 can comprise steps S1 to S4 of the method of Figure 3, wherein step S3 can be supplemented by the detailed steps shown in and described with reference to Figure 6.

[0117] In step 600 of the method of Figure 6, step S3 of classifying the molecular profile 310a-c into at least the first class and the second class of molecular profiles 310a-c comprises classifying the molecular profile 310a-c into at least a healthy class of molecular profiles 310a associated with healthy individuals and a non-healthy or diseased class of molecular profiles 310b, c associated with non-healthy or diseased individuals, based on evaluating the 1H NMR spectroscopy data 300a-c with respect to the molecular profile 310a-c of the biofluid sample with a first trained machine learning model. For clarity and simplicity, Figure 6 depicts a classification into the healthy class with reference number 602 and a classification into the non-healthy or diseased class with reference number 604.

[0118] Furthermore, in step 606, after determining in step 604 that the molecular profile 310b, c of the biofluid sample is associated with or lies in the non-healthy or diseased class of molecular profiles 310b, c, the method comprises classifying the molecular profile, based on evaluating the 1H NMR spectroscopy data with respect to the molecular profile of the biofluid sample with a second trained machine learning model, into a non-cancer class 310a of molecular profiles associated with non-cancerous individuals and a cancer class of molecular profiles 310b, c associated with cancerous and/or pre-cancerous individuals. For clarity and simplicity, Figure 6 depicts a classification into the non-cancer class with reference number 608 and a classification into the cancer class with reference number 610.

[0119] Optionally, in steps 600-604, a first cancer risk score can be determined, which indicates a probability that the molecular profile 310a is associated with or lies in the healthy class and/or the non-healthy or diseased class. Furthermore, optionally, in steps 606-610, a second cancer risk score can be determined, which indicates a probability that the molecular profile 310b, c is associated with or lies in the non-cancer class and/or the cancer class of molecular profiles 310b, c. Furthermore, optionally, in steps 606-610, the cancer risk score or a final cancer risk score can be determined, e.g., on the basis of the first and second cancer risk scores.
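
The two-stage classification of steps 600 to 610 can be sketched as follows. This is a minimal illustration under stated assumptions: each trained model is represented by a hypothetical callable that returns the probability of its positive class, a single hypothetical cut-off of 0.5 is used for both stages, and the final score is formed as the product of the two stage probabilities. The disclosure only states that the final score can be based on the first and second scores; it does not prescribe this rule.

```python
from typing import Callable, Optional, Sequence, Tuple

# A stage model maps a feature vector to the probability of its positive class.
StageModel = Callable[[Sequence[float]], float]


def cascade_risk(
    features: Sequence[float],
    diseased_model: StageModel,  # first model: P(non-healthy), steps 600-604
    cancer_model: StageModel,    # second model: P(cancer), steps 606-610
    threshold: float = 0.5,      # hypothetical cut-off used for both stages
) -> Tuple[str, float, Optional[float], Optional[float]]:
    """Return (label, first_score, second_score, final_score)."""
    first = diseased_model(features)
    if first < threshold:
        # Classified as healthy (602): the second model is not evaluated.
        return "healthy", first, None, None
    second = cancer_model(features)
    # Assumed combination rule: product of the two stage probabilities.
    final = first * second
    label = "cancer" if second >= threshold else "non-cancer"
    return label, first, second, final
```

Gating the second model on the first result mirrors the order of the flowchart; the cut-off could instead be tuned per stage and per patient group, as described for the cut-off values above.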

[0120] Furthermore, in step 612, after determining that the molecular profile of the biofluid sample is associated with or lies in the cancer class of molecular profiles, the method comprises classifying the molecular profile into a plurality of classes of molecular profiles, based on evaluating the 1H NMR spectroscopy data with respect to the molecular profile of the biofluid sample with a third trained machine learning model, wherein each class is associated with a particular type of cancerous and/or pre-cancerous disease, as indicated by reference numerals 614a, 614b, 614c in Figure 6.

[0121] Optionally, a third cancer risk score may be determined that indicates the probability that the molecular profile is associated with a particular type of cancer, such as pancreatic cancer, and/or a pre-cancerous condition, such as pancreatitis. Accordingly, a type of cancer, a cancerous disease, and/or a pre-cancerous condition can be determined using the method described herein.

[0122] Although the invention is shown and described in detail in the drawings and the above description, these illustrations and descriptions are to be considered illustrative or exemplary and not restrictive; the invention is not limited to the disclosed embodiments. Other variations of the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a study of the drawings, the disclosure, and the claims.

[0123] As used herein, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" does not exclude a plurality. The mere fact that certain measures are recited in dependent claims does not mean that a combination of these measures cannot be advantageous. Any reference signs in the claims should not be construed as limiting the scope.

[0124] As used herein, the phrase "indicative of" may mean "reflective of" and/or "comprising". Accordingly, an entity/element referred to herein as "indicative of [...]" may be used synonymously or interchangeably with the entity/element "comprising [...]" or the entity/element "reflective of [...]".

[0125] Furthermore, the terms "first", "second", "third" or "a)", "b)", "c)" and the like are used in the description and in the claims to distinguish between similar elements and not necessarily to describe a sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances, and that the embodiments of the invention described herein may be operated in orders other than those described or depicted herein.

[0126] Within the scope of the present invention, each specified numerical value is typically associated with an accuracy interval that the person skilled in the art understands as still ensuring the technical effect of the feature in question. Within the scope of the present invention, the deviation from the specified numerical value is in the range of ±10%, preferably ±5%. The aforementioned deviation of ±10%, preferably ±5%, is also expressed by the terms "about" and "approximately" as used herein in relation to a numerical value.

CLAIMS

1. Computer-implementiertes Verfahren zur Bestimmung eines Krebsrisiko-Scores für ein Individuum, wobei das Verfahren umfasst: 1. A computer-implemented method for determining a cancer risk score for an individual, the method comprising:

Erhalten, an einer Computervorrichtung (100), von Wasserstoff-1-Kernspinresonanz-, 1HNMR-Spektroskopiedaten (300a-c) für eine Biofluidprobe eines Individuums, wobei die 1H-NMRSpektroskopiedaten (300a-c) indikativ für eine NMR-Signalintensität als eine Funktion der chemischen Verschiebung sind; Obtaining, at a computing device (100), hydrogen 1-nuclear magnetic resonance, 1HNMR spectroscopy data (300a-c) for a biofluid sample of an individual, wherein the 1H-NMR spectroscopy data (300a-c) is indicative of an NMR signal intensity as a function of chemical shift;

Auswerten, mit mindestens einem trainierten maschinellen Lernmodell (114) der Computervorrichtung (100), der 1H-NMR-Spektroskopiedaten (300a-c) in Bezug auf ein molekulares Profil (310a-c) der Biofluidprobe, wobei das molekulare Profil alle Wasserstoffpeaks in den 1H-NMRSpektroskopiedaten enthält, die einem oder mehreren von einem Metaboliten, einem Protein, einer Aminosäure, einem Mikromolekül und einem Makromolekül zugeordnet werden können, die in der Biofluidprobe enthalten sind; Evaluating, with at least one trained machine learning model (114) of the computing device (100), the 1H NMR spectroscopy data (300a-c) with respect to a molecular profile (310a-c) of the biofluid sample, wherein the molecular profile includes all hydrogen peaks in the 1H NMR spectroscopy data that can be assigned to one or more of a metabolite, a protein, an amino acid, a micromolecule, and a macromolecule contained in the biofluid sample;

Klassifizieren, basierend auf dem Auswerten mit dem mindestens einen trainierten maschinellen Lernmodell (114), des molekularen Profils (310a-c) in mindestens eine erste Klasse und eine zweite Klasse von molekularen Profilen, wobei die erste Klasse repräsentativ für molekulare Profile (310a) ist, die mit gesunden Individuen assoziiert sind, und die zweite Klasse repräsentativ für molekulare Profile (310b, c) ist, die mit Individuen assoziiert sind, die eine krebsartige und/oder präkrebsartige Erkrankung haben; und Classifying, based on the evaluation with the at least one trained machine learning model (114), the molecular profile (310a-c) into at least a first class and a second class of molecular profiles, wherein the first class is representative of molecular profiles (310a) associated with healthy individuals and the second class is representative of molecular profiles (310b, c) associated with individuals having a cancerous and/or pre-cancerous disease; and

Determining, based on classifying the molecular profile, a cancer risk score that indicates a probability of cancer occurring in the individual.

2. The method according to the preceding claim, wherein determining the cancer risk score comprises calculating and/or determining the probability of cancer occurring in the individual based on the at least one trained machine learning model (114) of the computing device (100).

3. The method according to any one of the preceding claims, wherein classifying the molecular profile (310a-c) comprises determining and/or calculating a classification result indicative of a probability for the first class and/or the second class; and

wherein the cancer risk score is determined based on the classification result.

4. The method according to any one of the preceding claims, wherein the at least one machine learning model (114) is trained to determine a probability for a binary classification and/or multi-class classification of the molecular profile (310a-c) of the biofluid sample.

5. The method according to any one of the preceding claims, wherein the biofluid sample is selected from the group consisting of a blood serum sample, a blood plasma sample, a blood sample, a urine sample, and a cerebrospinal fluid sample.

6. The method according to any one of the preceding claims, wherein the 1H-NMR spectroscopy data (300a-c) are indicative of an NMR signal intensity in binned increments of chemical shift, each increment having a width of less than or equal to about 0.02 ppm, preferably less than or equal to about 0.01 ppm, more preferably less than or equal to about 0.006 ppm, for example about 0.00016 ppm.

7. The method according to any one of the preceding claims, further comprising:

Converting the 1H-NMR spectroscopy data (300a-c) into a binned data structure based on a mapping of NMR signal intensity to binned increments of chemical shift, each increment having a width of less than or equal to about 0.02 ppm, preferably less than or equal to about 0.01 ppm, more preferably less than or equal to about 0.006 ppm, for example about 0.00016 ppm.

8. The method according to any one of the preceding claims, further comprising: combining the 1H-NMR spectroscopy data (300a-c) into a plurality of incremental chemical shift bins, each bin having a width of less than or equal to about 0.02 ppm, preferably less than or equal to about 0.01 ppm, more preferably less than or equal to about 0.006 ppm, for example about 0.00016 ppm; and optionally normalizing the binned 1H-NMR spectroscopy data (300a-c) to a median of at least one subgroup of the bins.
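The binning and median normalization recited in claims 6 to 8 can be illustrated with a minimal sketch. It is not taken from the application: the function names `bin_spectrum` and `median_normalize`, the summation of intensities within a bin, and the 0 to 10 ppm default window are assumptions made only for illustration.

```python
import math
from statistics import median


def bin_spectrum(shifts_ppm, intensities, width_ppm=0.01, lo_ppm=0.0, hi_ppm=10.0):
    """Sum signal intensities into equal-width chemical-shift bins over [lo_ppm, hi_ppm).

    Summing within a bin is an assumption; the claims only fix the bin width.
    """
    n_bins = int(round((hi_ppm - lo_ppm) / width_ppm))
    bins = [0.0] * n_bins
    for shift, value in zip(shifts_ppm, intensities):
        idx = math.floor((shift - lo_ppm) / width_ppm)
        if 0 <= idx < n_bins:  # points outside the window are dropped
            bins[idx] += value
    return bins


def median_normalize(bins, subgroup=None):
    """Divide every bin by the median of a subgroup of bins (all bins by default)."""
    reference = bins if subgroup is None else [bins[i] for i in subgroup]
    ref_median = median(reference)
    if ref_median == 0:
        raise ValueError("median of the reference bins is zero")
    return [value / ref_median for value in bins]
```

Calling `bin_spectrum` with `width_ppm=0.01` corresponds to the "less than or equal to about 0.01 ppm" variant; the normalized list is the kind of binned data structure that claim 9 describes as classifier input.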

9. The method according to any one of the preceding claims, wherein the at least one trained machine learning model (114) is a machine-learned classifier configured to process, as input data, the 1H-NMR spectroscopy data (300a-c) in a data structure of binned increments of chemical shift.

10. The method according to any one of the preceding claims, wherein the at least one machine learning model (114) is trained and/or configured to identify at least one molecular signature of a cancerous and/or pre-cancerous disease in the molecular profile (310a-c) of the biofluid sample.

11. The method according to any one of the preceding claims, wherein the determined cancer risk score is usable as a computational biomarker that indicates a pathogenic molecular signature and/or a molecular signature of a cancerous and/or pre-cancerous disease in the molecular profile (310a-c) of the biofluid sample.

12. The method according to any one of claims 10 and 11, wherein identifying the at least one molecular signature includes identifying one or more bins in the 1H-NMR spectroscopy data (300a-c) that are associated with and/or contain one or more hydrogen peaks indicating a cancer-related alteration of the molecular profile (310b, c) with respect to molecular profiles (310a) associated with healthy individuals.

13. The method according to any one of claims 10 to 12, wherein identifying the at least one molecular signature includes identifying one or more bins and/or a chemical shift range in the 1H-NMR spectroscopy data (300a-c) associated with an overlap of hydrogen peaks attributable to a plurality of metabolites, proteins, amino acids, micromolecules, and macromolecules contained in the biofluid sample; and/or

wherein identifying the at least one molecular signature comprises identifying one or more bins and/or a chemical shift range in the 1H-NMR spectroscopy data (300a-c) that cannot be assigned to individual metabolites, proteins, amino acids, micromolecules, and macromolecules contained in the biofluid sample.

14. The method according to any one of claims 10 to 13, wherein the at least one molecular signature is associated with a cancer-related change, relative to a biofluid sample of one or more healthy individuals, in the proportion of one or more hydrogen peaks in the 1H-NMR spectroscopy data (300a-c) that are associated with one or more metabolites, proteins, amino acids, micromolecules, and macromolecules contained in the biofluid sample.

15. The method according to any one of the preceding claims, wherein evaluating the 1H-NMR spectroscopy data (300a-c) with respect to the molecular profile (310a-c) comprises analyzing all hydrogen peaks in the 1H-NMR spectroscopy data (300a-c) that are associated with one or more metabolites, proteins, amino acids, micromolecules, and macromolecules contained in the biofluid sample.

16. The method according to any one of the preceding claims, wherein evaluating the 1H-NMR spectroscopy data (300a-c) with respect to the molecular profile (310a-c) comprises detecting a pattern of hydrogen peaks associated with a cancer-related change in the proportion of one or more hydrogen peaks in the 1H-NMR spectroscopy data (300a-c) with respect to a biofluid sample of one or more healthy individuals; and/or

wherein the at least one machine learning model is trained to recognize a pattern of hydrogen peaks that is associated with a cancer-related change in the proportion of one or more hydrogen peaks in the 1H-NMR spectroscopy data (300a-c) with respect to a biofluid sample of one or more healthy individuals.

17. The method according to any one of the preceding claims, wherein obtaining the 1H-NMR spectroscopy data (300a-c) comprises acquiring raw 1H-NMR spectroscopy data with an NMR spectrometer at a frequency above about 500 MHz.

18. The method according to the preceding claim, further comprising spectral processing of the raw 1H-NMR spectroscopy data and/or peak alignment of the raw 1H-NMR spectroscopy data.

19. The method according to the preceding claim, wherein the spectral processing includes one or more of chemical shift referencing, phase adjustment, and baseline correction of the raw 1H-NMR spectroscopy data.

20. The method according to any one of the preceding claims, further comprising one or more of normalizing, scaling, binning, and filtering the raw 1H-NMR spectroscopy data and/or the 1H-NMR spectroscopy data.

21. The method according to any one of the preceding claims, wherein the at least one machine learning model (114) is trained for the classifying using one or more statistical machine learning algorithms, which are preferably based on one or more of a voting classifier, an ensemble method, and logistic regression.
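One way to read the combination of a voting classifier with logistic regression in claim 21 is a soft-voting ensemble that averages the class probabilities of several logistic models. The following sketch is only an illustration under that assumption: the function names, the representation of a model as a `(weights, bias)` pair, and the unweighted average are hypothetical and not specified in the claims.

```python
import math


def logistic_probability(weights, bias, features):
    """Probability of the positive (second) class from one logistic regression model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Numerically stable sigmoid: never exponentiates a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_vote(models, features):
    """Average the positive-class probabilities of several (weights, bias) models."""
    if not models:
        raise ValueError("at least one model is required")
    probabilities = [logistic_probability(w, b, features) for w, b in models]
    return sum(probabilities) / len(probabilities)
```

Here `features` would be a binned, normalized spectrum, and the returned value is a classification result of the kind claim 3 describes as indicative of a probability for the second class.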

22. The method according to any one of the preceding claims, wherein classifying the molecular profile (310a-c) into at least the first class and the second class of molecular profiles comprises:

Classifying, based on the evaluation of the 1H-NMR spectroscopy data with respect to the molecular profile of the biofluid sample with a first trained machine learning model, the molecular profile into at least a healthy class of molecular profiles (310a) associated with healthy individuals and a non-healthy class of molecular profiles (310b, c) associated with non-healthy individuals; and

after determining that the molecular profile of the biofluid sample is associated with or lies in the non-healthy class of molecular profiles (310b, c), classifying, based on the evaluation of the 1H-NMR spectroscopy data with respect to the molecular profile of the biofluid sample with a second trained machine learning model, the molecular profile into a non-cancer class of molecular profiles (310a) associated with individuals without cancer and a cancer class of molecular profiles associated with cancerous and/or pre-cancerous individuals (310b, c);

wherein the first trained machine learning model and the second machine learning model differ from each other.

23. The method according to the preceding claim, further comprising:

Determining, based on the classification of the molecular profile (310a-c) of the biofluid sample with the first trained machine learning model, a first cancer risk score indicating a probability that the molecular profile (310a-c) is associated with or lies in the healthy class and/or the non-healthy class;

Determining, based on the classification of the molecular profile of the biofluid sample with the second trained machine learning model, a second cancer risk score indicating a probability that the molecular profile (310a-c) is associated with or lies in the non-cancer class and/or the cancer class of molecular profiles; and

Determining the cancer risk score based on the first and second cancer risk scores.
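The two-stage cascade of claims 22 and 23 can be sketched as follows. The claims do not state how the first and second scores are combined; the product used below, the 0.5 decision threshold, the fallback to the first score when the second stage is skipped, and all names are assumptions made for illustration only.

```python
def cascade_risk(first_model, second_model, spectrum, threshold=0.5):
    """Run a healthy/non-healthy model, then a non-cancer/cancer model if needed.

    first_model(spectrum)  -> assumed probability of the non-healthy class
    second_model(spectrum) -> assumed probability of the cancer class
    """
    first_score = first_model(spectrum)
    if first_score < threshold:
        # Profile falls in the healthy class: the second stage is not consulted.
        # Reporting the first score as the overall score is an assumption.
        return {"first": first_score, "second": None, "combined": first_score}
    second_score = second_model(spectrum)
    # Assumed combination: P(non-healthy) * P(cancer | non-healthy).
    combined = first_score * second_score
    return {"first": first_score, "second": second_score, "combined": combined}
```

Passing two different callables reflects the requirement that the first and second models differ from each other; a third, cancer-type model as in claim 24 could be chained in the same way.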

24. The method according to any one of claims 22 and 23, further comprising:

after determining that the molecular profile of the biofluid sample is associated with or lies in the cancer class of molecular profiles, classifying the molecular profile (310a-c) into a plurality of classes of molecular profiles based on the evaluated 1H-NMR spectroscopy data with respect to the molecular profile of the biofluid sample with a third trained machine learning model, each class being associated with a particular type of cancerous and/or pre-cancerous disease; and

optionally determining a third cancer risk score that indicates a probability that the molecular profile is associated with a particular type of cancerous and/or pre-cancerous disease.

25. A computer program which, when executed by a computing device (100), instructs the computing device (100) to carry out the method according to any one of the preceding claims.

26. A non-transitory computer-readable medium on which a computer program according to the preceding claim is stored.

27. A computing device (100) configured to carry out the method according to any one of claims 1 to 24.

Claims

AMENDED CLAIMS

1. A computer-implemented method for determining a cancer risk score for an individual, the method comprising:

Obtaining, at a computing device (100), hydrogen-1 nuclear magnetic resonance (1H-NMR) spectroscopy data (300a-c) for a biofluid sample of an individual, wherein the 1H-NMR spectroscopy data (300a-c) are indicative of an NMR signal intensity as a function of chemical shift;

Evaluating, with at least one trained machine learning model (114) of the computing device (100), the 1H NMR spectroscopy data (300a-c) with respect to a molecular profile (310a-c) of the biofluid sample, wherein the molecular profile includes all hydrogen peaks in the 1H NMR spectroscopy data that can be assigned to one or more of a metabolite, a protein, an amino acid, a micromolecule, and a macromolecule contained in the biofluid sample;

Classifying, based on the evaluation with the at least one trained machine learning model (114), the molecular profile (310a-c) into at least a first class and a second class of molecular profiles, wherein the first class is representative of molecular profiles (310a) associated with healthy individuals and the second class is representative of molecular profiles (310b, c) associated with individuals having a cancerous and/or pre-cancerous disease; and

Determining, based on classifying the molecular profile, a cancer risk score that indicates a probability of cancer occurring in the individual.

2. The method according to the preceding claim, wherein determining the cancer risk score comprises calculating and/or determining the probability of cancer occurring in the individual on the basis of the at least one trained machine learning model (114) of the computing device (100).

3. The method according to any one of the preceding claims, wherein classifying the molecular profile (310a-c) comprises determining and/or calculating a classification result indicative of a probability for the first class and/or the second class; and

wherein the cancer risk score is determined based on the classification result.

4. The method according to any one of the preceding claims, wherein the at least one machine learning model (114) is trained to determine a probability for a binary classification and/or multi-class classification of the molecular profile (310a-c) of the biofluid sample, wherein, for training the at least one ML model (114), 1H-NMR spectroscopy data (200, 300a-c) of a training data set from a plurality of individuals are normalized.

MOST RECENTLY SUBMITTED CLAIMS

5. The method according to any one of the preceding claims, wherein the biofluid sample is selected from the group consisting of a blood serum sample, a blood plasma sample, a blood sample, a urine sample, and a cerebrospinal fluid sample.

6. The method according to any one of the preceding claims, wherein the 1H-NMR spectroscopy data (300a-c) are indicative of an NMR signal intensity in binned increments of chemical shift, each increment having a width of less than or equal to about 0.02 ppm, preferably less than or equal to about 0.01 ppm, more preferably less than or equal to about 0.006 ppm, for example about 0.00016 ppm.

7. The method according to any one of the preceding claims, further comprising:

Converting the 1H-NMR spectroscopy data (300a-c) into a binned data structure based on a mapping of NMR signal intensity to binned increments of chemical shift, each increment having a width of less than or equal to about 0.02 ppm, preferably less than or equal to about 0.01 ppm, more preferably less than or equal to about 0.006 ppm, for example about 0.00016 ppm.

8. The method according to any one of the preceding claims, further comprising: combining the 1H-NMR spectroscopy data (300a-c) into a plurality of incremental chemical shift bins, each bin having a width of less than or equal to about 0.02 ppm, preferably less than or equal to about 0.01 ppm, more preferably less than or equal to about 0.006 ppm, for example about 0.00016 ppm; and normalizing the binned 1H-NMR spectroscopy data (300a-c) to a median of at least one subgroup of the bins.

9. The method according to claim 7, wherein the at least one trained machine learning model (114) is a machine-learned classifier configured to receive as input data the 1H-NMR spectroscopy data (300a-c) in a data structure of binned increments of chemical shift.

10. The method according to any one of the preceding claims, wherein the at least one machine learning model (114) is trained and/or configured to identify at least one molecular signature of a cancerous and/or pre-cancerous disease in the molecular profile (310a-c) of the biofluid sample.

11. The method according to claim 10, wherein the determined cancer risk score is usable as a computational biomarker that indicates a pathogenic molecular signature and/or a molecular signature of a cancerous and/or pre-cancerous disease in the molecular profile (310a-c) of the biofluid sample.

12. The method according to any one of claims 10 and 11, wherein identifying the at least one molecular signature includes identifying one or more bins in the 1H-NMR spectroscopy data (300a-c) that are associated with and/or contain one or more hydrogen peaks indicating a cancer-related alteration of the molecular profile (310b, c) with respect to molecular profiles (310a) associated with healthy individuals.

13. The method of any one of claims 10 to 12, wherein identifying the at least one molecular signature includes identifying one or more bins and/or a chemical shift range in the 1H NMR spectroscopy data (300a-c) associated with an overlap of hydrogen peaks attributable to a plurality of metabolites, proteins, amino acids, micromolecules, and macromolecules contained in the biofluid sample; and/or

wherein identifying the at least one molecular signature comprises identifying one or more bins and/or a chemical shift range in the 1H-NMR spectroscopy data (300a-c) that cannot be assigned to individual metabolites, proteins, amino acids, micromolecules, and macromolecules present in the biofluid sample.

14. The method according to any one of claims 10 to 13, wherein the at least one molecular signature is associated with a cancer-related change, relative to a biofluid sample of one or more healthy individuals, in the proportion of one or more hydrogen peaks in the 1H-NMR spectroscopy data (300a-c) that are associated with one or more metabolites, proteins, amino acids, micromolecules, and macromolecules contained in the biofluid sample.

15. The method according to any one of the preceding claims, wherein evaluating the 1H-NMR spectroscopy data (300a-c) with respect to the molecular profile (310a-c) comprises analyzing all hydrogen peaks in the 1H-NMR spectroscopy data (300a-c) that are associated with one or more metabolites, proteins, amino acids, micromolecules, and macromolecules contained in the biofluid sample.

16. The method according to any one of the preceding claims, wherein evaluating the 1H-NMR spectroscopy data (300a-c) with respect to the molecular profile (310a-c) comprises detecting a pattern of hydrogen peaks associated with a cancer-related change in the proportion of one or more hydrogen peaks in the 1H-NMR spectroscopy data (300a-c) with respect to a biofluid sample of one or more healthy individuals; and/or wherein the at least one machine learning model is trained to recognize a pattern of hydrogen peaks that is associated with a cancer-related change in the proportion of one or more hydrogen peaks in the 1H-NMR spectroscopy data (300a-c) with respect to a biofluid sample of one or more healthy individuals.

17. The method according to any one of the preceding claims, wherein obtaining the 1H-NMR spectroscopy data (300a-c) comprises acquiring raw 1H-NMR spectroscopy data with an NMR spectrometer at a frequency above about 500 MHz.

18. The method according to the preceding claim, further comprising spectral processing of the 1H NMR spectroscopy raw data and/or peak alignment of the 1H NMR spectroscopy raw data.

19. The method according to the preceding claim, wherein the spectral processing comprises one or more of chemical shift referencing, phase adjustment, and baseline correction of the raw 1H-NMR spectroscopy data.

20. The method according to claim 17, further comprising one or more of normalizing, scaling, binning, and filtering the raw 1H-NMR spectroscopy data and/or the 1H-NMR spectroscopy data.

21. The method according to any one of the preceding claims, wherein the at least one machine learning model (114) is trained for the classifying using one or more statistical machine learning algorithms, which are preferably based on one or more of a voting classifier, an ensemble method, and logistic regression.

22. The method according to any one of the preceding claims, wherein classifying the molecular profile (310a-c) into at least the first class and the second class of molecular profiles comprises:

Classifying, based on the evaluation of the 1H NMR spectroscopy data relating to the molecular profile of the biofluid sample with a first trained machine learning model, the molecular profile into at least one healthy class of molecular profiles (310a) associated with healthy individuals and a non-healthy class of molecular profiles (310b, c) associated with non-healthy individuals; and

after determining that the molecular profile of the biofluid sample is associated with or lies in the non-healthy class of molecular profiles (310b, c), classifying, based on evaluating the 1H-NMR spectroscopy data with respect to the molecular profile of the biofluid sample with a second trained machine learning model, the molecular profile into a non-cancer class of molecular profiles (310a) associated with individuals without cancer and a cancer class of molecular profiles associated with cancerous and/or pre-cancerous individuals (310b, c);

wherein the first trained machine learning model and the second machine learning model differ from each other.

23. The method according to the preceding claim, further comprising:

determining, based on the classification of the molecular profile (310a-c) of the biofluid sample with the first trained machine learning model, a first cancer risk score indicating a probability that the molecular profile (310a-c) is associated with or lies in the healthy class and/or the non-healthy class;

determining, based on the classification of the molecular profile of the biofluid sample with the second trained machine learning model, a second cancer risk score indicating a probability that the molecular profile (310a-c) is associated with or lies in the non-cancer class and/or the cancer class of molecular profiles; and

determining the cancer risk score based on the first and second cancer risk scores.
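The two-stage classification of claims 22 and 23 can be sketched as a cascade. The claims do not state how the first and second cancer risk scores are combined; the sketch assumes, as one plausible choice, that the first model returns the probability of the non-healthy class, the second returns the probability of the cancer class given a non-healthy profile, and the overall score is their product. The 0.5 decision threshold, the handling of the healthy branch, and the name `cascade_risk_score` are likewise assumptions.

```python
from typing import Callable, Optional, Sequence, Tuple

# A model maps a feature vector to a probability in [0, 1].
Model = Callable[[Sequence[float]], float]


def cascade_risk_score(
    features: Sequence[float],
    first_model: Model,
    second_model: Model,
    threshold: float = 0.5,
) -> Tuple[float, Optional[float], float]:
    """Return (first_score, second_score, risk_score) for one sample.

    first_model: assumed to return P(non-healthy).
    second_model: assumed to return P(cancer | non-healthy).
    """
    first_score = first_model(features)
    if first_score < threshold:
        # Profile falls in the healthy class: the second model is not run.
        # Assumption: report P(non-healthy) as an upper bound on the risk.
        return first_score, None, first_score
    second_score = second_model(features)
    # Assumption: combine the two scores by multiplication.
    return first_score, second_score, first_score * second_score


# Stand-in models with fixed outputs, for demonstration only.
first, second, risk = cascade_risk_score(
    [0.25, 0.75],
    first_model=lambda f: 0.8,
    second_model=lambda f: 0.5,
)
```

With these stand-in models the sample is routed to the second stage and the combined score is 0.8 × 0.5 = 0.4. Other combination rules, such as a weighted average, would fit the claim wording equally well.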

24. The method according to any one of claims 22 and 23, further comprising:

after determining that the molecular profile of the biofluid sample is associated with or is within the cancer class of molecular profiles, classifying the molecular profile (310a-c) into a plurality of classes of molecular profiles based on the evaluated 1H NMR spectroscopy data relating to the molecular profile of the biofluid sample with a third trained machine learning model, each class being associated with a particular type of cancerous and/or pre-cancerous condition; and

determining a third cancer risk score that indicates a probability that the molecular profile is associated with a specific type of cancerous and/or pre-cancerous disease.
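For the third stage of claim 24, a multi-class model assigns the profile to one of several condition types. The sketch below assumes that such a model yields one raw score per type and converts these scores into probabilities with a softmax; the claim itself does not name this technique. The labels `type_A`, `type_B`, `type_C` and the score values are placeholders, not conditions or results from the application.

```python
import math


def softmax_probabilities(scores):
    """Convert a dict of raw per-class scores into probabilities summing to 1."""
    highest = max(scores.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(s - highest) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def third_risk_score(scores):
    """Return the most probable class and its probability."""
    probs = softmax_probabilities(scores)
    best = max(probs, key=probs.get)
    return best, probs[best]


# Placeholder raw scores from a hypothetical third model.
raw_scores = {"type_A": 2.0, "type_B": 0.0, "type_C": 0.0}
best_type, third_score = third_risk_score(raw_scores)
```

Here `best_type` is the label with the highest raw score and `third_score` is its softmax probability, which could be reported as the third cancer risk score.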

25. A computer program which, when executed by a computer device (100), instructs the computer device (100) to carry out the method according to any one of the preceding claims.

26. A non-transitory computer-readable medium on which a computer program according to the preceding claim is stored.


27. A computer device (100) configured to carry out the method according to any one of claims 1 to 24.
