
HK40041711A - Visualization of biomedical predictions - Google Patents

Visualization of biomedical predictions

Info

Publication number
HK40041711A
HK40041711A (application HK62021031861.9A)
Authority
HK
Hong Kong
Prior art keywords
prediction
score
range
model
scale
Prior art date
Application number
HK62021031861.9A
Other languages
Chinese (zh)
Other versions
HK40041711B (en)
Inventor
Thomas Schoedl
Stefanie Kaufmann
Markus Bundschus
Antonia STANK
Marius René GARMHAUSEN
Original Assignee
豪夫迈·罗氏有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 豪夫迈·罗氏有限公司
Publication of HK40041711A
Publication of HK40041711B

Description

Visualization of biomedical predictions
Technical Field
The present invention relates to the field of model-based biomedical prediction, and more particularly to the visualization of model-based biomedical predictions on the matrix display of a mobile device.
Background
Predictive modeling in the biomedical field faces multiple pressing problems: scientists and healthcare managers need to utilize more and more model-based predictors to accomplish various biomedical prediction tasks. For example, neural networks or support vector machines can be used to predict the most suitable cancer therapy for a patient, to identify suitable epitopes in a protein sequence, to identify specific patterns in a DNA sequence, to identify candidate drug targets in a library of molecules, or to predict the 3D structure of a protein. Statistical models and data from patient cohorts are used to predict whether a particular drug shows improved efficacy compared to drugs used in standard therapy, or whether the drug shows any efficacy. The great diversity of the predictive models is accompanied by the diversity of the corresponding user interfaces. Typically, only a small portion of the information originally generated by the predictive model will actually be displayed to the user. However, if only a binary "yes or no" result is presented to the user, valuable information about the certainty (i.e., reliability) of the prediction and/or the accuracy of the model on which the prediction is based may be lost. In the context of biomedical prediction, the loss of context information is an urgent issue in the biomedical field, where data, models and patients are often characterized by unique combinations of features and conditions and "grey value states", i.e., states that do not allow for an explicit determination of whether a patient/cell/tissue has a particular feature.
On the other hand, if the user interface of the predictive model outputs all the possible available context information, the interface is not suitable for handheld mobile devices, as the small screen limits the complexity of the data that can be displayed. Furthermore, it is impossible to display prediction results based on multiple models on a single screen without losing important context information. However, frequent trips between different workplaces, customers, hospitals, universities, and convention centers have become an integral part of the routine work of many scientists and healthcare providers who use cell phones to perform their work.
Another problem is that knowledge in the biomedical field accumulates rapidly, so predictive models become outdated very quickly.
Kundaje A, Middendorf M, Shah M, Wiggins CH, Freund Y, Leslie C, in "A classification-based framework for predicting and analyzing gene regulatory response", BMC Bioinformatics 2006; 7(Suppl 1):S5, doi:10.1186/1471-2105-7-S1-S5, describe a prediction framework in the biomedical field that visualizes prediction results as a combination of scatter plots and confusion matrices. Scatter plots are a specialized, model-specific visualization of results, and confusion matrices provide some information about the quality of the model. However, even experts often do not understand the information conveyed by a confusion matrix. Moreover, it is very difficult, if not impossible, to present the information contained in the scatter plots and matrices on the small screen of a mobile phone, and the output cannot easily be compared with other predictions of the same model or of other models.
US patent application US 2006/129326 A1 describes a method for improved control of clinical trials. After a clinical trial has begun, the data are periodically cleaned and processed to allow statistical analysis. The results include a predictive measure of the time at which, and the level to which, the study will reach one or more levels of statistical significance, allowing modifications to be made in the middle of the study.
Gayvert, Kaitlyn M., et al., in "A Data-Driven Approach to Predicting Successes and Failures of Clinical Trials", Cell Chemical Biology, Elsevier, Amsterdam, NL, Vol. 23, No. 10, 15 September 2016, pp. 1294-1301, note that prior to clinical trials it is difficult to identify compounds with adverse toxicity. The authors propose a "moneyball" approach that analyzes overlooked features to predict clinical toxicity: a new data-driven method (PrOCTOR) directly predicts the likelihood of toxicity in clinical trials.
Disclosure of Invention
It is an object of the present invention to provide an improved method for visualizing the certainty of model-based predictions on a matrix display of a battery-powered handheld mobile device and a corresponding mobile device as set forth in the independent claims. Embodiments of the invention are given in the dependent claims. If the embodiments of the present invention are not mutually exclusive, they may be freely combined with each other.
In one aspect, the invention relates to a method of visualizing the certainty of a biomedical model-based prediction on a matrix display of a battery-powered handheld mobile telecommunication device. The method includes receiving, by the mobile device, the prediction via a digital cellular mobile telecommunications network. The prediction results are generated by the program logic for the biomedical prediction task using the biomedical model. The prediction result comprises at least a prediction score, a first confidence interval and a second confidence interval.
The prediction score represents the certainty of the prediction and is a numerical value within the score range. The score range is a predefined range of possible score values. Preferably, the prediction score is a normalized prediction score and the predefined score range is a numerical score range, e.g. a score range between-1 and +1 or between 0 and 1.
The goal of the prediction may be, for example, to evaluate the membership of a particular object in one of two classes (i.e., binary classification of the object into class "C0" or class "C1"). Each of the two categories is represented by one of the two boundaries of the range of possible score values, e.g. "0" and "1" for the score value range [0, 1], or "-1" and "+1" for the score value range [-1, +1]. The prediction score indicates how certain (reliable) this assessed membership is. An "ideal" (absolutely reliable) score value is always the maximum or minimum possible score value within the range of score values. For example, the "ideal" score may be 0 or 1 (if the range of possible score values is [0, 1]), or -1 or +1 (if the range of possible score values is [-1, +1]). The further the calculated prediction score is from the nearest of the two score range boundaries (i.e., the closer the score is to 0.5 in the case of the range [0, 1], or to 0 in the case of the range [-1, +1]), the less certain the predicted membership represented by the nearest score range boundary. For example, category C0 may represent the prediction result "the drug will be FDA approved for the treatment of a particular disease", and category C1 may represent the prediction result "the drug will not be FDA approved for the treatment of that particular disease".
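The mapping from a score to a predicted class and an intuitive certainty measure described above can be sketched as follows. This is an illustrative sketch, not the patent's implementation; the function and class names are assumptions.

```python
# Hypothetical sketch: map a prediction score within a score range onto a
# predicted class and a "certainty" measure (distance from the range midpoint,
# normalized so 0 = midpoint, 1 = a range boundary).

def classify(score: float, lo: float = 0.0, hi: float = 1.0):
    """Return (predicted class, certainty in [0, 1]) for a score in [lo, hi]."""
    mid = (lo + hi) / 2.0
    label = "C1" if score > mid else "C0"      # nearest boundary determines the class
    certainty = abs(score - mid) / (hi - mid)  # 0 at the midpoint, 1 at a boundary
    return label, certainty

print(classify(0.95))  # close to 1 -> confident "C1"
print(classify(0.55))  # near the midpoint 0.5 -> very uncertain
```

The same function works for a [-1, +1] range by passing `lo=-1.0, hi=1.0`.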
The first confidence interval is a first sub-interval of the score range. The first confidence interval indicates a model-specific score value subrange known to have a False Negative (FN) prediction percentage below a predefined FN percentage threshold.
The second confidence interval is a second sub-interval of the score range. The second confidence interval indicates a sub-range of model-specific score values known to have a percentage of False Positive (FP) predictions below a predefined FP percentage threshold.
The method further includes displaying, by the mobile device, an analog scale icon on a matrix display of the mobile device. The analog scale icon includes a background region that includes the prediction score. The analog scale icon further includes an analog scale, a pointer, a first sub-range indicator, and a second sub-range indicator.
The analog scale represents the score range. The two ends of the scale represent the maximum and minimum score values of the score range.
The pointer points to a location within the scale representing the predicted score.
The first sub-range indicator is aligned with the scale such that the size and position of the first sub-range indicator relative to the scale indicates the size and position of the first confidence interval within the range of scores.
The second sub-range indicator is aligned with the scale such that the size and position of the second sub-range indicator relative to the scale indicates the size and position of the second confidence interval within the range of scores.
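The geometry of these icon elements can be illustrated with a small sketch. Assuming (as one plausible layout, not mandated by the text) a semicircular scale spanning 180° from the minimum to the maximum score value, the pointer direction and sub-range arcs follow directly from the score range:

```python
import math

# Hedged sketch: map scores and confidence intervals onto a semicircular
# speedometer-style scale (180 degrees at the minimum score, 0 at the maximum).

def to_angle(value, lo=0.0, hi=1.0):
    """Map a score value onto the semicircle: lo -> 180 degrees, hi -> 0 degrees."""
    frac = (value - lo) / (hi - lo)
    return 180.0 * (1.0 - frac)

def pointer_tip(score, cx, cy, r, lo=0.0, hi=1.0):
    """(x, y) of the pointer tip on a semicircle of radius r centred at (cx, cy)."""
    a = math.radians(to_angle(score, lo, hi))
    return cx + r * math.cos(a), cy - r * math.sin(a)

print(to_angle(0.0))  # 180.0 (left end of the scale)
print(to_angle(1.0))  # 0.0   (right end of the scale)
print(to_angle(0.7))  # 54.0  (pointer direction for score 0.7)
```

A sub-range indicator for a confidence interval `(a, b)` would simply be the arc between `to_angle(b)` and `to_angle(a)`, drawn alongside the scale.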
This feature may be advantageous because the analog scale icons may allow any user to immediately and intuitively understand the results of the predictions and the quality and certainty of the predictions and the accuracy of the model on which the predictions are based, without involving conscious reading, understanding and interpretation of any numerical value or range of values: the score value is a numerical value that cannot be intuitively interpreted due to limitations of human physiology. Furthermore, understanding the score values may be hindered by a lack of experience and expertise in the machine learning arts. To overcome the human physiological limitations of numerical perception, and to fully understand the meaning of score values and the reliability of score values, users were previously forced to consciously read score values, consciously remember the range of possible score values, compare score values to that range, and decide whether the predicted result should be interpreted as a confirmation or rejection of a particular hypothesis. Furthermore, in order to assess the accuracy of the model used to perform the prediction, the user must previously navigate to other pages or display areas that include other values indicative of the quality of the model, read the values, understand them, and interpret the prediction results with reference to this additional information.
By contrast, by providing an analog scale and a pointer to a predicted score value within the scale, a visualization of the predicted result is provided that can be quickly and intuitively understood without involving conscious reading, understanding, and interpreting of the value or range of values. Thus, the understanding of the information conveyed in the analog scale icon does not depend on psychological or other subjective factors, but rather on the relative positions of the pointer, the scale, and the two sub-range indicators aligned with the scale. The relative position of objects represents a form of data representation that can be processed quickly, since the physiology of the human brain has evolved over millions of years in a world full of moving objects, whose rapid interpretation is often critical to survival.
In another advantageous aspect, the information encoded in the model prediction results is presented such that it can be recognized and intuitively understood even when presented on a small screen. Even in the case where the screen is too small to allow the user to read and recognize the numerical values and prediction scores encoded in the scale, and even in the case where the scale does not include any scale values at all, the user can intuitively evaluate the prediction result simply based on the direction of the pointer relative to the analog scale. In case the display allows values to be presented large enough to be human-readable, the visualization in the form of an analog scale still improves the readability of the values, because the pointer provides visually recognizable information that may allow the user to immediately recognize that a value erroneously read as "0.7" at first sight is actually "0.1", since the pointer points to a low value (e.g. 0) close to the minimum of the scale instead of a high value (e.g. 1) close to the maximum of the scale.
In another advantageous aspect, the analog scale icon includes first and second sub-range indicators that indicate the scale regions representing predicted scores that are considered particularly reliable, since in these sub-ranges only a very low proportion of false positive and false negative prediction results is expected. These sub-ranges are model-specific, and the width of the sub-ranges may be different for false positive results and false negative results. Thus, the width of the first and second sub-ranges informs the user of the quality of the model (in general, and independently of the quality and certainty of the current prediction).
For example, in predictions calculated based on a particular model, a score value of "0" typically represents the event (classification result) "negative" and a score value of "1" represents the event (classification result) "positive". A score value of "0" may represent the first category "C0" and define the lower bound of the first confidence interval. A score value of "1" may represent the second category "C1" and define the upper bound of the second confidence interval.
In the case where the score value is normalized between -1 and +1, rather than between 0 and 1, the score value "-1" may represent the first category "C0" and define the lower bound of the first confidence interval. The score value "+1" may represent the second category "C1" and define the upper bound of the second confidence interval.
According to another example, the user may receive a prediction score of 0.7 based on model M1. This score is a highly reliable prediction result because the second sub-range of the M1 model covers the range from 0.6 to 1.0, which means that the risk of a false positive prediction is below 10% if the score value is equal to or higher than 0.6. Since the second sub-range is rather large, the user can quickly and intuitively understand the high quality of the M1 model in terms of false positive results. The high quality of the prediction can be evaluated intuitively because the pointer points to a position on the scale that is aligned with and covered by the second sub-range indicator. The user may then receive a prediction score of 0.7 based on a different model M2. This score is not a reliable prediction result because the second sub-range of the M2 model covers only the range from 0.9 to 1.0, which means that the risk of a false positive prediction is below 10% only if the score value is equal to or higher than 0.9. Since the second sub-range is rather small, the user can quickly and intuitively understand the low quality of the M2 model in terms of false positive results. The low quality of the prediction can be evaluated intuitively because the pointer points to a location on the scale that is not aligned with or covered by the second sub-range indicator.
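The M1/M2 example above reduces to a simple containment check of the score against each model's low-false-positive sub-range. A minimal sketch (illustrative names, not patent code):

```python
# Restatement of the M1/M2 example: the same score 0.7 is a reliable positive
# under M1 but not under M2, because the models' low-FP sub-ranges differ.

def is_reliable_positive(score, fp_interval):
    """True if the score lies in the model's low-false-positive sub-range."""
    low, high = fp_interval
    return low <= score <= high

m1_fp = (0.6, 1.0)  # M1: FP rate < 10% for scores >= 0.6
m2_fp = (0.9, 1.0)  # M2: FP rate < 10% only for scores >= 0.9

print(is_reliable_positive(0.7, m1_fp))  # True  -> pointer covered by the indicator
print(is_reliable_positive(0.7, m2_fp))  # False -> pointer outside the indicator
```

Visually, this check corresponds to whether the pointer lands inside the arc drawn by the second sub-range indicator.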
In another advantageous aspect, the analog scale icon provides sufficient contextual information to allow the user to assess the quality of the particular prediction received and the quality of the model used for the prediction, while still fitting on the small matrix display of a handheld portable mobile device. Thus, the user does not have to scroll or otherwise navigate to additional GUI panes or pages that include additional model-related information (e.g., confusion matrices, statistical quality parameters of the model, etc.). Instead, the analog scale icon provides the prediction result and context information in a single dense (compact) view, allowing users to intuitively assess the quality and certainty of individual predictions and the overall quality and accuracy of the model. The visualization may be applied to the output of any biomedical model that generates a prediction score and for which score sub-ranges are known within which the share of false positive or false negative predictions is below a predetermined threshold, for example below 10% or below 5%. Thus, predictions of many different biomedical models, or of many different versions of the same model, can be visualized and easily and intuitively compared to each other.
According to some examples, the method is performed as a subroutine in a semi-automatic process that determines whether a particular drug should continue to be studied, depending on the likelihood that the drug will later be approved by an authority for the treatment of a particular disease. Alternatively, an analog scale icon may be used to predict whether a particular protein sequence is a good candidate epitope, is a binding region for another molecule, etc., whereby further computational or wet-laboratory steps are performed to analyze a drug, protein, or any other biomedical object based on the prediction result. Thus, the representation of the prediction result in the form of an analog scale icon may speed up the guided human-computer interaction process.
According to some embodiments, the analog scale icon is a speedometer icon.
According to some embodiments, the background area is a speedometer dial area.
According to some embodiments, the scale is part of an outline of the background region.
According to some embodiments, the pointer originates from a center of the background region.
According to some embodiments, the background area is a semicircle.
According to some embodiments, the first and second sub-range indicators are each an arc, in particular a circular arc.
According to some embodiments, the predicted score is displayed in the center of the background region.
According to some embodiments, the analog scale icon further comprises a central region concentrically aligned with the background region. The central region shows the score value.
According to an embodiment, the prediction result further comprises a prediction variation interval. The prediction variation interval represents a sub-range of the score range. The width of the prediction variation interval is inversely related to the robustness of the model and the model-based prediction to small variations in the input data used to compute the prediction.
Thus, the prediction variation interval is an estimate of the robustness of the prediction generated by the model to small variations in the input data used to compute the prediction. Ideally, a small change in the input data should result in only a small change in the predicted outcome indicated by the score value.
The analog scale icon further includes a variation bar arranged perpendicular to the pointer. The width of the variation bar is related to the width of the prediction variation interval and indicates the width of the prediction variation interval. The variation bar is preferably arranged such that its center overlaps the axis of the pointer.
Typically, the variation bar describes the degree of variation (e.g., standard deviation) of the prediction score. The variation bar indicates the influence of a slight change in the input data for a specific prediction on the score value. The user may quickly identify an ideal/highly determined prediction by determining that a pointer indicating the predicted score value is within one of the confidence intervals and is associated with a change bar that is entirely within the range of the confidence intervals. Thus, complex information relating to the certainty of model-based predictions may be quickly and intuitively perceived by a user.
This feature may be advantageous because, although the orientation of the pointer relative to the scale indicates the outcome of the prediction, the width of the variation bar provides additional information about the "certainty" of that particular prediction. The wider/larger the variation bar, the wider the angle of possible pointer directions covered by the variation bar. A wide/large variation bar indicates that the trained model is very sensitive to small variations in the model training data, so the current prediction score (the direction of the pointer) is also not very robust to small variations in the input data. In other words, a wide/large variation bar indicates that the current prediction is not very reliable or robust.
For example, in response to a first prediction request by a user, a first prediction result is generated and displayed having a prediction score of 0.7 and a short variation bar indicating a small variability of the score along the scale (e.g., ranging from 0.69 to 0.71). In the case of a score value of 0.7, the score value and the direction of the pointer may indicate that a hypothesis like "drug X will be FDA approved for treating disease D" is predicted to be true. The range of 0.69 to 0.71 may be calculated based on a user-defined or otherwise defined prediction confidence level (e.g., 95%). The pointer may be covered by a short variation bar representing and indicating a width of 0.02 score units. The user immediately and intuitively understands that the certainty of this particular result is very high, because the variation bar is short.
In contrast, in response to a second prediction request by the user, a second prediction result is generated and displayed having a prediction score of 0.7 and a long variation bar indicating a wide variability of the score along the scale (e.g., ranging from 0.5 to 0.9). As in the first example, where the score value is 0.7, the score value and the direction of the pointer may indicate that a hypothesis like "drug X will be FDA approved for treating disease D" is predicted to be true. The range of 0.5 to 0.9 may likewise be calculated based on the above prediction confidence level (e.g., 95%). The pointer may be covered by a wide variation bar representing and indicating a width of 0.4 score units. In this case, the user can easily understand that the certainty of this specific score is significantly lower, because the variation bar of the second prediction is larger than that of the first prediction. A large variation bar indicates that a small change in the input parameter values may have a large impact on the calculated prediction score, and/or that retraining the model-based predictor on a slightly different training data set may have a large impact on the prediction score calculated for the input data currently in use.
As used herein, a "prediction variation interval" is a metric used to quantify the amount of variation or dispersion of a set of prediction scores computed by model-based prediction logic. A small prediction variation interval indicates a small amount of dispersion, and a large prediction variation interval indicates a large amount of dispersion.
For example, the training data used to generate and train the model-based predictor may be a training data set obtained by sampling from a superset of training data. Data sampling is a technique that selects a subset of the training set at each epoch. This can be a way to make each epoch smaller, or to select the relevant training sequences for each epoch. It is also often done when processing very large data sets, in which case the entire data set need not be loaded into memory for each epoch. Thus, the "prediction variation interval" according to embodiments of the invention gives an estimate of the sampling variation relative to the training-data superset; in other words, the prediction variation interval indicates how much the prediction score computed by a particular model-based machine learning logic may change if it is trained on a new sample of the training set.
Typically, if the training data originally used to train the model-based prediction logic is large, high-quality training data, i.e., includes a sufficient number of true positives and true negatives, has no large bias, and accurately represents the "real-world" composition, the prediction score computed by the model-based predictor will be robust in the face of small variations in the training data set. For example, the training set used to train the model-based prediction logic may be a sampled training set, and the prediction variation interval according to embodiments of the invention gives an estimate of the degree of sampling variation, e.g., for a random forest; in other words, the prediction variation interval indicates how much the prediction of the model-based prediction logic (e.g., the prediction logic of a neural network or a random forest) will change when trained on a new training set. Typically, the width of the prediction variation interval of a prediction generated by an MLL that is a bagging (bootstrap-aggregating) learner depends on, and indicates, the degree of variation among the base learners.
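One common way to estimate such a sampling-driven variation interval is bootstrap resampling: retrain on resampled training sets and measure how much the score for a fixed input moves. The sketch below uses a deliberately toy "model" (the mean label of nearby training points) as a stand-in for the neural network or random forest the text mentions; all names are illustrative assumptions.

```python
import random
import statistics

# Hedged sketch of estimating a "prediction variation interval" by bootstrap:
# resample the training set, "retrain" the toy model, and record how the
# prediction score for a fixed query input x varies across resamples.

def toy_predict(train, x, width=0.2):
    """Toy model: mean label of training points within +/- width of x."""
    near = [y for (xi, y) in train if abs(xi - x) <= width]
    return sum(near) / len(near) if near else 0.5

def variation_interval(train, x, n_resamples=200, seed=0):
    rng = random.Random(seed)
    scores = [
        toy_predict([rng.choice(train) for _ in train], x)  # bootstrap resample
        for _ in range(n_resamples)
    ]
    sd = statistics.pstdev(scores)
    m = statistics.mean(scores)
    return (m - sd, m + sd)  # one-sigma interval around the mean score

# Cleanly separable toy data -> the score barely moves across resamples.
train = [(i / 20, 1 if i / 20 > 0.5 else 0) for i in range(21)]
lo, hi = variation_interval(train, 0.8)
print(round(hi - lo, 3))  # small width -> prediction robust to resampling
```

A wider interval here would correspond to the long variation bar discussed above.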
According to an embodiment, the width of the variation bar is equal to the chord length of a visible or invisible circle segment. The circle segment originates from the center of the background region. The two ends of the variation bar intersect the legs of the circle segment. The arc of the circle segment is the portion of the scale corresponding to the prediction variation interval.
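For a semicircular (180°) scale, this chord length follows from elementary geometry: if the variation interval covers a fraction of the score range, it subtends a proportional angle, and the chord is 2·r·sin(θ/2). A sketch under that assumption (names illustrative):

```python
import math

# Sketch of the chord geometry described above: the variation bar's width is
# the chord of the circle segment whose arc covers the prediction variation
# interval on a 180-degree semicircular scale of the given radius.

def variation_bar_width(interval, radius, lo=0.0, hi=1.0):
    """Chord length for a (low, high) variation interval on a semicircular scale."""
    frac = (interval[1] - interval[0]) / (hi - lo)  # share of the score range
    theta = math.pi * frac                          # subtended angle in radians
    return 2.0 * radius * math.sin(theta / 2.0)     # chord = 2 r sin(theta/2)

# Narrow interval (0.69, 0.71) vs wide interval (0.5, 0.9), radius 100 px:
print(round(variation_bar_width((0.69, 0.71), 100), 2))  # → 6.28
print(round(variation_bar_width((0.50, 0.90), 100), 2))  # → 117.56
```

This reproduces the intuition of the two examples above: the 0.02-unit interval yields a short bar, the 0.4-unit interval a bar almost spanning the dial.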
According to an embodiment, the method further comprises automatically generating, by the program logic, the predicted outcome using the biomedical model.
According to an embodiment, the program logic is installed on a server computer system. The method further comprises: automatically generating, by the program logic, the prediction result using the biomedical model; and sending the prediction result to the mobile device via the network. Alternatively, the method includes sending a message to the mobile device via the network informing the mobile device that the prediction result has been generated, and downloading, by the mobile device, the prediction result from the server computer.
According to an embodiment, the program logic is trained machine learning logic.
According to an embodiment, the method further comprises repeatedly receiving training data. Each received training data update includes at least some data not included in previously received training data. Each time training data is received, the method includes automatically retraining machine learning logic based on the currently received training data, thereby automatically generating an updated version of the biomedical model.
This may be advantageous because the amount of data and knowledge available in many areas of biology and medicine is increasing rapidly. Therefore, the prediction result may be outdated quickly. By automatically retraining the machine learning logic based on newly obtained information and automatically triggering the recalculation of the prediction results based on updated versions of the biomedical model, scientists and managers in the biomedical field can be assured that decisions can be made based on predictions provided by the latest biomedical model.
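The retrain-on-update cycle described above can be sketched in a few lines. This is a minimal illustrative sketch with assumed names (`on_training_update`, `train_fn`), not the patent's implementation:

```python
# Hedged sketch of the retraining cycle: whenever a training-data update
# arrives, retrain on all data seen so far and re-execute every registered
# prediction task against the updated model.

def on_training_update(updates, train_fn, tasks):
    """Yield (task, fresh prediction) pairs after each training-data update."""
    seen = []
    for batch in updates:          # each batch contains newly received records
        seen.extend(batch)
        model = train_fn(seen)     # retrain -> updated version of the model
        for task in tasks:
            yield task, model(task)

# Toy stand-in model: predicts the mean of the training data, ignoring the task.
train_fn = lambda data: (lambda task: sum(data) / len(data))
results = list(on_training_update([[1, 1], [0, 0]], train_fn, ["t1"]))
print(results)  # [('t1', 1.0), ('t1', 0.5)] - second update shifts the prediction
```

Real systems would replace `train_fn` with the machine learning logic's training routine and push the refreshed results to registered mobile devices, as the following embodiments describe.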
According to an embodiment, the biomedical model used by the machine learning logic is a first biomedical model generated based on first training data. The mobile device is one of a plurality of mobile devices respectively assigned to one of a plurality of users. The method further includes registering the plurality of users and a plurality of biomedical prediction tasks at a back-end program. For example, the back-end program may maintain and manage a user and prediction task registry. Each registered user is assigned one or more prediction tasks. The machine learning logic performs each prediction task to generate a respective first prediction result using the first biomedical model. The method includes selectively sending each first prediction result to the mobile device of the user to whom the corresponding prediction task is assigned. In response to each retraining of the machine learning logic, the machine learning logic automatically performs each prediction task again to generate a respective second prediction result using the updated version of the biomedical model. The method then includes selectively sending each second prediction result, or a notification of its computation, to the mobile device of the user to whom the corresponding prediction task is assigned.
According to some embodiments, the prediction tasks are performed based on many different types of models, such as literature-based models, microarray-data-based models, and the like. The user and task registry further includes assignments of prediction tasks to model IDs, and the back-end program is configured to select an appropriate model for each prediction task to be executed or re-executed based on the model IDs and task assignments in the registry.
This may be advantageous because, on the one hand, it is ensured that the latest available prediction results are always provided to a plurality of users for an arbitrary number of different prediction tasks. At the same time, network traffic is avoided, because only those users who have registered a prediction task for which the model update was performed are notified of the new prediction results.
According to some embodiments, the back-end program compares the first prediction result and the second prediction result calculated for each of the prediction tasks. The second prediction results, or notifications of their computation, are sent selectively only for those prediction tasks whose first and second prediction results satisfy one or more of the following conditions:
-the score value of the second predictor is within the first confidence interval and the score value of the first predictor is not within the first confidence interval; for example, this may mean that new predictions are observed to have entered a range of scores that are considered particularly reliable due to the FN fraction being small (e.g., FN ratio < 10%); or
-the score value of the first predictor is within the first confidence interval and the score value of the second predictor is not within the first confidence interval; for example, this may mean that new predictions are observed suddenly entering a scoring region that is considered unreliable due to more FN (e.g., a ratio of FN > 10%); or
-the score value of the first predictor is within the second confidence interval and the score value of the second predictor is not within the second confidence interval; for example, this may mean that new predictions are observed suddenly leaving a scoring area that is considered particularly reliable due to the FP parts being small (e.g., the ratio of FP < 10%); or
-the score value of the second predictor is within the second confidence interval and the score value of the first predictor is not within the second confidence interval; for example, this may mean that new predictions are observed suddenly entering a scoring area that is considered particularly reliable due to a low FP ratio (e.g. FP ratio < 10%); or
-the score values of the first and second predicted outcomes differ by more than a predefined score difference threshold; for example, this may mean that the model-based prediction suddenly improves or worsens significantly; or
-the magnitudes of the predicted variation intervals of the first and second predicted outcomes differ by more than a predefined interval length difference threshold. For example, this may mean that the quality of the model-based prediction suddenly improves or deteriorates significantly, e.g. due to variability changes in the training data.
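The notification filter described by the conditions above can be sketched as follows. This Python sketch is purely illustrative; the class names, field layout, and default difference thresholds are hypothetical and not part of any claimed embodiment.

```python
# Illustrative sketch of the back-end notification filter.
# Interval, PredictionResult, and the default thresholds are hypothetical names.
from dataclasses import dataclass

@dataclass
class Interval:
    lo: float
    hi: float

    def contains(self, value: float) -> bool:
        return self.lo <= value <= self.hi

@dataclass
class PredictionResult:
    score: float
    ci_low_fn: Interval      # first confidence interval (low false-negative rate)
    ci_low_fp: Interval      # second confidence interval (low false-positive rate)
    variation_width: float   # width of the prediction variation interval

def should_notify(first: PredictionResult, second: PredictionResult,
                  score_delta: float = 0.2, width_delta: float = 0.1) -> bool:
    """Return True if any of the listed notification conditions is met."""
    conditions = [
        second.ci_low_fn.contains(second.score) and not first.ci_low_fn.contains(first.score),
        first.ci_low_fn.contains(first.score) and not second.ci_low_fn.contains(second.score),
        first.ci_low_fp.contains(first.score) and not second.ci_low_fp.contains(second.score),
        second.ci_low_fp.contains(second.score) and not first.ci_low_fp.contains(first.score),
        abs(first.score - second.score) > score_delta,
        abs(first.variation_width - second.variation_width) > width_delta,
    ]
    return any(conditions)
```

In a real back end, the confidence intervals of the first and second prediction results could also differ after a model update; the sketch simply evaluates each condition against the stored intervals.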
This may help to avoid network traffic and unnecessarily disturbing registered users, because users are only informed of the prediction results if the prediction results generated based on the updated model differ significantly from those generated based on previous model versions, and/or if the certainty of the model or of the prediction results generated based on the updated model differs significantly from that generated based on previous model versions.
According to an embodiment, the machine learning logic has been trained on biomedical literature. The machine learning logic is adapted to use automatically extracted features from the biomedical literature to predict the likelihood of failure of a preclinical or clinical trial testing the ability of a particular drug to treat a particular disease.
According to an embodiment, a mobile device receives a plurality of predictors including the predictor. Each received prediction is generated by program logic based on different input data using the biomedical model. For example, the prediction logic may be used to predict whether a particular drug D1 with the target molecule T1 will be FDA approved for treating disease D, and may additionally be used to predict whether a particular drug D2 with the target molecule T2 will be FDA approved for treating disease D. Thus, the input data for the two predictions may be different because the names of the drug targets T1, T2 are different. The mobile device displays the prediction list on a display of the mobile device. Each list item represents one of the received predictions and includes at least a thumbnail analog scale icon graphically representing the prediction. Each thumbnail analog scale icon at least includes: a scaled down version of the scale; a scaled down version of the background region with the prediction score; and a scaled down version of the pointer originating at the center of the scaled down background region and pointing to a location within the scaled down scale representing the predicted score. When a user selects one of the list items, generation and display of an analog scale icon is performed, wherein the displayed analog scale icon represents a predicted result represented by the selected list item. The display is performed such that the analog scale icon is on the matrix display of the mobile device in place of the prediction list.
This may be advantageous because the thumbnail analog scale icons alone may already provide the user with a visual impression of the prediction results and of the quality of the models used for the predictions. The user is thus able to manage technical tasks, such as comparing and interpreting multiple biomedical prediction results generated by one or more models or model versions, in a more efficient and accurate manner.
Embodiments of the present invention may allow the most appropriate target to be identified and, more generally, different use cases and data input scenarios to be compared and evaluated to find the best solution for a particular biomedical task (e.g., the task of identifying a drug and/or drug target). In this case, the same model is used to perform the predictions.
In another advantageous aspect, embodiments of the invention may allow for comparison of predictions of different models.
Presenting the prediction results in a list comprising thumbnail analog scale images enables a user to quickly identify model predictions that differ significantly from the prediction results provided by other models or by other versions of the same model, and/or to quickly identify predictions based on models or model versions of particularly high or low quality or accuracy. Thus, an interface is provided that may allow a user to quickly and intuitively grasp the accuracy of different versions of the same model, and also to identify trends in the development of a model and its accuracy if it is trained repeatedly on training data sets of increasing size. This may be particularly advantageous in the context of biomedical research, where the amount of data increases rapidly and the quality of many predictive models can therefore be improved by repeatedly training the models on updated versions of the training data. This may allow different input scenarios and model versions to be compared and evaluated, and the best model and model version for performing a prediction task to be identified. Identifying relevant predictions in a large prediction list by comparing thumbnail analog scale images does not depend on the subjective psychological traits of individual users. On the contrary, owing to the physiological properties of the human brain, all humans interpret analog information faster than numerical values or value ranges.
According to an embodiment, the analog scale icon is displayed as an element of a graphical user interface that has no scroll bar and/or does not support scrolling.
This may be advantageous because the scrolling operation consumes a lot of energy and processing power, and scrolling may no longer be necessary, because all relevant information for interpreting the predicted results of the scientist is visualized in the analog scale icon.
According to an embodiment, the generation of the analog scale icon is performed by a browser executing script elements of a web page provided by a server computer.
According to other embodiments, the generation of the analog scale icon is performed by a browser plug-in of a browser that displays a web page provided by a server computer.
According to other embodiments, the generation of the analog scale icon is performed by an application program ("app"). The application program may interoperate with a back-end program hosted by the server computer. The back-end program is adapted to provide the prediction result to the mobile device over the network.
According to an embodiment, the method further comprises normalizing the raw prediction score generated by the program logic and using the normalized prediction score as the prediction score. The raw score is normalized into a predefined score range.
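One possible normalization (an illustrative assumption; the embodiments do not prescribe a particular formula) is a linear rescaling of the raw score range onto the predefined score range, e.g., [-1, +1]:

```python
def normalize_score(raw, raw_min, raw_max, target_min=-1.0, target_max=1.0):
    """Linearly map a raw prediction score from [raw_min, raw_max]
    onto the predefined score range [target_min, target_max]."""
    if raw_max == raw_min:
        raise ValueError("degenerate raw score range")
    fraction = (raw - raw_min) / (raw_max - raw_min)
    return target_min + fraction * (target_max - target_min)
```

For example, a raw score of 0.75 on a native [0, 1] scale maps to 0.5 on the [-1, +1] scale.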
According to an embodiment, the method further comprises repeatedly performing, by the program logic, the generation of prediction results for the biomedical prediction task, thereby using repeatedly updated versions of the biomedical model. The method further comprises visualizing the change in accuracy of the repeatedly updated biomedical model in the form of a moving image of the simulated scale icon, wherein the sizes of the first and second sub-range indicators, the direction of the pointer and/or the size of the variation bar (if any) vary over time in the moving image.
Generating a moving image from a series of analog scale icons may be particularly beneficial because, physiologically, humans cannot intuitively follow the development of five values (the score value and the two endpoints of each of the two sub-range intervals) over time. This is because the human brain is unable to read, interpret and analyze five numerical values in parallel, especially if these values change dynamically over time. In contrast, the physiology of the human brain allows intuitive understanding and interpretation of the movement of a pointer along a static scale and of the growth or shrinkage of the two range indicators over time. Thus, the use of analog scale icons allows trends in the prediction quality, as well as in the quality of the model itself, to be visualized over time, thereby allowing a user to process and understand more information at once than would be possible if the results and model quality were presented as numerical values and numerical ranges.
In another aspect, the present invention relates to a mobile handheld telecommunications device that includes a battery for powering the mobile device, a digital cellular network interface, a matrix display, and program logic (also referred to as "client program logic"). The program logic may be, for example, a client application, such as a standalone application or a browser plug-in, or a script, such as JavaScript code embedded in a web page. The program logic may be executable by one or more processors of the mobile device and configured for receiving the prediction via the digital cellular network interface. The prediction results are generated by the prediction program logic for the biomedical prediction task using the biomedical model. The prediction result comprises at least a prediction score, a first confidence interval and a second confidence interval.
The prediction score indicates the certainty of the prediction and is a numerical value within a score range that is a predefined range of possible score values. Preferably, the score value is a normalized score value and the score value range is a range of normalized score values.
The particular manner in which the score is calculated may depend on the mathematical model used. For example, where the model is a random forest model, the prediction score quantifies the certainty of the random forest model as the percentage of decision trees in the random forest that conclude that the sample belongs to category C1. The calculation is: score = N(DT : O(DT) = C1) / N(DT), i.e., the number of decision trees DT whose output O(DT) equals C1, divided by the total number of decision trees in the forest, where O is the function describing a decision tree's output and C1 is the positive category.
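This fraction-of-trees calculation can be sketched as follows, where each decision tree is represented as a callable returning its output category (a hypothetical representation chosen for illustration):

```python
def random_forest_score(trees, sample, positive_class="C1"):
    """Certainty score of a random forest: the fraction of decision trees
    in the forest whose output O(DT) assigns the sample to the positive
    class C1."""
    votes = [tree(sample) for tree in trees]  # O(DT) for each tree
    return votes.count(positive_class) / len(votes)
```

With three of four trees voting for C1, the score would be 0.75; such a raw score could then be normalized into the predefined score range.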
The first confidence interval is a first sub-interval of the score range. The first confidence interval indicates a sub-range of model-specific score values known to have a percentage of false negative predictions below a predefined FN percentage threshold.
The second confidence interval is a second sub-interval of the score range. The second confidence interval indicates a sub-range of model-specific score values known to have a percentage of false positive predictions below a predefined FP percentage threshold.
The client program logic is further configured to display the analog scale icon on the matrix display. The simulated scale icon includes a background region including the prediction score, the simulated scale, the pointer, the first sub-range indicator, and the second sub-range indicator. The simulated scale represents the score range, whereby the endpoints of the scale represent the maximum and minimum score values of the score range. The pointer points to a location within the scale representing the predicted score. The first sub-range indicator is aligned with the scale such that the size and position of the first sub-range indicator relative to the scale indicates the size and position of the first confidence interval within the range of scores. The second sub-range indicator is aligned with the scale such that the size and position of the second sub-range indicator relative to the scale indicates the size and position of the second confidence interval within the range of scores.
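The geometric mapping performed when placing the pointer on the simulated scale can be sketched as a linear interpolation between the scale endpoints. The arc angles below (225 degrees down to -45 degrees, a typical gauge layout) are illustrative assumptions, not part of any embodiment:

```python
def score_to_angle(score, score_min=-1.0, score_max=1.0,
                   angle_start=225.0, angle_end=-45.0):
    """Map a prediction score onto a pointer angle (in degrees) on an
    arc-shaped simulated scale; the endpoints of the arc correspond to
    the minimum and maximum values of the score range."""
    fraction = (score - score_min) / (score_max - score_min)
    return angle_start + fraction * (angle_end - angle_start)
```

The same interpolation can be reused to position the endpoints of the first and second sub-range indicators, so that their size and position relative to the scale reflect the confidence intervals within the score range.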
In another aspect, the invention relates to a system comprising a mobile device and a server computer. The server computer may connect to the mobile device through a network connection established via the cellular network interface. The server computer includes a biomedical model, a prediction program logic configured to generate a prediction result using the biomedical model, and a back-end program adapted to provide the prediction result to the mobile device via the network.
The system may optionally include a plurality of additional mobile devices of additional users that may have registered in a user and task registry managed by remote program logic hosted on the server computer.
As used herein, a "training data set" or "training data" is a set of data records, such as tissue images, electronic documents, microarray data, protein expression profiles, etc., that includes manually or automatically annotated metadata, that allows the MLL to learn a predictive model based on the training data that incorporates biomedical knowledge contained in the training data and that can be used to perform predictions of biomedical problems. The training data is used to train an untrained version of the MLL to generate a trained MLL adapted to perform a particular prediction task. For example, the training data set may include an electronic document in which the name of the disease and the name of the target molecule of a particular drug assumed to be used to treat the disease are both mentioned, and may include several features extracted from the document, such as author name, publication date, journal impact factor, and the like. The electronic document in the training dataset is annotated with a label indicating whether the drug was approved by the FDA for treatment of the disease or rejected by the FDA or failed for other reasons during preclinical and clinical trials. Thus, MLL can learn from features (e.g., number of articles, author network, etc.) contained in a training dataset to distinguish between promising drug candidates and less promising drug candidates based on available biomedical literature and features extracted therefrom.
As used herein, "Machine Learning Logic (MLL)" is program logic, e.g., software, that has been trained or that may be trained in a training process, whereby in the training process, the MLL learns how to perform predictions to solve a particular prediction task from a training data set. For example, the MLL may be a neural network or a support vector machine, or the like. Thus, the MLL program code may include instructions and program routines that are not explicitly specified by a programmer, but are implicitly learned from training data in a data-driven learning process. The learning may include generating one or more implicit or explicit prediction models used by the trained MLL to perform predictions based on future input data. Machine learning may employ supervised learning or unsupervised learning.
As used herein, a "biological model" is a description of a static or dynamic biological system, including biomedical systems, represented in electronic form. For example, a biological model may be a description of a particular substance and the manner in which that substance interacts with other substances or biomedical mechanisms. The model may be, for example, but not limited to, a mathematical, statistical, heuristic, or rule-based representation of a biological system. The model may be explicitly specified, for example, for a rule-based biological model, or may be implicitly specified, for example, during the training phase of a model-based machine learning algorithm. The model may be an integral part of machine-learning-based prediction logic. In a preferred embodiment, the biological model is a predictive model, i.e., a model used to compute predictions rather than to perform simulations (as is often the case with systems biology models).
As used herein, a "sub-range indicator" is a visual GUI element, such as a line, arc, bar, arrow, etc., that represents the size and location of a sub-range of a value range. For example, the sub-range indicator may be an arc 202 having a predefined color and thickness and located outside of the background area 220 of the analog scale icon. For example, the sub-range indicator may be aligned with the simulated scale 208 representing a value range (e.g., the prediction score range) such that the position and size of the sub-range indicator correspond to the prediction score values of the scale that are included in the sub-range.
As used herein, a "pointer" is a visible GUI element, such as a bar, arrow, triangle, hand, and the like. Preferably, the pointer has a major axis and a minor axis, wherein the major axis is at least 30%, preferably at least 50% longer than the minor axis.
As used herein, a "prediction score" is a numerical value that represents a prediction result. For example, the prediction score may be a normalized numerical value. In some examples, if the normalized prediction score is higher than the median of all possible normalized score values, the prediction result is that the given hypothesis is predicted to be true. If the normalized prediction score is below the median of all possible normalized score values, the prediction result is that the given hypothesis is predicted to be false. Thus, according to embodiments of the present invention, the prediction score is a numerical value that indicates which of two possible values or categories is likely to be correct. These two possible values may be, for example: "membership in a particular category: yes or no"; "FDA approval of a particular drug for a particular disease: yes or no"; and so on.
As used herein, a "confidence interval" is a sub-range within a range of possible score values that indicates that a prediction having a score value within the sub-range will not have an FP or FN prediction that exceeds a predefined ratio. For example, the first confidence interval may be a first sub-interval of the score range and may indicate a sub-range of model-specific score values known to have a percentage of false negative predictions below a predefined FN percentage threshold (e.g., below 10%). The second confidence interval may be a second sub-interval of the score range and may indicate a sub-range of model-specific score values known to have a percentage of false positive predictions below a predefined FP percentage threshold (e.g., below 10%).
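Whether a candidate sub-range of score values qualifies as a first or second confidence interval can, for example, be estimated from labeled validation predictions. The following sketch is illustrative only; the triple format and the counting convention are assumptions, not part of any embodiment:

```python
def error_rates_in_subrange(validation, lo, hi):
    """Fraction of false-negative (FN) and false-positive (FP) predictions
    among validation samples whose prediction score falls inside [lo, hi].

    validation: iterable of (score, predicted_positive, truly_positive)
    triples. A candidate sub-range qualifies as a first confidence interval
    if its FN rate stays below the predefined FN percentage threshold, and
    as a second confidence interval if its FP rate stays below the
    predefined FP percentage threshold.
    """
    in_range = [(pred, truth) for score, pred, truth in validation
                if lo <= score <= hi]
    if not in_range:
        return 0.0, 0.0
    fn_rate = sum(1 for pred, truth in in_range if not pred and truth) / len(in_range)
    fp_rate = sum(1 for pred, truth in in_range if pred and not truth) / len(in_range)
    return fn_rate, fp_rate
```

A model developer could scan candidate sub-ranges with this function to find, for instance, the widest sub-range whose FP rate stays below 10%.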
As used herein, a "prediction variation interval" is a metric used to quantify the amount of variation or dispersion of a set of prediction scores computed by model-based prediction logic. A small prediction variation interval indicates a small amount of variation, and a large prediction variation interval indicates a large amount of variation. Thus, a small prediction variation interval (covering only about 7% of the score range or less) may indicate that score values calculated based on the currently used input data values and similar input data values tend to lie close to the expected score value. A larger prediction variation interval indicates that score values calculated based on the currently used input data values and similar input data values tend to be distributed over a larger range of values. The "prediction variation interval" may be implemented as a sub-range of score values, whereby the width of the sub-range is a measure of the amount of variation or dispersion of the score values.
According to some embodiments, the prediction variation interval represents a standard deviation of the score value. This may be advantageous because the standard deviation is algebraically simpler than other measures of variation such as mean absolute deviation. However, there are other measures of deviation of the predicted score from the expected value, including mean absolute deviation, which provides a mathematical attribute different from the standard deviation.
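For illustration, the two measures of variation mentioned here can be computed with Python's standard library; the example data are hypothetical:

```python
from statistics import mean, pstdev

def mean_absolute_deviation(scores):
    """Mean absolute deviation of a set of prediction scores."""
    m = mean(scores)
    return mean(abs(s - m) for s in scores)

# For normally distributed scores the two measures differ only by a
# constant factor (MAD is about 0.798 times the standard deviation),
# but the mean absolute deviation is less sensitive to outliers.
```

For the scores [0.2, 0.4, 0.6, 0.8], the mean absolute deviation is 0.2, while the (population) standard deviation computed by `pstdev` is slightly larger, about 0.224.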
As used herein, a "confidence level" or "variation confidence level" is a percentage value. It is used as the basis for computing the prediction variation interval of a particular prediction. It represents the reliability of the prediction process, given the details of the model-based predictor (e.g., the size and/or quality of the training data used in training the given model-based predictor).
According to an embodiment, the model-based prediction logic calculates the width of the prediction variation interval for each prediction based on a predefined (e.g., user-defined or pre-configured) confidence level (e.g., a 90% confidence level). For example, the prediction variation interval generated for a particular prediction based on a 90% confidence level, and calculated by a model-based predictor that has been trained on a particular training data sample, is a score interval that satisfies the following condition: if the prediction were repeated by multiple other versions of the model-based predictor, each trained on a different training data sample, the proportion of the calculated prediction variation intervals (which will differ from sample to sample) that contain the true population parameter (the true prediction/classification result) would tend towards 90%.
The higher the confidence level, the wider the prediction variation interval and the variation bar: the prediction variation interval calculated based on a 95% confidence level will be narrower than the prediction variation interval calculated based on a 99% confidence level.
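The relationship between confidence level and interval width can be illustrated under the simplifying assumption that the score's sampling distribution is approximately normal; the function name and the use of a standard error as the spread measure are illustrative assumptions:

```python
from statistics import NormalDist

def variation_interval(score, standard_error, confidence_level=0.90):
    """Prediction variation interval around a score, assuming an
    approximately normal sampling distribution of the score."""
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2.0)
    half_width = z * standard_error
    return score - half_width, score + half_width
```

With a standard error of 0.1, a 95% confidence level yields an interval of roughly [-0.196, +0.196] around a score of 0, while a 99% confidence level yields a wider interval.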
According to some embodiments, the prediction logic provides a user interface that enables a user to specify the confidence level used for calculating the prediction variation interval, thereby enabling the user to specify the minimum certainty of the prediction that he or she deems acceptable. For example, the confidence level may be 95%, 99%, or any other percentage value, preferably greater than 90%.
For model-based prediction logic such as bagged learners and random forests, the variability of the predictions made can be determined as described in Stefan Wager, Trevor Hastie and Bradley Efron, "Confidence Intervals for Random Forests: The Jackknife and the Infinitesimal Jackknife", Journal of Machine Learning Research 15 (2014) 1625-1651. The variability of the prediction may be expressed, for example, in the form of a standard error, and the width of the prediction variation interval may be indicative of and related to this standard error.
An "icon" as used herein is a picture displayed on a screen, e.g. a matrix display, and visualizes the result of the model-based prediction, and preferably also the certainty of the prediction and/or the accuracy of the model used to generate the prediction. The icons may be implemented in the form of pixel matrices, vector graphics, or based on a library specific to the programming language, such as the Java Swing or Java awt library. Preferably, any resizing of the icons will scale the size of the visual elements contained therein. The icon is preferably a readily understandable symbol, more like a traffic sign, rather than a detailed description of the actual entity it represents. It can be used as a selectable electronic hyperlink or file shortcut to access additional information related to the particular model-based prediction graphically visualized by the icon. In this case, the user may select an icon for accessing the additional information using a mouse, a pointing device, a finger, or a voice command. According to a preferred embodiment, the mobile device is a smartphone or a small tablet computer and the icon is selected by the user's finger.
As used herein, an "analog scale" is a scale in which information (in particular numerical values, such as prediction scores) is encoded in non-quantized form, as opposed to a digital scale in which information is encoded in numerical or character form. For example, many speedometers in older cars are devices with analog scales that encode the speed of an object, many "traditional" thermometers are devices with analog scales that encode the temperature of an object, and so on. Thus, a "simulated scale icon" is an icon that includes a visual element acting as a scale that encodes numerical values (e.g., all possible normalized prediction scores that may be generated by the prediction logic).
As used herein, a "mobile device" is a computing device that is small enough to be handheld and operated in the hand. The mobile device includes a display, typically an LCD flat screen, providing a touch screen interface with on-screen buttons and an on-screen keyboard, or physical buttons and a physical keyboard. The mobile device may be connected to the internet and to other mobile devices and/or server computers via a network, in particular a cellular network, and optionally also via a WLAN-mediated internet connection. Integrated cameras, digital media players, the ability to make and receive phone calls, video games, and Global Positioning System (GPS) functionality may also be part of the mobile device. The power is typically provided by a lithium battery. The mobile device may run a mobile operating system that allows function-specific third party applications to be installed and run. For example, the mobile device may be a mobile phone, in particular a smartphone, a tablet computer or a Personal Digital Assistant (PDA).
As used herein, a "matrix display" is a display device for displaying information on a device, such as a machine, computer, telecommunications device, clock, railway departure indicator, and many other devices. The display consists of a matrix of lights or mechanical indicators arranged in a rectangle (although other shapes are possible, although not common) so that by turning on or off selected lights, text or graphics can be displayed. The matrix controller converts instructions from the processor into signals that turn the lights in the matrix on or off to produce the desired display. The matrix display may be, for example, an LCD display, in particular an LCD touch screen display.
"Rendering" as used herein refers to the process of adding color, shading, and texture to an image, particularly a vector image.
Drawings
In the following embodiments of the invention, which are explained in more detail by way of example only, reference is made to the accompanying drawings, in which:
FIG. 1 depicts a flow diagram of a method of visualizing model-based predictions and prediction quality;
FIG. 2A depicts generation and use of predictive biomedical models;
FIG. 2B depicts an analog scale icon representing a predicted result;
FIG. 2C depicts another simulated scale icon representing another predicted result;
FIG. 3 depicts a list of predicted results visualized by thumbnail icons, respectively;
FIG. 4 depicts four graphs, each relating FDA approval of a particular drug to a profile of a particular characteristic (e.g., number of articles);
FIG. 5 depicts a graph of FDA approval for a particular drug versus number of articles;
FIG. 6 depicts two prediction scores generated by different models for the same prediction task;
FIG. 7 depicts a confusion matrix associated with a particular model;
FIG. 8 depicts a confusion matrix associated with another model;
FIG. 9 depicts a simulated scale icon representing an ensemble prediction generated by an ensemble model; and
FIG. 10 depicts a block diagram of a system including a server computer and at least one mobile device.
FIG. 1 depicts a flow diagram of a method 100 of visualizing model-based predictions and prediction quality. The prediction results and model quality are visualized in a dense "compressed" manner, i.e., a large amount of information is presented over a small area, so that the user can intuitively recognize the information encoded therein. The method will be described below by reference to elements of the other figures (in particular fig. 2 and 10).
The method may be implemented, for example, by a client application 980 of a mobile battery powered device 970, the client application being interoperable with a backend 962 of a server system and adapted to receive predictions from the backend via a digital cellular mobile telecommunications network. Alternatively, the method may be implemented in a stand-alone application instantiated on the mobile device, the stand-alone application adapted to extract the predicted result from a message received via the network. For example, the message may be received from the server computer in the form of an email or text message or any other message format. Still alternatively, the method may be implemented by a browser plug-in configured to visualize data contained in a web page, or may be implemented as script code embedded in a web page provided by a server computer to a mobile device via a network. In other embodiments, the method is implemented by hardware logic, firmware logic, software logic, or any combination thereof, included in the mobile device and adapted to receive and display the prediction results and display the analog scale icons generated thereby.
The method 100 allows for the dense visualization of the results of model-based predictions and the certainty of the predictions on a matrix display of the battery-powered handheld mobile device 970. The expression "dense display" as used herein means that a lot of information is displayed on a very limited display space, for example on the display of a mobile phone.
First, at step 102, the mobile device 970 receives a prediction 960. The prediction result may be received from the server system 950 via the digital cellular mobile telecommunications network 990. The prediction results are generated to address a specific biomedical prediction task. The predicted results have been generated by the program logic 956 using the biomedical model 958. For example, the program logic that has generated the predictions may be trained machine learning logic, such as a trained artificial neural network, a trained support vector machine, or any other type of program logic adapted to generate predictions in the form of predicted scores. For example, the prediction logic may also be implemented as a manually specified set of rules and heuristics.
The biomedical model may be an explicitly specified model, such as a manually, semi-automatically or automatically specified model. Alternatively, the model may be an implicitly specified model generated during a training phase of the machine learning logic. For example, the network architecture elements (e.g., weights of "neurons" of a layer) of an artificial neural network modified during a training phase, in combination with the network architecture, may constitute an implicit predictive model ("black box" model) suitable for providing predictions for biomedical problems.
As shown in more detail in fig. 2A, the prediction result may include a plurality of data values. The prediction result includes a prediction score 216, a first confidence interval 256.1, and a second confidence interval 256.2.
The prediction score represents the certainty of the prediction and is a numerical value within the score range. This range may also be referred to as a "possible score value interval". For example, the score range may be a predefined range between-1 and +1, and any raw score values output by the model-based prediction logic are normalized to a value between-1 and + 1. Other score ranges may be used for normalization, such as a range between 0 and 1, depending on the type of prediction. In the following example, a score range from-1 to +1 will be used, but this range is only an example and any other predefined score range may equally be used for normalizing the originally provided score values.
The first confidence interval 256.1 is a first sub-interval of the score range and indicates a model-specific score value sub-range known to have a percentage of False Negatives (FN) predictions below a predefined FN percentage threshold. For example, the first confidence interval 256.1 may be a sub-interval of a score range for which, e.g., based on statistical analysis of multiple model predictions, it is known that any prediction score within the sub-interval is likely to be a false negative score value that is less than a predefined FN percentage threshold (e.g., less than 10%, or less than 5%, or less than 1%). The appropriate size of the FN percentage threshold preferably depends on the type of prediction calculated: in the event that false negative results would bring significant financial or health-related costs to the patient or society, the predefined FN percentage threshold is selected such that the resulting first sub-range is relatively narrow. For example, the first sub-range is selected such that it covers only score values known to include a percentage of False Negatives (FN) of less than 5%. Conversely, where false negative results do not incur significant financial or health-related costs to the patient or society, the predefined FN percentage threshold is selected such that the resulting first sub-range is relatively wide. For example, the first sub-range is selected such that it covers only score values known to include a percentage of false negatives less than 25%. In some embodiments, the first sub-range selectively covers score values known to include a percentage of false negatives that is less than 10%.
The second confidence interval 256.2 is a second sub-interval of the score range and indicates a model-specific score value sub-range known to have a percentage of False Positive (FP) predictions below a predefined FP percentage threshold. For example, the second confidence interval 256.2 may be a sub-interval of the score range for which it is known, e.g., based on statistical analysis of multiple model predictions, that the fraction of false positive results among predictions whose score lies within the sub-interval is less than a predefined FP percentage threshold (e.g., less than 10%, or less than 5%, or less than 1%). The appropriate size of the FP percentage threshold preferably depends on the type of prediction calculated: in the event that false positive results would incur significant financial or health-related costs to the patient or society, the predefined FP percentage threshold is selected such that the resulting second sub-range is relatively narrow. For example, the second sub-range is selected such that it covers only score values known to include a percentage of false positives of less than 5%. Conversely, where false positive results do not incur significant financial or health-related costs to the patient or society, the predefined FP percentage threshold is selected such that the resulting second sub-range is relatively wide. For example, the second sub-range is selected such that it covers only score values known to include an FP percentage of less than 10%.
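One way to derive such a confidence sub-range from labeled validation predictions is sketched below (hypothetical helper `upper_confidence_interval`; the patent does not prescribe a particular derivation method). It searches for the widest upper score sub-range in which the fraction of false positives among validation predictions stays below the FP percentage threshold; the analogous lower sub-range for false negatives can be derived symmetrically.

```python
def upper_confidence_interval(scored_labels, fp_threshold, hi=1.0):
    """Find the widest sub-range [cutoff, hi] of the score range in which the
    fraction of false positives among validation predictions stays below
    fp_threshold.

    scored_labels: list of (normalized_score, true_label) pairs, where
    true_label is True for actually-positive cases.
    """
    for cutoff in sorted({s for s, _ in scored_labels}):
        inside = [(s, y) for s, y in scored_labels if s >= cutoff]
        fp_rate = sum(1 for _, y in inside if not y) / len(inside)
        if fp_rate < fp_threshold:
            return (cutoff, hi)
    return None  # no sub-range satisfies the threshold

validation = [(0.9, True), (0.8, True), (0.7, False), (0.6, True),
              (0.4, False), (0.2, False), (-0.5, False)]
interval = upper_confidence_interval(validation, fp_threshold=0.30)
```

A stricter threshold yields a narrower sub-range, matching the trade-off described above.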
In step 106, the mobile device according to some embodiments generates an analog scale icon 200, 260 from the prediction result, for example as shown in figs. 2B, 2C, and 9. In other embodiments, this step is performed remotely, for example by a server computer that has provided the prediction result to the mobile device. In this case, the mobile device may receive the prediction result either in the form of values and value ranges, for example as a text string, or as an already generated analog scale icon that graphically indicates these values, e.g., by means of specific sub-range indicators that are part of the analog scale icon.
In some embodiments, the analog scale icon is generated in the form of a bitmap image. In other embodiments, the analog scale icon is generated in the form of a vector image. When the analog scale icon is displayed on the matrix display, the mobile device is preferably adapted to resize the analog scale icon such that it covers at least 50%, preferably at least 70%, more preferably at least 90% of the display. For example, the analog scale icon may be displayed on the screen in "full screen mode".
The analog scale icon includes a background area 220 that includes the prediction score 216 and the analog scale 208. Preferably, the prediction score is located at the center of the background area.
The analog scale represents the score range. The scale has two ends 210, 212 representing the maximum and minimum score values of the score range. For example, the scale may have the form of a semicircular arc or a portion thereof, or may be a horizontal or vertical line or strip, or may have any other shape, preferably including two readily identifiable end points. In some embodiments, the analog scale icon may have a design that simulates the scale area of a measurement device. For example, the analog scale icon may be designed to represent a speedometer, the scale of a balance, a thermometer, etc. Thus, the scale may have, for example, an arcuate form, such as a semicircular arc, or a straight-line form. The scale and the background area may be designed to represent the scale of a virtual measuring device.
In some embodiments, the scale comprises scale values. In other embodiments, the scale has no scale values. In some cases, the scale values may be too small to be read by the human eye, even if the analog scale icon is displayed in full-screen mode on the matrix display. However, the user will still be able to interpret the prediction result and its quality based on the position of the pointer and the size and position of the sub-range indicators.
The analog scale icon further includes a pointer 218 pointing to a location within the scale that represents the prediction score. For example, the background area may be a semicircle representing a speedometer. The scale may be an arc corresponding to a portion of the background area outline. The pointer may originate from the center of the background area.
The analog scale icon further includes a first sub-range indicator 202 and a second sub-range indicator 204. The first sub-range indicator is a graphical element of the analog scale icon that is aligned with the scale such that the size and position of the first sub-range indicator relative to the scale indicate the size and position of the first confidence interval within the score range. The second sub-range indicator is a graphical element of the analog scale icon that is aligned with the scale such that the size and position of the second sub-range indicator relative to the scale indicate the size and position of the second confidence interval within the score range.
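The alignment of the pointer with the scale can be sketched as a simple score-to-angle mapping for a semicircular scale (a hypothetical sketch; function names and the angle convention are illustrative, not taken from the patent):

```python
import math

def score_to_angle(score, lo=-1.0, hi=1.0):
    """Map a normalized score onto a semicircular scale: the minimum score
    points left (180 degrees), the maximum points right (0 degrees)."""
    fraction = (score - lo) / (hi - lo)
    return 180.0 * (1.0 - fraction)

def pointer_tip(score, cx, cy, length):
    """Endpoint of a pointer of the given length anchored at the scale
    center (cx, cy); y grows upward in this sketch."""
    angle = math.radians(score_to_angle(score))
    return (cx + length * math.cos(angle), cy + length * math.sin(angle))

angle = score_to_angle(0.5)  # a score of 0.5 sits a quarter arc below the maximum
```

A sub-range indicator arc is drawn analogously, between `score_to_angle(interval_start)` and `score_to_angle(interval_end)`.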
Next, at step 108, the mobile device displays the analog scale icon on its matrix display 978. Where the analog scale icon is a vector graphic, the displaying step may include a rendering step for dynamically assigning colors, shades, and other features to the vector-based design elements of the icon. The displaying comprises resizing the analog scale icon such that it fills a predetermined portion of the matrix display, for example at least 50% of the display, or at least 80% of the display, or 100% of the display.
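The resizing step can be sketched as a uniform scaling that fits the icon into the display while checking the required coverage (hypothetical helper; rectangular icon and display geometries are an assumption):

```python
def fit_icon(icon_w, icon_h, disp_w, disp_h, min_coverage=0.5):
    """Scale an icon uniformly so that it fits within the display, and report
    whether the scaled icon covers at least min_coverage of the display area."""
    scale = min(disp_w / icon_w, disp_h / icon_h)  # largest scale that still fits
    w, h = icon_w * scale, icon_h * scale
    coverage = (w * h) / (disp_w * disp_h)
    return (w, h, coverage >= min_coverage)

# A 200x100 icon on a 1920x1080 landscape display fills the full width.
w, h, ok = fit_icon(200, 100, 1920, 1080, min_coverage=0.5)
```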
FIG. 2A depicts the generation and use of predictive biomedical models according to embodiments of the present invention. For example, the model 958 may be an implicit model of an artificial neural network that is implicitly learned by the machine learning logic 956 during the training phase. The machine learning logic (MLL) 956 may be prediction logic that has been trained to predict whether a particular drug will be accepted by the FDA as a treatment for a particular disease based on analysis of the biomedical literature. For example, the MLL 956 may be initially trained based on a large corpus of documents (e.g., a MEDLINE document database used as training data 966). During training of the machine learning logic on training data 966, model 958 is learned explicitly or implicitly. The function of learning model 958 from training data 966 is shown as model generation unit 957, although the model generation process may be an implicit part of machine learning logic 956 that is not explicitly specified by a human programmer. The MLL can be implemented using a variety of programming techniques and/or readily available machine learning tools, libraries, and modules. In some embodiments, the logic for training the model and for applying the trained model to some new input data may be implemented in different program modules. In some other embodiments, the biomedical model is part of the program logic that is trained and/or performs the prediction, and thus it is not possible to separate the biomedical model from the program logic that generates or uses it. For example, the model may be based on a neural network architecture configured to receive a particular type of input data and features, whereby the weights of the neural network architecture in different network layers have been adjusted during a training phase, so that the trained neural network architecture is able to perform predictions based on new input data corresponding to the structure and type of data used in the training phase.
Once the model 958 is generated, the machine learning logic 956 may use the model 958 to solve a particular prediction problem. For example, the model-based prediction unit 955 may receive some input data 969, such as a description of the name of one or more target molecules of the drug of interest. Prediction unit 955 may then analyze the currently available literature to identify documents or document summaries that mention the name of the target molecule or molecules and the name of the disease to be treated, and analyze metadata associated with the identified documents. For example, the predictor may analyze the author's name, publication date, cross-references to other documents, names of diseases, metabolites, genes, or drugs mentioned in the documents to extract a plurality of document-based features of one or more target molecules provided as input. Feature extraction may be a data analysis step that is explicitly or implicitly specified in the code of prediction logic 956. The extracted features are then used as input to a model 958 that generates a prediction of whether a drug whose target is provided as input 969 will be approved by the FDA as a treatment for a particular disease in the future. Feature extraction may also be performed during the training phase for extracting features from training data that is actually input to the model to be trained.
The prediction results include a normalized prediction score 216, a first confidence interval 256.1, and a second confidence interval 256.2 indicating a sub-range of score values with particularly low false positive or false negative result ratios. Optionally, the prediction result further includes a prediction variation interval 254.
If the predicted outcome is that the FDA will have a 100% likelihood of approving the drug, the prediction score (which may optionally be normalized) is, for example, +1. If the predicted outcome is that the FDA will have a 100% likelihood of rejecting approval of the drug, the prediction score (which may optionally be normalized) is, for example, -1. Typically, the prediction score has a value greater than the minimum value of the scale (greater than -1) and less than the maximum value of the scale (less than +1).
For example, a prediction score of 0.7 indicates that the particular drug whose corresponding target molecule name is provided as input to prediction logic 956 is predicted to be most likely approved by the FDA. A prediction score of-0.8 indicates that the particular drug whose corresponding target molecule name is provided as input to prediction logic 956 is predicted to be most likely rejected (not approved) by the FDA. A prediction score of about 0 indicates that for the currently provided input data 969, the model cannot clearly predict whether the FDA will approve the drug because the model considers the likelihood of rejection and the likelihood of acceptance to be the same or highly similar. The user can easily and intuitively understand the prediction result simply by making a brief observation of the position of the pointer: a pointer to a scale region near the end of the scale representing the smallest scale value indicates a rejection hypothesis/very low prediction score; a pointer to a scale region near the end of the scale representing the maximum scale value indicates acceptance of the hypothesis/very high prediction score; pointers to the central region of the scale indicate that the prediction is ambiguous.
Preferably, the model generation based on the training data is performed fully automatically, for example within a computer-implemented model generation and update framework. For example, training data 966 may be updated and supplemented with more data on a regular basis, such as weekly or monthly. This may be very advantageous in the biomedical field, where the amount of available data is rapidly increasing. This is the case, for example, with biomedical literature data. The model generation and update framework is preferably configured such that each time the training data 966 is supplemented with more training data or modified by removing or replacing portions of the training data, the machine learning logic 956 is automatically retrained based on the updated version of the training data 966. Thus, updated versions of the biomedical model 958 are also automatically generated. If an updated version of the model is used to compute the same prediction, again based on the same input data 969, the prediction results will be different from the previously generated prediction results because the model has absorbed more new knowledge that may have an impact on the prediction results.
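A minimal sketch of such an update-triggered retraining loop, assuming the training corpus is a collection of text documents and using a content fingerprint to detect changes (all names hypothetical; the framework in the patent is not limited to this mechanism):

```python
import hashlib

def corpus_fingerprint(documents):
    """Stable fingerprint of the training corpus; a changed fingerprint
    signals that the training data was supplemented, removed, or replaced."""
    digest = hashlib.sha256()
    for doc in sorted(documents):
        digest.update(doc.encode("utf-8"))
    return digest.hexdigest()

def maybe_retrain(documents, last_fingerprint, retrain):
    """Invoke retrain(documents) only when the corpus has changed."""
    fingerprint = corpus_fingerprint(documents)
    if fingerprint != last_fingerprint:
        retrain(documents)
    return fingerprint

calls = []
fp1 = maybe_retrain(["doc A", "doc B"], None, lambda docs: calls.append(len(docs)))
fp2 = maybe_retrain(["doc A", "doc B"], fp1, lambda docs: calls.append(len(docs)))
# The second call retrains nothing: the corpus is unchanged.
```

In a weekly or monthly update cycle, `maybe_retrain` would be invoked by a scheduler after each corpus refresh.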
According to some embodiments, the literature-based training and model-based prediction are performed as described, for example, in PCT/EP2017/060844, the disclosure of which is incorporated herein by reference in its entirety.
Fig. 2B depicts an analog scale icon 200 representing the predicted result. The analog scale icon in this embodiment is similar to a speedometer.
The icon includes a background area 220 in the form of a semi-circle or semi-ellipse. It includes a central region 214, which is also semi-circular or semi-elliptical, having a different color than the background region. The predicted score 216 is contained in the center region.
The scale 208 is a semicircular arc. The first end 212 of the scale represents the smallest possible normalized prediction score value of -1, the second end 210 of the scale represents the largest possible normalized prediction score of +1, and the center point of the scale represents the scale value "0". The scale also displays the scale values "-1", "0" and "+1". In other embodiments, no scale values, other scale values, or more scale values are displayed.
The icon 200 further includes a pointer 218 in the form of an arrow. The pointer originates in the center of the background region and points to a location within the scale representing a predicted score of "0.5", which is also displayed in the center region 214. Thus, even in the case where the user does not have time to "read" the values shown in region 214, and even in the case where the numbers shown in 214 are too small to be read by the human eye, the user can easily recognize that the prediction score is somewhere between "0" and "+ 1" and thus indicate a "positive" prediction that is assumed to be "true", e.g., the FDA will "allow" a particular drug for treating a particular disease.
The icon 200 further includes a first sub-range indicator 202 that is aligned with the scale such that the size and position of the first sub-range indicator relative to the scale indicate the size and position of the first confidence interval within the score range. In the depicted example, the first sub-range indicator is an (invisible) circular arc segment that originates at the center of the background area and exactly covers the scale region representing the first confidence interval.
The icon 200 further includes a second sub-range indicator 204 that is aligned with the scale such that the size and position of the second sub-range indicator relative to the scale indicate the size and position of the second confidence interval within the score range. In the depicted example, the second sub-range indicator is an (invisible) circular arc segment that originates at the center of the background area and exactly covers the scale region representing the second confidence interval.
The first and second sub-range indicators may each be an arc, in particular a circular arc.
Thus, the user need only check whether the pointer 218 points to a tick mark region that is aligned with the second sub-range indicator. In the depicted example, arrow 218 points to a scale region aligned with the second sub-range indicator 204. This means that the prediction score 0.5 is within the second confidence interval and has a probability of being a false positive prediction result below a predefined FP threshold (e.g. below 10%). Checking whether the pointer points to a region within arc 204 may be performed quickly and intuitively without consciously comparing the score values to confidence intervals of the model.
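The visual check described above corresponds to a simple interval test, sketched here as a hypothetical helper that maps a score and the two confidence sub-ranges to a coarse verdict (names and wording illustrative):

```python
def interpret(score, fn_interval, fp_interval):
    """Combine the prediction score with the model's confidence sub-ranges
    into a coarse, human-readable verdict."""
    if fp_interval and fp_interval[0] <= score <= fp_interval[1]:
        return "positive, false-positive rate below threshold"
    if fn_interval and fn_interval[0] <= score <= fn_interval[1]:
        return "negative, false-negative rate below threshold"
    return "ambiguous"

# A score of 0.5 falling inside the FP confidence sub-range (0.4 .. 1.0):
verdict = interpret(0.5, fn_interval=(-1.0, -0.6), fp_interval=(0.4, 1.0))
```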
The variation bar 258 is an optional element that visualizes the prediction variation interval, i.e., the degree to which the prediction result would differ from the current prediction output if the input data of the prediction were slightly modified. If the prediction is robust against small changes in the input parameter values, the prediction variation bar is short, indicating that the prediction would not change much. If the prediction is sensitive to small changes in the input parameter values, the prediction variation bar is wide, indicating that the prediction would change significantly. In the depicted example, the variation bar is represented as a thick line along a portion of the outline of the background area 220.
FIG. 2C depicts another analog scale icon representing another predicted result. The analog scale icon in this embodiment is similar to another speedometer.
The icon includes the visual elements already described with respect to fig. 2B. Further, it comprises a further sub-range indicator 206 aligned with the scale such that the size and position of the further sub-range indicator relative to the scale indicate the size and position of a further confidence interval within the score range. In the depicted example, the further sub-range indicator is an (invisible) circular arc segment that originates at the center of the background area and exactly covers the scale region representing the further confidence interval. For example, the first sub-range indicator 202 may indicate a range of score values for which the model is known to generate an FN result ratio below a threshold (e.g., below 10%). The further sub-range indicator 206 may indicate a range of score values for which the model is known to generate an FN result ratio below another threshold (e.g., below 25%).
In the depicted example, the scale 208 is represented as a thick segment representing a portion of the outline of the background area 220. The scale does not include any displayed scale values, but the sub-range indicators include labels indicating the maximum possible proportion of FP or FN predictions in the score ranges indicated by indicators 202, 204 and 206, respectively. The pointer of icon 260 has a needle or triangle shape instead of an arrow shape.
The optional interpretation 222 may provide further information that facilitates the interpretation of the icon 260, such as a color code interpretation of the colors in the color gradient contained in the scale.
Fig. 3 depicts a list 302 of prediction results, each visualized by a thumbnail analog scale icon. The mobile telecommunications device may receive a plurality of prediction results via the network.
The plurality of predicted outcomes may include two or more predicted outcomes provided by the same model for different prediction tasks, whereby the different prediction tasks are associated with providing different input data to the prediction logic. For example, the first task may be to predict whether a particular drug X with target PDCD1 will be FDA-approved as a treatment for melanoma, and the second task may be to predict whether the same drug X with the same target PDCD1 will be FDA-approved as a treatment for breast tumors. The same document-based biomedical prediction model may be used for two different prediction tasks whose prediction results are contained in the list 302.
Additionally, or alternatively, the plurality of prediction results may include two or more prediction results provided by different models for the same or different prediction tasks. For example, a first prediction result may be generated by the literature-based model 958 ("M1") for predicting whether a particular drug X with target PDCD1 will be FDA-approved as a treatment for melanoma. For the same prediction task, a second prediction result may be generated by another model M2. The other model may not be a literature-based model, but a model that uses metabolic flux analysis, molecular interaction simulation logic, or toxicity simulation logic to predict whether a particular drug X with the target PDCD1 will be approved by the FDA as a treatment for melanoma. Alternatively, the other model may be a document-based model, but the prediction may be based on semantic analysis of the documents, while the model M1 may be a co-occurrence-based model.
Additionally, or alternatively, the plurality of prediction results may include two or more prediction results provided by different versions of the same model for the same or different prediction tasks. For example, a first prediction result may indicate whether a particular drug X with target PDCD1 will be FDA-approved as a treatment for melanoma, whereby a particular version v1 of the literature-based biomedical prediction model was used to generate the prediction. Model version v1 may have been trained on a literature database as of January 1, 2016. A second prediction result may also indicate whether a particular drug X with the target PDCD1 will be FDA-approved as a treatment for melanoma, whereby version v2 of the literature-based biomedical prediction model was used to generate the prediction. Model version v2 may have been trained on the literature database as of February 1, 2016. Multiple further prediction results may have been received for the same prediction task, whereby each further prediction result corresponds to a further version of the literature-based biomedical prediction model, e.g., trained on the literature database as of March 1, 2016, April 1, 2016, etc.
Each list item 304, 306 includes at least one thumbnail analog scale icon and optionally one or more data values. Data values may include, for example, numerical values (such as prediction scores), prediction tasks (item types, targets, indications, etc.), and other metadata of the predictions or of the models used for the predictions. Each item may further include a "details" link or other selectable GUI element that allows the user to select a particular list item for triggering the display of the analog scale icon in a new view replacing the list 302. For example, the new view may include a full-screen version of the analog scale icon.
Thus, using thumbnail analog scale icons in the prediction results list allows a user to easily compare the prediction scores and the prediction quality of multiple predictions provided by different models, different model versions, and/or different prediction tasks. A dense visualization of multiple highly heterogeneous predictive models and software programs is thereby provided that allows a user to compare the prediction results and the prediction quality provided by many different models. This is particularly advantageous in the context of life science research and drug development, as these technical fields are characterized by a highly heterogeneous information technology landscape, rapidly growing amounts of structured and unstructured data, and a large number of different prediction approaches, with respect to the type of training and input data (literature, sequence data, expression profiles, 3D structures, array data, image analysis), the type of biomedical problem (target prediction, toxicity prediction, drug recognition, side effect prediction), and the type of prediction method used (neural networks, support vector machines, random forests, rules, etc.).
Embodiments of the present invention provide an intuitive, dense overview of a plurality of different models, and also allow a user to monitor trends that have an impact on model quality. For example, if multiple prediction results have been generated by different versions of the model for a particular prediction task, in some embodiments the analog scale icons generated for the prediction results are combined into a single moving image, such as an animated GIF or video clip, in which the elements of the icon, such as the arrow 218, the sub-range indicators 202, 204, 206, and/or the variation bar 258, can change their respective positions and sizes. When the user clicks on the moving image, the elements of the analog scale icon change their size and/or position. For example, where different versions of a model correspond to an ever-growing set of training data, it may happen that the increase in the amount of available data allows for an increase in the accuracy and predictive capability of a particular model. Thus, while the prediction score of the initial prediction may be ambiguous and near zero, and the sub-range indicators 202, 204 may be very narrow, the prediction scores generated by subsequent versions of the model may clearly indicate a positive (or negative) answer, and the sub-range indicators 202, 204 may be very wide. In some cases, model quality may also deteriorate, for example if the added data includes information contrary to the hypotheses supported so far by outdated versions of the training data set. Accordingly, by viewing a moving image generated from a plurality of analog scale icons representing the prediction results of many different versions of the same model for the same prediction task, the user can easily recognize whether the quality of the model changes over time and whether the change results in an improvement or a deterioration of the prediction quality.
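The frames of such a moving image can be derived by interpolating the pointer position between the scores of successive model versions (a hypothetical sketch; the patent does not prescribe an interpolation scheme, and the same approach applies to the sub-range indicator boundaries):

```python
def animation_frames(version_scores, steps_per_transition=4):
    """Interpolate pointer positions between successive model versions so the
    pointer sweeps smoothly from one version's score to the next."""
    frames = []
    for a, b in zip(version_scores, version_scores[1:]):
        for step in range(steps_per_transition):
            t = step / steps_per_transition
            frames.append(a + t * (b - a))  # linear blend between two versions
    frames.append(version_scores[-1])
    return frames

# Scores of three successive model versions, two in-between frames each:
frames = animation_frames([0.0, 0.4, 0.8], steps_per_transition=2)
```

Each frame value would then be rendered as one analog scale icon and the icons assembled into a GIF or video clip.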
Fig. 4 depicts four graphs, each relating FDA approval of a particular drug to another feature, such as the number of articles.
Graph 402 depicts the change over time of the subject matter of publications relating to successful and unsuccessful drugs, focusing on the topic "drug therapy". The time frame shown is the 20 years before time point "0", which refers to a specific important time point in drug development, in this case the beginning of the earliest phase 2 trial. Each publication has a limited set of subject matter annotations, also referred to as MeSH terms. Graph 402 shows the percentage of publications annotated with the topic "drug therapy" in two categories of publications: "FDA-approved" publications (upper one of the two curves on the right of the figure) are publications that mention the target and indication of an FDA-approved drug; "failure" publications mention the target and indication of a drug whose development ended in phase 2 or phase 3. The bold line represents the median and the shaded area represents the confidence interval of the distribution (implicitly the degree of variation). Statistically significant differences between the distributions, as evaluated by the Wilcoxon test, are marked with an asterisk at the top of the graph. The main hypothesis demonstrated here is that publications leading to successful drug development are annotated with the topic "drug therapy" significantly more often before the start of phase 2 trials.
Graph 404 depicts first and second curves. The first curve, "FDA approved" (upper one of the two curves on the right of the figure) indicates the number of articles in the Medline database that mention the name of a particular drug target, where the drug is later approved by the FDA as a treatment for the disease. The second curve "failure" represents the number of articles in the Medline database that mention the name of a particular drug target, where the drug was later rejected by the FDA and is not allowed to be used to treat the disease. Early in an emerging research field, these two curves were very similar and a literature-based model may not be able to predict unambiguously whether a particular drug is likely to be FDA approved. However, after a few years, it can be observed that the number of published articles that mention both disease and drug targets is higher for targets that are later approved by the FDA than for targets that fail. This is probably because positive results supporting the relationship between a particular drug target and disease have attracted further research groups working in this area, increasing the number of publications that mention only the target. The graph shows that a literature-based model can reliably predict whether the FDA will approve a particular drug, particularly in later years when a sufficient number of documents are available. Therefore, frequent updating of the literature-based prediction model is key to providing high quality predictions.
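A co-occurrence count of this kind, used as a literature-based feature, can be sketched as follows (hypothetical helper with toy data; a real implementation would rely on proper biomedical entity recognition rather than substring matching):

```python
def comention_counts_by_year(articles, target, disease):
    """Count, per publication year, the articles whose text mentions both the
    drug target and the disease name (a simple co-occurrence feature)."""
    counts = {}
    for year, text in articles:
        lowered = text.lower()
        if target.lower() in lowered and disease.lower() in lowered:
            counts[year] = counts.get(year, 0) + 1
    return counts

articles = [
    (2014, "PDCD1 blockade in melanoma"),
    (2014, "PDCD1 expression in T cells"),       # no disease co-mention
    (2015, "Melanoma response to PDCD1 inhibitors"),
]
counts = comention_counts_by_year(articles, "PDCD1", "melanoma")
```

The resulting per-year counts form one of the time-series features whose divergence between later-approved and later-rejected targets is shown in graphs 404-408.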
Graph 406 depicts first and second curves. The first curve, "FDA approved" (upper one of the two curves on the right of the figure) represents the number of articles in the Medline database referring to a particular gene/protein name as a target for a particular drug that is later approved by the FDA as a treatment for a disease associated with the appearance of a biomarker. The second curve "failure" represents the number of articles in the Medline database referring to the name of a particular gene/protein that is the target of a particular drug that is later rejected by the FDA and is not allowed to be used to treat a disease associated with the occurrence of that gene/protein.
Graph 408 depicts first and second curves. The first curve, "FDA approved" (upper one of the two curves on the right of the figure) represents the number of articles in the Medline database that mention the subject genetic variation and the name of a particular drug target and its indication, where the drug was later approved by the FDA. The second curve "failure" represents the number of articles in the Medline database that mention the subject genetic variation and the drug target name and its indication, where the drug was later rejected by the FDA. Thus, information about how many publications mention the subject of the genetic variation may be used as training data for generating further predictive models adapted to predict whether a particular drug will be approved by the FDA based on the genetic variation data.
Fig. 5 depicts a graph 502 relating FDA approval of a particular drug to the number of articles. The time at which the earliest phase 2 trial for a particular drug begins is defined as time "0", so graph 502 covers articles published up to 20 years before the start of phase 2, and thus even much earlier than the FDA's final decision to approve or reject the drug. It can be seen that the numbers of articles are very similar from 20 years before the start of phase 2 until around 7 years before it. Then, the number of publications mentioning both the disease and the specific drug target becomes significantly higher for drug targets of drugs that will later be approved by the FDA. Five years before the drug entered phase 2, the differences were statistically significant, as indicated by the asterisks at the top of the graph. Thus, based on a literature model using article numbers and other distinguishing features, as shown in fig. 4, it may be possible to predict, several years before the FDA's actual decision and even before entry into a phase 2 trial, whether a particular drug should still be considered a promising candidate for FDA approval and whether more funds and effort should be invested in the preclinical and clinical studies associated with that drug.
This is an important finding, as scientists may manage large numbers of preclinical trials and may be interested in many hypothetical drug-disease combinations on which future research efforts could focus. By automatically generating a document-based prediction for a plurality of prediction tasks relating to, for example, a plurality of different diseases, drugs, drug candidates and combinations thereof, and by repeatedly and fully automatically updating the document-based model and the predictions generated based on the model, an automated prediction and alert system may be provided that allows a scientist who has registered a plurality of different prediction tasks to stay up to date and to stop expensive research when the chance of success is low.
FIG. 6 depicts two prediction scores generated by different document-based models M1, M2 for the same prediction task. For example, a first model-based prediction logic MLL1 that has been trained against the biomedical literature may produce a prediction result with a prediction score of 0.75 for the question of whether a particular drug X will be approved by the FDA for treating disease D. The second model-based prediction logic MLL2, which has been trained based on the same biomedical literature using neural networks for the same problem, can produce a prediction result with a prediction score of 0.65. Thus, the user may face the problem of deciding which prediction should be trusted. To assess the quality of the prediction models, a confusion matrix as shown in fig. 7 and 8 is typically used to determine which of the models M1, M2, and/or the corresponding prediction logic MLL should be considered more accurate and trustworthy. For example, the models M1, M2 may be neural networks or random forest models.
Fig. 7 depicts a confusion matrix associated with a particular model M1, e.g., the model M1 whose prediction result with a prediction score of 0.75 is depicted in fig. 6.
Fig. 8 depicts the confusion matrix associated with model M2, which outputs a prediction score of 0.65. The confusion matrices include the color-coded frequencies of true negative predictions 702, 802, false positive predictions 704, 804, false negative predictions 706, 806, and true positive predictions 708, 808. The user must therefore examine the graph with the prediction results as shown in fig. 6 in conjunction with the two confusion matrices as shown in fig. 7 and 8 in order to assess whether the prediction results of the two models and their respective qualities are similar. However, it is not possible to display the graphs shown in fig. 6, 7 and 8 on the small display of a mobile handset in a size large enough to allow a user to quickly grasp and understand the information contained therein. Furthermore, scrolling actions by the user typically consume a lot of energy. This may drain the battery and is therefore highly undesirable. In contrast, embodiments of the present invention provide a dense visualization of the prediction results and quality of two or more different models, thereby facilitating the comparison of two or more models performing the same prediction task.
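The model comparison described above can be illustrated with a small, hypothetical sketch: the four cell frequencies of a confusion matrix can be reduced to precision, recall and an F1 score, so that two models can be ranked by a single number. The counts below are placeholders, not the values shown in the figures.

```python
# Hypothetical sketch: derive summary metrics from confusion-matrix counts
# in order to compare two prediction models. The counts are placeholders.

def metrics(tp, fp, fn, tn):
    """Return precision, recall and F1 score from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

m1 = metrics(tp=80, fp=10, fn=20, tn=90)   # placeholder counts for model M1
m2 = metrics(tp=70, fp=5, fn=30, tn=95)    # placeholder counts for model M2

# The model with the higher F1 score would typically be considered
# the more trustworthy one for the prediction task.
better = "M1" if m1[2] > m2[2] else "M2"
print(better)  # → M1
```

Such a single summary number is what makes a dense visualization possible in the first place: instead of two full confusion matrices, only one derived quality figure per model needs to be shown.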
FIG. 9 depicts another example of an analog dial indicator icon displayed on the GUI 902. The icon depicted in the figure represents an ensemble prediction generated by an ensemble model. The ensemble model uses the outputs generated by many different predictive models as inputs for generating an overall prediction result. The icon may also include the elements 216, 254, 256 already described with reference to fig. 2A. The GUI may include additional metadata such as the prediction task, the number of approved publications that mention the disease and drug target, the specificity of positive and negative results, etc.
FIG. 10 depicts a block diagram of a system including a server computer 950 and a plurality of mobile devices 970, 992, 994 interconnected via a network 990. The network may in particular be a digital cellular mobile telecommunications network.
The server system 950 includes one or more processors 952 and a non-volatile storage medium 954 that includes model-based prediction logic 956 and a database 964. The server system may be a monolithic system or a distributed computer system, such as a cloud computer system. Likewise, storage medium 954 may be a single physical device, or may be a set of interconnected distributed storage devices.
Model-based prediction logic 956 may be machine learning logic, such as a neural network, a support vector machine, or the like. Functionally, the model-based prediction logic includes a model generation function 957 that analyzes a set of training data 966 (which are typically annotated as true positive and true negative results) to generate a model 958 in a so-called "training phase". The model-based prediction logic further includes a prediction function 955 that uses the generated model 958 to generate a prediction for particular input data 969 corresponding to a particular prediction task. The model generation and model-based prediction functions may be implemented as separate modules, or even as separate applications integrated into the model-based prediction framework. Alternatively, the model generation and model-based prediction functions may be components of a single piece of software. The model-based prediction logic 956 may be implemented in any programming language, such as, for example, Java, C#, Perl, C++, or the like.
According to some embodiments, the server system includes a backend program 962 configured to coordinate the exchange of request and response messages between the server and each of the mobile devices 970, 992, 994. For example, the backend program may be configured to receive a request from the mobile device 970 to perform a prediction task, such as predicting whether drug X may be approved by the FDA for treating disease D according to the document-based model 958. For example, the backend program 962 may interoperate with a client application 980 running on the mobile device, or with a specially designed plug-in to the browser 982 of the mobile device, whereby the plug-in acts as a kind of client application. Input data 969 for performing the requested prediction, such as the names of the one or more drug targets of the respective drug X, the name of the disease of interest D, and optionally other parameters, such as the model version to be used for the prediction, may be input by the user via the mobile device 970, or may already be available to the backend program 962 upon receiving the request from the mobile device 970. For example, the server system may include a database 964 that includes a registry 963 of registered users and prediction tasks, where each registered user has been assigned one or more prediction tasks, and where input data 969 to be used in performing a particular prediction task for a user may have been stored in the database 964 in association with the respective prediction task and user.
The backend program 962 may receive a request from the mobile device 970 to perform a prediction task for a particular user via the network 990, or any other trigger for initiating performance of the prediction task. In response to receiving the request or other trigger, the backend program forwards input data associated with the prediction task to model-based prediction logic 956 and triggers the model-based prediction logic to generate a prediction result 960 for the prediction task. For example, the prediction result may include the normalized prediction score 216, the first and second confidence intervals 256.1, 256.2, and optionally also the prediction variation interval 254, as described, for example, in the description of fig. 2A and other portions of this specification. The backend program 962 forwards the prediction result 960 to the one of the mobile devices 992, 994, 970 from which the request to perform the prediction was received, or to the mobile device of the user to whom the prediction task was assigned in the user and task registry 963 of the database 964. In addition, the prediction result may be stored in a prediction history 961 in the database 964. Preferably, each prediction result stored in history 961 is assigned some metadata, such as the prediction task for which the prediction was performed, the user or users to whom the prediction task has been assigned, the date of the prediction, and the ID, type, and/or version of the model used for the prediction. This history makes it possible to obtain a development profile of the prediction scores and of the change over time of the model-based prediction quality and certainty, provided that a specific prediction is repeated multiple times for the same prediction task on updated versions of the same model. These profiles can be used to generate motion images from a plurality of analog scale icons that visualize the prediction results obtained in the repeated predictions.
For example, the motion image may be an animated GIF, a short video clip, or any other suitable data format that allows server-side or client-side program logic to generate a "movie-like" sequence of graphical user interface elements.
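As an illustration of the prediction-history idea, the following sketch (all field names are assumptions, not part of the described system) shows how stored results carrying task, date and model-version metadata can be reduced to a chronological score profile suitable for such a motion image:

```python
from datetime import date

# Illustrative prediction-history records; field names are assumptions.
history = [
    {"task": "drug X / disease D", "date": date(2020, 1, 5),
     "model_version": "v3", "score": 0.75},
    {"task": "drug X / disease D", "date": date(2019, 1, 5),
     "model_version": "v1", "score": 0.61},
    {"task": "drug X / disease D", "date": date(2019, 6, 5),
     "model_version": "v2", "score": 0.68},
]

def score_profile(history, task):
    """Return the chronologically ordered prediction scores for one task."""
    records = sorted((r for r in history if r["task"] == task),
                     key=lambda r: r["date"])
    return [r["score"] for r in records]

profile = score_profile(history, "drug X / disease D")
print(profile)  # → [0.61, 0.68, 0.75]
```

Each entry of such a profile corresponds to one model version, so a frame of the motion image can be rendered per entry.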
The backend program 962 may return the prediction results 960 to the mobile device 970 via the network 990 using a number of different mechanisms and protocols. For example, the backend program may send the prediction results 960 directly to the client application 980 using, for example, the Enterprise JavaBeans (EJB) framework or a web services protocol (e.g., the SOAP protocol). Alternatively, the backend program 962 may send the prediction results in the form of an email, short message, or any other message format to the mobile device, where the message may be accessed and analyzed by the client application 980, or by a browser plug-in acting as a client application, in order to extract the prediction results. Still alternatively, backend program 962 may be a web server application configured to generate a web page, such as an HTML web page that includes the prediction results 960. The web page may include the prediction results in the form of text, e.g., HTML text elements, and/or in the form of analog scale icons 200, 260 generated by the backend program 962 on the server side. In the case where the server generates the analog scale icons, the receiving mobile device need only resize the icons contained in the web page so that they fit the dimensions of the matrix display 978 of the mobile device. Where the analog scale icons are provided in the form of vector graphics, the display of the icons by the mobile device includes rendering the vector graphics icons. Alternatively, where the prediction results are returned in the form of numerical values and value ranges, the client application 980 is configured to generate the analog scale icons 200, 260 from the prediction results, as described herein for various embodiments of the invention.
Client-side icon generation may also be accomplished by a browser plug-in acting as a client application, or by script code, such as PHP code, JavaScript code, Flash program code, etc., which is part of a web page provided by web server software acting as the backend program 962.
In the example illustrated in fig. 10, the mobile device 970 is powered by a battery 972 and includes one or more processors 974 and a non-volatile storage medium 976 in which a browser 982 and a client application 980 are stored and installed.
In one embodiment, the client application is a browser plug-in that is interoperable with the backend program 962. The client application receives the prediction results 960 in the form of values and value ranges and generates the analog scale icons 200, 260 from the values and value ranges. Client application 980 further creates web pages, integrates the generated icons into the web pages, and triggers the display of the web pages with icons on matrix display 978 through browser 982.
In another embodiment, the client application is a standalone application that can interoperate with the backend program 962. The client application receives the prediction results 960 in the form of values and value ranges and generates the analog scale icons 200, 260 from the values and value ranges. The client application generates a graphical user interface (GUI) including the icons and displays the GUI with the icons on the matrix display. For example, the GUI may be generated using the Java Swing or AWT libraries. Alternatively, the GUI may be an HTML web page, and the client application 980 may act as a "browser" adapted to display an HTML-based GUI.
In yet another embodiment, the mobile device does not include any client application or browser plug-in, or at least does not require them to receive and visualize the prediction results 960. For example, the backend program 962 may generate a web page that includes the prediction results 960 in the form of numeric values and numeric ranges and includes a script, such as a JavaScript portion, that is adapted to generate, when executed by the processor 974 of the client device 970, a graphical representation of the elements of the prediction result 960 in the form of respective elements of the analog scale icons 200, 260.
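For illustration only, the following sketch shows how a minimal SVG speedometer-style icon could be generated from a normalized prediction score. The geometry and styling are assumptions for demonstration and omit the sub-range indicators and variation bar described elsewhere:

```python
import math

def gauge_svg(score, size=100):
    """Render a minimal semicircular gauge as an SVG string (illustrative)."""
    cx, cy, r = size / 2, size * 0.9, size * 0.4
    # score 0 -> pointer at the left end of the semicircle (180 degrees),
    # score 1 -> pointer at the right end (0 degrees)
    angle = math.pi * (1 - score)
    x = cx + r * math.cos(angle)
    y = cy - r * math.sin(angle)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">'
        f'<path d="M {cx - r} {cy} A {r} {r} 0 0 1 {cx + r} {cy}" '
        'fill="none" stroke="black"/>'                    # semicircular scale
        f'<line x1="{cx}" y1="{cy}" x2="{x:.1f}" y2="{y:.1f}" stroke="red"/>'
        f'<text x="{cx}" y="{cy}" text-anchor="middle">{score:.2f}</text>'
        '</svg>'
    )

svg = gauge_svg(0.75)
```

Because the result is a vector graphic, either the server or the client can generate it, and the mobile device only needs to render and resize it for its matrix display.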
Instead of a single prediction result, the mobile device 970 may also request multiple prediction results for one or more prediction tasks and receive a list of prediction results instead of a single result. For example, the prediction results may be displayed in the form of a list, as shown in fig. 3.
Backend program 962 and/or client application 980 may likewise be implemented in any programming language, such as Java, C#, Perl, C++, or the like.
The above-described manner of requesting a prediction by a mobile device and receiving a prediction result from a server in response to the request may not be the only manner in which a user requests and visualizes a particular prediction. In a preferred embodiment, each user to whom one of the mobile devices 992, 994, 970 has been assigned may additionally or alternatively receive prediction results automatically and without an explicit request: in response to the server system having generated a new version of the model 958, the server system repeats one or more prediction tasks associated with the user of the mobile device in the user and task registry 963, and sends the results if it determines that at least one of the prediction results obtained using the new, updated model version differs significantly from the prediction result of the same prediction task obtained previously. Here, "significantly different" may mean that the prediction scores differ significantly and/or that any confidence interval (e.g., the first confidence interval 256.1, the second confidence interval 256.2, and/or the prediction variation interval 254) differs significantly from the corresponding interval obtained in a previous prediction of the same prediction task. In a preferred embodiment, the backend program 962 enables the user to individually configure a threshold for the prediction score and/or for each of the intervals 254, 256.1, 256.2 that specifies what the user considers to be "significantly different". The user performing this configuration may be one of the users to whom one of the mobile devices 992, 994, 970 has been assigned, or may be an operator of the server system 950. For example, the user may specify that the current prediction score differs significantly from the previously obtained prediction score if the normalized score values differ from each other by more than 15%.
Additionally, or alternatively, the user may specify: the currently predicted first confidence interval 256.1 is significantly different from the previously predicted first confidence interval for the same prediction task if the intersection of the two first confidence intervals is more than 15% smaller than the first confidence interval obtained for the previous prediction, and/or if the size of the first confidence interval obtained for the new prediction is at least 15% larger than the size of the first confidence interval obtained for the previous prediction.
Additionally, or alternatively, the user may specify: the currently predicted second confidence interval 256.2 is significantly different from the previously predicted second confidence interval for the same prediction task if the intersection of the two second confidence intervals is more than 15% smaller than the second confidence interval obtained for the previous prediction and/or if the size of the second confidence interval obtained for the new prediction is at least 15% larger than the size of the second confidence interval obtained for the previous prediction.
Additionally, or alternatively, the user may specify: the currently predicted variation interval 254 is significantly different from the previously predicted variation interval 254 for the same prediction task if the intersection of the two prediction variation intervals is more than 15% smaller than the prediction variation interval obtained for the previous prediction, and/or if the size of the prediction variation interval 254 obtained for the new prediction is at least 15% larger than the size of the prediction variation interval 254 obtained for the previous prediction.
The user may set different values for the first and second confidence intervals, the predicted variation interval, and the score difference threshold.
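The user-configurable "significantly different" checks described above can be sketched as follows. This is a simplified illustration: intervals are represented as (low, high) tuples within the normalized score range, and the thresholds default to the 15% used in the examples.

```python
# Illustrative sketch of the configurable significance checks.
# Intervals are (low, high) tuples; thresholds default to 15%.

def interval_size(iv):
    return iv[1] - iv[0]

def overlap(a, b):
    """Length of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def significantly_different(old_iv, new_iv, threshold=0.15):
    """True if the overlap shrank, or the interval grew, beyond the threshold."""
    shrunk = overlap(old_iv, new_iv) < (1 - threshold) * interval_size(old_iv)
    grown = interval_size(new_iv) > (1 + threshold) * interval_size(old_iv)
    return shrunk or grown

def scores_differ(old_score, new_score, threshold=0.15):
    return abs(new_score - old_score) > threshold * old_score

# Example: a confidence interval that both shifted and widened triggers
# the check, while a small score change (about 8%) does not.
print(significantly_different((0.60, 0.80), (0.75, 1.00)))  # → True
print(scores_differ(0.60, 0.65))                            # → False
```

The same two functions can be reused for the first confidence interval, the second confidence interval and the prediction variation interval, each with its own user-set threshold.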
According to a preferred embodiment, the program logic 962 is configured to retrain the machine learning based prediction logic 956 automatically, periodically and/or whenever new or additional training data 968 is available. Retraining the machine learning logic 956 means that a new version of the model 958 is generated, which can produce prediction results for the same prediction task that differ from those of previous model versions. The backend program 962 may be configured to automatically trigger, based on the new model version, re-computation of prediction results for all of the prediction tasks contained in the user and task registry 963, or at least for the prediction tasks marked with an "auto-update" tag. The backend program 962 then compares the prediction results obtained using the new model with the corresponding prediction results obtained using previous versions of the model that have been stored in the prediction history 961. In the event that a particular prediction result is "significantly different" from the prediction result obtained for the same prediction task using the previous model version, the backend program 962 automatically notifies, through an alert message, all users who have been assigned the particular prediction task that newer prediction results are available for the prediction tasks assigned to them. For example, the backend program 962 may be configured to send an alert message (e.g., an email, text message, or any other form of message) to a mobile device of a user who has registered for the prediction task, informing the user that the prediction results of this prediction task are significantly affected by the model update. In some embodiments, the alert includes a selectable element, such as a URL, whereby selecting the element triggers receipt of an updated prediction result and/or of an analog scale icon representing the updated prediction result.
In the event that the user has registered multiple prediction tasks, and in the event that the prediction results of the multiple prediction tasks have been determined by the backend program to be significantly affected (changed, modified) by the model update, the backend program may send an alert message including a list of updated prediction results and/or a list of analog scale icons representing the updated prediction results. Alternatively, the alert message may include only a notification that a new version of the prediction model has been generated and that multiple prediction tasks are affected by the new model, together with a selectable element (e.g., a URL), whereby user selection of the URL triggers receipt of a list of updated prediction results and/or of a list of analog scale icons representing the updated prediction results.
This can be very advantageous because users who have registered a large number of prediction tasks are not "overwhelmed" with a large number of messages each time the model is updated. Typically, model updates do not significantly affect the outcome of the prediction, and notification to the user only increases network traffic and annoys the user. However, in case it is observed that a model update has a significant impact on the prediction results, any user that has registered a respective prediction task is automatically provided with a new updated version of the prediction results, or at least with a link allowing the user to retrieve an updated version of the prediction results. Thus, the user is relieved of the burden of repeatedly performing a particular prediction to ensure that the prediction results are always based on the latest available data and model version.
According to an embodiment, the client application 980 in conjunction with the backend program 962 and the machine learning logic 956 is part of an automated IT framework configured to: automatically and repeatedly retrieve a current training data set; retrain the machine learning logic 956 based on the newly available training data to generate an updated version of the predictive biomedical model; repeat all prediction tasks specified in the user and task registry 963 on the updated model to obtain current prediction results; compare the current prediction results with the previously obtained prediction results generated based on the previous version of the model; and selectively notify of the relevant model update and the new prediction results only those registered users who have been assigned, in the registry 963, a prediction task whose prediction result was observed in the comparison operation to differ significantly from the prediction result obtained based on the previous model version. Thus, unnecessary network traffic is avoided and a highly user-friendly system is provided, ensuring that scientists and scientific managers always make decisions based on predictions of the latest model. The use of analog dial icons and corresponding thumbnail icons ensures that a user can quickly and intuitively evaluate and compare the prediction results and the quality of the underlying models and model versions on the small screen of a mobile device without performing any scrolling action or even reading a numerical value or value range.
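The automated retrain-and-alert cycle just described can be sketched as follows; all function and field names are illustrative assumptions, not part of the claimed system:

```python
# Hypothetical sketch of the retrain-repredict-notify cycle: run all
# registered tasks on the updated model, compare against the stored
# history, and alert only the users whose results changed significantly.

def update_cycle(tasks, history, predict, is_significant, notify):
    """Run all registered tasks on the updated model and alert selectively."""
    alerts = []
    for task in tasks:
        new_result = predict(task)            # prediction on new model version
        old_result = history.get(task["id"])  # last stored result, if any
        if old_result is not None and is_significant(old_result, new_result):
            for user in task["users"]:
                notify(user, task["id"], new_result)
                alerts.append(user)
        history[task["id"]] = new_result      # store in prediction history
    return alerts

# Toy usage: only the task whose score moved noticeably triggers an alert.
tasks = [{"id": "t1", "users": ["u1"]}, {"id": "t2", "users": ["u2"]}]
history = {"t1": 0.50, "t2": 0.70}
scores = {"t1": 0.80, "t2": 0.71}
alerted = update_cycle(
    tasks, history,
    predict=lambda t: scores[t["id"]],
    is_significant=lambda old, new: abs(new - old) > 0.15 * old,
    notify=lambda u, tid, r: None,
)
print(alerted)  # → ['u1']
```

The selective notification is the key design point: users with many registered tasks are contacted only when a comparison against the prediction history actually reveals a significant change.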
List of reference numbers
100 method
102-108 step
200 analog scale icon
202 first sub-range indicator
204 second sub-range indicator
210 represents the end of the scale for the maximum of the prediction score range
212 represents the end of the scale for the minimum of the prediction score range
214 central region of the background region
216 normalized predictive score
218 pointer
220 background area (speedometer plate)
222 explanation
254 predicted variation interval
258 change strip
256.1 first confidence interval
256.2 second confidence interval
260 analog scale icon
302 prediction list
304 list items including thumbnail analog scale icons
306 list items including thumbnail analog scale icons
402-408 graphs each relating FDA approval of a drug to the profile of another characteristic
502 graph relating FDA approval of drugs to number of articles
504 counts the number of articles that mention a particular drug that has subsequently been approved by the FDA
506 counts the number of articles that mention a particular drug that was later rejected by the FDA
508 predicted variation in median number of articles for drug target of FDA-approved drug and for approved publication
510 predicted variation in median number of articles for drug target of drug not approved by FDA and for approved publication
602 bar graph visualization of predictive scores generated by a random forest model
604 Bar graph visualization of predictive scores generated by a baseline model
702 true negative prediction
704 false positive prediction
706 false negative prediction
708 true positive prediction
802 true negative prediction
804 false positive prediction
806 false negative prediction
808 true positive prediction
902 graphical user interface including analog scale icons
904 interpretation of the predictions and metadata
950 server system
952 processor
954 storage medium
955 model-based prediction logic
956 machine learning logic
957 model Generation logic
958 predictive model
960 predicted results
961 prediction history
962 Back end program
963 registering users and predicting tasks
964 database
966 training data
968 New training data
969 input data
970 mobile device
972 Battery
974 processor
976 storage medium
978 matrix display
980 client application
982 browser
990 network
992 Mobile device
994 Mobile device

Claims (18)

1. A method (100) of visualizing a certainty of a biomedical model-based prediction on a matrix display of a battery-powered handheld mobile telecommunications device (970), the method comprising:
-receiving (104), by the mobile device (970), a prediction result (960) via a digital cellular mobile telecommunications network, the prediction result being generated by a program logic (956) for a biomedical prediction task using a biomedical model (958), the prediction result including at least:
a prediction score (216) that indicates the certainty of the prediction and is a numerical value within a score range that is a predefined range of possible score values;
a first confidence interval (256.1), the first confidence interval being a first sub-interval of the score range, wherein the first confidence interval is indicative of a model-specific score sub-range known to have a percentage of false negative predictions below a predefined FN percentage threshold;
a second confidence interval (256.2) that is a second sub-interval of the score range, wherein the second confidence interval indicates a model-specific score sub-range known to have a percentage of false positive predictions below a predefined FP percentage threshold;
-displaying (108), by the mobile device, an analog scale icon (200, 260) on a matrix display (978) of the mobile device, the analog scale icon comprising:
a background region (220) comprising the prediction score (216);
-a simulation scale (208) representing the score range, wherein the endpoints (210, 212) of the scale represent the maximum and minimum score values of the score range;
a pointer (218) pointing to a location within the scale representing the predicted score;
a first sub-range indicator (202) aligned with the scale such that a size and position of the first sub-range indicator relative to the scale indicates a size and position of the first confidence interval within the score range; and
a second sub-range indicator (204) aligned with the scale such that a size and position of the second sub-range indicator relative to the scale indicates a size and position of the second confidence interval within the score range.
2. The method of claim 1, wherein:
-the analog scale icon is a speedometer icon; and/or
-the background area is a tachometer disc area; and/or
-the scale is part of the contour of the background region; and/or
-the pointer originates from the center of the background region; and/or
-the background area is semi-circular; and/or
-the first and second sub-range indicators are each an arc, in particular a circular arc; and/or
-the prediction score is displayed in the center of the background region; and/or
-the simulated scale icon further comprises a central region (214) concentrically aligned with the background region (220), the central region displaying the score value.
3. The method according to any one of the preceding claims,
-the prediction result further comprises a prediction variation interval (254) indicative of a sub-range of the score range, the breadth of the prediction variation interval quantifying a variation or discrete amount of the prediction score calculated by program logic using the biological model;
-the analog scale icon further comprises a variation bar (258) arranged perpendicular to the pointer (218), the variation bar having a width related to and indicative of the width of the predicted variation interval.
4. The method of claim 3, wherein the variation bar has a width equal to a chord length of a visible or invisible circle segment originating from a center of the background region, both ends of the variation bar intersecting legs of the circle segment, an arc of the circle segment being a portion of a scale corresponding to the predicted variation interval (254).
5. The method of any of the preceding claims, further comprising:
-automatically generating, by the program logic (956), the predicted outcome (960) using the biomedical model (958).
6. The method of any of the preceding claims, the program logic (956) installed on a server computer system, the method further comprising:
-automatically generating, by the program logic (956), the predicted outcome (960) using the biomedical model (958);
-sending the prediction result to the mobile device via a network; or
-sending a message to the mobile device via a network, the message informing the mobile device that the prediction has been generated, and downloading the prediction by the mobile device from the server computer.
7. The method of any of the preceding claims, the program logic (956) being trained machine learning logic.
8. The method of claim 7, the method further comprising:
-repeatedly receiving training data, each received training data comprising at least some data not included in previously received training data;
-automatically retraining the machine learning logic based on currently received training data each time training data is received, thereby automatically generating an updated version of the biomedical model (958).
9. The method of claim 8, the biomedical model used by the machine learning logic being a first biomedical model generated based on first training data, the mobile device being one of a plurality of mobile devices (970, 972, 974) assigned to respective users, the method further comprising:
-registering the plurality of users and a plurality of biomedical prediction tasks at a back-end program (962), each user having assigned one or more of the prediction tasks;
-performing, by the machine learning logic (956), each of the prediction tasks, thereby generating first prediction results using the first biomedical model, respectively;
-selectively sending the first prediction result to a mobile device of the user, the first prediction result being generated for the prediction task assigned to the mobile device;
-automatically performing each of the prediction tasks again by the machine learning logic (956) in response to each retraining of the machine learning logic, thereby generating second prediction results using the updated version of the biomedical model, respectively; and
-selectively sending the second prediction result or a notification of its calculation to the mobile device of the user, the second prediction result being generated for the prediction task assigned to the mobile device.
10. The method of claim 9, further comprising:
-comparing, by the back-end program, the first predicted outcome and the second predicted outcome calculated for each predicted task;
-selectively performing the sending of the second prediction result or a notification of its calculation for those prediction tasks for which the first prediction result and the second prediction result were calculated, wherein:
the score value of the first predictor is within the first confidence interval and the score value of the second predictor is not within the first confidence interval; or
The score value of the second predictor is within the first confidence interval, while the score value of the first predictor is not within the first confidence interval; or
The score value of the first predictor is within the second confidence interval and the score value of the second predictor is not within the second confidence interval; or
The score value of the second predictor is within the second confidence interval, while the score value of the first predictor is not within the second confidence interval; or
The score values of the first and second predicted results differ by more than a predefined score difference threshold; or
-the magnitudes of the prediction variation intervals (254) of the first and second prediction results differ by more than a predefined interval length difference threshold.
11. The method according to any of the preceding claims 7-10, the machine learning logic having been trained on a biomedical document, the machine learning logic being adapted to use automatically extracted features from the biomedical document to predict the likelihood of failure of a preclinical or clinical trial testing the ability of a particular drug to treat a particular disease.
12. The method of any of the preceding claims, further comprising:
-receiving (104), by the mobile device (970), a plurality of predicted outcomes comprising the predicted outcome (960), each received predicted outcome having been generated by the program logic (956) based on different input data using the biomedical model (958);
-displaying a prediction list (302) on the display (978) of the mobile device, each list item (304, 306) representing one of the received prediction results and comprising at least a thumbnail analog scale icon graphically representing the prediction result, each thumbnail analog scale icon comprising at least:
a reduced version of the scale table,
a reduced version of the background region with the prediction score, and
a scaled down version of the pointer originating at the center of the scaled down background region and pointing to a location within the scaled down scale representing the predictive score;
-performing, when one of said list items is selected by a user, said generating and said displaying of said analog scale icon (200, 260), wherein said analog scale icon represents the prediction result represented by the selected list item and replaces said prediction list on said matrix display of said mobile device.
13. The method according to any of the preceding claims, wherein the analog scale icon is displayed as an element of a graphical user interface that has no scroll bar and/or does not support scrolling.
14. The method of any preceding claim, wherein the generation of the analogue scale icon is performed by one of:
-executing, by a browser, script elements of a web page provided by a server computer; or
-by a browser plug-in of a browser, the browser displaying a web page provided by a server computer; or
-by an application program interoperable with a back-end program hosted by the server computer, the back-end program adapted to provide the prediction result to the mobile device over a network.
15. The method of any of the preceding claims, further comprising normalizing an original prediction score generated by the program logic (956) and using the normalized score as the prediction score, wherein the normalization maps the original prediction score onto the predefined score range.
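The normalization of claim 15 can be sketched as a linear mapping of the raw model score onto the predefined score range. The following Python sketch is illustrative only; the function name and the choice of a [0, 1] default target range are assumptions:

```python
def normalize_score(raw, raw_min, raw_max, target_min=0.0, target_max=1.0):
    """Linearly map a raw model score from its original range
    [raw_min, raw_max] onto the predefined score range
    [target_min, target_max]."""
    if raw_max == raw_min:
        raise ValueError("degenerate raw score range")
    frac = (raw - raw_min) / (raw_max - raw_min)
    return target_min + frac * (target_max - target_min)
```

Normalizing in this way lets prediction scores from different models be placed on one common analog scale.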
16. The method of any of the preceding claims, further comprising:
-repeatedly performing, by the program logic (956), the generation of the prediction result of the biomedical prediction task, thereby using repeatedly updated versions (958) of the biomedical model, and
-visualizing the change in certainty of the repeatedly updated biomedical model in the form of a moving image of the analog scale icon, wherein the sizes of the first and second sub-range indicators, the direction of the pointer and/or the size of the movement bar (258), if present, vary over time in the moving image.
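One straightforward way to produce the moving image of claim 16 is to interpolate the icon geometry between the values derived from two successive model versions. This is an illustrative Python sketch under that assumption; the function name and the geometry keys are not part of the claims:

```python
def animation_frames(old_geom, new_geom, n_frames):
    """Linearly interpolate icon geometry (e.g. sub-range indicator sizes,
    pointer angle, movement-bar size) between the values computed from two
    successive versions of the biomedical model."""
    frames = []
    for i in range(1, n_frames + 1):
        t = i / n_frames  # interpolation fraction, reaching 1.0 at the last frame
        frames.append({k: old_geom[k] + t * (new_geom[k] - old_geom[k])
                       for k in old_geom})
    return frames
```

Rendering these frames in sequence makes the pointer sweep and the confidence sub-ranges grow or shrink smoothly as the model is updated.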
17. A mobile handheld telecommunications device (970), comprising:
-a battery (972) for powering the mobile device;
-a digital cellular network interface;
-a matrix display (978); and
-program logic (980, 982) executable by one or more processors of the device and configured for:
receiving (104), via the digital cellular network interface, a prediction result (960) generated by a prediction program logic (956) for a biomedical prediction task using a biomedical model (958), the prediction result including at least:
-a prediction score (216) that indicates the certainty of the prediction and is a numerical value within a score range, the score range being a predefined range of possible score values;
a first confidence interval (256.1), the first confidence interval being a first sub-interval of the score range, wherein the first confidence interval indicates a model-specific score sub-range known to have a percentage of false negative predictions below a predefined FN percentage threshold;
-a second confidence interval (256.2) that is a second sub-interval of the score range, wherein the second confidence interval indicates a model-specific score sub-range that is known to have a percentage of false positive predictions below a predefined FP percentage threshold;
displaying (108) an analog scale icon (200, 260) on the matrix display, the analog scale icon comprising:
-a background area (220) comprising the prediction score (216);
-an analog scale (208) representing the score range, wherein endpoints (210, 212) of the scale represent maximum and minimum score values of the score range;
-a pointer (218) pointing to a location within the scale representing the prediction score;
a first sub-range indicator (202) aligned with the scale such that a size and position of the first sub-range indicator relative to the scale indicates a size and position of the first confidence interval within the score range; and
a second sub-range indicator (204) aligned with the scale such that a size and position of the second sub-range indicator relative to the scale indicates a size and position of the second confidence interval within the score range.
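Drawing the pointer of the analog scale icon amounts to mapping the prediction score onto a dial angle and computing the pointer tip from the center of the background region. The following Python sketch is illustrative only; the function names, the left-to-right dial orientation, and the screen coordinate convention are assumptions:

```python
import math

def pointer_angle(score, score_min, score_max, start_deg=180.0, end_deg=0.0):
    """Map a prediction score onto a dial angle: the minimum score points
    left (180 degrees), the maximum score points right (0 degrees)."""
    frac = (score - score_min) / (score_max - score_min)
    return start_deg + frac * (end_deg - start_deg)

def pointer_endpoint(cx, cy, radius, angle_deg):
    """Screen coordinates of the pointer tip, starting from the center
    (cx, cy) of the background region (the y axis grows downward)."""
    rad = math.radians(angle_deg)
    return (cx + radius * math.cos(rad), cy - radius * math.sin(rad))
```

The same angle mapping can place the first and second sub-range indicators: the endpoints of each confidence interval are converted to angles and the indicator arc is drawn between them.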
18. A system comprising a server computer (950) and the mobile device (970) of claim 17, the server computer comprising:
-the biomedical model (958);
-the program logic (956) configured to generate the predicted outcome using the biomedical model (958);
-a back-end program (962) adapted to provide the prediction result to the mobile device via a network (990).
HK62021031861.9A 2018-05-03 2019-04-30 Visualization of biomedical predictions HK40041711B (en)

Applications Claiming Priority (1)

EP18170624.3, priority date 2018-05-03

Publications (2)

HK40041711A (en), published 2021-08-20
HK40041711B (en), published 2024-05-24

