EP1866849A2

EP1866849A2 - Computer-implemented method for the automatic knowledge-based generation of user-relevant data

Info

Publication number: EP1866849A2
Application number: EP06725693A
Authority: EP
Inventors: Gerd c/o DEFINIENS AG BINNIG; Arno c/o DEFINIENS AG SCHÄPE; Günter c/o DEFINIENS AG SCHMIDT
Original assignee: Definiens AG
Current assignee: Definiens AG
Priority date: 2005-04-08
Filing date: 2006-04-10
Publication date: 2007-12-19
Also published as: WO2006106152A2; WO2006106152A3; EP2320358A3; EP2320358A2

Abstract

Disclosed is a computer-implemented method for the automatic knowledge-based generation of user-relevant data. Said method comprises the following steps: objects are selected among a group encompassing data objects, class objects, processing objects, and heterogeneous linking objects in order to generate a first group of objects; heterogeneous linking objects are selected that interlink objects within homogeneous or heterogeneous groups and lead from the first group of objects to one or several second groups of objects; the heterogeneous linking objects are repeatedly selected until a predetermined final criterion has been met; one or several algorithms is/are selected to process the first and second groups of objects. All steps are repeated several times and are used in a hierarchically nested fashion, thereby defining a sequential control.

Description

COMPUTER-IMPLEMENTED METHOD FOR THE AUTOMATIC KNOWLEDGE-BASED GENERATION OF USER-RELEVANT DATA

Automatic electronic data processing has existed for as long as there have been computers. The more complex the tasks became, the more demanding the computer programs became, and the greater the demands on the programmer to create a faultless, functioning program. To enable programmers to create programs of ever-increasing complexity, increasingly powerful computer languages had to be developed. One line of code in Java, a modern programming language, is for example far more powerful than one line of machine language.

Despite the rapid development of programming languages, it is still the case today that computers cannot accomplish even simple intelligent tasks once their complexity exceeds a certain level. This critical level is rather modest compared to the complexity a human being can deal with. When it comes to simple relationships but large amounts of data, the computer is superior to humans; when it comes to complex relationships, the computer usually fails at a very low level.

Complex relationships arise above all when information is formulated predominantly implicitly, i.e. when information can be correctly evaluated only with the aid of complex knowledge. Examples of such implicit information are complex tables, text, graphics, images, films and other types of data. In all these cases, knowledge or even expertise is needed to interpret the content correctly. Both the images on our retina and the digital images of a camera take the form of pixel fields. The meaningful objects in the pictures, which are what make the pictures interesting in the first place, are not a priori present in the pictures but arise only in our brain through interpretation. This process of interpretation is also called perception.

It is accordingly the object of the present invention to provide a computer-implemented method for the automatic knowledge-based generation of user-relevant data with which information can be obtained from heterogeneous data in an efficient manner by interpretation.

This object is achieved with the measures specified in claims 1 and 15.

Further advantageous embodiments of the present invention are the subject of the dependent claims.

With the present invention, the following problems are solved.

A. Expansion towards auto-programming and training.

B. Automating the segmentation, analysis and interpretation of complex three- and multi-dimensional images, and enabling meaningful visualization of content in three-dimensional images and film.

C. Enabling the automatic interpretation of complex table contents with automatic segmentation and content grouping (generation of non-predefined object networks) and thus the recognition of important relationships.

D. Automatic interpretation, analysis and indexing of complex text documents.

E. Enabling the creation of highly complex "business intelligence" software.

F. Automatic, meaningful, intelligent data integration and analysis of multi-modal heterogeneous data (holistic interpretation of data).

The present invention will now be explained in more detail on the basis of an embodiment, with reference to the accompanying drawings.

The drawings show:

Fig. 1 is a graphic representation of object classes;

Fig. 2 is a structural diagram of the object classes in Fig. 1;

Fig. 3 is a graphic representation of the object classes;

Fig. 4 is a structural diagram of the object classes in Fig. 3;

Fig. 5 is a 3D analysis of an MR image;

Fig. 6 is an example of a spreadsheet; and

Fig. 7 is an example of a hierarchy.

The following abbreviations are used below:

En: units = objects and links used in the program flow

CNT: cognition network technology = principles of the new computer language

CN: cognition network = network of computer language objects

CL: cognition language = new computer language

CP: cognition program = program code in CL

CM: cognition machine = machine with executable CPs

PN: program network = all units (En) occurring during the program run

Processes: segmentation processes

This application deals with the automatic interpretation of content in complex data by means of a computer-implemented method. In particular, it deals with the automatic interpretation of tables, images, texts and networks. The application does not only describe the structure of a finished solution for data perception; above all, it presents a software tool that makes the creation of complex interpretations possible. It describes a new computer language specifically designed to interpret complex content in data such as spreadsheets and other structures. The internal structure of this language is based on the basic characteristics of the structure of human thinking.

Context

Proper interpretation of content in data requires knowledge. In conventional computer programs, this knowledge is implicitly incorporated by the programmer into the program in the form of algorithms. The problem is that this knowledge can be distributed throughout the program and usually cannot be properly located in it. Knowledge and processes are interwoven. Unraveling them retrospectively is almost impossible for very complex programs. This makes changes to the program difficult or completely impossible. In complex contexts, however, the perfect functioning of a program is not really achievable, and continuous improvement of the program over weeks, months, years or even decades is absolutely necessary. The biggest problem is so-called transferability. Even if the proper functioning of a program has been successfully tested on a large number of data sets, problems that were not anticipated still occur on new data sets. This is due to the unmanageably large number of different situations in complex contexts.

In the early days of the computer, software and hardware were also interwoven. It was a great step forward to separate the two so that they could be improved independently. Similarly, the cognition language is a method with whose help very sophisticated and complex programs can be written, while knowledge and processes can easily and conveniently be treated separately.

A previously known approach with ambitions in this direction is the semantic network. Here, knowledge is represented by the structure of the semantic network, and a computer program uses this knowledge when dealing with a data structure. However, semantic networks are not to be understood as a higher computer language, but rather as simulation tools with very limited execution possibilities. Even if not every solution can be handled elegantly in a computer language, every problem must at least be solvable in principle. New processes in semantic networks must be written in a conventional computer language. In CL, some basic processes are predefined and implemented in a conventional computer language. However, these processes are so generic that new processes can be created from these generic building blocks. Merely to further enhance the ease and elegance of programming, new generic CL modules programmed in a conventional language are also added again and again. Some other aspects, such as the concept of the domain and the navigation through subsets (see below), are also missing in semantic networks.

To cope with the demanding task, the new higher computer language, the "Cognition-Language" or CL, has the following characteristics:

General structure of the language

■ CL is a topic-specific high-level language.

■ CL is modular; modules (CL objects) are merged into a program.

■ There are input objects and CL objects.

■ More specific CL objects are knowledge and flow objects.

■ More specific input objects are input data and objects generated from them.

■ Knowledge and flow objects can be assembled into knowledge and flow hierarchies.

■ The possibility of networking all objects ultimately leads to a hierarchical network, the finished program.

■ The program transforms weakly structured input data into a hierarchical network of relevant input objects.

■ This transformation takes place over a large number of intermediate steps, in which intermediate objects that are ultimately not relevant are generated and gradually develop into the relevant ones.

■ Basic general and topic-specific knowledge is permanently built into the language.

■ Fundamental knowledge characterizes the structure of the language described below, and specific knowledge characterizes the specific, concrete predefined building blocks, the basic modules, the CL objects.

■ Any number of CL objects can be selected, customized with parameters or variables, and assembled into a complete program through hierarchical networking.

■ The program sequence and the results can be visualized and thus checked for their quality.

Specific structure of the language

■ Application-related "world knowledge", WW, and program sequences (processes) can be formulated separately.

■ WW and processes can be constructed as hierarchical network structures of knowledge objects or of flow objects.

■ The WW hierarchy and the flow hierarchy together represent the program.

■ Subsets of all forms of objects, the domains, can be selected manually or by processes, directly or indirectly.

■ Domains serve the local execution of processes and other algorithms.

■ Domains can be part of a program flow as well as part of a knowledge object.

■ Indirect selection is done by sequential hierarchical selection, by navigation along partially predefined or selected links.

■ Domains can be transformed into objects (hierarchically higher-ranking objects) by processes or manually (segmentation of input, knowledge and processes).

■ Processes can be performed locally on domains.

■ Processes and WW can access and modify each other.

■ Relations between subsections in the WW, in the processes and in the data can be formulated and automatically calculated.

■ The WW consists of knowledge objects, in particular: the concepts (classes), linking concepts, concept links, markers, object descriptions (expressions) and local and global variables.

■ The flow hierarchy consists of calculations, conditions, different types of classification and segmentation processes (the processes), and object property computations (features), as well as a feature and variable formula editor.

■ All objects can be linked together manually or through processes.

■ Different data sets can be loaded, and these or their components can be linked together by processes.

■ Different analysis results can be loaded, and these or their components can be linked together by processes.

■ Different knowledge hierarchies can be loaded, and these or their components can be linked together.

■ Linked simultaneous analysis and interpretation of different or heterogeneous tables and other types of data, such as text, numbers, vectors, spectra, images and presentation documents, can be performed.

■ All objects and links can be deleted by processes or manually.

■ Linking an input object with a knowledge object corresponds to a classification, linking it with a marker corresponds to a marking, and linking it with a local variable corresponds to an attribute.

■ Not only input objects but also knowledge objects and process objects can be classified.

■ Results can be exported as hierarchical object networks or as tables and graphs.

■ Tables, subsets and other data objects can be created, destroyed or changed (e.g. adding a new column, multiplying columns together).

■ Subsets of a table can be inserted into other tables through processes.

■ Repeated program sequences (loops) can be formulated.

■ Sub-processes can be formulated.

■ A process can conditionally abort a loop and transfer control to other processes at any position in the flow hierarchy.

■ General data, especially the names and contents of concepts, or even more specifically the names and contents of rows or columns in table concepts of the knowledge hierarchy, can be retrieved from knowledge objects via variables and inserted and used in other concepts or procedures.

■ Names of rows or columns of tables can be retrieved via variables and inserted and used in concepts, procedures or other tables.

■ Networked objects can be regarded as hierarchically higher objects with properties of their own, which can in turn be networked into objects at a still higher hierarchical level, and so on.

Visualization of the program flow and the results

■ All structural elements can be graphically displayed.

■ The course of the program can be displayed by presenting the evolution of the knowledge or data objects.

■ The representation of the objects can take the form of (transparent) coloring, which corresponds to their classification, their marking or the value of a feature or an attribute. The outline of the object and its position can also be displayed.

■ The similarity of objects, or another form of belonging together, can be shown, for example, by similar coloring.

Complex data, in particular tables, images and texts, can be automatically analyzed and interpreted with the help of the Cognition-Language, CL. The CL has a structure that is meaningful in itself and thus makes it possible to create highly complex solutions in the field of perception. In contrast to conventional methods, an evolution from objects to a hierarchical network structure of these objects is generated from the data sets automatically and step by step. This object structure consists only partially of predefined data blocks of the data. By incorporating abstract knowledge, the objects are instead generated according to criteria of meaning rather than formal criteria. This makes it possible to automatically extract sense and meaning from data sets, e.g. tables. The new computer language makes it possible to create such programs quickly and easily using the coordinated structural elements described.

The automatically generated analyses can serve users as a supplement to their own interpretation and as a decision-making aid. They can drastically reduce the time spent on an analysis and significantly increase its quality. The tables and other data can have heterogeneous contents, so that very different types of information can be automatically combined to form an overall assessment, an overall interpretation.

Concrete technical realization

The automatic processing of the data is carried out by a computer program with a special structure. The structure of this program is described in detail below. In rough outline, the program is composed as follows.

The three main building blocks of this program structure are:

1. the input data plus the input En generated from these,

2. the program control and

3. abstract and concrete knowledge of data content, program control and knowledge.

It may seem strange that a program contains knowledge about itself. Only then, however, will intelligence achievements similar to those of a human become possible.

The input component comprises the input data itself as well as the objects generated from it by processes and their interconnections. The program control represents and describes the dynamics of the program, which, once initiated, carries out calculations and structural changes. In the simplest case, program control structures the input data into a new form using abstract knowledge. Above all, subsets of the input data are determined, defined as objects with properties, and networked with each other as well as with the knowledge. In many cases, the objects are also provided with attributes by the program control.

In the general case, however, not only the input data but also the knowledge and even the program control are restructured during the program run. This means that the program structure as a whole changes depending on the input. Changing the entire program structure becomes necessary precisely when all three components, knowledge, data and program control, have to adapt to each other to guarantee a meaningful program. This circumstance occurs, for example, when new data and new processes have to be generated automatically from the data, that is, when not every possible situation in the input data can be predefined in the form of a concrete program and knowledge structure, but these can only be formed over the course of the program control steps. In this way, 1. a program can respond more flexibly to input data, and 2. a self-optimizing program (knowledge plus program control) can be created, which may optimize itself only on a training data set and is then fixed in this optimized form in order to run on other, new data sets.

In addition to this general procedure, the content of this invention consists in the structure of a new programming language, which allows a quick and easy programming in just this procedure. In this object-oriented language, there are three basic types of objects: input, flow, and knowledge objects. In addition, there are a number of special, subordinate, objects, of which, in addition to more conventional, such as object properties and mathematical expressions, three flow objects are to be particularly emphasized.

Important special process objects are variables, selection processes (determination of subsets of the entire set of En, domains for short) and structuring processes (processes for short). Processes create, destroy and modify objects and links in the entire program structure. Domains create and destroy (that is, change) subsets of the entire program structure. The processes are linked to domains and are then active only on these subsets. Global and local variables are used to transfer data, such as strings, numbers or even vectors, curves or tables, to specific places in the program structure (data, knowledge, processes), to use them in other places, or to store them (partly bound to objects) for later use.

This particular structure of the cognition language has been chosen to model an evolutionary process of perception. The input data, the program flow and the knowledge gradually adapt to each other. In the end, what matters is

1. in an automatic learning process, the structure of the knowledge and of the program flow, which can be reused on new data sets, and

2. in an interpretation process, the ultimately generated "data objects of interest" (hierarchically higher objects generated by networking the original data objects).

Solutions:

Technically, the solutions mentioned are made possible by

a.) automatic segmentation of the data into hierarchical networks, which are automatically linked to knowledge objects and other objects to form an integrated program network (PN), with the entire network changing dynamically during program execution,

b.) a navigation method in the multimodal network (the En) for locating subnets (= subsets of the program units), with the possibility of running classification and segmentation processes locally on these subnets, and

c.) inter- and intra-network communication by variables with the corresponding processes.

A. Heterogeneous linkages and heterogeneous domains (for all solutions).

Links between objects are created automatically. They can be used to generate hierarchically superordinate objects, to mark and classify these as a whole, and then to link them to other objects at a higher level, and so on. In addition, links generally serve the quick retrieval of objects: objects that are easier to find are used as starting objects, and established links are then followed to reach the objects that are harder to find much faster than through other process-oriented procedures (indexing). However, the links should not be fixed but should depend on which concrete structures exist in the respective data. Thus, the goal is the automatic process- and knowledge-driven creation and destruction of links between automatically generated subsets of (heterogeneous multimodal) data. This is only possible with the help of a multi-modal network navigation method.

The automatically created links are defined by navigation in the CN, i.e. along different objects and links. At the same time, these new links, as well as newly created objects, enable new navigation paths. In this way the CN grows, but it is also partially dismantled again (reminiscent of the development of the brain). Both the existence or non-existence of nodes and links and the existence or non-existence of side branches of the path serve as conditions for whether the path is continued or aborted. Mathematically or logically linked conditions, nodes and edges of the network can also be part of the navigation path (in other words, some of the links are mathematical or logical operations such as and, or, +, -, *, etc.). At the end of this navigation, or even at intermediate steps, there can be segmentation processes, which are executed on the subsets found there. If the path is aborted because conditions are not fulfilled, the subset for all subsequent processes is the empty set. If the path is followed up to the point of the process, the subset found at this point of the path is regarded as the "subset of interest", the so-called domain, for the attached segmentation process, and the process is executed on that subset. Segmentation processes create and/or destroy objects, groups of objects (domains) and links. These objects and links include the segmentation processes themselves, local or global variables or their values, classes and labels, objects generated from the data by aggregating their elements, and all kinds of links. The links of all kinds thus include both the predefined links that are prepared automatically during the course of the program (for image objects, for example, direct geometric neighborhoods in the form of neighborhood lists, or geometrically hierarchical links) and the links described in segmentation processes and generated by processes.
So far, no process-driven links have been described in this way, especially not where the navigation and linking are performed on heterogeneous objects and links, among which internal objects such as processes, attributes, features and classes are also counted. Thus, elements of one domain can be linked to the elements of another domain, or even of several different domains, in a simple automatic way. A special segmentation process can be: "Link the En of the start set of the path to the En of the end set." So far, only links of image objects to classes of the class hierarchy can be created automatically. So far, navigation in the described way is possible only in the image object hierarchy, but not within the process, variable, feature, expression and class hierarchies. In addition, variables cannot be structured hierarchically and features only to a limited extent, and expressions exist only for creating user-defined features and for linking features to classes.
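The navigation described above can be illustrated with a minimal Python sketch. It is not the CL implementation; the names `Network`, `navigate` and `link_start_to_end`, the link types and the toy data are all assumptions made for illustration. The sketch walks a path of typed links with a condition at each step, returns the empty set when the path is aborted, and then runs a linking process of the kind "link the En of the start set to the En of the end set" on the resulting domain.

```python
from dataclasses import dataclass, field


@dataclass
class Network:
    """Toy program network: nodes carry attributes, links are typed and directed."""
    nodes: dict = field(default_factory=dict)   # name -> attribute dict
    links: set = field(default_factory=set)     # (source, link_type, target)

    def add_node(self, name, **attrs):
        self.nodes[name] = attrs

    def add_link(self, source, link_type, target):
        self.links.add((source, link_type, target))

    def follow(self, sources, link_type):
        """Return all nodes reachable from `sources` over one link of `link_type`."""
        return {t for (s, lt, t) in self.links if s in sources and lt == link_type}


def navigate(net, start, path):
    """Walk `path`, a list of (link_type, condition) steps, from the start set.

    If no node survives a step, the path is aborted and the domain is empty.
    """
    current = set(start)
    for link_type, condition in path:
        current = {n for n in net.follow(current, link_type) if condition(net.nodes[n])}
        if not current:
            return set()
    return current


def link_start_to_end(net, start, domain, link_type):
    """Hypothetical linking process: link every start unit to every unit of the domain."""
    for s in start:
        for d in domain:
            net.add_link(s, link_type, d)
    return len(start) * len(domain)


# Heterogeneous toy network: a class, two image objects and a table entry.
net = Network()
net.add_node("class:nucleus", kind="class")
net.add_node("img:obj1", kind="image_object", area=40)
net.add_node("img:obj2", kind="image_object", area=5)
net.add_node("tab:row7", kind="table_entry", value=0.9)
net.add_link("class:nucleus", "classifies", "img:obj1")
net.add_link("class:nucleus", "classifies", "img:obj2")
net.add_link("img:obj1", "measured_in", "tab:row7")

path = [
    ("classifies", lambda a: a.get("area", 0) > 10),
    ("measured_in", lambda a: a["kind"] == "table_entry"),
]
domain = navigate(net, {"class:nucleus"}, path)
created = link_start_to_end(net, {"class:nucleus"}, domain, "summarized_by")
```

The new `summarized_by` link is itself part of the network afterwards, so a later path could navigate along it, which mirrors the statement that new links open up new navigation paths.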

So far it is also not possible to create links between image objects by means of processes, and therefore certainly not links between objects of different images, let alone heterogeneous links between objects in images on the one hand and other types of data objects, such as table entries, or internal objects such as processes or content in classes (expressions and conditions) on the other hand.

With the described manner of heterogeneous navigation, linking and domain generation, table entries, image objects and text objects located at completely different places can be linked with each other automatically and in an abstract manner, even across data sets.

The newly defined links can then in turn be used to navigate through the general object network and thus to define and create further new domains and links. Groups of linked objects can again be defined as objects with features that can be linked at a higher hierarchical level. In this way, for example, objects in two-dimensional slice images can be combined to form three-dimensional objects with features such as volume by linking objects in different slice images. Even within an image, objects that are not adjacent to one another but belong together in terms of meaning can automatically be combined into a superordinate object (e.g. a dotted line) with a feature such as length. Such superordinate objects can then also be linked with local variables or with classes. The same applies to texts, tables and other data. Non-contiguous table entries can also belong together in terms of meaning and form superordinate objects with features, or a table entry can acquire special significance only through its relationships to other entries.
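As an illustration of the slice example, the following Python sketch links 2D objects in neighbouring slices and groups the linked objects into superordinate 3D objects with a volume feature. The overlap criterion, the union-find grouping, the function names and the toy data are assumptions chosen for the sketch; the text does not prescribe how the links are established.

```python
from collections import defaultdict

# Each 2D object: (slice_index, set of (row, col) pixels). Hypothetical toy data.
slices = {
    "a0": (0, {(0, 0), (0, 1)}),
    "a1": (1, {(0, 1), (1, 1)}),
    "a2": (2, {(1, 1)}),
    "b0": (0, {(5, 5)}),
}


def link_adjacent_slices(objects):
    """Create links between objects in neighbouring slices whose pixels overlap."""
    links = []
    names = sorted(objects)
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            (zp, pix_p), (zq, pix_q) = objects[p], objects[q]
            if abs(zp - zq) == 1 and pix_p & pix_q:
                links.append((p, q))
    return links


def group_linked(objects, links):
    """Merge linked objects into superordinate objects (connected components)."""
    parent = {name: name for name in objects}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for p, q in links:
        parent[find(p)] = find(q)

    groups = defaultdict(list)
    for name in objects:
        groups[find(name)].append(name)
    return sorted(sorted(g) for g in groups.values())


def volume(objects, group, voxel_volume=1.0):
    """Feature of the superordinate object: pixel count times voxel volume."""
    return voxel_volume * sum(len(objects[name][1]) for name in group)


links = link_adjacent_slices(slices)
groups = group_linked(slices, links)
volumes = {tuple(g): volume(slices, g) for g in groups}
```

Here the three overlapping objects `a0`, `a1` and `a2` form one 3D object, while `b0` remains an object of its own; each group could then be linked to a class or a local variable like any other object.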

In summary, it can be said that the automatic generation of links by navigation in the CN is new and, above all, new with regard to navigation in the heterogeneous multimodal CN.

B. Communication in the CN during program execution (for all solutions).

The networking and the internal program communication happen in two ways: 1. via links and 2. via variables.

Expressions are small mathematical subnets that can be given a name and can thus be used at any point in the CP by linking (using their name).

What is new is that all mechanisms in the system, such as user-defined thresholds, class descriptions, features, algorithms, processes, conditions, domains, parts of the program, etc., can be networked with all kinds of mathematical and logical operations such as +, -, *, /, log, exp, sin, ..., and, or, not-and, not-or (in a kind of formula editor), given a name as an expression, and used anywhere in the CP. In this approach, the use of named subprograms, "Customized Algorithms" (CAs), is especially worth mentioning. Until now, references (links) to classes were used within a program flow. Now CAs can also be used within a class definition. This makes it possible to write a subprogram in CL to compute a complex feature that goes beyond the possibilities of creating a CF by mathematical networking. It is thus a kind of "higher CF", a CF that was programmed not in C++ but in CL. Another advantage of expressions is the ability to separate program flow and knowledge more clearly while improving communication between the two. Expressions can be formulated independently and used in various places. It may make sense to split a class description into several expressions (e.g. in image analysis, one expression with shape descriptions of the object and another with color descriptions). It is quite likely that only partial aspects of a class description are needed in different parts of the program flow.

The better separation of program flow and knowledge is especially important if a program is to be adapted to a new form of input data. On the one hand, a professional CL programmer would like to change the program as little as possible (ideally not at all) and to adapt mainly or only the expressions in the classes to the new situation. On the other hand, a non-CL programmer, i.e. a user of a CL solution, should be able to adapt the program to his or her data solely by optimizing the expressions, without having to understand the program sequence. Changing the expressions is dramatically easier than changing a program flow. Since the expressions are now also used in the program sequence, a user indirectly also changes the program sequence. But this happens in a meaningful way, without the user having to be aware of it.

1) Arithmetic/logic expressions: general terms formulated as a "formula"; they include ClassDescriptions, CustArithmFeatures, thresholds, etc., and allow the definition of rules, their logical combination and all mathematical operations such as +, -, *, /, log, exp, sin, ...

2) Procedural expressions: general terms formulated in process language; they allow CustAlg, CustRelationalFeatures, ...

3) Arithmetic expressions can be used as:

Feature: Def.: computes a property of an object (image object, link, class, process, ...) (returns double)

Fuzzy condition: as usual (thresholds are simple special cases of a fuzzy condition)

4) Procedural expressions can be used as:

Algorithm: Def.: changes the network structure by creating/modifying/deleting objects/links/attributes

Domain: Def.: describes a set of objects based on the network structure, taking into account the current processing status

Process: as usual

5) Expressions have unique names.

6) Domain: net structure (global/local) + fuzzy condition (expression)

global: all objects, all classes, image object level, etc.

local: current object, neighbors, super/sub object(s), ...

fuzzy condition: additional expression

7) Class:

* can be associated with an arbitrary number of named expressions

* predefined names:

* classify = current class description

* sgmn = std. segmentation method for these objects

* general = std. generalization procedure for these objects

8) Class-centric development:

ex: for all objects on img obj level main with membership to nuclei.classify > 0.2: assign nuclei

* Class expressions can be addressed via "class_name"."expression_name".

* Class expressions can initialize "wildcard expressions".

ex2: for all objects on img obj level main assigned to nuclei: apply nuclei.generalize
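The two examples under point 8 can be approximated in Python as follows. This is only a sketch under assumptions: `CLClass`, `ImageObject`, `resolve`, `domain`, the linear `ramp` membership function and the feature names `roundness` and `darkness` are hypothetical and do not reproduce CL syntax. The point shown is that a class carries named expressions, that these are addressed as "class_name.expression_name", and that a domain combines a net structure (here: an image object level) with a fuzzy condition.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ImageObject:
    name: str
    level: str
    features: dict
    assigned: Optional[str] = None


@dataclass
class CLClass:
    name: str
    expressions: dict = field(default_factory=dict)  # expression name -> callable


def ramp(value, low, high):
    """Linear fuzzy membership: 0 at or below `low`, 1 at or above `high`."""
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)


def resolve(classes, qualified_name):
    """Look up an expression addressed as 'class_name.expression_name'."""
    class_name, expr_name = qualified_name.split(".")
    return classes[class_name].expressions[expr_name]


def domain(objects, level, condition=lambda obj: True):
    """Domain = net structure (here: an object level) plus an additional condition."""
    return [o for o in objects if o.level == level and condition(o)]


nuclei = CLClass("nuclei", {
    # Arithmetic/logic expression used as a fuzzy class description.
    "classify": lambda o: min(ramp(o.features["roundness"], 0.5, 0.9),
                              ramp(o.features["darkness"], 0.3, 0.8)),
    # Procedural expression: modifies the object it is applied to.
    "generalize": lambda o: o.features.update(smoothed=True),
})
classes = {"nuclei": nuclei}

objects = [
    ImageObject("o1", "main", {"roundness": 0.8, "darkness": 0.7}),
    ImageObject("o2", "main", {"roundness": 0.55, "darkness": 0.9}),
    ImageObject("o3", "sub", {"roundness": 0.95, "darkness": 0.95}),
]

# ex: for all objects on level main with nuclei.classify > 0.2: assign nuclei
classify = resolve(classes, "nuclei.classify")
for obj in domain(objects, "main", lambda o: classify(o) > 0.2):
    obj.assigned = "nuclei"

# ex2: for all objects on level main assigned to nuclei: apply nuclei.generalize
generalize = resolve(classes, "nuclei.generalize")
for obj in domain(objects, "main", lambda o: o.assigned == "nuclei"):
    generalize(obj)
```

Adapting the solution to new data would, in this sketch, only require changing the thresholds inside `classify`; the two loops that make up the program flow stay untouched.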

Variables and corresponding processes

The second type of internal program communication takes place by means of variables. They can be understood as containers that fetch information from program objects and links (i.e. from En of the CN) and make it available to, or store it in, other program objects and links. In the case of storage, these are local variables, or object variables; in the other case, that of transport and "sharing", they are global variables, or scene variables. Global variables primarily serve intra-program communication, while local variables primarily store results once they have been associated with En of the CN and provide them for later communication. There can be several reasons for storing results on objects or links. On the one hand, the computational effort for calculating a result can be very large, and the result can be required at several points in the program sequence. If the values are stored locally, the calculation only has to be done once and the values can be retrieved again and again. On the other hand, it may be that recalculating the result at a later point in the program flow is no longer possible. Since the entire program object network is dynamic, relationships and objects (that is, the entire CN) are constantly changing, and so are the results of calculations. However, old results can be important if, for example, the history of the development of an object serves as the basis for a segmentation decision.
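The distinction between local and global variables can be sketched in Python as follows. The class `Unit`, the dictionary `scene`, the helper `cached` and the variable names are assumptions for illustration only. The sketch shows both reasons given above for storing a result on an object: the feature is computed only once, and the stored value preserves a state of the object that can no longer be recomputed after the network has changed.

```python
class Unit:
    """A program unit (object or link) that can carry local variables."""

    def __init__(self, name, pixels):
        self.name = name
        self.pixels = set(pixels)
        self.local = {}          # object variables, stored on the unit itself


scene = {}                       # global (scene) variables, shared program-wide
calls = {"area": 0}              # counts how often the feature is really computed


def area(unit):
    """Stand-in for an expensive feature computation."""
    calls["area"] += 1
    return len(unit.pixels)


def cached(unit, key, compute):
    """Compute a value once and keep it on the unit as a local variable."""
    if key not in unit.local:
        unit.local[key] = compute(unit)
    return unit.local[key]


obj = Unit("obj1", [(0, 0), (0, 1), (1, 1)])
first = cached(obj, "area_before_merge", area)   # computed and stored: 3

obj.pixels |= {(2, 2), (2, 3)}                   # the network changes: object grows

again = cached(obj, "area_before_merge", area)   # old result is retrieved, not recomputed
scene["largest_area"] = area(obj)                # current value shared via a global variable
```

After the merge, `again` still holds the old area while `scene["largest_area"]` reflects the current object, which corresponds to keeping the history of an object available for later segmentation decisions.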

Global variables are only meaningful if they go hand in hand with corresponding segmentation processes, i.e. if their contents or values can actually be fetched, used or stored. It must therefore be possible to capture all relevant data in the program network. Accordingly, these actions must also be formulated as segmentation processes. For example, access to the class hierarchy must be possible so that changes there can be incorporated automatically into the program flow. In text analysis, it must therefore be possible to write the names of the classes (or of the concepts in the ontology) as strings into a variable and to compare these strings with the text contents or words of the text input data. This already involves two types of processes: "write class name as string into variable" and "compare in a condition" (e.g. equality or fuzzy comparison) the contents of the variables with text objects. At first glance this action seems pointless, because the class name could be used directly in the program flow without having to consult the class hierarchy (the ontology is always included here). However, this assumes that the CL programmer knows all the entries in the class hierarchy, including future ones, which is explicitly to be avoided. As already mentioned above, the class hierarchy can change, and one does not want to have to search for all corresponding points in the program flow in order to change the corresponding entries there as well. This should happen automatically and can be realized via the navigation and the domains in the following way (explained using the text example).

Just as one can selectively treat specific objects in the input data via the domains, without knowing exactly which ones occur in a concrete analysis or are first formed by segmentation processes, so too can one find and handle unknown classes, processes, domains, expressions or variables via the described navigation. For input objects, class and expression navigation is often used: all input objects that are linked to a particular class and meet certain conditions represent the domain and are treated. Here the navigation goes from the classes via the classification links to the input objects. Similarly, one can navigate within the classes, from more abstract classes to more concrete ones.

For example, if protein names are to be interpreted automatically at one point of a text analysis program, in connection with proteins in cell biology, then the program must also be able to treat the protein abbreviations. As a rule, these abbreviations consist of only three letters and are therefore highly ambiguous. It is now possible to search for all abbreviations in the input text without knowing them beforehand. They only need to be linked to other classes in the class hierarchy and ultimately to more abstract classes. Then the navigation, and thus the definition of the domain, can be made via these links. Thus, if the abbreviations appearing in the input text are also listed in the ontology and linked to the class or the concept "protein" via hierarchical links, then one can navigate "to all proteins", then via the hierarchical link "is more specific" and with the condition "number of letters < 4" reach this subset of words in the text, namely the abbreviations, as a domain. Then the classification links between the class "Protein Abbreviations" and the protein abbreviations occurring in the input text can be created via segmentation processes. In a first step, many of the links may be wrong, as abbreviations of other terms are linked as well. Later, however, other processes can be used for correction.
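The navigation described for the protein abbreviations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny class hierarchy, the helper `abbreviation_domain` and the representation of classification links as `(class name, word position)` pairs are all hypothetical and not taken from the source.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class ClassNode:
    """Hypothetical class (concept) with hierarchical links to more specific classes."""
    name: str
    children: List["ClassNode"] = field(default_factory=list)

    def descendants(self) -> Iterator["ClassNode"]:
        """Follow the hierarchical links downwards to all more specific classes."""
        for child in self.children:
            yield child
            yield from child.descendants()


# Assumed toy ontology; a real class hierarchy would be far larger and could change.
protein = ClassNode("protein", [
    ClassNode("kinase", [ClassNode("AKT"), ClassNode("MAPK")]),
    ClassNode("RAS"),
    ClassNode("hemoglobin"),
])


def abbreviation_domain(root: ClassNode, words: List[str]) -> List[Tuple[int, str]]:
    """Navigate from the root class to all more specific classes whose name has
    fewer than 4 letters, then to the matching words of the input text."""
    short_names = {node.name for node in root.descendants() if len(node.name) < 4}
    return [(pos, word) for pos, word in enumerate(words) if word in short_names]


text = "AKT phosphorylates RAS but MAPK and the cat RAS remain".split()
domain = abbreviation_domain(protein, text)

# Create classification links between the class and the words of the domain.
links = [("Protein Abbreviations", pos) for pos, _ in domain]
```

Because the class names are read from the hierarchy at run time, adding a new abbreviation to the ontology changes the domain without any change to the program flow.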

The processes for creating the classification links can be of different types. There can be a fixed "String Match" process in which the classification link is created if the class name matches the name in the text. Or a variable can take the names of the concepts as "values", and the abbreviations are marked with the condition "value of variable = word string". This label (named "abbreviation of proteins", for example), which can also be called a forced classification, serves as an index, because the abbreviations can now be found again very quickly via the links created to the markers. In addition, a simple formulation of the domain is possible: "go to the class "abbreviations of proteins" in the class hierarchy and then via classification or labeling links to the input objects". The programs that are created specifically for abbreviations can all run on this domain, which is found very quickly by this quasi-indexing.

By no means do variables have to be just numbers or strings. Especially for training a program (but also in the normal program sequence) it is appropriate and convenient to also allow tables, vectors, variable sets, or even blocks of data such as images or texts as the contents of variables, and thus to allow the internal program communication of these data. A variable set has a name, like all variables, and is a set of variables plus their values.

In this sense, a representative image or template outline of an object to be found in an image may be a local variable in the knowledge hierarchy. If necessary, this variable can be retrieved by navigation, transported to specific image objects and used there for a segmentation process (e.g., classification by determining the scale-invariant and rotation-invariant best coverage). The same applies to tables, texts or other data such as DNA code. Table examples, text examples (many words) or DNA code examples can serve as templates and be compared with a concrete, already segmented input object. Of course, this comparison process is usually not a simple match, but one that allows distortions.

During automatic or semi-automatic training of a CP, the program is informed of the desired final results in some form. Hierarchical annotation allows a user to specify what kind of objects are to be found. However, this can also be done interactively by a user applying certain processes (with varying parameters) to a data set and thereby creating some desired objects, or by manually marking objects. The user can also mark the correct objects among those found automatically. The program can then automatically extract parameters from these selected objects and perform new processes with these parameters. At the end of this action, a parameter set is available that represents the optimized program sequence, i.e. the one that yields an optimal match (best overall classification of all objects) with the selected objects. These parameters can comprise segmentation and execution parameters (including classification) for all En. Thus, it can also be determined which expressions, classes or process blocks are to be active and in which order. It is taken for granted that CN features are available for all En. Thus, for a self-learning CP, segmentation processes or blocks thereof must also be classifiable via features such as process runtime.

Since usually not just one kind (one class) of objects is to be found, but a whole variety thereof, many such parameter sets, and thus possibly a very large number of parameters, must be trained. If the number of parameters becomes unmanageable, it makes sense to structure variables with their values hierarchically in the form of groups with their own names. Parameter sets are more than hierarchically structured variables, because different parameter sets can contain the same variables but with different values. A parameter set is therefore a kind of n-dimensional vector with n values and with freely selectable dimensions (represented by the names of the variables). If a parameter set is activated, its values are automatically written to the corresponding variables. This means that different parameter sets can be effective at different points in the program sequence, and thus possibly the same variables but with different values.
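The activation of a parameter set can be sketched as follows. The class `ParameterSet`, the variable names `scale` and `threshold` and the set names are assumptions made for this illustration only; the sketch merely shows that two named sets can hold the same variables with different values and that activating a set writes its values into the program variables.

```python
from typing import Any, Dict

# Program variables that processes read at run time (hypothetical names).
variables: Dict[str, Any] = {"scale": 10, "threshold": 0.5}


class ParameterSet:
    """A named set of variables plus their values (a kind of n-dimensional vector)."""

    def __init__(self, name: str, values: Dict[str, Any]) -> None:
        self.name = name
        self.values = dict(values)

    def activate(self, target: Dict[str, Any]) -> None:
        """Write the values of this set into the corresponding variables."""
        target.update(self.values)


# Two sets with the same variables (dimensions) but different values.
nuclei = ParameterSet("nuclei", {"scale": 5, "threshold": 0.8})
cells = ParameterSet("cells", {"scale": 40, "threshold": 0.3})

nuclei.activate(variables)
after_nuclei = dict(variables)   # values effective at this point of the sequence

cells.activate(variables)
after_cells = dict(variables)    # same variables, different values later on
```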

C. Automatic knowledge-driven segmentation of table contents for the purpose of their automatic interpretation.

Even in the case of spreadsheets, the CNT software solutions automatically generate non-predefined yet meaningful objects and relationships, the meaning objects, from or between subsets of data. Which objects and links are created automatically can be described in a knowledge structure, the class hierarchy. Meaning objects can also be objects (and as a rule they are) that only serve as intermediate objects in order to ultimately enable the segmentation of the actual meaning objects. The type of objects to be automatically segmented can thus be formulated in an abstract manner in the form of a hierarchical knowledge network, separately from the rest of the program sequence. This makes spreadsheets based on CNT technology dramatically different from conventional spreadsheets.

Automatically generated links, created by processes, are used to classify meaning objects and to quickly find the meaning objects to be treated. A meaning object can be a non-predefined segment of a row or column, or an object described in a very abstract, knowledge-based and process-defined manner that consists of hierarchically linked rows or columns or segments thereof. In a multimodal approach, such an object consists of multimodal parts, e.g. of image, text and table segments.

The entire CN structure already described is needed. In addition, however, special, table-specific structures are necessary. These special structures optionally include transformations of tables into images with thematic levels, in order to treat tables in the same CN as images. This simplifies data integration and usability, as most features are needed only once for both applications. Other special structures are the following:

For user-friendly editing of the tables, it should be possible to transfer meaning objects within tables (or from images generated from tables plus thematic levels) into new tables. This requires the following segmentation processes: "create empty table with row length x and column length y", "define meaning object Y (table subset) as sense table Z", "write sense table into local variable LV" and "fill empty table with content from LV" or "write content of LV into empty table at positions x, y". This means that variables must be able to contain tables as well as numbers and strings. However, it would also be possible (though somewhat more cumbersome) to transfer the meaning object entry by entry to the desired location of the empty table via the domain navigation described above. In this case, string and number contents of variables suffice.
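The table-transfer processes named above can be sketched with plain Python lists. The function names `create_empty_table`, `cut_out` and `write_at`, the dictionary `local_vars` and the interpretation of x as a column position and y as a row position are assumptions for this illustration, not definitions from the source.

```python
from typing import Dict, List, Optional

Table = List[List[Optional[float]]]


def create_empty_table(n_rows: int, n_cols: int) -> Table:
    """Process "create empty table": every entry starts as None."""
    return [[None] * n_cols for _ in range(n_rows)]


def cut_out(table: Table, rows: range, cols: range) -> Table:
    """Process "define meaning object (table subset) as sense table"."""
    return [[table[r][c] for c in cols] for r in rows]


def write_at(target: Table, content: Table, x: int, y: int) -> None:
    """Process "write content into empty table at positions x, y"
    (x = column of the top-left corner, y = row of the top-left corner)."""
    for dr, row in enumerate(content):
        for dc, value in enumerate(row):
            target[y + dr][x + dc] = value


source: Table = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]

# The local variable holds a whole table, not just a number or a string.
local_vars: Dict[str, Table] = {}
local_vars["LV"] = cut_out(source, rows=range(1, 3), cols=range(1, 3))

empty = create_empty_table(3, 3)
write_at(empty, local_vars["LV"], x=1, y=0)
```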

Just as in image analysis conventional processes, such as averaging or edge filters, have also found their way into the CNT, this is also necessary for spreadsheets. Multiplication or general mathematical operations of one row or column with another, or else the reordering of rows and columns, are offered as segmentation processes.

Tables should be hierarchically structured via domains and navigation. Navigation phrases of the following type must be possible: "go to the column with the name XYZ and with the feature expressions column length = 10 and column average value < 100, from there to the left neighbor column, from there along the hierarchical link "is part of" to the super-object, from there along all the links "has parts" to all sub-objects". The subset of the table reached at the end of the navigation represents the domain on which a segmentation process such as "find the table entry in the domain with the maximum value" runs.
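A navigation phrase of this kind can be sketched in Python as a chain of small steps. The column names, the `is_part_of` mapping and the function `navigate` are hypothetical; the sketch only illustrates how a start condition, a neighborhood step and two hierarchical steps lead to a domain on which a process then runs.

```python
from typing import Dict, List

# Assumed toy table: ordered column names and their entries.
names: List[str] = ["Q1", "Q2", "XYZ", "Q4"]
table: Dict[str, List[float]] = {
    "Q1": [float(v) for v in range(10)],
    "Q2": [5.0] * 9 + [42.0],
    "XYZ": [50.0] * 10,
    "Q4": [500.0] * 10,
}

# Hierarchical links "is part of" from columns to hypothetical super-objects.
is_part_of: Dict[str, str] = {
    "Q1": "first half", "Q2": "first half",
    "XYZ": "second half", "Q4": "second half",
}


def navigate(start: str) -> List[str]:
    """Start column (with conditions) -> left neighbor -> super-object -> all sub-objects."""
    column = table[start]
    if not (len(column) == 10 and sum(column) / len(column) < 100):
        return []                      # start conditions not met: empty domain
    index = names.index(start)
    if index == 0:
        return []                      # no left neighbor column
    left_neighbor = names[index - 1]
    super_object = is_part_of[left_neighbor]
    # Links "has parts": all columns that belong to the same super-object.
    return [n for n in names if is_part_of[n] == super_object]


domain = navigate("XYZ")

# Process running on the domain: find the table entry with the maximum value.
max_entry = max(value for name in domain for value in table[name])
```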

The classification of meaning objects must be possible by formulating classes of meaning objects with expressions as content (or classes linked to expressions). It must be possible to formulate expressions of the following type: "Meaning object ABC has as sub-objects the objects classified as KLM and the objects classified as NOP, with the feature D of the KLM objects < 5 and the feature E of the NOP objects > 100". Fuzzy features can also be formulated.

D. Automatic segmentation, analysis and interpretation of complex three-dimensional and multi-dimensional images by linking two-dimensional objects.

In addition to conventional three-dimensional segmentation techniques, the multidimensional image CNT includes the ability to segment and classify a four-dimensional image (a time-varying three-dimensional image) in layers, and then to create three- or four-dimensional objects by linking two-dimensional objects. This process must be regarded as an evolutionary one in which two-dimensional and three- or four-dimensional analyses alternate and interact with one another.

The layer structures can be represented as an image with different tiles or as a number of images that are sorted or marked according to their geometrical four-dimensional position.

There must be features that automatically consider linked two-dimensional objects as three- or four-dimensional objects. Examples of such features are the volume or velocity of an object.
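The linking of two-dimensional objects into a three-dimensional object and a feature computed on the linked group can be sketched as follows. The linking criterion used here (same class, adjacent slices, overlapping pixel positions) and the names `SliceObject`, `link_slices` and `volume` are assumptions for illustration; the source does not prescribe a particular criterion.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Tuple


@dataclass
class SliceObject:
    """Hypothetical two-dimensional object found in one slice (layer) of an image."""
    z: int                               # index of the slice
    label: str                           # class assigned by 2D classification
    pixels: FrozenSet[Tuple[int, int]]   # (x, y) positions covered in the slice


def link_slices(objects: List[SliceObject]) -> List[List[SliceObject]]:
    """Link objects of the same class on adjacent slices if they overlap in (x, y),
    and return the groups of linked objects (connected components)."""
    n = len(objects)
    links: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            a, b = objects[i], objects[j]
            if a.label == b.label and abs(a.z - b.z) == 1 and a.pixels & b.pixels:
                links[i].add(j)
                links[j].add(i)
    seen: Set[int] = set()
    groups: List[List[SliceObject]] = []
    for i in range(n):
        if i in seen:
            continue
        seen.add(i)
        stack, group = [i], []
        while stack:
            k = stack.pop()
            group.append(objects[k])
            for m in links[k]:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        groups.append(group)
    return groups


def volume(group: List[SliceObject], voxel_volume: float) -> float:
    """Feature of the linked (three-dimensional) object: pixel count times voxel volume."""
    return sum(len(obj.pixels) for obj in group) * voxel_volume


slices = [
    SliceObject(0, "ventricle", frozenset({(0, 0), (0, 1)})),
    SliceObject(1, "ventricle", frozenset({(0, 1), (1, 1)})),
    SliceObject(2, "ventricle", frozenset({(1, 1)})),
    SliceObject(0, "white matter", frozenset({(5, 5)})),
]
groups = link_slices(slices)
ventricle_volume = volume(groups[0], voxel_volume=2.0)
```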

The intelligent segmentation of multi-dimensional images makes the visualization of image content in multi-dimensional images meaningful in the first place. The different objects of interest can be represented three / four-dimensionally by different colors and the less interesting objects can be displayed transparently to a desired degree.

E. Automatic data integration through simultaneous analysis and interpretation of different or heterogeneous tables and other types of data such as text, numbers, vectors, spectra, images and presentation documents, and even CL-internal structures such as concepts and program flows.

F. Visualization of results as graphs.

G. New computer language for creating self-learning programs.

As already mentioned above, it is only necessary to annotate the input data and to be able to save training results (learning results) in the form of tables or parameter sets. Otherwise, there is no need for special CNT features for this application beyond the multi-modal application. An exception is only the case of extremely complex parameter sets:

If the number of parameters to be tested becomes very large, it is no longer possible to go through all possible combinations in a reasonable time. In that case, it must be possible to use evolutionary and genetic algorithms in the CNT.

New features of the programming language (keyword-like):

1. Extension of the language to global variables which, with the help of corresponding processes, can transport information of different kinds (strings, vectors, tables, ...) between (and within) world knowledge, program flow and input data (in short, all En).

2. Segmentation and classification of processes and knowledge objects.

3. Process-driven creation of links for all program objects (all En).

4. Treatment of the networked objects as hierarchically superior objects that may have properties and which in turn can be networked to hierarchically higher objects, and so on.

5. Extension of the domain concept to subsets of world knowledge and other En of the program flow, including CP-created links.

6. Application of the language to heterogeneous tables of all kinds (including, but not limited to, business intelligence) with corresponding topic-specific processes and object property calculations.

7. n-dimensional image and table processing by being able to load several images or tables (or even images created with the process "load table as image") and to link these and their objects in a process-driven manner.

8. Multimodal data can be loaded and processed through multimodal navigation and domains, i.e., multimodal data can be cut out by multimodal navigation to create very heterogeneous subsets. Segmentation processes can raise multimodal domains to En and further segment and classify them with multimodal expressions and features.

The invention in detail

Fig. 1 graphically shows the object classes of the units, En.

Fig. 2 shows the associated structure diagram.

Fig. 3 shows in another way the object classes in the CL:

Fig. 4 shows the structure diagram associated with Fig. 3.

consequences

Cognition-Language enables the creation of software solutions for the interpretation of complex data records. It is specially designed for problems with ambiguous solutions. An example of an unambiguous solution is the sum of a column in a table, or finding the largest numeric value in the table (which can be one or more entries). All computers in the world, if they are not defective, will arrive at the same result for the same table. However, in cases where the number of arithmetic operations necessary for an unambiguous result becomes unmanageable, new approaches have to be taken. The question of which move is the best in a given situation of a game is one such case. The answer depends on how many moves the computer can calculate ahead in a reasonable amount of time (not all moves can be calculated by a computer) and on which strategies it is equipped with. Once again, it is not clear which strategy is the better one. A dramatically aggravated situation is found in perceptual problems. The content of a complex table is probably interpreted by 100 different people in 100 different ways, and can be interpreted only very primitively, or not at all, by today's computer programs.

Another example of non-predictability: the number of possible combinations of pixels of an image into an object is so huge that no computer can create even approximately all possible objects and examine their properties. Thus, no computer can unambiguously find the "best" object of an image (except in trivial cases). The unpredictability applies especially when the meaning of blocks of numbers (or image objects) can be defined only by relationships to other blocks of numbers (or image objects), which is in fact the case almost always. In such cases, that is, in the normal cases of an interpretation problem, the number of possible combinations explodes beyond measure. Herein, in the complexity of the task, lies the difference between a spreadsheet calculation and the interpretation of the meaning of a table.

Nature has developed a strategy for such unpredictable situations: evolution in small steps from primitive life forms to more complex ones. The Cognition-Language provides a language that naturally allows a small-step evolution of blocks of meaning within data structures, from primitive blocks of meaning to more sophisticated ones. In this approach, individual treatment of the meaning blocks is necessary: different meanings (even if they are still primitive) require different treatments. Here, a primitive block of meaning can be a single number of a table if it deviates drastically from the other numbers. However, this number is part of a larger unit, e.g. a column. Thus, the whole column can gain a special meaning and require special treatment. It may also be the case that other columns that have a special relationship to this one require special (possibly different) treatment as well.

With its concepts, markings and domains, the CL enables local treatment and a gradual development from the simple to the sophisticated. The separation of knowledge and processes makes the programs clearer and simplifies their creation, continuous improvement and troubleshooting.

Tables interpretation

Tables and digital images have a similar structure: they consist of an n-dimensional ordered array of numeric values. In images, the image objects play the decisive role in addition to the pixels; in tables, in addition to the numbers, it is above all the rows and columns. Image objects are usually very difficult to extract, whereas rows and columns are given. But by no means are rows and columns the only interesting objects in tables. Within a column or row, undesignated sub-objects may be contained. For example, a jump in the numerical values may occur in the monthly sales figures of a company. This could have been triggered by an event that may or may not appear explicitly in the tables (e.g. a marketing activity). In any case, there is a time before and a time after the event, i.e. the row can be divided into two sub-objects, one with high and one with low averages. For an interpretation of the table, it may make sense to treat these sub-objects and relate them to other objects (to the marketing activities). Including the sales figures of the previous years can then make sense. Thus, this results in a multi-scale segmentation problem similar to that in image analysis, in this example, however, only in one dimension.
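The division of a row into two sub-objects at a jump can be illustrated by a very simple one-dimensional segmentation. Splitting at the single largest difference between neighboring entries is an assumption chosen for brevity; the function name `split_at_largest_jump` and the sales figures are hypothetical.

```python
from typing import List, Tuple


def split_at_largest_jump(values: List[float]) -> Tuple[List[float], List[float]]:
    """Divide a row into two sub-objects at the largest jump between neighbors."""
    if len(values) < 2:
        raise ValueError("need at least two values to split")
    cut = max(range(1, len(values)), key=lambda i: abs(values[i] - values[i - 1]))
    return values[:cut], values[cut:]


def average(values: List[float]) -> float:
    return sum(values) / len(values)


# Hypothetical monthly sales figures with a jump after the fourth month.
sales = [10, 11, 9, 10, 30, 31, 29, 32]
before, after = split_at_largest_jump(sales)
averages = (average(before), average(after))
```

The two resulting sub-objects can then be treated separately, for example by relating the month of the jump to the entries of a marketing table.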

Depending on the dimension of the image, n-dimensional neighborhoods play an important role in images. They are of minor importance for tables, although not insignificant, especially if rows and columns can be sorted according to certain criteria (sorting alters the table and can be considered a special pixel filter in analogy to image analysis). Then it may make sense to segment two-dimensional objects, e.g. areas where profits, sales, marketing activities and other quantities are at a high level. However, two-dimensional or multi-dimensional table problems similar to those of images also occur in practice without sorting, if within one row one parameter of a measured value increases or decreases in small steps and in the columns another parameter also decreases or increases in fairly homogeneous steps. However, this will not be the norm, and direct neighborhood in tables will usually only be significant in one dimension. Nevertheless, there are higher-level objects whose entries can be distributed over the entire table. Therefore, it is important that data objects can be automatically linked with each other if, after an automatic classification analysis, they together form a meaningful object.

This can be considered a kind of multidimensional segmentation of objects. In this case, one could link the time segments of high marketing activity with the time segments of high turnover and the time segments of high profits. One could transform this network into a hierarchically higher object and link it in the world knowledge with the class "Marketing Success" (classification). This classification can be done automatically, in that the object satisfies certain object descriptions (expressions) of the Marketing Success class. Expressions are based on object property calculations (features) and can have simple conditions (for example, ratio of marketing effort to profit increase at the jump less than 0.01). A class can be a logical combination of many expressions. A comparison of the expression logic with the concrete objects results in a classification probability, which in the example cited describes the size of the success. Thus, through segmentation and subsequent classification, all marketing successes (for example, since the company was founded) with a classification probability greater than 0.5 can be found. These can now be linked together again to form an object on an even higher level, be compared with each other and in turn be related to other higher objects in order to find further correlations.

The advantages of the domains can also be explained by this example. The objects "marketing successes" did not exist in the original data; they were first created in Cognition-Language. Now they can be used as domains. Domains direct the processes to the right places within the data. In our example, the domain description would be as follows: "go to all marketing successes (class description of the domain) with classification probability greater than 0.7 (condition for great successes) in the period after 1.1.2000 (second condition, time of the establishment of the AG) and let the process become active". The process can be the calculation of a number, for example the variance of the ratio of marketing effort to turnover of the company. Here it is assumed that formulas based on features and variables can be created in the program (formula editor). This gives a measure of whether marketing expenses and company sales are linearly linked. In detail, this looks as follows in Cognition-Language:

1. Process: domain: go to all marketing successes (with the appropriate conditions); no process activity.

2. Process (subprocess): domain: go to the sub-object Marketing Expense; process activity: write the value of the feature "Sum Marketing Expenditure" into the global variable "marketing effort".

3. Process (subprocess): domain: go to the sub-object sales; process activity: write the value of the feature "total sales" into the global variable "sales".

4. Process (subprocess): no domain; process activity: divide the global variables "marketing effort" and "sales" by each other and write the result into the global variable "ratio".

5. Process (subprocess): domain: go to the super-object (marketing success); process activity: write the global variable "ratio" into the local variable "marketing effort / sales".

6. Process (no subprocess): domain: go to all marketing successes as before; process activity: calculate the standard deviation of the local variable "marketing effort / sales" and write the result into a global or local variable with an appropriate name.
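The process sequence above can be mirrored in a short Python sketch. It is an analogy under stated assumptions, not Cognition-Language code: the class `MarketingSuccess`, the example figures, the cut-off date and the use of the population standard deviation (`statistics.pstdev`) are all chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from statistics import pstdev
from typing import Dict, List


@dataclass
class MarketingSuccess:
    """Hypothetical higher-level object with two sub-objects and local variables."""
    probability: float                 # classification probability
    start: date                        # time of the success
    marketing_effort: List[float]      # sub-object: marketing expenses
    sales: List[float]                 # sub-object: sales
    local_vars: Dict[str, float] = field(default_factory=dict)


objects = [
    MarketingSuccess(0.9, date(2001, 5, 1), [10.0, 20.0], [100.0, 200.0]),
    MarketingSuccess(0.8, date(2003, 2, 1), [30.0], [100.0]),
    MarketingSuccess(0.6, date(2004, 1, 1), [50.0], [100.0]),   # probability too low
    MarketingSuccess(0.95, date(1999, 6, 1), [5.0], [100.0]),   # too early
]

global_vars: Dict[str, float] = {}

# Process 1: the domain, all marketing successes meeting both conditions.
domain = [o for o in objects if o.probability > 0.7 and o.start > date(2000, 1, 1)]

for obj in domain:
    # Processes 2 and 3: write feature values of the sub-objects into global variables.
    global_vars["marketing effort"] = sum(obj.marketing_effort)
    global_vars["sales"] = sum(obj.sales)
    # Process 4: divide the global variables and store the ratio globally.
    global_vars["ratio"] = global_vars["marketing effort"] / global_vars["sales"]
    # Process 5: back at the super-object, store the ratio in a local variable.
    obj.local_vars["marketing effort / sales"] = global_vars["ratio"]

# Process 6: statistic over the local variables of all objects in the domain.
global_vars["ratio spread"] = pstdev(
    o.local_vars["marketing effort / sales"] for o in domain
)
```

The global variables only transport values between the steps, while the local variable keeps one result per marketing success, which is what makes the final statistic over the whole domain possible.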

FIG. 5 shows a 3D analysis of an MR image.

On the left, for a 3D analysis of an MR image of a human head, one sees a strip of about 6 slices in shades of gray and a segmentation result (enlarged by a factor of two in comparison to the gray images). Here the left (green) and right (red) ventricles and the white brain matter have been found by segmentation of the slices. On the right side, the slices of the left ventricle are linked, and the total volume of the left ventricle can be determined. The link can be visualized by clicking with the computer mouse on the object in a single slice. Then, as an example, all objects linked to it could be automatically marked at the same time, as shown in the two strips on the right. In this way it can be checked whether the automatic linking has worked as desired. The same can also be implemented for other data such as texts or tables (see also FIG. 6).

Fig. 6 shows an example of a spreadsheet.

Here, a row segment was clicked in a spreadsheet in CNT, and the segments linked to this segment were automatically selected as well (marked in red). Thus, a hierarchically superior object can be visualized. The gray values represent the numerical values, but the numbers can also be displayed directly. Geometrically superior objects are color-coded in the two right-hand strips. If one clicks here on, e.g., a brown field, then the entire brown field is marked. It is then also available as an object for domain navigation.

Fig. 7 shows an example of a hierarchy.

Here you can see the class hierarchy, the process hierarchy and the segmented and classified input data. The red and blue stripes are non-predefined automatically found sense objects (and therefore all of different lengths).

Example of a language definition: http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf (188 pages!).

Perhaps also important is the question of the flow control (runtime environment). There are still several alternatives or additions.

1. Iterative processing of heterogeneous mass data from heterogeneous data sources for extracting user-specific, relevant information (parameters, diagrams).

2. Processing is characterized by

2a. the assignment of data to objects,

2b. the creation, destruction and linking of objects or sets of objects,

2c. the limitation on the application of processing instructions to certain dynamically generated sets of objects (domains).

3. Objects and processing instructions can be graphically or textually displayed and edited by the user.

Claims

1. A computer-implemented method for automatic knowledge-based generation of user-relevant data, with the following steps:

Selecting objects from a group having data objects, class objects, processing objects, and heterogeneous linking objects to thereby create a first group of objects, selecting heterogeneous linking objects that interconnect objects within similar or heterogeneous groups, and from the first group of objects lead to one or more second groups of objects; repeatedly performing the selection of the heterogeneous linkage objects until a predetermined end criterion is met;

Selecting one or more algorithms for processing the first and second groups of objects; wherein all steps are repeated multiple times and hierarchically nested and thereby define a flow control.

2. The method of claim 1, wherein selecting the heterogeneous linkage objects is partially omitted.

3. The method according to claim 1 or 2, comprising the following step:

Determining whether or not a respective next step of the flow control is hierarchical, thereby determining whether, starting from a preceding group of objects, the subsequent objects found via link objects are defined as multiple groups, or whether the plurality of groups are to be treated as a whole group, to thereby determine a chain-like or tree-like navigation.

4. The method of claim 1, wherein objects standing at the end or at an intermediate step of the flow control are selected from the group comprising data objects, processing objects, class objects, link objects and locally linked variables.

5. The method of any one of the preceding claims, wherein selected algorithms structurally change objects, include information within the objects in variables, and place variables in the objects.

6. The method of claim 1, wherein link objects are created or destroyed partially conditionally by selecting a plurality of selected groups within the flow control or independently of the flow control, and group members of the plurality of groups are linked together by link objects or their link objects are deleted.

7. The method of claim 1, wherein heterogeneous objects are created conditionally by selecting a plurality of selected groups within the flow control or independently of the flow control, and group members of the plurality of groups taken together are defined as new hierarchically superior objects.

8. The method of claim 7, wherein conditions relate to properties of a hierarchically superior object to be generated.

9. A method according to any one of the preceding claims, wherein class objects are linked to one another via link objects, which can be selected as part of the flow control, and store properties that identify corresponding and their link objects and are used for limiting conditions in the flow control.

10. The method according to any one of the preceding claims, wherein the data to be analyzed and the data objects are heterogeneous and access to the data and the data objects is homogeneous.

11. The method of claim 10, wherein the heterogeneous dataset has a plurality of different slice images that together represent an n-dimensional image, where n is an integer equal to or greater than three, and wherein the time for detecting temporal changes is one dimension, and knowledge objects are heterogeneous in that they describe two-dimensional slices of the n-dimensional data objects as well as three-dimensional and n-dimensional objects.

12. The method of claim 10 or 11, wherein the heterogeneous dataset comprises a table having heterogeneous entries.

13. The method of claim 10, wherein the heterogeneous dataset comprises text documents having heterogeneous contents.

14. The method according to any one of claims 10 to 13, wherein the heterogeneous data set represents a plurality of similar or different images, which are in a meaningful context.

15. The method according to claim 10, wherein the heterogeneous data record comprises a computer program which, during an interaction of the computer program with data objects for interpreting the data objects, is hierarchically classified, segmented, and changed by a superordinate computer program optimizer with the aim of improving the correctness and the efficiency of interpreting the data.

16. A computer-implemented method for automatic knowledge-based generation of user-relevant data, comprising the following steps:

Selecting generic launch objects based on properties and names of objects having data objects, class objects, processing objects, and link objects;

Navigating to new common objects, which are also defined by name and properties; and

Creating a tree of groups of common objects by repeatedly navigating from the new common objects as new generic startup objects.