EP1509856A2

EP1509856A2 - Method for searching for data, taking into account the moment of availability of said data in a distributed system

Info

Publication number: EP1509856A2
Application number: EP02719901A
Authority: EP
Inventors: Markus Blume; Markus Hoffmann
Original assignee: Individual
Current assignee: Individual
Priority date: 2001-02-22
Filing date: 2002-02-22
Publication date: 2005-03-02
Also published as: AU2002250996A1; WO2002069184A3; US20020116375A1; WO2002069184A2; DE10108564A1

Abstract

The invention relates to a method for searching for data which is stored in a distributed system (1) or resources (2b, 5-10) containing data, wherein the data stored in the system (1) contains a time index relating to the moment when or period of time during which the data is or was available in the system (1). The search words defining the search conditions include a time parameter which limits the search to the time and/or time period defined by said time parameter. According to a method for accessing resources (2b, 5-10) of a distributed system (1) and for receiving and/or representing data stored in said resources (2b, 5-10), said data is displayed along with the information contained in the time index; data in the system (1) is accessed according to predefinable time parameters.

Description

Method for searching for data, or resources containing data, currently or previously stored in a distributed system, taking into account the time of their availability

The present invention relates to a method for searching for data, or resources containing data, currently or previously stored in a distributed system, and to a method for accessing the resources of a distributed system and for receiving and/or displaying data currently or previously stored in these resources, taking into account the time of availability of the data in the system. In particular, the invention relates to a method for searching for or accessing data from the Internet.

The Internet in its current form offers the possibility to access extensive databases and information in a short time. With the help of so-called search engines, for example, targeted searches can be carried out for data which are intended to meet specified search conditions. The available research options and the database that can be accessed are considerably more extensive than a classic library.

A characteristic of the Internet, however, is that the information available changes very quickly. The content of so-called websites is updated at regular intervals or even continuously, depending on the type of information it contains. The average lifespan of a website, i.e. the period in which the data remains unchanged, is estimated to be around 70 days. When data is updated, the originally available data has so far usually not been saved or archived, so that it was irretrievably lost. In contrast to a classic library, only the current state of knowledge can be called up when researching on the Internet. How this state of knowledge has developed over time cannot be determined from the data made available on the Internet.

Since a large part of the information is now made available exclusively on the Internet, there is a risk that a not insignificant amount of data and knowledge will be lost again after a short time, also because the relevance of published data and information sometimes becomes recognizable only after a longer period. If the data has already been deleted in the meantime, there is often no way to reconstruct it. As a result, the citability of Internet resources is severely restricted, since it is uncertain whether information or data remains permanently available. Either the location can change, or the data can disappear entirely.

It is often not only of historical but also of practical interest to know the level of knowledge available at a certain point in time in a certain area. For example, in order to assess the patentability of an invention, it is necessary to take into account the state of the art available at the time the invention was filed. However, the information provided on the Internet can only be used to a limited extent for this purpose, since it only provides an image of the current state of knowledge and usually does not provide any information as to when this knowledge became available. At present, the assessment of inventions can essentially only be made on the basis of printed publications, which, however, now and in the future will include an insignificant amount of knowledge compared to the data of the Internet. Another problem in this connection is that, unlike with printed works, it has not yet been possible to verify when this data was first available.

In the meantime, first attempts have been made to archive the data made available on the Internet. One example is the Internet Archive (www.archive.org), in which the content of websites is stored on data tapes to prevent the loss of the information contained therein when a website is changed. In addition, the stored data is provided with information indicating when the data was stored. This makes it possible to find out the information content of a website at an earlier time by calling up the data stored in the archive. The websites alexa.com and google.com also save data from the Internet; this data, however, is overwritten when newer data of the same resource is saved, so that only the most recently saved version is ever publicly available.

Furthermore, a method for creating a database is known from US Pat. No. 5,933,832, in which the stored data is provided with a time index which indicates when the data was updated. However, this method also does not offer the possibility of searching specifically for data, or of accessing data, that was available to the general public at a specific point in time or period.

Another option is to extend the functionality of proxy servers, which provide Internet users with access to the system, in such a way that they form a personal archive for the respective user (information about the AT&T iProxy project can be found at: http://www.research.att.com/~iproxy/archive/). The user has the option of storing a currently accessed website in the personal archive together with information about the time of storage. If he accesses his personal archive at a later point in time, he is able to restore pages essentially as they were previously available on the Internet. However, the content of this archive is limited to the information that is specifically selected and saved by the user, so that it does not provide a comprehensive overview of the level of knowledge in a particular area at a particular point in time.


In addition, neither the Internet Archive nor the personal archive offers the option of specifically searching for information, since these are pure databases that do not support searching under specified search conditions.

The present invention is therefore based on the object of specifying a concept for accessing and searching for data, or resources containing data, that is currently or was previously stored in a distributed system, the time at which the data is or was available being taken into account. The invention relates not only to the Internet, but to all distributed or networked systems which provide data, for example also to intranets, extranets, LANs, WANs or MANs (metropolitan area networks).

The object is achieved by the methods and devices of the independent claims.

A first aspect of the invention relates to a method for searching for data currently or previously stored in a distributed system, or for resources which contain such data. In this context, resources are to be understood as all storage locations of data which can be clearly localized; in the case of the Internet, for example, the storage locations which can be localized by means of a URL (Uniform Resource Locator) or a corresponding standard. The data is then to be understood as, for example, the websites available under a resource, including the files contained therein and/or the files associated therewith. Strictly speaking, if these files are clearly addressable, they can also represent resources of their own. For the sake of clarity, however, reference is primarily made to data below.

The method according to the invention comprises several steps. First, a query containing one or more search terms is transmitted to a search unit. In a further step, the distributed system is searched for resources or data, or for information relating to this data, which meet the condition(s) defined by the search terms. In a final step, the data found by the search and/or information relating to the resources that contain this data is output. As is usual with search engines on the Internet, the search can take place in such a way that the distributed system is not searched for every query; rather, the search engine is connected to a memory which stores images of, or references ("fingerprints") to, the data in the distributed system. The search is then carried out only in this memory, and the search results refer to the respective data or resources in the distributed system. According to the invention, the data contains a time index relating to the point in time or period at which it is or was available in the system, and the search terms in turn may include a time parameter that limits the search to the time and/or period defined by the time parameter.
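The search steps described above can be illustrated with a minimal sketch in Python. It assumes that the memory of the search unit is a simple list of records, each carrying a time index in the form of an availability interval. The names `IndexEntry` and `search`, the field names and the example URLs are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class IndexEntry:
    """One record in the search unit's memory (hypothetical layout)."""
    url: str                       # locator of the resource
    text: str                      # indexed content of the data
    available_from: date           # start of the time index
    available_to: Optional[date]   # end of the time index; None = still available


def search(index, terms, at=None, start=None, end=None):
    """Return entries that contain all terms, limited by an optional time parameter.

    `at` limits the search to data available on one day; `start` and `end`
    limit it to data whose availability overlaps the given period.
    """
    if at is not None:
        start = end = at
    hits = []
    for entry in index:
        if not all(term.lower() in entry.text.lower() for term in terms):
            continue
        last_day = entry.available_to or date.max
        if start is not None and last_day < start:
            continue  # data had already disappeared before the period
        if end is not None and entry.available_from > end:
            continue  # data only appeared after the period
        hits.append(entry)
    return hits


index = [
    IndexEntry("http://example.org/a", "fuel cell membrane",
               date(1999, 1, 1), date(1999, 12, 31)),
    IndexEntry("http://example.org/a", "fuel cell membrane catalyst",
               date(2000, 1, 1), None),
]
old_hits = search(index, ["fuel", "cell"], at=date(1999, 6, 1))
```

Here `old_hits` contains only the 1999 version of the page, whereas the same query without a time parameter returns both versions.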

The method according to the invention thus offers the possibility not only of searching for specific resources or for information on a specific subject area or on specific search terms, but also of restricting the search to specific periods or times. This opens up the possibility of learning the state of knowledge in a certain area at an earlier point in time and thus, for example, of following the development over time in this area. The method according to the invention thus offers the same possibilities as a search in a classic library, while the search can be carried out much more simply and efficiently owing to the computer-aided automated processing of the request. Developments of this method according to the invention for searching for data or resources containing data are the subject of subclaims. In particular, the search unit is preferably implemented by a computer program, which is made available, for example, by certain resources of the system. In particular, this aspect of the invention relates to a search engine for searching for data, or resources containing data, stored in a distributed system, the search engine being designed such that it carries out the search in the manner just described.

Another aspect of the present invention relates to a method for accessing resources of a distributed system and for receiving and/or displaying data currently or previously stored in these resources, which also includes access to data archived in an archive or storage network. The data in turn contains a time index relating to the point in time or period at which it was available in the system, and the information contained in the time index can also be displayed when the retrieved data is displayed. This means that a user can see at any time when the data presented was available.

This method is also preferably implemented using a computer program. This aspect of the invention relates in particular to a browser for accessing the resources of a distributed system, or to a representation of such access which is realized in a browser. Further developments are the subject of further subclaims.

According to a third aspect of the invention, which likewise relates to a method for accessing the resources of a distributed system and for receiving and/or displaying data that is currently or was previously stored in the resources, the data of the system is accessed as a function of a predeterminable time parameter, the data stored in the system again containing the time index relating to the point in time or period of availability in the system.

Going beyond the method described above, not only is the information contained in the time index of the data displayed; the data is now accessed in a targeted manner such that only the data that was available at a predeterminable, possibly earlier, point in time or period is accessed. It is therefore possible to determine the information content of resources at an earlier point in time. This also opens up the possibility of moving not only within the currently available distributed system but also in a temporal dimension. For example, the temporal development of a certain resource can be observed in a simple manner. Alternatively, one could move around in the distributed system such that the system behaves as it was available at a certain earlier point in time.

This third aspect of the invention also relates in particular to a browser for accessing the resources of a distributed system, or to a representation of such access which is implemented in a browser, for which a time parameter can be predefined, the access to the data of the system taking place as a function of this time parameter. Further developments of this aspect of the invention are also the subject of subclaims.
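Access as a function of a time parameter can be sketched as a lookup of the version of a resource that was available at the requested time. The following Python sketch assumes that each archived version is stored together with the time from which it was available; the class `VersionedResource` and its methods are illustrative names only, not part of the application.

```python
import bisect
from datetime import datetime


class VersionedResource:
    """Archived versions of one resource, ordered by start of availability."""

    def __init__(self):
        self._times = []      # sorted times from which each version was available
        self._contents = []   # content belonging to the time at the same position

    def add_version(self, available_from, content):
        # Keep both lists sorted by time so that lookups can use binary search.
        i = bisect.bisect_right(self._times, available_from)
        self._times.insert(i, available_from)
        self._contents.insert(i, content)

    def as_of(self, when):
        """Return (available_from, content) of the version visible at `when`.

        Returns None if the resource had no archived version at that time.
        """
        i = bisect.bisect_right(self._times, when) - 1
        if i < 0:
            return None
        return self._times[i], self._contents[i]


page = VersionedResource()
page.add_version(datetime(2000, 1, 10), "<html>first version</html>")
page.add_version(datetime(2000, 6, 1), "<html>second version</html>")
shown = page.as_of(datetime(2000, 3, 1))
```

A browser that is given the time parameter 1 March 2000 would display the first version, and could show the returned start time next to the page as the information from the time index.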

Finally, another aspect of the invention relates to a method for archiving data stored in a distributed system. Data is first retrieved or received from the distributed system, then supplemented by a time index relating to the point in time or period at which the data was available in the system, provided the data does not yet have a time index, and finally archived in a data archive or at a depository in such a way that the data can be accessed by search engines, browsers or programs. Alternatively, the archiving can take place at any point in the distributed system, in which case verification information relating to the data can also be archived in a depository.

The present invention thus offers a self-contained concept by which it is possible to use the complete information content of the data of a distributed system, taking into account the temporal development of the data. This provides convenient and powerful display and research options.

The invention will be explained in more detail below with reference to the accompanying drawing. The figures show:

Figure 1 shows a schematic representation of a distributed system for explaining the present invention;

Figure 2 shows the display of the window of a browser according to the invention, which offers the possibility of taking into account the time or period of availability of data when accessing and displaying this data; and

Figure 3 shows a search engine according to the invention, which offers the possibility of taking temporal aspects into account when searching for data.

With reference to Figure 1, the structure of a networked or distributed system with the corresponding resources, and the type of data available, will first be explained in more detail. This is done using the example of the Internet, but the invention relates to all conceivable distributed systems that make data available, that is to say also to intranets, extranets, LANs, WANs and MANs (metropolitan area networks).

In the present case, the distributed system 1 contains a number of different resources 4 to 10 and 2b, i.e. clearly localizable storage locations that contain data. In the case of the Internet, these resources 4 to 10, 2b can be localized by their URL, in the most general case by any corresponding standard. Strictly speaking, each component of a resource that can itself be clearly localized can represent a resource of its own.

The resources 5 to 7 each contain retrievable data, for example websites in HTML or another hypertext standard, including the files associated with them. The reference symbol 2b denotes a user terminal which can act as a resource, provided that the data stored there forms part of a storage network. The character of the storage network will be explained later. Reference number 8 denotes a further resource, which is a public depository. Data made available by resources 5 to 7 can be specifically selected and copied to this public depository 8, also called a trust center, for data backup, or resource 8 can be instructed to copy this data. The function of this depository 8 will be explained in more detail later. Furthermore, a data archive 9 is part of the system 1, in which the data, for example of the resources 6 and 7, is systematically stored for archiving. Finally, the system 1 contains the search engines 4a and 4b as further resources, which assist a user connected to the system 1, represented by a further user terminal 2a, or the user of the terminal 2b, in searching the resources 5-7, the archives 8, 9 or the data made available in the context of a storage network 2b or 10. In the same way, the search engines 4a, 4b can be used by programs, represented, for example, by an intelligent agent 12, which automatically carry out searches for other resources, archives or users. The search unit 4c, as a mere interface, only supports research in the archives 8 and 9.

User 2a can be connected to system 1 via a proxy server 10 or directly as with user 2b.

Furthermore, 11a-d denotes private archives, which can be part of resources 2b, 8, 9 or 10. The function of these private archives 11a-d will also be explained in more detail later.

Before the methods according to the invention for searching and for accessing resources or data are explained taking into account the time aspect, the type of archiving of the available data should first be discussed.

The data 5₁ to 7₁ provided with the index 1 represents the latest data stock made available by the resources 5 to 7, i.e. the data that was last updated. Resource 5, for example, in addition to the latest data 5₁, also makes available several sets of data 5₂ and 5₃ that were published and archived at earlier times. In the case of the Internet, this archived data 5₂ and 5₃ corresponds to websites in a form that was available at earlier times.

This archived data 5₂ and 5₃ can be stored in the original format with all content and, where applicable, the data or resources linked by means of references (links), so that it can be read, for example, by a browser or an alternative playback program and displayed exactly as it was available earlier. This means that during archiving, for example, the download files reachable via the links behind the graphical user interface (e.g. PDF files, Word documents, etc.) are also saved. If the data also contains scripts, applets or content dynamically integrated from other resources, this content can also be archived.

In order to reduce the amount of data, provision can also be made to archive the data 5₂, 5₃ in compressed form or, if necessary, to exclude individual contents which are not essential for the information content. For example, the advertisements or advertising banners often displayed on websites could be excluded from archiving. If the data contains dynamic content, or content that depends on the configuration or information of a user, it is preferably saved during archiving as it appears by default when it is called up for the first time.

The time at which data is saved for archiving can vary depending on the type and content of the data. For example, it can be provided that the data is archived at regular intervals, e.g. every few days, weeks or months. Another option is to archive only if the content of the data has changed to a certain extent, which can be determined, for example, by a comparison between the most recently archived data and the current data, if necessary with the aid of checksum methods or the like. In this case, to reduce the data volume, provision can also be made for only relative changes to be stored and for the data to be completely archived only in the event that the total of the changes is greater than a complete re-storage.
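The comparison by checksum and the choice between storing relative changes and a complete copy can be sketched as follows. This is a minimal illustration under stated assumptions: SHA-256 is used as the checksum and a unified diff from Python's `difflib` as the representation of relative changes; the application does not prescribe either. The function name `plan_archive` is hypothetical.

```python
import difflib
import hashlib


def checksum(text: str) -> str:
    """Checksum used to detect whether the data has changed at all."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_archive(previous: str, current: str):
    """Decide what to store for `current`, given the most recently archived text.

    Returns ("skip", None) if nothing changed, ("delta", diff_text) if the
    relative changes are smaller than the data itself, or ("full", current)
    if storing the changes would take more space than a complete copy.
    """
    if checksum(previous) == checksum(current):
        return ("skip", None)
    diff = "".join(difflib.unified_diff(
        previous.splitlines(keepends=True),
        current.splitlines(keepends=True),
    ))
    if len(diff) < len(current):
        return ("delta", diff)
    return ("full", current)


old_page = "".join(f"line {i}\n" for i in range(50))
new_page = old_page.replace("line 7\n", "line seven\n")
kind, payload = plan_archive(old_page, new_page)
```

For the small change in this example the result is a delta; for a very short page that changes completely, the diff would be longer than the page and a full copy would be stored instead.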

It is essential that, when data is archived, the data last saved is not overwritten and thereby lost, but that archiving takes place continuously, so that the complete development of, for example, the data made available by resource 5 can be traced using the current data 5₁ and the set of archived data 5₂, 5₃.

Which data is archived, and where, can also depend on various conditions. For example, resource 5 completely archives its data 5₁ to 5₃ itself and thus makes a complete data record available. This is also the case with the second resource 6, in which its own data 6₁ to 6₃ is likewise archived over time, but not with resource 7. The archive 9 can aim to archive all of the data provided by the resources 5-7 in the distributed system 1, that is, the data 5₁ to 5₃, 6₁ to 6₃ and the data of resource 7. This applies regardless of whether the resources archive their data for general access themselves, as resources 5 and 6 do, or not, as in the case of resource 7. It is also conceivable that only the earlier data of certain resources is archived, for whatever reason: in the example, the earlier data 6_t and 7_t of resources 6 and 7, but not that of resource 5.

However, the archive 9 can also be set up to archive only the information relating to a specific subject area. If data relating to this subject area is published by resources 5-7, it is systematically archived in archive 9. The data can be backed up or copied into the archive 9 using, for example, automatic robot methods. With the help of these methods, systematic querying and archiving is carried out on the basis of the addressing, cross-referencing, update frequency or relevance of the various resources. It is possible to use so-called "self-learning" methods, in which the polling frequency is made dependent on the frequency at which the data is updated and on the extent of the changes. "Learning" can take place with the aid of mathematical methods, for example based on neural networks, whereby the polling frequency is adjusted automatically in order to achieve optimal archiving. This means, for example, that the archiving frequency is increased if the data is updated more frequently, whereas archiving takes place only at long intervals if the data remains unchanged over a long period of time. In addition, the nature of the changes in content can also be taken into account, for example by considering only the content of texts contained in the data when assessing whether archiving should take place or not.
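The adjustment of the polling frequency to the update behaviour of a resource can be illustrated with a deliberately simple rule. The text mentions mathematical methods such as neural networks; the sketch below does not implement those and instead uses a plain multiplicative heuristic as a stand-in. The function name, the factors 0.5 and 1.5 and the interval bounds are assumptions chosen for illustration.

```python
def next_interval(current_hours, changed, change_ratio=0.0,
                  min_hours=1.0, max_hours=24.0 * 30):
    """Return the next polling interval in hours for one resource.

    changed       -- whether the data differed from the last archived copy
    change_ratio  -- fraction of the content that changed (0.0 to 1.0)
    """
    if changed:
        # Poll more often after a change; larger changes shorten the
        # interval more strongly (factor between 0.5 and 0.25).
        factor = 0.5 / (1.0 + change_ratio)
    else:
        # Poll less often while the data stays unchanged.
        factor = 1.5
    return max(min_hours, min(max_hours, current_hours * factor))


interval = 24.0
interval = next_interval(interval, changed=False)                    # 36.0 hours
interval = next_interval(interval, changed=True, change_ratio=1.0)   # 9.0 hours
```

Starting from a daily poll, an unchanged page is next visited after 36 hours; once the page is found to have changed completely, the interval drops to 9 hours.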

In addition to systematic archiving with the aid of robotic methods, provision can also be made for archiving to take place only on the basis of a specific request. For example, the resource 6 can initiate archiving in the archive 9 on its own at regular intervals or at times at which the data have been updated. This can be implemented using applets, scripts or other software solutions that are provided for setup on the corresponding resource. This is particularly advantageous in the case of resource 7, since, in contrast to resources 5 and 6, it does not itself archive the data made available by it. If the data of resource 7 is updated in the example shown, the data previously made available are copied into archive 9 so that it contains a complete set of data 7 that was available at earlier times. Of course, the archive 9 can also be requested by one of the users 2a or 2b by entering a specific resource to archive this data or resource. The interface for the input can run on its own resource or can be integrated in software - for example in the user's browser.

The archive 9 can also be the basis of an expert system which allows the targeted output of data on specific content, topics, categories, formats and times or intervals. Research in the archive can be carried out via a separate interface, for example a search unit 4c. Archive 9 can also be designed in such a way that only data specified in advance by content or other categories is archived.

In general, there is also the possibility that the archived data can only be accessed against payment of a certain fee, whereby the original provider of the data, i.e. resources 6 and 7, from which the data originate, can share in the income, for example in the form of micropricing.

Another possibility is to archive data in the archives 8 and 9 which is not directly publicly accessible in the system 1, but can only be reached via a further, possibly password-protected, interface. This so-called "invisible net" or "deep web" is an area of the Internet that users cannot reach directly by navigating to resources; instead, this area exists in the form of databases that can be queried on these resources via certain interfaces. In this case, archiving can include direct access to the databases behind the query interface, if necessary after a corresponding agreement, which can also be negotiated automatically by a software solution between the resource and the archive or robot.

Provision can be made for the data in the archives 8 and 9 to be indexed with an additional note which states that access is only possible with payment of a fee or in some other way limited. It can be provided that the availability of this data is displayed as part of a search, but it can only be called up against payment of a fee. This can also include that the data is already identified by the original resource 5-7 in such a way that it can only be called up under certain conditions, for example a fee. This can apply in particular to data from the "invisible net".

The public depository or trust center 8 performs other tasks. A first task is to have the publication of certain data of resources 5-7 documented or verified. An interest in such archiving can exist, for example, if it is to be proven that certain information was already available at a certain point in time. For example, it can thus be clearly established whether information which would conflict with the patentability of an invention was already available to the public before the relevant priority date of the application. The aim is therefore to document and verify the origin, time and content of data and resources and to protect them from manipulation.

The method provides that the depository 8 is instructed to archive, for example by the user 2a or 2b, who issues an instruction to query certain data from a resource 5-7 and to deposit it in the trust center 8 together with information on time and origin. Likewise, data can be stored in the trust center 8 at the request of a resource. Both can, as described for storage in the archive 9, be done either manually (i.e. on request) or automatically by a software solution. The deposit can also include archiving further levels of files connected by links to the data to be archived. How many levels are saved can be made dependent on the user configuration.
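Depositing data together with a configurable number of linked levels can be sketched as a breadth-first traversal of the links. The sketch assumes a caller-supplied function `fetch(url)` that returns the content and the outgoing links of a resource; this function, the name `deposit` and the record layout are hypothetical, and the example uses an in-memory dictionary instead of real network access.

```python
from collections import deque
from datetime import datetime, timezone


def deposit(start_url, fetch, max_levels, now=None):
    """Archive `start_url` and linked data up to `max_levels` link levels deep.

    `fetch(url)` must return a tuple (content, links). Each deposited record
    stores the content together with information on origin and time.
    """
    now = now or datetime.now(timezone.utc)
    deposited = {}
    queue = deque([(start_url, 0)])
    while queue:
        url, level = queue.popleft()
        if url in deposited:
            continue  # already archived via another link
        content, links = fetch(url)
        deposited[url] = {"content": content, "origin": url, "time": now}
        if level < max_levels:
            queue.extend((link, level + 1) for link in links)
    return deposited


# In-memory stand-in for a resource with linked files.
site = {
    "/contract": ("terms", ["/annex"]),
    "/annex": ("annex text", ["/imprint"]),
    "/imprint": ("imprint", []),
}
records = deposit("/contract", site.__getitem__, max_levels=1)
```

With `max_levels=1`, the start page and the directly linked annex are deposited, while the imprint two levels away is not.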

In connection with this, there is also the possibility, as a special case, of having certain dynamic contents, determined by queries, user inputs or default settings, documented and verified. This is relevant, for example, when (sales) contracts are concluded via the Internet. In this case, the deposit can be made in such a way that the query runs via the intermediary depository 8, so that the dynamically generated contents can be verified and documented. Another possibility is that the depository 8 executes the request virtually in parallel, using the configuration of the user. Since this data is generally not relevant to the public or, on the contrary, would even have to be protected for data protection reasons, it could be stored in an area of the depository 8 that is not generally accessible and can only be viewed by one or more specified users, e.g. in a private archive 11c. Another option is to issue only a verification stamp, while the actual data is stored by the user. The functioning of the verification stamp is explained below.

Another task is to make certain content or resources citable when requested by a user 2a, 2b or a virtual agent 12. To do this, it must be ensured that certain contents, characterized by origin and time, are stored permanently and unchangeably. For the storage of data, and for checking for possible changes to data during transmission to and from the trust center 8, the security criteria according to the Signature Act can be used. The procedure is as described above.

A third function of the depository 8 can consist in the fact that the depository 8 documents or verifies, at a specific point in time, the level of knowledge gathered in an area, for example by means of an expert system, independently of a request for the specific storage of certain data or resources. The trust center 8 can therefore also archive data of the resources 5-7 itself, analogously to the method illustrated in relation to the archive 9. In particular, data of certain resources can be monitored at regular intervals and, if necessary, archived automatically for a fee.

The trust center 8 ensures that the availability of the data is guaranteed at all times and, at the same time, that manipulation is excluded, so that the data queried from the trust center 8 at a later point in time is identical to the original data available in the distributed system. For this purpose, the corresponding data can, as described above, be completely archived in the trust center 8. However, it is also conceivable for the trust center 8 to create a digital verification stamp or "fingerprint". The stamp contains coded information on the time, origin and content. A copy of the stamp is stored in the depository 8. The data or resources then need not be stored in the trust center 8; storage can instead take place on the resource 5-7, in the archive 9 or in a personal archive 11a-b (i.e. also at a user, possibly in the storage network). For such data, it can then be determined by comparing the verification stamp or fingerprint whether the data is identical to the originally verified data.
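A verification stamp of this kind can be sketched with a cryptographic hash. The text only states that the stamp contains coded information on time, origin and content; the use of SHA-256, the dictionary layout and the function names below are assumptions made for illustration.

```python
import hashlib
import json


def make_stamp(content: bytes, origin: str, time_iso: str) -> dict:
    """Create a stamp with information on origin, time and content."""
    return {
        "origin": origin,
        "time": time_iso,
        "content_sha256": hashlib.sha256(content).hexdigest(),
    }


def stamp_id(stamp: dict) -> str:
    """Stable identifier of a stamp, e.g. for the copy kept by the depository."""
    canonical = json.dumps(stamp, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def verify(content: bytes, stamp: dict) -> bool:
    """Check whether `content` is identical to the originally verified data."""
    return hashlib.sha256(content).hexdigest() == stamp["content_sha256"]


original = b"<html>offer valid until 2001-03-31</html>"
stamp = make_stamp(original, "http://example.org/offer", "2001-02-22T12:00:00Z")
unchanged = verify(original, stamp)
tampered = verify(b"<html>offer valid until 2001-12-31</html>", stamp)
```

The data itself can then be kept by the user or in an archive, while only the stamp is deposited. A hash alone shows that the content is unchanged but not who issued the stamp; in practice the depository would presumably also sign the stamp, which this sketch omits.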

From the point of view of copyright law in particular, it may be advisable not to store data from all resources in such a way that it is or remains publicly accessible in the long term. In this case, the possibility of decentralized storage still remains, for example, with user 2a or 2b; as stated, only a copy of the verification stamp would then be deposited in the trust center 8. In relation to the first two tasks of the trust center 8, it can be provided that, after the verification or archiving process has been completed, the user or, in the broader sense, the client is notified of the archiving or verification of the data, and is additionally informed that the data is permanently documented or citable as of the publication or citation point specified by him. In general, the first two tasks can be taken over by the trust center 8 against payment of a fee, and the use of data archived or verified in the sense of the third task may likewise be subject to a fee.

In parallel to the previously described methods for storing in archives 8 and 9, there is the possibility of setting up personal archives, to which only a specific user or a specified group of users has access. These can be designed as "virtual archives" such as 11c and 11d, in which information from archives 8 and 9 is filtered according to user specifications and, if necessary, processed. A section of the entire archive is thus visible in the personal archive. For example, an overview can also be shown. It is also possible that these private archives 11c and 11d display data which is stored in archives 8 and 9 but which is only intended for a specific group of users and not for the general public. In contrast, the archives 11a and 11b represent actual storage locations in the sense that data is archived here directly, together with the time and origin. The personal archive 11b is part of the user terminal 2b. Finally, the user 2a also has the option of setting up a personal archive 11a, to which only he, or a specified group of people, has access via a corresponding proxy server 10.

Archiving in the personal archives 11a and 11b can, for example, take place automatically when the user 2a or 2b accesses certain data of the system 1. As with the trust center 8 and the archive 9, however, automatic archiving methods can also be provided. It is also possible for data and resources to be archived in the personal archives 11a and 11b when the user issues the corresponding command directly via an interface provided by a software solution, for example integrated as a button in the user's browser. Functional extensions of the personal archive 11c or 11d can concern a notification of the user when new data is added.

In addition, it can be provided that not only the user 2a or 2b has access to his personal archive 11a or 11b, but that he makes it available to the general public. In this case, the personal archive 11a or 11b has the same function as the archive 9, but only contains the data archived therein personally by the users 2a or 2b. In this way it is possible to make an entire network of personal archives available, that is, to create a decentralized storage network which overall can contain a large part of the data provided by the system 1 in the past.

It is important to note that all archived data, regardless of whether it was archived by resources 5 and 6 themselves, trust center 8, archive 9 or private archives 11a-b, contains a time index that provides information about the point in time at which, or the period of time during which, the data was available in the system. Available means that the data is basically accessible at this moment. The time index can have one, two or more dimensions. One-dimensional means that only a single time of availability is recorded. Two-dimensional means that two points in time define a time interval (continuum) in which the data were available. Accordingly, multidimensional means that several individual times and / or intervals of availability are recorded. Data in individual resources expediently contain one- or preferably two-dimensional time indices; archived data can also contain multi-dimensional ones.
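By way of illustration only, the one-, two- and multi-dimensional time index described above could be represented as a list of availability intervals. The following is a minimal sketch in Python; the class name TimeIndex, its constructors and the convention that an open end means "still available" are assumptions made for this example and are not taken from the description.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple

# Hypothetical representation: one (start, end) pair per recorded availability.
# start == end models a single point in time; end is None models "still available".
Interval = Tuple[datetime, Optional[datetime]]


@dataclass(frozen=True)
class TimeIndex:
    intervals: Tuple[Interval, ...]

    @classmethod
    def point(cls, t: datetime) -> "TimeIndex":
        """One-dimensional index: a single time of availability."""
        return cls(((t, t),))

    @classmethod
    def period(cls, start: datetime, end: Optional[datetime] = None) -> "TimeIndex":
        """Two-dimensional index: two points in time delimiting an interval."""
        return cls(((start, end),))

    def available_at(self, t: datetime) -> bool:
        """True if t falls into any recorded point or interval."""
        return any(
            start <= t and (end is None or t <= end)
            for start, end in self.intervals
        )


# Multi-dimensional index: one single time plus one open-ended interval.
multi = TimeIndex((
    (datetime(2000, 1, 1), datetime(2000, 1, 1)),
    (datetime(2001, 1, 1), None),
))
print(multi.available_at(datetime(2000, 6, 1)))  # falls between the two entries
```

In this sketch a one- or two-dimensional index is simply the special case of a multi-dimensional index with exactly one entry, which keeps the availability check identical for resources and archives.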

The time or period of availability can be determined in various ways. In the simplest case, the original resource 5-7 gives the data a time index. Usually, this will be the time at which the data was published for the first time, or the period from this time of publication to the current time or to the time of the first change. The time index can also contain an indication of the time measure used to determine it (local time, but usually GMT).

When the data is called up or transferred to one of the archives 8, 9 or 11a and 11b, the time assigned by the resources can then be transferred. If the resource itself does not give a time index, the time of retrieval or archiving can be used as a time index; with ongoing review, this can also be a period.

For various reasons, other time indices can also be assigned during archiving. Especially when it comes to the verification of certain points in time or periods - i.e. when archiving in trust center 8 - it must be ensured that the data was actually accessible at the times recorded by the resource and that this data was not subsequently changed. In this case, the trust center will only be able to record certain times for the time index; this is, for example, the moment this data is called up (by a robot or manually). A period (i.e. a continuum of availability) can therefore only be recorded if there is a continuous check of the accessibility or availability. This can also be regulated by a software solution such that the resource regularly contacts the trust center as long as the data is available, or the trust center 8 or the archive 9 is automatically notified of changes.
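The way in which a period of availability could be derived from repeated checks can be sketched as follows. This is only an illustrative Python example: the function name periods_from_checks and the parameter max_gap (the largest spacing between two successful checks that is still treated as continuous availability) are assumptions introduced here, and the description does not prescribe any particular check interval.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def periods_from_checks(
    checks: List[datetime], max_gap: timedelta
) -> List[Tuple[datetime, datetime]]:
    """Group the times of successful availability checks into periods.

    Two consecutive checks are merged into one period only if they are at
    most max_gap apart (an assumed tolerance). An isolated check yields a
    period whose start and end coincide, i.e. a single point in time.
    """
    periods: List[Tuple[datetime, datetime]] = []
    for t in sorted(checks):
        if periods and t - periods[-1][1] <= max_gap:
            # Extend the current period up to this check.
            periods[-1] = (periods[-1][0], t)
        else:
            # Gap too large (or first check): start a new period.
            periods.append((t, t))
    return periods


checks = [datetime(2001, 2, d) for d in (1, 2, 3, 10)]
for start, end in periods_from_checks(checks, timedelta(days=1)):
    print(start.date(), "to", end.date())
```

With daily checks and a tolerance of one day, the first three checks form one period while the isolated fourth check is recorded only as a single time, which mirrors the distinction drawn above between certain times and a verified continuum.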

The same applies analogously to the verification by means of the verification stamp. In order to enable verification, the verification stamp must be deposited at the exact time that the data is received or, in the case of verification, the time index that the data has is automatically the time at which the verification stamp was created.

It is also important to note that all data not archived in the original resources 5 and 6 contain a reference to their original origin.

Optionally, the archived data can contain further notes, for example the references to identical data from other resources, which enables data that come from different resources but have identical contents to be linked. A possible form of such a reference is the reference to the URN (uniform resource name) of a document, that is to say a resource-independent identifier for data. All of this becomes important when it comes to finding identical data that can be found under different resources over time. The notes on identical data can also be supplemented by user input in a corresponding interface. This makes sense, for example, when the data changes to another resource. This can be noted by user input or automatically, and consequently a temporal continuity of the data is established, even if the resource has changed. Furthermore, the data can have blocking notes, which only make the availability possible from a certain point in time or against payment of a fee.

In principle, it is conceivable that the notes on indexing, time, availability, fee, confidentiality, etc. are stored in the resource together with the file name as further file properties. This would also allow direct access to these files using a correspondingly expanded locator. Additionally or alternatively, this information can also be saved in the file itself (for example in the header for HTML documents). However, it is also conceivable that all or part of the indexing information is stored centrally in its own database file on the corresponding resource or on another resource in the distributed system. In this case, direct addressing (for example using an expanded locator) is only possible insofar as the access request for a specific file first has to be directed to the resource with the indexing information. This resource interprets the request accordingly and then forwards the access request so that the desired file is accessed directly.

In the case of the Internet, one way of addressing the data is to extend the URL standard to an extended locator, for example a uniform resource and time locator (URTL). In addition to addressing the resource, this new locator for resources in distributed systems also contains a time address, so it has been expanded to include a time component or a time parameter. In this case, different data, for example web pages, which can be reached under the same URL over time, can be individually controlled by the extended locator. The additional time is a further parameter in the addressing, which can be recognized as such when the data is accessed and processed directly. If addressing takes place according to the conventional standard, that is to say without a time, it can be provided that the most current data is accessed as standard.

If an entry is made with the extended locator, explicit access can also be made to data that was available under the same resource but at an earlier point in time, for example the data 5₂ and 5₃ in the case of resource 5. That is, they can be accessed directly from the resource in question. If no data is stored for this point in time or interval, an automatic access to the archives 8, 9, and / or 11a and 11b can be provided. If neither the resource nor the archives have any data for the time specified in the locator, the data corresponding to the closest time can automatically be called up from the resource or from an archive (8, 9, 11a, 11b). It can also be provided that the request or access is forwarded to the archives or search engines 4a, 4b with the aim of displaying a selection of similar or identical documents (for example by means of URN), for example in a pop-up window.
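The fallback to the data with the closest time can be illustrated by a short Python sketch. The function name closest_version and the mapping from time to stored content are hypothetical; how a resource or archive actually stores its versions is left open by the description.

```python
from datetime import datetime
from typing import Dict, Optional


def closest_version(
    versions: Dict[datetime, str], requested: datetime
) -> Optional[str]:
    """Return the stored content whose time lies closest to the requested time.

    versions maps the time index of each stored version to its content
    (an assumed, simplified storage model). Returns None if nothing is stored.
    """
    if not versions:
        return None
    best = min(versions, key=lambda t: abs(t - requested))
    return versions[best]


stored = {
    datetime(2000, 1, 1): "version of January 2000",
    datetime(2001, 1, 1): "version of January 2001",
}
print(closest_version(stored, datetime(2000, 3, 1)))
```

If an exact match exists, the distance is zero and that version is returned, so the same routine covers both the direct hit and the nearest-time fallback.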

If the extended locator is not supported by transmission protocols, the network infrastructure and / or individual resources of the distributed system, the extended locator can be simulated by using the previous URL specifications, so that two-dimensional addressing according to resource and time is possible. This presupposes that the resources can also interpret the information encoded in this way in URL format using a suitable software solution.
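One conceivable way of simulating the extended locator within the existing URL syntax is to carry the time component as an additional query parameter. The following Python sketch illustrates this under stated assumptions: the parameter name urtl_time and the timestamp format are invented for this example and are not defined by the description or by any standard.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TIME_PARAM = "urtl_time"        # hypothetical parameter name
TIME_FORMAT = "%Y%m%dT%H%M%SZ"  # assumed encoding, always expressed in UTC


def encode_urtl(url: str, when: datetime) -> str:
    """Encode resource and time in a conventional URL."""
    parts = urlsplit(url)
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k != TIME_PARAM
    ]
    query.append((TIME_PARAM, when.astimezone(timezone.utc).strftime(TIME_FORMAT)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def decode_urtl(url: str) -> Tuple[str, Optional[datetime]]:
    """Split a simulated locator into the plain URL and the time component."""
    parts = urlsplit(url)
    when = None
    rest = []
    for k, v in parse_qsl(parts.query, keep_blank_values=True):
        if k == TIME_PARAM:
            when = datetime.strptime(v, TIME_FORMAT).replace(tzinfo=timezone.utc)
        else:
            rest.append((k, v))
    return urlunsplit(parts._replace(query=urlencode(rest))), when


stamp = datetime(2001, 2, 22, 12, 0, tzinfo=timezone.utc)
print(encode_urtl("http://example.org/page?x=1", stamp))
```

A locator without the time parameter decodes to a time of None, which a resource or proxy could treat as a request for the most current data, in line with the default behaviour described above.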

On the user side, this new standard can be simulated by a software expansion of the proxy server 10, which converts the requests for data in connection with a specific point in time into corresponding access commands to resources 5-7 or archives 8, 9, 11a and 11b. The same can also be done by appropriately expanding the user terminal, for example the browser, in such a way that the two-dimensional input of resource and time is software-coded in the URL standard.

The method according to the invention for accessing the individual resources of the system and for receiving and / or representing the data stored in the resources will now be explained below. This should be explained in particular using the example of the Internet with the special display options in a browser.

Access takes place through a browser installed in the computer 2a or 2b, via which requests for data contained in certain resources - possibly via a proxy server 10 - are forwarded to the corresponding resources. Fig. 2 schematically shows a window of the browser displayed on the monitor 3 of the computer 2a. The address of the resource to be accessed is shown in an address field 20 in the upper area. In addition to this address field 20, a further time field 21 is arranged, which provides information about the time index attached to the data shown.

If data is to be accessed, the address of the desired resource is to be entered in the address field 20, at the same time a time parameter can be specified in the time field 21, which provides information about the point in time or the period from which the desired data should come. If the time parameter is omitted, the latest version of the stored data can be requested as standard, as shown above. Of course, the input or output of the time parameter does not have to take place via its own time field, but can be entered or displayed within the address field as part of such an expanded address.

The inputs of addresses and time parameters are then forwarded directly to the corresponding resource 5-7, possibly via the proxy server 10, if necessary in the form of the simulated URTL. If this query does not produce a result (because the resource cannot be reached, because it does not support the standard or because it has no data for this time parameter), the request is forwarded to one of the archives 8, 9 and / or 11a, b.

Of course, parallel requests to resources and archives are also conceivable. If it is found that several resources or archives make the requested data available at the same time and there is a mismatch between these data, the data from the trust center 8 or the data checked by means of a verification stamp are preferably retrieved, since they are in any case protected against subsequent manipulation. If data from the desired time period is found neither in resource 5-7 nor in archives 8, 9 and 11a, b, it can be provided that either the data currently provided by the resource is automatically accessed or that data available before or after the desired period is searched for. Alternatively, alternative resources which contain identical or similar data can also be output and displayed, for example, in an additional window or part of the browser. The procedure using URN or indexing notes is described above.
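The preference for verified data in the event of a mismatch can be sketched as follows. This Python example is merely illustrative: the tuple layout of a source, the flag marking a source as verified and the lookup callables are assumptions made here, and the requests are issued one after another rather than truly in parallel.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical source description: (name, verified, lookup).
# lookup(address, when) returns the stored data or None if nothing is found.
Source = Tuple[str, bool, Callable[[str, str], Optional[str]]]


def resolve(
    address: str, when: str, sources: List[Source]
) -> Optional[Tuple[str, str]]:
    """Query all sources and return (source name, data), or None.

    If the sources disagree, the first verified source (for example a trust
    center) wins; otherwise the first source in the given order is used.
    """
    hits = []
    for name, verified, lookup in sources:
        data = lookup(address, when)
        if data is not None:
            hits.append((name, verified, data))
    if not hits:
        return None
    if len({data for _, _, data in hits}) > 1:  # mismatch between sources
        for name, verified, data in hits:
            if verified:
                return name, data
    name, _, data = hits[0]
    return name, data


sources = [
    ("resource 5", False, lambda a, w: "changed later"),
    ("archive 9", False, lambda a, w: None),
    ("trust center 8", True, lambda a, w: "as published"),
]
print(resolve("http://example.org/page", "2001-02-22", sources))
```

The order of the list doubles as the default access order, so a user preference for the sequence of data stocks could be expressed simply by reordering it.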

When data is displayed, the time index 21 or the information contained in the time index for the data displayed in the browser window are simultaneously displayed in the time field 21, so that it can be seen at any time from which period the data shown originate. Of course, an alternative form of representation is also conceivable, either implicitly in the address field or graphically as a time bar.

Since the data is ideally archived completely, in the case of the Internet an archived website can be displayed exactly as it was originally available. In this case - as shown in Fig. 2 - less relevant information also appears, e.g. advertising banner 23 or the like. However, if the data is only archived in compressed or filtered form as described above, it can be provided that only the essential information, that is to say texts 24 and associated figures 25, are displayed.

Reference number 26 denotes a link that represents a cross-reference to further data or resources. Since, depending on the scope of the archiving, the data to which the link 26 refers can also be archived, in this case selecting this link 26 automatically leads to the display of the information on which this link 26 is based, also in terms of time. This makes it possible to navigate through the system at a predetermined time. However, if the data on which the link 26 is based were not stored either on the resource or in one of the archives 8, 9, 11a or 11b, it can be provided that the information available closest to the predetermined point in time is accessed. Alternatively, it can also be provided that a new point in time must be specified in order to carry out the access. If necessary, an overview of the times from which data is available can also be shown (e.g. as a pop-up window).

Furthermore, a time bar 22 is shown on one side of the browser window, which offers the possibility of navigating in the time dimension on the displayed website. This means that selecting the upper arrow 22a automatically leads to access to those data which were archived after the data currently displayed in the window. In contrast to this, a selection of the lower arrow 22b automatically leads to access to data that is older by a time step.
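Stepping through the time dimension with the two arrows can be illustrated by a small Python sketch. The function names step_newer and step_older are hypothetical, and the archived times are represented here simply as a sorted list of comparable values (years in the example); a real implementation could equally use timestamps.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional


def step_newer(times: List[int], current: int) -> Optional[int]:
    """Time of the next newer version (upper arrow), or None if there is none.

    times must be sorted in ascending order.
    """
    i = bisect_right(times, current)
    return times[i] if i < len(times) else None


def step_older(times: List[int], current: int) -> Optional[int]:
    """Time of the next older version (lower arrow), or None if there is none."""
    i = bisect_left(times, current)
    return times[i - 1] if i > 0 else None


archived = [1998, 2000, 2001]
print(step_newer(archived, 2000), step_older(archived, 2000))
```

Because the lookup is based on position rather than on an exact match, the same functions also work when the currently displayed time lies between two archived versions.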

Buttons can also be provided in the browser shown in FIG. 2, by means of which time tolerances can be specified with which the entered time parameter is to be treated. For example, this can be used to set the manner in which corresponding data from other periods should be accessed if data from a desired period are not available. With the help of another button, default settings can be made as to whether and in what order the various data stocks of the system are to be accessed, i.e. for example, resources 5-7 or personal archives 11a-d first, then archive 9 and finally trust center 8.

If the browser is to be used to navigate between different resources, the time specified by the time field 21 can be activated or deactivated. Activation means that only data that meets the time condition specified in time field 21 should be accessed. This corresponds to the previously described navigation at a fixed point in the past. Due to the frequent updating of the data made available in distributed systems, however, it often happens that cross-references to other data lead to resources that are no longer accessible or that no longer provide data corresponding to the context at that time. Provided that the data corresponding to the time at that time is not stored in the archives 8, 9 and 11a and 11b, it can be provided according to a further development of the method according to the invention that in such a case the request is automatically expanded to include the most recently archived data for the resource searched for or the data closest to the time of the search. This ensures that the most recently available data can be displayed in any case. Deactivating the time specified by the time field 21, on the other hand, has the result that the current or at least the last available archived data of the corresponding resources is shown in principle.

An extension can also be that a separate window displays information about similar or identical data from another resource. This information could provide an indication that the resource you are looking for can be reached at a new address and that the data is only updated on this new resource. Furthermore, it can be displayed in an additional window which cross references have the data shown, or which other data contain cross references to the data displayed in the browser window. The information required for this is based on the indexing or reference notes outlined above or search engines, which can also categorize content.

Finally, algorithms can be implemented in the browser according to the invention, which calculate the next probable access depending on the previous access by the user and already automatically access the corresponding data in the system. This is relevant, for example, with regard to the extension just shown, if one of several alternatives that are similar in content is to be selected.

The method according to the invention offers the possibility of navigating both between different resources and also in terms of time. In addition, appropriate extensions can be used to ensure that the most recently available data can be transferred to the archive 9 even when the operation of a resource is discontinued and can be displayed from the archive when requests are made to this resource.

Finally, the method according to the invention for searching for data or for resources containing data is to be explained taking into account the time or period of availability.

For this purpose, search engines 4a and 4b are provided, which offer the possibility of searching for specific information from the data provided by the various resources 5-9 and 11b and possibly 11a of system 1. For this purpose, in a first step, the user 2a or 2b transmits an inquiry containing one or more search terms to the search engine 4a or 4b. This searches in the system 1 for resources or data which meet the condition(s) defined by the search terms. The search can, as is usual with search engines on the Internet, take place in such a way that the distributed system (including the archives) is not searched for every query, but rather that the search engine is connected to a memory that contains the images of the notices ("fingerprints") on the resources and data present in the distributed system. The search is then carried out only in this memory, and the search results then refer to the respective data or resources in the distributed system. This memory can in turn - as in the case of the search engine 4b - be the archive 9 or the trust center 8 itself. The found data or information relating to the resources which contain the determined data are then transmitted back to the user 2a. Fig. 3 shows a window of such a search engine 4a or 4b, as shown on monitor 3 of user 2a, which usually has an input field 27 for entering search terms, according to which the available resources or data should be searched. Several search terms can also be combined with the usual links (AND, OR etc.) or exclusion criteria.

In addition, the search engine has one or more time parameter windows 28, 29, in which time information can be entered and thus one or more time intervals may be specified. As an additional search term, the time specifications determine a time parameter, by means of which the search is limited to data that were available in the system in the specified period. It is therefore possible not only to search under the current data as before, but also under data available at an earlier point in time. In particular, there is the possibility, for example, of only retrieving information on a specific topic that was available in the past at a specific point in time. The data or the resources containing the data can then, for example, be displayed on the screen in the form of a table or list 30 or be prepared as a catalog or in some other way, for example graphically.
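How the time parameter restricts a search can be sketched in Python as follows. The example is a simplified illustration under stated assumptions: the Entry record, the word-based term matching and the use of plain years as time values are invented for this sketch, and only the AND combination of search terms is shown.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Entry:
    """Hypothetical record in the search engine's memory."""
    address: str
    text: str
    start: int            # first time of availability
    end: Optional[int]    # last time of availability, None = still available


def search(
    index: List[Entry],
    terms: List[str],
    period: Optional[Tuple[int, int]] = None,
) -> List[str]:
    """Return addresses of entries matching all terms and the time parameter.

    Without a time parameter only currently available data is searched;
    with one, the entry's availability must overlap the given period.
    """
    results = []
    for e in index:
        words = e.text.lower().split()
        if not all(t.lower() in words for t in terms):
            continue
        if period is None:
            if e.end is not None:
                continue  # no longer available
        else:
            lo, hi = period
            if e.start > hi or (e.end is not None and e.end < lo):
                continue  # availability does not overlap the period
        results.append(e.address)
    return results


index = [
    Entry("http://example.org/a", "time index search", 1998, 2000),
    Entry("http://example.org/b", "time index archive", 2001, None),
]
print(search(index, ["time"], (1999, 1999)))
```

Entering the same value in both time parameter windows corresponds here to a period whose start and end coincide, that is, a search at a single point in time.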

It can be provided that the search engine 4a or 4b is not accessed in a browser, but rather via an upstream input interface in the sense of a separate software program. This interface can be implemented, for example, by an additional program or the like, which appears in the browser as a separate input window or as a browser extension. This extension offers additionally the possibility of automatically converting certain entries or error messages due to non-availability of data (in the sense of data of the "invisible net" behind the surface) or resources ("broken link") into corresponding queries to the search engine. This results in a new search request or a new access to data, which is then automatically called up, possibly reconstructed and displayed in the browser. In addition, this interface can be used to display a catalog for the selection of certain terms or resources, according to or in which research is to be carried out. In addition, this interface can be used to query stored user-specific parameters. As an alternative to a separate program, the extensions offered by the interface can also be integrated into the browser.

Analogous to the input interface just described, a corresponding interface can also be provided for the output of data obtained from the system. When entering search terms and / or resources or groups of resources and / or time or other parameters, the latter can automatically present the information found in a one- or multi-dimensional result list - sorted if necessary according to the parameters mentioned or other relevance criteria. It can be provided that in the event that a query leads to a clear result - for example when querying for a resource at a specific time - the data is displayed directly in the original format, while in the event that several data items meet the search criteria, a presentation in a list of results or a cataloged, categorized or graphically prepared output is provided. In order to enable the display in the original format, programs or extensions may have to be made available to users by the search engine or resources.

If only a single resource is searched for, a graphic representation of its life cycle - for example the temporal development of the data stored on it (by identifying the change) - or its networking with other pages and resources over time can be provided. Optionally, references to other resources that are similar or identical or have a common origin can be displayed. The data found can be sorted, for example, using neural or evolutionary algorithms. In addition, it can be provided that the search results can be searched again if several data fulfilling the search criteria are found. The method according to the invention for searching for data and data-containing resources, taking into account the time, also offers the possibility, for example, of explicitly searching by the time parameter, that is to say for example searching for data that was available at a specific point in time or within a specific period of time, or that changed within a predetermined period. This also implies the ability to search for resources or groups of resources on which data has changed within a certain period of time.
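The search for resources on which data has changed within a certain period can be illustrated by the following Python sketch. The function name changed_within and the history mapping are hypothetical; the sketch also assumes that the earliest recorded version of a resource counts as its first publication and not as a change.

```python
from typing import Dict, List


def changed_within(history: Dict[str, List[int]], lo: int, hi: int) -> List[str]:
    """Return the addresses of resources whose data changed between lo and hi.

    history maps a resource address to the times at which a new version was
    recorded. The earliest time of each resource is treated as the first
    publication (an assumption of this sketch), every later time as a change.
    """
    return sorted(
        address
        for address, times in history.items()
        if any(lo <= t <= hi for t in sorted(times)[1:])
    )


history = {
    "http://example.org/a": [1990, 2000],
    "http://example.org/b": [1995],
    "http://example.org/c": [1999, 2005],
}
print(changed_within(history, 1998, 2001))
```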

The present invention thus offers the possibility of conveniently accessing the resources or data made available in a distributed system, or of searching for data with corresponding information and at the same time also taking into account the period of availability of this data. As a result, the information content of the available data material can be used extremely effectively.

The methods according to the invention for searching for and for accessing the resources or data are preferably implemented by software programs. Existing search engines or browsers that do not yet support the method according to the invention can be retrofitted using additional programs or applets.

Claims

1. A method for the automated search for data or data-containing resources (2b, 5-10) stored in a distributed system (1), which comprises the following steps:

Transmission of an inquiry containing one or more search terms to a search unit (4a-c),

Search for data or data-containing resources stored in the system (1) that meet the condition defined by the search terms, and

Output of the data found in the search and / or information regarding the resources which contain this data,

the data stored in the system (1) containing a time index relating to the point in time at which the data are or were available in the system (1), and the search terms comprising a time parameter which limits the search to the point in time and / or period defined by the time parameter.

2. The method according to claim 1, characterized in that in the absence of a time parameter, the search is carried out only under the data currently provided by the resources (2b, 5-10).

3. The method according to claim 1 or 2, characterized in that in the event that the search provides a clear result, the data found are output immediately.

4. The method according to any one of claims 1 to 2, characterized in that in the event that several data or data containing resources have been found that meet the condition defined by the search terms, a list or graphical overview of the data found or Resources (2b, 5-10) containing the found data is output.

5. Computer program for performing a method for the automated search for data or data-containing resources (2b, 5-10) stored in a distributed system (1) according to one of the preceding claims.

6. Computer program according to claim 5, characterized in that it is an additional program for a search engine (4a-c) for searching for data or data containing resources (2b, 5-10) stored in a distributed system (1).

7. Search engine (4a-c) for the automated search for data or data-containing resources (2b, 5-10) stored in a distributed system (1), the search engine (4a-c) being designed to receive a query containing one or more search terms, to search in the system (1) for data or resources containing data that meet the condition defined by the search terms, and to output the data found in the search and / or information regarding the resources (2b, 5-10) which contain this data, the data stored in the system (1) including a time index relating to the point in time at which the data are or were available in the system (1), and the search terms including a time parameter that limits the search to the point in time and / or period defined by the time parameter.

8. Search engine (4a-c) according to claim 7, characterized in that it searches, in a memory connected to it which refers to the data or data-containing resources in the system (1), for data or resources that meet the condition(s) defined by the search terms.

9. Search engine (4a-c) according to claim 7 or 8, characterized in that in the absence of a time parameter, the search is carried out only under the data currently provided by the resources (2b, 5-10).

10. A method for accessing resources (2b, 5-10) of a distributed system (1) and for receiving and / or displaying data stored in the resources (2b, 5-10), the data stored in the system (1) containing a time index relating to the point in time at which the data are or were available in the system (1), and the information contained in the time index can also be displayed when the data is displayed.

11. The method according to claim 10, characterized in that the time index forms an extension of the locator for addressing the data.

12. Computer program for carrying out a method for accessing resources (2b, 5-10) of a distributed system (1) and for receiving and / or displaying data stored in the resources (2b, 5-10) according to claim 10 or 11.

13. Computer program according to claim 12, characterized in that it is an additional program for a browser for accessing resources (2b, 5-10) of a distributed system (1) and for receiving and / or displaying data stored in the resources (2b, 5-10).

14. Browser for accessing resources (2b, 5-10) of a distributed system (1) and for receiving and / or displaying data stored in the resources (2b, 5-10), the data stored in the system (1) containing a time index with respect to the point in time or period at which the data are or were available in the system (1), and the information contained in the time index can be displayed at the same time as the data is displayed.

15. A method for accessing resources (2b, 5-10) of a distributed system (1) and for receiving and / or displaying data stored in the resources (2b, 5-10), the data stored in the system (1) containing a time index relating to the point in time or period at which the data is or was available in the system (1), and the access to the data or to the resources of the system (1) containing the data taking place depending on a predefinable time parameter.

16. The method according to claim 15, characterized in that the time index forms an extension of the locator for addressing the data.

17. The method according to claim 15 or 16, characterized in that in the absence of the time parameter, only the data currently available from the resources (2b, 5-10) is accessed.

18. The method according to any one of claims 15 to 17, characterized in that in the event that in the resource (2b, 5-10) that is accessed, no data are available whose time index corresponds to the condition specified by the time parameter , an archive for data archiving is accessed.

19. The method according to any one of claims 15 to 18, characterized in that in the event that no data is available in the entire system (1), the time index of which corresponds to the condition specified by the time parameter, data is automatically accessed before or after the time or period specified by the time parameter are or were available.

20. Computer program for performing a method for accessing resources (2b, 5-10) of a distributed system (1) and for receiving and / or displaying data stored in the resources (2b, 5-10) according to one of claims 15 to 19.

21. Computer program according to claim 20, characterized in that it is an additional program for a browser for accessing resources (2b, 5-10) of a distributed system (1) and for receiving and / or displaying data stored in the resources (2b, 5-10).

22. Browser for accessing resources (2b, 5-10) of a distributed system (1) and for receiving and / or displaying data stored in the resources (2b, 5-10), the data stored in the system (1) containing a time index relating to the point in time at which the data is or was available in the system (1), and wherein the access to the data or to the data-containing resources of the system (1) takes place depending on a time parameter that can be specified by the browser.

23. A method for archiving data stored in a distributed system (1), which comprises the following steps:

Retrieving or receiving data from the distributed system (1), supplementing the data with a time index with respect to the point in time or period at which the data is or was available in the system (1), provided the data does not yet have a time index, and

Archiving the data in a data archive (9) or a depository (8) in such a way that the data can be accessed by search engines, browsers or programs.

24. A method for archiving data stored in a distributed system (1), which comprises the following steps:

Retrieving or receiving data from the distributed system (1), supplementing the data with a time index with respect to the point in time or period at which the data is or was available in the system (1), provided the data does not yet have a time index,

Archiving the data in an archive (9) or a resource (2b, 5-6, 10) in such a way that the data can be accessed by search engines, browsers or programs, and

Archiving verification information relating to the data in a depository (8).

25. The method according to claim 23 or 24, characterized in that the data or the verification information is archived in the depository (8) in such a way that manipulation of the data or verification information archived in (8) is excluded or a possible Manipulation when retrieving data archived in resources 2b, 5-6, 9 and 10 can be determined.

26. The method according to any one of claims 23 to 25, characterized in that the data is archived at the instigation of a user (2a, 2b).

27. The method according to any one of claims 23 to 25, characterized in that the depository (8) archives the data at the instigation of a resource (5-7).

28. The method according to any one of claims 23 to 25, characterized in that the depository (8) independently archives the data according to a predetermined scheme.