[go: up one dir, main page]

WO2004084095A1 - Systeme d'extraction d'informations - Google Patents

Systeme d'extraction d'informations Download PDF

Info

Publication number
WO2004084095A1
WO2004084095A1 PCT/JP2003/003245 JP0303245W WO2004084095A1 WO 2004084095 A1 WO2004084095 A1 WO 2004084095A1 JP 0303245 W JP0303245 W JP 0303245W WO 2004084095 A1 WO2004084095 A1 WO 2004084095A1
Authority
WO
WIPO (PCT)
Prior art keywords
sub
search
search request
database
information processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2003/003245
Other languages
English (en)
Japanese (ja)
Other versions
WO2004084095A9 (fr
Inventor
Akira Naruse
Kouichi Kumon
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd filed Critical Fujitsu Ltd
Priority to PCT/JP2003/003245 priority Critical patent/WO2004084095A1/fr
Priority to JP2004569562A priority patent/JPWO2004084095A1/ja
Publication of WO2004084095A1 publication Critical patent/WO2004084095A1/fr
Publication of WO2004084095A9 publication Critical patent/WO2004084095A9/fr
Priority to US11/068,395 priority patent/US20050165765A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24532Query optimisation of parallel queries

Definitions

  • the present invention relates to an information search system, an information search method, an information search device, an information search program, and a computer-readable recording medium storing the program.
  • the present invention relates to an information search system, an information search method, an information search device, an information search program, and a computer-readable program that records the program, which are suitable for processing a search request for a database in parallel by a plurality of information processing devices. It relates to a possible recording medium. Background technology ''
  • the database to be searched is sequentially copied from an external storage device such as a hard disk to a memory (main storage device). Search processing is performed.
  • an external storage device such as a hard disk in which a database is stored has a low data input / output (I / O), and it takes time to transfer data between the external storage device and the memory. . Therefore, when a search process is performed, the data transfer time between the external storage device and the memory often determines the search time.
  • a typical OS Operating System
  • This disk cache function allows data read from an external storage device to be stored in an unused portion of memory in the near future. If the database to be searched has been processed by another search process in advance, it may not be necessary to transfer data from the external storage device to the memory. Therefore, when a search request continues for the same database, except for the first search request, there is no need to transfer data from the external storage device to the memory. As a result, the search time may be significantly reduced. There is.
  • the size of the database is limited to the case where the size of the database is smaller than the memory. This is because if the size of the database is larger than the memory size, the entire contents of the database cannot be stored in memory.
  • FIG. 12 is a diagram schematically showing a relationship between a database size and a search time in a conventional information search system. As shown in Fig. 12, if the size of the database to be searched becomes larger than the memory size of the computer (computer), even if the same database is searched in advance, the hard disk Access to an external storage device such as the above occurs, and data transfer from the external storage device becomes a bottleneck, and the search time is significantly longer than when the database size is small. '.
  • the present invention has been made in view of the above-described problems, and in a case where a search request for a database is processed in parallel by a plurality of information processing apparatuses, each information processing apparatus processes the search request at high speed.
  • an information search system includes a plurality of information processing apparatuses, and transmits a plurality of search requests including a database to be searched and search conditions for the database.
  • a sub-database creating unit that creates a plurality of sub-databases having a size equal to or less than the capacity of a storage unit provided in the information processing device based on the database;
  • An assignment management unit that assigns the sub-database created by the sub-database creating unit to the information processing device as a sub-search request, which causes the information processing device to process a search request for the sub-database;
  • a connection unit that acquires and combines the processing results related to search requests.
  • a sub-search condition creating unit that divides the search condition to create a sub-search condition is provided, and the assignment management unit causes the information processing device to search the sub-database using the sub-search condition.
  • the condition may be assigned to the information processing device as a sub-search request.
  • the information processing apparatus is provided with a DB affinity setting section that can set, as a DB affinity, information on a sub-database to be preferentially allocated by the allocation management section, and the allocation management section performs processing based on the DB affinity. Then, a sub search request may be assigned to the information processing device.
  • the DB affinity setting unit may be able to set in advance a sub-database to be processed preferentially, and the DB affinity setting unit may set the sub-database based on the processing history of the sub-search request in the information processing device. And you may set the DB facility.
  • the information processing device further includes a free space management unit that manages the free space of the storage unit provided in the information processing device, and the allocation management unit performs a process based on the free space of the storage unit of the information processing device that is managed by the free space management unit.
  • a sub-database smaller than the free space may be allocated to the information processing device.
  • the allocation management unit based on the time required for the processing predicted by the processing time prediction unit, gives priority to the sub search request from the sub search request having the longest time. May be assigned.
  • the sub-database creating unit or the sub-search condition creating unit may create a plurality of sub-search requests by dividing a sub-search request not yet assigned to the information processing device by the assignment managing unit. .
  • the DB affinity for the information processing device set by the DB affinity setting unit, the database as the sub search request, and the time required for the processing predicted by the processing time prediction unit are set.
  • An evaluation unit for evaluating at least one or more using an evaluation function may be provided, and the assignment management unit may randomly assign a sub-search request to the information processing device based on an evaluation result by the evaluation unit.
  • an information search method of the present invention is an information search method in which a search request including a database to be searched and a search condition for the database is processed in parallel by a plurality of information processing apparatuses.
  • a sub search request creating step for creating a plurality of sub search requests of a size equal to or less than the capacity of the storage unit provided in the information processing device based on the search request; and a sub search request created in the information processing device in the sub search request creating step.
  • the information search apparatus of the present invention is an information search apparatus that causes a plurality of information processing apparatuses to process a search request including a database to be searched and search conditions for the database in parallel.
  • a sub-search request creating unit that creates a plurality of sub-search requests of a size equal to or smaller than the storage unit provided in the information processing device based on the request;
  • an assignment management unit that allocates the sub-search request to the information processing device, and a combining unit that acquires and combines the processing results of the sub-search requests by the plurality of information processing devices are provided.
  • the information search program of the present invention includes a database to be searched and the database to be searched.
  • An information search program for causing a computer to execute an information search function for causing a plurality of information processing apparatuses to execute a search request including search conditions for a database in parallel with each other.
  • a sub-search request creating unit for creating a plurality of sub-search requests of a size equal to or less than the capacity of the storage unit provided in the storage unit;
  • the computer is caused to function as an assignment management unit for allocating a search request to an information processing device, and a combining unit for acquiring and combining processing results related to a sub-search request by a plurality of information processing devices.
  • a computer-readable recording medium records the above-described information search program.
  • the information search method As described above, according to the information search system, the information search method, the information search device, the information search program, and the computer-readable recording medium on which the program is recorded, the following effects and advantages are obtained.
  • the sub-database to be searched is cached in the storage unit of the information processing device that has once processed the sub-search request.
  • the information processing device does not need to access a hard disk having a slow access speed (disk access) to access the subdatabase, and can perform a high-speed search process for the subdatabase. it can.
  • Information about the sub-database to be preferentially assigned to the information processing device is set as a DB affinity, and a sub-search request (sub-database) is assigned to the information processing device based on the DB activity. This makes it possible to easily assign a sub-search request to the information processing device.
  • the DB affinity can be reliably set.
  • the DB affinity can be easily set by setting the DB affinity based on the processing history of the sub search request in the information processing device.
  • the storage of the information processing device can be easily and reliably performed.
  • a sub-database having a size equal to or less than the free space of a copy can be allocated to the information processing device.
  • FIG. 1 is a diagram showing a schematic configuration of an information search system as one embodiment of the present invention.
  • 2 to 4 are diagrams for explaining a method of creating a sub-search request (job) in the information search system as one embodiment of the present invention.
  • FIG. 5 is a diagram for explaining an example of an uneven job creation method in the information search system as one embodiment of the present invention.
  • FIG. 6 is a diagram for explaining a dynamic job assignment method by the assignment management unit in the information search system as one embodiment of the present invention.
  • FIG. 7 is a diagram showing an example of a job created by the sub-search request creating unit in the information search system as one embodiment of the present invention.
  • FIG. 8 is a flowchart for explaining a job allocating method by the allocation managing unit in the information search system as one embodiment of the present invention.
  • FIG. 9 is a diagram showing an example of a state in which a job is assigned to each PC by the assignment management unit in the information search system as one embodiment of the present invention.
  • FIG. 10 is a flowchart for explaining another job assignment method by the assignment management unit in the information search system as one embodiment of the present invention.
  • FIG. 11 is a diagram showing an example of another state in which a job has been assigned to each PC by the assignment management unit in the information search system as one embodiment of the present invention.
  • FIG. 12 is a diagram schematically showing a relationship between a database size and a search time in a conventional information search system.
  • FIG. 1 is a diagram showing a schematic configuration of an information search system as one embodiment of the present invention.
  • the information search system 1 transmits a search request for the database 3 input from the search request input unit 41 to a plurality of (four in this embodiment) PCs (Personal Computers: information processing devices) 2 a, 2 b , 2 c, 2 d are processed in parallel.
  • the database (search target) 3 collects and accumulates some kind of information (data) comprehensively (information group), and has a structure to process addition, deletion, update, and search of information.
  • the search is performed based on the search request input from the search request input unit 41, and information corresponding to the search request can be provided.
  • the search request input section 41 is for the operator to input a search request. This is for inputting a database to be searched (database 3 in the present embodiment) and desired search conditions for the database.
  • the search request input unit 41 is realized by, for example, a keyboard and a mouse in a computer system.
  • the search request includes a database to be searched and search conditions for the database.
  • only one database 3 is provided, but the invention is not limited to this, and a plurality of databases are provided, and at least one specific database is searched from these databases. It may be targeted, and the operator may arbitrarily select at least one of the databases as a search target.
  • the operator can input arbitrary search conditions.
  • the search result output unit 42 obtains a search result for the search request input from the search request input unit 41 from the management server 2 ⁇ and outputs it to an operator or the like.
  • the search result output unit 42 is realized by various types of output devices such as a display device and a printer.
  • PCs 2 a, 2 b, 2 c, and 2 d are information processing devices (computers) that respectively process search requests input from the search request input unit 41.
  • the PCs 2 a, 2 b, 2 c, and 2 d are sub search requests created by the sub search request creation unit 5 and assigned by the assignment management unit 8 in the management server 20. (Described later), and the processing result is transmitted to the management server 20.
  • the codes 2a, 2b, 2c, and 2d are used when it is necessary to specify one of a plurality of PCs, but when any PC is indicated, the code is used. Use 2.
  • the PCs 2 a, 2 b, 2 c, and 2 d include a CPU (Central Processing Unit) 30, a RAM 31, a ROMS 2, a hard disk 33, and a hard disk 33, respectively. It has a communication control section 34.
  • the ROM 32 pre-records programs such as BI OS (Basic Input I Output System) for performing basic input and output on the PC 2, and the hard disk 33 stores an OS (Operating System) and Various programs and data for operating the PC 2, such as an application program, are stored. Then, the CPU 30 executes a variety of programs stored in the ROM 32 and the hard disk 33, so that a sub search request (described later) can be processed.
  • the RAM (storage unit) 31 is for temporarily storing and expanding data and the like when the CPU 30 performs various processes.
  • a sub-database as a sub-search request (details) 1S is loaded on this RAM31.
  • each of the PCs 2a, 2b, 2c, and 2d has a RAM 31 of the same capacity.
  • the PC 2 has a disk cache function, and stores (caches) frequently used data and last used data in the RAM 31.
  • the CPU 30 does not need to read the data from the hard disk 33 or the database 3 having a low access speed, and can read the data from the RAM 31.
  • the processing can be performed at high speed.
  • the sub-database once processed is held in the RAM 31 by this disk cache function.
  • the data cached in the RAM 31 is flushed out of the RAM 31 in a first-in first-out manner, starting with the oldest data.
  • the communication control unit 34 controls communication of various data between the PC 2 and the outside.For example, the communication control unit 34 receives a sub search request (described later) transmitted from the management server 20, The control for transmitting the processed search result to the management server 20 is performed.
  • the management server 20 converts the search request input by the search request The processing is performed by the PCs 2 in parallel. Further, the processing results obtained by the PCs 2 are obtained and combined, and the search results are output to the search result output unit 42.
  • the management server 20 includes a sub-search request creating unit 5, an allocation managing unit 8, a DB affinity setting unit 9, a memory managing unit (free space managing unit) 10, a processing time estimating unit 1 1 And a connecting portion 12.
  • the memory management unit (free space management unit) 10 obtains and holds the size of the RAM 31 of each PC 2 in advance, so that the sub database creation unit 6 described later can store the RAM 31 of each PC 2 You can know the size.
  • the memory management unit 10 manages the usage status of the RAM 31 of each PC 2, and what kind of sub search request is issued to each PC 2 by the allocation management unit 8 (details will be described later).
  • the size of the sub-database cached in the RAM 31 of each PC 2 is managed by managing whether the (sub-database) has been allocated, thereby using the RAM 31 of each PC 2 You can manage the free space (remaining capacity; size of unused area).
  • each PC 2 is configured to perform only the processing (sub search request) assigned by the management server 20 (assignment management unit 8).
  • the RAM 31 caches the sub-databases assigned by the assignment management unit 8 respectively.
  • the processing time prediction unit 11 predicts the time required for processing related to the sub-search request by each PC 2, and includes, for example, the size of the sub-database allocated to the PC 2, the contents of the sub-search conditions, and the PC 2 ( Based on information such as the specifications and performance of the CPU 30), the processing time can be predicted.
  • the processing time estimating unit 11 may inquire of the PC 2 about the time required for each processing, and may be implemented in various modifications without departing from the spirit of the present invention.
  • the sub-search request creating unit 5 creates a sub-search request to be processed by the PC 2 based on the search request input from the search request input unit 41.
  • the sub-search request creating unit 5 creates a plurality of sub-search requests (hereinafter, also referred to as jobs) by using two methods, database division and query division. As shown in FIG. 1, the system includes a sub-database creating unit 6 and a sub-search condition creating unit 7.
  • FIG. 2 to 4 are diagrams for explaining a method of creating a sub-search request (job) in the information search system 1 as one embodiment of the present invention
  • FIG. Fig. 3 is a diagram for explaining a method for creating multiple jobs by query division
  • Fig. 4 is a diagram for explaining a method for creating multiple jobs by query division
  • FIG. 7 is a diagram for explaining a method of creating the job.
  • FIGS. 2 to 4 a search request including a database DB 1 (vertical axis in the figure) to be searched and a query (Query) Q [search condition; horizontal axis in the figure] for the database is shown.
  • DB 1 vertical axis in the figure
  • Query query
  • search condition horizontal axis in the figure
  • An example is shown in which multiple jobs (sub search requests) are created by division.
  • the sub-database creating unit 6 creates a plurality of sub-databases having a size equal to or smaller than the capacity of the RAM 31 provided for each PC 2 based on the database 3.
  • the sub-database creating unit 6 divides the database DB 1 into the sub-databases SDB 1 and SDB 2 by dividing the database with respect to the search request including the database DB 1 and the query Q.
  • two jobs are created: a job 1 that searches the sub-database SDB 1 with the query Q and a job 2 that searches the sub-database SDB 2 with the query Q.
  • the sub-database creating unit 6 stores the database 3 relating to the search request input by the search request input unit 41 based on the information on the size of the RAM 31 of each PC 2 managed by the memory management unit 10. Therefore, a plurality of sub-databases that are smaller than the size of RAM 31 of each PC 2 are created.
  • the sub-database creating unit 6 creates a plurality of sub-databases of a size of 256 MB or less.
  • databases are composed of many independent entries. Therefore, it is easy to divide the database in units of this entry. It is. Further, due to the characteristic that there are many entries, there is also a characteristic that even when the number of PCs 2 is large, it is easy to create a sub-database having a predicted search time equivalent to the number of PCs 2.
  • the sub-search condition creation unit divides the search condition into a plurality of sub-search conditions having no mutual dependency based on the search condition (query; search request) input from the search request input unit 41. By doing so, a plurality of jobs to be processed by each PC 2 are created.
  • the sub-search condition creating unit 7 divides the search condition (query) Q into one sub-search condition (query) for a search request including the database DB 1 and the query Q.
  • SQA and SQB Job A searches the database DB 1 with the sub search condition SQA and Job B searches the database DB 1 with the sub search condition SQB And have created two jobs.
  • each sub-search condition is divided into multiple PCs.
  • the search results are the same when processing is performed in parallel with, and when the search conditions before the division are processed by a single computer (PC 2). Therefore, in the merging process (described later) of the search results for those sub-search conditions performed by the combining unit 12, the load is relatively light, which is characteristic.
  • BLAST Basic Local Alignment Search Tool
  • FASTA sometimes uses a completely independent set of search requests as a query. In such a case, it can be said that there is almost no merging load due to the query division in the combining unit 12.
  • the database division by the sub database creation unit 6 and the query division by the sub search condition creation unit 7 can be performed together. While database partitioning is easy to do, The feature is that the load of merging these search results is heavy. Conversely, query division is characterized by the fact that while segmentation is difficult, the load on merging those search results is light.
  • database partitioning and query partitioning have conflicting features in the ease of database partitioning and the ease of merging search results.
  • the sub-database creating unit 6 divides the database DB 1 into sub-databases SDB 1 and SDB 2 by dividing the database in response to a search request including the database DB 1 and the query Q.
  • the sub-search condition creating unit 7 divides the search condition (query) Q into sub-search conditions (sub-query) SQA and SQB by query division for the search request including the database DB1 and the query Q. are doing.
  • the sub-search request creator 5 sends the sub-database SDB 1 to the job 1A and the sub-database SDB 1 that perform the search using the sub-search condition S QA Job 1 B to search with sub search condition SQB, job 2 A to search with sub search condition SQ A for database SDB 2, and sub search condition S QB for sub database SDB 2 Job 4 to be executed 4 jobs of B are created.
  • the sub-database creating section 6 and the sub-search condition creating section 7 are used to process jobs of different sizes (unequal) in the estimated processing time (search time) in the PC 2. Is created (uneven job creation).
  • the sub-database creation unit 6 creates a plurality of sub-databases having different numbers of entries. Can be realized.
  • FIG. 5 is a diagram for explaining an example of an uneven job creation method in the information search system 1 as one embodiment of the present invention.
  • an uneven job for example, as shown in FIG. 5, for example, first, the same number of jobs as the number of PCs 2 (four in this embodiment) for which the estimated search times are almost equal, Each of these jobs is created by dividing it into three so that the predicted search time is in the ratio of 4: 2: 1. This makes it possible to easily create 12 unequal jobs for four PCs 2.
  • the created jobs are denoted by reference numerals 11, 12, 13, 21, 22, 23, 31, 32, 33, 41, 42 and 43. Is shown.
  • the DB affinity setting unit 9 can set information on a sub database to be preferentially assigned to the PC 2 by the assignment managing unit 8 as a DB affinity (Data Base Affinity). That is, the DB affinity setting unit 9 can set in advance, for each PC 2, a sub-database to be searched by the PC 2. Although a plurality of sub-databases may be specified as DB affinity for each PC 2, the total size of the sub-databases set as DB affinity for one PC 2 is calculated based on the PC 2 Do not exceed the size of RAM 31 (memory size).
  • the DB affinity setting unit 9 may set in advance the sub database on which each PC 2 performs processing preferentially by an operator, an administrator, or the like, or the processing history of the sub database in each PC 2. Based on the sub database, the sub database that has been processed in the past may be preferentially set as the DB affinity.
  • the assignment management unit 8 assigns the sub search request created by the sub search request creation unit 5 (the sub database creation unit 6 and the sub search condition creation unit 7) to the PC 2. In other words, the assignment management unit 8 requests the PC 2 to process the search request for the sub-database by requesting the sub-database and the sub-search condition created by the sub-database creation unit 6 (Job) is assigned to PC 2.
  • the allocation management unit 8 When allocating a job to the PC 2, the allocation management unit 8 By referring to the DB affinity set in the security setting section 9, a sub-database having a matching DB affinity is assigned to the PC 2.
  • a sub-database in charge of each PC 2 is set in advance as a DB affinity, and the assignment management unit 8 assigns a job in accordance with the DB affinity. Has become.
  • the sub database to be searched is cached in the RAM 31 of the PC 2 that has once performed the process for the sub search request.
  • the PC 2 does not need to perform disk access to access the sub-database. Search processing for the active database can be performed at high speed.
  • the assignment management unit 8 assigns a job to each PC 2 using a dynamic job assignment method.
  • the dynamic job allocation method is a method of literally dynamically allocating a job to each PC 2. In the dynamic job assignment, more jobs than the number of PCs 2 are prepared, and the jobs are assigned to each PC 2 in order from the job with the longest estimated search time, and the processing is completed. This can be realized by selecting a job having a long predicted search time from the remaining jobs to the PC 2 and sequentially assigning the same, and repeating these processes until there are no more jobs.
  • the dynamic job allocation is particularly effective when the processing time prediction unit 11 has low accuracy in predicting the job search time in each PC 2.
  • FIG. 6 is a diagram for explaining a dynamic job assignment method by the assignment management unit 8 in the information search system 1 as one embodiment of the present invention.
  • FIG. 5) is a diagram showing an example in which is dynamically assigned to four PCs 2a, 2b, 2c, and 2d. In the example shown in FIG. 6, it is assumed that each of the jobs 41, 42, and 43 has twice as long as the prediction search time by the processing time prediction unit 11.
  • jobs 1 1, 21, 3 1, 41 are assigned to PCs 2a, 2b, 2c, and 2d, respectively.
  • the remaining jobs are sent to PCs 2a, 2b, 2 Jobs 1 2, 2 2, and 3 2, which have a long predicted search time from among them, are assigned, and after their processing is completed, jobs 4 2, 1 3 are assigned to PCs 2 a, 2 b, and 2 c, respectively.
  • jobs 33 and 43 are assigned to the PCs 2b and 2c which have completed the processing, respectively.
  • the dynamic job allocation as described above requires more complicated job management than the static job allocation and has a large job management load. If the prediction accuracy of the job is sufficiently high, the assignment management unit 8 may assign a static job without necessarily assigning a dynamic job.
  • the static job allocation method is a method of literally allocating a job to each PC 2 statically. For example, when the correlation between the search time and the number of entries in the database is very high, the static job assignment is performed by dividing the database so that the number of entries is equal to each other and creating jobs for the number of PCs 2, By simply assigning those jobs to each PC 2 statically, the load balance between the PCs 2 can be maintained. When the prediction accuracy of the prediction search time is low, it can be said that the dynamic job allocation method can shorten the overall processing time rather than the static job allocation method. For example, when creating jobs for the number of PCs 2 and letting each PC 2 process each of these jobs, the processing of one job took twice as long as the other jobs. In such a case, the search time for the job that took that long time will determine the performance of the entire system 1. Therefore, in such a case, the effect of performing the processing in parallel using a plurality of PCs 2 becomes unclear.
  • the static job allocation method is easier to manage jobs than the dynamic job allocation method, and is effective when the processing time (predicted search time) of each job can be predicted with high accuracy in advance. It is.
  • the assignment managing unit 8 may calculate the evaluation value of the job that has not been assigned using the evaluation function and determine the job to be assigned to the PC 2. ,.
  • a simple example of a merit function is Conceivable.
  • the allocation management unit 8 selects the job having the highest evaluation value from the jobs that have not been allocated yet, and allocates it to PC2.
  • the combining unit 12 acquires the processing result (search result) of the job (sub search request) by each PC 2 and combines (merges) it.
  • the search for the search request input from the search request input unit 41 is performed To create the result.
  • the search results combined by the combining unit 12 are transmitted to the search result output unit 42.
  • a search request database to be searched and search conditions
  • the search request is transmitted to the management server 20.
  • the sub-search request creating unit 5 generates a plurality of jobs (sub-search requests) to be processed by the plurality of PCs 2 based on the search request input from the search request input unit 41.
  • Create sub search request creation step.
  • the sub-database creating unit 6 creates a plurality of sub-databases based on the database 3 so as to have a capacity equal to or smaller than the RAM 31 of each PC 2.
  • the sub search condition creating unit 7 creates a sub search request based on the search condition input as a search request from the search request input unit 41 as necessary.
  • sub-search request creating unit 5 creates an uneven job based on the estimated processing time by the processing time estimating unit 11.
  • FIG. 7 is a diagram illustrating an example of a job created by the sub-search request creating unit 5 in the information search system 1 according to an embodiment of the present invention, in which four PCs 2 (2 a, 2 b, 2c, 2d), based on a search request to search a database 3 having a size (for example, 384MB) 1.5 times the memory size (for example, 256MB) of the RAM 31 of each PC 2
  • a size for example, 384MB
  • the memory size for example, 256MB
  • the sub-search condition creating unit 7 divides the search request input from the search request input unit 41 and creates four sub-search conditions S QA, SQB, SQC, and SQD. ing.
  • the predicted search times of these jobs predicted by the processing time prediction unit 11 indicate that the jobs IB, 1D, 2B, and 2D require substantially the same predicted search time, and that the jobs 1A, 1C, 2A, and 2C require approximately the same estimated search time, and jobs 1A, 1C, 2A, and 2C have approximately twice the predicted search time as jobs 1B, ID, 2B, and 2D. Search time is required.
  • the DB affinity setting unit 9 sets the DB affinity so that the sub database SDB 1 is preferentially assigned to PC 2 a and PC 2 b. It is assumed that the DB affinity is set so that the sub database SDB 2 is preferentially assigned to PC 2 c and PC 2 d.
  • the assignment management unit 8 sets each of the jobs 1A, 1B, 1C, ID, 2A, 2B, 2C, 2D created by the sub-search request creating unit 5 as described above in the DB affinity setting.
  • each PC 2 is assigned individually (assignment management step).
  • the assignment management unit 8 assigns each job to the PC 2 using a dynamic job assignment method.
  • FIG. 9 is a diagram showing an example of a state in which a job has been assigned to each PC 2 by the assignment management unit 8 in the information search system 1 as one embodiment of the present invention.
  • the assignment management unit 8 determines whether there is any unassigned job (step A 10
  • step A10 If there is no unassigned job (see NO route in step A10), the processing ends.
  • the assignment management unit 8 determines whether there is a PC 2 waiting for the job to be assigned, that is, processes the job. Determine if any PC 2 is ready (step A 20).
  • the assignment management unit 8 refers to the DB affinity setting unit 9 and the DB facility for the PC 2 It is determined whether there is a matching (matching) job (step A60). If there is a job that matches the DB affinity for the PC 2 (refer to the YES route in step A60), the processing time prediction unit 11 refers to the predicted search time to determine the DB affinity. Among the jobs that match the two, the job with the longest predicted search time is assigned to the PC 2 (step A80), and the process proceeds to step A10.
  • step A60 the assignment of the job to the PC 2 is completed (step A50), and the process returns to step A20.
  • the assignment management unit 8 determines whether there is any PC 2 executing the job (step A). 30). If there is a PC 2 executing a job (see the YES route in step A 30), the assignment management unit 8 waits for the job of the PC 2 to be completed (step A 70). Return to 10. If there is no PC 2 that is executing the job (see the NO route in step A30), the assignment management unit 8 outputs a message indicating an error to the operator of the information search system 1 or the like. Then (step A40), and return to step A10.
  • a job is dynamically allocated to each PC 2 by the job allocation method as described above.
  • the processing of job 2C takes 1.5 times longer than the predicted search time predicted by the processing time prediction unit 11, and the processing of job 2D is performed. It is assumed that the processing time is twice as long as the prediction search time predicted by the processing time prediction unit 11.
  • each PC 2 processes the assigned job. That is, each PC 2 searches the sub database based on the sub search condition, and transmits the search result to the management server 20.
  • the sub database assigned to each PC 2 is smaller than the size of RAM 31 provided for each PC 2, the sub database is searched for sub-search conditions.
  • the search can be performed by expanding all the sub-databases to be searched on the RAM 31, and the search process can be performed at high speed without generating a disk access or the like.
  • the search results from each PC 2 are combined (merged) by the combining unit 12 in the management server 20, transmitted to the search result output unit 42 as a search result for the search request, and presented to the operator. .
  • the sub-database creation unit 6 (sub-search request creation unit 5) has a capacity S, which is equal to or less than the capacity of the RAM 31 provided for each PC 2. Creates multiple sub-databases of the same size and assigns these sub-databases to each PC 2 so that the PC 2 that has processed the sub-search request once has its RAM 31 Sub-databases are cached. As a result, in p C 2, it is not necessary to access a hard disk having a slow access speed (disk access) to access the sub database, and the retrieval process for the sub database can be performed at high speed.
  • a sub-database in charge of each PC 2 is set in advance in the DB affinity setting section 9 as a DB affinity, and the assignment management section 8 complies with the DB affinity.
  • the assignment management unit 8 can easily assign the job (sub database) to the PC 2, and the PC 2 that has once processed the sub search request has its RAM 31 Then, the sub database to be searched is cached, and the PC 2 does not need to perform a disk access to access the sub database, so that the search process for the sub database can be performed at high speed.
  • the sub-search condition creating unit 7 divides the search condition input from the search request input unit 41 to create a sub-search condition, so that an appropriate (arbitrary) size (predicted search time length) is obtained.
  • the sub search request can be easily created, and the convenience is high. Since the processing time prediction unit 11 predicts the predicted search time for each job, the allocation management unit 8 can easily perform dynamic job allocation and is highly convenient.
  • the sub-database that each PC 2 processes with priority is set in advance by the operator or administrator, so that the DB affinity can be set reliably.
  • the sub-database that has been processed in the past is preferentially set as the DB affinity, so that the DB affinity can be easily set.
  • the memory management unit 10 manages the free space of the RAM 31 provided in each PC 2 and allocates a sub-database smaller than the free space to each PC 2 to easily and reliably. A sub-database smaller than the free space of RAM 31 of PC 2 can be allocated to PC 2.
  • the processing time prediction unit 11 predicts the time (predicted search time) required for processing of each sub search request by each PC 2, and the allocation management unit 8 preferentially sub-processes the sub search request having the long predicted search time. By allocating the search request to the PC 2, a plurality of jobs can be efficiently allocated to the plurality of PCs 2.
  • the assignment management unit 8 determines whether the? ⁇ 2 of 1] ⁇ [31 remaining capacity or less A job with a data size of PC 2 may be assigned to PC 2, and the sub-search request creating unit 5 may assign a job to PC 2 so that the data size is within the remaining capacity of RAM 31 of PC 2. Unassigned (unallocated) jobs (subdatabases) may be further divided into less than the remaining capacity and assigned to PC 2. That is, in the present information search system 1, the allocation management unit 8
  • the usage status or free space (size of unused area) of RAM 31 of each PC 2 is obtained, and according to the free space, unallocated jobs are further added to the sub search request creation unit.
  • the job may be divided into a plurality of jobs (sub-databases) less than the free space, and the job created by the division may be assigned to the PCs 2.
  • FIG. 10 is a flow chart (step B 1) showing another job allocation method by the allocation management unit 8 in the information search system 1 as one embodiment of the present invention, with reference to FIG. 11. 0 to B110).
  • FIG. 11 is a diagram showing an example of another state in which a job is assigned to each PC 2 by the assignment managing unit 8 in the information search system 1 as one embodiment of the present invention.
  • the allocation management unit 8 determines whether there is an unallocated job (step B10). If there is no unallocated job (see the N route in step B10), the processing ends. .
  • the assignment managing unit 8 next determines that there is a PC 2 waiting for the job to be assigned. It is determined whether or not there is any PC 2 in a state where it can process (step B 20).
  • the assignment management unit 8 next determines whether there is any PC 2 that is executing the job. Judge (step B30). If there is a PC 2 executing a job (see the YES route in step B 30), the assignment management unit 8 waits for the job of the PC 2 to be completed (step B 50), Return to 10. If there is no PC 2 executing the job (see the NO route in step B 30), the assignment management unit 8 sends a message indicating an error to the operator of the information search system 1. Output (Step B 40), and return to step B 10.
  • the assignment management unit 8 refers to the DB facility setting unit 9 to check the DB 2 Then, it is determined whether or not there is a job that matches (matches) (Step B60). If there is a job that matches the DB affinity for the PC 2 (see the YES route in step B60), the assignment management unit 8 refers to the predicted search time by the processing time prediction unit 11 Then, among the jobs suitable for the PC 2, the job having the longest predicted search time is assigned to the PC 2 (step B80), and the process proceeds to step B10.
  • the allocation management unit 8 refers to the memory management unit 10, checks the remaining capacity of the RAM 31 of the PC 2, and checks the job size of the data within the remaining capacity of the PC 2. It is determined whether or not there is, that is, whether or not there is a job whose sub-database size is within the remaining capacity of the PC 2 (step B70). If there is a job with a data size within the remaining capacity of the PC 2 (see the YES route in step B70), the assignment management unit 8 proceeds to step B80. In other words, even if the DB affinity does not match, the assignment management unit 8 assigns the job with the longest predicted search time among the jobs whose data size is within the remaining capacity of the RAM 31 of the PC 2 to that PC. It allocates to 2 (step B80) and moves to step B10.
  • the assignment management unit 8 sets the unassigned job to the PC 2. For example, if there is a job that can be created so that the RAM of PC 2 is less than or equal to the remaining capacity of RAM3, that is, if there is a job that can further divide its sub-database for unassigned jobs (Step B 90).
  • step B90 If there is a job that can further divide the sub-database (see YES route in Step B90), the assignment management unit 8 sends the sub-database of the job to the RAM3 It will be less than 1 remaining capacity Then, the process is further divided (step B1 10), and the process proceeds to step B10.
  • step B90 If there is no job that can further divide the sub-database (see NO route in step B90), the job assignment to the PC 2 is terminated (step B100), and the process proceeds to step B20. .
  • a job is dynamically assigned to each PC 2 as shown in FIG.
  • the processing of job 2A takes 1.5 times longer than the predicted search time predicted by the processing time prediction unit 11
  • the processing of job 2C takes It is assumed that it takes twice as long as the prediction search time predicted by the processing time prediction unit 11.
  • both jobs 2D-1 and 2D-2 are created by subdividing job 2D.
  • the job 2D— 1 and 2D— 2 do not have the same DB affinity as the PCs 2 a and 2 b.
  • 2 Sub-database for D-2 SDB 2 is not cached. Therefore, in order for PCs 2a and 2b to process jobs 2D-1 and 2D-2, PCs 2a and 2b store the sub-database SDB 2 for jobs 2D-1 and 2D-2 in database 3
  • the job 2D-1 and 2D-2 must be re-divided and created by the sub-search request creating unit 5 so as to reduce the read load. As a result, the processing time (search) of the entire system is shortened, the processing speed can be shortened as a result, and the risk of allocating jobs that do not conform to the DB affinity (sub database read load) ) Can be reduced.
  • the sub-database S DB 1 is stored in the RAM 31 of the PCs 2a and 2b. Are cached continuously, and the search speed does not decrease when PCs 2a and 2b are again processed for the sub-database SDB 1.
  • the assignment management unit 8 may randomly assign a job to each PC 2 based on the evaluation result by this evaluation unit. . Thereby, the assignment management unit 8 can easily and reliably assign a job to each PC 2.
  • each PC 2 has the same capacity R
  • the AM 31 is provided, the present invention is not limited to this.Each PC 2 may have a RAM 31 of a different size, and various modifications may be made without departing from the spirit of the present invention. Can be implemented.
  • the management server 20 is realized by, for example, a computer (information processing device) having a server function, and the CPU of the computer executes an information search program to thereby execute the above-described sub-search request creating unit 5 and sub-database.
  • the programs (information retrieval program) for realizing the function as the evaluation unit are, for example, flexible disk, CD-ROM, CD-R, CD-R / W, DVD, DVD-R, DVD-R / It is provided in a form recorded on a computer-readable recording medium such as a W, magnetic disk, optical disk, or magneto-optical disk.
  • a computer-readable recording medium such as a W, magnetic disk, optical disk, or magneto-optical disk.
  • the computer reads the program from the recording medium, transfers the program to an internal storage device or an external storage device, stores the program, and uses the program.
  • the program may be recorded on a storage device (recording medium) such as a magnetic disk, an optical disk, or a magneto-optical disk, and provided to the computer from the storage device via a communication path. .
  • Sub search request creator 5 Sub database creator 6, Sub search condition creator 7, Assignment manager 8, DB abundity setting unit 9, Memory management unit 10, Processing time prediction unit 11, Combining unit 12,
  • the program stored in the internal storage device RAM or ROM of the printer in this embodiment
  • the computer may read and execute the program recorded on the recording medium.
  • the computer is a concept including hardware and an operating system, and means hardware that operates under the control of the operating system.
  • the hardware itself corresponds to a computer.
  • the hardware includes at least a microprocessor such as a CPU and means for reading a computer program recorded on a recording medium.
  • the management server 20 has a function as a computer. It is doing.
  • the above-mentioned flexible disk CD-ROM, CD-R, CD-R / W, DVD, DVD-R, DVD-R / W, magnetic disk, optical disk, magneto-optical
  • Various computer-readable media can be used. If each embodiment of the present invention is disclosed, it can be manufactured by those skilled in the art. Industrial applicability
  • the information retrieval system, the information retrieval method, the information retrieval device, the information retrieval program, and the computer-readable recording medium recording the program according to the present invention include a plurality of information processing devices, It is useful for a plurality of information processing devices to process search requests consisting of a database and search conditions for this database in parallel. Especially, each information processing device processes search requests at high speed. It is suitable for performing high-speed searches against databases.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

L'invention concerne un système d'extraction d'informations caractérisé en ce que l'opération d'extraction d'une base de données récupérée par une pluralité de processeurs d'informations est accélérée par division de la base de données en sous-bases de données présentant chacune une capacité de mémoire non supérieure à celle de chaque processeur d'informations lors de la réception d'une demande d'extraction constituée de conditions d'extraction de la base de données, par division de la demande d'extraction en vue d'un traitement en parallèle puis par combinaison des résultats de traitement.
PCT/JP2003/003245 2003-03-18 2003-03-18 Systeme d'extraction d'informations Ceased WO2004084095A1 (fr)

Priority Applications (3)

Application Number Priority Date Filing Date Title
PCT/JP2003/003245 WO2004084095A1 (fr) 2003-03-18 2003-03-18 Systeme d'extraction d'informations
JP2004569562A JPWO2004084095A1 (ja) 2003-03-18 2003-03-18 情報検索システム,情報検索方法,情報検索装置,情報検索プログラムおよび当該プログラムを記録したコンピュータ読取可能な記録媒体
US11/068,395 US20050165765A1 (en) 2003-03-18 2005-03-01 Information search system, information search method, information search apparatus, and recording medium to which information search program, is recorded and which can be read by computer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2003/003245 WO2004084095A1 (fr) 2003-03-18 2003-03-18 Systeme d'extraction d'informations

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US11/068,395 Continuation US20050165765A1 (en) 2003-03-18 2005-03-01 Information search system, information search method, information search apparatus, and recording medium to which information search program, is recorded and which can be read by computer

Publications (2)

Publication Number Publication Date
WO2004084095A1 true WO2004084095A1 (fr) 2004-09-30
WO2004084095A9 WO2004084095A9 (fr) 2005-02-10

Family

ID=33018140

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2003/003245 Ceased WO2004084095A1 (fr) 2003-03-18 2003-03-18 Systeme d'extraction d'informations

Country Status (2)

Country Link
JP (1) JPWO2004084095A1 (fr)
WO (1) WO2004084095A1 (fr)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006133970A (ja) * 2004-11-04 2006-05-25 Canon Inc 情報処理装置及びその制御方法及びプログラム
JP2009181577A (ja) * 2008-01-31 2009-08-13 Hewlett-Packard Development Co Lp インテリジェントデータストレージシステム
JP2009541869A (ja) * 2006-07-03 2009-11-26 インテル・コーポレーション 高速音声検索の方法および装置
WO2010001464A1 (fr) * 2008-07-01 2010-01-07 富士通株式会社 Dispositif de recherche et procédé de recherche
JP2010524060A (ja) * 2007-03-30 2010-07-15 アリババ グループ ホールディング リミテッド 分散コンピューティングにおけるデータマージング
JP2012133371A (ja) * 2012-01-04 2012-07-12 Intel Corp 高速音声検索の方法および装置
CN114764918A (zh) * 2021-01-13 2022-07-19 腾讯科技(深圳)有限公司 特征检索方法、装置、计算机设备和存储介质

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8682853B2 (en) * 2008-05-16 2014-03-25 Paraccel Llc System and method for enhancing storage performance in analytical database applications

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0950438A (ja) * 1995-08-07 1997-02-18 Hitachi Ltd 生体高分子配列ホモロジ検索方法
JPH1185585A (ja) * 1997-09-12 1999-03-30 N T T Data:Kk 完全メモリ常駐型インデックス方法および装置

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002090978A1 (fr) * 2001-05-04 2002-11-14 Paracel, Inc. Procédé et dispositif pour recherches approchées de sous-chaînes à grande vitesse

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0950438A (ja) * 1995-08-07 1997-02-18 Hitachi Ltd 生体高分子配列ホモロジ検索方法
JPH1185585A (ja) * 1997-09-12 1999-03-30 N T T Data:Kk 完全メモリ常駐型インデックス方法および装置

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Akiyama Y., 'Daikibo Heiretsu Keisanki ni yoru Tampakusitsu Joho Kaiseki', Journal of Japanese Society for Artificial Intelligence, 1 january 2000, vol 15, nr 1, pg 27-34 *
Ando K., Kosoku Kensaku niwa Processor Kadoritsu no Kojo ga Fukaketsu 'Shared Everything' ga Chumoku o Atsumeru', Nikkei Electronics, 19 july 1993, nr 586, pg 98-106 *
Kawanishi Y., 'Chokosoku Idenshi Kaiseki Gijutsu no Jitsugen to Super Computer VPP Tekiyo Hyoka', Fujitsu 25 march 1996, vol 47, nr 2, pg 183-191 *
Nishijima H., 'Kensaku wa Subtask ni Bunkatsu shite Jikko', Nikkei Computer, 12 december 1994, nr 354, pg 160-161 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006133970A (ja) * 2004-11-04 2006-05-25 Canon Inc 情報処理装置及びその制御方法及びプログラム
US7962475B2 (en) 2004-11-04 2011-06-14 Canon Kabushiki Kaisha Information processing apparatus for searching for a desired image processing apparatus connected to a network, method for controlling the same, and computer-readable storage medium
JP2009541869A (ja) * 2006-07-03 2009-11-26 インテル・コーポレーション 高速音声検索の方法および装置
JP2010524060A (ja) * 2007-03-30 2010-07-15 アリババ グループ ホールディング リミテッド 分散コンピューティングにおけるデータマージング
US8463822B2 (en) 2007-03-30 2013-06-11 Alibaba Group Holding Limited Data merging in distributed computing
JP2009181577A (ja) * 2008-01-31 2009-08-13 Hewlett-Packard Development Co Lp インテリジェントデータストレージシステム
WO2010001464A1 (fr) * 2008-07-01 2010-01-07 富士通株式会社 Dispositif de recherche et procédé de recherche
JP5110162B2 (ja) * 2008-07-01 2012-12-26 富士通株式会社 検索装置および検索方法
US8423499B2 (en) 2008-07-01 2013-04-16 Fujitsu Limited Search device and search method
JP2012133371A (ja) * 2012-01-04 2012-07-12 Intel Corp 高速音声検索の方法および装置
CN114764918A (zh) * 2021-01-13 2022-07-19 腾讯科技(深圳)有限公司 特征检索方法、装置、计算机设备和存储介质

Also Published As

Publication number Publication date
WO2004084095A9 (fr) 2005-02-10
JPWO2004084095A1 (ja) 2006-06-22

Similar Documents

Publication Publication Date Title
JP4786945B2 (ja) インデックス付与強制クエリ
EP3312714B1 (fr) Méthode parallèle de données distribuées pour la récupération d'espace
TWI539280B (zh) 用於分析未經特定設計以提供記憶體分配資訊之應用程式及擷取記憶體分配資訊的方法、及其電腦系統和電腦可讀儲存媒體
KR20090079012A (ko) 가상 머신의 상태를 저장, 복원하는 방법 및 장치
US9928004B2 (en) Assigning device adaptors to use to copy source extents to target extents in a copy relationship
US10394819B2 (en) Controlling mirroring of tables based on access prediction
CN103959275A (zh) 动态进程/对象范围的存储器关联性调整器
CN108475201A (zh) 一种虚拟机启动过程中的数据获取方法和云计算系统
US7991976B2 (en) Permanent pool memory management method and system
JPWO2009066611A1 (ja) 仮想マシン向けデータ格納システム、データ格納方法およびデータ格納用プログラム
WO2004084095A1 (fr) Systeme d'extraction d'informations
US5678024A (en) Method and system for dynamic performance resource management within a computer based system
CN1924842B (zh) 用于i/o适配器的方法和装置
US7162665B2 (en) Information processing system, method for outputting log data, and computer-readable medium storing a computer software program for the same
US20050165765A1 (en) Information search system, information search method, information search apparatus, and recording medium to which information search program, is recorded and which can be read by computer
JP2009037369A (ja) データベースサーバへのリソース割当て方法
JP4177833B2 (ja) リンクリストへのマルチプロセスアクセス方法および装置
US20050086430A1 (en) Method, system, and program for designating a storage group preference order
JP3884239B2 (ja) サーバ計算機
JP2020135683A (ja) 制御装置、及び制御プログラム
US20250200005A1 (en) Method and apparatus for free space management
JP6020014B2 (ja) 分散データストア管理装置、分散並列処理実行装置、分散並列処理システム、分散データストア管理方法、分散並列処理実行方法、および、コンピュータ・プログラム
CN119719150A (zh) 用于执行加速器资源分配的设备、方法和制品
JP2006277530A (ja) 割当システム、割当装置、割当方法及びそのプログラム
JP2003050731A (ja) ファイルシステムアクセス方法及びその実施装置並びにその処理プログラム

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): JP US

WWE Wipo information: entry into national phase

Ref document number: 2004569562

Country of ref document: JP

COP Corrected version of pamphlet

Free format text: PAGE 1, DESCRIPTION, REPLACED BY CORRECT PAGE 1

WWE Wipo information: entry into national phase

Ref document number: 11068395

Country of ref document: US