CN115080514B - Index data generation method, information retrieval method, device and computer system - Google Patents
Index data generation method, information retrieval method, device and computer system
- Publication number
- CN115080514B CN202210531570.8A
- Authority
- CN
- China
- Prior art keywords
- target
- data
- writing
- unit
- target data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present disclosure provides an index data generation method, an information retrieval method, an apparatus, a computer system, a computer-readable storage medium, and a computer program product, which can be used in the fields of big data and information security technology, or in other fields. The index data generation method comprises: in response to receiving first target data determined according to a configuration file, writing the first target data into a distributed full-text search server unit and recording a writing result; in response to detecting a target writing result indicating that the writing process failed, writing a target configuration file corresponding to the target writing result into a message queue retry module; in response to determining that the message queue retry module has received the target configuration file, determining second target data according to the target configuration file; writing the second target data into the distributed full-text search server unit; and determining index data according to the first target data and the second target data written into the distributed full-text search server unit.
Description
Technical Field
The present disclosure relates to the fields of big data and information security technologies, and more particularly, to an index data generation method, an information retrieval method, an apparatus, a computer system, a computer-readable storage medium, and a computer program product.
Background
With the development of big data technology, data retrieval is increasingly applied in fields such as industrial and agricultural production, construction, logistics, and daily life. Data retrieval is the process or technique of storing selected, organized and evaluated data on a storage medium and, according to the needs of users, retrieving from the data set the precise data that answers their questions.
An index is a separate storage structure created to speed up the retrieval of data rows in a table. An index is built on a table and consists of index pages that are distinct from the data pages; the rows in each index page contain logical pointers that speed up the retrieval of the physical data.
In the process of implementing the disclosed concept, the inventors found that the related art has at least the following problem: data retrieval efficiency is low.
Disclosure of Invention
In view of this, the present disclosure provides an index data generation method, an information retrieval method, an apparatus, a computer system, a computer-readable storage medium, and a computer program product.
One aspect of the disclosure provides an index data generation method, comprising: in response to receiving first target data determined according to a configuration file, writing the first target data into a distributed full-text search server unit and recording a writing result; in response to detecting a target writing result indicating that the process of writing the first target data into the distributed full-text search server unit failed, writing a target configuration file corresponding to the target writing result into a message queue retry module; in response to determining that the message queue retry module has received the target configuration file, determining second target data according to the target configuration file; writing the second target data into the distributed full-text search server unit; and determining index data according to the first target data and the second target data written into the distributed full-text search server unit.
Another aspect of the disclosure provides an information retrieval method, comprising: obtaining a target retrieval word; and retrieving the target retrieval word based on index data to obtain a retrieval result, wherein the index data is determined according to the index data generation method of the present disclosure.
Another aspect of the disclosure provides an index data generating apparatus, comprising: a first writing module configured to, in response to receiving first target data determined according to a configuration file, write the first target data into a distributed full-text search server unit and record a writing result; a second writing module configured to, in response to detecting a target writing result indicating that the process of writing the first target data into the distributed full-text search server unit failed, write a target configuration file corresponding to the target writing result into a message queue retry module; a first determining module configured to, in response to determining that the message queue retry module has received the target configuration file, determine second target data according to the target configuration file; a third writing module configured to write the second target data into the distributed full-text search server unit; and a second determining module configured to determine index data according to the first target data and the second target data written into the distributed full-text search server unit.
Another aspect of the disclosure provides an information retrieval apparatus, comprising an acquisition module and a retrieval module. The acquisition module is configured to acquire a target retrieval word. The retrieval module is configured to retrieve the target retrieval word based on index data to obtain a retrieval result, wherein the index data is determined by the index data generating apparatus of the present disclosure.
Another aspect of the present disclosure provides a computer system including one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement an index data generation method and an information retrieval method according to the present disclosure.
Another aspect of the present disclosure provides a computer-readable storage medium having stored thereon computer-executable instructions that, when executed, are used to implement an index data generation method and an information retrieval method according to the present disclosure.
Another aspect of the present disclosure provides a computer program product comprising computer executable instructions which, when executed, are for implementing an index data generation method and an information retrieval method according to the present disclosure.
According to embodiments of the present disclosure, the following technical means are adopted: in response to receiving first target data determined according to a configuration file, writing the first target data into the distributed full-text search server unit and recording the writing result; in response to detecting a target writing result indicating that the writing process failed, writing the target configuration file corresponding to the target writing result into the message queue retry module; in response to determining that the message queue retry module has received the target configuration file, determining second target data according to the target configuration file; writing the second target data into the distributed full-text search server unit; and determining index data according to the first target data and the second target data written into the distributed full-text search server unit.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments thereof with reference to the accompanying drawings in which:
FIG. 1 schematically illustrates an exemplary system architecture to which an index data generation method and an information retrieval method may be applied, according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of an index data generation method according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a flow diagram for generating search data based on an integrated intelligent search engine system, in accordance with an embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow chart of an information retrieval method according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a block diagram of an integrated intelligent search engine system with index data generation and information retrieval functionality in accordance with an embodiment of the present disclosure;
FIG. 6 schematically illustrates a block diagram of an index data generating device according to an embodiment of the disclosure;
FIG. 7 schematically illustrates a block diagram of an information retrieval apparatus according to an embodiment of the present disclosure; and
FIG. 8 schematically illustrates a block diagram of a computer system suitable for implementing the above-described methods, according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a convention should be interpreted as it would ordinarily be understood by one of skill in the art (e.g., "a system having at least one of A, B and C" would include, but not be limited to, systems having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.). Where a convention analogous to "at least one of A, B or C, etc." is used, in general such a convention should likewise be interpreted as it would ordinarily be understood by one of skill in the art (e.g., "a system having at least one of A, B or C" would include, but not be limited to, systems having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.).
In the process of realizing the concept of the present disclosure, the inventors found that, in the big data era with its explosive growth of data resources, the use of real-time search technology to aggregate and integrate data is growing exponentially. In addition, with data volumes that readily reach millions of records, it is difficult to meet the real-time requirements of intelligent search.
Embodiments of the present disclosure provide an index data generation method, an information retrieval method, an apparatus, a computer system, a computer-readable storage medium, and a computer program product. The index data generation method comprises: in response to receiving first target data determined according to a configuration file, writing the first target data into a distributed full-text search server unit and recording a writing result; in response to detecting a target writing result indicating that the process of writing the first target data into the distributed full-text search server unit failed, writing a target configuration file corresponding to the target writing result into a message queue retry module; in response to determining that the message queue retry module has received the target configuration file, determining second target data according to the target configuration file; writing the second target data into the distributed full-text search server unit; and determining index data according to the first target data and the second target data written into the distributed full-text search server unit. The information retrieval method comprises: obtaining a target retrieval word; and retrieving the target retrieval word based on index data to obtain a retrieval result, wherein the index data is determined according to the index data generation method of the embodiments of the present disclosure.
Fig. 1 schematically illustrates an exemplary system architecture 100 to which an index data generation method and an information retrieval method may be applied according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied to assist those skilled in the art in understanding the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios.
As shown in fig. 1, a system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired and/or wireless communication links, and the like.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients and/or social platform software, to name a few.
The terminal devices 101, 102, 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (by way of example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that the index data generation method and the information retrieval method provided by the embodiments of the present disclosure may generally be performed by the server 105. Accordingly, the index data generating apparatus and the information retrieval apparatus provided by the embodiments of the present disclosure may generally be provided in the server 105. The index data generation method and the information retrieval method may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the index data generating apparatus and the information retrieval apparatus may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Alternatively, the index data generation method and the information retrieval method may be performed by the terminal device 101, 102, or 103, or by a terminal device other than the terminal devices 101, 102, and 103. Accordingly, the index data generating apparatus and the information retrieval apparatus may also be provided in the terminal device 101, 102, or 103, or in a terminal device other than the terminal devices 101, 102, and 103.
For example, the file to be configured may be originally stored in any one of the terminal devices 101, 102, or 103 (for example, but not limited to, the terminal device 101), or stored on an external storage device and imported into the terminal device 101. Then, the terminal device 101 may locally perform the index data generation method provided by the embodiment of the present disclosure, or send the file to be configured to other terminal devices, servers, or server clusters, and perform the index data generation method provided by the embodiment of the present disclosure by the other terminal devices, servers, or server clusters that receive the file to be configured.
For example, the target search term may be originally stored in any one of the terminal devices 101, 102, or 103 (for example, but not limited to, the terminal device 101), or stored on an external storage device and may be imported into the terminal device 101. Then, the terminal device 101 may locally perform the information retrieval method provided by the embodiment of the present disclosure, or transmit the target retrieval word to other terminal devices, servers, or server clusters, and perform the information retrieval method provided by the embodiment of the present disclosure by the other terminal devices, servers, or server clusters that receive the target retrieval word.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
It should be noted that the index data generating method, the information retrieving method, the apparatus, the computer system, the computer readable storage medium and the computer program product of the present disclosure may be used in the technical field of big data and information security, and may also be used in any field other than the technical field of big data and information security, and the application fields of the index data generating method, the information retrieving method, the apparatus, the computer system, the computer readable storage medium and the computer program product of the present disclosure are not limited.
Fig. 2 schematically illustrates a flowchart of an index data generation method according to an embodiment of the present disclosure.
As shown in FIG. 2, the method includes operations S201-S205.
In operation S201, in response to receiving the first target data determined from the configuration file, the first target data is written to the distributed full-text search server unit, and the writing result is recorded.
According to an embodiment of the present disclosure, the configuration file may be configured with at least one of attribute information, access address information, file information, search statement information, user-related information, and the like, related to obtaining the first target data. The first target data may be determined from the configuration information in the configuration file. The configuration file may include a file written and configured by the data source provider according to predefined rules.
For example, the configuration file may be configured with at least one of file name information, field information corresponding to each item of file name information, the file contents corresponding to the file name identified by each item of file name information, and the like. The first target data may be determined, according to the field information, from the file contents corresponding to that field information.
For example, at least one of search statement information, access address information, access port information, instance name information, authorized user name and password information, and the like may also be configured in the configuration file. By accessing the configured address and port with the authorized user name and password, and retrieving the data related to the instance name based on the search statement, the first target data can be obtained.
According to embodiments of the present disclosure, the distributed full-text search server unit may be used to store the first target data for retrieval. Distributed computing is a way of computing in which the search engine is decomposed into a plurality of small modules that are distributed to a plurality of servers for processing, which saves overall computing time and computing resources and improves computing efficiency. Full-text search is a retrieval mode implemented by an indexing program, or a derivative program, based on Lucene (a full-text search engine). Each word segment in the data is scanned and an index is created for it, indicating the number and positions of its occurrences in the text. When a user searches, the search program looks up the pre-built index and returns the search result to the user.
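The per-word-segment index described above can be illustrated with a minimal sketch. It is not taken from the disclosure and does not use Lucene; the function names `build_inverted_index` and `search`, the naive tokenizer, and the sample documents are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def build_inverted_index(documents):
    """Map each token to {doc_id: [positions]} across all documents."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        # Naive tokenizer (assumption): lowercase runs of letters and digits.
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        for position, token in enumerate(tokens):
            index[token].setdefault(doc_id, []).append(position)
    return index


def search(index, term):
    """Return {doc_id: occurrence_count} for a single search term."""
    postings = index.get(term.lower(), {})
    return {doc_id: len(positions) for doc_id, positions in postings.items()}


# Illustrative documents only; real data would come from the configuration file.
documents = {
    "doc1": "Index data speeds up data retrieval",
    "doc2": "Retrieval uses the index",
}
index = build_inverted_index(documents)
```

Here `index["data"]` records that the token occurs twice in `doc1`, at positions 1 and 4, which corresponds to the "number and positions of occurrences" mentioned above. A production full-text engine would add word segmentation suited to the language, ranking, and sharding across servers.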
According to an embodiment of the present disclosure, the writing result may include at least one of writing period information, identification information of the writing data, writing success or writing failure, and other related information, and may not be limited thereto.
In operation S202, in response to detecting a target writing result indicating that the process of writing the first target data to the distributed full-text search server unit failed, a target configuration file corresponding to the target writing result is written to the message queue retry module.
According to the embodiment of the disclosure, in the case that the existence of the first target data with the writing failure is detected according to the writing result, the message queue retry module may store and reprocess the target configuration file corresponding to the first target data with the writing failure in the form of a queue.
In operation S203, in response to determining that the message queue retry module has received the target configuration file, second target data is determined according to the target configuration file.
According to the embodiment of the disclosure, after the message queue retry module receives the target configuration file, the second target data may be obtained according to the configuration information in the target configuration file in combination with the aforementioned obtaining manner of the first target data.
In operation S204, the second target data is written into the distributed full-text search server unit.
According to an embodiment of the present disclosure, the second target data may also be stored in the distributed full text search server unit as data for retrieval.
In operation S205, index data is determined from the first target data and the second target data written to the distributed full-text search server unit.
According to the embodiment of the disclosure, after the distributed full-text search server unit receives the first target data and the second target data for retrieval, an index can be established based on the information of the first target data and the second target data, and index data required in the retrieval process can be obtained.
Through the above embodiments of the present disclosure, when the first target data is not successfully written into the distributed full-text search server unit, a retry operation may be performed by means of the message queue retry module, and the second target data obtained by the retry may be written into the distributed full-text search server unit. Because first target data that failed to be written is handled promptly, the efficiency of real-time data writing is effectively improved, the completeness of the resulting data set is ensured, and retrieval efficiency is improved.
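Operations S201 to S205 can be sketched end to end with in-memory stand-ins. This is a simplified illustration under stated assumptions, not the disclosed implementation: `FlakySearchServer`, `load_target_data`, and `generate_index_data` are hypothetical names, a `deque` stands in for the message queue retry module, and each failed write is retried only once.

```python
from collections import deque


class FlakySearchServer:
    """Hypothetical stand-in for the distributed full-text search server unit."""

    def __init__(self, fail_first_for):
        self.stored = {}
        self._fail_once = set(fail_first_for)

    def write(self, key, data):
        # Simulate a transient failure the first time certain keys are written.
        if key in self._fail_once:
            self._fail_once.discard(key)
            return False
        self.stored[key] = data
        return True


def load_target_data(config):
    # Assumption: the configuration directly lists the records to load.
    return list(config["records"])


def generate_index_data(configs, server):
    write_results = {}     # recorded writing results (S201)
    retry_queue = deque()  # stand-in for the message queue retry module

    for name, config in configs.items():
        ok = server.write(name, load_target_data(config))  # S201
        write_results[name] = ok
        if not ok:
            retry_queue.append((name, config))              # S202

    while retry_queue:
        name, config = retry_queue.popleft()
        second_target_data = load_target_data(config)       # S203
        write_results[name] = server.write(name, second_target_data)  # S204

    # S205: derive index data (token -> sorted source names) from all stored data.
    index = {}
    for name, records in server.stored.items():
        for record in records:
            for token in record.lower().split():
                index.setdefault(token, set()).add(name)
    return {token: sorted(names) for token, names in index.items()}, write_results


configs = {"a": {"records": ["alpha beta"]}, "b": {"records": ["beta gamma"]}}
server = FlakySearchServer(fail_first_for=["b"])
index_data, results = generate_index_data(configs, server)
```

In this run the first write of `b` fails, its configuration is queued, and the retry succeeds, so the final index covers the data from both `a` and `b`.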
The method shown in fig. 2 is further described below in connection with the specific examples.
According to an embodiment of the present disclosure, writing the first target data to the distributed full-text search server unit and recording the writing result, in response to receiving the first target data determined from the configuration file, may include: in response to receiving the configuration file, determining the first target data by the multi-threaded data processing unit; writing the first target data to the distributed full-text search server unit; in response to detecting that the first target data is successfully written to the distributed full-text search server unit within a preset time period, recording, by the database unit, a writing result indicating that the writing succeeded; and in response to detecting that the first target data is not successfully written to the distributed full-text search server unit within the preset time period, recording, by the database unit, a writing result indicating that the writing failed.
According to embodiments of the present disclosure, the multi-threaded data processing unit may provide multiple information processing threads. The plurality of information processing threads may process the information in the received configuration file in parallel to obtain the first target data. Each thread may obtain the first target data in the manner described above, which is not repeated here.
According to the embodiment of the disclosure, after the first target data is obtained through processing, the first target data can be written into the distributed full-text search server unit for realizing data retrieval in a subsequent retrieval process.
According to an embodiment of the present disclosure, a database table for storing the writing result may be predefined in the database unit according to the information contained in the writing result. Information related to the writing result may be recorded in this database table. To detect whether a write succeeded or failed, the records in the database table can be checked directly.
By exploiting the high concurrency of multithreading, determining the first target data with the multi-threaded data processing unit can effectively improve data processing efficiency and increase the speed at which the first target data is written to the distributed full-text search server unit. In addition, recording the writing result in the database unit makes the writing result easy to manage and check, which further improves data processing efficiency.
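A minimal sketch of the combination of parallel writes, a preset time limit, and a results table follows. It rests on several assumptions: `write_to_search_server` is a hypothetical placeholder that only sleeps, the table name `write_result` and its columns are invented for illustration, and the timeout is measured from the moment the result is awaited rather than from submission.

```python
import sqlite3
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def write_to_search_server(record_id, delay):
    # Hypothetical write: sleeps to simulate latency, then reports success.
    time.sleep(delay)
    return record_id


def write_all(records, timeout_seconds, db):
    """Write records in parallel and record success or failure per record."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS write_result "
        "(record_id TEXT PRIMARY KEY, status TEXT)"
    )
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {
            record_id: pool.submit(write_to_search_server, record_id, delay)
            for record_id, delay in records.items()
        }
        for record_id, future in futures.items():
            try:
                future.result(timeout=timeout_seconds)
                status = "success"
            except FutureTimeout:
                # Not finished within the preset time period: record a failure.
                status = "failure"
            db.execute(
                "INSERT OR REPLACE INTO write_result VALUES (?, ?)",
                (record_id, status),
            )
    db.commit()


def failed_records(db):
    """Return the ids whose recorded writing result is a failure."""
    rows = db.execute(
        "SELECT record_id FROM write_result "
        "WHERE status = 'failure' ORDER BY record_id"
    )
    return [row[0] for row in rows]
```

For example, with `db = sqlite3.connect(":memory:")`, calling `write_all({"fast": 0.0, "slow": 0.6}, 0.2, db)` records `slow` as a failure, and `failed_records(db)` then lists the records whose configuration files would be handed to the retry module.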
According to an embodiment of the present disclosure, the target profile includes a profile sub-file configured with a predetermined file name and predetermined field information, and a data sub-file related to the predetermined file name. The message queue retry module includes a message processing system unit and a consumer application unit. In response to determining that the message queue retry module receives the target configuration file, determining, based on the target configuration file, the second target data may include writing, by the message processing system unit, the configuration subfile and a data subfile associated with the configuration subfile as a task to the consumer application unit. And obtaining, by the consumer application unit, second target data from the data subfiles associated with the configuration subfiles according to the configuration subfiles.
According to an embodiment of the present disclosure, the predetermined field information configured in the configuration sub-file may be determined according to field information included in file contents corresponding to a predetermined file name. If information such as an ID (identification information), date, address, etc. is recorded in the file content corresponding to a certain predetermined file name app1_yyyyyymmdd.csd, field information corresponding to the file app1_yyyyymmdd.csd may be configured as an id|date|address, etc. At this time, all file contents having a file name of app1_yyyymmdd.csd may be included in the data subfile.
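How field information such as id|date|address might be applied to a data sub-file can be sketched as follows. The disclosure does not specify the layout of the data sub-file, so the pipe-delimited rows, the function name `parse_data_subfile`, the dictionary keys, and the sample values are all assumptions for illustration.

```python
import csv
import io


def parse_data_subfile(field_info, data_text, delimiter="|"):
    """Turn each delimited line of a data sub-file into a dict keyed by field name."""
    field_names = field_info.split("|")
    reader = csv.reader(io.StringIO(data_text), delimiter=delimiter)
    records = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) != len(field_names):
            raise ValueError(
                f"expected {len(field_names)} fields, got {len(row)}: {row}"
            )
        records.append(dict(zip(field_names, row)))
    return records


# Hypothetical configuration sub-file and made-up sample rows.
config_subfile = {"file_name": "app1_yyyymmdd.csd", "fields": "id|date|address"}
data_subfile = "1|20220101|Beijing\n2|20220102|Shanghai\n"
records = parse_data_subfile(config_subfile["fields"], data_subfile)
```

Each resulting dictionary is one candidate record of target data. Rejecting rows whose field count does not match the configured field information is a design choice of this sketch, so that malformed rows are surfaced rather than silently indexed.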
According to the embodiment of the disclosure, the message processing system unit may write, into the consumer application unit, the information of the target configuration file corresponding to the first target data that failed to be written into the distributed full-text search server unit. When there are multiple target configuration files, or when multiple configuration subfiles are configured in a target configuration file, the writing process may include writing each configuration subfile and the data subfile associated with it to the consumer application unit as one task, with the tasks arranged in a queue.
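As a rough illustration of the one-task-per-subfile queueing described above, the following sketch uses Python's in-process `queue.Queue` as a stand-in for the message processing system unit. The `RetryTask` structure, the dictionary layout of a target configuration file, and the `|` delimiter are assumptions made for the example only.

```python
import queue
from dataclasses import dataclass


@dataclass
class RetryTask:
    """One task: a configuration subfile plus its associated data subfile."""
    config_subfile: dict   # hypothetical layout, e.g. {"fields": ["id", "city"]}
    data_subfile: list     # raw lines of the associated data subfile


def enqueue_tasks(target_configs: list, task_queue: queue.Queue) -> int:
    """Write each (configuration subfile, data subfile) pair as one queued task."""
    count = 0
    for target in target_configs:
        for sub in target["subfiles"]:
            task_queue.put(RetryTask(sub["config"], sub["data"]))
            count += 1
    return count


def consume(task_queue: queue.Queue) -> list:
    """Drain the queue in FIFO order and rebuild the records to be rewritten."""
    records = []
    while not task_queue.empty():
        task = task_queue.get()
        fields = task.config_subfile["fields"]
        for line in task.data_subfile:
            # Assumption: data lines are '|'-delimited.
            records.append(dict(zip(fields, line.split("|"))))
        task_queue.task_done()
    return records
```

In a deployment based on Kafka, the queue would be replaced by a topic and the `consume` function by a consumer application; the sketch only shows the task granularity.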
According to embodiments of the present disclosure, the consumer application unit may be a unit implemented based on a Kafka (a high throughput distributed publish-subscribe messaging system) mechanism. The consumer application unit may be adapted to obtain the second target data from the data subfiles associated with the predetermined field information based on the predetermined field information.
Through the above embodiments of the present disclosure, in the case that writing of the first target data into the distributed full-text search server unit fails, the target configuration information related to the first target data is processed by the message processing system unit in combination with the consumer application unit, so as to obtain the second target data corresponding to the first target data that should originally have been written into the distributed full-text search server unit. The second target data can then be written into the distributed full-text search server unit in time, so that the integrity of the target data for retrieval is ensured, and the data retrieval efficiency is improved.
According to an embodiment of the present disclosure, the target configuration file includes at least one of access interface information, access user information, and search statement information related to a target database to be accessed. The message queue retry module includes a message processing system unit and a consumer application unit. In response to determining that the message queue retry module receives the target configuration file, determining the second target data according to the target configuration file may include: writing, by the message processing system unit, at least one of the access interface information, the access user information, and the search statement information as a task to the consumer application unit; and obtaining, by the consumer application unit, the second target data from the target database according to at least one of the access interface information, the access user information, and the search statement information.
According to an embodiment of the present disclosure, the access interface information may include at least one of the IP address information to be accessed, port information, and other information related to the access interface. The access user information may include information such as an authorized user name and password. The search statement information may be used to search for a desired data file.
According to the embodiment of the disclosure, the message processing system unit may write, into the consumer application unit, the access interface information, the access user information, and the search statement information corresponding to the first target data that failed to be written into the distributed full-text search server unit. When the access interface information, the access user information, and the search statement information are all configured, the writing process may include writing the associated access interface information, access user information, and search statement information to the consumer application unit as one task. When only the access interface information is configured, the writing process may include writing each piece of access interface information to the consumer application unit as a separate task, with the tasks arranged in a queue. When the access interface information and at least one piece of search statement information are configured, the writing process may include writing the access interface information together with each piece of search statement information related to it, in turn, to the consumer application unit as a task, with the tasks arranged in a queue.
According to the embodiments of the present disclosure, in the case where the access user information is not configured, the access user information may not be considered each time one task is written. In case of configuration with access user information, each time a task is written, the corresponding access user information may be written to the consumer application unit as part of the task.
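The task-composition rules above can be summarized in a short sketch. The function name `build_tasks`, the dictionary keys `endpoint`, `user`, and `statement`, and the representation of an access endpoint as a `host:port` string are hypothetical choices made for illustration.

```python
from typing import Optional


def build_tasks(endpoints: list, statements: Optional[dict] = None,
                users: Optional[dict] = None) -> list:
    """Return an ordered task list for the consumer application.

    endpoints:  access information entries, e.g. "10.0.0.1:3306" (assumed format)
    statements: optional mapping endpoint -> list of search statements
    users:      optional mapping endpoint -> access user information
    """
    statements = statements or {}
    users = users or {}
    tasks = []
    for endpoint in endpoints:
        base = {"endpoint": endpoint}
        if endpoint in users:              # user info is attached only if configured
            base["user"] = users[endpoint]
        related = statements.get(endpoint)
        if not related:                    # only access information: one task
            tasks.append(dict(base))
        else:                              # one task per related search statement
            for statement in related:
                tasks.append({**base, "statement": statement})
    return tasks
```

For example, one endpoint with two configured statements would yield two tasks in order, while an endpoint with no statements would yield a single task that carries only the access information.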
According to the embodiment of the disclosure, the consumer application unit may obtain the second target data from the target database according to the configuration information in each task. For example, when a task includes only access interface information, all information in the target database may be determined as the second target data. When a task includes access interface information and search statement information, the data retrieved from the target database by the search statement characterized by the search statement information may be determined as the second target data.
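A minimal sketch of the two cases above follows, using an in-memory SQLite database as a stand-in for the upstream target database. The table name `docs`, its columns, and the task keys `statement` and `params` are assumptions for the example; a production system would connect to the configured MySQL, Oracle, or SQL Server instance instead.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the target database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute("CREATE TABLE docs (id INTEGER, city TEXT)")
    conn.executemany("INSERT INTO docs VALUES (?, ?)",
                     [(1, "Beijing"), (2, "Shanghai"), (3, "Beijing")])
    return conn


def fetch_second_target_data(conn: sqlite3.Connection, task: dict) -> list:
    """Return the rows for one task.

    Without a search statement, every row is returned; with one, only the
    rows selected by that statement are returned.
    """
    statement = task.get("statement")
    if statement is None:
        statement = "SELECT * FROM docs ORDER BY id"
    rows = conn.execute(statement, task.get("params", ()))
    return [dict(row) for row in rows]
```

Passing parameters separately from the statement text, as done here, avoids building SQL by string concatenation.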
Through the embodiment of the disclosure, the second target data can be written into the distributed full-text search server unit in time, so that the integrity of the target data for retrieval is ensured, and the data retrieval efficiency is improved.
According to an embodiment of the disclosure, in response to determining that the message queue retry module receives the target configuration file, determining the second target data according to the target configuration file may include: when it is determined that the message queue retry module has received the target configuration file, and in response to receiving a modification operation on the target configuration file written into the message queue retry module, determining the second target data according to the modified target configuration file.
According to the embodiment of the disclosure, the reasons why writing the first target data into the distributed full-text search server unit fails may include at least one of: overload when the multithreaded data processing unit processes configuration files in parallel, interruption or stalling of the processing, and errors in the information configured in the configuration file.
According to the embodiment of the disclosure, under the condition that the message queue retry module receives the target configuration file, configuration information in the target configuration file can be checked and modified to ensure the accuracy of the configuration information, so that effective second target data can be obtained.
According to the embodiment of the present disclosure, the manner of determining the second target data according to the modified target configuration file may refer to the foregoing manner of determining the second target data according to the target configuration file, which is not described herein again.
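The check-modify-retry sequence can be sketched as follows. The minimal schema in `REQUIRED_KEYS`, the specific validation rules, and the function names are hypothetical; the disclosure does not specify which checks are performed.

```python
REQUIRED_KEYS = ("file_name", "fields")  # hypothetical minimal schema


def validate(config: dict) -> list:
    """Return a list of problems found in a queued target configuration."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not config.get(key)]
    fields = config.get("fields") or []
    if len(set(fields)) != len(fields):
        problems.append("duplicate field names")
    return problems


def apply_modification(config: dict, changes: dict) -> dict:
    """Return a corrected copy; the queued original is left untouched."""
    return {**config, **changes}


def second_target_data(config: dict, lines: list) -> list:
    """Re-derive records from data lines using the (modified) configuration."""
    problems = validate(config)
    if problems:
        raise ValueError("; ".join(problems))
    # Assumption: data lines are '|'-delimited.
    return [dict(zip(config["fields"], line.split("|"))) for line in lines]
```

Keeping the original configuration unmodified makes it possible to record both the faulty and the corrected versions for later diagnosis.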
Through the embodiment of the disclosure, the target configuration information is written into the message queue retry module to carry out operations such as modification and retry, so that errors can be removed in time, effective second target data can be obtained, the accuracy of obtaining the target data for retrieval is improved, the integrity of obtaining the target data for retrieval is ensured, and the data retrieval efficiency is effectively improved.
According to the embodiment of the disclosure, based on the determined configuration file, the multi-threaded data processing unit, the database unit, the distributed full-text search server unit, the message processing system unit, and the consumer application unit are combined to form the integrated intelligent search engine system with a distributed full-text search server cluster architecture.
FIG. 3 schematically illustrates a flow diagram for generating search data based on an integrated intelligent search engine system, in accordance with an embodiment of the present disclosure.
As shown in FIG. 3, the method includes operations S301-S315.
In operation S301, a configuration sub-file is uploaded.
According to an embodiment of the disclosure, a configuration file unit and a file system unit may also be provided in the integrated intelligent search engine system. The configuration file unit may record the file name and file field information supplied by the data source provider according to a predetermined rule to obtain the configuration subfile, and write the configuration subfile into the file system unit.
In operation S302, a data subfile is uploaded.
According to embodiments of the present disclosure, a data file unit may also be provided in the integrated intelligent search engine system. The data file unit may write a source file associated with a configured file name in the configuration subfile to the file system unit.
In operation S303, the configuration file is read.
According to embodiments of the present disclosure, the file system unit may scan file names and, after finding a file name in the corresponding file format, may move the source file corresponding to that file name.
In operation S304, the file list is divided.
According to embodiments of the present disclosure, for source files scanned under the same configuration file, the file system unit may move them to the same path and import them in order of date and import sequence. For source files scanned under different configuration files, the file system unit may likewise move them to different paths and import them.
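One possible way to divide the scanned file list is sketched below: files are grouped by the configuration they belong to and ordered by the date embedded in the file name. The naming convention `<config>_<yyyymmdd>.<ext>` and the `import/<config>` target paths are assumptions for the example.

```python
import re
from collections import defaultdict

# Assumed naming convention: <config>_<yyyymmdd>.<extension>
NAME_RE = re.compile(r"^(?P<config>[A-Za-z0-9]+)_(?P<date>\d{8})\.\w+$")


def divide_file_list(file_names: list) -> dict:
    """Group scanned files by configuration and sort each group by date."""
    groups = defaultdict(list)
    for name in file_names:
        match = NAME_RE.match(name)
        if match is None:
            continue  # not in a configured file-name format; skipped
        groups[match.group("config")].append((match.group("date"), name))
    # One target path per configuration; files within it are in date order.
    return {f"import/{config}": [name for _, name in sorted(items)]
            for config, items in groups.items()}
```

Each resulting path can then be handed to a separate worker so that files under different configurations are imported independently.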
In operation S305, the upstream database access information is uploaded.
According to an embodiment of the present disclosure, a first database unit for storing access information provided by a data source provider may also be provided in the integrated intelligent search engine system. The first database unit may construct a database connection pool by connecting to the databases provided by the data source provider, such as MySQL, Oracle, and SQL Server, so as to write the relevant access information to the multi-threaded data processing unit. The access information may include, for example, at least one of the database IP address information, access port information, instance name information, user name and password information, and SQL statement information of the production environment.
It should be noted that operations S301 to S304 may implement one method of importing data for retrieval in the form of file collection, and operation S305 may implement another method of importing data for retrieval in the form of directly accessing a database.
In operation S306, data is read and processed.
According to an embodiment of the present disclosure, the divided files obtained in operation S304 and the access information obtained in operation S305 may be written to the multithreaded data processing unit, and the multithreaded data processing unit may process the divided files and the received access information.
In operation S307, the multithreading information is recorded.
According to an embodiment of the present disclosure, a second database unit for recording processing information may be further provided in the integrated intelligent search engine system. The file dividing result to be processed and the related information based on the multithreading data processing unit when the divided file is processed can be recorded in the second database unit, so that information tracking is facilitated.
In operation S308, the target cluster is written.
According to embodiments of the present disclosure, the multithreaded data processing unit may import the divided files into the target cluster of the distributed full-text search server unit in the field format configured in the configuration file.
According to an embodiment of the disclosure, the multithreaded data processing unit may also read the access information, obtain the access result, and write the access result into the target cluster of the distributed full-text search server unit.
In operation S309, it is determined whether the data is successfully written. If yes, operations S311 to S315 are performed.
In operation S310, the data that failed to be written is re-consumed.
According to embodiments of the present disclosure, the exception data may be re-queued for processing by the message processing system unit and the consumer application unit based on the Kafka mechanism.
In operation S311, the backup cluster is written.
According to embodiments of the present disclosure, the data obtained by the re-consumption may be written again to the backup cluster of the distributed full-text search server unit.
In operation S312, the writing result is recorded.
According to the embodiment of the disclosure, the second database unit can also record the information of success and failure of writing when the data is written into the distributed full-text search server unit, thereby being beneficial to tracking detection and timely diagnosing errors.
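The write, check, re-consume, and record steps can be illustrated with a short sketch. The `deque` stands in for the message queue, the `result_log` list stands in for the second database unit, and `demo_write` is a hypothetical placeholder for the real cluster write call.

```python
from collections import deque


def demo_write(payload):
    """Stand-in for the cluster write call; rejects empty payloads."""
    if not payload:
        raise ValueError("empty payload rejected")


def run_write_step(batches, write, retry_queue: deque, result_log: list) -> None:
    """Try each batch once, record the outcome, and queue failures for retry."""
    for batch_id, payload in batches:
        try:
            write(payload)
        except Exception as exc:           # any write error counts as a failure
            retry_queue.append(batch_id)   # to be re-consumed later
            result_log.append({"batch": batch_id, "status": "failed",
                               "reason": str(exc)})
        else:
            result_log.append({"batch": batch_id, "status": "success",
                               "reason": ""})
```

Recording the failure reason alongside the status is what makes later tracking and diagnosis possible.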
In operation S313, a sharded index is constructed.
According to embodiments of the present disclosure, after the distributed full-text search server unit receives the data for retrieval, the list of source files may be allocated by an algorithm, in combination with a sharding technique, to construct index data. The index may be constructed in Bulk mode, which is a high-performance data loading mode. The number of shards may be allocated according to the amount of data to be indexed.
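As an illustration of Bulk-mode loading and volume-based shard allocation, the sketch below builds a newline-delimited JSON request body in the action-line-then-document-line layout used by the Elasticsearch Bulk API, and picks a shard count from the data volume. The 30 GiB target shard size, the cap of 64 shards, and the assumption that each document carries an `id` field are illustrative values only, not figures from the disclosure.

```python
import json
import math


def shard_count(total_bytes: int, target_shard_bytes: int = 30 * 1024**3,
                max_shards: int = 64) -> int:
    """Pick a shard count from the data volume (illustrative heuristic only)."""
    return max(1, min(max_shards, math.ceil(total_bytes / target_shard_bytes)))


def bulk_body(index: str, docs: list) -> str:
    """Build a newline-delimited JSON body of index actions for a bulk request."""
    lines = []
    for doc in docs:
        # Action line naming the target index and document ID.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc["id"])}}))
        # Source line with the document itself.
        lines.append(json.dumps(doc, ensure_ascii=False))
    return "\n".join(lines) + "\n"  # the body must end with a newline
```

Sending many documents in one request in this way reduces per-request overhead compared with indexing documents one at a time.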
In operation S314, the processed file is moved to the temporary directory.
According to embodiments of the present disclosure, the processed file may include a file with an index built, and after the processed file is moved into the temporary directory, the original file may be deleted, or the processed file may be stored on a backup distributed full text search server.
In operation S315, the writing result is recorded.
According to the embodiment of the disclosure, the statistical result of the data processing of the current crawl may be written into the second database unit, so that the data obtained by each crawl can be conveniently compared.
Through the embodiment of the disclosure, the first target data with the writing failure can be processed in time, so that the efficiency of data real-time writing can be effectively improved, the technical problem of low retrieval efficiency is at least partially solved, and the technical effect of improving the retrieval efficiency is further achieved.
Fig. 4 schematically shows a flow chart of an information retrieval method according to an embodiment of the present disclosure.
As shown in FIG. 4, the method includes operations S401-S402.
In operation S401, a target search term is acquired.
In operation S402, the target search term is retrieved based on the index data, and a retrieval result is obtained.
According to embodiments of the present disclosure, the target search term may include one or more related words determined from the search term entered in the search box. The related words may include at least one of: related words whose similarity to the text content of the entered search term is greater than a first threshold, related words whose similarity to the semantic information of the entered search term is greater than a second threshold, and related words determined in other ways.
According to an embodiment of the present disclosure, the index data is all data determined according to the aforementioned index data generation method. The search results may include results related to all of the related words. The search results may be displayed in descending order of their degree of relevance to the search term entered in the search box.
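The ordering of results described above can be sketched as follows. The hit structure with `id` and `score` keys is hypothetical, and keeping only the best-scoring hit per document when several related words match the same document is an assumption added for the example.

```python
def rank_results(hits: list) -> list:
    """Merge hits for all related words and sort them by descending relevance.

    Each hit is assumed to be a dict with 'id' and 'score' keys. If the same
    document is hit by several related words, only its best score is kept.
    """
    best = {}
    for hit in hits:
        doc_id = hit["id"]
        if doc_id not in best or hit["score"] > best[doc_id]["score"]:
            best[doc_id] = hit
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)
```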
Through the embodiment of the disclosure, the search data is written into the full text search server unit in real time and distributed deployment is performed, so that the practicability of the search function can be effectively improved, the search efficiency and the search integrity are improved, and the user experience is improved.
According to embodiments of the present disclosure, obtaining the target search term may include: obtaining an initial search term; and processing the initial search term by using a preset word segmentation device to obtain at least one target search term whose similarity to the initial search term is greater than a preset threshold.
According to embodiments of the present disclosure, the initial search term may be determined from the search term entered into the search box. The preset word segmentation device can comprise a word segmentation device which can segment words based on various word segmentation modes and has a vocabulary expansion function. When vocabulary extension is performed, the extensible vocabulary may be related to actual business requirements.
According to the embodiment of the disclosure, based on the preset word segmentation device, after the initial search word is processed, one or more target search words can be obtained for searching to obtain a search result related to the initial search word.
Through the embodiment of the disclosure, the query for the initial search term is completed by segmenting the initial search term with the multifunctional preset word segmentation device. This takes into account the word semantics of the initial search term as well as cases in which the user enters too little or too much, so that the retrieval accuracy and the retrieval efficiency can be effectively improved.
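The threshold-based selection of target search terms can be illustrated with a minimal sketch. It does not implement word segmentation; it only shows the filtering step, using the character-level ratio from Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure. The vocabulary, the threshold of 0.6, and the function name `expand_terms` are assumptions.

```python
from difflib import SequenceMatcher


def expand_terms(initial: str, vocabulary: list, threshold: float = 0.6) -> list:
    """Return vocabulary words whose similarity to the initial term exceeds
    the threshold, most similar first (ties broken alphabetically)."""
    scored = []
    for word in vocabulary:
        ratio = SequenceMatcher(None, initial.lower(), word.lower()).ratio()
        if ratio > threshold:
            scored.append((ratio, word))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored]
```

With the vocabulary `["index", "indexes", "indexing", "queue"]` and the initial term `index`, the first three words pass the threshold and `queue` is discarded. A semantic similarity measure could be substituted for the character-level ratio without changing the structure.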
Fig. 5 schematically illustrates a block diagram of an integrated intelligent search engine system with index data generation and information retrieval functionality in accordance with an embodiment of the present disclosure.
As shown in fig. 5, the integrated intelligent search engine system 500 includes a configuration file unit 501, a data file unit 502, a file system unit 503, a first database unit 504, a multi-threaded data processing unit 505, a second database unit 506, a full text search server unit 507, an online service interface unit 508, a message processing system unit 509, and a consumer application unit 510.
According to embodiments of the present disclosure, the search engine may include a search engine implemented based on the Elasticsearch framework. In one aspect, through the configuration file unit 501, the data file unit 502, and the file system unit 503, data or files to be imported may be imported into the file system unit 503 with the aid of configuration subfiles in a prescribed format. After the data subfiles and configuration subfiles are imported into the file system unit 503, they may be processed by the multi-threaded data processing unit 505. In another aspect, after the information input of the first database unit 504 storing the access information is completed, the multi-threaded data processing unit 505 may directly crawl the data from the database into the full text search server unit 507 through SQL statements, and the result information after crawling may be recorded in the second database unit 506. The full-text search server unit 507 may build an index based on the received data to implement information query.
According to an embodiment of the present disclosure, in the case that writing data to the full text search server unit 507 fails, the multi-threaded data processing unit 505 may import the corresponding input into the message processing system unit 509, and the data may then be rewritten to the full text search server unit 507 by the consumer application unit 510 through the retry mechanism provided by Kafka.
According to an embodiment of the present disclosure, the online service interface unit 508 may be called through an API layer when an information query is performed, and the online service interface unit 508 may provide an input box for entering a search term. After the search term is received through the online service interface unit 508, the search result may be obtained by querying the full-text search server unit 507 in combination with the processing of the message processing system unit 509.
According to the embodiment of the disclosure, the predetermined word segmentation device can be combined in the query process, so that the accuracy and the completeness of the query result are improved.
According to embodiments of the present disclosure, the integrated intelligent search engine performs full-text search through a distributed system, and its nodes may belong to multiple regions in different geographic locations.
Through the embodiment of the disclosure, the obtained full-text search engine combines the Elasticsearch and Kafka frameworks and, through reasonable distributed deployment, real-time data reading and writing, and similar methods, completes high-similarity matching based on a suitable Chinese word segmentation device. As a result, after keywords are entered, the most suitable results can be matched from millions of records within milliseconds, which improves the real-time performance, accuracy, and computational efficiency of the search engine and improves the user experience. In addition, the integrated intelligent search engine can achieve a high-performance, highly available, load-balanced cluster architecture through F5 load balancing.
Fig. 6 schematically shows a block diagram of an index data generating device according to an embodiment of the present disclosure.
As shown in fig. 6, the index data generating device 600 includes a first writing module 610, a second writing module 620, a first determining module 630, a third writing module 640, and a second determining module 650.
The first writing module 610 is configured to write the first target data into the distributed full-text search server unit in response to receiving the first target data determined according to the configuration file, and record a writing result.
A second writing module 620, configured to, in response to detecting a target writing result indicating that the process of writing the first target data into the distributed full-text search server unit fails, write a target configuration file corresponding to the target writing result into the message queue retry module.
The first determining module 630 is configured to determine, in response to determining that the message queue retry module receives the target configuration file, second target data according to the target configuration file.
A third writing module 640 for writing the second target data to the distributed full text search server unit.
The second determining module 650 is configured to determine index data according to the first target data and the second target data written in the distributed full-text search server unit.
According to an embodiment of the present disclosure, the target configuration file includes a configuration subfile configured with a predetermined file name and predetermined field information, and a data subfile related to the predetermined file name. The message queue retry module includes a message processing system unit and a consumer application unit. The first determination module includes a first writing unit and a first obtaining unit.
The first writing unit is used for writing the configuration subfile and the data subfile related to the configuration subfile into the consumer application unit as a task through the message processing system unit.
The first obtaining unit is used for obtaining second target data from the data subfiles related to the configuration subfiles through the consumer application unit according to the configuration subfiles.
According to an embodiment of the present disclosure, the target configuration file includes at least one of access interface information, access user information, and search statement information related to a target database to be accessed. The message queue retry module includes a message processing system unit and a consumer application unit. The first determination module includes a second writing unit and a second obtaining unit.
The second writing unit is used for writing at least one of the access interface information, the access user information and the search statement information into the consumer application unit as a task through the message processing system unit.
The second obtaining unit is used for obtaining second target data from the target database through the consumer application unit according to at least one of the access interface information, the access user information and the search statement information.
According to an embodiment of the disclosure, the first determination module comprises a first determination unit.
The first determining unit is used for determining second target data according to the modified target configuration file when the message queue retry module has received the target configuration file and in response to receiving a modification operation on the target configuration file written into the message queue retry module.
According to an embodiment of the present disclosure, the first writing module includes a second determining unit, a third writing unit, a first recording unit, and a second recording unit.
The second determining unit is used for determining the first target data through the multithreaded data processing unit in response to receiving the configuration file.
The third writing unit is used for writing the first target data into the distributed full-text search server unit.
The first recording unit is used for recording, through the database unit, a writing result indicating that the process of writing the first target data into the distributed full-text search server unit succeeded, in response to detecting within a preset time period that the first target data was successfully written into the distributed full-text search server unit.
The second recording unit is used for recording, through the database unit, a writing result indicating that the process of writing the first target data into the distributed full-text search server unit failed, in response to detecting that the first target data was not successfully written into the distributed full-text search server unit within the preset time period.
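The interplay of multithreaded writing and deadline-based result recording can be sketched with Python's standard `concurrent.futures` module. The function `demo_write`, the batch layout, and the worker count of 4 are hypothetical; the returned dictionary stands in for the writing results recorded through the database unit.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def demo_write(data: dict) -> None:
    """Stand-in for the real write call; sleeps to simulate a slow cluster."""
    time.sleep(data["delay"])
    if data.get("error"):
        raise RuntimeError("rejected")


def write_with_deadline(write, batches: dict, timeout_s: float) -> dict:
    """Run writes on worker threads and mark each batch by a deadline."""
    results = {}
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {name: pool.submit(write, data) for name, data in batches.items()}
        for name, future in futures.items():
            try:
                # The timeout is measured from when this loop starts waiting
                # on the future, not from the moment the batch was submitted.
                future.result(timeout=timeout_s)
                results[name] = "success"
            except FutureTimeout:
                results[name] = "failed"   # not confirmed within the window
            except Exception:
                results[name] = "failed"   # the write itself raised
    return results
```

A batch marked as failed here would be the trigger for writing its target configuration file into the message queue retry module.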
Fig. 7 schematically shows a block diagram of an information retrieval apparatus according to an embodiment of the present disclosure.
As shown in fig. 7, the information retrieval apparatus 700 includes an acquisition module 710 and a retrieval module 720.
The acquisition module 710 is configured to acquire the target search term.
The retrieval module 720 is configured to retrieve the target search term based on the index data to obtain a retrieval result, wherein the index data is determined according to the index data generation method of the present disclosure.
According to an embodiment of the present disclosure, an acquisition module includes an acquisition unit and a processing unit.
The acquisition unit is used for acquiring the initial search term.
The processing unit is used for processing the initial search word by utilizing the preset word segmentation device to obtain at least one target search word with the similarity with the initial search word being larger than a preset threshold value.
According to embodiments of the present disclosure, any number of the modules and units, or at least part of the functionality of any number of them, may be implemented in one module. Any one or more of the modules and units according to embodiments of the present disclosure may be split into multiple modules for implementation. Any one or more of the modules and units according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, or an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging circuits, or in any one of the three implementation manners of software, hardware, and firmware, or in any suitable combination thereof. Alternatively, one or more of the modules and units according to embodiments of the present disclosure may be at least partially implemented as computer program modules which, when executed, may perform the corresponding functions.
For example, any number of the first writing module 610, the second writing module 620, the first determining module 630, the third writing module 640, and the second determining module 650, or of the acquisition module 710 and the retrieval module 720, may be combined and implemented in one module/unit, or any one of these modules/units may be split into a plurality of modules/units. Alternatively, at least part of the functionality of one or more of these modules/units may be combined with at least part of the functionality of other modules/units and implemented in one module/unit. According to embodiments of the present disclosure, at least one of the first writing module 610, the second writing module 620, the first determining module 630, the third writing module 640, and the second determining module 650, or of the acquisition module 710 and the retrieval module 720, may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, or an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging circuits, or in any one of the three implementation manners of software, hardware, and firmware, or in any suitable combination thereof. Alternatively, at least one of the first writing module 610, the second writing module 620, the first determining module 630, the third writing module 640, and the second determining module 650, or of the acquisition module 710 and the retrieval module 720, may be at least partially implemented as a computer program module which, when executed, may perform the corresponding functions.
It should be noted that, in the embodiment of the present disclosure, the index data generating device portion corresponds to the index data generating method portion in the embodiment of the present disclosure, and the description of the index data generating device portion specifically refers to the index data generating method portion and is not described herein.
It should be noted that, in the embodiment of the present disclosure, the information retrieving apparatus portion corresponds to the information retrieving method portion in the embodiment of the present disclosure, and the description of the information retrieving apparatus portion specifically refers to the information retrieving method portion and is not described herein.
Fig. 8 schematically illustrates a block diagram of a computer system suitable for implementing the above-described methods, according to an embodiment of the present disclosure. The computer system illustrated in fig. 8 is merely an example, and should not be construed as limiting the functionality and scope of use of the embodiments of the present disclosure.
As shown in fig. 8, a computer system 800 according to an embodiment of the present disclosure includes a processor 801 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. The processor 801 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 801 may also include on-board memory for caching purposes. The processor 801 may include a single processing unit or multiple processing units for performing the different actions of the method flows according to embodiments of the disclosure.
In the RAM 803, various programs and data required for the operation of the system 800 are stored. The processor 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. The processor 801 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 802 and/or the RAM 803. Note that the program may be stored in one or more memories other than the ROM 802 and the RAM 803. The processor 801 may also perform various operations of the method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the system 800 may further include an input/output (I/O) interface 805, which is also connected to the bus 804. The system 800 may also include one or more of the following components connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card or a modem. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as needed, so that a computer program read from it can be installed into the storage section 808 as needed.
According to embodiments of the present disclosure, the method flow may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer-readable storage medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 809, and/or installed from the removable medium 811. The above-described functions defined in the system of the embodiments of the present disclosure are performed when the computer program is executed by the processor 801. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
The present disclosure also provides a computer-readable storage medium that may be included in the apparatus/device/system described in the above embodiments, or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium. Examples include, but are not limited to, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
For example, according to embodiments of the present disclosure, the computer-readable storage medium may include ROM 802 and/or RAM 803 and/or one or more memories other than ROM 802 and RAM 803 described above.
Embodiments of the present disclosure also include a computer program product comprising a computer program, the computer program comprising program code for performing the methods provided by the embodiments of the present disclosure. When the computer program product is run on an electronic device, the program code causes the electronic device to implement the index data generation method and the information retrieval method provided by the embodiments of the present disclosure.
The above-described functions defined in the system/apparatus of the embodiments of the present disclosure are performed when the computer program is executed by the processor 801. The systems, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed in the form of a signal over a network medium, downloaded and installed via the communication section 809, and/or installed from the removable medium 811. The computer program may comprise program code that is transmitted using any appropriate network medium, including but not limited to wireless, wireline, etc., or any suitable combination of the preceding.
According to embodiments of the present disclosure, program code for performing the computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages. In particular, such computer programs may be implemented in high-level procedural and/or object-oriented programming languages, and/or in assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, the "C" language, or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or it may be connected to an external computing device (e.g., via the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or by combinations of special purpose hardware and computer instructions. Those skilled in the art will appreciate that the features recited in the various embodiments of the disclosure and/or in the claims may be combined and/or integrated in various ways, even if such combinations or integrations are not explicitly recited in the disclosure. In particular, the features recited in the various embodiments of the present disclosure and/or the claims may be combined and/or integrated in various ways without departing from the spirit and teachings of the present disclosure. All such combinations and/or integrations fall within the scope of the present disclosure.
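The remark that two flowchart blocks shown in succession may run concurrently or in reverse order holds when the blocks do not depend on each other's results. The following Python sketch illustrates that case under stated assumptions: `block_a`, `block_b`, and the sample record are hypothetical and do not correspond to any specific step of the disclosed methods.

```python
from concurrent.futures import ThreadPoolExecutor


# Two hypothetical, mutually independent flowchart blocks: neither reads
# what the other produces, so their relative order does not matter.
def block_a(record):
    return {"id": record["id"]}


def block_b(record):
    return {"length": len(record["text"])}


def run_in_order(record, blocks):
    """Execute the blocks one after another in the given order."""
    out = {}
    for block in blocks:
        out.update(block(record))
    return out


def run_concurrently(record, blocks):
    """Execute the blocks on a thread pool and merge their outputs."""
    out = {}
    with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
        for partial in pool.map(lambda block: block(record), blocks):
            out.update(partial)
    return out


record = {"id": 7, "text": "index data"}
sequential = run_in_order(record, [block_a, block_b])
reversed_order = run_in_order(record, [block_b, block_a])
parallel = run_concurrently(record, [block_a, block_b])
```

All three runs yield the same merged result here because the blocks write disjoint keys. Blocks with a data dependency between them would have to keep the order shown in the figures.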
The embodiments of the present disclosure are described above. These examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the disclosure, and such alternatives and modifications are intended to fall within the scope of the disclosure.
Claims (11)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210531570.8A CN115080514B (en) | 2022-05-16 | 2022-05-16 | Index data generation method, information retrieval method, device and computer system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210531570.8A CN115080514B (en) | 2022-05-16 | 2022-05-16 | Index data generation method, information retrieval method, device and computer system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115080514A (en) | 2022-09-20 |
CN115080514B (en) | 2025-06-20 |
Family
ID=83247805
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210531570.8A Active CN115080514B (en) | 2022-05-16 | 2022-05-16 | Index data generation method, information retrieval method, device and computer system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115080514B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115510064A (en) * | 2022-09-27 | 2022-12-23 | 杭州安恒信息技术股份有限公司 | ES alarm data backfill method, device, equipment and medium |
CN116414830A (en) * | 2022-12-19 | 2023-07-11 | 阿里巴巴(中国)有限公司 | Index construction method and device |
CN116492690A (en) * | 2023-04-13 | 2023-07-28 | 广州炫动信息科技有限公司 | Game data processing method, system and storage medium |
CN116431523B (en) * | 2023-06-12 | 2023-08-29 | 建信金融科技有限责任公司 | Test data management method, device, equipment and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110019080A (en) * | 2017-07-14 | 2019-07-16 | 北京京东尚科信息技术有限公司 | Data access method and device |
CN114490735A (en) * | 2021-12-20 | 2022-05-13 | 中盈优创资讯科技有限公司 | Method and device for constructing distributed OLAP data analysis based on MPP and full-text index |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105468720A (en) * | 2015-11-20 | 2016-04-06 | 北京锐安科技有限公司 | Method for integrating distributed data processing systems, corresponding systems and data processing method |
CN109167672B (en) * | 2018-07-13 | 2020-07-10 | 腾讯科技(深圳)有限公司 | Return source error positioning method, device, storage medium and system |
CN111198769A (en) * | 2018-11-16 | 2020-05-26 | 北京京东金融科技控股有限公司 | Information processing method and system, computer system and computer readable medium |
CN112463800A (en) * | 2020-12-11 | 2021-03-09 | 微医云(杭州)控股有限公司 | Data reading method and device, server and storage medium |
CN113312539B (en) * | 2021-06-10 | 2024-01-12 | 北京百度网讯科技有限公司 | A method, device, equipment and medium for providing retrieval services |
Also Published As
Publication number | Publication date |
---|---|
CN115080514A (en) | 2022-09-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115080514B (en) | Index data generation method, information retrieval method, device and computer system | |
US9244991B2 (en) | Uniform search, navigation and combination of heterogeneous data | |
US9208219B2 (en) | Similar document detection and electronic discovery | |
CN110162522B (en) | Distributed data search system and method | |
US20170212930A1 (en) | Hybrid architecture for processing graph-based queries | |
US12277105B2 (en) | Methods and systems for improved search for data loss prevention | |
CN103827852B (en) | Aggregate WEB Pages on Search Engine Results Pages | |
CN111782824A (en) | Information query method, device, system and medium | |
US20140006369A1 (en) | Processing structured and unstructured data | |
US11567906B2 (en) | Generation and traversal of a hierarchical index structure for efficient data retrieval | |
CN111435376A (en) | Information processing method and system, computer system, and computer-readable storage medium | |
CN116594683A (en) | Code annotation information generation method, device, equipment and storage medium | |
CN112052259A (en) | Data processing method, apparatus, equipment and computer storage medium | |
CN117874082A (en) | Method for searching associated dictionary data and related components | |
US20130346405A1 (en) | Systems and methods for managing data items using structured tags | |
CN119248799B (en) | Database multi-transaction processing method, device, equipment and storage medium | |
CN110399431A (en) | A kind of incidence relation construction method, device and equipment | |
US9286349B2 (en) | Dynamic search system | |
CN116628042A (en) | Data processing method, device, equipment and medium | |
CN114579573A (en) | Information retrieval method, information retrieval device, electronic equipment and storage medium | |
US20250245236A1 (en) | Semantic searching of structured data using generated summaries | |
CN109710673B (en) | Work processing method, device, equipment and medium | |
US20190012360A1 (en) | Searching and tagging media storage with a knowledge database | |
CN108256096B (en) | Data processing method and device | |
CN116483954A (en) | Data processing method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||