Retrieval method and retrieval system
Technical Field
The present invention relates to computer communication technologies, and in particular, to a retrieval method and a retrieval system.
Background
With the development of computer communication technology and internet technology, the information resources of the internet have increased exponentially, and retrieving and acquiring related information through the internet has become an indispensable part of people's life and work. How to effectively retrieve and acquire the information required by a user has become a problem to be solved urgently. Two retrieval methods commonly applied at a search retrieval end are briefly described below.
The first retrieval method is the long connection packet & unpacker of the Tencent so, which belongs to a lightweight network protocol framework. The long connection packet & unpacker implements retrieval by using < key, value > pairs: the client and the retrieval server preset a key and a key value corresponding to each retrieval field. For example, when the client sends a retrieval request packet to the retrieval server, assume that the retrieval condition includes three parameters whose keys are A, B and C respectively. The key values corresponding to the three parameters, each a 32-bit integer value preset by the client and the retrieval server, are keyA, keyB and keyC. According to the retrieval request packet format negotiated in advance with the retrieval server, the retrieval request packet transmitted to the retrieval server may then have the following format:
packet header (32bit) + < A, keyA > + < B, keyB > + < C, keyC >
After receiving the retrieval request packet, the retrieval server first parses the packet header information and verifies the legality and the length of the retrieval request packet according to the parsed packet header information. Next, it matches the received retrieval request packet against the retrieval request packet format negotiated with the client in advance; if the matching is unsuccessful, the retrieval request packet is discarded, and if the matching is successful, the key values (keyA, keyB and keyC) corresponding to the three parameters are taken out in sequence. Finally, retrieval is carried out according to the key values to obtain the content of the retrieval request.
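The strict matching described above can be illustrated with a minimal sketch. The byte layout (a 32-bit total length as the header, followed by pairs of 32-bit integers), the names NEGOTIATED_KEYS, pack_request and parse_request_strict, and the numeric key IDs are all assumptions made for illustration; the actual long connection packet & unpacker format is not given in this document.

```python
import struct

# Hypothetical pre-negotiated format: the exact ordered key IDs both sides expect.
NEGOTIATED_KEYS = (1, 2, 3)  # stand-ins for the keys A, B, C


def pack_request(pairs):
    """Build a 32-bit length header followed by <key, value> pairs (network byte order)."""
    body = b"".join(struct.pack("!II", key, value) for key, value in pairs)
    return struct.pack("!I", 4 + len(body)) + body


def parse_request_strict(packet):
    """Return {key: value}, or None when the packet is discarded."""
    if len(packet) < 4:
        return None
    (declared_length,) = struct.unpack_from("!I", packet, 0)
    # Header check: the declared length must match, and the body must hold whole pairs.
    if declared_length != len(packet) or (len(packet) - 4) % 8 != 0:
        return None
    pairs = [struct.unpack_from("!II", packet, off) for off in range(4, len(packet), 8)]
    # Strict matching: any deviation from the negotiated format discards the packet.
    if tuple(key for key, _ in pairs) != NEGOTIATED_KEYS:
        return None
    return dict(pairs)
```

In this sketch a request that carries one additional pair is rejected as a whole, which is the behaviour the following paragraphs identify as a drawback.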
The second retrieval method is Google's Protocol Buffers (protobuf), which also belongs to a lightweight network protocol framework. It is a language-independent, platform-independent and extensible method of serializing structured data, provided by Google for communication protocols and data storage. It is binary-based, supports multiple languages (C++, Java, Python), and supports serialized transmission of structured data. In this method, the protobuf software is downloaded and installed to serialize the retrieval fields, a serializable data structure is defined through an adaptation file (jce document), and a Reader and a Writer are defined for each data structure. Both communication sides (the client and the retrieval server) share the same Reader and Writer. The retrieval server performs matching according to the shared Reader, discards the data structure if the matching is unsuccessful, and retrieves according to the serialized retrieval fields if the matching is successful, to obtain the content of the retrieval request.
As can be seen from the above, both existing retrieval methods directly discard data structures that fail to match. For example, if the client adds a serialized retrieval field to the data structure, matching at the retrieval server fails and the retrieval request packet is discarded, so that the retrieval cannot be performed and the retrieval efficiency is low. Furthermore, the long connection packet & unpacker relies on < key, value > pairs in which both the key and the key value must be defined in advance, so a great number of key value pairs arise, and the client and the retrieval server must synchronize the key value pairs corresponding to the protocol fields each time a protocol field is added or deleted. This is not beneficial to version release, especially in cross-department cooperation, and makes the universality and expandability of retrieval low. Moreover, defining the key values causes the data definitions to expand sharply, which causes great trouble for maintenance.
Disclosure of Invention
In view of the above, the present invention is directed to a retrieval method, which improves the retrieval efficiency.
Another objective of the present invention is to provide a retrieval system, which improves the retrieval efficiency.
In order to achieve the above objects, the present invention provides a retrieval method, including:
analyzing a retrieval request packet from a client to obtain packet header content and packet body content, wherein the packet header content is a check value generated by the client according to a pre-negotiated check strategy, and the packet body content is obtained by performing serialized data structure processing on a key value corresponding to a retrieval field input by a user according to a serialized data structure defined by a preset adaptation file by the client;
regenerating a check value according to the check strategy, and judging whether the regenerated check value is the same as the check value carried in the packet header content;
and if the regenerated check value is the same as the check value carried in the packet header content, matching the packet body content with the defined serialized data structure, acquiring a key value corresponding to the successfully matched retrieval field, and retrieving.
The check strategy comprises at least one of the following: cyclic redundancy check (CRC) combined with packet header validity check, Message Digest Algorithm 5 (MD5) check, and Hamming check.
The check strategy is Cyclic Redundancy Check (CRC) combined with packet header validity check, and the method further comprises the following steps:
according to the pre-negotiated encryption field, carrying out CRC encoding on a random number to generate a CRC code table;
acquiring a length information value of a retrieval request packet;
and taking the encryption field, the random number, the CRC code table and the retrieval request packet length information value as check values.
The regenerating a check value according to the check strategy, and judging whether the regenerated check value is the same as the check value carried in the packet header content includes:
A, judging whether the retrieval request packet length information value carried in the parsed packet header content is not greater than a preset retrieval request packet length threshold; if so, executing step B, and if not, judging that the regenerated check value is different from the check value carried in the packet header content;
B, performing CRC coding according to the encryption field and the random number carried in the parsed packet header content to generate a CRC code table;
C, judging whether the generated CRC code table is the same as the CRC code table carried in the packet header content; if so, judging that the regenerated check value is the same as the check value carried in the packet header content, and if not, judging that the regenerated check value is different from the check value carried in the packet header content.
A retrieval system, the retrieval system comprising: a client and a retrieval server, wherein,
the client is used for receiving a retrieval field input by a user, acquiring a key value corresponding to the retrieval field, performing serialized data structure processing on the key value according to a serialized data structure defined by a preset adaptation file, acquiring the packet body content of a retrieval request packet, generating a check value as the packet head content of the retrieval request packet according to a pre-negotiated check strategy, and sending the retrieval request packet to a retrieval server;
the retrieval server is used for receiving the retrieval request packet, parsing it to acquire packet header content and packet body content, regenerating a check value according to the check strategy, and judging whether the regenerated check value is the same as the check value carried in the packet header content; and if the regenerated check value is the same as the check value carried in the packet header content, matching the packet body content with the defined serialized data structure, acquiring a key value corresponding to the successfully matched retrieval field, and retrieving.
The client comprises: a data structure processing unit, a check value generating unit, and an encapsulating unit, wherein,
the data structure processing unit is used for receiving a retrieval field input by a user, acquiring a key value corresponding to the retrieval field, and performing serialized data structure processing on the key value according to a serialized data structure defined by a preset adaptation file to acquire the packet body content of a retrieval request packet;
a check value generating unit, configured to generate a check value as a packet header content of the search request packet according to a pre-negotiated check policy;
and the encapsulating unit is used for encapsulating the packet head content and the packet body content, generating a retrieval request packet and sending the retrieval request packet to the retrieval server.
The check value generation unit includes: a CRC check value generating sub-unit and a retrieval request packet length information value acquiring sub-unit, wherein,
the CRC check value generation subunit is used for performing CRC coding on the pre-negotiated encryption field and a random number to generate a CRC code table, and taking the encryption field, the random number, the CRC code table and the retrieval request packet length information value acquired by the retrieval request packet length information value acquisition subunit as the packet header content of the retrieval request packet.
The retrieval server comprises an analysis unit, a check value checking unit, a matching unit and a retrieval unit, wherein,
the analysis unit is used for receiving the retrieval request packet and analyzing and acquiring packet header contents;
a check value checking unit, configured to regenerate the check value according to the check policy, and determine whether the regenerated check value is the same as the check value carried in the packet header content; if the regenerated check value is the same as the check value carried in the packet header content, informing an analysis unit to analyze the packet body content;
the matching unit is used for matching the packet body content with the defined serialized data structure to obtain a key value corresponding to the successfully matched retrieval field;
and the retrieval unit is used for retrieving according to the key value corresponding to the successfully matched retrieval field to obtain a retrieval result.
The check value checking unit includes: a CRC check value check subunit and a retrieve request packet length check subunit, wherein,
a search request packet length checking subunit, configured to check a search request packet length information value carried in the packet header content obtained through the analysis, and notify a CRC check value checking subunit if the search request packet length information value is not greater than a preset search request packet length threshold;
and the CRC check value checking subunit is used for performing CRC coding according to the encryption field and the random number carried in the parsed packet header content to generate a CRC code table, and, if the generated CRC code table is the same as the CRC code table carried in the packet header content, informing the analysis unit to parse the packet body content.
As can be seen from the foregoing technical solutions, in the retrieval method and the retrieval system provided in the embodiments of the present invention, a retrieval request packet from a client is parsed to obtain packet header content and packet body content, where the packet header content is a check value generated by the client according to a pre-negotiated check strategy, and the packet body content is obtained by the client performing serialized data structure processing on a key value corresponding to a retrieval field input by a user according to a serialized data structure defined by a preset adaptation file; a check value is regenerated according to the check strategy, and it is judged whether the regenerated check value is the same as the check value carried in the packet header content; and if the regenerated check value is the same as the check value carried in the packet header content, the packet body content is matched with the defined serialized data structure, the key value corresponding to each successfully matched retrieval field is acquired, and retrieval is performed. Therefore, by checking the packet header content, access by invalid connections such as illegal requests and port scanning tools can be effectively prevented; and by matching each retrieval field contained in the packet body content of the retrieval request packet, ignoring the retrieval fields that are not successfully matched, and retrieving according to the key values corresponding to the retrieval fields that are successfully matched, the retrieval efficiency is improved.
Drawings
Fig. 1 is a schematic flow chart of a retrieval method according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of an adaptation file and a data structure according to an embodiment of the present invention.
Fig. 3 is a schematic diagram illustrating a header structure of a search request packet according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of matching the packet body content according to the embodiment of the present invention.
Fig. 5 is a schematic view of an application scenario of the retrieval method according to the embodiment of the present invention.
Fig. 6 is a schematic structural diagram of a retrieval system according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
In the prior art, both the long connection packet & unpacker and Google protobuf perform matching strictly according to a pre-negotiated format, so that the fault tolerance, universality and expandability of retrieval are poor. Drawing on protobuf and the long connection packet & unpacker, the embodiment of the invention provides a retrieval method based on a lightweight network protocol for the online retrieval end of a search engine. A data structure serialization mode is adopted to provide retrieval service during multi-service retrieval, and a more lightweight serialization tool is provided in the form of simple source code, which can be compiled together with the source code of the retrieval service without installation. The keys corresponding to the services (retrieval fields) of all service departments are unified, and each service is identified by its key value. During matching, the fields in the retrieval request packet that are successfully matched are retrieved and the fields that are not successfully matched are ignored, so that the universality and the expandability of multi-service retrieval for community search are improved.
Fig. 1 is a schematic flow chart of a retrieval method according to an embodiment of the present invention. Referring to fig. 1, the process includes:
Step 101, analyzing a retrieval request packet from a client to obtain packet header content and packet body content, wherein the packet header content is a check value generated by the client according to a pre-negotiated check strategy, and the packet body content is obtained by performing serialized data structure processing on a key value corresponding to a retrieval field input by a user according to a serialized data structure defined by a preset adaptation file by the client;
In this step, the client obtains the corresponding key value according to the retrieval field input by the user and performs serialized data structure processing on it to form the packet body content of the retrieval request packet. The keys corresponding to different services are the same, and different key values correspond to the retrieval fields of different services, which effectively reduces the number of < key, value > pairs. Each time a protocol field is added or deleted, the client and the retrieval server only need to synchronize the key value corresponding to that protocol field, which facilitates version release; and because the different services of every department adopt the same key, cross-department cooperation is facilitated. Furthermore, defining the key values does not cause the data definitions to expand sharply, so maintenance is convenient.
The processing of the serialized data structure specifically includes:
A01, acquiring a preset adaptation file;
In this step, a jce adaptation file is obtained in the manner of the protobuf protocol, and protocol code is generated from the jce document (jce adaptation file) in the subsequent process.
A02, carrying out serialized data structure processing on the key values corresponding to the retrieval fields according to the serialized data structure defined by the acquired adaptation file.
In this step, the adaptation file defines a serializable data structure. The client background provides a lexical and syntactic analyzer for parsing the jce document, and the keys and key values corresponding to the retrieval fields are parsed, through the jce document, into corresponding source code of a high-level language, where the source code corresponds to the serializable data structure.
Fig. 2 is a schematic structural diagram of an adaptation file and a data structure according to an embodiment of the present invention. Referring to fig. 2, the adaptation file is a jce document and includes a plurality of structures, for example, a structure A, a structure B, and the like. A field marked optional is an optional field, and a field marked required is a required field. A user can define the default value of a field in the jce document. For an optional field, the two communication parties need not impose a mandatory constraint and may instead fill in the default value. By distinguishing optional fields from required fields in this way, the two protocol parties can release versions asynchronously without causing protocol confusion. For a detailed description of the jce document, reference may be made to related technical documents, which are not described herein again.
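The effect of optional fields with default values can be sketched as follows. This is only an illustration under stated assumptions: the structure is modelled as a Python dictionary rather than as a jce document, and the names REQUIRED, STRUCT_A, apply_defaults and the field names are hypothetical.

```python
REQUIRED = object()  # sentinel: the field has no default and must be supplied

# Hypothetical definition of structure A: field name -> default value or REQUIRED.
STRUCT_A = {"query": REQUIRED, "page": 1, "page_size": 20}


def apply_defaults(schema, received):
    """Fill missing optional fields with their defaults; reject missing required fields."""
    result = {}
    for name, default in schema.items():
        if name in received:
            result[name] = received[name]
        elif default is REQUIRED:
            raise ValueError(f"missing required field: {name}")
        else:
            # Optional field absent from the message: fall back to the default value.
            result[name] = default
    return result
```

Under this sketch, a sender built against an older definition that omits page_size still yields a complete structure on the receiving side, which is what allows the two parties to release at different times.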
Through the jce document, the keys and key values corresponding to the retrieval fields can be automatically parsed into source code of the corresponding high-level language, as shown in the right part of fig. 2.
In the embodiment of the invention, the jce document defines a common lightweight serialization tool, namely a Reader and a Writer (jce class library), for each data structure. The two communication parties share the same Reader and Writer, perform binary reading or writing according to the identified sequence number, and thereby convert between network byte order and local (host) byte order.
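A shared Reader and Writer of this kind can be sketched as below. The wire layout (a one-byte sequence number, a four-byte length, then the payload, all in network byte order) is an assumption chosen for illustration and is not the jce class library's actual encoding.

```python
import struct


class Writer:
    """Writes fields as (sequence number: uint8, length: uint32, payload), big-endian."""

    def __init__(self):
        self._chunks = []

    def write(self, tag, payload):
        # "!" selects network byte order with no padding, so the prefix is 5 bytes.
        self._chunks.append(struct.pack("!BI", tag, len(payload)) + payload)

    def to_bytes(self):
        return b"".join(self._chunks)


class Reader:
    """Reads back the fields produced by Writer, in order."""

    def __init__(self, data):
        self._data = data

    def fields(self):
        offset = 0
        while offset < len(self._data):
            tag, length = struct.unpack_from("!BI", self._data, offset)
            offset += 5
            yield tag, self._data[offset:offset + length]
            offset += length
```

For example, writing field 0 as b"cats" and field 1 as struct.pack("!I", 7) and then iterating Reader(...).fields() returns the same two (sequence number, payload) pairs regardless of the byte order of the host machine.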
The check strategy may adopt cyclic redundancy check (CRC) combined with packet header validity check, Message Digest Algorithm 5 (MD5) check, Hamming check, or the like.
When the check strategy adopts CRC in combination with header validity check, generating a check value specifically includes:
A11, according to the pre-negotiated encryption field, performing CRC encoding with a random number to generate a CRC code table;
In this step, the client and the retrieval server negotiate an encryption field in advance, and each stores the encryption field.
A12, obtaining the retrieval request packet length information value;
A13, using the encryption field, the random number, the CRC code table and the retrieval request packet length information value as the check value.
Fig. 3 is a schematic diagram illustrating a header structure of a search request packet according to an embodiment of the present invention. Referring to fig. 3, the packet header includes a header encryption part and a retrieval request packet length information value part, wherein the header encryption part is composed of a fixed encryption field, a random number and a CRC code table generated by CRC encoding.
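Steps A11 to A13 and the header structure of fig. 3 can be sketched as follows. Several points are assumptions made for illustration: the field widths, the value of ENCRYPTION_FIELD, the function names, and in particular the modelling of the "CRC code table" as a single CRC-32 value computed with zlib.crc32 over the encryption field and the random number.

```python
import os
import struct
import zlib

ENCRYPTION_FIELD = b"KEY1"  # hypothetical 4-byte field negotiated in advance
HEADER_FORMAT = "!4sIII"    # encryption field, random number, CRC, packet length
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 16 bytes


def build_header(body, random_number=None):
    """A11-A13: CRC over (encryption field + random number), plus the packet length."""
    if random_number is None:
        random_number = int.from_bytes(os.urandom(4), "big")
    # A11: CRC-encode the encryption field together with the random number.
    crc = zlib.crc32(ENCRYPTION_FIELD + struct.pack("!I", random_number))
    # A12: length information value of the whole retrieval request packet.
    packet_length = HEADER_SIZE + len(body)
    # A13: the four values together form the check value carried in the header.
    return struct.pack(HEADER_FORMAT, ENCRYPTION_FIELD, random_number, crc, packet_length)


def build_packet(body, random_number=None):
    return build_header(body, random_number) + body
```

Passing an explicit random_number makes the header reproducible for testing; in normal use a fresh random number is drawn for each request.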
Step 102, regenerating a check value according to the check strategy, and judging whether the regenerated check value is the same as the check value carried in the packet header content;
In this step, if the check value carried in the packet header content is different from the regenerated check value, the retrieval request packet is discarded. By checking the received packet header content in this way, access by invalid connections such as illegal requests and port scanning tools can be effectively rejected.
The step of regenerating the check value according to the check strategy and judging whether the regenerated check value is the same as the check value carried in the packet header content specifically comprises the following steps:
A21, judging whether the retrieval request packet length information value carried in the parsed packet header content is greater than the preset retrieval request packet length threshold; if so, discarding the retrieval request packet, otherwise, executing step A22;
A22, performing CRC coding according to the encryption field and the random number carried in the parsed packet header content to generate a CRC code table;
In this step, it may also first be determined that the encryption field carried in the parsed packet header content is consistent with the encryption field that the retrieval server itself stores and has negotiated with the client, after which the CRC encoding of the encryption field and the random number is performed.
A23, judging whether the generated CRC code table is the same as the CRC code table carried in the packet header content; if so, judging that the regenerated check value is the same as the check value carried in the packet header content, otherwise, judging that the regenerated check value is not the same as the check value carried in the packet header content, and discarding the retrieval request packet.
In this step, it is determined whether the newly generated CRC code table is the same as the received CRC code table. If so, the check is successful; otherwise, the packet body data is determined to be erroneous and the retrieval request packet is discarded. That is, the regenerated check value includes: the retrieval request packet length threshold, the encryption field, the random number, and the CRC code table generated by CRC coding according to the encryption field and the random number.
In the embodiment of the invention, because the random number differs from request to request, the generated CRC code table is not fixed; therefore, the double check can reject most illegal requests and access by invalid connections such as port scanning tools.
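Steps A21 to A23 can be sketched on the server side as follows. As before, the header layout, the threshold MAX_PACKET_LENGTH, the value of ENCRYPTION_FIELD and the use of a single zlib.crc32 value in place of the "CRC code table" are assumptions for illustration only.

```python
import struct
import zlib

ENCRYPTION_FIELD = b"KEY1"     # hypothetical field stored by the retrieval server
HEADER_FORMAT = "!4sIII"       # encryption field, random number, CRC, packet length
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)
MAX_PACKET_LENGTH = 64 * 1024  # hypothetical preset length threshold


def verify_header(packet):
    """Return the packet body if the header checks pass, else None (packet discarded)."""
    if len(packet) < HEADER_SIZE:
        return None
    field, random_number, carried_crc, length = struct.unpack_from(HEADER_FORMAT, packet, 0)
    # A21: discard when the declared length exceeds the preset threshold.
    if length > MAX_PACKET_LENGTH:
        return None
    # Optional check noted under A22: the carried field must equal the stored one.
    if field != ENCRYPTION_FIELD:
        return None
    # A22: regenerate the CRC from the carried encryption field and random number.
    regenerated_crc = zlib.crc32(field + struct.pack("!I", random_number))
    # A23: compare the regenerated CRC with the one carried in the header.
    if regenerated_crc != carried_crc:
        return None
    return packet[HEADER_SIZE:]
```

A connection that sends arbitrary bytes, as a port scanner typically would, is unlikely to produce a header whose CRC matches its own random number, so such packets are dropped before any packet body parsing takes place.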
Step 103, if it is determined that the regenerated check value is the same as the check value carried in the packet header content, matching the packet body content with the defined serialized data structure, acquiring a key value corresponding to the successfully matched retrieval field, and retrieving.
In this step, the retrieval fields included in the packet body content of the retrieval request packet are matched respectively, the retrieval fields that are not successfully matched are ignored, and retrieval is performed according to the key values corresponding to the retrieval fields that are successfully matched, instead of discarding the retrieval request packet when one of the retrieval fields included in the packet body content is not successfully matched, so that the fault tolerance of the retrieval is improved.
Fig. 4 is a schematic diagram of matching the packet body content according to the embodiment of the present invention. Referring to fig. 4, a structure A including a field 0 and a field 1 is a data structure negotiated by the client and the retrieval server in advance. Assume that the client adds a field 2 in the retrieval request packet it sends, that is, adds the field 2 to the structure A. The client serializes information such as the key value, length and options corresponding to the added field 2 through the Writer, and sends the serialized information to the retrieval server through network communication. The retrieval server calls the Reader, reads the retrieval request packet, and matches the fields included in the structure A. When the field 2 is read, the retrieval server detects that the field 2 does not match the data structure negotiated in advance, determines that the field is an optional field, discards the field, and retrieves according to the field 0 and the field 1 to obtain a retrieval result. The communication is therefore not affected, and multi-service retrieval for community search has higher retrieval efficiency, good universality and good expandability.
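The tolerant matching of fig. 4 can be sketched as follows. The encoding (sequence number, length, payload) and the names STRUCT_A, encode and match_body are assumptions for illustration; the point of the sketch is only that an unknown field is skipped by its length rather than causing the whole packet to be discarded.

```python
import struct

# Hypothetical pre-negotiated structure A: sequence number -> field name.
STRUCT_A = {0: "field0", 1: "field1"}


def encode(fields):
    """Serialize {sequence number: bytes} as repeated (uint8, uint32 length, payload)."""
    return b"".join(struct.pack("!BI", tag, len(value)) + value for tag, value in fields.items())


def match_body(body, schema):
    """Return (matched fields, ignored sequence numbers); unknown fields are not fatal."""
    matched, ignored, offset = {}, [], 0
    while offset < len(body):
        tag, length = struct.unpack_from("!BI", body, offset)
        offset += 5
        payload = body[offset:offset + length]
        offset += length  # the length prefix lets the reader skip fields it does not know
        if tag in schema:
            matched[schema[tag]] = payload
        else:
            ignored.append(tag)
    return matched, ignored
```

With a body that carries fields 0, 1 and 2, match_body returns field 0 and field 1 for retrieval and reports field 2 as ignored, so a client that has added a field can still be served by a retrieval server that has not yet been updated.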
Fig. 5 is a schematic view of an application scenario of the retrieval method according to the embodiment of the present invention. Different services, such as space service retrieval, alumni service retrieval and wireless service retrieval, share the same key (fixed key) for packaging, and each retrieval request packet shares the same header information and version information for verification. Different values (wup buf) corresponding to the key are used to distinguish the different services. The different services adopt the same network protocol framework to acquire different protocol contents and process them flexibly. After retrieval, the corresponding service retrieval results are packaged by the wup protocol and returned to the client, and different retrieval results correspond to different wup buf. For example, the result returned to the space service is: packet header < key, qzone_buf >; the result returned to the alumni service is: packet header < key, phosphor_buf >. The packet header and the key are fixed, and the packet body content differs from service to service.
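The scenario of fig. 5 can be sketched as a dispatch over services that all reply under the same fixed key. Everything concrete here is hypothetical: the key string, the handler functions, the way the service is named, and the plain byte strings standing in for wup-packaged buffers, since the wup encoding itself is not described in this document.

```python
FIXED_KEY = "key"  # hypothetical: the single key shared by every service


def search_qzone(query):
    # Placeholder for the space service retrieval; returns a stand-in for qzone_buf.
    return f"qzone results for {query}".encode()


def search_alumni(query):
    # Placeholder for the alumni service retrieval.
    return f"alumni results for {query}".encode()


# One protocol framework, several services: only the value buffer differs.
HANDLERS = {"qzone": search_qzone, "alumni": search_alumni}


def handle(service, query):
    """Return the < key, buf > pair for the service, or None for an unknown service."""
    handler = HANDLERS.get(service)
    if handler is None:
        return None
    return (FIXED_KEY, handler(query))
```

Adding a further service in this sketch means registering one more handler; the key and the header handling stay unchanged, which mirrors the fixed header and key described above.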
Fig. 6 is a schematic structural diagram of a retrieval system according to an embodiment of the present invention. Referring to fig. 6, the retrieval system includes: a client and a retrieval server, wherein,
the client is used for receiving a retrieval field input by a user, acquiring a key value corresponding to the retrieval field, performing serialized data structure processing on the key value according to a serialized data structure defined by a preset adaptation file, acquiring the packet body content of a retrieval request packet, generating a check value as the packet head content of the retrieval request packet according to a pre-negotiated check strategy, and sending the retrieval request packet to a retrieval server;
In the embodiment of the invention, different key values correspond to the retrieval fields of different services, and the keys corresponding to different services are the same, which effectively reduces the number of key value pairs. Each time a protocol field is added or deleted, the client and the retrieval server only need to synchronize the key value corresponding to that protocol field, which facilitates version release; and because the different services of every department adopt the same key, cross-department cooperation is facilitated. The check strategy may adopt cyclic redundancy check (CRC) combined with packet header validity check, Message Digest Algorithm 5 (MD5) check, Hamming check, or the like.
The retrieval server is used for receiving the retrieval request packet, parsing it to acquire packet header content and packet body content, regenerating a check value according to the pre-negotiated check strategy, and judging whether the regenerated check value is the same as the check value carried in the packet header content; and if the regenerated check value is the same as the check value carried in the packet header content, matching the packet body content with the defined serialized data structure, acquiring the key value corresponding to the successfully matched retrieval field, and retrieving.
In the embodiment of the invention, the received packet header content is checked, so that access by invalid connections such as illegal requests and port scanning tools can be effectively rejected. The retrieval fields contained in the packet body content of the retrieval request packet are matched respectively, the retrieval fields that are not successfully matched are ignored, and retrieval is performed according to the key values corresponding to the retrieval fields that are successfully matched, rather than discarding the retrieval request packet when one of the retrieval fields contained in the packet body content is not successfully matched, so that the retrieval efficiency and the fault tolerance are improved.
Wherein,
the client comprises: a data structure processing unit, a check value generating unit, and an encapsulating unit (not shown in the figure), wherein,
the data structure processing unit is used for receiving a retrieval field input by a user, acquiring a key value corresponding to the retrieval field, and performing serialized data structure processing on the key value according to a serialized data structure defined by a preset adaptation file to acquire the packet body content of a retrieval request packet;
a check value generating unit, configured to generate a check value as a packet header content of the search request packet according to a pre-negotiated check policy;
and the encapsulating unit is used for encapsulating the packet head content and the packet body content, generating a retrieval request packet and sending the retrieval request packet to the retrieval server.
The check value generation unit includes: a CRC check value generating sub-unit, and a retrieval request packet length information value acquiring sub-unit (not shown in the drawings), wherein,
the CRC check value generation subunit is used for performing CRC coding on the pre-negotiated encryption field and a random number to generate a CRC code table, and taking the encryption field, the random number, the CRC code table and the retrieval request packet length information value acquired by the retrieval request packet length information value acquisition subunit as the packet header content of the retrieval request packet.
The retrieval server comprises a parsing unit, a check value checking unit, a matching unit and a retrieval unit (not shown in the figure), wherein,
the analysis unit is used for receiving the retrieval request packet and analyzing and acquiring packet header contents;
a check value checking unit, configured to regenerate the check value according to a pre-negotiated check policy, and determine whether the regenerated check value is the same as the check value carried in the packet header content; and, if the regenerated check value is the same as the check value carried in the packet header content, to inform the analysis unit to parse the packet body content;
the matching unit is used for matching the packet body content with the defined serialized data structure to obtain a key value corresponding to the successfully matched retrieval field;
and the retrieval unit is used for retrieving according to the key value corresponding to the successfully matched retrieval field to obtain a retrieval result.
The check value checking unit includes: a CRC check value check subunit, and a retrieve request packet length check subunit (not shown), wherein,
a search request packet length checking subunit, configured to check a search request packet length information value carried in the packet header content obtained through the analysis, and notify a CRC check value checking subunit if the search request packet length information value is not greater than a preset search request packet length threshold;
In the embodiment of the invention, if the retrieval request packet length information value is greater than the preset retrieval request packet length threshold, the retrieval request packet is discarded.
And the CRC check value checking subunit is used for performing CRC coding according to the encryption field and the random number carried in the parsed packet header content to generate a CRC code table, and, if the generated CRC code table is the same as the CRC code table carried in the packet header content, informing the analysis unit to parse the packet body content.
In the embodiment of the present invention, the CRC check value checking subunit may also first determine that the encryption field carried in the parsed packet header content is consistent with the encryption field that it stores and has negotiated with the client, and then perform the CRC encoding of the encryption field and the random number.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.