Detailed Description
The terms "first", "second", "third", and the like in the description, the claims, and the drawings are used to distinguish between different objects, not to limit a specific order.
In embodiments of the application, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "such as" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
In the related art, in multi-modal search of portrait pictures or videos typified by CLIP, accurately identifying attribute information such as appellations (e.g., person names, nicknames, person relationships) and the number of persons from search sentences describing portrait pictures or videos faces the following difficulties:
1. Ambiguity of person names: user-defined person names are not normalized, for example "Zhang San", "Dither", or "Flower Blossom Bringing Rich and Honour"; in practical applications such names are therefore ambiguous and difficult to identify accurately.
2. Variety of objects modified by quantifiers: a numeral-quantifier phrase can modify many kinds of subjects, so the quantifiers that modify persons must be extracted accurately. For example, "three" in "three pots of flowers" does not quantify persons, whereas "two" in "two-person group photo" does.
3. Privacy and power consumption: for schemes that report user data to the cloud side to train a language model, uploading user data (e.g., the person names the user has annotated in the gallery) to the cloud side may leak user privacy.
4. Identification accuracy: directly matching the search sentence against stored person names very easily causes false identification; for example, a user searching for the lyrics of a song that contains "Zhang San" is not searching for the person "Zhang San". Another approach trains a language model using natural language understanding technology, but due to the limitations of the training corpus it very easily misses recognitions; for example, when searching for "photos of Xiaomei", the language model's word segmentation splits the name "Xiaomei" into the separate characters "Xiao" ("small") and "Mei" ("beautiful"), missing the person.
To solve the above technical problems, the application provides a multimedia content searching method that constructs, in advance and based on existing information, slot trees corresponding to image attribute information, one slot tree per type of attribute information, and constructs a pattern tree of search sentence patterns based on encyclopedia knowledge and a search sentence corpus. Specific image attribute features in a search sentence, such as person names and the number of persons, are identified based on the pattern tree and the slot trees, improving the accuracy of the semantic understanding of the search sentence. Further, matched pictures or videos are retrieved based on two dimensions of features, the image attribute features and the visual semantic vector of the search sentence, improving the accuracy of the search results.
The multimedia content searching method provided by the application is suitable for end-side electronic devices, such as mobile phones, tablet computers, desktop computers, laptop computers, notebook computers, ultra-mobile personal computers (UMPC), handheld computers, netbooks, personal digital assistants (PDA), wearable electronic devices, smart watches, and the like.
Referring to fig. 1, a schematic structural diagram of an electronic device according to an embodiment of the present application is shown.
As shown in fig. 1, the electronic device may include a processor 110, a memory 120, a camera 130, a display screen 140, and the like.
It is to be understood that the configuration illustrated in this embodiment does not constitute a specific limitation on the electronic apparatus. In other embodiments, the electronic device may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 110 may include one or more processing units, and the different processing units may be separate devices or may be integrated into one or more processors.
Memory 120 is used to store computer-executable program code that includes instructions. The processor 110 performs various functional applications and data processing of the electronic device by executing instructions stored in memory. For example, in the present embodiment, the processor 110 may perform a multimedia content search by executing instructions stored in the memory.
The memory 120 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required for at least one function (such as an image or video capture function, an image or video playback function, etc.). The data storage area may store data created during use of the electronic device (e.g., image data, video data), and so on.
The camera 130 is used to capture still images or video. The multimedia content of the present application is an image or video.
The display screen 140 is used to display images, videos, and the like.
In addition, an operating system runs on the above components, and an application may run on the operating system; for example, the applications involved in the present application include a gallery application.
To address the poor search results of current multi-modal multimedia content searching schemes, the multimedia content searching method provided by the embodiment of the application can be applied to the electronic device shown in fig. 1, and in particular can run in the gallery application. Moreover, the method applies to searching images as well as videos.
First, the processing modules related to the multimedia content searching method in the gallery application are introduced. In an embodiment of the present application, the gallery application may include a gallery service, a search module, and a multi-modal understanding module.
The gallery service implements basic functions of the gallery application, such as storing images and videos, displaying the stored images or videos, and receiving and storing attribute information input by the user for multimedia content, for example when the user adds or modifies attribute information.
In the embodiment of the application, the searching module is used for constructing a slot tree, a mode tree and a picture index so as to realize searching of images or videos stored in a gallery.
In the embodiment of the application, the multi-modal understanding module performs visual semantic understanding to convert image or video content into corresponding vectors, and performs vector conversion on search sentences input by the user.
The method for searching multimedia contents according to the present application will be described in detail with reference to fig. 2, and may include the steps of:
s101, the search module builds a slot tree corresponding to the image attribute information based on the existing information.
In the embodiment of the application, the image attribute information includes various information related to the image, for example, may include shooting information (such as shooting time, geographical position and the like) of the image, labeling information (such as name, nickname, person relationship and the like) of a user on an object contained in the image, and an image content label, wherein the content label may be an object contained in the image automatically identified based on the image content, such as sky, sunrise, sunset, seaside, building, children, animals and the like.
The existing information may include at least one of attribute information and encyclopedia knowledge of existing pictures in the gallery.
In the embodiment of the application, a dictionary tree (trie), here called a slot tree, is constructed for some types of attribute information. A dictionary tree is a tree structure in which the common prefixes of different character strings are stored only once, which saves storage space. Meanwhile, the common prefixes can be used to shorten query time, minimizing unnecessary string comparisons and improving query efficiency.
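For illustration only, the following is a minimal Python sketch of such a dictionary tree; the class and method names (TrieNode, Trie, match_prefix) are hypothetical and not part of the embodiment:

```python
class TrieNode:
    """A node of a dictionary tree (trie); children are keyed by single characters."""
    def __init__(self):
        self.children = {}   # character -> TrieNode
        self.is_end = False  # True if a stored word ends at this node

class Trie:
    """Dictionary tree: common prefixes of different strings are stored only once."""
    def __init__(self, words=()):
        self.root = TrieNode()
        for w in words:
            self.insert(w)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def match_prefix(self, text, start=0):
        """Return the longest stored word matching text[start:], or None.
        An unmatched character stops the walk early, avoiding needless comparisons."""
        node, longest = self.root, None
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.is_end:
                longest = text[start:i + 1]
        return longest
```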
In an exemplary embodiment, the information with which a user annotates a portrait in an image, such as a person name, nickname, or person relationship, may be collectively referred to as an "appellation", and the slot tree built for this type of attribute information is referred to as an "appellation slot tree". The appellation slot tree may be built from the appellation information with which the user has annotated pictures in the gallery. In other embodiments, the appellation slot tree may also be extended with the contact names stored in the address book of the electronic device.
In one example, FIG. 3 illustrates an appellation slot tree that includes person names, nicknames, person relationships, etc. In the example illustrated in FIG. 3, the person names include Wang Ming, the nicknames include Xiaomei, Xiao Shuai, etc., and the person relationships include brother, son, daughter, mother, friend, etc. FIG. 3 is only an example; in practical applications the slot tree may include more nodes and a more complex structure, which the present application does not show one by one.
In another exemplary embodiment, separate slot trees may be built for person names, nicknames, person relationships, etc., such as a person name slot tree, a nickname slot tree, and a person relationship slot tree.
The person name slot tree includes only the names corresponding to persons. It can be created from the person names the user has annotated in the gallery, and can further be extended with the contact names in the address book. For example, in one example, FIG. 4 is a schematic diagram of a person name slot tree, where the names include Zhang Sanfeng, Zhang San, Wang Ming, and so on.
The nickname slot tree includes only the nicknames corresponding to portraits. It can be created from the nicknames the user has annotated on portrait images or videos in the gallery, and can further be extended with the contacts' nicknames in instant messaging applications. For example, in one example, FIG. 5 is a schematic diagram of a nickname slot tree, where the nicknames include Flower Blossom Bringing Rich and Honour, Flower Placing Forgetfulness, Summer, Cotton Candy, Little Lovely, and the like.
The person relationship slot tree is a dictionary tree created from words expressing relationships between persons, summarized from encyclopedia knowledge. For example, in one example, FIG. 6 is a schematic diagram of a person relationship slot tree; the person relationships may include brother, sister, "Ge" (elder brother), "Jie" (elder sister), dad, mom, son, daughter, friend, classmate, and so on.
In addition, in the embodiment of the application, a corresponding slot tree, called a label slot tree, can be constructed for the image content labels, built from the content labels annotated in the gallery.
In the embodiment of the application, in order to accurately identify the semantic content of the search sentence input by the user, corresponding slot trees can also be constructed for the number words, quantifiers (measure words), suffix words, and stop words (i.e., words without actual semantics) that may occur in a search sentence.
The number slot tree is a dictionary tree created from the number words that may occur in search sentences; commonly used number words can be summarized from encyclopedia knowledge and the search sentence corpus. For example, in one example, FIG. 7 is a schematic diagram of a number slot tree, where the number words may include two, three, four, etc.
The quantifier slot tree is a dictionary tree created from the quantifiers (measure words) that may occur in search sentences; commonly used quantifiers can be summarized from encyclopedia knowledge and the search sentence corpus. For example, in one example, FIG. 8 is a schematic diagram of a quantifier slot tree; the quantifiers may include "person" ("Ren") and other common measure words used for persons.
The suffix word slot tree is a dictionary tree created from the suffix words that may occur in search sentences; commonly used suffix words can be summarized from the search sentence corpus. For example, FIG. 9 is a schematic diagram of a suffix word slot tree, where the suffix words may include group photo, large group photo, etc.
The stop word slot tree is created from words that may occur in search sentences but carry no actual semantics, such as conjunctions and auxiliary words; common stop words can be summarized from encyclopedia knowledge and the search sentence corpus. For example, in one example, FIG. 10 is a schematic diagram of a stop word slot tree, where the stop words in this example may include "and", "with", and the like.
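Continuing the sketch above, the slot trees described in this section could be instantiated from word lists as follows; the word lists are purely illustrative stand-ins for the gallery annotations, address book entries, and corpus-derived words:

```python
# Hypothetical word lists; in practice they come from gallery annotations,
# the address book, encyclopedia knowledge, and the search sentence corpus.
slot_trees = {
    "appellation": Trie(["Xiaomei", "Xiao Shuai", "brother", "mom", "Wang Ming"]),
    "number":      Trie(["two", "double", "three", "four", "five"]),
    "quantifier":  Trie(["person"]),
    "suffix":      Trie(["photo", "group photo"]),
    "stop":        Trie(["and", "of", "with"]),
}
```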
S102, the search module builds a pattern tree corresponding to search sentences.
In the embodiment of the application, a dictionary tree, called a pattern tree, is constructed over the order in which word types are arranged in a search sentence.
In an exemplary embodiment, rules of the search term corpus (i.e., search terms of images or videos by multiple users) are summarized and analyzed in combination with encyclopedia knowledge, and different types of term arrangements frequently occurring in the search terms are obtained.
This embodiment is described taking search patterns that describe portrait information as an example. As shown in fig. 11, in one example, the pattern tree may include the following three search patterns describing portrait information:
Search pattern 1: [appellation] [stop word] [appellation] [stop word] [number] [quantifier] [suffix]. For example, in the search sentence "Xiaomei and brother's two-person group photo", "Xiaomei" is an appellation, "and" is a stop word, "brother" is an appellation, "'s" ("of") is a stop word, "two" ("double") is a number word, "person" is a quantifier, and "group photo" is a suffix word.
Search pattern 2: [appellation] [stop word] [suffix]. For example, in "Xiao Shuai's photos", "Xiao Shuai" is an appellation, "'s" ("of") is a stop word, and "photos" is a suffix word.
Search pattern 3: [number] [quantifier] [suffix]. For example, in "five-person group photo", "five" is a number word, "person" is a quantifier, and "group photo" is a suffix word.
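A pattern tree can reuse the same trie idea, with slot types in place of characters. The following sketch, under the same illustrative assumptions as before, builds a pattern tree from the three search patterns above:

```python
# Each search pattern is a sequence of slot types; inserting them into a trie
# keyed by slot type yields the pattern tree, sharing common prefixes.
SEARCH_PATTERNS = [
    ["appellation", "stop", "appellation", "stop", "number", "quantifier", "suffix"],
    ["appellation", "stop", "suffix"],
    ["number", "quantifier", "suffix"],
]

class PatternNode:
    def __init__(self):
        self.children = {}    # slot type -> PatternNode
        self.is_end = False   # a complete search pattern ends here

def build_pattern_tree(patterns):
    root = PatternNode()
    for pattern in patterns:
        node = root
        for slot_type in pattern:
            node = node.children.setdefault(slot_type, PatternNode())
        node.is_end = True
    return root

pattern_tree = build_pattern_tree(SEARCH_PATTERNS)
```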
The above steps S101 to S102 show the dictionary tree construction process, which only needs to be performed before the first search rather than before every search; thereafter the dictionary trees are updated as required.
S103, the user adds or modifies the image and the attribute information thereof in the gallery application.
In an exemplary embodiment, the user may add attribute information of the image in a gallery application, e.g., annotate the picture of a sibling with attribute information of a "sibling".
In another exemplary embodiment, the user may also modify annotated attribute information, for example, annotate a picture with "Zhang San" and later change the annotation to "Dither".
In yet another exemplary embodiment, the user may add a new annotation to the image that has been annotated with attribute information, e.g., annotate a picture with a "son" followed by a "little lovely" annotation.
In addition, the user may also modify the image content, for example by graffiti, cropping, applying filters, or adjusting image parameters (e.g., brightness, contrast, saturation, sharpness, etc.).
S104, the gallery service stores images and attribute information thereof.
And the gallery service receives the image and the attribute information thereof and correspondingly stores the image and the attribute information thereof into a gallery database.
S105, the gallery service transmits images to the multi-mode understanding module for visual semantic understanding.
The gallery service transmits the received image to a multi-modal understanding module, and the multi-modal understanding module carries out visual semantic understanding on the image to obtain a vector corresponding to the picture, which is also called a visual semantic vector.
Visual semantic understanding refers to automatically identifying high-level concepts of objects, scenes, actions, etc., and relationships between them, from an image. It can help the computer understand the image more deeply and make a finer analysis and understanding of the image content.
S106, the multi-mode understanding module returns visual semantic vectors to the gallery service.
The multimodal understanding module delivers visual semantic vectors obtained through visual semantic understanding to a gallery service.
S107, the gallery service stores visual semantic vectors.
For example, the gallery service may store visual semantic vectors returned by the multimodal understanding module into a gallery database.
S108, the gallery service transmits visual semantic vectors and attribute information required for constructing indexes to the search module.
In practical applications, a user usually cares about only part of an image's attribute information (such as the shooting time, the geographical location, who the photographed subject is, and the relationship with the subject) and does not care about information such as the image's sensitivity, exposure value, white balance, or shutter speed, so an index only needs to be constructed for the attributes the user cares about. Therefore, the gallery service only needs to pass the visual semantic vector and the attribute information required for index construction to the search module.
For example, the attribute information required for index construction may be agreed upon with the search module in advance and written into the code of the gallery service.
S109, the search module constructs an index information item corresponding to the picture based on the visual semantic vector and the attribute information.
In an exemplary embodiment, what the gallery service sends includes the visual semantic vector and a unique identifier of the picture, also referred to as the picture ID. After receiving the visual semantic vector and the attribute information, the search module writes the visual semantic vector, each item of attribute information, and the picture ID into the corresponding fields of the index library.
For example, if the attribute information of an image includes the shooting time, the shooting location (i.e., geographical location), person names, etc., then an index information item in the index library includes fields such as time, location, name, visual semantic vector, and picture ID. In this example, the shooting time is written into the time field, the shooting location into the location field, the person names into the name field, the visual semantic vector into the visual semantic vector field, and the picture ID into the picture ID field; when all the information has been written into the corresponding fields, the index information item corresponding to the picture is obtained.
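As an illustration only, an index information item might be assembled as follows; the field and function names are assumptions, not the embodiment's actual schema:

```python
def build_index_item(picture_id, visual_vector, attrs):
    """Write the visual semantic vector, attribute values, and picture ID
    into the corresponding fields of one index information item."""
    return {
        "picture_id": picture_id,
        "visual_vector": visual_vector,   # from the multi-modal understanding module
        "time": attrs.get("shooting_time"),
        "location": attrs.get("shooting_location"),
        "name": attrs.get("person_names", []),
    }

index_library = []
index_library.append(build_index_item(
    picture_id="img_0001",
    visual_vector=[0.12, -0.03, 0.88],          # illustrative values
    attrs={"shooting_time": "2023-07-15",
           "shooting_location": "Beijing",
           "person_names": ["Xiaomei", "brother"]},
))
```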
The above-mentioned S103 to S109 show the index construction process, which is not required to be performed before each search, but is performed when the user newly adds or modifies the picture and its attribute information.
In addition, in order to improve the battery life of the electronic device, the above index construction process may be performed while the electronic device is charging or its screen is off, so as to reduce the impact of index construction on power consumption.
S110, the gallery service receives search sentences.
Illustratively, the user may input a search sentence in the search field provided by the gallery application. As shown in FIG. 12, the bottom of the page 100 of the gallery application includes a tab navigation bar 101 with, for example, four tab options: photo 102, album 103, time 104, and find 105; the page 100 is the photo tab interface. The top of the page 100 includes the title "photo" 106 of the current tab interface, and a search field 107 is below the tab title 106; the user may enter a search sentence in the search field 107.
S111, the gallery service transmits search sentences to the search module.
S112, the search module identifies image attribute features in the search sentence based on the pattern tree and the slot trees.
In an exemplary embodiment, the process shown in S112 may include the steps of:
(1) Read the root node of the pattern tree as the current node.
(2) Read all child nodes of the current node.
(3) According to the name of each child node, read the slot nodes in the corresponding slot tree.
(4) Match the characters of the search sentence against the slot nodes one by one in order; if some slot information matches successfully, the corresponding child node in the pattern tree is hit. Otherwise, stop querying subsequent nodes.
(5) Repeat steps (2) to (4); if all characters in the search sentence hit child nodes in the pattern tree, the search sentence hits a search pattern in the pattern tree; otherwise, the search sentence misses the search patterns in the pattern tree.
(6) From the hit search pattern, obtain the related information contained in the search sentence, such as person names, number of persons, sky, building, animals, sunrise and sunset, and the like.
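The following Python sketch ties the above steps together, jointly walking the pattern tree and slot trees assumed in the earlier sketches. It simplifies the described procedure by taking the longest slot match at each position rather than backtracking over shorter matches:

```python
def match_sentence(sentence, pattern_node, slot_trees, pos=0, found=None):
    """Jointly walk the pattern tree and the slot trees over the search sentence.
    Returns a list of (slot_type, word) pairs if the whole sentence hits a
    search pattern, or None if no pattern matches."""
    found = found or []
    if pos == len(sentence):
        # All characters consumed: a hit only if a complete pattern ends here.
        return found if pattern_node.is_end else None
    for slot_type, child in pattern_node.children.items():
        word = slot_trees[slot_type].match_prefix(sentence, pos)
        if word is None:
            continue  # this slot type misses; try the next child node
        result = match_sentence(sentence, child, slot_trees,
                                pos + len(word), found + [(slot_type, word)])
        if result is not None:
            return result
    return None  # stop: no child of the current node matches
```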
S113, the search module transmits the search sentence to the multi-modal understanding module for vector conversion.
S114, the multi-modal understanding module returns the text vector corresponding to the search sentence to the search module.
Specifically, the search module transmits the received search sentence to the multi-modal understanding module; the multi-modal understanding module performs vector conversion on the search sentence to obtain the corresponding text vector, and returns the text vector to the search module.
In an exemplary embodiment, all characters of the search term may be directly converted into corresponding vectors.
In another exemplary embodiment, the multi-modal understanding module may identify non-visual information in the search sentence (also referred to as the first search sentence), delete the words carrying non-visual information to obtain a second search sentence related only to visual information, and then perform vector conversion on the second search sentence to obtain the text vector. For example, if the first search sentence is "sunset shot in the summer of 2023", "2023" is non-visual information, and the second search sentence after deleting the non-visual information is "sunset shot in summer".
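A minimal sketch of this pre-processing, assuming non-visual information can be recognized with simple patterns (here, four-digit years); the actual module may identify non-visual information quite differently:

```python
import re

# Hypothetical patterns for non-visual information, e.g. four-digit years.
NON_VISUAL_PATTERNS = [r"\b\d{4}\b"]

def strip_non_visual(first_sentence):
    """Delete non-visual words from the first search sentence to obtain
    the second search sentence used for text-vector conversion."""
    second = first_sentence
    for pattern in NON_VISUAL_PATTERNS:
        second = re.sub(pattern, "", second)
    return " ".join(second.split())  # collapse leftover whitespace

print(strip_non_visual("sunset shot in summer 2023"))
# -> "sunset shot in summer"
```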
S115, the searching module searches the images matched with the image attribute features and the text vectors corresponding to the search sentences in the index library.
In an exemplary embodiment, the search module compares the image attribute features one by one with the corresponding fields in the index library. Meanwhile, it compares the similarity between the text vector corresponding to the search sentence and each visual semantic vector in the index library; if the similarity is greater than or equal to a threshold, the visual semantic vector is determined to match the search sentence.
The matching results of the image attribute features and of the text vector are then combined to obtain the search results matching the search sentence.
In an exemplary embodiment, the search results may include pictures for which the text vector matches and all image attribute features match, and may also include pictures for which the text vector matches and only some image attribute features match. For example, if the search sentence is "sunset taken in Beijing in the summer of 2023", the search results may include pictures whose shooting time is the summer of 2023, whose shooting location is Beijing, and whose content includes a sunset, i.e., pictures fully matching the search sentence, and may further include sunset pictures taken in other places in the summer of 2023, or sunsets taken at other times.
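The vector side of this matching could look like the following sketch; cosine similarity and the 0.8 threshold are illustrative assumptions, since the embodiment only requires a similarity greater than or equal to a threshold:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

SIMILARITY_THRESHOLD = 0.8  # assumed value

def vector_matches(text_vector, item):
    """An index item matches the search sentence if the similarity between
    its visual semantic vector and the text vector reaches the threshold."""
    return cosine_similarity(text_vector, item["visual_vector"]) >= SIMILARITY_THRESHOLD
```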
And S116, the searching module ranks the searched pictures.
In an exemplary embodiment, the pictures in the search results are ranked from high to low according to how well they match the search sentence. For example, a search sentence corresponds to multiple image attribute features; the more attribute features a picture matches, the higher its matching degree, and the fewer it matches, the lower its matching degree.
Following the example in S115 where the search sentence is "sunset taken in Beijing in the summer of 2023": pictures whose shooting time is the summer of 2023, whose shooting location is Beijing, and whose subject includes a sunset have the highest matching degree; sunset pictures taken in other locations in the summer of 2023 have the second highest; and sunset pictures taken in other locations at other times have the lowest.
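A sketch of this ranking rule, counting how many identified attribute features each index item matches; the field names follow the earlier index sketch and are assumptions:

```python
def matched_attribute_count(item, attribute_features):
    """Count how many identified attribute features an index item matches.
    attribute_features maps an index field to the value extracted from the
    search sentence, e.g. {"time": "2023-07-15", "location": "Beijing"}."""
    count = 0
    for field, value in attribute_features.items():
        stored = item.get(field)
        if stored == value or (isinstance(stored, list) and value in stored):
            count += 1
    return count

def rank_results(items, attribute_features):
    """Rank matched pictures from high to low by number of matched features."""
    return sorted(items,
                  key=lambda it: matched_attribute_count(it, attribute_features),
                  reverse=True)
```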
S117, the search module returns the picture search result to the gallery service.
In the embodiment of the application, the picture search result obtained by the search module is a set of picture IDs matched with the search statement.
S118, the gallery service displays the picture search results.
After receiving the search results fed back by the search module, the gallery service reads the corresponding pictures from the gallery database according to the picture IDs in the search results and displays them on the search result display interface.
As shown in fig. 13, the interface 200 is a picture search result interface of the gallery application. The interface 200 includes a search bar 201 containing the searched content, for example "two-person group photo of Xiaomei and Dither". A folder display area 202 is located below the search bar 201 and displays the folders to which the found pictures belong. A search result display area 203 is located below the folder display area 202 and displays a preset number of pictures.
According to the multimedia content searching method provided by this embodiment, slot trees corresponding to image attribute information are constructed in advance based on existing information, one slot tree per type of attribute information, and a pattern tree of search sentence patterns is constructed based on encyclopedia knowledge and the search sentence corpus. After receiving a search sentence input by the user, the image attribute features contained in the search sentence are identified based on the pattern tree and the slot trees, and matching pictures are searched based on the identified image attribute features and the text vector corresponding to the search sentence. By using the pattern tree of search sentences and the slot trees of attribute information to accurately identify the image attribute features in the search sentence, the method improves the accuracy of the semantic understanding of the search sentence and ultimately the accuracy of the search results.
In addition, the method is executed on the electronic device, so user data (such as the person names and other information the user has annotated on pictures or videos) does not need to be reported to a cloud server, which improves the security of user data. Moreover, the method identifies image attribute features by traversing the pattern tree and the slot trees; since both are dictionary trees, lookups are efficient, the amount of computation is small, and the power consumption is low.
The following describes the process of identifying image attribute features in a search sentence shown in S112 with examples:
Example 1: for the search sentence "Xiaomei and brother's two-person group photo", the process of identifying image attribute features is as follows:
(1) Read the root node of the pattern tree as the current node;
(2) As shown in fig. 11, read the child nodes of the current node from the pattern tree, namely the number node and the appellation node;
(3) Read the slot nodes from the number slot tree, and read the slot nodes from the appellation slot tree;
(4) Read the first character of the search sentence, "Xiao" ("small"); as shown in fig. 7, no node containing "Xiao" is found in the number slot tree, so it is determined that the first character of the search sentence does not hit the number node.
(5) As shown in fig. 3, a "Xiao" node is found among the child nodes of the root node of the appellation slot tree; the next character after "Xiao", namely "Mei", is further read from the search sentence and compared with the child nodes of the "Xiao" node in the appellation slot tree. As shown in fig. 3, the "Xiao" node has two child nodes, "Mei" and "Shuai", and the "Mei" node is hit.
(6) Read the child node of the appellation node in the pattern tree, namely the stop word node; read the next character after "Mei" from the search sentence, namely "and"; search the stop word slot tree for a node containing "and", and determine that "and" hits the stop word node.
(7) Read the child nodes of the stop word node in the pattern tree, namely the appellation node and the suffix node; further read the next character after "and" from the search sentence, namely the first character of "brother" ("Ge"). Find a node containing "Ge" in the appellation slot tree, then continue reading the next character of the search sentence, a second "Ge", which matches a child node of the "Ge" node in the appellation slot tree; since that child node is a leaf node, determine that "brother" ("Gege") in the search sentence hits the appellation node in the pattern tree.
(8) Read the child node of the appellation node in the pattern tree, namely the stop word node; read the next character after the second "Ge" from the search sentence, namely "of" ("De"); find the "of" node in the stop word slot tree, and since it is a leaf node, determine that this character in the search sentence hits the stop word node in the pattern tree.
(9) Read the next node after the stop word node in the pattern tree, namely the number node; read the next character to be compared in the search sentence, "double" ("Shuang"); find the node containing "double" in the number slot tree, and since the "double" node is a leaf node, determine that "double" hits the number node in the pattern tree.
(10) Read the next node to be compared in the pattern tree, namely the quantifier node; read the next character in the search sentence, "person" ("Ren"); find the "person" node in the quantifier slot tree, and since it is a leaf node, determine that "person" hits the quantifier node in the pattern tree.
(11) Read the next node in the pattern tree, namely the suffix node; read the next character in the search sentence, the first character of "group photo" ("He"), and find the corresponding node in the suffix word slot tree. That node is not a leaf node, so continue reading the next character in the search sentence, the second character of "group photo" ("Ying"); the child nodes of the "He" node in the suffix word slot tree include a "Ying" node, and since the "Ying" node is a leaf node, "group photo" in the search sentence hits the suffix node in the pattern tree. The suffix node is a leaf node of the pattern tree, so it is determined that the current search sentence hits the search pattern [appellation] [stop word] [appellation] [stop word] [number] [quantifier] [suffix].
(12) According to the search pattern hit by the search sentence, determine that "Xiaomei" and "brother" are appellations of persons and that "double" specifies the number of persons as two.
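Using the matching sketch from S112, Example 1 reduces to a single call; the English words are hypothetical stand-ins for the Chinese characters walked through above:

```python
# The sentence is processed character by character, so the illustrative
# English stand-ins are simply concatenated without separators.
sentence = "Xiaomei" + "and" + "brother" + "of" + "double" + "person" + "group photo"
print(match_sentence(sentence, pattern_tree, slot_trees))
# -> [('appellation', 'Xiaomei'), ('stop', 'and'), ('appellation', 'brother'),
#     ('stop', 'of'), ('number', 'double'), ('quantifier', 'person'),
#     ('suffix', 'group photo')]
```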
Example 2: for the search sentence "Xiaozhang and brother's two-person group photo", the process of identifying image attribute features is as follows:
(1) Read the root node of the pattern tree as the current node;
(2) Read the child nodes of the root node in the pattern tree, namely the number node and the appellation node;
(3) Read the slot nodes from the number slot tree, and read the slot nodes from the appellation slot tree;
(4) Read the first character of the search sentence, "Xiao"; as shown in fig. 7, no node containing "Xiao" is found in the number slot tree, so it is determined that the first character of the search sentence does not hit the number node.
(5) As shown in fig. 3, a "Xiao" node is found among the child nodes of the root node of the appellation slot tree; the next character after "Xiao", namely "Zhang", is further read from the search sentence and compared with the child nodes of the "Xiao" node in the appellation slot tree. As shown in fig. 3, the "Xiao" node only has the two child nodes "Mei" and "Shuai" and contains no "Zhang" node, i.e., "Xiaozhang" does not hit any appellation node.
(6) The query ends: the current search sentence does not hit any search pattern, and no person name or number of persons is identified.
In this example, when the search proceeds further based on the search sentence, only the similarity between the text vector corresponding to the search sentence and the visual semantic vectors in the index library is compared to obtain the images matching the search sentence.
Example 3: for the search sentence "three pots of flowers", the process of identifying image attribute features is as follows:
(1) to (3): the same as the first three steps of Examples 1 and 2, not repeated here.
(4) Read the first character of the search sentence, "three", and find the "three" node in the number slot tree; since the node is a leaf node, determine that the search sentence starts with a number. Then read the second character of the search sentence, "pot"; it does not hit any slot node corresponding to the quantifier node in the pattern tree, i.e., it does not hit any node of the quantifier slot tree, so the current pattern is not hit.
(5) The first character "three" does not match any node in the appellation slot tree, so determine that the search sentence does not begin with an appellation.
(6) Stop the query and determine that the current search sentence does not hit any search pattern.
This example is similar to Example 2 above: when the search proceeds further based on the search sentence, the similarity between the text vector corresponding to the search sentence and the visual semantic vectors in the index library is compared to obtain the images matching the search sentence.
In addition, in other embodiments, consider the search sentence "last year's two-person group photo of Xiaozhang". As can be seen from Example 2, "Xiaozhang" is not matched successfully, i.e., the person name and the number of persons contained in the search sentence cannot be identified. In this scenario, the time attribute information "last year" in the search sentence is retained and compared with the time field in the index library; meanwhile, similarity comparison is performed between the text vector of the search sentence and the visual semantic vectors in the index library, and the images matching the search sentence are obtained by combining the time attribute information and the text vector.
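A sketch of this fallback, reusing the helpers assumed earlier: whatever attribute features were identified (possibly only the time) are combined with the vector similarity; all names and values are illustrative:

```python
def search(index_library, text_vector, attribute_features):
    """Return picture IDs matching the search sentence: an item must pass the
    vector similarity check, and is ranked by how many of the (possibly
    empty) identified attribute features it also matches."""
    candidates = [item for item in index_library if vector_matches(text_vector, item)]
    ranked = rank_results(candidates, attribute_features)
    return [item["picture_id"] for item in ranked]

# "Last year's two-person group photo of Xiaozhang": the appellation is not
# identified, but the retained time attribute is combined with the text vector.
ids = search(index_library, text_vector=[0.10, -0.02, 0.90],
             attribute_features={"time": "2023-07-15"})  # illustrative values
```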
From the foregoing description of the embodiments, it will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of functional modules is illustrated, and in practical application, the above-described functional allocation may be implemented by different functional modules according to needs, i.e. the internal structure of the apparatus is divided into different functional modules to implement all or part of the functions described above. The specific working processes of the above-described systems, devices and units may refer to the corresponding processes in the foregoing method embodiments, which are not described herein.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; e.g., the division into modules or units is merely a logical functional division, and there may be other divisions in actual implementation; e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections via some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If implemented in the form of a software functional unit and sold or used as a stand-alone product, the integrated unit may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of this embodiment, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to perform all or part of the steps of the methods described in the embodiments. The storage medium includes various media capable of storing program code, such as a flash memory, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disk.
The foregoing is merely illustrative of specific embodiments of the present application, and the scope of the present application is not limited thereto, but any changes or substitutions within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.