Detailed Description
The terms "first", "second", "third", and the like in the description, the claims, and the drawings are used to distinguish between different objects, not to limit a specific order.
In embodiments of the application, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "such as" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
In the related art, in multi-modal search of portrait pictures or videos typified by CLIP, accurately identifying attribute information such as appellations (e.g., person names, nicknames, person relationships) and the number of persons from search sentences describing portrait pictures or videos faces the following difficulties:
1. Ambiguity of person names: user-defined person names are not normalized, for example "Zhang San", "Dither", or "Flower Blossom Bringing Rich and Honour"; in practical applications such names are therefore ambiguous and difficult to identify accurately.
2. Variety of objects modified by quantifiers: a numeral-quantifier phrase can modify many kinds of subjects, so the quantifiers that modify persons must be extracted accurately. For example, "three" in "three pots of flowers" does not quantify persons, whereas "two" in "two-person group photo" does.
3. Privacy and power consumption: for schemes that report user data to the cloud side to train a language model, uploading user data (e.g., the person names the user has annotated in the gallery) to the cloud side may leak user privacy.
4. Identification accuracy: directly matching the search sentence against stored person names very easily causes false identification; for example, a user searching for the lyrics of a song that contains "Zhang San" is not searching for the person "Zhang San". Another approach trains a language model using natural language understanding technology, but due to the limitations of the training corpus it very easily misses recognitions; for example, when searching for "photos of Xiaomei", the language model's word segmentation splits the name "Xiaomei" into the separate characters "Xiao" ("small") and "Mei" ("beautiful"), missing the person.
To solve the above technical problems, the application provides a multimedia content searching method that constructs, in advance and based on existing information, slot trees corresponding to image attribute information, one slot tree per type of attribute information, and constructs a pattern tree of search sentence patterns based on encyclopedia knowledge and a search sentence corpus. Specific image attribute features in a search sentence, such as person names and the number of persons, are identified based on the pattern tree and the slot trees, improving the accuracy of the semantic understanding of the search sentence. Further, matched pictures or videos are retrieved based on two dimensions of features, the image attribute features and the visual semantic vector of the search sentence, improving the accuracy of the search results.
The multimedia content searching method provided by the application is suitable for end-side electronic devices, such as mobile phones, tablet computers, desktop computers, laptop computers, notebook computers, ultra-mobile personal computers (UMPC), handheld computers, netbooks, personal digital assistants (PDA), wearable electronic devices, smart watches, and the like.
Referring to fig. 1, a schematic structural diagram of an electronic device according to an embodiment of the present application is shown.
As shown in fig. 1, the electronic device may include a processor 110, a memory 120, a camera 130, a display screen 140, and the like.
It is to be understood that the configuration illustrated in this embodiment does not constitute a specific limitation on the electronic apparatus. In other embodiments, the electronic device may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 110 may include one or more processing units, and the different processing units may be separate devices or may be integrated into one or more processors.
Memory 120 is used to store computer-executable program code that includes instructions. The processor 110 performs various functional applications and data processing of the electronic device by executing instructions stored in memory. For example, in the present embodiment, the processor 110 may perform a multimedia content search by executing instructions stored in the memory.
The memory 120 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required for at least one function (such as an image or video capture function, an image or video playback function, etc.). The data storage area may store data created during use of the electronic device (e.g., image data, video data), and so on.
The camera 130 is used to capture still images or video. The multimedia content of the present application is an image or video.
The display screen 140 is used to display images, videos, and the like.
In addition, an operating system runs on the above components, and an application may run on the operating system; for example, the applications involved in the present application include a gallery application.
To address the poor search results of current multi-modal multimedia content searching schemes, the multimedia content searching method provided by the embodiment of the application can be applied to the electronic device shown in fig. 1, and in particular can run in the gallery application. Moreover, the method applies to searching images as well as videos.
First, the processing modules related to the multimedia content searching method in the gallery application are introduced. In an embodiment of the present application, the gallery application may include a gallery service, a search module, and a multi-modal understanding module.
The gallery service implements basic functions of the gallery application, such as storing images and videos, displaying the stored images or videos, and receiving and storing attribute information input by the user for multimedia content, for example when the user adds or modifies attribute information.
In the embodiment of the application, the searching module is used for constructing a slot tree, a mode tree and a picture index so as to realize searching of images or videos stored in a gallery.
In the embodiment of the application, the multi-modal understanding module performs visual semantic understanding to convert image or video content into corresponding vectors, and performs vector conversion on search sentences input by the user.
The method for searching multimedia contents according to the present application will be described in detail with reference to fig. 2, and may include the steps of:
s101, the search module builds a slot tree corresponding to the image attribute information based on the existing information.
In the embodiment of the application, the image attribute information includes various information related to the image, for example, may include shooting information (such as shooting time, geographical position and the like) of the image, labeling information (such as name, nickname, person relationship and the like) of a user on an object contained in the image, and an image content label, wherein the content label may be an object contained in the image automatically identified based on the image content, such as sky, sunrise, sunset, seaside, building, children, animals and the like.
The existing information may include at least one of attribute information and encyclopedia knowledge of existing pictures in the gallery.
In the embodiment of the application, a dictionary tree (trie), here called a slot tree, is constructed for some types of attribute information. A dictionary tree is a tree structure in which the common prefixes of different character strings are stored only once, which saves storage space. Meanwhile, the common prefixes can be used to shorten query time, minimizing unnecessary string comparisons and improving query efficiency.
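For illustration only, the following is a minimal Python sketch of such a dictionary tree; the class and method names (TrieNode, Trie, match_prefix) are hypothetical and not part of the embodiment:

```python
class TrieNode:
    """A node of a dictionary tree (trie); children are keyed by single characters."""
    def __init__(self):
        self.children = {}   # character -> TrieNode
        self.is_end = False  # True if a stored word ends at this node

class Trie:
    """Dictionary tree: common prefixes of different strings are stored only once."""
    def __init__(self, words=()):
        self.root = TrieNode()
        for w in words:
            self.insert(w)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def match_prefix(self, text, start=0):
        """Return the longest stored word matching text[start:], or None.
        An unmatched character stops the walk early, avoiding needless comparisons."""
        node, longest = self.root, None
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.is_end:
                longest = text[start:i + 1]
        return longest
```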
In an exemplary embodiment, the information with which a user annotates a portrait in an image, such as a person name, nickname, or person relationship, may be collectively referred to as an "appellation", and the slot tree built for this type of attribute information is referred to as an "appellation slot tree". The appellation slot tree may be built from the appellation information with which the user has annotated pictures in the gallery. In other embodiments, the appellation slot tree may also be extended with the contact names stored in the address book of the electronic device.
In one example, FIG. 3 illustrates an appellation slot tree that includes person names, nicknames, person relationships, etc. In the example illustrated in FIG. 3, the person names include Wang Ming, the nicknames include Xiaomei, Xiao Shuai, etc., and the person relationships include brother, son, daughter, mother, friend, etc. FIG. 3 is only an example; in practical applications the slot tree may include more nodes and a more complex structure, which the present application does not show one by one.
In another exemplary embodiment, separate slot trees may be built for person names, nicknames, person relationships, etc., such as a person name slot tree, a nickname slot tree, and a person relationship slot tree.
The person name slot tree includes only the names corresponding to persons. It can be created from the person names the user has annotated in the gallery, and can further be extended with the contact names in the address book. For example, in one example, FIG. 4 is a schematic diagram of a person name slot tree, where the names include Zhang Sanfeng, Zhang San, Wang Ming, and so on.
The nickname slot tree includes only the nicknames corresponding to portraits. It can be created from the nicknames the user has annotated on portrait images or videos in the gallery, and can further be extended with the contacts' nicknames in instant messaging applications. For example, in one example, FIG. 5 is a schematic diagram of a nickname slot tree, where the nicknames include Flower Blossom Bringing Rich and Honour, Flower Placing Forgetfulness, Summer, Cotton Candy, Little Lovely, and the like.
The person relationship slot tree is a dictionary tree created from words expressing relationships between persons, summarized from encyclopedia knowledge. For example, in one example, FIG. 6 is a schematic diagram of a person relationship slot tree; the person relationships may include brother, sister, "Ge" (elder brother), "Jie" (elder sister), dad, mom, son, daughter, friend, classmate, and so on.
In addition, in the embodiment of the application, a corresponding slot tree, called a label slot tree, can be constructed for the image content labels, built from the content labels annotated in the gallery.
In the embodiment of the application, in order to accurately identify the semantic content of the search sentence input by the user, corresponding slot trees can also be constructed for the number words, quantifiers (measure words), suffix words, and stop words (i.e., words without actual semantics) that may occur in a search sentence.
The number slot tree is a dictionary tree created from the number words that may occur in search sentences; commonly used number words can be summarized from encyclopedia knowledge and the search sentence corpus. For example, in one example, FIG. 7 is a schematic diagram of a number slot tree, where the number words may include two, three, four, etc.
The quantifier slot tree is a dictionary tree created from the quantifiers (measure words) that may occur in search sentences; commonly used quantifiers can be summarized from encyclopedia knowledge and the search sentence corpus. For example, in one example, FIG. 8 is a schematic diagram of a quantifier slot tree; the quantifiers may include "person" ("Ren") and other common measure words used for persons.
The suffix word slot tree is a dictionary tree created from the suffix words that may occur in search sentences; commonly used suffix words can be summarized from the search sentence corpus. For example, FIG. 9 is a schematic diagram of a suffix word slot tree, where the suffix words may include group photo, large group photo, etc.
The stop word slot tree is created from words that may occur in search sentences but carry no actual semantics, such as conjunctions and auxiliary words; common stop words can be summarized from encyclopedia knowledge and the search sentence corpus. For example, in one example, FIG. 10 is a schematic diagram of a stop word slot tree, where the stop words in this example may include "and", "with", and the like.
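Continuing the sketch above, the slot trees described in this section could be instantiated from word lists as follows; the word lists are purely illustrative stand-ins for the gallery annotations, address book entries, and corpus-derived words:

```python
# Hypothetical word lists; in practice they come from gallery annotations,
# the address book, encyclopedia knowledge, and the search sentence corpus.
slot_trees = {
    "appellation": Trie(["Xiaomei", "Xiao Shuai", "brother", "mom", "Wang Ming"]),
    "number":      Trie(["two", "double", "three", "four", "five"]),
    "quantifier":  Trie(["person"]),
    "suffix":      Trie(["photo", "group photo"]),
    "stop":        Trie(["and", "of", "with"]),
}
```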
S102, the search module builds a pattern tree corresponding to search sentences.
In the embodiment of the application, a dictionary tree, called a pattern tree, is constructed over the order in which word types are arranged in a search sentence.
In an exemplary embodiment, rules of the search term corpus (i.e., search terms of images or videos by multiple users) are summarized and analyzed in combination with encyclopedia knowledge, and different types of term arrangements frequently occurring in the search terms are obtained.
This embodiment is described taking search patterns that describe portrait information as an example. As shown in fig. 11, in one example, the pattern tree may include the following three search patterns describing portrait information:
Search pattern 1: [appellation] [stop word] [appellation] [stop word] [number] [quantifier] [suffix]. For example, in the search sentence "Xiaomei and brother's two-person group photo", "Xiaomei" is an appellation, "and" is a stop word, "brother" is an appellation, "'s" ("of") is a stop word, "two" ("double") is a number word, "person" is a quantifier, and "group photo" is a suffix word.
Search pattern 2: [appellation] [stop word] [suffix]. For example, in "Xiao Shuai's photos", "Xiao Shuai" is an appellation, "'s" ("of") is a stop word, and "photos" is a suffix word.
Search pattern 3: [number] [quantifier] [suffix]. For example, in "five-person group photo", "five" is a number word, "person" is a quantifier, and "group photo" is a suffix word.
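A pattern tree can reuse the same trie idea, with slot types in place of characters. The following sketch, under the same illustrative assumptions as before, builds a pattern tree from the three search patterns above:

```python
# Each search pattern is a sequence of slot types; inserting them into a trie
# keyed by slot type yields the pattern tree, sharing common prefixes.
SEARCH_PATTERNS = [
    ["appellation", "stop", "appellation", "stop", "number", "quantifier", "suffix"],
    ["appellation", "stop", "suffix"],
    ["number", "quantifier", "suffix"],
]

class PatternNode:
    def __init__(self):
        self.children = {}    # slot type -> PatternNode
        self.is_end = False   # a complete search pattern ends here

def build_pattern_tree(patterns):
    root = PatternNode()
    for pattern in patterns:
        node = root
        for slot_type in pattern:
            node = node.children.setdefault(slot_type, PatternNode())
        node.is_end = True
    return root

pattern_tree = build_pattern_tree(SEARCH_PATTERNS)
```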
The above steps S101 to S102 show the dictionary tree construction process, which only needs to be performed before the first search rather than before every search; thereafter the dictionary trees are updated as required.
S103, the user adds or modifies the image and the attribute information thereof in the gallery application.
In an exemplary embodiment, the user may add attribute information of the image in a gallery application, e.g., annotate the picture of a sibling with attribute information of a "sibling".
In another exemplary embodiment, the user may also modify annotated attribute information, for example, annotate a picture with "Zhang San" and later change the annotation to "Dither".
In yet another exemplary embodiment, the user may add a new annotation to the image that has been annotated with attribute information, e.g., annotate a picture with a "son" followed by a "little lovely" annotation.
In addition, the user may also modify the image content, for example by graffiti, cropping, applying filters, or adjusting image parameters (e.g., brightness, contrast, saturation, sharpness, etc.).
S104, the gallery service stores images and attribute information thereof.
And the gallery service receives the image and the attribute information thereof and correspondingly stores the image and the attribute information thereof into a gallery database.
S105, the gallery service transmits images to the multi-mode understanding module for visual semantic understanding.
The gallery service transmits the received image to a multi-modal understanding module, and the multi-modal understanding module carries out visual semantic understanding on the image to obtain a vector corresponding to the picture, which is also called a visual semantic vector.
Visual semantic understanding refers to automatically identifying high-level concepts of objects, scenes, actions, etc., and relationships between them, from an image. It can help the computer understand the image more deeply and make a finer analysis and understanding of the image content.
S106, the multi-mode understanding module returns visual semantic vectors to the gallery service.
The multimodal understanding module delivers visual semantic vectors obtained through visual semantic understanding to a gallery service.
S107, the gallery service stores visual semantic vectors.
For example, the gallery service may store visual semantic vectors returned by the multimodal understanding module into a gallery database.
S108, the gallery service transmits visual semantic vectors and attribute information required for constructing indexes to the search module.
In practical applications, a user usually cares about only part of an image's attribute information (such as the shooting time, the geographical location, who the photographed subject is, and the relationship with the subject) and does not care about information such as the image's sensitivity, exposure value, white balance, or shutter speed, so an index only needs to be constructed for the attributes the user cares about. Therefore, the gallery service only needs to pass the visual semantic vector and the attribute information required for index construction to the search module.
For example, the attribute information required for index construction may be agreed upon with the search module in advance and written into the code of the gallery service.
S109, the search module constructs an index information item corresponding to the picture based on the visual semantic vector and the attribute information.
In an exemplary embodiment, what the gallery service sends includes the visual semantic vector and a unique identifier of the picture, also referred to as the picture ID. After receiving the visual semantic vector and the attribute information, the search module writes the visual semantic vector, each item of attribute information, and the picture ID into the corresponding fields of the index library.
For example, if the attribute information of an image includes the shooting time, the shooting location (i.e., geographical location), person names, etc., then an index information item in the index library includes fields such as time, location, name, visual semantic vector, and picture ID. In this example, the shooting time is written into the time field, the shooting location into the location field, the person names into the name field, the visual semantic vector into the visual semantic vector field, and the picture ID into the picture ID field; when all the information has been written into the corresponding fields, the index information item corresponding to the picture is obtained.
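As an illustration only, an index information item might be assembled as follows; the field and function names are assumptions, not the embodiment's actual schema:

```python
def build_index_item(picture_id, visual_vector, attrs):
    """Write the visual semantic vector, attribute values, and picture ID
    into the corresponding fields of one index information item."""
    return {
        "picture_id": picture_id,
        "visual_vector": visual_vector,   # from the multi-modal understanding module
        "time": attrs.get("shooting_time"),
        "location": attrs.get("shooting_location"),
        "name": attrs.get("person_names", []),
    }

index_library = []
index_library.append(build_index_item(
    picture_id="img_0001",
    visual_vector=[0.12, -0.03, 0.88],          # illustrative values
    attrs={"shooting_time": "2023-07-15",
           "shooting_location": "Beijing",
           "person_names": ["Xiaomei", "brother"]},
))
```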
The above-mentioned S103 to S109 show the index construction process, which is not required to be performed before each search, but is performed when the user newly adds or modifies the picture and its attribute information.
In addition, in order to improve the battery life of the electronic device, the above index construction process may be performed while the electronic device is charging or its screen is off, so as to reduce the impact of index construction on power consumption.
S110, the gallery service receives search sentences.
Illustratively, the user may input a search sentence in the search field provided by the gallery application. As shown in FIG. 12, the bottom of the page 100 of the gallery application includes a tab navigation bar 101 with, for example, four tab options: photo 102, album 103, time 104, and find 105; the page 100 is the photo tab interface. The top of the page 100 includes the title "photo" 106 of the current tab interface, and a search field 107 is below the tab title 106; the user may enter a search sentence in the search field 107.
S111, the gallery service transmits search sentences to the search module.
S112, the search module identifies image attribute features in the search sentence based on the pattern tree and the slot trees.
In an exemplary embodiment, the process shown in S112 may include the steps of:
(1) Read the root node of the pattern tree as the current node.
(2) Read all child nodes of the current node.
(3) According to the name of each child node, read the slot nodes in the corresponding slot tree.
(4) Match the characters of the search sentence against the slot nodes one by one in order; if some slot information matches successfully, the corresponding child node in the pattern tree is hit. Otherwise, stop querying subsequent nodes.
(5) Repeat steps (2) to (4); if all characters in the search sentence hit child nodes in the pattern tree, the search sentence hits a search pattern in the pattern tree; otherwise, the search sentence misses the search patterns in the pattern tree.
(6) From the hit search pattern, obtain the related information contained in the search sentence, such as person names, number of persons, sky, building, animals, sunrise and sunset, and the like.
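The following Python sketch ties the above steps together, jointly walking the pattern tree and slot trees assumed in the earlier sketches. It simplifies the described procedure by taking the longest slot match at each position rather than backtracking over shorter matches:

```python
def match_sentence(sentence, pattern_node, slot_trees, pos=0, found=None):
    """Jointly walk the pattern tree and the slot trees over the search sentence.
    Returns a list of (slot_type, word) pairs if the whole sentence hits a
    search pattern, or None if no pattern matches."""
    found = found or []
    if pos == len(sentence):
        # All characters consumed: a hit only if a complete pattern ends here.
        return found if pattern_node.is_end else None
    for slot_type, child in pattern_node.children.items():
        word = slot_trees[slot_type].match_prefix(sentence, pos)
        if word is None:
            continue  # this slot type misses; try the next child node
        result = match_sentence(sentence, child, slot_trees,
                                pos + len(word), found + [(slot_type, word)])
        if result is not None:
            return result
    return None  # stop: no child of the current node matches
```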
S113, the search module transmits the search sentence to the multi-modal understanding module for vector conversion.
S114, the multi-modal understanding module returns the text vector corresponding to the search sentence to the search module.
Specifically, the search module transmits the received search sentence to the multi-modal understanding module; the multi-modal understanding module performs vector conversion on the search sentence to obtain the corresponding text vector, and returns the text vector to the search module.
In an exemplary embodiment, all characters of the search term may be directly converted into corresponding vectors.
In another exemplary embodiment, the multi-modal understanding module may identify non-visual information in the search sentence (also referred to as the first search sentence), delete the words carrying non-visual information to obtain a second search sentence related only to visual information, and then perform vector conversion on the second search sentence to obtain the text vector. For example, if the first search sentence is "sunset shot in the summer of 2023", "2023" is non-visual information, and the second search sentence after deleting the non-visual information is "sunset shot in summer".
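A minimal sketch of this pre-processing, assuming non-visual information can be recognized with simple patterns (here, four-digit years); the actual module may identify non-visual information quite differently:

```python
import re

# Hypothetical patterns for non-visual information, e.g. four-digit years.
NON_VISUAL_PATTERNS = [r"\b\d{4}\b"]

def strip_non_visual(first_sentence):
    """Delete non-visual words from the first search sentence to obtain
    the second search sentence used for text-vector conversion."""
    second = first_sentence
    for pattern in NON_VISUAL_PATTERNS:
        second = re.sub(pattern, "", second)
    return " ".join(second.split())  # collapse leftover whitespace

print(strip_non_visual("sunset shot in summer 2023"))
# -> "sunset shot in summer"
```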
S115, the searching module searches the images matched with the image attribute features and the text vectors corresponding to the search sentences in the index library.
In an exemplary embodiment, the search module compares the image attribute features one by one with the corresponding fields in the index library. Meanwhile, it compares the similarity between the text vector corresponding to the search sentence and each visual semantic vector in the index library; if the similarity is greater than or equal to a threshold, the visual semantic vector is determined to match the search sentence.
The matching results of the image attribute features and of the text vector are then combined to obtain the search results matching the search sentence.
In an exemplary embodiment, the search results may include pictures for which the text vector matches and all image attribute features match, and may also include pictures for which the text vector matches and only some image attribute features match. For example, if the search sentence is "sunset taken in Beijing in the summer of 2023", the search results may include pictures whose shooting time is the summer of 2023, whose shooting location is Beijing, and whose content includes a sunset, i.e., pictures fully matching the search sentence, and may further include sunset pictures taken in other places in the summer of 2023, or sunsets taken at other times.
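The vector side of this matching could look like the following sketch; cosine similarity and the 0.8 threshold are illustrative assumptions, since the embodiment only requires a similarity greater than or equal to a threshold:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

SIMILARITY_THRESHOLD = 0.8  # assumed value

def vector_matches(text_vector, item):
    """An index item matches the search sentence if the similarity between
    its visual semantic vector and the text vector reaches the threshold."""
    return cosine_similarity(text_vector, item["visual_vector"]) >= SIMILARITY_THRESHOLD
```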
And S116, the searching module ranks the searched pictures.
In an exemplary embodiment, the pictures in the search results are ranked from high to low according to how well they match the search sentence. For example, a search sentence corresponds to multiple image attribute features; the more attribute features a picture matches, the higher its matching degree, and the fewer it matches, the lower its matching degree.
Following the example in S115 where the search sentence is "sunset taken in Beijing in the summer of 2023": pictures whose shooting time is the summer of 2023, whose shooting location is Beijing, and whose subject includes a sunset have the highest matching degree; sunset pictures taken in other locations in the summer of 2023 have the second highest; and sunset pictures taken in other locations at other times have the lowest.
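A sketch of this ranking rule, counting how many identified attribute features each index item matches; the field names follow the earlier index sketch and are assumptions:

```python
def matched_attribute_count(item, attribute_features):
    """Count how many identified attribute features an index item matches.
    attribute_features maps an index field to the value extracted from the
    search sentence, e.g. {"time": "2023-07-15", "location": "Beijing"}."""
    count = 0
    for field, value in attribute_features.items():
        stored = item.get(field)
        if stored == value or (isinstance(stored, list) and value in stored):
            count += 1
    return count

def rank_results(items, attribute_features):
    """Rank matched pictures from high to low by number of matched features."""
    return sorted(items,
                  key=lambda it: matched_attribute_count(it, attribute_features),
                  reverse=True)
```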
S117, the search module returns the picture search result to the gallery service.
In the embodiment of the application, the picture search result obtained by the search module is a set of picture IDs matched with the search statement.
S118, the gallery service displays the picture search results.
After receiving the search results fed back by the search module, the gallery service reads the corresponding pictures from the gallery database according to the picture IDs in the search results and displays them on the search result display interface.
As shown in fig. 13, the interface 200 is a picture search result interface of the gallery application. The interface 200 includes a search bar 201 containing the searched content, for example "two-person group photo of Xiaomei and Dither". A folder display area 202 is located below the search bar 201 and displays the folders to which the found pictures belong. A search result display area 203 is located below the folder display area 202 and displays a preset number of pictures.
According to the multimedia content searching method provided by this embodiment, slot trees corresponding to image attribute information are constructed in advance based on existing information, one slot tree per type of attribute information, and a pattern tree of search sentence patterns is constructed based on encyclopedia knowledge and the search sentence corpus. After receiving a search sentence input by the user, the image attribute features contained in the search sentence are identified based on the pattern tree and the slot trees, and matching pictures are searched based on the identified image attribute features and the text vector corresponding to the search sentence. By using the pattern tree of search sentences and the slot trees of attribute information to accurately identify the image attribute features in the search sentence, the method improves the accuracy of the semantic understanding of the search sentence and ultimately the accuracy of the search results.
In addition, the method is executed on the electronic device, so user data (such as the person names and other information the user has annotated on pictures or videos) does not need to be reported to a cloud server, which improves the security of user data. Moreover, the method identifies image attribute features by traversing the pattern tree and the slot trees; since both are dictionary trees, lookups are efficient, the amount of computation is small, and the power consumption is low.
The following describes the process of identifying image attribute features in a search sentence shown in S112 with examples:
Example 1: for the search sentence "Xiaomei and brother's two-person group photo", the process of identifying image attribute features is as follows:
(1) Read the root node of the pattern tree as the current node;
(2) As shown in fig. 11, read the child nodes of the current node from the pattern tree, namely the number node and the appellation node;
(3) Read the slot nodes from the number slot tree, and read the slot nodes from the appellation slot tree;
(4) Read the first character of the search sentence, "Xiao" ("small"); as shown in fig. 7, no node containing "Xiao" is found in the number slot tree, so it is determined that the first character of the search sentence does not hit the number node.
(5) As shown in fig. 3, a "Xiao" node is found among the child nodes of the root node of the appellation slot tree; the next character after "Xiao", namely "Mei", is further read from the search sentence and compared with the child nodes of the "Xiao" node in the appellation slot tree. As shown in fig. 3, the "Xiao" node has two child nodes, "Mei" and "Shuai", and the "Mei" node is hit.
(6) Read the child node of the appellation node in the pattern tree, namely the stop word node; read the next character after "Mei" from the search sentence, namely "and"; search the stop word slot tree for a node containing "and", and determine that "and" hits the stop word node.
(7) Read the child nodes of the stop word node in the pattern tree, namely the appellation node and the suffix node; further read the next character after "and" from the search sentence, namely the first character of "brother" ("Ge"). Find a node containing "Ge" in the appellation slot tree, then continue reading the next character of the search sentence, a second "Ge", which matches a child node of the "Ge" node in the appellation slot tree; since that child node is a leaf node, determine that "brother" ("Gege") in the search sentence hits the appellation node in the pattern tree.
(8) Read the child node of the appellation node in the pattern tree, namely the stop word node; read the next character after the second "Ge" from the search sentence, namely "of" ("De"); find the "of" node in the stop word slot tree, and since it is a leaf node, determine that this character in the search sentence hits the stop word node in the pattern tree.
(9) Read the next node after the stop word node in the pattern tree, namely the number node; read the next character to be compared in the search sentence, "double" ("Shuang"); find the node containing "double" in the number slot tree, and since the "double" node is a leaf node, determine that "double" hits the number node in the pattern tree.
(10) Read the next node to be compared in the pattern tree, namely the quantifier node; read the next character in the search sentence, "person" ("Ren"); find the "person" node in the quantifier slot tree, and since it is a leaf node, determine that "person" hits the quantifier node in the pattern tree.
(11) Read the next node in the pattern tree, namely the suffix node; read the next character in the search sentence, the first character of "group photo" ("He"), and find the corresponding node in the suffix word slot tree. That node is not a leaf node, so continue reading the next character in the search sentence, the second character of "group photo" ("Ying"); the child nodes of the "He" node in the suffix word slot tree include a "Ying" node, and since the "Ying" node is a leaf node, "group photo" in the search sentence hits the suffix node in the pattern tree. The suffix node is a leaf node of the pattern tree, so it is determined that the current search sentence hits the search pattern [appellation] [stop word] [appellation] [stop word] [number] [quantifier] [suffix].
(12) According to the search pattern hit by the search sentence, determine that "Xiaomei" and "brother" are appellations of persons and that "double" specifies the number of persons as two.
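Using the matching sketch from S112, Example 1 reduces to a single call; the English words are hypothetical stand-ins for the Chinese characters walked through above:

```python
# The sentence is processed character by character, so the illustrative
# English stand-ins are simply concatenated without separators.
sentence = "Xiaomei" + "and" + "brother" + "of" + "double" + "person" + "group photo"
print(match_sentence(sentence, pattern_tree, slot_trees))
# -> [('appellation', 'Xiaomei'), ('stop', 'and'), ('appellation', 'brother'),
#     ('stop', 'of'), ('number', 'double'), ('quantifier', 'person'),
#     ('suffix', 'group photo')]
```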
Example 2: for the search sentence "Xiaozhang and brother's two-person group photo", the process of identifying image attribute features is as follows:
(1) Read the root node of the pattern tree as the current node;
(2) Read the child nodes of the root node in the pattern tree, namely the number node and the appellation node;
(3) Read the slot nodes from the number slot tree, and read the slot nodes from the appellation slot tree;
(4) Read the first character of the search sentence, "Xiao"; as shown in fig. 7, no node containing "Xiao" is found in the number slot tree, so it is determined that the first character of the search sentence does not hit the number node.
(5) As shown in fig. 3, a "Xiao" node is found among the child nodes of the root node of the appellation slot tree; the next character after "Xiao", namely "Zhang", is further read from the search sentence and compared with the child nodes of the "Xiao" node in the appellation slot tree. As shown in fig. 3, the "Xiao" node only has the two child nodes "Mei" and "Shuai" and contains no "Zhang" node, i.e., "Xiaozhang" does not hit any appellation node.
(6) The query ends: the current search sentence does not hit any search pattern, and no person name or number of persons is identified.
In this example, when the search proceeds further based on the search sentence, only the similarity between the text vector corresponding to the search sentence and the visual semantic vectors in the index library is compared to obtain the images matching the search sentence.
Example 3: for the search sentence "three pots of flowers", the process of identifying image attribute features is as follows:
(1) to (3): the same as the first three steps of Examples 1 and 2, not repeated here.
(4) Read the first character of the search sentence, "three", and find the "three" node in the number slot tree; since the node is a leaf node, determine that the search sentence starts with a number. Then read the second character of the search sentence, "pot"; it does not hit any slot node corresponding to the quantifier node in the pattern tree, i.e., it does not hit any node of the quantifier slot tree, so the current pattern is not hit.
(5) The first character "three" does not match any node in the appellation slot tree, so determine that the search sentence does not begin with an appellation.
(6) Stop the query and determine that the current search sentence does not hit any search pattern.
This example is similar to Example 2 above: when the search proceeds further based on the search sentence, the similarity between the text vector corresponding to the search sentence and the visual semantic vectors in the index library is compared to obtain the images matching the search sentence.
In addition, in other embodiments, consider the search sentence "last year's two-person group photo of Xiaozhang". As can be seen from Example 2, "Xiaozhang" is not matched successfully, i.e., the person name and the number of persons contained in the search sentence cannot be identified. In this scenario, the time attribute information "last year" in the search sentence is retained and compared with the time field in the index library; meanwhile, similarity comparison is performed between the text vector of the search sentence and the visual semantic vectors in the index library, and the images matching the search sentence are obtained by combining the time attribute information and the text vector.
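A sketch of this fallback, reusing the helpers assumed earlier: whatever attribute features were identified (possibly only the time) are combined with the vector similarity; all names and values are illustrative:

```python
def search(index_library, text_vector, attribute_features):
    """Return picture IDs matching the search sentence: an item must pass the
    vector similarity check, and is ranked by how many of the (possibly
    empty) identified attribute features it also matches."""
    candidates = [item for item in index_library if vector_matches(text_vector, item)]
    ranked = rank_results(candidates, attribute_features)
    return [item["picture_id"] for item in ranked]

# "Last year's two-person group photo of Xiaozhang": the appellation is not
# identified, but the retained time attribute is combined with the text vector.
ids = search(index_library, text_vector=[0.10, -0.02, 0.90],
             attribute_features={"time": "2023-07-15"})  # illustrative values
```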
From the foregoing description of the embodiments, it will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of functional modules is illustrated, and in practical application, the above-described functional allocation may be implemented by different functional modules according to needs, i.e. the internal structure of the apparatus is divided into different functional modules to implement all or part of the functions described above. The specific working processes of the above-described systems, devices and units may refer to the corresponding processes in the foregoing method embodiments, which are not described herein.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; e.g., the division into modules or units is merely a logical functional division, and there may be other divisions in actual implementation; e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections via some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If implemented in the form of a software functional unit and sold or used as a stand-alone product, the integrated unit may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of this embodiment, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to perform all or part of the steps of the methods described in the embodiments. The storage medium includes various media capable of storing program code, such as a flash memory, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disk.
The foregoing is merely illustrative of specific embodiments of the present application, and the scope of the present application is not limited thereto, but any changes or substitutions within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.