
CN119961466A - Multimedia content search method and device - Google Patents


Info

Publication number
CN119961466A
Authority
CN
China
Prior art keywords
tree
slot
search
character
node
Prior art date
Legal status
Pending
Application number
CN202311428944.4A
Other languages
Chinese (zh)
Inventor
王亚猛
罗红枫
刘华兴
徐超劲
Current Assignee
Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Priority date
Filing date
Publication date
Application filed by Honor Device Co Ltd filed Critical Honor Device Co Ltd
Priority to CN202311428944.4A priority Critical patent/CN119961466A/en
Publication of CN119961466A publication Critical patent/CN119961466A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract


The present application provides a multimedia content search method and device. After receiving a search statement for multimedia content input by a user, the solution identifies the image attribute features contained in the search statement based on a pre-built pattern tree and slot tree. For example, for multimedia content containing character objects, the character title, the number of characters, etc. can be identified. Furthermore, matching images are searched from an index library based on the identified image attribute features and the text vector corresponding to the search statement. The method uses the pattern tree of the search statement and the slot tree corresponding to the attribute information to accurately identify the image attribute features (such as character titles and the number of characters, etc.) in the search statement, thereby improving the accuracy of the semantic understanding results of the search statement and ultimately improving the accuracy of the search results.

Description

Multimedia content searching method and device
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for searching multimedia content.
Background
With the continuous development of artificial intelligence technology, multi-modal models enable a computer to understand multi-modal media resources. Multi-modal models represented by CLIP (Contrastive Language-Image Pretraining) understand the relationships between resources of different modalities, language and vision, by encoding images and text into a unified vector space. However, due to limitations of the training data sets, current multi-modal models often have difficulty mapping a user's personalized search statements describing image information to the correct images, resulting in poor search results.
Disclosure of Invention
In view of the above, the present application provides a method and apparatus for searching multimedia content to solve the above problems, and the disclosed technical solution is as follows:
The application provides a multimedia content searching method, applied to an electronic device, comprising: receiving a search statement for multimedia content; identifying image attribute features contained in the search statement based on a pre-constructed mode tree and slot trees, wherein the mode tree represents a plurality of description modes of the search statement, each description mode comprises at least two attribute types, each attribute type corresponds to one slot tree, one slot tree comprises one type of attribute information of the multimedia content, and the image attribute features comprise non-visual information of the multimedia content; converting the search statement into a text vector; searching, based on the image attribute features and the text vector, a pre-constructed index library for multimedia content search results matching the search statement; and displaying the multimedia content search results. It can be seen that the method accurately identifies the image attribute features in the search statement by using the mode tree of the search statement and the slot trees corresponding to the attribute information, wherein the image attribute features represent non-visual information of the multimedia content, such as person names, the number of persons, and the like. By using the method, the non-visual attribute features in the search statement can be accurately identified, so that the accuracy of the semantic understanding result of the search statement is improved, and finally the accuracy of the search results is improved.
In one possible implementation, the index library includes a plurality of index information items corresponding to the multimedia content, and one index information item includes attribute information corresponding to the multimedia content and a visual semantic vector corresponding to the multimedia content. Therefore, the search can be directly performed in the index base based on the attribute features and the visual semantic vectors, and the accuracy of the search results is improved.
In one possible implementation, searching, based on the image attribute features and the text vector, the pre-constructed index library for multimedia content search results matching the search statement comprises: comparing the image attribute features one by one with the corresponding information fields of the index information items in the index library and screening out a first index information item set matching the image attribute features; comparing the similarity between the text vector and the visual semantic vectors and screening out a second index information item set whose similarity is greater than or equal to a preset threshold; and obtaining the multimedia content search results according to the first index information item set and the second index information item set. Therefore, the method searches the index library for images matching the search statement based on information of two dimensions, the image attribute features and the visual semantic features, which improves the accuracy of the search results.
In one possible implementation, identifying the image attribute features contained in the search statement based on the pre-constructed mode tree and slot trees comprises: reading all first child nodes of the root node in the mode tree; sequentially reading first-type slot nodes from the slot trees with the same names according to the slot names of the first child nodes; querying, character by character in the order of the search statement, whether a matching first-type slot node exists, and if a matching first-type slot node exists, determining that the current character in the search statement successfully matches the current node in the mode tree, the current node being the node in the mode tree corresponding to the successfully matched slot node; reading all second child nodes of the current node in the mode tree, reading the corresponding second-type slot nodes according to the slot names of the second child nodes, and querying the remaining characters in the search statement in order until a leaf node in the mode tree is successfully matched, thereby determining that the mode tree contains a target search mode matching the search statement; and determining the attribute features corresponding to the multimedia content based on the target search mode and the search statement. It can be seen that the method first reads a node in the mode tree, then determines, according to the slot tree whose name matches the slot name of that node, whether a character of the search statement matches the node in the mode tree, and if so, continues to match the following characters of the search statement until all characters in the search statement are successfully matched with corresponding nodes in the mode tree; the image attribute features contained in the search statement are then determined according to the slot names of the matched nodes in the mode tree. The scheme stores the description modes of the search statement (namely the mode tree) and the attribute information of the multimedia content (namely the slot trees) in the form of dictionary trees, a data structure with high query efficiency and a small storage footprint.
In one possible implementation mode, the step of searching whether the first type of the slot nodes matched with each other exists or not according to the character sequence in the search statement comprises the steps of reading first characters in the search statement, searching whether the slot tree corresponding to each first sub-node contains the slot nodes of the first characters or not respectively, judging whether the first matched slot node corresponding to the first characters is a leaf node in the first slot tree or not if the first slot tree containing the first characters exists, reading the next characters of the first characters if the slot node corresponding to the first characters in the first slot tree is not the leaf node, continuously searching whether all sub-nodes of the first matched slot node in the first slot tree contain the next characters or not until the slot node which is successfully matched is the leaf node, and determining that the first type of the slot node matched with the characters in the search statement exists.
In one possible implementation, determining the attribute characteristics corresponding to the multimedia content based on the target search mode and the search sentence comprises determining that content matched with the character name node in the search sentence is character name based on the character name node contained in the target search mode, and determining that content matched with the character number node in the search sentence is character number based on the character number node contained in the target search mode. Therefore, the scheme can accurately identify the attribute characteristics such as person names, person numbers and the like in the search statement, so that the accuracy of the search result is improved.
In one possible implementation, the process of constructing the slot tree includes constructing the slot tree corresponding to the attribute information based on the existing information, wherein the existing information includes at least one of attribute information and encyclopedia knowledge corresponding to the existing multimedia content in the gallery, the attribute information corresponding to the multimedia content includes at least one of label information input by a user to the multimedia content, shooting information of the multimedia content, and an entity object label identified by the electronic device to the multimedia content, and the label information includes a person name including at least one of a name, a nickname, and a person relationship for the multimedia content including the person object. Therefore, the scheme constructs the slot tree based on the attribute information corresponding to the existing multimedia content of the gallery and the encyclopedia knowledge, and the construction process is simple.
In one possible implementation, the slot tree includes a persona-designation slot tree, a number word slot tree, a stop word slot tree, and a suffix word slot tree, the persona-designation slot tree includes at least one of a persona name, a nickname, and a persona relationship, or the slot tree includes a name slot tree, a nickname slot tree, a persona relationship slot tree, a number word slot tree, a stop word slot tree, and a suffix word slot tree.
In one possible implementation, constructing the slot trees corresponding to the attribute information based on the existing information comprises: constructing a title slot tree based on the title annotation information corresponding to the existing multimedia content in the electronic device and encyclopedia knowledge of person relationships; constructing a number word slot tree based on encyclopedia knowledge and the number words contained in a search statement corpus obtained in advance; constructing a quantifier slot tree based on encyclopedia knowledge and the quantifiers contained in the search statement corpus; constructing a stop word slot tree based on the words without practical meaning contained in encyclopedia knowledge and the search statement corpus; and constructing a suffix word slot tree based on the suffix words contained in the search statement corpus.
In one possible implementation, the process of constructing the pattern tree comprises analyzing word arrangement sequences of corresponding attributes of each slot tree contained in a search sentence corpus obtained in advance based on encyclopedia knowledge to obtain description patterns corresponding to the search sentence, and constructing the pattern tree based on various description patterns.
In a second aspect, the application also provides an electronic device comprising one or more processors, a memory and a touch screen, the memory being for storing program code, the processor being for running the program code such that the electronic device implements the method of searching for multimedia content as in any of the first aspects.
In a third aspect, the present application also provides a computer readable storage medium having instructions stored thereon which, when executed on an electronic device, cause the electronic device to perform the multimedia content searching method of any of the first aspects.
In a fourth aspect, the present application further provides a computer program product which, when run on an electronic device, causes the electronic device to implement the multimedia content searching method according to any one of the first aspects.
It should be appreciated that the description of technical features, aspects, benefits or similar language in the present application does not imply that all of the features and advantages may be realized with any single embodiment. Conversely, it should be understood that the description of features or advantages is intended to include, in at least one embodiment, the particular features, aspects, or advantages. Therefore, the description of technical features, technical solutions or advantageous effects in this specification does not necessarily refer to the same embodiment. Furthermore, the technical features, technical solutions and advantageous effects described in the present embodiment may also be combined in any appropriate manner. Those of skill in the art will appreciate that an embodiment may be implemented without one or more particular features, aspects, or benefits of a particular embodiment. In other embodiments, additional features and advantages may be recognized in certain embodiments that may not be present in all embodiments.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 2 is a flowchart of a multimedia searching method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a title slot tree provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of a name slot tree according to an embodiment of the present application;
FIG. 5 is a diagram of a nickname slot tree provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of a character relationship slot tree provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of a digital slot tree according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a quantifier slot tree provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of a suffix word slot tree provided by an embodiment of the application;
FIG. 10 is a schematic diagram of a stop word slot tree provided by an embodiment of the present application;
FIG. 11 is a schematic diagram of a pattern tree provided by an embodiment of the present application;
FIG. 12 is a schematic diagram of a gallery application interface provided by an embodiment of the application;
FIG. 13 is a schematic diagram of another gallery application interface provided by an embodiment of the application.
Detailed Description
The terms first, second, third and the like in the description and in the claims and in the drawings are used for distinguishing between different objects and not for limiting the specified order.
In embodiments of the application, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "e.g." in an embodiment should not be taken as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
In the related art, in a scene of multi-modal search of portrait pictures or videos represented by CLIP, in order to accurately identify attribute information such as person names (e.g., person names, nicknames, person relationships, etc.) and the number of persons from search words describing portrait pictures or videos, there are the following difficulties:
1. Ambiguity of person titles: user-defined person titles are not normalized (for example, Zhang San, Dither, "flower blossom bringing rich and honour", and the like), so in practice the titles are ambiguous and difficult to identify accurately.
2. Variety of the objects modified by quantifiers: a quantifier can modify many kinds of subjects, and the quantifiers that actually modify persons need to be extracted accurately. For example, "three" in "three pots of flowers" does not quantify persons, whereas "double (two)" in "two-person group photo" does.
3. Privacy and power consumption: for schemes that report user data to the cloud side to train a language model, uploading user data (e.g., the person titles that users annotate in the gallery) to the cloud side may lead to leakage of user privacy.
4. Recognition accuracy: directly matching the search statement against person titles easily causes false recognition; for example, a user searching for the lyrics of a song that mentions "Zhang San" is not searching for photos of "Zhang San". Another approach is to train a language model with natural language understanding technology, but due to the limitation of the training corpus this scheme easily causes missed recognition; for example, when searching for "a photo of Xiaomei", the model's word segmentation may split the nickname into "small" and "beautiful".
In order to solve the technical problems, the application provides a multimedia content searching method, which constructs a slot tree corresponding to image attribute information based on existing information in advance, and constructs a slot tree according to each type of attribute information. And constructing a pattern tree of search sentence patterns based on the encyclopedia knowledge and the search sentence corpus. And identifying specific image attribute features in the search statement, such as person names, person numbers and the like, based on the mode tree and the slot tree, namely improving the accuracy of semantic understanding results of the search statement. Further, the matched picture or video is searched based on the image attribute characteristics and the characteristics of two dimensions of the visual semantic vector of the search statement, so that the accuracy of the search result is improved.
The multimedia content searching method provided by the application is applicable to end-side electronic devices, such as mobile phones, tablet computers, desktop computers, laptop computers, notebook computers, ultra-mobile personal computers (UMPC), handheld computers, netbooks, personal digital assistants (PDA), wearable electronic devices, smart watches, and the like.
Referring to fig. 1, a schematic structural diagram of an electronic device according to an embodiment of the present application is shown.
As shown in fig. 1, the electronic device may include a processor 110, a memory 120, a camera 130, a display screen 140, and the like.
It is to be understood that the configuration illustrated in this embodiment does not constitute a specific limitation on the electronic apparatus. In other embodiments, the electronic device may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 110 may include one or more processing units, and the different processing units may be separate devices or may be integrated into one or more processors.
Memory 120 is used to store computer-executable program code that includes instructions. The processor 110 performs various functional applications and data processing of the electronic device by executing instructions stored in memory. For example, in the present embodiment, the processor 110 may perform a multimedia content search by executing instructions stored in the memory.
The memory 120 may include a stored program area and a stored data area. The storage program area may store an application program (such as a capturing function of an image or video, a playing function of an image or video, etc.) required for at least one function of the operating system. The storage data area may store data created during use of the electronic device (e.g., image data, video data, etc.), and so on.
The camera 130 is used to capture still images or video. The multimedia content of the present application is an image or video.
The display screen 140 is used to display images, videos, and the like.
In addition, an operating system runs on these components, and an application may run on the operating system. For example, the applications involved in the present application include a gallery application.
In order to solve the problem of poor searching effect of the current multi-mode multimedia content searching scheme, the method for searching the multimedia content provided by the embodiment of the application can be applied to the electronic equipment shown in fig. 1, and particularly can be operated in a gallery application. Moreover, the method can be applied to searching of images and also to searching of videos.
First, a processing module related to a multimedia content searching method in a gallery application is introduced, and in an embodiment of the present application, the gallery application may include a gallery service, a searching module, and a multi-modal understanding module.
The gallery service is used for realizing basic functions of gallery application, such as storing images and videos, further displaying the stored images or videos, and receiving and storing attribute information corresponding to the multimedia content input by a user, such as adding attribute information or modifying attribute information.
In the embodiment of the application, the searching module is used for constructing a slot tree, a mode tree and a picture index so as to realize searching of images or videos stored in a gallery.
In the embodiment of the application, the multi-modal understanding module is used for performing visual semantic understanding on image content or video content to convert it into corresponding vectors, and for performing vector conversion on the search statements input by the user.
The method for searching multimedia contents according to the present application will be described in detail with reference to fig. 2, and may include the steps of:
s101, the search module builds a slot tree corresponding to the image attribute information based on the existing information.
In the embodiment of the application, the image attribute information includes various information related to the image, for example, may include shooting information (such as shooting time, geographical position and the like) of the image, labeling information (such as name, nickname, person relationship and the like) of a user on an object contained in the image, and an image content label, wherein the content label may be an object contained in the image automatically identified based on the image content, such as sky, sunrise, sunset, seaside, building, children, animals and the like.
The existing information may include at least one of attribute information and encyclopedia knowledge of existing pictures in the gallery.
In the embodiment of the application, a dictionary tree, also called a slot tree, is constructed for part of type attribute information. The dictionary tree is a tree structure, and common prefixes of different character strings do not need to be repeatedly stored, so that the storage space can be saved. Meanwhile, the common prefix of the character strings can be utilized to reduce the inquiry time, reduce unnecessary character string comparison to the maximum extent and improve the inquiry efficiency.
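As a rough illustration only, the dictionary tree (trie) described above can be sketched in Python as follows; the class name, methods, and sample slot values below are hypothetical and are not taken from the patent's figures.

```python
# Minimal sketch of a slot tree (dictionary tree / trie), assuming each node
# keeps its children keyed by a single character and a flag marking where a
# complete slot value ends.
class SlotTrie:
    def __init__(self):
        self.children = {}    # character -> SlotTrie
        self.is_leaf = False  # True if a complete slot value ends at this node

    def insert(self, word: str) -> None:
        node = self
        for ch in word:
            node = node.children.setdefault(ch, SlotTrie())
        node.is_leaf = True

    def longest_match(self, text: str, start: int = 0) -> int:
        """Length of the longest slot value matching text[start:], or 0 if none."""
        node, length, best = self, 0, 0
        for ch in text[start:]:
            if ch not in node.children:
                break
            node = node.children[ch]
            length += 1
            if node.is_leaf:
                best = length  # remember the longest complete value seen so far
        return best

# Hypothetical title slot tree built from a few made-up annotations.
appellation_tree = SlotTrie()
for value in ["Xiaomei", "Xiao Shuai", "brother", "mom"]:
    appellation_tree.insert(value)
print(appellation_tree.longest_match("Xiaomei and brother"))  # -> 7 ("Xiaomei")
```

Because "Xiaomei" and "Xiao Shuai" share the prefix "Xiao", the prefix is stored only once, which is the storage and lookup advantage described above.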
In an exemplary embodiment, the information with which a user annotates a portrait in an image, such as a name, a nickname, or a person relationship, may be collectively referred to as a "title" (appellation), and the slot tree built for this type of attribute information is referred to as a "title slot tree". The title slot tree may be constructed from the title information that the user annotates for pictures in the gallery. In other embodiments, the title slot tree may also be extended with the contact names stored in the address book of the electronic device.
In one example, FIG. 3 illustrates a title slot tree that includes person names, nicknames, person relationships, and the like. In the example shown in FIG. 3, the names include Wang Ming, the nicknames include Xiaomei, Xiao Shuai, and the like, and the person relationships include brother, son, daughter, mother, friend, and the like. FIG. 3 is only an example; in practical applications the slot tree may include more nodes and a more complex structure, which are not shown one by one in the present application.
In another exemplary embodiment, slot trees, such as a person name slot tree, a nickname slot tree, a persona relationship slot tree, etc., may be built separately for person names, nicknames, persona relationships, etc.
The name slot tree only includes the names corresponding to the persons; it can be created from the person names annotated by the user in the gallery and can further be extended with the contact names in the address book. For example, in one example, FIG. 4 is a schematic diagram of a person name slot tree, where the names include Zhang San, Zhang Sanfeng, Wang Ming, and so on.
The nickname slot tree only comprises nicknames corresponding to the portraits, can be created based on nicknames marked by users on portrait images or videos in the gallery, and further can be expanded according to nicknames of contacts in the instant messaging application. For example, in one example, FIG. 5 is a schematic diagram of a nickname slot tree, such as nicknames including flower blossom bringing rich and honour, flower placing forgetfulness and apprehension, summer, cotton candy, small lovely, and the like.
The person relationship slot tree is a dictionary tree created from words expressing relationships between persons, summarized from encyclopedia knowledge. For example, in one example, FIG. 6 is a schematic diagram of a person relationship slot tree; the person relationships may include brother, sister, dad, mom, son, daughter, friend, classmate, and so on.
In addition, in the embodiment of the application, a corresponding slot tree, also called a label slot tree, can be constructed for the image content labels. And constructing a label slot tree according to the content labels of the gallery labels.
In the embodiment of the application, in order to accurately identify the semantic content of the search statement input by the user, corresponding slot trees can be respectively constructed for the number words, quantifiers (measure words), suffix words, and stop words (i.e., words without actual semantics) that may appear in the search statement.
The number slot tree is a dictionary tree created based on numbers that may occur in the search term, and commonly used digital words can be summarized according to encyclopedia knowledge and corpus generalization of the search term. For example, in one example, FIG. 7 is a schematic diagram of a number slot tree, where the number may include two, three, four, etc.
The quantifier slot tree is a dictionary tree created from the quantifiers (measure words) that may appear in search statements; commonly used quantifiers can be summarized from encyclopedia knowledge and the search statement corpus. For example, in one example, FIG. 8 is a schematic diagram of a quantifier slot tree; the quantifiers may include measure words commonly used to count persons, such as "person".
The suffix word slot tree is a dictionary tree created based on suffix words that may occur in the search term, and the dictionary tree may be created by summarizing commonly used suffix words from the corpus of the search term. For example, FIG. 9 is a schematic diagram of a suffix word slot tree, in which the suffix word may include a group photo, a large group photo, etc.
The stop word slot tree is based on words which may occur in search sentences and have no actual semantics, such as conjunctions, auxiliary words and the like, and common stop words can be obtained based on encyclopedia knowledge and summarization of search sentence corpus. For example, in one example, FIG. 10 is a schematic diagram of a stop word slot tree, where the stop words in this example may include AND, heel, and the like.
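Purely as a sketch of how the different slot trees named above might be populated from their respective sources, the snippet below builds one small trie per attribute type; every source list here is invented for illustration and does not come from the patent.

```python
# Sketch: one dictionary tree per attribute type, built from hypothetical sources.
def build_trie(values):
    root = {}
    for value in values:
        node = root
        for ch in value:
            node = node.setdefault(ch, {})
        node["$"] = True  # "$" marks the end of a complete slot value
    return root

# All lists below are illustrative stand-ins for the sources described above.
slot_sources = {
    "appellation": ["Xiaomei", "Xiao Shuai", "brother", "mom"],   # gallery labels, contacts
    "number":      ["two", "double", "three", "four", "five"],    # encyclopedia + corpus
    "quantifier":  ["person", "people"],                          # measure words for persons
    "suffix":      ["group photo", "photo"],                      # common search suffixes
    "stop word":   ["and", "of", "with"],                         # words without real semantics
}
slot_trees = {name: build_trie(values) for name, values in slot_sources.items()}
```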
S102, the search module builds a mode tree corresponding to the search statement.
In the embodiment of the application, a dictionary tree, also called a pattern tree, is constructed for the arrangement sequence of word types in a search sentence.
In an exemplary embodiment, rules of the search term corpus (i.e., search terms of images or videos by multiple users) are summarized and analyzed in combination with encyclopedia knowledge, and different types of term arrangements frequently occurring in the search terms are obtained.
The present embodiment is described taking a search mode describing portrait information as an example, and as shown in fig. 11, in an example, a mode tree may include the following three search modes describing portrait information:
Search pattern 1: [appellation] [stop word] [appellation] [stop word] [number] [quantifier] [suffix]. For example, in the search statement "two-person group photo of brother and Xiaomei", "brother" is an appellation, "and" is a stop word, "Xiaomei" is an appellation, "of" is a stop word, "two (double)" is a number word, "person" is a quantifier, and "group photo" is a suffix word.
Search pattern 2: [appellation] [stop word] [suffix]. For example, in "photos of Xiao Shuai", "Xiao Shuai" is an appellation, "of" is a stop word, and "photo" is a suffix word.
Search pattern 3: [number] [quantifier] [suffix]. For example, in "five-person group photo", "five" is a number word, "person" is a quantifier, and "group photo" is a suffix word.
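As a hedged sketch of how the three description patterns listed above might be stored in a single pattern tree, i.e. a trie keyed by slot type rather than by character, the snippet below encodes them as nested dictionaries; the representation is an assumption, not the patent's actual data structure.

```python
# Sketch: the three portrait search patterns stored as one trie keyed by slot type.
# "$" marks a pattern leaf, i.e. the end of a complete description pattern.
PATTERNS = [
    ["appellation", "stop word", "appellation", "stop word", "number", "quantifier", "suffix"],
    ["appellation", "stop word", "suffix"],
    ["number", "quantifier", "suffix"],
]

pattern_tree = {}
for pattern in PATTERNS:
    node = pattern_tree
    for slot_name in pattern:
        node = node.setdefault(slot_name, {})
    node["$"] = True

# The first level of the trie tells the matcher which slot types a query may start with.
print(sorted(k for k in pattern_tree if k != "$"))  # ['appellation', 'number']
```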
The above-mentioned steps S101 to S102 show a dictionary tree construction process, which is only performed before the first search, and is not performed before each search, but is performed by updating the dictionary tree as required.
S103, the user adds or modifies the image and the attribute information thereof in the gallery application.
In an exemplary embodiment, the user may add attribute information of the image in a gallery application, e.g., annotate the picture of a sibling with attribute information of a "sibling".
In another exemplary embodiment, the user may also modify annotated attribute information, for example, first annotate a picture with "Zhang San" and then modify the annotation to "Dither".
In yet another exemplary embodiment, the user may add a new annotation to the image that has been annotated with attribute information, e.g., annotate a picture with a "son" followed by a "little lovely" annotation.
In addition, the user may also modify the image content, such as graffiti, cropping, filters, and adjusting image parameters (e.g., brightness, contrast, saturation, sharpness, etc.).
S104, the gallery service stores images and attribute information thereof.
And the gallery service receives the image and the attribute information thereof and correspondingly stores the image and the attribute information thereof into a gallery database.
S105, the gallery service transmits images to the multi-mode understanding module for visual semantic understanding.
The gallery service transmits the received image to a multi-modal understanding module, and the multi-modal understanding module carries out visual semantic understanding on the image to obtain a vector corresponding to the picture, which is also called a visual semantic vector.
Visual semantic understanding refers to automatically identifying high-level concepts of objects, scenes, actions, etc., and relationships between them, from an image. It can help the computer understand the image more deeply and make a finer analysis and understanding of the image content.
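A very loose sketch of the interface the multi-modal understanding module might expose is given below; the encoder is a placeholder rather than the CLIP API or the patent's actual model, and the vector dimension is an arbitrary assumption.

```python
# Sketch: producing an L2-normalized visual semantic vector for an image.
import numpy as np

def encode_image(image_pixels: np.ndarray) -> np.ndarray:
    """Placeholder for a vision encoder; a real module would run a multimodal model here."""
    rng = np.random.default_rng(0)
    return rng.standard_normal(512)

def visual_semantic_vector(image_pixels: np.ndarray) -> np.ndarray:
    vector = encode_image(image_pixels)
    return vector / np.linalg.norm(vector)  # normalize so cosine similarity is a dot product
```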
S106, the multi-mode understanding module returns visual semantic vectors to the gallery service.
The multimodal understanding module delivers visual semantic vectors obtained through visual semantic understanding to a gallery service.
S107, the gallery service stores visual semantic vectors.
For example, the gallery service may store visual semantic vectors returned by the multimodal understanding module into a gallery database.
S108, the gallery service transmits visual semantic vectors and attribute information required for constructing indexes to the search module.
In practical applications, a user usually only pays attention to part of the attribute information of an image (such as the shooting time, the geographical location, who the photographed subject is, and the relationship with the subject), and does not pay attention to information such as sensitivity, exposure value, white balance, and shutter speed, so an index only needs to be constructed for the attributes the user pays attention to. Therefore, the gallery service only needs to transfer the visual semantic vector and the attribute information required for constructing the index to the search module.
For example, the attribute information required to construct the index may be agreed with the search module and written into the code of the gallery service.
S109, the search module constructs an index information item corresponding to the picture based on the visual semantic vector and the attribute information.
In an exemplary embodiment, the visual semantic vector sent by the gallery service includes a unique identifier of the vector and the picture, which is also referred to as a picture ID, and after the search module receives the visual semantic vector and the attribute information, the search module writes the visual semantic vector, each item of attribute information, and the picture ID into corresponding fields of the index gallery.
For example, the attribute information of the image includes the shooting time, the shooting location (i.e., the geographical location), the person title, and the like, and an index information item of the index library includes fields such as time, location, name, visual semantic vector, and picture ID. In this example, the shooting time is written into the time field, the shooting location is written into the location field, the person title is written into the name field, the visual semantic vector is written into the visual semantic vector field, and the picture ID is written into the picture ID field; after all information is written into the corresponding fields, the index information item corresponding to the picture is obtained.
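To make the shape of an index information item concrete, a minimal sketch is shown below; the field names mirror those mentioned in this example, while the types and the dataclass itself are assumptions.

```python
# Sketch: one index information item with the fields mentioned above.
from dataclasses import dataclass
from typing import List

@dataclass
class IndexItem:
    picture_id: str                      # unique identifier of the picture
    time: str                            # shooting time
    location: str                        # shooting place (geographical location)
    names: List[str]                     # person titles annotated for the picture
    visual_semantic_vector: List[float]  # vector returned by the multimodal understanding module

index_library: List[IndexItem] = [
    IndexItem("img_001", "2023-07-15", "Beijing", ["Xiaomei", "brother"], [0.12, -0.33, 0.08]),
]
```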
The above-mentioned S103 to S109 show the index construction process, which is not required to be performed before each search, but is performed when the user newly adds or modifies the picture and its attribute information.
In addition, in order to improve the battery life of the electronic device, the above index construction process may be performed while the electronic device is charging or in a screen-off state, so as to reduce the impact of the power consumption of index construction.
S110, the gallery service receives search sentences.
Illustratively, the user may input a search term in a search field provided by the gallery application, as shown in FIG. 12, the bottom of the page 100 of the gallery application includes tab navigation fields 101, such as including four tab options of photo 102, album 103, time 104, and find 105, and the page 100 is a photo tab interface. The top of the page 100 includes the title "photo" 106 of the current tab interface, with a search field 107 below the tab title 106, and the user may enter a search term in the search field 107.
S111, the gallery service transmits search sentences to the search module.
S112, the search module identifies image attribute features in the search statement based on the mode tree and the slot tree.
In an exemplary embodiment, the process shown in S112 may include the steps of:
(1) The root node of the pattern tree is read as the current node.
(2) And reading all child nodes of the current node.
(3) According to the slot names of the child nodes, sequentially read the slot nodes from the slot trees with the same names.
(4) Match the characters of the search statement against the slot nodes one by one in character order; if a successfully matched slot node exists, the current character hits the current node in the mode tree. Otherwise, the query of subsequent nodes is stopped.
(5) Repeating the steps (2) - (4) until all characters in the search sentence hit sub-nodes in the pattern tree, namely the search pattern in the search sentence hit pattern tree, and otherwise, indicating the search pattern in the search sentence miss pattern tree.
(6) And acquiring related information contained in the search statement, such as person names, number of persons, sky, building, animals, sunrise and sunset, and the like, from the hit search mode.
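The character-by-character matching described in steps (1) to (6) above can be sketched roughly as follows; the dictionary-based tries, the greedy longest-match strategy, and all names are simplifying assumptions rather than the patent's exact algorithm.

```python
# Sketch of the S112 matching flow over dictionary-based tries.
# Slot trees map characters to nested dicts; the pattern tree maps slot names to
# nested dicts; "$" marks a complete value (slot tree) or a pattern leaf (pattern tree).
def match_slot(trie, text, start):
    """Length of the longest slot value matching text[start:], or 0 if none."""
    node, best, length = trie, 0, 0
    for ch in text[start:]:
        if ch not in node:
            break
        node, length = node[ch], length + 1
        if "$" in node:
            best = length
    return best

def parse(query, pattern_node, slot_trees, pos=0, found=None):
    """Walk the pattern tree and the slot trees over the query, one slot at a time."""
    found = found or {}
    if pos == len(query):
        # Success only if every character is consumed exactly at a pattern leaf.
        return found if "$" in pattern_node else None
    for slot_name, child in pattern_node.items():
        if slot_name == "$":
            continue
        length = match_slot(slot_trees.get(slot_name, {}), query, pos)
        if length:
            extended = {**found, slot_name: found.get(slot_name, []) + [query[pos:pos + length]]}
            result = parse(query, child, slot_trees, pos + length, extended)
            if result is not None:
                return result
    return None  # no search pattern hit
```

The recursion backtracks across sibling slot types when a branch fails, which mirrors the "otherwise, stop the query of subsequent nodes" behaviour at the level of a single branch while still allowing other patterns to be tried.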
S113, the search module transmits the search statement to the multi-mode semantic understanding module to perform vector conversion.
S114, the multi-mode semantic understanding module returns text vectors corresponding to the search sentences to the search module.
The search module transmits the received search sentence to the multi-mode semantic understanding module, the multi-mode semantic understanding module carries out vector conversion on the search sentence to obtain a text vector corresponding to the search sentence, and the text vector is returned to the search module.
In an exemplary embodiment, all characters of the search term may be directly converted into corresponding vectors.
In another exemplary embodiment, the multi-modal semantic module may identify the non-visual information in a search statement (also referred to as a first search statement), delete the words conveying non-visual information from the search statement to obtain a second search statement related only to the visual information, and then perform vector conversion on the second search statement to obtain the text vector. For example, the first search statement is "the sunset shot in the summer of 2023", in which "2023" is non-visual information, and the second search statement after the non-visual information is deleted is "the sunset shot in summer".
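The removal of non-visual words before text encoding could look roughly like the sketch below; the token list, the encoder placeholder, and the splitting by whitespace are all assumptions made purely for illustration.

```python
# Sketch: drop non-visual words from the search statement before vectorization.
import numpy as np

NON_VISUAL_WORDS = {"2023", "yesterday"}  # hypothetical non-visual tokens

def encode_text(text: str) -> np.ndarray:
    """Placeholder for the multimodal text encoder."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(512)
    return v / np.linalg.norm(v)

def text_vector(first_search_statement: str) -> np.ndarray:
    kept = [w for w in first_search_statement.split() if w not in NON_VISUAL_WORDS]
    second_search_statement = " ".join(kept)  # statement related only to visual information
    return encode_text(second_search_statement)

print(text_vector("the sunset shot in the summer of 2023").shape)  # (512,)
```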
S115, the searching module searches the images matched with the image attribute features and the text vectors corresponding to the search sentences in the index library.
In an exemplary embodiment, the search module compares the image attribute features one by one with corresponding fields in the index base. And meanwhile, comparing the similarity between the text vector corresponding to the search sentence and the visual semantic vector in the index library, and determining that the visual semantic vector is matched with the search sentence if the similarity is greater than or equal to a threshold value.
And combining the matching result of the image attribute characteristics and the matching result of the text vector to obtain a search result matched with the search statement.
In an exemplary embodiment, the search results may include pictures for which the text vector matches and all of the image attribute features match, and may also include pictures for which the text vector matches and only some of the image attribute features match. For example, if the search statement is "the sunset shot in Beijing in the summer of 2023", the search result may include pictures whose shooting time is the summer of 2023, whose shooting location is Beijing, and whose subject includes a sunset, i.e., pictures that completely match the search statement, and may further include pictures of sunsets shot in other places in the summer of 2023, or sunsets shot at other times.
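A hedged sketch of this two-dimensional search, attribute filtering plus vector similarity, is given below; the threshold, field names, and data layout are assumptions and simply illustrate the combination described above.

```python
# Sketch: combine attribute-field matching with visual-semantic similarity.
import numpy as np

def cosine(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def matches_attributes(item, attribute_features):
    """True if every identified attribute feature is found in the item's fields."""
    for field, value in attribute_features.items():
        stored = item.get(field)
        if isinstance(stored, list):
            if value not in stored:
                return False
        elif stored != value:
            return False
    return True

def search(index_items, attribute_features, text_vec, threshold=0.6):
    """index_items: list of dicts with 'picture_id', 'names', 'time', 'location', 'vector'."""
    first_ids = {it["picture_id"] for it in index_items
                 if matches_attributes(it, attribute_features)}
    second_set = [it for it in index_items if cosine(it["vector"], text_vec) >= threshold]
    # Pictures matching both dimensions come first, then vector-only matches.
    both = [it for it in second_set if it["picture_id"] in first_ids]
    vector_only = [it for it in second_set if it["picture_id"] not in first_ids]
    return [it["picture_id"] for it in both + vector_only]
```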
And S116, the searching module ranks the searched pictures.
In an exemplary embodiment, the pictures in the search results are ranked from high to low according to how well they match the search term. For example, one search term corresponds to a plurality of image attribute features, the more the attribute features match, the higher the matching degree, whereas the fewer the attribute features match, the lower the matching degree.
For example, for the search statement "the sunset shot in Beijing in the summer of 2023" in S115, a picture whose shooting time is the summer of 2023, whose shooting location is Beijing, and whose subject includes a sunset has the highest matching degree; a sunset picture shot in another place in the summer of 2023 has the second highest matching degree; and a sunset picture shot at another time and in another place has the lowest matching degree.
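The ranking by matching degree described above could be sketched as counting how many identified attribute features each candidate satisfies; the scoring below is an assumption for illustration, not the patent's exact formula.

```python
# Sketch: rank candidate pictures by how many attribute features they match.
def match_degree(item, attribute_features):
    """Number of identified attribute features satisfied by this index item."""
    score = 0
    for field, value in attribute_features.items():
        stored = item.get(field)
        if (isinstance(stored, list) and value in stored) or stored == value:
            score += 1
    return score

def rank(candidates, attribute_features):
    return sorted(candidates, key=lambda it: match_degree(it, attribute_features), reverse=True)

# Hypothetical example: time and location both match for the first picture only.
features = {"time": "2023 summer", "location": "Beijing"}
candidates = [
    {"picture_id": "img_001", "time": "2023 summer", "location": "Beijing"},
    {"picture_id": "img_002", "time": "2023 summer", "location": "Shanghai"},
]
print([it["picture_id"] for it in rank(candidates, features)])  # ['img_001', 'img_002']
```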
S117, the search module returns the picture search result to the gallery service.
In the embodiment of the application, the picture search result obtained by the search module is a set of picture IDs matched with the search statement.
S118, the gallery service displays the picture search results.
And after receiving the search result fed back by the search module, the gallery service reads the corresponding picture from the picture database according to the picture ID in the search result and displays the picture on a search result display interface.
As shown in fig. 13, the interface 200 is shown for a picture search result of a gallery application, where the interface 200 includes a search bar 201, and the search bar 201 includes content of the search, for example, double photo of Xiaomei and Dither. A folder display area 202 is located below the search field 201, and is used to display a folder to which the searched picture belongs. The search result display area 203 is located below the folder display area 202, and is used for displaying a preset number of pictures.
According to the multimedia content searching method provided by the embodiment, the slot tree corresponding to the image attribute information is constructed in advance based on the existing information, and each type of attribute information constructs a slot tree. And constructing a pattern tree of search sentence patterns based on the encyclopedia knowledge and the search sentence corpus. After receiving a search statement input by a user, identifying image attribute features contained in the search statement based on the mode tree and the slot tree, and searching matched pictures based on the identified image attribute features and text vectors corresponding to the search statement. The method accurately identifies the image attribute characteristics in the search statement by utilizing the mode tree of the search statement and the slot tree corresponding to the attribute information, improves the accuracy of the semantic understanding result of the search statement, and finally improves the accuracy of the search result.
In addition, the method is executed in the electronic equipment, and user data (such as information marked on pictures or videos by users, such as person names and the like) does not need to be reported to the cloud server side, so that the safety of the user data is improved. In addition, the method identifies the image attribute characteristics by traversing the mode tree and the slot tree, and the mode tree and the slot tree are dictionary trees, so that the searching efficiency is high, the calculated amount is small and the power consumption is low.
The following describes a procedure of identifying image attribute features in a search term shown in S112 in connection with an example:
Example 1: for the search statement "two-person group photo of Xiaomei and brother", the process of identifying image attribute features is as follows:
(1) Reading a root node of the mode tree as a current node;
(2) As shown in fig. 11, the number node and appellation node are read from the mode tree according to the current node;
(3) Reading digital slot nodes from the digital slot tree, and reading title slot nodes from the title slot tree;
(4) Reading the first character "small" of the search sentence, and determining that the first character of the search sentence does not hit the number node if no node containing the "small" character is found in the digital slot tree as shown in fig. 7.
(5) A node for the character "Xiao" (small) is found among the child nodes of the root node in the title slot tree shown in fig. 3; the next character after "Xiao", namely "mei", is then read from the search statement and compared with the child nodes of the "Xiao" node in the title slot tree. As shown in FIG. 3, the "Xiao" node has two child nodes, "mei" (as in Xiaomei) and "shuai" (as in Xiao Shuai); the character hits the "mei" node.
(6) The child node of the appellation node in the mode tree, namely the stop word node, is read; the next character after "mei", namely "and", is read from the search statement; a node containing the "and" character is found in the stop word slot tree, so it is determined that "and" hits the stop word node.
(7) The child nodes of the stop word node in the mode tree, namely the appellation node and the suffix node, are read, and the next character after "and", namely the first "ge" (brother) character, is read from the search statement. A node containing the "ge" character is found in the title slot tree; the next character in the search statement, the second "ge", is then read and found to be identical to a child node of the "ge" node in the title slot tree. Since this second "ge" node in the title slot tree is a leaf node, it is determined that "gege" (brother) in the search statement hits the appellation node in the mode tree.
(8) The child node of the appellation node in the mode tree, namely the stop word node, is read; the next character after the second "ge", namely "de" (of), is read from the search statement; the "de" node is found in the stop word slot tree and is a leaf node, so it is determined that the "de" character in the search statement hits the stop word node in the mode tree.
(9) The next node after the stop word node in the mode tree, namely the number node, is read; the next character to be compared in the search statement, "double", is read; a node containing the "double" character is found in the number slot tree, and the "double" node is a leaf node, so it is determined that "double" hits the number node in the mode tree.
(10) The next node to be compared in the mode tree, namely the quantifier node, is read; the next character in the search statement, "person", is read; the "person" node is found in the quantifier slot tree and is a leaf node, so it is determined that "person" hits the quantifier node in the mode tree.
(11) The next node of the mode tree, namely the suffix node, is read; the next character in the search statement, "he" (the first character of "heying", group photo), is read; the "he" node is found in the suffix word slot tree but is not a leaf node, so the next character in the search statement, "ying", is read; the child nodes of the "he" node in the suffix word slot tree include a "ying" node, which is a leaf node, so it is determined that "heying" (group photo) hits the suffix node in the mode tree. The suffix node is a leaf node of the mode tree, so it is determined that the current search statement hits the search pattern [appellation] [stop word] [appellation] [stop word] [number] [quantifier] [suffix].
(12) According to the search pattern hit by the search statement, it is determined that "Xiaomei" and "brother" are person titles and that "double" indicates the number of persons.
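Continuing the parse and build_trie sketches given earlier, a hypothetical run on the Example 1 query might look like this; the slot values below are limited to what this example needs and are illustrative only.

```python
# Hypothetical run of the earlier parse/match_slot/build_trie sketches on the
# Example 1 query; assumes those functions are already defined as sketched above.
slot_trees = {
    "appellation": build_trie(["小美", "哥哥", "小帅"]),
    "stop word":   build_trie(["和", "的"]),
    "number":      build_trie(["双", "三"]),
    "quantifier":  build_trie(["人"]),
    "suffix":      build_trie(["合影", "照片"]),
}
pattern_tree = {}
for pattern in [["appellation", "stop word", "appellation", "stop word",
                 "number", "quantifier", "suffix"]]:
    node = pattern_tree
    for slot_name in pattern:
        node = node.setdefault(slot_name, {})
    node["$"] = True

result = parse("小美和哥哥的双人合影", pattern_tree, slot_trees)
print(result)
# Expected (if the sketch behaves as intended):
# {'appellation': ['小美', '哥哥'], 'stop word': ['和', '的'],
#  'number': ['双'], 'quantifier': ['人'], 'suffix': ['合影']}
```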
Example 2: for the search statement "two-person group photo of Xiao Zhang and brother", the process of identifying image attribute features is as follows:
(1) Reading a root node of the mode tree as a current node;
(2) Reading child nodes of a root node in the mode tree, namely a number node and a appellation node;
(3) Reading digital slot nodes from the digital slot tree, and reading title slot nodes from the title slot tree;
(4) Reading the first character "small" of the search sentence, and determining that the first character of the search sentence does not hit the number node if no node containing the "small" character is found in the digital slot tree as shown in fig. 7.
(5) A node for the character "Xiao" (small) is found among the child nodes of the root node in the title slot tree shown in fig. 3; the next character after "Xiao", namely "Zhang", is then read from the search statement and compared with the child nodes of the "Xiao" node in the title slot tree. As shown in FIG. 3, the child nodes of the "Xiao" node include only "mei" and "shuai" and do not contain a "Zhang" node, i.e., "Xiao Zhang" does not hit any appellation node.
(6) The query ends, the current search statement does not hit any search patterns, and no person names and number of persons are identified.
In this example, when searching is further performed based on the search term, the text vector corresponding to the search term is compared with the visual semantic vector in the index library in similarity to obtain an image matching the search term.
Example 3: for the search statement "three pots of flowers", the process of identifying image attribute features is as follows:
(1) The steps (3) are the same as the first three steps of the examples 1 and 2, and are not repeated here.
(4) The first character of the search statement, "three", is read; a "three" node is found in the number slot tree, and this node is a leaf node, so it is determined that the search statement starts with a number. The second character in the search statement, "pot", is then read; it does not hit any slot node corresponding to the quantifier node in the mode tree, i.e., it does not hit any node of the quantifier slot tree, so it is determined that the current pattern is not hit.
(5) The first character "three" does not match any node in the title slot tree, so it is determined that the search statement does not begin with an appellation.
(6) The query is stopped, and it is determined that the current search statement does not hit any search pattern.
This example is similar to example 2 described above, and when searching is further based on a search term, similarity comparison is performed between the text vector corresponding to the search term and the visual semantic vector in the index library to obtain an image matching the search term.
In addition, in other embodiments, for a search statement such as "last year's two-person group photo of Xiao Zhang and brother", it can be seen from example 2 that the matching of "Xiao Zhang" is unsuccessful, i.e., the person title and the number of persons contained in the search statement cannot be identified. In this scenario, the time attribute information "last year" in the search statement is retained and compared with the time field in the index library; meanwhile, similarity comparison is performed between the text vector of the search statement and the visual semantic vectors in the index library, and the image matching the search statement is obtained by combining the time attribute information and the text vector.
From the foregoing description of the embodiments, it will be apparent to those skilled in the art that, for convenience and brevity of description, only the above division of functional modules is used as an example for illustration. In practical applications, the above functions may be allocated to different functional modules as required, that is, the internal structure of the apparatus may be divided into different functional modules to implement all or part of the functions described above. For the specific working processes of the systems, apparatuses and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division of the modules or units is merely a logical functional division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses or units, and may be in electrical, mechanical or other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated in one processing unit, each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) or a processor to perform all or part of the steps of the methods described in the embodiments. The storage medium includes various media capable of storing program code, such as a flash memory, a removable hard disk, a read-only memory, a random access memory, a magnetic disk or an optical disk.
The foregoing is merely a specific implementation of the present application, and the protection scope of the present application is not limited thereto. Any change or substitution within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (12)

1. A multimedia content search method, characterized in that the method is applied to an electronic device and comprises: receiving a search statement for multimedia content; identifying image attribute features contained in the search statement based on a pre-built pattern tree and slot trees, wherein the pattern tree represents multiple description patterns of search statements, each description pattern comprises at least two attribute types, each attribute type corresponds to one slot tree, one slot tree comprises one category of attribute information of multimedia content, and the image attribute features comprise non-visual information of the multimedia content; converting the search statement into a text vector; searching, based on the image attribute features and the text vector, a pre-built index library for multimedia content search results matching the search statement; and displaying the multimedia content search results.
2. The method according to claim 1, characterized in that the index library comprises index information items corresponding to a plurality of multimedia contents, and one index information item comprises attribute information corresponding to one multimedia content and a visual semantic vector corresponding to the multimedia content.
3. The method according to claim 2, characterized in that searching, based on the image attribute features and the text vector, the pre-built index library for multimedia content search results matching the search statement comprises: comparing the image attribute features one by one with corresponding information fields of the index information items in the index library, and screening out a first set of index information items matching the image attribute features; comparing the similarity between the text vector and the visual semantic vectors, and screening out a second set of index information items whose similarity is greater than or equal to a preset threshold; and obtaining the multimedia content search results according to the first set of index information items and the second set of index information items.
4. The method according to any one of claims 1 to 3, characterized in that identifying the image attribute features contained in the search statement based on the pre-built pattern tree and slot trees comprises: reading all first child nodes of the root node in the pattern tree; reading first-type slot nodes in turn from the slot trees having the same names according to the slot names of the first child nodes; querying, character by character in the order of the characters in the search statement, whether a matching first-type slot node exists, and if a matching first-type slot node exists, determining that the current character in the search statement successfully matches the current node in the pattern tree, the current node being the node in the pattern tree corresponding to the successfully matched slot node; reading all second child nodes of the current node in the pattern tree, reading corresponding second-type slot nodes according to the slot names of the second child nodes, and querying the remaining characters in the search statement in order until a leaf node in the pattern tree is successfully matched, thereby determining that the pattern tree contains a target search pattern matching the search statement; and determining the attribute features corresponding to the multimedia content based on the target search pattern and the search statement.
5. The method according to claim 4, characterized in that querying, character by character in the order of the characters in the search statement, whether a matching first-type slot node exists comprises: reading the first character of the search statement, and querying whether the slot tree corresponding to each first child node contains a slot node for the first character; if a first slot tree containing the first character exists, determining whether the first matching slot node corresponding to the first character is a leaf node in the first slot tree; and if the slot node corresponding to the first character in the first slot tree is not a leaf node, reading the character following the first character, and continuing to query whether the child nodes of the first matching slot node in the first slot tree contain that character, until the successfully matched slot node is a leaf node, thereby determining that a first-type slot node matching the characters in the search statement exists.
6. The method according to claim 4 or 5, characterized in that determining the attribute features corresponding to the multimedia content based on the target search pattern and the search statement comprises: determining, based on a character title node contained in the target search pattern, that the content in the search statement matching the character title node is a character title; and determining, based on a character quantity node contained in the target search pattern, that the content in the search statement matching the character quantity node is a character quantity.
7. The method according to any one of claims 1 to 6, characterized in that the process of constructing a slot tree comprises: constructing a slot tree corresponding to attribute information based on existing information, wherein the existing information comprises at least one of attribute information corresponding to existing multimedia content in a gallery and encyclopedic knowledge; the attribute information corresponding to the multimedia content comprises at least one of the following: annotation information input by a user for the multimedia content, shooting information of the multimedia content, and entity object labels recognized by the electronic device for the multimedia content; and for multimedia content containing person objects, the annotation information comprises a character title, and the character title comprises at least one of a person's name, a nickname, and a character relationship.
8. The method according to any one of claims 1 to 7, characterized in that the slot trees comprise a character title slot tree, a numeral slot tree, a quantifier slot tree, a stop word slot tree and a suffix word slot tree, and the character title slot tree comprises at least one of a character name, a nickname and a character relationship; or, the slot trees comprise a name slot tree, a nickname slot tree, a character relationship slot tree, a numeral slot tree, a quantifier slot tree, a stop word slot tree and a suffix word slot tree.
9. The method according to claim 7 or 8, characterized in that constructing the slot tree corresponding to the attribute information based on the existing information comprises: constructing a character title slot tree based on character title annotation information corresponding to the existing multimedia content in the electronic device and encyclopedic knowledge of character relationships; constructing a numeral slot tree based on encyclopedic knowledge of numerals and the numerals contained in a pre-acquired search statement corpus; constructing a quantifier slot tree based on encyclopedic knowledge of quantifiers and the quantifiers contained in the search statement corpus; constructing a stop word slot tree based on encyclopedic knowledge and the words without actual meaning contained in the search statement corpus; and constructing a suffix word slot tree based on the suffix words contained in the search statement corpus.
10. The method according to any one of claims 1 to 9, characterized in that the process of constructing the pattern tree comprises: analyzing, based on encyclopedic knowledge, the arrangement order of the words corresponding to the attributes of each slot tree contained in a pre-acquired search statement corpus, to obtain the description patterns corresponding to search statements; and constructing the pattern tree based on the various description patterns.
11. An electronic device, characterized in that the electronic device comprises one or more processors, a memory and a touch screen; the memory is configured to store program code; and the processor is configured to run the program code, so that the electronic device implements the multimedia content search method according to any one of claims 1 to 10.
12. A computer-readable storage medium, characterized in that instructions are stored thereon, and when the instructions are run on an electronic device, the electronic device is caused to perform the multimedia content search method according to any one of claims 1 to 10.
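For illustration only and not as part of the claims, the character-by-character slot matching described in claims 4 and 5 can be sketched as a character trie per slot tree: each slot value is inserted character by character, a node is marked as a leaf when a complete value ends there, and matching walks the query characters until a leaf node is reached. The class and function names below (SlotNode, build_slot_tree, match_slot) and the example slot values are assumptions made only for this sketch.

```python
# Minimal sketch of a slot tree as a character trie; names and example values
# are assumptions for illustration, not part of the claimed method.
class SlotNode:
    def __init__(self):
        self.children = {}      # character -> SlotNode
        self.is_leaf = False    # True when a complete slot value ends here

def build_slot_tree(values):
    """Build one slot tree (a character trie) from a list of attribute values."""
    root = SlotNode()
    for value in values:
        node = root
        for ch in value:
            node = node.children.setdefault(ch, SlotNode())
        node.is_leaf = True
    return root

def match_slot(slot_tree, sentence, start):
    """Return the end index of the longest slot value matched at `start`, or -1."""
    node, end = slot_tree, -1
    for i in range(start, len(sentence)):
        node = node.children.get(sentence[i])
        if node is None:
            break
        if node.is_leaf:        # a complete slot value (leaf node) has been matched
            end = i + 1
    return end

# Usage sketch: matching a character-title slot at the start of a query.
title_tree = build_slot_tree(["小明", "妈妈", "小明的妈妈"])
print(match_slot(title_tree, "小明的妈妈的单人照", 0))  # -> 5, i.e. "小明的妈妈"
```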
CN202311428944.4A 2023-10-30 2023-10-30 Multimedia content search method and device Pending CN119961466A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311428944.4A CN119961466A (en) 2023-10-30 2023-10-30 Multimedia content search method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311428944.4A CN119961466A (en) 2023-10-30 2023-10-30 Multimedia content search method and device

Publications (1)

Publication Number Publication Date
CN119961466A true CN119961466A (en) 2025-05-09

Family

ID=95588706

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311428944.4A Pending CN119961466A (en) 2023-10-30 2023-10-30 Multimedia content search method and device

Country Status (1)

Country Link
CN (1) CN119961466A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070130112A1 (en) * 2005-06-30 2007-06-07 Intelligentek Corp. Multimedia conceptual search system and associated search method
US20110022638A1 (en) * 2007-06-27 2011-01-27 Wenyu Jiang Incremental Construction of Search Tree with Signature Pointers for Identification of Multimedia Content
CN103020052A (en) * 2011-09-20 2013-04-03 北京百度网讯科技有限公司 Method and device for recognizing search demand
CN107491534A (en) * 2017-08-22 2017-12-19 北京百度网讯科技有限公司 Information processing method and device
CN110019867A (en) * 2017-10-10 2019-07-16 阿里巴巴集团控股有限公司 Image search method, system and index structuring method and medium
CN112948608A (en) * 2021-02-01 2021-06-11 北京百度网讯科技有限公司 Picture searching method and device, electronic equipment and computer readable storage medium
CN114218431A (en) * 2021-11-19 2022-03-22 北京百度网讯科技有限公司 Video searching method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US9430719B2 (en) System and method for providing objectified image renderings using recognition information from images
US7809192B2 (en) System and method for recognizing objects from images and identifying relevancy amongst images and information
US8649572B2 (en) System and method for enabling the use of captured images through recognition
US7809722B2 (en) System and method for enabling search and retrieval from image files based on recognized information
CN106560809B (en) Modifying at least one attribute of an image with at least one attribute extracted from another image
US8577882B2 (en) Method and system for searching multilingual documents
Kuo et al. Unsupervised semantic feature discovery for image object retrieval and tag refinement
WO2022068543A1 (en) Multimedia content publishing method and apparatus, and electronic device and storage medium
CN112069326B (en) Knowledge graph construction method, device, electronic device and storage medium
Fu et al. Tagging personal photos with transfer deep learning
WO2006122164A2 (en) System and method for enabling the use of captured images through recognition
Sang et al. Exploiting social-mobile information for location visualization
CN118097374A (en) Product appearance image generation method and device, and similar image retrieval method and device
US20230050371A1 (en) Method and device for personalized search of visual media
de Andrade et al. Photo annotation: a survey
CN110704654A (en) Picture searching method and device
CN119961466A (en) Multimedia content search method and device
CN107153712B (en) Personalized customized picture management method supporting time-space association of mobile terminal
Lai et al. Improved search in Hamming space using deep multi-index hashing
Zhang et al. Hyperlink-aware object retrieval
Budikova et al. Multi-modal image retrieval for search-based image annotation with RF
Kim et al. User‐Friendly Personal Photo Browsing for Mobile Devices
Yamamuro et al. Exsight-multimedia information retrieval system
CN111753861A (en) Active learning automatic image annotation system and method
CN120336469A (en) Private domain data question and answer retrieval enhancement method, device, equipment, medium and product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination