
US20170330054A1 - Method And Apparatus Of Establishing Image Search Relevance Prediction Model, And Image Search Method And Apparatus - Google Patents


Info

Publication number
US20170330054A1
Authority
US
United States
Prior art keywords
image
sample
query
training
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US15/281,198
Other versions
US10354170B2 (en)
Inventor
Libo Fu
Heng Luo
Gaolin Fang
Wei Xu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Original Assignee
Baidu Online Network Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Baidu Online Network Technology Beijing Co Ltd
Publication of US20170330054A1
Application granted
Publication of US10354170B2
Legal status: Active
Adjusted expiration

Classifications

    All classifications fall under G (PHYSICS), section G06 (COMPUTING OR CALCULATING; COUNTING):
    • G06V10/82: Image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06K9/6257
    • G06F16/56: Information retrieval of still image data having vectorial format
    • G06F16/5866: Retrieval of still image data characterised by using manually generated metadata, e.g. tags, keywords, comments
    • G06F16/9535: Retrieval from the web; search customisation based on user profiles and personalisation
    • G06F17/30271
    • G06F17/30867
    • G06F18/2148: Generating training patterns; bootstrap methods (e.g. bagging or boosting), characterised by the process organisation or structure, e.g. boosting cascade
    • G06K9/66
    • G06N3/045: Neural network architectures; combinations of networks
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G06N3/0472
    • G06N3/08: Neural network learning methods
    • G06N3/09: Supervised learning

Definitions

  • Embodiments of the present invention relate to information processing technologies, in particular, to a method and an apparatus for establishing an image search relevance prediction model, and an image search method and apparatus.
  • Image search refers to an information retrieval process whereby a user enters a natural language query, for example via a text field provided by a search engine; an image collection is searched, and a list of images sorted by relevance and other parameters is returned.
  • Relevance is one of the major performance parameters of a search engine; it measures the degree to which a returned result matches the user's query need.
  • Images returned by an image search engine are in a structure-less pixel format, while queries entered by the user are in a text format. These two completely different information formats cannot be directly compared by computation.
  • a text matching characteristic which is obtained by comparing image surrounding text with a query
  • a classification matching characteristic, which is obtained by comparing a classification label with the query, where the classification label is obtained by classifying the image content
  • a click-through rate characteristic which is a measure of relevance between a specific image and the query obtained by conducting statistics on click behaviors and the like of a large number of user queries.
  • the surrounding text of the image may be inconsistent with the image content, and cannot completely and accurately describe the content of the image in many cases, thus affecting the accuracy of the text matching characteristic.
  • the classification matching characteristic is limited by the completeness of the category system and the correctness of the classification model. Generally, the finer the category system, the more difficult the classification, the less accurate the classification model, the more the classification result deviates semantically from the query text, and the harder matching becomes. However, if the category system is too coarse, matching with the query is not precise enough. Therefore, this characteristic generally plays only an auxiliary role.
  • the click-through rate characteristic is mainly based on user behavior statistics, and suffers from bias and noise on one hand and from sparsity on the other. Sufficient click statistics can be collected only for images that are presented at top positions, with sufficient occurrences, under frequent queries; in other cases, no click statistics can be collected, or the clicks are so sparse that they lack statistical significance.
  • embodiments of the present invention provide a method and an apparatus for establishing an image search relevance prediction model, as well as an image search method and apparatus, to optimize the existing image search technology and improve the relevance between an image search result and a query entered by a user.
  • an embodiment of the present invention provides a method for establishing an image search relevance prediction model, comprising:
  • an embodiment of the present invention provides a method for searching an image, comprising:
  • an embodiment of the present invention provides an apparatus for establishing an image search relevance prediction model, comprising:
  • a training module configured to train a pre-constructed original deep neural network by using a training sample
  • the training sample comprises: a query and image data
  • the original deep neural network comprises: a representation vector generation network and a relevance calculation network
  • the representation vector generation network is used to convert different types of data in the training sample into representation vectors and input the representation vectors to the relevance calculation network
  • the relevance calculation network is used to convert at least two input representation vectors into a relevance metric value
  • a model generation module configured to use the trained original deep neural network as the image search relevance prediction model.
  • an image search apparatus comprising:
  • an original deep neural network is constructed first, wherein inputs of the original deep neural network are a query and image data, and an output thereof is a relevance metric value between the query and the image data.
  • Appropriate training samples are selected to train the original deep neural network, and finally, the original deep neural network may be trained into an image search relevance prediction model.
  • an image search engine receives an image query entered by a user
  • the image query and to-be-sorted images are input to the image search relevance prediction model to obtain relevance metric values between the to-be-sorted images and the image query.
  • the to-be-sorted images are sorted based on the relevance metric values, and the sorting result is returned to the user.
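As a minimal sketch of this search-time flow, the snippet below scores candidate images against a query and sorts them by the relevance metric value. The `predict_relevance` stub (word overlap with the surrounding text) is purely an illustrative assumption; in the patent, this role is played by the trained image search relevance prediction model.

```python
def predict_relevance(query, image):
    # Placeholder standing in for the trained deep neural network:
    # word overlap between the query and the image's surrounding text.
    q_words = set(query.split())
    t_words = set(image["surrounding_text"].split())
    return len(q_words & t_words) / max(len(q_words), 1)

def search(query, candidate_images):
    scored = [(predict_relevance(query, img), img) for img in candidate_images]
    # Sort the to-be-sorted images by relevance metric value, descending.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [img for _, img in scored]

images = [
    {"id": 1, "surrounding_text": "birthday card with balloons"},
    {"id": 2, "surrounding_text": "landscape photo of mountains"},
]
ranked = search("birthday card", images)
print([img["id"] for img in ranked])  # image 1 ranks first
```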
  • the present invention optimizes the existing image search technology, and is more capable than the prior art and its various integrations and variations in terms of semantic matching between a query and an image text, semantic matching between a query and image content, click generalization, and the like. Moreover, the degree of relevance between an image search result and a query entered by a user can be greatly improved.
  • FIG. 1 is a flowchart of a method for establishing an image search relevance prediction model according to a first embodiment of the present invention
  • FIG. 2 is a schematic structural diagram for a deep neural network applicable to the first embodiment of the present invention
  • FIG. 3 is a schematic structural diagram of another deep neural network applicable to the first embodiment of the present invention.
  • FIG. 4 is a flowchart of a method for establishing an image search relevance prediction model according to a second embodiment of the present invention
  • FIG. 5 is a schematic structural diagram for a training network model applicable to the second embodiment of the present invention.
  • FIG. 6 is a flowchart of a method for establishing an image search relevance prediction model according to a third embodiment of the present invention.
  • FIG. 7 is a flowchart of a method for generating a positive-negative sample pair according to the third embodiment of the present invention.
  • FIG. 8 is a flowchart of an image search method according to a fourth embodiment of the present invention.
  • FIG. 9 is a structural diagram of an apparatus for establishing an image search relevance prediction model according to a fifth embodiment of the present invention.
  • FIG. 10 is a structural diagram of an image search apparatus according to a sixth embodiment of the present invention.
  • the objective of conducting an accurate image search for a query entered by a user may be achieved by establishing a relevance calculation model between image content and the query, inputs of the calculation model being the image content and the query, and an output thereof being a relevance metric value.
  • the content of the image (preferably comprising a surrounding text of the image and the like) and a query text of the user are deeply converted by using a deep neural network, and a relation between the image (text+content) and the query is established during the conversion.
  • one end of the input may be the surrounding text of the image together with the image content (optionally including other characteristics or information of the image, such as click queries of the image and various characteristics describing the image quality).
  • the other end of the input is the query text (or comprising other processed characteristics of the query).
  • a final output after the deep neural network is a relevance metric value between the image and the query, which may serve as a one-dimensional characteristic of the relevance of the image and the query.
  • FIG. 1 is a flowchart of a method for establishing an image search relevance prediction model according to a first embodiment of the present invention.
  • the method of this embodiment may be executed by an apparatus for establishing an image search relevance prediction model, and the apparatus may be implemented by hardware and/or software and may be generally integrated in a server for establishing an image search relevance prediction model.
  • the method of this embodiment specifically comprises the following steps.
  • the training sample comprises: a query and image data.
  • the original deep neural network needs to be trained by using the image data and the query as training samples at the same time.
  • the image data comprises image content data.
  • the image content data may comprise: pixels of the image or a content characteristic of the image (such as a content characteristic vector) after a certain processing.
  • the image data may further comprise: image associated text data, and/or image associated characteristic data.
  • the image associated text data specifically refers to: text information that is stored corresponding to the image and used to briefly describe the image content. For example, when an image is stored, a title “birthday card” of the image is stored at the same time.
  • the image associated characteristic data may comprise: a click query corresponding to the image, and/or a quality characteristic parameter of the image.
  • when a search user inputs a target query and clicks to select a target image in the image search result returned for the target query, the target query is a click query of the target image.
  • the quality characteristic parameter may comprise: parameters for describing the image quality, such as an image compression ratio, an image format, and an image resolution, which is not limited in this embodiment.
  • the original deep neural network comprises: a representation vector generation network and a relevance calculation network.
  • the representation vector generation network is used to convert different types of data in the training sample into representation vectors and input the representation vectors to the relevance calculation network, and the relevance calculation network is used to convert at least two input representation vectors into a relevance metric value.
  • the relevance calculation network may comprise: a hidden layer set and an output layer connected to an output end of the hidden layer set; wherein the hidden layer set comprises one or more hidden layers connected end to end, a representation vector output end of the vector generation network is connected to an input end of the hidden layer set, and the output layer outputs the relevance metric value.
  • FIG. 2 is a schematic structural diagram of a deep neural network applicable to the first embodiment of the present invention.
  • the training sample input to the deep neural network may comprise: a query and image data.
  • the image data further comprises: image surrounding text data, image content data, and image associated characteristic data.
  • the representation vector generation network may comprise four representation vector generation units that are separately configured to convert the input query, the image surrounding text data, the image content data and the image associated characteristic data into corresponding representation vectors, for conducting subsequent model training works.
  • the representation vector generation unit may have various implementations according to different task targets. A brief description is provided herein:
  • the representation vector generation for image pixel content is currently widely implemented with a CNN (Convolutional Neural Network) classification network.
  • the input of the network is a size-normalized image pixel matrix, and the output thereof is a classification representation vector of an image.
  • the classification representation vector is generally a classification probability distribution vector P of the image over a category system (the category system generally has thousands to tens of thousands of category labels for images).
  • P = (p_1, p_2, . . . , p_n), where n is the size (the number of categories) of the category system.
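For illustration, such a classification probability distribution vector P can be produced by a softmax over the classification network's final-layer scores; the three-category logits below are invented for the example.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])  # hypothetical scores for 3 categories
P = softmax(logits)
# P is a probability distribution: entries in [0, 1] summing to 1.
print(P.argmax())  # category 0 has the highest probability
```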
  • the representation vectors are directly input to the relevance calculation network.
  • the representation vectors may first pass through a full-connection hidden layer (related concepts of the hidden layer will be described later) and then be input to the relevance calculation network.
  • the output of the full-connection hidden layer may be construed as being similar to an Embedding expression (related definitions of the Embedding expression will be described later) of a text.
  • for the query text and the image surrounding text, the representation vectors are generated in the same manner; both are cases of text representation vector generation.
  • each partitioned word is mapped to a one-hot representation vector according to a preset dictionary, for example, ( . . . , 0, . . . , 1, . . . , 0, . . . ), wherein the length of the vector equals the size of the dictionary, exactly one element is 1, and all other elements are 0.
  • the position of the element 1 corresponds to the serial number of the word in the dictionary.
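A toy illustration of this one-hot mapping (the four-word dictionary is assumed purely for the example):

```python
# Dictionary mapping each word to its serial number (assumed, for illustration).
dictionary = {"birthday": 0, "card": 1, "balloon": 2, "mountain": 3}

def one_hot(word):
    # Vector length = dictionary size; the single 1 sits at the
    # word's serial number in the dictionary.
    vec = [0] * len(dictionary)
    vec[dictionary[word]] = 1
    return vec

print(one_hot("card"))  # [0, 1, 0, 0]
```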
  • a BoW-DNN (Bag of Words-Deep Neural Network), a CNN (Convolutional Neural Network), or an RNN (Recurrent Neural Network) may then be used to generate the text representation vector.
  • the BoW-DNN network superimposes the one-hot representation vectors of all partitioned words in the text and inputs the sum to a full-connection hidden layer. The weights on the connection edges from each partitioned word's position in the one-hot representation to the neural units of the hidden layer form a vector (whose number of dimensions equals the number of neural units of the hidden layer), also referred to as the word vector of that word.
  • The output vector of the hidden layer is the response of the hidden layer's neural units to the sum of the word vectors of the words in the text, and is also referred to as an Embedding representation of the text. Because the words in the text are simply superimposed without considering word order, this is referred to as a Bag of Words.
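The equivalence described above (the superimposed one-hot vectors multiplied by the weight matrix equals the sum of the words' word vectors) can be sketched as follows; the vocabulary size, hidden width, and tanh response are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_units = 4, 3
# Row i of W is the word vector of dictionary word i.
W = rng.normal(size=(vocab_size, hidden_units))

def bow_embedding(word_ids):
    # Summing rows of W is exactly (sum of one-hot vectors) @ W.
    summed = W[word_ids].sum(axis=0)
    return np.tanh(summed)  # hidden-layer response (tanh assumed)

emb = bow_embedding([0, 1])  # e.g. "birthday card" as word ids
# Word order is ignored: any permutation yields the same Embedding.
assert np.allclose(emb, bow_embedding([1, 0]))
```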
  • the CNN network sequentially concatenates the word vectors of the words in the text and, after a one-dimensional convolution operation and downsampling (also referred to as pooling), obtains a fixed-length vector.
  • the vector may also be considered as an Embedding representation of the text, but added with the function of partial word order.
  • such a CNN network is also a generalization of the image pixel CNN network to one-dimensional texts.
  • the RNN network takes word order into account by inputting the word vectors of the words to the full-connection hidden layer one at a time, and feeding the hidden-layer output for the current word back into the hidden layer together with the next word.
  • the output thereof is also a fixed-length vector, and may also be considered as an Embedding representation of the text.
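The CNN and RNN text encoders described above can be sketched in a few lines of numpy. The dimensions, filter width, and tanh response are illustrative assumptions; the point is that both encoders return a fixed-length Embedding regardless of text length.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, hidden, width = 4, 3, 2
word_vecs = rng.normal(size=(10, dim))      # toy word-vector table
K = rng.normal(size=(width * dim, hidden))  # 1-D convolution filter
Wx = rng.normal(size=(dim, hidden))         # RNN input weights
Wh = rng.normal(size=(hidden, hidden))      # RNN recurrent weights

def cnn_encode(word_ids):
    # Concatenate adjacent word vectors, convolve, then max-pool:
    # fixed-length output with partial word order preserved.
    X = word_vecs[word_ids]
    windows = [X[i:i + width].ravel() @ K for i in range(len(X) - width + 1)]
    return np.max(windows, axis=0)

def rnn_encode(word_ids):
    # Each word's hidden output is fed back together with the next word,
    # so the full word order is taken into account.
    h = np.zeros(hidden)
    for t in word_ids:
        h = np.tanh(word_vecs[t] @ Wx + h @ Wh)
    return h

text = [0, 1, 2]
print(cnn_encode(text).shape, rnn_encode(text).shape)  # both fixed-length
```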
  • the word vectors and the three networks may be trained separately, or word vectors or networks that have been trained in other tasks may be used, or the vectors and the three networks may be trained in this task together with the subsequent relevance calculation network.
  • the word vectors and the three networks may be initialized randomly, or may be initialized by using the trained results in other tasks, and continuously updated in the training of this task.
  • the representation generation network of other characteristic data of the image depends on physical meanings of the characteristics.
  • the CNN or RNN network may be used.
  • the BoW-DNN network may be used.
  • the representation vector generation of the image pixel content and the representation vector generation of the image associated characteristic data may be trained separately, or trained in this task together with subsequent networks.
  • the parameters may be initialized randomly, or initialization may be conducted by using parameters that have been trained in other tasks.
  • the relevance calculation network further comprises two hidden layers connected end to end, and an output layer.
  • the hidden layer refers to a full-connection hidden layer, where full connection specifically means that every output of the previous layer is connected to every input of the next layer.
  • Each hidden layer has several neural units, and the output layer has only one neural unit. All representation vectors of the query and the image data are input to the first full-connection hidden layer. After linear summarization and non-linear responses of the neural units of the hidden layers and the output layer, a numerical value is finally output, which is the relevance metric value between the query and the image.
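A minimal sketch of this relevance calculation network, assuming two tanh hidden layers, illustrative layer sizes, and a random concatenated input standing in for the representation vectors of the query and the image data:

```python
import numpy as np

rng = np.random.default_rng(2)

def dense(x, W, b):
    # Full connection: every output of the previous layer feeds
    # every neural unit of the next layer.
    return np.tanh(x @ W + b)

# Concatenated representation vectors (sizes are illustrative only).
x = rng.normal(size=8)

W1, b1 = rng.normal(size=(8, 6)), np.zeros(6)  # hidden layer 1
W2, b2 = rng.normal(size=(6, 4)), np.zeros(4)  # hidden layer 2
Wo, bo = rng.normal(size=(4, 1)), np.zeros(1)  # output layer: one neural unit

h = dense(dense(x, W1, b1), W2, b2)
relevance = (h @ Wo + bo).item()  # single numerical relevance metric value
print(type(relevance).__name__)   # a plain scalar
```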
  • the relevance calculation network may further comprise: a first hidden layer set, a first standard vector representation unit connected to an output end of the first hidden layer set, a second hidden layer set, and a second standard vector representation unit connected to an output end of the second hidden layer set.
  • a vector distance calculation unit is connected to output ends of the first standard vector representation unit and the second standard vector representation unit respectively.
  • the hidden layer set comprises one or more hidden layers connected end to end.
  • a representation vector output end, which corresponds to the query, in the representation generation network is connected to an input end of the first hidden layer set.
  • a representation vector output end, which corresponds to the image data, in the representation generation network is connected to an input end of the second hidden layer set, and the vector distance calculation unit outputs the relevance metric value.
  • FIG. 3 is a schematic structural diagram of another deep neural network applicable to the first embodiment of the present invention.
  • a representation vector generation unit corresponding to a query is connected to a first hidden layer.
  • the representation vector generation units corresponding to image surrounding text data, image content data and image associated characteristic data respectively are connected to a second hidden layer.
  • the first hidden layer and the second hidden layer are connected to a first standard vector representation unit and a second standard vector representation unit respectively.
  • the first standard vector representation unit and the second standard vector representation unit are separately used to convert vectors output by the first hidden layer and the second hidden layer into two new representation vectors.
  • the two new representation vectors not only have a uniform format, but also are in the same representation space, and thereby can be input to a vector distance calculating unit to calculate a relevance metric value.
  • the vector distance calculating unit may calculate a cosine distance between the two vectors output by the first standard vector representation unit and the second standard vector representation unit to determine the relevance metric value between them, or calculate another vector distance that measures the similarity between the two vectors, such as a Euclidean distance, which is not limited in this embodiment.
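Both distance options can be computed directly; the two three-dimensional vectors below are stand-ins for the outputs of the standard vector representation units.

```python
import numpy as np

def cosine_similarity(u, v):
    # Cosine of the angle between the two standard representation vectors.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def euclidean_distance(u, v):
    return float(np.linalg.norm(u - v))

q_vec = np.array([1.0, 0.0, 1.0])  # query-side standard representation
i_vec = np.array([1.0, 0.2, 0.9])  # image-side standard representation
print(round(cosine_similarity(q_vec, i_vec), 3))
```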
  • the hidden layer comprises at least two neural units
  • the output layer comprises one neural unit
  • the number of vector dimensions generated by the representation vector generation network, the number of hidden layers comprised in the relevance calculation network, the number of neural units comprised in the hidden layer, the type of a response function of the neural unit, and a method of regularizing an output of the neural unit may be predetermined according to a task.
  • these structural variables and parameters may be preset according to the task.
  • the weights are generally initialized in a particular manner (such as random initialization), and are then trained and updated by using a large number of training samples until they converge to a certain extent.
  • the original deep neural network is trained by using a large number of training samples, to obtain the image search relevance prediction model.
  • Inputs of the image search relevance prediction model are a query entered by a user and image data of a target image (for example, comprising: image surrounding text data, image content data, and image associated characteristic data), and an output thereof is a relevance metric value between the query and the target image.
  • an original deep neural network is constructed first, wherein inputs of the original deep neural network are a query and image data, and an output thereof is a relevance metric value between the query and the image data.
  • Appropriate training samples are selected to train the original deep neural network, and finally, the original deep neural network may be trained into an image search relevance prediction model.
  • an image search engine receives an image query entered by a user, the image query and to-be-sorted images are input to the image search relevance prediction model to obtain relevance metric values between the to-be-sorted images and the image query. Then, the to-be-sorted images are sorted based on the relevance metric values, and the sorting result is returned to the user.
  • the present invention optimizes the existing image search technology, and is more capable than the prior art and its various integrations and variations in terms of semantic matching between a query and an image text, semantic matching between a query and image content, click generalization, and the like. Moreover, the degree of relevance between an image search result and a query entered by a user can be greatly improved.
  • FIG. 4 is a flowchart of a method for establishing an image search relevance prediction model according to a second embodiment of the present invention.
  • This embodiment is optimized on the basis of the above embodiment, and in this embodiment, the step of training a pre-constructed original deep neural network by using a training sample is specifically optimized by: selecting a set number of training samples; sequentially acquiring a training sample to input to the original deep neural network, and adjusting a weighted parameter in the original deep neural network according to an output result, which is obtained in response to inputting the training sample, of the original deep neural network; and returning to execute the operation of acquiring a training sample to input to the original deep neural network, till a preset training termination condition is achieved.
  • the method of this embodiment further comprises the following steps:
  • a query and a positive-negative sample image pair (also briefly referred to as a pair) under the query may be used as a training sample. That is, a training sample consists of a query and a pair formed by two images. In the pair, one image has better relevance to the query than the other; the two images are referred to as the positive sample and the negative sample respectively.
  • the training sample may be specifically optimized as: a positive-negative training pair consisting of a training query as well as a positive sample image and a negative sample image that separately correspond to the training query.
  • a training query is “birthday card”
  • a positive sample image corresponding to the training query is an image 1
  • a negative sample image corresponding to the training query is an image 2
  • a training sample in the form of <(birthday card, image 1), (birthday card, image 2)> may be constructed accordingly.
  • the positive sample image and the negative sample image corresponding to the training query may be manually determined according to the degree of relevance between different images and the query.
  • high labour costs are required because a large number of training samples are involved during training of the original deep neural network.
  • different persons have different evaluation standards for the degree of relevance. Therefore, in a preferred implementation of this embodiment, the positive sample image and the negative sample image corresponding to a query may be automatically determined from users' image click logs. For example, after a user enters a query to search for images, an image clicked by the user in the search result is used as a positive sample image for the query, and an image not clicked by the user is used as a negative sample image for the query.
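A sketch of deriving positive-negative training pairs from such click logs; the log record format (shown images plus clicked images per query) is an assumption for illustration.

```python
# Toy click log: for each query, which images were shown and which clicked.
click_log = [
    {"query": "birthday card", "shown": [101, 102, 103], "clicked": [101]},
    {"query": "mountain photo", "shown": [201, 202], "clicked": [202]},
]

def make_training_pairs(log):
    # Clicked images serve as positive samples, unclicked ones as
    # negative samples, yielding <query, (positive, negative)> pairs.
    pairs = []
    for record in log:
        positives = set(record["clicked"])
        negatives = [i for i in record["shown"] if i not in positives]
        for pos in sorted(positives):
            for neg in negatives:
                pairs.append((record["query"], pos, neg))
    return pairs

pairs = make_training_pairs(click_log)
print(pairs[0])  # ('birthday card', 101, 102)
```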
  • the training of the original deep neural network is accomplished by using the positive-negative training pair, and therefore, to improve the training efficiency, preferably, two completely identical original deep neural networks may be constructed for receiving a positive training pair consisting of the training query and the positive sample image and a negative training pair consisting of the training query and the negative sample image, respectively, thereby implementing quick and real-time model training.
  • FIG. 5 shows a schematic structural diagram of a training network model.
  • step 420 may preferably comprise the following operations:
  • a query and a positive sample image are input to a first network with a structure identical to that of the original deep neural network, to obtain a relevance predicted value 1.
  • the query and a negative sample image are input to a second network identical to the first network (including its weights), to obtain a relevance predicted value 2.
  • the difference between the two predicted values is included in a loss function (also referred to as a rank cost).
  • weights of layers are updated layer-by-layer along a direction reverse to the direction of minimizing the loss function, and this type of method is generally referred to as a Back Propagation (BP) algorithm.
  • Specific weight updating algorithms comprise various gradient descent methods, such as L-BFGS (a quasi-Newton algorithm) or SGD (stochastic gradient descent), wherein SGD converges faster and is more commonly used.
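  • The weight-sharing pairwise training described above can be sketched as follows (a minimal illustration in which the deep network of FIG. 5 is replaced by a single linear scoring layer, so that back propagation collapses to one gradient step; all variable names and feature values are hypothetical):

```python
import numpy as np

def rank_loss_and_grad(w, x_pos, x_neg):
    """Pairwise rank cost for one positive-negative training pair.

    The same weight vector w scores both the (query, positive image)
    feature vector x_pos and the (query, negative image) feature vector
    x_neg, mirroring the two weight-sharing networks of FIG. 5.
    """
    margin = x_pos @ w - x_neg @ w            # predicted value 1 - predicted value 2
    loss = np.log1p(np.exp(-margin))          # logistic rank cost
    grad = -(x_pos - x_neg) / (1.0 + np.exp(margin))  # d loss / d w
    return loss, grad

rng = np.random.default_rng(0)
w = rng.normal(size=4)                        # shared weights of "both" networks
x_pos = np.array([1.0, 0.5, 0.0, 0.2])        # hypothetical (query, positive image) features
x_neg = np.array([0.1, 0.0, 0.9, 0.4])        # hypothetical (query, negative image) features

losses = []
for _ in range(100):                          # SGD on the shared weights
    loss, grad = rank_loss_and_grad(w, x_pos, x_neg)
    losses.append(loss)
    w = w - 0.1 * grad
```

Because both predicted values are produced by the same weight vector, a single gradient update adjusts "both networks" at once, which corresponds to the synchronous weight sharing described above.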
  • Because the technical solution of this embodiment involves two identical networks, the parameters are shared and the weights are always updated synchronously.
  • a training termination condition may be set according to actual requirements, for example, the number of training rounds (such as 1000 or 2000), a total error value of the neural network with respect to the training samples, or the like, which is not limited in this embodiment.
  • the technical solution of the present invention constructs a positive-negative training pair according to positive and negative sample images corresponding to a same training query to serve as a training sample, and constructs, based on the positive-negative training pair, two networks identical to a preset original neural network model, so as to train the weights of the model synchronously based on the positive-negative training pair. Accordingly, this avoids the problems that manually labeling a large number of training samples is tremendously time consuming and that a unified labeling standard is hard to establish. Moreover, training of the original neural network model can be accomplished quickly and efficiently.
  • FIG. 6 is a flowchart for a method of establishing an image search relevance prediction model according to a third embodiment of the present invention.
  • This embodiment is optimized on the basis of the foregoing embodiment.
  • the step of selecting a set number of training samples is specifically optimized by: summarizing image click information corresponding to a same query sample according to image click logs of search users, wherein the query sample comprises: a single query or at least two queries meeting a set similarity threshold condition; generating a positive image sample set and a negative image sample set corresponding to the query sample according to the summarized image click information; and selecting a set number of query samples as the training queries, and generating positive-negative training pairs corresponding to the training queries respectively according to positive image sample sets and negative image sample sets corresponding to the training queries respectively, to serve as the training samples.
  • the method of this embodiment further comprises the following steps:
  • the preset original neural network has a great number of weight parameters, and if text-related parameters (for example, Embedding parameters of a text dictionary) in the representation vector generation network are all included in the training update, there will be millions of parameters.
  • training data required is mainly generated from image click logs of users.
  • a user may click multiple images in a query process.
  • the clicked images may have better relevance with a query entered by the user than images viewed but not clicked by the user.
  • the query sample may merely comprise a single query. Further, less popular queries having fewer clicks or no clicks may share clicked images with other queries according to semantic similarities. Correspondingly, the query sample may further comprise at least two queries meeting a set similarity threshold condition.
  • “birthday card” may be directly selected as a query sample, or “birthday card”, “birthdate card” and “date-of-birth card” may together be used as one query sample in a manner of semantic similarity clustering.
  • the image click information may merely comprise: clicked images corresponding to the query sample.
  • the step of generating a positive image sample set and a negative image sample set corresponding to the query sample according to the summarized image click information may further comprise: grouping, among the clicked images, images with the number of clicks exceeding a first threshold into the positive image sample set, and images with the number of clicks less than a second threshold into the negative image sample set.
  • the first threshold and the second threshold may be preset according to actual conditions, and they may be identical or different, which is not limited in this embodiment.
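  • The threshold-based grouping above may be sketched as follows (a simple illustration; the image identifiers, click counts, and threshold values are hypothetical):

```python
def split_by_clicks(click_counts, first_threshold, second_threshold):
    """Group clicked images into a positive and a negative image sample set.

    Images whose number of clicks exceeds first_threshold go into the
    positive image sample set; images with fewer clicks than
    second_threshold go into the negative image sample set; images in
    between are treated as ambiguous and skipped.
    """
    positive_set = {img for img, n in click_counts.items() if n > first_threshold}
    negative_set = {img for img, n in click_counts.items() if n < second_threshold}
    return positive_set, negative_set

# Hypothetical summarized click information for one query sample.
clicks = {"image_1": 40, "image_2": 25, "image_3": 3, "image_4": 0, "image_5": 10}
pos_set, neg_set = split_by_clicks(clicks, first_threshold=20, second_threshold=5)
```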
  • the image click information may comprise both clicked images corresponding to the query sample and an image search result corresponding to the query sample.
  • the step of generating a positive image sample set and a negative image sample set corresponding to the query sample according to the summarized image click information may further comprise: among images corresponding to the image search result, grouping clicked images into the positive image sample set, and grouping unclicked images into the negative image sample set.
  • the step 630 may further comprise the following operations:
  • For example, the acquired training query is “birthday card”;
  • a target positive sample image set corresponding to “birthday card” comprises: “image 1 to image 20”;
  • a target negative sample image set corresponding to “birthday card” comprises: “image 21 to image 80”.
  • the set sample image selecting rule may comprise: selecting according to the number of clicks, selecting according to the popularity of images, selecting randomly, or the like, which is not limited in this embodiment. Meanwhile, the first number may be identical to or different from the second number, and the two may be set in a customized manner according to actual requirements.
  • some images may be selected randomly from presented images of other queries to serve as negative samples, referred to as random negative samples. It should be understood that because the random negative sample is significantly different from the current operation query, it may be considered as highly credible.
  • At least one image is acquired from a positive image sample set corresponding to a non-associated query other than the current operation query, to serve as a target negative sample image corresponding to the current operation query.
  • a positive sample image set corresponding to a query “tiger” comprises “image 81 to image 100”, and “image 81 to image 100” may all be used as target negative sample images of “birthday card”.
  • target positive sample images “image 1 to image 3” corresponding to the training query “birthday card” and target negative sample images “image 21 to image 22, image 81” corresponding to the training query “birthday card” may be selected.
  • the set positive-negative image combination rule may comprise: combining any of the target positive sample images and any of the target negative sample images into a positive-negative training pair, to finally determine the training samples.
  • the set positive-negative image combination rule may alternatively comprise: separately combining each of the target positive sample images with each of the target negative sample images into positive-negative training pairs, to finally determine the training samples.
  • training samples may be finally generated by using other positive-negative image combination rules, which is not limited in this embodiment.
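  • The combination rule may be sketched as a full cross product (a minimal illustration; the image identifiers are hypothetical):

```python
from itertools import product

def build_training_pairs(target_positives, target_negatives):
    """Combine any target positive sample image with any target negative
    sample image into a positive-negative training pair (full cross product)."""
    return list(product(target_positives, target_negatives))

# Hypothetical selections for one training query.
positives = ["image_1", "image_2", "image_3"]
negatives = ["image_21", "image_22", "image_81"]
pairs = build_training_pairs(positives, negatives)
```

Each resulting pair supplies one positive training pair and one negative training pair to the two weight-sharing networks during training.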
  • determining whether a preset training termination condition is achieved; if yes, executing 660; otherwise, returning to execute 640.
  • the method may further comprise: filtering out noise logs comprised in the image click logs.
  • Such setting is carried out because there may be much noise in image click logs of users. For example, a user may be attracted to click some inappropriate or malicious images that contrast sharply with the truly related images, and such images tend to be clicked under whatever query presents them. In addition, for some queries with a large number of related results, the requirement of a user is already met after the user browses the related images at the front, so the probability of subsequent related images being clicked decreases significantly. Both behaviors distort the correspondence among click/non-click, the number of clicks, and the relevance. Therefore, to further improve the accuracy of selected positive and negative sample images, it is necessary to filter out noise logs comprised in the image click logs.
  • Noise log recognition and removal are necessary operations for ensuring the accuracy of a trained model. Two methods are briefly introduced herein:
  • Click query clustering method: All queries (referred to as click queries below) in which an image (comprising repeated and similar images) is clicked are gathered and clustered; major requirement categories that the image meets can thus be obtained, while minor categories can be regarded as noise and removed. All click queries far away from the major requirement queries can be considered noise clicks.
  • Image clustering method: All clicked images corresponding to a query (comprising semantically identical and similar queries) are gathered, and the classification results or class representations of these images are clustered; major image categories meeting the requirement of the query can thus be obtained, while minor categories can be regarded as noise and removed.
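  • The image clustering method can be approximated by a simple majority-category filter (a rough sketch; a real system would cluster learned image representations, and the category labels and the `min_share` cutoff here are hypothetical):

```python
from collections import Counter

def filter_noise_clicks(clicked_image_categories, min_share=0.3):
    """Keep clicked images whose category is a major requirement category;
    drop images in minor categories as noise clicks.

    clicked_image_categories maps image id -> a category label (here a
    hypothetical classification result for the image).
    """
    counts = Counter(clicked_image_categories.values())
    total = sum(counts.values())
    major = {c for c, n in counts.items() if n / total >= min_share}
    return {img for img, c in clicked_image_categories.items() if c in major}

# Hypothetical clicks under the query sample "birthday card".
clicked = {
    "image_1": "greeting_card", "image_2": "greeting_card",
    "image_3": "greeting_card", "image_4": "greeting_card",
    "image_5": "tiger",   # an off-topic noise click
}
kept = filter_noise_clicks(clicked)
```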
  • positive (or negative) samples under a query may be sorted according to their credibility.
  • the credibility of a positive sample may be deduced according to user behavior evidence. For example, generally speaking, a positive sample having a higher click-through rate, having more clicks, and clicked when presented farther away from the top has higher credibility.
  • the negative samples may be sorted similarly. For negative samples presented but not clicked, where there is no user behavior evidence, the credibility may also be deduced according to relevance. For example, a negative sample having fewer presentations (in the same time window) and presented farther away from the top has poorer relevance, and thus higher credibility as a negative sample.
  • the credibility of a random negative sample may be considered as the highest.
  • after being sorted, positive (or negative) samples may be selected sequentially or randomly, to balance noise against the discriminative power of the model.
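  • The credibility-based sorting of positive samples may be sketched as follows (a minimal illustration; the field names `ctr`, `clicks`, and `position` are hypothetical, with a larger `position` meaning the click happened farther from the top):

```python
def sort_positives_by_credibility(samples):
    """Sort positive samples so that a higher click-through rate, more
    clicks, and a click position farther from the top rank first."""
    return sorted(
        samples,
        key=lambda s: (s["ctr"], s["clicks"], s["position"]),
        reverse=True,
    )

# Hypothetical positive samples under one query.
positives = [
    {"img": "image_2", "ctr": 0.10, "clicks": 30, "position": 2},
    {"img": "image_1", "ctr": 0.40, "clicks": 50, "position": 8},
    {"img": "image_3", "ctr": 0.40, "clicks": 20, "position": 5},
]
ranked = [s["img"] for s in sort_positives_by_credibility(positives)]
```

The most credible samples at the head of the list can then be taken sequentially, or a random subset can be drawn, as described above.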
  • queries may be selected according to a task target, and proportions of different types of queries are adjusted, for example: the proportion of high-frequency (low-frequency) queries, the proportion of queries having a large (small) number of resources, or the like.
  • when the category system for image classification is excessively small, the classification result is not semantically refined; when the category system is excessively large, the classification accuracy is low and the difficulty of matching the classification label with the text of the query increases dramatically (the so-called semantic gap between an image and a text).
  • the query text and the image pixel content are matched in the representation space after deep conversion, and are not limited by the query or the image category system.
  • the existing click-through rate characteristic applies only to images with valid clicks under the query.
  • the network parameters in the embodiments of the present invention are obtained through training based on image click behaviors under all queries. This generalizes the measure of relevance between the query and the clicked images, which is comprised in the user click behaviors, to any unclicked or sparsely clicked image, and also to any query related to the current query, thereby implementing relevance calculation between any query and any image.
  • the present invention has stronger capabilities than the prior art as well as various integrations and variations in terms of semantic matching between a query and an image text, semantic matching between a query and image content, click generalization and the like, and more thoroughly solves the problems expected to be solved.
  • FIG. 8 is a flow chart of an image search method according to a fourth embodiment of the present invention.
  • the method of this embodiment may be executed by an image search apparatus, and the apparatus may be implemented by hardware and/or software and may be generally integrated in a server where an image search engine is located.
  • the method of this embodiment further comprises the following steps:
  • the image query refers to a text-form query entered by the user through an image search engine, for example, “birthday card”.
  • the to-be-sorted images specifically refer to image search results recalled by the image search engine and corresponding to the image query.
  • the technical solution of this embodiment inputs an image query and to-be-sorted images to an image search relevance prediction model trained in advance, to obtain relevance metric values between the to-be-sorted images and the image query. Then, the to-be-sorted images are sorted based on the relevance metric values, and the sorting result is returned to the user.
  • the technical solution optimizes the existing image search technology, and has stronger capabilities than the prior art as well as various integrations and variations in terms of semantic matching between a query and an image text, semantic matching between a query and image content, click generalization and the like. Moreover, the degree of relevance between an image search result and a query entered by a user can be greatly improved.
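  • The search flow above can be sketched as follows (a minimal illustration; `predict_relevance` is a hypothetical stub standing in for the trained image search relevance prediction model — here a toy keyword-overlap score):

```python
def predict_relevance(query, image):
    """Hypothetical stub for the trained model: a toy keyword-overlap score."""
    q_terms = set(query.split())
    i_terms = set(image["associated_text"].split())
    return len(q_terms & i_terms) / max(len(q_terms), 1)

def search(query, to_be_sorted_images):
    """Compute a relevance metric value for each to-be-sorted image, then
    return the image ids sorted by descending relevance."""
    scored = [(predict_relevance(query, img), img["id"]) for img in to_be_sorted_images]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [img_id for _, img_id in scored]

# Hypothetical recalled search results for the image query "birthday card".
to_be_sorted = [
    {"id": "image_9", "associated_text": "tiger in the zoo"},
    {"id": "image_1", "associated_text": "happy birthday card design"},
    {"id": "image_5", "associated_text": "birthday cake"},
]
result = search("birthday card", to_be_sorted)
```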
  • FIG. 9 is a structural diagram of an apparatus of establishing an image search relevance prediction model according to a fifth embodiment of the present invention. As shown in FIG. 9 , the apparatus comprises:
  • an original deep neural network is constructed, inputs of the original deep neural network are a query and image data, and an output thereof is a relevance metric value between the query and the image data.
  • Appropriate training samples are selected to train the original deep neural network, and finally, the original deep neural network may be trained into an image search relevance prediction model.
  • an image search engine receives an image query entered by a user, the image query and to-be-sorted images are input to the image search relevance prediction model to obtain relevance metric values between the to-be-sorted images and the image query. Then, the to-be-sorted images are sorted based on the relevance metric values, and the sorting result is returned to the user.
  • the present invention optimizes the existing image search technology, and has stronger capabilities than the prior art as well as various integrations and variations in terms of semantic matching between a query and an image text, semantic matching between a query and image content, click generalization and the like. Moreover, the degree of relevance between an image search result and a query entered by a user can be greatly improved.
  • the image data may comprise: image associated text data, image content data and image associated characteristic data; wherein the image associated characteristic data comprises: a click query corresponding to the image, and/or a quality characteristic parameter of the image.
  • the relevance calculation network may comprise: a hidden layer set and an output layer connected to an output end of the hidden layer set; wherein the hidden layer set comprises one or more hidden layers connected end to end, a representation vector output end of the vector generation network is connected to an input end of the hidden layer set, and the output layer outputs the relevance metric value.
  • the relevance calculation network may further comprise: a first hidden layer set, a first standard vector representation unit connected to an output end of the first hidden layer set, a second hidden layer set, and a second standard vector representation unit connected to an output end of the second hidden layer set.
  • a vector distance calculation unit is connected to output ends of the first standard vector representation unit and the second standard vector representation unit, respectively.
  • the hidden layer set comprises one or more hidden layers connected end to end, a representation vector output end, which corresponds to the query, in the representation generation network is connected to an input end of the first hidden layer set, a representation vector output end, which corresponds to the image data, in the representation generation network is connected to an input end of the second hidden layer set, and the vector distance calculation unit outputs the relevance metric value.
  • the hidden layer comprises at least two neural units
  • the output layer comprises one neural unit; wherein the number of vector dimensions generated by the representation vector generation network, the number of hidden layers comprised in the relevance calculation network, the number of neural units comprised in the hidden layer, the type of a response function of the neural unit, and a method of regularizing an output of the neural unit are preset according to a task.
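  • The two-branch relevance calculation may be sketched as follows (a minimal numpy illustration in which each hidden layer set is reduced to a single random linear map, the standard vector representation unit is taken to be L2 normalization, and the vector distance is cosine similarity — all simplifying assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)
W_query = rng.normal(size=(8, 16))    # stands in for the first hidden layer set (query branch)
W_image = rng.normal(size=(12, 16))   # stands in for the second hidden layer set (image branch)

def standardize(v):
    """Standard vector representation unit: L2-normalize the hidden output."""
    return v / np.linalg.norm(v)

def relevance_metric(query_vec, image_vec):
    """Project each representation vector through its own branch, standardize
    both outputs, and return the vector distance (here cosine similarity)."""
    q = standardize(query_vec @ W_query)
    i = standardize(image_vec @ W_image)
    return float(q @ i)   # a relevance metric value in [-1, 1]

score = relevance_metric(rng.normal(size=8), rng.normal(size=12))
```

Mapping both branches into a common representation space of the same dimension is what allows the query text and the image to be compared directly by a vector distance.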
  • the training module may comprise:
  • the training sample may further comprise: a positive-negative training pair consisting of a training query as well as a positive sample image and a negative sample image that separately correspond to the training query.
  • the weighted parameter adjustment unit may be further configured to:
  • the training sample selecting module may further comprise:
  • the image click information may comprise: clicked images corresponding to the query samples.
  • the positive-negative image sample set generation subunit may be further configured to: among the clicked images, group images with the number of clicks exceeding a first threshold into the positive image sample set, and group images with the number of clicks less than a second threshold into the negative image sample set.
  • the image click information may comprise: clicked images corresponding to the query sample, and an image search result corresponding to the query sample.
  • the positive-negative image sample set generation subunit may be further configured to: among images corresponding to the image search results, group clicked images into the positive image sample set, and group unclicked images into the negative image sample set.
  • the training sample generation subunit may be further configured to:
  • the training sample generation subunit may be further configured to: acquire at least one image from a positive image sample set corresponding to a non-associated query other than the current operation query, to serve as a target negative sample image corresponding to the current operation query.
  • the training sample selecting module may further comprise: a noise log filtering sub-unit, configured to: before the image click information corresponding to the same query sample are summarized according to the image click logs of the search users, filter out noise logs comprised in the image click logs.
  • the apparatus of establishing an image search relevance prediction model provided in this embodiment of the present invention may be used to execute the method of establishing an image search relevance prediction model provided in the first embodiment to the third embodiment of the present invention, has corresponding function modules, and can implement the same beneficial effects.
  • FIG. 10 is a structural diagram of an image search apparatus according to a sixth embodiment of the present invention. As shown in FIG. 10 , the apparatus comprises:
  • the technical solution of this embodiment inputs an image query and to-be-sorted images to an image search relevance prediction model trained in advance to obtain relevance metric values between the to-be-sorted images and the image query. Then, the to-be-sorted images are sorted based on the relevance metric values, and the sorting result is returned to the user.
  • the technical solution optimizes the existing image search technology, and has stronger capabilities than the prior art as well as various integrations and variations in terms of semantic matching between a query and an image text, semantic matching between a query and image content, click generalization and the like. Moreover, the degree of relevance between an image search result and a query entered by a user can be greatly improved.
  • the image search apparatus provided in the embodiment of the present invention may be used to execute the image search method provided in any embodiment of the present invention, has corresponding function modules, and can implement the same beneficial effects.
  • modules or steps of the present invention may be implemented by a server as described above.
  • the embodiments of the present invention may be implemented by using a program executable by a computer device, and thus they can be stored in a storage device and executed by a processor.
  • the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
  • Alternatively, they may be made into integrated circuit modules respectively, or multiple modules or steps among them may be made into a single integrated circuit module for implementation. In this way, the present invention is not limited to any specific combination of hardware and software.

Abstract

Embodiments of the present invention disclose a method and an apparatus of establishing an image search relevance prediction model, and an image search method and apparatus. The method of establishing an image search relevance prediction model comprises: training a pre-constructed original deep neural network by using a training sample, wherein the training sample comprises: a query and image data, and the original deep neural network comprises: a representation vector generation network and a relevance calculation network; and using the trained original deep neural network as the image search relevance prediction model. The technical solution of the present invention optimizes the existing image search technology, and has stronger capabilities than the prior art as well as various integrations and variations in terms of semantic matching between a query and an image text, semantic matching between a query and image content, click generalization and the like.

Description

    CROSS-REFERENCE TO RELATED APPLICATION
  • This application is related to and claims priority from Chinese Application No. 201610306220.6, filed on May 10, 2016, entitled “METHOD AND APPARATUS OF ESTABLISHING IMAGE SEARCH RELEVANCE PREDICTION MODEL, AND IMAGE SEARCH METHOD AND APPARATUS,” the entire disclosure of which is hereby incorporated by reference.
  • TECHNICAL FIELD
  • Embodiments of the present invention relate to information processing technologies, in particular, to a method and an apparatus for establishing an image search relevance prediction model, and an image search method and apparatus.
  • BACKGROUND
  • Image search refers to an information retrieval process in which a user enters a natural language query, for example via a text field provided by a search engine; an image collection is searched, and a list of images sorted according to relevance and other parameters is returned. The relevance is one of the major performance parameters of a search engine, and measures the degree of relevance between a returned result and a user's query need. Images returned by an image search engine are in an unstructured pixel format, while queries entered by the user are in a text format. These two completely different information formats cannot be compared by computation directly.
  • Currently, relevance characteristics of image search are described mainly using the following three approaches: 1. a text matching characteristic, which is obtained by comparing image surrounding text with a query; 2. a classification matching characteristic, which is obtained by comparing a classification label with the query, the classification label is obtained by classifying image content; and 3. a click-through rate characteristic, which is a measure of relevance between a specific image and the query obtained by conducting statistics on click behaviors and the like of a large number of user queries.
  • The above three methods for describing a relevance characteristic of image search all have limitations:
  • For the text matching characteristic: the surrounding text of an image may be inconsistent with the image content, and in many cases cannot completely and accurately describe the content of the image, thus affecting the accuracy of the text matching characteristic.
  • The classification matching characteristic is limited by the integrity of the category system and the correctness of the classification model. Generally, the finer the category system is, the more difficult the classification becomes, the less accurate the classification model is, the more the classification result deviates semantically from the query text, and the more difficult the matching becomes. However, if the category system is too coarse, the matching with the query is not precise enough. Therefore, this characteristic generally plays only an auxiliary role.
  • The click-through rate characteristic is mainly based on user behavior statistics, and suffers from biases and noise on one hand, and from sparsity on the other. Sufficient click statistics can be collected only for images presented at the top, with sufficient occurrences, under frequent queries; in other cases, no click statistics can be collected, or the clicks are so sparse that they lack statistical significance.
  • SUMMARY
  • Accordingly, embodiments of the present invention provide a method and an apparatus for establishing an image search relevance prediction model, and an image search method and apparatus, to optimize the existing image search technology and improve the relevance between an image search result and a query entered by a user.
  • In the first aspect, an embodiment of the present invention provides a method for establishing an image search relevance prediction model, comprising:
      • training a pre-constructed original deep neural network by using a training sample;
      • wherein the training sample comprises: a query and image data, and the original deep neural network comprises: a representation vector generation network and a relevance calculation network, the representation vector generation network is used to convert different types of data in the training sample into representation vectors and input the representation vectors to the relevance calculation network, and the relevance calculation network is used to convert at least two input representation vectors into a relevance metric value; and
      • using the trained original deep neural network as the image search relevance prediction model.
  • In the second aspect, an embodiment of the present invention provides a method for searching an image, comprising:
      • acquiring an image query entered by a user;
      • inputting separately the image query and to-be-sorted images into the image search relevance prediction model established by using the method for establishing an image search relevance prediction model described in the embodiment of the present invention, and calculating separately a relevance metric value between each of the to-be-sorted images and the image query; and
      • sorting the to-be-sorted images according to the calculated relevance metric values, and providing an image search result corresponding to the sorting result to the user.
  • In the third aspect, an embodiment of the present invention provides an apparatus for establishing an image search relevance prediction model, comprising:
  • a training module, configured to train a pre-constructed original deep neural network by using a training sample;
  • wherein the training sample comprises: a query and image data, and the original deep neural network comprises: a representation vector generation network and a relevance calculation network, the representation vector generation network is used to convert different types of data in the training sample into representation vectors and input the representation vectors to the relevance calculation network, and the relevance calculation network is used to convert at least two input representation vectors into a relevance metric value; and
  • a model generation module, configured to use the trained original deep neural network as the image search relevance prediction model.
  • In the fourth aspect, an embodiment of the present invention provides an image search apparatus, comprising:
      • an image query acquisition module, configured to acquire an image query entered by a user;
      • a relevance metric value calculation module, configured to separately input the image query and to-be-sorted images into the image search relevance prediction model established by using the apparatus of establishing an image search relevance prediction model described in the embodiment of the present invention, and separately calculate a relevance metric value between each of the to-be-sorted images and the image query; and
      • an image search result providing module, configured to sort the to-be-sorted images according to the calculated relevance metric values, and provide an image search result corresponding to the sorting result to the user.
  • In the embodiments of the present invention, an original deep neural network is constructed first, wherein inputs of the original deep neural network are a query and image data, and an output thereof is a relevance metric value between the query and the image data. Appropriate training samples are selected to train the original deep neural network, and finally, the original deep neural network may be trained into an image search relevance prediction model. After an image search engine receives an image query entered by a user, the image query and to-be-sorted images are input to the image search relevance prediction model to obtain relevance metric values between the to-be-sorted images and the image query. Then, the to-be-sorted images are sorted based on the relevance metric values, and the sorting result is returned to the user. The present invention optimizes the existing image search technology, and has stronger capabilities than the prior art as well as various integrations and variations in terms of semantic matching between a query and an image text, semantic matching between a query and image content, click generalization and the like. Moreover, the degree of relevance between an image search result and a query entered by a user can be greatly improved.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a flowchart of a method for establishing an image search relevance prediction model according to a first embodiment of the present invention;
  • FIG. 2 is a schematic structural diagram of a deep neural network applicable to the first embodiment of the present invention;
  • FIG. 3 is a schematic structural diagram of another deep neural network applicable to the first embodiment of the present invention;
  • FIG. 4 is a flowchart of a method for establishing an image search relevance prediction model according to a second embodiment of the present invention;
  • FIG. 5 is a schematic structural diagram of a training network model applicable to the second embodiment of the present invention;
  • FIG. 6 is a flowchart of a method for establishing an image search relevance prediction model according to a third embodiment of the present invention;
  • FIG. 7 is a flowchart of a method for generating a positive-negative sample pair according to the third embodiment of the present invention;
  • FIG. 8 is a flowchart of an image search method according to a fourth embodiment of the present invention;
  • FIG. 9 is a structural diagram of an apparatus for establishing an image search relevance prediction model according to a fifth embodiment of the present invention; and
  • FIG. 10 is a structural diagram of an image search apparatus according to a sixth embodiment of the present invention.
  • DETAILED DESCRIPTION OF EXEMPLIFIED EMBODIMENTS
  • To make the objectives, technical solutions, and advantages of the present invention clearer, the following describes the specific embodiments of the present invention in detail with reference to the accompanying drawings. It can be understood that, the specific embodiments described herein are merely used to illustrate the present invention rather than limiting the present invention.
  • In addition, it should be noted that, for ease of description, only part of rather than all of content related to the present invention is shown in the accompanying drawings. Before further detailed discussion of the exemplary embodiments, it should be noted that, some exemplary embodiments can be described as a process or method that is depicted as a flowchart. Although a flowchart can describe the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations can be rearranged. The process can be terminated when its operations are completed. However, the process may further involve additional steps not comprised in the accompanying drawings. The process may correspond to a method, a function, a routine, a subroutine, and the like.
  • To clearly describe the content of the embodiments of the present invention, the inventive concept of the present invention is emphatically described first.
  • As described above, in the three manners for describing a relevance characteristic of image search introduced in the background, relevance between an image and a query is not calculated directly on the basis of image content; instead, the relevance between the image and the query is calculated according to characteristics indirectly associated with a surrounding text of the image, a class of the image, a click-through rate of the image, and the like. In contrast, in the present application, the inventor creatively proposed that: the objective of conducting image search for a query entered by a user may be accurately implemented by establishing a relevance calculation model between image content and the query, inputs of the calculation model being the image content and the query, and an output thereof being a relevance metric value.
  • That is, the content of the image (preferably comprising a surrounding text of the image and the like) and a query text of the user are deeply converted by using a deep neural network, and a relation between the image (text+content) and the query is established during the conversion. In other words, one end of the input may be the surrounding text of the image and the image content (optionally together with other characteristics or information of the image, such as a click query of the image and various characteristics describing the image quality). The other end of the input is the query text (optionally together with other processed characteristics of the query). The final output of the deep neural network is a relevance metric value between the image and the query, which may serve as a one-dimensional characteristic of the relevance between the image and the query.
  • First Embodiment
  • FIG. 1 is a flowchart of a method for establishing an image search relevance prediction model according to a first embodiment of the present invention. The method of this embodiment may be executed by an apparatus for establishing an image search relevance prediction model, and the apparatus may be implemented by hardware and/or software and may be generally integrated in a server for establishing an image search relevance prediction model. The method of this embodiment specifically comprises the following steps.
  • 110, training a pre-constructed original deep neural network by using a training sample.
  • In this embodiment, the training sample comprises: a query and image data.
  • As described above, to ensure that the deep neural network finally outputs the relevance metric value between the image and the query, the original deep neural network needs to be trained by using the image data and the query as training samples at the same time.
  • The image data comprises image content data. Typically, the image content data may comprise: pixels of the image or a content characteristic of the image (such as a content characteristic vector) after a certain processing.
  • Preferably, to further improve the accuracy of the relevance metric value, the image data may further comprise: image associated text data, and/or image associated characteristic data.
  • The image associated text data specifically refers to: text information that is stored corresponding to the image and used to briefly describe the image content. For example, when an image is stored, a title “birthday card” of the image is stored at the same time.
  • The image associated characteristic data may comprise: a click query corresponding to the image, and/or a quality characteristic parameter of the image.
  • In this embodiment, when a search user inputs a target query and clicks to select a target image in an image search result returned by the target query, the target query is a click query of the target image. The quality characteristic parameter may comprise: parameters for describing the image quality, such as an image compression ratio, an image format, and an image resolution, which is not limited in this embodiment.
  • In this embodiment, the original deep neural network comprises: a representation vector generation network and a relevance calculation network. The representation vector generation network is used to convert different types of data in the training sample into representation vectors and input the representation vectors to the relevance calculation network, and the relevance calculation network is used to convert at least two input representation vectors into a relevance metric value.
  • In an exemplified implementation of this embodiment, the relevance calculation network may comprise: a hidden layer set and an output layer connected to an output end of the hidden layer set; wherein the hidden layer set comprises one or more hidden layers connected end to end, a representation vector output end of the representation vector generation network is connected to an input end of the hidden layer set, and the output layer outputs the relevance metric value.
  • FIG. 2 is a schematic structural diagram of a deep neural network applicable to the first embodiment of the present invention. As shown in FIG. 2, the training sample input to the deep neural network may comprise: a query and image data. The image data further comprises: image surrounding text data, image content data, and image associated characteristic data.
  • The representation vector generation network may comprise four representation vector generation units that are separately configured to convert the input query, the image surrounding text data, the image content data and the image associated characteristic data into corresponding representation vectors, for subsequent model training.
  • The representation vector generation unit may have various implementations according to different task targets. A brief description is provided herein:
  • 1. Generation of a Representation Vector of the Image Content Data (Typically, the Image Pixel Content)
  • The representation vector of the image pixel content is currently widely generated by using a CNN (Convolutional Neural Network) classification network. The input of the network is a size-normalized image pixel matrix, and the output thereof is a classification representation vector of the image. The classification representation vector is generally a classification probability distribution vector P of the image in a category system (the category system generally has thousands to tens of thousands of category labels for images): P=(p1, p2, . . . , pN), wherein pi (i=1, 2, . . . , N) is the probability, provided by the CNN network, of the image belonging to the ith category, and N is the size (the number of categories) of the category system.
  • Preferably, after processing such as weight truncation (for example, pi lower than a threshold is set as 0, or first M pi are reserved, wherein M is an integer less than or equal to N), normalization, and binarization (conversion into vectors consisting of 0 and 1) is performed on the classification representation vectors, the representation vectors are directly input to the relevance calculation network. Alternatively, the representation vectors may first pass through a full-connection hidden layer (related concepts of the hidden layer will be described later) and then be input to the relevance calculation network. The output of the full-connection hidden layer may be construed as being similar to an Embedding expression (related definitions of the Embedding expression will be described later) of a text.
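  • The post-processing steps just described can be sketched as follows; the threshold value, the `top_m` option, and the function name are illustrative assumptions rather than parameters specified by this disclosure:

```python
def postprocess_class_vector(p, threshold=0.01, top_m=None, binarize=False):
    """Post-process a CNN classification probability vector P = (p1, ..., pN):
    weight truncation, normalization, and optional binarization."""
    # Weight truncation: set pi lower than the threshold to 0 ...
    q = [pi if pi >= threshold else 0.0 for pi in p]
    # ... or, alternatively, reserve only the M largest entries (M <= N).
    if top_m is not None:
        keep = set(sorted(range(len(q)), key=lambda i: q[i], reverse=True)[:top_m])
        q = [qi if i in keep else 0.0 for i, qi in enumerate(q)]
    # Normalization: rescale the surviving weights to sum to 1.
    s = sum(q)
    if s > 0:
        q = [qi / s for qi in q]
    # Binarization: convert into a vector consisting of 0 and 1.
    if binarize:
        q = [1.0 if qi > 0 else 0.0 for qi in q]
    return q
```

  • The resulting vector may then be input to the relevance calculation network directly, or first passed through a full-connection hidden layer as the text notes.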
  • 2. Generation of Representation Vectors of the Image Surrounding Text Data and the Query
  • Because image surrounding text data and the query are both texts, representation vector generation manners thereof are the same, which are both text representation vector generation.
  • The text is partitioned, and then each partitioned word is mapped into a one-hot representation vector according to a preset dictionary, for example, ( . . . , 0, . . . , 1, . . . , 0, . . . ), wherein the length of the vector is the size of the dictionary, one element is 1, and all other elements are 0. The serial number of the position where the element 1 is located corresponds to the serial number of the word in the dictionary. There are several options for subsequent processing, such as a BoW-DNN (Bag of Words-Deep Neural Networks) network, a CNN network, or an RNN (Recurrent Neural Network) network, which is not limited in this embodiment.
  • The BoW-DNN network superimposes the one-hot representation vectors of all partitioned words in the text and then inputs the result to a full-connection hidden layer. The weights from the position of each partitioned word in the one-hot representation vector to the connection edges of the neural units of the hidden layer are joined to form a vector (the number of dimensions thereof is the same as the number of neural units of the hidden layer), which is also referred to as the word vector of the word. The output vector of the hidden layer is actually the response value of the sum of the word vectors of the words in the text passing through the neural units of the hidden layer, and is also referred to as an Embedding representation of the text. Because the words in the text are simply superimposed without considering the word order, this is referred to as a Bag of Words.
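  • A minimal sketch of the BoW-DNN processing above (the tanh response function and all names are illustrative assumptions; a real network would also carry bias terms):

```python
import math

def one_hot(word, dictionary):
    """One-hot vector of a word over a preset dictionary (word -> index)."""
    v = [0.0] * len(dictionary)
    v[dictionary[word]] = 1.0
    return v

def bow_embedding(words, dictionary, weights):
    """Superimpose the one-hot vectors of the partitioned words, then project
    them through a full-connection hidden layer.  Row i of `weights` is the
    word vector of dictionary word i."""
    hidden_dim = len(weights[0])
    summed = [0.0] * len(dictionary)
    for w in words:
        if w in dictionary:                      # words not in the dictionary are dropped
            summed = [s + o for s, o in zip(summed, one_hot(w, dictionary))]
    # Multiplying the superimposed one-hot vector by the weight matrix equals
    # summing the word vectors of the words in the text.
    pre = [sum(summed[i] * weights[i][j] for i in range(len(dictionary)))
           for j in range(hidden_dim)]
    return [math.tanh(x) for x in pre]           # Embedding representation
```
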
  • To consider the word order, the CNN network sequentially joins the word vectors of the words in the text, and after a one-dimensional convolution operation and downsampling (also referred to as pooling), a fixed-length vector is obtained. The vector may also be considered as an Embedding representation of the text, but with partial word-order information added. Such a CNN network is also a generalization of applying an image pixel CNN network to one-dimensional texts.
  • The RNN network also takes the word order into consideration, in a manner of inputting the word vectors of the words to the full-connection hidden layer, and feeding the output of a current word after passing through the hidden layer back into the hidden layer together with the next word. The output thereof is also a fixed-length vector, and may also be considered as an Embedding representation of the text.
  • The word vectors and the three networks may be trained separately, or word vectors or networks that have been trained in other tasks may be used, or the vectors and the three networks may be trained in this task together with the subsequent relevance calculation network. When being trained together with the relevance calculation network, the word vectors and the three networks may be initialized randomly, or may be initialized by using the trained results in other tasks, and continuously updated in the training of this task.
  • 3. Generation of Representation Vectors of Image Associated Characteristic Data
  • The representation generation network of the other characteristic data of the image depends on the physical meanings of the characteristics. In the case of an ordered format such as an image or a text, the CNN or RNN network may be used. In the case of an unordered set characteristic (such as a probability distribution vector and some independent statistical values), the BoW-DNN network may be used.
  • Similar to the representation vector generation of the text, the representation vector generation of the image pixel content and the representation vector generation of the image associated characteristic data may be trained separately, or trained in this task together with subsequent networks. When being trained together, the parameters may be initialized randomly, or initialization may be conducted by using parameters that have been trained in other tasks.
  • As shown in FIG. 2, the relevance calculation network further comprises two hidden layers connected end to end, and an output layer.
  • In this embodiment, the hidden layer refers to a full-connection hidden layer, wherein full connection specifically means that every output of a previous layer is connected to every input of the next layer. Each hidden layer has several neural units, and the output layer has only one neural unit. All representation vectors of the query and the image data are input to the first full-connection hidden layer. After linear summarization and non-linear responses of the neural units of the hidden layers and the output layer, a numerical value is finally output, which is the relevance metric value between the query and the image.
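  • A minimal sketch of this relevance calculation network, assuming a tanh response function and omitting bias terms and output regularization for brevity:

```python
import math

def mlp_relevance(inputs, hidden_weights, output_weights):
    """The concatenated representation vectors pass through full-connection
    hidden layers and a single-unit output layer that emits the relevance
    metric value."""
    x = list(inputs)
    for layer in hidden_weights:     # one weight matrix per hidden layer
        # Full connection: every neural unit sees every output of the previous layer.
        x = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in layer]
    # Output layer: a single neural unit producing one numerical value.
    return sum(w * xi for w, xi in zip(output_weights, x))
```
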
  • In another exemplified implementation of this embodiment, the relevance calculation network may further comprise: a first hidden layer set, a first standard vector representation unit connected to an output end of the first hidden layer set, a second hidden layer set, and a second standard vector representation unit connected to an output end of the second hidden layer set. A vector distance calculation unit is connected to output ends of the first standard vector representation unit and the second standard vector representation unit respectively.
  • The hidden layer set comprises one or more hidden layers connected end to end. A representation vector output end, which corresponds to the query, in the representation generation network is connected to an input end of the first hidden layer set. A representation vector output end, which corresponds to the image data, in the representation generation network is connected to an input end of the second hidden layer set, and the vector distance calculation unit outputs the relevance metric value.
  • FIG. 3 is a schematic structural diagram of another deep neural network applicable to the first embodiment of the present invention. As shown in FIG. 3, a representation vector generation unit corresponding to a query is connected to a first hidden layer. The representation vector generation units corresponding to image surrounding text data, image content data and image associated characteristic data respectively are connected to a second hidden layer. The first hidden layer and the second hidden layer are connected to a first standard vector representation unit and a second standard vector representation unit respectively.
  • The first standard vector representation unit and the second standard vector representation unit are separately used to convert the vectors output by the first hidden layer and the second hidden layer into two new representation vectors. The two new representation vectors not only have a uniform format, but are also in the same representation space, and thereby can be input to a vector distance calculation unit to calculate a relevance metric value.
  • Typically, the vector distance calculation unit may calculate a cosine distance between the two vectors output by the first standard vector representation unit and the second standard vector representation unit, to determine the relevance metric value between the two vectors, or calculate another vector distance for measuring the similarity between the two vectors, such as a Euclidean distance between the two vectors, which is not limited in this embodiment.
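  • The cosine distance computed by the vector distance calculation unit can be sketched as follows (a Euclidean distance could be substituted, as noted above):

```python
import math

def cosine_relevance(u, v):
    """Cosine distance between the two standard representation vectors,
    used as the relevance metric value."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Guard against zero vectors to avoid division by zero.
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```
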
  • In an exemplified implementation of this embodiment, the hidden layer comprises at least two neural units, and the output layer comprises one neural unit.
  • The number of vector dimensions generated by the representation vector generation network, the number of hidden layers comprised in the relevance calculation network, the number of neural units comprised in the hidden layer, the type of a response function of the neural unit, and a method of regularizing an output of the neural unit may be predetermined according to a task.
  • The representation vector generation network and the relevance calculation network constructed in this embodiment mainly have the following variables or parameters:
      • 1) the number of dimensions of an input characteristic (a query and image data). For example, for a text, the number of characteristic dimensions refers to the size of a text dictionary, and is generally hundreds of thousands or millions. A Chinese text may be partitioned first, and a word not existing in the dictionary may be removed, or replaced with a special symbol (which is put in the dictionary). For the image pixel content, it refers to the number of channels and the normalized size of the image;
      • 2) the number of dimensions of a representation vector, and a generation network structure of the representation vector (this part has been described above);
      • 3) the number of hidden layers and the number of neural units of each hidden layer;
      • 4) a response function type of the neural unit;
      • 5) a method of regularizing an output of the neural unit, to avoid overflow and underflow of an output value; and
      • 6) a weight value on an input edge of the neural unit, and an initialization method.
  • The variables and parameters in 1) and 3) to 5) may be preset according to a task. The weight in 6) is generally initialized in a particular manner (such as random initialization), and is then trained and updated by using a large number of training samples, till it is converged to a certain extent.
  • 120, using the trained original deep neural network as the image search relevance prediction model.
  • As described above, after the network structure of the original deep neural network is determined, the original deep neural network is trained by using a large number of training samples, to obtain the image search relevance prediction model.
  • Inputs of the image search relevance prediction model are a query entered by a user and image data of a target image (for example, comprising: image surrounding text data, image content data, and image associated characteristic data), and an output thereof is a relevance metric value between the query and the target image.
  • In this embodiment of the present invention, an original deep neural network is constructed first, wherein inputs of the original deep neural network are a query and image data, and an output thereof is a relevance metric value between the query and the image data. Appropriate training samples are selected to train the original deep neural network, and finally, the original deep neural network may be trained into an image search relevance prediction model. After an image search engine receives an image query entered by a user, the image query and to-be-sorted images are input to the image search relevance prediction model to obtain relevance metric values between the to-be-sorted images and the image query. Then, the to-be-sorted images are sorted based on the relevance metric values, and the sorting result is returned to the user. The present invention optimizes the existing image search technology, and has stronger capabilities than the prior art as well as various integrations and variations in terms of semantic matching between a query and an image text, semantic matching between a query and image content, click generalization and the like. Moreover, the degree of relevance between an image search result and a query entered by a user can be greatly improved.
  • Second Embodiment
  • FIG. 4 is a flowchart of a method for establishing an image search relevance prediction model according to a second embodiment of the present invention. This embodiment is optimized on the basis of the above embodiment, and in this embodiment, the step of training a pre-constructed original deep neural network by using a training sample is specifically optimized by: selecting a set number of training samples; sequentially acquiring a training sample to input to the original deep neural network, and adjusting a weighted parameter in the original deep neural network according to an output result, which is obtained in response to inputting the training sample, of the original deep neural network; and returning to execute the operation of acquiring a training sample to input to the original deep neural network, till a preset training termination condition is achieved.
  • Correspondingly, the method of this embodiment further comprises the following steps:
  • 410, selecting a set number of training samples.
  • It is tremendously time consuming to directly label a large number of training samples (relevance values between a query and images), and it is hard to find a unified standard. In this embodiment, a query and a positive-negative sample image pair (also briefly referred to as a pair) under the query may be used as a training sample. That is, a training sample consists of a query and a pair formed by two images. In the pair, one image has better relevance with the query than the other image, and the two images are referred to as a positive sample and a negative sample respectively.
  • Correspondingly, the training sample may be specifically optimized as: a positive-negative training pair consisting of a training query as well as a positive sample image and a negative sample image that separately correspond to the training query. In a specific example, a training query is “birthday card”, a positive sample image corresponding to the training query is an image 1, a negative sample image corresponding to the training query is an image 2, and therefore, a training sample in the form of <(birthday card, image 1), (birthday card, image 2)> may be constructed accordingly.
  • The positive sample image and the negative sample image corresponding to the training query may be manually determined according to the degree of relevance between different images and the query. However, high labour costs are required because a large number of training samples are involved during training of the original deep neural network. In addition, different persons have different evaluation standards on the degree of relevance. Therefore, in an exemplified implementation of this embodiment, a positive sample image and a negative sample image corresponding to a query may be automatically determined according to image click logs of users. For example, after a user enters a query to search for an image, an image clicked by the user based on the searching result is used as a positive sample image corresponding to the query, and an image not clicked by the user is used as a negative sample image corresponding to the query.
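  • A hypothetical sketch of deriving training samples from click logs in the manner just described; the log format {query: (clicked, shown)} is an assumption for illustration:

```python
def build_training_samples(click_log):
    """For each query, pair every clicked image (positive sample) with every
    unclicked image (negative sample) shown in the same search result."""
    samples = []
    for query, (clicked, shown) in click_log.items():
        unclicked = [img for img in shown if img not in clicked]
        for pos in clicked:
            for neg in unclicked:
                # A training sample of the form <(query, image 1), (query, image 2)>.
                samples.append((query, pos, neg))
    return samples
```
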
  • 420, sequentially acquiring a training sample to input to the original deep neural network, and adjusting a weighted parameter of the original deep neural network according to an output result, which is obtained in response to inputting the training sample, of the original deep neural network.
  • In this embodiment, the training of the original deep neural network is accomplished by using the positive-negative training pair, and therefore, to improve the training efficiency, preferably, two completely identical original deep neural networks may be constructed for receiving a positive training pair consisting of the training query and the positive sample image and a negative training pair consisting of the training query and the negative sample image, respectively, thereby implementing quick and real-time model training. FIG. 5 shows a schematic structural diagram of a training network model.
  • As shown in FIG. 5, 420 may preferably comprise the following operations:
      • inputting the training query and image data corresponding to the positive sample image to a first network with a structure identical to that of the original deep neural network, and acquiring a first predicted value output by the first network; inputting the training query and image data corresponding to the negative sample image to a second network with a structure identical to that of the first network, and acquiring a second predicted value output by the second network;
      • calculating a loss function according to the first predicted value, the second predicted value, and a relevance partial order between the positive sample image and the negative sample image; and
      • updating, along a direction reverse to a direction of minimizing the loss function, weighted parameters of layers in the first network and the second network layer-by-layer by using a set weight updating algorithm.
  • In a specific example, a query and a positive sample image are input to a first network with a structure identical to that of the original deep neural network, to obtain a relevance predicted value 1. The query and a negative sample image are input to a second network (comprising a weight) identical to the first network, to obtain a relevance predicted value 2. Depending on whether a sign of a difference between the predicted value 1 and the predicted value 2 is consistent with a relevance partial order between the positive sample image and the negative sample image, the difference is included in a loss function (also referred to as a rank cost).
  • Next, as in the method for training a general deep neural network, the weights of the layers are updated layer-by-layer along a direction reverse to the direction of minimizing the loss function; this type of method is generally referred to as a Back Propagation (BP) algorithm. Specific weight updating algorithms comprise various gradient descent methods, such as L-BFGS (a quasi-Newton method) or SGD (stochastic gradient descent), wherein SGD has a faster convergence speed and is used more commonly.
  • Different from the general deep neural network training, the technical solution of this embodiment involves two identical networks, the parameters are shared, and weights are always updated synchronously.
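  • The rank cost described above can be sketched as follows; the hinge form with a margin shown here is one common choice for a pairwise loss and is an assumption, not necessarily the exact form used:

```python
def pairwise_rank_cost(first_predicted, second_predicted, margin=1.0):
    """Penalize the difference between the two predicted values when its sign
    disagrees with the relevance partial order: the positive sample (first
    network) should score higher than the negative sample (second network)."""
    return max(0.0, margin - (first_predicted - second_predicted))
```

  • Minimizing this cost over many positive-negative training pairs drives the shared weights of the two identical networks to rank positive samples above negative ones.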
  • 430, determining whether a preset training termination condition is achieved: if yes, executing 440; otherwise, returning to execute 420.
  • In this embodiment, a training termination condition may be set according to actual requirements, for example, the number of training rounds (such as 1000 or 2000), a total error value of the neural network with respect to the training samples, or the like, which is not limited in this embodiment.
  • 440, using the trained original deep neural network as the image search relevance prediction model.
  • The technical solution of the present invention constructs a positive-negative training pair, according to positive and negative sample images corresponding to a same training query, to serve as a training sample, and constructs, based on the positive-negative training pair, two networks identical to a preset original neural network model, so as to train the weights of the model synchronously based on the positive-negative training pair. Accordingly, this avoids the problems that manually labeling a large number of training samples is tremendously time consuming and that it is hard to find a unified standard. Moreover, the training of the original neural network model can be accomplished quickly and efficiently.
  • Third Embodiment
  • FIG. 6 is a flowchart of a method for establishing an image search relevance prediction model according to a third embodiment of the present invention. This embodiment is optimized on the basis of the foregoing embodiment. In this embodiment, the step of selecting a set number of training samples is specifically optimized by: summarizing image click information corresponding to a same query sample according to image click logs of search users, wherein the query sample comprises: a single query or at least two queries meeting a set similarity threshold condition; generating a positive image sample set and a negative image sample set corresponding to the query sample according to the summarized image click information; and selecting a set number of query samples as the training queries, and generating positive-negative training pairs corresponding to the training queries respectively according to positive image sample sets and negative image sample sets corresponding to the training queries respectively, to serve as the training samples.
  • Correspondingly, the method of this embodiment further comprises the following steps:
  • 610, summarizing image click information corresponding to the same query sample according to image click logs of search users.
  • Generally speaking, the preset original neural network has a great number of weight parameters, and if text-related parameters (for example, Embedding parameters of a text dictionary) in a representation vector generation network are all put in training update, there will be millions of parameters. For parameters of such a scale, training data required is mainly generated from image click logs of users.
  • A user may click multiple images in a query process. The clicked images may have better relevance with a query entered by the user than images viewed but not clicked by the user.
  • Repeated queries of a large number of users are summarized to obtain more credible positive and negative sample images statistically: images having high click-through rates (positive sample images) versus images having low click-through rates (negative sample images), and images having a lot of clicks (positive sample images) versus images having no clicks (negative sample images).
  • The query sample may merely comprise a single query. Further, less popular queries having less clicks or no clicks may share clicked images with other queries according to semantic similarities. Correspondingly, the query sample may further comprise at least two queries meeting a set similarity threshold condition.
  • In a specific example, “birthday card” may be directly selected as a query sample, or “birthday card”, “birthdate card” and “date-of-birth card” may be used as query samples in a manner of semantic similarity clustering.
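The summarization of step 610 can be sketched as follows. This is an illustrative sketch only: the record layout of the click logs and the `query_aliases` merging map (standing in for the semantic-similarity clustering of queries described above) are assumptions, not part of the claimed method.

```python
from collections import defaultdict

def summarize_click_logs(click_logs, query_aliases=None):
    """Aggregate per-(query sample, image) click counts from raw click logs.

    click_logs: iterable of (query, image_id, clicked) records, where
    `clicked` is 1 for a click and 0 for a presentation without a click.
    query_aliases: optional mapping from a raw query to its canonical
    query sample, e.g. "birthdate card" -> "birthday card".
    """
    query_aliases = query_aliases or {}
    clicks = defaultdict(lambda: defaultdict(int))         # sample -> image -> clicks
    presentations = defaultdict(lambda: defaultdict(int))  # sample -> image -> shows
    for query, image_id, clicked in click_logs:
        sample = query_aliases.get(query, query)  # merge semantically similar queries
        presentations[sample][image_id] += 1
        clicks[sample][image_id] += clicked
    return clicks, presentations

logs = [
    ("birthday card", "img1", 1),
    ("birthdate card", "img1", 1),  # a similar query shares its clicks
    ("birthday card", "img2", 0),
]
clicks, shows = summarize_click_logs(logs, {"birthdate card": "birthday card"})
```

Clicks on "img1" from the two similar queries are summarized under the single query sample "birthday card", while "img2" is recorded as presented but unclicked.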
  • 620, generating a positive image sample set and a negative image sample set corresponding to the query sample according to the summarized image click information.
  • In an exemplified implementation of this embodiment, the image click information may merely comprise: clicked images corresponding to the query sample.
  • Correspondingly, the step of generating a positive image sample set and a negative image sample set corresponding to the query sample according to the summarized image click information may further comprise: grouping, among the clicked images, images with the number of clicks exceeding a first threshold into the positive image sample set, and images with the number of clicks less than a second threshold into the negative image sample set.
  • The first threshold and the second threshold may be preset according to actual conditions, and they may be identical or different, which is not limited in this embodiment.
  • In another exemplified implementation of this embodiment, the image click information may comprise both clicked images corresponding to the query sample and an image search result corresponding to the query sample.
  • Correspondingly, the step of generating a positive image sample set and a negative image sample set corresponding to the query sample according to the summarized image click information may further comprise: among images corresponding to the image search result, grouping clicked images into the positive image sample set, and grouping unclicked images into the negative image sample set.
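The two implementations of step 620 can be sketched as two grouping rules; the function names and the dictionary/set inputs are illustrative assumptions.

```python
def split_by_click_thresholds(click_counts, first_threshold, second_threshold):
    """First implementation: among the clicked images, images whose number of
    clicks exceeds `first_threshold` form the positive image sample set, and
    images whose number of clicks is below `second_threshold` form the
    negative image sample set."""
    positives = {img for img, n in click_counts.items() if n > first_threshold}
    negatives = {img for img, n in click_counts.items() if n < second_threshold}
    return positives, negatives

def split_by_clicked_vs_unclicked(search_result, clicked_images):
    """Second implementation: among the images of the image search result,
    clicked images are positives and unclicked images are negatives."""
    positives = {img for img in search_result if img in clicked_images}
    negatives = {img for img in search_result if img not in clicked_images}
    return positives, negatives
```

Note that with the first rule the two thresholds may be identical or different, so an image may fall into neither set (for example, five clicks with thresholds 10 and 5).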
  • 630, selecting a set number of query samples as the training queries, and generating positive-negative training pairs corresponding to the training queries respectively according to positive image sample sets and negative image sample sets corresponding to the training queries respectively, to serve as the training samples.
  • As shown in FIG. 7, in an exemplified implementation of this embodiment, the step 630 may further comprise the following operations:
  • 6301, sequentially acquiring a training query as a current operation query, and acquiring a target positive sample image set and a target negative sample image set corresponding to the current operation query.
  • In a specific example, the acquired training query is “birthday card”, a target positive sample image set corresponding to the “birthday card” comprises: “image 1˜image 20”, and a target negative sample image set corresponding to the “birthday card” comprises: “image 21˜image 80”.
  • 6302, selecting a first number of target positive sample images from the target positive sample image set and selecting a second number of target negative sample images from the target negative sample image set according to a set sample image selecting rule.
  • The set sample image selecting rule may comprise: selecting according to the number of clicks, selecting according to the popularity of the images, selecting randomly, or the like, which is not limited in this embodiment. Meanwhile, the first number may be identical to or different from the second number, and the two may be selected in a customized manner according to actual requirements.
  • Further, to enhance the diversity of the training samples, some images may be selected randomly from presented images of other queries to serve as negative samples, referred to as random negative samples. It should be understood that because the random negative sample is significantly different from the current operation query, it may be considered as highly credible.
  • That is, at least one image is acquired from a positive image sample set corresponding to a non-associated query other than the current operation query, to serve as a target negative sample image corresponding to the current operation query.
  • For example, a positive sample image set corresponding to a query “tiger” comprises “image 81˜image 100”, and “image 81˜image 100” may all be used as target negative sample images of “birthday card”.
  • Still referring to the foregoing example, target positive sample images: “image 1˜image 3” corresponding to the training query “birthday card” and target negative sample images: “image 21˜image 22, image 81” corresponding to the training query “birthday card” may be selected.
  • 6303, separately selecting sample images from the first number of target positive sample images and the second number of target negative sample images according to a set positive-negative image combination rule, and generating a third number of positive-negative training pairs corresponding to the current operation query as the training samples.
  • In an exemplified implementation of this embodiment, the set positive-negative image combination rule may comprise: combining any of the target positive sample images and any of the target negative sample images into a positive-negative training pair, to finally determine the training samples.
  • For example: <(birthday card, image 1), (birthday card, image 21)>, <(birthday card, image 2), (birthday card, image 22)>, <(birthday card, image 3), (birthday card, image 81)>.
  • In another exemplified implementation of this embodiment, the set positive-negative image combination rule may further comprise: separately combining each of the target positive sample images and each of the target negative sample images into a positive-negative training pair, to finally determine the training samples.
  • For example: <(birthday card, image 1), (birthday card, image 21)>, <(birthday card, image 1), (birthday card, image 22)>, <(birthday card, image 1), (birthday card, image 81)>, <(birthday card, image 2), (birthday card, image 21)>, <(birthday card, image 2), (birthday card, image 22)>, <(birthday card, image 2), (birthday card, image 81)>, <(birthday card, image 3), (birthday card, image 21)>, <(birthday card, image 3), (birthday card, image 22)>, <(birthday card, image 3), (birthday card, image 81)>.
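The two combination rules of operation 6303 can be sketched as follows; the tuple layout `((query, positive image), (query, negative image))` is an illustrative assumption.

```python
from itertools import product

def one_to_one_pairs(query, positives, negatives):
    """First rule: combine the i-th target positive sample image with the
    i-th target negative sample image into one positive-negative pair."""
    return [((query, p), (query, n)) for p, n in zip(positives, negatives)]

def cartesian_pairs(query, positives, negatives):
    """Second rule: separately combine every target positive sample image
    with every target negative sample image."""
    return [((query, p), (query, n)) for p, n in product(positives, negatives)]
```

With the three positives and three negatives of the "birthday card" example, the first rule yields the three pairs listed above, while the second rule yields all nine pairs.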
  • Definitely, persons skilled in the art may understand that the training samples may be finally generated by using other positive-negative image combination rules, which is not limited in this embodiment.
  • 6304, determining whether all training queries are processed: if yes, ending the process; otherwise, returning to execute 6301.
  • 640, sequentially acquiring a training sample to input to the original deep neural network, and adjusting a weighted parameter in the original deep neural network according to an output result, which is obtained in response to inputting the training sample, of the original deep neural network.
  • 650, determining whether a preset training termination condition is achieved: if yes, executing 660; otherwise, returning to execute 640.
  • 660, using the trained original deep neural network as the image search relevance prediction model.
  • According to the technical solution of this embodiment, by acquiring training samples from the image click logs of users, positive and negative sample images that are more statistically credible can be obtained. An image search relevance prediction model finally trained on such positive and negative sample images can be closer to an ideal or required relevance prediction model. Therefore, an image search result based on the image search relevance prediction model is closer to the actual requirements of users, thereby improving the search experience of the users.
  • On the basis of the above embodiments, before summarizing image click information corresponding to a same query sample according to image click logs of search users, the method may further comprise: filtering out noise logs comprised in the image click logs.
  • Such setting is carried out because there may be much noise in the image click logs of users. For example, a user may be attracted into clicking inappropriate or malicious images that contrast sharply with the related images, or such images may be clicked under almost any query. In some queries having a large number of related results, the requirement of a user is already met after the user browses the related images near the top, and the probabilities of subsequent related images being clicked significantly decrease. Both behaviors distort the correspondence between click/non-click, the number of clicks, and the relevance. Therefore, to further improve the accuracy of the selected positive and negative sample images, it is necessary to filter out noise logs comprised in the image click logs.
  • Noise log recognition and removal are necessary operations for ensuring the accuracy of a trained model. Two methods are briefly introduced herein:
  • Click query clustering method: All queries (referred to as click queries below) in which an image (comprising repeated images and similar images) is clicked are gathered and clustered, and thus major requirement categories that the image meets can be obtained while minor categories can be regarded as noise and hence removed. All click queries far away from the major requirement queries can be considered as noise clicks.
  • Image clustering method: All clicked images corresponding to a query (comprising semantically identical and similar queries) are gathered, and classification results or class representations of these images are clustered; thus major image categories meeting the requirement of the query can be obtained while minor categories can be regarded as noise and hence removed.
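The image clustering method can be sketched in simplified form. A real system would cluster image representations or classification results; here a flat category label per clicked image stands in for that clustering, and the function name and `min_share` parameter are illustrative assumptions.

```python
from collections import Counter

def filter_minority_categories(image_categories, min_share=0.2):
    """Keep clicked images belonging to major categories of a query.

    image_categories: mapping from clicked image id to its (hypothetical)
    classifier category. Categories accounting for less than `min_share`
    of all clicked images are regarded as noise and removed.
    """
    counts = Counter(image_categories.values())
    total = sum(counts.values())
    keep = {cat for cat, n in counts.items() if n / total >= min_share}
    return {img for img, cat in image_categories.items() if cat in keep}
```

For a "birthday card" query whose clicked images are mostly cards plus one stray "tiger" click, the minority category is dropped as noise. The click query clustering method is symmetric: cluster the queries under which an image was clicked, and discard clicks from minority query clusters.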
  • On the basis of the embodiments above, when positive and negative sample images are selected, the positive (negative) samples under a query may be sorted according to the credibility thereof. The credibility of a positive sample may be deduced according to user behavior evidence. For example, generally speaking, a positive sample having a higher click-through rate, having more clicks, and presented farther away from the top when being clicked has higher credibility. Negative samples may be sorted in a similar manner. For negative samples that are presented but receive no clicks, if there is no user behavior evidence, the credibility thereof may also be deduced according to relevance. For example, a negative sample having fewer presentations (in the same time window) and presented farther away from the top has poorer relevance, and hence higher credibility. The credibility of a random negative sample may be considered as the highest.
  • Meanwhile, when the positive and negative sample images are selected, the sorted positive (negative) samples may be selected sequentially or randomly, to balance noise against the discriminative power of the model. Moreover, queries may be selected according to a task target, and the proportions of different types of queries may be adjusted, for example: the proportion of high-frequency (low-frequency) queries, the proportion of queries having a large (small) number of resources, or the like.
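The credibility ordering of positive samples described above can be sketched as a sort key; the field names (`ctr`, `clicks`, `position`) and their priority order are illustrative assumptions rather than the claimed method.

```python
def sort_positives_by_credibility(samples):
    """Sort positive samples by descending credibility: a higher
    click-through rate, more clicks, and a clicked position farther from
    the top (a larger `position` value) all raise credibility."""
    return sorted(
        samples,
        key=lambda s: (s["ctr"], s["clicks"], s["position"]),
        reverse=True,
    )
```

Once sorted, the first-number selection of operation 6302 may take samples sequentially from the head of this list, or sample randomly from it.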
  • In addition, it should be emphasized again that main differences between the embodiments of the present invention and the prior art lie in that:
  • Almost all existing text matching characteristics are literal matching of texts (comprising synonym expansion matching); however, the query text and the image text in the present invention are matched in a representation space after deep conversion, which has a more generalized meaning and can implement literally different but semantically correlated matching to some extent.
  • For the existing classification matching characteristics, a classification result thereof is not semantically refined when a category system for image classification is excessively small, while the classification accuracy is low and the difficulty of matching with a text (or a category label) of the query increases dramatically (which is the so-called semantic gap between an image and a text) when the category system is excessively large. In the embodiments of the present invention, the query text and the image pixel content are matched in the representation space after deep conversion, and are not limited by the query or the image category system.
  • The existing click-through rate characteristic only applies to images with valid clicks under the query. The network parameters in the embodiments of the present invention are obtained through training based on image click behaviors under all queries, which generalizes the measure of relevance between the query and the images, as embodied in the user click behaviors, to any unclicked or sparsely clicked image, and also to any query related to the current query, thereby implementing relevance calculation between any query and any image.
  • In view of the above, the present invention has stronger capabilities than the prior art as well as various integrations and variations in terms of semantic matching between a query and an image text, semantic matching between a query and image content, click generalization and the like, and more thoroughly solves the problems expected to be solved.
  • Fourth Embodiment
  • FIG. 8 is a flowchart of an image search method according to a fourth embodiment of the present invention. The method of this embodiment may be executed by an image search apparatus, and the apparatus may be implemented by hardware and/or software and may generally be integrated in a server where an image search engine is located. The method of this embodiment comprises the following steps:
  • 810, acquiring an image query entered by a user.
  • In this embodiment, the image query specifically refers to a text-form query entered by the user through an image search engine, for example: “birthday card”.
  • 820, separately inputting the image query and to-be-sorted images into the image search relevance prediction model established by using the method in the first embodiment to the third embodiment of the present invention, and separately calculating a relevance metric value between each of the to-be-sorted images and the image query.
  • In this embodiment, the to-be-sorted images specifically refer to image search results recalled by the image search engine and corresponding to the image query.
  • 830, sorting the to-be-sorted images according to the calculated relevance metric values, and providing an image search result corresponding to the sorting result to the user.
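Steps 820 and 830 can be sketched as a scoring-and-sorting routine. Here `model` is any callable returning a relevance metric value for a (query, image) pair; in the real system it would wrap the trained prediction model, and the function name is an illustrative assumption.

```python
def rank_images(model, image_query, candidate_images):
    """Score every recalled to-be-sorted image against the image query with
    the relevance prediction model, then sort by descending relevance."""
    scored = [(image, model(image_query, image)) for image in candidate_images]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest relevance first
    return [image for image, _ in scored]
```

The sorted list (or search results rendered in that order) is what is finally provided to the user.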
  • The technical solution of this embodiment inputs an image query and to-be-sorted images to an image search relevance prediction model trained in advance, to obtain relevance metric values between the to-be-sorted images and the image query. Then, the to-be-sorted images are sorted based on the relevance metric values, and the sorting result is returned to the user. The technical solution optimizes the existing image search technology, and has stronger capabilities than the prior art as well as various integrations and variations in terms of semantic matching between a query and an image text, semantic matching between a query and image content, click generalization and the like. Moreover, the degree of relevance between an image search result and a query entered by a user can be greatly improved.
  • In addition, it should be noted that, the inventor finds through experiments that: the technical solution of this embodiment of the present invention significantly improves relevance of image search. Evaluations show that, after the relevance characteristic is added, retrieval result satisfaction of the image search with a random query is improved by more than 10%. That is, a difference between the number of queries whose retrieval results perceivably improve and the number of queries whose retrieval results perceivably deteriorate accounts for more than 10% of randomly sampled queries, and the effect is very remarkable.
  • Fifth Embodiment
  • FIG. 9 is a structural diagram of an apparatus of establishing an image search relevance prediction model according to a fifth embodiment of the present invention. As shown in FIG. 9, the apparatus comprises:
      • a training module 91, configured to train a pre-constructed original deep neural network by using a training sample, wherein the training sample comprises: a query and image data, and the original deep neural network comprises: a representation vector generation network and a relevance calculation network, the representation vector generation network is used to convert different types of data in the training sample into representation vectors and input the representation vectors to the relevance calculation network, and the relevance calculation network is used to convert at least two input representation vectors into a relevance metric value; and
      • a model generation module 92, configured to use the trained original deep neural network as the image search relevance prediction model.
  • In the embodiment of the present invention, an original deep neural network is constructed, inputs of the original deep neural network are a query and image data, and an output thereof is a relevance metric value between the query and the image data. Appropriate training samples are selected to train the original deep neural network, and finally, the original deep neural network may be trained into an image search relevance prediction model. After an image search engine receives an image query entered by a user, the image query and to-be-sorted images are input to the image search relevance prediction model to obtain relevance metric values between the to-be-sorted images and the image query. Then, the to-be-sorted images are sorted based on the relevance metric values, and the sorting result is returned to the user. The present invention optimizes the existing image search technology, and has stronger capabilities than the prior art as well as various integrations and variations in terms of semantic matching between a query and an image text, semantic matching between a query and image content, click generalization and the like. Moreover, the degree of relevance between an image search result and a query entered by a user can be greatly improved.
  • On the basis of the foregoing embodiments, the image data may comprise: image associated text data, image content data and image associated characteristic data; wherein the image associated characteristic data comprises: a click query corresponding to the image, and/or a quality characteristic parameter of the image.
  • On the basis of the above embodiments, the relevance calculation network may comprise: a hidden layer set and an output layer connected to an output end of the hidden layer set; wherein the hidden layer set comprises one or more hidden layers connected end to end, a representation vector output end of the vector generation network is connected to an input end of the hidden layer set, and the output layer outputs the relevance metric value.
  • On the basis of the above embodiments, the relevance calculation network may further comprise: a first hidden layer set, a first standard vector representation unit connected to an output end of the first hidden layer set, a second hidden layer set, and a second standard vector representation unit connected to an output end of the second hidden layer set. A vector distance calculation unit is connected to output ends of the first standard vector representation unit and the second standard vector representation unit, respectively. The hidden layer set comprises one or more hidden layers connected end to end, a representation vector output end, which corresponds to the query, in the representation generation network is connected to an input end of the first hidden layer set, a representation vector output end, which corresponds to the image data, in the representation generation network is connected to an input end of the second hidden layer set, and the vector distance calculation unit outputs the relevance metric value.
  • On the basis of the above embodiments, the hidden layer comprises at least two neural units, and the output layer comprises one neural unit; wherein the number of vector dimensions generated by the representation vector generation network, the number of hidden layers comprised in the relevance calculation network, the number of neural units comprised in the hidden layer, the type of a response function of the neural unit, and a method of regularizing an output of the neural unit are preset according to a task.
  • On the basis of the above embodiments, the training module may comprise:
      • a training sample selection unit, configured to select a set number of training samples;
      • a weighted parameter adjustment unit, configured to sequentially acquire a training sample to input to the original deep neural network, and adjust a weighted parameter in the original deep neural network according to an output result, which is based on the training sample, of the original deep neural network; and
      • a looping execution unit, configured to return to execute the operation of acquiring a training sample to input to the original deep neural network, till a preset training termination condition is achieved.
  • On the basis of the above embodiments, the training sample may further comprise: a positive-negative training pair consisting of a training query as well as a positive sample image and a negative sample image that separately correspond to the training query.
  • The weighted parameter adjustment unit may be further configured to:
      • input the training query and image data corresponding to the positive sample image to a first network with a structure identical to that of the original deep neural network, and acquire a first predicted value output by the first network;
      • input the training query and image data corresponding to the negative sample image to a second network in a structure identical to that of the first network, and acquire a second predicted value output by the second network;
      • calculate a loss function according to the first predicted value, the second predicted value, and a relevance partial order between the positive sample image and the negative sample image; and
      • update, along the direction of minimizing the loss function, weighted parameters of layers in the first network and the second network layer by layer in a backward direction by using a set weight updating algorithm.
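The pairwise update performed by the weighted parameter adjustment unit can be sketched with a minimal linear scorer standing in for the twin networks of identical structure. The specification does not fix a loss form, so the margin-based hinge loss below is an assumed choice, and all names are illustrative.

```python
def score(w, x):
    """First/second predicted value: a minimal linear stand-in for the
    twin networks (which share the same weighted parameters w)."""
    return sum(wi * xi for wi, xi in zip(w, x))

def pairwise_update(w, x_pos, x_neg, lr=0.1, margin=1.0):
    """One training step on a positive-negative pair.

    The hinge loss max(0, margin - (s_pos - s_neg)) encodes the relevance
    partial order: the positive sample image should score at least `margin`
    higher than the negative one. Because both twins share weights, the
    gradient w.r.t. w is -(x_pos - x_neg) whenever the pair is still
    mis-ordered within the margin, so the update pushes the scores apart.
    """
    loss = max(0.0, margin - (score(w, x_pos) - score(w, x_neg)))
    if loss > 0.0:
        w = [wi + lr * (xp - xn) for wi, xp, xn in zip(w, x_pos, x_neg)]
    return w, loss
```

Repeating this step over the positive-negative training pairs drives the loss toward zero, after which that pair no longer changes the weights.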
  • On the basis of the above embodiments, the training sample selection unit may further comprise:
      • an image click information summarization subunit, configured to summarize image click information corresponding to a same query sample according to image click logs of search users, wherein the query sample comprises: a single query or at least two queries meeting a set similarity threshold condition;
      • a positive-negative image sample set generation subunit, configured to generate a positive image sample set and a negative image sample set corresponding to the query sample according to the summarized image click information; and
      • a training sample generation subunit, configured to select a set number of query samples as the training queries, and generate positive-negative training pairs corresponding to the training queries respectively according to positive image sample sets and negative image sample sets corresponding to the training queries respectively, to serve as the training samples.
  • On the basis of the above embodiments, the image click information may comprise: clicked images corresponding to the query samples.
  • The positive-negative image sample set generation subunit may be further configured to: among the clicked images, group images with the number of clicks exceeding a first threshold into the positive image sample set, and group images with the number of clicks less than a second threshold into the negative image sample set.
  • On the basis of the above embodiments, the image click information may comprise: clicked images corresponding to the query sample, and an image search result corresponding to the query sample.
  • The positive-negative image sample set generation subunit may be further configured to: among images corresponding to the image search results, group clicked images into the positive image sample set, and group unclicked images into the negative image sample set.
  • On the basis of the above embodiments, the training sample generation subunit may be further configured to:
      • sequentially acquire a training query as a current operation query, and acquire a target positive sample image set and a target negative sample image set corresponding to the current operation query;
      • select a first number of target positive sample images from the target positive sample image set and select a second number of target negative sample images from the target negative sample image set according to a set sample image selecting rule;
      • separately select sample images from the first number of target positive sample images and the second number of target negative sample images according to a set positive-negative image combination rule, and generate a third number of positive-negative training pairs corresponding to the current operation query as the training samples; and
      • return to execute the operation of acquiring a training query as a current operation query, till completing processing of all training queries.
  • On the basis of the above embodiments, the training sample generation subunit may be further configured to: acquire at least one image from a positive image sample set corresponding to a non-associated query other than the current operation query, to serve as a target negative sample image corresponding to the current operation query.
  • On the basis of the above embodiments, the training sample selection unit may further comprise: a noise log filtering subunit, configured to: before the image click information corresponding to the same query sample is summarized according to the image click logs of the search users, filter out noise logs comprised in the image click logs.
  • The apparatus of establishing an image search relevance prediction model provided in this embodiment of the present invention may be used to execute the method of establishing an image search relevance prediction model provided in the first embodiment to the third embodiment of the present invention, has corresponding function modules, and can implement the same beneficial effects.
  • Sixth Embodiment
  • FIG. 10 is a structural diagram of an image search apparatus according to a sixth embodiment of the present invention. As shown in FIG. 10, the apparatus comprises:
      • an image query acquisition module 101, configured to acquire an image query entered by a user;
      • a relevance metric value calculation module 102, configured to separately input the image query and to-be-sorted images into the image search relevance prediction model established by using the apparatus described in the fifth embodiment, and separately calculate a relevance metric value between each of the to-be-sorted images and the image query; and
      • an image search result providing module 103, configured to sort the to-be-sorted images according to the calculated relevance metric values, and provide an image search result corresponding to the sorting result to the user.
  • The technical solution of this embodiment inputs an image query and to-be-sorted images to an image search relevance prediction model trained in advance to obtain relevance metric values between the to-be-sorted images and the image query. Then, the to-be-sorted images are sorted based on the relevance metric values, and the sorting result is returned to the user. The technical solution optimizes the existing image search technology, and has stronger capabilities than the prior art as well as various integrations and variations in terms of semantic matching between a query and an image text, semantic matching between a query and image content, click generalization and the like. Moreover, the degree of relevance between an image search result and a query entered by a user can be greatly improved.
  • The image search apparatus provided in the embodiment of the present invention may be used to execute the image search method provided in any embodiment of the present invention, has corresponding function modules, and can implement the same beneficial effects.
  • Apparently, those skilled in the art should understand that the foregoing modules or steps of the present invention may be implemented by a server as described above. Optionally, the embodiments of the present invention may be implemented by using a program executable by a computer device, and thus they can be stored in a storage device and executed by a processor. The program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like. Alternatively, they are made into integrated circuit modules respectively, or multiple modules or steps among them are made into a single integrated circuit module for implementation. In this way, the present invention is not limited to any specific hardware-software combination.
  • The above description describes merely exemplified embodiments of the present invention, but is not intended to limit the present invention. Those skilled in the art may make various changes and modifications to the present invention. All modifications, equivalent replacements and improvements made within the spirit and principle of the present invention should be covered in the protection scope of the present invention.

Claims (20)

What is claimed is:
1. A method for establishing an image search relevance prediction model, comprising:
training a pre-constructed original deep neural network by using a training sample;
the training sample comprising image data and a query, the original deep neural network comprising: a representation vector generation network and a relevance calculation network, the representation vector generation network used to convert different types of data in the training sample into representation vectors and input the representation vectors to the relevance calculation network, and the relevance calculation network used to convert at least two input representation vectors into a relevance metric value; and
using the trained original deep neural network as the image search relevance prediction model.
2. The method according to claim 1, wherein the image data comprises: image associated text data, image content data and image associated characteristic data;
wherein the image associated characteristic data comprises: a click query corresponding to the image, and/or a quality characteristic parameter of the image.
3. The method according to claim 1, wherein the relevance calculation network comprises: a hidden layer set and an output layer connected to an output end of the hidden layer set;
wherein,
the hidden layer set comprises one or more hidden layers connected end to end,
a representation vector output end of the vector generation network is connected to an input end of the hidden layer set, and
the output layer outputs the relevance metric value.
4. The method according to claim 1, wherein the relevance calculation network comprises:
a first hidden layer set,
a first standard vector representation unit connected to an output end of the first hidden layer set,
a second hidden layer set,
a second standard vector representation unit connected to an output end of the second hidden layer set, and
a vector distance calculation unit connected to an output end of the first standard vector representation unit and an output end of the second standard vector representation unit respectively;
wherein
the hidden layer set comprises one or more hidden layers connected end to end,
a representation vector output end corresponding to the query, in the representation vector generation network is connected to an input end of the first hidden layer set,
a representation vector output end corresponding to the image data, in the representation vector generation network is connected to an input end of the second hidden layer set, and
the vector distance calculation unit outputs the relevance metric value.
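The two-tower arrangement of claim 4 — separate hidden layer sets for the query and the image data, each followed by a standard vector representation unit, joined by a vector distance calculation unit — can be sketched as follows. The single tanh layer, the fixed toy weights, and the choice of cosine similarity as the "vector distance" are all assumptions for illustration:

```python
import math

# Fixed toy weights standing in for the trained first and second
# hidden layer sets (made-up numbers, 2 inputs -> 2 units).
WEIGHTS_QUERY = [[0.5, -0.25], [0.1, 0.3]]
WEIGHTS_IMAGE = [[0.5, -0.25], [0.1, 0.3]]

def hidden_layer_set(vec, weights):
    # One tanh layer standing in for a "hidden layer set".
    return [math.tanh(sum(w * x for w, x in zip(row, vec))) for row in weights]

def standard_vector(vec):
    # "Standard vector representation unit": project onto the unit sphere.
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def relevance(query_vec, image_vec):
    q = standard_vector(hidden_layer_set(query_vec, WEIGHTS_QUERY))
    i = standard_vector(hidden_layer_set(image_vec, WEIGHTS_IMAGE))
    # "Vector distance calculation unit": cosine similarity of the two
    # standardized representations, bounded in [-1, 1].
    return sum(a * b for a, b in zip(q, i))
```

Because both towers end in unit-length vectors, the output is automatically bounded, which is one common reason for inserting a standardization unit before a distance computation.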
5. The method according to claim 3, wherein each hidden layer comprises at least two neural units, and the output layer comprises one neural unit;
wherein the number of vector dimensions generated by the representation vector generation network, the number of hidden layers included in the relevance calculation network, the number of neural units included in the hidden layer, the type of a response function of the neural unit, and a method of regularizing an output of the neural unit are preset based on a task.
6. The method according to claim 1, wherein the training a pre-constructed original deep neural network by using a training sample comprises:
selecting a set number of training samples;
acquiring sequentially a training sample to input to the original deep neural network, and adjusting a weighted parameter of the original deep neural network according to an output result of the original deep neural network based on the training sample; and
returning to execute the operation of acquiring a training sample to input to the original deep neural network, until a preset training termination condition is reached.
7. The method according to claim 6, wherein the training sample further comprises:
a training query; and
a positive-negative training pair consisting of a positive sample image and a negative sample image respectively corresponding to the training query;
the acquiring sequentially a training sample to input to the original deep neural network, and adjusting a weighted parameter of the original deep neural network according to an output result of the original deep neural network based on the training sample further comprises:
inputting the training query and image data corresponding to the positive sample image to a first network with a structure identical to that of the original deep neural network, and acquiring a first predicted value output by the first network;
inputting the training query and image data corresponding to the negative sample image to a second network with a structure identical to that of the first network, and acquiring a second predicted value output by the second network;
calculating a loss function based on the first predicted value, the second predicted value, and a relevance partial order between the positive sample image and the negative sample image; and
updating, layer-by-layer in a reverse direction from the output layer back to the input layer, weighted parameters of layers in the first network and the second network along a direction of minimizing the loss function, by using a set weight updating algorithm.
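The pairwise training scheme of claim 7 — the same query scored against a positive and a negative sample image by two structurally identical networks, with a loss that encodes the relevance partial order — can be sketched with a one-layer scorer. Sharing one weight vector between the "first" and "second" network, the hinge form of the loss, and the learning rate are assumptions for illustration:

```python
def score(w, query, image):
    # Shared-weight stand-in for the "first network" and "second network":
    # both towers have identical structure in claim 7, so one parameter
    # vector w plays both roles here.
    return sum(wk * q * i for wk, q, i in zip(w, query, image))

def pairwise_hinge(pos_score, neg_score, margin=1.0):
    # Loss encoding the relevance partial order: the positive sample image
    # should outscore the negative sample image by at least `margin`.
    return max(0.0, margin - (pos_score - neg_score))

def train_step(w, query, pos_img, neg_img, lr=0.1, margin=1.0):
    s_pos = score(w, query, pos_img)   # "first predicted value"
    s_neg = score(w, query, neg_img)   # "second predicted value"
    loss = pairwise_hinge(s_pos, s_neg, margin)
    if loss > 0.0:
        # Hand-derived gradient of the hinge w.r.t. each weight; stepping
        # against it is a one-layer analogue of layer-by-layer backpropagation.
        for k in range(len(w)):
            grad = -(query[k] * pos_img[k] - query[k] * neg_img[k])
            w[k] -= lr * grad
    return loss
```

Iterating `train_step` drives the positive image's score above the negative image's score until the margin is satisfied and the loss reaches zero.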
8. The method according to claim 7, wherein the selecting a set number of training samples comprises:
summarizing image click information corresponding to a same query sample according to image click logs of search users, wherein the query sample comprises: a single query, or at least two queries meeting a set similarity threshold condition;
generating a positive image sample set and a negative image sample set corresponding to the query sample according to the summarized image click information; and
selecting a set number of query samples as the training queries, and generating positive-negative training pairs corresponding to the training queries respectively according to positive image sample sets and negative image sample sets corresponding to the training queries respectively, to serve as the training samples.
9. The method according to claim 8, wherein the image click information comprises: clicked images corresponding to the query sample; and
the generating a positive image sample set and a negative image sample set corresponding to the query sample according to the summarized image click information further comprises:
grouping, among the clicked images, images with the number of clicks exceeding a first threshold into the positive image sample set, and
grouping, among the clicked images, images with the number of clicks less than a second threshold into the negative image sample set.
10. The method according to claim 8, wherein the image click information comprises: clicked images corresponding to the query sample and an image search result corresponding to the query sample; and
the generating a positive image sample set and a negative image sample set corresponding to the query sample according to the summarized image click information further comprises:
grouping, among images corresponding to the image search result, clicked images into the positive image sample set, and unclicked images into the negative image sample set, respectively.
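The grouping rules of claims 9 and 10 — clicks exceeding a first threshold into the positive set, clicks below a second threshold into the negative set — can be sketched over a toy click log. The log entries and both threshold values below are made up for illustration:

```python
from collections import defaultdict

# Toy click log: one (query, image_id, clicked) tuple per impression.
CLICK_LOG = [
    ("red car", "img1", True), ("red car", "img1", True),
    ("red car", "img2", False),
    ("red car", "img3", True), ("red car", "img3", False),
]

def build_sample_sets(log, first_threshold=1, second_threshold=1):
    clicks = defaultdict(int)
    shown = set()
    for query, image, clicked in log:
        shown.add((query, image))
        if clicked:
            clicks[(query, image)] += 1
    positives, negatives = defaultdict(set), defaultdict(set)
    for query, image in shown:
        n = clicks[(query, image)]
        if n > first_threshold:        # clicks exceed the first threshold
            positives[query].add(image)
        elif n < second_threshold:     # clicks below the second threshold
            negatives[query].add(image)
    return positives, negatives

POS, NEG = build_sample_sets(CLICK_LOG)
```

Note that with two distinct thresholds an image can fall into neither set (clicked once here), which keeps ambiguous examples out of training.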
11. The method according to claim 8, wherein the generating positive-negative training pairs corresponding to the training queries respectively according to positive image sample sets and negative image sample sets corresponding to the training queries respectively, to serve as the training samples further comprises:
sequentially acquiring a training query as a current operation query, and acquiring a target positive sample image set and a target negative sample image set corresponding to the current operation query;
selecting a first number of target positive sample images from the target positive sample image set and selecting a second number of target negative sample images from the target negative sample image set according to a set sample image selecting rule;
separately selecting sample images from the first number of target positive sample images and the second number of target negative sample images according to a set positive-negative image combination rule, and generating a third number of positive-negative training pairs corresponding to the current operation query as the training samples; and
returning to execute the operation of acquiring the training query as the current operation query, until all training queries have been processed.
12. The method according to claim 11, wherein the selecting a first number of target positive sample images from the target positive sample image set and selecting a second number of target negative sample images from the target negative sample image set according to a set sample image selecting rule further comprises:
acquiring at least one image from a positive image sample set corresponding to a non-associated query other than the current operation query, to serve as a target negative sample image corresponding to the current operation query.
13. The method according to claim 8, wherein before summarizing image click information corresponding to a same query sample according to image click logs of search users, the method further comprises:
filtering out noise logs included in the image click logs.
14. An image search method, comprising:
acquiring an image query entered by a user;
inputting respectively the image query and to-be-sorted images into the image search relevance prediction model established by using the method according to claim 1, and calculating respectively a relevance metric value between each of the to-be-sorted images and the image query; and
sorting the to-be-sorted images based on the calculated relevance metric values, and providing an image search result corresponding to the sorting result to the user.
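The search-time flow of claim 14 — score every candidate image against the query with the trained prediction model, then sort by the relevance metric values — reduces to a score-and-sort loop. The token-overlap `predict_relevance` below is a hypothetical stand-in for the trained model, and the candidate metadata strings are invented:

```python
def predict_relevance(query, image_meta):
    # Hypothetical stand-in for the trained relevance prediction model:
    # fraction of query tokens that also appear in the image's metadata.
    q = set(query.split())
    return len(q & set(image_meta.split())) / (len(q) or 1)

def search(query, candidates):
    # Score each (image_id, metadata) candidate, then sort by
    # descending relevance metric value.
    scored = [(predict_relevance(query, meta), img) for img, meta in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [img for _, img in scored]

results = search("red sports car", [
    ("img_a", "blue bicycle"),
    ("img_b", "red sports car on track"),
    ("img_c", "red car"),
])
```

Swapping `predict_relevance` for a learned model changes the scores, not the surrounding retrieval logic.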
15. An apparatus of establishing an image search relevance prediction model, comprising:
at least one processor; and
a memory storing instructions, which when executed by the at least one processor, cause the at least one processor to perform operations, the operations comprising:
training a pre-constructed original deep neural network by using a training sample;
wherein the training sample comprises: a query and image data, and the original deep neural network comprises: a representation vector generation network and a relevance calculation network, the representation vector generation network is used to convert different types of data in the training sample into representation vectors and input the representation vectors to the relevance calculation network, and the relevance calculation network is used to convert at least two input representation vectors into a relevance metric value; and
using the trained original deep neural network as the image search relevance prediction model.
16. The apparatus according to claim 15, wherein the image data comprises: image associated text data, image content data and image associated characteristic data;
wherein the image associated characteristic data comprises: a click query corresponding to the image, and/or a quality characteristic parameter of the image.
17. The apparatus according to claim 15, wherein the training a pre-constructed original deep neural network by using a training sample comprises:
selecting a set number of training samples;
acquiring sequentially a training sample to input to the original deep neural network, and adjusting a weighted parameter of the original deep neural network according to an output result of the original deep neural network obtained in response to inputting the training sample; and
returning to execute the operation of acquiring a training sample to input to the original deep neural network, until a preset training termination condition is achieved.
18. The apparatus according to claim 17, wherein the training sample further comprises: a training query, and a positive-negative training pair consisting of a positive sample image and a negative sample image that separately correspond to the training query; and
the acquiring sequentially a training sample to input to the original deep neural network, and adjusting a weighted parameter of the original deep neural network according to an output result of the original deep neural network obtained in response to inputting the training sample comprises:
inputting the training query and image data corresponding to the positive sample image to a first network with a structure identical to that of the original deep neural network, and acquiring a first predicted value output by the first network;
inputting the training query and image data corresponding to the negative sample image to a second network with a structure identical to that of the first network, and acquiring a second predicted value output by the second network;
calculating a loss function according to the first predicted value, the second predicted value, and a relevance partial order between the positive sample image and the negative sample image; and
updating, layer-by-layer in a reverse direction from the output layer back to the input layer, weighted parameters of layers in the first network and the second network along a direction of minimizing the loss function, by using a set weight updating algorithm.
19. The apparatus according to claim 18, wherein the selecting a set number of training samples comprises:
summarizing image click information corresponding to a same query sample according to image click logs of search users, wherein the query sample comprises: a single query or at least two queries meeting a set similarity threshold condition;
generating a positive image sample set and a negative image sample set corresponding to the query sample according to the summarized image click information; and
selecting a set number of query samples as the training queries, and generating positive-negative training pairs corresponding to the training queries respectively according to positive image sample sets and negative image sample sets corresponding to the training queries respectively, to serve as the training samples.
20. An image search apparatus, comprising:
at least one processor; and
a memory storing instructions, which when executed by the at least one processor, cause the at least one processor to perform operations, the operations comprising:
acquiring an image query entered by a user;
inputting separately the image query and to-be-sorted images into the image search relevance prediction model established by using the apparatus according to claim 15, and separately calculating a relevance metric value between each of the to-be-sorted images and the image query; and
sorting the to-be-sorted images according to the calculated relevance metric values, and providing an image search result corresponding to the sorting result to the user.
US15/281,198 2016-05-10 2016-09-30 Method and apparatus of establishing image search relevance prediction model, and image search method and apparatus Active 2037-04-04 US10354170B2 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN201610306220.6A CN106021364B (en) 2016-05-10 2016-05-10 Establishment of image search relevance prediction model, and image search method and apparatus
CN201610306220.6 2016-05-10
CN201610306220 2016-05-10

Publications (2)

Publication Number Publication Date
US20170330054A1 true US20170330054A1 (en) 2017-11-16
US10354170B2 US10354170B2 (en) 2019-07-16

Family

ID=57100212

Family Applications (1)

Application Number Title Priority Date Filing Date
US15/281,198 Active 2037-04-04 US10354170B2 (en) 2016-05-10 2016-09-30 Method and apparatus of establishing image search relevance prediction model, and image search method and apparatus

Country Status (2)

Country Link
US (1) US10354170B2 (en)
CN (1) CN106021364B (en)

Cited By (61)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108121969A (en) * 2017-12-22 2018-06-05 百度在线网络技术(北京)有限公司 Method and apparatus for processing images
CN108399211A (en) * 2018-02-02 2018-08-14 清华大学 Large-scale image searching algorithm based on binary feature
CN108985382A (en) * 2018-05-25 2018-12-11 清华大学 Adversarial example detection method based on critical data path representations
CN109325584A (en) * 2018-08-10 2019-02-12 深圳前海微众银行股份有限公司 Neural network-based federated modeling method, device and readable storage medium
US20190050724A1 (en) * 2017-08-14 2019-02-14 Sisense Ltd. System and method for generating training sets for neural networks
CN109543179A (en) * 2018-11-05 2019-03-29 北京康夫子科技有限公司 Method and system for normalizing colloquial symptom descriptions
CN109657710A (en) * 2018-12-06 2019-04-19 北京达佳互联信息技术有限公司 Data screening method, apparatus, server and storage medium
CN109670532A (en) * 2018-11-23 2019-04-23 腾讯科技(深圳)有限公司 Abnormality recognition method, apparatus and system for organ-tissue images of living organisms
CN109840588A (en) * 2019-01-04 2019-06-04 平安科技(深圳)有限公司 Neural network model training method, device, computer equipment and storage medium
CN110046226A (en) * 2019-04-17 2019-07-23 桂林电子科技大学 Image description method based on distributed word-vector CNN-RNN network
US20190259084A1 (en) * 2017-01-31 2019-08-22 Walmart Apollo, Llc Systems and methods for utilizing a convolutional neural network architecture for visual product recommendations
CN110232403A (en) * 2019-05-15 2019-09-13 腾讯科技(深圳)有限公司 A kind of Tag Estimation method, apparatus, electronic equipment and medium
CN110377175A (en) * 2018-04-13 2019-10-25 矽统科技股份有限公司 Method and system for recognizing tap events on a touch panel, and terminal touch-control product
WO2019212407A1 (en) * 2018-05-02 2019-11-07 Agency For Science, Technology And Research A system and method for image retrieval
CN110489582A (en) * 2019-08-19 2019-11-22 腾讯科技(深圳)有限公司 Generation method and device for personalized display images, and electronic device
US20200034656A1 (en) * 2017-09-08 2020-01-30 Tencent Technology (Shenzhen) Company Limited Information recommendation method, computer device, and storage medium
CN110866134A (en) * 2019-11-08 2020-03-06 吉林大学 A Distribution Consistency Preserving Metric Learning Method for Image Retrieval
CN110942108A (en) * 2019-12-13 2020-03-31 深圳大学 Face image clustering method, device and computer-readable storage medium
CN111008294A (en) * 2018-10-08 2020-04-14 阿里巴巴集团控股有限公司 Traffic image processing and image retrieval method and device
CN111124902A (en) * 2019-12-12 2020-05-08 腾讯科技(深圳)有限公司 Object operating method and device, computer-readable storage medium and electronic device
CN111275060A (en) * 2018-12-04 2020-06-12 北京嘀嘀无限科技发展有限公司 Recognition model updating processing method and device, electronic equipment and storage medium
CN111401509A (en) * 2019-01-02 2020-07-10 中国移动通信有限公司研究院 Terminal type identification method and device
CN111524570A (en) * 2020-05-06 2020-08-11 万达信息股份有限公司 Ultrasonic follow-up patient screening method based on machine learning
CN111522985A (en) * 2020-04-21 2020-08-11 易拍全球(北京)科贸有限公司 Antique artwork image retrieval algorithm based on depth-layer feature extraction and fusion
CN111667057A (en) * 2020-06-05 2020-09-15 北京百度网讯科技有限公司 Method and apparatus for searching model structure
US10803359B2 (en) 2016-12-30 2020-10-13 Tencent Technology (Shenzhen) Company Limited Image recognition method, apparatus, server, and storage medium
CN111784757A (en) * 2020-06-30 2020-10-16 北京百度网讯科技有限公司 Training method of depth estimation model, depth estimation method, apparatus and equipment
CN111931799A (en) * 2019-05-13 2020-11-13 百度在线网络技术(北京)有限公司 Image recognition method and device
CN112015940A (en) * 2019-05-30 2020-12-01 奥多比公司 Text-to-vision machine learning embedding technique
CN112101380A (en) * 2020-08-28 2020-12-18 合肥工业大学 Product click-through rate prediction method and system based on image-text matching, storage medium
CN112434721A (en) * 2020-10-23 2021-03-02 特斯联科技集团有限公司 Image classification method, system, storage medium and terminal based on small sample learning
CN112668365A (en) * 2019-10-15 2021-04-16 顺丰科技有限公司 Material warehousing identification method, device, equipment and storage medium
CN113139381A (en) * 2021-04-29 2021-07-20 平安国际智慧城市科技股份有限公司 Unbalanced sample classification method and device, electronic equipment and storage medium
CN113240056A (en) * 2021-07-12 2021-08-10 北京百度网讯科技有限公司 Multi-mode data joint learning model training method and device
CN113255711A (en) * 2020-02-13 2021-08-13 阿里巴巴集团控股有限公司 Confrontation detection method, device and equipment
CN113282779A (en) * 2020-02-19 2021-08-20 阿里巴巴集团控股有限公司 Image searching method, device and equipment
CN113326864A (en) * 2021-04-06 2021-08-31 上海海洋大学 Image retrieval model and training method
CN113688933A (en) * 2019-01-18 2021-11-23 北京市商汤科技开发有限公司 Training method for a classification network, classification method and device, and electronic equipment
US11182840B2 (en) * 2016-11-18 2021-11-23 Walmart Apollo, Llc Systems and methods for mapping a predicted entity to a product based on an online query
US20210374149A1 (en) * 2019-01-10 2021-12-02 Beijing Sankuai Online Technology Co., Ltd Sorting
CN113918753A (en) * 2021-07-23 2022-01-11 腾讯科技(深圳)有限公司 Image retrieval method based on artificial intelligence and related equipment
TWI752455B (en) * 2019-11-11 2022-01-11 大陸商深圳市商湯科技有限公司 Image classification model training method, image processing method, data classification model training method, data processing method, computer device, and storage medium
WO2022012407A1 (en) * 2020-07-15 2022-01-20 华为技术有限公司 Neural network training method and related device
CN114282035A (en) * 2021-08-17 2022-04-05 腾讯科技(深圳)有限公司 Training and searching method, device, equipment and medium of image searching model
US20220124200A1 (en) * 2019-12-06 2022-04-21 At&T Intellectual Property I, L.P. Supporting conversations between customers and customer service agents
CN114491361A (en) * 2022-01-11 2022-05-13 北京达佳互联信息技术有限公司 Click rate model generation method, click rate determination method and related equipment
CN114708449A (en) * 2022-06-02 2022-07-05 腾讯科技(深圳)有限公司 Similar video determination method, and training method and device of example characterization model
CN114722313A (en) * 2022-04-28 2022-07-08 北京爱奇艺科技有限公司 Search result sorting method, device, equipment and storage medium
CN115081589A (en) * 2020-01-09 2022-09-20 支付宝(杭州)信息技术有限公司 Method and device for processing interactive data by using LSTM neural network model
US11604822B2 (en) 2019-05-30 2023-03-14 Adobe Inc. Multi-modal differential search with real-time focus adaptation
US11605019B2 (en) 2019-05-30 2023-03-14 Adobe Inc. Visually guided machine-learning language model
US20230137671A1 (en) * 2020-08-27 2023-05-04 Samsung Electronics Co., Ltd. Method and apparatus for concept matching
US11663188B2 (en) 2017-08-14 2023-05-30 Sisense, Ltd. System and method for representing query elements in an artificial neural network
US11741620B1 (en) * 2020-01-24 2023-08-29 Apple Inc. Plane detection using depth sensor and semantic information
US11775573B2 (en) 2019-04-15 2023-10-03 Yandex Europe Ag Method of and server for retraining machine learning algorithm
CN116975675A (en) * 2022-09-23 2023-10-31 中国移动通信集团浙江有限公司 Method, device, computing device and storage medium for generating website classification model
US20240078258A1 (en) * 2019-02-01 2024-03-07 Google Llc Training Image and Text Embedding Models
CN118469041A (en) * 2024-07-10 2024-08-09 中建五局第三建设(深圳)有限公司 Training method, prediction method, device, equipment and medium for a bidder-ring detection model
CN119493771A (en) * 2024-11-14 2025-02-21 深圳市科盛世纪产业发展合伙企业(有限合伙) Personal data retrieval method and device based on deep neural network
CN120298412A (en) * 2025-06-13 2025-07-11 北京大学第三医院(北京大学第三临床医学院) Thyroid lymphoma prediction method and device based on machine learning
US12363136B1 (en) * 2020-12-28 2025-07-15 Trend Micro Incorporated Detection of unauthorized internet of things devices in a computer network

Families Citing this family (58)

Publication number Priority date Publication date Assignee Title
CN106383912B (en) * 2016-10-14 2019-09-03 北京字节跳动网络技术有限公司 Image retrieval method and device
CN107527091B (en) * 2016-10-14 2021-05-25 腾讯科技(北京)有限公司 Data processing method and device
US11188824B2 (en) 2017-02-17 2021-11-30 Google Llc Cooperatively training and/or using separate input and subsequent content neural networks for information retrieval
CN106920147B (en) * 2017-02-28 2020-12-29 华中科技大学 Method for intelligent product recommendation driven by word-vector data
CN107122393B (en) * 2017-03-09 2019-12-10 北京小度互娱科技有限公司 electronic album generating method and device
CN106951484B (en) * 2017-03-10 2020-10-30 百度在线网络技术(北京)有限公司 Picture retrieval method and device, computer equipment and computer readable medium
CN107016439A (en) * 2017-05-09 2017-08-04 重庆大学 Based on CR2The image text dual coding mechanism implementation model of neutral net
CN107193962B (en) * 2017-05-24 2021-06-11 百度在线网络技术(北京)有限公司 Intelligent map matching method and device for Internet promotion information
CN107239532B (en) * 2017-05-31 2020-07-31 北京京东尚科信息技术有限公司 Data mining method and device
CN110019770A (en) * 2017-07-24 2019-07-16 华为技术有限公司 Method and apparatus for training classification models
CN107437111B (en) * 2017-07-31 2020-07-14 杭州朗和科技有限公司 Data processing method, medium, device and computing device based on neural network
CN107679183B (en) 2017-09-29 2020-11-06 百度在线网络技术(北京)有限公司 Training data acquisition method and device for classifier, server and storage medium
CN110019867A (en) * 2017-10-10 2019-07-16 阿里巴巴集团控股有限公司 Image search method, system and index structuring method and medium
CN110069650B (en) * 2017-10-10 2024-02-09 阿里巴巴集团控股有限公司 Searching method and processing equipment
CN110019903A (en) * 2017-10-10 2019-07-16 阿里巴巴集团控股有限公司 Generation method, searching method and terminal, the system of image processing engine component
CN107748779A (en) * 2017-10-20 2018-03-02 百度在线网络技术(北京)有限公司 Information generating method and device
CN110019889A (en) * 2017-12-01 2019-07-16 北京搜狗科技发展有限公司 Method and related apparatus for training a feature extraction model and calculating a correlation coefficient between a picture and a query word
CN107908783B (en) * 2017-12-07 2021-06-11 百度在线网络技术(北京)有限公司 Method, device, server and storage medium for evaluating relevance of search texts
CN108228720B (en) * 2017-12-07 2019-11-08 北京字节跳动网络技术有限公司 Method, system, device, terminal and storage medium for identifying correlation between target text content and an original image
CN108121814B (en) * 2017-12-28 2022-04-22 北京百度网讯科技有限公司 Search result ranking model generation method and device
US10740394B2 (en) * 2018-01-18 2020-08-11 Oath Inc. Machine-in-the-loop, image-to-video computer vision bootstrapping
CN108230121B (en) * 2018-02-09 2022-06-10 艾凯克斯(嘉兴)信息科技有限公司 Product design method based on recurrent neural network
CN108376132B (en) * 2018-03-16 2020-08-28 中国科学技术大学 A method and system for judging similar test questions
CN110322011B (en) * 2018-03-28 2021-08-06 普天信息技术有限公司 Object Relational Construction Method and Device Oriented to Reasoning Model
CN113298510B (en) * 2018-07-10 2022-06-17 马上消费金融股份有限公司 Deduction instruction initiating method and device
CN109034248B (en) * 2018-07-27 2022-04-05 电子科技大学 A deep learning-based classification method for images with noisy labels
CN110858220A (en) * 2018-08-10 2020-03-03 阿里巴巴集团控股有限公司 Method, device, storage medium and processor for determining image characteristics
CN109214006B (en) * 2018-09-18 2020-10-27 中国科学技术大学 A Natural Language Inference Method for Image Enhanced Hierarchical Semantic Representation
CN109543066B (en) * 2018-10-31 2021-04-23 北京达佳互联信息技术有限公司 Video recommendation method, apparatus and computer-readable storage medium
CN109583439A (en) * 2018-12-04 2019-04-05 龙马智芯(珠海横琴)科技有限公司 The method and device of text correction, storage medium, processor
CN109657792A (en) * 2018-12-19 2019-04-19 北京世纪好未来教育科技有限公司 Construct the method, apparatus and computer-readable medium of neural network
US10810468B2 (en) * 2019-01-30 2020-10-20 Mitsubishi Electric Research Laboratories, Inc. System for training descriptor with active sample selection
CN110033023B (en) * 2019-03-11 2021-06-15 北京光年无限科技有限公司 Image data processing method and system based on picture book recognition
CN110232131B (en) * 2019-04-26 2021-04-27 特赞(上海)信息科技有限公司 Creative material searching method and device based on creative tag
CN112182459A (en) * 2019-07-03 2021-01-05 北京奇虎科技有限公司 Training method and system based on stream processing framework
CN110457523B (en) * 2019-08-12 2022-03-08 腾讯科技(深圳)有限公司 Cover picture selection method, model training method, device and medium
CN112529986B (en) * 2019-09-19 2023-09-22 百度在线网络技术(北京)有限公司 Graph-text correlation calculation model establishment method, graph-text correlation calculation method and graph-text correlation calculation device
CN110704637B (en) * 2019-09-29 2023-05-12 出门问问信息科技有限公司 Method and device for constructing multi-modal knowledge base and computer readable medium
CN112800737B (en) * 2019-10-29 2024-06-18 京东科技控股股份有限公司 Natural language text generation method and device and dialogue system
CN111460909A (en) * 2020-03-09 2020-07-28 兰剑智能科技股份有限公司 Vision-based goods location management method and device
CN115244527B (en) * 2020-03-11 2026-01-27 谷歌有限责任公司 Method for learning embedded space using cross-over examples
CN113535829B (en) * 2020-04-17 2022-04-29 阿里巴巴集团控股有限公司 Training method and device of ranking model, electronic equipment and storage medium
CN113626588B (en) * 2020-05-09 2024-09-06 北京金山数字娱乐科技有限公司 Convolutional neural network training method and device and article classification method and device
CN113627455B (en) * 2020-05-09 2025-01-07 阿里巴巴集团控股有限公司 Image category determination method and device
CN113743430B (en) * 2020-05-29 2025-01-10 北京沃东天骏信息技术有限公司 Method and device for establishing label matching detection model, storage medium and equipment
CN111695680B (en) * 2020-06-15 2023-11-10 北京百度网讯科技有限公司 Performance prediction method, performance prediction model training method, device and electronic equipment
CN112149748B (en) * 2020-09-28 2024-05-21 商汤集团有限公司 Image classification method and device, electronic device and storage medium
CN114595800B (en) * 2020-12-02 2024-09-17 北京搜狗科技发展有限公司 Correlation model training method, sequencing method, device, electronic equipment and medium
CN112560928B (en) * 2020-12-08 2021-10-26 北京百度网讯科技有限公司 Negative sample mining method and device, electronic equipment and storage medium
CN112949750B (en) * 2021-03-25 2022-09-23 清华大学深圳国际研究生院 Image classification method and computer readable storage medium
CN113110045B (en) * 2021-03-31 2022-10-25 同济大学 Model prediction control real-time optimization parallel computing method based on computation graph
CN113283115B (en) * 2021-06-11 2023-08-08 北京有竹居网络技术有限公司 Image model generation method, device and electronic equipment
CN114139041B (en) * 2022-01-28 2022-05-13 浙江口碑网络技术有限公司 Category correlation prediction network training and category correlation prediction method and device
CN114372205B (en) * 2022-03-22 2022-06-10 腾讯科技(深圳)有限公司 Training method, device and equipment of characteristic quantization model
CN114896444A (en) * 2022-05-06 2022-08-12 上海二三四五网络科技有限公司 News picture characteristic-based recommendation method, device, equipment and medium
US11782877B1 (en) 2022-05-17 2023-10-10 Bank Of America Corporation Search technique for noisy logs and resulting user interfaces displaying log entries in ranked order of importance
CN116502286B (en) * 2023-05-24 2023-11-17 中国标准化研究院 Standard information service method and system based on edge calculation
CN117033685A (en) * 2023-08-03 2023-11-10 康键信息技术(深圳)有限公司 Image search method, device, equipment and storage medium

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
WO2006039686A2 (en) * 2004-10-01 2006-04-13 University Of Southern California User preference techniques for support vector machines in content based image retrieval
CN103593474B (en) * 2013-11-28 2017-03-01 中国科学院自动化研究所 Image retrieval sort method based on deep learning
CN105468596B (en) * 2014-08-12 2019-06-18 腾讯科技(深圳)有限公司 Picture retrieval method and device
CN104317834B (en) * 2014-10-10 2017-09-29 浙江大学 A kind of across media sort methods based on deep neural network
US10095950B2 (en) * 2015-06-03 2018-10-09 Hyperverge Inc. Systems and methods for image processing
US10586168B2 (en) * 2015-10-08 2020-03-10 Facebook, Inc. Deep translations
CN105447190B (en) * 2015-12-18 2019-03-15 小米科技有限责任公司 Picture retrieval method, device and server based on convolutional neural networks

Cited By (75)

Publication number Priority date Publication date Assignee Title
US11182840B2 (en) * 2016-11-18 2021-11-23 Walmart Apollo, Llc Systems and methods for mapping a predicted entity to a product based on an online query
US10803359B2 (en) 2016-12-30 2020-10-13 Tencent Technology (Shenzhen) Company Limited Image recognition method, apparatus, server, and storage medium
US20190259084A1 (en) * 2017-01-31 2019-08-22 Walmart Apollo, Llc Systems and methods for utilizing a convolutional neural network architecture for visual product recommendations
US11107143B2 (en) * 2017-01-31 2021-08-31 Walmart Apollo Llc Systems and methods for utilizing a convolutional neural network architecture for visual product recommendations
US11734746B2 (en) 2017-01-31 2023-08-22 Walmart Apollo, Llc Systems and methods for utilizing a convolutional neural network architecture for visual product recommendations
US11663188B2 (en) 2017-08-14 2023-05-30 Sisense, Ltd. System and method for representing query elements in an artificial neural network
US12067010B2 (en) 2017-08-14 2024-08-20 Sisense Ltd. System and method for approximating query results using local and remote neural networks
US20190050724A1 (en) * 2017-08-14 2019-02-14 Sisense Ltd. System and method for generating training sets for neural networks
US20200034656A1 (en) * 2017-09-08 2020-01-30 Tencent Technology (Shenzhen) Company Limited Information recommendation method, computer device, and storage medium
US11514260B2 (en) * 2017-09-08 2022-11-29 Tencent Technology (Shenzhen) Company Limited Information recommendation method, computer device, and storage medium
CN108121969A (en) * 2017-12-22 2018-06-05 百度在线网络技术(北京)有限公司 For handling the method and apparatus of image
CN108399211A (en) * 2018-02-02 2018-08-14 清华大学 Large-scale image searching algorithm based on binary feature
CN110377175A (en) * 2018-04-13 2019-10-25 矽统科技股份有限公司 Method and system for recognizing tap events on a touch panel, and touch-control terminal product
WO2019212407A1 (en) * 2018-05-02 2019-11-07 Agency For Science, Technology And Research A system and method for image retrieval
CN108985382A (en) * 2018-05-25 2018-12-11 清华大学 Adversarial sample detection method based on critical data path representations
CN109325584A (en) * 2018-08-10 2019-02-12 深圳前海微众银行股份有限公司 Neural network-based federated modeling method, device and readable storage medium
CN111008294A (en) * 2018-10-08 2020-04-14 阿里巴巴集团控股有限公司 Traffic image processing and image retrieval method and device
CN109543179A (en) * 2018-11-05 2019-03-29 北京康夫子科技有限公司 Method and system for normalizing colloquial symptom descriptions
US11869227B2 (en) 2018-11-23 2024-01-09 Tencent Technology (Shenzhen) Company Limited Image recognition method, apparatus, and system and storage medium
WO2020103676A1 (en) * 2018-11-23 2020-05-28 腾讯科技(深圳)有限公司 Image identification method and apparatus, system, and storage medium
CN109670532A (en) * 2018-11-23 2019-04-23 腾讯科技(深圳)有限公司 Method, apparatus and system for recognizing abnormalities in organ-tissue images
CN111275060A (en) * 2018-12-04 2020-06-12 北京嘀嘀无限科技发展有限公司 Recognition model updating processing method and device, electronic equipment and storage medium
CN109657710A (en) * 2018-12-06 2019-04-19 北京达佳互联信息技术有限公司 Data screening method, apparatus, server and storage medium
CN111401509A (en) * 2019-01-02 2020-07-10 中国移动通信有限公司研究院 Terminal type identification method and device
CN109840588A (en) * 2019-01-04 2019-06-04 平安科技(深圳)有限公司 Neural network model training method, device, computer equipment and storage medium
US20210374149A1 (en) * 2019-01-10 2021-12-02 Beijing Sankuai Online Technology Co., Ltd Sorting
CN113688933A (en) * 2019-01-18 2021-11-23 北京市商汤科技开发有限公司 Training method and classification method and device of classification network, and electronic equipment
US20240078258A1 (en) * 2019-02-01 2024-03-07 Google Llc Training Image and Text Embedding Models
US11775573B2 (en) 2019-04-15 2023-10-03 Yandex Europe Ag Method of and server for retraining machine learning algorithm
CN110046226A (en) * 2019-04-17 2019-07-23 桂林电子科技大学 An image description method based on distributed word vectors and a CNN-RNN network
CN111931799A (en) * 2019-05-13 2020-11-13 百度在线网络技术(北京)有限公司 Image recognition method and device
CN110232403A (en) * 2019-05-15 2019-09-13 腾讯科技(深圳)有限公司 A tag estimation method, apparatus, electronic device and medium
US11604822B2 (en) 2019-05-30 2023-03-14 Adobe Inc. Multi-modal differential search with real-time focus adaptation
US11775578B2 (en) * 2019-05-30 2023-10-03 Adobe Inc. Text-to-visual machine learning embedding techniques
CN112015940A (en) * 2019-05-30 2020-12-01 奥多比公司 Text-to-vision machine learning embedding technique
US12079269B2 (en) 2019-05-30 2024-09-03 Adobe Inc. Visually guided machine-learning language model
US11605019B2 (en) 2019-05-30 2023-03-14 Adobe Inc. Visually guided machine-learning language model
US20210365727A1 (en) * 2019-05-30 2021-11-25 Adobe Inc. Text-to-Visual Machine Learning Embedding Techniques
US20200380298A1 (en) * 2019-05-30 2020-12-03 Adobe Inc. Text-to-Visual Machine Learning Embedding Techniques
US11144784B2 (en) * 2019-05-30 2021-10-12 Adobe Inc. Text-to-visual machine learning embedding techniques
AU2020202021B2 (en) * 2019-05-30 2021-11-11 Adobe Inc. Text-based image retrieval system by learning sentence embeddings jointly with click and title data
CN110489582A (en) * 2019-08-19 2019-11-22 腾讯科技(深圳)有限公司 Method and device for generating personalized display image, and electronic device
CN110489582B (en) * 2019-08-19 2023-11-07 腾讯科技(深圳)有限公司 Method and device for generating personalized display image and electronic equipment
CN112668365A (en) * 2019-10-15 2021-04-16 顺丰科技有限公司 Material warehousing identification method, device, equipment and storage medium
CN110866134A (en) * 2019-11-08 2020-03-06 吉林大学 A Distribution Consistency Preserving Metric Learning Method for Image Retrieval
TWI752455B (en) * 2019-11-11 2022-01-11 大陸商深圳市商湯科技有限公司 Image classification model training method, image processing method, data classification model training method, data processing method, computer device, and storage medium
US20220124200A1 (en) * 2019-12-06 2022-04-21 At&T Intellectual Property I, L.P. Supporting conversations between customers and customer service agents
CN111124902A (en) * 2019-12-12 2020-05-08 腾讯科技(深圳)有限公司 Object operating method and device, computer-readable storage medium and electronic device
CN110942108A (en) * 2019-12-13 2020-03-31 深圳大学 Face image clustering method, device and computer-readable storage medium
CN115081589A (en) * 2020-01-09 2022-09-20 支付宝(杭州)信息技术有限公司 Method and device for processing interactive data by using LSTM neural network model
US11741620B1 (en) * 2020-01-24 2023-08-29 Apple Inc. Plane detection using depth sensor and semantic information
CN113255711A (en) * 2020-02-13 2021-08-13 阿里巴巴集团控股有限公司 Adversarial detection method, device and equipment
CN113282779A (en) * 2020-02-19 2021-08-20 阿里巴巴集团控股有限公司 Image searching method, device and equipment
CN111522985A (en) * 2020-04-21 2020-08-11 易拍全球(北京)科贸有限公司 Antique artwork image retrieval algorithm based on depth-layer feature extraction and fusion
CN111524570A (en) * 2020-05-06 2020-08-11 万达信息股份有限公司 Ultrasonic follow-up patient screening method based on machine learning
CN111667057A (en) * 2020-06-05 2020-09-15 北京百度网讯科技有限公司 Method and apparatus for searching model structure
CN111784757A (en) * 2020-06-30 2020-10-16 北京百度网讯科技有限公司 Training method for a depth estimation model, and depth estimation method, apparatus and equipment
WO2022012407A1 (en) * 2020-07-15 2022-01-20 华为技术有限公司 Neural network training method and related device
US20230137671A1 (en) * 2020-08-27 2023-05-04 Samsung Electronics Co., Ltd. Method and apparatus for concept matching
CN112101380A (en) * 2020-08-28 2020-12-18 合肥工业大学 Product click-through rate prediction method and system based on image-text matching, storage medium
CN112434721A (en) * 2020-10-23 2021-03-02 特斯联科技集团有限公司 Image classification method, system, storage medium and terminal based on small sample learning
US12363136B1 (en) * 2020-12-28 2025-07-15 Trend Micro Incorporated Detection of unauthorized internet of things devices in a computer network
CN113326864A (en) * 2021-04-06 2021-08-31 上海海洋大学 Image retrieval model and training method
CN113139381A (en) * 2021-04-29 2021-07-20 平安国际智慧城市科技股份有限公司 Unbalanced sample classification method and device, electronic equipment and storage medium
CN113240056A (en) * 2021-07-12 2021-08-10 北京百度网讯科技有限公司 Multi-mode data joint learning model training method and device
CN113918753A (en) * 2021-07-23 2022-01-11 腾讯科技(深圳)有限公司 Image retrieval method based on artificial intelligence and related equipment
CN114282035A (en) * 2021-08-17 2022-04-05 腾讯科技(深圳)有限公司 Training and searching method, device, equipment and medium of image searching model
US12493649B2 (en) 2021-08-17 2025-12-09 Tencent Technology (Shenzhen) Company Limited Method and apparatus for training retrieval model, retrieval method and apparatus, device and medium
CN114491361A (en) * 2022-01-11 2022-05-13 北京达佳互联信息技术有限公司 Click rate model generation method, click rate determination method and related equipment
CN114722313A (en) * 2022-04-28 2022-07-08 北京爱奇艺科技有限公司 Search result sorting method, device, equipment and storage medium
CN114708449A (en) * 2022-06-02 2022-07-05 腾讯科技(深圳)有限公司 Similar video determination method, and training method and device of example characterization model
CN116975675A (en) * 2022-09-23 2023-10-31 中国移动通信集团浙江有限公司 Method, device, computing device and storage medium for generating website classification model
CN118469041A (en) * 2024-07-10 2024-08-09 中建五局第三建设(深圳)有限公司 Training method, prediction method, device, equipment and medium for a bid-rigging detection model
CN119493771A (en) * 2024-11-14 2025-02-21 深圳市科盛世纪产业发展合伙企业(有限合伙) Personal data retrieval method and device based on deep neural network
CN120298412A (en) * 2025-06-13 2025-07-11 北京大学第三医院(北京大学第三临床医学院) Thyroid lymphoma prediction method and device based on machine learning

Also Published As

Publication number Publication date
CN106021364B (en) 2017-12-12
US10354170B2 (en) 2019-07-16
CN106021364A (en) 2016-10-12

Similar Documents

Publication Publication Date Title
US10354170B2 (en) Method and apparatus of establishing image search relevance prediction model, and image search method and apparatus
US10459971B2 (en) Method and apparatus of generating image characteristic representation of query, and image search method and apparatus
CN105022754B (en) Object classification method and device based on social network
CN113987161B (en) A text sorting method and device
CN108073568B (en) Keyword extraction method and device
CN103914478B (en) Webpage training method and system, and webpage prediction method and system
CN105183833B (en) Microblog text recommendation method and device based on user model
CN112074828A (en) Training image embedding model and text embedding model
CN112148831B (en) Image-text mixed retrieval method and device, storage medium and computer equipment
US10565253B2 (en) Model generation method, word weighting method, device, apparatus, and computer storage medium
WO2013066929A1 (en) Method and apparatus of ranking search results, and search method and apparatus
WO2023065642A1 (en) Corpus screening method, intention recognition model optimization method, device, and storage medium
CN107463605A (en) Method and device for recognizing low-quality news resources, computer equipment and computer-readable storage medium
CN114820134B (en) Commodity information recall method, device, equipment and computer storage medium
CN115391589A (en) Training method and device for content recall model, electronic equipment and storage medium
CN112417845B (en) Text evaluation method, device, electronic device and storage medium
WO2023155306A1 (en) Data recommendation method and apparatus based on graph neural network and electronic device
WO2023155304A1 (en) Keyword recommendation model training method and apparatus, keyword recommendation method and apparatus, device, and medium
CN114661890B (en) A knowledge recommendation method, device, system and storage medium
CN118152341A (en) Log query statement generation method, device, equipment and storage medium
CN103744958B (en) Webpage classification method based on distributed computation
CN114329206B (en) Title generation method and device, electronic equipment and computer readable medium
CN111475647A (en) Document processing method and device and server
CN114626367A (en) Sentiment analysis method, system, equipment and medium based on news article content
US11314794B2 (en) System and method for adaptively adjusting related search words

Legal Events

Date Code Title Description
STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4