CN119669534A

CN119669534A - Material retrieval method, device, computer equipment and storage medium

Info

Publication number: CN119669534A
Application number: CN202411650035.XA
Authority: CN
Inventors: 赵永胜; 李志宏
Original assignee: Ping An Bank Co Ltd
Current assignee: Ping An Bank Co Ltd
Priority date: 2024-11-18
Filing date: 2024-11-18
Publication date: 2025-03-21

Abstract

The application belongs to the field of artificial intelligence and relates to a material retrieval method. The method comprises: obtaining a material sample set and carrying out multi-mode feature extraction on each material picture of the material sample set to obtain text features and image features; splicing the text features and the image features of each material picture to obtain a plurality of multi-mode fusion features; carrying out hash coding on the multi-mode fusion features to obtain material hash codes of all the multi-mode fusion features, and constructing a retrieval database based on all the material hash codes; obtaining the multi-mode fusion features of an image to be detected as a query vector, and calculating a hash value of the query vector to obtain a query hash value; and matching the material hash code most similar to the query hash value from the retrieval database to obtain a target hash code, and taking the material corresponding to the target hash code as the retrieval result. The application also provides a material retrieval device, computer equipment and a storage medium. The application enables quick retrieval from the material library and improves the accuracy of material retrieval.

Description

Material retrieval method, device, computer equipment and storage medium

Technical Field

The present application relates to the field of digital medical and material retrieval technologies, and in particular, to a material retrieval method, device, computer equipment, and storage medium.

Background

The traditional material retrieval method mainly depends on structured information such as material codes and keywords. However, unstructured information such as the descriptive text and pictures of materials often contains richer semantic information, and how to use this information effectively to improve the precision and recall of material retrieval is currently a main challenge. Similarity-based material retrieval methods can address these problems to a certain extent, but have limitations in practical application. First, the similarity calculation for materials often depends on manually defined features, such as keyword frequency and text length; selecting these features requires a great deal of domain knowledge and experience, and the features struggle to describe the deep semantic information of the materials. Second, for non-text information such as pictures, traditional feature representations such as color histograms and texture features cannot accurately describe the semantic content of the picture, so the accuracy of the similarity calculation is low.

Disclosure of Invention

The embodiment of the application aims to provide a material retrieval method, a device, computer equipment and a storage medium, which are used for solving the technical problem of lower material identification precision in the traditional technology.

In order to solve the technical problems, the embodiment of the application provides a material retrieval method, which adopts the following technical scheme:

Acquiring a material sample set, and performing multi-mode feature extraction on each material picture in the material sample set to obtain text features and image features;

splicing the text features and the image features of each material picture to obtain a plurality of multi-mode fusion features;

carrying out hash coding on the multi-mode fusion characteristics to obtain material hash codes of all the multi-mode fusion characteristics, and constructing a retrieval database based on all the material hash codes;

Acquiring multi-mode fusion characteristics of an image to be detected, obtaining a query vector, and calculating a hash value of the query vector to obtain a query hash value;

And matching the material hash code which is most similar to the query hash value from the retrieval database to obtain a target hash code, and taking the material corresponding to the target hash code as a retrieval result.

Further, the multi-mode feature extraction is performed on each material picture of the material sample set to obtain text features, including:

Performing text recognition on the material picture to obtain text data;

Preprocessing the text data to obtain a processed text containing a plurality of segmentation words;

And extracting the characteristics of the processed text by adopting a preset word embedding model to obtain the text characteristics.

Further, the image features are obtained by:

carrying out image enhancement on each material picture in the material sample set to obtain an enhanced sample set;

carrying out multi-dimensional feature extraction on the material pictures in the enhanced sample set by adopting a pre-trained convolutional neural network to obtain multi-dimensional image features;

and fusing all the dimensional image features by adopting weighted fusion to obtain the image features.

Further, the splicing the text feature and the image feature of each material picture to obtain a plurality of multi-mode fusion features includes:

Calling a preset multi-mode feature fusion network, wherein the multi-mode feature fusion network comprises an attention mechanism module and a full connection module;

inputting the text features and the image features into the multi-mode feature fusion network, and respectively extracting the attention features of different dimensions from the text features and the image features based on a preset attention mechanism module to obtain the text attention features and the image attention features of multiple dimensions;

And inputting the text attention features and the image attention features of multiple dimensions into the fully-connected module to obtain the multi-mode fusion feature.

Further, the matching the material hash code most similar to the query hash value from the search database to obtain a target hash code includes:

calculating the Hamming distance between the inquiry hash value and each material hash code in the retrieval database;

Counting the hash codes of all materials with hamming distances exceeding a preset threshold value to obtain a candidate material set;

And calculating cosine similarity between multi-mode fusion features corresponding to the material hash codes in the candidate material set and the query vector, sorting the material hash codes in the candidate material set from large to small according to the cosine similarity, and screening the optimal material hash code from the sorted candidate material set to be used as the target hash code.

Further, after the sorting of the material hash codes in the candidate material set according to the cosine similarity from large to small, the method further includes:

Acquiring auxiliary information of materials corresponding to each material hash code in the candidate material set, wherein the auxiliary information at least comprises material quality and provider credibility;

weighting and scoring the material hash codes in the sorted candidate material sets based on the auxiliary information;

And optimizing the sorting of the candidate material sets according to the scores to obtain the comprehensively sorted candidate material sets.

Further, after taking the material corresponding to the target hash code as the search result, the method includes:

Recording a search result corresponding to the query vector, and generating a query log;

And updating the search database according to the query log in real time by incremental learning.

In order to solve the technical problems, the embodiment of the application also provides a material retrieval device, which adopts the following technical scheme:

The characteristic extraction module is used for obtaining a material sample set, and carrying out multi-mode characteristic extraction on each material picture in the material sample set to obtain text characteristics and image characteristics;

The feature fusion module is used for splicing the text features and the image features of each material picture to obtain a plurality of multi-mode fusion features;

the construction module is used for carrying out hash coding on the multi-mode fusion characteristics to obtain material hash codes of all the multi-mode fusion characteristics, and constructing a retrieval database based on all the material hash codes;

the computing module is used for acquiring multi-mode fusion characteristics of the image to be detected, obtaining a query vector, and computing a hash value of the query vector to obtain a query hash value;

and the matching module is used for matching the material hash code which is most similar to the query hash value from the retrieval database to obtain a target hash code, and taking the material corresponding to the target hash code as a retrieval result.

In order to solve the above technical problems, the embodiment of the present application further provides a computer device, which adopts the following technical schemes:

The computer device comprises a memory and a processor, the memory having stored therein computer-readable instructions which, when executed by the processor, implement the steps of the material retrieval method as defined in any one of the above.

In order to solve the above technical problems, an embodiment of the present application further provides a computer readable storage medium, which adopts the following technical schemes:

the computer readable storage medium has stored thereon computer readable instructions which when executed by a processor implement the steps of the material retrieval method as defined in any one of the above.

Compared with the prior art, the embodiment of the application has the advantages that aiming at the characteristics that the material library picture simultaneously contains descriptive text and picture information, the text features and the image features of the material picture are obtained by extracting the multi-modal features of the material picture, and the two modal features are fused to form the multi-modal fusion feature. On the basis, the multi-mode fusion characteristic is mapped into the binary code by adopting the hash coding method, so that the rapid retrieval of mass material libraries is realized, and the accuracy and the efficiency of material retrieval are remarkably improved.

Drawings

In order to more clearly illustrate the solution of the present application, a brief description will be given below of the drawings required for the description of the embodiments of the present application, it being apparent that the drawings in the following description are some embodiments of the present application, and that other drawings may be obtained from these drawings without the exercise of inventive effort for a person of ordinary skill in the art.

FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;

FIG. 2 is a flow chart of one embodiment of a method of material retrieval according to the present application;

FIG. 3 is a flow chart of one embodiment of obtaining the text features in step S201 of FIG. 2;

FIG. 4 is a flow chart of one embodiment of obtaining the image features in step S201 of FIG. 2;

FIG. 5 is a flow chart of one embodiment of step S202 of FIG. 2;

FIG. 6 is a flow chart of one embodiment of step S205 in FIG. 2;

FIG. 7 is a flow chart of an embodiment after step S603 in FIG. 6;

FIG. 8 is a flow chart of an embodiment following step S205 in FIG. 2;

FIG. 9 is a schematic view of an embodiment of a material retrieval device according to the present application;

FIG. 10 is a schematic structural view of one embodiment of a computer device according to the present application.

Detailed Description

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terms used in the description herein are for the purpose of describing particular embodiments only and are not intended to limit the application. The terms "comprising" and "having" and any variations thereof in the description and claims of the application and in the above description of the drawings are intended to cover non-exclusive inclusions. The terms "first", "second" and the like in the description, the claims and the above-described figures are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.

In order to make the person skilled in the art better understand the solution of the present application, the technical solution of the embodiment of the present application will be clearly and completely described below with reference to the accompanying drawings.

As shown in fig. 1, the system architecture 100 may include a terminal device 101, a network 102, and a server 103, where the terminal device 101 may be a notebook 1011, a tablet 1012, or a cell phone 1013. Network 102 is the medium used to provide communication links between terminal device 101 and server 103. Network 102 may include various connection types such as wired, wireless communication links, or fiber optic cables.

A user may interact with the server 103 via the network 102 using the terminal device 101 to receive or send messages or the like. Various communication client applications, such as a web browser application, a shopping class application, a search class application, an instant messaging tool, a mailbox client, social platform software, and the like, may be installed on the terminal device 101.

The terminal device 101 may be various electronic devices having a display screen and supporting web browsing, and the terminal device 101 may be an electronic book reader, an MP3 player (Moving Picture Experts Group Audio Layer III, moving picture experts compression standard audio layer III), an MP4 (Moving Picture Experts Group Audio Layer IV, moving picture experts compression standard audio layer IV) player, a laptop portable computer, a desktop computer, or the like, in addition to the notebook 1011, the tablet 1012, or the mobile phone 1013.

The server 103 may be a server providing various services, such as a background server providing support for pages displayed on the terminal device 101.

It should be noted that, the material searching method provided by the embodiment of the present application is generally executed by a server/terminal device, and accordingly, the material searching device is generally disposed in the server/terminal device.

It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

With continued reference to FIG. 2, a flow chart of one embodiment of a method of material retrieval according to the present application is shown. The material retrieval method comprises the following steps:

step S201, acquiring a material sample set, and performing multi-mode feature extraction on each material picture in the material sample set to obtain text features and image features;

In this embodiment, the electronic device (e.g., the server/terminal device shown in fig. 1) on which the material retrieval method operates may receive a request for obtaining pairs of sample images and descriptive text through a wired connection or a wireless connection. It should be noted that the wireless connection may include, but is not limited to, a 3G/4G/5G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (ultra wideband) connection, and other wireless connections now known or developed in the future. The material sample set can be retrieved or downloaded from a local database of a hospital or from the Internet.

Step S202, splicing the text features and the image features of each material picture to obtain a plurality of multi-mode fusion features;

Step S203, hash coding is carried out on the multi-mode fusion features to obtain material hash codes of all the multi-mode fusion features, and a retrieval database is constructed based on all the material hash codes;

In this embodiment, the text features and the image features may be fused by simple feature concatenation, or by an attention-based feature fusion method that adaptively learns the features of the different modes, so that the resulting multi-mode fusion feature represents the material more accurately. The multi-mode fusion features of all material pictures are then encoded into binary material hash codes by a hash coding method, and the retrieval database is built from all the material hash codes.
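The embodiment does not name a specific hash function. As a minimal sketch, assuming random-hyperplane (sign of random projection) hashing as one possible choice, the mapping from fusion features to binary codes and the construction of the retrieval database might look as follows. The function names, the 16-bit code length and the toy feature values are all hypothetical.

```python
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes (an assumed hash family)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def hash_code(vector, hyperplanes):
    """Map a real-valued fusion feature to a binary code packed into an int."""
    code = 0
    for plane in hyperplanes:
        dot = sum(v * p for v, p in zip(vector, plane))
        # One bit per hyperplane: which side of the plane the vector lies on.
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code

def build_database(fusion_features, hyperplanes):
    """Return {material_id: (hash_code, fusion_feature)} for every material."""
    return {mid: (hash_code(vec, hyperplanes), vec)
            for mid, vec in fusion_features.items()}

# Toy fusion features (illustrative values only).
features = {
    "bolt_m8": [0.9, 0.1, 0.3, 0.7],
    "bolt_m10": [0.8, 0.2, 0.3, 0.6],
    "gasket": [-0.5, 0.9, -0.7, 0.1],
}
planes = make_hyperplanes(dim=4, n_bits=16)
database = build_database(features, planes)
```

Keeping the original fusion feature next to each code is a design choice made here so that a later stage can re-rank candidates on the real-valued vectors.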

Step S204, acquiring multi-mode fusion characteristics of an image to be detected, obtaining a query vector, and calculating a hash value of the query vector to obtain a query hash value;

In this embodiment, an image to be detected is obtained, and its multi-modal features are extracted and spliced according to the method described above to obtain the multi-modal fusion feature of the image to be detected. This multi-modal fusion feature is used as the query vector, and the query vector is hash-coded to obtain its hash value, which is used as the query hash value.

Step S205, matching the material hash code most similar to the query hash value from the retrieval database to obtain a target hash code, and taking the material corresponding to the target hash code as a retrieval result.

In this embodiment, based on the query vector, the most similar material hash codes are matched from the search database, and the matching method may adopt a similarity measurement method such as cosine similarity, hamming distance, euclidean distance, etc., calculate the similarity between the query vector and each material hash code in the search database, use the material hash code with the highest similarity as the target hash code, and use the material corresponding to the target hash code as the search result.
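The three similarity measures named above can be sketched as follows. This is an illustration only: it assumes hash codes are stored as integers and features as plain lists, and the helper `most_similar` and the 8-bit toy codes are hypothetical. Hamming distance applies to the binary codes, while cosine similarity and Euclidean distance apply to real-valued vectors.

```python
import math

def hamming_distance(code_a, code_b):
    """Number of differing bits between two integer-packed binary codes."""
    return bin(code_a ^ code_b).count("1")

def euclidean_distance(u, v):
    """Straight-line distance between two real-valued vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_similar(query_code, material_codes):
    """Return the material id whose code has the smallest Hamming distance."""
    return min(material_codes,
               key=lambda mid: hamming_distance(query_code, material_codes[mid]))

codes = {"bolt_m8": 0b10110010, "bolt_m10": 0b10110110, "gasket": 0b01001101}
target = most_similar(0b10110011, codes)  # differs from bolt_m8 in one bit
```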

In this embodiment, multi-modal feature extraction is performed on each material picture in the material sample set to obtain its text features and image features, and the two are spliced into the multi-modal fusion feature of the material picture, which gives a more accurate representation vector of the picture. A hash coding method then maps the multi-modal fusion features into binary codes, which enables rapid retrieval over a massive material library and improves the accuracy and efficiency of material retrieval.

As shown in fig. 3, in some optional implementations of the present embodiment, performing multi-modal feature extraction on each material picture in the material sample set in step S201 to obtain the text features includes the following steps:

step S301, performing text recognition on the material picture to obtain text data;

step S302, preprocessing the text data to obtain a processed text containing a plurality of segmentation words;

And step S303, extracting the characteristics of the processed text by adopting a preset word embedding model to obtain the text characteristics.

In this embodiment, text recognition can be performed on the material picture through OCR (optical character recognition) technology to obtain the text data of the material picture. The text data is preprocessed by operations such as text cleaning, word segmentation and stop-word removal to obtain the processed text, and a Word2Vec word embedding model is used to convert the processed text into a low-dimensional dense vector representation, thereby obtaining the text features of the material. Furthermore, to maintain the accuracy of the text features, the word vector model can be incrementally trained and optimized at regular intervals. Extracting the text features through the word embedding model captures the semantic information of the material picture and provides information support for material retrieval.
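The preprocessing and embedding steps can be sketched as below. The sketch assumes the OCR text is already available as a string, uses whitespace tokenisation in place of a real word segmenter, and replaces the trained Word2Vec model with a toy lookup table. The stop-word list, the embedding values and the use of mean pooling are all assumptions made for illustration.

```python
import re

STOP_WORDS = {"the", "a", "of", "for", "and"}  # hypothetical stop-word list

# Toy 3-dimensional embedding table standing in for a trained Word2Vec model.
EMBEDDINGS = {
    "steel": [0.9, 0.1, 0.0],
    "bolt": [0.7, 0.3, 0.1],
    "m8": [0.2, 0.8, 0.5],
}

def preprocess(text):
    """Clean OCR text, split it into tokens, and drop stop words."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]

def text_feature(tokens, embeddings, dim=3):
    """Average the embeddings of known tokens (mean pooling is an assumption)."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]

ocr_text = "The Steel Bolt, M8 (for flange)"
tokens = preprocess(ocr_text)               # out-of-vocabulary tokens are kept here
feature = text_feature(tokens, EMBEDDINGS)  # and skipped when averaging
```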

As shown in fig. 4, in some alternative implementations of the present embodiment, the image features in step S201 are obtained by:

Step S401, carrying out image enhancement on each material picture in the material sample set to obtain an enhanced sample set;

Step S402, performing multi-dimensional feature extraction on the material pictures in the enhanced sample set by adopting a pre-trained convolutional neural network to obtain multi-dimensional image features;

and step S403, fusing all the dimensional image features by adopting weighted fusion to obtain the image features.

In this embodiment, in order to extract as many material features as possible, image enhancement is performed on each material picture in the material sample set, for example by increasing the contrast and brightness of the material picture. This improves the quality of the material picture so that more features can be extracted from it. A pre-trained convolutional neural network then performs multi-dimensional feature extraction on the material pictures in the enhanced sample set to obtain image features of multiple dimensions. The image features of each dimension are fused by weighted fusion, which integrates the features of the material picture across dimensions so that the resulting image features carry more material information.
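The enhancement and weighted-fusion steps can be sketched as follows. The sketch assumes a linear contrast and brightness adjustment on 8-bit grayscale pixels, and it assumes the per-scale CNN features have already been brought to a common length so that they can be summed. The CNN itself is not shown; the per-scale vectors, the gain values and the fusion weights are placeholders.

```python
def enhance(pixels, contrast=1.2, brightness=10):
    """Linear contrast/brightness adjustment, clipped to the 8-bit range."""
    return [[max(0, min(255, round(contrast * p + brightness))) for p in row]
            for row in pixels]

def weighted_fusion(features, weights):
    """Fuse equal-length feature vectors with weights normalised to sum to 1."""
    total = sum(weights)
    norm = [w / total for w in weights]
    return [sum(w * f[i] for w, f in zip(norm, features))
            for i in range(len(features[0]))]

image = [[0, 100], [200, 250]]
enhanced = enhance(image)

# Placeholder per-scale features; in the embodiment these would come from
# a pre-trained convolutional neural network.
scale_features = [[1.0, 0.0, 2.0], [3.0, 4.0, 0.0]]
image_feature = weighted_fusion(scale_features, weights=[1.0, 3.0])
```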

As shown in fig. 5, in some alternative implementations of the present embodiment, step S202 includes the following steps:

step S501, a preset multi-mode feature fusion network is called, wherein the multi-mode feature fusion network comprises an attention mechanism module and a full connection module;

step S502, inputting the text features and the image features into the multi-mode feature fusion network, and respectively extracting the attention features of different dimensions from the text features and the image features based on a preset attention mechanism module to obtain the text attention features and the image attention features of multiple dimensions;

step S503, inputting the text attention feature and the image attention feature of multiple dimensions into the fully connected module, to obtain the multimodal fusion feature.

In this embodiment, the attention mechanism is an important technique in deep learning. It allows a model to selectively learn the key parts of the image features and text features, increases the model's attention to important features, and mines the features further. For the image features and text features, a multi-modal feature fusion network is constructed; semantic associations between the two kinds of features are learned through the attention mechanism, complementary information between them is mined, and a fused multi-modal feature representation is generated. For text features, text features of different dimensions are extracted by setting different convolution kernel sizes, such as 3, 4 and 5, to obtain a multi-scale text feature vector with a dimension of 512. For image features, image features of different dimensions, such as 64x64, 128x128 and 256x256, are extracted by setting different pooling layer sizes. The extracted text features are input into the attention mechanism, and the similarity between features of different dimensions is calculated to obtain text attention features of different dimensions; likewise, the image features are input into the attention mechanism to obtain image attention features of different dimensions.

In this embodiment, the multi-mode feature fusion network learns the semantic association and complementarity between the image features and the text features, which strengthens the ability of the fusion feature to represent the material picture and its corresponding text.
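The internal layout of the fusion network is not detailed in the text. The following is a minimal sketch, assuming scaled dot-product self-attention across the per-scale vectors of each modality, mean pooling, concatenation, and a single dense layer. The 2-dimensional toy vectors stand in for the 512-dimensional features mentioned above, and the weights are untrained placeholders.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(scale_features):
    """Scaled dot-product self-attention across per-scale vectors, then mean pooling."""
    d = len(scale_features[0])
    attended = []
    for q in scale_features:
        # Similarity of this scale's feature to every scale's feature.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in scale_features]
        weights = softmax(scores)
        attended.append([sum(w * v[i] for w, v in zip(weights, scale_features))
                         for i in range(d)])
    return [sum(col) / len(attended) for col in zip(*attended)]

def fully_connected(x, weight, bias):
    """One dense layer: y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]

text_scales = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # e.g. kernel sizes 3, 4, 5
image_scales = [[0.5, 0.5], [1.0, 0.0]]             # e.g. two pooling sizes
joint = self_attention(text_scales) + self_attention(image_scales)  # concatenation

# Toy, untrained weights; a real network would learn these.
W = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
b = [0.0, 0.0]
fusion_feature = fully_connected(joint, W, b)
```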

As shown in fig. 6, in some alternative implementations of the present embodiment, step S205 includes the following steps:

Step S601, calculating the Hamming distance between the query hash value and each material hash code in the retrieval database;

Step S602, counting the hash codes of all materials with hamming distances exceeding a preset threshold value to obtain a candidate material set;

Step S603, calculating cosine similarity between the multimodal fusion feature corresponding to each material hash code in the candidate material set and the query vector, sorting the material hash codes in the candidate material set according to the cosine similarity from large to small, and screening the optimal material hash code from the sorted candidate material set as the target hash code.

In this embodiment, a hash index is constructed based on the retrieval database. When a retrieval request from a user is received, the query hash value of the request is acquired, and the Hamming distance between the query hash value and each material hash code in the retrieval database is calculated one by one; the Hamming distance determines the degree of similarity between a material and the query. The materials corresponding to the material hash codes whose Hamming distance exceeds a preset threshold are taken as candidate materials to obtain a candidate material set. The candidate material set is then sorted by the cosine similarity between each material's multi-mode fusion feature and the query vector, and the material hash code corresponding to the multi-mode fusion feature with the highest cosine similarity is the target hash code. The Hamming distance measures the difference between two strings of equal length and is the number of positions at which the corresponding characters differ.

In this embodiment, hash retrieval quickly matches a candidate material set from the retrieval database and rapidly narrows the range of candidate materials. The cosine similarity between the multi-mode fusion feature of each candidate material and the query vector is then calculated for a finer similarity comparison, which improves retrieval accuracy.
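The two-stage search of steps S601 to S603 can be sketched as follows. The text speaks of hash codes whose Hamming distance exceeds a preset threshold; because a smaller Hamming distance indicates greater similarity, this sketch assumes that the candidate set keeps the codes whose distance is within the threshold. That reading, the `max_distance` parameter and the toy 4-bit codes are assumptions.

```python
import math

def hamming(a, b):
    """Number of differing bits between two integer-packed codes."""
    return bin(a ^ b).count("1")

def cosine(u, v):
    """Cosine similarity of two real-valued vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def retrieve(query_code, query_vec, database, max_distance):
    """Stage 1: Hamming filter on codes. Stage 2: cosine re-ranking on features."""
    candidates = [mid for mid, (code, _) in database.items()
                  if hamming(query_code, code) <= max_distance]
    return sorted(candidates,
                  key=lambda mid: cosine(query_vec, database[mid][1]),
                  reverse=True)

database = {
    "bolt_m8": (0b1011, [0.9, 0.1, 0.3]),
    "bolt_m10": (0b1010, [0.8, 0.3, 0.3]),
    "gasket": (0b0100, [-0.5, 0.9, -0.7]),
}
ranked = retrieve(0b1011, [0.9, 0.1, 0.3], database, max_distance=1)
target = ranked[0] if ranked else None
```

The cheap bitwise filter removes most of the database before any floating-point similarity is computed, which is where the speed-up described above would come from.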

As shown in fig. 7, in some alternative implementations of the present embodiment, after step S603, the material retrieval method disclosed in the present application further includes the following steps:

Step S701, auxiliary information of materials corresponding to each material hash code in the candidate material set is obtained, wherein the auxiliary information at least comprises material quality and provider credibility;

step S702, weighting and scoring the material hash codes in the sorted candidate material sets based on the auxiliary information;

Step S703, optimizing the sorting of the candidate material set according to the scores to obtain the comprehensively sorted candidate material set.

In this embodiment, in order to further improve retrieval accuracy, after the material hash codes in the candidate material set are sorted by cosine similarity, they may be weighted and scored in combination with auxiliary information about the materials, such as material quality and provider credibility, and the sorting of the candidate material set is optimized according to the final scores. Optimizing the sorting with the auxiliary information helps ensure the accuracy and practicability of the retrieval result.
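The weighting scheme is not specified in the text. A minimal sketch, assuming a linear combination of cosine similarity, material quality and provider credibility, each already scaled to the range 0 to 1, might look like this. The weights (0.6, 0.2, 0.2), the field names and the candidate values are hypothetical.

```python
def rerank(candidates, weights=(0.6, 0.2, 0.2)):
    """Combine similarity, material quality and provider credibility into one score."""
    w_sim, w_quality, w_cred = weights
    scored = [
        (w_sim * c["similarity"] + w_quality * c["quality"] + w_cred * c["credibility"],
         c["id"])
        for c in candidates
    ]
    # Highest combined score first.
    return [mid for _, mid in sorted(scored, reverse=True)]

candidates = [
    {"id": "bolt_a", "similarity": 0.95, "quality": 0.4, "credibility": 0.5},
    {"id": "bolt_b", "similarity": 0.90, "quality": 0.9, "credibility": 0.9},
    {"id": "bolt_c", "similarity": 0.70, "quality": 0.8, "credibility": 0.6},
]
final_order = rerank(candidates)
```

In this toy data the second candidate overtakes the first because its higher quality and credibility outweigh a slightly lower similarity.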

As shown in fig. 8, in some alternative implementations of the present embodiment, after step S205, the method includes:

step S801, recording a search result corresponding to the query vector, and generating a query log;

Step S802, updating the retrieval database in real time by incremental learning according to the query log.

In this embodiment, in order to continuously optimize the index and the query strategy and to improve the performance and user experience of material retrieval, the retrieval result corresponding to the query vector is recorded to generate a query log, and the retrieval database is updated in real time by incremental learning, which keeps the retrieval results timely and accurate.
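What the incremental learning entails is not specified in the text. The sketch below illustrates only the bookkeeping side under stated assumptions: each served query is appended to a log, and the index accepts single-entry inserts so that it can be updated without a full rebuild. The class `RetrievalDatabase` and its method names are hypothetical.

```python
import time

class RetrievalDatabase:
    """Hash-code index that supports single-entry updates and query logging."""

    def __init__(self):
        self.codes = {}      # material_id -> integer hash code
        self.query_log = []  # one record per served query

    def add_material(self, material_id, code):
        """Incremental update: insert or overwrite one entry, no full rebuild."""
        self.codes[material_id] = code

    def log_query(self, query_code, result_id):
        """Record which material was returned for which query hash value."""
        self.query_log.append(
            {"time": time.time(), "query": query_code, "result": result_id})

    def hit_counts(self):
        """Aggregate the log, e.g. to decide which materials to refresh first."""
        counts = {}
        for rec in self.query_log:
            counts[rec["result"]] = counts.get(rec["result"], 0) + 1
        return counts

db = RetrievalDatabase()
db.add_material("bolt_m8", 0b1011)
db.log_query(0b1011, "bolt_m8")
db.log_query(0b1010, "bolt_m8")
db.add_material("gasket", 0b0100)  # new material added while the system is serving
```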

Further, during material retrieval, real-time feedback from the user is acquired, and an online learning algorithm is used to optimize and update the user intention understanding model and the similarity calculation model in real time. The user's retrieval request is processed with the optimized user intention understanding model and similarity calculation model: the cosine similarity between the multi-mode fusion features corresponding to the material hash codes and the query vector is calculated, and the materials with high cosine similarity are returned to the user as retrieval results. Feedback on the retrieval results is continuously collected and analyzed to obtain the user's satisfaction evaluation of the retrieval results, and the retrieval algorithm is further optimized according to this evaluation, so that the performance and user experience of the material retrieval system are continuously improved.

The embodiment of the application can acquire and process the related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technique, and application system that uses a digital computer or a digital computer-controlled machine to simulate, extend, and expand human intelligence, sense the environment, acquire knowledge, and use knowledge to obtain optimal results.

Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.

Those skilled in the art will appreciate that all or part of the above described methods may be implemented by computer readable instructions stored in a computer readable storage medium, and that the instructions, when executed, may comprise the steps of the embodiments of the methods described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a read-only memory (ROM), or may be a random access memory (RAM).

It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited in order and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include a plurality of sub-steps or stages. These sub-steps or stages are not necessarily performed at the same time or in sequence; they may be performed at different times, and in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.

With further reference to fig. 9, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a material retrieval apparatus, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.

As shown in fig. 9, the material retrieval device 900 in this embodiment includes a feature extraction module 901, a feature fusion module 902, a construction module 903, a calculation module 904, and a matching module 905. Wherein:

the feature extraction module 901 is configured to obtain a material sample set, and perform multi-mode feature extraction on each material picture in the material sample set to obtain text features and image features;

the feature fusion module 902 is configured to splice the text features and the image features of each material picture to obtain a plurality of multi-mode fusion features;

the construction module 903 is configured to perform hash coding on the multi-mode fusion features to obtain material hash codes of all the multi-mode fusion features, and construct a retrieval database based on all the material hash codes;

the calculation module 904 is configured to obtain a multi-mode fusion feature of an image to be detected to obtain a query vector, and calculate a hash value of the query vector to obtain a query hash value;

and the matching module 905 is configured to match, from the retrieval database, the material hash code most similar to the query hash value to obtain a target hash code, and use the material corresponding to the target hash code as the retrieval result.
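The division of labour among the five modules can be illustrated with a minimal Python sketch. It assumes that the text features and image features are already available as plain lists of floats, and it uses sign-of-random-projection hashing purely as a stand-in, since this passage does not fix a particular hash function. All function names are hypothetical.

```python
import random

def fuse(text_feat, image_feat):
    # Feature fusion module: splice (concatenate) the two feature vectors.
    return list(text_feat) + list(image_feat)

def make_projections(dim, n_bits, seed=0):
    # Assumed hashing scheme: random hyperplanes (not specified by the source).
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def hash_code(vec, projections):
    # One bit per hyperplane: 1 if the projection is non-negative, else 0.
    return tuple(1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
                 for plane in projections)

def hamming(a, b):
    # Number of bit positions in which two codes differ.
    return sum(x != y for x, y in zip(a, b))

def build_database(samples, projections):
    # Construction module. samples: {material_id: (text_feat, image_feat)}
    return {mid: hash_code(fuse(t, i), projections)
            for mid, (t, i) in samples.items()}

def retrieve(query_text, query_image, database, projections):
    # Calculation module: hash the query vector.
    query_code = hash_code(fuse(query_text, query_image), projections)
    # Matching module: return the material whose code is closest to the query code.
    return min(database, key=lambda mid: hamming(database[mid], query_code))
```

In this sketch the same projections must be used for building the database and for hashing the query, otherwise the codes are not comparable.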

In some optional implementations of the present embodiment, the feature extraction module 901 includes:

the identification unit is configured to perform character recognition on the material picture to obtain text data;

the preprocessing unit is configured to preprocess the text data to obtain a processed text containing a plurality of word segments;

the text feature extraction unit is configured to extract features from the processed text by using a preset word embedding model to obtain the text features;

the enhancement unit is configured to perform image enhancement on each material picture in the material sample set to obtain an enhanced sample set;

the image feature extraction unit is configured to perform multi-dimensional feature extraction on the material pictures in the enhanced sample set by using a pre-trained convolutional neural network to obtain multi-dimensional image features;

and the multi-dimensional feature fusion unit is configured to fuse the image features of all dimensions by weighted fusion to obtain the image features.
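As an illustration of the weighted fusion performed by the last unit, the following sketch combines several image feature vectors as a normalised weighted sum. It assumes that the feature vectors of the different dimensions have already been brought to a common length and that the weights are supplied by the caller; neither detail is stated in this passage, and the function name is hypothetical.

```python
def weighted_fusion(features, weights):
    """Combine equal-length feature vectors as a normalised weighted sum."""
    if not features or len(features) != len(weights):
        raise ValueError("need one weight per feature vector")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("feature vectors must share one length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalise the weights so that they sum to 1.
    norm = [w / total for w in weights]
    return [sum(w * f[i] for w, f in zip(norm, features)) for i in range(dim)]
```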

In some alternative implementations of the present embodiment, the feature fusion module 902 includes:

the calling unit is configured to call a preset multi-mode feature fusion network, wherein the multi-mode feature fusion network comprises an attention mechanism module and a fully connected module;

the attention feature extraction unit is configured to input the text features and the image features into the multi-mode feature fusion network, and extract attention features of different dimensions from the text features and the image features respectively, based on the preset attention mechanism module, to obtain text attention features and image attention features of multiple dimensions;

and the connecting unit is configured to input the text attention features and the image attention features of multiple dimensions into the fully connected module to obtain the multi-mode fusion features.
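The data flow through the attention mechanism module and the fully connected module can be sketched as follows. This is a much simplified stand-in rather than the network of the application: the attention step weights each component by a softmax score of the feature itself, the fully connected step is a single dense layer with a ReLU activation, and the layer weights are passed in by the caller instead of being trained. All of these choices and names are assumptions.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_feature(feat):
    # Simplified attention: scale each component by its own softmax weight.
    return [w * x for w, x in zip(softmax(feat), feat)]

def fully_connected(vec, weight_rows, bias):
    # One dense layer followed by ReLU (the activation is an assumption).
    return [max(0.0, sum(w * v for w, v in zip(row, vec)) + b)
            for row, b in zip(weight_rows, bias)]

def fuse_multimodal(text_feat, image_feat, weight_rows, bias):
    # Attention features of both modalities are concatenated, then projected.
    joined = attention_feature(text_feat) + attention_feature(image_feat)
    return fully_connected(joined, weight_rows, bias)
```

Each row of `weight_rows` must have as many entries as the combined length of the text and image features; the length of the output equals the number of rows.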

In some alternative implementations of the present embodiment, the matching module 905 includes:

the distance calculation unit is configured to calculate the Hamming distance between the query hash value and each material hash code in the retrieval database;

the statistics unit is configured to count all the material hash codes with Hamming distances exceeding a preset threshold value to obtain a candidate material set;

and the sorting unit is configured to calculate the cosine similarity between the query vector and the multi-mode fusion feature corresponding to each material hash code in the candidate material set, sort the material hash codes in the candidate material set in descending order of cosine similarity, and select the optimal material hash code from the sorted candidate material set as the target hash code.
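The two-stage matching carried out by these units, a coarse Hamming filter followed by a cosine re-ranking, can be sketched as below. The sketch keeps the codes whose Hamming distance does not exceed the threshold, on the assumption that a smaller distance indicates a closer match; the direction of the comparison and all names are assumptions of this illustration and should be adapted to how the threshold is actually defined.

```python
import math

def hamming(a, b):
    # Number of bit positions in which two codes differ.
    return sum(x != y for x, y in zip(a, b))

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0

def match(query_code, query_vec, database, max_distance):
    """database: {material_id: (hash_code, fused_feature)}."""
    # Stage 1 (assumed direction): keep codes within max_distance of the query.
    candidates = [mid for mid, (code, _) in database.items()
                  if hamming(code, query_code) <= max_distance]
    # Stage 2: sort the candidates by cosine similarity, largest first.
    return sorted(candidates,
                  key=lambda mid: cosine(database[mid][1], query_vec),
                  reverse=True)
```

The first element of the returned list, if any, plays the role of the target hash code's material; an empty list means no code passed the coarse filter.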

In some optional implementations of this embodiment, the material retrieval device 900 further includes:

the acquisition unit is configured to acquire auxiliary information of the materials corresponding to the material hash codes in the candidate material set, wherein the auxiliary information at least comprises material quality and supplier credibility;

the scoring unit is configured to perform weighted scoring on the material hash codes in the sorted candidate material set based on the auxiliary information;

the score sorting unit is configured to optimize the sorting of the candidate material set according to the scores to obtain the re-sorted candidate material set;

the recording module is configured to record the retrieval results corresponding to the query vectors and generate a query log;

and the updating module is configured to update the retrieval database in real time according to the query log by means of incremental learning.
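A minimal sketch of the weighted scoring and of the query log follows. The weights, the assumption that every value lies in the range 0 to 1, and the record layout of the log are all hypothetical; the text only states that material quality and supplier credibility are taken into account.

```python
def rescore(candidates, w_sim=0.6, w_quality=0.25, w_trust=0.15):
    """Re-rank candidates by a weighted sum of similarity and auxiliary information.

    candidates: list of dicts with keys 'id', 'similarity', 'quality', 'trust',
    each value assumed to lie in [0, 1]. The weights are illustrative only.
    """
    def score(c):
        return (w_sim * c["similarity"]
                + w_quality * c["quality"]
                + w_trust * c["trust"])
    return sorted(candidates, key=score, reverse=True)

def log_query(query_log, query_id, ranked):
    # Recording module: append one record per query to the in-memory log.
    query_log.append({"query": query_id, "results": [c["id"] for c in ranked]})
    return query_log
```

A material with slightly lower similarity can therefore overtake a closer match if its quality and supplier credibility are clearly better.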

In order to solve the technical problems, the embodiment of the application also provides computer equipment. Referring specifically to fig. 10, fig. 10 is a basic structural block diagram of a computer device according to the present embodiment.

The computer device 10 includes a memory 1001, a processor 1002, and a network interface 1003 that are communicatively connected to each other through a system bus. It is noted that only a computer device 10 having the memory 1001, the processor 1002, and the network interface 1003 is shown, but it should be understood that not all of the illustrated components are required, and that more or fewer components may be implemented instead. It will be appreciated by those skilled in the art that the computer device herein is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.

The computer equipment can be a desktop computer, a notebook computer, a palm computer, a cloud server and other computing equipment. The computer equipment can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.

The memory 1001 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card memory (e.g., SD or DX memory, etc.), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Programmable Read-Only Memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 1001 may be an internal storage unit of the computer device 10, such as a hard disk or an internal memory of the computer device 10. In other embodiments, the memory 1001 may also be an external storage device of the computer device 10, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the computer device 10. Of course, the memory 1001 may also include both an internal storage unit of the computer device 10 and an external storage device thereof. In this embodiment, the memory 1001 is typically used to store the operating system and the various application software installed on the computer device 10, such as the computer readable instructions of the material retrieval method. In addition, the memory 1001 may be used to temporarily store various types of data that have been output or are to be output.

In some embodiments, the processor 1002 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 1002 is generally used to control the overall operation of the computer device 10. In this embodiment, the processor 1002 is configured to execute the computer readable instructions stored in the memory 1001 or to process data, for example to execute the computer readable instructions of the material retrieval method.

The network interface 1003 may include a wireless network interface or a wired network interface, which network interface 1003 is typically used to establish communications connections between the computer device 10 and other electronic devices.

The present application also provides another embodiment, namely, a computer-readable storage medium storing computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of a material retrieval method as described above.

From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present application.

It is apparent that the above-described embodiments are only some, not all, of the embodiments of the present application. The drawings show preferred embodiments of the application but do not limit its patent scope. The application may be embodied in many different forms; these embodiments are provided so that the disclosure will be understood thoroughly and completely. Although the application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the technical solutions described in the foregoing embodiments, or equivalents may be substituted for some of their technical features. All equivalent structures made using the content of the specification and drawings of the application, whether applied directly or indirectly in other related technical fields, likewise fall within the scope of protection of the application.

Software tools or components from outside the applicant's company that appear in the embodiments of the present application are presented by way of example only and do not represent actual use.

Claims

1. A method of retrieving a material, comprising the steps of:

2. The method for retrieving materials according to claim 1, wherein the performing multi-modal feature extraction on each material picture in the material sample set to obtain text features includes:

Performing text recognition on the material picture to obtain text data;

3. The material retrieval method as recited in claim 1, wherein the image features are obtained by:

4. The method for retrieving materials according to claim 1, wherein the stitching the text feature and the image feature of each material picture to obtain a plurality of multi-modal fusion features includes:

5. The material retrieval method as recited in claim 1, wherein the matching the material hash code from the retrieval database that is most similar to the query hash value, to obtain a target hash code, includes:

6. The material retrieval method as recited in claim 5, wherein after sorting the material hash codes in the candidate set of materials from large to small in the cosine similarity, the method further comprises:

7. The material retrieval method according to claim 1, wherein after the material corresponding to the target hash code is used as the retrieval result, the method comprises:

8. A material retrieval device, comprising:

9. A computer device comprising a memory and a processor, the memory having stored therein computer readable instructions which when executed by the processor implement the steps of the material retrieval method of any one of claims 1 to 7.

10. A computer readable storage medium having stored thereon computer readable instructions which when executed by a processor perform the steps of the material retrieval method according to any of claims 1 to 7.