CN102262670A - Cross-media information retrieval system and method based on mobile visual equipment

Info

Publication number: CN102262670A
Application number: CN2011102151426A
Authority: CN
Inventors: 吴仁涛; 王若梅; 孟思明
Original assignee: Sun Yat Sen University
Current assignee: Sun Yat Sen University
Priority date: 2011-07-29
Filing date: 2011-07-29
Publication date: 2011-11-30

Abstract

The invention discloses a cross-media information retrieval system based on a mobile visual device, comprising: an image input module for inputting an image into the mobile visual device; a text input module for inputting text information related to the image; a text-based Internet image retrieval module for retrieving the images associated with the input text information and building a text-related image set; a content-based image retrieval module for searching the text-related image set for images that match the input image and building a visually related image set; and a retrieval processing module for extracting keywords from the web pages that contain the images in the final visually related image set, then extracting related online content and displaying the retrieval results. The invention combines text and image retrieval to achieve accurate and fast cross-media, visual retrieval. The invention also discloses a cross-media information retrieval method based on the mobile visual device.

Description

A mobile visual device-based cross-media information retrieval system and method

Technical field

The invention relates to the field of information retrieval, and in particular to a cross-media information retrieval system and method based on a mobile visual device.

Background

As retrieval technology has grown in popularity, search has become an industry with strong commercial prospects. For example, search engine providers earn revenue by attaching paid advertisements to the results they return, which funds research into newer technologies and newer services that attract more users and more advertising revenue. This competition has now moved into the wireless domain, covering traditional information retrieval, local retrieval of nearby shops and goods, and even retrieval of background images.

Mobile retrieval based on mobile visual devices is a new research topic driven by practical application needs. As living standards rise and science and technology advance, mobile visual devices such as mobile phones and notebook computers are widely used in daily life, and the mobile retrieval ecosystem that is gradually taking shape will change the way people find and buy everyday goods, mobile content, and local information services.

Traditional information retrieval is mainly text-oriented. It uses text retrieval techniques, typically a query made up of a set of keywords or phrases, to locate relevant text documents in a database. A document that contains more text related to the query terms is considered more relevant than documents that contain less. Users who want to exploit multimedia data resources are generally required to have some background knowledge before they can submit queries that meet the requirements of an IR (Information Retrieval) system or that the system can understand. In practice, however, many users are not familiar enough with the semantics of certain concepts to state their query intent clearly. If the IR system allowed users to describe their query intent with several kinds of media, more results matching what they want would be retrieved.

For multimedia collections such as images and videos, the vast majority of retrieval systems still rely on text retrieval. For example, Google's image and video search is still based on text keywords, which may come from the text surrounding a picture, from file names and the like, and in a few cases from manual annotation. In recent years, therefore, many researchers have tried to develop content-based multimedia query techniques to make up for these shortcomings. Many researchers at home and abroad are actively studying content-based multimedia information retrieval, including content processing and analysis (parsing), automatic annotation, indexing, and similarity retrieval for images, video, audio, and other multimedia information. In practice, however, many users are not familiar enough with the semantics of certain concepts to state their query intent clearly; if the IR system allowed users to describe their query intent with several kinds of media, more results matching what they want would be retrieved. Existing systems cannot take a user query expressed in one or more media and search and match it against media information expressed in other media types. Retrieving from a single medium alone greatly reduces retrieval accuracy. The failure to achieve cross-media information retrieval is the main problem that current information retrieval technology needs to solve.

Summary of the invention

The purpose of the present invention is to provide a cross-media information retrieval system and method based on a mobile visual device that combines text and image retrieval to achieve cross-media information retrieval and thereby obtain more accurate retrieval results.

To achieve the above purpose, the present invention provides a cross-media information retrieval system based on a mobile visual device, comprising an image input module, a text input module, a text-based Internet image retrieval module, a content-based image retrieval module, and a retrieval processing module. The image input module is used to input an image into the mobile visual device; the text input module is used to input text information related to the image; the text-based Internet image retrieval module is used to retrieve the images associated with the input text information and build a text-related image set; the content-based image retrieval module is used to search the text-related image set for images that match the input image and build a visually related image set; and the retrieval processing module is used to extract keywords from the web pages that contain the images in the final visually related image set, then extract related online content and display the retrieval results.

Preferably, the mobile visual device has a built-in camera, and the image is captured by the camera and passed to the image input module.

Preferably, the mobile visual device is a mobile phone.

Preferably, the system further includes a query expansion module, which expands the input text information and generates a text set that combines it with other words synonymous with it; the text-based Internet image retrieval module uses this text set to retrieve associated images.

Correspondingly, the present invention also provides a cross-media information retrieval method based on a mobile visual device, comprising the following steps: Step 1: acquire an image; Step 2: input text information associated with the image; Step 3: text-based Internet image retrieval, that is, search Internet databases using the input text information as the index; Step 4: extract and generate a text-related image set; Step 5: content-based image retrieval, that is, search the text-related image set for images that match the acquired image; Step 6: extract and generate a visually related image set; Step 7: extract keywords from the web pages that contain the images; Step 8: extract online content; Step 9: display the retrieval results.

Preferably, a query expansion step precedes the text-based Internet image retrieval step. The query expansion step expands the input text information and generates a text set that combines it with other words synonymous with it, and the text-based Internet image retrieval step uses this text set to retrieve associated images.

The cross-media information retrieval system and method of the present invention use a content-based image retrieval method to measure the similarity between the query picture and the images in the text-related image set, so as to find pictures that are both textually related and visually similar. The descriptors commonly used for general images are features such as color, texture, and shape. Texture features alone are not sufficient to distinguish images captured on mobile devices. Most earlier information retrieval work on mobile devices used only content-based image retrieval to query similar images in order to mine deeper information. In the present system, in view of the computational cost and the accuracy of searching over a large number of images, the text information related to the input image is first used to obtain a text-related image set through text-based Internet image retrieval, and the content-based image matching task is then carried out only on this small text-related image set. This multimodal, cross-media input scheme achieves more accurate and faster information retrieval.

The beneficial effects of the present invention are mainly as follows:

First, the query input of the system can be multimodal, for example an image accompanied by a few hint words;

Second, the system uses a dynamic text-related image set rather than a specific database;

Third, the system targets broader applications and large-scale data, so besides accuracy, improved search efficiency is also an important factor in the system's effectiveness;

Fourth, traditional content-based image retrieval (CBIR) methods generally perform poorly on large volumes of data, whereas the hybrid image matching method adopted in this system yields much higher accuracy.

Brief description of the drawings

To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing them are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic diagram of the system structure of the present invention;

Fig. 2 is a flowchart of the method of the present invention;

Fig. 3 is a diagram of one embodiment of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.

By combining text-based Internet retrieval with content-based image retrieval, the invention applies cross-media retrieval technology to mobile retrieval. It proposes a visual retrieval method suited to large volumes of data as a mobile retrieval scheme for mobile devices, supports multimodal queries from multiple users, and provides intuitive and convenient cross-media, visual retrieval on mobile devices.

Referring to Fig. 1, the cross-media information retrieval system based on a mobile visual device provided by the present invention comprises an image input module 100, a text input module 200, a text-based Internet image retrieval module 400, a content-based image retrieval module 500, and a retrieval processing module 600. The image input module 100 is used to input an image into the mobile visual device; the text input module 200 is used to input text information related to the image; the text-based Internet image retrieval module 400 is used to retrieve the images associated with the input text information and build a text-related image set; the content-based image retrieval module 500 is used to search the text-related image set for images that match the input image and build a visually related image set; and the retrieval processing module 600 is used to extract keywords from the web pages that contain the images in the final visually related image set, then extract related online content and display the retrieval results. Preferably, the system further includes a query expansion module 300, which expands the input text information and generates a text set that combines it with other words synonymous with it; the text-based Internet image retrieval module 400 uses this text set to retrieve associated images.

Preferably, the mobile visual device is a mobile phone. Preferably, the mobile visual device has a built-in camera, and the image is captured by the camera and passed to the image input module 100. It can be understood that the image can also be delivered to the mobile phone by e-mail or MMS, and that the mobile visual device can also be a notebook computer or another similar smart mobile device. A common problem in cross-media information retrieval on mobile visual devices is the difficulty of obtaining a query image; capturing the image with a phone's built-in camera is practical and easy to operate, and widens the range of what can be retrieved. Some earlier work also used the cameras built into phones to find images with similar visual features, but most of it relied on a limited, fixed database and supported image-only queries. Although more Internet images can be added through text-based image search engines, retrieval in that work still uses a specific initial image library of very limited size, which is unsuitable for general search over large-scale data. The present system obtains a text-related image set through text-based Internet retrieval and then performs content-based image retrieval within that limited set, improving both retrieval speed and accuracy.

Referring to Fig. 2, the present invention correspondingly provides a cross-media information retrieval method based on a mobile visual device, comprising the following steps: Step 1: acquire an image; Step 2: input text information associated with the image; Step 3: text-based Internet image retrieval, that is, search Internet databases using the input text information as the index; Step 4: extract and generate a text-related image set; Step 5: content-based image retrieval, that is, search the text-related image set for images that match the acquired image; Step 6: extract and generate a visually related image set; Step 7: extract keywords from the web pages that contain the images; Step 8: extract online content; Step 9: display the retrieval results.
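
The nine steps can be read as a simple pipeline. The following Python sketch is only an illustration under stated assumptions: the `WebImage` record, the function name `cross_media_search`, and the four callables passed in (text search, visual similarity, keyword extraction, content fetching) are hypothetical placeholders that do not appear in the patent. Steps 1, 2, and 9 (capture, text entry, display) are left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class WebImage:
    """Hypothetical record for an image found on the web."""
    url: str
    page_text: str               # text of the page that hosts the image
    features: Sequence[float]    # precomputed visual feature vector


def cross_media_search(
    query_features: Sequence[float],
    query_words: List[str],
    text_search: Callable[[List[str]], List[WebImage]],
    similarity: Callable[[Sequence[float], Sequence[float]], float],
    extract_keywords: Callable[[str], List[str]],
    fetch_content: Callable[[List[str]], str],
    top_k: int = 3,
) -> str:
    # Steps 3-4: text-based retrieval builds the text-related image set.
    text_set = text_search(query_words)
    # Steps 5-6: content-based matching runs only on that small set.
    ranked = sorted(
        text_set,
        key=lambda im: similarity(query_features, im.features),
        reverse=True,
    )
    visual_set = ranked[:top_k]
    # Step 7: keywords from the pages hosting the visually related images.
    keywords: List[str] = []
    for im in visual_set:
        for kw in extract_keywords(im.page_text):
            if kw not in keywords:
                keywords.append(kw)
    # Step 8: fetch online content for those keywords.
    return fetch_content(keywords)
```

The point the sketch makes is structural: the comparatively expensive visual comparison is applied only to the output of the text stage, never to the whole web.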

Referring also to Fig. 3, the cross-media information retrieval system of the present invention supports multimodal, cross-media input: the query input can be a captured image together with hint words. First, explanatory words related to the image are entered, and a text-based image search engine retrieves a text-related image set. A content-based image retrieval method then finds the visually most similar images in that set. Key phrases are extracted from the web pages that contain the images finally found, global key phrases are distilled from them, and these are used to obtain related information from an online encyclopedia or other specialized search engines.
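
The patent does not say how global key phrases are chosen from the per-page key phrases. One plausible reading, shown here purely as an assumption, is to keep the phrases that recur across several of the matched pages. The function name `global_key_phrases` and the `min_pages` threshold are hypothetical.

```python
from collections import Counter
from typing import List


def global_key_phrases(per_page: List[List[str]], min_pages: int = 2) -> List[str]:
    """Keep phrases that occur on at least `min_pages` of the matched pages."""
    counts: Counter = Counter()
    for phrases in per_page:
        # Count each phrase once per page, ignoring case.
        counts.update({p.lower() for p in phrases})
    # Most widely shared phrases first; ties broken alphabetically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [phrase for phrase, n in ranked if n >= min_pages]
```

The resulting phrases would then serve as the query sent to the online encyclopedia or specialized search engine.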

When submitting a text-based query, users often find it hard to give a complete description of the information they are looking for. On mobile devices in particular, users can usually express what they want in only one or two words. Such poorly defined query terms provide only a vague description, so the lack of information leads to poor search results. To solve this problem and uncover information beyond the few query words, the present invention expands the input query words to obtain a series of new query terms. The expansion is based essentially on the semantic relatedness of words, and the basic query expansion procedure relies on a large general-purpose lexical system that covers a large number of English nouns, verbs, adjectives, and adverbs. These words are organized into groups of synonyms called synonym sets (synsets), and the synsets are in turn linked according to the lexical relations defined on them. If the initial input contains words from a predefined list of stop words that carry no real meaning, such as the preposition "of", the article "the", and the pronoun "one", these words are deleted first. The remaining words are converted algorithmically and represented by their stems. According to the word relations, the words listed in the union of the synonym and hyponym sets are added to the newly generated text set. Besides this lexicon-based method, knowledge-based query expansion can also be used, for example supplementing words related to a specific scenario by means of domain-specific knowledge bases and artificial intelligence methods.
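
The expansion procedure (drop stop words, reduce to stems, add synonyms and hyponyms) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the stop-word list, the `RELATED` table, and the one-rule `stem` function are toy stand-ins invented here for the large lexical system and the stemming algorithm that the text mentions without naming.

```python
from typing import Dict, List, Set

# Toy data, invented for illustration only.
STOP_WORDS: Set[str] = {"of", "the", "one", "a", "an"}
RELATED: Dict[str, List[str]] = {
    "tower": ["steeple", "spire"],    # stand-in synonyms/hyponyms
    "bridge": ["span", "viaduct"],
}


def stem(word: str) -> str:
    """Very naive stemmer: strips a plural 's' (stand-in for a real stemmer)."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def expand_query(words: List[str]) -> List[str]:
    """Return the expanded text set, preserving first-seen order."""
    expanded: List[str] = []
    for raw in words:
        word = raw.lower()
        if word in STOP_WORDS:
            continue                      # stop words are deleted first
        base = stem(word)                 # remaining words become stems
        for term in [base] + RELATED.get(base, []):
            if term not in expanded:      # union of stem and related words
                expanded.append(term)
    return expanded
```

For example, `expand_query(["the", "Towers", "of", "London"])` yields the stem `tower`, its two toy related words, and `london`.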

The expanded text set is treated as a text feature vector, so that during text-based Internet image retrieval the feature vector can be matched through a word index. In other words, all the words in the expanded set are submitted to the retrieval system joined by OR. Images whose text is more relevant to the query text set are given a higher rank. According to this ranking, a certain number of images are selected to form the text-related image set, within which the search for visually related images then continues.
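
A minimal sketch of the OR-style matching and ranking, assuming each candidate image comes with some associated text. The scoring rule used here (number of distinct query terms that appear in the image's text) is an assumption chosen for simplicity; the patent does not specify the relevance measure. The name `rank_by_text` is hypothetical.

```python
from typing import Dict, List, Tuple


def rank_by_text(
    expanded_terms: List[str],
    image_texts: Dict[str, str],
    limit: int,
) -> List[Tuple[str, int]]:
    """Rank images by how many expanded query terms their text contains."""
    terms = {t.lower() for t in expanded_terms}
    scored: List[Tuple[str, int]] = []
    for image_id, text in image_texts.items():
        tokens = set(text.lower().split())
        score = len(terms & tokens)
        if score > 0:                 # OR semantics: one matching term is enough
            scored.append((image_id, score))
    # Higher score first; ties broken by image id for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:limit]             # the text-related image set
```

The `limit` parameter plays the role of the "certain number of images" that form the text-related image set.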

Content-based image retrieval methods still suffer from a large gap between low-level image features and high-level semantic concepts, so the correct image may not appear near the top of the result list. In addition, because of the low bandwidth and small screens of mobile devices, it is impractical to present mobile users with many result images. To overcome these difficulties and uncover useful information hidden in web content, the system extracts keywords from the web pages associated with each image, stores the keywords together with the image, and indexes them at the same time (Index II in Fig. 2). The index contains the image's identification code (ID), visual features, picture content, and page number. Keywords are extracted offline by analyzing the statistical and structural properties of the words in the web page. Before keywords are selected, the textual context of an image is extracted according to the structural layout given by the HTML DOM (Document Object Model) tree: the content between the obvious separators around the image is taken as its textual context. All the surrounding text segments obtained in this way are treated as belonging to a single article, and the keywords of the image are then extracted from that article.
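
The context extraction can be approximated with Python's standard `html.parser`. The sketch below is an assumption-laden simplification: the set of tags treated as "obvious separators", the stop-word list, and plain term frequency as the keyword statistic are all choices made here for illustration, and the class and function names are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser
from typing import List

SEPARATORS = {"hr", "div", "table", "td"}   # assumed "obvious separators"
STOP_WORDS = {"the", "a", "of", "in", "and", "is"}


class ImageContextParser(HTMLParser):
    """Collects the text segments that surround a given <img> source."""

    def __init__(self, image_src: str) -> None:
        super().__init__()
        self.image_src = image_src
        self.current: List[str] = []    # text of the segment being read
        self.has_image = False          # does this segment hold the image?
        self.contexts: List[str] = []   # segments kept as image context

    def _close_segment(self) -> None:
        if self.has_image:
            self.contexts.append(" ".join(self.current))
        self.current, self.has_image = [], False

    def handle_starttag(self, tag, attrs):
        if tag in SEPARATORS:
            self._close_segment()
        elif tag == "img" and dict(attrs).get("src") == self.image_src:
            self.has_image = True

    def handle_endtag(self, tag):
        if tag in SEPARATORS:
            self._close_segment()

    def handle_data(self, data):
        if data.strip():
            self.current.append(data.strip())

    def close(self):
        super().close()
        self._close_segment()           # flush the last open segment


def image_keywords(html: str, image_src: str, count: int = 3) -> List[str]:
    """Most frequent non-stop words in the text around the image."""
    parser = ImageContextParser(image_src)
    parser.feed(html)
    parser.close()
    article = " ".join(parser.contexts).lower()   # segments form one "article"
    words = [w.strip(".,;:!?") for w in article.split()]
    freq = Counter(w for w in words if w and w not in STOP_WORDS)
    return [w for w, _ in freq.most_common(count)]
```

Text in segments that do not contain the image, such as navigation menus or footers in their own blocks, is discarded before counting.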

The cross-media information retrieval system of the present invention uses a content-based image retrieval method to measure the similarity between the query picture and the images in the text-related image set, so as to find pictures that are both textually related and visually similar. The descriptors commonly used for general images are features such as color, texture, and shape. Texture features alone are not sufficient to distinguish images captured on mobile devices. Most earlier information retrieval work on mobile devices used only content-based image retrieval to query similar images in order to mine deeper information. In the present system, in view of the computational cost and the accuracy of searching over a large number of images, the text information related to the input image is first used to obtain a text-related image set through text-based Internet image retrieval, and the content-based image matching task is then carried out only on this small text-related image set. This multimodal, cross-media input scheme achieves more accurate and faster information retrieval.
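
As one concrete instance of a global color descriptor, the sketch below compares images by a normalized RGB histogram and histogram intersection. The patent names color, texture, and shape as typical features but does not fix a descriptor or similarity measure, so this choice, the bin count, and all function names are assumptions. Images are represented as plain lists of `(R, G, B)` tuples to keep the example dependency-free.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each in 0..255


def color_histogram(pixels: Sequence[Pixel], bins: int = 4) -> List[float]:
    """Normalized joint RGB histogram with bins**3 cells."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        # Clamp so that bin counts which do not divide 256 stay in range.
        ri = min(r // step, bins - 1)
        gi = min(g // step, bins - 1)
        bi = min(b // step, bins - 1)
        hist[ri * bins * bins + gi * bins + bi] += 1.0
    total = float(len(pixels)) or 1.0
    return [h / total for h in hist]


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity in [0, 1]; 1.0 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def most_similar(
    query: Sequence[Pixel],
    candidates: Dict[str, Sequence[Pixel]],
    top_k: int = 1,
) -> List[str]:
    """Names of the top_k candidates most similar in color to the query."""
    qh = color_histogram(query)
    scores = {
        name: histogram_intersection(qh, color_histogram(px))
        for name, px in candidates.items()
    }
    return sorted(scores, key=lambda n: (-scores[n], n))[:top_k]
```

Because `candidates` would hold only the text-related image set, the per-query cost grows with the size of that set rather than with the size of the web.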

The purpose of the present invention is to provide a technical solution for a mobile search system that shows how cross-media retrieval technology can be applied to mobile search. For two common kinds of user information need, it proposes a set of algorithms for visual search over large volumes of data, yielding a mobile search scheme that supports multimodal, cross-media queries on mobile devices with built-in cameras. It should be noted that the technical solution described here concerns retrieval using the built-in camera of a mobile device and is built around text-based Internet image retrieval and content-based image retrieval. What is returned are images that are similar to the query image in their global features. The details of text-based search engines and content-based image retrieval are well known to those skilled in the art and are therefore not described further. For the first kind of information need, a finer-grained matching scheme is required to find images that contain the same salient object or scene as the query picture; such images are referred to here as identical images. Identifying the same object or scene through local descriptors gives satisfactory results in object recognition and duplicate image detection. A local feature detector detects feature points or regions, the features around each point are extracted, and they are represented by local descriptors. Local descriptors perform well on images altered by operations such as scaling, cropping, shearing, rotation, local darkening or brightening, and contrast changes. They are also invariant to a certain degree of change in viewpoint and illumination.
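
The patent does not describe how local descriptors are compared. A widely used approach, shown here only as an illustrative stand-in, is nearest-neighbor matching with a distance-ratio test: a descriptor counts as matched when its closest counterpart in the other image is clearly closer than the second closest. Descriptors are assumed to be already extracted as numeric vectors; the function name and the `ratio` value are hypothetical.

```python
import math
from typing import Sequence


def count_matches(
    desc_a: Sequence[Sequence[float]],
    desc_b: Sequence[Sequence[float]],
    ratio: float = 0.8,
) -> int:
    """Count descriptors in desc_a with an unambiguous nearest neighbor in desc_b."""
    matches = 0
    for d in desc_a:
        # Euclidean distances from d to every descriptor of the other image.
        dists = sorted(math.dist(d, e) for e in desc_b)
        # Accept only if the best match is clearly better than the runner-up.
        if len(dists) >= 2 and dists[0] < ratio * dists[1]:
            matches += 1
    return matches
```

A candidate image with many such matches is more likely to show the same object or scene as the query, which is the fine-grained criterion described above.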

The cross-media information retrieval system and method based on a mobile visual device provided by the embodiments of the present invention have been described in detail above. Specific examples have been used to explain the principle and implementation of the invention, and the description of the embodiments is intended only to help in understanding the method of the invention and its core idea. A person of ordinary skill in the art may, following the idea of the invention, make changes to the specific implementation and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims

1. A cross-media information retrieval system based on a mobile visual device, characterized by comprising:

an image input module, configured to input an image into the mobile visual device;

a text input module, configured to input text information related to the image;

a text-based Internet image retrieval module, configured to retrieve the images associated with the input text information and to build a text-related image set;

a content-based image retrieval module, configured to search the text-related image set for images matching the input image and to build a visually related image set;

a retrieval processing module, configured to extract keywords from the web pages that contain the images in the final visually related image set, then extract related online content and display the retrieval results.

2. The system according to claim 1, wherein the mobile visual device has a built-in camera, and the image is collected by the camera and transmitted to the image input module.

3. The system according to claim 1 or 2, wherein the mobile visual device is a mobile phone.

4. The system according to claim 1, further comprising a query expansion module, wherein the query expansion module is configured to expand the input text information and to generate a text set that combines it with other words synonymous with the text information, and the text-based Internet image retrieval module uses the text set to retrieve associated images.

5. A cross-media information retrieval method based on a mobile visual device, characterized by comprising the following steps:

Step 1: Get the image;

Step 2: input text information associated with the image;

Step 3: Internet image retrieval based on text, that is, searching in Internet databases with the input text information as an index;

Step 4: Extract and generate text-related image sets;

Step 5: Content-based image retrieval, retrieving images matching the image in the text-related image set;

Step 6: Extract and generate visually relevant image sets;

Step 7: Extract keywords in the webpage where the image is located;

Step 8: extract online content;

Step 9: Display the search results.

6. The method according to claim 5, wherein the mobile visual device has a built-in camera, and the image in step 1 is acquired through the camera.

7. The method according to claim 5 or 6, wherein the mobile visual device is a mobile phone.

8. The method according to claim 5, further comprising a query expansion step before the step of text-based Internet image retrieval, wherein the query expansion step expands the input text information and generates a text set that combines it with other words synonymous with the text information, and the text-based Internet image retrieval step uses the text set to retrieve associated images.