CN107657269A

CN107657269A - Method and apparatus for training a picture purification model

Info

Publication number: CN107657269A
Application number: CN201710737312.4A
Authority: CN
Inventors: 李广
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Baidu Online Network Technology Beijing Co Ltd; Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2017-08-24
Filing date: 2017-08-24
Publication date: 2018-02-02

Abstract

The object of the present invention is to provide a method and apparatus for training a picture purification model. Compared with the prior art, the present invention expands a set of existing pictures to obtain expanded pictures, clusters the expanded pictures to obtain corresponding clustering results, selects a predetermined number of pictures from at least one clustering result and presents them to the user as sample pictures, obtains the positive and negative samples that the user produces through related operations on the clustering results, and trains the corresponding picture purification model according to the positive and negative samples selected by the user. The picture purification model can then be used to purify picture quality, so that high-quality data is obtained at low cost. In contrast to earlier approaches that required manually labeling tens or hundreds of thousands of pictures, the user needs only a few minutes to label a small sample, can then start model training, and directly obtains a picture purification model for quality purification of massive picture collections. That model can then be used to mine more high-quality pictures from massive picture data.

Description

Method and apparatus for training a picture purification model

Technical field

The present invention relates to the technical field of image processing, and in particular to a technique for training a picture purification model.

Background

Quality purification of picture data is a critical step in obtaining training data. In deep learning especially, the vast majority of methods are data-driven, so the quality of the picture data directly affects the performance of the algorithm model. Obtaining high-quality training data is therefore an extremely important step in algorithm research.

At present, picture data is purified mainly either by algorithm-based automatic mining or by manual labeling. Algorithm-based mining is low-cost, but its results cannot be guaranteed. Manual labeling yields high quality but is costly and slow. Massive data sets often contain tens of millions or even hundreds of millions of pictures, and manual labeling cannot adequately meet the business needs of big data.

Therefore, how to provide an efficient and accurate method for training a picture purification model, so that the model can be used to purify picture quality, has become one of the problems that those skilled in the art urgently need to solve.

Summary of the invention

The object of the present invention is to provide a method and apparatus for training a picture purification model.

According to one aspect of the present invention, a method for training a picture purification model is provided, wherein the method comprises:

a. expanding a set of existing pictures to obtain expanded pictures;

b. clustering the expanded pictures to obtain corresponding clustering results;

c. selecting a predetermined number of pictures from at least one clustering result and presenting them to a user as sample pictures;

d. obtaining positive and negative samples that the user produces through related operations on the clustering results;

e. training a corresponding picture purification model according to the positive and negative samples selected by the user.
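The order in which steps a to e hand data to one another can be sketched as follows. This is only an illustrative outline: the function name `train_purification_pipeline`, the callables passed in, and the use of strings as picture identifiers are all assumptions made for the example and are not specified by the invention.

```python
from typing import Callable, List, Sequence, Tuple

Picture = str  # assumption: a picture is identified by a path or URL


def train_purification_pipeline(
    existing: Sequence[Picture],
    expand: Callable[[Sequence[Picture]], List[Picture]],
    cluster: Callable[[Sequence[Picture]], List[List[Picture]]],
    label: Callable[[List[List[Picture]]], Tuple[Sequence[int], Sequence[int]]],
    train: Callable[[List[Picture], List[Picture]], Callable[[Picture], bool]],
    samples_per_cluster: int = 6,
) -> Callable[[Picture], bool]:
    """Run steps a to e in order and return the trained model."""
    expanded = expand(existing)                               # step a
    clusters = cluster(expanded)                              # step b
    previews = [c[:samples_per_cluster] for c in clusters]    # step c
    # Step d: the user sees only the previews and answers with the indices
    # of the clusters to use as positive and as negative samples.
    positive_idx, negative_idx = label(previews)
    positives = [p for i in positive_idx for p in clusters[i]]
    negatives = [p for i in negative_idx for p in clusters[i]]
    return train(positives, negatives)                        # step e
```

Passing each step in as a callable keeps the outline independent of any particular search engine, clustering algorithm or model type, none of which is fixed by the method.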

Preferably, step a comprises:

obtaining, according to a keyword input by the user, pictures matching the keyword by means of a picture search engine, as the existing pictures.

Preferably, step a comprises:

obtaining pictures uploaded by the user as the existing pictures.

Preferably, step a comprises:

expanding the pictures by keyword search combined with search-by-image, according to the keyword input by the user and the pictures selected from the existing pictures, to obtain the expanded pictures.

Preferably, the method further comprises:

obtaining a target number for the picture expansion set by the user;

wherein step a comprises:

expanding the existing pictures according to the target number to obtain the expanded pictures.

Preferably, step d further comprises:

clustering at least one clustering result again, and selecting positive and negative samples from the clustering results obtained after the re-clustering.

Preferably, the method further comprises:

purifying the picture quality of a large-scale picture collection according to the picture purification model.
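As a hedged illustration of training on the user's positive and negative samples and then purifying a larger collection, the sketch below uses a nearest-centroid classifier. The text does not specify the model type or how pictures are represented, so the classifier, the feature vectors and the names `train_nearest_centroid` and `purify` are assumptions chosen only to keep the example small.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]  # assumption: each picture is a precomputed feature vector


def centroid(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def squared_distance(u: Vector, v: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def train_nearest_centroid(
    positives: Sequence[Vector], negatives: Sequence[Vector]
) -> Callable[[Vector], bool]:
    """Stand-in model: keep a picture if it lies closer to the positive centroid."""
    if not positives or not negatives:
        raise ValueError("both positive and negative samples are required")
    pos_c = centroid(positives)
    neg_c = centroid(negatives)

    def keep(v: Vector) -> bool:
        return squared_distance(v, pos_c) < squared_distance(v, neg_c)

    return keep


def purify(collection: Sequence[Vector], model: Callable[[Vector], bool]) -> List[Vector]:
    """Apply the trained model to a large collection and keep accepted pictures."""
    return [v for v in collection if model(v)]
```

In practice the model would more likely be a trained image classifier. The point of the sketch is only that a small labeled sample produces a decision function that can then be applied to an arbitrarily large collection.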

According to another aspect of the present invention, an apparatus for training a picture purification model is also provided, wherein the apparatus comprises:

an expansion device, configured to expand a set of existing pictures to obtain expanded pictures;

a clustering device, configured to cluster the expanded pictures to obtain corresponding clustering results;

a presentation device, configured to select a predetermined number of pictures from at least one clustering result and present them to a user as sample pictures;

a first acquisition device, configured to obtain positive and negative samples that the user produces through related operations on the clustering results;

a training device, configured to train a corresponding picture purification model according to the positive and negative samples selected by the user.

Preferably, the expansion device is configured to:

Preferably, the apparatus further comprises:

a second acquisition device, configured to obtain a target number for the picture expansion set by the user;

wherein the expansion device is configured to:

Preferably, the first acquisition device is further configured to:

Preferably, the apparatus further comprises:

a purification device, configured to purify the picture quality of a large-scale picture collection according to the picture purification model.

According to yet another aspect of the present invention, a computer-readable storage medium is also provided. The computer-readable storage medium stores computer code, and when the computer code is executed, the method described in any one of the above is performed.

According to a further aspect of the present invention, a computer program product is also provided. When the computer program product is executed by a computer device, the method described in any one of the above is performed.

According to a further aspect of the present invention, a computer device is also provided, the computer device comprising:

one or more processors;

a memory for storing one or more computer programs;

wherein, when the one or more computer programs are executed by the one or more processors, the one or more processors implement the method described in any one of the above.

Compared with the prior art, the present invention expands a set of existing pictures to obtain expanded pictures, clusters the expanded pictures to obtain corresponding clustering results, selects a predetermined number of pictures from at least one clustering result and presents them to the user as sample pictures, obtains the positive and negative samples that the user produces through related operations on the clustering results, and trains the corresponding picture purification model according to the positive and negative samples selected by the user. The picture purification model can then be used to purify picture quality, so that high-quality data is obtained at low cost. In contrast to earlier approaches that required manually labeling tens or hundreds of thousands of pictures, the user needs only a few minutes to label a small sample, can then start model training, and directly obtains a picture purification model for quality purification of massive picture collections. That model can then be used to mine more high-quality pictures from massive picture data.

Brief description of the drawings

Other features, objects and advantages of the present invention will become more apparent on reading the detailed description of non-limiting embodiments given with reference to the following drawings:

Fig. 1 shows a schematic diagram of an apparatus for training a picture purification model according to one aspect of the present invention;

Fig. 2 shows a schematic diagram of training a picture purification model according to a preferred embodiment of the present invention;

Fig. 3 shows a schematic diagram of training a picture purification model according to a preferred embodiment of the present invention;

Fig. 4 shows a schematic diagram of training a picture purification model according to a preferred embodiment of the present invention;

Fig. 5 shows a schematic diagram of training a picture purification model according to a preferred embodiment of the present invention;

Fig. 6 shows a schematic flowchart of a method for training a picture purification model according to another aspect of the present invention.

The same or similar reference numerals in the drawings denote the same or similar components.

Detailed description

Before the exemplary embodiments are discussed in more detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes the operations as sequential processing, many of the operations may be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be rearranged. A process may be terminated when its operations are completed, but it may also have additional steps not included in the drawings. A process may correspond to a method, function, procedure, subroutine, subprogram and so on.

The term "computer device" as used in this context, also called a "computer", refers to an intelligent electronic device that can carry out predetermined processing such as numerical and/or logical calculation by running predetermined programs or instructions. It may include a processor and a memory, with the processor executing instructions stored in advance in the memory to carry out the predetermined processing; or the predetermined processing may be carried out by hardware such as an ASIC, FPGA or DSP; or by a combination of the two. Computer devices include, but are not limited to, servers, personal computers, notebook computers, tablet computers and smartphones.

The computer devices include user equipment and network equipment. The user equipment includes, but is not limited to, computers, smartphones and PDAs. The network equipment includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of computers or network servers based on cloud computing, where cloud computing is a kind of distributed computing: a super virtual computer composed of a group of loosely coupled computers. The computer device may implement the present invention by running on its own, or by accessing a network and interacting with other computer devices in the network. The network in which the computer device is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network and a VPN network.

It should be noted that the user equipment, network equipment and networks mentioned above are only examples. Other existing or future computer devices or networks, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.

The methods discussed below, some of which are illustrated by flowcharts, may be implemented in hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments that perform the necessary tasks may be stored in a machine-readable or computer-readable medium such as a storage medium. One or more processors may perform the necessary tasks.

The specific structural and functional details disclosed herein are merely representative and serve the purpose of describing exemplary embodiments of the present invention. The present invention may, however, be embodied in many alternative forms and should not be construed as limited only to the embodiments set forth herein.

It should be understood that although the terms "first", "second" and so on may be used herein to describe various units, these units should not be limited by these terms. The terms are used only to distinguish one unit from another. For example, a first unit could be termed a second unit, and similarly a second unit could be termed a first unit, without departing from the scope of the exemplary embodiments. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.

It should be understood that when a unit is referred to as being "connected" or "coupled" to another unit, it may be directly connected or coupled to the other unit, or intervening units may be present. In contrast, when a unit is referred to as being "directly connected" or "directly coupled" to another unit, no intervening units are present. Other words used to describe the relationship between units should be interpreted in a similar way (for example, "between" versus "directly between", "adjacent to" versus "directly adjacent to", and so on).

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the exemplary embodiments. As used herein, the singular forms "a" and "an" are intended to include the plural as well, unless the context clearly indicates otherwise. It should also be understood that the terms "comprises" and/or "includes", as used herein, specify the presence of the stated features, integers, steps, operations, units and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, units, components and/or combinations thereof.

It should also be mentioned that in some alternative implementations the functions or acts noted may occur in an order different from that indicated in the drawings. For example, depending on the functions or acts involved, two figures shown in succession may in fact be executed substantially concurrently, or may sometimes be executed in the reverse order.

The present invention is described in further detail below with reference to the accompanying drawings.

Fig. 1 shows a schematic diagram of an apparatus for training a picture purification model according to one aspect of the present invention.

The apparatus 1 includes an expansion device 101, a clustering device 102, a presentation device 103, a first acquisition device 104 and a training device 105. The apparatus 1 is located, for example, in a computer device, where computer devices include user equipment and network equipment. The following description takes the case in which the apparatus 1 is located in network equipment as an example.

The expansion device 101 expands a set of existing pictures to obtain expanded pictures.

Specifically, for the existing pictures, the expansion device 101 may work directly from them by search-by-image, matching pictures in a local picture library or looking pictures up online, so as to expand the existing pictures and obtain the expanded pictures. Expanding directly from all of the existing pictures in this way can be understood as the user having selected all of the existing pictures.

Alternatively, the user may select from the existing pictures a subset that meets the user's needs. The expansion device 101 then takes the pictures selected by the user and, again by search-by-image in a local picture library or online, expands the existing pictures to obtain the expanded pictures.

Alternatively, the user may input a keyword. For example, the user enters a keyword in a keyword input box through interaction with the user equipment. The expansion device 101 obtains that keyword through interaction with the user equipment, for example through one or more calls to an application program interface (API) provided by the user equipment or through another agreed communication method, and then obtains pictures matching the keyword by searching a local picture library or a search engine, thereby expanding the existing pictures and obtaining the expanded pictures.

Alternatively, the expansion device 101 may combine search-by-image with keyword search. Based on all or some of the existing pictures selected by the user, together with the keyword input by the user, it searches a local picture library or a search engine for pictures that match both the keyword and the selected pictures, thereby expanding the existing pictures and obtaining the expanded pictures.

Here, the existing pictures may have been obtained by matching a user-input keyword through a picture search engine, or may be pictures uploaded directly by the user. The target number of pictures after expansion need not be specified: any picture that matches all or some of the existing pictures selected by the user, or that additionally matches the keyword input by the user, can be used as an expanded picture. The target number may of course also be set explicitly, either preset by the system or adjusted by the user.

Preferably, the expansion device 101 obtains, according to the keyword input by the user, pictures matching the keyword by means of a picture search engine, as the existing pictures.

Specifically, the user enters a keyword, for example through interaction with the user equipment, at a predetermined entry point or in the keyword input box of a specific application. The expansion device 101 then obtains the keyword through interaction with the user equipment, for example through one or more calls to the corresponding application program interface (API) or through another agreed communication method. The expansion device 101 then queries a picture search engine with the keyword and takes the pictures that match it as the aforementioned existing pictures.

Preferably, the expansion device 101 obtains pictures uploaded by the user as the existing pictures.

Specifically, the user may also upload pictures directly. For example, the user uploads a number of pictures, through interaction with the user equipment, at a predetermined entry point or through the picture upload entry of a specific application. The expansion device 101 then obtains the uploaded pictures through interaction with the user equipment, for example through one or more calls to the corresponding application program interface (API) or through another agreed communication method, and takes them as the aforementioned existing pictures.

Those skilled in the art will understand that the above ways of obtaining existing pictures are only examples. Other existing or future ways of obtaining existing pictures, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.

Preferably, the expansion device 101 expands the pictures by keyword search combined with search-by-image, according to the keyword input by the user and the pictures selected from the existing pictures, and obtains the expanded pictures.

Specifically, the user enters a keyword, for example in the keyword input box, and makes a selection among the existing pictures. Based on the keyword and on the pictures the user selected, the expansion device 101 uses an existing picture search engine, with keyword search combined with search-by-image, to find pictures that correspond to the keyword and also match the selected existing pictures. In this way the pictures are expanded and the expanded pictures are obtained.

For example, as shown in Fig. 2, among the existing pictures of "sheep" the user ticks those that meet the user's needs and also enters a keyword, here "sheep", in the keyword input box in the upper left corner. The expansion device 101 then takes the ticked pictures together with the keyword "sheep" and uses an existing picture search engine, with keyword search combined with search-by-image, to find corresponding pictures, thereby expanding the existing pictures and obtaining the expanded pictures. Fig. 2 shows only part of the interface, and the actual number of existing pictures is not limited to the part shown in Fig. 2.

Those skilled in the art will understand that the above ways of expanding the existing pictures are only examples. Other existing or future ways of expanding existing pictures, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.

Preferably, the apparatus 1 further includes a second acquisition device (not shown). The second acquisition device obtains the target number for the picture expansion set by the user, and the expansion device 101 expands the existing pictures according to the target number to obtain the expanded pictures.

Specifically, the user may also set the target number for the picture expansion. For example, through interaction with the user equipment, the user sets the target number to 50,000 at a settings entry. The second acquisition device obtains this target number through interaction with the user equipment, through one or more calls to an application program interface (API) provided by the user equipment or through another agreed communication method. The expansion device 101 then expands the existing pictures according to the target number, using any of the ways listed above, and obtains the expanded pictures.

For example, as shown in Fig. 3, the target number set by the user is 50,000, so the "New task submission" window shown in Fig. 3 displays a "Download magnitude" of 50,000. In the same window, "Task name: goat" indicates that pictures of "sheep" are to be downloaded, and "Selected: 300" indicates that the user has selected 300 of the existing pictures. When the user clicks the "Confirm and submit task" button, expansion of the existing pictures begins.
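The interplay between the selected pictures, the keyword and the target number might be sketched as follows. The `search` callable stands in for a search engine that accepts a keyword together with a query picture. It, the function name `expand_pictures`, and the choice to return only newly found pictures are assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Sequence


def expand_pictures(
    selected: Sequence[str],
    keyword: str,
    search: Callable[[str, str], Iterable[str]],
    target_count: int,
) -> List[str]:
    """Gather new, de-duplicated pictures until target_count is reached.

    `search(keyword, picture)` is a hypothetical combined keyword and
    search-by-image query that yields identifiers of matching pictures.
    """
    seen = set(selected)          # never return a picture the user already has
    expanded: List[str] = []
    for picture in selected:
        for hit in search(keyword, picture):
            if len(expanded) >= target_count:
                return expanded   # target number reached, stop early
            if hit not in seen:
                seen.add(hit)
                expanded.append(hit)
    return expanded               # fewer hits than the target were available
```

If the searches return fewer distinct pictures than the target number, the sketch simply returns what it found. How the apparatus handles that case is not stated.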

The clustering device 102 clusters the expanded pictures to obtain corresponding clustering results.

Specifically, the clustering device 102 clusters the pictures expanded by the expansion device 101 so that pictures with the same or similar features are grouped into one class, thereby obtaining the corresponding clustering results. For example, the 50,000 expanded pictures of "sheep" from the example above are all related to "sheep", but they show "sheep" with many different picture features: pictures of various breeds of sheep, and even cartoon sheep and paper-cut sheep. The clustering device 102 can therefore cluster these 50,000 pictures by picture feature, so that pictures with the same or similar "sheep" features are grouped into one class, and obtain the corresponding clustering results.
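A minimal clustering sketch is given below. The text names neither a clustering algorithm nor a feature representation, so plain k-means over precomputed feature vectors is used here purely as a stand-in. The initialization from the first `k` points is a simplification chosen for determinism, not a recommended practice.

```python
from typing import List, Sequence

Vector = Sequence[float]  # assumption: one precomputed feature vector per picture


def squared_distance(u: Vector, v: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points: Sequence[Vector], k: int, max_iter: int = 100) -> List[int]:
    """Return a cluster label in 0..k-1 for each point."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [list(p) for p in points[:k]]  # simplistic init: the first k points
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers would not move
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Pictures that receive the same label form one clustering result. In the "sheep" example, photographs, cartoons and paper-cuts would ideally end up under different labels, provided the features separate them.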

The presentation device 103 selects a predetermined number of pictures from at least one clustering result and presents them to the user as sample pictures.

Specifically, after the clustering device 102 has clustered the expanded pictures and obtained the corresponding clustering results, the presentation device 103 selects a predetermined number of pictures from at least one of the clustering results as sample pictures and presents them to the user in an agreed manner. The predetermined number may be preset by the system or adjusted by the user.

As shown in Fig. 4, continuing the example above, the clustering device 102 clusters the 50,000 pictures of "sheep" and obtains multiple clustering results. For each clustering result, the presentation device 103 selects 6 pictures as sample pictures and presents them to the user in the interface shown in Fig. 4. Fig. 4 shows only part of the interface, and the actual number of clustering results is not limited to the part shown. As can be seen from Fig. 4, each row is one clustering result, and each clustering result includes 6 sample pictures that share the same or similar picture features.
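Choosing the sample pictures for each row of such an interface could look like the sketch below. The text does not say how the samples are chosen within a clustering result, so taking the first pictures encountered is an assumption, as is the name `pick_samples`.

```python
from typing import Dict, List, Sequence


def pick_samples(
    pictures: Sequence[str], labels: Sequence[int], per_cluster: int = 6
) -> Dict[int, List[str]]:
    """Map each cluster label to at most per_cluster sample pictures."""
    samples: Dict[int, List[str]] = {}
    for picture, label in zip(pictures, labels):
        row = samples.setdefault(label, [])
        if len(row) < per_cluster:  # assumption: keep the first ones seen
            row.append(picture)
    return samples
```

Each dictionary entry corresponds to one row of the interface. A cluster with fewer pictures than `per_cluster` simply yields a shorter row.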

The first acquisition device 104 acquires the positive and negative samples that the user obtains through related operations on the clustering results.

Specifically, the user can select any of the clustering results and perform a related operation on it. The user's related operations on the clustering results include, but are not limited to:

selecting one of the clustering results as a positive sample;

selecting one of the clustering results as a negative sample;

clustering one of the clustering results further and selecting positive and negative samples from the clustering results obtained after re-clustering.

Of course, the user may also leave a given clustering result untouched. If the user is still dissatisfied with the clustering results obtained after re-clustering, clustering can continue until the user is satisfied, at which point the user can select positive and negative samples from the satisfactory clustering results. The first acquisition device 104 thereby acquires the positive and negative samples that the user obtains through related operations on the clustering results.
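The user operations listed above can be sketched as a small bookkeeping structure. This is only an illustration under stated assumptions: `LabelSession` and its methods are hypothetical names, clusters are identified by string ids, and the re-clustering routine is passed in as a callable because the description does not fix one.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class LabelSession:
    """Tracks which clusters the user marked positive, negative, or split."""
    clusters: Dict[str, List[str]]
    positives: List[str] = field(default_factory=list)
    negatives: List[str] = field(default_factory=list)

    def mark_positive(self, cluster_id: str) -> None:
        # Every picture of the chosen cluster becomes a positive sample.
        self.positives.extend(self.clusters.pop(cluster_id))

    def mark_negative(self, cluster_id: str) -> None:
        # Every picture of the chosen cluster becomes a negative sample.
        self.negatives.extend(self.clusters.pop(cluster_id))

    def recluster(self, cluster_id: str,
                  split: Callable[[List[str]], List[List[str]]]) -> List[str]:
        """Replace one cluster by the sub-clusters that `split` returns."""
        members = self.clusters.pop(cluster_id)
        new_ids = []
        for index, group in enumerate(split(members)):
            new_id = f"{cluster_id}.{index}"
            self.clusters[new_id] = group
            new_ids.append(new_id)
        return new_ids
```

Clusters that the user never touches simply remain in `clusters` and contribute no samples, which matches the case where the user performs no operation on a clustering result.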

Those skilled in the art will understand that the user operations on the clustering results described above are only examples. Other existing or future user operations on clustering results, where applicable to the present invention, also fall within the scope of protection of the present invention and are incorporated herein by reference.

Preferably, the first acquisition device 104 also re-clusters at least one clustering result, and positive and negative samples are selected from the clustering results obtained after re-clustering.

Specifically, if the user is dissatisfied with at least one clustering result obtained by the clustering device 102, the user can choose to cluster that clustering result again. The first acquisition device 104 then re-clusters each such clustering result, for example on the user's instruction. If the user is satisfied with the clustering results after one round of re-clustering, positive and negative samples can be selected from them. If the user is still dissatisfied, clustering can continue until the user is satisfied with the results, from which the user can then select positive and negative samples. The first acquisition device 104 thereby acquires the positive and negative samples finally selected by the user.

The training device 105 trains the corresponding picture refinement model according to the positive and negative samples selected by the user.

Specifically, the user's selection determines the positive and negative samples: for example, a clustering result whose pictures are all correct serves as positive samples, and a clustering result whose pictures are all wrong serves as negative samples. The training device 105 trains the corresponding picture refinement model on these positive and negative samples, obtaining a picture refinement model trained on a small data set.
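The description leaves the model type open. Purely as an illustrative stand-in, the sketch below trains a nearest-centroid classifier on the feature vectors of the positive and negative samples; a real embodiment could use any binary classifier. The names `train_refinement_model` and `is_high_quality` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Model = Tuple[List[float], List[float]]


def _centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of feature vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def _squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_refinement_model(positives: List[Vector],
                           negatives: List[Vector]) -> Model:
    """Fit the stand-in model: one centroid per class (an assumption)."""
    if not positives or not negatives:
        raise ValueError("both positive and negative samples are required")
    return _centroid(positives), _centroid(negatives)


def is_high_quality(model: Model, feature: Vector) -> bool:
    """A picture passes if it lies closer to the positive centroid."""
    positive_centroid, negative_centroid = model
    return (_squared_distance(feature, positive_centroid)
            < _squared_distance(feature, negative_centroid))
```

Because the user labels whole clusters rather than single pictures, a few selections can yield many training vectors for such a model.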

Here, device 1 filters massive picture data with a model trained on a small sample in order to extract high-quality picture data. It uses an existing picture search engine, such as Baidu or Google, to obtain the raw picture data, and clusters a small sample of that data (for example, fewer than 100,000 pictures) into, say, several hundred classes. A few pictures from each class are selected as example pictures of that class and presented to the user, and once the user has selected positive and negative samples from them, model training can begin. A model with good classification performance can thus be obtained at low cost with a small amount of user labeling. In contrast to the tens or hundreds of thousands of manual annotations required previously, the user needs only a few minutes to complete the small-sample labeling task. Model training can then be started to directly obtain a picture refinement model for quality refinement of massive picture collections, and that model can subsequently be used to mine more high-quality pictures from massive picture data.

Preferably, device 1 further includes a refinement device (not shown). The refinement device performs picture quality refinement on a large-scale picture collection according to the picture refinement model.

Specifically, the refinement device applies the picture refinement model trained by the training device 105 to a large-scale picture collection to refine its picture quality, for example the 50,000 expanded pictures mentioned above or a large-scale picture collection obtained in some other way, and obtains the refined pictures.

For example, as shown in Figure 5, the training device 105 has trained a picture refinement model from the positive and negative samples of "sheep" pictures selected by the user. The refinement device applies this model to the large-scale collection of 50,000 "sheep" pictures mentioned above and obtains the "sheep" pictures shown in Figure 5, which better meet the user's quality requirements.
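A minimal sketch of the refinement pass, assuming the trained model is exposed as a predicate over feature vectors: the collection is streamed once and only the pictures the model accepts are kept. `purify` and the `(id, feature)` pair layout are hypothetical choices made for this illustration.

```python
from typing import Callable, Iterable, Iterator, Sequence, Tuple

Vector = Sequence[float]


def purify(pictures: Iterable[Tuple[str, Vector]],
           keep: Callable[[Vector], bool]) -> Iterator[str]:
    """Yield the ids of pictures that the refinement model accepts.

    `pictures` yields (picture id, feature vector) pairs; `keep` wraps the
    trained model. A generator is used so that a very large collection does
    not have to be held in memory at once.
    """
    for pic_id, feature in pictures:
        if keep(feature):
            yield pic_id
```

For instance, `list(purify(collection, lambda f: f[0] >= 0.5))` keeps every picture whose first feature component is at least 0.5, where the lambda stands in for the trained model.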

Here, device 1 applies the picture refinement model obtained from the training described above to the quality refinement of massive picture data, mining more high-quality pictures from that data so that the quality-refined picture data can in turn be used for model training.

In step S601, device 1 performs picture expansion on the existing pictures to obtain the expanded pictures.

Specifically, in step S601, device 1 can work directly from the existing pictures and expand them by searching by image, for example by matching pictures in a local picture library or by looking up pictures online, thereby obtaining the expanded pictures. Expanding directly from all existing pictures can be understood as the user having selected all of them. Alternatively, the user can select from the existing pictures the subset that meets his or her needs; in step S601, device 1 then expands the existing pictures by searching by image from the user's selection, again through local picture-library matching or online picture lookup, and obtains the expanded pictures. As a further alternative, the user can input a keyword, for example in a keyword input field through interaction with the user equipment. In step S601, device 1 then obtains that keyword through interaction with the user equipment, for example through one or more calls to an application programming interface (API) provided by the user equipment or through another agreed communication method, and retrieves pictures matching the keyword from a local picture library or a search engine, thereby expanding the existing pictures and obtaining the expanded pictures. As yet another alternative, in step S601, device 1 can combine search by image with keyword search: based on all or some of the existing pictures selected by the user, together with the keyword the user has input, device 1 searches a local picture library or a search engine for pictures that match both the keyword and the selected pictures, thereby expanding the existing pictures and obtaining the expanded pictures. The existing pictures may have been obtained by matching a user-input keyword through a picture search engine, or may have been uploaded directly by the user. The target number of pictures after expansion need not be specified, in which case any picture that matches all or some of the existing pictures selected by the user, or that additionally matches the keyword input by the user, can be used as an expanded picture. The target number can also be set explicitly, either preset by the system or adjusted by the user.
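The expansion alternatives of step S601 might be sketched as one function. Everything here is an assumption made for illustration: the two search back ends are passed in as hypothetical callables (`search_by_image`, `search_by_keyword`) rather than tied to any real search-engine API, the combined mode is modeled as keeping image-search hits that the keyword search also returns, and the selected pictures themselves are kept in the expanded set.

```python
from typing import Callable, Iterable, List, Optional, Sequence


def expand_pictures(selected: Sequence[str],
                    keyword: Optional[str],
                    search_by_image: Callable[[str], Iterable[str]],
                    search_by_keyword: Callable[[str], Iterable[str]],
                    target: Optional[int] = None) -> List[str]:
    """Expand the user's selected pictures by image and/or keyword search."""
    # Search by image from every selected picture.
    by_image = [hit for pic in selected for hit in search_by_image(pic)]
    if keyword is None:
        candidates = by_image
    else:
        by_keyword = list(search_by_keyword(keyword))
        if selected:
            # Combined mode: a hit must match the pictures and the keyword.
            allowed = set(by_keyword)
            candidates = [hit for hit in by_image if hit in allowed]
        else:
            candidates = by_keyword
    # Remove duplicates while keeping first-seen order.
    expanded = list(dict.fromkeys(list(selected) + candidates))
    # With no target number, every matching picture is kept.
    return expanded if target is None else expanded[:target]
```

Passing the target number as an optional argument mirrors the description: it can be left unset, preset by the system, or supplied by the user.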

Preferably, in step S601, device 1 uses a picture search engine to obtain pictures matching a keyword input by the user, and takes them as the existing pictures.

Specifically, the user inputs a keyword, for example at a predetermined entry point or in the keyword input field of a particular application, through interaction with the user equipment. In step S601, device 1 obtains the keyword through interaction with the user equipment, for example through one or more calls to the corresponding application programming interface (API) or through another agreed communication method. Device 1 then queries a picture search engine with the keyword and takes the matching pictures as the existing pictures.

Preferably, in step S601, device 1 obtains pictures uploaded by the user and takes them as the existing pictures.

Specifically, the user can also upload pictures directly. For example, the user uploads a number of pictures at a predetermined entry point or through the picture upload entry of a particular application, through interaction with the user equipment. In step S601, device 1 obtains the uploaded pictures through interaction with the user equipment, for example through one or more calls to the corresponding application programming interface (API) or through another agreed communication method, and takes them as the existing pictures.

Preferably, in step S601, device 1 performs picture expansion by combining keyword search and search by image, based on the keyword input by the user and the pictures the user has selected from the existing pictures, and obtains the expanded pictures.

Specifically, the user inputs a keyword, for example in a keyword input field, and makes a selection among the existing pictures. In step S601, device 1 can then use an existing picture search engine to search, by keyword and by image, for pictures that correspond to the keyword and also match the existing pictures the user has selected, thereby performing picture expansion and obtaining the expanded pictures.

For example, as shown in Figure 2, the user ticks those of the existing "sheep" pictures that meet his or her needs and also inputs a keyword, here "sheep", in the keyword input field in the upper left corner. In step S601, device 1 takes the existing "sheep" pictures ticked by the user together with the keyword "sheep" and uses an existing picture search engine to search by keyword and by image for corresponding pictures, thereby expanding the existing pictures and obtaining the expanded pictures. Figure 2 shows only part of the interface; the actual number of existing pictures is not limited to what Figure 2 shows.

Preferably, the method further includes step S606 (not shown). In step S606, device 1 obtains the target number for picture expansion set by the user; in step S601, device 1 then performs picture expansion on the existing pictures according to the target number and obtains the expanded pictures.

Specifically, the user can also set the target number for picture expansion. For example, the user sets the target number to 50,000 at a settings entry through interaction with the user equipment. In step S606, device 1 obtains the target number set by the user through interaction with the user equipment, for example through one or more calls to an application programming interface (API) provided by the user equipment or through another agreed communication method. Then, in step S601, device 1 expands the existing pictures according to the target number, using any of the approaches listed above, and obtains the expanded pictures.

In step S602, device 1 clusters the expanded pictures to obtain the corresponding clustering results.

Specifically, in step S602, device 1 clusters the pictures expanded in step S601 so that pictures with the same or similar features fall into one class, thereby obtaining the corresponding clustering results. For example, for the 50,000 expanded "sheep" pictures in the example above, device 1 performs picture clustering in step S602. Although all 50,000 pictures relate to "sheep", they show sheep with widely varying picture features: different breeds of sheep, and even cartoon sheep and paper-cut sheep. In step S602, device 1 can therefore cluster the 50,000 pictures by picture feature, so that pictures sharing the same or similar "sheep" features are grouped into one class, thereby obtaining the corresponding clustering results.

In step S603, device 1 selects a predetermined number of pictures from at least one clustering result and presents them to the user as sample pictures.

Specifically, after device 1 has clustered the expanded pictures in step S602 and obtained the corresponding clustering results, in step S603 device 1 selects a predetermined number of pictures from at least one of those clustering results as sample pictures and presents them to the user in an agreed presentation manner. The predetermined number may be preset by the system or adjusted by the user.

As shown in Figure 4, continuing the example above, device 1 has clustered the 50,000 "sheep" pictures into multiple clustering results in step S602. In step S603, device 1 selects 6 pictures from each clustering result as sample pictures and presents them to the user in the interface shown in Figure 4. Figure 4 shows only part of the interface; the actual number of clustering results is not limited to what Figure 4 shows. In Figure 4, each row is one clustering result containing 6 sample pictures that share the same or similar picture features.

In step S604, device 1 acquires the positive and negative samples that the user obtains through related operations on the clustering results.

Of course, the user may also leave a given clustering result untouched. If the user is still dissatisfied with the clustering results obtained after re-clustering, clustering can continue until the user is satisfied, at which point the user can select positive and negative samples from the satisfactory clustering results. In step S604, device 1 thereby acquires the positive and negative samples that the user obtains through related operations on the clustering results.

Preferably, in step S604, device 1 also re-clusters at least one clustering result, and positive and negative samples are selected from the clustering results obtained after re-clustering.

Specifically, if the user is dissatisfied with at least one clustering result obtained in step S602, the user can choose to cluster that clustering result again. In step S604, device 1 then re-clusters each such clustering result, for example on the user's instruction. If the user is satisfied with the clustering results after one round of re-clustering, positive and negative samples can be selected from them. If the user is still dissatisfied, clustering can continue until the user is satisfied with the results, from which the user can then select positive and negative samples. In step S604, device 1 thereby acquires the positive and negative samples finally selected by the user.
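The repeated re-clustering of step S604 can be pictured as a loop that keeps splitting a clustering result until the user accepts every resulting group. The sketch below is an assumption-laden illustration: the user's judgment is modeled as a hypothetical `is_satisfied` callback, the re-clustering routine as a hypothetical `split` callable, and a `max_rounds` cap (not part of the description) guards against endless splitting.

```python
from typing import Callable, List


def recluster_until_satisfied(members: List[str],
                              split: Callable[[List[str]], List[List[str]]],
                              is_satisfied: Callable[[List[str]], bool],
                              max_rounds: int = 5) -> List[List[str]]:
    """Split one clustering result repeatedly until each group is accepted."""
    pending = [list(members)]
    accepted: List[List[str]] = []
    for _ in range(max_rounds):
        if not pending:
            break
        next_pending: List[List[str]] = []
        for group in pending:
            # A single-picture group cannot be split any further.
            if is_satisfied(group) or len(group) <= 1:
                accepted.append(group)
            else:
                next_pending.extend(split(group))
        pending = next_pending
    # Groups still pending after the cap are returned unsplit.
    return accepted + pending
```

The user would then choose positive and negative samples from the returned groups.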

In step S605, device 1 trains the corresponding picture refinement model according to the positive and negative samples selected by the user.

Specifically, the user's selection determines the positive and negative samples: for example, a clustering result whose pictures are all correct serves as positive samples, and a clustering result whose pictures are all wrong serves as negative samples. In step S605, device 1 trains the corresponding picture refinement model on these positive and negative samples, obtaining a picture refinement model trained on a small data set.

Preferably, the method further includes step S607 (not shown). In step S607, device 1 performs picture quality refinement on a large-scale picture collection according to the picture refinement model.

Specifically, in step S607, device 1 applies the picture refinement model trained in step S605 to a large-scale picture collection to refine its picture quality, for example the 50,000 expanded pictures mentioned above or a large-scale picture collection obtained in some other way, and obtains the refined pictures.

For example, as shown in Figure 5, device 1 has trained a picture refinement model in step S605 from the positive and negative samples of "sheep" pictures selected by the user. In step S607, device 1 applies this model to the large-scale collection of 50,000 "sheep" pictures mentioned above and obtains the "sheep" pictures shown in Figure 5, which better meet the user's quality requirements.

The present invention also provides a computer-readable storage medium storing computer code which, when executed, causes the method according to any one of the preceding items to be performed.

The present invention also provides a computer program product which, when executed by a computer device, causes the method according to any one of the preceding items to be performed.

The present invention also provides a computer device, the computer device comprising:

one or more processors;

a memory for storing one or more computer programs;

wherein the one or more computer programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of the preceding items.

It should be noted that the present invention can be implemented in software and/or a combination of software and hardware. For example, each device of the present invention can be implemented with an application-specific integrated circuit (ASIC) or any other similar hardware device. In one embodiment, the software program of the present invention can be executed by a processor to implement the steps or functions described above. Likewise, the software program of the present invention (including associated data structures) can be stored in a computer-readable recording medium such as RAM, a magnetic or optical drive, a floppy disk, or a similar device. In addition, some steps or functions of the present invention can be implemented in hardware, for example as a circuit that cooperates with a processor to perform each step or function.

It will be apparent to those skilled in the art that the present invention is not limited to the details of the exemplary embodiments described above, and that it can be embodied in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in every respect as exemplary and non-restrictive. The scope of the invention is defined by the appended claims rather than by the foregoing description, and all changes that fall within the meaning and range of equivalents of the claims are intended to be embraced by the invention. No reference sign in a claim should be construed as limiting the claim concerned. Furthermore, the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in a system claim may also be implemented by a single unit or device through software or hardware. Words such as "first" and "second" are used as names and do not indicate any particular order.

Claims

1. A method for training a picture refinement model, wherein the method comprises:

a. performing picture expansion on existing pictures to obtain expanded pictures;

b. clustering the expanded pictures to obtain corresponding clustering results;

c. selecting a predetermined number of pictures from at least one clustering result and presenting them to a user as sample pictures;

d. acquiring positive and negative samples obtained by the user through related operations on the clustering results;

e. training a corresponding picture refinement model according to the positive and negative samples selected by the user.

2. The method according to claim 1, wherein step a comprises:

obtaining, according to a keyword input by the user, pictures matching the keyword by using a picture search engine, as the existing pictures.

3. The method according to claim 1, wherein step a comprises:

obtaining pictures uploaded by the user as the existing pictures.

4. The method according to any one of claims 1 to 3, wherein step a comprises:

performing picture expansion by keyword search and search by image, according to a keyword input by the user and pictures selected from the existing pictures, to obtain the expanded pictures.

5. The method according to any one of claims 1 to 4, wherein the method further comprises:

obtaining a target number for picture expansion set by the user;

wherein step a comprises:

performing picture expansion on the existing pictures according to the target number to obtain the expanded pictures.

6. The method according to any one of claims 1 to 5, wherein step d further comprises:

re-clustering at least one clustering result, and selecting positive and negative samples from the clustering results obtained after re-clustering.

7. The method according to any one of claims 1 to 6, wherein the method further comprises:

performing picture quality refinement on a large-scale picture collection according to the picture refinement model.

8. A device for training a picture refinement model, wherein the device comprises:

an expansion device for performing picture expansion on existing pictures to obtain expanded pictures;

a clustering device for clustering the expanded pictures to obtain corresponding clustering results;

a presentation device for selecting a predetermined number of pictures from at least one clustering result and presenting them to a user as sample pictures;

a first acquisition device for acquiring positive and negative samples obtained by the user through related operations on the clustering results;

a training device for training a corresponding picture refinement model according to the positive and negative samples selected by the user.

9. The device according to claim 8, wherein the expansion device is configured to:

obtain, according to a keyword input by the user, pictures matching the keyword by using a picture search engine, as the existing pictures.

10. The device according to claim 8, wherein the expansion device is configured to:

obtain pictures uploaded by the user as the existing pictures.

11. The device according to any one of claims 8 to 10, wherein the expansion device is configured to:

perform picture expansion by keyword search and search by image, according to a keyword input by the user and pictures selected from the existing pictures, to obtain the expanded pictures.

12. The device according to any one of claims 8 to 11, wherein the device further comprises:

a second acquisition device for obtaining a target number for picture expansion set by the user;

wherein the expansion device is configured to:

perform picture expansion on the existing pictures according to the target number to obtain the expanded pictures.

13. The device according to any one of claims 8 to 12, wherein the first acquisition device is further configured to:

re-cluster at least one clustering result, and select positive and negative samples from the clustering results obtained after re-clustering.

14. The device according to any one of claims 8 to 13, wherein the device further comprises:

a refinement device for performing picture quality refinement on a large-scale picture collection according to the picture refinement model.

15. A computer-readable storage medium storing computer code, wherein, when the computer code is executed, the method according to any one of claims 1 to 7 is performed.

16. A computer program product, wherein, when the computer program product is executed by a computer device, the method according to any one of claims 1 to 7 is performed.

17. A computer device, the computer device comprising:

one or more processors;

a memory for storing one or more computer programs;

wherein the one or more computer programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1 to 7.