WO2018179338A1 - Machine learning device and image recognition device - Google Patents
Machine learning device and image recognition device
- Publication number
- WO2018179338A1 (PCT application PCT/JP2017/013603)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- neural network
- feature amount
- unit
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/0895—Weakly supervised learning, e.g. semi-supervised or self-supervised learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
Definitions
- The present invention relates to a machine learning device that inputs a learning image, outputs a feature amount of the learning image, and updates neural network parameters, and to an image recognition device that searches for a registered image similar to a recognition target image.
- Non-Patent Document 1 discloses a machine learning device that updates the parameters of a model for extracting the feature amount of an image by performing machine learning for classifying images. Because this machine learning device uses a supervised learning method, the machine learning is performed using teacher data.
- Because the conventional machine learning apparatus is configured as described above, a large amount of teacher data must be collected when performing machine learning. When it is difficult to collect a large amount of teacher data, the parameters of the model for extracting the feature amount of an image cannot be updated and optimized. As a result, there is a problem that the accuracy of the feature amount extracted by the model may deteriorate.
- The present invention has been made to solve the above-described problems, and an object of the present invention is to obtain a machine learning device that can update the parameters of a neural network, which inputs a learning image and outputs a feature amount of the learning image, without using teacher data.
- Another object of the present invention is to obtain an image recognition device that can search for a registered image similar to a recognition target image using a neural network whose parameters are updated by a machine learning device.
- The machine learning device according to the present invention includes: a binary image conversion unit that converts a learning image, which is an image to be learned, into a binary image; a feature amount extraction unit having a first neural network that inputs the learning image and outputs a feature amount of the learning image; an image reconstruction unit having a second neural network that inputs the feature amount output from the first neural network and outputs a reconstructed image, which is the learning image reconstructed as a binary image; and a parameter update unit that updates each of the parameters of the first neural network and the parameters of the second neural network according to the difference between the reconstructed image output from the second neural network and the binary image converted by the binary image conversion unit.
- According to the present invention, a binary image conversion unit that converts a learning image into a binary image, a feature amount extraction unit having a first neural network that inputs the learning image and outputs the feature amount of the learning image, and an image reconstruction unit having a second neural network that inputs the feature amount output from the first neural network and outputs a reconstructed image are provided, and a parameter update unit updates each of the parameters of the first neural network and the parameters of the second neural network according to the difference between the reconstructed image output from the second neural network and the binary image converted by the binary image conversion unit. There is therefore an advantage that the parameters of the first neural network, which inputs the learning image and outputs the feature amount of the learning image, can be updated without using teacher data.
- FIG. 6 is a hardware configuration diagram of a computer when the image recognition apparatus is realized by software or firmware. FIG. 7 is a flowchart showing the processing content in the sampling unit 2 of the machine learning device according to Embodiment 1 of the present invention.
- FIG. 1 is a block diagram showing a machine learning apparatus according to Embodiment 1 of the present invention
- FIG. 2 is a hardware block diagram showing a machine learning apparatus according to Embodiment 1 of the present invention.
- the machine learning apparatus according to the first embodiment uses a document image that is an image showing a document as a learning image that is an image to be learned.
- Alternatively, a form image, which is an image showing a form, may be used as the learning image.
- the learning image storage unit 1 is realized by, for example, the learning image storage circuit 11 shown in FIG. 2, and stores a plurality of document images acquired in advance.
- The document images stored in the learning image storage unit 1 are assumed to be grayscale images, for example. For this reason, a color image is converted into a grayscale image before being stored in the learning image storage unit 1.
- The document image acquisition method is not particularly limited; for example, a document image read by a scanner or a document image captured by a camera may be used. However, when an image captured by a camera is used, the captured image is corrected so that the document appears as if viewed from directly in front, and the area in which the document appears is cut out from the corrected image as the document image.
- In the first embodiment, the paper size of a document can be specified, and the document images stored in the learning image storage unit 1 all show documents of the same paper size.
- The sampling unit 2 is realized by, for example, the sampling circuit 12 shown in FIG. 2.
- The sampling unit 2 performs a process of sequentially selecting one document image from the plurality of document images stored in the learning image storage unit 1. The sampling unit 2 also performs image processing for changing the image size of the selected document image and for rotating the selected document image. Further, the sampling unit 2 extracts, from the document image after the image processing, a region having the same size as the binary image converted by the binary image conversion unit 3, and outputs the extracted region as a document image to each of the binary image conversion unit 3 and the image generation unit 4.
- The binary image conversion unit 3 is realized by, for example, the binary image conversion circuit 13 shown in FIG. 2.
- the binary image conversion unit 3 converts the document image output from the sampling unit 2 into a binary image, and performs a process of outputting the converted binary image to the parameter update unit 7.
- The image generation unit 4 is realized by, for example, the image generation circuit 14 illustrated in FIG. 2.
- The image generation unit 4 performs a process of adjusting the pixel values of the document image output from the sampling unit 2 to generate a document image affected by disturbance, and outputs the generated document image to the feature amount extraction unit 5.
- Disturbances include factors of the image acquisition equipment in addition to factors of the environment in which the document image is captured.
- The document image adjustment processing includes, for example, processing for adding Gaussian noise and salt-and-pepper noise to the document image, processing for applying Gaussian blur to the document image, and processing for adjusting the sharpness, contrast, and brightness of the document image.
- The feature amount extraction unit 5 is realized by, for example, the feature amount extraction circuit 15 shown in FIG. 2.
- the feature amount extraction unit 5 has a first neural network that inputs the document image output from the image generation unit 4 and outputs the feature amount of the document image.
- the first neural network included in the feature quantity extraction unit 5 is a convolutional neural network (CNN).
- The image reconstruction unit 6 is realized by, for example, the image reconstruction circuit 16 illustrated in FIG. 2.
- The image reconstruction unit 6 has a second neural network that inputs the feature amount output from the first neural network of the feature amount extraction unit 5 and outputs a reconstructed image, which is the image reconstructed as a binary image. In the first embodiment, it is assumed that the second neural network included in the image reconstruction unit 6 is a CNN.
- The parameter update unit 7 is realized by, for example, the parameter update circuit 17 shown in FIG. 2.
- The parameter update unit 7 performs a process of updating the parameters of the first neural network of the feature amount extraction unit 5 and the parameters of the second neural network of the image reconstruction unit 6 according to the difference between the reconstructed image output from the second neural network of the image reconstruction unit 6 and the binary image output from the binary image conversion unit 3.
- The parameter storage unit 8 is realized by, for example, the parameter storage circuit 18 illustrated in FIG. 2, and stores each of the parameters of the first neural network and the parameters of the second neural network updated by the parameter update unit 7.
- Each of the learning image storage unit 1, the sampling unit 2, the binary image conversion unit 3, the image generation unit 4, the feature amount extraction unit 5, the image reconstruction unit 6, the parameter update unit 7, and the parameter storage unit 8, which are the components of the machine learning device, is assumed to be realized by dedicated hardware as shown in FIG. 2; that is, by the learning image storage circuit 11, the sampling circuit 12, the binary image conversion circuit 13, the image generation circuit 14, the feature amount extraction circuit 15, the image reconstruction circuit 16, the parameter update circuit 17, and the parameter storage circuit 18.
- Each of the learning image storage circuit 11 and the parameter storage circuit 18 can be, for example, a nonvolatile or volatile semiconductor memory such as a RAM (Random Access Memory), a ROM (Read Only Memory), a flash memory, an EPROM (Erasable Programmable Read Only Memory), or an EEPROM (Electrically Erasable Programmable Read Only Memory), or a magnetic disk, a flexible disk, an optical disc, a compact disc, a mini disc, a DVD (Digital Versatile Disc), or the like.
- The sampling circuit 12, the binary image conversion circuit 13, the image generation circuit 14, the feature amount extraction circuit 15, the image reconstruction circuit 16, and the parameter update circuit 17 can each be, for example, a single circuit, a composite circuit, a programmed processor, a parallel-programmed processor, an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or a combination thereof.
- the components of the machine learning device are not limited to those realized by dedicated hardware, and the machine learning device may be realized by software, firmware, or a combination of software and firmware.
- Software or firmware is stored as a program in the memory of a computer.
- Here, the computer means hardware that executes the program, and corresponds to, for example, a CPU (Central Processing Unit), a processing unit, an arithmetic unit, a microprocessor, a microcomputer, a processor, or a DSP (Digital Signal Processor).
- FIG. 3 is a hardware configuration diagram of a computer when the machine learning device is realized by software or firmware.
- When the machine learning device is realized by software or firmware, the learning image storage unit 1 and the parameter storage unit 8 are configured on the memory 21 or the storage 22 of the computer, a program for causing the computer to execute the processing procedures of the sampling unit 2, the binary image conversion unit 3, the image generation unit 4, the feature amount extraction unit 5, the image reconstruction unit 6, and the parameter update unit 7 is stored in the memory 21 or the storage 22, and the processor 23 of the computer executes the program stored in the memory 21 or the storage 22.
- An image input unit 24 is an input interface device for inputting a document image, and a result output unit 25 is an output interface device that outputs the reconstructed image output from the second neural network of the image reconstruction unit 6.
- FIG. 2 shows an example in which each component of the machine learning device is realized by dedicated hardware, and FIG. 3 shows an example in which the machine learning device is realized by software, firmware, or the like; alternatively, some components of the machine learning device may be realized by dedicated hardware and the remaining components by software, firmware, or the like.
- FIG. 4 is a block diagram showing an image recognition apparatus according to Embodiment 1 of the present invention
- FIG. 5 is a hardware block diagram showing the image recognition apparatus according to Embodiment 1 of the present invention
- The registered image storage unit 31 is realized by, for example, the registered image storage circuit 41 illustrated in FIG. 5, and stores one or more document images as registered images.
- A registered image may be the same document image as a document image stored in the learning image storage unit 1 of the machine learning device in FIG. 1, or may be a document image different from the document images stored in the learning image storage unit 1.
- the recognition target image storage unit 32 is realized by, for example, the recognition target image storage circuit 42 illustrated in FIG. 5, and stores a recognition target image that is a recognition target document image.
- The feature amount detection unit 33 is realized by, for example, the feature amount detection circuit 43 illustrated in FIG. 5.
- the feature amount detection unit 33 includes a first feature amount detection unit 34, an image regeneration unit 35, and a second feature amount detection unit 36.
- When performing the registration processing, the feature amount detection unit 33 uses the implemented neural networks to input the registered images stored in the registered image storage unit 31 and to output their feature amounts. As preprocessing for the search processing, in which the image search unit 38 searches for a registered image similar to the recognition target image, the feature amount detection unit 33 uses the implemented neural networks to input the recognition target image stored in the recognition target image storage unit 32 and to output the feature amount of the recognition target image.
- When performing the registration processing, the first feature amount detection unit 34 includes, as a neural network that inputs the registered image and outputs its feature amount, the first neural network whose parameters have been updated by the parameter update unit 7 of the machine learning device of FIG. 1. When performing the preprocessing of the search processing, the implemented first neural network of the first feature amount detection unit 34 inputs the recognition target image stored in the recognition target image storage unit 32 and outputs the feature amount of the recognition target image.
- the first neural network included in the first feature quantity detection unit 34 is the same neural network as the first neural network included in the feature quantity extraction unit 5 of the machine learning device in FIG.
- When performing the registration processing, the image regeneration unit 35 includes, as a neural network that inputs the feature amount of the registered image output from the first neural network of the first feature amount detection unit 34 and outputs a reconstructed registered image, which is an image obtained by reconstructing the registered image, the second neural network whose parameters have been updated by the parameter update unit 7 of the machine learning device of FIG. 1. When performing the preprocessing of the search processing, the implemented second neural network of the image regeneration unit 35 inputs the feature amount of the recognition target image output from the first neural network of the first feature amount detection unit 34 and outputs a reconstructed recognition image, which is an image obtained by reconstructing the recognition target image.
- the second neural network included in the image regeneration unit 35 is the same neural network as the second neural network included in the image reconstruction unit 6 of the machine learning device in FIG.
- When performing the registration processing, the second feature amount detection unit 36 includes, as a neural network that inputs the reconstructed registered image output from the second neural network of the image regeneration unit 35 and outputs the feature amount of the reconstructed registered image, the first neural network whose parameters have been updated by the parameter update unit 7 of the machine learning device of FIG. 1. When performing the preprocessing of the search processing, the implemented first neural network of the second feature amount detection unit 36 inputs the reconstructed recognition image output from the second neural network of the image regeneration unit 35 and outputs the feature amount of the reconstructed recognition image.
- the first neural network included in the second feature amount detection unit 36 is the same neural network as the first neural network included in the feature amount extraction unit 5 of the machine learning device in FIG.
- The feature amount storage unit 37 is realized by, for example, the feature amount storage circuit 44 illustrated in FIG. 5.
- The feature amount storage unit 37 stores each of the feature amounts of the registered images output from the first neural network of the first feature amount detection unit 34 and the feature amounts of the reconstructed registered images output from the first neural network of the second feature amount detection unit 36.
- The image search unit 38 is realized by, for example, the image search circuit 45 shown in FIG. 5.
- The image storage unit 38a of the image search unit 38 stores the reconstructed registered image output from the image regeneration unit 35. Instead of providing the image storage unit 38a, the image recognition apparatus of FIG. 4 may include a separate reconstructed registered image storage unit that stores the reconstructed registered image output from the image regeneration unit 35.
- The image search unit 38 performs a process of calculating the similarity between the feature amounts of the one or more registered images stored in the feature amount storage unit 37, which were output from the first feature amount detection unit 34, and the feature amount of the recognition target image output from the first feature amount detection unit 34. The image search unit 38 identifies the registered image with the highest calculated similarity among the one or more registered images stored in the registered image storage unit 31, and outputs the identified registered image as the search result of a registered image similar to the recognition target image. Alternatively, the image search unit 38 calculates the similarity between the feature amounts of the one or more reconstructed registered images output from the second feature amount detection unit 36, among the feature amounts stored in the feature amount storage unit 37, and the feature amount of the reconstructed recognition image output from the second feature amount detection unit 36. In this case, the image search unit 38 identifies the reconstructed registered image with the highest calculated similarity among the one or more reconstructed registered images stored in the image storage unit 38a, and outputs the registered image corresponding to the identified reconstructed registered image as the search result of a registered image similar to the recognition target image.
- Each of the registered image storage unit 31, the recognition target image storage unit 32, the feature amount detection unit 33, the feature amount storage unit 37, and the image search unit 38, which are the components of the image recognition apparatus, is assumed to be realized by dedicated hardware as illustrated in FIG. 5; that is, by the registered image storage circuit 41, the recognition target image storage circuit 42, the feature amount detection circuit 43, the feature amount storage circuit 44, and the image search circuit 45.
- Each of the registered image storage circuit 41, the recognition target image storage circuit 42, and the feature amount storage circuit 44 can be, for example, a nonvolatile or volatile semiconductor memory such as a RAM, a ROM, a flash memory, an EPROM, or an EEPROM, or a magnetic disk, a flexible disk, an optical disc, a compact disc, a mini disc, a DVD, or the like.
- the feature quantity detection circuit 43 and the image search circuit 45 correspond to, for example, a single circuit, a composite circuit, a programmed processor, a parallel programmed processor, an ASIC, an FPGA, or a combination thereof.
- FIG. 6 is a hardware configuration diagram of a computer when the image recognition apparatus is realized by software or firmware.
- the registered image storage unit 31, the recognition target image storage unit 32, and the feature amount storage unit 37 are configured on the memory 51 or the storage 52 of the computer, and the feature amount detection unit 33 and a program for causing the computer to execute the processing procedure of the image search unit 38 are stored in the memory 51 or the storage 52, and the processor 53 of the computer executes the program stored in the memory 51 or the storage 52.
- an image input device 54 is an input interface device that inputs a registered image or a recognition target image
- a result output device 55 is an output interface device that outputs a search result of a registered image by the image search unit 38.
- FIG. 5 shows an example in which each component of the image recognition apparatus is realized by dedicated hardware, and FIG. 6 shows an example in which the image recognition apparatus is realized by software, firmware, or the like; alternatively, some components of the image recognition apparatus may be realized by dedicated hardware and the remaining components by software, firmware, or the like.
- the learning image storage unit 1 stores a plurality of document images acquired in advance.
- the document image stored in the learning image storage unit 1 is, for example, a gray scale image.
- the color image is converted into a grayscale image before being stored in the learning image storage unit 1.
- the sampling unit 2 selects any one document image from a plurality of document images stored in the learning image storage unit 1, and converts the selected document image into a binary image conversion unit 3 and an image generation unit 4. Output to each of.
- FIG. 7 is a flowchart showing the processing contents in the sampling unit 2 of the machine learning device according to Embodiment 1 of the present invention. Hereinafter, the processing content of the sampling unit 2 will be described in detail with reference to FIG.
- the sampling unit 2 randomly selects any one of the document images stored in the learning image storage unit 1 (step ST1 in FIG. 7).
- the sampling unit 2 changes the image size of the selected document image to a preset image size (H, W) (step ST2 in FIG. 7).
- H is the height of the document image
- W is the width of the document image.
- Next, the sampling unit 2 obtains a parameter P1 indicating an image scale S determined by a random number, and a parameter P2 indicating a rotation angle θ of the image determined by a random number (step ST3 in FIG. 7).
- The sampling unit 2 performs image processing for changing the image size of the selected document image based on the image scale S indicated by the parameter P1 (step ST4 in FIG. 7).
- The sampling unit 2 then performs image processing for rotating the selected document image about its rotation axis by the rotation angle θ indicated by the parameter P2 (step ST5 in FIG. 7).
- the sampling unit 2 determines the coordinates (X, Y) of a part of the region cut out from the document image after the image processing by using a random number (step ST6 in FIG. 7).
- the coordinates (X, Y) of the partial area are, for example, the coordinates of the upper left corner point of the partial area.
- Next, the sampling unit 2 performs a process of cutting out, from the document image after the image processing, the partial region whose upper left corner point has the determined coordinates (X, Y) (step ST7 in FIG. 7).
- the image size of the cutout area is an image size (h, w) set in advance.
- the sampling unit 2 outputs a part of the cut out region as a document image to each of the binary image conversion unit 3 and the image generation unit 4.
- the sampling unit 2 determines whether there are any document images that have not yet been selected among the plurality of document images stored in the learning image storage unit 1 (step ST8 in FIG. 7). If there remains a document image that has not yet been selected (step ST8: YES in FIG. 7), the sampling unit 2 repeatedly performs the processing of steps ST1 to ST8. If there is no document image that has not yet been selected (step ST8 in FIG. 7: NO), the sampling unit 2 ends the process.
- Through the above sampling process, an effectively unlimited number of learning samples can be generated from a finite set of learning images. As a result, improved generalization performance of the learning results, that is, an improved ability to recognize unknown inputs, can be expected.
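- As a minimal illustrative sketch of steps ST1 to ST7 (assuming NumPy arrays and OpenCV; the image sizes, scale range, and rotation range below are arbitrary example values, not values specified in this disclosure):

```python
import random

import cv2

def sample_patch(doc_images, H=512, W=512, h=256, w=256):
    """One pass of steps ST1 to ST7: select, resize, scale, rotate, crop."""
    img = random.choice(doc_images)                 # ST1: random selection
    img = cv2.resize(img, (W, H))                   # ST2: fixed size (H, W)
    S = random.uniform(0.8, 1.2)                    # ST3: random scale S
    theta = random.uniform(-10.0, 10.0)             # ST3: random angle (deg)
    img = cv2.resize(img, None, fx=S, fy=S)         # ST4: rescale by S
    rows, cols = img.shape[:2]
    M = cv2.getRotationMatrix2D((cols / 2, rows / 2), theta, 1.0)
    img = cv2.warpAffine(img, M, (cols, rows))      # ST5: rotate by theta
    X = random.randint(0, cols - w)                 # ST6: random top-left X
    Y = random.randint(0, rows - h)                 # ST6: random top-left Y
    return img[Y:Y + h, X:X + w]                    # ST7: (h, w) region
```

- Each call returns one cropped document image of size (h, w), which would be passed to both the binary image conversion unit 3 and the image generation unit 4.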
- the binary image conversion unit 3 converts the document image output from the sampling unit 2 into a binary image, and outputs the converted binary image to the parameter update unit 7.
- As the algorithm for converting a document image into a binary image, for example, adaptive threshold processing (processing using an adaptive threshold function) can be used; however, any algorithm capable of converting a document image into a binary image may be used.
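- For reference, adaptive threshold processing of the kind mentioned above is provided, for example, by OpenCV's adaptiveThreshold function; the block size and offset below are arbitrary example values:

```python
import cv2

gray = cv2.imread("document.png", cv2.IMREAD_GRAYSCALE)
# The threshold for each pixel is computed from the mean of its local
# neighborhood, which is robust to uneven illumination across the page.
binary = cv2.adaptiveThreshold(
    gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY,
    11,   # blockSize: size of the local neighborhood
    2)    # C: constant subtracted from the local mean
```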
- FIG. 8 is a flowchart showing the processing contents in the image generation unit 4 of the machine learning device according to Embodiment 1 of the present invention. Hereinafter, the adjustment process of the image generation unit 4 will be specifically described with reference to FIG.
- In the first embodiment, the image generation unit 4 performs the following six adjustment processes; however, the embodiment is not limited to these six. For example, between one and five of the adjustment processes may be performed, or seven or more adjustment processes may be performed.
- the order of the following six adjustment processes may be any order. For example, the order can be determined by a random number.
- Upon receiving the document image output from the sampling unit 2, the image generation unit 4 first performs an adjustment process of adding Gaussian noise, with a variance value determined by a random number, to the luminance value of each pixel constituting the document image (step ST11 in FIG. 8). Next, the image generation unit 4 determines, based on a probability determined by a random number, the pixels to which salt-and-pepper noise is added. The image generation unit 4 then adds salt-and-pepper noise to each determined pixel by changing its luminance value so that it differs greatly from the luminance values of the surrounding pixels (step ST12 in FIG. 8).
- Specifically, if the luminance values of the surrounding pixels are on the black side of the threshold value used for the binarization in the binary image conversion unit 3, the luminance value of the pixel is set to the whitest luminance value; if the luminance values of the surrounding pixels are on the white side of the threshold value, the luminance value of the pixel is set to the blackest luminance value.
- the image generation unit 4 performs Gaussian blurring processing to blur the document image using, for example, a Gaussian function (step ST13 in FIG. 8).
- the image generation unit 4 determines a parameter indicating sharpness using a random number, and performs a process of adjusting the sharpness of the document image according to the determined parameter (step ST14 in FIG. 8).
- the image generation unit 4 determines a parameter indicating the contrast using a random number, and performs a process of adjusting the contrast of the document image according to the determined parameter (step ST15 in FIG. 8).
- the image generation unit 4 determines a parameter indicating the luminance value by a random number, and performs a process of adjusting the luminance value of the document image according to the determined parameter (step ST16 in FIG. 8).
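- The six adjustment processes of steps ST11 to ST16 can be sketched as follows (assuming NumPy, OpenCV, and Pillow; all noise levels and parameter ranges are arbitrary example values, and the salt-and-pepper rule below uses the pixel's own luminance relative to the binarization threshold as a simplification of the surrounding-pixel rule described above):

```python
import cv2
import numpy as np
from PIL import Image, ImageEnhance

def disturb(img, thresh=128):
    """Apply the six adjustment processes (ST11 to ST16) with random parameters."""
    rng = np.random.default_rng()
    img = img.astype(np.float32)
    # ST11: Gaussian noise with a variance determined by a random number.
    img += rng.normal(0.0, rng.uniform(1.0, 10.0), img.shape)
    # ST12: salt-and-pepper noise; dark pixels are flipped to the whitest
    # value and bright pixels to the blackest value.
    mask = rng.random(img.shape) < rng.uniform(0.0, 0.02)
    img[mask] = np.where(img[mask] < thresh, 255.0, 0.0)
    img = np.clip(img, 0, 255).astype(np.uint8)
    # ST13: Gaussian blurring.
    img = cv2.GaussianBlur(img, (5, 5), sigmaX=rng.uniform(0.1, 2.0))
    # ST14 to ST16: sharpness, contrast, and brightness with random parameters.
    pil = Image.fromarray(img)
    for enhancer in (ImageEnhance.Sharpness, ImageEnhance.Contrast,
                     ImageEnhance.Brightness):
        pil = enhancer(pil).enhance(rng.uniform(0.7, 1.3))
    return np.asarray(pil)
```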
- the feature amount extraction unit 5 has a first neural network that inputs the document image output from the image generation unit 4 and outputs the feature amount of the document image.
- As described above, the first neural network of the feature amount extraction unit 5 is a CNN, and the first neural network includes convolution layers that perform convolution on the feature amounts of the document image and pooling layers that perform pooling processing.
- FIG. 9 is an explanatory diagram illustrating a configuration example of a first neural network included in the feature amount extraction unit 5.
- In FIG. 9, INPUT is the image input unit, and the image input from INPUT is the disturbance-affected document image output from the image generation unit 4.
- OUTPUT is a feature value output unit, and the feature value output from OUTPUT is the feature value of the document image.
- Each of CONV(1), CONV(2), and CONV(3) is a convolution layer included in the first neural network. In each convolution layer, the activation function is calculated after the convolution of the feature amounts of the document image; in FIG. 9, the activation function calculation is omitted.
- Each of POOL (1) and POOL (2) is a pooling layer included in the first neural network.
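- A minimal PyTorch sketch of the FIG. 9 structure (CONV(1)-POOL(1)-CONV(2)-POOL(2)-CONV(3)) is shown below; the channel counts, kernel sizes, and ReLU activation are arbitrary example choices, and return_indices=True makes each pooling layer also return the positions of the maxima, which play the role of the position information described later:

```python
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """First neural network: CONV(1)-POOL(1)-CONV(2)-POOL(2)-CONV(3)."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        # return_indices=True also returns the positions of the maxima.
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
        self.act = nn.ReLU()

    def forward(self, x):
        x = self.act(self.conv1(x))
        x, idx1 = self.pool(x)        # POOL(1) with position information
        x = self.act(self.conv2(x))
        x, idx2 = self.pool(x)        # POOL(2) with position information
        x = self.act(self.conv3(x))   # feature amount of the document image
        return x, (idx1, idx2)
```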
- FIG. 10 is an explanatory diagram illustrating a convolution process in the convolution layer.
- the input feature amount shown in FIG. 10 includes feature amounts of a plurality of regions in the document image input to the convolution layer, and the feature amounts of the plurality of regions correspond to an input feature amount map that is data of a two-dimensional structure.
- FIG. 10 shows an example in which the document image has 25 regions (5 × 5 in the figure); that is, 5 regions in the A direction and 5 regions in the B direction.
- The input feature amount includes k input feature amount maps (k is an integer of 1 or more). If the input feature amount includes two or more input feature amount maps, the input feature amount is expressed as data having a three-dimensional structure. In FIG. 10, the input feature amount is represented as k maps.
- For example, if the input image is an RGB color image, the input feature amount includes three input feature amount maps: an input feature amount map for R, an input feature amount map for G, and an input feature amount map for B.
- the convolution layer includes a weight filter that is a convolution target, and the weight filter is called a kernel.
- the two-dimensional size of the kernel is 3 in the A direction and 3 in the B direction.
- the kernel is data having a three-dimensional structure and has the same depth size as the input feature amount map. Therefore, if the input feature quantity includes k input feature quantity maps, the kernel depth size is k. In FIG. 10, the kernel is represented by a k map.
- The feature amount extraction unit 5 performs the convolution calculation shown in the following equation (1) while moving the kernel over the plane of the input feature amount map:

  y(c1, c2) = Σ_{b1} Σ_{b2} Σ_{b3} w(b1, b2, b3) · x(c1 + b1 − pad_A, c2 + b2 − pad_B, b3)   (1)

- Here, "c1 + b1 − pad_A" corresponds to "a1" in the input feature amount x(a1, a2, a3), "c2 + b2 − pad_B" corresponds to "a2", and "b3" corresponds to "a3".
- w(b1, b2, b3) is a parameter indicating a kernel weight value, and is a parameter of the first neural network updated by the parameter update unit 7.
- y(c1, c2) is the output feature amount of each region in the document image.
- Depending on the padding and stride parameters of the convolution process, the size of the output feature amount map, which holds the feature amounts of the plurality of regions output from the convolution layer, changes.
- FIG. 10 shows an example in which the input feature value map and the output feature value map are maps of the same size.
- the feature quantity extraction unit 5 may perform zero padding to fill the input feature quantity x (a1, a2, a3) having a negative index with zero.
- zero padding is not essential, and zero padding may not be performed.
- parameters related to the convolution process include a stride parameter indicating the movement amount of the kernel.
- In FIG. 10, the feature amount y(1, 1) of one region in the output feature amount is calculated from the feature amounts x(a1, a2, a3) of nine regions in the input feature amount, namely x(0, 0, 0), x(0, 1, 0), x(0, 2, 0), x(1, 0, 0), x(1, 1, 0), x(1, 2, 0), x(2, 0, 0), x(2, 1, 0), and x(2, 2, 0).
- Equation (1) shows an example in which there is one kernel. When there are a plurality of kernels, the convolution calculation is given by the following equation (2), where w_k denotes the weight values of the k-th kernel:

  y(c1, c2, k) = Σ_{b1} Σ_{b2} Σ_{b3} w_k(b1, b2, b3) · x(c1 + b1 − pad_A, c2 + b2 − pad_B, b3)   (2)
- k is an index of the output feature amount map.
- the number of output feature amount maps is the same as the number of kernels.
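- The convolution of equations (1) and (2) can be written directly in NumPy as in the following sketch (assuming stride 1; zero padding supplies x at indices that fall outside the input):

```python
import numpy as np

def conv2d(x, w, pad_a=1, pad_b=1):
    """Equation (2): x has shape (A, B, k); w has shape (K, kA, kB, k)."""
    A, B, k = x.shape
    K, kA, kB, _ = w.shape
    # Zero padding so that index c + b - pad always falls inside the array.
    xp = np.zeros((A + 2 * pad_a, B + 2 * pad_b, k), dtype=x.dtype)
    xp[pad_a:pad_a + A, pad_b:pad_b + B, :] = x
    out_A = A + 2 * pad_a - kA + 1
    out_B = B + 2 * pad_b - kB + 1
    y = np.zeros((out_A, out_B, K))
    for kk in range(K):                       # one output map per kernel
        for c1 in range(out_A):
            for c2 in range(out_B):
                # Sum over b1, b2, b3 of w_k(b1, b2, b3) *
                # x(c1 + b1 - pad_A, c2 + b2 - pad_B, b3), as in equation (2).
                y[c1, c2, kk] = np.sum(w[kk] * xp[c1:c1 + kA, c2:c2 + kB, :])
    return y
```

- With a 3 × 3 kernel and pad_a = pad_b = 1, the output feature amount map has the same size as the input feature amount map, as in FIG. 10.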
- FIG. 11 is an explanatory diagram showing a pooling process in the pooling layer.
- The pooling process in the pooling layer implemented by the feature amount extraction unit 5 differs from a general pooling process. Like a general pooling process, for each local region, which is a partial region of the output feature amount map, it extracts and outputs the maximum feature amount among the feature amounts included in that local region. Unlike a general pooling process, however, it also outputs position information indicating the position in the document image at which the extracted feature amount exists.
- the structure of the pooling layer changes depending on the two-dimensional size of the kernel, padding parameters, and stride parameters.
- FIG. 11 shows an example in which, from an input feature amount of (4 × 4 × K), with 4 in the A direction, 4 in the B direction, and K in the C direction, an output feature amount of (2 × 2 × K), with 2 in the A direction, 2 in the B direction, and K in the C direction, is obtained. In this example, the kernel stride value is 2 and the padding value is zero.
- the maximum feature amount is extracted from the feature amounts included in the local region for each (2 ⁇ 2 ⁇ K) local region in the input feature amount map by the pooling process.
- For example, the feature amount y(0, 0, 0) of one region in the output feature amount is obtained by extracting the maximum feature amount x(0, 0, 0) from among the feature amounts of four regions in the input feature amount, namely x(0, 0, 0), x(0, 1, 0), x(1, 0, 0), and x(1, 1, 0). The extracted maximum feature amount x(0, 0, 0) is output, and position information indicating the position in the document image at which the extracted maximum feature amount x(0, 0, 0) exists is also output.
- FIG. 11 illustrates a position map indicating the local maximum value position as position information indicating the position in the document image where the maximum feature amount exists.
- the position map is expressed as data having the same three-dimensional structure as the input feature amount.
- “1” is written in the position in the document image corresponding to the maximum feature value
- “0” is written in the position in the document image corresponding to the feature value other than the maximum feature value.
- the position information is a position map having a three-dimensional structure.
- The pooling calculation is shown in the following equation (3), which outputs one feature amount from a local region including four regions in the input feature amount map using the operator f(.):

  y(c1, c2, c3) = f(x(2c1, 2c2, c3), x(2c1 + 1, 2c2, c3), x(2c1, 2c2 + 1, c3), x(2c1 + 1, 2c2 + 1, c3))   (3)
- The first embodiment shows an example in which the operator f(.) calculates the maximum value; pooling using such an operator f(.) is called max pooling (Max Pooling).
- the maximum value of the local area is calculated as one feature amount, and at the same time, position information indicating the position in the document image that is the maximum value is also calculated.
- position information indicating two or more positions may be output.
- the pooling process in the pooling layer is not limited to the maximum pooling, and may be another pooling process such as an average pooling (Average Pooling).
- In this case, the operator f(.) in equation (3) is an operator that calculates the average value.
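- A NumPy sketch of the 2 × 2, stride-2 max pooling of FIG. 11, returning both the pooled feature amounts and the position map ("1" at the positions of the maxima, "0" elsewhere), could look like this:

```python
import numpy as np

def max_pool_with_positions(x):
    """2x2, stride-2 max pooling over x of shape (A, B, K), per equation (3)."""
    A, B, K = x.shape
    y = np.zeros((A // 2, B // 2, K))
    pos = np.zeros_like(x)              # position map, same shape as the input
    for c1 in range(A // 2):
        for c2 in range(B // 2):
            for c3 in range(K):
                local = x[2 * c1:2 * c1 + 2, 2 * c2:2 * c2 + 2, c3]
                y[c1, c2, c3] = local.max()
                # Record where the maximum came from ("1" in the position map).
                i, j = np.unravel_index(np.argmax(local), local.shape)
                pos[2 * c1 + i, 2 * c2 + j, c3] = 1.0
    return y, pos
```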
- In the first embodiment, the first neural network of the feature amount extraction unit 5 is a CNN. However, the first neural network is not limited to a CNN; it may be, for example, another multilayer neural network such as a deep neural network.
- As described above, the image reconstruction unit 6 has a second neural network that inputs the feature amount output from the first neural network of the feature amount extraction unit 5 and outputs a reconstructed image, which is the image reconstructed as a binary image. The second neural network of the image reconstruction unit 6 is a CNN, and includes inverse pooling layers that perform inverse pooling processing and convolution layers that perform convolution on the feature amounts of the binary image.
- FIG. 12 is an explanatory diagram illustrating a configuration example of the second neural network included in the image reconstruction unit 6.
- In FIG. 12, INPUT is the feature amount input unit, and the feature amount input from INPUT is the feature amount output from the first neural network of the feature amount extraction unit 5.
- OUTPUT is the feature amount output unit, and the size of the output feature amount, which holds the feature amounts of the plurality of regions output from OUTPUT, is the same as the size of the input feature amount shown in FIG. 9.
- Each of UNPOOL (1) and UNPOOL (2) is an inverse pooling layer included in the second neural network.
- UNPOOL(1) included in the second neural network corresponds to POOL(2) shown in FIG. 9, and UNPOOL(2) included in the second neural network corresponds to POOL(1) shown in FIG. 9.
- Each of CONV (1), CONV (2), and CONV (3) is a convolutional layer included in the second neural network.
- In each convolution layer, the activation function is calculated after the convolution of the feature amounts of the reconstructed image; in FIG. 12, the activation function calculation is omitted.
- The convolution layers of the second neural network of the image reconstruction unit 6 are similar to the convolution layers of the first neural network of the feature amount extraction unit 5, and the size of the input feature amount map and the size of the output feature amount map are the same. On the other hand, the output feature amount output from the first neural network of the feature amount extraction unit 5 needs to be restored to the size of the input feature amount shown in FIG. 9; that is, the feature amount map whose size was reduced by the pooling processing of the pooling layers of the first neural network must be restored to its original size. For this reason, the second neural network of the image reconstruction unit 6 includes inverse pooling layers corresponding to the pooling layers of the first neural network of the feature amount extraction unit 5.
- FIG. 13 is an explanatory diagram illustrating reverse pooling processing in the reverse pooling layer.
- The size of the output feature amount of the pooling layer corresponding to an inverse pooling layer matches the size of the input feature amount of that inverse pooling layer. Likewise, the size of the input feature amount of the pooling layer corresponding to an inverse pooling layer matches the size of the output feature amount of that inverse pooling layer.
- The inverse pooling process shown in FIG. 13 converts a (2 × 2 × K) input feature amount into a (4 × 4 × K) output feature amount. In this process, the position information acquired from the corresponding pooling layer is input; in the (4 × 4 × K) output feature amount, the value of the input feature amount is inserted at the position of the maximum value indicated by the position information, and zero is inserted into the feature amounts at all other positions.
- For example, the input feature amount x(0, 0, 0) is inserted at the second position from the left in the A direction and the second position from the top in the B direction of the (4 × 4 × K) output feature amount. The input feature amount x(0, 1, 0) is inserted at the third position from the left in the A direction and the first position from the top in the B direction. The input feature amount x(1, 0, 0) is inserted at the second position from the left in the A direction and the third position from the top in the B direction. The input feature amount x(1, 1, 0) is inserted at the third position from the left in the A direction and the fourth position from the top in the B direction. Zeros are inserted at all other positions of the (4 × 4 × K) output feature amount.
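- The inverse pooling process is the exact counterpart of the pooling sketch given earlier: it scatters each input feature amount to the position marked in the position map and fills every other position with zero. A minimal NumPy sketch:

```python
import numpy as np

def unpool_with_positions(y, pos):
    """Inverse of 2x2, stride-2 max pooling: y is (A/2, B/2, K), pos is (A, B, K)."""
    out = np.zeros_like(pos)
    A2, B2, K = y.shape
    for c1 in range(A2):
        for c2 in range(B2):
            for c3 in range(K):
                local = pos[2 * c1:2 * c1 + 2, 2 * c2:2 * c2 + 2, c3]
                i, j = np.unravel_index(np.argmax(local), local.shape)
                # Insert the input value at the recorded maximum position;
                # every other position keeps its initial zero.
                out[2 * c1 + i, 2 * c2 + j, c3] = y[c1, c2, c3]
    return out
```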
- the parameter update unit 7 calculates a difference between the reconstructed image output from the second neural network included in the image reconstructing unit 6 and the binary image output from the binary image conversion unit 3.
- The parameter update unit 7 updates each of the parameters of the first neural network of the feature amount extraction unit 5 and the parameters of the second neural network of the image reconstruction unit 6 so that the calculated difference is minimized. That is, the parameter update unit 7 updates each of the parameters w(b1, b2, b3) indicating the kernel weight values in the first neural network and the parameters w(b1, b2, b3) indicating the kernel weight values in the second neural network so that the calculated difference is minimized.
- The difference between the reconstructed image and the binary image calculated by the parameter update unit 7 may be, for example, the mean square error (MSE: Mean Square Error) between the reconstructed image and the binary image, or the cross-entropy between the reconstructed image and the binary image. As the optimization algorithm with which the parameter update unit 7 updates the parameters so that the difference is minimized, for example, stochastic gradient descent can be used.
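- Assuming the encoder (first neural network) and decoder (second neural network) are implemented as PyTorch modules, with the decoder accepting the pooling position information as in the earlier encoder sketch, the parameter update reduces to a standard reconstruction-loss training step; MSELoss and SGD below correspond to the mean square error and stochastic gradient descent mentioned above, while the learning rate is an arbitrary example value:

```python
import torch
import torch.nn as nn

def update_parameters(encoder, decoder, disturbed_batch, binary_batch, lr=0.01):
    """One update of both networks from a disturbed batch and its binary target."""
    params = list(encoder.parameters()) + list(decoder.parameters())
    optimizer = torch.optim.SGD(params, lr=lr)    # stochastic gradient descent
    criterion = nn.MSELoss()                      # mean square error

    features, indices = encoder(disturbed_batch)   # feature amounts
    reconstructed = decoder(features, indices)     # reconstructed image
    loss = criterion(reconstructed, binary_batch)  # difference to be minimized

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()          # updates the parameters of both networks
    return loss.item()
```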
- The parameter storage unit 8 stores, as the parameters of the first neural network updated by the parameter update unit 7, the parameters w(b1, b2, b3) indicating the kernel weight values in the first neural network, and, as the parameters of the second neural network updated by the parameter update unit 7, the parameters w(b1, b2, b3) indicating the kernel weight values in the second neural network.
- The first feature amount detection unit 34 includes a first neural network that inputs the registered image stored in the registered image storage unit 31 and outputs the feature amount of the registered image, and that inputs the recognition target image stored in the recognition target image storage unit 32 and outputs the feature amount of the recognition target image.
- The first neural network of the first feature amount detection unit 34 is the same neural network as the first neural network of the feature amount extraction unit 5 of the machine learning device in FIG. 1. Therefore, the parameters w(b1, b2, b3) indicating the kernel weight values in the first neural network of the first feature amount detection unit 34 have been optimized by the parameter update unit 7 of the machine learning device of FIG. 1 so that the difference between the reconstructed image and the binary image is minimized.
- The image regeneration unit 35 includes a second neural network that inputs the feature amount of the registered image output from the first neural network of the first feature amount detection unit 34 and outputs a reconstructed registered image, which is an image obtained by reconstructing the registered image, and that inputs the feature amount of the recognition target image output from the first neural network of the first feature amount detection unit 34 and outputs a reconstructed recognition image, which is an image obtained by reconstructing the recognition target image.
- the second neural network included in the image regeneration unit 35 is the same neural network as the second neural network included in the image reconstruction unit 6 of the machine learning device in FIG.
- The parameters w(b1, b2, b3) indicating the kernel weight values in the second neural network of the image regeneration unit 35 have been optimized by the parameter update unit 7 of the machine learning device of FIG. 1 so that the difference between the reconstructed image and the binary image is minimized.
- The second feature amount detection unit 36 includes a first neural network that inputs the reconstructed registered image output from the second neural network of the image regeneration unit 35 and outputs the feature amount of the reconstructed registered image, and that inputs the reconstructed recognition image output from the second neural network of the image regeneration unit 35 and outputs the feature amount of the reconstructed recognition image.
- The first neural network of the second feature amount detection unit 36 is the same neural network as the first neural network of the feature amount extraction unit 5 of the machine learning device in FIG. 1. Accordingly, the parameters w(b1, b2, b3) indicating the kernel weight values in the first neural network of the second feature amount detection unit 36 have been optimized by the parameter update unit 7 of the machine learning device of FIG. 1 so that the difference between the reconstructed image and the binary image is minimized.
- the pooling layer included in the first neural network included in the second feature quantity detection unit 36 does not need to output position information.
- The first neural network of the feature amount extraction unit 5 and the second neural network of the image reconstruction unit 6 of the machine learning device in FIG. 1 are CNNs, and the first neural network of the first feature amount detection unit 34 and the first neural network of the second feature amount detection unit 36 have only the convolution layer kernels as free parameters. For this reason, as long as the kernel sizes are the same, the parameters updated by the parameter update unit 7 can be used as learned parameters. Therefore, the size of the feature amount map of each convolution layer in the image recognition apparatus may differ from the size of the feature amount map of the corresponding convolution layer in the machine learning device of FIG. 1.
- The registered image storage unit 31 stores one or more document images as registered images. A registered image may be the same document image as a document image stored in the learning image storage unit 1 of the machine learning device in FIG. 1, or may be a document image different from the document images stored in the learning image storage unit 1.
- the recognition target image storage unit 32 stores a recognition target image that is a document image to be recognized.
- The feature amount detection unit 33 performs registration processing for registering the feature amounts of the registered images stored in the registered image storage unit 31 in the feature amount storage unit 37, and performs preprocessing for extracting the feature amount of the recognition target image stored in the recognition target image storage unit 32 so as to enable the search processing for finding a similar registered image.
- First, the first feature amount detection unit 34 of the feature amount detection unit 33 inputs the registered images one at a time from the one or more registered images stored in the registered image storage unit 31. When the first neural network of the first feature amount detection unit 34 receives a registered image, it outputs the feature amount of that registered image.
- the first feature amount detection unit 34 stores the feature amount of the registered image in the feature amount storage unit 37 and outputs the feature amount of the registered image to the image regeneration unit 35.
- the image regeneration unit 35 of the feature amount detection unit 33 inputs the feature amount of the registered image output from the first feature amount detection unit 34.
- When the second neural network of the image regeneration unit 35 receives the feature amount of the registered image, it outputs a reconstructed registered image, which is an image obtained by reconstructing the registered image.
- the image regeneration unit 35 outputs the reconstructed registration image to the second feature amount detection unit 36 and the image search unit 38.
- The image storage unit 38a of the image search unit 38 stores the reconstructed registered image output from the image regeneration unit 35. Note that the reconstructed registered image corresponds to the binary image of the registered image; however, because it is reconstructed from the feature amount of the registered image, it does not necessarily match that binary image completely.
- the second feature quantity detection unit 36 of the feature quantity detection unit 33 receives the reconstructed registration image output from the image regeneration unit 35.
- When the first neural network of the second feature amount detection unit 36 receives the reconstructed registered image, it outputs the feature amount of the reconstructed registered image.
- The second feature amount detection unit 36 stores the feature amount of the reconstructed registered image in the feature amount storage unit 37. If the number of registered images stored in the registered image storage unit 31 is N, the feature amount storage unit 37 stores the feature amounts of the N registered images output from the first feature amount detection unit 34 and the feature amounts of the N reconstructed registered images output from the second feature amount detection unit 36.
- Because the feature amount of the reconstructed registered image output from the second feature amount detection unit 36 is extracted from the reconstructed registered image, which is the image reconstructed by the image regeneration unit 35, the influence of disturbance is removed from it to a greater extent than from the feature amount of the registered image output from the first neural network of the first feature amount detection unit 34.
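- Assuming the same encoder (first neural network) and decoder (second neural network) modules as in the earlier sketches, the registration processing for N registered images can be outlined as follows; feature_store stands in for the feature amount storage unit 37 and image_store for the image storage unit 38a, both illustrative names:

```python
def register_images(encoder, decoder, registered_images,
                    feature_store, image_store):
    """Registration processing of the feature amount detection unit 33."""
    for reg_id, image in enumerate(registered_images):
        feat, idx = encoder(image)       # first feature amount detection unit 34
        recon = decoder(feat, idx)       # image regeneration unit 35
        recon_feat, _ = encoder(recon)   # second feature amount detection unit 36
        feature_store[reg_id] = (feat, recon_feat)  # feature amount storage unit 37
        image_store[reg_id] = recon                 # image storage unit 38a
```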
- Next, as preprocessing of the search processing, the first feature amount detection unit 34 of the feature amount detection unit 33 inputs the recognition target image, which is the document image to be recognized stored in the recognition target image storage unit 32. The first neural network of the first feature amount detection unit 34 outputs the feature amount of the recognition target image, and the first feature amount detection unit 34 outputs this feature amount to each of the image regeneration unit 35 and the image search unit 38.
- the image regeneration unit 35 included in the feature amount detection unit 33 inputs the feature amount of the recognition target image output from the first feature amount detection unit 34.
- When the second neural network of the image regeneration unit 35 receives the feature amount of the recognition target image, it outputs a reconstructed recognition image, which is an image obtained by reconstructing the recognition target image.
- the image regeneration unit 35 outputs the reconstructed recognition image to the second feature amount detection unit 36.
- the second feature amount detection unit 36 of the feature amount detection unit 33 receives the reconstructed recognition image output from the image regeneration unit 35.
- When the first neural network of the second feature amount detection unit 36 receives the reconstructed recognition image, it outputs the feature amount of the reconstructed recognition image.
- the second feature amount detection unit 36 outputs the feature amount of the reconstructed recognition image to the image search unit 38.
- when the image search unit 38 searches for a registered image similar to the recognition target image, it may simply compare the feature amount of the registered image output from the first feature amount detection unit 34 with the feature amount of the recognition target image output from the first feature amount detection unit 34. However, it is preferable to compare the feature amount of the reconstructed registered image output from the second feature amount detection unit 36, from which much of the influence of disturbances has been removed, with the feature amount of the reconstructed recognition image output from the second feature amount detection unit 36, from which much of the influence of disturbances has likewise been removed.
- whether the feature amounts to be compared by the image search unit 38 are the feature amount of the registered image and the feature amount of the recognition target image output from the first feature amount detection unit 34, or the feature amount of the reconstructed registered image and the feature amount of the reconstructed recognition image output from the second feature amount detection unit 36, is assumed to be set in advance by the user.
- hereinafter, the setting in which the feature amounts compared by the image search unit 38 are the feature amount of the registered image and the feature amount of the recognition target image output from the first feature amount detection unit 34 is referred to as "setting A".
- the setting in which the feature amounts compared by the image search unit 38 are the feature amount of the reconstructed registered image and the feature amount of the reconstructed recognition image output from the second feature amount detection unit 36 is referred to as "setting B".
- if setting A is selected, the image search unit 38 calculates, for each of the feature amounts of the one or more registered images stored in the feature amount storage unit 37 (the feature amounts output from the first feature amount detection unit 34), the similarity between that feature amount and the feature amount of the recognition target image output from the first feature amount detection unit 34.
- the algorithm for calculating the similarity between feature amounts is not particularly limited; for example, the cosine similarity can be used.
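- for instance, the cosine similarity between two feature amounts, and its use to rank the stored registered-image feature amounts against the feature amount of the recognition target image, could look like the following sketch (NumPy; the function names, feature shapes, and placeholder data are assumptions).

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature amounts, flattened to vectors."""
    a, b = a.ravel(), b.ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# rank N stored registered-image feature amounts against the query feature
stored_features = [np.random.rand(16, 16, 16) for _ in range(5)]  # placeholders
query_feature = np.random.rand(16, 16, 16)                        # placeholder
similarities = [cosine_similarity(f, query_feature) for f in stored_features]
best_index = int(np.argmax(similarities))  # registered image with highest similarity
```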
- when the image search unit 38 has calculated the similarity between the feature amount of each of the one or more registered images output from the first feature amount detection unit 34 and the feature amount of the recognition target image output from the first feature amount detection unit 34, it identifies, among the one or more registered images stored in the registered image storage unit 31, the registered image with the highest calculated similarity.
- the image search unit 38 outputs the identified registered image as the search result, that is, as the registered image similar to the recognition target image. In this case, because neither the image regeneration unit 35 nor the second feature amount detection unit 36 needs to perform any processing when the image search unit 38 searches for a registered image similar to the recognition target image, the time until the search result is obtained can be shortened.
- if setting B is selected, the image search unit 38 calculates, for each of the feature amounts of the one or more reconstructed registered images stored in the feature amount storage unit 37 (the feature amounts output from the second feature amount detection unit 36), the similarity between that feature amount and the feature amount of the reconstructed recognition image output from the second feature amount detection unit 36.
- when the image search unit 38 has calculated the similarity between the feature amount of each of the one or more reconstructed registered images output from the second feature amount detection unit 36 and the feature amount of the reconstructed recognition image output from the second feature amount detection unit 36, it identifies, among the one or more reconstructed registered images stored in the image storage unit 38a, the reconstructed registered image with the highest calculated similarity.
- the image search unit 38 outputs the registered image corresponding to the identified reconstructed registered image as the search result, that is, as the registered image similar to the recognition target image.
- in this case, because the image regeneration unit 35 and the second feature amount detection unit 36 need to perform processing, it takes longer to obtain the search result than with setting A. However, even when the registered image and the recognition target image were acquired in different environments, degradation of the accuracy of searching for a registered image similar to the recognition target image can be suppressed.
- in the above, the example in which the image search unit 38 calculates the similarity between the feature amount of each of the one or more reconstructed registered images output from the second feature amount detection unit 36 and the feature amount of the reconstructed recognition image output from the second feature amount detection unit 36 has been described; however, the present invention is not limited to this. For example, the similarity may be calculated as follows.
- the image search unit 38 calculates the similarity (hereinafter referred to as similarity R1) between the feature amount of each of the one or more registered images output from the first feature amount detection unit 34 and the feature amount of the recognition target image output from the first feature amount detection unit 34.
- the image search unit 38 also calculates the similarity (hereinafter referred to as similarity R2) between the feature amount of each of the one or more reconstructed registered images output from the second feature amount detection unit 36 and the feature amount of the reconstructed recognition image output from the second feature amount detection unit 36.
- the image search unit 38 then calculates the average value of the similarity R1 and the similarity R2, or a weighted addition value of the similarity R1 and the similarity R2, as the final similarity R.
- the image search unit 38 specifies a registered image having the highest calculated similarity R among one or more registered images.
- the image search unit 38 outputs the specified registered image as a search result of registered images similar to the recognition target image.
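- a short sketch of this combination follows (the weights are assumptions; equal weights of 0.5 reduce the weighted addition to the plain average). Weighting lets the comparison lean either on the raw feature amounts (R1) or on the disturbance-removed feature amounts (R2), depending on how noisy the acquisition environment is expected to be.

```python
def final_similarity(r1: float, r2: float,
                     w1: float = 0.5, w2: float = 0.5) -> float:
    """Weighted addition of similarity R1 and similarity R2; with
    w1 = w2 = 0.5 this equals the average value of R1 and R2."""
    return w1 * r1 + w2 * r2
```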
- in the above, the example in which the image search unit 38 calculates, for each registered image stored in the registered image storage unit 31, the similarity between the feature amount of the registered image and the feature amount of the recognition target image has been shown.
- when a plurality of registered images of the same type are stored, the image search unit 38 may instead calculate the similarity between the feature amount of each of the plurality of same-type registered images stored in the feature amount storage unit 37 and the feature amount of the recognition target image, and may use the average value of the calculated similarities as the similarity between that type of registered image and the recognition target image. A sketch of this per-type averaging is given below.
- in this case, the image search unit 38 searches for the same-type registered image similar to the recognition target image from among the M types of same-type registered images.
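- the per-type averaging referenced above could be sketched as follows (the grouping scheme, function names, and example values are assumptions).

```python
from collections import defaultdict

def similarity_per_type(similarities, type_labels):
    """Average the per-image similarities within each registered-image type;
    the type with the highest averaged similarity is the search result."""
    buckets = defaultdict(list)
    for sim, label in zip(similarities, type_labels):
        buckets[label].append(sim)
    return {label: sum(v) / len(v) for label, v in buckets.items()}

# e.g. three registered images of type "invoice", two of type "application"
scores = similarity_per_type(
    [0.91, 0.88, 0.93, 0.41, 0.45],
    ["invoice", "invoice", "invoice", "application", "application"])
best_type = max(scores, key=scores.get)  # -> "invoice"
```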
- as is apparent from the above, according to the first embodiment, there are provided the binary image conversion unit 3 that converts the learning image into a binary image, the feature amount extraction unit 5 having the first neural network that inputs the learning image and outputs the feature amount of the learning image, and the image reconstruction unit 6 having the second neural network that inputs the feature amount output from the first neural network and outputs a reconstructed image, and the parameter update unit 7 updates the parameters of the first neural network and the parameters of the second neural network according to the difference between the reconstructed image output from the second neural network and the binary image converted by the binary image conversion unit 3. This offers the advantage that the parameters of the first neural network, which inputs the learning image and outputs the feature amount of the learning image, can be updated without using teacher data.
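- this update rule can be sketched as an autoencoder-style training loop. The following is a minimal illustration in PyTorch with assumed architectures, loss function, and optimizer (the disclosure does not fix any of these choices), using random tensors in place of actual learning images.

```python
import torch
import torch.nn as nn

first_nn = nn.Sequential(                       # feature amount extraction unit 5
    nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU())
second_nn = nn.Sequential(                      # image reconstruction unit 6
    nn.ConvTranspose2d(16, 8, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(8, 1, 4, stride=2, padding=1), nn.Sigmoid())

# parameter update unit 7: updates both networks from the reconstruction error
optimizer = torch.optim.Adam(
    list(first_nn.parameters()) + list(second_nn.parameters()))
loss_fn = nn.BCELoss()                          # one possible difference measure

def binarize(image: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """Stands in for the binary image conversion unit 3."""
    return (image > threshold).float()

for step in range(100):
    learning_image = torch.rand(8, 1, 64, 64)   # placeholder learning images
    target = binarize(learning_image)           # binary image; no labels needed
    reconstruction = second_nn(first_nn(learning_image))
    loss = loss_fn(reconstruction, target)      # difference to the binary image
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                            # updates both networks' parameters
```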
- further, according to the first embodiment, the image recognition device includes the feature amount detection unit 33 having a neural network that outputs the feature amount of a registered image when the registered image is given and outputs the feature amount of a recognition target image when the recognition target image is given, the feature amount storage unit 37 that stores the feature amount of the registered image output from the neural network included in the feature amount detection unit 33, and the image search unit 38 that compares the feature amounts of the one or more registered images stored in the feature amount storage unit 37 with the feature amount of the recognition target image output from the neural network included in the feature amount detection unit 33 and searches the one or more registered images for a registered image similar to the recognition target image. Since the parameters of the neural network included in the feature amount detection unit 33 have been updated by the machine learning device, this offers the advantage that a registered image similar to the recognition target image can be retrieved.
- the user can grasp the type of the recognition target image by confirming the type of the registered image searched for by the image recognition device.
- it has been described above that the registered image stored in the registered image storage unit 31 of the image recognition device of FIG. 4 may be the same image as the document image stored in the learning image storage unit 1 of the machine learning device of FIG. 1, or may be an image different from that document image.
- even when the machine learning device of FIG. 1 learns using a document image different from the registered image and the recognition target image, a registered image similar to the recognition target image can be searched for, provided that the environment in which the document image was acquired is similar to the environments in which the registered image and the recognition target image are acquired, or that the genre of the document is similar among the document image, the registered image, and the recognition target image.
- here, the acquisition environment includes not only the environment in which an image is photographed but also differences in the image acquisition equipment.
- as examples of documents whose genre is similar, application forms of different banks or forms of different administrative institutions are conceivable.
- Embodiment 2. In the first embodiment, the initial state of the parameters of the first neural network included in the feature amount extraction unit 5 and of the second neural network included in the image reconstruction unit 6 is not particularly specified.
- in the second embodiment, the first neural network included in the feature amount extraction unit 5 is a neural network whose parameters have been learned in advance based on some learning data.
- similarly, the second neural network included in the image reconstruction unit 6 is a neural network whose parameters have been learned in advance based on some learning data.
- in the second embodiment, it is assumed that the learning image storage unit 1 of the machine learning device in FIG. 1 stores the recognition target image as a document image, and that the parameter update unit 7 updates the parameters of the first neural network and the parameters of the second neural network based on the recognition target image, as in the first embodiment. In this case, although the learning time increases compared with the first embodiment, a registered image similar to the recognition target image can be searched for more accurately than in the first embodiment.
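- a brief sketch of initializing from parameters learned in advance follows (all names are assumptions, and a freshly built network stands in for the pretrained one; in practice the previously learned parameters would be loaded from storage before continuing the update loop of the first embodiment).

```python
import torch
import torch.nn as nn

def make_first_nn() -> nn.Module:
    """Builds the (assumed) architecture of the first neural network."""
    return nn.Sequential(
        nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU())

# a network whose parameters were learned in advance on some learning data
pretrained = make_first_nn()

# Embodiment 2: start from the pretrained parameters rather than random ones,
# then continue updating as in Embodiment 1 on the stored recognition target images
first_nn = make_first_nn()
first_nn.load_state_dict(pretrained.state_dict())
```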
- the present invention is suitable for a machine learning apparatus that inputs a document image and updates a parameter of a neural network that outputs a feature amount of the document image.
- the present invention is also suitable for an image recognition apparatus that searches for registered images that are similar to the recognition target image.
Abstract
The invention concerns a machine learning device comprising: a binary image conversion unit (3) for converting a learning image into a binary image; a feature amount extraction unit (5) having a first neural network for accepting the learning image as input and outputting a feature amount of the learning image; and an image reconstruction unit (6) having a second neural network for accepting the feature amount output from the first neural network as input and outputting a reconstructed image. A parameter update unit (7) updates a parameter of the first neural network and a parameter of the second neural network according to a difference between the reconstructed image output from the second neural network and the binary image converted by the binary image conversion unit (3).
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2017/013603 WO2018179338A1 (fr) | 2017-03-31 | 2017-03-31 | Dispositif d'apprentissage machine et dispositif de reconnaissance d'image |
| JP2017554102A JP6320649B1 (ja) | 2017-03-31 | 2017-03-31 | 機械学習装置及び画像認識装置 |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2017/013603 WO2018179338A1 (fr) | 2017-03-31 | 2017-03-31 | Dispositif d'apprentissage machine et dispositif de reconnaissance d'image |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2018179338A1 true WO2018179338A1 (fr) | 2018-10-04 |
Family
ID=62105884
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2017/013603 Ceased WO2018179338A1 (fr) | 2017-03-31 | 2017-03-31 | Dispositif d'apprentissage machine et dispositif de reconnaissance d'image |
Country Status (2)
| Country | Link |
|---|---|
| JP (1) | JP6320649B1 (fr) |
| WO (1) | WO2018179338A1 (fr) |
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2020160582A (ja) * | 2019-03-25 | 2020-10-01 | 三菱電機株式会社 | 特徴特定装置、特徴特定方法及び特徴特定プログラム |
| CN112541876A (zh) * | 2020-12-15 | 2021-03-23 | 北京百度网讯科技有限公司 | 卫星图像处理方法、网络训练方法、相关装置及电子设备 |
| WO2021131248A1 (fr) * | 2019-12-24 | 2021-07-01 | 株式会社日立製作所 | Dispositif de recherche d'objet et procédé de recherche d'objet |
| JP2024008989A (ja) * | 2018-11-13 | 2024-01-19 | 株式会社半導体エネルギー研究所 | 画像検索システム及び画像検索方法 |
| JP7611315B1 (ja) | 2023-08-04 | 2025-01-09 | 三菱重工機械システム株式会社 | 学習用画像生成方法、学習用画像生成装置、学習用画像生成プログラム及び学習モデル生成方法 |
Families Citing this family (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2019211912A (ja) * | 2018-06-01 | 2019-12-12 | 日本電信電話株式会社 | 学習装置、検索装置、方法、及びプログラム |
| WO2020129231A1 (fr) * | 2018-12-21 | 2020-06-25 | 三菱電機株式会社 | Dispositif d'estimation de direction de source sonore, procédé d'estimation de direction de source sonore, et programme d'estimation de direction de source sonore |
| JP7269778B2 (ja) * | 2019-04-04 | 2023-05-09 | 富士フイルムヘルスケア株式会社 | 超音波撮像装置、および、画像処理装置 |
| JP7368995B2 (ja) * | 2019-09-30 | 2023-10-25 | セコム株式会社 | 画像認識システム、撮像装置、認識装置及び画像認識方法 |
| WO2021152715A1 (fr) * | 2020-01-29 | 2021-08-05 | 日本電信電話株式会社 | Dispositif d'apprentissage, dispositif de recherche, procédé d'apprentissage, procédé de recherche, et programme |
| CN113470831B (zh) * | 2021-09-03 | 2021-11-16 | 武汉泰乐奇信息科技有限公司 | 一种基于数据简并的大数据转换方法与装置 |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPH11212990A (ja) * | 1998-01-26 | 1999-08-06 | Toray Ind Inc | 画像の検索装置および画像の検索表示方法ならびに物品の製造方法 |
-
2017
- 2017-03-31 WO PCT/JP2017/013603 patent/WO2018179338A1/fr not_active Ceased
- 2017-03-31 JP JP2017554102A patent/JP6320649B1/ja active Active
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2005092465A (ja) * | 2003-09-16 | 2005-04-07 | Fuji Xerox Co Ltd | データ認識装置 |
| US20150238148A1 (en) * | 2013-10-17 | 2015-08-27 | Siemens Aktiengesellschaft | Method and system for anatomical object detection using marginal space deep neural networks |
| JP2016004549A (ja) * | 2014-06-19 | 2016-01-12 | ヤフー株式会社 | 特定装置、特定方法及び特定プログラム |
| US20170076438A1 (en) * | 2015-08-31 | 2017-03-16 | Cape Analytics, Inc. | Systems and methods for analyzing remote sensing imagery |
Non-Patent Citations (3)
| Title |
|---|
| HITOSHI IBA, NEURO-EVOLUTION AND DEEP LEARNING, 20 October 2015 (2015-10-20), pages 57 - 60 , 81-111, ISBN: 978-4-274-21802-6 * |
| NAOKI KUBO ET AL.: "Query-by-sketch image retrieval using pseudo-autoencoder", THE INSTITUTE OF ELECTRICAL ENGINEERS OF JAPAI KENKYUKAI SHIRYO, 28 March 2016 (2016-03-28), pages 101 - 106 * |
| TOMONORI SHINDO: "Deep Learning wa Banno ka Dai 3 Bu: Task Betsu Hen", NIKKEI ELECTRONICS, vol. 1156, 20 May 2015 (2015-05-20), pages 44 - 52, ISSN: 0385-1680 * |
Cited By (17)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2024008989A (ja) * | 2018-11-13 | 2024-01-19 | 株式会社半導体エネルギー研究所 | 画像検索システム及び画像検索方法 |
| US12174882B2 (en) | 2018-11-13 | 2024-12-24 | Semiconductor Energy Laboratory Co., Ltd. | Image retrieval system and image retrieval method |
| JP7585431B2 (ja) | 2018-11-13 | 2024-11-18 | 株式会社半導体エネルギー研究所 | 画像検索システム及び画像検索方法 |
| JP7357454B2 (ja) | 2019-03-25 | 2023-10-06 | 三菱電機株式会社 | 特徴特定装置、特徴特定方法及び特徴特定プログラム |
| WO2020194864A1 (fr) * | 2019-03-25 | 2020-10-01 | 三菱電機株式会社 | Dispositif de spécification de caractéristiques, procédé de spécification de caractéristiques et programme de spécification de caractéristiques |
| CN113661515A (zh) * | 2019-03-25 | 2021-11-16 | 三菱电机株式会社 | 特征确定装置、特征确定方法和特征确定程序 |
| JP2020160582A (ja) * | 2019-03-25 | 2020-10-01 | 三菱電機株式会社 | 特徴特定装置、特徴特定方法及び特徴特定プログラム |
| CN113661515B (zh) * | 2019-03-25 | 2025-02-28 | 三菱电机株式会社 | 特征确定装置、特征确定方法和计算机能读取的存储介质 |
| JP7196058B2 (ja) | 2019-12-24 | 2022-12-26 | 株式会社日立製作所 | 物体検索装置及び物体検索方法 |
| JP2021101274A (ja) * | 2019-12-24 | 2021-07-08 | 株式会社日立製作所 | 物体検索装置及び物体検索方法 |
| WO2021131248A1 (fr) * | 2019-12-24 | 2021-07-01 | 株式会社日立製作所 | Dispositif de recherche d'objet et procédé de recherche d'objet |
| US12125284B2 (en) | 2019-12-24 | 2024-10-22 | Hitachi, Ltd. | Object search device and object search method |
| CN112541876B (zh) * | 2020-12-15 | 2023-08-04 | 北京百度网讯科技有限公司 | 卫星图像处理方法、网络训练方法、相关装置及电子设备 |
| CN112541876A (zh) * | 2020-12-15 | 2021-03-23 | 北京百度网讯科技有限公司 | 卫星图像处理方法、网络训练方法、相关装置及电子设备 |
| JP7611315B1 (ja) | 2023-08-04 | 2025-01-09 | 三菱重工機械システム株式会社 | 学習用画像生成方法、学習用画像生成装置、学習用画像生成プログラム及び学習モデル生成方法 |
| WO2025032990A1 (fr) * | 2023-08-04 | 2025-02-13 | 三菱重工機械システム株式会社 | Procédé de génération d'image d'apprentissage, dispositif de génération d'image d'apprentissage, programme de génération d'image d'apprentissage et procédé de génération de modèle d'apprentissage |
| JP2025023616A (ja) * | 2023-08-04 | 2025-02-17 | 三菱重工機械システム株式会社 | 学習用画像生成方法、学習用画像生成装置、学習用画像生成プログラム及び学習モデル生成方法 |
Also Published As
| Publication number | Publication date |
|---|---|
| JPWO2018179338A1 (ja) | 2019-04-04 |
| JP6320649B1 (ja) | 2018-05-09 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| JP6320649B1 (ja) | 機械学習装置及び画像認識装置 | |
| US20220114693A1 (en) | Image-based pose determination | |
| JP5602940B2 (ja) | 事前計算されたスケール空間からのデイジー記述子生成 | |
| CN116664892B (zh) | 基于交叉注意与可形变卷积的多时相遥感图像配准方法 | |
| CN109544681B (zh) | 一种基于点云的果实三维数字化方法 | |
| JP5261501B2 (ja) | 不変の視覚場面及び物体の認識 | |
| CN108961180B (zh) | 红外图像增强方法及系统 | |
| CN111179173B (zh) | 一种基于离散小波变换和坡度融合算法的图像拼接方法 | |
| CN105335952B (zh) | 匹配代价计算方法和装置、以及视差值计算方法和设备 | |
| JP2023145404A (ja) | イメージ間の対応関係を識別するためにピラミッド及び固有性マッチングフライアを使用するシステム及び方法 | |
| JP5289412B2 (ja) | 局所特徴量算出装置及び方法、並びに対応点探索装置及び方法 | |
| EP2293243A2 (fr) | Appareil de traitement d'images, appareil de capture d'images, procédé de traitement d'images et programme | |
| CN109919971B (zh) | 图像处理方法、装置、电子设备及计算机可读存储介质 | |
| CN111091567A (zh) | 医学图像配准方法、医疗设备及存储介质 | |
| KR101183391B1 (ko) | 메트릭 임베딩에 의한 이미지 비교 | |
| US20190279022A1 (en) | Object recognition method and device thereof | |
| CN114612352A (zh) | 多聚焦图像融合方法、存储介质和计算机 | |
| CN111353325B (zh) | 关键点检测模型训练方法及装置 | |
| CN112862732A (zh) | 多分辨率图像融合方法、装置、设备、介质及产品 | |
| CN108830863A (zh) | 医学成像的左心室分割方法、系统和计算机可读存储介质 | |
| CN117952901B (zh) | 基于生成对抗网络的多源异构图像变化检测方法及装置 | |
| CN113205467A (zh) | 基于模糊检测的图像处理方法及装置 | |
| CN106557772B (zh) | 用于提取局部特征子的方法、装置及图像处理方法 | |
| CN117934568A (zh) | 基于多尺度局部显著主方向特征的跨模态图像配准方法 | |
| CN114092350B (zh) | 一种水下摆扫型高光谱相机的图像校正方法和系统 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | ENP | Entry into the national phase | Ref document number: 2017554102; Country of ref document: JP; Kind code of ref document: A |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17903758; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 17903758; Country of ref document: EP; Kind code of ref document: A1 |