WO2017214970A1 - Building convolutional neural network - Google Patents
- Publication number: WO2017214970A1
- Application: PCT/CN2016/086154
- Authority: WIPO (PCT)
- Legal status: Ceased
Classifications
- G06N3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
- G06N3/045: Combinations of networks
- G06N3/0464: Convolutional networks [CNN, ConvNet]
- G06N3/084: Backpropagation, e.g. using gradient descent
- G06N3/09: Supervised learning
- G06V10/454: Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Definitions
- The CNN 100 may be trained with a training dataset, which enters the CNN 100 at the input layer 110.
- The CNN 100 may be used for image recognition, object detection, speech recognition, and so on.
- The role of the convolutional layers 120 and 140 is feature representation, with the semantic level of the features increasing with the depth of the layers.
- The pooling layers 130 and 150 are obtained by replacing the output of the preceding convolutional layer at a certain location with a summary statistic of the nearby outputs.
- The output layer 160 outputs classification results.
- Each of the input layer 110, convolutional layers 120, 140, pooling layers 130, 150, and output layer 160 includes one or more planes arranged along a Z dimension.
- Each of the planes is defined by an X dimension and a Y dimension, which together are referred to as the spatial domain.
- Each of the planes in the convolutional layers 120, 140 and pooling layers 130, 150 may be considered a feature map or a channel, which has a feature detector.
- The Z dimension is also referred to as the channel dimension or channel domain.
- The feature maps of each of the convolutional layers 120 and 140 may be obtained by applying a convolution operation to the feature maps of the respective preceding layer in both the spatial domain and the channel domain.
- By means of the convolution operation, each element in the feature maps of the convolutional layers 120 and 140 is connected only with elements in a local region of the feature maps in the preceding layer. In other words, applying the convolution operation to the preceding layer of a convolutional layer means that there is a sparse connection between these two layers, and in this sense the terms "convolution operation" and "sparse connection" may be used interchangeably.
- The convolution operation is suitable for situations where neighboring elements are highly correlated.
- However, existing learning algorithms do not guarantee that neighboring elements of different feature maps in the channel domain are highly correlated; as a result, the correlation between neighboring elements in the channel domain is not as large as the correlation between neighboring elements in the spatial domain.
- Consequently, the sparse connections in the channel domain may not result in good performance: the feature maps obtained via the convolution operation lack a strong ability of feature representation and thus cannot serve as discriminative representations of an image.
- To address this, a scheme for building a CNN is proposed that improves the correlation between neighboring elements in the channel domain, so that applying convolution operations in both the spatial domain and the channel domain yields better performance.
- Fig. 2 shows a flowchart of a method 200 for building a CNN in accordance with embodiments of the present disclosure.
- The method 200 may be implemented in a CNN, such as the CNN 100 shown in Fig. 1.
- The method 200 may be performed with respect to both of the convolutional layers 120 and 140, so as to improve the correlation between different feature maps in these convolutional layers, or with respect to any one of them.
- In the following, the method 200 will be described by taking the convolutional layer 120 as an example.
- The method 200 is entered at step 210, where convolution parameters and first feature maps for the convolutional layer 120 in the CNN 100 are determined based on a training dataset for multimedia content.
- Examples of the training dataset for multimedia content include, but are not limited to, training datasets for images, speech, video and the like.
- In some embodiments, the convolution operation may be performed by using linear filters: each of the filters is convolved over the feature maps in the preceding layer with a predefined stride, followed by a nonlinear activation. In this case, the convolution parameters include the weights of the linear filters. Different feature maps correspond to different filter parameters, while the elements within a single feature map share the same parameters.
- Alternatively, the convolution operation may be performed by using nonlinear functions, such as a shallow MultiLayer Perceptron (MLP), in which case the convolution parameters include the parameters of the MLP.
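As an illustration of the linear-filter variant described above, the sketch below convolves a single 2-D filter over one input plane with a predefined stride and then applies a nonlinear activation. ReLU is assumed here (the disclosure does not fix a particular activation), and the function name and plain-list representation are illustrative only.

```python
def conv2d_single(plane, kernel, stride=1):
    """Convolve one linear filter over a 2-D plane, then apply ReLU.

    `plane` and `kernel` are 2-D lists of numbers. The same kernel
    weights are shared across all positions of the plane, as described
    above.
    """
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(0, len(plane) - kh + 1, stride):
        row = []
        for j in range(0, len(plane[0]) - kw + 1, stride):
            # Linear filtering: weighted sum over a local region.
            acc = sum(
                plane[i + a][j + b] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
            # Nonlinear activation (ReLU assumed).
            row.append(max(acc, 0.0))
        out.append(row)
    return out
```

Applying M such filters to the preceding layer would yield the M feature maps of the convolutional layer.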
- The convolution parameters and the first feature maps for the convolutional layer 120 are determined by applying a learning algorithm to the training dataset. Examples of the learning algorithm include, but are not limited to, back propagation (BP), stochastic gradient descent (SGD), and limited-memory BFGS (Broyden, Fletcher, Goldfarb, and Shanno).
- The number of the first feature maps, denoted by M, may be pre-determined as any appropriate integer larger than one, for example, eight.
- In some embodiments, the method 200 further comprises assigning indexes to the determined feature maps and generating a list of the indexes.
- Fig. 3a shows an example of the first feature maps with the indexes assigned: eight feature maps with the indexes 1, 2, ..., 8, also referred to as feature maps C_1, C_2, ..., C_M.
- From the list of the indexes, denoted by R, it can be known which feature maps are neighbors. In this example, R is [1, 2, 3, 4, 5, 6, 7, 8].
- As noted above, existing learning algorithms do not guarantee that neighboring elements of different feature maps in the channel domain are highly correlated.
- For example, the elements in feature map 1 and those in feature map 2 are neighboring elements as shown in Fig. 3a, but the correlation between these neighboring elements may not be high; likewise for the neighboring elements in feature maps 3 and 4.
- In order to improve the correlation between the neighboring elements of different feature maps, in step 220, the order of the first feature maps obtained in step 210 is changed according to the correlation among the first feature maps so as to obtain second feature maps. In other words, the second feature maps are obtained by re-ranking the first feature maps and are therefore also called the "re-ranked feature maps".
- In some embodiments, changing the order of the first feature maps according to the correlation among the first feature maps comprises: obtaining representation information of the first feature maps; determining differences among the representation information; and determining the correlation based on the differences among the representation information.
- Examples of the representation information include, but are not limited to, Histograms of Oriented Gradients (HOG) of an intensity image, information extracted by the Scale-Invariant Feature Transform (SIFT) algorithm, and the like. In the following, the representation information is also referred to as HOG features.
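As a rough, self-contained stand-in for such representation information, the sketch below computes a coarse histogram of gradient orientations for a feature map. Real HOG descriptors use cells, blocks and normalization; this much-simplified version only shows how each feature map can be summarized by a fixed-length descriptor vector f(C_i).

```python
import math

def orientation_histogram(fmap, bins=4):
    """A simplified HOG-like descriptor: a histogram of gradient
    orientations over the interior of a 2-D feature map, weighted by
    gradient magnitude."""
    hist = [0.0] * bins
    for i in range(1, len(fmap) - 1):
        for j in range(1, len(fmap[0]) - 1):
            gx = fmap[i][j + 1] - fmap[i][j - 1]  # horizontal gradient
            gy = fmap[i + 1][j] - fmap[i - 1][j]  # vertical gradient
            angle = math.atan2(gy, gx) % math.pi  # unsigned orientation in [0, pi)
            b = min(int(angle / math.pi * bins), bins - 1)
            hist[b] += math.hypot(gx, gy)         # weight by magnitude
    return hist
```

For example, a map with a purely vertical intensity ramp puts all of its mass into the bin around 90 degrees.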
- The step 220 will now be described in detail by taking HOG features as an example and referring to Fig. 3a.
- Let the re-ranked maps (that is, the second feature maps) be denoted by D_1, D_2, ..., D_M, and let the feature map C_1 be the first one of the re-ranked maps, that is, D_1 = C_1. Then, the differences between the HOG features f(C_1) of the feature map C_1 and those of the remaining M-1 feature maps C_2, C_3, ..., C_M may be determined as follows:

  g(C_1, C_i) = Σ_{k=1}^{K} |f(C_1)(k) - f(C_i)(k)|, i = 2, ..., M    (1)

  where K is the number of the HOG features in each feature map and f(C_i)(k) is the k-th HOG feature of the feature map C_i.
- Next, the feature map j which has the smallest difference g to the feature map C_1 is determined as follows:

  j = argmin_{i ∈ {2, ..., M}} g(C_1, C_i)    (2)

- Let the feature map C_j be the second one of the re-ranked maps, that is, D_2 = C_j. Then, the above Equations (1) and (2) are applied to the feature map C_j to find, from the remaining M-2 feature maps, the feature map which has the smallest difference to the feature map C_j. By repeating this process, the other feature maps of the re-ranked maps can be determined.
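The greedy procedure of Equations (1) and (2) can be sketched as follows. Each feature map is stood in for by its descriptor vector f(C_i); the sum-of-absolute-differences form of g and the function names are assumptions based on the surrounding description, not the patent's exact formulation.

```python
def descriptor_difference(f_a, f_b):
    """g: sum of absolute differences over the K descriptor entries
    (in the spirit of Equation (1))."""
    return sum(abs(a - b) for a, b in zip(f_a, f_b))

def rerank_feature_maps(descriptors):
    """Greedily re-rank feature maps so that similar maps become neighbors.

    `descriptors` maps each feature-map index to its descriptor vector.
    The first map keeps its position (D_1 = C_1); each following position
    is filled by the remaining map with the smallest difference g to the
    map placed just before it (in the spirit of Equation (2)).
    Returns the re-ranked index list R*.
    """
    remaining = set(descriptors)
    current = min(remaining)  # start from C_1
    order = [current]
    remaining.remove(current)
    while remaining:
        current = min(
            remaining,
            key=lambda j: descriptor_difference(
                descriptors[order[-1]], descriptors[j]
            ),
        )
        order.append(current)
        remaining.remove(current)
    return order
```

For instance, with descriptors {1: [0, 0], 2: [5, 5], 3: [1, 0], 4: [6, 5]} the procedure places map 3 next to map 1 and map 4 next to map 2, yielding [1, 3, 2, 4].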
- Fig. 3b shows an example of the re-ranked feature maps: eight feature maps with the indexes 1, 5, 3, 4, 8, 6, 7, 2.
- In some embodiments, the method 200 further comprises updating the list of the indexes R based on the re-ranked feature maps. In this example, the updated list of the indexes, denoted by R*, is [1, 5, 3, 4, 8, 6, 7, 2].
- In step 230, the convolution parameters for the convolutional layer 120 are updated based on the training dataset and the re-ranked feature maps.
- In some embodiments, updating the convolution parameters comprises: determining an amount of change in the order of the first feature maps; and in response to the amount being larger than a predetermined threshold, updating the convolution parameters.
- In some embodiments, determining the amount of change in the order of the first feature maps comprises determining a difference between the generated list of the indexes R and the updated list of the indexes R*. For example, the difference may be determined as follows:

  s(R*, R) = Σ_{j=1}^{M} |R*_j - R_j|    (3)

  where s(R*, R) represents the difference between the list of the indexes R and the updated list of the indexes R*, R*_j represents the j-th element in the list R*, and R_j represents the j-th element in the list R. If s(R*, R) is larger than a predetermined threshold, for example, 0, the convolution parameters will be updated.
- Alternatively, the difference between the lists R and R* may be determined as the ratio of the number of differing elements between the lists R* and R to the total number of the elements in the list R*, for example:

  d(R*, R) = Σ_{j=1}^{M} [R*_j ≠ R_j]    (4)

  w(R*, R) = d(R*, R) / M    (5)

  where [·] equals 1 when the condition holds and 0 otherwise, and w(R*, R) represents the ratio of the differing elements between the lists R* and R to the total number of the elements in the list R*. If w(R*, R) is larger than a predetermined threshold, the convolution parameters will be updated. In this case, the predetermined threshold may be any value in the range of 0.5 to 1, for example, 0.8.
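Under the assumption that s(R*, R) sums the element-wise index differences and w(R*, R) is the fraction of changed positions, the two change measures can be sketched as:

```python
def order_change_sum(r_star, r):
    """s(R*, R): sum of absolute differences between corresponding
    indexes (assumed form of Equation (3))."""
    return sum(abs(a - b) for a, b in zip(r_star, r))

def order_change_ratio(r_star, r):
    """w(R*, R): ratio of differing elements to the total number of
    elements (assumed form of Equations (4) and (5))."""
    differing = sum(1 for a, b in zip(r_star, r) if a != b)
    return differing / len(r_star)

# The example lists from Figs. 3a and 3b:
R = [1, 2, 3, 4, 5, 6, 7, 8]
R_star = [1, 5, 3, 4, 8, 6, 7, 2]
# Here s = |5-2| + |8-5| + |2-8| = 12, and three of the eight
# positions changed, so w = 3/8 = 0.375.
```

Either measure is then compared against the predetermined threshold to decide whether the convolution parameters should be re-learned.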
- As mentioned above, the method 200 may be performed with respect to both of the convolutional layers 120 and 140. When multiple convolutional layers are involved, Equation (3) may be re-written as follows:

  s(R*, R) = Σ_{i=1}^{N} Σ_{j=1}^{M} |R*(i, j) - R(i, j)|

  where R*(i, j) represents the j-th element in the list R* for the convolutional layer i, R(i, j) represents the j-th element in the list R for the convolutional layer i, and N represents the number of the convolutional layers. Equations (4) and (5) may be re-written in a similar way, with the differing elements counted, and the ratio taken, over all of the N lists.
- Step 230 may be understood as a process of re-learning the convolution parameters for the convolutional layer 120 based on the training dataset and the re-ranked feature maps.
- In some embodiments, the method 200 may be performed iteratively until the amount of change in the order of the feature maps is equal to or smaller than the predetermined threshold.
- For example, the method 200 may further comprise updating the feature maps for the convolutional layer 120 based on the updated convolution parameters, and changing an order of the updated feature maps according to correlation among the updated feature maps to obtain third feature maps.
- The method 200 may further comprise: determining an amount of change in the order of the updated feature maps; and in response to the amount being equal to or smaller than the predetermined threshold, stopping updating the convolution parameters.
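The iterative procedure just described can be outlined as below. `learn_parameters` and `rerank` are hypothetical placeholders for the learning algorithm (for example SGD or back propagation) and for the re-ranking step; the toy usage only demonstrates the stopping behaviour, not real training.

```python
def build_convolutional_layer(training_data, initial_order,
                              learn_parameters, rerank, threshold=0):
    """Alternate between learning convolution parameters and re-ranking
    the feature maps, stopping once the amount of order change is equal
    to or smaller than the threshold."""
    order = list(initial_order)
    params = learn_parameters(training_data, order)
    while True:
        new_order = rerank(order, params)
        # Amount of change: number of positions whose index changed.
        change = sum(1 for a, b in zip(new_order, order) if a != b)
        if change <= threshold:
            return params, order
        order = new_order
        params = learn_parameters(training_data, order)

# Toy usage: "learning" just records the current order, and
# "re-ranking" sorts the indexes, so the loop stabilizes after one
# re-ranking step.
params, final_order = build_convolutional_layer(
    None, [3, 1, 2],
    learn_parameters=lambda data, order: {"order": list(order)},
    rerank=lambda order, params: sorted(order),
)
```

After the first re-ranking changes the order, the parameters are re-learned once; the second re-ranking produces no change, so the loop stops.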
- After training, the method 200 may further comprise receiving a testing dataset for multimedia content at the input layer 110, performing a classification on the testing dataset, and outputting results of the classification at the output layer 160.
- In some embodiments, the convolution operation is performed by using linear filters, and the length of each of the filters is smaller than the number of the feature maps in each of the convolutional layers.
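Such a sparse connection in the channel domain can be illustrated with a 1-D filter shorter than the number of feature maps, slid along the channel dimension: each output element then depends only on a few neighboring maps, which is why making neighboring maps correlated matters. For brevity, this sketch treats each feature map as a single value.

```python
def channel_conv(channel_values, filt):
    """Convolve a short filter along the channel dimension (valid mode,
    stride 1). Since len(filt) < len(channel_values), each output
    element is connected only to a local group of neighboring channels,
    i.e. a sparse connection in the channel domain."""
    n, k = len(channel_values), len(filt)
    return [
        sum(channel_values[c + a] * filt[a] for a in range(k))
        for c in range(n - k + 1)
    ]
```

For example, `channel_conv([1, 2, 3, 4], [1, 1])` combines only adjacent pairs of channels, producing `[3, 5, 7]`.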
- In accordance with embodiments of the present disclosure, the order of the feature maps of at least one convolutional layer in a CNN is changed according to the correlation among the feature maps, so that similar feature maps are arranged to be neighbors, and then the convolution parameters are re-learned. In this way, the correlation between neighboring elements in the channel domain is improved, so that applying convolution operations in both the spatial domain and the channel domain yields better performance.
- Reference is now made to Fig. 4, which shows an example electronic device or computer system/server 12 which is applicable to implement the embodiments of the present disclosure.
- Computer system/server 12 is only illustrative and is not intended to suggest any limitation as to the scope of use or functionality of embodiments of the invention described herein.
- As shown in Fig. 4, computer system/server 12 is shown in the form of a general-purpose computing device. The components of computer system/server 12 may include, but are not limited to, one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 to the processor 16.
- Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures.
- Examples of such bus architectures include the Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
- Computer system/server 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer system/server 12, and it includes both volatile and non-volatile media, removable and non-removable media.
- System memory 28 can include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32.
- Computer system/server 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media.
- By way of example, a storage system 34 can be provided for reading from and writing to a non-removable, non-volatile magnetic medium (not shown and typically called a "hard drive"). Although not shown, a magnetic disk drive for reading from and writing to a removable, non-volatile magnetic disk (for example, a "floppy disk") and an optical disk drive for reading from or writing to a removable, non-volatile optical disk such as a CD-ROM, DVD-ROM or other optical media can be provided. In such instances, each can be connected to the bus 18 by one or more data media interfaces.
- Memory 28 may include at least one program product having a set (for example, at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
- Program/utility 40 having a set (at least one) of program modules 42, may be stored in memory 28 by way of example, and not limitation, as well as an operating system, one or more application programs, other program modules, and program data. Each of the operating system, one or more application programs, other program modules, and program data or some combination thereof, may include an implementation of a networking environment.
- Program modules 42 generally carry out the functions and/or methodologies of embodiments of the invention as described herein.
- Computer system/server 12 may also communicate with one or more external devices 14 such as a keyboard, a pointing device, a display 24, and the like.
- Such communication can occur via Input/Output (I/O) interfaces 22.
- Computer system/server 12 can communicate with one or more networks such as a local area network (LAN), a general wide area network (WAN), and/or a public network (for example, the Internet) via network adapter 20.
- network adapter 20 communicates with the other components of computer system/server 12 via bus 18.
- I/O interfaces 22 may support one or more of various different input devices that can be used to provide input to computer system/server 12.
- The input device (s) may include a user device such as a keyboard, keypad, touch pad, trackball, and the like.
- The input device (s) may implement one or more natural user interface techniques, such as speech recognition, touch and stylus recognition, recognition of gestures in contact with the input device (s) and adjacent to the input device (s), recognition of air gestures, head and eye tracking, voice and speech recognition, sensing user brain activity, and machine intelligence.
Abstract
Embodiments of the present disclosure provide a method, apparatus and computer program product for information processing. The method comprises: determining, based on a training dataset for multimedia content, convolution parameters and first feature maps for a convolutional layer in a convolutional neural network; changing an order of the first feature maps according to correlation among the first feature maps to obtain second feature maps; and updating the convolution parameters based on the training dataset and the second feature maps.
Description
Embodiments of the present disclosure generally relate to information processing, and more particularly to methods, apparatuses and computer program products for building a Convolutional Neural Network (CNN) .
CNNs have achieved state-of-the-art performance in applications such as image recognition, object detection, and speech recognition. Representative applications of CNNs include AlphaGo, Advanced Driver Assistance Systems (ADAS), self-driving cars, Optical Character Recognition (OCR), face recognition, large-scale image classification (for example, ImageNet classification), and Human-Computer Interaction (HCI).
Generally, CNNs are organized in interweaved layers of two types: convolutional layers and pooling (subsampling) layers. The role of the convolutional layers is feature representation, with the semantic level of the features increasing with the depth of the layers. Designing effective convolutional layers that yield robust feature maps is the key to improving the performance of CNNs.
SUMMARY
In general, example embodiments of the present disclosure include a method, apparatus and computer program product for building a CNN.
In a first aspect of the present disclosure, a method is provided. The method comprises: determining, based on a training dataset for multimedia content, convolution parameters and first feature maps for a convolutional layer in a convolutional neural network; changing an order of the first feature maps according to correlation among the first feature maps to obtain second feature maps; and updating the convolution parameters based on the training dataset and the second feature maps.
In some embodiments, updating the convolution parameters comprises: determining an amount of change in the order of the first feature maps; and in response to the amount being larger than a predetermined threshold, updating the convolution parameters.
In some embodiments, the method further comprises: assigning indexes to the first feature maps; and generating a list of the indexes.
In some embodiments, the method further comprises: updating the list of the indexes based on the second feature maps.
In some embodiments, determining an amount of change in the order of the first feature maps comprises: determining a difference between the generated list of the indexes and the updated list of the indexes.
In some embodiments, changing an order of the first feature maps according to correlation among the first feature maps comprises: obtaining representation information of the first feature maps; determining differences among the representation information; and determining the correlation based on the differences among the representation information.
In a second aspect of the present disclosure, an apparatus is provided. The apparatus comprises: at least one processor; and at least one memory including computer program code; the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to perform: determining, based on a training dataset for multimedia content, convolution parameters and first feature maps for a convolutional layer in a convolutional neural network; changing an order of the first feature maps according to correlation among the first feature maps to obtain second feature maps; and updating the convolution parameters based on the training dataset and the second feature maps.
In a third aspect of the present disclosure, an apparatus is provided. The apparatus comprises means for performing a method according to the first aspect of the present disclosure.
In a fourth aspect of the present disclosure, a computer program product is provided. The computer program product comprises at least one computer readable non-transitory memory medium having program code stored thereon, which, when executed by an apparatus, causes the apparatus to perform a method according to the first aspect of the present disclosure.
It is to be understood that the Summary is not intended to identify key or essential features of embodiments of the present disclosure, nor is it intended to be used to limit the scope of the present disclosure. Other features of the present disclosure will become easily comprehensible through the description below.
Through the more detailed description of some embodiments of the present disclosure in the accompanying drawings, the above and other objects, features and advantages of the present disclosure will become more apparent, wherein:
Fig. 1 schematically shows an architecture of a CNN in which embodiments of the present disclosure can be implemented;
Fig. 2 is a flowchart of a method in accordance with embodiments of the present disclosure;
Fig. 3a shows an example of the feature maps prior to re-ranking;
Fig. 3b shows an example of the re-ranked feature maps; and
Fig. 4 is a block diagram of an electronic device in which embodiments of the present disclosure can be implemented.
Throughout the drawings, the same or similar reference numerals represent the same or similar elements.
Principles of the present disclosure will now be described with reference to some example embodiments. It is to be understood that these embodiments are described for the purpose of illustration only and to help those skilled in the art understand and implement the present disclosure, without suggesting any limitations as to the scope of the invention. The invention described herein can be implemented in various manners other than the ones described below.
As used herein, the term “includes” and its variants are to be read as open terms that mean “includes, but is not limited to.” The term “based on” is to be read as “based at least in part on.” The terms “one embodiment” and “an embodiment” are to be read as “at least one embodiment.” The term “another embodiment” is to be read as “at least one other embodiment.” Other definitions, explicit and implicit, may be included below.
Reference is first made to Fig. 1, which schematically shows an architecture of a
CNN 100 in which embodiments of the present disclosure can be implemented. It is to be understood that the structure and functionality of the CNN 100 are described only for the purpose of illustration without suggesting any limitations as to the scope of the present disclosure described herein. The present disclosure described herein can be embodied with a different structure and/or functionality.
As shown in Fig. 1, the CNN 100 includes an input layer 110, convolutional layers 120, 140, pooling layers 130, 150, and an output layer 160. Typically, the convolutional layers 120, 140 and the pooling layers 130, 150 are organized in an alternating form. In some embodiments, the convolutional layer 120 is followed by the pooling layer 130 and the convolutional layer 140 is followed by the pooling layer 150. In some embodiments, the CNN 100 only includes one of the pooling layers 130 and 150 which follows the successive convolutional layers 120 and 140. In other embodiments, the CNN 100 does not include any pooling layers.
The CNN 100 may be trained with a training dataset. The training dataset enters the CNN 100 at the input layer 110. Once trained, the CNN 100 may be used for image recognition, object detection, speech recognition, and so on. The role of the convolutional layers 120 and 140 is feature representation, with the semantic level of the features increasing with the depth of the layers. The pooling layers 130 and 150 are obtained by replacing an output of a preceding convolutional layer at a certain location with a summary statistic of the nearby outputs. The output layer 160 outputs classification results.
Each of the input layer 110, convolutional layers 120, 140, pooling layers 130, 150, and output layer 160 includes one or more planes arranged along a Z dimension. Each of the planes is defined by an X dimension and a Y dimension, which is referred to as a spatial domain. Each of the planes in the convolutional layers 120, 140 and pooling layers 130, 150 may be considered as a feature map or a channel which has a feature detector. Thus, the Z dimension is also referred to as a channel dimension or channel domain.
The feature maps of each of the convolutional layers 120 and 140 may be obtained by applying convolution operation on the feature maps in a respective preceding layer in both spatial domain and channel domain. By means of the convolution operation, each of elements in the feature maps of the convolutional layers 120 and 140 is only connected with elements in a local region of the feature maps in a preceding layer. In this sense, applying the convolution operation to a preceding layer of a convolutional layer means that
there is a sparse connection between these two layers. Thus, as used herein, the terms “convolution operation” and “sparse connection” may be used interchangeably.
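A convolution that is sparse in both the spatial domain (X, Y) and the channel domain (Z) can be sketched as follows. This is a minimal NumPy illustration, not the patent's implementation; the array shapes and the small kernel are assumptions chosen for brevity.

```python
import numpy as np

def local_conv3d(maps, kernel):
    # maps: (Z, Y, X) stack of feature maps; kernel: (kz, ky, kx).
    # Each output element is connected only to a local region of the
    # preceding layer in both the spatial and channel dimensions,
    # i.e. a "sparse connection" in the sense used above.
    Z, Y, X = maps.shape
    kz, ky, kx = kernel.shape
    out = np.zeros((Z - kz + 1, Y - ky + 1, X - kx + 1))
    for z in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                out[z, y, x] = np.sum(maps[z:z+kz, y:y+ky, x:x+kx] * kernel)
    return out

# Two 4x4 feature maps, convolved with an averaging kernel that spans
# both channels and a 3x3 spatial neighborhood.
maps = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
kernel = np.ones((2, 3, 3)) / 18.0
out = local_conv3d(maps, kernel)
print(out.shape)  # (1, 2, 2)
```

Because the kernel is smaller than the input in every dimension, each output value depends on only a local neighborhood, which is exactly why high correlation among neighboring channels matters for this operation.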
It is known that the convolution operation is suitable for the situation where neighboring elements are highly correlated. However, because existing learning algorithms do not guarantee that neighboring elements between different feature maps in the channel domain are highly correlated, the correlation between neighboring elements in the channel domain is not as large as the correlation between neighboring elements in the spatial domain. As a result, the sparse connections in the channel domain cannot result in a good performance. For example, in the case of image recognition, the feature maps obtained via the convolution operation do not have strong ability of feature representation and thus cannot be used as discriminative representations of an image.
In accordance with embodiments of the present disclosure, a scheme for building a CNN is proposed to improve the correlation between neighboring elements in the channel domain so that applying convolution operations in both spatial domain and channel domain yields a better performance.
Fig. 2 shows a flowchart of a method 200 for building a CNN in accordance with embodiments of the present disclosure. The method 200 may be implemented in a CNN, such as the CNN 100 as shown in Fig. 1. In some embodiments, the method 200 may be performed with respect to both of the convolutional layers 120 and 140 so as to improve correlation between different feature maps in these convolutional layers. In other embodiments, the method 200 may be performed with respect to any of the convolutional layers 120 and 140. Hereinafter, for the purpose of illustration, the method 200 will be described by taking convolutional layer 120 as an example.
As shown, the method 200 is entered in step 210, where convolution parameters and first feature maps for the convolutional layer 120 in the CNN 100 are determined based on a training dataset for multimedia content.
As described above, once trained, the CNN 100 may be used for image recognition, object detection, speech recognition, and so on. In this regard, examples of the training dataset for multimedia content include, but are not limited to, training datasets for images, speech, video and the like.
Typically, a convolution operation may be performed by using linear filters. In particular, each of the filters is convolved over the feature maps in a preceding layer with a predefined stride, followed by a nonlinear activation. In the case of using linear filters, the convolution parameters include the weights of the linear filters. Different feature maps correspond to different filter parameters, with the elements within a feature map sharing the same parameters. Alternatively, instead of using linear filters, the convolution operation may be performed by using nonlinear functions, such as a shallow MultiLayer Perceptron (MLP). In this case, the convolution parameters include the parameters of the MLP.
In some embodiments, the convolution parameters and first feature maps for the convolutional layer 120 are determined by applying a learning algorithm to the training dataset. Examples of the learning algorithm include, but are not limited to, back propagation (BP) , stochastic gradient descent (SGD) , and limited-memory BFGS (Broyden, Fletcher, Goldfarb, and Shanno) .
It is to be understood that in order to determine the convolution parameters and first feature maps for the convolutional layer 120, a number of the first feature maps may be pre-determined. Usually, in order to detect multiple features of the multimedia content, multiple feature maps are used in the convolutional layer 120. Thus, the number, denoted by M, may be pre-determined as any appropriate integer larger than one, for example, eight.
In some embodiments, the method 200 further comprises assigning indexes to the determined feature maps. Fig. 3a shows an example of the first feature maps with the indexes assigned. As shown in Fig. 3a, the first feature maps include eight feature maps with the indexes 1, 2, ..., 8. Thus, the eight feature maps with the indexes 1, 2, ..., 8 are also called feature maps C1, C2, ..., CM (where M = 8 in this example).
In some embodiments, the method 200 further comprises generating a list of the indexes. With the list of the indexes, it can be determined which feature maps are neighbors. In the example shown in Fig. 3a, the list of the indexes, denoted by R, is [1, 2, 3, 4, 5, 6, 7, 8].
As described above, the existing learning algorithms, such as the BP algorithm, do not guarantee that neighboring elements in different feature maps in the channel domain are highly correlated. For example, elements in the feature map 1 and those in feature map 2 are neighboring elements as shown in Fig. 3a, but correlation between these neighboring elements may not be high. Similarly, correlation between the neighboring elements in the feature maps 3 and 4 may not be high.
Referring back to Fig. 2, in order to improve the correlation between the
neighboring elements in different feature maps, in step 220, an order of the first feature maps obtained in step 210 is changed according to correlation among the first feature maps so as to obtain second feature maps. In other words, the second feature maps are obtained by re-ranking the first feature maps. Thus, hereinafter, the second feature maps are also called the “re-ranked feature maps” .
In some embodiments, changing an order of the first feature maps according to correlation among the first feature maps comprises: obtaining representation information of the first feature maps; determining differences among the representation information; and determining the correlation based on the differences among the representation information.
Examples of the representation information include, but are not limited to, Histogram of Oriented Gradients (HOG) features in an intensity image, information extracted by the Scale-Invariant Feature Transform (SIFT) algorithm, and the like. In the case of HOG, the representation information is also referred to as HOG features. Hereinafter, for the purpose of illustration, the step 220 will be described in detail by taking HOG features as an example and referring to Fig. 3a.
With respect to each of the feature maps C1, C2, ..., CM as shown in Fig. 3a, HOG features are extracted and expressed as f(Ci), where i = 1 to M. It is to be appreciated that the process for extracting HOG features is known in the art, and thus a detailed description thereof is omitted herein.
The re-ranked maps (for example, the second feature maps) are denoted by D1, D2, ..., DM, and the feature map C1 is taken as the first one of the re-ranked maps D1, D2, ..., DM, for example, D1←C1. Then, differences between the HOG features f(Ci) of the feature map C1 and those of the remaining M-1 feature maps C2, C3, ..., CM may be determined as follows:

    g(C1, Ci) = Σk=1..K |f(C1)(k) - f(Ci)(k)|, for i = 2, ..., M    (1)

where K is the number of the HOG features in each feature map and f(Ci)(k) is the kth HOG feature of the feature map Ci.
Then, a feature map Cj which has the smallest difference g to the feature map C1 is determined as follows:

    j = argmin i∈{2, ..., M} g(C1, Ci)    (2)
Next, let the feature map Cj be the second one of the re-ranked maps D1, D2, ... , DM, for example, D2←Cj, and the above Equations (1) and (2) are applied to the feature
map Cj to find from the remaining M-2 feature maps a feature map which has the smallest difference to the feature map Cj. Similarly, the other feature maps of the re-ranked maps can be determined.
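The greedy re-ranking described above can be sketched as follows. This is a minimal, hedged sketch: the patent uses full HOG features for f(Ci), whereas the `descriptor` function below is a simplified stand-in (a plain gradient-orientation histogram), and the random 8x8 maps are illustrative only.

```python
import numpy as np

def descriptor(fmap, bins=8):
    # Stand-in for the HOG features f(Ci): a simple histogram of gradient
    # orientations. A real implementation would use full HOG extraction.
    gy, gx = np.gradient(fmap)
    angles = np.arctan2(gy, gx)
    hist, _ = np.histogram(angles, bins=bins, range=(-np.pi, np.pi))
    return hist.astype(float)

def rerank(maps):
    # Greedy re-ranking: D1 <- C1, then repeatedly pick, from the remaining
    # maps, the one whose descriptor has the smallest L1 difference g
    # (Equation (1)) to the last-picked map (Equation (2)).
    feats = [descriptor(m) for m in maps]
    order = [0]
    remaining = set(range(1, len(maps)))
    while remaining:
        last = feats[order[-1]]
        j = min(remaining, key=lambda i: np.abs(last - feats[i]).sum())
        order.append(j)
        remaining.discard(j)
    return order  # 0-based indexes of the re-ranked maps D1, ..., DM

rng = np.random.default_rng(0)
maps = [rng.random((8, 8)) for _ in range(8)]
order = rerank(maps)
print(order)
```

The returned `order` is a permutation of the original indexes in which each map sits next to its most similar remaining neighbor, which is the property the convolution in the channel domain benefits from.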
Fig. 3b shows an example of the re-ranked feature maps. As shown in Fig. 3b, the re-ranked feature maps include eight feature maps with the indexes of 1, 5, 3, 4, 8, 6, 7, 2.
In some embodiments, the method 200 further comprises updating the list of the indexes R based on the re-ranked feature maps. With reference to Fig. 3b, the updated list of the indexes, denoted by R*, is [1, 5, 3, 4, 8, 6, 7, 2].
Referring back to Fig. 2, in step 230, the convolution parameters for the convolutional layer 120 are updated based on the training dataset and the re-ranked feature maps.
In some embodiments, updating the convolution parameters comprises: determining an amount of change in the order of the first feature maps; and in response to the amount being larger than a predetermined threshold, updating the convolution parameters.
In some embodiments, determining the amount of change in the order of the first feature maps comprises: determining a difference between the generated list of the indexes and the updated list of the indexes. In the above example, determining the amount of change in the order of the first feature maps comprises determining a difference between the list of the indexes R and the updated list of the indexes R*.
As an example, the difference between the list of the indexes R and the updated list of the indexes R* may be determined as follows:

    s(R*, R) = Σj=1..M δ(R*j ≠ Rj)    (3)

where s(R*, R) represents the difference between the list of the indexes R and the updated list of the indexes R*, R*j represents the jth element in the list R*, Rj represents the jth element in the list R, and δ(·) equals 1 when its condition holds and 0 otherwise. If s(R*, R) is larger than a predetermined threshold, for example, 0, the convolution parameters will be updated.
As another example, the difference between the list of the indexes R and the updated list of the indexes R* may be determined by determining a ratio of different elements between the lists R* and R to the total number of the elements in the list R*, as follows:

    d(R*, R) = Σj=1..M δ(R*j ≠ Rj)    (4)

    w(R*, R) = d(R*, R) / M    (5)

where d(R*, R) is the number of different elements between the lists R* and R, w(R*, R) represents the ratio of different elements between the lists R* and R to the total number of the elements in the list R*, R*j represents the jth element in the list R*, Rj represents the jth element in the list R, and δ(·) equals 1 when its condition holds and 0 otherwise. If w(R*, R) is larger than a predetermined threshold, the convolution parameters will be updated. The predetermined threshold may be any value in the range of 0.5 to 1, for example, 0.8.
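The two change measures above can be sketched in a few lines. The original equation images are not reproduced in the text, so the mismatch-count interpretation below is an assumption consistent with the surrounding description (threshold 0 for the count, ratio of differing elements for the second measure); the example lists are the ones from Figs. 3a and 3b.

```python
def list_difference(R_star, R):
    # s(R*, R): number of positions where the two index lists disagree.
    return sum(1 for a, b in zip(R_star, R) if a != b)

def list_difference_ratio(R_star, R):
    # w(R*, R): fraction of disagreeing positions out of the M elements.
    return list_difference(R_star, R) / len(R_star)

R = [1, 2, 3, 4, 5, 6, 7, 8]        # generated list of indexes (Fig. 3a)
R_star = [1, 5, 3, 4, 8, 6, 7, 2]   # updated list after re-ranking (Fig. 3b)
print(list_difference(R_star, R))        # 3
print(list_difference_ratio(R_star, R))  # 0.375
```

With the example lists, three positions changed, so the count exceeds a threshold of 0 and the parameters would be re-learned; the ratio 0.375 would fall below a threshold such as 0.8.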
As described above, the method 200 may be performed with respect to both of the convolutional layers 120 and 140 so as to improve correlation between different feature maps in these convolutional layers. In this case, the above Equation (3) may be re-written as follows:

    s(R*, R) = Σi=1..N Σj=1..M δ(R*(i, j) ≠ R(i, j))

where R*(i, j) represents the jth element in the list R* for the convolutional layer i, R(i, j) represents the jth element in the list R for the convolutional layer i, N represents the number of the convolutional layers, and δ(·) equals 1 when its condition holds and 0 otherwise.
Similarly, the above Equations (4) and (5) may be re-written in the same manner, by summing the element-wise differences over the N convolutional layers and normalizing by the total number of elements.
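The multi-layer form of the change measures can be sketched as below. As with the single-layer case, the exact equations are not reproduced in the extracted text, so summing the per-layer mismatch counts and normalizing by N·M is an assumed but natural reading of the description.

```python
def total_list_difference(R_star_lists, R_lists):
    # Multi-layer s(R*, R): sum of per-layer mismatch counts over N layers.
    return sum(
        sum(1 for a, b in zip(rs, r) if a != b)
        for rs, r in zip(R_star_lists, R_lists)
    )

def total_difference_ratio(R_star_lists, R_lists):
    # Assumed multi-layer counterpart of the ratio w: mismatches over N*M elements.
    total_elements = sum(len(r) for r in R_lists)
    return total_list_difference(R_star_lists, R_lists) / total_elements

# Two layers (N = 2), four feature maps each (M = 4); lists are illustrative.
R = [[1, 2, 3, 4], [1, 2, 3, 4]]
R_star = [[1, 3, 2, 4], [4, 2, 3, 1]]
print(total_list_difference(R_star, R))   # 4
print(total_difference_ratio(R_star, R))  # 0.5
```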
It is to be appreciated that the step 230 may be understood as a process of re-learning of the convolution parameters for the convolutional layer 120 based on the training dataset and the re-ranked feature maps.
In addition, the method 200 may be performed iteratively until the amount of change in the order of the first feature maps is equal to or smaller than the predetermined threshold. In this sense, the method 200 may further comprise updating the feature maps for the convolutional layer 120 based on the updated convolution parameters and changing an order of the updated feature maps according to correlation among the updated feature maps to obtain third feature maps. Furthermore, the method 200 may further comprise:
determining an amount of change in the order of the updated feature maps; and in response to the amount being equal to or smaller than the predetermined threshold, stopping updating the convolution parameters.
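The iterative learn/re-rank/re-learn loop described above might be sketched as follows. `train_step` and `rerank` are placeholder interfaces for the learning algorithm (step 210/230) and the correlation-based re-ranking (step 220); they are assumptions, not the patent's concrete implementation.

```python
def train_until_stable(train_step, rerank, maps, threshold=0, max_iters=10):
    # Iterate: (re-)learn parameters and feature maps, re-rank them, and stop
    # once the amount of change in the order drops to the threshold or below.
    for _ in range(max_iters):
        maps = train_step(maps)       # steps 210/230: (re-)learn with current order
        new_order = rerank(maps)      # step 220: indexes of the re-ranked maps
        # Amount of change: positions where the new order differs from the old one.
        change = sum(1 for pos, idx in enumerate(new_order) if pos != idx)
        maps = [maps[i] for i in new_order]  # apply the new order
        if change <= threshold:       # order is stable: stop updating parameters
            break
    return maps

# Trivial usage: an identity "training" step and an already-stable order,
# so the loop terminates on the first iteration.
result = train_until_stable(lambda m: m, lambda m: list(range(len(m))), [3, 1, 2])
print(result)  # [3, 1, 2]
```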
In some embodiments, the method 200 may further comprise receiving a testing dataset for multimedia content at the input layer 110, performing a classification on the testing dataset, and outputting results of the classification at the output layer 160.
In some embodiments, the convolution operation is performed by using linear filters and a length of each of the filters is smaller than the number of the feature maps in each of the convolutional layers.
To sum up, in the embodiments of the present disclosure, an order of feature maps of at least one of convolutional layers in a CNN is changed according to correlation among the feature maps so that similar feature maps are arranged to be neighbors and then the convolution parameters are re-learned. Thus, the correlation between neighboring elements in the channel domain is improved so that applying convolution operations in both spatial domain and channel domain yields a better performance.
Reference is made to Fig. 4, which shows an example electronic device or computer system/server 12 that is applicable to implement the embodiments of the present disclosure. Computer system/server 12 is only illustrative and is not intended to suggest any limitation as to the scope of use or functionality of embodiments of the invention described herein.
As shown in Fig. 4, computer system/server 12 is shown in the form of a general-purpose computing device. The components of computer system/server 12 may include, but are not limited to, one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including system memory 28 to processor 16.
Computer system/server 12 typically includes a variety of computer system
readable media. Such media may be any available media that is accessible by computer system/server 12, and it includes both volatile and non-volatile media, removable and non-removable media.
Program/utility 40, having a set (at least one) of program modules 42, may be stored in memory 28 by way of example, and not limitation, as well as an operating system, one or more application programs, other program modules, and program data. Each of the operating system, one or more application programs, other program modules, and program data or some combination thereof, may include an implementation of a networking environment. Program modules 42 generally carry out the functions and/or methodologies of embodiments of the invention as described herein.
Computer system/server 12 may also communicate with one or more external devices 14 such as a keyboard, a pointing device, a display 24, and the like; with one or more devices that enable a user to interact with computer system/server 12; and/or with any devices (for example, a network card or a modem) that enable computer system/server 12 to communicate with one or more other computing devices. Such communication can occur via Input/Output (I/O) interfaces 22. Still yet, computer system/server 12 can communicate with one or more networks such as a local area network (LAN), a general wide area network (WAN), and/or a public network (for example, the Internet) via network adapter 20. As depicted, network adapter 20 communicates with the other components of computer system/server 12 via bus 18. It should be understood that although not shown, other hardware and/or software components could be used in conjunction with computer system/server 12. Examples include, but are not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data archival storage systems, and the like.
In computer system/server 12, I/O interfaces 22 may support one or more of various different input devices that can be used to provide input to computer system/server 12. For example, the input device(s) may include a user device such as a keyboard, keypad, touch pad, trackball, and the like. The input device(s) may implement one or more natural user interface techniques, such as speech recognition, touch and stylus recognition, recognition of gestures in contact with the input device(s) and adjacent to the input device(s), recognition of air gestures, head and eye tracking, voice and speech recognition, sensing user brain activity, and machine intelligence.
Claims (14)
- A method, comprising: determining, based on a training dataset for multimedia content, convolution parameters and first feature maps for a convolutional layer in a convolutional neural network; changing an order of the first feature maps according to correlation among the first feature maps to obtain second feature maps; and updating the convolution parameters based on the training dataset and the second feature maps.
- The method of Claim 1, wherein updating the convolution parameters comprises: determining an amount of change in the order of the first feature maps; and in response to the amount being larger than a predetermined threshold, updating the convolution parameters.
- The method of Claim 2, further comprising: assigning indexes to the first feature maps; and generating a list of the indexes.
- The method of Claim 3, further comprising: updating the list of the indexes based on the second feature maps.
- The method of Claim 4, wherein determining an amount of change in the order of the first feature maps comprises: determining a difference between the generated list of the indexes and the updated list of the indexes.
- The method of any of Claims 1 to 5, wherein changing an order of the first feature maps according to correlation among the first feature maps comprises: obtaining representation information of the first feature maps; determining differences among the representation information; and determining the correlation based on the differences among the representation information.
- An apparatus, comprising: at least one processor; and at least one memory including computer program code; the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to: determine, based on a training dataset for multimedia content, convolution parameters and first feature maps for a convolutional layer in a convolutional neural network; change an order of the first feature maps according to correlation among the first feature maps to obtain second feature maps; and update the convolution parameters based on the training dataset and the second feature maps.
- The apparatus of Claim 7, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus to: determine an amount of change in the order of the first feature maps; and update the convolution parameters in response to the amount being larger than a predetermined threshold.
- The apparatus of Claim 8, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to further perform: assign indexes to the first feature maps; and generate a list of the indexes.
- The apparatus of Claim 9, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to further perform: update the list of the indexes based on the second feature maps.
- The apparatus of Claim 10, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus to determine an amount of change in the order of the first feature maps by determining a difference between the generated list of the indexes and the updated list of the indexes.
- The apparatus of any of Claims 7 to 11, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus to: obtain representation information of the first feature maps; determine differences among the representation information; determine the correlation based on the differences among the representation information; and change the order of the first feature maps according to the correlation among the first feature maps to obtain the second feature maps.
- An apparatus comprising means for performing a method according to any of Claims 1 to 6.
- A computer program product comprising at least one computer readable non-transitory memory medium having program code stored thereon, the program code which, when executed by an apparatus, causes the apparatus to perform a method according to any of Claims 1 to 6.
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201680086870.2A CN109643396A (en) | 2016-06-17 | 2016-06-17 | Construct convolutional neural networks |
| EP16905085.3A EP3472760A4 (en) | 2016-06-17 | 2016-06-17 | CONSTRUCTION OF A NEURONAL CONVOLUTION NETWORK |
| PCT/CN2016/086154 WO2017214970A1 (en) | 2016-06-17 | 2016-06-17 | Building convolutional neural network |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2016/086154 WO2017214970A1 (en) | 2016-06-17 | 2016-06-17 | Building convolutional neural network |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2017214970A1 true WO2017214970A1 (en) | 2017-12-21 |
Family
ID=60663851
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2016/086154 Ceased WO2017214970A1 (en) | 2016-06-17 | 2016-06-17 | Building convolutional neural network |
Country Status (3)
| Country | Link |
|---|---|
| EP (1) | EP3472760A4 (en) |
| CN (1) | CN109643396A (en) |
| WO (1) | WO2017214970A1 (en) |
Cited By (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113015984A * | 2018-01-08 | 2021-06-22 | 达莉娅·弗罗洛瓦 | Error correction in convolutional neural networks |
| WO2020050478A1 * | 2018-09-06 | 2020-03-12 | Samsung Electronics Co., Ltd. | Computing apparatus using convolutional neural network and method of operating the same |
| US11580354B2 | 2018-09-06 | 2023-02-14 | Samsung Electronics Co., Ltd. | Computing apparatus using convolutional neural network and method of operating the same |
| US10810726B2 | 2019-01-30 | 2020-10-20 | Walmart Apollo, Llc | Systems and methods for detecting content in images using neural network architectures |
| US10922584B2 | 2019-01-30 | 2021-02-16 | Walmart Apollo, Llc | Systems, methods, and techniques for training neural networks and utilizing the neural networks to detect non-compliant content |
| US11568172B2 | 2019-01-30 | 2023-01-31 | Walmart Apollo, Llc | Systems, methods, and techniques for training neural networks and utilizing the neural networks to detect non-compliant content |
| US12347178B2 | 2019-01-30 | 2025-07-01 | Walmart Apollo, Llc | Systems, methods, and techniques for training neural networks and utilizing the neural networks to detect non-compliant content |
| US11758069B2 | 2020-01-27 | 2023-09-12 | Walmart Apollo, Llc | Systems and methods for identifying non-compliant images using neural network architectures |
| CN111898743A * | 2020-06-02 | 2020-11-06 | 深圳市九天睿芯科技有限公司 | CNN acceleration method and accelerator |
| CN113435376A * | 2021-07-05 | 2021-09-24 | 宝鸡文理学院 | Bidirectional feature fusion deep convolution neural network construction method based on discrete wavelet transform |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110309837B (en) * | 2019-07-05 | 2021-07-06 | 北京迈格威科技有限公司 | Data processing method and image processing method based on convolutional neural network feature map |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20140079297A1 (en) * | 2012-09-17 | 2014-03-20 | Saied Tadayon | Application of Z-Webs and Z-factors to Analytics, Search Engine, Learning, Recognition, Natural Language, and Other Utilities |
| US20150117760A1 (en) * | 2013-10-30 | 2015-04-30 | Nec Laboratories America, Inc. | Regionlets with Shift Invariant Neural Patterns for Object Detection |
| CN104978601A (en) * | 2015-06-26 | 2015-10-14 | 深圳市腾讯计算机系统有限公司 | Neural network model training system and method |
| US20160086078A1 (en) * | 2014-09-22 | 2016-03-24 | Zhengping Ji | Object recognition with reduced neural network weight precision |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN104346622A (en) * | 2013-07-31 | 2015-02-11 | 富士通株式会社 | Convolutional neural network classifier, and classifying method and training method thereof |
| CN104850890B (en) * | 2015-04-14 | 2017-09-26 | 西安电子科技大学 | Instance-based learning and the convolutional neural networks parameter regulation means of Sadowsky distributions |
| JP6771018B2 (en) * | 2015-07-23 | 2020-10-21 | マイヤプリカ テクノロジー エルエルシー | Improved performance of 2D array processor |
| US20180082181A1 (en) * | 2016-05-13 | 2018-03-22 | Samsung Electronics, Co. Ltd. | Neural Network Reordering, Weight Compression, and Processing |
- 2016-06-17 WO PCT/CN2016/086154 patent/WO2017214970A1/en not_active Ceased
- 2016-06-17 CN CN201680086870.2A patent/CN109643396A/en active Pending
- 2016-06-17 EP EP16905085.3A patent/EP3472760A4/en not_active Withdrawn
Also Published As
| Publication number | Publication date |
|---|---|
| EP3472760A1 (en) | 2019-04-24 |
| CN109643396A (en) | 2019-04-16 |
| EP3472760A4 (en) | 2020-03-04 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US12154188B2 (en) | Training neural networks for vehicle re-identification | |
| WO2017214970A1 (en) | Building convolutional neural network | |
| CN110232689B (en) | Semantic categories localize digital environments | |
| CN114255381B (en) | Training method of image recognition model, image recognition method, device and medium | |
| US11074434B2 (en) | Detection of near-duplicate images in profiles for detection of fake-profile accounts | |
| Roy et al. | Deep learning based hand detection in cluttered environment using skin segmentation | |
| CN108734210B (en) | An object detection method based on cross-modal multi-scale feature fusion | |
| WO2016054779A1 (en) | Spatial pyramid pooling networks for image processing | |
| US12322150B2 (en) | Method and apparatus with object tracking | |
| CN110276366A (en) | Detecting objects using a weakly supervised model | |
| WO2022033095A1 (en) | Text region positioning method and apparatus | |
| WO2021136027A1 (en) | Similar image detection method and apparatus, device and storage medium | |
| KR20180134738A (en) | Electronic apparatus and method for generating trained model thereof | |
| Kumari et al. | RETRACTED ARTICLE: Efficient facial emotion recognition model using deep convolutional neural network and modified joint trilateral filter | |
| Praveena et al. | [Retracted] Effective CBMIR System Using Hybrid Features‐Based Independent Condensed Nearest Neighbor Model | |
| US20220309686A1 (en) | Method and apparatus with object tracking | |
| US9659235B2 (en) | Low-dimensional structure from high-dimensional data | |
| CN114677565B (en) | Training method and image processing method and device for feature extraction network | |
| CN110956131A (en) | Single target tracking method, device and system | |
| US10534980B2 (en) | Method and apparatus for recognizing object based on vocabulary tree | |
| CN109214271B (en) | Method and device for determining loss function for re-identification | |
| CN114170439B (en) | Gesture recognition method, device, storage medium and electronic device | |
| Bharathiraja et al. | Exposing digital image forgeries from statistical footprints | |
| US11451694B1 (en) | Mitigation of obstacles while capturing media content | |
| Choi | Real-time artificial neural network for high-dimensional medical image |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | | Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 16905085; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2016905085; Country of ref document: EP; Effective date: 20190117 |