
WO2017214970A1 - Building convolutional neural network - Google Patents

Building convolutional neural network

Info

Publication number
WO2017214970A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature maps
indexes
list
determining
program code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2016/086154
Other languages
French (fr)
Inventor
Jiale CAO
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nokia Technologies Beijing Co Ltd
Nokia Technologies Oy
Original Assignee
Nokia Technologies Beijing Co Ltd
Nokia Technologies Oy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Technologies Beijing Co Ltd and Nokia Technologies Oy
Priority to CN201680086870.2A (published as CN109643396A)
Priority to EP16905085.3A (published as EP3472760A4)
Priority to PCT/CN2016/086154 (published as WO2017214970A1)
Publication of WO2017214970A1

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/09Supervised learning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present disclosure provide a method, apparatus and computer program product for information processing. The method comprises: determining, based on a training dataset for multimedia content, convolution parameters and first feature maps for a convolutional layer in a convolutional neural network; changing an order of the first feature maps according to correlation among the first feature maps to obtain second feature maps; and updating the convolution parameters based on the training dataset and the second feature maps.

Description

BUILDING CONVOLUTIONAL NEURAL NETWORK

FIELD OF THE INVENTION
Embodiments of the present disclosure generally relate to information processing, and more particularly to methods, apparatuses and computer program products for building a Convolutional Neural Network (CNN) .
BACKGROUND
CNNs have achieved state-of-the-art performance in applications such as image recognition, object detection, and speech recognition. Representative applications of the CNN include AlphaGo, Advanced Driver Assistance Systems (ADAS), self-driving cars, Optical Character Recognition (OCR), face recognition, large-scale image classification (for example, ImageNet classification), and Human-Computer Interaction (HCI).
Generally, CNNs are organized in interleaved layers of two types: convolutional layers and pooling (subsampling) layers. The role of the convolutional layers is feature representation, with the semantic level of the features increasing with the depth of the layers. Designing effective convolutional layers to obtain robust feature maps is the key to improving the performance of CNNs.
SUMMARY
In general, example embodiments of the present disclosure include a method, apparatus and computer program product for building a CNN.
In a first aspect of the present disclosure, a method is provided. The method comprises: determining, based on a training dataset for multimedia content, convolution parameters and first feature maps for a convolutional layer in a convolutional neural network; changing an order of the first feature maps according to correlation among the first feature maps to obtain second feature maps; and updating the convolution parameters based on the training dataset and the second feature maps.
In some embodiments, updating the convolution parameters comprises: determining an amount of change in the order of the first feature maps; and in response to the amount being larger than a predetermined threshold, updating the convolution  parameters.
In some embodiments, the method further comprises: assigning indexes to the first feature maps; and generating a list of the indexes.
In some embodiments, the method further comprises: updating the list of the indexes based on the second feature maps.
In some embodiments, determining an amount of change in the order of the first feature maps comprises: determining a difference between the generated list of the indexes and the updated list of the indexes.
In some embodiments, changing an order of the first feature maps according to correlation among the first feature maps comprises: obtaining representation information of the first feature maps; determining differences among the representation information; and determining the correlation based on the differences among the representation information.
In a second aspect of the present disclosure, an apparatus is provided. The apparatus comprises: at least one processor; and at least one memory including computer program code; the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to perform: determining, based on a training dataset for multimedia content, convolution parameters and first feature maps for a convolutional layer in a convolutional neural network; changing an order of the first feature maps according to correlation among the first feature maps to obtain second feature maps; and updating the convolution parameters based on the training dataset and the second feature maps.
In a third aspect of the present disclosure, an apparatus is provided. The apparatus comprises means for performing a method according to the first aspect of the present disclosure.
In a fourth aspect of the present disclosure, a computer program product is provided. The computer program product comprises at least one computer readable non-transitory memory medium having program code stored thereon, the program code which, when executed by an apparatus, causes the apparatus to perform a method according to the first aspect of the present disclosure.
It is to be understood that the Summary is not intended to identify key or essential features of embodiments of the present disclosure, nor is it intended to be used to limit the  scope of the present disclosure. Other features of the present disclosure will become easily comprehensible through the description below.
BRIEF DESCRIPTION OF THE DRAWINGS
Through the more detailed description of some embodiments of the present disclosure in the accompanying drawings, the above and other objects, features and advantages of the present disclosure will become more apparent, wherein:
Fig. 1 schematically shows an architecture of a CNN in which embodiments of the present disclosure can be implemented;
Fig. 2 is a flowchart of a method in accordance with embodiments of the present disclosure;
Fig. 3a shows an example of the feature maps prior to re-ranking;
Fig. 3b shows an example of the re-ranked feature maps; and
Fig. 4 is a block diagram of an electronic device in which embodiments of the present disclosure can be implemented;
Throughout the drawings, same or similar reference numerals represent the same or similar element.
DETAILED DESCRIPTION
Principles of the present disclosure will now be described with reference to some example embodiments. It is to be understood that these embodiments are described for the purpose of illustration only and to help those skilled in the art understand and implement the present disclosure, without suggesting any limitations as to the scope of the invention. The invention described herein can be implemented in various manners other than the ones described below.
As used herein, the term “includes” and its variants are to be read as open terms that mean “includes, but is not limited to.” The term “based on” is to be read as “based at least in part on.” The terms “one embodiment” and “an embodiment” are to be read as “at least one embodiment.” The term “another embodiment” is to be read as “at least one other embodiment.” Other definitions, explicit and implicit, may be included below.
Reference is first made to Fig. 1, which schematically shows an architecture of a  CNN 100 in which embodiments of the present disclosure can be implemented. It is to be understood that the structure and functionality of the CNN 100 are described only for the purpose of illustration without suggesting any limitations as to the scope of the present disclosure described herein. The present disclosure described herein can be embodied with a different structure and/or functionality.
As shown in Fig. 1, the CNN 100 includes an input layer 110,  convolutional layers  120, 140,  pooling layers  130, 150, and an output layer 160. Typically, the  convolutional layers  120, 140 and the  pooling layers  130, 150 are organized in an alternating form. In some embodiments, the convolutional layer 120 is followed by the pooling layer 130 and the convolutional layer 140 is followed by the pooling layer 150. In some embodiments, the CNN 100 only includes one of the  pooling layers  130 and 150 which follows the successive  convolutional layers  120 and 140. In other embodiments, the CNN 100 does not include any pooling layers.
The CNN 100 may be trained with a training dataset. The training dataset enters the CNN 100 at the input layer 110. Once trained, the CNN 100 may be used for image recognition, object detection, speech recognition, and so on. The role of the  convolutional layers  120 and 140 is feature representation with the semantic level of the features increasing with the depth of the layers. The  pooling layers  130 and 150 are obtained by replacing an output of a preceding convolutional layer at certain location with summary statistic of the nearby outputs. The output layer 160 outputs classification results.
Each of the input layer 110,  convolutional layers  120, 140,  pooling layers  130, 150, and output layer 160 includes one or more planes arranged along a Z dimension. Each of the planes is defined by an X dimension and a Y dimension, which is referred to as a spatial domain. Each of the planes in the  convolutional layers  120, 140 and pooling  layers  130, 150 may be considered as a feature map or a channel which has a feature detector. Thus, the Z dimension is also referred to as a channel dimension or channel domain.
The feature maps of each of the  convolutional layers  120 and 140 may be obtained by applying convolution operation on the feature maps in a respective preceding layer in both spatial domain and channel domain. By means of the convolution operation, each of elements in the feature maps of the  convolutional layers  120 and 140 is only connected with elements in a local region of the feature maps in a preceding layer. In this sense, applying the convolution operation to a preceding layer of a convolutional layer means that  there is a sparse connection between these two layers. Thus, as used herein, the terms “convolution operation” and “sparse connection” may be used interchangeably.
It is known that the convolution operation is suitable for the situation where neighboring elements are highly correlated. However, because existing learning algorithms do not guarantee that neighboring elements between different feature maps in the channel domain are highly correlated, the correlation between neighboring elements in the channel domain is not as large as the correlation between neighboring elements in the spatial domain. As a result, the sparse connections in the channel domain cannot result in a good performance. For example, in the case of image recognition, the feature maps obtained via the convolution operation do not have strong ability of feature representation and thus cannot be used as discriminative representations of an image.
In accordance with embodiments of the present disclosure, a scheme for building a CNN is proposed to improve the correlation between neighboring elements in the channel domain so that applying convolution operations in both spatial domain and channel domain yields a better performance.
Fig. 2 shows a flowchart of a method 200 for building a CNN in accordance with embodiments of the present disclosure. The method 200 may be implemented in a CNN, such as the CNN 100 as shown in Fig. 1. In some embodiments, the method 200 may be performed with respect to both of the  convolutional layers  120 and 140 so as to improve correlation between different feature maps in these convolutional layers. In other embodiments, the method 200 may be performed with respect to any of the  convolutional layers  120 and 140. Hereinafter, for the purpose of illustration, the method 200 will be described by taking convolutional layer 120 as an example.
As shown, the method 200 is entered in step 210, where convolution parameters and first feature maps for the convolutional layer 120 in the CNN 100 are determined based on a training dataset for multimedia content.
As described above, once trained, the CNN 100 may be used for image recognition, object detection, speech recognition, and so on. In this regard, examples of the training dataset for multimedia content include, but are not limited to, training datasets for images, speech, video and the like.
Typically, a convolution operation may be performed by using linear filters. In particular, each of the filters is convolved over the feature maps in a preceding layer with a predefined stride, followed by a nonlinear activation. In the case of using linear filters, the convolution parameters include the weights of the linear filters. Different feature maps correspond to different filter parameters, while all elements within a single feature map share the same parameters. Alternatively, instead of using the linear filters, the convolution operation may be performed by using nonlinear functions, such as a shallow MultiLayer Perceptron (MLP). In this case, the convolution parameters include parameters for the MLP.
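As an illustration of the two variants just described, the following is a minimal sketch (assuming PyTorch; the layer sizes and the ReLU activation are illustrative assumptions, not taken from the source):

```python
# A minimal sketch of the two convolution variants described above: a linear
# filter convolved with a predefined stride followed by a nonlinear
# activation, and a "shallow MLP" convolution realized with 1x1 convolutions.
import torch
import torch.nn as nn

M = 8  # number of feature maps produced by the convolutional layer

# Variant 1: linear filters + nonlinear activation. The weights of Conv2d
# are the "convolution parameters"; all elements of one output feature map
# share the same filter parameters.
linear_conv = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=M, kernel_size=3,
              stride=1, padding=1),
    nn.ReLU(),
)

# Variant 2: a shallow MLP applied at every location, here a 3x3 convolution
# followed by a 1x1 convolution acting as the MLP's hidden layer.
mlp_conv = nn.Sequential(
    nn.Conv2d(3, M, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(M, M, kernel_size=1), nn.ReLU(),
)

x = torch.randn(1, 3, 32, 32)   # a dummy training image
feature_maps = linear_conv(x)   # shape: (1, M, 32, 32)
```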
In some embodiments, the convolution parameters and first feature maps for the convolutional layer 120 are determined by applying a learning algorithm to the training dataset. Examples of the learning algorithm include, but are not limited to, back propagation (BP) , stochastic gradient descent (SGD) , and limited-memory BFGS (Broyden, Fletcher, Goldfarb, and Shanno) .
It is to be understood that in order to determine the convolution parameters and first feature maps for the convolutional layer 120, a number of the first feature maps may be pre-determined. Usually, in order to detect multiple features of the multimedia content, multiple feature maps are used in the convolutional layer 120. Thus, the number, denoted by M, may be pre-determined as any appropriate integer larger than one, for example, eight.
In some embodiments, the method 200 further comprises assigning indexes to the determined feature maps. Fig. 3a shows an example of the first feature maps with the indexes assigned. As shown in Fig. 3a, the first feature maps include eight feature maps with the indexes of 1, 2, ..., 8. Thus, the eight feature maps with the indexes of 1, 2, ..., 8 are also called feature maps C1, C2, ..., CM.
In some embodiments, the method 200 further comprises generating a list of the indexes. With the list of the indexes, it can be determined which feature maps are neighbors. In the example shown in Fig. 3a, the list of the indexes, denoted by R, is [1, 2, 3, 4, 5, 6, 7, 8].
As described above, the existing learning algorithms, such as the BP algorithm, do not guarantee that neighboring elements in different feature maps in the channel domain are highly correlated. For example, elements in the feature map 1 and those in feature map 2 are neighboring elements as shown in Fig. 3a, but correlation between these neighboring elements may not be high. Similarly, correlation between the neighboring elements in the feature maps 3 and 4 may not be high.
Referring back to Fig. 2, in order to improve the correlation between the  neighboring elements in different feature maps, in step 220, an order of the first feature maps obtained in step 210 is changed according to correlation among the first feature maps so as to obtain second feature maps. In other words, the second feature maps are obtained by re-ranking the first feature maps. Thus, hereinafter, the second feature maps are also called the “re-ranked feature maps” .
In some embodiments, changing an order of the first feature maps according to correlation among the first feature maps comprises: obtaining representation information of the first feature maps; determining differences among the representation information; and determining the correlation based on the differences among the representation information.
Examples of the representation information include, but are not limited to, Histograms of Oriented Gradient (HOG) in an intensity image, information extracted by an algorithm of Scale-Invariant Feature Transform (SIFT) and the like. In the case of HOG, the representation information is also referred to as HOG features. Hereinafter, for the purpose of illustration, the step 220 will be described in detail by taking HOG features as an example and referring to Fig. 3a.
With respect to each of the feature maps C1, C2, ..., CM as shown in Fig. 3a, HOG features are extracted and expressed as f(Ci), where i = 1, ..., M. It is to be appreciated that the process for extracting HOG features is known in the art, and thus a detailed description thereof is omitted herein.
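For concreteness, a minimal sketch (assuming scikit-image; the HOG parameters below are illustrative defaults, not taken from the source) of extracting f(Ci) from each feature map:

```python
# Extract the K-dimensional HOG descriptor f(Ci) of each feature map Ci.
import numpy as np
from skimage.feature import hog

def hog_features(feature_map: np.ndarray) -> np.ndarray:
    """Return the HOG descriptor f(Ci) of one 2D feature map."""
    return hog(feature_map, orientations=9,
               pixels_per_cell=(8, 8), cells_per_block=(2, 2))

feature_maps = [np.random.rand(32, 32) for _ in range(8)]  # stand-ins for Ci
f = [hog_features(c) for c in feature_maps]                # f[i] = f(C_{i+1})
```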
The re-ranked maps (for example, the second feature maps) are denoted by D1, D2, ..., DM, and let the feature map C1 be the first one of the re-ranked maps, for example, D1←C1. Then, the differences between the HOG features f(C1) of the feature map C1 and those of the remaining M-1 feature maps C2, C3, ..., CM may be determined as follows:
$$g(C_1, C_i) = \sum_{k=1}^{K} \left| f(C_1)(k) - f(C_i)(k) \right|, \quad i = 2, \ldots, M \tag{1}$$

where K is the number of HOG features in each feature map and f(Ci)(k) is the kth HOG feature of the feature map Ci.
Then, a feature map j which has the smallest difference g to the feature map C1 is determined as follows:
$$j = \arg\min_{i \in \{2, \ldots, M\}} g(C_1, C_i) \tag{2}$$
Next, let the feature map Cj be the second one of the re-ranked maps D1, D2, ... , DM, for example, D2←Cj, and the above Equations (1) and (2) are applied to the feature  map Cj to find from the remaining M-2 feature maps a feature map which has the smallest difference to the feature map Cj. Similarly, the other feature maps of the re-ranked maps can be determined.
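To make the greedy re-ranking concrete, the following is a minimal Python sketch, an illustration assuming the difference g of Equation (1) is the sum of absolute HOG differences; the stand-in descriptors are random:

```python
# Greedy re-ranking: start from C1 (D1 <- C1), then repeatedly append the
# remaining feature map whose HOG descriptor is closest to that of the last
# selected map, per Equations (1) and (2).
import numpy as np

def rerank(f):
    order = [0]                              # D1 <- C1 (0-based index)
    remaining = list(range(1, len(f)))
    while remaining:
        last = order[-1]
        # Equation (1): g = sum_k |f(C_last)(k) - f(C_i)(k)|
        # Equation (2): pick the index j with the smallest difference g
        j = min(remaining, key=lambda i: np.abs(f[last] - f[i]).sum())
        order.append(j)
        remaining.remove(j)
    return order

f = [np.random.rand(324) for _ in range(8)]  # stand-in HOG descriptors
R_star = [i + 1 for i in rerank(f)]          # 1-based updated list R*
```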
Fig. 3b shows an example of the re-ranked feature maps. As shown in Fig. 3b, the re-ranked feature maps include eight feature maps with the indexes of 1, 5, 3, 4, 8, 6, 7, 2.
In some embodiments, the method 200 further comprises updating the list of the indexes R based on the re-ranked feature maps. With reference to Fig. 3b, the updated list of the indexes, denoted by R*, is [1, 5, 3, 4, 8, 6, 7, 2].
Referring back to Fig. 2, in step 230, the convolution parameters for the convolutional layer 120 are updated based on the training dataset and the re-ranked feature maps.
In some embodiments, updating the convolution parameters comprises: determining an amount of change in the order of the first feature maps; and in response to the amount being larger than a predetermined threshold, updating the convolution parameters.
In some embodiments, determining the amount of change in the order of the first feature maps comprises: determining a difference between the generated list of the indexes and the updated list of the indexes. In the above example, determining the amount of change in the order of the first feature maps comprises determining a difference between the list of the indexes R and the updated list of the indexes R*.
As an example, the difference between the list of the indexes R and the updated list of the indexes R* may be determined as follows:

$$s(R^*, R) = \sum_{j=1}^{M} \mathbf{1}\left(R^*_j \neq R_j\right) \tag{3}$$

where s(R*, R) represents the difference between the list of the indexes R and the updated list of the indexes R*, 1(·) equals 1 when its condition holds and 0 otherwise, R*_j represents the jth element in the list R*, and R_j represents the jth element in the list R. If s(R*, R) is larger than a predetermined threshold, for example, 0, the convolution parameters will be updated.
As another example, the difference between the list of the indexes R and the updated list of the indexes R* may be determined by determining the ratio of different elements between the lists R* and R to the total number of elements in the list R*, as follows:

$$d_j = \begin{cases} 1, & \text{if } R^*_j \neq R_j \\ 0, & \text{otherwise} \end{cases} \tag{4}$$

$$w(R^*, R) = \frac{1}{M} \sum_{j=1}^{M} d_j \tag{5}$$

where w(R*, R) represents the ratio of different elements between the lists R* and R to the total number of elements in the list R*, R*_j represents the jth element in the list R*, and R_j represents the jth element in the list R. If w(R*, R) is larger than a predetermined threshold, the convolution parameters will be updated. The predetermined threshold may be any value in the range of 0.5 to 1, for example, 0.8.
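The following is a minimal Python sketch of these two measures as reconstructed in Equations (3) to (5), using the index lists from Figs. 3a and 3b:

```python
# s counts the positions where the updated list R* differs from R;
# w normalizes that count by the number of elements M.
def order_change_s(R_star, R):
    return sum(1 for a, b in zip(R_star, R) if a != b)   # Equation (3)

def order_change_w(R_star, R):
    return order_change_s(R_star, R) / len(R)            # Equations (4)-(5)

R      = [1, 2, 3, 4, 5, 6, 7, 8]
R_star = [1, 5, 3, 4, 8, 6, 7, 2]                        # from Fig. 3b
print(order_change_s(R_star, R))                         # 3 positions differ
print(order_change_w(R_star, R))                         # 0.375
```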
As described above, the method 200 may be performed with respect to both of the convolutional layers 120 and 140 so as to improve correlation between different feature maps in these convolutional layers. In this case, the above Equation (3) may be re-written as follows:

$$s(R^*, R) = \sum_{i=1}^{N} \sum_{j=1}^{M} \mathbf{1}\left(R^*(i, j) \neq R(i, j)\right)$$

where R*(i, j) represents the jth element in the list R* for the convolutional layer i, R(i, j) represents the jth element in the list R for the convolutional layer i, and N represents the number of the convolutional layers.
Similarly, the above Equations (4) and (5) may be re-written as follows:

$$d_{i,j} = \begin{cases} 1, & \text{if } R^*(i, j) \neq R(i, j) \\ 0, & \text{otherwise} \end{cases}$$

$$w(R^*, R) = \frac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} d_{i,j}$$
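A minimal Python sketch of these multi-layer measures, an illustration under the same reconstruction as above:

```python
# Mismatches are summed over all N convolutional layers; w is normalized by
# the total number of elements N*M across the per-layer index lists.
def multilayer_s(R_star_lists, R_lists):
    return sum(1 for rs, r in zip(R_star_lists, R_lists)
                 for a, b in zip(rs, r) if a != b)

def multilayer_w(R_star_lists, R_lists):
    total = sum(len(r) for r in R_lists)                 # N * M elements
    return multilayer_s(R_star_lists, R_lists) / total

# Example with N = 2 layers (e.g., the convolutional layers 120 and 140)
R_lists      = [[1, 2, 3, 4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7, 8]]
R_star_lists = [[1, 5, 3, 4, 8, 6, 7, 2], [2, 1, 3, 4, 5, 6, 7, 8]]
print(multilayer_w(R_star_lists, R_lists))               # 5/16 = 0.3125
```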
It is to be appreciated that the step 230 may be understood as a process of re-learning of the convolution parameters for the convolutional layer 120 based on the training dataset and the re-ranked feature maps.
In addition, the method 200 may be performed iteratively until the amount of change in the order of the first feature maps is equal to or smaller than the predetermined threshold. In this sense, the method 200 may further comprise updating the feature maps for the convolutional layer 120 based on the updated convolution parameters, and changing an order of the updated feature maps according to correlation among the updated feature maps to obtain third feature maps. Furthermore, the method 200 may further comprise: determining an amount of change in the order of the updated feature maps; and in response to the amount being equal to or smaller than the predetermined threshold, stopping updating the convolution parameters.
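This iterative procedure can be summarized as follows; the sketch below is a hypothetical Python skeleton in which train_layer and extract_hog are placeholder stubs for the learning algorithm (for example, BP or SGD) and the HOG extractor, not APIs from the source:

```python
# Iterate train -> re-rank -> re-learn until the order change is at or
# below the threshold (steps 210, 220, 230 of method 200).
import numpy as np

def train_layer(dataset, M, params=None):       # stub learner (step 210/230)
    return {"w": np.random.rand(M)}, [np.random.rand(32, 32) for _ in range(M)]

def extract_hog(c):                             # stub for f(Ci)
    return c.ravel()

def rerank(f):                                  # Equations (1)-(2), greedy
    order, rest = [0], list(range(1, len(f)))
    while rest:
        j = min(rest, key=lambda i: np.abs(f[order[-1]] - f[i]).sum())
        order.append(j); rest.remove(j)
    return order

def build_layer(dataset, M=8, threshold=0, max_iters=10):
    params, maps = train_layer(dataset, M)      # step 210
    R = list(range(1, M + 1))                   # initial index list R
    for _ in range(max_iters):
        perm = rerank([extract_hog(c) for c in maps])
        R_star = [R[p] for p in perm]           # updated index list R*
        if sum(a != b for a, b in zip(R_star, R)) <= threshold:
            break                               # order is stable: stop
        maps = [maps[p] for p in perm]          # step 220: re-ranked maps
        params, maps = train_layer(dataset, M, params)  # step 230: re-learn
        R = R_star
    return params, maps
```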
In some embodiments, the method 200 may further comprise receiving a testing dataset for multimedia content at the input layer 110, performing a classification on the testing dataset, and outputting results of the classification at the output layer 160.
In some embodiments, the convolution operation is performed by using linear filters and a length of each of the filters is smaller than the number of the feature maps in each of the convolutional layers.
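For illustration, a minimal sketch (assuming PyTorch; treating the channel domain as the depth axis of a 3D convolution is one possible reading of this constraint):

```python
# Convolution that is sparse in both the spatial and the channel domain:
# the M feature maps are stacked along the Z (channel) axis and convolved
# with a 3D filter whose length along Z (3) is smaller than M (= 8).
import torch
import torch.nn as nn

M = 8
maps = torch.randn(1, 1, M, 32, 32)        # (batch, 1, Z, Y, X)
conv3d = nn.Conv3d(in_channels=1, out_channels=1,
                   kernel_size=(3, 3, 3),  # filter length 3 < M along Z
                   padding=(1, 1, 1))
out = conv3d(maps)                         # shape: (1, 1, M, 32, 32)
# Each output element depends on only 3 neighboring feature maps, which is
# why re-ranking correlated maps to be neighbors improves this connection.
```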
To sum up, in the embodiments of the present disclosure, an order of feature maps of at least one of convolutional layers in a CNN is changed according to correlation among the feature maps so that similar feature maps are arranged to be neighbors and then the convolution parameters are re-learned. Thus, the correlation between neighboring elements in the channel domain is improved so that applying convolution operations in both spatial domain and channel domain yields a better performance.
Reference is made to Fig. 4, in which an example electronic device or computer system/server 12 which is applicable to implement the embodiments of the present disclosure is shown. Computer system/server 12 is only illustrative and is not intended to suggest any limitation as to the scope of use or functionality of embodiments of the invention described herein.
As shown in Fig. 4, computer system/server 12 is shown in the form of a general-purpose computing device. The components of computer system/server 12 may include, but are not limited to, one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including system memory 28 to processor 16.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
Computer system/server 12 typically includes a variety of computer system  readable media. Such media may be any available media that is accessible by computer system/server 12, and it includes both volatile and non-volatile media, removable and non-removable media.
System memory 28 can include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. Computer system/server 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 34 can be provided for reading from and writing to a non-removable, non-volatile magnetic media (not shown and typically called a “hard drive” ) . Although not shown, a magnetic disk drive for reading from and writing to a removable, non-volatile magnetic disk (for example, a “floppy disk” ) , and an optical disk drive for reading from or writing to a removable, non-volatile optical disk such as a CD-ROM, DVD-ROM or other optical media can be provided. In such instances, each can be connected to bus 18 by one or more data media interfaces. As will be further depicted and described below, memory 28 may include at least one program product having a set (for example, at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
Program/utility 40, having a set (at least one) of program modules 42, may be stored in memory 28 by way of example, and not limitation, as well as an operating system, one or more application programs, other program modules, and program data. Each of the operating system, one or more application programs, other program modules, and program data or some combination thereof, may include an implementation of a networking environment. Program modules 42 generally carry out the functions and/or methodologies of embodiments of the invention as described herein.
Computer system/server 12 may also communicate with one or more external devices 14, such as a keyboard, a pointing device, or a display 24; with one or more devices that enable a user to interact with computer system/server 12; and/or with any devices (for example, a network card or modem) that enable computer system/server 12 to communicate with one or more other computing devices. Such communication can occur via Input/Output (I/O) interfaces 22. Still yet, computer system/server 12 can communicate with one or more networks such as a local area network (LAN), a general wide area network (WAN), and/or a public network (for example, the Internet) via network adapter 20. As depicted, network adapter 20 communicates with the other components of computer system/server 12 via bus 18. It should be understood that, although not shown, other hardware and/or software components could be used in conjunction with computer system/server 12. Examples include, but are not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data archival storage systems, and the like.
In computer system/server 12, I/O interfaces 22 may support one or more of various different input devices that can be used to provide input to computer system/server 12. For example, the input device(s) may include a user device such as a keyboard, keypad, touch pad, trackball, and the like. The input device(s) may implement one or more natural user interface techniques, such as speech recognition, touch and stylus recognition, recognition of gestures in contact with the input device(s) and adjacent to the input device(s), recognition of air gestures, head and eye tracking, voice and speech recognition, sensing user brain activity, and machine intelligence.

Claims (14)

  1. A method, comprising:
    determining, based on a training dataset for multimedia content, convolution parameters and first feature maps for a convolutional layer in a convolutional neural network;
    changing an order of the first feature maps according to correlation among the first feature maps to obtain second feature maps; and
    updating the convolution parameters based on the training dataset and the second feature maps.
  2. The method of Claim 1, wherein updating the convolution parameters comprises:
    determining an amount of change in the order of the first feature maps; and
    in response to the amount being larger than a predetermined threshold, updating the convolution parameters.
  3. The method of Claim 2, further comprising:
    assigning indexes to the first feature maps; and
    generating a list of the indexes.
  4. The method of Claim 3, further comprising:
    updating the list of the indexes based on the second feature maps.
  5. The method of Claim 4, wherein determining an amount of change in the order of the first feature maps comprises:
    determining a difference between the generated list of the indexes and the updated list of the indexes.
  6. The method of any of Claims 1 to 5, wherein changing an order of the first feature maps according to correlation among the first feature maps comprises:
    obtaining representation information of the first feature maps;
    determining differences among the representation information; and
    determining the correlation based on the differences among the representation information.
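 
As one concrete interpretation of Claim 6, offered only as a sketch, the mean activation of each map can stand in for its "representation information", with smaller pairwise differences read as higher correlation. Both choices are assumptions made here, since the claim leaves the form of the representation information open.

    import numpy as np

    def correlation_from_representations(feature_maps):
        # feature_maps: (channels, height, width). Each map is summarized by its
        # mean activation; pairwise absolute differences of these summaries are
        # mapped to similarities in (0, 1], so a smaller difference means a
        # higher assumed correlation.
        reps = feature_maps.mean(axis=(1, 2))          # representation information
        diffs = np.abs(reps[:, None] - reps[None, :])  # differences among representations
        return 1.0 / (1.0 + diffs)                     # larger difference -> lower correlation

    maps = np.random.default_rng(1).standard_normal((4, 8, 8))
    print(correlation_from_representations(maps).round(2))
 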
  7. An apparatus, comprising:
    at least one processor; and
    at least one memory including computer program code;
    the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to:
    determine, based on a training dataset for multimedia content, convolution parameters and first feature maps for a convolutional layer in a convolutional neural network;
    change an order of the first feature maps according to correlation among the first feature maps to obtain second feature maps; and
    update the convolution parameters based on the training dataset and the second feature maps.
  8. The apparatus of Claim 7, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus to:
    determine an amount of change in the order of the first feature maps; and
    update the convolution parameters in response to the amount being larger than a predetermined threshold.
  9. The apparatus of Claim 8, wherein the at least one memory and the computer program code are configured to, with the at least one processor, further cause the apparatus at least to:
    assign indexes to the first feature maps; and
    generate a list of the indexes.
  10. The apparatus of Claim 9, wherein the at least one memory and the computer program code are configured to, with the at least one processor, further cause the apparatus at least to:
    update the list of the indexes based on the second feature maps.
  11. The apparatus of Claim 10, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus to determine an amount of change in the order of the first feature maps by determining a difference between the generated list of the indexes and the updated list of the indexes.
  12. The apparatus of any of Claims 7 to 11, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus to:
    obtain representation information of the first feature maps;
    determine differences among the representation information;
    determine the correlation based on the differences among the representation information; and
    change the order of the first feature maps according to the correlation among the first feature maps to obtain the second feature maps.
  13. An apparatus comprising means for performing a method according to any of Claims 1 to 6.
  14. A computer program product comprising at least one non-transitory computer readable memory medium having program code stored thereon which, when executed by an apparatus, causes the apparatus to perform a method according to any of Claims 1 to 6.