US20220375202A1 - Hierarchical sampling for object identification - Google Patents
- Publication number
- US20220375202A1 (application US 17/882,208)
- Authority
- US
- United States
- Prior art keywords
- snapshots
- descriptors
- descriptor
- cluster
- snapshot
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
- G06V20/47—Detecting features for summarising video content
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2137—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on criteria of topology preservation, e.g. multidimensional scaling or self-organising maps
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/762—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
- G06V10/7625—Hierarchical techniques, i.e. dividing or merging patterns to obtain a tree-like representation; Dendograms
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Definitions
- An aspect of the present disclosure includes a method including receiving a first plurality of snapshots, generating a first plurality of descriptors each associated with the first plurality of snapshots, grouping the first plurality of snapshots into at least one cluster based on the first plurality of descriptors, selecting a representative snapshot for each of the at least one cluster, generating at least one second descriptor for the representative snapshot for each of the at least one cluster, wherein the at least one second descriptor is more complex than the first plurality of descriptors, and identifying a target by applying the at least one second descriptor to a second plurality of snapshots.
- Aspects of the present disclosure include a system having a memory that stores instructions and a processor configured to execute the instructions to receive a first plurality of snapshots, generate a first plurality of descriptors each associated with the first plurality of snapshots, group the first plurality of snapshots into at least one cluster based on the first plurality of descriptors, select a representative snapshot for each of the at least one cluster, generate at least one second descriptor for the representative snapshot for each of the at least one cluster, wherein the at least one second descriptor is more complex than the first plurality of descriptors, and identify a target by applying the at least one second descriptor to a second plurality of snapshots.
- Certain aspects of the present disclosure include a non-transitory computer readable medium having instructions stored therein that, when executed by a processor, cause the processor to receive a first plurality of snapshots, generate a first plurality of descriptors each associated with the first plurality of snapshots, group the first plurality of snapshots into at least one cluster based on the first plurality of descriptors, select a representative snapshot for each of the at least one cluster, generate at least one second descriptor for the representative snapshot for each of the at least one cluster, wherein the at least one second descriptor is more complex than the first plurality of descriptors, and identify a target by applying the at least one second descriptor to a second plurality of snapshots.
- FIG. 1 illustrates an example of an environment for implementing the hierarchical sampling for re-identification process in accordance with aspects of the present disclosure.
- FIG. 2 illustrates an example of a method for implementing the hierarchical sampling for re-identification process in accordance with aspects of the present disclosure.
- FIG. 3 illustrates an example of a method for implementing the hierarchical sampling for re-identification process including classification in accordance with aspects of the present disclosure.
- FIG. 4 illustrates an example of a method for implementing the hierarchical sampling for re-identification process using neural networks in accordance with aspects of the present disclosure.
- FIG. 5 illustrates an example of a computer system in accordance with aspects of the present disclosure.
- a processor can refer to a device that processes signals and performs general computing and arithmetic functions. Signals processed by the processor can include digital signals, data signals, computer instructions, processor instructions, messages, a bit, a bit stream, or other data that can be received, transmitted, and/or detected.
- a processor can include microprocessors, microcontrollers, digital signal processors (DSPs), field programmable gate arrays (FPGAs), programmable logic devices (PLDs), state machines, gated logic, discrete hardware circuits, and other suitable hardware configured to perform the various functionality described herein.
- bus can refer to an interconnected architecture that is operably connected to transfer data between computer components within a singular or multiple systems.
- the bus can be a memory bus, a memory controller, a peripheral bus, an external bus, a crossbar switch, and/or a local bus, among others.
- Memory can include volatile memory and/or non-volatile memory.
- Non-volatile memory can include, for example, ROM (read only memory), PROM (programmable read only memory), and EPROM (erasable PROM).
- Volatile memory can include, for example, RAM (random access memory), static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), and direct Rambus RAM (DRRAM).
- the input to the hierarchical sampling re-identification system is a set of object tracks, where each track is a sequence of snapshots captured across consecutive frames of the video stream.
- the re-identification system may extract meta-data in the form of descriptors (also called visual features), which are arrays of numbers representing the visual appearance of the object in each track.
- a typical approach is to extract a descriptor for each snapshot in the track, and store either all the descriptors or an aggregated descriptor (e.g., using average or max pooling) in the database.
- the resulting collection of descriptors provides the necessary meta-data to later perform re-identification.
- highly complex descriptors leading to accurate re-identification tend to have higher computational costs, while less complex descriptors tend to have lower computational cost, at the expense of providing lower re-identification accuracy.
- the system can extract descriptors from only M snapshots in the track, where M < N. In some instances, the higher the number of descriptors M, the more complete the description of the whole track, and the higher the accuracy of the subsequent re-identification.
- One aspect of the present disclosure includes how the system samples the best M snapshots that, combined, provide the most complete description of the object track.
- the ideal sampling process first clusters the snapshots so that those with similar characteristics fall in the same cluster, and then picks a single representative snapshot per cluster, avoiding extracting descriptors from redundant snapshots in the same cluster.
- Clustering may rely on a similarity function that accurately compares the key visual appearance properties of snapshots, in order to put the ones with the same properties in the same cluster.
- Such similarity function may be obtained by comparing the descriptors among snapshots, where each descriptor summarizes the key visual properties of its corresponding snapshot.
- An aspect of the present disclosure includes a system that extracts lower complexity descriptors for clustering, and then extracts higher-level complexity descriptors only from one snapshot per cluster.
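The two-level strategy above can be sketched in a few lines. This is an illustrative sketch, not the disclosed implementation: the descriptor functions are hypothetical stand-ins (coarse and fine intensity histograms in place of real low- and high-complexity descriptors), and a minimal k-means serves as the clustering step.

```python
import numpy as np

def cheap_descriptor(snapshot):
    # Hypothetical low-complexity descriptor: a coarse 8-bin intensity histogram.
    hist, _ = np.histogram(snapshot, bins=8, range=(0, 256))
    return hist / max(hist.sum(), 1)

def expensive_descriptor(snapshot):
    # Stand-in for a high-complexity descriptor (e.g., a deep embedding).
    hist, _ = np.histogram(snapshot, bins=64, range=(0, 256))
    return hist / max(hist.sum(), 1)

def kmeans(X, k, iters=20, seed=0):
    # Minimal k-means; any clustering method mentioned in the disclosure would do.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

def two_level_sampling(snapshots, k):
    # Level 0: cheap descriptors for every snapshot in the track.
    X = np.stack([cheap_descriptor(s) for s in snapshots])
    labels, centers = kmeans(X, k)
    reps = []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        if idx.size == 0:
            continue  # empty cluster: nothing to represent
        # Representative: snapshot whose cheap descriptor is nearest the centroid.
        reps.append(int(idx[np.argmin(((X[idx] - centers[j]) ** 2).sum(-1))]))
    # Level 1: expensive descriptors only for the representatives.
    return {i: expensive_descriptor(snapshots[i]) for i in reps}
```

The expensive descriptor is computed at most k times regardless of track length, which is the point of the hierarchy.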
- Let C_L be the computational cost of the low-complexity descriptor (measured as the processing time, in seconds, to extract the descriptor for one snapshot),
- C_H be the computational cost of the high-complexity descriptor,
- N be the total number of snapshots in the current object track,
- K be the number of samples picked after clustering, and
- C_c be the cost of clustering.
- the total cost of the pipeline for two-level hierarchical sampling is then N·C_L + C_c + K·C_H, compared with N·C_H when the high-complexity descriptor is extracted from every snapshot.
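Using the definitions above, the two-level cost N·C_L + C_c + K·C_H can be checked numerically. The per-descriptor timings below are assumed for illustration, not taken from the disclosure:

```python
def two_level_cost(N, K, c_low, c_high, c_cluster):
    # N * C_L: cheap descriptor for every snapshot
    # C_c:     clustering
    # K * C_H: expensive descriptor for one representative per cluster
    return N * c_low + c_cluster + K * c_high

# Assumed costs: 2 ms per cheap descriptor, 50 ms per expensive one,
# 100 ms for clustering, and a 240-snapshot track reduced to K = 8 samples.
baseline = 240 * 0.050                                        # N * C_H = 12.0 s
hierarchical = two_level_cost(240, 8, 0.002, 0.050, 0.100)    # 0.48 + 0.1 + 0.4 s
```

With these assumed numbers the hierarchical pipeline costs under a second versus twelve seconds for the naive approach; the actual ratio depends entirely on C_L, C_H, and K.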
- a generalization of the previous strategy may be obtained by adding intermediate layers of complexity.
- the lower complexity descriptor C_0 may be applied to all the snapshots.
- the system extracts descriptors of intermediate complexity and discriminative power. As the intermediate layer is more discriminative than the lower layer, using the descriptors of intermediate complexity allows the system to further reduce the number of selected snapshots. Finally, the system extracts the highest complexity descriptors from this reduced set of K_1 snapshots.
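The multi-level generalization can be sketched as a loop over descriptor levels of increasing cost. This is an illustrative sketch: greedy farthest-point selection stands in for the per-level clustering step, and the descriptor functions passed in are hypothetical.

```python
import numpy as np

def pick_diverse(X, k):
    # Greedy farthest-point selection: a cheap stand-in for clustering
    # that keeps k mutually dissimilar descriptors.
    chosen = [0]
    while len(chosen) < min(k, len(X)):
        d = np.min(((X[:, None] - X[chosen][None]) ** 2).sum(-1), axis=1)
        d[chosen] = -1.0  # never re-pick an already chosen snapshot
        chosen.append(int(np.argmax(d)))
    return chosen

def hierarchical_sampling(snapshots, levels):
    """levels: list of (descriptor_fn, n_keep), ordered by increasing complexity.
    Each level describes only the snapshots that survived the previous level,
    so the most expensive descriptor touches the fewest snapshots."""
    pool = list(range(len(snapshots)))
    for descriptor_fn, n_keep in levels:
        if len(pool) <= n_keep:
            continue  # already small enough for this level
        X = np.stack([descriptor_fn(snapshots[i]) for i in pool])
        pool = [pool[i] for i in pick_diverse(X, n_keep)]
    return pool
```

Each entry in `levels` plays the role of one layer of the hierarchy; the returned indices are the snapshots that reach the final, most expensive descriptor.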
- a sampling component that can replace some layers of the pipeline is based on video segmentation. This sampling works by first detecting changes across frames, then segmenting the video into pieces of relatively constant content, and finally selecting a single snapshot for each segment. Typically, the higher the complexity of the segmentation algorithm, the better the selection of snapshots, at a higher computational cost, which leads back to the same trade-off exposed previously and therefore permits a similar hierarchical (multi-level) strategy.
- sampling layers of different complexity can be obtained by extracting not only descriptors of different complexity, but also other types of metadata. Examples of this are the number of the frame where the snapshot is found, the spatial coordinates of the object in the frame, or the size of the object in pixels. For example, using the last two types of metadata, the system may cluster the snapshots by spatial position and size, so that snapshots that haven't moved much fall into the same cluster.
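Clustering on such metadata requires no descriptor extraction at all. A minimal sketch, where the metadata field names (`x`, `y`, `w`, `h`) and bin sizes are assumptions for illustration:

```python
from collections import defaultdict

def cluster_by_position_and_size(snapshots, pos_bin=50, size_bin=1000):
    """Group snapshots (given as metadata dicts with hypothetical keys
    x, y, w, h in pixels) by coarse spatial position and object size, so
    snapshots whose object has barely moved or changed scale share a cluster."""
    clusters = defaultdict(list)
    for i, meta in enumerate(snapshots):
        key = (meta["x"] // pos_bin, meta["y"] // pos_bin,
               (meta["w"] * meta["h"]) // size_bin)
        clusters[key].append(i)
    return dict(clusters)
```

Because the key is just a few integer divisions per snapshot, this layer is far cheaper than even a low-complexity visual descriptor, making it a natural first level of the hierarchy.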
- the pipeline may include an additional component which is a classifier.
- This component provides the class of the object being described. Based on that, a class-specific descriptor can be extracted. For example, the system first applies the hierarchical sampling described in previous sections up to level L-1. As a result, the system obtains a reduced number of sampled object image snapshots. For example, if the original process included only 3 levels of complexity, then the system may apply level 0 and level 1, obtaining K_1 snapshots as a result of the last level. In general, the system will obtain K_{L-1} snapshots after L-1 levels.
- the system may either apply the classification component to all these K_{L-1} sampled snapshots, or apply it only to a smaller subset of K′ snapshots (e.g., by using clustering again).
- a single classification decision is obtained by aggregating (e.g., averaging) the classification score obtained for each of the K′ snapshots and selecting the class whose aggregated score is maximum. Once the class has been determined, class-specific descriptors can be extracted from each of the K samples, in order to reduce the computational cost.
- Many clustering methods exist, including k-means, DBSCAN, Gaussian mixture models, mean-shift, and others. Different distance measures can also be used, including Euclidean, cosine, Mahalanobis, and geodesic distances. Another clustering type is online clustering.
- the system may select the next best snapshot that has the highest quality in terms of re-identification. For example, some snapshots will have higher quality because the object is better illuminated, so its details can be seen and described better. The alignment of the object in the snapshot also matters for re-identification, as a badly aligned snapshot will be visually dissimilar to other snapshots of the same object.
- the system may measure the Mean Average Precision (MAP) of a snapshot. Snapshots with lower quality will tend to be confused more often with snapshots from other objects, since their details are not as clear. On the other hand, snapshots with higher quality will tend to have a clearer separation from snapshots of other objects, and this can be measured by the MAP metric.
- the system may use regression as a fast proxy.
- the idea is to train a “regression model” that is able to estimate the MAP by looking at the snapshot.
- a typical regression model is obtained with neural networks (NNs). Because NNs are also used to extract high-quality descriptors, the system may train a single network that provides both an estimated MAP score and a descriptor.
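One way to realize such a shared model is a network with a common trunk and two heads. The sketch below is a toy, untrained NumPy version meant only to show the shape of the idea; the layer sizes are arbitrary assumptions, and a real system would train the weights and likely use a deep vision backbone.

```python
import numpy as np

class TwoHeadNet:
    """Shared trunk feeding both a descriptor head and a scalar MAP-regression
    head, so one forward pass yields the descriptor and the quality estimate."""

    def __init__(self, d_in=64, d_hidden=32, d_desc=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(d_in, d_hidden))
        self.W_desc = rng.normal(scale=0.1, size=(d_hidden, d_desc))
        self.w_map = rng.normal(scale=0.1, size=d_hidden)

    def forward(self, x):
        h = np.maximum(x @ self.W1, 0.0)                         # shared trunk (ReLU)
        descriptor = h @ self.W_desc                             # re-identification embedding
        map_estimate = 1.0 / (1.0 + np.exp(-(h @ self.w_map)))   # estimated MAP in (0, 1)
        return descriptor, map_estimate
```

Sharing the trunk means the MAP estimate comes almost for free once the descriptor has been computed, which is what makes it usable as a fast proxy.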
- the system may need to discard all the snapshots whose similarity to the selected snapshot is higher than some pre-specified threshold.
- Using neural networks is just one possibility, as there are other regressors that can be used.
- the hierarchy comes from the fact that the first levels use fast and less accurate regressors, which usually provide sub-optimal MAP estimation, so more samples must be kept to compensate, while the last levels obtain better MAP estimates, which allows the system to narrow the selection down to fewer snapshots.
- an example of an environment 100 for performing hierarchical sampling for object re-identification may include a server 140 that receives surveillance videos and/or images 112 from a plurality of cameras 110 .
- the plurality of cameras 110 may capture the surveillance videos and/or images 112 of one or more locations 114 that include people and/or objects (e.g., cars, bags, etc.).
- the server 140 may include a communication component 142 that receives and/or sends data (such as the captured surveillance videos and/or images 112 ) from and to other devices, such as a data repository 150 .
- the server 140 may include an identification component 144 that performs the hierarchical sampling process for object re-identification.
- the server 140 may include a classification component 146 that classifies one or more images or objects in the images.
- the server 140 may include an artificial intelligence (AI) component 148 that performs AI operations during the re-identification process.
- the captured surveillance videos and/or images may include snapshots (i.e., frames or portions of frames).
- a one-minute surveillance video and/or image sequence may include 30, 60, 120, 180, 240, or another number of snapshots.
- the communication component 142 may receive the surveillance video and/or images 112 from the plurality of cameras 110 .
- the identification component 144 may perform the hierarchical sampling process for re-identification.
- the classification component 146 may classify an image or objects of the image.
- the AI component 148 may perform filtering and/or representative snapshot selection process.
- the communication component 142 of the server 140 may receive the surveillance video and/or images 112 .
- the server 140 may generate and apply a first set of descriptors of low complexity (such as color, lighting, shape, etc) relating to a person or object to be identified in the surveillance video and/or images 112 .
- the application of the first set of descriptors to the surveillance video and/or images 112 may cause the server 140 to group the snapshots of the person or object to be identified in the surveillance video and/or images 112 into separate clusters.
- the server 140 may obtain three clusters: a first cluster 120 (e.g., snapshots with varying standing postures of the person), a second cluster 122 (e.g., snapshots with varying sitting postures of the person), and a third cluster 124 (e.g., snapshots with varying jumping postures of the person).
- the server 140 may identify a representative snapshot 120 a , 122 a , 124 a (described in further detail below) from each of the first, second, and third clusters 120 , 122 , 124 .
- the representative snapshots 120 a, 122 a, 124 a may be the snapshots with the fewest background objects, the clearest contrast, the best lighting, certain desired features, etc.
- the server 140 may generate a second set of descriptors based on the representative snapshots 120 a , 122 a , 124 a .
- the second set of descriptors may include more complexity than the first set of descriptors (e.g., including spatial information, timing information, class information, etc.).
- the server 140 may apply the second set of descriptors to the surveillance video and/or images 112 to identify and/or locate a target, such as the person or object to be identified.
- an example of a method 200 for performing hierarchical sampling for re-identification may be performed by the server 140 and one or more of the communication component 142 and/or the identification component 144 .
- the method 200 may start the hierarchical sampling for a re-identification process.
- the method 200 may set the counters i and j to 0 .
- the counter i may represent the number of iterations of selecting descriptors and clustering snapshots.
- the counter j may represent the number of tracks (e.g., groups of videos and/or images).
- the identification component 144 of the server 140 receiving surveillance videos and/or images from nine cameras of the plurality of cameras 110 may have a j value of “9” (one track from each camera).
- the method 200 may input snapshots of track(i) into a pool P.
- the identification component 144 may input a portion of the surveillance videos and/or images 112 into a pool P.
- the method 200 may generate a descriptor of complexity C_i for each snapshot in the pool P.
- the descriptor of complexity C_0 may have lower complexity than the descriptor of complexity C_1,
- the descriptor of complexity C_1 may have lower complexity than the descriptor of complexity C_2, and so forth.
- the method 200 may group snapshots into K clusters.
- the identification component 144 of the server 140 may group the surveillance videos and/or images 112 into three clusters: the first, second, and third clusters 120 , 122 , 124 .
- the method 200 may select a snapshot per cluster.
- the identification component 144 of the server 140 may select the snapshots 120 a , 122 a , 124 a for each of the first, second, and third clusters 120 , 122 , 124 .
- the method 200 may input the selected snapshots into a pool P′.
- the identification component 144 may input the snapshots 120 a , 122 a , 124 a into the pool P′.
- the method 200 may increment the counter i by one and set the pool P to be equal to the pool P′.
- the identification component 144 may increment the counter i and set the pool P to P′.
- the method 200 may inject the descriptors of complexity C_L into a database, such as the data repository 150.
- the identification component 144 may apply the descriptors of complexity C_L (e.g., C_1 for one level, C_2 for two levels, etc.) on the surveillance videos and/or images 112 in the server 140 or the data repository 150.
- the method 200 may increment the counter j by 1 .
- the identification component 144 may increment the counter j by 1 .
- an example of a method 300 for performing hierarchical sampling for re-identification including classification may be performed by the server 140 and one or more of the communication component 142 , the identification component 144 , the classification component 146 and/or the AI component 148 .
- the method 300 may perform a hierarchical sampling process for re-identification (with or without classification) for L-1 levels as described above.
- the method 300 may group snapshots in K′ clusters as described above.
- the method 300 may select a snapshot per cluster.
- the identification component 144 may select K′ snapshots, one for each of the K′ clusters, based on the quality of the snapshot as described above.
- the method 300 may classify the selected snapshots.
- the identification component 144 and/or the classification component 146 may classify the selected snapshots based on one or more classification algorithms as described above.
- each of the selected snapshots may be assigned a plurality of classification scores associated with a plurality of classes (e.g., person class, car class, building class, object class, etc.).
- a first snapshot may be assigned classification scores of (car: 1, person: 5, building: 2), and
- a second snapshot may be assigned classification scores of (car: 0, person: 4, building: 0).
- the method 300 may aggregate the classification scores.
- the identification component 144 and/or the classification component 146 may aggregate the corresponding classification scores for the K′ snapshots as described above.
- the aggregated scores for the example above are (car: 1, person: 9, building: 2).
- the method 300 may determine a class C based on the aggregated classification scores. For example, the identification component 144 and/or the classification component 146 may determine that, given the aggregated scores of (car: 1, person: 9, building: 2), the classification for the corresponding cluster is a person as described above.
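The worked example above, where scores of (car: 1, person: 5, building: 2) and (car: 0, person: 4, building: 0) aggregate to (car: 1, person: 9, building: 2), can be reproduced directly; summing is used here as the aggregation, and averaging would select the same class:

```python
def aggregate_and_classify(score_lists):
    # Sum the per-class scores across the K' snapshots, then pick the
    # class whose aggregated score is maximum.
    totals = {}
    for scores in score_lists:
        for cls, s in scores.items():
            totals[cls] = totals.get(cls, 0) + s
    return max(totals, key=totals.get), totals

winner, totals = aggregate_and_classify([
    {"car": 1, "person": 5, "building": 2},
    {"car": 0, "person": 4, "building": 0},
])
# winner == "person", totals == {"car": 1, "person": 9, "building": 2}
```

Once `winner` is known, the pipeline proceeds to extract descriptors specific to that class.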
- the method 300 may generate K class-specific descriptors of class C with complexity C_L.
- the identification component 144 and/or the classification component 146 may generate class-specific descriptors of the person class with complexity C_L as described above.
- the method 300 may inject the K class-specific descriptors of complexity C_L into the database.
- the identification component 144 may apply the class-specific descriptors of complexity C_L (e.g., C_1 for one level, C_2 for two levels, etc.) on the surveillance videos and/or images 112 in the server 140 or the data repository 150 as described above.
- an example of a method 400 for performing hierarchical sampling using neural networks for re-identification may be performed by the server 140 and one or more of the communication component 142 and/or the identification component 144 .
- the method 400 may start the hierarchical sampling for re-identification process as described above.
- the method 400 may set the counters i and j to 0 .
- the counter i may represent the number of iterations of selecting descriptors and clustering snapshots.
- the counter j may represent the number of tracks (e.g., groups of videos and/or images).
- the identification component 144 of the server 140 receiving surveillance videos and/or images from nine cameras of the plurality of cameras 110 may have a j value of “9” (one track from each camera) as described above.
- the method 400 may input snapshots of track(i) into a pool P.
- the identification component 144 may input a portion of the surveillance videos and/or images 112 into a pool P as described above.
- the method 400 may select, using a network N_i, snapshots from the pool P with the highest estimated mean average precisions (MAPs) and put them into a pool P′.
- the AI component 148 may use a neural network N_i to select snapshots having the highest MAP for re-identification purposes.
- the AI component 148 may use regression as a fast proxy as described above.
- the AI component 148 may train a “regression model” that estimates the MAP by examining a snapshot.
- the snapshots with the highest MAPs may be the snapshots with the highest quality for re-identification (e.g., good illumination, a high level of detail, good alignment and/or orientation).
- the method 400 may remove all snapshots having similarity indices above a threshold, wherein the similarity indices are associated with resemblance to the selected snapshot from P.
- the identification component 144 may remove all snapshots having similarity indices above a threshold, wherein the similarity indices are associated with resemblance to the selected snapshot from P as described above.
- two images that look “similar” (e.g., same people/object, same background, taken within half a second of each other, etc.) may have high similarity indices.
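The selection-and-removal steps above amount to a greedy loop: take the remaining snapshot with the highest estimated MAP, then drop everything too similar to it. A sketch, under the assumptions that descriptors are compared by cosine similarity and that the quality (estimated MAP) scores are supplied externally, e.g., by the regression model:

```python
import numpy as np

def select_by_quality(descriptors, quality, k, sim_threshold=0.9):
    """Greedily keep up to k snapshots: each round takes the remaining snapshot
    with the highest estimated MAP (quality), then removes every snapshot whose
    cosine similarity to it exceeds sim_threshold (including itself)."""
    X = np.asarray(descriptors, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalize for cosine
    q = np.asarray(quality, dtype=float)
    remaining = list(range(len(X)))
    chosen = []
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda i: q[i])     # highest estimated MAP
        chosen.append(best)
        sims = X[remaining] @ X[best]                 # cosine similarity to pick
        remaining = [i for i, s in zip(remaining, sims) if s <= sim_threshold]
    return chosen
```

Each selected snapshot removes its near-duplicates from the pool, so the result covers the track's distinct appearances rather than repeating the single best-looking view.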
- the identification component 144 of the server 140 may increment the counter i by one and set the pool P to be equal to the pool P′. For example, the identification component 144 may increment the counter i and set the pool P to P′.
- the method 400 may inject the descriptors of complexity C_L into a database, such as the data repository 150, using the neural network N_L.
- the identification component 144 may apply the descriptors of complexity C_L (e.g., C_1 for one level, C_2 for two levels, etc.) on the surveillance videos and/or images 112 in the server 140 or the data repository 150 using the neural network trained at block 408 as described above.
- the method 400 may increment the counter j by one.
- the identification component 144 may increment the counter j by one as described above.
- aspects of the present disclosure may be implemented using hardware, software, or a combination thereof, and may be implemented in one or more computer systems or other processing systems. In an aspect of the present disclosure, features are directed toward one or more computer systems capable of carrying out the functionality described herein.
- An example of such a computer system 500 is shown in FIG. 5.
- the server 140 may be implemented as the computer system 500 shown in FIG. 5 .
- the server 140 may include some or all of the components of the computer system 500 .
- the computer system 500 includes one or more processors, such as processor 504 .
- the processor 504 is connected with a communication infrastructure 506 (e.g., a communications bus, cross-over bar, or network).
- the computer system 500 may include a display interface 502 that forwards graphics, text, and other data from the communication infrastructure 506 (or from a frame buffer not shown) for display on a display unit 550 .
- Computer system 500 also includes a main memory 508 , preferably random access memory (RAM), and may also include a secondary memory 510 .
- the secondary memory 510 may include, for example, a hard disk drive 512 , and/or a removable storage drive 514 , representing a floppy disk drive, a magnetic tape drive, an optical disk drive, a universal serial bus (USB) flash drive, etc.
- the removable storage drive 514 reads from and/or writes to a removable storage unit 518 in a well-known manner.
- Removable storage unit 518 represents a floppy disk, magnetic tape, optical disk, USB flash drive etc., which is read by and written to removable storage drive 514 .
- the removable storage unit 518 includes a computer usable storage medium having stored therein computer software and/or data.
- one or more of the main memory 508 , the secondary memory 510 , the removable storage unit 518 , and/or the removable storage unit 522 may be a non-transitory memory.
- Secondary memory 510 may include other similar devices for allowing computer programs or other instructions to be loaded into computer system 500 .
- Such devices may include, for example, a removable storage unit 522 and an interface 520 .
- Examples of such may include a program cartridge and cartridge interface (such as that found in video game devices), a removable memory chip (such as an erasable programmable read only memory (EPROM), or programmable read only memory (PROM)) and associated socket, and other removable storage units 522 and interfaces 520 , which allow software and data to be transferred from the removable storage unit 522 to computer system 500 .
- Computer system 500 may also include a communications circuit 524 .
- the communications circuit 524 may allow software and data to be transferred between computer system 500 and external devices. Examples of the communications circuit 524 may include a modem, a network interface (such as an Ethernet card), a communications port, a Personal Computer Memory Card International Association (PCMCIA) slot and card, etc.
- Software and data transferred via the communications circuit 524 are in the form of signals 528 , which may be electronic, electromagnetic, optical or other signals capable of being received by the communications circuit 524 . These signals 528 are provided to the communications circuit 524 via a communications path (e.g., channel) 526 .
- This path 526 carries signals 528 and may be implemented using wire or cable, fiber optics, a telephone line, a cellular link, an RF link and/or other communications channels.
- The terms "computer program medium" and "computer usable medium" are used to refer generally to media such as the removable storage unit 518 , a hard disk installed in hard disk drive 512 , and signals 528 .
- These computer program products provide software to the computer system 500 . Aspects of the present disclosures are directed to such computer program products.
- Computer programs are stored in main memory 508 and/or secondary memory 510 . Computer programs may also be received via communications circuit 524 . Such computer programs, when executed, enable the computer system 500 to perform the features in accordance with aspects of the present disclosures, as discussed herein. In particular, the computer programs, when executed, enable the processor 504 to perform the features in accordance with aspects of the present disclosures. Accordingly, such computer programs represent controllers of the computer system 500 .
- the software may be stored in a computer program product and loaded into computer system 500 using removable storage drive 514 , hard drive 512 , or communications interface 520 .
- the control logic when executed by the processor 504 , causes the processor 504 to perform the functions described herein.
- the system is implemented primarily in hardware using, for example, hardware components, such as application specific integrated circuits (ASICs). Implementation of the hardware state machine so as to perform the functions described herein will be apparent to persons skilled in the relevant art(s).
Description
- The current application is a continuation application of U.S. patent application Ser. No. 17/061,262, entitled “HIERARCHICAL SAMPLING FOR OBJECT IDENTIFICATION,” filed Oct. 1, 2020, which claims the benefit of U.S.
- Provisional Application No. 62/908,980, entitled “HIERARCHICAL SAMPLING FOR OBJECT IDENTIFICATION,” filed on Oct. 1, 2019, the contents of which are incorporated by reference in their entireties.
- In surveillance systems, numerous images (e.g., more than thousands or even millions) may be captured by multiple cameras. Each image may show people and objects (e.g., cars, infrastructures, accessories, etc.). In certain circumstances, security personnel monitoring the surveillance systems may want to locate and/or track a particular person and/or object through the multiple cameras. However, it may be computationally intensive for the surveillance systems to accurately track the particular person and/or object by searching through the images. Therefore, improvements may be desirable.
- This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the DETAILED DESCRIPTION. This summary is not intended to identify key features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
- An aspect of the present disclosure includes a method including receiving a first plurality of snapshots, generating a first plurality of descriptors each associated with the first plurality of snapshots, grouping the first plurality of snapshots into at least one cluster based on the first plurality of descriptors, selecting a representative snapshot for each of the at least one cluster, generating at least one second descriptor for the representative snapshot for each of the at least one cluster, wherein the at least one second descriptor is more complex than the first plurality of descriptors, and identifying a target by applying the at least one second descriptor to a second plurality of snapshots.
- Aspects of the present disclosure include a system having a memory that stores instructions and a processor configured to execute the instructions to receive a first plurality of snapshots, generate a first plurality of descriptors each associated with the first plurality of snapshots, group the first plurality of snapshots into at least one cluster based on the first plurality of descriptors, select a representative snapshot for each of the at least one cluster, generate at least one second descriptor for the representative snapshot for each of the at least one cluster, wherein the at least one second descriptor is more complex than the first plurality of descriptors, and identify a target by applying the at least one second descriptor to a second plurality of snapshots.
- Certain aspects of the present disclosure include a non-transitory computer readable medium having instructions stored therein that, when executed by a processor, cause the processor to receive a first plurality of snapshots, generate a first plurality of descriptors each associated with the first plurality of snapshots, group the first plurality of snapshots into at least one cluster based on the first plurality of descriptors, select a representative snapshot for each of the at least one cluster, generate at least one second descriptor for the representative snapshot for each of the at least one cluster, wherein the at least one second descriptor is more complex than the first plurality of descriptors, and identify a target by applying the at least one second descriptor to a second plurality of snapshots.
- The features believed to be characteristic of aspects of the disclosure are set forth in the appended claims. In the description that follows, like parts are marked throughout the specification and drawings with the same numerals, respectively. The drawing figures are not necessarily drawn to scale and certain figures may be shown in exaggerated or generalized form in the interest of clarity and conciseness. The disclosure itself, however, as well as a preferred mode of use, further objects and advantages thereof, will be best understood by reference to the following detailed description of illustrative aspects of the disclosure when read in conjunction with the accompanying drawings, wherein:
-
FIG. 1 illustrates an example of an environment for implementing the hierarchical sampling for re-identification process in accordance with aspects of the present disclosure; -
FIG. 2 illustrates an example of a method for implementing the hierarchical sampling for re-identification process in accordance with aspects of the present disclosure; -
FIG. 3 illustrates an example of a method for implementing the hierarchical sampling for re-identification process including classification in accordance with aspects of the present disclosure; -
FIG. 4 illustrates an example of a method for implementing the hierarchical sampling for re-identification process using neural networks in accordance with aspects of the present disclosure; and -
FIG. 5 illustrates an example of a computer system in accordance with aspects of the present disclosure.
- The following includes definitions of selected terms employed herein. The definitions include various examples and/or forms of components that fall within the scope of a term and that may be used for implementation. The examples are not intended to be limiting.
- The term “processor,” as used herein, can refer to a device that processes signals and performs general computing and arithmetic functions. Signals processed by the processor can include digital signals, data signals, computer instructions, processor instructions, messages, a bit, a bit stream, or other computing that can be received, transmitted and/or detected. A processor, for example, can include microprocessors, microcontrollers, digital signal processors (DSPs), field programmable gate arrays (FPGAs), programmable logic devices (PLDs), state machines, gated logic, discrete hardware circuits, and other suitable hardware configured to perform the various functionality described herein.
- The term “bus,” as used herein, can refer to an interconnected architecture that is operably connected to transfer data between computer components within a singular or multiple systems. The bus can be a memory bus, a memory controller, a peripheral bus, an external bus, a crossbar switch, and/or a local bus, among others.
- The term "memory," as used herein, can include volatile memory and/or nonvolatile memory. Non-volatile memory can include, for example, ROM (read only memory), PROM (programmable read only memory), EPROM (erasable PROM), and EEPROM (electrically erasable PROM). Volatile memory can include, for example, RAM (random access memory), synchronous RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), and direct RAM bus RAM (DRRAM).
- The input to the hierarchical sampling re-identification system is a set of object tracks, where each track is a sequence of snapshots captured across consecutive frames of the video stream. Given this input, the re-identification system may extract meta-data in the form of descriptors (also called visual features), which are arrays of numbers representing the visual appearance of the object in each track. A typical approach is to extract a descriptor for each snapshot in the track, and store either all the descriptors or an aggregated descriptor (e.g., using average or max pooling) in the database. The resulting collection of descriptors provides the necessary meta-data to later perform re-identification. Typically, highly complex descriptors leading to accurate re-identification tend to have higher computational costs, while less complex descriptors tend to have lower computational costs, at the expense of providing lower re-identification accuracy.
- If there are N snapshots in the track, the system extracts one descriptor per snapshot, and the extraction cost per descriptor is C, then the total cost for the track is T = C*N. To reduce this cost, the system can extract descriptors from only M snapshots in the track, where M << N. In some instances, the higher the number of descriptors M, the more complete the description of the whole track, and the higher the accuracy of the subsequent re-identification.
- One aspect of the present disclosure includes how the system samples the best M snapshots that, combined, provide the most complete description of the object track. In general, the ideal sampling process first clusters the snapshots in such a way that those with similar characteristics fall in the same cluster, and then picks a single representative snapshot per cluster, so as to avoid extracting descriptors from redundant snapshots in the same cluster.
- Clustering may rely on a similarity function that accurately compares the key visual appearance properties of snapshots, in order to put the ones with the same properties in the same cluster. Such a similarity function may be obtained by comparing the descriptors among snapshots, where each descriptor summarizes the key visual properties of its corresponding snapshot.
- An aspect of the present disclosure includes a system that extracts lower-complexity descriptors for clustering, and then extracts higher-complexity descriptors from only one snapshot per cluster.
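As an illustrative sketch (not the patent's actual implementation), the cluster-then-pick-a-representative strategy can be mocked up with a minimal k-means over low-complexity descriptor vectors, choosing the snapshot closest to each cluster center as the representative; the array layout, `k`, and the helper names are assumptions made here:

```python
import numpy as np

def kmeans(descriptors, k, iters=20, seed=0):
    """Minimal k-means over low-complexity descriptors (one row per snapshot)."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), size=k, replace=False)]
    for _ in range(iters):
        # Assign each descriptor to its nearest center (Euclidean distance).
        d = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each center as the mean of its assigned descriptors.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = descriptors[labels == j].mean(axis=0)
    return labels, centers

def representatives(descriptors, labels, centers):
    """Pick one snapshot index per cluster: the one closest to its center.
    Only these snapshots get the expensive high-complexity descriptor."""
    reps = []
    for j in range(centers.shape[0]):
        idx = np.where(labels == j)[0]
        if len(idx) == 0:
            continue
        dist = np.linalg.norm(descriptors[idx] - centers[j], axis=1)
        reps.append(int(idx[dist.argmin()]))
    return reps
```

With N snapshots in a track, the system would then run the high-complexity extractor only on the indices returned by `representatives`, rather than on all N.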
- Let CL be the computational cost of the low-complexity descriptor (measured as the processing time in seconds to extract the descriptor for one snapshot), let CH be the computational cost of the high-complexity descriptor, let N be the total number of snapshots in the current object track, let K be the number of samples picked after clustering, and let CC be the cost of clustering. The total cost of the pipeline for a two-level hierarchical sampling is:
-
CT = N*CL + CC + K*CH
- A generalization of the previous strategy may be obtained by adding intermediate layers of complexity. The lowest-complexity descriptor C0 may be applied to all the snapshots. Next, the system extracts descriptors of intermediate complexity and discriminative power. As the intermediate layer is more discriminative than the lower layer, using the descriptors of intermediate complexity allows the system to further reduce the number of selected snapshots. Finally, the system extracts the highest-complexity descriptors from this reduced set of K1 snapshots.
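A quick sketch of the cost comparison; the timings below are illustrative assumptions, not measurements from the disclosure:

```python
def naive_cost(n, ch):
    """Cost of extracting the high-complexity descriptor for all N snapshots."""
    return n * ch

def hierarchical_cost(n, cl, cc, k, ch):
    """Two-level pipeline cost: CT = N*CL + CC + K*CH."""
    return n * cl + cc + k * ch

# Assumed numbers: a track of N=300 snapshots, a cheap descriptor at
# CL=1 ms, clustering at CC=20 ms, K=5 representatives, and an
# expensive descriptor at CH=50 ms.
print(naive_cost(300, 0.050))                          # ~15.0 s
print(hierarchical_cost(300, 0.001, 0.020, 5, 0.050))  # ~0.57 s
```

Because K << N and CL << CH, the N*CL term dominates far less than the naive N*CH, which is the point of the two-level strategy.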
- Another example of a sampling component that can be used to replace some of the layers of the pipeline is based on video segmentation. This sampling works by first detecting changes across frames, then segmenting the video into pieces of relatively constant content, and finally selecting a single snapshot for each segment. Typically, the higher the complexity of the segmentation algorithm, the better the selection of snapshots, at the expense of higher computational cost, which leads back to the ideas discussed previously and therefore allows a similar hierarchical (multi-level) strategy.
- As described in the previous paragraph, sampling layers of different complexity can be obtained by extracting not only descriptors of different complexity, but also other types of metadata. Examples include the frame number where the snapshot is found, the spatial coordinates of the object in the frame, and the size of the object in pixels. For example, using the last two types of metadata, the system may cluster the snapshots by spatial position and size, so that snapshots that have not moved much fall into the same cluster.
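A minimal sketch of this metadata-based grouping, under assumed field names (`x`, `y` for the object's center and `area` for its size in pixels — these names are not from the disclosure):

```python
def cluster_by_motion(snapshots, pos_tol=20.0, size_tol=0.25):
    """Greedy online grouping of snapshot metadata. Consecutive snapshots
    whose object has not moved more than pos_tol pixels, nor changed area
    by more than size_tol (relative), fall into the current cluster."""
    clusters = []
    for snap in snapshots:
        if clusters:
            last = clusters[-1][-1]
            dx = snap["x"] - last["x"]
            dy = snap["y"] - last["y"]
            moved = (dx * dx + dy * dy) ** 0.5
            resized = abs(snap["area"] - last["area"]) / max(last["area"], 1)
            if moved <= pos_tol and resized <= size_tol:
                clusters[-1].append(snap)
                continue
        clusters.append([snap])
    return clusters
```

This is far cheaper than descriptor extraction, which is why position/size metadata is a natural candidate for the lowest layer of the hierarchy.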
- In some instances, the pipeline may include an additional component, a classifier. This component provides the class of the object being described. Based on that class, a class-specific descriptor can be extracted. For example, the system first applies the hierarchical sampling described in the previous sections up to level L-1. As a result of this process, the system may obtain a reduced number of sampled object image snapshots. For example, if the original process included only three levels of complexity, then the system may apply level 0 and level 1, and may obtain K1 snapshots as a result of the last level. In general, the system will obtain KL-1 snapshots after L-1 levels. Then, the system may either apply the classification component to all these KL-1 sampled snapshots, or apply it only to a smaller subset of K′ snapshots (e.g., by using clustering again). A single classification decision is obtained by aggregating (e.g., averaging) the classification scores obtained for each of the K′ snapshots and selecting the class whose aggregated score is maximum. Once the class has been determined, class-specific descriptors can be extracted from each of the K samples in order to reduce the computational cost.
- Many methods of clustering exist, including K-means, DBSCAN, Gaussian Mixture Models, Mean-Shift, and others. Also, different distances can be used, including Euclidean distance, cosine distance, Mahalanobis distance, geodesic distance, and others. Another clustering type is online clustering.
- In another implementation, the system may select the next best snapshot that has the highest quality in terms of re-identification. For example, some snapshots will have higher quality because the object has better illumination, and therefore the details can be seen and described better. Also, the alignment of the object in the snapshot matters for re-identification, as a badly aligned snapshot will be visually dissimilar to other snapshots corresponding to the same object.
- In order to measure the quality of a snapshot for re-identification purposes, the system may measure the Mean Average Precision (MAP) of the snapshot. Snapshots with lower quality will tend to be confused more often with snapshots from other objects, since the details are not as clear. On the other hand, snapshots with higher quality will tend to have a clearer separation from snapshots of other objects, and this can be measured by the MAP metric.
- In order to avoid having to compute this MAP metric through actual comparisons against a gallery of snapshots in the database, the system may use regression as a fast proxy. The idea is to train a "regression model" that is able to estimate the MAP by looking at the snapshot. A typical regression model is obtained with neural networks (NNs). Because NNs are also used to extract high-quality descriptors, the system may train a single network that provides both an estimated MAP score and a descriptor.
- In order to avoid selecting multiple snapshots having similar MAP scores, after one snapshot is selected, the system may need to discard all the snapshots whose similarity to it is higher than some pre-specified threshold. Using neural networks is just one possibility, as other regressors can be used. The hierarchy comes from the fact that the first levels use fast and less accurate regressors, which usually provide sub-optimal MAP estimates, so more samples are needed to compensate, while the last levels obtain better MAP estimates, which allows the system to narrow down the selection to fewer snapshots.
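The select-then-suppress loop described above can be sketched as a greedy procedure; the `(snapshot_id, estimated_map)` pairs and the `sim` similarity function are assumed inputs, not interfaces defined by the disclosure:

```python
def select_by_map(snapshots, k, sim, sim_threshold=0.9):
    """Greedy selection: repeatedly take the candidate with the highest
    estimated MAP score, then drop remaining candidates whose similarity
    to the newly selected snapshot exceeds the threshold.

    snapshots     -- list of (snapshot_id, estimated_map) pairs
    sim(a, b)     -- assumed similarity function over snapshot ids, in [0, 1]
    sim_threshold -- pre-specified suppression threshold
    """
    pool = sorted(snapshots, key=lambda s: s[1], reverse=True)
    selected = []
    while pool and len(selected) < k:
        best = pool.pop(0)
        selected.append(best[0])
        # Suppress near-duplicates of the snapshot just selected.
        pool = [s for s in pool if sim(best[0], s[0]) <= sim_threshold]
    return selected
```

A lower `sim_threshold` yields a more diverse but possibly lower-MAP selection; a higher one favors raw MAP at the risk of redundancy.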
- Referring to FIG. 1 , an example of an environment 100 for performing hierarchical sampling for object re-identification may include a server 140 that receives surveillance videos and/or images 112 from a plurality of cameras 110. The plurality of cameras 110 may capture the surveillance videos and/or images 112 of one or more locations 114 that include people and/or objects (e.g., cars, bags, etc.).
- In certain instances, the server 140 may include a communication component 142 that receives and/or sends data (such as the captured surveillance videos and/or images 112) from and to other devices, such as a data repository 150. The server 140 may include an identification component 144 that performs the hierarchical sampling process for object re-identification. The server 140 may include a classification component 146 that classifies one or more images or objects in the images. The server 140 may include an artificial intelligence (AI) component 148 that performs AI operations during the re-identification process.
- In some implementations, the captured surveillance videos and/or images may include snapshots (i.e., frames or portions of frames). For example, a one-minute surveillance video and/or images may include 30, 60, 120, 180, 240, or other numbers of snapshots. During the hierarchical sampling process, the communication component 142 may receive the surveillance videos and/or images 112 from the plurality of cameras 110. The identification component 144 may perform the hierarchical sampling process for re-identification. The classification component 146 may classify an image or objects of the image. The AI component 148 may perform the filtering and/or representative snapshot selection process.
- In certain aspects, the communication component 142 of the server 140 may receive the surveillance videos and/or images 112. The server 140 may generate and apply a first set of descriptors of low complexity (such as color, lighting, shape, etc.) relating to a person or object to be identified in the surveillance videos and/or images 112. The application of the first set of descriptors to the surveillance videos and/or images 112 may cause the server 140 to group the snapshots of the person or object to be identified into separate clusters. For example, by using a shape descriptor (shape of people or objects), the server 140 may obtain three clusters: a first cluster 120 (e.g., snapshots with varying standing postures of the person), a second cluster 122 (e.g., snapshots with varying sitting postures of the person), and a third cluster 124 (e.g., snapshots with varying jumping postures of the person).
- Next, in some instances, the server 140 may identify a representative snapshot 120a, 122a, 124a (described in further detail below) from each of the first, second, and third clusters 120, 122, 124. The representative snapshots 120a, 122a, 124a may include the least number of background objects, have the clearest contrast, have the best lighting, show certain desired features, etc.
- Next, in some examples, the server 140 may generate a second set of descriptors based on the representative snapshots 120a, 122a, 124a. The second set of descriptors may be more complex than the first set of descriptors (e.g., including spatial information, timing information, class information, etc.).
- In certain implementations, the server 140 may apply the second set of descriptors to the surveillance videos and/or images 112 to identify and/or locate a target, such as the person or object to be identified.
FIG. 2 , an example of amethod 200 for performing hierarchical sampling for re-identification may be performed by theserver 140 and one or more of thecommunication component 142 and/or theidentification component 144. - At
block 202, themethod 200 may start the hierarchical sampling for a re-identification process. - At
block 204, themethod 200 may set the counters i and j to 0. The counter i may represent the number of iterations of selecting descriptors and clustering snapshots. The counter j may represent the number of tracks (e.g., groups of videos and/or images). For example, theidentification component 144 of theserver 140 receiving surveillance videos and/or image from nine cameras of the plurality ofcameras 110 may have a j value of “9” (1 track from each camera). - At
block 206, themethod 200 may input snapshots of track(i) into a pool P. For example, theidentification component 144 may input a portion of the surveillance videos and/orimages 112 into a pool P. - At
block 208, themethod 200 may generate a descriptor of complexity C, for each snapshot in the pool P. The descriptor of complexity C0 may have lower complexity than the descriptor of complexity C1, the descriptor of complexity C1 may have lower complexity than the descriptor of complexity C2, and so forth and so on. - At
block 210, themethod 200 may determine if i=L, where L is the number of levels of complexities. If theidentification component 144 of theserver 140 determines that i<L, then theidentification component 144 may move ontoblock 212. - At
block 212, themethod 200 may group snapshots into K clusters. For example, theidentification component 144 of theserver 140 may group the surveillance videos and/orimages 112 into three clusters: the first, second, and 120, 122, 124.third clusters - At
block 214, themethod 200 may select a snapshot per cluster. For example, theidentification component 144 of theserver 140 may select the 120 a, 122 a, 124 a for each of the first, second, andsnapshots 120, 122, 124.third clusters - At
block 216, themethod 200 may input the selected snapshots into a pool P′. For example, theidentification component 144 may input the 120 a, 122 a, 124 a into the pool P′.snapshots - At
block 218, themethod 200 may increment the counter i by one and set the pool P to be equal to the pool P′. For example, theidentification component 144 may increment the counter i and set the pool P to P′. - In some implementations, the
method 200 may iteratively perform some or all of the steps between 208 and 218 until, atblocks block 210, theidentification component 144 of theserver 140 determines that i=L. If theidentification component 144 of theserver 140 determines that i=L, then theidentification component 144 may move ontoblock 220. - At
block 220, themethod 200 may inject the descriptors of complexity CL into a database, such as thedata repository 150. For example, theidentification component 144 may apply the descriptors of complexity CL (e.g., C1 for 1 level, C2 for 2 levels, etc.) on the surveillance videos and/orimages 112 in theserver 140 or thedata repository 150. - At block 222, the
method 200 may determine if j=M, where M is the number of tracks. If theidentification component 144 of theserver 140 determines that j<M, then theidentification component 144 may move ontoblock 224. - At
block 224, themethod 200 may increment the counter j by 1. For example, theidentification component 144 may increment the counter j by 1. - In some implementations, the
method 200 may iteratively perform some or all of the steps betweenblocks 208 and 222 until, at block 222, theidentification component 144 of theserver 140 determines that j=M. If theidentification component 144 of theserver 140 determines that j=M, then theidentification component 144 may move ontoblock 226 to terminate themethod 200. - Turning now to
FIG. 3 , an example of amethod 300 for performing hierarchical sampling for re-identification including classification may be performed by theserver 140 and one or more of thecommunication component 142, theidentification component 144, theclassification component 146 and/or theAI component 148. - At
block 302, themethod 300 may perform a hierarchical sampling process for re-identification (with or without classification) for L-1 levels as described above. - At
block 304, themethod 300 may group snapshots in K′ clusters as described above. - At
block 306, themethod 300 may select a snapshot per cluster. For example, theidentification component 144 may select K′ snapshots for the K′ cluster based on the quality of the snapshot as described above. - At
block 308, themethod 300 may classify the selected snapshots. For example, theidentification component 144 and/or theclassification component 146 may classify the selected snapshots based on one or more classification algorithms as described above. During the classification process, each of the selected snapshot may be assigned a plurality of classification scores associated with a plurality of classes (e.g., person class, car class, building class, object class, etc.). In one non-limiting example, a first snapshot may be assigned classification scores of (car-1, person-5, building-2), and a second snapshot may be assigned classification scores of (car-0, person-4, building-0). - At
block 310, themethod 300 may aggregation the classification score. For example, theidentification component 144 and/or theclassification component 146 may aggregate the corresponding classification scores for the K′ snapshots as described above. For example, the aggregated scores for the example above is (car-1, person-9, building-2). - At
block 312, themethod 300 may determine a class C based on the aggregated classification scores. For example, theidentification component 144 and/or theclassification component 146 may determine that, given the aggregated scores of (car-1, person-9, building-2), the classification for the corresponding cluster is a person as described above. - At
block 314, themethod 300 may generate K class-specific descriptors of class C with complexity Ct. For example, theidentification component 144 and/or theclassification component 146 may generate class-specific descriptors of the person class with complexity CL as described above. - At
block 316, themethod 300 may inject the K class-specific descriptors of complexity CL into the database. For example, theidentification component 144 may apply the class-specific descriptors of complexity CL (e.g., C1 for 1 level, C2 for 2 levels, etc.) on the surveillance videos and/orimages 112 in theserver 140 or thedata repository 150 as described above. - Turning now to
FIG. 4 , an example of amethod 400 for performing hierarchical sampling using neural networks for re-identification may be performed by theserver 140 and one or more of thecommunication component 142 and/or theidentification component 144. - At
block 402, themethod 400 may start the hierarchical sampling for re-identification process as described above. - At
block 404, themethod 400 may set the counters i and j to 0. The counter i may represent the number of iterations of selecting descriptors and clustering snapshots. The counter j may represent the number of tracks (e.g., groups of videos and/or images). For example, theidentification component 144 of theserver 140 receiving surveillance videos and/or image from nine cameras of the plurality ofcameras 110 may have a j value of “9” (1 track from each camera) as described above. - At
block 406, themethod 400 may input snapshots of track(i) into a pool P. For example, theidentification component 144 may input a portion of the surveillance videos and/orimages 112 into a pool P as described above. - At
block 408, themethod 400 may select, using a network Ni, snapshots from the pool P with the highest estimated mean average precisions (MAPs) and put into a pool P′. For example, theAI component 148 may use a neural network N, to select snapshots having the highest MAP for re-identification purpose. In one example, theAI component 148 may use regression as a fast proxy as described above. TheAI component 148 may train a “regression model” that estimates the MAP by examining a snapshot. The snapshot with the highest MAPs may be snapshots with the highest qualities for re-identification (e.g., good illumination, high level of details, good alignment and/or orientation). - At
block 410, themethod 400 may determine whether P′=Ki, where K is a predetermined number associated with the number of clusters. If theidentification component 144 determines that P′≠ theidentification component 144 may proceed to block 412 as described above. - At
block 412, themethod 400 may remove all snapshots having similarity indices above a threshold, wherein the similarity indices are associated with resemblance to the selected snapshot from P. For example, theidentification component 144 may remove all snapshots having similarity indices above a threshold, wherein the similarity indices are associated with resemblance to the selected snapshot from P as described above. In a non-limiting example, two images that look “similar” (e.g., same people/object, same background, taken within half of a second from each other, etc.) may have high similarity indices. - Next, the
method 400 may iteratively perform some or all of the steps between blocks 406 and 412 until, at block 410, the identification component 144 of the server 140 determines that P′=Ki. If the identification component 144 of the server 140 determines that P′=Ki, then the identification component 144 may move on to block 414. - At
block 414, the identification component 144 of the server 140 may increment the counter i by one and set the pool P to be equal to the pool P′. For example, the identification component 144 may increment the counter i and set the pool P to P′. - At
block 416, the method 400 may determine if i=L, where L is the number of levels of complexity. If the identification component 144 of the server 140 determines that i<L, then the identification component 144 may move back to block 408 again. - In some implementations, the
method 400 may iteratively perform some or all of the steps between blocks 408 and 416 until, at block 416, the identification component 144 of the server 140 determines that i=L. If the identification component 144 of the server 140 determines that i=L, then the identification component 144 may move on to block 418. - At
block 418, the method 400 may inject the descriptors of complexity CL into a database, such as the data repository 150, using the neural network NL. For example, the identification component 144 may apply the descriptors of complexity CL (e.g., C1 for one level, C2 for two levels, etc.) on the surveillance videos and/or images 112 in the server 140 or the data repository 150 using the neural network trained at block 408 as described above. - At
block 420, the method 400 may determine if j=M, where M is the number of tracks. If the identification component 144 of the server 140 determines that j<M, then the identification component 144 may move on to block 422. - At
block 422, the method 400 may increment the counter j by one. For example, the identification component 144 may increment the counter j by one as described above. - In some implementations, the
method 400 may iteratively perform some or all of the steps between blocks 406 and 422 until, at block 420, the identification component 144 of the server 140 determines that j=M. If the identification component 144 of the server 140 determines that j=M, then the identification component 144 may move on to block 424 to terminate the method 400. - Aspects of the present disclosures may be implemented using hardware, software, or a combination thereof and may be implemented in one or more computer systems or other processing systems. In an aspect of the present disclosures, features are directed toward one or more computer systems capable of carrying out the functionality described herein. An example of such a
computer system 500 is shown in FIG. 5. In some examples, the server 140 may be implemented as the computer system 500 shown in FIG. 5. The server 140 may include some or all of the components of the computer system 500. - The
computer system 500 includes one or more processors, such as processor 504. The processor 504 is connected with a communication infrastructure 506 (e.g., a communications bus, cross-over bar, or network). Various software aspects are described in terms of this example computer system. After reading this description, it will become apparent to a person skilled in the relevant art(s) how to implement aspects of the disclosures using other computer systems and/or architectures. - The
computer system 500 may include a display interface 502 that forwards graphics, text, and other data from the communication infrastructure 506 (or from a frame buffer not shown) for display on a display unit 550. Computer system 500 also includes a main memory 508, preferably random access memory (RAM), and may also include a secondary memory 510. The secondary memory 510 may include, for example, a hard disk drive 512 and/or a removable storage drive 514, representing a floppy disk drive, a magnetic tape drive, an optical disk drive, a universal serial bus (USB) flash drive, etc. The removable storage drive 514 reads from and/or writes to a removable storage unit 518 in a well-known manner. Removable storage unit 518 represents a floppy disk, magnetic tape, optical disk, USB flash drive, etc., which is read by and written to by the removable storage drive 514. As will be appreciated, the removable storage unit 518 includes a computer usable storage medium having stored therein computer software and/or data. In some examples, one or more of the main memory 508, the secondary memory 510, the removable storage unit 518, and/or the removable storage unit 522 may be a non-transitory memory. - Alternative aspects of the present disclosures may include
secondary memory 510 and may include other similar devices for allowing computer programs or other instructions to be loaded into computer system 500. Such devices may include, for example, a removable storage unit 522 and an interface 520. Examples of such may include a program cartridge and cartridge interface (such as that found in video game devices), a removable memory chip (such as an erasable programmable read only memory (EPROM) or programmable read only memory (PROM)) and associated socket, and other removable storage units 522 and interfaces 520, which allow software and data to be transferred from the removable storage unit 522 to computer system 500. -
Computer system 500 may also include a communications circuit 524. The communications circuit 524 may allow software and data to be transferred between computer system 500 and external devices. Examples of the communications circuit 524 may include a modem, a network interface (such as an Ethernet card), a communications port, a Personal Computer Memory Card International Association (PCMCIA) slot and card, etc. Software and data transferred via the communications circuit 524 are in the form of signals 528, which may be electronic, electromagnetic, optical, or other signals capable of being received by the communications circuit 524. These signals 528 are provided to the communications circuit 524 via a communications path (e.g., channel) 526. This path 526 carries signals 528 and may be implemented using wire or cable, fiber optics, a telephone line, a cellular link, an RF link, and/or other communications channels. In this document, the terms “computer program medium” and “computer usable medium” are used to refer generally to media such as the removable storage unit 518, a hard disk installed in hard disk drive 512, and signals 528. These computer program products provide software to the computer system 500. Aspects of the present disclosures are directed to such computer program products. - Computer programs (also referred to as computer control logic) are stored in
main memory 508 and/or secondary memory 510. Computer programs may also be received via the communications circuit 524. Such computer programs, when executed, enable the computer system 500 to perform the features in accordance with aspects of the present disclosures, as discussed herein. In particular, the computer programs, when executed, enable the processor 504 to perform the features in accordance with aspects of the present disclosures. Accordingly, such computer programs represent controllers of the computer system 500. - In an aspect of the present disclosures where the method is implemented using software, the software may be stored in a computer program product and loaded into
computer system 500 using removable storage drive 514, hard disk drive 512, or communications interface 520. The control logic (software), when executed by the processor 504, causes the processor 504 to perform the functions described herein. In another aspect of the present disclosures, the system is implemented primarily in hardware using, for example, hardware components such as application specific integrated circuits (ASICs). Implementation of the hardware state machine so as to perform the functions described herein will be apparent to persons skilled in the relevant art(s). - It will be appreciated that various implementations of the above-disclosed and other features and functions, or alternatives or varieties thereof, may be desirably combined into many other different systems or applications. Also, various presently unforeseen or unanticipated alternatives, modifications, variations, or improvements therein may be subsequently made by those skilled in the art, which are also intended to be encompassed by the following claims.
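The selection step at block 408 can be sketched in code. This is a hypothetical illustration, not the patented implementation: the patent trains a regression model (a neural network Ni) to estimate each snapshot's MAP, whereas here a caller-supplied scoring function stands in for that model, and the names `select_top_snapshots` and `estimate_map` are invented for the example.

```python
def select_top_snapshots(pool, estimate_map, top_n):
    """Return the top_n snapshots from `pool`, ranked by estimated MAP.

    `estimate_map` stands in for the trained regression model that
    scores a snapshot's suitability for re-identification.
    """
    ranked = sorted(pool, key=estimate_map, reverse=True)
    return ranked[:top_n]


# Toy stand-in: each "snapshot" carries a precomputed quality score.
snapshots = [{"id": 1, "quality": 0.4},
             {"id": 2, "quality": 0.9},
             {"id": 3, "quality": 0.7}]
best = select_top_snapshots(snapshots, lambda s: s["quality"], 2)
# best holds the snapshots with ids 2 and 3, the two highest scores
```

In practice the scoring function would run the level-i network on the snapshot pixels; any proxy that preserves the ranking (such as the regression described above) can be substituted without changing the surrounding loop.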
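The de-duplication step at block 412 can be sketched similarly. Again a hypothetical illustration: the patent does not specify the similarity index, so a toy timestamp-based similarity is assumed here, and the names `filter_similar` and `similarity` are invented for the example.

```python
def filter_similar(pool, selected, similarity, threshold):
    """Remove from `pool` every snapshot whose similarity index to the
    `selected` snapshot exceeds `threshold`."""
    return [s for s in pool if similarity(s, selected) <= threshold]


# Toy similarity: snapshots taken close in time score near 1.0, so
# frames within about half a second of the selected one are dropped
# as near-duplicates (the selected snapshot itself scores 1.0).
pool = [{"id": 1, "t": 0.0}, {"id": 2, "t": 0.3}, {"id": 3, "t": 2.0}]
kept = filter_similar(
    pool,
    {"id": 1, "t": 0.0},
    lambda a, b: 1.0 - min(abs(a["t"] - b["t"]), 1.0),
    threshold=0.6,
)
# only the snapshot with id 3 survives the filter
```

A real similarity index would also account for the people/objects and background in the frames, as in the non-limiting example above; the filtering logic is unchanged either way.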
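Blocks 404 through 424 can be combined into one sketch of the overall hierarchical loop. This is a simplified, hypothetical rendering: the per-level networks Ni, the descriptor injection at block 418, and the counters are abstracted into plain Python (`hierarchical_sample`, `cluster_sizes`, and the scoring/similarity callables are all invented names), and snapshots are represented by toy values.

```python
def hierarchical_sample(tracks, estimate_map, similarity,
                        cluster_sizes, threshold):
    """For each track (counter j), distill its snapshot pool down to
    K_i representatives at each of the L levels (counter i).

    `cluster_sizes` is the sequence K_0..K_{L-1}.  Each level repeats:
    pick the best-scoring snapshot (block 408), drop its near-duplicates
    (block 412), until K_i snapshots remain (block 410), then the kept
    set becomes the pool for the next level (block 414).
    """
    results = []
    for track in tracks:                 # loop over the M tracks
        pool = list(track)               # block 406: fill pool P
        for k in cluster_sizes:          # loop over the L levels
            selected = []
            while len(selected) < k and pool:
                best = max(pool, key=estimate_map)        # block 408
                selected.append(best)
                pool = [s for s in pool                   # block 412
                        if similarity(s, best) <= threshold]
            pool = selected              # block 414: P <- P'
        results.append(pool)             # block 418: inject descriptors
    return results


# Toy run: snapshots are numbers, score = the number itself, and two
# snapshots are "similar" only if they are identical.
levels = hierarchical_sample(
    tracks=[[1, 2, 3, 4]],
    estimate_map=lambda s: s,
    similarity=lambda a, b: 1.0 if a == b else 0.0,
    cluster_sizes=[2],
    threshold=0.5,
)
# one track, one level with K=2: the two best snapshots (4 and 3) remain
```

In the patented method the final pool's descriptors of complexity CL would then be written to the data repository 150; here `results` simply returns them to the caller.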
Claims (20)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/882,208 US20220375202A1 (en) | 2019-10-01 | 2022-08-05 | Hierarchical sampling for object identification |
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201962908980P | 2019-10-01 | 2019-10-01 | |
| US17/061,262 US11423248B2 (en) | 2019-10-01 | 2020-10-01 | Hierarchical sampling for object identification |
| US17/882,208 US20220375202A1 (en) | 2019-10-01 | 2022-08-05 | Hierarchical sampling for object identification |
Related Parent Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/061,262 Continuation US11423248B2 (en) | 2019-10-01 | 2020-10-01 | Hierarchical sampling for object identification |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| US20220375202A1 true US20220375202A1 (en) | 2022-11-24 |
Family
ID=72717781
Family Applications (2)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/061,262 Active 2041-01-13 US11423248B2 (en) | 2019-10-01 | 2020-10-01 | Hierarchical sampling for object identification |
| US17/882,208 Pending US20220375202A1 (en) | 2019-10-01 | 2022-08-05 | Hierarchical sampling for object identification |
Family Applications Before (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/061,262 Active 2041-01-13 US11423248B2 (en) | 2019-10-01 | 2020-10-01 | Hierarchical sampling for object identification |
Country Status (2)
| Country | Link |
|---|---|
| US (2) | US11423248B2 (en) |
| EP (1) | EP3800578A1 (en) |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP3800578A1 (en) * | 2019-10-01 | 2021-04-07 | Sensormatic Electronics, LLC | Hierarchical sampling for object identification |
| US20250208602A1 (en) * | 2022-04-14 | 2025-06-26 | Peridot Print Llc | Identification of controlled objects |
| US12229023B2 (en) | 2022-04-26 | 2025-02-18 | Pure Storage, Inc. | Cluster-wide snapshotting of a container system cluster |
| KR20250014730A (en) * | 2023-07-21 | 2025-02-03 | 라인플러스 주식회사 | Method, computer device, and computer program to create video snapshot using image hash |
Citations (14)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7181757B1 (en) * | 1999-10-11 | 2007-02-20 | Electronics And Telecommunications Research Institute | Video summary description scheme and method and system of video summary description data generation for efficient overview and browsing |
| US20100125581A1 (en) * | 2005-11-15 | 2010-05-20 | Shmuel Peleg | Methods and systems for producing a video synopsis using clustering |
| US20110302207A1 (en) * | 2008-12-02 | 2011-12-08 | Haskolinn I Reykjavik | Multimedia identifier |
| US20160042250A1 (en) * | 2014-07-03 | 2016-02-11 | Oim Squared Inc. | Interactive content generation |
| US20160379388A1 (en) * | 2014-07-16 | 2016-12-29 | Digitalglobe, Inc. | System and method for combining geographical and economic data extracted from satellite imagery for use in predictive modeling |
| US20180032845A1 (en) * | 2016-07-26 | 2018-02-01 | Viisights Solutions Ltd. | Video content contextual classification |
| US20190130580A1 (en) * | 2017-10-26 | 2019-05-02 | Qualcomm Incorporated | Methods and systems for applying complex object detection in a video analytics system |
| US20200012864A1 (en) * | 2015-12-24 | 2020-01-09 | Intel Corporation | Video summarization using semantic information |
| US20200234047A1 (en) * | 2014-02-10 | 2020-07-23 | Geenee Gmbh | Systems and methods for image-feature-based recognition |
| US20200272509A1 (en) * | 2019-02-25 | 2020-08-27 | GM Global Technology Operations LLC | Method and apparatus of allocating automotive computing tasks to networked devices with heterogeneous capabilities |
| US20210081676A1 (en) * | 2019-09-17 | 2021-03-18 | Korea Institute Of Science And Technology | Method for generating video synopsis through scene understanding and system therefor |
| US20210103733A1 (en) * | 2019-07-19 | 2021-04-08 | Zhejiang Sensetime Technology Development Co.,Ltd. | Video processing method, apparatus, and non-transitory computer-readable storage medium |
| US11423248B2 (en) * | 2019-10-01 | 2022-08-23 | Johnson Controls Tyco IP Holdings LLP | Hierarchical sampling for object identification |
| US20230095533A1 (en) * | 2021-09-28 | 2023-03-30 | The Hong Kong University of Science and Technology | Enriched and discriminative convolutional neural network features for pedestrian re-identification and trajectory modeling |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9258564B2 (en) * | 2012-02-07 | 2016-02-09 | Stmicroelectronics S.R.L. | Visual search system architectures based on compressed or compact feature descriptors |
- 2020-10-01: EP EP20199642.8A patent/EP3800578A1/en (Pending)
- 2020-10-01: US US17/061,262 patent/US11423248B2/en (Active)
- 2022-08-05: US US17/882,208 patent/US20220375202A1/en (Pending)
Non-Patent Citations (2)
| Title |
|---|
| A. Amiri and M. Fathy, "Video shot boundary detection using QR-decomposition and gaussian transition detection," EURASIP Journal on Advances in Signal Processing, vol. 2009, Article ID 509438, 12 pages, 2009. (Year: 2009) * |
| R. Narasimha, A. Savakis, R. M. Rao, and R. De Queiroz, "A neural network approach to key frame extraction," in Storage and Retrieval Methods and Applications for Multimedia 2004, vol. 5307 of Proceedings of SPIE, pp. 439–447, 2004. (Year: 2004) * |
Also Published As
| Publication number | Publication date |
|---|---|
| EP3800578A1 (en) | 2021-04-07 |
| US20210097333A1 (en) | 2021-04-01 |
| US11423248B2 (en) | 2022-08-23 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US20220375202A1 (en) | Hierarchical sampling for object identification | |
| US20250148615A1 (en) | System and method for player reidentification in broadcast video | |
| US10248860B2 (en) | System and method for object re-identification | |
| US11055538B2 (en) | Object re-identification with temporal context | |
| Lavi et al. | Survey on deep learning techniques for person re-identification task | |
| CN108229456B (en) | Target tracking method and device, electronic equipment and computer storage medium | |
| CN112215156B (en) | Face snapshot method and system in video monitoring | |
| US20220012502A1 (en) | Activity detection device, activity detection system, and activity detection method | |
| US11544960B2 (en) | Attribute recognition system, learning server and non-transitory computer-readable recording medium | |
| KR20170026222A (en) | Method and device for classifying an object of an image and corresponding computer program product and computer-readable medium | |
| Wei et al. | City-scale vehicle tracking and traffic flow estimation using low frame-rate traffic cameras | |
| CN111753618B (en) | Image recognition method, device, computer equipment and computer readable storage medium | |
| Huang et al. | Person re-identification across multi-camera system based on local descriptors | |
| Zhang et al. | A teaching evaluation system based on visual recognition technology | |
| CN115457620A (en) | User expression recognition method and device, computer equipment and storage medium | |
| CN112070003A (en) | Person tracking method and system based on deep learning | |
| Weng et al. | Crowd density estimation based on a modified multicolumn convolutional neural network | |
| Misale et al. | Learning visual words for content based image retrieval | |
| CN116052220B (en) | Pedestrian re-identification method, device, equipment and medium | |
| Chin et al. | Boosting descriptors condensed from video sequences for place recognition | |
| Bayat et al. | A Two-stage Shot Boundary Detection Framework in the presence of Fast movements: Application to Soccer videos | |
| Kamberov et al. | Collaborative track analysis, data cleansing, and labeling | |
| CN114241547A (en) | Facial feature updating method, apparatus, computer equipment and storage medium | |
| Sánchez Secades | Multiple feature temporal models for the semantic characterization of video contents | |
| Ullman | From classification to full object interpretation |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
| AS | Assignment |
Owner name: SENSORMATIC ELECTRONICS, LLC, FLORIDA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LLOPIS, JAUME AMORES;LEE, YOUNG M.;WESTMACOTT, IAN C.;AND OTHERS;SIGNING DATES FROM 20191001 TO 20220522;REEL/FRAME:062480/0495 Owner name: SENSORMATIC ELECTRONICS, LLC, FLORIDA Free format text: ASSIGNMENT OF ASSIGNOR'S INTEREST;ASSIGNORS:LLOPIS, JAUME AMORES;LEE, YOUNG M.;WESTMACOTT, IAN C.;AND OTHERS;SIGNING DATES FROM 20191001 TO 20220522;REEL/FRAME:062480/0495 |
|
| AS | Assignment |
Owner name: TYCO FIRE & SECURITY GMBH, SWITZERLAND Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:JOHNSON CONTROLS TYCO IP HOLDINGS LLP;REEL/FRAME:068494/0384 Effective date: 20240201 Owner name: TYCO FIRE & SECURITY GMBH, SWITZERLAND Free format text: ASSIGNMENT OF ASSIGNOR'S INTEREST;ASSIGNOR:JOHNSON CONTROLS TYCO IP HOLDINGS LLP;REEL/FRAME:068494/0384 Effective date: 20240201 |
|
| AS | Assignment |
Owner name: TYCO FIRE & SECURITY GMBH, SWITZERLAND Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:JOHNSON CONTROLS TYCO IP HOLDINGS LLP;REEL/FRAME:068150/0829 Effective date: 20240201 Owner name: JOHNSON CONTROLS TYCO IP HOLDINGS LLP, WISCONSIN Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:JOHNSON CONTROLS, INC.;REEL/FRAME:068084/0478 Effective date: 20210806 Owner name: JOHNSON CONTROLS, INC., WISCONSIN Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:JOHNSON CONTROLS US HOLDINGS LLC;REEL/FRAME:068084/0344 Effective date: 20210806 Owner name: JOHNSON CONTROLS US HOLDINGS LLC, WISCONSIN Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SENSORMATIC ELECTRONICS, LLC;REEL/FRAME:068084/0038 Effective date: 20210806 Owner name: JOHNSON CONTROLS TYCO IP HOLDINGS LLP, WISCONSIN Free format text: ASSIGNMENT OF ASSIGNOR'S INTEREST;ASSIGNOR:JOHNSON CONTROLS, INC.;REEL/FRAME:068084/0478 Effective date: 20210806 Owner name: TYCO FIRE & SECURITY GMBH, SWITZERLAND Free format text: ASSIGNMENT OF ASSIGNOR'S INTEREST;ASSIGNOR:JOHNSON CONTROLS TYCO IP HOLDINGS LLP;REEL/FRAME:068150/0829 Effective date: 20240201 Owner name: JOHNSON CONTROLS, INC., WISCONSIN Free format text: ASSIGNMENT OF ASSIGNOR'S INTEREST;ASSIGNOR:JOHNSON CONTROLS US HOLDINGS LLC;REEL/FRAME:068084/0344 Effective date: 20210806 Owner name: JOHNSON CONTROLS US HOLDINGS LLC, WISCONSIN Free format text: ASSIGNMENT OF ASSIGNOR'S INTEREST;ASSIGNOR:SENSORMATIC ELECTRONICS, LLC;REEL/FRAME:068084/0038 Effective date: 20210806 |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION COUNTED, NOT YET MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |