SE546129C2 - Method and system for optically tracking moving objects - Google Patents
Classifications
- G06T7/215: Motion-based segmentation
- G06T7/254: Analysis of motion involving subtraction of images
- G06T7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- A63B24/0021: Tracking a path or terminating locations
- A63B2024/0028: Tracking the path of an object, e.g. a ball inside a soccer pitch
- A63B2024/0034: Tracking the path of an object during flight
- A63B69/3658: Means associated with the ball for indicating or measuring, e.g. speed, direction
- G06T2207/20076: Probabilistic image processing
- G06T2207/30221: Sports video; Sports image
- G06T2207/30224: Ball; Puck
- G06T2207/30241: Trajectory
Abstract
Method for tracking moving objects, comprising the steps of: depicting a space (111) using a digital camera (110) to produce a series of consecutive digital images (I_t); for two or more of the pixel values (i_x,y,t) of said images, determining an inequality comparing a first value to a second value, the first value being calculated based on the square of the difference between the pixel value (i_x,y,t) and a predicted pixel value (î_x,y,t), the second value being calculated based on a product of the square of a number Z and an estimated variance for historic pixel values (i_x,y,{t-n,t-1}); storing in a computer memory information indicating that the pixel value (i_x,y,t) is part of a detected blob; and correlating detected blobs across said series of digital images (I_t) to determine paths of moving objects.
Description
The present invention relates to a method and a system for optically tracking moving objects.
Known methods track moving objects using computer vision, using one or more cameras depicting a space where the moving objects exist. The tracking may be performed by first identifying an object as one image pixel, or a set of adjacent pixels, that deviate from a local background. Such deviating pixels are together denoted a "blob". Once a number of blobs have been detected in several image frames, possible tracked object paths are identified by interconnecting identified blobs in subsequent frames.
One example of such a method is found in US 20220051420 A. The blob generation in each individual frame potentially results in very many false positive blobs, in other words identified blobs that do not correspond to an actually existing moving object. This may be due to noise, shifting lighting conditions and non-tracked objects occurring in the field of view of the camera in question.
The detection of possible tracked object paths normally results in a reduction of such false positives, for instance based on filtering away physically or statistically implausible paths. Due to the large number of false positive blob detections, however, even if most of the false positives are filtered away in the tracked paths detection step, the blob detection itself is associated with heavy memory and processor load and may therefore constitute a bottleneck for the object tracking even if high-performance hardware is used.
Moreover, as the performance of digital cameras increases, the pixel data output from such cameras increases correspondingly. In order to achieve accurate tracking of moving objects, it is desired to use as accurate and precise image information as possible. In order to avoid too many non-detected blobs (false negatives), leading to potentially missed tracked object paths, it is normally preferred to accept a relatively large share of false positive blob detections.
SUMMARY OF THE INVENTION

The various embodiments described herein solve one or more of the above described problems and provide techniques for tracking the paths of moving objects using less memory and/or processing power compared to conventional object tracking techniques.
Hence, the invention can be embodied as a method for tracking moving objects, comprising the steps of:

obtaining a series of digital images I_t at consecutive times t, the digital images I_t representing optical input from a three-dimensional space within a field of view of the digital camera, the digital camera being arranged to produce said digital images I_t having a corresponding set of pixels p_x,y, said digital images comprising corresponding pixel values i_x,y,t, the digital camera not moving in relation to said three-dimensional space during production of said series of digital images I_t;

for two or more of said pixel values i_x,y,t, determining an inequality comparing a first value to a second value, the first value being calculated as, or based on, the square of the difference between the pixel value i_x,y,t in question and a predicted pixel value î_x,y,t, the second value being calculated as, or based on, a product of, firstly, the square of a number Z and, secondly, an estimated variance or standard deviation σ_x,y,t with respect to historic pixel values i_x,y,{t-n,t-1} for the pixel p_x,y in question, where the predicted pixel value î_x,y,t is calculated based on historic pixel values i_x,y,{t-n,t-1} for the pixel p_x,y in question, Z being a number selected such that Z² is an integer such that 10 ≤ Z² ≤ 20;

for pixel values i_x,y,t for which said first value is higher than said second value, storing in a computer memory information indicating that the pixel value i_x,y,t is part of a detected blob; and

correlating, based on the information stored in the computer memory, detected blobs across said series of digital images I_t to determine paths of moving objects through said three-dimensional space.

In some embodiments, said inequality is (i_x,y,t - î_x,y,t)² > Z²σ²_x,y,t, where î_x,y,t is said predicted pixel value and where σ_x,y,t is an estimated standard deviation with respect to historic pixel values i_x,y,{t-n,t-1} for the pixel p_x,y in question.
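As an illustration, the per-pixel deviation test described above can be sketched as follows. This is a minimal NumPy sketch under assumptions not fixed by the text: the prediction is taken as the plain historic mean, the variance is the plain historic variance, and Z² = 16 is one choice within the stated range 10 ≤ Z² ≤ 20; all function and array names are invented for illustration.

```python
import numpy as np

def detect_blob_pixels(frame, history, z_squared=16):
    """Flag pixels whose value deviates from the predicted value by more
    than Z standard deviations of that pixel's own recent history.

    frame:   2-D array of current pixel values i_x,y,t
    history: 3-D array (n, H, W) of the n most recent pixel values
    Returns a boolean mask: True where the pixel is part of a detected blob.
    """
    predicted = history.mean(axis=0)   # simple predicted value: historic mean
    variance = history.var(axis=0)     # per-pixel estimated variance
    first = (frame - predicted) ** 2   # squared deviation from prediction
    second = z_squared * variance      # Z^2 * variance threshold
    return first > second
```

Pixels whose history is stable (low variance) are flagged on small deviations, while noisy pixels need a proportionally larger deviation, which is the point of the per-pixel statistical model.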
In some embodiments, the method further comprises:

storing, in said computer memory, for individual ones of said pixels p_x,y and for a number n ≤ N, the sums S_x,y,t = Σ_{j=t-n}^{t-1} i_x,y,j and Q_x,y,t = Σ_{j=t-n}^{t-1} i²_x,y,j; and

for individual ones of said pixel values i_x,y,t, determining said inequality as (n·i_x,y,t - S_x,y,t)² > Z²(n·Q_x,y,t - S²_x,y,t).

In some embodiments, S_x,y,t, Q_x,y,t, or both, are calculated recursively, whereby a calculated value for a pixel value i_x,y,t is calculated using a previously stored calculated value S_x,y,t-1, Q_x,y,t-1, or both, for the same pixel p_x,y but at an immediately preceding time t-1.

In some embodiments, S_x,y,t is calculated as S_x,y,t = S_x,y,t-1 + i_x,y,t - i_x,y,t-n, and Q_x,y,t is calculated as Q_x,y,t = Q_x,y,t-1 + i²_x,y,t - i²_x,y,t-n = Q_x,y,t-1 + (i_x,y,t + i_x,y,t-n)(i_x,y,t - i_x,y,t-n).

In some embodiments, the method further comprises storing in said computer memory S_x,y,t and Q_x,y,t in combination as a single datatype comprising 12 bytes or less per pixel p_x,y.

In some embodiments, the method further comprises storing in said computer memory, for a particular digital image I_t, a pixmap having, for each pixel p_x,y, said information indicating that the pixel value i_x,y,t is part of a detected blob.

In some embodiments, said information indicating that the pixel value i_x,y,t is part of a detected blob is indicated in a single bit for each pixel p_x,y.

In some embodiments, said pixmap also comprises, for each pixel p_x,y, a value indicating an expected pixel value i_x,y,t for that pixel p_x,y.

In some embodiments, said value indicating an expected pixel value i_x,y,t for the pixel p_x,y in question is stored as the predicted pixel value î_x,y,t in the form of a fixed-point fractional number, using a total of 15 bits for the integer and fractional parts.
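The running-sum formulation can be sketched in a few lines. Note one assumption: the right-hand side of the inequality, which is cut off in the text, is completed here as the standard integer rearrangement (n·i - S)² > Z²(n·Q - S²) of the variance test, obtained by multiplying both sides of (i - S/n)² > Z²·(nQ - S²)/n² by n². Function names are illustrative.

```python
def update_sums(s, q, new_value, old_value):
    """Recursive update when a new frame arrives, dropping the value that
    leaves the n-frame window:
    S_t = S_{t-1} + i_t - i_{t-n},  Q_t = Q_{t-1} + i_t^2 - i_{t-n}^2,
    the latter factored as (i_t + i_{t-n})(i_t - i_{t-n})."""
    s = s + new_value - old_value
    q = q + (new_value + old_value) * (new_value - old_value)
    return s, q

def is_blob_pixel(i, s, q, n, z_squared):
    """Integer-only deviation test, equivalent to (i - S/n)^2 > Z^2 * var:
    (n*i - S)^2 > Z^2 * (n*Q - S^2)."""
    return (n * i - s) ** 2 > z_squared * (n * q - s * s)
```

Because S, Q, i and Z² are all integers, the test avoids per-pixel floating-point division, which is what allows the heavy per-pixel work to run on cheap hardware.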
In some embodiments, the predicted pixel value î_x,y,t, the estimated variance or standard deviation σ_x,y,t, or both, is or are calculated based on a set of n historic pixel values i_x,y,{t-n,t-1} for the pixel p_x,y in question, where 10 ≤ n ≤ …

In some embodiments, a number n of previous images I_t considered for the estimation of an estimated variance or a standard deviation σ_x,y,t of the second value is selected to be a power of 2.

In some embodiments, said pixel values i_x,y,t have a depth across one or several channels of between 8 and … bits.

In some embodiments, the predicted pixel value î_x,y,t is determined based on an estimated projected future mean pixel value ŷ_x,y,t, in turn determined based on historic pixel values i_x,y,t for a sampled set of pixels p_x,y in said images I_t.

In some embodiments, the predicted pixel value î_x,y,t is determined as î_x,y,t = αŷ_x,y,t + β, where α and β are constants determined so as to minimize Σ_{j,k} (i_j,k,t - (αŷ_j,k,t + β))², where ŷ_j,k,t is said estimated projected future mean pixel value for the pixel p_j,k in question, and where j and k are iterated over a test set of pixels.

In some embodiments, ŷ_x,y,t is an estimated historic mean with respect to pixel values i_x,y,t for the pixel p_x,y in question.

In some embodiments, said test set of pixels contains between 1% and 25% of the total set of pixels p_x,y in the image I_t.

In some embodiments, said test set of pixels is geometrically evenly distributed across the total set of pixels p_x,y in the image I_t.
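The minimization of Σ(i - (αŷ + β))² over the test set of pixels is ordinary least-squares fitting of a line, which has a closed form. A sketch, with illustrative names and the closed-form normal equations standing in for whatever solver an implementation actually uses:

```python
import numpy as np

def fit_alpha_beta(observed, predicted_means):
    """Least-squares fit of i ≈ alpha * y_hat + beta over a test set of
    pixels, using the closed-form solution for simple linear regression."""
    y = np.asarray(predicted_means, dtype=float)
    i = np.asarray(observed, dtype=float)
    y_mean, i_mean = y.mean(), i.mean()
    # slope = cov(y, i) / var(y); intercept from the means
    alpha = ((y - y_mean) * (i - i_mean)).sum() / ((y - y_mean) ** 2).sum()
    beta = i_mean - alpha * y_mean
    return alpha, beta
```

Fitting α and β over a small, evenly distributed test set (1% to 25% of the pixels, per the text) keeps this global step cheap relative to the per-pixel work.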
In some embodiments, the estimated standard deviation σ_x,y,t is determined according to σ²_x,y,t = σ²_x,y,t-1(1 - g) + g(i_x,y,t - αŷ_x,y,t - β)², where g ∈ ]0,1[.

In some embodiments, the method further comprises:

determining that at least one is true of α being further away from 1 than a first threshold value and β being further away from 0 than a second threshold value; and

determining the predicted pixel value î_x,y,t according to any one of claims 14-18 until it is determined that α is no longer further away from 1 than the first threshold value and β is no longer further away from 0 than the second threshold value.
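This recursive variance update is an exponentially weighted moving average of the squared residual. A sketch under the assumptions of the reconstructed formula above; the default value of g is an invented example, not taken from the text:

```python
def update_variance(var_prev, pixel, y_hat, alpha, beta, g=0.05):
    """Recursive per-pixel variance estimate:
    var_t = var_{t-1} * (1 - g) + g * (i - alpha*y_hat - beta)^2,
    with the blending factor g in the open interval (0, 1)."""
    residual = pixel - alpha * y_hat - beta
    return var_prev * (1.0 - g) + g * residual * residual
```

A small g makes the estimate slow to react, so a brief flash barely raises the threshold, while a persistent lighting change is gradually absorbed into the background model.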
In some embodiments, the method further comprises, for said pixel values i_x,y,t for which said first value is higher than said second value, only storing said information indicating that the pixel value i_x,y,t is part of a detected blob in case also the following inequality holds: B(i_x,y,t - î_x,y,t)² > î_x,y,t, where i_x,y,t is the pixel value in question, where î_x,y,t is the predicted pixel value and where B is an integer such that B > …

In some embodiments, the method further comprises using a Hoshen-Kopelman algorithm to group together individual adjacent pixels determined to be part of a same blob.
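The Hoshen-Kopelman algorithm labels connected clusters in a single raster scan using union-find. The following is a compact sketch in that spirit, not the patent's implementation; 4-connectivity and the data layout are assumptions.

```python
def label_blobs(mask):
    """Group flagged pixels into blobs via a Hoshen-Kopelman style raster
    scan with union-find; returns a dict mapping (x, y) -> blob label."""
    parent = {}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    labels = {}
    next_label = 0
    for y, row in enumerate(mask):
        for x, flagged in enumerate(row):
            if not flagged:
                continue
            left = labels.get((x - 1, y))   # already-labelled left neighbour
            up = labels.get((x, y - 1))     # already-labelled upper neighbour
            if left is None and up is None:
                parent[next_label] = next_label  # open a new cluster
                labels[(x, y)] = next_label
                next_label += 1
            else:
                lab = left if left is not None else up
                labels[(x, y)] = lab
                if left is not None and up is not None:
                    parent[find(left)] = find(up)  # merge touching clusters
    # resolve every provisional label to its cluster root
    return {pos: find(lab) for pos, lab in labels.items()}
```

One pass over the bitmask suffices, which fits the single-bit-per-pixel pixmap representation described above.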
In some embodiments, the objects are golf balls.
The invention can be embodied as a method for tracking moving objects, the method comprising:

obtaining a series of digital images I from a digital camera, the digital images I representing optical input from a three-dimensional space within a field of view of the digital camera over time, each of the digital images I having pixels p_x,y with corresponding pixel values i_x,y;

performing, at a computer, image segmentation on each image of the series of digital images I using a statistical model of background for the optical input to detect blobs, wherein performing the image segmentation comprises, for each of two or more pixel values i_x,y,t in the image, determining an inequality result using a current pixel value i_x,y,t for a pixel p_x,y in a current image I_t, first S_x,y,t and second Q_x,y,t values of the statistical model for the pixel p_x,y, and a confidence level value Z², wherein the first S_x,y,t and second Q_x,y,t values are calculated based on historic pixel values i_x,y from images from the series of digital images I before the current image I_t, each of the current pixel value i_x,y,t, the first S_x,y,t and second Q_x,y,t values, and the confidence level value Z² are stored as integer type data in a memory of the computer, and the determining uses integer operations in the computer, and storing in the memory of the computer information indicating that the current pixel value i_x,y,t for the image pixel in the current image I_t is part of a detected blob in response to the inequality result; and

using the stored information to correlate detected blobs across the series of digital images I to determine paths of moving objects through the three-dimensional space within the field of view of the digital camera.
Moreover, the invention can also be embodied as a system for tracking moving objects, the system comprising a digital camera, a digital image analyzer and a moving object tracker:

the digital camera being arranged to represent optical input from a three-dimensional space within a field of view of the digital camera to produce a series of digital images I_t at consecutive times t, the digital camera being arranged to produce said digital images I_t having a corresponding set of pixels p_x,y, said digital images comprising corresponding pixel values i_x,y,t, the digital camera being arranged to not move in relation to said three-dimensional space during production of said series of digital images I_t;

the digital image analyzer being configured to, for two or more of said pixel values i_x,y,t, determine an inequality comparing a first value to a second value, the first value being calculated as, or based on, the square of the difference between the pixel value i_x,y,t in question and a predicted pixel value î_x,y,t, the second value being calculated as, or based on, a product of, firstly, the square of a number Z and, secondly, an estimated variance or standard deviation σ_x,y,t with respect to historic pixel values i_x,y,{t-n,t-1} for the pixel p_x,y in question, where the predicted pixel value î_x,y,t is calculated based on historic pixel values i_x,y,{t-n,t-1} for the pixel p_x,y in question, and where Z is selected such that Z² is an integer such that 10 ≤ Z² ≤ 20;

the digital image analyzer being configured to, for pixel values i_x,y,t for which said first value is higher than said second value, store in a computer memory information indicating that the pixel value i_x,y,t is part of a detected blob; and

the moving object tracker being configured to correlate, based on the information stored in the computer memory, detected blobs across said series of digital images I_t to determine paths of moving objects through said three-dimensional space.
Furthermore, the invention can also be embodied as a computer software product configured to, when executing:

receive a series of digital images I_t from a digital camera, the digital camera being arranged to represent optical input from a three-dimensional space to produce said digital images I_t at consecutive times t, the digital camera being arranged to produce said digital images I_t having a corresponding set of pixels p_x,y, said digital images comprising corresponding pixel values i_x,y,t, the digital camera being arranged to not move in relation to said three-dimensional space during production of said series of digital images I_t;

for two or more of said pixel values i_x,y,t, determine an inequality comparing a first value to a second value, the first value being calculated as, or based on, the square of the difference between the pixel value i_x,y,t in question and a predicted pixel value î_x,y,t, the second value being calculated as, or based on, a product of, firstly, the square of a number Z and, secondly, an estimated variance or standard deviation σ_x,y,t with respect to historic pixel values i_x,y,{t-n,t-1} for the pixel p_x,y in question, where the predicted pixel value î_x,y,t is calculated based on historic pixel values i_x,y,{t-n,t-1} for the pixel p_x,y in question, and where Z is selected such that Z² is an integer such that 10 ≤ Z² ≤ 20;

for pixel values i_x,y,t for which said first value is higher than said second value, store in a computer memory information indicating that the pixel value i_x,y,t is part of a detected blob; and

correlate, based on the information stored in the computer memory, detected blobs across said series of digital images I_t to determine paths of moving objects through said three-dimensional space.
The computer software product can be implemented by a non-transitory computer-readable medium encoding instructions that cause one or more hardware processors located in at least one of the computer hardware devices in the system to perform the digital image processing and the object tracking.
BRIEF DESCRIPTION OF THE DRAWINGS

In the following, the invention will be described in detail, with reference to exemplifying embodiments of the invention and to the enclosed drawings, wherein:

Figure 1 is an overview of a system 100 configured to perform a method of the type illustrated in Figure 3;
Figure 2 is a simplified illustration of a data processing apparatus;
Figure 3 shows a general flowchart for optically tracking moving target objects;
Figure 4 is a flowchart of a method performed by the system 100 shown in Figure 1;
Figure 5 is an overview illustrating a noise model of a type described herein;
Figure 6 shows an image frame illustrating a noise model;
Figure 7 illustrates an example of clustering of pixels into blobs; and
Figure 8 illustrates intensities for a pixel during a sudden exposure change event.
All figures share the same reference numerals for same and corresponding parts.
DETAILED DESCRIPTION

With reference to Figure 1, the method relates to a method for tracking moving target objects 120. Generally, a system 100 can comprise one or several digital cameras 110, each being arranged to represent optical input from a three-dimensional space 111 within a field of view of the digital camera 110, to produce digital images of such moving target objects 120, the objects travelling through a space 111 hence being represented by the digital camera 110 in consecutive digital images. Such representation by the digital camera 110 will herein be denoted a "depiction", for brevity.

The digital camera 110 is arranged to not move in relation to the space 111 during production of the series of digital images I_t. For instance, the digital camera 110 may be fixed in relation to said space 111, or, in case it is movable, it is kept still during the production of the series of digital images I_t. Hence, the same part of the space 111 is depicted each time by the digital camera 110, and the digital camera 110 is arranged to produce digital images I_t having a corresponding set of pixels p_x,y, and so that said produced digital images I_t comprise corresponding pixel values i_x,y,t. "x" and "y" denote coordinates in an image coordinate system, whereas "t" denotes time.
That the pixel values i_x,y,t of two or more different images I_t "correspond" to each other means that individual pixels p_x,y measure light entering the camera 110 from the same, or substantially the same, light cone in all of the images I_t in question. It is realized that the camera 110 may move slightly, due to wind, thermal expansion and so forth, between images I_t, but that there is substantial correspondence between pixels p_x,y even in cases where such noise-inducing slight movement is present. There can be at least 50% overlap between light cones of any one same pixel p_x,y of the camera 110 between any two consecutive images I_t. There may also be cases where the camera 110 is movable, such as pivotable. In such cases an image transformation can be applied to a captured image so as to bring its pixels p_x,y into correspondence with pixels of a previous or future captured image.

In case the system 100 comprises more than one digital camera 110, several such digital cameras 110 can be arranged to depict the same space 111 and consequently to track the same moving target object(s) 120 through said space 111. In such cases, the several digital cameras 110 can be used to construct a stereoscopic view of the respective tracked path of each target object 120.

As mentioned, the digital camera 110 is arranged to produce a series of consecutive images I_t, at different points in time. Such images may also be denoted image "frames". In some embodiments, the digital camera 110 is a digital video camera, arranged to produce a digital moving film comprising or being constituted by such consecutive digital image frames.

As is illustrated in Figure 1, the system 100 comprises a digital image analyzer 130, configured to analyze digital images received directly from the digital camera 110, or received from the digital camera 110 via an intermediate system, in same or processed (re-formatted, compressed, filtered, etc.) form.
The analysis performed by the digital image analyzer 130 can take place in the digital domain. The digital image analyzer 130 may also be denoted a "blob detector".
The system 100 further comprises an object tracker 140, configured to track said moving target objects 120 across several of said digital images, based on information provided from the digital image analyzer 130. The analysis performed by the object tracker 140 can also take place in the digital domain.

In example embodiments, the system 100 is configured to track target objects 120 in the form of sports objects in flight, such as balls in flight, for instance baseballs or golf balls in flight. In some embodiments, the system 100 is used at a golf practice range, such as a driving range having a plurality of bays for hitting golf balls that are to be tracked using the system 100. In other cases, the system 100 can be installed at an individual golf range bay, or at a golf tee, and configured to track golf balls being struck from said bay or tee. The system 100 can also be a portable system 100, configured to be positioned at a location from which it can track said moving target objects 120. It is realized that the monitored "space" mentioned above will, in each of these and other cases, be a space through which sport balls are expected to move.
Various types of computers can be used in the system 100. The digital image analyzer 130 and the object tracker 140 constitute examples of such computers. In some cases, the digital image analyzer 130 and the object tracker 140 can be provided as software functions executing on one and the same computer. The one or several digital cameras 110 can also be configured to perform digital image processing, and then also constitute examples of such computers. In some embodiments, the digital image analyzer 130 and/or the object tracker 140 is or are implemented as software functions configured to execute on hardware of one or several digital cameras 110. In other embodiments, the digital image analyzer 130 and/or the object tracker 140 is or are implemented on standalone or combined hardware platforms, such as on a computer server.
The one or several digital cameras 110, the digital image analyzer 130 and the object tracker 140 are configured to communicate digitally, either via computer-internal communication paths, such as via a computer bus, or via computer-external wired and/or wireless communication paths, such as via a network 10 (e.g., the Internet). In implementations that need substantial communications bandwidth, the camera(s) 110 and the digital image analyzer 130 can communicate via a direct, wired digital communication route, which is not over the network 10. On the other hand, the digital image analyzer 130 and the object tracker 140 may communicate with each other over the network 10 (e.g., a conventional Internet connection).
The essential elements of a computer, in general, are a processor for performing instructions and one or more memory devices for storing instructions and data. As used herein, a "computer" can include a server computer, a client computer, a personal computer, embedded programmable circuitry, or a special purpose logic circuitry. Such computers can be connected with one or more other computers through a network, such as the internet 10, or via any suitable peer-to-peer connection for digital communications, such as a Bluetooth® connection.
Each computer can include various software modules, which can be distributed between an applications layer and an operating system. These can include executable and/or interpretable software programs or libraries, including various programs that operate, for instance, as the digital image analyzer 130 program and/or the object tracker 140 program. Other examples include a digital image preprocessing and/or compressing program. The number of software modules used can vary from one implementation to another and from one such computer to another. Each of said programs can be implemented in embedded firmware and/or as software modules that are distributed on one or more data processing apparatus connected by one or more computer networks or other suitable communication networks.

Figure 2 illustrates an example of such a computer, being a data processing apparatus 300 that can include hardware or firmware devices including one or more hardware processors 312, one or more additional devices 314, a non-transitory computer readable medium 316, a communication interface 318, and one or more user interface devices 320. The processor 312 is capable of processing instructions for execution within the data processing apparatus 300, such as instructions stored on the non-transitory computer readable medium 316, which can include a storage device such as one of the additional devices 314. In some implementations, the processor 312 is a single or multi-core processor, or two or more central processing units (CPUs). The data processing apparatus 300 uses its communication interface 318 to communicate with one or more other computers 390, for example, over the network 380. Thus, in various implementations, the processes described can be run in parallel, concurrently, or serially, on a single or multi-core computing machine, and/or on a computer cluster/cloud, etc.
The data processing apparatus 300 includes various software modules, which can be distributed between an applications layer and an operating system. These can include executable and/or interpretable software programs or libraries, including a program 330 that constitutes the digital image analyzer 130 described herein, configured to perform the method steps performed by such digital image analyzer 130. The program 330 can also constitute the object tracker 140 described herein, configured to perform the method steps performed by such object tracker 140.

Examples of user interface devices 320 include a display, a touchscreen display, a speaker, a microphone, a tactile feedback device, a keyboard, and a mouse. Moreover, the user interface device(s) need not be local device(s) 320, but can be remote from the data processing apparatus 300, e.g., user interface device(s) 390 accessible via one or more communication network(s) 380. The user interface device 320 can also be in the form of a standalone device having a screen, such as a conventional smartphone being connected to the system 100 via a configuration or setup step. The data processing apparatus 300 can store instructions that implement operations as described in this document, for example, on the non-transitory computer readable medium 316, which can include one or more additional devices 314, for example, one or more of a floppy disk device, a hard disk device, an optical disk device, a tape device, and a solid state memory device (e.g., a RAM drive, a Flash memory or an EEPROM). Moreover, the instructions that implement the operations described in this document can be downloaded to the non-transitory computer readable medium 316 over the network 380 from one or more computers 390 (e.g., from the cloud), and in some implementations, the RAM drive is a volatile memory device to which the instructions are downloaded each time the computer is turned on.
It is realized that the described computer hardware can be physical hardware, virtual hardware or any combination thereof.
As mentioned, the system 100 is configured to perform a method according to one or more embodiments for optically tracking moving target objects 120.

The present invention can furthermore be embodied as a computer software product, configured to perform said method when executing on computer hardware of the type described herein. The computer software product can hence be deployed as a part of the system 100 so as to provide the functionality required to perform the present method.
Both said system 100 and said computer software product are hence configured to track moving target objects 120 moving through said space 111 in relation to one or several digital cameras 110, by comprising or embodying the above-mentioned digital image analyzer 130 and object tracker 140, in turn being configured to perform the corresponding method steps described herein. In general, everything that is said in relation to the presently described method is equally applicable to the system 100 and to the computer software product described herein, and vice versa.
Figure 3 illustrates a general flowchart for tracking moving target objects 120 based on digital image information received from one or several digital cameras 110.

In computer vision, "image segmentation" is the process of separating an image into different regions, representing target objects within it. Generally, it is desirable to distinguish potential moving target objects from a background. The background may in general be changing and noisy, and is in many cases quite unpredictable. In the example of a golf ball, for instance, when such a ball is far away from the digital camera 110 depicting the ball 120, it may be even as small as one single pixel p_x,y in the digital image frame produced by the digital camera 110.

For these reasons, it is in general not possible to separate out a foreground object 120 from a background based only on a detected shape in relation to an expected shape of the target object 120. Instead, it is proposed to set up a statistical model of the background (in the following denoted a "noise model"), and to identify pixels p_x,y that by a probability measure deviate from an expected value with more than a threshold value, based on this model. Adjacent pixels p_x,y in the detected digital image that deviate from the expected value in accordance with the model are grouped together into a "blob" of pixels p_x,y ("blob aggregation").
Such a method may result in a very large number of false positives, such as about 99.9% false positives. However, a subsequent motion tracking analysis can sort out the vast majority of all false positives, such as by only keeping blobs that seem to obey Newton's laws of motion between consecutive digital image frames It.
The noise model step, as depicted in Figure 3, is used to suppress noise in the image frames, with the purpose of lowering the number of detected blobs in the subsequent blob aggregation step. The noise model analyzes a plurality of pixels px,y, such as every pixel px,y, in said image frames It, and is therefore at risk of becoming a major bottleneck. These calculations, aiming to identify noise that does not conform to a detected statistical pattern in order to identify outliers, can be handled by high-performance GPUs (Graphics Processing Units), but performance may still prove to be a problem. The approach described herein has turned out to drastically reduce the computational power required per pixel px,y in a moving target object 120 tracking system 100. This reduction can be exploited by using simpler hardware, lowering power consumption or allowing a larger incoming image bitrate.
Turning now to Figure 4, a method according to one or more embodiments is illustrated.

In a first step S1, the method starts.

In a subsequent step S2, a number Z is selected such that Z² is an integer. The number Z can be selected such that Z² is an integer satisfying 10 ≤ Z² ≤ 20. It is noted that Z may be a non-integer, as long as Z² is an integer value. This step S2 may be performed ahead of time, such as during a system 100 design process or a system 100 calibration step.

In a subsequent step S3, the space 111 is depicted using the digital camera 110 to produce a series of digital images It at consecutive times t. The space 111 can be depicted using the digital camera 110 to produce a series of N digital images It at consecutive times t. However, it is realized that the procedure can also be a continuous or semi-continuous procedure, wherein the digital camera 110 will continue to produce digital images It at consecutive times t for as long as the procedure is ongoing. Hence, in this case the number of digital images N will grow by 1 for each captured frame. In either case, the series of digital images It at consecutive times t may be seen as a stream of digital images, captured much like a digital video stream.

In a subsequent step S4, for two or more (e.g., several) of said pixel values ix,y,t, an inequality is determined, involving comparing a first value to a second value.
The first value is calculated based on the square of the difference between the pixel value ix,y,t in question and a calculated predicted pixel value μ̂x,y,t for that pixel px,y. The second value is calculated based on a product of, on the one hand, the square of the selected number Z, this square then being an integer value, and, on the other hand, an estimated variance or standard deviation σx,y,t with respect to historic pixel values ix,y,{t−n,…,t−1} for the pixel px,y in question. Concretely, the second value can be calculated based on said estimated variance or a square of the estimated standard deviation σx,y,t.
The predicted pixel value μ̂x,y,t is also calculated based on historic pixel values ix,y,{t−n,…,t−1} for the pixel px,y in question, in other words using information from image frames It−n captured by the camera 110 at points in time prior to the time t. The predicted pixel value μ̂x,y,t can be calculated based on the same, or a different, set of historic pixel values ix,y,{t−n,…,t−1} as the estimated variance or standard deviation σx,y,t.

In the notation used herein, "n" denotes the number of historic pixel values ix,y,t considered by the noise model, counting backwards from the currently considered image frame. This notation hence assumes that the same consecutive pixel values ix,y,t, up to the presently considered image frame, are used to calculate both the first and the second value, but it is realized that any suitable contiguous or non-contiguous, same or different, intervals of pixel values ix,y,t can be used to calculate the first and the second value, respectively.

In general, the equations and expressions disclosed and discussed herein are provided as illustrative examples, and it is realized that in practical embodiments they can be tailored to specific needs. This can include, for instance, the introduction of various constant factors and scaling factors; additional intermediate calculation steps, such as filtering steps; and so forth.

In some embodiments, said inequality may be written as:

(ix,y,t − μ̂x,y,t)² > Z²·σ²x,y,t,

where μ̂x,y,t is said predicted pixel value and where σx,y,t is an estimated standard deviation with respect to said historic pixel values ix,y,{t−n,…,t−1} for the pixel px,y in question.

In general, the presently described noise model can be configured to, for each pixel px,y, estimate a moving average and standard deviation based on the last n image frames, and then to use these metrics to decide whether the pixel value ix,y,t in the same image location in the new frame deviates from the expected value more than an allowed limit.
This model can be designed to assume that any pixel in the background of the considered image It has an intrinsic Gaussian noise, as long as the background only contains features that are assumed to be static in the first approximation. A normal distribution can be used to establish a suitable confidence interval. For instance, if a Z score of 3.464 is used, it can be seen that 99.95% of all samples with no significant differences from the background fall within the corresponding confidence interval. Therefore, a pixel px,y with signal value ix,y,t at time t is considered to have a significant difference from the background if:

|ix,y,t − μx,y| / σx,y > Z, where μx,y = (Σk ix,y,k) / n.    (1)

Here, k iterates over the previous n frames. The limit is based on the (uncorrected) standard deviation:

σx,y = √( Σk (ix,y,k − μx,y)² / n ).

The corrected (unbiased) standard deviation would be a mathematically more correct choice, i.e. a more accurate estimate of σ would result from dividing by n−1 rather than by n. However, for the present purposes this is not significant, since the limit used is a multiple of the standard deviation that may be freely selected. By selecting the number n of previous image frames considered for the estimation of the standard deviation in the second value (used in evaluating said inequality) to be a power of 2 (e.g. 16, 32, 64, ...), we can get computationally efficient multiplications and divisions at a very low cost, by using shifting operations.

When processing an image frame It, pixel values ix,y,k from frames k ∈ [t − n, t − 1] are used.
A variant of the formula for computing the standard deviation that allows for it to be computed in a single pass is the following. Here the expression for the estimate of the mean is also provided:

σ²x,y,t = ( n·Σk i²x,y,k − (Σk ix,y,k)² ) / n²    (2)

μx,y,t = ( Σk ix,y,k ) / n.    (3)

Set Qx,y,t = Σk i²x,y,k and Sx,y,t = Σk ix,y,k, and the expressions (2) and (3) can then be rewritten as:

σ²x,y,t = ( n·Qx,y,t − S²x,y,t ) / n²    (4)

μx,y,t = Sx,y,t / n.    (5)

Revisiting (1), it is safe to square both sides, since both the left-hand and right-hand sides of this equation are non-negative:

(ix,y,t − μx,y,t)² > Z²·σ²x,y,t.    (6)

Combining (4) and (6) yields:

(ix,y,t − Sx,y,t / n)² > Z²·( n·Qx,y,t − S²x,y,t ) / n².    (7)

This is equivalent to:

(n·ix,y,t − Sx,y,t)² > Z²·( n·Qx,y,t − S²x,y,t ).    (8)

It is noted that n ≤ N. Hence, the above-discussed inequality can be expressed as (8), with Sx,y,t = Σ_{j=t−n}^{t−1} ix,y,j and Qx,y,t = Σ_{j=t−n}^{t−1} i²x,y,j.

Since ix,y,t, n, Sx,y,t and Qx,y,t are all integers, and since Z can be picked to produce an appropriate or desired number of false positives, the entire calculation can be done using only integer numbers. This means that the calculations can be performed without any loss of precision due to floating point truncation errors. Also, integer operations are typically faster than their floating-point counterparts.

In the following table, various outcomes for different selected values of Z are shown:

Z²   Z        P (number of false positives, ppm)
12   3.46410  532
13   3.60555  311
14   3.75166  175
15   3.87298  107
16   4.00000  63

Equation (8) depends on knowledge of the sum S and the squared sum Q of the last n observations of the pixel value ix,y,t in question:

Sx,y,t = Σ_{j=t−n}^{t−1} ix,y,j    (9)

Qx,y,t = Σ_{j=t−n}^{t−1} i²x,y,j.    (10)

While it would be possible to calculate the statistics using (9) and (10) directly, for each pixel value ix,y,t of each frame It, it is much more computationally efficient to use a recursive definition, where in every step the new frame It is added to the noise model, and the frame It−n from n frames back is removed:
Sx,y,t = Sx,y,t−1 + ix,y,t − ix,y,t−n    (11)

Qx,y,t = Qx,y,t−1 + i²x,y,t − i²x,y,t−n.    (12)

Updating Qx,y,t requires two multiplications to generate the squares. However, since Qx,y,t involves a difference of squares, it can be reduced to one single multiplication if rewritten as follows:

Qx,y,t = Qx,y,t−1 + (ix,y,t + ix,y,t−n)(ix,y,t − ix,y,t−n).    (13)

Define z− = ix,y,t − ix,y,t−n and z+ = ix,y,t + ix,y,t−n. Then:

Sx,y,t = Sx,y,t−1 + z−    (14)

Qx,y,t = Qx,y,t−1 + z+·z−.    (15)

(14) and (15) are then the full calculations required to update the noise model. A straightforward implementation would require only 3 (int) additions, 1 (int) subtraction and 1 (int) multiplication per pixel, which makes it very computationally efficient. Furthermore, these calculations can be accelerated by use of SIMD instruction sets such as AVX2 (on x86_64) or NEON (on aarch64), or they can be run on a GPU or even implemented on an FPGA.
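To make the above concrete, the following is a minimal Python sketch (function and variable names are illustrative, not from the patent) of a per-pixel noise model kept as the integer sums S and Q over a sliding window of n frames, updated according to (14)-(15) and queried with the integer-only test (8):

```python
from collections import deque

def update(S: int, Q: int, window: deque, new_px: int):
    """Add the new frame's pixel value and drop the one from n frames back:
    z- = i_t - i_{t-n}, z+ = i_t + i_{t-n}; S += z-, Q += z+*z-
    (one multiplication instead of two squarings)."""
    old_px = window.popleft()          # pixel value from n frames back
    window.append(new_px)
    z_minus = new_px - old_px
    z_plus = new_px + old_px
    return S + z_minus, Q + z_plus * z_minus

def is_outlier(i_new: int, S: int, Q: int, n: int, z_sq: int) -> bool:
    """Inequality (8): (n*i - S)^2 > Z^2 * (n*Q - S^2), all in integers."""
    return (n * i_new - S) ** 2 > z_sq * (n * Q - S * S)

# Hypothetical 12-bit background fluctuating around 2021, window n = 8.
window = deque([2021, 2019, 2024, 2018, 2023, 2020, 2022, 2021])
S = sum(window)
Q = sum(v * v for v in window)
n, z_sq = len(window), 12              # Z^2 = 12, i.e. Z ~ 3.464

assert not is_outlier(2022, S, Q, n, z_sq)   # consistent with background
assert is_outlier(2160, S, Q, n, z_sq)       # step-sized jump is flagged
S, Q = update(S, Q, window, 2022)            # then fold the new value in
assert S == sum(window) and Q == sum(v * v for v in window)
```

In a real implementation the window of frames would be shared image memory rather than a per-pixel deque; the sketch only shows the arithmetic per pixel position.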
The calculations performed to update the noise model between consecutive image frames It are conceptually illustrated in Figure 5, showing how the "New frame" It is added to the Noise Model, and a frame It−n from n frames back, such as the last frame in the currently considered "Queue of frames in model", is removed. As has been described above, this can be performed efficiently by considering the individual pixel values ix,y,t and ix,y,t−n, calculating the values of z+ and z−.
From the above it is clear that 0 ≤ S ≤ 2^b·n and 0 ≤ Q ≤ 2^(2b)·n, where b is the bit depth of the input pixel value ix,y,t data. In some embodiments, n is not more than 300, or not more than 256, or not more than 128, such as not more than 64 = 2^6 (i.e., the averaging is not performed over more than 64 consecutive image frames It). In some embodiments, n can be as low as 32, or even as low as 16 or even 10. In some embodiments, the n frames considered at each point in time are the n latest frames captured and provided by the camera 110. In this case, the n frames can together cover a time period of between 0.1 s and 10 s, such as between 0.5 s and 2 s, of captured video. In other words, the number of considered frames n can be relatively close to a frame rate used by the digital camera 110.

The noise model may then be required to store two integers per pixel px,y, in addition to keeping the actual image frames in memory for at least as many frames It as the length of the window size n. Furthermore, an additional single-precision float may be required per pixel to store the estimated variance, if the calculation described in equation (19), below, is used.

In some embodiments, the pixel values ix,y,t have a bit depth across one or several channels of between 8 and 48 bits, such as a single channel (for instance a gray channel) of 8 or 16 bit depth, or three channels (such as RGB) of 16 or 24 bit depth.

In case the camera 110 provides pixel value ix,y,t information across several color channels, the pixel values ix,y,t can be transformed into a single channel (such as a gray scale channel) before processing of the pixel values ix,y,t by the digital image analyzer 130. Alternatively, only one such channel, out of several available channels, can be used for the analysis. Further alternatively, several channels can be analyzed separately and in parallel, so that a pixel that is detected to be a blob pixel in at least one such analyzed channel is determined to be a blob pixel at any point in time.
The transformed pixel values ix,y,t can have a bit depth of at least 8 bits, and in some embodiments at the most 24 bits, such as at the most 16 bits. A bit depth of 12 bits has proven to strike a reasonable balance between speed, memory requirements and output quality. In case the input data has a higher bit depth than required, the data from the camera 110 can be transformed (down-sampled) before processing by the digital image analyzer 130.

More generally, the number of bits required can be found for S as D + log₂(n) and for Q as 2D + log₂(n), where D is the bit depth for one single considered channel.
The following table shows the required storage space for S and Q depending on the used pixel value ix,y,t bit depth when n = 64:

Pixel bit depth   S required bits   Q required bits
8                 14 (uint16)       22 (uint32)
10                16 (uint16)       26 (uint32)
12                18 (uint32)       30 (uint32)
16                22 (uint32)       38 (uint64)

In general, the method can comprise a step in which the noise model is updated and stored in computer memory, as a collection of updated noise model information (S and Q) with respect to individual pixels px,y for which blob detection is to be performed. This noise map can hence be updated and stored for each pixel px,y in the image.
Using the above-explained calculations, it is possible to store, in said computer memory, updated values for Sx,y,t and Qx,y,t in combination as a single datatype (such as a single structure, record or tuple), the datatype comprising 12 bytes or less, or 10 bytes or less, or even 8 bytes or less, per pixel px,y. This storing, for each analyzed pixel value ix,y,t (such as for all pixels px,y in the image It), of updated values for Sx,y,t and Qx,y,t in combination as a single datatype, constitutes an example of the "noise model" described herein. Hence, the noise model is updated for each analyzed digital image frame It, such as for each individual image frame It in the set of consecutive image frames It produced and provided by the (each) digital camera 110.

In the same step S4, for pixel values ix,y,t for which said first value is found to be higher than said second value, information is stored in said computer memory, the information indicating that the pixel value ix,y,t is part of a detected blob.
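As an illustration of such a compact per-pixel record, the following sketch (the layout is assumed for illustration, not prescribed by the patent) packs Sx,y,t and Qx,y,t into two unsigned 32-bit fields, i.e. 8 bytes per pixel, which suffices for n = 64 and 12-bit input where S needs 18 bits and Q needs 30 bits:

```python
import struct

# For n = 64 and 12-bit pixels, S needs 12 + 6 = 18 bits and Q needs
# 2*12 + 6 = 30 bits, so both fit in unsigned 32-bit fields: 8 bytes per pixel.
def pack_sq(s: int, q: int) -> bytes:
    return struct.pack("<II", s, q)

def unpack_sq(buf: bytes):
    return struct.unpack("<II", buf)

# Worst case for 12-bit input with n = 64: all samples at the maximum 4095.
record = pack_sq(64 * 4095, 64 * 4095 * 4095)
assert len(record) == 8
assert unpack_sq(record) == (64 * 4095, 64 * 4095 * 4095)
```

A production system would keep one contiguous array of such records, one per pixel position, so the whole noise model for a frame is a single flat buffer.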
This storing can take place in a generated pixmap, in other words a data structure having such indicating information for each pixel px,y. The information for each pixel px,y that it belongs or does not belong to a blob for that image frame It can be stored very computationally efficiently, since it can be stored as a single binary bit.

One way of implementing such a pixmap in practice is to use a "noise map" of the general type that will be described in the following, where the pixmap also comprises, for each pixel px,y, a value indicating an expected pixel value for that pixel px,y.
Hence, for each frame, the noise model established as described above can be used to generate such a noise map, that for every pixel position px,y provides information about whether or not that particular pixel value ix,y,t in the new frame It was outside of the allowed limits (that is, if (6) or (8) was true). In addition, the noise map can store an expected signal value for each pixel px,y at time t, such as based on the calculations performed in the determination of the noise model. The expected signal value is useful in downstream calculations, such as in a subsequent blob aggregation step, and so it is computationally efficient to establish and store this information already at this point.
Figure 6 illustrates the noise model after being updated based on the information of a most recently available image frame It, and in particular how the frame It relates to the values of Sx,y,t and Qx,y,t for that time t.
Even though it would be possible to first emit the noise map for each new image frame It arriving at the digital image analyzer 130, and only thereafter to update the noise model in the digital image analyzer 130, both of these steps can be done in one go, without unloading or overwriting the information in memory between said calculations. Hence, the (each) new image frame It is loaded into the CPU memory; and z+, z−, Sx,y,t, Qx,y,t, σ²x,y,t and/or μx,y,t are calculated for each pixel px,y, as the case may be, before the loaded data is unloaded or overwritten in the CPU memory. The advantage achieved then is to avoid memory access becoming a bottleneck. Once the penalty of loading the data into the CPU has been paid, all the necessary calculations are performed before unloading or overwriting the data in the CPU memory.

In the following example, the noise map requires 16 bits per pixel px,y to store. This information can be stored in a single two-byte datatype (such as a uint16).
The information indicating whether or not the pixel px,y corresponding to each noise map entry is a blob pixel can be stored in the form of one single bit out of the total number of stored bits for the pixel px,y in question in the noise map. In some embodiments, the most significant bit in the datatype used to store noise map data for each pixel px,y, such as the most significant bit in the exemplifying two-byte structure, indicates whether the pixel value ix,y,t in question is outside the blob generating limits. Then, the lower 15 bits can encode the expected (average) pixel value signal, scaled to 15 bits precision, and can be stored in fixed-point representation. It is noted that this expected pixel value signal corresponds to the above-discussed predicted pixel value μ̂x,y,t. In other words, the value in the noise map indicating an expected pixel value for the pixel px,y can be achieved by transforming (if necessary) the predicted pixel value μ̂x,y,t to a grayscale bit depth of 15 bits.

In one example, the encoding is performed according to the following, for performance reasons. First, the expected signal is scaled to 15 bits (0..32767). If n = 32 and the input pixel depth is 12 bits, this means that Sx,y,t uses 17 bits for each pixel position. A simple shift operation will divide this number by 4, which puts it in the 15 bit range. Secondly, if the pixel value ix,y,t is within the limits given in equation (6) (or as given in its reformulated form (8)), all bits are negated. A noise map consumer can therefore iterate through the pixels px,y of the noise map data and ignore all entries that have the most significant bit set to 1.

It should be noted that the pixmap for each pixel at least, or only, contains information on 1) whether that pixel is part of a blob and 2) the predicted pixel value for that pixel.
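The two-byte entry layout described above can be sketched as follows (a hedged illustration; the bit layout follows the example in the text, while the helper names are hypothetical):

```python
# Sketch of the 16-bit noise map entry: the lower 15 bits carry the expected
# value scaled to 0..32767; if the pixel stayed WITHIN the noise limits, all
# 16 bits are negated, which sets the most significant bit, so consumers can
# skip every entry whose MSB is 1.
MASK16 = 0xFFFF

def encode_entry(expected_15bit: int, is_blob: bool) -> int:
    entry = expected_15bit & 0x7FFF
    return entry if is_blob else (~entry & MASK16)

def is_blob_entry(entry: int) -> bool:
    return (entry & 0x8000) == 0       # MSB clear -> blob candidate

def expected_value(entry: int) -> int:
    raw = entry if is_blob_entry(entry) else (~entry & MASK16)
    return raw & 0x7FFF

e = encode_entry(20000, is_blob=False)
assert not is_blob_entry(e) and expected_value(e) == 20000
e = encode_entry(20000, is_blob=True)
assert is_blob_entry(e) and expected_value(e) == 20000
```

The expected value survives the negation unchanged, so downstream steps can always recover μ̂x,y,t from the entry regardless of the blob flag.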
In this case, the prediction is simply the arithmetic mean of the previous n frames, but we will, later on, describe a method, which can be an alternative method to the one described so far, to predict the value to be used when the recent frames have large changes in capture parameters such as shutter time or gain.

In some embodiments, the stored noise model incorporates all available information from image frames It received by the digital image analyzer 130 from the camera 110. In other words, it can use n consecutive or non-consecutive image frames It up until a most recently received image frame It to calculate values for Qx,y,t and Sx,y,t. On the other hand, the estimated projection (predicted pixel value μ̂x,y,t) data stored for each pixel px,y in the noise map can be updated only using a second-to-most recently received image frame It, i.e. not using a most recently received image frame It that contains the pixel values ix,y,t to be assessed with respect to blob classification. In practice, this may mean that the previous values for Sx,y,t, before being updated using the most recently received pixel values ix,y,t, can be used to calculate the (transformed) predicted data which is then stored in the pixmap.

In the above example, the predicted pixel value μ̂x,y,t is determined as (or at least based on) an estimated projected future mean pixel value μx,y,t, in turn determined based on historic pixel values ix,y,t for a sampled set of pixels px,y in said sequence of image frames It. In embodiments that will be described in more detail in the following, the predicted pixel value μ̂x,y,t is determined as μ̂x,y,t = α·μx,y,t + β, where α and β are constants determined so as to minimize the expression

Σ_{j,k} ( i_{j,k,t} − (α·μ_{j,k,t} + β) )²,    (16)

where μ_{j,k,t} is said estimated projected future mean pixel value for the pixel p_{j,k} in question, and where j and k are iterated over a test set of pixels px,y in the image frame It.
The determination of α and β can take place in any per se conventional manner, which is well within the reach of the skilled person. As is the case for the above-described noise model, in some embodiments, μx,y,t can be an estimated historic mean with respect to pixel values ix,y,t for the pixel px,y in question.
The above-described pure variance based noise model has proven to give good results in a wide range of environments. However, if the light conditions in the image change too quickly, the noise map will be flooded with outliers at first. In the image frames It that follow upon such changed light conditions, the standard deviation estimate will be inflated, which instead leads to some degree of blindness until the noise model stabilizes again.

The suitability of different variants of the presently described method can also vary depending on the camera 110 hardware used. For instance, exposure and gain can be more or less coarse for different types of cameras, and aperture changes can be performed more or less quickly.

It is then proposed to estimate a linear mapping between the average intensity value in the noise model and the pixel intensities in the new frame. That is, find values for the variables α and β that minimize (16). In (16), j and k may represent a sample or test set of pixels px,y, such as a set of pixels at geometrically evenly distributed pixel positions in the image frame It.
To be clear, when establishing the coefficients α and β, pixels px,y from different positions in the same image frame It are considered, and such pixels px,y are compared with their corresponding positions in the noise model data.

In some embodiments, said test set of pixels px,y can contain at least 0.1%, such as at least 1%, such as at least 10%, of the total set of pixels px,y in the image It. In some embodiments, said test set of pixels px,y can contain at most 80%, such as at most 50%, such as at most 25%, such as at most 10%, of the total set of pixels px,y in the image It. The test set of pixels px,y can be geometrically evenly distributed across the total set of pixels px,y in the image It. For instance, the set can form a uniform sparse pattern extending across the entire image It, or extending across at least 50% of the image It; or the set can form a sparsely but evenly distributed set of vertical and/or horizontal full or broken lines distributed across the entire image It, or at least 50% of the image It.

In some embodiments, pixels that are overexposed are not included in the test set. This can be determined by comparing the pixel values to a known threshold value, often provided by the sensor manufacturer. If it is not known, the threshold value can easily be established experimentally.
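Assuming an ordinary closed-form simple linear regression (the patent leaves the exact fitting method open), the determination of α and β over such a test set can be sketched as:

```python
# Illustrative least-squares fit of the linear mapping i ~ alpha*mu + beta
# over a sparse test set of pixels; mu holds the model means for the test-set
# pixel positions, i the corresponding values in the new frame.
def fit_alpha_beta(mu, i):
    m = len(mu)
    mean_mu = sum(mu) / m
    mean_i = sum(i) / m
    cov = sum((a - mean_mu) * (b - mean_i) for a, b in zip(mu, i))
    var = sum((a - mean_mu) ** 2 for a in mu)
    alpha = cov / var
    beta = mean_i - alpha * mean_mu
    return alpha, beta

# Hypothetical exposure step: new-frame values are roughly 1.07*mu - 5.
mu = [2000.0, 2100.0, 1900.0, 2050.0]
i = [1.07 * v - 5 for v in mu]
alpha, beta = fit_alpha_beta(mu, i)
assert abs(alpha - 1.07) < 1e-6 and abs(beta + 5) < 1e-3
```

Because the test set is sparse (a fraction of a percent of the image can suffice, per the above), this fit adds very little work per frame compared with the per-pixel model update.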
Next, the equation for checking the limits (that is, (6) above or an equivalent formulation of this equation) is updated according to the following:

(ix,y,t − μ̂x,y,t)² > Z²·σ²x,y,t    (17)

μ̂x,y,t = α·μx,y,t + β.    (18)

Also, since the variance σ²x,y,t changes over time, the variance estimate needs to be updated as well. It is unfortunately not feasible to use the value from (4), since it will be inflated by the exposure change that is already compensated for by using μ̂x,y,t as explained above. Instead, it is updated by weighing in the current squared deviation:

σ²x,y,t = σ²x,y,t−1·(1 − ξ) + ξ·(ix,y,t − α·μx,y,t − β)², where ξ ∈ ]0,1[.    (19)

ξ ∈ ]0,1[ is the factor that decides how much weight should be given to this deviation compared to the existing value. The higher ξ, the faster the noise model will adapt to fluctuations.
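A minimal sketch of the compensated limit check (17)-(18) together with the variance update (19) could look as follows (the function name and the example numbers are illustrative, not from the patent):

```python
# xi corresponds to the weighting factor in (19): it balances the newest
# squared deviation against the running variance estimate.
def check_and_update(i_new, mu, var_prev, alpha, beta, z, xi=0.1):
    mu_hat = alpha * mu + beta                             # (18)
    deviates = (i_new - mu_hat) ** 2 > z * z * var_prev    # (17)
    residual = i_new - alpha * mu - beta
    var_new = var_prev * (1 - xi) + xi * residual ** 2     # (19)
    return deviates, var_new

# Hypothetical pixel after an exposure step: model mean 2021, fitted mapping
# alpha = 1.065, beta = 0; the new level ~2152 is then no longer an outlier.
deviates, var_new = check_and_update(
    i_new=2152, mu=2021, var_prev=250.0, alpha=1.065, beta=0.0, z=3.873)
assert not deviates

# A genuinely moving object still deviates despite the compensation.
moving, _ = check_and_update(
    i_new=2400, mu=2021, var_prev=250.0, alpha=1.065, beta=0.0, z=3.873)
assert moving
```

The point of the compensation is visible in the example: without (18), the value 2152 would be roughly 130 counts above the stale mean and would flood the noise map as an outlier.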
These combined, together with observing that if we apply a scaling factor α to the input, the variance can be scaled appropriately:

(ix,y,t − α·μx,y,t − β)² > Z²·α²·σ²x,y,t.    (20)

This variant of the noise model requires σ²x,y,t to be stored in an array, typically with one single-precision float (32 bits) per pixel px,y. As a comparison, the pure variance noise model stores, indirectly (by storing S and Q, which allow for calculation of σ²x,y,t as described above), the estimated variance σ²x,y,t for each pixel px,y when run. When this linear mapping model is used, σ²x,y,t is updated using (19). Since the definition is recursive, the variance of this pixel in the previous frame will either be calculated from S and Q, or from the previous iteration's calculation of (19) for this pixel. The sums Sx,y,t and square-sums Qx,y,t still need to be updated for every image frame It, in order to have the numbers available as soon as the step effect has passed.

To illustrate the case when the predicted pixel value is determined as μ̂x,y,t = α·μx,y,t + β, an example will now be provided as illustrated in Figure 8. In the chart shown therein, the Y axis shows the pixel intensity ix,y,t for one particular pixel px,y in a sequence of consecutive digital image frames It. The X axis depicts frame numbers. The window size n = 32, which means that during the first 32 frames, the model is still being initialized. Once 32 frames have been processed, the model contains sufficient information to make predictions of the expected mean μx,y,t and variance σ²x,y,t. The line AVG shows the rolling average of the last 32 frames, which is the predictor determined according to (5).
As can be seen in the graph, the true signal value fluctuates around 2021 from the start until frame #60, where there is a sudden change in exposure time. The exposure times used can be provided as a part of the frames' metadata. If the exposure time in the new frame differs significantly from the exposure times of the recent frames, the method described in connection with (17)-(19) should be used, since the levels have shifted and the model will be contaminated while this is happening.
As can also be seen in the graph, it takes 32 frames for μx,y,t to fully stabilize on the new level. Until that point is reached, μx,y,t is not a particularly good predictor, since it is lagging behind. In order to compensate for this, the rolling average goes through a linear transformation according to (18). This outcome is shown as "Adj AVG" in the graph. It can clearly be seen that this corresponds much better to the pixel values.
Similarly, as can be seen in the following table (corresponding to the graph in Figure 8), the variance σ²x,y,t is somewhere around 250 before the exposure change, whereas it gets inflated all the way up to 4200 while the model is adapting. This is why the variance update method according to (19) is put into use. When processing frame 60, it first transforms the average value μx,y,t to μ̂x,y,t using the linear mapping. It calculates the new pixel value's ix,y,t deviation from μ̂x,y,t and decides whether it is outside the limits, according to (20). If this is the first frame where the exposure change was noticed, the variance σ²x,y,t−1 of the previous frame is used. This is initially based on S and Q (as determined according to the above), but is useful since S and Q still only include pixel values ix,y,t from before the exposure change.
Finally, the σ²x,y,t to be used for the next frame is calculated according to (19). In the table below, # = frame number; PV = pixel value; AVG = rolling average; AAVG = adjusted average (Adj AVG).
#  PV   AVG  AAVG |  #  PV   AVG  AAVG |  #   PV   AVG  AAVG
1  2001 –    –    | 42  2029 2025 –    |  83  2152 2117 2149
2  2006 –    –    | 43  2035 2025 –    |  84  2156 2121 –
3  2017 –    –    | 44  2010 2025 –    |  85  2119 2124 2148
4  2016 –    –    | 45  2030 2025 –    |  86  2160 2130 2150
5  2016 –    –    | 46  2007 2025 –    |  87  2131 2133 2149
6  2000 –    –    | 47  2027 2026 –    |  88  2175 2138 2150
7  2033 –    –    | 48  2030 2026 –    |  89  2149 2142 2150
8  2026 –    –    | 49  2060 2026 –    |  90  2158 2147 2151
9  1984 –    –    | 50  2055 2027 –    |  91  2176 2151 2151
10 2023 –    –    | 51  2022 2027 –    |  92  2157 2151 –
11 2043 –    –    | 52  2026 2027 –    |  93  2154 2151 –
12 2016 –    –    | 53  2004 2027 –    |  94  2152 2151 –
13 2012 –    –    | 54  1985 2026 –    |  95  2153 2152 –
14 2024 –    –    | 55  2023 2027 –    |  96  2141 2151 –
15 2006 –    –    | 56  2022 2027 –    |  97  2151 2151 –
16 2012 –    –    | 57  2006 2027 –    |  98  2144 2152 –
17 2048 –    –    | 58  2023 2026 –    |  99  2184 2153 –
18 2031 –    –    | 59  2030 2027 2027 | 100  2158 2153 –
19 2013 –    –    | 60  2160 2031 2155 | 101  2131 2153 –
20 2030 –    –    | 61  2155 2036 2156 | 102  2112 2152 –
21 2014 –    –    | 62  2142 2039 2156 | 103  2157 2151 –
22 2009 –    –    | 63  2137 2043 2155 | 104  2155 2152 –
23 2000 –    –    | 64  2168 2047 2156 | 105  2155 2152 –
24 2017 –    –    | 65  2151 2051 2155 | 106  2155 2152 –
25 2024 –    –    | 66  2114 2054 2154 | 107  2149 2152 –
26 2032 –    –    | 67  2151 2058 2154 | 108  2158 –    –
27 2023 –    –    | 68  2154 2061 2153 | 109  2113 2150 –
28 2029 –    –    | 69  2141 2064 2152 | 110  2119 2148 –
29 1994 –    –    | 70  2154 2068 2152 | 111  2120 2148 –
30 2027 –    –    | 71  2168 2072 2152 | 112  2132 2147 –
31 2033 –    –    | 72  2142 2075 2151 | 113  2154 2148 –
32 2018 –    –    | 73  2143 2078 2150 | 114  2162 2148 –
33 2039 2019 –    | 74  2160 2082 2150 | 115  2142 2148 –
34 2023 2020 –    | 75  2155 2086 2150 | 116  2142 2147 –
35 2017 2020 –    | 76  2136 2090 2150 | 117  2133 2148 –
36 2044 2021 –    | 77  2173 2095 2151 | 118  2166 2148 –
37 2043 2021 –    | 78  2183 2100 2152 | 119  2142 2148 –
38 2046 2023 –    | 79  2144 2104 2152 | 120  2146 2148 –
39 2039 2023 –    | 80  2152 2107 2152 | 121  2165 2148 –
40 2035 2023 –    | 81  2120 2109 2149 | 122  2163 2148 –
41 2038 2025 –    | 82  2159 2113 2149 | 123  2135 –    –

S and Q can continue to be updated as above, and can be used in order for the model to stabilize on the new level. Once the point is reached where α ≈ 1 and β ≈ 0, the average and variance are considered to be stable again, and the model can go back to the usual way of calculating the variance.
Once the information about blob-allocated pixel values ix,y,t has been updated (and the noise map has also been updated), in a subsequent step S6, blobs are generated based on the blob-allocated pixel values ix,y,t.
Blob generation is the process of iterating over the individual pixels px,y in a generated noise map, filtering out false positives and forming blobs from connected clusters of outliers. While it is important that the noise map generation is efficient, more computation per pixel px,y can be afforded in the blob generation, as long as it is known that the pixel value ix,y,t in question indeed overstepped the threshold in the noise map generation.

Whereas setting the limits based on the mean and sample standard deviation of the recent pixel values ix,y,t works well in most cases, one notable problematic issue arises when parts of the image It become overexposed. In this case, the signal value tends to be saturated on some value close to the upper limit of the range, and since the affected pixel values ix,y,t as a result stop fluctuating over time, the standard deviation also becomes zero, which in turn means that even the slightest change would lead to blobs being generated.
To address this issue, one can add an additional minimum required deviation, in a step S5, used in the blob generation step as an anti-saturation filter:

|ix,y,t − μx,y,t| > q·√μx,y,t,    (21)

where μx,y,t is the noise model's prediction for the pixel value ix,y,t. If the deviation is less than this, the pixel value ix,y,t is discarded as a non-blob pixel despite it overstepping the initial limits set up by the noise model.
Since square roots are computationally expensive, it is better to use:

(ix,y,t − μx,y,t)² > q²·μx,y,t.    (22)

q is > 0 but << 1, implying that q² is even smaller. Define:

B = 1 / q².    (23)

B is a positive number that controls the filtering limit. Since any number for B that gives the appropriate filtering effect can be selected, one can decide to pick an integer value. In some embodiments, B is at least 10, such as at least 50, such as at least 100. In some embodiments, B is at the most 10000. Then, the condition can be rewritten as:

B·(ix,y,t − μ̂x,y,t)² > μ̂x,y,t.    (24)

Since the noise map and the noise model were updated in the same step, the noise model that this currently considered noise map was based on is already lost when arriving at the blob generation step. The noise model data is overwritten in the computer memory in each iteration of the method. However, since μx,y,t was saved (with 15 bits precision in the above example) in the noise map itself, this value can be used instead when calculating (24). If the other terms are appropriately scaled (using fixed-point arithmetic), (24) can also be calculated using only integer math.
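An integer-only sketch of the anti-saturation check (24), with an assumed B = 100, might look like (names and numbers are illustrative):

```python
# Keep a flagged pixel only if B*(i - mu_hat)^2 > mu_hat, with B = 1/q^2
# chosen as an integer; otherwise discard it as a saturation artifact.
def passes_anti_saturation(i_xyt: int, mu_hat: int, B: int = 100) -> bool:
    d = i_xyt - mu_hat
    return B * d * d > mu_hat

# A saturated 12-bit pixel that barely moves is discarded, even though the
# noise model (whose variance has collapsed to zero) would have flagged it.
assert not passes_anti_saturation(4095, 4095)   # zero deviation
assert not passes_anti_saturation(4095, 4094)   # 100*1 = 100 < 4094
assert passes_anti_saturation(2500, 2021)       # genuine large deviation
```

With B = 100 (i.e. q = 0.1), the filter requires the deviation to exceed one tenth of the square root of the predicted level, which is negligible for true blob pixels but removes the single-count flicker of saturated regions.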
After a possible such anti-saturation filtering step, pixel values i_x,y,t overstepping the noise model limits (as described above across expressions (1)-(24)) are grouped together into multi-pixel blobs. This can be done using the per se well-known Hoshen-Kopelman algorithm, which is a raster-scan method to form such pixel groups that runs in linear time. During the first pass, it runs through all pixel values i_x,y,t. If a pixel value i_x,y,t oversteps a limit and it has a neighboring pixel value i_x±1,y±1,t that belongs to a blob, it will be added to that same blob. If it has multiple neighboring blob-classified pixel values i_x±1,y±1,t, these will be joined into one single blob, and the pixel value i_x,y,t is added to the group. Finally, if there are no neighboring blobs, the pixel value i_x,y,t will be registered as a new blob.

For each blob, the following metrics can be aggregated. This provides different options for estimating the center of the blob. One possibility is to use the absolute modulus of the noise model deviations:

| Short | Name | Type | Description |
|---|---|---|---|
| X_W | weightedXSum | uint32 | Σ_p p_x·abs(i_p − μ_p): X coordinates of the blob, weighted by the deviation from the noise model. |
| Y_W | weightedYSum | uint32 | Σ_p p_y·abs(i_p − μ_p): Y coordinates of the blob, weighted by the deviation from the noise model. |
| W | weightSum | uint32 | Σ_p abs(i_p − μ_p): Sum of all the absolute deviations from the noise model. |
| X_W² | sqWeightedXSum | uint32 | Σ_p p_x·(i_p − μ_p)²: X coordinates of the blob, weighted by the squared deviation from the noise model. |
| Y_W² | sqWeightedYSum | uint32 | Σ_p p_y·(i_p − μ_p)²: Y coordinates of the blob, weighted by the squared deviation from the noise model. |
| W² | sqWeightSum | uint32 | Σ_p (i_p − μ_p)²: Sum of all the squared deviations from the noise model. |
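The raster-scan grouping and the per-blob metric aggregation described above can be sketched with a small union-find structure, in the spirit of the Hoshen-Kopelman algorithm. This is an illustrative sketch, not the source's implementation: the 4-connectivity, the function name and the plain-Python data structures are assumptions; `dev[y][x]` stands for the absolute deviation abs(i_p − μ_p) of an outlier pixel.

```python
# Sketch of one-pass blob labeling (Hoshen-Kopelman style) followed by
# aggregation of the blob metrics from the table above. mask[y][x] is True
# for pixels overstepping the noise model limits.

def label_blobs(mask, dev):
    h, w = len(mask), len(mask[0])
    parent = {}                              # union-find forest over labels
    label = [[None] * w for _ in range(h)]

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]    # path halving
            a = parent[a]
        return a

    next_label = 0
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            neighbors = []                   # already-labeled left/up pixels
            if x > 0 and label[y][x - 1] is not None:
                neighbors.append(find(label[y][x - 1]))
            if y > 0 and label[y - 1][x] is not None:
                neighbors.append(find(label[y - 1][x]))
            if not neighbors:                # no neighboring blob: new blob
                parent[next_label] = next_label
                label[y][x] = next_label
                next_label += 1
            else:                            # join all touching blobs
                root = min(neighbors)
                for n in neighbors:
                    parent[n] = root
                label[y][x] = root

    # Second pass: aggregate the per-blob metrics from the table above.
    blobs = {}
    for y in range(h):
        for x in range(w):
            if label[y][x] is None:
                continue
            m = blobs.setdefault(find(label[y][x]), dict(
                weightedXSum=0, weightedYSum=0, weightSum=0,
                sqWeightedXSum=0, sqWeightedYSum=0, sqWeightSum=0, size=0))
            d = dev[y][x]
            m["weightedXSum"] += x * d
            m["weightedYSum"] += y * d
            m["weightSum"] += d
            m["sqWeightedXSum"] += x * d * d
            m["sqWeightedYSum"] += y * d * d
            m["sqWeightSum"] += d * d
            m["size"] += 1
    return list(blobs.values())
```

A U-shaped cluster exercises the join step: its two arms first receive separate labels, which are merged when the scan reaches the connecting row.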
Experimental data so far indicates that using the absolute-deviation-weighted estimates (based on X_W, Y_W, W) when the blob is small (blob size in pixels not larger than 16), using the squared-deviation-weighted estimates (based on X_W², Y_W², W²) for larger blobs (number of pixels in blob > 32), and interpolating between these for medium-sized blobs achieves a good stereo matching.
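The blob-size-dependent centroid choice can be sketched as follows. The linear blend between the two estimates for medium-sized blobs is an assumption for illustration; the source only states that the estimates are interpolated.

```python
# Sketch of centroid estimation from the aggregated blob metrics: the
# |deviation|-weighted centroid for small blobs, the squared-deviation-
# weighted centroid for large blobs, and an assumed linear blend between
# the two for medium-sized blobs.

def blob_centroid(m, small=16, large=32):
    """m: a per-blob metrics dict as in the table above, plus 'size'."""
    cx1 = m["weightedXSum"] / m["weightSum"]        # |deviation|-weighted
    cy1 = m["weightedYSum"] / m["weightSum"]
    cx2 = m["sqWeightedXSum"] / m["sqWeightSum"]    # squared-weighted
    cy2 = m["sqWeightedYSum"] / m["sqWeightSum"]
    if m["size"] <= small:
        return cx1, cy1
    if m["size"] > large:
        return cx2, cy2
    t = (m["size"] - small) / (large - small)       # blend for medium blobs
    return (1 - t) * cx1 + t * cx2, (1 - t) * cy1 + t * cy2
```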
Figure 7 illustrates an exemplifying clustering of four different detected blobs 1-4, based on individual pixel values i_x,y,t found to fulfill the criteria for being considered part of blobs at time t.

In a subsequent method step S7, performed by the target object tracker 140, detected blobs are correlated across said time-ordered series of digital images I_t to determine paths of moving objects through said space. Such correlation can, for instance, use linear interpolation and/or implied Newtonian laws of motion as a filtering mechanism, so as to purge blobs not moving in ways that are plausible given a reasonable model of the types of objects being tracked.

In case several cameras 110 are used, or in case one or several cameras 110 are used together with another type of target object 120 sensor, tracking information available from such available cameras 110 and any other sensors can be combined to determine one or several 3-dimensional target object 120 tracks through the space 111. This can, for instance, take place using stereoscopic techniques that are well-known in themselves.

In a subsequent step S8, one or several determined 2D and/or 3D target object 120 tracks can be output to an external system, and/or graphically displayed on a display of a track-monitoring device. For instance, such displayed information can be used by a golfer using the system 100 to gain knowledge of the properties of a newly hit golf stroke.

In concrete examples, the user (such as a golfer) may be presented with a visual 2D or 3D representation, on a computer display screen, of the track of a golf ball just hit, as detected using the method and system described above, against a graphical representation of a virtual golf practice range or similar. This will provide feedback to the golfer that can be used to make decisions regarding various parts of the golf swing.
The track may also be part of a virtual experience, in which a golfer may for instance play a virtual golf hole, and the detected and displayed track is represented as a golf shot in said virtual experience.

It is specifically noted that the amount of data necessary to process for achieving such tracks is substantial. For instance, at an updating frequency of 100 images per second and using a 10 Mpixel camera, 1 billion pixel values per second need to be processed and assessed with respect to blob status. This analysis may take place in a depicted space 111 that can include trees and other fine-granular objects displaying rapidly shifting light conditions, rapidly shifting general light conditions due to clouding, and so forth. Using the systems and techniques described herein, it is possible to process the data in essentially real-time, e.g. such that the track can be determined and output while the object is still in the air.

In a subsequent step S9, the method ends.
As mentioned above, the invention also relates to the system 100 as such, comprising the digital camera 110, the digital image analyzer 130 and the moving object tracker 140.

The digital camera 110 is then arranged to depict the space 111 to produce the series of digital images I_t as described above. The digital image analyzer 130 is configured to determine said inequality for the pixel values i_x,y,t as described above, and to store in the computer memory information indicating that one or several pixel values i_x,y,t are part of a detected blob. The moving object tracker 140 is configured to correlate detected blobs across said series of digital images I_t as described above.
As also mentioned, the invention also relates to the computer software product as such. The computer software product is then configured to, when executing on suitable hardware as described above, embody the digital image analyzer 130 and the moving object tracker 140. As such, it is configured to receive a series of digital images I_t from the digital camera 110, and to perform the above-described method steps performed by the digital image analyzer 130 and the moving object tracker 140. For instance, the digital frames I_t can be provided as a continuous or semi-continuous stream of frames from the digital camera (and a set of n most recent considered frames can be analysed for each frame or set of frames received), or the entire set of N images can be received as one big batch and analysed thereafter. The computer software product can execute on a computer belonging to the system 100, and can as such constitute part of the system 100.

Above, a number of embodiments have been described. However, it is apparent to the skilled person that many modifications can be made to the disclosed embodiments without departing from the basic idea of the invention.
For instance, many additional data processing, filtering, transformation, etc. steps can be taken, in addition to the ones being described herein.
The generated blob data can be used in various ways in addition to the object tracking.

In general, everything which is said in relation to the method is equally applicable to the system and to the computer software product, and vice versa.
Hence, the invention is not limited to the described embodiments, but can be varied within the scope of the enclosed claims.
Claims (25)
1. Method for tracking moving objects, comprising the steps of:

obtaining a series of digital images (I_t) at consecutive times (t), the digital images (I_t) representing optical input from a three-dimensional space (111) within a field of view of a digital camera (110), the digital camera (110) being arranged to produce said digital images (I_t) having a corresponding set of pixels (p_x,y), said digital images comprising corresponding pixel values (i_x,y,t), the digital camera (110) not moving in relation to said three-dimensional space (111) during production of said series of digital images (I_t);

for two or more of said pixel values (i_x,y,t), determining an inequality comparing a first value to a second value, the first value being calculated as, or based on, the square of the difference between the pixel value (i_x,y,t) in question and a predicted pixel value (î_x,y,t), the second value being calculated as, or based on, a product of, firstly, the square of a number Z and, secondly, an estimated variance or standard deviation (σ_x,y,t) with respect to historic pixel values (i_x,y,{t−n,t−1}) for the pixel (p_x,y) in question, where the predicted pixel value (î_x,y,t) is calculated based on historic pixel values (i_x,y,{t−n,t−1}) for the pixel (p_x,y) in question;

for pixel values (i_x,y,t) for which said first value is higher than said second value, storing in a computer memory information indicating that the pixel value (i_x,y,t) is part of a detected blob; and

correlating, based on the information stored in the computer memory, detected blobs across said series of digital images (I_t) to determine paths of moving objects through said three-dimensional space (111).
2. Method according to claim 1, wherein said inequality is (i_x,y,t − î_x,y,t)² > Z²·σ²_x,y,t, where î_x,y,t is said predicted pixel value and where σ_x,y,t is an estimated standard deviation with respect to historic pixel values (i_x,y,{t−n,t−1}) for the pixel (p_x,y) in question.

3. Method according to claim 1 or 2, wherein …

4. Method according to claim …, wherein S_x,y,t, Q_x,y,t, or both, are calculated recursively, whereby a calculated value for a pixel value (i_x,y,t) is calculated using a previously stored calculated value of S, Q, or both, for the same pixel (p_x,y) but at an immediately preceding time (t−1).

5. Method according to claim 4, wherein S_x,y,t is calculated as S_x,y,t = S_x,y,t−1 + i_x,y,t − i_x,y,t−n, and Q_x,y,t is calculated as Q_x,y,t = Q_x,y,t−1 + i²_x,y,t − i²_x,y,t−n = Q_x,y,t−1 + (i_x,y,t + i_x,y,t−n)(i_x,y,t − i_x,y,t−n).

6. Method according to any preceding claim, wherein the method comprises storing in said computer memory S_x,y,t and Q_x,y,t in combination as a single datatype comprising 12 bytes or less per pixel (p_x,y).

7. Method according to any preceding claim, wherein the method comprises storing in said computer memory, for a particular digital image I_t, a pixmap having, for each pixel (p_x,y), said information indicating that the pixel value (i_x,y,t) is part of a detected blob.

8. Method according to claim 7, wherein said information indicating that the pixel value (i_x,y,t) is part of a detected blob is indicated in a single bit for each pixel (p_x,y).

9. Method according to claim 7 or 8, wherein said pixmap also comprises, for each pixel (p_x,y), a value indicating an expected pixel value (i_x,y,t) for that pixel (p_x,y).

10. Method according to claim 9, wherein said value indicating an expected pixel value (i_x,y,t) for the pixel (p_x,y) in question is achieved by storing the predicted pixel value (î_x,y,t) as a fixed-point fractional number, using a total of 15 bits for the integer and fractional parts.
11. Method according to any preceding claim, wherein the predicted pixel value (î_x,y,t), the estimated variance or standard deviation (σ_x,y,t), or both, is or are calculated based on a set of n historic pixel values (i_x,y,{t−n,t−1}) for the pixel (p_x,y) in question, where 10 ≤ n ≤ …

12. Method according to any of the preceding claims, wherein a number n of previous images (I_t) considered for the estimation of an estimated variance or a standard deviation (σ_x,y,t) of the second value is selected to be a power of …

13. Method according to any preceding claim, wherein said pixel values (i_x,y,t) have a depth across one or several channels of between 8 and 48 bits.

14. Method according to any preceding claim, wherein the predicted pixel value (î_x,y,t) is determined based on an estimated projected future mean pixel value (μ_x,y,t), in turn determined based on a numerical relationship between historic pixel values (i_x,y,t) and current pixel values for a sampled set of pixels (p_x,y) in said images (I_t).

15. Method according to claim 14, wherein the predicted pixel value (î_x,y,t) is determined as î_x,y,t = α·μ_x,y,t + β, where α and β are constants determined so as to minimize Σ_{j,k} (i_j,k,t − (α·μ_j,k,t + β))², where μ_j,k,t is said estimated projected future mean pixel value for the pixel (p_j,k) in question, and where j and k are iterated over a test set of pixels.

16. Method according to claim 15, wherein μ_x,y,t is an estimated historic mean with respect to pixel values (i_x,y,t) for the pixel (p_x,y) in question.

17. Method according to claim 15 or 16, wherein said test set of pixels contains between 1% and 25% of the total set of pixels (p_x,y) in the image (I_t).

18. Method according to claim 17, wherein said test set of pixels is geometrically evenly distributed across the total set of pixels (p_x,y) in the image (I_t).
19. Method according to any one of claims 15-18, wherein the method comprises determining the estimated standard deviation (σ_x,y,t) according to σ²_x,y,t = …

20. Method according to any one of claims 15-19, wherein the method comprises determining that at least one is true of α being further away from 1 than a first threshold value and β being further away from 0 than a second threshold value; and determining the predicted pixel value (î_x,y,t) according to any one of claims 14-18 until it is determined that α is no longer further away from 1 than the first threshold value and β is no longer further away from 0 than the second threshold value.

21. Method according to any preceding claim, wherein the method comprises, for said pixel values (i_x,y,t) for which said first value is higher than said second value, only storing said information indicating that the pixel value (i_x,y,t) is part of a detected blob in case also the following inequality holds: B·[i_x,y,t − î_x,y,t]² > î_x,y,t, where i_x,y,t is the pixel value in question, where î_x,y,t is the predicted pixel value and where B is an integer such that B > …

22. Method according to any preceding claim, wherein the method comprises using a Hoshen-Kopelman algorithm to group together individual adjacent pixels determined to be part of a same blob.

23. Method according to any preceding claim, wherein the objects are golf balls.
24. System for tracking moving objects, the system comprising a digital camera (110), a digital image analyzer and a moving object tracker,

the digital camera (110) being arranged to represent optical input from a three-dimensional space (111) within a field of view of the digital camera (110) to produce a series of digital images (I_t) at consecutive times (t), the digital camera (110) being arranged to produce said digital images (I_t) having a corresponding set of pixels (p_x,y), said digital images comprising corresponding pixel values (i_x,y,t), the digital camera (110) being arranged to not move in relation to said three-dimensional space (111) during production of said series of digital images (I_t),

the digital image analyzer being configured to, for two or more of said pixel values (i_x,y,t), determine an inequality comparing a first value to a second value, the first value being calculated as, or based on, the square of the difference between the pixel value (i_x,y,t) in question and a predicted pixel value (î_x,y,t), the second value being calculated as, or based on, a product of, firstly, the square of a number Z and, secondly, an estimated variance or standard deviation (σ_x,y,t) with respect to historic pixel values (i_x,y,{t−n,t−1}) for the pixel (p_x,y) in question, where the predicted pixel value (î_x,y,t) is calculated based on historic pixel values for the pixel (p_x,y) in question,

the digital image analyzer being configured to, for pixel values (i_x,y,t) for which said first value is higher than said second value, store in a computer memory information indicating that the pixel value (i_x,y,t) is part of a detected blob; and

the moving object tracker being configured to correlate, based on the information stored in the computer memory, detected blobs across said series of digital images (I_t) to determine paths of moving objects through said three-dimensional space (111).
25. Computer software product configured to, when executing:

receive a series of digital images (I_t) from a digital camera (110), the digital camera (110) being arranged to represent optical input from a three-dimensional space (111) to produce said digital images (I_t) at consecutive times (t), the digital camera (110) being arranged to produce said digital images (I_t) having a corresponding set of pixels (p_x,y), said digital images comprising corresponding pixel values (i_x,y,t), the digital camera (110) being arranged to not move in relation to said three-dimensional space (111) during production of said series of digital images (I_t);

for two or more of said pixel values (i_x,y,t), determine an inequality comparing a first value to a second value, the first value being calculated as, or based on, the square of the difference between the pixel value (i_x,y,t) in question and a predicted pixel value (î_x,y,t), the second value being calculated as, or based on, a product of, firstly, the square of a number Z and, secondly, an estimated variance or standard deviation (σ_x,y,t) with respect to historic pixel values (i_x,y,{t−n,t−1}) for the pixel (p_x,y) in question, where the predicted pixel value (î_x,y,t) is calculated based on historic pixel values (i_x,y,{t−n,t−1}) for the pixel (p_x,y) in question;

for pixel values (i_x,y,t) for which said first value is higher than said second value, store in a computer memory information indicating that the pixel value (i_x,y,t) is part of a detected blob; and

correlate, based on the information stored in the computer memory, detected blobs across said series of digital images (I_t) to determine paths of moving objects through said three-dimensional space (111).
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| SE2230331A SE546129C2 (en) | 2022-10-17 | 2022-10-17 | Method and system for optically tracking moving objects |
| EP23787065.4A EP4605885A1 (en) | 2022-10-17 | 2023-10-06 | Method and system for optically tracking moving objects |
| JP2025522056A JP2025535304A (en) | 2022-10-17 | 2023-10-06 | Method and system for optically tracking a moving object |
| PCT/EP2023/077799 WO2024083537A1 (en) | 2022-10-17 | 2023-10-06 | Method and system for optically tracking moving objects |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| SE2230331A SE546129C2 (en) | 2022-10-17 | 2022-10-17 | Method and system for optically tracking moving objects |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| SE2230331A1 SE2230331A1 (en) | 2024-04-18 |
| SE546129C2 true SE546129C2 (en) | 2024-06-04 |
Family
ID=88372203
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| SE2230331A SE546129C2 (en) | 2022-10-17 | 2022-10-17 | Method and system for optically tracking moving objects |
Country Status (4)
| Country | Link |
|---|---|
| EP (1) | EP4605885A1 (en) |
| JP (1) | JP2025535304A (en) |
| SE (1) | SE546129C2 (en) |
| WO (1) | WO2024083537A1 (en) |
Citations (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20060140497A1 (en) * | 2003-02-28 | 2006-06-29 | Sony Corporation | Image processing device and method, recording medium, and program |
| US20090161981A1 (en) * | 2007-12-20 | 2009-06-25 | United States Of America As Represented By The Secretary Of The Navy | Method for Enhancing Ground-Based Detection of a Moving Object |
| US20160379074A1 (en) * | 2015-06-25 | 2016-12-29 | Appropolis Inc. | System and a method for tracking mobile objects using cameras and tag devices |
| WO2016205951A1 (en) * | 2015-06-25 | 2016-12-29 | Appropolis Inc. | A system and a method for tracking mobile objects using cameras and tag devices |
| US20170068858A1 (en) * | 2015-06-01 | 2017-03-09 | Placemeter Inc. | Robust, adaptive and efficient object detection, classification and tracking |
| WO2018063914A1 (en) * | 2016-09-29 | 2018-04-05 | Animantis, Llc | Methods and apparatus for assessing immune system activity and therapeutic efficacy |
| US20180144476A1 (en) * | 2016-11-23 | 2018-05-24 | Qualcomm Incorporated | Cascaded-time-scale background modeling |
| US20180374217A1 (en) * | 2017-06-21 | 2018-12-27 | Gamelore Inc. | Ball detection and tracking device, system and method |
| US20190130580A1 (en) * | 2017-10-26 | 2019-05-02 | Qualcomm Incorporated | Methods and systems for applying complex object detection in a video analytics system |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10395385B2 (en) * | 2017-06-27 | 2019-08-27 | Qualcomm Incorporated | Using object re-identification in video surveillance |
| KR102625119B1 (en) | 2020-08-14 | 2024-01-12 | 탑골프 스웨덴 에이비 | Motion-based preprocessing of 2D image data before 3D object tracking using virtual time synchronization |
-
2022
- 2022-10-17 SE SE2230331A patent/SE546129C2/en unknown
-
2023
- 2023-10-06 JP JP2025522056A patent/JP2025535304A/en active Pending
- 2023-10-06 EP EP23787065.4A patent/EP4605885A1/en active Pending
- 2023-10-06 WO PCT/EP2023/077799 patent/WO2024083537A1/en not_active Ceased
Also Published As
| Publication number | Publication date |
|---|---|
| WO2024083537A1 (en) | 2024-04-25 |
| SE2230331A1 (en) | 2024-04-18 |
| EP4605885A1 (en) | 2025-08-27 |
| JP2025535304A (en) | 2025-10-24 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11227397B2 (en) | Block-matching optical flow and stereo vision for dynamic vision sensors | |
| US6504569B1 (en) | 2-D extended image generation from 3-D data extracted from a video sequence | |
| CN112508803B (en) | Denoising method and device for three-dimensional point cloud data and storage medium | |
| JP7475959B2 (en) | Image processing device, image processing method, and program | |
| CN111105017A (en) | Neural network quantization method and device and electronic equipment | |
| CN113139442A (en) | Image tracking method and device, storage medium and electronic equipment | |
| US20230095568A1 (en) | Object tracking device, object tracking method, and program | |
| CN109211277A (en) | The state of vision inertia odometer determines method, apparatus and electronic equipment | |
| CN106384359B (en) | Motion target tracking method and TV | |
| CN114169425A (en) | Training target tracking model and target tracking method and device | |
| Dagar et al. | Edge detection technique using binary particle swarm optimization | |
| Vijayan et al. | A fully residual convolutional neural network for background subtraction | |
| CN102622764A (en) | Target tracking method on basis of movable camera platform | |
| CN104978738A (en) | Method of detection of points of interest in digital image | |
| CN112036381A (en) | Visual tracking method, video monitoring method and terminal equipment | |
| CN113450385A (en) | Night work engineering machine vision tracking method and device and storage medium | |
| CN117274349A (en) | Transparent object reconstruction method and system based on RGB-D camera consistent depth prediction | |
| CN118864509A (en) | An image tracking method based on filtering algorithm | |
| SE546129C2 (en) | Method and system for optically tracking moving objects | |
| CN115205676A (en) | Orange and Chinese olive real-time identification method based on YOLO | |
| WO2022198210A1 (en) | Efficient pose estimation through iterative refinement | |
| CN119863396A (en) | Depth constraint-based three-dimensional Gaussian camera motion blur removal method | |
| CN113112479A (en) | Progressive target detection method and device based on key block extraction | |
| Tung et al. | Why accuracy is not enough: the need for consistency in object detection | |
| CN114998814B (en) | Target video generation method and device, computer equipment and storage medium |