
MXPA99007331A - A method and apparatus for segmenting images prior to coding - Google Patents

A method and apparatus for segmenting images prior to coding

Info

Publication number
MXPA99007331A
MXPA99007331A MXPA/A/1999/007331A MX9907331A
Authority
MX
Mexico
Prior art keywords
segmentation
neural network
focus
map
intensity
Prior art date
Application number
MXPA/A/1999/007331A
Other languages
Spanish (es)
Inventor
Chen Tsuhan
Turner Swain Cassandra
Original Assignee
At&T Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by At&T Corp
Publication of MXPA99007331A


Abstract

To segment moving foreground from background, where the moving foreground is of most interest to the viewer, this method uses three detection algorithms as the input to a neural network. The multiple cues used are focus, intensity, and motion. The neural network consists of a two-layered neural network. Focus and motion measurements are taken from high-frequency data (edges), whereas intensity measurements are taken from low-frequency data (object interiors). Combined, these measurements are used to segment a complete object. Results indicate that moving foreground can be segmented from stationary foreground and from moving or stationary background. The neural network segments the entire object, both interior and exterior, in this integrated approach. Results also demonstrate that combining cues allows flexibility in both type and complexity of scenes. Integration of cues improves accuracy in segmenting complex scenes containing both moving foreground and background. Good segmentation yields bit rate savings when coding the object of interest, also called the video object in MPEG4. This method combines simple measurements to increase segmentation robustness.

Description

METHOD AND APPARATUS FOR SEGMENTING IMAGES PRIOR TO CODING BACKGROUND OF THE INVENTION This application is related to Application No. 08/429,485, filed by the same inventors on April 25, 1995 (now patent number 5,710,829), which is hereby incorporated by reference as if repeated herein in its entirety.
The present invention relates generally to video coding, and more particularly to video coding in which the image is decomposed into objects before coding. Each of the individual objects is then coded separately.
For many image transmission and storage applications, significant data compression can be achieved if the trajectories of the moving objects in the images are estimated successfully. Traditionally, block-oriented motion estimation has been widely researched because of its simplicity and effectiveness. However, the block and object boundaries in a scene usually do not coincide, because the blocks are not adapted to the contents of the image. This leads to visible distortions in low bit rate encoders, known as blurring and mosquito effects.
Techniques for object-oriented coding were developed to overcome the disadvantages of block-oriented coding. In one type of object-oriented coding, the sequence of images is segmented into moving objects. Large regions with homogeneous motion can be extracted, resulting in greater compression and reduced visible distortions at motion boundaries. Since objects in the foreground carry more new information relative to the slowly changing background, the background can often be transmitted less frequently than the foreground. Therefore, foreground objects must be correctly identified to achieve the desired compression levels without adding undue distortion.
As a result, segmentation is an important intermediate step in object-oriented image processing. For this reason, many approaches to segmentation have been tried, such as motion-based, focus-based, intensity-based, and disparity-based segmentation. The problem with each of these approaches is its specificity to particular scene characteristics, which limits the scenes to which it can be applied successfully. For example, the scene must contain movement for motion-based segmentation to be applicable. The scene must contain significant contrast for intensity-based segmentation to work. Similar characteristics are required by the other approaches. In addition, the motion-based approach fails for scenes containing both foreground and background motion, such as when the moving foreground casts shadows on the background. The focus-based approach fails when the foreground is blurred. The intensity-based approach fails for structured objects, because a single object is wrongly segmented into multiple objects. And the disparity measurement in the disparity-based approach is complex and prone to error.
One technique is to use a priori knowledge about the images to select the coding method, which overcomes this problem. However, this makes image coding inconvenient, since the processing must include a determination of the type of image and then a selection of the most appropriate type of coding for that image. This significantly increases the cost of preprocessing the images before coding. Alternatively, a lower quality coding must be employed. Unfortunately, neither of these alternatives is acceptable, since bandwidth remains limited for image transmission and consumers expect higher quality images with improved technology.
The problem then becomes how to exploit the strengths of these methods and how to attenuate their defects in segmenting the foreground from the background. Different possibilities have been examined. One approach combines motion and brightness information in a single segmentation procedure, which determines the boundaries of moving objects. However, this approach will not work well, because a moving background will be segmented together with the moving foreground and therefore classified and encoded as foreground.
Another approach uses defocus and motion detection to segment the foreground portion of the image from the background portion of the image. This process is shown in Figures 7-9. Figure 7 shows the process, Figure 8 shows the results of the segmentation on different frames, and Figure 9 shows the results of the defocus measurement. However, this approach requires a filling step in the process. Filling is a non-trivial problem, especially where the foreground output of this process results in objects without closed boundaries. In this case, significant complexity is added to the overall process. Given the complexity inherent in video encoding, the elimination of any complex step is significant in and of itself.
The present invention is therefore directed to the problem of developing a method and apparatus for segmenting the foreground and background in a sequence of images before image coding, which method and apparatus do not require a priori knowledge regarding the image being segmented and are still relatively simple to put into practice.
BRIEF DESCRIPTION OF THE INVENTION The present invention solves this problem by integrating multiple segmentation techniques, using a neural network to apply the appropriate weights to the segmentation map determined by each of the separate techniques. In this case, the neural network has been trained using images that were segmented by hand. Once trained, the neural network assigns the appropriate weights to the segmentation maps determined by the different techniques.
One embodiment of the method according to the present invention calculates the motion, focus, and intensity segmentation maps of the image and passes each of these maps to a neural network, which calculates the final segmentation map, which is then used to trace the segmented foreground in the original image. In this embodiment, two consecutive images are obtained for use in deriving the different segmentation maps input to the neural network.
The motion detection step includes detecting a difference between the pixels in successive frames and determining that a pixel is in motion if the difference for that pixel exceeds a predetermined threshold. The focus detection step includes calculating the magnitude of the Sobel edge detection over an n x n pixel neighborhood and dividing the magnitude of the Sobel edge detection by the width of the edge. The intensity detection step comprises determining a gray level of the pixel.
Another embodiment of the method of the present invention for processing a sequence of images to segment the foreground from the background includes obtaining successive images in the sequence; simultaneously measuring the motion, focus, and intensity of the pixels within the successive images; inputting the motion, focus, and intensity measurements to a neural network; calculating the foreground and background segments using the motion, focus, and intensity measurements with the neural network; and tracing a segment map based on the calculated foreground and background segments.
In an advantageous implementation of the above methods according to the present invention, it is possible to accelerate the training of the neural network using an adaptive learning rate. One possible form of the adaptive learning rule is the following pair of equations: Δw = lr · d · pᵀ and Δb = lr · d, where w is the weights of the layer, b is the bias (polarization) of the layer, lr is the adaptive learning rate, d is the delta vector of the layer, p is the input vector of the layer, and T indicates that the vector p is transposed before being multiplied.
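For illustration, this update rule can be written directly in NumPy. This is a minimal sketch under stated assumptions: the matrix shapes are a chosen convention (the text fixes none), and how lr itself is adapted between epochs is not specified by the text, so it is left to the caller.

```python
# Minimal sketch of the update rule dw = lr * d * p^T, db = lr * d.
# The shapes documented below are assumptions, not from the patent.
import numpy as np

def update_layer(w, b, d, p, lr):
    """Update one layer's weights w and bias b.

    w  -- weight matrix, shape (n_neurons, n_inputs)
    b  -- bias vector, shape (n_neurons, 1)
    d  -- delta vector of the layer, shape (n_neurons, 1)
    p  -- input vector of the layer, shape (n_inputs, 1)
    lr -- the current (adaptive) learning rate
    """
    w += lr * (d @ p.T)  # outer product: one correction per weight
    b += lr * d
    return w, b
```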
An apparatus for segmenting the foreground and background from a sequence of images according to the present invention includes a motion detector, a focus detector, an intensity detector, and a neural network. The motion detector detects the movement of the pixels within the image sequence and produces a motion segmentation map. The focus detector detects pixels that are in focus and produces a focus segmentation map. The intensity detector detects those pixels that have high intensity and those with low intensity and produces an intensity segmentation map. The neural network is coupled to the motion detector, the focus detector, and the intensity detector; it weights the outputs of these detectors and produces a final segmentation map.
An advantageous implementation of the neural network used in the present invention includes a two-layer neural network. In this case, the neural network has a hidden layer with two neurons and an output layer with one neuron. In this implementation, the intensity map is input to a first neuron in the hidden layer using a first weight and to a second neuron in the hidden layer using a second weight; the focus map is input to the first neuron in the hidden layer using a third weight and to the second neuron in the hidden layer using a fourth weight; and the motion map is input to the first neuron in the hidden layer using a fifth weight and to the second neuron in the hidden layer using a sixth weight. Bias information is input to the first and second neurons using a seventh weight and an eighth weight, respectively.
Still another advantageous embodiment for implementing the method of the present invention includes means for digitizing the sequence of images to obtain a sequence of digitized images; means for segmenting an image based on the movement of an object within the image, the motion segmentation means being coupled to the digitizing means and producing a motion segmentation map; means for segmenting an image using focus measurements, the focus segmentation means being coupled to the digitizing means and producing a focus segmentation map; means for segmenting an image using brightness measurements, the brightness segmentation means being coupled to the digitizing means and producing a brightness segmentation map; and a neural network that calculates a segmentation map using the segmentation maps output by the motion segmentation means, the brightness segmentation means, and the focus segmentation means.
BRIEF DESCRIPTION OF THE DRAWINGS Figure 1 depicts a two-layer neural network used in the apparatus of the present invention, with exemplary weights for the different paths in the network.
Figure 2 depicts the neural network training algorithm for foreground/background segmentation.
Figure 3 depicts the foreground/background segmentation algorithm of the present invention.
Figure 4 depicts the training diagram of the neural network, showing the sum-squared error versus epochs.
Figures 5(a)-(c) depict the segmentation results of the present invention, in which 5(a) is the original frame, 5(b) is the segmented output of the neural network, and 5(c) is the tracing of the segmented foreground.
Figure 6 depicts a possible embodiment of the apparatus for employing the method of the present invention.
Figure 7 shows a prior art process using a segmentation procedure that requires filling.
Figure 8 shows the segmentation results of the process of Figure 7 on several frames.
Figure 9 shows the results of the defocus measure used in the process of Figure 7.
DETAILED DESCRIPTION The present invention provides an approach for foreground/background segmentation based on integrated cues. This approach integrates three measures (focus, intensity, and motion) using a two-layer neural network to segment complex scenes. Its advantage is that it combines simple segmentation measures, which increases robustness in segmenting a variety of scenes.
After forming three separate segmentation maps from the image sequence, the present invention then derives the best map based on the training of a neural network. The neural network used in the present invention is shown in Figure 1, along with the optimal weights determined from training this network on a variety of images.
Each map is an N x M image consisting of N x M pixels. The corresponding pixels I(i, j), m(i, j), and f(i, j), where i = 1, ..., N and j = 1, ..., M, are input to the neural network one at a time, in left-to-right, top-to-bottom order. The pixel values for the motion map are either 0 or 255 (where 0 indicates no movement and 255 indicates movement). The pixel values for the focus map and intensity map range from 0 to 255, inclusive.
Once each pixel is input to the neural network, the network calculates an output value o(i, j) for the inputs at (i, j). The final output is an N x M image, where 0 = background and 255 = foreground.
In this way, the processing of an image can be thought of as a loop that executes N x M times; that is, the neural network is accessed N x M times. Similarly, for a sequence of images, if one image requires N x M loop iterations, then for K images the neural network is accessed K x N x M times.
According to the present invention, a two-layer neural network integrates three measures for segmentation: focus, intensity, and motion. It is worth noting that any technique for detecting focus, intensity, or motion, respectively, will suffice, as long as it provides a segmentation map based on the same information. The training of the neural network will then determine the appropriate weights to apply to the different inputs produced by the different segmentation techniques.
Two assumptions are made about the scene. First, it is assumed that the foreground of the scene is in focus and the background is blurred; that is, the closest objects are in focus. Second, it is assumed that the objects to be segmented are in motion.
SEGMENTATION MEASURES Focus Detection The technique for detecting focus used in the present invention is a known technique; therefore a complete detailed description is not necessary to describe the present invention. A brief description, however, will be useful.
Focus is a function of depth. The farther an edge is from the point of focus, the blurrier it becomes; this measurement therefore indicates different depths. If an object point is not in focus, the resulting image is a blurred spot called the circle of confusion. The size of the circle of confusion, and therefore the degree of focus, is a function of the depth u of the point.
The focus of the image is easily measured from the high-frequency components, such as the edges of the image. The less blurred an edge is, the more focused the image; this is measured from the strength of the edge. The focus measure d over an n x n neighborhood in an image is d = Σ |s(x, y)| / w, where |s(x, y)| is the magnitude of the Sobel edge detection in the image g(x, y), w is the width of the edge in g(x, y), and the sum is taken over the neighborhood. Then, within the n x n neighborhood, f(x + i, y + j) = d, where f(x, y) is the focus-measure image, i = 0, ..., n, and j = 0, ..., n.
The output of this detector is a map that shows which pixels in the current image are in focus and which are blurred; that is, which pixels are part of the foreground and which are part of the background. This map is then input to the neural network, as discussed below.
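As an illustration only, a sketch of such a focus detector follows. The Sobel magnitude computation is standard, but the text does not specify how the edge width w is estimated, so the per-neighborhood count of strong-gradient columns used below is an assumption of this sketch, not necessarily the inventors' method.

```python
# Minimal sketch of the focus measure d = sum of |s(x, y)| over an
# n x n neighborhood, divided by the edge width w. The edge-width
# estimate here (columns whose peak gradient exceeds a threshold)
# is a simple stand-in and an assumption of this sketch.
import numpy as np
from scipy import ndimage

def focus_map(g, n=8, edge_thresh=30.0):
    """Return the focus-measure image f(x, y) for a grayscale frame g."""
    gf = g.astype(float)
    sx = ndimage.sobel(gf, axis=1)   # horizontal Sobel response
    sy = ndimage.sobel(gf, axis=0)   # vertical Sobel response
    mag = np.hypot(sx, sy)           # |s(x, y)|

    f = np.zeros_like(mag)
    for y in range(0, gf.shape[0] - n + 1, n):
        for x in range(0, gf.shape[1] - n + 1, n):
            block = mag[y:y + n, x:x + n]
            # crude edge-width proxy (assumption, see lead-in)
            w = max(int((block.max(axis=0) > edge_thresh).sum()), 1)
            f[y:y + n, x:x + n] = block.sum() / w   # d for this block
    return f
```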
Motion Detection As with focus detection, the technique for motion detection used in the present invention is a known technique, so a detailed description of this technique is not necessary to describe the present invention. A brief description, however, will be useful.
Motion is detected using a subtraction method, md(x, y) = g_{i+1}(x, y) - g_i(x, y), where md(x, y) is the detected motion image and g_i and g_{i+1} are the i-th and (i+1)-th frames in the sequence. Motion between successive frames is indicated by pixel differences greater than a threshold T. If the pixel difference is greater than the threshold, the pixel in the current image is set to a gray level of 255; otherwise it is set to a gray level of 0. In this case, a gray level of 255 represents black and a gray level of 0 represents white. This threshold is determined experimentally in a known way. If the object has not moved, the result is a blank image. That is, m(x, y) = 255 if md(x, y) > T, and m(x, y) = 0 otherwise, where m(x, y) is the motion-segmented image.
The output of this motion detector is a motion map indicating which pixels are in motion and which are not, representing the pixels that are part of the foreground and the pixels that are part of the background, respectively.
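A minimal sketch of this subtraction-based detector follows. The threshold value is purely illustrative, since the text says T is determined experimentally, and taking the absolute difference (so that the subtraction order does not matter) is an assumption of the sketch.

```python
# Minimal sketch of the motion detector:
# md(x, y) = g_{i+1}(x, y) - g_i(x, y), thresholded at T.
import numpy as np

def motion_map(g_prev, g_next, T=15):
    """Return m(x, y): 255 where the frame difference exceeds T, else 0."""
    md = g_next.astype(int) - g_prev.astype(int)           # md(x, y)
    return np.where(np.abs(md) > T, 255, 0).astype(np.uint8)
```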
Intensity Detection As with the detection of focus and motion, the technique for intensity detection used in the present invention is a known technique, so a detailed description of this technique is not necessary to describe the present invention. A brief description, however, will be useful.
The intensity I(x, y) is simply a gray level from 0 to 255. The importance of the intensity data is that it helps the neural network with the segmentation of object interiors. Focus and motion are measured from the edges of the object; therefore, a third measurement is necessary for object interiors. In this application, this measure is intensity, by which large regions are introduced into the neural network.
The output of this detector is an intensity map, which indicates which pixels belong to the foreground and which to the background.
Neural Network A two-layer backpropagation network is trained to segment a sequence.
Figure 1 shows the architecture of the network. The neural network 10 includes a hidden layer 11 and an output layer 12. The hidden layer 11 contains two neurons 13, 14, and the output layer contains one neuron 15. Neurons 13-15 use sigmoid functions with weighted inputs; essentially, they are summing amplifiers with weighted inputs. The inputs to the network are the motion, focus, and intensity measures, or segmentation maps. The output is the segmented foreground image o(x, y), where o(x, y) = 255 if the pixel is foreground and o(x, y) = 0 otherwise. The network is trained using the two initial frames of a sequence and their hand-segmented result. It is possible to accelerate the training with an adaptive learning rate, according to the rule Δw = lr · d · pᵀ and Δb = lr · d, where w is the weights of the layer, b is its bias, lr is the adaptive learning rate, d is the delta vector of the layer, p is its input vector, and T indicates that the vector p is transposed before being multiplied.
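For illustration, the forward pass of this network can be sketched as follows. The weight values are placeholders, not the trained values given in Table 1 below (which are not reproduced in this text), and scaling the output sigmoid by 255 to match the 0/255 map convention is an assumption of the sketch.

```python
# Minimal sketch of the Figure 1 network: two sigmoid hidden neurons
# and one sigmoid output neuron, each with a weighted bias input.
import math

V = [[0.1, 0.2],     # v11, v12: intensity -> hidden neurons 1, 2
     [0.3, 0.4],     # v21, v22: focus     -> hidden neurons 1, 2
     [0.5, 0.6],     # v31, v32: motion    -> hidden neurons 1, 2
     [0.7, 0.8]]     # v41, v42: bias      -> hidden neurons 1, 2
W = [0.9, 1.0, 0.1]  # w1, w2: hidden -> output; w3: bias -> output

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def trained_net(i, f, m, b=1.0):
    """Output o for one pixel's intensity, focus, motion, and bias."""
    h1 = sigmoid(V[0][0] * i + V[1][0] * f + V[2][0] * m + V[3][0] * b)
    h2 = sigmoid(V[0][1] * i + V[1][1] * f + V[2][1] * m + V[3][1] * b)
    return 255.0 * sigmoid(W[0] * h1 + W[1] * h2 + W[2] * b)
```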
METHODOLOGY The present invention provides an integrated segmentation approach for image coding. The foreground and background features are segmented, and the background features are discarded. The network is first trained using the first two frames of a sequence to obtain focus, motion, intensity, and hand-segmented data. See Figure 2, which shows the four-step training algorithm 20 for training the neural network.
In the first step of the process, the first images in the sequence are obtained 21. The images are then segmented by hand 22. Next, the motion, focus, and intensity are calculated 23. Finally, the neural network is trained 24 using the acceleration process discussed earlier.
Figure 3 shows the four-step segmentation algorithm 30. First, two successive images are obtained 31. Then, the focus, motion, and intensity are measured 32. The measurements are input to the trained neural network 33, and the network produces the segmented foreground. The segmented foreground is then traced onto the original image 34, which demonstrates the capability of the process of the present invention, as sketched below.
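A minimal sketch of this four-step loop follows, reusing the hypothetical focus_map, motion_map, and trained_net functions sketched above; none of these names come from the patent itself.

```python
# Minimal sketch of the Figure 3 segmentation loop. All functions
# reused here are the illustrative sketches defined above, not the
# inventors' actual implementation.
import numpy as np

def segment_frame(g_prev, g_next, bias=1.0):
    # Step 2: measure focus, motion, and intensity on the new frame.
    f = focus_map(g_next)             # focus segmentation map
    m = motion_map(g_prev, g_next)    # motion segmentation map
    i = g_next.astype(float)          # intensity map = gray levels

    # Step 3: feed the maps to the trained network, pixel by pixel,
    # left to right and top to bottom.
    out = np.zeros_like(i)
    for y in range(i.shape[0]):
        for x in range(i.shape[1]):
            out[y, x] = trained_net(i[y, x], f[y, x], m[y, x], bias)

    # Step 4: the caller traces `out` (255 = foreground) onto the
    # original frame to outline the segmented foreground.
    return out
```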
As can be seen in Figure 5(c), the output of the process properly segments the man from the background. The segmentation map is used as a mask for separating the foreground from the background in the image coding process.
The neural network applies the weights in Table 1 below to calculate the following equation: o(x, y) = w1[v11·I(x, y) + v21·f(x, y) + v31·m(x, y) + v41·b(x, y)] + w2[v12·I(x, y) + v22·f(x, y) + v32·m(x, y) + v42·b(x, y)] + w3·b(x, y), where o(x, y) is the segmentation map, I(x, y) is the intensity segmentation map, m(x, y) is the motion segmentation map, f(x, y) is the focus segmentation map, b(x, y) is the bias information, and v11, v21, v31, v41, v12, v22, v32, v42, w1, w2, and w3 are the weights indicated in Table 1. These weights were determined for certain particular images; the exact weights will vary depending on the images used.
These weights are merely indicative of those determined by the inventors. TABLE 1 (weight values not reproduced here) RESULTS Training of the Neural Network Figure 4 shows a diagram 40 of the sum-squared error 41 versus epochs (i.e., training cycles; for an N x M image, each cycle processes N x M pixels) 42 during training on a test sequence (see Figure 5). With good training, the error decreases as training time increases, until a minimum error is reached. In this training session, the sum-squared error reached a minimum of 4000.
This results in an average intensity difference between the neural-network-generated segmentation map and the actual segmentation map, for 176 x 144 images, of 0.0025 per pixel.
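For concreteness, this figure corresponds to the mean absolute gray-level difference per pixel between the two maps; a minimal sketch:

```python
# Minimal sketch of the reported accuracy measure for a 176 x 144
# (QCIF) frame: average per-pixel intensity difference between the
# network's map and the hand-made map.
import numpy as np

def mean_intensity_diff(net_map, hand_map):
    return float(np.mean(np.abs(net_map.astype(float) -
                                hand_map.astype(float))))
```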
An advantage of the present invention is that segmentation is achieved without any post-processing operation to fill the interior of the segmented object. Prior techniques require a filling operation to create the segmentation mask of Figure 5(b). This filling operation is not trivial, especially for an image whose segments are not closed rectilinear regions. In the present invention, the shape of the object is preserved by the intensity measurement in the neural network. Since the focus and motion detectors operate on edge effects, which are high-frequency components, they provide little information about the interior of the image. Thus, without the intensity measurement, a filling operation would be necessary. Since the intensity measurement provides information about the interior of the image, using this information in the neural network eliminates the need for filling the interior of the image, making the filling post-processing step unnecessary. In addition, intensity measurements are easily calculated.
Segmentation Figure 5 shows the segmentation results for one frame in a sequence of images. As shown, the segmentation by the neural network is accurate for the fiftieth frame of this sequence, even though the network was trained on the first and second frames of the sequence. Figure 5(a) depicts the output of the camera 61, which is input to the three detectors. Figure 5(b) shows the final segmentation map output by the neural network, which clearly corresponds well to the traced figure. Figure 5(c) shows the tracing of the segmented foreground, which shows the boundary between the foreground and the background. This is shown to illustrate the performance of the segmentation approach, but it is not actually created for the next step in the coding process.
Figure 6 shows the apparatus 60 for implementing the method of the present invention. Two successive images are first obtained, using a digital camera 61, for example. The digitized images are then input to the three detectors 63, 64, 65, which calculate the motion segmentation maps, the focus segmentation maps, and the intensity segmentation maps, respectively. These maps are then input to the neural network 66, which produces the final segmentation map, which is used to separate the foreground from the background.
Thus, the present invention describes an approach for foreground/background segmentation using integrated measurements. This approach is advantageous for two reasons. First, it is computationally simple. Second, the combined measures increase robustness in the segmentation of complex scenes. Other possible modifications include comparing the use of intensity versus color measures as a basis for segmentation.
While a neural network is used to perform the integration of the multiple maps and the weight assignment, a fuzzy logic circuit could also be used. This invention could also be implemented on a Sun Sparc workstation with an image acquisition device, such as a digital camera and a video board.
The method of this application could also be modified to use a known disparity detector as an additional input to the neural network, or as a replacement for one of the focus and intensity measures. This is achieved by simply replacing one of the focus and intensity detectors with the disparity detector, which produces its own version of the segmentation map, which is then weighted by the neural network.
It is noted that, as of this date, the best method known to the applicant for carrying out the aforementioned invention is the conventional one for the manufacture of the objects to which it relates.
Having described the invention as above, the content of the following claims is claimed as property.

Claims (8)

1. A method for processing a sequence of images to segment the foreground from the background, characterized in that it comprises the steps of: a) obtaining successive images in the sequence; b) simultaneously measuring the motion, focus, and intensity of the pixels within the successive images; c) inputting the motion, focus, and intensity measurements to a neural network; d) calculating the foreground and background segments using the motion, focus, and intensity measurements with the neural network; e) tracing a segment map based on the calculated foreground and background segments; f) training the neural network using two initial frames and a hand-segmented result; and g) accelerating the training step f) using an adaptive learning rate, where the adaptive learning rule comprises the following equations: Δw = lr · d · pᵀ and Δb = lr · d, where w is the weights of the layer, b is the bias of the layer, lr is the adaptive learning rate, d is the delta vector of the layer, p is the input vector of the layer, and T indicates that the vector p is first transposed.
2. The method according to claim 1, characterized in that step b) of measuring the motion comprises the steps of: (i) detecting a difference between the pixels in successive images; and (ii) determining that a pixel is in motion if the difference for that pixel exceeds a predetermined threshold.
3. The method according to claim 1, characterized in that step b) of measuring the focus comprises the steps of: (i) calculating the magnitude of the Sobel edge detection over an n x n pixel neighborhood; and (ii) dividing the magnitude of the Sobel edge detection by the width of the edge.
4. The method according to claim 1, characterized in that the step of measuring the intensity comprises the step of determining a gray level of the pixel.
5. An apparatus for segmenting the foreground and background from a sequence of images, characterized in that it comprises: a) a motion detector that detects the movement of the pixels within the image sequence and produces a motion map; b) a focus detector that detects the pixels that are in focus and produces a focus map; c) an intensity detector that detects those pixels that have high intensity and those that have low intensity and produces an intensity map; and d) a neural network coupled to the motion detector, the focus detector, and the intensity detector, which weights the outputs of these detectors and produces a segmentation map, the neural network including a hidden layer that has two neurons and an output layer that has one neuron, each of the neurons employing sigmoid functions, where: (i) the neural network applies a first weight to the intensity map, which is then input to a first neuron in the hidden layer, and the neural network also applies a second weight to the intensity map, which is then input to a second neuron in the hidden layer; (ii) the neural network applies a third weight to the focus map, which is then input to the first neuron in the hidden layer, and the neural network also applies a fourth weight to the focus map, which is then input to the second neuron in the hidden layer; and (iii) the neural network applies a fifth weight to the motion map, which is then input to the first neuron in the hidden layer, and the neural network also applies a sixth weight to the motion map, which is then input to the second neuron in the hidden layer.
6. The apparatus according to claim 5, characterized in that a bias generator also generates a bias signal, which is input to the neural network, and the neural network applies a seventh weight and an eighth weight, respectively, to the bias signal and inputs the weighted bias signals to the first and second neurons, respectively.
7. A device for segmenting objects within a sequence of images before coding the images for transmission or storage of the images, characterized in that it comprises: a) means for digitizing the sequence of images to obtain a sequence of digitized images; b) means for segmenting an image based on the movement of an object within the image, the motion segmentation means being coupled to the digitizing means and producing a motion segmentation map; c) means for segmenting an image using focus measurements, the focus segmentation means being coupled to the digitizing means and producing a focus segmentation map; d) means for segmenting an image using brightness measurements, the brightness segmentation means being coupled to the digitizing means and producing a brightness segmentation map; e) a neural network for calculating a final segmentation map using the segmentation maps output by the motion segmentation means, the brightness segmentation means, and the focus segmentation means; and f) bias generating means coupled to the neural network and producing a bias signal, wherein the neural network comprises a two-layer neural network that includes a hidden layer with a first neuron and a second neuron and an output layer with one neuron, where the brightness segmentation map is input to the first neuron in the hidden layer using a first weight and to the second neuron in the hidden layer using a second weight, the focus segmentation map is input to the first neuron in the hidden layer using a third weight and to the second neuron in the hidden layer using a fourth weight, the motion segmentation map is input to the first neuron in the hidden layer using a fifth weight and to the second neuron in the hidden layer using a sixth weight, and the bias signal is input to the first and second neurons using a seventh weight and an eighth weight, respectively.
8. A device for segmenting objects within a sequence of images before coding the images for transmission or storage of the images, characterized in that it comprises: a) means for digitizing the sequence of images to obtain a sequence of digitized images; b) means for segmenting an image based on the movement of an object within the image, the motion segmentation means being coupled to the digitizing means and producing a motion segmentation map; c) means for segmenting an image using focus measurements, the focus segmentation means being coupled to the digitizing means and producing a focus segmentation map; d) means for segmenting an image using brightness measurements, the brightness segmentation means being coupled to the digitizing means and producing a brightness segmentation map; and e) a neural network for calculating a final segmentation map using the segmentation maps output by the motion segmentation means, the brightness segmentation means, and the focus segmentation means, where the neural network calculates the following equation: o(x, y) = w1[v11·I(x, y) + v21·f(x, y) + v31·m(x, y) + v41·b(x, y)] + w2[v12·I(x, y) + v22·f(x, y) + v32·m(x, y) + v42·b(x, y)] + w3·b(x, y), where o(x, y) is the final segmentation map, I(x, y) is the intensity segmentation map, f(x, y) is the focus segmentation map, m(x, y) is the motion segmentation map, b(x, y) is the bias signal, and v11, v21, v31, v41, v12, v22, v32, v42, w1, w2, and w3 are the weights used in the neural network.
SUMMARY OF THE INVENTION The invention relates to segmenting moving foreground from background, where the moving foreground is of most interest to the viewer. This method uses three detection algorithms as the input to a neural network. The multiple cues used are focus, intensity, and motion. The neural network consists of a two-layer neural network. The focus and motion measurements are taken from the high-frequency data (edges), while the intensity measurements are taken from the low-frequency data (object interiors). Combined, these measurements are used to segment a complete object. The results indicate that moving foreground can be segmented from stationary foreground and from moving or stationary background. The neural network segments the entire object, both interior and exterior, in this integrated approach. The results also show that combining the cues allows flexibility in both the type and the complexity of the scenes. The integration of the cues improves accuracy in segmenting complex scenes that contain both moving foreground and moving background. Good segmentation produces bit rate savings when the object of interest, also called the video object in MPEG4, is encoded. This method combines simple measures to increase segmentation robustness.
MXPA/A/1999/007331A 1997-02-10 1999-08-09 A method and apparatus for segmenting images prior to coding MXPA99007331A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US08798200 1997-02-10

Publications (1)

Publication Number Publication Date
MXPA99007331A (en) 2000-04-24

