
HK1067716B - Interactive teleconferencing display system - Google Patents


Info

Publication number
HK1067716B
HK1067716B (application HK04109747.3A)
Authority
HK
Hong Kong
Prior art keywords
presenter
image
signal
projection screen
remote location
Prior art date
Application number
HK04109747.3A
Other languages
Chinese (zh)
Other versions
HK1067716A1 (en)
Inventor
Vlahos Paul
Original Assignee
Imatte, Inc. (伊马特公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US09/788,026 external-priority patent/US6361173B1/en
Application filed by Imatte, Inc. (伊马特公司)
Priority claimed from PCT/US2002/004593 external-priority patent/WO2002067050A1/en
Publication of HK1067716A1 publication Critical patent/HK1067716A1/en
Publication of HK1067716B publication Critical patent/HK1067716B/en


Description

Interactive teleconferencing display system
RELATED APPLICATIONS
This is a continuation of U.S. application serial No. 09/788,026, filed on February 16, 2001.
Background
Teleconferencing, the use of video and sound to connect two or more locations, allows a group of people at a remote location to see and hear a presenter at another location. A presenter who also wishes to show graphics typically combines them with his image using split-screen technology or dual monitors.
Rear projection and large liquid crystal display screens have been used to combine the presenter with graphics. Viewers in the same room as a presenter standing in front of the rear-projection or liquid crystal display see first-generation graphics, but when that image is sent to another location it must be photographed and projected again, making it second generation. Because of this two-generation loss, the graphic data seen at the far location is degraded to the point where many drawings, charts and text cannot be read clearly.
Using front projection to combine the presenter with the graphic introduces additional problems: the projector shines in the presenter's eyes, and the portion of the graphic intercepted by his body is distorted, which is disconcerting to the viewer.
In teleconferencing there are many variations of techniques for combining a presenter with selected graphics, none of which can be considered ideal. The motive for placing the presenter in front of the projected graphics is to improve the personal contact between the presenter and his audience, as compared to a dull graphics presentation narrated by an unseen off-screen presenter.
Brief description of the invention
The interactive teleconferencing display system uses equipment that performs the same functions at each location, thereby allowing any location to initiate or participate in a conference. The apparatus includes a front or rear projection screen, an electronic projector, and a signal processor. When the presenter is in front of a front projection screen, a matte signal is generated that selectively inhibits the projector to prevent the projected graphics from reaching the presenter. The graphics are downloaded and stored at all locations. The presenter's image, extracted by using the matte signal, is sent to all locations, where it is composited over the graphic prior to projection. By transmitting the graphic image and the image of the presenter separately and combining them at the remote location, each remains original and is displayed with no loss of detail.
Individuals at another location may participate at any time by walking in front of their own screen. All locations will see and hear the presenter and the added participants. The two participants can see each other, and can point to and discuss the material being displayed. They can also see their local audience without being blinded by the projector. Participants from other locations can join and appear on all screens.
Brief Description of Drawings
Fig. 1 illustrates the position of the signal processor relative to the projector when using a front projection.
Fig. 2 illustrates the function of the signal processor.
Fig. 3 together with fig. 2 illustrate the interconnection between two locations.
Fig. 4 shows a block diagram including components of the present invention.
Fig. 5 is a graph showing the relationship between the infrared deviation from the screen level and the reduction of the video signal.
FIG. 6 is a logic diagram of elements of an operating system.
Fig. 7 illustrates the function of the signal processor when a rear projection or liquid crystal display screen is used.
Fig. 8 illustrates the interconnections required for a four-location teleconference.
Fig. 9 illustrates the additional synthesis stages required when adding the third and fourth positions.
Detailed Description
Fig. 1 shows a typical conference room 1. Each room contains a screen 2, a presenter 3, an electronic projector 4, often located above the audience 7, a computer 6 or other storage device for storing and retrieving graphics (e.g., DVD, VCR, etc.), and a signal processor 5.
The signal processor, contained in a single package, is a key element of the present invention in that it includes all elements of the system except the projector, the projection screen and the image storage device. The device will most likely be a computer, placed in an area easily accessible to the operator.
One of the signal processor components is a camera which, when a front projection screen is used, must be placed in close proximity directly below or above the projector, or may be integrated into the projector. A user with sufficient space behind the projection screen may use rear projection; in this case the ideal camera position is a point above the audience, perpendicular to the screen, on a common axis through the center of the screen and the projector lens. Although liquid crystal display screens are still relatively small, they are becoming large enough to serve a sizable audience. Another possibility is a multi-cathode-ray-tube display, whose disadvantages are cost and the visible seams between the tubes. These screens have some advantages over rear- and front-projection screens, but also disadvantages beyond cost and limited size. Although most users are expected to use a front projection screen, the following system description applies to all display methods except where noted.
The camera provides an image of the presenter and anything he adds to the scene, such as material written on a whiteboard. The presenter may not always need the stored background graphics, and in these cases memory 26 will contain a black slide or will not be used.
Figs. 2 and 3 show the display components at locations A and B, which are remote from each other; together the two figures illustrate the interaction that occurs between the components at each location. Numerals 20 to 29 denote functions of the signal processor at the first location; the series 30 to 39 denotes the same signal processor functions at the second location.
Referring to fig. 2 (position a), the selected graphics image from memory 26 is passed through a compositing function 25, through a suppressor function 24, and then to a projector 27 which projects the selected graphics onto a screen 29. The viewer at location a will see the stored graphical image from local memory projected onto the projection screen as the original image without loss of detail.
Referring to fig. 3 (position B), the same graphic image will be retrieved from computer 36 and passed through compositing function 35, through suppressor function 34, and then to projector 37 which projects the selected graphic onto screen 39. If there are third and fourth participating locations, their spectators will also see the same graphics, obtained from their own computers, projected onto their screens without loss of detail.
As long as there is no presenter in front of any projection screen, the presenter mask extraction function (22, 32) has no foreground image to extract and the compositor (25, 35) has no foreground image to composite, and the suppressor (24, 34) has no presenter to protect. When a person or object enters the front of the screen, it becomes a foreground object and activates the object related functions above.
The camera 20 is placed directly below the projector 27 to see the presenter 28 and maintain proper alignment of the suppression mask. A beam splitter is provided in the camera 20 to split the infrared or other image in order to generate a mask signal in a mask generator 21.
There are several mask generation methods in use. One is described in U.S. application serial No. 09/788,026, filed on February 16, 2001. One such method is described below with reference to fig. 4.
The projection image source 41 of fig. 4 represents a source of video images to be projected onto a projection screen 43. The image source 41 may be a computer, a video recorder, a digital video disc, another camera, or other source of video images.
The video program signal from the image source 41 is connected to a suppressor 42 where the video signal at selected pixels can be suppressed. The program signal is then connected from the suppressor 42 to a video projector 46, which projects a program image on a projection screen 43.
In one embodiment, at least one infrared source 47 is used to uniformly illuminate the projection screen 43. Being infrared, this illumination is not seen by the audience. Camera 45 is an infrared-sensitive video camera that views the uniformly illuminated projection screen. The output of camera 45 is connected to the video suppressor 42, where the infrared signal level established for the bare projection screen is nulled to zero. When an object 44 enters the projection beam, the infrared reflection of the object may be higher or lower than the uniform infrared brightness level of the projection screen. Any infrared deviation from the level established for the projection screen therefore represents the object. The addresses of the detected pixels identifying the location of the object are used to suppress the video program signal at those same addresses.
There is always the possibility that a small area of the subject's clothing will reflect exactly the same amount of infrared as the screen. In such a region the suppressor is fooled and the video signal is not suppressed. Such a region is rarely a problem, and there is little possibility that the infrared reflection from the subject's face will match the infrared reflection from the screen.
By selecting the infrared camera passband least likely to match the object's reflection level, the likelihood of fooling the suppression logic is reduced.
The near-infrared bandwidth is broad, and the infrared provided by an incandescent source has a flat, broad illumination spectrum. Infrared-sensitive cameras may therefore be fitted with filters having adjacent passbands in the near infrared (beginning at about 700 nm). Only a small shift of the passband is needed to produce a large change in the measured infrared reflection. Filter selection may be made during setup to prevent the infrared reflection of the object from matching the infrared reflection of the screen.
An alternative to swapping camera filters is to incorporate two or more infrared image channels in the camera, each filtered to a different passband, with a separate infrared reference frame stored for each passband.
It is highly unlikely that the infrared reflection of an object will match the infrared reflection of two or more infrared passbands simultaneously.
Options
To suppress the projected image from falling onto the object when the object enters the projected image, it is necessary to separate the object from the scene onto which it is being projected.
There are several existing ways of detecting the position of an object. Standard difference keying, or matting, relies on a reference frame of the blank screen, compared with subsequent frames, to detect the position of an object. Since an image in the visible spectrum is also being projected onto the screen, standard difference keying will not work in this application.
Another option is to flood the projection screen with one or more bands of ultraviolet light outside the visible wavelength.
One may also separate the object from the projection screen by using a long wavelength infrared camera that is sensitive to human body temperature. Since this type of camera sees body temperature, it is not necessary to flood the screen with long-wavelength infrared light.
Other methods identify the presence of objects by radar or sonar techniques that detect objects that are at a shorter distance than the screen.
Stereoscopic rangefinders and detail-maximizing (autofocus) schemes have been used in automatic cameras to determine distance. Any scheme providing a signal that separates the object from the projected image may be used in the present invention to suppress the projected image in the area occupied by the object.
Preferred option
A preferred option for use in the present invention is to illuminate the projection screen with near-infrared light. The infrared brightness level of the projection screen may be monitored and the reference frame updated to compensate for line-voltage changes at the infrared source. The updated reference frame improves object detection when the infrared difference is small. By using the infrared portion of the spectrum, the detected infrared image is unaffected by variations in the content of the projected image.
The use of infrared illumination to produce difference or ratio mattes provides a practical way to identify those pixels occupied by an object. The equations for generating suitable ratio and difference mattes for this purpose are as follows:
Ratio matte:
If IRo ≤ IRm, then M = IRo / IRm
If IRo > IRm, then M = IRm / IRo
If IRm = IRo = 0, then M = 0
Difference matte:
M = 1 − max[(IRo − IRm), (IRm − IRo)]
where:
IRo = the observed IR pixel value
IRm = the stored IR pixel value (at the same location)
M = the computed matte value
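Assuming IR pixel values normalized to the range 0..1 (an assumption; the patent does not fix a scale), the ratio and difference mattes above can be sketched per pixel as:

```python
def ratio_matte(ir_o: float, ir_m: float) -> float:
    """Ratio matte: M = 1.0 where the observed IR matches the stored
    reference, falling toward 0.0 as the pixel deviates."""
    if ir_o == 0.0 and ir_m == 0.0:
        return 0.0              # IRm = IRo = 0 case
    if ir_o <= ir_m:
        return ir_o / ir_m
    return ir_m / ir_o

def difference_matte(ir_o: float, ir_m: float) -> float:
    """Difference matte: M = 1 - max(IRo - IRm, IRm - IRo)."""
    return 1.0 - max(ir_o - ir_m, ir_m - ir_o)
```

A pixel matching the reference yields M = 1.0 (bare screen); a strong deviation in either direction yields a small M (object).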
The suppression of the projected image may be continuous, either linear or nonlinear, rather than an on-off switching action. If nonlinear, the earliest and smallest detectable change in the infrared signal produces only a small reduction in the video signal level. As the deviation increases, the rate of suppression increases; as the deviation approaches a selected level, the suppression increases rapidly to a cutoff point, or to a selected low level near the cutoff point. This variable rate of signal suppression prevents the on-off flicker of a switching action. Fig. 5 illustrates the relationship.
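One way to realize the variable-rate curve of fig. 5 is a power-law falloff. This is only a sketch: the cutoff, exponent, and 5% floor are illustrative values chosen here, not taken from the patent.

```python
def suppression_gain(deviation: float, cutoff: float = 0.3,
                     exponent: float = 3.0, floor: float = 0.05) -> float:
    """Map an IR deviation (0..1) to a video gain factor.

    Small deviations reduce the gain only slightly; as the deviation
    approaches `cutoff` the gain drops rapidly toward `floor`, which
    avoids the on-off flicker of a hard switch.
    """
    if deviation >= cutoff:
        return floor
    return floor + (1.0 - floor) * (1.0 - (deviation / cutoff) ** exponent)
```

Multiplying each program pixel by this gain suppresses the projection smoothly in the subject area.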
The term "suppression" is defined as a reduction of the projected image level in the area occupied by the object. In practice, if the level is reduced to about 5% of full level, the image falling on the object is reduced to an apparent black. With little or no projector illumination falling on it, the object receives no illumination except ambient room light, which is typically dimmed to a very low level when a projector is in use.
Since the object illumination from the video projector has been suppressed to near zero, RGB levels representing white (or colored) light may be added at those pixels defining the object area. The illumination of the object may thus be raised above that produced by the ambient light alone. Although at a lower level, this supplemental lighting added to the ambient room light may still be somewhat distracting to a subject facing the projector.
The techniques described in U.S. patent No.5,270,820 can be used to locate the head (or other end) of a speaker. With this additional information, white (or colored) light projected onto the subject can be suppressed in the area of its head and eyes.
The term "projection screen" or "screen" has been used above. The screen may be white, granular, metallic or metal-coated lenticular, or any surface suitable for viewing a projected image.
Implementation
In fig. 4, the image source 41 (the video program source) may be a computer, a video recorder or a video disc, selected by the user.
The video projector 46 and the projection surface 43 are commercial devices selected by the user. An infrared filter removes any residual infrared in the video projection beam, if desired.
The infrared-sensitive camera 45 is a video camera whose light sensor extends into the near infrared above 700 nm. A filter is placed on the camera lens to remove visible wavelengths.
The infrared source 47 (at least one is used) is a floodlight using an incandescent lamp. A filter is placed over the infrared source to remove visible light. The suppressor 42 is the detector/suppressor whose function has been described earlier.
Fig. 6 is a logic flow diagram illustrating the functions of object detection and program signal suppression. Referring to fig. 6, the IR camera 61 may be a 480-line VGA progressive-scan low-resolution camera, or any other low-resolution camera sensitive to the near infrared. The clear-frame store 62 holds a stored infrared image of the infrared-illuminated screen with the object removed from the scene. The mask generator 63 compares the image from the infrared-sensitive camera with the clear-frame image in store 62; any difference identifies the area the object occupies, if one is present. The shaping function 64 shapes the object detection signal from an on-off signal into a linear or nonlinear signal as shown in fig. 5.
The projector image source 65 is the source of the program to be projected onto the projection screen. Generally the program video is an image of much higher resolution than an NTSC signal. Image size detection 66 determines the resolution of the program image and passes the size data to a scaling filter 67, which acts as a standards converter to scale the infrared camera image to match the size of the projected image. After the image sizes are matched, the program image is suppressed in block 68 in the area occupied by the object, if an object is present. The projector 69 projects the program image onto the screen, but does not project the program onto the object.
The mask signal 21 is generated from the information provided by camera 20 by one of the existing methods described above.
The mask signal generator 21 generates a suppression mask signal and supplies it to the suppressor 24. Those pixels that constitute the foreground object are assigned a mask value of 0.0; pixels in the surrounding screen area that displays the graphic are assigned 1.0. The graphic image 26 passes through the combiner 25 to the suppression multiplier 24, where it is multiplied by the mask signal from 21; the zeros in the object region turn off (suppress) the projector signal there. The viewer at location A (fig. 2) now sees the presenter illuminated by room light, with the graphic appearing on the screen behind him. The presenter can see the audience without being dazzled by the glare of the projector. The use of the mask signal in generating the suppression signal is described above. (Rear-projection images and liquid crystal displays do not require a suppression signal, although a mask signal is still required to isolate the object.)
The suppression mask signal from generator 21 is inverted to form a second mask signal that has a value of 1.0 in the subject area and 0.0 in the background surrounding the subject. This second mask and the video signal from camera 20 are connected to a multiplier 23, whose product is the processed foreground signal (PrFg): the subject over a 0.0 black field. Placing the subject on a true 0.0 black field is deliberate, because the darkest black in a normal video signal sits on a pedestal of approximately 7% of white. The 0.0 level of the processed foreground video thus carries the mask information along with the isolated subject. The processed foreground 23 from location A is connected to the mask extraction function 32 and the compositing function 35 at location B.
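Treating each image as a flat list of pixel values, the processed-foreground step can be sketched as follows; the function name and list-of-floats pixel format are illustrative, not from the patent.

```python
def processed_foreground(camera, suppression_mask):
    """Produce PrFg: subject pixels kept, background forced to a true
    0.0 black field (below any normal video pedestal).

    `suppression_mask` is 0.0 on the subject and 1.0 on the surrounding
    screen; inverting it (1 - m) selects the subject pixels.
    """
    return [c * (1.0 - m) for c, m in zip(camera, suppression_mask)]
```

Only this PrFg signal (plus sound) needs to be transmitted, since the graphics are already stored at every location.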
By setting a detection threshold at about 3%, the mask extraction function 32 separates the processed foreground, whose lowest level is the 7% pedestal, from the 0.0 black field. All pixels above the threshold are in the foreground and are assigned a value of 1.0; all pixels below the threshold are in the background and are assigned 0.0. The assignment of 1.0 or 0.0 is arbitrary and can be reversed as needed for the function being controlled. A threshold level above the camera and system noise is necessary to prevent noise peaks in the background area from being wrongly accepted as object pixels.
The extracted mask is inverted to provide 0.0 in the processed-foreground region and 1.0 in the graphic region around the subject. Multiplying the graphic image from source 36 by this mask signal maintains the full signal level of the graphic around the subject, while the 0.0 in the subject area cuts a black hole in the projected graphic. The compositing function 35 adds the processed foreground, consisting of the subject only, into the black hole generated for it. The composite image from 35 passes through the suppression function 34 to the projector 37. The viewer at location B sees graphics from his own image source 36 projected onto his own screen, with the video image of the presenter from location A composited over them.
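The receive-side steps (threshold the PrFg signal to re-extract a mask, cut a black hole in the local graphic, add the presenter) can be sketched as below; the 3% threshold comes from the text, while the function names and pixel format are illustrative.

```python
THRESHOLD = 0.03  # below the ~7% video pedestal, above system noise

def extract_mask(prfg):
    """1.0 where a subject pixel is present, 0.0 over the black field."""
    return [1.0 if p > THRESHOLD else 0.0 for p in prfg]

def composite(graphic, prfg):
    """Zero the graphic where the subject is, then add the subject."""
    mask = extract_mask(prfg)
    return [g * (1.0 - m) + p for g, p, m in zip(graphic, prfg, mask)]
```

Because the graphic comes from local storage and only the subject pixels are substituted, the composite retains full first-generation detail in both images.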
The quality of the image is limited only by the resolution of the original image and the resolution of the projector. By preloading the graphics at each location, the remaining data to be sent to the other locations is simply a processed video signal with sound.
The process of multiplying and adding mask signals to composite the image over the background preserves the transparency of the subject's edges. However, when the mask signal is binary (i.e., 1.0 or 0.0), the composite image may instead be formed by a key function derived from the mask signal that switches between the stored image and the presenter. In either case, the presenter's pixel values replace the pixel values of the background image to form the composite image.
A binary 1/0 mask signal produces a hard-edged switch; however, the mask edge can be sized to better fit the subject's contour, and it can be softened to improve the transition from the presenter to his background.
The suppression function 34 awaits the presence of a presenter 38. When person 38 at location B wishes to participate, he walks in front of his screen. Functions 30, 31 and 34 suppress the projector 37 pixels that would otherwise fall on person 38. Functions 30, 31 and 33 produce a processed foreground PrFg that is passed back to location A, to mask extractor 22 and compositor 25. The video of person 38, standing in front of his screen at location B, is composited over the graphic being projected at location A. The audience at location B sees participant 38 in person in front of the projected graphic, and sees presenter 28 composited over the graphic.
By viewing the screen, each participant sees a video image of the other composited with the graphics. Participants can see and face each other, point to elements in the graphic, and discuss them. The audiences at locations C and D will see presenter A and participant B on their projection screens. The people at C and D may also become participants by walking in front of their screens. The audience at a participant's own location sees that participant in person, while all other participants appear on the screen behind him, in front of the projected graphics.
There is a practical limit to the number of simultaneous participants that can be in the scene while the graphics behind them remain visible. If the presentation involves several presenters, the graphics may be generated to occupy the upper portion of the screen so that seated participants will not obscure material the audience needs to see. Each presenter takes his turn while the audiences at all locations watch the speaker and the reactions of those seated.
If a large whiteboard is used as the projection screen, the presenter and whatever he writes or draws become part of the subject and will be projected onto the whiteboards elsewhere. A participant at another location can draw on his own whiteboard, and what he writes will be projected onto all the other whiteboards. In this manner each location may contribute to a sketch, add to a list, mark a location on a map, and so forth.
Rear-projection and liquid crystal display systems do not require the suppression function 24, which is therefore bypassed. Fig. 7 shows the signal flow through the signal processor with the suppression function removed or deactivated.
Interconnecting multiple locations
Fig. 8 illustrates the interconnections required for four participating locations A, B, C and D. The output signal at each location is the processed foreground (PrFg), which is connected to the compositing function at every other location. The required inputs at each location are the PrFg signals from all other locations. In fig. 8, PrFg 23 from location A is shown connected to the compositing functions at B, C and D, illustrating how a PrFg signal is connected to the input stage at another location. The remaining connections are made as shown in fig. 8.
Fig. 9 illustrates the compositing functions required when there are four participating locations. If only location B is sending a PrFg signal to location A, then functions 22 and 25 are all that is required. Adding a third location C requires separate compositing stages 22' and 25'; adding a fourth location D requires separate compositing stages 22" and 25". The number of compositing stages required is one less than the number of participating locations.
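The scaling rule above is simple arithmetic: with N participating locations, each location needs N − 1 compositing stages, and the conference as a whole carries N(N − 1) one-way PrFg feeds. A sketch (hypothetical helper names):

```python
def compositing_stages_per_location(n_locations: int) -> int:
    # One mask-extraction/compositing stage per remote PrFg feed received.
    return n_locations - 1

def total_prfg_links(n_locations: int) -> int:
    # Every location sends its PrFg to every other location.
    return n_locations * (n_locations - 1)
```

For the four-location conference of fig. 8, this gives 3 stages per location and 12 one-way PrFg links in total.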

Claims (18)

1. A method for displaying a composite video image of a presenter in front of a selected background image at a plurality of remote locations without loss of detail in the background image or the presenter image, the method comprising the steps of:
a) storing the selected background image in a memory at each remote location,
b) generating a mask signal identifying those pixels in a video image representing the presenter,
c) transmitting the signal levels of the pixels comprising the presenter to each remote location,
d) generating a composite video image of the presenter and the stored selected background image by replacing pixel signal levels in a background image with pixel signal levels of the presenter image at corresponding addresses.
2. The method of claim 1 wherein said storage device comprises a computer, DVD, VCR or other image storage and retrieval device.
3. The method of claim 1, wherein the stored selected background image comprises at least one of a graphic, a chart, a table, and a photograph.
4. The method of claim 1, wherein the selected background image is downloaded to a remote location at the time of its selection.
5. The method of claim 1 wherein the video image of the presenter transmitted to the remote location comprises said presenter on a black field having a signal level of 0.0.
6. The method of claim 5, wherein those pixels whose signal levels exceed a set threshold above a zero value are used to identify the video signal of the presenter.
7. The method of claim 1, wherein the selected background image is stored at a remote location prior to the teleconference.
8. The method of claim 1, wherein the composite video image is viewable on at least one of a front projection screen, a rear projection screen, a self-emissive liquid crystal display, and a cathode ray tube display.
9. The method of claim 8, wherein said composite video image viewed on said front projection screen employs an electronic projector.
10. The method of claim 9 wherein said projector is suppressed in an area where said presenter is located to prevent said presenter from being illuminated by said projected image.
11. The method of claim 1 wherein a person in at least one remote location can become a participant by walking in front of its own projection screen and be seen on the projection screen at all other locations.
12. The method of claim 11, wherein people in at least two locations can simultaneously become participants and be seen on the projection screen at all other locations by each walking in front of the projection screen at its own location.
13. A signal processing apparatus for displaying a composite video image of a presenter in front of a selected background at a plurality of locations without loss of detail in the background image or the presenter image, the apparatus comprising:
a) means (26) for storing said selected background image in a memory at each remote location,
b) means (21) for generating a masking signal identifying those pixels in the video image representing the presenter,
c) means (23) for transmitting a signal level comprising pixels of said presenter to each remote location,
d) means (22, 25) for composing said presenter over said background at each remote location, and
e) means (27, 29) for displaying the composite image.
14. The apparatus of claim 13 wherein said means for storing the selected background image comprises one of a computer, DVD, VCR or other image storage device.
15. The apparatus of claim 13, wherein said means for composing said presenter over said background comprises hardware or software under control of said matte signal.
16. The apparatus of claim 13, wherein the display device comprises at least one of a commercial electronic projector and projection screen, a CRT display, and a liquid crystal display.
17. The apparatus of claim 16, wherein the projection screen is reflective to infrared illumination.
18. The device of claim 13, wherein the device is capable of performing the same function at all locations.
HK04109747.3A 2001-02-16 2002-02-14 Interactive teleconferencing display system HK1067716B (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US09/788,026 US6361173B1 (en) 2001-02-16 2001-02-16 Method and apparatus for inhibiting projection of selected areas of a projected image
US09/788,026 2001-02-16
US10/038,229 2002-01-02
US10/038,229 US6454415B1 (en) 2001-02-16 2002-01-02 Interactive teleconferencing display system
PCT/US2002/004593 WO2002067050A1 (en) 2001-02-16 2002-02-14 Interactive teleconferencing display system

Publications (2)

Publication Number Publication Date
HK1067716A1 HK1067716A1 (en) 2005-04-15
HK1067716B true HK1067716B (en) 2008-11-28

