HK1056449B

HK1056449B - Movie theatre system for processing audio signal in consumer applications and the associated method

Info

Publication number: HK1056449B
Application number: HK03108662.7A
Authority: HK
Inventors: Michael A. Vaudrey; William R. Saunders
Original assignee: Hearing Enhancement Company, Llc.
Priority date: 2000-02-04
Filing date: 2001-01-30
Publication date: 2006-03-24

Description

Movie theater system and related method for processing audio signals in consumer applications

This patent application claims the benefit of U.S. provisional patent application serial No. 60/190,220, entitled "Use of VRA in consumer applications," filed on February 4, 2000.

Technical Field

Embodiments of the present invention relate generally to methods and apparatuses for processing audio signals, and more particularly, to methods and apparatuses for processing audio signals for use in consumer applications.

Background

End users with "premium" or expensive equipment, including multi-channel amplifiers and multi-speaker systems, currently have a limited ability to adjust the volume of the center channel signal of a multi-channel audio system independently of the audio signals on the remaining channels. Since most of the dialog in many movies is on the center channel and the other sound effects are on other channels, this limited adjustability allows the end user to boost the amplitude of the channel carrying most of the dialog, making the dialog more intelligible during passages with loud sound effects. Currently, this limited adjustment has significant drawbacks. First, it is an adjustment capability available only to end users with expensive Digital Versatile Disc (DVD) players and multi-channel loudspeaker systems, such as six-loudspeaker home cinema systems, which allow the volume levels of all loudspeakers to be adjusted independently. Thus, users who cannot afford such a system cannot enjoy this way of listening to recorded or broadcast programs, in which the end user boosts the amplitude of the channel carrying most of the dialog to make it more intelligible.

Second, it is an adjustment that can serve only one listener at a time. For example, if a user selects a level of dialog relative to background sound that improves intelligibility for his or her own hearing, that level may be unsatisfactory for other persons in the room. Therefore, this adjustment capability cannot be provided simultaneously to multiple listeners with different hearing preferences.

In addition, this is an adjustment that needs to be constantly modified during transients in the preferred audio signal or voice dialog (center channel) and in the remaining audio signal (all other channels). A final disadvantage is that a voice-to-remaining-audio (VRA) adjustment that is acceptable during one audio segment of a movie program may not be good for another audio segment if the remaining audio level rises too much or the dialog level drops too much.

In fact, for many years to come most end users will not have home theaters (i.e., Dolby Digital decoders, six-channel variable gain amplifiers, and multi-speaker systems) that allow such adjustment capability. In addition, the end user has no way to ensure that the VRA ratio selected at the beginning of the program remains constant throughout the program.

Fig. 3 shows the intended spatial positioning of a common home cinema system. Although there are no written rules for audio production in 5.1 spatial channels, there are some industry standards. As used herein, the term "spatial channel" refers to the physical location of an output device (i.e., a speaker) and how sound from the output device is delivered to an end user. One of these standards is to place most of the dialog on the center channel 526. Likewise, other sound effects that require spatial localization are placed on any of the other four speakers, labeled L 521, R 522, Ls 523, and Rs 524 for left, right, left surround, and right surround. In addition, to avoid damage to the midrange speakers, Low Frequency Effects (LFE) are placed on the 0.1 channel leading to subwoofer 525.

Digital audio compression allows producers to provide users with a greater dynamic range than is possible with analog transmission. In the presence of some very loud sound effects, this greater dynamic range makes most dialog sound too quiet. The following example provides an illustration. Assume that an analog transmission (or recording) can carry a dynamic range of up to 95 dB and that dialog is typically recorded at 80 dB. When the remaining audio reaches that upper limit while someone is speaking, the loud passage of remaining audio may make the dialog unintelligible. The situation is exacerbated when digital audio compression allows a dynamic range as high as 105 dB. The dialog remains at the same sound level (80 dB) relative to the other sounds, but the loud remaining audio can now be reproduced more realistically in terms of its amplitude. Users very commonly complain that the dialog level recorded on DVDs is too low. In practice, the dialog is indeed at a normal sound level, one that is more appropriate and realistic than in analog recordings with limited dynamic range.
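The arithmetic behind the 95 dB and 105 dB example above can be sketched as follows. This is only an illustration: the function names are hypothetical, and the conversion from a dB difference to a linear ratio assumes the usual amplitude convention of 20·log10, which the text does not state at this point.

```python
# Levels taken from the example in the text (all in dB).
DIALOG_LEVEL_DB = 80.0
ANALOG_CEILING_DB = 95.0
DIGITAL_CEILING_DB = 105.0


def headroom_over_dialog(ceiling_db, dialog_db=DIALOG_LEVEL_DB):
    """Return how far (in dB) the loudest remaining audio can exceed the dialog."""
    return ceiling_db - dialog_db


def amplitude_ratio(level_difference_db):
    """Convert a level difference in dB to a linear amplitude ratio.

    Assumption: amplitude convention, ratio = 10 ** (dB / 20).
    """
    return 10 ** (level_difference_db / 20.0)


analog_gap_db = headroom_over_dialog(ANALOG_CEILING_DB)
digital_gap_db = headroom_over_dialog(DIGITAL_CEILING_DB)

print(f"analog:  remaining audio up to {analog_gap_db:.0f} dB above dialog")
print(f"digital: remaining audio up to {digital_gap_db:.0f} dB above dialog")
print(f"extra masking margin: {digital_gap_db - analog_gap_db:.0f} dB")
```

Under these assumptions the loudest remaining audio can sit 15 dB above the dialog in the analog case and 25 dB above it in the digital case, which is why the same 80 dB dialog is perceived as too quiet.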

Even for consumers who are currently able to calibrate their home theater systems properly, the dialog in many DVD movies produced today is often masked by louder segments of remaining audio. A small group of consumers can gain some improvement in intelligibility by increasing the volume of the center channel and/or decreasing the volume of all other channels. However, this fixed adjustment is acceptable only for certain audio passages, and it disturbs the properly calibrated sound levels. The level of each loudspeaker is typically calibrated to produce a certain Sound Pressure Level (SPL) at the viewing position. This calibration ensures that the viewing experience is as realistic as possible. Unfortunately, it also means that loud sounds are reproduced very loudly, which may be undesirable during late-night viewing. However, any adjustment to the loudspeaker levels disturbs the calibration.

Summary of The Invention

A method for providing a plurality of users with voice-to-remaining-audio (VRA) adjustment capability comprises: receiving a voice signal and a remaining audio signal at a first decoder and simultaneously receiving the voice signal and the remaining audio signal at a second decoder, wherein the voice signal and the remaining audio signal are received separately; and adjusting the separately received voice signal and remaining audio signal individually at each decoder.

Brief Description of Drawings

Fig. 1 shows a general method for separating the relevant voice information in a recorded or broadcast program from the overall background audio in accordance with the present invention.

Fig. 2 shows an exemplary embodiment for receiving and playing back an encoded program signal according to the present invention.

Fig. 3 shows the intended spatial positioning layout of a typical home cinema system.

Fig. 4 shows a block diagram of a voice-to-remaining-audio (VRA) system for multiple simultaneous users, in accordance with an embodiment of the present invention.

Fig. 5 shows an embodiment according to the invention for multi-channel transmission.

Fig. 6 shows another embodiment of the present invention.

Fig. 7 shows another embodiment of the present invention.

Fig. 8 shows another embodiment of the invention including signal processing for multi-channel rendering.

Fig. 9 shows another embodiment of the present invention.

Fig. 10 shows one embodiment of adding and continuously adjusting a speech component to a remaining audio component with a single control.

Fig. 11 shows another embodiment of the present invention utilizing an automated VRA.

Fig. 12 shows an embodiment of the present invention in which various functions of the slide control are displayed.

Fig. 13 shows a flowchart of various functions of the slide control.

Fig. 14 shows another embodiment of the present invention.

Fig. 15 shows another embodiment of the present invention.

Fig. 16 shows an aircraft VRA conditioning box according to one embodiment of the present invention.

Fig. 17 shows another embodiment of the present invention.

Fig. 18 shows another embodiment of the present invention.

Fig. 19 shows a headphone structure according to one embodiment of the invention.

Fig. 20 shows one embodiment for maintaining a blended delivery of a work to an end user in addition to providing VRA adjustment capabilities in accordance with the principles of the present invention.

Fig. 21 shows an alternative embodiment of Fig. 20.

Fig. 22 shows a fabrication process according to an embodiment of the invention.

Fig. 23 shows another embodiment of the present invention.

Fig. 24 shows a user in a multi-channel listening environment.

Fig. 25 shows a VRA and an automatic VRA on a multi-channel processed headset in accordance with the principles of the present invention.

Fig. 26 shows a conventional reproduction process.

Fig. 27 shows another embodiment of the present invention.

Fig. 28 shows another embodiment of the present invention.

Detailed Description

Methods and apparatus for providing voice-to-remaining-audio (VRA) capability are described. In addition, the present invention discloses technological, ergonomic, economic, and application improvements for VRA and automatic VRA. VRA refers to the personalized adjustment of the ratio of voice to remaining audio in an audio program, achieved by adjusting the volume of the voice (speech) independently of the volume of the remaining audio (which may include music, sound effects, laughter, or other non-speech sounds in the overall audio program). Automatic VRA, or automatic VRA hold, refers to the automatic adjustment of the VRA ratio so that transients in the program (such as explosions) do not obscure the voice.

Importance of the ratio of preferred audio to remaining audio

The present invention proceeds from the finding that the range of listening preferences for the ratio of the preferred audio signal to any remaining audio is quite large, and certainly larger than had been expected. This important finding is the result of testing a small sample of the population for the range of ratios of preferred audio signal level to all remaining audio that they prefer.

Specific adjustment of the desired range for hearing impaired or normal listeners

Targeted research has been conducted to understand the ratio between dialog and remaining audio that normal-hearing and hearing-impaired users prefer for different types of audio programs. It has been found that the desired range of adjustment between voice and remaining audio varies widely across the population.

Two experiments were performed on a random sample of the population that included elementary school children, middle school students, middle-aged adults, and senior citizens. A total of 71 persons were tested. The test required each user to adjust the level of the voice and the level of the remaining audio for a football game (where the remaining audio is the crowd noise) and for a popular song (where the remaining audio is the music). Dividing the linear value of the dialog or voice volume by the linear value of the remaining audio volume for each selection yields a measure called the VRA (voice-to-remaining-audio) ratio.

As a result of this test, several facts became clear. First, for both sports and music media, no two people preferred the same ratio of voice to remaining audio. This is important because the population relies on the producer to provide a VRA that appeals to everyone (the consumer cannot adjust it), and the test results show that this is clearly not achievable. Although the preferred VRA is typically higher for hearing-impaired people (in order to improve intelligibility), those with normal hearing also prefer ratios different from what producers currently provide.

It is also important to emphasize that any device that provides VRA adjustment must provide at least as much adjustment range as these tests indicate in order to satisfy most of the population. Because video and home theater media carry a wide variety of programming, the ratio should extend from the lowest ratio measured for any medium (music or sports) to the highest ratio measured for music or sports. This is a range of 0.1 to 20.17, or 46 dB. It should also be noted that this is only a sample of the population and that, in theory, the adjustment range should be infinite, since it is quite likely that one person would prefer no crowd noise at all while watching a sports broadcast, while another would prefer no announcer voice. It should be noted that this type of study, and the specific desirability of widely varying VRA ratios, has not been reported or discussed in the literature or in the prior art.
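The ratio and its span in decibels can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the 20·log10 amplitude convention is an assumption, chosen because it is the convention that reproduces the 46 dB figure stated above.

```python
import math


def vra_ratio(voice_level, remaining_level):
    """VRA ratio: linear voice volume divided by linear remaining-audio volume."""
    if remaining_level <= 0:
        raise ValueError("remaining_level must be positive")
    return voice_level / remaining_level


def ratio_span_db(lowest_ratio, highest_ratio):
    """Span between two linear ratios in dB (assumed 20*log10 convention)."""
    if lowest_ratio <= 0 or highest_ratio <= 0:
        raise ValueError("ratios must be positive")
    return 20.0 * math.log10(highest_ratio / lowest_ratio)


# Lowest and highest measured ratios quoted in the text.
span_db = ratio_span_db(0.1, 20.17)
print(f"adjustment span: {span_db:.1f} dB")
```

With the two measured extremes of 0.1 and 20.17, the computed span rounds to the 46 dB quoted in the description.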

In this test, a group of older adults was selected and asked to adjust the announcer's voice against a fixed background noise; only the voice level could be changed, and the background noise was set to 6.00. The results for the older group are as follows:

TABLE 1

Individual    Setting
1             7.50
2             4.50
3             4.00
4             7.50
5             3.00
6             7.00
7             6.50
8             7.75
9             5.50
10            7.00
11            5.00

To further illustrate that people of all ages have different hearing needs and preferences, 21 college students were selected to listen to a mix of voice and background sound and to select the ratio of voice to background by adjusting the voice level. The background noise, in this case the crowd noise at a football game, was fixed at a setting of six (6.00), and each student was allowed to adjust the volume of the announcer's commentary, which had been recorded separately and was pure or mostly pure voice. In other words, the students performed the same test as the older group. Students were selected to minimize hearing loss due to age; all were in their twenties. The results are as follows:

TABLE 2

Student       Voice setting
1             4.75
2             3.75
3             4.25
4             4.50
5             5.20
6             5.75
7             4.25
8             6.70
9             3.25
10            6.00
11            5.00
12            5.25
13            3.00
14            4.25
15            3.25
16            3.00
17            6.00
18            2.00
19            4.00
20            5.50
21            6.00

The ages of the older group (Table 1) ranged from 36 to 59, with most subjects in their 40s or 50s. As the test results indicate, the average setting tends to be reasonably high, indicating some overall loss of hearing. Furthermore, the settings range from 3.00 to 7.75, a span of 4.75, which further confirms the finding that people vary widely in their preferred listening ratio of voice to background sound, or of any preferred signal to the remaining audio (PSRA). The overall range of volume settings for the two groups of subjects is from 2.0 to 7.75. These levels are the actual values of the volume adjustment mechanism used to perform the experiment. They indicate the range of signal-to-noise values that different users may desire (when compared with a "noise" level of 6.0).

To better understand how this relates to the relative loudness changes selected by different users, consider that on the non-linear volume control a change from 2.0 to 7.75 represents an increase of 20 dB, or a factor of 10. Thus, even for such a small sample of people and a single type of audio program, different listeners clearly prefer considerably different levels of the "preferred signal" relative to the "remaining audio". This preference cuts across age groups, indicating that it depends on personal taste and basic hearing ability and is therefore completely unpredictable.

As the test results indicate, the settings selected by the students (Table 2), who had no age-related hearing loss, varied considerably, from a low of 2.00 to a high of 6.70, a span of 4.70, almost half of the total range of 1 to 10. This test illustrates how the "one size fits all" mentality behind most recorded and broadcast audio falls far short of giving the individual listener the ability to adjust the mix to suit his or her own taste and hearing needs. Furthermore, the students' settings were as varied as those of the older group, demonstrating individual differences in preference and hearing needs. One result of this test is that hearing preferences vary widely.
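The low, high, and span figures quoted for Tables 1 and 2 can be reproduced directly from the tabulated settings. The following sketch is illustrative only; the variable and function names are hypothetical, and the means it prints are computed here rather than stated in the description.

```python
from statistics import mean

# Voice settings copied from Table 1 (older group) and Table 2 (students).
OLDER_GROUP = [7.50, 4.50, 4.00, 7.50, 3.00, 7.00, 6.50, 7.75, 5.50, 7.00, 5.00]
STUDENTS = [4.75, 3.75, 4.25, 4.50, 5.20, 5.75, 4.25, 6.70, 3.25, 6.00, 5.00,
            5.25, 3.00, 4.25, 3.25, 3.00, 6.00, 2.00, 4.00, 5.50, 6.00]


def summarize(settings):
    """Return count, lowest, highest, span, and mean of a list of settings."""
    low, high = min(settings), max(settings)
    return {
        "n": len(settings),
        "low": low,
        "high": high,
        "span": high - low,
        "mean": mean(settings),
    }


for label, data in (("older group", OLDER_GROUP), ("students", STUDENTS)):
    s = summarize(data)
    print(f"{label}: n={s['n']} low={s['low']:.2f} high={s['high']:.2f} "
          f"span={s['span']:.2f} mean={s['mean']:.2f}")

combined = summarize(OLDER_GROUP + STUDENTS)
print(f"combined range: {combined['low']:.2f} to {combined['high']:.2f}")
```

The spans of 4.75 and 4.70 and the combined range of 2.0 to 7.75 match the figures given in the text.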

Further testing on a larger sample confirmed this result. The results also vary with the type of audio. For example, as shown in Fig. 3, when the audio source is music, the ratio of voice to remaining audio varies from near zero to about 10, while when the audio source is a sports program, the same ratio varies from near zero to about 20. In addition, for sports the standard deviation is almost three times, and the mean more than twice, that for music.

The end result of the above tests is that if anyone selects a ratio of preferred audio to remaining audio and permanently fixes it, the resulting audio program is likely to be less than desirable for most people. As mentioned above, the optimum ratio may be a function of time, both short term and long term. Therefore, full control of this ratio of preferred audio to remaining audio is desirable in order to meet the listening needs of "normal" or non-hearing-impaired listeners. Moreover, giving the end user final control of this ratio enables the end user to optimize his or her listening experience.

Independent end-user adjustment of the preferred audio signal and the remaining audio signal is a clear embodiment of one aspect of the present invention. To illustrate the details of the present invention, consider an application in which the preferred audio signal is the relevant voice information.

Generation of a preferred audio signal and a remaining audio signal

Fig. 1 shows a general method for separating the relevant voice information from the general background audio in a recorded or broadcast program. First, the program director must decide what constitutes the relevant voice. An actor, a group of actors, or a live commentator must be identified as the relevant speaker.

Once the relevant speaker is identified, his or her voice is picked up by the voice microphone 301. The voice microphone 301 should be either a close-talking microphone (in the case of an announcer) or a highly directional shotgun microphone of the kind used in sound recording. In addition to being highly directional, these microphones 301 should be band-limited to the voice band, preferably from 200 to 5000 Hz. The combination of directivity and bandpass filtering minimizes the background noise that is acoustically coupled into the relevant voice information at the time of recording. For certain types of programs, the need to prevent acoustic coupling can be avoided by recording the relevant voice dialog off-line and dubbing it onto the video portion of the program where appropriate. The background microphone 302 should be fairly broadband to provide the full audio quality of the background information (such as music).

The camera 303 provides the video portion of the program. The audio signals (voice and background) are encoded in encoder 304 along with the video signal. Typically, the audio signal is separated from the video signal by simply modulating it onto a different carrier frequency. Since most broadcasts are now in stereo, one way to encode the relevant voice information along with the background sound is to multiplex the relevant voice information onto separate stereo channels, in much the same way that front left and front right channels are added to two-channel stereo to produce a four-channel disc recording. While this may create additional broadcast bandwidth requirements, it is not a problem for recorded media, as long as the audio circuitry in the video disc or tape player is designed to demodulate the relevant voice information.

Once the signal is encoded (by whatever means deemed appropriate), the encoded signal is transmitted by broadcast system 305 for broadcast via antenna 313 or recorded on tape or disk by recording system 306. In the case of recorded audiovisual information, the background sound and the speech information may simply be placed on separate recording tracks.

Receiving and demodulating the preferred audio signal and the remaining audio

Fig. 2 shows an exemplary embodiment for receiving and playing back encoded program signals. In the case of broadcast information, the receiver system 307 removes the main carrier frequency from the encoded audio/video signal. In the case of recording medium 314, the heads of a VCR or the laser reader of a CD player 308 produce the encoded audio/video signal.

In either case, these signals may be sent to the decoding system 309. Decoder 309 separates the signal into video, voice audio, and background audio by using standard decoding techniques, such as envelope detection combined with frequency or time division demodulation. The background audio signal is sent to a separate variable gain amplifier 310 so that the listener can adjust to his or her preferences. The voice signal is sent to a variable gain amplifier 311 which may be adjusted by the listener to his or her particular needs, as discussed above.

The two adjusted signals are summed by unity gain summing amplifier 312 to produce the final audio output. Alternatively, the two adjusted signals are summed by unity gain summing amplifier 312 and further amplified by variable gain amplifier 315 to produce the final audio output. In this way, the listener can adjust the level of the relevant voice relative to the background sound to suit his or her unique listening requirements while playing back the audio program, so that the audio program is optimized. Because the listener's hearing may change, the ratio may need to be changed each time the same listener plays back the same audio; the setting therefore remains infinitely adjustable to provide this flexibility.
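The playback chain of Fig. 2 (two variable gains, a unity gain sum, and an optional overall gain) can be sketched numerically as follows. This is a minimal illustration, not the circuit itself: the function name, the use of plain lists of samples, and the linear gain values are all assumptions introduced here.

```python
def mix_voice_and_background(voice, background, voice_gain, background_gain,
                             master_gain=1.0):
    """Model of the Fig. 2 chain on lists of samples.

    voice_gain      plays the role of variable gain amplifier 311
    background_gain plays the role of variable gain amplifier 310
    the addition    plays the role of unity gain summing amplifier 312
    master_gain     plays the role of optional variable gain amplifier 315
    """
    if len(voice) != len(background):
        raise ValueError("voice and background must have the same length")
    return [
        master_gain * (voice_gain * v + background_gain * b)
        for v, b in zip(voice, background)
    ]


# Example: boost the voice by a factor of 2 and halve the background.
output = mix_voice_and_background(
    voice=[1.0, 0.5], background=[0.2, 0.4],
    voice_gain=2.0, background_gain=0.5,
)
print(output)
```

Because the two gains are independent, the listener's voice-to-background ratio is simply `voice_gain / background_gain` times the ratio present in the source material.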

VRA and automatic VRA embodiments

As mentioned above, the preferred ratio of voice to remaining audio is very different for different people and also different for different types of programs (sports versus music, etc.).

Figure 4 shows a block diagram of a VRA system for multiple simultaneous users, in accordance with an embodiment of the present invention. As shown, system 400 includes a transceiver 221, and a plurality of playback devices, such as Personal Listening Devices (PLDs) 220. Although only three PLDs are shown, many more PLDs may be used without departing from the spirit and scope of the present invention.

The transceiver 221 includes a receiver section 223, which receives the broadcast or recorded signal 235, and a transmitter section 222. According to one embodiment of the invention, signal 235 comprises a separate voice component signal and a remaining audio component signal that are transmitted simultaneously to transceiver 221. These signals may be decoded by a decoder (not shown) and then further processed. Alternatively, signal 235 may be processed by system components and circuitry in transmitter 222 such that a separate voice component 239 and a separate remaining audio component 240 are produced.

The separated voice and remaining audio signal components are sent by transceiver 221 to each PLD by wireless or infrared transmission or over multiple wires. The transmitted signal is received by PLD receiver 231, which may be, for example, an infrared receiver, a wireless radio frequency receiver, or a multi-port audio input jack for a wired connection. One output of the PLD receiver 231, the received voice signal 239, is sent to a separate variable gain amplifier 229 so that the end user can adjust it to his or her preference. The other output, the received remaining audio signal 240, is sent to a variable gain amplifier 230, which the listener may adjust to his or her particular listening preferences. These adjusted signals are summed by summer 228 and may be further adjusted by gain amplifier 227 before being forwarded to transducer 226. The transducer converts the electrical signal from the gain amplifier 227 into an audible acoustic signal 232.

As discussed above, the embodiment shown in Fig. 4 transmits two (or more) signals, at least one of which is a voice-only or voice-dominated signal (the voice), while the other contains the remaining audio (which may also contain some voice). Even if the remaining audio contains some voice, the VRA ratio can still be adjusted, though only in the more positive direction, and dialog intelligibility will still improve.

Individual VRA adjustment is possible for multiple users in the same environment if each user listens to the program with a Personal Listening Device (PLD), which may include, but is not limited to, headphones, hearing aids, cochlear implants, assistive listening devices (ALDs), eyewear containing speakers, or headgear. Such eyewear may include, for example, glasses with speakers or a wearable computer. As used herein, a PLD is defined as an audio reproduction device capable of receiving an electrical or wireless signal and converting it into audible sound in a manner that does not disturb other listeners in the same general environment.

After the two (or more) signals are received at the personal listening device, they are adjusted separately by independent volume controls (or other types of controls, as described later) to achieve the VRA preferred by that individual user. The signals are then combined and further amplified, conditioned, and transduced by the personal listening device into audible sound. Since a personal listening device does not interfere with other listeners in the same listening environment, who may also have personal listening devices (with different preferred VRA settings), multiple listeners in the same environment can independently adjust the VRA to their own listening preferences. This is made possible by the fact that the signals are transmitted to every listener simultaneously (wirelessly or by wire). One possible application of this technology is in public movie theaters. Multiple listeners may enjoy the same movie with independent VRA adjustment on their headphones, ALDs, hearing aids, or other personal listening devices as discussed above. Fig. 4 illustrates these concepts, which are described in the following sections.
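The multi-listener arrangement of Fig. 4 can be modeled as one pair of signals delivered to several devices, each applying its own gains. The sketch below is an illustration under stated assumptions: the class `PersonalListeningDevice`, the function `broadcast`, and the linear gain values are hypothetical names and values, not part of the disclosure.

```python
from dataclasses import dataclass


@dataclass
class PersonalListeningDevice:
    """One PLD with its own voice gain, remaining-audio gain, and volume."""
    voice_gain: float = 1.0       # role of variable gain amplifier 229
    remaining_gain: float = 1.0   # role of variable gain amplifier 230
    volume: float = 1.0           # role of gain amplifier 227

    def render(self, voice, remaining):
        """Adjust the two signals separately, then sum and scale them."""
        return [
            self.volume * (self.voice_gain * v + self.remaining_gain * r)
            for v, r in zip(voice, remaining)
        ]


def broadcast(voice, remaining, devices):
    """Deliver the same two signals to every device simultaneously."""
    return [device.render(voice, remaining) for device in devices]


# Two listeners in the same room with different preferences.
listeners = [
    PersonalListeningDevice(voice_gain=2.0, remaining_gain=0.5),
    PersonalListeningDevice(voice_gain=1.0, remaining_gain=1.0),
]
outputs = broadcast(voice=[1.0], remaining=[1.0], devices=listeners)
print(outputs)
```

Each device works on its own copy of the signals, so one listener's setting has no effect on what another listener hears.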

Transmission to the listener

In order for each end user to independently adjust the level of the voice relative to the remaining audio, the signals must arrive at the personal listening device either separately or in a form (possibly encoded) that allows the two signals to be separated before they are independently adjusted. The transmission of both signals may be accomplished, for example, using FM stereo transmission, where the voice is transmitted on one channel (left or right) and the remaining audio on the other. If stereo programming is also desired at the PLD, more complex multi-channel transmission is required. If both the voice and the remaining audio carry spatial information, four-channel transmission (either wired or wireless) and reception are required to deliver the multi-channel program to the end user.

Fig. 5 shows one possible embodiment of such multi-channel transmission in accordance with the principles of the present invention. The left and right voice programs are multiplexed (or otherwise encoded) together by a multiplexer 9, and the left and right programs of the remaining audio are likewise multiplexed by a multiplexer 10. This allows a two-channel transmission, sent by one transmitter 11 over wired or wireless means 12, to be received by the stereo receiver 13. The four signals are then recovered and independently adjusted 16 to form a total left 17 and right 18 program with spatial information from both the voice and the remaining audio signals. There are many possible ways to transmit these signals for individual adjustment while maintaining spatial information. Other methods may include transmitting the left and right remaining audio programs along with a separate monophonic voice channel (since the voice information is primarily non-spatial).

Center channel adjustment

As an extension of the above discussion, transmission of the center channel of a multi-channel program is also relevant to VRA adjustment capability. For most multi-channel programs, the center channel contains most of the dialog in the movie, while most sound effects and music are directed into one or more of the remaining 4.1 audio channels. Currently, pure voice channels are not available to the general public. Therefore, until a pure voice channel becomes available to the general public for most broadcasts and recordings, the center channel can be used as the voice channel described above. Thus, the receiver in Fig. 4 may be a multi-channel sound decoder, such as a Digital Theater Systems (DTS), Sony Dynamic Digital Sound (SDDS), Dolby Digital, or other multi-channel format decoder. Such a decoder 19, as shown in Fig. 6, converts the digital input into left, right, left surround, right surround, center, and subwoofer analog outputs. The mixer 20 may combine all channels except the center (varying the proportions according to the desired spatial effect) to produce a single remaining audio program, either stereo or mono, which is transmitted 21 separately from, but simultaneously with, the center channel, the latter approximating a dialog-only channel. Reception may be carried out as shown in Fig. 4.
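The role of the mixer in Fig. 6 can be illustrated with a small downmix function that keeps the center channel as the voice signal and combines the other decoded channels into one remaining audio signal. This is a sketch under stated assumptions: the channel keys, the function name, the mono output, and the default weights of 1.0 are hypothetical choices, since the description leaves the mixing proportions to the desired spatial effect.

```python
# Channel keys assumed here for one decoded 5.1 frame (one sample per channel).
NON_CENTER_CHANNELS = ("L", "R", "Ls", "Rs", "LFE")


def split_center_and_remaining(frame, weights=None):
    """Return (voice, remaining) for one frame of decoded 5.1 audio.

    The center channel "C" is used as the approximate dialog-only signal.
    All other channels are combined into a mono remaining-audio sample,
    each scaled by an optional per-channel weight (default 1.0).
    """
    weights = weights or {}
    voice = frame["C"]
    remaining = sum(
        weights.get(channel, 1.0) * frame[channel]
        for channel in NON_CENTER_CHANNELS
    )
    return voice, remaining


frame = {"L": 0.25, "R": 0.25, "Ls": 0.25, "Rs": 0.25, "C": 0.9, "LFE": 0.0}
voice, remaining = split_center_and_remaining(frame)
print(voice, remaining)

# Example of changing the proportions: attenuate the surround channels.
_, quieter_surround = split_center_and_remaining(frame, {"Ls": 0.5, "Rs": 0.5})
print(quieter_surround)
```

The two returned values correspond to the two signals that would then be sent separately and simultaneously to each listening device.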

Decoder in a personal listening device and measures for spatial processing

It should be noted that although the embodiment of Fig. 4 in combination with Fig. 6 sends an analog signal to a PLD from a centrally located receiver or multi-channel decoder, this does not exclude the case in which the multi-channel decoder is included in the PLD and the transmitted signal is a digital signal that must be decoded in order to extract the voice and the remaining audio. Fig. 7 shows this concept. Digital signals read from a media source (e.g., DVD, CD, TiVo, or other replay video recorder) or received from a broadcast (e.g., digital television or digital radio) are sent 22 directly to PLD 28. The PLD has a built-in receiver 23 for receiving infrared, wireless, or other broadcast signals, which are fed to a decoder 24 designed to meet the decoding specifications of the compressed format (e.g., Dolby Digital or DTS) it is to handle. The mixer 26 uses the output of the decoder to produce the remaining audio and voice signals (or pure voice or center channel signals), which are individually adjusted by the user using gain amplifiers and/or attenuators 25, then recombined as described above and transduced 27 into audible sound, which is the output of the PLD.

This particular embodiment may be more convenient for implementing multi-channel audio rendering at the PLD, since the transmitted signal is digital (less subject to interference noise) and only one transmission channel is required; however, it may be more expensive because the decoding is done at each individual PLD rather than at a central location. Multi-channel rendering may include any signal processing that spatially repositions the left, right, left surround, right surround, and/or center audio presentation so that it sounds more natural on a PLD such as headphones. VRA adjustment is intended to work in conjunction with this type of processing in order to provide improved dialog intelligibility without affecting any spatial processing other than the user-adjusted VRA mix.

Fig. 8 provides further details of one possible embodiment, which includes signal processing for multi-channel rendering. Depending on the preferred embodiment for implementation, receiver 29 and decoder 30 are centrally located or located on the PLD. The center channel or other pure speech channel is separately adjusted 31 before spatial processing, as is the level adjustment 32 of all remaining audio. The spatial processing 33 then receives the multi-channel presentation (or in some cases the two-channel presentation) as it was previously recorded, and then generates more realistic sound conditions for the PLD. In this case, the spatial processing 33 is not affected by the VRA adjustment, but the user can still select a desired level of speech relative to the remaining audio.

VRA purpose-enabled "volume control" (attenuator)

There are many possible embodiments of the physical adjustment mechanism and of the overall volume control for the voice and remaining audio. The most general adjustment mechanism is one in which the voice has a user-adjustable gain, the remaining audio has a user-adjustable gain, and the summed signal has a further gain adjustment (total volume control). Fig. 4 shows this in detail. Another embodiment provides a more user-friendly adjustment mechanism, with fewer steps when the user sets a comfortable VRA ratio in addition to the desired total sound level. In most entertainment programs, the dialogue is the target sound around which the program is centered. Therefore, as shown in fig. 9, the dialogue level governs the program loudness, i.e., the overall program level is typically set according to the dialogue level. With only two controls (a total volume adjustment and a remaining-audio attenuator), the user can therefore select the desired VRA and total volume level through a simple two-step process. First, the total sound level is set with the total volume adjustment 37 (fig. 9), which sets the level of the voice in the main program. At this point, the dialogue is at the desired listening level and only the VRA needs to be set. With a single attenuator acting on the remaining audio, intelligibility can in theory be improved up to 100% by reducing the remaining audio without affecting the voice level. In addition, the attenuator may be implemented as a variable voltage divider, which requires no power, while still allowing the user to set the VRA ratio to any value greater than 0 dB. To save additional power, the main volume adjustment 37 following the adder 36 can also be implemented as an attenuator. If attenuator 37 passes the entire signal without voltage division, amplifier 38 must be designed with sufficient gain to drive transducer 39 at the loudest volume level. As another example, volume adjustment 35 may be placed on the voice rather than on the remaining audio, allowing the user to control the total program level as a function of the remaining audio rather than of the dialogue. Placing an attenuator on the voice is undesirable, however, because it cannot achieve a positive VRA ratio. Instead, if this embodiment is implemented, an active gain stage must be placed on the voice so that its level can be raised far enough above the unaffected remaining audio to provide a sufficiently positive VRA ratio. The overall loudness is then controlled by the total volume control as before.
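The two-control arrangement of fig. 9 can be illustrated with a minimal sketch, assuming sampled signals and attenuations expressed in dB. The function names and the dB parameterization are assumptions made for the example; the description itself concerns analog attenuators.

```python
from typing import List


def db_to_linear(db: float) -> float:
    """Convert a level change in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def mix_with_attenuator(
    voice: List[float],
    remaining: List[float],
    remaining_atten_db: float,
    total_atten_db: float,
) -> List[float]:
    """Attenuate only the remaining audio, sum with the voice, then attenuate the sum.

    Both controls are passive attenuators, so neither value may be negative
    (a negative attenuation would be a gain, which needs an active stage).
    """
    if remaining_atten_db < 0 or total_atten_db < 0:
        raise ValueError("a passive attenuator cannot add gain")
    g_remaining = db_to_linear(-remaining_atten_db)
    g_total = db_to_linear(-total_atten_db)
    return [g_total * (v + g_remaining * r) for v, r in zip(voice, remaining)]
```

Because the voice path is untouched before the adder, the VRA relative to the original mix equals `remaining_atten_db` and is therefore never negative, which matches the observation that an attenuator on the remaining audio only raises the VRA.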

Ratio balancing implemented with a single scale

Another embodiment according to the invention for VRA and total volume adjustment uses a single VRA knob acting on two inputs. Unlike the dual-knob level controls and the single-knob attenuator, this single knob adjusts the balance between the voice and the remaining audio. Fig. 10 shows the voice and remaining audio being additively and continuously adjusted by a single control 40, and further adjusted by an overall audio gain control (active or attenuator) 41. Balance controls are well known from the front-to-back fade or left-to-right balance adjustment in car or home stereo systems. The key difference in this application is that the control adjusts the ratio of content rather than the positioning of the audio on the individual speakers. In fact, a further balance-type control may be implemented to adjust the spatial positioning of the audio, if desired. With a single-knob VRA control, the user can adjust the VRA over its full range using one knob (all ratios are available). The total volume may then be adjusted to the desired sound level.
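A balance-type VRA control of this kind might be sketched as below. The knob range of -1 to +1, the linear taper, and the convention that negative positions reduce the remaining audio are all assumptions made for illustration; the description does not specify a taper.

```python
from typing import List, Tuple


def balance_gains(position: float) -> Tuple[float, float]:
    """Map a knob position in [-1, 1] to (voice_gain, remaining_gain).

    0 leaves the original mix untouched, -1 leaves only the voice,
    +1 leaves only the remaining audio (linear taper assumed).
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must be in [-1, 1]")
    voice_gain = 1.0 - max(position, 0.0)
    remaining_gain = 1.0 - max(-position, 0.0)
    return voice_gain, remaining_gain


def single_knob_mix(
    voice: List[float],
    remaining: List[float],
    position: float,
    total_gain: float,
) -> List[float]:
    """Apply the balance control, sum, then apply the overall gain control."""
    g_v, g_r = balance_gains(position)
    return [total_gain * (g_v * v + g_r * r) for v, r in zip(voice, remaining)]
```

A single position value sets the ratio of content, and `total_gain` plays the part of the separate overall control 41.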

Automatic VRA

The automatic VRA hold feature allows the end user not only to adjust the preferred ratio of voice to remaining audio, but also to "lock" this ratio when transient volume changes occur in either the voice or the remaining audio. For example, a football game contains dialogue from the announcer and background noise from the fans. If the desired VRA is set at a time when the fans are relatively quiet, the crowd noise may mask the announcer's voice when the fans become louder (while the announcer remains at the same level). Likewise, if the VRA is set while the announcer is speaking very loudly, the announcer's voice may become too quiet to be intelligible when the announcer returns to a normal speaking volume.

Standard deviation based VRA technology

To spare the user from constantly adjusting these levels, the user may press a button after the ratio has been set; the ratio is then stored and maintained for the rest of the program. One method for doing this is to calculate and store the standard deviation of the voice signal and of the remaining audio signal at the moment the button is pressed. The standard deviation of each signal then continues to be calculated in real time as the program progresses. If the actual deviation exceeds the stored value, the signal is multiplied by the ratio of the stored value to the actual value, thereby reducing the volume. Likewise, if the actual deviation is much lower than the stored value, the signal may be multiplied by the same ratio in order to boost the level. If boosting is desired (when the actual deviation is lower than the stored deviation), segments in which no signal is present must be detected so that the noise floor is not amplified unnecessarily; if the actual deviation is close to zero, the ratio can approach infinity. The most general form of the automatic VRA method discussed herein is represented by the following formula, where:

G₁ = volume control for speech;

G₂ = volume control for the remaining audio;

G₃ = total volume control;

V = speech;

RA = the remaining audio;

σV_actual = actual standard deviation of the speech;

σR_actual = actual standard deviation of the remaining audio;

σV_stored = stored standard deviation of the speech;

σR_stored = stored standard deviation of the remaining audio.
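The formula itself is not reproduced in this text. One expression that is consistent with the symbol definitions and with the multiply-by-ratio procedure described for this method is given here as a reconstruction under those assumptions, not as the original expression:

```latex
\text{Output} = G_3 \left[ G_1 \, V \, \frac{\sigma_{V,\text{stored}}}{\sigma_{V,\text{actual}}}
              + G_2 \, RA \, \frac{\sigma_{R,\text{stored}}}{\sigma_{R,\text{actual}}} \right]
```

In this form each signal is scaled by the ratio of its stored to its actual standard deviation, the user gains G₁ and G₂ set the voice and remaining audio levels, and G₃ sets the total volume of the sum.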

The standard deviation of each individual signal (speech and remaining audio) is stored and compared with the actual standard deviation in real time. Here, the standard deviation is used as a measure of the level of each signal. Other metrics may also be used, including the peak level over a time interval. To control the volume adjustment and its effect on the overall output, it may be desirable to calculate the standard deviation after the gains G₁ and G₂ have been applied to the signals. The result is slightly different: once the standard deviations are stored, further adjustment of the dialogue relative to the remaining audio is disabled until new values are stored. If this is the desired behavior, the deviation calculation for V (speech) and RA (remaining audio) should include the user-selectable gains G₁ and G₂. If further adjustment is desired, the gains may instead be applied after the deviation calculation and multiplication described above.

Fig. 11 shows these concepts in more detail. The voice and the remaining audio signals each undergo the same operations. It should be noted that a simpler and very effective implementation of this concept removes the operations performed on the voice signal and modifies only the remaining audio when its standard deviation changes. This halves the required computational overhead, on the assumption that level changes in the voice channel are smaller than those likely to occur in the remaining audio channels. In any event, the most general embodiment is shown in fig. 11, which shows the operations on both the voice and the remaining audio. The dashed lines represent an alternative option, which need not be used together with the solid lines but which provides the difference in behavior described in the previous paragraph. The user-adjustable dialogue gain 45 may be applied before (using unit 46) or after (using unit 44) the standard deviation calculation. When the user selects the desired behavior at a given moment, the standard deviations of the voice and of the remaining audio are stored in storage units (47 and 47A), which may be volatile or non-volatile memory. The stored value is used as the numerator in the multiplication 48 and 48A applied to each signal, while the denominator is the current actual standard deviation, taken before or after the user-adjustable gain stage. (It should be noted that the solid and dashed lines are not implemented simultaneously.) The test that decides whether the current ratio is greater than or less than 1 is not shown. If the ratio is less than 1, the current actual level is greater than the stored level, and the volume should be reduced by this ratio. If it is greater than 1, it is preferable to perform no operation and to pass the signal with only the user-adjustable gain applied (this requires an "if"-type statement that checks the ratio and makes the decision). Doing so prevents a very large ratio from multiplying a low-level signal, which would produce a very high noise level on quiet channels. Also, a lower limit may be set by another condition, so that moderately low levels are amplified accordingly while very low (or absent) signal levels are left unmodified or are modified with the last ratio in effect before the condition was violated.
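The ratio test described for fig. 11 can be sketched for one signal path as follows. The block-based processing, the silence threshold `floor_sd` and the function names are assumptions made for this example; only the attenuate-when-louder branch is implemented, matching the case in which ratios greater than 1 are passed through unchanged.

```python
from statistics import pstdev
from typing import List


def hold_gain(stored_sd: float, actual_sd: float, floor_sd: float = 1e-4) -> float:
    """Return the multiplier stored/actual, applied only when it lowers the level.

    Near-silent blocks (actual_sd below the assumed floor) are left alone so
    that a near-zero denominator never produces a huge ratio.
    """
    if actual_sd < floor_sd:
        return 1.0
    ratio = stored_sd / actual_sd
    return ratio if ratio < 1.0 else 1.0


def process_block(block: List[float], stored_sd: float, user_gain: float = 1.0) -> List[float]:
    """Measure the block level, apply the hold multiplier, then the user gain."""
    actual_sd = pstdev(block)
    g = hold_gain(stored_sd, actual_sd)
    return [user_gain * g * s for s in block]
```

Running the same function on the voice and on the remaining audio, each with its own stored value, corresponds to the two parallel paths of the figure; dropping the voice path gives the simpler variant mentioned above.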

Storage of different VRAs and automated VRAs

Storing the user's preferred ratio levels in 47 and 47A is advantageous for controlling, in hardware or software, the sound of different types of programs or for different listeners. Since users will prefer different VRAs, and the individual audio levels may vary between types of programs, it is considered necessary to provide multiple storage locations for different types of programs and for different users. For example, attaching a name or password to each storage location would allow different users to recall different VRAs for a particular program. Depending on the method used from fig. 11, the stored items may include a desired voice level, a desired voice standard deviation, a desired remaining audio level, and/or a desired remaining audio standard deviation. This allows the user to return to the playback device with the same settings (perhaps different settings for sports than for series) without readjusting the VRA levels and resetting the hold feature. There is no specified limit on the number of storage locations that can be provided on a playback device. Fig. 11 shows the user adjustment as a button that selects the current standard deviation as the stored standard deviation. In addition, the user has control over G₁, G₂, and G₃. There are several ways to provide these adjustments to the end user, depending on the hardware on which they are used. For example, a headset may have several buttons for storing different ratios and for selecting a ratio depending on how long a button is pressed. If these controls are used in conjunction with a personal computer, personal digital assistant, or cellular telephone, they may be graphical user interface controls implemented in software. To simplify the adjustment further, it is possible to combine all the adjustments (VRA and automatic VRA ratio hold) into a single control. The ratio of remaining audio to voice can be controlled by a single balance control as shown in fig. 10. However, to implement the automatic VRA feature described in fig. 11, the function of the knob must be modified by adding a ratio hold.
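Multiple storage locations keyed by user and program type might be organized as in the following sketch. The class names, field names and the (user, program type) key are assumptions chosen for illustration; the description only requires that several named settings can be stored and recalled.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class VraPreset:
    """One stored setting: levels plus the standard deviations used for the hold."""
    voice_gain: float
    remaining_gain: float
    voice_sd: float
    remaining_sd: float


class PresetStore:
    """Keeps one preset per (user, program type) pair, with no fixed slot limit."""

    def __init__(self) -> None:
        self._slots: Dict[Tuple[str, str], VraPreset] = {}

    def save(self, user: str, program_type: str, preset: VraPreset) -> None:
        self._slots[(user, program_type)] = preset

    def recall(self, user: str, program_type: str) -> Optional[VraPreset]:
        # Returns None when nothing has been stored for this combination.
        return self._slots.get((user, program_type))
```

A listener could then recall a sports setting and a series setting separately without readjusting the VRA or resetting the hold.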

Fig. 12 is a diagram showing the functions of a slide control designed to provide all of these functions in a single control. (It should be noted that this could be any type of control, including a rotary knob, a software control, incremental push buttons, etc.; the function is the same.) The center position of the VRA/auto VRA control provides the user with the original mix, in which the voice is approximately equal to the remaining audio. As the knob is gradually moved to the left, the voice level does not change, but the remaining audio begins to decrease without the hold function. Beyond a certain predetermined distance from the center position (point N, which may be as small as zero if desired), the value against which the standard deviation is compared begins to decrease as the knob is moved, i.e., the remaining audio begins to be compressed. This continues until the stored standard deviation (which varies with knob movement) becomes so small that the division yields a number close to zero, the multiplied output is substantially zero, and only the voice remains. At the other end of the knob's travel, the opposite occurs with respect to the remaining audio.

Fig. 13 shows a block diagram of the all-in-one knob shown in fig. 12. Once the knob passes point N on the left side of the scale, the knob controls the stored value of the standard deviation for the remaining audio. Likewise, the stored standard deviation of the voice is adjusted when the knob is moved to the far right. One possible alternative to the embodiment shown in fig. 13 is to eliminate the automatic VRA control for the voice, so that only the actual voice level is reduced when the knob is moved to the left. (Arguments for such an embodiment are given in the preceding paragraphs.) Referring to the block diagram of fig. 13, the remaining audio standard deviation 52 is calculated and compared 53 with the stored remaining audio standard deviation 56, which is controlled by moving the knob 57 to the left of point N. If the actual standard deviation exceeds the stored standard deviation, the remaining audio is multiplied by the stored value and divided by the actual value before the remaining audio volume level 55, which is also controlled by the main knob 57, is applied. If it does not, the remaining audio is simply multiplied by the current knob setting 55 and then combined with the adjusted voice. When the knob is moved further to the right, the dialogue track is processed in the same way. Such a single-knob adjustment is particularly useful in applications where space is limited, such as headsets or hearing aids. It provides all the functions of multiple controls while requiring only a single knob for all adjustments.
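The left half of the combined control can be sketched as below. The linear mappings, the choice to hold the level gain constant once point N is passed, and all function and parameter names are assumptions for illustration; the description fixes only the order of operations (compare, scale by stored/actual, then apply the knob level).

```python
from typing import List, Optional, Tuple


def knob_to_remaining_control(
    position: float, n: float, reference_sd: float
) -> Tuple[float, Optional[float]]:
    """Map a left-half knob position in [-1, 0] to (level gain, stored SD or None).

    Up to distance n from center only the level gain falls and the hold is off.
    Past n the stored SD shrinks linearly from reference_sd to 0 at full travel.
    """
    if not -1.0 <= position <= 0.0:
        raise ValueError("position must be in [-1, 0]")
    if not 0.0 <= n < 1.0:
        raise ValueError("n must be in [0, 1)")
    travel = -position
    if travel <= n:
        return 1.0 - travel, None
    fraction = (travel - n) / (1.0 - n)
    return 1.0 - n, reference_sd * (1.0 - fraction)


def apply_remaining(
    block: List[float], actual_sd: float, gain: float, stored_sd: Optional[float]
) -> List[float]:
    """Scale by stored/actual when the actual SD exceeds the stored SD, then apply the gain."""
    if stored_sd is not None and actual_sd > stored_sd:
        ratio = stored_sd / actual_sd  # actual_sd > stored_sd >= 0, so never zero
        block = [ratio * s for s in block]
    return [gain * s for s in block]
```

At full left travel the stored value reaches zero, so the remaining audio is multiplied by zero and only the voice is heard, as described for fig. 12.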

Additional VRA consumer applications

Other VRA consumer applications may include:

a portable "backing box" which receives and transmits conditioned and/or controlled signals to the acoustic transducer

Retrofittable devices that enable VRA adjustment for in-flight movies

Separate audio decoders that can be combined with existing home theater hardware to provide additional VRA adjustment for multi-user applications

Headsets with VRA adjustment, and remote controls with VRA adjustment capability.

The VRA hardware is specifically designed to provide VRA regulation capability, and the description in the next section illustrates how the VRA hardware can be integrated into existing audio reproduction hardware. However, this is not limiting as the hardware described in this section must be used in conjunction with existing audio reproduction hardware. Indeed, it will be seen that the VRA specific hardware is designed to specifically interface with existing audio reproduction hardware such as a television or home cinema system. It should also be noted that each of the specific embodiments discussed in the previous section can be directly applied to each of the inventions discussed in this section, forming a new user friendly invention for adjusting the VRA ratio. For example, the first invention discussed in this section would be a portable electronic component that can receive two or more signals, one being voice and the other remaining audio, combine and condition these signals, and retransmit them to, for example, an unobtrusive headset, ALD, hearing aid, cochlear implant, auxiliary listening device, goggles containing a speaker, or headset. The single knob invention discussed in the previous section and shown in detail in fig. 9 may be included in this portable unit, providing a single adjustment capability to the end user in a portable form. However, it is not intended that each combination of the various combinations of techniques be discussed in detail by way of example, but rather that inferences be made from the description of the VRA method above described by way of example (acting on two signals) in connection with VRA hardware receiving the two signals.

Portable voice-to-remaining audio (PVRA) device

As used herein, PVRA refers to a portable VRA device used in a wide variety of environments in conjunction with a standard PLD (personal listening device, such as headphones, an ALD, a hearing aid, a cochlear implant, an assistive listening device, goggles containing speakers, or a head-mounted device). The PVRA device is capable of receiving a wireless (or wired) transmission from a source providing at least two signals, one being pure or nearly pure dialogue and the other being the remaining audio. (More channels may also be included for further spatial localization capability, as described in the previous section.) The transmission 58 of fig. 14 may be standardized to a certain bandwidth and low power so that the PVRA device 59 may be used in a variety of environments. This band may be 900 MHz for radio frequency transmission, or the transmission may be standardized as a line-of-sight infrared type. Once providers agree on a standard wireless transmission format, gathering places (such as churches) and movie theaters can send the voice and the remaining audio to the audience. The PVRA may be a general-purpose unit designed to receive 60 these signals, adjust the voice 61, 68 separately from the remaining audio 62, 69, combine them to form the overall program 64, 70, and retransmit it via a wired or wireless connection to a PLD 67 having a receiver 65 and a transducer 66 that converts the signal into audible sound. The adjustment methods, described in detail in the previous section, include variable gain amplifiers or attenuators and may also include the automatic VRA hold capability. The PVRA box can be a standardized component that works with many existing PLDs by including a jack for a stereo headset, such as a 1/4" jack, at the transmitter stage 63. This embodiment also requires a headphone amplifier in the PVRA device. In addition, a standardized jack for connecting a wired hearing aid to the PVRA may be included, as an example. To standardize the PVRA and home theater equipment, all that is needed is a stereo transmitter in which one channel is the voice and the other channel is the remaining audio, and a receiver tuned to receive both signals.

As an adjunct to the above description of the PVRA device, another device disclosed herein provides VRA/auto VRA adjustment for use with in-flight movie viewing on airlines. The intelligibility of film dialogue can be particularly poor in flight, when background noise from the aircraft further obscures the dialogue. By giving the end user the ability to adjust the voice and the remaining audio separately, improved intelligibility during flight can be achieved, and this can be done without interfering with the existing infrastructure for audio transmission. It must be assumed that the audio source (VCR, DVD, broadcast, or other audio source) has a dialogue track separate from the remaining audio track. This can be done in several ways, one being to use the center channel of a multi-channel format, or alternatively to use a pure voice track, which may be provided by several audio compression standards. (Generating a pure voice track is not the focus of the present invention; the focus is the hardware and implementation used to adjust it and deliver it to the end user.) The audio delivery infrastructure of an aircraft includes a stereo (2-channel) path to the end user, which can be implemented either by (1) electronic transmission to each seat armrest (requiring a standard headset with an aviation-standard connector) or (2) a waveguide system with tiny speakers in the armrest that, when connected to plastic tubing, deliver the sound to the user's ears. To implement the retrofit aircraft VRA armrest adjustment device, it must be assumed that the voice is sent on either the left or the right channel, while the remaining audio is sent on the other channel, to all of the aircraft's armrests. While this removes the stereo effect, it is seen as offering a potential improvement in overall program enjoyment at minimal sacrifice. In addition, aircraft noise often masks subtle stereo effects in in-flight entertainment. Since the electronic and waveguide methods are so different, two different embodiments are required, which are shown in fig. 15 and 16, respectively. However, if a universal adjustment mechanism is desired, the components of fig. 15 and 16 may be combined into a single hardware unit that can be used with any aircraft armrest.

Fig. 15 is a diagram of a version of the aviation VRA box for the electronic connection in an aircraft armrest. The plug 71 may be, for example, a standardized male plug designed to fit into the armrest and connect to the left and right signals sent from a central location. These signals are then adjusted 72, 73, and 74, 75 to achieve the preferred VRA ratio between the voice (left) and the remaining audio (right). The adjusted signals are then combined to form the overall audio program, which is further adjusted by 77 and 78. An amplifier 79 is required to provide power to the transducers in the PLD. The output 80 of the aviation VRA box contains a connector matching the female connector already present in current armrests, for example, so that airlines can continue to use their existing headphones as the PLD.

Fig. 16 shows an aviation VRA box that can be used in conjunction with the waveguides in existing armrests. In order to adjust the level of each of the two signals (voice and remaining audio), the signals must be converted back to electronic form. Two microphones 82, 83 and microphone amplifiers 84, 85 are placed in the device; they measure the output of the armrest speakers that normally drive the waveguide. The outputs of the amplifiers are the electronic signals representing the voice and the remaining audio. These signals are independently adjusted by 86, 87 and 88, 89 and combined to produce a total signal 90. The total signal is further adjusted to set the overall sound level 91, 92 and is used to drive another loudspeaker 93. The waveguide and output plug 94, which are identical to those in the armrest, form the output of the aviation VRA box, so that standard waveguide-style headphones can be used with the present design.

Wireless transmitter for sending two signals, in a DVD player, television set and the like

Another application for multi-user VRA adjustment arises in home theater, home television and movie viewing. There are typically multiple viewers in a single room, with different preferences for the ratio of voice to remaining audio. The present invention allows multiple signals to be provided to PLDs worn by multiple users so that each person can adjust the VRA (and automatic VRA) to his or her preference. As previously mentioned, the audio source (e.g., a television broadcast, DVD player, etc.) contains at least one track that can be considered pure voice or primarily voice, and at least one track that can be considered to contain the remaining audio. (Different audio standards and formats may support some form of pure voice track in the future.) The present invention addresses the need for multiple users in the same listening environment to access at least two signals separately and simultaneously, so that each can set the ratio to his or her own listening preference. In a first and most preferred embodiment (from a space and cost standpoint), a multi-channel wireless transmitter is located within the audio reproduction hardware (such as a television or DVD player) and transmits the voice and the remaining audio separately, so that a listener with a VRA-capable PLD able to receive the transmission can adjust the VRA independently. In addition to placing the wireless transmitter within the audio reproduction device, separate audio output jacks may be provided that give access to the remaining audio (mono or multi-channel) and the pure voice (mono or multi-channel) for wired adjustment by a PLD without wireless reception capability. In the case of a television, DVD player, or other device without a wireless transmitter but with audio jacks, the user may connect a separate multi-channel wireless transmitter to those output jacks to provide separate audio signals to the listeners. Fig. 17 shows these concepts. Signal source 96 delivers an encoded or modulated version of the entire program, which may also include video information if available. The signal source may comprise, for example, a television broadcast signal (via satellite, cable, or terrestrial transmission) or a coded DVD or CD signal read by a laser. This information must be received and decoded before it becomes an electrical signal representing the audio information. The decoder is able to extract the pure voice channel or channels (if present at the signal source) and keep them separate from the remaining audio channels. After the receive/decode stage 98, there are two options for the separate voice and remaining audio signals: (1) they can be made available at separate audio output jacks 97, such as microphone-type connectors, or (2) they can be sent to a multi-channel wireless transmitter 99 also installed in the playback device 95. The playback device 95 may be a DVD player, with the signal source internal to the device, or a television, with the signal source external. If the signals are made available as hardware outputs, these outputs can be sent to a separate external multi-channel transmitter 100 that communicates with the receiving PLDs, providing wireless reception and VRA adjustment at the PLD for all users in the same listening environment.

Additional VRA decoder for use with other VRA-incompatible systems

As an alternative embodiment, it is also possible to design a dedicated decoder for VRA applications. This would allow users who do not currently own a VRA-capable decoder/transmitter to have access to VRA capability without upgrading any particular component (i.e., without losing their current investment). Suppose that DVDs, broadcast television, or broadcast radio begin to carry a coded pure voice channel. Current receiving equipment can neither receive and extract this information nor provide VRA features to any individual, let alone to multiple users in the same environment. The device 111 shown in fig. 18 provides all of these capabilities to a user who has access to a signal source with a separate voice track but has no means of extracting it and adjusting the VRA ratio. The signal source 101 may be as before (television broadcast, DVD information, etc.) and may be sent directly to the primary reproduction system 102, which may be a TV or DVD player that cannot provide VRA adjustment because it was not originally equipped with such a feature. The same signal source is also tapped off to the external VRA box 111, in which the appropriate receiver or decoder is installed to extract and separate the voice and the remaining audio from the signal source according to the standard that supports the type of data present in the signal. Such a decoder may be a Dolby Digital decoder capable of extracting the hearing impaired mode, but the invention is certainly not limited to this particular decoder. It is likely that other popular audio formats will someday provide a means of transmitting a pure voice track in addition to the existing audio. The device 111 will then include the appropriate decoder for the desired audio format. After the voice track has been decoded and separated from the remaining audio track, there are three options for the external device. First, the device may provide hardware outputs 104, 110 in the form of audio jacks (such as RCA-type or phone-type connectors) that supply line-level signals to a VRA-capable transmitter or wired PLD; second, the device may include a multi-channel transmitter 105 for wirelessly transmitting the separated voice and remaining audio signals to PLDs capable of VRA adjustment; or third, it may provide VRA adjustment for a single-user application 107 directly on the device, where the voice is adjusted separately from the remaining audio, the two are summed, and the sum is further adjusted for overall volume before being provided as output 109 to any other audio playback component. It should be noted that although only a single adjusted output (the overall mono output) is shown in fig. 18, it is within the scope of the present invention to provide several outputs for multi-channel spatial positioning of the audio (as mentioned in the previous description). For example, if a 5-channel remaining audio program (left, center, right, left surround, and right surround) is provided along with a 5-channel (or fewer) voice program, the channels may be combined after level adjustment, using separate volume adjustments, so that the voice may be sent to any speaker or directly to the center speaker (as is typically the case). This still provides VRA adjustment capability to the end user while also providing a full surround sound experience.
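The multi-channel case at the end of this passage, in which a separately adjusted voice is added to one speaker feed of a surround remaining-audio program, can be sketched as follows. The dictionary layout, the mono voice input and the parameter names are assumptions for the example only.

```python
from typing import Dict, List


def mix_voice_into_surround(
    remaining: Dict[str, List[float]],
    voice: List[float],
    voice_gain: float,
    remaining_gain: float,
    voice_target: str = "center",
) -> Dict[str, List[float]]:
    """Scale every remaining-audio channel, then add the scaled voice to one channel."""
    out = {
        name: [remaining_gain * s for s in samples]
        for name, samples in remaining.items()
    }
    out[voice_target] = [
        o + voice_gain * v for o, v in zip(out[voice_target], voice)
    ]
    return out
```

The spatial layout of the remaining audio is preserved because each channel is only scaled, while the user still controls the voice level independently.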

Reception and VRA for use in connection with existing wireless transmission of mixed (video and audio) DVD signals

A new type of product is being developed that allows consumers to enjoy DVD video and audio information from remote locations. This wireless technology delivers audio and video information from a remote DVD player to a television or home theater. It allows the owner of a DVD drive in a personal computer to use that drive to view the contents of a DVD at a location away from the computer. VRA adjustment at the PLD and at the centralized home theater can be used with wireless DVD technology in two ways. First, since the DVD player already transmits a wireless audio signal to the home theater system, the PLD may be equipped with a wireless receiver set to the same frequency so that the PLD can intercept the same transmission. Since the video information is needed only at the viewing location and not at the PLD, the audio can be selectively decoded for reproduction at the PLD. It is important to note that reception of the wireless signal is followed by a decoding process that extracts the voice and the remaining audio from the wireless DVD signal. This is followed by adjustment, recombination, and transduction of the signal into audible sound, including full adjustment of the voice level, the remaining audio level, the total sound level, and any automatic VRA features. All of the hardware mentioned above would be placed in the personal listening device so that each user can adjust the VRA and sound level to his or her preference; such a system is shown in fig. 4, where the transmitter transmits the entire DVD signal but the PLD decoder is designed to extract only the audio from the incoming bitstream.

A second embodiment of VRA used in conjunction with wireless DVD transmission may reduce overall cost, although it requires a larger number of components. As before, the entire DVD signal is sent from the DVD player to a playback site, such as a home theater. The receiver of the DVD signal at the centralized home theater location can simply retransmit the voice and the remaining audio to achieve an embodiment similar to that shown in fig. 4, with the difference that the transceiver at the central location receives the wireless signal from the wireless DVD transmitter. This allows PLDs at the same site to be equipped with only a wireless receiver, rather than a wireless receiver and a digital decoder. The decoding process is performed centrally at the home theater location: the video is sent to the viewing device, and the audio (received and decoded from the DVD player location) is retransmitted by the receiver/decoder/transmitter to the PLDs, which receive the voice and remaining audio.

VRA knob on headset frame

The next category of embodiments focuses on introducing the VRA adjustment feature into hardware designed specifically for VRA applications, i.e., VRA-capable personal listening devices. Several types of PLDs are the focus of this embodiment: headphones, hearing aids, assistive listening devices, cochlear implants, goggles containing speakers, and head-mounted devices that utilize wired or wireless technology. Typically, an assistive listening device (ALD) uses headphones in conjunction with a microphone or wireless transmitter, depending on the use of the product. In a sense, the wireless VRA system shown in fig. 4 may itself be considered an assistive listening device. In general, however, the techniques of receiving, separating, adjusting, recombining, and delivering VRA may be used on products other than ALDs. A wireless headset, a telephone headset, or a small earbud may carry a volume control directly on the side of the headset. Fig. 19 illustrates an embodiment in which all necessary hardware is placed in the headset 112, and the adjustments needed for VRA control are readily available to the end user on the housing of the headset. First, the wireless receiver 113 receives a plurality of audio signals transmitted from a source location, after decoding (if necessary) has occurred. The demodulator 114 recovers the baseband audio signals, yielding the voice track and the remaining audio track, which are then manually adjusted 115, summed 117, further adjusted in overall sound level 116, amplified by a headphone amplifier 118, and reproduced by a headphone loudspeaker 119. The volume knobs 115, 116 may be located on the outside of the headset and may be provided with a balance adjustment if multi-channel (stereo) reproduction is used. If multiple audio channels are transmitted and received at the PLD, they may be adjusted and combined using the hardware and software shown in fig. 5 and/or 8 to form the desired stereo image or spatial projection.
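By way of illustration only, the adjust, sum, and overall-level chain of fig. 19 (elements 115, 117, 116) may be sketched in software as follows. The function name vra_mix and the use of linear gain factors are assumptions made for this sketch and are not taken from the disclosure, which describes hardware knobs.

```python
def vra_mix(voice, remaining, voice_gain, remaining_gain, overall_gain):
    """Adjust voice and remaining audio separately, sum them, then set the overall level.

    Hypothetical helper: gains are linear factors (1.0 leaves a signal unchanged).
    """
    if len(voice) != len(remaining):
        raise ValueError("voice and remaining audio must have the same number of samples")
    # Per-signal adjustment (115), summation (117), overall level (116).
    return [overall_gain * (voice_gain * v + remaining_gain * r)
            for v, r in zip(voice, remaining)]


# Example: double the voice, halve the remaining audio, then trim the total level.
voice = [0.10, 0.20, -0.10]
remaining = [0.50, -0.40, 0.30]
out = vra_mix(voice, remaining, voice_gain=2.0, remaining_gain=0.5, overall_gain=0.8)
```

In a stereo or multi-channel headset the same operation could be applied per channel, with a balance control layered on top.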

Remote adjustment of VRA in hearing aids

In addition to VRA adjustment on a headphone device (which is physically large enough to accommodate the hardware described above), it may also be desirable to allow VRA adjustment on smaller PLDs, such as hearing aids or smaller headphone devices. These smaller PLDs may not be able to accommodate all of the hardware required for adjusting and manually controlling two or more signals. In such a situation, it may be more desirable to use a device such as that depicted in fig. 13. In this embodiment, all that the small PLD needs is a wireless receiver. Surface-mount techniques and the miniaturization of electronic components make it easy to fit low-power wireless receivers into small spaces. (It should be noted that the adjusted signal needs a transmission range of only about 5 feet, so only a small antenna power amplifier is needed, because the handheld adjustment mechanism separately receives the signal from the source transmission location.) The remote transceiver 13 depicted in fig. 13 may also be built in the form of a remote control for the PLD. The remote control then controls the volume of the voice, the volume of the remaining audio, and the total volume of the PLD, while also serving as a transmitter to the PLD and as a receiver of the signal from the source location. Future technologies may allow all of the electronics needed for VRA adjustment to be placed in a hearing aid (or in a miniaturized PLD). Even then, a remote control may still be desirable for actually controlling the volume levels, since it is desirable, for aesthetic reasons, to keep the hearing aid as unobtrusive as possible.

Embodiments for VRA headphones in a movie theater environment

If the headphone designs are equipped with the technology disclosed herein, the movie theater provides yet another opportunity for the individual to adjust the VRA. An individual can control the level of the voice independently of the remaining audio while still enjoying the surround sound and large screen of a movie theater. To give full control of both the remaining audio and the voice in an assistive listening device or headset in a movie theater, an earmuff-type headset with sufficient passive and/or active noise control is needed. Passive noise control using ear guards, a double-shroud design, and attenuating materials is effective in blocking ambient sound at frequencies down to about 500 Hz. It may also be desirable to include active noise control in such headsets so that lower frequencies, which cannot be effectively controlled by passive measures, can be further reduced. Such cinema headphones may be designed as discussed in the previous headphone embodiments. A second alternative with slightly reduced functionality is also possible if only voice control is required. In many movies, the voice level during loud passages of the remaining audio is too low to provide good intelligibility. It may be desirable only to boost the sound level of the dialog during these passages. Signal reception, volume control, and reproduction of the dialog alone may be accomplished with an earpiece-type headset that allows ambient sound to reach the ear. Allowing ambient sound to reach the user's ear means that the spatial cues from the multi-channel surround sound are still heard, while the dialog can be adjusted to improve intelligibility.

It should be appreciated that adjustment of the voice to remaining audio (VRA) ratio, which is part of the mixing process at the professional recording and production end, may conflict with the concept of artistic freedom for certain individuals. For example, audio engineers concentrate on obtaining the correct mix of sounds to produce the desired effects in music, movies, and television. Therefore, it is necessary to include a means either to transmit the original (unaltered) composition mix of the total program or to provide a method to easily recreate that mix. This allows the end user to choose between the composition mix (the mix as the producer designed it) and the ability to adjust the VRA ratio himself. There are at least four possibilities for accomplishing this goal, which are given below.

Embodiments of VRA selection of artistic mix (acoustic audio mix)

Method 1

Figure 20 shows the first option for delivering the composition mix (as intended by the producer) to the end user in addition to providing VRA adjustment capability. The producer starts with soundtracks 120 in which all of the elements that make up the whole program are separate and combines them to form a mono or multi-channel program 122, which is recorded or broadcast 123 to the end user. In addition to the composition mix, the dialog used to create the composition mix (time aligned, delayed, and processed 124 with the same processing as 121) should be kept separate from the composition mix throughout the recording and broadcast stage 123. Typically, the signal is broadcast at a single frequency with a certain bandwidth, so it is often represented as a single signal as input 126, as shown in fig. 20 (even though it is shown as a single signal, multiple signals are contained in the modulated/encoded signal). The decoder/playback device 125 decodes or demodulates the recording or transmission to provide the original composition mix 126 in addition to the dialog-only track 127, which was produced and recorded in conjunction with the composition mix. The dialog signal passes through a switch 129, which can disconnect the dialog signal from the output 133. When this switch is open, the composition mix is provided in its original form for playback by the reproduction hardware 133 in whatever multi-channel format 131 the mix was originally made in, and the volume knob 128 serves as the overall volume control for the entire program. When switch 129 is closed, however, the dialog is passed to volume adjustment 130 and added 132 to the composition mix (in a multi-channel format, it would typically be added to the center channel, or equally to the left and right channels). This allows the end user to adjust the sound level of the overall program 128 relative to the sound level of the dialog as adjusted by the dialog volume knob 130. If the dialog volume knob 130 is turned to its minimum, the composition mix is recovered. If the dialog in the composition mix and the separate dialog track are recorded or broadcast simultaneously (i.e., time aligned), there is no delay between the two; therefore, as the volume of the separate dialog added to the composition mix increases, the ratio of the voice to the remaining audio heard through the reproduction device 133 increases. Time alignment can be achieved by applying to the separate dialog signal all of the processing (such as reverberation or filtering) that introduces delay into the dialog in the composition mix, so that both experience the same delay. This ensures that the dialog in the composition mix is aligned in time with the separate dialog track. It should be noted that this particular embodiment cannot reach a VRA ratio lower than that of the composition mix. If that is a desired feature, the next two embodiments accomplish this goal.
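As a minimal sketch of Method 1, assuming the separate dialog track is sample-for-sample identical to, and time aligned with, the dialog inside the composition mix, the added dialog can only raise the VRA ratio. The names method1_output and vra_change_db are hypothetical, and the decibel formula follows only from that stated assumption.

```python
import math


def method1_output(mix, dialog, dialog_gain, overall_gain, switch_closed=True):
    """Add the level-adjusted dialog (130, 132) to the composition mix, then apply volume 128.

    With the switch (129) open, the composition mix passes through unchanged apart
    from the overall volume.
    """
    extra = dialog_gain if switch_closed else 0.0
    return [overall_gain * (m + extra * d) for m, d in zip(mix, dialog)]


def vra_change_db(dialog_gain):
    """Change in VRA ratio relative to the composition mix, in dB.

    Assumes mix = dialog + remaining audio, so the total dialog gain is 1 + dialog_gain.
    The result is never negative, which is why Method 1 cannot go below the mix's VRA.
    """
    if dialog_gain < 0:
        raise ValueError("dialog_gain must be non-negative in this sketch")
    return 20.0 * math.log10(1.0 + dialog_gain)


boost_db = vra_change_db(1.0)  # adding the dialog at unity gain doubles the voice amplitude
```

Under these assumptions, a dialog gain of zero returns the composition mix, and a unity dialog gain raises the VRA ratio by roughly 6 dB.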

Method 2

As before, it is assumed that the composition mix and the dialog are provided from a broadcast or recording and that the two dialog signals (one within the composition mix and the other isolated) are time aligned with each other. Fig. 21 shows another configuration that allows a negative VRA ratio (i.e., the user may reduce the voice level and boost the remaining audio level, if desired) after the composition mix and dialog channels are decoded. The decoded dialog signal 135 is subtracted from the composition mix 134, resulting in a pure remaining audio mix 137. At this point the remaining audio level 139 can be adjusted independently of the dialog level 138 before the two are combined 140 to form the overall user-adjusted program. The composition mix may be provided on one terminal of a switch 141, which alternates between the user-adjusted mix and the composition mix. This configuration allows the dialog signal to be reduced to the point where only the remaining audio is left. The next embodiment also provides a fully user-adjustable VRA ratio while keeping the composition mix available, by recording the mix information prior to the encoding process.
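The subtraction step of fig. 21 may be sketched as follows, again assuming that the decoded dialog is time aligned with, and identical to, the dialog contained in the composition mix. The function name method2_output and its parameters are illustrative only.

```python
def method2_output(mix, dialog, voice_gain, remaining_gain, use_composition_mix=False):
    """Recover the remaining audio by subtraction (137), then rebalance (138, 139, 140).

    The flag models switch 141: when set, the untouched composition mix is returned.
    """
    if use_composition_mix:
        return list(mix)
    # Subtracting the time-aligned dialog leaves only the remaining audio.
    remaining = [m - d for m, d in zip(mix, dialog)]
    return [voice_gain * d + remaining_gain * r for d, r in zip(dialog, remaining)]


dialog = [0.2, -0.1]
mix = [0.7, 0.2]  # dialog plus remaining audio of [0.5, 0.3]
only_remaining = method2_output(mix, dialog, voice_gain=0.0, remaining_gain=1.0)
```

Setting the voice gain to zero leaves only the remaining audio, which is the case that Method 1 cannot reach.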

Method 3

Figure 22 shows a production process that sends information about how the program was mixed, both to ensure that the composition mix is available at the consumer level and to provide the ability to adjust the VRA ratio fully. The original program elements 143 are mixed 144 to form a multi-channel or mono program and auditioned until the sound levels of all of the inputs 143 are at the levels that the audio engineer producing the program judges to be correct. The outputs of the composition mix are then the dialog signal itself 149, the sound level that the producer determined to be appropriate for the dialog signal 148, the combination of all audio not deemed to be critical dialog 146, and the total sound level of the remaining audio. The respective sound level information is digitally encoded 150, 151 as header data into the actual audio signals 152, 153. These encoded signals, with their respective sound level information, are transmitted, broadcast, or recorded 154. The playback device is provided with a decoder 155, which extracts the audio information and the header information 156 containing the mixing levels of the original composition mix (possibly relative to some maximum numerical value, depending on the resolution of the recording). The remaining audio level and the dialog level 157 are then provided to gain adjustment circuits 158 and 159 so that, when the automatically adjusted dialog and remaining audio are combined 162, the levels are correct and the original composition mix is reproduced. This occurs only if switches 160 and 161 connect the output of the header information block 156 to the gain adjustment circuits. If the switches are instead thrown to the position that selects the user VRA mix 164, the user sets the remaining audio level 158 and the dialog level 159. While in many cases the paths shown represent a single signal, it is within the scope of the invention to treat each signal path as a vector of multiple signals, such as left, right, left surround, right surround, and center channels for spatial localization, with their levels controlled by the remaining audio level control 158 of fig. 22. Likewise, the voice-only track may also contain multi-channel information, which may be adjusted by the control 159.
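The switch between header-supplied levels and user-selected levels in fig. 22 may be sketched as below. The MixHeader structure and its field names are hypothetical; the disclosure states only that the producer's levels travel with the audio as header information.

```python
from dataclasses import dataclass


@dataclass
class MixHeader:
    """Hypothetical container for the producer's mixing levels (header information 156)."""
    dialog_level: float
    remaining_level: float


def method3_output(dialog, remaining, header, user_levels=None):
    """Combine dialog and remaining audio (162) with header levels or user levels.

    user_levels=None models switches 160 and 161 selecting the header information;
    a (dialog_level, remaining_level) tuple models the user VRA mix 164.
    """
    if user_levels is None:
        dialog_gain, remaining_gain = header.dialog_level, header.remaining_level
    else:
        dialog_gain, remaining_gain = user_levels
    return [dialog_gain * d + remaining_gain * r for d, r in zip(dialog, remaining)]


header = MixHeader(dialog_level=0.5, remaining_level=1.0)
composition = method3_output([0.4, 0.2], [0.1, -0.3], header)
user_mix = method3_output([0.4, 0.2], [0.1, -0.3], header, user_levels=(1.0, 0.5))
```

Because the dialog and remaining audio are stored unmixed, either level can be taken all the way to zero, which gives the full adjustment range described above.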

VRA incorporating existing audio reproduction hardware

The above discussion has focused on giving the end user the ability to adjust the VRA of electronically reproduced media (either broadcast or recorded playback) on a personal listening device (PLD), so that individual listeners in the same environment can enjoy different VRA ratios at the same time. Further embodiments arise when the personal listening device described above is extended to include electronic devices such as:

Cellular telephone

Wearable computer

Personal digital assistant

MP3 playback device

Personal audio player using magnetic storage medium to store music

These devices may be used for personal playback of music or other audio that contains dialog together with remaining audio (which may obscure the dialog). The embodiments discussed in the previous sections can be applied to the devices listed above to provide VRA adjustment for the playback of audio in which the voice was prerecorded and produced separately from the remaining audio, regardless of the encoding format.

VRA on personal computer

The internet has become a popular channel for the distribution of digital-quality media. If a consumer receives music, movies, or other audio in real time (or with a delay) through a data connection to a personal computer, the VRA and automatic VRA features can be implemented in a variety of ways. The gain controls applied to the speech and the remaining audio may be entirely software driven through a graphical user interface. The speech is encoded separately from the remaining audio by hardware or software (depending on the personal computer system). By adding a few lines of source code to the decoding procedure, each of the two signals (speech and remaining audio) is multiplied by a user-adjustable constant, which scales the signal in the digital domain. The user controls these two constants through a software user interface, and they are applied to the decoded speech and remaining audio signals before the signals are added together. In addition, a further volume adjustment may be applied to the combined signal to allow the user to control the total volume of the program prior to playback. Alternatively, if it is desired to provide a more "user friendly" adjustment capability, a VRA knob (see the discussion of methods for possible control knob embodiments) may be provided as actual hardware on the computer speakers, keyboard, mouse, or monitor, all of which are components of the PC system. If the VRA adjustment is provided in hardware (e.g., knobs on the monitor) and the signal decoding is implemented in software, then a handshake protocol is required to ensure that adjustments made with the hardware knobs are translated into software gain changes and multiplications. Figure 23 shows one possible option for the hardware-to-software interface. The movement and position of the hardware knob 165, which produces a voltage output 167 related to its position and to the full-scale voltage 166, must be sampled by an A/D converter 168 to convert the position information into a number representing the volume relative to full scale 166. One possible hardware implementation of such a knob is a rotary potentiometer with the full-scale voltage on one end, ground on the other end, and a wiper that provides a divided voltage as a function of rotational position. The output of the A/D converter is then periodically polled by the software 169 that controls the signal flow to obtain the digitized numbers selected by the user. These numbers (one for voice and one for the remaining audio) are multiplied 170 by the respective signals, and the products are added 172 to form the overall VRA-adjusted program. There are several other combinations of software and hardware for controlling VRA on personal computer playback devices. The hardware-only form requires that the signals be decoded and provided as outputs from the sound card or from the PC motherboard itself, as in the device shown in fig. 18. This embodiment allows the volume of both signals to be adjusted using hardware gain or attenuation without the need for a graphical user interface. Each configuration has its own advantages:

All hardware: cheap, readily available knobs, and easy adjustment with high visibility

Full software: no hardware updates are required to implement the VRA, greater flexibility in adjustment options and features, and GUI controls that can be customized by the end user

Software/hardware: high visibility of the adjustment mechanism, and fewer D/A converters are required since the sum is output rather than the remaining audio and dialog separately.
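The combined software/hardware path of fig. 23 (knob voltage, A/D conversion, software polling, multiplication, and summation) may be sketched as follows. The 8-bit resolution, the 5 V full-scale value in the example, and all function names are assumptions made for illustration; no real A/D driver is called.

```python
def adc_sample(knob_voltage, full_scale_voltage, bits=8):
    """Model the A/D converter (168): quantize the wiper voltage to an integer code."""
    ratio = min(max(knob_voltage / full_scale_voltage, 0.0), 1.0)
    return round(ratio * (2 ** bits - 1))


def code_to_gain(code, bits=8):
    """Convert a polled A/D code into a gain relative to full scale (0.0 to 1.0)."""
    return code / (2 ** bits - 1)


def render(voice, remaining, voice_code, remaining_code, bits=8):
    """Multiply each signal by its knob-derived gain (170) and add the products (172)."""
    voice_gain = code_to_gain(voice_code, bits)
    remaining_gain = code_to_gain(remaining_code, bits)
    return [voice_gain * v + remaining_gain * r for v, r in zip(voice, remaining)]


# Voice knob at full scale, remaining-audio knob at ground: only the voice is heard.
voice_code = adc_sample(5.0, 5.0)
remaining_code = adc_sample(0.0, 5.0)
program = render([0.2, -0.1], [0.5, 0.3], voice_code, remaining_code)
```

In a real system the polling software would read the two codes periodically and update the gains between audio buffers; that scheduling is outside this sketch.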

Automatic VRA on personal computer

It should also be noted that while the personal computer is itself regarded as a personal listening device, headsets (which are also PLDs) are often used in conjunction with PCs. Therefore, the PC can serve as a signal source for other PLDs. Previous embodiments discussing televisions and DVDs may also include a PC as the signal source, such as a headset plug with VRA controls connected to it. In addition, while attention has so far been directed to VRA adjustment only, the automatic VRA hold feature may also be implemented on a PC. In fact, computing the real-time signal properties requires a Central Processing Unit (CPU) or Digital Signal Processor (DSP) capable of performing a large number of computations per second. A PC implementation of VRA can therefore readily implement the automatic VRA features thanks to its available computing power. All of the automatic VRA features and user controls discussed in the previous sections may be realized with any of the hardware/software interface options discussed in the previous paragraphs. However, a purely hardware implementation is difficult because of the computational power and real-time operations required for continuously limiting the signal level. Therefore, an implementation using either full software or a combination of hardware controls and software mathematical operations is preferred for automatic VRA.

Fig. 24 shows a user in a multi-channel listening environment. Although fig. 24 shows a scenario with 5 speakers (left, center, right, left surround and right surround), such an environment may have 2, 3, 4, 5 or more speakers. Each speaker has a frequency response path from itself to each ear, resulting in a total of 10 paths. If the electrical signals driving these loudspeakers are filtered with estimates of these paths before being combined to form the left and right ear signals, a more realistic sound stage can be created for headphone listening. This is clearly a desirable result, allowing an individual to experience a multi-channel surround sound experience without purchasing a multi-channel amplifier/speaker system. However, there is a need for an invention to provide the end user with the ability to adjust the VRA in conjunction with multi-channel spatial processing in order to achieve the desired intelligibility while at the same time experiencing the surround sound stage by using headphones.
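The filter-and-sum operation described for fig. 24 may be sketched as below: each of the 5 loudspeaker signals is filtered by an estimate of its path to each ear, and the results are summed per ear, giving 10 paths in total. The FIR representation of each path and the tap values in the example are assumptions chosen for illustration, not measured responses.

```python
def fir(signal, taps):
    """Direct-form FIR filter: y[n] = sum over k of taps[k] * signal[n - k]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


def binaural_downmix(channels, paths):
    """Filter every speaker signal with its path to each ear and sum per ear.

    channels: dict mapping speaker name to samples.
    paths: dict mapping (speaker name, ear) to FIR taps estimating that path.
    """
    length = len(next(iter(channels.values())))
    ears = {"left": [0.0] * length, "right": [0.0] * length}
    for name, samples in channels.items():
        for ear in ears:
            filtered = fir(samples, paths[(name, ear)])
            ears[ear] = [a + b for a, b in zip(ears[ear], filtered)]
    return ears


names = ["left", "center", "right", "left_surround", "right_surround"]
channels = {name: [0.0, 0.0, 0.0] for name in names}
channels["center"] = [1.0, 0.0, 0.0]  # an impulse on the center speaker only
paths = {(name, ear): [1.0] for name in names for ear in ("left", "right")}
paths[("center", "left")] = [0.7, 0.2]   # made-up taps
paths[("center", "right")] = [0.7, 0.2]  # made-up taps
ears = binaural_downmix(channels, paths)
```

With five speakers and two ears the paths dictionary holds exactly the 10 entries mentioned in the text.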

VRA on multi-channel processing headphones and automatic VRA used in conjunction with Dolby headphones

The design depicted in fig. 25 assumes that the dialog track is provided separately, in addition to the overall audio program. Therefore, a negative VRA ratio cannot be achieved with this particular embodiment. However, if two multi-channel programs (remaining audio and speech) are decoded simultaneously, all possible VRA ratios can be achieved by fully lowering or raising the remaining audio and/or the speech. Fig. 25 shows an overall multi-channel audio program (remaining audio plus speech) 173 that is passed through an overall volume control 174, which ultimately serves as the control for the remaining audio. Spatial processing 175 refers to the prior art, in which each signal is filtered and/or delayed to produce the desired multi-channel effect. Here, however, the adjusted dialog is added 176 to the appropriate channel before the channels are combined to form the two-channel headphone program. As part of the spatial processing, information identifying the speaker on which the dialog track should appear is maintained, and this information is passed to decision step 178. In most productions, the speech is located in the center channel or in both the right and left channels (simulated center). For the purposes of this description, it is assumed that the dialog is sent only to the center channel. When the center channel is indicated at 178 as the correct position for the dialog, the dialog process 181 duplicates the spatial processing (filtering, delay time, etc.) of the channel in which the dialog is to be placed. The dialog 179 is first adjusted in level (voice level adjustment) 180 before being processed by the (center) channel process 181, which is replicated from the multi-channel spatial process 175. After the dialog has been processed 181, the channel that is to receive the dialog is again selected 178 and the signal is added to the appropriate channel 176. If the speech is meant to be added to the left and right channels, block 178 passes the appropriately processed speech to each of these channels via 176 and not to any other channel. The remaining audio can be raised and lowered using 174 and the dialog can be raised and lowered using 180, which provides positive VRA adjustment since the speech is also included in the overall program 173. The VRA-adjusted, spatially processed, multi-channel program is then further processed 177 (prior art) to produce a two-channel headphone program. This two-channel program is further adjusted in volume and passed to the headset speakers 183. The most recent embodiment of the prior art discussed as elements 175 and 177 of fig. 25 is Dolby Headphone. The above VRA invention is designed to work with Dolby Headphone as well as with any other multi-channel processing headphones (which derive two headphone channels from multiple spatial channels).
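The dialog routing of fig. 25 may be sketched as follows. Here the spatial processing of each channel (175) is reduced to a hypothetical gain and integer sample delay, which is a stand-in for the filtering used in practice; the point of the sketch is that the dialog path (181) reuses the processing of whichever channel it is routed to (178) before being added (176). The final two-channel processing (177) is not modeled.

```python
def spatialize(samples, gain, delay):
    """Stand-in for per-channel spatial processing: delay by whole samples, then scale."""
    n = len(samples)
    d = min(delay, n)
    return [0.0] * d + [gain * s for s in samples[:n - d]]


def fig25_chain(program, dialog, overall_gain, dialog_gain, processing, targets=("center",)):
    """Return the VRA-adjusted, spatially processed channels prior to block 177.

    program: dict of channel name to samples (remaining audio plus speech, 173).
    processing: dict of channel name to (gain, delay) used by spatialize.
    targets: channels that receive the dialog, e.g. ("center",) or ("left", "right").
    """
    # Overall volume (174) followed by spatial processing (175) on every channel.
    processed = {name: spatialize([overall_gain * s for s in samples], *processing[name])
                 for name, samples in program.items()}
    # Voice level adjustment (180).
    adjusted = [dialog_gain * s for s in dialog]
    for name in targets:
        # Dialog process (181) copies the target channel's processing, then adds (176).
        extra = spatialize(adjusted, *processing[name])
        processed[name] = [a + b for a, b in zip(processed[name], extra)]
    return processed


program = {"left": [0.3, 0.3], "center": [0.1, 0.1]}
processing = {"left": (0.5, 1), "center": (1.0, 0)}
result = fig25_chain(program, [0.2, 0.2], overall_gain=1.0, dialog_gain=2.0,
                     processing=processing)
```

Passing targets=("left", "right") would model the simulated-center case, in which the processed dialog goes to those two channels and to no others.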

VRA on 'store programs' playback device

Non-linear television viewing offers a significant advantage to viewers: programs may be recorded and stored for later viewing. The latest technology, hard disk drive television recorders (including TiVo, Replay, and Microsoft, among others), differs from traditional VCR technology in that the recording method is very user friendly, individual channels can be set to record, and playback is almost instantaneous. Future television viewers will likely prefer non-real-time, non-linear viewing over real-time viewing at times that would otherwise be inconvenient. Therefore, it is important that the VRA adjustment capability work in conjunction with these playback devices. The adjustment hardware may be part of the remote control, an on-screen GUI, or physical hardware on the playback mechanism. The recording process need only record all of the information as it is transmitted, including the separate voice track. The playback and adjustment mechanisms then involve the same components discussed in the previous embodiments of the VRA and automatic VRA adjustment hardware.

Re-production

Currently, the production of audio (for broadcast, movies, music, etc.) can be viewed as a multi-step process, as depicted in fig. 26, which is considered prior art. At the production level, several types of sounds are recorded to form the entire audio program. These sounds 184 can be divided into several types, including sound effects, music, voice, and other sounds. Typically, the voice segments of the production-level sounds are considered critical for understanding the content or lyrics of a program. All of these sounds are first recorded separately 185. Some sounds, such as sound effects, are not recorded by microphone but are often dubbed from a pre-recorded set of effect tracks. It is not always necessary to record all of the sounds in order to synchronize them on the master track. Non-linear recording and playback allow production 186 to align each sound with the video and with other sounds over several playback/recording cycles, often using software-driven recording and editing. The production process 186 synchronizes all sounds with each other (and with the video, if present), mixes them in the proper ratio, and assigns them to the surround channels (if available) where the audio engineer judges they belong. For example, it is common to place the voices of actors speaking on screen in the center channel, which is closest to the location of the screen. In order to transmit or record large amounts of information, an encoding or compression process 188 is often required. This is not always the case when analog recording and playback are used, but multi-channel digital playback often involves some encoding (for copy protection) and/or compression (lossy or lossless), depending on the requirements of the recording medium or broadcast. The encoded and/or compressed program is then recorded or broadcast 189 and played back or received at the end user location. As can be seen from the process of fig. 26, the end user cannot make any adjustment to the dialog level relative to the remaining audio, because the producer has full control at 186. Once the sounds are mixed, it is almost impossible to extract the speech from the remaining audio and restore it to its originally recorded quality 184.

However, the producer can return to the master recording 185, where all recorded elements are separate from each other, and derive the speech and the remaining audio separately. The present invention focuses on providing the means and the ability to obtain the dialog and the remaining audio separately and re-record them so that the end user can adjust their relative sound levels to suit his/her hearing needs. Fig. 27 shows one possible method of accomplishing this goal. The master recordings 191 of almost all movies and multi-track audio programs exist as archived media with the tracks completely separate. In addition, information about the level and position of each track in the original production exists along with those master recordings. This information is used in two separate multi-channel mixers, 192 (for all remaining audio) and 193 (for speech only), to prepare two separate multi-channel programs, which may have 6 channels as shown in 187, or more or fewer, depending on the desired effect. The production information from the original production process 186 is thus used to produce exactly the same effects for the remaining audio and for the speech, separately and simultaneously. The two multi-channel programs could therefore be combined at the outputs of 192 and 193 to form the same overall audio program 187 as the original mix, even though they are completely separate. Each of the two multi-channel programs is then encoded and/or compressed as at 188, but entirely separately 194 and 195. The two encoded programs are then further encoded or multiplexed 196 to produce a single signal, which is then broadcast or recorded 197. On playback, this signal is decoded to form at least two multi-channel signals that can be VRA adjusted 199 using the hardware and embodiments discussed in the previous sections of this document.
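A minimal sketch of the two-mixer arrangement of fig. 27 follows, assuming each archived track carries its own level and per-channel position (pan) values and a flag marking it as voice. The track dictionary layout and the function names are hypothetical. The sketch also checks the property stated above: summing the two separately mixed programs reproduces the original overall mix.

```python
def mix_tracks(tracks, channel_names, length):
    """Mix tracks into a multi-channel program using each track's stored level and position."""
    program = {name: [0.0] * length for name in channel_names}
    for track in tracks:
        for name, pan in track["pan"].items():
            gain = track["level"] * pan
            program[name] = [p + gain * s for p, s in zip(program[name], track["samples"])]
    return program


def remaster(tracks, channel_names, length):
    """Produce separate voice (193) and remaining-audio (192) programs from the same tracks."""
    voice_tracks = [t for t in tracks if t["is_voice"]]
    other_tracks = [t for t in tracks if not t["is_voice"]]
    return (mix_tracks(voice_tracks, channel_names, length),
            mix_tracks(other_tracks, channel_names, length))


channel_names = ["left", "center", "right"]
tracks = [
    {"samples": [0.2, 0.4], "level": 0.5, "pan": {"center": 1.0}, "is_voice": True},
    {"samples": [0.6, -0.2], "level": 1.0, "pan": {"left": 0.5, "right": 0.5}, "is_voice": False},
]
voice_program, remaining_program = remaster(tracks, channel_names, 2)
original_mix = mix_tracks(tracks, channel_names, 2)
```

Encoding, compression, and multiplexing (194, 195, 196) are deliberately left out, since they depend on the chosen recording or broadcast format.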

Remastering of multi-channel audio to "VRA-friendly" form, and remastering of stereo to "VRA-friendly" form

Fig. 27 shows the most general method for delivering speech to the end user separately from the remaining audio, so that all spatial information of the remaining audio and the speech is maintained during playback. Fig. 28 shows the opposite case, where spatial information is not maintained, but the end user can still adjust the voice and remaining audio levels independently. The master recording exists at 185, at 191, and at 200 of fig. 28. All components that are considered remaining audio (the non-voice elements of the total audio recording) are mixed in level only 201. This means that only the relative contribution of each remaining audio component with respect to the others is set, and no spatial localization is performed, since the output of 201 is a single signal and not a multi-channel signal. In addition, the sound level of the voice is adjusted 202 so that, when the two are combined without any further adjustment, the level of the voice relative to the remaining audio is exactly what the producer intended. The adjusted voice and the adjusted remaining audio program are then recorded 203 on a stereo medium, which may be a CD, DVD, analog tape, etc., and may also be a stereo audio broadcast. At this point, the speech is kept separate from the remaining audio, on the left and right tracks or on the right and left tracks, respectively. It should be noted that some convention is needed so that all productions agree on whether the right or the left channel contains the speech, with the remaining audio on the other channel. This may be chosen by consensus among consumer electronics manufacturers, but choosing either does not limit the scope of the invention. The stereo playback device 204 then provides two signals as outputs (left and right), one with only speech and the other with only the remaining audio. To experience the entire program with VRA adjustment, the two signals are passed through two variable gain amplifiers 205 and 206, where each sound level is controlled, and are then added to form the total program. The sound level 207 of this total program may then be adjusted further. This fully adjusted total program is then split if it is to be played back 208 by a stereo system. The advantage of this configuration is that the production and playback of VRA media can be accomplished with today's consumer electronics and master recordings. Only a minimal amount of additional hardware (205, 206, 207) is required to enjoy VRA adjustment. The disadvantage is that the stereo image is lost. However, many stereo effects are so subtle, and many playback devices of such low fidelity, that most consumers will prefer VRA adjustment to stereo imaging.

The two embodiments discussed above with reference to Figs. 27 and 28 represent the most complex and the simplest ways of providing VRA to end users. Any embodiment with any number of recorded, produced, or played-back channels can be envisioned from the above description, which is not limited to the two specific embodiments shown in Figs. 27 and 28.

Claims

1. A cinema system which outputs a movie to each of a plurality of listeners in a manner that allows individualized audio volume adjustments to be made for a plurality of listeners located in an audience of a movie theater environment, the system comprising:

a video device to display a video portion of a movie to a viewer;

a speaker system to transmit a corresponding audio portion of the movie to a viewer;

one or more units of a storage medium storing a video portion of a movie, a corresponding audio portion of the movie, a first audio signal being a sound signal and having one or more channels of spatial information, and a second audio signal including audio content other than the audio content of the first audio signal and having one or more channels of spatial information;

a transmitter for transmitting the first and second audio signals from the one or more units of the storage medium to a plurality of personal listening devices in synchronization with the video portion and the corresponding audio portion of the movie, wherein each personal listening device of the plurality of personal listening devices is associated with a respective listener of the plurality of listeners in the movie theater audience, and each personal listening device comprises:

a first receiver that receives the transmitted first audio signal independently of the speaker system;

a second receiver that receives the transmitted second audio signal independently of the speaker system;

a first adjusting device that adjusts a volume of the first audio signal according to an input from a user;

a second adjusting device that adjusts the volume of the second audio signal according to an input from a user;

an audio signal combining device that combines a spatial information channel of a first audio signal with a corresponding spatial information channel of a second audio signal to produce a combined audio signal; and

one or more transducers that receive the combined audio signal, convert the combined audio signal to sound, and output the sound so that it is audible to the listener associated with that personal listening device in the theater audience;

wherein the system allows each of the plurality of listeners to adjust the first and second audio signals independently of the other plurality of listeners in the audience.

2. The theater system of claim 1 wherein at least one of the first audio signal and the second audio signal is a monophonic signal.

3. The theater system of claim 1 wherein at least one of the first audio signal and the second audio signal is a stereo signal having left and right spatial information channels.

4. The theater system of claim 1 wherein at least one of the first audio signal and the second audio signal is a surround sound signal having spatial information channels including left, center, right, and one or more surround sound channels.

5. The theater system of claim 1 wherein at least one of the first audio signal and the second audio signal is a multi-channel surround sound signal having spatial information channels including left, left surround, center, right surround, right, and one or more surround sound channels.

6. The theater system of claim 1 wherein the first and second adjusting devices are volume controllable active amplifiers.

7. The theater system of claim 1 wherein the first and second adjustment devices are volume controllable passive attenuators.

8. The theater system of claim 1 wherein the first adjustment device and the second adjustment device are combined into a single volume control device.

9. The theater system of claim 8 wherein the volume of the first audio signal increases and the volume of the second audio signal decreases when the volume control device is moved in one direction, and wherein the volume of the second audio signal increases and the volume of the first audio signal decreases when the volume control device is moved in another direction.

10. The theater system of claim 1 wherein the first receiver receives a first digital bitstream comprising a first audio signal and the second receiver receives a second digital bitstream comprising a second audio signal, the system further comprising:

a first decoder that decodes a first digital bit stream; and

a second decoder that decodes the second digital bit stream.

11. The theater system of claim 1 wherein the first receiver and the second receiver are integrated into a single receiver.

12. The theater system of claim 11 wherein the single receiver receives a single digital bitstream including the first audio signal and the second audio signal, the system further comprising:

a single decoder that decodes a single digital bit stream.

13. The theater system of claim 1 wherein the first and second receivers receive wireless transmissions.

14. The theater system of claim 1 wherein the personal listening device is at least one of a stereo headset, a mono headset, a hearing aid, and an assistive listening device.

15. The theater system of claim 1 wherein the personal listening device is a body worn receiver that provides a combined audio signal to one or more electro-acoustic transducers.

16. The theater system of claim 1 further comprising at least one of a waveguide and an amplifier to enhance the combined audio signal.

17. The theater system of claim 1 further comprising a processor that calculates a ratio of the volume of the first audio signal to the volume of the second audio signal, wherein at least one of the first adjustment device, the second adjustment device, and the audio signal combination device automatically adjusts and maintains the ratio of the volume of the first audio signal to the volume of the second audio signal.

18. The theater system of claim 17 wherein the processor calculates the standard deviation of the audio signal over a finite time interval.

19. The theater system of claim 17 wherein the ratio is stored in memory for use by the audio signal combining device.

20. The theater system of claim 17 wherein the first and second adjustment devices are controlled by a user through a graphical user interface.

21. The theater system of claim 17 wherein the first adjustment device and the second adjustment device are coupled to a single user controllable volume adjustment device for adjusting the volume of the combined audio signals throughout such that the single user controllable volume adjustment device increases the volume of the first audio signal level and decreases the volume of the second audio signal when moved in the first direction and increases the volume of the second audio signal and decreases the volume of the first audio signal when moved in the second direction.

22. The theater system of claim 1 wherein the corresponding audio portions of the movie are stored as the first audio signal and the second audio signal.

23. The theater system of claim 1 wherein the first and second adjusting devices operate using computer software or computer hardware.

24. The theater system of claim 1 wherein the personal listening device is at least one of a cellular telephone, a wireless communication device, a body worn computer, a personal data assistant, a personal audio playback device, a television, and a DVD player.

25. The theater system of claim 1 further comprising a third adjustment device that adjusts the volume of the combined audio signal.

26. The theater system of claim 25 wherein the third adjustment device comprises a user controllable switch that instantly effects mixing of the first audio signal with the original version of the second audio signal.

27. The theater system of claim 25 wherein the third adjustment device comprises a surround sound processor that converts the combined audio signal into an audio signal having a predetermined number of channels of spatial information.

28. The theater system of claim 27 wherein the surround sound processor converts the combined audio signal having left, center, right surround sound, and left surround sound spatial information channels into a signal having only left and right spatial information channels.

29. The theater system of claim 1 wherein the second audio signal comprises at least a portion of the first audio signal.

30. A method of operating a cinema system which outputs a movie to each of a plurality of listeners in a manner that allows individualized audio volume adjustments to be made for a plurality of listeners located in an audience of a movie theater environment, the method comprising:

displaying a video portion of the movie to the viewer;

outputting to the viewer the corresponding audio portion of the movie by using the speaker system;

providing one or more units of a storage medium storing a video portion of a movie, a corresponding audio portion of the movie, a first audio signal being a sound signal and having one or more channels of spatial information, and a second audio signal comprising audio content other than the audio content of the first audio signal and having one or more channels of spatial information;

transmitting first and second audio signals from one or more units of the storage medium in synchronization with a video portion and a corresponding audio portion of the movie;

receiving the transmitted first and second audio signals by using a plurality of personal listening devices independent of the speaker system, wherein each of the plurality of personal listening devices is associated with a respective one of a plurality of listeners in the theater audience;

adjusting a volume of the first audio signal according to an input from a user;

adjusting a volume of the second audio signal according to an input from a user;

combining a spatial information channel of the first audio signal with a corresponding spatial information channel of the second audio signal to produce a combined audio signal;

converting the combined audio signal into sound by using one or more transducers; and

outputting sounds such that they are audible to each of a plurality of listeners;

wherein the method allows each listener of the plurality of listeners to adjust the first and second audio signals independently of the other plurality of listeners.

31. The method of claim 30, wherein at least one of the first audio signal and the second audio signal is a mono signal.

32. The method of claim 30, wherein at least one of the first audio signal and the second audio signal is a stereo signal having left and right spatial information channels.

33. The method of claim 30, wherein at least one of the first audio signal and the second audio signal is a surround sound signal having spatial information channels including left, center, right, and one or more surround sound channels.

34. The method of claim 30, wherein at least one of the first audio signal and the second audio signal is a multi-channel surround sound signal having spatial information channels including left, left surround, center, right surround, right, and one or more surround sound channels.

35. The method of claim 30, wherein the adjusting step is performed using a volume controllable active amplifier.

36. The method of claim 30, wherein the adjusting step is performed using a volume controllable passive attenuator.

37. The method of claim 30, wherein the adjusting step is performed using a single volume control device.

38. The method of claim 37, wherein the volume of the first audio signal increases and the volume of the second audio signal decreases when the volume control device is moved in one direction, and wherein the volume of the second audio signal increases and the volume of the first audio signal decreases when the volume control device is moved in another direction.

39. The method of claim 30, wherein the receiving step receives a first digital bitstream comprising a first audio signal and a second digital bitstream comprising a second audio signal, the method further comprising:

decoding the first digital bit stream; and

decoding the second digital bit stream.

40. The method of claim 30, wherein the receiving step is performed using a single receiver.

41. The method of claim 40, wherein the receiving step receives a single digital bitstream comprising the first audio signal and the second audio signal, the method further comprising:

decoding the single digital bit stream.

42. The method of claim 30, wherein the transmission received in the receiving step is a wireless transmission.

43. The method of claim 30, wherein the personal listening device is at least one of a stereo headset, a mono headset, a hearing aid, and an assistive listening device.

44. The method of claim 30 wherein the personal listening device is a body-worn receiver that provides the combined audio signal to one or more electro-acoustic transducers.

45. The method of claim 30, further comprising enhancing the combined audio signal by using at least one of a waveguide and an amplifier.

46. The method as in claim 30, further comprising:

calculating a ratio of a volume of the first audio signal to a volume of the second audio signal; and

automatically adjusting and maintaining the ratio of the volume of the first audio signal to the volume of the second audio signal.

47. A method as in claim 46, wherein the step of calculating calculates the standard deviation of the audio signal over a finite time interval.

48. A method as in claim 46, further comprising storing the ratio in a memory for use by the audio signal combining device.

49. The method of claim 46, wherein the adjusting step is performed by a user via a graphical user interface.

50. A method as in claim 46, wherein the adjusting step is performed using a single user-controllable volume adjustment device that is used to adjust the volume of the combined audio signals throughout such that movement of the single user-controllable volume adjustment device in a first direction will increase the volume of the first audio signal level and decrease the volume of the second audio signal level, and movement in a second direction will increase the volume of the second audio signal level and decrease the volume of the first audio signal level.

51. A method as in claim 30, wherein the respective audio portions of the movie are stored as the first audio signal and the second audio signal.

52. The method of claim 30, wherein the adjusting step is performed using at least one of computer software and hardware.

53. The method of claim 30, wherein the personal listening device is at least one of a cellular telephone, a wireless communication device, a body worn computer, a personal data assistant, a personal audio playback device, a television, and a DVD player.

54. The method of claim 30, further comprising adjusting the volume of the combined audio signal.

55. The method of claim 54 wherein the adjusting of the combined audio signal includes using a user controllable switch that instantly effects mixing of the first audio signal with the original version of the second audio signal.

56. A method as in claim 54, wherein the adjusting of the combined audio signal comprises converting the combined audio signal into an audio signal having a predetermined number of channels of spatial information.

57. The method of claim 56, wherein the converting step converts the combined audio signal having left, center, right surround, and left surround spatial information channels into a signal having only left and right spatial information channels.

58. The method of claim 30, wherein the second audio signal comprises at least a portion of the first audio signal.