HK1229589A1 - Systems and methods for delivery of personalized audio - Google Patents
- Publication number
- HK1229589A1 (application HK17102666.0A)
- Authority
- HK
- Hong Kong
- Prior art keywords
- audio
- speakers
- user
- audio content
- user device
- Prior art date
- 2015-07-21
Description
Background
The delivery of enhanced audio has improved significantly with the availability of soundbar, 5.1 surround, and 7.1 surround systems. These enhanced audio delivery systems improve the quality of audio delivery by separating the audio into audio channels that are played through speakers placed at different locations around the listener. Existing surround sound techniques enhance the perception of sound spatialization by exploiting sound localization: the ability of a listener to identify the direction and distance of a detected sound.
Summary
The present disclosure is directed to a system and method for delivering personalized audio substantially as shown in and/or described in connection with at least one of the figures, as set forth more completely in the claims.
Brief Description of Drawings
FIG. 1 illustrates an exemplary system for delivering personalized audio according to one implementation of the present disclosure;
FIG. 2 illustrates an exemplary environment utilizing the system of FIG. 1 in accordance with one implementation of the present disclosure;
FIG. 3 illustrates another exemplary environment utilizing the system of FIG. 1 in accordance with one implementation of the present disclosure; and
FIG. 4 illustrates an exemplary flow diagram of a method for delivering personalized audio in accordance with one implementation of the present disclosure.
Detailed Description
The following description contains specific information pertaining to implementations in the present disclosure. The drawings in the present application and their accompanying detailed description are directed to merely exemplary implementations. Unless otherwise noted, similar or corresponding elements in the drawings may be indicated by similar or corresponding reference numerals. Moreover, the drawings and illustrations in this application are generally not to scale and are not intended to correspond to actual relative dimensions.
FIG. 1 illustrates an exemplary system 100 for delivering personalized audio in accordance with one implementation of the present disclosure. As shown, the system 100 includes a user device 105, audio content 107, a media device 110, and speakers 197a, 197b, …, 197n. The media device 110 includes a processor 120 and a memory 130. The processor 120 is a hardware processor, such as a central processing unit (CPU) used in a computing device. The memory 130 is a non-transitory storage device that stores computer code executed by the processor 120, as well as various data and parameters.
The user device 105 may be a handheld personal device, such as a mobile phone or tablet computer. The user device 105 may connect to the media device 110 via a connection 155. In some implementations, the user device 105 may be wireless-enabled and configured to connect wirelessly to the media device 110 using technologies such as Bluetooth or WiFi. Further, the user device 105 may include a software application that provides a plurality of selectable audio profiles to the user and may allow the user to select an audio language and a listening mode. Dialog refers to spoken-language audio, such as speech or narration, and may include an exchange between two or more actors or characters.
The audio content 107 may include audio tracks from media sources such as television programs, movies, music files, or any other media source that includes an audio portion. In some implementations, the audio content 107 may include a single track containing all audio from the media source, or it may include multiple tracks, each containing a separate portion of the audio content 107. For example, a movie may include audio content for dialog, audio content for music, and audio content for effects. In some implementations, the audio content 107 may include a plurality of dialog contents, each containing dialog in a different language. One user may select a language for the dialog, or multiple users may select multiple languages.
The media device 110 may be configured to connect to multiple speakers, e.g., speaker 197a, speaker 197b, …, speaker 197n. The media device 110 may be a computer, a set-top box, a DVD player, or any other media device suitable for playing the audio content 107 using multiple speakers. In some implementations, the media device 110 may connect to the speakers via wires or wirelessly.
In one implementation, the audio content 107 may be provided in channels, such as two-channel stereo or 5.1-channel surround sound. In other implementations, the audio content 107 may be provided as objects, also referred to as object-based audio or sound. In such an implementation, rather than mixing individual instrument tracks into a song, or mixing ambient sounds, sound effects, and dialog into the audio track of a movie, each piece of audio carries instructions specifying exactly which of the speakers 197a-197n should play it and how loudly. For example, the audio content 107 may be provided as audio pieces together with metadata and instructions describing where and how each piece is played. The media device 110 may then use the metadata and instructions to play the audio on the speakers 197a-197n.
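By way of illustration, this kind of object-based routing can be sketched in a few lines of Python. This is a minimal sketch, not the disclosure's implementation; the object fields and speaker identifiers are assumptions chosen to mirror the description above:

```python
import numpy as np

# Hypothetical object-based audio: each object carries samples plus
# metadata saying which speakers play it and at what loudness.
audio_objects = [
    {"name": "dialog",  "samples": np.zeros(48000), "routes": {"197a": 1.0, "197b": 1.0}},
    {"name": "music",   "samples": np.zeros(48000), "routes": {"197c": 0.7, "197d": 0.7}},
    {"name": "effects", "samples": np.zeros(48000), "routes": {"197e": 0.9}},
]

def render_objects(objects, speaker_ids, n_samples):
    """Mix every object into per-speaker buffers using its routing metadata."""
    buffers = {sid: np.zeros(n_samples) for sid in speaker_ids}
    for obj in objects:
        for sid, gain in obj["routes"].items():
            buffers[sid] += gain * obj["samples"][:n_samples]
    return buffers

speaker_buffers = render_objects(
    audio_objects, ["197a", "197b", "197c", "197d", "197e"], 48000)
```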
As shown in FIG. 1, the memory 130 of the media device 110 includes an audio application 140. The audio application 140 is a computer algorithm for delivering personalized audio, stored in the memory 130 for execution by the processor 120. In some implementations, the audio application 140 may include a location module 141 and an audio profile 143. The audio application 140 may use the audio profile 143 to deliver personalized audio to one or more listeners located at different positions relative to the plurality of speakers 197a, 197b, …, 197n, based on each listener's personalized audio profile.
The audio application 140 also includes the location module 141, a computer code module for obtaining the location of the user device 105 and other user devices (not shown) in the room or theater. In some implementations, obtaining the location of the user device 105 may include transmitting a calibration signal by the media device 110. The calibration signals may include audio signals emitted from the plurality of speakers 197a, 197b, …, 197n. In response, the user device 105 may use a microphone (not shown) to detect the calibration signal emitted from each of the plurality of speakers 197a, 197b, …, 197n and determine its location relative to each of the speakers using triangulation techniques. In some implementations, the location module 141 may determine the location of the user device 105 using one or more cameras (not shown) of the system 100. In this way, the position of each user relative to each of the plurality of speakers 197a, 197b, …, 197n may be determined.
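The triangulation step can be implemented in several ways; one common choice is least-squares trilateration from the distances implied by each calibration signal's travel time. The sketch below assumes known speaker coordinates and timestamped arrivals, neither of which the disclosure prescribes:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature

def trilaterate(speaker_positions, arrival_times, emit_time=0.0):
    """Estimate the device position from calibration-signal travel times.

    speaker_positions: (n, 2) array of known speaker coordinates in meters.
    arrival_times: (n,) times at which each speaker's signal was heard.
    Solves the linearized range equations in a least-squares sense.
    """
    p = np.asarray(speaker_positions, dtype=float)
    d = SPEED_OF_SOUND * (np.asarray(arrival_times) - emit_time)  # ranges
    # Subtracting the first range equation from the rest linearizes
    # |x - p_i|^2 = d_i^2 into 2*(p_i - p_0) . x = (d_0^2 - d_i^2) + |p_i|^2 - |p_0|^2
    A = 2.0 * (p[1:] - p[0])
    b = (d[0] ** 2 - d[1:] ** 2) + np.sum(p[1:] ** 2, axis=1) - np.sum(p[0] ** 2)
    position, *_ = np.linalg.lstsq(A, b, rcond=None)
    return position

# Example: four speakers at the corners of a 5 m x 4 m room,
# with hypothetical arrival times in seconds.
speakers = [(0, 0), (5, 0), (5, 4), (0, 4)]
times = [0.0105, 0.0079, 0.0088, 0.0112]
print(trilaterate(speakers, times))
```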
The audio application 140 also includes the audio profile 143, which includes predefined listening modes that may be optimal for different audio content. For example, the audio profile 143 may include a listening mode with equalizer settings optimized for movies, such as reducing bass and increasing treble frequencies to enhance the playback of movie dialog for hard-of-hearing listeners. The audio profile 143 may also include listening modes optimized for certain types of programming, such as dramas and action movies, customized listening modes, and a normal listening mode that does not significantly change the audio. In some implementations, a customized listening mode may enable a user to enhance a portion of the audio content 107, such as music, dialog, and/or effects. Enhancing a portion of the audio content 107 may include increasing or decreasing the volume of that portion relative to the other portions, or changing equalizer settings to make that portion more prominent. The audio profile 143 may include the language in which the user hears the dialog. In some implementations, the audio profile 143 may include multiple languages, and the user may select the language in which to hear the dialog.
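As an illustration, such a profile might be represented as a small data structure. The field names and values below are assumptions for the sketch; the disclosure does not define a schema:

```python
from dataclasses import dataclass, field

@dataclass
class AudioProfile:
    """Illustrative per-user audio profile (all field names are assumptions)."""
    language: str = "en"
    listening_mode: str = "normal"  # normal | enhanced_dialog | custom | genre
    # Relative gains for separable portions of the audio content.
    stem_gains: dict = field(
        default_factory=lambda: {"dialog": 1.0, "music": 1.0, "effects": 1.0})
    # Simple three-band equalizer gains in dB.
    eq_db: dict = field(
        default_factory=lambda: {"bass": 0.0, "mid": 0.0, "treble": 0.0})

# An "enhanced dialog" profile: quieter bass, brighter treble, louder speech.
hard_of_hearing = AudioProfile(
    language="en",
    listening_mode="enhanced_dialog",
    stem_gains={"dialog": 1.5, "music": 0.8, "effects": 0.8},
    eq_db={"bass": -3.0, "mid": 0.0, "treble": 4.0},
)
```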
The plurality of speakers 197a, 197b, …, 197n may be surround sound speakers or any other speakers suitable for delivering audio selected from the audio content 107. The speakers may be connected to the media device 110 using speaker wires or wireless technology. The speakers may be mobile, and the user may change the position of one or more of the speakers 197a, 197b, …, 197n. In some implementations, the speakers 197a-197n may be used to create virtual speakers by exploiting interference among the audio signals transmitted from the speakers 197a-197n, based on their locations, to create the illusion that the sound originates from a virtual speaker. In other words, a virtual speaker is a speaker that does not physically exist at the location from which the sound appears to originate.
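A standard way to place such a phantom source between two physical speakers is constant-power amplitude panning. The sketch below shows that general technique, not anything specific to this disclosure:

```python
import numpy as np

def pan_virtual_source(signal, pan):
    """Constant-power pan of one signal between two speakers.

    pan: 0.0 = fully at the left speaker, 1.0 = fully at the right.
    Returns the two speaker feeds; played together they create the
    illusion of a source at an intermediate, "virtual" position.
    """
    theta = pan * (np.pi / 2.0)
    return np.cos(theta) * signal, np.sin(theta) * signal

# Place a hypothetical source a quarter of the way from left to right.
left_feed, right_feed = pan_virtual_source(np.random.randn(48000), pan=0.25)
```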
FIG. 2 illustrates an exemplary environment 200 utilizing the system 100 of FIG. 1 according to one implementation of the present disclosure. User 211 holds a user device 205a and user 212 holds a user device 205b. In some implementations, the user device 205a may be at the same location as the user 211 and the user device 205b may be at the same location as the user 212. Accordingly, when the media device 210 obtains the location of the user device 205a relative to the speakers 297a-297e, the media device 210 may obtain the location of the user 211 relative to the speakers 297a-297e. Similarly, when the media device 210 obtains the location of the user device 205b relative to the speakers 297a-297e, the media device 210 may obtain the location of the user 212 relative to the speakers 297a-297e.
The user device 205a may determine its location relative to the speakers 297a-297e through triangulation. For example, the user device 205a may receive audio calibration signals from the speaker 297a, the speaker 297b, the speaker 297d, and the speaker 297e using its microphone. Based on the received audio calibration signals, the user device 205a may determine its location relative to the speakers 297a-297e, e.g., by triangulation. The user device 205a may be connected to the media device 210 through a connection 255a, as shown. In some implementations, the user device 205a can transmit the determined location to the media device 210. The user device 205b may receive audio calibration signals from the speaker 297a, the speaker 297b, the speaker 297c, and the speaker 297e using its microphone. Based on the received audio calibration signals, the user device 205b may determine its location relative to the speakers 297a-297e, e.g., by triangulation. In some implementations, the user device 205b may be connected to the media device 210 through a connection 255b, as shown. In some implementations, the user device 205b can communicate its location to the media device 210 over the connection 255b. In other implementations, the user device 205b can receive the calibration signals and transmit this information over the connection 255b to the media device 210, which then determines the location of the user device 205b through triangulation.
FIG. 3 illustrates an exemplary environment 300 utilizing the system 100 of FIG. 1 according to one implementation of the present disclosure. It should be noted that, in order to clearly show how audio is delivered to user 311 and user 312, FIG. 3 does not show the user devices 205a and 205b. As shown in FIG. 3, the user 311 is located at a first location and receives first audio content 356. The user 312 is located at a second location and receives second audio content 358.
The first audio content 356 may include dialog in a language selected by the user 311 and may include other audio content such as music and effects. In some implementations, the user 311 may select a normal audio profile, where a normal audio profile refers to delivering the audio to the user 311 at levels unchanged from the audio content 107. The second audio content 358 may include dialog in a language selected by the user 312 and may include other audio content such as music and effects. In some implementations, the user 312 may select a normal audio profile, where a normal audio profile refers to delivering the audio portion to the user 312 at levels unchanged from the audio content 107.
Each of the speakers 397a-397e may transmit cancellation audio 357. The cancellation audio 357 may cancel a portion of the audio content transmitted by the speaker 397a, the speaker 397b, the speaker 397c, the speaker 397d, and the speaker 397e. In some implementations, the cancellation audio 357 may entirely cancel a portion of the first audio content 356 or a portion of the second audio content 358. For example, when the first audio content 356 includes dialog in a first language and the second audio content 358 includes dialog in a second language, the cancellation audio 357 may entirely cancel the first-language portion of the first audio content 356 so that the user 312 receives only dialog in the second language. In some implementations, the cancellation audio 357 may partially cancel a portion of the first audio content 356 or the second audio content 358. For example, when the first audio content 356 includes dialog at an increased level in the first language and the second audio content 358 includes dialog at a normal level in the first language, the cancellation audio 357 may partially cancel the dialog portion of the first audio content 356 so as to deliver the dialog to the user 312 at the appropriate level.
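Conceptually, cancellation audio is a phase-inverted, delay-compensated copy of the component to be removed at the listener's position. The sketch below is heavily simplified (a real system would also have to model the room response); the constants and the strength parameter are assumptions:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s
SAMPLE_RATE = 48000     # Hz

def cancellation_signal(component, distance_m, strength=1.0):
    """Build a cancellation feed for one audio component.

    component: samples of the portion to cancel (e.g., a dialog stem).
    distance_m: speaker-to-listener distance used to align arrival time.
    strength: 1.0 cancels fully; values in (0, 1) cancel partially.
    """
    delay_samples = int(round(SAMPLE_RATE * distance_m / SPEED_OF_SOUND))
    inverted = -strength * component
    # Advance the feed by the travel time so it arrives in anti-phase.
    return np.concatenate([inverted[delay_samples:], np.zeros(delay_samples)])

# Fully cancel a hypothetical dialog stem for a listener 2.5 m away.
dialog_stem = np.random.randn(48000)
anti_dialog = cancellation_signal(dialog_stem, distance_m=2.5, strength=1.0)
```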
FIG. 4 illustrates an exemplary flow diagram 400 of a method for delivering personalized audio in accordance with one implementation of the present disclosure. Beginning at 401, the audio application 140 receives the audio content 107. In some implementations, the audio content 107 can include multiple audio tracks, such as music tracks, dialog tracks, effect tracks, ambient sound tracks, background sound tracks, and so forth. In other implementations, the audio content 107 may include all of the audio related to the media being played back to the user in one audio track.
At 402, the media device 110 receives a first playback request from a first user device to play first audio content of the audio content 107 using the speakers 197a-197n. In some implementations, the first user device may be a smartphone, tablet computer, or other handheld device that includes a microphone and is adapted to transmit playback requests to the media device 110 and receive the calibration signals transmitted by the media device 110. The first playback request may be a wireless signal transmitted from the first user device to the media device 110. In some implementations, the media device 110 may send a signal to the user device 105 prompting the user to launch application software on the user device 105. The application software may be used to determine the location of the user device 105, and the user may use it to select audio settings, e.g., a language and an audio profile.
At 403, the media device 110 obtains a first position of a first user of the first user device relative to each of the plurality of speakers in response to the first playback request. In some implementations, the user device 105 may include a calibration application for use with the audio application 140. After the calibration application is launched, the user device 105 may receive a calibration signal from the media device 110. The calibration signal may be an audio signal transmitted by the plurality of speakers, such as the speakers 197a-197n, and the user device 105 may use the calibration signal to determine its position relative to each of the speakers. In some implementations, the user device 105 provides its location relative to each speaker to the media device 110. In other implementations, the user device 105 may receive the calibration signal using its microphone and transmit this information to the media device 110 for processing. In some implementations, the media device 110 may determine the location of the user device 105 relative to the speakers based on the information received from the user device 105.
The calibration signal transmitted by the media device 110 may be transmitted using the speakers 197a-197n. In some implementations, the calibration signal may be a human-audible audio signal, such as an audio signal between about 20 Hz and about 20 kHz, or a human-inaudible audio signal, such as an audio signal having a frequency greater than about 20 kHz. To determine the location of the user device 105 relative to each of the speakers, the speakers 197a-197n may transmit calibration signals at different times, or they may transmit calibration signals simultaneously. In some implementations, the calibration signal transmitted by each of the speakers may be unique, allowing the user device 105 to distinguish among the calibration signals transmitted by the speakers 197a-197n. The calibration signals may be used to determine the position of the user device 105 relative to the speakers 197a-197n, and may later be used to update that position.
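One simple way to make each speaker's calibration signal unique is to assign each speaker its own tone, either in the audible band or above it. The frequencies and speaker identifiers in this sketch are illustrative only:

```python
import numpy as np

SAMPLE_RATE = 48000  # Hz; Nyquist limit of 24 kHz covers the tones below

def calibration_tone(freq_hz, duration_s=0.5):
    """A pure tone the device can pick out with a bandpass filter or FFT."""
    t = np.arange(int(SAMPLE_RATE * duration_s)) / SAMPLE_RATE
    return np.sin(2 * np.pi * freq_hz * t)

# One distinct tone per speaker; these are above ~20 kHz, so they are
# inaudible to most listeners while still resolvable by a microphone.
calibration_signals = {
    "197a": calibration_tone(21000.0),
    "197b": calibration_tone(21500.0),
    "197c": calibration_tone(22000.0),
}
```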
In some implementations, the speakers 197a-197n may be wireless speakers, or they may be mobile speakers whose positions can be changed by a user. Accordingly, the position of each of the speakers 197a-197n may change, and the distances between the speakers 197a-197n may change. The calibration signals may be used to determine, and later to update, the relative positions of the speakers 197a-197n and/or the distances between them.
Alternatively, the system 100 may obtain, determine, and/or track the location of the user or users through the use of a camera. In some implementations, the system 100 may include a camera, such as a digital camera. The system 100 may obtain the location of the user device 105 and then map the location of the user device 105 to an image captured by the camera to determine the location of the user. In some implementations, the system 100 may use a camera and recognition software, such as facial recognition software, to derive the user's location.
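As an illustration, face detection with OpenCV's bundled Haar cascades could supply the pixel location of each user; mapping pixel coordinates to room coordinates would additionally require a calibrated camera, which this sketch omits. The disclosure names no specific recognition software, so this is only one plausible choice:

```python
import cv2

# OpenCV ships Haar cascades for frontal-face detection.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_user_pixels(frame):
    """Return (x, y, w, h) pixel boxes for faces found in one camera frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Usage: grab one frame from the default camera and locate users.
capture = cv2.VideoCapture(0)
ok, frame = capture.read()
if ok:
    print(detect_user_pixels(frame))
capture.release()
```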
Once the system 100 obtains the user's location, the system 100 may use the camera to continuously track the user's location and/or periodically update it. Continuous tracking or periodic updating is useful because the user may move during playback of the audio content 107. For example, a user watching a movie may change positions after returning from getting a snack. By tracking and/or updating the user's location, the system 100 may continue to deliver personalized audio to the user for the entire duration of the movie. In some implementations, the system 100 is configured to detect that the user or the user device has left the environment, e.g., the room, where the audio is being played. In response, the system 100 may stop transmitting the personalized audio corresponding to that user until the user returns to the room. If the user moves, the system 100 may prompt the user to update the user's location. To update the user's location, the media device 110 may transmit a calibration signal, such as a signal at a frequency greater than 20 kHz, to obtain an updated location of the user.
Further, the calibration signal may be used to determine the acoustic characteristics of the room, such as the shape of the room and the location of the walls relative to the speakers 197a-197n. The system 100 may use the calibration signal to determine the location of the walls and how sound echoes in the room. In some implementations, a wall may be used as another sound source. The walls and their configuration may likewise be taken into account to reduce or eliminate echo, whether instead of or in combination with echo cancellation. The system 100 may also determine other factors that affect how sound propagates through the environment, such as the humidity of the air.
At 404, the media device 110 receives a first audio profile from the first user device. The audio profile may include user preferences that determine the personalized audio to be delivered to the user. For example, the audio profile may include a language selection and/or a listening mode. In some implementations, the audio content 107 may include one dialog track in one language, or multiple dialog tracks, each in a different language. The user of the user device 105 may select the language in which to hear the dialog track, and the media device 110 may deliver personalized audio including the dialog in the selected language to the first user. The language heard by the first user may be the original language of the media being played back, or a different language.
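Selecting the dialog track could be as simple as a dictionary lookup keyed by language code, with a fallback to the media's original language. The track layout below is an assumption made for the sketch:

```python
def select_dialog_track(dialog_tracks, profile_language, original_language="en"):
    """Pick the dialog track matching the user's language selection.

    dialog_tracks: mapping of language code -> track samples.
    Falls back to the media's original language when no match exists.
    """
    if profile_language in dialog_tracks:
        return dialog_tracks[profile_language]
    return dialog_tracks[original_language]

# Hypothetical per-language dialog tracks.
tracks = {"en": [0.0] * 48000, "es": [0.0] * 48000}
chosen = select_dialog_track(tracks, profile_language="es")
```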
The listening mode may include settings designed to enhance the listening experience of the user, and different listening modes may be used in different situations. The system 100 may include an enhanced dialog listening mode; genre-specific listening modes for action shows, dramas, or other genres; a normal listening mode; and custom listening modes. The normal listening mode may deliver the audio as provided in the original media content, while a custom listening mode may allow the user to specify portions of the audio content 107 to enhance, e.g., music, dialog, and effects.
At 405, the media device 110 receives a second playback request from a second user device to play a second audio content of the plurality of audio content using the plurality of speakers. In some implementations, the second user device may be a smartphone, tablet computer, or other handheld device that includes a microphone adapted to transmit playback requests to media device 110 and receive calibration signals transmitted by media device 110. The second playback request may be a wireless signal transmitted from the second user device to the media device 110.
At 406, the media device 110 obtains a second position of a second user of the second user device relative to each of the plurality of speakers in response to the second playback request. In some implementations, the second user device may include a calibration application for use with the audio application 140. After the calibration application is launched, the second user device may receive a calibration signal from the media device 110. The calibration signal may be an audio signal transmitted by the plurality of speakers, such as the speakers 197a-197n, and the second user device may use the calibration signal to determine its position relative to each of the speakers. In some implementations, the second user device may provide its location relative to each speaker to the media device 110. In other implementations, the second user device may transmit information related to the received calibration signal to the media device 110, and the media device 110 may determine the position of the second user device relative to the speakers.
At 407, the media device 110 receives a second audio profile from the second user device. The second audio profile may include a second language and/or a second listening mode. After receiving the second audio profile, at 408 the media device 110 selects a first listening mode based on the first audio profile and a second listening mode based on the second audio profile. In some implementations, the first listening mode and the second listening mode may be the same listening mode or different listening modes. Continuing at 409, the media device 110 selects a first language based on the first audio profile and a second language based on the second audio profile. In some implementations, the first language may be the same as the second language, or they may be different.
At 410, the system 100 plays the first audio content of the plurality of audio contents based on the first audio profile and the first position of the first user of the first user device relative to each of the plurality of speakers. The system 100 plays the second audio content of the plurality of audio contents based on the second audio profile and the second position of the second user of the second user device relative to each of the plurality of speakers. In some implementations, the first audio content played by the plurality of speakers may include first dialog in the first language, and the second audio content played by the plurality of speakers may include second dialog in the second language.
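Putting these pieces together, the playback step can be sketched as building one personalized mix per user from the available stems and that user's per-stem gains. All names and values below are illustrative, not the disclosure's implementation:

```python
import numpy as np

def personalized_mix(stems, stem_gains):
    """Combine audio stems into one user's mix using per-stem gains."""
    n = min(len(s) for s in stems.values())
    mix = np.zeros(n)
    for name, samples in stems.items():
        mix += stem_gains.get(name, 1.0) * np.asarray(samples[:n])
    return mix

# Hypothetical stems for two users with different languages and profiles.
stems_user1 = {"dialog_en": np.zeros(48000), "music": np.zeros(48000)}
stems_user2 = {"dialog_es": np.zeros(48000), "music": np.zeros(48000)}
mix1 = personalized_mix(stems_user1, {"dialog_en": 1.5, "music": 0.8})
mix2 = personalized_mix(stems_user2, {"dialog_es": 1.0, "music": 1.0})
```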
The first audio content may include cancellation audio that cancels at least a portion of the second audio content being played by the speakers 197a-197n. In some implementations, the cancellation audio may partially or fully cancel a portion of the second audio content. To verify the effectiveness of the cancellation audio, the system 100 may prompt the user, using the user device 105, to indicate whether the user is hearing an audio track that they should not hear, e.g., dialog in a language other than the selected language. In some implementations, the user may be prompted to give additional subjective feedback, e.g., whether the music is at a sufficient volume.
From the above description, it should be apparent that various techniques can be used to implement the concepts described in this application without departing from the scope of those concepts. Moreover, although concepts have been described with specific reference to certain implementations, workers skilled in the art will recognize that changes may be made in form and detail without departing from the scope of those concepts. As such, the described implementations should be considered in all respects as illustrative and not restrictive. It should also be understood that the application is not limited to the particular implementations described above, but is capable of many rearrangements, modifications, and substitutions without departing from the scope of the disclosure.
Claims (20)
1. A system, comprising:
a plurality of speakers; and
a media apparatus, comprising:
a memory configured to store an audio application;
a processor configured to execute the audio application to:
receiving a plurality of audio contents;
receiving, from a first user device, a first playback request to play a first audio content of the plurality of audio content using the plurality of speakers;
in response to the first playback request, obtaining a first position of a first user of the first user device relative to each speaker of the plurality of speakers; and
playing, using the plurality of speakers, the first audio content of the plurality of audio content based on the first position of a first user of the first user device relative to each speaker of the plurality of speakers.
2. The system of claim 1, wherein the processor is further configured to execute the audio application to:
receiving, from a second user device, a second playback request to play a second audio content of the plurality of audio contents using the plurality of speakers;
in response to the second playback request, obtaining a second position of a second user of the second user device relative to each speaker of the plurality of speakers; and
playing, using the plurality of speakers, the second audio content of the plurality of audio content based on the second position of a second user of the second user device relative to each speaker of the plurality of speakers.
3. The system of claim 2, wherein the first of the plurality of audio content played by the plurality of speakers comprises cancellation audio to cancel at least a portion of the second of the plurality of audio content played by the plurality of speakers.
4. The system of claim 2, wherein the first of the plurality of audio content played by the plurality of speakers comprises first dialog in a first language and the second of the plurality of audio content played by the plurality of speakers comprises second dialog in a second language.
5. The system of claim 1, wherein obtaining the first location comprises receiving the first location from the first user device.
6. The system of claim 1, further comprising a camera, wherein obtaining the first location comprises using the camera.
7. The system of claim 1, wherein the processor is further configured to receive a first audio profile from the first user device and play the first of the plurality of audio content further based on the first audio profile.
8. The system of claim 7, wherein the first audio profile includes at least one of a language and a listening mode.
9. The system of claim 8, wherein the listening mode includes at least one of a normal, an enhanced dialog, a custom, and a genre listening mode.
10. The system of claim 1, wherein the first of the plurality of audio content comprises dialog in a user-selected language.
11. A method for a system comprising a plurality of speakers, a memory, and a processor, the method comprising:
receiving, using the processor, a plurality of audio content;
receiving, using the processor, a first playback request from a first user device to play a first audio content of the plurality of audio content using the plurality of speakers;
obtaining, using the processor, a first position of a first user of the first user device relative to each of the plurality of speakers in response to the first playback request; and
playing, using the plurality of speakers, the first audio content of the plurality of audio content based on the first position of the first user relative to each speaker of the plurality of speakers.
12. The method of claim 11, further comprising:
receiving, using the processor, a second playback request from a second user device to play a second audio content of the plurality of audio contents using the plurality of speakers;
obtaining, using the processor, a second position of a second user of the second user device relative to each of the plurality of speakers in response to the second playback request; and
playing, using the plurality of speakers, the second audio content of the plurality of audio content based on the second position of the second user relative to each speaker of the plurality of speakers.
13. The method of claim 12, wherein the first of the plurality of audio content played by the plurality of speakers comprises cancellation audio to cancel at least a portion of the second of the plurality of audio content played by the plurality of speakers.
14. The method of claim 12, wherein the first of the plurality of audio content played by the plurality of speakers comprises first dialog in a first language and the second of the plurality of audio content played by the plurality of speakers comprises second dialog in a second language.
15. The method of claim 11, wherein obtaining the first location comprises receiving the first location from the first user device.
16. The method of claim 11, wherein the system further comprises a camera, wherein obtaining the first location comprises using the camera.
17. The method of claim 11, wherein the method further comprises receiving a first audio profile from the first user device, and wherein the playing of the first one of the plurality of audio content is further based on the first audio profile.
18. The method of claim 17, wherein the first audio profile comprises at least one of a language and a listening mode.
19. The method of claim 18, wherein the listening mode includes at least one of a normal, an enhanced dialog, a custom, and a genre listening mode.
20. The method of claim 11, wherein the first of the plurality of audio content comprises dialog in a user-selected language.
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US14/805,405 | 2015-07-21 | | Systems and methods for delivery of personalized audio |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| HK1229589A1 true HK1229589A1 (en) | 2017-11-17 |