WO2013088208A1 - Audio scene alignment apparatus - Google Patents
- Publication number
- WO2013088208A1 (PCT/IB2011/055692)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- audio signal
- audio
- signal
- combined
- audio signals
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/4302—Content synchronisation processes, e.g. decoder synchronisation
- H04N21/4307—Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
- H04N21/43072—Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen of multiple content streams on the same device
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/02—Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
- G11B27/031—Electronic editing of digitised analogue information signals, e.g. audio or video signals
- G11B27/034—Electronic editing of digitised analogue information signals, e.g. audio or video signals on discs
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/21—Server components or server architectures
- H04N21/218—Source of audio or video content, e.g. local disk arrays
- H04N21/21805—Source of audio or video content, e.g. local disk arrays enabling multiple viewpoints, e.g. using a plurality of cameras
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/21—Server components or server architectures
- H04N21/218—Source of audio or video content, e.g. local disk arrays
- H04N21/2187—Live feed
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/27—Server based end-user applications
- H04N21/274—Storing end-user multimedia data in response to end-user request, e.g. network recorder
- H04N21/2743—Video hosting of uploaded data from client
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/45—Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
- H04N21/462—Content or additional data management, e.g. creating a master electronic program guide from data received from the Internet and a Head-end, controlling the complexity of a video stream by scaling the resolution or bit-rate based on the client capabilities
- H04N21/4622—Retrieving content or additional data from different sources, e.g. from a broadcast channel and the Internet
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/81—Monomedia components thereof
- H04N21/8106—Monomedia components thereof involving special audio data, e.g. different tracks for different languages
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/85—Assembly of content; Generation of multimedia applications
- H04N21/854—Content authoring
- H04N21/8547—Content authoring involving timestamps for synchronizing content
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/414—Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance
- H04N21/41407—Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance embedded in a portable device, e.g. video client on a mobile phone, PDA, laptop
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/422—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
- H04N21/42202—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] environmental sensors, e.g. for detecting temperature, luminosity, pressure, earthquakes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/422—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
- H04N21/42203—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] sound input device, e.g. microphone
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/422—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
- H04N21/4223—Cameras
Definitions
- the present application relates to apparatus for the processing of audio and additionally audio-video signals to enable the alignment of audio signals.
- the invention further relates to, but is not limited to, apparatus for processing audio and additionally audio-video signals from mobile devices.
- Multiple 'feeds' may be found in sharing services for video and audio signals (such as those employed by YouTube).
- Such systems are known and are widely used to share user-generated content recorded and uploaded or up-streamed to a server and then downloaded or down-streamed to a viewing/listening user.
- Such systems rely on users recording and uploading or up-streaming a recording of an event using the recording facilities at hand to the user. This may typically be in the form of the camera and microphone arrangement of a mobile device such as a mobile phone.
- the viewing/listening end user may then select one of the up-streamed or uploaded data to view or listen.
- an apparatus comprising at least one processor and at least one memory including computer code for one or more programs, the at least one memory and the computer code configured to, with the at least one processor, cause the apparatus to at least perform: generating a combined audio signal from at least two first audio signals; comparing the combined audio signal to at least one second audio signal; determining an alignment value configured to temporally align the combined audio signal to the at least one second audio signal; and associating the alignment value to the at least one second audio signal so as to temporally align the at least two first audio signals with the at least one second audio signal.
- the apparatus may be further caused to perform: receiving the at least two first audio signals from an audio scene; receiving the at least one second audio signal from the audio scene.
- the apparatus may be further caused to perform aligning at least one of the at least two first audio signals and the at least one second audio signal dependent on the alignment value.
- the apparatus may be further caused to render the aligned at least one of the at least two first audio signals and the at least one second audio signal for outputting.
- Generating a combined audio signal from at least two first audio signals may cause the apparatus to perform: generating a combined audio signal from an average of the at least two first audio signals when the at least two first audio signals are concurrent; and otherwise appending to the combined audio signal the available part of the at least two first audio signals.
- Comparing the combined audio signal to at least one second audio signal may cause the apparatus to perform a cross-correlation between the combined audio signal and the at least one second audio signal.
- Determining an alignment value configured to temporally align the combined audio signal to the at least one second audio signal may cause the apparatus to perform determining a time offset which maximises the cross-correlation product between the combined audio signal and the at least one second audio signal.
- Associating the alignment value to the at least two first audio signals may cause the apparatus to perform: assigning the at least two first audio signals to a first group of audio signals; assigning the at least one second audio signal to a second group of audio signals; assigning a null alignment value to the first group of audio signals; and assigning the alignment value to the second group of audio signals.
- the apparatus may be further caused to perform: generating a further combined audio signal from the at least two first audio signals and the at least one second audio signal associated with the alignment value; comparing the further combined audio signal to at least one further second audio signal; determining a further alignment value configured to temporally align the further combined audio signal to the at least one further second audio signal; and associating the further alignment value to the at least one further second audio signal so as to temporally align the at least two first audio signals with the at least one second audio signal and the at least one further second audio signal.
- the first audio signals may comprise timestamp information and the second audio signals lack the timestamp information.
- a method comprising: generating a combined audio signal from at least two first audio signals; comparing the combined audio signal to at least one second audio signal; determining an alignment value configured to temporally align the combined audio signal to the at least one second audio signal; and associating the alignment value to the at least one second audio signal so as to temporally align the at least two first audio signals with the at least one second audio signal.
- the method may further comprise: receiving the at least two first audio signals from an audio scene; receiving the at least one second audio signal from the audio scene.
- the method may further comprise aligning at least one of the at least two first audio signals and the at least one second audio signal dependent on the alignment value.
- the method may further comprise rendering the aligned at least one of the at least two first audio signals and the at least one second audio signal for outputting.
- Generating a combined audio signal from at least two first audio signals may comprise: generating a combined audio signal from an average of the at least two first audio signals when the at least two first audio signals are concurrent; and otherwise appending to the combined audio signal the available part of the at least two first audio signals.
- Comparing the combined audio signal to at least one second audio signal may comprise a cross-correlation between the combined audio signal and the at least one second audio signal.
- Determining an alignment value configured to temporally align the combined audio signal to the at least one second audio signal may comprise determining a time offset which maximises the cross-correlation product between the combined audio signal and the at least one second audio signal.
- Associating the alignment value to the at least two first audio signals may comprise: assigning the at least two first audio signals to a first group of audio signals; assigning the at least one second audio signal to a second group of audio signals; assigning a null alignment value to the first group of audio signals; and assigning the alignment value to the second group of audio signals.
- the method may further comprise: generating a further combined audio signal from the at least two first audio signals and the at least one second audio signal associated with the alignment value; comparing the further combined audio signal to at least one further second audio signal; determining a further alignment value configured to temporally align the further combined audio signal to the at least one further second audio signal; and associating the further alignment value to the at least one further second audio signal so as to temporally align the at least two first audio signals with the at least one second audio signal and the at least one further second audio signal.
- the first audio signals may comprise timestamp information and the second audio signals lack the timestamp information.
- an apparatus comprising: a signal combiner configured to generate a combined audio signal from at least two first audio signals; a combined signal comparator configured to compare the combined audio signal to at least one second audio signal; a signal aligner configured to determine an alignment value configured to temporally align the combined audio signal to the at least one second audio signal; and a full signal aligner configured to associate the alignment value to the at least one second audio signal so as to temporally align the at least two first audio signals with the at least one second audio signal.
- the apparatus may further comprise: a first signal receiver configured to receive the at least two first audio signals from an audio scene; and a second signal receiver configured to receive the at least one second audio signal from the audio scene.
- the apparatus may further comprise a delay configured to align at least one of the at least two first audio signals and the at least one second audio signal dependent on the alignment value.
- the apparatus may further comprise a signal renderer configured to render the aligned at least one of the at least two first audio signals and the at least one second audio signal for outputting.
- the signal combiner may comprise: an averager configured to generate a combined audio signal from an average of the at least two first audio signals when the at least two first audio signals are concurrent; and an appender configured otherwise to append to the combined audio signal the available part of the at least two first audio signals.
- the signal comparator may comprise a cross-correlator configured to generate a cross-correlation product between the combined audio signal and the at least one second audio signal.
- the alignment determiner may comprise an offset determiner configured to determine a time offset which maximises the cross-correlation product between the combined audio signal and the at least one second audio signal.
- the full signal aligner may comprise: a first assigner configured to assign the at least two first audio signals to a first group of audio signals; a second assigner configured to assign the at least one second audio signal to a second group of audio signals; a null assigner configured to assign a null alignment value to the first group of audio signals; and a delay assigner configured to assign the alignment value to the second group of audio signals.
- the signal combiner may be further configured to generate a further combined audio signal from the at least two first audio signals and the at least one second audio signal associated with the alignment value.
- the comparator may be further configured to compare the further combined audio signal to at least one further second audio signal.
- the aligner may be further configured to determine a further alignment value configured to temporally align the further combined audio signal to the at least one further second audio signal.
- the full signal aligner may be further configured to associate the further alignment value to the at least one further second audio signal so as to temporally align the at least two first audio signals with the at least one second audio signal and the at least one further second audio signal.
- the first audio signals may comprise timestamp information and the second audio signals lack the timestamp information.
- an apparatus comprising: means for generating a combined audio signal from at least two first audio signals; means for comparing the combined audio signal to at least one second audio signal; means for determining an alignment value configured to temporally align the combined audio signal to the at least one second audio signal; and means for associating the alignment value to the at least one second audio signal so as to temporally align the at least two first audio signals with the at least one second audio signal.
- the apparatus may further comprise: means for receiving the at least two first audio signals from an audio scene; means for receiving the at least one second audio signal from the audio scene.
- the apparatus may further comprise means for aligning at least one of the at least two first audio signals and the at least one second audio signal dependent on the alignment value.
- the apparatus may further comprise means for rendering the aligned at least one of the at least two first audio signals and the at least one second audio signal for outputting.
- the means for generating a combined audio signal from at least two first audio signals may comprise: means for generating a combined audio signal from an average of the at least two first audio signals when the at least two first audio signals are concurrent; and means for otherwise appending to the combined audio signal the available part of the at least two first audio signals.
- the means for comparing the combined audio signal to at least one second audio signal may comprise means for generating a cross-correlation product between the combined audio signal and the at least one second audio signal.
- the means for determining an alignment value configured to temporally align the combined audio signal to the at least one second audio signal may comprise means for determining a time offset which maximises the cross-correlation product between the combined audio signal and the at least one second audio signal.
- the means for associating the alignment value to the at least two first audio signals may comprise: means for assigning the at least two first audio signals to a first group of audio signals; means for assigning the at least one second audio signal to a second group of audio signals; means for assigning a null alignment value to the first group of audio signals; and means for assigning the alignment value to the second group of audio signals.
- the means for combining may further comprise means for generating a further combined audio signal from the at least two first audio signals and the at least one second audio signal associated with the alignment value.
- the means for comparing may further comprise means for comparing the further combined audio signal to at least one further second audio signal.
- the means for determining an alignment value may comprise means for determining a further alignment value configured to temporally align the further combined audio signal to the at least one further second audio signal.
- the means for associating the alignment value may comprise means for associating the further alignment value to the at least one further second audio signal so as to temporally align the at least two first audio signals with the at least one second audio signal and the at least one further second audio signal.
- the first audio signals may comprise timestamp information and the second audio signals lack the timestamp information.
- a computer program product stored on a medium may cause an apparatus to perform the method as described herein.
- An electronic device may comprise apparatus as described herein.
- a chipset may comprise apparatus as described herein.
- Embodiments of the present application aim to address problems associated with the state of the art.
- Figure 1 shows schematically a multi-user free-viewpoint service sharing system which may encompass embodiments of the application
- Figure 2 shows schematically an apparatus suitable for being employed in embodiments of the application
- Figure 3 shows schematically an audio signal system according to some embodiments
- Figure 4 shows a flow diagram of the operation of the audio signal system as shown in Figure 3;
- Figure 5 shows schematically the time stamp determiner shown in Figure 3 in further detail according to some embodiments
- Figure 6 shows the operation of the time stamp determiner shown in Figure 5;
- Figure 7 shows an example set of time stamped and non-time stamped audio signals to be aligned according to embodiments.
- in the following, audio signals and audio capture signals are described. However, it would be appreciated that in some embodiments the audio signal/audio capture is part of an audio-video system.
- the concept of this application is related to assisting in the production of immersive person-to-person communication and can include video. It would be understood that the devices recording the audio signal can be arbitrarily positioned within an event space.
- the captured signals as described herein are transmitted or alternatively stored for later consumption where the end user can select the listening point based on their preference from the reconstructed audio space.
- the rendering part can then provide one or more downmixed signals, mixed from the multiple recordings, that correspond to the selected listening point.
- each recording device can record the event scene and upload or upstream the recorded content.
- the uploading or up-streaming process can implicitly include positioning information about where the content is being recorded.
- audio signal content can be uploaded in non-real time operations.
- the media content in the form of a captured audio signal can be uploaded a few minutes, hours, days or weeks after the event.
- the amount of content that represents each particular event and held in the server can fluctuate as a function of time.
- uploaded audio signal content typically does not employ common time keeping or time stamping and therefore the newly uploaded or streamed content needs to be aligned to use a common time stamping before any down mixed signal is provided for consumption.
- new content can be uploaded to the server at any time, so a situation could arise where the existing content already uses a common time stamping, however new content lacks this time stamp and should be transformed to this time stamping mode.
- the concept of this application therefore is to provide an enabler for the case where some of the audio signal content for the event is already converted to a common time stamping and the conversion is to be applied to any new uploaded audio signal content.
- a common time base can be achieved with a dedicated synchronisation signal
- the capture devices are equipped to receive a specific beacon signal or timing information obtained through a network or other received data such as positioning satellite timing data (such as from a GPS satellite)
- the use of a beacon signal typically requires special hardware and/or software installations to the recording or capture apparatus which limits the applicability of multiuser sharing services as recording devices become too expensive for mass use or limits the use of existing devices.
- where GPS or other satellite synchronisation timing signals are used, the device requires a GPS or other satellite positioning receiver to receive the signal, and furthermore could not be used in circumstances where a GPS or satellite positioning signal is not available - such as, for example, indoors, in heavily built-up urban areas, or in woodland or forest regions.
- the recording devices can synchronise the recordings against a network time protocol (NTP) reference.
- an NTP reference requires a network connection, which may not be available in all situations, and timing errors can typically be introduced into the time stamps due to transmission delays.
- the audio space 1 can have located within it at least one recording or capturing device or apparatus 19, arbitrarily positioned within the audio space to record suitable audio scenes.
- the apparatus 19 shown in Figure 1 are represented as microphones with a polar gain pattern 101 showing the directional audio capture gain associated with each apparatus.
- the apparatus 19 in Figure 1 are shown such that some of the apparatus are capable of attempting to capture the audio scene or activity 103 within the audio space.
- the activity 103 can be any event the user of the apparatus wishes to capture. For example the event could be a music event or audio of a "news worthy" event.
- although the apparatus 19 is shown having a directional microphone gain pattern 101, it would be appreciated that in some embodiments the microphone or microphone array of the recording apparatus 19 has an omnidirectional gain or a different gain profile to that shown in Figure 1.
- Each recording apparatus 19 can in some embodiments transmit or alternatively store for later consumption the captured audio signals via a transmission channel 107 to an audio scene server 109.
- the recording apparatus 19 in some embodiments can encode the audio signal to compress the audio signal in a known way in order to reduce the bandwidth required in "uploading" the audio signal to the audio scene server 109.
- the recording apparatus 19 in some embodiments can be configured to estimate and upload via the transmission channel 107 to the audio scene server 109 an estimation of the location and/or the orientation or direction of the apparatus.
- the position information can be obtained, for example, using GPS coordinates, cell-ID or a-GPS or any other suitable location estimation methods and the orientation/direction can be obtained, for example using a digital compass, accelerometer, or gyroscope information.
- the recording apparatus 19 can be configured to capture or record one or more audio signals, for example the apparatus in some embodiments has multiple microphones each configured to capture the audio signal from different directions. In such embodiments the recording device or apparatus 19 can record and provide more than one signal from different directions/orientations and further supply position/direction information for each signal.
- each of the captured or recorded audio signals can be defined as an audio or sound source.
- each audio source can be defined as having a position or location which can be an absolute or relative value.
- the audio source can be defined as having a position relative to a desired listening location or position.
- the audio source can be defined as having an orientation, for example where the audio source is a beamformed processed combination of multiple microphones in the recording apparatus, or a directional microphone.
- the orientation may have both a directionality and a range, for example defining the 3dB gain range of a directional microphone.
- the capturing and encoding of the audio signal and the estimation of the position/direction of the apparatus is shown in Figure 1 by step 1001.
- the uploading of the audio and position/direction estimate to the audio scene server 109 is shown in Figure 1 by step 1003.
- the audio scene server 109 furthermore can in some embodiments communicate via a further transmission channel 111 to a listening device 113.
- the listening device 113, which is represented in Figure 1 by a set of headphones, can prior to or during downloading via the further transmission channel 111 select a listening point, in other words select a position such as indicated in Figure 1 by the selected listening point 105.
- the listening device 113 can communicate the request via the further transmission channel 111 to the audio scene server 109.
- the audio scene server 109 can as discussed above in some embodiments receive from each of the recording apparatus 19 an approximation or estimation of the location and/or direction of the recording apparatus 19.
- the audio scene server 109 can in some embodiments from the various captured audio signals from recording apparatus 19 produce a composite audio signal representing the desired listening position and the composite audio signal can be passed via the further transmission channel 111 to the listening device 113.
- the listening device 113 can request a multiple channel audio signal or a mono-channel audio signal. This request can in some embodiments be received by the audio scene server 109 which can generate the requested multiple channel data.
- the audio scene server 109 in some embodiments can receive each uploaded audio signal and can keep track of the positions and the associated direction/orientation associated with each audio source.
- the audio scene server 109 can provide a high level coordinate system which corresponds to locations where the uploaded/upstreamed content source is available to the listening device 113. The "high level" coordinates can be provided for example as a map to the listening device 113 for selection of the listening position.
- the listening device (end user or an application used by the end user) can in such embodiments be responsible for determining or selecting the listening position and sending this information to the audio scene server 109.
- the audio scene server 109 can in some embodiments receive the selection/determination and transmit the downmixed signal corresponding to the specified location to the listening device.
- the listening device/end user can be configured to select or determine other aspects of the desired audio signal, for example signal quality, number of channels of audio desired, etc.
- the audio scene server 109 can provide in some embodiments a selected set of downmixed signals which correspond to listening points neighbouring the desired location/direction and the listening device 113 selects the audio signal desired.
- Figure 2 shows a schematic block diagram of an exemplary apparatus or electronic device 10, which may be used to record (or operate as a recording device 19) or listen (or operate as a listening device 113) to the audio signals (and similarly to record or view the audio-visual images and data). Furthermore in some embodiments the apparatus or electronic device can function as the audio scene server 109.
- the electronic device 10 may for example be a mobile terminal or user equipment of a wireless communication system when functioning as the recording device or listening device 113.
- the apparatus can be an audio player or audio recorder, such as an MP3 player, a media recorder/player (also known as an MP4 player), or any suitable portable device suitable for recording audio, or an audio/video camcorder or memory audio or video recorder.
- the apparatus 10 can in some embodiments comprise an audio subsystem.
- the audio subsystem for example can comprise in some embodiments a microphone or array of microphones 11 for audio signal capture.
- the microphone or array of microphones can be a solid state microphone, in other words capable of capturing audio signals and outputting a suitable digital format signal.
- the microphone or array of microphones 11 can comprise any suitable microphone or audio capture means, for example a condenser microphone, capacitor microphone, electrostatic microphone, electret condenser microphone, dynamic microphone, ribbon microphone, carbon microphone, piezoelectric microphone, or microelectrical-mechanical system (MEMS) microphone.
- the microphone 11 or array of microphones can in some embodiments output the audio captured signal to an analogue-to-digital converter (ADC) 14.
- the apparatus can further comprise an analogue-to-digital converter (ADC) 14 configured to receive the analogue captured audio signal from the microphones and outputting the audio captured signal in a suitable digital form.
- the analogue-to-digital converter 14 can be any suitable analogue-to-digital conversion or processing means.
- the apparatus 10 audio subsystem further comprises a digital-to-analogue converter 32 for converting digital audio signals from a processor 21 to a suitable analogue format.
- the digital-to-analogue converter (DAC) or signal processing means 32 can in some embodiments be any suitable DAC technology.
- the audio subsystem can comprise in some embodiments a speaker 33.
- the speaker 33 can in some embodiments receive the output from the digital-to-analogue converter 32 and present the analogue audio signal to the user.
- the speaker 33 can be representative of a headset, for example a set of headphones, or cordless headphones.
- although the apparatus 10 is shown having both audio capture and audio presentation components, it would be understood that in some embodiments the apparatus 10 can comprise one or the other of the audio capture and audio presentation parts of the audio subsystem, such that in some embodiments of the apparatus only the microphone (for audio capture) or only the speaker (for audio presentation) is present.
- the apparatus 10 comprises a processor 21.
- the processor 21 is coupled to the audio subsystem and specifically in some examples the analogue-to-digital converter 14 for receiving digital signals representing audio signals from the microphone 11, and the digital-to-analogue converter (DAC) 32 configured to output processed digital audio signals.
- the processor 21 can be configured to execute various program codes.
- the implemented program codes can comprise for example audio classification and audio scene mapping code routines.
- the program codes can be configured to perform audio scene event detection and device selection indicator generation, wherein the audio scene server 109 can be configured to determine events from multiple received audio recordings to assist the user in selecting an audio recording which is meaningful and does not require the listener to carry out undue searching of all of the audio recordings.
- the apparatus further comprises a memory 22.
- the processor is coupled to memory 22.
- the memory can be any suitable storage means.
- the memory 22 comprises a program code section 23 for storing program codes implementable upon the processor 21.
- the memory 22 can further comprise a stored data section 24 for storing data, for example data that has been encoded in accordance with the application or data to be encoded via the application embodiments as described later.
- the implemented program code stored within the program code section 23, and the data stored within the stored data section 24 can be retrieved by the processor 21 whenever needed via the memory-processor coupling.
- the apparatus 10 can comprise a user interface 15.
- the user interface 15 can be coupled in some embodiments to the processor 21.
- the processor can control the operation of the user interface and receive inputs from the user interface 15.
- the user interface 15 can enable a user to input commands to the electronic device or apparatus 10, for example via a keypad, and/or to obtain information from the apparatus 10, for example via a display which is part of the user interface 15.
- the user interface 15 can in some embodiments comprise a touch screen or touch interface capable of both enabling information to be entered to the apparatus 10 and further displaying information to the user of the apparatus 10.
- the apparatus further comprises a transceiver 13, the transceiver in such embodiments can be coupled to the processor and configured to enable a communication with other apparatus or electronic devices, for example via a wireless communications network.
- the transceiver 13 or any suitable transceiver or transmitter and/or receiver means can in some embodiments be configured to communicate with other electronic devices or apparatus via a wire or wired coupling.
- the coupling can, as shown in Figure 1, be the transmission channel 107 (where the apparatus is functioning as the recording device 19 or audio scene server 109) or further transmission channel 111 (where the device is functioning as the listening device 113 or audio scene server 109).
- the transceiver 13 can communicate with further devices by any suitable known communications protocol, for example in some embodiments the transceiver 13 or transceiver means can use a suitable universal mobile telecommunications system (UMTS) protocol, a wireless local area network (WLAN) protocol such as for example IEEE 802.X, a suitable short-range radio frequency communication protocol such as Bluetooth, or infrared data communication pathway (IRDA).
- the apparatus comprises a position sensor 16 configured to estimate the position of the apparatus 10.
- the position sensor 16 can in some embodiments be a satellite positioning sensor such as a GPS (Global Positioning System), GLONASS or Galileo receiver.
- the positioning sensor can be a cellular ID system or an assisted GPS system.
- the apparatus 10 further comprises a direction or orientation sensor.
- the orientation/direction sensor can in some embodiments be an electronic compass, an accelerometer, or a gyroscope, or the orientation/direction can be determined from the motion of the apparatus using the positioning estimate.
- the structure of the electronic device 10 could be supplemented and varied in many ways.
- the above apparatus 10 in some embodiments can be operated as an audio scene server 109.
- the audio scene server 109 can comprise a processor, memory and transceiver combination.
- Figure 3 shows an example alignment apparatus for content time stamping for crowd-sourced media content.
- the apparatus is configured to create a combined signal for the received time stamped content, to align the non-time stamped content signal and the combined signal and then to further align both the time stamped and the non-time stamped content signal using the information from the operation of aligning the non-time stamped content signal and the combined signal.
- the alignment apparatus comprises in some embodiments a non-time stamped content input 201 or suitable input means.
- the non-time stamped content input 201 is configured to receive audio signals in any suitable format and pass these to the time stamp determiner 205.
- the non-time stamped content input 201 can be configured to pre-process the non-time stamped content to deliver the audio data in a format suitable for processing by the time stamp determiner 205.
- the alignment apparatus comprises a time stamped content input 203 or suitable second input means.
- the time stamped content input 203 can be configured to receive the time stamped audio data in any suitable format and pass the time stamped content audio data to the time stamp determiner 205 for further processing.
- the time stamped content input 203 and non-time stamped content input 201 can be implemented by a transceiver element of an audio scene server receiving audio signals from various capture devices recording audio signals within the audio scene and/or from the memory or store associated with the audio scene server.
- the alignment apparatus represents a combination of an audio content recorder or capturer, audio scene server and listener apparatus, and thus receives the time stamped content via a communications coupling such as a wireless communications link.
- the alignment apparatus comprises a time stamp determiner 205 or suitable alignment means.
- the time stamp determiner 205 is configured to receive the audio data from the non-time stamped content input 201 and also the audio data from the time stamped content input 203 and determine (or align) a time stamp for the non-time stamped content based on the time stamped content audio signals.
- the time stamp determiner 205 can output the audio data, both the received time stamped content and the aligned non-time stamped content, to a content renderer 207.
- the alignment apparatus comprises a content renderer 207 or rendering means.
- the content renderer 207 is configured in some embodiments to receive the audio signals and render these into an audio signal suitable for consumption.
- the content renderer 207 can be configured to generate a multi-channel audio signal from the input content audio signals suitable for passing to a listener apparatus.
- the content renderer 207 can be implemented in the audio scene server, in the content recorder or capturer, or in the content listener apparatus.
- the implementation of content rendering is generally known and will not be described any further.
- the apparatus further comprises a content processor 209 configured to receive the rendered audio signal data from the content renderer 207 and process it in order that it can be displayed or listened to by the end user.
- the content processor 209 can in some embodiments control the content renderer 207 to produce a suitable rendered audio signal at a determined position, or configuration.
- the content processor 209 can in some embodiments be implemented within a listener apparatus.
- the operation of processing the content, or consuming the content, is shown in Figure 4 by step 309.
- with respect to Figure 5, the time stamp determiner 205 is shown in further detail. Furthermore, with respect to Figure 6, the operation of the time stamp determiner as shown in Figure 5 is described in further detail.
- the time stamp determiner 205 comprises a signal combiner 401 or combiner means.
- the signal combiner 401 is configured to receive at least two of the time stamped audio content signals and combine these signals to generate a single combined time stamp signal.
- the time stamped audio signals 651 can be combined in the signal combiner 401 to generate a combined signal.
- the combined signal can be created by a suitable averaging and appending means according to the following mathematical expression,
- $\bar{x}(t) = \frac{1}{nClip_{tIdx}} \sum_{idx \in clipIdx_{tIdx}} x_{idx}(t), \qquad t\_start_{tIdx} \le t < t\_end_{tIdx}$
- where $x$ is the audio signal (or representative signal derived from the audio signal) of the timestamped crowd-sourced content,
- $nClip_{tIdx}$ and $clipIdx_{tIdx}$ describe the number of content items and the content item indices for the tIdx-th time segment, respectively, and
- $t\_start_{tIdx}$ and $t\_end_{tIdx}$ describe the start and end times of the segment for the tIdx-th time segment, respectively.
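- as a rough illustration of this averaging-and-appending rule, the following Python sketch builds a combined track by averaging the time stamped clips where they are concurrent and keeping the single available clip elsewhere (equal sample rates and sample-index start positions are assumed; the names are illustrative, not from the patent):

```python
import numpy as np

def combine_clips(clips, starts, total_len):
    """Build a combined signal: average the time stamped clips over the
    samples where several are concurrent, keep the lone clip elsewhere."""
    combined = np.zeros(total_len)
    count = np.zeros(total_len)              # clips covering each sample
    for clip, start in zip(clips, starts):
        end = min(start + len(clip), total_len)
        combined[start:end] += clip[:end - start]
        count[start:end] += 1
    covered = count > 0
    combined[covered] /= count[covered]      # the averaging step
    return combined, covered

# e.g. three time stamped clips A, B, C laid out as in Figure 7:
# combined, covered = combine_clips([a, b, c], [0, 40_000, 90_000], 200_000)
```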
- the combined signal 652 can then be output by the signal combiner 401 to a combined signal aligner determiner 403.
- the time stamp determiner comprises a combined signal aligner determiner 403.
- the combined signal aligner determiner 403 is configured to receive the combined signal from the signal combiner 401 containing a time stamp and also the non-time stamped content audio signals and attempt to align these signals.
- in Figure 7 the combined signal 606 CX and a non-time stamped content signal 607 D are shown.
- the combined signal aligner 403 can be configured to determine the time offset between the signals, in other words whether the non-time stamped content audio signal is delayed with respect to the combined signal with time stamped content values, or vice versa.
- the determination of the time offset can be carried out via a time offset determiner or suitable signal aligner means or function.
- the output tOffset therefore in such embodiments contains corresponding time offset values for each of the input signals.
- the determination of time offset can be carried out by any suitable function, such as a correlation analysis as indicated by Carter, Nuttall and Cable in "The smoothed coherence transform", Proceedings of the IEEE, Vol. 61, No. 10, October 1973, pp. 1497-1498.
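- a minimal sketch of such a correlation-based offset search is given below; it uses a plain normalised cross-correlation rather than the smoothed coherence transform cited above, and all names are illustrative:

```python
import numpy as np

def time_offset(combined, candidate, max_lag):
    """Return the lag (in samples) of `candidate` relative to `combined`
    that maximises the normalised cross-correlation, searching lags in
    [-max_lag, max_lag]."""
    best_lag, best_score = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        a = combined[lag:] if lag >= 0 else combined
        b = candidate if lag >= 0 else candidate[-lag:]
        n = min(len(a), len(b))
        if n == 0:
            continue                          # no overlap at this lag
        a, b = a[:n], b[:n]
        score = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```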
- the output of the combined signal aligner determiner 403 can then be passed in some embodiments to the full signal aligner 405.
- the time stamp determiner 205 comprises a full signal aligner 405 or suitable full signal alignment means configured to align or time stamp the non-time stamped content signal with respect to the time stamped content, in other words to map the output time offsets to actual input content items. For example using the signals A, B, C and D from Figure 7, the following offsets can be determined,
- D_offset = tOffset(1)
- tOffset2 = {A_offset, B_offset, C_offset, D_offset}
- the time offset for all of these is equal to the time offset of CX.
- time stamps of previously time stamped and non-time stamped content items can be updated to use a common time base. This can be performed, for example in some embodiments by having the following information, Table 1 :
- the grpID is used to identify crowd-sourced content items that are continuous in time.
- content audio signals A, B, and C could in such embodiments be assigned the same grpID.
- the start and end timestamps identify the positions of the content within the continuous timeline, either in absolute terms or in relative terms (with respect to content A in the example of Figure 7, as that appears first in the continuous timeline).
- the following information can be generated from the input content items, the creation of the combined signal(s) and the output of the time offset function, Table 2:
- nGroups — number of groups that share the same continuous timeline
- trackIdx — number of combined signals
- grpCombID — group index for signals in Equation (2)
- trackIdxLeft — the indices of the non-timestamped content items in the order they appear in the input content; for example, trackIdxLeft for the example signal set in Figure 7 would be trackIdxLeft = {D}, as content D is the first content item that appears after the combined signals
- grpCombID indicates which of the input signals in the time offset function share the same continuous timeline. In some embodiments it can be assumed that the time_offset() function can provide this information as an output value. Where for example signals do not share the same continuous timeline, the time offset function can be configured to assign a time offset in such a way that the different continuous timelines do not overlap.
- grpCombIdxNum indicates the number of clips (also including the content signal itself) that the particular content signal is overlapping with.
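- as a loose illustration of the bookkeeping Tables 1 and 2 describe, the per-item and per-pass records could be held as follows (the field names come from the text; the record structure itself is an assumption):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ContentItem:
    """Per-item grouping data in the spirit of Table 1."""
    grpID: Optional[int]      # continuous-timeline group; None when unknown
    startTS: float            # start position within the continuous timeline
    stopTS: float             # end position within the continuous timeline

@dataclass
class CombinePass:
    """Data generated by the combined-signal pass, in the spirit of Table 2."""
    nGroups: int              # groups sharing the same continuous timeline
    trackIdx: int             # number of combined signals
    grpCombID: List[int]      # timeline group index per input signal
    trackIdxLeft: List[int]   # non-timestamped item indices, in input order
    grpCombIdxNum: List[int]  # clips each content signal overlaps with
```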
- the alignment or mapping can be performed using such generated values as described herein according to the following pseudo code,
- M is the number of input content items which includes the time stamped and non-time stamped content audio signals that have been specified for processing.
- lines 6 to 51 are repeated for each of the non-time stamped content items.
- the group identification value grpID of the non-time stamped input content is checked to determine if it is unknown, as shown in line 8 of the above pseudo code. Furthermore, after determining the time offset, the full signal aligner checks whether the non-time stamped audio signal overlaps with some of the other input signals, as shown on line 10.
- lines 12 to 15 determine whether the signal is overlapping with any of the combined signals. Where no overlapping group is found, the signal parameters (the input content item index, as shown in line 20, and the time offset, as in line 21) are appended to the output grouping data as shown in lines 19 to 22 of the pseudo code.
- as shown in lines 24 onwards, the full signal aligner checks the input content item to determine whether the audio signal item belongs to the combined signal, as shown in line 29. Where this is the situation, the values are appended to the output grouping data as shown in lines 38 to 40. Furthermore, the input content item that is overlapping with the combined signal is also appended, as shown in lines 45 to 48. Lines 31 to 34 of the pseudo code ensure that the signal parameters are appended only once to the output grouping data.
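- the pseudo code itself is not reproduced in this text; a rough, hypothetical rendering of the mapping logic described above might look like the following (the overlap test and all names are assumptions, not the patent's code):

```python
def map_to_groups(item_indices, offsets, overlapping_comb_grp):
    """Append (item, grpID, time offset) triples to the output grouping
    data: items overlapping a combined signal join its timeline group,
    items with no overlapping group open a new one."""
    grouping, next_grpID = [], 0
    for item, offset, comb_grp in zip(item_indices, offsets,
                                      overlapping_comb_grp):
        if comb_grp is not None:          # overlaps a combined signal
            grouping.append((item, comb_grp, offset))
        else:                             # no overlapping group found
            grouping.append((item, ("new", next_grpID), offset))
            next_grpID += 1
    return grouping
```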
- the full signal aligner can then in some embodiments finalise the mapping by updating the timings and the group identification values of those input content items that have been identified as sharing continuous timelines. In some embodiments this finalisation operation can be described with regards to the following pseudo code,
- lines 2 to 41 of the above code are repeated for each group that shares continuous timelines. Where the group has more than one content signal, as determined by line 4, the corresponding input content information is updated.
- lines 6 to 11 determine the reference group identification value for the content signals.
- lines 13 to 16 determine the start time for the corresponding group.
- lines 18 to 26 update the start time of the content item within the group to match the relative time differences as defined by the offset time value defined herein.
- Lines 28 to 38 of the code update the start and stop times and the group identification value information for each content item within the identified group.
- the variable tServerStatus in line 35 is used to indicate that the timing information for the content item is obtained through the combined signal processing mode.
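- a minimal sketch of that finalisation step, assuming the ContentItem records sketched above and a per-item offset within the group (all names are illustrative):

```python
def finalise_group(items, offsets, ref_grpID):
    """Move every item of a shared-timeline group onto a common time base:
    the earliest offset-adjusted start becomes the group origin, and all
    items take the reference group identification value."""
    origin = min(it.startTS + off for it, off in zip(items, offsets))
    for it, off in zip(items, offsets):
        duration = it.stopTS - it.startTS
        it.startTS = (it.startTS + off) - origin   # relative start time
        it.stopTS = it.startTS + duration          # keep the item length
        it.grpID = ref_grpID                       # shared timeline group
    return items
```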
- the time stamped and non-time stamped content items can be aligned based on the guidance information as indicated herein.
- the guidance information can in some embodiments include the variable tServerStatus and the corresponding startTS information for each input content item where available.
- the alignment can be carried out according to the following expression,
- startTS, stopTS, grpID = time_align(startTS, stopTS, grpID, <input_content_items>)
- time_align() defines the function that determines the alignment for the specified input.
- the function can take as an input the start and end times and the group identification value along with the actual media content.
- the output of such a function would be the updated timings and group identification values for each of the input content items.
- the operation of determining the time offset can be any suitable approach such as shown previously with regards to correlation analysis.
- the start and stop times of the signal pair are updated such that the time offset window (toWindow) for the pair can be limited to a small value.
- the value can be set to 1 second, indicating that the signals in the pair are aligned to within ± 1 second with respect to each other.
- this provides a major operational efficiency, as the final time offset needs to be searched only over a very limited time period.
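- reusing the time_offset() sketch from earlier in this section, limiting the search to such a window might look like the following (the sample rate and window value are assumed, not taken from the patent):

```python
FS = 48000                          # assumed sample rate in Hz
TO_WINDOW_S = 1.0                   # pair pre-aligned to within +/- 1 second

def windowed_offset(combined, candidate, fs=FS, to_window_s=TO_WINDOW_S):
    """Search the time offset only within +/- toWindow seconds, relying on
    the updated start/stop times; time_offset() is the correlation sketch
    given earlier."""
    max_lag = int(to_window_s * fs)  # 1 s of lags instead of the full length
    return time_offset(combined, candidate, max_lag)
```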
- the combined signal represents the entire event scene; it is therefore highly unlikely that the time offset returned by the expressions herein is not optimal, even though the geographical area of the event scene may cover a large area where different content signals can be located quite far from each other.
- the steps of processing as described herein are designed to overcome the challenges the size of the event scene may bring to determining the time offsets between the various content items in a robust and reliable manner.
- the combined signal may be computed more than once. This for example can be carried out where the number of content items to be covered is large.
- a suitable combination, for example a random or fixed combination, of the already time stamped content can be used to create the combined signal and then, following the steps as described herein, the final time stamps in the common time base can be determined.
- the combined signal can in some embodiments be recreated from time to time with differences in composition, including some new content items compared to the previously created combined signal or replacing some content items with new items in the combined signal, and then repeating the processing steps for alignment.
- the output values from each of these combined signal iterations can then be saved and the final time differences for the content items can be computed from the saved values.
- the data analysis can be performed to determine the time offset value towards each content item in a converging form. This converging form can be easily extracted using mean and standard deviation calculations, where the final output is a mean value which excludes any outlier values identified by the standard deviation.
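- a simple sketch of that converging estimate, as a mean with standard-deviation based outlier rejection (illustrative only; the patent's exact calculation is not specified here):

```python
import numpy as np

def converged_offset(offsets, n_sigma=1.0):
    """Mean of the per-iteration offset estimates for a content item after
    discarding outliers lying more than n_sigma standard deviations from
    the raw mean."""
    offsets = np.asarray(offsets, dtype=float)
    mu, sigma = offsets.mean(), offsets.std()
    if sigma == 0.0:
        return mu                    # all iterations agree already
    kept = offsets[np.abs(offsets - mu) <= n_sigma * sigma]
    return kept.mean() if kept.size else mu

# converged_offset([12010, 12004, 11998, 15500, 12002])  # -> 12003.5,
# the 15500 outlier is discarded
```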
- embodiments may also be applied to audio-video signals, where the audio signal components of the recorded data are processed in terms of determining the base signal and the time alignment factors for the remaining signals, and the video signal components may be synchronised using the above embodiments of the invention.
- the video parts may be synchronised using the audio synchronisation information.
- user equipment is intended to cover any suitable type of wireless user equipment, such as mobile telephones, portable data processing devices or portable web browsers.
- PLMN: public land mobile network
- the various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof.
- some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto.
- While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
- the embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware.
- any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions.
- the software may be stored on such physical media as memory chips or memory blocks implemented within the processor, magnetic media such as hard disks or floppy disks, and optical media such as, for example, DVD and the data variants thereof, or CD.
- the memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory.
- the data processors may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASIC), gate level circuits and processors based on multi-core processor architecture, as non-limiting examples.
- Embodiments of the inventions may be practiced in various components such as integrated circuit modules.
- the design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate. Programs, such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules.
- the resultant design in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Databases & Information Systems (AREA)
- Computer Security & Cryptography (AREA)
- Signal Processing For Digital Recording And Reproducing (AREA)
Abstract
An apparatus comprising: a signal combiner configured to generate a combined audio signal from at least two first audio signals; a combined signal comparator configured to compare the combined audio signal with at least one second audio signal; a signal aligner configured to determine an alignment value configured to time-align the combined audio signal with the at least one second audio signal; and a complete signal aligner configured to associate the alignment value with the at least one second audio signal so as to time-align the at least two first audio signals with the at least one second audio signal.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/IB2011/055692 WO2013088208A1 (fr) | 2011-12-15 | 2011-12-15 | Appareil d'alignement de scène audio |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/IB2011/055692 WO2013088208A1 (fr) | 2011-12-15 | 2011-12-15 | Appareil d'alignement de scène audio |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2013088208A1 (fr) | 2013-06-20 |
Family
ID=48611921
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/IB2011/055692 (Ceased) | Appareil d'alignement de scène audio | 2011-12-15 | 2011-12-15 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2013088208A1 (fr) |
2011
- 2011-12-15 WO PCT/IB2011/055692 patent/WO2013088208A1/fr not_active Ceased
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20110085671A1 (en) * | 2007-09-25 | 2011-04-14 | Motorola, Inc | Apparatus and Method for Encoding a Multi-Channel Audio Signal |
| US20100119083A1 (en) * | 2008-11-11 | 2010-05-13 | Motorola, Inc. | Compensation for nonuniform delayed group communications |
| WO2010131105A1 (fr) * | 2009-05-12 | 2010-11-18 | Nokia Corporation | Appareil |
| US20110301730A1 (en) * | 2010-06-02 | 2011-12-08 | Sony Corporation | Method for determining a processed audio signal and a handheld device |
Cited By (27)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110089131B (zh) * | 2016-11-16 | 2021-07-13 | 诺基亚技术有限公司 | 用于分布式音频捕获和混合控制的装置和方法 |
| CN110089131A (zh) * | 2016-11-16 | 2019-08-02 | 诺基亚技术有限公司 | 分布式音频捕获和混合控制 |
| EP3542549A4 (fr) * | 2016-11-16 | 2020-07-08 | Nokia Technologies Oy | Capture audio répartie et commande de mixage |
| US10785565B2 (en) | 2016-11-16 | 2020-09-22 | Nokia Technologies Oy | Distributed audio capture and mixing controlling |
| WO2018091777A1 (fr) | 2016-11-16 | 2018-05-24 | Nokia Technologies Oy | Capture audio répartie et commande de mixage |
| CN110036416A (zh) * | 2016-11-25 | 2019-07-19 | 诺基亚技术有限公司 | 用于空间音频的装置和相关方法 |
| CN110036416B (zh) * | 2016-11-25 | 2022-11-29 | 诺基亚技术有限公司 | 用于空间音频的装置和相关方法 |
| US10963841B2 (en) | 2019-03-27 | 2021-03-30 | On Time Staffing Inc. | Employment candidate empathy scoring system |
| US10728443B1 (en) | 2019-03-27 | 2020-07-28 | On Time Staffing Inc. | Automatic camera angle switching to create combined audiovisual file |
| US11961044B2 (en) | 2019-03-27 | 2024-04-16 | On Time Staffing, Inc. | Behavioral data analysis and scoring system |
| US11457140B2 (en) | 2019-03-27 | 2022-09-27 | On Time Staffing Inc. | Automatic camera angle switching in response to low noise audio to create combined audiovisual file |
| US11863858B2 (en) | 2019-03-27 | 2024-01-02 | On Time Staffing Inc. | Automatic camera angle switching in response to low noise audio to create combined audiovisual file |
| US11783645B2 (en) | 2019-11-26 | 2023-10-10 | On Time Staffing Inc. | Multi-camera, multi-sensor panel data extraction system and method |
| US11127232B2 (en) | 2019-11-26 | 2021-09-21 | On Time Staffing Inc. | Multi-camera, multi-sensor panel data extraction system and method |
| US11023735B1 (en) | 2020-04-02 | 2021-06-01 | On Time Staffing, Inc. | Automatic versioning of video presentations |
| US11861904B2 (en) | 2020-04-02 | 2024-01-02 | On Time Staffing, Inc. | Automatic versioning of video presentations |
| US11184578B2 (en) | 2020-04-02 | 2021-11-23 | On Time Staffing, Inc. | Audio and video recording and streaming in a three-computer booth |
| US11636678B2 (en) | 2020-04-02 | 2023-04-25 | On Time Staffing Inc. | Audio and video recording and streaming in a three-computer booth |
| US11720859B2 (en) | 2020-09-18 | 2023-08-08 | On Time Staffing Inc. | Systems and methods for evaluating actions over a computer network and establishing live network connections |
| US11144882B1 (en) | 2020-09-18 | 2021-10-12 | On Time Staffing Inc. | Systems and methods for evaluating actions over a computer network and establishing live network connections |
| CN112416289B (zh) * | 2020-11-12 | 2022-12-09 | 北京字节跳动网络技术有限公司 | 一种音频同步方法、装置、设备和存储介质 |
| CN112416289A (zh) * | 2020-11-12 | 2021-02-26 | 北京字节跳动网络技术有限公司 | 一种音频同步方法、装置、设备和存储介质 |
| US11727040B2 (en) | 2021-08-06 | 2023-08-15 | On Time Staffing, Inc. | Monitoring third-party forum contributions to improve searching through time-to-live data assignments |
| US11966429B2 (en) | 2021-08-06 | 2024-04-23 | On Time Staffing Inc. | Monitoring third-party forum contributions to improve searching through time-to-live data assignments |
| US11423071B1 (en) | 2021-08-31 | 2022-08-23 | On Time Staffing, Inc. | Candidate data ranking method using previously selected candidate data |
| US11907652B2 (en) | 2022-06-02 | 2024-02-20 | On Time Staffing, Inc. | User interface and systems for document creation |
| US12321694B2 (en) | 2022-06-02 | 2025-06-03 | On Time Staffing Inc. | User interface and systems for document creation |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2013088208A1 (fr) | Appareil d'alignement de scène audio | |
| US20130304244A1 (en) | Audio alignment apparatus | |
| US9445174B2 (en) | Audio capture apparatus | |
| CN109313907B (zh) | 合并音频信号与空间元数据 | |
| US9936292B2 (en) | Spatial audio apparatus | |
| US10097943B2 (en) | Apparatus and method for reproducing recorded audio with correct spatial directionality | |
| US20130226324A1 (en) | Audio scene apparatuses and methods | |
| US20160155455A1 (en) | A shared audio scene apparatus | |
| US20130297053A1 (en) | Audio scene processing apparatus | |
| US9195740B2 (en) | Audio scene selection apparatus | |
| US20150310869A1 (en) | Apparatus aligning audio signals in a shared audio scene | |
| US20150271599A1 (en) | Shared audio scene apparatus | |
| US20150302892A1 (en) | A shared audio scene apparatus | |
| US9392363B2 (en) | Audio scene mapping apparatus | |
| CN103180907B (zh) | 音频场景装置 | |
| WO2015086894A1 (fr) | Appareil de capture de scène audio |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 11877416; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 11877416; Country of ref document: EP; Kind code of ref document: A1 |