WO2018231185A1 - Method of synchronizing sound signals

Info

Publication number: WO2018231185A1
Application number: PCT/UA2017/000089
Authority: WO
Inventors: Василий Васильевич ДУМА; Роман Викторович КУЛИНИЧ; Дмитрий Константинович ХАНЧОПУЛО
Original assignee: Василий Васильевич ДУМА
Priority date: 2017-06-16
Filing date: 2017-09-05
Publication date: 2018-12-20

Abstract

The invention relates to the processing of sound signals, in particular to a method of processing dynamic audio properties using a mechanism or sequence of tuning operations so as to quickly adapt to changes in the content of a sound signal. The method uses synchronization maps of sound signals recorded from a microphone for synchronizing a rendering of an original or other sound track using a mobile client device and a mechanism for generating a synchronization map and saving same in a digital file, wherein the synchronization map is generated beforehand on a server and transmitted remotely or locally to a user device, the synchronization map data being encrypted beforehand. The method provides the capability of synchronizing a converted original audio file comprising audio recorded from a microphone for reproduction of the same audio file, a similar audio file or a different audio file.

Description

METHOD OF SYNCHRONIZATION OF AUDIO SIGNALS

The invention relates to the processing of audio signals, in particular to a method for processing the dynamic properties of audio using a tuning mechanism or sequence of operations for quickly adapting to changes in the content of an audio signal, as well as to computer programs for implementing such methods in practice.

A tuning signal can be generated by analyzing the audio signal itself, or tuning can be triggered by an external event, such as a channel change on a television receiver or a change of input selection on an audio/video receiver. In the case of an external trigger, one or more indications of the state of the dynamic properties for the current sound source can be stored and associated with that sound source before switching to a new sound source. Then, if the system switches back to the first sound source, the dynamic processor can be restored to the previously saved state or an approximation of it.

A known method of mixing two input audio signals into a single composite audio signal while maintaining the perceived sound level of the composite audio signal includes the steps of: receiving the main input audio signal; receiving an associated input audio signal, the associated input signal being coupled to the main input audio signal; receiving mixing metadata containing scaling information for scaling the main input audio signal and specifying how the main input signal and the associated input signal should be mixed in order to generate a composite audio signal at the perceived sound level, wherein the scaling information from the mixing metadata comprises a metadata scale factor for the main input audio signal, for scaling the main input audio signal relative to the associated input audio signal; weighting the main input audio signal and the associated input audio signal in the composite audio signal, as defined in the mixing metadata; identifying the dominant signal as either the main input audio signal or the associated input audio signal from the scaling information provided by the mixing metadata and from the mixing balance input, the other input signal then being identified as the non-dominant signal, wherein the dominant signal is identified by comparing the mixing balance input with the metadata scale factor for the main input audio signal; scaling the non-dominant signal relative to the dominant signal; and combining the scaled non-dominant signal with the dominant signal to generate the composite audio signal [UA No. 105590, H03G 3/00, 2014].

However, the user may wish to deviate from the settings provided by the producer and dictated by the metadata transmitted along with the associated signal. For example, a user who has activated the director’s commentary while watching a movie may decide at some point during playback that he would rather hear the original dialogue, which the producer marked in the metadata as subject to attenuation during mixing so that it does not prevail over the director’s commentary.

Therefore, there is a need for a control that allows the user to adjust the mixing of the input audio signals and, at the same time, provides a favorable user experience by preserving the perceived sound level of the composite signal. In addition, there is a need for a control for mixing the input audio signals that maintains a consistent sound level for the composite signal even when the scaling information from the metadata and the external input from the user vary over time, so that no additional adjustment of the level of the composite signal is required.

Closest to the claimed invention is a method for processing an audio signal using tuning, in which the dynamic properties of the audio signal are changed in accordance with a sequence of operations for adjusting the dynamic properties; an event is detected in the temporal evolution of the audio signal in which the level of the audio signal decreases by an amount greater than the threshold of visibility (Ldrop) within a time interval no longer than a second threshold value of time (tdrop), wherein said detection identifies the decrease in the audio signal level in a plurality of frequency bands; and the sequence of operations for adjusting the dynamic properties is reconfigured in response to said detection [UA No. 94968, H03G 3/00, H03G 7/00, 2011].
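The detection step of this prior-art method can be illustrated with a minimal sketch. It assumes that per-band levels are already available in decibels, one list per frequency band, and that tdrop is expressed as a number of frames; the function name, the `min_bands` parameter, and the data layout are hypothetical and are not taken from the cited document.

```python
def detect_level_drop(band_levels_db, l_drop_db, t_drop_frames, min_bands=2):
    """Return the first frame index at which at least `min_bands` bands have
    dropped by more than `l_drop_db` within `t_drop_frames` frames, else None.

    band_levels_db: list of per-band level sequences (dB), all the same length.
    The layout and the `min_bands` rule are assumptions made for illustration.
    """
    if t_drop_frames < 1:
        raise ValueError("t_drop_frames must be at least 1")
    n_frames = len(band_levels_db[0])
    for t in range(1, n_frames):
        start = max(0, t - t_drop_frames)
        dropped = 0
        for levels in band_levels_db:
            # Largest level seen in the look-back window versus the current level.
            if max(levels[start:t]) - levels[t] > l_drop_db:
                dropped += 1
        if dropped >= min_bands:
            return t
    return None
```

In a system of this kind, a detected frame index would trigger the reconfiguration of the dynamics processing; that step is not shown here.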

However, this method, like the previous analogue, is not sufficiently effective at synchronizing the converted original audio file with audio recorded from the microphone in order to play the same, a similar, or a different audio file.

The invention is based on the task of creating a method for synchronizing audio signals that effectively synchronizes the converted original audio file with audio recorded from the microphone in order to play the same, a similar, or a different audio file.

The problem is solved in that, in a method for synchronizing audio signals in which the dynamic properties of an audio signal are changed in accordance with a sequence of operations for adjusting dynamic properties, according to the invention, synchronization maps of audio signals recorded from a microphone are used to synchronize the rendering of an original or other audio track using a client’s mobile device (mobile phone, smartphone, smart TV, laptop, netbook), and a mechanism for generating the synchronization map and storing it in a digital file is used, the synchronization map being generated in advance on the server and transmitted to the user device remotely or locally, and the map data being encrypted in advance.

In addition, in the method of synchronizing audio signals, when the synchronization map file is prepared, the sound is converted into the frequency domain and filtering and extraction methods are used.

A mobile phone, smartphone, smart TV, laptop, netbook or tablet is used as the mobile device.

The inventive method makes it possible to synchronize the converted original audio file with audio recorded from the microphone in order to play the same, a similar, or a different audio file.

The invention is illustrated by the drawings:

Figure 1 shows a diagram of a device for implementing the method;

Figure 2 is a sequence diagram of the method.

The method is implemented as follows.

Mobile device 2 (mobile phone, smartphone, smart TV, laptop, netbook or tablet) is used to record an audio signal or sounds in an open or closed space 1, using an incoming sound data source 4 (for example, a microphone).

In block 12, the device uses the recorded sound to synchronize with another audio track that is currently playing, using the reference synchronization map that was prepared earlier and received on the device (phone, smartphone or tablet) via a wireless or other network.

Synchronization is carried out in real time, taking into account the offset in the original synchronization file and the recorded audio segment (4 to 15 seconds long).

Synchronization is performed by the synchronization unit 12 on the converted and filtered (digitized) data in the media buffer 5, against the synchronization map.

In addition, an algorithm for speeding up or slowing down the track recorded from the microphone can be used.
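The text does not specify which speed-change algorithm is meant. As a minimal sketch, assuming plain resampling by linear interpolation is acceptable (which also shifts the pitch), a recorded track could be sped up or slowed down as follows; the function name and the `factor` parameter are hypothetical.

```python
def change_speed(samples, factor):
    """Resample `samples` by linear interpolation.

    factor > 1 shortens (speeds up) the track, factor < 1 lengthens it.
    This is plain resampling, chosen here for illustration only; it changes
    pitch together with duration.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not samples:
        return []
    n_out = max(1, int(len(samples) / factor))
    out = []
    for i in range(n_out):
        pos = i * factor                      # position in the input track
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)    # clamp at the last sample
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out
```

For example, `change_speed(track, 1.02)` would play the recorded track about 2% faster before it is compared with the reference.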

As a result of synchronization, another audio file can be played back, taking into account the time offset obtained when synchronizing with the first original audio file converted in block 11.

Standard mechanisms are used for converting audio signals (map conversion unit 6) into another coordinate system, namely the frequency domain (for example, the fast Fourier transform, although other methods are also possible).
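As an illustration of the conversion to the frequency domain, the following minimal sketch computes the magnitude spectrum of one audio frame with a textbook radix-2 fast Fourier transform. The Hann window, the power-of-two frame length, and all function names are assumptions made for the example and are not stated in the description.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def magnitude_spectrum(frame):
    """Magnitudes of bins 0..n/2 of a Hann-windowed frame (window is an assumption)."""
    n = len(frame)
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, s in enumerate(frame)]
    return [abs(c) for c in fft(windowed)[: n // 2 + 1]]
```

Applying `magnitude_spectrum` to successive frames of the recording gives a time-frequency representation from which a map could then be derived.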

Particular emphasis is placed on filtering and on extracting the RMS frequency maxima, or values close to the peak (but not the frequency peaks themselves), with values of at least 50%, 75% or more of the maximum.
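One possible reading of this filtering step is a threshold relative to the largest value in a frame. The sketch below keeps every spectral bin whose magnitude is at least a given fraction of the frame maximum; this interpretation, the function name, and the choice to keep the maximum bin itself are assumptions.

```python
def near_peak_bins(magnitudes, fraction=0.75):
    """Return indices of bins whose magnitude is >= fraction * max magnitude.

    `fraction` corresponds to the 50% / 75% thresholds mentioned in the text;
    how the thresholds are actually applied is an assumption of this sketch.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not magnitudes:
        return []
    peak = max(magnitudes)
    if peak <= 0:
        return []
    return [i for i, m in enumerate(magnitudes) if m >= fraction * peak]
```

With `fraction=0.5` more bins survive than with `fraction=0.75`, which trades robustness to noise against the size of the resulting map.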

The frequencies can be divided into 5 to 14 ranges at a given time.
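A minimal sketch of such a division is given below. It assumes equal-width ranges over the spectrum bins and computes an RMS value per range; the description does not say whether the ranges are linear or logarithmic, so the equal-width split and both function names are hypothetical.

```python
import math


def split_into_bands(magnitudes, n_bands):
    """Split a magnitude spectrum into n_bands contiguous ranges (5 to 14)."""
    if not 5 <= n_bands <= 14:
        raise ValueError("n_bands must be between 5 and 14")
    if len(magnitudes) < n_bands:
        raise ValueError("not enough bins for the requested number of bands")
    # Equal-width band edges; this spacing is an assumption of the sketch.
    edges = [round(i * len(magnitudes) / n_bands) for i in range(n_bands + 1)]
    return [magnitudes[edges[i]:edges[i + 1]] for i in range(n_bands)]


def band_rms(band):
    """Root-mean-square value of the magnitudes in one band."""
    return math.sqrt(sum(m * m for m in band) / len(band))
```

For instance, `[band_rms(b) for b in split_into_bands(spectrum, 8)]` yields eight RMS values for one frame.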

An additional search refinement algorithm is also used (synchronization block 12). It consists of the following steps:

- compilation of vector maps (VMP);

- construction of the intersection of the vector maps;

- selection of the track sections that probably correspond to the desired fragment;

- selection of the section that best corresponds to the VMP of the desired fragment.

The selected sections are analyzed, and the one with the highest similarity to the VMP of the desired fragment is chosen according to the following criteria: the length of the section along the time axis, the number of vectors in the section, and the number of vectors having common points.

For the selected section, ts is calculated. The algorithm takes the found ts to be the start of the desired fragment.
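The selection of the best section and the calculation of ts can be sketched as follows. The sketch assumes that matching has already produced pairs of times (time of a vector in the fragment, time of the matching vector in the track), groups them into candidate sections by their time offset, and scores each section by vector count and time span. The grouping rule, the `tolerance_s` parameter, and the omission of the "common points" criterion are simplifications introduced for this example.

```python
from collections import defaultdict


def estimate_ts(matches, tolerance_s=0.05):
    """Estimate ts, the start of the fragment within the track, in seconds.

    matches: list of (fragment_time_s, track_time_s) for matched vectors.
    Returns None when there are no matches.
    """
    sections = defaultdict(list)
    for frag_t, track_t in matches:
        # Matches with (nearly) the same offset are treated as one section.
        sections[round((track_t - frag_t) / tolerance_s)].append((frag_t, track_t))
    if not sections:
        return None

    def score(section):
        track_times = [t for _, t in section]
        # Criteria used here: number of vectors, then length along the time axis.
        return (len(section), max(track_times) - min(track_times))

    best = max(sections.values(), key=score)
    offsets = [track_t - frag_t for frag_t, track_t in best]
    return sum(offsets) / len(offsets)
```

The returned value is the mean offset of the winning section, that is, the track time that corresponds to time zero of the recorded fragment.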

To determine the accuracy of the match, the entire recorded and converted audio fragment is divided into separate fragments (subbands), and the summing function is applied to each of them. The difference in offsets for each of the fragments is then used. The number of fragments can be from 4 to 10.

Comparative Formula 1:

$|x_4 - x_3| < |x_2 - x_1| + j$
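The per-fragment accuracy check described above can be illustrated with a minimal sketch. It assumes that an offset into the track has already been estimated independently for each of the 4 to 10 fragments and treats the match as accurate when neighbouring offsets differ by no more than a tolerance. It does not attempt to implement Formula 1; the function names and the `tolerance_s` parameter are hypothetical.

```python
def offset_spread(fragment_offsets_s):
    """Largest absolute difference between offsets of neighbouring fragments."""
    if not 4 <= len(fragment_offsets_s) <= 10:
        raise ValueError("expected between 4 and 10 fragment offsets")
    diffs = [b - a for a, b in zip(fragment_offsets_s, fragment_offsets_s[1:])]
    return max(abs(d) for d in diffs)


def is_consistent(fragment_offsets_s, tolerance_s=0.05):
    """True when all neighbouring fragment offsets agree within tolerance_s.

    The tolerance value is an assumption chosen for illustration.
    """
    return offset_spread(fragment_offsets_s) <= tolerance_s
```

A match whose fragments all point to nearly the same offset is likely genuine, whereas widely scattered offsets suggest a false match.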

During one of the stages, an array is created that stores pairs of vectors: a vector from the VMP of the fragment and the corresponding vector from the VMP of the track.

The matching criterion is the vector key. If several vectors from the VMP of the track correspond to one vector of the VMP of the fragment, then several elements are created in the array that have the same vector from the VMP of the fragment but different vectors from the VMP of the track.
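The construction of this array can be sketched as below. The description does not define what a vector or its key contains, so the sketch assumes each vector is a tuple `(key, time_s)` with a hashable key; the function name and data layout are hypothetical.

```python
from collections import defaultdict


def pair_vectors(fragment_vmp, track_vmp):
    """Build the array of (fragment_vector, track_vector) pairs with equal keys.

    Each VMP is a list of (key, time_s) tuples; this layout is an assumption.
    One fragment vector yields one pair per track vector sharing its key.
    """
    track_by_key = defaultdict(list)
    for vec in track_vmp:
        track_by_key[vec[0]].append(vec)
    pairs = []
    for frag_vec in fragment_vmp:
        for track_vec in track_by_key.get(frag_vec[0], []):
            pairs.append((frag_vec, track_vec))
    return pairs
```

Indexing the track vectors by key first keeps the pairing close to linear in the size of the two maps, rather than comparing every fragment vector with every track vector.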

When working with the converted original audio file, the decryption unit 10 is used, which additionally decrypts the converted original audio file (the synchronization map) in memory.

After decoding part of the audio file, the client device 2 can play back the original fragment of the audio track, taking into account all the delays incurred during the operation of the algorithm.
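The delay compensation can be illustrated with a small worked example. It assumes that the client noted a monotonic clock reading when recording began and that ts, the start of the recorded fragment within the track, is known; the function and parameter names are hypothetical.

```python
import time


def playback_position(ts_s, record_started_at, now=None):
    """Track position (seconds) at which playback should resume.

    ts_s: start of the recorded fragment within the track.
    record_started_at: time.monotonic() reading taken when recording began.
    Everything elapsed since then (recording plus processing) is added to ts_s.
    """
    if now is None:
        now = time.monotonic()
    return ts_s + (now - record_started_at)
```

For example, if the fragment starts at 62.5 s in the track and 8.25 s have passed since recording began, playback should resume at 70.75 s.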

To download synchronization files, the user first logs in via the authorization block 7 and, after gaining access, can download encrypted files, namely synchronization maps and audio tracks for playback, into the encrypted synchronization map block 8 and the media block 9, from which they pass through the decryption block 10 into the synchronization block 12.

To download data for synchronization from the server (cloud) 3, the user connects via the Internet to the authorization unit 13 and then to the content delivery unit 14, where data from the map database 16 and the audio database 17 is received through the encryption unit 15.

A working model has been created for various audio files and for data synchronization using synchronization maps of sound signals recorded from a microphone to synchronize the rendering of the original or other sound track using a client’s mobile device (mobile phone, smartphone, smart TV, laptop, netbook).

To start, the user presses a button on the keyboard or on the touch screen, or triggers the process in any other way.

To implement this, a mechanism for generating a synchronization map and saving it in a digital file is used. The synchronization map file is generated in advance on the server and transmitted to the user's device remotely or locally. The data is encrypted in advance. Both symmetric and asymmetric algorithms are used for encryption. When a synchronization map file is prepared, methods for converting sound to the frequency domain are used, and various filtering and extraction methods can be applied.
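Saving a synchronization map in a digital file can be sketched as follows. The description does not define the file format, so the JSON layout, the field names, and the use of zlib compression are assumptions. Encryption is deliberately left out; in a real system it would be applied to the returned bytes before transmission.

```python
import json
import zlib


def encode_sync_map(vectors, sample_rate):
    """Serialize a synchronization map to bytes (hypothetical format).

    vectors: list of (key, time_s) tuples with string keys.
    """
    payload = {
        "version": 1,
        "sample_rate": sample_rate,
        "vectors": [[key, time_s] for key, time_s in vectors],
    }
    return zlib.compress(json.dumps(payload).encode("utf-8"))


def decode_sync_map(blob):
    """Inverse of encode_sync_map; returns (vectors, sample_rate)."""
    payload = json.loads(zlib.decompress(blob).decode("utf-8"))
    vectors = [(key, time_s) for key, time_s in payload["vectors"]]
    return vectors, payload["sample_rate"]
```

The bytes returned by `encode_sync_map` can be written to a file with `open(path, "wb")` on the server and read back on the client before decoding.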

Claims

CLAIM

1. A method for synchronizing audio signals in which the dynamic properties of an audio signal are changed in accordance with a sequence of operations for adjusting dynamic properties, characterized in that synchronization maps of sound signals recorded from a microphone are used to synchronize the rendering of an original or other audio track using a client’s mobile device, together with a mechanism for generating a synchronization map and storing it in a digital file, wherein the synchronization map is generated in advance on the server and transmitted to the user device remotely or locally, and the synchronization map data is encrypted in advance.

2. The method according to claim 1, characterized in that, when the synchronization map file is prepared, the sound is converted into the frequency domain and filtering and extraction methods are used.

3. The method according to claim 1, characterized in that a mobile phone, smartphone, smart TV, laptop, netbook, or tablet is used as the mobile device.