CN112908289B - Beat determining method, device, equipment and storage medium - Google Patents
- Publication number
- CN112908289B (application CN202110261348.6A)
- Authority
- CN
- China
- Prior art keywords
- association
- peak
- beat
- determining
- audio intensity
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/36—Accompaniment arrangements
- G10H1/40—Rhythm
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/031—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
- G10H2210/076—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for extraction of timing, tempo; Beat detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/375—Tempo or beat alterations; Music timing control
- G10H2210/391—Automatic tempo adjustment, correction or control
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Auxiliary Devices For Music (AREA)
Abstract
The embodiment of the application discloses a beat determining method, a device, equipment and a storage medium, wherein the method comprises the following steps: acquiring an autocorrelation sequence corresponding to the audio data; determining a peak position in the autocorrelation sequence, and a first association position, a second association position and a third association position corresponding to the peak position, wherein the first association position, the second association position and the third association position are in preset proportion to the peak position; acquiring audio intensity values respectively corresponding to the first association position, the second association position and the third association position; and if the audio intensity values corresponding to the first association position and the second association position meet the peak condition, and the audio intensity value corresponding to the third association position does not meet the peak condition, determining that the peak position is the beat position. The scheme improves the beat determination accuracy and has higher operation efficiency.
Description
Technical Field
The embodiment of the application relates to the field of computers, in particular to a beat determining method, a beat determining device, beat determining equipment and a storage medium.
Background
Audio is applied to various scenes as a main material of multimedia, and when music in an audio file is applied, it is often necessary to determine the tempo of the piece of music. Illustratively, in the video production process, the video clip effect can be better by synchronizing the clip points of the video clip with the rhythm of the music. The beat of the music is a combination rule of strong beat and weak beat, specifically the total length of notes of each bar in the music score, and the length of each bar is fixed. Generally, the tempo of a piece of music is fixed at the time of composing.
In the prior art, after an audio waveform is determined, the beat in the waveform is estimated by a music beat detection algorithm. For example, a Gaussian prior is obtained by trend analysis over a large amount of data, and the beat is determined from that Gaussian prior distribution. However, this approach places high demands on the audio data: for audio with considerable noise or complex sound combinations, deviations occur easily. Because the beat is obtained by estimation, the finally determined beat has poor accuracy, and a large amount of computation is required.
Disclosure of Invention
The embodiment of the application provides a beat determining method, device, equipment and storage medium, which place low requirements on how noisy the original audio data is and can determine the beat of the audio data efficiently and accurately.
In a first aspect, an embodiment of the present application provides a beat determining method, including:
acquiring an autocorrelation sequence corresponding to the audio data;
determining a peak position in the autocorrelation sequence, and a first association position, a second association position and a third association position corresponding to the peak position, wherein the first association position, the second association position and the third association position are in preset proportion to the peak position;
acquiring audio intensity values respectively corresponding to the first association position, the second association position and the third association position;
and if the audio intensity values corresponding to the first association position and the second association position meet the peak condition, and the audio intensity value corresponding to the third association position does not meet the peak condition, determining that the peak position is the beat position.
In a second aspect, an embodiment of the present application further provides a beat determining apparatus, including:
the sequence acquisition module is used for acquiring an autocorrelation sequence corresponding to the audio data;
the positioning module is used for determining a peak position in the autocorrelation sequence, and a first association position, a second association position and a third association position corresponding to the peak position, wherein the first association position, the second association position and the third association position are in preset proportion to the peak position;
the audio intensity acquisition module is used for acquiring audio intensity values respectively corresponding to the first association position, the second association position and the third association position;
and the beat position determining module is used for determining the peak position as the beat position if the audio intensity values corresponding to the first association position and the second association position meet the peak condition and the audio intensity value corresponding to the third association position does not meet the peak condition.
In a third aspect, an embodiment of the present application further provides beat determining equipment, including:
one or more processors;
storage means for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the beat determining method according to the embodiments of the present application.
In a fourth aspect, embodiments of the present application also provide a storage medium storing computer-executable instructions that, when executed by a computer processor, are configured to perform the beat determining method according to the embodiments of the present application.
In the embodiment of the application, an autocorrelation sequence corresponding to the audio data is acquired; a peak position in the autocorrelation sequence is determined, together with a first association position, a second association position and a third association position corresponding to the peak position, each of which is in a preset proportion to the peak position; the audio intensity values corresponding to the three association positions are acquired; and if the audio intensity values corresponding to the first and second association positions meet a peak condition while the audio intensity value corresponding to the third association position does not, the peak position is determined to be a beat position. The method places low requirements on how noisy the original audio data is, and can determine the beat of the audio data efficiently and accurately.
Drawings
Fig. 1 is a flowchart of a beat determining method according to an embodiment of the present application;
Fig. 1a is a corresponding autocorrelation function diagram obtained based on audio data according to an embodiment of the present application;
Fig. 1b is a schematic diagram of beat location determination performed at a preferred preset scale according to an embodiment of the present application;
Fig. 1c is a schematic diagram of a beat determining method and a beat determined by other methods according to an embodiment of the present application;
Fig. 2 is a flowchart of another beat determining method according to an embodiment of the present application;
Fig. 3 is a flowchart of another beat determining method according to an embodiment of the present application;
Fig. 4 is a block diagram of a beat determining device according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an apparatus according to an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in further detail below with reference to the drawings and examples. It should be understood that the particular embodiments described herein are illustrative only and are not limiting of embodiments of the application. It should be further noted that, for convenience of description, only some, but not all of the structures related to the embodiments of the present application are shown in the drawings.
Fig. 1 is a flowchart of a beat determining method provided in an embodiment of the present application, which is applicable to determining a music beat in audio data, and the method may be performed by a computing device, such as a desktop, a notebook, or a server device, and specifically includes the following steps:
Step S101, an autocorrelation sequence corresponding to the audio data is acquired.
The audio data may be the audio of an acquired piece of music, where the music has a fixed beat and a corresponding rhythm. Rhythm, one of the three elements of music, embodies motion and change, whereas the beat is a relatively static and stable factor in music and repeats periodically. The audio data in this scheme can be used for video editing: once the beat in the audio data is determined, audio and video can be synchronized during editing. The scheme can also be used in other scenarios, such as controlling stage lighting changes according to the determined beats during a stage performance.
In one embodiment, the autocorrelation sequence corresponding to the audio data is first obtained. The autocorrelation sequence is obtained from the audio signal in the audio data. Specifically, an autocorrelation function may be applied to the audio signal x(n), for example in the common form
$$R(m) = \sum_{n} x(n)\, x(n+m)$$
to find the corresponding autocorrelation sequence. The formula characterizes the similarity of a signal to itself after a delay of m points. The autocorrelation sequence in the present embodiment is not limited to calculation with this autocorrelation function; for example, a short-time autocorrelation function may be used. In another embodiment, the autocorrelation sequence of the audio data may also be calculated and output through a function interface such as those provided in Matlab software.
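The computation described above can be sketched as follows. This is a minimal illustration that assumes the common unnormalized definition R(m) = sum over n of x(n) x(n+m), truncated to the samples available; the function name `autocorrelation` and the `max_lag` parameter are hypothetical and are not taken from the patent.

```python
def autocorrelation(x, max_lag):
    """Return [R(0), ..., R(max_lag)] with R(m) = sum_n x[n] * x[n + m].

    Assumption: unnormalized, finite-length autocorrelation; the patent's
    exact formula may include normalization or a different summation range.
    """
    n = len(x)
    return [
        sum(x[i] * x[i + m] for i in range(n - m))
        for m in range(max_lag + 1)
    ]


# A toy periodic "signal": one impulse every 4 samples.
signal = [1.0, 0.0, 0.0, 0.0] * 4
acf = autocorrelation(signal, 8)
print(acf)
```

In this toy input the sequence is largest at lag 0 and has further maxima at lags 4 and 8, the integer multiples of the period.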
Because the music beats contained in the audio data are periodic, the autocorrelation sequence obtained in this way is also periodic, with the same period as the audio data; the autocorrelation sequence therefore provides a way to obtain the period of a periodic signal. The autocorrelation function reaches a maximum at integer multiples of the period of the periodic signal, so the pitch period of the audio signal can be estimated from the positions of the maxima in the autocorrelation sequence, regardless of the start time.
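As a rough illustration of reading a period off an autocorrelation sequence, the sketch below picks the lag with the largest value while skipping lag 0, which is always the global maximum. The helper name `estimate_period` and the `min_lag` parameter are assumptions made for this example; the patent itself goes on to use a stricter test based on association positions.

```python
def estimate_period(acf, min_lag=1):
    """Return the lag >= min_lag with the largest autocorrelation value.

    Assumption: the first (smallest) such lag is taken on ties, because
    max() returns the first maximal element it encounters.
    """
    lags = range(min_lag, len(acf))
    return max(lags, key=lambda m: acf[m])


# Autocorrelation values of an impulse train with period 4 (illustrative).
acf = [4.0, 0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 2.0]
period = estimate_period(acf)
print(period)
```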
In one embodiment, the autocorrelation sequence corresponding to the audio data may be characterized as an array comprising position indexes and corresponding audio intensity values, for example x[0] = 13.5, x[1] = 12, x[2] = 10, x[3] = 11, x[4] = 12.5, where 0, 1, 2, 3, ... are the position indexes and 13.5, 12, 10, 11, 12.5 are the corresponding audio intensity values (the audio intensity values are relative values). It should be noted that the autocorrelation sequence may also be stored in other data structures, such as a linked list; the present solution is not limited to a specific representation.
Step S102, determining a peak position in the autocorrelation sequence, and a first association position, a second association position and a third association position corresponding to the peak position.
The peak position is a position in the autocorrelation sequence at which the characterized audio signal intensity is relatively high; that is, the audio signal intensity corresponding to the position index of the peak position is relatively high. For example, Fig. 1a is an autocorrelation function diagram obtained from audio data according to an embodiment of the present application, where the abscissa represents the position index and the ordinate represents the audio signal intensity. As shown, the determined peak positions include x[18], whose corresponding audio signal intensity is 13.
In one embodiment, the peak position may be determined as follows: after the audio signal intensities corresponding to all position indexes in the current autocorrelation sequence have been traversed, a peak interval is determined from the traversal result, and the position index corresponding to the local maximum of the audio signal intensities falling within the peak interval is determined as the peak position. For example, suppose the audio intensity values corresponding to position indexes x[0] to x[80] are obtained and the interval of the maximum peak values is found to be 12.5 to 14. The local maximum among the position indexes whose values fall within 12.5 to 14 is then determined: if the audio intensity values for x[38] to x[42] lie between 12.5 and 14 and the largest of them is 13.9 at x[39], then x[39] is determined as the peak position. In another embodiment, the position index corresponding to a local maximum of the audio intensity values may be determined as a peak position during the traversal itself.
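One possible reading of the peak-interval procedure is sketched below. It assumes the peak interval (here 12.5 to 14) has already been chosen, and it treats each contiguous run of values inside the interval as one region whose largest value marks a peak position. The function name `peak_positions` and the run-based grouping are assumptions for illustration; the source does not specify how the interval is derived.

```python
def peak_positions(acf, low, high):
    """Return the index of the largest value in each contiguous run of
    values that fall inside the peak interval [low, high]."""
    peaks, run = [], []
    for i, value in enumerate(acf):
        if low <= value <= high:
            run.append(i)
        elif run:
            # The run just ended: keep the index of its largest value.
            peaks.append(max(run, key=lambda j: acf[j]))
            run = []
    if run:  # a run that reaches the end of the sequence
        peaks.append(max(run, key=lambda j: acf[j]))
    return peaks


# Illustrative data: x[38]..x[42] fall inside 12.5..14, largest at x[39].
acf = [10.0] * 81
acf[38:43] = [12.6, 13.9, 13.2, 12.9, 12.5]
print(peak_positions(acf, 12.5, 14.0))
```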
In one embodiment, three associated positions are determined for a determined peak position and recorded as a first association position, a second association position and a third association position, each of which is in a preset proportion to the peak position. For example, if the preset proportions are denoted m, n and q and the position index of the peak position is denoted a (that is, the peak position is x[a]), then the first, second and third association positions are x[am], x[an] and x[aq] in turn. In one embodiment, the preset proportions are m = 1/4, n = 1/2 and q = 1/3, that is, the association positions are in ratios of 1:4, 1:2 and 1:3 to the peak position. For example, if the position index of the peak position is a = 180, the first association position is x[45], the second association position is x[90] and the third association position is x[60]. In other words, the position index of the first association position is one fourth of that of the peak position, the position index of the second association position is one half of it, and the position index of the third association position is one third of it. It should be noted that "first", "second" and "third" merely name the three determined association positions and do not limit the order in which they are determined.
Meanwhile, the preset proportions are not strictly limited to 1/4, 1/2 and 1/3: positions whose position indexes lie within a preset range of the position indexes corresponding to 1/4, 1/2 and 1/3 may also be used as the first, second and third association positions respectively. For example, the preferred ratio of the position index of the first association position to that of the peak position is 1:4, so if the position index of the peak position is 100, the preferred position index of the first association position is 25. In one embodiment, the preset range may be the 2 position indexes before and after the position index given by the preferred preset proportion; that is, position indexes 23, 24, 26 and 27 may also be determined as the first association position, with ratios to the peak position index of 23:100, 24:100, 26:100 and 27:100. In other words, the preset proportion of the first association position to the peak position may lie in the interval 0.23 to 0.27. The second and third association positions are set in the same way, which is not repeated here.
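The index arithmetic for the association positions, including the optional tolerance around each preferred index, can be sketched as follows. The function name `associated_positions` and the `tolerance` parameter are hypothetical; the use of integer floor division for non-integer results is also an assumption.

```python
def associated_positions(peak_index, tolerance=0):
    """Return candidate indexes for the first (1/4), second (1/2) and
    third (1/3) association positions of a peak position.

    Assumption: non-integer results are floored via integer division.
    """
    result = {}
    for name, divisor in (("first", 4), ("second", 2), ("third", 3)):
        center = peak_index // divisor
        result[name] = list(range(center - tolerance, center + tolerance + 1))
    return result


print(associated_positions(180))
print(associated_positions(100, tolerance=2)["first"])
```

With a peak index of 180 the preferred indexes are 45, 90 and 60, and with a peak index of 100 and a tolerance of 2 the first association position may be any of the indexes 23 to 27, matching the examples in the text.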
Step S103, acquiring audio intensity values corresponding to the first association position, the second association position and the third association position respectively.
After the first association position, the second association position and the third association position are determined in step S102, the audio intensity values corresponding to the position indexes of the first association position, the second association position and the third association position are determined accordingly. Illustratively, the first association position is x[45] with a corresponding audio intensity value of 13, the second association position is x[90] with a corresponding audio intensity value of 13.8, and the third association position is x[60] with a corresponding audio intensity value of 10.
Step S104, if the audio intensity values corresponding to the first association position and the second association position meet a peak condition, and the audio intensity value corresponding to the third association position does not meet the peak condition, determining the peak position as a beat position.
In one embodiment, the peak condition is that the current audio intensity value is a local maximum within a preset position index range. Taking an autocorrelation sequence with 200 position indexes in total as an example, the preset position index range may be 1 to 5. It should be noted that the preset position index range may be adapted to different audio data, and to the precision and the total number of position indexes: for example, the higher the precision and the larger the total number of position indexes, the larger the value of the preset position index range. In another embodiment, the value of the preset position index range can be adjusted according to the amount of data to be processed and the required operation efficiency. Taking a preset position index range of 3 as an example, for the position index x[60] it must be determined whether the audio intensity value at x[60] is the local maximum of the audio intensity values within the preset index range. If x[60] = 14.2 and the neighboring values are x[57] = 13.8, x[58] = 13.9, x[59] = 14, x[61] = 14.1, x[62] = 13.8 and x[63] = 13.7, then x[60] = 14.2 is the maximum among them and x[60] is determined to be a local maximum audio intensity value. Conversely, if x[57] = 14, x[58] = 14.5, x[59] = 14, x[61] = 14.1, x[62] = 14.3 and x[63] = 14.1, then x[60] = 14.2 is not the maximum among them and does not satisfy the peak condition.
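The peak condition can be checked with a small windowed comparison. The sketch below reuses the two numeric examples from the text with a preset position index range of 3; the function name `meets_peak_condition` and the choice to count a tie with a neighbor as satisfying the condition are assumptions.

```python
def meets_peak_condition(acf, index, radius=3):
    """True if acf[index] is the largest value among the positions within
    `radius` indexes on either side (window clipped at the sequence ends).

    Assumption: a value equal to its largest neighbor still counts as a peak.
    """
    lo = max(0, index - radius)
    hi = min(len(acf), index + radius + 1)
    return acf[index] == max(acf[lo:hi])


# First example from the text: x[60] = 14.2 is the largest of x[57]..x[63].
acf_a = [0.0] * 200
acf_a[57:64] = [13.8, 13.9, 14.0, 14.2, 14.1, 13.8, 13.7]

# Second example: x[58] = 14.5 exceeds x[60] = 14.2.
acf_b = [0.0] * 200
acf_b[57:64] = [14.0, 14.5, 14.0, 14.2, 14.1, 14.3, 14.1]

print(meets_peak_condition(acf_a, 60), meets_peak_condition(acf_b, 60))
```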
In the process of determining the beat position, if the audio intensity values corresponding to the first association position and the second association position meet the peak condition and the audio intensity value corresponding to the third association position does not, the peak position is determined to be the beat position. Conversely, if any one or more of the following three conditions fails to hold, the current peak position is judged not to be the beat position: the audio intensity value corresponding to the first association position meets the peak condition; the audio intensity value corresponding to the second association position meets the peak condition; and the audio intensity value corresponding to the third association position does not meet the peak condition.
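Combining the three conditions gives a single yes/no test for a candidate peak position. The following sketch uses synthetic values chosen only to exercise both outcomes; the names `is_local_max` and `is_beat_position`, the floor division and the window radius of 3 are assumptions rather than details fixed by the source.

```python
def is_local_max(acf, index, radius):
    """True if acf[index] is the largest value within `radius` neighbors."""
    lo, hi = max(0, index - radius), min(len(acf), index + radius + 1)
    return acf[index] == max(acf[lo:hi])


def is_beat_position(acf, peak_index, radius=3):
    """Peaks required at 1/4 and 1/2 of the peak index, and no peak at 1/3."""
    first, second, third = peak_index // 4, peak_index // 2, peak_index // 3
    return (
        is_local_max(acf, first, radius)
        and is_local_max(acf, second, radius)
        and not is_local_max(acf, third, radius)
    )


# Synthetic sequence: peaks at 45 and 90, and a larger neighbor next to 60.
acf = [1.0] * 200
acf[45], acf[90], acf[180] = 5.0, 6.0, 7.0
acf[61] = 2.0  # makes x[60] fail the peak condition

with_peak_at_third = list(acf)
with_peak_at_third[60] = 3.0  # now x[60] is a local maximum

print(is_beat_position(acf, 180), is_beat_position(with_peak_at_third, 180))
```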
In one embodiment, when the current peak position is determined not to be the beat position, the next peak position in the autocorrelation sequence is determined and the step of judging whether the peak position is the beat position is performed again, until all peak positions in the autocorrelation sequence have been judged; alternatively, after one or more beat positions have been determined, the remaining beat positions in the audio data are determined according to the beat interval.
As can be seen from the above, this scheme obtains the autocorrelation sequence corresponding to the audio data and determines the beat position in the audio data from that sequence, so there is no need to accurately determine the start time of the music in the audio data and then determine the beat from that start time. When the beat position is determined, whether several association positions meet the peak condition is used as a constraint, which preserves the accuracy of beat position determination even far from the starting point of the audio data. In addition, the scheme does not rely on speed estimation or algorithmic function estimation when determining local peaks, which markedly reduces computational deviation and also reduces the amount of computation. In this scheme the beat position is not taken directly from the position of maximum signal intensity in a region; it is constrained by conditions on several association positions, which ensures the accuracy of the determination.
Fig. 1b is a schematic diagram of beat position determination performed at the preferred preset proportions according to an embodiment of the present application. As shown in Fig. 1b, position index 117 is determined as a peak position, and the audio intensity values at the first, second and third association positions derived from it are checked against the peak condition. With a ratio of 1:4 between the position index of the first association position and that of the peak position, the first association position has position index 29 (after rounding), and its audio intensity value is judged to meet the peak condition. With a ratio of 1:2, the second association position has position index 58, and its audio intensity value is judged to meet the peak condition. With a ratio of 1:3, the third association position has position index 39, and its audio intensity value is judged not to meet the peak condition. The peak position is therefore determined to be the beat position.
Fig. 1c is a schematic diagram comparing beats determined by the beat determining method of an embodiment of the present application with beats determined by other methods. As shown in Fig. 1c, the positions outlined by ovals in the upper area are the actual beat positions in the audio data. The beat positions determined under the constraint of this scheme (the audio intensity values at the first and second association positions meet the peak condition, and the audio intensity value at the third association position does not) are shown at the bottom of Fig. 1c and are basically consistent with the actual beat positions. With other methods, for example requiring only that the audio intensity values at the first and second association positions meet the peak condition without requiring that the value at the third association position not meet it, the determined beat positions are as shown in the middle area of Fig. 1c, and the error is larger.
Fig. 2 is a flowchart of another beat determining method according to an embodiment of the present application, which further adjusts the determined beat position to improve its accuracy. As shown in Fig. 2, the technical scheme is as follows:
Step S201, an autocorrelation sequence corresponding to the audio data is acquired.
Step S202, determining a peak position in the autocorrelation sequence, and a first association position, a second association position and a third association position corresponding to the peak position.
Step S203, acquiring audio intensity values corresponding to the first association position, the second association position and the third association position respectively.
Step S204, if the audio intensity values corresponding to the first association position and the second association position meet a peak condition, and the audio intensity value corresponding to the third association position does not meet the peak condition, determining the peak position as a beat position.
Step S205, determining a calibration position corresponding to the peak position, and adjusting the beat position based on the audio intensity value in the preset position index range at the calibration position.
In one embodiment, after the peak position is determined to be a beat position based on the first, second and third association positions, it is further calibrated. Specifically, a calibration position corresponding to the beat position is determined, and the beat position is adjusted based on the audio intensity values within the preset position index range at the calibration position. For the preset position index range, refer to the explanation of step S104, which is not repeated here. Optionally, the ratio of the position index value of the calibration position to that of the peak position is 2:1; the position index corresponding to the maximum audio intensity value within the preset position index range at the calibration position is determined, and the position corresponding to 1/2 of that position index value is taken as the adjusted beat position. For example, if the position index of the peak position is 100, the position index of the calibration position is 200. Taking a preset index range of 3 as an example, the values of x[197] to x[203] are obtained, for example x[197] = 15.1, x[198] = 15.2, x[199] = 15.5, x[200] = 15.6, x[201] = 15.8, x[202] = 16 and x[203] = 15.5. The maximum among them is at position index 202, so 1/2 of that position index value is 101, and the beat position is adjusted to position index 101. It should be noted that the ratio of the position index value of the calibration position to that of the peak position is not limited to 2:1 and may be 2n:1, where n is a natural number.
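The calibration step can be sketched with the numbers from the example: search around twice the peak index for the largest value, then halve the index found. The function name `calibrate_beat_position` and the `factor` parameter are hypothetical, and returning a possibly fractional position when the index found is odd is an assumption; the source does not say how that case is handled.

```python
def calibrate_beat_position(acf, peak_index, radius=3, factor=2):
    """Refine a beat position using the neighborhood of factor * peak_index.

    Assumption: the result is best_index / factor, which may be fractional.
    """
    center = peak_index * factor
    lo = max(0, center - radius)
    hi = min(len(acf), center + radius + 1)
    best = max(range(lo, hi), key=lambda i: acf[i])
    return best / factor


# Values from the text: the maximum near x[200] is x[202] = 16.
acf = [0.0] * 204
acf[197:204] = [15.1, 15.2, 15.5, 15.6, 15.8, 16.0, 15.5]

print(calibrate_beat_position(acf, 100))
```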
In this scheme, after the peak position is determined to be the beat position, it is further adjusted, which addresses the problem that the peak position is offset to some extent because the audio data is digitized. The inventors found through experimental tests that obtaining a calibration position at a preset proportion to the peak position, and adjusting the beat position based on the audio intensity values within the preset position index range around the calibration position, yields a more accurate beat position.
Fig. 3 is a flowchart of another beat determining method provided in an embodiment of the present application, which gives a complete example of determining and calibrating the beat position. As shown in fig. 3, the technical scheme is as follows:
Step S301, an autocorrelation sequence corresponding to the audio data is acquired.
Step S302, determining a peak position in the autocorrelation sequence, and a first association position, a second association position and a third association position corresponding to the peak position.
Step S303, acquiring audio intensity values corresponding to the first association position, the second association position and the third association position respectively.
Step S304, determining whether the audio intensity value corresponding to the first association position meets a peak condition, if yes, executing step S305, otherwise executing step S308.
Step S305, determining whether the audio intensity value corresponding to the second association position meets the peak condition, if yes, executing step S306, otherwise executing step S308.
Step S306, determining whether the audio intensity value corresponding to the third association position meets the peak condition, if not, executing step S307, otherwise executing step S308.
Step S307, determining the peak position as the beat position.
Step S308, determining that the peak position is not the beat position, obtaining another peak position in the autocorrelation sequence, and jumping to step S302.
Step S309, determining a calibration position corresponding to the peak position, where the calibration position is in a preset proportion to the peak position, and adjusting the beat position based on the audio intensity value in the preset position index range at the calibration position.
As can be seen from the above, in this scheme an autocorrelation sequence corresponding to the audio data is obtained and the beat position in the audio data is determined based on the autocorrelation sequence, so there is no need to accurately determine the start time of the music in the audio data and then determine the beat from that start time. When the beat position is determined, whether a plurality of association positions meet the peak condition is used as a constraint: when any association position fails its required condition, the peak position is determined to be a non-beat position; otherwise the peak position is determined to be the beat position. In addition, calibrating the beat position further ensures its accuracy.
The condition checks in step S304, step S305 and step S306 may be performed in a different order; the order is not limited.
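The flow of steps S301 to S309 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the application: the function names are hypothetical, the peak condition is taken to be a strict local maximum, lag 0 is never treated as a peak, candidate peaks are tried in increasing index order, association positions use the optional 1:4, 1:2 and 1:3 ratios with floor division, and calibration is skipped when the calibration position falls outside the sequence.

```python
def is_peak(x, i, radius=3):
    """Peak condition: x[i] is strictly greater than every other value
    within `radius` indices on either side (strictness is an assumption)."""
    if i <= 0 or i >= len(x):
        return False  # assumption: lag 0 and out-of-range indices are not peaks
    lo, hi = max(0, i - radius), min(len(x) - 1, i + radius)
    return all(x[i] > x[j] for j in range(lo, hi + 1) if j != i)


def calibrate(x, peak, radius=3, ratio=2):
    """S309: search around ratio * peak and map the maximum back."""
    center = peak * ratio
    if center >= len(x):
        return peak  # assumption: no calibration if the position is out of range
    lo, hi = max(0, center - radius), min(len(x) - 1, center + radius)
    best = max(range(lo, hi + 1), key=lambda j: x[j])
    return best // ratio  # assumption: floor division


def find_beat(x, radius=3):
    """Return the calibrated beat index, or None if no peak qualifies."""
    for peak in range(1, len(x)):            # S302 / S308: next candidate peak
        if not is_peak(x, peak, radius):
            continue
        first, second, third = peak // 4, peak // 2, peak // 3
        if not is_peak(x, first, radius):    # S304 fails -> S308
            continue
        if not is_peak(x, second, radius):   # S305 fails -> S308
            continue
        if is_peak(x, third, radius):        # S306: third must NOT be a peak
            continue
        return calibrate(x, peak, radius)    # S307 then S309
    return None


# Hypothetical autocorrelation values, chosen only to exercise the flow.
x = [0.0] * 250
x[25], x[50], x[100] = 5.0, 8.0, 10.0
for i, v in zip(range(197, 204), [15.1, 15.2, 15.5, 15.6, 15.8, 16.0, 15.5]):
    x[i] = v

beat = find_beat(x)
print(beat)  # 101
```

In this synthetic data the peaks at 25 and 50 are rejected because their first association positions are not peaks, while the peak at 100 passes all three checks and is then calibrated to 101.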
Fig. 4 is a block diagram of a beat determining device according to an embodiment of the present application. The device is configured to execute the beat determining method provided in the foregoing embodiments, and has the functional modules and beneficial effects corresponding to that method. As shown in fig. 4, the apparatus specifically includes: a sequence acquisition module 101, a positioning module 102, an audio intensity acquisition module 103, and a beat position determination module 104, wherein:
a sequence acquisition module 101, configured to acquire an autocorrelation sequence corresponding to audio data;
a positioning module 102, configured to determine a peak position in the autocorrelation sequence, and a first association position, a second association position, and a third association position corresponding to the peak position, where the first association position, the second association position, and the third association position are in a preset proportion to the peak position;
an audio intensity acquisition module 103, configured to obtain audio intensity values corresponding to the first association position, the second association position, and the third association position respectively;
the beat position determining module 104 is configured to determine that the peak position is a beat position if the audio intensity values corresponding to the first association position and the second association position meet a peak condition and the audio intensity value corresponding to the third association position does not meet the peak condition.
As can be seen from the above, in this scheme an autocorrelation sequence corresponding to the audio data is obtained and the beat position in the audio data is determined based on the autocorrelation sequence, so there is no need to accurately determine the start time of the music in the audio data and then determine the beat from that start time. When the beat position is determined, whether a plurality of association positions meet the peak condition is used as a constraint, which ensures accurate beat position determination even far from the starting point of the audio data. In addition, the method does not rely on speed estimation or algorithmic function estimation when determining local peaks, which significantly reduces computational deviation and also reduces the amount of computation. In this scheme the beat position is not determined directly from the position of maximum signal intensity in a given region, but is constrained by the conditions at a plurality of association positions, which ensures the accuracy of beat position determination.
In one possible embodiment, the beat determining device further comprises: a sequence determination module 105 for:
before the autocorrelation sequence corresponding to the audio data is obtained, the autocorrelation sequence corresponding to the audio data is determined, and the autocorrelation sequence comprises a position index and a corresponding audio intensity value.
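As an illustration of a sequence made of position indices and corresponding values, the sketch below computes a plain, unnormalized autocorrelation in pure Python. This is an assumption for illustration only: this passage does not specify which signal is autocorrelated or whether any normalization is applied, and the function name `autocorrelation` is hypothetical.

```python
def autocorrelation(signal, max_lag=None):
    """Return r where r[k] = sum over i of signal[i] * signal[i + k].

    The list index k plays the role of the position index, and r[k]
    the role of the corresponding intensity value.
    """
    n = len(signal)
    if max_lag is None:
        max_lag = n - 1
    return [
        sum(signal[i] * signal[i + k] for i in range(n - k))
        for k in range(max_lag + 1)
    ]


# A toy signal with one pulse every 4 samples.
signal = [1, 0, 0, 0] * 4
r = autocorrelation(signal)
print(r[0], r[1], r[4], r[8])  # 4 0 3 2
```

The values are largest at lags that are multiples of the pulse period, which is the property the peak search relies on.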
In one possible embodiment, the positioning module 102 is specifically configured to:
and traversing each position index and the corresponding audio intensity value in the autocorrelation sequence in sequence, determining the position index meeting the peak condition, and determining the position index as the peak position.
In one possible embodiment, the first, second and third associated positions are in a preset proportion to the peak position, comprising: the ratio of the position index value of the first association position to the position index value of the peak position is 1:4, the ratio of the position index value of the second association position to the position index value of the peak position is 1:2, and the ratio of the position index value of the third association position to the position index value of the peak position is 1:3.
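A small Python sketch of the ratios just listed is given below. The function name and the use of floor division for indices that do not divide evenly are assumptions; the source states only the ratios.

```python
def association_positions(peak_index):
    """Return (first, second, third) association position indices.

    Ratios to the peak index: first 1:4, second 1:2, third 1:3.
    Floor division is an assumption for non-divisible indices.
    """
    first = peak_index // 4
    second = peak_index // 2
    third = peak_index // 3
    return first, second, third


positions = association_positions(120)
print(positions)  # (30, 60, 40)
```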
In one possible embodiment, the peak condition includes that the current audio intensity value is a local maximum within a preset position index range.
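The peak condition and the traversal of the sequence can be sketched together in Python. The sketch assumes a strict local maximum (ties do not count) and skips position index 0; both choices, and the name `find_peaks`, are illustrative assumptions rather than details given in the source.

```python
def find_peaks(x, radius=3):
    """Return every position index whose value is a strict local maximum
    within `radius` indices on either side."""
    peaks = []
    for i in range(1, len(x)):  # assumption: index 0 is not considered
        lo = max(0, i - radius)
        hi = min(len(x) - 1, i + radius)
        if all(x[i] > x[j] for j in range(lo, hi + 1) if j != i):
            peaks.append(i)
    return peaks


x = [9, 1, 2, 5, 2, 1, 0, 1, 3, 7, 3, 1, 0]
peaks = find_peaks(x, radius=2)
print(peaks)  # [3, 9]
```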
In one possible embodiment, the beat position determination module 104 is further configured to:
after the peak position is determined to be the beat position, determining a calibration position corresponding to the peak position, wherein the calibration position is in a preset proportion with the peak position; and adjusting the beat position based on the audio intensity value in the preset position index range at the calibration position.
In one possible embodiment, the calibration position is in a preset proportion to the peak position, comprising:
the ratio of the position index value of the calibration position to the position index value of the peak position is 2:1;
the beat position determination module 104 is specifically configured to:
and determining a position index value corresponding to the maximum value in the audio intensity values in the preset position index range at the calibration position, and determining a position corresponding to 1/2 of the position index value as the adjusted beat position.
In one possible embodiment, when the audio intensity values of the first, second, and third associated positions satisfy any one of the following, the beat position determination module 104 determines that the peak position is a non-beat position:
the audio intensity value corresponding to the first association position does not meet the peak condition;
the audio intensity value corresponding to the second association position does not meet the peak condition;
and the audio intensity value corresponding to the third association position meets the peak value condition.
Fig. 5 is a schematic structural diagram of a beat determining device according to an embodiment of the present application. As shown in fig. 5, the device includes a processor 201, a memory 202, an input device 203, and an output device 204. The number of processors 201 in the device may be one or more; one processor 201 is taken as an example in fig. 5. The processor 201, memory 202, input device 203, and output device 204 in the device may be connected by a bus or by other means; connection by a bus is taken as an example in fig. 5. The memory 202 is a computer readable storage medium and may be used to store software programs, computer executable programs, and modules, such as the program instructions/modules corresponding to the beat determining method in the embodiments of the present application. The processor 201 executes the various functional applications and data processing of the device by running the software programs, instructions, and modules stored in the memory 202, that is, it implements the beat determining method described above. The input device 203 may be used to receive entered numeric or character information and to generate key signal inputs related to user settings and function control of the device. The output device 204 may include a display device such as a display screen.
Embodiments of the present application also provide a storage medium containing computer-executable instructions which, when executed by a computer processor, perform a beat determining method comprising:
acquiring an autocorrelation sequence corresponding to the audio data;
determining a peak position in the autocorrelation sequence, and a first association position, a second association position and a third association position corresponding to the peak position, wherein the first association position, the second association position and the third association position are in preset proportion to the peak position;
acquiring audio intensity values respectively corresponding to the first association position, the second association position and the third association position;
and if the audio intensity values corresponding to the first association position and the second association position meet the peak condition, and the audio intensity value corresponding to the third association position does not meet the peak condition, determining that the peak position is the beat position.
It should be noted that, in the above-described embodiment of the beat determining device, each unit and module included is divided according to the functional logic only, but is not limited to the above-described division, as long as the corresponding function can be realized; in addition, the specific names of the functional units are also only for distinguishing from each other, and are not used to limit the protection scope of the embodiments of the present application.
Note that the above is only a preferred embodiment of the present application and the technical principle applied. It will be understood by those skilled in the art that the embodiments of the present application are not limited to the particular embodiments described herein, but are capable of numerous obvious changes, rearrangements and substitutions without departing from the scope of the embodiments of the present application. Therefore, while the embodiments of the present application have been described in connection with the above embodiments, the embodiments of the present application are not limited to the above embodiments, but may include many other equivalent embodiments without departing from the spirit of the embodiments of the present application, and the scope of the embodiments of the present application is determined by the scope of the appended claims.
Claims (11)
1. The beat determining method is characterized by comprising the following steps:
acquiring an autocorrelation sequence corresponding to the audio data;
determining a peak position in the autocorrelation sequence, and a first association position, a second association position and a third association position corresponding to the peak position, wherein the first association position, the second association position and the third association position are in preset proportion to the peak position;
acquiring audio intensity values respectively corresponding to the first association position, the second association position and the third association position;
and if the audio intensity values corresponding to the first association position and the second association position meet the peak condition, and the audio intensity value corresponding to the third association position does not meet the peak condition, determining that the peak position is the beat position.
2. The beat determining method of claim 1, further comprising, prior to the acquiring the autocorrelation sequence corresponding to the audio data:
an autocorrelation sequence corresponding to the audio data is determined, the autocorrelation sequence including a position index and a corresponding audio intensity value.
3. The beat determination method of claim 2, wherein the determining the peak position in the autocorrelation sequence comprises:
and traversing each position index and the corresponding audio intensity value in the autocorrelation sequence in sequence, determining the position index meeting the peak condition, and determining the position index as the peak position.
4. The beat determining method of claim 3, wherein the first, second and third association positions are in a preset proportion to the peak position, comprising: the ratio of the position index value of the first association position to the position index value of the peak position is 1:4, the ratio of the position index value of the second association position to the position index value of the peak position is 1:2, and the ratio of the position index value of the third association position to the position index value of the peak position is 1:3.
5. The beat determining method of claim 2, wherein the peak condition comprises a current audio intensity value being a local maximum within a preset location index range.
6. The beat determination method of claim 2, further comprising, after determining the peak position as a beat position:
determining a calibration position corresponding to the peak position, wherein the calibration position is in a preset proportion with the peak position; and adjusting the beat position based on the audio intensity value in the preset position index range at the calibration position.
7. The beat determining method of claim 6, wherein the calibration position is in a preset proportion to the peak position, comprising:
the ratio of the position index value of the calibration position to the position index value of the peak position is 2:1;
the adjusting the beat position based on the audio intensity value within the preset position index range at the calibration position includes:
and determining a position index value corresponding to the maximum value in the audio intensity values in the preset position index range at the calibration position, and determining a position corresponding to 1/2 of the position index value as the adjusted beat position.
8. The beat determination method of any one of claims 1-7, wherein the peak location is determined to be a non-beat location when the audio intensity values of the first, second and third associated locations satisfy any one of the following:
the audio intensity value corresponding to the first association position does not meet the peak condition;
the audio intensity value corresponding to the second association position does not meet the peak condition;
and the audio intensity value corresponding to the third association position meets the peak value condition.
9. Beat determining means, characterized by comprising:
the sequence acquisition module is used for acquiring an autocorrelation sequence corresponding to the audio data;
the positioning module is used for determining a peak position in the autocorrelation sequence, and a first association position, a second association position and a third association position corresponding to the peak position, wherein the first association position, the second association position and the third association position are in preset proportion to the peak position;
the audio intensity acquisition module is used for acquiring audio intensity values respectively corresponding to the first association position, the second association position and the third association position;
and the beat position determining module is used for determining the peak position as the beat position if the audio intensity values corresponding to the first association position and the second association position meet the peak condition and the audio intensity value corresponding to the third association position does not meet the peak condition.
10. A beat determining device, the device comprising: one or more processors; storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the beat determination method of any of claims 1-8.
11. A storage medium storing computer executable instructions which, when executed by a computer processor, are for performing the beat determination method of any of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110261348.6A CN112908289B (en) | 2021-03-10 | 2021-03-10 | Beat determining method, device, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112908289A CN112908289A (en) | 2021-06-04 |
CN112908289B CN112908289B (en) | 2023-11-07 |
Family
ID=76104806
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110261348.6A Active CN112908289B (en) | 2021-03-10 | 2021-03-10 | Beat determining method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112908289B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1868155A (en) * | 2003-08-13 | 2006-11-22 | 英特尔公司 | Universal adaptive synchronization scheme for distributed audio-video capture on heterogeneous computing platforms |
CN102543052A (en) * | 2011-12-13 | 2012-07-04 | 北京百度网讯科技有限公司 | Method and device for analyzing musical BPM |
CN103354091A (en) * | 2013-06-19 | 2013-10-16 | 北京百度网讯科技有限公司 | Audio feature extraction method based on frequency domain transformation and apparatus thereof |
CN106652981A (en) * | 2016-12-28 | 2017-05-10 | 广州酷狗计算机科技有限公司 | BPM detection method and device |
CN108172210A (en) * | 2018-02-01 | 2018-06-15 | 福州大学 | A Singing Harmony Generation Method Based on Singing Rhythm |
CN108335688A (en) * | 2017-12-28 | 2018-07-27 | 广州市百果园信息技术有限公司 | Main beat point detecting method and computer storage media, terminal in music |
CN108335687A (en) * | 2017-12-26 | 2018-07-27 | 广州市百果园信息技术有限公司 | The detection method and terminal of audio signal pucking beat point |
CN109256146A (en) * | 2018-10-30 | 2019-01-22 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio-frequency detection, device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN112908289A (en) | 2021-06-04 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |