
HK1059959B - Device and method for analysing an audio signal in view of obtaining rhythm information - Google Patents


Info

Publication number
HK1059959B
Authority
HK
Hong Kong
Prior art keywords
sub
information
band
rhythm
signal
Prior art date
Application number
HK04102850.1A
Other languages
German (de)
French (fr)
Chinese (zh)
Other versions
HK1059959A1 (en)
Inventor
Herre Jurgen
Uhle Christian
Rohden Jan
Cremer Markus
Original Assignee
Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Priority date
Filing date
Publication date
Priority claimed from DE10123366A (external priority; see DE10123366C1)
Application filed by Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Publication of HK1059959A1
Publication of HK1059959B


Description

The present invention relates to signal processing concepts and in particular to the analysis of audio signals with respect to rhythm information.
In recent years, the availability of multimedia data such as audio and video data has increased significantly. This is due to a number of technical factors, in particular the widespread availability of the Internet, powerful computer hardware and software, and efficient data compression (i.e. source coding) methods for audio and video data.
The enormous volume of audiovisual data available worldwide, for example on the Internet, requires concepts that allow this data to be assessed by content criteria, catalogued, etc. There is a desire to be able to search and find multimedia data in a targeted manner by specifying meaningful criteria.
Of particular interest is the determination or extraction of features that have not only a signal-theoretic meaning but also as direct a semantic meaning as possible, i.e. features that represent properties directly perceived by the listener.
This allows the user to formulate search queries in a simple and intuitive way in order to find pieces in the entire data set of an audio signal database. Similarly, semantically relevant features make it possible to model similarity relationships between pieces in a way that is close to human perception. Features with semantic meaning also make it possible, for example, to suggest pieces of interest to a particular user automatically if that user's preferences are known.
The aim is that the extraction of such features, here the extraction of rhythm information from an audio signal, be robust and computationally efficient. Robustness means that the result must not depend on whether the piece has been source-coded and decoded, whether it has been played over a loudspeaker and picked up by a microphone, whether it is played loudly or softly, or whether it is played by one or by several instruments.
In other words, the absolute value of the samples is determined. The resulting n signals are then smoothed, for example by averaging over a suitable window, to obtain an envelope signal. To reduce computational complexity, the envelope signal can be sub-sampled. The envelope signals are then differentiated, so that sudden changes in signal amplitude are preferentially passed by the differentiating filter. The result is then limited to non-negative values. Each envelope signal is then fed into a bank of resonant filters (oscillators) containing one filter for each tempo, so that the filter whose tempo best matches the musical signal responds most strongly. The energies for each tempo are then summed over all subbands, and the tempo with the largest energy sum is taken as the resulting tempo, i.e. the rhythm information.
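The envelope chain described above (rectification, smoothing, sub-sampling, differentiation, limiting to non-negative values) can be sketched for a single subband signal as follows. This is a minimal illustration in Python with NumPy, not the prior-art implementation: the function name `onset_envelope`, the moving-average window length `win_len` and the decimation factor `decim` are hypothetical choices, and the resonator filter bank stage is not shown.

```python
import numpy as np

def onset_envelope(subband, win_len=64, decim=16):
    """Envelope-based onset signal for one subband (illustrative parameters)."""
    x = np.abs(np.asarray(subband, dtype=float))     # absolute value of the samples
    window = np.ones(win_len) / win_len              # moving-average smoothing window
    env = np.convolve(x, window, mode="same")        # smoothed envelope
    env = env[::decim]                               # sub-sampling to reduce computation
    diff = np.diff(env, prepend=env[0])              # differentiation emphasises sudden rises
    return np.maximum(diff, 0.0)                     # limit to non-negative values

# A single burst: the onset signal peaks near the start of the burst
# and the falling edge is removed by the final clipping step.
sig = np.zeros(8000)
sig[2000:2400] = 1.0
onsets = onset_envelope(sig)
```

Because only rising envelope segments survive, the positive values of `onsets` add up to the total rise of the smoothed envelope, here 1.0.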
The known algorithm is shown as a block diagram in Fig. 3. The audio signal is fed via an audio input 300 to an analysis filter bank 302, which generates from the audio input a number n of channels, i.e. individual subband signals. Each subband signal contains a certain frequency range of the audio signal. The filters of the analysis filter bank are chosen so that they approximate the frequency selectivity of the human inner ear.
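As a rough stand-in for the analysis filter bank 302, the following sketch splits a signal into subbands by masking its FFT spectrum. This is a swapped-in simplification, not the filter bank of the known device: ideal (brick-wall) bands do not approximate the frequency selectivity of the inner ear, and the function name `split_into_subbands` and the band edges are hypothetical.

```python
import numpy as np

def split_into_subbands(x, fs, edges_hz):
    """Split x into subband signals using ideal FFT band-pass masks.

    Band i covers the frequency range [edges_hz[i], edges_hz[i + 1]) in Hz.
    """
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    bands = []
    for lo, hi in zip(edges_hz[:-1], edges_hz[1:]):
        mask = (freqs >= lo) & (freqs < hi)          # keep only the bins inside this band
        bands.append(np.fft.irfft(spectrum * mask, n=len(x)))
    return np.array(bands)                            # shape: (num_bands, len(x))

fs = 8000
t = np.arange(fs) / fs                                # one second of signal
x = np.sin(2 * np.pi * 1000 * t) + 0.5 * np.sin(2 * np.pi * 100 * t)
edges = [0, 200, 800, 3200, fs / 2 + 1]               # hypothetical edges; last one includes Nyquist
bands = split_into_subbands(x, fs, edges)
energies = (bands ** 2).sum(axis=1)                   # the 1 kHz tone dominates band 2
```

Because the bands in this sketch tile the spectrum without gaps or overlap, the subband signals sum back to the input signal.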
At the outputs of the devices 304a to 304c, an autocorrelation function is then available for each subband signal, which represents aspects of the rhythm information of that subband signal.
The individual autocorrelation functions of the subband signals are then combined in a device 306 by summation to obtain a sum autocorrelation function (SAKF) that represents the rhythm information of the signal at the audio input 300. This information can be output at a tempo output 308. Large values of the sum autocorrelation function indicate a high periodicity of the note onsets at the lag associated with the corresponding SAKF peak.
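A minimal sketch of the summation in device 306, assuming each subband has already been reduced to a non-negative onset signal (for example by an envelope chain like the one described earlier). The helper names `autocorrelation` and `sum_acf` are hypothetical, and the direct dot-product loop is chosen for clarity rather than speed.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Unnormalised autocorrelation of x for lags 0 .. max_lag (direct computation)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) for k in range(max_lag + 1)])

def sum_acf(onset_signals, max_lag):
    """Sum the per-subband autocorrelation functions (the SAKF)."""
    return np.sum([autocorrelation(s, max_lag) for s in onset_signals], axis=0)

low = np.zeros(1000)
low[::50] = 1.0                          # onsets every 50 frames
high = np.zeros(1000)
high[10::50] = 0.5                       # weaker onsets, same period, shifted
sakf = sum_acf([low, high], max_lag=120)
best_lag = int(np.argmax(sakf[1:])) + 1  # ignore the trivial maximum at lag 0
print(best_lag)                          # 50
```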
Musically meaningful lags correspond, for example, to the tempo range between 60 bpm and 200 bpm. The device 306 may also be arranged to convert a lag into tempo information. For example, a peak at a lag of one second corresponds to a tempo of 60 beats per minute (bpm). Smaller lags indicate tempos above 60 bpm, while larger lags indicate tempos below 60 bpm.
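The lag-to-tempo conversion follows from tempo = 60 / lag for a lag in seconds. The sketch below also converts a tempo range into a lag range in frames; the frame rate of 100 frames per second and the function names are hypothetical.

```python
def lag_to_bpm(lag_seconds):
    """Convert an autocorrelation lag in seconds to a tempo in beats per minute."""
    return 60.0 / lag_seconds

def tempo_lag_range(min_bpm, max_bpm, frame_rate):
    """Lag range in frames covering min_bpm .. max_bpm at frame_rate frames per second."""
    min_lag = int(round(60.0 * frame_rate / max_bpm))  # highest tempo -> shortest lag
    max_lag = int(round(60.0 * frame_rate / min_bpm))  # lowest tempo -> longest lag
    return min_lag, max_lag

print(lag_to_bpm(1.0))                  # 60.0
print(lag_to_bpm(0.5))                  # 120.0
print(tempo_lag_range(60, 200, 100.0))  # (30, 100)
```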
Another problem in using autocorrelation functions to extract the periodicity of a subband signal is that the sum autocorrelation function obtained by the device 306 is ambiguous: it exhibits peaks not only at the fundamental lag but also at integer multiples of it. For example, a sinusoidal component with a period t0 produces, after autocorrelation, not only the desired maximum at t0 but also maxima at 2t0, 3t0, etc.
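The ambiguity can be illustrated with a periodic sequence of onset impulses instead of a sine: its autocorrelation is non-zero only at integer multiples of the period, with slowly decreasing heights. The signal length and the period of 40 samples are arbitrary illustrative values.

```python
import numpy as np

def acf(x):
    """Autocorrelation for non-negative lags 0 .. len(x) - 1."""
    x = np.asarray(x, dtype=float)
    full = np.correlate(x, x, mode="full")  # lags -(N-1) .. N-1
    return full[len(x) - 1:]                # keep lags 0 .. N-1

t0 = 40
x = np.zeros(400)
x[::t0] = 1.0                               # onset impulses every t0 samples
r = acf(x)
peak_lags = np.nonzero(r)[0]
print(peak_lags.tolist())  # [0, 40, 80, 120, 160, 200, 240, 280, 320, 360]
```

The peaks at 80, 120, ... are nearly as high as the one at 40, which is why a simple maximum search can pick a multiple of the true period.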
The present invention is intended to provide a computationally efficient and robust device and a computationally efficient and robust method for analyzing an audio signal with respect to rhythm information.
This object is achieved by a device for analysing an audio signal according to claim 1 or by a method for analysing an audio signal according to claim 11.
The present invention is based on the finding that the individual frequency bands, i.e. the subbands, often offer conditions of differing favourability for finding rhythmic periodicities. In pop music, for example, the middle frequency range, around 1 kHz, is often dominated by the singing voice, which does not follow the beat, whereas the higher frequency ranges often contain percussive sounds such as the hi-hat of the drum kit, which allow rhythmic regularities to be extracted very well. In other words, different frequency bands contain different amounts of rhythm information, i.e. they differ in their significance or quality with respect to the rhythm information of the audio signal.
The present invention therefore consists of first breaking down the audio signal into subband signals, then examining each subband signal for its periodicity to obtain rhythm raw information for each subband signal, and then, according to the present invention, evaluating the quality of the periodicity of each subband signal to obtain a significance measure for each subband signal. A high significance measure indicates that this subband signal contains clear rhythm information, while a low significance measure indicates that this subband signal contains less clear rhythm information.
According to a preferred embodiment of the present invention, a subband signal is examined for periodicities by first calculating a modified envelope of the subband signal and then calculating an autocorrelation function of this envelope. The autocorrelation function of the envelope represents the rhythm raw information. Clear rhythm information is available when the autocorrelation function has pronounced maxima, while less clear rhythm information is available when the autocorrelation function of the envelope has less pronounced peaks or none at all. An autocorrelation function with pronounced peaks therefore receives a high significance, while an autocorrelation function with a flat course receives a relatively low significance.
This can be implemented in a computationally efficient way by means of a weighting factor that depends on the significance measure. A subband signal whose rhythm information is of good quality, i.e. which has a high significance measure, could receive a weighting factor of 1, whereas a subband signal with a smaller significance measure receives a weighting factor of less than 1. In the extreme case, a subband signal with a completely flat autocorrelation function receives a weighting factor of 0. The weighted autocorrelation functions, i.e. the weighted rhythm raw information, are then simply summed.
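The weighting and summation can be sketched as follows, assuming the rhythm raw information of each subband is an autocorrelation function sampled on a common lag grid. How a significance measure is mapped to a weight between 0 and 1 is left open here; the weights in the example are chosen by hand, and the function name is hypothetical.

```python
import numpy as np

def weighted_rhythm_sum(raw_infos, weights):
    """Weight each subband's rhythm raw information and sum over subbands."""
    raw = np.asarray(raw_infos, dtype=float)   # shape: (num_subbands, num_lags)
    w = np.asarray(weights, dtype=float)       # shape: (num_subbands,)
    return w @ raw                             # sum over i of w[i] * raw[i]

clear_band = [0.0, 1.0, 4.0, 1.0]   # pronounced peak at lag 2
flat_band = [2.0, 2.0, 2.0, 2.0]    # flat ACF: no usable periodicity
combined = weighted_rhythm_sum([clear_band, flat_band], [1.0, 0.0])
print(combined.tolist())            # [0.0, 1.0, 4.0, 1.0]
```

With a weight of 0, the flat band drops out entirely, so it costs one multiply-add per lag and no extra memory beyond the summed result.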
A further advantage of the method is that the significance measure can be determined with little additional computational effort, and that weighting the rhythm raw information with the significance measure and summing the results can be done efficiently, without large memory or computation-time requirements. This makes the method particularly suitable for real-time applications.
Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, in which:
Fig. 1 is a block diagram of a device for analysing an audio signal with a quality assessment of the rhythm raw information;
Fig. 2 is a block diagram of a device for analysing an audio signal using weighting factors based on the significance measures;
Fig. 3 is a block diagram of a known device for analysing an audio signal with respect to rhythm information;
Fig. 4 is a block diagram of a device for analysing an audio signal with respect to rhythm information using an autocorrelation function with subband-wise post-processing of the rhythm raw information; and
Fig. 5 is a detailed block diagram of the post-processing device of Fig. 4.
Fig. 1 shows a block diagram of a device for analysing an audio signal with respect to rhythm information. The audio signal is fed via an input 100 to a device 102 for decomposing the audio signal into at least two subband signals 104a and 104b. Each subband signal 104a, 104b is fed into a device 106a and 106b, respectively, which examines it for periodicities in order to obtain rhythm raw information 108a and 108b for each subband signal. The rhythm raw information is then fed into a device 110a and 110b, respectively, which evaluates the quality of the periodicity in order to obtain a significance measure 112a and 112b for each subband signal. The rhythm raw information 108a, 108b and the significance measures 112a, 112b are fed into a device 114 for determining the rhythm information of the audio signal, which takes into account the significance measure of at least one subband signal together with the rhythm raw information of the subband signals.
For example, if the quality evaluation device 110a has determined that there is no particular periodicity in the subband signal 104a, the significance measure 112a will be very small or equal to zero. In this case, the device 114 for determining the rhythm information establishes that the significance measure 112a is zero, so that the rhythm raw information 108a of the subband signal 104a need not be considered at all when determining the rhythm information of the audio signal. The rhythm information of the audio signal is then determined solely on the basis of the rhythm raw information 108b of the subband signal 104b.
Fig. 2 is discussed below as a specific embodiment of the device of Fig. 1. The device 102 for decomposing the audio signal can be implemented as a standard analysis filter bank that produces a user-selectable number of subband signals. Each subband signal is then processed by the devices 106a, 106b and 106c, respectively, and the significance of each item of rhythm raw information is then determined by the devices 110a to 110c. In the preferred embodiment shown in Fig. 2, the device 114 includes a device 114a for calculating a weighting factor for each subband signal on the basis of the significance of this subband signal and, optionally, the significances of the other subband signals. A device 114b then weights the rhythm raw information of each subband, for example the rhythm raw information 108a, with the corresponding weighting factor and sums the weighted rhythm raw information to obtain the rhythm information of the audio signal.
The concept of the invention is thus as follows: after the rhythm information of the individual bands has been evaluated, which may be done, for example, by envelope formation, smoothing, differentiation, limitation to positive values and formation of the autocorrelation function (devices 106a to 106c), the value or quality of these intermediate results is evaluated in the devices 110a to 110c. This is done by means of an evaluation function that rates the reliability of each band result with a single significance measure. From the significance measures of the subbands, a weighting factor is derived for each band for the extraction of the rhythm information. The overall rhythm extraction is then carried out in the device 114 by combining the results of all bands, taking their respective weighting factors into account.
As a result, a rhythm analysis algorithm implemented in this way is able to find rhythm information in a signal reliably even under adverse conditions.
In a preferred embodiment, the rhythm raw information 108a, 108b, 108c, which represents the periodicity of the respective subband signal, is determined by means of an autocorrelation function. In this case, the significance measure is preferably determined by dividing the maximum of the autocorrelation function by the mean of the autocorrelation function and then subtracting 1.
Furthermore, the autocorrelation function is preferably considered only within a specific tempo range, i.e. from a maximum lag corresponding to the lowest tempo of interest to a minimum lag corresponding to the highest tempo of interest.
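The peak-to-mean significance measure, restricted to a lag range, can be sketched as follows. The function name is hypothetical, and returning 0 for an all-zero segment is an assumption added to avoid division by zero; it is not stated in the text.

```python
import numpy as np

def significance_peak_to_mean(acf, min_lag, max_lag):
    """Significance = max / mean - 1 of the ACF, evaluated only for lags min_lag .. max_lag."""
    segment = np.asarray(acf, dtype=float)[min_lag:max_lag + 1]
    mean = segment.mean()
    if mean <= 0.0:
        return 0.0  # assumption: an all-zero segment carries no rhythm information
    return segment.max() / mean - 1.0

flat = np.ones(10)
peaked = np.ones(10)
peaked[6] = 5.0
print(significance_peak_to_mean(flat, 3, 6))    # 0.0
print(significance_peak_to_mean(peaked, 3, 6))  # 1.5
```

A flat ACF yields exactly 0, which maps naturally onto a weighting factor of 0 for that subband.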
Alternatively, the ratio of the arithmetic mean to the geometric mean of the autocorrelation function within the lag range of interest can be used as the significance measure. If all values of the autocorrelation function are equal, i.e. if the autocorrelation function has a flat course, its geometric mean and its arithmetic mean are equal. In this case, the significance measure has the value 1, which means that the rhythm raw information is not significant.
For an autocorrelation function with strong peaks, the ratio of arithmetic mean to geometric mean is greater than 1, which indicates that the autocorrelation function contains good rhythm information. Conversely, the smaller this ratio, the flatter the autocorrelation function and the fewer periodicities it contains. The rhythm information of this subband signal is then less significant, i.e. of lower quality, which is reflected in a low weighting factor or a weighting factor of 0.
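The arithmetic-to-geometric-mean ratio can be sketched as follows. It is only meaningful for non-negative values, which holds for the autocorrelation of a non-negative (limited) envelope; the small offset `eps` guarding against log(0) and the function name are hypothetical additions, not part of the described method.

```python
import numpy as np

def significance_am_gm(acf, min_lag, max_lag, eps=1e-12):
    """Ratio of arithmetic to geometric mean of a non-negative ACF over lags min_lag .. max_lag.

    Close to 1 for a flat ACF, larger for an ACF with pronounced peaks.
    """
    segment = np.asarray(acf, dtype=float)[min_lag:max_lag + 1]
    if np.any(segment < 0.0):
        raise ValueError("AM/GM ratio assumes non-negative ACF values")
    segment = segment + eps                           # hypothetical guard against log(0)
    arithmetic = segment.mean()
    geometric = np.exp(np.mean(np.log(segment)))      # geometric mean via mean of logs
    return arithmetic / geometric

flat = np.full(8, 3.0)
mild = np.array([1.0, 1.0, 1.0, 2.0])
peaky = np.array([1.0, 1.0, 1.0, 5.0])
ratios = [significance_am_gm(a, 0, len(a) - 1) for a in (flat, mild, peaky)]
```

For the three examples the ratio grows from about 1.0 (flat) to about 1.05 and about 1.34 as the peak becomes more pronounced.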
As regards the weighting factors, several options exist. A relative weighting is preferred, in which the weighting factors of all subband signals add up to 1, i.e. the weighting factor of a band is determined as the significance value of that band divided by the sum of all significance values. In this case, the relative weighting is performed before the weighted rhythm raw information is summed to obtain the rhythm information of the audio signal.
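The relative weighting can be sketched in a few lines. The fallback to equal weights when every significance is 0 is a hypothetical choice to avoid division by zero; the text does not say how that case is handled.

```python
import numpy as np

def relative_weights(significances):
    """Weights proportional to the significance measures, normalised to sum to 1."""
    s = np.asarray(significances, dtype=float)
    total = s.sum()
    if total <= 0.0:
        # assumption: if no band is significant, fall back to equal weights
        return np.full(len(s), 1.0 / len(s))
    return s / total

print(relative_weights([3.0, 1.0, 0.0]).tolist())  # [0.75, 0.25, 0.0]
```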
As explained above, the periodicity of the subband signals is preferably examined using an autocorrelation function. This case is illustrated in Fig. 4. The audio signal is fed via the audio input 100 into the device 102, which decomposes it into subband signals 104a and 104b. Each subband signal is then examined in the device 106a and 106b, respectively, using an autocorrelation function as described above, to determine the periodicity of the subband signal. The rhythm raw information 108a and 108b is then present at the output of the device 106a and 106b, respectively. It is fed into a device 118a and 118b, respectively, which eliminates the ambiguities of the autocorrelation function to obtain post-processed rhythm raw information.
This has the advantage that the ambiguities of the autocorrelation functions, i.e. of the rhythm raw information 108a, 108b, are eliminated within each subband, and not only after the summation of the individual autocorrelation functions, as in the prior art. Furthermore, eliminating the ambiguities band by band in the devices 118a, 118b allows the rhythm raw information of the subband signals to be handled independently of each other. It can, for example, be subjected to a quality assessment by the device 110a for the rhythm raw information 108a or by the device 110b for the rhythm raw information 108b.
However, as illustrated by the dashed lines in Fig. 4, the quality assessment can also be performed on the basis of the post-processed rhythm raw information. This is preferred, because assessing the post-processed rhythm raw information ensures that the quality is evaluated for information that is no longer ambiguous.
The device 114 then determines the rhythm information on the basis of the post-processed rhythm raw information of each channel and preferably also on the basis of the significance measure of that channel.
If the quality assessment is performed on the basis of the rhythm raw information, i.e. on the signal before the device 118a, there is the advantage that, if the significance is found to be 0, i.e. if the autocorrelation function has a flat course, the post-processing by the device 118a can be omitted entirely in order to save computation time.
To eliminate the ambiguities within a subband, rather than after summation as in the prior art, a stretched autocorrelation function can be calculated by a device 121. The device 121 is arranged to stretch the autocorrelation function by an integer factor, and the stretched version is then subtracted from the rhythm raw information by a device 122, so that the peaks at integer multiples of the fundamental lag are suppressed. Preferably, the autocorrelation function is first stretched by a first integer factor and subtracted, then stretched by the next integer factor and subtracted, and so on.
Alternatively or additionally, the device 121 may be arranged to calculate an autocorrelation function compressed by an integer factor, which is then added to the rhythm raw information by the device 122 in order to also take into account the relationships at lags t0/2, t0/3, etc.
In addition, the stretched or compressed versions of the rhythm raw information 108a can be weighted before being added or subtracted, to achieve greater robustness and flexibility.
The examination of the periodicity of a subband signal on the basis of an autocorrelation function can thus be further improved by taking the characteristics of the autocorrelation function into account and performing the post-processing in the device 118a or 118b. A periodic sequence of note onsets with spacing t0 produces an AKF peak not only at a lag of t0 but also at 2t0, 3t0, etc. This leads to ambiguity in the tempo detection, i.e. in the search for significant maxima of the autocorrelation function. The ambiguities can be eliminated by subtracting versions of the subband AKF that have been stretched by integer factors (and optionally weighted) from the original AKF.
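The subband-wise ambiguity removal can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented device 121/122: stretching is done by linear interpolation along the lag axis, the stretch factors 2 and 3 are illustrative, all stretched copies are taken from the original autocorrelation function, and negative results are clipped to zero, a step the text does not specify. The addition of compressed versions and the optional weighting of the stretched copies are not shown; all function names are hypothetical.

```python
import numpy as np

def acf(x):
    """Autocorrelation for non-negative lags 0 .. len(x) - 1."""
    x = np.asarray(x, dtype=float)
    return np.correlate(x, x, mode="full")[len(x) - 1:]

def stretch(r, factor):
    """Stretch r along the lag axis so that a peak at lag t0 moves to factor * t0."""
    lags = np.arange(len(r), dtype=float)
    return np.interp(lags / factor, lags, r)  # linear interpolation between integer lags

def remove_multiples(r, factors=(2, 3)):
    """Subtract stretched copies of r to suppress the peaks at 2*t0, 3*t0."""
    out = np.asarray(r, dtype=float).copy()
    for factor in factors:
        out -= stretch(r, factor)
    return np.maximum(out, 0.0)  # assumption: clip negative values to zero

x = np.zeros(800)
x[::40] = 1.0                    # onsets every t0 = 40 frames
r = acf(x)
enhanced = remove_multiples(r)
print(r[40], r[80], r[120])                       # 19.0 18.0 17.0
print(enhanced[40], enhanced[80], enhanced[120])  # 19.0 0.0 0.0
```

In the example the peaks at 2t0 and 3t0 disappear while the peak at t0 is untouched; a peak at 5t0 would survive because no stretch factor of 5 is used in this sketch.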
The AKF post-processing is thus carried out subband by subband: an autocorrelation function is calculated for at least one subband signal and combined with stretched or compressed versions of itself.

Claims (11)

  1. Apparatus for analyzing an audio signal with regard to rhythm information of the audio signal, comprising:
    means (102) for dividing the audio signal into at least two sub-band signals (104a, 104b);
    means for examining (106a, 106b) a sub-band signal with regard to a periodicity in the sub-band signal, to obtain rhythm raw-information (108a, 108b) for the sub-band signal;
    means for evaluating (110a, 110b) a quality of the periodicity of the rhythm raw-information (108a) of the sub-band signal (104a) to obtain a significance measure (112a) for the sub-band signal; and
    means (114) for establishing rhythm information of the audio signal under consideration of the significance measure (112a) of the sub-band signal and the rhythm raw-information (108a, 108b) of at least one sub-band signal.
  2. Apparatus according to claim 1, wherein the means for examining (106a, 106b) is formed to calculate an autocorrelation function for each of the at least two sub-band signals.
  3. Apparatus according to claim 1 or 2, wherein the means for examining (106a, 106b) comprises:
    means for forming an envelope of a sub-band signal;
    means for smoothing the envelope of the sub-band signal to obtain a smoothed envelope;
    means for differentiating the smoothed envelope to obtain a differentiated envelope;
    means for limiting the differentiated envelope to positive values to obtain a limited envelope; and
    means for forming an autocorrelation function of the limited envelope to obtain the rhythm raw-information (108a, 108b).
  4. Apparatus according to claim 2 or 3, wherein the means for evaluating (110a, 110b) of the quality is formed to use a ratio of a maximum of the autocorrelation function to an average of the autocorrelation function as a significance measure.
  5. Apparatus according to claim 2 or 3, wherein the means for evaluating (110a, 110b) of the quality is formed to use a ratio of an arithmetic average of the rhythm raw-information to a geometrical average of the rhythm raw-information as significance measure.
  6. Apparatus according to claim 4 or 5, wherein the means for evaluating (110a, 110b) the quality is formed to evaluate the autocorrelation function merely within a tempo range extending from a minimum lag, corresponding to a maximum tempo, to a maximum lag, corresponding to a minimum tempo.
  7. Apparatus according to one of the previous claims, wherein means for establishing (114) comprises:
    means (114a) for deriving a weighting factor for a sub-band by using the significance measure for the sub-band;
    means (114b) for weighting the rhythm raw-information of the sub-band by using the weighting factor for the sub-band to obtain weighted rhythm raw-information for the sub-band, and for summing the weighted rhythm raw-information of the sub-band with weighted or unweighted rhythm raw-information of the other sub-band to obtain the rhythm information of the audio signal.
  8. Apparatus according to claim 7, wherein the means (114a) for deriving a weighting factor is disposed to derive a relative weighting factor for every sub-band signal, wherein a sum of the weighting factors for all sub-band signals equals 1.
  9. Apparatus according to claim 8, wherein the means (114a) for deriving a weighting factor is disposed to derive a weighting factor as the ratio of the significance measure of a sub-band signal to the sum of the significance measures of all sub-band signals.
  10. Apparatus according to claim 9, wherein the means (106a, 106b) for examining a sub-band signal is disposed to examine a sub-band signal whose length is greater than 10 seconds.
  11. Method for analyzing an audio signal with regard to rhythm information of the audio signal, comprising:
    dividing the audio signal into at least two sub-band signals (104a, 104b);
    examining (106a, 106b) a sub-band signal with regard to a periodicity in the sub-band signal to obtain rhythm raw-information (108a, 108b) for the sub-band signal;
    evaluating (110a, 110b) a quality of the periodicity of the rhythm raw-information (108a) of the sub-band signal (104a) to obtain a significance measure (112a) for the sub-band signal; and
    establishing the rhythm information of the audio signal under consideration of the significance measure (112a) of the sub-band signal and the rhythm raw-information (108a, 108b) of at least one sub-band signal.
HK04102850.1A 2001-05-14 2002-04-25 Device and method for analysing an audio signal in view of obtaining rhythm information HK1059959B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
DE10123366 2001-05-14
DE10123366A DE10123366C1 (en) 2001-05-14 2001-05-14 Device for analyzing an audio signal for rhythm information
PCT/EP2002/004618 WO2002093557A1 (en) 2001-05-14 2002-04-25 Device and method for analysing an audio signal in view of obtaining rhythm information

Publications (2)

Publication Number Publication Date
HK1059959A1 (en) 2004-07-23
HK1059959B (en) 2005-03-11


Similar Documents

Publication Publication Date Title
US7012183B2 (en) Apparatus for analyzing an audio signal with regard to rhythm information of the audio signal by using an autocorrelation function
KR101612768B1 (en) A System For Estimating A Perceptual Tempo And A Method Thereof
RU2743315C1 (en) Method of music classification and a method of detecting music beat parts, a data medium and a computer device
Tzanetakis et al. Audio analysis using the discrete wavelet transform
US20040068401A1 (en) Device and method for analysing an audio signal in view of obtaining rhythm information
US20080300702A1 (en) Music similarity systems and methods using descriptors
KR20130010118A (en) Apparatus and method for modifying an audio signal using envelope shaping
CN101499268A (en) Device and method and retrieval system for automatically generating music structural interface information
Alonso et al. Extracting note onsets from musical recordings
US20070180980A1 (en) Method and apparatus for estimating tempo based on inter-onset interval count
Elowsson et al. Modelling perception of speed in music audio
JP2005292207A (en) Method of music analysis
Brent Cepstral analysis tools for percussive timbre identification
HK1059959B (en) Device and method for analysing an audio signal in view of obtaining rhythm information
JP5540651B2 (en) Acoustic signal analysis apparatus, acoustic signal analysis method, and acoustic signal analysis program
Peiris et al. Supervised learning approach for classification of Sri Lankan music based on music structure similarity
Peiris et al. Musical genre classification of recorded songs based on music structure similarity
JP5359786B2 (en) Acoustic signal analysis apparatus, acoustic signal analysis method, and acoustic signal analysis program
Nagaraj et al. Toward automatic transcription-pitch tracking in polyphonic environment
Nóbrega et al. Detecting key features in popular music: case study-singing voice detection
Chaudhary et al. Listener evaluation of reduction strategies for sinusoidal and resonance models
Shi et al. Log-scale modulation frequency coefficient: A tempo feature for music emotion classification
Manzo-Martínez et al. Use of the entropy of a random process in audio matching tasks
Zeppelzauer Features for content-based audio retrieval
Szczepański et al. Music Information Retrieval. A case study of MIR in modern rock and metal music