
WO2012163013A1 - Music query method and apparatus - Google Patents

Info

Publication number
WO2012163013A1
WO2012163013A1 PCT/CN2011/080977 CN2011080977W WO2012163013A1 WO 2012163013 A1 WO2012163013 A1 WO 2012163013A1 CN 2011080977 W CN2011080977 W CN 2011080977W WO 2012163013 A1 WO2012163013 A1 WO 2012163013A1
Authority
WO
WIPO (PCT)
Prior art keywords
music
fingerprint
queried
segment
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2011/080977
Other languages
French (fr)
Chinese (zh)
Inventor
许洁萍
袁斌
崔建伟
王君
何山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to PCT/CN2011/080977 (WO2012163013A1)
Priority to CN201180002170.8A (CN103180847B)
Publication of WO2012163013A1
Legal status: Ceased

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H: ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00: Details of electrophonic musical instruments
    • G10H1/0008: Associated control or indicating means
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60: Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G: PHYSICS
    • G11: INFORMATION STORAGE
    • G11B: INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00: Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10: Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/19: Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
    • G11B27/28: Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H: ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2240/00: Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H2240/121: Musical libraries, i.e. musical databases indexed by musical parameters, wavetables, indexing schemes using musical parameters, musical rule bases or knowledge bases, e.g. for automatic composing methods
    • G10H2240/131: Library retrieval, i.e. searching a database or selecting a specific musical piece, segment, pattern, rule or parameter set
    • G10H2240/141: Library retrieval matching, i.e. any of the steps of matching an inputted segment or phrase with musical database contents, e.g. query by humming, singing or playing; the steps may include, e.g. musical analysis of the input, musical feature extraction, query formulation, or details of the retrieval process

Definitions

  • the present invention relates to the field of communications technologies, and in particular, to a music query method and apparatus.
  • A music fingerprint, referred to as an audio fingerprint, is defined as a sequence of audio-clip features that, after processing, characterizes the "identity" of a piece of music.
  • Music fingerprint recognition and retrieval differs significantly from traditional music retrieval based on metadata such as song title and singer.
  • A music fingerprint does not contain all the information in a piece of music, but it can identify that piece uniquely; that is, the desired music can be found in massive data by means of its music fingerprint.
  • Existing music query technologies generally impose specific requirements on the length and starting point of the song query segment, and their query efficiency is low.
  • Embodiments of the present invention provide a music query method and apparatus to improve the query efficiency of music.
  • An embodiment of the present invention provides a music query method, including:
  • Querying, according to the fingerprint features of the framed segments included in the music segment to be queried, the fingerprint features stored in the fingerprint database for a fingerprint feature matching that of the music segment to be queried, and returning the query result according to the degree of similarity between the fingerprint feature of the music segment to be queried and the fingerprint feature found by the query.
  • the embodiment of the invention further provides a music query device, including:
  • An intercepting module configured to intercept a music segment to be queried from a music file to be queried;
  • a framing module configured to frame the music segment to be queried;
  • An extracting module configured to extract the fingerprint features of the framed segments included in the music segment to be queried, to obtain the fingerprint feature of the music segment to be queried;
  • a querying module configured to query, according to the fingerprint features of the framed segments extracted by the extracting module, the fingerprint features stored in the fingerprint database for a fingerprint feature matching that of the music segment to be queried;
  • a returning module configured to return a query result according to the degree of similarity between the fingerprint feature of the music segment to be queried and the fingerprint feature found by the querying module.
  • In the embodiments, the music segment to be queried is first intercepted from the music file to be queried and framed; the fingerprint features of the framed segments included in the music segment are then extracted to obtain the fingerprint feature of the music segment to be queried. Finally, according to the fingerprint features of the framed segments, the fingerprint features stored in the fingerprint database are queried for a fingerprint feature matching that of the music segment to be queried, and the query result is returned according to the degree of similarity between the fingerprint feature of the music segment to be queried and the fingerprint feature found by the query.
  • The embodiments of the present invention impose no requirements on the length or starting point of the music query segment and can improve music query efficiency.
  • FIG. 1 is a flow chart of an embodiment of a music query method according to the present invention
  • FIG. 2 is a flow chart of an embodiment of a fingerprint feature extraction process of the present invention
  • FIG. 3 is a schematic diagram of an embodiment of extracting a spectrum envelope and a dimension reduction process according to the present invention
  • FIG. 4 is a schematic structural diagram of an embodiment of a music query apparatus according to the present invention
  • FIG. 5 is a schematic structural diagram of another embodiment of a music query apparatus according to the present invention.
  • FIG. 6 is a schematic structural diagram of an embodiment of a computer device according to the present invention.
  • the technical solutions in the embodiments of the present invention are clearly and completely described in the following with reference to the accompanying drawings in the embodiments of the present invention.
  • the embodiments are a part of the embodiments of the invention, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative efforts are within the scope of the present invention.
  • the music query method may include:
  • Step 101: Intercept the music segment to be queried from the music file to be queried, and frame the music segment to be queried.
  • Step 102: Extract the fingerprint features of the framed segments included in the music segment to be queried to obtain the fingerprint feature of the music segment to be queried.
  • Step 103: According to the fingerprint features of the framed segments included in the music segment to be queried, query the fingerprint features stored in the fingerprint database for a fingerprint feature matching that of the music segment to be queried, and return the query result according to the degree of similarity between the fingerprint feature of the music segment to be queried and the fingerprint feature found by the query.
  • The known music files need to be framed first; the fingerprint features of the framed segments included in each known music file are then extracted to obtain the fingerprint feature of the known music file; finally, the fingerprint features of the known music files are stored in the fingerprint database.
  • Extracting the fingerprint features of the framed segments included in the music segment to be queried may be: performing time-frequency conversion on the framed segments included in the music segment to be queried; taking the modulus of the frequency-domain data obtained by the time-frequency conversion; selecting, according to the auditory characteristics of the human ear, frequency-domain data on a predetermined frequency band from the frequency-domain data after the modulus is taken; extracting the spectrum envelope of the frequency-domain data on the predetermined frequency band; and performing dimensionality reduction on the feature matrix obtained after extracting the spectrum envelope, to obtain the fingerprint features of the framed segments included in the music segment to be queried.
  • Extracting the fingerprint features of the framed segments included in the known music file may be: performing time-frequency conversion on the framed segments included in the known music file; taking the modulus of the frequency-domain data obtained by the time-frequency conversion; selecting, according to the auditory characteristics of the human ear, frequency-domain data on a predetermined frequency band from the frequency-domain data after the modulus is taken; extracting the spectrum envelope of the frequency-domain data on the predetermined frequency band; and performing dimensionality reduction on the feature matrix obtained after extracting the spectrum envelope, to obtain the fingerprint features of the framed segments included in the known music file.
  • Step 103 may be: first, according to the fingerprint features of the framed segments included in the music segment to be queried, querying the fingerprint features stored in the fingerprint database for a fingerprint feature matching the fingerprint feature of a framed segment included in the music segment to be queried; second, according to the position, in the song to which it belongs, of the matched fingerprint feature stored in the fingerprint database, reading a predetermined number of fingerprint features from the fingerprint database starting at that position, where the predetermined number equals the number of framed segments included in the music segment to be queried; finally, comparing the degree of similarity between the predetermined number of fingerprint features and the fingerprint features of all the framed segments included in the music segment to be queried, and returning the query result according to the degree of similarity.
  • The above embodiment imposes no requirements on the length or starting point of the music query segment and can improve music query efficiency; even in a noisy environment, the music fingerprint query can be completed effectively and a matching result returned for the noisy song.
  • the music query method provided by the embodiment of the present invention is described in detail in the following three aspects: the music fingerprint extraction process, the fingerprint database establishment process, and the music fingerprint query process.
  • the music fingerprint extraction process may include: decoding, downsampling, and fingerprint feature extraction. The following are described separately:
  • Decoding process: because music files are generally encoded and compressed, a music file is first decoded into a waveform (Wave; hereinafter WAV) file before its fingerprint feature is extracted; the decoded music file has the same sampling rate as the original music.
  • The sampling rate of common music files is generally 44 kHz or 22 kHz.
  • Downsampling process: because the sampling rate of music files is generally high, the files include a large amount of high-frequency information, which makes music files harder to identify. The decoded music file therefore also needs to be downsampled,
  • that is, reduced from a higher sampling rate such as 44 kHz or 22 kHz to a lower one. In this embodiment, the decoded music files are uniformly reduced to a sampling rate of 5 kHz, and the downsampled music file is converted into a Pulse Code Modulation (hereinafter PCM) file.
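  • The downsampling step can be illustrated with a minimal sketch. The text does not say how the rate reduction is performed, so the block-averaging approach below, the function name `downsample`, and its parameters are assumptions chosen for brevity; a production system would more likely use a proper anti-aliasing resampler.

```python
def downsample(samples, src_rate, dst_rate=5000):
    """Reduce the sampling rate by averaging the source samples that
    fall into each output sample's time slot (a crude low-pass filter).

    Assumption: the patent text only states the 5 kHz target rate;
    the averaging method here is illustrative, not the patented one.
    """
    if dst_rate >= src_rate:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        start = int(i * src_rate / dst_rate)
        end = max(start + 1, int((i + 1) * src_rate / dst_rate))
        block = samples[start:end]
        out.append(sum(block) / len(block))
    return out


if __name__ == "__main__":
    one_second = [0.0] * 44000          # one second of silence at 44 kHz
    print(len(downsample(one_second, 44000)))
```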
  • FIG. 2 is a flowchart of an embodiment of the fingerprint feature extraction process of the present invention, including:
  • Step 201 Perform framing on the music file after the downsampling process.
  • the process of framing is implemented by windowing.
  • Hanning window can be used for framing, and the window size of Hanning window is 2048 points.
  • Step 202: Perform time-frequency conversion on the framed segments obtained by framing.
  • The framed segments may be time-frequency converted in a variety of ways, and this embodiment does not limit the implementation of the time-frequency conversion.
  • For example, a Fast Fourier Transform (hereinafter FFT) may be used.
  • The data repetition rate of two adjacent frames is 31/32; that is, relative to the previous frame, the next frame passed to the FFT contains approximately 60 new PCM samples.
  • the value obtained in step 202 is a complex number.
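  • Steps 201 and 202 describe overlapping frames cut with a 2048-point Hanning window. The sketch below shows one way to produce such frames; the names `hann` and `frames` are hypothetical. With a 31/32 overlap the exact hop is 2048/32 = 64 samples, which is consistent with the "approximately 60" new samples mentioned above. The FFT itself is omitted; each yielded frame would be passed to an FFT routine and the modulus taken with `abs()`.

```python
import math

WINDOW = 2048            # Hanning window size given in the text
HOP = WINDOW // 32       # 31/32 overlap leaves 64 new samples per frame


def hann(n):
    """Symmetric Hanning window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def frames(pcm, window=WINDOW, hop=HOP):
    """Yield windowed frames of PCM samples in chronological order."""
    w = hann(window)
    for start in range(0, len(pcm) - window + 1, hop):
        yield [pcm[start + i] * w[i] for i in range(window)]


if __name__ == "__main__":
    pcm = [0.0] * 5000                  # one second at 5 kHz
    print(sum(1 for _ in frames(pcm)))  # number of frames produced
```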
  • Step 203: Take the modulus of the frequency-domain data obtained by the time-frequency conversion.
  • Step 204: According to the auditory characteristics of the human ear, select frequency-domain data on a predetermined frequency band from the frequency-domain data after the modulus is taken.
  • According to the auditory characteristics of the human ear, 33 sub-bands are selected from the frequency-domain data after the modulus is taken; the sub-bands span 0 to 2.5 kHz, and their bandwidths are distributed linearly in the logarithmic domain.
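  • A possible reading of the sub-band selection is sketched below. Logarithmically spaced band edges cannot start at exactly 0 Hz, so the lower edge `f_min=100.0` is an assumption, as are the function names and the choice to sum squared magnitudes within each band; the text only fixes the number of sub-bands (33) and the upper limit (2.5 kHz).

```python
def band_edges(n_bands=33, f_min=100.0, f_max=2500.0):
    """Return n_bands + 1 edges spaced evenly on a logarithmic scale.

    f_min is a hypothetical lower bound; the source only gives 0-2.5 kHz.
    """
    ratio = (f_max / f_min) ** (1.0 / n_bands)
    return [f_min * ratio ** k for k in range(n_bands + 1)]


def band_energies(magnitudes, sample_rate=5000, n_fft=2048):
    """Sum squared FFT magnitudes inside each sub-band.

    magnitudes: moduli of the first n_fft // 2 + 1 FFT bins of one frame.
    """
    edges = band_edges()
    energies = []
    for lo, hi in zip(edges, edges[1:]):
        lo_bin = int(lo * n_fft / sample_rate)
        hi_bin = max(lo_bin + 1, int(hi * n_fft / sample_rate))
        energies.append(sum(m * m for m in magnitudes[lo_bin:hi_bin]))
    return energies


if __name__ == "__main__":
    print(len(band_energies([1.0] * 1025)))
```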
  • Step 205 Extract a spectrum envelope of frequency domain data on the predetermined frequency band.
  • the method for extracting the spectrum envelope is not limited.
  • Extraction of the spectrum envelope of the frequency-domain data is described here using the wavelet transform as an example.
  • The wavelet transform is a transform localized in both space and frequency. Through operations such as scaling and translation it performs multi-scale refinement analysis of functions or signals, so that information can be extracted from signals effectively.
  • The standard Haar wavelet is used to analyze the frequency-domain data, and only the 300 largest wavelet coefficients (ranked by the absolute value of the spectral energy) are retained. All other coefficients are quantized as "00". Each of the 300 largest coefficients is quantized as "10" if it is positive and as "01" otherwise.
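  • The quantization rule above can be sketched as follows. For brevity the sketch applies a one-dimensional Haar decomposition with averaging normalization to a single vector; whether the patent applies the transform in one or two dimensions, and with which normalization, is not stated, so those choices and the names `haar_1d` and `quantize_top_k` are assumptions.

```python
def haar_1d(values):
    """Full Haar decomposition of a list whose length is a power of two.

    Uses pairwise averages and half-differences (an assumed normalization).
    """
    out = list(values)
    n = len(out)
    while n > 1:
        half = n // 2
        avg = [(out[2 * i] + out[2 * i + 1]) / 2 for i in range(half)]
        diff = [(out[2 * i] - out[2 * i + 1]) / 2 for i in range(half)]
        out[:n] = avg + diff
        n = half
    return out


def quantize_top_k(coeffs, k=300):
    """Keep the k largest-magnitude coefficients and emit two bits each:
    "10" for a kept positive value, "01" for any other kept value,
    "00" for every coefficient that is not kept."""
    ranked = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(ranked[:k])
    bits = []
    for i, c in enumerate(coeffs):
        if i not in keep:
            bits += [0, 0]
        elif c > 0:
            bits += [1, 0]
        else:
            bits += [0, 1]
    return bits


if __name__ == "__main__":
    print(quantize_top_k(haar_1d([4, 2, 6, 0]), k=2))
```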
  • Step 206: Perform dimensionality reduction on the feature matrix obtained after extracting the spectrum envelope, to obtain the fingerprint features of the framed segments included in the music file.
  • The result of the wavelet transform is a high-dimensional 0-1 feature matrix, so dimensionality reduction is required.
  • The minimum hash (MinHash) algorithm is used for the dimensionality reduction: the positions of each 0-1 feature matrix are randomly permuted P times, and the position of the first 1 is recorded each time. Generally, the probability that the first 1 appears after the 255th position is very small, so any such position is uniformly recorded as 255. The high-dimensional 0-1 feature matrix is thus compressed into a P-dimensional feature value, and each group of P integers in the range 0 to 255 is called a sub-fingerprint of the music fingerprint.
  • For example, P = 100 may be taken, so that a 100-dimensional vector of integers in the range 0 to 255 is obtained after dimensionality reduction; each such vector is one sub-fingerprint of the music fingerprint.
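  • The MinHash reduction described above can be sketched in a few lines. The fixed `seed` is an assumption added so that every frame is hashed with the same P permutations (otherwise two signatures could not be compared); the function name and parameter names are likewise hypothetical.

```python
import random


def minhash(bits, p=100, cap=255, seed=0):
    """Compress a 0-1 feature vector into p integers in the range 0..cap.

    For each of p pseudo-random permutations, record the position of the
    first 1; positions at or beyond cap are recorded as cap.
    """
    rng = random.Random(seed)           # same seed gives same permutations
    order = list(range(len(bits)))
    signature = []
    for _ in range(p):
        rng.shuffle(order)
        first = cap
        for pos, idx in enumerate(order[:cap]):
            if bits[idx]:
                first = pos
                break
        signature.append(first)
    return signature


if __name__ == "__main__":
    features = [1 if i % 7 == 0 else 0 for i in range(2048)]
    sig = minhash(features)
    print(len(sig), min(sig), max(sig))
```

  • Because each value fits in one byte, a signature with P = 100 occupies 100 bytes per frame.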
  • FIG. 3 is a schematic diagram of an embodiment of extracting a spectral envelope and a dimensionality reduction process according to the present invention.
  • PCM data is read in chronological order; each frame reads in 60 new PCM samples relative to the previous frame, and this process continues until the end of the PCM data is reached.
  • The spectrum envelope of each frame of PCM data is extracted by the method of step 205, and the feature matrix obtained by extracting the spectrum envelope is then dimension-reduced by the method of step 206, to obtain the fingerprint features of the framed segments included in the music file.
  • The fingerprint feature of a framed segment is referred to as a sub-fingerprint, and the fingerprint feature of a music file is referred to as a music fingerprint.
  • A music fingerprint is a sequence of sub-fingerprints, and the order of the sub-fingerprints in the sequence reflects the temporal order of the corresponding framed segments.
  • Each entry in the index table stores a sub-fingerprint, the identifier of that sub-fingerprint in the fingerprint database, and the specific time position of the sub-fingerprint in the song to which it belongs.
  • Each entry in the fingerprint table stores the music fingerprint of one song, that is, all the sub-fingerprints contained in the song.
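  • The two tables described above map naturally onto dictionaries. The class below is a minimal in-memory sketch; the class name, the use of the song identifier as the database identifier, and exact-match keys are assumptions (the text elsewhere mentions locality-sensitive hashing, which this sketch does not implement).

```python
from collections import defaultdict


class FingerprintDatabase:
    """In-memory sketch of the per-song table and the index table."""

    def __init__(self):
        # song identifier -> ordered list of per-frame signatures
        self.fingerprint_table = {}
        # per-frame signature -> list of (song identifier, frame position)
        self.index_table = defaultdict(list)

    def add_song(self, song_id, signatures):
        """Store a known song and index each of its frames by position."""
        subs = [tuple(s) for s in signatures]
        self.fingerprint_table[song_id] = subs
        for position, sub in enumerate(subs):
            self.index_table[sub].append((song_id, position))


if __name__ == "__main__":
    db = FingerprintDatabase()
    db.add_song("song-1", [[1, 2], [3, 4], [1, 2]])
    print(db.index_table[(1, 2)])
```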
  • In the music fingerprint query process, the fingerprint feature of the music segment to be queried is first extracted by the method described in the music fingerprint extraction process. According to the fingerprint features of the framed segments included in the music segment to be queried, the index table is then searched for a fingerprint feature matching the fingerprint feature of a framed segment.
  • Next, using the identifier of the matched fingerprint feature saved in the index table, the entry corresponding to the matched fingerprint feature is found in the fingerprint table, and a predetermined number of sub-fingerprints is read from that entry starting at the position of the matched fingerprint feature, where the predetermined number equals the number of framed segments included in the music segment to be queried. Finally, the degree of similarity between the predetermined number of sub-fingerprints read and the fingerprint features of all the framed segments included in the music segment to be queried is compared, and a query result list is returned according to the degree of similarity.
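  • The lookup-then-compare flow above can be sketched with plain dictionaries. The similarity measure (fraction of equal signature values), the alignment of a hit by the query frame's offset, and the names `similarity` and `query` are assumptions for illustration; the text does not define how the degree of similarity is computed.

```python
def similarity(a, b):
    """Fraction of signature values that agree between two equally long
    sequences of per-frame signatures."""
    total = sum(len(x) for x in a)
    same = sum(1 for x, y in zip(a, b) for u, v in zip(x, y) if u == v)
    return same / total if total else 0.0


def query(query_subs, index_table, fingerprint_table):
    """Return (score, song_id, start_frame) tuples, best match first.

    index_table: signature -> list of (song_id, position)
    fingerprint_table: song_id -> ordered list of signatures
    """
    query_subs = [tuple(s) for s in query_subs]
    n = len(query_subs)
    scores = {}
    for offset, sub in enumerate(query_subs):
        for song_id, position in index_table.get(sub, []):
            start = position - offset   # align the hit with the query start
            if start < 0:
                continue
            candidate = fingerprint_table[song_id][start:start + n]
            if len(candidate) < n:
                continue
            key = (song_id, start)
            scores[key] = max(similarity(query_subs, candidate), scores.get(key, 0.0))
    return sorted(((s, song, start) for (song, start), s in scores.items()), reverse=True)


if __name__ == "__main__":
    table = {"a": [(1,), (2,), (3,), (4,)], "b": [(9,), (2,), (7,), (8,)]}
    index = {}
    for song, subs in table.items():
        for pos, sub in enumerate(subs):
            index.setdefault(sub, []).append((song, pos))
    print(query([(2,), (3,)], index, table))
```

  • Because candidates are located through the index rather than by scanning from the beginning of each song, the query segment may start anywhere in a song and have any length.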
  • the music query method provided by the embodiment of the present invention has the following advantages:
  • The music fingerprint has a large compression ratio, which can exceed 100 times, and is strongly representative.
  • The wavelet transform can be used to extract the characteristics of noise details, and the high-energy part of the spectrogram is hash-compressed, so that one frame of data is compressed from the original 8192 points to 100 bytes, that is, to a few hundredths of the original data; the feature compression ratio is therefore large and the feature is representative.
  • The music fingerprint design has a certain degree of noise resistance.
  • The feature matrix is processed by the minimum hash algorithm, which reduces features of up to 8192 dimensions so that feature similarity can be computed by a simple comparison.
  • Locality-sensitive hashing is introduced in consideration of the variation of the local features of music; it is widely applicable and greatly reduces the range of candidate music fingerprints. Because noise tolerance is taken into account in the music fingerprint extraction stage, no dedicated denoising system is included, and the resulting music fingerprint still has a certain degree of noise resistance.
  • The embodiment of the present invention can return a query result list according to the degree of similarity.
  • Embodiments of the present invention may also compare the similarity of similar pieces of music and measure their overlapping portions.
  • Because the music fingerprints extracted by the embodiment of the present invention are sequential in time, the source of two segments and their positions in the songs to which they belong are easily determined, from which the similarity of two similar music segments and the proportion of their overlap can be judged.
  • The ordering of the fingerprint data in the fingerprint database and the efficiency of the query make it possible to meet such requirements.
  • the music query apparatus may include: an intercepting module 41, a framing module 42, an extracting module 43, a querying module 44, and a returning module 45.
  • the intercepting module 41 is configured to intercept the music segment to be queried from the music file to be queried;
  • the framing module 42 is configured to frame the music segment to be queried;
  • the extracting module 43 is configured to extract a fingerprint feature of the segmented segment included in the music segment to be queried to obtain a fingerprint feature of the music segment to be queried;
  • the querying module 44 is configured to query, according to the fingerprint feature of the frame segment included in the music segment to be queried extracted by the extraction module 43 , a fingerprint feature matching the fingerprint feature of the music segment to be queried in the fingerprint feature stored in the fingerprint database;
  • the returning module 45 is configured to return the query result according to the degree of similarity between the fingerprint feature of the music piece to be queried and the fingerprint feature queried by the query module 44.
  • The music query apparatus imposes no requirements on the length or starting point of the music query segment and can improve music query efficiency; even in a noisy environment, the music fingerprint query can be completed effectively and a matching result returned for the noisy song.
  • FIG. 5 is a schematic structural diagram of another embodiment of the music query apparatus of the present invention. As shown in FIG. 5, the music query apparatus may further include a storage module 46;
  • the framing module 42 is further configured to frame the known music file;
  • the extracting module 43 is further configured to extract the fingerprint features of the framed segments included in the known music file to obtain the fingerprint feature of the known music file;
  • the storage module 46 is configured to store the fingerprint feature of the known music file obtained by the extracting module 43 into the fingerprint database.
  • the extracting module 43 may include: a conversion submodule 431, a modulus submodule 432, a selection submodule 433, an envelope extraction submodule 434, and a dimension reduction submodule 435.
  • the conversion submodule 431 is configured to perform time-frequency conversion on the framed segments included in the music segment to be queried;
  • the modulus submodule 432 is configured to take the modulus of the frequency-domain data obtained by the time-frequency conversion;
  • the selection submodule 433 is configured to select, according to the auditory characteristics of the human ear, frequency-domain data on a predetermined frequency band from the frequency-domain data after the modulus is taken by the modulus submodule 432;
  • the envelope extraction submodule 434 is configured to extract the spectrum envelope of the frequency-domain data on the predetermined frequency band;
  • the dimension reduction submodule 435 is configured to perform dimensionality reduction on the feature matrix obtained after the envelope extraction submodule 434 extracts the spectrum envelope, to obtain the fingerprint features of the framed segments included in the music segment to be queried.
  • the query module 44 may include: a feature query sub-module 441 and a feature reading sub-module 442;
  • the feature query sub-module 441 is configured to query, according to the fingerprint features of the framed segments included in the music segment to be queried, the fingerprint features stored in the fingerprint database for a fingerprint feature matching the fingerprint feature of a framed segment included in the music segment to be queried;
  • the feature reading sub-module 442 is configured to read, according to the position in the song to which it belongs of the matched fingerprint feature stored in the fingerprint database, a predetermined number of fingerprint features from the fingerprint database starting at that position, where the predetermined number equals the number of framed segments included in the music segment to be queried.
  • the returning module 45 may compare the degree of similarity between the predetermined number of fingerprint features read by the feature reading sub-module 442 and the fingerprint features of all the framed segments included in the music segment to be queried, and return the query result according to the degree of similarity.
  • The music query apparatus imposes no requirements on the length or starting point of the music query segment and can improve music query efficiency; even in a noisy environment, the music fingerprint query can be completed effectively and a matching result returned for the noisy song.
  • FIG. 6 is a schematic structural diagram of an embodiment of a computer device according to the present invention.
  • The computer device in this embodiment can implement the functions of the music query apparatus in the embodiment shown in FIG. 4 or FIG. 5. As shown in FIG. 6,
  • the computer device may include: a central processing unit (hereinafter CPU) 61, bus control logic 62, a system bus 63, a memory 64, an interface 65, and an input/output (I/O) subsystem 66.
  • the I/O subsystem 66 includes an I/O device 661 and a memory 662.
  • The CPU 61 is configured to intercept a music segment to be queried from the music file to be queried; frame the music segment to be queried; extract the fingerprint features of the framed segments included in the music segment to be queried to obtain the fingerprint feature of the music segment to be queried; query, according to the extracted fingerprint features of the framed segments, the fingerprint features stored in the fingerprint database for a fingerprint feature matching that of the music segment to be queried; and return the query result according to the degree of similarity between the fingerprint feature of the music segment to be queried and the fingerprint feature found by the query. The CPU 61 in this embodiment can implement the functions of the intercepting module 41, the framing module 42, the extracting module 43, and the querying module 44 in the embodiment shown in FIG. 4 or FIG. 5.
  • the fingerprint database is stored in the memory 662.
  • The CPU 61 returns the query result as follows: the CPU 61 sends the query result to the bus control logic 62, which passes the query result through the system bus 63 and the interface 65 to the I/O device 661.
  • The query result may be cached in the memory 64 first. That is, in this embodiment, the CPU 61, the bus control logic 62, the system bus 63, the memory 64, the interface 65, and the I/O device 661 together implement the functions of the returning module 45 of the embodiment shown in FIG. 4 or FIG. 5 of the present invention.
  • The CPU 61 may further frame the known music files and extract the fingerprint features of the framed segments included in the known music files to obtain the fingerprint features of the known music files.
  • The memory 662 is configured to save the fingerprint database and to store the fingerprint features of the known music files obtained by the CPU 61 into the fingerprint database.
  • The memory 662 in this embodiment can implement the functions of the storage module 46 in the embodiment shown in FIG. 5 of the present invention.
  • The above computer device imposes no requirements on the length or starting point of the music query segment and can improve music query efficiency; even in a noisy environment, the music fingerprint query can be completed effectively and a matching result returned for the noisy song.
  • The modules of the apparatus in an embodiment may be distributed in the apparatus as described in the embodiment, or may be changed correspondingly and located in one or more apparatuses different from that of the embodiment.
  • the modules of the above embodiments may be combined into one module, or may be further split into a plurality of sub-modules.

Landscapes

  • Engineering & Computer Science (AREA)
  • Library & Information Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Collating Specific Patterns (AREA)

Abstract

Provided are a music query method and apparatus. The music query method comprises: extracting a music segment to be queried from a music file to be queried, and framing the music segment to be queried; extracting the fingerprint characteristics of a frame segment contained in the music segment to be queried in order to obtain the fingerprint characteristics of the music segment to be queried; according to the fingerprint characteristics of the frame segment contained in the music segment to be queried, querying, in the fingerprint characteristics stored in a fingerprint database, the fingerprint characteristics matched with the fingerprint characteristics of the music segment to be queried, and returning a query result according to the degree of similarity between the fingerprint characteristics of the music segment to be queried and the queried fingerprint characteristics. The present invention has no requirements for the length and the starting point of the music segment to be queried, thereby improving the music query efficiency.

Description

Music query method and apparatus

Technical field

Embodiments of the present invention relate to the field of communications technologies, and in particular, to a music query method and apparatus.

Background

With the rapid development of the Internet and digital multimedia, fast and effective music retrieval has become an active research topic. Traditional text-based retrieval can only retrieve files that carry annotation information, whereas content-based retrieval does not rely on annotations and instead searches using a segment or part of the music itself, which makes it a challenging research problem. As content-based music retrieval develops and is put into practice, it will make retrieval far more convenient for music listeners.

A music fingerprint (Audio Fingerprint) is defined as a feature sequence, obtained by processing an audio segment, that characterizes the "identity" of a piece of music. Fingerprint recognition and retrieval differ markedly from traditional music retrieval based on metadata such as song title and singer. A fingerprint does not contain all the information in a piece of music, but it can be used to identify that piece uniquely; that is, the desired music can be found in massive data by means of its fingerprint.

Existing music query technologies generally impose specific requirements on the length, starting point, and so on of the song query segment, and their query efficiency is low.

Summary of the invention

Embodiments of the present invention provide a music query method and apparatus to improve music query efficiency.

An embodiment of the present invention provides a music query method, including:

extracting a music segment to be queried from a music file to be queried, and framing the music segment to be queried;

extracting fingerprint features of the frame segments contained in the music segment to be queried, to obtain the fingerprint feature of the music segment to be queried;

querying, according to the fingerprint features of the frame segments contained in the music segment to be queried, the fingerprint features stored in a fingerprint database for a fingerprint feature that matches the fingerprint feature of the music segment to be queried, and returning a query result according to the degree of similarity between the fingerprint feature of the music segment to be queried and the fingerprint feature found.

An embodiment of the present invention further provides a music query apparatus, including:

an intercepting module, configured to extract a music segment to be queried from a music file to be queried;

a framing module, configured to frame the music segment to be queried;

an extraction module, configured to extract fingerprint features of the frame segments contained in the music segment to be queried, to obtain the fingerprint feature of the music segment to be queried;

a query module, configured to query, according to the fingerprint features of the frame segments extracted by the extraction module, the fingerprint features stored in a fingerprint database for a fingerprint feature that matches the fingerprint feature of the music segment to be queried;

a return module, configured to return a query result according to the degree of similarity between the fingerprint feature of the music segment to be queried and the fingerprint feature found by the query module.

In the embodiments of the present invention, a music segment to be queried is first extracted from a music file to be queried and framed; the fingerprint features of the frame segments contained in the music segment are then extracted to obtain the fingerprint feature of the music segment; finally, according to the fingerprint features of those frame segments, the fingerprint features stored in a fingerprint database are queried for a fingerprint feature that matches the fingerprint feature of the music segment, and a query result is returned according to the degree of similarity between the fingerprint feature of the music segment and the fingerprint feature found. The embodiments of the present invention place no requirement on the length, starting point, and so on of the music query segment, which can improve music query efficiency.

Brief description of the drawings

The accompanying drawings needed to describe the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a flowchart of an embodiment of the music query method of the present invention;

FIG. 2 is a flowchart of an embodiment of the fingerprint feature extraction process of the present invention;

FIG. 3 is a schematic diagram of an embodiment of spectral envelope extraction and dimensionality reduction of the present invention;

FIG. 4 is a schematic structural diagram of an embodiment of the music query apparatus of the present invention;

FIG. 5 is a schematic structural diagram of another embodiment of the music query apparatus of the present invention;

FIG. 6 is a schematic structural diagram of an embodiment of the computer device of the present invention.

Detailed description

To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

FIG. 1 is a flowchart of an embodiment of the music query method of the present invention. As shown in FIG. 1, the music query method may include:

Step 101: Extract a music segment to be queried from a music file to be queried, and frame the music segment to be queried.

Step 102: Extract the fingerprint features of the frame segments contained in the music segment to be queried, to obtain the fingerprint feature of the music segment to be queried.

Step 103: According to the fingerprint features of the frame segments contained in the music segment to be queried, query the fingerprint features stored in the fingerprint database for a fingerprint feature that matches the fingerprint feature of the music segment to be queried, and return a query result according to the degree of similarity between the fingerprint feature of the music segment to be queried and the fingerprint feature found.

Further, before the fingerprint database is queried for a matching fingerprint feature, the known music files must first be framed; the fingerprint features of the frame segments contained in each known music file are then extracted to obtain the fingerprint feature of that known music file; finally, the fingerprint features of the known music files are stored in the fingerprint database.

In this embodiment, extracting the fingerprint features of the frame segments contained in the music segment to be queried may be: performing time-frequency conversion on the frame segments contained in the music segment to be queried, and taking the modulus of the frequency-domain data obtained by the conversion; selecting, according to the auditory characteristics of the human ear, the frequency-domain data in predetermined frequency bands from the frequency-domain data after the modulus is taken; extracting the spectral envelope of the frequency-domain data in the predetermined frequency bands; and performing dimensionality reduction on the feature matrix obtained after extracting the spectral envelope, to obtain the fingerprint features of the frame segments contained in the music segment to be queried.

In this embodiment, extracting the fingerprint features of the frame segments contained in a known music file may be: performing time-frequency conversion on the frame segments contained in the known music file, and taking the modulus of the frequency-domain data obtained by the conversion; selecting, according to the auditory characteristics of the human ear, the frequency-domain data in predetermined frequency bands from the frequency-domain data after the modulus is taken; extracting the spectral envelope of the frequency-domain data in the predetermined frequency bands; and performing dimensionality reduction on the feature matrix obtained after extracting the spectral envelope, to obtain the fingerprint features of the frame segments contained in the known music file.

In this embodiment, step 103 may be: first, according to the fingerprint features of the frame segments contained in the music segment to be queried, query the fingerprint features stored in the fingerprint database for a fingerprint feature that matches the fingerprint feature of a frame segment contained in the music segment to be queried; second, according to the position, stored in the fingerprint database, of the matching fingerprint feature within the song to which it belongs, read a predetermined number of fingerprint features from the fingerprint database starting at that position, where the predetermined number equals the number of frame segments contained in the music segment to be queried; finally, compare the predetermined number of fingerprint features with the fingerprint features of all the frame segments contained in the music segment to be queried, and return a query result according to their degree of similarity.

The above embodiment places no requirement on the length, starting point, and so on of the music query segment, which can improve music query efficiency; in a noisy environment it can still complete the music fingerprint query effectively and return noise-tolerant matching results for the song.

The music query method provided by the embodiments of the present invention is described in detail below in three parts: the music fingerprint extraction process, the fingerprint database establishment process, and the music fingerprint query process.

1. Music fingerprint extraction process.

In this embodiment, the music fingerprint extraction process may include decoding, downsampling, and fingerprint feature extraction, which are described in turn below:

(1) Decoding process: Because music files are generally encoded and compressed, a music file must first be decoded before its fingerprint features are extracted. The music file is decoded into a waveform (Wave; hereinafter: WAV) file. The decoded music file has the same sampling rate as the original music; the sampling rate of common music files is typically 44 kHz or 22 kHz.

(2) Downsampling process: Because the sampling rate of music files is generally high, the files contain a large amount of high-frequency information, which makes them harder to identify. The decoded music file therefore also needs to be downsampled, that is, reduced from a higher sampling rate such as 44 kHz or 22 kHz to a lower one. In this embodiment, decoded music files are uniformly downsampled to a 5 kHz sampling rate, and the downsampled music file is converted into a file in Pulse Code Modulation (hereinafter: PCM) format.
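The downsampling step can be illustrated with a minimal sketch. The function name `downsample`, the 44100 Hz source rate in the usage note, and the block-averaging filter are assumptions made for illustration only; the text does not state which resampling method is used, and a production system would normally apply a proper anti-aliasing low-pass filter instead.

```python
def downsample(samples, src_rate, dst_rate=5000):
    """Crude downsampler (illustrative assumption, not the patent's method).

    Each output sample is the average of the run of input samples that
    maps onto it, which acts as a weak low-pass filter before decimation.
    """
    if dst_rate > src_rate:
        raise ValueError("dst_rate must not exceed src_rate")
    n_out = len(samples) * dst_rate // src_rate
    out = []
    for i in range(n_out):
        start = i * src_rate // dst_rate
        end = max(start + 1, (i + 1) * src_rate // dst_rate)
        block = samples[start:end]
        out.append(sum(block) / len(block))
    return out
```

For example, one second of audio decoded at an assumed 44100 Hz would be reduced to 5000 samples by `downsample(pcm, 44100)`.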

(3) Fingerprint feature extraction process: After the decoding process and the downsampling process, the fingerprint feature extraction process is performed. FIG. 2 is a flowchart of an embodiment of the fingerprint feature extraction process of the present invention. As shown in FIG. 2, the process includes:

Step 201: Frame the downsampled music file.

In this embodiment, adjacent frames must partially overlap during framing in order to preserve the short-time stationarity of the signal. Specifically, framing is implemented by windowing; for example, a Hanning window with a length of 2048 points may be used.

Step 202: Perform time-frequency conversion on the frame segments obtained by framing.

Specifically, the time-frequency conversion of the frame segments may be performed in various ways, and this embodiment does not limit how it is implemented. This embodiment is described using a 2048-point Fast Fourier Transform (hereinafter: FFT) of each frame segment as an example. In this embodiment, the data overlap between two adjacent frames is 31/32, that is, each frame brings approximately 60 new PCM samples into the FFT relative to the previous frame. The values obtained in step 202 are complex numbers.
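The framing described in steps 201 and 202 can be sketched as follows. The names `hann`, `frames`, `FRAME_LEN`, and `HOP` are hypothetical. The hop of 64 samples is derived here from the stated 31/32 overlap (2048 / 32 = 64), which is consistent with the text's "approximately 60" new samples per frame; the symmetric form of the window is also an assumption.

```python
import math

FRAME_LEN = 2048        # window length given in the text
HOP = FRAME_LEN // 32   # 31/32 overlap leaves 64 new samples per frame

def hann(n):
    """Symmetric Hanning window of length n (assumed variant)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def frames(pcm, frame_len=FRAME_LEN, hop=HOP):
    """Yield windowed frames in time order.

    Trailing samples that do not fill a whole frame are dropped.
    """
    window = hann(frame_len)
    for start in range(0, len(pcm) - frame_len + 1, hop):
        yield [pcm[start + i] * window[i] for i in range(frame_len)]
```

At the 5 kHz sampling rate used in this embodiment, one 2048-sample frame spans roughly 0.41 s and successive frames start 12.8 ms apart.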

Step 203: Take the modulus of the frequency-domain data obtained by the time-frequency conversion.
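Steps 202 and 203 together amount to computing a magnitude spectrum. The sketch below uses a textbook recursive radix-2 FFT so that it has no external dependencies; the function names are hypothetical, and keeping only the non-redundant half of the spectrum is an assumption that holds for real-valued input.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def magnitude_spectrum(frame):
    """Modulus of bins 0 .. n/2 of the FFT of a real-valued frame."""
    spectrum = fft(frame)
    return [abs(c) for c in spectrum[: len(frame) // 2 + 1]]
```

With a 2048-point FFT at 5 kHz, the bins are about 2.44 Hz apart and bins 0 to 1024 cover 0 to 2.5 kHz, which matches the frequency range named in step 204.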

Step 204: According to the auditory characteristics of the human ear, select the frequency-domain data in predetermined frequency bands from the frequency-domain data after the modulus is taken.

In this embodiment, 33 subbands are selected from the frequency-domain data after the modulus is taken, according to the auditory characteristics of the human ear. The subbands span the frequency range 0 to 2.5 kHz, and their bandwidths are distributed linearly in the logarithmic domain.
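One way to read "bandwidths distributed linearly in the logarithmic domain" is that the band edges are geometrically spaced. The sketch below follows that reading. Because geometric spacing cannot start at 0 Hz, it assumes a hypothetical lower edge (50 Hz in the usage note), and it aggregates each band by summing bin magnitudes; neither choice is stated in the text.

```python
def log_band_edges(f_lo, f_hi, n_bands=33):
    """Return n_bands + 1 edges spaced geometrically between f_lo and f_hi (Hz)."""
    if not 0 < f_lo < f_hi:
        raise ValueError("need 0 < f_lo < f_hi")
    ratio = (f_hi / f_lo) ** (1.0 / n_bands)
    return [f_lo * ratio ** i for i in range(n_bands + 1)]

def band_sums(mags, sample_rate, edges):
    """Sum the magnitudes of the FFT bins that fall inside each band.

    mags holds bins 0 .. n_fft/2, so n_fft is recovered from its length.
    """
    n_fft = 2 * (len(mags) - 1)
    hz_per_bin = sample_rate / n_fft
    sums = []
    for lo, hi in zip(edges, edges[1:]):
        sums.append(sum(m for k, m in enumerate(mags) if lo <= k * hz_per_bin < hi))
    return sums
```

For instance, `log_band_edges(50.0, 2500.0)` gives 34 edges whose successive ratios are constant, and `band_sums(mags, 5000, edges)` reduces a 1025-bin magnitude spectrum to 33 values.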

Step 205: Extract the spectral envelope of the frequency-domain data in the predetermined frequency bands.

Specifically, the spectral envelope may be extracted in various ways, and this embodiment does not limit the manner used. This embodiment is described using a wavelet transform to extract the spectral envelope of the frequency-domain data as an example.

The wavelet transform is a local transform in space and frequency. Through operations such as dilation and translation it performs multi-scale refinement analysis of a function or signal, and can therefore extract information from the signal effectively. This embodiment analyzes the frequency-domain data with the standard Haar wavelet and retains only the 300 largest wavelet coefficients (ranked by absolute value of spectral energy); every coefficient that is not among the 300 largest is quantized as "00". Each of the 300 largest coefficients is quantized as "10" if it is positive, and as "01" otherwise.
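The Haar decomposition and the two-bit quantization can be sketched as below. This is a simplified illustration under stated assumptions: the transform is applied to a one-dimensional, power-of-two-length vector using the unnormalized averaging form of the Haar wavelet, whereas the text does not say whether the transform is one- or two-dimensional or how it is normalized. The function names are hypothetical.

```python
def haar_1d(values):
    """Full Haar decomposition (averaging form); length must be a power of two."""
    data = list(values)
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    while n > 1:
        half = n // 2
        avg = [(data[2 * i] + data[2 * i + 1]) / 2.0 for i in range(half)]
        diff = [(data[2 * i] - data[2 * i + 1]) / 2.0 for i in range(half)]
        data[:n] = avg + diff   # averages first, then detail coefficients
        n = half
    return data

def quantize_top_k(coeffs, k=300):
    """Keep the k largest-magnitude coefficients as sign bits, zero the rest.

    Two bits per coefficient: positive -> 1,0; otherwise -> 0,1; dropped -> 0,0.
    """
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(order[:k])
    bits = []
    for i, c in enumerate(coeffs):
        if i not in keep:
            bits += [0, 0]
        elif c > 0:
            bits += [1, 0]
        else:
            bits += [0, 1]
    return bits
```

As a small worked case, `haar_1d([4, 2, 5, 5])` yields `[4.0, -1.0, 1.0, 0.0]`, and keeping the two largest coefficients gives the bit list `[1, 0, 0, 1, 0, 0, 0, 0]`.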

Step 206: Perform dimensionality reduction on the feature matrix obtained after extracting the spectral envelope, to obtain the fingerprint features of the frame segments contained in the music file.

Because the wavelet transform yields a high-dimensional 0-1 feature matrix, dimensionality reduction is required. This embodiment uses the minimum hash (MinHash) algorithm for dimensionality reduction: the positions of each 0-1 feature matrix are randomly permuted P times, and the position of the first 1 is recorded each time. In general, the probability that the first 1 appears after position 255 is very small, so any position beyond 255 is recorded as 255. In this way the high-dimensional 0-1 feature matrix is compressed into a P-dimensional feature value, and each group of P integers in the range 0 to 255 is called a sub-fingerprint of the fingerprint. In an actual implementation, P = 100 may be used, so that dimensionality reduction yields 100 integers in the range 0 to 255; in this embodiment, each such group of 100 integers is called a sub-fingerprint of the fingerprint.
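The MinHash reduction can be sketched as follows. The function name and the use of a seeded pseudo-random generator are assumptions for illustration; the essential point is that every fingerprint, whether stored or queried, must be hashed with the same P permutations for the resulting values to be comparable.

```python
import random

def minhash_signature(bits, p=100, cap=255, seed=0):
    """Return p integers in 0..cap.

    For each of p fixed pseudo-random permutations, record the position of
    the first 1 in the permuted bit vector; positions at or beyond cap, and
    vectors with no 1 at all, are recorded as cap.
    """
    rng = random.Random(seed)   # fixed seed gives the same permutations on every call
    signature = []
    for _ in range(p):
        perm = list(range(len(bits)))
        rng.shuffle(perm)
        first = cap
        for pos, src in enumerate(perm[:cap]):
            if bits[src]:
                first = pos
                break
        signature.append(first)
    return signature
```

Each value fits in one byte, so a sub-fingerprint with P = 100 occupies 100 bytes, in line with the figure given later in the text.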

FIG. 3 is a schematic diagram of an embodiment of spectral envelope extraction and dimensionality reduction of the present invention. As shown in FIG. 3, this embodiment reads in PCM data in chronological order; each frame reads in 60 more PCM samples than the previous frame, and this process repeats until the end of the PCM data is reached. The spectral envelope of each frame of PCM data is then extracted by the method of step 205, and the feature matrix obtained after extracting the spectral envelope is reduced in dimension by the method of step 206 to obtain the fingerprint features of the frame segments contained in the music file. Hereinafter, the fingerprint feature of a frame segment is called a sub-fingerprint, and the fingerprint feature of the music file is called a fingerprint. As can be seen from FIG. 3, a fingerprint is a sequence of sub-fingerprints, and the order of the sub-fingerprints in the sequence reflects the temporal order of the corresponding frame segments.

2. Fingerprint database establishment process.

(1) Extract the fingerprint features of the known music files to be stored, using the method provided in the music fingerprint extraction process above.

(2) The fingerprint database holds two data tables: an index table and a fingerprint table. Each entry in the index table stores a sub-fingerprint, the unique identifier of that sub-fingerprint in the fingerprint database, and the specific time position of that sub-fingerprint in the song to which it belongs. Each entry in the fingerprint table stores the fingerprint of one song, that is, all the sub-fingerprints contained in that song.

(3) Fingerprint storage: each sub-fingerprint of each song is entered into the index table in turn and assigned a unique identifier in the fingerprint database, and all the sub-fingerprints contained in each song are entered into the fingerprint table.
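The two-table layout can be illustrated with an in-memory sketch. The class name `FingerprintDB`, its attribute names, and the use of Python dictionaries keyed by the exact sub-fingerprint are assumptions; the text does not describe the storage engine, and it mentions locality-sensitive hashing elsewhere, which this exact-match index does not implement.

```python
class FingerprintDB:
    """In-memory stand-in for the index table and the fingerprint table."""

    def __init__(self):
        # Index table: sub-fingerprint -> list of (entry_id, song_id, position).
        self.index = {}
        # Fingerprint table: song_id -> all sub-fingerprints in time order.
        self.fingerprints = {}
        self._next_id = 0

    def add_song(self, song_id, sub_fingerprints):
        """Store one song's fingerprint and index each of its sub-fingerprints."""
        subs = [tuple(s) for s in sub_fingerprints]
        self.fingerprints[song_id] = subs
        for position, sub in enumerate(subs):
            entry = (self._next_id, song_id, position)
            self.index.setdefault(sub, []).append(entry)
            self._next_id += 1
```

Here `position` is the frame index of the sub-fingerprint within its song, which stands in for the "specific time position" that the index table records.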

3. Music fingerprint query process.

In the embodiments of the present invention, the music fingerprint query process first extracts the fingerprint feature of the music segment to be queried using the method provided in the music fingerprint extraction process above. Then, for the fingerprint features of the frame segments contained in the music segment to be queried, the index table is queried for fingerprint features that match them. Using the identifier of each matching fingerprint feature stored in the index table, the corresponding entry in the fingerprint table is located. Next, according to the position, stored in the index table, of the matching fingerprint feature within the song to which it belongs, a predetermined number of sub-fingerprints is read from that entry of the fingerprint table starting at that position, where the predetermined number equals the number of frame segments contained in the music segment to be queried. Finally, the predetermined number of sub-fingerprints read is compared with the fingerprint features of all the frame segments contained in the music segment to be queried, and a list of query results is returned according to their degree of similarity.
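The query process above can be sketched as follows. All names are hypothetical, and several details are assumptions that the text leaves open: the index is a plain exact-match dictionary, the similarity of two sub-fingerprints is the fraction of positions at which they agree, and a hit on the query's k-th frame is shifted back by k frames so that the window read from the song aligns with the start of the query.

```python
def similarity(a, b):
    """Fraction of positions at which two equal-length sub-fingerprints agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def query(index, fingerprints, query_subs):
    """Return [(score, song_id, start)] sorted by descending score.

    index:        sub-fingerprint -> list of (song_id, position)
    fingerprints: song_id -> list of sub-fingerprints in time order
    """
    query_subs = [tuple(s) for s in query_subs]
    n = len(query_subs)
    candidates = set()
    for offset, sub in enumerate(query_subs):
        for song_id, position in index.get(sub, []):
            start = position - offset   # align the query's first frame
            if start >= 0:
                candidates.add((song_id, start))
    results = []
    for song_id, start in candidates:
        window = fingerprints[song_id][start:start + n]
        if len(window) < n:
            continue                    # song ends before the query does
        score = sum(similarity(q, w) for q, w in zip(query_subs, window)) / n
        results.append((score, song_id, start))
    results.sort(key=lambda r: (-r[0], r[1], r[2]))
    return results
```

Because candidates are located through individual sub-fingerprints and then verified over a window as long as the query, the sketch works for a query of any length starting anywhere in a song, which is the property the embodiment emphasizes.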

In summary, the music query method provided by the embodiments of the present invention has the following advantages:

1. The fingerprint achieves a compression ratio of more than one hundred times; the compression ratio is high and the fingerprint is strongly representative.

In a noisy environment, the wavelet transform's ability to discard noise detail is exploited, and the high-energy part of the spectrogram is compressed by hashing, so that one frame of data is compressed from the original 8192 points to 100 bytes and the music data shrinks to a few hundredths of the original data. The feature compression ratio is therefore high and the features are strongly representative.

2. The fingerprint design has a certain degree of noise resistance.

A wavelet transform is applied on top of the extracted energy spectrum, and only the 300 largest wavelet coefficients, whose amplitudes are most prominent, are kept, which avoids some of the influence of noise. The feature matrix is processed with the MinHash algorithm, which reduces the dimension of features as long as 8192 dimensions and allows similarity to be obtained through a very simple comparison. When the database linked list is built, locality-sensitive hashing is introduced to account for variation in the local features of the fingerprint; it is widely applicable and greatly narrows the range of candidate fingerprints to search. Because a degree of noise tolerance is already taken into account at the fingerprint extraction stage, the final fingerprint has a certain degree of noise resistance even though no dedicated denoising system is included.

3. Different queries can be performed for different query requirements.

With practical applications in mind, the embodiments of the present invention can return a list of query results ordered by degree of similarity.

4. The embodiments of the present invention can also compare the similarity of similar music segments and measure their overlapping parts.

Because the fingerprints extracted by the embodiments of the present invention are ordered in time, the source of two segments and their positions in the songs to which they belong can be determined easily, from which the similarity of two similar music segments and the proportion of their overlap can be judged. The ordered storage of fingerprints in the fingerprint database and the efficiency of the query make this kind of requirement practical to meet.

A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be implemented by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.

FIG. 4 is a schematic structural diagram of an embodiment of the music query apparatus of the present invention. As shown in FIG. 4, the music query apparatus may include: an intercepting module 41, a framing module 42, an extraction module 43, a query module 44, and a return module 45.

The intercepting module 41 is configured to extract a music segment to be queried from a music file to be queried;

the framing module 42 is configured to frame the music segment to be queried;

the extraction module 43 is configured to extract the fingerprint features of the frame segments contained in the music segment to be queried, to obtain the fingerprint feature of the music segment to be queried;

the query module 44 is configured to query, according to the fingerprint features of the frame segments extracted by the extraction module 43, the fingerprint features stored in the fingerprint database for a fingerprint feature that matches the fingerprint feature of the music segment to be queried;

the return module 45 is configured to return a query result according to the degree of similarity between the fingerprint feature of the music segment to be queried and the fingerprint feature found by the query module 44.

The above music query apparatus places no requirement on the length, starting point, and so on of the music query segment, which can improve music query efficiency; in a noisy environment it can still complete the music fingerprint query effectively and return noise-tolerant matching results for the song.

FIG. 5 is a schematic structural diagram of another embodiment of the music query apparatus of the present invention. As shown in FIG. 5, the music query apparatus may further include a storage module 46.

In this embodiment, the framing module 42 is further configured to frame the known music files;

the extraction module 43 is further configured to extract the fingerprint features of the frame segments contained in a known music file, to obtain the fingerprint feature of the known music file;

the storage module 46 is configured to store the fingerprint feature of the known music file obtained by the extraction module 43 in the fingerprint database.

具体地, 提取模块 43可以包括: 转换子模块 43 1、 求模子模块 432、 选择子模块 433、 包络提取子模块 434和降维子模块 435。  Specifically, the extraction module 43 may include: a conversion submodule 43 1. a module submodule 432, a selection submodule 433, an envelope extraction submodule 434, and a dimension reduction submodule 435.

The conversion submodule 431 is configured to perform time-frequency conversion on the frame segments included in the music segment to be queried;

The modulus submodule 432 is configured to take the modulus of the frequency-domain data obtained by the time-frequency conversion;

The selection submodule 433 is configured to select, according to the auditory characteristics of the human ear, the frequency-domain data in a predetermined frequency band from the frequency-domain data obtained by the modulus submodule 432;

The envelope extraction submodule 434 is configured to extract the spectral envelope of the frequency-domain data in the predetermined frequency band;

The dimension reduction submodule 435 is configured to perform dimension reduction on the feature matrix obtained after the envelope extraction submodule 434 extracts the spectral envelope, thereby obtaining the fingerprint features of the frame segments included in the music segment to be queried.
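The five submodules above form a pipeline, which can be sketched in Python for a single frame as follows. This is only a sketch under stated assumptions: the 300 to 2000 Hz band, the per-sub-band maximum used as a coarse spectral envelope, and the group averaging used as a stand-in for dimension reduction are all hypothetical choices, as are the names `magnitude_spectrum`, `split_evenly`, and `frame_fingerprint`. The text above does not state which band, envelope estimator, or reduction technique is used.

```python
import cmath
import math
from typing import List, Sequence


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Time-frequency conversion (naive DFT) followed by taking the modulus."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def split_evenly(values: Sequence[float], count: int) -> List[Sequence[float]]:
    """Split values into `count` contiguous, nearly equal, non-empty groups."""
    if count <= 0 or len(values) < count:
        raise ValueError("need at least one value per group")
    size = len(values)
    return [values[i * size // count:(i + 1) * size // count] for i in range(count)]


def frame_fingerprint(
    frame: Sequence[float],
    sample_rate: float,
    low_hz: float = 300.0,   # assumed band edges, not taken from the source
    high_hz: float = 2000.0,
    n_subbands: int = 8,     # assumed envelope resolution
    out_dims: int = 4,       # assumed fingerprint length
) -> List[float]:
    if not frame:
        raise ValueError("frame must not be empty")
    spectrum = magnitude_spectrum(frame)
    bin_hz = sample_rate / len(frame)
    # Band selection: keep bins whose frequency lies in the chosen band.
    band = [m for k, m in enumerate(spectrum) if low_hz <= k * bin_hz <= high_hz]
    # Coarse spectral envelope: peak magnitude in each sub-band.
    envelope = [max(group) for group in split_evenly(band, n_subbands)]
    # Stand-in dimension reduction: average neighbouring envelope values.
    return [sum(group) / len(group) for group in split_evenly(envelope, out_dims)]


# A 1 kHz tone sampled at 8 kHz; its energy lands in one fingerprint dimension.
tone = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(64)]
fp = frame_fingerprint(tone, 8000)
```

A production implementation would normally use a windowed FFT rather than the naive DFT shown here, which is kept only so that the sketch has no external dependencies.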

Specifically, in this embodiment, the query module 44 may include a feature query submodule 441 and a feature reading submodule 442;

The feature query submodule 441 is configured to search the fingerprint features stored in the fingerprint database, based on the fingerprint features of the frame segments included in the music segment to be queried, for fingerprint features that match the fingerprint features of those frame segments;

The feature reading submodule 442 is configured to read a predetermined number of fingerprint features from the fingerprint database, starting from the position of the matched fingerprint feature within the song to which it belongs, as recorded in the fingerprint database; the predetermined number is equal to the number of frame segments included in the music segment to be queried.

In this embodiment, the return module 45 may compare the predetermined number of fingerprint features read by the feature reading submodule 442 with the fingerprint features of all the frame segments included in the music segment to be queried, and return the query result according to their degree of similarity.

The above music query apparatus imposes no requirements on the length or starting point of the music query segment, which can improve query efficiency; moreover, in a noisy environment it can still complete the music fingerprint query effectively and return noise-tolerant matching results for songs.
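The query flow carried out by submodules 441 and 442 and the return module 45 can be sketched as follows. This is a minimal sketch under several assumptions that are not in the text above: the database is an in-memory dictionary of per-song fingerprint lists, the first query frame is used as the anchor for the initial match, a match is declared when the Euclidean distance is within the hypothetical tolerance `match_tol`, and similarity is expressed as a mean distance (lower is more similar). The function names are likewise illustrative.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Fingerprint = Sequence[float]


def distance(a: Fingerprint, b: Fingerprint) -> float:
    """Euclidean distance between two fingerprints of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def query_database(
    database: Dict[str, List[Fingerprint]],
    query_fps: List[Fingerprint],
    match_tol: float = 0.5,  # assumed matching tolerance
) -> Optional[Tuple[str, int, float]]:
    """Return (song_id, position, mean_distance) of the best candidate, or None."""
    n = len(query_fps)
    if n == 0:
        return None
    best: Optional[Tuple[str, int, float]] = None
    for song_id, stored in database.items():
        for pos in range(len(stored) - n + 1):
            # Feature query: does the stored fingerprint match the anchor frame?
            if distance(stored[pos], query_fps[0]) > match_tol:
                continue
            # Feature reading: read as many fingerprints as the query has frames.
            window = stored[pos:pos + n]
            # Return step: compare the window with all query fingerprints.
            mean_dist = sum(distance(s, q) for s, q in zip(window, query_fps)) / n
            if best is None or mean_dist < best[2]:
                best = (song_id, pos, mean_dist)
    return best


db = {
    "a": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)],
    "b": [(5.0, 5.0), (6.0, 5.0), (7.0, 5.0)],
}
result = query_database(db, [(1.1, 0.0), (2.0, 0.1)])
```

The linear scan is used here only for brevity; a practical system would presumably index the stored fingerprints so that candidate positions can be found without visiting every frame. Because candidates are located by fingerprint match rather than by a fixed offset, the query segment may start anywhere in the song.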

FIG. 6 is a schematic structural diagram of an embodiment of the computer device of the present invention. The computer device in this embodiment can implement the functions of the music query apparatus in the embodiment shown in FIG. 4 or FIG. 5. As shown in FIG. 6, the computer device may include: a central processing unit (CPU) 61, bus control logic 62, a system bus 63, a memory 64, an interface 65, and an input/output (I/O) subsystem 66, where the I/O subsystem 66 includes an I/O device 661 and a storage 662.

In this embodiment, the CPU 61 is configured to: intercept a music segment to be queried from a music file to be queried; divide the music segment to be queried into frames; extract the fingerprint features of the frame segments included in the music segment to be queried, so as to obtain the fingerprint features of the music segment to be queried; search the fingerprint features stored in the fingerprint database, based on the extracted fingerprint features of the frame segments, for fingerprint features that match those of the music segment to be queried; and return a query result according to the degree of similarity between the fingerprint features of the music segment to be queried and the fingerprint features found. The CPU 61 in this embodiment can implement the functions of the interception module 41, the framing module 42, the extraction module 43, and the query module 44 in the embodiment shown in FIG. 4 or FIG. 5.

The fingerprint database is stored in the storage 662. Specifically, the CPU 61 may return the query result as follows: the CPU 61 sends the query result to the bus control logic 62, which forwards it through the system bus 63 and the interface 65 to the I/O device 661, and the I/O device 661 sends the query result out. In addition, before the I/O device 661 sends the query result, the result may first be buffered in the memory 64. That is, in this embodiment, the CPU 61, the bus control logic 62, the system bus 63, the memory 64, the interface 65, and the I/O device 661 together perform the function of the return module 45 in the embodiment shown in FIG. 4 or FIG. 5.

Further, the CPU 61 may also divide a known music file into frames and extract the fingerprint features of the frame segments included in the known music file, so as to obtain the fingerprint features of the known music file.

In this embodiment, the storage 662 is configured to hold the fingerprint database and to store in it the fingerprint features of the known music file obtained by the CPU 61. The storage 662 in this embodiment can implement the function of the storage module 46 in the embodiment shown in FIG. 5.

The above computer device imposes no requirements on the length or starting point of the music query segment, which can improve query efficiency; moreover, in a noisy environment it can still complete the music fingerprint query effectively and return noise-tolerant matching results for songs.

Those skilled in the art will understand that the accompanying drawings are merely schematic diagrams of a preferred embodiment, and that the modules or processes in the drawings are not necessarily required for implementing the present invention.

Those skilled in the art will understand that the modules of the apparatus in an embodiment may be distributed in the apparatus as described in that embodiment, or may, with corresponding changes, be located in one or more apparatuses different from that of the embodiment. The modules of the above embodiments may be combined into one module or further split into multiple submodules.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of their technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A music query method, comprising:
intercepting a music segment to be queried from a music file to be queried, and dividing the music segment to be queried into frames;
extracting fingerprint features of the frame segments included in the music segment to be queried, so as to obtain fingerprint features of the music segment to be queried; and
searching, according to the fingerprint features of the frame segments included in the music segment to be queried, the fingerprint features stored in a fingerprint database for fingerprint features that match the fingerprint features of the music segment to be queried, and returning a query result according to the degree of similarity between the fingerprint features of the music segment to be queried and the fingerprint features found.

2. The method according to claim 1, wherein, before the searching of the fingerprint features stored in the fingerprint database for fingerprint features that match the fingerprint features of the music segment to be queried, the method further comprises:
dividing a known music file into frames;
extracting fingerprint features of the frame segments included in the known music file, so as to obtain fingerprint features of the known music file; and
storing the fingerprint features of the known music file in the fingerprint database.

3. The method according to claim 1, wherein the extracting of the fingerprint features of the frame segments included in the music segment to be queried comprises:
performing time-frequency conversion on the frame segments included in the music segment to be queried, and taking the modulus of the frequency-domain data obtained by the time-frequency conversion;
selecting, according to the auditory characteristics of the human ear, frequency-domain data in a predetermined frequency band from the frequency-domain data after the modulus is taken;
extracting a spectral envelope of the frequency-domain data in the predetermined frequency band; and
performing dimension reduction on the feature matrix obtained after the spectral envelope is extracted, so as to obtain the fingerprint features of the frame segments included in the music segment to be queried.

4. The method according to claim 2, wherein the extracting of the fingerprint features of the frame segments included in the known music file comprises:
performing time-frequency conversion on the frame segments included in the known music file, and taking the modulus of the frequency-domain data obtained by the time-frequency conversion;
selecting, according to the auditory characteristics of the human ear, frequency-domain data in a predetermined frequency band from the frequency-domain data after the modulus is taken;
extracting a spectral envelope of the frequency-domain data in the predetermined frequency band; and
performing dimension reduction on the feature matrix obtained after the spectral envelope is extracted, so as to obtain the fingerprint features of the frame segments included in the known music file.

5. The method according to claim 3 or 4, wherein the searching of the fingerprint features stored in the fingerprint database for fingerprint features that match the fingerprint features of the music segment to be queried, and the returning of the query result according to the degree of similarity between the fingerprint features of the music segment to be queried and the fingerprint features found, comprise:
searching, according to the fingerprint features of the frame segments included in the music segment to be queried, the fingerprint features stored in the fingerprint database for fingerprint features that match the fingerprint features of the frame segments included in the music segment to be queried;
reading, according to the position of the matched fingerprint feature within the song to which it belongs, as stored in the fingerprint database, a predetermined number of fingerprint features from the fingerprint database starting from that position, the predetermined number being equal to the number of frame segments included in the music segment to be queried; and
comparing the predetermined number of fingerprint features with the fingerprint features of all the frame segments included in the music segment to be queried, and returning the query result according to their degree of similarity.

6. A music query apparatus, comprising:
an interception module, configured to intercept a music segment to be queried from a music file to be queried;
a framing module, configured to divide the music segment to be queried into frames;
an extraction module, configured to extract fingerprint features of the frame segments included in the music segment to be queried, so as to obtain fingerprint features of the music segment to be queried;
a query module, configured to search, according to the fingerprint features of the frame segments extracted by the extraction module, the fingerprint features stored in a fingerprint database for fingerprint features that match the fingerprint features of the music segment to be queried; and
a return module, configured to return a query result according to the degree of similarity between the fingerprint features of the music segment to be queried and the fingerprint features found by the query module.

7. The apparatus according to claim 6, further comprising a storage module, wherein:
the framing module is further configured to divide a known music file into frames;
the extraction module is further configured to extract fingerprint features of the frame segments included in the known music file, so as to obtain fingerprint features of the known music file; and
the storage module is configured to store the fingerprint features of the known music file obtained by the extraction module in the fingerprint database.

8. The apparatus according to claim 6 or 7, wherein the extraction module comprises:
a conversion submodule, configured to perform time-frequency conversion on the frame segments included in the music segment to be queried;
a modulus submodule, configured to take the modulus of the frequency-domain data obtained by the time-frequency conversion;
a selection submodule, configured to select, according to the auditory characteristics of the human ear, frequency-domain data in a predetermined frequency band from the frequency-domain data obtained by the modulus submodule;
an envelope extraction submodule, configured to extract a spectral envelope of the frequency-domain data in the predetermined frequency band; and
a dimension reduction submodule, configured to perform dimension reduction on the feature matrix obtained after the envelope extraction submodule extracts the spectral envelope, so as to obtain the fingerprint features of the frame segments included in the music segment to be queried.

9. The apparatus according to claim 8, wherein the query module comprises:
a feature query submodule, configured to search, according to the fingerprint features of the frame segments included in the music segment to be queried, the fingerprint features stored in the fingerprint database for fingerprint features that match the fingerprint features of the frame segments included in the music segment to be queried; and
a feature reading submodule, configured to read, according to the position of the matched fingerprint feature within the song to which it belongs, as stored in the fingerprint database, a predetermined number of fingerprint features from the fingerprint database starting from that position, the predetermined number being equal to the number of frame segments included in the music segment to be queried.

10. The apparatus according to claim 9, wherein the return module is specifically configured to compare the predetermined number of fingerprint features read by the feature reading submodule with the fingerprint features of all the frame segments included in the music segment to be queried, and to return the query result according to their degree of similarity.

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2011/080977 WO2012163013A1 (en) 2011-10-19 2011-10-19 Music query method and apparatus
CN201180002170.8A CN103180847B (en) 2011-10-19 2011-10-19 Music query method and apparatus

Publications (1)

Publication Number Publication Date
WO2012163013A1 true WO2012163013A1 (en) 2012-12-06

Family

ID=47258328

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/080977 Ceased WO2012163013A1 (en) 2011-10-19 2011-10-19 Music query method and apparatus

Country Status (2)

Country Link
CN (1) CN103180847B (en)
WO (1) WO2012163013A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018027607A1 (en) * 2016-08-10 2018-02-15 董访问 Information pushing method for sound recording-based song matching and sharing system

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018027606A1 (en) * 2016-08-10 2018-02-15 董访问 Data acquisition method for music matching and analysis technology and sharing system
WO2018027605A1 (en) * 2016-08-10 2018-02-15 董访问 Sound recording-based music sharing method and sharing system
CN107633078B (en) * 2017-09-25 2019-02-22 北京达佳互联信息技术有限公司 Audio-frequency fingerprint extracting method, audio-video detection method, device and terminal

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1620684A (en) * 2001-05-25 2005-05-25 多尔拜实验特许公司 Comparing audio using auditory event-based representations
CN1628303A (en) * 2002-02-06 2005-06-15 皇家飞利浦电子股份有限公司 Fast Retrieval of Multimedia Object Metadata Based on Messy Data
CN1661600A (en) * 2004-02-24 2005-08-31 微软公司 Systems and methods for generating audio thumbnails
CN1672211A (en) * 2002-05-16 2005-09-21 皇家飞利浦电子股份有限公司 Signal processing method and arragement
CN1708758A (en) * 2002-11-01 2005-12-14 皇家飞利浦电子股份有限公司 Improved audio data fingerprint search
CN1820511A (en) * 2003-07-11 2006-08-16 皇家飞利浦电子股份有限公司 Method and device for generating and detecting a fingerprint functioning as a trigger marker in a multimedia signal
CN101014953A (en) * 2003-09-23 2007-08-08 音乐Ip公司 Audio fingerprinting system and method
US20090012638A1 (en) * 2007-07-06 2009-01-08 Xia Lou Feature extraction for identification and classification of audio signals
CN101673264A (en) * 2008-09-12 2010-03-17 未序网络科技(上海)有限公司 Audio content searching device
CN101673266A (en) * 2008-09-12 2010-03-17 未序网络科技(上海)有限公司 Method for searching audio and video contents
CN101673262A (en) * 2008-09-12 2010-03-17 未序网络科技(上海)有限公司 Method for searching audio content
CN101673267A (en) * 2008-09-12 2010-03-17 未序网络科技(上海)有限公司 Method for searching audio and video content
CN101882439A (en) * 2010-06-10 2010-11-10 复旦大学 A Compressed Domain Audio Fingerprint Method Based on Zernike Moments
CN102096780A (en) * 2010-12-17 2011-06-15 华中科技大学 Rapid detection method of digital fingerprints under large-scale user environment
US20110173208A1 (en) * 2010-01-13 2011-07-14 Rovi Technologies Corporation Rolling audio recognition
CN102214219A (en) * 2011-06-07 2011-10-12 盛乐信息技术(上海)有限公司 Audio/video content retrieval system and method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101959191B (en) * 2010-09-25 2012-12-26 华中科技大学 Safety authentication method and system for wireless network

Also Published As

Publication number Publication date
CN103180847A (en) 2013-06-26
CN103180847B (en) 2016-03-02

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11866713

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11866713

Country of ref document: EP

Kind code of ref document: A1