TWI768589B - Deep learning rhythm practice system - Google Patents
- Publication number
- TWI768589B (application TW109143675A)
- Authority
- TW
- Taiwan
- Prior art keywords
- layer
- deep learning
- audio signal
- module
- music
- Prior art date
Landscapes
- Electrophonic Musical Instruments (AREA)
Abstract
Description
The present invention relates to a rhythm practice system, and in particular to a deep-learning rhythm practice system.
When learning music, developing an accurate sense of rhythm is essential. Beginners typically practice by clapping along with a metronome; once they have built a foundation, they play music and clap along with its beat, clapping being the most convenient way to mark time. Traditional rhythm practice, however, relies on a teacher to guide the student and confirm that the claps land on the correct beats. Without a teacher present, students cannot reliably judge whether their claps fall on the correct beat positions: because the clapping mixes into the music, locating the clap points by ear is difficult, and even in a recording made after practice, correctly detecting claps within the music is not easy, so the assessment suffers errors and omissions. Rhythm practice and its evaluation therefore need improvement.
Accordingly, the object of the present invention is to provide a deep-learning rhythm practice system that applies audio-recognition and deep-learning algorithms to the recorded audio signal to effectively detect claps and determine whether they fall on the correct beat positions, substantially improving the accuracy of rhythm practice.
The deep-learning rhythm practice system of the present invention thus comprises a processing device and a detection-and-comparison device. The detection-and-comparison device includes a storage module and a deep-learning module connected to it, and the storage module stores at least a training data set, a validation data set, and a music collection annotated with beat positions. The processing device connects to the storage module to download music from the beat-annotated collection for clapping practice, records the practice, converts the recording into an audio signal, and outputs it. The deep-learning module then runs a deep-learning algorithm on the recorded audio signal to automatically detect and judge whether claps appear in the input signal and, further, whether each clap falls on the correct beat, greatly improving the accuracy of rhythm-practice evaluation.
FIG. 1 is a schematic diagram of a preferred embodiment of the present invention.
FIG. 2 is a schematic diagram, in the preferred embodiment, of a 10-second audio signal containing both music and clapping, with red arrows marking where the claps occur.
FIG. 3 is a schematic diagram of a 0.1-second audio signal labeled as containing no clap in the preferred embodiment.
FIG. 4 is a schematic diagram of a 0.1-second audio signal labeled as containing a clap in the preferred embodiment.
FIG. 5 is a schematic diagram of audio feature extraction by deep learning in the preferred embodiment.
FIG. 6 is a block-flow diagram of the preferred embodiment.
The foregoing and other technical content, features, and effects of the present invention will become clear from the following detailed description of the preferred embodiment, read with reference to the drawings.
Referring to FIG. 1, in a preferred embodiment of the present invention, the deep-learning rhythm practice system comprises a processing device and a detection-and-comparison device connected to it. The processing device is an application that runs on a mobile device with data communication, such as a smartphone or tablet, while the detection-and-comparison device is deployed as a terminal and is used over a connection with the processing device.
Continuing the above, the detection-and-comparison device includes a storage module and a deep-learning module connected to it. The storage module may be flash memory or another storage medium and stores at least a training data set, a validation data set, and a beat-annotated music collection. The training data set holds recordings of several different people clapping along with at least 10 pieces of music; each person records 10 audio clips, one per piece, each 60 seconds long. Within these clips, each of which contains multiple claps, many 0.1-second segments without a clap and many 0.1-second segments with a clap are labeled and stored for use in training the model. The validation data set is recorded the same way, but by a different group of people than the training set (again at least 10 pieces, with each person recording 10 clips of 60 seconds), and its 0.1-second no-clap and clap segments are likewise labeled; it is used to verify the correctness of the model.
Regarding the sampling rate of the audio signals stored in the training and validation data sets, this embodiment uses 16 kHz as an example. Each recorded audio signal is passed through a 4th-order Butterworth high-pass filter with a 3 dB cutoff frequency of 0.5 Hz to filter out DC offset, and the filtered signal is then scaled proportionally into the interval [-1, 1] by mean normalization, computed as follows:

$$x' = \frac{x - \mu}{X_{max} - X_{min}}$$

where $X_{max}$ and $X_{min}$ are the maximum and minimum values in the data, respectively, and $\mu$ is the mean of the data; the data are thus scaled into the interval [-1, 1] with a mean of 0.
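As a concrete illustration of this preprocessing stage, the following is a minimal sketch (an assumed implementation; the patent supplies no code, and the choice of scipy and of `sosfilt` is the author's) of the 4th-order Butterworth high-pass filter and the mean normalization above:

```python
import numpy as np
from scipy import signal

FS = 16_000  # sampling rate in Hz, as stated in the embodiment

def preprocess(audio: np.ndarray) -> np.ndarray:
    """High-pass filter to remove DC offset, then mean-normalize into [-1, 1]."""
    # 4th-order Butterworth high-pass, 3 dB cutoff at 0.5 Hz
    sos = signal.butter(4, 0.5, btype="highpass", fs=FS, output="sos")
    filtered = signal.sosfilt(sos, audio)
    # x' = (x - mu) / (X_max - X_min), giving mean 0 within [-1, 1]
    mu = filtered.mean()
    return (filtered - mu) / (filtered.max() - filtered.min())
```

Using `signal.sosfiltfilt` instead of `signal.sosfilt` would give zero-phase filtering; the text does not say which variant is intended.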
Continuing further, after the above operations the normalized audio signal is cut sequentially into 0.1-second segments, and each 0.1-second segment is manually labeled as containing a clap or not, as shown in FIGS. 2 to 4. The beat-annotated music collection, for its part, stores music data with beat positions corresponding to the audio signals in the training and validation data sets, available for download by the connected processing device.
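A minimal sketch of this segmentation step (assuming non-overlapping windows, which the 0.1-second labeling implies but the text does not state explicitly):

```python
import numpy as np

def segment(audio: np.ndarray, fs: int = 16_000, win_s: float = 0.1) -> np.ndarray:
    """Cut a normalized signal into consecutive 0.1-second windows.

    At 16 kHz each window holds 1600 samples; any trailing partial
    window is discarded.
    """
    win = int(fs * win_s)
    n = len(audio) // win
    return audio[: n * win].reshape(n, win)
```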
Continuing the above, for the audio signal input from the processing device, the deep-learning module is used to raise the accuracy of recognition. Referring also to FIG. 5, the deep-learning module contains at least five feature-extraction layers, one flatten layer, two classification layers, and one fully connected output layer. Its input is a 0.1-second audio signal sampled at 16 kHz, so the input is 1600 samples long. Each feature-extraction layer further comprises a convolutional neural network layer, a batch-normalization layer, an activation layer, a max-pooling layer, and a dropout layer; the convolutional layer has thirty-two feature maps and a kernel length of 16, and the max-pooling layer is designed with a pooling length of 2. After the five feature-extraction layers comes the flatten layer, which converts the two-dimensional feature matrix into a one-dimensional feature vector that serves as input to the subsequent classification layers. Each classification layer further comprises a fully connected neural network layer with one hundred twenty-eight neurons, a batch-normalization layer, an activation layer, and a dropout layer. After the two classification layers comes the fully connected output layer, which has two neurons and computes the probabilities of its two outputs; the two outputs correspond to no-clap and clap, and the classification result is the class with the larger probability. The batch-normalization layers placed after the convolutional and fully connected layers in the feature-extraction and classification layers normalize the signal so that, on entering the activation layer, it improves the speed, performance, and stability of the neural network. The max-pooling layer in each feature-extraction layer is included to lower the network's complexity and its risk of overfitting: with a pooling length of 2 it halves the size of each feature map, and the dropout layers further reduce overfitting, together yielding an optimized deep-learning module. The module is trained on all the 0.1-second audio signals in the training data set and their clap/no-clap labels, using the steepest-descent method to find the optimal model parameters; all the 0.1-second signals in the validation data set and their labels are then used to obtain classification results and matching probabilities and to compute the validation accuracy, which for the present invention exceeds 95%.
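The layer stack described above maps directly onto a one-dimensional convolutional network. The following Keras sketch follows the stated hyperparameters (five Conv1D feature-extraction blocks with 32 feature maps and kernel length 16, pooling length 2, a flatten layer, two 128-neuron fully connected classification blocks, and a 2-neuron output); the ReLU activations, dropout rates, "same" padding, and SGD learning rate are assumptions, since the patent does not specify them:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_clap_detector() -> tf.keras.Model:
    """Sketch of the described architecture; input is a 0.1 s signal at 16 kHz."""
    model = models.Sequential([layers.Input(shape=(1600, 1))])
    # Five feature-extraction layers:
    # Conv1D -> BatchNorm -> activation -> MaxPool(2) -> Dropout
    for _ in range(5):
        model.add(layers.Conv1D(32, kernel_size=16, padding="same"))  # 32 feature maps
        model.add(layers.BatchNormalization())       # normalize before activation
        model.add(layers.Activation("relu"))         # activation choice is assumed
        model.add(layers.MaxPooling1D(pool_size=2))  # halves each feature map
        model.add(layers.Dropout(0.25))              # rate is assumed
    model.add(layers.Flatten())  # 2-D feature matrix -> 1-D feature vector
    # Two classification layers: Dense(128) -> BatchNorm -> activation -> Dropout
    for _ in range(2):
        model.add(layers.Dense(128))
        model.add(layers.BatchNormalization())
        model.add(layers.Activation("relu"))
        model.add(layers.Dropout(0.5))               # rate is assumed
    # Output layer: two neurons giving probabilities for no-clap / clap
    model.add(layers.Dense(2, activation="softmax"))
    # "Steepest descent" training; plain SGD is one plausible reading
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Training then reduces to calling `model.fit` on the labeled 0.1-second segments from the training set, with the validation set passed as `validation_data` to compute the reported validation accuracy.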
Finally, the processing device includes a playback module, an input module, and a transceiver module connected to each of them. The playback module can load beat-annotated music from the storage module through the transceiver module and play it; the input module provides a recording function that records the music being played together with the clapping and converts the recording file into an audio signal; and the transceiver module outputs that audio signal and connects to the detection-and-comparison device for training and verification of the audio signal.
Referring to FIGS. 1 to 6, when users want to practice clapping rhythm, they open the mobile device and tap the playback module, which connects through the transceiver module to the storage module so that a piece from the stored beat-annotated music collection can be selected and loaded for playback; while the music plays, the user claps along on the required beats to practice rhythm. After practicing several times with various pieces, a user who wants to know whether the claps are landing on the correct beat positions can, the next time the playback module is tapped to load and play music, also activate the input module for synchronized recording. The input module then records the user's clapping together with the playing music in 0.1-second intervals, converts the recording file into an audio file, and outputs it to the transceiver module, which sends it on to the detection-and-comparison device. The detection-and-comparison device uses the optimized deep-learning model to detect and identify whether each 0.1-second audio signal in the file contains a clap. If a clap appears, the time of that 0.1-second segment is compared against the music's beat positions: if it coincides with a beat, it is judged a clap on the correct beat; if not, it is judged a clap on a wrong beat; and so on through the recording. The detection-and-comparison device also sends the verified comparison results back to the processing device, where they are received by the transceiver module and displayed to the user as a result status. In use, therefore, the system is not restricted by venue: rhythm training can be practiced at any time, with immediate verification of the clapping against the beat positions, making clap-to-beat comparison highly efficient and greatly improving the accuracy of rhythm practice.
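A hedged sketch of the per-segment beat comparison follows. The exact alignment rule used here (a clap counts as on-beat if an annotated beat time falls inside the clap's 0.1-second window) is an assumption; the patent only states that a detected clap's segment time is checked against the beat positions:

```python
from typing import List, Tuple

def score_claps(clap_windows: List[int], beat_times_s: List[float],
                win_s: float = 0.1) -> List[Tuple[float, bool]]:
    """For each 0.1-second segment classified as a clap, report its start
    time and whether an annotated beat falls inside that window."""
    results = []
    for idx in clap_windows:
        t0 = idx * win_s
        # On-beat if some annotated beat lies in [t0, t0 + win_s)
        on_beat = any(t0 <= b < t0 + win_s for b in beat_times_s)
        results.append((t0, on_beat))
    return results
```

For example, `score_claps([12, 27], [1.2, 2.0])` would report the clap in window 12 (starting at 1.2 s) as on-beat and the clap in window 27 (starting at 2.7 s) as off-beat.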
To summarize, in the deep-learning rhythm practice system of the present invention, the storage module of the detection-and-comparison device provides the stored beat-annotated music collection, which the processing device can download and play over a connection so that the user can practice clapping on the beats of the music. At the same time, the processing device can synchronously record the clapping and the music and convert them into an audio signal, which is sent to the detection-and-comparison device; there a deep-learning algorithm processes the recorded audio signal and detects and compares it against the data held in the storage module. The system thus not only effectively and automatically determines whether claps appear in the input audio signal, but further verifies whether each clap falls on the correct beat; it can be used without restriction of venue, allowing rhythm training at any time, and effectively improves the accuracy of rhythm-practice verification and comparison.
The foregoing is merely a description of preferred embodiments of the present invention and shall not limit the scope in which the invention may be practiced; all simple equivalent changes and modifications made according to the claims and the description of the present invention shall remain within the scope covered by this patent.
Claims (9)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW109143675A TWI768589B (en) | 2020-12-10 | 2020-12-10 | Deep learning rhythm practice system |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW109143675A TWI768589B (en) | 2020-12-10 | 2020-12-10 | Deep learning rhythm practice system |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| TW202223879A TW202223879A (en) | 2022-06-16 |
| TWI768589B true TWI768589B (en) | 2022-06-21 |
Family
ID=83062348
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| TW109143675A TWI768589B (en) | 2020-12-10 | 2020-12-10 | Deep learning rhythm practice system |
Country Status (1)
| Country | Link |
|---|---|
| TW (1) | TWI768589B (en) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20030040904A1 (en) * | 2001-08-27 | 2003-02-27 | Nec Research Institute, Inc. | Extracting classifying data in music from an audio bitstream |
| US20160306844A1 (en) * | 2015-01-29 | 2016-10-20 | Affectomatics Ltd. | Determining a Cause of Inaccuracy in Predicted Affective Response |
| TW201820315A (en) * | 2016-11-21 | 2018-06-01 | 法國國立高等礦業電信學校聯盟 | Improved audio headset device |
| CN108962217A (en) * | 2018-07-28 | 2018-12-07 | 华为技术有限公司 | Phoneme synthesizing method and relevant device |
- 2020-12-10: Application TW109143675A filed in Taiwan; patent TWI768589B is active.
Also Published As
| Publication number | Publication date |
|---|---|
| TW202223879A (en) | 2022-06-16 |