CN115840877B - Distributed stream processing method, system, storage medium and computer for MFCC extraction - Google Patents
- Publication number: CN115840877B
- Application number: CN202211558715.XA
- Authority: CN (China)
- Legal status: Active (as listed by Google Patents; the status is an assumption, not a legal conclusion)
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The present invention relates to a distributed stream processing method, system, storage medium and computer for MFCC extraction. The method comprises the following steps: acquiring multi-source signal raw data streams in parallel, the data type of the raw data streams being String; performing parallel flat mapping on the multi-source signal raw data streams to obtain multi-source discrete signal data streams; performing a windowing operation on the multi-source discrete signal data streams to obtain parallel, continuous sliding windows; and extracting Mel-frequency cepstral coefficients in the parallel, continuous sliding windows with a parallel window processing function, to obtain the Mel-frequency cepstral coefficient data streams corresponding to the multi-source signals. By flat-mapping and windowing the data streams, the invention lets MFCC extraction for multi-source data streams run in parallel. This improves the efficiency and timeliness of extraction and avoids both the lag of offline batch processing of large volumes of data and the processing pressure that those volumes create.
Description
Technical field
The invention relates to the fields of industrial big data and signal processing. It is used for real-time processing of vibration and sound signals from mechanical equipment, and relates in particular to a distributed stream processing method, system, storage medium and computer for MFCC extraction.
Background
MFCC stands for Mel Frequency Cepstrum Coefficient, that is, Mel-frequency cepstral coefficients. MFCCs are commonly used in sound signal processing and are also used in vibration signal processing. In the operation and maintenance of mechanical equipment, vibration monitoring and sound monitoring are common techniques. The MFCCs extracted from these signals are used for anomaly detection and fault diagnosis.
Currently known MFCC extraction methods and technologies all perform offline analysis after the vibration or sound data has been collected. The MFCC extraction algorithm must also split the collected signals into frames. As the number of monitored devices grows, the number of vibration and sound sensors grows with it, and framing the signals produces enormous matrices. These matrices put heavy pressure on offline feature extraction. In addition, because feature extraction lags behind acquisition, such methods cannot meet the needs of online fault diagnosis scenarios that require MFCCs to be analyzed in real time.
Summary of the invention
When many devices are monitored and the number of vibration and sound sensors keeps growing, framing the signals produces enormous matrices that burden offline feature extraction, and the resulting lag makes it hard to serve online fault diagnosis scenarios that require real-time MFCC analysis. To solve these technical problems, the present invention provides a distributed stream processing method, system, storage medium and computer for MFCC extraction.
The technical solution by which the present invention solves the above technical problems is as follows:
A distributed stream processing method for MFCC extraction comprises the following steps:
acquiring multi-source signal raw data streams in parallel, wherein the data type of the multi-source signal raw data streams is String;
performing parallel flat mapping on the multi-source signal raw data streams to obtain multi-source discrete signal data streams;
performing a windowing operation on the multi-source discrete signal data streams to obtain parallel, continuous sliding windows;
extracting Mel-frequency cepstral coefficients in the parallel, continuous sliding windows with a parallel window processing function, to obtain the Mel-frequency cepstral coefficient data streams corresponding to the multi-source signals.
The beneficial effects of the present invention are as follows. Vibration, sound and similar signals produce many data points every millisecond. If the data points were sent one by one, a point generated earlier could well reach the processing system after a later one, so the processing system would receive the data out of order. The present invention therefore packs the data collected over several milliseconds into a single String segment before sending it. The signal data points within a segment are arranged in their original order of occurrence, so they cannot be out of order. Because consecutive segments are generated several milliseconds apart, segments also do not arrive out of order as long as the network is operating normally. When the raw data stream has multiple sources, flat mapping, partitioning and windowing of the data streams let MFCC extraction for the multi-source data streams run in parallel. This improves the efficiency and timeliness of extraction and avoids both the lag of offline batch processing and the processing pressure caused by large data volumes.
On the basis of the above technical solution, the present invention can be further improved as follows.
Further, the String data includes at least a timestamp, a sensor ID, signal values and a delimiter, wherein the sensor ID is the number of the sensor corresponding to the raw data stream.
Further, performing parallel flat mapping on the multi-source raw data streams to obtain multi-source discrete signal data streams comprises performing the parallel flat mapping with Flink stream processing.
The beneficial effect of this further solution is that Flink stream processing offers high throughput, low latency and distributed execution, which improves data processing efficiency.
Further, performing the windowing operation on the multi-source discrete signal data streams to obtain parallel, continuous sliding windows comprises the following steps:
applying a keyBy operation to the multi-source discrete signal data streams by sensor ID to obtain keyed data streams, wherein the keyBy operation sends the multi-source discrete signal data with the same sensor ID to a designated partition;
performing a windowing operation on the keyed data stream in each partition to obtain parallel, continuous sliding windows for each sensor.
Further, extracting Mel-frequency cepstral coefficients in the continuous sliding windows with the parallel window processing function to obtain the Mel-frequency cepstral coefficient data stream comprises the following steps:
storing the data in each sliding window in a corresponding double-precision array, using the parallel window processing function;
calling a Mel-frequency cepstral coefficient extraction function on each double-precision array to obtain the Mel-frequency cepstral coefficient data stream.
Further, the Mel-frequency cepstral coefficient extraction function comprises a main function and several sub-functions, namely a Mel filter bank function, a discrete cosine transform function, a fast Fourier transform function and a Hamming window function;
the double-precision array is input to the main function, which computes the Mel-frequency cepstral coefficients of the array by calling the sub-functions.
To solve the above technical problems, the present invention also provides a distributed stream processing system for extracting Mel-frequency cepstral coefficients from multi-source signals, as follows:
A distributed stream processing system for MFCC extraction comprises:
a data acquisition module, configured to acquire multi-source signal raw data streams in parallel, wherein the data type of the multi-source signal raw data streams is String;
a data processing module, configured to:
- perform parallel flat mapping on the multi-source signal raw data streams to obtain multi-source discrete signal data streams;
- perform a windowing operation on the multi-source discrete signal data streams to obtain parallel, continuous sliding windows;
- extract Mel-frequency cepstral coefficients in the continuous sliding windows with a parallel window processing function, to obtain the Mel-frequency cepstral coefficient data streams corresponding to the multi-source signals.
Based on the distributed stream processing method for extracting Mel-frequency cepstral coefficients from multi-source signals, the present invention also provides a storage medium, as follows:
A storage medium stores a computer program which, when executed by a processor of a computer, implements the above distributed stream processing method for extracting Mel-frequency cepstral coefficients from multi-source signals.
Based on the same distributed stream processing method for extracting Mel-frequency cepstral coefficients from multi-source signals, the present invention also provides a computer, as follows:
A computer comprises a memory and a processor. The memory stores a computer program which, when executed by the processor, implements the above distributed stream processing method for MFCC extraction.
Description of the drawings
Fig. 1 is a flow chart of a distributed stream processing method for MFCC extraction in Embodiment 1 of the present invention;
Fig. 2 is a schematic diagram of the signal stream flat-mapping process in Embodiment 1 of the present invention;
Fig. 3 is a schematic structural diagram of the MFCC extraction function set in Embodiment 1 of the present invention;
Fig. 4 is a plot of the original vibration signal in Embodiment 3 of the present invention;
Fig. 5 is a record of the vibration signal data stream in Embodiment 3 of the present invention;
Fig. 6 is an execution block diagram of the Flink MFCC extraction task program in Embodiment 3 of the present invention.
Detailed description of the embodiments
The principles and features of the present invention are described below with reference to the accompanying drawings. The examples given serve only to explain the invention and are not intended to limit its scope.
Embodiment 1
As shown in Fig. 1, this embodiment provides a distributed stream processing method for MFCC extraction, comprising the following steps:
S1. Acquire multi-source signal raw data streams in parallel. The data type of the raw data streams is String, and the String data includes at least a timestamp, a sensor ID, signal values and a delimiter, where the sensor ID is the number of the sensor corresponding to the raw data stream. Vibration and sound signal sources are typically sampled at high frequencies. At a sampling frequency of 20 kHz, for example, 20 points are generated every millisecond. To avoid the out-of-order problem caused by transmitting the signal over the network, the Kafka producer sends message records by repeatedly packing several milliseconds of samples into one record. Kafka is an open-source stream processing platform developed by the Apache Software Foundation. It is a high-throughput, distributed publish-subscribe messaging system written in Scala and Java.
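The producer-side packing described in S1 can be sketched in plain Scala as follows. This is a minimal illustration, not the patent's implementation. The object and method names, the comma delimiter and the field order (timestamp, sensor ID, then signal values) are assumptions; the field order was chosen to match the later statement that the third element of a record is the first signal value. Actually sending each string as a Kafka message is left out, because the text gives no producer configuration.

```scala
// Hypothetical producer-side packing, for illustration only.
// Assumed record layout: "ts,sensorId,v1,v2,...", with a comma delimiter.
// The patent only requires a timestamp, a sensor ID, signal values and a delimiter.
object RecordPacker {
  val Delimiter: String = ","

  /** Packs one batch of consecutive samples into a single String record. */
  def pack(ts: Long, sensorId: Int, samples: Seq[Double]): String =
    (Seq(ts.toString, sensorId.toString) ++ samples.map(_.toString)).mkString(Delimiter)

  /**
   * Cuts a sensor's sample sequence into records of `msPerRecord` milliseconds each,
   * given `samplesPerMs` points per millisecond (20 at 20 kHz).
   * Each record is stamped with the acquisition time of its first sample.
   * Samples therefore keep their original order inside a record,
   * and consecutive records are produced several milliseconds apart.
   */
  def packStream(startTs: Long, sensorId: Int, samples: Seq[Double],
                 samplesPerMs: Int, msPerRecord: Int): List[String] = {
    val perRecord = samplesPerMs * msPerRecord
    samples.grouped(perRecord).zipWithIndex.map { case (chunk, i) =>
      pack(startTs + i.toLong * msPerRecord, sensorId, chunk)
    }.toList
  }
}
```

With 20 samples per millisecond and 2 ms per record, 100 samples become three records, stamped 1000, 1002 and 1004 ms.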
S2. Perform parallel flat mapping on the multi-source signal raw data streams to obtain multi-source discrete signal data streams. Specifically, the parallel flat mapping is performed with Flink stream processing. In big data scenarios, Kafka producers serve as the data source, and the Flink program pulls and consumes the data from Kafka.
S3. Perform a windowing operation on the multi-source discrete signal data streams to obtain parallel, continuous sliding windows. The specific steps are as follows. First, a keyBy operation is applied to the multi-source discrete signal data streams by sensor ID to obtain keyed data streams; the keyBy operation sends data with the same sensor ID to the same designated partition;
then a windowing operation is performed on the keyed data stream in each designated partition to obtain parallel, continuous sliding windows.
S4. Extract Mel-frequency cepstral coefficients in the parallel, continuous sliding windows with a parallel window processing function, to obtain the Mel-frequency cepstral coefficient data streams corresponding to the multi-source signals. The specific steps are:
storing the data in each sliding window in a corresponding double-precision array, using the parallel window processing function;
calling the Mel-frequency cepstral coefficient extraction function on each double-precision array to obtain the Mel-frequency cepstral coefficient data stream;
wherein the Mel-frequency cepstral coefficient extraction function comprises a main function and several sub-functions, namely a Mel filter bank function, a discrete cosine transform function, a fast Fourier transform function and a Hamming window function;
the double-precision array is input to the main function, which computes the Mel-frequency cepstral coefficients of the array by calling the sub-functions in this order:
1. call the Mel filter bank generation function to construct the Mel filter bank;
2. call the DCT coefficient matrix generation function to obtain the DCT coefficients;
3. apply a fast Fourier transform to the double-precision array to obtain the spectrum;
4. filter the spectrum, that is, multiply the spectrum by the Mel filter bank, take the logarithm, and multiply the result by the DCT coefficients, to obtain the Mel-frequency cepstral coefficients.
As shown in Fig. 2, the raw data of each signal source is sent through a corresponding Kafka partition, so the Flink stream processing program can read from the data source in parallel. After reading, the raw data stream consists of individual String records, which must then be flat-mapped. This conversion is implemented with Flink's flatMap operator, whose core is a custom FlatMapFunction. The main steps of the flatMap operator are:
- split each String record at the delimiter into a string array Array[String];
- loop over the array starting from its third element, which is the first signal value, and in each iteration emit the data wrapped in the following case class:
case class VibElement(ts: Long, sensorId: Int, acc: Double)
Here ts is the timestamp of the original record, sensorId is the sensor ID of the original record, and acc is a discrete signal value.
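The per-record logic of the custom FlatMapFunction can be sketched as a plain Scala function. The sketch assumes a comma delimiter and the field layout timestamp, sensor ID, signal values. The helper name flatMapRecord is hypothetical. In a Flink job, a function of this shape would be wrapped in a FlatMapFunction. That wiring is omitted here so the sketch runs without Flink on the classpath.

```scala
// Sketch of the per-record flat-mapping step described above.
// Assumptions: comma delimiter; element 0 = timestamp, element 1 = sensor ID,
// elements 2.. = signal values (the third element is the first signal value).
case class VibElement(ts: Long, sensorId: Int, acc: Double)

object FlatMapSketch {
  /** Splits one String record and emits one VibElement per signal value. */
  def flatMapRecord(record: String, delimiter: String = ","): Seq[VibElement] = {
    // Note: String.split interprets the delimiter as a regular expression.
    val fields: Array[String] = record.split(delimiter).map(_.trim)
    if (fields.length < 3) Seq.empty // no signal values: emit nothing
    else {
      val ts = fields(0).toLong
      val sensorId = fields(1).toInt
      // Every emitted element carries the timestamp of the original record, as in the text.
      fields.drop(2).toSeq.map(v => VibElement(ts, sensorId, v.toDouble))
    }
  }
}
```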
As shown in Fig. 3, the data stream produced by the flat-mapping conversion is keyed by sensor ID with keyBy to form a keyed data stream (KeyedStream). The subsequent windowing and window processing then run independently and in parallel for each sensor's data stream. The main role of keyBy is to send data with the same key to the same partition. The data is initially distributed across different slots (partitions), and keyBy gathers data with the same key into the same slot (partition).
The windowing operation uses a count-based sliding window with two parameters: the window length (the number of data elements per window) and the slide step. In signal MFCC extraction applications, the window length is typically set to 256 and the slide step to 128.
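The keyBy and count-window steps can be emulated in plain Scala to show which elements each window sees. The emulation rests on our understanding of Flink's implementation. On a keyed stream, countWindow(size, slide) is built from a global window with a count trigger of `slide` elements and a count evictor that keeps `size` elements. With 256/128, each sensor's window therefore fires every 128 elements and holds at most the latest 256. The names below are hypothetical. In a Flink job, the corresponding calls would be along the lines of keyBy(_.sensorId).countWindow(256, 128).

```scala
// Plain-Scala emulation of keyBy(_.sensorId).countWindow(size, slide), for illustration.
// Assumption: the window fires every `slide` elements per key and keeps at most
// the latest `size` elements, so early firings can hold fewer than `size` elements.
case class VibElement(ts: Long, sensorId: Int, acc: Double)

object CountWindowSketch {
  /** Groups elements by sensor ID, then returns the window contents at each firing. */
  def countSlidingWindows(elems: Seq[VibElement], size: Int, slide: Int): Map[Int, Vector[Vector[VibElement]]] =
    elems.groupBy(_.sensorId).map { case (sensorId, xs) => // groupBy keeps per-key order
      val firings = (slide to xs.length by slide).map { end =>
        xs.slice(math.max(0, end - size), end).toVector // latest `size` elements at this firing
      }.toVector
      sensorId -> firings
    }
}
```

The first firing per sensor holds only 128 elements. A job that needs exactly 256 samples per MFCC computation could skip windows shorter than the window length.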
When a window has collected all its data, the data is stored in an array of type Double, and the MFCC extraction function is called on this array to obtain the MFCC as a DenseVector[Double]. To identify which sensor ID and which signal window timestamps each MFCC result belongs to, the final extraction result is wrapped in the following case class for output:
case class MfccResult(startts: Long, endts: Long,
  sensorId: Int, mfcc: DenseVector[Double])
In this class:
- startts is taken from the first record in the window and is the window start timestamp;
- endts is taken from the last record in the window and is the window end timestamp;
- sensorId is the key carried by the keyed data stream.

All timestamps here are event times, that is, the acquisition times of the sensor signal.
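The window processing step that turns one window into a result record can be sketched as follows. The sketch makes three assumptions: Array[Double] stands in for Breeze's DenseVector[Double], the MFCC extractor is passed in as a plain function, and processWindow is a hypothetical name. In Flink, this body would sit inside a window function whose key supplies sensorId.

```scala
// Sketch of per-window processing into a result record.
// Assumptions: Array[Double] replaces DenseVector[Double];
// the extractor is supplied as a function.
case class VibElement(ts: Long, sensorId: Int, acc: Double)
case class MfccResult(startts: Long, endts: Long, sensorId: Int, mfcc: Array[Double])

object WindowProcessSketch {
  def processWindow(sensorId: Int, window: Seq[VibElement],
                    extract: Array[Double] => Array[Double]): Option[MfccResult] =
    if (window.isEmpty) None
    else {
      val x: Array[Double] = window.map(_.acc).toArray // window contents as a Double array
      // startts comes from the first element and endts from the last (event time).
      Some(MfccResult(window.head.ts, window.last.ts, sensorId, extract(x)))
    }
}
```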
As shown in Fig. 3, the MFCC extraction function set is written in Scala, a development language commonly used in big data technology, so that the Flink main program for signal stream feature extraction can call it directly. The set comprises a main function, a Mel filter bank generation function, a DCT function, an FFT function and a Hamming window function:
- the DCT function is the discrete cosine transform coefficient matrix generation function mentioned above;
- the FFT function is the fast Fourier transform function;
- the Hamming window function generates the Hamming window;
- the Mel filter bank generation function constructs the Mel filter bank.
The flow, inputs and outputs, and function call relationships of the main function are as follows:
① Design of the main function:
The main function takes as input an array x of type Array[Double]. It also takes three parameters of type Int (integer): the sampling rate fs, the Mel filter order p and the FFT length N. The order p is typically set to 24. FFT denotes the fast Fourier transform, and the array x is the double-precision array described above.
The output of the main function is the MFCC, a DenseVector[Double] of length p/2.
The main flow of the main function is as follows:
1. Call the Mel filter bank generation function to construct the Mel filter bank.
2. Call the DCT coefficient matrix generation function to obtain the DCT coefficients.
3. Apply a fast Fourier transform to the array x to obtain the spectrum.
4. Filter the spectrum by multiplying it by the filter bank, take the logarithm, and multiply by the DCT coefficients to obtain the MFCC coefficients.
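As a worked equation, the main-function flow can be written compactly as below. This is a sketch under assumptions the text does not state:
- x has length N;
- the "spectrum" is the power spectrum of FFT bins 0 to N/2;
- the Hamming window is the symmetric form;
- log is the natural logarithm, applied element-wise. In practice, a small floor is often added before the log to avoid log 0.

$$w_n = 0.54 - 0.46\cos\frac{2\pi n}{N-1}, \qquad P_k = \left|\sum_{n=0}^{N-1} w_n x_n\, e^{-2\pi i k n / N}\right|^2, \quad k = 0, \dots, N/2$$

$$\mathbf{c} = D \log\left(B\,\mathbf{P}\right), \qquad B \in \mathbb{R}^{p \times (N/2+1)}, \quad D \in \mathbb{R}^{(p/2) \times p}, \quad \mathbf{c} \in \mathbb{R}^{p/2}$$

The dimensions match the sizes given in the text. The product B P has p entries, and D maps them to the p/2 coefficients output by the main function (12 for p = 24).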
② The Mel filter bank generation function receives the three parameters fs, p and N passed from the main function. It uses a conventional Mel filter bank generation method to obtain the Mel filter bank as a DenseMatrix[Double] of size:
$p \times (N/2 + 1)$
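The text only says that a "conventional" Mel filter bank generation method is used. One common construction is sketched below under that assumption:
- the HTK-style mel scale mel(f) = 2595 log10(1 + f/700);
- p triangular filters whose p + 2 edge frequencies are equally spaced in mel from 0 Hz to fs/2;
- weights evaluated at the N/2 + 1 FFT bin frequencies k·fs/N.

Array[Array[Double]] stands in for DenseMatrix[Double], and the names are hypothetical.

```scala
// Sketch of a conventional triangular Mel filter bank (HTK-style mel scale assumed).
object MelFilterBankSketch {
  def hzToMel(f: Double): Double = 2595.0 * math.log10(1.0 + f / 700.0)
  def melToHz(m: Double): Double = 700.0 * (math.pow(10.0, m / 2595.0) - 1.0)

  /** Returns a p x (N/2 + 1) matrix (row-major) of triangular filter weights. */
  def melFilterBank(fs: Int, p: Int, n: Int): Array[Array[Double]] = {
    val nBins = n / 2 + 1
    val melMax = hzToMel(fs / 2.0)
    // p + 2 edge frequencies, equally spaced on the mel scale from 0 Hz to fs/2.
    val edgesHz = Array.tabulate(p + 2)(i => melToHz(melMax * i / (p + 1)))
    Array.tabulate(p, nBins) { (m, k) =>
      val f = k.toDouble * fs / n // frequency of FFT bin k
      val (lo, mid, hi) = (edgesHz(m), edgesHz(m + 1), edgesHz(m + 2))
      if (f <= lo || f >= hi) 0.0            // outside filter m
      else if (f <= mid) (f - lo) / (mid - lo) // rising edge
      else (hi - f) / (hi - mid)               // falling edge
    }
  }
}
```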
The DCT coefficient matrix generation function takes the parameter p from the main function as its input. It uses a conventional DCT coefficient generation method to obtain the DCT coefficient matrix as a DenseMatrix[Double] of size p/2 × p. The DCT coefficients are discrete cosine transform coefficients.
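The text does not give the DCT formula. One conventional choice, assumed here, is the DCT-II basis without the k = 0 row: entry (k, n) = cos((2n + 1)kπ / (2p)) for k = 1, ..., p/2 and n = 0, ..., p - 1. This yields exactly the p/2 × p shape stated above. The object name is hypothetical, and Array[Array[Double]] stands in for DenseMatrix[Double].

```scala
// Sketch of a (p/2) x p DCT-II coefficient matrix, skipping the k = 0 (DC) row.
object DctMatrixSketch {
  /** Entry (row, n) = cos((2n + 1) k pi / (2p)) with k = row + 1. */
  def dctMatrix(p: Int): Array[Array[Double]] =
    Array.tabulate(p / 2, p) { (row, n) =>
      val k = row + 1
      math.cos((2 * n + 1) * k * math.Pi / (2 * p))
    }
}
```

Because DCT-II basis vectors are mutually orthogonal, each row picks out a distinct cosine component of the p log filter-bank energies.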
The FFT function takes the array x from the main function as its input. It uses a conventional FFT method to produce an output of type Array[Double]. Inside the FFT function, x is first filtered with a Hamming window.
The Hamming window filtering function takes the array x passed in from the FFT function as its input. It uses a conventional Hamming window construction method to produce an array of the same length as the filtered result.
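The FFT and Hamming window functions can be sketched as follows. The sketch uses three choices of its own:
- a symmetric Hamming window;
- a textbook recursive radix-2 Cooley-Tukey FFT, so N must be a power of two;
- an output equal to the power spectrum |X(k)|² for k = 0, ..., N/2, which matches the N/2 + 1 columns of the filter bank.

The text does not say whether magnitude or power is used, or whether x is padded to N; the choices here are assumptions. The names are hypothetical.

```scala
// Sketch of Hamming windowing followed by a radix-2 FFT power spectrum.
object FftSketch {
  /** Symmetric Hamming window applied to x: w(n) = 0.54 - 0.46 cos(2 pi n / (L - 1)). */
  def hamming(x: Array[Double]): Array[Double] = {
    val len = x.length
    if (len <= 1) x.clone()
    else Array.tabulate(len)(i => x(i) * (0.54 - 0.46 * math.cos(2 * math.Pi * i / (len - 1))))
  }

  /** Recursive radix-2 Cooley-Tukey FFT on (re, im) arrays; length must be a power of two. */
  private def fft(re: Array[Double], im: Array[Double]): (Array[Double], Array[Double]) = {
    val n = re.length
    if (n == 1) (re.clone(), im.clone())
    else {
      val half = n / 2
      val (evRe, evIm) = fft(Array.tabulate(half)(i => re(2 * i)), Array.tabulate(half)(i => im(2 * i)))
      val (odRe, odIm) = fft(Array.tabulate(half)(i => re(2 * i + 1)), Array.tabulate(half)(i => im(2 * i + 1)))
      val outRe = new Array[Double](n)
      val outIm = new Array[Double](n)
      for (k <- 0 until half) {
        val ang = -2 * math.Pi * k / n
        val (c, s) = (math.cos(ang), math.sin(ang))
        // Twiddle factor e^{-2 pi i k / n} times the odd-half output.
        val tRe = c * odRe(k) - s * odIm(k)
        val tIm = c * odIm(k) + s * odRe(k)
        outRe(k) = evRe(k) + tRe; outIm(k) = evIm(k) + tIm
        outRe(k + half) = evRe(k) - tRe; outIm(k + half) = evIm(k) - tIm
      }
      (outRe, outIm)
    }
  }

  /**
   * Hamming-windows x, then zero-pads (or truncates) the result to n samples.
   * Returns |X(k)|^2 for k = 0..n/2.
   */
  def powerSpectrum(x: Array[Double], n: Int): Array[Double] = {
    require(n > 0 && (n & (n - 1)) == 0, "n must be a power of two")
    val w = hamming(x)
    val re = Array.tabulate(n)(i => if (i < w.length) w(i) else 0.0)
    val (xr, xi) = fft(re, new Array[Double](n))
    Array.tabulate(n / 2 + 1)(k => xr(k) * xr(k) + xi(k) * xi(k))
  }
}
```

A production implementation would more likely call an existing FFT library (Breeze's signal package, for example) than use this recursive version.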
In this embodiment of the present invention, the data of the original data stream is String data. Signals such as vibration and sound produce many data points every millisecond. If the data points were sent one at a time, a point that occurred earlier could easily reach the processing system after a later one, so the processing system would receive the data out of order. The present invention therefore packs the data collected over several milliseconds into one String-format segment for transmission. Within a segment, the signal data points are arranged in their original order of occurrence, so they cannot be reordered. Because consecutive segments are produced several milliseconds apart, segments do not arrive out of order either, provided network transmission is normal. When the original data stream comes from multiple data sources, flat mapping and windowing the data stream allow Mel-frequency cepstral coefficient (MFCC) extraction for the multi-source data streams to run in parallel. This improves the efficiency and timeliness of MFCC extraction and avoids both the lag of offline batch processing of large data volumes and the processing pressure caused by the sheer amount of data. The high throughput, low latency and distributed execution of Flink stream processing further improve data processing efficiency.
Unlike prior-art offline MFCC feature extraction, this embodiment extracts Mel-frequency cepstral coefficients under the stream processing paradigm, and no framing operation is required during extraction.
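The segment idea can be sketched as follows. The wire format `<sensor_id>|<segment start in ms>|<v0>,<v1>,...` and the helper name `encode_segments` are assumptions for illustration. The patent states only that several milliseconds of samples are packed, in order of occurrence, into one String. The defaults (160 samples per 8 ms segment) follow the test setup described later in this document.

```python
def encode_segments(sensor_id, samples, start_ms=0, samples_per_segment=160, ms_per_segment=8):
    """Pack consecutive samples into String segments, preserving their order of occurrence.

    Hypothetical format: "<sensor_id>|<segment start time in ms>|<v0>,<v1>,...".
    """
    segments = []
    for k in range(0, len(samples), samples_per_segment):
        chunk = samples[k:k + samples_per_segment]
        segment_start_ms = start_ms + (k // samples_per_segment) * ms_per_segment
        # repr() of a float round-trips exactly, so no precision is lost in the String.
        payload = ",".join(repr(float(v)) for v in chunk)
        segments.append(f"{sensor_id}|{segment_start_ms}|{payload}")
    return segments


print(encode_segments("s1", [0.1, 0.2, 0.3, 0.4, 0.5], samples_per_segment=2))
# ['s1|0|0.1,0.2', 's1|8|0.3,0.4', 's1|16|0.5']
```

Each segment travels as a single message, so the order of the samples inside it cannot be disturbed in transit. Ordering between segments relies on their several-millisecond spacing, as the text explains.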
Embodiment 2
Based on Embodiment 1, this embodiment provides a distributed stream processing system for MFCC extraction, comprising a data acquisition module and a data processing module.
The data acquisition module is used to acquire the multi-source signal original data streams in parallel, where the data type of the original data stream is String. Specifically, a Kafka producer acts as the data source and continuously sends the multi-source signal original data streams in String format, and the Flink program pulls and consumes the data from Kafka.
The data processing module is used to: perform parallel flat mapping on the multi-source signal original data streams to obtain multi-source discrete signal data streams; perform a windowing operation on the multi-source discrete signal data streams to obtain parallel, continuous sliding windows; and use a parallel window processing function to extract Mel-frequency cepstral coefficients from these parallel, continuous sliding windows, obtaining the Mel-frequency cepstral coefficient data streams corresponding to the multi-source signals.
Specifically, the original data stream is flat-mapped in parallel into the multi-source discrete signal data stream using the flatMap operator of Flink stream processing.
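In Flink, a `FlatMapFunction` receives one input element and may emit any number of outputs through its `Collector`. The Python generator below imitates that behaviour for one String segment. The segment format, the per-sample event-time rule (segment start plus i/20 kHz) and the names `Sample` and `flat_map_segment` are assumptions, not the patent's code.

```python
from typing import Iterator, NamedTuple


class Sample(NamedTuple):
    sensor_id: str
    event_time_ms: float  # event time of this individual sample
    value: float


def flat_map_segment(segment: str, sample_rate_hz: float = 20_000.0) -> Iterator[Sample]:
    """Expand one String segment into one record per signal sample (flatMap semantics).

    Assumes the hypothetical format "<sensor_id>|<segment start time in ms>|<v0>,<v1>,...".
    """
    sensor_id, start_ms, payload = segment.split("|", 2)
    if not payload:
        return  # an empty segment emits nothing, like a flatMap that never calls collect()
    dt_ms = 1000.0 / sample_rate_hz  # 0.05 ms between samples at 20 kHz
    for i, token in enumerate(payload.split(",")):
        yield Sample(sensor_id, float(start_ms) + i * dt_ms, float(token))


for record in flat_map_segment("s1|8|0.3,0.4,0.5"):
    print(record)
```

Each yielded `Sample` corresponds to one element a Flink flatMap would pass to `Collector.collect`. The result is a stream of discrete, individually timestamped samples that can then be keyed by sensor ID.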
The windowing operation on the multi-source discrete signal data stream, which yields parallel, continuous sliding windows, comprises the following steps. First, a keyBy operation is applied to the multi-source discrete signal data stream by sensor ID to obtain a keyed data stream; the keyBy operation sends all records of the multi-source discrete signal data stream that share the same sensor ID to the same designated partition.
Then the windowing operation is performed on the keyed data stream in each designated partition to obtain parallel, continuous sliding windows.
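The keyBy-then-window step can be simulated in plain Python, assuming the 256-sample window with a 128-sample slide used in the experiments of Embodiment 3. Each sensor ID gets its own buffer, which stands in for the per-key state that Flink keeps in the partition owning that key. The helper `keyed_count_windows` is hypothetical. As far as its construction goes, Flink's built-in `KeyedStream.countWindow(size, slide)` uses a count trigger that fires every `slide` elements together with an evictor that keeps the last `size` elements, so its first firing would carry only 128 samples. This sketch emits full windows only, which is consistent with the 24,999 results per sensor reported in Embodiment 3.

```python
from collections import defaultdict, deque


def keyed_count_windows(stream, size=256, slide=128):
    """Simulate keyBy(sensor ID) followed by a sliding count window.

    `stream` yields (sensor_id, value) pairs in arrival order. Each key keeps its own
    buffer of the last `size` values; a window is emitted once the buffer is full and
    then again after every further `slide` values (partial windows are skipped).
    """
    buffers = defaultdict(lambda: deque(maxlen=size))  # per-key state
    counts = defaultdict(int)                           # values received per key
    for sensor_id, value in stream:
        buffers[sensor_id].append(value)
        counts[sensor_id] += 1
        n = counts[sensor_id]
        if n >= size and (n - size) % slide == 0:
            yield sensor_id, list(buffers[sensor_id])


# Two interleaved sensors, window of 4 values sliding by 2.
mixed = [(key, float(i)) for i in range(6) for key in ("a", "b")]
for key, window in keyed_count_windows(mixed, size=4, slide=2):
    print(key, window)
# a [0.0, 1.0, 2.0, 3.0]
# b [0.0, 1.0, 2.0, 3.0]
# a [2.0, 3.0, 4.0, 5.0]
# b [2.0, 3.0, 4.0, 5.0]
```

Although the two sensors' samples arrive interleaved, each window contains samples from only one sensor. This separation is what lets the per-sensor windows be processed in parallel.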
The parallel window processing function extracts Mel-frequency cepstral coefficients from the parallel, continuous sliding windows to obtain the Mel-frequency cepstral coefficient data streams corresponding to the multi-source signals. The specific steps are:
Store the data of each sliding window in a corresponding double-precision array using a window function;
Call the Mel-frequency cepstral coefficient extraction function on each double-precision array to obtain the Mel-frequency cepstral coefficient data stream;
The Mel-frequency cepstral coefficient extraction function comprises a main function and several sub-functions, namely a Mel filter bank function, a discrete cosine transform function, a fast Fourier transform function and a Hamming window function;
The double-precision array is input to the main function, which calls the sub-functions to process the array and obtain the Mel-frequency cepstral coefficients.
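For illustration, the main-function/sub-function structure can be sketched in pure Python. The patent does not give the parameters, so the following are assumptions:
- a 20 kHz sampling rate (taken from Embodiment 3);
- 26 triangular mel filters and 13 output coefficients;
- the mel scale 2595·log10(1 + f/700);
- a power spectrum normalised by the window length, floored at 1e-10 before the logarithm;
- an unnormalised DCT-II.

The recursive radix-2 FFT requires a power-of-two length, which the 256-sample window satisfies. All function names are hypothetical.

```python
import cmath
import math


def hamming(x):
    """Sub-function: apply a symmetric Hamming window (requires len(x) >= 2)."""
    n = len(x)
    return [v * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i, v in enumerate(x)]


def fft(x):
    """Sub-function: recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    twiddled = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + twiddled[k] for k in range(n // 2)]
            + [even[k] - twiddled[k] for k in range(n // 2)])


def mel_filter_bank(power, sample_rate, n_filters):
    """Sub-function: energies of triangular filters spaced evenly on the mel scale."""
    n_fft = 2 * (len(power) - 1)  # power holds bins 0 .. n_fft/2
    top_mel = 2595.0 * math.log10(1.0 + (sample_rate / 2.0) / 700.0)
    edges_hz = [700.0 * (10.0 ** (top_mel * i / (n_filters + 1) / 2595.0) - 1.0)
                for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * f / sample_rate) for f in edges_hz]
    energies = []
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        energy = 0.0
        for k in range(left, right):
            if k < centre:
                weight = (k - left) / (centre - left)    # rising edge of the triangle
            else:
                weight = (right - k) / (right - centre)  # falling edge of the triangle
            energy += weight * power[k]
        energies.append(energy)
    return energies


def dct(values, n_coeffs):
    """Sub-function: unnormalised DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(n_coeffs)]


def mfcc(window, sample_rate=20_000, n_filters=26, n_coeffs=13):
    """Main function: one sliding window (a double array) -> one MFCC vector."""
    spectrum = fft(hamming(window))
    power = [abs(c) ** 2 / len(window) for c in spectrum[: len(window) // 2 + 1]]
    log_energies = [math.log(max(e, 1e-10))
                    for e in mel_filter_bank(power, sample_rate, n_filters)]
    return dct(log_energies, n_coeffs)


# Example: a 1 kHz tone sampled at 20 kHz, one 256-sample window.
window = [math.sin(2 * math.pi * 1000 * n / 20_000) for n in range(256)]
print(len(mfcc(window)))  # 13
```

Each window is processed independently of every other window. This matches the statement above that no separate framing step is needed: the sliding windows themselves play the role of frames.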
Embodiment 3
Based on Embodiment 1, this embodiment describes the experimental verification of the distributed stream processing method (or system) for MFCC extraction and its results. The specific experimental process and verification results are as follows:
The method and system proposed by the present invention are verified using acceleration signals collected by vibration sensors. In this example, four vibration sensors are deployed to collect the equipment's vibration signals in real time, with a sampling frequency of 20 kHz. Figure 4 shows the vibration signal over 1 s.
A Kafka producer publishes the signal streams of the four sensors in parallel. One record is generated per sensor every 8 ms, and each record contains 160 sampling points, consistent with the 20 kHz sampling rate (160 samples / 8 ms = 20,000 samples/s). The resulting data stream records are shown in Figure 5.
Test items and test methods
Four vibration signal data sources were used to test the MFCC extraction stream processing method and system. The main test items and their test methods are listed in Table 1:
Table 1 Test items and test methods
Test process and results
Functional test of distributed stream processing for MFCC extraction:
The Kafka producer program is run to send, in parallel, the vibration signal data collected by the four vibration sensors. Each sensor sends 160 sampling points every 8 ms, and the points sent by one sensor in one 8 ms interval count as one record. Sending continues for 20,000 records, so the data stream lasts about 2.7 minutes (20,000 × 8 ms = 160 s). The test checks whether the MFCC extraction result data stream is extracted and output normally throughout this period.
With a count window of length 256 and a sliding step of 128, each sensor contributes 3,200,000 vibration sampling points (20,000 records × 160 points), so the accumulated number of MFCC results per sensor is: (3,200,000 - 256) / 128 + 1 = 24,999
The test results show:
Data stream generation and MFCC feature extraction stay in step: as soon as a signal reaches the processing system, the MFCC feature extraction task is triggered and completed. Across repeated tests, the feature extraction results were identical to the offline processing results, which indicates that the program is designed and runs correctly. Each sensor produced 24,999 MFCC results, matching the expected count, which shows that data processing is 100% complete.
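The figures in this functional test can be cross-checked with a few lines of arithmetic. The helper `count_windows` is hypothetical. It assumes that only full 256-sample windows are counted, which reproduces the 24,999 results per sensor reported above. The four-sensor total equals the number of delay samples in the delay test.

```python
def count_windows(n_samples, size=256, slide=128):
    """Number of full sliding count windows over n_samples values of one sensor."""
    if n_samples < size:
        return 0
    return (n_samples - size) // slide + 1


records_per_sensor = 20_000
samples_per_record = 160
record_period_ms = 8
sensors = 4

samples_per_sensor = records_per_sensor * samples_per_record    # 3,200,000 samples
sample_rate_hz = samples_per_record * 1000 / record_period_ms    # 20,000 Hz
duration_min = records_per_sensor * record_period_ms / 60_000    # 160 s, about 2.67 min
windows_per_sensor = count_windows(samples_per_sensor)           # 24,999 MFCC vectors
windows_total = sensors * windows_per_sensor                     # 99,996 MFCC vectors

print(samples_per_sensor, sample_rate_hz, round(duration_min, 2), windows_per_sensor, windows_total)
# 3200000 20000.0 2.67 24999 99996
```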
MFCC feature extraction delay test:
To measure the feature extraction delay, the Kafka producer program and the Flink feature extraction stream processing main program were run on the same host. The delay of each MFCC output was computed as the computer system time at the instant MFCC processing completed, minus the event time at which the window closed. In total, 99,996 delay samples (4 sensors × 24,999) were obtained from the four sensors' feature extraction; the statistics are given in Table 2.
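The delay metric defined above can be written as a small helper. This is a sketch under the following assumptions:
- times are epoch milliseconds;
- the window's closing event time is the event time of its last sample;
- the producer and the Flink job read the same clock. Running both on one host, as in this test, presumably avoids clock skew between machines.

The names `extraction_delay_ms` and `summarize_delays` are hypothetical. The timestamps in the usage lines are made up for illustration and are not measured results.

```python
import statistics
import time


def extraction_delay_ms(window_end_event_ms, completion_ms=None):
    """Delay of one MFCC output: completion system time minus the window's closing event time.

    Both arguments are epoch milliseconds. If completion_ms is omitted, the current system
    time is used, i.e. the helper is meant to be called right after extraction finishes.
    """
    if completion_ms is None:
        completion_ms = time.time() * 1000.0
    return completion_ms - window_end_event_ms


def summarize_delays(delays_ms):
    """Summary statistics over a list of per-window delays."""
    return {
        "samples": len(delays_ms),
        "mean_ms": statistics.mean(delays_ms),
        "min_ms": min(delays_ms),
        "max_ms": max(delays_ms),
    }


# Illustrative (made-up) timestamps, not measurements from the test.
delays = [extraction_delay_ms(1_000.0, 1_031.5), extraction_delay_ms(1_008.0, 1_040.0)]
print(summarize_delays(delays))  # {'samples': 2, 'mean_ms': 31.75, 'min_ms': 31.5, 'max_ms': 32.0}
```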
Table 2 Delay time test results
In this test, the data travels from the Kafka producer program running on the local host to the Kafka cluster, and is then pulled from the Kafka cluster by the Flink main program, also running on the local host. Measurements show that the feature extraction itself takes less than 1 ms per window, so most of the delay is introduced by network transmission. Even so, an average end-to-end delay of just over 30 ms shows that extracting MFCC features by stream processing is highly efficient: the delay between generation of the multi-source signal raw data and output of the corresponding MFCC features is very short.
Test of the feature extraction program on a computer cluster:
As shown in Figure 6, the FlinkMFCC extraction program and its third-party dependencies were packaged and deployed to a Hadoop cluster. A YARN session for executing Flink jobs was started with the command "./bin/yarn-session.sh -nm mfccflinktest -d", and the Flink job was then submitted with the command "./bin/flink run -c org.atcsu.mfcc.VibMFCCRealTime /opt/program/FlinkMFCC-1.0.0.jar". The Kafka producer program was run to generate the multi-source vibration signal raw data. In cluster mode, the program ran normally and the real-time MFCC extraction results were correct.
Experiments in this embodiment confirmed that the cluster test results are consistent with the local stand-alone test results. This shows that, in computer-cluster processing mode, MFCC features are correctly extracted from the multi-source signal original data streams, data is processed as soon as it is generated, and the data processing completeness rate is again 100%. The present invention therefore allows MFCC extraction for multi-source data streams to run in real time and in parallel, improving the efficiency and timeliness of MFCC extraction and avoiding both the lag of offline batch processing of large data volumes and the processing pressure caused by the sheer amount of data.
Embodiment 4
Based on Embodiment 1, this embodiment provides a storage medium storing a computer program which, when executed by a processor of a computer, implements the above distributed stream processing method for MFCC extraction. A storage medium is a carrier that stores data, such as a floppy disk, CD, DVD, hard disk, flash memory, USB flash drive, CF card, SD card, MMC card, SM card, Memory Stick or xD card. The storage medium may also be based on NAND flash, such as a USB flash drive, CF card, SD card, SDHC card, MMC card, SM card, Memory Stick or xD card.
By storing the program in a storage medium, this embodiment allows a processor to execute the program to implement the above distributed stream processing method for MFCC extraction, improving data processing efficiency.
Embodiment 5
Based on Embodiment 1, this embodiment provides a computer comprising a memory and a processor, where the memory stores a computer program which, when executed by the processor, implements the above distributed stream processing method for MFCC extraction. The processor runs or executes the software programs and/or modules stored in the memory and calls the data stored in the memory. In doing so, it performs the various functions of the terminal and processes data, thereby monitoring the terminal as a whole, for example by implementing the above distributed stream processing method for MFCC extraction. There may be one or more processors, and the processor may also be implemented as a combination of computing devices.
By using a computer to run the program or application module corresponding to the above distributed stream processing method for MFCC extraction, this embodiment improves data processing efficiency.
The above describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement, improvement, etc. made within the concept and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (7)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211558715.XA CN115840877B (en) | 2022-12-06 | 2022-12-06 | Distributed stream processing method, system, storage medium and computer extracted from MFCC |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211558715.XA CN115840877B (en) | 2022-12-06 | 2022-12-06 | Distributed stream processing method, system, storage medium and computer extracted from MFCC |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115840877A | 2023-03-24 |
CN115840877B | 2023-07-07 |
Family ID: 85578169
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211558715.XA Active CN115840877B (en) | 2022-12-06 | 2022-12-06 | Distributed stream processing method, system, storage medium and computer extracted from MFCC |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115840877B (en) |
2022-12-06: CN202211558715.XA (CN115840877B), Active
Also Published As
Publication number | Publication date |
---|---|
CN115840877A (en) | 2023-03-24 |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |