CN113447111B

CN113447111B - Visual vibration amplification method, detection method and system based on morphological component analysis

Info

Publication number: CN113447111B
Application number: CN202110668319.1A
Authority: CN
Inventors: 杨学志; 沈晶; 张龙; 张肖; 臧宗迪; 孔瑞; 杨平安
Original assignee: Hefei University of Technology
Current assignee: Hefei University of Technology
Priority date: 2021-06-16
Filing date: 2021-06-16
Publication date: 2022-09-27
Anticipated expiration: 2041-06-16
Also published as: CN113447111A

Abstract

The invention discloses a visual vibration amplification method, a detection method and a system based on morphological component analysis, belonging to the technical field of computer vision. A video file containing the target object is acquired, and the region of interest of the target object is determined in each frame of the video file. A tunable Q-factor wavelet dictionary and a discrete sine/cosine dictionary represent the structural component and the texture component of each video frame, respectively. A threshold selection algorithm based on the generalized Gaussian density distribution determines the threshold of the structural component, and adaptive MCA separates the structural and texture components of the image. An amplified video is then constructed, and a visual vibration signal is extracted from the structural component, enabling the measurement of multiple vibration frequencies.

Description

Visual vibration amplification method, detection method and system based on morphological component analysis

Technical Field

The invention belongs to the technical field of computer vision, and in particular relates to a visual vibration amplification and detection method and system based on morphological component analysis.

Background

Frequency information is an important basis for vibration analysis. It is widely used in structural health monitoring and in quality inspection of machinery and buildings. Vibration frequencies are generally detected by analyzing the vibration signal of an object. In vibration measurement, the measured object is often excited by several different sources, so its spectrum appears as a mixture of multiple frequencies.

In traditional vibration detection, the frequency of the measured object is mainly obtained in three steps:

- attach an accelerometer to the object's surface to acquire an acceleration signal;
- derive a displacement signal from the acceleration signal;
- compute the object's spectrum from the displacement signal.

This contact-based method has clear shortcomings. The accelerometer has its own mass, so tubular objects are difficult to instrument, and the sensor itself alters the natural frequency of the measured object. Attaching sensors is time-consuming and labor-intensive, and multi-point measurement additionally requires sensor fusion. Non-contact laser sensors can only measure a single point, so full-field measurement requires many laser sensors. This is expensive, and laser sensors also suffer from time delay.

A vision-based, non-contact vibration acquisition approach has therefore been proposed. Its main idea is to treat every pixel as a visual sensor, so that the vibration signal at any point can be extracted. This amounts to non-contact full-field measurement. However, changes in the lighting conditions during filming and non-uniform reflection from the object surface disturb the vibration signal extracted from the video, leading to inaccurate measurements.

Research addressing illumination changes has appeared in recent years.

- Phase-based motion amplification methods are widely used to measure and amplify imperceptible motion in video (N. Wadhwa, M. Rubinstein, F. Durand, and W. Freeman, "Phase-based video motion processing," ACM Trans. Graph., vol. 32, no. 4, Jul. 2013, Art. no. 80).
- Davis et al. extract motion signals from video to compute the frequencies of a vibrating object and use them to infer material properties such as Young's modulus, stiffness, damping and geometric dimensions (A. Davis et al., "Visual vibrometry: Estimating material properties from small motions in video," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 4, pp. 732–745, Apr. 2017).
- Chen et al. used this approach to measure the vibration frequencies and displacements of cantilever beams and plastic pipes (J. G. Chen, N. Wadhwa, Y. J. Cha, et al., "Modal identification of simple structures with high-speed video using motion magnification," Journal of Sound and Vibration, vol. 345, pp. 58–71, 2015).

Consider, however, the spectrum of an engine at idle. Under normal conditions, the six modal frequencies of the engine-assembly mounting system all lie below 25 Hz, so the idle vibration spectrum below 25 Hz usually appears as a mixture of several frequencies. The phase wrapping and unwrapping algorithms used by the methods above have known defects, which make the extracted vibration waveform inaccurate and degrade the accuracy of micro-vibration frequency measurement. In particular, when several frequencies are mixed, non-existent abnormal frequencies are more likely to be detected.

In summary, existing vibration measurement techniques cannot yet measure micro-amplitude vibrations effectively. A visual vibration amplification method is therefore needed that reduces interference from lighting during filming and accurately detects multiple modal frequencies of the measured object.

Summary of the Invention

To solve, or at least partially solve, the above problems, the present invention provides a visual vibration amplification method based on morphological component analysis. The method removes interference from lighting during filming, accurately detects vibration frequencies caused by deviations in the position and angle of the rubber parts inside the engine, and obtains multiple modal frequencies of the engine.

To achieve the above object, the present invention provides the following technical solutions:

A first aspect of the present invention provides a visual vibration amplification method based on morphological component analysis, comprising:

obtaining a video file containing a target object, and determining the region of interest of the target object in each frame of the video file;

constructing a structural component and a texture component that represent the region of interest in each frame of the video file;

separating the structural component and the texture component in the region of interest, using them together with the Eulerian perspective to amplify the micro-vibration signal of the region of interest, and reconstructing an amplified video file containing the target object.

Further, the step of constructing the structural component and the texture component that represent the region of interest in each frame of the video file comprises:

constructing a tunable Q-factor wavelet transform (TQWT) dictionary to represent the structural components of the image, and constructing local discrete cosine transform and discrete sine transform dictionaries to represent the texture components of the image.
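A cosine/sine dictionary suits periodic texture because such a pattern concentrates in very few coefficients. The sketch below illustrates this under stated assumptions. It builds orthonormal DCT-II and DST-II bases for a single 1-D block, which stands in for one row of a local image block. The block size of 16 and all function names are hypothetical. A periodic pattern that matches one atom is then represented by a single significant coefficient. The TQWT structural dictionary is not reproduced here.

```python
import math

def dct2_basis(n):
    """Orthonormal DCT-II atoms; row k is the k-th cosine atom of length n."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)])
    return rows

def dst2_basis(n):
    """Orthonormal DST-II atoms; row k-1 is the k-th sine atom (k = 1..n)."""
    rows = []
    for k in range(1, n + 1):
        scale = math.sqrt(1.0 / n) if k == n else math.sqrt(2.0 / n)
        rows.append([scale * math.sin(math.pi * (i + 0.5) * k / n) for i in range(n)])
    return rows

def analyze(basis, block):
    """Coefficients of a block in an orthonormal basis (inner product with each atom)."""
    return [sum(a * v for a, v in zip(atom, block)) for atom in basis]

def synthesize(basis, coeffs):
    """Inverse of analyze for an orthonormal basis."""
    n = len(basis[0])
    return [sum(c * atom[i] for c, atom in zip(coeffs, basis)) for i in range(n)]

# A periodic 16-sample "texture" row that matches DCT atom k = 4
n = 16
texture_row = [math.cos(math.pi * (i + 0.5) * 4 / n) for i in range(n)]
coeffs = analyze(dct2_basis(n), texture_row)
print(sum(1 for c in coeffs if abs(c) > 1e-9))  # number of significant coefficients
```

A "local" dictionary would apply these bases block by block across the region of interest. The DST atoms cover sine-phased periodic patterns that the DCT alone represents less compactly.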

Further, the step of separating the structural component and the texture component in the region of interest comprises:

separating the region of interest of each frame of the video file by the MCA method to obtain the structural component, texture component and noise component of each frame;

constructing a threshold selection algorithm based on the generalized Gaussian density distribution to determine hard thresholds for the texture component and the structural component;

solving for the structural component and the texture component by an iterative thresholding method, using the hard thresholds of the texture and structural components.
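A minimal 1-D sketch of MCA-style separation by iterative hard thresholding follows. It makes two substitutions that are assumptions, not the method described here:

- an orthonormal Haar wavelet stands in for the TQWT structural dictionary;
- the threshold decreases linearly from the largest coefficient magnitude to `lam_min`, instead of being chosen from a generalized Gaussian fit.

Each iteration re-estimates the structure from `x - t` and the texture from `x - s`. Whatever neither dictionary explains is returned as the noise component.

```python
import math

def haar_forward(x):
    """Orthonormal multilevel Haar transform (len(x) must be a power of two)."""
    out = list(x)
    n = len(out)
    while n > 1:
        half = n // 2
        approx = [(out[2 * i] + out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        detail = [(out[2 * i] - out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        out[:n] = approx + detail
        n = half
    return out

def haar_inverse(coeffs):
    out = list(coeffs)
    n = 1
    while n < len(out):
        rec = []
        for a, d in zip(out[:n], out[n:2 * n]):
            rec += [(a + d) / math.sqrt(2), (a - d) / math.sqrt(2)]
        out[:2 * n] = rec
        n *= 2
    return out

def dct_matrix(n):
    """Orthonormal DCT-II basis used as the texture dictionary."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)] for k in range(n)]

def dct_forward(basis, x):
    return [sum(a * v for a, v in zip(atom, x)) for atom in basis]

def dct_inverse(basis, c):
    n = len(basis)
    return [sum(ck * basis[k][i] for k, ck in enumerate(c)) for i in range(n)]

def hard_threshold(coeffs, lam):
    return [c if abs(c) > lam else 0.0 for c in coeffs]

def mca_separate(x, n_iter=30, lam_min=0.05):
    """Split x into structure (Haar stand-in) + texture (DCT) + residual noise."""
    n = len(x)
    assert n & (n - 1) == 0, "the Haar stand-in needs a power-of-two length"
    dct = dct_matrix(n)
    lam_max = max(max(abs(c) for c in haar_forward(x)),
                  max(abs(c) for c in dct_forward(dct, x)))
    s, t = [0.0] * n, [0.0] * n
    for k in range(n_iter):
        lam = lam_max - k * (lam_max - lam_min) / (n_iter - 1)
        # Structure update: threshold the Haar coefficients of x - t
        s = haar_inverse(hard_threshold(haar_forward([xi - ti for xi, ti in zip(x, t)]), lam))
        # Texture update: threshold the DCT coefficients of x - s
        t = dct_inverse(dct, hard_threshold(dct_forward(dct, [xi - si for xi, si in zip(x, s)]), lam))
    noise = [xi - si - ti for xi, si, ti in zip(x, s, t)]
    return s, t, noise

# Toy signal: a step edge (structure) plus a periodic pattern (texture)
n = 64
step = [0.0 if i < 32 else 2.0 for i in range(n)]
tex = [math.cos(math.pi * (i + 0.5) * 20 / n) for i in range(n)]
x = [a + b for a, b in zip(step, tex)]
s, t, noise = mca_separate(x)
```

In a 2-D implementation, the same loop would run over each frame's ROI with 2-D transforms. The noise component would then be N = I - S - T.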

Further, the step of determining the region of interest of the target object in each frame of the video file comprises:

building an SVM recognition model, inputting one frame of the video file into the model, and taking the key area of the target object that it returns as the region of interest. The SVM recognition model is trained on a training set.
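The description does not specify the SVM variant or the features. As a stand-in, the sketch below trains a linear SVM with the Pegasos stochastic sub-gradient method. The inputs are hypothetical 2-D patch descriptors, for example two hand-crafted features per patch. Key-region patches are labelled +1 and background patches -1. A multi-class model over several key-region types could be built one-vs-rest from this binary trainer.

```python
import random

def train_linear_svm(features, labels, lam=0.01, n_iter=2000, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(features, labels)]  # trailing 1 acts as a bias
    w = [0.0] * len(data[0][0])
    for t in range(1, n_iter + 1):
        x, y = data[rng.randrange(len(data))]
        eta = 1.0 / (lam * t)
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        w = [(1.0 - eta * lam) * wi for wi in w]        # shrink step from the regularizer
        if margin < 1.0:                                 # hinge loss is active for this sample
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

# Hypothetical descriptors: +1 = key-region patch, -1 = background patch
X = [(2, 2), (3, 1), (1, 3), (2.5, 2.5), (-2, -2), (-1, -3), (-3, -1), (-2.5, -1.5)]
Y = [1, 1, 1, 1, -1, -1, -1, -1]
w = train_linear_svm(X, Y)
```

At inference time, each sliding-window patch of a frame would be scored with `predict`. The positive patches would form the ROI.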

Further, the step of obtaining the training set comprises:

acquiring multiple groups of images containing the target object, segmenting candidate regions in each group of images with the watershed algorithm so that each candidate region contains only one group of key objects of the target object, and labeling the candidate regions;

using a sliding window, taking image patches from candidate regions labeled with key objects as positive examples and background image patches without key-object labels as negative examples, to construct a multi-class training set.
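The sliding-window sampling can be sketched as follows. The sketch assumes the labeled candidate regions are available as an integer label map: 0 for background and 1..K for key-region types such as tubular, linear or block-shaped parts. The window size, the stride and the 50% coverage rule for positives are illustrative choices, not values from the description. Windows that mix background and key regions below that coverage are skipped as ambiguous.

```python
def sliding_window_samples(image, label_map, win=4, stride=4, min_frac=0.5):
    """Return (patch, label) pairs; label 0 = background negative, 1..K = key-region type."""
    h, w = len(image), len(image[0])
    area = win * win
    samples = []
    for r in range(0, h - win + 1, stride):
        for c in range(0, w - win + 1, stride):
            counts = {}
            for i in range(r, r + win):
                for j in range(c, c + win):
                    counts[label_map[i][j]] = counts.get(label_map[i][j], 0) + 1
            patch = [row[c:c + win] for row in image[r:r + win]]
            if counts.get(0, 0) == area:
                samples.append((patch, 0))                 # pure background: negative example
            else:
                key = max((lab for lab in counts if lab != 0), key=counts.get)
                if counts[key] >= min_frac * area:
                    samples.append((patch, key))           # positive for the dominant key type
    return samples

# 8x8 toy image; key type 1 in the top-left block, key type 2 in the bottom-right block
image = [[r * 8 + c for c in range(8)] for r in range(8)]
label_map = [[1 if r < 4 and c < 4 else 2 if r >= 4 and c >= 4 else 0 for c in range(8)]
             for r in range(8)]
samples = sliding_window_samples(image, label_map)
```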

Further, the step of amplifying the micro-vibration signal of the region of interest using the Eulerian perspective and reconstructing the amplified video file containing the target object comprises:

using the Eulerian perspective, characterizing the vibration of the target object by the brightness change of each pixel in the structural component of each frame, to obtain a visual vibration signal;

receiving a magnification factor, multiplying the visual vibration signal by it, and adding the amplified visual vibration signal to the brightness change of each pixel in the structural component of each frame;

designing a band-pass filter according to the theoretical frequency band of the target object, so that small-amplitude vibrations of the region of interest within a specific frequency band are amplified;

recombining the amplified structural component of the region of interest with the texture component to generate a vibration-amplified video.
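For a single pixel, the steps above can be sketched with an ideal temporal band-pass filter implemented by a plain DFT. Several choices here are assumptions for illustration: the ideal (brick-wall) filter, the first frame as the brightness reference, and all names. Practical implementations typically use IIR or Butterworth filters and process every pixel of the ROI.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n)) for f in range(n)]

def idft(spectrum):
    n = len(spectrum)
    return [(sum(spectrum[f] * cmath.exp(2j * math.pi * f * k / n) for f in range(n)) / n).real
            for k in range(n)]

def ideal_bandpass(x, fs, f_lo, f_hi):
    """Zero every DFT bin whose |frequency| lies outside [f_lo, f_hi]."""
    n = len(x)
    spectrum = dft(x)
    for f in range(n):
        freq = min(f, n - f) * fs / n      # |frequency| of bin f, so negative bins are kept symmetrically
        if not f_lo <= freq <= f_hi:
            spectrum[f] = 0
    return idft(spectrum)

def magnify_pixel(structure, texture, fs, band, alpha):
    """structure/texture: one pixel's values over time; returns (output frames, vibration signal)."""
    delta = [v - structure[0] for v in structure]              # brightness change vs. first frame
    vibration = ideal_bandpass(delta, fs, *band)                # visual vibration signal in the band
    amplified = [s + alpha * v for s, v in zip(structure, vibration)]
    frames = [a + t for a, t in zip(amplified, texture)]        # recombine with the texture component
    return frames, vibration
```

Suppose a pixel's structural value contains an 8 Hz and a 20 Hz component, sampled at 64 Hz, with `band=(5.0, 12.0)` and `alpha=10.0`. Only the 8 Hz component is extracted, and the output frames contain it magnified 11 times, while the 20 Hz component passes through unchanged.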

A second aspect of the present invention provides a visual vibration detection method based on morphological component analysis, comprising:

applying a fast Fourier transform to the obtained visual vibration signal to obtain its spectrum;

detecting the vibration of the target object by combining the spectrum of the visual vibration signal with the vibration-amplified video obtained by the above method.
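A minimal sketch of the spectrum step has three parts: a recursive radix-2 FFT, a one-sided amplitude spectrum, and simple local-maximum peak picking. The sampling rate, the amplitude threshold `min_amp` and the test frequencies are hypothetical. Abnormal frequencies would be flagged by comparing the detected peaks with the theoretical modal frequencies.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def amplitude_spectrum(signal, fs):
    """One-sided amplitude spectrum of a mean-removed real signal."""
    n = len(signal)
    mean = sum(signal) / n
    spec = fft([v - mean for v in signal])
    freqs = [k * fs / n for k in range(n // 2 + 1)]
    amps = [abs(spec[k]) * (1 if k in (0, n // 2) else 2) / n for k in range(n // 2 + 1)]
    return freqs, amps

def find_peaks(freqs, amps, min_amp):
    """Frequencies of local maxima whose amplitude is at least min_amp."""
    return [freqs[k] for k in range(1, len(amps) - 1)
            if amps[k] >= min_amp and amps[k] > amps[k - 1] and amps[k] >= amps[k + 1]]

# Two mixed tones below 25 Hz, sampled at 128 Hz for 256 samples (hypothetical)
fs, n = 128.0, 256
signal = [math.sin(2 * math.pi * 12 * k / fs) + 0.6 * math.sin(2 * math.pi * 18 * k / fs) for k in range(n)]
freqs, amps = amplitude_spectrum(signal, fs)
peaks = find_peaks(freqs, amps, min_amp=0.1)
```

Both tones fall exactly on FFT bins here (0.5 Hz resolution), so there is no spectral leakage. With real recordings, windowing and a longer record help separate closely spaced modes.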

A third aspect of the present invention provides a visual vibration amplification system based on morphological component analysis, comprising:

an extraction module configured to obtain a video file containing a target object and to determine the region of interest of the target object in each frame of the video file;

a construction module configured to construct a structural component and a texture component that represent the region of interest in each frame of the video file; and

a reconstruction module configured to separate the structural and texture components in the region of interest, use them together with the Eulerian perspective to amplify the micro-vibration signal of the region of interest, and reconstruct an amplified video file containing the target object.

A fourth aspect of the present invention provides an electronic device comprising a processor, an input device, an output device and a memory, connected in sequence. The memory stores a computer program comprising program instructions, and the processor is configured to invoke the program instructions to perform the above method.

A fifth aspect of the present invention provides a readable storage medium storing a computer program comprising program instructions. When executed by a processor, the program instructions cause the processor to perform the above method.

Compared with the prior art, embodiments of the present invention have at least the following beneficial effects:

(1) The invention separates the structural and texture components in the region of interest of the target object and computes the vibration spectrum from the structural component with the Eulerian method. Comparing this spectrum with the theoretical vibration frequencies of the engine at idle detects abnormal engine frequencies caused by deviations of the rubber parts. The video of the engine in a faulty state can also be amplified, so that its vibration pattern can be observed and analyzed intuitively. Existing patents mostly rely on contact accelerometers or non-contact laser sensors and only output the engine's vibration spectrum. The present invention differs fundamentally both in how the engine signal is acquired and in how the engine fault state is finally presented, which gives it clear advantages.

(2) The invention represents images with fixed dictionaries and therefore does not require processing large numbers of experimental samples. Morphological component analysis yields the structural and texture components. In essence, the difference between the structural components of each current frame and the first frame corresponds to normal vibration. The difference between their texture components corresponds to abnormal disturbances. Following the Eulerian principle, this example amplifies only the structural component, which reduces interference from the texture component. The visual vibration signal extracted from the structural component is not affected by abnormal disturbances such as lighting or uneven surfaces, and it involves no phase unwrapping, so it effectively reflects the vibration of the target object. A Fourier transform of this signal yields its spectrum, from which the frequencies of multiple micro-vibrations can be calculated accurately.

(3) The invention uses morphological component analysis (MCA) to separate the structural component, texture component and noise of the region of interest in the video frames. The structural and texture components correspond to the phase and amplitude of the image, respectively. The separated structural component is combined with the Eulerian perspective to extract the visual vibration signal. This removes texture interference, improves the algorithm's robustness to lighting changes, and does not involve phase unwrapping. The Eulerian method is applied to the structural component to compute the engine spectrum. The amplified structural component is then recombined with the texture component to generate a vibration-amplified video. The resulting visualization of the target object's vibration better assists manual inspection of the engine's running state.
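Point (2) can be illustrated with synthetic data. The change of each separated component relative to the first frame is tracked at a few probe pixels. In the sketch, the structural component carries a 4 Hz vibration and the texture component carries a slow lighting drift. All values, the probe-pixel choice and the averaging are assumptions. Averaging signed changes over a large ROI can cancel opposite-sign changes on either side of an edge, so probe pixels would normally be chosen with care.

```python
import math

def difference_series(frames, pixels):
    """Mean change of one separated component at the given pixels, relative to frame 0."""
    first = frames[0]
    return [sum(f[r][c] - first[r][c] for r, c in pixels) / len(pixels) for f in frames]

# Synthetic 2x2 ROI over 32 frames sampled at 32 Hz (all values hypothetical)
fs, n_frames = 32.0, 32
structure = [[[50.0 + 0.2 * math.sin(2 * math.pi * 4 * k / fs)] * 2 for _ in range(2)]
             for k in range(n_frames)]
texture = [[[10.0 + 0.05 * k] * 2 for _ in range(2)] for k in range(n_frames)]
probes = [(0, 0), (1, 1)]
vibration = difference_series(structure, probes)    # contains only the 4 Hz oscillation
disturbance = difference_series(texture, probes)    # contains only the lighting drift
```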

Description of Drawings

The above and other objects, features and advantages of the present application will become more apparent from the following detailed description of its embodiments with reference to the drawings. The drawings provide a further understanding of the embodiments and constitute part of the specification. Together with the embodiments, they serve to explain the present application and do not limit it. In the drawings, the same reference numbers generally denote the same components or steps.

Fig. 1 is an overall flow chart of the visual vibration detection method for engine faults based on morphological component analysis;

Fig. 2 is a flow chart of the engine fault frequency detection and visualization method based on the Eulerian perspective;

Fig. 3 shows the engine assembly at idle;

Fig. 4 is a block diagram of an electronic device according to an embodiment of the present application;

Fig. 5 shows the spectrum of the mounting system of the engine assembly at idle;

Fig. 6 is a flow chart of a visual vibration amplification method based on morphological component analysis according to an embodiment of the present invention;

Fig. 7 is a block diagram of a visual vibration amplification system based on morphological component analysis according to an embodiment of the present invention.

Detailed Description of Embodiments

Exemplary embodiments of the present application are described in detail below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present application. The present application is not limited by the example embodiments described here.

Application Overview

Take engine inspection as an example. During engine vibration detection, deviations in the position and angle of the internal rubber parts change the natural frequency of the rubber system. This causes abnormal vibration frequencies when the engine runs at idle. Detecting these abnormal modal frequencies of the idling engine is essentially a problem of detecting multiple mixed frequencies. Traditional vibration detection usually attaches accelerometers to the surfaces of the engine block and metal parts. This contact-based method has clear shortcomings:

- the accelerometer's own mass alters the natural frequency of the target object;
- attaching the sensors is time-consuming and labor-intensive;
- multi-point measurement also requires sensor fusion.

Non-contact laser sensors, in turn, can only measure a single point, so full-field measurement requires many laser sensors. This is expensive, and laser sensors also suffer from time delay.

The present invention uses morphological component analysis (MCA) to separate the structural component, texture component and noise of the region of interest in the video frames. The structural and texture components correspond to the phase and amplitude of the image, respectively. The separated structural component is combined with the Eulerian perspective to extract the visual vibration signal. This removes texture interference, improves the algorithm's robustness to lighting changes, and does not involve phase unwrapping. The Eulerian method is applied to the structural component to compute the engine spectrum. The amplified structural component is then recombined with the texture component to generate a vibration-amplified video. The resulting visualization of the target object's vibration better assists manual inspection of the engine's running state.

Exemplary Method

As shown in Figs. 1 and 6, this example provides a visual vibration amplification method based on morphological component analysis, which comprises the following steps:

S100: obtain a video file containing the target object, and determine the region of interest of the target object in each frame of the video file;

Specifically, this example processes video files that contain a target object. Target objects include mechanical equipment and large structures that may vibrate in actual use, such as cables, cable-stayed bridges, engines and centrifugal pumps. The region of interest is the area of the target object whose vibration needs to be amplified. Vibration there is generally more pronounced, as in the engine block and metal parts of an engine.

The video file in this example may be a recording of the target object in operation or a video stream containing the target object. A frame containing the target object is taken from the video stream or file. For example, images can be read from a camera, using the camera's read API to read the current frame of the image stream directly. The region of interest of the target object in that frame is then determined from the frame's geometric features. The ROI can be extracted with a feature recognition algorithm or a trained image recognition model, but is not limited to these.

S200: construct a structural component and a texture component that represent the region of interest in each frame of the video file;

Specifically, a tunable Q-factor wavelet transform (TQWT) dictionary represents the structural component of the video frames, and a discrete sine/cosine dictionary (DCT) represents the texture component. This choice follows from the content of each component: structural components appear as pronounced directional edges, whereas textures appear as periodic content.

S300: separate the structural and texture components in the region of interest, use them together with the Eulerian perspective to amplify the micro-vibration signal of the target object, and reconstruct an amplified video file containing the target object.

In some embodiments, the step of separating the structural component and the texture component in the region of interest comprises:

- separating the region of interest of each frame of the video file by the MCA method to obtain the structural, texture and noise components of each frame;
- constructing a threshold selection algorithm based on the generalized Gaussian density distribution to determine the hard thresholds of the texture and structural components;
- solving for the structural and texture components by the iterative thresholding method, using these hard thresholds.

Specifically, an image separation method based on adaptive MCA is first constructed to separate the structural and texture components of each ROI frame in the video. The result is the structural component, the texture component, and the noise component N. N is the difference between the original image and the sum of the structural and texture components.

The image is decomposed into five levels with TQWT, and the generalized Gaussian density function of the fifth-level signal is selected. The fifth level contains essentially only texture components. The shape parameters α and β corresponding to the image's texture component are calculated, and the optimal rank ρ is determined. Once ρ is obtained, the iterative thresholding method is used to solve for the structural and texture components.
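The description does not give the exact threshold rule. The sketch below shows one common approach under stated assumptions.

- The GGD shape β is estimated by moment matching. This uses the fact that (E|x|)²/E[x²] = Γ(2/β)²/(Γ(1/β)Γ(3/β)) increases monotonically with β.
- The scale α then follows from E|x| = αΓ(2/β)/Γ(1/β).
- The threshold uses the BayesShrink rule T = σ_n²/σ_x, which was derived for GGD-distributed wavelet coefficients.

The BayesShrink rule is a swapped-in example, and the optimal rank ρ step is not reproduced.

```python
import math
import random

def ggd_ratio(beta):
    """(E|x|)^2 / E[x^2] for a zero-mean generalized Gaussian with shape beta."""
    return math.exp(2 * math.lgamma(2 / beta) - math.lgamma(1 / beta) - math.lgamma(3 / beta))

def fit_ggd(samples, lo=0.1, hi=10.0, iters=100):
    """Moment-matching estimate of (alpha, beta); bisection on beta, since the ratio is monotone."""
    n = len(samples)
    m1 = sum(abs(v) for v in samples) / n
    m2 = sum(v * v for v in samples) / n
    target = m1 * m1 / m2
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ggd_ratio(mid) < target:
            lo = mid       # ratio too small: the shape parameter must be larger
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    alpha = m1 * math.exp(math.lgamma(1 / beta) - math.lgamma(2 / beta))
    return alpha, beta

def bayes_shrink_threshold(coeffs, noise_sigma):
    """BayesShrink: T = sigma_n^2 / sigma_x; infinite (discard all) if no signal variance remains."""
    var_y = sum(v * v for v in coeffs) / len(coeffs)
    sigma_x = math.sqrt(max(var_y - noise_sigma ** 2, 0.0))
    return math.inf if sigma_x == 0.0 else noise_sigma ** 2 / sigma_x

# Laplacian samples (GGD with alpha = 1, beta = 1) as a sanity check
rng = random.Random(1)
laplace = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(20000)]
alpha_hat, beta_hat = fit_ggd(laplace)
```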

Further, as shown in Fig. 2, the step of amplifying the micro-vibration signal of the region of interest using the Eulerian perspective and reconstructing the amplified video file containing the target object comprises:

- characterizing the vibration of the target object by the brightness change of each pixel in the structural component of each frame, to obtain the visual vibration signal;
- receiving a magnification factor, multiplying the visual vibration signal by it, and adding the amplified signal to the brightness change of each pixel in the structural component of each frame;
- designing a band-pass filter according to the theoretical frequency band of the target object, so that small-amplitude vibrations of the region of interest within a specific band are amplified;
- recombining the amplified structural component of the region of interest with the texture component to generate the vibration-amplified video.

Specifically, following the Eulerian principle, the brightness change of each pixel in the structural component of each video frame represents the vibration of the target object and yields the visual vibration signal. The magnification factor is set manually and multiplied by the visual vibration signal. The amplified signal is then added to the brightness change of each pixel in the structural component of each frame. The amplified structural component is recombined with the texture component obtained by the MCA separation above to generate the vibration-amplified video. The band-pass filter, designed according to the theoretical frequency band of the target object, lets small-amplitude vibrations within that band be amplified to an easily perceptible level. This visualizes the vibration of the target object. The purpose of this Eulerian fault-frequency detection and visualization method is to apply visual magnification to regions where the micro-vibration amplitude changes only slightly. This enhances the visualization of the vibration signal and facilitates analysis of the target object's vibration.

As a variation, in step S100 the region of interest (ROI) is obtained as follows:

An SVM recognition model is built, and one frame of the video file data is input into it to obtain the key region of the target object, which is used as the region of interest; the SVM recognition model is obtained by training on a training set. A key region is a region of the target object where vibration is pronounced, for example the engine block and the metal parts of an engine.

To improve ROI recognition, the training set is preferably obtained by the following steps:

Multiple groups of images containing the target object are acquired, and the watershed algorithm segments each group into candidate regions, each of which contains only one kind of key region of the target object, for example the tubular, linear, or block-shaped regions of an engine.

Different video files containing the target object are extracted and their key regions are labeled. Using a sliding window, image patches labeled with key regions are taken as positive examples and background image patches as negative examples to build the multi-class training set; here, a background image patch is one that contains no key region.
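The sliding-window construction of the training set can be sketched as follows. This is a minimal illustration under assumptions that are not in the source: each annotated frame comes with an integer label map (0 for background, 1, 2, ... for key-region classes), patches are square, and the window size, stride, and `min_cover` rule are hypothetical choices. The function name is also hypothetical.

```python
import numpy as np

def sliding_window_samples(image, label_map, win=32, stride=16, min_cover=0.5):
    """Cut labelled training patches out of one annotated frame (hypothetical helper).

    image:     2-D grayscale frame.
    label_map: 2-D non-negative int array of the same shape; 0 = background,
               1, 2, ... = key-region classes (e.g. 1 = metal part, 2 = engine block).
    A window containing no key-region pixel becomes a negative example (label 0).
    A window whose most frequent key class covers at least `min_cover` of it
    becomes a positive example of that class. Partially covered windows are skipped.
    """
    samples, labels = [], []
    h, w = label_map.shape
    for r in range(0, h - win + 1, stride):
        for c in range(0, w - win + 1, stride):
            window = label_map[r:r + win, c:c + win]
            counts = np.bincount(window.ravel())
            if counts[0] == window.size:          # pure background -> negative example
                samples.append(image[r:r + win, c:c + win])
                labels.append(0)
                continue
            cls = int(np.argmax(counts[1:])) + 1  # most frequent key-region class
            if counts[cls] / window.size >= min_cover:
                samples.append(image[r:r + win, c:c + win])
                labels.append(cls)
    return np.array(samples), np.array(labels)
```

The resulting patches could then be flattened into feature vectors and fed to an SVM implementation such as scikit-learn's `sklearn.svm.SVC`; the source does not say which features or kernel are used.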

The acquisition of the engine's region of interest is described below as an example.

S101: The watershed algorithm is applied to an image containing an engine as the target object to obtain the boundaries between regions, and hence the candidate regions delimited by those boundaries, each containing only one type of component; the candidate regions here may be the tubular, linear, or block-shaped regions of the engine.

S102: Following step S101, different engine videos are extracted, and the key objects in different candidate regions of multiple frames of each group of engine videos are labeled to enrich the training samples, for example labeling tubular regions as "metal part" and block-shaped regions as "engine block". Videos of engines of different models, shot from different angles and under different lighting conditions, can be used to enrich the training set.

S103: Using a sliding window, image patches with key-region labels are taken as positive examples and background image patches as negative examples to build the multi-class training set. When the SVM recognition model is tested, a residual network takes an image patch as input, applies convolution, nonlinear activation, and pooling at each level, and finally feeds the result into two fully connected layers that output a score for each candidate class; the class with the highest score is taken as the predicted category of the component. In this example, the engine block and metal parts are selected as the regions of interest (ROI) for extracting the engine's visual vibration signal.
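Step S101 does not say which watershed variant is used. As a hedged sketch, the following implements marker-controlled watershed by priority flooding (Meyer's algorithm) with NumPy and `heapq`. The seed markers and the elevation map (for example, a gradient magnitude of the frame) are assumptions, the function name is hypothetical, and this version labels every pixel rather than drawing explicit watershed lines. Production code would more likely call `skimage.segmentation.watershed`.

```python
import heapq
import numpy as np

def marker_watershed(elevation, markers):
    """Flood `elevation` from labelled seeds (Meyer's priority-flood watershed).

    elevation: 2-D float array, e.g. the gradient magnitude of a frame.
    markers:   2-D int array, >0 at seed pixels, 0 elsewhere.
    Returns a label array in which every pixel reachable from a seed carries
    that seed's label; region boundaries form where floods from different seeds meet.
    """
    labels = markers.copy()
    h, w = elevation.shape
    heap = []
    counter = 0  # tie-breaker: equal elevations are processed in insertion order
    for r, c in zip(*np.nonzero(markers)):
        heapq.heappush(heap, (elevation[r, c], counter, r, c))
        counter += 1
    while heap:
        _, _, r, c = heapq.heappop(heap)  # lowest unprocessed pixel first
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and labels[nr, nc] == 0:
                labels[nr, nc] = labels[r, c]  # inherit the flooding basin's label
                heapq.heappush(heap, (elevation[nr, nc], counter, nr, nc))
                counter += 1
    return labels
```

Each resulting label corresponds to one candidate region, which would then be annotated as described in S102.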

In some embodiments, step S200 may include the following steps:

S201: A tunable Q-factor wavelet transform (TQWT) dictionary represents the structural component of the video image, and a discrete sine/cosine dictionary represents its texture component;

S202: The MCA method separates each frame of the video into the structural component Y₁ and the texture component Y₂ of the image (i.e., the cropped region-of-interest image), and the noise component N, which is the difference between the original image and the sum of the two components:

$$\delta x = Y_1 + Y_2 + N \qquad (1)$$

The threshold selection algorithm based on the generalized Gaussian density distribution comprises the following steps:

Step S311: Solve for y₁ and y₂ by the iterative thresholding method.

According to the MCA algorithm, the objective function takes the following form:

where Φ₁ denotes the basis functions of the TQWT, Φ₂ denotes the basis functions of the DCT, the superscript T denotes the transpose, y₁ denotes the coefficients in the TQWT domain, and y₂ denotes the corresponding coefficients in the DCT domain;
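The patent's objective function did not survive text extraction. For orientation only, the textbook ℓ1-penalized MCA objective with a TQWT synthesis operator Φ₁ and a DCT synthesis operator Φ₂ is shown below. Treat it as an assumed standard form, not as the patent's exact equation, which may for instance be written with the analysis operators Φ₁ᵀ and Φ₂ᵀ:

```latex
\{\hat{y}_1,\hat{y}_2\}
  = \arg\min_{y_1,\,y_2}\;
    \tfrac{1}{2}\,\lVert x - \Phi_1 y_1 - \Phi_2 y_2 \rVert_2^2
    + \lambda_1 \lVert y_1 \rVert_1
    + \lambda_2 \lVert y_2 \rVert_1
```

Here x is the vectorized region-of-interest image and λ₁, λ₂ weight the sparsity of the two coefficient sets; under this reading the separated components correspond to Y₁ = Φ₁ŷ₁ and Y₂ = Φ₂ŷ₂ in Eq. (1).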

y₁ and y₂ are solved by the iterative thresholding method:

where the quantities involved are the estimate of y₁, the noise Y_n, and the estimation error of Y₂ from the previous iteration.

Here, n₁ denotes the TQWT coefficients of the noise, and the accompanying term denotes the TQWT coefficients of that estimation error.
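The iterative thresholding idea can be illustrated with a small, runnable sketch. It deliberately swaps in simpler pieces than the patent describes: a 1-D signal instead of an image, an orthonormal DCT basis and the identity (spike) basis instead of the TQWT/DCT pair, and a linearly decreasing threshold in the spirit of the classic MCA algorithm. The GGD-based rank selection of S312 is not modelled, and all function names are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II analysis matrix C (rows are cosine atoms), so C @ C.T = I."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def hard(coeffs, lam):
    """Hard thresholding: keep coefficients with |c| > lam, zero the rest."""
    return np.where(np.abs(coeffs) > lam, coeffs, 0.0)

def mca_separate(x, phi1, phi2, n_iter=50, lam_min=1e-3):
    """Two-dictionary MCA by alternating hard thresholding.

    phi1, phi2: orthonormal synthesis matrices (columns are atoms), so each
    analysis operator is simply the transpose. The threshold decreases linearly
    from the largest analysis coefficient of x down to lam_min.
    """
    s1 = np.zeros_like(x)
    s2 = np.zeros_like(x)
    lam_max = max(np.abs(phi1.T @ x).max(), np.abs(phi2.T @ x).max())
    for lam in np.linspace(lam_max, lam_min, n_iter):
        # update component 1 with component 2 held fixed
        s1 = phi1 @ hard(phi1.T @ (x - s2), lam)
        # update component 2 with the new component 1 held fixed
        s2 = phi2 @ hard(phi2.T @ (x - s1), lam)
    return s1, s2
```

For example, a single cosine atom plus one spike separates cleanly, because the two bases are highly incoherent. Real images would need the patent's TQWT dictionary, for which no library call is assumed here.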

Further, to estimate y₁ accurately, the estimation error of y₂ must be removed accurately. Like y₂, this error is a resonance (structural) component; therefore, when it is expressed by TQWT coefficients, the resulting coefficient matrix has a lower rank than r₁.

S312: Determine the optimal rank of this error term, in order to estimate it accurately in Eq. (3). Here, y₁ is related to the generalized Gaussian density function:

where α and β denote the generalized shape parameters.

The shape parameter α corresponding to the optimal rank ρ should produce an estimate that is closest to y₁; this can be written as the following objective function:

where α_ρ and β_ρ denote the optimal values of the generalized shape parameters α and β, respectively.

Here, the TQWT decomposes the image into 5 levels; the generalized Gaussian density function of the level-5 signal (which contains essentially only impulsive components) is selected, its shape parameters α and β are computed, and the optimal rank ρ is determined by Eq. (17).
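The patent does not state how α and β are computed from the level-5 TQWT coefficients. One common option, shown here as an assumption, is moment matching: for the density p(x) = β / (2αΓ(1/β)) · exp(−(|x|/α)^β), the ratio (E|x|)² / E[x²] = Γ(2/β)² / (Γ(1/β)Γ(3/β)) depends only on the shape β and increases monotonically with it, so β can be found by bisection and α then follows from E|x|. In this parameterization α is the scale and β the shape; whether the patent's α and β play the same roles cannot be confirmed from the extracted text, so the code uses the names `scale` and `shape`. Function names are hypothetical.

```python
import math

def ggd_moment_ratio(shape):
    """(E|x|)^2 / E[x^2] for a zero-mean generalized Gaussian with the given shape."""
    return math.gamma(2.0 / shape) ** 2 / (math.gamma(1.0 / shape) * math.gamma(3.0 / shape))

def fit_ggd(samples, lo=0.1, hi=10.0, iters=100):
    """Moment-matching fit of p(x) = shape / (2*scale*Gamma(1/shape)) * exp(-(|x|/scale)**shape).

    Returns (scale, shape). The bracket [lo, hi] keeps math.gamma away from overflow.
    """
    n = len(samples)
    m1 = sum(abs(v) for v in samples) / n   # sample E|x|
    m2 = sum(v * v for v in samples) / n    # sample E[x^2]
    target = m1 * m1 / m2
    for _ in range(iters):                  # bisection on the monotone moment ratio
        mid = 0.5 * (lo + hi)
        if ggd_moment_ratio(mid) < target:
            lo = mid
        else:
            hi = mid
    shape = 0.5 * (lo + hi)
    scale = m1 * math.gamma(1.0 / shape) / math.gamma(2.0 / shape)
    return scale, shape
```

A Laplacian sample should give a shape near 1 and a Gaussian sample a shape near 2; how the fitted parameters then enter the rank criterion of Eq. (17) is not recoverable from the extracted text.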

After the optimal rank ρ is obtained, the iterative thresholding method is used to solve for the impulsive component (texture component) and the resonance component (structural component).

Step S320: The adaptive-MCA-based image separation method comprises the following steps:

S321: Solve for the structural component.

It can be obtained by solving the following optimization problem:

where the minimizer is the optimal estimate of the estimation error of Y₂ from the previous iteration, and ‖x‖_F is the Frobenius norm of x. Using the singular value decomposition (SVD), r₁ can be factored as r₁ = UΣVᵀ, where Σ = diag(σ₁, …, σ_n) and U and V are the matrices of left and right singular vectors, respectively;

According to the Eckart-Young-Mirsky theorem, the solution of Eq. (9) can be written as:

where η_ρ(Σ) = diag(σ₁, …, σ_ρ, 0, …, 0), i.e., only the ρ largest singular values are kept.
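The closed-form step given by the Eckart-Young-Mirsky theorem is a truncated SVD. A minimal NumPy sketch, assuming r₁ is available as a 2-D coefficient matrix and ρ has already been chosen in S312 (the function name is hypothetical):

```python
import numpy as np

def best_rank_approx(r, rho):
    """Best rank-rho approximation of r in the Frobenius norm (Eckart-Young-Mirsky).

    Factor r = U diag(s) V^T and keep only the rho largest singular values,
    i.e. apply eta_rho(Sigma) = diag(s_1, ..., s_rho, 0, ..., 0).
    """
    u, s, vt = np.linalg.svd(r, full_matrices=False)  # s is sorted in descending order
    s_trunc = s.copy()
    s_trunc[rho:] = 0.0
    return (u * s_trunc) @ vt  # scales column j of u by s_trunc[j], same as u @ diag(s_trunc)
```

The approximation error in the Frobenius norm equals the root-sum-square of the discarded singular values, which is why keeping the largest ones is optimal.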

The rough estimate of y₁ is obtained as follows:

where the two terms are the optimal solutions for y₁ and y₂, respectively. The final estimate is determined by hard thresholding as follows:

where λ₁ = MAD/0.6745, MAD is the median of the absolute values of the coefficients, and H denotes the hard-thresholding operator;
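The threshold λ₁ = MAD/0.6745 is the familiar robust noise estimate: for Gaussian noise the median absolute value is about 0.6745σ, so the ratio estimates σ. The sketch below, with hypothetical function names, computes it and applies the hard-thresholding operator H. Which coefficients feed the MAD in the patent is not recoverable from the extracted text, so passing the coefficients being thresholded is an assumption.

```python
import numpy as np

def mad_threshold(coeffs):
    """lambda = MAD / 0.6745, where MAD is the median of |coeffs| (robust noise-sigma estimate)."""
    return np.median(np.abs(coeffs)) / 0.6745

def hard_threshold(coeffs, lam):
    """H_lambda: keep entries whose magnitude exceeds lam and set the rest to zero."""
    coeffs = np.asarray(coeffs, dtype=float)
    return np.where(np.abs(coeffs) > lam, coeffs, 0.0)
```

Typical use is `hard_threshold(c, mad_threshold(c))`: on pure Gaussian noise of standard deviation σ, `mad_threshold` returns approximately σ.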

S322: Solve for the texture component.

Following Eq. (3), and similarly to r₁, r₂ can be defined as:

where the first term and n₂ are the DCT coefficients of the corresponding error term and of the noise, respectively; the hard threshold λ₂ here is proportional to that error. The error R_T is defined as follows:

From this, the error term can be estimated as:

where the required quantity can be computed by Eq. (10), and the hard threshold λ₂ can be set accordingly. The final estimate is determined by hard thresholding as follows:

In some embodiments, taking an engine as the target object, the step of amplifying the micro-vibration signal of the region of interest from the Eulerian perspective and reconstructing the amplified video file data containing the target object includes:

Step S331: Following the Eulerian approach, the brightness change of each pixel in the structural component of each video frame represents the vibration of the target object. The magnification factor is set manually and multiplied by the brightness change of each pixel, and the amplified visual vibration signal is added to the brightness change of each pixel in the structural component of each frame. The amplified structural component is then recombined with the texture component separated by the MCA method in the preceding steps to generate a vibration-magnified video. In this video, a band-pass filter designed according to the theoretical frequency band of the target object allows small-amplitude vibrations within a specific band to be amplified until they are easy to perceive, which visualizes the vibration of the target object. The formula is:

$$\delta = \sum_{k} h\big(\alpha \cdot (Y_{1k} - Y_{11})\big) + Y_2 \qquad (19)$$

where α is the magnification factor, h the band-pass filter, k the index of the current frame, and δ the amplified video; Y₁₁ is the structural component of the first frame, Y₂ the texture component, and Y₁ₖ the structural component of the k-th frame.
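The amplification step can be sketched with NumPy as below. Several points are assumptions rather than the patent's specification: h is an ideal FFT band-pass along the time axis, the structural components are stacked as a (T, H, W) array sampled at `fs` frames per second, and each output frame is taken as Y₁ₖ + α·h(Y₁ₖ − Y₁₁) + Y₂. That last form follows the prose of step S331 (each pixel's own value plus the amplified, band-passed change, recombined with the texture), which is the usual linear Eulerian form. Eq. (19) as extracted sums over k and omits the Y₁ₖ term, so this reading is an interpretation. Because h is linear, h(α·d) and α·h(d) coincide. Function names are hypothetical.

```python
import numpy as np

def ideal_bandpass(signal, fs, f_lo, f_hi):
    """Ideal temporal band-pass along axis 0: zero every FFT bin outside [f_lo, f_hi]."""
    n = signal.shape[0]
    spec = np.fft.rfft(signal, axis=0)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    keep = (freqs >= f_lo) & (freqs <= f_hi)
    spec[~keep] = 0.0
    return np.fft.irfft(spec, n=n, axis=0)

def magnify_structure(y1, y2, alpha, fs, f_lo, f_hi):
    """Linear Eulerian magnification applied to the structural component only.

    y1: (T, H, W) structural components Y_1k, one per frame.
    y2: texture component(s), broadcastable to y1, e.g. (T, H, W) or (H, W).
    Each output frame is Y_1k + alpha * h(Y_1k - Y_11) + Y_2.
    """
    variation = y1 - y1[0]  # brightness change relative to the first frame
    amplified = alpha * ideal_bandpass(variation, fs, f_lo, f_hi)
    return y1 + amplified + y2
```

An ideal FFT mask is the simplest band-pass to reason about; a practical implementation might prefer a Butterworth or IIR filter to limit ringing, and the source does not specify the filter design.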

Step S332: The FFT of the obtained visual vibration signal gives its spectrum, from which the frequencies of the target object are detected. The formula is:

$$\delta y(t) = \sum_{k} \sum_{i} h\big(Y_{1k}(i) - Y_{11}(i)\big) \qquad (20)$$

where δy(t) is the visual vibration signal and i indexes the pixels of each video frame.
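Equation (20) and the frequency detection of step S332 can be sketched as below. Assumptions: the structural components are stacked as a (T, H, W) NumPy array sampled at `fs` frames per second, Eq. (20) is read as one value per frame (the sum over pixels i of that frame's change from the first frame), the band-pass h is taken as the identity so that every vibration frequency stays in the spectrum, and the dominant frequencies are read off as the largest FFT magnitude bins. Function names are hypothetical.

```python
import numpy as np

def visual_vibration_signal(y1):
    """delta_y(k) = sum_i (Y_1k(i) - Y_11(i)): summed change of the structural component per frame."""
    return (y1 - y1[0]).reshape(y1.shape[0], -1).sum(axis=1)

def dominant_frequencies(signal, fs, n_peaks=2):
    """Return the n_peaks strongest non-DC frequencies of the magnitude spectrum, ascending."""
    sig = signal - signal.mean()
    mag = np.abs(np.fft.rfft(sig))
    freqs = np.fft.rfftfreq(sig.size, d=1.0 / fs)
    mag[0] = 0.0                      # ignore any residual DC
    top = np.argsort(mag)[-n_peaks:]  # indices of the largest magnitudes
    return np.sort(freqs[top])
```

Picking the top bins is the simplest possible peak detector. With spectral leakage, adjacent bins of one peak can both rank highly, so a practical version would select local maxima (for example with `scipy.signal.find_peaks`) and may apply a window before the FFT.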

It should be noted that the applicant found the following during research. The paper [N. Wadhwa, M. Rubinstein, F. Durand, and W. Freeman, "Phase-based video motion processing," ACM Trans. Graph., vol. 32, no. 4, Jul. 2013, Art. no. 80] uses phase-based motion magnification, which is sensitive to phase wrapping and to the unwrapping algorithm, making the detection of mixed frequencies inaccurate. The paper [Learning-based Video Motion Magnification, 2018] uses deep learning to separate shape and texture components, but its network serves only to achieve motion magnification, and deep learning depends on a large number of samples, so it is not currently suited to practical vibration measurement. The paper [A New Approach to the Phase-Based Video Motion Magnification for Measuring Microdisplacements, 2019] uses the Hilbert transform to decompose the complex steerable pyramid; it is still essentially phase-based motion magnification and cannot avoid the effects of phase wrapping.

Unlike the above methods, this example uses fixed dictionaries and does not require a large number of experimental samples. It obtains the structural and texture components by morphological component analysis. In essence, the structural-component difference between each current frame and the first frame corresponds to normal vibration, while the texture-component difference between each current frame and the first frame corresponds to abnormal disturbance. Following the Eulerian principle, this example amplifies only the structural component, which reduces the interference introduced by the texture component. At the same time, the visual vibration signal extracted from the structural component is unaffected by abnormal disturbances (such as lighting changes or uneven surfaces) and requires no phase unwrapping, so it effectively reflects the vibration of the target object; applying the Fourier transform to it yields a spectrum from which the frequencies of multiple micro-vibrations can be calculated accurately.

As shown in FIGS. 1 and 2, this example also provides a visual vibration detection method based on morphological component analysis: the obtained visual vibration signal is transformed by FFT to obtain its spectrum; the spectrum is combined with the vibration-magnified video obtained in the above embodiment to detect the vibration of the target object; and engineers analyze engine faults on that basis.

Specifically, the brightness change of each pixel in the structural component of each video frame represents the vibration of the target object and yields the visual vibration signal. The FFT of this signal gives its spectrum, from which the frequencies of the target object are detected. In the magnified video, a band-pass filter designed according to the theoretical frequency band of the target object allows small-amplitude vibrations within a specific band to be amplified until they are easy to perceive, visualizing the vibration of the target object. Engine faults are then analyzed by combining the fault-visualization video with the fault spectrum. The engine in this example may be a diesel engine, a gasoline engine, an electric-vehicle motor, or the like.

Exemplary device

As shown in FIG. 7, a visual vibration amplification system based on morphological component analysis includes:

Extraction module 20, configured to acquire video file data containing the target object and to select the region of interest of the target object according to the geometric features of one frame of the video file data;

Construction module 30, configured to construct the structural component representing the region of interest of each frame of the video file, and to construct the texture component representing the region of interest of each frame of the video file; and

Reconstruction module 40, configured to separate the structural component and the texture component in the region of interest, amplify the micro-vibration signal of the target object from the Eulerian perspective using the separated components, and reconstruct an amplified video file containing the target object.

Exemplary electronic device

An electronic device according to an embodiment of the present application is described below with reference to FIG. 1. The electronic device may be the mobile device itself or a stand-alone device independent of it; the stand-alone device can communicate with the mobile device to receive the collected input signals from it and to send it the selected target decision behavior.

FIG. 4 is a block diagram of an electronic device according to an embodiment of the present application.

As shown in FIG. 4, electronic device 10 includes one or more processors 11 and a memory 12.

Processor 11 may be a central processing unit (CPU) or another form of processing unit with data processing and/or instruction execution capability, and may control other components of electronic device 10 to perform desired functions.

Memory 12 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache memory. Non-volatile memory may include, for example, read-only memory (ROM), a hard disk, or flash memory. One or more computer program instructions may be stored on the computer-readable storage medium, and processor 11 may execute them to implement the behavior decision-making methods of the embodiments of the present application described above and/or other desired functions.

In one example, electronic device 10 may further include an input device 13 and an output device 14, interconnected by a bus system and/or another form of connection mechanism (not shown). For example, input device 13 may include devices such as an on-board diagnostics system (OBD), unified diagnostic services (UDS), an inertial measurement unit (IMU), a camera, lidar, millimeter-wave radar, ultrasonic radar, and vehicle-to-everything communication (V2X). Input device 13 may also include, for example, a keyboard and a mouse. Output device 14 may include, for example, a display, speakers, a printer, and a communication network with the remote output devices connected to it.

Of course, for simplicity, FIG. 4 shows only some of the components of electronic device 10 that are relevant to the present application, omitting components such as buses and input/output interfaces. Depending on the specific application, electronic device 10 may also include any other suitable components.

Exemplary computer program product and computer-readable storage medium

In addition to the methods and devices described above, embodiments of the present application may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps of the behavior decision-making methods according to the various embodiments of the present application described in the "Exemplary method" section of this specification.

The computer program product may contain program code for performing the operations of the embodiments of the present application, written in any combination of one or more programming languages, including object-oriented languages such as Java and C++ and conventional procedural languages such as "C" or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server.

Furthermore, embodiments of the present application may also be a computer-readable storage medium storing computer program instructions that, when executed by a processor, cause the processor to perform the steps of the behavior decision-making methods according to the various embodiments of the present application described in the "Exemplary method" section of this specification.

The computer-readable storage medium may be any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific (non-exhaustive) examples of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.

The basic principles of the present application have been described above with reference to specific embodiments. However, the merits, advantages, and effects mentioned in this application are only examples, not limitations, and should not be regarded as required by every embodiment of the application. The specific details disclosed above serve only as examples and to aid understanding; they are not limiting, and the application need not be implemented with those specific details.

The block diagrams of components, apparatuses, devices, and systems in this application are merely illustrative examples and are not intended to require or imply that connections, arrangements, or configurations must follow the block diagrams. As those skilled in the art will recognize, these components, apparatuses, devices, and systems may be connected, arranged, and configured in any manner. Words such as "include", "comprise", and "have" are open-ended, mean "including but not limited to", and may be used interchangeably with that phrase. The words "or" and "and" as used herein mean "and/or" and may be used interchangeably with it, unless the context clearly indicates otherwise. The words "such as" as used herein mean "such as but not limited to" and may be used interchangeably with that phrase.

It should also be noted that, in the apparatuses, devices, and methods of the present application, components or steps may be decomposed and/or recombined; such decompositions and/or recombinations should be regarded as equivalents of the present application.

The above description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other aspects without departing from the scope of the application. Therefore, the application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The foregoing description has been presented for purposes of illustration and description and is not intended to limit the embodiments of the application to the forms disclosed herein. Although a number of example aspects and embodiments have been discussed above, those skilled in the art will recognize certain variations, modifications, changes, additions, and sub-combinations thereof.

Claims

1. A visual vibration amplification method based on morphological component analysis, characterized by comprising the following steps:

S100: acquiring a video file containing a target object, and determining a region of interest of the target object in each frame of the video file;

S200: constructing a structural component representing the region of interest in each frame of the video file, and constructing a texture component representing the region of interest in each frame of the video file, wherein a tunable Q-factor wavelet dictionary is constructed to represent the structural components of the image, and a local discrete cosine transform and discrete sine transform dictionary is constructed to represent the texture components of the image;

S300: separating the structural component, the texture component, and the noise component in the region of interest; amplifying the micro-vibration signal of the region of interest from the Eulerian perspective using the separated structural and texture components; and reconstructing an amplified video file containing the target object; wherein the structural-component difference between each current frame and the first frame corresponds to normal vibration, the texture-component difference between each current frame and the first frame corresponds to abnormal disturbance, and, following the Eulerian principle, only the structural component is amplified, which reduces the interference introduced by the texture component;

the step of separating the structural component and the texture component in the region of interest comprises:

separating the region of interest of each frame of the video file by the MCA method to obtain the structural component, the texture component, and the noise component of each frame, the noise component being the difference between the original image and the sum of the structural and texture components;

constructing a threshold selection algorithm based on the generalized Gaussian density distribution, and determining the hard thresholds of the texture component and the structural component;

solving for the structural component and the texture component by the iterative thresholding method according to the hard thresholds of the texture component and the structural component;

the step of amplifying the micro-vibration signal of the region of interest from the Eulerian perspective and reconstructing the amplified video file data containing the target object comprises:

characterizing the vibration of the target object by the brightness change of each pixel in the structural component of each frame of the video file, to obtain a visual vibration signal;

receiving a magnification factor, multiplying the visual vibration signal by the magnification factor, and adding the amplified visual vibration signal to the brightness change of each pixel in the structural component of each frame;

designing a band-pass filter according to the theoretical frequency band of the target object, so that small-amplitude vibrations of the region of interest within a specific frequency band are amplified;

and recombining the amplified structural component of the region of interest with the texture component to generate the vibration-magnified video.

2. The visual vibration amplification method based on morphological component analysis according to claim 1, wherein the step of determining the region of interest of the target object in each frame of the video file comprises:

constructing an SVM recognition model, inputting one frame of the video file data into the SVM recognition model, and acquiring a key region of the target object as the region of interest, the SVM recognition model being obtained by training on a training set.

3. The visual vibration amplification method based on morphological component analysis according to claim 2, wherein the step of obtaining the training set comprises:

acquiring a plurality of groups of images containing the target object, segmenting each group of images into candidate regions using a watershed algorithm, wherein each candidate region contains only one group of key objects of the target object, and labeling the candidate regions;

and, using a sliding window, taking candidate-region image patches labeled as key objects as positive examples and background image patches labeled as irrelevant objects as negative examples, to construct a multi-class classification training set.
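Training-set construction with a sliding window can be sketched as follows, assuming the watershed segmentation has already produced an integer label map in which 0 marks background or irrelevant objects and k > 0 marks key-object class k. The window size, stride, 50 % purity rule, the choice to skip mixed windows, and the function name are hypothetical; the resulting patches and labels could then be passed to any SVM implementation.

```python
import numpy as np


def sliding_window_dataset(image, label_map, win, stride, purity=0.5):
    """Cut image into win x win patches and label them from a segmentation label map.

    label_map: integer array, 0 = background / irrelevant object, k > 0 = key-object class k.
    Returns (patches, labels); mixed windows below `purity` are skipped.
    """
    patches, labels = [], []
    h, w = label_map.shape
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            region = label_map[y:y + win, x:x + win]
            if not region.any():
                label = 0                                    # negative example
            else:
                classes, counts = np.unique(region[region > 0], return_counts=True)
                best = int(np.argmax(counts))
                if counts[best] / region.size < purity:
                    continue                                 # ambiguous window, skipped
                label = int(classes[best])                   # positive example of class k
            patches.append(image[y:y + win, x:x + win])
            labels.append(label)
    return np.stack(patches), np.array(labels)


# Usage: an 8x8 image whose top-left 4x4 block is a key object of class 1.
image = np.arange(64, dtype=float).reshape(8, 8)
label_map = np.zeros((8, 8), dtype=int)
label_map[:4, :4] = 1
patches, labels = sliding_window_dataset(image, label_map, win=4, stride=4)
```

Skipping windows that straddle a key object and the background keeps the positive and negative examples cleanly separated, at the cost of discarding some boundary samples.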

4. A visual vibration detection method based on morphological component analysis, characterized by comprising the following steps:

applying a fast Fourier transform to the obtained visual vibration signal to obtain its corresponding frequency spectrum;

and detecting the vibration condition of the target object by combining the frequency spectrum of the visual vibration signal with the vibration-amplified video obtained by the method of any one of claims 1 to 3.
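The detection step relies on the FFT spectrum of the visual vibration signal; for a target excited by several sources, the spectrum shows several peaks. The sketch below assumes a uniformly sampled signal at frame rate `fs` and a simple local-maximum peak picker; the peak-picking rule, the `min_rel` cut-off, and the function names are assumptions, not part of the claim.

```python
import numpy as np


def vibration_spectrum(signal, fs):
    """Single-sided amplitude spectrum of a uniformly sampled signal."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                      # drop the DC (mean brightness) term
    amp = np.abs(np.fft.rfft(x)) * 2.0 / x.size
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    return freqs, amp


def dominant_frequencies(freqs, amp, k=3, min_rel=0.1):
    """Return up to k local-maximum frequencies whose amplitude >= min_rel * peak."""
    floor = min_rel * amp.max()
    peaks = [
        i for i in range(1, amp.size - 1)
        if amp[i] >= amp[i - 1] and amp[i] > amp[i + 1] and amp[i] >= floor
    ]
    peaks.sort(key=lambda i: amp[i], reverse=True)
    return sorted(float(freqs[i]) for i in peaks[:k])


# Usage: a synthetic vibration signal mixing 7 Hz and 23 Hz, sampled at 200 frames/s.
fs = 200.0
t = np.arange(400) / fs
sig = np.sin(2 * np.pi * 7 * t) + 0.5 * np.sin(2 * np.pi * 23 * t)
freqs, amp = vibration_spectrum(sig, fs)
print(dominant_frequencies(freqs, amp))   # -> [7.0, 23.0]
```

In practice the frame rate bounds the measurable band (Nyquist limit fs / 2), and the record length sets the frequency resolution (fs / N).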

5. A visual vibration amplification system based on morphological component analysis, comprising:

the extraction module is configured to acquire a video file containing a target object and to determine the region of interest of the target object in each frame of the video file;

the construction module is configured to construct a structure component and a texture component representing the region of interest in each frame of the video file, wherein a quality-factor-adjustable wavelet dictionary is constructed to represent the structure component of the image, and a local discrete cosine transform and discrete sine transform dictionary is constructed to represent the texture component of the image; and

the reconstruction module is configured to separate the structure component and the texture component in the region of interest, to amplify the micro-vibration signal of the region of interest from the Eulerian perspective using the separated structure and texture components, and to reconstruct an amplified video file containing the target object;

wherein the difference between the structure component of each current frame and that of the first frame corresponds to normal vibration, and the difference between the texture component of each current frame and that of the first frame corresponds to abnormal disturbance, so that only the structure component is amplified using the Eulerian principle and the interference introduced by the texture component is reduced.
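The structure-only amplification in claim 5 can be sketched frame-wise: the structure difference from the first frame is scaled by α, while the texture component is passed through unscaled. The array layout (T, H, W) and the function name are hypothetical, and no temporal band-pass filter is applied here, for brevity.

```python
import numpy as np


def amplify_structure_only(structure, texture, alpha):
    """Amplify frame-to-first-frame structure changes; leave texture unamplified.

    structure, texture: arrays of shape (T, H, W) from the per-frame separation.
    """
    structure = np.asarray(structure, dtype=float)
    texture = np.asarray(texture, dtype=float)
    # Difference from the first frame: treated as the normal vibration.
    s_diff = structure - structure[0]
    # Texture differences (abnormal disturbance) are not scaled by alpha.
    return structure + alpha * s_diff + texture


# Usage: one pixel over three frames, with a texture disturbance in frames 1 and 2.
structure = np.array([0.0, 1.0, 2.0]).reshape(3, 1, 1)
texture = np.array([0.0, 5.0, -5.0]).reshape(3, 1, 1)
out = amplify_structure_only(structure, texture, alpha=2.0)
```

Here the structure motion is tripled (1 + α with α = 2) while the texture disturbance keeps its original size; amplifying the full frame instead would have tripled the disturbance as well.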

6. An electronic device comprising a processor, an input device, an output device, and a memory, which are connected in sequence, the memory being configured to store a computer program comprising program instructions, and the processor being configured to invoke the program instructions to perform the method of any one of claims 1-4.

7. A readable storage medium, characterized in that the storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to carry out the method according to any one of claims 1-4.