
CN110245267B - Multi-user video stream deep learning sharing calculation multiplexing method - Google Patents


Info

Publication number
CN110245267B
CN110245267B (application CN201910413748.7A)
Authority
CN
China
Prior art keywords
deep learning
accuracy
data
frame
multiplexing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910413748.7A
Other languages
Chinese (zh)
Other versions
CN110245267A (en)
Inventor
汤善江
刘言杰
于策
孙超
肖健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN201910413748.7A
Publication of CN110245267A
Application granted
Publication of CN110245267B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval of video data
    • G06F16/73 Querying
    • G06F16/735 Filtering based on additional data, e.g. user or group profiles
    • G06F16/74 Browsing; Visualisation therefor
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval using metadata automatically derived from the content
    • G06F16/7837 Retrieval using objects detected or recognised in the video content
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/40 Scenes; Scene-specific elements in video content
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Library & Information Science (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to computers and video processing. To provide video query services through technologies such as target recognition and target detection in a multi-user scenario, the invention discloses a multi-user video stream deep learning shared computing multiplexing method. First, when a request carrying a detection or recognition operation arrives, requests are merged according to their relevance in the spatial dimension. Then, according to relevance in the time dimension, the method queries whether suitable data are already available for reuse and invokes a deep learning model with configured parameters for analysis. For the non-reusable part, the most suitable parameter configuration for the analysis is first found according to the speed-accuracy trade-off; a difference detector and a deep learning model then perform the video analysis under that configuration. Finally, the analysis result is output and stored in a data warehouse, and a lifting module raises the accuracy of existing results in the database so that high-accuracy query requests can also be served by reuse. The invention is mainly applied to video processing scenarios.

Description

Multi-user video stream deep learning shared computing multiplexing method

Technical field

The invention relates to computers and video processing, and in particular to a multi-user video stream deep learning shared computing multiplexing method.

Background

At present, deep learning has become an important engine driving the application and popularization of artificial intelligence. In the field of computer vision in particular, the rapid development of deep learning has brought profound changes, the most representative being the continuous progress of image analysis technologies such as target detection and target recognition. Target detection classifies objects, for example identifying whether an image contains a dog, a person, or a table, but it cannot identify the person's name; target recognition can identify the person's specific identity. For a single image, a target detection model can quickly identify all targets in the image, which has transformed the video analysis process.

In the traditional video analysis process, query services are provided to users mainly through manual labeling. With the development of deep learning, video content can be analyzed automatically through target recognition, target detection, and related technologies, providing richer and higher-quality query services. For example, when a user edits a video, a target recognition algorithm can automatically find all clips in which a given person appears, which is a great convenience. It has therefore become a trend to replace traditional manually labeled video query with video stream analysis based on deep learning. However, because deep learning models have high resource requirements and long training times, multiple users usually share one computing platform, which improves the utilization of computing resources in the deep learning system and reduces enterprise costs.

Disadvantages of the prior art

Most existing deep-learning-based video analysis technologies have two shortcomings: first, the query results are limited to a single target class; second, they do not exploit the locality of multi-user queries on a shared platform. The first shortcoming is especially evident in systems such as NoScope and Chameleon. In general, a target detection model can recognize thousands of different target categories, but recognition is slow. To address the slow processing of video data by deep learning models, NoScope proposes recognizing vehicles with a specialized model that has fewer network layers, which greatly improves recognition speed. However, this approach gives up the generality of the deep learning model and cannot satisfy users who need to query multiple target classes.

On the other hand, systems such as NoScope, Chameleon, and VideoStorm do not exploit the locality of multi-user queries on a shared platform. This locality has the following aspects:

Locality in the time dimension. Queries issued in different periods are repetitive with respect to the data they query: when a new query arrives, earlier users have often already queried the same content. This is especially evident for popular videos. Locality in the time dimension is also reflected in the similarity of consecutive frames. Because of the nature of video data, consecutive frames are highly similar in order to keep the video coherent. Moreover, in some surveillance videos, such as camera footage of a park, the content changes very little during parts of the day (for example at night). When a deep learning model recognizes targets in such video, the recognition results of these consecutive frames are essentially identical, so processing every frame causes a large amount of repeated computation.

Locality in the spatial dimension. The same piece of data may be queried simultaneously by multiple inference requests arriving at the same time. These queries are often identical or similar, and can be trimmed and merged to avoid unnecessary redundant computation.

Locality between the logic of data results. The results of different models can be reused logically. For example, target detection and target recognition often have a logical multiplexing relationship. For a given frame of a video, when target detection shows that no person is present, the target recognition algorithm cannot possibly recognize Zhang San, so the person-recognition computation can be skipped. Conversely, the result of target recognition can, under certain circumstances, be reused directly in target detection: if the recognition algorithm shows that Zhang San is present, then a person is present, which makes person detection unnecessary for that frame.

Summary of the invention

To overcome the deficiencies of the prior art, the present invention aims to propose a multi-user video stream deep learning shared computing multiplexing method that provides video query services through target recognition, target detection, and related technologies in a multi-user scenario. It optimizes for the locality of multi-user video queries in a shared computing environment, so that deep learning inference results can be shared among users, improving running speed and solving the problem that deep learning models process video data too slowly, while also addressing the balance between speed and accuracy introduced by sharing. To this end, the technical solution adopted by the invention is as follows. First, when a request carrying a detection or recognition operation arrives, requests are merged according to their relevance in the spatial dimension, trimming away the overlapping parts of the requests. Then, according to relevance in the time dimension, the object detection database or object recognition database is searched to determine whether suitable data are already available for reuse. When no reusable data exist, the data in the object recognition database or object detection database are used, according to the relevance between their logic, to filter the detection or recognition operation and reduce invalid computation, after which a deep learning model with configured parameters is invoked for analysis. For the non-reusable part, the most suitable parameter configuration is first found according to the speed-accuracy trade-off; the difference detector and the deep learning model then analyze the video under that configuration. Finally, the analysis result is output and stored in the data warehouse, and a lifting module raises the accuracy of existing results in the database so that they can be reused for high-accuracy query requests.

Relevant parameters such as resolution, the choice of deep learning model, and the frame-skipping rate must be configured reasonably.

Specific steps of multiplexing in the time dimension:

1) Difference detection is used to measure the similarity between consecutive frames: the difference detector obtains the histograms of consecutive frames, computes the histogram distance, and from it a similarity score; the similarity then determines whether the data can be reused.

2) When a new request arrives, it is first split into two parts, a reusable part and a non-reusable part. The data of the reusable part already exist in the database, and the results can be obtained directly through a faster database query. The non-reusable part is still processed by the deep learning model, and the resulting data are then fed back into the database for reuse by subsequent queries.
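The histogram comparison of step 1) might be sketched as follows. The bin count, the normalized L1 histogram distance, and the 0.95 reuse threshold are illustrative assumptions; the patent names the mechanism but does not fix these values.

```python
import numpy as np

def histogram_similarity(frame_a, frame_b, bins=32):
    """Similarity of two grayscale frames via normalized intensity histograms."""
    ha, _ = np.histogram(frame_a, bins=bins, range=(0, 256))
    hb, _ = np.histogram(frame_b, bins=bins, range=(0, 256))
    ha = ha / ha.sum()
    hb = hb / hb.sum()
    distance = np.abs(ha - hb).sum()   # L1 histogram distance, in [0, 2]
    return 1.0 - distance / 2.0        # similarity, in [0, 1]

def can_reuse(frame_a, frame_b, threshold=0.95):
    """Reuse the reference frame's result when similarity exceeds the threshold."""
    return histogram_similarity(frame_a, frame_b) >= threshold
```

Identical frames yield a similarity of 1.0 and are reused; completely dissimilar frames yield 0.0 and are sent to the deep learning model.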

Multiplexing in the spatial dimension: requests are merged by trimming and merging, cutting away their overlapping parts to reduce repeated requests.
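The trim-and-merge operation described above might be sketched as interval merging over requested frame ranges; representing a request as an inclusive (start_frame, end_frame) pair is an assumption for illustration.

```python
def merge_queries(queries):
    """Merge concurrent queries over the same video into non-overlapping
    frame intervals, so each overlapping region is analyzed only once.

    `queries` is a list of (start_frame, end_frame) tuples, end inclusive.
    """
    if not queries:
        return []
    merged = []
    for start, end in sorted(queries):
        # Overlapping or adjacent ranges collapse into the previous interval.
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

For example, queries Q1 over frames 0 to 100 and Q2 over frames 50 to 150 collapse into a single merged request over frames 0 to 150, as in the Q1' of Figure 2.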

Multiplexing in the logical dimension: associations are established between target detection data and target recognition data, finding data that are correlated across different models so that the models can filter each other's work.

Specific steps of multiplexing in the logical dimension:

A video contains m frames, with l detectable object classes and k recognizable persons. Two matrices, Dl×m and Rk×m, are used to store the data: Dl×m is the object matrix and Rk×m is the person matrix. When a new query arrives, Dl×m and Rk×m are first checked for corresponding data, which is used directly if present. Dij = 1 indicates that object i is present in frame j; Dij = 0 indicates that it is absent.

Theorem 1: if D1j is 0, then for 0 < j <= m, R*j is also 0.

Theorem 2: if Rij is 1, then for 0 < j <= m, D1j is also 1.

Theorem 1 states that when no person is detected in frame j, the target recognition model will not recognize anyone in frame j. Theorem 2 states that when person i is recognized in frame j, the target detection matrix is automatically updated to record that a person is present in frame j. Therefore, during logical multiplexing, the database is updated dynamically according to Theorems 1 and 2, establishing an association between target detection and target recognition.
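The two theorems above can be sketched over the D and R matrices. Mapping the "person" object class to matrix row 0 is an illustrative assumption (the patent writes it as row index 1 in D1j):

```python
import numpy as np

PERSON = 0  # assumption: row 0 of D is the "person" object class

def update_recognition(D, R, person_idx, frame_idx):
    """Theorem 2: recognizing person i in frame j implies a person is present,
    so the detection matrix is updated automatically alongside R."""
    R[person_idx, frame_idx] = 1
    D[PERSON, frame_idx] = 1

def frames_needing_recognition(D, frames):
    """Theorem 1: if detection records no person in frame j, recognition can
    skip that frame entirely, since R[:, j] must be all zero."""
    return [j for j in frames if D[PERSON, j] == 1]
```

A recognition query over m frames is thus filtered down to only those frames where detection has already found a person.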

Specific steps of balancing query accuracy and speed:

Step 1: fit the relationship between accuracy and speed for different models under different parameters, obtaining the speed-accuracy correspondence of each model. When a query arrives, the optimal parameters can then be selected for video analysis to meet the user's requirements.
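One way to realize this step is to profile candidate configurations offline and, at query time, pick the fastest configuration that meets the requested accuracy. The tuple representation and the "fastest feasible" policy are assumptions for illustration, not the patent's prescribed fitting procedure.

```python
def profile_configs(configs, evaluate):
    """Offline: measure (accuracy, fps) for each candidate configuration
    (model choice, resolution, frame-skip rate) on labelled sample video.
    `evaluate` returns an (accuracy, fps) pair for a configuration."""
    return [(cfg, *evaluate(cfg)) for cfg in configs]

def pick_config(profiled, min_accuracy):
    """Online: among profiled configurations meeting the query's accuracy
    demand, pick the fastest one."""
    feasible = [(cfg, acc, fps) for cfg, acc, fps in profiled if acc >= min_accuracy]
    if not feasible:
        raise ValueError("no configuration meets the requested accuracy")
    return max(feasible, key=lambda t: t[2])[0]
```

A high-accuracy request (for example detecting a robbery) then resolves to a slow, accurate configuration, while a timeliness-sensitive request resolves to a fast one.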

Step 2: according to the Markov chain rule, the accuracy of a frame is predicted from the accuracy of the previous frame and the degree of difference between the two frames, together with a corresponding tuning parameter whose value is corrected continuously through validation experiments, enabling accurate accuracy evaluation. Specifically, let δdiff denote the degree of difference between two frames, A(fi-1) the accuracy of frame i-1, and k the tuning parameter. Then, by the Markov chain rule, the accuracy of frame i is:

A(fi) = k * δdiff * A(fi-1).
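The recurrence can be sketched directly. How δdiff is derived from the similarity score and the value of k are assumptions here; the patent calibrates k through validation experiments.

```python
def propagate_accuracy(prev_accuracy, delta_diff, k=1.0):
    """One step of the Markov-style accuracy estimate for a reused frame:
    A(f_i) = k * delta_diff * A(f_{i-1}).

    `delta_diff` is the difference-derived factor between the two frames and
    `k` the tuning parameter; both are calibrated outside this sketch."""
    return k * delta_diff * prev_accuracy

def accuracy_over_run(initial_accuracy, diffs, k=1.0):
    """Estimated accuracy after reusing one result across a run of frames,
    applying the recurrence once per reused frame."""
    acc = initial_accuracy
    for d in diffs:
        acc = propagate_accuracy(acc, d, k)
    return acc
```

Because the estimate decays as results are reused across more frames, the system can tell when the effective accuracy has fallen below a query's requirement and a fresh model inference is needed.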

Finally, result multiplexing between inference requests is conditional multiplexing.

Step 3: retain the results of the deep learning model, re-detect part of the difference detector's results with a high-precision model, and fully reuse the difference detector's computed similarity between frames to re-evaluate the accuracy across consecutive frames, thereby improving detection accuracy overall.
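A minimal sketch of this lifting step, assuming the system keeps a per-frame record of the similarity to each frame's reference frame (the 0.98 cutoff and the dict layout are illustrative assumptions):

```python
def lift_precision(results, similarities, high_precision_model, sim_threshold=0.98):
    """Re-run only the frames whose reused result is least trustworthy
    (low similarity to their reference frame), reusing the similarities the
    difference detector already computed instead of recomputing them."""
    for frame, sim in similarities.items():
        if sim < sim_threshold:
            results[frame] = high_precision_model(frame)
    return results
```

Frames that closely match their reference keep their reused result; only the doubtful ones pay for a high-precision pass, which raises the stored accuracy for later high-accuracy queries.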

Conditional multiplexing specifically includes increasing the frequency with which the deep learning model is used.

Features and beneficial effects of the invention:

The invention builds a variety of deep learning models and can therefore provide video query services through technologies such as target recognition and target detection in a multi-user scenario. It optimizes for the locality of multi-user video queries in a shared computing environment, so that deep learning inference results can be shared among users, improving running speed and solving the problem that deep learning models process video data too slowly. At the same time, it addresses the balance between speed and accuracy introduced by sharing.

Brief description of the drawings:

Figure 1: multiplexing of request results over the same type of data.

Figure 2: merge processing of queries.

Figure 3: multiplexing in the logical dimension.

Figure 4: overall architecture of result multiplexing across multiple inference requests.

Detailed description of the embodiments

The invention relates to the fields of computer vision and high-performance computing, and proposes a multi-user video stream deep learning shared computing multiplexing method. The method provides video query services through technologies such as target recognition and target detection in a multi-user scenario. It optimizes for the locality of multi-user video queries in a shared computing environment, so that deep learning inference results can be shared among users, improving running speed and solving the problem that deep learning models process video data too slowly. At the same time, it addresses the balance between speed and accuracy introduced by sharing.

Addressing the limitations of the prior art, we propose a new video stream analysis method that supports data sharing in a multi-user shared environment to improve analysis speed. In a shared computing environment, an inference request submitted by a user comprises several aspects: the request type (object detection, object recognition, object tracking, etc.), the requested data and its range, the required accuracy, and so on. To satisfy diverse inference requests, a variety of deep learning models must be built in the shared computing environment, and an appropriate model must be selected dynamically when a request arrives. In the problem of multiplexing results across multiple inference requests, the focus of the research is to find, within this diverse request content, the connections between inference requests, and to build correlations between requests so that reuse speeds up running time, while also solving the accuracy problems introduced by multiplexing, thereby meeting user requirements.

The method comprises two main modules: one builds the correlations between requests, and the other solves the balance between accuracy and speed introduced by multiplexing.

Building the correlations between requests

Step 1: multiplexing in the time dimension. For video data, relevance in the time dimension appears in two ways. First, in video analysis, consecutive frames are highly similar in content. Second, for popular data, a large number of queries at different times will retrieve and analyze the same content. Both aspects create many opportunities for reuse. Regarding the similarity of consecutive frames: since deep learning inference takes a long time, for two highly similar frames the result of the earlier frame (called the reference frame) can be reused for the later frame, avoiding another pass of the deep learning model and improving speed. To this end, we use difference detection to measure the similarity between consecutive frames: the difference detector obtains the histograms of consecutive frames, computes the histogram distance, and from it a similarity score. The similarity then determines whether the data can be reused.

The repetitiveness of queries issued at different times over the same data likewise introduces a large amount of redundant computation. Because queries are not correlated with one another, each query independently requests analysis of the video data, wasting resources. To address this, request results are stored, and when a new query arrives the repeated part is found and reused directly. As shown in Figure 1, when a new request arrives it is first split into two parts, a reusable part and a non-reusable part. The data of the reusable part already exist in the database, and the results can be obtained directly through a faster database query. The non-reusable part is still processed by the deep learning model, and the results are then fed back into the database for reuse by subsequent queries. Assume a video contains m frames, with l detectable object classes and k recognizable persons. We use two matrices, Dl×m and Rk×m, to store the data. When a new query arrives, Dl×m and Rk×m are first checked for corresponding data, which is used directly if present, avoiding repeated computation. Dij = 1 indicates that object i is present in frame j; Dij = 0 indicates that it is absent.
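The split-then-feed-back flow above can be sketched with a simple per-frame result store; the dict-based cache and the `run_model` callback stand in for the databases and deep learning models and are assumptions for illustration.

```python
def split_request(requested_frames, cached_results):
    """Split a query's frames into a cached (reusable) part and a part
    that still needs deep-learning inference."""
    reusable = {f: cached_results[f] for f in requested_frames if f in cached_results}
    to_compute = [f for f in requested_frames if f not in cached_results]
    return reusable, to_compute

def serve_request(requested_frames, cached_results, run_model):
    """Answer from the result store where possible, run the model on the
    rest, then feed new results back for reuse by subsequent queries."""
    reusable, to_compute = split_request(requested_frames, cached_results)
    fresh = {f: run_model(f) for f in to_compute}
    cached_results.update(fresh)  # feed back into the result store
    return {**reusable, **fresh}
```

Each served query thereby enlarges the store, so later queries over popular data fall increasingly on the fast database path rather than the model path.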

Step 2: multiplexing in the spatial dimension. In a shared computing environment there are many concurrent query operations; when these concurrent queries are interested in the same part of the data, a large amount of repeated computation results. The invention merges requests by trimming and merging, cutting away their overlapping parts to reduce repeated requests. As shown in Figure 2, two simultaneously arriving queries Q1 and Q2 overlap on part of the data. Through query merging, the two requests are reasonably combined into a single request Q1', avoiding repeated computation over the overlapping part. The same applies when multiple queries arrive at the same time: the queries are merged so as to find as much overlap as possible and reduce duplicate detection.

Step 3: multiplexing in the logical dimension. A video analysis system provides diverse forms of analysis, including object detection, object recognition, and object tracking over the data in a video, which requires a variety of deep learning models with different capabilities. Yet there are opportunities for reuse between these different models. Most intuitively, if object detection finds no person in certain frames, the object recognition model can skip recognizing people in those frames entirely. Conversely, if object recognition has already identified a person, object detection can skip detecting people in those frames. As shown in Figure 3, we establish associations between the object detection data and the object recognition data, finding data that are correlated across models so that the models filter each other's work. For example, when the object recognition model recognizes a person, the object recognition database is updated and the data are also fed back to the object detection database, marking the detection result as "person present". A new inference request asking whether a person is present can then be filtered directly using this result. We use D1j to record whether a person is present in frame j, Rij to record whether person i is present in frame j, and R*j to denote all persons in frame j. From these properties we derive the following two theorems.

Theorem 1: if D1j is 0, then for 0 < j <= m, R*j is also 0.

Theorem 2: if Rij is 1, then for 0 < j <= m, D1j is also 1.

Theorem 1 states that when no person is detected in frame j, the target recognition model will not recognize anyone in frame j. Theorem 2 states that when person i is recognized in frame j, the target detection matrix is automatically updated to record that a person is present in frame j. Therefore, during logical multiplexing, we can update the database dynamically according to Theorems 1 and 2, establishing an association between target detection and target recognition.

Balancing query accuracy and speed

In current deep learning models, although analysis accuracy is steadily improving, the resource consumption brought by higher accuracy keeps growing, which requires longer running times. In general, high-accuracy models run more slowly and low-accuracy models run faster. Image resolution in the video analysis pipeline and the number of frames skipped by the difference detector also affect accuracy and speed. This makes parameter configuration during analysis a challenge, because in practice different inference requests have different requirements for accuracy and speed. Some requests, such as detecting a kidnapping or robbery, require high accuracy to remain correct, and some speed can be sacrificed for it; requests such as traffic light detection require high timeliness, and an adequate level of accuracy is sufficient. How to reasonably choose the model, resolution, frame-skip count, and other parameters of the video analysis process is therefore the primary problem.

Step 1: Fit the accuracy-speed relationship of each model under different parameters to obtain the correspondence between speed and accuracy for every model. When a query arrives, the optimal parameters can then be selected for video analysis to meet the user's requirements.
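Step 1 can be sketched as a lookup over offline profiling results. The configurations and their accuracy/fps numbers below are illustrative placeholders, not measurements from the patent; in practice they would come from the fitting runs the step describes.

```python
# Hypothetical profiling table: each entry is one (model, resolution,
# frame-skip) configuration with its fitted accuracy and throughput.
PROFILES = [
    {"model": "small", "resolution": 320, "skip": 5, "accuracy": 0.82, "fps": 120},
    {"model": "small", "resolution": 640, "skip": 2, "accuracy": 0.88, "fps": 60},
    {"model": "large", "resolution": 640, "skip": 2, "accuracy": 0.93, "fps": 25},
    {"model": "large", "resolution": 960, "skip": 1, "accuracy": 0.97, "fps": 8},
]

def choose_config(min_accuracy):
    """Among configurations meeting the accuracy floor, pick the fastest;
    if none qualifies, fall back to the most accurate one (best effort)."""
    feasible = [p for p in PROFILES if p["accuracy"] >= min_accuracy]
    if not feasible:
        return max(PROFILES, key=lambda p: p["accuracy"])
    return max(feasible, key=lambda p: p["fps"])
```

A query demanding 90% accuracy would thus get the large model at 640 px with a skip of 2, the fastest configuration above that floor.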

On the other hand, the difference detector introduces uncertainty into the accuracy measurement. The difference detector measures the similarity between the current frame and the previous (reference) frame and, when they are highly similar, directly reuses the previous frame's analysis result, avoiding a costly inference. Direct reuse, however, introduces a deviation between the reused result and the true result, lowering detection accuracy. An effective accuracy evaluation of the difference detector is therefore needed so that the overall effective accuracy can be computed and the user's accuracy requirement met precisely. Since the difference detector compares consecutive frames, the accuracy of a frame is clearly closely related to the accuracy of the preceding frame and the degree of difference between the two. This property of consecutive frames follows the Markov chain rule.
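The histogram-based similarity check the difference detector performs might look like the following sketch. The distance metric (normalized L1 over grayscale histograms), bin count, and threshold are assumptions; the text does not fix them.

```python
import numpy as np

def gray_histogram(frame, bins=32):
    """Normalized intensity histogram of a grayscale frame (2-D uint8 array)."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist / hist.sum()

def can_reuse(prev_frame, cur_frame, threshold=0.1):
    """Reuse the previous frame's result when the histogram distance is small.
    Returns (reusable, distance); distance is 0 for identical histograms
    and 1 for completely disjoint ones."""
    d = np.abs(gray_histogram(prev_frame) - gray_histogram(cur_frame)).sum() / 2.0
    return d <= threshold, d
```

When `can_reuse` returns true, the previous frame's cached result is emitted directly and the deep model is not invoked for the current frame.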

Step 2: Following the Markov chain rule, predict the accuracy of a frame from the accuracy of the preceding frame and the degree of difference between the two frames, together with a tuning parameter. The value of the tuning parameter is corrected through extensive validation experiments so that an accurate evaluation can be made. Let δdiff denote the degree of difference between the two frames, A(fi-1) the accuracy of frame i-1, and k the tuning parameter. By the Markov chain rule, the accuracy of frame i is:

A(fi) = k * δdiff * A(fi-1).
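Applied across a run of skipped frames, the recurrence yields the effective accuracy of each reused result. This is a minimal sketch: the δdiff values and k below are illustrative, and for the accuracy to decay only slightly on near-identical frames, δdiff is taken here as a factor close to 1 when the frames barely differ.

```python
def propagate_accuracy(base_accuracy, diffs, k=1.0):
    """Apply A(f_i) = k * delta_diff * A(f_{i-1}) along a run of frames
    skipped by the difference detector, starting from the accuracy of
    the last frame the deep model actually inferred."""
    accuracies = []
    acc = base_accuracy
    for d in diffs:
        acc = k * d * acc
        accuracies.append(acc)
    return accuracies
```

With a model-inferred frame at 90% accuracy and two reused frames, the chain gives each reused frame a strictly lower effective accuracy, which is what the overall-accuracy calculation consumes.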

Finally, result reuse between inference requests is conditional, not indiscriminate, and the effect of accuracy on reuse is especially important. For example, when a request demanding high-accuracy results arrives, existing low-accuracy results obviously cannot be reused for it, so data of different accuracies cannot simply be shared. The most straightforward remedy is to discard the low-accuracy results and re-detect everything with a high-accuracy model, but that clearly incurs a large overhead. Note that the reusable data is produced jointly by the deep learning model and the difference detector; for example, the deep learning model detects the first frame and the difference detector handles the next four, and results produced by the deep learning model are more accurate than those produced by the difference detector. Consider the same piece of video and suppose query Q1 requires 90% accuracy. To meet this, the accuracy-speed balancing component selects a frame-skip step of 5 for detection: the deep learning model performs recognition every 5 frames, and the difference detector computes the degree of difference for the 5 skipped frames. Some time later, a new query Q2 arrives requiring 95% accuracy. The results of Q1 clearly cannot be used for Q2 directly; computation shows that a frame-skip step of 2 would meet the accuracy requirement. First, the data in Q1 is reused, i.e., the frames already detected are kept; among the 5 frames skipped by the difference detector, the deep learning model re-detects the 3rd, which satisfies the step-of-2 requirement. Conditional reuse is thus achieved.
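The Q1-to-Q2 upgrade can be sketched as computing which extra frames the deep model must re-detect so that no gap between model-inferred frames exceeds the new step. This generalizes the single-frame example above; the function and parameter names are hypothetical.

```python
def frames_to_redetect(num_frames, old_stride=5, new_stride=2):
    """Frames already inferred by the deep model at every `old_stride`
    frames (the reused Q1 results) are kept; return the additional
    frames a stricter query must re-detect so that no run of reused
    frames between model-inferred frames exceeds `new_stride`."""
    inferred = list(range(0, num_frames, old_stride))
    extra = []
    for start, end in zip(inferred, inferred[1:] + [num_frames]):
        f = start + new_stride
        while f < end:        # fill the skipped run at the new stride
            extra.append(f)
            f += new_stride
    return extra
```

Only the frames in `extra` incur new inference; everything Q1 already computed, including the difference detector's similarity scores, is reused as-is.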

Step 3: Keep the results of the deep learning model and re-detect part of the difference detector's results (e.g., one of the last four frames) with a high-accuracy model. The difference detector's computed similarities between frames are fully reused to re-evaluate the accuracy of consecutive frames, raising detection accuracy overall.

Combining the parts above, the overall architecture of the data reuse model for multiple inference requests is shown in Figure 4. The system serves a variety of video analysis requests, which ultimately use two main kinds of deep learning models: object recognition models and object detection models. First, when a request carrying a detection (or recognition) operation arrives, the system merges requests according to their spatial correlation, cutting away the overlapping portions. Then, in the reuse module shown in the figure, the request is checked against the object detection (or object recognition) database according to temporal correlation to see whether suitable data is already available for reuse. In addition, according to logical correlation, data in the object recognition (or object detection) database is used to filter the detection (or recognition) operation and reduce invalid computation. When no reusable data exists, a deep learning model with configured parameters is invoked for analysis. Before invoking the model, the relevant parameters are configured through the speed-accuracy balancing component according to the user's accuracy and speed requirements, achieving the highest possible speed while meeting the accuracy requirement. In the inference part of the figure, parameters such as resolution, model choice, and frame-skip rate must be configured sensibly; the difference detector and the deep learning model are then invoked for video analysis, and the results are finally output and stored in the data warehouse. The accuracy improvement module continuously updates the accuracy of results in the database so that high-accuracy queries can reuse them.
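A minimal dispatch loop tying the reuse database and the speed-accuracy balancer together might look like this. All names and the result/database schema are hypothetical; real queries would also pass through the merging and logical-filtering stages described above.

```python
def answer_query(frame_id, query, db, run_model, choose_config):
    """Serve one frame of one query: reuse a stored result when one with
    sufficient accuracy exists (time-dimension reuse), otherwise configure
    and run the deep model, then cache the result for later queries."""
    cached = db.get((query.kind, frame_id))
    if cached is not None and cached["accuracy"] >= query.min_accuracy:
        return cached                       # served from the database, no inference
    config = choose_config(query.min_accuracy)  # speed-accuracy balancing
    result = run_model(frame_id, config)
    db[(query.kind, frame_id)] = result     # feed back for subsequent queries
    return result
```

The second query for the same frame and operation is answered by a database lookup rather than a second inference, which is the core saving the architecture targets.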

Claims (3)

1. A multi-user video stream deep learning shared-computation reuse method, characterized in that: first, when a request carrying a detection or recognition operation arrives, requests are merged according to spatial correlation and their overlapping portions are cut away; then, according to temporal correlation, the object detection database or object recognition database is searched to check whether data is already available for reuse; when no reusable data exists, according to logical correlation, data in the object recognition database or object detection database is used to filter the detection or recognition operation to reduce invalid computation, and a deep learning model with configured parameters is then invoked for analysis; for the non-reusable portion, the parameter configuration for the analysis process is first found from the speed-accuracy balance, the difference detector and the deep learning model are then invoked accordingly for video analysis, and the results are finally output and stored in the data warehouse; the improvement module is used to raise the accuracy of existing results in the database so that they can be reused by high-accuracy queries; the relevant parameters, including resolution, deep learning model choice, and frame-skip rate, must be configured sensibly; the specific steps of balancing query accuracy and speed are:

Step 1: fit the accuracy-speed relationship of each model under different parameters, obtain the correspondence between the speed and accuracy of every model, and select the optimal parameters for video analysis when a query arrives, so as to meet user requirements;

Step 2: following the Markov chain rule, predict the accuracy of a frame from the accuracy of the preceding frame and the degree of difference between the two frames together with a tuning parameter, and continuously correct the value of the tuning parameter through validation experiments so that an accurate evaluation can be made; specifically, letting δdiff denote the degree of difference between the two frames, A(fi-1) the accuracy of frame i-1, and k the tuning parameter, the accuracy of frame i by the Markov chain rule is:

A(fi) = k * δdiff * A(fi-1);

finally, result reuse between inference requests is conditional reuse, namely increasing the frequency with which the deep learning model is used;

Step 3: keep the results of the deep learning model, re-detect part of the difference detector's results with a high-accuracy model, fully reuse the difference detector's computed similarities between frames, and re-evaluate the accuracy of consecutive frames, raising detection accuracy overall;

wherein logical-dimension reuse establishes an association between the object detection and object recognition data, finding correlated data across models so that they can filter each other; the specific steps of logical-dimension reuse are:

a video contains m frames, with l classes of detectable objects and k recognizable persons; two matrices Dl*m and Rk*m store the data, Dl*m being the object matrix and Rk*m the person matrix; when a new query arrives, Dl*m and Rk*m are first checked for corresponding data, which is used directly if present; Dij = 1 means object i is present in frame j, and Dij = 0 means it is absent; Rij = 1 means person i is present in frame j, and Rij = 0 means the person is absent; D1j records whether any person is present in frame j; R*j denotes all persons in frame j;

Theorem 1: if D1j is 0, then for 0 < j <= m, R*j is also 0;

Theorem 2: if Rij is 1, then for 0 < j <= m, D1j is also 1;

Theorem 1 states that when no person is detected in frame j, the recognition model will find no one in frame j; Theorem 2 states that when person i is recognized in frame j, the detection matrix is automatically updated to record a person in frame j; thus, during logical reuse, the database is updated dynamically according to Theorems 1 and 2, establishing an association between object detection and object recognition.

2. The multi-user video stream deep learning shared-computation reuse method of claim 1, characterized in that the specific steps of temporal-dimension reuse are:

1) similarity between consecutive frames is detected by difference detection: the difference detector obtains the histograms of consecutive frames, computes the histogram distance, further computes the similarity, and then judges from the similarity whether the data can be reused;

2) when a new request arrives, it is first split into a reusable part and a non-reusable part; the data of the reusable part already exists in the database and can be obtained directly through a faster database query operation, while the non-reusable part is still processed by the deep learning model to obtain the result, which is then fed back to the database for reuse by subsequent queries.

3. The multi-user video stream deep learning shared-computation reuse method of claim 1, characterized in that spatial-dimension reuse merges requests by request cropping and merging, cutting away their overlapping portions to reduce repeated requests.
CN201910413748.7A 2019-05-17 2019-05-17 Multi-user video stream deep learning sharing calculation multiplexing method Active CN110245267B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910413748.7A CN110245267B (en) 2019-05-17 2019-05-17 Multi-user video stream deep learning sharing calculation multiplexing method


Publications (2)

Publication Number Publication Date
CN110245267A CN110245267A (en) 2019-09-17
CN110245267B true CN110245267B (en) 2023-08-11

Family

ID=67884171


Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111553477A (en) * 2020-04-30 2020-08-18 深圳市商汤科技有限公司 Image processing method, device and storage medium
US11488117B2 (en) * 2020-08-27 2022-11-01 Mitchell International, Inc. Systems and methods for managing associations between damaged parts and non-reusable parts in a collision repair estimate
CN114299423A (en) * 2021-12-20 2022-04-08 中国农业银行股份有限公司 Video data identification method, device, equipment and storage medium
CN114828048B (en) * 2022-04-18 2024-11-26 西安电子科技大学 Video configuration method for in-vehicle edge network

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009150425A2 (en) * 2008-06-10 2009-12-17 Half Minute Media Ltd Automatic detection of repeating video sequences
CN102918531A (en) * 2010-05-28 2013-02-06 甲骨文国际公司 Systems and methods for providing multilingual support for data used with a business intelligence server
US9176987B1 (en) * 2014-08-26 2015-11-03 TCL Research America Inc. Automatic face annotation method and system
CN107480178A (en) * 2017-07-01 2017-12-15 广州深域信息科技有限公司 A kind of pedestrian's recognition methods again compared based on image and video cross-module state
CN108235001A (en) * 2018-01-29 2018-06-29 上海海洋大学 A kind of deep-sea video quality objective assessment model based on space-time characteristic
CN108549846A (en) * 2018-03-26 2018-09-18 北京航空航天大学 A kind of movement character combined and head and shoulder structure pedestrian detection and statistical method
CN108846384A (en) * 2018-07-09 2018-11-20 北京邮电大学 Merge the multitask coordinated recognition methods and system of video-aware


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
数据仓库事实表的建模理论与方法;万南洋;《计算机工程》;20021120(第11期);全文 *


Similar Documents

Publication Publication Date Title
CN110245267B (en) Multi-user video stream deep learning sharing calculation multiplexing method
US20180232663A1 (en) Minimizing memory and processor consumption in creating machine learning models
CN110059807A (en) Image processing method, device and storage medium
CN102999640B (en) Based on the video of semantic reasoning and structural description and image indexing system and method
CN109635118A (en) A kind of user&#39;s searching and matching method based on big data
CN113724293A (en) Vision-based intelligent internet public transport scene target tracking method and system
CN104809180A (en) Method for identifying illegal operation vehicle based on non-supervision intelligent learning algorithm
US20240153240A1 (en) Image processing method, apparatus, computing device, and medium
CN109635069B (en) Geographic space data self-organizing method based on information entropy
Guo et al. UDTIRI: An online open-source intelligent road inspection benchmark suite
Zhou et al. Lightweight unmanned aerial vehicle video object detection based on spatial‐temporal correlation
Ibrahim et al. A survey of performance optimization in neural network-based video analytics systems
CN118484542A (en) A conflict elimination query method and device for multi-source heterogeneous data
Li et al. UAV-YOLOv5: a Swin-transformer-enabled small object detection model for long-range UAV images
Zhou et al. Object detection in low-light conditions based on DBS-YOLOv8
Yu et al. Vqpy: An object-oriented approach to modern video analytics
Song et al. An improved Multi-Scale Fusion and Small Object Enhancement method for efficient pedestrian detection in dense scenes
Jiang et al. A dual-prototype network combining query-specific and class-specific attentive learning for few-shot action recognition
Wang et al. Multi-attribute object detection benchmark for smart city
CN114329010B (en) A method for generating image scene graph based on knowledge graph
CN116977781A (en) Training set acquisition method, model training method, device and electronic equipment
CN116778568A (en) Behavior recognition methods, devices and equipment
CN116150699A (en) Traffic flow prediction method, device, equipment and medium based on deep learning
Liu et al. Integrated detection and tracking for ADAS using deep neural network
Zhu et al. Person re-identification in the real scene based on the deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant