CN114546587A

CN114546587A - A method for expanding and shrinking capacity of online image recognition service and related device

Info

Publication number: CN114546587A
Application number: CN202210131903.8A
Authority: CN
Inventors: 莫浩; 朱立谷; 姬光; 张恩伟; 尹宇鹤; 谢群; 蒙移发
Original assignee: BEIJING TELESOUND ELECTRONICS CO LTD
Current assignee: BEIJING TELESOUND ELECTRONICS CO LTD
Priority date: 2022-02-14
Filing date: 2022-02-14
Publication date: 2022-05-27

Abstract

The present application discloses a capacity expansion/reduction method for an online image recognition service and a related device. The method monitors the number of images to be processed by the online image recognition service and determines, from the monitored number, the amount of device resources required to perform the prediction task. Based on a preset conversion relationship, it determines the preset monitoring indicator in the container management system that corresponds to the number of images to be processed, and controls the container management system to perform capacity expansion/reduction operations on the containers held by the online image recognition service according to that indicator, so that the amount of resources held by the containers after the operation meets the required amount of device resources. Finally, the prediction task is performed using the held resources. Using the number of images to be processed as the criterion for capacity expansion/reduction reduces erroneous operations, and partitioning physical device cards into virtual device cards for prediction tasks through container encapsulation improves the resource utilization of the server.

Description

A method for expanding and shrinking capacity of online image recognition service and related device

Technical Field

The present invention relates to the technical field of data processing, and in particular to a capacity expansion/reduction method for an online image recognition service and a related device.

Background

An online image recognition service provides a stable, high-throughput online prediction service by means of a deep learning online serving framework and the computing resources of a central processing unit (CPU). The service recognizes and predicts images to be processed that are uploaded by clients, according to a preset recognition model. To allocate resources reasonably among prediction tasks, the related art mostly relies on automatic capacity expansion/reduction to dynamically allocate the amount of resources used to perform prediction tasks.

The traditional approach uses the CPU resource usage of the online image recognition service as the criterion and expands or reduces the computing resources required to perform prediction tasks, so that the service can dynamically adapt to the different resource demands of traffic peaks and troughs. Using CPU occupancy directly as the criterion is prone to erroneous capacity expansion/reduction operations. For example, when other systems unrelated to the online image recognition service occupy a large share of CPU resources, CPU usage rises. When the online image recognition service detects the rise, it performs a capacity expansion/reduction operation to obtain more computing resources. However, because the rise in CPU utilization was not caused by the online image recognition service, no such operation should have been performed on it. The result is an erroneous capacity expansion/reduction operation, which lowers the resource utilization of the server.

Summary of the Invention

The present invention provides a capacity expansion/reduction method for an online image recognition service and a related device. The number of images to be processed is used as the criterion for performing capacity expansion/reduction operations, which reduces erroneous operations, and virtual device cards for performing prediction tasks are partitioned out of physical device cards, which improves the resource utilization of the server.

In a first aspect, an embodiment of the present invention provides a capacity expansion/reduction method for an online image recognition service, the method comprising:

monitoring the number of images to be processed by the online image recognition service, and determining, according to the monitored number of images to be processed, the amount of device resources required to perform a prediction task;

determining, based on a preset conversion relationship, the preset monitoring indicator in a container management system that corresponds to the number of images to be processed; wherein the container management system is used to manage the containers held by the online image recognition service; each container encapsulates graphics processor resources, the container is the allocation unit of the amount of resources held by the online image recognition service, and the logical encapsulation of the container is used to intercept calls to the application programming interface of the Compute Unified Device Architecture (CUDA) library;

controlling, according to the preset monitoring indicator, the container management system to perform a capacity expansion/reduction operation on the containers held by the online image recognition service, so that the amount of held resources corresponding to the containers after the operation meets the amount of device resources; and performing the prediction task according to the amount of held resources;

wherein, when a capacity expansion operation is performed, resources equal to a target resource amount are partitioned out of the physical device cards of a resource cluster, and the resources are encapsulated in a container to construct a virtual device card; the target resource amount is determined according to the amount of device resources and the amount of held resources;

and when a capacity reduction operation is performed, a preset deletion operation is performed on the containers currently held by the online image recognition service, so that the response time of the online image recognition service falls within a preset time range.

In the embodiments of the present application, the number of images to be processed by the online image recognition service is monitored, and the amount of device resources required to perform the prediction task is determined from the monitored number. The preset monitoring indicator in the container management system that corresponds to the number of images to be processed is determined based on the preset conversion relationship, and the container management system is controlled according to that indicator to perform capacity expansion/reduction operations on the containers held by the online image recognition service, so that the amount of held resources corresponding to the containers after the operation meets the amount of device resources. Finally, the prediction task is performed according to the amount of held resources. This process uses logical encapsulation to partition virtual device cards for performing prediction tasks out of physical device cards, which improves the resource utilization of the server. Moreover, the number of images to be processed reflects the peaks and troughs of business traffic more accurately, so using it as the criterion for capacity expansion/reduction operations reduces erroneous operations.

In some possible embodiments, determining the amount of device resources required to perform the prediction task based on the monitored number of images to be processed includes:

obtaining the delay duration of the prediction task and the processing rate of the image processor;

determining the amount of device resources according to the number of images to be processed, the delay duration and the processing rate.

In the embodiments of the present application, the amount of device resources required to perform the prediction task is determined from the delay requirement of the prediction task, the processing rate of the image processor and the number of images to be processed, which improves the accuracy with which the amount of device resources is predicted.
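The passage above names the three inputs but does not give a formula in this section. As a minimal sketch only, assuming that one resource unit processes a fixed number of images per second and that throughput scales linearly with the number of units, the required amount could be estimated as follows. The function name `required_resource_units` and all parameter names are hypothetical and do not come from the application.

```python
import math


def required_resource_units(pending_images: int, delay_s: float,
                            rate_per_unit: float) -> int:
    """Estimate the resource units needed to clear the backlog in time.

    Assumption (not from the source): each unit processes `rate_per_unit`
    images per second, and units add up linearly.
    """
    if pending_images < 0:
        raise ValueError("pending_images must be non-negative")
    if delay_s <= 0 or rate_per_unit <= 0:
        raise ValueError("delay_s and rate_per_unit must be positive")
    # Images one unit can finish within the allowed delay.
    per_unit_capacity = delay_s * rate_per_unit
    # Round up: a fractional unit still needs a whole unit allocated.
    return math.ceil(pending_images / per_unit_capacity)
```

Under these assumptions, a backlog of 1000 images with a 2-second delay budget and 100 images per second per unit would call for 5 units.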

In some possible embodiments, determining, based on the preset conversion relationship, the preset monitoring indicator in the container management system that corresponds to the number of images to be processed includes:

if the number of images to be processed is greater than a first monitoring threshold, determining that the number of images to be processed corresponds to the preset monitoring indicator representing a capacity expansion operation;

if the number of images to be processed is less than a second monitoring threshold, determining that the number of images to be processed corresponds to the preset monitoring indicator representing a capacity reduction operation; wherein the second monitoring threshold is less than the first monitoring threshold;

if the number of images to be processed lies between the first monitoring threshold and the second monitoring threshold, determining that the number of images to be processed corresponds to the preset monitoring indicator representing that no capacity expansion/reduction operation is needed.

In the embodiments of the present application, when the number of images to be processed is greater than the first monitoring threshold, it is determined to correspond to the preset monitoring indicator representing a capacity expansion operation; when the number of images to be processed is less than the second monitoring threshold, it is determined to correspond to the preset monitoring indicator representing a capacity reduction operation. By setting monitoring thresholds on the number of images to be processed to decide whether a capacity expansion/reduction task is performed, this process reduces erroneous operations.
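The threshold-based conversion described above can be sketched as a small mapping function. This is an illustration only: the names `ScaleAction` and `to_monitoring_indicator` are hypothetical, and treating a count exactly equal to either threshold as "no operation" is an assumption, since the text only speaks of values greater than, less than, or between the thresholds.

```python
from enum import Enum


class ScaleAction(Enum):
    EXPAND = "expand"   # indicator representing a capacity expansion operation
    REDUCE = "reduce"   # indicator representing a capacity reduction operation
    NONE = "none"       # indicator representing that no operation is needed


def to_monitoring_indicator(pending_images: int, first_threshold: int,
                            second_threshold: int) -> ScaleAction:
    """Map the pending-image count to a preset monitoring indicator."""
    # The text requires the second threshold to be below the first.
    if second_threshold >= first_threshold:
        raise ValueError("second_threshold must be less than first_threshold")
    if pending_images > first_threshold:
        return ScaleAction.EXPAND
    if pending_images < second_threshold:
        return ScaleAction.REDUCE
    # Assumption: values equal to a threshold fall into the no-op band.
    return ScaleAction.NONE
```

Keeping a gap between the two thresholds gives the decision a dead band, so small fluctuations in the backlog do not flip the service between expansion and reduction.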

In some possible embodiments, before partitioning virtual resources equal to the target resource amount out of the physical device cards of the resource cluster, the method further includes:

detecting the resource cluster and obtaining the physical device cards that satisfy the amount of device resources;

and partitioning graphics processor resources equal to the target resource amount out of the physical device cards of the resource cluster and encapsulating the graphics processor resources in a container to construct a virtual device card includes:

selecting the physical device card with the least graphics processing resources as the target physical device card, and partitioning graphics processor resources equal to the target resource amount out of the target physical device card;

encapsulating the graphics processor resources in a container, and using the encapsulated container as the virtual device card.

In the embodiments of the present application, after the physical device card with the smallest video memory is selected as the target physical device card, graphics processor resources equal to the target resource amount are partitioned out of the target physical device card and encapsulated in a container to construct a virtual device card. The container encapsulation is used to intercept calls to the application programming interface of the Compute Unified Device Architecture (CUDA) library. In this way, the minimum allocation unit of the resources held by the online service system changes from a whole physical device card to a portion of the virtual resources within a physical device card, which improves the resource utilization of the server.
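The card selection described above resembles a best-fit choice: among the cards that can satisfy the request, take the one with the least resources, then carve the requested amount out of it. The sketch below illustrates only that bookkeeping. The classes `PhysicalCard` and `VirtualCard` and the function `carve_virtual_card` are hypothetical, resources are modelled as abstract integer units, a single amount is used for both the eligibility check and the partition as a simplification, and the actual container encapsulation and CUDA call interception are not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PhysicalCard:
    name: str
    free_units: int  # unallocated graphics processor resources on this card


@dataclass
class VirtualCard:
    host: str   # name of the physical card the resources were carved from
    units: int  # amount of resources encapsulated in the container


def carve_virtual_card(cards: List[PhysicalCard],
                       target_units: int) -> Optional[VirtualCard]:
    """Pick the smallest sufficient card and partition target_units from it."""
    # Keep only cards that can satisfy the request.
    candidates = [c for c in cards if c.free_units >= target_units]
    if not candidates:
        return None  # caller may re-check the cluster later
    # Choose the card with the least resources among the sufficient ones.
    target = min(candidates, key=lambda c: c.free_units)
    target.free_units -= target_units
    return VirtualCard(host=target.name, units=target_units)
```

Choosing the smallest sufficient card tends to leave larger cards intact for larger requests, which is one plausible reason for the selection rule in the text.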

In some possible embodiments, the method further includes:

when the resource cluster is detected, if no physical device card satisfying the amount of device resources is found in the resource cluster, performing a preset number of resource re-checks on the resource cluster at a preset time interval;

during the resource re-checks, if a physical device card satisfying the amount of device resources is detected, selecting the physical device card with the least graphics processing resources as the target physical device card;

if no physical device card satisfying the amount of device resources is detected in the resource cluster, outputting a prompt indicating that no resources are currently available, and ending the current capacity expansion task.

In the embodiments of the present application, when the resource cluster is detected and no physical device card satisfying the amount of device resources is found, a preset number of resource re-checks are performed on the resource cluster at a preset time interval. If a physical device card satisfying the amount of device resources is detected during the re-checks, the physical device card with the least graphics processing resources is selected as the target physical device card. This re-check mechanism reduces the need to re-upload prediction tasks and improves user satisfaction.
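The re-check behaviour above amounts to a bounded retry loop with a fixed wait between attempts. A minimal sketch follows, assuming the cluster probe is supplied as a callable that returns a card or `None`. The name `recheck_for_card` and its parameters are hypothetical, and the `sleep` argument is injectable only so that the loop can be exercised without real waiting.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def recheck_for_card(find_card: Callable[[], Optional[T]], retries: int,
                     interval_s: float,
                     sleep: Callable[[float], None] = time.sleep) -> Optional[T]:
    """Probe once, then re-check up to `retries` times at a fixed interval."""
    card = find_card()  # initial detection of the resource cluster
    attempts = 0
    while card is None and attempts < retries:
        sleep(interval_s)   # wait for the preset time interval
        card = find_card()  # resource re-check
        attempts += 1
    # None here means no card was found; the caller would then output the
    # "no available resources" prompt and end the expansion task.
    return card
```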

In some possible embodiments, performing the preset deletion operation on the containers currently held by the online image recognition service, so that the response time of the online image recognition service falls within the preset time range, includes:

determining the amount of graphics processor resources of each container, and taking the container with the smallest amount of resources as the container to be processed;

deleting the container to be processed, and monitoring the response time of the online image recognition service;

if the response time is not within the preset time range, reselecting the container to be processed from the remaining containers, and stopping the preset deletion operation once the response time is within the preset time range.

In the embodiments of the present application, when a capacity reduction operation is performed, the container with the smallest amount of graphics processor resources is taken as the container to be processed. Whether to continue deleting containers is decided by comparing the response time of the online image recognition service with the preset time range, which improves the resource utilization of the server.
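The deletion loop above can be sketched as follows. This is an illustration under stated assumptions: containers are represented as a mapping from a container name to its amount of graphics processor resources, and the `delete` and `response_time` callables stand in for the container management system and the monitoring component. All names are hypothetical.

```python
from typing import Callable, Dict, List


def reduce_capacity(containers: Dict[str, int],
                    delete: Callable[[str], None],
                    response_time: Callable[[], float],
                    low_s: float, high_s: float) -> List[str]:
    """Delete smallest containers until response time is in [low_s, high_s]."""
    remaining = dict(containers)  # work on a copy of the caller's mapping
    deleted: List[str] = []
    while remaining:
        # The container with the smallest resource amount is processed first.
        victim = min(remaining, key=remaining.get)
        delete(victim)
        deleted.append(victim)
        del remaining[victim]
        # Stop once the observed response time falls within the preset range.
        if low_s <= response_time() <= high_s:
            break
    return deleted
```

Removing the smallest container first keeps each step small, so the response time is re-evaluated after the least disruptive change available.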

In some possible embodiments, the number of images to be processed is determined by an interceptor or filter pre-deployed in the online image recognition service.

In the embodiments of the present application, the online image recognition service is pre-deployed, and an interceptor or filter is deployed in advance to obtain the response time of the online image recognition service.
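The application does not describe how the interceptor or filter is built. One way to picture it, offered purely as an assumption, is a wrapper around the request handler that increments a counter when an image arrives and decrements it when handling finishes. The class `PendingImageCounter` below is hypothetical and is not tied to any particular web framework.

```python
import threading
from typing import Any, Callable


class PendingImageCounter:
    """Counts images that have been received but not yet fully handled."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pending = 0

    @property
    def pending(self) -> int:
        with self._lock:
            return self._pending

    def intercept(self, handler: Callable[[Any], Any]) -> Callable[[Any], Any]:
        """Wrap a request handler so each call is counted while in flight."""
        def wrapped(image: Any) -> Any:
            with self._lock:
                self._pending += 1
            try:
                return handler(image)
            finally:
                # Decrement even if the handler raises.
                with self._lock:
                    self._pending -= 1
        return wrapped
```

A monitoring component could then read `pending` periodically and feed the value into the threshold comparison.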

In a second aspect, an embodiment of the present application provides a capacity expansion/reduction apparatus for an online image recognition service, the apparatus comprising:

a traffic monitoring module, configured to monitor the number of images to be processed by the online image recognition service, and to determine, according to the monitored number of images to be processed, the amount of device resources required to perform a prediction task;

an indicator determination module, configured to determine, based on a preset conversion relationship, the preset monitoring indicator in a container management system that corresponds to the number of images to be processed; wherein the container management system is used to manage the containers held by the online image recognition service; each container encapsulates graphics processor resources, the container is the allocation unit of the amount of resources held by the online image recognition service, and the logical encapsulation of the container is used to intercept calls to the application programming interface of the Compute Unified Device Architecture (CUDA) library;

a capacity expansion/reduction module, configured to control, according to the preset monitoring indicator, the container management system to perform a capacity expansion/reduction operation on the containers held by the online image recognition service, so that the amount of held resources corresponding to the containers after the operation meets the amount of device resources; and to perform the prediction task according to the amount of held resources;

wherein, when performing a capacity expansion operation, the capacity expansion/reduction module partitions resources equal to a target resource amount out of the physical device cards of a resource cluster and encapsulates the resources in a container to construct a virtual device card; the target resource amount is determined according to the amount of device resources and the amount of held resources;

and when performing a capacity reduction operation, the capacity expansion/reduction module performs a preset deletion operation on the containers currently held by the online image recognition service, so that the response time of the online image recognition service falls within a preset time range.

In some possible embodiments, in determining the amount of device resources required to perform the prediction task based on the monitored number of images to be processed, the traffic monitoring module is configured to:

In some possible embodiments, in determining, based on the preset conversion relationship, the preset monitoring indicator in the container management system that corresponds to the number of images to be processed, the indicator determination module is configured to:

In some possible embodiments, before partitioning virtual resources equal to the target resource amount out of the physical device cards of the resource cluster, the capacity expansion/reduction module is further configured to:

detect the resource cluster and obtain the physical device cards that satisfy the amount of device resources;

In some possible embodiments, the capacity expansion/reduction module is further configured to:

In some possible embodiments, in performing the preset deletion operation on the containers currently held by the online image recognition service so that the response time of the online image recognition service falls within the preset time range, the capacity expansion/reduction module is configured to:

In a third aspect, an embodiment of the present application further provides an electronic device, including:

a processor;

a memory for storing instructions executable by the processor;

wherein the processor is configured to execute the instructions to implement any of the methods provided in the first aspect of the present application.

In a fourth aspect, an embodiment of the present application further provides a storage medium; when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform any of the methods provided in the first aspect of the present application.

In a fifth aspect, an embodiment of the present application provides a computer program product, including a computer program which, when executed by a processor, implements any of the methods provided in the first aspect of the present application.

Other features and advantages of the present application will be set forth in the description that follows, and in part will become apparent from the description or be learned by practicing the present application. The objectives and other advantages of the present application may be realized and attained by the structures particularly pointed out in the written description, the claims and the accompanying drawings.

Brief Description of the Drawings

To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is an architecture diagram of the K8s system according to an embodiment of the present application;

FIG. 2a is a flowchart of the capacity expansion/reduction method for an online image recognition service according to an embodiment of the present application;

FIG. 2b is a schematic diagram of virtual device card allocation according to an embodiment of the present application;

FIG. 2c is a schematic diagram of prompt information according to an embodiment of the present application;

FIG. 3 is a schematic diagram of the GPUshare component according to an embodiment of the present application;

FIG. 4 is a structural diagram of a capacity expansion/reduction apparatus 400 for an online image recognition service according to an embodiment of the present application;

FIG. 5 is a schematic diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application are described clearly and in detail below with reference to the accompanying drawings. In the description of the embodiments of the present application, unless otherwise specified, "/" means "or"; for example, A/B may mean A or B. "And/or" in the text merely describes an association between associated objects and indicates that three relationships are possible; for example, A and/or B may mean that A exists alone, that A and B both exist, or that B exists alone. In addition, in the description of the embodiments of the present application, "a plurality of" means two or more.

In the description of the embodiments of the present application, unless otherwise specified, the term "a plurality of" means two or more, and other quantifiers are to be understood similarly. It should be understood that the preferred embodiments described herein are only used to illustrate and explain the present application and are not intended to limit it, and that the embodiments of the present application and the features in the embodiments may be combined with each other where no conflict arises.

To further explain the technical solutions provided by the embodiments of the present application, a detailed description is given below with reference to the accompanying drawings and specific embodiments. Although the embodiments of the present application provide the method steps shown in the following embodiments or drawings, the method may include more or fewer steps based on routine or non-inventive work. For steps that have no necessary logical causal relationship, the order of execution is not limited to the order given in the embodiments of the present application. In actual processing, or when executed by a control device, the method may be executed sequentially or in parallel in the manner shown in the embodiments or drawings.

传统的扩缩容方式是根据在线图像识别服务的CPU资源使用情况作为评判指标，通过对执行预测任务所需的计算资源进行扩缩容操作，以使在线图像识别服务能够动态的适应流量高峰以及低谷时期的不同资源需求。这种直接以CPU资源的占用作为判断指标的方式多存在扩缩容误操作的发生。The traditional way of expanding and shrinking capacity is based on the CPU resource usage of the online image recognition service as the evaluation index, and by expanding and shrinking the computing resources required to perform prediction tasks, so that the online image recognition service can dynamically adapt to traffic peaks and Different resource demands during trough periods. This method of directly using the occupancy of CPU resources as a judgment indicator often leads to the occurrence of expansion and shrinkage misoperations.

此外，现有的深度学习预测任务在集群运行时，会针对每一预测任务分配满足该预测任务所需资源量的图形处理器(Graphic Processing Unit,GPU)设备卡，即实体设备卡。并控制预测模型使用该实体设备卡的持有资源执行该预测任务。然而，通常执行单个预测任务所需的资源量远小于一张实体设备卡持有的资源量。上述以实体设备卡作为最小分配单位的资源分配方式会造成对服务器资源利用率的浪费，进而影响集群预测请求的吞吐量。In addition, when an existing deep learning prediction task is running in a cluster, a graphics processor (Graphic Processing Unit, GPU) device card, ie a physical device card, that satisfies the resource amount required by the prediction task is allocated for each prediction task. And control the prediction model to use the resources held by the physical device card to perform the prediction task. However, usually the amount of resources required to perform a single prediction task is much smaller than the amount of resources held by a single physical device card. The above resource allocation method using the physical device card as the minimum allocation unit will cause waste of server resource utilization, thereby affecting the throughput of the cluster prediction request.

为解决上述技术问题，本申请的发明构思为：通过对在线图像识别服务的待处理图像的数量进行监控，以根据检测到的待处理图像的数量确定执行预测任务所需的设备资源量。基于预设转换关系确定该待处理图像的数量在容器管理系统中对应的预设监控指标，并根据预设监控指标控制容器管理系统对在线图像识别服务持有的容器进行扩缩容操作，以使扩缩容后的容器所对应的持有资源量满足设备资源量。最后根据持有资源量执行预测任务。上述流程以逻辑封装的方式实现从实体设备卡中划分出用于执行预测任务的虚拟设备卡，提高了服务器的资源利用率。并且待处理图像的数量能够更为准确的反应业务流量高峰低谷的情况，上述流程将待处理图像的数量作为执行扩缩容操作的评判指标能够降低扩缩容误操作的发生。In order to solve the above technical problems, the inventive concept of the present application is to monitor the number of images to be processed in the online image recognition service to determine the amount of equipment resources required to perform prediction tasks according to the number of detected images to be processed. Determine the preset monitoring indicators corresponding to the number of images to be processed in the container management system based on the preset conversion relationship, and control the container management system to perform expansion and contraction operations on the containers held by the online image recognition service according to the preset monitoring indicators, so as to Make the amount of resources held by the expanded and contracted containers meet the amount of device resources. Finally, the prediction task is performed according to the amount of resources held. The above process implements the division of the virtual device card for performing the prediction task from the physical device card in the manner of logical encapsulation, which improves the resource utilization rate of the server. In addition, the number of images to be processed can more accurately reflect the peaks and valleys of business traffic. The above process uses the number of images to be processed as an evaluation index for performing expansion and shrinking operations, which can reduce the occurrence of incorrect expansion and shrinkage operations.

The resource cluster mentioned above can be understood as a collection of a large number of electronic devices with computing capability that cooperate to complete specific work. An electronic device may be a server, a virtual machine, or the like.

In a resource cluster environment, applications are deployed and run in containers, and a container management system for managing the containers is deployed in the cluster. The mainstream container management system today is the open-source container orchestration system Kubernetes (K8s). K8s adopts a distributed architecture and provides containerized applications with a complete set of functions such as deployment and operation, resource scheduling, service discovery, and dynamic scaling, which makes large-scale container clusters easier to manage. FIG. 1 shows the system architecture of K8s.

Referring to FIG. 1, the K8s cluster 100 mainly includes a management node 110 and worker nodes 120, and the nodes can communicate with one another. In the example shown in FIG. 1 there is one management node 110 and there are multiple worker nodes 120. A node here may be a server in the cluster, but is not limited to a physical device.

The management node 110 is responsible for managing and controlling the cluster. Key processes such as the application programming interface server (kube-apiserver), the controller manager, the scheduler (kube-scheduler), and the distributed persistent store (etcd) run on the management node 110. The worker nodes 120 are the workload nodes of the cluster: the management node 110 assigns workloads to the worker nodes 120, which run key processes such as the node agents (kubelet, kube-proxy) and the container runtime (Docker engine). For the specific functions of the processes that make up the K8s system, reference may be made to the prior art; they are not described in detail here.

The smallest deployable computing unit that can be created and managed in the K8s cluster 100 is called a pod (a co-located group of containers). A pod represents a logical host containing one or more containers that are always co-located and run in a shared context. The K8s cluster 100 also provides a modular application programming interface core that lets users describe the desired behavior of a pod in a YAML or JSON object (also called a PodSpec). To simplify the description below, each pod is assumed to correspond to one container.

It should be understood that the K8s system above is only an example. The method of the present application does not restrict which container management system is used, but to simplify the description, K8s is still used as the main example whenever the container management system is mentioned below.

In the present application, the applications run in containers are mainly deep learning prediction tasks that need the GPU device resources of the cluster. In the K8s cluster 100, GPU devices can be configured on the worker nodes 120. A container that can use GPU device resources is called a GPU container in the present application. Because GPU devices are expensive, GPU device resources are usually scarce in a K8s cluster. The prediction tasks above use GPU device resources and therefore need to run in GPU containers.

In the prior art, to let native Docker containers use GPU device resources, a layer of encapsulation is added on top of the Docker container, for example the Nvidia-Docker container, which is one kind of GPU container. The K8s system can create and schedule Nvidia-Docker containers, but the finest granularity at which Nvidia-Docker uses GPU device resources is one GPU device card, which the container occupies exclusively; finer-grained allocation is not supported. This directly wastes GPU device resources and in turn lowers the throughput of prediction requests in the cluster.

To solve the above problems, the present application provides a scaling method for an online image recognition service. The method can be applied to a container management system deployed in a resource cluster, including but not limited to the K8s system described above.

The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings. As shown in FIG. 2a, the method includes the following steps:

Step 201: monitor the number of images to be processed by the online image recognition service, and determine the amount of device resources required to perform the prediction task according to the monitored number of images to be processed;

A resource cluster usually serves several microservice systems, and in the related art the CPU utilization of the cluster is commonly used as the criterion for scaling. When other systems unrelated to the online image recognition service consume a large share of CPU resources, CPU utilization rises. On detecting the rise, the online image recognition service performs a scaling operation to obtain more computing resources.

Because this rise in CPU utilization is not caused by the online image recognition service, no scaling operation should have been performed on that service. The result is an erroneous scaling operation, which lowers server resource utilization. Since the visitor traffic of the online image recognition service, measured as the number of images to be processed, directly reflects the business demand on the service, the embodiments of the present application determine the amount of device resources required to perform the prediction task from that traffic. In implementation, because different customers require different processing rates, a weight representing the customer's required processing rate is set, and the product of this weight and the number of images to be processed represents the amount of device resources required to perform the prediction task.

In implementation, the amount of device resources can be determined according to formula (1):

$$y = \omega_1 x_1 + \omega_2 x_2 + \omega_3 x_3 + b \qquad \text{(1)}$$

Here, $y$ is the amount of device resources; $\omega_n$ is the preset weight of $x_n$, with $n \in \{1, 2, 3\}$; $x_1$ is the number of images to be processed; $x_2$ is the latency, i.e., the client's latency requirement for the prediction task; $x_3$ is the processing rate of the graphics processor, i.e., the computing capability of the GPU device, which depends on the GPU card model and can be characterized by the device's specifications (for example, its number of tensor cores); and $b$ is the constant term of the linear equation, which can be set according to user requirements.
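As a minimal sketch, formula (1) can be evaluated as below. The function name, the default weights, and the sample values are illustrative assumptions; the source leaves the weights and the constant term to be preset by the user.

```python
def required_device_resources(pending_images, latency_requirement, gpu_rate,
                              weights=(1.0, 0.0, 0.0), bias=0.0):
    """Evaluate formula (1): y = w1*x1 + w2*x2 + w3*x3 + b.

    x1 = number of images to be processed, x2 = latency requirement,
    x3 = GPU processing rate. The weights and bias are assumed placeholders.
    """
    w1, w2, w3 = weights
    return w1 * pending_images + w2 * latency_requirement + w3 * gpu_rate + bias


# Hypothetical weights: more pending images raise the estimate,
# a looser latency requirement lowers it.
estimate = required_device_resources(100, 50, 10, weights=(2.0, -1.0, 0.5), bias=20.0)
```

With these assumed values the terms are 200, -50, 5, and 20, so `estimate` evaluates to 175.0; the unit of the result depends on how the weights are calibrated.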

In some possible embodiments, the number of images to be processed is determined by an interceptor or filter deployed in advance in the online image recognition service.

Step 202: determine, based on a preset conversion relationship, the preset monitoring indicator corresponding to the number of images to be processed in the container management system; wherein the container management system is used to manage the containers held by the online image recognition service; graphics processor resources are encapsulated in each container, the container is the allocation unit of the resources held by the online image recognition service, and the logical encapsulation of the container is used to intercept calls to the application programming interfaces of the Compute Unified Device Architecture (CUDA) library;

The container management system manages the containers held by the online image recognition service; K8s, described above, is a common container management system. A container carries the graphics processor resources used to perform the prediction service. The Compute Unified Device Architecture (CUDA) library is the interface through which workloads in K8s drive the physical device card. Because K8s does not support sharing the graphics processor resources within a physical device card, the embodiments of the present application intercept all memory-related and compute-related APIs in the CUDA library in order to limit and isolate GPU usage. In this way, part of the resources of a physical device card can be carved out and encapsulated in a container to construct a virtual device card. The minimum allocation unit of resources thus changes from the physical device card to the container carrying a virtual device card, which improves server resource utilization.

As shown in FIG. 2b, suppose prediction task 1 requires 500 MB of device resources and an idle physical device card in the resource cluster holds 1024 MB. Then 500 MB of virtual resources can be carved out of the physical device card and encapsulated in a container to construct virtual device card 1, which is allocated to prediction task 1. If prediction task 2 then arrives and requires 500 MB of device resources, 500 MB can be carved out of the same physical device card (which now holds 1024 MB - 500 MB = 524 MB) to construct virtual device card 2, which is allocated to prediction task 2. There is then no need to allocate two physical device cards to prediction task 1 and prediction task 2, which improves server resource utilization.
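The bookkeeping behind this kind of partitioning can be sketched as follows. The class and method names are hypothetical, and the sizes (a 1024 MB card serving two 500 MB requests) are illustrative; the sketch only tracks memory amounts and does not perform any real GPU isolation.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalCard:
    """Tracks how much memory of one physical device card is already carved out."""
    total_mb: int
    vgpus: dict = field(default_factory=dict)  # virtual card name -> size in MB

    @property
    def free_mb(self):
        return self.total_mb - sum(self.vgpus.values())

    def carve(self, name, size_mb):
        """Reserve size_mb for a new virtual device card; return the remaining free MB."""
        if size_mb > self.free_mb:
            raise ValueError(f"only {self.free_mb} MB free, cannot carve {size_mb} MB")
        self.vgpus[name] = size_mb
        return self.free_mb


card = PhysicalCard(total_mb=1024)
left_after_task1 = card.carve("vgpu-1", 500)  # 524 MB remain on the card
left_after_task2 = card.carve("vgpu-2", 500)  # 24 MB remain on the card
```

A request larger than the remaining capacity raises `ValueError`, which is the point at which a scheduler would have to look for another physical card.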

Step 203: control, according to the preset monitoring indicator, the container management system to perform a scaling operation on the containers held by the online image recognition service, so that the amount of resources held by the containers after scaling satisfies the amount of device resources; and perform the prediction task using the held resources;

The embodiments of the present application set monitoring thresholds on the number of images to be processed to decide whether to perform a scaling task, which reduces the occurrence of erroneous scaling operations.

In implementation, a first monitoring threshold and a second monitoring threshold of different sizes can be defined adaptively according to business requirements and hardware processing capability, the second monitoring threshold being smaller than the first. Specifically, if the number of images to be processed is greater than the first monitoring threshold, the number is determined to correspond to the preset monitoring indicator for performing a scale-out operation. If the number is smaller than the second monitoring threshold, it is determined to correspond to the preset monitoring indicator for performing a scale-in operation. If the number lies between the two thresholds, it is determined to correspond to the preset monitoring indicator for performing no scaling operation.
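The two-threshold rule can be sketched as a small function. The function name and the string labels for the three indicators are illustrative assumptions; the source does not specify how the indicators are encoded in the container management system.

```python
def scaling_indicator(pending_images, first_threshold, second_threshold):
    """Map the number of pending images to a scaling decision.

    first_threshold is the upper (scale-out) threshold and second_threshold
    the lower (scale-in) threshold; the second must be smaller than the first.
    """
    if second_threshold >= first_threshold:
        raise ValueError("second threshold must be smaller than first threshold")
    if pending_images > first_threshold:
        return "scale_out"
    if pending_images < second_threshold:
        return "scale_in"
    return "hold"


decision = scaling_indicator(pending_images=1500, first_threshold=1000, second_threshold=200)
```

Keeping a gap between the two thresholds gives a dead band in which no action is taken, which avoids flapping between scale-out and scale-in when traffic hovers around a single cutoff.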

When a scale-out operation is performed, resources equal to a target resource amount are carved out of a physical device card of the resource cluster and encapsulated in a container to construct a virtual device card; the target resource amount is determined from the amount of device resources and the amount of held resources;

In some possible embodiments, the target resource amount is the difference between the amount of device resources and the amount of held resources. Suppose the prediction task requires 1000 MB of device resources and the containers currently held by the online image recognition service total 700 MB (the amount of held resources). This indicates that performing the prediction task with the resources currently held would not meet the latency required by the business, so a scale-out operation is needed to provide more resources for the task.

When a scale-out operation is performed, the resource cluster is first inspected to obtain the physical device cards that satisfy the amount of device resources. If several such cards are found, the one with the fewest graphics processing resources is selected as the target physical device card; if only one is found, it is used as the target physical device card. Graphics processor resources equal to the target resource amount are then carved out of the target physical device card and encapsulated in a container, and the resulting container serves as a virtual device card.

In some possible embodiments, if no physical device card satisfying the amount of device resources is found when the resource cluster is inspected, the cluster is rechecked a preset number of times at a preset time interval.

If a physical device card satisfying the amount of device resources is found during rechecking, the card with the fewest graphics processing resources is selected as the target physical device card. If no such card is found in the resource cluster, then, as shown in FIG. 2c, a prompt indicating that no resources are currently available is output and the scale-out task ends.
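The selection and recheck steps can be sketched as below. This is a sketch under stated assumptions: "fewest graphics processing resources" is interpreted as the smallest free capacity that still fits the request, the `probe` callable that reports free capacity per card is hypothetical, and the retry count and interval are placeholders for the preset values.

```python
import time


def pick_target_card(free_mb_by_card, required_mb):
    """Among cards with enough free capacity, return the one with the least free MB."""
    candidates = {card: free for card, free in free_mb_by_card.items()
                  if free >= required_mb}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


def pick_with_recheck(probe, required_mb, retries=3, interval_s=1.0, sleep=time.sleep):
    """Inspect the cluster once, then recheck up to `retries` times.

    `probe` is an assumed callable returning {card_id: free_mb}.
    Returns the chosen card id, or None when no resources are available.
    """
    for attempt in range(retries + 1):
        target = pick_target_card(probe(), required_mb)
        if target is not None:
            return target
        if attempt < retries:
            sleep(interval_s)
    return None


chosen = pick_target_card({"gpu-a": 1024, "gpu-b": 600, "gpu-c": 200}, required_mb=500)
```

Here `chosen` is `"gpu-b"`: it fits the 500 MB request and leaves the larger free block on `gpu-a` intact for bigger tasks. A `None` result from `pick_with_recheck` corresponds to the "no resources currently available" prompt.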

Suppose instead that the prediction task requires 500 MB of device resources and the containers currently held by the online image recognition service total 1200 MB. This indicates that performing the prediction task with the resources currently held meets the latency required by the business with room to spare.

In this case, to improve server resource utilization, a scale-in operation can release some of the containers currently held by the online image recognition service, at the cost of a longer latency. During the release it must still be ensured that the increased response time continues to meet business requirements.

In implementation, the amount of graphics processor resources of each container currently held by the online image recognition service is determined first, and the container with the smallest amount is taken as the container to be processed. That container is then deleted and the response time of the online image recognition service is monitored. If the response time is not within a preset time range, another container to be processed is selected from the remaining containers; this repeats until the response time falls within the preset time range, at which point the preset deletion operation stops.

Specifically, suppose the business requires a response time of 100 to 120 ms, and the response time for the prediction task determined from the resources currently held by the online image recognition service is 80 ms. The container with the smallest amount of resources is taken as the container to be processed and deleted, and a new response time is then determined from the resources still held. If the new response time is 110 ms, the preset deletion operation stops, i.e., the scale-in operation ends.
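The deletion loop can be sketched as follows. The `measure_response_ms` callable is a hypothetical stand-in for monitoring the live service, the inverse-proportional latency model in the example is invented purely to make the sketch runnable, and the rule of always keeping at least one container is an added assumption not stated in the source.

```python
def scale_in(containers_mb, measure_response_ms, target_range_ms):
    """Delete the smallest container repeatedly until the response time is in range.

    containers_mb: {container_name: held MB}
    measure_response_ms: assumed callable taking the remaining containers
    target_range_ms: (low, high) bounds of the preset time range
    Returns (remaining containers, names removed in order).
    """
    low, high = target_range_ms
    remaining = dict(containers_mb)
    removed = []
    while len(remaining) > 1:  # assumption: never delete the last container
        victim = min(remaining, key=remaining.get)
        del remaining[victim]
        removed.append(victim)
        if low <= measure_response_ms(remaining) <= high:
            break
    return remaining, removed


# Toy latency model (assumption): response time is inversely proportional to held MB.
toy_latency = lambda held: 96000 / sum(held.values())
kept, dropped = scale_in({"c1": 100, "c2": 200, "c3": 400, "c4": 500},
                         toy_latency, target_range_ms=(100, 120))
```

Under the toy model the service starts at 80 ms with 1200 MB held; deleting `c1` gives about 87 ms, and deleting `c2` gives about 107 ms, which falls inside the 100 to 120 ms range, so the loop stops with `c3` and `c4` kept.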

With the above process, the number of images to be processed serves as the criterion for scaling, which reduces the occurrence of erroneous scaling operations. In addition, container encapsulation is used to carve virtual device cards for prediction tasks out of physical device cards, which improves server resource utilization.

To make clear how the container carrying a virtual device card becomes the minimum allocation unit of resources in the above process, the implementation is described below with reference to FIG. 3:

The present application develops a new framework extension to support sharing of physical device cards in the K8s system, referred to below as GPUShare. GPUShare creates and manages a custom resource (SharePod) created in K8s. A SharePod is a pod whose container can attach to a shared physical device card.

In implementation, a description file for the SharePod (SharePodSpec) is created. Its content includes the amount of device resources required by the prediction task, the identifier of the physical device card, the name of the node in the resource cluster where the physical device card resides, and so on. The amount of device resources is the prediction task's demand on the device and comprises a video memory demand and a compute demand. Video memory is shared spatially, and compute is shared by time slicing. For example, a video memory demand of 0.5 means the container may be allocated at most 50% of the GPU device's total memory, while a compute demand of 0.5 means the container should receive at least 50% of the kernel execution time within a time slice.
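One way to picture the description file is as a small validated record. The field names below are hypothetical and do not reproduce the actual SharePodSpec schema, which the source does not spell out; the sketch only shows how the two fractional demands might be represented and checked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharePodSpec:
    """Illustrative stand-in for the SharePod description file (field names assumed)."""
    name: str
    gpu_id: str                  # identifier of the physical device card
    node_name: str               # cluster node where the card resides
    gpu_memory_fraction: float   # upper bound on the share of device memory
    gpu_compute_fraction: float  # lower bound on the share of kernel time per slice

    def __post_init__(self):
        for label, value in (("gpu_memory_fraction", self.gpu_memory_fraction),
                             ("gpu_compute_fraction", self.gpu_compute_fraction)):
            if not 0.0 < value <= 1.0:
                raise ValueError(f"{label} must be in (0, 1], got {value}")

    def memory_limit_mb(self, card_total_mb):
        """Largest amount of device memory this container may allocate."""
        return int(card_total_mb * self.gpu_memory_fraction)


spec = SharePodSpec("infer-1", "gpu-0", "node-1",
                    gpu_memory_fraction=0.5, gpu_compute_fraction=0.5)
limit_mb = spec.memory_limit_mb(16384)  # half of an assumed 16 GB card
```

Note the asymmetry described in the text: the memory fraction is a cap, whereas the compute fraction is a guaranteed minimum share of execution time.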

To share the resources within a physical device card (i.e., graphics processor resources), GPUShare obtains physical device cards from K8s and then assigns part of the resources of a card to a SharePod. These resources managed by GPUShare are encapsulated in containers as virtual device cards (virtual GPUs, vGPUs). When GPUShare obtains these vGPUs from K8s, their physical locations may be spread across several nodes of the resource cluster, so a vGPU pool is used to represent the set of all vGPUs managed by GPUShare. When a GPU joins the vGPU pool, it is assigned a unique identifier (GPUID). As shown in FIG. 3, the GPUShare components include a scheduler 301, a virtual device card manager 302, and a virtual device card database 303.

In implementation, the scheduler 301 decides the mapping between containers and vGPUs according to the current resource state of the GPU devices in the resource cluster and the amount of device resources required by the prediction task. The scheduler 301 then generates a SharePodSpec using the GPUID determined by the scheduling policy and asks the virtual device card manager 302 to create the corresponding SharePod instance. Any scheduling algorithm can be implemented in this component, for example the binpack algorithm, which preferentially selects the physical device card whose free resources satisfy the required amount of device resources while leaving the least remaining resources.

The virtual device card manager 302 creates the SharePod object and, after receiving the SharePodSpec from the scheduler 301, initializes the device environment in the container. Specifically, it sets environment variables and installs in the container the vGPU library (the virtual device card database 303 described below) so that the GPU usage of each container is isolated. In other words, the virtual resource objects carved out of the physical device card are encapsulated in containers so that all memory-related and compute-related APIs in the CUDA library are intercepted, which limits and isolates GPU usage.

The life cycle of a vGPU has four phases: creation, active, idle, and deletion. A vGPU is created when the virtual device card manager 302 receives a SharePod request whose GPUID does not yet exist. If the GPUID in the SharePod request already exists, the virtual device card manager 302 only needs to retrieve the corresponding UUID from the vGPU pool and does not create a new vGPU. A vGPU becomes active when it is bound to a SharePod. When a user deletes the SharePod in K8s, the vGPU is detached from it. A vGPU that is not bound to any SharePod enters the idle state, and it becomes active again when the virtual device card manager 302 assigns it to another SharePod. Finally, the virtual device card manager 302 decides at its own discretion when to delete an idle vGPU and release the GPU back to K8s.
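The life cycle can be sketched as a small in-memory pool. The class and method names are hypothetical, and creation and deletion are modelled as transitions into and out of the pool rather than as stored states; this is a simplification for illustration, not the manager's actual implementation.

```python
from enum import Enum


class VGpuState(Enum):
    ACTIVE = "active"  # bound to at least one SharePod
    IDLE = "idle"      # exists in the pool but bound to no SharePod


class VGpuPool:
    def __init__(self):
        self._bindings = {}  # GPUID -> set of bound SharePod names

    def bind(self, gpu_id, sharepod):
        """Bind a SharePod; create the vGPU first if the GPUID is unknown.

        Returns True when a new vGPU had to be created.
        """
        created = gpu_id not in self._bindings
        self._bindings.setdefault(gpu_id, set()).add(sharepod)
        return created

    def unbind(self, gpu_id, sharepod):
        """Detach a SharePod, e.g. after the user deletes it."""
        self._bindings[gpu_id].discard(sharepod)

    def state(self, gpu_id):
        if gpu_id not in self._bindings:
            return None  # not created yet, or already deleted
        return VGpuState.ACTIVE if self._bindings[gpu_id] else VGpuState.IDLE

    def delete_if_idle(self, gpu_id):
        """Release the GPU back to the cluster, but only when nothing is bound."""
        if self.state(gpu_id) is VGpuState.IDLE:
            del self._bindings[gpu_id]
            return True
        return False


pool = VGpuPool()
first_request_created = pool.bind("gpuid-1", "sharepod-a")   # unknown GPUID: create
second_request_created = pool.bind("gpuid-1", "sharepod-b")  # known GPUID: reuse
```

Because several SharePods may reference the same GPUID, the sketch treats a vGPU as idle only once every bound SharePod has been detached.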

Resource overuse tends to cause performance interference and, in extreme cases, application failures or crashes, for example when the resources of a physical device card are over-allocated. Therefore, besides enabling resource sharing between containers, it must also be ensured that the resources used by each container stay within the amount of device resources required by its prediction task, i.e., that the physical device card is isolated. To achieve this, the present application develops the virtual device card database 303. It is a library installed in every SharePod container that limits and isolates GPU usage by intercepting all memory-related and compute-related APIs in the CUDA library.

In some implementations, a token can be used to isolate GPU usage between containers in a time-sharing manner. A container can run program code on the GPU device only while it holds a valid token. The token is associated with a time quota, and when the quota expires the container must reacquire the token to continue executing. Containers therefore take turns using the GPU device by passing the token among themselves, and a container's share of usage is controlled by how long it is allowed to hold the token.
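The effect of token-based time sharing can be illustrated with a deterministic simulation. The round-robin order, the function name, and the quota values are assumptions; a real backend would hand out tokens on demand rather than in a fixed cycle.

```python
def simulate_token_passing(quotas_ms, total_ms):
    """Pass one token round-robin; each holder runs until its time quota expires.

    quotas_ms: {container_name: time quota per token hold, in ms}
    Returns the GPU time in ms each container received within total_ms.
    """
    if not quotas_ms or any(q <= 0 for q in quotas_ms.values()):
        raise ValueError("every container needs a positive time quota")
    used = {name: 0 for name in quotas_ms}
    order = list(quotas_ms)
    elapsed, turn = 0, 0
    while elapsed < total_ms:
        holder = order[turn % len(order)]
        # The holder keeps the token for its quota, or until the simulation ends.
        slice_ms = min(quotas_ms[holder], total_ms - elapsed)
        used[holder] += slice_ms
        elapsed += slice_ms
        turn += 1
    return used


shares = simulate_token_passing({"container-a": 30, "container-b": 10}, total_ms=400)
```

With quotas of 30 ms and 10 ms, each 40 ms round gives `container-a` three quarters of the GPU time, so over 400 ms the containers receive 300 ms and 100 ms respectively.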

Technically, this is implemented by a front-end module in each container and a back-end module on each cluster node. The front-end module is a dynamic link library inside the container. It intercepts all CUDA library APIs related to video memory and compute through the Linux LD_PRELOAD mechanism, so that the application loads the virtual device card database 303 before loading the standard GPU CUDA library. If the container does not hold a valid token, the front-end module blocks the intercepted CUDA calls until it reacquires a valid token from the back-end module. The back-end module is an independent daemon running on a node of the resource cluster that manages the tokens among containers. Because each physical device card is associated with its own token, a node needs only one back-end module to manage the token of each device independently.

Based on the same inventive concept, an embodiment of the present application further provides a scaling apparatus 400 for an online image recognition service which, as shown in FIG. 4, includes:

a traffic monitoring module 401, configured to monitor the number of images to be processed by the online image recognition service and to determine the amount of device resources required to perform the prediction task according to the monitored number of images to be processed;

an indicator determination module 402, configured to determine, based on a preset conversion relationship, the preset monitoring indicator corresponding to the number of images to be processed in the container management system; wherein the container management system is used to manage the containers held by the online image recognition service; graphics processor resources are encapsulated in each container, the container is the allocation unit of the resources held by the online image recognition service, and the logical encapsulation of the container is used to intercept calls to the application programming interfaces of the Compute Unified Device Architecture (CUDA) library;

a scaling module 403, configured to control, according to the preset monitoring indicator, the container management system to perform a scaling operation on the containers held by the online image recognition service, so that the amount of resources held by the containers after scaling satisfies the amount of device resources, and to perform the prediction task using the held resources;

其中，所述扩缩容模块403执行扩容操作时，从资源集群的实体设备卡中划分出与目标资源量相同的资源，对所述资源进行容器封装以构建虚拟设备卡；所述目标资源量是根据所述设备资源量和所述持有资源量确定的；Wherein, when the capacity expansion/reduction module 403 performs the capacity expansion operation, it divides resources with the same amount of target resources from the physical device cards of the resource cluster, and performs container encapsulation on the resources to construct a virtual device card; the target resource amount is determined according to the amount of equipment resources and the amount of held resources;

所述扩缩容模块403执行缩容操作时，对所述在线图像识别服务当前持有的容器执行预设删除操作，以使所述在线图像识别服务的响应时间处于预设时间范围内。When the capacity expansion/reduction module 403 performs a capacity reduction operation, a preset deletion operation is performed on the container currently held by the online image recognition service, so that the response time of the online image recognition service is within a preset time range.
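The decisions made by modules 401 and 402 can be illustrated with a minimal Python sketch. The text states only that the amount of device resources depends on the number of pending images, the delay duration of the prediction task, and the processing rate, and that the target resource amount depends on the required and held amounts. The concrete formulas below, and all function and parameter names, are assumptions chosen for illustration and are not taken from the application.

```python
import math
from enum import Enum


class Indicator(Enum):
    """Hypothetical encoding of the preset monitoring indicator."""
    EXPAND = "expand"
    REDUCE = "reduce"
    HOLD = "hold"


def required_device_resources(pending_images: int, delay_s: float,
                              rate_per_unit: float) -> int:
    """Resource units needed to clear the backlog within the delay duration.

    Assumed formula: one unit processes rate_per_unit images per second,
    so it can clear rate_per_unit * delay_s images in the allowed time.
    """
    if delay_s <= 0 or rate_per_unit <= 0:
        raise ValueError("delay_s and rate_per_unit must be positive")
    return math.ceil(pending_images / (rate_per_unit * delay_s))


def target_resource_amount(required: int, held: int) -> int:
    """Assumed rule: request only the shortfall between required and held."""
    return max(required - held, 0)


def monitoring_indicator(pending_images: int, expand_threshold: int,
                         reduce_threshold: int) -> Indicator:
    """Map the pending-image count to an indicator using two thresholds."""
    if reduce_threshold >= expand_threshold:
        raise ValueError("reduce_threshold must be below expand_threshold")
    if pending_images > expand_threshold:
        return Indicator.EXPAND
    if pending_images < reduce_threshold:
        return Indicator.REDUCE
    return Indicator.HOLD  # between the thresholds: no operation needed
```

Using two separate thresholds leaves a band in which no operation is triggered, which is one way to limit repeated expansion and reduction when the load fluctuates around a single value.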

In some possible embodiments, to determine the amount of device resources required to perform the prediction task based on the monitored number of images to be processed, the flow monitoring module 401 is configured to:

In some possible embodiments, to determine, based on the preset conversion relationship, the preset monitoring indicator that corresponds to the number of images to be processed in the container management system, the indicator determination module 402 is configured to:

In some possible embodiments, before partitioning virtual resources equal to the target resource amount from the physical device cards of the resource cluster, the capacity expansion/reduction module 403 is further configured to:

In some possible embodiments, the capacity expansion/reduction module 403 is further configured to:

In some possible embodiments, to perform the preset deletion operation on the containers currently held by the online image recognition service so that the response time of the online image recognition service falls within the preset time range, the capacity expansion/reduction module 403 is configured to:
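The behaviour of the capacity expansion/reduction module 403 is spelled out in the method claims: among the physical device cards that satisfy the demand, the one with the least graphics processor resources is chosen; if none is found, the cluster is rechecked a preset number of times at a preset interval; and during capacity reduction the container with the fewest resources is deleted first, until the response time is within the preset range. The Python sketch below illustrates these steps under stated assumptions. "Least resources" is read here as least free capacity, the rule of keeping at least one container is an added safeguard, and every name is hypothetical.

```python
import time
from typing import Callable, Dict, List, Optional


def pick_target_card(free_by_card: Dict[str, int], needed: int) -> Optional[str]:
    """Among cards with enough free resources, pick the one with the least."""
    candidates = {card: free for card, free in free_by_card.items()
                  if free >= needed}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


def pick_with_rechecks(probe: Callable[[], Dict[str, int]], needed: int,
                       rechecks: int, interval_s: float,
                       sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Probe the cluster once, then recheck up to `rechecks` more times."""
    card = pick_target_card(probe(), needed)
    for _ in range(rechecks):
        if card is not None:
            break
        sleep(interval_s)
        card = pick_target_card(probe(), needed)
    # None means no resource is available and the expansion task ends.
    return card


def scale_in(containers: Dict[str, int], response_time: Callable[[], float],
             low: float, high: float) -> List[str]:
    """Delete smallest containers until the response time is within range."""
    remaining = dict(containers)
    deleted: List[str] = []
    # Assumed safeguard: always keep at least one container running.
    while len(remaining) > 1:
        victim = min(remaining, key=remaining.get)
        del remaining[victim]
        deleted.append(victim)
        if low <= response_time() <= high:
            break
    return deleted
```

Choosing the smallest card that still fits keeps larger cards free for later, larger requests, and deleting the smallest container first changes capacity in the smallest possible steps.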

An electronic device 130 according to this embodiment of the present application is described below with reference to FIG. 5. The electronic device 130 shown in FIG. 5 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.

As shown in FIG. 5, the electronic device 130 takes the form of a general-purpose electronic device. The components of the electronic device 130 may include, but are not limited to: the at least one processor 131, the at least one memory 132, and a bus 133 that connects the different system components (including the memory 132 and the processor 131).

The bus 133 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, and a processor or local bus using any of a variety of bus architectures.

The memory 132 may include readable media in the form of volatile memory, such as a random access memory (RAM) 1321 and/or a cache memory 1322, and may further include a read-only memory (ROM) 1323.

The memory 132 may also include a program/utility 1325 having a set of (at least one) program modules 1324. Such program modules 1324 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.

The electronic device 130 may also communicate with one or more external devices 134 (for example, a keyboard or a pointing device), with one or more devices that enable a user to interact with the electronic device 130, and/or with any device (for example, a router or a modem) that enables the electronic device 130 to communicate with one or more other electronic devices. Such communication may take place through an input/output (I/O) interface 135. The electronic device 130 may also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 136. As shown in the figure, the network adapter 136 communicates with the other modules of the electronic device 130 via the bus 133. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 130, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

In an exemplary embodiment, a computer-readable storage medium including instructions is also provided, for example the memory 132 including instructions, and the instructions can be executed by the processor 131 of the apparatus 400 to carry out the above method. Optionally, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.

In an exemplary embodiment, a computer program product is also provided, including a computer program/instructions which, when executed by the processor 131, implement the method for capacity expansion and reduction of an online image recognition service provided in the present application.

In an exemplary embodiment, the various aspects of the method for capacity expansion and reduction of an online image recognition service provided in the present application may also be implemented in the form of a program product that includes program code. When the program product runs on a computer device, the program code causes the computer device to execute the steps of the method for capacity expansion and reduction of an online image recognition service according to the various exemplary embodiments of the present application described above in this specification.

The program product may use any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example but without limitation, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The program product for capacity expansion and reduction of an online image recognition service according to the embodiments of the present application may use a portable compact disc read-only memory (CD-ROM), include program code, and run on an electronic device. However, the program product of the present application is not limited to this. In this document, a readable storage medium may be any tangible medium that contains or stores a program that can be used by, or in combination with, an instruction execution system, apparatus, or device.

A readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which readable program code is carried. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transport a program for use by, or in combination with, an instruction execution system, apparatus, or device.

The program code contained on a readable medium may be transmitted using any suitable medium, including but not limited to wireless, wired, optical cable, RF, or any suitable combination of the foregoing.

Program code for performing the operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's electronic device, partly on the user's electronic device, as a stand-alone software package, partly on the user's electronic device and partly on a remote electronic device, or entirely on a remote electronic device or server. Where a remote electronic device is involved, the remote electronic device may be connected to the user's electronic device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external electronic device (for example, through the Internet using an Internet service provider).

It should be noted that although several units or sub-units of the apparatus are mentioned in the detailed description above, this division is merely exemplary and not mandatory. In fact, according to the embodiments of the present application, the features and functions of two or more units described above may be embodied in one unit. Conversely, the features and functions of one unit described above may be further divided so as to be embodied by multiple units.

In addition, although the operations of the method of the present application are described in the drawings in a particular order, this does not require or imply that the operations must be performed in that particular order, or that all of the operations shown must be performed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be split into multiple steps.

Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) that contain computer-usable program code.

The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable image scaling device to produce a machine, such that the instructions executed by the processor of the computer or other programmable image scaling device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable image scaling device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture that includes instruction means, the instruction means implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions may also be loaded onto a computer or other programmable image scaling device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Although preferred embodiments of the present application have been described, those skilled in the art can make further changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present application.

Obviously, those skilled in the art can make various changes and variations to the present application without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present application and their technical equivalents, the present application is intended to cover them as well.

Claims

1. A method for capacity expansion and reduction of an online image recognition service, wherein the method comprises:

monitoring the number of images to be processed in the online image recognition service, and determining the amount of device resources required to perform a prediction task based on the monitored number of images to be processed;

determining, based on a preset conversion relationship, a preset monitoring indicator that corresponds to the number of images to be processed in a container management system; wherein the container management system is used to manage the containers held by the online image recognition service; each container encapsulates graphics processor resources, the container is the unit in which the resources held by the online image recognition service are allocated, and the logical encapsulation of the container is used to intercept calls to the application programming interface of the Compute Unified Device Architecture (CUDA) library;

controlling the container management system, according to the preset monitoring indicator, to perform a capacity expansion or reduction operation on the containers held by the online image recognition service, so that the amount of resources held by the containers after the operation satisfies the amount of device resources; and performing the prediction task according to the amount of held resources;

wherein, when the capacity expansion operation is performed, resources equal to a target resource amount are partitioned from the physical device cards of a resource cluster, and the resources are encapsulated in a container to construct a virtual device card; the target resource amount is determined according to the amount of device resources and the amount of held resources;

when the capacity reduction operation is performed, a preset deletion operation is performed on the containers currently held by the online image recognition service, so that the response time of the online image recognition service is within a preset time range.

2. The method according to claim 1, wherein the determining the amount of device resources required to perform the prediction task based on the monitored number of images to be processed comprises:

obtaining the delay duration of the prediction task and the processing rate of the image processor;

determining the amount of device resources according to the number of images to be processed, the delay duration, and the processing rate.

3. The method according to claim 1, wherein the determining, based on a preset conversion relationship, a preset monitoring indicator that corresponds to the number of images to be processed in the container management system comprises:

if the number of images to be processed is greater than a first monitoring threshold, determining that the number of images to be processed corresponds to a preset monitoring indicator representing that a capacity expansion operation is to be performed;

if the number of images to be processed is less than a second monitoring threshold, determining that the number of images to be processed corresponds to a preset monitoring indicator representing that a capacity reduction operation is to be performed; wherein the second monitoring threshold is less than the first monitoring threshold;

if the number of images to be processed is between the first monitoring threshold and the second monitoring threshold, determining that the number of images to be processed corresponds to a preset monitoring indicator representing that no capacity expansion or reduction operation needs to be performed.

4. The method according to claim 1, wherein, before partitioning virtual resources equal to the target resource amount from the physical device cards of the resource cluster, the method further comprises:

probing the resource cluster to obtain the physical device cards that satisfy the amount of device resources;

and wherein the partitioning graphics processor resources equal to the target resource amount from the physical device cards of the resource cluster, and encapsulating the graphics processor resources in a container to construct a virtual device card, comprises:

selecting the physical device card with the least graphics processor resources as a target physical device card, and partitioning graphics processor resources equal to the target resource amount from the target physical device card;

encapsulating the graphics processor resources in a container, and using the encapsulated container as the virtual device card.

5. The method according to claim 4, wherein the method further comprises:

when probing the resource cluster, if no physical device card that satisfies the amount of device resources is detected in the resource cluster, performing a preset number of resource rechecks on the resource cluster at a preset time interval;

during the resource rechecks, if a physical device card that satisfies the amount of device resources is detected, selecting the physical device card with the least graphics processor resources as the target physical device card;

if no physical device card that satisfies the amount of device resources is detected in the resource cluster, outputting a prompt indicating that no resource is currently available, and ending the current capacity expansion task.

6. The method according to claim 1, wherein the performing a preset deletion operation on the containers currently held by the online image recognition service, so that the response time of the online image recognition service is within a preset time range, comprises:

determining the amount of graphics processor resources of each container, and taking the container with the smallest amount of resources as a container to be processed;

deleting the container to be processed, and monitoring the response time of the online image recognition service;

if the response time is not within the preset time range, reselecting a container to be processed from the remaining containers, and stopping the preset deletion operation once the response time is within the preset time range.

7. The method according to any one of claims 1-6, wherein the number of images to be processed is determined according to an interceptor or filter pre-deployed in the online image recognition service.

8. An apparatus for capacity expansion and reduction of an online image recognition service, wherein the apparatus comprises:

a flow monitoring module, configured to monitor the number of images to be processed in the online image recognition service, and to determine the amount of device resources required to perform a prediction task according to the monitored number of images to be processed;

an indicator determination module, configured to determine, based on a preset conversion relationship, a preset monitoring indicator that corresponds to the number of images to be processed in a container management system; wherein the container management system is used to manage the containers held by the online image recognition service; each container encapsulates graphics processor resources, the container is the unit in which the resources held by the online image recognition service are allocated, and the logical encapsulation of the container is used to intercept calls to the application programming interface of the Compute Unified Device Architecture (CUDA) library;

a capacity expansion/reduction module, configured to control the container management system, according to the preset monitoring indicator, to perform a capacity expansion or reduction operation on the containers held by the online image recognition service, so that the amount of resources held by the containers after the operation satisfies the amount of device resources; and to perform the prediction task according to the amount of held resources;

wherein, when performing the capacity expansion operation, the capacity expansion/reduction module partitions resources equal to a target resource amount from the physical device cards of a resource cluster, and encapsulates the resources in a container to construct a virtual device card; the target resource amount is determined according to the amount of device resources and the amount of held resources;

when performing the capacity reduction operation, the capacity expansion/reduction module performs a preset deletion operation on the containers currently held by the online image recognition service, so that the response time of the online image recognition service is within a preset time range.

9. An electronic device, comprising:

a processor;

a memory for storing instructions executable by the processor;

wherein the processor is configured to execute the instructions to implement the method according to any one of claims 1-7.

10. A storage medium, wherein, when the instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to execute the method according to any one of claims 1-7.