
CN119883512B - Model task processing method, device, equipment and medium - Google Patents

Model task processing method, device, equipment and medium

Info

Publication number
CN119883512B
Authority
CN
China
Prior art keywords
task
virtual machine
model
processing
reinforced
Prior art date
Legal status
Active
Application number
CN202411959462.6A
Other languages
Chinese (zh)
Other versions
CN119883512A (en)
Inventor
杨子夜
文骞
Current Assignee
Beijing Volcano Engine Technology Co Ltd
Original Assignee
Beijing Volcano Engine Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Volcano Engine Technology Co Ltd filed Critical Beijing Volcano Engine Technology Co Ltd
Priority to CN202411959462.6A
Publication of CN119883512A
Application granted
Publication of CN119883512B

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44: Arrangements for executing specific programs
    • G06F 9/455: Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F 9/45533: Hypervisors; Virtual machine monitors
    • G06F 9/45558: Hypervisor-specific management and integration aspects
    • G06F 2009/4557: Distribution of virtual machine instances; Migration and load balancing
    • G06F 2009/45587: Isolation or security of virtual machine instances
    • G06F 21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/50: Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F 21/57: Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract


The disclosed embodiments relate to a model task processing method, apparatus, device, and medium, wherein the method comprises: obtaining a target model task and reinforcement configuration information, determining a task stage to be reinforced and a non-reinforced task stage among multiple task stages based on the reinforcement configuration information, scheduling the task stage to be reinforced to a corresponding first virtual machine for processing, and scheduling the non-reinforced task stage to a corresponding second virtual machine for processing, wherein the first virtual machine is deployed in a trusted execution environment based on trusted hardware. The above technical solution effectively improves the security of model task processing, prevents internal attacks by relevant personnel of the cloud service provider, and improves the security level of data isolation at runtime while ensuring that performance is not affected.

Description

Model task processing method, device, equipment and medium
Technical Field
The disclosure relates to the field of computer technology, and in particular, to a method, a device, equipment and a medium for processing model tasks.
Background
In a cloud computing scenario, a cloud service provider offering Model as a Service (MaaS) may provide a model to users as a service. Users want the container groups (pods) running their model-related tasks to be isolated in the network, computing, and storage dimensions to ensure security. However, personnel of the cloud service provider may, without authorization, obtain runtime data of the cloud server virtual machine (Elastic Compute Service Virtual Machine, ECS VM) hosting a user's container group, and thereby obtain data from the user's inference or fine-tuning, which lowers the security of the user's model task processing. In the related art, this problem is addressed by device-level or process-level trusted execution environments or by cryptographic obfuscation techniques, but the effect is poor and needs to be improved.
Disclosure of Invention
In order to solve the technical problems, the present disclosure provides a method, an apparatus, a device, and a medium for processing model tasks.
The embodiment of the disclosure provides a model task processing method, which comprises the following steps:
acquiring a target model task and reinforcement configuration information, wherein the target model task comprises a plurality of task stages aiming at a target model;
Determining a task stage to be reinforced and a non-reinforced task stage in the plurality of task stages according to the reinforcement configuration information;
Scheduling the task stage to be reinforced to a corresponding first virtual machine for processing, and scheduling the non-reinforced task stage to a corresponding second virtual machine for processing;
wherein the first virtual machine is deployed in a trusted execution environment based on trusted hardware.
The embodiment of the disclosure also provides a model task processing device, which comprises:
an acquisition module, configured to acquire a target model task and reinforcement configuration information, wherein the target model task comprises a plurality of task stages for a target model;
the determining module is used for determining a task stage to be reinforced and a non-reinforced task stage in the plurality of task stages according to the reinforcement configuration information;
the processing module is used for dispatching the task stage to be reinforced to the corresponding first virtual machine for processing, and dispatching the non-reinforced task stage to the corresponding second virtual machine for processing;
wherein the first virtual machine is deployed in a trusted execution environment based on trusted hardware.
The embodiment of the disclosure also provides an electronic device, which comprises a processor and a memory for storing instructions executable by the processor, wherein the processor is configured to read the executable instructions from the memory and execute them to implement the model task processing method provided by the embodiments of the present disclosure.
The present disclosure also provides a computer-readable storage medium storing a computer program for executing the model task processing method as provided by the embodiments of the present disclosure.
Compared with the prior art, the technical scheme provided by the embodiments of the present disclosure has the following advantages. The model task processing scheme acquires a target model task and reinforcement configuration information, wherein the target model task comprises a plurality of task stages for the target model; determines, according to the reinforcement configuration information, the task stages to be reinforced and the non-reinforced task stages among the plurality of task stages; schedules the task stages to be reinforced to corresponding first virtual machines for processing; and schedules the non-reinforced task stages to corresponding second virtual machines for processing, wherein the first virtual machines are deployed in a trusted execution environment based on trusted hardware. With this technical solution, the cloud service provider can divide the target model task into a plurality of task stages and schedule different task stages to different virtual machines according to the reinforcement configuration information. Because the first virtual machine is deployed in a trusted execution environment based on trusted hardware, the model task processed by the first virtual machine obtains higher security. This effectively improves the security of model task processing, prevents internal attacks by personnel of the cloud service provider, and raises the security level of runtime data isolation without affecting performance.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
FIG. 1 is a flow chart of a model task processing method provided in some embodiments of the present disclosure;
FIG. 2 is a flow chart of another method of model task processing provided in some embodiments of the present disclosure;
FIG. 3 is a schematic architecture diagram of a virtual machine deployed based on a trusted execution environment provided by some embodiments of the present disclosure;
FIG. 4 is a flow chart of yet another model task processing method provided by some embodiments of the present disclosure;
FIG. 5 is a schematic diagram of a control plane architecture provided by some embodiments of the present disclosure;
FIG. 6 is a schematic diagram of a model task processing device according to some embodiments of the present disclosure;
Fig. 7 is a schematic structural diagram of an electronic device according to some embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the accompanying drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "including" and variations thereof as used herein are intended to be open-ended, i.e., including, but not limited to. The term "based on" is based at least in part on. The term "one embodiment" means "at least one embodiment," another embodiment "means" at least one additional embodiment, "and" some embodiments "means" at least some embodiments. Related definitions of other terms will be given in the description below.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that references to "a" or "an" and "a plurality" in this disclosure are intended to be illustrative rather than limiting; those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
In a cloud computing scenario, a user may run tasks related to a large language model through rented Infrastructure as a Service (IaaS) or Platform as a Service (PaaS). When the user obtains the model as a service from a cloud service provider, the user wants the container groups of its own model-related tasks to be isolated in the network, computing, and storage dimensions to ensure security. However, personnel of the cloud service provider may, without online authorization, enter the container group of a serviced user to steal data. For example, such personnel may operate the host operating system or the hypervisor through the infrastructure to dump some runtime memory of the virtual machines hosting certain users' container groups and then perform offline analysis, thereby obtaining information about those container groups, so that the security of the user's model task processing is low.
In the related art, one implementation is to put model tasks directly into devices that provide a trusted execution environment, relying entirely on the characteristics of those devices; if no such devices are available, no workload can be placed. Another implementation uses a process-level trusted execution environment to improve the security of model task processing, but it requires migrating applications, is costly, and is not well suited to model task processing. Yet another implementation improves the security of model task processing through cryptographic obfuscation, using cryptographic methods so that personnel of the cloud service provider cannot recognize the data running on the infrastructure-as-a-service platform; however, this approach requires the user to perform data transformations on the client side and has low practicability. In summary, the above methods are not effective at improving the security of model task processing and need to be improved.
In order to solve the above-mentioned problems, embodiments of the present disclosure provide a model task processing method, which is described below with reference to specific embodiments.
Fig. 1 is a flow chart of a model task processing method provided in some embodiments of the present disclosure, which may be performed by a model task processing device, where the device may be implemented in software and/or hardware, and may be generally integrated in an electronic device. As shown in fig. 1, the method includes:
step 101, acquiring a target model task and reinforcement configuration information, wherein the target model task comprises a plurality of task stages aiming at a target model.
The model task processing method provided by the embodiments of the present disclosure can be applied to a cloud service provider (Cloud Service Provider, CSP). A cloud service provider is an entity that provides cloud services (such as Infrastructure as a Service or Platform as a Service) to serve different model providers (vendors); for example, the cloud service provider may host a model provider's large model and provide inference, fine-tuning, and other services based on it. A large model, also called a large language model (Large Language Model, LLM), is a natural-language processing model that automatically learns the rules and structure of a language, understands its meaning, and generates grammatically correct and semantically consistent text based on that understanding. A large model may also generate images from text, generate videos from text, and so forth. A model provider is an entity that owns large models and uses the cloud service provider to build its own inference-related services.
A model task is a task related to any model issued by a user, such as one or more of a model inference task and a model fine-tuning task for a certain model. In the embodiments of the present disclosure, the target model task refers to a task related to a target model, where the target model may be a model for which the user needs task processing to be executed. A user (client) is an entity that runs applications provided by the cloud service provider and the model provider and has requirements on both; it may be considered a client common to the cloud service provider and the model provider. A model inference task refers to the process of predicting or classifying data using a trained model. A model fine-tuning task refers to fine-tuning a model with training data to improve its performance on a particular task.
Specifically, the target model task includes a plurality of task stages for the target model; the specific number is not limited. A task stage is one of the separable links or sub-tasks in the life cycle of a model task, identified by the cloud service provider through analysis of a number of model tasks.
For example, when the target model task is a model training task for a target model, the target model task may include a plurality of task stages such as a preprocessing stage, an intermediate stage, and a post-processing stage. The preprocessing stage performs preliminary processing and preparation of the input data, such as data cleaning, format conversion, feature extraction, and other preprocessing tasks. The intermediate stage mainly performs computational tasks and is typically the core computational part of the model. The post-processing stage further processes and integrates the data output by the model to convert the results into a form that can ultimately be used or displayed. Specifically, the preprocessing stage is typically performed by a central processing unit (Central Processing Unit, CPU), the intermediate stage is typically performed by a graphics processing unit (Graphics Processing Unit, GPU), and the post-processing stage is typically performed by the central processing unit. Taking a model training task as an example, the target model task can be divided into a data set preprocessing stage, a training stage on the preprocessed data set, and an integration stage for the model output data.
Reinforcement configuration information refers to configuration that the model provider sets in advance for tasks of the target model based on its reinforcement requirements; specifically, the cloud service provider may offer a visual page to the model provider so that the model provider can configure the model quickly. The reinforcement configuration information is used to specify, through a target configuration dimension, the task stages of the target model task that are to be reinforced, where the target configuration dimension is a set of key parameters for describing and defining the model task and may include at least one of a reinforcement level, a task stage, a model, and a user.
The reinforcement level is a protection level that the model provider configures in advance for tasks of the target model along the reinforcement-level dimension; different levels correspond to different numbers of task stages to be reinforced. Illustratively, if the reinforcement level is the first level, no task stage of the target model is reinforced; if the reinforcement level is the second level, 1/3 of the task stages of the target model are reinforced and 2/3 are not. The task stage dimension means that the model provider specifies directly which task stages need reinforcement; if the model provider specifies task stage A in the target configuration dimension, task stage A is determined to need reinforcement. The model dimension means that the model provider specifies reinforcement per model; if the model specified in the target configuration dimension is model B, all model tasks corresponding to model B need reinforcement. The user dimension means that the model provider specifies reinforcement per user; if the user specified in the target configuration dimension is user C, every task stage of every model task belonging to user C needs reinforcement. The task stage to be reinforced is the task stage in the model task that requires reinforcement.
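To make the configuration dimensions above concrete, the following is a minimal Python sketch of one way a reinforcement configuration could be represented; the class, field, and enum names are illustrative assumptions and are not part of the disclosed embodiments.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional

class ConfigDimension(Enum):
    REINFORCEMENT_LEVEL = "reinforcement_level"   # level-based dimension
    TASK_STAGE = "task_stage"                     # explicit task-stage dimension
    MODEL = "model"                               # per-model dimension
    USER = "user"                                 # per-user dimension

@dataclass
class ReinforcementConfig:
    """Reinforcement configuration set in advance by the model provider."""
    dimension: ConfigDimension
    level: Optional[int] = None                      # used when dimension is REINFORCEMENT_LEVEL
    stages: List[str] = field(default_factory=list)  # used when dimension is TASK_STAGE
    model_id: Optional[str] = None                   # used when dimension is MODEL
    user_id: Optional[str] = None                    # used when dimension is USER

# Example: reinforce only the preprocessing stage of the target model task.
config = ReinforcementConfig(dimension=ConfigDimension.TASK_STAGE,
                             stages=["preprocessing"])
```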
In the embodiment of the disclosure, the cloud service provider may acquire the target model task issued by the user and the reinforcement configuration information preset by the model provider, where the reinforcement configuration information may be set according to the actual service requirement.
And 102, determining a task stage to be reinforced and a non-reinforced task stage in a plurality of task stages according to the reinforcement configuration information.
The task stage to be reinforced refers to a task stage in the model task that needs to be reinforced, where reinforcement means data isolation in aspects such as network, computing, and storage, preventing potential attacks on information inside the virtual machine by the host operating system or the hypervisor. The non-reinforced task stage refers to a task stage in the model task that does not need to be reinforced.
In the embodiment of the disclosure, the cloud service provider determines a task stage to be reinforced and a non-reinforced task stage in a plurality of task stages of the target model task by analyzing the reinforced configuration information.
For example, fig. 2 is a schematic flow chart of another method for processing a model task according to some embodiments of the present disclosure, as shown in fig. 2, where step 102 may include step 201, step 202, and step 203, and specifically includes:
step 201, when the target configuration dimension is the reinforcement level, determining task stages to be reinforced based on the target number corresponding to the reinforcement level, and determining task stages other than the task stages to be reinforced as non-reinforcement task stages.
The target number refers to the number of task stages, which need reinforcement, among task stages of the target model task determined based on the reinforcement level.
In the embodiment of the present disclosure, when the target configuration dimension is the reinforcement level, the cloud service provider may determine the target number based on the reinforcement level, extract that number of task stages from the plurality of task stages according to a preset extraction policy as the task stages to be reinforced, and determine the remaining task stages as non-reinforced task stages. The preset extraction policy may be, for example, random extraction or extraction in descending order of the data volume of each task stage, and is not specifically limited. For example, if the preset extraction policy is random extraction and the target number is half, half of the task stages extracted at random may be determined as task stages to be reinforced, and the task stages other than these are determined as non-reinforced task stages.
Step 202, when the target configuration dimension is a model or a user, determining all task phases in the plurality of task phases as task phases to be reinforced when the target model task matches the model or the user.
Specifically, when the target configuration dimension is a model, the cloud service provider matches the target model task with the model, and if the matching is successful, all task phases of the target model task are determined to be task phases to be reinforced. And when the target configuration dimension is the user, the cloud service provider matches the target model task with the user, and if the matching is successful, all task phases of the target model task are determined to be task phases to be reinforced.
In step 203, when the target configuration dimension is a task stage, task stages corresponding to the target configuration dimension in the task stages are determined as task stages to be reinforced, and task stages other than the task stages to be reinforced are determined as non-reinforced task stages.
In the embodiment of the disclosure, when the target configuration dimension is a task stage, the cloud service provider may determine a task stage, which is to be reinforced by the target model preconfigured by the model provider, of a plurality of task stages of the target model task as a task stage to be reinforced, and determine task stages other than the task stage to be reinforced as non-reinforced task stages.
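As a rough illustration of steps 201 to 203, the sketch below splits the task stages of a target model task into reinforced and non-reinforced sets based on the reinforcement configuration. It reuses the ReinforcementConfig sketch above; the level-to-ratio mapping, the random extraction policy, and the matching flag are assumptions chosen for the example, not details fixed by the disclosure.

```python
import random
from typing import List, Tuple

# Assumed mapping from reinforcement level to the fraction of stages to reinforce
# (e.g. the first level reinforces nothing, the second level reinforces 1/3 of the stages).
LEVEL_RATIOS = {1: 0.0, 2: 1.0 / 3.0, 3: 1.0}

def split_stages(task_stages: List[str],
                 config: "ReinforcementConfig",
                 task_matches_config: bool = True) -> Tuple[List[str], List[str]]:
    """Return (stages_to_reinforce, non_reinforced_stages) for one target model task."""
    if config.dimension is ConfigDimension.REINFORCEMENT_LEVEL:
        # Step 201: derive the target number from the level, then extract that many
        # stages with a preset policy (random extraction is used here).
        target_number = round(len(task_stages) * LEVEL_RATIOS.get(config.level, 0.0))
        reinforced = set(random.sample(task_stages, target_number))
    elif config.dimension in (ConfigDimension.MODEL, ConfigDimension.USER):
        # Step 202: if the task matches the configured model or user,
        # every task stage is reinforced.
        reinforced = set(task_stages) if task_matches_config else set()
    else:
        # Step 203: only the explicitly configured task stages are reinforced.
        reinforced = {s for s in task_stages if s in config.stages}
    to_reinforce = [s for s in task_stages if s in reinforced]
    non_reinforced = [s for s in task_stages if s not in reinforced]
    return to_reinforce, non_reinforced

# Example: a training task with three stages under the task-stage configuration above.
stages = ["preprocessing", "fine_tuning", "postprocessing"]
print(split_stages(stages, config))
```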
And step 103, scheduling the task stage to be reinforced to a corresponding first virtual machine for processing, and scheduling the non-reinforced task stage to a corresponding second virtual machine for processing, wherein the first virtual machine is deployed in a trusted execution environment based on trusted hardware.
A virtual machine (Virtual Machine, VM) can provide a standalone operating environment for managing user container groups (User Pods) and service container groups (Service Pods), so that these container groups share the same virtual hardware resources within the virtual machine while remaining independent and isolated from each other. The first virtual machine refers to a virtual machine deployed in a trusted execution environment based on trusted hardware; the specific number is not limited. The second virtual machine refers to a virtual machine that is not deployed in a trusted execution environment based on trusted hardware; the specific number is not limited.
A trusted execution environment (Trusted Execution Environment, TEE) is a secure area of device hardware or software, isolated from the host operating system, that provides a trusted environment for executing sensitive or critical code and data. Security in the TEE comes primarily from its isolation from the host operating system and from hardware protection measures; the first virtual machine is deployed in a trusted execution environment based on trusted hardware. The TEE provides a secure execution environment in which stored and executed code and data are protected. The TEE itself is built on special hardware in the processor and, through security protection mechanisms, prevents the code and data inside it from being tampered with or stolen from the outside. In addition, the TEE does not allow ordinary applications to access the code and data within it, thereby improving the security of the system. In the embodiment of the present disclosure, the first virtual machine is fully encapsulated in the trusted execution environment, so that the first virtual machine and the trusted execution environment are integrated, related data is isolated inside the trusted execution environment, and the security of the data is ensured.
A trusted execution environment based on trusted hardware provides a secure computing environment through hardware protection mechanisms to prevent potential attacks by the host operating system or the hypervisor on information within a virtual machine. Specifically, unauthorized access to the memory and page tables of the virtual machine is restricted by hardware protection, building a secure sandbox environment. For cloud service providers, this technique helps strengthen the security of user data. Fig. 3 is a schematic architecture diagram of a virtual machine deployed based on a trusted execution environment according to some embodiments of the present disclosure. As shown in fig. 3, a reinforced virtual machine deployed using a trusted execution environment may be an example of the first virtual machine described above, protecting memory security through strict hardware access control. Only authenticated hardware can access the memory; data exists in plaintext only while the CPU is accessing it and is encrypted in all other cases, which limits the authority of the host operating system and the hypervisor, so that even if the memory data is exported it cannot be read.
In the embodiment of the present disclosure, after determining the task stages to be reinforced and the non-reinforced task stages among the plurality of task stages according to the reinforcement configuration information, the cloud service provider schedules the task stages to be reinforced to the corresponding first virtual machine for processing, and schedules the non-reinforced task stages to the corresponding second virtual machine for processing.
In an alternative implementation, scheduling the task stage to be reinforced to the corresponding first virtual machine for processing includes: scheduling a first task stage, corresponding to the central processing unit, among the task stages to be reinforced to a central processing virtual machine in the first virtual machine for processing, and scheduling a second task stage, corresponding to the graphics processor, to a graphics processing virtual machine in the first virtual machine for processing.
The first task stage refers to any task stage to be reinforced that needs to be processed using the capability of the central processing unit. A central processing virtual machine is a virtual machine that emulates a central processing unit and provides capability similar to the central processor of an actual computer. The second task stage refers to any task stage to be reinforced that needs to be processed using the capability of the graphics processor. A graphics processing virtual machine is a virtual machine that emulates a graphics processor and provides capability similar to the graphics processor of an actual computer.
The central processing virtual machine in the first virtual machine can achieve hardware isolation by being deployed directly in a trusted execution environment based on trusted hardware; the graphics processing virtual machine in the first virtual machine can likewise achieve hardware isolation by being deployed directly in such an environment, or hardware isolation can be achieved by attaching the graphics processor to the central processing virtual machine in the first virtual machine. Specifically, the central processing virtual machine in the first virtual machine can isolate not only the central processing unit and the memory in hardware, but also peripheral devices such as the graphics processor, ensuring that content entering the graphics processor peripheral is encrypted and protecting against attacks from peripheral devices. In addition, only a graphics processor that is authorized or has passed verification can be trusted by the trusted-execution-environment virtual machine. In this case, the hardware-isolated virtual machine obtained by attaching the graphics processor to the central processing virtual machine in the first virtual machine is no longer simply a CPU-based trusted-execution-environment virtual machine but a GPU-based trusted-execution-environment virtual machine.
In the embodiment of the present disclosure, after determining the task stages to be reinforced and the non-reinforced task stages among the plurality of task stages according to the reinforcement configuration information, the cloud service provider determines whether each task stage to be reinforced needs the capability of the central processing unit or the capability of the graphics processor, schedules the first task stage, which needs the capability of the central processing unit, to the central processing virtual machine in the first virtual machine for processing, and schedules the second task stage, which needs the capability of the graphics processor, to the graphics processing virtual machine in the first virtual machine for processing.
Illustratively, the cloud service provider uses the trusted execution environment based on trusted hardware to schedule the plurality of task stages of the target model task, as needed, to different virtual machines deployed in that environment. Taking the target model task as a training task as an example, the training task includes a data preprocessing stage and a data fine-tuning stage; specifically, the cloud service provider may process the data preprocessing stage in the central processing virtual machine of the first virtual machine, and process the data fine-tuning stage in the graphics processing virtual machine of the first virtual machine. In this way, the cloud service provider can make full use of the advantages of the different virtual machines deployed in the trusted execution environment based on trusted hardware to protect the workloads of different stages of the target model. Therefore, the embodiments of the present disclosure can effectively manage the different virtual machine resources deployed in the trusted execution environment based on trusted hardware, divide the target model task into different task stages according to its requirements, and then place the different task stages into different types of virtual machines deployed in that environment, improving the data isolation of the model task while a given task stage is running and thereby resisting attacks from the infrastructure-as-a-service level.
In an alternative embodiment, scheduling the non-reinforced task stages to the corresponding second virtual machine for processing includes: scheduling a third task stage, corresponding to the central processing unit, among the non-reinforced task stages to a central processing virtual machine in the second virtual machine for processing, and scheduling a fourth task stage, corresponding to the graphics processor, to a graphics processing virtual machine in the second virtual machine for processing.
The third task stage refers to any task stage which needs to be processed by the capability of the central processing unit in the non-reinforcement task stage, and the fourth task stage refers to any task stage which needs to be processed by the capability of the graphics processor in the non-reinforcement task stage.
In the embodiment of the present disclosure, after determining the task stages to be reinforced and the non-reinforced task stages among the plurality of task stages according to the reinforcement configuration information, the cloud service provider determines whether each non-reinforced task stage needs the capability of the central processing unit or the capability of the graphics processor, schedules the third task stage, which needs the capability of the central processing unit, to the central processing virtual machine in the second virtual machine for processing, and schedules the fourth task stage, which needs the capability of the graphics processor, to the graphics processing virtual machine in the second virtual machine for processing.
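A compact way to picture the scheduling decision above is the following sketch, which maps one task stage to a virtual machine pool based on whether the stage is reinforced and whether it needs the graphics processor. The pool names and the function signature are assumptions made for illustration only.

```python
def pick_vm_pool(stage: str, reinforced: bool, needs_gpu: bool) -> str:
    """Choose a virtual machine pool for a single task stage.

    Reinforced stages go to the first virtual machine (deployed in a trusted
    execution environment based on trusted hardware); non-reinforced stages go
    to the second virtual machine. Within each, CPU-bound and GPU-bound stages
    are separated."""
    if reinforced:
        return "tee-gpu-vm" if needs_gpu else "tee-cpu-vm"   # first virtual machine
    return "gpu-vm" if needs_gpu else "cpu-vm"               # second virtual machine

# Example: the fine-tuning stage of a training task, reinforced and GPU-bound.
print(pick_vm_pool("fine_tuning", reinforced=True, needs_gpu=True))  # -> "tee-gpu-vm"
```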
In addition, the task stages to be reinforced and the non-reinforced task stages may also be scheduled according to virtual machine parameters or virtual machine tags set in the reinforcement configuration information.
The virtual machine parameters may be configuration parameters of the virtual machine, for example 8 cores or 16 cores, where a core refers to a core of the central processing unit; they may also be the memory size of the virtual machine, such as 8 GB or 16 GB, or the hypervisor being used. The virtual machine tag may include the source of the virtual machine, the model of the virtual machine, and the like.
In the embodiment of the present disclosure, the task stages to be reinforced and the non-reinforced task stages may also be scheduled according to the virtual machine parameters set in the reinforcement configuration information. Specifically, if the virtual machine parameter set in the reinforcement configuration information is 8 GB, at least one of the plurality of task stages of the target model task is processed using a virtual machine with 8 GB of memory.
In another embodiment of the present disclosure, the task stages to be reinforced and the non-reinforced task stages may also be scheduled according to a virtual machine tag set in the reinforcement configuration information. Specifically, if the virtual machine tag set in the reinforcement configuration information is virtual machine D, at least one of the plurality of task stages of the target model task is processed using the virtual machine corresponding to virtual machine D.
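The sketch below illustrates how such parameter- or tag-based scheduling could filter a virtual machine inventory; the inventory structure and field names are illustrative assumptions, not part of the disclosed embodiments.

```python
from typing import Dict, List, Optional

# A hypothetical inventory of available virtual machines and their parameters/tags.
VM_INVENTORY: List[Dict] = [
    {"name": "vm-a", "cores": 8,  "memory_gb": 8,  "tag": "vm-a", "tee": True},
    {"name": "vm-d", "cores": 16, "memory_gb": 16, "tag": "vm-d", "tee": False},
]

def select_vm(memory_gb: Optional[int] = None,
              tag: Optional[str] = None,
              require_tee: bool = False) -> Optional[Dict]:
    """Return the first virtual machine matching the requested parameters or tag."""
    for vm in VM_INVENTORY:
        if memory_gb is not None and vm["memory_gb"] != memory_gb:
            continue
        if tag is not None and vm["tag"] != tag:
            continue
        if require_tee and not vm["tee"]:
            continue
        return vm
    return None

# Example: the reinforcement configuration requests an 8 GB, TEE-backed virtual machine.
print(select_vm(memory_gb=8, require_tee=True))   # -> the entry for "vm-a"
```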
In order to prove to the user that the respective load does operate in a hardware-protected environment, in an alternative embodiment, the cloud service provider may output corresponding data processing reports through container components added in the first virtual machine and the second virtual machine, wherein the data processing reports carry corresponding authentication credentials for the content of the first virtual machine.
The container component is used to read the task execution records of the task stages in the virtual machine. A data processing report is a comprehensive description and record of the data processing process; specifically, it includes the task execution records of each task stage of the target model and may include the division of the target model task into stages, the operations performed by the services at each stage, and their access records. The authentication credential is a credential issued by an authority proving that the content of the first virtual machine has a certain qualification; it cannot be tampered with and can be verified later. Specifically, the authentication credential can be decrypted and verified by standard methods, ensuring the authenticity of the content of the first virtual machine, so that each log in the data processing report can be traced back to a specific hardware device. For example, when a virtual machine is used, the source and accuracy of a log can be verified by tracing the number of the specific device through the log. The content of the first virtual machine refers to the execution records of model task processing, such as the execution logs of the model task process.
In the embodiment of the present disclosure, the cloud service provider may output the corresponding data processing report by adding a container component to the container groups in the first virtual machine and the second virtual machine. Specifically, when a container group is started, the container component first reads the data processing result on the virtual machine where the container group is located and writes the data processing report together with the container group identifier to the log service, so that, based on the virtual machine identifier and the container group identifier, the corresponding data processing report can be queried from the log service to determine whether the processing took place in a trusted execution environment based on trusted hardware.
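As a rough sketch of this reporting flow, the code below shows a container component that, on container group start-up, collects the per-stage execution records and an attestation credential from the virtual machine and appends them to a log service keyed by the container group identifier. The function names, the in-memory log service, and the report fields are assumptions made for the example; they do not name a real attestation API.

```python
from typing import Dict, List, Optional

class LogService:
    """Minimal in-memory stand-in for the log service used in this example."""
    def __init__(self) -> None:
        self._reports: List[Dict] = []

    def append(self, report: Dict) -> None:
        self._reports.append(report)

    def query(self, pod_id: str) -> List[Dict]:
        return [r for r in self._reports if r["pod_id"] == pod_id]

def publish_processing_report(pod_id: str,
                              vm_id: str,
                              stage_records: List[Dict],
                              attestation: Optional[bytes],
                              log_service: LogService) -> None:
    """Container component sketch: write a data processing report for one container group."""
    log_service.append({
        "pod_id": pod_id,            # container group identifier
        "vm_id": vm_id,              # virtual machine identifier
        "stages": stage_records,     # task execution records per task stage
        "attestation": attestation,  # credential present only for TEE-backed (first) VMs
    })

# Example: a reinforced stage reports with an attestation credential attached.
logs = LogService()
publish_processing_report("pod-1", "tee-cpu-vm-01",
                          [{"stage": "preprocessing", "ops": ["clean", "tokenize"]}],
                          attestation=b"signed-quote", log_service=logs)
print(logs.query("pod-1"))
```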
According to the model task processing scheme provided by the embodiments of the present disclosure, a target model task and reinforcement configuration information are acquired, wherein the target model task includes a plurality of task stages for the target model; the task stages to be reinforced and the non-reinforced task stages among the plurality of task stages are determined according to the reinforcement configuration information; the task stages to be reinforced are scheduled to the corresponding first virtual machine for processing, and the non-reinforced task stages are scheduled to the corresponding second virtual machine for processing, where the first virtual machine is deployed in a trusted execution environment based on trusted hardware. With this technical solution, the cloud service provider can divide the target model task into a plurality of task stages and schedule different task stages to different virtual machines according to the reinforcement configuration information. Because the first virtual machine is deployed in a trusted execution environment based on trusted hardware, the model task processed by the first virtual machine obtains higher security, which effectively improves the security of model task processing, prevents internal attacks by personnel of the cloud service provider, and raises the security level of runtime data isolation without affecting performance.
Fig. 4 is a flow chart of another method for processing a model task according to some embodiments of the present disclosure, as shown in fig. 4, where the method includes:
First, a user submits a target model task. After receiving the target model task, a back-end scheduling module can sense the requirements of the model provider, which are specified with different parameters in different dimensions, and is thereby informed of the virtual machine to which each task stage of the target model task should be scheduled. For example, if the target model task is a training or fine-tuning task, the target configuration dimension may be used to specify which task stages of the target model task require reinforcement; for instance, the stages requiring reinforcement may be specified from at least one of the reinforcement level, task stage, model, and user dimensions, so that appropriate resources are selected for scheduling.
According to the selection made during the scheduling period, tasks in different task stages can be placed into different virtual machines, such as a reinforced central processing virtual machine, a central processing virtual machine (i.e., the central processing virtual machine of the second virtual machine), a graphics processing virtual machine (i.e., the graphics processing virtual machine of the second virtual machine), and other types of virtual machines. That is, each task stage can be scheduled flexibly, and the multiple task stages of a target model task do not have to be placed into the same virtual machine. For example, tasks related to the central processing unit may be processed by the reinforced central processing virtual machine while tasks related to the graphics processing unit are processed by the graphics processing virtual machine, making the scheduling finer-grained and reducing cost.
After scheduling is completed, the corresponding task stages are dispatched to the corresponding virtual machines. After the task finishes, a data processing report is generated for each task stage. Specifically, for the tasks running in each virtual machine, the data processing report can tell the user, for example, which task stages of the target model task were placed into the reinforced central processing virtual machine; by carrying the corresponding authentication credential for the content of the reinforced central processing virtual machine, the data processing report informs the user and further leads the user to trust the security of the model task processing.
Fig. 5 is a schematic diagram of a control plane architecture provided by some embodiments of the present disclosure. The hardware in fig. 5 includes a central processing unit, a network interface card (Network Interface Card, NIC), and accelerated devices. The network interface card may include a data processing unit (Data Processing Unit, DPU). The accelerator may include one of a graphics processing unit, a tensor processing unit (Tensor Processing Unit, TPU), a field programmable gate array (Field Programmable Gate Array, FPGA), and an application specific integrated circuit (Application Specific Integrated Circuit, ASIC). The storage service (Storage Service) in fig. 5 may be a cloud storage service implemented on cloud servers; this embodiment does not limit the specific hardware devices implementing the storage service, which may include one or more databases. As shown in fig. 5, in this embodiment the control plane can operate as Model as a Service, that is, it receives the large models of each model provider and then provides inference, fine-tuning, and other services externally based on those large models. Essentially, Model as a Service is a form of Platform as a Service.
Relying on Infrastructure as a Service on the data plane, the data plane may have a corresponding software stack that can be deployed, as service container groups, on managed cloud server (Elastic Compute Service, ECS) nodes; in response to the user's scheduling request, the scheduler performs the corresponding scheduling. In fig. 5 there are multiple user container groups and service container groups in the cloud server virtual machines.
Tasks related to Model as a Service can generally be classified into inference tasks and training tasks (e.g., fine-tuning tasks). Such tasks involve three roles: the cloud service provider, the model provider, and the user.
In particular, the cloud service provider is the entity that provides cloud services (e.g., Infrastructure as a Service or Platform as a Service) to serve different model providers. The model provider owns a large model and uses the cloud service provider to build its own inference and related services. The user runs the applications provided by the cloud service provider and the model provider, and has corresponding isolation requirements for both. Specifically, when a user obtains Model as a Service from the cloud service provider, the container groups of the user's own model-related tasks are isolated in the network, computing, and storage dimensions to ensure security.
Fig. 6 is a schematic structural diagram of a model task processing device provided in some embodiments of the present disclosure, where the device may be implemented by software and/or hardware, and may generally be integrated in an electronic device. As shown in fig. 6, the apparatus includes:
an obtaining module 601, configured to obtain a target model task and reinforcement configuration information, where the target model task includes a plurality of task phases for a target model;
a determining module 602, configured to determine a task stage to be reinforced and a non-reinforced task stage among the plurality of task stages according to the reinforcement configuration information;
The processing module 603 is configured to schedule the task stage to be reinforced to a corresponding first virtual machine for processing, and schedule the non-reinforced task stage to a corresponding second virtual machine for processing;
wherein the first virtual machine is deployed in a trusted execution environment based on trusted hardware.
In an alternative embodiment, the reinforcement configuration information is used to specify a task stage to be reinforced of the target model task using a target configuration dimension, where the target configuration dimension includes at least one of a reinforcement level, a task stage, a model, and a user.
In an alternative embodiment, the determining module 602 includes:
The first determining submodule is used for determining task stages to be reinforced based on the target number corresponding to the reinforcement grade when the target configuration dimension is the reinforcement grade, and determining task stages except the task stages to be reinforced as non-reinforcement task stages;
a second determining submodule, configured to determine all task phases of the plurality of task phases as task phases to be reinforced when the target configuration dimension is a model or a user and the target model task is matched with the model or the user;
and the third determining submodule is used for determining the task stage corresponding to the target configuration dimension in the task stages as a task stage to be reinforced when the target configuration dimension is the task stage, and determining the task stages except the task stage to be reinforced as non-reinforced task stages.
In an alternative embodiment, the processing module 603 includes a first scheduling sub-module and a second scheduling sub-module;
The first scheduling sub-module is used for scheduling the task stage to be reinforced to the corresponding first virtual machine for processing;
And the second scheduling sub-module is used for scheduling the non-reinforced task stage to a corresponding second virtual machine for processing.
In an alternative embodiment, the first scheduling submodule is specifically configured to:
And scheduling a first task stage corresponding to the central processing unit in the task stages to be reinforced to a central processing virtual machine in the first virtual machine for processing, and scheduling a second task stage corresponding to the graphic processor to a graphic processing virtual machine in the first virtual machine for processing.
In an alternative embodiment, the second scheduling sub-module is specifically configured to:
and scheduling a third task stage corresponding to the central processing unit in the non-reinforcement task stage to a central processing virtual machine in the second virtual machine for processing, and scheduling a fourth task stage corresponding to the graphic processor to a graphic processing virtual machine in the second virtual machine for processing.
In an optional implementation, the task stages to be reinforced and the non-reinforced task stages are further scheduled according to virtual machine parameters or virtual machine tags set in the reinforcement configuration information.
In an alternative embodiment, the apparatus further comprises:
The output module is used for outputting a corresponding data processing report through the container components added in the first virtual machine and the second virtual machine, wherein the data processing report carries a corresponding authentication certificate for the content of the first virtual machine.
The model task processing device provided by the embodiment of the disclosure can execute the model task processing method provided by any embodiment of the disclosure, and has the corresponding functional modules and beneficial effects of the execution method.
Embodiments of the present disclosure also provide a computer program product comprising a computer program/instruction which, when executed by a processor, implements the model task processing method provided by any of the embodiments of the present disclosure.
Fig. 7 is a schematic structural diagram of an electronic device according to some embodiments of the present disclosure.
Referring now in particular to fig. 7, a schematic diagram of an electronic device 700 suitable for use in implementing embodiments of the present disclosure is shown. The electronic device 700 in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and stationary terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 7 is merely an example and should not be construed to limit the functionality and scope of use of the disclosed embodiments.
As shown in fig. 7, the electronic device 700 may include a processing means (e.g., a central processor, a graphics processor, etc.) 701, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage means 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the electronic device 700 are also stored. The processing device 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
In general, the following devices may be connected to the I/O interface 705: an input device 706 such as a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, or gyroscope; an output device 707 such as a liquid crystal display (LCD), speaker, or vibrator; a storage device 708 such as a magnetic tape or hard disk; and a communication device 709. The communication device 709 may allow the electronic device 700 to communicate wirelessly or by wire with other devices to exchange data. While fig. 7 shows an electronic device 700 having various devices, it is to be understood that not all of the illustrated devices are required to be implemented or provided; more or fewer devices may alternatively be implemented or provided.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a non-transitory computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via communication device 709, or installed from storage 708, or installed from ROM 702. When executed by the processing device 701, the computer program performs the above-described functions defined in the model task processing method of the embodiment of the present disclosure.
It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of a computer-readable storage medium may include, but are not limited to, an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to electrical wiring, fiber optic cable, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.
The computer readable medium may be included in the electronic device or may exist alone without being incorporated into the electronic device.
The computer readable medium carries one or more programs that, when executed by the electronic device, cause the electronic device to: acquire a target model task and reinforcement configuration information, wherein the target model task comprises a plurality of task stages for a target model; determine, according to the reinforcement configuration information, the task stages to be reinforced and the non-reinforced task stages among the plurality of task stages; schedule the task stages to be reinforced to a corresponding first virtual machine for processing; and schedule the non-reinforced task stages to a corresponding second virtual machine for processing, wherein the first virtual machine is deployed in a trusted execution environment based on trusted hardware.
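To make the above flow concrete, here is a short, non-authoritative Python sketch that mirrors the three steps (obtain, determine, schedule); the dimension rules follow the claims below, and every identifier is an assumption made for the example.

# Hypothetical end-to-end sketch of the described flow; not the disclosed implementation.
def split_stages(stages, config):
    """Split stages into (to_reinforce, non_reinforced) per the target configuration dimension."""
    dim = config["dimension"]
    if dim == "reinforcement_grade":
        n = config["target_count"]            # the grade maps to a target number of stages
        return stages[:n], stages[n:]
    if dim in ("model", "user"):
        # If the task matches the configured model/user, all stages are reinforced
        # (the match check itself is omitted in this sketch).
        return list(stages), []
    if dim == "task_stage":
        chosen = set(config["stage_names"])
        return [s for s in stages if s in chosen], [s for s in stages if s not in chosen]
    raise ValueError(f"unknown configuration dimension: {dim}")

def process_model_task(stages, config, first_vm="tee-vm", second_vm="plain-vm"):
    to_reinforce, non_reinforced = split_stages(stages, config)
    plan = {s: first_vm for s in to_reinforce}            # reinforced stages -> TEE-backed first VM
    plan.update({s: second_vm for s in non_reinforced})   # remaining stages -> ordinary second VM
    return plan

# Example usage
plan = process_model_task(["preprocess", "train", "evaluate"],
                          {"dimension": "task_stage", "stage_names": ["train"]})
# plan == {"train": "tee-vm", "preprocess": "plain-vm", "evaluate": "plain-vm"}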
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including, but not limited to, object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware. In some cases, the name of a unit does not constitute a limitation of the unit itself.
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic that may be used include field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
It will be appreciated that, before the technical solutions disclosed in the embodiments of the present disclosure are used, the user should be informed, in an appropriate manner and in accordance with relevant laws and regulations, of the type of information involved, the scope of use, the usage scenarios, and the like, and the user's authorization should be obtained.
The foregoing description is only of the preferred embodiments of the present disclosure and an explanation of the technical principles employed. It will be appreciated by persons skilled in the art that the scope of the disclosure referred to in this disclosure is not limited to the specific combinations of the features described above, but also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure, for example, technical solutions formed by mutually substituting the above features with (but not limited to) technical features having similar functions disclosed in the present disclosure.
Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims.

Claims (9)

1. A model task processing method, characterized by comprising:
Obtaining a target model task and reinforcement configuration information, wherein the target model task comprises a plurality of task stages for a target model, the task stages refer to the different links or tasks, within the life cycle of a model task, that can be split out and processed and that are determined by a cloud service provider through analysis of a plurality of model tasks, the reinforcement configuration information refers to information configured in advance by a model provider for the target model task based on reinforcement requirements, the reinforcement configuration information is used for specifying, by means of a target configuration dimension, the task stage of the target model task that is to be reinforced, and the target configuration dimension comprises at least one of a reinforcement grade, a task stage, a model, and a user;
Determining a task stage to be reinforced and a non-reinforced task stage in the plurality of task stages according to the reinforcement configuration information;
Scheduling the task stage to be reinforced to a corresponding first virtual machine for processing, and scheduling the non-reinforced task stage to a corresponding second virtual machine for processing;
wherein the first virtual machine is deployed in a trusted execution environment based on trusted hardware.
2. The method of claim 1, wherein determining a task stage to be reinforced and a non-reinforced task stage among the plurality of task stages according to the reinforcement configuration information comprises:
when the target configuration dimension is a reinforcement grade, determining task stages to be reinforced based on the target quantity corresponding to the reinforcement grade, and determining task stages other than the task stages to be reinforced as non-reinforced task stages;
when the target configuration dimension is a model or a user, determining all of the plurality of task stages as task stages to be reinforced when the target model task matches the model or the user;
and when the target configuration dimension is a task stage, determining the task stage corresponding to the target configuration dimension among the plurality of task stages as a task stage to be reinforced, and determining the task stages other than the task stage to be reinforced as non-reinforced task stages.
3. The method of claim 1, wherein scheduling the task stage to be reinforced to a corresponding first virtual machine for processing comprises:
scheduling a first task stage, corresponding to a central processing unit, among the task stages to be reinforced to a central-processing virtual machine in the first virtual machine for processing, and scheduling a second task stage, corresponding to a graphics processor, to a graphics-processing virtual machine in the first virtual machine for processing.
4. The method of claim 1, wherein scheduling the non-reinforced task stage to a corresponding second virtual machine for processing comprises:
scheduling a third task stage, corresponding to the central processing unit, among the non-reinforced task stages to a central-processing virtual machine in the second virtual machine for processing, and scheduling a fourth task stage, corresponding to the graphics processor, to a graphics-processing virtual machine in the second virtual machine for processing.
5. The method of claim 1, wherein the task stage to be reinforced and the non-reinforced task stage are further scheduled according to virtual machine parameters or virtual machine labels set in the reinforcement configuration information.
6. The method according to claim 1, wherein the method further comprises:
Outputting a corresponding data processing report through container components added to the first virtual machine and the second virtual machine, wherein the data processing report carries a corresponding authentication certificate for the content of the first virtual machine.
7. A model task processing device, characterized by comprising:
An acquisition module, used for acquiring a target model task and reinforcement configuration information, wherein the target model task comprises a plurality of task stages for a target model, the task stages refer to the different links or tasks, within the life cycle of a model task, that can be split out and processed and that are determined by a cloud service provider through analysis of a plurality of model tasks, the reinforcement configuration information refers to information configured in advance by a model provider for the target model task based on reinforcement requirements, the reinforcement configuration information is used for specifying, by means of a target configuration dimension, the task stage of the target model task that is to be reinforced, and the target configuration dimension comprises at least one of a reinforcement grade, a task stage, a model, and a user;
the determining module is used for determining a task stage to be reinforced and a non-reinforced task stage in the plurality of task stages according to the reinforcement configuration information;
the processing module is used for dispatching the task stage to be reinforced to the corresponding first virtual machine for processing, and dispatching the non-reinforced task stage to the corresponding second virtual machine for processing;
wherein the first virtual machine is deployed in a trusted execution environment based on trusted hardware.
8. An electronic device, the electronic device comprising:
A processor;
A memory for storing the processor-executable instructions;
The processor is configured to read the executable instructions from the memory and execute the instructions to implement the model task processing method according to any one of the preceding claims 1-6.
9. A computer-readable storage medium, characterized in that the storage medium stores a computer program for executing the model task processing method according to any one of the preceding claims 1-6.
CN202411959462.6A 2024-12-27 2024-12-27 Model task processing method, device, equipment and medium Active CN119883512B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411959462.6A CN119883512B (en) 2024-12-27 2024-12-27 Model task processing method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN119883512A CN119883512A (en) 2025-04-25
CN119883512B true CN119883512B (en) 2025-09-26

Family

ID=95440978

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411959462.6A Active CN119883512B (en) 2024-12-27 2024-12-27 Model task processing method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN119883512B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113761513A (en) * 2020-06-28 2021-12-07 京东城市(北京)数字科技有限公司 Data processing method, device, equipment and computer readable storage medium
CN116305090A (en) * 2023-03-16 2023-06-23 哲库科技(上海)有限公司 Model protection method and related product

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8850390B2 (en) * 2010-11-15 2014-09-30 Sap Ag Status management for phased implementation of configuration changes
US11295008B2 (en) * 2019-02-13 2022-04-05 Nec Corporation Graphics processing unit accelerated trusted execution environment
US12141268B2 (en) * 2021-09-24 2024-11-12 Nvidia Corporation Secure execution for multiple processor devices using trusted executing environments
CN116992458B (en) * 2023-08-14 2024-09-03 杭州金智塔科技有限公司 Programmable data processing method and system based on trusted execution environment
CN117061105B (en) * 2023-08-16 2025-01-28 北京火山引擎科技有限公司 Data processing method, device, readable medium and electronic device
CN117874769A (en) * 2024-01-16 2024-04-12 中国电信股份有限公司技术创新中心 Application service system, method and related equipment based on trusted environment
CN118153036A (en) * 2024-03-25 2024-06-07 北京火山引擎科技有限公司 Method, apparatus, electronic device and product for performing machine learning tasks

Also Published As

Publication number Publication date
CN119883512A (en) 2025-04-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant