CN118414606A

CN118414606A - Machine Learning Using Serverless Computing Architecture

Info

Publication number: CN118414606A
Application number: CN202280084469.0A
Authority: CN
Inventors: M·麦康蒂; G·D·安贾内亚普拉兰奇; R·雷乔杜里; M·帕姆; S·夏尔马; S·维克拉姆; J·A·桑德斯; M·萨瑟
Original assignee: Amazon Technologies Inc
Current assignee: Amazon Technologies Inc
Priority date: 2021-11-27
Filing date: 2022-11-18
Publication date: 2024-07-30
Anticipated expiration: 2042-11-18
Also published as: WO2023097176A1; EP4437415A1; CN118414606B

Abstract

A serverless computing system is configured to provide access to a machine learning model by associating at least an endpoint including code to access the machine learning model with an extension that interfaces between a serverless computing architecture and the endpoint. A request to perform an inference is received by the system and processed by executing a compute function using the serverless computing architecture. The compute function causes the extension to interface with the endpoint to cause the machine learning model to perform the inference.

Description

Machine Learning Using Serverless Computing Architecture

相关申请的交叉引用CROSS-REFERENCE TO RELATED APPLICATIONS

本申请要求2021年11月27日提交的名称为“Liberty 1-IN”的印度专利申请202111054927、2022年3月31日提交的名称为“MACHINE LEARNING USING SERVERLESSCOMPUTE ARCHITECTURE”的美国专利申请17/710,853、2021年11月27日提交的名称为“Liberty 2-IN”的印度专利申请202111054942以及2022年3月31日提交的名称为“MACHINELEARNING USING A HYBRID SERVERLESS COMPUTE ARCHITECTURE”的美国专利申请17/710,864的优先权。This application claims priority to Indian patent application 202111054927, named “Liberty 1-IN”, filed on November 27, 2021, U.S. patent application 17/710,853, named “MACHINE LEARNING USING SERVERLESSCOMPUTE ARCHITECTURE”, filed on March 31, 2022, Indian patent application 202111054942, named “Liberty 2-IN”, filed on November 27, 2021, and U.S. patent application 17/710,864, named “MACHINELEARNING USING A HYBRID SERVERLESS COMPUTE ARCHITECTURE”, filed on March 31, 2022.

背景技术Background technique

机器学习技术越来越广泛地应用于各种行业。然而，这些技术的维护和管理可能比较困难。机器学习模型的开发和使用可能需要大量的计算资源，诸如存储和处理时间。这些资源可能难以获得和管理，这可能会给采用机器学习技术带来重大障碍。Machine learning techniques are increasingly being used in a variety of industries. However, these techniques can be difficult to maintain and manage. The development and use of machine learning models can require significant computing resources, such as storage and processing time. These resources can be difficult to obtain and manage, which can present a significant barrier to the adoption of machine learning techniques.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

将参考附图描述各种技术，在附图中：Various techniques will be described with reference to the accompanying drawings, in which:

图1示出了根据至少一个实施方案的包括无服务器计算架构的用于执行机器学习推断的系统；FIG1 illustrates a system for performing machine learning inference including a serverless computing architecture according to at least one embodiment;

图2示出了根据至少一个实施方案的用于使得无服务器计算架构能够执行机器学习推断的示例过程；FIG2 illustrates an example process for enabling a serverless computing architecture to perform machine learning inference according to at least one embodiment;

图3示出了根据至少一个实施方案的调用并执行计算函数以执行机器学习推断的示例；FIG3 illustrates an example of calling and executing a compute function to perform machine learning inference according to at least one embodiment;

图4示出了根据至少一个实施方案的结合有机器学习推断的无服务器处理和服务器完整处理的混合系统的示例；4 illustrates an example of a hybrid system combining serverless processing and server-full processing with machine learning inference according to at least one embodiment;

图5示出了根据至少一个实施方案的用于将无服务器计算架构配置成执行机器学习推断的示例过程；5 illustrates an example process for configuring a serverless computing architecture to perform machine learning inference according to at least one embodiment;

图6示出了根据至少一个实施方案的用于将混合计算架构配置成执行机器学习推断的示例过程；6 illustrates an example process for configuring a hybrid computing architecture to perform machine learning inference according to at least one embodiment;

图7示出了根据至少一个实施方案的用于使用无服务器计算架构来执行机器学习推断的示例过程；FIG7 illustrates an example process for performing machine learning inference using a serverless computing architecture according to at least one embodiment;

图8示出了根据至少一个实施方案的用于使用混合计算架构来执行机器学习推断的示例过程；并且FIG8 illustrates an example process for performing machine learning inference using a hybrid computing architecture according to at least one embodiment; and

图9示出了其中可以实现各个实施方案的系统。FIG. 9 illustrates a system in which various embodiments may be implemented.

具体实施方式Detailed ways

在示例中，系统利用无服务器计算架构以使用机器学习模型生成推断。无服务器计算架构(其也可称为无服务器计算系统或无服务器计算子系统)包括动态提供计算资源以执行计算函数的硬件和软件。使用模型服务器来促进对机器学习模型的访问，该模型服务器虽然可在专用服务器上使用，但也可以通过采用本文中描述的技术来利用无服务器计算架构。In an example, the system utilizes a serverless computing architecture to generate inferences using a machine learning model. A serverless computing architecture (which may also be referred to as a serverless computing system or a serverless computing subsystem) includes hardware and software that dynamically provides computing resources to perform computing functions. Access to the machine learning model is facilitated using a model server, which, while available on a dedicated server, can also utilize a serverless computing architecture by employing the techniques described herein.

在基于服务器的应用程序中，机器学习服务的用户可以创建并训练由该服务托管的机器学习模型。为了使用托管机器学习模型或其他计算服务，客户被指派专用服务器实例，模型服务器在该专用服务器实例上安装并激活。模型服务器是实现超文本传输协议(“HTTP”)服务器的代码单元，该超文本传输协议服务器监听请求以从模型中获得推断，并且通过与托管模型交互来对那些请求作出响应。专用服务是指派托管模型服务器的任务的计算装置。如果需求激增，则使用专用服务器可能效果不佳，因为专用服务器实例的能力可能是有限的。此外，此方法通常将需要其他开销，诸如管理负担。In a server-based application, users of a machine learning service can create and train machine learning models hosted by the service. To use a hosted machine learning model or other computing service, a customer is assigned a dedicated server instance on which a model server is installed and activated. A model server is a unit of code that implements a Hypertext Transfer Protocol (“HTTP”) server that listens for requests to obtain inferences from a model and responds to those requests by interacting with the hosted model. A dedicated service is a computing device that is tasked with hosting a model server. If demand surges, using a dedicated server may not work well because the capacity of a dedicated server instance may be limited. In addition, this approach will typically require other overhead, such as management burdens.

为了解决这些问题，用户能够请求使用无服务器配置来提供对机器学习模型的访问。在该示例的实施方案中，这是通过在端点配置中指定应使用无服务器配置来完成的。还可以供应附加参数，诸如将利用的最大存储量或者将支持的最大并发请求数量。这些参数可用于帮助管理无服务器环境所利用的能力。在此处，端点是指标识或者访问机器学习模型的地址或其他手段。端点可以充当面向外的接口，机器学习模型或其他计算服务的用户将它们的请求指向该面向外的接口。To address these issues, users can request that access to the machine learning model be provided using a serverless configuration. In the implementation scheme of this example, this is accomplished by specifying in the endpoint configuration that a serverless configuration should be used. Additional parameters may also be supplied, such as the maximum amount of storage to be utilized or the maximum number of concurrent requests to be supported. These parameters can be used to help manage the capabilities utilized by the serverless environment. Here, an endpoint refers to an address or other means of identifying or accessing a machine learning model. An endpoint can act as an outward-facing interface to which users of a machine learning model or other computing service direct their requests.

当请求无服务器配置时，系统将无服务器计算架构配置成使用模型服务器来处理请求以获得推断，并且路由器被配置成将这类请求转发到无服务器计算架构。为了能够使用模型服务器，系统生成包括模型服务器和扩展的容器，该容器在无服务器计算架构与扩展之间交接。生成容器还可以包括清理过程，该清理过程是指系统编辑或者移除模型服务器所使用的配置数据。此配置数据可以包括若不被编辑或者被移除便可能会产生负面影响的配置数据。例如，模型服务器可能具有适用于模型服务器安装在专用服务器实例上的情况但不适用于模型服务器由无服务器计算架构执行的情况的配置数据。如果端点被配置成使用混合或全服务器配置或者当端点被配置成使用混合或全服务器配置时，可以存储并调用所移除的或所编辑的信息以供稍后使用。When a serverless configuration is requested, the system configures the serverless computing architecture to use a model server to process the request to obtain an inference, and the router is configured to forward such requests to the serverless computing architecture. In order to be able to use the model server, the system generates a container including the model server and the extension, which is handed over between the serverless computing architecture and the extension. Generating the container may also include a cleanup process, which refers to the system editing or removing configuration data used by the model server. This configuration data may include configuration data that may have a negative impact if it is not edited or removed. For example, a model server may have configuration data that is applicable to a situation where the model server is installed on a dedicated server instance but is not applicable to a situation where the model server is executed by a serverless computing architecture. If the endpoint is configured to use a hybrid or full server configuration or when the endpoint is configured to use a hybrid or full server configuration, the removed or edited information can be stored and called for later use.

当接收到执行推断的请求时，从存储装置中检索此容器。无服务器计算架构动态地分配计算能力以调用并执行与扩展交接的计算函数。该扩展随后激活由模型服务器实现的HTTP服务器并且调用由模型服务器实现的基于网络的方法。这进而使得模型服务器访问机器学习模型并且获得所请求的推断。When a request to perform inference is received, this container is retrieved from storage. The serverless computing architecture dynamically allocates computing power to call and execute the computing function that is interfaced with the extension. The extension then activates the HTTP server implemented by the model server and calls the network-based method implemented by the model server. This in turn causes the model server to access the machine learning model and obtain the requested inference.

在该示例的另一方面，用户可以请求使用混合操作模式来提供对机器学习模型的访问。当被配置成以混合模式操作时，该系统使用已安装有模型服务器的专用服务器来处理传入推断请求的一部分。可以确定此部分的大小，使得该大小将专用服务器的使用最大化。然而，为了应对临时的需求激增或者应对因增添专用服务器而尚未解决的需求增加，该系统采用无服务器计算架构。因此，使用无服务器计算架构来处理超过专用服务器的能力的请求，该无服务器计算架构包括容器，该容器包括模型服务器和扩展，如前述段落中所描述。In another aspect of this example, a user can request to use a hybrid operating mode to provide access to a machine learning model. When configured to operate in hybrid mode, the system uses a dedicated server that has a model server installed to process a portion of the incoming inference requests. The size of this portion can be determined so that the size maximizes the use of the dedicated server. However, in order to cope with temporary surges in demand or to cope with increases in demand that have not been addressed by adding dedicated servers, the system employs a serverless computing architecture. Therefore, a serverless computing architecture is used to process requests that exceed the capabilities of a dedicated server, the serverless computing architecture comprising a container that includes a model server and an extension, as described in the preceding paragraph.

在前面和下面的描述中，描述了各种技术。出于解释的目的，阐述了具体配置和细节以便提供对实现这些技术的可能方式的全面理解。然而，还将显而易见的是，以下描述的技术可以在没有具体细节的情况下以不同的配置来实践。此外，可省略或简化公知的特征，以避免使所描述的技术不清楚。In the foregoing and following descriptions, various techniques are described. For purposes of explanation, specific configurations and details are set forth in order to provide a comprehensive understanding of possible ways to implement these techniques. However, it will also be apparent that the techniques described below can be practiced in different configurations without the specific details. In addition, well-known features may be omitted or simplified to avoid making the described techniques unclear.

图1示出了根据至少一个实施方案的包括无服务器计算架构的用于执行机器学习的系统。在示例系统100中，客户端130传输利用机器学习模型116的请求，并且此请求由路由器104接收。该请求由无服务器计算架构102处理，该无服务器计算架构以所请求的方式利用机器学习模型116并且将结果返回给客户端130。1 illustrates a system for performing machine learning including a serverless computing architecture according to at least one embodiment. In the example system 100, a client 130 transmits a request to utilize a machine learning model 116, and this request is received by a router 104. The request is processed by the serverless computing architecture 102, which utilizes the machine learning model 116 in the requested manner and returns the result to the client 130.

机器学习模型(诸如所描绘的机器学习模型116)可以包括但不限于实现与监督和非监督学习、强化学习、线性回归、朴素贝叶斯( Bayes)网络、神经网络、深度学习、随机森林、分类、回归、预测等相关的各种算法或技术中的任一者的数据和代码。在至少一个实施方案中，机器学习模型包括针对这种模型或算法的参数。这些参数可以包括例如与神经网络相关联的各种连接权重。在至少一个实施方案中，机器学习模型包括模型架构的定义以及表示模型的当前状态的参数集。通常，这类参数表示模型训练的当前状态。Machine learning models, such as the depicted machine learning model 116, may include, but are not limited to, implementing supervised and unsupervised learning, reinforcement learning, linear regression, Naive Bayes ( The machine learning model includes data and code for any of various algorithms or techniques related to Bayesian networks, neural networks, deep learning, random forests, classification, regression, prediction, etc. In at least one embodiment, the machine learning model includes parameters for such a model or algorithm. These parameters may include, for example, various connection weights associated with the neural network. In at least one embodiment, the machine learning model includes a definition of the model architecture and a set of parameters representing the current state of the model. Typically, such parameters represent the current state of model training.

无服务器计算架构102允许使用在按需基础上指派的计算能力来执行计算函数，诸如所描绘的计算函数110。架构102被描述为无服务器的，因为不是将特定计算实例专用于执行计算函数，而是动态指派计算能力以便执行计算函数。因此，无服务器计算架构(诸如图1中描绘的架构102)包括一个或多个计算系统，该一个或多个计算系统响应于调用并执行计算函数的请求而分配足以调用并执行该计算函数的计算能力，随后调用并执行该函数。在一些实施方案中，无服务器计算架构还基于客户端利用的能力的量，而不是基于专用于客户端使用的服务器实例的数量来跟踪对计算能力的利用。无服务器计算架构中的客户端所利用的能力可以根据各种度量来测量，这些度量可能包括计算函数的调用次数、执行计算函数所花费的时间或者计算函数所执行的输入操作或输出操作的大小。在示例系统100中，无服务器计算架构包括利用无服务器计算的附加特征，该无服务器计算使用本文中描述的技术来利用机器学习模型。The serverless computing architecture 102 allows computing functions, such as the depicted computing function 110, to be executed using computing power assigned on an on-demand basis. The architecture 102 is described as serverless because, rather than dedicating a particular computing instance to executing a computing function, computing power is dynamically assigned to execute a computing function. Therefore, a serverless computing architecture (such as the architecture 102 depicted in FIG. 1 ) includes one or more computing systems that allocate computing power sufficient to call and execute the computing function in response to a request to call and execute a computing function, and then call and execute the function. In some embodiments, the serverless computing architecture also tracks the utilization of computing power based on the amount of power utilized by the client, rather than based on the number of server instances dedicated to client use. The power utilized by the client in the serverless computing architecture can be measured according to various metrics, which may include the number of invocations of the computing function, the time spent executing the computing function, or the size of the input operations or output operations performed by the computing function. In the example system 100, the serverless computing architecture includes additional features that utilize serverless computing, which utilizes machine learning models using the techniques described herein.

计算函数(诸如所描绘的计算函数110)包括可执行代码单元。代码可以包括编译指令、源代码或中间代码。计算函数有时可以称为过程、例程、方法、表达式、闭包、λ函数等。在无服务器计算架构中，计算函数可以由客户端提供。然而，在示例系统100中，计算函数110可以由系统100自动生成以用于利用无服务器计算架构102来执行机器学习功能。A computation function, such as the depicted computation function 110, includes an executable code unit. The code may include compiled instructions, source code, or intermediate code. A computation function may sometimes be referred to as a procedure, a routine, a method, an expression, a closure, a lambda function, etc. In a serverless computing architecture, the computation function may be provided by a client. However, in the example system 100, the computation function 110 may be automatically generated by the system 100 for performing machine learning functions utilizing the serverless computing architecture 102.

利用机器学习模型的请求可以包括但不一定限于电子传输或其他通信，该其他通信包括指示机器学习模型(诸如所描绘的机器学习模型116)应执行操作的信息。这些操作可以包括但不一定限于推断操作。如本文中所使用，推断操作可以包括广泛多种机器学习任务中的任何一种机器学习任务，诸如分类、预测、回归、聚类、分割等。A request to utilize a machine learning model may include, but is not necessarily limited to, an electronic transmission or other communication that includes information indicating that a machine learning model (such as the depicted machine learning model 116) should perform operations. These operations may include, but are not necessarily limited to, inference operations. As used herein, inference operations may include any of a wide variety of machine learning tasks, such as classification, prediction, regression, clustering, segmentation, and the like.

路由器104可以包括网络装置或者具有网络通信硬件的其他计算装置，该其他计算装置被配置成能够与客户端130和无服务器计算架构102的各种部件通信。在至少一个实施方案中，无服务器计算架构102包括路由器104，而在其他实施方案中，路由器104是与无服务器计算架构分离但连接到无服务器计算架构并且能够与无服务器计算架构通信的前端部件。尽管图1中未描绘，但路由器104也可以连接到代表客户端130托管的服务器实例并且与该服务器实例通信，并且该服务器实例托管用于利用机器学习模型的各种模型服务器。进一步关于图4描述了包括此配置(有时称为混合配置)的实施方案的示例。The router 104 may include a network device or other computing device with network communication hardware that is configured to be able to communicate with the client 130 and various components of the serverless computing architecture 102. In at least one embodiment, the serverless computing architecture 102 includes the router 104, while in other embodiments, the router 104 is a front-end component that is separate from the serverless computing architecture but connected to the serverless computing architecture and capable of communicating with the serverless computing architecture. Although not depicted in Figure 1, the router 104 may also be connected to and communicate with a server instance hosted on behalf of the client 130, and the server instance hosts various model servers for utilizing machine learning models. An example of an embodiment including this configuration (sometimes referred to as a hybrid configuration) is further described with respect to Figure 4.

在接收到执行推断的请求时，路由器104可以确定可使用无服务器计算架构102来处理该请求。如果是，则路由器104可以确定利用计算服务108来获得推断，而不是将请求直接指向服务器上托管的模型服务器。如果将使用无服务器架构，路由器104因此可以将请求转换成合适的数据格式，并且使用计算服务108来调用合适的计算函数110。Upon receiving a request to perform inference, router 104 may determine that the request can be processed using serverless computing architecture 102. If so, router 104 may determine to utilize computing service 108 to obtain inferences, rather than directing the request directly to the model server hosted on the server. If a serverless architecture is to be used, router 104 may therefore convert the request into a suitable data format and use computing service 108 to call the appropriate computing function 110.

在至少一个实施方案中，路由器104利用角色代理服务106，使得在处理请求时使用适当级别的许可或授权。在至少一个实施方案中，角色代理服务106包括临时模拟某些计算角色的硬件和/或软件。这可以包括当这些许可与传入请求不相关联时，临时采用其许可适于利用计算服务108的角色。指向路由器104的请求可能不一定具有适当的许可，因为要求传入请求具有这类许可可能会使得更难以使用无服务器计算来设置或者利用机器学习模型116。In at least one embodiment, the router 104 utilizes a role proxy service 106 so that the appropriate level of permissions or authorizations are used when processing requests. In at least one embodiment, the role proxy service 106 includes hardware and/or software that temporarily simulates certain computing roles. This can include temporarily adopting a role whose permissions are suitable for utilizing computing services 108 when these permissions are not associated with incoming requests. Requests directed to the router 104 may not necessarily have the appropriate permissions, because requiring incoming requests to have such permissions may make it more difficult to use serverless computing to set up or utilize machine learning models 116.

在至少一个实施方案中，无服务器计算架构包括计算服务108，该计算服务为调用并执行计算函数110动态地分配计算能力。计算服务108(其也可以被描述为无服务器计算服务)包括对调用并执行计算函数110的请求作出响应的硬件。此响应包括分配充足的计算能力，随后调用并执行计算函数110。In at least one embodiment, the serverless computing architecture includes a computing service 108 that dynamically allocates computing power for invoking and executing a computing function 110. The computing service 108 (which may also be described as a serverless computing service) includes hardware that responds to requests to invoke and execute a computing function 110. This response includes allocating sufficient computing power and then invoking and executing the computing function 110.

计算函数110被设计成使得当由计算服务108调用并执行时，该计算函数与扩展112交接以使得模型服务器114从机器学习模型116获得推断或者以其他方式利用该机器学习模型。The compute function 110 is designed such that when called and executed by the compute service 108 , it interfaces with the extension 112 to enable the model server 114 to obtain inferences from the machine learning model 116 or otherwise utilize the machine learning model.

在至少一个实施方案中，机器学习模型116由机器学习服务118托管。该服务可以包括被配置成提供与机器学习模型的配置、训练、部署和使用相关的能力的计算装置和其他计算资源。在一些情况和实施方案中，服务118可以包括用于模型参数的存储装置，而在其他情况和实施方案中，除了机器学习服务118以外，还可以使用外部存储服务。In at least one embodiment, the machine learning model 116 is hosted by a machine learning service 118. The service may include computing devices and other computing resources configured to provide capabilities related to the configuration, training, deployment, and use of machine learning models. In some cases and embodiments, the service 118 may include storage for model parameters, while in other cases and embodiments, an external storage service may be used in addition to the machine learning service 118.

模型服务器114包括用于与机器学习模型116交接的代码。在此处，交接可以指模块、库、程序、函数或其他代码单元之间的交互。示例包括但不一定限于模块调用另一模块的函数并且获得结果、模块发起另一模块上的过程、程序访问由库的类实现的特性等。将了解，这些示例旨在说明而不是限制。一般来说，模型服务器114充当机器学习模型116的前端并且可用于例如训练模型116或者使用模型116来获得推断。The model server 114 includes code for interfacing with the machine learning model 116. Here, interfacing can refer to interactions between modules, libraries, programs, functions, or other code units. Examples include, but are not necessarily limited to, a module calling a function of another module and obtaining a result, a module initiating a process on another module, a program accessing a property implemented by a class of a library, etc. It will be appreciated that these examples are intended to be illustrative and not limiting. In general, the model server 114 acts as a front end for the machine learning model 116 and can be used, for example, to train the model 116 or use the model 116 to obtain inferences.

在至少一个实施方案中，模型服务器114不仅与图1中描绘的无服务器计算架构102兼容，还与基于服务器的配置兼容，在基于服务器的配置中，客户端直接访问模型服务器114。图1描绘了在无服务器计算环境内对模型服务器114的使用，而图4中描绘了在混合环境中使用模型服务器114(涉及无服务器的配置和基于服务器的配置)的示例。In at least one embodiment, the model server 114 is compatible not only with the serverless computing architecture 102 depicted in FIG1 , but also with a server-based configuration in which clients directly access the model server 114. FIG1 depicts the use of the model server 114 within a serverless computing environment, while FIG4 depicts an example of the use of the model server 114 in a hybrid environment (involving both serverless and server-based configurations).

在至少一个实施方案中，模型服务器114包括实现超文本传输协议(“HTTP”)服务器的代码。在这类实施方案中，服务器实现为接收请求使用机器学习模型来执行机器学习任务的HTTP兼容消息。例如，在至少一个实施方案中，模型服务器包括接收HTTP兼容请求以执行推断的代码，并且利用包括通过执行推断所获得的数据的HTTP兼容响应作出响应。该请求可以包括与该请求相关的各种参数或属性，诸如端点的名称、待使用的机器学习模型的标识符、待进行的推断的标识符、针对机器学习模型的输入以及/或者其他数据。In at least one embodiment, model server 114 includes code that implements a Hypertext Transfer Protocol ("HTTP") server. In such embodiments, the server is implemented to receive HTTP-compatible messages requesting the use of a machine learning model to perform a machine learning task. For example, in at least one embodiment, the model server includes code that receives an HTTP-compatible request to perform an inference, and responds with an HTTP-compatible response that includes data obtained by performing the inference. The request may include various parameters or attributes associated with the request, such as a name of an endpoint, an identifier of a machine learning model to use, an identifier of an inference to be performed, inputs to the machine learning model, and/or other data.

当在无服务器计算架构(诸如图1中描绘的无服务器计算架构102)内操作时，不在服务器的专用实例上托管模型服务器114，因为无服务器计算架构102在每个请求基础上动态地分配计算资源。这可能会形成各种挑战。一个这样的挑战是由于与HTTP兼容请求中使用的数据格式相比，用于在无服务器计算环境中调用无服务器计算函数的数据格式不同。另一原因是HTTP服务器通常是长期运行的。虽然HTTP服务器可以在专用服务器上被激活一次并且保持运行，但此方法在无服务器计算环境中可能不可行，因为被指派以调用并执行计算函数的资源是动态指派的，并且在一些情况下可能不会持续超过单个请求的持续时间。When operating within a serverless computing architecture (such as the serverless computing architecture 102 depicted in FIG. 1 ), the model server 114 is not hosted on a dedicated instance of a server because the serverless computing architecture 102 dynamically allocates computing resources on a per-request basis. This may create various challenges. One such challenge is due to the different data formats used to call serverless computing functions in a serverless computing environment compared to the data formats used in HTTP-compatible requests. Another reason is that HTTP servers are typically long-running. Although an HTTP server can be activated once on a dedicated server and kept running, this approach may not be feasible in a serverless computing environment because the resources assigned to call and execute computing functions are dynamically assigned and may not last longer than the duration of a single request in some cases.

在实施方案中，使用扩展112来解决这些问题。扩展112包括与模型服务器114交接的代码。交接可以包括在数据格式之间进行转换的操作。例如，在至少一个实施方案中，计算服务108可以接受对呈有限数量的数据格式的计算函数110的调用。在一个示例中，使用JavaScript对象符号(“JSON”)格式，但将了解，此示例旨在说明而不是限制。然而，应注意，系统100在没有专门针对实现机器学习的修改的情况下利用预先存在的计算服务108可能是有利的。因此，改变用于调用计算函数110的数据格式可能是不实际的，并且为了解决此问题，扩展112在用于调用计算函数110的数据格式与模型服务器114所预期的任何数据格式之间进行转换。In an embodiment, extensions 112 are used to address these issues. Extensions 112 include code that interfaces with model servers 114. The interface may include operations that convert between data formats. For example, in at least one embodiment, compute services 108 may accept calls to compute functions 110 in a limited number of data formats. In one example, JavaScript Object Notation ("JSON") format is used, but it will be appreciated that this example is intended to illustrate rather than limit. However, it should be noted that it may be advantageous for system 100 to utilize pre-existing compute services 108 without modifications specifically for implementing machine learning. Therefore, it may be impractical to change the data format used to call compute functions 110, and to address this issue, extensions 112 convert between the data format used to call compute functions 110 and any data format expected by model servers 114.

在至少一个实施方案中，扩展112还包括用于激活模型服务器114以在计算函数110的环境内使用的代码。这可以包括例如与模型服务器114交接以将由模型服务器114实现的HTTP服务器初始化，使得扩展112随后可以进一步利用HTTP服务器来访问机器学习模型116。In at least one embodiment, extension 112 also includes code for activating model server 114 for use within the context of compute function 110. This may include, for example, interfacing with model server 114 to initialize an HTTP server implemented by model server 114 so that extension 112 can then further utilize the HTTP server to access machine learning model 116.

在至少一个实施方案中，系统100包括用于记录活动并记录度量的各种设施。这些度量可以包括与路由器104和无服务器计算架构102的操作相关的度量。例如，路由器104和计算函数120a、120b可以各自输出与它们的相应活动相关的日志和度量。客户端130可以经由监视控制台132获得这些和其他日志和度量。In at least one embodiment, the system 100 includes various facilities for logging activities and recording metrics. These metrics can include metrics related to the operation of the router 104 and the serverless computing architecture 102. For example, the router 104 and the computing functions 120a, 120b can each output logs and metrics related to their respective activities. The client 130 can obtain these and other logs and metrics via the monitoring console 132.

图2示出了根据至少一个实施方案的用于使得无服务器计算架构能够执行机器学习的示例过程。在示例200中，客户端204请求经由无服务器计算架构向客户端204提供对机器学习能力(诸如推断)的访问。2 illustrates an example process for enabling a serverless computing architecture to perform machine learning, according to at least one embodiment. In example 200, a client 204 requests access to machine learning capabilities (such as inference) provided to the client 204 via a serverless computing architecture.

在至少一个实施方案中，无服务器端点请求230由客户端204提供给控制面220。无服务器端点请求230包括指示客户端希望访问机器学习能力的数据，并且这种访问应使用无服务器计算架构来提供。无服务器端点请求230还可以包含附加配置信息，该附加配置信息可用于代表客户端限制无服务器计算架构所利用的计算能力。施加这类限制可以有助于管理成本并且改善计算能力的提供。In at least one embodiment, a serverless endpoint request 230 is provided by the client 204 to the control plane 220. The serverless endpoint request 230 includes data indicating that the client desires access to machine learning capabilities and that such access should be provided using a serverless computing architecture. The serverless endpoint request 230 may also include additional configuration information that may be used to limit the computing capabilities utilized by the serverless computing architecture on behalf of the client. Imposing such limits may help manage costs and improve the provision of computing capabilities.

控制面220可以包括硬件和/或软件以协调工作流(诸如关于图2描述的工作流)的执行，以使得能够创建端点以经由无服务器架构来利用机器学习模型。响应于接收到无服务器端点请求230，控制面220可以将发起端点创建232的命令发送到端点创建服务226。Control plane 220 may include hardware and/or software to coordinate the execution of a workflow (such as the workflow described with respect to FIG. 2 ) to enable the creation of an endpoint to utilize a machine learning model via a serverless architecture. In response to receiving serverless endpoint request 230 , control plane 220 may send a command to initiate endpoint creation 232 to endpoint creation service 226 .

端点创建服务226可以包括进行以下操作的硬件和/或软件：生成包括模型服务器和扩展的模型容器，发起对模型容器234的代管，并且在适当时利用角色代理服务206来承担用于生成模型容器的角色236。在至少一个实施方案中，创建容器的任务被委托给代管服务222。端点创建服务226可以在238处创建无服务器计算函数以供计算服务208稍后使用。The endpoint creation service 226 may include hardware and/or software to generate a model container including a model server and extensions, initiate hosting of the model container 234, and, when appropriate, utilize the role proxy service 206 to assume the role 236 for generating the model container. In at least one embodiment, the task of creating the container is delegated to the hosting service 222. The endpoint creation service 226 may create a serverless compute function at 238 for later use by the compute service 208.

在至少一个实施方案中，模型容器是包括模型服务器214和扩展212的二进制文件。模型服务器214和扩展212可对应于图1中描绘的模型服务器114和扩展112。模型服务器214包括用于与机器学习模型交接的代码，并且可以与无服务器配置、服务器完整配置和混合配置中的使用兼容。扩展212包括用于在无服务器计算函数210与模型服务器214之间交接的代码。In at least one embodiment, the model container is a binary file that includes a model server 214 and an extension 212. The model server 214 and the extension 212 may correspond to the model server 114 and the extension 112 depicted in FIG. 1. The model server 214 includes code for interfacing with the machine learning model and may be compatible with use in a serverless configuration, a server-full configuration, and a hybrid configuration. The extension 212 includes code for interfacing between the serverless compute function 210 and the model server 214.

如本文中所使用，无服务器配置是准许计算函数(诸如由模型服务器214和扩展212执行的计算函数)在不需要专用服务器的情况下被执行的配置。相比之下，服务器完整配置使用专用服务器而不是无服务器计算架构。混合配置使用至少一些专用实例，但也采用无服务器计算架构。As used herein, a serverless configuration is a configuration that permits computing functions (such as those performed by model server 214 and extensions 212) to be performed without the need for dedicated servers. In contrast, a server-full configuration uses dedicated servers rather than a serverless computing architecture. A hybrid configuration uses at least some dedicated instances, but also employs a serverless computing architecture.

代管服务222包括用于生成、验证并且/或者存储模型容器的硬件和/或软件。代管服务222可以将模型容器存储在存储库224中，随后可由计算服务208在该存储库处下载该模型容器。此时，系统200被配置成用于机器学习能力的无服务器提供。The hosting service 222 includes hardware and/or software for generating, validating, and/or storing model containers. The hosting service 222 can store the model container in a repository 224, which can then be downloaded by the computing service 208. At this point, the system 200 is configured for serverless provision of machine learning capabilities.

在至少一个实施方案中，存储库224包括存储有容器的存储系统。存储库224可以存储许多这类容器，并且每个容器可以映射为与不同机器学习模型或端点相关联。作为说明性示例，存储库224可以包含三个容器，第一容器包括对应于端点E₁的模型服务器M₁，第二容器包括对应于端点E₂的模型服务器M₂，并且第三容器包括对应于端点E₃的模型服务器M₃。端点E₁、E₂和E₃或相关联的模型服务器中的每一者又可以与不同的IP地址和机器学习模型相关联。这些端点或相关联的模型服务器也可以与不同客户端相关联。In at least one embodiment, the repository 224 includes a storage system storing containers. The repository 224 can store many such containers, and each container can be mapped to be associated with a different machine learning model or endpoint. As an illustrative example, the repository 224 can include three containers, the first container including a model server _M1 corresponding to the endpoint _E1 , the second container including a model server _M2 corresponding to the endpoint _E2 , and the third container including _a model server _M3 corresponding to the endpoint E3. Each of the endpoints _E1 , _E2 , and _E3 or the associated model servers can be associated with different IP addresses and machine learning models. These endpoints or associated model servers can also be associated with different clients.

在至少一个实施方案中，系统200随后可以通过在步骤240处将代管容器240下载到计算服务208，随后在步骤242处调用计算函数210来对执行机器学习任务的请求作出响应。在至少一个实施方案中，计算函数210由系统200在代管过程期间自动生成并且包括在容器中的代码来实现。在由计算机服务208调用之后，计算函数210与扩展交接，该扩展随后将执行机器学习任务的请求转换为模型服务器214可使用的格式，并且该计算函数与模型服务器214交接以使得模型服务器从机器学习模型获得推断结果。关于图1更详细地解释了此过程。In at least one embodiment, the system 200 can then respond to the request to perform the machine learning task by downloading the managed container 240 to the computing service 208 at step 240, and then calling the computing function 210 at step 242. In at least one embodiment, the computing function 210 is implemented by code automatically generated by the system 200 during the hosting process and included in the container. After being called by the computer service 208, the computing function 210 is handed over to the extension, which then converts the request to perform the machine learning task into a format that can be used by the model server 214, and the computing function is handed over to the model server 214 so that the model server obtains the inference results from the machine learning model. This process is explained in more detail with respect to FIG. 1.

图3示出了根据至少一个实施方案的调用并执行计算函数以执行机器学习推断的示例。在示例系统300中，路由器304接收执行推断的请求。路由器304可以类似于图1中描绘的路由器104。路由器304确定该请求与模型服务器314相关联并且模型服务器已被配置成在无服务器配置中操作。此确定可以以多种方式进行，这些方式可能包括但不限于检索并检查与请求指向的模型服务器相关联的配置元数据。FIG3 illustrates an example of calling and executing a compute function to perform machine learning inference according to at least one embodiment. In the example system 300, a router 304 receives a request to perform inference. The router 304 may be similar to the router 104 depicted in FIG1 . The router 304 determines that the request is associated with a model server 314 and that the model server has been configured to operate in a serverless configuration. This determination may be made in a variety of ways, which may include, but are not limited to, retrieving and inspecting configuration metadata associated with the model server to which the request is directed.

已确定将使用无服务器配置来执行请求的系统300使得计算服务308从存储库324获得扩展312和模型服务器314。这可以响应于来自路由器304的消息来完成，一旦该路由器已确定应采用无服务器配置来处理执行推断的请求，该路由器便可以将请求转发到计算服务308。The system 300, having determined that the request will be executed using a serverless configuration, causes the compute service 308 to obtain the extension 312 and the model server 314 from the repository 324. This may be done in response to a message from the router 304, which may forward the request to the compute service 308 once the router has determined that the request to perform inference should be handled using a serverless configuration.

在至少一个实施方案中，扩展312和模型服务器314存储在容器文件内，并且从存储库324中检索容器文件。可以基于将请求指向的模型服务器(在此示例中为模型服务器314)与容器相关联的索引而在存储库324中找到容器。容器还可以包含计算函数310的具体实现，并且在一些实施方案中，计算函数310由扩展312实现。在其他实施方案中，系统300可以独立于容器或扩展312获得计算函数的具体实现。在一些实施方案中，模型服务器314和扩展312也可以被分开存储和检索，而不是组合成单个容器文件。In at least one embodiment, the extension 312 and the model server 314 are stored within a container file, and the container file is retrieved from the repository 324. The container can be found in the repository 324 based on an index that associates the model server to which the request is directed (in this example, the model server 314) with the container. The container can also contain a specific implementation of the compute function 310, and in some embodiments, the compute function 310 is implemented by the extension 312. In other embodiments, the system 300 can obtain the specific implementation of the compute function independently of the container or the extension 312. In some embodiments, the model server 314 and the extension 312 can also be stored and retrieved separately, rather than being combined into a single container file.

计算服务308调用计算函数310，该计算函数包括用于使用扩展312的代码，使得该扩展确保模型服务器314被激活并且该扩展与模型服务器314交接以获得推断。例如，在模型服务器314实现HTTP服务器的实施方案中，计算函数使得扩展312确保HTTP服务器已被初始化。还使得该扩展向HTTP服务器发出一个或多个HTTP请求以使用机器学习模型316来执行推断。扩展312可以包括用于将经由计算函数提供的数据转换为与HTTP请求兼容的格式的代码。这可以传达一个优点，即这允许使用预先存在的无服务器计算架构和预先存在的机器学习平台和服务，而无需进行大范围修改。The compute service 308 calls a compute function 310, which includes code for using an extension 312, causing the extension to ensure that the model server 314 is activated and the extension to interface with the model server 314 to obtain inferences. For example, in an embodiment where the model server 314 implements an HTTP server, the compute function causes the extension 312 to ensure that the HTTP server has been initialized. The extension is also caused to issue one or more HTTP requests to the HTTP server to perform inferences using the machine learning model 316. The extension 312 may include code for converting data provided via the compute function into a format compatible with the HTTP request. This may convey an advantage that this allows the use of pre-existing serverless computing architectures and pre-existing machine learning platforms and services without extensive modification.

扩展312还可以检索模型服务器314所需的数据。这可以包括例如模型参数340和各种形式的元数据342。扩展312可以根据需要向模型服务器314提供此数据。为了在无服务器配置中利用模型服务器314，可能需要向端点提供在服务器完整实现中在安装有端点的任何服务器实例上应正常可用的数据。然而，由于采用无服务器架构，因而此数据不能预安装在专用服务实例上。此技术挑战可以通过使用扩展312以从模型参数340、元数据342或其他所需信息的相应存储位置加载模型参数、元数据或其他所需信息并且在需要的基础上将信息提供给模型服务器314来解决。这可以通过扩展312与模型服务器314交接以向该模型服务器提供参数340、元数据342或者机器学习模型316将使用的其他信息来完成。在一些情况下，诸如在于机器学习服务(诸如图1中描绘的机器学习服务)上托管机器学习模型的情况下，扩展312可以触发与该服务的使得模型准备好使用所必需的任何交互。在至少一个实施方案中，扩展312将日志数据存储在日志320中。The extension 312 may also retrieve data required by the model server 314. This may include, for example, model parameters 340 and various forms of metadata 342. The extension 312 may provide this data to the model server 314 as needed. In order to utilize the model server 314 in a serverless configuration, it may be necessary to provide the endpoint with data that should be normally available on any server instance where the endpoint is installed in a server-full implementation. However, due to the serverless architecture, this data cannot be pre-installed on a dedicated service instance. This technical challenge can be solved by using the extension 312 to load the model parameters, metadata, or other required information from their corresponding storage locations and provide the information to the model server 314 on an as-needed basis. This can be accomplished by the extension 312 interfacing with the model server 314 to provide the model server with parameters 340, metadata 342, or other information to be used by the machine learning model 316. In some cases, such as where the machine learning model is hosted on a machine learning service (such as the machine learning service depicted in FIG. 1 ), the extension 312 may trigger any interaction with the service necessary to make the model ready for use. In at least one embodiment, extension 312 stores log data in log 320 .

可在扩展312与模型服务器314之间进行的交接操作的示例可以潜在地包括但不限于配置操作、推断操作、训练操作、调试操作、数据变换操作等。扩展312可以接收执行这些操作中的一个操作的请求，该扩展又可以将该请求转换为与模型服务器314兼容的格式，并且与模型服务器314交接以使得该模型服务器执行所请求的操作并获得操作的任何结果。Examples of handover operations that can be performed between the extension 312 and the model server 314 can potentially include, but are not limited to, configuration operations, inference operations, training operations, debugging operations, data transformation operations, etc. The extension 312 can receive a request to perform one of these operations, which in turn can convert the request into a format compatible with the model server 314, and handover with the model server 314 to cause the model server to perform the requested operation and obtain any results of the operation.

模型服务器314通过与机器学习模型316交互来执行这些操作。模型服务器314可以包括HTTP服务器330。可以在用于配置、训练、部署并使用机器学习模型的分布式可扩充服务上托管机器学习模型。在这类实施方案中，模型服务器314可以通过调用由服务提供的用于托管机器学习模型的基于网络的方法来与服务交接。由服务实现的基于网络的方法可以潜在地包括但不限于配置操作、推断操作、训练操作、调试操作、数据变换操作等。The model server 314 performs these operations by interacting with the machine learning model 316. The model server 314 may include an HTTP server 330. The machine learning model may be hosted on a distributed, scalable service for configuring, training, deploying, and using the machine learning model. In such embodiments, the model server 314 may interface with the service by calling a network-based method provided by the service for hosting the machine learning model. The network-based method implemented by the service may potentially include, but is not limited to, configuration operations, inference operations, training operations, debugging operations, data transformation operations, and the like.

图4示出了根据至少一个实施方案的结合有机器学习推断的无服务器处理和服务器完整处理的混合系统的示例。诸如所描绘的系统400的混合系统利用在其上执行机器学习操作的一个或多个专用服务器实例来提供机器学习能力，同时还结合有提供用于执行机器学习操作的附加能力的无服务器计算架构。在至少一个实施方案中，基线量的机器学习操作由专用实例执行，并且采用无服务器计算架构来提供激增能力。4 illustrates an example of a hybrid system combining serverless processing and server-full processing of machine learning inferences according to at least one embodiment. A hybrid system such as the depicted system 400 provides machine learning capabilities using one or more dedicated server instances on which machine learning operations are performed, while also incorporating a serverless computing architecture that provides additional capabilities for performing machine learning operations. In at least one embodiment, a baseline amount of machine learning operations are performed by dedicated instances, and a serverless computing architecture is employed to provide burst capabilities.

在至少一个实施方案中，混合系统400包括路由器404。类似于图1中描绘的路由器104，路由器404可以包括网络装置或者具有网络通信硬件的被配置成能够与客户端430和混合系统400的各种部件通信的其他计算装置，包括专用服务器实例，诸如所描绘的服务器实例450和计算服务408。In at least one embodiment, hybrid system 400 includes router 404. Similar to router 104 depicted in FIG. 1 , router 404 may include a network device or other computing device having network communication hardware configured to enable communication with clients 430 and various components of hybrid system 400, including dedicated server instances, such as depicted server instance 450 and computing service 408.

路由器404从客户端430接收一定量的请求，并且在服务器实例450上的模型服务器414a与计算服务408之间分发请求。如果有多于一个服务器实例450，则可以由路由器或负载平衡部件在服务器实例以及安装在该服务器实例上的模型服务器之间划分一定比例的请求。在至少一个实施方案中，路由到服务器实例450的请求的比例是基于对实例能力的利用。当利用率超过阈值量时，路由器404开始将一定比例的请求分发到计算服务408。发送到计算服务408的比例可以被动态调整以将专用计算实例的利用率最大化，同时还防止实例过载。实施方案还可以尝试将计算服务408的利用率最小化以便将成本降至最低。例如，在至少一个实施方案中，客户被分配有用于使用专用服务器实例450的固定金额的费用，并且被分配有用于使用计算服务408的可变金额的费用。在这类情况下，通过将其费用分配固定的服务器实例450的利用率最大化并且将除了固定费用以外其费用可变的计算服务408的利用率最小化，实施方案可以尝试将总费用分配最小化。在实施方案中，可以完成此操作，同时还避免专用服务器实例450的过度利用。The router 404 receives a certain amount of requests from the client 430 and distributes the requests between the model server 414a on the server instance 450 and the computing service 408. If there are more than one server instance 450, a certain proportion of requests can be divided between the server instance and the model server installed on the server instance by the router or load balancing component. In at least one embodiment, the proportion of requests routed to the server instance 450 is based on the utilization of the instance capacity. When the utilization exceeds the threshold amount, the router 404 begins to distribute a certain proportion of requests to the computing service 408. The proportion sent to the computing service 408 can be dynamically adjusted to maximize the utilization of the dedicated computing instance while also preventing the instance from being overloaded. The implementation scheme can also try to minimize the utilization of the computing service 408 in order to minimize the cost. For example, in at least one embodiment, the customer is assigned a fixed amount of fees for using the dedicated server instance 450, and is assigned a variable amount of fees for using the computing service 408. In such cases, embodiments can attempt to minimize the total cost allocation by maximizing the utilization of server instances 450 whose cost allocation is fixed and minimizing the utilization of computing services 408 whose costs are variable except for the fixed costs. In embodiments, this can be done while also avoiding over-utilization of dedicated server instances 450.

如图4中所描绘，服务器实例450表示被指派有持续且无限期地承担托管模型服务器414a的实例的任务的服务器。可以存在各自托管一个或多个模型服务器的许多这类实例，但为了解释清楚起见，图4仅描绘了单个这样的实例。服务器实例可以包括适于托管模型服务器414a的任何计算装置。As depicted in Figure 4, server instance 450 represents a server that is assigned the task of continuously and indefinitely hosting an instance of model server 414a. There may be many such instances, each hosting one or more model servers, but for clarity of explanation, Figure 4 depicts only a single such instance. A server instance may include any computing device suitable for hosting model server 414a.

计算服务408提供对计算函数410的无服务器调用，并且计算服务408的实施方案可对应于关于图1中描绘的计算服务108所描述的实施方案。Compute service 408 provides serverless invocation of compute functions 410 , and an implementation of compute service 408 may correspond to the implementation described with respect to compute service 108 depicted in FIG. 1 .

计算函数410包括由计算服务408使用动态地分配的计算能力调用并执行的可执行代码单元。计算函数410的实施方案可对应于关于图1中描绘的计算函数110所描述的实施方案。Compute function 410 includes executable code units that are called and executed by compute service 408 using dynamically allocated compute capacity. Implementations of compute function 410 may correspond to implementations described with respect to compute function 110 depicted in FIG.

扩展412包括用于与模型服务器114交接的代码，并且扩展412的实施方案可对应于关于图1中描绘的计算函数110所描述的实施方案。类似地，模型服务器414包括用于与机器学习模型416交接的代码，并且模型服务器414的实施方案可对应于关于图1中描绘的计算函数110所描述的实施方案。Extension 412 includes code for interfacing with model server 114, and an implementation of extension 412 may correspond to an implementation described with respect to compute function 110 depicted in FIG1. Similarly, model server 414 includes code for interfacing with machine learning model 416, and an implementation of model server 414 may correspond to an implementation described with respect to compute function 110 depicted in FIG1.

机器学习服务418和机器学习模型416的实施方案也可对应于关于针对图1描述的机器学习服务118和机器学习模型116所描述的实施方案。应注意，在此处，被训练以执行某个特定类型的推断的个别机器学习模型416是由在专用服务器实例上运行的模型服务器414a以及经由包括计算服务408的无服务器计算架构执行的另一模型服务器414b两者使用的。此外，模型服务器414a、414b可以是相同模型服务器的实例，这意味着构成这两个实例的代码是相同的。这传达的技术优势在于，客户仅需要提供或者定义单个端点，但可以在两个架构中使用该模型服务器。在与图4一致的一些实施方案中，模型服务器414b与扩展412相关联以便在无服务器计算架构内使用模型服务器414b。The implementation scheme of the machine learning service 418 and the machine learning model 416 may also correspond to the implementation scheme described with respect to the machine learning service 118 and the machine learning model 116 described with respect to FIG. 1 . It should be noted that here, the individual machine learning model 416 trained to perform a particular type of inference is used by both a model server 414a running on a dedicated server instance and another model server 414b executed via a serverless computing architecture including the computing service 408. In addition, the model servers 414a, 414b can be instances of the same model server, which means that the code constituting the two instances is the same. The technical advantage conveyed by this is that the customer only needs to provide or define a single endpoint, but the model server can be used in both architectures. In some embodiments consistent with FIG. 4 , the model server 414b is associated with the extension 412 so as to use the model server 414b within the serverless computing architecture.

图5示出了根据至少一个实施方案的用于将无服务器计算架构配置成执行机器学习推断的示例过程。尽管示例过程500被描绘为一系列步骤或操作，但将了解，除了明确指出或逻辑上需要的情况(诸如一个步骤或操作的输出被用作针对另一步骤或操作的输入的情况)之外，所描绘的过程的实施方案可以包括更改或重新排序的步骤或操作，或者可以省略某些步骤或操作。在至少一个实施方案中，示例过程500由结合有无服务器计算架构的系统来实现，该无服务器计算架构诸如关于附图描绘或描述的架构中的任何架构。FIG5 illustrates an example process for configuring a serverless computing architecture to perform machine learning inference according to at least one embodiment. Although the example process 500 is depicted as a series of steps or operations, it will be appreciated that the depicted embodiment of the process may include altered or reordered steps or operations, or may omit certain steps or operations, except where expressly indicated or logically required (such as where the output of one step or operation is used as input to another step or operation). In at least one embodiment, the example process 500 is implemented by a system incorporating a serverless computing architecture, such as any of the architectures depicted or described with respect to the figures.

在502处，系统接收创建端点以提供机器学习服务的请求。在实施方案中，端点创建是指系统使得其自身能够接收与机器学习模型交互的请求。在至少一个实施方案中，端点与访问机器学习模型的请求指向的网络地址相关联。在其他实施方案中，与端点相关联的模型服务器与网络地址相关联。At 502, the system receives a request to create an endpoint to provide a machine learning service. In an embodiment, endpoint creation refers to the system enabling itself to receive requests to interact with a machine learning model. In at least one embodiment, the endpoint is associated with a network address to which a request to access the machine learning model is directed. In other embodiments, a model server associated with the endpoint is associated with a network address.

在504处，系统确定无服务器配置已被请求。例如，在至少一个实施方案中，创建端点的请求可以伴随有元数据，该元数据指定端点所需的属性，并且该元数据还可以包括指示端点应在无服务器配置中被托管的标志或其他值。与无服务器配置相关的附加属性也可以包括在请求中。At 504, the system determines that a serverless configuration has been requested. For example, in at least one embodiment, the request to create an endpoint can be accompanied by metadata that specifies the properties required for the endpoint, and the metadata can also include a flag or other value indicating that the endpoint should be hosted in a serverless configuration. Additional properties related to the serverless configuration can also be included in the request.

在506处，系统标识针对无服务器计算架构的最大并发性和存储器利用率的参数。这些参数也可以经由包括在创建端点的请求中的元数据来指定。并发性是指在给定时间处指向端点的未决请求的数量。存储器利用率是指系统存储器的使用情况。将了解，这些示例旨在说明而不是限制。At 506, the system identifies parameters for maximum concurrency and memory utilization for the serverless computing architecture. These parameters may also be specified via metadata included in the request to create the endpoint. Concurrency refers to the number of outstanding requests directed to the endpoint at a given time. Memory utilization refers to the usage of system memory. It will be appreciated that these examples are intended to be illustrative and not limiting.

在508处，系统生成并存储容器，该容器包括与所请求的端点和扩展相关联的模型服务器。在此处，模型服务器是指用于实现模型服务器的代码和/或配置数据，并且扩展是指至少包括用于与模型服务器交接的指令的代码。容器、模型服务器和扩展可以参考本文中关于附图描述的实施方案，包括关于图1描述的实施方案。At 508, the system generates and stores a container including a model server associated with the requested endpoint and extension. Here, the model server refers to code and/or configuration data for implementing the model server, and the extension refers to code including at least instructions for interfacing with the model server. The container, the model server, and the extension may refer to the embodiments described in the accompanying drawings herein, including the embodiments described in relation to FIG. 1 .

随后当系统接收到指向对应端点的请求时，可以对所存储的容器进行定位并且从存储装置中调用所存储的容器。例如，容器可以存储在由网络地址索引的存储库中。可以使用类似方法来存储并索引与端点相关联的元数据。当系统接收到指向端点的请求时，该系统可以使用索引来确定该请求应经由无服务器计算架构来处理，加载容器并且继续处理该请求。本文中关于附图(包括关于图1)描述了处理请求的实施方案的示例。Subsequently, when the system receives a request directed to the corresponding endpoint, the stored container can be located and called from the storage device. For example, the container can be stored in a repository indexed by a network address. A similar method can be used to store and index metadata associated with an endpoint. When the system receives a request directed to an endpoint, the system can use the index to determine that the request should be processed via a serverless computing architecture, load the container, and continue processing the request. Examples of implementations of processing requests are described herein with respect to the accompanying drawings (including with respect to FIG. 1).

图6示出了根据至少一个实施方案的用于将混合计算架构配置成执行机器学习推断的示例过程。尽管示例过程600被描绘为一系列步骤或操作，但将了解，除了明确指出或逻辑上需要的情况(诸如一个步骤或操作的输出被用作针对另一步骤或操作的输入的情况)之外，所描绘的过程的实施方案可以包括更改或重新排序的步骤或操作，或者可以省略某些步骤或操作。在至少一个实施方案中，示例过程600由结合有无服务器计算架构的系统来实现，该无服务器计算架构诸如关于附图描绘或描述的架构中的任何架构。FIG6 illustrates an example process for configuring a hybrid computing architecture to perform machine learning inference according to at least one embodiment. Although the example process 600 is depicted as a series of steps or operations, it will be appreciated that the depicted embodiment of the process may include altered or reordered steps or operations, or may omit certain steps or operations, except where expressly indicated or logically required (such as where the output of one step or operation is used as input to another step or operation). In at least one embodiment, the example process 600 is implemented by a system incorporating a serverless computing architecture, such as any of the architectures depicted or described with respect to the figures.

在602处，系统接收启用混合配置以提供机器学习推断的请求。如上文关于图5所描述，创建用于与机器学习模型通信的端点的请求可以包括指示应如何配置端点的信息。这可以包括指示可使用混合配置的信息。在混合配置中，该系统采用一个或多个专用服务器来处理指向对应端点或模型服务器的请求的一部分，并且采用无服务器计算架构来处理剩余部分。在一些情况下，这样做是为了应对需求的激增或者临时应对需求增加，直到可以添加新的专用实例为止。At 602, the system receives a request to enable a hybrid configuration to provide machine learning inference. As described above with respect to FIG. 5, a request to create an endpoint for communicating with a machine learning model may include information indicating how the endpoint should be configured. This may include information indicating that a hybrid configuration may be used. In a hybrid configuration, the system employs one or more dedicated servers to handle a portion of requests directed to a corresponding endpoint or model server, and employs a serverless computing architecture to handle the remainder. In some cases, this is done to respond to a surge in demand or to temporarily respond to an increase in demand until a new dedicated instance can be added.

在604处，系统标识或者获得专用服务器实例。该服务器被称为专用的，因为它们被指派有持续地处理指向端点或相关联的模型服务器的请求的角色。这一般涉及安装在服务器上的模型服务器以及在一系列请求内保持激活。此外，专用服务器可以被分配给与端点相同的用户或账户并且不被其他用户或账户使用。At 604, the system identifies or obtains a dedicated server instance. The servers are referred to as dedicated because they are assigned the role of continuously processing requests directed to the endpoint or associated model server. This generally involves the model server being installed on the server and remaining active for a series of requests. In addition, a dedicated server can be assigned to the same user or account as the endpoint and not used by other users or accounts.

在一些情况下，端点可以被重新配置成使得该端点从服务器完整配置转换为混合配置。在这类情况下，系统可以标识任何现有专用服务器，并且继续使用那些服务器来处理传入请求的一部分，并且将无服务器计算架构配置成处理附加部分。In some cases, the endpoint can be reconfigured so that the endpoint is converted from a server-full configuration to a hybrid configuration. In such cases, the system can identify any existing dedicated servers and continue to use those servers to handle a portion of the incoming requests, and configure the serverless computing architecture to handle the additional portion.

在其他情况下，诸如当第一次创建端点时，系统可以获得对一个或多个专用服务器的访问，将该一个或多个专用服务器配置成用于处理指向端点的请求的一部分，并且将无服务器计算架构配置成处理附加部分。In other cases, such as when an endpoint is first created, the system can obtain access to one or more dedicated servers, configure the one or more dedicated servers to process a portion of requests directed to the endpoint, and configure the serverless computing architecture to process additional portions.

在606处，该系统获得用于操作无服务器计算架构的参数。这些参数可以包括上文关于图5描述的参数。另外，在608处，该系统获得服务水平参数。这些服务水平参数可以包括与专用服务器的期望利用率水平相关的参数。为了说明，在一些实施方案中，通过保持专用服务器的利用率相对较低并且在超过该利用量的情况下容易将负载转移到无服务器计算架构，可以实现较高服务水平。另一方面，这可能导致用户被分配有超过并高于与专用实例相关联的固定费用的附加费用。实施方案可以通过允许用户指示应如何在专用实例与无服务器计算架构之间划分能力利用率来解决这一点。At 606, the system obtains parameters for operating a serverless computing architecture. These parameters may include the parameters described above with respect to FIG. 5. In addition, at 608, the system obtains service level parameters. These service level parameters may include parameters related to the expected utilization level of a dedicated server. To illustrate, in some embodiments, a higher service level may be achieved by keeping the utilization of the dedicated server relatively low and easily transferring the load to the serverless computing architecture when exceeding the utilization. On the other hand, this may result in the user being assigned an additional fee that exceeds and is higher than the fixed fee associated with the dedicated instance. The embodiment may solve this by allowing the user to indicate how the capacity utilization should be divided between the dedicated instance and the serverless computing architecture.

在610处，该系统将其自身配置成用于混合操作。这可以包括使用与关于图5描述的步骤类似或相同的步骤来生成并存储针对模型服务器和扩展的容器。混合系统的配置还可以包括将路由器(诸如图4中描绘的路由器404)配置成在一个或多个专用服务器与无服务器计算架构之间分发工作负荷。At 610, the system configures itself for hybrid operation. This may include generating and storing containers for model servers and extensions using steps similar or identical to those described with respect to FIG5. Configuration of the hybrid system may also include configuring a router (such as router 404 depicted in FIG4) to distribute workloads between one or more dedicated servers and the serverless computing architecture.

一旦被配置，在混合模式下操作的系统便可以在专用服务器实例与无服务器计算架构之间实现负载平衡。在至少一个实施方案中，此负载平衡可以包括将专用服务器的利用率最大化，以及在专用服务器的利用率超过期望参数时将负载转移到无服务器计算架构。Once configured, a system operating in hybrid mode can achieve load balancing between dedicated server instances and serverless computing architectures. In at least one embodiment, this load balancing can include maximizing the utilization of dedicated servers and transferring the load to the serverless computing architecture when the utilization of dedicated servers exceeds a desired parameter.

在至少一个实施方案中，该系统可以基于与无服务器计算架构相关联的使用模式和费用来生成用于调整专用服务器数量的建议。例如，如果无服务器计算架构的利用率一致较高，则该建议可以将添加附加专用服务器；或者如果该系统确定可使用无服务器计算架构来高效应对周期性需求激增，则该建议可能将移除服务器。将了解，这些示例旨在说明而不是限制。In at least one embodiment, the system can generate recommendations for adjusting the number of dedicated servers based on usage patterns and costs associated with the serverless computing architecture. For example, if the utilization of the serverless computing architecture is consistently high, the recommendation may be to add additional dedicated servers; or if the system determines that the serverless computing architecture can be used to efficiently handle periodic demand surges, the recommendation may be to remove servers. It will be appreciated that these examples are intended to be illustrative and not limiting.

图7示出了根据至少一个实施方案的用于使用无服务器计算架构来执行机器学习推断的示例过程。尽管示例过程700被描绘为一系列步骤或操作，但将了解，除了明确指出或逻辑上需要的情况(诸如一个步骤或操作的输出被用作针对另一步骤或操作的输入的情况)之外，所描绘的过程的实施方案可以包括更改或重新排序的步骤或操作，或者可以省略某些步骤或操作。在至少一个实施方案中，示例过程700由结合有无服务器计算架构的系统来实现，该无服务器计算架构诸如关于附图描绘或描述的架构中的任何架构。FIG7 illustrates an example process for performing machine learning inference using a serverless computing architecture according to at least one embodiment. Although the example process 700 is depicted as a series of steps or operations, it will be appreciated that the depicted embodiment of the process may include altered or reordered steps or operations, or may omit certain steps or operations, except where expressly indicated or logically required (such as where the output of one step or operation is used as input to another step or operation). In at least one embodiment, the example process 700 is implemented by a system incorporating a serverless computing architecture, such as any of the architectures depicted or described with respect to the figures.

在702处，该系统接收使用无服务器计算架构来托管机器学习模型的请求。在至少一个实施方案中，该请求包括指示将使用哪个机器学习模型的信息，或者包括指示将用于访问机器学习模型的端点的信息。At 702, the system receives a request to host a machine learning model using a serverless computing architecture. In at least one embodiment, the request includes information indicating which machine learning model to use, or includes information indicating an endpoint to be used to access the machine learning model.

在704处，该系统标识与请求相关联的端点。该请求可以包括指示机器学习模型与端点之间的关联的数据。该端点可以与模型服务器相关联，如本文中关于各种实施方案所描述，该模型服务器可用于访问机器学习模型并且执行推断。模型服务器可以包含用于在客户端与机器学习模型之间交接的代码，诸如实现HTTP服务器的代码，该HTTP服务器的方法可用于使用模型来执行推断。At 704, the system identifies an endpoint associated with the request. The request may include data indicating an association between the machine learning model and the endpoint. The endpoint may be associated with a model server, as described herein with respect to various embodiments, which may be used to access the machine learning model and perform inference. The model server may include code for interfacing between the client and the machine learning model, such as code implementing an HTTP server whose methods may be used to perform inference using the model.

在706处，该系统将端点与扩展相关联，其中该扩展在无服务器计算函数与模型服务器之间进行推断。这可以包括用于进行以下操作的代码：将由计算函数提供的数据转换为与模型服务器兼容的格式，使得当无服务器计算架构调用计算函数时，可以使所述数据与模型服务器所预期的任何格式兼容。例如，在模型服务器实现HTTP服务器的实施方案中，该扩展可以将数据转换成与HTTP服务器的基于网络的方法兼容的数据格式。At 706, the system associates the endpoint with an extension that performs inference between the serverless compute function and the model server. This may include code for converting data provided by the compute function into a format compatible with the model server so that when the serverless compute architecture calls the compute function, the data can be made compatible with any format expected by the model server. For example, in an embodiment where the model server implements an HTTP server, the extension can convert the data into a data format compatible with a network-based approach of the HTTP server.

在至少一个实施方案中，该扩展包括代码，该代码在由无服务器架构的计算函数调用时使得机器学习模型可被模型服务器访问。在至少一个实施方案中，这是通过调用与模型服务器相关联的初始化函数来完成的。In at least one embodiment, the extension includes code that, when called by a compute function of a serverless architecture, makes the machine learning model accessible to a model server. In at least one embodiment, this is accomplished by calling an initialization function associated with the model server.

在至少一个实施方案中，通过创建容器文件将模型服务器或相关联的端点与该扩展相关联。例如，该系统可以生成包括模型服务器和扩展的文件，存储该文件，并且存储该文件与标识该端点的信息之间的关联。此信息随后可用于基于执行推断的请求中所提供的信息来对文件进行定位。在至少一个实施方案中，该信息是与端点或模型服务器相关联的网络地址。In at least one embodiment, the model server or associated endpoint is associated with the extension by creating a container file. For example, the system can generate a file including the model server and the extension, store the file, and store an association between the file and information identifying the endpoint. This information can then be used to locate the file based on information provided in a request to perform inference. In at least one embodiment, the information is a network address associated with the endpoint or model server.

在708处，该系统接收使用托管机器学习模型来执行推断的请求。该请求可以由系统作为指向端点或模型服务器的基于网络的请求来接收。该系统随后可以确定应使用无服务器计算架构来处理该请求。At 708, the system receives a request to perform inference using a hosted machine learning model. The request can be received by the system as a network-based request directed to an endpoint or model server. The system can then determine that the request should be processed using a serverless computing architecture.

在710处，该系统通过执行无服务器计算函数来处理该请求。该计算函数在被执行时使用扩展以经由模型服务器获得由机器学习模型生成的推断。一般来说，控制流包括调用计算函数的计算服务、扩展的计算函数调用方法以及模型服务器的扩展调用方法。扩展与模型服务器之间的交接可以在该扩展已在端点上执行合适的初始化过程之后进行。At 710, the system processes the request by executing a serverless compute function. The compute function, when executed, uses the extension to obtain inferences generated by the machine learning model via the model server. Generally, the control flow includes a compute service that calls the compute function, a compute function calling method of the extension, and an extension calling method of the model server. The handover between the extension and the model server can be performed after the extension has performed a suitable initialization process on the endpoint.

在712处，该系统响应于该请求而提供所请求的推断。在至少一个实施方案中，该扩展调用模型服务器上的一个或多个方法以使得模型服务器访问机器学习模型。机器学习模型执行推断，并且经由模型服务器来返回构成所执行的推断的结果的数据。At 712, the system provides the requested inference in response to the request. In at least one embodiment, the extension calls one or more methods on the model server to cause the model server to access the machine learning model. The machine learning model performs the inference and returns data constituting the result of the performed inference via the model server.

图8示出了根据至少一个实施方案的用于使用混合计算架构来执行机器学习推断的示例过程。尽管示例过程600被描绘为一系列步骤或操作，但将了解，除了明确指出或逻辑上需要的情况(诸如一个步骤或操作的输出被用作针对另一步骤或操作的输入的情况)之外，所描绘的过程的实施方案可以包括更改或重新排序的步骤或操作，或者可以省略某些步骤或操作。在至少一个实施方案中，示例过程800由结合有无服务器计算架构的系统来实现，该无服务器计算架构诸如关于附图描绘或描述的架构中的任何架构。FIG8 illustrates an example process for performing machine learning inference using a hybrid computing architecture according to at least one embodiment. Although the example process 600 is depicted as a series of steps or operations, it will be appreciated that the depicted embodiment of the process may include altered or reordered steps or operations, or may omit certain steps or operations, except where expressly indicated or logically required (such as where the output of one step or operation is used as input to another step or operation). In at least one embodiment, the example process 800 is implemented by a system incorporating a serverless computing architecture, such as any of the architectures depicted or described with respect to the figures.

在802处，该系统叙述对端点进行配置以利用混合配置的请求。该系统随后可以标识与端点相关联的模型服务器，其中模型服务器包括用于与机器学习模型交接的代码。在一些情况下，该端点先前可能已被创建，诸如在该系统将从仅依赖于专用实例的配置转变为依赖于混合配置的配置的情况下。在其他情况下，创建新端点。At 802, the system recites a request to configure an endpoint to utilize a hybrid configuration. The system can then identify a model server associated with the endpoint, wherein the model server includes code for interfacing with a machine learning model. In some cases, the endpoint may have been previously created, such as when the system is transitioning from a configuration that relies only on dedicated instances to a configuration that relies on a hybrid configuration. In other cases, a new endpoint is created.

在804处，该系统将端点或模型服务器与包括用于与模型服务器交接的代码的扩展相关联。如本文中例如关于图4所描述，这可以包括生成包括针对模型服务器和扩展两者的代码的容器，以及存储可用于确定无服务器计算架构已被配置成支持推断请求的无服务器处理的信息。At 804, the system associates an endpoint or model server with an extension that includes code for interfacing with the model server. As described herein, for example, with respect to FIG. 4, this can include generating a container that includes code for both the model server and the extension, and storing information that can be used to determine that the serverless computing architecture has been configured to support serverless processing of inference requests.

在806处，该系统接收获得推断的请求。该请求可以随时间推移以各种模式被接收，这些模式可以包括需求激增或需求稳定增加。将了解，这些示例旨在说明而不是限制。该系统随后可以根据预期模式来划分处理这些请求的责任，例如，如上文关于图6所描述。因此，在至少一个实施方案中，该系统基于专用服务器的能力而在两个系统之间划分请求。At 806, the system receives a request to obtain an inference. The request can be received in various patterns over time, which can include a surge in demand or a steady increase in demand. It will be appreciated that these examples are intended to be illustrative and not limiting. The system can then divide the responsibility for processing these requests according to the expected pattern, for example, as described above with respect to FIG. 6. Thus, in at least one embodiment, the system divides the request between the two systems based on the capabilities of the dedicated servers.

在808处，该系统使用在至少一个专用服务器实例上操作的模型服务器的至少第一实例来对请求的第一部分作出响应。在一些情况和实施方案中，通过将专用实例的利用率最大化直到某个最大利用率来确定第一部分，并且将请求的任何剩余部分分配给无服务器计算架构。At 808, the system responds to a first portion of the request using at least a first instance of a model server operating on at least one dedicated server instance. In some cases and embodiments, the first portion is determined by maximizing utilization of the dedicated instances up to a certain maximum utilization, and any remaining portion of the request is allocated to the serverless computing architecture.

在810处，该系统使用无服务器计算架构上的模型服务器的第二实例来对请求的第二部分作出响应。无服务器计算架构随后根据第二部分的大小来动态地分配处理请求的此部分的能力。At 810, the system responds to a second portion of the request using a second instance of the model server on the serverless computing architecture. The serverless computing architecture then dynamically allocates capacity to process this portion of the request based on the size of the second portion.

本文中描述的系统、技术和方法可应用于多种计算服务，包括但不一定限于机器学习模型、模拟、基于网络的应用程序或其他代码单元。一般来说，所公开的技术可适用于包括但不一定限于使用包括所公开的模型服务器的架构来访问软件应用程序的场景。The systems, techniques, and methods described herein can be applied to a variety of computing services, including but not necessarily limited to machine learning models, simulations, network-based applications, or other code units. In general, the disclosed techniques can be applied to scenarios including but not necessarily limited to accessing software applications using an architecture including the disclosed model server.

在实施方案中，一种系统包括至少一个处理器以及包括计算机可执行指令的存储器，该计算机可执行指令响应于该至少一个处理器的执行而使得该系统将无服务器计算架构配置成托管计算服务。计算服务可以包括机器学习模型、基于计算机的模拟、基于网络的应用程序或其他计算机服务，前提条件是计算服务是经由如本文中所描述的模型服务器来访问的。该系统通过至少将端点与包括用于与模型服务器交接的代码的扩展相关联而将无服务器计算架构配置成托管计算服务，其中模型服务器包括用于访问计算服务的代码。该系统随后可以接收从计算服务获得结果的请求，并且通过在无服务器计算架构上执行计算函数来对该请求作出响应。计算函数调用由扩展实现的一个或多个函数，并且该一个或多个函数经由模型服务器来获得由计算服务生成的结果。In an embodiment, a system includes at least one processor and a memory including computer executable instructions, which in response to the execution of the at least one processor causes the system to configure a serverless computing architecture to a hosted computing service. The computing service may include a machine learning model, a computer-based simulation, a network-based application, or other computer services, provided that the computing service is accessed via a model server as described herein. The system configures the serverless computing architecture to a hosted computing service by associating at least an endpoint with an extension including a code for interfacing with a model server, wherein the model server includes a code for accessing the computing service. The system may then receive a request to obtain a result from the computing service, and respond to the request by executing a computing function on the serverless computing architecture. The computing function calls one or more functions implemented by the extension, and the one or more functions obtain the result generated by the computing service via the model server.

在至少一个实施方案中，由扩展实现的一个或多个函数在由无服务器计算函数调用时使得计算服务准备好供模型服务器使用。In at least one embodiment, one or more functions implemented by the extension, when invoked by the serverless compute function, prepare the compute service for use by the model server.

在至少一个实施方案中，该系统的存储器还包括计算机可执行指令，该计算机可执行指令响应于由该至少一个处理器的执行而使得该系统响应于确定该请求指向与模型服务器相关联的网络地址并且端点已被配置成利用无服务器计算架构而拦截该请求。In at least one embodiment, the memory of the system also includes computer executable instructions that, in response to execution by the at least one processor, cause the system to intercept the request in response to determining that the request is directed to a network address associated with the model server and the endpoint has been configured to utilize a serverless computing architecture.

在至少一个实施方案中，该模型服务器包括用于使得能够在已被保留用于提供对计算服务的访问的服务器的实例上托管模型服务器的代码。In at least one embodiment, the model server includes code for enabling hosting of the model server on an instance of a server that has been reserved for providing access to a computing service.

在至少一个实施方案中，该无服务器计算架构根据需求来动态地分配计算能力，以处理使用计算服务来获得推断的请求。In at least one embodiment, the serverless computing architecture dynamically allocates computing capacity based on demand to process requests to obtain inferences using computing services.

在另一示例中，一种利用混合配置的方法包括：获得包括用于与计算服务交接的代码的模型服务器，以及将模型服务器与和模型服务器交接的扩展相关联。当接收到通过使用计算服务获得结果的请求时，该方法包括通过使用安装在服务器上的模型服务器的第一实例来对请求的第一部分作出响应，以及通过使用无服务器计算架构来对请求的第二部分作出响应。为了使用无服务器计算架构，该方法包括调用计算函数，该计算函数使用扩展和模型服务器的至少第二实例来从计算服务获得结果。In another example, a method for utilizing a hybrid configuration includes obtaining a model server including code for interfacing with a compute service, and associating the model server with an extension that interfaces with the model server. When a request to obtain a result using the compute service is received, the method includes responding to a first portion of the request by using a first instance of the model server installed on a server, and responding to a second portion of the request by using a serverless compute architecture. To use the serverless compute architecture, the method includes invoking a compute function that uses the extension and at least a second instance of the model server to obtain a result from the compute service.

图9示出了用于实现根据实施方案的各方面的示例系统900的各方面。如将理解，尽管出于解释的目的使用了基于网络的系统，但是可适当地使用不同的系统来实现各种实施方案。在实施方案中，系统包括电子客户端装置902，该电子客户端装置包括可操作来通过适当的网络904发送和/或接收请求、消息或信息并且将信息传送回到装置的用户的任何适当的装置。此类客户端装置的示例包括个人计算机、手机或其他移动电话、手持式消息传递装置、膝上计算机、平板计算机、机顶盒、个人数据助理、嵌入式计算机系统、电子书阅读器等。在实施方案中，该网络包括任何适当的网络，包括内联网、互联网、蜂窝网络、局域网、卫星网络或任何其他这样的网络和/或它们的组合，并且用于这种系统的部件至少部分地取决于所选择的网络和/或系统的类型。用于经由这种网络通信的许多协议和组件是众所周知的，因而本文不再详细论述。在实施方案中，网络上的通信通过有线和/或无线连接及其组合来实现。在实施方案中，网络包括互联网和/或其他可公开寻址的通信网络，因为系统包括用于接收请求并且响应于所述请求而提供内容的web服务器906，然而对于其他网络来说，可使用服务类似目的的替代装置，如本领域普通技术人员所显而易见的。Fig. 9 shows various aspects of the example system 900 for realizing various aspects according to the embodiment.As will be understood, although a network-based system is used for the purpose of explanation, various embodiments can be realized by using different systems appropriately.In an embodiment, the system includes an electronic client device 902, which includes any appropriate device that can be operated to send and/or receive a request, message or information through an appropriate network 904 and transmit the information back to the user of the device.The example of such client devices includes a personal computer, a mobile phone or other mobile phones, a handheld messaging device, a laptop computer, a tablet computer, a set-top box, a personal data assistant, an embedded computer system, an e-book reader, etc.In an embodiment, the network includes any appropriate network, including an intranet, the Internet, a cellular network, a local area network, a satellite network or any other such network and/or their combination, and the components for such a system depend at least in part on the type of the selected network and/or system.It is well known that many protocols and components for communicating via such a network are thus not discussed in detail herein.In an embodiment, the communication on the network is realized by wired and/or wireless connection and a combination thereof. In an embodiment, the network includes the Internet and/or other publicly addressable communications network, as the system includes a web server 906 for receiving requests and providing content in response to the requests, however for other networks, alternative devices serving similar purposes may be used, as will be apparent to one of ordinary skill in the art.

在实施方案中，说明性系统包括至少一个应用程序服务器908和数据存储区910，并且应理解，可以存在可以链接起来或以其他方式来配置的若干应用程序服务器、层或其他元件、过程或部件，其可进行交互以执行如从适当数据存储区获得数据的任务。在实施方案中，服务器被实现为硬件装置、虚拟计算机系统、在计算机系统上执行的编程模块、和/或配置有硬件和/或软件的其他装置以通过网络接收和响应通信(例如，web服务应用程序编程接口(API)请求)。如本文所用，除非另有说明或从上下文清楚可见，否则术语“数据存储区”是指能够存储、访问和检索数据的任何装置或装置组合，其可包括任何标准、分布式、虚拟或集群系统中的任何组合的和任何数量的数据服务器、数据库、数据存储装置和数据存储介质。在实施方案中，数据存储区与块级和/或对象级接口通信。应用程序服务器可以包括任何适当的硬件、软件和固件，以用于根据需要与数据存储区集成以执行针对客户端装置的一个或多个应用程序的各方面，从而处理针对应用程序的数据访问和业务逻辑中的一些或全部。In an embodiment, the illustrative system includes at least one application server 908 and a data store 910, and it should be understood that there may be several application servers, layers or other elements, processes or components that can be linked or otherwise configured, which can interact to perform tasks such as obtaining data from appropriate data stores. In an embodiment, the server is implemented as a hardware device, a virtual computer system, a programming module executed on a computer system, and/or other devices configured with hardware and/or software to receive and respond to communications (e.g., web service application programming interface (API) requests) over a network. As used herein, unless otherwise specified or clearly visible from the context, the term "data store" refers to any device or device combination capable of storing, accessing and retrieving data, which may include any combination of and any number of data servers, databases, data storage devices and data storage media in any standard, distributed, virtual or clustered systems. In an embodiment, the data store communicates with a block-level and/or object-level interface. The application server may include any appropriate hardware, software and firmware for integrating with the data store as needed to perform various aspects of one or more applications for a client device, thereby processing some or all of the data access and business logic for the application.

在一个实施方案中，应用程序服务器与数据存储区协作地提供访问控制服务并且生成内容，包括但不限于由网络服务器以超文本标记语言(“HTML”)、可扩展标记语言(“XML”)、JavaScript、级联样式表(“CSS”)、JavaScript对象符号(JSON)和/或另一适当的客户端侧或其他结构化语言的形式提供给与客户端装置相关联的用户的文本、图形、音频、视频和/或其他内容。在实施方案中，传送到客户端装置的内容由客户端装置处理以提供一种或多种形式的内容，包括但不限于用户可以通过听觉、视觉和/或其他感官感受到的形式。在实施方案中，对所有请求和响应的处理以及客户端装置902与应用程序服务器908之间的内容递送由网络服务器使用PHP：超文本预处理器(“PHP”)、Python、Ruby、Perl、Java、HTML、XML、JSON和/或此示例中的另一适当的服务器侧结构化语言来处理。在实施方案中，本文描述为由单个装置执行的操作由形成分布式和/或虚拟系统的多个装置共同执行。In one embodiment, the application server cooperates with the data store to provide access control services and generate content, including but not limited to text, graphics, audio, video and/or other content provided by the network server to a user associated with the client device in the form of Hypertext Markup Language ("HTML"), Extensible Markup Language ("XML"), JavaScript, Cascading Style Sheets ("CSS"), JavaScript Object Notation (JSON) and/or another appropriate client-side or other structured language. In an embodiment, the content transmitted to the client device is processed by the client device to provide one or more forms of content, including but not limited to forms that the user can perceive through hearing, vision and/or other senses. In an embodiment, the processing of all requests and responses and the delivery of content between the client device 902 and the application server 908 are handled by the network server using PHP: Hypertext Preprocessor ("PHP"), Python, Ruby, Perl, Java, HTML, XML, JSON and/or another appropriate server-side structured language in this example. In an embodiment, the operations described herein as being performed by a single device are performed jointly by multiple devices forming a distributed and/or virtual system.

在实施方案中，数据存储区910包括若干单独的数据表、数据库、数据文档、动态数据存储方案和/或用于存储与本公开的特定方面相关的数据的其他数据存储机构和介质。在实施方案中，数据存储区包括用于存储生成数据912和用户信息916的机构，该生成数据和用户信息用于提供用于生成端的内容。数据存储区还被示出为包括用于存储日志数据914的机构，该日志数据在实施方案中用于报告、计算资源管理、分析或其他此类目的。在实施方案中，诸如页面图像信息和访问权限信息(例如，访问控制策略或其他许可编码)的其他方面视情况存储在以上列出的任何机构中的数据存储区中或数据存储区910中的附加机构中。In an embodiment, data storage area 910 includes several separate data tables, databases, data documents, dynamic data storage schemes and/or other data storage mechanisms and media for storing data related to specific aspects of the present disclosure. In an embodiment, data storage area includes a mechanism for storing generated data 912 and user information 916, which are used to provide content for the generation end. The data storage area is also shown as including a mechanism for storing log data 914, which is used for reporting, computing resource management, analysis or other such purposes in an embodiment. In an embodiment, other aspects such as page image information and access rights information (e.g., access control policies or other permission encodings) are stored in the data storage area in any of the mechanisms listed above or in an additional mechanism in the data storage area 910 as appropriate.

在实施方案中，通过与数据存储区910相关联的逻辑，该数据存储区是可操作的，以从应用程序服务器908接收指令并响应于该指令获得、更新或以其他方式处理数据，并且应用程序服务器908响应于所接收的指令提供静态数据、动态数据或静态数据和动态数据的组合。在实施方案中，动态数据(诸如网络日志(博客)、购物应用程序、新闻服务和其他这类应用程序中使用的数据)由如本文中所描述的服务器侧结构化语言生成，或者由在应用程序服务器上操作或者在应用程序服务器的控制下操作的内容管理系统(“CMS”)提供。在实施方案中，用户通过用户操作的装置提交对某类项目的搜索请求。在该示例中，数据存储区访问用户信息以验证用户的身份，访问目录详细信息以获得有关该类型项目的信息，并将信息返回给用户，诸如在结果列表中用户经由用户装置902上的浏览器查看的网页。继续该示例，在浏览器的专用页面或窗口中查看感兴趣的特定项目的信息。然而，应注意，本公开的实施方案不一定限于网页的上下文，而是更普遍地适用于一般处理请求，其中请求不一定是对内容的请求。示例请求包括管理系统900和/或另一个系统托管的计算资源和/或与其进行交互的请求，诸如用于启动、终止、删除、修改、读取、和/或以其他方式访问此类计算资源。In an embodiment, the data store 910 is operable, through logic associated with the data store 910, to receive instructions from the application server 908 and obtain, update or otherwise process data in response to the instructions, and the application server 908 provides static data, dynamic data or a combination of static data and dynamic data in response to the received instructions. In an embodiment, dynamic data (such as data used in web logs (blogs), shopping applications, news services and other such applications) is generated by a server-side structured language as described herein, or provided by a content management system ("CMS") operating on the application server or operating under the control of the application server. In an embodiment, a user submits a search request for a certain type of project through a device operated by the user. In this example, the data store accesses user information to verify the identity of the user, accesses directory details to obtain information about the type of project, and returns the information to the user, such as a web page viewed by the user via a browser on the user device 902 in a result list. Continuing with this example, the information of a specific project of interest is viewed in a dedicated page or window of the browser. However, it should be noted that the embodiments of the present disclosure are not necessarily limited to the context of web pages, but are more generally applicable to general processing requests, where the request is not necessarily a request for content. Example requests include requests to manage and/or interact with computing resources hosted by system 900 and/or another system, such as to launch, terminate, delete, modify, read, and/or otherwise access such computing resources.

在实施方案中，每个服务器通常包括提供用于所述服务器的一般管理和操作的可执行程序指令的操作系统，并且包括存储指令的计算机可读存储介质(例如，硬盘、随机存取存储器、只读存储器等)，当由服务器的处理器执行时，所述指令使得或以其他方式允许服务器执行其期望的功能(例如，这些功能是根据服务器的一个或多个处理器执行存储在计算机可读存储介质上的指令而实现的)。In an embodiment, each server typically includes an operating system that provides executable program instructions for general management and operation of the server, and includes a computer-readable storage medium (e.g., a hard disk, random access memory, read-only memory, etc.) that stores instructions that, when executed by a processor of the server, cause or otherwise enable the server to perform its desired functions (e.g., such functions are implemented based on execution of instructions stored on the computer-readable storage medium by one or more processors of the server).

在实施方案中，系统900是利用经由通信链路(例如，传输控制协议(TCP)连接和/或传输层安全(TLS)或其他密码保护的通信会话)使用一个或多个计算机网络或直接连接互连的若干计算机系统和部件的分布式和/或虚拟计算系统。然而，本领域的普通技术人员将理解，这种系统可以在具有比图9中所示的更少或更多数量的部件的系统中操作。因此，图9中的系统900的描绘应该被视为本质上是说明性的并且不限制本公开的范围。In an embodiment, system 900 is a distributed and/or virtual computing system utilizing several computer systems and components interconnected using one or more computer networks or direct connections via communication links (e.g., Transmission Control Protocol (TCP) connections and/or Transport Layer Security (TLS) or other cryptographically protected communication sessions). However, one of ordinary skill in the art will appreciate that such a system may operate in a system having fewer or greater numbers of components than those shown in FIG. 9 . Thus, the depiction of system 900 in FIG. 9 should be viewed as illustrative in nature and not limiting the scope of the present disclosure.

各个实施方案可进一步在广泛范围的操作环境中实现，在一些情况下，所述环境可包括可用于操作多个应用程序中的任何应用程序的一个或多个用户计算机、计算装置或处理装置。在实施方案中，用户或客户端装置包括多个计算机中的任何计算机，诸如运行标准操作系统的台式计算机、膝上计算机或平板计算机，以及运行移动软件并能够支持多个联网和消息传递协议的(移动)手机、无线和手持式装置，并且此类系统还包括运行各种可商购获得的操作系统和其他已知应用程序中的任一个的多个工作站，用于诸如开发和数据库管理之类的目的。在实施方案中，这些装置还包括其他电子装置，诸如虚拟终端、瘦客户端、游戏系统和能够经由网络进行通信的其他装置；以及虚拟装置，诸如虚拟机、管理程序、利用操作系统级虚拟化的软件容器和能够经由网络进行通信的支持虚拟化的其他虚拟装置或非虚拟装置。Various embodiments may further be implemented in a wide range of operating environments, which in some cases may include one or more user computers, computing devices, or processing devices that can be used to operate any of a plurality of applications. In an embodiment, user or client devices include any of a plurality of computers, such as desktop computers, laptop computers, or tablet computers running a standard operating system, and (mobile) cell phones, wireless and handheld devices running mobile software and capable of supporting a plurality of networking and messaging protocols, and such systems also include a plurality of workstations running any of a variety of commercially available operating systems and other known applications for purposes such as development and database management. In an embodiment, these devices also include other electronic devices, such as virtual terminals, thin clients, gaming systems, and other devices capable of communicating via a network; and virtual devices, such as virtual machines, hypervisors, software containers utilizing operating system-level virtualization, and other virtual or non-virtual devices capable of communicating via a network that support virtualization.

在实施方案中，系统利用本领域技术人员熟知的至少一个网络来支持使用诸如以下的各种可商购获得的协议中的任何协议的通信：传输控制协议/互联网协议(“TCP/IP”)、用户数据报协议(“UDP”)、在开放系统互连(“OSI”)模型的各个层中操作的协议、文件传输协议(“FTP”)、通用即插即用(“UpnP”)、网络文件系统(“NFS”)、通用互联网文件系统(“CIFS”)以及其他协议。在实施方案中，网络是局域网、广域网、虚拟专用网、互联网、内联网、外联网、公共交换电话网、红外网络、无线网络、卫星网络及其任何组合。在实施方案中，面向连接的协议用于在网络端点之间进行通信，使得面向连接的协议(有时称为基于连接的协议)能够以有序流来传输数据。在实施方案中，面向连接的协议可能可靠，也可能不可靠。例如，TCP协议是一种可靠的面向连接的协议。异步传送模式(“ATM”)和帧中继是不可靠的面向连接的协议。面向连接的协议与面向数据包的协议形成对比，诸如UDP，UDP传输数据包时不保证顺序。In an embodiment, the system utilizes at least one network well known to those skilled in the art to support communications using any of a variety of commercially available protocols such as the Transmission Control Protocol/Internet Protocol ("TCP/IP"), the User Datagram Protocol ("UDP"), protocols operating in various layers of the Open Systems Interconnection ("OSI") model, the File Transfer Protocol ("FTP"), Universal Plug and Play ("UPnP"), the Network File System ("NFS"), the Common Internet File System ("CIFS"), and other protocols. In an embodiment, the network is a local area network, a wide area network, a virtual private network, the Internet, an intranet, an extranet, a public switched telephone network, an infrared network, a wireless network, a satellite network, and any combination thereof. In an embodiment, a connection-oriented protocol is used to communicate between network endpoints, so that a connection-oriented protocol (sometimes referred to as a connection-based protocol) can transmit data in an ordered stream. In an embodiment, a connection-oriented protocol may be reliable or unreliable. For example, the TCP protocol is a reliable connection-oriented protocol. Asynchronous Transfer Mode ("ATM") and frame relay are unreliable connection-oriented protocols. Connection-oriented protocols are in contrast to packet-oriented protocols, such as UDP, which do not guarantee the order in which packets are transmitted.

在实施方案中，该系统利用运行多种服务器或中间层应用程序中的一者或多者的网络服务器，包括超文本传输协议(“HTTP”)服务器、FTP服务器、公共网关接口(“CGI”)服务器、数据服务器、Java服务器、Apache服务器和商业应用程序服务器。在实施方案中，一个或多个服务器还能够响应于来自用户装置的请求而执行程序或脚本，如通过执行实现为以任何编程语言(如C、C#或C++)或任何脚本语言(如Ruby、PHP、Perl、Python或TCL)以及其组合写成的一个或多个脚本或程序的一个或多个网络应用程序。在实施方案中，一个或多个服务器还包括数据库服务器，包括但不限于从和可商购获得的那些，以及开源服务器，诸如MySQL、Postgres、SQLite、MongoDB以及能够存储、检索和访问结构化或非结构化数据的任何其他服务器。在实施方案中，数据库服务器包括基于表的服务器、基于文档的服务器、非结构化服务器、关系服务器、非关系服务器，或者这些和/或其他数据库服务器的组合。In an embodiment, the system utilizes a web server running one or more of a variety of server or middle-tier applications, including a Hypertext Transfer Protocol ("HTTP") server, an FTP server, a Common Gateway Interface ("CGI") server, a data server, a Java server, an Apache server, and a commercial application server. In an embodiment, one or more servers are also capable of executing programs or scripts in response to requests from user devices, such as by executing a program implemented in any programming language (such as C, C# or C++) or any scripting language (such as Ruby, PHP, Perl, Python or TCL) and their combinations. In an embodiment, the one or more servers also include a database server, including but not limited to and Commercially available ones, as well as open source servers such as MySQL, Postgres, SQLite, MongoDB, and any other server capable of storing, retrieving, and accessing structured or unstructured data. In an embodiment, the database server includes a table-based server, a document-based server, an unstructured server, a relational server, a non-relational server, or a combination of these and/or other database servers.

在实施方案中，系统包括以上讨论的各种数据存储区以及其他存储器和存储介质，其可驻留在各个位置，诸如在一个或多个计算机本地(和/或驻留在一个或多个计算机中)的存储介质上，或远离网络上的计算机中的任何或所有计算机。在实施方案中，信息驻留在本领域技术人员熟悉的存储区域网(“SAN”)中，并且类似地，用于执行属于计算机、服务器或其他网络装置的功能的任何必要的文件视情况本地和/或远程存储。在系统包括计算机化装置的实施方案中，每个这种装置可包括经由总线电耦合的硬件元件，这些元件包括例如至少一个中央处理单元(“CPU”或“处理器”)、至少一个输入装置(例如，鼠标、键盘、控制器、触摸屏或小键盘)、至少一个输出装置(例如显示装置、打印机或扬声器)、至少一个存储装置诸如磁盘驱动器、光学存储装置以及固态存储装置诸如随机存取存储器(“RAM”)或只读存储器(“ROM”)，以及可移动媒体装置、存储卡、闪存卡等，以及它们的各种组合。In an embodiment, the system includes the various data stores discussed above, as well as other memories and storage media, which may reside in various locations, such as on storage media local to (and/or resident in) one or more computers, or remote from any or all of the computers on the network. In an embodiment, the information resides in a storage area network ("SAN") familiar to those skilled in the art, and similarly, any necessary files for performing the functions attributed to the computer, server, or other network device are stored locally and/or remotely as appropriate. In an embodiment where the system includes computerized devices, each such device may include hardware elements electrically coupled via a bus, including, for example, at least one central processing unit ("CPU" or "processor"), at least one input device (e.g., a mouse, keyboard, controller, touch screen, or keypad), at least one output device (e.g., a display device, printer, or speaker), at least one storage device such as a disk drive, an optical storage device, and a solid-state storage device such as a random access memory ("RAM") or a read-only memory ("ROM"), as well as removable media devices, memory cards, flash memory cards, etc., and various combinations thereof.

在实施方案中，此类装置还包括计算机可读存储介质读取器、通信装置(例如，调制解调器、网卡(无线或有线)、红外通信装置等)、以及如上所述的工作存储器，其中计算机可读存储介质读取器连接或被配置来接收计算机可读存储介质，表示远程、本地、固定和/或可移动存储装置以及用于暂时和/或更永久地含有、存储、传输和检索计算机可读信息的存储介质。在实施方案中，系统和各种装置通常还包括位于至少一个工作存储器装置内的多个软件应用程序、模块、服务系统或其他元件，包括操作系统和应用程序，诸如客户端应用程序或网络浏览器。在实施方案中，使用定制的硬件，并且/或者在硬件、软件(包括诸如小程序的便携式软件)或两者中实现特定的元件。在实施方案中，采用到诸如网络输入/输出装置的其他计算装置的连接。In an embodiment, such devices also include a computer-readable storage medium reader, a communication device (e.g., a modem, a network card (wireless or wired), an infrared communication device, etc.), and a working memory as described above, wherein the computer-readable storage medium reader is connected or configured to receive a computer-readable storage medium, representing a remote, local, fixed and/or removable storage device and a storage medium for temporarily and/or more permanently containing, storing, transmitting and retrieving computer-readable information. In an embodiment, the system and various devices typically also include a plurality of software applications, modules, service systems or other elements located within at least one working memory device, including an operating system and applications, such as a client application or a web browser. In an embodiment, customized hardware is used, and/or specific elements are implemented in hardware, software (including portable software such as applets), or both. In an embodiment, a connection to other computing devices such as a network input/output device is employed.

在实施方案中，用于含有代码或代码部分的存储介质和计算机可读介质包括本领域已知或使用的任何适当介质，包括存储介质和通信介质，诸如但不限于以用于存储和/或传输信息(诸如计算机可读指令、数据结构、程序模块或其他数据)的任何方法或技术所实现的易失性和非易失性、可移动和不可移动的介质，包括RAM、ROM、电可擦可编程只读存储器(“EEPROM”)、闪存或其他存储器技术、只读光盘驱动器(“CD-ROM”)、数字通用光盘(DVD)或其他光学存储器、磁盒、磁带、磁盘存储装置或其他磁性存储装置，或可用于存储所需信息且可由系统装置访问的任何其他介质。基于本文所提供的公开内容和教义，本技术领域普通技术人员将了解实现各个实施方案的其他方式和/或方法。In an embodiment, storage media and computer-readable media for containing code or code portions include any suitable media known or used in the art, including storage media and communication media, such as but not limited to volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing and/or transmitting information (such as computer-readable instructions, data structures, program modules or other data), including RAM, ROM, electrically erasable programmable read-only memory ("EEPROM"), flash memory or other memory technology, read-only compact disk drive ("CD-ROM"), digital versatile disk (DVD) or other optical storage, magnetic cassettes, magnetic tapes, magnetic disk storage devices or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by the system device. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will understand other ways and/or methods to implement various embodiments.

因此，本说明书和附图被认为是说明性的而非限制性的。然而，将显而易见的是：在不脱离如在权利要求书中阐述的本发明的更宽广精神和范围的情况下，可以对其做出各种修改和改变。Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the claims.

其他变型也在本公开的精神内。因此，尽管所公开的技术可具有各种修改和替代构造，但在附图中示出并在上文中详细描述了其某些示出的实施方案。然而，应理解，并不旨在将本发明限于所公开的一种或多种具体形式，相反，旨在涵盖落在如所附权利要求限定的本发明的精神和范围内的所有修改、替代构造和等效物。Other variations are also within the spirit of the present disclosure. Therefore, although the disclosed technology may have various modifications and alternative configurations, certain illustrated embodiments thereof are shown in the drawings and described in detail above. However, it should be understood that it is not intended to limit the present invention to one or more specific forms disclosed, but rather, it is intended to cover all modifications, alternative configurations and equivalents that fall within the spirit and scope of the present invention as defined by the appended claims.

另外，可鉴于以下条款对本公开的实施方案进行描述：Additionally, embodiments of the present disclosure may be described in terms of the following:

1.一种计算机实现的方法，其包括：1. A computer-implemented method comprising:

接收在无服务器计算架构上托管机器学习模型的第一请求，所述第一请求包括端点的指示，所述端点包括用于访问所述机器学习模型的代码；receiving a first request to host a machine learning model on a serverless computing architecture, the first request comprising an indication of an endpoint, the endpoint comprising code for accessing the machine learning model;

将模型服务器与扩展相关联，所述模型服务器与所述端点相关联，所述扩展包括用于与所述模型服务器交接的代码；associating a model server with an extension, the model server being associated with the endpoint, the extension comprising code for interfacing with the model server;

接收使用所述机器学习模型生成推断的第二请求；receiving a second request to generate an inference using the machine learning model;

通过在所述无服务器计算架构上至少执行计算函数来处理所述第二请求，其中所述计算函数使用所述扩展以经由所述模型服务器来获得由所述机器学习模型生成的推断；以及processing the second request by executing at least a compute function on the serverless computing architecture, wherein the compute function uses the extension to obtain inferences generated by the machine learning model via the model server; and

响应于所述第二请求而提供所述推断。The inference is provided in response to the second request.

2.如条款1所述的计算机实现的方法，其中所述扩展中的代码在由所述计算函数调用时使得所述机器学习模型能够被所述模型服务器访问。2. A computer-implemented method as described in clause 1, wherein the code in the extension enables the machine learning model to be accessible to the model server when called by the compute function.

3.如条款1或2所述的计算机实现的方法，其还包括：3. The computer-implemented method of clause 1 or 2, further comprising:

生成包括所述模型服务器和所述扩展的文件；Generate a file including the model server and the extension;

存储所述文件；以及storing the file; and

存储所述文件与所述端点之间的关联，其中信息适于基于包括在所述第二请求中的信息来对所述文件进行定位。An association between the file and the endpoint is stored, wherein the information is suitable for locating the file based on information included in the second request.

4.如条款1至3中任一项所述的计算机实现的方法，其中所述扩展包括用于将由所述无服务器计算函数提供的数据转换为与由所述模型服务器实现的网络服务器兼容的格式的代码。4. A computer-implemented method as described in any of clauses 1 to 3, wherein the extension includes code for converting data provided by the serverless computing function into a format compatible with a network server implemented by the model server.

5.如条款1至4中任一项所述的计算机实现的方法，其中所述无服务器计算架构至少部分地基于使用所述机器学习模型生成推断的请求的量来添加或者移除计算能力。5. A computer-implemented method as described in any of clauses 1 to 4, wherein the serverless computing architecture adds or removes computing power based at least in part on the amount of requests to generate inferences using the machine learning model.

6.一种系统，其包括：6. A system comprising:

至少一个处理器；以及at least one processor; and

包括计算机可执行指令的存储器，所述计算机可执行指令响应于所述至少一个处理器的执行而使得所述系统：a memory including computer executable instructions that, in response to execution by the at least one processor, cause the system to:

通过至少将端点与包括用于与模型服务器交接的代码的扩展相关联而将无服务器计算架构配置成托管计算服务，所述模型服务器与所述端点相关联，其中所述模型服务器包括用于访问所述计算服务的代码；configuring the serverless computing architecture to host a computing service by associating at least an endpoint with an extension including code for interfacing with a model server, the model server being associated with the endpoint, wherein the model server includes code for accessing the computing service;

接收从所述计算服务获得结果的请求；以及receiving a request to obtain a result from the computing service; and

通过在所述无服务器计算架构上至少执行计算函数来对所述请求作出响应，其中所述计算函数调用由所述扩展实现的一个或多个函数，并且所述一个或多个函数经由所述模型服务器来获得由所述计算服务生成的结果。The request is responded to by executing at least a compute function on the serverless computing architecture, wherein the compute function calls one or more functions implemented by the extension, and the one or more functions obtain results generated by the compute service via the model server.

7.如条款6所述的系统，其中由所述扩展实现的所述一个或多个函数在由所述无服务器计算函数调用时使得所述计算服务准备好供所述模型服务器使用。7. The system of clause 6, wherein the one or more functions implemented by the extension, when invoked by the serverless compute function, prepare the compute service for use by the model server.

8.如条款6或7所述的系统，所述存储器还包括计算机可执行指令，所述计算机可执行指令响应于所述至少一个处理器的执行而使得所述系统：8. The system of clause 6 or 7, wherein the memory further comprises computer executable instructions that, in response to execution by the at least one processor, cause the system to:

响应于确定所述请求指向与所述模型服务器相关联的网络地址并且所述端点已被配置成利用所述无服务器计算架构，拦截所述请求。In response to determining that the request is directed to a network address associated with the model server and the endpoint has been configured to utilize the serverless computing architecture, the request is intercepted.

9.如条款6至8所述的系统，所述存储器还包括计算机可执行指令，所述计算机可执行指令响应于所述至少一个处理器的执行而使得所述系统：9. The system of clauses 6 to 8, wherein the memory further comprises computer executable instructions which, in response to execution by the at least one processor, cause the system to:

生成包括所述端点和所述扩展的文件；generating a file including the endpoint and the extension;

存储所述文件；storing the file;

存储所述文件与所述端点之间的关联；并且storing an association between the file and the endpoint; and

使用信息以基于包括在所述请求中的信息来对所述文件进行定位以生成推断。Information is used to locate the file based on information included in the request to generate an inference.

10.如条款6至9中任一条款所述的系统，其中由所述扩展实现的所述一个或多个函数将由所述无服务器计算函数提供的数据转换为所述模型服务器能够使用的格式。10. The system of any one of clauses 6 to 9, wherein the one or more functions implemented by the extension convert data provided by the serverless computing function into a format usable by the model server.

11.如条款6至10中任一项所述的系统，其中所述模型服务器包括用于使得能够在被保留用于提供对所述计算服务的访问的服务器的实例上托管所述模型服务器的代码。11. The system of any of clauses 6 to 10, wherein the model server comprises code for enabling hosting of the model server on an instance of a server reserved for providing access to the computing service.

12.如条款6至11中任一项所述的系统，其中所述服务器实现HTTP服务器，并且其中所述扩展与所述端点交接以激活所述HTTP服务器。12. The system of any of clauses 6 to 11, wherein the server implements an HTTP server, and wherein the extension interfaces with the endpoint to activate the HTTP server.

13.如条款6至12中任一项所述的系统，其中所述无服务器计算架构根据需求来动态地分配计算能力，以处理使用所述计算服务来获得推断的请求。13. The system of any of clauses 6 to 12, wherein the serverless computing architecture dynamically allocates computing capacity based on demand to process requests to obtain inferences using the computing service.

14.一种非暂态计算机可读存储介质，所述非暂态计算机可读存储介质上存储有可执行指令，所述可执行指令作为由计算机系统的一个或多个处理器执行的结果而使得计算机系统至少：14. A non-transitory computer-readable storage medium having executable instructions stored thereon, the executable instructions as a result of execution by one or more processors of a computer system causing the computer system to at least:

通过至少将端点与包括用于与所述端点交接的代码的扩展相关联而将无服务器计算架构配置成托管机器学习模型，其中与所述端点相关联的模型服务器包括用于与所述机器学习模型交接的代码；configuring a serverless computing architecture to host a machine learning model by associating at least an endpoint with an extension comprising code for interfacing with the endpoint, wherein a model server associated with the endpoint comprises code for interfacing with the machine learning model;

接收使用所述机器学习模型生成推断的请求；并且receiving a request to generate an inference using the machine learning model; and

通过在所述无服务器计算架构上至少执行无服务器计算函数来处理所述请求，其中所述无服务器计算函数使用所述扩展以经由所述模型服务器来获得由所述机器学习模型生成的推断。The request is processed by executing at least a serverless computing function on the serverless computing architecture, wherein the serverless computing function uses the extension to obtain inferences generated by the machine learning model via the model server.

15.如条款14所述的非暂态计算机可读存储介质，其中扩展使得所述机器学习模型能够被所述模型服务器访问。15. The non-transitory computer-readable storage medium of clause 14, wherein the extension enables the machine learning model to be accessible to the model server.

16.如条款14或15所述的非暂态计算机可读存储介质，其中所述指令还包括作为由所述一个或多个处理器执行的结果而使得所述计算机系统进行以下操作的指令：16. The non-transitory computer-readable storage medium of clause 14 or 15, wherein the instructions further comprise instructions that, as a result of execution by the one or more processors, cause the computer system to:

响应于确定所述请求指向与所述模型服务器相关联的网络地址并且所述端点已被配置成利用所述无服务器计算架构，拦截所述请求以生成推断。In response to determining that the request is directed to a network address associated with the model server and the endpoint has been configured to utilize the serverless computing architecture, the request is intercepted to generate inferences.

17.如条款14至16中任一项所述的非暂态计算机可读存储介质，其中所述指令还包括作为由所述一个或多个处理器执行的结果而使得所述计算机系统进行以下操作的指令：17. The non-transitory computer-readable storage medium of any one of clauses 14 to 16, wherein the instructions further comprise instructions that, as a result of execution by the one or more processors, cause the computer system to:

存储所述文件；以及storing the file; and

响应于接收到生成推断的所述请求而从存储装置中检索所述文件。The file is retrieved from storage in response to receiving the request to generate an inference.

18.如条款14至17中任一项所述的非暂态计算机可读存储介质，其中所述扩展将由所述无服务器计算函数提供的数据转换为与由所述模型服务器实现的网络服务器兼容的格式。18. The non-transitory computer-readable storage medium of any of clauses 14 to 17, wherein the extension converts data provided by the serverless compute function into a format compatible with a network server implemented by the model server.

19.如条款14至18中任一项所述的非暂态计算机可读存储介质，其中所述端点包括用于使用所述机器学习模型来获得推断的代码，并且所述扩展包括用于使用所述模型服务器来获得所述推断的代码。19. A non-transitory computer-readable storage medium as described in any of clauses 14 to 18, wherein the endpoint includes code for obtaining inferences using the machine learning model, and the extension includes code for obtaining the inferences using the model server.

20.如条款14至19中任一项所述的非暂态计算机可读存储介质，其中所述机器学习模型由所述模型服务器所访问的计算服务来托管。20. A non-transitory computer-readable storage medium as described in any of clauses 14 to 19, wherein the machine learning model is hosted by a computing service accessed by the model server.

21.一种系统，其包括：21. A system comprising:

至少一个处理器；at least one processor;

获得包括用于与机器学习模型交接的代码的模型服务器；obtaining a model server including code for interfacing with the machine learning model;

将所述模型服务器与包括用于与所述模型服务器交接的代码的扩展相关联；associating the model server with an extension including code for interfacing with the model server;

接收使用所述机器学习模型来获得推断的多个请求；receiving a plurality of requests to obtain inferences using the machine learning model;

通过在被配置成使用所述模型服务器生成推断的服务器上使用所述模型服务器的第一实例来对所述多个请求的第一部分作出响应；并且responding to a first portion of the plurality of requests using a first instance of the model server on a server configured to generate inferences using the model server; and

通过使用无服务器计算架构调用计算函数来对所述请求的第二部分作出响应，其中所述计算函数使用所述扩展和所述模型服务器的至少第二实例来生成推断，其中所述请求的所述第二部分的大小至少部分地基于所述服务器对请求的所述第一部分作出响应的能力。A second portion of the request is responded to by invoking a compute function using a serverless computing architecture, wherein the compute function generates inferences using the extension and at least a second instance of the model server, wherein a size of the second portion of the request is based at least in part on a capability of the server to respond to the first portion of the request.

22.如条款21所述的系统，其中所述无服务器计算架构根据所述第二部分的所述大小来分配用于处理所述请求的所述第二部分的能力。22. The system of clause 21, wherein the serverless computing architecture allocates capacity for processing the second portion of the request based on the size of the second portion.

23.如条款21或22所述的系统，所述存储器还包括计算机可执行指令，所述计算机可执行指令响应于所述至少一个处理器的执行而使得所述系统：23. The system of clause 21 or 22, wherein the memory further comprises computer executable instructions that, in response to execution by the at least one processor, cause the system to:

生成将一个或多个附加服务器配置成使用所述模型服务器生成推断的建议。A recommendation is generated to configure one or more additional servers to generate inferences using the model server.

24.如条款21至23中任一项所述的系统，其中所述模型服务器包括由所述扩展激活的HTTP服务器。24. The system of any of clauses 21 to 23, wherein the model server comprises an HTTP server activated by the extension.

25.如条款21至24中任一项所述的系统，其中调整所述第一部分和所述第二部分的相应大小以将所述服务器的利用率最大化直到阈值量，并且其中将导致所述利用率超过所述阈值量的请求被指派给所述第二部分。25. A system as described in any of clauses 21 to 24, wherein the respective sizes of the first part and the second part are adjusted to maximize utilization of the server up to a threshold amount, and wherein requests that will cause the utilization to exceed the threshold amount are assigned to the second part.

26.一种方法，其包括：26. A method comprising:

获得包括用于与计算服务交接的代码的模型服务器；obtaining a model server including code for interfacing with a computing service;

将所述模型服务器与和所述模型服务器交接的扩展相关联；Associating the model server with an extension that interfaces with the model server;

接收通过使用所述计算服务来获得结果的多个请求；receiving a plurality of requests to obtain results by using the computing service;

通过使用安装在服务器上的所述模型服务器的第一实例来对从所述计算服务获得结果的所述多个请求的第一部分作出响应；以及responding to a first portion of the plurality of requests to obtain results from the computing service by using a first instance of the model server installed on a server; and

通过使用无服务器计算架构调用计算函数来对所述请求的第二部分作出响应，其中所述计算函数使用所述扩展和所述模型服务器的至少第二实例来从所述计算服务获得结果。A second portion of the request is responded to by invoking a compute function using a serverless computing architecture, wherein the compute function uses the extension and at least a second instance of the model server to obtain a result from the compute service.

27.如条款26所述的方法，其中至少部分地基于所述服务器的能力而在所述第一部分与所述第二部分之间划分请求。27. The method of clause 26, wherein the request is divided between the first portion and the second portion based at least in part on capabilities of the server.

28.如条款26或27所述的方法，其中所述无服务器计算架构根据所述请求的所述第二部分的大小来分配用于处理所述第二部分的能力。28. The method of clause 26 or 27, wherein the serverless computing architecture allocates capacity for processing the second portion of the request based on a size of the second portion of the request.

29.如条款26至28中任一项所述的方法，其还包括：29. The method of any one of clauses 26 to 28, further comprising:

生成添加具有所安装的端点的附加服务器的建议，所述建议至少部分地基于利用所述无服务器计算架构以从所述计算服务获得结果。A recommendation to add an additional server having the installed endpoint is generated, the recommendation based at least in part on utilizing the serverless computing architecture to obtain results from the computing service.

30.如条款26至29中任一项所述的方法，其还包括：30. The method of any one of clauses 26 to 29, further comprising:

调整所述第一部分的大小以将所述服务器的利用率最大化而不超过阈值利用量。The first portion is sized to maximize utilization of the server without exceeding a threshold utilization amount.

31.如条款26至30中任一项所述的方法，其还包括：31. The method of any one of clauses 26 to 30, further comprising:

接收启用混合配置以用于使用所述机器学习模型来获得推断的请求；以及receiving a request to enable a hybrid configuration for obtaining inferences using the machine learning model; and

响应于所述请求，生成所述扩展并且将所述扩展与所述模型服务器相关联。In response to the request, the extension is generated and associated with the model server.

32.如条款26至31中任一项所述的方法，其还包括：32. The method of any one of clauses 26 to 31, further comprising:

拦截从所述计算服务获得结果的第一请求和第二请求，所述拦截至少部分地基于确定混合配置已被启用；intercepting a first request and a second request to obtain a result from the computing service, the intercepting being based at least in part on determining that a hybrid configuration has been enabled;

使用所述服务器来处理所述第一请求；以及processing the first request using the server; and

使用所述无服务器计算架构来处理所述第二请求。The second request is processed using the serverless computing architecture.

33.如条款26至32中任一项所述的方法，其中包括用于获得推断的所述请求的工作负荷可在所述服务器的实例与所述无服务器计算架构之间转移。33. The method of any of clauses 26 to 32, wherein the workload comprising the request for obtaining the inference is transferable between instances of the server and the serverless computing architecture.

34.一种非暂态计算机可读存储介质，所述非暂态计算机可读存储介质上存储有可执行指令，所述可执行指令作为由计算机系统的一个或多个处理器执行的结果而使得计算机系统至少：34. A non-transitory computer-readable storage medium having executable instructions stored thereon, the executable instructions as a result of execution by one or more processors of a computer system causing the computer system to at least:

将所述端点与和所述模型服务器交接的扩展相关联；Associating the endpoint with an extension that interfaces with the model server;

通过使用安装在服务器上的所述模型服务器的第一实例来对获得推断的所述多个请求的第一部分作出响应；以及responding to a first portion of the plurality of requests to obtain inferences by using a first instance of the model server installed on a server; and

通过使用无服务器计算架构调用计算函数来对所述请求的第二部分作出响应，其中所述计算函数使用所述扩展和所述模型服务器的至少第二实例来生成推断。A second portion of the request is responded to by invoking a compute function using a serverless computing architecture, wherein the compute function generates inferences using the extension and at least a second instance of the model server.

35.如条款34所述的非暂态计算机可读存储介质，其中所述指令还包括作为由所述一个或多个处理器执行的结果而使得所述计算机系统进行以下操作的指令：35. The non-transitory computer-readable storage medium of clause 34, wherein the instructions further comprise instructions that, as a result of execution by the one or more processors, cause the computer system to:

将路由器配置成至少部分地基于所述服务器的能力利用率来确定所述第一部分和所述第二部分。The router is configured to determine the first portion and the second portion based at least in part on capacity utilization of the server.

36.如条款34或35所述的非暂态计算机可读存储介质，其中所述无服务器计算架构根据所述请求的所述第二部分的大小来分配用于处理所述第二部分的能力。36. The non-transitory computer-readable storage medium of clause 34 or 35, wherein the serverless computing architecture allocates capacity for processing the second portion of the request based on a size of the second portion.

37.如条款34至36中任一项所述的非暂态计算机可读存储介质，其中所述指令还包括作为由所述一个或多个处理器执行的结果而使得所述计算机系统进行以下操作的指令：37. The non-transitory computer-readable storage medium of any of clauses 34 to 36, wherein the instructions further comprise instructions that, as a result of execution by the one or more processors, cause the computer system to:

生成添加具有所安装的模型服务器的附加服务器的建议，所述建议至少部分地基于利用所述无服务器计算架构来生成推断。A recommendation is generated to add an additional server having an installed model server, the recommendation based at least in part on generating inferences utilizing the serverless computing architecture.

38.如条款34至37中任一项所述的非暂态计算机可读存储介质，其中所述指令还包括作为由所述一个或多个处理器执行的结果而使得所述计算机系统进行以下操作的指令：38. The non-transitory computer-readable storage medium of any of clauses 34 to 37, wherein the instructions further comprise instructions that, as a result of execution by the one or more processors, cause the computer system to:

确定处理请求的较大部分将导致所述服务器超过阈值利用量；并且determining that processing a greater portion of the requests will cause the server to exceed a threshold utilization; and

响应于所述确定而调整所述第一部分和所述第二部分的相对大小。Relative sizes of the first portion and the second portion are adjusted in response to the determination.

39.如条款34至38中任一项所述的非暂态计算机可读存储介质，其中所述指令还包括作为由所述一个或多个处理器执行的结果而使得所述计算机系统进行以下操作的指令：39. The non-transitory computer-readable storage medium of any of clauses 34 to 38, wherein the instructions further comprise instructions that, as a result of execution by the one or more processors, cause the computer system to:

接收启用混合配置以用于使用所述机器学习模型来获得推断的请求；并且receiving a request to enable a hybrid configuration for obtaining inferences using the machine learning model; and

生成所述扩展；并且generating the extension; and

存储所述扩展与所述模型服务器之间的关联。An association between the extension and the model server is stored.

40.如条款34至39中任一项所述的非暂态计算机可读存储介质，其中所述指令还包括作为由所述一个或多个处理器执行的结果而使得所述计算机系统进行以下操作的指令：40. The non-transitory computer-readable storage medium of any of clauses 34 to 39, wherein the instructions further comprise instructions that, as a result of execution by the one or more processors, cause the computer system to:

将路由器配置成拦截请求以获得推断，其中所述拦截将至少部分地基于确定混合配置已被启用。The router is configured to intercept the request to obtain the inference, wherein the interception will be based at least in part on determining that the hybrid configuration has been enabled.

在描述所公开实施方案的上下文中(尤其是在以下权利要求的上下文中)，术语“一个”和“一种”以及“所述”以及类似指称对象的使用应解释为涵盖单数和复数两者，除非在本文另外地指示或明显地与上下文矛盾。类似地，除非明确地或者与上下文相矛盾，否则术语“或”的使用应被解释为意指“和/或”。术语“包括”、“具有”、“包含”和“含有”应被解释为开放式术语(即，意味着“包括但不限于”)，除非另有说明。术语“连接的”在未进行修改并且指代物理连接的情况下应解释为部分地或全部地纳入在以下解释内：附接至或结合在一起，即使存在介入物。除非本文另外指明，否则本文中值范围的列举仅仅意图用作个别地表示属于所述范围的各单独值的速记方法，并且犹如本文个别描述地那样将各单独值并入到本说明书中。除非另有说明或与上下文矛盾，否则对术语“集合”(例如，“项目集合”)或“子集”的使用，应被解释为包括一个或多个元素的非空集合。此外，除非另有说明或与上下文矛盾，否则相应集合的术语“子集”不一定表示相应集合的适当子集，但是该子集和相应集合可能相等。除非另有明确说明或从上下文中清楚，否则对短语“基于”的使用意指“至少部分基于”，并且不限于“仅基于”。In the context of describing the disclosed embodiments (especially in the context of the following claims), the use of the terms "one" and "a kind of" and "the" and similar referents should be interpreted as covering both the singular and the plural, unless otherwise indicated herein or clearly contradicted by the context. Similarly, the use of the term "or" should be interpreted as meaning "and/or" unless explicitly or contradicted by the context. The terms "include", "have", "include" and "contain" should be interpreted as open terms (i.e., meaning "including but not limited to") unless otherwise specified. The term "connected" should be interpreted as being partially or fully included in the following explanation when it is not modified and refers to a physical connection: attached to or combined together, even if there is an intervening object. Unless otherwise indicated herein, the enumeration of the value range herein is intended to be used only as a shorthand method to individually represent each individual value belonging to the range, and each individual value is incorporated into this specification as if it were individually described herein. Unless otherwise specified or contradicted by context, use of the term "set" (e.g., "a set of items") or "subset" should be interpreted as a non-empty set that includes one or more elements. Furthermore, unless otherwise specified or contradicted by context, the term "subset" of a corresponding set does not necessarily mean a proper subset of the corresponding set, although the subset and the corresponding set may be equal. Unless explicitly stated otherwise or clear from context, use of the phrase "based on" means "based at least in part on," and is not limited to "based only on."

除非另外特别陈述或者另外与上下文明显矛盾，否则诸如“A、B和C中的至少一者(at least one of A,B,and C)”或者“A、B和C中的至少一者(at least one of A,B andC)”形式的短语(即，具有或不具有牛津逗号(Oxford comma)的相同短语)的连接短语在上下文内另外被理解为一般用于表示项目、术语等可以是A或B或C、A和B和C的集合的任何非空子集或者不与上下文矛盾或以其他方式被排除的包含至少一个A、至少一个B或至少一个C的任何集合。例如，在具有三个成员的集合的说明性示例中，连接短语“A、B和C中的至少一者(at least one of A,B,and C)”以及“A、B和C中的至少一者(at least one of A,B andC)”是指以下集合中的任何集合：{A}、{B}、{C}、{A,B}、{A,C}、{B,C}、{A,B,C}，并且如果没有明确地或者与上下文相矛盾，则该连接短语是指具有{A}、{B}和/或{C}作为子集的任何集合(例如具有多个“A”的集合)。因此，这种合取性语言通常不意图暗示某些实施方案要求各自存在A中的至少一者、B中的至少一者和C中的至少一者。类似地，除非有明确说明或从上下文中可以清楚地看出不同的含义，否则诸如"A、B、或C中的至少一者"和"A、B或C中的至少一者"的短语与"A、B、和C中的至少一者"和"A、B和C中的至少一者"一样都是指以下集合中的任何集合：{A}、{B}、{C}、{A,B}、{A,C}、{B,C}、{A,B,C}。另外，除非另有说明或与上下文矛盾，否则术语“多个”指示为复数的状态(例如，“多个项”指示多项)。多个项目的数量为至少两个，但当明确地或者通过上下文指出时，该数量可以为更多。Unless specifically stated otherwise or otherwise clearly contradicted by context, conjunction phrases such as “at least one of A, B, and C” or “at least one of A, B and C” (i.e., the same phrase with or without the Oxford comma) are additionally understood within the context to be generally used to indicate that an item, term, etc. can be any non-empty subset of the set of A or B or C, A and B and C, or any set that includes at least one A, at least one B, or at least one C that is not contradicted by context or otherwise excluded. For example, in the illustrative example of a set with three members, the conjunction phrases "at least one of A, B, and C" and "at least one of A, B and C" refer to any of the following sets: {A}, {B}, {C}, {A, B}, {A, C}, {B, C}, {A, B, C}, and, if not explicitly or contradicted by context, any set having {A}, {B}, and/or {C} as a subset (e.g., a set with multiple "A"s). Thus, such conjunction language is generally not intended to imply that certain embodiments require the presence of at least one of A, at least one of B, and at least one of C, respectively. Similarly, phrases such as "at least one of A, B, or C" and "at least one of A, B, or C" refer to any of the following sets: {A}, {B}, {C}, {A,B}, {A,C}, {B,C}, {A,B,C}, unless expressly specified or clearly indicated by context. In addition, the term "plurality" indicates plural state (e.g., "plurality items" indicates multiple items) unless otherwise specified or contradicted by context. The number of plural items is at least two, but the number can be more when expressly or by context indicates otherwise.

除非本文中另外指出或与上下文明显矛盾，否则本文中所描述的过程的操作都可以按任何合适的顺序进行。在实施方案中，诸如本文中描述的那些过程(或其变型和/或组合)的过程在被配置有可执行指令的一个或多个计算机系统的控制下执行，并且通过硬件或其组合而实现为在一个或多个处理器上共同执行的代码(例如可执行指令、一个或多个计算机程序或者一个或多个应用程序)。在实施方案中，代码例如以计算机程序的形式存储在计算机可读存储介质上，所述计算机程序包括可由一个或多个处理器执行的多个指令。在实施方案中，计算机可读存储介质是排除暂态信号(例如，传播的瞬变电或电磁传输)但包括在暂态信号的收发器内的非暂态数据存储电路(例如，缓冲器、高速缓存和队列)的非暂态计算机可读存储介质。在实施方案中，代码(例如，可执行代码或源代码)被存储在其上存储有可执行指令的一组一个或多个非暂态计算机可读存储介质上，所述可执行指令在由计算机系统的一个或多个处理器执行时(即，由于被执行)使计算机系统执行本文所述的操作。在实施方案中，该组非暂态计算机可读存储介质包括多个非暂态计算机可读存储介质，并且多个非暂态计算机可读存储介质中的一个或多个单独的非暂态存储介质不具有所有代码，而所述多个非暂态计算机可读存储介质共同存储所有代码。在实施方案中，执行可执行指令，使得不同指令由不同处理器执行——例如，在实施方案中，非暂态计算机可读存储介质存储指令，并且主CPU执行指令中的一些指令，而图形处理器单元执行其他指令。在另一实施方案中，计算机系统的不同部件具有单独的处理器并且不同处理器执行指令的不同子集。Unless otherwise noted herein or clearly contradictory to the context, the operations of the processes described herein can be performed in any suitable order. In an embodiment, processes such as those described herein (or variations and/or combinations thereof) are performed under the control of one or more computer systems configured with executable instructions, and are implemented as codes (e.g., executable instructions, one or more computer programs, or one or more applications) executed together on one or more processors by hardware or a combination thereof. In an embodiment, the code is stored on a computer-readable storage medium, for example, in the form of a computer program, which includes multiple instructions that can be executed by one or more processors. In an embodiment, a computer-readable storage medium is a non-transient computer-readable storage medium that excludes transient signals (e.g., propagated transient electricity or electromagnetic transmission) but includes non-transient data storage circuits (e.g., buffers, caches, and queues) in a transceiver of a transient signal. In an embodiment, code (e.g., executable code or source code) is stored on a set of one or more non-transient computer-readable storage media on which executable instructions are stored, and the executable instructions are executed by one or more processors of a computer system (i.e., due to being executed) to cause the computer system to perform the operations described herein. In an embodiment, the set of non-transitory computer-readable storage media includes a plurality of non-transitory computer-readable storage media, and one or more individual non-transitory storage media of the plurality of non-transitory computer-readable storage media do not have all of the code, while the plurality of non-transitory computer-readable storage media collectively store all of the code. In an embodiment, the executable instructions are executed so that different instructions are executed by different processors—for example, in an embodiment, the non-transitory computer-readable storage medium stores instructions, and the main CPU executes some of the instructions, while the graphics processor unit executes other instructions. In another embodiment, different components of the computer system have separate processors and different processors execute different subsets of the instructions.

因此，在实施方案中，计算机系统被配置成实现单独地或共同地执行本文所描述的过程的操作的一个或多个服务，并且这样的计算机系统配置有使得能够执行操作的适用的硬件和/或软件。此外，在本公开的实施方案中，计算机系统是单个装置，并且在另一实施方案中，计算机系统是包括多个装置的分布式计算机系统，该多个装置以不同方式操作，使得分布式计算机系统执行本文中描述的操作并且使得单个装置不执行所有操作。Thus, in an embodiment, the computer system is configured to implement one or more services that individually or collectively perform the operations of the processes described herein, and such a computer system is configured with applicable hardware and/or software that enables the operations to be performed. Furthermore, in an embodiment of the present disclosure, the computer system is a single device, and in another embodiment, the computer system is a distributed computer system comprising multiple devices that operate in different ways such that the distributed computer system performs the operations described herein and such that a single device does not perform all the operations.

除非另外要求保护，否则本文中提供的任何和所有示例或示例性语言(例如“诸如”)的使用仅旨在更好地说明本发明的实施方案，而不是对本发明的范围进行限制。本说明书中的语言不应被解释为将任何非要求保护的要素指示为实践本发明所必需的。Unless otherwise claimed, the use of any and all examples or exemplary language (e.g., "such as") provided herein is intended only to better illustrate embodiments of the present invention, rather than to limit the scope of the present invention. Language in this specification should not be construed as indicating any non-claimed element as essential to the practice of the present invention.

本文描述了本公开的实施方案，包括发明人已知用于执行本发明的最佳模式。阅读上述描述后，那些实施方案的变体对于本领域普通技术人员可以变得显而易见。发明人预期熟练技术人员在适当时采用这类变化，并且发明人旨在以不同于如本文中具体描述的方式实践本公开的实施方案。因此，如适用法律所允许，本公开的范围包括此处所附权利要求中叙述的主题的所有修改和等效物。此外，除非本文中另有指示或另外与上下文明显矛盾，否则本公开的范围涵盖其所有可能变型中的上述要素的任何组合。Embodiments of the present disclosure are described herein, including the best mode known to the inventor for carrying out the present invention. After reading the above description, variations of those embodiments may become apparent to those of ordinary skill in the art. The inventor expects that skilled persons will adopt such variations when appropriate, and the inventor intends to practice the embodiments of the present disclosure in a manner different from that specifically described herein. Therefore, as permitted by applicable law, the scope of the present disclosure includes all modifications and equivalents of the subject matter recited in the claims appended hereto. In addition, unless otherwise indicated herein or otherwise clearly contradictory to the context, the scope of the present disclosure covers any combination of the above-mentioned elements in all possible variations thereof.

本文所引用的所有参考文献(包括出版物、专利申请和专利)据此以引用方式并入，其程度等同于每个参考文献单独地且具体地被表示为以引用方式并入本文并且以其全文在本文得以陈述。All references, including publications, patent applications, and patents, cited herein are hereby incorporated by reference to the same extent as if each reference were individually and specifically indicated to be incorporated by reference and were set forth in its entirety herein.

Claims

1. A computer-implemented method, comprising:

receiving a first request to host a machine learning model on a serverless computing architecture, the first request comprising an indication of an endpoint, the endpoint comprising code for accessing the machine learning model;

Associating a model server with an extension, the model server associated with the endpoint, the extension comprising code for interfacing with the model server;

Receiving a second request to generate an inference using the machine learning model;

Processing the second request by executing at least a computing function on the serverless computing architecture, wherein the computing function uses the expansion to obtain inferences generated by the machine learning model via the model server; and

The inference is provided in response to the second request.

2. The computer-implemented method of claim 1, wherein the code in the extension, when invoked by the computing function, enables the machine learning model to be accessed by the model server.

3. The computer-implemented method of claim 1, further comprising:

Generating a file comprising the model server and the extension;

storing the file; and

An association between the file and the endpoint is stored, wherein information is adapted to locate the file based on information included in the second request.

4. The computer-implemented method of claim 1, wherein the extension includes code for converting data provided by the serverless computing function into a format compatible with a web server implemented by the model server.

5. The computer-implemented method of claim 1, wherein the server-less computing architecture adds or removes computing power based at least in part on an amount of requests to generate inferences using the machine learning model.

6. A system, comprising:

at least one processor; and

A memory comprising computer-executable instructions that, in response to execution by the at least one processor, cause the system to:

Configuring a serverless computing architecture to host a computing service by associating at least an endpoint with an extension that includes code for interfacing with a model server associated with the endpoint, wherein the model server includes code for accessing the computing service;

Receiving a request to obtain a result from the computing service; and

Responding to the request by executing at least a computing function on the serverless computing architecture, wherein the computing function invokes one or more functions implemented by the extension and the one or more functions obtain results generated by the computing service via the model server.

7. The system of claim 6, wherein the one or more functions implemented by the extension, when invoked by the serverless computing function, prepares the computing service for use by the model server.

8. The system of claim 6, the memory further comprising computer-executable instructions that, in response to execution by the at least one processor, cause the system to:

In response to determining that the request is directed to a network address associated with the model server and that the endpoint has been configured to utilize the serverless computing architecture, intercept the request.

9. The system of claim 6, wherein the model server comprises code for enabling hosting of the model server on an instance of a server reserved for providing access to the computing service.

10. The system of claim 6, wherein the serverless computing architecture dynamically allocates computing power according to demand to handle requests to obtain inferences using the computing services.

11. A method, comprising:

Obtaining a model server comprising code for interfacing with a computing service;

associating the model server with an extension interfacing with the model server;

Receiving a plurality of requests to obtain results by using the computing service;

Responding to a first portion of the plurality of requests to obtain results from the computing service by using a first instance of the model server installed on a server; and

Responding to the second portion of the request by invoking a computing function using a serverless computing architecture, wherein the computing function uses the extension and at least a second instance of the model server to obtain a result from the computing service.

12. The method of claim 11, wherein the serverless computing architecture allocates capacity for processing the second portion of the request according to a size of the second portion.

13. The method of claim 11, further comprising:

Receiving a request to enable a hybrid configuration for obtaining an inference using the machine learning model; and

In response to the request, the extension is generated and associated with the model server.

14. The method of claim 11, further comprising:

intercepting a first request and a second request to obtain results from the computing service, the intercepting based at least in part on determining that a hybrid configuration has been enabled;

Processing the first request using the server; and

The second request is processed using the serverless computing architecture.

15. The method of claim 11, wherein a workload including the request to obtain inferences can be transferred between an instance of the server and the serverless computing architecture.