
CN115130674B - A method and device for accelerating model reasoning - Google Patents

A method and device for accelerating model reasoning

Info

Publication number
CN115130674B
CN115130674B
Authority
CN
China
Prior art keywords
model
service
target model
model service
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210741859.2A
Other languages
Chinese (zh)
Other versions
CN115130674A (en)
Inventor
刘帅朝
黄乐乐
张德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong Technology Holding Co Ltd
Original Assignee
Jingdong Technology Holding Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingdong Technology Holding Co Ltd filed Critical Jingdong Technology Holding Co Ltd
Priority to CN202210741859.2A priority Critical patent/CN115130674B/en
Publication of CN115130674A publication Critical patent/CN115130674A/en
Application granted granted Critical
Publication of CN115130674B publication Critical patent/CN115130674B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G06N5/043 Distributed expert systems; Blackboards
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Pure & Applied Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Algebra (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Operations Research (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a method and a device for model reasoning acceleration. The method comprises: obtaining request information and analyzing it to determine the target model services with a dependency relationship corresponding to the request information; determining model parameter values; and processing the model parameter values sequentially through at least one target model service with a dependency relationship and the message queues arranged between them, to obtain a finally output model calculation result. The message queue between every two target model services with a dependency relationship receives the task data generated by the previous target model service according to that service's processing capacity value, and sends the task data to the subsequent target model service according to the processing capacity value of the subsequent target model service. The method and the device solve the problem of high delay of the target model service and improve the performance of the whole integrated model service system.

Description

Model reasoning acceleration method and device
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and apparatus for model reasoning acceleration.
Background
In high-concurrency risk-control scenes, different business scenes require several models to be integrated to provide a service that meets the business requirements. Prior-art schemes focus on optimizing the model itself, such as computation optimization or model compression during single-model inference calculation, and do not deeply optimize the links through which the model is processed, so the overall delay of the system cannot be reduced and grows higher and higher in high-concurrency scenes.
Disclosure of Invention
The present disclosure provides a method and a device for model reasoning acceleration that solve the problem of high delay of the target model service and improve the performance of the whole integrated model service system.
In a first aspect, the present disclosure provides a method for model inference acceleration for an integrated model service system, the method comprising:
Acquiring request information, analyzing the request information, and determining a target model service with a dependency relationship corresponding to the request information, wherein the integrated model service system comprises at least one model service;
Determining model parameter values, and processing the model parameter values sequentially through at least one target model service with a dependency relationship and a message queue arranged between the target model services to obtain a finally output model calculation result;
The message queues between every two target model services with the dependency relationship are used for receiving task data generated by a previous target model service according to the processing capacity value of the previous target model service, and sending the task data to a next target model service according to the processing capacity value of the next target model service.
According to the method for accelerating model reasoning provided by the disclosure, for every two target model services with a dependency relationship, the processing of the model parameter values sequentially through at least one target model service with a dependency relationship and a message queue arranged between the target model services comprises the following steps:
according to the processing capacity value of the previous target model service, storing the task data processed by the previous target model service into the message queue in batches until all task data corresponding to the previous target model service have been stored;
and according to the processing capacity value of the subsequent target model service, fetching task data of the corresponding data quantity each time for processing, until all task data in the message queue have been fetched, and outputting the task data processed by the subsequent target model service.
According to the method for accelerating model reasoning provided by the disclosure, the integrated model service system comprises a unified configuration center component;
Before the request information is acquired and analyzed, the method further includes:
performing micro-service processing on the model to obtain a model service, and registering the model service with the unified configuration center component.
According to the method for accelerating model reasoning provided by the present disclosure, after the task data processed by the subsequent target model service is output, the method further includes:
Periodically acquiring system information of the target model service, inputting the system information into a trained processing capacity value regression prediction model, and outputting a new first processing capacity value corresponding to the target model service, wherein the trained processing capacity value regression prediction model is obtained by training on sample system information;
And synchronizing the new first processing capacity value to the unified configuration center component, and updating the processing capacity value of the target model service into the new first processing capacity value through the unified configuration center component.
According to the method for model reasoning acceleration provided by the present disclosure, in the case that the trained processing capacity value regression prediction model does not output a new first processing capacity value corresponding to a target model service, after outputting the task data processed by the target model service, the method further includes:
determining a new second processing capacity value corresponding to the target model service through a dynamic programming algorithm;
And synchronizing the new second processing capacity value to the unified configuration center component, and updating the processing capacity value of the target model service into the new second processing capacity value through the unified configuration center component.
According to the method for accelerating model reasoning provided by the present disclosure, the integrated model service system comprises a gateway component;
The obtaining request information and analyzing the request information, and determining the target model service with the dependency relationship corresponding to the request information comprises the following steps:
acquiring request information through the gateway component, analyzing the request information, and acquiring an analysis result;
And sending the analysis result to the unified configuration center component through the gateway component, and determining the target model service with the dependency relationship corresponding to the request information through the unified configuration center component.
According to the method for accelerating model reasoning provided by the present disclosure, before determining the model parameter value, the method includes:
Carrying out micro-service processing on the feature engineering corresponding to the model to obtain a feature engineering service;
Determining a corresponding target feature engineering service through the target model service;
The determining of the model parameter values includes:
Acquiring original data corresponding to the target feature engineering service, and inputting the original data into the target feature engineering service to extract features;
and processing the features through the target feature engineering service to determine the model parameter values.
In a second aspect, the present disclosure provides an apparatus for model reasoning acceleration, provided in an integrated model service system, the apparatus comprising:
An acquisition module, used for acquiring request information and analyzing the request information to determine a target model service with a dependency relationship corresponding to the request information, wherein the integrated model service system comprises at least one model service;
A determining module, used for determining model parameter values, and processing the model parameter values sequentially through at least one target model service with a dependency relationship and a message queue arranged between the target model services to obtain a finally output model calculation result;
The message queues between every two target model services with the dependency relationship are used for receiving task data generated by a previous target model service according to the processing capacity value of the previous target model service, and sending the task data to a next target model service according to the processing capacity value of the next target model service.
In a third aspect, the present disclosure provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the method of model reasoning acceleration described in any one of the above aspects.
In a fourth aspect, the present disclosure provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of model reasoning acceleration described in any one of the above aspects.
In a fifth aspect, the present disclosure provides a computer program product comprising a computer program which, when executed by a processor, implements the method of model reasoning acceleration described in any one of the above aspects.
The method and the device for model reasoning acceleration provided by the present disclosure acquire request information and analyze it to determine the target model services with a dependency relationship corresponding to the request information, determine model parameter values, and process the model parameter values sequentially through at least one target model service with a dependency relationship and the message queues arranged between them, to obtain a finally output model calculation result. The message queue between every two target model services with a dependency relationship receives the task data generated by the previous target model service according to that service's own processing capacity value, and sends the task data to the subsequent target model service according to the processing capacity value of the subsequent target model service, so that the processing capacity of the previous target model service and that of the subsequent target model service are balanced. Once this balance is reached, the whole integrated model service system runs more smoothly, the delay of the target model service is reduced, and the performance of the whole integrated model service system is improved.
Drawings
In order to more clearly illustrate the present disclosure or the prior art solutions, a brief description will be given below of the drawings that are needed in the embodiments or prior art descriptions, it being apparent that the drawings in the following description are some embodiments of the present disclosure and that other drawings may be obtained from these drawings without inventive effort to a person of ordinary skill in the art.
FIG. 1 is a flow diagram of a method of model inference acceleration provided by the present disclosure;
FIG. 2 is a block diagram of processing a set message queue provided by the present disclosure;
FIG. 3 is a block diagram of a method of model inference acceleration provided by the present disclosure;
FIG. 4 is a schematic diagram of a model reasoning acceleration apparatus;
fig. 5 is a schematic structural diagram of an electronic device provided by the present disclosure.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present disclosure more apparent, the technical solutions in the present disclosure will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are some, but not all, embodiments of the present disclosure. All other embodiments, which can be made by one of ordinary skill in the art without inventive effort, based on the embodiments in this disclosure are intended to be within the scope of this disclosure.
In recent years, machine learning and deep learning have been applied in depth in the field of risk control, with models of different characteristics completing the corresponding prediction or calculation tasks for different scenes. Models with different algorithms and different targets take different input parameters, and current concrete business requires several models to cooperate in order to satisfy a complex scene. Since risk control is an indispensable link in the whole business, extremely high concurrency requirements are placed on the whole model service, and every link of the inference calculation provided by the model service needs to be optimized to reduce the delay of the whole system.
Prior-art schemes have no dynamic resource adjustment mechanism: model inference service optimization for scenes that integrate several models concentrates either on optimizing the model itself, such as computation optimization or model compression during single-model inference calculation, or on simple management services for whole-life-cycle management of machine learning models, without deeply optimizing the links through which the model is processed. The overall delay of the system therefore cannot be reduced, and the delay grows higher and higher in high-concurrency scenes.
Referring to FIG. 1, which is a flow chart of the method for model reasoning acceleration provided by the present disclosure; the method is used for an integrated model service system and specifically includes the following steps.
Step 110: acquiring request information, analyzing the request information, and determining a target model service with a dependency relationship corresponding to the request information, wherein the integrated model service system comprises at least one model service.
In this step, a request based on the HTTP protocol may be initiated by the Client, so that the integrated model service system obtains the request information. HTTP is an application layer protocol for distributed, collaborative, hypermedia information systems.
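For illustration only, such a client request might look like the following sketch, assuming the gateway exposes an HTTP JSON interface; the URL and the payload fields are assumptions and not part of the disclosure.

```python
# A sketch of the client-side request; the endpoint URL and the payload
# shape are illustrative assumptions, not the disclosed interface.
import requests

resp = requests.post(
    "http://gateway.example.com/predict",              # assumed gateway endpoint
    json={"scene": "risk-control",                      # assumed business-scene field
          "data": {"amount": 120.0, "hour": 23}},       # assumed raw input
    timeout=5,
)
print(resp.json())  # the finally output model calculation result
```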
Target model services having a dependency relationship refer to target model services having an upstream-downstream relationship between them.
Step 120: determining model parameter values, and processing the model parameter values sequentially through at least one target model service with a dependency relationship and a message queue arranged between the target model services to obtain a finally output model calculation result.
In this step, the model parameter values are input to the target model service for calculation.
Since the processing capacity values of target model services having an upstream-downstream relationship differ, a message queue is set between the target model services having the upstream-downstream relationship, thereby balancing the processing capacity values of the target model services.
Specifically, the message queue between every two target model services with the dependency relationship is used for receiving task data generated by a previous target model service according to the processing capacity value of the previous target model service, and sending the task data to a subsequent target model service according to the processing capacity value of the subsequent target model service.
The model reasoning acceleration method provided by the present disclosure obtains request information and analyzes it to determine the target model services with a dependency relationship corresponding to the request information, determines model parameter values, and processes the model parameter values sequentially through at least one target model service with a dependency relationship and the message queues arranged between them, to obtain a finally output model calculation result. The message queue between every two target model services with a dependency relationship receives the task data generated by the previous target model service according to that service's own processing capacity value, and sends the task data to the subsequent target model service according to the processing capacity value of the subsequent target model service.
Based on any of the above embodiments, for each two target model services with a dependency relationship, the step 120 specifically includes the following steps 121 to 122:
Step 121: according to the processing capacity value of the previous target model service, storing the task data processed by the previous target model service into the message queue in batches until all task data corresponding to the previous target model service have been stored.
Step 122: according to the processing capacity value of the subsequent target model service, fetching task data of the corresponding data quantity each time for processing, until all task data in the message queue have been fetched, and outputting the task data processed by the subsequent target model service.
For the specifics of steps 121 to 122, refer to FIG. 2, which is a block diagram of processing with a set message queue provided by the present disclosure.
The processing capacity value is the batch size (batch), i.e., the capability to process multiple request events simultaneously.
Consider a previous target model service A and a subsequent target model service B with different processing capacity values: for example, the batch of A is 3 and the batch of B is 40 (for ease of drawing, FIG. 2 illustrates B with a batch of 4). When the capacity values of two target model services differ by an order of magnitude, the two services cannot simply be deployed on numbers of machines differing by an order of magnitude in a high-concurrency scene, and the hardware resources of the whole model service cannot be fully utilized. Therefore, message-queue middleware is arranged between the two target model services: after the previous target model service A has processed a task, it places the task into the message queue of the subsequent target model service B, and B can then consume up to 40 tasks (one batch) at a time for service calculation. This smoothly solves the problem of unequal batches between the previous target model service A and the subsequent target model service B.
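For illustration only, the following is a minimal Python sketch of this batch-balancing behavior, assuming an in-process queue.Queue stands in for the message-queue middleware; the batch values, the dummy task computation and the sentinel convention are illustrative assumptions rather than the disclosed implementation.

```python
# A minimal sketch of the batch-balancing idea: service A writes in units
# of its own batch, service B drains up to its own (larger) batch per round.
import queue
import threading

mq = queue.Queue()  # stands in for the message-queue middleware between A and B

BATCH_A = 3   # processing capacity value (batch) of the previous service A
BATCH_B = 40  # processing capacity value (batch) of the subsequent service B

def service_a(tasks):
    """Previous service: process tasks BATCH_A at a time, push results to mq."""
    for i in range(0, len(tasks), BATCH_A):
        for result in [t * 2 for t in tasks[i:i + BATCH_A]]:  # dummy computation
            mq.put(result)
    mq.put(None)  # sentinel: all task data of service A has been stored

def service_b():
    """Subsequent service: fetch up to BATCH_B tasks per round and process them."""
    done = False
    while not done:
        batch = []
        while len(batch) < BATCH_B:
            item = mq.get()
            if item is None:
                done = True
                break
            batch.append(item)
        if batch:
            print(f"service B consumed {len(batch)} tasks, sum = {sum(batch)}")

producer = threading.Thread(target=service_a, args=(list(range(100)),))
consumer = threading.Thread(target=service_b)
producer.start(); consumer.start()
producer.join(); consumer.join()
```

The point of the sketch is that A and B never block each other on batch size: A writes in units of 3 while B drains in units of up to 40.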
Based on any of the above embodiments, after step 122, the method further includes steps 11 to 12:
And step 11, periodically acquiring system information of the target model service, inputting the system information into a trained processing capacity value regression prediction model, and outputting a new first processing capacity value corresponding to the target model service, wherein the trained processing capacity value regression prediction model is obtained based on sample system information training.
And step 12, synchronizing the new first processing capacity value to the unified configuration center component, and updating the processing capacity value of the target model service into the new first processing capacity value through the unified configuration center component.
In steps 11-12, the system information includes machine information and monitoring indexes: the machine information may include memory usage, per-second CPU utilization and the like, and the monitoring indexes may include response delay, throughput and the like.
The processing capacity value regression prediction model may be an Xgboost model, which provides a gradient boosting framework.
Because the distribution of request information differs across time periods, the batch of each model service in the integrated model service system is adjusted dynamically over time, so that hardware resources are fully utilized and the throughput of the whole system is improved. The integrated model service system periodically (for example, every hour) collects the machine information and monitoring indexes of each model service in the whole system to train the regression prediction Xgboost model, uses the machine information and monitoring indexes of the time period as the input parameters of the regression prediction Xgboost model to infer the real-time batch of each model service in the current integrated model service system, and puts the batches of the different model services into effect synchronously through the unified configuration center.
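For illustration only, the following sketch shows how such a regression prediction model might be trained and queried with xgboost's scikit-learn interface; the feature columns and the synthetic training data are assumptions, not the disclosure's actual sample system information.

```python
# A sketch of the periodic batch-size regression; the feature columns
# (cpu, memory, latency, throughput) and the toy data are assumptions.
import numpy as np
from xgboost import XGBRegressor

# Sample system information: [cpu_util, mem_util, response_latency_ms, throughput_qps]
X_train = np.array([
    [0.30, 0.40, 12.0,  800.0],
    [0.55, 0.50, 20.0, 1500.0],
    [0.80, 0.70, 35.0, 2600.0],
    [0.90, 0.85, 60.0, 3000.0],
])
y_train = np.array([8, 16, 32, 48])  # batch sizes observed to work well

model = XGBRegressor(n_estimators=50, max_depth=3)
model.fit(X_train, y_train)

# Hourly inference: current machine info + monitoring indexes -> new batch value
current = np.array([[0.65, 0.60, 25.0, 1900.0]])
new_batch = int(round(float(model.predict(current)[0])))
print("new first processing capacity value (batch):", new_batch)
# The new value would then be synchronized to the unified configuration center.
```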
Based on any of the foregoing embodiments, in the case where the trained processing capacity value regression prediction model does not output a new first processing capacity value corresponding to the target model service, after step 122 the method further includes steps 13 to 14:
And step 13, determining a new second processing capacity value corresponding to the target model service through a dynamic programming algorithm.
And step 14, synchronizing the new second processing capacity value to the unified configuration center component, and updating the processing capacity value of the target model service into the new second processing capacity value through the unified configuration center component.
In steps 13-14, the dynamic programming algorithm is a DP (dynamic programming) algorithm, i.e., an algorithm that solves a complex problem by decomposing the original problem into relatively simpler sub-problems.
To account for system stability, the DP dynamic programming algorithm is used as a fallback scheme: if the regression prediction Xgboost model does not produce a prediction, the DP dynamic programming algorithm is used to calculate the new second processing capacity value corresponding to the target model service, so that abnormal situations can be effectively avoided.
System information is collected continuously, and the Xgboost regression prediction model is trained periodically to calculate the batch size of each model service in real time and adjust it dynamically; meanwhile, when the Xgboost regression prediction model cannot produce a prediction, a new batch is obtained through the DP dynamic programming algorithm. This improves the stability of the system and makes the whole system more robust.
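The disclosure does not spell out the DP formulation, so the following Python sketch shows only one plausible reading: each model service picks a batch from a candidate set so as to maximize an estimated total throughput under a shared resource budget, solved with a knapsack-style DP. The candidate set, the cost model and the throughput model are all illustrative assumptions.

```python
# One plausible DP fallback for batch selection, under assumed cost and
# throughput models; none of the numbers below come from the disclosure.
CANDIDATE_BATCHES = [1, 2, 4, 8, 16, 32]

def throughput(service_idx, batch):
    return batch / (1.0 + 0.1 * batch)  # toy diminishing-returns estimate

def memory_cost(service_idx, batch):
    return batch  # toy cost: one resource unit per queued task

def plan_batches(num_services, budget):
    """best[s][b] = max total throughput for services 0..s-1 using at most
    b resource units; choice[s][b] remembers the batch picked for service s-1."""
    NEG = float("-inf")
    best = [[NEG] * (budget + 1) for _ in range(num_services + 1)]
    choice = [[None] * (budget + 1) for _ in range(num_services + 1)]
    best[0] = [0.0] * (budget + 1)
    for s in range(1, num_services + 1):
        for b in range(budget + 1):
            for batch in CANDIDATE_BATCHES:
                cost = memory_cost(s - 1, batch)
                if cost <= b and best[s - 1][b - cost] != NEG:
                    value = best[s - 1][b - cost] + throughput(s - 1, batch)
                    if value > best[s][b]:
                        best[s][b] = value
                        choice[s][b] = batch
    # Walk back through the choices to recover one batch per service
    batches, b = [], budget
    for s in range(num_services, 0, -1):
        batch = choice[s][b]
        batches.append(batch)
        b -= memory_cost(s - 1, batch)
    return list(reversed(batches))

# e.g. a new second processing capacity value per service as a fallback
print(plan_batches(num_services=3, budget=48))
```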
Based on any of the above embodiments, the integrated model service system includes a unified configuration center component;
Prior to the step 110, the method further includes:
and carrying out micro-service processing on the model to obtain model service, and registering the model service to the unified configuration center component.
In this step, micro service (Micro Service) is a software architecture style based on small building blocks (Small Building Blocks) that focus on a single responsibility and function; the blocks are combined in a modular manner into a complex, large application program, and the functional blocks communicate with one another using language-independent (language-agnostic) API sets.
The unified configuration center component is used to provide registration and discovery mechanisms for model services.
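For illustration only, a toy in-memory registry below sketches the registration and discovery mechanism; a production configuration center would be dedicated middleware, and all names and addresses here are assumptions.

```python
# A toy stand-in for the unified configuration center component; real
# deployments would use dedicated registry middleware instead.
class UnifiedConfigCenter:
    def __init__(self):
        self.services = {}  # service name -> network location
        self.batches = {}   # service name -> current processing capacity value

    def register(self, name, address, batch):
        self.services[name] = address
        self.batches[name] = batch

    def discover(self, name):
        return self.services[name]

    def update_batch(self, name, new_batch):
        self.batches[name] = new_batch  # takes effect for the named model service

center = UnifiedConfigCenter()
center.register("model-service-1", "10.0.0.1:8001", batch=3)
center.register("model-service-2", "10.0.0.2:8002", batch=40)
print(center.discover("model-service-2"), center.batches["model-service-2"])
```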
Based on any of the above embodiments, the integrated model service system includes a gateway component;
The step 110 specifically includes the following steps 111 to 112:
Step 111: acquiring request information through the gateway component and analyzing the request information to obtain an analysis result.
In this step, the Gateway component refers to a gateway: when forwarding the request information sent by the client, it can analyze and process the request information just as an origin server holding the resource would. The gateway component also provides functions such as authentication and rate limiting.
Step 112: sending the analysis result to the unified configuration center component through the gateway component, and determining, through the unified configuration center component, the target model service with the dependency relationship corresponding to the request information.
In this step, the unified configuration center component discovers the target model service through the obtained analysis result, and determines the network location, port and other relevant information of the target model service.
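For illustration only, the following sketch shows the gateway-to-configuration-center handoff; the request format, the dependency table and the service addresses are assumptions.

```python
# A toy gateway handoff: parse the request, then resolve the ordered chain
# of target model services; all names and addresses are assumptions.
SERVICE_REGISTRY = {
    "model-service-1": "10.0.0.1:8001",
    "model-service-2": "10.0.0.2:8002",
    "model-service-3": "10.0.0.3:8003",
}
DEPENDENCY_CHAINS = {
    # parsed business scene -> ordered target model services (upstream first)
    "risk-control": ["model-service-1", "model-service-2", "model-service-3"],
}

def gateway_handle(request):
    """Gateway: parse the request; config center: resolve the dependency chain."""
    scene = request["scene"]          # the analysis result produced by the gateway
    chain = DEPENDENCY_CHAINS[scene]  # looked up via the unified configuration center
    return [(name, SERVICE_REGISTRY[name]) for name in chain]

print(gateway_handle({"scene": "risk-control"}))
```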
Based on any of the above embodiments, before the step 120, the method includes the following steps 1-2:
Step 1: performing micro-service processing on the feature engineering corresponding to the model to obtain a feature engineering service.
In this step, feature engineering refers to screening better data features out of the raw data through a series of engineering approaches, so as to improve the training effect of the model.
Step 2: determining the corresponding target feature engineering service through the target model service.
In this step, different model services correspond to different feature engineering services, and thus, corresponding target feature engineering services are determined by the target model services.
The determining the model parameter value in the step 120 specifically includes the following steps 3 to 4:
Step 3: obtaining the original data corresponding to the target feature engineering service, and inputting the original data into the target feature engineering service to extract features.
In this step, different feature engineering services correspond to different data sources in the peripheral system, and the original data are acquired from those data sources.
Step 4: processing the features through the target feature engineering service to determine the model parameter values.
In this step, the model parameter values of the current target model service are calculated by the preceding feature engineering service.
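For illustration only, a toy feature engineering service is sketched below, assuming raw records arrive as dicts; the field names and the derived features are assumptions.

```python
# A toy feature engineering service: screen features out of raw records
# and flatten them into model parameter values. Field names are assumptions.
import math

def feature_engineering_service(raw_records):
    """Screen features from raw data, then build the model's input parameters."""
    features = [
        {
            "amount_log": math.log1p(r["amount"]),                  # scaled amount
            "is_night": 1 if r["hour"] >= 22 or r["hour"] < 6 else 0,  # time flag
        }
        for r in raw_records
    ]
    # Flatten the screened features into the input-parameter vectors of the model
    return [[f["amount_log"], f["is_night"]] for f in features]

raw = [{"amount": 120.0, "hour": 23}, {"amount": 15.5, "hour": 10}]
print(feature_engineering_service(raw))  # model parameter values for the target model
```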
Further, to supplement the disclosure, FIG. 3 shows a block diagram of the method for model reasoning acceleration provided by the present disclosure, applied to an integrated model service system; as an example, the integrated model service system includes three models.
(1) Micro-service processing is performed on the models and the feature engineering in the integrated model service system to obtain model services and feature engineering services. Specifically, in the figure, model 1 corresponds to model service 1, model 2 corresponds to model service 2, model 3 corresponds to model service 3, feature engineering 1 corresponds to feature engineering service 1, and feature engineering 2 corresponds to feature engineering service 2.
(2) The Client initiates an HTTP-based request, and the integrated model service system acquires the request information through the gateway.
(3) The gateway component sends the analysis result to the unified configuration center component, and the unified configuration center component determines target model service 1, target model service 2 and target model service 3, which correspond to the request information and have a dependency relationship.
(4) Feature engineering 1 corresponds to data source 1 in the peripheral system, and feature engineering 2 corresponds to data sources 2 and 3. Data source 1 is input into feature engineering service 1 to obtain feature 1; data sources 2 and 3 are input into feature engineering service 2 to obtain feature 2 and feature 3. Model parameter value 1 of model service 1 is obtained from feature 1, and model parameter value 2 of model service 2 is obtained from feature 2 and feature 3.
(5) Since the batches corresponding to target model service 1, target model service 2 and target model service 3 differ, message queues are set between target model service 1, target model service 2 and target model service 3 to balance the processing capacity values among the three.
Specifically, message queue 1 is arranged between target model service 1 and target model service 2: target model service 1 sends the task data it generates, according to its own processing capacity value, to message queue 1, and target model service 2 acquires the corresponding task data from message queue 1 according to its own processing capacity value. Likewise, message queue 2 is arranged between target model service 2 and target model service 3: target model service 2 sends the task data it generates, according to its own processing capacity value, to message queue 2, and target model service 3 acquires the corresponding task data from message queue 2 according to its own processing capacity value. In this way the processing capacity values of target model service 1, target model service 2 and target model service 3 are balanced.
(6) The model parameter value 1 is input into the model service 1, the model parameter value 2 is input into the model service 2 for calculation to obtain a calculation result, and the calculation result is used as a model calculation result corresponding to the request information.
The model reasoning acceleration method solves the problem of delay of the integrated model in the wind control high concurrency scene, and improves the throughput of the whole system.
The apparatus for model inference acceleration provided in the present disclosure is described below, and the apparatus for model inference acceleration described below and the method for model inference acceleration described above may be referred to correspondingly to each other.
Referring to FIG. 4, which is a schematic structural diagram of the device for model reasoning acceleration; the device is provided in an integrated model service system and includes:
an acquisition module 410, configured to acquire request information and analyze the request information to determine a target model service with a dependency relationship corresponding to the request information, where the integrated model service system includes at least one model service;
A determining module 420, configured to determine a model parameter value, and process the model parameter value sequentially through at least one target model service with a dependency relationship and a message queue set therebetween, to obtain a final output model calculation result;
The message queues between every two target model services with the dependency relationship are used for receiving task data generated by a previous target model service according to the processing capacity value of the previous target model service, and sending the task data to a next target model service according to the processing capacity value of the next target model service.
The device for model reasoning acceleration provided by the present disclosure acquires request information and analyzes it to determine the target model services with a dependency relationship corresponding to the request information, determines model parameter values, and processes the model parameter values sequentially through at least one target model service with a dependency relationship and the message queues arranged between them, to obtain a finally output model calculation result. The message queue between every two target model services with a dependency relationship receives the task data generated by the previous target model service according to that service's own processing capacity value and sends the task data to the subsequent target model service according to the processing capacity value of the subsequent target model service, so that the processing capacity of the previous target model service and that of the subsequent target model service are balanced.
Based on any of the above embodiments, for each two target model services with dependencies, the determining module 420 includes:
A storage module, used for storing the task data processed by the previous target model service into the message queue in batches according to the processing capacity value of the previous target model service, until all task data corresponding to the previous target model service have been stored;
A processing module, used for fetching task data of the corresponding data quantity each time for processing according to the processing capacity value of the subsequent target model service, until all task data in the message queue have been fetched, and outputting the task data processed by the subsequent target model service.
Based on any of the above embodiments, the integrated model service system includes a unified configuration center component;
based on any of the above embodiments, the apparatus further comprises:
The first micro service processing module is used for carrying out micro service processing on the model to obtain model service, and registering the model service to the unified configuration center component.
Based on any of the above embodiments, the apparatus further comprises:
The output unit is used for periodically acquiring the system information of the target model service, inputting the system information into a trained processing capacity value regression prediction model, and outputting a new first processing capacity value corresponding to the target model service, wherein the trained processing capacity value regression prediction model is obtained based on sample system information training;
And the first synchronization unit is used for synchronizing the new first processing capacity value to the unified configuration center component, and updating the processing capacity value of the target model service into the new first processing capacity value through the unified configuration center component.
Based on any of the foregoing embodiments, in the case where the trained processing capacity value regression prediction model does not output a new first processing capacity value corresponding to a target model service, the apparatus further includes, after the processing module:
a second determining unit, configured to determine a new second processing capability value corresponding to the target model service through a dynamic programming algorithm;
And the second synchronization unit is used for synchronizing the new second processing capacity value to the unified configuration center component, and updating the processing capacity value of the target model service into the new second processing capacity value through the unified configuration center component.
Based on any of the above embodiments, the integrated model service system includes a gateway component;
The obtaining module 410 includes:
The analysis module is used for acquiring request information through the gateway component and analyzing the request information to acquire an analysis result;
A sending module, used for sending the analysis result to the unified configuration center component through the gateway component, and determining, through the unified configuration center component, the target model service with the dependency relationship corresponding to the request information.
Based on any of the above embodiments, the apparatus further comprises:
The second micro-service processing module is used for carrying out micro-service processing on the feature engineering corresponding to the model to obtain feature engineering service;
the first determining unit is used for determining a corresponding target feature engineering service through the target model service;
the determining module 420 is specifically configured to:
Acquiring original data corresponding to the target feature engineering service, and inputting the original data into the target feature engineering service to extract features;
and processing the features through the target feature engineering service to determine the model parameter values.
Fig. 5 illustrates a physical schematic diagram of an electronic device. As shown in FIG. 5, the electronic device may include a processor (processor) 510, a communication interface (Communications Interface) 520, a memory (memory) 530 and a communication bus 540, where the processor 510, the communication interface 520 and the memory 530 communicate with each other through the communication bus 540. The processor 510 may call logic instructions in the memory 530 to execute the method for model reasoning acceleration for an integrated model service system, where the method includes: obtaining request information and analyzing it to determine the target model services with a dependency relationship corresponding to the request information, the integrated model service system including at least one model service; determining model parameter values; and processing the model parameter values sequentially via at least one target model service with a dependency relationship and the message queues set between them, to obtain a finally output model calculation result, where the message queue between every two target model services with a dependency relationship receives the task data generated by the previous target model service according to its own processing capacity value and sends the task data to the subsequent target model service according to the processing capacity value of the subsequent target model service.
Further, the logic instructions in the memory 530 described above may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present disclosure may be embodied in essence or a part contributing to the prior art or a part of the technical solution, or in the form of a software product stored in a storage medium, including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method described in the embodiments of the present disclosure. The storage medium includes a U disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, an optical disk, or other various media capable of storing program codes.
In another aspect, the present disclosure further provides a computer program product. The computer program product includes a computer program that can be stored on a non-transitory computer readable storage medium; when the computer program is executed by a processor, the computer can execute the model reasoning acceleration method provided by the above methods for an integrated model service system, where the method includes: obtaining request information and analyzing it to determine the target model services with a dependency relationship corresponding to the request information, the integrated model service system including at least one model service; determining model parameter values; and processing the model parameter values sequentially through at least one target model service with a dependency relationship and the message queues set between them, to obtain a finally output model calculation result, where the message queue between every two target model services with a dependency relationship receives the task data generated by the previous target model service according to its own processing capacity value and sends the task data to the subsequent target model service according to the processing capacity value of the subsequent target model service.
In yet another aspect, the present disclosure further provides a non-transitory computer readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the method for model reasoning acceleration provided by the above methods for an integrated model service system, where the method includes: obtaining request information and analyzing it to determine the target model services with a dependency relationship corresponding to the request information, the integrated model service system including at least one model service; determining model parameter values; and processing the model parameter values sequentially through at least one target model service with a dependency relationship and the message queues set between them, to obtain a finally output model calculation result, where the message queue between every two target model services with a dependency relationship receives the task data generated by the previous target model service according to its own processing capacity value and sends the task data to the subsequent target model service according to the processing capacity value of the subsequent target model service.
The apparatus embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the solution without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that the foregoing embodiments are merely illustrative of the technical solutions of the present disclosure, and not limiting thereof, and although the present disclosure has been described in detail with reference to the foregoing embodiments, it should be understood by those skilled in the art that modifications may be made to the technical solutions described in the foregoing embodiments or equivalents may be substituted for some of the technical features thereof, and these modifications or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure in essence.

Claims (11)

1. A method for model inference acceleration for an integrated model services system, the method comprising:
Acquiring request information, analyzing the request information, and determining a target model service with a dependency relationship corresponding to the request information, wherein the integrated model service system comprises at least one model service;
Determining model parameter values, and processing the model parameter values sequentially through at least one target model service with a dependency relationship and a message queue arranged between the target model services to obtain a finally output model calculation result;
The message queues between every two target model services with the dependency relationship are used for receiving task data generated by a previous target model service according to the processing capacity value of the previous target model service, and sending the task data to a next target model service according to the processing capacity value of the next target model service.
2. The method for accelerating model reasoning according to claim 1, wherein for every two target model services with a dependency relationship, the processing the model parameter values sequentially via at least one target model service with a dependency relationship and a message queue set therebetween comprises:
according to the processing capacity value of the previous target model service, storing the task data processed by the previous target model service into the message queue in batches until all task data corresponding to the previous target model service have been stored;
and according to the processing capacity value of the subsequent target model service, fetching task data of the corresponding data quantity each time for processing, until all task data in the message queue have been fetched, and outputting the task data processed by the subsequent target model service.
3. The method of model inference acceleration according to claim 2, characterized in that the integrated model service system comprises a unified configuration center component;
Before the request information is acquired and analyzed, the method further includes:
and carrying out micro-service processing on the model to obtain model service, and registering the model service to the unified configuration center component.
4. A method of model inference acceleration as set forth in claim 3, wherein after said outputting the task data processed by the subsequent target model service, the method further comprises:
Periodically acquiring system information of the target model service, inputting the system information into a trained processing capacity value regression prediction model, and outputting a new first processing capacity value corresponding to the target model service, wherein the trained processing capacity value regression prediction model is obtained by training on sample system information;
And synchronizing the new first processing capacity value to the unified configuration center component, and updating the processing capacity value of the target model service into the new first processing capacity value through the unified configuration center component.
5. The method of model inference acceleration according to claim 4, wherein, in the case that the trained processing capacity value regression prediction model does not output a new first processing capacity value corresponding to a target model service, the method further comprises, after outputting the task data processed by the subsequent target model service:
determining a new second processing capacity value corresponding to the target model service through a dynamic programming algorithm;
And synchronizing the new second processing capacity value to the unified configuration center component, and updating the processing capacity value of the target model service into the new second processing capacity value through the unified configuration center component.
6. A method of model inference acceleration as set forth in claim 3, wherein the integrated model service system includes a gateway component;
The obtaining request information and analyzing the request information, and determining the target model service with the dependency relationship corresponding to the request information comprises the following steps:
acquiring request information through the gateway component, analyzing the request information, and acquiring an analysis result;
And sending the analysis result to the unified configuration center component through the gateway component, and determining the target model service with the dependency relationship corresponding to the request information through the unified configuration center component.
7. The method for model inference acceleration as set forth in claim 1, characterized in that,
Before determining the model parameter values, the method comprises the following steps:
Carrying out micro-service processing on the feature engineering corresponding to the model to obtain a feature engineering service;
Determining a corresponding target feature engineering service through the target model service;
The determining of the model parameter values includes:
Acquiring original data corresponding to the target feature engineering service, and inputting the original data into the target feature engineering service to extract features;
and processing the features through the target feature engineering service to determine the model parameter values.
8. An apparatus for model reasoning acceleration, disposed in an integrated model service system, the apparatus comprising:
An acquisition module, used for acquiring request information and analyzing the request information to determine a target model service with a dependency relationship corresponding to the request information, wherein the integrated model service system comprises at least one model service;
A determining module, used for determining model parameter values, and processing the model parameter values sequentially through at least one target model service with a dependency relationship and a message queue arranged between the target model services to obtain a finally output model calculation result;
The message queues between every two target model services with the dependency relationship are used for receiving task data generated by a previous target model service according to the processing capacity value of the previous target model service, and sending the task data to a next target model service according to the processing capacity value of the next target model service.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements a method of model inference acceleration as claimed in any one of claims 1 to 7 when executing the program.
10. A non-transitory computer readable storage medium, having stored thereon a computer program, which when executed by a processor, implements a method of model inference acceleration as claimed in any one of claims 1 to 7.
11. A computer program product comprising a computer program which, when executed by a processor, implements a method of model inference acceleration as claimed in any one of claims 1 to 7.
CN202210741859.2A 2022-06-27 2022-06-27 A method and device for accelerating model reasoning Active CN115130674B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210741859.2A CN115130674B (en) 2022-06-27 2022-06-27 A method and device for accelerating model reasoning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210741859.2A CN115130674B (en) 2022-06-27 2022-06-27 A method and device for accelerating model reasoning

Publications (2)

Publication Number Publication Date
CN115130674A CN115130674A (en) 2022-09-30
CN115130674B true CN115130674B (en) 2025-04-18

Family

ID=83379516

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210741859.2A Active CN115130674B (en) 2022-06-27 2022-06-27 A method and device for accelerating model reasoning

Country Status (1)

Country Link
CN (1) CN115130674B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110162414A (en) * 2019-02-01 2019-08-23 腾讯科技(深圳)有限公司 The method and device of artificial intelligence service is realized based on micro services framework
CN114615521A (en) * 2022-03-10 2022-06-10 网易(杭州)网络有限公司 Video processing method and device, computer readable storage medium and electronic equipment

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4257857B2 (en) * 2004-09-22 2009-04-22 インターナショナル・ビジネス・マシーンズ・コーポレーション Data processing system and data processing method
CN102938731B (en) * 2012-11-22 2015-01-21 北京锐易特软件技术有限公司 Exchange and integration device and method based on proxy cache adaptation model
CN105678398A (en) * 2015-12-24 2016-06-15 国家电网公司 Power load forecasting method based on big data technology, and research and application system based on method
US10528863B2 (en) * 2016-04-01 2020-01-07 Numenta, Inc. Feedback mechanisms in sequence learning systems with temporal processing capability
CN108510082B (en) * 2018-03-27 2022-11-11 苏宁易购集团股份有限公司 Method and device for processing machine learning model
CN111262795B (en) * 2020-01-08 2024-02-06 京东科技控股股份有限公司 Service interface-based current limiting method and device, electronic equipment and storage medium
CN111913818B (en) * 2020-08-07 2022-12-02 平安科技(深圳)有限公司 Method for determining dependency relationship between services and related device
CN112270410A (en) * 2020-10-19 2021-01-26 北京达佳互联信息技术有限公司 Online reasoning service system, method and device for providing online reasoning service
CN113806058B (en) * 2021-10-09 2024-10-18 京东科技控股股份有限公司 Task management method and device, storage medium and electronic equipment

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110162414A (en) * 2019-02-01 2019-08-23 腾讯科技(深圳)有限公司 The method and device of artificial intelligence service is realized based on micro services framework
CN114615521A (en) * 2022-03-10 2022-06-10 网易(杭州)网络有限公司 Video processing method and device, computer readable storage medium and electronic equipment

Also Published As

Publication number Publication date
CN115130674A (en) 2022-09-30

Similar Documents

Publication Publication Date Title
CN111768008B (en) Federal learning method, apparatus, device, and storage medium
US20170286861A1 (en) Structured machine learning framework
CN115249073A (en) A federated learning method and device
CN110895506B (en) Method and system for constructing test data
CN114997401B (en) Adaptive inference acceleration method, apparatus, computer device, and storage medium
CN112783632A (en) Stream calculation system, method, electronic device, and readable storage medium
CN114706675A (en) Task deployment method and device based on cloud-edge collaborative system
CN115037625B (en) Network slice processing method and device, electronic equipment and readable storage medium
CN111400007A (en) Task scheduling method and system based on edge calculation
CN110781180B (en) Data screening method and data screening device
EP4170974A1 (en) Slice service processing method and apparatus, network device, and readable storage medium
CN114531448A (en) Calculation force determination method and device and calculation force sharing system
WO2024121612A1 (en) Method and apparatus for scheduling and assigning tasks to computing resources
Kanwal et al. A genetic based leader election algorithm for IoT cloud data processing
Feng et al. FedDD: Toward communication-efficient federated learning with differential parameter dropout
CN116149272B (en) A cloud-edge collaborative production line monitoring method, device and system
CN116755886A (en) Method, device, storage medium and system for forwarding and calculating force task
Tirana et al. Workflow optimization for parallel split learning
CN115130674B (en) A method and device for accelerating model reasoning
CN109905481A (en) Construction of a QoS model based on RTI-DDS and a method for predicting the running performance of QoS strategy scheme
CN113296991B (en) Abnormality detection method and device
CN111539281B (en) Distributed face recognition method and system
CN118972433A (en) Edge gateway system, data processing method of edge gateway system
Pandey et al. Here, there, anywhere: Profiling-driven services to tame the heterogeneity of edge applications
EP2209282A1 (en) A method, device and computer program product for service balancing in an electronic communications system

Legal Events

Date Code Title Description
PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant