
CN115130674B - A method and device for accelerating model reasoning - Google Patents

A method and device for accelerating model reasoning

Info

Publication number
CN115130674B
CN115130674B
Authority
CN
China
Prior art keywords
model
service
target model
model service
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210741859.2A
Other languages
Chinese (zh)
Other versions
CN115130674A (en)
Inventor
刘帅朝
黄乐乐
张德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong Technology Holding Co Ltd
Original Assignee
Jingdong Technology Holding Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingdong Technology Holding Co Ltd filed Critical Jingdong Technology Holding Co Ltd
Priority to CN202210741859.2A priority Critical patent/CN115130674B/en
Publication of CN115130674A publication Critical patent/CN115130674A/en
Application granted granted Critical
Publication of CN115130674B publication Critical patent/CN115130674B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G06N5/043 Distributed expert systems; Blackboards
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Pure & Applied Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Algebra (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Operations Research (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a method and a device for model reasoning acceleration. The method comprises: obtaining request information and analyzing it to determine the target model services with a dependency relationship corresponding to the request information; determining model parameter values; and processing the model parameter values sequentially through at least one target model service with a dependency relationship and the message queues arranged between them, to obtain a finally output model calculation result. The message queue between every two target model services with a dependency relationship receives the task data generated by the previous target model service according to that service's processing capacity value, and sends the task data to the subsequent target model service according to the processing capacity value of the subsequent target model service. The method and the device solve the problem of high delay of the target model service and improve the performance of the whole integrated model service system.

Description

Model reasoning acceleration method and device
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and apparatus for model reasoning acceleration.
Background
In high-concurrency risk-control scenes, different business scenes require several models to be integrated to provide a service that meets the business requirements. Prior-art schemes focus on optimizing the model itself, such as computation optimization or model compression during single-model inference calculation, and do not deeply optimize the links through which the model is processed, so the overall delay of the system cannot be reduced and grows higher and higher in high-concurrency scenes.
Disclosure of Invention
The present disclosure provides a method and a device for model reasoning acceleration that solve the problem of high delay of the target model service and improve the performance of the whole integrated model service system.
In a first aspect, the present disclosure provides a method for model inference acceleration for an integrated model service system, the method comprising:
Acquiring request information, analyzing the request information, and determining a target model service with a dependency relationship corresponding to the request information, wherein the integrated model service system comprises at least one model service;
Determining model parameter values, and processing the model parameter values sequentially through at least one target model service with a dependency relationship and a message queue arranged between the target model services to obtain a finally output model calculation result;
The message queues between every two target model services with the dependency relationship are used for receiving task data generated by a previous target model service according to the processing capacity value of the previous target model service, and sending the task data to a next target model service according to the processing capacity value of the next target model service.
According to the method for accelerating model reasoning provided by the disclosure, for every two target model services with a dependency relationship, the processing of the model parameter values sequentially through at least one target model service with a dependency relationship and a message queue arranged between the target model services comprises the following steps:
according to the processing capacity value of the previous target model service, storing the task data processed by the previous target model service into the message queue in batches until all task data corresponding to the previous target model service have been stored;
and according to the processing capacity value of the subsequent target model service, fetching task data of the corresponding data quantity each time for processing, until all task data in the message queue have been fetched, and outputting the task data processed by the subsequent target model service.
According to the method for accelerating model reasoning provided by the disclosure, the integrated model service system comprises a unified configuration center component;
Before the request information is acquired and analyzed, the method further includes:
performing micro-service processing on the model to obtain a model service, and registering the model service with the unified configuration center component.
According to the method for accelerating model reasoning provided by the present disclosure, after the task data processed by the subsequent target model service is output, the method further includes:
Periodically acquiring system information of the target model service, inputting the system information into a trained processing capacity value regression prediction model, and outputting a new first processing capacity value corresponding to the target model service, wherein the trained processing capacity value regression prediction model is obtained by training on sample system information;
And synchronizing the new first processing capacity value to the unified configuration center component, and updating the processing capacity value of the target model service into the new first processing capacity value through the unified configuration center component.
According to the method for model reasoning acceleration provided by the present disclosure, in the case that the trained processing capacity value regression prediction model does not output a new first processing capacity value corresponding to a target model service, after outputting the task data processed by the target model service, the method further includes:
determining a new second processing capacity value corresponding to the target model service through a dynamic programming algorithm;
And synchronizing the new second processing capacity value to the unified configuration center component, and updating the processing capacity value of the target model service into the new second processing capacity value through the unified configuration center component.
According to the method for accelerating model reasoning provided by the present disclosure, the integrated model service system comprises a gateway component;
The obtaining request information and analyzing the request information, and determining the target model service with the dependency relationship corresponding to the request information comprises the following steps:
acquiring request information through the gateway component, analyzing the request information, and acquiring an analysis result;
And sending the analysis result to the unified configuration center component through the gateway component, and determining the target model service with the dependency relationship corresponding to the request information through the unified configuration center component.
According to the method for accelerating model reasoning provided by the present disclosure, before determining the model parameter value, the method includes:
Carrying out micro-service processing on the feature engineering corresponding to the model to obtain a feature engineering service;
Determining a corresponding target feature engineering service through the target model service;
The determining of the model parameter values includes:
Acquiring original data corresponding to the target feature engineering service, and inputting the original data into the target feature engineering service to extract features;
and processing the features through the target feature engineering service to determine the model parameter values.
In a second aspect, the present disclosure provides an apparatus for model reasoning acceleration, provided in an integrated model service system, the apparatus comprising:
An acquisition module, used for acquiring request information and analyzing the request information to determine a target model service with a dependency relationship corresponding to the request information, wherein the integrated model service system comprises at least one model service;
A determining module, used for determining model parameter values, and processing the model parameter values sequentially through at least one target model service with a dependency relationship and a message queue arranged between the target model services to obtain a finally output model calculation result;
The message queues between every two target model services with the dependency relationship are used for receiving task data generated by a previous target model service according to the processing capacity value of the previous target model service, and sending the task data to a next target model service according to the processing capacity value of the next target model service.
In a third aspect, the present disclosure provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the method of model reasoning acceleration described in any one of the above aspects.
In a fourth aspect, the present disclosure provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of model reasoning acceleration described in any one of the above aspects.
In a fifth aspect, the present disclosure provides a computer program product comprising a computer program which, when executed by a processor, implements the method of model reasoning acceleration described in any one of the above aspects.
The method and the device for model reasoning acceleration provided by the present disclosure acquire request information and analyze it to determine the target model services with a dependency relationship corresponding to the request information, determine model parameter values, and process the model parameter values sequentially through at least one target model service with a dependency relationship and the message queues arranged between them, to obtain a finally output model calculation result. The message queue between every two target model services with a dependency relationship receives the task data generated by the previous target model service according to that service's own processing capacity value, and sends the task data to the subsequent target model service according to the processing capacity value of the subsequent target model service, so that the processing capacity of the previous target model service and that of the subsequent target model service are balanced. Once this balance is reached, the whole integrated model service system runs more smoothly, the delay of the target model service is reduced, and the performance of the whole integrated model service system is improved.
Drawings
In order to more clearly illustrate the present disclosure or the prior art solutions, a brief description will be given below of the drawings that are needed in the embodiments or prior art descriptions, it being apparent that the drawings in the following description are some embodiments of the present disclosure and that other drawings may be obtained from these drawings without inventive effort to a person of ordinary skill in the art.
FIG. 1 is a flow diagram of a method of model inference acceleration provided by the present disclosure;
FIG. 2 is a block diagram of processing a set message queue provided by the present disclosure;
FIG. 3 is a block diagram of a method of model inference acceleration provided by the present disclosure;
FIG. 4 is a schematic diagram of a model reasoning acceleration apparatus;
fig. 5 is a schematic structural diagram of an electronic device provided by the present disclosure.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present disclosure more apparent, the technical solutions in the present disclosure will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are some, but not all, embodiments of the present disclosure. All other embodiments, which can be made by one of ordinary skill in the art without inventive effort, based on the embodiments in this disclosure are intended to be within the scope of this disclosure.
In recent years, machine learning and deep learning have been applied in depth in the field of risk control, with models of different characteristics completing the corresponding prediction or calculation tasks for different scenes. Models with different algorithms and different targets take different input parameters, and current concrete business requires several models to cooperate in order to satisfy a complex scene. Since risk control is an indispensable link in the whole business, extremely high concurrency requirements are placed on the whole model service, and every link of the inference calculation provided by the model service needs to be optimized to reduce the delay of the whole system.
Prior-art schemes have no dynamic resource adjustment mechanism: model inference service optimization for scenes that integrate several models concentrates either on optimizing the model itself, such as computation optimization or model compression during single-model inference calculation, or on simple management services for whole-life-cycle management of machine learning models, without deeply optimizing the links through which the model is processed. The overall delay of the system therefore cannot be reduced, and the delay grows higher and higher in high-concurrency scenes.
Referring to FIG. 1, which is a flow chart of the method for model reasoning acceleration provided by the present disclosure; the method is used for an integrated model service system and specifically includes the following steps.
Step 110: acquiring request information, analyzing the request information, and determining a target model service with a dependency relationship corresponding to the request information, wherein the integrated model service system comprises at least one model service.
In this step, a request based on the HTTP protocol may be initiated by the Client, so that the integrated model service system obtains the request information. HTTP is an application layer protocol for distributed, collaborative, hypermedia information systems.
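For illustration only, such a client request might look like the following sketch, assuming the gateway exposes an HTTP JSON interface; the URL and the payload fields are assumptions and not part of the disclosure.

```python
# A sketch of the client-side request; the endpoint URL and the payload
# shape are illustrative assumptions, not the disclosed interface.
import requests

resp = requests.post(
    "http://gateway.example.com/predict",              # assumed gateway endpoint
    json={"scene": "risk-control",                      # assumed business-scene field
          "data": {"amount": 120.0, "hour": 23}},       # assumed raw input
    timeout=5,
)
print(resp.json())  # the finally output model calculation result
```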
Target model services having a dependency relationship refer to target model services having an upstream-downstream relationship between them.
Step 120: determining model parameter values, and processing the model parameter values sequentially through at least one target model service with a dependency relationship and a message queue arranged between the target model services to obtain a finally output model calculation result.
In this step, the model parameter values are input to the target model service for calculation.
Since the processing capacity values of target model services having an upstream-downstream relationship differ, a message queue is set between the target model services having the upstream-downstream relationship, thereby balancing the processing capacity values of the target model services.
Specifically, the message queue between every two target model services with the dependency relationship is used for receiving task data generated by a previous target model service according to the processing capacity value of the previous target model service, and sending the task data to a subsequent target model service according to the processing capacity value of the subsequent target model service.
The model reasoning acceleration method provided by the present disclosure obtains request information and analyzes it to determine the target model services with a dependency relationship corresponding to the request information, determines model parameter values, and processes the model parameter values sequentially through at least one target model service with a dependency relationship and the message queues arranged between them, to obtain a finally output model calculation result. The message queue between every two target model services with a dependency relationship receives the task data generated by the previous target model service according to that service's own processing capacity value, and sends the task data to the subsequent target model service according to the processing capacity value of the subsequent target model service.
Based on any of the above embodiments, for each two target model services with a dependency relationship, the step 120 specifically includes the following steps 121 to 122:
Step 121: according to the processing capacity value of the previous target model service, storing the task data processed by the previous target model service into the message queue in batches until all task data corresponding to the previous target model service have been stored.
Step 122: according to the processing capacity value of the subsequent target model service, fetching task data of the corresponding data quantity each time for processing, until all task data in the message queue have been fetched, and outputting the task data processed by the subsequent target model service.
For the specifics of steps 121 to 122, refer to FIG. 2, which is a block diagram of processing with a set message queue provided by the present disclosure.
The processing capacity value is the batch size (batch), i.e., the capability to process multiple request events simultaneously.
Consider a previous target model service A and a subsequent target model service B with different processing capacity values: for example, the batch of A is 3 and the batch of B is 40 (for ease of drawing, FIG. 2 illustrates B with a batch of 4). When the capacity values of two target model services differ by an order of magnitude, the two services cannot simply be deployed on numbers of machines differing by an order of magnitude in a high-concurrency scene, and the hardware resources of the whole model service cannot be fully utilized. Therefore, message-queue middleware is arranged between the two target model services: after the previous target model service A has processed a task, it places the task into the message queue of the subsequent target model service B, and B can then consume up to 40 tasks (one batch) at a time for service calculation. This smoothly solves the problem of unequal batches between the previous target model service A and the subsequent target model service B.
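For illustration only, the following is a minimal Python sketch of this batch-balancing behavior, assuming an in-process queue.Queue stands in for the message-queue middleware; the batch values, the dummy task computation and the sentinel convention are illustrative assumptions rather than the disclosed implementation.

```python
# A minimal sketch of the batch-balancing idea: service A writes in units
# of its own batch, service B drains up to its own (larger) batch per round.
import queue
import threading

mq = queue.Queue()  # stands in for the message-queue middleware between A and B

BATCH_A = 3   # processing capacity value (batch) of the previous service A
BATCH_B = 40  # processing capacity value (batch) of the subsequent service B

def service_a(tasks):
    """Previous service: process tasks BATCH_A at a time, push results to mq."""
    for i in range(0, len(tasks), BATCH_A):
        for result in [t * 2 for t in tasks[i:i + BATCH_A]]:  # dummy computation
            mq.put(result)
    mq.put(None)  # sentinel: all task data of service A has been stored

def service_b():
    """Subsequent service: fetch up to BATCH_B tasks per round and process them."""
    done = False
    while not done:
        batch = []
        while len(batch) < BATCH_B:
            item = mq.get()
            if item is None:
                done = True
                break
            batch.append(item)
        if batch:
            print(f"service B consumed {len(batch)} tasks, sum = {sum(batch)}")

producer = threading.Thread(target=service_a, args=(list(range(100)),))
consumer = threading.Thread(target=service_b)
producer.start(); consumer.start()
producer.join(); consumer.join()
```

The point of the sketch is that A and B never block each other on batch size: A writes in units of 3 while B drains in units of up to 40.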
Based on any of the above embodiments, after step 122, the method further includes steps 11 to 12:
And step 11, periodically acquiring system information of the target model service, inputting the system information into a trained processing capacity value regression prediction model, and outputting a new first processing capacity value corresponding to the target model service, wherein the trained processing capacity value regression prediction model is obtained based on sample system information training.
And step 12, synchronizing the new first processing capacity value to the unified configuration center component, and updating the processing capacity value of the target model service into the new first processing capacity value through the unified configuration center component.
In steps 11-12, the system information includes machine information and monitoring indexes: the machine information may include memory usage, per-second CPU utilization and the like, and the monitoring indexes may include response delay, throughput and the like.
The processing capacity value regression prediction model may be an Xgboost model, which provides a gradient boosting framework.
Because the distribution of request information differs across time periods, the batch of each model service in the integrated model service system is adjusted dynamically over time, so that hardware resources are fully utilized and the throughput of the whole system is improved. The integrated model service system periodically (for example, every hour) collects the machine information and monitoring indexes of each model service in the whole system to train the regression prediction Xgboost model, uses the machine information and monitoring indexes of the time period as the input parameters of the regression prediction Xgboost model to infer the real-time batch of each model service in the current integrated model service system, and puts the batches of the different model services into effect synchronously through the unified configuration center.
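For illustration only, the following sketch shows how such a regression prediction model might be trained and queried with xgboost's scikit-learn interface; the feature columns and the synthetic training data are assumptions, not the disclosure's actual sample system information.

```python
# A sketch of the periodic batch-size regression; the feature columns
# (cpu, memory, latency, throughput) and the toy data are assumptions.
import numpy as np
from xgboost import XGBRegressor

# Sample system information: [cpu_util, mem_util, response_latency_ms, throughput_qps]
X_train = np.array([
    [0.30, 0.40, 12.0,  800.0],
    [0.55, 0.50, 20.0, 1500.0],
    [0.80, 0.70, 35.0, 2600.0],
    [0.90, 0.85, 60.0, 3000.0],
])
y_train = np.array([8, 16, 32, 48])  # batch sizes observed to work well

model = XGBRegressor(n_estimators=50, max_depth=3)
model.fit(X_train, y_train)

# Hourly inference: current machine info + monitoring indexes -> new batch value
current = np.array([[0.65, 0.60, 25.0, 1900.0]])
new_batch = int(round(float(model.predict(current)[0])))
print("new first processing capacity value (batch):", new_batch)
# The new value would then be synchronized to the unified configuration center.
```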
Based on any of the foregoing embodiments, in the case where the trained processing capacity value regression prediction model does not output a new first processing capacity value corresponding to the target model service, after step 122 the method further includes steps 13 to 14:
And step 13, determining a new second processing capacity value corresponding to the target model service through a dynamic programming algorithm.
And step 14, synchronizing the new second processing capacity value to the unified configuration center component, and updating the processing capacity value of the target model service into the new second processing capacity value through the unified configuration center component.
In steps 13-14, the dynamic programming algorithm is a DP (dynamic programming) algorithm, i.e., an algorithm that solves a complex problem by decomposing the original problem into relatively simpler sub-problems.
To account for system stability, the DP dynamic programming algorithm is used as a fallback scheme: if the regression prediction Xgboost model does not produce a prediction, the DP dynamic programming algorithm is used to calculate the new second processing capacity value corresponding to the target model service, so that abnormal situations can be effectively avoided.
System information is collected continuously, and the Xgboost regression prediction model is trained periodically to calculate the batch size of each model service in real time and adjust it dynamically; meanwhile, when the Xgboost regression prediction model cannot produce a prediction, a new batch is obtained through the DP dynamic programming algorithm. This improves the stability of the system and makes the whole system more robust.
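The disclosure does not spell out the DP formulation, so the following Python sketch shows only one plausible reading: each model service picks a batch from a candidate set so as to maximize an estimated total throughput under a shared resource budget, solved with a knapsack-style DP. The candidate set, the cost model and the throughput model are all illustrative assumptions.

```python
# One plausible DP fallback for batch selection, under assumed cost and
# throughput models; none of the numbers below come from the disclosure.
CANDIDATE_BATCHES = [1, 2, 4, 8, 16, 32]

def throughput(service_idx, batch):
    return batch / (1.0 + 0.1 * batch)  # toy diminishing-returns estimate

def memory_cost(service_idx, batch):
    return batch  # toy cost: one resource unit per queued task

def plan_batches(num_services, budget):
    """best[s][b] = max total throughput for services 0..s-1 using at most
    b resource units; choice[s][b] remembers the batch picked for service s-1."""
    NEG = float("-inf")
    best = [[NEG] * (budget + 1) for _ in range(num_services + 1)]
    choice = [[None] * (budget + 1) for _ in range(num_services + 1)]
    best[0] = [0.0] * (budget + 1)
    for s in range(1, num_services + 1):
        for b in range(budget + 1):
            for batch in CANDIDATE_BATCHES:
                cost = memory_cost(s - 1, batch)
                if cost <= b and best[s - 1][b - cost] != NEG:
                    value = best[s - 1][b - cost] + throughput(s - 1, batch)
                    if value > best[s][b]:
                        best[s][b] = value
                        choice[s][b] = batch
    # Walk back through the choices to recover one batch per service
    batches, b = [], budget
    for s in range(num_services, 0, -1):
        batch = choice[s][b]
        batches.append(batch)
        b -= memory_cost(s - 1, batch)
    return list(reversed(batches))

# e.g. a new second processing capacity value per service as a fallback
print(plan_batches(num_services=3, budget=48))
```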
Based on any of the above embodiments, the integrated model service system includes a unified configuration center component;
Prior to the step 110, the method further includes:
and carrying out micro-service processing on the model to obtain model service, and registering the model service to the unified configuration center component.
In this step, micro service (Micro Service) is a software architecture style based on small building blocks (Small Building Blocks) that focus on a single responsibility and function; the blocks are combined in a modular manner into a complex, large application program, and the functional blocks communicate with one another using language-independent (language-agnostic) API sets.
The unified configuration center component is used to provide registration and discovery mechanisms for model services.
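For illustration only, a toy in-memory registry below sketches the registration and discovery mechanism; a production configuration center would be dedicated middleware, and all names and addresses here are assumptions.

```python
# A toy stand-in for the unified configuration center component; real
# deployments would use dedicated registry middleware instead.
class UnifiedConfigCenter:
    def __init__(self):
        self.services = {}  # service name -> network location
        self.batches = {}   # service name -> current processing capacity value

    def register(self, name, address, batch):
        self.services[name] = address
        self.batches[name] = batch

    def discover(self, name):
        return self.services[name]

    def update_batch(self, name, new_batch):
        self.batches[name] = new_batch  # takes effect for the named model service

center = UnifiedConfigCenter()
center.register("model-service-1", "10.0.0.1:8001", batch=3)
center.register("model-service-2", "10.0.0.2:8002", batch=40)
print(center.discover("model-service-2"), center.batches["model-service-2"])
```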
Based on any of the above embodiments, the integrated model service system includes a gateway component;
The step 110 specifically includes the following steps 111 to 112:
Step 111: acquiring request information through the gateway component and analyzing the request information to obtain an analysis result.
In this step, the Gateway component refers to a gateway: when forwarding the request information sent by the client, it can analyze and process the request information just as an origin server holding the resource would. The gateway component also provides functions such as authentication and rate limiting.
Step 112: sending the analysis result to the unified configuration center component through the gateway component, and determining, through the unified configuration center component, the target model service with the dependency relationship corresponding to the request information.
In this step, the unified configuration center component discovers the target model service through the obtained analysis result, and determines the network location, port and other relevant information of the target model service.
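For illustration only, the following sketch shows the gateway-to-configuration-center handoff; the request format, the dependency table and the service addresses are assumptions.

```python
# A toy gateway handoff: parse the request, then resolve the ordered chain
# of target model services; all names and addresses are assumptions.
SERVICE_REGISTRY = {
    "model-service-1": "10.0.0.1:8001",
    "model-service-2": "10.0.0.2:8002",
    "model-service-3": "10.0.0.3:8003",
}
DEPENDENCY_CHAINS = {
    # parsed business scene -> ordered target model services (upstream first)
    "risk-control": ["model-service-1", "model-service-2", "model-service-3"],
}

def gateway_handle(request):
    """Gateway: parse the request; config center: resolve the dependency chain."""
    scene = request["scene"]          # the analysis result produced by the gateway
    chain = DEPENDENCY_CHAINS[scene]  # looked up via the unified configuration center
    return [(name, SERVICE_REGISTRY[name]) for name in chain]

print(gateway_handle({"scene": "risk-control"}))
```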
Based on any of the above embodiments, before the step 120, the method includes the following steps 1-2:
Step 1: performing micro-service processing on the feature engineering corresponding to the model to obtain a feature engineering service.
In this step, feature engineering refers to screening better data features out of the raw data through a series of engineering approaches, so as to improve the training effect of the model.
Step 2: determining the corresponding target feature engineering service through the target model service.
In this step, different model services correspond to different feature engineering services, and thus, corresponding target feature engineering services are determined by the target model services.
The determining the model parameter value in the step 120 specifically includes the following steps 3 to 4:
Step 3: obtaining the original data corresponding to the target feature engineering service, and inputting the original data into the target feature engineering service to extract features.
In this step, different feature engineering services correspond to different data sources in the peripheral system, and the original data are acquired from those data sources.
Step 4: processing the features through the target feature engineering service to determine the model parameter values.
In this step, the model parameter values of the current target model service are calculated by the preceding feature engineering service.
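For illustration only, a toy feature engineering service is sketched below, assuming raw records arrive as dicts; the field names and the derived features are assumptions.

```python
# A toy feature engineering service: screen features out of raw records
# and flatten them into model parameter values. Field names are assumptions.
import math

def feature_engineering_service(raw_records):
    """Screen features from raw data, then build the model's input parameters."""
    features = [
        {
            "amount_log": math.log1p(r["amount"]),                  # scaled amount
            "is_night": 1 if r["hour"] >= 22 or r["hour"] < 6 else 0,  # time flag
        }
        for r in raw_records
    ]
    # Flatten the screened features into the input-parameter vectors of the model
    return [[f["amount_log"], f["is_night"]] for f in features]

raw = [{"amount": 120.0, "hour": 23}, {"amount": 15.5, "hour": 10}]
print(feature_engineering_service(raw))  # model parameter values for the target model
```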
Further, to supplement the disclosure, FIG. 3 shows a block diagram of the method for model reasoning acceleration provided by the present disclosure, applied to an integrated model service system; as an example, the integrated model service system includes three models.
(1) Micro-service processing is performed on the models and the feature engineering in the integrated model service system to obtain model services and feature engineering services. Specifically, in the figure, model 1 corresponds to model service 1, model 2 corresponds to model service 2, model 3 corresponds to model service 3, feature engineering 1 corresponds to feature engineering service 1, and feature engineering 2 corresponds to feature engineering service 2.
(2) The Client initiates an HTTP-based request, and the integrated model service system acquires the request information through the gateway.
(3) The gateway component sends the analysis result to the unified configuration center component, and the unified configuration center component determines target model service 1, target model service 2 and target model service 3, which correspond to the request information and have a dependency relationship.
(4) Feature engineering 1 corresponds to data source 1 in the peripheral system, and feature engineering 2 corresponds to data sources 2 and 3. Data source 1 is input into feature engineering service 1 to obtain feature 1; data sources 2 and 3 are input into feature engineering service 2 to obtain feature 2 and feature 3. Model parameter value 1 of model service 1 is obtained from feature 1, and model parameter value 2 of model service 2 is obtained from feature 2 and feature 3.
(5) Since the batches corresponding to target model service 1, target model service 2 and target model service 3 differ, message queues are set between target model service 1, target model service 2 and target model service 3 to balance the processing capacity values among the three.
Specifically, message queue 1 is arranged between target model service 1 and target model service 2: target model service 1 sends the task data it generates, according to its own processing capacity value, to message queue 1, and target model service 2 acquires the corresponding task data from message queue 1 according to its own processing capacity value. Likewise, message queue 2 is arranged between target model service 2 and target model service 3: target model service 2 sends the task data it generates, according to its own processing capacity value, to message queue 2, and target model service 3 acquires the corresponding task data from message queue 2 according to its own processing capacity value. In this way the processing capacity values of target model service 1, target model service 2 and target model service 3 are balanced.
(6) The model parameter value 1 is input into the model service 1, the model parameter value 2 is input into the model service 2 for calculation to obtain a calculation result, and the calculation result is used as a model calculation result corresponding to the request information.
The model reasoning acceleration method solves the problem of delay of the integrated model in the wind control high concurrency scene, and improves the throughput of the whole system.
The apparatus for model inference acceleration provided in the present disclosure is described below, and the apparatus for model inference acceleration described below and the method for model inference acceleration described above may be referred to correspondingly to each other.
Referring to FIG. 4, which is a schematic structural diagram of the device for model reasoning acceleration; the device is provided in an integrated model service system and includes:
an acquisition module 410, configured to acquire request information and analyze the request information to determine a target model service with a dependency relationship corresponding to the request information, where the integrated model service system includes at least one model service;
A determining module 420, configured to determine a model parameter value, and process the model parameter value sequentially through at least one target model service with a dependency relationship and a message queue set therebetween, to obtain a final output model calculation result;
The message queues between every two target model services with the dependency relationship are used for receiving task data generated by a previous target model service according to the processing capacity value of the previous target model service, and sending the task data to a next target model service according to the processing capacity value of the next target model service.
The device for model reasoning acceleration provided by the present disclosure acquires request information and analyzes it to determine the target model services with a dependency relationship corresponding to the request information, determines model parameter values, and processes the model parameter values sequentially through at least one target model service with a dependency relationship and the message queues arranged between them, to obtain a finally output model calculation result. The message queue between every two target model services with a dependency relationship receives the task data generated by the previous target model service according to that service's own processing capacity value and sends the task data to the subsequent target model service according to the processing capacity value of the subsequent target model service, so that the processing capacity of the previous target model service and that of the subsequent target model service are balanced.
Based on any of the above embodiments, for each two target model services with dependencies, the determining module 420 includes:
A storage module, used for storing the task data processed by the previous target model service into the message queue in batches according to the processing capacity value of the previous target model service, until all task data corresponding to the previous target model service have been stored;
A processing module, used for fetching task data of the corresponding data quantity each time for processing according to the processing capacity value of the subsequent target model service, until all task data in the message queue have been fetched, and outputting the task data processed by the subsequent target model service.
Based on any of the above embodiments, the integrated model service system includes a unified configuration center component;
based on any of the above embodiments, the apparatus further comprises:
The first micro service processing module is used for carrying out micro service processing on the model to obtain model service, and registering the model service to the unified configuration center component.
Based on any of the above embodiments, the apparatus further comprises:
The output unit is used for periodically acquiring the system information of the target model service, inputting the system information into a trained processing capacity value regression prediction model, and outputting a new first processing capacity value corresponding to the target model service, wherein the trained processing capacity value regression prediction model is obtained based on sample system information training;
And the first synchronization unit is used for synchronizing the new first processing capacity value to the unified configuration center component, and updating the processing capacity value of the target model service into the new first processing capacity value through the unified configuration center component.
Based on any of the foregoing embodiments, in the case where the trained processing capacity value regression prediction model does not output a new first processing capacity value corresponding to a target model service, the apparatus further includes, after the processing module:
a second determining unit, configured to determine a new second processing capability value corresponding to the target model service through a dynamic programming algorithm;
And the second synchronization unit is used for synchronizing the new second processing capacity value to the unified configuration center component, and updating the processing capacity value of the target model service into the new second processing capacity value through the unified configuration center component.
Based on any of the above embodiments, the integrated model service system includes a gateway component;
The obtaining module 410 includes:
The analysis module is used for acquiring request information through the gateway component and analyzing the request information to acquire an analysis result;
A sending module, used for sending the analysis result to the unified configuration center component through the gateway component, and determining, through the unified configuration center component, the target model service with the dependency relationship corresponding to the request information.
Based on any of the above embodiments, the apparatus further comprises:
The second micro-service processing module is used for carrying out micro-service processing on the feature engineering corresponding to the model to obtain feature engineering service;
the first determining unit is used for determining a corresponding target feature engineering service through the target model service;
the determining module 420 is specifically configured to:
Acquiring original data corresponding to the target feature engineering service, and inputting the original data into the target feature engineering service to extract features;
and processing the features through the target feature engineering service to determine the model parameter values.
Fig. 5 illustrates a physical schematic diagram of an electronic device. As shown in FIG. 5, the electronic device may include a processor (processor) 510, a communication interface (Communications Interface) 520, a memory (memory) 530 and a communication bus 540, where the processor 510, the communication interface 520 and the memory 530 communicate with each other through the communication bus 540. The processor 510 may call logic instructions in the memory 530 to execute the method for model reasoning acceleration for an integrated model service system, where the method includes: obtaining request information and analyzing it to determine the target model services with a dependency relationship corresponding to the request information, the integrated model service system including at least one model service; determining model parameter values; and processing the model parameter values sequentially via at least one target model service with a dependency relationship and the message queues set between them, to obtain a finally output model calculation result, where the message queue between every two target model services with a dependency relationship receives the task data generated by the previous target model service according to its own processing capacity value and sends the task data to the subsequent target model service according to the processing capacity value of the subsequent target model service.
Further, the logic instructions in the memory 530 described above may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present disclosure may be embodied in essence or a part contributing to the prior art or a part of the technical solution, or in the form of a software product stored in a storage medium, including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method described in the embodiments of the present disclosure. The storage medium includes a U disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, an optical disk, or other various media capable of storing program codes.
In another aspect, the present disclosure further provides a computer program product. The computer program product includes a computer program that can be stored on a non-transitory computer readable storage medium; when the computer program is executed by a processor, the computer can execute the model reasoning acceleration method provided by the above methods for an integrated model service system, where the method includes: obtaining request information and analyzing it to determine the target model services with a dependency relationship corresponding to the request information, the integrated model service system including at least one model service; determining model parameter values; and processing the model parameter values sequentially through at least one target model service with a dependency relationship and the message queues set between them, to obtain a finally output model calculation result, where the message queue between every two target model services with a dependency relationship receives the task data generated by the previous target model service according to its own processing capacity value and sends the task data to the subsequent target model service according to the processing capacity value of the subsequent target model service.
In yet another aspect, the present disclosure further provides a non-transitory computer readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the method for model reasoning acceleration provided by the above methods for an integrated model service system, where the method includes: obtaining request information and analyzing it to determine the target model services with a dependency relationship corresponding to the request information, the integrated model service system including at least one model service; determining model parameter values; and processing the model parameter values sequentially through at least one target model service with a dependency relationship and the message queues set between them, to obtain a finally output model calculation result, where the message queue between every two target model services with a dependency relationship receives the task data generated by the previous target model service according to its own processing capacity value and sends the task data to the subsequent target model service according to the processing capacity value of the subsequent target model service.
The apparatus embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the solution without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that the foregoing embodiments are merely illustrative of the technical solutions of the present disclosure, and not limiting thereof, and although the present disclosure has been described in detail with reference to the foregoing embodiments, it should be understood by those skilled in the art that modifications may be made to the technical solutions described in the foregoing embodiments or equivalents may be substituted for some of the technical features thereof, and these modifications or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure in essence.

Claims (11)

1. A method for model inference acceleration for an integrated model services system, the method comprising:
Acquiring request information, analyzing the request information, and determining a target model service with a dependency relationship corresponding to the request information, wherein the integrated model service system comprises at least one model service;
Determining model parameter values, and processing the model parameter values sequentially through at least one target model service with a dependency relationship and a message queue arranged between the target model services to obtain a finally output model calculation result;
The message queues between every two target model services with the dependency relationship are used for receiving task data generated by a previous target model service according to the processing capacity value of the previous target model service, and sending the task data to a next target model service according to the processing capacity value of the next target model service.
2. The method for accelerating model reasoning according to claim 1, wherein for every two target model services with a dependency relationship, the processing the model parameter values sequentially via at least one target model service with a dependency relationship and a message queue set therebetween comprises:
according to the processing capacity value of the previous target model service, storing the task data processed by the previous target model service into the message queue in batches until all task data corresponding to the previous target model service have been stored;
and according to the processing capacity value of the subsequent target model service, fetching task data of the corresponding data quantity each time for processing, until all task data in the message queue have been fetched, and outputting the task data processed by the subsequent target model service.
3. The method of model inference acceleration according to claim 2, characterized in that the integrated model service system comprises a unified configuration center component;
Before the request information is acquired and analyzed, the method further includes:
and carrying out micro-service processing on the model to obtain model service, and registering the model service to the unified configuration center component.
4. A method of model inference acceleration as set forth in claim 3, wherein after said outputting the task data processed by the subsequent target model service, the method further comprises:
Periodically acquiring system information of the target model service, inputting the system information into a trained processing capacity value regression prediction model, and outputting a new first processing capacity value corresponding to the target model service, wherein the trained processing capacity value regression prediction model is obtained by training on sample system information;
And synchronizing the new first processing capacity value to the unified configuration center component, and updating the processing capacity value of the target model service into the new first processing capacity value through the unified configuration center component.
5. The method of model inference acceleration according to claim 4, wherein, in the case that the trained processing capacity value regression prediction model does not output a new first processing capacity value corresponding to a target model service, the method further comprises, after outputting the task data processed by the subsequent target model service:
determining a new second processing capacity value corresponding to the target model service through a dynamic programming algorithm;
And synchronizing the new second processing capacity value to the unified configuration center component, and updating the processing capacity value of the target model service into the new second processing capacity value through the unified configuration center component.
6. A method of model inference acceleration as set forth in claim 3, wherein the integrated model service system includes a gateway component;
The obtaining request information and analyzing the request information, and determining the target model service with the dependency relationship corresponding to the request information comprises the following steps:
acquiring request information through the gateway component, analyzing the request information, and acquiring an analysis result;
And sending the analysis result to the unified configuration center component through the gateway component, and determining the target model service with the dependency relationship corresponding to the request information through the unified configuration center component.
7. The method for model inference acceleration as set forth in claim 1, characterized in that,
Before determining the model parameter values, the method comprises the following steps:
Carrying out micro-service processing on the feature engineering corresponding to the model to obtain a feature engineering service;
Determining a corresponding target feature engineering service through the target model service;
The determining of the model parameter values includes:
Acquiring original data corresponding to the target feature engineering service, and inputting the original data into the target feature engineering service to extract features;
and processing the features through the target feature engineering service to determine the model parameter values.
8. An apparatus for model reasoning acceleration, disposed in an integrated model service system, the apparatus comprising:
An acquisition module, used for acquiring request information and analyzing the request information to determine a target model service with a dependency relationship corresponding to the request information, wherein the integrated model service system comprises at least one model service;
A determining module, used for determining model parameter values, and processing the model parameter values sequentially through at least one target model service with a dependency relationship and a message queue arranged between the target model services to obtain a finally output model calculation result;
The message queues between every two target model services with the dependency relationship are used for receiving task data generated by a previous target model service according to the processing capacity value of the previous target model service, and sending the task data to a next target model service according to the processing capacity value of the next target model service.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements a method of model inference acceleration as claimed in any one of claims 1 to 7 when executing the program.
10. A non-transitory computer readable storage medium, having stored thereon a computer program, which when executed by a processor, implements a method of model inference acceleration as claimed in any one of claims 1 to 7.
11. A computer program product comprising a computer program which, when executed by a processor, implements a method of model inference acceleration as claimed in any one of claims 1 to 7.
CN202210741859.2A 2022-06-27 2022-06-27 A method and device for accelerating model reasoning Active CN115130674B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210741859.2A CN115130674B (en) 2022-06-27 2022-06-27 A method and device for accelerating model reasoning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210741859.2A CN115130674B (en) 2022-06-27 2022-06-27 A method and device for accelerating model reasoning

Publications (2)

Publication Number Publication Date
CN115130674A CN115130674A (en) 2022-09-30
CN115130674B true CN115130674B (en) 2025-04-18

Family

ID=83379516

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210741859.2A Active CN115130674B (en) 2022-06-27 2022-06-27 A method and device for accelerating model reasoning

Country Status (1)

Country Link
CN (1) CN115130674B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110162414A (en) * 2019-02-01 2019-08-23 腾讯科技(深圳)有限公司 The method and device of artificial intelligence service is realized based on micro services framework
CN114615521A (en) * 2022-03-10 2022-06-10 网易(杭州)网络有限公司 Video processing method and device, computer readable storage medium and electronic equipment

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4257857B2 (en) * 2004-09-22 2009-04-22 インターナショナル・ビジネス・マシーンズ・コーポレーション Data processing system and data processing method
CN102938731B (en) * 2012-11-22 2015-01-21 北京锐易特软件技术有限公司 Exchange and integration device and method based on proxy cache adaptation model
CN105678398A (en) * 2015-12-24 2016-06-15 国家电网公司 Power load forecasting method based on big data technology, and research and application system based on method
US10528863B2 (en) * 2016-04-01 2020-01-07 Numenta, Inc. Feedback mechanisms in sequence learning systems with temporal processing capability
CN108510082B (en) * 2018-03-27 2022-11-11 苏宁易购集团股份有限公司 Method and device for processing machine learning model
CN111262795B (en) * 2020-01-08 2024-02-06 京东科技控股股份有限公司 Service interface-based current limiting method and device, electronic equipment and storage medium
CN111913818B (en) * 2020-08-07 2022-12-02 平安科技(深圳)有限公司 Method for determining dependency relationship between services and related device
CN112270410A (en) * 2020-10-19 2021-01-26 北京达佳互联信息技术有限公司 Online reasoning service system, method and device for providing online reasoning service
CN113806058B (en) * 2021-10-09 2024-10-18 京东科技控股股份有限公司 Task management method and device, storage medium and electronic equipment

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110162414A (en) * 2019-02-01 2019-08-23 腾讯科技(深圳)有限公司 The method and device of artificial intelligence service is realized based on micro services framework
CN114615521A (en) * 2022-03-10 2022-06-10 网易(杭州)网络有限公司 Video processing method and device, computer readable storage medium and electronic equipment

Also Published As

Publication number Publication date
CN115130674A (en) 2022-09-30

Similar Documents

Publication Publication Date Title
CN111768008B (en) Federal learning method, apparatus, device, and storage medium
US20170286861A1 (en) Structured machine learning framework
CN115249073A (en) A federated learning method and device
CN110895506B (en) Method and system for constructing test data
CN114997401B (en) Adaptive inference acceleration method, apparatus, computer device, and storage medium
CN112783632A (en) Stream calculation system, method, electronic device, and readable storage medium
CN114706675A (en) Task deployment method and device based on cloud-edge collaborative system
CN115037625B (en) Network slice processing method and device, electronic equipment and readable storage medium
CN111400007A (en) Task scheduling method and system based on edge calculation
CN110781180B (en) Data screening method and data screening device
EP4170974A1 (en) Slice service processing method and apparatus, network device, and readable storage medium
CN114531448A (en) Calculation force determination method and device and calculation force sharing system
WO2024121612A1 (en) Method and apparatus for scheduling and assigning tasks to computing resources
Kanwal et al. A genetic based leader election algorithm for IoT cloud data processing
Feng et al. FedDD: Toward communication-efficient federated learning with differential parameter dropout
CN116149272B (en) A cloud-edge collaborative production line monitoring method, device and system
CN116755886A (en) Method, device, storage medium and system for forwarding and calculating force task
Tirana et al. Workflow optimization for parallel split learning
CN115130674B (en) A method and device for accelerating model reasoning
CN109905481A (en) Construction of a QoS model based on RTI-DDS and a method for predicting the running performance of QoS strategy scheme
CN113296991B (en) Abnormality detection method and device
CN111539281B (en) Distributed face recognition method and system
CN118972433A (en) Edge gateway system, data processing method of edge gateway system
Pandey et al. Here, there, anywhere: Profiling-driven services to tame the heterogeneity of edge applications
EP2209282A1 (en) A method, device and computer program product for service balancing in an electronic communications system

Legal Events

Date Code Title Description
PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant