CN111967613B - NLP model training and publishing recognition system

Info

Publication number: CN111967613B
Application number: CN202010853842.7A
Authority: CN
Inventors: 陈继扬; 王磊
Original assignee: Zhejiang Baiying Technology Co Ltd
Current assignee: Zhejiang Baiying Technology Co Ltd
Priority date: 2020-08-24
Filing date: 2020-08-24
Publication date: 2023-06-16
Anticipated expiration: 2040-08-24
Also published as: CN111967613A

Abstract

The invention discloses an NLP model training and publishing recognition system, comprising: at least two GPU servers; an NLP recognition model; an NLP semantic recognition service process; at least one NLP gateway; at least two GPU server resource scheduling instruction executors, which execute scheduling instructions initiated by a resource scheduling center module; the resource scheduling center module, which allocates GPU server resources and coordinates the executors in executing instructions during the following processes: training of the NLP recognition model, release of the NLP recognition model, and synchronous change of the relationship between the NLP recognition model and business data; and a service registry module, which records service information in the registry when an NLP semantic recognition service process starts, deletes it when the process stops, and allows the NLP gateway to discover service processes automatically.

Description

NLP model training and publishing recognition system

Technical field

The invention relates to the field of model training, and in particular to an NLP model training and publishing recognition system.

Background

NLP (Natural Language Processing) is a subfield of artificial intelligence (AI). Before a conventional NLP service can expose a working interface, it must go through three processes: training the model, releasing the model, and starting the model. In current industry practice, all three require developers to operate manually on the server. Every release or training run of an NLP semantic recognition model therefore carries a risk of misoperation, and human error can easily cause failures.

In addition, once the NLP semantic recognition service process has started, it must be configured manually. For example, to expose a service call interface externally, the relationship between the nginx service interface routes, the NLP recognition model, and the service process must be configured. The procedure is cumbersome and mechanically repetitive, which makes it ill-suited to routine operation and maintenance work.

Summary of the invention

The technical problem to be solved by the present invention is to provide an NLP model training and publishing recognition system that simplifies the configuration of NLP semantic recognition services, improves the training efficiency of NLP recognition models, reduces the risk of incorrect or missing configuration, and provides service users with a stable NLP semantic recognition service.

To achieve the above object, the present invention adopts the following technical solution:

The invention provides an NLP model training and publishing recognition system, the system comprising:

at least two GPU servers, the GPU servers being used for machine learning computation;

an NLP recognition model, the NLP recognition model being a model file that is obtained by fitting an algorithm to a business text corpus and is used to start the NLP semantic recognition service;

an NLP semantic recognition service process, the NLP semantic recognition service process being a server system process that is based on the NLP recognition model file obtained after training on the business text corpus and that provides the NLP semantic recognition service externally;

at least one NLP gateway, the NLP gateway being used to route and forward requests from business callers, according to preset rules, to the server hosting an NLP semantic recognition service process that has started successfully and is serving externally, where the corresponding NLP semantic recognition service performs the recognition;

at least two GPU server resource scheduling instruction executors, the GPU server resource scheduling instruction executors being used to execute scheduling instructions initiated by the resource scheduling center module;

at least one resource scheduling center module, the resource scheduling center module being used to allocate GPU server resources and to coordinate the executors in executing instructions during the following processes:

training of the NLP recognition model, release of the NLP recognition model, and synchronous change of the relationship between the NLP recognition model and business data;

at least one service registry module, the service registry module being used to record service information in the registry when an NLP semantic recognition service process starts and to delete it when the process stops, and being used by the NLP gateway to discover service processes automatically.

In the above solution, the NLP gateway dynamically discovers, through the service registry module, the NLP semantic recognition service processes that are ready to provide services externally.

In the above solution, the NLP semantic recognition service process depends on the model file produced when training of the NLP recognition model completes.

In the above solution, the resource scheduling center module generates the following operation instructions and orchestration modes during training of the NLP recognition model, release of the NLP recognition model, and synchronous change of the relationship between the NLP recognition model and business data:

a training instruction, a service start instruction, a service shutdown instruction, and a relationship synchronization instruction;

a training orchestration mode, a model release orchestration mode, a model shutdown orchestration mode, and a relationship synchronization mode;

the training instruction is generated in the training orchestration mode: after the resource scheduling center module allocates a GPU server for training the NLP recognition model, it calls the corresponding GPU server resource scheduling instruction executor to perform model training;

the service start instruction and the service shutdown instruction are used in the model release orchestration mode: according to the release method of a single scheduling request, the resource scheduling center module generates process-level service start or service shutdown instructions for a GPU server, and starts or shuts down the NLP semantic recognition service process on that specific GPU server; the service shutdown instruction is also used in the model shutdown orchestration mode to shut down a process that is currently providing service on a GPU server;

the relationship synchronization instruction is generated in the relationship synchronization mode and is used to shut down the corresponding NLP semantic recognition service process.
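As an illustration only, the pairing of instructions and orchestration modes described above could be held in a small lookup table. The Python names below (`Instruction`, `Mode`, `MODE_INSTRUCTIONS`, `allowed`) are hypothetical and do not appear in the patent; the mapping simply restates which mode generates or uses which instruction.

```python
from enum import Enum


class Instruction(Enum):
    """The four operation instructions named in the text (identifiers are illustrative)."""
    TRAIN = "train"
    SERVICE_START = "service_start"
    SERVICE_SHUTDOWN = "service_shutdown"
    RELATION_SYNC = "relation_sync"


class Mode(Enum):
    """The four orchestration modes named in the text (identifiers are illustrative)."""
    TRAINING = "training"
    MODEL_RELEASE = "model_release"
    MODEL_SHUTDOWN = "model_shutdown"
    RELATION_SYNC = "relation_sync"


# Which instructions each orchestration mode may generate or use.
MODE_INSTRUCTIONS = {
    Mode.TRAINING: {Instruction.TRAIN},
    Mode.MODEL_RELEASE: {Instruction.SERVICE_START, Instruction.SERVICE_SHUTDOWN},
    Mode.MODEL_SHUTDOWN: {Instruction.SERVICE_SHUTDOWN},
    Mode.RELATION_SYNC: {Instruction.RELATION_SYNC},
}


def allowed(mode: Mode, instruction: Instruction) -> bool:
    """Return True if the given mode may issue the given instruction."""
    return instruction in MODE_INSTRUCTIONS[mode]
```

A scheduler built along these lines could reject, for example, a training instruction issued while it is in the model shutdown mode.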

In the above solution, shutting down an NLP semantic recognition service process by the resource scheduling center module includes:

generating a shutdown scheduling record for the NLP semantic recognition service process, with status "created";

querying, by the ip and port of the NLP semantic recognition service process whose shutdown is requested, the corresponding NLP semantic recognition service process, its status, and the industry id (a business-level concept) under which the process runs;

when the NLP semantic recognition service process is in the running state, acquiring the GPU server resource operation lock corresponding to the ip and port of the process;

removing the corresponding registered NLP semantic recognition service process from the registry according to the queried industry id, ip, and port of the process;

calling the CPU server resource scheduling instruction executor to shut down the NLP semantic recognition service process;

when the call succeeds, releasing the GPU server resource operation lock, updating the shutdown scheduling record status to "success", and marking the NLP service process resource as idle.

In the above solution, shutting down an NLP semantic recognition service process by the resource scheduling center module further includes the following step:

when the call to the CPU server resource scheduling instruction executor fails, releasing the GPU server resource operation lock, updating the shutdown scheduling record status to "failed", and attaching the failure reason.

In the above solution, the system is also used for scale-out release of NLP service processes, including:

obtaining the URL address, industry id, and scene id of the trained NLP recognition model, together with the number of processes to add in the scale-out release;

generating, according to the industry id and scene id, scale-out scheduling records corresponding to the scale-out quantity, and updating the status of the scale-out scheduling records to "created";

querying for idle NLP semantic recognition service processes and acquiring the GPU server resource operation lock corresponding to the ip and port of each;

calling the CPU server resource scheduling instruction executor to perform the release operation;

updating the scheduling status of the NLP semantic recognition service process to "success" and "running";

registering the successfully scheduled NLP semantic recognition service process with the registry and opening it to traffic;

marking the scheduling of the NLP semantic recognition service process resource as successful, and releasing the GPU server resource operation lock.

In the above solution, the scale-out release of NLP service processes further includes the following steps:

when the query finds zero idle NLP semantic recognition service processes, updating the scale-out scheduling record status to "scale-out release failed" and marking the cause as insufficient resources;

before acquiring the GPU server resource operation lock corresponding to the ip and port of an idle NLP semantic recognition service process, checking whether the NLP service processes have already been scaled out to a preset threshold and, if so, ending the scale-out release;

if the call to the CPU server resource scheduling instruction executor for the release operation fails, updating the scale-out scheduling record status to "scale-out release failed" and marking the cause as an execution failure of the CPU server resource scheduling instruction executor.

In the above solution, the system is also used for rolling release of NLP service processes, including:

generating a rolling scheduling record for the rolling release of the NLP semantic recognition service processes, and updating the status of the rolling scheduling record to "created";

querying for and acquiring idle NLP semantic recognition service process resources;

updating the successfully acquired idle NLP semantic recognition service process resources to the locked state;

calling the CPU server resource scheduling instruction executor to perform a scale-out release of the NLP semantic recognition service process;

updating the scale-out scheduling record status to "success", performing rolling release orchestration and scheduling for the industry id and scene id targeted by the rolling release, and generating a rolling release scheduling record;

continuing the rolling release of NLP semantic recognition service processes until the list of processes awaiting rolling release has been fully traversed.

In the above solution, the rolling release of NLP service processes further includes the following steps: after the rolling release traversal is complete, locking the server process resource, deregistering the corresponding service, and calling the CPU server resource scheduling instruction executor to shut down the corresponding service.

In the above solution, relationship synchronization of NLP semantic recognition service processes by the resource scheduling center module includes:

obtaining the URL address of the relationship file of the trained NLP recognition model, the industry id, and the scene id;

querying the running NLP semantic recognition service processes by industry id and scene id;

calling the CPU server resource scheduling instruction executor to perform relationship synchronization;

logging the synchronization status of the relationship between the NLP recognition model and the business scene.

In the above solution, training of the NLP recognition model by the resource scheduling center module includes:

obtaining the address of the training corpus for the NLP recognition model, the URL address to which the model is uploaded once training completes, and the industry id and scene id associated with the NLP recognition model;

generating a training scheduling record and setting it to the "created" state;

querying for server resources available for training and locking them;

updating the locked training server resources to the locked state;

generating a training record and marking the training scheduling record status as "scheduling";

calling the CPU server resource scheduling instruction executor to execute the training instruction, and updating the training scheduling record to "training";

releasing the locked server resources, and waiting for the CPU server resource scheduling instruction executor to finish executing the training instruction and call back with the training result;

obtaining the trained NLP recognition model.

In the above solution, training of the NLP recognition model by the resource scheduling center module further includes:

when the call to the CPU server resource scheduling instruction executor to execute the training instruction fails, releasing the locked server resources and marking the training scheduling record as failed.

The beneficial effects of the present invention are:

The invention provides a model training and publishing recognition system. By simplifying NLP service configuration, it effectively reduces how often manual intervention is needed, lowers the failure rate caused by manual operation, and improves release and training efficiency. It can also automatically discover NLP recognition models that have finished training and expose them externally as an NLP interface service.

Description of the drawings

FIG. 1 is a schematic structural diagram of the model training and publishing recognition system provided by an embodiment of the present invention;

FIG. 2 is a schematic flow diagram of the resource scheduling center module shutting down an NLP semantic recognition service process, in an example provided by an embodiment of the present invention;

FIG. 3 is a schematic flow diagram of the model training and publishing recognition system performing a scale-out release of NLP service processes, in an example provided by an embodiment of the present invention;

FIG. 4 is a schematic flow diagram of the model training and publishing recognition system performing a rolling release of NLP service processes, in an example provided by an embodiment of the present invention;

FIG. 5 is a schematic flow diagram of the resource scheduling center module performing relationship synchronization of NLP semantic recognition service processes, in an example provided by an embodiment of the present invention;

FIG. 6 is a schematic flow diagram of the resource scheduling center module training the NLP recognition model, in an example provided by an embodiment of the present invention.

Detailed description

The technical solution of the present invention is described in further detail below through specific embodiments and with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.

The embodiments of the present invention provide a model training and publishing recognition system that simplifies NLP service configuration, improves the efficiency of releasing and training NLP recognition models, and provides an NLP interface service externally.

The technical solutions provided by the embodiments of the present invention are described in detail below with reference to the accompanying drawings.

An embodiment of the present invention provides a model training and publishing recognition system. As shown in FIG. 1, the system includes:

at least two GPU servers 10, the GPU servers 10 being used for machine learning computation;

an NLP recognition model 20, the NLP recognition model 20 being a model file that is obtained by fitting an algorithm to a business text corpus and is used to start the NLP semantic recognition service;

an NLP semantic recognition service process 30, the NLP semantic recognition service process 30 being a server system process that is based on the NLP recognition model file obtained after training on the business text corpus and that provides the NLP semantic recognition service externally;

at least one NLP gateway 40, the NLP gateway 40 being used to route and forward requests from business callers, according to preset rules, to the server hosting an NLP semantic recognition service process 30 that has started successfully and is serving externally, where the corresponding NLP semantic recognition service performs the recognition;

at least two GPU server resource scheduling instruction executors 50, the GPU server resource scheduling instruction executors 50 being used to execute scheduling instructions initiated by the resource scheduling center module 60;

at least one resource scheduling center module 60, the resource scheduling center module 60 being used to allocate GPU server 10 resources and to coordinate the executors in executing instructions during the following processes:

training of the NLP recognition model 20, release of the NLP recognition model 20, and synchronous change of the relationship between the NLP recognition model 20 and business data;

the resource scheduling center module 60 generates the following operation instructions and orchestration modes during training of the NLP recognition model 20, release of the NLP recognition model 20, and synchronous change of the relationship between the NLP recognition model 20 and business data:

the training instruction is generated in the training orchestration mode: after the resource scheduling center module 60 allocates a GPU server 10 for training the NLP recognition model 20, it calls the corresponding GPU server resource scheduling instruction executor 50 to perform model training;

the service start instruction and the service shutdown instruction are used in the model release orchestration mode: according to the release method of a single scheduling request, the resource scheduling center module 60 generates process-level service start or service shutdown instructions for a GPU server 10, and starts or shuts down the NLP semantic recognition service process 30 on that specific GPU server 10; the service shutdown instruction is also used in the model shutdown orchestration mode to shut down a process that is currently providing service on a GPU server 10;

the relationship synchronization instruction is generated in the relationship synchronization mode and is used to shut down the corresponding NLP semantic recognition service process 30;

at least one service registry module 70, the service registry module 70 being used to record service information in the registry when an NLP semantic recognition service process 30 starts and to delete it when the process stops, and being used by the NLP gateway 40 to discover service processes automatically.

In one example, the NLP gateway 40 dynamically discovers, through the service registry module 70, the NLP semantic recognition service processes 30 that are ready to provide services externally.
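A minimal sketch of this self-discovery, under stated assumptions: the registry is modelled as an in-memory map keyed by industry id (the shutdown flow removes registry entries by industry id, ip, and port), and the gateway's "preset rules" are stood in for by simple round-robin selection, which the patent does not specify. The class and method names are hypothetical.

```python
import itertools
from collections import defaultdict


class ServiceRegistry:
    """In-memory stand-in for the service registry module (assumed structure)."""

    def __init__(self):
        self._services = defaultdict(list)  # industry_id -> [(ip, port), ...]

    def register(self, industry_id, ip, port):
        # Called when a service process starts.
        if (ip, port) not in self._services[industry_id]:
            self._services[industry_id].append((ip, port))

    def deregister(self, industry_id, ip, port):
        # Called when a service process stops.
        if (ip, port) in self._services[industry_id]:
            self._services[industry_id].remove((ip, port))

    def lookup(self, industry_id):
        return list(self._services.get(industry_id, []))


class NlpGateway:
    """Routes each request to a currently registered process (round-robin is an assumption)."""

    def __init__(self, registry):
        self._registry = registry
        self._counter = itertools.count()

    def route(self, industry_id):
        endpoints = self._registry.lookup(industry_id)
        if not endpoints:
            raise LookupError(f"no running service process for {industry_id!r}")
        return endpoints[next(self._counter) % len(endpoints)]
```

Because the gateway consults the registry on every request, a process that starts or stops is picked up or dropped without any manual route configuration.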

In one example, the NLP semantic recognition service process 30 depends on the model file produced when training of the NLP recognition model 20 completes.

In one example, as shown in FIG. 2, shutting down an NLP semantic recognition service process by the resource scheduling center module includes:

S201: generate a shutdown scheduling record for the NLP semantic recognition service process, with status "created";

S202: query, by the ip and port of the NLP semantic recognition service process whose shutdown is requested, the corresponding NLP semantic recognition service process, its status, and the industry id (a business-level concept) under which the process runs;

S203: when the NLP semantic recognition service process is in the running state, acquire the GPU server resource operation lock corresponding to the ip and port of the process;

S204: remove the corresponding registered NLP semantic recognition service process from the registry according to the queried industry id, ip, and port of the process;

S205: call the CPU server resource scheduling instruction executor to shut down the NLP semantic recognition service process;

For step S205, when the call to the CPU server resource scheduling instruction executor fails, release the GPU server resource operation lock, update the shutdown scheduling record status to "failed", and attach the failure reason.

S206: when the call succeeds, release the GPU server resource operation lock, update the shutdown scheduling record status to "success", and mark the NLP service process resource as idle.
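Steps S201 to S206 can be sketched roughly as follows. This is an illustrative reading, not the patented implementation: `ServiceProcess`, `ShutdownRecord`, and `shutdown` are hypothetical names, the registry is a plain set of `(industry_id, ip, port)` tuples, the executor is any callable that raises on failure, and treating a non-running process as a failed request is an assumption the text does not cover.

```python
import threading
from dataclasses import dataclass


@dataclass
class ServiceProcess:
    ip: str
    port: int
    industry_id: str
    status: str = "running"  # "running" or "idle"


@dataclass
class ShutdownRecord:
    status: str = "created"  # S201: record starts in the "created" state
    reason: str = ""


def shutdown(proc, registry, executor, locks):
    """Shut down one service process following S201-S206.

    registry: set of (industry_id, ip, port) tuples (assumed structure)
    executor: callable(ip, port) that raises on failure (assumed interface)
    locks:    dict mapping (ip, port) to its operation lock
    """
    record = ShutdownRecord()                                   # S201
    if proc.status != "running":                                # S202/S203 precondition
        record.status = "failed"                                # assumption: not covered by the text
        record.reason = "process not running"
        return record
    lock = locks.setdefault((proc.ip, proc.port), threading.Lock())
    with lock:                                                  # S203; released on every exit path
        registry.discard((proc.industry_id, proc.ip, proc.port))  # S204
        try:
            executor(proc.ip, proc.port)                        # S205
        except Exception as exc:                                # failure branch of S205
            record.status, record.reason = "failed", str(exc)
            return record
        record.status = "success"                               # S206
        proc.status = "idle"
    return record
```

Deregistering before stopping the process (S204 ahead of S205) means the gateway stops routing traffic to the process before it goes away.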

In one example, as shown in FIG. 3, the system is also used for scale-out release of NLP service processes, including:

S301: obtain the URL address, industry id, and scene id of the trained NLP recognition model, together with the number of processes to add in the scale-out release;

S302: generate, according to the industry id and scene id, scale-out scheduling records corresponding to the scale-out quantity, and update the status of the scale-out scheduling records to "created";

S303: query for idle NLP semantic recognition service processes and acquire the GPU server resource operation lock corresponding to the ip and port of each;

For step S303, when the query finds zero idle NLP semantic recognition service processes, update the scale-out scheduling record status to "scale-out release failed" and mark the cause as insufficient resources.

S304: call the GPU server resource scheduling instruction executor to perform the release operation;

For step S304, if the call to the GPU server resource scheduling instruction executor for the release operation fails, update the scale-out scheduling record status to "scale-out release failed" and mark the cause as an execution failure of the GPU server resource scheduling instruction executor.

S305: update the scheduling status of the NLP semantic recognition service process to "success" and "running";

S306: register the successfully scheduled NLP semantic recognition service process with the registry and open it to traffic;

S307: mark the scheduling of the NLP semantic recognition service process resource as successful, and release the GPU server resource operation lock.

In one example, before acquiring the GPU server resource operation lock corresponding to the ip and port of an idle NLP semantic recognition service process, check whether the NLP service processes have already been scaled out to a preset threshold and, if so, end the scale-out release.
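The scale-out flow, including the threshold check and the two failure branches, might look like the sketch below. All names (`Slot`, `scale_out`, `max_running`) are hypothetical; the executor is any callable that raises on failure; and releasing the lock after an executor failure is an assumption added so that the slot is not left locked, since the text does not state it.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Slot:
    """A pre-provisioned service process slot on a GPU server (assumed model)."""
    ip: str
    port: int
    status: str = "idle"
    lock: threading.Lock = field(default_factory=threading.Lock)


def scale_out(model_url, industry_id, scene_id, count, slots, registry,
              executor, running=0, max_running=8):
    """Release `count` additional processes following S301-S307."""
    records = [{"industry_id": industry_id, "scene_id": scene_id,
                "status": "created", "reason": ""}
               for _ in range(count)]                       # S302
    for record in records:
        if running >= max_running:                          # preset threshold reached: stop
            break
        slot = next((s for s in slots
                     if s.status == "idle" and s.lock.acquire(blocking=False)),
                    None)                                   # S303
        if slot is None:                                    # no idle process left
            record.update(status="failed", reason="insufficient resources")
            continue
        try:
            executor(slot.ip, slot.port, model_url)         # S304
        except Exception:
            record.update(status="failed",
                          reason="executor execution failed")
        else:
            slot.status = "running"                         # S305
            record["status"] = "success"
            registry.add((industry_id, scene_id, slot.ip, slot.port))  # S306
            running += 1
        finally:
            slot.lock.release()                             # S307 (also on failure: assumption)
    return records
```

Records left in the "created" state after the loop are those skipped because the threshold was reached; how such records should be labelled is not specified in the text.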

In one example, as shown in FIG. 4, the system is also used for rolling release of NLP service processes, including:

S401: generate a rolling scheduling record for the rolling release of the NLP semantic recognition service processes, and update the status of the rolling scheduling record to "created";

S402: query for and acquire idle NLP semantic recognition service process resources;

S403: update the successfully acquired idle NLP semantic recognition service process resources to the locked state;

S404: call the CPU server resource scheduling instruction executor to perform a scale-out release of the NLP semantic recognition service process;

S405: update the scale-out scheduling record status to "success", perform rolling release orchestration and scheduling for the industry id and scene id targeted by the rolling release, and generate a rolling release scheduling record;

S406: continue the rolling release of NLP semantic recognition service processes until the list of processes awaiting rolling release has been fully traversed.

For step S406, after the rolling release traversal is complete, lock the server process resource, deregister the corresponding service, and call the CPU server resource scheduling instruction executor to shut down the corresponding service.
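One possible reading of the rolling release is sketched below: for each process awaiting replacement, a new process is first brought up on an idle slot and registered, and only after the whole list has been traversed are the old processes deregistered and stopped. The function and parameter names are hypothetical, resource locks are omitted for brevity, and marking the record "failed" when idle slots run out is an assumption.

```python
def rolling_release(old_procs, idle_slots, registry, start, stop):
    """Replace each process in old_procs with a new one started on an idle slot.

    old_procs, idle_slots: lists of (ip, port) tuples (assumed representation)
    registry:              set of (ip, port) tuples currently receiving traffic
    start, stop:           callables standing in for the instruction executor
    """
    record = {"status": "created", "replaced": []}      # S401
    idle = list(idle_slots)
    for old in old_procs:                               # S406: traverse the pending list
        if not idle:                                    # assumption: no slot left means failure
            record["status"] = "failed"
            break
        new = idle.pop(0)                               # S402/S403: take an idle slot
        start(new)                                      # S404: scale-out release on that slot
        registry.add(new)                               # S405: new process takes traffic
        record["replaced"].append((old, new))
    for old, _new in record["replaced"]:                # after the traversal completes
        registry.discard(old)                           # deregister the old service
        stop(old)                                       # shut the old process down
    if record["status"] == "created":
        record["status"] = "success"
    return record
```

Starting replacements before stopping the old processes keeps at least the original number of processes registered throughout the release.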

In one example, as shown in FIG. 5, relationship synchronization of the NLP semantic recognition service process by the resource scheduling center module includes:

S501: obtain the relationship file URL, industry id and scene id of the trained NLP recognition model;

S502: query the running NLP semantic recognition service processes by industry id and scene id;

S503: call the GPU server resource scheduling instruction executor to perform relationship synchronization;

S504: log the synchronization status of the relationship between the NLP recognition model and the business scenario.
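Steps S501 to S504 amount to a filter followed by a fan-out call and a log entry per target. The following Python sketch illustrates that shape only; `ServiceProcess`, `sync_relations` and the `send_sync` callback (standing in for the call to the resource scheduling instruction executor) are hypothetical names, and treating an exception from the callback as a failed synchronization is an assumption.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("relation_sync")


@dataclass(frozen=True)
class ServiceProcess:
    """A known NLP semantic recognition service process (hypothetical model)."""
    ip: str
    port: int
    industry_id: int
    scene_id: int
    running: bool


def sync_relations(relation_url, industry_id, scene_id, processes, send_sync):
    """Push the relationship file URL to every matching running process."""
    # S501: relation_url, industry_id and scene_id are the inputs.
    # S502: select the running processes for this industry id and scene id.
    targets = [
        p for p in processes
        if p.running and p.industry_id == industry_id and p.scene_id == scene_id
    ]
    results = {}
    for p in targets:
        try:
            ok = bool(send_sync(p, relation_url))                  # S503
        except Exception:                                          # assumption: error means failure
            ok = False
        results[(p.ip, p.port)] = ok
        logger.info("relation sync %s:%d url=%s ok=%s",            # S504
                    p.ip, p.port, relation_url, ok)
    return results
```

Returning the per-process result map lets a caller decide whether a partial failure should trigger a retry, which the text leaves open.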

In one example, as shown in FIG. 6, training of the NLP recognition model by the resource scheduling center module includes:

S601: obtain the address of the NLP recognition model training corpus, the URL to which the model is uploaded once training is complete, and the industry id and scene id associated with the NLP recognition model;

S602: generate a training scheduling record and set it to the created state;

S603: query the server resources available for training and lock them;

S604: update the locked training server resources to the locked state;

S605: generate a training record and mark the training scheduling record status as scheduling;

S606: call the GPU server resource scheduling instruction executor to execute the training instruction, and update the training scheduling record to training;

For step S606, when the call to the GPU server resource scheduling instruction executor to execute the training instruction fails, release the locked server resources and mark the training scheduling record as failed.

S607: release the locked server resources, and wait for the GPU server resource scheduling instruction executor to finish executing the training instruction and call back with the training result;

S608: obtain the trained NLP recognition model.
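The training flow of S601 to S608 is essentially a small state machine over the training scheduling record. The Python sketch below follows the step order literally, including releasing the lock in S607 before the result arrives. All names (`TrainStatus`, `GpuServer`, `TrainingRecord`, `schedule_training`, `on_training_callback`) are hypothetical, the `submit` callback stands in for the call to the GPU server resource scheduling instruction executor, and marking the record as failed when no server is free is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TrainStatus(Enum):
    CREATED = "created"
    SCHEDULING = "scheduling"
    TRAINING = "training"
    SUCCESS = "success"
    FAILED = "failed"


@dataclass
class GpuServer:
    host: str
    locked: bool = False


@dataclass
class TrainingRecord:
    """Training scheduling record; the fields mirror the inputs of S601."""
    corpus_url: str
    upload_url: str
    industry_id: int
    scene_id: int
    status: TrainStatus = TrainStatus.CREATED                      # S602
    server: Optional[str] = None
    model_url: Optional[str] = None


def schedule_training(record, servers, submit):
    """Lock a server, hand the training instruction to the executor, then release."""
    server = next((s for s in servers if not s.locked), None)      # S603
    if server is None:                                             # assumption: no free server
        record.status = TrainStatus.FAILED
        return record
    server.locked = True                                           # S604
    record.server = server.host
    record.status = TrainStatus.SCHEDULING                         # S605
    try:
        accepted = bool(submit(server, record))                    # S606
    except Exception:
        accepted = False
    if not accepted:                                               # failure branch of S606
        server.locked = False
        record.status = TrainStatus.FAILED
        return record
    record.status = TrainStatus.TRAINING
    server.locked = False                                          # S607: release, await callback
    return record


def on_training_callback(record, ok, model_url=None):
    """Handle the executor's callback with the training result (S607 and S608)."""
    record.status = TrainStatus.SUCCESS if ok else TrainStatus.FAILED
    if ok:
        record.model_url = model_url
    return record
```

Splitting the flow into a scheduling function and a callback handler reflects the asynchronous hand-off described in S607: the scheduler returns as soon as the executor accepts the instruction, and the final state is set only when the result is called back.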

The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the specific embodiments described above, which are merely illustrative and not restrictive. Moreover, a person of ordinary skill in the art may, following the idea of the present invention, make changes to the specific embodiments and the scope of application, and all such changes fall within the protection scope of the present invention.

Claims

1. An NLP model training release recognition system, characterized in that the system comprises:

At least two GPU servers, which are used for machine learning computation;

An NLP recognition model, which is the model file used to start the NLP semantic recognition service and is obtained by fitting an algorithm to the business text corpus;

An NLP semantic recognition service process, which is a server system process that is started from the NLP recognition model file obtained after training on the business text corpus and that provides the NLP semantic recognition service externally;

At least one NLP gateway, which routes and forwards requests from business callers, according to preset rules, to a server on which an NLP semantic recognition service process has started successfully and is serving externally, so that the corresponding NLP semantic recognition service provides the recognition service;

At least two GPU server resource scheduling instruction executors, which execute the scheduling instructions issued by the resource scheduling center module;

At least one resource scheduling center module, which generates the following operation instructions and orchestration modes for training the NLP recognition model, releasing the NLP recognition model, and synchronizing changes to the relationship between the NLP recognition model and business data: a training instruction, a service startup instruction, a service shutdown instruction, and a relationship synchronization instruction; and a training orchestration mode, a model release orchestration mode, a model shutdown orchestration mode, and a relationship synchronization mode;

The training instruction is generated in the training orchestration mode: after the resource scheduling center module allocates a GPU server for NLP recognition model training, it calls the corresponding GPU server resource scheduling instruction executor to perform model training;

The service startup instruction and the service shutdown instruction are used in the model release orchestration mode: for each release scheduling request, the resource scheduling center module generates a process-level service startup instruction or service shutdown instruction on the GPU server to start or shut down the NLP semantic recognition service process; the service shutdown instruction is also used in the model shutdown orchestration mode to shut down a process that is providing service on the GPU server;

The relationship synchronization instruction is generated in the relationship synchronization mode and is used to shut down the corresponding NLP semantic recognition service process;

At least one service registration center module, which records service information in the registration center when an NLP semantic recognition service process starts and deletes it when the process stops, and which enables the NLP gateway to discover service processes automatically.

2. The NLP model training release recognition system according to claim 1, wherein the NLP gateway dynamically discovers, through the service registration center module, the NLP semantic recognition service processes that can provide services externally.

3. The NLP model training release recognition system according to claim 1, wherein the NLP semantic recognition service process depends on the model file after the NLP recognition model training is completed.

4. The NLP model training release recognition system according to claim 1, wherein shutting down the NLP semantic recognition service process by the resource scheduling center module comprises:

Generate a shutdown scheduling record table for the NLP semantic recognition service process, with its status set to created;

Using the ip and port of the NLP semantic recognition service process that the request schedules for shutdown, query the corresponding NLP semantic recognition service process, its status, and the industry id (a business concept) under which it is running;

When the NLP semantic recognition service process is running, preempt the GPU server resource operation lock corresponding to the ip and port of the NLP semantic recognition service process;

Using the industry id under which the NLP semantic recognition service process is running, together with its ip and port, remove the corresponding registered NLP semantic recognition service process from the registration center;

Call the GPU server resource scheduling instruction executor to shut down the NLP semantic recognition service process;

When the call succeeds, release the GPU server resource operation lock, update the shutdown scheduling record status to success, and mark the NLP service process resource as idle.

5. The NLP model training release recognition system according to claim 4, wherein shutting down the NLP semantic recognition service process by the resource scheduling center module further comprises: when the call to the GPU server resource scheduling instruction executor fails, release the GPU server resource operation lock, update the status of the shutdown scheduling record table to failed, and record the reason for the failure.

6. The NLP model training release recognition system according to claim 1, wherein the system is also used for expansion release of NLP service processes, comprising:

Obtain the URL, industry id and scene id of the trained NLP recognition model, and the expansion release quantity;

According to the industry id and scene id, generate expansion scheduling records corresponding to the expansion quantity, and set their status to created;

Query for and preempt the GPU server resource operation lock corresponding to the ip and port of an idle NLP semantic recognition service process;

Call the GPU server resource scheduling instruction executor to perform the release operation;

Update the scheduling status of the NLP semantic recognition service process to success and running;

Register the successfully scheduled NLP semantic recognition service process with the registration center so that it can receive traffic;

Mark the NLP semantic recognition service process resource as successfully scheduled, and release the GPU server resource operation lock.

7. The NLP model training release recognition system according to claim 6, wherein the expansion release of NLP service processes further comprises: when the number of idle NLP semantic recognition service processes is 0, update the expansion scheduling record status to expansion release failed and mark the reason as insufficient resources.

8. The NLP model training release recognition system according to claim 6, wherein the expansion release of NLP service processes further comprises: before preempting the GPU server resource operation lock corresponding to the ip and port of an idle NLP semantic recognition service process, query whether the NLP service processes have already been expanded to the preset threshold and, if so, end the expansion release of the NLP semantic recognition service process.

9. The NLP model training release recognition system according to claim 6, wherein the expansion release of NLP service processes further comprises: if the call to the GPU server resource scheduling instruction executor to perform the release operation fails, update the expansion scheduling record status to expansion release failed and mark the reason as execution failure of the GPU server resource scheduling instruction executor.

10. The NLP model training release recognition system according to claim 1, wherein the system is also used for rolling release of NLP service processes, comprising:

Generate a rolling scheduling record for the rolling release of the NLP semantic recognition service process, and set the rolling scheduling record status to created;

Query and preempt idle NLP semantic recognition service process resources;

Update the successfully preempted idle NLP semantic recognition service process resources to the locked state;

Call the GPU server resource scheduling instruction executor to perform the expansion release of the NLP semantic recognition service process;

Update the expansion scheduling record status to success, orchestrate and schedule the rolling release of the NLP semantic recognition service process for the corresponding industry id and scene id, and generate a rolling release scheduling record;

Continue the rolling release of NLP semantic recognition service processes until the list of processes awaiting rolling release has been fully traversed.

11. The NLP model training release recognition system according to claim 10, wherein the rolling release of NLP service processes further comprises: after the rolling release of the NLP semantic recognition service processes has been traversed, lock the server process resources, unregister the corresponding services, and call the GPU server resource scheduling instruction executor to shut down the corresponding services.

12. The NLP model training release recognition system according to claim 1, wherein relationship synchronization of the NLP semantic recognition service process by the resource scheduling center module comprises:

Obtain the relationship file URL, industry id and scene id of the trained NLP recognition model;

Query the running NLP semantic recognition service processes by industry id and scene id;

Call the GPU server resource scheduling instruction executor to perform relationship synchronization;

Log the synchronization status of the relationship between the NLP recognition model and the business scenario.

13. The NLP model training release recognition system according to claim 1, wherein training of the NLP recognition model by the resource scheduling center module comprises:

Obtain the address of the NLP recognition model training corpus, the URL to which the model is uploaded once training is complete, and the industry id and scene id associated with the NLP recognition model;

Generate a training scheduling record and set it to the created state;

Query the server resources available for training and lock them;

Update the locked training server resources to the locked state;

Generate a training record and mark the training scheduling record status as scheduling;

Call the GPU server resource scheduling instruction executor to execute the training instruction, and update the training scheduling record to training;

Release the locked server resources, and wait for the GPU server resource scheduling instruction executor to finish executing the training instruction and call back with the training result;

Obtain the trained NLP recognition model.

14. The NLP model training release recognition system according to claim 13, wherein training of the NLP recognition model by the resource scheduling center module further comprises: when the call to the GPU server resource scheduling instruction executor to execute the training instruction fails, release the locked server resources and mark the training scheduling record as failed.