CN111967613B - NLP model training and publishing recognition system - Google Patents
- Publication number
- CN111967613B (application CN202010853842.7A)
- Authority
- CN
- China
- Prior art keywords
- nlp
- service process
- release
- training
- semantic recognition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5011—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
- G06F9/5022—Mechanisms to release resources
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Data Mining & Analysis (AREA)
- Computing Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
Description
Technical Field
The invention relates to the field of model training, and in particular to an NLP model training, release and recognition system.
Background
NLP (Natural Language Processing) is a subfield of artificial intelligence (AI). Before a conventional NLP system can provide normal interface services, its model must go through three steps: training, release, and startup. In current industry practice, developers perform all three steps manually on the server, so every release or training run of an NLP semantic recognition model carries a risk of misoperation, and human error can easily cause failures.
In addition, after an NLP semantic recognition service process starts, it must be configured manually. For example, to expose a service call interface, the mapping between the nginx service interface routes, the NLP recognition model and the service process must be configured. This procedure is tedious, mechanically repetitive, and ill-suited to routine operations and maintenance work.
Summary of the Invention
The technical problem to be solved by the invention is to provide an NLP model training, release and recognition system that simplifies the configuration of NLP semantic recognition services, improves the training efficiency of NLP recognition models, reduces the risk of incorrect or missing configuration, and provides service consumers with a stable NLP semantic recognition service.
To achieve the above object, the invention adopts the following technical solution:
The invention provides an NLP model training, release and recognition system, the system comprising:
at least two GPU servers, the GPU servers being used for machine learning computation;
an NLP recognition model, the NLP recognition model being a model file that is obtained by algorithmically fitting a business text corpus and is used to start the NLP semantic recognition service;
an NLP semantic recognition service process, which is a server system process that is based on the NLP recognition model file trained on the business text corpus and provides the NLP semantic recognition service externally;
at least one NLP gateway, which routes requests from business callers, according to preset rules, to the server hosting an NLP semantic recognition service process that has started successfully and is serving externally, where the corresponding NLP semantic recognition service performs the recognition;
at least two GPU server resource scheduling instruction executors, which execute the scheduling instructions issued by the resource scheduling center module;
at least one resource scheduling center module, which allocates GPU server resources and coordinates the executors' instruction execution during the following processes:
training of the NLP recognition model, release of the NLP recognition model, and synchronized changes to the relationship between the NLP recognition model and business data;
at least one service registry module, which records service information in the registry when an NLP semantic recognition service process starts and deletes it when the process stops, and which allows the NLP gateway to discover service processes on its own.
In the above solution, the NLP gateway dynamically discovers, through the service registry module, the NLP semantic recognition service processes that are ready to serve externally.
In the above solution, the NLP semantic recognition service process depends on the model file produced once training of the NLP recognition model is complete.
In the above solution, during training of the NLP recognition model, release of the NLP recognition model, and synchronized changes to the relationship between the NLP recognition model and business data, the resource scheduling center module generates the following operation instructions and orchestration modes:
a training instruction, a service start instruction, a service shutdown instruction, and a relationship synchronization instruction;
a training orchestration mode, a model release orchestration mode, a model shutdown orchestration mode, and a relationship synchronization mode;
the training instruction is generated in the training orchestration mode: after allocating a GPU server for the training of the NLP recognition model, the resource scheduling center module calls the corresponding GPU server resource scheduling instruction executor to perform the training;
the service start instruction and the service shutdown instruction are used in the model release orchestration mode: depending on the release method of the individual scheduling request, the resource scheduling center module generates a process-level service start instruction or service shutdown instruction for a GPU server and starts or shuts down the NLP semantic recognition service process on that GPU server; the service shutdown instruction is also used in the model shutdown orchestration mode to shut down a process that is currently serving on a GPU server;
the relationship synchronization instruction is generated in the relationship synchronization mode and is used to shut down the corresponding NLP semantic recognition service process.
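As a minimal, non-limiting sketch, the pairing of instructions with orchestration modes described here could be written down as a lookup table. The names `Instruction`, `Mode`, `MODE_INSTRUCTIONS` and `is_allowed` are hypothetical and do not appear in the disclosure.

```python
from enum import Enum


class Instruction(Enum):
    """The four operation instructions named in the text (identifiers are assumed)."""
    TRAIN = "train"
    SERVICE_START = "service_start"
    SERVICE_SHUTDOWN = "service_shutdown"
    RELATION_SYNC = "relation_sync"


class Mode(Enum):
    """The four orchestration modes named in the text (identifiers are assumed)."""
    TRAINING = "training"
    MODEL_RELEASE = "model_release"
    MODEL_SHUTDOWN = "model_shutdown"
    RELATION_SYNC = "relation_sync"


# Which instructions each mode may emit, following the description:
# release uses start and shutdown, shutdown mode uses shutdown only.
MODE_INSTRUCTIONS = {
    Mode.TRAINING: {Instruction.TRAIN},
    Mode.MODEL_RELEASE: {Instruction.SERVICE_START, Instruction.SERVICE_SHUTDOWN},
    Mode.MODEL_SHUTDOWN: {Instruction.SERVICE_SHUTDOWN},
    Mode.RELATION_SYNC: {Instruction.RELATION_SYNC},
}


def is_allowed(mode: Mode, instruction: Instruction) -> bool:
    """Return True if the given mode may generate the given instruction."""
    return instruction in MODE_INSTRUCTIONS[mode]
```

A scheduler built this way could reject, for example, a training instruction that arrives while a model shutdown is being orchestrated.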
In the above solution, the shutdown of an NLP semantic recognition service process by the resource scheduling center module comprises:
generating a shutdown scheduling record for the NLP semantic recognition service process, with its status set to created;
querying, by the ip and port of the NLP semantic recognition service process whose shutdown is requested, the corresponding process, its status, and the industry id (a business-level concept) under which the process runs;
when the NLP semantic recognition service process is in the running state, acquiring the GPU server resource operation lock that corresponds to the ip and port of the process;
removing the registered NLP semantic recognition service process from the registry according to the queried industry id and the ip and port of the process;
calling the GPU server resource scheduling instruction executor to shut down the NLP semantic recognition service process;
when the call succeeds, releasing the GPU server resource operation lock, updating the shutdown scheduling record status to success, and marking the NLP service process resource as idle.
In the above solution, the shutdown of an NLP semantic recognition service process by the resource scheduling center module further comprises the following step:
when the call to the GPU server resource scheduling instruction executor fails, releasing the GPU server resource operation lock, updating the shutdown scheduling record status to failed, and attaching the failure reason.
In the above solution, the system is further used for scale-out release of NLP service processes, comprising:
obtaining the URL of the trained NLP recognition model, the industry id, the scene id, and the number of processes to add in the scale-out release;
generating, according to the industry id and scene id, as many scale-out scheduling records as the scale-out count, and setting their status to created;
querying for idle NLP semantic recognition service processes and acquiring the GPU server resource operation lock that corresponds to the ip and port of each such process;
calling the GPU server resource scheduling instruction executor to perform the release operation;
updating the scheduling status of the NLP semantic recognition service process to success and running;
registering the successfully scheduled NLP semantic recognition service process with the registry so that it can receive traffic;
marking the scheduling of the NLP semantic recognition service process resource as successful and releasing the GPU server resource operation lock.
In the above solution, the scale-out release of NLP service processes further comprises the following step:
when the query finds zero idle NLP semantic recognition service processes, updating the scale-out scheduling record status to scale-out release failed and marking the cause as insufficient resources.
In the above solution, the scale-out release of NLP service processes further comprises the following step:
before acquiring the GPU server resource operation lock that corresponds to the ip and port of an idle NLP semantic recognition service process, checking whether the NLP service processes have already been scaled out to a preset threshold and, if so, ending the scale-out release.
In the above solution, the scale-out release of NLP service processes further comprises the following step:
if the call to the GPU server resource scheduling instruction executor for the release operation fails, updating the scale-out scheduling record status to scale-out release failed and marking the cause as an execution failure of the GPU server resource scheduling instruction executor.
In the above solution, the system is further used for rolling release of NLP service processes, comprising:
generating a rolling scheduling record for the rolling release of the NLP semantic recognition service processes and setting its status to created;
querying for and preempting idle NLP semantic recognition service process resources;
updating the successfully preempted idle NLP semantic recognition service process resources to the locked state;
calling the GPU server resource scheduling instruction executor to perform a scale-out release of the NLP semantic recognition service process;
updating the scale-out scheduling record status to success, performing rolling release orchestration and scheduling for the industry id and scene id targeted by the rolling release, and generating a rolling release scheduling record;
continuing the rolling release of NLP semantic recognition service processes until the list of processes awaiting rolling release has been fully traversed.
In the above solution, the rolling release of NLP service processes further comprises the following steps: after the rolling release list has been traversed, locking the server process resource, deregistering the corresponding service, and calling the GPU server resource scheduling instruction executor to shut down the corresponding service.
In the above solution, the synchronization of NLP semantic recognition service process relationships by the resource scheduling center module comprises:
obtaining the URL of the relationship file of the trained NLP recognition model, the industry id and the scene id;
querying the running NLP semantic recognition service processes by industry id and scene id;
calling the GPU server resource scheduling instruction executor to perform the relationship synchronization;
logging the synchronization status of the relationship between the NLP recognition model and the business scenario.
In the above solution, the training of an NLP recognition model by the resource scheduling center module comprises:
obtaining the address of the training corpus for the NLP recognition model, the URL to which the model is uploaded after training completes, and the industry id and scene id associated with the model;
generating a training scheduling record and setting it to the created state;
querying for server resources available for training and locking them;
updating the locked training server resources to the locked state;
generating a training record and marking the training scheduling record status as scheduling;
calling the GPU server resource scheduling instruction executor to execute the training instruction and updating the training scheduling record to training;
releasing the locked server resources and waiting for the GPU server resource scheduling instruction executor to finish the training instruction and report the training result through a callback;
obtaining the trained NLP recognition model.
In the above solution, the training of an NLP recognition model by the resource scheduling center module further comprises:
when the call to the GPU server resource scheduling instruction executor to execute the training instruction fails, releasing the locked server resources and marking the training scheduling record as failed.
The beneficial effects of the invention are:
The invention provides a model training, release and recognition system. By simplifying NLP service configuration, it reduces how often humans must intervene, lowers the failure rate caused by manual operation, and improves release and training efficiency. It can also automatically discover NLP recognition models that have finished training and expose them externally as NLP interface services.
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of the model training, release and recognition system provided by an embodiment of the invention;
FIG. 2 is a flowchart of the resource scheduling center module shutting down an NLP semantic recognition service process in an example provided by an embodiment of the invention;
FIG. 3 is a flowchart of the model training, release and recognition system performing a scale-out release of NLP service processes in an example provided by an embodiment of the invention;
FIG. 4 is a flowchart of the model training, release and recognition system performing a rolling release of NLP service processes in an example provided by an embodiment of the invention;
FIG. 5 is a flowchart of the resource scheduling center module synchronizing NLP semantic recognition service process relationships in an example provided by an embodiment of the invention;
FIG. 6 is a flowchart of the resource scheduling center module training an NLP recognition model in an example provided by an embodiment of the invention.
Detailed Description
The technical solution of the invention is described in further detail below through specific embodiments and with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments that a person of ordinary skill in the art obtains from the embodiments of the invention without creative effort fall within the scope of protection of the invention.
An embodiment of the invention provides a model training, release and recognition system that simplifies NLP service configuration, improves the efficiency of releasing and training NLP recognition models, and provides NLP interface services externally.
The technical solutions provided by the embodiments of the invention are described in detail below with reference to the accompanying drawings.
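An embodiment of the invention provides a model training, release and recognition system. As shown in FIG. 1, the system comprises:
at least two GPU servers 10, the GPU servers 10 being used for machine learning computation;
an NLP recognition model 20, the NLP recognition model 20 being a model file that is obtained by algorithmically fitting a business text corpus and is used to start the NLP semantic recognition service;
an NLP semantic recognition service process 30, which is a server system process that is based on the NLP recognition model file trained on the business text corpus and provides the NLP semantic recognition service externally;
at least one NLP gateway 40, which routes requests from business callers, according to preset rules, to the server hosting an NLP semantic recognition service process 30 that has started successfully and is serving externally, where the corresponding NLP semantic recognition service performs the recognition;
at least two GPU server resource scheduling instruction executors 50, which execute the scheduling instructions issued by the resource scheduling center module 60;
at least one resource scheduling center module 60, which allocates the resources of the GPU servers 10 and coordinates the executors' instruction execution during the following processes:
training of the NLP recognition model 20, release of the NLP recognition model 20, and synchronized changes to the relationship between the NLP recognition model 20 and business data;
during these processes, the resource scheduling center module 60 generates the following operation instructions and orchestration modes:
a training instruction, a service start instruction, a service shutdown instruction, and a relationship synchronization instruction;
a training orchestration mode, a model release orchestration mode, a model shutdown orchestration mode, and a relationship synchronization mode;
the training instruction is generated in the training orchestration mode: after allocating a GPU server 10 for the training of the NLP recognition model 20, the resource scheduling center module 60 calls the corresponding GPU server resource scheduling instruction executor 50 to perform the training;
the service start instruction and the service shutdown instruction are used in the model release orchestration mode: depending on the release method of the individual scheduling request, the resource scheduling center module 60 generates a process-level service start instruction or service shutdown instruction for a GPU server 10 and starts or shuts down the NLP semantic recognition service process 30 on that GPU server 10; the service shutdown instruction is also used in the model shutdown orchestration mode to shut down a process that is currently serving on a GPU server 10;
the relationship synchronization instruction is generated in the relationship synchronization mode and is used to shut down the corresponding NLP semantic recognition service process 30;
at least one service registry module 70, which records service information in the registry when an NLP semantic recognition service process 30 starts and deletes it when the process stops, and which allows the NLP gateway 40 to discover service processes on its own.
In one example, the NLP gateway 40 dynamically discovers, through the service registry module 70, the NLP semantic recognition service processes 30 that are ready to serve externally.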
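The interaction between the gateway and the registry can be illustrated with a small in-memory sketch. It is only an assumption about how such components might look: the class names `ServiceRegistry` and `Gateway` are hypothetical, and round-robin selection stands in for the unspecified "preset rules".

```python
import itertools
from collections import defaultdict


class ServiceRegistry:
    """Hypothetical in-memory stand-in for the service registry module."""

    def __init__(self):
        self._endpoints = defaultdict(list)  # industry_id -> [(ip, port), ...]

    def register(self, industry_id, ip, port):
        # Called when a service process starts.
        if (ip, port) not in self._endpoints[industry_id]:
            self._endpoints[industry_id].append((ip, port))

    def deregister(self, industry_id, ip, port):
        # Called when a service process stops.
        if (ip, port) in self._endpoints[industry_id]:
            self._endpoints[industry_id].remove((ip, port))

    def lookup(self, industry_id):
        return list(self._endpoints.get(industry_id, []))


class Gateway:
    """Routes each request to a registered process (round-robin is assumed)."""

    def __init__(self, registry):
        self._registry = registry
        self._counter = itertools.count()

    def route(self, industry_id):
        # A fresh lookup per request means newly started or stopped
        # processes are picked up without any manual route configuration.
        endpoints = self._registry.lookup(industry_id)
        if not endpoints:
            raise LookupError(f"no running service process for {industry_id!r}")
        return endpoints[next(self._counter) % len(endpoints)]
```

Because the gateway consults the registry on every request, deregistering a process is enough to take it out of rotation.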
In one example, the NLP semantic recognition service process 30 depends on the model file produced once training of the NLP recognition model 20 is complete.
In one example, as shown in FIG. 2, the shutdown of an NLP semantic recognition service process by the resource scheduling center module comprises:
S201: generating a shutdown scheduling record for the NLP semantic recognition service process, with its status set to created;
S202: querying, by the ip and port of the NLP semantic recognition service process whose shutdown is requested, the corresponding process, its status, and the industry id (a business-level concept) under which the process runs;
S203: when the NLP semantic recognition service process is in the running state, acquiring the GPU server resource operation lock that corresponds to the ip and port of the process;
S204: removing the registered NLP semantic recognition service process from the registry according to the queried industry id and the ip and port of the process;
S205: calling the GPU server resource scheduling instruction executor to shut down the NLP semantic recognition service process;
For step S205, when the call to the GPU server resource scheduling instruction executor fails, releasing the GPU server resource operation lock, updating the shutdown scheduling record status to failed, and attaching the failure reason.
S206: when the call succeeds, releasing the GPU server resource operation lock, updating the shutdown scheduling record status to success, and marking the NLP service process resource as idle.
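Steps S201 to S206 might be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: `ServiceProcess`, `shutdown` and the `executor_stop` callable are hypothetical, the registry is modeled as a plain set, and the handling of a non-running process or a busy lock is assumed because the text does not specify it.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class ServiceProcess:
    ip: str
    port: int
    industry_id: str
    state: str = "running"  # "running" or "idle"
    lock: threading.Lock = field(default_factory=threading.Lock)


def shutdown(process, registry, executor_stop):
    """Shut down one service process and return the scheduling record.

    registry is a set of (industry_id, ip, port) tuples; executor_stop(ip, port)
    stands in for the call to the instruction executor and may raise.
    """
    record = {"status": "created", "reason": None}            # S201
    if process.state != "running":                            # S202 (assumed handling)
        record.update(status="failed", reason="process not running")
        return record
    if not process.lock.acquire(blocking=False):              # S203 (assumed handling)
        record.update(status="failed", reason="resource lock busy")
        return record
    try:
        registry.discard((process.industry_id, process.ip, process.port))  # S204
        executor_stop(process.ip, process.port)               # S205
    except Exception as exc:
        # Failure branch of S205: record the reason.
        record.update(status="failed", reason=str(exc))
    else:
        process.state = "idle"                                # S206
        record["status"] = "success"
    finally:
        process.lock.release()                                # lock released on both paths
    return record
```

Removing the registry entry before stopping the process follows the order of S204 and S205, so the gateway stops routing traffic to the process before it goes away.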
In one example, as shown in FIG. 3, the system is further used for scale-out release of NLP service processes, comprising:
S301: obtaining the URL of the trained NLP recognition model, the industry id, the scene id, and the number of processes to add in the scale-out release;
S302: generating, according to the industry id and scene id, as many scale-out scheduling records as the scale-out count, and setting their status to created;
S303: querying for idle NLP semantic recognition service processes and acquiring the GPU server resource operation lock that corresponds to the ip and port of each such process;
For step S303, when the query finds zero idle NLP semantic recognition service processes, updating the scale-out scheduling record status to scale-out release failed and marking the cause as insufficient resources.
S304: calling the GPU server resource scheduling instruction executor to perform the release operation;
For step S304, if the call to the GPU server resource scheduling instruction executor for the release operation fails, updating the scale-out scheduling record status to scale-out release failed and marking the cause as an execution failure of the GPU server resource scheduling instruction executor.
S305: updating the scheduling status of the NLP semantic recognition service process to success and running;
S306: registering the successfully scheduled NLP semantic recognition service process with the registry so that it can receive traffic;
S307: marking the scheduling of the NLP semantic recognition service process resource as successful and releasing the GPU server resource operation lock.
In one example, before acquiring the GPU server resource operation lock that corresponds to the ip and port of an idle NLP semantic recognition service process, the system checks whether the NLP service processes have already been scaled out to a preset threshold and, if so, ends the scale-out release.
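A compact sketch of steps S301 to S307, including the threshold check, is given here for illustration. The names `Slot`, `scale_out`, `publish` and `max_running` are hypothetical, as is the "ended" status used when the threshold is reached; the registry is again modeled as a set.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Slot:
    """One pre-provisioned service process slot on a GPU server (assumed model)."""
    ip: str
    port: int
    state: str = "idle"  # "idle" or "running"
    lock: threading.Lock = field(default_factory=threading.Lock)


def scale_out(model_url, industry_id, scene_id, count, slots, registry,
              publish, max_running=None):
    """Start up to `count` new service processes; return one record per request."""
    records = [{"industry_id": industry_id, "scene_id": scene_id,
                "status": "created", "reason": None}
               for _ in range(count)]                          # S301, S302
    for record in records:
        running = sum(s.state == "running" for s in slots)
        if max_running is not None and running >= max_running:
            # Threshold check performed before any lock is taken.
            record.update(status="ended", reason="threshold reached")
            continue
        slot = next((s for s in slots
                     if s.state == "idle" and s.lock.acquire(blocking=False)),
                    None)                                      # S303
        if slot is None:
            record.update(status="failed", reason="insufficient resources")
            continue
        try:
            publish(slot.ip, slot.port, model_url)             # S304
        except Exception as exc:
            record.update(status="failed", reason=f"executor failed: {exc}")
        else:
            slot.state = "running"                             # S305
            registry.add((industry_id, scene_id, slot.ip, slot.port))  # S306
            record["status"] = "success"                       # S307
        finally:
            slot.lock.release()                                # lock always released
    return records
```

Each requested process gets its own record, so a partial scale-out (some successes, some failures for lack of idle slots) remains visible afterwards.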
In one example, as shown in FIG. 4, the system is further used for rolling release of NLP service processes, comprising:
S401: generating a rolling scheduling record for the rolling release of the NLP semantic recognition service processes and setting its status to created;
S402: querying for and preempting idle NLP semantic recognition service process resources;
S403: updating the successfully preempted idle NLP semantic recognition service process resources to the locked state;
S404: calling the GPU server resource scheduling instruction executor to perform a scale-out release of the NLP semantic recognition service process;
S405: updating the scale-out scheduling record status to success, performing rolling release orchestration and scheduling for the industry id and scene id targeted by the rolling release, and generating a rolling release scheduling record;
S406: continuing the rolling release of NLP semantic recognition service processes until the list of processes awaiting rolling release has been fully traversed.
For step S406, after the rolling release list has been traversed, locking the server process resource, deregistering the corresponding service, and calling the GPU server resource scheduling instruction executor to shut down the corresponding service.
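One possible reading of steps S401 to S406 is sketched here: a replacement process is started on an idle slot for every process in the rolling list, and the old processes are deregistered and stopped once the list has been traversed. The function `rolling_release`, its `start` and `stop` callables, and the dictionary of slot states are all hypothetical simplifications.

```python
def rolling_release(states, registry, to_roll, start, stop):
    """Replace each endpoint in `to_roll` with a freshly started one.

    states maps (ip, port) -> "idle" | "locked" | "running"; registry is a set
    of (ip, port) endpoints that receive traffic; start(endpoint) and
    stop(endpoint) stand in for calls to the instruction executor.
    """
    record = {"status": "created", "pairs": []}                # S401
    for old_ep in to_roll:                                     # S406: traverse the list
        new_ep = next((ep for ep, st in states.items() if st == "idle"), None)  # S402
        if new_ep is None:
            record["status"] = "failed"                        # no idle slot left (assumed handling)
            break
        states[new_ep] = "locked"                              # S403
        start(new_ep)                                          # S404
        states[new_ep] = "running"
        registry.add(new_ep)
        record["pairs"].append((old_ep, new_ep))               # S405
    for old_ep, _ in record["pairs"]:
        # After traversal: lock, deregister, then shut down the old process.
        states[old_ep] = "locked"
        registry.discard(old_ep)
        stop(old_ep)
        states[old_ep] = "idle"
    if record["status"] == "created":
        record["status"] = "success"
    return record
```

Starting the replacement before retiring the old process keeps at least one registered endpoint available throughout, which is the usual motivation for a rolling release.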
In one example, as shown in FIG. 5, the synchronization of NLP semantic recognition service process relationships by the resource scheduling center module comprises:
S501: obtaining the URL of the relationship file of the trained NLP recognition model, the industry id and the scene id;
S502: querying the running NLP semantic recognition service processes by industry id and scene id;
S503: calling the GPU server resource scheduling instruction executor to perform the relationship synchronization;
S504: logging the synchronization status of the relationship between the NLP recognition model and the business scenario.
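Steps S501 to S504 could look roughly like the following sketch. The function `sync_relations`, the `push` callable that stands in for the executor call, and the dictionary layout of a process are assumptions made for illustration.

```python
import logging

logger = logging.getLogger("relation_sync")


def sync_relations(relation_url, industry_id, scene_id, processes, push):
    """Push the relationship file to every matching running process.

    processes is an iterable of dicts with the keys ip, port, industry_id,
    scene_id and state; push(ip, port, relation_url) may raise on failure.
    Returns a dict mapping (ip, port) to the outcome.
    """
    targets = [p for p in processes
               if p["state"] == "running"
               and p["industry_id"] == industry_id
               and p["scene_id"] == scene_id]                  # S502
    results = {}
    for p in targets:
        key = (p["ip"], p["port"])
        try:
            push(p["ip"], p["port"], relation_url)             # S503
            results[key] = "synced"
        except Exception as exc:
            results[key] = f"failed: {exc}"
        logger.info("relation sync %s:%s -> %s",
                    p["ip"], p["port"], results[key])          # S504
    return results
```

Only processes that are running for the given industry id and scene id are touched; idle slots and processes serving other scenarios are skipped.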
In one example, as shown in FIG. 6, the training of the NLP recognition model by the resource scheduling center module includes:
S601, obtaining the address of the NLP recognition model training corpus, the URL address to which the NLP recognition model is uploaded once training completes, and the industry id and scene id associated with the NLP recognition model;
S602, generating a training scheduling record and setting it to the created state;
S603, querying the server resources available for training and locking them;
S604, updating the locked training server resources to the locked state;
S605, generating a training record and marking the training scheduling record as scheduling;
S606, calling the GPU server resource scheduling instruction executor to execute the training instruction, and updating the training scheduling record to training;
For step S606, when the call to the GPU server resource scheduling instruction executor to execute the training instruction fails, the locked server resources are released and the training scheduling record is marked as failed.
S607, releasing the locked server resources, and waiting for the GPU server resource scheduling instruction executor to finish executing the training instruction and call back with the training result;
S608, obtaining the trained NLP recognition model.
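The training schedule of steps S601 to S608 is essentially a small state machine guarded by a resource lock. The sketch below illustrates it under stated assumptions: `ServerPool`, `TrainingScheduleRecord`, `schedule_training`, `on_training_callback`, and the `submit` callable (standing in for the GPU server resource scheduling instruction executor) are hypothetical names. The `DONE` state and the handling of an empty server pool are also assumptions, since the text does not describe them.

```python
import threading
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TrainState(Enum):
    CREATED = "created"
    SCHEDULING = "scheduling"
    TRAINING = "training"
    FAILED = "failed"
    DONE = "done"          # assumed terminal state, not named in the text


@dataclass
class TrainingScheduleRecord:
    corpus_url: str
    upload_url: str
    industry_id: str
    scene_id: str
    state: TrainState = TrainState.CREATED
    server: Optional[str] = None
    model_url: Optional[str] = None


class ServerPool:
    """Tracks which training servers are free and which are locked."""

    def __init__(self, servers):
        self._free = list(servers)
        self._locked = set()
        self._mutex = threading.Lock()

    def lock_one(self):
        with self._mutex:
            if not self._free:
                return None
            server = self._free.pop(0)
            self._locked.add(server)      # S604: mark as locked
            return server

    def release(self, server):
        with self._mutex:
            if server in self._locked:
                self._locked.remove(server)
                self._free.append(server)

    def is_locked(self, server):
        with self._mutex:
            return server in self._locked


def schedule_training(pool, submit, corpus_url, upload_url, industry_id, scene_id):
    # S601-S602: collect the inputs and create the record in the created state.
    record = TrainingScheduleRecord(corpus_url, upload_url, industry_id, scene_id)
    server = pool.lock_one()                      # S603-S604
    if server is None:                            # assumption: no server means failure
        record.state = TrainState.FAILED
        return record
    record.server = server
    record.state = TrainState.SCHEDULING          # S605
    try:
        submit(server, record)                    # S606: issue the training instruction
    except Exception:
        pool.release(server)                      # S606 failure path
        record.state = TrainState.FAILED
        return record
    record.state = TrainState.TRAINING            # S606
    pool.release(server)                          # S607: release, then await the callback
    return record


def on_training_callback(record, model_url):
    """S607-S608: the executor calls back with the trained model location."""
    record.model_url = model_url
    record.state = TrainState.DONE
    return record
```

In this sketch the lock only protects the scheduling window: once the training instruction has been accepted, the lock is released and the result arrives asynchronously through the callback.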
Embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the specific embodiments described, which are merely illustrative and not restrictive. Moreover, a person of ordinary skill in the art may, following the idea of the present invention, make changes to the specific implementation and the scope of application, and all such changes fall within the scope of protection of the present invention.
Claims (14)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010853842.7A CN111967613B (en) | 2020-08-24 | 2020-08-24 | NLP model training and publishing recognition system |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010853842.7A CN111967613B (en) | 2020-08-24 | 2020-08-24 | NLP model training and publishing recognition system |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN111967613A (en) | 2020-11-20 |
| CN111967613B (en) | 2023-06-16 |
Family
ID=73390752
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202010853842.7A Active CN111967613B (en) | 2020-08-24 | 2020-08-24 | NLP model training and publishing recognition system |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN111967613B (en) |
Also Published As
| Publication number | Publication date |
|---|---|
| CN111967613A (en) | 2020-11-20 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |
| | PE01 | Entry into force of the registration of the contract for pledge of patent right | Denomination of invention: NLP model training release recognition system; Effective date of registration: 20231108; Granted publication date: 20230616; Pledgee: Guotou Taikang Trust Co.,Ltd.; Pledgor: ZHEJIANG BYAI TECHNOLOGY Co.,Ltd.; Registration number: Y2023980064435 |
| | PC01 | Cancellation of the registration of the contract for pledge of patent right | Granted publication date: 20230616; Pledgee: Guotou Taikang Trust Co.,Ltd.; Pledgor: ZHEJIANG BYAI TECHNOLOGY Co.,Ltd.; Registration number: Y2023980064435 |