
CN117149972B - Method and device for checking quality of collection-accelerating sensitive words based on large model - Google Patents

Info

Publication number
CN117149972B
Authority
CN
China
Prior art keywords
quality inspection
text
model
collection
language model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311103890.4A
Other languages
Chinese (zh)
Other versions
CN117149972A (en)
Inventor
陈希
徐维
段祖宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Sushang Bank Co ltd
Original Assignee
Jiangsu Sushang Bank Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Sushang Bank Co ltd
Priority to CN202311103890.4A
Publication of CN117149972A
Application granted
Publication of CN117149972B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3343 Query execution using phonetics
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Business, Economics & Management (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Software Systems (AREA)
  • Finance (AREA)
  • Technology Law (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

The present invention provides a large-model-based method and device for quality inspection of sensitive words in debt collection. The method comprises: obtaining a debt collection recording generated online; calling a translation model API to convert the recording into recording text; preprocessing the recording text and segmenting long text to obtain input text; importing the input text into an original model for quality inspection and outputting a first quality inspection result; constructing a local debt collection compliance vector knowledge base, calling the vector knowledge base from a large language model, importing the input text into the large language model for quality inspection, and outputting a second quality inspection result; training a large language model on historical debt collection recording data to obtain a customized large language model; importing the input text into the customized large language model for quality inspection to obtain a third quality inspection result; and, if any one of the results is non-compliant, determining that the quality inspection result is non-compliant. The invention can ensure compliant collection, improve efficiency, reduce costs and disputes, and promote the development of large language models in the financial field.

Description

Large-model-based method and device for quality inspection of sensitive words in debt collection
Technical Field
The invention relates to the technical field of finance, and in particular to a large-model-based method and device for quality inspection of sensitive words in debt collection.
Background
With regulatory supervision of the financial industry tightening, compliance requirements for debt collection are becoming increasingly strict. Enterprises need to identify and filter sensitive words during collection to ensure that collection behavior complies with relevant laws, regulations and industry standards, and to reduce potential legal risk. Post-loan collection is an important part of risk management for financial institutions and is also the part that involves the most manual intervention. Traditional collection quality inspection in particular relies on manual work, which is costly and inefficient and cannot meet the development needs of the financial collection industry.
Disclosure of Invention
In view of the above problems, the invention provides a large-model-based method and device for quality inspection of sensitive words in debt collection, which solve the high cost and low efficiency of traditional collection quality inspection that relies mainly on manual work.
The technical solution is a large-model-based method for quality inspection of sensitive words in debt collection, comprising: obtaining a collection recording generated online; calling a translation model API to convert the collection recording into recording text; preprocessing the recording text and segmenting long text in it to obtain input text; importing the input text into an original model for quality inspection and outputting a first quality inspection result; constructing a local collection compliance vector knowledge base, calling the vector knowledge base from a large language model, importing the input text into the large language model for quality inspection, and outputting a second quality inspection result; performing P-tuning training on the large language model with historical collection recording data to obtain a customized large language model; importing the input text into the customized large language model for quality inspection to obtain a third quality inspection result; and, if at least one of the first, second and third quality inspection results is non-compliant, determining that the final quality inspection result is non-compliant.
Preferably, constructing the local collection compliance vector knowledge base comprises: collecting collection recordings based on expert experience or historical customer complaint cases; after converting the recordings into recording text, screening out the collector's speech; converting the recording text into 512-dimensional vectors using encoding software and storing the vectors in a database; converting the recording text to be evaluated into a 512-dimensional vector and computing its inner product with every vector in the database, where a larger inner product means higher similarity; and, if the similarity exceeds a set threshold, concluding that the recording exhibits a non-compliance problem that has occurred historically.
Preferably, before the input text is imported into the large language model for quality inspection, the method further comprises: reading the content of the local vector knowledge base to obtain context related to the user request, filling a template with the request content and the context to obtain a prompt, and inputting the prompt into the large language model.
Preferably, preprocessing the recording text comprises removing the text of recordings shorter than 30 seconds and adding target tag information based on expert experience and historical complaint information.
Preferably, performing P-tuning training on the large language model with historical collection recording data to obtain the customized large language model comprises: collecting recordings and text data in the collection field and preprocessing the data; recognizing the recording data with ASR technology, distinguishing collectors from overdue users, and converting the recording data into text data; having experts label the text data as positive or negative according to whether it is compliant, thereby generating training samples; dividing the training samples into a training set used for P-tuning training and a test set used for evaluating the model; configuring the P-tuning model parameters and completing training of the customized large language model when the model performance reaches a set threshold; and deploying the customized large language model in a production environment so that the collection system can call it through an API.
Preferably, recognizing the recording data with ASR technology, distinguishing collectors from overdue users, and converting the recording data into text data comprises: using the whisperX model with the language specified as Chinese and the number of speakers set to 2; inputting the recording file into the whisperX model, which outputs the speakers and the text of what each said; and screening out the collector's speech text according to the collector's fixed opening phrase.
Preferably, deploying the customized large language model in the production environment comprises: importing the customized large language model into the production environment; switching the model to eval mode; exposing an API service with FastAPI; submitting to the API the recording text to be evaluated, with the prompt added; and returning the evaluation result for the recording text.
The invention further provides a large-model-based device for quality inspection of sensitive words in debt collection, comprising: an acquisition module for obtaining a collection recording generated online; a recording conversion module for calling a translation model API to convert the collection recording into recording text; a preprocessing module for preprocessing the recording text and segmenting long text in it to obtain input text; a first quality inspection module for importing the input text into an original model for quality inspection and outputting a first quality inspection result; a second quality inspection module for constructing a local collection compliance vector knowledge base, calling the vector knowledge base from a large language model, importing the input text into the large language model for quality inspection, and outputting a second quality inspection result; a model training module for performing P-tuning training on the large language model with historical collection recording data to obtain a customized large language model; a third quality inspection module for importing the input text into the customized large language model for quality inspection to obtain a third quality inspection result; and a quality inspection result module for determining that the final quality inspection result is non-compliant if at least one of the first, second and third quality inspection results is non-compliant.
Compared with the prior art, the invention has the following beneficial effects. By mining the unstructured collection recording data accumulated in a financial institution's collection business and applying preprocessing such as data cleaning, the invention identifies the speakers, builds a collection sensitive word model, identifies sensitive content in speech more accurately, completes the check against quality inspection standards, and outputs the risk of sensitive words that may be present in collection calls. Collection sensitive word quality inspection allows collection speech and text to be analyzed automatically, which reduces labor cost. Large language model technology finds sensitive words by analyzing collection speech and text data, which improves collection efficiency. Quality inspectors can carry out more targeted spot checks, which reduces the manual quality inspection workload and improves working efficiency. The method can ensure compliant collection, improve efficiency, reduce costs and disputes, and promote the development of large language model technology in the financial field.
Drawings
The disclosure of the present invention is described with reference to the accompanying drawings. It is to be understood that the drawings are designed solely for the purposes of illustration and not as a definition of the limits of the invention. In the drawings, like reference numerals are used to refer to like parts. Wherein:
FIG. 1 is a flow chart of a method for quality inspection of sensitive words in debt collection according to an embodiment of the invention;
FIG. 2 is another schematic flow chart of the method for quality inspection of sensitive words in debt collection according to an embodiment of the invention;
FIG. 3 is a schematic structural diagram of a device for quality inspection of sensitive words in debt collection according to an embodiment of the invention.
Detailed Description
It is to be understood that, according to the technical solution of the present invention, those skilled in the art may propose various alternative structural modes and implementation modes without changing the true spirit of the present invention. Accordingly, the following detailed description and drawings are merely illustrative of the invention and are not intended to be exhaustive or to limit the invention to the precise form disclosed.
An embodiment of the invention is shown in FIG. 1 and FIG. 2. A large-model-based method for quality inspection of sensitive words in debt collection comprises the following steps:
S101, obtaining a collection recording generated online.
S102, calling a translation model API to convert the collection recording into recording text.
S103, preprocessing the recording text and segmenting long text in it to obtain input text. Quality inspection prediction is then run on each segment separately, which works around the input length limit of the large language model.
Preprocessing the recording text comprises removing the text of recordings shorter than 30 seconds and adding target tag information based on expert experience and historical complaint information.
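The preprocessing and segmentation in S103 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Transcript` type, the `MAX_CHARS` limit, the tag format and the choice to split at sentence-ending punctuation are all assumptions, since the text specifies only the 30-second filter and that long text is segmented.

```python
import re
from dataclasses import dataclass

MIN_DURATION_S = 30  # threshold stated in the description
MAX_CHARS = 200      # hypothetical per-segment limit for the model input


@dataclass
class Transcript:
    text: str
    duration_s: float  # length of the source recording in seconds


def split_sentences(text: str) -> list[str]:
    # Split after Chinese or Western sentence-ending punctuation.
    parts = re.split(r"(?<=[。！？!?])", text)
    return [p for p in parts if p.strip()]


def segment(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Pack whole sentences into segments of at most max_chars characters."""
    segments, current = [], ""
    for sentence in split_sentences(text):
        # A single over-long sentence is hard-split at the limit.
        while len(sentence) > max_chars:
            if current:
                segments.append(current)
                current = ""
            segments.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if current and len(current) + len(sentence) > max_chars:
            segments.append(current)
            current = ""
        current += sentence
    if current:
        segments.append(current)
    return segments


def preprocess(transcripts: list[Transcript], tags: tuple[str, ...] = ()) -> list[dict]:
    """Drop short recordings, segment the rest, and attach tag information."""
    inputs = []
    for t in transcripts:
        if t.duration_s < MIN_DURATION_S:
            continue
        for seg in segment(t.text):
            inputs.append({"text": seg, "tags": list(tags)})
    return inputs
```

Each returned segment can then be checked independently, and a recording is flagged if any of its segments is flagged.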
S104, importing the input text into the original model for quality inspection and outputting a first quality inspection result. Calling the original model identifies some of the obvious NSFW (uncivil or inappropriate) wording.
S105, constructing a local collection compliance vector knowledge base, calling the vector knowledge base from the large language model, importing the input text into the large language model for quality inspection, and outputting a second quality inspection result. The large language model calls the local sensitive word vector knowledge base to identify content that violates business compliance rules.
Specifically, constructing the local collection compliance vector knowledge base includes:
(1) Collecting collection recordings based on expert experience or historical customer complaint cases, to extend the collection compliance domain information available to the pre-trained large language model.
(2) After converting the collection recordings into recording text, screening out the collector's speech.
(3) Converting the recording text into 512-dimensional vectors using encoding software and storing the vectors in a database. The encoding software is google-universal-encoding.
(4) Converting the recording text to be evaluated into a 512-dimensional vector and computing its inner product with every vector in the database; the larger the inner product, the higher the similarity.
(5) If the similarity exceeds a set threshold, the recording exhibits a non-compliance problem that has occurred historically. For example, with the similarity threshold set to 0.8, a recording whose similarity exceeds 0.8 is flagged as having a historically observed non-compliance problem.
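The lookup in steps (3) to (5) can be sketched in plain Python. This is an illustrative sketch under stated assumptions: the class and method names are hypothetical, the text encoder is left out (vectors are passed in directly), and vectors are normalized to unit length so that the inner product lies in [-1, 1] and the 0.8 threshold is meaningful. The example uses 3-dimensional toy vectors in place of the 512-dimensional ones described above.

```python
import math


def normalize(vector: list[float]) -> list[float]:
    # Scale to unit length so the inner product behaves like cosine similarity.
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def inner_product(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class ComplianceKnowledgeBase:
    """In-memory stand-in for the vector database of historical violations."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries: list[tuple[str, list[float]]] = []

    def add(self, case: str, vector: list[float]) -> None:
        self.entries.append((case, normalize(vector)))

    def query(self, vector: list[float]) -> list[tuple[float, str]]:
        """Return (similarity, case) pairs above the threshold, best first."""
        q = normalize(vector)
        hits = [(inner_product(q, v), case) for case, v in self.entries]
        return sorted((h for h in hits if h[0] > self.threshold), reverse=True)


kb = ComplianceKnowledgeBase(threshold=0.8)
kb.add("repayment requested to a private account", [1.0, 0.0, 0.0])
kb.add("threatening language", [0.0, 1.0, 0.0])
matches = kb.query([0.9, 0.1, 0.0])  # close to the first stored case
```

A non-empty result means the text to be evaluated resembles at least one historical non-compliance case; a production system would replace the linear scan with a vector database query.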
Before the input text is imported into the large language model for quality inspection, the method further comprises: reading the content of the local vector knowledge base to obtain context related to the user request, filling a template with the request content and the context to obtain a prompt, and inputting the prompt into the large language model.
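Filling the template can be illustrated with a short sketch. The template wording and the `build_prompt` name are assumptions made for illustration; the description states only that the request content and the retrieved context are filled into a template to produce the prompt.

```python
# Hypothetical template: retrieved knowledge base cases first, then the request.
PROMPT_TEMPLATE = (
    "Known non-compliant cases:\n{context}\n\n"
    "Please determine whether the following collection recording is compliant:\n{request}"
)


def build_prompt(request_text: str, retrieved_cases: list[str]) -> str:
    """Combine the text to be evaluated with context from the knowledge base."""
    context = "\n".join(f"- {case}" for case in retrieved_cases) or "(none)"
    return PROMPT_TEMPLATE.format(context=context, request=request_text)


prompt = build_prompt(
    "请把钱转到我的个人账户。",
    ["Collector asks for repayment to a private account"],
)
```

The resulting string is what would be passed to the large language model for the second quality inspection.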
S106, performing P-tuning training on the large language model with historical collection recording data to obtain a customized large language model.
Specifically, this step comprises:
1) Collecting recordings and text data in the collection field and preprocessing the data. Preprocessing includes cleaning, denoising and labeling the data. For example, the text of recordings shorter than 30 seconds is removed and target tag information is added based on expert experience and historical complaint information.
2) Recognizing the recording data with ASR technology, distinguishing collectors from overdue users, and converting the recording data into text data locally in batches.
The ASR technology is specifically the DiarizationPipeline module of the whisperX model. For recognition, the language is specified as Chinese and the number of speakers as 2; the recording file (WAV or MP3 format) is input into the whisperX model, which directly outputs the speakers and the text of what each said. The collector's speech text can then be screened out according to the collector's fixed opening phrase, for example "I am XX Bank...".
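Screening out the collector's speech from diarized output can be sketched as below. The sketch does not call whisperX; it assumes the ASR step has already produced a list of segments, each with a `speaker` label and a `text` field, which is a shape chosen here for illustration. The regular expression for the opening phrase is also a hypothetical example.

```python
import re

# Hypothetical pattern for an opening such as "我是XX银行" ("I am XX Bank").
OPENING_PATTERN = re.compile(r"我是.{0,10}银行")


def find_collector(segments: list[dict], pattern: re.Pattern = OPENING_PATTERN):
    """Return the speaker label of the first segment containing the opening phrase."""
    for seg in segments:
        if pattern.search(seg["text"]):
            return seg["speaker"]
    return None


def collector_text(segments: list[dict], pattern: re.Pattern = OPENING_PATTERN) -> list[str]:
    """Keep only the utterances spoken by the identified collector."""
    speaker = find_collector(segments, pattern)
    if speaker is None:
        return []
    return [seg["text"] for seg in segments if seg["speaker"] == speaker]


segments = [
    {"speaker": "SPEAKER_01", "text": "喂，哪位？"},
    {"speaker": "SPEAKER_00", "text": "您好，我是XX银行的工作人员。"},
    {"speaker": "SPEAKER_01", "text": "什么事？"},
    {"speaker": "SPEAKER_00", "text": "您的贷款已逾期。"},
]
collector_lines = collector_text(segments)
```

Matching on the opening phrase rather than on speaking order avoids assuming that the collector always speaks first.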
3) Having experts label the text data as positive or negative according to whether it is compliant, thereby generating training samples.
For example, the training sample format of the model is:
{"input": "Please determine whether the following collection recording is compliant: [collection recording text]",
"output": "non-compliant"}
4) Dividing the training samples into a training set used for P-tuning training and a test set used for evaluating the model. Training typically requires several thousand labeled samples.
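Generating samples in this format and splitting them can be sketched as follows. The 80/20 split ratio, the fixed random seed, the "compliant" label for positive samples and the JSON Lines serialization are assumptions for illustration; the description gives only the sample format and the existence of a training set and a test set.

```python
import json
import random

INSTRUCTION = "Please determine whether the following collection recording is compliant: "


def make_sample(text: str, compliant: bool) -> dict:
    """Build one training sample from an expert-labeled transcript."""
    return {
        "input": INSTRUCTION + text,
        "output": "compliant" if compliant else "non-compliant",
    }


def split_samples(samples: list[dict], test_ratio: float = 0.2, seed: int = 42):
    """Shuffle reproducibly and split into (training set, test set)."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


def to_jsonl(samples: list[dict]) -> str:
    # ensure_ascii=False keeps Chinese text readable in the output file.
    return "\n".join(json.dumps(s, ensure_ascii=False) for s in samples)


samples = [make_sample(f"录音文本 {i}", compliant=(i % 2 == 0)) for i in range(10)]
train_set, test_set = split_samples(samples)
```

The training set would feed the P-tuning run and the test set would be held back for the evaluation in step 5).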
5) Configuring the P-tuning model parameters; training of the customized large language model is complete when the model performance reaches a set threshold. The parameter with the largest influence on the result is the learning rate. The training data is loaded and trained on the GPU, and the model is evaluated on the test set after training completes.
P-tuning is used to fine-tune the large language model: the base parameters of the pre-trained large language model are left unchanged and only the prompt embedding layer is trained. Because there are few trainable parameters, training can be completed on a single GPU. The P-tuned customized model outputs sensitive word quality inspection results, and its output is considerably more stable and accurate than that of the pre-trained model alone.
Fine-tuning the large language model with P-tuning on historical collection recordings yields a customized model that can identify quality inspection risks in a recording end to end.
6) Deploying the customized large language model in a production environment so that the collection system can call it through an API. This deep-learning-based sensitive word detection method detects sensitive words in the collection field efficiently and reduces missed and false detections.
Deploying the customized large language model in the production environment comprises: importing the customized large language model into the production environment; switching the model to eval mode; exposing an API service with FastAPI; submitting to the API the recording text to be evaluated, with the prompt added; and returning the evaluation result for the recording text.
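A possible shape for the serving layer is sketched below. It is an assumption-laden illustration: the `/quality-check` route, the `RecordingIn` schema, the prompt prefix and the `generate` callable standing in for the customized model are all hypothetical. With a PyTorch model, `model.eval()` would be called once before the app starts serving. FastAPI and pydantic are imported lazily so the evaluation logic can be exercised without the web stack installed.

```python
from typing import Callable

PROMPT_PREFIX = "Please determine whether the following collection recording is compliant: "


def evaluate(text: str, generate: Callable[[str], str]) -> dict:
    """Add the prompt, call the model, and normalize its answer."""
    answer = generate(PROMPT_PREFIX + text).strip().lower()
    # Fail closed: anything other than an explicit "compliant" is flagged.
    result = "compliant" if answer == "compliant" else "non-compliant"
    return {"text": text, "result": result}


def create_app(generate: Callable[[str], str]):
    """Wrap evaluate() in a FastAPI application (requires fastapi and pydantic)."""
    from fastapi import FastAPI
    from pydantic import BaseModel

    class RecordingIn(BaseModel):
        text: str  # recording text to be evaluated

    app = FastAPI()

    @app.post("/quality-check")
    def quality_check(req: RecordingIn):
        return evaluate(req.text, generate)

    return app
```

The collection system would then POST a JSON body such as `{"text": "..."}` to the route and read the `result` field of the response.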
S107, importing the input text into the customized large language model for quality inspection to obtain a third quality inspection result. Calling the customized large language model catches wording that is otherwise difficult to identify.
S108, if at least one of the first, second and third quality inspection results is non-compliant, the final quality inspection result is non-compliant.
Specifically, the final quality inspection result is non-compliant as long as any one of the three results is non-compliant. The three inspections evaluate the recording from three different angles with different emphases: the first simply identifies obvious abusive language with a large language model, the second finds cases similar to historical non-compliance cases, and the third uses the customized model to predict potential non-compliance that has not appeared before. Together, the three inspections cover different kinds of violation and improve quality inspection efficiency and accuracy.
For example, a collector using abusive language on the phone can be identified directly by the first quality inspection. A collector asking the customer on the phone to repay into the collector's private account (a historically frequent non-compliance case) can be identified by the second quality inspection. Hints and inducements that are less clearly defined can be identified by the third quality inspection.
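The decision rule in S108 amounts to a logical OR over the three results. A minimal sketch, with the `Verdict` enum and `final_verdict` name chosen here for illustration:

```python
from enum import Enum


class Verdict(Enum):
    COMPLIANT = "compliant"
    NON_COMPLIANT = "non-compliant"


def final_verdict(first: Verdict, second: Verdict, third: Verdict) -> Verdict:
    """The recording fails if any one of the three inspections fails."""
    results = (first, second, third)
    if Verdict.NON_COMPLIANT in results:
        return Verdict.NON_COMPLIANT
    return Verdict.COMPLIANT
```

Because a single failing inspection decides the outcome, the combined check trades some false positives for fewer missed violations, which suits a workflow where flagged recordings go on to manual spot checks.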
Optionally, the detected sensitive words can be compared against quality inspection standards to optimize the quality inspection workflow and improve its efficiency and accuracy. The corpus is tuned continuously according to the latest industry requirements and the rules of the financial institution.
Referring to FIG. 3, the invention also provides a large-model-based device for quality inspection of sensitive words in debt collection, comprising:
an acquisition module 101, configured to obtain a collection recording generated online;
a recording conversion module 102, configured to call a translation model API to convert the collection recording into recording text;
a preprocessing module 103, configured to preprocess the recording text and segment long text in it to obtain input text;
a first quality inspection module 104, configured to import the input text into the original model for quality inspection and output a first quality inspection result;
a second quality inspection module 105, configured to construct a local collection compliance vector knowledge base, call the vector knowledge base from the large language model, import the input text into the large language model for quality inspection, and output a second quality inspection result;
a model training module 106, configured to perform P-tuning training on the large language model with historical collection recording data to obtain a customized large language model;
a third quality inspection module 107, configured to import the input text into the customized large language model for quality inspection and obtain a third quality inspection result;
a quality inspection result module 108, configured to determine that the final quality inspection result is non-compliant if at least one of the first, second and third quality inspection results is non-compliant.
In summary, the invention has the following beneficial effects. By mining the unstructured collection recording data accumulated in a financial institution's collection business and applying preprocessing such as data cleaning, the invention identifies the speakers, builds a collection sensitive word model, identifies sensitive content in speech more accurately, completes the check against quality inspection standards, and outputs the risk of sensitive words that may be present in collection calls. Collection sensitive word quality inspection allows collection speech and text to be analyzed automatically, which reduces labor cost. Large language model technology finds sensitive words by analyzing collection speech and text data, which improves collection efficiency. Quality inspectors can carry out more targeted spot checks, which reduces the manual quality inspection workload and improves working efficiency. The method can ensure compliant collection, improve efficiency, reduce costs and disputes, and promote the development of large language model technology in the financial field.
The invention is a targeted application of intelligent speech recognition and large language models in the financial field, and an innovative attempt to apply large models to compliance quality inspection of debt collection. The technology can be applied to quality inspection of speech, text and other data in the collection field; it identifies sensitive words effectively, improves quality inspection efficiency and accuracy, and helps protect consumer rights and improve the industry's image. The method can also be applied to other fields that require compliance control, such as financial product recommendation, live streaming and other emerging industries. In addition to the output of the trained large language model, the method retains rule-based judgment from the knowledge base, such as expert labeling, and gives a combined quality inspection result, so that the strengths of human expertise and the large model are effectively combined.
The invention provides a large-model-based method and device for quality inspection of sensitive words in debt collection, which use machine learning algorithms to inspect speech, text and other data in the collection field and are technically innovative. The patent discloses a specific implementation and application method of large-model-based collection sensitive word quality inspection, which helps promote technical exchange and cooperation and the development of related technologies. Using a large model for sensitive word and compliance quality inspection greatly reduces the manual quality inspection workload, lowering cost and raising efficiency in the collection field. The large language model has high detection accuracy and can effectively find sensitive words and compliance problems in the collection field, improving quality inspection accuracy. Through large-language-model-based sensitive word and compliance quality inspection, the compliance of the collection industry can be monitored more effectively and consumers' rights and interests protected. For the collection industry, the technology can improve the industry's image and strengthen public trust in it. Once the method is widely adopted, it can standardize the operation of the collection industry, prevent illegal collection practices, and promote the healthy development of the industry.
It should be appreciated that the integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The technical scope of the present invention is not limited to the above description, and those skilled in the art may make various changes and modifications to the above-described embodiments without departing from the technical spirit of the present invention, and these changes and modifications should be included in the scope of the present invention.

Claims (7)

1. A large-model-based quality inspection method for sensitive words in debt collection, characterized by comprising the following steps:
obtaining debt collection recordings generated online;
calling a translation model API to convert the collection recording into recording text;
preprocessing the recording text and segmenting the long text in the recording text to obtain input text;
importing the input text into an original model for quality inspection and outputting a first quality inspection result;
building a local debt collection compliance vector knowledge base, calling the vector knowledge base through a large language model, importing the input text into the large language model for quality inspection, and outputting a second quality inspection result;
performing P-tuning training on the large language model using historical collection recording data to obtain a customized large language model;
wherein performing P-tuning training on the large language model using historical collection recording data to obtain the customized large language model comprises:
collecting recording and text data in the debt collection field and preprocessing the data;
using ASR technology to recognize the recording data, distinguish debt collectors from overdue users, and convert the recording data into text data;
having experts label the text data, assigning positive or negative labels according to whether the text is compliant, to generate training samples;
dividing the training samples into a training set and a test set, the training set being used for P-tuning training and the test set being used for evaluating the model effect;
configuring the P-tuning model parameters, the training of the customized large language model being complete when the model effect reaches a set threshold;
deploying the customized large language model in a production environment so that the collection system can call it through an API;
importing the input text into the customized large language model for quality inspection to obtain a third quality inspection result;
if at least one of the first quality inspection result, the second quality inspection result and the third quality inspection result is non-compliant, the final quality inspection result is non-compliant.

2. The large-model-based quality inspection method for sensitive words in debt collection according to claim 1, characterized in that building the local debt collection compliance vector knowledge base comprises:
collecting collection recordings based on expert experience or historical customer complaint cases;
converting the collection recordings into recording text and then screening out the debt collector's speech;
using encoding software to convert the recording text into 512-dimensional vectors and storing the converted vectors in a database;
converting the recording text to be evaluated into a 512-dimensional vector and computing its inner product with every vector in the database, a larger inner product indicating a higher similarity;
if the similarity exceeds a set threshold, the corresponding recording is considered to contain a non-compliance problem that has occurred before.
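The similarity check of claim 2 can be illustrated with a minimal Python sketch. It rests on several assumptions that are not in the patent: the encoder below is a hypothetical stand-in that hashes character bigrams into 512 buckets (the patent does not name its encoding software, and a real system would presumably use a sentence-embedding model), the knowledge base is a plain in-memory list rather than a database, and the threshold of 0.8 is an arbitrary illustrative value.

```python
import hashlib
import math

DIM = 512  # vector dimensionality stated in claim 2


def encode(text: str) -> list[float]:
    """Hypothetical stand-in encoder (assumption, not the patent's software).

    Hashes character bigrams into DIM buckets and L2-normalises the result,
    so the inner product of two vectors lies in [0, 1].
    """
    vec = [0.0] * DIM
    for i in range(len(text) - 1):
        bigram = text[i:i + 2]
        bucket = int(hashlib.md5(bigram.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def inner_product(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def max_similarity(query: str, knowledge_base: list[list[float]]) -> float:
    """Largest inner product between the query vector and any stored vector."""
    q = encode(query)
    return max((inner_product(q, v) for v in knowledge_base), default=0.0)


def is_flagged(query: str, knowledge_base: list[list[float]],
               threshold: float = 0.8) -> bool:
    """True if the query resembles a known non-compliant utterance.

    The threshold value is an illustrative assumption.
    """
    return max_similarity(query, knowledge_base) > threshold


if __name__ == "__main__":
    # Knowledge base built from one invented non-compliant utterance.
    kb = [encode("pay today or we will contact your family")]
    print(is_flagged("pay today or we will contact your family", kb))  # True
```

Normalising the vectors makes the inner product behave like cosine similarity, which keeps a single fixed threshold meaningful across texts of different lengths; whether the patented system normalises its vectors is not stated.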
3. The large-model-based quality inspection method for sensitive words in debt collection according to claim 1, characterized in that, before the input text is imported into the large language model for quality inspection, the method further comprises:
reading the content and obtaining the context related to the user request;
filling a template with the request content and the context content to obtain a prompt;
inputting the prompt into the large language model.

4. The large-model-based quality inspection method for sensitive words in debt collection according to claim 1, characterized in that preprocessing the recording text comprises: removing recording text shorter than 30 seconds, and adding target label information based on expert experience and historical complaint information.

5. The large-model-based quality inspection method for sensitive words in debt collection according to claim 1, characterized in that using ASR technology to recognize the recording data, distinguish debt collectors from overdue users, and convert the recording data into text data comprises:
using the whisperX model with the language set to Chinese and the number of speakers set to 2;
inputting the recording file into the whisperX model and outputting the speakers and the text of their speech;
screening out the debt collector's speech text according to the collector's fixed opening remarks.

6. The large-model-based quality inspection method for sensitive words in debt collection according to claim 1, characterized in that deploying the customized large language model in a production environment comprises:
importing the customized large language model into the production environment and switching the model to eval mode;
using a fastapi interface to provide API services externally;
submitting the recording text to be evaluated, together with the prompt, to the API, which returns the evaluation result for the recording text.

7. A large-model-based quality inspection device for sensitive words in debt collection, characterized by comprising:
an acquisition module, used to obtain collection recordings generated online;
a recording conversion module, used to call the translation model API to convert the collection recording into recording text;
a preprocessing module, used to preprocess the recording text and segment the long text in the recording text to obtain input text;
a first quality inspection module, used to import the input text into the original model for quality inspection and output a first quality inspection result;
a second quality inspection module, used to build a local debt collection compliance vector knowledge base, call the vector knowledge base through the large language model, import the input text into the large language model for quality inspection, and output a second quality inspection result;
a model training module, used to perform P-tuning training on the large language model using historical collection recording data to obtain a customized large language model, wherein performing P-tuning training on the large language model using historical collection recording data to obtain the customized large language model comprises:
collecting recording and text data in the debt collection field and preprocessing the data;
using ASR technology to recognize the recording data, distinguish debt collectors from overdue users, and convert the recording data into text data;
having experts label the text data, assigning positive or negative labels according to whether the text is compliant, to generate training samples;
dividing the training samples into a training set and a test set, the training set being used for P-tuning training and the test set being used for evaluating the model effect;
configuring the P-tuning model parameters, the training of the customized large language model being complete when the model effect reaches a set threshold;
deploying the customized large language model in a production environment so that the collection system can call it through an API;
a third quality inspection module, used to import the input text into the customized large language model for quality inspection to obtain a third quality inspection result;
a quality inspection result module, wherein if at least one of the first quality inspection result, the second quality inspection result and the third quality inspection result is non-compliant, the final quality inspection result is non-compliant.
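The decision rule of the quality inspection result module (any single non-compliant result makes the final result non-compliant) can be sketched in a few lines of Python. The `CheckResult` type, the `final_result` function and the source names are hypothetical and introduced only for illustration; the patent specifies the rule, not a data structure.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class CheckResult:
    """Outcome of one of the three quality inspections (hypothetical type)."""
    source: str      # e.g. "original_model", "vector_kb_llm", "custom_llm"
    compliant: bool


def final_result(results: Iterable[CheckResult]) -> dict:
    """Combine the individual results as stated in the claims.

    The final result is non-compliant as soon as at least one individual
    result is non-compliant.
    """
    failed = [r.source for r in results if not r.compliant]
    return {"compliant": not failed, "failed_checks": failed}


if __name__ == "__main__":
    outcome = final_result([
        CheckResult("original_model", True),
        CheckResult("vector_kb_llm", False),
        CheckResult("custom_llm", True),
    ])
    print(outcome)
    # {'compliant': False, 'failed_checks': ['vector_kb_llm']}
```

Returning the list of failed checks alongside the verdict is a design choice of this sketch; it lets a reviewer see which of the three inspections raised the flag.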
CN202311103890.4A 2023-08-30 2023-08-30 Method and device for checking quality of collection-accelerating sensitive words based on large model Active CN117149972B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311103890.4A CN117149972B (en) 2023-08-30 2023-08-30 Method and device for checking quality of collection-accelerating sensitive words based on large model

Publications (2)

Publication Number Publication Date
CN117149972A CN117149972A (en) 2023-12-01
CN117149972B true CN117149972B (en) 2025-01-17

Family

ID=88898124

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311103890.4A Active CN117149972B (en) 2023-08-30 2023-08-30 Method and device for checking quality of collection-accelerating sensitive words based on large model

Country Status (1)

Country Link
CN (1) CN117149972B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118245970B (en) * 2024-04-16 2024-11-12 北京面壁智能科技有限责任公司 Detection method and device
CN118378609B (en) * 2024-06-27 2024-09-27 浙江大学 Text auditing method and system based on large language model debate
CN118486331A (en) * 2024-07-16 2024-08-13 中博信息技术研究院有限公司 A method and system for implementing call recording quality inspection based on a large model
CN120278155B (en) * 2025-06-10 2025-09-12 中债数字金融科技有限公司 Multi-mode sensitive word processing method and device, storage medium and computer equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109151218A (en) * 2018-08-21 2019-01-04 平安科技(深圳)有限公司 Call voice quality detecting method, device, computer equipment and storage medium
CN109389971A (en) * 2018-08-17 2019-02-26 深圳壹账通智能科技有限公司 Insurance recording quality detecting method, device, equipment and medium based on speech recognition

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368130B (en) * 2020-02-26 2025-01-17 深圳前海微众银行股份有限公司 Quality inspection method, device and equipment for customer service record and storage medium
CN112036705A (en) * 2020-08-05 2020-12-04 苏宁金融科技(南京)有限公司 Quality inspection result data acquisition method, device and equipment
CN113642335A (en) * 2021-08-16 2021-11-12 上海云从企业发展有限公司 Method, device, equipment and medium for language compliance inspection of bank double-recording scenarios
CN115080713B (en) * 2022-05-25 2024-10-25 上海浦东发展银行股份有限公司 Intelligent voice training platform system and method thereof

Also Published As

Publication number Publication date
CN117149972A (en) 2023-12-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Country or region after: China

Address after: No.4 building, Hexi Financial City, Jianye District, Nanjing City, Jiangsu Province, 210000

Applicant after: Jiangsu Sushang Bank Co.,Ltd.

Address before: No.4 building, Hexi Financial City, Jianye District, Nanjing City, Jiangsu Province, 210000

Applicant before: JIANGSU SUNING BANK Co.,Ltd.

Country or region before: China

GR01 Patent grant