CN118312476A

CN118312476A - Sensitive information detection method, device, medium and system

Info

Publication number: CN118312476A
Application number: CN202410424871.XA
Authority: CN
Inventors: 柯晓霞; 郭志红; 吴辉锋; 肖蕾; 李茶花; 李七妹
Original assignee: China Life Insurance Co ltd
Current assignee: China Life Insurance Co ltd
Priority date: 2024-04-10
Filing date: 2024-04-10
Publication date: 2024-07-09

Abstract

The application discloses a sensitive information detection method, device, medium and system. The method comprises: in response to a sensitive information detection request, acquiring a file to be detected and placing it into a detection queue, where the file type of the file to be detected includes one or more of a text file, a picture file, an audio file and a video file; taking a target file out of the detection queue and determining a detection algorithm according to the file type of the target file; performing sensitive information detection on the target file based on the detection algorithm; and determining a detection report format according to the file type of the target file, and filling the sensitive information detection result of the target file into that format to obtain a detection report of the target file. This technical solution reduces manual participation, removes processing steps, lowers manual processing cost by replacing manual work with machines, reduces errors, improves the efficiency of checking sensitive information, and greatly shortens business processing time.

Description

A sensitive information detection method, device, medium and system

Technical Field

The present application relates to the field of computer technology, and in particular to a sensitive information detection method, device, medium and system.

Background

With rapid economic development in recent years, people pay increasing attention to their personal and property safety, and insurance services have therefore become a focus of attention.

Traditional sales risk prevention and control in the insurance field works in two ways. First, after the party preparing the materials has them ready, it submits a review application to the risk control manager. On receiving the review task, the risk control manager must manually review each document page by page, annotate every part that does not meet the requirements, and return the feedback to the applicant for modification. The applicant revises the materials according to the feedback and submits the review application again, and the cycle repeats until the materials meet the relevant requirements. This process generally takes multiple rounds of revision and communication to complete, which is time-consuming and laborious. Second, the risk control manager initiates an inspection of the relevant materials on their own initiative. The inspection is likewise a manual, item-by-item check. If problems are found, they are submitted to the problem handling department; after that department has dealt with them, the rectified materials must be submitted to the risk control manager for further verification and confirmation, and the cycle repeats until all identified problems have been rectified.

Existing manual review has many problems. First, there is no information system support for prevention and control before the event, which makes such work difficult to implement. In addition, no intelligent system supports in-process control: the review of huge volumes of material relies purely on manual work, which is inefficient and error-prone. Previous risk control systems were designed from the manager's perspective, focused mainly on control and mostly acted after the event, leaving risk prevention and control in a passive position. Sales materials can involve a great many prohibited descriptions, and it is very difficult for either managers or sales staff to be fully familiar with all of them, so omissions and errors in review are hard to avoid. Sensitive word library management lacks information system support: which words count as sensitive is recorded only in the various documents issued by regulators, the industry and the company. Moreover, new sales models that have emerged in recent years, such as short videos, do not yet have effective sales risk prevention and control measures.

Summary of the Invention

The embodiments of the present application provide a sensitive information detection method, device, medium and system. In view of the many problems of current manual review, this solution provides a technical solution for automatic sensitive information detection. Different detection algorithms are used for files to be detected of different file types, and the detection results are filled into the corresponding detection report, so that the sensitive information in a file to be detected can be detected and displayed, making it easy for users to view and understand the sensitive information problems in the file. The present invention reduces manual participation, removes processing steps, replaces manual work with machines, lowers manual processing cost, reduces errors, improves the efficiency of sensitive information verification, and greatly shortens business processing time.

An embodiment of the present application provides a sensitive information detection method, the method comprising:

in response to a sensitive information detection request, obtaining a file to be detected and placing the file to be detected into a detection queue, wherein the file type of the file to be detected includes one or more of a text file, a picture file, an audio file and a video file;

taking a target file out of the detection queue, and determining a detection algorithm according to the file type of the target file;

performing sensitive information detection on the target file based on the detection algorithm; and

determining a detection report format according to the file type of the target file, and filling the sensitive information detection result of the target file into the detection report format to obtain a detection report of the target file.

Further, after the file to be detected is placed into the detection queue, the method further includes:

switching the real-time status of the sensitive information detection request from an uploaded status to a pending-detection status;

after the target file is taken out of the detection queue and the detection algorithm is determined according to the file type of the target file, the method further includes:

switching the real-time status of the sensitive information detection request from the pending-detection status to a detecting status;

after the detection report of the target file is obtained, the method further includes:

switching the real-time status of the sensitive information detection request from the detecting status to a detection-completed status;

and the method further comprises:

in response to a status query instruction for the sensitive information detection request, reading the real-time status of the sensitive information detection request and displaying the real-time status.
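The status transitions above can be sketched as a small state machine. This is a minimal illustration only: the names `RequestStatus`, `DetectionRequest`, `advance` and `query_status` are hypothetical and do not appear in the application, and the sketch assumes a request only ever moves forward through the four statuses in the order described.

```python
from enum import Enum


class RequestStatus(Enum):
    """The four real-time statuses named in the text (identifiers are assumed)."""
    UPLOADED = "uploaded"
    PENDING = "pending detection"
    DETECTING = "detecting"
    COMPLETED = "detection completed"


# Each status may only advance to the next one, in the order given in the text.
_NEXT = {
    RequestStatus.UPLOADED: RequestStatus.PENDING,
    RequestStatus.PENDING: RequestStatus.DETECTING,
    RequestStatus.DETECTING: RequestStatus.COMPLETED,
}


class DetectionRequest:
    """Hypothetical holder for one detection request and its real-time status."""

    def __init__(self, request_id: str) -> None:
        self.request_id = request_id
        self.status = RequestStatus.UPLOADED

    def advance(self) -> RequestStatus:
        """Move to the next status; a completed request cannot advance further."""
        if self.status not in _NEXT:
            raise ValueError(f"request {self.request_id} is already completed")
        self.status = _NEXT[self.status]
        return self.status

    def query_status(self) -> str:
        """Answer a status query instruction with the current status label."""
        return self.status.value
```

In such a sketch, `advance` would be called once when the file is enqueued, once when it is dequeued and its algorithm chosen, and once when the report is produced.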

Further, determining a detection report format according to the file type of the target file, and filling the sensitive information detection result of the target file into the detection report format to obtain a detection report of the target file, includes:

if the file type of the target file is a text file, filling the sensitive information detection result of the target file into a text detection report format, wherein the text detection report format includes the number of sensitive information items and the paragraph position of each sensitive information item;

if the file type of the target file is a picture file, filling the sensitive information detection result of the target file into a picture detection report format, wherein the picture detection report format includes the number of sensitive information items, the text content of the sensitive information, and the pixel position of the sensitive information;

if the file type of the target file is an audio file, filling the sensitive information detection result of the target file into an audio detection report format, wherein the audio detection report format includes the number of sensitive information items, the text content of the sensitive information, the text content of the sentence containing the sensitive information, and the start time point and end time point of the sensitive information;

if the file type of the target file is a video file, filling the sensitive information detection result of the target file into a video detection report format, wherein the video detection report format includes the number of sensitive information items, the text content of the sensitive information, the text content of the sentence containing the sensitive information, the start time point and end time point of the sensitive information, and the video frames associated with the sensitive information.
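The four report formats listed above might be modelled as one record type per file type. The following is a sketch under assumptions: the class names, field names, the `(left, top, right, bottom)` box convention and the dictionary layout returned by `build_report` are all hypothetical; only the set of fields per file type is taken from the text.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class TextHit:
    paragraph_index: int  # paragraph position of the sensitive information


@dataclass
class PictureHit:
    text: str                             # text content of the sensitive information
    pixel_box: Tuple[int, int, int, int]  # assumed (left, top, right, bottom) in pixels


@dataclass
class AudioHit:
    text: str        # text content of the sensitive information
    sentence: str    # sentence that contains it
    start_s: float   # start time point, in seconds (unit is an assumption)
    end_s: float     # end time point, in seconds


@dataclass
class VideoHit(AudioHit):
    frame_indices: List[int] = field(default_factory=list)  # associated video frames


# Which record type a report of each file type is filled with.
HIT_TYPE_BY_FILE_TYPE = {
    "text": TextHit,
    "picture": PictureHit,
    "audio": AudioHit,
    "video": VideoHit,
}


def build_report(file_type: str, hits: Sequence[object]) -> Dict[str, object]:
    """Fill detection results into the report format chosen by file type."""
    hit_type = HIT_TYPE_BY_FILE_TYPE[file_type]
    for hit in hits:
        if type(hit) is not hit_type:
            raise TypeError(f"{file_type} report expects {hit_type.__name__} records")
    return {
        "file_type": file_type,
        "sensitive_count": len(hits),
        "hits": [asdict(hit) for hit in hits],
    }
```

Keeping one record type per file type makes it hard to fill, say, an audio result into a text report by mistake, which is the point of selecting the format from the file type.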

Further, after the detection report of the target file is obtained, the method further includes:

in response to a request to view the detection report of the target file, obtaining and displaying the detection report; and, upon receiving an instruction to view details, displaying the detected sensitive information; or, upon receiving an instruction to view the source file, displaying the target file.

Further, performing sensitive information detection on the target file based on the detection algorithm includes:

converting the target file into text information based on the detection algorithm;

matching the text information against a preset sensitive word library to find sensitive words.
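The matching step can be illustrated with plain substring search over the converted text. This is a sketch, not the application's algorithm: the application does not specify how matching is performed, and the function name `find_sensitive_words` and the example phrases are assumptions.

```python
from typing import Iterable, List, Tuple


def find_sensitive_words(text: str, lexicon: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (word, character offset) for every occurrence of a lexicon word.

    Plain substring search is assumed here; a production system might use
    a multi-pattern matcher instead.
    """
    hits: List[Tuple[str, int]] = []
    for word in sorted(set(lexicon)):
        if not word:
            continue  # ignore empty entries, which would match everywhere
        start = text.find(word)
        while start != -1:
            hits.append((word, start))
            start = text.find(word, start + 1)
    # Order hits by position so the report follows the reading order of the file.
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))
```

For example, calling `find_sensitive_words(transcript, {"guaranteed", "zero risk"})` would list each occurrence of either phrase together with its offset in `transcript`.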

Further, the method further comprises:

in response to a query request for the sensitive word library, displaying the sensitive words in the sensitive word library according to a pre-defined classification;

in response to a request to report a sensitive word, receiving the word to be entered;

issuing a word entry review request, and receiving the word entry review result;

if the review result is that entry is allowed, determining the category to which the word to be entered belongs, and entering the word into the sensitive word library under that category;

if the review result is that entry is not allowed, ending the sensitive word reporting process.
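The query and reporting flow above can be sketched as a categorised word store with a review gate. The class `SensitiveLexicon` and the `review` and `classify` callables are hypothetical stand-ins: the application does not say who performs the review or how the category is determined, so both are passed in as functions here.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Set


class SensitiveLexicon:
    """Hypothetical sensitive word library grouped by category."""

    def __init__(self) -> None:
        self._by_category: DefaultDict[str, Set[str]] = defaultdict(set)

    def query(self) -> Dict[str, List[str]]:
        """Return the words grouped by their pre-defined category."""
        return {
            category: sorted(words)
            for category, words in sorted(self._by_category.items())
        }

    def submit(
        self,
        word: str,
        review: Callable[[str], bool],
        classify: Callable[[str], str],
    ) -> bool:
        """Run the reporting flow for one word.

        The word is entered only if the review allows it; the category is
        determined after approval, as in the text. Returns True if entered.
        """
        if not review(word):
            return False  # entry not allowed: the reporting process ends here
        self._by_category[classify(word)].add(word)
        return True
```

A rejected word leaves the library unchanged, which mirrors the last step of the flow.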

An embodiment of the present application also provides a sensitive information detection device, the device comprising:

a to-be-detected file acquisition module, configured to obtain a file to be detected in response to a sensitive information detection request and place the file to be detected into a detection queue, wherein the file type of the file to be detected includes one or more of a text file, a picture file, an audio file and a video file;

a detection algorithm determination module, configured to take a target file out of the detection queue and determine a detection algorithm according to the file type of the target file;

a detection module, configured to perform sensitive information detection on the target file based on the detection algorithm;

a detection report generation module, configured to determine a detection report format according to the file type of the target file, and fill the sensitive information detection result of the target file into the detection report format to obtain a detection report of the target file.

An embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, the sensitive information detection method described in the embodiments of the present application is implemented.

An embodiment of the present application also provides a sensitive information detection system, which is used to execute the sensitive information detection method described in the embodiments of the present application.

An embodiment of the present application also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the sensitive information detection method described in the embodiments of the present application when executing the computer program.

At least one of the above technical solutions adopted in the embodiments of the present application can achieve the following beneficial effects:

The technical solution provided by the present application uses different detection algorithms for files to be detected of different file types and fills the detection results into the corresponding detection report, so that the sensitive information in a file to be detected can be detected and displayed, making it easy for users to view and understand the sensitive information problems in the file. The present invention reduces manual participation, removes processing steps, replaces manual work with machines, lowers manual processing cost, reduces errors, improves the efficiency of sensitive information verification, and greatly shortens business processing time.

Brief Description of the Drawings

The drawings described herein are provided for further understanding of the present application and constitute a part of it. The illustrative embodiments of the present application and their descriptions are used to explain the present application and do not constitute an improper limitation on it. In the drawings:

FIG. 1 is a schematic flow chart of the sensitive information detection method provided in Embodiment 1 of the present application;

FIG. 2 is a schematic diagram of the interaction flow of the sensitive information detection system provided in Embodiment 1 of the present application;

FIG. 3 is a schematic diagram of the upload detection flow of the sensitive information detection system provided in Embodiment 1 of the present application;

FIG. 4 is a schematic diagram of the input detection flow of the sensitive information detection system provided in Embodiment 1 of the present application;

FIG. 5 is a schematic diagram of the historical detection list interaction flow of the sensitive information detection system provided in Embodiment 1 of the present application;

FIG. 6 is a schematic diagram of the word library query flow of the sensitive information detection system provided in Embodiment 1 of the present application;

FIG. 7 is a schematic diagram of the sensitive word addition flow of the sensitive information detection system provided in Embodiment 1 of the present application;

FIG. 8 is a schematic diagram of the review flow for newly added sensitive words of the sensitive information detection system provided in Embodiment 1 of the present application;

FIG. 9 is a schematic structural diagram of the sensitive information detection device provided in Embodiment 2 of the present application;

FIG. 10 is a schematic structural diagram of an electronic device provided in Embodiment 5 of the present application.

Detailed Description

To make the purpose, technical solutions and advantages of the present application clearer, the technical solutions of the present application are described clearly and completely below with reference to specific embodiments and the corresponding drawings. The described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the scope of protection of the present application.

The technical solutions provided by the embodiments of the present application are described in detail below with reference to the drawings.

Embodiment 1

FIG. 1 is a schematic flow chart of the sensitive information detection method provided in Embodiment 1 of the present application. This embodiment is applicable to detecting whether videos, audio, text and pictures contain sales-sensitive words. The method can be executed by the sensitive information detection device provided in the embodiments of the present application; the device can be implemented in software and/or hardware and can be integrated into an electronic device used for sensitive information detection.

As shown in FIG. 1, the method comprises:

S110, in response to a sensitive information detection request, obtaining a file to be detected, and placing the file to be detected into a detection queue; wherein the file type of the file to be detected includes one or more of a text file, a picture file, an audio file and a video file;

In this solution, the sensitive information detection request may be issued by a salesperson or a user, for example by clicking a "Sensitive Information Detection" button in the sensitive information detection system.

The file to be detected may be a file uploaded by a salesperson or user, such as a picture, audio or video file, or it may be text content entered directly in the search bar.

In this solution, the sensitive information detection system can maintain a detection queue. Received requests are placed into the queue and are detected one by one in the order in which they were placed.
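The first-in, first-out behaviour described for the detection queue can be sketched with a double-ended queue. The names `QueuedFile` and `DetectionQueue`, and the choice of an in-memory `collections.deque`, are assumptions for illustration; the application does not say how the queue is stored.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional


@dataclass
class QueuedFile:
    """Hypothetical queue entry: a file name plus its declared file type."""
    name: str
    file_type: str


class DetectionQueue:
    """Files are detected one by one in the order they were placed in the queue."""

    def __init__(self) -> None:
        self._items: Deque[QueuedFile] = deque()

    def put(self, item: QueuedFile) -> None:
        self._items.append(item)  # newest entries go to the back

    def take(self) -> Optional[QueuedFile]:
        """Remove and return the oldest entry, or None if the queue is empty."""
        return self._items.popleft() if self._items else None

    def __len__(self) -> int:
        return len(self._items)
```

A worker would call `take` repeatedly and treat each returned entry as the target file for step S120.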

Sensitive information detection includes but is not limited to sensitive word detection. This solution helps users automatically check whether the audio, text and picture content of video files, audio files, common office files and pictures of various types contains sales-sensitive words. A corresponding detection report is issued for each type of file. Each report shows at a glance whether the speech, image and text parts of the file contain sales-sensitive words and which ones, and highlights them so that users can locate the sensitive words quickly and correct them. The system moves away from purely manual operation: by providing simple, fast and convenient tools for automated inspection, review and analysis, it lowers the difficulty of operation and the barrier to use, so that the system is easy to learn and usable by everyone. It replaces the manual review working mode and improves quality and efficiency. It helps build an intelligent line of defense for sales risk prevention and control, and is intended to fill a gap in sensitive information detection applications in the industry.

The sensitive information detection system can be used to check, autonomously and proactively, whether the audio, text and pictures of the relevant files contain sales-sensitive words. This prevents sales risks at the source, turns passive handling into active prevention and control, forms a closed loop of risk control covering the stages before, during and after the event, and effectively reduces the pressure on front-line risk control. The solution also fills the gap in information system support for risk prevention before the event and moves the line of defense to the very front of the sales process. It shifts risk prevention and control from investigation and supervision to self-service control and active prevention, changing the traditional way of working.

S120, taking a target file out of the detection queue, and determining a detection algorithm according to the file type of the target file;

A file to be detected is taken out of the detection queue and becomes the target file. The file type of the target file can be a single type or a mixed type, such as a mixture of text and pictures, or of audio and text. Different detection algorithms can be used to perform sensitive word detection on target files of different types.
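Selecting a detection algorithm by file type, including for mixed-type files, can be sketched as a lookup in a registry. Everything named here is hypothetical: `Detector`, `REGISTRY`, `select_detectors` and the type labels are illustrative, and the picture, audio and video entries are placeholders because the application does not name the recognition techniques it uses.

```python
from typing import Callable, Dict, List, Sequence

# A detector turns raw file content into text that can be matched against the lexicon.
Detector = Callable[[bytes], str]


def text_detector(data: bytes) -> str:
    """Text files need no conversion beyond decoding (UTF-8 is assumed)."""
    return data.decode("utf-8")


def placeholder_detector(kind: str) -> Detector:
    """Stand-in for picture, audio or video extraction, which this sketch omits."""
    def run(data: bytes) -> str:
        raise NotImplementedError(f"{kind} extraction is not part of this sketch")
    return run


REGISTRY: Dict[str, Detector] = {
    "text": text_detector,
    "picture": placeholder_detector("picture"),
    "audio": placeholder_detector("audio"),
    "video": placeholder_detector("video"),
}


def select_detectors(
    file_types: Sequence[str], registry: Dict[str, Detector] = REGISTRY
) -> List[Detector]:
    """Return one detector per file type; a mixed-type file gets several."""
    unknown = [t for t in file_types if t not in registry]
    if unknown:
        raise ValueError(f"unsupported file type(s): {unknown}")
    return [registry[t] for t in file_types]
```

A file that mixes text and pictures would be passed as `["text", "picture"]` and receive both detectors.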

S130, performing sensitive information detection on the target file based on the detection algorithm;

The detection algorithm is used to detect sensitive information in the target file, for example sensitive words that appear in it, or other sensitive information such as sensitive image elements or sensitive audio information.

Specifically, in this solution, the text content can be extracted from target files of different types, for example from video or audio files, and compared with the predetermined sensitive words to determine whether any sensitive word is present. If so, sensitive information is determined to have been detected. For pictures, the text included in the picture can be extracted and compared with the predetermined sensitive words.

S140, determining a detection report format according to the file type of the target file, and filling the sensitive information detection result of the target file into the detection report format to obtain a detection report of the target file.

Different detection report formats can be determined for different file types. The detection report format is chosen so as to present the sensitive information detection result of the current target file more clearly.

After the detection report format is determined, the sensitive information detection result can be filled into the corresponding fields of the format to obtain the detection report of the target file.

For example, the system can recognize the speech, image and text content in a video file, automatically detect sensitive information related to insurance sales, and issue an audio content detection report and a picture content detection report for the video file. The video content detection report marks the number of video frames in which sensitive words appear, the order of those frames and the specific time at which each frame appears in the video file, and highlights the sensitive words detected on the frames. The audio content detection report states the start time and end time, within the whole video file, of each audio segment in which sensitive words are detected, provides the corresponding transcript, and highlights the detected sensitive words in the transcript. This solution greatly improves the efficiency and quality of video file detection.
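Highlighting detected words in a transcript can be sketched with a regular expression built from the lexicon. The function name `highlight` and the `[[`/`]]` markers are assumptions; a real report would presumably use whatever markup its display layer supports. Longer phrases are tried first so that a phrase is not split by a shorter word it contains.

```python
import re
from typing import Iterable


def highlight(text: str, lexicon: Iterable[str], left: str = "[[", right: str = "]]") -> str:
    """Wrap every lexicon match in `text` with the given markers."""
    # Longest first: "guaranteed returns" should win over "guaranteed".
    words = sorted((w for w in set(lexicon) if w), key=len, reverse=True)
    if not words:
        return text
    pattern = re.compile("|".join(re.escape(w) for w in words))
    return pattern.sub(lambda m: f"{left}{m.group(0)}{right}", text)
```

`re.escape` keeps lexicon entries that contain punctuation from being read as regular expression syntax.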

In this technical solution, optionally, determining a detection report format according to the file type of the target file, and filling the sensitive information detection result of the target file into the detection report format to obtain the detection report of the target file, includes handling that depends on the file type, as follows.

For text files, the detected sensitive information can be filled directly into the detection report, including the number of sensitive information items and the paragraph in which each item is located, or the location of the sensitive information can be highlighted directly.

For picture files, the text content or image content included in the picture can be examined to determine whether it contains sensitive information. If it does, the sensitive information can be outlined with a rectangular frame or shown with a highlighted background, and its location can also be output explicitly, for example as pixel coordinates.

For audio files, the speech can be extracted and converted into text, so as to identify whether the text contains sensitive information, together with the number of sensitive information items, the text content of the sensitive information, the text content of the sentence containing it, and its start time point and end time point.

For video files, the speech of the different speakers can be extracted and converted into text, so as to identify whether the text contains sensitive information, together with the number of sensitive information items, the text content of the sensitive information, the text content of the sentence containing it, its start time point and end time point, and the video frames associated with it.
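For audio and video, the time points in the report can be carried over from the transcript once speech has been converted to text. The sketch below assumes the speech-to-text step yields `(start_s, end_s, sentence)` segments, which the application does not state, and maps a hit to video frames from an assumed constant frame rate. The names `SpeechHit`, `match_segments` and `frames_for` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple

# Assumed transcript shape: (start time in seconds, end time in seconds, sentence text).
Segment = Tuple[float, float, str]


@dataclass
class SpeechHit:
    word: str
    sentence: str
    start_s: float
    end_s: float


def match_segments(segments: Sequence[Segment], lexicon: Iterable[str]) -> List[SpeechHit]:
    """Report each sensitive word with the sentence and time span it was spoken in."""
    words = sorted(w for w in set(lexicon) if w)
    hits: List[SpeechHit] = []
    for start_s, end_s, sentence in segments:
        for word in words:
            if word in sentence:
                hits.append(SpeechHit(word, sentence, start_s, end_s))
    return hits


def frames_for(hit: SpeechHit, fps: float) -> range:
    """Frame indices covered by the hit, assuming a constant frame rate."""
    return range(int(hit.start_s * fps), int(hit.end_s * fps) + 1)
```

The time span here is that of the whole sentence; word-level timing would need a transcript with per-word timestamps.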

With this arrangement, sensitive information in different types of target files can be reviewed in the corresponding way, which broadens the applicability of the sensitive information detection system. In addition, the format of the detection report better reflects the content of the sensitive information contained in the target file, helps reviewers locate the sensitive information efficiently, and improves the presentation of the detection report.

In the technical solution provided by this embodiment, in response to a sensitive information detection request, a file to be detected is obtained and placed into a detection queue, wherein the file type of the file to be detected includes one or more of a text file, a picture file, an audio file and a video file; a target file is taken out of the detection queue, and a detection algorithm is determined according to the file type of the target file; sensitive information detection is performed on the target file based on the detection algorithm; and a detection report format is determined according to the file type of the target file, and the sensitive information detection result of the target file is filled into the detection report format to obtain a detection report of the target file. By using different detection algorithms for files to be detected of different file types and filling the detection results into the corresponding detection report, this solution can detect and display the sensitive information in a file to be detected, making it easy for users to view and understand the sensitive information problems in the file. The present invention reduces manual participation, removes processing steps, replaces manual work with machines, lowers manual processing cost, reduces errors, improves the efficiency of sensitive information verification, and greatly shortens business processing time.

In each of the above technical solutions, optionally, after the file to be detected is put into the detection queue, the method further comprises:

The status query instruction for a sensitive information detection request may be issued manually by the user, or issued automatically by the system when the user enters the corresponding interface of the sensitive information detection system. For example, when the user enters the interface that displays the files to be detected, the real-time status of each file can be shown at its corresponding position.

With this arrangement, status synchronization lets sales personnel or users see the detection status of previously submitted files more directly, which improves the user experience of the sensitive information detection system.
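A status query of this kind could be served from a small lookup keyed by request. The sketch below is only illustrative: the state names (`queued`, `detecting`, `done`), the `StatusBoard` class, and the request IDs are assumptions, since the text does not enumerate the states.

```python
from enum import Enum


class DetectionStatus(Enum):
    # Assumed state names; the text does not list them.
    QUEUED = "queued"
    DETECTING = "detecting"
    DONE = "done"


class StatusBoard:
    """Tracks the state of each detection request by request ID."""

    def __init__(self) -> None:
        self._status: dict[str, DetectionStatus] = {}

    def enqueue(self, request_id: str) -> None:
        """Record a request as soon as its file enters the detection queue."""
        self._status[request_id] = DetectionStatus.QUEUED

    def update(self, request_id: str, status: DetectionStatus) -> None:
        self._status[request_id] = status

    def query(self, request_id: str) -> str:
        """Answer a status query, whether user-issued or page-triggered."""
        return self._status[request_id].value


board = StatusBoard()
board.enqueue("req-1")
status_when_queued = board.query("req-1")
board.update("req-1", DetectionStatus.DETECTING)
status_when_running = board.query("req-1")
```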

In each of the above technical solutions, optionally, after the detection report of the target file is obtained, the method further comprises:

The view request for a detection report may be the request issued when the user clicks the "View Details" text link of the report; clicking the link opens the details of the report, such as the content and location of each item of sensitive information. From there, the salesperson or user can also click the "View Source File" button to view the submitted file to be detected, that is, the target file. This makes it possible to inspect the context in which the flagged information appears in the source file, so that the user better understands in which situations sensitive words should be avoided.

With this arrangement, users are offered a wider range of operations once the detection report is available, which makes the sensitive information detection system more flexible to use and helps users grasp the context of the sensitive information and how to avoid it.

In each of the above technical solutions, optionally, performing sensitive information detection on the target file based on the detection algorithm comprises:

It can be understood that if the target file is a text file, its text information can be obtained directly; if the target file is in another form, such as a picture, audio, or video file, the text information is first extracted from it. For example, mainstream AI technologies such as media content analysis, video recognition, speech recognition, speech transcription, image recognition, and OCR can be used.

With this arrangement, the solution can intelligently recognize whether the speech, image, and text content contained in multimedia files (video files and audio files) and non-multimedia files (common documents such as WORD, PPT, PDF, Excel, and TXT, and pictures in formats such as png, jpg, jpeg, and bmp) involves insurance-sales sensitive information. This enables self-service checking for risk prevention and intelligent checking for risk screening in the insurance field, and moves the line of risk prevention and control forward to the very front end of sales so that risk is controlled at the source. By building an AI capability engine, the solution exposes its capabilities to empower more insurance risk-control scenarios. The AI technologies involved (media content analysis, video recognition, speech recognition, speech transcription, image recognition, and OCR) are implemented on the Tencent Cloud AI interfaces; the Tencent Cloud media processing, OCR, and object storage interfaces are integrated into this application as the supporting AI engine.
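Before a detection algorithm can be chosen, an uploaded file has to be assigned to one of the categories listed above. One simple way to do this, assumed here rather than described in the application, is to look at the file extension. The picture extensions come from the text; the document extensions are the usual ones for the named formats, and the audio and video extensions are purely illustrative guesses.

```python
from pathlib import PurePath

# Extensions for the document formats named in the text (WORD, PPT, PDF, Excel, TXT).
DOCUMENT_EXTS = {".doc", ".docx", ".ppt", ".pptx", ".pdf", ".xls", ".xlsx", ".txt"}
# Picture formats named in the text.
PICTURE_EXTS = {".png", ".jpg", ".jpeg", ".bmp"}
# Assumed examples; the text does not list audio or video formats.
AUDIO_EXTS = {".mp3", ".wav"}
VIDEO_EXTS = {".mp4", ".mov"}


def classify(filename: str) -> str:
    """Return 'document', 'picture', 'audio' or 'video' for a file name."""
    ext = PurePath(filename).suffix.lower()
    if ext in DOCUMENT_EXTS:
        return "document"
    if ext in PICTURE_EXTS:
        return "picture"
    if ext in AUDIO_EXTS:
        return "audio"
    if ext in VIDEO_EXTS:
        return "video"
    raise ValueError(f"unsupported file type: {ext or filename}")
```

For example, `classify("Training.PPTX")` returns `"document"`, while a file with an unknown extension is rejected instead of being silently skipped.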

In each of the above technical solutions, optionally, the method further comprises:

In this solution, the sensitive information detection system may be provided with a sensitive word library that can be queried. For example, on the "Word Library Query" page a user can (1) view all words contained in the system's sensitive word library, (2) select a type and view the words of that type, and (3) manually enter a word to check whether it is a sensitive word.
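The three query modes can be sketched over a simple word-to-type mapping. The entries, the type names, and the function names below are hypothetical placeholders; the application does not specify how the library is stored.

```python
# Hypothetical lexicon: sensitive word -> type.
LEXICON = {
    "sold out soon": "sales hype",
    "last chance": "sales hype",
    "best in the world": "absolute claims",
}


def all_words() -> list[str]:
    """Mode 1: list every word in the library."""
    return sorted(LEXICON)


def words_of_type(word_type: str) -> list[str]:
    """Mode 2: list the words that belong to one type."""
    return sorted(w for w, t in LEXICON.items() if t == word_type)


def is_sensitive(word: str) -> bool:
    """Mode 3: check whether a manually entered word is in the library."""
    return word.strip() in LEXICON
```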

With this arrangement, sales personnel can find out in advance whether a word they intend to use is sensitive and can avoid the sensitive words that tend to appear in sales information. This improves the compliance of the text or picture content that sales personnel produce.

In another embodiment, optionally, the method further comprises:

In this solution, the sensitive information detection system provides a "Sensitive Word Management" page on which a user can (1) report words: select the type of the words to be added and enter the keywords to report; several words can be reported at once, separated by the Chinese enumeration comma (、); and (2) view the review status of the words they have reported. A review administrator can additionally (3) view the list of reported words and (4) review the words one by one: to approve a word, select its type and click the "Approve" button; to reject it, fill in the reason and click the "Reject" button.

With this arrangement, sales personnel or users can take part in collecting sensitive words, while review by auditors keeps the entry of sensitive words into the openly maintained word library standardized, which improves the performance of the sensitive information detection system.

FIG. 2 is a schematic diagram of the interaction flow of the sensitive information detection system provided in Embodiment 1 of the present application. As shown in FIG. 2, the user uploads the file to be detected; the sensitive information detection system automatically calls the relevant interfaces, obtains the sensitive information detection result by means such as media content analysis, video recognition, speech recognition, speech transcription, image recognition, and OCR, and returns the detection report to the user.

FIG. 3 is a schematic diagram of the upload detection flow of the sensitive information detection system provided in Embodiment 1 of the present application, and FIG. 4 is a schematic diagram of its input detection flow. As shown in FIG. 3 and FIG. 4, uploaded files and entered text can each be detected, and a detection report is returned. For example, the user enters the sensitive information detection system, selects "Upload Detection", and chooses the file to be detected; once the upload finishes, the system jumps automatically to the "My Detection" interface, where the detection progress and report details of the file can be viewed. Alternatively, the user opens the "Input Detection" page, enters text, and clicks "Start Detection"; once the upload finishes, the system jumps automatically to the "My Detection" interface, where the detection progress and report details of the entered text can be viewed.

FIG. 5 is a schematic diagram of the interaction flow for the historical detection list of the sensitive information detection system provided in Embodiment 1 of the present application. As shown in FIG. 5, on the "My Detection" page the user can view their own statistics, the list of all files they have uploaded for detection, and the corresponding detection reports.

FIG. 6 is a schematic diagram of the word library query flow of the sensitive information detection system provided in Embodiment 1 of the present application. As shown in FIG. 6, on the "Word Library Query" page the user can (1) view all words contained in the system's sensitive word library, (2) select a type and view the words of that type, and (3) manually enter a word to check whether it is a sensitive word.

FIG. 7 is a schematic diagram of the flow for adding sensitive words in the sensitive information detection system provided in Embodiment 1 of the present application, and FIG. 8 is a schematic diagram of the flow for reviewing newly added sensitive words. As shown in FIG. 7 and FIG. 8, on the "Sensitive Word Management" page a user can (1) report words: select the type of the words to be added and enter the keywords to report; several words can be reported at once, separated by the Chinese enumeration comma (、); and (2) view the review status of the words they have reported. A review administrator can additionally (3) view the list of reported words and (4) review the words one by one: to approve a word, select its type and click the "Approve" button; to reject it, fill in the reason and click the "Reject" button.

The technical solution provided by this application is committed to making insurance risk governance intelligent and to exploring the use of artificial intelligence and other new technologies in insurance risk management. It relies on a series of mainstream AI technologies (media content analysis, video recognition, speech recognition, speech transcription, image recognition, and OCR) to intelligently recognize whether the speech, image, and text content contained in multimedia files (video files and audio files) and non-multimedia files (common documents such as WORD, PPT, PDF, Excel, and TXT, and pictures in formats such as png, jpg, jpeg, and bmp) involves insurance-sales sensitive information. The solution thereby enables self-service checking for risk prevention and intelligent checking for risk screening in the insurance field, and moves the line of risk prevention and control forward to the very front end of sales so that risk is controlled at the source. It pioneers a shift in risk prevention and control from inspection and supervision to service-based self-control; it changes the traditional way of working in risk management from passive handling to active prevention; it pioneers a "co-construction, sharing, and common use" model for the sensitive word library and establishes a "brain" for sales risk prevention and control that keeps pace with the times; and, by building an AI capability engine, it exposes its capabilities to empower more insurance risk-control scenarios. The AI technologies involved are implemented on the Tencent Cloud AI interfaces; the Tencent Cloud media processing, OCR, and object storage interfaces are integrated into this application as the supporting AI engine.

This solution pioneers the "co-construction, sharing, and common use" model for the sensitive word library and an end-to-end digital management process for it, establishing a "brain" for sales risk prevention and control.

This solution creates the first dedicated insurance-sales sensitive word library. The library is built from regulatory, industry, and company documents, and every sensitive word is sorted into a category. The system also establishes a standard, complete digital management process for the library, in which users are divided into two roles: ordinary users and approval administrators. In the lookup stage, both ordinary users and approval administrators can at any time look up the sensitive words currently adopted by the system. In the reporting stage, ordinary users can at any time report sensitive words that are missing from the library, but a reported word is entered into the library and formally adopted only after an approval administrator has reviewed and approved it. Approval administrators, by contrast, can enter the words they report directly into the library, and those words are formally adopted as soon as they are entered. In the approval stage, ordinary users can see the review result and the reason for any rejection in real time, and approval administrators can either accept or reject each reported word. To accept a word ("agree to adopt"), the administrator must tick the category the word belongs to before it can be entered into the library; to reject it ("reject application"), the administrator must fill in the reason for rejection.
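The reporting and approval rules described above can be sketched as a small state machine. The class and field names, the status strings, and the sample words are hypothetical, and requiring administrators to supply a category when they enter their own words directly is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    word: str
    reporter: str
    status: str = "pending"   # "pending", "adopted" or "rejected"
    category: str = ""
    reject_reason: str = ""


class SensitiveWordLibrary:
    def __init__(self, admins: set[str]) -> None:
        self.admins = admins
        self.adopted: dict[str, str] = {}          # word -> category
        self.submissions: list[Submission] = []

    def report(self, reporter: str, raw: str, category: str = "") -> list[Submission]:
        """Report one or more words separated by the enumeration comma '、'."""
        is_admin = reporter in self.admins
        if is_admin and not category:
            raise ValueError("an administrator must choose a category")
        words = [w.strip() for w in raw.split("、") if w.strip()]
        new = [Submission(w, reporter) for w in words]
        self.submissions.extend(new)
        if is_admin:
            # Words reported by an administrator are adopted directly.
            for submission in new:
                self.approve(reporter, submission, category)
        return new

    def approve(self, admin: str, submission: Submission, category: str) -> None:
        self._require_admin(admin)
        if not category:
            raise ValueError("a category must be selected before adoption")
        submission.status = "adopted"
        submission.category = category
        self.adopted[submission.word] = category

    def reject(self, admin: str, submission: Submission, reason: str) -> None:
        self._require_admin(admin)
        if not reason:
            raise ValueError("a rejection reason is required")
        submission.status = "rejected"
        submission.reject_reason = reason

    def _require_admin(self, user: str) -> None:
        if user not in self.admins:
            raise PermissionError(f"{user} is not an approval administrator")


library = SensitiveWordLibrary(admins={"admin01"})
pending = library.report("agent07", "guaranteed profit、last chance")
library.approve("admin01", pending[0], "absolute claims")
library.reject("admin01", pending[1], "duplicate of an existing entry")
direct = library.report("admin01", "best in the industry", "absolute claims")
```

Only words in `adopted` would be used for detection, so a pending or rejected submission never affects results, and the reporter can read the outcome from `status` and `reject_reason`.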

The end-to-end digital management process for the sensitive word library covers reporting, approval, and lookup. Alongside digital management, it brings in the Internet mindset of "openness, co-construction, and sharing", so that the content of the library keeps up with changes in national regulatory requirements.

In addition, this solution uses new AI technologies such as media content analysis and video recognition to detect sensitive information in video files.

This solution intelligently recognizes the speech, image, and text content in a video file, automatically detects insurance-sales sensitive information, and issues two reports for the video file: an audio content detection report and a picture content detection report. The picture content detection report marks the number of video frames in which sensitive words appear, the order of those frames, and the exact time at which each frame appears in the video, and it highlights the sensitive words detected on each frame. The audio content detection report states, for each audio segment in which sensitive words are detected, its start time and end time within the whole video, provides the corresponding transcript, and highlights the detected sensitive words in that transcript. This greatly improves both the efficiency and the quality of video file detection.
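The picture content report for a video can be illustrated with a short sketch. It assumes that some recognition step has already produced the text visible on each sampled frame, that a frame's time can be derived as frame index divided by frame rate, and that highlighting is rendered with `[[...]]` markers; none of these details are stated in the application, and the sample words are placeholders.

```python
import re
from datetime import timedelta

# Hypothetical sample entries; the real lexicon is not given in the text.
SENSITIVE_WORDS = ("guaranteed returns", "last chance")
_PATTERN = re.compile("|".join(re.escape(w) for w in SENSITIVE_WORDS))


def highlight(text: str) -> str:
    """Wrap every detected sensitive word in [[...]] markers."""
    return _PATTERN.sub(lambda m: f"[[{m.group(0)}]]", text)


def frame_report(frames: list[tuple[int, str]], fps: float) -> dict:
    """Build the picture content report from (frame_index, recognized_text) pairs."""
    flagged = []
    for index, text in frames:
        hits = _PATTERN.findall(text)
        if hits:
            flagged.append({
                "order": len(flagged) + 1,  # position among flagged frames
                "time": str(timedelta(seconds=int(index / fps))),
                "words": hits,
                "text": highlight(text),
            })
    return {"flagged_frame_count": len(flagged), "frames": flagged}


video_report = frame_report(
    [(0, "welcome"), (2250, "last chance to buy")],
    fps=25,
)
```

Here frame 2250 at 25 frames per second lies 90 seconds into the video, so its `time` field is `"0:01:30"`.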

Secondly, this solution uses technologies such as speech recognition and transcription to detect sensitive information in audio files.

This solution automatically converts the content of an audio file into text and performs intelligent detection on that text. It issues two detection reports for each audio file. The first reports the result for each audio segment in which sensitive words are detected: the start time and end time of the segment within the whole audio file and the corresponding transcript, with the detected sensitive words highlighted. The second is a complete report that provides the full transcript of the whole audio file, with the detected insurance-sales sensitive words prominently highlighted.
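The two audio reports can be sketched as follows, assuming speech transcription has already produced timed segments of the form (start seconds, end seconds, transcript). The segment format, the `[[...]]` highlight markers, and the sample words are assumptions made for illustration.

```python
import re

# Hypothetical sample entries; the real lexicon is not given in the text.
SENSITIVE_WORDS = ("guaranteed returns", "last chance")
PATTERN = re.compile("|".join(re.escape(w) for w in SENSITIVE_WORDS))


def highlight(text: str) -> str:
    """Wrap every detected sensitive word in [[...]] markers."""
    return PATTERN.sub(lambda m: "[[" + m.group(0) + "]]", text)


def audio_reports(segments: list[tuple[float, float, str]]) -> tuple[list[dict], str]:
    """Return (per-segment report, highlighted full transcript)."""
    # Report 1: only the segments that contain a sensitive word.
    per_segment = [
        {"start": start, "end": end, "transcript": highlight(text)}
        for start, end, text in segments
        if PATTERN.search(text)
    ]
    # Report 2: the whole transcript, with every hit highlighted.
    full_transcript = " ".join(highlight(text) for _, _, text in segments)
    return per_segment, full_transcript


segments = [
    (0.0, 4.2, "welcome to the briefing"),
    (4.2, 9.8, "this plan offers guaranteed returns"),
]
per_segment, full_transcript = audio_reports(segments)
```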

Furthermore, this solution uses AI technologies such as image recognition and OCR to detect sensitive information in non-multimedia files.

This solution supports sales-sensitive word detection for common file types such as WORD, PPT, TXT, EXCEL, WPS, and PDF, and for pictures in formats such as png, jpg, jpeg, and bmp. The detection report states the page number of every page on which sensitive words are detected, lists separately the sensitive words found in the text part and in the picture part of that page, and highlights the detected words to help users locate them quickly.

In addition, this solution records every detection run of every user and issues a personal detection behavior analysis report for each user. At a glance, users can see how they have used the system, which non-compliant words were detected, and the five sensitive words that appear most often in their own files. This helps users understand their wording habits and avoid repeating these sensitive words in future work. The system also retains the uploaded source files for later verification.
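The "five most frequent sensitive words" statistic is a plain frequency count over a user's detection history. A minimal sketch, assuming the history is available as one list of detected words per detection run (the function name and the sample data are hypothetical):

```python
from collections import Counter


def top_sensitive_words(history: list[list[str]], n: int = 5) -> list[tuple[str, int]]:
    """Return the n most frequent sensitive words across all detection runs."""
    counts = Counter(word for hits in history for word in hits)
    return counts.most_common(n)


# Three detection runs by one user (sample data).
history = [
    ["last chance", "guaranteed returns"],
    ["last chance"],
    ["last chance", "guaranteed returns", "best in the world"],
]
top_two = top_sensitive_words(history, n=2)
```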

Finally, this solution builds a sensitive information detection capability engine and thereby establishes a new model of technology services.

Based on technologies such as the Tencent Cloud media processing, OCR, and object storage interfaces, this solution builds a technical framework for detecting sensitive information in multimedia and non-multimedia files within the China Life system. A set of file sensitive information detection APIs has been developed that supports video, audio, common office file types, and pictures; the APIs detect sensitive words in a file and return the detection result. These APIs are open to the entire China Life life insurance system, which exposes the detection capability, allows front-end applications to connect quickly, supports more detection scenarios, and improves the effectiveness of technology applications. The solution has already been applied to China Life's education and training systems, where it provides intelligent prevention and control for training courseware.

Compared with traditional ways of reviewing sensitive information, this solution has the following advantages:

1. Disruptive innovation: the line of risk prevention and control is moved forward to the very front end of sales.

This solution enables self-service checking for risk prevention and thus self-directed prevention and control. Before posting on WeChat, Douyin, Xiaohongshu, or Weibo, or after finishing promotional videos, training PPTs, and similar materials, sales staff, back-office staff, and managers of an insurance company can use the Zhichacha (智查查) system to check for themselves whether the speech, images, and text of the relevant files contain sales-sensitive words. This strengthens the self-regulation of insurance practitioners and gives strong support to self-inspection in advance and to organization-wide prevention of misleading insurance sales. Sales risks are prevented at the source, which overturns the traditional working model.

2. Technology empowerment: the solution is among the first in the insurance industry to explore a technical framework and application practice for file sensitive information detection, filling the gap in sensitive information detection applications in the company's systems.

This solution enables intelligent checking for risk screening and thus intelligent prevention and control. The system supports insurance-sales sensitive word detection for many types of video files, audio files, common office files, and pictures. Detection covers the speech, image, and text parts of a file, so coverage is comprehensive and leaves no blind spots. A detection report matching the file type is issued for each file; every report shows at a glance whether the speech, image, and text parts of the file contain sales-sensitive words and which ones, and highlights them so that users can locate and correct them quickly. The solution is the first to make insurance risk prevention and control information-based, intelligent, and tool-based, overturning the manual review model and improving quality and efficiency.

3. The first end-to-end digital management process for a sensitive word library.

This solution draws on multiple regulatory, industry, and company documents to create the first dedicated insurance-sales sensitive word library. The library currently in official use covers five categories of non-compliant descriptions or non-standard wording: (1) descriptions suspected of hyping product discontinuation, limited sales, or panic buying; (2) descriptions suspected of confusing the nature of a product or the entity that developed it; (3) descriptions suspected of giving, or promising to give, the policyholder, the insured, or the beneficiary benefits beyond those agreed in the insurance contract; (4) suspected non-standard wording in the recruitment of sales agents; and (5) absolute claims suspected of violating the Advertising Law. Users are thus given a unified, standard, digital insurance-sales sensitive word library that they can consult anytime and anywhere. The system also establishes a complete digital management process for the library, covering reporting, approval, and lookup, so that the library is managed digitally and its content keeps up with changes in national regulatory requirements.

4. An AI capability engine that exposes detection capability and empowers more insurance risk-control scenarios.

This solution opens a file detection capability interface to the entire China Life life insurance system; the API detects sensitive words in a file and returns the detection result. Providing an API lowers development cost and time for developers, improves the security and stability of the system, keeps data consistent, and leaves room for extending the system with new functions. By building an AI capability engine for sensitive information detection, the solution exposes its AI detection capability, lets front-end applications connect quickly, supports intelligent application scenarios in the company's business management, smart risk control, and other areas, and raises the level of technology empowerment. The solution has already been applied in the education and training field, where it provides intelligent prevention and control for training courseware.

5. Comprehensive improvement of sales risk prevention and control, meeting diverse working needs.

This solution offers users both self-service checking and intelligent checking. A user uploads the file to be detected, and the system issues a detection report once intelligent detection is complete. The user can view the detailed report to see quickly which sensitive words the file involves, and the system helps the user locate those words for fast correction. Users can also look up sensitive words at any time and manage them through reporting, approval, and rejection. The solution is provided in two versions, one for mobile devices and one for PCs. The two versions have the same functions and operations and share data, meeting diverse working needs.

Compared with current manual detection, the technical advantages of the solution provided by this application are as follows:

This application discloses a method and system for proactive, self-directed risk prevention and control. Following the principle of preparing in advance and preventing problems before they occur, the solution develops the Zhichacha (智查查) system to enable self-service checking for risk prevention. Before posting on WeChat, Douyin, Xiaohongshu, or Weibo, or after finishing promotional videos, training PPTs, and similar materials, users can use the Zhichacha system to check on their own initiative whether the audio, text, and pictures of the relevant files contain sales-sensitive words. Sales risks are prevented at the source, passive handling becomes active prevention, and a closed loop of risk control covering the stages before, during, and after the event is formed, which effectively relieves pressure on front-line risk control. The solution also fills the gap in IT support for risk prevention in advance and moves the line of risk prevention and control forward to the very front end of sales. It pioneers a shift in risk prevention and control from inspection and supervision to service-based self-control and active prevention, changing the traditional way of working.

This solution discloses an intelligent risk prevention and control method and system. It explores how to put new technologies such as artificial intelligence into practice in insurance risk management, making insurance risk governance more intelligent, digital, and tool-based. Built on technologies such as Tencent Cloud media processing, OCR recognition, and object storage interfaces, the Zhi Cha Cha system helps users check whether the audio, text, and picture content of video files, audio files, common office files, and pictures of various types contains sales-sensitive words. A corresponding detection report is issued for each file type. Each report shows at a glance whether the speech, image, and text parts of the file contain sales-sensitive words and which words they are, and highlights them so that users can quickly locate the sensitive words and correct them. The system removes the dependence on purely manual operation: by providing simple, fast, and convenient tools for intelligent inspection, review, and analysis, it reduces operating difficulty and lowers the threshold for use, so that anyone can learn and use it quickly. This replaces the manual review model and improves quality and efficiency. The system helps build an intelligent line of defense for sales risk prevention and control, fills a gap in sensitive information detection applications in the industry, and is an innovation in the field of technical services.
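The highlighting behaviour described above can be illustrated with a minimal sketch. It assumes the file content has already been converted to plain text; the function name `highlight_sensitive_words`, the `**` marker, and the sample words are hypothetical and are not taken from the application.

```python
import re


def highlight_sensitive_words(text, sensitive_words, marker="**"):
    """Return (highlighted_text, hits); each hit is (word, start, end)."""
    if not sensitive_words:
        return text, []
    # Try longer entries first so that a longer phrase wins over its own prefix.
    ordered = sorted(set(sensitive_words), key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(w) for w in ordered))
    # Record where each sensitive word occurs in the original text.
    hits = [(m.group(0), m.start(), m.end()) for m in pattern.finditer(text)]
    # Wrap every occurrence in the marker so it stands out in the report.
    highlighted = pattern.sub(lambda m: f"{marker}{m.group(0)}{marker}", text)
    return highlighted, hits


# Hypothetical sample input, for illustration only.
sample_text = "This plan offers guaranteed returns with zero risk."
sample_words = ["guaranteed", "guaranteed returns", "zero risk"]
highlighted_text, found = highlight_sensitive_words(sample_text, sample_words)
```

A real report would presumably render the highlight in the report's own format (for example, colored text) rather than with a text marker.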

This solution discloses a risk prevention and control management method and system. It creates a sales-risk sensitive word library built from regulatory, industry, and company documents, in which every sensitive word is classified by category. It also establishes a standard, complete online process for managing the digital sensitive word library, covering reporting, approval, and lookup, so that sensitive words are managed digitally in a closed loop. The library is managed under a "co-construction, sharing, and common use" model that adopts open, shared, win-win Internet thinking and removes restrictions of business line, position, and region: any system user can take part in building the library, becoming a builder, a user, and ultimately a beneficiary of the system. The digital management process ensures that the library content keeps up with changes in national regulatory requirements. By providing a standard, unified sensitive word library, it also ensures that sales risk prevention and control policies are implemented in a timely, effective, and consistent way, which is a disruptive innovation in the risk prevention and control management model.
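The reporting, approval, and lookup steps of the word library process can be sketched as follows. This is only an illustration under stated assumptions: the class `SensitiveWordLibrary`, its method names, and the sample words and categories are hypothetical, and the library is held in memory rather than in whatever storage the actual system uses.

```python
from collections import defaultdict


class SensitiveWordLibrary:
    """In-memory sketch of a categorized sensitive word library."""

    def __init__(self):
        self.words_by_category = defaultdict(set)  # category -> set of words
        self.pending = set()                       # reported words awaiting review

    def report(self, word):
        """Any user may report a candidate word; it waits for review."""
        self.pending.add(word)

    def review(self, word, approved, category=None):
        """Apply a review result; returns True if the word was entered."""
        if word not in self.pending:
            raise KeyError(f"{word!r} has not been reported")
        if approved and category is None:
            raise ValueError("an approved word needs a category")
        self.pending.remove(word)
        if approved:
            # Entry allowed: file the word under its category.
            self.words_by_category[category].add(word)
        # Entry not allowed: the reporting process simply ends.
        return approved

    def lookup(self):
        """Return the library grouped by category, sorted for display."""
        return {c: sorted(ws) for c, ws in sorted(self.words_by_category.items())}


# Hypothetical usage: one word is approved, one is rejected.
library = SensitiveWordLibrary()
library.report("guaranteed returns")
library.report("best in the world")
entered = library.review("guaranteed returns", True, category="misleading promise")
rejected = library.review("best in the world", False)
```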

This solution discloses a method and system for exporting risk prevention and control capabilities. Based on Tencent Cloud media processing, OCR recognition, object storage interfaces, and other technologies, it builds a technical framework for detecting sensitive information in multimedia and non-multimedia files within the China Life system, and provides a set of file sensitive information detection APIs that support video files, audio files, common office files, and pictures of various types. The APIs detect sensitive words in a file and return the detection results. They are open to the entire China Life life insurance system, which exports the risk prevention and control capability, lets front-end applications integrate quickly, and supports intelligent risk control scenarios in business management, education and training, smart risk control, and other fields, improving the level and effectiveness of technology application and realizing a new model of technology services.

Embodiment 2

FIG. 9 is a schematic structural diagram of the sensitive information detection device provided in Embodiment 2 of the present application. As shown in FIG. 9, the device includes:

a to-be-detected file acquisition module 910, configured to acquire a file to be detected in response to a sensitive information detection request and to put the file to be detected into a detection queue, wherein the file type of the file to be detected includes one or more of a text file, a picture file, an audio file, and a video file;

a detection algorithm determination module 920, configured to take a target file out of the detection queue and determine a detection algorithm according to the file type of the target file;

a detection module 930, configured to perform sensitive information detection on the target file based on the detection algorithm; and

a detection report generation module 940, configured to determine a detection report format according to the file type of the target file and to fill the sensitive information detection result of the target file into the detection report format to obtain a detection report of the target file.
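The cooperation of the four modules can be sketched in a few lines. This is a simplified illustration, not the implementation of the application: it assumes text and audio content has already been converted to text (the OCR and speech recognition steps are not shown), covers only two file types, and uses hypothetical names (`DetectionService`, `detect_text`, `detect_audio`, `build_report`) and a hypothetical two-word library. The status values follow the upload, pending, detecting, and completed states recited in claim 2.

```python
from collections import deque

SENSITIVE_WORDS = ["guaranteed returns", "zero risk"]  # hypothetical sample library


def find_words(text):
    """Return the sensitive words that occur in a piece of text."""
    return [w for w in SENSITIVE_WORDS if w in text]


def detect_text(file):
    """Text files: report each hit with its paragraph position."""
    hits = []
    for number, paragraph in enumerate(file["paragraphs"], start=1):
        hits += [{"word": w, "paragraph": number} for w in find_words(paragraph)]
    return hits


def detect_audio(file):
    """Audio files: segments are (start, end, sentence), assumed to come from ASR."""
    hits = []
    for start, end, sentence in file["segments"]:
        hits += [{"word": w, "sentence": sentence, "start": start, "end": end}
                 for w in find_words(sentence)]
    return hits


# Module 920: the detection algorithm is chosen by file type.
DETECTORS = {"text": detect_text, "audio": detect_audio}


def build_report(file, hits):
    """Module 940: fill the detection result into a per-type report."""
    return {"file": file["name"], "type": file["type"],
            "count": len(hits), "hits": hits}


class DetectionService:
    def __init__(self):
        self.queue = deque()   # detection queue
        self.status = {}       # file name -> real-time status
        self.reports = {}      # file name -> detection report

    def submit(self, file):
        """Module 910: accept a file and put it into the detection queue."""
        self.status[file["name"]] = "uploading"
        self.queue.append(file)
        self.status[file["name"]] = "pending"

    def process_next(self):
        """Modules 920 to 940: dispatch, detect, and build the report."""
        file = self.queue.popleft()
        detector = DETECTORS[file["type"]]
        self.status[file["name"]] = "detecting"
        hits = detector(file)  # module 930
        self.reports[file["name"]] = build_report(file, hits)
        self.status[file["name"]] = "completed"
        return self.reports[file["name"]]
```

Keeping the detectors in a dictionary keyed by file type means that support for picture or video files could be added by registering another function, without changing the queue handling.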

The device can execute the sensitive information detection method provided in the above embodiments and has the corresponding functional units and beneficial effects, which are not repeated here.

Embodiment 3

An embodiment of the present application further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, perform a sensitive information detection method, the method comprising:

Storage medium: any of various types of memory devices or storage devices. The term "storage medium" is intended to include: installation media, such as a CD-ROM, floppy disk, or tape device; computer system memory or random access memory, such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, and the like; non-volatile memory, such as flash memory, magnetic media (for example, a hard disk), or optical storage; and registers or other similar types of memory elements. The storage medium may also include other types of memory or combinations thereof. In addition, the storage medium may be located in the computer system in which the program is executed, or in a different, second computer system that is connected to the first computer system through a network (such as the Internet). The second computer system may provide program instructions to the first computer for execution. The term "storage medium" may include two or more storage media that reside in different locations (for example, in different computer systems connected through a network). The storage medium may store program instructions (for example, embodied as a computer program) that are executable by one or more processors.

Of course, in the storage medium containing computer-executable instructions provided in this embodiment of the present application, the computer-executable instructions are not limited to the sensitive information detection operations described above and can also perform related operations in the sensitive information detection method provided in any embodiment of the present application.

Embodiment 4

An embodiment of the present application further provides a sensitive information detection system configured to execute a sensitive information detection method, the method comprising:

By providing the above sensitive information detection system, this embodiment applies different detection algorithms to files to be detected of different file types and fills the detection results into the corresponding detection report, so that the sensitive information in a file to be detected is detected and displayed and users can easily see and understand the sensitive information problems in the file. The present invention reduces manual involvement and processing steps, replaces manual work with machines, saves manual processing costs, reduces errors, improves the efficiency of checking sensitive information, and greatly shortens business processing time.

Embodiment 5

An embodiment of the present application provides an electronic device. FIG. 10 is a schematic structural diagram of an electronic device provided in Embodiment 5 of the present application. As shown in FIG. 10, this embodiment provides an electronic device 1000, which includes: one or more processors 1020; and a storage device 1010 configured to store one or more programs which, when run by the one or more processors 1020, cause the one or more processors 1020 to implement the sensitive information detection method provided in the embodiments of the present application, the method comprising:

The electronic device 1000 shown in FIG. 10 is merely an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.

As shown in FIG. 10, the electronic device 1000 includes a processor 1020, a storage device 1010, an input device 1030, and an output device 1040. The electronic device may have one or more processors 1020; FIG. 10 takes one processor 1020 as an example. The processor 1020, the storage device 1010, the input device 1030, and the output device 1040 may be connected by a bus or in other ways; FIG. 10 takes connection by a bus 1050 as an example.

As a computer-readable storage medium, the storage device 1010 can be used to store software programs, computer-executable programs, and module units, such as the program instructions corresponding to the sensitive information detection method in the embodiments of the present application.

The storage device 1010 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and an application program required for at least one function; the data storage area may store data created through use of the terminal, and the like. In addition, the storage device 1010 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. In some examples, the storage device 1010 may further include memory located remotely from the processor 1020, and such remote memory may be connected through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.

The input device 1030 may be used to receive input numeric, character, or voice information and to generate key signal inputs related to user settings and function control of the electronic device. The output device 1040 may include devices such as a display screen and a speaker.

The electronic device provided in this embodiment of the present application applies different detection algorithms to files to be detected of different file types and fills the detection results into the corresponding detection report, so that the sensitive information in a file to be detected is detected and displayed and users can easily see and understand the sensitive information problems in the file. The present invention reduces manual involvement and processing steps, replaces manual work with machines, saves manual processing costs, reduces errors, improves the efficiency of checking sensitive information, and greatly shortens business processing time.

The sensitive information detection device, medium, and system provided in the above embodiments can run the sensitive information detection method provided in any embodiment of the present application and have the corresponding functional modules and beneficial effects. For technical details not described in detail in the above embodiments, refer to the sensitive information detection method provided in any embodiment of the present application.

Those skilled in the art will appreciate that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) that contain computer-usable program code.

The present application is described with reference to flowcharts and/or block diagrams of the methods, devices (systems), and computer program products according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include, among computer-readable media, volatile memory, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

It should also be noted that the terms "include", "comprise", or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of additional identical elements in the process, method, commodity, or device that includes the element.

The above are merely embodiments of the present application and are not intended to limit the present application. For those skilled in the art, the present application may have various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.

Claims

1. A sensitive information detection method, characterized in that the method comprises:

acquiring, in response to a sensitive information detection request, a file to be detected, and putting the file to be detected into a detection queue, wherein the file type of the file to be detected includes one or more of a text file, a picture file, an audio file, and a video file;

taking a target file out of the detection queue, and determining a detection algorithm according to the file type of the target file;

performing sensitive information detection on the target file based on the detection algorithm; and

determining a detection report format according to the file type of the target file, and filling the sensitive information detection result of the target file into the detection report format to obtain a detection report of the target file.

2. The method according to claim 1, characterized in that, after putting the file to be detected into the detection queue, the method further comprises:

switching the real-time status of the sensitive information detection request from an upload status to a pending detection status;

after taking the target file out of the detection queue and determining the detection algorithm according to the file type of the target file, the method further comprises:

switching the real-time status of the sensitive information detection request from the pending detection status to a detecting status;

after obtaining the detection report of the target file, the method further comprises:

switching the real-time status of the sensitive information detection request from the detecting status to a detection completed status;

and the method further comprises:

reading and displaying, in response to a status query instruction for the sensitive information detection request, the real-time status of the sensitive information detection request.

3. The method according to claim 1, characterized in that determining the detection report format according to the file type of the target file, and filling the sensitive information detection result of the target file into the detection report format to obtain the detection report of the target file, comprises:

if the file type of the target file is a text file, filling the sensitive information detection result of the target file into a text detection report format, wherein the text detection report format includes the number of items of sensitive information and the paragraph position of the sensitive information;

if the file type of the target file is a picture file, filling the sensitive information detection result of the target file into a picture detection report format, wherein the picture detection report format includes the number of items of sensitive information, the text content of the sensitive information, and the pixel position of the sensitive information;

if the file type of the target file is an audio file, filling the sensitive information detection result of the target file into an audio detection report format, wherein the audio detection report format includes the number of items of sensitive information, the text content of the sensitive information, the text content of the sentence containing the sensitive information, and the start time point and end time point of the sensitive information;

if the file type of the target file is a video file, filling the sensitive information detection result of the target file into a video detection report format, wherein the video detection report format includes the number of items of sensitive information, the text content of the sensitive information, the text content of the sentence containing the sensitive information, the start time point and end time point of the sensitive information, and the video frames associated with the sensitive information.

4. The method according to claim 1, characterized in that, after obtaining the detection report of the target file, the method further comprises:

acquiring and displaying the detection report in response to a request to view the detection report of the target file; and displaying the detected sensitive information upon receiving an instruction to view details, or displaying the target file upon receiving an instruction to view the source file.

5. The method according to claim 1, characterized in that performing sensitive information detection on the target file based on the detection algorithm comprises:

converting the target file into text information based on the detection algorithm; and

matching the text information against a preset sensitive word library to identify sensitive words.

6. The method according to claim 5, characterized in that the method further comprises:

displaying, in response to a query request for the sensitive word library, the sensitive words in the sensitive word library according to their pre-classified categories.

7. The method according to claim 5, characterized in that the method further comprises:

receiving, in response to a request to report a sensitive word, a word to be entered;

sending a word entry review request and receiving a word entry review result;

if the review result is that entry is allowed, determining the category to which the word to be entered belongs, and entering the word into the sensitive word library under that category; and

if the review result is that entry is not allowed, ending the sensitive word reporting process.

8. A sensitive information detection device, characterized in that the device comprises:

a to-be-detected file acquisition module, configured to acquire a file to be detected in response to a sensitive information detection request and to put the file to be detected into a detection queue, wherein the file type of the file to be detected includes one or more of a text file, a picture file, an audio file, and a video file;

a detection algorithm determination module, configured to take a target file out of the detection queue and determine a detection algorithm according to the file type of the target file;

a detection module, configured to perform sensitive information detection on the target file based on the detection algorithm; and

a detection report generation module, configured to determine a detection report format according to the file type of the target file and to fill the sensitive information detection result of the target file into the detection report format to obtain a detection report of the target file.

9. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the program is executed by a processor, the sensitive information detection method according to any one of claims 1 to 7 is implemented.

10. A sensitive information detection system, characterized in that the sensitive information detection system is configured to execute the sensitive information detection method according to any one of claims 1 to 7.