CN118714368A - Content security processing system

Info

Publication number: CN118714368A
Application number: CN202410698265.7A
Authority: CN
Inventors: 张悦涵; 陈萌; 钟伟; 张磊; 陈超
Original assignee: Yuanjingshengsheng Beijing Technology Co ltd
Current assignee: Yuanjingshengsheng Beijing Technology Co ltd
Priority date: 2021-09-29
Filing date: 2021-09-29
Publication date: 2024-09-27
Also published as: CN114125494A; CN114125494B

Abstract

An embodiment of the present application discloses a content security processing system. The system comprises: an application system for providing multiple uplink audio streams generated respectively by the clients of multiple participant users of a target session; a content audit auxiliary system for obtaining, from the multiple uplink audio streams, the correspondence between the participant users and their voiceprint features, and for merging the multiple uplink audio streams into a mixed audio stream to be sent to a content security audit system; and a content security audit system for performing a content security audit on the mixed audio stream and generating an audit result, where the audit result indicates whether the target session contains a violation at session granularity. After receiving the audit result, the content audit auxiliary system also determines the illegal content and its location, cuts the target audio clip at that location out of the mixed audio stream, matches the clip against the voiceprint features of each participant user, and, according to the matching result, attributes the illegal content to at least one target participant user. In this way, when multiple participant users are associated with the same session, illegal content can be attributed to a specific participant at lower cost.

Description

Content security processing system

Technical Field

The present application relates to the technical field of content auditing, and in particular to a content security processing system.

Background Art

UGC (User Generated Content), also called user-original content, is content that users create themselves and display or provide to other users through Internet platforms. With the rapid development of the Internet, smart devices, and various new services, the volume of data on the Internet has grown explosively. UGC in forms such as pictures, videos, posts, chats, and live broadcasts has become an indispensable part of how people express feelings, record events, and carry out daily work. However, this growing body of content also carries many uncontrolled risks, such as pornographic videos and pictures, political or violent-terrorist content, and spam advertising. As regulatory supervision becomes stricter, websites and platforms urgently need to take these risks seriously and manage them. Content security audit systems emerged to meet this need: they manage the risks associated with UGC on the Internet.

Specifically, a content security audit system mainly reviews the content (text, pictures, audio, video) that users upload, publish, or share on social platforms. Its main purpose is to filter out low-quality, vulgar, and otherwise illegal content, so that the platform offers high-quality content, avoids degrading the user experience, and maintains a good content tone. For example, one audit scenario is reviewing the audio content of a live broadcast, that is, determining whether what a user says during the broadcast violates the rules. If it does, the corresponding application system can be prompted to take action against the user. Game scenarios and the like are similar.

When auditing the audio content of a live broadcast, the prior art usually works at the level of the live broadcast room: the audio stream generated in a given room is collected and provided to the audit system for a security audit. In practice, several anchors may speak in one live broadcast room. In addition, because live broadcast systems can offer a mic-linking ("连麦") function, audience users can also speak in the room. In this situation, the prior art merges the multiple audio streams generated in the same live broadcast room into one stream and provides that stream to the audit system. The audit system then performs speech recognition, natural language understanding, and similar processing on the merged stream to determine whether it contains illegal content.

Although this approach does provide a content security audit of live broadcast content, it can only determine whether illegal content exists at the granularity of the live broadcast room. Consequently, if illegal content is found, the entire room may be punished. In reality, only one user's speech may have violated the rules, and punishing the whole room is then unfair. In this scenario, the violation needs to be attributed to the individual who committed it rather than to the entire room. One way to achieve this is to provide the single audio stream of each speaking user to the audit system separately. However, this sharply increases the cost of content auditing, to a level most application systems cannot bear. Moreover, as the number of users speaking during a broadcast grows, problems such as resource exhaustion on the audit system side also arise.

Therefore, when multiple participant users are associated with the same session, how to attribute illegal content to a specific individual participant at lower cost is a technical problem to be solved by those skilled in the art.

Summary of the Invention

The present application provides a content security processing system that can attribute illegal content to a specific individual participant at lower cost when multiple participant users are associated with the same session.

The present application provides the following solutions:

A content security processing system, comprising:

an application system, configured to determine multiple uplink audio streams generated respectively by the clients of multiple participant users of a target session;

a content audit auxiliary system, configured to obtain, from the multiple uplink audio streams respectively, the correspondence between the multiple participant users and voiceprint features, and to merge the multiple uplink audio streams into a mixed audio stream to be sent to a content security audit system for a content security audit;

a content security audit system, configured to perform a content security audit based on the mixed audio stream and to generate an audit result;

the content audit auxiliary system being further configured to: after receiving the audit result returned by the content security audit system, determine the illegal content and its location, and cut the target audio clip at the location of the illegal content out of the mixed audio stream, where the audit result indicates whether the target session contains a violation at session granularity; match the extracted target audio clip against the voiceprint features of each of the multiple participant users; and, according to the matching result, attribute the illegal content to at least one target participant user.

A content audit auxiliary system, which: obtains, from the multiple uplink audio streams respectively, the correspondence between the multiple participant users and voiceprint features, and merges the multiple uplink audio streams into a mixed audio stream to be sent to a content security audit system for a content security audit; after receiving the audit result returned by the content security audit system, determines the illegal content and its location, and cuts the target audio clip at the location of the illegal content out of the mixed audio stream, where the audit result indicates whether the target session contains a violation at session granularity; matches the extracted target audio clip against the voiceprint features of each of the multiple participant users; and, according to the matching result, attributes the illegal content to at least one target participant user.

A content audit auxiliary system, configured to obtain the correspondence between multiple participant users and voiceprint features from the multiple uplink audio streams generated in the application system respectively by the clients of the multiple participant users of a target session, and to merge the multiple uplink audio streams into a mixed audio stream;

a content security audit system, configured to perform a content security audit on the mixed audio stream sent by the content audit auxiliary system and to generate an audit result;

the content audit auxiliary system being further configured to: after receiving the audit result returned by the content security audit system, determine the illegal content and its location, and cut the target audio clip at the location of the illegal content out of the mixed audio stream, where the audit result indicates whether the target session contains a violation at session granularity; match the extracted target audio clip against the voiceprint features of each of the multiple participant users; and, according to the matching result, attribute the illegal content to at least one target participant user.

According to the specific embodiments provided in the present application, the present application discloses the following technical effects:

Through the embodiments of the present application, an intermediate content audit auxiliary system can be provided between the application system and the content security audit system. The application system provides multiple uplink audio streams generated respectively by the clients of multiple participant users of a target session. Before the content security audit system performs its audit, the content audit auxiliary system can extract the voiceprint features of the participant users from the multiple uplink audio streams of the same session. When the audio is submitted for audit, the multiple uplink audio streams can still be merged into a mixed audio stream, which avoids raising the cost of the content security audit. After receiving the audit result returned by the content security audit system, the content audit auxiliary system can also determine the illegal content and its location and cut out the target audio clip at that location. The extracted target audio clip can then be matched against the voiceprint features of each participant user, and the illegal content can be attributed to at least one target participant user according to the matching result. In this way, violations can be identified at user granularity without a sharp increase in cost, so that when punishment or similar handling is required, it can be applied to the individual who committed the violation without affecting the other participant users in the same session.

Of course, a product implementing the present application does not necessarily need to achieve all of the advantages described above at the same time.

Brief Description of the Drawings

To illustrate the technical solutions of the embodiments of the present application or of the prior art more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic diagram of a system architecture provided in an embodiment of the present application;

FIG. 2 is a flowchart of a method provided in an embodiment of the present application;

FIG. 3 is a schematic diagram of an apparatus provided in an embodiment of the present application;

FIG. 4 is a schematic diagram of an electronic device provided in an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application fall within the scope of protection of the present application.

The embodiments of the present application provide a solution for attributing illegal content to a specific individual participant user at lower cost when multiple participant users are associated with the same session (a live broadcast session, chat session, game session, and so on). In this solution, an intermediate-layer content audit auxiliary system is established between the content security audit system and the specific application system (that is, the content production system, such as a live broadcast system or a game system). The application system submits the multiple uplink audio streams generated in a session to the auxiliary system. Besides forwarding the audio to the audit system, the auxiliary system can extract the voiceprint features of each participant user from that user's uplink audio stream (each user who speaks corresponds to one uplink audio stream). In a preferred implementation, each uplink audio stream is also recorded, so that multiple recording records are kept for each participant user, together with the correspondence between each recording record and the session identifier, the time it was produced, and so on. These recording records provide stronger evidence for later confirmation or for deciding on measures such as punishing the user.

When submitting audio to the audit system, the embodiments of the present application can still merge the multiple uplink audio streams into one (or a few) mixed audio streams, so that the audit system only needs to audit the mixed audio stream and does not need to audit the single uplink audio stream of each participant user separately.

When the audit system returns an audit result, the result first goes back to the auxiliary system, which performs the attribution and then returns the outcome to the application system. After receiving the audit result, if it shows that the submitted audio contains a violation, the auxiliary system determines the specific illegal content (for example, a keyword or key sentence that a user spoke) and the audio data at the location of that content (a very short audio clip covering just the keyword or key sentence). Because the voiceprint features of the participant users were extracted earlier, this clip can be matched against the voiceprint features of each participant user. If it matches a user's voiceprint features, that participant user is determined to have committed the violation.
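The matching step can be illustrated with a minimal sketch. It assumes that each voiceprint and each clip has already been reduced to a fixed-length feature vector, and it uses cosine similarity with a threshold; the patent does not specify the matching metric, so `cosine_similarity`, `locate_speaker`, and the `threshold` value of 0.8 are illustrative assumptions only.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def locate_speaker(
    clip_feature: List[float],
    voiceprints: Dict[str, List[float]],
    threshold: float = 0.8,  # assumed value, not taken from the patent
) -> Optional[Tuple[str, float]]:
    """Return (user_id, score) of the best match at or above threshold, else None."""
    best: Optional[Tuple[str, float]] = None
    for user_id, voiceprint in voiceprints.items():
        score = cosine_similarity(clip_feature, voiceprint)
        if score >= threshold and (best is None or score > best[1]):
            best = (user_id, score)
    return best
```

Returning `None` when no score reaches the threshold leaves room for a fallback such as manual review, rather than forcing an attribution to the closest participant.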

With this solution, because audio clips containing illegal content can be matched against the voiceprint features of the participant users, the illegal content can be attributed to a specific individual participant user. Follow-up handling such as muting can then target that user without affecting the other participant users in the session. Moreover, because the content security audit itself still runs on one or a few merged audio streams, the attribution of illegal content to a specific target participant user is achieved at low cost.

From the perspective of system architecture, as described above, the embodiments of the present application provide a content audit auxiliary system. As shown in FIG. 1, this system sits between the content security audit system and the application system. The multiple uplink audio streams generated in a session of the application system are first submitted to the content audit auxiliary system, which extracts voiceprint features, merges the streams into a mixed audio stream, and submits the mixed stream to the audit system. After receiving the audit result, it uses the voiceprint features of the participant users to identify the speaker of the audio clip containing the illegal content, determines the corresponding target participant user, and thereby attributes the illegal content to an individual.

The specific implementations provided in the embodiments of the present application are described in detail below.

First, an embodiment of the present application provides a content audit auxiliary processing method. Referring to FIG. 2, the method may include:

S201: Acquire multiple uplink audio streams generated in a target session, where the uplink audio streams are generated respectively by the clients of multiple participant users of the target session.

The target session depends on the application system. In a live broadcast system it can be a live broadcast session (one live broadcast room corresponds to one session); in a game system it can be a game session (one game "room" corresponds to one session); in a communication system it can be a chat session; and so on. A session can be an audio session or a video session. In the embodiments of the present application, the object of the content security audit is mainly the audio content. In a video session, the application system can separate the audio component from the video stream and upload it to the auxiliary system described here for the content security audit.

Several participant users can speak in the same session, and each user who speaks produces one uplink audio stream through that user's client. The application system provides the multiple uplink audio streams of these participant users to the content audit auxiliary system of the embodiments of the present application.

S202: Obtain the correspondence between the multiple participant users and voiceprint features from the multiple uplink audio streams respectively.

After the multiple uplink audio streams are received, the voiceprint features of the corresponding participant user can be extracted from each stream. Because each uplink audio stream is associated with the identifier of a participant user, the voiceprint features extracted from a stream can be associated with that user's identifier. These voiceprint features can later be used to identify the speaker of the audio clip that corresponds to the illegal content. A voiceprint feature here means the acoustic characteristics of a speaker's speech: a set of acoustic descriptive parameters that a computer algorithm (a mathematical method) extracts from the sound signal. Many algorithms can extract voiceprint features, such as the Gaussian mixture model (GMM), joint factor analysis (JFA), and deep neural network methods; they are not described in detail here.
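The correspondence between participant identifiers and voiceprint features can be sketched as a small registry. This is only an illustration under stated assumptions: the class `VoiceprintRegistry` and its methods are hypothetical names, and `toy_extract` is a deliberately trivial stand-in (mean absolute amplitude and zero-crossing rate), not a real voiceprint algorithm such as GMM, JFA, or a neural embedding. A real extractor would be injected in its place.

```python
from typing import Callable, Dict, List, Sequence

Feature = List[float]


def toy_extract(samples: Sequence[float]) -> Feature:
    """Stand-in extractor (NOT a real voiceprint): mean |amplitude| and zero-crossing rate."""
    n = len(samples)
    if n == 0:
        return [0.0, 0.0]
    mean_abs = sum(abs(x) for x in samples) / n
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return [mean_abs, crossings / n]


class VoiceprintRegistry:
    """Keeps one averaged feature vector per participant user identifier."""

    def __init__(self, extract: Callable[[Sequence[float]], Feature]):
        self._extract = extract
        self._sums: Dict[str, Feature] = {}
        self._counts: Dict[str, int] = {}

    def observe(self, user_id: str, samples: Sequence[float]) -> None:
        """Extract a feature from one chunk of a user's uplink stream and fold it in."""
        feature = self._extract(samples)
        if user_id not in self._sums:
            self._sums[user_id] = list(feature)
            self._counts[user_id] = 1
        else:
            self._sums[user_id] = [s + f for s, f in zip(self._sums[user_id], feature)]
            self._counts[user_id] += 1

    def voiceprint(self, user_id: str) -> Feature:
        """Average of all features observed so far for this user."""
        n = self._counts[user_id]
        return [s / n for s in self._sums[user_id]]
```

Keying the registry by the user identifier that accompanies each uplink stream is what makes later attribution possible: the mixed stream carries no identifiers, but the registry does.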

In a specific implementation, voiceprint features can be extracted for each participant user directly from each audio stream. Alternatively, each uplink audio stream can be recorded separately and the voiceprint features extracted from the recording records. When recordings are made, the correspondence between each recording record and the identifier of the target session, the identifier of the participant user, and the time the record was produced can also be saved as a basis for later processing. For example, after machine recognition has attributed illegal content to a participant user, the result can be reviewed manually, and the recording records for the corresponding time period can be provided to the manual review client as evidence. Likewise, after a participant user has been identified as having committed a violation, the recording records can be provided to the application system, so that it can make a more accurate judgment before deciding whether to punish that user.
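The saved correspondence between recording records, session identifier, user identifier, and time can be sketched as follows. The field names, the `path` attribute, and the function `records_for_review` are hypothetical; the sketch only shows how records overlapping a time window might be retrieved for manual review.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class RecordingRecord:
    session_id: str   # identifier of the target session
    user_id: str      # identifier of the participant user
    start_s: float    # start time, seconds from session start (assumed time base)
    end_s: float      # end time, seconds from session start
    path: str         # where the recording is stored (hypothetical field)


def records_for_review(
    records: Sequence[RecordingRecord],
    session_id: str,
    user_id: str,
    from_s: float,
    to_s: float,
) -> List[RecordingRecord]:
    """Return the user's records in this session that overlap [from_s, to_s)."""
    return [
        r
        for r in records
        if r.session_id == session_id
        and r.user_id == user_id
        and r.start_s < to_s
        and r.end_s > from_s
    ]
```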

It should be noted that in scenarios such as live broadcasting, although several users may speak in the same session, some of them, especially audience users, speak only occasionally, while others, such as anchors, may speak for long periods. In other words, some uplink audio streams contain a voice signal only part of the time, while others contain one continuously. During recording, the voice signal in the uplink audio stream can be detected: recording takes place when a voice signal is detected and can be skipped otherwise. The same participant user can therefore correspond to multiple recording records, each with different time information. Even continuous speech, such as an anchor's, can be split into several recording records when recorded. Thus each of the participant users of a session can correspond to multiple recording records, and each recording record has its own start time, end time, and other information.
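Detecting when a stream contains a voice signal can be sketched with a simple frame-energy threshold. The patent does not say how voice is detected, so this energy-based approach, the function name `speech_intervals`, the 20 ms frame size, and the threshold are all assumptions; production systems typically use a dedicated voice activity detector.

```python
from typing import List, Sequence, Tuple


def speech_intervals(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 20,              # assumed frame size
    energy_threshold: float = 0.01,  # assumed mean-square energy threshold
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) intervals whose frames exceed the energy threshold."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    intervals: List[Tuple[float, float]] = []
    start = None  # sample index where the current voiced run began
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        if energy >= energy_threshold:
            if start is None:
                start = i
        elif start is not None:
            intervals.append((start / sample_rate, i / sample_rate))
            start = None
    if start is not None:  # stream ended while still voiced
        intervals.append((start / sample_rate, len(samples) / sample_rate))
    return intervals
```

Each returned interval would correspond to one recording record with its own start and end time.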

S203: Merge the multiple uplink audio streams into a mixed audio stream to be sent to the audit system for a content security audit.

Besides obtaining each participant user's voiceprint features from the uplink audio streams and recording them, the auxiliary system also handles submission for audit. In the embodiments of the present application, the multiple uplink audio streams can still be merged into a mixed audio stream (for example, a single mixed stream), and an audit request based on this mixed stream can then be sent to the audit system so that its content undergoes a security audit.
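Merging can be sketched as sample-wise summation with clipping. This assumes the streams are already time-aligned, share a sample rate, and hold floating-point samples in the range [-1.0, 1.0]; none of these details is specified in the patent, and `mix_streams` is an illustrative name.

```python
from itertools import zip_longest
from typing import List, Sequence


def mix_streams(streams: Sequence[Sequence[float]]) -> List[float]:
    """Sum aligned samples from all streams and clip the result to [-1.0, 1.0]."""
    mixed: List[float] = []
    # Shorter streams are padded with silence (0.0) so nothing is truncated.
    for frame in zip_longest(*streams, fillvalue=0.0):
        total = sum(frame)
        mixed.append(max(-1.0, min(1.0, total)))
    return mixed
```

Hard clipping is the simplest choice; a real mixer might instead scale or compress the sum to reduce distortion when several users speak at once.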

Because the mixed audio stream is streaming data, it can first be split into multiple audio segments before being uploaded to the audit system, for example one segment every 12 seconds (other durations are also possible). The audit system then performs the content security audit segment by segment. The audit itself can be done in several ways. In one approach, speech recognition converts the audio segment into text, and natural language understanding algorithms then analyze the text to determine whether it contains illegal content, which mainly consists of keywords or key sentences containing sensitive terms.
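Splitting the mixed stream into fixed-length audio segments can be sketched as below. The 12-second default comes from the example in the text; the function name `split_into_segments` and the choice to yield each segment together with its start offset are assumptions made so that a position inside a segment can later be mapped back to session time.

```python
from typing import Iterator, Sequence, Tuple


def split_into_segments(
    samples: Sequence[float],
    sample_rate: int,
    segment_s: float = 12.0,
) -> Iterator[Tuple[float, Sequence[float]]]:
    """Yield (start_offset_s, segment_samples); the last segment may be shorter."""
    seg_len = int(sample_rate * segment_s)
    if seg_len <= 0:
        raise ValueError("segment length must cover at least one sample")
    for i in range(0, len(samples), seg_len):
        yield i / sample_rate, samples[i:i + seg_len]
```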

S204: After receiving the audit result returned by the audit system, determine the illegal content and its location, and cut out the target audio clip at the location of the illegal content.

After performing the content security audit in response to the audit request, the audit system returns the audit result to the auxiliary system provided in the embodiments of the present application. If the audit was requested per audio segment as described above, the audit result mainly identifies the audio segments that contain a violation. That is, it tells the auxiliary system which audio segment or segments contain a violation.

但是，由于具体的违规内容通常是关键词、关键语句等，因此，即使切分成音频段落，具体违规内容通常也只出现在具体音频段落中的某个位置处。例如，一个音频段落可能为12秒，而违规内容是一个关键词，只出现在该段落中，从第3至5秒之间的一个小的音频片段，等等。而只有根据具体违规内容对应的音频片段，才能够与多个参与者用户的声纹特征进行匹配，进而确定具体是哪个参与者用户说出的该违规内容。However, since the specific illegal content is usually keywords, key sentences, etc., even if it is divided into audio segments, the specific illegal content usually only appears at a certain position in the specific audio segment. For example, an audio segment may be 12 seconds, and the illegal content is a keyword that only appears in the segment, a small audio segment between the 3rd and 5th seconds, etc. Only the audio segment corresponding to the specific illegal content can be matched with the voiceprint features of multiple participant users, and then determine which participant user specifically said the illegal content.

因此，在具体实现时，收到具体的审核结果之后，还可以确定违规内容，以及所述违规内容在所述目标音频段落中的位置，并且，根据所述违规内容在所述目标音频段落中的位置，从所述目标音频段落中对应的位置处截取出所述目标音频片段，以用于进行后续的声纹识别。Therefore, in the specific implementation, after receiving the specific review results, the illegal content and the position of the illegal content in the target audio segment can also be determined, and according to the position of the illegal content in the target audio segment, the target audio segment can be cut out from the corresponding position in the target audio segment for subsequent voiceprint recognition.
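Cutting the clip out of a flagged segment, given start and end times on the segment's timeline, might look like the sketch below. The function name `cut_clip` and the optional `padding_s` margin are assumptions added for illustration; the source only states that the clip is taken from the position of the illegal content.

```python
import math
from typing import Sequence


def cut_clip(
    segment: Sequence[int],
    sample_rate: int,
    start_s: float,
    end_s: float,
    padding_s: float = 0.0,
) -> Sequence[int]:
    """Return the samples of `segment` between start_s and end_s.

    Hypothetical helper: padding_s optionally widens the window on both
    sides; the window is clamped to the bounds of the segment.
    """
    if end_s < start_s:
        raise ValueError("end_s must not be earlier than start_s")
    first = max(0, math.floor((start_s - padding_s) * sample_rate))
    last = min(len(segment), math.ceil((end_s + padding_s) * sample_rate))
    return segment[first:last]
```

With the example from the text, a keyword spoken between the 3rd and 5th seconds of a 12-second segment would be extracted by `cut_clip(segment, sample_rate, 3.0, 5.0)`.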

There are several ways to determine the illegal content and its location. In one approach, the audit result returned by the audit system identifies only the audio segments that contain violations. That is, the result reveals which segment or segments contain violations, but not what the illegal content is or where in the segment it appears. In this case, the processing system itself may identify the illegal content and its position within the flagged audio segments.

For example, speech recognition may be performed on the target audio segment, producing the recognized text together with the time information of that text on the audio timeline. The speech recognition result may then be matched against a preset lexicon in which multiple violation-related keywords are stored in advance. If the speech recognition result hits a keyword in the lexicon, that keyword is determined to be the illegal content, and its position in the target audio segment (that is, its start time and end time on the segment's timeline) is determined to be the position of the illegal content in the target audio segment.
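The lexicon lookup can be illustrated with a short sketch. It assumes the speech recognizer returns one entry per word with start and end times; the `RecognizedWord` and `Violation` types and the function `find_violations` are hypothetical names, and only single-word keywords are handled (multi-word key sentences would need a phrase-matching step that is not shown).

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class RecognizedWord:
    text: str     # recognized word
    start: float  # start time on the segment timeline, in seconds
    end: float    # end time on the segment timeline, in seconds


@dataclass(frozen=True)
class Violation:
    keyword: str
    start: float
    end: float


def find_violations(words: Iterable[RecognizedWord], lexicon: Set[str]) -> List[Violation]:
    """Return every recognized word that hits the preset lexicon,
    together with its position on the segment timeline."""
    normalized = {k.lower() for k in lexicon}
    return [
        Violation(w.text, w.start, w.end)
        for w in words
        if w.text.lower() in normalized
    ]
```

Each returned `Violation` carries the start and end times needed to cut the corresponding audio clip out of the segment.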

Alternatively, in another approach, the audit result returned by the audit system may indicate not only which audio segments contain violations but also what the illegal content in each such segment is and where in the segment it appears. The illegal content and its position in the audio segment can then be determined directly from the returned result.

S205: Match the cut-out target audio clip against the voiceprint features of each of the multiple participant users, and, according to the matching result, locate the illegal content to at least one target participant user.

After the target audio clip corresponding to the point at which the illegal content was uttered has been cut out, it can be matched against the voiceprint features of each of the multiple participant users, and the illegal content can then be located to at least one target participant user according to the matching result. In other words, because the multiple uplink audio streams of a session are merged into a single mixed audio stream before being submitted for audit, the audit result returned by the audit system can only indicate, at session granularity, whether a violation exists. By determining the illegal content and its location, and by matching the audio clip at that location against the voiceprint features of the participant users, the embodiments of the present application can nevertheless locate the illegal content to one or more individual participant users. If the violation later needs to be handled, for example by imposing a penalty or a broadcast ban, only the individual concerned needs to be penalized, and the other users in the same session are not affected.
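The matching step could be sketched as below. The source does not specify how voiceprints are represented or compared, so this example makes two explicit assumptions: each voiceprint (and the clip) is a fixed-length embedding vector produced by some speaker-recognition model that is not shown, and similarity is measured by cosine similarity against a hypothetical threshold of 0.75.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def locate_speakers(
    clip_embedding: Sequence[float],
    voiceprints: Dict[str, Sequence[float]],
    threshold: float = 0.75,  # assumed value, to be tuned in practice
) -> List[str]:
    """Return the user ids whose voiceprint matches the clip, best match first.

    More than one user may be returned, since the clip comes from a mixed
    stream in which several participants can speak at once.
    """
    scores = {
        user_id: cosine_similarity(clip_embedding, vp)
        for user_id, vp in voiceprints.items()
    }
    matched = [user_id for user_id, score in scores.items() if score >= threshold]
    return sorted(matched, key=lambda user_id: scores[user_id], reverse=True)
```

An empty result means no participant's voiceprint was close enough, in which case a cautious system would presumably escalate to manual review rather than attribute the content to anyone.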

In a specific implementation, because downstream processing may involve penalizing users, the determination should be made with particular care. Therefore, after the illegal content has been determined algorithmically and located to the individual participant user who produced it, the locating result may be further confirmed through manual intervention. For example, after the illegal content is located to a target participant user, the audit result given by the audit system, the locating result (from the illegal content to the individual target participant user) produced by the processing system of this embodiment, and the recording of the target participant user for the corresponding time period may be provided to a manual review client, so that the locating result can be confirmed by manual review. A reviewer may, for instance, listen to the recording to judge whether illegal content is actually present and whether the participant user who uttered it is the one identified by the algorithm. A relatively long recording may be provided to the manual review client so that the reviewer can take the context of the utterance into account and reach a more accurate judgment.

In addition, after the illegal content is located to a target participant user (optionally after further confirmation by manual review), the audit result, the locating result, and the recording of the target participant user for the corresponding time period may be provided to the corresponding application system, so that the application system can decide whether to penalize the target participant user. In other words, the embodiments of the present application not only identify the individual user responsible for a violation. When deciding whether to penalize that user, the application system can also rely on more than the conclusory audit result or locating result: the recording provides the user's original audio from the time of the utterance, and listening to it allows a more accurate judgment.

In summary, the embodiments of the present application provide an intermediate processing system between the application system and the audit system. This processing system extracts the voiceprint features of the multiple participant users from the respective uplink audio streams of the same session. When the audio is submitted for audit, the multiple uplink audio streams are still merged into a single mixed audio stream, which avoids raising the cost of content security auditing. After the audit result returned by the audit system is received, the illegal content and its location are determined, and the target audio clip at that location is cut out. The target audio clip is then matched against the voiceprint features of each of the participant users, and the illegal content is located to at least one target participant user according to the matching result. In this way, violations can be identified at user granularity without a sharp increase in cost, so that when a penalty or similar measure is required, it can be applied to the individual responsible without affecting the other participant users in the same session.
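The merging of several uplink streams into one mixed stream, on which the cost saving rests, can be illustrated with a simple sketch. The source does not describe the mixing algorithm; the example below assumes 16-bit signed PCM streams at a common sample rate and mixes them by summing aligned samples and clipping, which is one common approach. The function name `mix_streams` is hypothetical.

```python
from typing import List, Sequence

INT16_MIN = -32768
INT16_MAX = 32767


def mix_streams(streams: Sequence[Sequence[int]]) -> List[int]:
    """Mix 16-bit PCM streams sample by sample.

    Shorter streams are treated as silent once they end, and each sum is
    clipped to the 16-bit range to avoid overflow.
    """
    if not streams:
        return []
    length = max(len(s) for s in streams)
    mixed: List[int] = []
    for i in range(length):
        total = sum(s[i] for s in streams if i < len(s))
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed
```

Because only this single mixed stream is sent for audit, the audit cost stays independent of the number of participants, while the per-user streams remain available locally for voiceprint extraction and recording.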

It should be noted that the embodiments of the present application may involve the use of user data. In practice, user-specific personal data may be used in the solution described herein within the scope permitted by applicable laws and regulations, provided that the requirements of the applicable laws and regulations of the relevant country are met (for example, the user gives explicit consent and is effectively notified).

Corresponding to the foregoing method embodiment, an embodiment of the present application further provides a content audit auxiliary processing apparatus. Referring to FIG. 3, the apparatus may include:

an audio stream acquisition unit 301, configured to acquire multiple uplink audio streams generated in a target session, the uplink audio streams being respectively generated by the clients of multiple participant users of the target session;

a voiceprint feature extraction unit 302, configured to obtain the correspondence between the multiple participant users and voiceprint features from the respective uplink audio streams;

an audio stream mixing and submission unit 303, configured to merge the multiple uplink audio streams into a mixed audio stream to be sent to an audit system for content security auditing;

a target audio clip cutting unit 304, configured to, after receiving the audit result returned by the audit system, determine the illegal content and its location, and cut out the target audio clip at the location of the illegal content; and

a voiceprint matching unit 305, configured to match the cut-out target audio clip against the voiceprint features of each of the multiple participant users and, according to the matching result, locate the illegal content to at least one target participant user.

In a specific implementation, the apparatus may further include:

a recording unit, configured to record each of the multiple uplink audio streams and to store the correspondence between each recording and the identifier of the target session, the identifier of the participant user, and the time at which the audio was generated, for use as a basis for subsequent processing.
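One way to hold the correspondence kept by the recording unit is a small in-memory index, sketched below. The `RecordingEntry` fields and the `RecordingIndex` class are hypothetical; the source only requires that each recording be associated with a session identifier, a participant user identifier, and a generation time. A production system would presumably use a database rather than a list.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RecordingEntry:
    session_id: str
    user_id: str
    start: float  # generation time of the first sample, in seconds
    end: float    # generation time of the last sample, in seconds
    path: str     # where the recorded audio is stored (hypothetical)


class RecordingIndex:
    """Maps (session, user, time window) to the stored recordings."""

    def __init__(self) -> None:
        self._entries: List[RecordingEntry] = []

    def add(self, entry: RecordingEntry) -> None:
        self._entries.append(entry)

    def find(self, session_id: str, user_id: str, start: float, end: float) -> List[RecordingEntry]:
        """Return the user's recordings that overlap [start, end), oldest first."""
        hits = [
            e for e in self._entries
            if e.session_id == session_id
            and e.user_id == user_id
            and e.start < end
            and e.end > start
        ]
        return sorted(hits, key=lambda e: e.start)
```

Once a violation has been located to a user, a lookup such as `index.find(session_id, user_id, t0, t1)` retrieves the recordings to pass on for manual review or to the application system.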

In addition, the apparatus may further include:

a first recording result providing unit, configured to, after the illegal content is located to a target participant user, provide the audit result, the locating result, and the recording of the target participant user for the corresponding time period to a manual review client, so that the locating result can be further confirmed by manual review.

Furthermore, the apparatus may further include:

a second recording result providing unit, configured to, after the illegal content is located to a target participant user, provide the audit result, the locating result, and the recording of the target participant user for the corresponding time period to the corresponding application system, so that the application system can decide whether to take action against the target participant user.

a segmentation unit, configured to split the mixed audio stream into multiple audio segments after the multiple uplink audio streams have been merged into the mixed audio stream, so that the audit system performs content security auditing in units of the audio segments;

The audit result returned by the audit system includes: the target audio segment in which illegal content exists;

The target audio clip cutting unit may specifically include:

an illegal content location determination subunit, configured to determine the illegal content and the position of the illegal content in the target audio segment; and

a cutting subunit, configured to cut the target audio clip out of the target audio segment at the corresponding position, according to the position of the illegal content in the target audio segment.

Specifically, the illegal content location determination subunit may be configured to:

perform speech recognition on the target audio segment, and match the speech recognition result against a preset lexicon; and

if a keyword in the lexicon is hit, determine that keyword to be the illegal content, and determine the position of the keyword in the target audio segment to be the position of the illegal content in the target audio segment.

Alternatively, in another approach, the audit result returned by the audit system may further include: the illegal content, and position information of the illegal content in the target audio clip;

In this case, the illegal content location determination subunit may specifically be configured to:

determine the illegal content, and the position of the illegal content in the target audio clip, according to the audit result returned by the audit system.

In addition, an embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing the steps of the method of any one of the foregoing method embodiments.

And an electronic device, including:

one or more processors; and

a memory associated with the one or more processors, the memory being configured to store program instructions that, when read and executed by the one or more processors, perform the steps of the method of any one of the foregoing method embodiments.

FIG. 4 shows an exemplary architecture of the electronic device, which may include a processor 410, a video display adapter 411, a disk drive 412, an input/output interface 413, a network interface 414, and a memory 420. The processor 410, the video display adapter 411, the disk drive 412, the input/output interface 413, the network interface 414, and the memory 420 may be communicatively connected via a communication bus 430.

The processor 410 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is configured to execute the relevant programs so as to implement the technical solution provided in the present application.

The memory 420 may be implemented as ROM (Read-Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 420 may store an operating system 421 for controlling the operation of the electronic device 400 and a basic input/output system (BIOS) for controlling low-level operations of the electronic device 400. It may also store a web browser 423, a data storage management system 424, a content audit auxiliary processing system 425, and so on. The content audit auxiliary processing system 425 may be the application program that implements the steps described above in the embodiments of the present application. In short, when the technical solution provided by the present application is implemented in software or firmware, the relevant program code is stored in the memory 420 and is called and executed by the processor 410.

The input/output interface 413 is used to connect an input/output module for information input and output. The input/output module may be built into the device as a component (not shown in the figure) or connected externally to provide the corresponding functions. Input devices may include a keyboard, a mouse, a touch screen, a microphone, and various sensors; output devices may include a display, a speaker, a vibrator, and indicator lights.

The network interface 414 is used to connect a communication module (not shown in the figure) so that the device can communicate with other devices. The communication module may communicate over a wired connection (for example USB or a network cable) or a wireless connection (for example a mobile network, Wi-Fi, or Bluetooth).

The bus 430 includes a path that carries information between the components of the device (for example, the processor 410, the video display adapter 411, the disk drive 412, the input/output interface 413, the network interface 414, and the memory 420).

It should be noted that, although only the processor 410, the video display adapter 411, the disk drive 412, the input/output interface 413, the network interface 414, the memory 420, and the bus 430 are shown, in a specific implementation the device may also include other components necessary for normal operation. In addition, those skilled in the art will understand that the device may include only the components necessary to implement the solution of the present application, and need not include all the components shown in the figure.

From the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software together with the necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product. The computer software product may be stored in a storage medium such as ROM/RAM, a magnetic disk, or an optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present application or in certain parts of the embodiments.

The embodiments in this specification are described in a progressive manner. For parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiments are substantially similar to the method embodiments, they are described relatively briefly, and the relevant parts of the method embodiments may be consulted. The systems and system embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of an embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.

The content audit auxiliary processing method, apparatus, and electronic device provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to aid understanding of the method of the present application and its core idea. At the same time, those of ordinary skill in the art may, in accordance with the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims

1. A content security processing system, comprising:

an application system, configured to provide multiple uplink audio streams respectively generated by the clients of multiple participant users of a target session;

a content auditing auxiliary system, configured to acquire the correspondence between the multiple participant users and voiceprint features from the respective uplink audio streams, and to merge the multiple uplink audio streams into a mixed audio stream to be sent to a content security auditing system for content security auditing; and

the content security auditing system, configured to perform content security auditing based on the mixed audio stream and to generate an auditing result;

wherein the content auditing auxiliary system is further configured to: after receiving the auditing result returned by the content security auditing system, determine the illegal content and its position, and cut out from the mixed audio stream the target audio fragment at the position of the illegal content, the auditing result indicating, at session granularity, whether the target session contains a violation; and match the cut-out target audio fragment against the voiceprint features of each of the multiple participant users, and locate the illegal content to at least one target participant user according to the matching result.

2. The system of claim 1, wherein

the content auditing auxiliary system is further configured to record each of the multiple uplink audio streams and to store the correspondence between each recording and the identifier of the target session, the identifier of the participant user, and the generation time, for use as a basis for subsequent processing.

3. The system of claim 1, wherein

the content auditing auxiliary system is further configured to, after locating the illegal content to a target participant user, provide the auditing result, the locating result, and the recording of the target participant user for the corresponding time period to a manual auditing client, so that the locating result is further confirmed by manual auditing.

4. The system of claim 2 or 3, wherein

the content auditing auxiliary system is further configured to, after locating the illegal content to a target participant user, provide the auditing result, the locating result, and the recording of the target participant user for the corresponding time period to the corresponding application system, so that the application system can determine whether to take action against the target participant user.

5. The system of claim 1, wherein

the content auditing auxiliary system is further configured to split the mixed audio stream into a plurality of audio paragraphs after merging the multiple uplink audio streams into the mixed audio stream;

the content security auditing system is specifically configured to perform content security auditing in units of the audio paragraphs, so that the auditing result includes: a target audio paragraph in which illegal content exists; and

the content auditing auxiliary system, when determining the illegal content and its position, is specifically configured to:

determine the illegal content and the position of the illegal content in the target audio paragraph, and cut the target audio fragment out of the target audio paragraph at the corresponding position according to the position of the illegal content in the target audio paragraph.

6. The system of claim 5, wherein

the content auditing auxiliary system, when determining the illegal content and its position, is specifically configured to:

perform speech recognition on the target audio paragraph, match the speech recognition result against a preset lexicon, and, if a keyword in the lexicon is hit, determine that keyword to be the illegal content and determine the position of the keyword in the target audio paragraph to be the position of the illegal content in the target audio paragraph.

7. The system of claim 5, wherein

the auditing result returned by the content security auditing system further includes: the illegal content, and position information of the illegal content in the target audio fragment; and

the content auditing auxiliary system is specifically configured to determine the illegal content, and the position of the illegal content in the target audio fragment, according to the auditing result returned by the content security auditing system.

8. A content security processing system, comprising:

an application system, configured to determine multiple uplink audio streams respectively generated by the clients of multiple participant users of a target session; and

a content auditing auxiliary system, configured to: acquire the correspondence between the multiple participant users and voiceprint features from the respective uplink audio streams, and merge the multiple uplink audio streams into a mixed audio stream to be sent to a content security auditing system for content security auditing; after receiving the auditing result returned by the content security auditing system, determine the illegal content and its position, and cut out from the mixed audio stream the target audio fragment at the position of the illegal content, the auditing result indicating, at session granularity, whether the target session contains a violation; and match the cut-out target audio fragment against the voiceprint features of each of the multiple participant users, and locate the illegal content to at least one target participant user according to the matching result.

9. A content security processing system, comprising:

a content auditing auxiliary system, configured to acquire the correspondence between multiple participant users and voiceprint features from multiple uplink audio streams respectively generated, in an application system, by the clients of the multiple participant users of a target session, and to merge the multiple uplink audio streams into a mixed audio stream;

a content security auditing system, configured to perform content security auditing on the mixed audio stream sent by the content auditing auxiliary system and to generate an auditing result;