CN112492534B - Message processing method, device and equipment - Google Patents
Message processing method, device and equipment
- Publication number
- CN112492534B, CN202011338960.0A
- Authority
- CN
- China
- Prior art keywords
- messages
- message
- account
- message set
- similarity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/12—Messaging; Mailboxes; Announcements
- H04W4/14—Short messaging services, e.g. short message services [SMS] or unstructured supplementary service data [USSD]
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Information Transfer Between Computers (AREA)
Abstract
Embodiments of the present application provide a message processing method, apparatus, and device. The method includes: determining a second message set within a first message set, where the messages in the second message set are of a preset type; dividing the second message set into at least one third message set according to the similarity between the messages in the second message set, where the similarity between the messages in each third message set is greater than or equal to a first threshold; obtaining a first account corresponding to each message in the third message set; and determining a first message quantity corresponding to each first account in the first message set, a second message quantity corresponding to the first account in the third message set, and a third message quantity of messages included in the third message set, and determining a target account in an account set based on these quantities, where the account set includes the accounts corresponding to the messages in the second message set. This improves the accuracy with which an operator identifies mobile phone numbers that send spam short messages.
Description
Technical Field
The present application relates to the field of communication technologies, and in particular to a message processing method, apparatus, and device.
Background
At present, senders of spam short messages usually hold a large number of SIM cards and, to send spam more efficiently, typically use the same SIM card to send spam short messages with similar text content. Operators need to ban the mobile phone numbers that send spam short messages so that users do not receive large numbers of them on their phones.
In the prior art, an operator usually decides which mobile phone numbers to ban based on the number of spam short messages a number sends within a period of time. For example, when the number of spam short messages sent by one mobile phone number within an hour exceeds a preset value, the operator bans that number. However, to avoid banning numbers by mistake, operators usually set a large preset value, and spam senders keep the number of spam short messages sent from any one number within an hour below that value. As a result, the operator cannot identify the sending numbers within a short time, and its accuracy in identifying mobile phone numbers that send spam short messages is low.
Summary of the Invention
Embodiments of the present application provide a message processing method, apparatus, and device to solve the technical problem in the prior art that mobile phone numbers sending spam short messages are identified with low accuracy.
In a first aspect, an embodiment of the present application provides a message processing method, the method including:
determining a second message set within a first message set, where the first message set includes messages received by at least one terminal device within a preset time period, and the messages in the second message set are of a preset type;
dividing the second message set into at least one third message set according to the similarity between the messages in the second message set, where the similarity between the messages in each third message set is greater than or equal to a first threshold;
obtaining a first account corresponding to each message in the third message set, where the first account is the account used by a terminal device to send the message; and
determining a first message quantity corresponding to each first account in the first message set, a second message quantity corresponding to the first account in the third message set, and a third message quantity of messages included in the third message set, and determining a target account in an account set based on these quantities, where the account set includes the accounts corresponding to the messages in the second message set.
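The three quantities named in the final step can be illustrated with a minimal sketch. It assumes, purely for illustration, that each message is a `(sender_account, text)` pair and that a third message set is a list of such pairs; the function name `count_quantities` and this data layout are hypothetical and are not taken from the application.

```python
from collections import Counter

def count_quantities(first_set, third_set):
    """For each account that sent a message in third_set, return a tuple of
    (first quantity, second quantity, third quantity).

    first quantity:  messages the account sent in the whole first message set
    second quantity: messages the account sent inside this third message set
    third quantity:  total number of messages in this third message set
    """
    sent_overall = Counter(account for account, _ in first_set)
    sent_in_cluster = Counter(account for account, _ in third_set)
    third_quantity = len(third_set)
    return {
        account: (sent_overall[account], sent_in_cluster[account], third_quantity)
        for account in sent_in_cluster
    }

# Hypothetical data: accounts "A" and "B" send near-identical spam, "C" does not.
first_set = [
    ("A", "win a prize now"),
    ("A", "win a prize n0w"),
    ("A", "hi mum"),
    ("B", "win a prize now!"),
    ("C", "lunch?"),
]
third_set = [m for m in first_set if m[1].startswith("win")]
quantities = count_quantities(first_set, third_set)
```

Here `third_set` is built with a trivial prefix check only to keep the example short; in the method it would come from the similarity-based division of the second message set.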
In a possible implementation, dividing the second message set into at least one third message set according to the similarity between the messages in the second message set includes:
obtaining the similarity between the messages in the second message set; and
clustering the messages in the second message set according to the similarity, so as to divide the second message set into at least one third message set.
In a possible implementation, obtaining the similarity between the messages in the second message set includes:
obtaining character attributes of each message in the second message set, where the character attributes include at least one of the following: the number of characters in the message, or the stroke counts of the characters in the message; and
determining the similarity between the messages in the second message set according to the character attributes.
In a possible implementation, determining the similarity between the messages in the second message set according to the character attributes includes:
determining the edit distance between the messages in the second message set according to the character attributes; and
determining the similarity between the messages in the second message set according to the edit distance.
In a possible implementation, determining the first message quantity, the second message quantity, and the third message quantity, and determining the target account in the account set, includes:
determining a second threshold according to the first message quantity and the third message quantity; and
determining the target account in the account set according to the second message quantity and the second threshold.
In a possible implementation, determining the second threshold according to the first message quantity and the third message quantity includes:
obtaining a first preset relationship, where the first preset relationship includes a plurality of message quantities and a scaling coefficient corresponding to each message quantity;
determining a scaling coefficient according to the first message quantity and the first preset relationship; and
determining the second threshold according to the first message quantity, the scaling coefficient, and the third message quantity.
In a possible implementation, determining the target account in the account set according to the second message quantity and the second threshold includes:
judging whether the second message quantity is greater than the second threshold; and
if so, determining the first account as the target account in the account set.
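The threshold steps above can be sketched as follows. The text here does not state how the first message quantity, the scaling coefficient, and the third message quantity are combined, so the formula in `second_threshold` is an assumption made only for illustration, as are the table values in `PRESET_RELATIONSHIP` and all function names.

```python
from bisect import bisect_left

# Hypothetical "first preset relationship": each entry is
# (upper bound on the first message quantity, scaling coefficient).
PRESET_RELATIONSHIP = [(10, 0.5), (100, 0.3), (1000, 0.1)]

def scaling_coefficient(first_quantity):
    """Look up the coefficient for the bucket containing first_quantity.
    Quantities above the last bound reuse the last coefficient."""
    bounds = [bound for bound, _ in PRESET_RELATIONSHIP]
    index = min(bisect_left(bounds, first_quantity), len(PRESET_RELATIONSHIP) - 1)
    return PRESET_RELATIONSHIP[index][1]

def second_threshold(first_quantity, third_quantity):
    # ASSUMPTION: the coefficient (selected by the first quantity) scales the
    # size of the third message set. The source does not give this formula.
    return scaling_coefficient(first_quantity) * third_quantity

def is_target_account(first_quantity, second_quantity, third_quantity):
    """An account is a target when its second message quantity is greater
    than the second threshold."""
    return second_quantity > second_threshold(first_quantity, third_quantity)
```

With these assumed values, an account that sent 3 messages overall, 2 of them in a third message set of size 3, is compared against a threshold of 1.5 and would be flagged.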
In a second aspect, an embodiment of the present application provides a message processing apparatus, the apparatus including a first determining module, a dividing module, an obtaining module, and a second determining module, where:
the first determining module is configured to determine a second message set within a first message set, where the first message set includes messages received by at least one terminal device within a preset time period, and the messages in the second message set are of a preset type;
the dividing module is configured to divide the second message set into at least one third message set according to the similarity between the messages in the second message set, where the similarity between the messages in each third message set is greater than or equal to a first threshold;
the obtaining module is configured to obtain a first account corresponding to each message in the third message set, where the first account is the account used by a terminal device to send the message; and
the second determining module is configured to determine a first message quantity corresponding to each first account in the first message set, a second message quantity corresponding to the first account in the third message set, and a third message quantity of messages included in the third message set, and to determine a target account in an account set based on these quantities, where the account set includes the accounts corresponding to the messages in the second message set.
In a possible implementation, the dividing module is specifically configured to:
obtain the similarity between the messages in the second message set; and
cluster the messages in the second message set according to the similarity, so as to divide the second message set into at least one third message set.
In a possible implementation, the dividing module is specifically configured to:
obtain character attributes of each message in the second message set, where the character attributes include at least one of the following: the number of characters in the message, or the stroke counts of the characters in the message; and
determine the similarity between the messages in the second message set according to the character attributes.
In a possible implementation, the dividing module is specifically configured to:
determine the edit distance between the messages in the second message set according to the character attributes; and
determine the similarity between the messages in the second message set according to the edit distance.
In a possible implementation, the second determining module is specifically configured to:
determine a second threshold according to the first message quantity and the third message quantity; and
determine the target account in the account set according to the second message quantity and the second threshold.
In a possible implementation, the second determining module is specifically configured to:
obtain a first preset relationship, where the first preset relationship includes a plurality of message quantities and a scaling coefficient corresponding to each message quantity;
determine a scaling coefficient according to the first message quantity and the first preset relationship; and
determine the second threshold according to the first message quantity, the scaling coefficient, and the third message quantity.
In a possible implementation, the second determining module is specifically configured to:
judge whether the second message quantity is greater than the second threshold; and
if so, determine the first account as the target account in the account set.
In a third aspect, an embodiment of the present application provides a message processing device including a memory, a processor, and a communication interface, where the memory is configured to store program instructions and the processor is configured to invoke the program instructions in the memory to perform the message processing method according to any implementation of the first aspect.
In a fourth aspect, an embodiment of the present application provides a readable storage medium on which a computer program is stored, where the computer program is used to implement the message processing method according to any implementation of the first aspect.
Embodiments of the present application provide a message processing method, apparatus, and device. A second message set is determined within a first message set, where the first message set includes messages received by at least one terminal device within a preset time period and the messages in the second message set are of a preset type. The second message set is divided into at least one third message set according to the similarity between the messages in the second message set, where the similarity between the messages in each third message set is greater than or equal to a first threshold. A first account corresponding to each message in the third message set is obtained, where the first account is the account used by a terminal device to send the message. A first message quantity corresponding to each first account in the first message set, a second message quantity corresponding to the first account in the third message set, and a third message quantity of messages included in the third message set are determined, and a target account is determined in an account set, where the account set includes the accounts corresponding to the messages in the second message set. In this method, the similarity between the messages in the second message set allows messages with similar text content to be placed accurately in the same third message set. The number of messages the first account sent in the first message set, the number of messages included in the third message set, and the number of messages the first account sent in the third message set then allow an accurate decision on whether the first account is a target account, which improves the accuracy with which the operator identifies mobile phone numbers that send spam short messages.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of a message processing method provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a process of obtaining a third message set provided by an embodiment of the present application;
FIG. 4 is a schematic flowchart of another message processing method provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of the process of a message processing method provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a message processing apparatus provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of the hardware structure of a message processing device provided by an embodiment of the present application.
Detailed Description
To make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the protection scope of the present application.
For ease of understanding, the application scenario of the embodiments of the present application is introduced first.
FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the present application. Referring to FIG. 1, the scenario includes mobile phone A, mobile phone B, and an operator. Mobile phone A can send short messages to mobile phone B, mobile phone B can receive the short messages sent by mobile phone A, and both phones can exchange data with the operator.
When mobile phone A sends a spam short message to mobile phone B, the operator can obtain the sender of the spam short message received by mobile phone B. For example, the operator can determine the sender according to the mobile phone number corresponding to the spam short message. When the operator determines that mobile phone A is the sender of the spam short message, the operator can suspend the number of mobile phone A so that mobile phone A can no longer send spam short messages to mobile phone B.
In the related art, senders of spam short messages usually vary the text content of the messages in ways that do not affect readability, and keep the number of spam short messages sent from one mobile phone number within a period of time below the upper limit set by the operator. As a result, the operator cannot identify the sending numbers within a short time, and its accuracy in identifying mobile phone numbers that send spam short messages is low.
To solve this technical problem, in the embodiments of the present application a first spam short message set is determined among all the short messages sent by terminal devices within a period of time, and the spam short messages are divided into second spam short message sets of different types according to the similarity between the spam short messages in the first spam short message set. The mobile phone number corresponding to a spam short message in a second spam short message set is determined, and whether that number is one that sends spam short messages is decided according to the number of all short messages the number sent within the period, the number of spam short messages of that type it sent, and the total number of spam short messages in that second spam short message set. In this method, the similarity between the spam short messages allows spam short messages with similar text content to be placed accurately in the same second spam short message set. If multiple spam short messages in the same second spam short message set were sent by the same mobile phone number, that number is a sender of spam short messages and can be suspended. This improves the accuracy with which the operator identifies mobile phone numbers that send spam short messages.
The technical solutions of the embodiments of the present application are described in detail below through specific embodiments. The following embodiments may exist independently or be combined with one another, and identical or similar content is not repeated across embodiments.
FIG. 2 is a schematic flowchart of a message processing method provided by an embodiment of the present application. Referring to FIG. 2, the method may include:
S201. Determine a second message set within a first message set.
The execution body of the embodiments of the present application may be a server, or a message processing apparatus provided in the server. Optionally, the message processing apparatus may be implemented in software, or in a combination of software and hardware.
The first message set includes messages received by at least one terminal device within a preset time period. The terminal device may be any device capable of receiving messages, such as a mobile phone or a computer. For example, when the terminal device is a mobile phone, the first message set may be a set of short messages, WeChat messages, and similar messages received by at least one mobile phone within the preset time period.
The messages in the second message set are of a preset type. Optionally, the preset type may be spam. A spam message is a message the user does not wish to receive that is sent to the user without the user's consent. For example, spam messages may include spam short messages, spam emails, and the like.
Optionally, the second message set may be a spam message set. For example, if the first message set consists of the short messages received by at least one mobile phone within the preset time period, the spam short messages in the first message set form the second message set. Optionally, spam messages may also include spam WeChat messages, spam notification messages, and the like received by the mobile phone.
Optionally, the second message set may be a spam email set. For example, if the first message set consists of the emails received by at least one mailbox within the preset time period, the spam emails in the first message set form the second message set.
Optionally, the second message set may be determined within the first message set as follows: determine keywords of the messages in the second message set, and determine the second message set within the first message set according to the keywords. For example, when the second message set is a spam short message set, the keywords of spam short messages are determined, and a keyword filtering algorithm selects from the first message set the short messages that match those keywords, which then form the second message set.
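The keyword-based selection can be sketched as a simple substring filter. The keyword list, the `(sender_account, text)` message layout, and the function name `select_second_set` are hypothetical; the application does not specify which keyword filtering algorithm is used.

```python
# Hypothetical spam keywords; a real deployment would maintain its own list.
SPAM_KEYWORDS = ("prize", "loan", "click here")

def select_second_set(first_set, keywords=SPAM_KEYWORDS):
    """Return the messages in first_set whose text contains any keyword.
    Matching is case-insensitive; each message is (sender_account, text)."""
    return [
        (account, text)
        for account, text in first_set
        if any(keyword in text.lower() for keyword in keywords)
    ]

second_set = select_second_set([
    ("A", "Win a PRIZE today"),
    ("B", "see you at lunch"),
])
```

Plain substring matching is easy to evade by altering the text, which is one reason the later steps group messages by similarity rather than by exact content.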
Optionally, the second message set may be determined within the first message set using an artificial intelligence machine learning algorithm. For example, when the second message set is a spam short message set, a model can be trained on sample spam short messages and sample normal short messages, and the trained model is then used to identify spam short messages and thereby determine the second message set within the first message set.
S202. Divide the second message set into at least one third message set according to the similarity between the messages in the second message set.
A third message set is a set of messages of the preset type. For example, when the preset type is spam, the second message set is a spam message set, and each third message set obtained by dividing it is also a spam message set.
Optionally, the similarity between the messages in a third message set is greater than or equal to a first threshold. For example, when the messages in the third message set are spam messages, the degree of similarity between the text content of those spam messages is greater than or equal to the first threshold.
The second message set may be divided into at least one third message set as follows. First, obtain the similarity between the messages in the second message set, where similarity refers to how similar the text content of the messages is. For example, if short message A and short message B have identical text content, their similarity is 100%. Then cluster the messages in the second message set according to the similarity, so as to divide the second message set into at least one third message set.
Optionally, the similarity between the messages in the second message set may be obtained as follows: obtain the character attributes of each message in the second message set, where the character attributes include at least one of the following: the number of characters in the message, or the stroke counts of the characters in the message.
The similarity between the messages in the second message set is then determined according to the character attributes. For example, the edit distance between the messages is determined according to the character attributes, and the similarity between the messages is determined according to the edit distance. The edit distance between two texts is the minimum number of edit operations required to turn one into the other. Edit operations may include replacing, inserting, or deleting a text element. Optionally, the fewer edit operations required to turn one text into another, the greater the similarity between the two texts. For example, editing the text "一" into the text "二" takes one edit operation, so the similarity between "一" and "二" is high; editing the text "一" into the text "删" takes six edit operations, so the similarity between "一" and "删" is lower. As another example, if the text content of spam short message A has a stroke count of 100 and the text content of spam short message B also has a stroke count of 100, the similarity between spam short message A and spam short message B is 100%, and the two messages belong to the same third message set.
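As a hedged illustration of edit distance and a similarity derived from it, the sketch below computes the classic Levenshtein distance over characters. Two points are assumptions rather than statements from the application: the examples in the text count strokes, whereas this sketch counts character-level edits, and the normalization `1 - distance / longer_length` is just one common way to map a distance to a similarity in [0, 1].

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep when equal
            ))
        previous = current
    return previous[-1]

def similarity(a, b):
    """ASSUMED normalization: 1.0 for identical texts, 0.0 when every
    position of the longer text must change."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```

Two messages would then fall into the same third message set when `similarity(a, b)` is at least the first threshold.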
Optionally, the messages in the second message set may be clustered with the Mean-Shift clustering algorithm according to the similarity between the messages, thereby obtaining at least one third message set. The number of message categories cannot be known in advance, and Mean-Shift does not require the number of clusters to be specified. The sliding-window radius of the Mean-Shift clustering algorithm is measured in Levenshtein distance, which controls how much variation is tolerated within one message category. For example, when a message differs by more than the sliding-window radius, it is treated as a different kind of message.
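The idea of a sliding window whose radius is a Levenshtein distance can be sketched as follows. This is not the Mean-Shift implementation of the embodiment: strings have no arithmetic mean, so the sketch substitutes the medoid of each window (the member with the smallest total distance to the others) for the mean. The names `levenshtein`, `medoid` and `cluster`, and the `max_iter` safeguard, are assumptions made for illustration.

```python
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def medoid(window: List[str]) -> str:
    """Window member with the smallest total distance to the others (ties broken by text)."""
    return min(window, key=lambda m: (sum(levenshtein(m, o) for o in window), m))


def cluster(messages: List[str], radius: int, max_iter: int = 20) -> Dict[str, List[str]]:
    """Shift each message toward the medoid of its window until the center stops moving."""
    clusters: Dict[str, List[str]] = {}
    for msg in messages:
        center = msg
        for _ in range(max_iter):
            # The window holds every message within `radius` edits of the current center.
            window = [m for m in messages if levenshtein(center, m) <= radius]
            shifted = medoid(window)
            if shifted == center:
                break
            center = shifted
        clusters.setdefault(center, []).append(msg)
    return clusters
```

Messages whose windows converge to the same center end up in the same group, which plays the role of one third message set; a larger `radius` merges more loosely related messages.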
The process of dividing the second message set into at least one third message set is described in detail below with reference to FIG. 3.
FIG. 3 is a schematic diagram of the process of obtaining third message sets according to an embodiment of the present application. Referring to FIG. 3, the diagram includes a second message set, a third message set A, a third message set B, and a third message set C. The second message set includes 8 messages (each circle represents one message), the third message set A includes 3 messages, the third message set B includes 2 messages, and the third message set C includes 4 messages.
Referring to FIG. 3, the character attributes of each message in the second message set are acquired, and the messages in the second message set are arranged by similarity according to those character attributes. The higher the similarity between two messages, the smaller the distance between them; the lower the similarity, the greater the distance.
The messages arranged by similarity are then clustered to obtain the third message set A, the third message set B, and the third message set C, where the similarity between the messages in the third message set A is greater than or equal to the first threshold, the similarity between the messages in the third message set B is greater than or equal to the first threshold, and the similarity between the messages in the third message set C is greater than or equal to the first threshold.
S203. Acquire the first account corresponding to each message in the third message set.
The first account is the account used by a terminal device to send a message. For example, when the terminal device is a mobile phone, the first account for sending a short message is the mobile phone number, the first account for sending a WeChat message is the WeChat ID, and the first account for sending an email is the email account.
Optionally, the first account may be determined according to the correspondence between each message in the third message set and an account. For example, the correspondence between messages and accounts may be as shown in Table 1:
Table 1
Message      Account
Message 1    Account 1
Message 2    Account 2
Message 3    Account 3
It should be noted that Table 1 merely illustrates the correspondence between messages and accounts by way of example and does not limit that correspondence.
For example, when the message in the third message set is message 1, the first account is account 1; when the message in the third message set is message 2, the first account is account 2; when the message in the third message set is message 3, the first account is account 3.
Optionally, when the third message set is a set of spam short messages, the first account corresponding to each message is the mobile phone number that sent the spam short message; when the third message set is a set of spam WeChat messages, the first account corresponding to each message is the WeChat account that sent the spam WeChat message; when the third message set is a set of spam emails, the first account corresponding to each message is the email account that sent the spam email.
S204. Determine the first message quantity corresponding to each first account in the first message set, the second message quantity corresponding to the first account in the third message set, and the third message quantity included in the third message set, and determine a target account in the account set.
The first message quantity is the number of messages in the first message set that were sent by the first account. For example, if the first message set includes 100 messages and 10 of them were sent by the first account, the first message quantity is 10.
Optionally, the number of messages in the first message set sent by the first account may be determined according to the correspondence between each message in the first message set and an account. For example, when the account corresponding to a message in the first message set is the same as the first account, that message is a message sent by the first account in the first message set.
The second message quantity is the number of messages in the third message set that were sent by the first account. For example, if the third message set includes 10 messages and 3 of them were sent by the first account, the second message quantity is 3.
Optionally, the number of messages in the third message set sent by the first account may be determined according to the correspondence between each message in the third message set and an account. For example, when the account corresponding to a message in the third message set is the same as the first account, that message is a message sent by the first account in the third message set.
The third message quantity is the number of messages included in the third message set. For example, when the third message set is a set of spam short messages, the total number of spam short messages in the third message set is the third message quantity.
Optionally, the third message quantity may be obtained from the result of the clustering. For example, when the messages in the second message set are clustered, the number of messages in each resulting third message set can be determined.
The account set includes the accounts corresponding to the messages in the second message set. Optionally, the accounts corresponding to the messages in the second message set may together form the account set. For example, if the messages in the second message set are spam short messages, the mobile phone numbers corresponding to all spam short messages in the second message set are the accounts included in the account set.
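The three quantities and the account set defined above can be computed with simple counting. The sketch below assumes a hypothetical record layout in which every message is a `(sender account, text)` pair; the function names are illustrative only.

```python
from typing import List, Set, Tuple

# Hypothetical record layout: (sender account, message text).
Message = Tuple[str, str]


def message_quantities(first_set: List[Message],
                       third_set: List[Message],
                       account: str) -> Tuple[int, int, int]:
    """Return (first, second, third) message quantities for one first account."""
    first_qty = sum(1 for sender, _ in first_set if sender == account)   # sent in first set
    second_qty = sum(1 for sender, _ in third_set if sender == account)  # sent in third set
    third_qty = len(third_set)                                           # size of third set
    return first_qty, second_qty, third_qty


def account_set(second_set: List[Message]) -> Set[str]:
    """Accounts corresponding to the messages in the second message set."""
    return {sender for sender, _ in second_set}
```

With the figures used in the text (100 messages in the first set, 10 of them from the first account; 10 messages in the third set, 3 of them from the first account), `message_quantities` returns `(10, 3, 10)`.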
The target account may be determined in the account set in either of the following two feasible implementations.
One feasible implementation:
A second threshold is determined according to the first message quantity and the third message quantity, and the target account is determined in the account set according to the second message quantity and the second threshold. Optionally, the second threshold may instead be a preset threshold. If the second message quantity corresponding to the first account in the third message set is greater than or equal to the second threshold, the first account is determined as the target account in the account set.
Optionally, when the messages in the third message set are spam short messages, the first account is the mobile phone number that sent the spam short messages. Once the target mobile phone number is determined in the account set, the operator may suspend service for that number for a period of time, ban it, or take similar action, which reduces the probability that users receive spam short messages and thereby improves the user experience.
Another feasible implementation:
A suspicious value of the first account is determined according to the first message quantity corresponding to the first account in the first message set and the second message quantity corresponding to the first account in the third message set. When the suspicious value is greater than or equal to a preset threshold, the first account is determined as the target account in the account set. For example, the suspicious value may be the ratio of the second message quantity to the first message quantity. If the first message quantity corresponding to the first account in the first message set is 100 and the second message quantity corresponding to the first account in the third message set is 30, the suspicious value of the first account is 0.3; with a preset threshold of 0.1, the first account is the target account.
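The second implementation reduces to a single ratio test. The following sketch uses the figures from the example (30 of 100 messages, preset threshold 0.1); the function names and the default threshold argument are assumptions for illustration.

```python
def suspicious_value(first_qty: int, second_qty: int) -> float:
    """Share of the account's messages that fall into the third message set."""
    if first_qty <= 0:
        raise ValueError("first message quantity must be positive")
    return second_qty / first_qty


def is_target_by_ratio(first_qty: int, second_qty: int, preset_threshold: float = 0.1) -> bool:
    """The account is a target when its suspicious value reaches the preset threshold."""
    return suspicious_value(first_qty, second_qty) >= preset_threshold
```

Because the value is a proportion of the account's own traffic, an account that sends many messages is not flagged merely for volume; it is flagged when a sufficient share of its messages lands in one cluster of similar messages.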
In the message processing method provided by this embodiment of the present application, a second message set is determined in a first message set, the character attributes of each message in the second message set are acquired, and the similarity between the messages in the second message set is determined according to the character attributes. According to the similarity, the messages in the second message set are clustered so as to divide the second message set into at least one third message set. The first account corresponding to each message in the third message set is acquired, and the first message quantity and second message quantity corresponding to the first account, as well as the third message quantity included in the third message set, are determined. A second threshold is determined according to the first message quantity and the third message quantity, and the target account is determined in the account set according to the second message quantity and the second threshold. In this method, the character attributes of the messages in the second message set allow the similarity between those messages to be obtained accurately, and the similarity in turn allows the third message sets to be obtained accurately. When the number of messages corresponding to the first account in the third message set is greater than or equal to the second threshold, the first account has sent multiple messages of the preset type with similar character attributes, so the first account can be determined as the target account in the account set and then processed. This improves the accuracy with which the operator identifies mobile phone numbers that send spam short messages.
On the basis of the embodiment shown in FIG. 2, the above message processing method is described in detail below with reference to FIG. 4.
FIG. 4 is a schematic flowchart of another message processing method provided by an embodiment of the present application. Referring to FIG. 4, the method may include:
S401. Determine a second message set in the first message set.
It should be noted that, for the execution of S401, reference may be made to the execution of S201; details are not repeated here.
S402. Divide the second message set into at least one third message set according to the similarity between the messages in the second message set.
It should be noted that, for the execution of S402, reference may be made to the execution of S202; details are not repeated here.
S403. Acquire the first account corresponding to each message in the third message set.
It should be noted that, for the execution of S403, reference may be made to the execution of S203; details are not repeated here.
S404. Determine the first message quantity corresponding to each first account in the first message set, the second message quantity corresponding to the first account in the third message set, and the third message quantity included in the third message set.
It should be noted that, for the execution of S404, reference may be made to the execution of S204; details are not repeated here.
S405. Acquire a first preset relationship, and determine a proportional coefficient according to the first message quantity and the first preset relationship.
The first preset relationship includes multiple message quantities and the proportional coefficient corresponding to each message quantity. For example, the first preset relationship may be as shown in Table 2:
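Table 2
Message quantity      Proportional coefficient
Message quantity 1    Proportional coefficient 1
Message quantity 2    Proportional coefficient 2
Message quantity 3    Proportional coefficient 3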
It should be noted that Table 2 merely illustrates the first preset relationship by way of example and does not limit the first preset relationship.
Optionally, the proportional coefficient may be determined according to the first message quantity and the first preset relationship. For example, when the first message quantity corresponding to the first account in the first message set is message quantity 1, the corresponding proportional coefficient is proportional coefficient 1; when the first message quantity is message quantity 2, the corresponding proportional coefficient is proportional coefficient 2; when the first message quantity is message quantity 3, the corresponding proportional coefficient is proportional coefficient 3.
Optionally, the value of the proportional coefficient is between 0 and 1.
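A lookup against the first preset relationship might be sketched as below. The source gives no concrete values for Table 2, so the entries in `PRESET_RELATIONSHIP` are invented placeholders, and treating each message quantity as the lower bound of a range is likewise an assumption rather than something the embodiment states.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical first preset relationship: (lower bound of message quantity, coefficient).
# The values are placeholders; the source does not specify them.
PRESET_RELATIONSHIP: List[Tuple[int, float]] = [(0, 0.8), (50, 0.5), (200, 0.3)]


def proportional_coefficient(first_qty: int,
                             relationship: List[Tuple[int, float]] = PRESET_RELATIONSHIP) -> float:
    """Return the coefficient of the last entry whose lower bound does not exceed first_qty."""
    bounds = [lower for lower, _ in relationship]  # must be sorted in ascending order
    index = bisect_right(bounds, first_qty) - 1
    if index < 0:
        raise ValueError("first message quantity is below the smallest table entry")
    coefficient = relationship[index][1]
    assert 0.0 <= coefficient <= 1.0  # the coefficient is stated to lie between 0 and 1
    return coefficient
```

A table keyed by ranges keeps the lookup defined for every non-negative quantity; an exact-match dictionary would be an equally valid reading of Table 2.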
S406. Determine a second threshold according to the first message quantity, the proportional coefficient, and the third message quantity.
The second threshold may be determined from the first message quantity, the proportional coefficient, and the third message quantity as follows: the ratio of the first message quantity to the third message quantity is multiplied by the proportional coefficient to obtain the second threshold. For example, if the first message quantity corresponding to the first account in the first message set is 100, the third message quantity of the third message set containing the first account's messages is 10, and the proportional coefficient is 0.3, the second threshold is (100 / 10) × 0.3 = 3.
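The threshold computation in S406 is a single expression; a minimal sketch with input checks follows. The function name and the validation rules are assumptions added for illustration.

```python
def second_threshold(first_qty: int, third_qty: int, coefficient: float) -> float:
    """Second threshold = (first message quantity / third message quantity) * coefficient."""
    if third_qty <= 0:
        raise ValueError("third message quantity must be positive")
    if not 0.0 <= coefficient <= 1.0:
        raise ValueError("proportional coefficient must lie between 0 and 1")
    return first_qty / third_qty * coefficient
```

For the example in the text, `second_threshold(100, 10, 0.3)` evaluates to 3 (up to floating-point rounding).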
S407. Determine whether the second message quantity is greater than or equal to the second threshold.
If not, execute S408.
If so, execute S409.
S408. Determine that the first account is not the target account.
S409. Determine the first account as the target account in the account set.
Optionally, the account in the account set that is the same as the first account is determined as the target account.
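Steps S404 to S409 can be combined into one pass over a single third message set. In this sketch each message is reduced to its sender account, `coefficient_for` stands in for the first preset relationship lookup, and all names are hypothetical. The third message set is assumed to be a subset of the first message set, as the method implies.

```python
from collections import Counter
from typing import Callable, Iterable, Set


def find_target_accounts(first_senders: Iterable[str],
                         third_senders: Iterable[str],
                         coefficient_for: Callable[[int], float]) -> Set[str]:
    """Return the accounts whose second message quantity reaches the second threshold."""
    first_counts = Counter(first_senders)    # first message quantity per account
    second_counts = Counter(third_senders)   # second message quantity per account
    third_qty = sum(second_counts.values())  # third message quantity (cluster size)
    targets: Set[str] = set()
    for account, second_qty in second_counts.items():
        first_qty = first_counts[account]
        # S405 and S406: coefficient lookup, then threshold = first / third * coefficient.
        threshold = first_qty / third_qty * coefficient_for(first_qty)
        # S407: compare; S409 marks the account, S408 leaves it unmarked.
        if second_qty >= threshold:
            targets.add(account)
    return targets
```

For instance, with a constant coefficient of 0.3, an account that sent 100 messages overall and 4 of the 10 messages in the cluster meets its threshold of 3 and is returned, while an account with 100 messages overall and only 1 message in the cluster is not.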
In the message processing method provided by this embodiment of the present application, a second message set is determined in the first message set, the second message set is divided into at least one third message set according to the similarity between the messages in the second message set, the first account corresponding to each message in the third message set is acquired, and the first message quantity corresponding to each first account in the first message set, the second message quantity corresponding to the first account in the third message set, and the third message quantity included in the third message set are determined. A first preset relationship is acquired, a proportional coefficient is determined according to the first message quantity and the first preset relationship, and a second threshold is determined according to the first message quantity, the proportional coefficient, and the third message quantity. When the second message quantity is greater than or equal to the second threshold, the first account is determined as the target account in the account set. In this method, the similarity between the messages in the second message set allows the second message set to be divided accurately into multiple third message sets, and determining the second threshold from the first message quantity, the proportional coefficient, and the third message quantity reduces the probability that the operator bans an account by mistake. In addition, when the second message quantity is greater than or equal to the second threshold, the first account has sent multiple highly similar messages, so the first account can be determined as the target account in the account set and then processed. This improves the accuracy with which the operator identifies mobile phone numbers that send spam short messages.
On the basis of any of the above embodiments, the message processing method is described in detail below through a specific example with reference to FIG. 5.
FIG. 5 is a schematic diagram of the process of the message processing method provided by an embodiment of the present application. Referring to FIG. 5, the diagram includes a first message set, a second message set, a third message set A, and a third message set B. The first message set includes multiple messages, the second message set includes 8 messages of the preset type (each circle in FIG. 5 represents one message), the third message set A includes 4 messages of the preset type, the third message set B includes 4 messages of the preset type, and message 1 is a message in the third message set A. The similarity of character attributes between the messages in the third message set A is greater than or equal to the first threshold, and the similarity of character attributes between the messages in the third message set B is greater than or equal to the first threshold.
Referring to FIG. 5, the messages of the preset type are filtered out of the first message set to form the second message set. The similarity between the messages in the second message set is evaluated according to the character attributes of each message; the higher the similarity between messages, the smaller the distance between the corresponding circles in FIG. 5. The messages in the second message set are clustered to obtain the third message set A and the third message set B, where the similarity between the messages in the third message set A and the messages in the third message set B is low.
The first account corresponding to message 1 in the third message set A is determined, and the first message quantity corresponding to the first account in the first message set is determined. The number of messages in the third message set A is 4. The proportional coefficient is determined according to the first preset relationship and the first message quantity, and the second threshold is determined according to the first message quantity, the proportional coefficient, and the number of messages in the third message set A. The second message quantity corresponding to the first account in the third message set A is determined, and when the second message quantity is greater than or equal to the second threshold, the first account is determined as the target account in the account set.
When the number of messages corresponding to the first account in the third message set A is greater than or equal to the second threshold, the first account has sent multiple messages of the preset type with similar character attributes, so the first account can be determined as the target account in the account set and then processed. This improves the accuracy with which the operator identifies mobile phone numbers that send spam short messages.
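FIG. 6 is a schematic structural diagram of a message processing apparatus according to an embodiment of the present application. The message processing apparatus may be provided in a terminal device. Referring to FIG. 6, the message processing apparatus 10 includes a first determining module 11, a dividing module 12, an acquiring module 13, and a second determining module 14, where:
the first determining module 11 is configured to determine a second message set in a first message set, where the first message set includes messages received by at least one terminal device within a preset period, and the messages in the second message set are of a preset type;
the dividing module 12 is configured to divide the second message set into at least one third message set according to the similarity between the messages in the second message set, where the similarity between the messages in the third message set is greater than or equal to a first threshold;
the acquiring module 13 is configured to acquire the first account corresponding to each message in the third message set, where the first account is the account used by a terminal device to send the message;
the second determining module 14 is configured to determine the first message quantity corresponding to each first account in the first message set, the second message quantity corresponding to the first account in the third message set, and the third message quantity included in the third message set, and to determine a target account in an account set, where the account set includes the accounts corresponding to the messages in the second message set.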
In a possible implementation, the dividing module 12 is specifically configured to:
acquire the similarity between the messages in the second message set;
cluster the messages in the second message set according to the similarity, so as to divide the second message set into at least one third message set.
In a possible implementation, the dividing module 12 is specifically configured to:
acquire the character attributes of each message in the second message set, where the character attributes include at least one of the following: the number of characters included in the message, or the stroke count of the characters included in the message;
determine the similarity between the messages in the second message set according to the character attributes.
In a possible implementation, the dividing module 12 is specifically configured to:
determine the edit distance between the messages in the second message set according to the character attributes;
determine the similarity between the messages in the second message set according to the edit distance.
In a possible implementation, the second determining module 14 is specifically configured to:
determine a second threshold according to the first message quantity and the third message quantity;
determine the target account in the account set according to the second message quantity and the second threshold.
In a possible implementation, the second determining module 14 is specifically configured to:
acquire a first preset relationship, where the first preset relationship includes multiple message quantities and the proportional coefficient corresponding to each message quantity;
determine a proportional coefficient according to the first message quantity and the first preset relationship;
determine the second threshold according to the first message quantity, the proportional coefficient, and the third message quantity.
In a possible implementation, the second determining module 14 is specifically configured to:
determine whether the second message quantity is greater than the second threshold;
if so, determine the first account as the target account in the account set.
The message processing apparatus provided by this embodiment of the present application can execute the technical solutions shown in the above method embodiments; its implementation principles and beneficial effects are similar and are not repeated here.
FIG. 7 is a schematic diagram of the hardware structure of a message processing device according to an embodiment of the present application. Referring to FIG. 7, the message processing device 20 may include a processor 21 and a memory 22, where the processor 21 and the memory 22 can communicate with each other. By way of example, the processor 21 and the memory 22 communicate through a communication bus 23; the memory 22 is configured to store program instructions, and the processor 21 is configured to call the program instructions in the memory to execute the message processing method shown in any of the above method embodiments.
Optionally, the message processing device 20 may further include a communication interface, and the communication interface may include a transmitter and/or a receiver.
Optionally, the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in connection with the embodiments of the present application may be executed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
本申请实施例提供一种可读存储介质,所述可读存储介质上存储有计算机程序;所述计算机程序用于实现如上述任意实施例所述的消息处理方法。An embodiment of the present application provides a readable storage medium, where a computer program is stored on the readable storage medium; the computer program is used to implement the message processing method described in any of the foregoing embodiments.
本申请实施例提供一种计算机程序产品,所述计算机程序产品包括指令,当所述指令被执行时,使得计算机执行上述消息处理方法。An embodiment of the present application provides a computer program product, where the computer program product includes instructions that, when executed, cause a computer to execute the above message processing method.
实现上述各方法实施例的全部或部分步骤可以通过程序指令相关的硬件来完成。前述的程序可以存储于一可读取存储器中。该程序在执行时,执行包括上述各方法实施例的步骤;而前述的存储器(存储介质)包括:只读存储器(英文:read-only memory,缩写:ROM)、RAM、快闪存储器、硬盘、固态硬盘、磁带(英文:magnetic tape)、软盘(英文:floppydisk)、光盘(英文:optical disc)及其任意组合。All or part of the steps for implementing the above method embodiments may be completed by program instructions related to hardware. The aforementioned program can be stored in a readable memory. When the program is executed, the steps including the above method embodiments are executed; and the aforementioned memory (storage medium) includes: read-only memory (English: read-only memory, abbreviation: ROM), RAM, flash memory, hard disk, Solid state drive, magnetic tape (English: magnetic tape), floppy disk (English: floppydisk), optical disc (English: optical disc) and any combination thereof.
本申请实施例是参照根据本申请实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理单元以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理单元执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。The embodiments of the present application are described with reference to flowcharts and/or block diagrams of methods, apparatuses (systems), and computer program products according to the embodiments of the present application. It will be understood that each flow and/or block in the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processing unit of a general purpose computer, special purpose computer, embedded processor or other programmable data processing device to produce a machine such that the instructions executed by the processing unit of the computer or other programmable data processing device produce Means for implementing the functions specified in a flow or flow of a flowchart and/or a block or blocks of a block diagram.
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory result in an article of manufacture comprising instruction means, the instructions The apparatus implements the functions specified in the flow or flow of the flowcharts and/or the block or blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, those skilled in the art can make various changes and variations to the embodiments of the present application without departing from the spirit and scope of the present application. Thus, if these changes and variations fall within the scope of the claims of the present application and their equivalents, the present application is also intended to include them.
In this application, the term "comprising" and its variants may mean non-limiting inclusion; the term "or" and its variants may mean "and/or". The terms "first", "second" and the like are used to distinguish similar objects and do not necessarily describe a specific order or sequence. "Plurality" means two or more. "And/or" describes the relationship between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean that A exists alone, that A and B both exist, or that B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
Claims (9)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011338960.0A CN112492534B (en) | 2020-11-25 | 2020-11-25 | Message processing method, device and equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011338960.0A CN112492534B (en) | 2020-11-25 | 2020-11-25 | Message processing method, device and equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112492534A (en) | 2021-03-12 |
CN112492534B (en) | 2022-04-15 |
Family
ID=74935003
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011338960.0A Active CN112492534B (en) | 2020-11-25 | 2020-11-25 | Message processing method, device and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112492534B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2012079452A1 (en) * | 2010-12-15 | 2012-06-21 | 成都市华为赛门铁克科技有限公司 | Method, device and terminal for classifying short messages |
CN105323763A (en) * | 2014-06-27 | 2016-02-10 | 中国移动通信集团湖南有限公司 | Method and apparatus for identifying spam messages |
CN105447028A (en) * | 2014-08-27 | 2016-03-30 | 阿里巴巴集团控股有限公司 | Method and device for identifying characteristic account |
CN110119860A (en) * | 2018-02-05 | 2019-08-13 | 阿里巴巴集团控股有限公司 | A kind of rubbish account detection method, device and equipment |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8868663B2 (en) * | 2008-09-19 | 2014-10-21 | Yahoo! Inc. | Detection of outbound sending of spam |
Also Published As
Publication number | Publication date |
---|---|
CN112492534A (en) | 2021-03-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105574538B (en) | Classification model training method and device | |
US8095612B2 (en) | Ranking messages in an electronic messaging environment | |
CN110297912A (en) | Cheat recognition methods, device, equipment and computer readable storage medium | |
CN110428322A (en) | A kind of adaptation method and device of business datum | |
CN105824813B (en) | A kind of method and device for excavating core customer | |
CN108647727A (en) | Unbalanced data classification lack sampling method, apparatus, equipment and medium | |
CN108512828A (en) | Mail piece identifiers and filter method, device, server based on address list and system | |
CN103984703A (en) | Mail classification method and device | |
CN104123324A (en) | Positioning and obtaining method and device for unread messages | |
CN106156105A (en) | Email polymerization sorting technique and device | |
US20050198182A1 (en) | Method and apparatus to use a genetic algorithm to generate an improved statistical model | |
CN109547322A (en) | System prompt control method, device, computer and computer readable storage medium | |
CN113904943A (en) | Account detection method, device, electronic device and storage medium | |
CN107592399A (en) | Contact display method and mobile terminal | |
CN112492534B (en) | Message processing method, device and equipment | |
CN111597336A (en) | Processing method and device of training text, electronic equipment and readable storage medium | |
US20120260339A1 (en) | Imposter Prediction Using Historical Interaction Patterns | |
CN108932337A (en) | SMS classified data processing method, device, electronic equipment and storage medium | |
CN114693255B (en) | A mail data processing method and related products | |
CN105681523A (en) | Method and apparatus for sending birthday blessing short message automatically | |
WO2019174164A1 (en) | Advertisement short message recognition method, electronic apparatus, terminal device and storage medium | |
KR101806174B1 (en) | System and method for detecting spam sms, recording medium for performing the method | |
CN112199934B (en) | A method, device and storage medium for mail processing | |
CN113554062A (en) | Training method, device and storage medium for multi-classification model | |
CN114625864A (en) | Mail processing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |