CN106777341A - Information processing method, device and computer equipment - Google Patents
- Publication number
- CN106777341A (application CN201710026441.2A)
- Authority
- CN
- China
- Prior art keywords
- comment
- queue
- user
- comments
- information
- Legal status
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
Abstract
Description
Technical Field
The present invention relates to the field of communication technology, in particular to Internet technology, and specifically to an information processing method, device and computer equipment.
Background
With the development of Internet technology, users can post comments on a wide range of public platforms, such as forums, communities and websites. However, because these platforms are open to everyone, some users post large numbers of spam comments, such as advertisements, sales pitches and other harmful content, which makes it harder for other users to find useful information and exposes them to objectionable material. Spam comments have become an increasing nuisance to users of existing computer equipment, and how to identify spam comments effectively has drawn growing attention in the industry.
Summary of the Invention
Embodiments of the present invention provide an information processing method, device and computer equipment that can improve information processing efficiency.
An embodiment of the present invention provides an information processing method, the method comprising:
acquiring a user comment;
traversing a comment queue and judging whether the number of comments in the comment queue that are identical or similar to the user comment reaches a first threshold, wherein the comment queue is a first-in-first-out (FIFO) queue whose length is bounded by a second threshold;
if so, determining the user comment to be a spam comment;
if not, adding the user comment to the comment queue and processing the comment at the tail of the FIFO queue according to the second threshold.
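The claimed steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the queue is a `collections.deque` whose `maxlen` plays the role of the second threshold, similarity is approximated with `difflib.SequenceMatcher.ratio()` (the patent does not name a similarity measure), and the names `process_comment`, `first_threshold` and `third_threshold` are hypothetical.

```python
from collections import deque
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Assumed similarity measure in [0, 1]; the patent leaves this open."""
    return SequenceMatcher(None, a, b).ratio()


def process_comment(comment: str, queue: deque, first_threshold: int,
                    third_threshold: float = 0.8) -> bool:
    """Return True if `comment` is judged spam; otherwise enqueue it and return False.

    `queue` is expected to be a deque created with maxlen=<second threshold>.
    """
    # Traverse the comment queue, counting identical or similar comments.
    matches = sum(1 for old in queue if similarity(comment, old) >= third_threshold)
    if matches >= first_threshold:
        return True  # spam: the queue is left unchanged
    # Non-spam: the new comment becomes the head; once the deque is full,
    # appendleft() discards the oldest comment from the right (tail) end.
    queue.appendleft(comment)
    return False
```

With `queue = deque(maxlen=1000)`, each accepted comment beyond the 1000th silently evicts the oldest one, which mirrors processing the tail comment according to the second threshold.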
An embodiment of the present invention further provides an information processing device, the device comprising:
an acquisition module, configured to acquire a user comment;
a first judging module, configured to traverse a comment queue and judge whether the number of comments in the comment queue that are identical or similar to the user comment reaches a first threshold, wherein the comment queue is a FIFO queue whose length is bounded by a second threshold;
a determination module, configured to determine the user comment to be a spam comment when the number of comments in the comment queue that are identical or similar to the user comment reaches the first threshold;
a processing module, configured to, when the number of comments in the comment queue that are identical or similar to the user comment does not reach the first threshold, add the user comment to the comment queue and process the comment at the tail of the FIFO queue according to the second threshold.
An embodiment of the present invention further provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor invokes the computer program stored in the memory to execute the information processing method described in any embodiment of the present invention.
Brief Description of the Drawings
The technical solutions and other beneficial effects of the present invention will become apparent from the following detailed description of specific embodiments, taken in conjunction with the accompanying drawings.
FIG. 1 is a schematic flowchart of an information processing method according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of a first usage state of an information processing method according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of a second usage state of an information processing method according to an embodiment of the present invention.
FIG. 4 is another schematic flowchart of an information processing method according to an embodiment of the present invention.
FIG. 5 is a schematic diagram of a third usage state of an information processing method according to an embodiment of the present invention.
FIG. 6 is a schematic diagram of a fourth usage state of an information processing method according to an embodiment of the present invention.
FIG. 7 is a schematic structural diagram of an information processing device according to an embodiment of the present invention.
FIG. 8 is another schematic structural diagram of an information processing device according to an embodiment of the present invention.
FIG. 9 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. It should be understood that the specific embodiments described here only explain the present invention and do not limit it. For ease of description, the drawings show only the structures related to the present invention rather than all structures. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
The terms "first", "second" and "third" in the present invention are used to distinguish between different objects, not to describe a specific order. Furthermore, the terms "include" and "have", and any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that includes a series of steps or modules is not limited to the listed steps or modules, but may optionally include steps or modules that are not listed, or other steps or modules inherent to the process, method, product or device.
Reference herein to an "embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment may be included in at least one embodiment of the present invention. Occurrences of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor do they refer to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
The information processing method provided by the embodiments of the present invention may be executed by the information processing device provided by the embodiments, or by a computer device integrating that device (such as a desktop computer, laptop, palmtop computer, tablet or smartphone). The information processing device may be implemented in hardware or software.
Referring to FIG. 1, which is a schematic flowchart of an information processing method according to an embodiment of the present invention, the method includes:
Step S101: acquire a user comment.
Step S102: traverse the comment queue and judge whether the number of comments in the comment queue that are identical or similar to the user comment reaches a first threshold, where the comment queue is a FIFO queue whose length is bounded by a second threshold. If not, execute step S103; if so, execute step S104.
In some implementations, whether the number of comments in the comment queue that are identical or similar to the user comment reaches the first threshold is judged by checking whether the number of historical comments in the queue whose similarity to the user comment reaches a third threshold itself reaches the first threshold. If that count does not reach the first threshold, the number of identical or similar comments is deemed not to have reached the first threshold, and step S103 is executed. If the count does reach the first threshold, step S104 is executed.
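A hedged sketch of this check: the function below counts historical comments whose similarity to the new comment meets the third threshold and stops traversing as soon as the count reaches the first threshold. The early exit is an optimization assumed here, not something the patent specifies; the similarity function is passed in as a parameter because the patent does not fix one, and all names are hypothetical.

```python
from typing import Callable, Iterable


def reaches_first_threshold(comment: str,
                            history: Iterable[str],
                            similarity: Callable[[str, str], float],
                            first_threshold: int,
                            third_threshold: float) -> bool:
    """True if at least `first_threshold` historical comments are similar enough."""
    if first_threshold <= 0:
        return True  # a count of zero already reaches a non-positive threshold
    count = 0
    for old in history:
        if similarity(comment, old) >= third_threshold:
            count += 1
            if count >= first_threshold:
                return True  # no need to traverse the rest of the queue (go to S104)
    return False  # threshold not reached (go to S103)
```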
Step S103: add the user comment to the comment queue, and process the comment at the tail of the comment queue according to the second threshold.
In some implementations, the user comment is added to the comment queue as the head comment, and any tail comments that overflow the second threshold are deleted.
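Step S103 in this implementation might look like the following: the new comment is pushed onto the head, and comments beyond the second threshold are removed from the tail. This is a sketch assuming an unbounded `collections.deque`, so that the overflow deletion is explicit rather than left to `maxlen`; `enqueue_comment` is a hypothetical name.

```python
from collections import deque


def enqueue_comment(queue: deque, comment: str, second_threshold: int) -> list:
    """Add `comment` as the head and delete tail comments that overflow the limit.

    Returns the deleted comments, oldest first.
    """
    queue.appendleft(comment)  # newest comment at the head (left end)
    evicted = []
    while len(queue) > second_threshold:
        evicted.append(queue.pop())  # pop() removes the oldest comment at the tail (right end)
    return evicted
```

Called with `second_threshold=1000` on a full queue, this deletes exactly one comment, the earliest one, as in the FIG. 3 scenario.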
Step S104: determine the user comment to be a spam comment.
To further clarify the technical solution of the present invention, refer to FIG. 2 and FIG. 3. FIG. 2 is a schematic diagram of a first usage state of an information processing method according to an embodiment of the present invention, and FIG. 3 is a schematic diagram of a second usage state.
As shown in FIG. 2, on a forum, the user "会飞的老虎" (Flying Tiger) has published an article titled "Sensors in Mobile Phones", and the user "码农" (Code Farmer) submits the comment "争取早日再来个雾霾传感器" ("Hoping they add a smog sensor soon"). The forum server traverses the comment queue and judges whether the number of comments in it that are identical or similar to the user comment reaches the first threshold. If so, the user comment is determined to be spam; if not, it is determined to be non-spam and added to the comment queue, and the comment at the tail of the FIFO queue is processed according to the second threshold.
For example, the comment queue is a FIFO queue with a maximum length of 1000 comments.
As shown in FIG. 3, when the user comment is determined to be non-spam, the comment queue is updated: the user comment "争取早日再来个雾霾传感器" is added to the queue as the head comment displayed in the comment area, and the tail comment "求甲醛传感器。" ("Please add a formaldehyde sensor."), which is the earliest comment and overflows the 1000-comment limit, is deleted.
In the embodiments of the present invention, a user comment is acquired and the comment queue is traversed to judge whether the number of comments in the queue that are identical or similar to the user comment reaches the first threshold, where the comment queue is a FIFO queue whose length is bounded by the second threshold. If so, the user comment is determined to be spam; if not, it is added to the comment queue and the comment at the tail of the FIFO queue is processed according to the second threshold. Spam comments can thus be identified effectively, and when a user comment is identified as non-spam, only the comment queue needs to be updated rather than all content in the database being processed. This reduces the system's workload and improves information processing efficiency.
In some implementations, after acquiring the user comment, the method further includes:
judging whether the comment information in the user comment exists in a blacklist library, and if so, determining the user comment to be a spam comment.
In some implementations, judging whether the comment information in the user comment exists in the blacklist library includes:
judging whether the user comment contains information that matches feature information in the blacklist library, and if so, determining the user comment to be a spam comment.
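One way to read this matching step, sketched under assumptions: identity fields (user name, user ID) are compared for exact equality, and the comment text is scanned for any feature string as a substring. The `Comment` type and the split between exact and substring matching are hypothetical; the patent only requires that the comment contain information matching the blacklist's feature information.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    user_name: str
    user_id: str
    content: str


def matches_blacklist(comment: Comment, features: set) -> bool:
    """True if any blacklist feature matches the comment information."""
    # Exact match on identity fields.
    if comment.user_name in features or comment.user_id in features:
        return True
    # Substring match on the comment text (empty feature strings are ignored).
    return any(f and f in comment.content for f in features)
```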
In some implementations, when the user comment is determined to be spam, it is checked for contact information, and any contact information found is added to the blacklist library as feature information.
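A minimal sketch of this feedback step, assuming contact information is recognized as a run of 8 or more ASCII letters and digits, following this embodiment's description of contact information as a letter/digit combination longer than 7 bytes. The regular expression and the function name `learn_contacts` are illustrative assumptions.

```python
import re

# Assumption: contact info = 8 or more consecutive ASCII letters/digits.
CONTACT_PATTERN = re.compile(r"[A-Za-z0-9]{8,}")


def learn_contacts(spam_text: str, blacklist: set) -> list:
    """Extract contact-like strings from a spam comment and add them to the blacklist."""
    found = CONTACT_PATTERN.findall(spam_text)
    blacklist.update(found)
    return found
```

Such a loose pattern would also catch long English words or URL fragments, so a production rule would likely need to be stricter.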
In some implementations, comments similar to the user comment include historical comments whose similarity to the user comment reaches a third threshold.
All of the optional technical solutions above may be combined in any way to form optional embodiments of the present invention, which are not described one by one here.
Referring to FIG. 4, which is another schematic flowchart of an information processing method according to an embodiment of the present invention, the method includes:
Step S201: acquire a user comment.
For example, as shown in FIG. 2, on a forum, the user "会飞的老虎" (Flying Tiger) has published an article titled "Sensors in Mobile Phones", and the user "码农" (Code Farmer) submits the comment "争取早日再来个雾霾传感器". The forum server obtains this user comment from the backend.
Step S202: judge whether the comment information in the user comment exists in the blacklist library. If not, execute step S203; if so, execute step S205.
The comment information in a user comment may include the user name, user ID, comment content, comment posting time, and so on.
In some implementations, it is judged whether the user comment contains information that matches feature information in the blacklist library. If so, step S205 is executed; if not, step S203 is executed.
Many public platforms now support interaction between users. Such platforms may take the form of e-commerce platforms, forums, communities, websites, microblogs, post bars (Tieba), blogs, app download platforms, and so on. For example, after registering on a website and passing authentication, a user holds user identity information on that website and becomes its user. The user can then act on the website, for example by publishing articles or products, posting microblogs or threads, and replying to comments, and can also comment on or like information posted by others. In these comment areas, some users may post large numbers of spam comments with identical or similar content, such as advertisements, sales pitches, and comments containing reactionary, violent or pornographic content, hyperlinks, abuse or defamation.
A blacklist library containing multiple pieces of feature information can be set up in advance.
In some implementations, the feature information includes any one or more of user names, user IDs, contact information, keywords, and homophones of keywords.
The contact information may take the form of a combination of letters and digits longer than 7 bytes, such as a landline number, mobile phone number, WeChat ID or QQ number.
For example, the keywords may include hyperlinks and advertising words, prohibited words, special symbols, and so on.
For example, a user comment may contain hyperlinks and advertising words, such as product promotion, store or website recommendations, corporate publicity or business promotion. A hyperlink generally appears as a URL made up of several consecutive English letters, such as http://..., so the string "http" can be set as a keyword and the user comment scanned for it to detect hyperlinks. If the comment contains a hyperlink, it may be spam, and it is further checked for advertising words. For advertising words, terms such as QQ, 特价 (special price), 热卖 (hot sale), 淘宝 (Taobao) and 包邮 (free shipping) can be set as keywords, and any number combined with "元" (yuan) can also be set as feature information. When the user comment contains such keywords, the comment information is determined to exist in the blacklist library, and step S205 is executed.
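The two-stage check in this example can be sketched as follows, reading it as: look for the keyword "http" first and, only if a hyperlink is present, look for advertising words or a number followed by "元". The keyword list comes from the example above; the function name `is_link_advert` and the price regular expression are assumptions.

```python
import re

AD_WORDS = ("QQ", "特价", "热卖", "淘宝", "包邮")  # advertising keywords from the example
PRICE_PATTERN = re.compile(r"\d+(?:\.\d+)?元")   # any number followed by 元 (yuan)


def is_link_advert(content: str) -> bool:
    """True if the comment has a hyperlink and also contains advertising features."""
    if "http" not in content.lower():
        return False  # stage 1: no hyperlink keyword found
    # Stage 2: a hyperlink is present, so look for advertising words or prices.
    return any(w in content for w in AD_WORDS) or PRICE_PATTERN.search(content) is not None
```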
For example, prohibited words may be words that constitute personal attacks.
For example, when submitting comments, some users insert special symbols into keywords or between the characters of the comment text to evade the platform's spam detection. Special symbols such as "★", "*", "#" and "&" can therefore be set as keywords and stored in the blacklist library as feature information.
For example, users may replace the original keywords with homophones or near-homophones to evade the platform's spam detection, as in "捕鱼达人3逋鱼提线迦魏新a5a7a9课提线", where, for instance, "提线" stands in for "提现" (cash withdrawal) and "魏新" for "微信" (WeChat). For such cases, homophones of keywords can be set as feature information and stored in the blacklist library.
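Storing homophones as feature information can be sketched as a table mapping each keyword to its known variants, where a comment hits a keyword if it contains any variant. The variant table below is hypothetical, built only from the two substitutions visible in the example (提线 for 提现, 魏新 for 微信); a real blacklist library would be curated from observed spam.

```python
# Hypothetical blacklist entries: keyword -> known homophone/near-homophone variants.
KEYWORD_VARIANTS = {
    "提现": {"提现", "提线"},  # cash withdrawal
    "微信": {"微信", "魏新"},  # WeChat
}


def matched_keywords(text: str) -> set:
    """Return the keywords whose original form or any stored variant occurs in `text`."""
    return {kw for kw, variants in KEYWORD_VARIANTS.items()
            if any(v in text for v in variants)}
```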
For example, if a user submits the comment "代开发票,加Q(22222211)" ("Invoices issued on your behalf, add QQ 22222211") on a forum, it is detected that the comment contains information matching contact information in the blacklist library, and step S205 is executed. If instead a user submits "深度好文,值得学习。" ("An insightful article, well worth studying."), it is detected that the comment contains no information matching feature information in the blacklist library, and step S203 is executed.
As shown in FIG. 2, on a forum, the user "会飞的老虎" (Flying Tiger) has published an article titled "Sensors in Mobile Phones", and the user "码农" (Code Farmer) submits the comment "争取早日再来个雾霾传感器". When it is judged that the comment contains no information matching feature information in the blacklist library, step S203 is executed.
Referring to FIG. 5, which is a schematic diagram of a third usage state of an information processing method according to an embodiment of the present invention:
On a forum, "会飞的老虎" (Flying Tiger) has published an article titled "Sensors in Mobile Phones", and a user submits the comment "捕鱼达人3逋鱼提线迦魏新a5a7a9课提线". When it is judged that the comment contains information matching feature information in the blacklist library, step S205 is executed.
In some implementations, a whitelist library may also be set up, and it is judged whether the comment information in the user comment exists in the whitelist library. If so, the user comment may be determined to be non-spam; if not, it may be determined to be spam.
For example, for user reviews of a product, reviews related to the product are usually classed as useful information, so whether a review is spam can be determined by screening for words associated with the product description, such as subject words or sentiment words. Taking a product published on an e-commerce platform as an example, the subject words may be core nouns related to the product, and subject words from the product's standard description can be stored in the whitelist library in advance. If a review submitted for the product contains none of the subject words in the standard description, the review can be determined to be spam; if it contains any one or more of them, the review can be determined to be non-spam.
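A sketch of the subject-word rule, assuming subject words are matched as plain substrings of the review. The subject-word set is made up for a hypothetical phone product; the patent only says the words come from the product's standard description.

```python
# Hypothetical subject words taken from a phone's standard description.
PHONE_SUBJECT_WORDS = {"屏幕", "电池", "传感器", "摄像头"}


def is_spam_by_subject_words(review: str, subject_words: set) -> bool:
    """Spam if the review mentions none of the product's subject words."""
    return not any(word in review for word in subject_words)
```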
For example, sentiment words are words through which users genuinely express their subjective views, attitudes, feelings and emotions. Taking reviews of products sold on a website as an example, product reviews are people's evaluations and discussions of product parameters and of their purchase experience, through which they genuinely express their subjective views, attitudes, feelings and emotions. Product reviews therefore contain the reviewer's sentiment, and the fewer sentiment words a review contains, the more likely it is to be spam.
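The sentiment-word heuristic could be scored as below: count occurrences of words from a sentiment lexicon and treat reviews below a minimum count as likely spam. The lexicon and the `min_count` cutoff are hypothetical; the patent states only the direction of the relationship (fewer sentiment words, more likely spam).

```python
# Hypothetical sentiment lexicon; a real system would use a curated dictionary.
SENTIMENT_WORDS = ("喜欢", "满意", "失望", "好用", "不错")


def sentiment_word_count(review: str) -> int:
    """Total non-overlapping occurrences of lexicon words in the review."""
    return sum(review.count(word) for word in SENTIMENT_WORDS)


def likely_spam_by_sentiment(review: str, min_count: int = 1) -> bool:
    """Flag reviews with fewer than `min_count` sentiment words."""
    return sentiment_word_count(review) < min_count
```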
Step S203: traverse the comment queue and judge whether the number of comments in it that are identical or similar to the user comment reaches the first threshold. If not, execute step S204; if so, execute step S205.
The number of comments in the comment queue that are identical or similar to the user comment can be determined by detecting whether the queue contains historical comments identical or similar to the user comment. Even when the comment information in a user comment does not exist in the blacklist library, the comment queue may still contain many historical comments with identical or similar content. Once the number of such comments reaches a certain threshold, they also hinder users from obtaining useful information; in effect, such repetitive user comments can also be classed as spam. To identify spam more accurately, it is therefore further detected whether the comment queue contains historical comments identical or similar to the user comment, and judged whether their number reaches the first threshold. Here, the comment queue is a FIFO queue composed of historical comments.
In some implementations, comments similar to the user comment include historical comments whose similarity to the user comment reaches a third threshold. Whether the number of comments in the queue that are identical or similar to the user comment reaches the first threshold can then be determined by judging whether the number of historical comments whose similarity to the user comment reaches the third threshold reaches the first threshold.
For example, the similarity can be determined by comparing how well the information in the user comment matches the information in a historical comment in the queue. With the third threshold set to 80%, a user comment whose match with a historical comment reaches 90% is determined to be similar to it, and one whose match reaches 100% is determined to be identical.
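The patent does not define the "matching degree". As an assumption, the sketch below uses the standard-library `difflib.SequenceMatcher.ratio()` as a stand-in and applies the 80% third threshold from the example: a ratio of 1.0 counts as identical and a ratio of at least 0.8 as similar. `classify_pair` is a hypothetical name.

```python
from difflib import SequenceMatcher


def classify_pair(new: str, old: str, third_threshold: float = 0.8) -> str:
    """Label a historical comment as 'same', 'similar' or 'different' relative to the new one."""
    ratio = SequenceMatcher(None, new, old).ratio()  # 2*M/T, in [0, 1]
    if ratio == 1.0:
        return "same"
    if ratio >= third_threshold:
        return "similar"
    return "different"
```

For example, "争取早日再来个雾霾传感器" and "争取早日再来个甲醛传感器" share 10 of their 12 characters in matching blocks, giving a ratio of 20/24 (about 0.83), so this measure would label them similar.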
In some implementations, the comment queue may be either a linked queue or an array queue.
In programming languages, a queue is a linear list whose data elements are called queue elements. Inserting an element into the queue is called enqueuing, and removing one is called dequeuing. Because a queue allows insertion only at one end and deletion only at the other, the element that entered first is the first to be removed, so a queue is also called a first-in-first-out (FIFO) linear list. The comment queue can therefore be called a FIFO queue.
For example, a queue can be stored in an array Q[1…m], where the array's upper bound m is the maximum capacity of the queue. Queue operations use two pointers: head, the head pointer, which points to the actual head element, and tail, the tail pointer, which points to the position after the actual tail element. Both pointers are normally initialized to 0, at which point the queue is empty. When the number of elements reaches the upper bound m and a new element is enqueued, the element that entered the queue earliest is deleted from it.
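The array-based storage described above can be sketched as a circular buffer. This illustrative Python version makes several assumptions: indices are 0-based rather than Q[1…m], the indices wrap modulo m so the array is reused, and a full queue evicts its earliest element on enqueue as the paragraph describes. Here "head" is the dequeue end holding the oldest element, following the textbook convention in this paragraph. `ArrayQueue` is a hypothetical name.

```python
class ArrayQueue:
    """Fixed-capacity FIFO queue stored in an array (Python list) as a ring buffer."""

    def __init__(self, m: int):
        if m <= 0:
            raise ValueError("capacity m must be positive")
        self.q = [None] * m
        self.m = m
        self.head = 0  # index of the actual head (oldest) element
        self.tail = 0  # index one past the actual tail (newest) element
        self.size = 0  # head == tail is ambiguous (empty or full), so track the size

    def enqueue(self, item):
        """Insert at the tail; if the queue is full, delete and return the oldest element."""
        evicted = None
        if self.size == self.m:
            evicted = self.q[self.head]
            self.head = (self.head + 1) % self.m
            self.size -= 1
        self.q[self.tail] = item
        self.tail = (self.tail + 1) % self.m
        self.size += 1
        return evicted

    def items(self) -> list:
        """Elements from head (oldest) to tail (newest)."""
        return [self.q[(self.head + i) % self.m] for i in range(self.size)]
```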
Alternatively, a queue can be stored in a linked list, where the logical order of adjacent data elements is expressed by pointers to the elements' storage addresses. The resulting linked queue allows storage to be allocated dynamically.
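For comparison, a linked queue can be sketched as follows, assuming Python object references play the role of the storage-address pointers; the class and method names are hypothetical. Nodes are created only when an element is enqueued, so no storage is reserved in advance.

```python
from typing import Any, Optional


class _Node:
    __slots__ = ("value", "next")

    def __init__(self, value: Any):
        self.value = value
        self.next: Optional["_Node"] = None  # pointer to the next node


class LinkedQueue:
    """FIFO queue whose nodes are allocated on demand and linked by references."""

    def __init__(self):
        self.head: Optional[_Node] = None  # oldest node
        self.tail: Optional[_Node] = None  # newest node
        self.length = 0

    def enqueue(self, value: Any) -> None:
        node = _Node(value)
        if self.tail is None:          # empty queue
            self.head = self.tail = node
        else:
            self.tail.next = node      # link after the current tail
            self.tail = node
        self.length += 1

    def dequeue(self) -> Any:
        if self.head is None:
            raise IndexError("dequeue from empty queue")
        node = self.head
        self.head = node.next
        if self.head is None:          # queue became empty
            self.tail = None
        self.length -= 1
        return node.value

    def __iter__(self):
        node = self.head
        while node is not None:
            yield node.value
            node = node.next
```

A caller enforcing the second threshold would compare `length` with it and call `dequeue()` before enqueuing once the limit is reached.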
For example, if the comment queue is an array queue, the second threshold on its length is the maximum capacity of the array queue, e.g., 1000 user comments.
For example, when the comment queue is found to contain a historical comment identical to the user comment, the update of the comment queue may be refused, so that comments with repeated content do not appear in the queue multiple times and hinder users from finding information. Instead, 1 is added to the like count recorded for that identical historical comment, indicating either that someone else has posted the same or similar content or that someone else agrees with the historical comment.
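A minimal sketch of this duplicate-handling branch, assuming each historical comment carries its own integer like count (the "like array" of the original wording is read here as a per-comment counter). `HistoryComment` and `merge_duplicate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class HistoryComment:
    content: str
    likes: int = 0  # assumed per-comment like counter


def merge_duplicate(queue: list, user_comment: str) -> bool:
    """If an identical historical comment exists, add 1 to its like count.

    Returns True when a duplicate was found (the queue itself is not updated);
    False when the caller should go on to the threshold check and enqueue.
    """
    for item in queue:
        if item.content == user_comment:
            item.likes += 1
            return True
    return False
```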
As shown in FIG. 2, suppose the first threshold is 5 and the comment queue contains 1 comment identical to the user comment "Hope we get a haze sensor soon too". The number of comments in the comment queue that are the same as or similar to the user comment is then determined not to have reached the first threshold, and step S204 is performed.
As shown in FIG. 5, suppose the first threshold is 5 and the comment queue contains 7 comments identical to the user comment "捕鱼达人3逋鱼提线迦魏新a5a7a9课提线" (a homophone-disguised advertisement). The number of same or similar comments in the comment queue is then determined to have reached the first threshold, and step S205 is performed.
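The branch between steps S204 and S205 reduces to a count compared against the first threshold. The sketch below assumes the `difflib.SequenceMatcher` ratio as the similarity measure and the example values from FIG. 2 and FIG. 5 (first threshold 5, third threshold 80%); `count_same_or_similar` and `decide` are hypothetical names.

```python
from difflib import SequenceMatcher
from typing import Iterable


def count_same_or_similar(user_comment: str, history: Iterable[str],
                          third_threshold: float = 0.8) -> int:
    """Traverse the queue and count comments whose similarity reaches the third threshold."""
    return sum(1 for h in history
               if SequenceMatcher(None, user_comment, h).ratio() >= third_threshold)


def decide(user_comment: str, history: Iterable[str],
           first_threshold: int = 5, third_threshold: float = 0.8) -> str:
    """Return 'S205' (spam) or 'S204' (enqueue), following the branch in the text."""
    n = count_same_or_similar(user_comment, history, third_threshold)
    return "S205" if n >= first_threshold else "S204"
```

For the FIG. 2 case one identical comment gives a count of 1, below 5, so the comment is enqueued; for FIG. 5 seven copies give a count of 7 and the comment is treated as spam.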
Step S204: add the user comment to the comment queue as the head comment, and delete the tail comment that overflows the second threshold.
It can be understood that the length of the FIFO queue may be preset to the second threshold. For an array queue, the length can be expressed as the total number of data packets the array can hold. The array must be given a fixed size before it is created: each queue element is allotted a suitable byte length to meet the needs of a single element, so each queue element can be regarded as a fixed-size data packet. For example, if the array is N[1…1000], the second threshold is 1000. For a linked queue, the length can instead be expressed as the number of pointers to storage units. A linked list does not need fixed-size storage allocated in advance; whenever data needs to be stored, a suitable storage unit is allocated for the new queue element and linked to the other storage units in the queue through pointers. The content of the comment queue changes in real time. For example, when a new user comment enters the queue for the displayed comment area, the user comment is added to the comment queue as the head comment, the historical comment at the tail leaves the queue, and the queue number of every other historical comment increases by 1.
As shown in FIG. 3, when the number of comments in the comment queue that are the same as or similar to the user comment is below the first threshold, the comment queue is updated: the user comment "Hope we get a haze sensor soon too" is added at the head of the comment queue as No. 1, and the historical comment "Please add a formaldehyde sensor" at the tail, No. 1000, is deleted. The historical comment "Great article! Liked", previously No. 1, becomes No. 2 and is shown in the No. 2 display slot, and every other historical comment moves back by one display slot.
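Step S204 and the renumbering in FIG. 3 map naturally onto Python's `collections.deque` with `maxlen` set to the second threshold: `appendleft` places the new comment at position No. 1 and, when the deque is full, discards the item at the opposite (tail) end. This is a sketch under that assumption; `push_head` and `numbered` are hypothetical helpers, and display numbers are simply index + 1.

```python
from collections import deque
from typing import Optional

SECOND_THRESHOLD = 1000  # example queue length from the text


def push_head(queue: deque, comment: str) -> Optional[str]:
    """Insert comment as No. 1 and return the tail comment evicted, if any."""
    full = queue.maxlen is not None and len(queue) == queue.maxlen
    evicted = queue[-1] if full and queue else None
    queue.appendleft(comment)  # a full deque drops its rightmost (tail) item
    return evicted


def numbered(queue: deque) -> list:
    """Display labels: position i in the deque is shown as No. i+1."""
    return [f"No.{i + 1} {c}" for i, c in enumerate(queue)]


if __name__ == "__main__":
    # maxlen=3 instead of 1000 for brevity; "comment B" is a placeholder.
    q = deque(["好文章!点赞", "comment B", "求甲醛传感器"], maxlen=3)
    print(push_head(q, "争取早日再来个雾霾传感器"))  # 求甲醛传感器
    print(numbered(q))
```

Because every position shifts implicitly, no explicit renumbering loop is needed; the "+1 to every queue number" in the text falls out of the index arithmetic.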
Step S205: determine the user comment to be a spam comment.
It can be understood that, once the user comment is determined to be spam, the update of the comment queue may be refused.
Refer to FIG. 6, which is a schematic diagram of a fourth usage state of an information processing method according to an embodiment of the present invention.
When the user comment "捕鱼达人3逋鱼提线迦魏新a5a7a9课提线" is determined to be spam, the update of the comment queue is refused.
In some embodiments, when the update of the comment queue is refused, a prompt box may also pop up to inform the user that the comment failed to publish. As shown in FIG. 6, after the user taps the "Comment" button, a prompt box reading "Comment review failed: spam comment!" pops up, the update of the comment queue is refused, and the comment area shown on the phone screen remains unchanged.
Step S206: detect whether the user comment contains contact information, and if so, add the contact information to the blacklist database as feature information.
In some embodiments, when the user comment is found to contain contact information that is new, the new contact information extracted from the user comment is added to the blacklist database as feature information. When the contact information is already known, it may either overwrite the existing entry in the blacklist database or simply not be added.
It can be understood that when new contact information is detected in a user comment, it is extracted and added to the blacklist database as feature information, serving as a basis for checking subsequent user comments.
As shown in FIG. 6, when the user comment "捕鱼达人3逋鱼提线迦魏新a5a7a9课提线" is determined to be spam, the new contact information "a5a7a9" is extracted from it and added to the blacklist database as feature information.
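A sketch of step S206, assuming contact information is recognized as a run of ASCII letters and digits that contains at least one digit. Following the "longer than 7 bytes" format rule given for contact information, the default minimum length is 8; the FIG. 6 example "a5a7a9" is only 6 characters long, so the usage lowers `min_len` to 6 to reproduce it. The function names and the regular expression are assumptions, not the patent's method.

```python
import re

_RUN = re.compile(r"[A-Za-z0-9]+")  # ASCII letters/digits only


def extract_contacts(text: str, min_len: int = 8) -> list:
    """Return letter/digit runs of at least min_len characters that contain a digit."""
    return [run for run in _RUN.findall(text)
            if len(run) >= min_len and any(ch.isdigit() for ch in run)]


def update_blacklist(blacklist: set, text: str, min_len: int = 8) -> list:
    """Add contacts not yet in the blacklist; return only the newly added ones."""
    new = [c for c in dict.fromkeys(extract_contacts(text, min_len))
           if c not in blacklist]
    blacklist.update(new)  # already-known contacts are left as they are
    return new


if __name__ == "__main__":
    blacklist = set()
    spam = "捕鱼达人3逋鱼提线迦魏新a5a7a9课提线"
    print(update_blacklist(blacklist, spam, min_len=6))  # ['a5a7a9']
    print(update_blacklist(blacklist, spam, min_len=6))  # [] (already known)
```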
The embodiment of the present invention determines whether a user comment is spam by detecting whether it contains information matching the feature information in the blacklist database. When the user comment is not spam, the comment queue is traversed, and if the number of comments in the queue that are the same as or similar to the user comment has not reached the first threshold, the user comment is added to the comment queue as the head comment and the tail comment overflowing the second threshold is deleted. In other words, after a user-submitted comment has been identified as non-spam, the embodiment further checks for historical comments with duplicate content. This keeps large numbers of duplicate comments out of the queue, reduces the load on the system, improves information-processing efficiency, and makes it easier for users to find useful information.
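Putting the steps together, a minimal end-to-end sketch of the flow summarized above, assuming the blacklist is a set of substrings, similarity is the `difflib.SequenceMatcher` ratio, and the comment queue is a `collections.deque` whose `maxlen` is the second threshold. `process_comment` is a hypothetical name, and the like-count and contact-extraction refinements are omitted for brevity.

```python
from collections import deque
from difflib import SequenceMatcher


def process_comment(comment: str, blacklist: set, queue: deque,
                    first_threshold: int = 5,
                    third_threshold: float = 0.8) -> str:
    """Return 'spam' or 'accepted'; queue.maxlen plays the second threshold."""
    # Blacklist check: any stored feature string occurring in the comment.
    if any(feature in comment for feature in blacklist):
        return "spam"
    # Traverse the queue and count same or similar historical comments.
    similar = sum(1 for old in queue
                  if SequenceMatcher(None, comment, old).ratio() >= third_threshold)
    if similar >= first_threshold:
        return "spam"              # step S205: queue is not updated
    queue.appendleft(comment)      # step S204: new head, tail dropped when full
    return "accepted"
```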
An embodiment of the present invention further provides an information processing device, as shown in FIG. 7, which is a schematic structural diagram of an information processing device according to an embodiment of the present invention. The information processing device 30 includes an acquisition module 31, a first judgment module 33, a processing module 34, and a determination module 35.
The acquisition module 31 is configured to acquire user comments.
The first judgment module 33 is configured to traverse the comment queue and judge whether the number of comments in it that are the same as or similar to the user comment reaches the first threshold, where the comment queue is a FIFO queue whose length is limited by the second threshold.
The determination module 35 is configured to determine the user comment to be spam when the number of comments in the comment queue that are the same as or similar to the user comment is judged to have reached the first threshold.
The processing module 34 is configured to add the user comment to the comment queue when the number of same or similar comments in the comment queue is judged not to have reached the first threshold, and to process the tail comment of the comment queue according to the second threshold.
Refer to FIG. 8, which is another schematic structural diagram of an information processing device according to an embodiment of the present invention. The information processing device 30 includes an acquisition module 31, a second judgment module 32, a first judgment module 33, a processing module 34, a determination module 35, and a detection module 36.
The acquisition module 31 is configured to acquire user comments.
For example, as shown in FIG. 2, on a forum the user "会飞的老虎" ("Flying Tiger") has published an article titled "Sensors in Mobile Phones", and the user "码农" ("Code Farmer") submits the comment "Hope we get a haze sensor soon too"; the acquisition module 31 acquires this user comment.
The second judgment module 32 is configured to judge whether the comment information in the user comment exists in the blacklist database, and if so, to determine the user comment to be spam.
It can be understood that the comment information in a user comment may include the user name, user ID, comment content, comment publication time, and similar information.
In some embodiments, the second judgment module 32 is further configured to judge whether the user comment contains information matching the feature information in the blacklist database, and if so, to determine the user comment to be spam.
Many public platforms now support interaction among users; such platforms include e-commerce platforms, forums, communities, websites, microblogs, post bars (Tieba), blogs, and app download platforms. For example, once a user registers on a website and passes authentication, the user holds identity information on that website and becomes its user. The user can then act on the website, e.g., publish articles or products, post microblog updates or forum threads, and reply to comments, and can also comment on or like information posted by others. Among such comments, some users may post large numbers of spam comments with identical or similar content, such as advertising or sales-promotion comments, or comments with harmful content such as reactionary, violent, or pornographic material, hyperlinks, abuse, or defamation.
It can be understood that a blacklist database containing multiple pieces of feature information may be set up in advance.
In some embodiments, the feature information includes any one or more of user names, user IDs, contact information, keywords, and homophones of keywords.
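One way to hold the feature types listed above is sketched below, assuming exact matching for user names and user IDs and substring matching for contact information and keywords (homophones would simply be extra keywords). The `Comment` and `Blacklist` classes are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Comment:
    user_name: str
    user_id: str
    content: str
    published_at: str = ""  # publication time; unused by the matcher


@dataclass
class Blacklist:
    user_names: set = field(default_factory=set)
    user_ids: set = field(default_factory=set)
    contacts: set = field(default_factory=set)
    keywords: set = field(default_factory=set)  # homophones can be added here

    def matches(self, c: Comment) -> bool:
        """True if any stored feature information matches the comment."""
        if c.user_name in self.user_names or c.user_id in self.user_ids:
            return True
        return any(f in c.content for f in self.contacts | self.keywords)
```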
It can be understood that contact information may take the form of a combination of letters and digits longer than 7 bytes, such as a telephone number, mobile phone number, WeChat ID, or QQ number.
For example, the keywords may include hyperlinks, advertising words, prohibited words, and special symbols.
For example, a user-submitted comment may contain hyperlinks and advertising words, such as product promotions, store or website recommendations, company publicity, or business promotion. A hyperlink usually appears as a web address beginning with a run of consecutive English letters, such as http://... . Setting the string "http" as a keyword allows hyperlinks to be detected by scanning user comments for that keyword; a comment containing a hyperlink is considered possibly spam and is further checked for advertising words. For advertising words, terms such as QQ, 特价 (special offer), 热卖 (hot sale), 淘宝 (Taobao), and 包邮 (free shipping) can be set as keywords, and any number combined with "元" (yuan) can also be set as feature information. When a user comment contains such keywords, the second judgment module 32 determines that its comment information exists in the blacklist database and determines the user comment to be spam.
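A sketch of the keyword scan described above, assuming plain substring search for "http" and the listed advertising words, plus a regular expression for a number followed by 元. The constants and the `ad_features` name are illustrative; the function only reports what it finds, in scan order, and leaves the spam decision to the caller.

```python
import re

LINK_KEYWORD = "http"
# QQ, special offer, hot sale, Taobao, free shipping
AD_WORDS = ("QQ", "特价", "热卖", "淘宝", "包邮")
PRICE = re.compile(r"\d+元")  # any number followed by 元 (yuan)


def ad_features(text: str) -> list:
    """List matched features; an empty list means no blacklist hit."""
    found = []
    if LINK_KEYWORD in text.lower():
        found.append("hyperlink")
    found.extend(w for w in AD_WORDS if w in text)
    if PRICE.search(text):
        found.append("price")
    return found
```

A caller following the text would treat a non-empty result as a blacklist match and mark the comment as spam.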
For example, the prohibited words may be words containing personal attacks.
For example, when submitting comments, some users may insert special symbols into keywords or into the text of the comment to evade the platform's spam detection. Special symbols such as "★", "*", "#", and "&" can therefore be set as keywords and stored in the blacklist database as feature information.
For example, users may replace the original keywords with homophones or near-homophones to evade the platform's spam detection, as in "捕鱼达人3逋鱼提线迦魏新a5a7a9课提线", where sound-alike characters stand in for words such as 提现 (cash out) and 加微信 (add on WeChat). To handle such cases, the homophones of keywords can be set as feature information and stored in the blacklist database.
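Special symbols and homophone variants can be stored as ordinary feature strings. The sketch below assumes a small hand-written homophone table; its two entries (提线 for 提现, 迦魏新 for 加微信) are illustrative readings of the FIG. 5 example, not data from the patent, and the function names are hypothetical.

```python
SPECIAL_SYMBOLS = {"★", "*", "#", "&"}

# Hypothetical homophone table: keyword -> sound-alike variants seen in spam.
HOMOPHONES = {
    "提现": {"提线"},      # "cash out"
    "加微信": {"迦魏新"},  # "add on WeChat"
}


def blacklist_features() -> set:
    """Collect symbols, keywords and their homophones into one feature set."""
    features = set(SPECIAL_SYMBOLS)
    for keyword, variants in HOMOPHONES.items():
        features.add(keyword)
        features |= variants
    return features


def matched_features(text: str, features: set) -> set:
    """Return every feature string that occurs in the text."""
    return {f for f in features if f in text}
```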
For example, a user on a forum submits the comment "Invoices issued on request, add QQ (22222211)". When the second judgment module 32 determines that the user comment contains information matching contact information in the blacklist database, the user comment is determined to be spam.
As shown in FIG. 5, on a forum the user "会飞的老虎" ("Flying Tiger") has published an article titled "Sensors in Mobile Phones", and a user submits the comment "捕鱼达人3逋鱼提线迦魏新a5a7a9课提线". When the second judgment module 32 determines that this comment contains information matching the feature information in the blacklist database, the user comment is determined to be spam.
In some embodiments, a whitelist database may also be set up. The second judgment module 32 may then also judge whether the comment information in the user comment exists in the whitelist database; if so, the user comment may be determined to be non-spam, and if not, it may be determined to be spam.
For example, for user reviews of a product, reviews related to the product are usually classed as useful information, so whether a review is spam can be decided by screening for associated words related to the product description, such as subject words or sentiment words. Taking a product listed on an e-commerce platform as an example, the subject words may be core nouns related to the product, and the subject words of the product's standard description can be stored in the whitelist database in advance. If a review submitted for the product contains none of the subject words in the standard description, it may be determined to be spam; if it contains one or more of them, it may be determined to be non-spam.
For example, sentiment words are words through which users genuinely express their subjective views, attitudes, feelings, and emotions. Taking reviews of products sold on a website as an example, product reviews are people's evaluations and discussions of product parameters and of the purchase experience, through which they express their genuine subjective views, attitudes, feelings, and emotions. A product review therefore necessarily contains the reviewer's sentiment, and the fewer sentiment words a review contains, the more likely it is to be spam.
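The two whitelist signals above can be sketched as follows, assuming the subject words and sentiment words are supplied as plain word lists for the product in question. The spam rule follows the subject-word test in the text; the sentiment-word count is only returned as a signal because the text gives no threshold for it. All names and word lists here are hypothetical.

```python
from typing import Iterable, NamedTuple


class WhitelistResult(NamedTuple):
    has_subject_word: bool
    sentiment_word_count: int

    @property
    def is_spam(self) -> bool:
        # Rule from the text: no subject word from the product description -> spam.
        return not self.has_subject_word


def whitelist_check(text: str, subject_words: Iterable[str],
                    sentiment_words: Iterable[str]) -> WhitelistResult:
    """Check a review against the product's subject words and sentiment words."""
    has_subject = any(w in text for w in subject_words)
    # Fewer sentiment words suggests a higher spam likelihood.
    sentiment_count = sum(text.count(w) for w in sentiment_words)
    return WhitelistResult(has_subject, sentiment_count)
```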
The first judgment module 33 is configured to traverse the comment queue and judge whether the number of comments in it that are the same as or similar to the user comment reaches the first threshold, where the comment queue is a FIFO queue whose length is limited by the second threshold.
It can be understood that the number of comments in the comment queue that are the same as or similar to the user comment can be determined by detecting whether the queue contains historical comments that are the same as or similar to the user comment. For example, even when the comment information in the user comment is not in the blacklist database, the comment queue may already contain a large number of historical comments with the same or similar content. Once the number of such comments reaches a certain threshold, they also hinder users from obtaining useful information; in effect, user comments with repeated content can likewise be classed as spam. To identify spam more accurately, it is therefore further detected whether the comment queue contains historical comments that are the same as or similar to the user comment, and the first judgment module 33 judges whether the number of such comments reaches the first threshold. The comment queue is a FIFO queue composed of historical comments.
In some embodiments, comments similar to the user comment include historical comments whose similarity to the user comment reaches a third threshold. The first judgment module 33 can thus decide whether the number of same or similar comments in the comment queue reaches the first threshold by judging whether the number of historical comments whose similarity to the user comment reaches the third threshold reaches the first threshold.
For example, the degree of similarity can be determined by comparing how closely the information in the user comment matches the information in a historical comment in the comment queue. Suppose the third threshold is 80%. A historical comment whose matching degree with the user comment reaches 90% is determined to be similar, and one whose matching degree reaches 100% is determined to be the same.
In some embodiments, the comment queue may be either a linked (chained) queue or an array queue.
It can be understood that, in programming, a queue is a linear list whose data elements are called queue elements. Inserting an element into the queue is called enqueuing, and removing an element from it is called dequeuing. Because insertion is allowed only at one end and deletion only at the other, the element that entered the queue earliest is the first to be removed; a queue is therefore also called a first-in, first-out (FIFO) linear list. Accordingly, the comment queue can be called a FIFO queue.
For example, a queue can be stored in an array Q[1…m], where the upper bound m of the array is the maximum capacity of the queue. Queue operations use two pointers: head, which points to the actual head element, and tail, which points to the position just after the actual tail element. Normally both pointers are initialized to 0, meaning the queue is empty. Once the number of elements reaches the upper bound m, enqueuing a new element causes the element that entered the queue earliest to be removed.
Alternatively, a queue can be stored in a linked list, where the logical order of adjacent data elements is expressed by pointers to the elements' storage addresses. The resulting linked queue allows storage to be allocated dynamically.
For example, if the comment queue is an array queue, the second threshold on its length is the maximum capacity of the array queue, e.g., 1000 user comments.
For example, when the comment queue is found to contain a historical comment identical to the user comment, the update of the comment queue may be refused, so that comments with repeated content do not appear in the queue multiple times and hinder users from finding information. Instead, 1 is added to the like count recorded for that identical historical comment, indicating either that someone else has posted the same or similar content or that someone else agrees with the historical comment.
As shown in FIG. 2, suppose the first threshold is 5 and the comment queue contains 1 comment identical to the user comment "Hope we get a haze sensor soon too". The first judgment module 33 then determines that the number of comments in the comment queue that are the same as or similar to the user comment has not reached the first threshold.
As shown in FIG. 5, suppose the first threshold is 5 and the comment queue contains 7 comments identical to the user comment "捕鱼达人3逋鱼提线迦魏新a5a7a9课提线". The first judgment module 33 then determines that the number of same or similar comments in the comment queue has reached the first threshold.
The processing module 34 is configured to add the user comment to the comment queue as the head comment and to delete the tail comment that overflows the second threshold.
It can be understood that the length of the FIFO queue may be preset to the second threshold. For an array queue, the length can be expressed as the total number of data packets the array can hold. The array must be given a fixed size before it is created: each queue element is allotted a suitable byte length to meet the needs of a single element, so each queue element can be regarded as a fixed-size data packet. For example, if the array is N[1…1000], the second threshold is 1000. For a linked queue, the length can instead be expressed as the number of pointers to storage units. A linked list does not need fixed-size storage allocated in advance; whenever data needs to be stored, a suitable storage unit is allocated for the new queue element and linked to the other storage units in the queue through pointers. The content of the comment queue changes in real time. For example, when a new user comment enters the queue for the displayed comment area, the processing module 34 adds the user comment to the comment queue as the head comment, the historical comment at the tail leaves the queue, and the queue number of every other historical comment increases by 1.
As shown in FIG. 3, when the number of comments in the comment queue that are the same as or similar to the user comment is below the first threshold, the comment queue is updated: the processing module 34 adds the user comment "Hope we get a haze sensor soon too" at the head of the comment queue as No. 1 and deletes the historical comment "Please add a formaldehyde sensor" at the tail, No. 1000. The historical comment "Great article! Liked", previously No. 1, becomes No. 2 and is shown in the No. 2 display slot, and every other historical comment moves back by one display slot.
The determination module 35 is configured to determine that the user comment is spam when it judges that the number of comments in the comment queue identical or similar to the user comment reaches the first threshold.
It can be understood that, when the determination module 35 determines that the user comment is spam, the processing module 34 may refuse to update the comment queue.
As shown in FIG. 6, when the determination module 35 determines that the user comment "捕鱼达人3逋鱼提线迦魏新a5a7a9课提线" (an advertisement disguised with homophones, reading roughly "Fishing Joy 3: catch fish and cash out, add WeChat a5a7a9, cash-out available") is spam, the processing module 34 refuses to update the comment queue.
In some implementations, when refusing to update the comment queue, the processing module 34 may also pop up a prompt box informing the user that the comment failed to publish. As shown in FIG. 6, after the user taps the "Comment" button, a prompt box reading "Comment review failed: spam comment!" pops up, the comment queue is not updated, and the comment area displayed on the phone screen remains unchanged.
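A minimal sketch of how the spam decision could gate both the queue update and the prompt box. Everything here is hypothetical: `submit_comment`, `SubmitResult` and `FIRST_THRESHOLD = 3` are illustrative (the description does not fix a value), the count of identical or similar comments is assumed to be computed elsewhere, and the prompt text is the English rendering of the FIG. 6 message.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional

FIRST_THRESHOLD = 3  # hypothetical value; the description does not fix one


@dataclass
class SubmitResult:
    published: bool
    prompt: Optional[str]  # text for the pop-up prompt box, if any


def submit_comment(queue: Deque[str], comment: str, similar_count: int) -> SubmitResult:
    """Apply the determination module's verdict (hypothetical API).

    `similar_count` is the number of queued comments judged identical or
    similar to `comment`; computing it is outside this sketch.
    """
    if similar_count >= FIRST_THRESHOLD:
        # Spam: leave the queue untouched so the displayed comment area is unchanged.
        return SubmitResult(published=False, prompt="Comment review failed: spam comment!")
    # Not spam: the comment becomes the head; a deque with maxlen evicts the tail.
    queue.appendleft(comment)
    return SubmitResult(published=True, prompt=None)
```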
The detection module 36 is configured to, when the user comment is determined to be spam, detect whether the user comment contains contact information and, if so, add the contact information to a blacklist database as feature information.
In some implementations, when the detection module 36 detects that the user comment contains contact information and that contact information is new, the new contact information extracted from the user comment is added to the blacklist database as feature information. When the contact information is already known, it may overwrite the existing entry in the blacklist database, or it may simply not be added.
It can be understood that, when new contact information is detected in a user comment, it is extracted and added to the blacklist database as feature information, serving as a basis for checking subsequent user comments.
As shown in FIG. 6, for example, when the user comment "捕鱼达人3逋鱼提线迦魏新a5a7a9课提线" is determined to be spam, the new contact information "a5a7a9" is extracted from it and added to the blacklist database as feature information.
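The description does not say how contact information is recognised. The sketch below assumes a simple heuristic, labelled as such in the code: any run of at least five ASCII letters or digits that contains a digit is treated as a contact (for example a WeChat ID such as "a5a7a9", a QQ number or a phone number). `extract_contacts` and `add_contacts_to_blacklist` are hypothetical names, and a Python `set` stands in for the blacklist database.

```python
import re
from typing import List, Set

# Hypothetical heuristic: a "contact" is a run of >= 5 ASCII letters/digits
# that contains at least one digit. Not taken from the patent.
_CANDIDATE = re.compile(r"[A-Za-z0-9]{5,}")


def extract_contacts(comment: str) -> List[str]:
    """Return candidate contact strings found in `comment`, in order."""
    return [tok for tok in _CANDIDATE.findall(comment) if any(ch.isdigit() for ch in tok)]


def add_contacts_to_blacklist(comment: str, blacklist: Set[str]) -> List[str]:
    """Add contacts found in a spam comment; return only the newly added ones.

    Contacts already in the set are not added again, one of the two options
    the description allows for known contact information.
    """
    # dict.fromkeys removes repeats within one comment while keeping order.
    new = [c for c in dict.fromkeys(extract_contacts(comment)) if c not in blacklist]
    blacklist.update(new)
    return new
```

A production detector would likely need more care: this pattern misses contacts written with full-width digits or Chinese numerals, and it also flags harmless tokens such as model numbers.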
An embodiment of the present invention further provides a computer device. FIG. 9 is a schematic structural diagram of the computer device provided by this embodiment. The computer device 400 may include a radio frequency (RF) circuit 401, a memory 402 including one or more computer-readable storage media, an input unit 403, a display unit 404, a sensor 405, an audio circuit 406, a wireless fidelity (WiFi) module 407, a processor 408 including one or more processing cores, a power supply 409, and other components. Those skilled in the art will understand that the structure shown in FIG. 9 does not limit the computer device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
The RF circuit 401 can be used to send and receive information, or to receive and send signals during a call.
The memory 402 can be used to store applications and data. The applications stored in the memory 402 include computer programs.
The input unit 403 can be used to receive input digits, character information, or user characteristic information (such as fingerprints), and to generate keyboard, mouse, joystick, optical, or trackball signal input related to user settings and function control.
The display unit 404 can be used to display information entered by the user or provided to the user, as well as the various graphical user interfaces of the computer device; these interfaces can be composed of graphics, text, icons, video, and any combination thereof.
The computer device may further include at least one sensor 405, such as a light sensor, a motion sensor, or other sensors.
The audio circuit 406 can provide an audio interface between the user and the computer device through a speaker and a microphone.
The WiFi module 407 can be used for short-range wireless transmission. It helps the user send and receive email, browse websites, access streaming media, and so on, providing wireless broadband Internet access.
The processor 408 is the control center of the computer device. It connects all parts of the device through various interfaces and lines, and performs the device's functions and processes data by running or executing the applications stored in the memory 402 and calling the data stored in the memory 402, thereby monitoring the computer device as a whole.
The computer device further includes a power supply 409 (such as a battery) that powers the components.
Although not shown in FIG. 9, the computer device may further include a camera, a Bluetooth module, and the like, which are not described further here.
Specifically, in this embodiment, the processor 408 in the computer device loads the computer programs corresponding to the processes of one or more applications into the memory 402 according to the following instructions, and runs the applications stored in the memory 402 to perform the following operations:
obtaining a user comment;
traversing a comment queue and judging whether the number of comments in the comment queue that are identical or similar to the user comment reaches a first threshold, wherein the comment queue is a first-in-first-out queue whose length is bounded by a second threshold;
if so, determining that the user comment is spam;
if not, adding the user comment to the comment queue and processing the comment at the tail of the comment queue according to the second threshold.
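The four operations above can be sketched in a few lines. This is an illustrative sketch, not the patent's code: `process_comment` is a hypothetical name, a Python list stands in for the comment queue with index 0 as the head, both thresholds are assumed to be positive integers passed in by the caller, and the "identical or similar" test is a pluggable predicate that defaults to exact equality. The loop stops as soon as the first threshold is reached, an optimisation over traversing the entire queue that yields the same decision.

```python
from typing import Callable, List


def process_comment(
    queue: List[str],
    comment: str,
    first_threshold: int,
    second_threshold: int,
    is_similar: Callable[[str, str], bool] = lambda a, b: a == b,
) -> bool:
    """Return True if `comment` is judged spam; otherwise enqueue it and return False.

    `queue[0]` is the head (newest) comment. `is_similar` is a placeholder for
    whatever "identical or similar" test is used (exact equality by default).
    """
    # Traverse the queue, counting historical comments identical/similar to the new one.
    count = 0
    for historical in queue:
        if is_similar(comment, historical):
            count += 1
            if count >= first_threshold:
                return True  # spam: the queue is left unchanged
    # Not spam: the new comment becomes the head comment...
    queue.insert(0, comment)
    # ...and any tail comments beyond the second threshold are deleted.
    del queue[second_threshold:]
    return False
```

Each check is linear in the queue length, so the second threshold also caps the cost of screening every new comment.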
In some implementations, when adding the user comment to the comment queue and processing the tail comment of the comment queue according to the second threshold, the processor 408 is configured to:
add the user comment to the comment queue as the head comment, and delete the tail comment that overflows the second threshold.
In some implementations, after obtaining the user comment, the processor 408 is further configured to:
judge whether the comment information in the user comment exists in a blacklist database and, if so, determine that the user comment is spam.
In some implementations, when judging whether the comment information in the user comment exists in the blacklist database, the processor 408 is configured to:
judge whether the user comment contains information matching the feature information in the blacklist database and, if so, determine that the comment information in the user comment exists in the blacklist database.
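Matching a comment against the blacklist can be as simple as a substring test. In this sketch the function names are hypothetical, and the normalisation step (ignoring case and whitespace, so that "A5 a7 A9" still matches "a5a7a9") is an assumption added for illustration; the description only requires that the comment contain information matching a feature entry.

```python
from typing import Iterable, Optional


def _normalise(text: str) -> str:
    # Assumption: ignore case and all whitespace before matching.
    return "".join(text.split()).lower()


def match_blacklist(comment: str, features: Iterable[str]) -> Optional[str]:
    """Return the first blacklist feature contained in `comment`, else None."""
    body = _normalise(comment)
    for feature in features:
        if feature and _normalise(feature) in body:
            return feature
    return None
```

For a large blacklist, a multi-pattern algorithm such as Aho-Corasick would avoid rescanning the comment once per feature.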
In some implementations, the processor 408 is further configured to:
when the user comment is determined to be spam, detect whether the user comment contains contact information and, if so, add the contact information to the blacklist database.
In some implementations, when judging whether the number of comments in the comment queue that are identical or similar to the user comment reaches the first threshold, the processor 408 is configured to:
judge whether the number of historical comments in the comment queue whose similarity to the user comment reaches a third threshold reaches the first threshold and, if so, determine that the number of comments in the comment queue identical or similar to the user comment reaches the first threshold.
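The description leaves both the similarity measure and the value of the third threshold open. As one possible choice, the sketch below uses the ratio from Python's standard `difflib.SequenceMatcher` (a value between 0.0 and 1.0). That measure, the helper names and the example threshold of 0.8 used in the usage check are assumptions, not taken from the patent.

```python
from difflib import SequenceMatcher
from typing import Iterable


def similarity(a: str, b: str) -> float:
    # Assumption: difflib's ratio stands in for the unspecified similarity measure.
    return SequenceMatcher(None, a, b).ratio()


def reaches_first_threshold(
    comment: str,
    history: Iterable[str],
    first_threshold: int,
    third_threshold: float,
) -> bool:
    """True if at least `first_threshold` historical comments have a
    similarity to `comment` of at least `third_threshold`."""
    count = 0
    for old in history:
        if similarity(comment, old) >= third_threshold:
            count += 1
            if count >= first_threshold:
                return True
    return False
```

With a third threshold of 0.8, a comment that differs from a queued one by a single trailing "!" still counts as similar, which is the kind of near-duplicate a fixed exact-match test would miss.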
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
In the embodiments of the present invention, the information processing device and the information processing method in the above embodiments belong to the same concept. Any method provided in the information processing method embodiments can run on the information processing device; for the specific implementation process, see the information processing method embodiments, which are not repeated here.
It should be noted that, for the information processing method of the present invention, a person of ordinary skill in the art will understand that all or part of the flow of the information processing method described in the embodiments can be carried out by a computer program controlling the relevant hardware. The computer program may be stored in a computer-readable storage medium, for example in the memory of a computer device, and executed by at least one processor in that computer device; its execution may include the flow of the information processing method embodiments described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
For the information processing device of the embodiments of the present invention, its functional modules may be integrated into one processing chip, each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated module may be implemented in hardware or as a software functional module. If implemented as a software functional module and sold or used as an independent product, the integrated module may also be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk or an optical disc.
The information processing method, device and computer device provided by the embodiments of the present invention have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is only intended to help understand the technical solution and core idea of the present invention. A person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (13)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710026441.2A CN106777341A (en) | 2017-01-13 | 2017-01-13 | Information processing method, device and computer equipment |
PCT/CN2017/107191 WO2018129978A1 (en) | 2017-01-13 | 2017-10-21 | Information processing method, device, storage medium and computer device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710026441.2A CN106777341A (en) | 2017-01-13 | 2017-01-13 | Information processing method, device and computer equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106777341A true CN106777341A (en) | 2017-05-31 |
Family ID: 58945583
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710026441.2A Pending CN106777341A (en) | 2017-01-13 | 2017-01-13 | Information processing method, device and computer equipment |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN106777341A (en) |
WO (1) | WO2018129978A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111241377B (en) * | 2020-01-02 | 2023-05-26 | 华数传媒网络有限公司 | Live broadcast real-time comment system with auditing function |
CN113987158B (en) * | 2021-10-15 | 2025-08-29 | 北京搜狗科技发展有限公司 | Display method, device and device for display |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102315953A (en) * | 2010-06-29 | 2012-01-11 | 百度在线网络技术(北京)有限公司 | Method and device for detecting junk posts based on occurrence rule of posts |
CN103226576A (en) * | 2013-04-01 | 2013-07-31 | 杭州电子科技大学 | Comment spam filtering method based on semantic similarity |
US20140122584A1 (en) * | 2012-10-25 | 2014-05-01 | Google, Inc. | Soft posting to social activity streams |
CN104869467A (en) * | 2015-03-26 | 2015-08-26 | 腾讯科技(北京)有限公司 | Information output method and system for media playing, and apparatuses |
CN104933191A (en) * | 2015-07-09 | 2015-09-23 | 广东欧珀移动通信有限公司 | A method, system and terminal for identifying spam comments based on Bayesian algorithm |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101159704A (en) * | 2007-10-23 | 2008-04-09 | 浙江大学 | Anti-spam method based on micro-content similarity |
CN104050195B (en) * | 2013-03-15 | 2017-11-03 | 暴风集团股份有限公司 | A kind of advertisement sticker processing method and system |
CN106777341A (en) * | 2017-01-13 | 2017-05-31 | 广东欧珀移动通信有限公司 | Information processing method, device and computer equipment |
2017
- 2017-01-13 CN CN201710026441.2A patent/CN106777341A/en active Pending
- 2017-10-21 WO PCT/CN2017/107191 patent/WO2018129978A1/en active Application Filing
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018129978A1 (en) * | 2017-01-13 | 2018-07-19 | 广东欧珀移动通信有限公司 | Information processing method, device, storage medium and computer device |
CN109933775A (en) * | 2017-12-15 | 2019-06-25 | 腾讯科技(深圳)有限公司 | UGC content processing method and device |
CN109933775B (en) * | 2017-12-15 | 2022-02-18 | 腾讯科技(深圳)有限公司 | UGC content processing method and device |
CN110020057A (en) * | 2017-12-29 | 2019-07-16 | 中国移动通信集团陕西有限公司 | A kind of comment spam information identifying method and device |
CN110020057B (en) * | 2017-12-29 | 2021-05-25 | 中国移动通信集团陕西有限公司 | Method and device for identifying spam comment information |
CN110175851A (en) * | 2019-02-28 | 2019-08-27 | 腾讯科技(深圳)有限公司 | A kind of cheating detection method and device |
CN110175851B (en) * | 2019-02-28 | 2023-09-12 | 腾讯科技(深圳)有限公司 | Cheating behavior detection method and device |
CN112507146A (en) * | 2020-11-27 | 2021-03-16 | 北京达佳互联信息技术有限公司 | Information processing method, information processing device, electronic equipment and storage medium |
CN114245163A (en) * | 2021-12-15 | 2022-03-25 | 四川启睿克科技有限公司 | Method for filtering bullet screen of robot |
Also Published As
Publication number | Publication date |
---|---|
WO2018129978A1 (en) | 2018-07-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106777341A (en) | Information processing method, device and computer equipment | |
US10834218B2 (en) | Event information system classifying messages using machine learning classification model and pushing selected message to user | |
CN110046299B (en) | Computerized system and method for automatically performing an implicit message search | |
JP6744480B2 (en) | Network-based ad data traffic latency reduction | |
US20190173825A1 (en) | System and method for email message following from a user's inbox | |
US20150033141A1 (en) | System and method for providing an interactive message inbox | |
CN110390569B (en) | Content promotion method, device and storage medium | |
US10873553B2 (en) | System and method for triaging in a message system on send flow | |
US11010687B2 (en) | Detecting abusive language using character N-gram features | |
US9596205B2 (en) | System and method for mailing list identification and representation | |
US10033775B2 (en) | System and method for providing users feedback regarding their reading habits | |
CN105335398A (en) | Service recommendation method and terminal | |
CN104462509A (en) | Review spam detection method and device | |
CN109325179A (en) | Method and device for promoting content | |
US11184451B2 (en) | Intelligently delivering notifications including summary of followed content and related content | |
CN110392155B (en) | Notification message display and processing method, device and equipment | |
US20160239533A1 (en) | Identity workflow that utilizes multiple storage engines to support various lifecycles | |
CN111242709A (en) | Message pushing method and device, equipment and storage medium thereof | |
CN106354570A (en) | Method and device for copying and pasting account information | |
CN111428162A (en) | Page screenshot method and device | |
CN111813929A (en) | Information processing method, device and electronic equipment | |
CN107592399A (en) | Contact display method and mobile terminal | |
CN110442803A (en) | Data processing method, device, medium and the calculating equipment executed by calculating equipment | |
CN109981712B (en) | Method and device for pushing information | |
CN106302135A (en) | The method of a kind of mail arrangement and terminal |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Address after: 523860 No. 18, Wu Sha Beach Road, Changan Town, Dongguan, Guangdong; Applicant after: GUANGDONG OPPO MOBILE TELECOMMUNICATIONS Corp.,Ltd.; Address before: 523860 No. 18, Wu Sha Beach Road, Changan Town, Dongguan, Guangdong; Applicant before: GUANGDONG OPPO MOBILE TELECOMMUNICATIONS Corp.,Ltd. |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170531 |