
CN105095324A - User classification apparatus, user classification method and electronic device - Google Patents


Info

Publication number
CN105095324A
Authority
CN
China
Prior art keywords
content
predetermined field
user
users
classified
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410222082.4A
Other languages
Chinese (zh)
Inventor
葛乃晟
付奕雯
郑仲光
孟遥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd filed Critical Fujitsu Ltd
Priority to CN201410222082.4A priority Critical patent/CN105095324A/en
Publication of CN105095324A publication Critical patent/CN105095324A/en
Pending legal-status Critical Current


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a user classification apparatus, a user classification method, and an electronic device. The user classification apparatus classifies users in a predetermined domain and comprises: a content searching unit that searches a predetermined data source for content containing a keyword of the predetermined domain, takes that content as predetermined domain content, and takes the users who published it as to-be-classified users; and a user classification unit that classifies the to-be-classified users according to user-related attributes of the predetermined domain content. The user classification apparatus, user classification method, and electronic device provided by the present invention at least enable more precise classification of the users of the predetermined domain.

Description

User classification apparatus, user classification method, and electronic device

Technical field

The present invention relates to the field of information processing, and in particular to a user classification apparatus, a user classification method, and an electronic device for classifying users in a predetermined field.

Background

With the development of Internet technology, more and more users express opinions and feelings about matters that interest them on Internet platforms (such as blogs and microblogs). How to classify and manage these users, especially users in a specific field, is a major focus of current research. At present, the analysis and classification of users who publish information is essentially based on relationships formed by the attention users pay to one another (also called fan relationships: when a user continuously follows another user, that user can be called a fan of the other user, and the two are in a fan relationship). The limitation of this approach is that no analysis is possible when there is no fan relationship between users. Even when there is one, a fan relationship does not directly express each user's relationship to a given field, so users in that field cannot be classified accurately. There is therefore a pressing need for a user classification apparatus, a user classification method, and an electronic device that can accurately classify users in a given field.

Summary of the invention

A brief summary of the invention is given below to provide a basic understanding of certain aspects of the invention. It should be understood that this summary is not an exhaustive overview of the invention. It is not intended to identify key or critical parts of the invention, nor to limit its scope. Its sole purpose is to present some concepts in simplified form as a prelude to the more detailed description given later.

In view of the above defects of the prior art, one object of the present invention is to provide a user classification apparatus, a user classification method, and an electronic device that at least overcome the existing problems.

According to one aspect of the present disclosure, a user classification apparatus for classifying users in a predetermined field is provided. The apparatus includes: a content search unit configured to search a predetermined data source for content containing a subject word of the predetermined field, to take that content as predetermined field content, and to take the users who published it as users to be classified; and a user classification unit configured to classify the users to be classified according to user-related attributes of the predetermined field content.

According to another aspect of the present disclosure, a user classification method for classifying users in a predetermined field is provided. The method includes: searching a predetermined data source for content containing a subject word of the predetermined field, taking that content as predetermined field content, and taking the users who published it as users to be classified; and classifying the users to be classified according to user-related attributes of the predetermined field content.

According to another aspect of the present disclosure, an electronic device is also provided, which includes the user classification apparatus described above.

According to other aspects of the present disclosure, a program that causes a computer to function as the user classification apparatus described above is also provided.

According to yet another aspect of the present disclosure, a corresponding computer-readable storage medium is also provided. The storage medium stores a computer program executable by a computing device, and the program, when executed, causes the computing device to perform the user classification method described above.

The user classification apparatus, method, and electronic device according to the embodiments of the present disclosure can obtain at least one of the following benefits: dividing the users of the predetermined field according to user-related attributes of the predetermined field content enables more precise user classification; and using entity words of non-predetermined fields to deduplicate the entity words of the predetermined field enables the subject words of the predetermined field to be expanded.

These and other advantages of the present disclosure will become more apparent from the following detailed description of preferred embodiments of the present disclosure, taken in conjunction with the accompanying drawings.

Brief description of the drawings

The present disclosure may be better understood by referring to the description given below in conjunction with the accompanying drawings, in which the same or similar reference numerals are used throughout to designate the same or similar parts. The drawings, together with the detailed description below, are incorporated in and form a part of this specification, and serve to further illustrate the preferred embodiments of the present disclosure and to explain its principles and advantages. In the drawings:

FIG. 1 is a block diagram schematically showing an example structure of a user classification apparatus according to an embodiment of the present disclosure.

FIG. 2 is a block diagram schematically showing an example structure of the user classification unit in FIG. 1.

FIG. 3 is a block diagram schematically showing another example structure of a user classification apparatus according to an embodiment of the present disclosure.

FIG. 4 is a block diagram schematically showing another example structure of the subject word determination unit in FIG. 3.

FIG. 5 is a flowchart schematically showing a user classification method according to an embodiment of the present disclosure.

FIG. 6 is a simplified structural diagram showing one possible hardware configuration that can be used to implement the user classification apparatus and user classification method according to embodiments of the present disclosure.

Detailed description

Exemplary embodiments of the present invention are described below with reference to the accompanying drawings. For clarity and conciseness, not all features of an actual implementation are described in this specification. It should be understood, however, that in developing any such actual embodiment, many implementation-specific decisions must be made to achieve the developer's specific goals, such as complying with system-related and business-related constraints, and that these constraints may vary from one implementation to another. It should also be understood that, although such development work may be complex and time-consuming, it is merely a routine undertaking for those skilled in the art having the benefit of this disclosure.

It should also be noted that, to avoid obscuring the present invention with unnecessary detail, the drawings show only the apparatus structures and/or processing steps closely related to the solution according to the present invention, and omit other details that have little to do with the invention.

The inventors found that the content a user publishes usually reflects that user's relationship to a predetermined field better than the fan relationships between users do. The present invention therefore proposes a user classification apparatus, a user classification method, and an electronic device that classify users based on the content they publish.

FIG. 1 is a block diagram schematically showing an example structure of a user classification apparatus according to an embodiment of the present disclosure.

The user classification apparatus 1 according to the present disclosure is used to classify users in a predetermined field. The predetermined field may be any field, such as beauty or automobiles, for which further classification of users is desired for ease of management. The users to be classified may be any users whose published information can be obtained, or a specific user group (such as microblog users or blog users). According to one embodiment of the present disclosure, the user classification apparatus 1 may classify microblog users in a predetermined field.

As shown in FIG. 1, the user classification apparatus 1 includes: a content search unit 10 configured to search a predetermined data source for content containing a subject word of the predetermined field, to take that content as predetermined field content, and to take the users who published it as users to be classified; and a user classification unit 20 configured to classify the users to be classified according to user-related attributes of the predetermined field content.

According to the present disclosure, the content search unit 10 may search a predetermined data source, such as a network or a specific database, for content containing a subject word of the predetermined field. The predetermined data source may be, for example, a database containing the microblog information of various portal websites. A subject word of the predetermined field is a character or word specific to that field. According to an embodiment of the present disclosure, subject words may be specified for the predetermined field; for the automotive field, for example, "gearbox" and "brake" may be specified as subject words.

By searching the predetermined data source, the content search unit 10 obtains content containing a subject word of the predetermined field as predetermined field content, and takes the users who published that content as the users to be classified in the predetermined field, so that these users can be classified and managed in subsequent processing.

According to one embodiment of the present disclosure, when the user classification apparatus 1 is used to classify microblog users, the content search unit 10 may search the predetermined data source for microblog content containing a subject word of the predetermined field as the predetermined field content, and take the microblog users who published microblogs containing a subject word of the predetermined field as the users to be classified.
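The search step performed by the content search unit 10 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `Post` record, the function name `search_domain_content`, and the use of plain substring matching are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    """Hypothetical record for one published item (e.g. one microblog)."""
    user_id: str
    text: str


def search_domain_content(posts, subject_words):
    """Return (domain_posts, users_to_classify).

    A post counts as predetermined field content if it contains at least
    one subject word; substring matching is an assumption of this sketch.
    """
    domain_posts = [p for p in posts if any(w in p.text for w in subject_words)]
    users_to_classify = {p.user_id for p in domain_posts}
    return domain_posts, users_to_classify


# Illustrative data for the automotive field ("gearbox", "brake").
posts = [
    Post("alice", "My gearbox makes a strange noise"),
    Post("bob", "Lovely weather today"),
    Post("carol", "Replaced the brake pads myself"),
]
domain_posts, users = search_domain_content(posts, {"gearbox", "brake"})
```

Only the users who published matching content become users to be classified; "bob" is left out because his post mentions no subject word.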

After the content search unit 10 has obtained the users to be classified in the predetermined field, the user classification unit 20 further classifies them according to user-related attributes of the predetermined field content. Those engaged in work related to the predetermined field can then manage their potential customers in a targeted manner, based on the classification that the user classification unit 20 produces for the users of that field.

According to an embodiment of the present disclosure, the user-related attributes of the predetermined field content may include the behavior pattern with which a user to be classified publishes the predetermined field content and the propagation characteristics of that content. The publishing behavior pattern indicates, to some extent, how interested the user is in the predetermined field content; the propagation characteristics of the content indicate the influence of the user who published it. Dividing the users to be classified on the basis of these two attributes makes it possible to manage the classified users in a targeted manner according to their influence and degree of interest.

According to an embodiment of the present disclosure, the behavior pattern with which a user to be classified publishes the predetermined field content may include the most recent time and the frequency at which the user published such content. The propagation characteristics of the predetermined field content may be determined, for example, by the number of times the content was forwarded and/or commented on.

According to a preferred embodiment of the present disclosure, each time a subject word of the predetermined field is mentioned in predetermined field content, the frequency with which the user publishes predetermined field content is increased by one. Specifically, when one item of predetermined field content published by a user contains several subject words of the predetermined field, the publishing frequency of that user is increased by the number of subject-word mentions it contains. For example, when a microblog contains 6 mentions of subject words of the automotive field (whether the same or different subject words), the frequency with which that user publishes automotive content is increased by 6.
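The counting rule of the preferred embodiment can be sketched as below. The function names are hypothetical, and counting with `str.count` is an assumption of this sketch; it counts every non-overlapping occurrence of each subject word.

```python
def count_subject_mentions(text, subject_words):
    """Count every occurrence of every subject word in one item of content."""
    return sum(text.count(word) for word in subject_words)


def publishing_frequency(texts, subject_words):
    """Sum the subject-word mentions over all content published by one user."""
    return sum(count_subject_mentions(t, subject_words) for t in texts)


# One microblog that mentions "gearbox" three times and "brake" three times.
post = "gearbox, brake, gearbox: new brake pads, brake fluid, and a gearbox service"
mentions = count_subject_mentions(post, {"gearbox", "brake"})  # 6
```

One caveat of this simple approach: if one subject word is a substring of another, a single mention would be counted under both, so a real implementation would likely tokenize the text first.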

Those skilled in the art will understand that the more recently a user to be classified last published predetermined field content, the more likely it is that the user is still interested in that content. If a user's latest predetermined field content was published 6 months ago, the user was once interested in the predetermined field but may no longer be.

Those skilled in the art will also understand that the more often a user's predetermined field content is forwarded and/or commented on, the greater that user's influence in the predetermined field.

Based on user-related attributes of the predetermined field content, such as the publishing behavior pattern of the users to be classified and the propagation characteristics of the content, the user classification unit 20 can further classify the users of the predetermined field, for example into "recently interested and influential users", "influential users with no recent interest", "recently interested users with no influence", and "users with neither recent interest nor influence", so that each class of users can be analyzed precisely and managed in a targeted manner.

FIG. 2 is a block diagram schematically showing an example structure of the user classification unit in FIG. 1.

As shown in FIG. 2, the user classification unit 20 includes: a level determination module 201 configured to divide the recency parameter of the user's publishing into M levels, the frequency parameter of the user's publishing into N levels, and the content propagation characteristic parameter into P levels, where M, N, and P are integers greater than 1, thereby determining M×N×P user levels; and a user classification module 202 configured to classify each user to be classified into one of the M×N×P user levels according to the most recent time and the frequency at which that user published the predetermined field content and the propagation characteristics of the predetermined field content that the user published.
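The division into M×N×P levels can be sketched as a bucketing step per parameter. This is an illustration under stated assumptions: the helper names, the representation of a level as a tuple of three bucket indices, and the boundary values in the usage lines are all hypothetical.

```python
from bisect import bisect_right
from itertools import product


def bucket(value, boundaries):
    """Index of the bucket that value falls into.

    `boundaries` is a sorted list of thresholds; k thresholds give k + 1
    buckets, and a value equal to a threshold goes to the upper bucket.
    """
    return bisect_right(boundaries, value)


def user_level(recency, frequency, propagation, r_bounds, f_bounds, i_bounds):
    """Map one user's three parameter values to a (recency, frequency, propagation) level."""
    return (
        bucket(recency, r_bounds),
        bucket(frequency, f_bounds),
        bucket(propagation, i_bounds),
    )


def all_levels(m, n, p):
    """Enumerate the M x N x P possible user levels."""
    return list(product(range(m), range(n), range(p)))


# Hypothetical thresholds: 90 days since last post, 5 mentions, influence score 10.
level = user_level(30, 7, 2, r_bounds=[90], f_bounds=[5], i_bounds=[10])
levels = all_levels(2, 3, 4)
```

With one threshold per parameter, each parameter has two buckets and the scheme reduces to the 2×2×2 case; adding more thresholds per parameter yields larger M, N, or P without changing the code.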

According to an embodiment of the present disclosure, M, N, and P may all be set to 2. That is, the recency parameter R of the user's publishing may be divided into 2 levels (for example, a time threshold may be set, and the recency parameter divided into "greater than or equal to the time threshold" and "less than the time threshold", that is, long and short); the frequency parameter F of the user's publishing may be divided into 2 levels (for example, a frequency threshold may be set, and the frequency parameter divided into "greater than or equal to the frequency threshold" and "less than the frequency threshold", that is, high and low); and the content propagation characteristic parameter I may be divided into 2 levels (for example, the influence in the predetermined field of the user who published the content may be determined from the number of times the content was forwarded and commented on, and the propagation characteristic parameter divided, according to that influence, into "greater than or equal to the propagation threshold" and "less than the propagation threshold"). Any suitable prior-art method may be used to determine, from the forwarding and comment counts of the content, the influence in the predetermined field of the user who published it; for brevity, the specific process is not described here.

Table 1 shows the 8 user levels obtained when M, N, and P are all set to 2.

Table 1

Although the level determination module 201 is described above as setting M, N, and P all to 2, so that the recency parameter, the frequency parameter, and the content propagation characteristic parameter are each divided into 2 levels, the present disclosure is not limited to this. For example, M, N, and P may be set to another value (such as 5), or may each be set to a different value.

In addition, the level determination module may set a threshold for each of the parameters, including the recency parameter, the frequency parameter, and the content propagation characteristic parameter, so that the user classification module 202 can classify each user to be classified. For example, when each parameter is divided into 2 levels, the threshold for the recency parameter may be set to, for example, 3 months, and the threshold for the frequency parameter may be set to, for example, 5.

After the level determination module 201 has determined the M×N×P user levels, the user classification module 202 classifies each user to be classified into one of the M×N×P user levels according to the most recent time and the frequency at which that user published the predetermined field content and the propagation characteristics of the predetermined field content that the user published. For example, when each parameter is divided into 2 levels, a user whose most recent predetermined field content was published a short time ago, whose frequency is high, and whose influence is large may be assigned to class 1 in Table 1. Similarly, every user to be classified can be assigned to one of the 8 classes of users shown in Table 1.
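The 2×2×2 case can be sketched with the example thresholds mentioned above (3 months for recency, 5 for frequency). Several details are assumptions of this sketch rather than part of the source: 3 months is approximated as 90 days, the influence threshold of 100 and the influence score itself are hypothetical, and the result is returned as descriptive labels because only class 1 (recent, frequent, influential) is described in the text.

```python
from datetime import date


def classify_user(last_post, today, mention_count, influence,
                  recency_days=90, min_mentions=5, min_influence=100):
    """Assign one user to one of the 2 x 2 x 2 = 8 user levels.

    recency_days approximates the 3-month example threshold;
    min_influence is a hypothetical value chosen for illustration.
    """
    recent = (today - last_post).days < recency_days
    frequent = mention_count >= min_mentions
    influential = influence >= min_influence
    return (
        "recent" if recent else "not recent",
        "frequent" if frequent else "infrequent",
        "influential" if influential else "not influential",
    )


# Illustrative dates and scores only.
today = date(2014, 5, 23)
label = classify_user(date(2014, 5, 1), today, mention_count=8, influence=250)
```

A user labeled ("recent", "frequent", "influential") corresponds to the class 1 example in the text; the numbering of the other seven combinations depends on Table 1.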

Although the description above obtains the subject words of the predetermined field by having them specified by a user, so that content containing the subject words can be searched for as predetermined field content, the present disclosure is not limited to this. For example, the subject words of the predetermined field may also be determined according to predetermined rules.

FIG. 3 is a block diagram schematically showing another example structure of a user classification apparatus according to an embodiment of the present disclosure.

As shown in FIG. 3, in addition to the content search unit 10 and the user classification unit 20, which it includes in the same way as the user classification apparatus 1 of FIG. 1, the user classification apparatus 2 includes a subject word determination unit 30 configured to determine the subject words of the predetermined field, so that the content search unit can search for content containing those subject words as predetermined field content.

FIG. 4 is a block diagram schematically showing another example structure of the subject word determination unit in FIG. 3.

As shown in FIG. 4, the subject word determination unit 30 includes: a first entity word extraction module 301 configured to extract, from content published by specific users of the predetermined field, entity words whose frequency is higher than a first threshold, forming a first entity word group; a second entity word extraction module 302 configured to extract, from content published by specific users of non-predetermined fields unrelated to the predetermined field, entity words whose frequency is higher than a second threshold, forming a second entity word group; and a subject word determination module 303 configured to deduplicate the entity words in the first entity word group using the entity words in the second entity word group, and to take the entity words remaining in the first entity word group after deduplication as the subject words of the predetermined field.

According to the present disclosure, the specific users of the predetermined field are users whose influence in the predetermined field exceeds a third threshold, and the specific users of a non-predetermined field are users whose influence in that non-predetermined field exceeds a fourth threshold. When microblog users in a predetermined field are to be classified, the specific users of the predetermined field may be, for example, "big V" users (typically users whose number of fans is greater than a set threshold, that is, influential users).

The first entity word extraction module 301 may extract, from content published by the specific users of the predetermined field, entity words whose frequency is higher than the first threshold, forming the first entity word group. The second entity word extraction module 302 extracts, from all content published by, for example, big V users of non-predetermined fields unrelated to the predetermined field, entity words whose frequency is higher than the second threshold, forming the second entity word group.

For example, when microblog users in the field of outdoor sports are to be divided, the first entity word extraction module 301 may extract, by word frequency, the entity words that appear more often than the first threshold in the content published by big V users of the outdoor sports field. For example, the first entity word extraction module 301 may extract the entity words whose frequency of occurrence is higher than the first threshold from all content published by a big V user within a predetermined time period (for example, the most recent week). Similarly, the second entity word extraction module 302 may extract, by word frequency, the entity words that appear more often than the second threshold in the content published by big V users outside the outdoor sports field.
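The frequency-based extraction performed by modules 301 and 302 can be sketched as follows. The sketch assumes that entity words have already been recognized in each post (the source does not specify how), that "frequency" means a raw occurrence count, and that the function name and sample data are hypothetical.

```python
from collections import Counter


def frequent_entity_words(posts_entities, threshold):
    """Return the entity words whose total count exceeds `threshold`.

    `posts_entities` is a list of posts, each given as a list of the
    entity words already recognized in it (an assumption of this sketch).
    """
    counts = Counter(word for entities in posts_entities for word in entities)
    return {word for word, n in counts.items() if n > threshold}


# Illustrative entity words from posts by big V users of the outdoor sports field.
outdoor_posts = [
    ["cycling", "house prices"],
    ["cycling", "hiking"],
    ["cycling", "hiking", "house prices"],
    ["resume"],
]
first_group = frequent_entity_words(outdoor_posts, threshold=1)
```

The same function can be applied with a second threshold to the content of big V users outside the field to form the second entity word group.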

根据本公开,第一阈值和第二阈值可以根据例如获取待分类用户的精确度的需求任意设定。此外,本领域技术人员也可以理解,可以将第一阈值和第二阈值设置为相同,也可以设置为不同。According to the present disclosure, the first threshold and the second threshold can be arbitrarily set according to, for example, requirements for obtaining accuracy of users to be classified. In addition, those skilled in the art can also understand that the first threshold and the second threshold can be set to be the same, or can be set to be different.

In general, content published by big-V users of a predetermined field contains entity words of that field as well as entity words of other fields; for example, these users also post trending entity words related to current affairs, sports, finance, and so on. For this reason, the entity words whose frequency exceeds the first threshold are extracted first, and entity words that are usually unrelated to the predetermined field, extracted from content published by, for example, big-V users of non-predetermined fields, are then used to deduplicate them. In this way the subject words of the predetermined field can be obtained more accurately.

Taking outdoor sports as the predetermined field, the first entity word extraction module 301 may extract entity words from content published by one big-V user of the outdoor sports field and obtain, for example, the following entity words: house prices, power walking, Champions League, self-driving tours, Jingkai Expressway, cycling, spring outings, Chelsea, resumes, excursions. By extracting entity words from content published by multiple big-V users of the outdoor sports field, a first entity word group that includes, for example, the above entity words can be obtained.

The second entity word extraction module 302 may extract entity words from content published by one big-V user outside the outdoor sports field and obtain, for example, the following entity words: Kunming, People's Daily, house prices, Jingkai Expressway, resumes, earthquake, Champions League, Beckham, Chelsea, and so on. By extracting entity words from content published by multiple big-V users outside the outdoor sports field, a second entity word group that includes, for example, the above entity words can be obtained.

The subject word determination module 303 then uses the entity words in the second entity word group, which relates to non-predetermined fields, to deduplicate the entity words in the first entity word group, which relates to the predetermined field. That is, entity words that also appear in the second group are removed from the first group, and the entity words remaining in the first group are taken as the subject words of the predetermined field.

In the above example, deduplication by the subject word determination module 303 yields "power walking, self-driving tours, cycling, spring outings, excursions" as the subject words of the outdoor sports field.
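The extraction and deduplication steps described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each post has already been reduced to a list of entity words (entity recognition is out of scope), counts raw occurrences as the "frequency", and the function names `frequent_entity_words` and `determine_subject_words` are hypothetical.

```python
from collections import Counter


def frequent_entity_words(posts, threshold):
    """Return the entity words whose total count across posts exceeds threshold.

    Assumption: each post is a list of already-extracted entity words.
    """
    counts = Counter(word for post in posts for word in post)
    return {word for word, n in counts.items() if n > threshold}


def determine_subject_words(domain_posts, other_posts,
                            first_threshold, second_threshold):
    """Build both entity word groups and deduplicate the first against the second."""
    first_group = frequent_entity_words(domain_posts, first_threshold)
    second_group = frequent_entity_words(other_posts, second_threshold)
    # "Deduplication" is modelled here as a set difference: words that also
    # appear in the non-predetermined-field group are dropped.
    return first_group - second_group


# The entity words from the outdoor sports example, one post per user.
outdoor = [["house prices", "power walking", "Champions League",
            "self-driving tours", "Jingkai Expressway", "cycling",
            "spring outings", "Chelsea", "resumes", "excursions"]]
other = [["Kunming", "People's Daily", "house prices", "Jingkai Expressway",
          "resumes", "earthquake", "Champions League", "Beckham", "Chelsea"]]

subject_words = determine_subject_words(outdoor, other, 0, 0)
```

With both thresholds set to 0 for this toy input, `subject_words` contains the five outdoor-specific words listed in the example above.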

The content search unit 10 may then use the subject words of the predetermined field determined by the subject word determination module to search for content containing those subject words and obtain the users to be classified in the predetermined field, who are then classified by the user classification unit 20.

After the user classification apparatus according to the embodiment of the present invention has classified the users of the predetermined field, the users in each class can be analyzed and managed separately. This enables more accurate analysis of massive data and provides an additional dimension of user attributes for subsequent processing of that data.

According to an embodiment of the present disclosure, a user classification method is also provided. An exemplary process of the user classification method is described below with reference to FIG. 5.

As shown in FIG. 5, the processing flow 500 of the user classification method according to the embodiment of the present disclosure starts at S510 and then proceeds to S520.

In step S520, a predetermined data source is searched for content containing the subject words of the predetermined field, which is taken as predetermined field content, and the users who published the predetermined field content are taken as the users to be classified. Step S520 may be implemented, for example, by performing the processing of the content search unit 10 described with reference to FIGS. 1-4, and its description is omitted here. The flow then proceeds to S530.

In step S530, the users to be classified are classified according to user-related attributes of the predetermined field content. These attributes may include, for example, the behavior pattern with which a user to be classified publishes the predetermined field content and the propagation characteristics of the predetermined field content. Step S530 may be implemented, for example, by performing the processing of the user classification unit 20 described with reference to FIGS. 1-4, and its description is omitted here. The flow then proceeds to S540.

The processing flow 500 ends at S540.
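As a rough illustration of step S520, the sketch below filters posts by subject word and collects their authors. The post layout (a dict with `author` and `text` keys), the plain substring match, and the function name `search_domain_content` are assumptions made for this example; the source does not specify a data format or a matching method.

```python
def search_domain_content(posts, subject_words):
    """S520 sketch: select posts mentioning any subject word and their authors.

    Assumption: each post is a dict with "author" and "text" keys, and a
    subject word matches when it occurs as a substring of the text.
    """
    domain_posts = [
        post for post in posts
        if any(word in post["text"] for word in subject_words)
    ]
    users_to_classify = {post["author"] for post in domain_posts}
    return domain_posts, users_to_classify


posts = [
    {"author": "a", "text": "weekend cycling trip"},
    {"author": "b", "text": "house prices rise again"},
]
domain_posts, users_to_classify = search_domain_content(posts, {"cycling"})
```

The returned posts and users would then be passed to the classification of step S530.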

According to an embodiment of the present disclosure, the method may further include, before step S530 is performed, a step of determining the subject words of the predetermined field. This step may include, for example: extracting entity words whose frequency is higher than a first threshold from content published by specific users of the predetermined field, forming a first entity word group; extracting entity words whose frequency is higher than a second threshold from content published by specific users of non-predetermined fields unrelated to the predetermined field, forming a second entity word group; and using the entity words in the second entity word group to deduplicate the entity words in the first entity word group, and taking the entity words remaining in the first entity word group as the subject words of the predetermined field. Each of these steps may be implemented, for example, by the processing of the first entity word extraction module, the second entity word extraction module, and the subject word determination module described with reference to FIG. 4, and a detailed description is omitted here.

Compared with the prior art, the user classification apparatus and user classification method according to the present disclosure have at least one of the following advantages: classifying the users of a predetermined field according to user-related attributes of the predetermined field content enables more precise user classification; and deduplicating the entity words of the predetermined field against entity words of non-predetermined fields enables the subject words of the predetermined field to be expanded.

In addition, an embodiment of the present disclosure provides an electronic device configured to include the user classification apparatus 1 described above. The electronic device may be, for example, any one of the following: a mobile phone, a computer, a tablet computer, or a personal digital assistant. Accordingly, the electronic device has the beneficial effects and advantages of the user classification apparatus described above.

The constituent units, subunits, and so on of the user classification apparatus according to the embodiments of the present disclosure (for example, the user classification apparatus shown in FIGS. 1-4) may be configured by software, firmware, hardware, or any combination thereof. When implemented by software or firmware, the program constituting the software or firmware may be installed from a storage medium or a network onto a machine with a dedicated hardware structure; with the various programs installed, the machine can perform the functions of the above constituent units and subunits.

FIG. 6 is a simplified structural diagram of the hardware configuration of one possible processing device that can be used to implement the user classification apparatus and method according to the embodiments of the present disclosure.

In FIG. 6, a central processing unit (CPU) 601 executes various processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The RAM 603 also stores, as needed, the data required when the CPU 601 executes the various processes. The CPU 601, the ROM 602, and the RAM 603 are connected to one another via a bus 604. An input/output interface 605 is also connected to the bus 604.

The following components are also connected to the input/output interface 605: an input section 606 (including a keyboard, a mouse, and the like), an output section 607 (including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like), the storage section 608 (including a hard disk and the like), and a communication section 609 (including a network interface card such as a LAN card, a modem, and the like). The communication section 609 performs communication processing via a network such as the Internet. A drive 610 may also be connected to the input/output interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, may be mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage section 608 as needed.

When the above series of processes is implemented by software, the program constituting the software may be installed from a network such as the Internet or from a storage medium such as the removable medium 611.

Those skilled in the art will understand that such a storage medium is not limited to the removable medium 611 shown in FIG. 6, which stores the program and is distributed separately from the device in order to provide the program to the user. Examples of the removable medium 611 include magnetic disks (including floppy disks), optical disks (including compact disc read-only memory (CD-ROM) and digital versatile discs (DVD)), magneto-optical disks (including MiniDisc (MD) (registered trademark)), and semiconductor memories. Alternatively, the storage medium may be the ROM 602, a hard disk contained in the storage section 608, or the like, which stores the program and is distributed to the user together with the device that contains it.

In addition, the present disclosure proposes a program product storing machine-readable instruction code. When the instruction code is read and executed by a machine, the user classification method according to the embodiments of the present disclosure can be performed. Accordingly, the various storage media used to carry such a program product, such as magnetic disks, optical disks, magneto-optical disks, and semiconductor memories, are also included in the present disclosure.

In the above description of specific embodiments of the present disclosure, features described and/or illustrated for one embodiment may be used in the same or a similar way in one or more other embodiments, combined with features of other embodiments, or substituted for features of other embodiments.

Furthermore, the methods of the embodiments of the present disclosure are not limited to being performed in the chronological order described in the specification or shown in the drawings; they may also be performed in another order, in parallel, or independently. The order of execution of the methods described in this specification therefore does not limit the technical scope of the present disclosure.

It is also evident that each operation of the above method according to the present disclosure may be implemented as a computer-executable program stored in any of various machine-readable storage media.

Moreover, the object of the present disclosure may also be achieved as follows: a storage medium storing the above executable program code is provided, directly or indirectly, to a system or device, and a computer or central processing unit (CPU) in that system or device reads and executes the program code.

In this case, as long as the system or device has the function of executing a program, the embodiments of the present disclosure are not limited to a particular kind of program, and the program may take any form, for example an object program, a program executed by an interpreter, or a script program provided to an operating system.

The machine-readable storage media mentioned above include, but are not limited to, various memories and storage units, semiconductor devices, disk units such as optical, magnetic, and magneto-optical disks, and other media suitable for storing information.

In addition, the embodiments of the present disclosure may also be implemented by a client information processing terminal that connects to a corresponding website on the Internet, downloads and installs the computer program code according to the present disclosure, and then executes the program.

In summary, the embodiments according to the present disclosure provide the following schemes, without being limited to them:

Scheme 1. A user classification apparatus for classifying users of a predetermined field, the user classification apparatus comprising:

a content search unit configured to search a predetermined data source for content containing subject words of the predetermined field as predetermined field content, and to take the users who published the predetermined field content as users to be classified; and

a user classification unit configured to classify the users to be classified according to user-related attributes of the predetermined field content.

Scheme 2. The user classification apparatus according to scheme 1, wherein the content search unit is configured to search the predetermined data source for microblog content containing the subject words of the predetermined field as the predetermined field content.

Scheme 3. The user classification apparatus according to scheme 1 or 2, wherein the user-related attributes of the predetermined field content include the behavior pattern with which the users to be classified publish the predetermined field content and the propagation characteristics of the predetermined field content.

Scheme 4. The user classification apparatus according to scheme 3, wherein

the behavior pattern with which a user to be classified publishes the predetermined field content includes the most recent time at which, and the frequency with which, the user published the predetermined field content, and

the propagation characteristics of the predetermined field content are determined by the number of times the predetermined field content has been reposted and/or commented on.
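One way the three attributes named in scheme 4 might be derived from a user's predetermined field posts is sketched below. The post fields (`time`, `reposts`, `comments`), the 30-day window used for frequency, and the choice to sum reposts and comments are all assumptions of this sketch; the source only states that recency, frequency, and repost and/or comment counts are used.

```python
from datetime import datetime


def user_attributes(user_posts, now, window_days=30):
    """Return (days since last post, posts in the window, total reposts + comments).

    Assumption: each post is a dict with a datetime "time" and integer
    "reposts" and "comments" counts; user_posts is non-empty.
    """
    latest = max(post["time"] for post in user_posts)
    days_since_last = (now - latest).days
    # Frequency is taken here as the number of posts in a recent window.
    frequency = sum(
        1 for post in user_posts if (now - post["time"]).days < window_days
    )
    # Propagation is taken here as the sum of reposts and comments.
    spread = sum(post["reposts"] + post["comments"] for post in user_posts)
    return days_since_last, frequency, spread


example_posts = [
    {"time": datetime(2014, 5, 20), "reposts": 3, "comments": 2},
    {"time": datetime(2014, 3, 1), "reposts": 1, "comments": 0},
]
attrs = user_attributes(example_posts, now=datetime(2014, 5, 23))
```

For the two example posts, the most recent post is 3 days old, one post falls inside the 30-day window, and the posts collected 6 reposts and comments in total.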

Scheme 5. The user classification apparatus according to scheme 4, wherein the user classification unit comprises:

a level determination module configured to divide the parameter for the most recent time at which a user published content into M levels, the parameter for the frequency with which a user publishes content into N levels, and the content propagation characteristic parameter into P levels, where M, N, and P are all integers greater than 1, thereby determining M×N×P user levels; and

a user classification module configured to classify each user to be classified into one of the M×N×P user levels according to the most recent time at which, and the frequency with which, that user published the predetermined field content and the propagation characteristics of the predetermined field content the user published.
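The M×N×P level scheme can be illustrated with a small sketch. It assumes that each parameter is split into levels by a list of ascending boundary values (k boundaries give k+1 levels), which is only one possible way to define the levels; the names `level` and `classify_user` and the example boundaries are hypothetical.

```python
from bisect import bisect_right


def level(value, boundaries):
    """Map value to a level index in 0..len(boundaries), boundaries ascending."""
    return bisect_right(boundaries, value)


def classify_user(days_since_last, frequency, spread,
                  recency_bounds, frequency_bounds, spread_bounds):
    """Return the (recency, frequency, spread) level triple for one user.

    With M = len(recency_bounds) + 1, N = len(frequency_bounds) + 1 and
    P = len(spread_bounds) + 1, there are M * N * P possible triples.
    """
    return (
        level(days_since_last, recency_bounds),
        level(frequency, frequency_bounds),
        level(spread, spread_bounds),
    )


# Example: M = 2, N = 2, P = 3, giving 12 user levels.
user_level = classify_user(
    days_since_last=2, frequency=5, spread=40,
    recency_bounds=[7], frequency_bounds=[3], spread_bounds=[10, 100],
)
```

Here the user posted within the last week (recency level 0), more than three times (frequency level 1), and with moderate propagation (spread level 1), so `user_level` is `(0, 1, 1)`.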

Scheme 6. The user classification apparatus according to scheme 1 or 2, further comprising a subject word determination unit configured to determine the subject words of the predetermined field so that the content search unit searches for content containing the subject words as predetermined field content, the subject word determination unit comprising:

a first entity word extraction module configured to extract entity words whose frequency is higher than a first threshold from content published by specific users of the predetermined field, forming a first entity word group;

a second entity word extraction module configured to extract entity words whose frequency is higher than a second threshold from content published by specific users of non-predetermined fields unrelated to the predetermined field, forming a second entity word group; and

a subject word determination module configured to use the entity words in the second entity word group to deduplicate the entity words in the first entity word group, and to take the entity words remaining in the first entity word group as the subject words of the predetermined field.

Scheme 7. The user classification apparatus according to scheme 6, wherein the specific users of the predetermined field are users whose influence in the predetermined field exceeds a third threshold, and the specific users of a non-predetermined field are users whose influence in that non-predetermined field exceeds a fourth threshold.

Scheme 8. A user classification method for classifying users of a predetermined field, the user classification method comprising:

searching a predetermined data source for content containing subject words of the predetermined field as predetermined field content, and taking the users who published the predetermined field content as users to be classified; and

classifying the users to be classified according to user-related attributes of the predetermined field content.

Scheme 9. The user classification method according to scheme 8, wherein the predetermined data source is searched for microblog content containing the subject words of the predetermined field as the predetermined field content.

Scheme 10. The user classification method according to scheme 8 or 9, wherein the user-related attributes of the predetermined field content include the behavior pattern with which the users to be classified publish the predetermined field content and the propagation characteristics of the predetermined field content.

Scheme 11. The user classification method according to scheme 10, wherein

the behavior pattern with which a user to be classified publishes the predetermined field content includes the most recent time at which, and the frequency with which, the user published the predetermined field content, and

the propagation characteristics of the predetermined field content are determined by the number of times the predetermined field content has been reposted and/or commented on.

Scheme 12. The user classification method according to scheme 11, wherein classifying the users to be classified comprises:

dividing the parameter for the most recent time at which a user published content into M levels, the parameter for the frequency with which a user publishes content into N levels, and the content propagation characteristic parameter into P levels, where M, N, and P are all integers greater than 1, thereby determining M×N×P user levels; and

classifying each user to be classified into one of the M×N×P user levels according to the most recent time at which, and the frequency with which, that user published the predetermined field content and the propagation characteristics of the predetermined field content the user published.

Scheme 13. The user classification method according to scheme 8 or 9, further comprising determining the subject words of the predetermined field so that the content search unit searches for content containing the subject words as predetermined field content, wherein determining the subject words of the predetermined field comprises:

extracting entity words whose frequency is higher than a first threshold from content published by specific users of the predetermined field, forming a first entity word group;

extracting entity words whose frequency is higher than a second threshold from content published by specific users of non-predetermined fields unrelated to the predetermined field, forming a second entity word group; and

using the entity words in the second entity word group to deduplicate the entity words in the first entity word group, and taking the entity words remaining in the first entity word group as the subject words of the predetermined field.

Scheme 14. The user classification method according to scheme 13, wherein the specific users of the predetermined field are users whose influence in the predetermined field exceeds a third threshold, and the specific users of a non-predetermined field are users whose influence in that non-predetermined field exceeds a fourth threshold.

Scheme 15. An electronic device comprising the user classification apparatus according to any one of schemes 1-7.

Scheme 16. The electronic device according to scheme 15, wherein the electronic device is a mobile phone, a computer, a tablet computer, or a personal digital assistant.

Finally, it should be noted that in the present disclosure, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises the element.

Although the present disclosure has been described above through specific embodiments, it should be understood that those skilled in the art may devise various modifications, improvements, or equivalents of the present disclosure within the spirit and scope of the appended claims. Such modifications, improvements, or equivalents should also be regarded as falling within the scope of protection claimed by the present disclosure.

Claims (10)

1.一种用户分类装置,用于针对预定领域的用户进行分类,该用户分类装置包括:1. A user classification device is used to classify users in predetermined fields, the user classification device comprising: 内容搜索单元,用于在预定数据源中搜索包含该预定领域的主题词的内容作为预定领域内容,并将发布该预定领域内容的用户作为待分类用户;以及A content search unit, configured to search for content containing keywords in the predetermined field in a predetermined data source as content in the predetermined field, and use users who publish content in the predetermined field as users to be classified; and 用户分类单元,用于根据所述预定领域内容的、与用户相关的属性,对所述待分类用户进行分类。The user classification unit is configured to classify the users to be classified according to the user-related attributes of the content in the predetermined field. 2.根据权利要求1所述的用户分类装置,其中所述内容搜索单元用于在预定数据源中搜索包含该预定领域的主题词的微博内容作为所述预定领域内容。2. The user classification device according to claim 1, wherein the content search unit is used to search for microblog content containing keywords in the predetermined field as the content in the predetermined field in a predetermined data source. 3.根据权利要求1或2所述的用户分类装置,其中,所述预定领域内容的、与用户相关的属性包括所述待分类用户发布所述预定领域内容的行为模式和所述预定领域内容的传播特性。3. The user classification device according to claim 1 or 2, wherein the user-related attributes of the content in the predetermined field include the behavior pattern of the user to be classified publishing the content in the predetermined field and the content in the predetermined field propagation characteristics. 4.根据权利要求3所述的用户分类装置,其中,4. 
The user classification device according to claim 3, wherein
the behavior pattern with which the user to be classified publishes the content in the predetermined field includes the most recent time and the frequency at which the user to be classified published the content in the predetermined field, and
the propagation characteristic of the content in the predetermined field is determined by the number of times the content in the predetermined field is forwarded and/or commented on.
5. The user classification device according to claim 4, wherein the user classification unit comprises:
a level determination module for dividing a parameter of the most recent time at which a user published content into M levels, dividing a parameter of the frequency at which a user publishes content into N levels, and dividing a content propagation characteristic parameter into P levels, where M, N and P are each integers greater than 1, thereby determining M×N×P user levels; and
a user classification module for classifying each user to be classified into one of the M×N×P user levels according to the most recent time and the frequency at which that user published the content in the predetermined field and the propagation characteristic of the content in the predetermined field published by that user.
6. The user classification device according to claim 1 or 2, further comprising a subject term determination unit for determining a subject term of the predetermined field so that the content search unit searches for content containing the subject term as the content in the predetermined field, the subject term determination unit comprising:
a first entity word extraction module for extracting, from content published by specific users of the predetermined field, entity words whose frequency is higher than a first threshold, to form a first entity word group;
a second entity word extraction module for extracting, from content published by specific users of a non-predetermined field unrelated to the predetermined field, entity words whose frequency is higher than a second threshold, to form a second entity word group; and
a subject term determination module for deduplicating the entity words in the first entity word group against the entity words in the second entity word group, and taking the entity words remaining in the first entity word group after deduplication as the subject terms of the predetermined field.
7. The user classification device according to claim 6, wherein the specific user of the predetermined field is a user whose influence in the predetermined field exceeds a third threshold, and the specific user of the non-predetermined field is a user whose influence in the non-predetermined field exceeds a fourth threshold.
8. A user classification method for classifying users of a predetermined field, the user classification method comprising:
searching a predetermined data source for content containing a subject term of the predetermined field as the content in the predetermined field, and taking the users who published the content in the predetermined field as users to be classified; and
classifying the users to be classified according to user-related attributes of the content in the predetermined field.
9. An electronic device comprising the user classification device according to any one of claims 1-7.
10. The electronic device according to claim 9, wherein the electronic device is a mobile phone, a computer, a tablet computer, or a personal digital assistant.
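By way of illustration only, the subject term deduplication of claim 6 and the M×N×P level scheme of claim 5 might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: entity words are assumed to have been extracted already and are passed in as plain lists, "frequency" is taken to be a raw occurrence count, and every function name, parameter name and cut point below is hypothetical. The claims do not specify how the level boundaries are chosen.

```python
from collections import Counter


def subject_terms(field_words, other_words, first_threshold, second_threshold):
    """Claim 6 sketch: keep frequent entity words of the field that are not
    also frequent in the unrelated field (counts stand in for frequency)."""
    first_group = {w for w, n in Counter(field_words).items() if n > first_threshold}
    second_group = {w for w, n in Counter(other_words).items() if n > second_threshold}
    return first_group - second_group


def level_at_least(value, cuts):
    """0-based level: how many cut points the value reaches (larger is higher)."""
    return sum(value >= cut for cut in cuts)


def level_at_most(value, cuts):
    """0-based level: how many cut points the value stays within (smaller is higher)."""
    return sum(value <= cut for cut in cuts)


def user_level(days_since_last_post, posts_per_month, forwards_and_comments,
               recency_cuts, frequency_cuts, propagation_cuts):
    """Claim 5 sketch: map one user to one of M x N x P levels.

    The cut points are hypothetical; k cut points yield k + 1 levels.
    """
    n_levels = len(frequency_cuts) + 1    # N
    p_levels = len(propagation_cuts) + 1  # P
    m = level_at_most(days_since_last_post, recency_cuts)
    n = level_at_least(posts_per_month, frequency_cuts)
    p = level_at_least(forwards_and_comments, propagation_cuts)
    # Flatten (m, n, p) into a single index in range(M * N * P).
    return (m, n, p), (m * n_levels + n) * p_levels + p


# Hypothetical data: "today" is frequent in both fields, so it is removed.
terms = subject_terms(
    ["camera", "lens", "camera", "lens", "today", "today", "tripod"],
    ["today", "today", "weather", "weather"],
    first_threshold=1,
    second_threshold=1,
)
levels, index = user_level(
    days_since_last_post=3, posts_per_month=12, forwards_and_comments=80,
    recency_cuts=(7, 30), frequency_cuts=(2, 10), propagation_cuts=(50,),
)
```

With these hypothetical cut points, M = 3, N = 3 and P = 2, giving 18 user levels; the example user falls into the highest one because the content was published recently, often, and was widely forwarded or commented on.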
CN201410222082.4A 2014-05-23 2014-05-23 User classification apparatus, user classification method and electronic device Pending CN105095324A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410222082.4A CN105095324A (en) 2014-05-23 2014-05-23 User classification apparatus, user classification method and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410222082.4A CN105095324A (en) 2014-05-23 2014-05-23 User classification apparatus, user classification method and electronic device

Publications (1)

Publication Number Publication Date
CN105095324A true CN105095324A (en) 2015-11-25

Family

ID=54575769

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410222082.4A Pending CN105095324A (en) 2014-05-23 2014-05-23 User classification apparatus, user classification method and electronic device

Country Status (1)

Country Link
CN (1) CN105095324A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105512943A (en) * 2015-12-18 2016-04-20 合肥寰景信息技术有限公司 Intelligent analysis method of user information in network community
CN106095915A (en) * 2016-06-08 2016-11-09 百度在线网络技术(北京)有限公司 The processing method and processing device of user identity
CN107015993A (en) * 2016-01-28 2017-08-04 中国移动通信集团上海有限公司 A kind of user type recognition methods and device

Similar Documents

Publication Title
TWI718643B (en) Method and device for identifying abnormal groups
WO2019041521A1 (en) Apparatus and method for extracting user keyword, and computer-readable storage medium
WO2017020451A1 (en) Information push method and device
CN103294781B (en) A kind of method and apparatus for processing page data
US20130198240A1 (en) Social Network Analysis
CN111191012B (en) Knowledge graph generation device and method and computer readable storage medium thereof
CN101593204A (en) A Sentiment Analysis System Based on News Comment Webpage
US10002187B2 (en) Method and system for performing topic creation for social data
JP2017142796A (en) Identification and extraction of information
US20140147048A1 (en) Document quality measurement
CN104615715A (en) Social network event analyzing method and system based on geographic positions
US20140082183A1 (en) Detection and handling of aggregated online content using characterizing signatures of content items
CN105573971B (en) Table reconfiguration device and method
CN103164428B (en) Determine the method and apparatus of the correlativity of microblogging and given entity
CN103678371B (en) Word library updating device, data integration device and method and electronic equipment
CN111177719A (en) Address class determination method, device, computer-readable storage medium and device
CN114330329A (en) A service content search method, device, electronic device and storage medium
CN104899201A (en) Text extraction method and device, sensitive word judgment method and device, and servers
CN103164415B (en) Based on expanded keyword acquisition methods and the equipment of microblog
CN104794209A (en) Chinese microblog sentiment classification method and system based on Markov logic network
CN104572904B (en) A kind of determination method and device of label correlation degree
CN103514168B (en) Data processing method and device
CN103678356B (en) A kind of method, apparatus and equipment of the application field attribute information for being used to obtain keyword
CN109446322B (en) Text analysis method, apparatus, electronic device and readable storage medium
CN105095324A (en) User classification apparatus, user classification method and electronic device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20151125