
CN106547921B - Label generation method and device - Google Patents

Info

Publication number
CN106547921B
Authority
CN
China
Prior art keywords
matrix
label
user
unknown
access information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611116938.5A
Other languages
Chinese (zh)
Other versions
CN106547921A (en
Inventor
赵博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shouxin Huixin Beijing Technology Co ltd
Original Assignee
Neusoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Neusoft Corp
Priority to CN201611116938.5A
Publication of CN106547921A
Application granted
Publication of CN106547921B
Active (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/5866 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/48 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/686 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title or artist information, time, location or usage information, user ratings
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/7867 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The disclosure provides a label generation method and device that offer a better way of generating labels for objects. The method includes: determining access information describing how known objects and unknown objects are accessed by users within a preset time period, where a known object is an object that has already been labeled, an unknown object is an object that has not been labeled, and labels are used to classify objects; and determining the labels of the unknown objects according to the labels of the known objects and the access information.

Description

Label generation method and device

Technical field

The present disclosure relates to the field of computer technology and, in particular, to a label generation method and device.

Background

In today's information-saturated society, labels can be used to describe and classify objects. For example, if the objects are news items, each item can be given labels that match its content, such as science and technology, people's livelihood, economy, or military. Labels play an important role in information recommendation and also help people understand information resources more effectively, which benefits the discovery, management, dissemination, and use of information.

At present there are usually two ways to label objects. The first is manual labeling, which cannot cope with massive numbers of objects and yields labels whose quality fluctuates with human factors. The second analyzes the content of the object itself (such as the text of an article or the detailed description of a product) and uses data mining and machine learning algorithms to generate labels automatically from that text. In practice, however, some objects lack descriptive information or explanatory text, so data mining and machine learning may be unable to generate labels for them.

It can be seen that no satisfactory way of labeling objects currently exists.

Summary of the invention

The purpose of the present disclosure is to provide a label generation method and device that offer a better way of generating labels for objects.

According to a first aspect of the embodiments of the present disclosure, a label generation method is provided, including:

determining access information describing how known objects and unknown objects are accessed by users within a preset time period, where the known objects are objects that have been labeled, the unknown objects are objects that have not been labeled, and the labels are used to classify objects; and

determining the labels of the unknown objects according to the labels of the known objects and the access information.

Optionally, the known objects include multiple objects, the unknown objects include at least one object, and the users include multiple users.

After the access information of the known objects and the unknown objects within the preset time period is determined, the method further includes:

generating a behavior matrix A of the known objects, where the behavior matrix A indicates the access information of each of the multiple users for each of the known objects; and

generating a behavior matrix B of the unknown objects, where the behavior matrix B indicates the access information of each of the multiple users for each of the unknown objects.

Determining the labels of the unknown objects according to the labels of the known objects and the access information includes:

generating a label matrix N of the unknown objects according to the label matrix M of the known objects, the behavior matrix A, and the behavior matrix B, so as to determine the labels of the unknown objects, where a label matrix indicates the weight of each object for each of the different labels.

Optionally, a first user is any one of the multiple users, and a first object is any one of the known objects or the unknown objects.

Determining the access information of the first object being accessed by the first user within the preset time period includes:

determining the access information according to the number of times the first user accesses the first object within the preset time period and/or the length of time the first user spends accessing the first object.

Optionally, generating the label matrix N of the unknown objects according to the label matrix M of the known objects, the behavior matrix A, and the behavior matrix B includes:

multiplying the behavior matrix A by the label matrix M, that is, W = A × M, to obtain a user label matrix W for the known objects; and

multiplying the transpose of the behavior matrix B by the user label matrix W, that is, N = B^T × W, to obtain the label matrix N of the unknown objects.

Optionally, generating the label matrix N of the unknown objects according to the label matrix M of the known objects, the behavior matrix A, and the behavior matrix B includes:

generating the label matrix N of the unknown objects according to the label matrix M of the known objects, the behavior matrix A, the behavior matrix B, and a penalty function F(C), where C is the number of times an object is accessed and n is a preset count threshold.

According to a second aspect of the embodiments of the present disclosure, a label generation device is provided, including:

a first determination module, configured to determine access information describing how known objects and unknown objects are accessed by users within a preset time period, where the known objects are objects that have been labeled, the unknown objects are objects that have not been labeled, and the labels are used to classify objects; and

a second determination module, configured to determine the labels of the unknown objects according to the labels of the known objects and the access information.

Optionally, the known objects include multiple objects, the unknown objects include at least one object, and the users include multiple users.

The device further includes:

a first generation module, configured to generate, after the access information of the known objects and the unknown objects within the preset time period is determined, a behavior matrix A of the known objects, where the behavior matrix A indicates the access information of each of the multiple users for each of the known objects; and

to generate a behavior matrix B of the unknown objects, where the behavior matrix B indicates the access information of each of the multiple users for each of the unknown objects; and

a second generation module, configured to generate a label matrix N of the unknown objects according to the label matrix M of the known objects, the behavior matrix A, and the behavior matrix B, so as to determine the labels of the unknown objects, where a label matrix indicates the weight of each object for each of the different labels.

Optionally, a first user is any one of the multiple users, and a first object is any one of the known objects or the unknown objects.

The first determination module is configured to:

determine the access information of the first object being accessed by the first user according to the number of times the first user accesses the first object within the preset time period and/or the length of time the first user spends accessing the first object.

Optionally, the second generation module is configured to:

multiply the behavior matrix A by the label matrix M, that is, W = A × M, to obtain a user label matrix W for the known objects; and

multiply the transpose of the behavior matrix B by the user label matrix W, that is, N = B^T × W, to obtain the label matrix N of the unknown objects.

Optionally, the second generation module is configured to:

generate the label matrix N of the unknown objects according to the label matrix M of the known objects, the behavior matrix A, the behavior matrix B, and a penalty function F(C), where C is the number of times an object is accessed and n is a preset count threshold.

According to a third aspect of the embodiments of the present disclosure, a non-transitory computer-readable storage medium is provided. When the instructions in the storage medium are executed by a processor of a computer, the computer is enabled to perform a label generation method, the method including:

determining access information describing how known objects and unknown objects are accessed by users within a preset time period, where the known objects are objects that have been labeled, the unknown objects are objects that have not been labeled, and the labels are used to classify objects; and

determining the labels of the unknown objects according to the labels of the known objects and the access information.

With the above technical solution, the labels of unknown objects can be determined from known objects that have already been labeled, based on users' access behavior toward both the known objects and the unlabeled unknown objects. For any single user, preferences can be considered relatively stable, and the objects that user accesses frequently are usually related. Determining the labels of unknown objects from statistics on users' access to known and unknown objects can therefore predict those labels fairly accurately without manual labeling, and labels can likewise be generated fairly accurately for unknown objects that lack descriptive information.

Other features and advantages of the present disclosure are described in detail in the detailed description that follows.

Description of the drawings

The accompanying drawings provide a further understanding of the present disclosure and constitute a part of the specification. Together with the following detailed description they serve to explain the present disclosure, but they do not limit it. In the drawings:

Fig. 1 is a flowchart of a label generation method according to an exemplary embodiment.

Fig. 2 is a block diagram of a label generation device according to an exemplary embodiment.

Fig. 3 is another block diagram of a label generation device according to an exemplary embodiment.

Detailed description

Specific embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here serve only to illustrate and explain the present disclosure and are not intended to limit it.

Fig. 1 is a flowchart of a label generation method according to an exemplary embodiment. As shown in Fig. 1, the method can be applied in a computer and includes the following steps.

Step S11: Determine the access information describing how known objects and unknown objects are accessed by users within a preset time period.

Step S12: Determine the labels of the unknown objects according to the labels of the known objects and the access information.

Here, the known objects are objects that have been labeled, the unknown objects are objects that have not been labeled, and labels can be used to classify objects.

An object in the embodiments of the present disclosure can be anything that can be labeled, for example news items on a news website, products on a shopping website, novels, movies, or websites. Labels can be used to classify objects. For example, if the labels of a certain movie include "suspense", the movie can be classified as a suspense movie.

A known object is an object that has already been labeled. For example, the labels of a certain news item may include economy, real estate, and financial management. An unknown object is an object that has not yet been labeled. The embodiments of the present disclosure label unknown objects using the labels of the known objects and the users' access behavior.

Optionally, a first user is any one of the multiple users, and a first object is any one of the known objects or the unknown objects. The access information of the first object being accessed by the first user within the preset time period can be determined according to the number of times the first user accesses the first object within the preset time period and/or the length of time the first user spends accessing the first object.

The preset time period can be any period set in advance, for example the most recent month or the most recent six months. The embodiments of the present disclosure do not limit this.

Access information is generated whenever a user accesses any of the known or unknown objects, and it characterizes the user's behavior in accessing that object. Taking the first user accessing the first object as an example, the access information can be determined from the number of times the first user accesses the first object within the preset time period, from the length of time the first user spends accessing the first object within the preset time period, or from the number of accesses and the length of time together.

For example, in the most recent month user 1 browsed product 1 five times and product 2 ten times. If the access information reflects the number of accesses, the access information of user 1 for product 1 can be recorded as p1 = 5 and that for product 2 as p2 = 10, so p1 < p2.

As another example, in the most recent month user 1 spent a total of 30 minutes browsing product 1 and a total of 10 minutes browsing product 2. If the access information reflects the length of access, the access information of user 1 for product 1 can be recorded as p1 = 30 and that for product 2 as p2 = 10, so p1 > p2.

As a further example, in the most recent month user 1 browsed product 1 five times for a total of 30 minutes and browsed product 2 twice for a total of 10 minutes. The access information can be a single value that combines the number of accesses and the length of access. If each access scores 1 point and every 10 minutes of access scores 1 point, the access information of user 1 for product 1 is p1 = 5 + 3 = 8 and that for product 2 is p2 = 2 + 1 = 3, so p1 > p2.
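The combined scoring rule in the last example can be sketched in a few lines of Python. The function name `access_score` and its parameter names are illustrative assumptions; only the rule itself (1 point per access, 1 point per 10 minutes) comes from the text.

```python
def access_score(visits: int, minutes: float,
                 points_per_visit: float = 1.0,
                 minutes_per_point: float = 10.0) -> float:
    """Combine the access count and total access time into one value."""
    return visits * points_per_visit + minutes / minutes_per_point


# User 1, product 1: five accesses, 30 minutes in total.
p1 = access_score(visits=5, minutes=30)
# User 1, product 2: two accesses, 10 minutes in total.
p2 = access_score(visits=2, minutes=10)
print(p1, p2)  # 8.0 3.0
```

The weights are keyword parameters so that a count-only or duration-only variant can be obtained by changing the defaults.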

In this way the access information characterizes the users' access behavior fairly accurately, which in turn helps to generate the labels of the unknown objects more accurately.

How the labels of the unknown objects are generated is described below.

Optionally, the known objects include multiple objects, the unknown objects include at least one object, and the users include multiple users. After the access information of the known objects and the unknown objects within the preset time period has been determined, a behavior matrix A of the known objects can be generated, where A indicates the access information of each of the multiple users for each known object, and a behavior matrix B of the unknown objects can be generated, where B indicates the access information of each of the multiple users for each unknown object. A label matrix N of the unknown objects can then be generated from the label matrix M of the known objects, the behavior matrix A, and the behavior matrix B, so as to determine the labels of the unknown objects. A label matrix indicates the weight of each object for each of the different labels.

In the embodiments of the present disclosure, the numbers of known objects and users can be large. In practice, the more known objects and users are counted, the better the data reflects the users' access habits, and the more accurate the resulting labels of the unknown objects are. Representing the labels of the known objects and the users' access behavior as matrices makes it possible to record large amounts of data accurately and conveniently, which helps to generate the labels of the unknown objects more accurately.
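As a minimal sketch of how the behavior matrices might be assembled, the following Python function turns a list of access records into a user-by-object matrix. The record layout `(user, object, score)`, the function name, and the sample log are assumptions made for illustration; the text only specifies that rows correspond to users and columns to objects.

```python
def build_behavior_matrix(records, users, objects):
    """Return a len(users) x len(objects) matrix of summed access scores.

    records: iterable of (user, object, score) tuples. Records whose user
    or object is not listed are ignored.
    """
    user_index = {u: r for r, u in enumerate(users)}
    object_index = {o: c for c, o in enumerate(objects)}
    matrix = [[0.0] * len(objects) for _ in users]
    for user, obj, score in records:
        if user in user_index and obj in object_index:
            matrix[user_index[user]][object_index[obj]] += score
    return matrix


# Hypothetical access log: objects "a" and "b" are labeled, "d" is not.
log = [("u1", "a", 2), ("u1", "b", 1), ("u2", "a", 4),
       ("u1", "a", 1), ("u2", "d", 3)]
A = build_behavior_matrix(log, ["u1", "u2"], ["a", "b"])  # known objects
B = build_behavior_matrix(log, ["u1", "u2"], ["d"])       # unknown objects
print(A)  # [[3.0, 1.0], [4.0, 0.0]]
print(B)  # [[0.0], [3.0]]
```

The same log is used for both matrices; only the list of objects differs, so A and B always share the same user order.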

Optionally, to generate the label matrix N of the unknown objects from the label matrix M of the known objects, the behavior matrix A, and the behavior matrix B, the behavior matrix A can be multiplied by the label matrix M to obtain the user label matrix W for the known objects, and the transpose of the behavior matrix B can then be multiplied by the user label matrix W to obtain the label matrix N of the unknown objects.

A label matrix indicates the weight of each object for each of the different labels. For example, if in the label matrix news item 1 has a weight of 5 for the military label and a weight of 2 for the society label, then the military label carries more weight for news item 1 than the society label does.

Let U be the user set, which includes multiple users, T the label set, I the set of known objects, and J the set of unknown objects, and let u, t, i, and j be the sizes of these sets. The label matrix of the known objects is M(i×t), the behavior matrix of the known objects is A(u×i), and the behavior matrix of the unknown objects is B(u×j).

The user label matrix can be generated as:

W(u×t) = A(u×i) × M(i×t)    (1)

The label matrix of the set J of unknown objects can then be generated as:

N(j×t) = B(u×j)^T × W(u×t)    (2)
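Written element by element, formulas (1) and (2) make the propagation explicit: a label's weight flows from a known object to a user through A, and from the user to an unknown object through B. The index names below (u for a user, i for a known object, j for an unknown object, t for a label) are a notational assumption that follows the matrix dimensions given in the text.

```latex
W_{u,t} = \sum_{i \in I} A_{u,i}\, M_{i,t},
\qquad
N_{j,t} = \sum_{u \in U} B_{u,j}\, W_{u,t}
        = \sum_{u \in U} \sum_{i \in I} B_{u,j}\, A_{u,i}\, M_{i,t}
```

In this form, an unknown object j receives a large weight for label t when users who access j heavily also heavily access known objects that carry label t.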

In practice, so that the resulting data reflects user behavior more faithfully, the matrices W(u×t) and N(j×t) can also be divided by the access information of the users' accesses to the objects.
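The text does not spell out how this division is carried out. One possible reading, shown here purely as an assumption, is to divide each row of a matrix by a per-row total of access information (for W, the total access score of the corresponding user). The function name `normalize_rows` and the sample values are hypothetical.

```python
def normalize_rows(matrix, totals):
    """Divide each row of `matrix` by the matching entry of `totals`.

    ASSUMPTION: per-row normalization is only one reading of "divide by the
    access information". Rows whose total is zero are returned unchanged to
    avoid division by zero.
    """
    return [[value / total for value in row] if total else list(row)
            for row, total in zip(matrix, totals)]


# Hypothetical values: the first user has a total access score of 4,
# the second user accessed nothing.
W = [[4, 8], [0, 0]]
totals = [4, 0]
print(normalize_rows(W, totals))  # [[1.0, 2.0], [0, 0]]
```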

Optionally, the label matrix N of the unknown objects can be generated according to the label matrix M of the known objects, the behavior matrix A, the behavior matrix B, and a penalty function F(C), where C is the number of times an object is accessed and n is a preset count threshold.

In the embodiments of the present disclosure, to prevent the behavior of a small number of highly active users from biasing the result, the penalty function F(C) is introduced for users with an excessive number of actions. For example, the penalty function can be applied to users with more than 100 accesses, that is, n can be set to 100.

After the penalty function is introduced, formula (1) becomes:

W(u×t) = A(u×i) × M(i×t) / F(C)    (3)

Formula (2) becomes:

N(j×t) = B(u×j)^T × W(u×t) / F(C)    (4)
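The definition of F(C) is not reproduced in this text, so the following Python sketch uses a hypothetical stand-in: no penalty up to the threshold n, and a divisor that grows linearly with C above it. Both the form of `penalty` and the row-wise way `apply_penalty` uses it are assumptions, shown only to illustrate how dividing by F(C) damps very active rows.

```python
def penalty(count: float, threshold: float = 100) -> float:
    """HYPOTHETICAL F(C): 1 up to the threshold n, then C / n above it."""
    return 1.0 if count <= threshold else count / threshold


def apply_penalty(matrix, counts, threshold: float = 100):
    """Divide each row by F(C), where C is that row's access count."""
    return [[value / penalty(count, threshold) for value in row]
            for row, count in zip(matrix, counts)]


# Hypothetical values: the second row belongs to a user with 300 accesses,
# which is three times the threshold n = 100.
W = [[10.0, 20.0], [300.0, 600.0]]
counts = [50, 300]
print(apply_penalty(W, counts))  # [[10.0, 20.0], [100.0, 200.0]]
```

Any other non-decreasing function with F(C) = 1 for C ≤ n could be substituted without changing `apply_penalty`.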

The technical solution in the embodiments of the present disclosure is described below through a specific example.

The known objects are object A, object B, and object C. The label set T contains label 1, label 2, label 3, label 4, and label 5. The user set U contains user A, user B, and user C. The unknown objects are object D and object E.

The label matrix M(i×t) of the known objects is:

            Label 1   Label 2   Label 3   Label 4   Label 5
Object A       2         5         0         3         1
Object B       2         0         4         3         2
Object C       3         1         3         4         3

The behavior matrix A(u×i) of all users in the user set U on the known objects is:

            Object A   Object B   Object C
User A         2          1          3
User B         4          0          1
User C         0          4          3

The user label matrix W(u×t) = A(u×i) × M(i×t) / access information can then be generated.
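The resulting matrix is not reproduced in this text. As a sketch, the unnormalized product A × M can be computed directly from the two tables above; the division by access information is left out here because its exact form is not given, so these values are not necessarily the ones in the original figure. The helper name `matmul` is illustrative.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


# Behavior matrix A (rows: users A, B, C; columns: objects A, B, C).
A = [[2, 1, 3],
     [4, 0, 1],
     [0, 4, 3]]
# Label matrix M (rows: objects A, B, C; columns: labels 1 to 5).
M = [[2, 5, 0, 3, 1],
     [2, 0, 4, 3, 2],
     [3, 1, 3, 4, 3]]

W = matmul(A, M)  # unnormalized user label matrix, formula (1)
for user, row in zip(["User A", "User B", "User C"], W):
    print(user, row)
# User A [15, 13, 13, 21, 13]
# User B [11, 21, 3, 16, 7]
# User C [17, 3, 25, 24, 17]
```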

The behavior matrix B(u×j) of all users in the user set U on the unknown objects is:

            Object D   Object E
User A         4          1
User B         3          2
User C         1          5

The label matrix of the unknown objects, N(j×t) = B(u×j)^T × W(u×t) / access information, can then be generated.

Once the label matrix N of the unknown objects has been obtained, the labels of each unknown object can be determined and the corresponding labels generated for it. In this way the labels of unknown objects can be predicted fairly accurately without labeling the objects manually, and labels can likewise be generated fairly accurately, from the users' access behavior, for unknown objects that lack descriptive information.
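The whole example can be traced end to end with a short Python sketch. It applies formulas (1) and (2) without the division by access information or the penalty function, so the numbers are unnormalized, and it picks the two highest-weighted labels per object. The text does not say how labels are selected from N, so the top-k rule and the helper names are assumptions.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


def transpose(m):
    """Swap the rows and columns of a matrix."""
    return [list(col) for col in zip(*m)]


def top_labels(weights, names, k=2):
    """ASSUMED selection rule: keep the k labels with the largest weights."""
    ranked = sorted(zip(names, weights), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]


labels = ["Label 1", "Label 2", "Label 3", "Label 4", "Label 5"]
M = [[2, 5, 0, 3, 1], [2, 0, 4, 3, 2], [3, 1, 3, 4, 3]]  # objects A-C x labels
A = [[2, 1, 3], [4, 0, 1], [0, 4, 3]]                    # users x objects A-C
B = [[4, 1], [3, 2], [1, 5]]                             # users x objects D, E

W = matmul(A, M)             # formula (1), unnormalized
N = matmul(transpose(B), W)  # formula (2), unnormalized
print(N)  # [[110, 118, 86, 156, 90], [122, 70, 144, 173, 112]]
for name, row in zip(["Object D", "Object E"], N):
    print(name, top_labels(row, labels))
# Object D ['Label 4', 'Label 2']
# Object E ['Label 4', 'Label 3']
```

A weight threshold could replace the top-k rule if objects should be allowed a variable number of labels.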

请参见图2,基于同一发明构思,本公开实施例提供一种标签生成装置100,该装置100可以包括:Referring to FIG. 2, based on the same inventive concept, an embodiment of the present disclosure provides a label generation device 100, which may include:

第一确定模块101,用于确定已知对象和未知对象在预设时长内被用户访问的访问信息,其中,已知对象为已打标签的对象,未知对象为未打标签的对象,标签用于归类对象;The first determination module 101 is used to determine the access information of known objects and unknown objects accessed by users within a preset time period, wherein the known objects are tagged objects, unknown objects are untagged objects, and tags are used to classify objects;

第二确定模块102,用于根据已知对象的标签以及访问信息,确定未知对象的标签。The second determination module 102 is configured to determine the label of the unknown object according to the label of the known object and the access information.

Optionally, referring to FIG. 3, the known objects include multiple objects, the unknown objects include at least one object, and the users include multiple users.

The device 100 further includes:

a first generation module 103, configured to generate, after the access information of the known objects and the unknown objects within the preset time period has been determined, a behavior matrix A of the known objects, where the behavior matrix A indicates the access information of each of the multiple users for each of the known objects; and

to generate a behavior matrix B of the unknown objects, where the behavior matrix B indicates the access information of each of the multiple users for each of the unknown objects; and

a second generation module 104, configured to generate a label matrix N of the unknown objects according to a label matrix M of the known objects, the behavior matrix A, and the behavior matrix B, so as to determine the labels of the unknown objects, where a label matrix indicates the weight with which each object corresponds to each of the different labels.
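
The behavior matrices handled by the first generation module can be sketched as follows. This is a minimal example under stated assumptions: the access log format (a list of `(user_id, object_id)` events already restricted to the preset time period), the orientation (one row per object, one column per user), the use of raw access counts as the access information, and the name `build_behavior_matrix` are all hypothetical.

```python
def build_behavior_matrix(access_log, object_ids, user_ids):
    """Return a matrix with one row per object and one column per user.

    Each entry counts how often that user accessed that object.
    """
    row = {obj: r for r, obj in enumerate(object_ids)}
    col = {user: c for c, user in enumerate(user_ids)}
    matrix = [[0] * len(user_ids) for _ in object_ids]
    for user, obj in access_log:
        # Ignore events for objects or users outside this matrix.
        if obj in row and user in col:
            matrix[row[obj]][col[user]] += 1
    return matrix


# Hypothetical log: "a" and "b" are known objects, "x" is an unknown object.
log = [("u1", "a"), ("u1", "a"), ("u2", "b"), ("u2", "x"), ("u1", "x")]
users = ["u1", "u2"]
A = build_behavior_matrix(log, ["a", "b"], users)  # behavior matrix of known objects
B = build_behavior_matrix(log, ["x"], users)       # behavior matrix of unknown objects
```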

Optionally, a first user is any one of the multiple users, and a first object is any object among the known objects or the unknown objects.

The first determination module 101 is configured to:

determine the access information of the first object being accessed by the first user according to the number of times the first user accesses the first object within the preset time period and/or the duration for which the first user accesses the first object.
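
One way to turn access count and/or access duration into a single access-information value is a weighted sum, sketched below. The linear combination, the weights `count_weight` and `duration_weight`, and the function name `access_info` are assumptions for illustration only; the description does not state how the two quantities are combined.

```python
def access_info(durations, count_weight=1.0, duration_weight=0.0):
    """Combine visit count and total access time into one value.

    durations is a hypothetical list of visit lengths (in seconds) for one
    user and one object within the preset time period.
    """
    visits = len(durations)
    total_seconds = sum(durations)
    return count_weight * visits + duration_weight * total_seconds


visits_of_first_user = [30, 90]  # two visits, 120 seconds in total

# Count only: the default weights ignore duration.
by_count = access_info(visits_of_first_user)

# Count and duration together, with a hypothetical duration weight.
both = access_info(visits_of_first_user, count_weight=1.0, duration_weight=0.25)
```

Setting `count_weight` to zero gives a duration-only measure, which covers the "and/or" wording above.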

Optionally, the second generation module 104 is configured to:

multiply the label matrix M by the behavior matrix A to obtain a user label matrix W of the known objects; and

multiply the user label matrix W by the transpose of the behavior matrix B to obtain the label matrix N of the unknown objects.
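
The two multiplications above, W = M × A followed by N = W × Bᵀ, can be sketched in plain Python. The matrix shapes are an assumption chosen so that the products are well defined: M has one row per label and one column per known object, while A and B each have one row per object and one column per user. The helper names are hypothetical.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    if len(X[0]) != len(Y):
        raise ValueError("inner dimensions do not match")
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    """Swap the rows and columns of a matrix."""
    return [list(col) for col in zip(*X)]


def propagate_labels(M, A, B):
    """Compute N = (M x A) x B^T under the assumed shapes."""
    W = matmul(M, A)                # labels x users
    return matmul(W, transpose(B))  # labels x unknown objects


M = [[1, 0],   # label 0 applies to known object a
     [0, 1]]   # label 1 applies to known object b
A = [[2, 0],   # object a: accessed twice by u1
     [0, 1]]   # object b: accessed once by u2
B = [[1, 1],   # unknown object x: accessed once by each user
     [0, 3]]   # unknown object y: accessed three times by u2
N = propagate_labels(M, A, B)
```

In this toy data, object y is only accessed by u2, who only accessed the label-1 object b, so y receives weight only for label 1.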

Optionally, the second generation module 104 is configured to:

generate the label matrix N of the unknown objects according to the label matrix M of the known objects, the behavior matrix A, the behavior matrix B, and a penalty function F(C), where C is the number of times an object is accessed and n is a preset count threshold.

In the embodiments provided in the present disclosure, it should be understood that the disclosed devices and methods may be implemented in other ways. For example, the device embodiments described above are only illustrative: the division into modules or units is only a division by logical function, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed.

The functional modules in the embodiments of the present application may be integrated into one processing unit, may each exist physically on their own, or two or more modules may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and is sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, ROM (Read-Only Memory), RAM (Random Access Memory), a magnetic disk, or an optical disc.

The above embodiments are only intended to describe the technical solutions of the present disclosure in detail. Their description only serves to help in understanding the methods and core ideas of the present disclosure and should not be construed as limiting it. Any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed herein shall fall within the protection scope of the present disclosure.

Claims (8)

1. A label generation method, characterized in that the method comprises:
determining access information of known objects and unknown objects being accessed by users within a preset time period, wherein the known objects are objects that have been labeled, the unknown objects are objects that have not been labeled, and the labels are used to classify objects;
determining the labels of the unknown objects according to the labels of the known objects and the access information;
wherein the known objects comprise multiple objects, the unknown objects comprise at least one object, and the users comprise multiple users;
after determining the access information of the known objects and the unknown objects being accessed by the users within the preset time period, the method further comprises:
generating a behavior matrix A of the known objects, wherein the behavior matrix A indicates the access information of each of the multiple users for each of the known objects; and
generating a behavior matrix B of the unknown objects, wherein the behavior matrix B indicates the access information of each of the multiple users for each of the unknown objects;
and determining the labels of the unknown objects according to the labels of the known objects and the access information comprises:
generating a label matrix N of the unknown objects according to a label matrix M of the known objects, the behavior matrix A, and the behavior matrix B, so as to determine the labels of the unknown objects, wherein a label matrix indicates the weight with which each object corresponds to each of the different labels.
2. The method according to claim 1, characterized in that a first user is any one of the multiple users, and a first object is any object among the known objects or the unknown objects;
determining the access information of the first object being accessed by the first user within the preset time period comprises:
determining the access information of the first object being accessed by the first user according to the number of times the first user accesses the first object within the preset time period and/or the duration for which the first user accesses the first object.
3. The method according to claim 1, characterized in that generating the label matrix N of the unknown objects according to the label matrix M of the known objects, the behavior matrix A, and the behavior matrix B comprises:
multiplying the label matrix M by the behavior matrix A to obtain a user label matrix W of the known objects; and
multiplying the user label matrix W by the transpose of the behavior matrix B to obtain the label matrix N of the unknown objects.
4. The method according to claim 1, characterized in that generating the label matrix N of the unknown objects according to the label matrix M of the known objects, the behavior matrix A, and the behavior matrix B comprises:
generating the label matrix N of the unknown objects according to the label matrix M of the known objects, the behavior matrix A, the behavior matrix B, and a penalty function F(C), wherein C is the number of times an object is accessed and n is a preset count threshold.
5. A label generation device, characterized in that the device comprises:
a first determination module, configured to determine access information of known objects and unknown objects being accessed by users within a preset time period, wherein the known objects are objects that have been labeled, the unknown objects are objects that have not been labeled, and the labels are used to classify objects;
a second determination module, configured to determine the labels of the unknown objects according to the labels of the known objects and the access information;
wherein the known objects comprise multiple objects, the unknown objects comprise at least one object, and the users comprise multiple users;
the device further comprises:
a first generation module, configured to generate, after the access information of the known objects and the unknown objects being accessed by the users within the preset time period has been determined, a behavior matrix A of the known objects, wherein the behavior matrix A indicates the access information of each of the multiple users for each of the known objects; and
to generate a behavior matrix B of the unknown objects, wherein the behavior matrix B indicates the access information of each of the multiple users for each of the unknown objects; and
a second generation module, configured to generate a label matrix N of the unknown objects according to a label matrix M of the known objects, the behavior matrix A, and the behavior matrix B, so as to determine the labels of the unknown objects, wherein a label matrix indicates the weight with which each object corresponds to each of the different labels.
6. The device according to claim 5, characterized in that a first user is any one of the multiple users, and a first object is any object among the known objects or the unknown objects;
the first determination module is configured to:
determine the access information of the first object being accessed by the first user according to the number of times the first user accesses the first object within the preset time period and/or the duration for which the first user accesses the first object.
7. The device according to claim 5, characterized in that the second generation module is configured to:
multiply the label matrix M by the behavior matrix A to obtain a user label matrix W of the known objects; and
multiply the user label matrix W by the transpose of the behavior matrix B to obtain the label matrix N of the unknown objects.
8. The device according to claim 5, characterized in that the second generation module is configured to:
generate the label matrix N of the unknown objects according to the label matrix M of the known objects, the behavior matrix A, the behavior matrix B, and a penalty function F(C), wherein C is the number of times an object is accessed and n is a preset count threshold.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611116938.5A CN106547921B (en) 2016-12-07 2016-12-07 Label generation method and device

Publications (2)

Publication Number Publication Date
CN106547921A CN106547921A (en) 2017-03-29
CN106547921B true CN106547921B (en) 2019-11-15

Family

ID=58396469

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611116938.5A Active CN106547921B (en) 2016-12-07 2016-12-07 Label generation method and device

Country Status (1)

Country Link
CN (1) CN106547921B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20250228

Address after: Room E162, Room 301 to 353, No.1 Kehua Street, Tianhe District, Guangzhou City, Guangdong Province, 510640 (Office only)

Patentee after: Guangzhou binju Technology Co.,Ltd.

Country or region after: China

Address before: Hunnan rookie street Shenyang city Liaoning province 110179 No. 2

Patentee before: NEUSOFT Corp.

Country or region before: China

TR01 Transfer of patent right

Effective date of registration: 20250325

Address after: 0639, 3rd Floor, Building 1, No. 2 Hongye East Road, Daxing District, Beijing 102600

Patentee after: Shouxin Huixin (Beijing) Technology Co.,Ltd.

Country or region after: China

Address before: Room E162, Room 301 to 353, No.1 Kehua Street, Tianhe District, Guangzhou City, Guangdong Province, 510640 (Office only)

Patentee before: Guangzhou binju Technology Co.,Ltd.

Country or region before: China
