
CN113573242B - Identification method, device and equipment of re-networking user - Google Patents


Info

Publication number
CN113573242B
Authority
CN
China
Prior art keywords
user
feature
behavior
cube
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010350086.6A
Other languages
Chinese (zh)
Other versions
CN113573242A (en)
Inventor
蔡国庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
Research Institute of China Mobile Communication Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
Research Institute of China Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and Research Institute of China Mobile Communication Co Ltd
Priority to CN202010350086.6A
Publication of CN113573242A
Application granted
Publication of CN113573242B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/02 Services making use of location information
    • H04W4/029 Location-based management or tracking services
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B17/00 Monitoring; Testing
    • H04B17/30 Monitoring; Testing of propagation channels
    • H04B17/309 Measuring or estimating channel quality parameters
    • H04B17/318 Received signal strength
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/08 Testing, supervising or monitoring using real traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W8/00 Network data management
    • H04W8/26 Network addressing or numbering for mobility support

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • Electromagnetism (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method, a device and equipment for identifying re-entry users. The method includes: acquiring behavior feature vectors of at least two users, where each behavior feature vector records the occurrence time, spatial position and intensity representation information of a target behavior; and, according to the feature points that the behavior feature vectors of each user determine in a spatio-temporal behavior feature cube, performing a similarity analysis between the feature points of a first user and those of a second user to judge whether the second user is a re-entry user of the first user. The cube takes time, the longitude of the spatial position and the latitude of the spatial position as its coordinates, and the feature points are distributed in it according to the time and spatial position of the target behavior. By building each user's behavior feature vectors from time, spatial position and intensity representation information and comparing the feature points of different users, the method can simply and effectively identify re-entry users within the same operator network.

Description

Method, device and equipment for identifying re-entry users

Technical field

The present invention relates to the field of communication technology, and in particular to a method, device and equipment for identifying re-entry users.

Background

A re-entry user is a user who is using, or has used, a card number of a given operator and who, within a short period, purchases another card number of the same operator to access the network, so that the new number fully or partially replaces the old one. Re-entry users occupy system card-number resources, raise the company's marketing costs and increase business risk, so they need to be identified and managed effectively. Because the re-entry number and the original in-network number are two different numbers, the key to identifying a re-entry number is determining whether the two numbers are used by the same person.

Summary of the invention

The purpose of the technical solution of the present invention is to provide a method, device and equipment for identifying re-entry users that can simply and effectively identify re-entry users within the same operator network.

An embodiment of the present invention provides a method for identifying re-entry users, including:

acquiring behavior feature vectors of at least two users, where each behavior feature vector records the occurrence time, spatial position and intensity representation information of a target behavior;

performing, according to the feature points determined in a spatio-temporal behavior feature cube by the behavior feature vectors of each user, a similarity analysis between the feature points of a first user and the feature points of a second user among the at least two users, and judging whether the second user is a re-entry user of the first user;

where the spatio-temporal behavior feature cube takes time, the longitude of the spatial position and the latitude of the spatial position as its coordinates, and the feature points corresponding to the behavior feature vectors are distributed in the cube according to the time and spatial position of the target behavior.

Optionally, in the method for identifying re-entry users, acquiring the behavior feature vectors of at least two users includes:

collecting behavior data of each user, where the behavior data includes the time, spatial position and intensity representation information of different target behaviors;

constructing the spatio-temporal behavior feature cube of each user from the behavior data;

performing cluster analysis on the behavior data in the spatio-temporal behavior feature cube, and determining the behavior data whose intensity representation information is greater than a preset intensity threshold as the behavior feature vectors;

deleting from the spatio-temporal behavior feature cube all feature points other than those corresponding to the behavior feature vectors.

Optionally, in the method for identifying re-entry users, performing cluster analysis on the behavior data in the spatio-temporal behavior feature cube and determining the behavior data whose intensity representation information is greater than the preset intensity threshold as the behavior feature vectors includes:

slicing the spatio-temporal behavior feature cube along the time dimension to form multiple data slices;

clustering the behavior data within each data slice to determine at least one cluster point;

comparing the intensity representation information of the behavior data corresponding to each cluster point with the preset intensity threshold, and determining the behavior data whose intensity representation information is greater than the preset intensity threshold as the behavior feature vectors.
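
The slice, cluster and threshold steps above can be sketched as follows. This is only an illustrative sketch: the text at this point does not name a clustering algorithm, so grouping points of the same behavior that fall in the same longitude/latitude grid cell is an assumed stand-in, and summing the intensities of a group is likewise an assumption. The function name, the tuple layout and the `cell` parameter are hypothetical.

```python
from collections import defaultdict


def select_feature_vectors(points, threshold, cell=0.01):
    """Keep clustered behavior data whose intensity exceeds `threshold`.

    `points` is an iterable of (slot, lon, lat, behavior, intensity) tuples,
    where `slot` is the index of the time period (assumed layout).
    """
    # Step 1: slice the cube along the time dimension.
    slices = defaultdict(list)
    for point in points:
        slices[point[0]].append(point)

    selected = []
    for slot in sorted(slices):
        # Step 2: cluster inside the slice. Assumption: same behavior and
        # same grid cell of size `cell` degrees form one cluster.
        clusters = defaultdict(list)
        for _, lon, lat, behavior, intensity in slices[slot]:
            key = (behavior, round(lon / cell), round(lat / cell))
            clusters[key].append((lon, lat, intensity))

        # Step 3: compare each cluster's intensity with the preset threshold.
        for key in sorted(clusters):
            members = clusters[key]
            total = sum(m[2] for m in members)
            if total > threshold:
                lon_c = sum(m[0] for m in members) / len(members)
                lat_c = sum(m[1] for m in members) / len(members)
                selected.append((slot, lon_c, lat_c, key[0], total))
    return selected


points = [
    (60, 115.4167, 39.4333, "standby", 300),
    (60, 115.4168, 39.4334, "standby", 250),
    (60, 115.5000, 39.5000, "call", 40),
]
features = select_feature_vectors(points, threshold=100)
```

With a threshold of 100 seconds, the two nearby standby points merge into one retained feature vector, and the short call is discarded.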

Optionally, in the method for identifying re-entry users, the intensity representation information is expressed as the cumulative duration of the target behavior within a preset statistical period.

Optionally, in the method for identifying re-entry users, before the similarity analysis between the feature points of the first user and the feature points of the second user among the at least two users, the method further includes:

performing unit-free normalization on the behavior feature vectors in the spatio-temporal behavior feature cube of the first user and on the behavior feature vectors in the spatio-temporal behavior feature cube of the second user, to obtain normalized data vectors of the first user and normalized data vectors of the second user, where each normalized data vector corresponds to one feature point;

where performing the similarity analysis between the feature points of the first user and the feature points of the second user among the at least two users includes:

performing the similarity analysis between the feature points corresponding to the normalized data vectors of the first user and the feature points corresponding to the normalized data vectors of the second user, and judging whether the second user is a re-entry user of the first user.

Optionally, in the method for identifying re-entry users, performing the similarity analysis between the feature points of the first user and the feature points of the second user among the at least two users, and judging whether the second user is a re-entry user of the first user, includes:

determining the feature points of the first user that are similar to feature points of the second user (the similar feature points);

determining that the second user is a re-entry user of the first user when the ratio of the number of similar feature points to the number of feature points of the first user is greater than a first preset value.
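
The ratio test above can be illustrated with a short sketch. The function name and the representation of the per-point results as a list of booleans are assumptions made for illustration; the value of the first preset value is not given in the text, so 0.7 below is a made-up example.

```python
def is_re_entry_user(similar_flags, first_preset):
    """Decide re-entry from per-point similarity results.

    `similar_flags[i]` is True when feature point i of the first user
    has a similar feature point in the second user (assumed input form).
    """
    if not similar_flags:
        # No feature points for the first user: nothing to compare.
        return False
    ratio = sum(similar_flags) / len(similar_flags)
    return ratio > first_preset


# Three of four feature points of the first user have a similar counterpart.
decision = is_re_entry_user([True, True, True, False], first_preset=0.7)
```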

Optionally, in the method for identifying re-entry users, determining the feature points of the first user that are similar to feature points of the second user includes:

selecting a first feature point of the first user;

finding, among the feature points of the second user, the second feature point with the shortest distance to the first feature point, where the behavior feature vectors corresponding to the first feature point and the second feature point have the same target behavior;

analyzing the similarity value between the first feature point and the second feature point, and judging whether the first feature point and the second feature point are similar feature points.
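
A minimal sketch of the nearest-point search described above is given below. The text does not state which distance metric is used, so Euclidean distance over the (time, longitude, latitude) coordinates is an assumption, as are the function name and the tuple layout.

```python
import math


def nearest_same_behavior(first_point, second_points):
    """Return (second_point, distance) for the closest point of the second
    user that has the same target behavior as `first_point`.

    Points are (t, lon, lat, behavior) tuples with normalized coordinates
    (assumed layout). Returns (None, math.inf) when no candidate exists.
    """
    t, lon, lat, behavior = first_point
    best, best_distance = None, math.inf
    for candidate in second_points:
        if candidate[3] != behavior:
            continue  # only compare points of the same target behavior
        distance = math.dist((t, lon, lat), candidate[:3])
        if distance < best_distance:
            best, best_distance = candidate, distance
    return best, best_distance


first = (0.1, 0.2, 0.3, "call")
second = [
    (0.1, 0.2, 0.3, "app"),   # closest overall, but a different behavior
    (0.1, 0.2, 0.4, "call"),
    (0.9, 0.9, 0.9, "call"),
]
match, distance = nearest_same_behavior(first, second)
```

Filtering by behavior before measuring distance means that an identical position with a different behavior is never treated as a match.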

Optionally, in the method for identifying re-entry users, analyzing the similarity value between the first feature point and the second feature point, and judging whether the first feature point and the second feature point are similar feature points, includes:

obtaining a first weight value for the target behavior corresponding to the first feature point occurring within a preset duration, and obtaining a second weight value for the target behavior corresponding to the second feature point occurring within the preset duration;

determining a weight coefficient from the first weight value and the second weight value;

calculating the similarity value from the weight coefficient and the distance between the first feature point and the second feature point;

determining that the first feature point and the second feature point are similar feature points when the similarity value is greater than a second preset value.

Optionally, in the method for identifying re-entry users, determining the weight coefficient from the first weight value and the second weight value includes:

calculating the ratio of the smaller of the first weight value and the second weight value to the larger of the first weight value and the second weight value;

taking this ratio as the weight coefficient.

Optionally, in the method for identifying re-entry users, calculating the similarity value from the weight coefficient and the distance between the first feature point and the second feature point includes:

calculating the similarity value according to the following formula:

S_i = 1 - D_i / W_i;

where S_i is the similarity value, D_i is the distance between the first feature point and the second feature point, and W_i is the weight coefficient.
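
The weight coefficient and the similarity formula can be combined in a short sketch. The function names and the sample numbers are illustrative assumptions; the weight values are treated as given positive inputs because the text here does not say how they are derived.

```python
def weight_coefficient(w1, w2):
    """W_i: ratio of the smaller weight value to the larger one."""
    if w1 <= 0 or w2 <= 0:
        raise ValueError("weight values are assumed to be positive")
    return min(w1, w2) / max(w1, w2)


def similarity(distance, w1, w2):
    """S_i = 1 - D_i / W_i."""
    return 1.0 - distance / weight_coefficient(w1, w2)


def is_similar(distance, w1, w2, second_preset):
    """Feature points are similar when S_i exceeds the second preset value."""
    return similarity(distance, w1, w2) > second_preset


w = weight_coefficient(500, 400)  # 400 / 500
s = similarity(0.2, 500, 400)     # 1 - 0.2 / 0.8
```

Because W_i lies in (0, 1], dividing by it enlarges the effective distance when the two weight values differ, so points with mismatched weights need to be closer to count as similar.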

Optionally, in the method for identifying re-entry users, before the unit-free normalization of the behavior feature vectors in the spatio-temporal behavior feature cube of the first user and in the spatio-temporal behavior feature cube of the second user, the method further includes:

determining a time-domain cut point from the time distribution of the first user's behavior feature vectors and the second user's behavior feature vectors in the spatio-temporal behavior feature cubes constructed respectively from those vectors;

cutting and reassembling the spatio-temporal behavior feature cube of the first user and the spatio-temporal behavior feature cube of the second user, so that after reassembly each cube takes the time corresponding to the time-domain cut point as its starting time point;

where performing the unit-free normalization on the behavior feature vectors in the spatio-temporal behavior feature cube of the first user and in the spatio-temporal behavior feature cube of the second user includes:

performing the unit-free normalization separately on the behavior feature vectors in the cut-and-reassembled spatio-temporal behavior feature cube of the first user and in the cut-and-reassembled spatio-temporal behavior feature cube of the second user.

Optionally, in the method for identifying re-entry users, determining the time-domain cut point from the time distribution of the first user's behavior feature vectors and the second user's behavior feature vectors in the respectively constructed spatio-temporal behavior feature cubes includes:

accumulating, along the time dimension, the intensity representation information of the same target behavior, separately for the behavior feature vectors in the spatio-temporal behavior feature cube of the first user and for those in the spatio-temporal behavior feature cube of the second user;

drawing a first intensity curve for the behavior feature vectors of the first user and a second intensity curve for the behavior feature vectors of the second user, from the maximum accumulated intensity value obtained for each target behavior;

selecting the lowest point of the first intensity curve and the second intensity curve as the time-domain cut point.
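
One possible reading of the cut-point selection and the cut-and-reassemble step is sketched below. Several details are assumptions rather than statements from the text: time is an integer slot index within a repeating cycle of `n_slots` slots, each curve value is the largest per-behavior accumulated intensity in a slot, the lowest point is taken over both curves jointly, and reassembly is a rotation of the time axis. All names are hypothetical.

```python
from collections import defaultdict


def intensity_curve(points, n_slots):
    """For each time slot, accumulate intensity per behavior and keep the
    largest accumulated value. `points` are (slot, lon, lat, behavior,
    intensity) tuples (assumed layout)."""
    per_slot = [defaultdict(float) for _ in range(n_slots)]
    for slot, _, _, behavior, intensity in points:
        per_slot[slot][behavior] += intensity
    return [max(totals.values(), default=0.0) for totals in per_slot]


def cut_slot(curve_a, curve_b):
    """Slot index of the lowest point over both curves (first one on ties)."""
    return min(range(len(curve_a)), key=lambda s: min(curve_a[s], curve_b[s]))


def reassemble(points, cut, n_slots):
    """Rotate the time axis so that the cut slot becomes slot 0."""
    return sorted(((slot - cut) % n_slots, lon, lat, behavior, intensity)
                  for slot, lon, lat, behavior, intensity in points)


user_a = [(0, 115.4, 39.4, "call", 10), (1, 115.4, 39.4, "call", 5),
          (2, 115.4, 39.4, "app", 1), (3, 115.4, 39.4, "call", 8)]
user_b = [(0, 115.4, 39.4, "call", 7), (1, 115.4, 39.4, "call", 6),
          (2, 115.4, 39.4, "app", 2), (3, 115.4, 39.4, "call", 9)]

curve_a = intensity_curve(user_a, n_slots=4)
curve_b = intensity_curve(user_b, n_slots=4)
cut = cut_slot(curve_a, curve_b)
aligned_a = reassemble(user_a, cut, n_slots=4)
```

Starting both cubes at the quietest time keeps busy periods away from the boundary of the time axis, so a behavior that straddles the original start time is not split across the two ends of the cube.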

Optionally, in the method for identifying re-entry users, performing the unit-free normalization on the behavior feature vectors in the spatio-temporal behavior feature cube of the first user and in the spatio-temporal behavior feature cube of the second user includes:

performing the unit-free normalization separately on the behavior feature vectors in the spatio-temporal behavior feature cube of the first user and in the spatio-temporal behavior feature cube of the second user, using min-max (range) normalization or z-score (standard deviation) normalization.
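
The two normalization options can be sketched as follows. The helper names are hypothetical, and normalizing each coordinate column independently (time, longitude, latitude) is an assumption about how the conversion is applied; the handling of a constant column is also a choice made only for this sketch.

```python
from statistics import mean, pstdev


def min_max(values):
    """Min-max (range) normalization to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: assumed convention
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Standard-deviation normalization (population standard deviation)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column: assumed convention
    return [(v - mu) / sigma for v in values]


def normalize_points(points, method=min_max):
    """Normalize (t, lon, lat) points column by column."""
    columns = [method(list(column)) for column in zip(*points)]
    return list(zip(*columns))


scaled = min_max([0, 5, 10])
standardized = z_score([1, 2, 3])
unit_free = normalize_points([(0, 115.0, 39.0), (10, 116.0, 40.0)])
```

Either option removes the units, so that a difference of minutes on the time axis and a difference of degrees on the position axes contribute on a comparable scale to the distance between feature points.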

An embodiment of the present invention also provides a device for identifying re-entry users, including:

a vector acquisition module, configured to acquire behavior feature vectors of at least two users, where each behavior feature vector records the occurrence time, spatial position and intensity representation information of a target behavior;

a comparison module, configured to perform, according to the feature points determined in a spatio-temporal behavior feature cube by the behavior feature vectors of each user, a similarity analysis between the feature points of a first user and the feature points of a second user among the at least two users, and to judge whether the second user is a re-entry user of the first user;

where the spatio-temporal behavior feature cube takes time, the longitude of the spatial position and the latitude of the spatial position as its coordinates, and the feature points corresponding to the behavior feature vectors are distributed in the cube according to the time and spatial position of the target behavior.

An embodiment of the present invention also provides identification equipment, including a processor, a memory, and a program stored in the memory and executable on the processor, where the program, when executed by the processor, implements the method for identifying re-entry users described in any one of the items above.

An embodiment of the present invention also provides a readable storage medium storing a program that, when executed by a processor, implements the steps of the method for identifying re-entry users described in any one of the items above.

At least one of the above technical solutions of the present invention has the following beneficial effects:

The method for identifying re-entry users described in the embodiments of the present invention builds each user's behavior feature vectors from time, spatial position and intensity representation information, and identifies re-entry users by a similarity analysis of the feature points that those vectors determine in the spatio-temporal behavior feature cube. Because the method correlates time, spatial position and intensity representation information, it can identify re-entry users within the same operator network more simply and effectively than the prior art.

Description of drawings

FIG. 1 is a schematic flow diagram of the method for identifying re-entry users according to an embodiment of the present invention;

FIG. 2 is a schematic flow diagram of step S110 in FIG. 1;

FIG. 3 is a schematic diagram of one spatio-temporal behavior feature cube;

FIG. 4 is a schematic flow diagram of step S113 in FIG. 2;

FIG. 5 is a schematic diagram of one time slice;

FIG. 6 is a schematic flow diagram of step S120 in FIG. 1;

FIG. 7 is a schematic structural diagram of the spatio-temporal behavior feature cube after time-domain cutting and reassembly;

FIG. 8 is a schematic flow diagram of the device for identifying re-entry users according to an embodiment of the present invention;

FIG. 9 is a schematic structural diagram of the identification equipment according to an embodiment of the present invention.

Detailed description

To make the technical problems to be solved, the technical solutions and the advantages of the present invention clearer, a detailed description is given below with reference to the drawings and specific embodiments.

An embodiment of the present invention provides a method for identifying re-entry users. It relies on the fact that the behavior of the same user at particular times and places is highly similar. The method builds each user's behavior feature vectors from time, spatial position and intensity representation information, and uses the feature points that those vectors determine in a spatio-temporal behavior feature cube to correlate the three kinds of information, so that re-entry users within the same operator network can be identified simply and effectively.

In one embodiment of the present invention, as shown in FIG. 1, the method for identifying re-entry users includes:

S110, acquiring behavior feature vectors of at least two users, where each behavior feature vector records the occurrence time, spatial position and intensity representation information of a target behavior;

S120, performing, according to the feature points determined in a spatio-temporal behavior feature cube by the behavior feature vectors of each user, a similarity analysis between the feature points of a first user and the feature points of a second user among the at least two users, and judging whether the second user is a re-entry user of the first user;

where the spatio-temporal behavior feature cube takes time, the longitude of the spatial position and the latitude of the spatial position as its coordinates, and the feature points corresponding to the behavior feature vectors are distributed in the cube according to the time and spatial position of the target behavior.

The method for identifying re-entry users described in the embodiments of the present invention builds each user's behavior feature vectors from time, spatial position and intensity representation information, and identifies re-entry users by comparing the feature points that the behavior feature vectors of different users determine in the spatio-temporal behavior feature cube. Compared with user identification based on a single-dimension model, this effectively safeguards the accuracy of re-entry user identification. Analyzing time, spatial position and intensity representation information jointly further safeguards that accuracy. In addition, the prior art models each dimension separately, performs a similarity analysis per dimension, and then combines the per-dimension results to identify re-entry users; by comparison, the identification method of the embodiments of the present invention is simpler and easier to implement.

Optionally, as shown in FIG. 2, step S110 of acquiring the behavior feature vectors of at least two users includes:

S111, collecting behavior data of each user, where the behavior data includes the time, spatial position and intensity representation information of different target behaviors;

S112, constructing the spatio-temporal behavior feature cube of each user from the behavior data, where the spatio-temporal behavior feature cube takes time, the longitude of the spatial position and the latitude of the spatial position as its coordinates, and the behavior data are distributed in the cube according to the time and spatial position of the target behavior;

S113, performing cluster analysis on the behavior data in the spatio-temporal behavior feature cube, and determining the behavior data whose intensity representation information is greater than a preset intensity threshold as the behavior feature vectors;

S114, deleting from the spatio-temporal behavior feature cube all feature points other than those corresponding to the behavior feature vectors.

In step S111, collecting the behavior data of each user involves four elements: time, spatial position, behavior and intensity representation information.

1) Time: using a preset duration (for example 10 minutes) as the period, collect the user's behavior data in each period across the 24 hours of every day over a span of time (for example two weeks);

2) Spatial position: longitude and latitude information. The base station location information within the specified time period is collected from signaling data, call data, Internet access data and the like, and converted into longitude and latitude to obtain the spatial position in the behavior data;

3) Behavior: optionally, the collected behaviors may include: the user's device being powered on with no call or Internet activity, for example user standby; call behavior, distinguished by number segment, for example C_139; application (APP) behavior, distinguished by APP; and so on.

4) Intensity representation information: optionally, the intensity representation information is expressed as the cumulative duration of the target behavior within a preset statistical period. Note that this is the cumulative duration for which the target behavior occurs in the target space within the preset statistical period. For example, the cumulative duration (in seconds) of a behavior at one location over a span of time (for example two weeks) is used as the user's behavior intensity representation information.

Collecting each user's behavior data on the basis of these four elements yields multiple groups of behavior data, each corresponding to a different target behavior and each including the target behavior, time, spatial position and intensity representation information. An example of the collected behavior data is shown in Table 1:

Table 1

Time        | Longitude | Latitude | Behavior      | Intensity (s)
10:00-10:10 | 115°25′ E | 39°26′ N | User standby  | 500
10:10-10:20 | 115°25′ E | 39°26′ N | Call behavior | 40
10:20-10:30 | 115°26′ E | 39°27′ N | APP behavior  | 600

在通过步骤S111获得上述形式的行为数据的条件下,本发明实施例中,在步骤S112,利用所获得的行为数据构造时空行为特征立方体。其中,在该时空行为特征立方体中,以时间、空间位置的经度和空间位置的纬度分别为特征位置的三个维度坐标,用户的每一行为数据以经度、纬度和时间三个维度为表征,在时空行为特征立方体中分布,且以特征点表示,每一行为数据对应一个特征点。On the condition that the behavior data in the above form is obtained through step S111, in the embodiment of the present invention, in step S112, a spatio-temporal behavior characteristic cube is constructed using the obtained behavior data. Among them, in the spatio-temporal behavior feature cube, time, the longitude of the spatial position and the latitude of the spatial position are respectively the three-dimensional coordinates of the feature position, and each behavior data of the user is characterized by the three dimensions of longitude, latitude and time, Distributed in the spatiotemporal behavioral feature cube and represented by feature points, each behavioral data corresponds to a feature point.

An example of the spatio-temporal behavior feature cube is shown in FIG. 3, in which different target behaviors are represented by different gray levels; in a specific implementation they may be distinguished by color. As FIG. 3 shows, the cube clearly displays how different target behaviors are distributed in time and space.

It should be noted that, using the above construction, a corresponding spatio-temporal behavior feature cube can be constructed separately for each user.

Further, in the method for identifying a re-networking user according to the embodiment of the present invention, after the spatio-temporal behavior feature cube is constructed in step S112, cluster analysis is performed on the behavior data in the cube in step S113. The behavior data whose intensity representation information is greater than a preset intensity threshold are determined to be behavior feature vectors, and the feature points in the cube that correspond to the determined behavior feature vectors are retained for the similarity analysis performed later when identifying re-networking users.

Optionally, in step S113, performing cluster analysis on the behavior data in the spatio-temporal behavior feature cube and determining the behavior data whose intensity representation information is greater than the preset intensity threshold to be behavior feature vectors includes, as shown in FIG. 4:

S1131: slicing the spatio-temporal behavior feature cube along the time dimension to form a plurality of data slices;

S1132: clustering the behavior data within each data slice to determine at least one cluster point;

S1133: comparing the intensity representation information of the behavior data corresponding to each cluster point with the preset intensity threshold, and determining the behavior data whose intensity representation information is greater than the preset intensity threshold to be the behavior feature vectors.

In step S1131, the spatio-temporal behavior feature cube may be sliced along the time dimension at intervals of a preset duration to form a plurality of data slices. For example, slicing a user's spatio-temporal behavior feature cube every half hour yields 48 data slices.
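A minimal sketch of the half-hour slicing, assuming each feature point is a tuple whose first element is the time in minutes after midnight; the names `slice_index` and `slice_cube` are illustrative and do not come from the patent.

```python
from collections import defaultdict

SLICE_MINUTES = 30  # assumed slice width, matching the half-hour example


def slice_index(minute_of_day, slice_minutes=SLICE_MINUTES):
    """Index of the time slice that contains the given minute of the day."""
    return int(minute_of_day // slice_minutes)


def slice_cube(points, slice_minutes=SLICE_MINUTES):
    """Group (t_min, lon, lat, behavior, intensity) tuples by time slice."""
    slices = defaultdict(list)
    for point in points:
        slices[slice_index(point[0], slice_minutes)].append(point)
    return dict(slices)


# Hypothetical feature points: three in the morning, one in the evening.
points = [
    (600, 115.42, 39.43, "standby", 500),
    (610, 115.42, 39.43, "call", 40),
    (620, 115.43, 39.45, "app", 600),
    (1085, 115.50, 39.50, "app", 300),
]
slices = slice_cube(points)
n_slices_per_day = 24 * 60 // SLICE_MINUTES  # number of half-hour slices in a day
```

Only slices that actually contain data appear in the result; empty slices are simply absent from the dictionary.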

In step S1132, the behavior data within each data slice are clustered. Optionally, each data slice may be clustered using Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The center point of each cluster is taken as its cluster point and used as a feature identifier. A cluster point reflects the average position and average intensity of a specific user behavior at a specific time, so it can be extracted as target data for constructing the user's behavior feature vectors.

For example, suppose a user leaves work at around six o'clock every day and takes the subway home, and likes to use Douyin on the subway. A simplified version of the density-clustered data slice for 6:00-6:30 is shown in FIG. 5, where black dots represent user standby behavior, white dots represent Douyin usage, and the size of a dot represents the behavior intensity.

Thus, step S1132 determines at least one cluster point in each data slice, and each cluster point reflects the average position and average intensity of a specific user behavior at a specific time.

On this basis, in step S1133 the intensity representation information of the behavior data corresponding to each cluster point is compared with the preset intensity threshold; that is, a threshold test is applied to the intensity representation information of each cluster point. Behavior data whose intensity representation information is greater than the preset intensity threshold are judged to be valid data, saved, and used to construct the user's behavior feature vectors.
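Steps S1132 and S1133 could be sketched as below. This is an illustration under stated assumptions, not the patented implementation: the small pure-Python `dbscan` stands in for any DBSCAN implementation, clustering is run separately per behavior (an assumption), and `eps`, `min_pts` and `threshold` are hypothetical parameter names and values.

```python
import math


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN over coordinate tuples; returns one label per point (-1 = noise)."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point, keep expanding
                queue.extend(reachable)
    return labels


def cluster_points(slice_records, eps, min_pts, threshold):
    """Return cluster centers (lon, lat, behavior, mean intensity) above the threshold."""
    kept = []
    for behavior in sorted({r[2] for r in slice_records}):
        recs = [r for r in slice_records if r[2] == behavior]
        labels = dbscan([(r[0], r[1]) for r in recs], eps, min_pts)
        for c in sorted(set(labels) - {-1}):
            members = [r for r, label in zip(recs, labels) if label == c]
            n = len(members)
            center = (sum(m[0] for m in members) / n,
                      sum(m[1] for m in members) / n,
                      behavior,
                      sum(m[3] for m in members) / n)
            if center[3] > threshold:  # S1133: keep only strong cluster points
                kept.append(center)
    return kept


# Hypothetical records of one time slice: (lon, lat, behavior, intensity).
records = [
    (0.10, 0.10, "app", 600), (0.11, 0.10, "app", 500), (0.10, 0.11, "app", 700),
    (0.50, 0.50, "standby", 20), (0.51, 0.50, "standby", 30), (0.50, 0.51, "standby", 10),
    (0.90, 0.90, "app", 900),  # isolated record, treated as noise
]
kept = cluster_points(records, eps=0.05, min_pts=3, threshold=100)
```

In this example the standby cluster is discarded because its mean intensity is below the threshold, and the isolated record is discarded as noise, leaving a single cluster point for the app behavior.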

In one implementation of the embodiment of the present invention, as shown in FIG. 6, before the similarity analysis of the feature points of a first user and the feature points of a second user among the at least two users in step S120, the method further includes:

S1101: performing de-unit normalization on the behavior feature vectors in the first user's spatio-temporal behavior feature cube and on the behavior feature vectors in the second user's spatio-temporal behavior feature cube to obtain normalized data vectors of the first user and normalized data vectors of the second user, where each normalized data vector corresponds to one feature point;

In step S120, performing similarity analysis on the feature points of the first user and the feature points of the second user among the at least two users then includes:

performing similarity analysis on the feature points corresponding to the first user's normalized data vectors and the feature points corresponding to the second user's normalized data vectors, and determining whether the second user is a re-networking user of the first user.

Specifically, in this way the data in each user's behavior feature vectors are normalized: the units are removed and the data are converted into dimensionless pure numbers, so that indicators with different units or magnitudes can be calculated and compared. The normalized data can then be used to compare the similarity of the feature points corresponding to the normalized data vectors of different users.

In the embodiment of the present invention, to ensure the accuracy of the data analysis, optionally, before the de-unit normalization in step S1101 of the behavior feature vectors in the first user's spatio-temporal behavior feature cube and the behavior feature vectors in the second user's spatio-temporal behavior feature cube, the method further includes:

determining a time-domain segmentation point according to the distribution along the time dimension of the first user's behavior feature vectors and the second user's behavior feature vectors in the spatio-temporal behavior feature cubes constructed respectively from them;

segmenting and reassembling the first user's spatio-temporal behavior feature cube and the second user's spatio-temporal behavior feature cube, so that after segmentation and reassembly each cube takes the time corresponding to the time-domain segmentation point as its starting time point;

In step S1101, performing de-unit normalization on the behavior feature vectors in the first user's spatio-temporal behavior feature cube and the behavior feature vectors in the second user's spatio-temporal behavior feature cube then includes:

performing de-unit normalization separately on the behavior feature vectors in the segmented and reassembled spatio-temporal behavior feature cube of the first user and in that of the second user.

Optionally, determining the time-domain segmentation point according to the distribution along the time dimension of the first user's behavior feature vectors and the second user's behavior feature vectors in the spatio-temporal behavior feature cubes constructed respectively from them includes:

along the time dimension, accumulating, separately for the behavior feature vectors in the first user's spatio-temporal behavior feature cube and for those in the second user's spatio-temporal behavior feature cube, the intensity representation information corresponding to the same target behavior;

drawing a first intensity variation curve for the first user's behavior feature vectors and a second intensity variation curve for the second user's behavior feature vectors according to the maximum accumulated intensity value corresponding to each target behavior;

selecting the lowest point of the first intensity variation curve and the second intensity variation curve as the time-domain segmentation point.

The first user is taken to be the original on-network user and the second user the user to be matched. The behavior feature vectors of the first user and the second user are extracted from the database, and before de-unit normalization is applied to them, the time-domain segmentation point is determined and the time-domain segmentation and reassembly are carried out.

Determining the time-domain segmentation point selects the point at which user activity intensity is weakest. Based on this range of weak activity, the spatio-temporal behavior feature cubes determined above can be re-segmented and reassembled for the subsequent identification of re-networking users.

It should be noted that the occurrence times of the target behaviors recorded in the behavior feature vectors lie within a single day. If the behavior feature vectors are extracted and analyzed with the time axis starting at the default 0:00, misjudgment is likely. Suppose a user performs the same behavior at the same place in the 23:30-24:00 period and in the 00:00-00:30 period: the actual interval is less than one hour, yet in a feature space whose time axis starts at 0:00 the two occurrences of the same behavior are more than 20 hours apart. This discontinuity in the data can lead to misjudgment.

For this reason, in the identification method according to the embodiment of the present invention, the point of weakest user activity intensity is selected as described above, and the behavior feature vectors are re-segmented and rearranged in the time domain around that point. When the spatio-temporal behavior feature cube is constructed from these behavior feature vectors, the cube is re-segmented and reassembled so that the determined time-domain segmentation point becomes the starting time point of the distribution of the behavior feature vectors. For example, as shown in FIG. 7, if the time-domain segmentation point determined from the behavior feature vectors of user A and user B is 2:00, then after time-domain segmentation and reassembly of the cubes, 2:00 becomes the origin of the time axis.

In the embodiment of the present invention, the intensity representation information corresponding to the same target behavior is accumulated along the time dimension separately for the first user's spatio-temporal behavior feature cube and the second user's spatio-temporal behavior feature cube; a first intensity variation curve for the first user's behavior feature vectors and a second intensity variation curve for the second user's behavior feature vectors are drawn according to the maximum accumulated intensity value corresponding to each target behavior; and the lowest point of the two curves is selected as the time-domain segmentation point. In other words, the behavior feature vectors of the two users to be matched are extracted, user behavior intensity is accumulated along the time dimension, the point of maximum behavior intensity is taken as the starting point to draw the 24-hour variation curve of behavior intensity, and the minimum of the two users' behavior variation curves is then selected as the time-domain segmentation point.
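The effect of the time-domain segmentation can be illustrated with a short sketch. It assumes hourly intensity curves and takes the minimum of the sum of the two users' curves as the segmentation point; the text does not specify how the two curves are combined, so that choice, like the names `cut_point` and `rotate`, is an assumption.

```python
def cut_point(curve_a, curve_b):
    """Index of the weakest combined activity (assumed: sum of both curves)."""
    combined = [a + b for a, b in zip(curve_a, curve_b)]
    return min(range(len(combined)), key=combined.__getitem__)


def rotate(slice_idx, cut, n_slices):
    """Re-index a time slice so that the segmentation point becomes the origin."""
    return (slice_idx - cut) % n_slices


# Hypothetical hourly intensity curves (24 values each) for users A and B.
curve_a = [40, 15, 2, 5, 8, 20, 60, 90, 80, 70, 60, 65,
           70, 60, 55, 60, 70, 90, 95, 85, 80, 70, 60, 50]
curve_b = [35, 10, 1, 6, 9, 25, 55, 85, 75, 65, 60, 60,
           75, 65, 50, 55, 75, 85, 90, 80, 75, 65, 55, 45]

cut = cut_point(curve_a, curve_b)  # hour of weakest combined activity

# Behaviors at 23:00 and 00:00 are 23 hours apart on an axis starting at 0:00,
# but adjacent once the axis starts at the segmentation point.
gap_before = abs(23 - 0)
gap_after = abs(rotate(23, cut, 24) - rotate(0, cut, 24))
```

With these sample curves the segmentation point falls at hour 2, in line with the FIG. 7 example, and the gap between the two late-night behaviors shrinks from 23 hours to 1.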

In the identification method according to the embodiment of the present invention, after the time-domain segmentation of the users' spatio-temporal behavior feature cubes described above, de-unit normalization is further applied to the behavior feature vectors in each user's cube. Specifically, min-max normalization (deviation standardization) or z-score standardization (standard-deviation standardization) may be used to remove the units of the data in the behavior feature vectors and convert the data into dimensionless pure numbers, so that indicators with different units or magnitudes can be calculated and compared.

In one implementation, the original behavior feature vectors are normalized by min-max normalization. The conversion formula may be:

X' = (X - min)/(max - min);

where X' is the data after conversion, X is the data before conversion, max is the maximum value of the sample data being converted, and min is the minimum value of the sample data being converted.

In this way, the data in each dimension of a behavior feature vector can be converted into data in the interval [0, 1], removing the unit restrictions on the different dimensions of the behavior feature vector.
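A minimal sketch of the min-max formula above, applied to one dimension at a time. The function name and the handling of a constant column (mapped to 0.0 to avoid division by zero) are assumptions, as are the sample values.

```python
def min_max_normalize(values):
    """Map values to [0, 1] with X' = (X - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: the formula is undefined, so return zeros (assumed choice).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical time values in minutes after the time-axis origin.
times = [600, 610, 620, 660]
normalized_times = min_max_normalize(times)
```

The smallest value maps to 0 and the largest to 1, so time, longitude and latitude end up on the same dimensionless scale.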

For example, applying this de-unit normalization to the occurrence time, longitude and latitude in the behavior feature vectors yields the behavior feature vectors shown in Table 2 below:

Table 2

Time | Average longitude | Average latitude | Behavior | Average intensity
0.12 | 0.345 | 0.567 | B | 210
0.34 | 0.12 | 0.8 | C | 120

In another implementation, the original behavior feature vectors are normalized by z-score standardization. Optionally, the mean is subtracted from the data before conversion and the result is divided by the standard deviation; the converted data have a mean of 0 and a variance of 1, as in the standard normal distribution.
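The standard-deviation approach can be sketched in the same style. The population standard deviation is used here as an assumption, since the text does not say whether the population or the sample form is intended; the function name and sample values are illustrative.

```python
import statistics


def z_score(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        # Constant column: no spread to scale by, so return zeros (assumed choice).
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


# Hypothetical intensity values in seconds.
intensities = [500, 40, 600]
standardized = z_score(intensities)
```

After the conversion the values have zero mean and unit variance, regardless of the original unit.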

It should be noted that those skilled in the art will understand the specific ways of performing the above de-unit normalization, which are not described in detail here. Furthermore, de-unit normalization is not limited to the two methods described above.

In the embodiment of the present invention, referring to FIG. 1, in step S120, performing similarity analysis on the feature points of the first user and the feature points of the second user among the at least two users and determining whether the second user is a re-networking user of the first user includes:

determining the similar feature points between the feature points of the first user and the feature points of the second user;

when the ratio of the number of similar feature points to the number of feature points of the first user is greater than a first preset value, determining that the second user is a re-networking user of the first user.

Optionally, determining the similar feature points between the feature points of the first user and the feature points of the second user includes:

selecting a first feature point of the first user;

finding, among the feature points of the second user, the second feature point at the shortest distance from the first feature point, where the behavior feature vectors corresponding to the first feature point and the second feature point have the same target behavior;

analyzing the similarity value between the first feature point and the second feature point, and determining whether the first feature point and the second feature point are similar feature points.

In the embodiment of the present invention, optionally, the similarity between a first feature point of the first user and a feature point of the second user may be judged using the Euclidean distance.

Optionally, analyzing the similarity value between the first feature point and the second feature point and determining whether the first feature point and the second feature point are similar feature points includes:

obtaining a first weight value of the target behavior corresponding to the first feature point when it occurs within a preset duration, and obtaining a second weight value of the target behavior corresponding to the second feature point when it occurs within the preset duration;

determining a weight coefficient according to the first weight value and the second weight value;

calculating a similarity value according to the weight coefficient and the distance between the first feature point and the second feature point;

when the similarity value is greater than a second preset value, determining that the first feature point and the second feature point are similar feature points.

Further, determining the weight coefficient according to the first weight value and the second weight value includes:

calculating the ratio of the smaller of the first weight value and the second weight value to the larger of the first weight value and the second weight value;

determining the ratio to be the weight coefficient.

Optionally, calculating the similarity value according to the weight coefficient and the distance between the first feature point and the second feature point includes:

calculating the similarity value according to the following formula:

Si = 1 - Di/Wi;

where Si is the similarity value, Di is the distance between the first feature point and the second feature point, and Wi is the weight coefficient.
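The weight coefficient and the similarity formula combine as in the following sketch. It assumes both weight values are positive (otherwise the ratio or the division is undefined); the function names and the sample numbers are illustrative only.

```python
def weight_coefficient(w1, w2):
    """Wi: ratio of the smaller weight value to the larger one, in (0, 1]."""
    return min(w1, w2) / max(w1, w2)


def similarity(distance, w1, w2):
    """Si = 1 - Di / Wi for one pair of feature points."""
    return 1.0 - distance / weight_coefficient(w1, w2)


# Hypothetical pair: distance 0.1 in the normalized cube, weights 0.8 and 1.0.
s = similarity(0.1, 0.8, 1.0)  # Wi = 0.8, so Si = 1 - 0.1 / 0.8
```

Because Wi is at most 1, unequal weights enlarge the effective distance: the same distance of 0.1 gives a similarity of 0.9 when the weights are equal, but only about 0.875 here.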

Specifically, a first feature point of the first user is extracted, and the second feature point closest to it is determined among the feature points of the second user; the distance between them is denoted Di. The weight coefficient Wi is determined from the first weight value of the target behavior corresponding to the first feature point when it occurs within the preset duration and the second weight value of the target behavior corresponding to the second feature point when it occurs within the preset duration. From the calculated distance Di and the weight coefficient Wi, the similarity between the first feature point and the second feature point can be calculated.

Optionally, the distance between the first feature point and the second feature point may be calculated from the coordinates of the two feature points in the three dimensions of time, latitude and longitude in the spatio-temporal behavior feature cubes determined above.

In this way, for each feature point of the first user, the closest corresponding feature point can be found among the feature points of the second user, and the similarity of each pair can be calculated.

Optionally, if the feature points of the second user contain no closest point corresponding to a feature point of the first user, the similarity may be marked as 0.
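The nearest-point search can be sketched as follows, assuming feature points are tuples of normalized (time, longitude, latitude, behavior) and using the Euclidean distance over the three coordinates. Treating "no corresponding point" as "no point of the second user with the same target behavior" is an assumption, and the function name is hypothetical.

```python
import math


def nearest_same_behavior(point, candidates):
    """Closest candidate with the same target behavior, or None if there is none."""
    same = [q for q in candidates if q[3] == point[3]]
    if not same:
        return None
    return min(same, key=lambda q: math.dist(point[:3], q[:3]))


# Hypothetical normalized feature points: (time, longitude, latitude, behavior).
first_user_point = (0.12, 0.345, 0.567, "B")
second_user_points = [
    (0.13, 0.350, 0.560, "B"),
    (0.12, 0.345, 0.567, "C"),  # identical coordinates but a different behavior
    (0.80, 0.900, 0.100, "B"),
]

match = nearest_same_behavior(first_user_point, second_user_points)
missing = nearest_same_behavior((0.5, 0.5, 0.5, "D"), second_user_points)
# When no match exists, the similarity for that feature point is marked as 0.
similarity_if_missing = 0.0 if missing is None else None
```

The point with behavior C is ignored even though its coordinates coincide with the query point, because only feature points with the same target behavior are compared.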

For example, the similarity comparison results for multiple feature points of the first user and the second user may be as shown in Table 3 below, where B, C and D denote different target behaviors:

Table 3

Figure GDA0003959181720000151

Further, a threshold for the similarity judgment (the second preset value) may be set in advance. When the similarity between a feature point of the second user and a feature point of the first user exceeds the second preset value, the two are determined to be similar feature points; otherwise they are not.

In addition, a threshold (the first preset value) may be set in advance for the overall feature-point similarity required when identifying whether the second user is a re-networking user of the first user. When the ratio of the number of similar feature points to the total number of feature points of the first user is greater than the first preset value, the first user and the second user are determined to be highly similar, and the second user is judged to be a re-networking user of the first user.
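The two-threshold decision can be sketched as below: a per-point similarity threshold decides which feature points count as similar, and a ratio threshold over all of the first user's feature points decides the final result. The function name, parameter names and sample values are assumptions made for illustration.

```python
def is_re_networking_user(similarities, point_threshold, ratio_threshold):
    """Decide from per-point similarities of the first user's feature points.

    point_threshold: similarity above which a feature point counts as similar.
    ratio_threshold: required share of similar points among all feature points.
    """
    if not similarities:
        return False  # no feature points to compare (assumed handling)
    similar = sum(1 for s in similarities if s > point_threshold)
    return similar / len(similarities) > ratio_threshold


# Hypothetical similarities for four feature points of the first user;
# 0.0 marks a point with no counterpart in the second user's data.
sims = [0.90, 0.85, 0.0, 0.95]

decision = is_re_networking_user(sims, point_threshold=0.8, ratio_threshold=0.7)
stricter = is_re_networking_user(sims, point_threshold=0.8, ratio_threshold=0.8)
```

Three of the four points exceed the per-point threshold, giving a ratio of 0.75, so the second user is accepted with a ratio threshold of 0.7 and rejected with 0.8.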

Under the above approach, the judgment of user behavior similarity follows three principles:

1. The shorter the distance between two feature points, the higher the similarity;

2. The closer the weights of two feature points, the higher the similarity;

3. The more highly similar feature points two users share, the higher the similarity between the users.

The method for identifying a re-networking user according to the embodiment of the present invention exploits the fact that a user's behaviors at specific times and places are highly similar. In the data construction stage of the model, data from the three dimensions of time, space and user behavior are combined to build the user's feature cube. Time-dimension slicing, a density clustering algorithm and behavior intensity thresholding are then used to extract the user's typical behaviors at specific times and places and to construct the behavior feature vectors. The feature vectors of two users are preprocessed by time-domain segmentation and reassembly and by Euclidean coordinate transformation, and finally the similarity of the two users is compared using the Euclidean distance to determine whether one is a re-networking user of the other.

The method for identifying a re-networking user according to the embodiment of the present invention uses the three elements of time, space and behavior together for similarity analysis, and analyzes user similarity using the feature points that the behavior feature vectors determine in the spatio-temporal behavior feature cube. Compared with single-dimension analysis, this improves the accuracy of re-networking user identification. In addition, modeling the data of multiple dimensions jointly, rather than modeling each dimension separately, avoids the misjudgments to which separate modeling is prone. Furthermore, the simplified user feature vectors constructed by these techniques greatly reduce the amount of data needed to judge user identity, improving the efficiency of re-networking user identification.

An embodiment of the present invention further provides a device for identifying a re-networking user, as shown in FIG. 8, including:

a vector acquisition module 810, configured to acquire behavior feature vectors of at least two users, where the behavior feature vectors record the occurrence time, spatial position and intensity representation information of target behaviors;

a comparison module 820, configured to perform, according to the feature points determined in the spatio-temporal behavior feature cube by each user's behavior feature vectors, similarity analysis on the feature points of a first user and the feature points of a second user among the at least two users, and to determine whether the second user is a re-networking user of the first user;

where the spatio-temporal behavior feature cube takes time, the longitude of the spatial position and the latitude of the spatial position as coordinates, and the feature points corresponding to the behavior feature vectors are distributed in the spatio-temporal behavior feature cube according to the time and spatial position of the target behaviors.

Optionally, in the device for identifying a re-networking user, the vector acquisition module 810 includes:

a collection unit 811, configured to collect behavior data of each user, where the behavior data include the time, spatial position and intensity representation information of different target behaviors;

a first construction unit 812, configured to construct the spatio-temporal behavior feature cube of each user according to the behavior data;

an analysis unit 813, configured to perform cluster analysis on the behavior data in the spatio-temporal behavior feature cube and determine the behavior data whose intensity representation information is greater than a preset intensity threshold to be the behavior feature vectors;

a second construction unit 814, configured to delete the feature points in the spatio-temporal behavior feature cube other than those corresponding to the behavior feature vectors.

Optionally, in the device for identifying a re-networking user, the analysis unit 813 performing cluster analysis on the behavior data in the spatio-temporal behavior feature cube and determining the behavior data whose intensity representation information is greater than the preset intensity threshold to be the behavior feature vectors includes:

slicing the spatio-temporal behavior feature cube along the time dimension to form a plurality of data slices;

clustering the behavior data within each data slice to determine at least one cluster point;

comparing the intensity representation information of the behavior data corresponding to each cluster point with the preset intensity threshold, and determining the behavior data whose intensity representation information is greater than the preset intensity threshold to be the behavior feature vectors.

Optionally, in the device for identifying re-entry users, the intensity representation information is expressed as the cumulative duration of the target behavior within a preset statistical period.

Optionally, the device for identifying re-entry users further includes:

A conversion module 8101, configured to, before the comparison module 820 performs similarity analysis on the feature points of the first user and the feature points of the second user among the at least two users, perform unit-free normalization on the behavior feature vectors in the spatio-temporal behavior feature cube of the first user and on the behavior feature vectors in the spatio-temporal behavior feature cube of the second user, to obtain normalized data vectors of the first user and normalized data vectors of the second user, where each normalized data vector corresponds to one feature point;

Wherein the comparison module 820 performing similarity analysis on the feature points of the first user and the feature points of the second user among the at least two users is specifically:

Performing similarity analysis on the feature points corresponding to the normalized data vectors of the first user and the feature points corresponding to the normalized data vectors of the second user, to determine whether the second user is a re-entry user of the first user.

Optionally, in the device for identifying re-entry users, the comparison module 820 performing similarity analysis on the feature points of the first user and the feature points of the second user among the at least two users, and determining whether the second user is a re-entry user of the first user, is specifically:

Determining the similar feature points between the feature points of the first user and the feature points of the second user;

When the ratio of the number of similar feature points to the number of feature points of the first user is greater than a first preset value, determining that the second user is a re-entry user of the first user.
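The ratio test above reduces to a single comparison. A minimal sketch follows; the function name and the default `first_preset=0.8` are assumptions, since the text leaves the first preset value open.

```python
def is_reentry_user(num_similar, num_first_user_points, first_preset=0.8):
    """True when similar points / first user's points exceeds the preset value."""
    if num_first_user_points == 0:
        return False  # no feature points to compare, so no decision can be made
    return num_similar / num_first_user_points > first_preset
```

Note that the denominator is the first user's point count, so the test is asymmetric: it asks how much of the first user's behavior reappears in the second user.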

Optionally, in the device for identifying re-entry users, the comparison module 820 determining the similar feature points between the feature points of the first user and the feature points of the second user includes:

Selecting a first feature point of the first user;

Finding, among the feature points of the second user, the second feature point with the shortest distance to the first feature point, where the behavior feature vectors corresponding to the first feature point and the second feature point have the same target behavior;

Analyzing the similarity value between the first feature point and the second feature point, and determining whether the first feature point and the second feature point are similar feature points.
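The nearest-point search above can be sketched as a brute-force scan restricted to points with the same target behavior. The tuple layout `(behavior, t, lon, lat)` and the use of Euclidean distance on normalized coordinates are assumptions; the text does not fix the distance metric.

```python
import math

def nearest_same_behavior(first_point, second_points):
    """Return (closest point, distance) among second_points sharing the behavior.

    Points are (behavior, t, lon, lat) tuples with normalized coordinates.
    Returns (None, inf) when the second user has no point with that behavior.
    """
    behavior, *coords = first_point
    candidates = [p for p in second_points if p[0] == behavior]
    if not candidates:
        return None, math.inf
    best = min(candidates, key=lambda p: math.dist(coords, p[1:]))
    return best, math.dist(coords, best[1:])
```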

Optionally, in the device for identifying re-entry users, the comparison module 820 analyzing the similarity value between the first feature point and the second feature point and determining whether the first feature point and the second feature point are similar feature points includes:

Obtaining a first weight value for the target behavior corresponding to the first feature point occurring within a preset duration, and obtaining a second weight value for the target behavior corresponding to the second feature point occurring within the preset duration;

Determining a weight coefficient from the first weight value and the second weight value;

Calculating a similarity value from the weight coefficient and the distance between the first feature point and the second feature point;

When the similarity value is greater than a second preset value, determining that the first feature point and the second feature point are similar feature points.

Optionally, in the device for identifying re-entry users, the comparison module 820 determining the weight coefficient from the first weight value and the second weight value includes:

Calculating the ratio of the smaller of the first weight value and the second weight value to the larger of the first weight value and the second weight value;

Determining the ratio as the weight coefficient.
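A minimal sketch of the weight coefficient as the min/max ratio described above. The function name is an assumption, and returning 0.0 when both weights are zero is a guard added here, not something the text specifies.

```python
def weight_coefficient(w1, w2):
    """Ratio of the smaller weight to the larger one, in the range [0, 1]."""
    hi = max(w1, w2)
    if hi == 0:
        return 0.0  # assumed guard: both weights are zero
    return min(w1, w2) / hi
```

For non-negative weights the coefficient is 1 when the two weights are equal and shrinks toward 0 as they diverge.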

Optionally, in the device for identifying re-entry users, the comparison module 820 calculating the similarity value from the weight coefficient and the distance between the first feature point and the second feature point includes:

Calculating the similarity value according to the following formula:

Si = 1 - Di/Wi;

Where Si is the similarity value, Di is the distance between the first feature point and the second feature point, and Wi is the weight coefficient.
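The formula Si = 1 - Di/Wi can be sketched directly. Because Wi is a min/max ratio of weights, it lies in (0, 1] for positive weights, so a smaller Wi magnifies the distance penalty. The function names, the default `second_preset=0.5`, and the treatment of a non-positive Wi are assumptions added for illustration.

```python
def similarity(distance, weight_coeff):
    """Si = 1 - Di / Wi; a non-positive Wi is treated as 'not similar at all'."""
    if weight_coeff <= 0:
        return float("-inf")  # assumed guard against division by zero
    return 1.0 - distance / weight_coeff

def is_similar(distance, weight_coeff, second_preset=0.5):
    """Two feature points are similar when Si exceeds the second preset value."""
    return similarity(distance, weight_coeff) > second_preset
```

For example, with Di = 0.1 and Wi = 0.5 the similarity is 1 - 0.2 = 0.8; the same distance with Wi = 0.1 gives 1 - 1 = 0.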

Optionally, in the device for identifying re-entry users, before performing unit-free normalization on the behavior feature vectors in the spatio-temporal behavior feature cube of the first user and on the behavior feature vectors in the spatio-temporal behavior feature cube of the second user, the conversion module 8101 is further configured to:

Determine a time-domain split point according to the time distribution of the behavior feature vectors of the first user and of the second user in the spatio-temporal behavior feature cubes respectively constructed from those behavior feature vectors;

Split and reassemble the spatio-temporal behavior feature cube of the first user and the spatio-temporal behavior feature cube of the second user, so that after splitting and reassembly each of the two cubes takes the time corresponding to the time-domain split point as its starting time point;

Wherein the conversion module 8101 performing unit-free normalization on the behavior feature vectors in the spatio-temporal behavior feature cube of the first user and on the behavior feature vectors in the spatio-temporal behavior feature cube of the second user includes:

Performing unit-free normalization separately on the behavior feature vectors in the split-and-reassembled spatio-temporal behavior feature cube of the first user and of the second user.

Optionally, in the device for identifying re-entry users, the conversion module 8101 determining the time-domain split point according to the time distribution of the behavior feature vectors of the first user and of the second user in the spatio-temporal behavior feature cubes respectively constructed from those behavior feature vectors includes:

Along the time dimension, accumulating, separately for the behavior feature vectors in the spatio-temporal behavior feature cube of the first user and for those in the spatio-temporal behavior feature cube of the second user, the intensity representation information corresponding to the same target behavior;

Drawing a first intensity variation curve for the behavior feature vectors of the first user and a second intensity variation curve for the behavior feature vectors of the second user, according to the maximum accumulated intensity value obtained for each target behavior;

Selecting the lowest point of the first intensity variation curve and the second intensity variation curve as the time-domain split point.
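The split-point selection and the subsequent reassembly can be sketched as below. The text is open to more than one reading, so this is only one interpretation: intensities are accumulated per target behavior in fixed time bins, each curve takes the per-bin maximum over behaviors, and the split point is the bin holding the lowest value found on either curve. The function names, the tuple layout, and the 24-bin default are assumptions.

```python
from collections import defaultdict

def intensity_curve(points, n_bins=24):
    """points: (behavior, t, lon, lat, intensity) tuples with t in [0, n_bins).

    Accumulates intensity per behavior and bin, then takes the per-bin maximum
    over behaviors (one reading of 'maximum accumulated intensity value').
    """
    per_behavior = defaultdict(lambda: [0.0] * n_bins)
    for behavior, t, _, _, intensity in points:
        per_behavior[behavior][int(t) % n_bins] += intensity
    return [max((series[b] for series in per_behavior.values()), default=0.0)
            for b in range(n_bins)]

def split_point(curve_a, curve_b):
    """Bin index of the lowest value on either curve (ties: earliest bin)."""
    _, bin_index = min((v, b)
                       for curve in (curve_a, curve_b)
                       for b, v in enumerate(curve))
    return bin_index

def shift_to_start(points, split, period=24):
    """Reassemble a cube so that the split point becomes time zero."""
    return [(beh, (t - split) % period, lon, lat, s)
            for beh, t, lon, lat, s in points]
```

Cutting at a quiet moment avoids splitting a burst of activity that straddles the original time origin, which would otherwise push matching feature points of the two users far apart on the time axis.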

Optionally, in the device for identifying re-entry users, the conversion module 8101 performing unit-free normalization on the behavior feature vectors in the spatio-temporal behavior feature cube of the first user and on the behavior feature vectors in the spatio-temporal behavior feature cube of the second user includes:

Performing unit-free normalization separately on the behavior feature vectors in the spatio-temporal behavior feature cube of the first user and in the spatio-temporal behavior feature cube of the second user, using min-max (range) normalization or standard-deviation (z-score) normalization.
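Both normalization options can be sketched with the standard library. The function names are assumptions, as are the choice of the population standard deviation and the handling of a constant axis; each axis (time, longitude, latitude) is normalized independently, and each user's cube is normalized separately, as the text states.

```python
from statistics import mean, pstdev

def min_max(values):
    """Min-max normalization: maps values onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # assumed guard for a constant axis
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Standard-deviation normalization: zero mean, unit variance."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # assumed guard for a constant axis
    return [(v - mu) / sigma for v in values]

def normalize_points(points, method=min_max):
    """points: list of (t, lon, lat); each axis is normalized independently."""
    axes = [method(list(axis)) for axis in zip(*points)]
    return list(zip(*axes))
```

After this step hours and degrees no longer carry units, so a distance between two feature points is not dominated by whichever axis happens to have the largest numeric range.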

An embodiment of the present invention further provides an identification device, as shown in FIG. 9, including: a processor 901; and a memory 903 connected to the processor 901 through a bus interface 902. The memory 903 is configured to store the programs and data used by the processor 901 when performing operations, and the processor 901 calls and executes the programs and data stored in the memory 903.

A transceiver 904 is connected to the bus interface 902 and is configured to receive and send data under the control of the processor 901. Specifically, the processor 901 is configured to read the program in the memory 903 and execute the following process:

Obtaining behavior feature vectors of at least two users, where each behavior feature vector records the occurrence time, spatial position and intensity representation information of a target behavior;

Performing similarity analysis on the feature points of a first user and the feature points of a second user among the at least two users, according to the feature points determined in a spatio-temporal behavior feature cube by the behavior feature vectors of each user, and determining whether the second user is a re-entry user of the first user;

Wherein the spatio-temporal behavior feature cube takes time, the longitude of the spatial position and the latitude of the spatial position as its coordinates, and the feature points corresponding to the behavior feature vectors are distributed in the spatio-temporal behavior feature cube according to the time and spatial position of the target behavior.
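The cube described above is essentially a set of points in (time, longitude, latitude) space, each tagged with a target behavior and an intensity. A minimal sketch of such a data structure follows; the class name `FeaturePoint`, the field names, and the example behavior label are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FeaturePoint:
    """One behavior feature vector placed in the spatio-temporal cube."""
    behavior: str     # target behavior label, e.g. "call" (hypothetical)
    t: float          # time coordinate, e.g. hour of day
    lon: float        # longitude of the spatial position
    lat: float        # latitude of the spatial position
    intensity: float  # intensity representation information

    def coords(self):
        """Position of the point on the cube's three axes."""
        return (self.t, self.lon, self.lat)

def build_cube(records):
    """Turn raw (behavior, t, lon, lat, intensity) tuples into cube points."""
    return [FeaturePoint(*r) for r in records]
```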

Optionally, in the identification device, the processor 901 obtaining the behavior feature vectors of at least two users includes:

Collecting behavior data of each user, where the behavior data includes the time, spatial position and intensity representation information of different target behaviors;

Constructing the spatio-temporal behavior feature cube of each user from the behavior data;

Performing cluster analysis on the behavior data in the spatio-temporal behavior feature cube, and determining the behavior data whose intensity representation information is greater than a preset intensity threshold as the behavior feature vectors;

Deleting from the spatio-temporal behavior feature cube all feature points other than those corresponding to the behavior feature vectors.

Optionally, in the identification device, the processor 901 performing cluster analysis on the behavior data in the spatio-temporal behavior feature cube and determining the behavior data whose intensity representation information is greater than the preset intensity threshold as the behavior feature vectors includes:

Slicing the spatio-temporal behavior feature cube along the time dimension to form a plurality of data slices;

Clustering the behavior data within each data slice to determine at least one cluster point;

Comparing the intensity representation information of the behavior data corresponding to each cluster point with the preset intensity threshold, and determining the behavior data whose intensity representation information is greater than the preset intensity threshold as the behavior feature vectors.

Optionally, in the identification device, the intensity representation information is expressed as the cumulative duration of the target behavior within a preset statistical period.

Optionally, in the identification device, before performing similarity analysis on the feature points of the first user and the feature points of the second user among the at least two users, the processor 901 is further configured to:

Perform unit-free normalization on the behavior feature vectors in the spatio-temporal behavior feature cube of the first user and on the behavior feature vectors in the spatio-temporal behavior feature cube of the second user, to obtain normalized data vectors of the first user and normalized data vectors of the second user, where each normalized data vector corresponds to one feature point;

Wherein the processor 901 performing similarity analysis on the feature points of the first user and the feature points of the second user among the at least two users includes:

Performing similarity analysis on the feature points corresponding to the normalized data vectors of the first user and the feature points corresponding to the normalized data vectors of the second user, to determine whether the second user is a re-entry user of the first user.

Optionally, in the identification device, the processor 901 performing similarity analysis on the feature points of the first user and the feature points of the second user among the at least two users, and determining whether the second user is a re-entry user of the first user, includes:

Determining the similar feature points between the feature points of the first user and the feature points of the second user;

When the ratio of the number of similar feature points to the number of feature points of the first user is greater than a first preset value, determining that the second user is a re-entry user of the first user.

Optionally, in the identification device, the processor 901 determining the similar feature points between the feature points of the first user and the feature points of the second user includes:

Selecting a first feature point of the first user;

Finding, among the feature points of the second user, the second feature point with the shortest distance to the first feature point, where the behavior feature vectors corresponding to the first feature point and the second feature point have the same target behavior;

Analyzing the similarity value between the first feature point and the second feature point, and determining whether the first feature point and the second feature point are similar feature points.

Optionally, in the identification device, the processor 901 analyzing the similarity value between the first feature point and the second feature point and determining whether the first feature point and the second feature point are similar feature points includes:

Obtaining a first weight value for the target behavior corresponding to the first feature point occurring within a preset duration, and obtaining a second weight value for the target behavior corresponding to the second feature point occurring within the preset duration;

Determining a weight coefficient from the first weight value and the second weight value;

Calculating a similarity value from the weight coefficient and the distance between the first feature point and the second feature point;

When the similarity value is greater than a second preset value, determining that the first feature point and the second feature point are similar feature points.

Optionally, in the identification device, the processor 901 determining the weight coefficient from the first weight value and the second weight value includes:

Calculating the ratio of the smaller of the first weight value and the second weight value to the larger of the first weight value and the second weight value;

Determining the ratio as the weight coefficient.

Optionally, in the identification device, the processor 901 calculating the similarity value from the weight coefficient and the distance between the first feature point and the second feature point includes:

Calculating the similarity value according to the following formula:

Si = 1 - Di/Wi;

Where Si is the similarity value, Di is the distance between the first feature point and the second feature point, and Wi is the weight coefficient.

Optionally, in the identification device, before performing unit-free normalization on the behavior feature vectors in the spatio-temporal behavior feature cube of the first user and on the behavior feature vectors in the spatio-temporal behavior feature cube of the second user, the processor 901 is further configured to:

Determine a time-domain split point according to the time distribution of the behavior feature vectors of the first user and of the second user in the spatio-temporal behavior feature cubes respectively constructed from those behavior feature vectors;

Split and reassemble the spatio-temporal behavior feature cube of the first user and the spatio-temporal behavior feature cube of the second user, so that after splitting and reassembly each of the two cubes takes the time corresponding to the time-domain split point as its starting time point;

Wherein the processor 901 performing unit-free normalization on the behavior feature vectors in the spatio-temporal behavior feature cube of the first user and on the behavior feature vectors in the spatio-temporal behavior feature cube of the second user includes:

Performing unit-free normalization separately on the behavior feature vectors in the split-and-reassembled spatio-temporal behavior feature cube of the first user and of the second user.

Optionally, in the identification device, the processor 901 determining the time-domain split point according to the time distribution of the behavior feature vectors of the first user and of the second user in the spatio-temporal behavior feature cubes respectively constructed from those behavior feature vectors includes:

Along the time dimension, accumulating, separately for the behavior feature vectors in the spatio-temporal behavior feature cube of the first user and for those in the spatio-temporal behavior feature cube of the second user, the intensity representation information corresponding to the same target behavior;

Drawing a first intensity variation curve for the behavior feature vectors of the first user and a second intensity variation curve for the behavior feature vectors of the second user, according to the maximum accumulated intensity value obtained for each target behavior;

Selecting the lowest point of the first intensity variation curve and the second intensity variation curve as the time-domain split point.

Optionally, in the identification device, the processor 901 performing unit-free normalization on the behavior feature vectors in the spatio-temporal behavior feature cube of the first user and on the behavior feature vectors in the spatio-temporal behavior feature cube of the second user includes:

Performing unit-free normalization separately on the behavior feature vectors in the spatio-temporal behavior feature cube of the first user and in the spatio-temporal behavior feature cube of the second user, using min-max (range) normalization or standard-deviation (z-score) normalization.

In FIG. 9, the bus architecture may include any number of interconnected buses and bridges, which link together various circuits of one or more processors, represented by the processor 901, and of memory, represented by the memory 903. The bus architecture may also link together various other circuits such as peripheral devices, voltage regulators and power management circuits; these are well known in the art and are therefore not described further here. The bus interface provides an interface. The transceiver 904 may consist of multiple elements, that is, it may include a transmitter and a receiver, and provides a unit for communicating with various other devices over a transmission medium. The processor 901 is responsible for managing the bus architecture and general processing, and the memory 903 may store data used by the processor 901 when performing operations.

Those skilled in the art will understand that all or some of the steps of the above embodiments may be implemented in hardware, or by a program that instructs the relevant hardware, where the program includes instructions for executing some or all of the steps of the above method; the program may be stored in a readable storage medium, and the storage medium may be a storage medium of any form.

In addition, a specific embodiment of the present invention further provides a readable storage medium on which a computer program is stored, where the program, when executed by a processor, implements the steps of the method for identifying re-entry users described in any of the above.

Specifically, the readable storage medium is applied to the identification device described above. When it is applied to the identification device, the steps executed in the corresponding method for identifying re-entry users are as described in detail above and are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed method and device may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or of another form.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.

An integrated unit implemented in the form of software functional units may be stored in a computer-readable storage medium. The software functional units are stored in a storage medium and include several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk or an optical disc.

The above are preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make improvements and refinements without departing from the principles of the present invention, and such improvements and refinements shall also be regarded as falling within the protection scope of the present invention.

Claims (16)

1. A method for identifying re-entry users, characterized by comprising:
obtaining behavior feature vectors of at least two users, where each behavior feature vector records the occurrence time, spatial position and intensity representation information of a target behavior;
performing similarity analysis on the feature points of a first user and the feature points of a second user among the at least two users, according to the feature points determined in a spatio-temporal behavior feature cube by the behavior feature vectors of each user, and determining whether the second user is a re-entry user of the first user;
wherein the spatio-temporal behavior feature cube takes time, the longitude of the spatial position and the latitude of the spatial position as its coordinates, and the feature points corresponding to the behavior feature vectors are distributed in the spatio-temporal behavior feature cube according to the time and spatial position of the target behavior.

2. The method for identifying re-entry users according to claim 1, characterized in that obtaining the behavior feature vectors of at least two users comprises:
collecting behavior data of each user, where the behavior data includes the time, spatial position and intensity representation information of different target behaviors;
constructing the spatio-temporal behavior feature cube of each user from the behavior data;
performing cluster analysis on the behavior data in the spatio-temporal behavior feature cube, and determining the behavior data whose intensity representation information is greater than a preset intensity threshold as the behavior feature vectors;
deleting from the spatio-temporal behavior feature cube all feature points other than those corresponding to the behavior feature vectors.

3. The method for identifying re-entry users according to claim 2, characterized in that performing cluster analysis on the behavior data in the spatio-temporal behavior feature cube and determining the behavior data whose intensity representation information is greater than the preset intensity threshold as the behavior feature vectors comprises:
slicing the spatio-temporal behavior feature cube along the time dimension to form a plurality of data slices;
clustering the behavior data within each data slice to determine at least one cluster point;
comparing the intensity representation information of the behavior data corresponding to each cluster point with the preset intensity threshold, and determining the behavior data whose intensity representation information is greater than the preset intensity threshold as the behavior feature vectors.

4. The method for identifying re-entry users according to any one of claims 1 to 3, characterized in that the intensity representation information is expressed as the cumulative duration of the target behavior within a preset statistical period.

5. The method for identifying re-entry users according to claim 1, characterized in that, before the similarity analysis is performed on the feature points of the first user and the feature points of the second user among the at least two users, the method further comprises:
performing unit-free normalization on the behavior feature vectors in the spatio-temporal behavior feature cube of the first user and on the behavior feature vectors in the spatio-temporal behavior feature cube of the second user, to obtain normalized data vectors of the first user and normalized data vectors of the second user, where each normalized data vector corresponds to one feature point;
wherein performing similarity analysis on the feature points of the first user and the feature points of the second user among the at least two users comprises:
performing similarity analysis on the feature points corresponding to the normalized data vectors of the first user and the feature points corresponding to the normalized data vectors of the second user, to determine whether the second user is a re-entry user of the first user.

6. 
The method for identifying re-entry users according to claim 1, characterized in that, performing similarity analysis on the feature points of the first user and the feature points of the second user among at least two users, and judging that the second user Whether it is a re-entry user of the first user, including: 确定所述第一用户的特征点与所述第二用户的特征点相比较的相似特征点;determining similar feature points between the feature points of the first user and the feature points of the second user; 在所述相似特征点的数量与所述第一用户的特征点的数量之间的比值大于第一预设值时,确定所述第二用户为所述第一用户的重入网用户。When the ratio between the number of similar feature points and the number of feature points of the first user is greater than a first preset value, it is determined that the second user is a re-entry user of the first user. 7.根据权利要求6所述的重入网用户的识别方法,其特征在于,所述确定所述第一用户的特征点与第二用户的特征点相比较的相似特征点,包括:7. The method for identifying re-entry users according to claim 6, wherein said determining the similar feature points of the feature points of the first user compared with the feature points of the second user comprises: 选取第一用户的第一特征点;Selecting the first feature point of the first user; 计算所述第二用户中与所述第一特征点距离最短的第二特征点;其中所述第一特征点和所述第二特征点所对应行为特征向量的目标行为相同;Calculating a second feature point of the second user with the shortest distance from the first feature point; wherein the target behavior of the behavior feature vector corresponding to the first feature point and the second feature point is the same; 分析所述第一特征点与所述第二特征点的相似度值,判断所述第一特征点与所述第二特征是否为相似特征点。Analyzing the similarity value between the first feature point and the second feature point, and judging whether the first feature point and the second feature point are similar feature points. 8.根据权利要求7所述的重入网用户的识别方法,其特征在于,所述分析所述第一特征点与所述第二特征点的相似度值,判断所述第一特征点与所述第二特征是否为相似特征点,包括:8. 
The identification method of re-entrant network users according to claim 7, characterized in that, said analyzing the similarity value between said first feature point and said second feature point, judging said first feature point and said second feature point Whether the second feature is a similar feature point, including: 获取所述第一特征点所对应目标行为在预设时长内发生时的第一权重值,以及获取所述第二特征点所对应目标行为在预设时长内发生时的第二权重值;Acquiring a first weight value when the target behavior corresponding to the first feature point occurs within a preset time period, and acquiring a second weight value when the target behavior corresponding to the second feature point occurs within a preset time period; 根据所述第一权重值和所述第二权重值,确定权重系数;determining a weight coefficient according to the first weight value and the second weight value; 根据所述权重系数和所述第一特征点与所述第二特征点之间的距离,计算相似度值;calculating a similarity value according to the weight coefficient and the distance between the first feature point and the second feature point; 确定所述相似度值大于第二预设值时,所述第一特征点与所述第二特征为相似特征点。When it is determined that the similarity value is greater than a second preset value, the first feature point and the second feature are similar feature points. 9.根据权利要求8所述的重入网用户的识别方法,其特征在于,根据所述第一权重值和所述第二权重值,确定权重系数,包括:9. The method for identifying re-entry users according to claim 8, wherein, according to the first weight value and the second weight value, determining a weight coefficient includes: 计算所述第一权重值与所述第二权重值中的最小值,与所述第一权重值与所述第二权重值中的最大值的比值;calculating the ratio of the minimum value of the first weight value and the second weight value to the maximum value of the first weight value and the second weight value; 确定所述比值为所述权重系数。The ratio is determined as the weight coefficient. 10.根据权利要求8所述的重入网用户的识别方法,其特征在于,根据所述权重系数和所述第一特征点与所述第二特征点之间的距离,计算相似度值,包括:10. 
The identification method of re-entrant network user according to claim 8, is characterized in that, according to the distance between described weight coefficient and described first feature point and described second feature point, calculate similarity value, comprise : 依据以下公式计算相似度值:The similarity value is calculated according to the following formula: Si=1-Di/Wi;Si=1-Di/Wi; 其中,Si为相似度值;Di为所述第一特征点与所述第二特征点之间的距离;Wi为所述权重系数。Wherein, Si is a similarity value; Di is a distance between the first feature point and the second feature point; Wi is the weight coefficient. 11.根据权利要求5所述的重入网用户的识别方法,其特征在于,对第一用户的时空行为特征立方体中的行为特征向量和第二用户的时空行为特征立方体中的行为特征向量进行去单位标准化转换之前,所述方法还包括:11. the identification method of re-entrant network user according to claim 5 is characterized in that, the behavior feature vector in the spatiotemporal behavior characteristic cube of the first user and the behavior characteristic vector in the spatiotemporal behavior characteristic cube of the second user are removed Before unit normalization conversion, the method also includes: 根据第一用户的行为特征向量和第二用户的行为特征向量分别构建的时空行为特征立方体中,第一用户的行为特征向量和第二用户的行为特征向量的时间分布维度,确定时域切分点;In the spatio-temporal behavior feature cube constructed according to the behavior feature vector of the first user and the behavior feature vector of the second user, the time distribution dimension of the behavior feature vector of the first user and the behavior feature vector of the second user is determined to determine the time domain segmentation point; 对所述第一用户的时空行为特征立方体和所述第二用户的时空行为特征立方体进行切分拼装,使切分拼装后的所述第一用户的时空行为特征立方体和所述第二用户的时空行为特征立方体分别以所述时域切分点所对应的时间作为起始时间点;Segment and assemble the spatiotemporal behavior characteristic cube of the first user and the spatiotemporal behavior characteristic cube of the second user, so that the spatiotemporal behavior characteristic cube of the first user and the second user's spatiotemporal behavior characteristic cube after segmentation and assembling The spatio-temporal behavior characteristic cube uses the time 
corresponding to the time-domain segmentation point as the starting time point; 其中,对第一用户的时空行为特征立方体中的行为特征向量和第二用户的时空行为特征立方体中的行为特征向量进行去单位标准化转换,包括:Wherein, the de-unit standardization transformation is performed on the behavior feature vector in the spatio-temporal behavior feature cube of the first user and the behavior feature vector in the spatio-temporal behavior feature cube of the second user, including: 对切分拼装后的所述第一用户的时空行为特征立方体和所述第二用户的时空行为特征立方体中的行为特征向量分别进行去单位标准化转换。Perform deunit normalization transformation on the behavior feature vectors in the first user's spatiotemporal behavior feature cube and the second user's spatiotemporal behavior feature cube after segmentation and assembly. 12.根据权利要求11所述的重入网用户的识别方法,其特征在于,根据第一用户的行为特征向量和第二用户的行为特征向量分别构建的时空行为特征立方体中,第一用户的行为特征向量和第二用户的行为特征向量的时间分布维度,确定时域切分点,包括:12. The method for identifying re-entry users according to claim 11, characterized in that, in the spatio-temporal behavioral feature cube constructed respectively according to the behavioral feature vector of the first user and the behavioral feature vector of the second user, the behavior of the first user The feature vector and the time distribution dimension of the second user's behavior feature vector determine the time domain segmentation point, including: 依据时间维度,对所述第一用户的时空行为特征立方体中的行为特征向量和所述第二用户的时空行为特征立方体中的行为特征向量分别进行同一目标行为所对应强度表示信息的累加;According to the time dimension, the behavior feature vectors in the spatio-temporal behavior feature cube of the first user and the behavior feature vectors in the spatio-temporal behavior feature cube of the second user respectively accumulate the intensity representation information corresponding to the same target behavior; 根据每一目标行为所对应累加获得的最大强度信息值,绘制所述第一用户的行为特征向量的第一强度变化曲线,以及绘制所第二用户的行为特征向量的第二强度变化曲线;Draw a first intensity change curve of the behavior feature vector of the first user and draw a second intensity change curve of the behavior feature vector of the second user according to the accumulated maximum intensity information 
value corresponding to each target behavior; 选取所述第一强度变化曲线和所第二强度变化曲线中的最低点为所述时域切分点。The lowest point of the first intensity change curve and the second intensity change curve is selected as the time domain segmentation point. 13.根据权利要求5所述的重入网用户的识别方法,其特征在于,对第一用户的时空行为特征立方体中的行为特征向量和第二用户的时空行为特征立方体中的行为特征向量进行去单位标准化转换,包括:13. the identification method of re-entrant network user according to claim 5 is characterized in that, the behavior feature vector in the spatiotemporal behavior characteristic cube of the first user and the behavior characteristic vector in the spatiotemporal behavior characteristic cube of the second user are removed Unit normalization conversions, including: 通过离差标准化法或者标准差标准化法,对所述第一用户的时空行为特征立方体和所述第二用户的时空行为特征立方体中的行为特征向量分别进行去单位标准化转换。By means of a dispersion normalization method or a standard deviation normalization method, de-unit normalization conversion is performed on the behavior feature vectors in the first user's spatio-temporal behavior feature cube and the second user's spatio-temporal behavior feature cube respectively. 14.一种重入网用户的识别装置,其特征在于,包括:14. 
An identification device for re-entry users, characterized in that it comprises: 向量获取模块,用于获取至少两个用户的行为特征向量;所述行为特征向量中记录了目标行为的发生时间、空间位置和强度表示信息;A vector acquisition module, configured to acquire behavioral feature vectors of at least two users; the behavioral feature vectors record the occurrence time, spatial position and intensity representation information of the target behavior; 比较模块,用于根据每一用户的所述行为特征向量在时空行为特征立方体中确定的特征点,对至少两个用户中第一用户的特征点与第二用户的特征点进行相似度分析,判断所述第二用户是否为所述第一用户的重入网用户;The comparison module is used to perform similarity analysis on the feature points of the first user and the feature points of the second user among at least two users according to the feature points determined in the spatio-temporal behavior feature cube by the behavior feature vector of each user, judging whether the second user is a re-entry user of the first user; 其中所述时空行为特征立方体以时间、空间位置的经度和空间位置的纬度为坐标,所述行为特征向量所对应的特征点依据目标行为的时间和空间位置在所述时空行为特征立方体中分布。Wherein the spatio-temporal behavior feature cube takes time, the longitude of the spatial position, and the latitude of the spatial position as coordinates, and the feature points corresponding to the behavior feature vector are distributed in the spatio-temporal behavior feature cube according to the time and space position of the target behavior. 15.一种识别设备,其特征在于,包括:处理器、存储器及存储在所述存储器上并可在所述处理器上运行的程序,所述程序被所述处理器执行时实现如权利要求1至13任一项所述的重入网用户的识别方法。15. An identification device, characterized in that it comprises: a processor, a memory, and a program stored on the memory and operable on the processor, when the program is executed by the processor, it realizes The method for identifying re-entry users described in any one of 1 to 13. 16.一种可读存储介质,其特征在于,所述可读存储介质上存储有程序,所述程序被处理器执行时实现如权利要求1至13任一项所述的重入网用户的识别方法中的步骤。16. 
A readable storage medium, characterized in that a program is stored on the readable storage medium, and when the program is executed by a processor, the identification of re-entrant network users according to any one of claims 1 to 13 is realized steps in the method.
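
As an illustration of claims 6 to 10, the following is a minimal Python sketch of the similarity decision. It is not the patented implementation. The `FeaturePoint` class, the function names, and the use of Euclidean distance over normalized (time, longitude, latitude) coordinates are assumptions made for the example; the claims only fix the weight coefficient (min/max of the two weights), the formula Si = 1 - Di/Wi, and the two threshold comparisons. Weights are assumed to be strictly positive.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FeaturePoint:
    """Hypothetical container for one feature point (names are assumptions)."""
    behavior: str   # target behavior label, e.g. "call"
    t: float        # normalized time coordinate
    lon: float      # normalized longitude coordinate
    lat: float      # normalized latitude coordinate
    weight: float   # weight of the behavior within the preset duration (> 0)


def distance(a: FeaturePoint, b: FeaturePoint) -> float:
    # Assumption: Euclidean distance in the (time, lon, lat) cube.
    return math.dist((a.t, a.lon, a.lat), (b.t, b.lon, b.lat))


def weight_coefficient(w1: float, w2: float) -> float:
    # Claim 9: ratio of the smaller weight to the larger weight.
    return min(w1, w2) / max(w1, w2)


def similarity(p: FeaturePoint, q: FeaturePoint) -> float:
    # Claim 10: Si = 1 - Di / Wi.
    return 1.0 - distance(p, q) / weight_coefficient(p.weight, q.weight)


def nearest_same_behavior(p, candidates):
    # Claim 7: closest point of the second user with the same target behavior.
    same = [q for q in candidates if q.behavior == p.behavior]
    return min(same, key=lambda q: distance(p, q)) if same else None


def is_re_networking_user(first, second, point_threshold, ratio_threshold):
    # Claim 6: share of the first user's points that have a similar counterpart.
    if not first:
        return False
    similar = 0
    for p in first:
        q = nearest_same_behavior(p, second)
        if q is not None and similarity(p, q) > point_threshold:
            similar += 1
    return similar / len(first) > ratio_threshold
```

Here `point_threshold` plays the role of the second preset value and `ratio_threshold` the role of the first preset value; the claims do not state concrete values for either.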
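
The two normalization options named in claim 13 can be sketched as follows. This is a hedged example rather than the method as implemented by the applicant: the function names are hypothetical, each coordinate (time, longitude, latitude) is assumed to be normalized independently per user, and the population standard deviation is an assumption, since the claims do not say which estimator is used.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Min-max (range) normalization: maps values into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate column: every value is identical.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score_normalize(values):
    """Standard-deviation (z-score) normalization: zero mean, unit variance."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def normalize_vectors(vectors, method=min_max_normalize):
    """Normalize each coordinate of a list of (t, lon, lat) tuples separately."""
    columns = list(zip(*vectors))
    normalized = [method(list(column)) for column in columns]
    return [tuple(row) for row in zip(*normalized)]
```

After this step the three axes are unit-free, so a distance between two feature points is no longer dominated by whichever axis happens to have the largest raw numeric range.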
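
The slicing and thresholding of claims 2 to 4 can be sketched in the same spirit. The claims do not name a clustering algorithm, so this example substitutes a simple grid grouping (same behavior, same time slice, same longitude/latitude cell) as a stand-in; that substitution, the record layout, and all names below are assumptions. Intensity is taken as cumulative duration, following claim 4.

```python
import math
from collections import defaultdict


def extract_feature_vectors(records, slice_len, cell_deg, intensity_threshold):
    """Return (behavior, t, lon, lat, intensity) tuples above the threshold.

    records: iterable of (behavior, t, lon, lat, duration) tuples.
    slice_len: width of one time slice, in the same unit as t.
    cell_deg: side of one grid cell in degrees (stand-in for clustering).
    """
    groups = defaultdict(list)
    for behavior, t, lon, lat, duration in records:
        key = (
            behavior,
            int(t // slice_len),          # time slice index (claim 3)
            math.floor(lon / cell_deg),   # grid cell, assumed clustering proxy
            math.floor(lat / cell_deg),
        )
        groups[key].append((t, lon, lat, duration))

    features = []
    for key, members in groups.items():
        intensity = sum(m[3] for m in members)  # cumulative duration (claim 4)
        if intensity > intensity_threshold:
            n = len(members)
            features.append((
                key[0],
                sum(m[0] for m in members) / n,  # cluster point: mean time
                sum(m[1] for m in members) / n,  # mean longitude
                sum(m[2] for m in members) / n,  # mean latitude
                intensity,
            ))
    return features
```

Groups that do not exceed the threshold are dropped, which corresponds to deleting the remaining feature points from the cube in claim 2.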
CN202010350086.6A 2020-04-28 2020-04-28 Identification method, device and equipment of re-networking user Active CN113573242B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010350086.6A CN113573242B (en) 2020-04-28 2020-04-28 Identification method, device and equipment of re-networking user

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010350086.6A CN113573242B (en) 2020-04-28 2020-04-28 Identification method, device and equipment of re-networking user

Publications (2)

Publication Number Publication Date
CN113573242A CN113573242A (en) 2021-10-29
CN113573242B true CN113573242B (en) 2023-03-31

Family

ID=78158091

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010350086.6A Active CN113573242B (en) 2020-04-28 2020-04-28 Identification method, device and equipment of re-networking user

Country Status (1)

Country Link
CN (1) CN113573242B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114860557B (en) * 2022-04-08 2023-05-26 广东联想懂的通信有限公司 User behavior information generation method, device, equipment and readable storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104902498A (en) * 2015-04-17 2015-09-09 中国联合网络通信集团有限公司 Identification method and device for subscriber re-networking

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102682041B (en) * 2011-03-18 2014-06-04 日电(中国)有限公司 User behavior identification equipment and method
CN105281925B (en) * 2014-06-30 2019-05-14 腾讯科技(深圳)有限公司 The method and apparatus that network service groups of users divides
CN110290513B (en) * 2019-07-05 2021-10-15 中国联合网络通信集团有限公司 A method and system for identifying re-entry users

Also Published As

Publication number Publication date
CN113573242A (en) 2021-10-29

Similar Documents

Publication Publication Date Title
CN110147722B (en) Video processing method, video processing device and terminal equipment
CN110147710B (en) Method and device for processing human face features and storage medium
CN111612038B (en) Abnormal user detection method and device, storage medium and electronic equipment
CN110019891B (en) Image storage method, image retrieval method and device
CN111859451B (en) Multi-source multi-mode data processing system and method for applying same
JP7038143B2 (en) How to estimate the deleteability of a data object
CN110348516B (en) Data processing method, data processing device, storage medium and electronic equipment
WO2019062081A1 (en) Salesman profile formation method, electronic device and computer readable storage medium
CN108268886B (en) Method and system for identifying plug-in operations
WO2018033052A1 (en) Method and system for evaluating user portrait data
CN118378218B (en) Safety monitoring method for computer host
CN112307133A (en) Security protection method and device, computer equipment and storage medium
CN105825232A (en) Classification method and device for electromobile users
CN113573242B (en) Identification method, device and equipment of re-networking user
CN115392937A (en) User fraud risk identification method and device, electronic equipment and storage medium
CN113505369A (en) Method and device for training user risk recognition model based on space-time perception
CN117893756A (en) Training method of image segmentation model, handheld object recognition method, device and medium
CN108230001A (en) The method, apparatus and system of extending user
CN110705777B (en) Method, device and system for predicting spare part reserve
CN117150138A (en) Scientific and technological resource organization method and system based on high-dimensional space mapping
CN112487082A (en) Biological feature recognition method and related equipment
CN117785973B (en) Community user information integration method, device, equipment and storage medium
CN112907306A (en) Customer satisfaction judging method and device
CN119474689B (en) Intelligent label processing method, system and medium based on customer management technology
CN119179624B (en) An IT operation and maintenance service information integration service management system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant