CN113573242B - Identification method, device and equipment of re-networking user - Google Patents
- Publication number
- CN113573242B (application CN202010350086.6A)
- Authority
- CN
- China
- Prior art keywords
- user
- feature
- behavior
- cube
- vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/02—Services making use of location information
- H04W4/029—Location-based management or tracking services
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B17/00—Monitoring; Testing
- H04B17/30—Monitoring; Testing of propagation channels
- H04B17/309—Measuring or estimating channel quality parameters
- H04B17/318—Received signal strength
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W24/00—Supervisory, monitoring or testing arrangements
- H04W24/08—Testing, supervising or monitoring using real traffic
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W8/00—Network data management
- H04W8/26—Network addressing or numbering for mobility support
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Databases & Information Systems (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- Electromagnetism (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Technical field
The present invention relates to the field of communication technology, and in particular to a method, device and equipment for identifying re-networking users.
Background
A user who is using, or has used, a card number of a given operator and who, within a short period, purchases another card number from the same operator to access the network, with the new number wholly or partly replacing the old one, is a re-networking user. Re-networking users occupy the system's card number resources, raise the company's marketing costs and increase business risk, so they need to be identified and managed effectively. However, because the re-networking mobile number and the original in-network mobile number are two different numbers, the key to identifying a re-networking number is determining whether the two numbers are used by the same person.
Summary of the invention
The purpose of the technical solution of the present invention is to provide a method, device and equipment for identifying re-networking users that can simply and effectively identify re-networking users within the same operator network.
An embodiment of the present invention provides a method for identifying a re-networking user, comprising:
acquiring behavior feature vectors of at least two users, wherein each behavior feature vector records the occurrence time, spatial position and intensity representation information of a target behavior;
performing, according to the feature points that the behavior feature vectors of each user determine in a spatio-temporal behavior feature cube, a similarity analysis between the feature points of a first user and the feature points of a second user among the at least two users, and judging whether the second user is a re-networking user of the first user;
wherein the spatio-temporal behavior feature cube takes time, the longitude of the spatial position and the latitude of the spatial position as its coordinates, and the feature points corresponding to the behavior feature vectors are distributed in the spatio-temporal behavior feature cube according to the time and spatial position of the target behavior.
Optionally, in the method for identifying a re-networking user, acquiring the behavior feature vectors of at least two users comprises:
collecting behavior data of each user, wherein the behavior data includes the time, spatial position and intensity representation information of different target behaviors;
constructing the spatio-temporal behavior feature cube of each user from the behavior data;
performing cluster analysis on the behavior data in the spatio-temporal behavior feature cube, and determining the behavior data whose intensity representation information is greater than a preset intensity threshold as the behavior feature vectors;
deleting from the spatio-temporal behavior feature cube all feature points other than those corresponding to the behavior feature vectors.
Optionally, in the method for identifying a re-networking user, performing cluster analysis on the behavior data in the spatio-temporal behavior feature cube and determining the behavior data whose intensity representation information is greater than the preset intensity threshold as the behavior feature vectors comprises:
slicing the spatio-temporal behavior feature cube along the time dimension to form a plurality of data slices;
clustering the behavior data within each data slice to determine at least one cluster point;
comparing the intensity representation information of the behavior data corresponding to each cluster point with the preset intensity threshold, and determining the behavior data whose intensity representation information is greater than the preset intensity threshold as the behavior feature vectors.
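The slice, cluster and threshold steps can be sketched as follows. This is a minimal illustration, not the patented implementation: the text here does not name a clustering algorithm, so rounding coordinates to a grid cell stands in for it, and summing the intensities of a cluster's members is likewise an assumption. The names `Record`, `extract_feature_vectors` and `cell` are hypothetical, and a non-negative threshold is assumed.

```python
from collections import defaultdict
from typing import NamedTuple


class Record(NamedTuple):
    behavior: str     # target behavior label (hypothetical encoding)
    slot: int         # index of the time period, i.e. the time slice
    lon: float        # longitude
    lat: float        # latitude
    intensity: float  # cumulative duration in seconds


def extract_feature_vectors(records, threshold, cell=0.01):
    """Slice by time, group each slice into clusters, keep strong clusters.

    ASSUMPTION: grid bucketing (rounding to `cell` degrees) replaces the
    unspecified clustering algorithm, and a cluster's intensity is the sum
    of its members' intensities.
    """
    clusters = defaultdict(list)
    for r in records:
        # One bucket per (time slice, behavior, grid cell).
        key = (r.slot, r.behavior, round(r.lon / cell), round(r.lat / cell))
        clusters[key].append(r)

    features = []
    for (slot, behavior, _, _), members in clusters.items():
        total = sum(m.intensity for m in members)
        if total > threshold:
            # The intensity-weighted centroid represents the cluster point.
            lon = sum(m.lon * m.intensity for m in members) / total
            lat = sum(m.lat * m.intensity for m in members) / total
            features.append(Record(behavior, slot, lon, lat, total))
    return features
```

Any density-based or centroid-based clustering could replace the grid bucketing without changing the thresholding step that follows it.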
Optionally, in the method for identifying a re-networking user, the intensity representation information is expressed as the cumulative duration of the target behavior within a preset statistical period.
Optionally, in the method for identifying a re-networking user, before the similarity analysis between the feature points of the first user and the feature points of the second user among the at least two users, the method further comprises:
performing unit-free normalization on the behavior feature vectors in the spatio-temporal behavior feature cube of the first user and on the behavior feature vectors in the spatio-temporal behavior feature cube of the second user, to obtain normalized data vectors of the first user and normalized data vectors of the second user, wherein each normalized data vector corresponds to one feature point;
wherein performing the similarity analysis between the feature points of the first user and the feature points of the second user among the at least two users comprises:
performing the similarity analysis between the feature points corresponding to the normalized data vectors of the first user and the feature points corresponding to the normalized data vectors of the second user, and judging whether the second user is a re-networking user of the first user.
Optionally, in the method for identifying a re-networking user, performing the similarity analysis between the feature points of the first user and the feature points of the second user among the at least two users, and judging whether the second user is a re-networking user of the first user, comprises:
determining the similar feature points between the feature points of the first user and the feature points of the second user;
determining that the second user is a re-networking user of the first user when the ratio of the number of similar feature points to the number of feature points of the first user is greater than a first preset value.
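The ratio test above reduces to a single comparison. The sketch below assumes the counts have already been computed; the function name and the handling of an empty first user are illustrative choices, not part of the source.

```python
def is_renetworking_user(similar_count, first_user_count, first_preset):
    """Return True when similar points / first user's points > first_preset."""
    if first_user_count == 0:
        # ASSUMPTION: with no feature points there is no evidence of a match.
        return False
    return similar_count / first_user_count > first_preset
```

Note that the denominator is the first user's feature point count, so the test is asymmetric: it asks how much of the original number's behavior reappears under the new number.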
Optionally, in the method for identifying a re-networking user, determining the similar feature points between the feature points of the first user and the feature points of the second user comprises:
selecting a first feature point of the first user;
finding, among the feature points of the second user, the second feature point with the shortest distance to the first feature point, wherein the behavior feature vectors corresponding to the first feature point and the second feature point have the same target behavior;
analyzing the similarity value between the first feature point and the second feature point, and judging whether the first feature point and the second feature point are similar feature points.
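The nearest-point search can be sketched as below. The distance metric is not specified in this passage, so Euclidean distance over already normalized (time, longitude, latitude) coordinates is an assumption; `FeaturePoint` and `nearest_same_behavior` are hypothetical names.

```python
import math
from typing import NamedTuple, Optional, Sequence, Tuple


class FeaturePoint(NamedTuple):
    behavior: str  # target behavior label
    t: float       # normalized time coordinate
    lon: float     # normalized longitude
    lat: float     # normalized latitude


def nearest_same_behavior(
    p: FeaturePoint, candidates: Sequence[FeaturePoint]
) -> Optional[Tuple[FeaturePoint, float]]:
    """Return the closest candidate sharing p's target behavior, with its distance."""
    same = [q for q in candidates if q.behavior == p.behavior]
    if not same:
        return None  # the second user never shows this behavior
    # ASSUMPTION: Euclidean distance over the three cube coordinates.
    best = min(same, key=lambda q: math.dist(p[1:], q[1:]))
    return best, math.dist(p[1:], best[1:])
```

Filtering by behavior before measuring distance keeps a nearby point of a different behavior from being chosen as the match.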
Optionally, in the method for identifying a re-networking user, analyzing the similarity value between the first feature point and the second feature point, and judging whether the first feature point and the second feature point are similar feature points, comprises:
acquiring a first weight value for the occurrence, within a preset duration, of the target behavior corresponding to the first feature point, and acquiring a second weight value for the occurrence, within the preset duration, of the target behavior corresponding to the second feature point;
determining a weight coefficient according to the first weight value and the second weight value;
calculating a similarity value according to the weight coefficient and the distance between the first feature point and the second feature point;
determining that the first feature point and the second feature point are similar feature points when the similarity value is greater than a second preset value.
Optionally, in the method for identifying a re-networking user, determining the weight coefficient according to the first weight value and the second weight value comprises:
calculating the ratio of the smaller of the first weight value and the second weight value to the larger of the first weight value and the second weight value;
taking the ratio as the weight coefficient.
Optionally, in the method for identifying a re-networking user, calculating the similarity value according to the weight coefficient and the distance between the first feature point and the second feature point comprises:
calculating the similarity value according to the following formula:
Si = 1 - Di / Wi;
where Si is the similarity value, Di is the distance between the first feature point and the second feature point, and Wi is the weight coefficient.
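A worked sketch of the weight coefficient and the similarity formula follows. The weights are taken as given positive inputs, because how they are derived is not detailed here; the function names and the `second_preset` parameter name are illustrative.

```python
def weight_coefficient(w1, w2):
    """Ratio of the smaller weight to the larger one, in (0, 1]."""
    if w1 <= 0 or w2 <= 0:
        # ASSUMPTION: weights are strictly positive.
        raise ValueError("weights must be positive")
    return min(w1, w2) / max(w1, w2)


def similarity(distance, w1, w2):
    """Si = 1 - Di / Wi, with Wi the weight coefficient."""
    return 1.0 - distance / weight_coefficient(w1, w2)


def is_similar_point(distance, w1, w2, second_preset):
    """Two feature points are similar when Si exceeds the second preset value."""
    return similarity(distance, w1, w2) > second_preset
```

For example, with weights 2 and 4 the coefficient is 0.5, so a distance of 0.1 gives a similarity of 1 - 0.1 / 0.5 = 0.8. Because the coefficient never exceeds 1, unequal weights always make a given distance count for more.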
Optionally, in the method for identifying a re-networking user, before the unit-free normalization of the behavior feature vectors in the spatio-temporal behavior feature cube of the first user and in the spatio-temporal behavior feature cube of the second user, the method further comprises:
determining a time-domain segmentation point according to how the behavior feature vectors of the first user and of the second user are distributed along the time dimension in the spatio-temporal behavior feature cubes constructed from them respectively;
segmenting and reassembling the spatio-temporal behavior feature cube of the first user and the spatio-temporal behavior feature cube of the second user, so that after reassembly each cube takes the time corresponding to the time-domain segmentation point as its starting time point;
wherein performing the unit-free normalization on the behavior feature vectors in the spatio-temporal behavior feature cube of the first user and in the spatio-temporal behavior feature cube of the second user comprises:
performing the unit-free normalization separately on the behavior feature vectors in the reassembled spatio-temporal behavior feature cube of the first user and in the reassembled spatio-temporal behavior feature cube of the second user.
Optionally, in the method for identifying a re-networking user, determining the time-domain segmentation point according to how the behavior feature vectors of the first user and of the second user are distributed along the time dimension in their respective spatio-temporal behavior feature cubes comprises:
accumulating, along the time dimension and separately for the behavior feature vectors in the spatio-temporal behavior feature cube of the first user and in that of the second user, the intensity representation information corresponding to the same target behavior;
plotting a first intensity change curve for the behavior feature vectors of the first user and a second intensity change curve for the behavior feature vectors of the second user, according to the maximum accumulated intensity value corresponding to each target behavior;
selecting the lowest point of the first intensity change curve and the second intensity change curve as the time-domain segmentation point.
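One possible reading of the segmentation steps is sketched below. Several details are assumptions rather than statements from the source: the curve value of a time slot is taken as the largest accumulated intensity among the behaviors in that slot, the "lowest point" of the two curves is taken as the slot minimizing their sum, and reassembly is modeled as rotating slot indices so the day starts at the cut. All names are hypothetical.

```python
from collections import defaultdict


def intensity_curve(points, slots_per_day):
    """Build a per-slot curve from (behavior, slot, intensity) triples.

    Intensities are accumulated per (slot, behavior); the curve keeps the
    maximum accumulated value in each slot (ASSUMPTION about the text).
    """
    acc = defaultdict(float)
    for behavior, slot, intensity in points:
        acc[(slot, behavior)] += intensity
    curve = [0.0] * slots_per_day
    for (slot, _), total in acc.items():
        curve[slot] = max(curve[slot], total)
    return curve


def cut_point(curve_a, curve_b):
    """Slot where the two users are jointly least active.

    ASSUMPTION: the curves are combined by summing; ties go to the earliest slot.
    """
    return min(range(len(curve_a)), key=lambda s: curve_a[s] + curve_b[s])


def rotate_slot(slot, cut, slots_per_day):
    """Re-index a slot so that the cut point becomes slot 0."""
    return (slot - cut) % slots_per_day
```

Starting both cubes at a shared low-activity time avoids splitting a block of continuous activity, such as one spanning midnight, across the two ends of the time axis.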
Optionally, in the method for identifying a re-networking user, performing the unit-free normalization on the behavior feature vectors in the spatio-temporal behavior feature cube of the first user and in the spatio-temporal behavior feature cube of the second user comprises:
performing the unit-free normalization separately on the behavior feature vectors in the spatio-temporal behavior feature cube of the first user and in that of the second user, using either min-max (range) normalization or standard-deviation (z-score) normalization.
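Both normalization options can be sketched for a single coordinate column, for example all time values or all longitudes of one user's cube. Using the population standard deviation and mapping a constant column to zeros are assumptions made for this illustration.

```python
from statistics import mean, pstdev


def min_max(values):
    """Min-max normalization: map values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # ASSUMPTION: a constant column carries no information.
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Standard-deviation normalization: zero mean, unit variance."""
    mu, sigma = mean(values), pstdev(values)  # population std dev (assumption)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]
```

Either transform removes the units, so a distance computed across time, longitude and latitude is not dominated by whichever axis has the largest raw range.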
An embodiment of the present invention further provides a device for identifying a re-networking user, comprising:
a vector acquisition module, configured to acquire behavior feature vectors of at least two users, wherein each behavior feature vector records the occurrence time, spatial position and intensity representation information of a target behavior;
a comparison module, configured to perform, according to the feature points that the behavior feature vectors of each user determine in a spatio-temporal behavior feature cube, a similarity analysis between the feature points of a first user and the feature points of a second user among the at least two users, and to judge whether the second user is a re-networking user of the first user;
wherein the spatio-temporal behavior feature cube takes time, the longitude of the spatial position and the latitude of the spatial position as its coordinates, and the feature points corresponding to the behavior feature vectors are distributed in the spatio-temporal behavior feature cube according to the time and spatial position of the target behavior.
An embodiment of the present invention further provides identification equipment, comprising a processor, a memory, and a program stored in the memory and executable on the processor, wherein the program, when executed by the processor, implements the method for identifying a re-networking user according to any one of the above.
An embodiment of the present invention further provides a readable storage medium on which a program is stored, wherein the program, when executed by a processor, implements the steps of the method for identifying a re-networking user according to any one of the above.
At least one of the above technical solutions of the present invention has the following beneficial effects:
The method for identifying a re-networking user according to the embodiments of the present invention constructs a user's behavior feature vectors from time, spatial position and intensity representation information, and identifies re-networking users by a similarity analysis of the feature points that the behavior feature vectors determine in the spatio-temporal behavior feature cube. Because the method associates time, spatial position and intensity representation information with one another, it can identify re-networking users within the same operator network more simply and effectively than the prior art.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the method for identifying a re-networking user according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of step S110 in Fig. 1;
Fig. 3 is a schematic diagram of one spatio-temporal behavior feature cube;
Fig. 4 is a schematic flowchart of step S113 in Fig. 2;
Fig. 5 is a schematic diagram of one time slice;
Fig. 6 is a schematic flowchart of step S120 in Fig. 1;
Fig. 7 is a schematic structural diagram of the spatio-temporal behavior feature cube after time-domain segmentation and reassembly;
Fig. 8 is a schematic diagram of the device for identifying a re-networking user according to an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of the identification equipment according to an embodiment of the present invention.
Detailed description
To make the technical problem to be solved, the technical solutions and the advantages of the present invention clearer, a detailed description is given below with reference to the drawings and specific embodiments.
An embodiment of the present invention provides a method for identifying re-networking users. It relies on the fact that the behavior of one and the same user at particular times and places is highly similar. The method constructs a user's behavior feature vectors from time, spatial position and intensity representation information and, through the feature points that those vectors determine in a spatio-temporal behavior feature cube, associates time, spatial position and intensity representation information with one another, so that re-networking users within the same operator network can be identified simply and effectively.
In one embodiment of the present invention, as shown in Fig. 1, the method for identifying a re-networking user comprises:
S110, acquiring behavior feature vectors of at least two users, wherein each behavior feature vector records the occurrence time, spatial position and intensity representation information of a target behavior;
S120, performing, according to the feature points that the behavior feature vectors of each user determine in a spatio-temporal behavior feature cube, a similarity analysis between the feature points of a first user and the feature points of a second user among the at least two users, and judging whether the second user is a re-networking user of the first user;
wherein the spatio-temporal behavior feature cube takes time, the longitude of the spatial position and the latitude of the spatial position as its coordinates, and the feature points corresponding to the behavior feature vectors are distributed in the spatio-temporal behavior feature cube according to the time and spatial position of the target behavior.
The method according to the embodiments of the present invention constructs each user's behavior feature vectors from time, spatial position and intensity representation information, and identifies re-networking users by comparing the feature points that the behavior feature vectors of different users determine in the spatio-temporal behavior feature cube. Compared with identifying users through a single-dimension model, this helps ensure the accuracy of re-networking user identification. In addition, analyzing time, spatial position and intensity representation information jointly further supports that accuracy. The prior art models each dimension separately, analyzes similarity per dimension, and then combines the per-dimension results to identify re-networking users; by comparison, the identification method of the embodiments of the present invention is simpler and easier to implement.
Optionally, as shown in Fig. 2, acquiring the behavior feature vectors of at least two users in step S110 comprises:
S111, collecting behavior data of each user, wherein the behavior data includes the time, spatial position and intensity representation information of different target behaviors;
S112, constructing the spatio-temporal behavior feature cube of each user from the behavior data, wherein the spatio-temporal behavior feature cube takes time, the longitude of the spatial position and the latitude of the spatial position as its coordinates, and the behavior data is distributed in the spatio-temporal behavior feature cube according to the time and spatial position of the target behavior;
S113, performing cluster analysis on the behavior data in the spatio-temporal behavior feature cube, and determining the behavior data whose intensity representation information is greater than a preset intensity threshold as the behavior feature vectors;
S114, deleting from the spatio-temporal behavior feature cube all feature points other than those corresponding to the behavior feature vectors.
In step S111, collecting the behavior data of each user involves collecting four elements: time, spatial position, behavior and intensity representation information.
1) Time: using a preset duration (for example, 10 minutes) as the period, the user's behavior data is collected for each such period across the 24 hours of every day over a span of time (for example, two weeks);
2) Spatial position: this comprises longitude and latitude information. Base station location information for the specified time period is collected from signaling data, call data, Internet access data and the like, and converted into longitude and latitude to obtain the spatial position in the behavior data;
3) Behavior: optionally, the collected behaviors may include: the user is powered on but neither calling nor using the Internet, that is, user standby; call behavior, distinguished by number segment, for example C_139; application (APP) behavior, distinguished by APP; and so on.
4) Intensity representation information: optionally, the intensity representation information is expressed as the cumulative duration of the target behavior within a preset statistical period. Note that this cumulative duration is the cumulative time for which the target behavior occurred in the target space within the preset statistical period. For example, the cumulative behavior duration (in seconds) at one location over a span of time (for example, two weeks) is used as the user's behavior intensity representation information.
Collecting each user's behavior data on the basis of the four elements above yields multiple groups of behavior data, each corresponding to a different target behavior and each comprising the target behavior, time, spatial position and intensity representation information. For example, the collected behavior data may take the form shown in Table 1 below:
Table 1
Once behavior data in the above form has been obtained through step S111, the embodiment of the present invention uses it in step S112 to construct the spatio-temporal behavior feature cube. In this cube, time, the longitude of the spatial position and the latitude of the spatial position are the three coordinate dimensions of a feature position. Each item of the user's behavior data is characterized by the three dimensions of longitude, latitude and time, is distributed in the spatio-temporal behavior feature cube, and is represented by a feature point, with each item of behavior data corresponding to one feature point.
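The cube construction can be sketched as a mapping from cube coordinates to accumulated intensity. The layout below is an assumption for illustration: each feature point is keyed by (behavior, time slot, longitude, latitude), the time slot is the 10-minute period of the day given as an example in the text, and the row format and names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

SLOT_MINUTES = 10  # example period from the text; configurable in practice


def slot_of(ts):
    """Index of the time-of-day period that contains timestamp ts."""
    return (ts.hour * 60 + ts.minute) // SLOT_MINUTES


def build_cube(rows):
    """Build one user's cube from (behavior, timestamp, lon, lat, seconds) rows.

    Returns {(behavior, slot, lon, lat): cumulative seconds}. Rows from
    different days that fall in the same slot and position accumulate into
    one feature point, whose value is its intensity.
    """
    cube = defaultdict(float)
    for behavior, ts, lon, lat, seconds in rows:
        cube[(behavior, slot_of(ts), lon, lat)] += seconds
    return dict(cube)
```

A usage example with made-up values: two standby records at 08:03 and 08:07 on different days, at the same position, fall into slot 48 and merge into a single feature point whose intensity is the sum of their durations.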
该时空行为特征立方体的示例可以如图3所示,不同目标行为在图3中用不同灰度表示,具体实施时可以通过颜色区分。根据图3,通过该时空行为特征立方体,能够清楚展示不同目标行为在时间和空间上的分布状况。An example of the spatio-temporal behavior feature cube can be shown in Figure 3. Different target behaviors are represented by different gray levels in Figure 3, which can be distinguished by color during specific implementation. According to Figure 3, through the spatio-temporal behavior feature cube, the distribution of different target behaviors in time and space can be clearly displayed.
需要说明的是,通过上述构造时空行为特征立方体的方式,对应每一用户可以分别构造相对应的时空行为特征立方体。It should be noted that, through the above method of constructing a spatio-temporal behavior feature cube, a corresponding spatio-temporal behavior feature cube can be constructed for each user.
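As an illustrative, non-limiting sketch, the cube of one user can be held as a list of feature points, one per item of behavior data. The class name `FeaturePoint`, the field names, the record layout and the sample values are assumptions made for this example only and do not appear in the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeaturePoint:
    behavior: str     # target behavior label (hypothetical values below)
    time_h: float     # time of day in hours, 0 <= time_h < 24
    lon: float        # longitude of the spatial position, in degrees
    lat: float        # latitude of the spatial position, in degrees
    intensity: float  # cumulative duration in seconds


def build_cube(records):
    """Turn raw behavior records (dicts) into the feature points of one user's cube."""
    return [
        FeaturePoint(r["behavior"], r["time_h"], r["lon"], r["lat"], r["intensity"])
        for r in records
    ]


# Hypothetical sample records for a single user.
records = [
    {"behavior": "standby", "time_h": 18.1, "lon": 116.40, "lat": 39.90, "intensity": 600.0},
    {"behavior": "video", "time_h": 18.2, "lon": 116.41, "lat": 39.91, "intensity": 1500.0},
]
cube = build_cube(records)
print(len(cube), cube[1].behavior)
```

Each point keeps its intensity alongside the three coordinates, so the later thresholding and weighting steps can read it directly.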
Further, in the method for identifying a re-networking user according to the embodiment of the present invention, after the spatio-temporal behavior feature cube is constructed in step S112, step S113 performs cluster analysis on the behavior data in the cube, determines the behavior data whose intensity representation information is greater than a preset intensity threshold as behavior feature vectors, and retains the feature points in the cube that correspond to the determined behavior feature vectors, for use in the similarity analysis performed later when identifying re-networking users.
Optionally, as shown in FIG. 4, performing cluster analysis on the behavior data in the spatio-temporal behavior feature cube in step S113 and determining the behavior data whose intensity representation information is greater than the preset intensity threshold as the behavior feature vectors includes:
S1131: slicing the spatio-temporal behavior feature cube along the time dimension to form a plurality of slice data;
S1132: clustering the behavior data in each slice data to determine at least one cluster point;
S1133: comparing the intensity representation information of the behavior data corresponding to each cluster point with the preset intensity threshold, and determining the behavior data whose intensity representation information is greater than the preset intensity threshold as the behavior feature vector.
In step S1131, the spatio-temporal behavior feature cube may be sliced along the time dimension at intervals of a preset duration to form the plurality of slice data. For example, slicing a user's spatio-temporal behavior feature cube every half hour yields 48 slice data.
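The half-hour slicing of step S1131 can be sketched as follows. This is a minimal example that assumes each point is a `(time_h, lon, lat, intensity)` tuple with the time of day expressed in hours; the function names and sample points are hypothetical.

```python
from collections import defaultdict

SLICE_HOURS = 0.5  # half-hour slices, giving 48 slices over a 24-hour day


def slice_index(time_h, slice_hours=SLICE_HOURS):
    """Index of the time slice that contains a time of day given in hours."""
    n_slices = round(24 / slice_hours)
    return int(time_h // slice_hours) % n_slices


def slice_cube(points, slice_hours=SLICE_HOURS):
    """Group (time_h, lon, lat, intensity) tuples by time slice."""
    slices = defaultdict(list)
    for p in points:
        slices[slice_index(p[0], slice_hours)].append(p)
    return dict(slices)


# Hypothetical sample points: two in the 18:00-18:30 slice, one in the last slice.
points = [
    (18.1, 116.40, 39.90, 600.0),
    (18.4, 116.41, 39.91, 900.0),
    (23.75, 116.30, 39.80, 300.0),
]
slices = slice_cube(points)
print(sorted(slices))
```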
In step S1132, the behavior data in each slice data is clustered. Optionally, each slice data may be clustered with Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The center point of each cluster is taken as its cluster point and serves as a feature identifier: it reflects the average position and average intensity of a specific user behavior at a specific time, so it can be extracted as target data to build the user's behavior feature vector.
For example, suppose a user leaves work at around six o'clock every day, takes the subway home and likes to use Douyin on the subway. A simplified version of that user's density-clustered slice data for 6:00-6:30 is shown in FIG. 5, where black dots represent standby behavior, white dots represent Douyin use, and the size of a dot represents the behavior intensity.
Step S1132 therefore determines at least one cluster point in each slice data, and each cluster point reflects the average position and average intensity of a specific user behavior at a specific time.
On this basis, step S1133 compares the intensity representation information of the behavior data corresponding to each cluster point with the preset intensity threshold, that is, it applies a threshold test to the intensity representation information of each cluster point. Behavior data whose intensity representation information is greater than the preset intensity threshold is judged to be valid data, saved, and used to construct the user's behavior feature vector.
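Steps S1132 and S1133 can be illustrated with a small, self-contained sketch. The embodiment names DBSCAN but gives no parameters, so the pure-Python `dbscan` below, the `eps` and `min_pts` values, the row layout `(lon, lat, intensity)` and the threshold of 50 are all assumptions chosen for illustration; a production system would more likely rely on an existing clustering library.

```python
import math


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN over coordinate tuples; returns one label per point, -1 for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is itself a core point: keep expanding
                queue.extend(nbrs)
    return labels


def cluster_points(rows, eps, min_pts):
    """Cluster one slice's (lon, lat, intensity) rows; return the mean point of each cluster."""
    labels = dbscan([(lon, lat) for lon, lat, _ in rows], eps, min_pts)
    centers = []
    for c in sorted(set(labels) - {-1}):
        members = [r for r, label in zip(rows, labels) if label == c]
        centers.append(tuple(sum(m[k] for m in members) / len(members) for k in range(3)))
    return centers


def keep_strong(centers, threshold):
    """Keep only cluster points whose mean intensity exceeds the preset threshold."""
    return [c for c in centers if c[2] > threshold]


# Hypothetical slice: two dense groups and one isolated point.
rows = [
    (0.0, 0.0, 100.0), (0.0, 0.1, 200.0), (0.1, 0.0, 300.0),
    (5.0, 5.0, 10.0), (5.0, 5.1, 20.0), (5.1, 5.0, 30.0),
    (9.0, 9.0, 999.0),
]
labels = dbscan([(lon, lat) for lon, lat, _ in rows], eps=0.2, min_pts=2)
centers = cluster_points(rows, eps=0.2, min_pts=2)
strong = keep_strong(centers, threshold=50.0)
print(labels, len(centers), len(strong))
```

In this sample the isolated point is discarded as noise despite its high intensity, and the second cluster is dropped by the threshold, leaving a single cluster point for the slice.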
In one implementation of the embodiment of the present invention, as shown in FIG. 6, before the similarity analysis of step S120 is performed on the feature points of a first user and the feature points of a second user among the at least two users, the method further includes:
S1101: performing de-unit normalization conversion on the behavior feature vectors in the first user's spatio-temporal behavior feature cube and on the behavior feature vectors in the second user's spatio-temporal behavior feature cube, to obtain normalized data vectors of the first user and normalized data vectors of the second user, where each normalized data vector corresponds to one feature point;
In this case, performing similarity analysis in step S120 on the feature points of the first user and the feature points of the second user among the at least two users includes:
performing similarity analysis on the feature points corresponding to the first user's normalized data vectors and the feature points corresponding to the second user's normalized data vectors, and judging whether the second user is a re-networking user of the first user.
Specifically, this normalizes the data of each user's behavior feature vectors: the units are removed and the data is converted into dimensionless pure values, so that indicators with different units or magnitudes can be calculated and compared, and so that the feature points corresponding to different users' normalized data vectors can subsequently be compared for similarity.
In the embodiment of the present invention, to ensure the accuracy of the data analysis, optionally, before the de-unit normalization conversion of step S1101 is performed on the behavior feature vectors in the first user's and the second user's spatio-temporal behavior feature cubes, the method further includes:
determining a time-domain segmentation point according to the distribution along the time dimension of the first user's behavior feature vectors and the second user's behavior feature vectors in the spatio-temporal behavior feature cubes constructed from them respectively;
segmenting and reassembling the first user's spatio-temporal behavior feature cube and the second user's spatio-temporal behavior feature cube, so that after segmentation and reassembly each cube takes the time corresponding to the time-domain segmentation point as its starting time point;
In this case, performing de-unit normalization conversion in step S1101 on the behavior feature vectors in the first user's and the second user's spatio-temporal behavior feature cubes includes:
performing de-unit normalization conversion separately on the behavior feature vectors in the segmented and reassembled spatio-temporal behavior feature cube of the first user and in that of the second user.
Optionally, determining the time-domain segmentation point according to the distribution along the time dimension of the first user's behavior feature vectors and the second user's behavior feature vectors in their respective spatio-temporal behavior feature cubes includes:
accumulating, along the time dimension, the intensity representation information corresponding to the same target behavior, separately for the behavior feature vectors in the first user's spatio-temporal behavior feature cube and for those in the second user's spatio-temporal behavior feature cube;
drawing a first intensity change curve for the first user's behavior feature vectors and a second intensity change curve for the second user's behavior feature vectors according to the maximum accumulated intensity value obtained for each target behavior;
selecting the lowest point of the first intensity change curve and the second intensity change curve as the time-domain segmentation point.
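The three steps above leave some room for interpretation, so the following is only one possible reading, offered as a sketch. It assumes hourly slots, takes the largest per-behavior accumulated intensity in each slot as the curve value, and picks the slot where the sum of the two users' curves is lowest. The slot width, the summing of the two curves, the tie-breaking rule and all sample data are assumptions of this example.

```python
from collections import defaultdict


def intensity_curve(vectors, n_slots=24):
    """Per hourly slot, the largest per-behavior accumulated intensity."""
    acc = defaultdict(float)  # (slot, behavior) -> summed intensity
    for behavior, time_h, intensity in vectors:
        acc[(int(time_h) % n_slots, behavior)] += intensity
    curve = [0.0] * n_slots
    for (slot, _), total in acc.items():
        curve[slot] = max(curve[slot], total)
    return curve


def cut_point(curve_a, curve_b):
    """Slot where the two users' combined intensity is lowest (earliest slot on ties)."""
    combined = [a + b for a, b in zip(curve_a, curve_b)]
    return combined.index(min(combined))


# Hypothetical users whose activity is weakest around 2:00.
user_a = [("standby", h + 0.5, 100.0 + 50.0 * abs(h - 2)) for h in range(24)]
user_b = [("video", h + 0.25, 80.0 + 10.0 * abs(h - 2)) for h in range(24)]
curve_a = intensity_curve(user_a)
curve_b = intensity_curve(user_b)
print(cut_point(curve_a, curve_b))
```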
Let the first user be an original on-network user and the second user be the user to be matched. The behavior feature vectors of the first user and the second user are extracted from the database, and before de-unit normalization conversion is performed on them, the time-domain segmentation point is determined and time-domain segmentation and reassembly is carried out.
Determining the time-domain segmentation point selects the point at which user activity intensity is weakest. Based on this range of weak activity, the spatio-temporal behavior feature cubes determined above can be re-segmented and reassembled for the subsequent identification of re-networking users.
It should be noted that the occurrence times of the target behaviors recorded in the behavior feature vectors lie within a one-day range. If behavior feature vectors are extracted and analyzed on a time axis that starts at the default 0:00, misjudgment is likely. Suppose a user performs the same behavior at the same place in the 23:30-24:00 period and in the 00:00-00:30 period: the actual interval is less than one hour, but in a feature space whose time axis starts at 0:00 the two occurrences are more than 20 hours apart. This discontinuity in the data can lead to misjudgment.
For this reason, the identification method of the embodiment of the present invention selects the point of weakest user activity intensity as described above and uses it to re-segment and rearrange the behavior feature vectors in the time domain. When the spatio-temporal behavior feature cube is constructed from these behavior feature vectors, the cube is re-segmented and reassembled so that the determined time-domain segmentation point becomes the starting time point of the distribution of behavior feature vectors. For example, as shown in FIG. 7, if the time-domain segmentation point determined from the behavior feature vectors of user A and user B is 2:00, then after time-domain segmentation and reassembly of the spatio-temporal behavior feature cubes, 2:00 becomes the origin of the time axis.
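The effect of moving the time-axis origin can be checked with a short sketch. It assumes times of day in hours and models the segmentation and reassembly as a modular shift; the function name `shift_time` and the sample times are hypothetical.

```python
def shift_time(time_h, cut_h, period=24.0):
    """Time of day re-expressed on an axis that starts at the segmentation point."""
    return (time_h - cut_h) % period


cut = 2.0                  # segmentation point at 2:00, as in the FIG. 7 example
late, early = 23.75, 0.25  # same behavior at 23:45 and at 00:15

gap_before = abs(late - early)
gap_after = abs(shift_time(late, cut) - shift_time(early, cut))
print(gap_before, gap_after)
```

On the default axis the two occurrences are 23.5 hours apart; after the shift they are 0.5 hours apart, which matches their actual separation across midnight.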
In the embodiment of the present invention, the intensity representation information corresponding to the same target behavior is accumulated along the time dimension, separately for the first user's spatio-temporal behavior feature cube and for the second user's spatio-temporal behavior feature cube; the first intensity change curve of the first user's behavior feature vectors and the second intensity change curve of the second user's behavior feature vectors are drawn according to the maximum accumulated intensity value obtained for each target behavior; and the lowest point of the first intensity change curve and the second intensity change curve is selected as the time-domain segmentation point. In other words, the behavior feature vectors of the two users to be matched are extracted, user behavior intensity is accumulated along the time dimension, the point of maximum behavior intensity is taken as the starting point, a 24-hour curve of behavior intensity over the day is drawn, and the lowest point of the two users' behavior change curves is selected as the time-domain segmentation point.
In the identification method of the embodiment of the present invention, after the time-domain segmentation of the users' spatio-temporal behavior feature cubes described above, de-unit normalization conversion is further performed on the behavior feature vectors in each user's cube. Specifically, the conversion may use min-max (dispersion) normalization or standard-deviation normalization to remove the unit restrictions of the data in the behavior feature vectors and convert the data into dimensionless pure values, so that indicators with different units or magnitudes can be calculated and compared.
In one implementation, the original behavior feature vectors are converted by min-max normalization. The conversion formula may be:
X'=(X-min)/(max-min);
where X' is the converted data, X is the data before conversion, max is the maximum value in the sample data being converted, and min is the minimum value in the sample data being converted.
In this way the data in each dimension of the behavior feature vectors is converted into the interval [0, 1], which removes the unit restrictions of the different dimensions.
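A minimal sketch of the min-max conversion applied to one dimension is given below. Mapping a constant column to 0 is an assumption of this example, since the formula is undefined when max equals min; the sample values are hypothetical.

```python
def min_max(values):
    """Apply X' = (X - min) / (max - min) to one dimension of the feature vectors."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant dimension carries no information and maps to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical occurrence times in hours on the shifted time axis.
times = [0.0, 6.0, 12.0, 24.0]
print(min_max(times))
```

The same function would be applied separately to the time, longitude and latitude dimensions, so that all three lie in [0, 1] before distances are computed.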
For example, applying this de-unit normalization conversion to the occurrence time, longitude and latitude in the behavior feature vectors yields the behavior feature vectors shown in Table 2 below:
Table 2
In another implementation, the original behavior feature vectors are converted by standard-deviation normalization. Optionally, the mean is subtracted from the data before conversion and the result is divided by the standard deviation; the converted data then has a mean of 0 and a variance of 1.
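The standard-deviation conversion can be sketched in the same way. The example uses the population standard deviation and maps a constant column to 0; both choices, and the sample values, are assumptions of this sketch.

```python
import statistics


def z_score(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # Assumption: a constant dimension maps to 0.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical values with mean 5 and population standard deviation 2.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = z_score(values)
print(z)
```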
It should be noted that those skilled in the art will understand the specific manner of performing the above de-unit normalization conversion, which is not described in detail here. Furthermore, de-unit normalization conversion is not limited to the two methods above.
In the embodiment of the present invention, referring to FIG. 1, performing similarity analysis in step S120 on the feature points of the first user and the feature points of the second user among the at least two users, and judging whether the second user is a re-networking user of the first user, includes:
determining the similar feature points obtained by comparing the first user's feature points with the second user's feature points;
determining that the second user is a re-networking user of the first user when the ratio of the number of similar feature points to the number of the first user's feature points is greater than a first preset value.
Optionally, determining the similar feature points obtained by comparing the first user's feature points with the second user's feature points includes:
selecting a first feature point of the first user;
finding, among the second user's feature points, the second feature point with the shortest distance to the first feature point, where the behavior feature vectors corresponding to the first feature point and the second feature point have the same target behavior;
analyzing the similarity value of the first feature point and the second feature point, and judging whether the first feature point and the second feature point are similar feature points.
In the embodiment of the present invention, optionally, the similarity between the first user's first feature point and the second user's feature points may be judged using the Euclidean distance.
Optionally, analyzing the similarity value of the first feature point and the second feature point, and judging whether the first feature point and the second feature point are similar feature points, includes:
acquiring a first weight value of the target behavior corresponding to the first feature point occurring within a preset duration, and acquiring a second weight value of the target behavior corresponding to the second feature point occurring within the preset duration;
determining a weight coefficient according to the first weight value and the second weight value;
calculating a similarity value according to the weight coefficient and the distance between the first feature point and the second feature point;
determining that the first feature point and the second feature point are similar feature points when the similarity value is greater than a second preset value.
Further, determining the weight coefficient according to the first weight value and the second weight value includes:
calculating the ratio of the smaller of the first weight value and the second weight value to the larger of the two;
determining the ratio as the weight coefficient.
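The weight coefficient described above is a simple min/max ratio, sketched below. Rejecting non-positive weights is an assumption of this example, since the embodiment does not say how such values are handled.

```python
def weight_coefficient(w1, w2):
    """Ratio of the smaller weight value to the larger one, in the range (0, 1]."""
    if w1 <= 0 or w2 <= 0:
        # Assumption: weights are positive; the embodiment does not cover other cases.
        raise ValueError("weight values must be positive")
    return min(w1, w2) / max(w1, w2)


print(weight_coefficient(3.0, 6.0), weight_coefficient(4.0, 4.0))
```

The coefficient equals 1 when the two weights match and shrinks toward 0 as they diverge, and it does not depend on the order of the arguments.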
Optionally, calculating the similarity value according to the weight coefficient and the distance between the first feature point and the second feature point includes:
calculating the similarity value according to the following formula:
Si=1-Di/Wi;
where Si is the similarity value, Di is the distance between the first feature point and the second feature point, and Wi is the weight coefficient.
Specifically, a first feature point of the first user is extracted, the second feature point closest to it is determined among the second user's feature points, and the distance between them is recorded as Di. The weight coefficient Wi is determined from the first weight value of the target behavior corresponding to the first feature point occurring within the preset duration and the second weight value of the target behavior corresponding to the second feature point occurring within the preset duration. The similarity of the first feature point and the second feature point can then be calculated from the distance Di and the weight coefficient Wi.
Optionally, the distance between the first feature point and the second feature point may be calculated in the spatio-temporal behavior feature cube determined above, using the coordinates of the two feature points in the three dimensions of time, latitude and longitude.
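The formula Si=1-Di/Wi can be illustrated as follows, under the assumption that both feature points are already normalized `(time, lon, lat)` tuples and that the weights are positive. The function name and sample values are hypothetical.

```python
import math


def similarity(p1, p2, w1, w2):
    """S = 1 - D / W for two normalized (time, lon, lat) feature points."""
    d = math.dist(p1, p2)             # Euclidean distance in the cube
    w = min(w1, w2) / max(w1, w2)     # weight coefficient, in (0, 1]
    return 1.0 - d / w


# Hypothetical normalized points that differ only slightly in latitude.
p1 = (0.5, 0.2, 0.2)
p2 = (0.5, 0.2, 0.3)
s = similarity(p1, p2, w1=3.0, w2=6.0)
print(round(s, 6))
```

Because Wi is at most 1, dividing by it amplifies the distance when the weights differ, so Si falls as either the distance grows or the weights diverge. Si becomes negative once Di exceeds Wi, and such a value simply fails the threshold test.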
In this way, for each feature point of the first user, the closest corresponding feature point can be found among the second user's feature points, and the similarity of each pair can be calculated.
Optionally, if the second user's feature points contain no closest corresponding point for a feature point of the first user, the similarity may be recorded as 0.
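The nearest-point lookup restricted to the same target behavior can be sketched as below. It assumes each feature point is a `(behavior, time, lon, lat)` tuple with normalized coordinates; a return value of `None` stands for the case in which the similarity is recorded as 0. All names and sample values are hypothetical.

```python
import math


def nearest_same_behavior(first_point, second_points):
    """Closest point of the second user that shares first_point's behavior, or None."""
    behavior, *coords = first_point
    candidates = [p for p in second_points if p[0] == behavior]
    if not candidates:
        return None  # no counterpart: the similarity for this point is recorded as 0
    return min(candidates, key=lambda p: math.dist(coords, p[1:]))


# Hypothetical normalized feature points.
first = ("video", 0.5, 0.2, 0.2)
second = [
    ("video", 0.9, 0.8, 0.8),
    ("video", 0.5, 0.2, 0.25),
    ("standby", 0.5, 0.2, 0.2),
]
match = nearest_same_behavior(first, second)
missing = nearest_same_behavior(("call", 0.1, 0.1, 0.1), second)
print(match, missing)
```

The standby point sits at exactly the same coordinates as the first point but is ignored because its target behavior differs.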
For example, the similarity comparison results for multiple feature points of the first user and the second user may be as shown in Table 3 below, where B, C and D denote different target behaviors:
Table 3
Further, a threshold for the similarity judgment (the second preset value) may be set in advance. When the similarity between a feature point of the second user and a feature point of the first user exceeds the second preset value, the two are determined to be similar feature points; otherwise they are not.
In addition, a threshold on the overall feature-point similarity (the first preset value) may be set in advance for identifying whether the second user is a re-networking user of the first user. When the ratio of the number of similar feature points to the number of all feature points of the first user is greater than the first preset value, the first user and the second user are determined to be highly similar, and the second user is judged to be a re-networking user of the first user.
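The two-threshold decision can be sketched as follows. The threshold values 0.8 and 0.5 and the sample similarities are hypothetical; the embodiment does not specify the preset values.

```python
def is_renetworking_user(similarities, second_preset, first_preset):
    """Decide from one similarity value per feature point of the first user.

    second_preset: per-point similarity threshold (second preset value).
    first_preset:  threshold on the share of similar points (first preset value).
    """
    if not similarities:
        return False  # assumption: no feature points means no match
    similar = sum(1 for s in similarities if s > second_preset)
    return similar / len(similarities) > first_preset


# Hypothetical similarity values; 0.0 marks a point with no counterpart.
sims = [0.9, 0.85, 0.0, 0.95, 0.7]
print(is_renetworking_user(sims, second_preset=0.8, first_preset=0.5))
```

Here three of five points exceed the per-point threshold, giving a ratio of 0.6, which is above a first preset value of 0.5 but not above one of 0.6, since the comparison is strict.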
Under the similarity judgment described above, the judgment of user behavior similarity follows three principles:
1. The shorter the distance between two feature points, the higher the similarity;
2. The closer the weights of two feature points, the higher the similarity;
3. The more highly similar feature points two users share, the higher the similarity between the users.
The method for identifying a re-networking user according to the embodiment of the present invention exploits the fact that a user's behavior at specific times and places is highly similar. In the data construction stage of the model, data in the three dimensions of time, space and user behavior is combined to construct a user feature cube. Time-dimension slicing, a density clustering algorithm and behavior intensity threshold judgment are then used to extract the user's typical feature behaviors in specific times and places and to construct behavior feature vectors. The feature vectors of two users are preprocessed by time-domain segmentation and splicing and by Euclidean coordinate transformation, and finally the similarity of the two users is compared using the Euclidean distance to determine whether they are re-networking users.
The method for identifying a re-networking user according to the embodiment of the present invention performs similarity analysis using the three elements of time, space and behavior together, and analyzes user similarity using the feature points determined by the behavior feature vectors in the spatio-temporal behavior feature cube. Compared with single-dimension analysis, this improves the accuracy of re-networking user judgment. In addition, modeling the data of multiple dimensions jointly, rather than modeling each dimension separately, avoids the misjudgment to which separate modeling is prone. Furthermore, the simplified user feature vectors constructed by these technical means greatly reduce the amount of data needed for user identity judgment and improve the efficiency of re-networking user judgment.
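The embodiment of the present invention further provides a device for identifying a re-networking user, as shown in FIG. 8, including:
a vector acquisition module 810, configured to acquire behavior feature vectors of at least two users, the behavior feature vectors recording the occurrence time, spatial position and intensity representation information of target behaviors;
a comparison module 820, configured to perform, according to the feature points determined in the spatio-temporal behavior feature cube by each user's behavior feature vectors, similarity analysis on the feature points of a first user and the feature points of a second user among the at least two users, and to judge whether the second user is a re-networking user of the first user;
where the spatio-temporal behavior feature cube takes time, the longitude of the spatial position and the latitude of the spatial position as coordinates, and the feature points corresponding to the behavior feature vectors are distributed in the cube according to the time and spatial position of the target behaviors.
Optionally, in the device for identifying a re-networking user, the vector acquisition module 810 includes:
a collection unit 811, configured to collect behavior data of each user, the behavior data including the time, spatial position and intensity representation information of different target behaviors;
a first construction unit 812, configured to construct the spatio-temporal behavior feature cube of each user from the behavior data;
an analysis unit 813, configured to perform cluster analysis on the behavior data in the spatio-temporal behavior feature cube and determine the behavior data whose intensity representation information is greater than a preset intensity threshold as the behavior feature vectors;
a second construction unit 814, configured to delete from the spatio-temporal behavior feature cube the feature points other than those corresponding to the behavior feature vectors.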
Optionally, in the device for identifying a re-networking user, the analysis unit 813 performing cluster analysis on the behavior data in the spatio-temporal behavior feature cube and determining the behavior data whose intensity representation information is greater than the preset intensity threshold as the behavior feature vectors includes:
slicing the spatio-temporal behavior feature cube along the time dimension to form a plurality of slice data;
clustering the behavior data in each slice data to determine at least one cluster point;
comparing the intensity representation information of the behavior data corresponding to each cluster point with the preset intensity threshold, and determining the behavior data whose intensity representation information is greater than the preset intensity threshold as the behavior feature vector.
Optionally, in the device for identifying a re-networking user, the intensity representation information is the cumulative duration of the target behavior within a preset statistical period.
Optionally, the device for identifying a re-networking user further includes:
a conversion module 8101, configured to, before the comparison module 820 performs similarity analysis on the feature points of the first user and the feature points of the second user among the at least two users, perform de-unit normalization conversion on the behavior feature vectors in the first user's spatio-temporal behavior feature cube and on the behavior feature vectors in the second user's spatio-temporal behavior feature cube, to obtain normalized data vectors of the first user and normalized data vectors of the second user, where each normalized data vector corresponds to one feature point;
where the comparison module 820 performing similarity analysis on the feature points of the first user and the feature points of the second user among the at least two users is specifically:
performing similarity analysis on the feature points corresponding to the first user's normalized data vectors and the feature points corresponding to the second user's normalized data vectors, and judging whether the second user is a re-networking user of the first user.
可选地,所述的重入网用户的识别装置,其中,比较模块820对至少两个用户中第一用户的特征点与第二用户的特征点进行相似度分析,判断所述第二用户是否为所述第一用户的重入网用户,具体为:Optionally, in the device for identifying re-entry users, the
确定所述第一用户的特征点与所述第二用户的特征点相比较的相似特征点;determining similar feature points between the feature points of the first user and the feature points of the second user;
在所述相似特征点的数量与所述第一用户的特征点的数量之间的比值大于第一预设值时,确定所述第二用户为所述第一用户的重入网用户。When the ratio between the number of similar feature points and the number of feature points of the first user is greater than a first preset value, it is determined that the second user is a re-entry user of the first user.
Optionally, in the device for identifying a re-networking user, the comparison module 820 determines the similar feature points obtained by comparing the feature points of the first user with the feature points of the second user by:
selecting a first feature point of the first user;
finding, among the feature points of the second user, the second feature point with the shortest distance to the first feature point, where the behavior feature vectors corresponding to the first feature point and the second feature point have the same target behavior;
analyzing the similarity value between the first feature point and the second feature point, to determine whether the first feature point and the second feature point are similar feature points.
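The nearest-point search in the steps above might look like the sketch below. It assumes Euclidean distance over normalized (time, longitude, latitude) coordinates, which this passage does not specify; the `FeaturePoint` class and the function names are illustrative only.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class FeaturePoint:
    """A feature point in the spatio-temporal cube (hypothetical layout)."""
    t: float         # normalized time coordinate
    lon: float       # normalized longitude coordinate
    lat: float       # normalized latitude coordinate
    behavior: str    # target behavior of the underlying feature vector


def distance(a: FeaturePoint, b: FeaturePoint) -> float:
    # Assumption: Euclidean distance in the three cube coordinates.
    return math.dist((a.t, a.lon, a.lat), (b.t, b.lon, b.lat))


def nearest_same_behavior(first_point: FeaturePoint,
                          second_user_points: Sequence[FeaturePoint]
                          ) -> Optional[FeaturePoint]:
    """Return the second user's closest point with the same target behavior."""
    candidates = [p for p in second_user_points
                  if p.behavior == first_point.behavior]
    if not candidates:
        return None  # no point of the same behavior to compare with
    return min(candidates, key=lambda p: distance(first_point, p))
```

Filtering by target behavior before taking the minimum keeps a spatially closer point of a different behavior from being selected.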
Optionally, in the device for identifying a re-networking user, the comparison module 820 analyzes the similarity value between the first feature point and the second feature point, and determines whether the first feature point and the second feature point are similar feature points, by:
acquiring a first weight value for the occurrence, within a preset duration, of the target behavior corresponding to the first feature point, and acquiring a second weight value for the occurrence, within the preset duration, of the target behavior corresponding to the second feature point;
determining a weight coefficient according to the first weight value and the second weight value;
calculating the similarity value according to the weight coefficient and the distance between the first feature point and the second feature point;
when the similarity value is greater than a second preset value, determining that the first feature point and the second feature point are similar feature points.
Optionally, in the device for identifying a re-networking user, the comparison module 820 determines the weight coefficient according to the first weight value and the second weight value by:
calculating the ratio of the smaller of the first weight value and the second weight value to the larger of the first weight value and the second weight value;
taking the ratio as the weight coefficient.
Optionally, in the device for identifying a re-networking user, the comparison module 820 calculates the similarity value according to the weight coefficient and the distance between the first feature point and the second feature point by:
calculating the similarity value according to the following formula:
$S_i = 1 - D_i / W_i$;
where $S_i$ is the similarity value, $D_i$ is the distance between the first feature point and the second feature point, and $W_i$ is the weight coefficient.
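The weight coefficient (the smaller weight divided by the larger) and the similarity formula above can be combined into a short sketch. The function names and the default second preset value of 0.5 are hypothetical, and the check for positive weights is an added safeguard rather than something the source states.

```python
def weight_coefficient(w1: float, w2: float) -> float:
    """Ratio of the smaller weight value to the larger one, in (0, 1]."""
    if w1 <= 0 or w2 <= 0:
        # Safeguard (assumption): weights must be positive to avoid division by zero.
        raise ValueError("weight values must be positive")
    return min(w1, w2) / max(w1, w2)


def similarity(d: float, w1: float, w2: float) -> float:
    """S_i = 1 - D_i / W_i, with W_i the weight coefficient."""
    return 1.0 - d / weight_coefficient(w1, w2)


def is_similar(d: float, w1: float, w2: float,
               second_preset_value: float = 0.5) -> bool:
    """Points are similar when S_i exceeds the (hypothetical) second preset value."""
    return similarity(d, w1, w2) > second_preset_value
```

Because the coefficient never exceeds 1, unequal weights inflate the effective distance: the same distance yields a lower similarity when the two behaviors occur with very different weights.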
Optionally, in the device for identifying a re-networking user, before performing dimensionless normalization on the behavior feature vectors in the spatio-temporal behavior feature cube of the first user and on the behavior feature vectors in the spatio-temporal behavior feature cube of the second user, the conversion module 8101 is further configured to:
determine a time-domain segmentation point according to the distribution, along the time dimension, of the behavior feature vectors of the first user and the behavior feature vectors of the second user in the spatio-temporal behavior feature cubes respectively constructed from those vectors;
segment and reassemble the spatio-temporal behavior feature cube of the first user and the spatio-temporal behavior feature cube of the second user, so that after segmentation and reassembly each of the two cubes takes the time corresponding to the time-domain segmentation point as its starting time point;
wherein the conversion module 8101 performs dimensionless normalization on the behavior feature vectors in the spatio-temporal behavior feature cube of the first user and on the behavior feature vectors in the spatio-temporal behavior feature cube of the second user by:
performing dimensionless normalization separately on the behavior feature vectors in the segmented and reassembled spatio-temporal behavior feature cube of the first user and in the segmented and reassembled spatio-temporal behavior feature cube of the second user.
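One way to picture the segmentation and reassembly step is as a rotation of a cyclic time axis, so that the segmentation point becomes the new origin. The sketch below assumes the time axis is periodic (for example a 24-hour day), which this passage does not state explicitly; `reassemble` and its parameters are hypothetical names.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (time, longitude, latitude)


def reassemble(points: List[Point], split_time: float,
               period: float = 24.0) -> List[Point]:
    """Cut the cube at split_time and reassemble it so that it starts there.

    Assumption: time is cyclic with the given period, so the part of the
    cube before split_time is moved behind the part after it.
    """
    shifted = [((t - split_time) % period, lon, lat)
               for (t, lon, lat) in points]
    # Sort by the new time coordinate so the cube reads from its new start.
    return sorted(shifted)
```

For instance, with a segmentation point at 04:00, a point at 01:00 ends up at hour 21 of the reassembled axis, while a point at 05:00 ends up at hour 1.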
Optionally, in the device for identifying a re-networking user, the conversion module 8101 determines the time-domain segmentation point according to the distribution, along the time dimension, of the behavior feature vectors of the first user and the behavior feature vectors of the second user in the spatio-temporal behavior feature cubes respectively constructed from those vectors, by:
accumulating along the time dimension, separately for the behavior feature vectors in the spatio-temporal behavior feature cube of the first user and for those in the spatio-temporal behavior feature cube of the second user, the intensity representation information corresponding to the same target behavior;
plotting, according to the maximum accumulated intensity value for each target behavior, a first intensity variation curve for the behavior feature vectors of the first user and a second intensity variation curve for the behavior feature vectors of the second user;
selecting the lowest point of the first intensity variation curve and the second intensity variation curve as the time-domain segmentation point.
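The three steps above can be sketched as follows, under stated assumptions: time is discretized into bins, each curve holds for every bin the largest accumulated intensity over the target behaviors, and "the lowest point of the two curves" is read literally as the bin holding the smallest value in either curve. The source does not spell out these details, and all names here are illustrative.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Vector = Tuple[int, str, float]  # (time_bin, target_behavior, intensity)


def intensity_curve(vectors: Iterable[Vector], num_bins: int) -> List[float]:
    """Per time bin, the largest accumulated intensity over all behaviors."""
    totals = defaultdict(float)
    for time_bin, behavior, intensity in vectors:
        # Accumulate intensity of the same target behavior in the same bin.
        totals[(time_bin, behavior)] += intensity
    curve = [0.0] * num_bins
    for (time_bin, _behavior), total in totals.items():
        curve[time_bin] = max(curve[time_bin], total)
    return curve


def segmentation_point(curve1: List[float], curve2: List[float]) -> int:
    """Time bin of the lowest point across both curves (earliest bin on ties)."""
    lowest = [min(a, b) for a, b in zip(curve1, curve2)]
    return min(range(len(lowest)), key=lowest.__getitem__)
```

Cutting at a low-activity time keeps dense periods of behavior from being split across the boundary of the reassembled cube.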
Optionally, in the device for identifying a re-networking user, the conversion module 8101 performs dimensionless normalization on the behavior feature vectors in the spatio-temporal behavior feature cube of the first user and on the behavior feature vectors in the spatio-temporal behavior feature cube of the second user by:
performing dimensionless normalization separately on the behavior feature vectors in the spatio-temporal behavior feature cube of the first user and in the spatio-temporal behavior feature cube of the second user, using the min-max (deviation) normalization method or the standard-deviation (z-score) normalization method.
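Both normalization methods named above are standard. A minimal sketch for a single coordinate follows, using only the Python standard library; the handling of constant input and the use of the population standard deviation are assumptions, since the source does not address either.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Min-max normalization: map values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant coordinate carries no information, map to 0.
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score_normalize(values: Sequence[float]) -> List[float]:
    """Standard-deviation normalization: zero mean, unit variance."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]
```

Either method removes the units from time, longitude, and latitude, so that distances between feature points are not dominated by whichever coordinate has the largest numeric range.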
An embodiment of the present invention further provides identification equipment which, as shown in FIG. 9, includes: a processor 901; and a memory 903 connected to the processor 901 through a bus interface 902, where the memory 903 is used to store the programs and data used by the processor 901 when performing operations, and the processor 901 calls and executes the programs and data stored in the memory 903.
A transceiver 904 is connected to the bus interface 902 and is used to receive and send data under the control of the processor 901. Specifically, the processor 901 is configured to read the program in the memory 903 and execute the following process:
acquiring behavior feature vectors of at least two users, where each behavior feature vector records the occurrence time, spatial position, and intensity representation information of a target behavior;
performing, according to the feature points determined in a spatio-temporal behavior feature cube by the behavior feature vectors of each user, similarity analysis on the feature points of a first user and the feature points of a second user among the at least two users, to determine whether the second user is a re-networking user of the first user;
wherein the spatio-temporal behavior feature cube takes time, the longitude of the spatial position, and the latitude of the spatial position as its coordinates, and the feature points corresponding to the behavior feature vectors are distributed in the spatio-temporal behavior feature cube according to the time and spatial position of the target behavior.
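As an illustration of the data structure described above, a behavior feature vector and a cube indexed by (time, longitude, latitude) could be represented as in the sketch below. The class name, field names, and the choice to discretize each axis into cells are assumptions made for the example; the source only fixes the three coordinate axes and the recorded attributes.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class BehaviorFeatureVector:
    behavior: str      # target behavior, e.g. "call" (illustrative value)
    time: float        # occurrence time, e.g. hour of day
    longitude: float   # longitude of the spatial position
    latitude: float    # latitude of the spatial position
    intensity: float   # intensity representation, e.g. cumulative duration

Cell = Tuple[int, int, int]  # (time index, longitude index, latitude index)


def build_cube(vectors: Iterable[BehaviorFeatureVector],
               time_step: float = 1.0,
               grid_step: float = 0.01
               ) -> Dict[Cell, List[BehaviorFeatureVector]]:
    """Place each vector into the cube cell given by its time and position."""
    cube: Dict[Cell, List[BehaviorFeatureVector]] = defaultdict(list)
    for v in vectors:
        cell = (math.floor(v.time / time_step),
                math.floor(v.longitude / grid_step),
                math.floor(v.latitude / grid_step))
        cube[cell].append(v)
    return dict(cube)
```

Each occupied cell then plays the role of a feature point whose coordinates come from the time and spatial position of the target behavior.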
Optionally, in the identification equipment, the processor 901 acquires the behavior feature vectors of the at least two users by:
collecting behavior data of each user, where the behavior data includes the time, spatial position, and intensity representation information of different target behaviors;
constructing the spatio-temporal behavior feature cube of each user according to the behavior data;
performing cluster analysis on the behavior data in the spatio-temporal behavior feature cube, and determining the behavior data whose intensity representation information is greater than a preset intensity threshold as the behavior feature vectors;
deleting from the spatio-temporal behavior feature cube all feature points other than those corresponding to the behavior feature vectors.
Optionally, in the identification equipment, the processor 901 performs cluster analysis on the behavior data in the spatio-temporal behavior feature cube, and determines the behavior data whose intensity representation information is greater than the preset intensity threshold as the behavior feature vectors, by:
slicing the spatio-temporal behavior feature cube along the time dimension to form multiple data slices;
clustering the behavior data within each data slice to determine at least one cluster point;
comparing the intensity representation information of the behavior data corresponding to each cluster point with the preset intensity threshold, and determining the behavior data whose intensity representation information is greater than the preset intensity threshold as the behavior feature vectors.
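The slicing, clustering, and thresholding steps above can be sketched as follows. The clustering here is a deliberately simple grid-based stand-in (records of the same behavior that fall into the same spatial cell of a time slice form one cluster), and summing intensities within a cluster is also an assumption; this passage does not name the clustering algorithm. All names and parameters are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[float, float, float, str, float]  # (time, lon, lat, behavior, intensity)
ClusterKey = Tuple[int, int, int, str]           # (slice, lon cell, lat cell, behavior)


def extract_feature_vectors(records: Iterable[Record],
                            slice_length: float,
                            cell_size: float,
                            intensity_threshold: float
                            ) -> Dict[ClusterKey, float]:
    """Keep clusters whose accumulated intensity exceeds the threshold."""
    clusters: Dict[ClusterKey, float] = defaultdict(float)
    for time, lon, lat, behavior, intensity in records:
        key = (int(time // slice_length),   # time slice of the cube
               round(lon / cell_size),      # grid cell as a stand-in cluster
               round(lat / cell_size),
               behavior)
        clusters[key] += intensity
    # Clusters at or below the threshold are dropped, mirroring the deletion
    # of feature points that do not correspond to behavior feature vectors.
    return {k: total for k, total in clusters.items()
            if total > intensity_threshold}
```

In practice a density-based or centroid-based clustering method could replace the grid cells without changing the surrounding slice-then-threshold structure.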
Optionally, in the identification equipment, the intensity representation information is expressed as the cumulative duration of the target behavior within a preset statistical period.
Optionally, in the identification equipment, before performing similarity analysis on the feature points of the first user and the feature points of the second user among the at least two users, the processor 901 is further configured to:
perform dimensionless normalization on the behavior feature vectors in the spatio-temporal behavior feature cube of the first user and on the behavior feature vectors in the spatio-temporal behavior feature cube of the second user, to obtain normalized data vectors of the first user and normalized data vectors of the second user, where each normalized data vector corresponds to one feature point;
wherein the processor 901 performs the similarity analysis on the feature points of the first user and the feature points of the second user among the at least two users by:
performing similarity analysis on the feature points corresponding to the normalized data vectors of the first user and the feature points corresponding to the normalized data vectors of the second user, to determine whether the second user is a re-networking user of the first user.
Optionally, in the identification equipment, the processor 901 performs similarity analysis on the feature points of the first user and the feature points of the second user among the at least two users, and determines whether the second user is a re-networking user of the first user, by:
determining the similar feature points obtained by comparing the feature points of the first user with the feature points of the second user;
when the ratio of the number of similar feature points to the number of feature points of the first user is greater than a first preset value, determining that the second user is a re-networking user of the first user.
Optionally, in the identification equipment, the processor 901 determines the similar feature points obtained by comparing the feature points of the first user with the feature points of the second user by:
selecting a first feature point of the first user;
finding, among the feature points of the second user, the second feature point with the shortest distance to the first feature point, where the behavior feature vectors corresponding to the first feature point and the second feature point have the same target behavior;
analyzing the similarity value between the first feature point and the second feature point, to determine whether the first feature point and the second feature point are similar feature points.
Optionally, in the identification equipment, the processor 901 analyzes the similarity value between the first feature point and the second feature point, and determines whether the first feature point and the second feature point are similar feature points, by:
acquiring a first weight value for the occurrence, within a preset duration, of the target behavior corresponding to the first feature point, and acquiring a second weight value for the occurrence, within the preset duration, of the target behavior corresponding to the second feature point;
determining a weight coefficient according to the first weight value and the second weight value;
calculating the similarity value according to the weight coefficient and the distance between the first feature point and the second feature point;
when the similarity value is greater than a second preset value, determining that the first feature point and the second feature point are similar feature points.
Optionally, in the identification equipment, the processor 901 determines the weight coefficient according to the first weight value and the second weight value by:
calculating the ratio of the smaller of the first weight value and the second weight value to the larger of the first weight value and the second weight value;
taking the ratio as the weight coefficient.
Optionally, in the identification equipment, the processor 901 calculates the similarity value according to the weight coefficient and the distance between the first feature point and the second feature point by:
calculating the similarity value according to the following formula:
$S_i = 1 - D_i / W_i$;
where $S_i$ is the similarity value, $D_i$ is the distance between the first feature point and the second feature point, and $W_i$ is the weight coefficient.
Optionally, in the identification equipment, before performing dimensionless normalization on the behavior feature vectors in the spatio-temporal behavior feature cube of the first user and on the behavior feature vectors in the spatio-temporal behavior feature cube of the second user, the processor 901 is further configured to:
determine a time-domain segmentation point according to the distribution, along the time dimension, of the behavior feature vectors of the first user and the behavior feature vectors of the second user in the spatio-temporal behavior feature cubes respectively constructed from those vectors;
segment and reassemble the spatio-temporal behavior feature cube of the first user and the spatio-temporal behavior feature cube of the second user, so that after segmentation and reassembly each of the two cubes takes the time corresponding to the time-domain segmentation point as its starting time point;
wherein the processor 901 performs dimensionless normalization on the behavior feature vectors in the spatio-temporal behavior feature cube of the first user and on the behavior feature vectors in the spatio-temporal behavior feature cube of the second user by:
performing dimensionless normalization separately on the behavior feature vectors in the segmented and reassembled spatio-temporal behavior feature cube of the first user and in the segmented and reassembled spatio-temporal behavior feature cube of the second user.
Optionally, in the identification equipment, the processor 901 determines the time-domain segmentation point according to the distribution, along the time dimension, of the behavior feature vectors of the first user and the behavior feature vectors of the second user in the spatio-temporal behavior feature cubes respectively constructed from those vectors, by:
accumulating along the time dimension, separately for the behavior feature vectors in the spatio-temporal behavior feature cube of the first user and for those in the spatio-temporal behavior feature cube of the second user, the intensity representation information corresponding to the same target behavior;
plotting, according to the maximum accumulated intensity value for each target behavior, a first intensity variation curve for the behavior feature vectors of the first user and a second intensity variation curve for the behavior feature vectors of the second user;
selecting the lowest point of the first intensity variation curve and the second intensity variation curve as the time-domain segmentation point.
Optionally, in the identification equipment, the processor 901 performs dimensionless normalization on the behavior feature vectors in the spatio-temporal behavior feature cube of the first user and on the behavior feature vectors in the spatio-temporal behavior feature cube of the second user by:
performing dimensionless normalization separately on the behavior feature vectors in the spatio-temporal behavior feature cube of the first user and in the spatio-temporal behavior feature cube of the second user, using the min-max (deviation) normalization method or the standard-deviation (z-score) normalization method.
In FIG. 9, the bus architecture may include any number of interconnected buses and bridges, which link together various circuits of one or more processors, represented by the processor 901, and of memory, represented by the memory 903. The bus architecture may also link together various other circuits such as peripheral devices, voltage regulators, and power management circuits; these are well known in the art and are therefore not described further here. The bus interface provides an interface. The transceiver 904 may consist of multiple elements, that is, a transmitter and a receiver, providing a unit for communicating with various other devices over a transmission medium. The processor 901 is responsible for managing the bus architecture and general processing, and the memory 903 may store the data used by the processor 901 when performing operations.
Those skilled in the art will understand that all or some of the steps of the above embodiments may be carried out by hardware, or by a program that instructs the relevant hardware, where the program includes instructions for executing some or all of the steps of the above method; the program may be stored in a readable storage medium, which may be any form of storage medium.
In addition, a specific embodiment of the present invention further provides a readable storage medium on which a computer program is stored, where the program, when executed by a processor, implements the steps of the method for identifying a re-networking user according to any one of the above.
Specifically, the readable storage medium is applied to the identification equipment described above. When it is so applied, the steps executed in the corresponding method for identifying a re-networking user are as described in detail above and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed method and device may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation, for instance multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of another form.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
An integrated unit implemented in the form of software functional units may be stored in a computer-readable storage medium. The software functional units are stored in a storage medium and include a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute some of the steps of the sending and receiving method described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make a number of improvements and refinements without departing from the principles of the present invention, and such improvements and refinements shall also be regarded as falling within the scope of protection of the present invention.
Claims (16)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010350086.6A CN113573242B (en) | 2020-04-28 | 2020-04-28 | Identification method, device and equipment of re-networking user |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010350086.6A CN113573242B (en) | 2020-04-28 | 2020-04-28 | Identification method, device and equipment of re-networking user |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113573242A CN113573242A (en) | 2021-10-29 |
CN113573242B true CN113573242B (en) | 2023-03-31 |
Family
ID=78158091
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010350086.6A Active CN113573242B (en) | 2020-04-28 | 2020-04-28 | Identification method, device and equipment of re-networking user |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113573242B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114860557B (en) * | 2022-04-08 | 2023-05-26 | 广东联想懂的通信有限公司 | User behavior information generation method, device, equipment and readable storage medium |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104902498A (en) * | 2015-04-17 | 2015-09-09 | 中国联合网络通信集团有限公司 | Identification method and device for subscriber re-networking |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102682041B (en) * | 2011-03-18 | 2014-06-04 | 日电(中国)有限公司 | User behavior identification equipment and method |
CN105281925B (en) * | 2014-06-30 | 2019-05-14 | 腾讯科技(深圳)有限公司 | The method and apparatus that network service groups of users divides |
CN110290513B (en) * | 2019-07-05 | 2021-10-15 | 中国联合网络通信集团有限公司 | A method and system for identifying re-entry users |
Also Published As
Publication number | Publication date |
---|---|
CN113573242A (en) | 2021-10-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||