Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Artificial intelligence (AI) is the theory, methods, techniques and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision, speech processing, natural language processing, and machine learning/deep learning.
Computer vision (CV) is the science of how to make a machine "see". More specifically, it uses cameras and computers in place of human eyes to identify, track and measure targets, and further performs image processing so that the processed image is more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can obtain information from images or multidimensional data. Computer vision technologies generally include image segmentation, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, and the like, and also include common biometric technologies such as face recognition and fingerprint recognition.
Machine learning (ML) is a multidisciplinary field that draws on probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specifically studies how a computer can simulate or implement human learning behavior in order to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied across all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, formal learning, and metric learning.
The scheme provided by the embodiments of the present application relates to artificial intelligence technologies such as computer vision and machine learning, and is explained in detail through the following embodiments:
The target user identification method based on a network hotspot can be applied to the application environment shown in fig. 1. This application environment includes a terminal 102, a server 104 and a network hotspot 106. The terminal 102 and the network hotspot 106 can be connected through a wireless network; the terminal 102 and the server 104, and the server 104 and the network hotspot 106, may be connected via a wireless network or a data line. The terminal 102 connects to the hotspot network provided by the network hotspot 106, and network hotspot connection data is generated during the connection, such as the connection time, location, network hotspot identification, and user identification for the connection to the network hotspot 106. After acquiring the network hotspot connection data, the server 104 may: generate a user relationship graph based on the network hotspot connection data, where connected user nodes in the user relationship graph indicate that the corresponding users have connected to the same network hotspot; acquire the user attribute features corresponding to each user node in the user relationship graph; for each user node in the user relationship graph, sample the neighbor user nodes connected to the targeted user node; aggregate the user attribute features corresponding to the sampled neighbor user nodes with the user attribute features corresponding to the targeted user node to obtain neighbor aggregation attribute features; and classify the user corresponding to the targeted user node based on the neighbor aggregation attribute features to determine diffusion users.
The terminal 102 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, and the like.
The server 104 may be an independent physical server, a server cluster composed of a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms.
The network hotspot 106 may refer to a network device, such as a wireless router or wireless switch, that provides a wireless local area network for accessing Internet services.
In one embodiment, as shown in fig. 2, a method for identifying a target user based on a network hotspot is provided. The method is described by taking its application to the server 104 in fig. 1 as an example, and includes the following steps:
S202, generating a user relationship graph based on the network hotspot connection data; connected user nodes in the user relationship graph indicate that the corresponding users have connected to the same network hotspot.
A network hotspot may refer to a network device that provides a wireless local area network for accessing Internet services. The network hotspot connection data may refer to data formed when a user connects to a network hotspot through a terminal, and may include: the connection time, the position information, the network hotspot identification, the user identification, the network connection relationship and the like recorded when the network hotspot is connected. The connection time may include a start time and an end time of the connection to the network hotspot. The network hotspot identification may be used to distinguish between network hotspots, or between the wireless local area networks provided by network hotspots; for example, an SSID (Service Set Identifier) may be used to distinguish between different wireless local area networks. The user identification may be a tag used for identifying a user, such as a user name or a user communication number (e.g., a mobile phone number, a social network account number, etc.), and may also be the terminal identification of the terminal used when the user connects to a network hotspot.
The user relationship graph may refer to a graph in which user identifications serve as nodes and the user identifications of users connected to the same network hotspot are connected to each other. The nodes in the user relationship graph are called user nodes.
When the terminal is connected with the network hotspot, the network hotspot records connection time, a user identifier corresponding to the terminal and position information of the position of the terminal, and also records a network hotspot identifier of the network hotspot itself, and establishes an association relationship (namely a network connection relationship) between the user identifier corresponding to the terminal connected with the network hotspot and the network hotspot identifier, thereby obtaining network hotspot connection data. After the information recording is completed, the network hotspot connection data is stored locally or in a server.
In one embodiment, the server may obtain the network hotspot connection data from the network hotspot, or from the server's own local storage, so that the server can determine, according to the network connection relationship in the network hotspot connection data, which network hotspot the terminal corresponding to a user identification is connected to. The obtained network hotspot connection data may be data formed within a preset time period, for example, the network hotspot connection data of the last week.
Specifically, the server may send a data acquisition request to the network hotspot, so that the network hotspot acquires corresponding network hotspot connection data according to the received data acquisition request, and then returns the network hotspot connection data to the server.
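As a minimal sketch of the data described above, the following Python snippet models one network hotspot connection record and keeps only the records that fall within a preset time period. The class name `HotspotConnection`, the function `recent_connections`, the field names and the sample values are all hypothetical and are used only for illustration; the embodiments do not prescribe a storage format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class HotspotConnection:
    """One record of network hotspot connection data (field names are assumptions)."""
    user_id: str        # user identification, or a terminal identification
    hotspot_id: str     # network hotspot identification, e.g. an SSID
    start_time: datetime
    end_time: datetime
    location: str       # position information in whatever form was recorded


def recent_connections(records, now, window=timedelta(days=7)):
    """Keep records whose connection started within the preset time period."""
    return [r for r in records if now - window <= r.start_time <= now]


now = datetime(2024, 1, 8, 12, 0)
records = [
    HotspotConnection("u1", "ssid-home", datetime(2024, 1, 7, 9, 0),
                      datetime(2024, 1, 7, 10, 0), "loc-a"),
    HotspotConnection("u2", "ssid-home", datetime(2023, 12, 25, 9, 0),
                      datetime(2023, 12, 25, 9, 30), "loc-a"),
]
recent = recent_connections(records, now)
```

Here only the record of `u1` falls within the last week; the pairs of user identification and network hotspot identification in the remaining records carry the network connection relationship used in S202.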
The user relationship graph can be constructed in either of the following two ways:
Mode 1, the user relationship graph is constructed based on a bipartite graph formed from the network hotspot connection data.
In one embodiment, the user relationship graph is an isomorphic graph that is formed from user nodes and reflects user relationships; S202 may specifically include: the server extracts the network hotspot identification, the user identification and the network connection relationship from the network hotspot connection data; generates, according to the network connection relationship, a bipartite graph in which the network hotspot identifications serve as network nodes, the user identifications serve as user nodes, and the network nodes are connected to the user nodes; and connects the user nodes that are connected to the same network hotspot in the bipartite graph, and filters the network nodes out of the bipartite graph in which the user nodes have been connected, to obtain the isomorphic graph.
The user relationship may be used to reflect that different users are connected to the same network hotspot through respective terminals. Users connected to the same network hotspot have some similarity to some extent. For example, users who are connected to the same home network hotspot together may be family members, while users who are connected to the same corporate network hotspot may be colleagues, and such user relationships may be utilized to mine valuable information about the users.
The isomorphic graph may refer to a graph in which all nodes belong to the same category (here, the user type); that is, the isomorphic graph includes only user nodes and no other types of nodes.
A bipartite graph may refer to a graph that contains nodes of both the user node category and the network node category. The bipartite graph can be converted into an isomorphic graph.
It should be noted that, in the bipartite graph, each user node may represent a corresponding user or a terminal of the user, and each network node represents a corresponding network hotspot. Correspondingly, in the isomorphic graph, each user node may represent a corresponding user or a terminal of the user. In the subsequent embodiments, a statement that a user node is connected to a network hotspot or a network node indicates that the user corresponding to the user node is connected, through a terminal, to the network hotspot corresponding to the network node.
Specifically, after extracting the network hotspot identification, the user identification and the network connection relationship from the network hotspot connection data, the server may use the network hotspot identifications as network nodes and the user identifications as user nodes, determine the network hotspot to which the terminal of each user is connected according to the network connection relationship, and then connect the user node corresponding to the user with the network node corresponding to that network hotspot, thereby obtaining the bipartite graph. From the bipartite graph, it can be determined which network hotspots a user connected to through a terminal in a certain time period, and which users connected to a given network hotspot in that time period.
For example, fig. 3 is a bipartite graph obtained by connecting a network node and a user node according to a network connection relationship, and an icon in the bipartite graph is just an example, and may be represented by other icons, such as a dot or a small circle. In the figure, it can be seen which users are connected to which network hotspots, for example, a user corresponding to the user node 1 is connected to a network hotspot corresponding to the network node 1 through a computer, similarly, a user corresponding to the user node 2 is connected to a network hotspot corresponding to the network node 1 through a tablet computer, and so on. In addition, it can also be seen from the figure that the same user connects different network hotspots at different times, for example, the user corresponding to the user node 5 connects the network hotspot corresponding to the network node 1 and the network hotspot corresponding to the network node 2 through the tablet computer.
In one embodiment, in the bipartite graph, the server may connect at least two user nodes that are connected to the same network hotspot with an edge, thereby associating the at least two user nodes; after connecting these user nodes with edges, the server can filter the network nodes out of the bipartite graph to obtain the isomorphic graph of the user nodes. It should be noted that, when the network nodes are filtered out of the bipartite graph, the edges attached to them are also deleted from the bipartite graph.
For example, fig. 4 is the isomorphic graph obtained by connecting the user nodes that are connected to the same network hotspot in the bipartite graph of fig. 3 and filtering out the network nodes. In fig. 3, all of the user nodes 1-5 connected to the network node 1 are connected by edges, that is, the user nodes 1-5 are connected to each other pairwise; the network node 1 is then filtered out of the bipartite graph, together with its corresponding edges, and so on. Once all user nodes connected to the same network hotspot have been connected, and the network nodes and their corresponding edges have been deleted, the isomorphic graph of fig. 4 is obtained. In fig. 4, for at least two user nodes having a connection relationship, the users corresponding to those user nodes have connected to the same network hotspot through a terminal; for example, the user node 1 and the user nodes 2 to 5 have all connected to the same network hotspot, namely the network hotspot corresponding to the network node 1, and so on.
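The conversion from the bipartite graph to the isomorphic graph can be sketched as follows. This is an illustrative sketch only: the function name `project_to_user_graph`, the pair-based input format and the hotspot identifiers `"ap1"` and `"ap2"` are assumptions, not part of the embodiments.

```python
from collections import defaultdict
from itertools import combinations


def project_to_user_graph(connections):
    """Convert (user_id, hotspot_id) pairs into a user-only graph.

    Returns a dict mapping an ordered user pair (u, v) to the set of
    hotspots that both users connected to.
    """
    # Bipartite graph: each network node maps to the user nodes connected to it.
    users_by_hotspot = defaultdict(set)
    for user_id, hotspot_id in connections:
        users_by_hotspot[hotspot_id].add(user_id)

    # Connect every pair of user nodes that share a hotspot. Only user pairs
    # are kept as keys, which corresponds to filtering out the network nodes
    # and their edges.
    edges = defaultdict(set)
    for hotspot_id, users in users_by_hotspot.items():
        for u, v in combinations(sorted(users), 2):
            edges[(u, v)].add(hotspot_id)
    return dict(edges)


connections = [(1, "ap1"), (2, "ap1"), (5, "ap1"), (5, "ap2"), (6, "ap2")]
user_graph = project_to_user_graph(connections)
```

Storing the set of shared hotspots on each edge is a design choice of this sketch; its size gives the number of hotspots that two users have in common, which is one way to read the "number of edges" between two user nodes.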
Mode 2, the user relationship graph is constructed directly from the network hotspot connection data.
In one embodiment, the user relationship graph is an isomorphic graph that is formed from user nodes and reflects user relationships; S202 may specifically include: the server extracts the user identifications from the network hotspot connection data; and uses the extracted user identifications as user nodes, and connects, among all the user nodes, the target user nodes that are connected to the same network hotspot, so as to form the isomorphic graph.
For example, as shown in fig. 4, the server extracts all the user identifications from the network hotspot connection data, uses the extracted user identifications as user nodes 1 to 23, and connects with edges those of the user nodes 1 to 23 that are connected to the same network hotspot, thereby obtaining the isomorphic graph shown in fig. 4.
S204, obtaining the user attribute characteristics corresponding to each user node in the user relationship graph.
User attribute features may refer to various attribute features that describe a user, including but not limited to the user's gender, age, educational background, occupation, hobbies, points of interest (i.e., places of interest), assets, terminals, and place of origin.
In one embodiment, S204 may specifically include: the server can locally acquire corresponding user attribute characteristics according to the user identification corresponding to each user node; or sending an attribute feature acquisition request carrying a user identifier to the social server, acquiring the user attribute feature according to the carried user identifier by the social server when the authorization of the user is obtained, and then feeding back the acquired user attribute feature to the server.
A user generally uploads user attribute features when registering an account with a social application or while using the social application, and the social server stores the user attribute features uploaded by the user. When the user's authorization is obtained, the user attribute features can be sent to the server, so that the server can obtain the corresponding user attribute features locally according to the user identification.
In one embodiment, after obtaining the user attribute features corresponding to each user node, the server calculates edge weights between each connected user node in the user relationship graph, so as to weight the encoded user attribute features of the corresponding user nodes according to the edge weights, and then executes S206 and S208. Wherein an edge weight may refer to a weight of an edge between connected user nodes.
For the calculation of the edge weight, the edge weight calculation step may include: for a user corresponding to the user node, the server determines the frequency of connecting the network hotspot by the user in a preset time period; summing the frequencies corresponding to the users connected with the same network hotspot to obtain a sum value; and taking the sum value as the edge weight between the user nodes corresponding to the users connected with the same network hotspot.
The frequency may be the number of times that the user connects to a certain network hotspot in a preset time period through the terminal.
For example, as shown in fig. 3, for all users corresponding to the user nodes 1 to 23, that is, users 1 to 23, the server determines the frequency with which each of users 1 to 23 connects to a network hotspot. If the frequency cntA with which user 1 uses terminal 1 to connect to the network hotspot corresponding to network node 1 within one week is 7, and the frequency cntB with which user 2 uses terminal 2 to connect to the network hotspot corresponding to network node 1 within one week is 2, then the edge weight between user node 1 and user node 2 is $\log_t(\mathrm{cntA}) + \log_t(\mathrm{cntB}) = \log_t 7 + \log_t 2$, where t is 2, e or 10, and e is the natural constant, approximately 2.718281828459045. That is, in the isomorphic graph of fig. 4, the edge weight between user node 1 and user node 2 is $\log_t 7 + \log_t 2$. The edge weights between the other user nodes in fig. 4 may be calculated similarly.
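The edge weight in this example can be reproduced with a few lines of Python. The function name `edge_weight` and its parameters are hypothetical; the sketch assumes each connection count is at least 1, since the logarithm is undefined for a count of zero.

```python
import math


def edge_weight(cnt_a, cnt_b, base=math.e):
    """Edge weight as the sum of log-scaled connection counts of two users.

    cnt_a, cnt_b: number of times each user connected to the shared hotspot
    in the preset time period (assumed >= 1).
    base: logarithm base t, e.g. 2, math.e or 10.
    """
    return math.log(cnt_a, base) + math.log(cnt_b, base)


# The example above with t = 2: log2(7) + log2(2).
w = edge_weight(7, 2, base=2)
```

Because a sum of logarithms equals the logarithm of the product, the same value can be checked as $\log_2 14$, roughly 3.81.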
In one embodiment, the user attribute features include numerical attribute features and non-numerical attribute features; the method further includes: the server performs binning on the numerical attribute features to obtain discretized numerical attribute features; and encodes the discretized numerical attribute features and the non-numerical attribute features respectively, and combines the encoded results (namely the encoded user attribute features) into a feature matrix.
A numerical attribute feature is an attribute feature of the user that is expressed as a numerical value, such as the age of the user. Binning divides a continuous range of a numerical attribute feature into a plurality of segments, and the values in each segment are regarded as one category. For example, the age of the user may be divided into different age groups, with 0-6 being one group, 6-12 being one group, 12-18 being one group, 18-24 being one group, and so on.
In one embodiment, after obtaining the discretized numerical attribute characteristics, the server encodes the discretized numerical attribute characteristics and the non-numerical attribute characteristics in a one-hot encoding mode, then weights the results obtained by encoding by using the edge weights, and combines the weighted results into a feature matrix, so that the feature matrix is a feature matrix subjected to weighting processing.
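The binning and one-hot encoding steps can be illustrated with the short sketch below. The bin edges follow the age groups mentioned above; treating the bins as half-open intervals (so that age 6 falls in the 6-12 group), the two-value `genders` vocabulary, and all function names are assumptions made for illustration. Weighting by edge weights is omitted here.

```python
from bisect import bisect_right

# Assumed half-open age bins: [0, 6), [6, 12), [12, 18), [18, 24), [24, +inf).
AGE_BIN_EDGES = [6, 12, 18, 24]


def bin_index(value, edges=AGE_BIN_EDGES):
    """Return the index of the bin that a numerical value falls into."""
    return bisect_right(edges, value)


def one_hot(index, size):
    """Encode a category index as a one-hot vector of the given size."""
    vec = [0] * size
    vec[index] = 1
    return vec


def encode_user(age, gender, genders=("female", "male")):
    """Encode one numerical feature (age) and one non-numerical feature (gender)."""
    age_vec = one_hot(bin_index(age), len(AGE_BIN_EDGES) + 1)
    gender_vec = one_hot(genders.index(gender), len(genders))
    return age_vec + gender_vec


# Each row of the feature matrix is the encoded attribute vector of one user.
feature_matrix = [encode_user(5, "female"), encode_user(20, "male")]
```

In this sketch each user becomes a 7-dimensional row: five positions for the age group and two for gender.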
S206, aiming at each user node in the user relationship graph, the neighbor user nodes connected with the aimed user node are sampled preferentially according to the node relevance.
The node relevance may refer to the relevance or closeness between user nodes in the user relationship graph; the greater the relevance or closeness, the stronger the node relevance. During sampling, the neighbor user nodes connected to the targeted user node are sampled according to the strength of the node relevance: neighbor user nodes with stronger node relevance are more likely to be sampled, and neighbor user nodes with weaker node relevance are less likely to be sampled. The strength of the node relevance may be affected by the number of edges between nodes, the weight, or the frequency with which the user corresponding to the user node connects to the network hotspot; that is, the larger the number of edges, the weight, or the frequency, the stronger the relevance of the corresponding node, and vice versa. A neighbor user node is a user node connected to a given user node; as shown in fig. 4, for the user node 1, the neighbor user nodes are user nodes 2-5.
In one embodiment, in the user relationship graph, the server samples neighboring user nodes of each user node. For example, as shown in fig. 4, for user nodes 1 to 23 in the user relationship diagram, neighboring user nodes of the user nodes may be sequentially sampled in an order from small to large in node sequence numbers, and for example, neighboring user nodes 2 to 5 of the user node 1 may be sampled to obtain sampled neighboring user nodes 3 and 5; in addition, sampling is carried out on the neighbor user nodes 1, 3-5 of the user node 2, and the sampled neighbor user nodes 1, 5 can be obtained, and so on.
In the sampling process, sampling may be performed according to the edge weight; that is, a user node with a larger edge weight has a greater probability of being sampled, and a user node with a smaller edge weight has a smaller probability of being sampled. Sampling may also be performed according to the number of edges of a user node; that is, a user node with more edges has a higher probability of being sampled, and a user node with fewer edges has a lower probability of being sampled. As shown in fig. 4, when the neighbor user nodes of the user node 1 are sampled, the user node 5 has a higher probability of being sampled because it has the largest number of edges. The number of edges is the number of connecting edges between user nodes in the user relationship graph; if two users have both connected to the same two wireless hotspots, the number of edges between the two corresponding user nodes is 2.
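One possible way to realize this preferential sampling is to draw neighbors without replacement with probability proportional to their edge weight. The sketch below makes that assumption explicit; the function name `sample_neighbors`, the sample size `k` and the example weights are hypothetical, and all weights are assumed to be strictly positive.

```python
import random


def sample_neighbors(neighbor_weights, k, rng=random):
    """Sample up to k distinct neighbors, favoring larger edge weights.

    neighbor_weights: dict mapping neighbor node id -> positive edge weight.
    rng: a random.Random instance (or the random module) for reproducibility.
    """
    candidates = dict(neighbor_weights)
    sampled = []
    while candidates and len(sampled) < k:
        nodes = list(candidates)
        weights = [candidates[n] for n in nodes]
        # Draw one neighbor with probability proportional to its weight,
        # then remove it so that it cannot be drawn again.
        chosen = rng.choices(nodes, weights=weights, k=1)[0]
        sampled.append(chosen)
        del candidates[chosen]
    return sampled


rng = random.Random(0)
picked = sample_neighbors({2: 3.8, 3: 1.0, 4: 0.5, 5: 6.2}, k=2, rng=rng)
```

The same function can sample by number of edges instead, by passing edge counts as the weights.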
S208, weighting and summing the user attribute characteristics corresponding to the sampled neighbor user nodes and the user attribute characteristics corresponding to the user nodes, and normalizing the weighted and summed results to obtain the neighbor aggregation attribute characteristics.
S208 is a process of feature aggregation, and when feature aggregation is performed, aggregation may be performed by a weight-based method or by an attention-based method. Therefore, the user attribute features can be aggregated in the two ways, and the specific process is as follows:
Mode 1, aggregation is performed in a weight-based manner.
In one embodiment, the server obtains the edge weights between the targeted user node and each sampled neighbor user node; weights the user attribute features corresponding to the sampled neighbor user nodes according to the obtained edge weights to obtain weighted neighbor user attribute features; and sums the user attribute features corresponding to the targeted user node and the weighted neighbor user attribute features, and normalizes the summed result to obtain the neighbor aggregation attribute features. Before aggregation, the edge weights may also be used to weight the user attribute features corresponding to the targeted user node.
In one embodiment, when the user attribute features are not coded, the server may use a one-hot coding mode to code the user attribute features corresponding to the sampled neighbor user nodes, and then use corresponding edge weights to weight the coded user attribute features corresponding to the neighbor user nodes; in addition, the server also adopts a one-hot coding mode to code the user attribute characteristics corresponding to the user node, and then uses the corresponding edge weights to weight the coded user attribute characteristics corresponding to the user node. And finally, the server aggregates the weighted neighbor user attribute characteristics corresponding to the sampled neighbor user nodes with the weighted user attribute characteristics corresponding to the targeted user nodes to obtain neighbor aggregation attribute characteristics.
In another embodiment, in the case of encoding the discretized numerical attribute feature and non-numerical attribute feature to obtain a feature matrix and storing the feature matrix, S208 may specifically include: and the server acquires the characteristic matrix corresponding to the sampled neighbor user node and the characteristic matrix corresponding to the targeted user node from the stored characteristic matrix, and then aggregates the characteristic matrix corresponding to the sampled neighbor user node and the characteristic matrix corresponding to the targeted user node.
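The weight-based aggregation of mode 1 can be sketched as follows. The embodiments do not name the normalization function for this mode, so the element-wise sigmoid used here is an assumption, as are the function names and the toy feature vectors.

```python
import math


def sigmoid(x):
    """Squash a real value into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def aggregate_weighted(self_features, neighbor_features, edge_weights):
    """Sum the node's own features with edge-weighted neighbor features.

    self_features: encoded attribute vector of the targeted user node.
    neighbor_features: encoded attribute vectors of the sampled neighbors.
    edge_weights: edge weight between the targeted node and each neighbor.
    """
    total = [float(x) for x in self_features]
    for feats, weight in zip(neighbor_features, edge_weights):
        for d, value in enumerate(feats):
            total[d] += weight * value
    # Normalize the summed result element-wise (assumed sigmoid).
    return [sigmoid(x) for x in total]


agg = aggregate_weighted([1, 0, 0], [[0, 1, 0], [1, 0, 0]], [2.0, 0.5])
```

In this toy case the summed vector before normalization is [1.5, 2.0, 0.0], so the third component normalizes to exactly 0.5.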
Mode 2, aggregation is performed in an attention-based manner.
In one embodiment, S208 may specifically include: the server acquires the attention parameter of the targeted user node and the attention parameters between the targeted user node and the sampled neighbor user nodes; calculates a first product value of the user attribute feature corresponding to the targeted user node, the corresponding attention parameter and a preset parameter matrix; calculates a second product value of the user attribute feature corresponding to each sampled neighbor user node connected to the targeted user node, the corresponding attention parameter and the preset parameter matrix; and sums the first product value and the second product values corresponding to the targeted user node, and normalizes the summed result to obtain the neighbor aggregation attribute feature.
For the above calculation process, the neighbor aggregation attribute feature may be calculated with the following aggregation function. Specifically, the server inputs the user attribute features corresponding to the sampled neighbor user nodes and the user attribute features corresponding to the targeted user node into the aggregation function, calculates the first product value and the second product values through the aggregation function, sums the first product value and the second product values, and normalizes the summed result to obtain the neighbor aggregation attribute feature; wherein the aggregation function is:
$$h_i' = \sigma\left(\sum_{j=1}^{N_i} a_{ij} W h_j\right)$$
where $h_i'$ is the neighbor aggregation attribute feature; i denotes the targeted user node, j denotes the sampled neighbor user node, and $N_i$ represents the total number of user nodes in the user relationship graph; $a_{ij}$ represents the attention parameter between the targeted user node and the sampled neighbor user node, and W is the parameter matrix; when j ≠ i, $h_j$ represents the user attribute feature corresponding to the j-th neighbor user node of the i-th user node; when j = i, $h_j = h_i$ represents the user attribute feature corresponding to the i-th user node. σ() may be a sigmoid function, or a softplus or softmax function, etc., and is used to normalize the result of the summation.
Before aggregation, the server may encode the user attribute features corresponding to the sampled neighbor user nodes in a one-hot encoding manner, encode the user attribute features corresponding to the user nodes, and then aggregate.
The attention parameter is calculated by an attention parameter network. In one embodiment, the attention parameter calculating step may specifically include: the server calculates the product of the parameter matrix and the user attribute feature corresponding to the targeted user node to obtain a first weighted user attribute feature; calculates the product of the parameter matrix and the user attribute feature corresponding to the sampled neighbor user node to obtain a second weighted user attribute feature; performs a nonlinear transformation on the first weighted user attribute feature and the second weighted user attribute feature through the attention parameter network; and normalizes the result of the nonlinear transformation to obtain the attention parameter.
For example, as shown in fig. 5, the first weighted user attribute feature $W h_i$ (the parameter matrix W multiplied by the user attribute feature $h_i$ of the targeted user node) and the second weighted user attribute feature $W h_j$ (W multiplied by the user attribute feature $h_j$ of the sampled neighbor user node) are respectively input into the attention parameter network; the nonlinear transformation layer of the attention parameter network performs a nonlinear transformation on the first weighted user attribute feature and the second weighted user attribute feature, and the result of the nonlinear transformation is then normalized through a softmax function to obtain the attention parameter $a_{ij}$.
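The attention-based aggregation of mode 2 can be sketched in plain Python as follows. The embodiments only state that the attention parameter network applies a nonlinear transformation followed by softmax; the concrete form used here (a scoring vector `attn_vec` applied to the concatenation of the two weighted features, followed by LeakyReLU, as in graph attention networks) is an assumption, as are all function names and the toy values of `W` and `attn_vec`.

```python
import math


def matvec(W, h):
    """Multiply parameter matrix W (list of rows) by feature vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_aggregate(h_i, sampled_neighbors, W, attn_vec):
    """Return (aggregated feature, attention parameters) for one targeted node."""
    # Index 0 is the targeted node itself (j = i); the rest are sampled neighbors.
    projected = [matvec(W, h) for h in [h_i] + sampled_neighbors]
    wh_i = projected[0]
    # Assumed attention parameter network: score = LeakyReLU(attn_vec . [Wh_i || Wh_j]).
    scores = [
        leaky_relu(sum(a * x for a, x in zip(attn_vec, wh_i + wh_j)))
        for wh_j in projected
    ]
    a_i = softmax(scores)
    # Sum of a_ij * W * h_j over j, then element-wise sigmoid normalization.
    summed = [
        sum(a * wh[d] for a, wh in zip(a_i, projected))
        for d in range(len(wh_i))
    ]
    return [1.0 / (1.0 + math.exp(-s)) for s in summed], a_i


W = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]   # hypothetical 2x3 parameter matrix
attn_vec = [0.1, -0.2, 0.3, 0.4]          # hypothetical scoring vector
h_new, a_i = attention_aggregate([1, 0, 0], [[0, 1, 0], [0, 0, 1]], W, attn_vec)
```

In a trained model `W` and `attn_vec` would be learned; here they are fixed only so that the sketch runs. The returned attention parameters sum to 1 because of the softmax normalization.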
S210, scoring the user corresponding to the targeted user node based on the neighbor aggregation attribute features, and taking the corresponding user as a diffusion user when the score meets a score condition.
A diffusion user may refer to a target user who is identified from the users corresponding to the user nodes in the user relationship graph and to whom messages are to be recommended.
In one embodiment, S210 may specifically include: the server classifies the neighbor aggregation attribute features through a classification model; scores the users corresponding to the user nodes according to the classification results, and ranks the users corresponding to the user nodes according to their scores; and takes the users whose ranking reaches a preset ranking as diffusion users. The classification model is obtained by training an initial classification model with the user attribute features of seed users and unknown users.
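The ranking step of S210 can be sketched as below, assuming the classification model has already produced one score per user. The function name `select_diffusion_users`, the user identifiers and the score values are hypothetical.

```python
def select_diffusion_users(scores, top_k):
    """Rank users by score (highest first) and keep the top_k as diffusion users.

    scores: dict mapping user id -> score produced by the classification model.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [user_id for user_id, _ in ranked[:top_k]]


scores = {"u1": 0.91, "u2": 0.35, "u3": 0.78, "u4": 0.12}
diffusion_users = select_diffusion_users(scores, top_k=2)
```

With these example scores and a preset ranking of 2, the selected diffusion users are `u1` and `u3`.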
In one embodiment, because the diffusion users and the seed users have similar attribute features, their interests and preferences are also similar. The server can therefore obtain recommendation information related to the interests and preferences of the seed users and recommend it to the diffusion users.
In one embodiment, the server may also vectorize each node in the bipartite graph through a graph neural network or another graph embedding network to obtain network hidden vectors; vectorize the user attribute features of each user node in the homogeneous graph through a graph neural network or another graph embedding network to obtain user hidden vectors; and input the network hidden vectors and the user hidden vectors into a binary classification model, so that the binary classification model classifies the users corresponding to the user nodes in the homogeneous graph based on these vectors to determine the diffusion users.
To make the network hotspot-based target user identification method easier to understand, the method is described below with reference to FIG. 6. As shown in FIG. 6, this embodiment provides an application scenario in which users connect to WiFi hotspots; in this scenario, the method includes the following:
(1) Acquire WiFi hotspot connection data.
The WiFi hotspot connection data is data formed when a user connects to a WiFi hotspot through a terminal.
(2) Construct a WiFi user bipartite graph according to the WiFi hotspot connection data.
The WiFi user bipartite graph is a bipartite graph whose nodes are user identifiers and WiFi hotspot identifiers.
(3) Convert the WiFi user bipartite graph into a homogeneous graph with user identifiers as nodes.
In the homogeneous graph, connected user nodes indicate that the corresponding users have connected to the same network hotspot.
(4) Collect the user attribute features corresponding to each user node in the homogeneous graph.
(5) Sample the neighbor user nodes of each user node in the homogeneous graph, and aggregate the user attribute features of the user node and of its sampled neighbor user nodes.
(6) Input the neighbor aggregation attribute features into a binary classification model for user classification, so as to determine the diffusion users.
(7) Push recommendation information to the diffusion users.
The recommendation information may be introduction information or purchase links for products in which the user is interested.
With this scheme, a target user group can be diffused from users who connect to the same WiFi hotspots, and information is then pushed to the diffusion users, which achieves precise coverage and improves the targeted user coverage.
In this embodiment, the network hotspot connection data formed when users connect to network hotspots is organized as a graph to obtain the user relationship graph, and the association between users is expressed by whether their user nodes are connected in the user relationship graph. Feature information therefore does not need to be constructed through manual intervention, and the associations among users can be obtained quickly. In addition, the user attribute features corresponding to the neighbor user nodes are aggregated with the user attribute features corresponding to the targeted user node, which yields neighbor aggregation attribute features that represent both the network topology and the similarity of user node features. Scoring the users corresponding to the user nodes based on these features allows the diffusion users to be determined effectively and improves both the accuracy of targeted user diffusion and the user coverage. Furthermore, the neighbor user nodes connected to each user node are sampled before aggregation, and only the user attribute features of the sampled neighbor user nodes are aggregated with those of the targeted user node, so that the neighbor aggregation attribute features can be obtained quickly and the diffusion users can be determined quickly and accurately.
In one embodiment, the classification model is obtained by training an initial classification model; as shown in FIG. 7, the training of the initial classification model may specifically include:
S702: generate a sample user relationship graph based on network hotspot connection data samples; connected sample user nodes in the sample user relationship graph indicate that the corresponding users have connected to the same network hotspot; the sample user nodes include seed user nodes and random user nodes.
In the above S702, reference may be made to S202 in the above embodiment for a specific process of generating the sample user relationship diagram.
S704, obtaining an attribute feature training set according to each sample user node in the sample user relationship graph.
The attribute feature training set comprises user attribute features of users corresponding to the sample user nodes. The specific obtaining process of the user attribute features may refer to S204 in the above embodiment.
S706: from the attribute feature training set, aggregate the attribute feature training data corresponding to each sample user node and to its neighbor user nodes to obtain training aggregation attribute features.
Before aggregation, sampling can be carried out on corresponding neighbor user nodes, and then attribute feature training data corresponding to each sample user node and the sampled corresponding neighbor user nodes are aggregated. The sampling process and the aggregation process may refer to S206 and S208 in the above embodiment.
S708: train the initial classification model with the training aggregation attribute features, and stop training when the feature similarity between the diffusion users predicted by the initial classification model and the seed users reaches a similarity threshold.
In one embodiment, the server takes out a part of the users from the seed users; takes the seed users remaining after that part is taken out as positive samples, and takes the taken-out part of the users together with the random users as negative samples; performs prediction on the training aggregation attribute features corresponding to the taken-out part of the users through the initial classification model, and determines a classification threshold according to the obtained predicted values; and performs prediction on the training aggregation attribute features of the random users through the initial classification model to obtain target negative samples. S708 may specifically include: the server trains the prediction-processed initial classification model with the training aggregation attribute features corresponding to the positive samples and the target negative samples.
For example, first, a part S of the users is randomly sampled from the seed users P and added to the random users U of unknown type, so that P' = P - S is used as the positive samples and N = U + S is used as the negative samples.
Second, the binary classification model M is preliminarily trained on the training aggregation attribute features corresponding to the positive samples P' and the negative samples N. Considering that the number of positive samples P' is much smaller than the number of negative samples N, the binary classification model M may use the random forest algorithm (Random Forest), without normalizing the user attribute features of the positive samples P' and the negative samples N.
In addition, during the preliminary training, a classification threshold t is determined from the predicted values that the binary classification model M outputs for the training aggregation attribute features of the users S, and reliable target negative samples N' are determined from the predicted values that M outputs for the training aggregation attribute features of the random users U.
Finally, a binary classification model is trained on the training aggregation attribute features corresponding to the positive samples P' and the target negative samples N', and the trained binary classification model is taken as the final prediction model.
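The four steps of this example can be sketched in Python using only the standard library. Several things here are assumptions made for illustration: `CentroidScorer` is a simple stand-in classifier and is not a random forest; the rule that the threshold t is the lowest score among the held-out users S is one common choice that the text does not specify; and the names `pu_train` and `spy_ratio` are hypothetical.

```python
import math
import random


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class CentroidScorer:
    """Stand-in binary classifier (NOT a random forest).

    Scores a sample in [0, 1]; higher means closer to the positive centroid.
    """

    def fit(self, positives, negatives):
        self.pos = centroid(positives)
        self.neg = centroid(negatives)
        return self

    def score(self, x):
        d_pos, d_neg = dist(x, self.pos), dist(x, self.neg)
        return d_neg / (d_pos + d_neg) if (d_pos + d_neg) > 0 else 0.5


def pu_train(P, U, spy_ratio=0.2, seed=0):
    rng = random.Random(seed)
    P = list(P)
    rng.shuffle(P)
    n_spy = max(1, int(len(P) * spy_ratio))
    S, P_prime = P[:n_spy], P[n_spy:]           # S sampled from P; P' = P - S
    N = list(U) + S                             # N = U + S
    M = CentroidScorer().fit(P_prime, N)        # preliminary model M
    t = min(M.score(x) for x in S)              # assumed rule for threshold t
    N_reliable = [x for x in U if M.score(x) < t]   # reliable negatives N'
    final = CentroidScorer().fit(P_prime, N_reliable)
    return final, t, N_reliable


P = [[1.0, 1.0], [0.9, 1.1], [1.1, 0.9], [1.0, 0.9], [0.9, 1.0]]
U = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1], [1.0, 1.05], [0.0, 0.1]]
final_model, t, N_reliable = pu_train(P, U)
```

In the toy data, one unknown user lies close to the seed users; whether it is kept out of the reliable negatives depends on the threshold, which is the purpose of holding out S.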
In the above embodiment, the attribute feature training data corresponding to each sample user node and the corresponding neighbor user node in the attribute feature training set are aggregated to obtain the training aggregation attribute feature aggregated with the seed user and the random user, and then the training aggregation attribute feature is used to train the initial classification model, so that the obtained classification model can classify the diffusion users similar to the seed user in characteristics, and the diffusion users can be determined quickly and accurately by using the classification model.
As an example, as shown in fig. 8, the method for identifying a target user based on a network hotspot may include:
(1) Acquire WiFi connection data, and construct a WiFi user bipartite graph according to the WiFi connection data.
The WiFi connection data of users is collected through reports from the WiFi Manager backend. Compared with WiFi data obtained when a user merely scans for hotspots, the WiFi connection data reported by the WiFi Manager backend better reflects a close association between the user and a WiFi hotspot.
In addition, data within a certain time period (such as one week) is screened out, and a bipartite graph between users and WiFi hotspots, namely the WiFi user bipartite graph, is constructed, as shown in FIG. 3. In the WiFi user bipartite graph, each WiFi hotspot or user identifier is a node, and if a user has a record of connecting to a WiFi hotspot through a terminal, an edge is added between the corresponding user node and network node. The frequency with which the user connects to the WiFi hotspot within the time period is counted and used as the weight of the corresponding edge in the WiFi user bipartite graph; the higher the frequency, the closer the relationship between the user and the WiFi hotspot.
As shown in FIG. 9, graphs built from a time period shorter than one week tend to be sparse: they have few edges between users and contain insufficient information. Graphs built from a period longer than one week have more edges and may contain noise. One week of connection data is therefore chosen as the time period: the density of the resulting graph is similar to that of other social networks, and a one-week span covers the users' connection behavior well.
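A minimal Python sketch of the bipartite graph construction in step (1) is given below. The record layout `(user_id, hotspot_id, timestamp)`, the function name `build_bipartite_edges`, and the sample data are assumptions made for illustration; only the one-week window and the use of connection frequency as edge weight come from the text.

```python
from collections import Counter
from datetime import datetime, timedelta


def build_bipartite_edges(records, window_end, window_days=7):
    """Count user-hotspot connections that fall inside the time window.

    records -- iterable of (user_id, hotspot_id, timestamp) tuples
               (assumed layout of the reported connection data)
    Returns a Counter mapping (user_id, hotspot_id) to connection frequency,
    which serves as the edge weight in the bipartite graph.
    """
    window_start = window_end - timedelta(days=window_days)
    edges = Counter()
    for user_id, hotspot_id, ts in records:
        if window_start <= ts <= window_end:
            edges[(user_id, hotspot_id)] += 1
    return edges


end = datetime(2020, 6, 8)
records = [
    ("A", "wifi-1", datetime(2020, 6, 2, 9)),
    ("A", "wifi-1", datetime(2020, 6, 3, 9)),
    ("B", "wifi-1", datetime(2020, 6, 4, 18)),
    ("B", "wifi-2", datetime(2020, 5, 1, 12)),   # outside the window, dropped
]
edges = build_bipartite_edges(records, end)
```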
(2) Project the WiFi user bipartite graph into a homogeneous graph with user identifiers as nodes.
When two users connect to the same WiFi hotspot through their respective terminals, an edge is added between the two corresponding user nodes, and the homogeneous graph shown in FIG. 4 is obtained from the WiFi user bipartite graph.
The weights of the edges in the homogeneous graph are calculated as follows:
If user A and user B have both established connections with a WiFi hotspot 1, with connection frequencies cntA and cntB respectively, there is an edge between the two corresponding user nodes (i.e., user node A and user node B), and the weight contributed to that edge by WiFi hotspot 1 is $w_{\text{wifi-}1} = \log(\mathrm{cnt}_A) + \log(\mathrm{cnt}_B)$.
All WiFi hotspots to which user A and user B have both connected are counted, and the weights of these hotspots are summed to obtain the weight of the edge between user node A and user node B: $w_{AB} = w_{\text{wifi-}1} + \cdots + w_{\text{wifi-}n}$.
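The projection and weight calculation of step (2) can be sketched as follows. The function name `project_to_user_graph` and the dictionary layout of the bipartite edges are hypothetical; the weight formula is the one given above, using the natural logarithm as an assumed base.

```python
import math
from collections import defaultdict
from itertools import combinations


def project_to_user_graph(bipartite_edges):
    """Project user-hotspot edges onto a user-user graph.

    bipartite_edges -- dict mapping (user, hotspot) to connection frequency
    Returns a dict mapping (u, v) with u < v to the edge weight w_uv, where
    each shared hotspot contributes log(cnt_u) + log(cnt_v).
    """
    users_by_hotspot = defaultdict(dict)
    for (user, hotspot), cnt in bipartite_edges.items():
        users_by_hotspot[hotspot][user] = cnt

    weights = defaultdict(float)
    for counts in users_by_hotspot.values():
        for u, v in combinations(sorted(counts), 2):
            weights[(u, v)] += math.log(counts[u]) + math.log(counts[v])
    return dict(weights)


bipartite = {
    ("A", "wifi-1"): 3, ("B", "wifi-1"): 2,
    ("A", "wifi-2"): 5, ("B", "wifi-2"): 1, ("C", "wifi-2"): 4,
}
w = project_to_user_graph(bipartite)
```

Under this formula, a hotspot to which both users connected exactly once contributes a weight of 0, because log(1) = 0.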
(3) Collect the user attribute features corresponding to each user node in the homogeneous graph.
Attributes of a user include user profile data such as gender (male/female), age (e.g., 0-6, 6-12, 12-18, 18-24, 24-30, 30-35, 35-45, 45-70, 70-100), education level (e.g., elementary school, junior high school, senior high school, junior college, bachelor's degree, doctorate), occupation (e.g., teacher, IT engineer, etc.), assets, hobbies, points of interest (POI), frequently used applications (APPs), smart devices used, income, and native place.
When a user attribute feature is a numerical feature, it needs to be binned so that it is discretized. All the user attribute features are then one-hot encoded and stored as a feature matrix.
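Binning followed by one-hot encoding can be sketched as below. The bin edges follow the age ranges listed above; how a value exactly on a boundary is assigned (here, to the higher bin), the restriction to two attributes, and the helper names are assumptions for illustration.

```python
from bisect import bisect_right

# Inner bin edges taken from the age ranges in the text (0-6, 6-12, ..., 70-100).
AGE_EDGES = [6, 12, 18, 24, 30, 35, 45, 70]
GENDERS = ["male", "female"]


def bin_index(value, edges):
    """Return the index of the bin that value falls into."""
    return bisect_right(edges, value)


def one_hot(index, size):
    vec = [0] * size
    vec[index] = 1
    return vec


def encode_user(age, gender):
    """Encode one user as the concatenation of per-attribute one-hot vectors."""
    age_vec = one_hot(bin_index(age, AGE_EDGES), len(AGE_EDGES) + 1)
    gender_vec = one_hot(GENDERS.index(gender), len(GENDERS))
    return age_vec + gender_vec


# One row of the feature matrix per user.
feature_matrix = [encode_user(27, "female"), encode_user(5, "male")]
```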
(4) Sample neighbor user nodes for each user node in the homogeneous graph.
In the WiFi hotspot connection data, many users may be connected to the same WiFi hotspot; that is, many user nodes in the homogeneous graph have thousands of neighbor user nodes (for example, some public WiFi hotspots have a large number of connected users). To balance speed and effect, the neighbor user nodes of each user node in the homogeneous graph are randomly sampled, and the sampling result is used as the input to the subsequent feature aggregation.
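Random neighbor sampling with a cap on the number of neighbors can be sketched as follows. The cap `max_neighbors`, the adjacency-dictionary layout, and the function name are assumptions; the text only states that neighbors are sampled randomly.

```python
import random


def sample_neighbors(adjacency, max_neighbors, seed=None):
    """Keep at most max_neighbors randomly chosen neighbors per user node.

    adjacency -- dict mapping a user node to a list of its neighbor nodes
    """
    rng = random.Random(seed)
    sampled = {}
    for node, neighbors in adjacency.items():
        neighbors = list(neighbors)
        if len(neighbors) > max_neighbors:
            # Sample without replacement only when the node has too many neighbors.
            neighbors = rng.sample(neighbors, max_neighbors)
        sampled[node] = neighbors
    return sampled


adjacency = {"hub": ["a", "b", "c", "d", "e"], "a": ["hub"]}
sampled = sample_neighbors(adjacency, max_neighbors=3, seed=42)
```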
(5) Aggregate, by means of attention, the user attribute features corresponding to each user node and its neighbor user nodes.
The neighbor features of each node are aggregated according to the following formula, where W is the parameter matrix to be learned, $h_j$ is the neighbor node feature, $h_i$ is the original node feature, $h_i'$ is the aggregated feature, and $\sigma$ is the sigmoid function.
Here, $a_{ij}$ is the attention parameter between user node i and neighbor user node j, and is calculated as shown in FIG. 5. FIG. 5 shows the attention parameter network, i.e., a neural network for calculating the attention parameter, whose inputs are the user attribute features of user node i and of neighbor user node j. The input user attribute features undergo a nonlinear transformation, and the result of the nonlinear transformation is then normalized through a softmax layer to obtain the final $a_{ij}$, where $\sum_j a_{ij} = 1$.
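The aggregation formula itself is not reproduced in this text. One form that would be consistent with the symbol definitions above, and with the first and second product values described for the aggregation step in other embodiments, is sketched below. It is a reconstruction under those assumptions rather than the formula of the embodiment; the neighbor set $\mathcal{N}(i)$ and the self term $a_{ii}$ are notation introduced here.

```latex
h_i' = \sigma\Big( a_{ii}\, W h_i + \sum_{j \in \mathcal{N}(i)} a_{ij}\, W h_j \Big)
```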
Optionally, in order to consider the attribute of the higher-order neighbor, the aggregation operation may be performed multiple times, where the feature input in each aggregation is the aggregated attribute feature after aggregation in the previous iteration, so as to obtain the attribute feature of the higher-order neighbor.
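Repeated aggregation for higher-order neighbors can be sketched as follows. For brevity the attention parameters are passed in as fixed inputs instead of being recomputed by an attention network in each round, and W is assumed to be square so that the output of one round can be fed into the next; both simplifications, along with the function names, are assumptions of this sketch.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def aggregate_once(features, neighbors, alpha, W):
    """One aggregation round over every node.

    features  -- dict: node -> feature vector
    neighbors -- dict: node -> list of sampled neighbor nodes
    alpha     -- dict: (i, j) -> attention parameter, including (i, i)
    W         -- square parameter matrix as a list of rows
    """
    out = {}
    for i in features:
        acc = [0.0] * len(W)
        for j in [i] + neighbors[i]:
            wh_j = [sum(w * x for w, x in zip(row, features[j])) for row in W]
            acc = [a + alpha[(i, j)] * v for a, v in zip(acc, wh_j)]
        out[i] = [sigmoid(v) for v in acc]
    return out


def aggregate(features, neighbors, alpha, W, rounds=2):
    """Feed the aggregated features of each round into the next round."""
    for _ in range(rounds):
        features = aggregate_once(features, neighbors, alpha, W)
    return features


features = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
neighbors = {"u1": ["u2"], "u2": ["u1"]}
alpha = {("u1", "u1"): 0.5, ("u1", "u2"): 0.5,
         ("u2", "u2"): 0.5, ("u2", "u1"): 0.5}
W = [[1.0, 0.0], [0.0, 1.0]]
one_round = aggregate_once(features, neighbors, alpha, W)
two_rounds = aggregate(features, neighbors, alpha, W, rounds=2)
```

After two rounds, the feature of a node also reflects the neighbors of its neighbors, which is how higher-order neighbor attributes enter the result.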
(6) Train the binary classification model through PU learning.
For each seed package, a binary classification model is learned with the seed users as positive samples and the random users as unknown samples. The specific steps are as follows:
A part S of the positive samples is randomly sampled from the seed users P and added to the unknown samples U, so that P' = P - S is used as the positive samples and N = U + S is used as the negative samples.
A binary classification model M is trained on the samples P' and N. Given that the number of samples in P' is far smaller than the number in N, the binary classification model may adopt the random forest algorithm (Random Forest), without normalizing the features.
A threshold t is determined from the predicted values of the binary classification model M on the samples S, and reliable negative samples N' are determined from the predicted values of M on the samples U.
A binary classification model M' is trained on the samples P' and N', and the trained model M' is taken as the final WiFi recommendation model.
(7) Classify the neighbor aggregation attribute features through the trained binary classification model to determine the diffusion users.
The users corresponding to the user nodes in the homogeneous graph are scored and ranked by the binary classification model generated in step (6). The top-ranked candidate users have attributes and WiFi connection behavior similar to those of the seed users, which achieves the purpose of user diffusion.
Through the scheme of the embodiment, the following beneficial effects can be achieved:
When the method is applied to the Tencent WiFi Manager product, the similarity among users is calculated accurately and audiences can be effectively diffused and recalled, which improves the user coverage and accuracy of audience targeting and improves the advertisement delivery effect.
In addition, the obtained user hidden vector and the WiFi hidden vector are used as the input of the two classification models, so that the problems of high dimensionality and data sparsity of a recommendation system are solved, and the recommendation effect is favorably improved.
It should be understood that although the steps in the flowcharts of FIG. 2 and FIG. 7 are shown in sequence as indicated by the arrows, the steps are not necessarily performed in that sequence. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in FIG. 2 and FIG. 7 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 10, a network hotspot-based target user identification apparatus is provided. The apparatus may be implemented as a software module, a hardware module, or a combination of the two, and may form part of a computer device. The apparatus specifically includes: a generating module 1002, an obtaining module 1004, a sampling module 1006, an aggregating module 1008, and a diffusing module 1010, wherein:
a generating module 1002, configured to generate a user relationship graph based on the network hotspot connection data; the user nodes connected in the user relationship graph represent the same network hotspots connected by corresponding users;
an obtaining module 1004, configured to obtain a user attribute feature corresponding to each user node in the user relationship graph;
a sampling module 1006, configured to, for each user node in the user relationship graph, preferentially sample the neighbor user nodes connected to the targeted user node according to node relevance;
the aggregation module 1008 is configured to weight and sum the user attribute features corresponding to the sampled neighbor user nodes and the user attribute features corresponding to the targeted user nodes, and normalize a result obtained after weighting and summing to obtain neighbor aggregation attribute features;
the diffusion module 1010 is configured to score the user corresponding to the targeted user node based on the neighbor aggregation attribute feature, and to take that user as a diffusion user when the score reaches the score condition.
In one embodiment, the user relationship graph is a homogeneous graph that is formed from user nodes and reflects the relationships between users; the generating module 1002 is further configured to extract network hotspot identifiers, user identifiers, and network connection relationships from the network hotspot connection data; generate a bipartite graph in which the network hotspot identifiers are network nodes, the user identifiers are user nodes, and the network nodes and user nodes are connected according to the network connection relationships; and connect the user nodes that are connected to the same network hotspot in the bipartite graph, and filter the network nodes out of the resulting graph to obtain the homogeneous graph.
In one embodiment, the aggregation module 1008 is further configured to obtain edge weights between the targeted user node and each sampled neighboring user node; weighting the user attribute characteristics corresponding to the sampled neighbor user nodes according to the obtained edge weights to obtain weighted neighbor user attribute characteristics; and summing the user attribute characteristics corresponding to the aimed user node and the weighted neighbor user attribute characteristics, and normalizing the summed result to obtain the neighbor aggregation attribute characteristics.
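The edge-weighted aggregation described for the aggregation module can be sketched as follows. The text does not say which normalization is applied to the summed result; L2 normalization is assumed here, and the function name is hypothetical.

```python
import math


def aggregate_with_edge_weights(h_i, neighbor_feats, edge_weights):
    """Sum h_i with edge-weighted neighbor features, then normalize.

    h_i            -- features of the targeted user node
    neighbor_feats -- features of the sampled neighbor user nodes
    edge_weights   -- edge weight between the targeted node and each neighbor
    """
    total = list(h_i)
    for h_j, w in zip(neighbor_feats, edge_weights):
        total = [t + w * x for t, x in zip(total, h_j)]
    # Assumed normalization: scale the summed vector to unit L2 norm.
    norm = math.sqrt(sum(t * t for t in total))
    return [t / norm for t in total] if norm > 0 else total


agg = aggregate_with_edge_weights(
    [1.0, 0.0],                      # targeted node
    [[0.0, 1.0], [1.0, 1.0]],        # two sampled neighbors
    [2.0, 1.0],                      # their edge weights
)
```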
In one embodiment, the edge weight is obtained by the edge weight calculation step; as shown in fig. 11, the apparatus further includes:
a first calculating module 1012, configured to determine, for a user corresponding to a targeted user node, a frequency of connecting a network hotspot by the user in a preset time period; summing the frequencies corresponding to the users connected with the same network hotspot to obtain a sum value; and taking the sum value as the edge weight between the user nodes corresponding to the users connected with the same network hotspot.
In one embodiment, the user relationship graph is a homogeneous graph that is formed from user nodes and reflects the relationships between users; the generating module 1002 is further configured to extract user identifiers from the network hotspot connection data; and take the extracted user identifiers as user nodes, and connect those user nodes that are connected to the same network hotspot, so as to form the homogeneous graph.
In one embodiment, the user attribute features include numeric attribute features and non-numeric attribute features; as shown in fig. 11, the apparatus further includes:
the processing module 1014 is configured to perform binning on the numerical attribute features to obtain discretized numerical attribute features;
the coding module 1016 is used for coding the discretized numerical attribute characteristics and the non-numerical attribute characteristics respectively and combining the coded results into a characteristic matrix;
the aggregation module 1008 is further configured to aggregate the feature matrix corresponding to the sampled neighbor user node and the feature matrix corresponding to the targeted user node.
In one embodiment, the aggregation module 1008 is further configured to obtain an attention parameter of the targeted user node and an attention parameter between the targeted user node and the sampled neighboring user node; calculating a first product value between the user attribute characteristics corresponding to the user nodes, the corresponding attention parameters and a preset parameter matrix; calculating a second product value between the sampled user attribute characteristics corresponding to the neighbor user nodes connected with the user node, the corresponding attention parameters and a preset parameter matrix; and summing the first product value corresponding to the user node and the second product value corresponding to the user node, and normalizing the summed result to obtain the neighbor aggregation attribute feature.
In one embodiment, the attention parameter is obtained through an attention parameter calculating step; as shown in FIG. 11, the apparatus further includes:
a second calculating module 1018, configured to calculate a product of the parameter matrix and the user attribute feature corresponding to the targeted user node, so as to obtain a first weighted user attribute feature; calculating the product of the parameter matrix and the user attribute characteristics corresponding to the sampled neighbor user nodes to obtain second weighted user attribute characteristics; performing nonlinear transformation on the first weighted user attribute feature and the second weighted user attribute feature through an attention parameter network; and normalizing the result of the nonlinear transformation to obtain the attention parameter.
In one embodiment, the diffusion module 1010 is further configured to classify the neighbor aggregation attribute features through a classification model; score the users corresponding to the user nodes according to the classification results, and rank those users by score; and take the users whose rank reaches a preset rank as diffusion users.
In this embodiment, the network hotspot connection data formed when users connect to network hotspots is organized as a graph to obtain the user relationship graph, and the association between users is expressed by whether their user nodes are connected in the user relationship graph. Feature information therefore does not need to be constructed through manual intervention, and the associations among users can be obtained quickly. In addition, the user attribute features corresponding to the neighbor user nodes are aggregated with the user attribute features corresponding to the targeted user node, which yields neighbor aggregation attribute features that represent both the network topology and the similarity of user node features. Scoring the users corresponding to the user nodes based on these features allows the diffusion users to be determined effectively and improves both the accuracy of targeted user diffusion and the user coverage. Furthermore, the neighbor user nodes connected to each user node are sampled before aggregation, and only the user attribute features of the sampled neighbor user nodes are aggregated with those of the targeted user node, so that the neighbor aggregation attribute features can be obtained quickly and the diffusion users can be determined quickly and accurately.
In one embodiment, the classification model is obtained by training an initial classification model; as shown in FIG. 11, the apparatus may further include:
a training module 1020, configured to generate a sample user relationship graph based on network hotspot connection data samples, where connected sample user nodes in the sample user relationship graph indicate that the corresponding users have connected to the same network hotspot, and the sample user nodes include seed user nodes and random user nodes; obtain an attribute feature training set according to each sample user node in the sample user relationship graph; from the attribute feature training set, aggregate the attribute feature training data corresponding to each sample user node and to its neighbor user nodes to obtain training aggregation attribute features; and train the initial classification model with the training aggregation attribute features, and stop training when the feature similarity between the diffusion users predicted by the initial classification model and the seed users reaches a similarity threshold.
In one embodiment, the training module is further configured to take a portion of the users from the seed user; taking the seed users after part of the users are taken out as positive samples, and taking the part of the users and the random users taken out as negative samples; carrying out prediction processing on training aggregation attribute characteristics corresponding to part of users through an initial classification model, and determining a classification threshold value according to the obtained prediction value; predicting the training aggregation attribute characteristics of the random user through an initial classification model to obtain a target negative sample; and training the initial classification model after prediction processing through the training aggregation attribute characteristics corresponding to the positive sample and the target negative sample.
In the above embodiment, the attribute feature training data corresponding to each sample user node and the corresponding neighbor user node in the attribute feature training set are aggregated to obtain the training aggregation attribute feature aggregated with the seed user and the random user, and then the training aggregation attribute feature is used to train the initial classification model, so that the obtained classification model can classify the diffusion users similar to the seed user in characteristics, and the diffusion users can be determined quickly and accurately by using the classification model.
For specific limitations of the network hotspot-based target user identification device, reference may be made to the above limitations of the network hotspot-based target user identification method, which are not described herein again. The modules in the network hotspot-based target user identification device can be wholly or partially implemented by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 12. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing the user attribute characteristics. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a network hotspot-based target user identification method.
Those skilled in the art will appreciate that the architecture shown in fig. 12 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, in which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
In one embodiment, a computer program product or computer program is provided that includes computer instructions stored in a computer-readable storage medium. The computer instructions are read by a processor of a computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the steps in the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods in the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but should not be construed as limiting the scope of the patent. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.