CN115169526B

CN115169526B - Base station representation learning method, system and storage medium based on deep learning

Info

Publication number: CN115169526B
Application number: CN202210548996.4A
Authority: CN
Inventors: 司俊俊; 李莉; 羊晋; 涂波
Original assignee: Hezhixin Shandong Big Data Technology Co ltd; Beijing Information Science and Technology University
Current assignee: Hezhixin Shandong Big Data Technology Co ltd; Beijing Information Science and Technology University
Priority date: 2022-05-20
Filing date: 2022-05-20
Publication date: 2023-08-01
Anticipated expiration: 2042-05-20
Also published as: CN115169526A

Abstract

The present invention provides a base station representation learning method, system and storage medium based on deep learning. The method includes: acquiring base station information, the base station information including unique identification code information of the base station, static feature data of the base station, and dynamic trajectory data related to the base station ; Determine the adjacency matrix between the base stations based on the acquired longitude, latitude and base station-related dynamic trajectory data, and construct the first base station relationship diagram based on the adjacency matrix; construct the first feature vector based on the static feature data of the base station and the relevant dynamic trajectory data of the base station set; construct the self-encoder neural network model, input the first base station relationship diagram and the first feature vector set to the encoder to obtain a base station representation vector set, and input the base station representation vector set to the decoder to obtain the reconstructed first Two base station relationship graphs and a second set of feature vectors. This method can efficiently realize the representation learning of the base station, so that the information of the base station can be better applied in various data mining processes.

Description

A base station representation learning method, system and storage medium based on deep learning

技术领域technical field

本发明涉及大数据挖掘技术领域，尤其涉及一种基于深度学习的基站表示学习方法、系统及存储介质。The present invention relates to the technical field of big data mining, in particular to a base station representation learning method, system and storage medium based on deep learning.

背景技术Background technique

基站是一种重要的移动通讯设施，移动通信用户通过基站获取服务，从而可以实现接打电话、收发短信、移动上网等需求。因此，基站在移动通信大数据中扮演着重要角色。例如，移动通信轨迹数据中，某一个轨迹点中的基站信息表示了用户在这一时刻所处的位置范围。The base station is an important mobile communication facility. Mobile communication users obtain services through the base station, so that they can meet the needs of making and receiving calls, sending and receiving short messages, and mobile Internet access. Therefore, base stations play an important role in mobile communication big data. For example, in the mobile communication trajectory data, the base station information in a certain trajectory point indicates the location range of the user at this moment.

随着深度学习技术的发展，近年来应用深度学习模型对移动通信轨迹数据进行挖掘成为主流。然而，现有方法中通常把基站标识码映射为基站的经纬度信息，再把经纬度数据映射为网格编码，利用网格编码嵌入技术使得移动通信轨迹数据可传递到神经网络模型中进行训练。首先，这种预处理方式无法实现端到端的移动通信数据挖掘；其次，由于基站制式、环境等因素的不同，不同基站所表示的区域范围也是不同的，基站本身的特征和网格并不一致；再而，相比网格，基站还具有运营企业归属信息，同一个地理范围内，可能有多家运营商的基站。而如何对基站进行表示学习，将其特征转换到低维空间向量，使得深度学习模型可以在各类挖掘任务中(如轨迹预测、异常基站检测)更好地利用基站信息，是亟待解决的技术问题。With the development of deep learning technology, the application of deep learning models to mine mobile communication trajectory data has become the mainstream in recent years. However, in the existing methods, the base station identification code is usually mapped to the latitude and longitude information of the base station, and then the latitude and longitude data are mapped to grid codes, and the grid code embedding technology is used to transfer the mobile communication trajectory data to the neural network model for training. First of all, this preprocessing method cannot realize end-to-end mobile communication data mining; secondly, due to different base station standards, environments and other factors, the area ranges represented by different base stations are also different, and the characteristics of the base station itself are not consistent with the grid; Furthermore, compared with the grid, the base station also has the attribution information of the operating company, and there may be base stations of multiple operators within the same geographical area. How to learn representations of base stations and transform their features into low-dimensional space vectors, so that deep learning models can make better use of base station information in various mining tasks (such as trajectory prediction, abnormal base station detection), is an urgent technology to be solved question.

发明内容Contents of the invention

有鉴于此，本发明提供了一种基于深度学习的基站表示学习方法、系统及存储介质，以解决现有技术中存在的一个或多个问题。In view of this, the present invention provides a base station representation learning method, system and storage medium based on deep learning, so as to solve one or more problems existing in the prior art.

根据本发明的一个方面，本发明公开了一种基于深度学习的基站表示学习方法，所述方法包括：According to one aspect of the present invention, the present invention discloses a base station representation learning method based on deep learning, the method comprising:

获取基站信息，所述基站信息包括基站的唯一标识码信息、基站静态特征数据以及基站相关动态轨迹数据，所述基站静态特征数据包括位置区域码、小区编码、经度、纬度、地址、所属运营商、方向角、以及特定范围内的POI信息，所述基站相关动态轨迹数据包括基站的被访问时间、用户在基站停留的时间以及基站的访问用户数量；Obtain base station information, the base station information includes the unique identification code information of the base station, base station static feature data and base station related dynamic trajectory data, the base station static feature data includes location area code, cell code, longitude, latitude, address, operator , direction angle, and POI information within a specific range, the dynamic track data related to the base station includes the time when the base station is visited, the time the user stays in the base station, and the number of visiting users of the base station;

基于获取到的基站的经度、纬度以及所述基站相关动态轨迹数据确定基站之间的邻接矩阵，基于所述邻接矩阵构建第一基站关系图；Determining an adjacency matrix between base stations based on the acquired longitude and latitude of the base station and the relevant dynamic trajectory data of the base station, and constructing a first base station relationship diagram based on the adjacency matrix;

基于所述基站静态特征数据以及所述基站相关动态轨迹数据构造第一特征向量集合；Constructing a first set of feature vectors based on the static feature data of the base station and the dynamic trajectory data related to the base station;

构建自编码器神经网络模型，将所述第一基站关系图及所述第一特征向量集合输入至编码器得到基站表示向量集合，将所述基站表示向量集合输入至解码器得到重构的第二基站关系图及第二特征向量集合。Constructing an autoencoder neural network model, inputting the first base station relationship graph and the first feature vector set to the encoder to obtain a base station representation vector set, and inputting the base station representation vector set to a decoder to obtain a reconstructed first Two base station relationship graphs and a second set of feature vectors.

在本发明的一些实施例中，所述基站的被访问时间包括基站被访问次数最多的时间段、基站被访问次数最少的时间段；In some embodiments of the present invention, the time when the base station is visited includes a time period when the base station is visited the most times and a time period when the base station is visited the least times;

所述用户在基站停留的时间包括用户在基站停留的时间平均值以及中位数值；和/或The time that the user stays in the base station includes the average value and median value of the time that the user stays in the base station; and/or

所述基站的访问用户数量包括基站的工作日和非工作日访问用户总数量、工作日和非工作日小时访问用户数平均数量、工作日和非工作日小时访问用户数最大数量、工作日和非工作日小时访问用户数中值以及工作日和非工作日小时访问用户数最小数量。The number of visiting users of the base station includes the total number of visiting users of the base station on working days and non-working days, the average number of visiting users per hour on working days and non-working days, the maximum number of visiting users on working days and non-working days, the number of visiting users on working days and The median number of visiting users during non-working days and the minimum number of visiting users during working days and non-working days.

在本发明的一些实施例中，所述邻接矩阵中的各元素为0或1；当第一基站与第二基站之间的距离小于预设距离，且所述第一基站与所述第二基站作为相邻轨迹点的总次数不小于预设次数时，邻接矩阵中对应的元素为1。In some embodiments of the present invention, each element in the adjacency matrix is 0 or 1; when the distance between the first base station and the second base station is less than a preset distance, and the first base station and the second base station When the total number of times that the base station is used as an adjacent track point is not less than the preset number, the corresponding element in the adjacency matrix is 1.

在本发明的一些实施例中，所述方法还包括：In some embodiments of the present invention, the method also includes:

构建损失函数；build loss function;

更新所述自编码器神经网络模型的权重参数。Updating the weight parameters of the autoencoder neural network model.

在本发明的一些实施例中，所述损失函数为：其中，V_i为第i个基站的第一特征向量，/>为第i个基站的第二特征向量，N为基站的总数量，h_i为第i个基站的表示向量，λ为超模型参数，M为第i个基站的邻居基站总数量，h_j为第i个基站的邻居基站j的表示向量。In some embodiments of the present invention, the loss function is: Among them, V _i is the first eigenvector of the i-th base station, /> is the second eigenvector of the i-th base station, N is the total number of base stations, h _i is the representation vector of the i-th base station, λ is a hypermodel parameter, M is the total number of neighboring base stations of the i-th base station, h _j is The representation vector of the neighbor base station j of the i-th base station.

在本发明的一些实施例中，所述解码器包括自注意力模块，所述自注意力模块的Q、K、V均为σ(W^kh^k-1)；其中，σ为非线性激活函数，可选择Relu函数；W^k为可训练的权重系数矩阵，大小为N×L，其中N是基站数量，L是h^k-1的大小，W^k初始值随机初始化为0～1之间的实数；h^k-1为第k-1层图神经网络的输出。In some embodiments of the present invention, the decoder includes a self-attention module, and Q, K, and V of the self-attention module are all σ(W ^k h ^k-1 ); wherein, σ is a nonlinear activation function, Relu function can be selected; W ^k is a trainable weight coefficient matrix with a size of N×L, where N is the number of base stations, L is the size of h ^k-1 , and the initial value of W ^k is randomly initialized between 0 and 1 The real number of; h ^k-1 is the output of the k-1th layer graph neural network.

在本发明的一些实施例中，所述编码器和所述解码器均包括三层图注意力神经网络层。In some embodiments of the present invention, both the encoder and the decoder include a three-layer graph attention neural network layer.

设置自编码器神经网络模型的迭代次数以及学习率；Set the number of iterations and learning rate of the autoencoder neural network model;

对所述自编码器神经网络模型进行训练。The autoencoder neural network model is trained.

根据本发明的另一方面，还公开了一种基于深度学习的基站表示学习系统，该系统包括处理器和存储器，所述存储器中存储有计算机指令，所述处理器用于执行所述存储器中存储的计算机指令，当所述计算机指令被处理器执行时该系统实现如上任一实施例所述方法的步骤。According to another aspect of the present invention, a base station representation learning system based on deep learning is also disclosed, the system includes a processor and a memory, computer instructions are stored in the memory, and the processor is used to execute the instructions stored in the memory. computer instructions, when the computer instructions are executed by the processor, the system implements the steps of the method described in any one of the above embodiments.

根据本发明的又一方面，还公开了一种计算机可读存储介质，其上存储有计算机程序，该程序被处理器执行时实现如上任一实施例所述方法的步骤。According to yet another aspect of the present invention, a computer-readable storage medium is also disclosed, on which a computer program is stored. When the program is executed by a processor, the steps of the method described in any one of the above embodiments are implemented.

本发明实施例所公开的基于深度学习的基站表示学习方法及系统，基于第一基站关系图及第一特征向量通过自编码器神经网络模型学习得到基站的表示向量，即将基站特征投影至低维空间向量，从而可高效率的实现基站的表示学习，从而可在各类数据挖掘过程中更好的应用基站信息。The base station representation learning method and system based on deep learning disclosed in the embodiment of the present invention obtains the representation vector of the base station through self-encoder neural network model learning based on the first base station relationship graph and the first feature vector, that is, projects the base station features to low-dimensional Space vector, so that the representation learning of the base station can be realized efficiently, so that the base station information can be better applied in various data mining processes.

本发明的附加优点、目的，以及特征将在下面的描述中将部分地加以阐述，且将对于本领域普通技术人员在研究下文后部分地变得明显，或者可以根据本发明的实践而获知。本发明的目的和其它优点可以通过在书面说明及其权利要求书以及附图中具体指出的结构实现到并获得。Additional advantages, objects, and features of the present invention will be set forth in part in the following description, and will be partly apparent to those of ordinary skill in the art after studying the following text, or can be learned from the practice of the present invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

本领域技术人员将会理解的是，能够用本发明实现的目的和优点不限于以上具体所述，并且根据以下详细说明将更清楚地理解本发明能够实现的上述和其他目的。It will be understood by those skilled in the art that the objects and advantages that can be achieved by the present invention are not limited to the above specific ones, and the above and other objects that can be achieved by the present invention will be more clearly understood from the following detailed description.

附图说明Description of drawings

此处所说明的附图用来提供对本发明的进一步理解，构成本申请的一部分，并不构成对本发明的限定。附图中的部件不是成比例绘制的，而只是为了示出本发明的原理。为了便于示出和描述本发明的一些部分，附图中对应部分可能被放大，即，相对于依据本发明实际制造的示例性装置中的其它部件可能变得更大。在附图中：The drawings described here are used to provide further understanding of the present invention, constitute a part of the application, and do not limit the present invention. The components in the figures are not drawn to scale, merely illustrating the principles of the invention. For ease of illustration and description of some parts of the present invention, corresponding parts in the figures may be exaggerated, ie, may be made larger relative to other components in an exemplary device actually manufactured in accordance with the present invention. In the attached picture:

图1为本发明一实施例的基于深度学习的基站表示学习方法的流程示意图。FIG. 1 is a schematic flowchart of a base station representation learning method based on deep learning according to an embodiment of the present invention.

图2为本发明一实施例的基于深度学习的基站表示学习模型的结构示意图。FIG. 2 is a schematic structural diagram of a base station representation learning model based on deep learning according to an embodiment of the present invention.

图3为本发明一实施例的基站表示学习模型的编码器的数据更新过程示意图。FIG. 3 is a schematic diagram of a data update process of an encoder of a base station representation learning model according to an embodiment of the present invention.

具体实施方式Detailed ways

为使本发明实施例的目的、技术方案和优点更加清楚明白，下面结合附图对本发明实施例做进一步详细说明。在此，本发明的示意性实施例及其说明用于解释本发明，但并不作为对本发明的限定。In order to make the purpose, technical solutions and advantages of the embodiments of the present invention more clear, the embodiments of the present invention will be further described in detail below in conjunction with the accompanying drawings. Here, the exemplary embodiments and descriptions of the present invention are used to explain the present invention, but not to limit the present invention.

在此，需要说明的是，为了避免因不必要的细节而模糊了本发明，在附图中仅仅示出了与根据本发明的方案密切相关的结构和/或处理步骤，而省略了与本发明关系不大的其他细节。Here, it should be noted that, in order to avoid obscuring the present invention due to unnecessary details, only the structures and/or processing steps that are closely related to the solution according to the present invention are shown in the drawings, while those related to the present invention are omitted. Invent other details that don't really matter.

应该强调，术语“包括/包含/具有”在本文使用时指特征、要素、步骤或组件的存在，但并不排除一个或更多个其它特征、要素、步骤或组件的存在或附加。It should be emphasized that the term "comprises/comprises/has" when used herein refers to the presence of a feature, element, step or component, but does not exclude the presence or addition of one or more other features, elements, steps or components.

为了较好的对基站进行表示学习，将基站特征转换为低维空间向量，从而使得深度学习模型可以在各类挖掘任务中更好的利用基站信息，本发明提供了一种基于深度学习的基站表示学习方法、系统及存储介质。In order to perform better representation learning on the base station and convert the features of the base station into low-dimensional space vectors, so that the deep learning model can better use the information of the base station in various mining tasks, the present invention provides a base station based on deep learning Represents a learning method, system and storage medium.

在下文中，将参考附图描述本发明的实施例。在附图中，相同的附图标记代表相同或类似的部件，或者相同或类似的步骤。Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. In the drawings, the same reference numerals represent the same or similar components, or the same or similar steps.

图1为本发明一实施例的基于深度学习的基站表示学习方法的流程示意图，如图1所示，该基站表示学习方法至少包括步骤S10至S40。FIG. 1 is a schematic flow chart of a base station representation learning method based on deep learning according to an embodiment of the present invention. As shown in FIG. 1 , the base station representation learning method includes at least steps S10 to S40.

步骤S10：获取基站信息，所述基站信息包括基站的唯一标识码信息、基站静态特征数据以及基站相关动态轨迹数据，所述基站静态特征数据包括位置区域码、小区编码、经度、纬度、地址、所属运营商、方向角、以及特定范围内的POI信息，所述基站相关动态轨迹数据包括基站的被访问时间、用户在基站停留的时间以及基站的访问用户数量。Step S10: Obtain base station information, the base station information includes the unique identification code information of the base station, base station static characteristic data and base station related dynamic trajectory data, the base station static characteristic data includes location area code, cell code, longitude, latitude, address, The operator, direction angle, and POI information within a specific range. The dynamic trajectory data related to the base station includes the time when the base station is visited, the time users stay in the base station, and the number of users visiting the base station.

基站信息具体的可被预先保存在基站信息表或相应数据库中，则具体的从基站信息表或相应数据库中即可获取到需要进行表示学习的基站的信息，唯一标识码信息指基站的ID信息。示例性的，第一个基站至第N个基站可分别记为BS₁,BS₂…BS_n，BS₁表示基站1的唯一识别码；对应的，基站1至基站n的基站静态特征数据可分别记为e₁,e₂…e_n；而基站1至基站n的基站相关动态轨迹数据可分别记为f₁,f₂…f_n。The base station information can be pre-saved in the base station information table or the corresponding database. Specifically, the information of the base station that needs to be learned can be obtained from the base station information table or the corresponding database. The unique identification code information refers to the ID information of the base station . Exemplarily, the first base station to the Nth base station can be respectively recorded as BS ₁ , BS ₂ ... BS _n , and BS ₁ represents the unique identification code of the base station 1; correspondingly, the base station static feature data of the base station 1 to the base station n can be _They are _denoted as _e ₁ , _e ₂ .

在该步骤中，位置区域码(LAC,Location Area Code)用于标识不同的位置区，一个位置区可以包含一个或多个小区，位置区域码包含于位置区识别码(LAI)中。小区编码(CI，Cell Identifier)作为小区的唯一识别码，小区编码(CI)与位置区识别码(LAI)结合用于识别网络中每个基站覆盖的小区。地址是指基站的具体的位置，如XX市XX区XX街道XX号。基站的所属运营商如移动、联通或电信等，在该实施例中的基站信息表中，各运营商可通过相应的编码表示。基站的方向角通过基站中天线的方位决定，特定范围内的POI信息是指以基站为中心，基站周边的某一具体区域范围内的POI的具体信息，该实施例中的POI具体的可为商店、医院、学校、加油站等。应当理解的，特定范围可根据实际需要进行设定，例如在一实施例中，可将特定范围限定为该基站周边的以k千米为半径的圆形区域。In this step, the location area code (LAC, Location Area Code) is used to identify different location areas, one location area may contain one or more cells, and the location area code is included in the location area identification code (LAI). The cell code (CI, Cell Identifier) is used as the unique identification code of the cell, and the cell code (CI) is combined with the location area identifier (LAI) to identify the cell covered by each base station in the network. The address refers to the specific location of the base station, such as No. XX, XX Street, XX District, XX City. The operator of the base station is such as China Mobile, China Unicom or China Telecom. In the base station information table in this embodiment, each operator can be represented by a corresponding code. The direction angle of the base station is determined by the orientation of the antenna in the base station. The POI information in a specific range refers to the specific information of the POI in a specific area around the base station with the base station as the center. The POI in this embodiment can specifically be Stores, hospitals, schools, gas stations, etc. It should be understood that the specific range may be set according to actual needs. For example, in an embodiment, the specific range may be limited to a circular area around the base station with a radius of k kilometers.

基站的被访问时间可包括基站被访问次数最多的时间段、基站被访问次数最少的时间段。此时，可先获取该基站的被访问数据，例如，获取各基站在一天内被访问的次数、各次被访问的具体时间等；进而统计各基站在一天内的各时间区间(时间段)的被访问次数；时间区间例如可为小时，则此时统计各基站在一天内的各小时所被访问的次数。在统计完该基站一天内的各小时被访问的总次数之后，则进一步的可确定该基站在一天内被访问次数最多和最少的时间段，基站被访问次数最多的时间段也可理解为在该时间段内基站被访问最频繁。The visited time of the base station may include a time period when the base station is visited the most times and a time period when the base station is visited the least times. At this time, the visited data of the base station can be obtained first, for example, the number of times each base station is visited within a day, the specific time of each visit, etc.; and then the statistics of each time interval (time period) of each base station within a day The times of visits; the time interval can be hours, for example, and at this time, the times of visits of each base station in each hour of a day are counted. After counting the total number of times that the base station is visited in each hour of the day, it is further possible to determine the time period when the base station is visited the most and the least in a day, and the time period when the base station is visited the most can also be understood as The base station is visited most frequently during this time period.

用户在基站停留的时间具体的可包括用户在基站停留的时间平均值以及中位数值。在确定用户在基站停留的时间平均值以及中位数值之前，可先获取预设时间段(如一天)内各用户在该基站停留的总时间以及该基站在预设时间段内被访问的总用户数量，进而基于各用户在该基站停留的总时间之和及总用户数量计算用户在基站停留的时间平均值以及中位数值。The time that the user stays in the base station may specifically include an average value and a median value of the time that the user stays in the base station. Before determining the average value and median value of the time that users stay in the base station, the total time that each user stays in the base station within a preset time period (such as one day) and the total number of times that the base station is visited within a preset time period can be obtained first. The number of users, and then based on the sum of the total time that each user stays in the base station and the total number of users, the average and median values of the time that users stay in the base station are calculated.

基站的访问用户数量包括基站的工作日和非工作日访问用户总数量、工作日和非工作日小时访问用户数平均数量、工作日和非工作日小时访问用户数最大数量、工作日和非工作日小时访问用户数中值以及工作日和非工作日小时访问用户数最小数量。在该实施例中，基站的访问用户数量包括工作日的和非工作日的，是因为考虑到基站所覆盖的办公场所与娱乐场所的不同对基站特征产生一定的影响，例如工作日时覆盖办公场所较多的基站工作日的访问人数会大于非工作日的访问人数，而不难理解的，覆盖娱乐场所较多的基站的状态恰恰相反。因而同时获取工作日和非工作日的访问用户的相关信息则可更好的提取基站的语义特征。示例性的，无论是工作日还是非工作日，可先获取一天内每小时访问该基站的用户数，进而统计一天内访问该基站的总用户数，最终比较该天内各小时内访问该基站的用户数量大小即可确定该天内访问该基站的用户数最多及最少的时间段以及各时间段的具体用户数量，并基于统计得到的该天内访问该基站的总用户数则可计算得到该天内的小时访问用户数平均数量。The number of visiting users of the base station includes the total number of visiting users of the base station on working days and non-working days, the average number of visiting users per hour on working days and non-working days, the maximum number of visiting users on working days and non-working days, the maximum number of visiting users on working days and non-working days The median number of accessing users per hour per day and the minimum number of accessing users per hour on working days and non-working days. In this embodiment, the number of visiting users of the base station includes those on working days and those on non-working days, because it is considered that the difference between the office space and the entertainment place covered by the base station has a certain impact on the characteristics of the base station, for example, the coverage of office space on weekdays The number of visitors to base stations with more places on weekdays will be greater than that on non-workdays. It is not difficult to understand that the status of base stations with more entertainment places is just the opposite. Therefore, obtaining the relevant information of visiting users on working days and non-working days at the same time can better extract the semantic features of the base station. Exemplarily, whether it is a working day or a non-working day, the number of users who visit the base station every hour in a day can be obtained first, and then the total number of users who visit the base station in a day can be counted, and finally the number of users who visit the base station in each hour of the day can be compared The number of users can determine the time period with the largest and least number of users accessing the base station in the day and the specific number of users in each time period, and based on the total number of users accessing the base station in the day obtained from statistics, the number of users in the day can be calculated. The average number of hourly visiting users.

步骤S20：基于获取到的基站的经度、纬度以及所述基站相关动态轨迹数据确定基站之间的邻接矩阵，基于所述邻接矩阵构建第一基站关系图。Step S20: Determine an adjacency matrix between base stations based on the obtained longitude and latitude of the base stations and the dynamic trajectory data related to the base stations, and construct a first base station relationship graph based on the adjacency matrix.

在该步骤中，是为了进一步的确定邻接矩阵，基站的邻接矩阵可记为M，矩阵M中的各元素可为0或1。具体的，当第一基站与第二基站之间的距离小于预设距离，且所述第一基站与所述第二基站作为相邻轨迹点的总次数不小于预设次数时，邻接矩阵中对应的元素为1，除此之外，邻接矩阵M中的元素均记为0。示例性的，当第三基站与第四基站之间的距离大于预设距离，且第三基站与第四基站作为相邻轨迹点的总次数不小于预设次数时；或当第三基站与第四基站之间的距离大于预设距离，且第三基站与第四基站作为相邻轨迹点的次数小于预设次数时；邻接矩阵M中相应的元素均记为0。应当理解的，预设距离以及预设次数均可根据实际需要进行限定；在一实施例中，预设距离可限定为3千米，而预设次数可设为20次。In this step, in order to further determine the adjacency matrix, the adjacency matrix of the base station can be recorded as M, and each element in the matrix M can be 0 or 1. Specifically, when the distance between the first base station and the second base station is less than the preset distance, and the total number of times the first base station and the second base station are adjacent trajectory points is not less than the preset number, the adjacency matrix The corresponding element is 1, otherwise, the elements in the adjacency matrix M are recorded as 0. Exemplarily, when the distance between the third base station and the fourth base station is greater than the preset distance, and the total number of times the third base station and the fourth base station are adjacent track points is not less than the preset number; or when the third base station and the fourth base station When the distance between the fourth base stations is greater than the preset distance, and the number of times the third base station and the fourth base station are adjacent track points is less than the preset number of times; the corresponding elements in the adjacency matrix M are all recorded as 0. It should be understood that the preset distance and the preset times can be limited according to actual needs; in one embodiment, the preset distance can be limited to 3 kilometers, and the preset times can be set to 20 times.

由于邻接矩阵中的各元素反映基站的相邻基站以及基站与相邻基站之间的距离，因而在该步骤中以基站为顶点，基站与相邻基站之间的距离为边构建第一基站关系图。Since each element in the adjacency matrix reflects the adjacent base stations of the base station and the distance between the base station and the adjacent base station, in this step, the base station is used as the vertex, and the distance between the base station and the adjacent base station is used as the side to construct the first base station relationship picture.

步骤S30：基于所述基站静态特征数据以及所述基站相关动态轨迹数据构造第一特征向量集合。Step S30: Construct a first set of feature vectors based on the static feature data of the base station and the dynamic track data related to the base station.

在该步骤中，可首先确定各基站的特征向量，进而基于确定的各基站的特征向量建立第一特征向量结合。基站的特征向量组成可包括：区域码、小区编码、经度、纬度、地址、所属运营商、方向角、特定范围内的POI信息、基站被访问次数最多的时间段、基站被访问次数最少的时间段、用户在基站停留的时间平均值及中位数值、基站的工作日和非工作日访问用户数总数量、工作日和非工作日小时访问用户数平均数量、工作日和非工作日小时访问用户数最大数量、工作日和非工作日小时访问用户数中值以及工作日和非工作日小时访问用户数最小数量等。其中，特定范围内的POI信息具体的可为基站周边预设范围内的各类型POI的总数量，如购物场所的数量、娱乐场所的数量、餐饮场所的数量等；并且在该实施例中，基站特征向量除了包括上述所列举的内容之外，也可包括更多或更少个内容。In this step, the eigenvectors of each base station may be determined first, and then the first combination of eigenvectors is established based on the determined eigenvectors of each base station. The feature vector composition of the base station can include: area code, cell code, longitude, latitude, address, operator, direction angle, POI information within a specific range, time period when the base station is visited the most times, and time when the base station is visited the least times segment, the average and median value of the time that users stay in the base station, the total number of visiting users of the base station on working days and non-working days, the average number of visiting users on working days and non-working days, and the hourly visits on working days and non-working days The maximum number of users, the median number of visiting users per hour on working days and non-working days, and the minimum number of visiting users per hour on working days and non-working days. Wherein, the POI information within a specific range may specifically be the total number of POIs of various types within the preset range around the base station, such as the number of shopping places, the number of entertainment places, the number of dining places, etc.; and in this embodiment, In addition to the content listed above, the base station feature vector may also include more or less content.

步骤S40：构建自编码器神经网络模型，将所述第一基站关系图及所述第一特征向量集合输入至编码器得到基站表示向量集合，将所述基站表示向量集合输入至解码器得到重构的第二基站关系图及第二特征向量集合。Step S40: Construct a self-encoder neural network model, input the first base station relationship graph and the first feature vector set to the encoder to obtain a base station representation vector set, and input the base station representation vector set to a decoder to obtain a re- Structured second base station relationship graph and second feature vector set.

该步骤是为了构建自编码器神经网络模型以对基站进行表示学习，自编码器神经网络模型包括编码器模块和解码器模块，编码器模块的输入为第一基站关系图及第一特征向量集合，而编码器的输出为表示学习得到的向量集合(基站表示向量集合)；解码器的输入为编码器输出的基站表示向量集合，而解码器输出的为重构的第二基站关系图及第二特征向量集合。This step is to construct the self-encoder neural network model to learn the representation of the base station. The self-encoder neural network model includes an encoder module and a decoder module. The input of the encoder module is the first base station relationship graph and the first feature vector set , and the output of the encoder is the set of vectors obtained from representation learning (base station representation vector set); the input of the decoder is the set of base station representation vectors output by the encoder, and the output of the decoder is the reconstructed second base station relationship diagram and the first A collection of two eigenvectors.

另外，为了得到较理想的自编码器神经网络模型，则进一步的对自编码器神经网络模型进行迭代训练。具体的，基站表示学习方法还包括以下步骤：构建损失函数；更新所述自编码器神经网络模型的权重参数。损失函数用来估量所构建的自编码器神经网络模型学习到的真实值与预测值之间的不一致程度，损失函数越小，则模型的鲁棒性就越高。具体的，在构建自编码器神经网络模型时，还需设置相关超参数，如设置自编码器神经网络模型的迭代次数以及学习率；在设置了自编码器神经网络模型的迭代次数以及学习率之后，还可基于设置的迭代次数以及学习率对自编码器神经网络模型进行训练。基站表示学习模型的编码器的数据更新过程具体如图3所示。In addition, in order to obtain a more ideal autoencoder neural network model, iterative training is further performed on the autoencoder neural network model. Specifically, the base station representation learning method further includes the following steps: constructing a loss function; and updating weight parameters of the autoencoder neural network model. The loss function is used to measure the degree of inconsistency between the real value and the predicted value learned by the self-encoder neural network model. The smaller the loss function, the higher the robustness of the model. Specifically, when building an autoencoder neural network model, it is also necessary to set relevant hyperparameters, such as setting the number of iterations and learning rate of the autoencoder neural network model; after setting the number of iterations and learning rate of the autoencoder neural network model Afterwards, the autoencoder neural network model can also be trained based on the set number of iterations and learning rate. The data update process of the encoder of the base station representation learning model is shown in Figure 3.

在本发明中，自编码器神经网络模型的训练以及优化目标为：基站特征向量重构损失代价最小，及基站与其邻居基站的标准向量相似度最大化。则示例性的，所构建的损失函数为其中，V_i为第i个基站的第一特征向量，/>为第i个基站的第二特征向量，N为基站的总数量，h_i为第i个基站的表示向量，λ为超模型参数，M为第i个基站的邻居基站总数量，h_j为第i个基站的邻居基站j的表示向量。In the present invention, the training and optimization objectives of the autoencoder neural network model are: the cost of reconstructing the base station feature vector is minimized, and the similarity of the standard vectors between the base station and its neighboring base stations is maximized. As an example, the constructed loss function is Among them, V _i is the first eigenvector of the i-th base station, /> is the second eigenvector of the i-th base station, N is the total number of base stations, h _i is the representation vector of the i-th base station, λ is a hypermodel parameter, M is the total number of neighboring base stations of the i-th base station, h _j is The representation vector of the neighbor base station j of the i-th base station.

在本发明的一实施例中，解码器包括自注意力模块，所述自注意力模块的Q(查询)、K(键)、V(值)均为σ(W^kh^k-1)；其中，σ为非线性激活函数，可选择Relu函数；W^k为可训练的权重系数矩阵，大小为N×L，其中N是基站数量，L是h^k-1的大小，W^k初始值随机初始化为0～1之间的实数；h^k-1为第k-1层图神经网络的输出。在该实施例中，编码器的和解码器均可包括三层图注意力神经网络层，则此时k可取值1,2,3。可以理解的，编码器和解码器中均包括三层图注意力神经网络层仅是一种较优示例，可根据具体应用场景进行改变。In an embodiment of the present invention, the decoder includes a self-attention module, and Q (query), K (key), and V (value) of the self-attention module are all σ(W ^k h ^k-1 ); Among them, σ is a nonlinear activation function, and the Relu function can be selected; W ^k is a trainable weight coefficient matrix with a size of N×L, where N is the number of base stations, L is the size of h ^k-1 , and the initial value of W ^k is random Initialized as a real number between 0 and 1; h ^k-1 is the output of the k-1th layer graph neural network. In this embodiment, both the encoder and the decoder can include a three-layer graph attention neural network layer, and k can take values of 1, 2, and 3 at this time. It can be understood that including the three-layer graph attention neural network layer in both the encoder and the decoder is only a preferred example, which can be changed according to specific application scenarios.

图2为本发明一实施例的基于深度学习的基站表示学习模型的结构示意图，如图2所示，首先构建基站原始关系图，将基站原始关系图及对应的基站特征向量集合输入至编码器得到基站表征向量，进而将基站表征向量输入至解码器得到基站重构关系图以及对应的基站重构特征向量集合。Fig. 2 is a schematic structural diagram of a base station representation learning model based on deep learning according to an embodiment of the present invention. As shown in Fig. 2 , the original relationship diagram of the base station is constructed first, and the original relationship diagram of the base station and the corresponding base station feature vector set are input to the encoder A base station characterization vector is obtained, and then the base station characterization vector is input to a decoder to obtain a base station reconstruction relationship graph and a corresponding base station reconstruction feature vector set.

相应的，本发明还公开了一种基于深度学习的基站表示学习系统，该系统包括处理器和存储器，所述存储器中存储有计算机指令，所述处理器用于执行所述存储器中存储的计算机指令，当所述计算机指令被处理器执行时该系统实现如上任一实施例所述方法的步骤。Correspondingly, the present invention also discloses a base station representation learning system based on deep learning, the system includes a processor and a memory, the memory stores computer instructions, and the processor is used to execute the computer instructions stored in the memory , when the computer instructions are executed by the processor, the system implements the steps of the method described in any one of the above embodiments.

示例性的，基站表示学习系统具体的执行如下步骤：基站数据获取、基站关系图构建、图神经网络构建、基站表示学习。具体的，基站数据获取包括基站编码标识基础数据获取、基站静态特征数据获取和基站动态特征数据获取；基站静态特征数据包括基站地理坐标经纬度数据、基站所属运营商、基站方向角、基站覆盖范围等信息；基站静态数据可从文本文件、数据库中获取到。基站动态特征数据包括基站相关动态轨迹数据，基站相关动态轨迹数据可从数据库、消息队列、文本文件中获取到。Exemplarily, the base station representation learning system specifically performs the following steps: base station data acquisition, base station relationship graph construction, graph neural network construction, and base station representation learning. Specifically, base station data acquisition includes base station code identification basic data acquisition, base station static feature data acquisition, and base station dynamic feature data acquisition; base station static feature data includes base station geographic coordinates, longitude and latitude data, base station operator, base station direction angle, base station coverage, etc. Information; base station static data can be obtained from text files and databases. The dynamic feature data of the base station includes the dynamic trajectory data related to the base station, and the dynamic trajectory data related to the base station can be obtained from databases, message queues, and text files.

基站关系图构建是根据基站数据构造基站关系图，基站关系图中的每个顶点代表一个基站，具有相关性的基站连接一条边。基站相关性可以取决于基站之间的距离以及基站之间的动态切换关系。The construction of the base station relationship graph is to construct the base station relationship graph based on the base station data. Each vertex in the base station relationship graph represents a base station, and the base stations with correlation are connected with an edge. The base station correlation may depend on the distance between the base stations and the dynamic switching relationship between the base stations.

图神经网络构建是为了构建自监督图神经网络模型，包括编码器和解码器，编码器用于从基站关系图中学习得到基站表示向量，解码器用于从学习得到的基站表示向量对基站关系图数据进行重构，通过最优化重构损失进行自监督学习。The construction of the graph neural network is to build a self-supervised graph neural network model, including an encoder and a decoder. The encoder is used to learn the base station representation vector from the base station relationship graph, and the decoder is used to compare the base station graph data from the learned base station representation vector. Perform reconstruction and perform self-supervised learning by optimizing the reconstruction loss.

基站表示学习具体是基于基站关系图和基站数据，设置模型超参数，对构建的图神经网络模型进行训练，学习得到基站表示向量。The base station representation learning is based on the base station relationship graph and base station data, setting model hyperparameters, training the constructed graph neural network model, and learning the base station representation vector.

下面结合一个具体实施例对上述方法进行说明，然而，值得注意的是，该具体实施例仅是为了更好地说明本申请，并不构成对本申请的不当限定。The above method will be described below in conjunction with a specific example. However, it should be noted that this specific example is only for better illustrating the present application, and does not constitute an improper limitation to the present application.

在该实施例中，首先获取基站集合D，D＝{BS₁,BS₂,…,BS_n}，其中BS_i表示基站i的唯一识别码；n表示基站的总数量。进而获取基站静态特征数据E，E＝{(BS₁,e₁),(BS₂,e₂)…(BS_n,e_n)}，其中e_i表示BS_i对应的基站i的静态特征向量，通常包括位置区域码(LAC,Location Area Code)、小区编码(CI，Cell Identifier)、经度、维度、地址、所属运营商、方向角、覆盖范围等信息。另外根据基站经纬度信息，利用百度地图服务接口可以获取以基站为圆心，以k千米为半径的圆形区域内的POI(Point Of Interest)信息。具体的K可取3，则此时获取到的为以基站为中心，以3千米为半径的圆形区域内的POI信息。In this embodiment, a set of base stations D is obtained first, D={BS ₁ , BS ₂ ,...,BS _n }, where BS _i represents the unique identification code of base station i; n represents the total number of base stations. Then obtain the static feature data E of the base station, E={(BS ₁ ,e ₁ ),(BS ₂ ,e ₂ )...(BS _n ,e _n )}, where e _i represents the static feature vector of base station i corresponding to BS _i , usually including location area code (LAC, Location Area Code), cell code (CI, Cell Identifier), longitude, latitude, address, operator, azimuth, coverage and other information. In addition, according to the latitude and longitude information of the base station, the Baidu map service interface can be used to obtain POI (Point Of Interest) information in a circular area with the base station as the center and a radius of k kilometers. The specific value of K can be 3, then what is obtained at this time is the POI information in a circular area with the base station as the center and a radius of 3 kilometers.

当获取到基站的静态特征数据之后，进一步获取基站相关动态轨迹数据。进一步的基于基站相关动态轨迹数据可得到基站动态特征数据，基站动态特征数据记为F，则F＝{(BS₁,f₁),(BS₂,f₂),…,(BS_n,f_n)}，其中f_i表示BS_i对应的基站i的动态特征向量。基站动态特征向量通常包括访问时间、停留时间、访问用户数等信息。在本实施例中，访问时间包括：该基站被访问最频繁的小时、该基站被访问最不频繁的小时；停留时间包括：用户在该基站的停留时间平均值、中位数值；访问用户数包括：该基站工作日和非工作日的访问用户总数、工作日和非工作日小时访问用户数平均值、工作日和非工作日小时访问用户数最大值、工作日和非工作日小时访问用户数中值以及工作日和非工作日小时访问用户数最小值等。After the static feature data of the base station is obtained, the dynamic trajectory data related to the base station is further obtained. Further base station dynamic characteristic data can be obtained based on the relevant dynamic trajectory data of the base station, and the dynamic characteristic data of the base station is recorded as F, then F={(BS ₁ ,f ₁ ),(BS ₂ ,f ₂ ),...,(BS _n ,f _n )}, where f _i represents the dynamic feature vector of base station i corresponding to BS _i . The base station dynamic feature vector usually includes information such as access time, dwell time, and number of access users. In this embodiment, the visit time includes: the hour when the base station is most frequently visited, and the hour when the base station is least frequently visited; the dwell time includes: the average value and median value of the user's stay time in the base station; the number of visiting users Including: the total number of visiting users of the base station on working days and non-working days, the average number of visiting users per hour on working days and non-working days, the maximum number of visiting users on working days and non-working days, and the hourly visiting users on working days and non-working days The median value of the number and the minimum number of access users per hour on weekdays and non-workdays, etc.

进一步的基于在上述步骤中获取到的基站的经纬度数据和基站动态特征数据计算得到基站之间的邻接矩阵M，M_i，j＝1表示BS_i对应的基站i到BS_j对应的基站j的距离小于k千米，并且基站BS_i与基站BS_j在轨迹数据中出现在同一条轨迹相邻轨迹点中的次数不低于n次；否则M_i，j＝0；本实施实例中k、n取值分别为3、20。进一步的基于在该步骤中得到的邻接矩阵M，构造第一基站关系图G。Further, the adjacency matrix M between the base stations is calculated based on the latitude and longitude data of the base stations obtained in the above steps and the dynamic characteristic data of the base stations, and M _{i, j} = 1 represents the base station i corresponding to BS _i to the base station _j corresponding to BS j The distance is less than k kilometers, and the number of times that base station BS _i and base station BS _j appear in the adjacent track points of the same track in the track data is not less than n times; otherwise M _{i, j} =0; in this implementation example, k, The values of n are 3 and 20 respectively. Further, based on the adjacency matrix M obtained in this step, a first base station relationship graph G is constructed.

同时基于获取到的基站静态特征数据以及所述基站相关动态轨迹数据构造基站的第一特征向量集合V，一个基站的第一特征向量组成包括：基站所属运营商编码；基站位置区域码；以基站为圆心，以k千米为半径的圆形区域内的各类型POI的数量；基站被访问最频繁的小时；基站被访问最不频繁的小时；用户在该基站的停留时间平均值、中位数值；基站工作日和非工作日访问用户总数、工作日和非工作日小时访问用户数平均值、工作日和非工作日小时访问用户数最大值、工作日和非工作日小时访问用户数中值以及工作日和非工作日小时访问用户数最小值。At the same time, based on the obtained base station static characteristic data and the relevant dynamic trajectory data of the base station, the first characteristic vector set V of the base station is constructed. The first characteristic vector set V of a base station includes: the code of the operator to which the base station belongs; the base station location area code; is the center of the circle, the number of various types of POIs in a circular area with a radius of k kilometers; the hour when the base station is most frequently visited; the hour when the base station is least frequently visited; the average and median stay time of users in the base station Numerical value; the total number of visiting users of the base station on working days and non-working days, the average number of visiting users per hour on working days and non-working days, the maximum number of visiting users on working days and non-working days, the number of visiting users per hour on working days and non-working days value and the minimum number of access users per hour on weekdays and non-workdays.

基于确定的第一基站关系图G和第一特征向量V(V∈R^N*d，N代表基站数量，d代表基站的第一特征向量的维度)，构建图注意力自编码器神经网络模型。模型包括编码器和解码器两部分；其中编码器的输入是第一基站关系图G以及图G中顶点(即基站)的第一特征向量集合V，编码器的输出是基站表示学习得到的向量集合h(h∈R^N*d′,N是基站数量，d′是学习到的基站表征向量维度，本实施例中取值为128)。解码器的输入是基站表示学习得到的向量集合h，输出是重构的第二基站关系图和第二特征向量集合/>模型的训练目标有两个，一是基站特征向量重构损失代价最小，/>另一个是基站与其邻居基站的标准向量相似度最大化，/>总的损失函数表示为：其中N是基站总数量，h_i是第i个基站的表示向量，M是第i个基站的邻居基站总数量，h_j为第i个基站的邻居基站j的表示向量，λ是模型超参数，本实施实例中λ取0.5。Based on the determined first base station relationship graph G and the first feature vector V (V∈R ^N*d , N represents the number of base stations, and d represents the dimension of the first feature vector of the base station), construct a graph attention autoencoder neural network model . The model includes an encoder and a decoder. The input of the encoder is the first base station relationship graph G and the first feature vector set V of the vertices (ie base stations) in the graph G, and the output of the encoder is the vector obtained by base station representation learning. Set h (h∈R ^N*d′ , N is the number of base stations, d′ is the dimension of the learned base station representation vector, and the value is 128 in this embodiment). The input of the decoder is the vector set h obtained by base station representation learning, and the output is the reconstructed second base station relationship graph and the second set of eigenvectors /> There are two training objectives of the model, one is to minimize the loss cost of base station feature vector reconstruction, /> The other is to maximize the standard vector similarity between the base station and its neighbor base stations, /> The overall loss function is expressed as: where N is the total number of base stations, h _i is the representation vector of the i-th base station, M is the total number of neighbor base stations of the i-th base station, _hj is the representation vector of the i-th base station’s neighbor base station j, and λ is the model hyperparameter , λ is 0.5 in this implementation example.

在该实施例中，编码器和解码器层数均设为3，且第k-1层图神经网络的信息通过自注意力机制传递给第k层。具体地，注意力中的查询Q、键K、值V分别设置为：Q^k,K^k,V^k＝σ(W^kh^k-1)，基站i与其他基站的权重为：基站i的表示向量更新为：其中，Nei(i)表示基站i的邻居基站集合，j代表基站i的邻居基站。In this embodiment, the number of layers of the encoder and the decoder is set to 3, and the information of the graph neural network of the k-1th layer is passed to the kth layer through the self-attention mechanism. Specifically, the query Q, key K, and value V in attention are set as: Q ^k , K ^k , V ^k = σ(W ^k h ^k-1 ), and the weights of base station i and other base stations are: The representation vector of base station i is updated as: Among them, Nei(i) represents the set of neighbor base stations of base station i, and j represents the neighbor base stations of base station i.

进一步的，设置相关超参数，以对上述图自注意力编码器神经网络模型进行训练，其中，基站表示向量维度设为128，即隐藏层大小为128，编解码器层数设置为3，模型训练迭代次数设为100，学习率设为10^-3。Further, set relevant hyperparameters to train the above-mentioned graph self-attention encoder neural network model, where the base station representation vector dimension is set to 128, that is, the hidden layer size is 128, the number of codec layers is set to 3, and the model The number of training iterations is set to 100, and the learning rate is set to 10 ^-3 .

通过上述实施例可以发现，本发明的基于深度学习的基站表示学习方法及系统，基于第一基站关系图及第一特征向量通过自编码器神经网络模型学习得到基站的表示向量，即将基站特征投影至低维空间向量，从而可高效率的实现基站的表示学习，从而可在各类数据挖掘过程中更好的应用基站信息。Through the above-mentioned embodiments, it can be found that the deep learning-based base station representation learning method and system of the present invention, based on the first base station relationship graph and the first feature vector, obtains the base station representation vector through self-encoder neural network model learning, that is, the base station feature projection To a low-dimensional space vector, so that the representation learning of the base station can be realized efficiently, so that the base station information can be better applied in various data mining processes.

另外，该发明还公开了一种计算机可读存储介质，其上存储有计算机程序，该程序被处理器执行时实现如上任一实施例所述方法的步骤。In addition, the invention also discloses a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the steps of the method described in any one of the above embodiments are realized.

本领域普通技术人员应该可以明白，结合本文中所公开的实施方式描述的各示例性的组成部分、系统和方法，能够以硬件、软件或者二者的结合来实现。具体究竟以硬件还是软件方式来执行，取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能，但是这种实现不应认为超出本发明的范围。当以硬件方式实现时，其可以例如是电子电路、专用集成电路(ASIC)、适当的固件、插件、功能卡等等。当以软件方式实现时，本发明的元素是被用于执行所需任务的程序或者代码段。程序或者代码段可以存储在机器可读介质中，或者通过载波中携带的数据信号在传输介质或者通信链路上传送。“机器可读介质”可以包括能够存储或传输信息的任何介质。机器可读介质的例子包括电子电路、半导体存储器设备、ROM、闪存、可擦除ROM(EROM)、软盘、CD-ROM、光盘、硬盘、光纤介质、射频(RF)链路，等等。代码段可以经由诸如因特网、内联网等的计算机网络被下载。Those of ordinary skill in the art should understand that each exemplary component, system and method described in conjunction with the embodiments disclosed herein can be implemented by hardware, software or a combination of the two. Whether it is implemented in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each specific application, but such implementation should not be regarded as exceeding the scope of the present invention. When implemented in hardware, it may be, for example, an electronic circuit, an application specific integrated circuit (ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the invention are the programs or code segments employed to perform the required tasks. Programs or code segments can be stored in machine-readable media, or transmitted over transmission media or communication links by data signals carried in carrier waves. "Machine-readable medium" may include any medium that can store or transmit information. Examples of machine-readable media include electronic circuits, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, radio frequency (RF) links, and the like. Code segments may be downloaded via a computer network such as the Internet, an Intranet, or the like.

还需要说明的是，本发明中提及的示例性实施例，基于一系列的步骤或者装置描述一些方法或系统。但是，本发明不局限于上述步骤的顺序，也就是说，可以按照实施例中提及的顺序执行步骤，也可以不同于实施例中的顺序，或者若干步骤同时执行。It should also be noted that the exemplary embodiments mentioned in the present invention describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above steps, that is, the steps may be performed in the order mentioned in the embodiment, or may be different from the order in the embodiment, or several steps may be performed simultaneously.

本发明中，针对一个实施方式描述和/或例示的特征，可以在一个或更多个其它实施方式中以相同方式或以类似方式使用，和/或与其他实施方式的特征相结合或代替其他实施方式的特征。In the present invention, features described and/or exemplified for one embodiment can be used in the same or similar manner in one or more other embodiments, and/or can be combined with features of other embodiments or replace other Features of the implementation.

以上所述仅为本发明的优选实施例而已，并不用于限制本发明，对于本领域的技术人员来说，本发明实施例可以有各种更改和变化。凡在本发明的精神和原则之内，所作的任何修改、等同替换、改进等，均应包含在本发明的保护范围之内。The above descriptions are only preferred embodiments of the present invention, and are not intended to limit the present invention. For those skilled in the art, various modifications and changes may be made to the embodiments of the present invention. Any modifications, equivalent replacements, improvements, etc. made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims

1. A method for deep learning-based base station representation learning, the method comprising:

acquiring base station information, wherein the base station information comprises unique identification code information of a base station, base station static characteristic data and base station related dynamic track data, the base station static characteristic data comprises position area codes, cell codes, longitudes, latitudes, addresses, affiliated operators, direction angles and POI information in a specific range, and the base station related dynamic track data comprises the accessed time of the base station, the stay time of users in the base station and the number of access users of the base station;

determining an adjacency matrix between base stations based on the acquired longitude and latitude of the base stations and the related dynamic track data of the base stations, and constructing a first base station relation diagram based on the adjacency matrix; the adjacency matrix is used for representing the distance between the base stations and the dynamic switching relation;

constructing a first feature vector set based on the base station static feature data and the base station related dynamic track data;

constructing a self-encoder neural network model, inputting the first base station relation diagram and the first characteristic vector set into an encoder to obtain a base station expression vector set, and inputting the base station expression vector set into a decoder to obtain a reconstructed second base station relation diagram and second characteristic vector set;

each element in the adjacency matrix is 0 or 1;

when the distance between the first base station and the second base station is smaller than a preset distance, and the total number of times of the first base station and the second base station serving as adjacent track points is not smaller than the preset number of times, the corresponding element in the adjacent matrix is 1;

the decoder includes self-attention modules Q, K, V each of which is σ (W ^k h ^k-1 ) The method comprises the steps of carrying out a first treatment on the surface of the Wherein σ is a nonlinear activation function; w (W) ^k Is a trainable weight coefficient matrix with the size of N multiplied by L, wherein N is the number of base stations, and L is h ^k-1 Size W of (2) ^k The initial value is randomly initialized to be a real number between 0 and 1; h is a ^k-1 Is the output of the neural network of the k-1 layer.

2. The deep learning-based base station representation learning method according to claim 1, wherein the accessed time of the base station includes a time period in which the number of times the base station is accessed is the largest and a time period in which the number of times the base station is accessed is the smallest;

the time for the user to stay in the base station comprises the average value and the median value of the time for the user to stay in the base station; and/or

The access user quantity of the base station comprises total number of access users of the base station on working days and non-working days, average number of access users of the base station on working days and non-working days, maximum number of access users of the base station on working days and non-working days, median number of access users of the base station on working days and non-working days and minimum number of access users of the base station on hours.

3. The deep learning-based base station representation learning method of claim 1, further comprising:

constructing a loss function;

and updating weight parameters of the self-encoder neural network model.

4. A deep learning based base station representation learning method according to claim 3, characterized in that the loss function is:wherein V is _i For the first eigenvector of the i-th base station,the second eigenvector of the ith base station, N is the total number of base stations, h _i Is the expression vector of the ith base station, lambda is the supermodel parameter, M is the total number of neighbor base stations of the ith base station, h _j Is a representation vector of a neighbor base station j of the i-th base station.

5. The deep learning based base station representation learning method of any one of claims 1 to 4, wherein the encoder and the decoder each comprise a three-layer graph attention neural network layer.

6. The deep learning based base station representation learning method of claim 4, further comprising:

setting the iteration times and the learning rate of the self-encoder neural network model;

training the self-encoder neural network model.

7. A deep learning based base station representation learning system comprising a processor and a memory, wherein said memory has stored therein computer instructions for executing the computer instructions stored in said memory, which system when executed by the processor implements the steps of the method according to any of claims 1 to 6.

8. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.