CN107346313B

CN107346313B - Method and device for virtual face mining

Info

Publication number: CN107346313B
Application number: CN201610294838.5A
Authority: CN
Inventors: 李国良; 冯建华; 沈秉文; 王鹤男; 孟凡超; 章云龙; 司向辉; 汪晓婕; 毛帅; 郭昂; 郑宇飞
Original assignee: Tsinghua University; Tencent Technology Shenzhen Co Ltd
Current assignee: Tsinghua University; Tencent Technology Shenzhen Co Ltd
Priority date: 2016-05-05
Filing date: 2016-05-05
Publication date: 2020-11-27
Anticipated expiration: 2036-05-05
Also published as: CN107346313A

Abstract

The present invention relates to a method and device for virtual face mining. The method includes: acquiring point of interest data, check-in point data and road network data, integrating the point of interest data and the check-in point data to obtain a point set; acquiring the check-in point set of each point of interest according to the point set, Filter the check-in points in the check-in point set to obtain the correct check-in point set; obtain the minimum circumscribed polygon of the correct check-in point set; perform boundary optimization on the minimum circumscribed polygon of the correct check-in point set according to the road network data . The above method and device for excavating a virtual surface obtain the minimum circumscribed polygon of the check-in point set, and use the road network to optimize the boundary of the minimum circumscribed polygon.

Description

Method and device for virtual face mining

技术领域technical field

本发明涉及信息检索与信息挖掘领域，特别是涉及一种虚拟面挖掘的方法和装置。The invention relates to the fields of information retrieval and information mining, in particular to a method and device for virtual surface mining.

背景技术Background technique

随着移动互联网的快速发展和GPS(Globe Positioning System，全球卫星定位系统)等定位技术的普及，社交网络用户产生的地理位置的数据在以惊人的速度增加。用户利用智能移动终端上的定位技术，将自己的位置轨迹发送给签到应用网站，因不同用户可能在不同的地方进行签到，这样产生了大量的基于位置的签到数据。如何根据社交网络用户的签到数据挖掘出地理位置的虚拟面区域是急需解决的问题。With the rapid development of the mobile Internet and the popularization of positioning technologies such as GPS (Globe Positioning System, global satellite positioning system), the geographic location data generated by social network users is increasing at an alarming rate. The user uses the positioning technology on the smart mobile terminal to send his position track to the check-in application website. Since different users may check-in in different places, a large amount of location-based check-in data is generated. How to mine the virtual area of geographic location according to the check-in data of social network users is an urgent problem to be solved.

发明内容SUMMARY OF THE INVENTION

基于此，有必要针对如何根据签到数据挖掘地理位置的虚拟面的问题，提供一种虚拟面挖掘的方法，能够实现根据用户的签到数据挖掘地理位置的虚拟面区域。Based on this, it is necessary to provide a virtual surface mining method for mining the virtual surface of the geographic location according to the check-in data, which can realize the mining of the virtual surface area of the geographic location according to the user's check-in data.

此外，还有必要提供一种虚拟面挖掘的装置，能够实现根据用户的签到数据挖掘地理位置的虚拟面区域。In addition, it is also necessary to provide a device for virtual surface mining, which can realize virtual surface area mining of geographic location according to the user's check-in data.

一种虚拟面挖掘的方法，包括：A method of virtual face mining, comprising:

获取兴趣点数据、签到点数据和路网数据，对所述兴趣点数据和签到点数据进行整合得到点集合；Obtaining point-of-interest data, check-in point data and road network data, and integrating the point-of-interest data and check-in point data to obtain a point set;

根据所述点集合获取各兴趣点的签到点集合，对所述签到点集合中签到点进行过滤得到正确的签到点点集；Obtain the check-in point set of each interest point according to the point set, and filter the check-in points in the check-in point set to obtain the correct check-in point set;

获取所述正确的签到点点集的最小外接多边形；Obtain the minimum circumscribed polygon of the correct check-in point set;

根据所述路网数据对所述正确的签到点点集的最小外接多边形进行边界优化。Boundary optimization is performed on the minimum circumscribed polygon of the correct check-in point set according to the road network data.

一种虚拟面挖掘的装置，包括：A device for virtual face excavation, comprising:

整合模块，用于获取兴趣点数据、签到点数据和路网数据，对所述兴趣点数据和签到点数据进行整合得到点集合；an integration module, used for acquiring point-of-interest data, check-in point data and road network data, and integrating the point-of-interest data and check-in point data to obtain a point set;

筛选模块，用于根据所述点集合获取各兴趣点的签到点集合，对所述签到点集合中签到点进行过滤得到正确的签到点点集；a screening module, configured to obtain a check-in point set of each interest point according to the point set, and filter the check-in points in the check-in point set to obtain a correct check-in point set;

虚拟面获取模块，用于获取所述正确的签到点点集的最小外接多边形；a virtual surface acquisition module, used for acquiring the minimum circumscribed polygon of the correct check-in point set;

优化模块，用于根据所述路网数据对所述正确的签到点点集的最小外接多边形进行边界优化。An optimization module, configured to perform boundary optimization on the minimum circumscribed polygon of the correct check-in point set according to the road network data.

上述虚拟面挖掘的方法和装置，通过获取兴趣点数据、签到点数据和路网数据，对兴趣点数据和签到点数据进行整合得到点集合，对点集合中各兴趣点对应的签到点进行过滤得到正确的签到点点集，获取该签到点点集的最小外接多边形，采用路网对最小外接多边形进行边界优化，该最小外接多边形即为兴趣点的虚拟面，实现了对虚拟面的挖掘。The above-mentioned method and device for virtual surface mining, by acquiring interest point data, check-in point data and road network data, integrating the interest point data and the check-in point data to obtain a point set, and filtering the check-in points corresponding to each interest point in the point set The correct check-in point set is obtained, the minimum circumscribed polygon of the check-in point set is obtained, and the boundary of the minimum circumscribed polygon is optimized by using the road network.

附图说明Description of drawings

图1为一个实施例中终端的内部结构示意图；1 is a schematic diagram of the internal structure of a terminal in one embodiment;

图2为一个实施例中虚拟面挖掘的方法的流程图；2 is a flowchart of a method for virtual surface mining in one embodiment;

图3为兴趣点与签到点的示例示意图；FIG. 3 is an example schematic diagram of a point of interest and a check-in point;

图4为R-Tree实例示意图；Fig. 4 is the schematic diagram of R-Tree instance;

图5为KR-Tree在地图上的示例示意图；Figure 5 is an example schematic diagram of KR-Tree on the map;

图6为根据图5的基于POI的ID建立的倒排索引示意图；6 is a schematic diagram of an inverted index established according to the POI-based ID of FIG. 5;

图7为根据图5建立的KR-Tree索引示意图；Fig. 7 is the KR-Tree index schematic diagram established according to Fig. 5;

图8A为一个实施例中各结点所包含叶子结点的示意图；8A is a schematic diagram of a leaf node included in each node in one embodiment;

图8B为在图8A中添加新结点X的示意图；8B is a schematic diagram of adding a new node X in FIG. 8A;

图8C为加入X后各个非叶子结点遍历得到的扩张面积示意图；8C is a schematic diagram of the expanded area obtained by traversing each non-leaf node after adding X;

图8D为图8C中选取扩张面积最小的结点A，将X加入A的示意图；8D is a schematic diagram of selecting node A with the smallest expansion area in FIG. 8C and adding X to A;

图9为凸包算法示意图；Figure 9 is a schematic diagram of the convex hull algorithm;

图10为一个实施例中虚拟面挖掘的装置的结构框图；10 is a structural block diagram of an apparatus for excavating virtual surfaces in one embodiment;

图11为另一个实施例中虚拟面挖掘的装置的结构框图。FIG. 11 is a structural block diagram of an apparatus for excavating virtual surfaces in another embodiment.

具体实施方式Detailed ways

为了使本发明的目的、技术方案及优点更加清楚明白，以下结合附图及实施例，对本发明进行进一步详细说明。应当理解，此处所描述的具体实施例仅仅用以解释本发明，并不用于限定本发明。In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention, but not to limit the present invention.

可以理解，本发明所使用的术语“第一”、“第二”等可在本文中用于描述各种元件，但这些元件不受这些术语限制。这些术语仅用于将第一个元件与另一个元件区分。举例来说，在不脱离本发明的范围的情况下，可以将第一集合称为第二集合，且类似地，可将第二集合称为第一集合。第一集合和第二集合两者都是集合，但其不是同一集合。It will be understood that the terms "first", "second", etc., as used herein, may be used herein to describe various elements, but these elements are not limited by these terms. These terms are only used to distinguish a first element from another element. For example, the first set may be referred to as the second set, and, similarly, the second set may be referred to as the first set, without departing from the scope of this disclosure. Both the first set and the second set are sets, but they are not the same set.

图1为一个实施例中终端的内部结构示意图。如图1所示，该终端包括通过系统总线连接的处理器、非易失性存储介质、内存、显示屏和输入装置。其中，终端的非易失性存储介质存储有操作系统。该处理器用于提供计算和控制能力，支撑整个终端的运行，该处理器被用于执行虚拟面挖掘的方法，包括：获取兴趣点数据、签到点数据和路网数据，对所述兴趣点数据和签到点数据进行整合得到点集合；根据所述点集合获取各兴趣点的签到点集合，对所述签到点集合中签到点进行过滤得到正确的签到点点集；获取所述正确的签到点点集的最小外接多边形；根据所述路网数据对所述正确的签到点点集的最小外接多边形进行边界优化。终端的显示屏可以是液晶显示屏或者电子墨水显示屏等，输入装置可以是显示屏上覆盖的触摸层，也可以是终端外壳上设置的按键、轨迹球或触控板，也可以是外接的键盘、触控板或鼠标等。该终端可以是手机、平板电脑或者个人数字助理。本领域技术人员可以理解，图1中示出的结构，仅仅是与本申请方案相关的部分结构的框图，并不构成对本申请方案所应用于其上的终端的限定，具体的终端可以包括比图中所示更多或更少的部件，或者组合某些部件，或者具有不同的部件布置。FIG. 1 is a schematic diagram of an internal structure of a terminal in an embodiment. As shown in FIG. 1 , the terminal includes a processor, a non-volatile storage medium, a memory, a display screen and an input device connected through a system bus. The operating system is stored in the non-volatile storage medium of the terminal. The processor is used to provide computing and control capabilities to support the operation of the entire terminal, and the processor is used to execute a method for virtual surface mining, including: acquiring point-of-interest data, check-in point data and road network data, and analyzing the point-of-interest data Integrate with the check-in point data to obtain a point set; obtain the check-in point set of each point of interest according to the point set, filter the check-in points in the check-in point set to obtain the correct check-in point set; obtain the correct check-in point set The minimum circumscribed polygon of ; according to the road network data, the boundary optimization is performed on the minimal circumscribed polygon of the correct check-in point set. The display screen of the terminal can be a liquid crystal display screen or an electronic ink display screen, etc., and the input device can be a touch layer covered on the display screen, a button, a trackball or a touchpad set on the terminal shell, or an external device. Keyboard, trackpad or mouse, etc. The terminal may be a mobile phone, a tablet computer or a personal digital assistant. Those skilled in the art can understand that the structure shown in FIG. 1 is only a block diagram of a partial structure related to the solution of the present application, and does not constitute a limitation on the terminal to which the solution of the present application is applied. More or fewer components are shown in the figures, either in combination or with different arrangements of components.

图2为一个实施例中虚拟面挖掘的方法的流程图。如图2所示，一种虚拟面挖掘的方法，包括：FIG. 2 is a flowchart of a method for virtual face mining in one embodiment. As shown in Figure 2, a method for virtual surface mining includes:

步骤202，获取兴趣点数据、签到点数据和路网数据，对该兴趣点数据和签到点数据进行整合得到点集合。Step 202: Acquire point-of-interest data, check-in point data, and road network data, and integrate the point-of-interest data and check-in point data to obtain a point set.

本实施例中，POI(Points Of Interest，兴趣点)数据可以从各种电子地图数据中获取。签到点数据是指用户签到点数据，可以从社交网站或一些旅游签到网站的服务提供者进行获取，例如即时通信软件、微博、Twitter、Foursquare等提供签到点数据。路网数据可从电子地图数据或四维地图上获取。In this embodiment, POI (Points Of Interest, points of interest) data can be obtained from various electronic map data. Check-in point data refers to user check-in point data, which can be obtained from social networking websites or service providers of some travel check-in websites, such as instant messaging software, Weibo, Twitter, Foursquare, etc. that provide check-in point data. Road network data can be obtained from electronic map data or four-dimensional maps.

对兴趣点数据和签到点数据进行整合包括对兴趣点数据进行整理、清除、统一结构化处理等。Integrating POI data and check-in data includes sorting, clearing, and unifying structured processing of POI data.

步骤204，根据该点集合获取各兴趣点的签到点集合，对该签到点集合中签到点进行过滤得到正确的签到点点集。Step 204: Acquire a check-in point set of each point of interest according to the point set, and filter the check-in points in the check-in point set to obtain a correct check-in point set.

本实施例中，一个兴趣点可包括一个或多个签到点，也可不包括签到点。兴趣点对应的签到点有的错误，以签到点为中心的搜索矩形结点内的其他签到在对应的兴趣点标识上的签到点的个数为热力值，热力值小于热力值阈值，则该签到点为错误的签到点。In this embodiment, a point of interest may include one or more check-in points, or may not include check-in points. There is an error in the check-in point corresponding to the point of interest. The number of other check-in points in the search rectangle node centered on the check-in point is the thermal value. If the thermal value is less than the thermal value threshold, the The check-in point is the wrong check-in point.

图3为兴趣点与签到点的示例示意图。如图3所示，P1、P2和P3为三个兴趣点，在兴趣点P1周围聚集了多个签到点，P2周围有2个签到点，P3周围没有签到点。P为远离P3的签到点。其中，兴趣点采用空白圆圈表示，签到点采用黑色圆圈表示。FIG. 3 is a schematic diagram of an example of a point of interest and a check-in point. As shown in Figure 3, P1, P2, and P3 are three points of interest. Multiple check-in points are gathered around the point of interest P1, 2 check-in points around P2, and no check-in point around P3. P is the check-in point far away from P3. Among them, points of interest are represented by blank circles, and check-in points are represented by black circles.

步骤206，获取该正确的签到点点集的最小外接多边形。Step 206: Obtain the minimum circumscribed polygon of the correct check-in point set.

本实施例中，正确的签到点点集的最小外接多边形可为最小外接矩形，获取正确的签到点点集中所有签到点的经纬度，获取最大和最小的经纬度，获取最大经度与最小经度之差，最大纬度与最小纬度之差，分别作为最小外接矩形的长和宽。In this embodiment, the minimum circumscribed polygon of the correct check-in point set may be the minimum circumscribed rectangle, obtain the latitude and longitude of all check-in points in the correct check-in point set, obtain the maximum and minimum longitude and latitude, obtain the difference between the maximum longitude and the minimum longitude, and obtain the maximum latitude and longitude. The difference from the minimum latitude is used as the length and width of the minimum circumscribed rectangle, respectively.

步骤208，根据该路网数据对该正确的签到点点集的最小外接多边形进行边界优化。Step 208: Perform boundary optimization on the minimum circumscribed polygon of the correct check-in point set according to the road network data.

上述虚拟面挖掘的方法，通过获取兴趣点数据、签到点数据和路网数据，对兴趣点数据和签到点数据进行整合得到点集合，对点集合中各兴趣点对应的签到点进行过滤得到正确的签到点点集，获取该签到点点集的最小外接多边形，采用路网对最小外接多边形进行边界优化，该最小外接多边形即为兴趣点的虚拟面，实现了对虚拟面的挖掘。The above-mentioned virtual surface mining method obtains a point set by acquiring interest point data, check-in point data and road network data, integrates the interest point data and the check-in point data to obtain a point set, and filters the check-in points corresponding to each interest point in the point set to obtain the correct value. The minimum circumscribed polygon of the check-in point set is obtained, and the road network is used to optimize the boundary of the minimal circumscribed polygon. The minimal circumscribed polygon is the virtual surface of the point of interest, which realizes the mining of the virtual surface.

此外，挖掘得到兴趣点的虚拟面后，可以根据用户所处的兴趣点的虚拟面给用户推荐该兴趣点的虚拟面内的资源信息。In addition, after mining the virtual surface of the point of interest, resource information in the virtual surface of the point of interest can be recommended to the user according to the virtual surface of the point of interest where the user is located.

在一个实施例中，该对该兴趣点数据和签到点数据进行整合得到点集合的步骤包括：In one embodiment, the step of integrating the point of interest data and the check-in point data to obtain a point set includes:

(1.1)：整理该兴趣点数据；(1.1): Organize the POI data;

本实施例中，从兴趣点数据中获取兴趣点的属性。兴趣点的属性包括兴趣点标识(ID，Identity)、名字、经纬度等。对于没有签到点和签到点个数小于等于第一阈值，则清除。防止在计算过程中建立索引，节省时间与空间。兴趣点标识是用于唯一表示兴趣点的字符串。该字符串可包括数字、字母和字符中的一种或多种。In this embodiment, the attribute of the point of interest is obtained from the point of interest data. The attributes of the point of interest include an identifier of the point of interest (ID, Identity), name, longitude and latitude, and the like. If there is no check-in point and the number of check-in points is less than or equal to the first threshold, it will be cleared. Prevents indexing during computation, saving time and space. A POI ID is a string used to uniquely represent a POI. The string may include one or more of numbers, letters, and characters.

整理兴趣点数据包括(1.1.1)和步骤(1.1.2)：Organizing POI data includes (1.1.1) and steps (1.1.2):

(1.1.1)：利用第一哈希集合统计签到点，其中，第一哈希集合的键为兴趣点标识，键值为签到点个数。(1.1.1): Use the first hash set to count check-in points, wherein the key of the first hash set is the identifier of the point of interest, and the key value is the number of check-in points.

利用第一哈希集合统计签到点，第一哈希集合的键为字符串类型POI的ID，键值为整数型的签到点个数。对于每个签到点，都有一条属性是它签到的POI的ID，因此扫描所有的签到点，在第一哈希集合中存在的POI的ID项，该项计数进行加1，不存在的POI的ID，计数赋值为1，插入到第一哈希集合中。该第一哈希集合可为哈希表。The first hash set is used to count the check-in points, the key of the first hash set is the ID of the POI of the string type, and the key value is the number of the check-in points of the integer type. For each check-in point, there is an attribute that is the ID of the POI that it checked in, so all check-in points are scanned, the ID items of the POIs that exist in the first hash set, the count of this item is incremented by 1, and the POIs that do not exist ID, the count is assigned a value of 1, and inserted into the first hash set. The first hash set may be a hash table.

(1.1.2)：扫描兴趣点数据，若不存在当前兴趣点的兴趣点标识键或者当前兴趣点的兴趣点标识键对应的键值小于等于第一阈值，则清除当前兴趣点，并将该清除的兴趣点的兴趣点标识记录在第二哈希集合中。(1.1.2): Scan the POI data, if there is no POI identification key of the current POI or the key value corresponding to the POI identification key of the current POI is less than or equal to the first threshold, clear the current POI, and set the The POI identifiers of the cleared POIs are recorded in the second hash set.

扫描的POI点的数据，在创建的第一哈希集合中，如果不存在当前POI的ID键或当前POI的ID键对应的计数值小于等于第一阈值(如2，即签到点过少，无法形成一个面)，则清除当前的POI。并将清除的POI的ID键入到第二哈希集合S中。如图3所示，对于POI中签到点数量太少的P2以及没有签到点的P3过滤掉，并且签到点p距离POI中P3太远，过滤签到点p。The scanned POI point data, in the created first hash set, if there is no current POI ID key or the count value corresponding to the current POI ID key is less than or equal to the first threshold (such as 2, that is, there are too few check-in points, cannot form a face), clear the current POI. And key the ID of the cleared POI into the second hash set S. As shown in Figure 3, for P2 with too few check-in points in the POI and P3 with no check-in points, and the check-in point p is too far from P3 in the POI, the check-in point p is filtered.

例如，有些用户在圆明园范围内，可能签到的位置为清华大学POI，这类不准确的签到点为错误的签到点，需要清除。For example, some users may check in at the POI of Tsinghua University within the range of Yuanmingyuan. Such inaccurate check-in points are wrong check-in points and need to be cleared.

(1.2)：将兴趣点的属性和签到点的属性进行统一结构化，并将统一结构化后的兴趣点的属性和签到点的属性记录在点集合数组中。(1.2): The attributes of the points of interest and the attributes of the check-in points are unifiedly structured, and the attributes of the unified structuring points of interest and the attributes of the check-in points are recorded in the point set array.

建立一个点集合数组points，将POI和签到点的属性进行统一结构化，统一结构化包括：ID(标识)、longitude(经度)、latitude(纬度)、type(类型)、offset_lng、offset_lat，其中，offset_lng为所签到的POI的经度，offset_lat为所签到的POI的纬度。type是用来区分该点是POI还是签到点。例如，type为1时，表示该点为POI，type为0时，表示该点为签到点，不限于此。Create a point set array points, and uniformly structure the attributes of POI and check-in points. The uniform structure includes: ID (identity), longitude (longitude), latitude (latitude), type (type), offset _lng , offset _lat , Among them, offset _lng is the longitude of the checked-in POI, and offset _lat is the latitude of the checked-in POI. type is used to distinguish whether the point is a POI or a check-in point. For example, when type is 1, it indicates that the point is POI, and when type is 0, it indicates that the point is a check-in point, but not limited to this.

步骤(1.2)包括步骤(1.2.1)、步骤(1.2.2)：Step (1.2) includes step (1.2.1), step (1.2.2):

步骤(1.2.1)：读取整理后的兴趣点数据得到兴趣点的属性，将该兴趣点的属性进行统一结构化。Step (1.2.1): Read the sorted POI data to obtain the attributes of the POIs, and uniformly structure the attributes of the POIs.

读入POI数据，将type置为1，将offset_lng和offset_lat置为POI的经度和纬度。Read in the POI data, set the type to 1, and set the offset _lng and offset _lat to the longitude and latitude of the POI.

步骤(1.2.2)：读取签到点数据，若签到点对应的兴趣点的兴趣点标识在该第二哈希集合中，则不将该签到点加入点集合数组中，若该签到点与对应的兴趣点之间的距离满足预设条件的签到点清除。Step (1.2.2): Read the check-in point data, if the POI of the point of interest corresponding to the check-in point is identified in the second hash set, the check-in point will not be added to the point set array. Check-in points whose distances between the corresponding points of interest satisfy the preset conditions are cleared.

清除条件为：The clear condition is:

|latitude-offset_lat|＞a|latitude-offset _lat |＞a

|longitude-offset_lng|＞2*a|longitude-offset _lng |＞2*a

其中，由于经度范围为360度，纬度范围为180度，经度是纬度的2倍，故采用2倍的距离阈值。签到点的经度与POI点的经度之差绝对值大于2倍的a，签到点的纬度与POI点的纬度之差的绝对值大于a，则清除该签到点。a为距离阈值参数，可为0.02、0.04等，不限于此。Among them, since the longitude range is 360 degrees, the latitude range is 180 degrees, and the longitude is twice the latitude, a distance threshold of twice is used. If the absolute value of the difference between the longitude of the check-in point and the longitude of the POI point is greater than 2 times a, and the absolute value of the difference between the latitude of the check-in point and the latitude of the POI point is greater than a, the check-in point will be cleared. a is a distance threshold parameter, which can be 0.02, 0.04, etc., but is not limited to this.

在一个实施例中，该对该兴趣点数据和签到点数据进行整合得到点集合的步骤之后，该虚拟面挖掘的方法还包括：对该路网数据进行结构化处理得到每条路的路标识、每条路包含的结点信息和每条路中结点的最大和最小经纬度，并记录在路集合中。In one embodiment, after the step of integrating the point of interest data and the check-in point data to obtain a point set, the method for mining virtual surfaces further includes: performing a structured process on the road network data to obtain a road identifier of each road , The node information contained in each road and the maximum and minimum latitude and longitude of the nodes in each road are recorded in the road set.

本实施例中，建立路集合数组Roads。对路网数据进行结构化处理，为每条路都设置路id(标识)，该路id为该路在路集合数组Roads中的位置值，并获取每条路包含的结点信息和每条路中每个结点的经纬度，并且获取这条路的最小外接矩形。还可获取路的等级、路名等信息。建立结点数组来存储路上所有的结点、每一个结点的经纬度。根据这条路上所有结点的经纬度得到最大和最小经纬度，根据最大和最小经纬度得到这条路的最小外接矩形，即获取最大经度与最小经度之差，最大纬度与最小纬度之差，分别作为最小外接矩形的长和宽。In this embodiment, a road set array Roads is established. Structure the road network data, set a road id (identity) for each road, and the road id is the position value of the road in the road set array Roads, and obtain the node information contained in each road and each road. The latitude and longitude of each node in the road, and get the minimum enclosing rectangle of this road. Information such as road grade and road name can also be obtained. Create a node array to store all the nodes on the road and the latitude and longitude of each node. Obtain the maximum and minimum latitude and longitude according to the longitude and latitude of all nodes on this road, and obtain the minimum circumscribed rectangle of this road according to the maximum and minimum longitude and latitude, that is, obtain the difference between the maximum longitude and the minimum longitude, and the difference between the maximum latitude and the minimum latitude, respectively as the minimum The length and width of the bounding rectangle.

在一个实施例中，该虚拟面挖掘的方法还包括：建立索引，根据该点集合建立倒排索引和第一关键信息平衡树索引，根据该路集合建立第二关键信息平衡树索引。通过建立三部分索引来加速地理位置区域的搜索，也使得搜索和计算的结果更加准确。In one embodiment, the method for mining virtual surfaces further includes: establishing an index, establishing an inverted index and a first key information balanced tree index according to the point set, and establishing a second key information balanced tree index according to the way set. By establishing a three-part index to speed up the search of the geographical area, it also makes the search and calculation results more accurate.

步骤(2)建立索引，包括：Step (2) establishes an index, including:

步骤(2.1)：根据该点集合建立倒排索引，包括：建立倒排索引，其中，键为兴趣点标识，键值为点集合，该点集合中每一项为该兴趣点对应的签到点在该点集合中的位置值。Step (2.1): establishing an inverted index according to the set of points, including: establishing an inverted index, wherein the key is the identifier of the point of interest, the key value is the set of points, and each item in the set of points is the check-in point corresponding to the point of interest The position value in this set of points.

利用POI的ID建立倒排索引。建立一个倒排索引，其中，键为POI的ID，键值为点集合(或点集合数组)。该点集合中每一项为该POI对应的签到点在points中的位置值。扫描points，对于倒排索引中已经存在的POI的ID，将签到的位置值加入到该项的点集合中，对于不存在的POI的ID，建立新项，并插入到倒排索引中。Use the ID of the POI to create an inverted index. Create an inverted index, where the key is the ID of the POI, and the key is the point set (or point set array). Each item in the point set is the position value of the check-in point corresponding to the POI in points. Scan the points, for the ID of the POI that already exists in the inverted index, add the check-in position value to the point set of the item, and for the ID of the POI that does not exist, create a new item and insert it into the inverted index.

步骤(2.2)：根据该点集合建立第一关键信息平衡树索引，包括步骤(2.2.1)、(2.2.2)：Step (2.2): establish the first key information balanced tree index according to the point set, including steps (2.2.1), (2.2.2):

建立KR-Tree，即第一关键信息平衡树。该KR-Tree是在R-Tree的基础上扩展得到的，KR-Tree中的每个结点包括矩形的范围信息、子结点分支和关键文本信息。该关键文本信息包括签到点的POI的ID和一个倒排索引等。倒排索引的键为该签到点的POI的ID，键值为该签到点在点集合数组points中的位置值。Establish KR-Tree, that is, the first key information balance tree. The KR-Tree is extended on the basis of the R-Tree. Each node in the KR-Tree includes the range information of the rectangle, the branch of the child nodes and the key text information. The key text information includes the ID of the POI of the check-in point, an inverted index, and the like. The key of the inverted index is the ID of the POI of the check-in point, and the key value is the position value of the check-in point in the point set array points.

图4为R-Tree实例示意图；图5为KR-Tree在地图上的示例示意图；图6为根据图5的基于POI的ID建立的倒排索引示意图；图7为根据图5建立的KR-Tree索引示意图。Fig. 4 is the schematic diagram of R-Tree instance; Fig. 5 is the example schematic diagram of KR-Tree on the map; Fig. 6 is the inverted index schematic diagram established according to the ID based on POI of Fig. 5; Fig. 7 is the KR-Tree established according to Fig. 5 Tree index diagram.

如图4所示，R-Tree是一个空间索引数据结构，其特点是：R-Tree是n叉树，n为R-Tree的扇；每个结点对应一个矩形；叶子结点上包含了小于等于n的对象，其对应的矩形为所有对象的外包矩形；非叶子结点的矩形为所有子结点矩形的外包矩形。非叶子结点A和B，非叶子结点A包括叶子结点C、D、E、F，非叶子结点B包括叶子结点G、H、I、J。As shown in Figure 4, R-Tree is a spatial index data structure. Its characteristics are: R-Tree is an n-ary tree, and n is a fan of R-Tree; each node corresponds to a rectangle; the leaf nodes contain For objects less than or equal to n, the corresponding rectangle is the enclosing rectangle of all objects; the rectangle of non-leaf nodes is the enclosing rectangle of all child node rectangles. Non-leaf nodes A and B, non-leaf node A includes leaf nodes C, D, E, F, and non-leaf node B includes leaf nodes G, H, I, J.

如图5所示，在地图上非叶子结点R0包括两个非叶子结点R1和R2，非叶子结点R1包括非叶子结点R3、R4和R5，非叶子结点R2包括非叶子结点R6、R7和R8。非叶子结点R3包括叶子结点P1和P2。非叶子结点R4包括叶子结点P11和P12。非叶子结点R5包括叶子结点P3和P4。非叶子结点R6包括叶子结点P5和P7。非叶子结点R7包括叶子结点P9和P10。非叶子结点R8包括叶子结点P6和P8。As shown in Figure 5, the non-leaf node R0 on the map includes two non-leaf nodes R1 and R2, the non-leaf node R1 includes non-leaf nodes R3, R4 and R5, and the non-leaf node R2 includes non-leaf nodes Points R6, R7 and R8. The non-leaf node R3 includes leaf nodes P1 and P2. The non-leaf node R4 includes leaf nodes P11 and P12. The non-leaf node R5 includes leaf nodes P3 and P4. The non-leaf node R6 includes leaf nodes P5 and P7. The non-leaf node R7 includes leaf nodes P9 and P10. The non-leaf node R8 includes leaf nodes P6 and P8.

如图6所示，POI的ID为438490，对应的签到点在该POI点集中位置为P1，P2，P3；POI的ID为857839，对应的签到点在该POI点集中位置为P11，P12；POI的ID为402384，对应的签到点在该POI点集中位置为P4，P5，P6；POI的ID为789347，对应的签到点在该POI点集中位置为P7，P8，P9，P10。As shown in Figure 6, the ID of the POI is 438490, and the corresponding check-in points are P1, P2, and P3 in the POI point collection; the POI ID is 857839, and the corresponding check-in points are P11, P12 in the POI point collection; The ID of POI is 402384, and the corresponding check-in points are P4, P5, and P6 in the POI point collection; the ID of POI is 789347, and the corresponding check-in points are P7, P8, P9, and P10 in the POI point collection.

如图7所示，非叶子结点R0包括两个非叶子结点R1和R2，非叶子结点R1包括非叶子结点R3、R4和R5，非叶子结点R2包括非叶子结点R6、R7和R8。非叶子结点R3包括叶子结点P1和P2。非叶子结点R4包括叶子结点P11和P12。非叶子结点R5包括叶子结点P3和P4。非叶子结点R6包括叶子结点P5和P7。非叶子结点R7包括叶子结点P9和P10。非叶子结点R8包括叶子结点P6和P8。每个叶子结点对应一个POI的ID。如POI的ID为438490，对应的签到点在该POI点集中位置为P1，P2，P3；POI的ID为857839，对应的签到点在该POI点集中位置为P11，P12；POI的ID为402384，对应的签到点在该POI点集中位置为P4，P5，P6；POI的ID为789347，对应的签到点在该POI点集中位置为P7，P8，P9，P10。As shown in Figure 7, non-leaf node R0 includes two non-leaf nodes R1 and R2, non-leaf node R1 includes non-leaf nodes R3, R4 and R5, non-leaf node R2 includes non-leaf nodes R6, R7 and R8. The non-leaf node R3 includes leaf nodes P1 and P2. The non-leaf node R4 includes leaf nodes P11 and P12. The non-leaf node R5 includes leaf nodes P3 and P4. The non-leaf node R6 includes leaf nodes P5 and P7. The non-leaf node R7 includes leaf nodes P9 and P10. The non-leaf node R8 includes leaf nodes P6 and P8. Each leaf node corresponds to a POI ID. For example, the ID of POI is 438490, and the corresponding check-in points are P1, P2, and P3 in the POI points; the ID of POI is 857839, and the corresponding check-in points are P11 and P12 in the POI points; , the corresponding check-in points are P4, P5, P6 in the POI point concentration; the ID of the POI is 789347, and the corresponding check-in points in the POI point concentration are P7, P8, P9, P10.

采用KR-Tree树，从叶子结点开始用矩形将空间框起来，结点越往上，框住的空间就越大，以此对空间进行分割。根据R树的这种数据结构，当需要进行一个高维空间查询时，只需要遍历少数几个叶子结点所包含的指针，查看这些指针指向的数据是否满足要求即可，效率高。Using the KR-Tree tree, the space is framed by a rectangle starting from the leaf node. The higher the node is, the larger the framed space is, so as to divide the space. According to the data structure of the R tree, when a high-dimensional space query needs to be performed, it is only necessary to traverse the pointers contained in a few leaf nodes to check whether the data pointed to by these pointers meets the requirements, which is highly efficient.

步骤(2.2.1)：将签到点数据改成带关键信息的矩形数据。Step (2.2.1): Change the check-in point data into rectangular data with key information.

建立一个带有关键信息的矩形数组rects，对于每一个签到点，设置矩形的最大最小经度为该签到点的经度，矩形的最大最小纬度为该签到点的纬度，并且带有结构属性，该结构属性带有该签到点的POI的ID和一个倒排索引，倒排索引的键为该签到点的POI的ID，键值为该签到点在点集合数组points中的位置值。将这个新的边长为0的矩形加入到rects中。Create a rectangular array rects with key information. For each check-in point, set the maximum and minimum longitude of the rectangle as the longitude of the check-in point, and the maximum and minimum latitude of the rectangle as the latitude of the check-in point, and with structural attributes, the structure The attribute contains the ID of the POI of the check-in point and an inverted index. The key of the inverted index is the ID of the POI of the check-in point, and the key value is the position value of the check-in point in the point set array points. Add this new rectangle with side length 0 to the rects.

步骤(2.2.2)：将该矩形数据插入到第一关键信息平衡树中。Step (2.2.2): Insert the rectangular data into the first key information balance tree.

具体地，将rects每一个矩形插入到KR-Tree中。Specifically, each rectangle of rects is inserted into the KR-Tree.

进一步的，步骤(2.2.2)包括步骤(2.2.2.1)、(2.2.2.2)、(2.2.2.3)和(2.2.2.4)。Further, step (2.2.2) includes steps (2.2.2.1), (2.2.2.2), (2.2.2.3) and (2.2.2.4).

步骤(2.2.2.1)：选择叶子结点L，以放置新条目E；Step (2.2.2.1): Select leaf node L to place new entry E;

本实施例中，通过ChooseLeaf方法选择叶子结点L以放置新的矩形记录E。步骤(2.2.2.1)包括步骤(2.2.2.1.1)、(2.2.2.1.2)、(2.2.2.1.3)、(2.2.2.1.4)。新条目E是新的矩形记录。In this embodiment, a leaf node L is selected by the ChooseLeaf method to place a new rectangular record E. Step (2.2.2.1) includes steps (2.2.2.1.1), (2.2.2.1.2), (2.2.2.1.3), (2.2.2.1.4). The new entry E is the new rectangle record.

步骤(2.2.2.1.1)：设置N为根结点。Step (2.2.2.1.1): Set N as the root node.

步骤(2.2.2.1.2)：若N为叶子结点，则直接返回N。Step (2.2.2.1.2): If N is a leaf node, return N directly.

步骤(2.2.2.1.3)：若N不为叶子结点，则遍历N中的结点，找出添加E时扩张最小的结点，并把该结点定义为F。若有多个这样的结点，则选择面积最小的结点。Step (2.2.2.1.3): If N is not a leaf node, traverse the nodes in N, find the node with the smallest expansion when adding E, and define the node as F. If there are multiple such nodes, the node with the smallest area is selected.

本实施例中，每一个结点都有最大和最小经纬度，E也有最大和最小经纬度，在这八个经纬度中，可以快速获取最大的经纬度和最小的经纬度，计算新的最大最小经纬度所形成的矩形面积，然后与旧的最大最小经纬度计算出的面积比较，可得到面积扩张最小的结点。若有多个扩张面积大小一样的结点，则选择新的面积最小的结点即可。In this embodiment, each node has maximum and minimum longitude and latitude, and E also has maximum and minimum longitude and latitude. Among these eight longitude and latitude, the maximum longitude and latitude and the minimum longitude and latitude can be quickly obtained, and the new maximum and minimum longitude and latitude can be calculated. The area of the rectangle is then compared with the area calculated by the old maximum and minimum longitude and latitude, and the node with the smallest area expansion can be obtained. If there are multiple nodes with the same expansion area, select the new node with the smallest area.

如图8A所示，非叶子结点A包括D、E、F，非叶子结点B包括G、H、I，非叶子结点C包括J、K、L、M。如图8B所示，要增加新的矩形记录X。如图8C，设置N为根结点A，若A为根结点，遍历A中的结点，找出添加X时，X的最大和最小经纬度，A结点的最大经纬度和最小经纬度。选取X和A一起的最大的经纬度和最小的经纬度，计算新的最大最小经纬度的面积，与A的最大最小经纬度计算出的面积比较，得到扩张的面积大小S1(图中虚线框)。同理，计算出X添加到B结点中后的扩张面积大小S2，计算出X添加到C结点中后的扩张的面积大小S3。S1、S2和S3中S1最小，则选取面积扩张最小的结点A，将X添加到A中，如图8D所示。As shown in FIG. 8A , the non-leaf node A includes D, E, and F, the non-leaf node B includes G, H, and I, and the non-leaf node C includes J, K, L, and M. As shown in FIG. 8B, a new rectangular record X is to be added. As shown in Figure 8C, set N as the root node A, if A is the root node, traverse the nodes in A to find the maximum and minimum latitude and longitude of X, and the maximum latitude and longitude and minimum latitude and longitude of node A when adding X. Select the maximum longitude and latitude and the minimum longitude and latitude of X and A together, calculate the area of the new maximum and minimum longitude and latitude, and compare it with the area calculated by the maximum and minimum longitude and latitude of A to obtain the expanded area size S1 (dotted box in the figure). In the same way, calculate the expanded area size S2 after X is added to the B node, and calculate the expanded area size S3 after X is added to the C node. Among S1, S2 and S3, S1 is the smallest, then select the node A with the smallest area expansion, and add X to A, as shown in Figure 8D.

步骤(2.2.2.1.4)：将N设为F，从步骤(2.2.2.1.2)开始重复操作。Step (2.2.2.1.4): Set N to F and repeat the operation from step (2.2.2.1.2).

步骤(2.2.2.2)：若L的空间大于预设空间，则在L中添加E，若L的空间不大于预设空间，则将该L分裂得到两个叶子结点L与LL，该两个叶子结点中包含有原来叶子结点L中的条目与新条目E。Step (2.2.2.2): If the space of L is larger than the preset space, add E to L; if the space of L is not larger than the preset space, split the L to obtain two leaf nodes L and LL. Each leaf node contains the entry in the original leaf node L and the new entry E.

通过SplitNode方法以获得两个叶子结点L与LL。Obtain two leaf nodes L and LL through the SplitNode method.

步骤(2.2.2.3)：对该两个叶子结点L与LL分别进行AdjustTree操作。Step (2.2.2.3): Perform the AdjustTree operation on the two leaf nodes L and LL respectively.

AdjustTree操作为叶子结点的改变向上传递至根结点以改变各个矩阵。在传递变换的过程中可能会产生结点的分裂。The AdjustTree operation is passed up to the root node for changes in the leaf nodes to change the various matrices. Splitting of nodes may occur in the process of transferring transformations.

进一步的，步骤(2.2.2.3)包括步骤(2.2.2.3.1)、(2.2.2.3.2)、(2.2.2.3.3)、(2.2.2.3.4)和(2.2.2.3.5)。Further, step (2.2.2.3) includes steps (2.2.2.3.1), (2.2.2.3.2), (2.2.2.3.3), (2.2.2.3.4) and (2.2.2.3.5) .

步骤(2.2.2.3.1)：将M设为L。Step (2.2.2.3.1): Set M to L.

步骤(2.2.2.3.2)：若M为根结点，则停止操作。Step (2.2.2.3.2): If M is the root node, stop the operation.

步骤(2.2.2.3.3)：设P为M的父结点，EN为指向父结点P中指向M的条目，调整EN以保证所有在M中的矩形都被包围。Step (2.2.2.3.3): Let P be the parent node of M, EN be the entry pointing to M in the parent node P, and adjust EN to ensure that all rectangles in M are enclosed.

步骤(2.2.2.3.4)：若M有一个被分裂产生的结点MM，则创建一个指向MM的条目EMM，若P有空间来存放EMM，则将EMM添加到P中，若没有，则对P进行分裂操作得到P和PP。Step (2.2.2.3.4): If M has a split node MM, create an entry EMM pointing to MM, if P has space to store EMM, add EMM to P, if not, then Split operation on P to get P and PP.

步骤(2.2.2.3.5)：若M等于L且发生了分裂，则把MM置为PP。Step (2.2.2.3.5): If M is equal to L and a split occurs, set MM to PP.

步骤(2.2.2.4)：若结点分裂且该分裂向上传播导致根结点分裂，则创建新的根结点，并且使该新的根结点的两个子结点分别为原来根结点分裂后的两个结点。Step (2.2.2.4): If the node splits and the split propagates upwards, causing the root node to split, a new root node is created, and the two child nodes of the new root node are split as the original root node. the last two nodes.

在一个实施例中，根据该路集合建立第二关键信息平衡树索引，包括：根据该路集合中每条路的最大和最小经纬度以及该路对应的位置值建立带有关键信息的矩形，将该带有关键信息的矩形插入到第二关键信息平衡树中。In one embodiment, establishing the second key information balanced tree index according to the road set includes: establishing a rectangle with key information according to the maximum and minimum latitude and longitude of each road in the road set and the position value corresponding to the road, This rectangle with key information is inserted into the second key information balanced tree.

将每条路都利用该条路上的最大和最小经纬度以及该条路在Roads中对应的位置值建立一个带有关键信息的矩形，并添加到矩形集合中，利用建立签到点的KR-Tree方法来建立路网的KR-Tree，将矩形集合中每一个矩形插入到第二关键信息平衡树中。与建立签到点的KR-Tree方法的区别在于叶子结点是路网每一条路的最小外接矩形，不是边长为0的签到点。Use the maximum and minimum latitude and longitude of each road and the corresponding position value of the road in Roads to create a rectangle with key information, add it to the rectangle set, and use the KR-Tree method for establishing check-in points. To build the KR-Tree of the road network, insert each rectangle in the set of rectangles into the second key information balance tree. The difference from the KR-Tree method of establishing check-in points is that the leaf nodes are the minimum circumscribed rectangles of each road in the road network, not the check-in points with side length 0.

在一个实施例中，对于某一个POI，通过倒排索引和第一关键信息平衡树索引可快速获得POI的签到点集合，并且计算出签到点集合的最小外接矩形。假设签到点在该最小外接矩形中是均匀分布的，可以计算出每一个签到点所占的最小外接矩形的边长，再针对每一个签到点，计算出搜索矩形范围内的热力值，根据热力值过滤掉错误的签到点，得到正确的签到点点集。In one embodiment, for a certain POI, the check-in point set of the POI can be quickly obtained through the inverted index and the first key information balance tree index, and the minimum circumscribed rectangle of the check-in point set is calculated. Assuming that the check-in points are evenly distributed in the minimum circumscribed rectangle, the side length of the minimum circumscribed rectangle occupied by each check-in point can be calculated, and then for each check-in point, the thermal value within the search rectangle is calculated. The value filters out the wrong check-in points to get the correct check-in point set.

根据该点集合获取各兴趣点的签到点集合的步骤包括：根据该倒排索引获取各兴趣点的签到点集合。The step of acquiring the check-in point set of each interest point according to the point set includes: acquiring the check-in point set of each interest point according to the inverted index.

具体地，利用POI的ID建立的签到点的倒排索引，获取全部的签到点集合。该签到点集合中每一项为在签到点在建立的点集合数组points中的位置值。Specifically, the inverted index of the check-in point established by the ID of the POI is used to obtain all the check-in point sets. Each item in the check-in point set is the position value of the check-in point in the established point set array points.

对该签到点集合中签到点进行过滤得到正确的签到点点集的步骤包括：The steps of filtering the check-in points in the check-in point set to obtain the correct check-in point set include:

步骤(3.1)获取签到点集合的最小外接矩形。Step (3.1) obtains the minimum circumscribed rectangle of the check-in point set.

步骤(3.1)包括步骤(3.1.1)和步骤(3.1.2)。Step (3.1) includes step (3.1.1) and step (3.1.2).

步骤(3.1.1)：根据该签到点集合获取各个签到点的经纬度，获取最大和最小的经纬度。Step (3.1.1): Acquire the latitude and longitude of each check-in point according to the set of check-in points, and obtain the maximum and minimum latitude and longitude.

利用点集合数组points，获取到每一个签到点的经纬度。遍历签到点集合，获取到最大和最小的经纬度，即签到点集合的最小外接矩形的地理位置(min_lat，max_lat，min_lng，max_lng)。Use the point set array points to obtain the latitude and longitude of each check-in point. Traverse the check-in point set to obtain the maximum and minimum latitude and longitude, that is, the geographic location (min_lat, max_lat, min_lng, max_lng) of the minimum circumscribed rectangle of the check-in point set.

步骤(3.1.2)：计算签到点集合的最小外接矩形的长和宽。Step (3.1.2): Calculate the length and width of the minimum circumscribed rectangle of the check-in point set.

根据该最大的经度减去最小的经度得到签到点集合的最小外接矩形的长度，根据该最大的纬度减去最小的纬度得到签到点集合的最小外接矩形的宽度。The length of the minimum circumscribed rectangle of the check-in point set is obtained according to the maximum longitude minus the minimum longitude, and the width of the minimum circumscribed rectangle of the check-in point set is obtained according to the maximum latitude minus the minimum latitude.

L₁＝max_lng-min_lngL ₁ =max_lng-min_lng

L₂＝max_lat-min_latL ₂ =max_lat-min_lat

其中，max_lat为最大纬度，min_lat为最小纬度，max_lng为最大经度，min_lng为最小经度。Among them, max_lat is the maximum latitude, min_lat is the minimum latitude, max_lng is the maximum longitude, and min_lng is the minimum longitude.

步骤(3.2)获取签到点所占有该最小外接矩形的边长。Step (3.2) obtains the side length of the minimum circumscribed rectangle occupied by the check-in point.

假设签到点在最小外接矩形中是均匀分布的，则对于每一个签到点，计算出它所占有的最小外接矩形的边长min_dis。Assuming that the check-in points are uniformly distributed in the minimum circumscribed rectangle, for each check-in point, the side length min_dis of the minimum circumscribed rectangle occupied by it is calculated.

其中，α为放大矩形区域的系数，可取3等，n为签到点个数。当min_dis大于最小外接矩形最大边长除以系数α时，Among them, α is the coefficient of enlarging the rectangular area, which can be 3, etc., and n is the number of check-in points. When min_dis is greater than the maximum side length of the minimum circumscribed rectangle divided by the coefficient α,

步骤(3.3)：遍历该签到点集合，计算出每一个签到点的热力值。Step (3.3): Traverse the set of check-in points, and calculate the thermal value of each check-in point.

步骤(3.3)包括步骤(3.3.1)和(3.3.2)。Step (3.3) includes steps (3.3.1) and (3.3.2).

步骤(3.3.1)：为每个签到点建立一个以签到点为中心的在第一关键信息平衡树上进行搜索的搜索矩形结点。Step (3.3.1): For each check-in point, establish a search rectangle node for searching on the first key information balance tree centered on the check-in point.

具体地，为每个签到点建立一个以签到点为中心的能在第一关键信息平衡树上进行搜索的搜索矩形结点search_rect。Specifically, a search rectangle node search_rect centered on the check-in point and capable of searching on the first key information balance tree is established for each check-in point.

计算出以当前签到点P0为中心的搜索矩形结点的最大和最小经纬度。Calculate the maximum and minimum latitude and longitude of the search rectangle node centered on the current check-in point P0.

lng₁＝P₀.lng-min_dislng ₁ =P ₀ .lng-min_dis

lng₂＝P₀.lng+min_dislng ₂ =P ₀ .lng+min_dis

其中，lat₁为搜索矩形结点的最小纬度，lat₂为搜索矩形结点的最大纬度，lng₁为搜索矩形结点的最小经度，lng₂为搜索矩形结点的最大经度。搜索矩形的关键信息为签到点上的POI的ID。Among them, lat ₁ is the minimum latitude of the search rectangle node, lat ₂ is the maximum latitude of the search rectangle node, lng ₁ is the minimum longitude of the search rectangle node, and lng ₂ is the maximum longitude of the search rectangle node. The key information of the search rectangle is the ID of the POI on the check-in point.

步骤(3.3.2)：在该第一关键信息平衡树索引上搜索并计算出在该搜索矩形结点范围内其他签到在该签到点对应的兴趣点标识上的签到点的个数，该个数作为该签到点的热力值。Step (3.3.2): search on the first key information balance tree index and calculate the number of other check-in points on the POI sign corresponding to the check-in point within the range of the search rectangle node. The number is used as the thermal value of the check-in point.

步骤(3.3.2)包括步骤(3.3.2.1)和步骤(3.3.2.2)。Step (3.3.2) includes step (3.3.2.1) and step (3.3.2.2).

步骤(3.3.2.1)：T为一颗R树的根结点。如果T是非叶子结点，且T所对应的矩形与search_rect有重合，则检查所有T中存储的条目，对于所有这些条目，搜索每一个条目所指向的子树的根结点(即T结点的子结点)。Step (3.3.2.1): T is the root node of an R tree. If T is a non-leaf node, and the rectangle corresponding to T coincides with search_rect, check all entries stored in T, and for all these entries, search for the root node of the subtree pointed to by each entry (ie, the T node child nodes).

步骤(3.3.2.2)：若T是叶子结点，且T所对应的矩形与search_rect有重合，则直接检查search_rect所指向的所有记录条目。返回POI的ID为当前的POI的ID的签到点。Step (3.3.2.2): If T is a leaf node, and the rectangle corresponding to T is coincident with search_rect, then directly check all record entries pointed to by search_rect. Returns the check-in point whose POI ID is the current POI ID.

步骤(3.4)：根据各签到点的热力值过滤错误签到点，得到正确的签到点点集。Step (3.4): Filter the wrong check-in points according to the thermal value of each check-in point to obtain the correct check-in point set.

步骤(3.4)包括步骤(3.4.1)和步骤(3.4.2)。Step (3.4) includes step (3.4.1) and step (3.4.2).

步骤(3.4.1)：若签到点集合中签到点数量大于第一数量，则热力值的阈值为第一值，若签到点集合中签到点数量不大于第一数量，则热力值的阈值为第二值，且第一阈值大于第二阈值。Step (3.4.1): If the number of check-in points in the check-in point set is greater than the first number, the threshold of the thermal value is the first value; if the number of check-in points in the check-in point set is not greater than the first number, the threshold of the thermal value is a second value, and the first threshold is greater than the second threshold.

本实施例中，第一数量可根据需要设定，如500。对于签到点集合中签到点数量大于500，则热力值的阈值θ为签到点热力值最大值乘以第一系数得到第一值。对于签到点集合中签到点数量小于500，则热力值的阈值为签到点热力值最大值乘以第二系数得到第二值。获取签到点集合中所有签到点的热力值，对各个签到点的热力值进行比较得到签到点热力值最大值。第一系数和第二系数可根据需要确定。如第一系数为0.5，第二系数为0.1。In this embodiment, the first number can be set as required, such as 500. For the number of check-in points in the check-in point set greater than 500, the threshold θ of the thermal value is the maximum value of the thermal value of the check-in point multiplied by the first coefficient to obtain the first value. If the number of check-in points in the check-in point set is less than 500, the threshold of the thermal value is the maximum value of the thermal value of the check-in point multiplied by the second coefficient to obtain the second value. Obtain the thermal value of all check-in points in the check-in point set, and compare the thermal value of each check-in point to obtain the maximum thermal value of the check-in point. The first coefficient and the second coefficient may be determined as required. For example, the first coefficient is 0.5, and the second coefficient is 0.1.

动态的根据不同的POI的签到数量来划分虚拟面的大小。The size of the virtual surface is dynamically divided according to the number of check-ins of different POIs.

步骤(3.4.2)：遍历签到点集合，将各签到点的热力值小于热力值的阈值的签到点进行清除，得到正确的签到点点集。Step (3.4.2): Traverse the set of check-in points, clear the check-in points whose thermal value of each check-in point is less than the threshold of the thermal value, and obtain a correct set of check-in points.

进一步的，遍历签到点集合，将各签到点的热力值小于热力值的阈值且小于第一阈值的签到点进行剔除。第一阈值可根据需要确定，如第一阈值为10。Further, the set of check-in points is traversed, and the check-in points whose thermal value of each check-in point is less than the threshold of the thermal value and less than the first threshold are eliminated. The first threshold can be determined as required, for example, the first threshold is 10.

在一个实施例中，该获取该正确的签到点点集的最小外接多边形的步骤包括(4.1)～(4.6)：In one embodiment, the step of obtaining the minimum circumscribed polygon of the correct check-in point set includes (4.1) to (4.6):

步骤(4.1)：定义凸包，该凸包为包含签到点点集的最小凸集。Step (4.1): Define the convex hull, which is the smallest convex set containing the check-in point set.

具体地，签到点点集

凸包记为CH(Q)，定义为

凸包CH(Q)是包含Q的最小凸集。Specifically, the check-in point set

The convex hull is denoted CH(Q) and is defined as

The convex hull CH(Q) is the smallest convex set containing Q.

步骤(4.2)：选取该签到点点集中x轴坐标最小的点或y轴坐标最小的点，记为基点P₀。Step (4.2): Select the point with the smallest x-axis coordinate or the point with the smallest y-axis coordinate in the check-in point set, and record it as the base point P ₀ .

具体地，选取该签到点点集中x轴坐标最小的点(即签到点)，若存在多个，则选择y轴坐标最小的点，即找出签到点点集Q中最左下方的点，记为基点P₀。Specifically, select the point with the smallest x-axis coordinate in the check-in point set (ie, the check-in point), if there are more than one, select the point with the smallest y-axis coordinate, that is, find the bottom leftmost point in the check-in point set Q, denoted as base point P ₀ .

步骤(4.3)：对该签到点点集中的点按照关于基点P₀的转角和与P₀的距离的字典序对该点集中的点进行排序。Step (4.3): Sort the points in the check-in point set according to the lexicographical order of the rotation angle of the base point P ₀ and the distance from P ₀ .

步骤(4.3)包括步骤(4.3.1)、步骤(4.3.2)和步骤(4.3.3)。Step (4.3) includes step (4.3.1), step (4.3.2) and step (4.3.3).

步骤(4.3.1)：对签到点进行一次坐标的转换，把基点P₀作为坐标原点，则签到点P_i(x_i,y_i)关于基点P₀的转角θ_i与距离γ_i分别为：Step (4.3.1): Perform a coordinate transformation on the check-in point, and take the base point P ₀ as the origin of the coordinates, then the rotation angle θ _i and the distance γ _i of the check-in point P _i (x _i , y _i ) about the base point P ₀ are respectively: :

步骤(4.3.2)：利用cos(θ)在[0,π]之间的单调递减特性，配置单调递减函数f(i)。Step (4.3.2): Use the monotonically decreasing characteristic of cos(θ) between [0, π] to configure the monotonically decreasing function f(i).

步骤(4.3.3)：按转角和距离的字典序排序转化成按f(i)降序和按γ_i升序的排序，简化了计算过程。Step (4.3.3): Convert the lexicographical ordering of rotation angle and distance into ordering in descending order of f(i) and ascending order of γ _i , which simplifies the calculation process.

步骤(4.4)：将排序后的签到点按照顺序入栈。Step (4.4): Push the sorted check-in points into the stack in order.

具体地，初始化P₀，P₁，P₂，分别按照顺序入栈Stack。Specifically, initialize P ₀ , P ₁ , and P ₂ , and push them into Stack in sequence.

步骤(4.5)：从排序后的第四个签到点起，若该签到点在最后入栈的相邻两个点组成向量的左侧，则该点入栈，否则最后入栈的签到点出栈。Step (4.5): Starting from the fourth check-in point after sorting, if the check-in point is on the left side of the vector formed by the last two adjacent points pushed into the stack, the point is pushed into the stack, otherwise the last check-in point pushed into the stack is out. stack.

具体地，P_i从P₃到P_n-1，若P_i严格在栈中向量

的左侧(t是栈Stack指针，P^t表示栈中第t个元素)，则P_i入栈，且i＝i+1，否则P_t出栈。Specifically, P _i goes from P ₃ to P _n-1 , if P _i is strictly a vector in the stack

The left side of (t is the stack pointer, P ^t represents the t-th element in the stack), then P _i is pushed into the stack, and i=i+1, otherwise P _t is popped out of the stack.

相邻两个点是指最后进入栈中的两个签到点。The two adjacent points refer to the last two check-in points in the stack.

左侧判定，对于平面的任意三点P₀(x₀,y₀)、P₁(x₁,y₁)、P₂(x₂,y₂)，令Judgment on the left, for any three points P ₀ (x ₀ , y ₀ ), P ₁ (x ₁ , y ₁ ), and P ₂ (x ₂ , y ₂ ) on the plane, let

若A(T)＞0，则P₂严格在

的左侧；If A(T)> ₀ , then P2 is strictly in

the left side;

若A(T)＝0，则P₂在

上；If A(T)=0, then P ₂ is

superior;

若A(T)＜0，则P₂严格在

的右侧。If A(T)< ₀ , then P2 is strictly in

on the right.

步骤(4.6)：返回栈中全部元素，得到凸包。Step (4.6): Return all elements in the stack to get the convex hull.

具体地，返回栈Stack中全部元素，得到凸包CH(Q)。Specifically, all elements in the stack are returned to obtain the convex hull CH(Q).

例如，图8为凸包算法示意图。如图8所示，凸包中包含P₀、P₁、P₂、P₃、P₄、P₅、P₆、P₇、P₈、P₉签到点，在计算凸包中签到点的最小外接多边形的过程中，选取最左最下的点为基点P₀，剩下的点按转角和距离进行排序，排序结果为P₄、P₅、P₇、P₉、P₆、P₈、P₂、P₃、P₁。将P₀、P₄、P₅入栈Stack，P₇在向量

的左侧，因此入栈。P₈在

的右侧，P₆出栈，同样在

的右侧，P₉出栈。以此类推，最后得出凸包集合{P₀,P₄，P₅,P₇,P₈,P₃,P₁}。For example, Figure 8 is a schematic diagram of the convex hull algorithm. As shown in Figure 8, the convex hull contains P ₀ , P ₁ , P ₂ , P ₃ , P ₄ , P ₅ , P ₆ , P ₇ , P ₈ , P ₉ check-in points, and the check-in points in the calculation of the convex hull are In the process of minimum circumscribed polygon, select the most left and bottom point as the base point P ₀ , the remaining points are sorted according to the angle and distance, and the sorting results are P ₄ , P ₅ , P ₇ , P ₉ , P ₆ , P ₈ , P ₂ , P ₃ , P ₁ . Push P ₀ , P ₄ , and P ₅ into Stack, and P ₇ is in the vector

the left side of , so it is pushed onto the stack. P ₈ in

On the right side of , P ₆ is popped, also in

On the right side of , P ₉ pops the stack. And so on, finally get the convex hull set {P ₀ , P ₄ , P ₅ , P ₇ , P ₈ , P ₃ , P ₁ }.

在一个实施例中，根据该路网数据对该正确的签到点点集的最小外接多边形进行边界优化的步骤包括：根据该路集合对该正确的签到点点集的最小外接多边形进行边界优化。In one embodiment, the step of performing boundary optimization on the minimum circumscribed polygon of the correct check-in point set according to the road network data includes: performing boundary optimization on the minimum circumscribed polygon of the correct check-in point set according to the road set.

根据该路集合对该正确的签到点点集的最小外接多边形进行边界优化的步骤包括：The steps of performing boundary optimization on the minimum circumscribed polygon of the correct check-in point set according to the road set include:

步骤(5.1)：计算出该凸包的最小外接矩形的地理位置，并且该最小外接矩形的关键信息为兴趣点标识；Step (5.1): Calculate the geographic location of the minimum circumscribed rectangle of the convex hull, and the key information of the minimum circumscribed rectangle is the point of interest identifier;

步骤(5.2)：获取该凸包内签到点的个数。Step (5.2): Obtain the number of check-in points in the convex hull.

具体地，同理步骤(3.3.2.1)和步骤(3.3.2.2)的第一关键信息平衡树上的搜索操作，搜索出所有的在这个最小外接矩形范围内且POI的ID为凸包PIO的ID的签到点集合，并计算出签到点个数。Specifically, in the same way as the search operations on the first key information balance tree in steps (3.3.2.1) and (3.3.2.2), search for all POIs whose ID is the convex hull PIO within this minimum circumscribed rectangle. ID check-in point collection, and calculate the number of check-in points.

T为一颗R树的根结点。如果T是非叶子结点，且T所对应的矩形与search_rect有重合，则检查所有T中存储的条目，对于所有这些条目，搜索每一个条目所指向的子树的根结点(即T结点的子结点)。T is the root node of an R-tree. If T is a non-leaf node, and the rectangle corresponding to T coincides with search_rect, check all entries stored in T, and for all these entries, search for the root node of the subtree pointed to by each entry (ie, the T node child nodes).

若T是叶子结点，且T所对应的矩形与search_rect有重合，则直接检查search_rect所指向的所有记录条目。返回POI的ID为当前的POI的ID的签到点。If T is a leaf node, and the rectangle corresponding to T overlaps with search_rect, check all records pointed to by search_rect directly. Returns the check-in point whose POI ID is the current POI ID.

步骤(5.3)：利用该第二关键信息平衡树索引搜索出该凸包的最小外接矩形内的所有路的路结果集合。Step (5.3): Use the second key information balance tree index to search out the road result set of all the roads within the minimum circumscribed rectangle of the convex hull.

具体的搜索方法同步骤(3.3.2.1)和步骤(3.3.2.2)，得到的路结果集合中，若路的数量超过预定数量，则说明凸包不适合用路网优化。该预定数量可根据需要设定，如500等。The specific search method is the same as step (3.3.2.1) and step (3.3.2.2). In the obtained road result set, if the number of roads exceeds the predetermined number, it means that the convex hull is not suitable for road network optimization. The predetermined number can be set as required, such as 500 and so on.

步骤(5.4)：获取该凸包的中心点经纬度。Step (5.4): Obtain the latitude and longitude of the center point of the convex hull.

具体地，将所有凸包的边界点上的经度和纬度求平均值，即将凸包的边界点上的经度和除以边界点个数得到平均经度，将凸包的边界点上的纬度和除以边界点个数得到平均纬度，将平均经度和平均纬度作为中心点的经纬度。Specifically, average the longitudes and latitudes on all the boundary points of the convex hull, that is, divide the sum of the longitudes on the boundary points of the convex hull by the number of boundary points to obtain the average longitude, and divide the latitude sum on the boundary points of the convex hull by the sum of the latitudes. The average latitude is obtained by the number of boundary points, and the average longitude and average latitude are used as the latitude and longitude of the center point.

其中，lat_center为中心点的纬度，lng_center为中心点的经度，n为凸包的边界点个数，lat_i为凸包上第i个边界点的纬度，lng_i为凸包上第i个边界点的经度。Among them, lat _center is the latitude of the center point, lng _center is the longitude of the center point, n is the number of boundary points on the convex hull, lat _i is the latitude of the ith boundary point on the convex hull, and lng _i is the ith boundary point on the convex hull The longitude of a boundary point.

步骤(5.5)：对于该凸包上每一点，判断该凸包中心与该凸包边界上每一点的连线是否与该路结果集合中的路进行相交；若相交，则将该点的坐标设置为两线交点的坐标，得到新的凸包，计算新的凸包中签到点的个数；若新的凸包中签到点的个数与该原来的凸包中的签到点的个数比例大于预设比例，则使两线交点为该凸包的边界点。Step (5.5): For each point on the convex hull, determine whether the line connecting the center of the convex hull and each point on the boundary of the convex hull intersects the road in the road result set; if it does, the coordinates of the point are Set to the coordinates of the intersection of the two lines, get a new convex hull, and calculate the number of check-in points in the new convex hull; if the number of check-in points in the new convex hull is the same as the number of check-in points in the original convex hull If the scale is greater than the preset scale, the intersection of the two lines will be the boundary point of the convex hull.

具体地，凸包上的每一点P_i即为凸包上的签到点。判断凸包中心P_center与凸包边界上每一点P_i的连接线是否与步骤(5.3)中路的集合当中的路Road_j(j＝1,2,…，n)进行相交。如果相交，则设置该点的坐标为交点的坐标，得到新的凸包，计算新的凸包中签到点的个数。Specifically, each point Pi on the convex _hull is a check-in point on the convex hull. Determine whether the connecting line between the convex hull center P _center and each point P _i on the convex hull boundary intersects with the road Road _j (j=1,2,...,n) in the road set in step (5.3). If it intersects, set the coordinates of the point as the coordinates of the intersection, get a new convex hull, and calculate the number of check-in points in the new convex hull.

新的凸包中的签到点的个数与原来的凸包中的签到点的个数比例大于预设比例，则保留，使得两线交点为凸包的边界点。预设比例可根据需要设定，如0.6。If the ratio of the number of check-in points in the new convex hull to the number of check-in points in the original convex hull is greater than the preset ratio, it is retained, so that the intersection of the two lines is the boundary point of the convex hull. The preset ratio can be set as required, such as 0.6.

图1010为一个实施例中虚拟面挖掘的装置的结构框图。如图1010所示，一种虚拟面挖掘的装置，包括整合模块1002、筛选模块1004、虚拟面获取模块1006和优化模块1008。其中：Figure 1010 is a structural block diagram of an apparatus for excavating virtual surfaces in one embodiment. As shown in FIG. 1010 , an apparatus for excavating virtual surfaces includes an integration module 1002 , a screening module 1004 , a virtual surface acquisition module 1006 and an optimization module 1008 . in:

整合模块1002用于获取兴趣点数据、签到点数据和路网数据，对该兴趣点数据和签到点数据进行整合得到点集合；The integration module 1002 is used for acquiring point of interest data, check-in point data and road network data, and integrating the interest point data and check-in point data to obtain a point set;

筛选模块1004用于根据该点集合获取各兴趣点的签到点集合，对该签到点集合中签到点进行过滤得到正确的签到点点集；The screening module 1004 is configured to obtain the check-in point set of each interest point according to the point set, and filter the check-in points in the check-in point set to obtain the correct check-in point set;

虚拟面获取模块1006用于获取该正确的签到点点集的最小外接多边形；The virtual surface obtaining module 1006 is used to obtain the minimum circumscribed polygon of the correct check-in point set;

优化模块1008用于根据该路网数据对该正确的签到点点集的最小外接多边形进行边界优化。The optimization module 1008 is configured to perform boundary optimization on the minimum circumscribed polygon of the correct check-in point set according to the road network data.

上述虚拟面挖掘的装置，通过获取兴趣点数据、签到点数据和路网数据，对兴趣点数据和签到点数据进行整合得到点集合，对点集合中各兴趣点对应的签到点进行过滤得到正确的签到点点集，获取该签到点点集的最小外接多边形，采用路网对最小外接多边形进行边界优化，该最小外接多边形即为兴趣点的虚拟面，实现了对虚拟面的挖掘。The above-mentioned virtual surface mining device obtains the point set by acquiring the point of interest data, the check-in point data and the road network data, integrates the point of interest data and the check-in point data to obtain a point set, and filters the check-in points corresponding to each point of interest in the point set to obtain the correct value. The minimum circumscribed polygon of the check-in point set is obtained, and the road network is used to optimize the boundary of the minimal circumscribed polygon. The minimal circumscribed polygon is the virtual surface of the point of interest, which realizes the mining of the virtual surface.

图11为另一个实施例中虚拟面挖掘的装置的结构框图。如图11所示，一种虚拟面挖掘的装置，除了包括整合模块1002、筛选模块1004、虚拟面获取模块1006和优化模块1008，还包括索引建立模块1010。其中：FIG. 11 is a structural block diagram of an apparatus for excavating virtual surfaces in another embodiment. As shown in FIG. 11 , an apparatus for virtual surface mining includes an index building module 1010 in addition to an integration module 1002 , a screening module 1004 , a virtual surface acquisition module 1006 and an optimization module 1008 . in:

该整合模块1002还用于对该路网数据进行结构化处理得到每条路的路标识、每条路包含的结点信息和每条路中结点的最大和最小经纬度，并记录在路集合中；The integration module 1002 is further configured to perform structured processing on the road network data to obtain the road identification of each road, the node information contained in each road, and the maximum and minimum latitude and longitude of the nodes in each road, and record them in the road set middle;

索引建立模块1010用于根据该点集合建立倒排索引和第一关键信息平衡树索引，根据该路集合建立第二关键信息平衡树索引；The index establishment module 1010 is configured to establish an inverted index and a first key information balanced tree index according to the point set, and establish a second key information balanced tree index according to the road set;

该筛选模块1004还用于根据该倒排索引获取各兴趣点的签到点集合；The screening module 1004 is further configured to obtain the check-in point set of each point of interest according to the inverted index;

该优化模块1008还用于根据该路集合对该正确的签到点点集的最小外接多边形进行边界优化。The optimization module 1008 is further configured to perform boundary optimization on the minimum circumscribed polygon of the correct check-in point set according to the road set.

在一个实施例中，该索引建立模块1010还用于根据该点集合建立倒排索引，包括：In one embodiment, the index establishment module 1010 is further configured to establish an inverted index according to the point set, including:

建立倒排索引，其中，键为兴趣点标识，键值为集合数组，该集合数组中每一项为该兴趣点对应的签到点在该点集合中的位置值；Establish an inverted index, wherein the key is the point of interest identifier, the key value is a collection array, and each item in the collection array is the position value of the check-in point corresponding to the interest point in the point collection;

根据该点集合建立第一关键信息平衡树索引，包括：The first key information balanced tree index is established according to the point set, including:

将签到点数据改成带关键信息的矩形数据；Change the check-in point data to rectangular data with key information;

将该矩形数据插入到第一关键信息平衡树中；inserting the rectangular data into the first key information balance tree;

以及根据该路集合建立第二关键信息平衡树索引，包括：and establishing a second key information balanced tree index according to the road set, including:

根据该路集合中每条路的最大和最小经纬度以及该路对应的位置值建立带有关键信息的矩形，将该带有关键信息的矩形插入到第二关键信息平衡树中。According to the maximum and minimum latitude and longitude of each road in the road set and the corresponding position value of the road, a rectangle with key information is established, and the rectangle with key information is inserted into the second key information balance tree.

在一个实施例中，整合模块1002整理该兴趣点数据；将兴趣点的属性和签到点的属性进行统一结构化，并将统一结构化后的兴趣点的属性和签到点的属性记录在点集合数组中。In one embodiment, the integration module 1002 organizes the POI data; performs unified structuring of the attributes of the POIs and the attributes of the check-in points, and records the attributes of the unified structured POIs and the attributes of the check-in points in the point set in the array.

整合模块1002还用于利用第一哈希集合统计签到点，其中，第一哈希集合的键为兴趣点标识，键值为签到点个数；The integration module 1002 is further configured to use the first hash set to count check-in points, wherein the key of the first hash set is the point of interest identifier, and the key value is the number of check-in points;

扫描兴趣点数据，若不存在当前兴趣点的兴趣点标识键或者当前兴趣点的兴趣点标识键对应的键值小于等于第一阈值，则清除当前兴趣点，并将该清除的兴趣点的兴趣点标识记录在第二哈希集合中；Scan the POI data, if there is no POI identification key of the current POI or the key value corresponding to the POI identification key of the current POI is less than or equal to the first threshold, then clear the current POI, and use the cleared POI's interest The point identification is recorded in the second hash set;

读取整理后的兴趣点数据得到兴趣点的属性，将该兴趣点的属性进行统一结构化；Read the sorted POI data to obtain the attributes of the POI, and uniformly structure the attributes of the POI;

读取签到点数据，若签到点对应的兴趣点的兴趣点标识在该第二哈希集合中，则不将该签到点加入点集合数组中，若该签到点与对应的兴趣点之间的距离满足预设条件的签到点清除。Read the check-in point data, if the point-of-interest identifier of the point of interest corresponding to the check-in point is in the second hash set, the check-in point will not be added to the point set array. Clear the check-in points that meet the preset conditions.

该将该矩形数据插入到第一关键信息平衡树中，包括：The rectangular data is inserted into the first key information balance tree, including:

选择叶子结点L，以放置新条目E；Select leaf node L to place new entry E;

若L的空间大于预设空间，则在L中添加E，若L的空间不大于预设空间，则将该L分裂得到两个叶子结点L与LL，该两个叶子结点中包含有原来叶子结点L中的条目与新条目E；If the space of L is larger than the preset space, add E to L, if the space of L is not larger than the preset space, then split the L to obtain two leaf nodes L and LL, and the two leaf nodes contain The entry in the original leaf node L and the new entry E;

对该两个叶子结点L与LL分别进行AdjustTree操作，若结点分裂且该分裂向上传播导致根结点分裂，则创建新的根结点，并且使该新的根结点的两个子结点分别为原来根结点分裂后的两个结点。Perform the AdjustTree operation on the two leaf nodes L and LL respectively. If the node splits and the split propagates upwards and the root node splits, a new root node is created, and the two child nodes of the new root node are created. The points are the two nodes after the original root node is split.

该选择叶子结点L，以放置新条目E的步骤包括：The step of selecting a leaf node L to place a new entry E includes:

设置N为根结点；Set N as the root node;

重复执行以下步骤：Repeat the following steps:

若N为叶子结点，则直接返回N；If N is a leaf node, return N directly;

若N不为叶子结点，则遍历N中的结点，找出添加E时扩张最小的结点，并将该结点定义为F；If N is not a leaf node, traverse the nodes in N, find the node with the smallest expansion when adding E, and define the node as F;

将N设为F。Let N be F.

对该两个叶子结点L与LL分别进行AdjustTree操作的步骤包括：The steps of respectively performing the AdjustTree operation on the two leaf nodes L and LL include:

将M设为L；Set M to L;

重复执行以下步骤：Repeat the following steps:

若M为根结点，则停止操作；If M is the root node, stop the operation;

设P为M的父结点，EN为指向父结点P中指向M的条目，调整EN以保证所有在M中的矩形都被包围；Let P be the parent node of M, EN be the entry pointing to M in the parent node P, and adjust EN to ensure that all rectangles in M are enclosed;

若M有一个被分裂产生的结点MM，则创建一个指向MM的条目EMM，若P有空间来存放EMM，则将EMM添加到P中，若没有，则对P进行分裂操作得到P和PP；If M has a split node MM, create an entry EMM pointing to MM, if P has space to store EMM, add EMM to P, if not, split P to get P and PP ;

若M等于L且发生了分裂，则把MM置为PP。If M is equal to L and a split occurs, set MM to PP.

筛选模块1004还用于对该签到点集合中签到点进行过滤得到正确的签到点点集，包括：The screening module 1004 is further configured to filter the check-in points in the check-in point set to obtain the correct check-in point set, including:

获取签到点集合的最小外接矩形；Get the minimum circumscribed rectangle of the check-in point set;

获取签到点所占有该最小外接矩形的边长；Get the side length of the minimum circumscribed rectangle occupied by the check-in point;

遍历该签到点集合，计算出每一个签到点的热力值；Traverse the set of check-in points and calculate the thermal value of each check-in point;

根据各签到点的热力值过滤错误签到点，得到正确的签到点点集。According to the calorific value of each check-in point, the wrong check-in points are filtered, and the correct check-in point set is obtained.

在一个实施例中，该获取签到点集合的最小外接矩形，包括：In one embodiment, the obtaining the minimum circumscribed rectangle of the check-in point set includes:

根据该签到点集合获取各个签到点的经纬度，获取最大和最小的经纬度；Obtain the latitude and longitude of each check-in point according to the set of check-in points, and obtain the maximum and minimum latitude and longitude;

在一个实施例中，该遍历该签到点集合，计算出每一个签到点的热力值，包括：In one embodiment, the check-in point set is traversed to calculate the thermal value of each check-in point, including:

为每个签到点建立一个以签到点为中心的在第一关键信息平衡树上进行搜索的搜索矩形结点；For each check-in point, establish a search rectangle node for searching on the first key information balance tree centered on the check-in point;

在该第一关键信息平衡树索引上搜索并计算出在该搜索矩形结点范围内其他签到在该签到点对应的兴趣点标识上的签到点的个数，该个数作为该签到点的热力值。Search on the first key information balance tree index and calculate the number of other check-in points on the POI identifier corresponding to the check-in point within the range of the search rectangle node, and the number is used as the thermal power of the check-in point value.

在一个实施例中，根据各签到点的热力值过滤错误签到点，得到正确的签到点点集，包括：In one embodiment, the wrong check-in points are filtered according to the thermal value of each check-in point to obtain a correct check-in point set, including:

若签到点集合中签到点数量大于第一数量，则热力值的阈值为第一值，若签到点集合中签到点数量不大于第一数量，则热力值的阈值为第二值，且第一阈值大于第二阈值；If the number of check-in points in the check-in point set is greater than the first number, the threshold of the thermal value is the first value; if the number of check-in points in the check-in point set is not greater than the first number, the threshold of the thermal value is the second value, and the first value The threshold is greater than the second threshold;

遍历签到点集合，将各签到点的热力值小于热力值的阈值的签到点进行清除，得到正确的签到点点集。Traverse the set of check-in points, clear the check-in points whose thermal value of each check-in point is less than the threshold of thermal value, and obtain the correct set of check-in points.

在一个实施例中，虚拟面获取模块1006获取该正确的签到点点集的最小外接多边形，包括：In one embodiment, the virtual surface obtaining module 1006 obtains the minimum circumscribed polygon of the correct check-in point set, including:

定义凸包，该凸包为包含签到点点集的最小凸集；Define the convex hull, which is the smallest convex set containing the check-in point set;

选取该签到点点集中x轴坐标最小的点或y轴坐标最小的点，记为基点P₀；Select the point with the smallest x-axis coordinate or the point with the smallest y-axis coordinate in the check-in point set, and record it as the base point P ₀ ;

对该签到点点集中的点按照关于基点P₀的转角和与P₀的距离的字典序对该签到点点集中的签到点进行排序；Sort the check-in points in the check-in point set according to the lexicographical order of the corner of the base point P ₀ and the distance from P ₀ ;

将排序后的签到点按照顺序入栈；Push the sorted check-in points into the stack in order;

从排序后的第四个签到点起，若该点在最后入栈的相邻两个点组成向量的左侧，则该点入栈，否则最后入栈的签到点出栈；Starting from the fourth check-in point after sorting, if the point is on the left side of the vector formed by the last two adjacent points pushed into the stack, the point is pushed into the stack, otherwise the last push-in check-in point is pushed out of the stack;

返回栈中全部元素，得到凸包。Return all elements in the stack to get the convex hull.

优化模块1008还用于根据该路集合对该正确的签到点点集的最小外接多边形进行边界优化，包括：The optimization module 1008 is further configured to perform boundary optimization on the minimum circumscribed polygon of the correct check-in point set according to the road set, including:

计算出该凸包的最小外接矩形的地理位置，并且该最小外接矩形的关键信息为兴趣点标识；Calculate the geographic location of the smallest circumscribed rectangle of the convex hull, and the key information of the smallest circumscribed rectangle is the point of interest identifier;

获取该凸包内签到点的个数；Get the number of check-in points in the convex hull;

利用该第二关键信息平衡树索引搜索出该凸包的最小外接矩形内的所有路的路结果集合；Use the second key information balanced tree index to search out the road result set of all roads in the minimum circumscribed rectangle of the convex hull;

获取该凸包的中心点经纬度；Get the latitude and longitude of the center point of the convex hull;

对于该凸包上每一点，判断该凸包中心与该凸包边界上每一点的连线是否与该路结果集合中的路进行相交；For each point on the convex hull, determine whether the line connecting the center of the convex hull and each point on the boundary of the convex hull intersects the road in the road result set;

若相交，则将该点的坐标设置为两线交点的坐标，得到新的凸包，计算新的凸包中签到点的个数；If it intersects, set the coordinates of the point as the coordinates of the intersection of the two lines, get a new convex hull, and calculate the number of check-in points in the new convex hull;

若新的凸包中签到点的个数与原来的凸包中的签到点的个数比例大于预设比例，则使两线交点为该凸包的边界点。If the ratio of the number of check-in points in the new convex hull to the number of check-in points in the original convex hull is greater than the preset ratio, the intersection of the two lines is set as the boundary point of the convex hull.

本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程，是可以通过计算机程序来指令相关的硬件来完成，所述的程序可存储于一非易失性计算机可读取存储介质中，该程序在执行时，可包括如上述各方法的实施例的流程。其中，所述的存储介质可为磁碟、光盘、只读存储记忆体(Read-Only Memory，ROM)等。Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing relevant hardware through a computer program, and the program can be stored in a non-volatile computer-readable storage medium , when the program is executed, it may include the flow of the above-mentioned method embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM), or the like.

以上所述实施例仅表达了本发明的几种实施方式，其描述较为具体和详细，但并不能因此而理解为对本发明专利范围的限制。应当指出的是，对于本领域的普通技术人员来说，在不脱离本发明构思的前提下，还可以做出若干变形和改进，这些都属于本发明的保护范围。因此，本发明专利的保护范围应以所附权利要求为准。The above-mentioned embodiments only represent several embodiments of the present invention, and the descriptions thereof are specific and detailed, but should not be construed as limiting the scope of the patent of the present invention. It should be noted that, for those skilled in the art, without departing from the concept of the present invention, several modifications and improvements can be made, which all belong to the protection scope of the present invention. Therefore, the protection scope of the patent of the present invention should be subject to the appended claims.

Claims

1. A method for virtual face mining, comprising:

Obtaining point-of-interest data, check-in point data and road network data, and integrating the point-of-interest data and check-in point data to obtain a point set;

Obtain the check-in point set of each interest point according to the point set, and filter the check-in points in the check-in point set to obtain the correct check-in point set;

Obtain the minimum circumscribed polygon of the correct check-in point set;

According to the road network data, boundary optimization is performed on the minimum circumscribed polygon of the correct check-in point set to obtain the virtual surface of each interest point.

2. The method according to claim 1, wherein the step of integrating the point of interest data and the check-in point data to obtain a point set comprises:

collating said point of interest data;

The attributes of the points of interest and the attributes of the check-in points are unifiedly structured, and the attributes of the unified structuring points of interest and the attributes of the check-in points are recorded in the point set array.

3. The method according to claim 2, wherein the step of organizing the POI data comprises:

Use the first hash set to count check-in points, wherein the key of the first hash set is the identifier of the point of interest, and the key value is the number of check-in points;

Scan the POI data, if there is no POI identification key of the current POI or the key value corresponding to the POI identification key of the current POI is less than or equal to the first threshold, clear the current POI, and use the deleted POI to The POI identifier is recorded in the second hash set;

The step of uniformly structuring the attributes of the points of interest and the attributes of the check-in points, and recording the attributes of the unified structuring points of interest and the attributes of the check-in points in the point set array includes:

Reading the sorted POI data to obtain the attributes of the POIs, and uniformly structuring the attributes of the POIs;

Read the check-in point data, if the point-of-interest identifier of the point of interest corresponding to the check-in point is in the second hash set, then the check-in point is not added to the point set array, if the check-in point and the corresponding point of interest are Check-in points whose distances meet the preset conditions are cleared.

4. The method according to claim 1, wherein after the step of integrating the point of interest data and the check-in point data to obtain a point set, the method further comprises:

Structuring the road network data to obtain the road identification of each road, the node information contained in each road, and the maximum and minimum latitude and longitude of the nodes in each road, and record them in the road set;

Establish an inverted index and a first key information balanced tree index according to the point set, and establish a second key information balanced tree index according to the road set;

The step of acquiring the check-in point set of each interest point according to the point set includes:

Obtain the check-in point set of each interest point according to the inverted index;

The step of performing boundary optimization on the minimum circumscribed polygon of the correct check-in point set according to the road network data includes:

Boundary optimization is performed on the minimum circumscribed polygon of the correct check-in point set according to the road set.

5. The method according to claim 4, wherein establishing an inverted index according to the point set comprises:

establishing an inverted index, wherein the key is a point of interest identifier, the key value is a point set, and each item in the point set is the position value of the check-in point corresponding to the interest point in the point set;

Establishing a first key information balanced tree index according to the point set, including:

Change the check-in point data to rectangular data with key information;

inserting the rectangular data into the first key information balance tree;

Establishing a second key information balanced tree index according to the road set, including:

A rectangle with key information is established according to the maximum and minimum latitude and longitude of each road in the road set and the position value corresponding to the road, and the rectangle with key information is inserted into the second key information balance tree.

6. The method according to claim 5, wherein the step of inserting the rectangular data into the first key information balance tree comprises:

Select leaf node L to place new entry E;

If the space of L is larger than the preset space, add E to L. If the space of L is not larger than the preset space, then split the L to obtain two leaf nodes L and LL. Contains the entry in the original leaf node L and the new entry E;

Perform the AdjustTree operation on the two leaf nodes L and LL respectively. If the node splits and the split propagates upwards and causes the root node to split, a new root node is created, and the new root node is The two child nodes are the two nodes after the original root node is split.

7. The method according to claim 6, wherein the step of selecting a leaf node L to place a new entry E comprises:

Set N as the root node;

Repeat the following steps:

If N is a leaf node, return N directly;

If N is not a leaf node, traverse the nodes in N, find the node with the smallest expansion when adding E, and define the node as F;

Let N be F.

8. method according to claim 7, is characterized in that, the step that described two leaf nodes L and LL are carried out AdjustTree operation respectively comprises:

Set M to L;

Repeat the following steps:

If M is the root node, stop the operation;

Let P be the parent node of M, EN be the entry pointing to M in the parent node P, and adjust EN to ensure that all rectangles in M are enclosed;

If M has a split node MM, create an entry EMM pointing to MM, if P has space to store EMM, add EMM to P, if not, split P to get P and PP ;

If M is equal to L and a split occurs, set MM to PP.

9. The method according to claim 4, wherein the step of filtering the check-in points in the check-in point set to obtain the correct check-in point set comprises:

Get the minimum circumscribed rectangle of the check-in point set;

Obtain the side length of the minimum circumscribed rectangle occupied by the check-in point;

Traverse the set of check-in points, and calculate the thermal value of each check-in point;

According to the calorific value of each check-in point, the wrong check-in points are filtered, and the correct check-in point set is obtained.

10. The method according to claim 9, wherein the step of obtaining the minimum circumscribed rectangle of the check-in point set comprises:

Obtain the latitude and longitude of each check-in point according to the set of check-in points, and obtain the maximum and minimum latitude and longitude;

The length of the minimum circumscribed rectangle of the check-in point set is obtained according to the maximum longitude minus the minimum longitude, and the width of the minimum circumscribed rectangle of the check-in point set is obtained according to the maximum latitude minus the minimum latitude.

11. The method according to claim 9, wherein the step of traversing the set of check-in points and calculating the thermal value of each check-in point comprises:

For each check-in point, establish a search rectangle node for searching on the first key information balance tree centered on the check-in point;

Search on the first key information balance tree index and calculate the number of other check-in points within the range of the search rectangle node that are registered on the POI identifier corresponding to the check-in point, and the number is used as the number of check-in points. Describe the thermal value of the check-in point.

12. The method according to claim 9, wherein the step of filtering the wrong check-in points according to the thermal value of each check-in point to obtain the correct check-in point set comprises:

If the number of check-in points in the check-in point set is greater than the first number, the threshold of the thermal value is the first value; if the number of check-in points in the check-in point set is not greater than the first number, the threshold of the thermal value is the second value, and the first value The threshold is greater than the second threshold;

Traverse the set of check-in points, clear the check-in points whose thermal value of each check-in point is less than the threshold of thermal value, and obtain the correct set of check-in points.

13. The method according to claim 4, wherein the step of obtaining the minimum circumscribed polygon of the correct check-in point set comprises:

Define a convex hull, the convex hull is the smallest convex set containing the check-in point set;

Select the point with the smallest x-axis coordinate or the point with the smallest y-axis coordinate in the check-in point set, and record it as the base point P ₀ ;

Sort the check-in points in the check-in point set according to the lexicographical order of the corner of the base point P ₀ and the distance from P ₀ to the points in the check-in point set;

Push the sorted check-in points into the stack in order;

From the fourth check-in point after sorting, if the point is on the left side of the vector formed by the last two adjacent points pushed into the stack, then the check-in point is pushed into the stack, otherwise the last check-in point pushed into the stack is pushed out of the stack;

Return all elements in the stack to get the convex hull.

14. The method according to claim 13, wherein the step of performing boundary optimization on the minimum circumscribed polygon of the correct check-in point set according to the road set comprises:

Calculate the geographic location of the minimum circumscribed rectangle of the convex hull, and the key information of the minimum circumscribed rectangle is the point of interest identifier;

Obtain the number of check-in points in the convex hull;

Use the second key information balanced tree index to search out the road result set of all roads within the minimum circumscribed rectangle of the convex hull;

Obtain the latitude and longitude of the center point of the convex hull;

For each point on the convex hull, determine whether the line connecting the center of the convex hull and each point on the boundary of the convex hull intersects the road in the road result set;

If it intersects, set the coordinates of the point as the coordinates of the intersection of the two lines, obtain a new convex hull, and calculate the number of check-in points in the new convex hull;

If the ratio of the number of check-in points in the new convex hull to the number of check-in points in the original convex hull is greater than the preset ratio, the intersection of the two lines is set as the boundary point of the convex hull.

15. A device for virtual surface excavation, comprising:

an integration module, used for acquiring point-of-interest data, check-in point data and road network data, and integrating the point-of-interest data and check-in point data to obtain a point set;

a screening module, configured to obtain a check-in point set of each interest point according to the point set, and filter the check-in points in the check-in point set to obtain a correct check-in point set;

a virtual surface acquisition module, used for acquiring the minimum circumscribed polygon of the correct check-in point set;

The optimization module is configured to perform boundary optimization on the minimum circumscribed polygon of the correct check-in point set according to the road network data to obtain the virtual surface of each interest point.

16. The apparatus of claim 15, wherein the apparatus further comprises:

The integration module is also used to perform structured processing on the road network data to obtain the road identification of each road, the node information contained in each road, and the maximum and minimum latitude and longitude of the nodes in each road, and record them on the road. in the collection;

an index establishment module, configured to establish an inverted index and a first key information balanced tree index according to the point set, and establish a second key information balanced tree index according to the road set;

The screening module is further configured to obtain the check-in point set of each interest point according to the inverted index;

The optimization module is further configured to perform boundary optimization on the minimum circumscribed polygon of the correct check-in point set according to the road set.

17. The apparatus according to claim 16, wherein the index establishment module is further configured to establish an inverted index according to the point set, comprising:

establishing an inverted index, wherein the key is the point of interest identifier, the key value is a set array, and each item in the set array is the position value of the check-in point corresponding to the interest point in the point set;

Change the check-in point data to rectangular data with key information;

inserting the rectangular data into the first key information balance tree;

and establishing a second key information balanced tree index according to the road set, including:

18. The apparatus according to claim 17, wherein the index establishment module is used for inserting the rectangular data into the first key information balanced tree, comprising:

Select leaf node L to place new entry E;

19. The apparatus according to claim 18, wherein the index establishment module is used to select a leaf node L to place a new entry E, comprising:

Used to set N as the root node;

Repeat the following steps:

If N is a leaf node, return N directly;

Let N be F.

20. device according to claim 19, is characterized in that, described index establishment module is used to respectively carry out AdjustTree operation to described two leaf nodes L and LL, comprising:

Let M be L;

Repeat the following steps:

If M is the root node, stop the operation;

If M is equal to L and a split occurs, set MM to PP.

21. The apparatus according to claim 16, wherein the screening module is configured to filter the check-in points in the check-in point set to obtain a correct check-in point set, comprising:

Get the minimum circumscribed rectangle of the check-in point set;

22. The device according to claim 21, wherein the screening module is configured to obtain the minimum circumscribed rectangle of the check-in point set, comprising:

23. The apparatus according to claim 21, wherein the screening module is configured to traverse the set of check-in points, and calculate the thermal value of each check-in point, comprising:

24. The device according to claim 21, wherein the screening module is used to filter the wrong check-in points according to the thermal value of each check-in point to obtain a correct check-in point set, comprising:

25. The apparatus according to claim 16, wherein the virtual surface obtaining module is configured to obtain the minimum circumscribed polygon of the correct check-in point set, comprising:

Push the sorted check-in points into the stack in order;

Return all elements in the stack to get the convex hull.

26. The apparatus according to claim 25, wherein the optimization module is further configured to perform boundary optimization on the minimum circumscribed polygon of the correct check-in point set according to the road set, comprising:

Obtain the number of check-in points in the convex hull;

Obtain the latitude and longitude of the center point of the convex hull;

27. The device according to claim 15, wherein the integration module is configured to integrate the point of interest data and the check-in point data to obtain a point set, comprising:

collating said point of interest data;

28. The apparatus according to claim 18, wherein the integration module is configured to organize the POI data, comprising:

29. A terminal comprising a memory and a processor, wherein the memory stores a computer program, wherein the processor implements the steps of the method according to any one of claims 1 to 14 when the processor executes the computer program .

30. A non-volatile computer-readable storage medium storing a computer program, wherein the computer program implements the steps of the method according to any one of claims 1 to 14 when the computer program is executed by a processor.