CN109508815B - Method for measuring and analyzing school-commuting activity space based on subway IC card data
- Publication number
- CN109508815B (application CN201811224346.4A)
- Authority
- CN
- China
- Prior art keywords
- school commuting
- travel
- data
- activity space
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
- G06Q10/06315—Needs-based resource requirements planning or analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/20—Education
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Human Resources & Organizations (AREA)
- Tourism & Hospitality (AREA)
- Physics & Mathematics (AREA)
- Strategic Management (AREA)
- Theoretical Computer Science (AREA)
- Economics (AREA)
- General Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- Marketing (AREA)
- Data Mining & Analysis (AREA)
- Development Economics (AREA)
- Educational Administration (AREA)
- Entrepreneurship & Innovation (AREA)
- Quality & Reliability (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Health & Medical Sciences (AREA)
- Operations Research (AREA)
- Game Theory and Decision Science (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Evolutionary Biology (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Educational Technology (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a method for measuring and analyzing school-commuting activity space based on subway IC card data, belonging to the field of traffic data mining. The method uses subway IC card swipe data to measure and analyze the activity space of school commuters and extracts the corresponding measurement indicators, yielding a quantitative analysis of the activity space of the school-commuting group. The invention is the first to mine activity-space measurement indicators for school-commuting trips from IC card data, and it classifies school commuters according to their commuting characteristics, so that the resulting classes show distinct commuting characteristics. It exploits the objectivity of IC card data and provides a basis for studying how different school-commuting patterns form.
Description
Technical field
The invention relates to public transport data mining methods, and in particular to a method for measuring and analyzing the activity space of subway school commuters based on subway IC card data.
Background
School commuters are an important part of the traveling population. In China, it is common for primary and secondary school students to live far from their schools, so they need transport for medium- and long-distance trips. Surface traffic is congested during the hours when students travel, so the subway has become an important mode of transport for them. Using activity space to study how school commuters travel by rail transit not only reveals the spatiotemporal characteristics of their travel behavior but also makes it easier to grasp, at the macro level, the intrinsic links between their activities. It also provides a deeper understanding of rail transit usage and of the city's existing space.
Compared with traditional traffic survey data, subway IC card data are accurate, large in sample size, broad in coverage, timely, and inexpensive to obtain, and thus provide a better data basis for studying travelers' spatiotemporal behavior. Existing research nevertheless has many shortcomings. For example: individual activity space is mostly measured with traditional data, in which travel activity information is missing, spatiotemporal information is imprecise, or the observation period is short; studies that use trajectory data also cover short periods, so their characteristic analysis is inaccurate and they do not fully exploit the comprehensiveness of big data; existing research mainly explores the activity space of work commuters and ignores primary and secondary school students commuting to school, whose travel times and locations are relatively fixed and who have a certain impact on urban spatial structure; many studies in China and abroad combine activity space with other issues, such as job-housing distance and community differentiation, but pay little attention to the relationship between the shape, area, and intensity of the activity space itself and travelers or urban spatial structure. As a result, their findings can only address specific problems; they cannot evaluate urban spatial structure, nor can they relate the traffic load generated by a large travel group to urban public transport construction so as to provide a reliable basis for adjusting current plans and making future ones. No prior art has studied the measurement and analysis of school-commuting activity space.
Summary of the invention
Purpose of the invention: In view of the deficiencies of the prior art, the invention proposes a method for measuring and analyzing school-commuting activity space based on subway IC card data.
Technical solution: To achieve the above purpose, the method of the invention uses subway card swipe data to measure and analyze the activity space of school commuters, extracts the corresponding measurement indicators, and analyzes the similarities and differences among the activity spaces of the school-commuting group. The method comprises the following steps:
(1) Obtain subway IC card swipe data and the latitude and longitude of subway stations, extract the information relevant to school commuting from the swipe data, match the coordinates to their spatial positions on a geographic map, identify the school commuters, and build a travel database of school commuters;
(2) Preprocess the travel data of the school commuters;
(3) Analyze the spatiotemporal travel characteristics of the school commuters and define their activity space;
(4) Based on that definition, extract activity-space measurement indicators along the two dimensions of "time" and "space";
(5) Perform cluster analysis on the measurement indicator data, dividing the school commuters into classes according to the activity-space indicators, to obtain the activity-space patterns of school commuters traveling by subway.
Preferably, the spatial location of the subway stations in step (1) is matched as follows: a collector is used to crawl the latitude and longitude of the subway stations from an electronic map, a universal coordinate converter is used to convert them to the WGS-84 coordinate system, and the location of each subway station is thereby obtained.
Preferably, the travel data preprocessing in step (2) mainly comprises:
21) Extract the travel data of the school-commuting group separately for working days and for holidays (including winter and summer vacations), and compute the travel times, trip origins and destinations, and number of trips in each case;
22) Extract the trips that have a school station as an endpoint, match them by IC card number, and compute the travel times, frequencies, and endpoint location distribution of the school-commuting group.
Preferably, step (3) comprises: comparing the school-commuter statistics from steps 21) and 22) with the corresponding data for other social groups to obtain the spatiotemporal characteristics of the school-commuting group; because their travel distances differ by direction, defining their activity space as an ellipse with home as the focus and with major and minor axis lengths computed from the distribution of travel points; and, because their travel times are fixed, adding commuting frequency and commuting distance so that these jointly define the activity space of the school-commuting group.
Preferably, the activity-space measurement indicators extracted along the time and space dimensions in step (4) mainly comprise: ① a confidence ellipse is constructed, and its flattening and area serve as spatial indicators of the activity space; ② commuting distance serves as a spatial indicator of the activity space; ③ commuting frequency serves as a temporal indicator of the activity space.
Preferably, in step (4) a confidence ellipse is built from the spatial coordinates of all activity points and the activity-space measurement indicators are extracted, mainly through the following steps:
41) Output the confidence ellipse at a 95% confidence level, obtaining the center coordinates, semi-major and semi-minor axis lengths, and rotation angle of the confidence ellipse formed by each individual's activity points;
42) Compute the characteristic indicators of the confidence ellipse: export the attribute table of the ellipses in the layer and introduce two indicators, flattening and area, to describe the activity space. The flattening α of the ellipse is calculated as:
α=(a−b)/a
where a is the semi-major axis and b is the semi-minor axis of the ellipse. The larger α is, the flatter the ellipse: the activity points are strongly directional and very likely lie on the same subway line. The smaller α is, the rounder the ellipse and the more spatially dispersed the activity points;
The area of the ellipse is calculated by the formula S=π·a·b; the larger the area, the wider the daily activity range of the primary or secondary school student;
43) Add commuting frequency and commuting distance to complete the set of activity-space indicators, where
commuting frequency is obtained by counting trips per IC card face number in the established school-commuter travel database;
commuting distance is computed from the station latitude and longitude according to the Euclidean distance formula, as follows:
|d12|=6368.16×arccos(sinX+cosX)
where long1, long2, lat1, and lat2 are the longitudes and latitudes of the home and school subway stations.
Preferably, step 41) specifically comprises:
411) Extract from the travel database the IC card face number, entry station number, and exit station number together with their latitudes and longitudes, and merge the entry and exit station numbers to obtain each traveler's activity points;
412) After adding the subway station coordinates in ArcGIS, convert all coordinates, including home and school locations, from the WGS84 geodetic reference system to Gauss-Krüger coordinates by projection transformation;
413) Select the Directional Distribution tool in ArcGIS, input the converted coordinates, and output the confidence ellipse at a 95% confidence level, obtaining for each individual the center coordinates, semi-major and semi-minor axis lengths, and rotation angle of the confidence ellipse formed by one week of activity points.
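Steps 411) to 413) can be illustrated with a minimal Python sketch. It is not the ArcGIS Directional Distribution tool, whose exact computation may differ; it is a covariance-based standard deviational ellipse written under stated assumptions. The input layout `(card_id, entry_xy, exit_xy)`, the function names, and the use of already projected coordinates in metres are all hypothetical, and `n_std=2.0` follows the source's choice of 2 standard deviations for the 95% ellipse.

```python
import math


def activity_points(trips):
    """Merge entry and exit stations into one point list per card (step 411).

    `trips` is a hypothetical iterable of (card_id, entry_xy, exit_xy),
    where each *_xy is a projected (x, y) pair, e.g. Gauss-Krueger metres.
    """
    points = {}
    for card_id, entry_xy, exit_xy in trips:
        points.setdefault(card_id, []).extend([entry_xy, exit_xy])
    return points


def deviational_ellipse(points, n_std=2.0):
    """Return (center, semi_major, semi_minor, rotation_deg) for one card.

    The axes are n_std times the square roots of the eigenvalues of the
    2x2 covariance matrix of the points. The rotation is the angle of the
    major axis measured counterclockwise from the x-axis (an assumption;
    GIS tools often report it clockwise from north instead).
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two activity points")
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    mean = (sxx + syy) / 2.0
    diff = math.hypot((sxx - syy) / 2.0, sxy)
    lam_major = mean + diff
    lam_minor = max(mean - diff, 0.0)  # guard against tiny negative round-off
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (
        (cx, cy),
        n_std * math.sqrt(lam_major),
        n_std * math.sqrt(lam_minor),
        math.degrees(theta),
    )
```

A student who only shuttles between two stations yields a degenerate ellipse with a semi-minor axis of zero, which is the extreme case of the strongly directional activity space described in the text.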
Preferably, in step (5) the activity spaces are classified by K-means clustering. The number of clusters is determined by the elbow rule: R is used to compute the ratio of the between-group sum of squares to the total sum of squared distances, and the candidate with the larger ratio is taken as the number of clusters. The classes are then measured and analyzed in terms of the activity-space confidence ellipse, travel frequency, and other characteristics, identifying what school-commuting trips of the same type have in common and how the types differ.
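The clustering step can be sketched as follows. The source uses R; this is a plain-Python stand-in, not the authors' code, that runs a basic K-means and reports the ratio of between-group to total sum of squares for each candidate number of clusters. The function names, the seeded random initialization, and the row layout (one tuple of indicators per card) are assumptions. The source does not say whether indicators are rescaled before clustering, so the sketch leaves them as given.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def mean_vector(rows):
    """Component-wise mean of a non-empty list of tuples."""
    n = len(rows)
    return tuple(sum(col) / n for col in zip(*rows))


def kmeans(rows, k, iters=100, seed=0):
    """Basic Lloyd-style K-means; returns the list of k cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(rows, k)  # hypothetical choice: k random rows
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for row in rows:
            nearest = min(range(k), key=lambda c: sq_dist(row, centers[c]))
            clusters[nearest].append(row)
        # An empty cluster keeps its previous center.
        updated = [mean_vector(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return centers


def between_total_ratio(rows, centers):
    """Between-group SS divided by total SS, computed as 1 - within/total."""
    grand_mean = mean_vector(rows)
    total_ss = sum(sq_dist(r, grand_mean) for r in rows)
    if total_ss == 0:
        return 0.0  # all rows identical: nothing to explain
    within_ss = sum(min(sq_dist(r, c) for c in centers) for r in rows)
    return 1.0 - within_ss / total_ss


def elbow_curve(rows, k_values, seed=0):
    """Map each candidate k to its between/total ratio."""
    return {k: between_total_ratio(rows, kmeans(rows, k, seed=seed))
            for k in k_values}
```

Plotting the values of `elbow_curve` against k gives the kind of curve that the elbow rule inspects: the ratio rises quickly while added clusters still separate real groups and flattens once they do not.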
Beneficial effects:
1. The invention measures and analyzes the activity space of the school-commuting group using subway IC card data spanning many days, reflecting more fully and precisely the link between that activity space and the city's spatial layout, and providing a basis for decisions on unbalanced urban traffic loads and on optimizing urban layout. It helps in understanding the behavior of school commuters and passenger demand at different times, and in planning the city's spatial layout to improve students' quality of life.
2. The invention is the first to mine activity-space measurement indicators for school-commuting trips from IC card data, and it classifies school commuters according to their commuting characteristics, so that the resulting classes show distinct commuting characteristics. It exploits the objectivity of IC card data and provides a basis for studying how different school-commuting patterns form.
3. The method quantifies the spatial measure. Its output is the set of characteristic values of the confidence ellipse: the ellipse represents the student's activity space to a certain extent, and its characteristic values measure that space. The invention turns the actual activity range into a mathematical model, giving a new way of processing home and school station data and a new method of analyzing school-commuting activity space. Because the data and the method are well matched in emphasis, the approach is well founded.
4. The invention uses the confidence ellipse to put subway card swipe data to use and to represent the school-commuting activity space, while remedying the shortcomings of existing research: the limited time span and accuracy of the underlying data, the narrow range of subjects studied, and the insufficient attention to the characteristics of the activity space itself.
Description of drawings
Fig. 1 is a flowchart of the method of the invention;
Fig. 2 is a schematic diagram of the elbow-rule clustering result according to an embodiment of the invention;
Fig. 3 shows the school-commuting activity spaces of the clustering result according to an embodiment of the invention.
Detailed description
The technical solution of the invention is further described below with reference to the drawings.
Referring to Fig. 1, the method for measuring and analyzing school-commuting activity space based on subway IC card data proposed by the invention comprises the following steps:
Step (1): Obtain subway IC card data and the latitude and longitude of subway stations, extract the information relevant to school commuting from the swipe data, match the coordinates to their spatial positions on a geographic map, determine the home and school locations of the school commuters, complete their identification, and build the subway IC card travel database of school commuters.
The data used in this embodiment are the Nanjing Metro company's card swipe data for the working days of three consecutive weeks, November 2 to 20, 2015. Nanjing Metro offers students a discounted fare, so their cards form a separate card type. All student-card trips are first selected by the condition that the card type is 54. Following "A school-commuting identification method based on subway card swipe data" (application number 201711043136.0), valid travel records whose card type is student card are retained to obtain a database of valid student-card trips; an example of its data structure is shown in Table 1.
Table 1: Subway IC card data structure
A collector is used to crawl the latitude and longitude of 122 subway stations from Baidu Maps. So that spatial positions can be matched in the same coordinate system as the Nanjing map, a universal coordinate converter converts them to the WGS-84 coordinate system, yielding the location of each subway station, as shown in Table 2.
Table 2: Latitude and longitude of subway stations
Following "A school-commuting identification method based on subway card swipe data" (application number 201711043136.0), among the travel records of all card numbers for which a home station and a school station have been identified, records departing from the school station before 9:00 and records departing from the home station after 16:00 are deleted. In addition, because the home and school identification method uses the subway swipe records of three consecutive weeks of working days as its sample, card numbers with fewer than 3 commuting days are deleted along with their records, leaving 317,828 school-commuting trip records. This completes the identification of school commuters.
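The two filtering rules in this paragraph can be sketched in Python. The record layout (dictionaries with `card_id`, `date`, `entry_time`, `entry_station`) and the `anchors` mapping from card to its identified (home, school) stations are hypothetical, since Table 1 is not reproduced here. Counting a "commuting day" as any date with at least one retained record is also an assumption.

```python
from collections import defaultdict
from datetime import date, time  # `date` is for callers building records


def filter_commute_records(records, anchors, min_days=3):
    """Apply the source's two deletion rules to a list of trip records.

    records: list of dicts with keys card_id, date (datetime.date),
             entry_time (datetime.time), entry_station.
    anchors: dict mapping card_id -> (home_station, school_station).
    """
    kept = []
    for rec in records:
        home, school = anchors[rec["card_id"]]
        # Rule 1a: drop trips leaving the school station before 9:00.
        if rec["entry_station"] == school and rec["entry_time"] < time(9, 0):
            continue
        # Rule 1b: drop trips leaving the home station after 16:00.
        if rec["entry_station"] == home and rec["entry_time"] > time(16, 0):
            continue
        kept.append(rec)

    # Rule 2: drop cards seen on fewer than `min_days` distinct dates.
    days = defaultdict(set)
    for rec in kept:
        days[rec["card_id"]].add(rec["date"])
    return [rec for rec in kept if len(days[rec["card_id"]]) >= min_days]
```

Applying the time-of-day rules before counting days means that a day whose only records were deleted does not count toward the three-day minimum; the source does not state which order it uses.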
Step (2): Preprocess the travel data of the school commuters to facilitate analysis of travel behavior, so as to extract the features that clearly distinguish school commuters from other groups and refine the definition of their activity space. The preprocessing mainly comprises:
21) Extract the travel data of the school-commuting group separately for working days and for holidays (including winter and summer vacations), and compute the travel times, trip origins and destinations, and number of trips in each case;
22) Extract the trips that have a school station as an endpoint, match them by IC card number, and compute the travel times, frequencies, and endpoint location distribution of the school-commuting group.
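A small Python sketch of these two preprocessing steps follows. The record keys (`card_id`, `date`, `entry_station`, `exit_station`) and function names are hypothetical, and treating Saturdays and Sundays as holidays alongside an explicit holiday set is an assumption that the source does not spell out.

```python
from collections import Counter
from datetime import date  # `date` is for callers building records


def trip_counts_by_day_type(records, holidays=frozenset()):
    """Count trips per (card_id, day_type), as in step 21).

    A day is a "holiday" if it is in `holidays` or falls on a weekend;
    otherwise it is a "workday".
    """
    counts = Counter()
    for rec in records:
        day = rec["date"]
        is_holiday = day in holidays or day.weekday() >= 5
        counts[(rec["card_id"], "holiday" if is_holiday else "workday")] += 1
    return counts


def school_endpoint_trips(records, school_stations):
    """Keep trips that start or end at a school station, as in step 22)."""
    return [rec for rec in records
            if rec["entry_station"] in school_stations
            or rec["exit_station"] in school_stations]
```

The per-card counts from the first function are the raw material for the commuting-frequency indicator used later, while the second function narrows the data to trips anchored at a school station.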
Step (3): Analyze the spatiotemporal travel characteristics of the school commuters and define their activity space.
In this analysis, the school-commuter statistics from steps 21) and 22) are compared with the corresponding data for other social groups. Specifically, the analysis includes:
① Comparing school commuters' travel on working days with their travel on holidays: the distribution of travel times, the trip origin-destination (OD) pairs, and the travel frequency differ significantly. Holiday travel times are generally later than on working days, travel frequency is not fixed, and OD pairs show little regularity;
② Comparing school commuters' travel with that of other groups: school commuters' travel range and travel times are markedly fixed; swipe times at school stations are markedly concentrated and of fixed duration, with swipe volume normally distributed within that period; the activity range is small, and activity boundaries and regularity are evident;
③ Comparing, for trips with a school station as an endpoint, school commuters with other groups: school commuters' trajectories are markedly concentrated and fixed; trip generation times are markedly concentrated and of fixed duration, with trip generation volume normally distributed within that period; travel frequency is stable; and travel patterns on working days and holidays differ clearly.
The analysis shows that, relative to other social groups, the school-commuting group has distinct spatiotemporal characteristics:
Temporal characteristics: travel periods are fixed, each trip has a fixed duration, travel data at school stations are concentrated, peak times are fixed and of stable duration, and travel frequency is fixed;
Spatial characteristics: trip OD pairs are fixed, the activity range is small and fixed, the activity points occur in a fixed order, and the activity range differs clearly by direction.
Thus, relative to other social groups, the distribution of school commuters' trip origins and destinations is highly determinate, and their commuting frequency and commuting distance differ clearly from those of other groups and are strongly fixed. The definition of activity space is refined on the basis of these characteristics specific to school commuters; travel characteristics shared with other social groups are not used as main indicators.
Activity space emphasizes the objective movement and activity that travelers exhibit under the constraints of time and space. The main activities of school commuters are based on school and home, so there are few activity locations and their activity space cannot be determined from the density distribution of trip origins and destinations. Moreover, school commuters who travel by subway move through underground space, so no buffer can be formed around high-density origins and destinations, and the activity space cannot be defined from travel paths and buffers. Because school commuters' activity points are concentrated and strongly directional, their home-based activity distances differ by direction. For travelers whose travel distances show such clear directional differences, the geometry of the activity pattern is therefore defined as an ellipse with home as the focus and with major and minor axis lengths computed from the distribution of travel points. In addition, because school commuters have stronger spatiotemporal travel characteristics than other groups, commuting frequency and commuting distance are included in the definition and, together with the geometry of the activity pattern, represent the activity space of the school-commuting group.
Step (4): Based on the definition of school commuters' activity space, extract activity-space measurement indicators along the two dimensions of "time" and "space". The indicators comprise: ① a confidence ellipse is constructed, and its flattening and area serve as spatial indicators of the activity space; ② commuting distance serves as a spatial indicator of the activity space; ③ commuting frequency serves as a temporal indicator of the activity space.
Primary and secondary school students generally have a small activity range, and on weekends in particular they may not travel or may not use the subway. The confidence ellipse is therefore not built separately for working days and rest days but uses all of an individual's travel points within one week. Construction needs only the coordinates of each individual's activity points, with no distinction between entry and exit stations. The confidence ellipse is output at a 95% confidence level, giving for each individual the center coordinates, semi-major and semi-minor axis lengths, and rotation angle of the confidence ellipse formed by one week of activity points. The 95% confidence ellipse can be computed directly from the latitude and longitude of the key activity points. Ellipses at 68%, 95%, or 99% confidence can be generated, each corresponding to a different number of standard deviations. Based on an analysis of suitability for the study at hand, the embodiment uses 2 standard deviations, that is, a 95% confidence ellipse.
The flattening α of the ellipse is calculated from its semi-axes, where a is the semi-major axis and b is the semi-minor axis. The larger α is, the flatter the ellipse: the activity points are strongly directional and very likely lie on the same subway line. The smaller α is, the rounder the ellipse and the more spatially dispersed the activity points.
The area S of the ellipse is calculated by the following formula:
S = π·a·b
The larger the area, the wider the range of daily activities of the primary and secondary school students.
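The two ellipse measures can be computed from the semi-axes as in the sketch below. The flattening formula itself did not survive in this text, so the code assumes the standard geometric definition α = (a − b) / a, which agrees with the stated behaviour (larger α means a flatter ellipse). The area follows S = π·a·b as given. Function names are hypothetical.

```python
import math


def flattening(a, b):
    """Flattening of an ellipse with semi-major axis a and semi-minor axis b.

    Assumption: the standard definition (a - b) / a, which is 0 for a
    circle and approaches 1 as the ellipse degenerates to a line.
    """
    if a <= 0 or b < 0 or b > a:
        raise ValueError("expected 0 <= b <= a and a > 0")
    return (a - b) / a


def ellipse_area(a, b):
    """Area S = pi * a * b, in the squared unit of a and b."""
    return math.pi * a * b


alpha = flattening(4.0, 1.0)    # elongated ellipse, points along one direction
area = ellipse_area(4.0, 1.0)
```

If the semi-axes are expressed in degrees of longitude and latitude, the area is in squared degrees; projecting the points to a metric coordinate system first gives an area in square metres or kilometres.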
School-commuting frequency and school-commuting distance are added to complete the set of activity space measures. Objective analysis of a large volume of travel data shows that school commuters differ significantly from other groups in commuting distance and frequency, so these two quantities are included in the analysis of the spatio-temporal travel characteristics of school commuters as measures supplementing the characteristic values of the confidence ellipse. The commuting frequency is obtained by counting trips per IC card number in the existing school-commuting trip database. The commuting distance, which the method refers to as a Euclidean distance, is calculated by the following formula:
|d12| = 6368.16 × arccos(sinX + cosX)
where sinX and cosX are terms computed from long1, long2, lat1 and lat2, the longitudes and latitudes of the home and school subway stations.
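A possible implementation of the two supplementary measures is sketched below. The definitions of sinX and cosX are not recoverable from this text; the sketch assumes the usual spherical law of cosines, with sinX = sin(lat1)·sin(lat2) and cosX = cos(lat1)·cos(lat2)·cos(long2 − long1), which is a great-circle distance. The radius 6368.16 km is the constant from the formula above. Function names and the sample card numbers are hypothetical.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6368.16  # constant taken from the formula in the text


def commuting_distance_km(long1, lat1, long2, lat2):
    """Distance between home and school stations, coordinates in degrees.

    Assumption: sinX and cosX are the two terms of the spherical law of
    cosines; the source text does not define them.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_long = math.radians(long2 - long1)
    sin_term = math.sin(phi1) * math.sin(phi2)
    cos_term = math.cos(phi1) * math.cos(phi2) * math.cos(d_long)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, sin_term + cos_term))
    return EARTH_RADIUS_KM * math.acos(cosine)


def commuting_frequency(card_numbers):
    """Count school-commuting trips per IC card number."""
    return Counter(card_numbers)


# One school-commuting trip record per list entry (made-up card numbers).
trips = ["card_A", "card_B", "card_A", "card_A"]
frequency = commuting_frequency(trips)
distance = commuting_distance_km(118.78, 32.04, 118.80, 32.06)
```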
(5) Cluster analysis is performed on the activity space measures of the school-commuting population, dividing school commuters into categories according to those measures.
K-means clustering is used for the cluster analysis. K-means requires the number of clusters to be fixed in advance, before classification. Note also that this method relies on distances between points computed from the measure values, so the data must be standardized to remove the effect of differing measure scales on the result.
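The standardization and clustering steps can be illustrated with the short sketch below. It is not the R procedure used in the embodiment: it assumes z-score standardization (the text does not say which standardization is applied) and uses a plain Lloyd-style K-means with random initial centers. All function names are hypothetical.

```python
import math
import random


def standardize(rows):
    """Z-score each column so that all measures share a common scale."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1))
            for c, m in zip(cols, means)]
    return [[(v - m) / s if s > 0 else 0.0
             for v, m, s in zip(row, means, stds)] for row in rows]


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(rows, k, n_iter=100, seed=0):
    """Basic K-means: returns (labels, centers) for a preset k."""
    rng = random.Random(seed)
    centers = [list(r) for r in rng.sample(rows, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each row goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: squared_distance(row, centers[j]))
                      for row in rows]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Each row: one commuter's measures (made-up values for illustration).
measures = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centers = kmeans(standardize(measures), k=2)
```

K-means results depend on the initial centers, so in practice the algorithm is usually run from several starts and the solution with the smallest within-group sum of squares is kept.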
Among methods for choosing the number of clusters, the elbow method is currently the most frequently cited. It plots the within-group sum of squares for different numbers of clusters. The within-group sum of squares is the sum of squared distances from all data points in a cluster to the cluster center; the smaller it is, the more similar the measures of the data within the group. The result is shown in Figure 2. The figure shows that the within-group sum of squares falls quickly from one cluster to five and more slowly thereafter, so the embodiment considers 5 or 6 as the number of clusters.
However, the elbow plot shows only the similarity within groups and says nothing about the differences between groups. The population can therefore be partitioned into 5 and into 6 clusters in turn. The R software automatically computes the ratio of the between-group sum of squares to the total sum of squares, and the larger this ratio, the greater the difference between clusters. The R results are shown in Table 3: the ratio for 6 clusters is slightly larger than that for 5, so the embodiment finally selects 6 as the number of clusters.
Table 3. Ratio of the between-group sum of squares to the total sum of squares for different numbers of clusters
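Both selection criteria discussed above come from the same decomposition, sketched here for an arbitrary labelling of the data: the total sum of squares splits into a within-group part (plotted in the elbow method) and a between-group part (the numerator of the ratio). The function name is hypothetical and the sample values are made up; they are not the figures of Table 3.

```python
def sum_of_squares(rows, labels):
    """Return (total_ss, within_ss, between_ss) for labelled rows."""

    def ss(group):
        # Sum of squared distances from each row to the group centroid.
        center = [sum(col) / len(group) for col in zip(*group)]
        return sum(sum((v - c) ** 2 for v, c in zip(row, center))
                   for row in group)

    total_ss = ss(rows)
    within_ss = sum(ss([r for r, lab in zip(rows, labels) if lab == j])
                    for j in set(labels))
    return total_ss, within_ss, total_ss - within_ss


rows = [[0.0], [2.0], [10.0], [12.0]]
total_ss, within_ss, between_ss = sum_of_squares(rows, [0, 0, 1, 1])
ratio = between_ss / total_ss  # closer to 1 means better separated clusters
```

Computing `within_ss` for each candidate number of clusters gives the curve of the elbow plot, and comparing `ratio` for 5 and 6 clusters reproduces the second comparison. Because the ratio generally increases as clusters are added, it is most useful for choosing between candidates the elbow plot has already shortlisted.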
With K-means clustering, the students who commute to school by subway are finally divided into six categories. The mean values of the measures for the six categories are shown in Table 4.
Table 4. Mean values of the measures for the six categories of school commuters
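A per-category summary of this kind can be produced by averaging each measure over the members of each cluster, as in the sketch below. The field names and values are placeholders chosen for illustration; they are not the data of Table 4.

```python
from collections import defaultdict


def cluster_means(records, labels):
    """Average every measure over the records assigned to each cluster."""
    groups = defaultdict(list)
    for record, label in zip(records, labels):
        groups[label].append(record)
    names = list(records[0].keys())
    return {
        label: {name: sum(r[name] for r in members) / len(members)
                for name in names}
        for label, members in groups.items()
    }


# Hypothetical measures for three commuters and their cluster labels.
records = [
    {"flattening": 0.8, "distance_km": 4.0},
    {"flattening": 0.6, "distance_km": 6.0},
    {"flattening": 0.2, "distance_km": 12.0},
]
means = cluster_means(records, [0, 0, 1])
```

The averages should be taken over the original, unstandardized measures so that the table stays interpretable in physical units.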
A typical individual is selected from each category of school commuters. ArcGIS is used to mark the locations of the individual's home, school, and other activity points on a map, and the confidence ellipse representing the individual's activity space is drawn from these points, as shown in Figure 3, where panels (a) to (f) correspond to the activity spaces of the typical individuals of the six categories. The activity ranges of the typical individuals differ markedly between categories. The confidence ellipses give an intuitive picture of the activity space of school commuters, and the characteristic values of the ellipses are the activity space measures. Analyzing the categories separately shows which groups have a reasonable school-commuting activity range and which have an excessive one. This reflects whether the distribution of schools is reasonable, and thus supports evaluation of the existing urban layout and of how well urban transport and urban layout fit together.
Claims (4)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201811224346.4A CN109508815B (en) | 2018-10-19 | 2018-10-19 | General activity spatial measure analysis method based on subway IC card data |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201811224346.4A CN109508815B (en) | 2018-10-19 | 2018-10-19 | General activity spatial measure analysis method based on subway IC card data |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN109508815A (en) | 2019-03-22 |
| CN109508815B (en) | 2021-08-10 |
Family
ID=65746846
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201811224346.4A Active CN109508815B (en) | 2018-10-19 | 2018-10-19 | General activity spatial measure analysis method based on subway IC card data |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN109508815B (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113222411B (en) * | 2021-05-12 | 2024-06-14 | 北京百度网讯科技有限公司 | A method, device, equipment and storage medium for analyzing activity space distribution |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103279534A (en) * | 2013-05-31 | 2013-09-04 | 西安建筑科技大学 | Public transport card passenger commuter OD (origin and destination) distribution estimation method based on APTS (advanced public transportation systems) |
| CN105701180A (en) * | 2016-01-06 | 2016-06-22 | 北京航空航天大学 | Commuting passenger feature extraction and determination method based on public transportation IC card data |
| CN105718946A (en) * | 2016-01-20 | 2016-06-29 | 北京工业大学 | Passenger going-out behavior analysis method based on subway card-swiping data |
| CN107818415A (en) * | 2017-10-31 | 2018-03-20 | 东南大学 | A kind of recognition methods of attending a school by taking daily trips based on subway brushing card data |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8682342B2 (en) * | 2009-05-13 | 2014-03-25 | Microsoft Corporation | Constraint-based scheduling for delivery of location information |
Non-Patent Citations (2)
| Title |
|---|
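| 《School Commuting Pattern in Metro System Across Different Loyalty Groups》; Gu Yu et al.; 《In Transportation Research Board 97th Annual Meeting》; 20180111; pp. 1-6 * |
| 《Daily activity space of residents in suburban mega-communities of Beijing based on GPS data》; Shen Yue et al.; 《Acta Geographica Sinica》; 20130430; pp. 506-516 * |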
Also Published As
| Publication number | Publication date |
|---|---|
| CN109508815A (en) | 2019-03-22 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |