
CN107818415B - A school commute identification method based on subway card swiping data - Google Patents

A school commute identification method based on subway card swiping data

Info

Publication number
CN107818415B
Authority
CN
China
Prior art keywords
station
card
school
time
stations
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711043136.0A
Other languages
Chinese (zh)
Other versions
CN107818415A (en)
Inventor
季彦婕
顾宇
刘阳
刘攀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN201711043136.0A
Publication of CN107818415A
Application granted
Publication of CN107818415B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/40 Business processes related to the transportation industry
    • G PHYSICS
    • G07 CHECKING-DEVICES
    • G07B TICKET-ISSUING APPARATUS; FARE-REGISTERING APPARATUS; FRANKING APPARATUS
    • G07B15/00 Arrangements or apparatus for collecting fares, tolls or entrance fees at one or more control points
    • G07B15/02 Arrangements or apparatus for collecting fares, tolls or entrance fees at one or more control points taking into account a variable factor such as distance or time, e.g. for passenger transport, parking systems or car rental systems
    • G07B15/04 Arrangements or apparatus for collecting fares, tolls or entrance fees at one or more control points taking into account a variable factor such as distance or time, e.g. for passenger transport, parking systems or car rental systems comprising devices to free a barrier, turnstile, or the like
    • G PHYSICS
    • G07 CHECKING-DEVICES
    • G07C TIME OR ATTENDANCE REGISTERS; REGISTERING OR INDICATING THE WORKING OF MACHINES; GENERATING RANDOM NUMBERS; VOTING OR LOTTERY APPARATUS; ARRANGEMENTS, SYSTEMS OR APPARATUS FOR CHECKING NOT PROVIDED FOR ELSEWHERE
    • G07C9/00 Individual registration on entry or exit
    • G07C9/20 Individual registration on entry or exit involving the use of a pass

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • Tourism & Hospitality (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Marketing (AREA)
  • Educational Administration (AREA)
  • Primary Health Care (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Technology (AREA)
  • Finance (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a method for identifying school commutes from subway card swiping data, comprising the following steps: 1) from the subway card swiping information, collect data including IC card number, station, and swipe time, and preprocess it; 2) for each card number, find the most frequently used station and the station most frequently paired with it, and take these as the candidate home station and school station; 3) according to the timetable of the city's primary and secondary schools, classify each card number's trip records between its two candidate stations by entry time, and determine the home station and the school station by a set of time rules; 4) find and delete non-school-commute trip records, together with card numbers that are difficult to judge and their records. Working from both temporal and spatial perspectives, the invention integrates and processes a large volume of subway card swiping data. It proposes, for the first time, a method for identifying school commuters from subway card swiping data, solves the basic problem of using big data to study school commuting behavior, and makes up for the shortcomings of traditional survey methods.

Figure 201711043136

Description

A school commute identification method based on subway card swiping data

Technical Field

The invention relates to methods for collecting and analyzing travel behavior data in transportation planning, and in particular to a method for identifying school commutes from subway card swiping data.

Background

In recent years, students' choice of travel mode has attracted growing attention from researchers. In China, primary and junior high school students are supposed to enroll under the nearest-school admission policy. However, because educational resources are unevenly distributed, more and more parents use school choice to obtain high-quality education for their children, which usually results in a long school commute. These families choose the subway, the city's main mode for medium- and long-distance travel. Even so, students who use the subway are still often driven to and from school by their parents. To encourage these students to use the subway more, their daily school commuting patterns need to be better understood.

School commuting, the counterpart of commuting to work, is the round trip between home and school. At present, information on school commute trips is obtained through traditional questionnaire surveys. Such surveys consume a great deal of labor and time, and their samples are small and cover a narrow, incomplete range, so the analysis is biased or limited to small local areas. Moreover, school commuting is a long-term process, and complete data are difficult to obtain through short-term traditional surveys. Fortunately, subway smart card systems designed for fare collection provide detailed swipe records, including card type, swipe date, and subway station, that can serve many purposes and can replace much of the data obtained through traditional surveys. However, although commute identification methods based on smart card data are maturing, very little research addresses the identification of school commuting. Conventional commute identification focuses on temporal and spatial patterns, identifying commute trips from the weekly ride frequency, fixed boarding and alighting stations, the time interval between two rides, and so on. For school commuting, parents very commonly drive students to and from school, so not many primary and secondary school students use the subway regularly; relying on trip frequency alone would wrongly exclude a large number of genuine school commuters. In addition, because of schools' midday break, the length of stay at one location is not a good criterion either. Strict and reasonable temporal and spatial constraints are therefore needed to judge accurately whether a trip on a working day is a school commute.

Summary of the Invention

Purpose of the invention: in view of the above shortcomings, the invention provides a method for identifying subway school commuters and their trip records from subway card swiping data, which can accurately judge school commuting behavior.

Technical solution: the school commute identification method based on subway card swiping data according to the invention comprises the following steps:

(1) Data collection and preprocessing: IC card data for all subway stations in a city, covering all working days of at least three consecutive weeks, are required, together with subway station coordinate data. The IC card data include card number, entry date, entry time, exit date, exit time, entry station number, exit station number, and card type. After the subway IC card data are collected, all trip records of each cardholder over the consecutive working days are merged in chronological order, only swipe records whose card type is student card are retained, and abnormal data are deleted.

(2) For each card number, calculate the frequency with which each station used by that card appears, find the station or stations with the highest frequency, and count how many stations share that highest frequency.

(3) If exactly one station has the highest frequency, take it as candidate station Si1 for the card's home station or school station. If two stations share the highest frequency, take them as candidate stations Si1 and Si2. If more than two stations share the highest frequency, merge adjacent stations and then take the station with the highest frequency as candidate station Si1; if two stations still tie for the highest frequency after merging, take those two merged stations as candidate stations Si1 and Si2; if more than two still tie after merging, delete the card number and its trip records.

(4) For card numbers whose count of highest-frequency stations in step (3) was 1 or greater than 2, consider the stations paired with the candidate station already selected and count how many of them share the highest frequency. If exactly one does, take it as the other candidate station Si2 for the card's home station or school station. If two or more do, merge adjacent stations and then take the station with the highest frequency as Si2; if a tie remains after merging, delete the card number and its trip records.

(5) According to the timetable of primary and secondary schools, divide each card number's trip records between its candidate stations into four classes by entry time: (I) am: entry time before the latest morning school start time; (II) pm: entry time after the earliest afternoon dismissal time; (III) noon1: entry time between the earliest morning dismissal time and the earliest afternoon school start time; (IV) noon2: entry time between the earliest afternoon school start time and the latest afternoon school start time.

(6) Sort each of the four classes by card number and entry time. For a card number with trip records in class (I), take the entry station of its first trip record as the home station and the corresponding exit station as the school station. For a card number with trip records in class (II), take the exit station of its last trip record as the home station and the corresponding entry station as the school station. For a card number with trip records in class (III), take the entry station of its first trip record as the school station and the corresponding exit station as the home station. For a card number with trip records in class (IV), take the exit station of its last trip record as the school station and the corresponding entry station as the home station.

(7) From the trip records of all card numbers with an identified home station and school station, delete records that depart from the school station before the latest morning school start time and records that depart from the home station after the latest afternoon dismissal time.

(8) Count the number of school commute days for each card number, and delete card numbers, with their records, whose number of school commute days is below a specified threshold.
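Steps (7) and (8) amount to two filters applied once the home and school stations are known. The following Python sketch is illustrative only: the record layout (dicts with `card_id`, `entry_date`, `entry_time`, `entry_station`), the function names, and the idea of counting commute days as distinct travel dates are assumptions, and the time limits and day threshold are left to the caller because the text does not fix their values here.

```python
from collections import defaultdict
from datetime import time

def drop_non_school_trips(trips, home, school, latest_am_start, latest_pm_end):
    """Step (7): remove trips that leave the school station before the latest
    morning start time, or leave the home station after the latest afternoon
    dismissal time. `home` and `school` map card_id -> station number."""
    kept = []
    for t in trips:
        cid = t["card_id"]
        if t["entry_station"] == school[cid] and t["entry_time"] < latest_am_start:
            continue
        if t["entry_station"] == home[cid] and t["entry_time"] > latest_pm_end:
            continue
        kept.append(t)
    return kept

def drop_infrequent_cards(trips, min_days):
    """Step (8): keep only cards with at least `min_days` distinct travel days
    (assumed here to be the meaning of "school commute days")."""
    days = defaultdict(set)
    for t in trips:
        days[t["card_id"]].add(t["entry_date"])
    return [t for t in trips if len(days[t["card_id"]]) >= min_days]
```

Running the two functions in sequence mirrors the order of the steps: the day count in step (8) is taken over the records that survive step (7).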

Beneficial effects: compared with the prior art, the invention has the following advantages:

The subway card swiping data used by the invention are easy to obtain, comprehensive, and objective, and large data sets reveal underlying regularities more readily. Although the identification of subway commuting to work is relatively mature, students' travel is influenced by how their parents travel, which diversifies their patterns of daily subway use, so traditional commute identification methods cannot identify school commuting. The invention therefore takes into account that students may be picked up and dropped off, combines this with the schools' prescribed start and dismissal times, and proposes a school commute identification method based on subway card swiping data. Compared with existing commute identification methods, its temporal and spatial constraints are stricter and more reasonable, which improves the accuracy of the identification results.

Description of Drawings

FIG. 1 is a flow chart of the method of the invention.

Detailed Description

The technical solution of the invention is further described below with reference to the accompanying drawing.

As shown in FIG. 1, the school commute identification method based on subway card swiping data comprises three stages: 1. data collection and preprocessing, corresponding to step 1 in FIG. 1; 2. identification of the home station and school station for each card number, corresponding to steps 2 to 6 in FIG. 1; 3. deletion of abnormal trip records and of card numbers with too few school commute days, together with their trip records, corresponding to steps 7 and 8 in FIG. 1. The process is detailed below.

1. Data collection and preprocessing

According to the invention, IC card data for all subway stations in a city, covering all working days of at least three consecutive weeks, are required, together with subway station coordinate data. In this embodiment the raw data are the card swiping data of all Nanjing subway stations from October 10 to October 28, 2016. In step 1, the raw data in the database are first saved in CSV format and read with R. The columns named "card number", "entry date", "entry time", "exit date", "exit time", "entry station number", "exit station number", "card type", "entry station longitude", "entry station latitude", "exit station longitude", and "exit station latitude" are extracted. The raw data are then preprocessed so that only records whose card type is student card and whose entry date is a working day are kept. In the invention, card type 54 denotes a student card. Records whose entry date differs from the exit date, and records whose entry station number equals the exit station number, are deleted, giving the original trip record data X. The data format is shown in Table 1.

Table 1: Example of subway card swiping data

Figure BDA0001451690880000041

Note: in practice, coordinate data are kept to 9 decimal places.
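The preprocessing filters can be sketched in a few lines. The embodiment uses R; the sketch below uses Python instead, and its record layout (dicts with keys such as `card_id` and `entry_date`) and function name are assumptions made for illustration. Card type 54 comes from the text. The working-day test here only excludes Saturdays and Sundays, so public holidays and make-up working days would need separate handling.

```python
from datetime import date

STUDENT_CARD_TYPE = 54  # card-type code for student cards, per the embodiment

def preprocess(records):
    """Keep student-card weekday trips that enter and exit on the same day
    at two different stations. Each record is a dict (hypothetical layout)."""
    kept = []
    for r in records:
        if r["card_type"] != STUDENT_CARD_TYPE:
            continue
        if r["entry_date"].weekday() >= 5:      # 5 and 6 are Saturday and Sunday
            continue
        if r["entry_date"] != r["exit_date"]:   # trip spans two dates
            continue
        if r["entry_station"] == r["exit_station"]:
            continue
        kept.append(r)
    # Merge each cardholder's trips in chronological order.
    kept.sort(key=lambda r: (r["card_id"], r["entry_date"], r["entry_time"]))
    return kept
```

The returned list plays the role of the trip record data X in the text.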

2. Identifying the home station and school station for each card number

First, the candidate stations Si1 and Si2 for the home station and school station are found for each card number i. Then, according to the timetable of the city's primary and secondary schools, each card number's trip records between its two candidate stations are classified by entry time, and the home station and school station are determined by a set of time rules. The home station represents the location of the student's home, and the school station the location of the student's school.
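The time rules mentioned here are those of steps (5) and (6) in the summary. A minimal Python sketch follows, under several assumptions: the schedule is passed as a dict of `datetime.time` boundaries with hypothetical key names, trips are sorted by time of day, the four classes are consulted in the order am, pm, noon1, noon2, and the first class that has a record decides the result. The text does not say how conflicts between classes are resolved, so that priority order is a choice made for this sketch only.

```python
from datetime import time

def classify(entry_time, s):
    """Step (5): assign a trip to am / pm / noon1 / noon2, or None.
    `s` holds the schedule boundaries as datetime.time values."""
    if entry_time < s["latest_am_start"]:
        return "am"
    if entry_time > s["earliest_pm_end"]:
        return "pm"
    if s["earliest_am_end"] <= entry_time < s["earliest_pm_start"]:
        return "noon1"
    if s["earliest_pm_start"] <= entry_time <= s["latest_pm_start"]:
        return "noon2"
    return None

def home_and_school(trips, s):
    """Step (6) for one card: return (home, school), or None if undetermined.
    `trips` are that card's records between its two candidate stations."""
    groups = {"am": [], "pm": [], "noon1": [], "noon2": []}
    for t in sorted(trips, key=lambda t: t["entry_time"]):
        label = classify(t["entry_time"], s)
        if label:
            groups[label].append(t)
    if groups["am"]:       # first morning trip runs home -> school
        t = groups["am"][0]
        return t["entry_station"], t["exit_station"]
    if groups["pm"]:       # last afternoon trip runs school -> home
        t = groups["pm"][-1]
        return t["exit_station"], t["entry_station"]
    if groups["noon1"]:    # first midday trip runs school -> home
        t = groups["noon1"][0]
        return t["exit_station"], t["entry_station"]
    if groups["noon2"]:    # last midday trip runs home -> school
        t = groups["noon2"][-1]
        return t["entry_station"], t["exit_station"]
    return None
```

A trip whose entry time falls in none of the four windows, for example mid-morning, is simply ignored when assigning the stations.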

In step 2, the frequency of use of every station is calculated for each card number, the station with the highest frequency is found, and the number of stations sharing that highest frequency is determined. Specifically, the original trip record data X are sorted by card number, and the three columns card number, entry station number, and exit station number are extracted from X and stored as database O. The data in O are copied to a new database N, whose columns are renamed card number, exit station number, entry station number. Databases O and N are then merged by row: the entry station number in O and the exit station number in N are merged into one column, called station number A, and the exit station number in O and the entry station number in N are merged into one column, called station number B, forming database M, whose format is shown in Table 2. The data in M are sorted by card number, and a loop counts how often each station appears in station number A or station number B for each card number. New columns Si1 and Si2 are defined for all card numbers in M and initialized to 0 (no Nanjing subway station is numbered 0). The result is recorded in database P, whose format is shown in Table 3.

Table 2: Example of data in database M

Figure BDA0001451690880000042

Figure BDA0001451690880000051

Table 3: Example of data in database P

card number | station number | frequency | Si1 | Si2
9726xxxxxx78 | 2 | 1 | 0 | 0
9726xxxxxx78 | 18 | 7 | 0 | 0
9726xxxxxx78 | 21 | 2 | 0 | 0
9726xxxxxx78 | 19 | 4 | 0 | 0
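The frequency count behind Table 3 can be sketched without building the intermediate databases. The Python function below counts, for each card, how often each station appears at either end of a trip, which appears to be the intent of stacking O and N into M. The dict-based record layout and the function name are assumptions.

```python
from collections import Counter, defaultdict

def station_frequencies(trips):
    """Count, per card, how often each station appears as entry or exit."""
    freq = defaultdict(Counter)
    for t in trips:
        freq[t["card_id"]][t["entry_station"]] += 1
        freq[t["card_id"]][t["exit_station"]] += 1
    return freq
```

For a card with seven trips that all touch station 18 (four to or from station 19, two to or from station 21, one from station 2), the counts match the frequencies listed in Table 3.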

In step 3, candidate stations for the home station and school station are sought for the first time. In database P, the station number with the highest frequency of use is found for each card number, and a loop counts how many station numbers share that highest frequency, storing the count in a new column n. In Table 3, for example, station 18 is the only station with the highest frequency among those used by card number 9726xxxxxx78, so the count is 1. Next, the candidate stations are found according to this count: if the count is 1, the station is candidate station Si1 of the card number; if it is 2, the two stations are candidate stations Si1 and Si2; if it is greater than 2, adjacent stations are merged and the station with the highest frequency becomes Si1, and if more than two stations still share the highest frequency after merging, the card number and its data are deleted.
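The three-way decision on n can be sketched as follows in Python. The function names and the return convention are assumptions; the value 0 for an unassigned candidate follows the text's use of 0 as a placeholder station number. The frequency mapping must be non-empty.

```python
def top_stations(counter):
    """Return the stations tied for the highest frequency; their count is n."""
    best = max(counter.values())
    return sorted(st for st, c in counter.items() if c == best)

def first_candidates(counter):
    """Step 3 decision on n. Returns (Si1, Si2, needs_merge);
    0 marks a candidate that has not been assigned yet."""
    top = top_stations(counter)
    if len(top) == 1:
        return top[0], 0, False
    if len(top) == 2:
        return top[0], top[1], False
    return 0, 0, True   # n > 2: adjacent stations must be merged first
```

With the frequencies of Table 3 (`{2: 1, 18: 7, 21: 2, 19: 4}`) the function returns `(18, 0, False)`, so Si1 is station 18 and Si2 is still open.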

Specifically, if n is 1, Si1 for these card numbers is set to the station number with the highest frequency, and all data for these card numbers, including column "Si1", are stored as nrep and merged by column with X on card number (the final nrep should contain, for every card number in the original nrep, that is, nrep before merging, all columns from both X and the original nrep). As shown in Table 4, one station has the highest frequency, its station number is 18, and Si1 = 18.

Table 4: Example of data in database nrep

Figure BDA0001451690880000052

Figure BDA0001451690880000061

Note: for brevity, the station longitude and latitude data, which are the same as in X, are not shown in Table 4.

If n is 2, Si1 and Si2 for these card numbers are set to the two station numbers (in no particular order), and all data for these card numbers, including columns "Si1" and "Si2", are stored in rep. As shown in Table 5, two stations have the highest frequency, so the two station numbers are assigned arbitrarily to the card number's Si1 and Si2; in this embodiment Si1 = 9 and Si2 = 73.

Table 5: Example of data in database rep

card number | station number | frequency | Si1 | Si2
9726xxxxxx52 | 9 | 8 | 9 | 73
9726xxxxxx52 | 73 | 8 | 9 | 73
9726xxxxxx52 | 23 | 1 | 9 | 73
9726xxxxxx52 | 24 | 1 | 9 | 73

Note: for brevity, the station longitude and latitude data, which are the same as in X, are not shown in Table 5.

If n is greater than 2, as in Table 6, where station numbers 14, 15, and 23 each appear three times, the stations' longitude and latitude are used to calculate, with the formula below, the Euclidean distance between each pair of these stations

Figure BDA0001451690880000062

which is stored in a new column d12.

Table 6: Example of data with n greater than 2

card number | station number | frequency | station longitude | station latitude
9961xxxxxx61 | 14 | 3 | 118.79168700 | 32.08972168
9961xxxxxx61 | 15 | 3 | 118.79687500 | 32.09790039
9961xxxxxx61 | 23 | 3 | 118.75372310 | 32.03948975
9961xxxxxx61 | 9 | 1 | 118.77893070 | 32.04327393

Figure BDA0001451690880000063

where

Figure BDA0001451690880000064

Here Long1 and Lat1 are the longitude and latitude of the entry station, and Long2 and Lat2 are the longitude and latitude of the exit station.
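The formula images are not reproduced in this text, so the exact expression used by the invention cannot be restated here. As a stand-in, the Python sketch below computes the haversine great-circle distance from the same four inputs (Long1, Lat1, Long2, Lat2). The choice of haversine, the Earth radius constant, and the function name are all assumptions, and results close to a distance threshold may differ from those of the patent's own formula.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant

def distance_km(long1, lat1, long2, lat2):
    """Great-circle (haversine) distance in km between two stations,
    given longitude and latitude in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(long2 - long1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

The function is symmetric in its two points, so each pair of stations needs to be evaluated only once.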

If d12 is less than or equal to a specified threshold, set to 1 km in the invention, the two stations are merged: either of the two stations is kept, its frequency is replaced with the sum of the two stations' frequencies, and Si1 is then set to the station number with the highest frequency. In the example above, stations 14 and 15 can be merged; the result is shown in Table 7, with Si1 = 14.

Table 7: Example of data after processing for n greater than 2

card number | station number | frequency | Si1
9961xxxxxx61 | 14 | 6 | 14
9961xxxxxx61 | 23 | 3 | 14
9961xxxxxx61 | 9 | 1 | 14

If two stations still tie for the highest frequency after merging, Si1 and Si2 are set to those two station numbers. If more than two tie after merging, the trip origins and destinations (OD points) are considered too scattered to judge, and these card numbers and their trip records are deleted. All data for the card numbers processed in this way in database P, including column "Si1", are stored in nrep1 and merged by column with X on card number; nrep1 is then merged by row into nrep.
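The merging rule for n greater than 2 can be sketched as below in Python. The text does not specify the order in which pairs are examined or how chains of nearby stations are handled, so this sketch makes its own choice: it walks through the tied stations in the given order and folds each one into the first previously kept station within the threshold. The distance function is passed in as `dist`, and all names are hypothetical.

```python
def merge_adjacent(counter, tied, dist, threshold_km=1.0):
    """Merge tied stations lying within `threshold_km` of each other.
    `dist(a, b)` returns the distance in km. Greedy and order-dependent:
    each station is folded into the first kept station close enough to it."""
    merged = dict(counter)
    kept = []
    for st in tied:
        target = next((k for k in kept if dist(k, st) <= threshold_km), None)
        if target is None:
            kept.append(st)
        else:
            merged[target] += merged.pop(st)
    return merged

def candidates_after_merge(merged):
    """Return (Si1, Si2) after merging, or None when more than two stations
    still tie and the card should be dropped. 0 marks an unassigned Si2."""
    best = max(merged.values())
    top = sorted(st for st, c in merged.items() if c == best)
    if len(top) == 1:
        return top[0], 0
    if len(top) == 2:
        return top[0], top[1]
    return None
```

Given the frequencies of Table 6 and a distance function that places stations 14 and 15 within the threshold, `merge_adjacent` yields the frequencies of Table 7 and `candidates_after_merge` returns `(14, 0)`.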

After the above processing, both candidate stations (Si1 and Si2) have been obtained for some card numbers, but for others only one (Si1) has been obtained, and the other candidate station must still be found. This is step 4, the second search for candidate stations. For card numbers with only one candidate station Si1 from step 3, the frequency of the stations corresponding to Si1 is calculated. In the invention, the corresponding stations are defined as follows: among a card number's trip records, some involve station Si1 and some do not; only the records that involve Si1 are selected, and the stations other than Si1 in those records are the stations corresponding to Si1. The other candidate station Si2 is found according to the number of corresponding stations sharing the highest frequency: if that number is 1, the station is the card number's other candidate station Si2; if it is 2 or more, adjacent stations are merged and the station with the highest frequency becomes Si2; if a tie remains after merging, the card number and its records are deleted.
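The notion of "corresponding stations" can be made concrete with a short Python sketch. It tallies the station at the other end of every trip that touches Si1 and returns Si2 only when one station is strictly the most frequent. The tie case is reported as `None` and left to the merging procedure described in the text. The record layout and function names are assumptions.

```python
from collections import Counter

def paired_station_counts(trips, si1):
    """Tally the stations at the other end of every trip that touches Si1."""
    counts = Counter()
    for t in trips:
        if t["entry_station"] == si1:
            counts[t["exit_station"]] += 1
        elif t["exit_station"] == si1:
            counts[t["entry_station"]] += 1
    return counts

def second_candidate(trips, si1):
    """Return Si2 when one paired station is strictly most frequent;
    return None on a tie, which the text resolves by merging nearby stations."""
    counts = paired_station_counts(trips, si1)
    if not counts:
        return None
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]
```

Trips that do not involve Si1 at all contribute nothing to the tally, which matches the selection described above.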

Specifically, for each card number in the data nrep, the travel records whose entry station number or exit station number is the candidate station Si1 are selected and stored as O2. As in step 2, the entry and exit station numbers of each card number in O2 are combined into a single station-number column and stored as M2. M2 is sorted by card number, all entries whose station number is Si1 are deleted, and the frequency of each remaining station number is counted for each card number. The most frequent station number of each card number is then found in M2, and a loop counts how many station numbers share that highest frequency, storing the count in a new column n'. A conditional statement then decides the outcome. If n' is 1, Si2 of the card number is set to the most frequent station number. If n' is greater than or equal to 2, then, as in step 3, the Euclidean distance between each pair of tied stations of the card number is calculated with the formula given in step 3

Figure BDA0001451690880000071

and stored in a new column d12'. If d12' is less than or equal to 1 km, the two stations are merged; card numbers that still have tied most-frequent stations after merging are deleted together with their travel records. For the card numbers processed in this way, all data in M2 that include the column "Si2" are stored in nrep', which is merged column-wise with X by card number and then appended row-wise to nrep.
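The second candidate search of step 4 can be sketched for a single card as follows. This is a minimal illustration, not the patented implementation: the function name, the planar kilometre coordinates, and the rule that a merged group outranks an unmerged station are assumptions, since the distance formula and merging details of step 3 are not reproduced in this passage.

```python
from collections import Counter
from math import hypot


def second_candidate(trips, si1, coords, merge_km=1.0):
    """Return Si2 for one card, or None when the card should be dropped.

    trips  : list of (entry_station, exit_station) pairs for one card
    si1    : candidate station already found in step 3
    coords : station -> (x_km, y_km); planar kilometre coordinates are
             an assumption made for this sketch
    """
    # Keep only trips that involve Si1 and count the station at the other end.
    partners = Counter()
    for entry, exit_ in trips:
        if si1 in (entry, exit_):
            other = exit_ if entry == si1 else entry
            if other != si1:
                partners[other] += 1
    if not partners:
        return None

    top = max(partners.values())
    tied = sorted(s for s, n in partners.items() if n == top)
    if len(tied) == 1:  # n' == 1
        return tied[0]

    def dist(a, b):
        (xa, ya), (xb, yb) = coords[a], coords[b]
        return hypot(xa - xb, ya - yb)

    # n' >= 2: group tied stations lying within merge_km of each other.
    groups = []
    for s in tied:
        near = [g for g in groups if any(dist(s, t) <= merge_km for t in g)]
        merged = [s] + [t for g in near for t in g]
        groups = [g for g in groups if g not in near] + [merged]

    # Assumption: a merged group's frequency is the sum of its members'
    # counts, so the largest group wins; its smallest id represents it.
    biggest = max(len(g) for g in groups)
    winners = [g for g in groups if len(g) == biggest]
    return min(winners[0]) if len(winners) == 1 else None
```

Returning None corresponds to deleting the card number and its records because a tie remains after merging.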

The data rep and nrep are merged row-wise and stored in database Q. Travel records in Q whose entry station number or exit station number is a station other than Si1 and Si2 are deleted, yielding database R, which contains all school-commute travel records made only between the candidate stations.
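The filter that turns database Q into database R can be illustrated for one card as below. The function name and the tuple layout of a trip are hypothetical; only the rule itself (drop any trip that touches a station other than Si1 or Si2) comes from the text.

```python
def trips_between_candidates(trips, si1, si2):
    """Keep trips whose entry and exit stations are both in {Si1, Si2}."""
    allowed = {si1, si2}
    return [
        (entry, exit_)
        for entry, exit_ in trips
        if entry in allowed and exit_ in allowed
    ]
```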

The next task is to determine, for each card number i in database R, which of the candidate stations Si1 and Si2 is the home station and which is the school station.

In step 5, the travel records between the two candidate stations of all card numbers are selected and sorted by entry time, and the category of each travel record of each card number is determined in order of entry time. According to the timetable of primary and secondary schools in Nanjing (see Table 8), the travel records between the two candidate stations Si1 and Si2 of each card number i are divided into four categories by entry time: (I) am: the entry time is before the latest morning school start time; (II) pm: the entry time is after the earliest afternoon dismissal time; (III) noon1: the entry time falls within the morning dismissal period, that is, between the earliest morning dismissal time and the earliest afternoon school start time; (IV) noon2: the entry time falls within the afternoon school start period, that is, between the earliest afternoon school start time and the earliest afternoon dismissal time. To tolerate error, the ranges can be relaxed appropriately. In this embodiment they are set as follows: (I) am: entry time before 9:00; (II) pm: entry time after 14:00; (III) noon1: entry time between 11:30 and 13:00; (IV) noon2: entry time between 13:00 and 14:00.

Table 8 School start and dismissal times of primary and secondary schools in Nanjing

Figure BDA0001451690880000081
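The four entry-time categories of step 5 can be written as a small classifier. This sketch uses the thresholds of the embodiment (9:00, 11:30, 13:00, 14:00); how the boundary instants are assigned (for example, 13:00 going to noon2) is an assumption, because the text gives overlapping endpoints.

```python
from datetime import time


def classify_entry(entry_time):
    """Map an entry time to 'am', 'pm', 'noon1', 'noon2', or None."""
    if entry_time <= time(9, 0):
        return "am"
    if entry_time >= time(14, 0):
        return "pm"
    if time(11, 30) <= entry_time < time(13, 0):
        return "noon1"
    if time(13, 0) <= entry_time < time(14, 0):
        return "noon2"
    # Entries between 9:00 and 11:30 belong to no category.
    return None
```

Trips that fall between 9:00 and 11:30 receive no category, which matches the later remark that cards travelling only in that window are eliminated.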

In step 6, the home station and the school station are determined according to time conditions. The procedure is as follows:
(1) Create the columns "home station" and "school station" in database R and set them to 0.
(2) Select from R the travel records whose entry time is earlier than or equal to 9:00 and store them in am. Sort am by "card number" and then by "entry time", and take the record with the earliest entry time for each card number i. Its entry station number is the home station and is stored in the column "home station"; its exit station number is the school station and is stored in the column "school station".
(3) Select from R the travel records whose entry time is later than or equal to 14:00 and store them in pm. Delete from pm the card numbers that also appear in am, together with their travel records. Sort pm by "card number" and then by "entry time", and take the record with the latest entry time for each card number i. Its exit station number is the home station and is stored in the column "home station"; its entry station number is the school station and is stored in the column "school station".
(4) Select from R the travel records whose entry time is between 11:30 and 13:00 and store them in noon1. Delete from noon1 the card numbers that also appear in am or pm, together with their travel records. Sort noon1 by "card number" and then by "entry time", and take the record with the earliest entry time for each card number i. Its entry station number is the school station and is stored in the column "school station"; its exit station number is the home station and is stored in the column "home station".
(5) Select from R the travel records whose entry time is between 13:00 and 14:00 and store them in noon2. Delete from noon2 the card numbers that also appear in am, noon1 or pm, together with their travel records. Sort noon2 by "card number" and then by "entry time", and take the record with the earliest entry time for each card number i. Its entry station number is the home station and is stored in the column "home station"; its exit station number is the school station and is stored in the column "school station".
(6) Merge am, pm, noon1 and noon2 row-wise and store the result in database R1. Delete the "home station" and "school station" columns from database R, then merge R and R1 column-wise by card number and store the result in database S.
Database S now contains the home station and the school station of every school commuter. Card numbers that travel only between 9:00 and 11:30 are eliminated in this step, together with their records.
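The priority order of step 6 (am first, then pm, noon1, noon2) can be sketched for one card as a rule table. The names RULES and home_and_school and the tuple layout of a trip are hypothetical; the thresholds and the earliest/latest choices follow the description of this step.

```python
from datetime import time

# (category, time test, record picker, whether the entry station is home)
RULES = [
    ("am", lambda t: t <= time(9, 0), min, True),
    ("pm", lambda t: t >= time(14, 0), max, False),
    ("noon1", lambda t: time(11, 30) <= t <= time(13, 0), min, False),
    ("noon2", lambda t: time(13, 0) <= t <= time(14, 0), min, True),
]


def home_and_school(trips):
    """Return (home, school) for one card, or None if no rule applies.

    trips: list of (entry_time, entry_station, exit_station), all of them
    between the two candidate stations of the card.
    """
    for _name, matches, pick, entry_is_home in RULES:
        hits = [trip for trip in trips if matches(trip[0])]
        if hits:
            # Earliest record (min) or latest record (max) of the category.
            _, entry, exit_ = pick(hits, key=lambda trip: trip[0])
            return (entry, exit_) if entry_is_home else (exit_, entry)
    return None
```

Because the rules are tried in order, a card with any am record is never judged by its pm or noon records, which mirrors the deletion of already-handled card numbers in sub-steps (3) to (5).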

3. Delete abnormal travel records, and card numbers with too few school-commute days together with their travel records

In step 7, according to the timetable of primary and secondary schools, two kinds of records are deleted from the travel records of all card numbers for which a home station and a school station have been identified: records departing from the school station before the latest morning school start time, and records departing from the home station after the latest afternoon dismissal time. In this embodiment the time limits are relaxed appropriately: records departing from the school station before 9:00 and records departing from the home station after 16:00 are deleted.
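The clean-up of step 7 amounts to two conditions on the entry station and entry time. The sketch below applies them to one card using the 9:00 and 16:00 limits of the embodiment; the function name and parameter names are assumptions.

```python
from datetime import time


def drop_abnormal(trips, home, school,
                  morning_limit=time(9, 0), afternoon_limit=time(16, 0)):
    """Remove trips that leave school too early or leave home too late.

    trips: list of (entry_time, entry_station, exit_station) for one card.
    """
    kept = []
    for entry_time, entry, exit_ in trips:
        leaves_school_early = entry == school and entry_time < morning_limit
        leaves_home_late = entry == home and entry_time > afternoon_limit
        if not (leaves_school_early or leaves_home_late):
            kept.append((entry_time, entry, exit_))
    return kept
```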

In step 8, a loop counts the number of school-commute days of each card number in database S and stores it in a new column "school-commute days". The minimum threshold for the number of school-commute days should be set according to the date span of the collected data: if data are collected for n consecutive weeks, the minimum threshold is n days. Because this embodiment uses subway card-swipe records from the working days of three consecutive weeks as the sample, card numbers with fewer than 3 school-commute days are deleted together with their records. This yields the final database Z, which contains the columns "card number", "entry date", "entry time", "exit date", "exit time", "entry station number", "exit station number", "entry station longitude", "entry station latitude", "exit station longitude", "exit station latitude", "home station", "school station" and "school-commute days". Some of the identification results are shown in Table 9. The school-commuting students identified in this embodiment account for 40% of all students in the original data.
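The day-count threshold of step 8 can be sketched as follows. It assumes that a school-commute day is any distinct entry date on which the card has at least one remaining record; the text does not define the count more precisely, and the function names are hypothetical.

```python
from collections import defaultdict


def commute_days(records):
    """Count distinct entry dates per card.

    records: list of (card_number, entry_date) pairs.
    """
    days = defaultdict(set)
    for card, entry_date in records:
        days[card].add(entry_date)
    return {card: len(dates) for card, dates in days.items()}


def keep_regular_commuters(records, min_days=3):
    """Drop every record of cards with fewer than min_days commute days."""
    counts = commute_days(records)
    return [(card, day) for card, day in records if counts[card] >= min_days]
```

With three weeks of data the embodiment uses min_days=3; for n weeks of data the same function would be called with min_days=n.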

Table 9 Examples of partial identification results

Figure BDA0001451690880000101

Table 9 (continued) Examples of partial identification results

Figure BDA0001451690880000102

Claims (8)

1. A school-commute identification method based on subway card-swipe data, characterized in that the method comprises the following steps:
(1) collecting the IC card swipe data of each subway station and the subway station coordinate data within a certain period of time, preprocessing the raw card-swipe data, and eliminating invalid data;
(2) for each card number, calculating the frequency of occurrence of every station of that card number, finding the station or stations with the highest frequency, and counting the number of stations with the highest frequency for each card number;
(3) according to the number of stations with the highest frequency, finding the candidate stations for the home station and the school station of each card number, specifically including:
(31) determining the number of stations with the highest frequency for each card number: if the number is 1, the corresponding station is the candidate station Si1 of the card number; if the number is 2, the corresponding two stations are the candidate stations Si1 and Si2 of the card number; if the number is greater than 2, stations that are no farther apart than a specified threshold are merged and the station with the highest frequency is then taken as Si1; if the number of stations with the highest frequency after merging is 2, the corresponding two stations are the candidate stations Si1 and Si2 of the card number; if the number of stations with the highest frequency after merging is still greater than 2, the card number and its records are deleted;
(32) for each card number for which only one candidate station Si1 was found in step (31), calculating the frequency of the stations corresponding to the candidate station Si1, and finding the other candidate station Si2 for the home station or the school station according to the number of corresponding stations with the highest frequency: if the number of corresponding stations with the highest frequency is 1, that station is the other candidate station Si2 of the card number; if the number is greater than or equal to 2, stations that are no farther apart than the specified threshold are merged and the station with the highest frequency is then taken as Si2; if a tie remains after merging, the card number and its records are deleted;
(4) according to the timetable of primary and secondary schools, dividing the travel records between the candidate stations of all card numbers into several categories by entry time;
(5) for the travel records falling into each category, determining the home station and the school station according to the earliest or latest entry time, thereby obtaining the school-commute travel records.
2. The school-commute identification method based on subway card-swipe data according to claim 1, characterized in that the raw card-swipe data in step (1) include: card number, entry date, entry time, exit date, exit time, entry station number, exit station number, and card type.
3. The school-commute identification method based on subway card-swipe data according to claim 2, characterized in that the preprocessing of the raw data in step (1) includes: keeping only records whose card type is student card and whose entry date is a working day, and deleting abnormal data in which the entry station number equals the exit station number or the entry date differs from the exit date.
4. The school-commute identification method based on subway card-swipe data according to claim 1, characterized in that the certain period of time in step (1) is not shorter than three weeks.
5. The school-commute identification method based on subway card-swipe data according to claim 1, characterized in that the travel categories in step (4) include:
am: the entry time is before the latest morning school start time;
pm: the entry time is after the earliest afternoon dismissal time;
noon1: the entry time is within the range from the earliest morning dismissal time to the earliest afternoon school start time;
noon2: the entry time is within the range from the earliest afternoon school start time to the latest afternoon school start time.
6. The school-commute identification method based on subway card-swipe data according to claim 5, characterized in that step (5) includes: sorting each of the four categories by card number and entry time; for a card number that has travel records in am, determining the entry station number of its first travel record as the home station and the corresponding exit station number as the school station; for a card number that has travel records in pm, determining the exit station number of its last travel record as the home station and the corresponding entry station number as the school station; for a card number that has travel records in noon1, determining the entry station number of its first travel record as the school station and the corresponding exit station number as the home station; for a card number that has travel records in noon2, determining the exit station number of its last travel record as the school station and the corresponding entry station number as the home station.
7. The school-commute identification method based on subway card-swipe data according to claim 1, characterized in that it further includes: according to the timetable of primary and secondary schools, deleting, from the travel records of all card numbers for which a home station and a school station have been identified, the records departing from the school station before the latest morning school start time and the records departing from the home station after the latest afternoon dismissal time.
8. The school-commute identification method based on subway card-swipe data according to claim 1, characterized in that it further includes: deleting card numbers whose number of school-commute days is less than a specified threshold, together with their travel records.
CN201711043136.0A 2017-10-31 2017-10-31 A general recognition method based on subway card swiping data Active CN107818415B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711043136.0A CN107818415B (en) 2017-10-31 2017-10-31 A general recognition method based on subway card swiping data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711043136.0A CN107818415B (en) 2017-10-31 2017-10-31 A general recognition method based on subway card swiping data

Publications (2)

Publication Number Publication Date
CN107818415A CN107818415A (en) 2018-03-20
CN107818415B true CN107818415B (en) 2021-07-09

Family

ID=61604455

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711043136.0A Active CN107818415B (en) 2017-10-31 2017-10-31 A general recognition method based on subway card swiping data

Country Status (1)

Country Link
CN (1) CN107818415B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108681741B (en) * 2018-04-08 2021-11-12 东南大学 Subway commuting crowd information fusion method based on IC card and resident survey data
CN109508815B (en) * 2018-10-19 2021-08-10 东南大学 General activity spatial measure analysis method based on subway IC card data
CN109784636A (en) * 2018-12-13 2019-05-21 中国平安财产保险股份有限公司 Fraudulent user recognition methods, device, computer equipment and storage medium
CN110472813B (en) * 2019-06-24 2023-12-22 广东浤鑫信息科技有限公司 Self-adaptive adjustment method and system for school bus station

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1237129A1 (en) * 2001-03-02 2002-09-04 Hitachi, Ltd. Service providing method
CN103198104A (en) * 2013-03-25 2013-07-10 东南大学 Bus station origin-destination (OD) obtaining method based on urban advanced public transportation system
CN103279534A (en) * 2013-05-31 2013-09-04 西安建筑科技大学 Public transport card passenger commuter OD (origin and destination) distribution estimation method based on APTS (advanced public transportation systems)
CN105718946A (en) * 2016-01-20 2016-06-29 北京工业大学 Passenger going-out behavior analysis method based on subway card-swiping data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
通勤制约度对儿童陪伴出行决策过程的影响;何保红 等;《交通运输系统工程与信息》;20141231;第14卷(第6期);第223-230页 *

Also Published As

Publication number Publication date
CN107818415A (en) 2018-03-20

Similar Documents

Publication Publication Date Title
CN107818415B (en) A general recognition method based on subway card swiping data
WO2020238631A1 (en) Population type recognition method based on mobile phone signaling data
CN106570184B (en) Method for extracting recreation-living contact data set from mobile phone signaling data
CN105718946A (en) Passenger going-out behavior analysis method based on subway card-swiping data
CN108681741B (en) Subway commuting crowd information fusion method based on IC card and resident survey data
CN110889092A (en) A passenger flow prediction method for short-term large-scale events surrounding rail sites based on rail transaction data
CN102097002A (en) Method and system for acquiring bus stop OD based on IC card data
Zhou et al. Who you are determines how you travel: Clustering human activity patterns with a Markov-chain-based mixture model
CN107527313A (en) User Activity mode division and attribute estimation method
CN111291216B (en) Method and system for analyzing foothold based on face structured data
CN106528850B (en) Gate inhibition's data exception detection method based on machine learning clustering algorithm
CN105335795A (en) Metro-bus transfer problem automatic diagnosis method based on IC card data
Bocquier Migration analysis using demographic surveys and surveillance systems
CN112417286A (en) Method and system for analyzing influence factors gathered by regional culture industry
CN111241162A (en) Analysis method and storage medium of passenger travel behavior under the condition of high-speed railway network
CN114897345A (en) Method and device for automatically generating index scores based on employee data
Chen et al. Research on the classification of urban rail transit stations-taking Shanghai metro as an example
CN119180372B (en) Track traffic passenger flow OD prediction method and system thereof
CN114519388A (en) User subdivision method based on high-speed ETC charging data
CN106781467A (en) A kind of bus passenger based on collaborative filtering is swiped the card site information extracting method
CN112733891B (en) Method for identifying bus IC card passengers to get off station points during travel chain breakage
CN111382952A (en) Elevator quality inspection extraction method based on comprehensive coverage principle
CN109508815B (en) General activity spatial measure analysis method based on subway IC card data
Wei et al. Cluster Analysis of Trip Purpose Based on Residents’ Travel Characteristic
CN107943920A (en) A kind of trip crowd recognition method based on subway brushing card data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant