CN102006174B - Data processing method and device based on online behavior of mobile phone user
- Publication number
- CN102006174B, CN201010535447.0A
- Authority
- CN
- China
- Prior art keywords
- url
- bill
- data
- field
- source
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The present invention relates to a data processing method and device based on the online behavior of mobile phone users. The method includes: generating, from user Internet access data, a first bill containing the URLs accessed by the user; preprocessing the data in the first bill according to predetermined rules to generate a second bill; and performing statistical analysis on the data in the second bill. Before bill data is loaded into the database, the invention first preprocesses it, and the new bill data produced by this series of preprocessing steps is what gets loaded; the system database then performs statistical analysis on the second bill data. The database is thereby spared from analyzing large volumes of URL data, which greatly improves the efficiency of bill data processing and resolves the performance bottleneck in analyzing the online behavior of mobile phone users.
Description
Technical field
The present invention relates to the technical field of mobile networks, and in particular to a data processing method and device based on the online behavior of mobile phone users.
Background art
At present, analyzing and mining users' Internet access data has become a popular trend in mobile network services. As the number of service providers and of users who access the Internet from mobile phones keeps growing, the number of bills generated by mobile service systems keeps growing as well. In service systems with a high bill volume, the throughput in TPS (Tip-Per-Second) has reached as much as 5000 records per second, and the daily data volume is roughly 100 million to 200 million records.
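As a rough consistency check on these figures, the arithmetic below converts between a per-second rate and a daily total. It assumes the 5000 records/second figure is a peak rate rather than a sustained average, which the text does not state; the function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rate(bills_per_day: int) -> float:
    """Average bills per second implied by a daily total."""
    return bills_per_day / SECONDS_PER_DAY


def daily_volume(bills_per_second: int) -> int:
    """Daily total if a per-second rate were sustained for 24 hours."""
    return bills_per_second * SECONDS_PER_DAY


if __name__ == "__main__":
    for total in (100_000_000, 200_000_000):
        print(f"{total:,} bills/day is about {average_rate(total):,.0f} bills/s on average")
    print(f"5000 bills/s sustained would be {daily_volume(5000):,} bills/day")
```

Under that assumption, the quoted daily totals correspond to an average of roughly 1,200 to 2,300 records per second, so a 5000 records/second peak is about two to four times the average load.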
An operator that needs to understand the online behavior of mobile phone users usually has to perform the following analyses:
a) Internet access type analysis: the types of websites that users visit most frequently;
b) Designated website traffic analysis: the access traffic of a website or of specific content within a website;
c) Advertisement access traffic analysis: the access traffic of advertisement URLs, broken down by specific category.
In the conventional technique, mobile phone users' Internet access data is analyzed by analyzing the URL (Uniform/Universal Resource Locator, also known as a web page address) field in the bills generated by the mobile service system, where:
Internet access type analysis includes: loading the bill data into the database, maintaining a lookup table that maps HOST to type, parsing the HOST out of each URL, looking up the type in the lookup table, and repeating the analysis for all URLs;
Designated website traffic analysis includes: loading the bill data into the database, maintaining a lookup table of URL conversion rules, converting each URL, and repeating the analysis for all URLs;
Advertisement access traffic analysis includes: loading the bill data into the database, maintaining a table that maps URLs to advertisements, looking up which advertisement each URL belongs to, and repeating the analysis for all URLs.
When data traffic is heavy, processing bill data this way creates a system performance bottleneck. The URLs in the bills are stored encrypted, so each URL must be decrypted before it can be parsed, and the decrypted string then requires complex computation; data processing therefore takes a long time. Test data for analyzing mobile phone users' online behavior with the conventional solution is shown in Table 1 below:
Table 1
As Table 1 shows, bills are generated faster than they are processed, so bills pile up and cannot be handled in time. This not only delays data processing severely but also increases the processing load on the system database.
Summary of the invention
The main purpose of the present invention is to provide a data processing method and device based on the online behavior of mobile phone users, with the aim of increasing the speed at which mobile phone users' Internet access data is processed and improving system performance.
The present invention proposes a data processing method based on the online behavior of mobile phone users, the method comprising:
generating, from user Internet access data, a first bill containing the web page addresses (URLs) accessed by the user;
preprocessing the data in the first bill according to predetermined rules to generate a second bill;
performing statistical analysis on the data in the second bill.
Preferably, the step of preprocessing the data in the first bill according to predetermined rules comprises:
performing Internet access type URL analysis and/or designated website traffic analysis and/or advertisement access traffic analysis on the data in the first bill.
Preferably, the step of performing Internet access type URL analysis on the data in the first bill comprises:
adding a URL type field to the first bill, for storing the category to which the URL belongs;
parsing the source URL in the first bill;
looking up the category corresponding to the source URL in a preset URL category lookup table, and writing it into the URL type field corresponding to the source URL in the second bill.
Preferably, the step of performing designated website traffic analysis on the data in the first bill comprises:
adding a new URL field to the first bill, for storing the converted new URL;
converting the source URL in the first bill according to predetermined conversion rules;
writing the converted source URL into the new URL field corresponding to the source URL in the second bill.
Preferably, the step of performing advertisement access traffic analysis on the data in the first bill comprises:
adding an advertisement URL field to the first bill, for storing advertisement URLs;
separating out the advertisement URL according to a predetermined identifier carried by the source URL in the first bill;
writing the advertisement URL into the advertisement URL field corresponding to the source URL in the second bill.
The present invention also proposes a data processing device based on the online behavior of mobile phone users, comprising:
an original bill generation module, configured to generate, from user Internet access data, a first bill containing the URLs accessed by the user;
a new bill generation module, configured to preprocess the data in the first bill according to predetermined rules to generate a second bill;
a new bill data processing module, configured to perform statistical analysis on the data in the second bill.
Preferably, the new bill generation module is further configured to perform Internet access type URL analysis, designated website traffic analysis, and/or advertisement access traffic analysis on the data in the first bill.
Preferably, the new bill generation module comprises:
a field adding unit, configured to add to the first bill a URL type field for storing the category to which the URL belongs;
a parsing unit, configured to parse the source URL in the first bill;
a writing unit, configured to look up the category corresponding to the source URL in a preset URL category lookup table and write it into the URL type field corresponding to the source URL in the second bill.
Preferably, the field adding unit is further configured to add to the first bill a new URL field for storing the converted new URL;
the parsing unit is further configured to convert the source URL in the first bill according to predetermined conversion rules;
the writing unit is further configured to write the converted source URL into the new URL field corresponding to the source URL in the second bill; or
the field adding unit is further configured to add to the first bill an advertisement URL field for storing advertisement URLs;
the parsing unit is further configured to separate out the advertisement URL according to a predetermined identifier carried by the source URL in the first bill;
the writing unit is further configured to write the advertisement URL into the advertisement URL field corresponding to the source URL in the second bill.
In the data processing method and device based on the online behavior of mobile phone users proposed by the present invention, a preprocessing device such as an interface machine preprocesses the bill data before it is loaded into the database. The preprocessing includes classifying and aggregating the URLs generated by users' Internet access, converting URLs according to certain rules, and so on; the new bill data produced by this series of preprocessing steps is then loaded into the database. The system database subsequently performs statistical analysis on the second bill data. URL parsing is thus delegated to the interface machine, the parsed results form a new bill, and the system database runs its statistical analysis directly on those results. This removes the need to analyze large volumes of URL data in the database, which greatly improves the efficiency of bill data processing and resolves the performance bottleneck in analyzing mobile phone users' online behavior.
Brief description of the drawings
Figure 1 is a schematic flowchart of an embodiment of the data processing method based on the online behavior of mobile phone users according to the present invention;
Figure 2 is a schematic flowchart of performing Internet access type URL analysis on the data in the first bill in an embodiment of the data processing method based on the online behavior of mobile phone users according to the present invention;
Figure 3 is a schematic flowchart of performing designated website traffic analysis on the data in the first bill in an embodiment of the data processing method based on the online behavior of mobile phone users according to the present invention;
Figure 4 is a schematic flowchart of performing advertisement access traffic analysis on the data in the first bill in an embodiment of the data processing method based on the online behavior of mobile phone users according to the present invention;
Figure 5 is a schematic structural diagram of an embodiment of the data processing device based on the online behavior of mobile phone users according to the present invention;
Figure 6 is a schematic structural diagram of the new bill generation module in an embodiment of the data processing device based on the online behavior of mobile phone users according to the present invention.
To make the technical solution of the present invention clearer, it is described in further detail below with reference to the drawings.
Detailed description
The solution of the embodiments of the present invention is mainly to preprocess bill data before it is loaded into the database. The preprocessing includes classifying and aggregating the URLs generated by users' Internet access, converting URLs according to certain rules, and so on; the new bill data produced by this series of preprocessing steps is then loaded into the database. The system database subsequently performs statistical analysis on the second bill data, which improves the efficiency of bill data processing and resolves the performance bottleneck in analyzing mobile phone users' online behavior.
As shown in Figure 1, an embodiment of the present invention proposes a data processing method based on the online behavior of mobile phone users, comprising:
Step S101: generating, from user Internet access data, a first bill containing the URLs accessed by the user;
In this embodiment, users can go online from their mobile phones and visit various websites to obtain network information. When a user goes online from a mobile phone, the mobile service system obtains network data according to the URLs the user visits and generates an original bill, referred to in this embodiment as the first bill. The more visits users make, the more bills the mobile service system generates.
The bill contains the URLs accessed by the user. A URL is an identification method for fully describing the address of a web page or other resource on the Internet. Every web page on the Internet has a unique name identifier, usually called a URL address. Such an address may refer to a local disk or to a computer on a local area network, but most often it refers to a site on the Internet. Put simply, a URL is what is commonly called a "web address".
After the mobile service system obtains the first bill, it needs to analyze and aggregate the first bill data so that the results reveal the online behavior of mobile phone users, for example which types of websites users favor, the access traffic of certain designated websites, and the advertisement access traffic that merchants care about, so that appropriate commercial measures can subsequently be taken based on that behavior.
Step S102: preprocessing the data in the first bill according to predetermined rules to generate a second bill;
In this embodiment, the predetermined rules are formulated around the main questions operators care about, such as the types of websites visited, website access traffic, and advertisement access traffic. Preprocessing the data in the first bill according to the predetermined rules includes performing Internet access type URL analysis and/or designated website traffic analysis and/or advertisement access traffic analysis on that data; specifically, for example, the URLs generated by users' Internet access can be classified and aggregated, and URLs can be converted according to certain rules.
Depending on what information needs to be extracted from the data, the predetermined rules may also be other similar rules.
In this embodiment, the data in the first bill may be preprocessed by a separate device such as an interface machine. The interface machine first preprocesses the bill data, for example by classifying and aggregating the URLs generated by users' Internet access and converting URLs according to certain rules. This series of preprocessing steps produces a new bill, the second bill of this embodiment, and the new bill data is then loaded into the database so that the system database can perform statistical analysis on it in subsequent processing. In this embodiment, the second bill data may be loaded by an IMP loading program that writes the preprocessed data into the database table designated by the system.
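The loading step can be pictured with a small sketch. The IMP loading program and the system database are not specified in the text, so an in-memory SQLite database stands in for both here; the table name, column names, user identifiers, and the example.com row are hypothetical.

```python
import sqlite3

# Hypothetical second-bill rows: (user id, source URL, precomputed URL type).
SECOND_BILL = [
    ("user-1", "http://www.sina.com/news/1004.htm", "news"),
    ("user-2", "http://www.sina.com/news/1005.htm", "news"),
    ("user-1", "http://example.com/match/7.htm", "sports"),
]


def load_second_bill(conn: sqlite3.Connection, rows) -> None:
    """Write preprocessed rows into the designated table (assumed schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS second_bill "
        "(user_id TEXT, source_url TEXT, url_type TEXT)"
    )
    conn.executemany("INSERT INTO second_bill VALUES (?, ?, ?)", rows)
    conn.commit()


def count_by_type(conn: sqlite3.Connection) -> dict:
    """Aggregate on the precomputed column; no URL parsing happens here."""
    cursor = conn.execute(
        "SELECT url_type, COUNT(*) FROM second_bill GROUP BY url_type"
    )
    return dict(cursor.fetchall())


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load_second_bill(connection, SECOND_BILL)
    print(count_by_type(connection))
```

The point of the sketch is the division of labor: the database query touches only the `url_type` column that was filled in before loading, so the database never decrypts or parses a URL.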
Step S103: performing statistical analysis on the data in the second bill.
As described above, the newly generated bill data is handed to the system database for statistical analysis; for example, summary data for a given class of URLs that the user wants can be computed from the category of each URL in the second bill. The parsing of the URLs in the first bill is thus delegated to the interface machine, the parsed results form a new bill, and the system database runs its statistical analysis directly on those results. This removes the need for the database to analyze large volumes of URL data, which greatly improves the efficiency of bill data processing and resolves the performance bottleneck in analyzing mobile phone users' online behavior.
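Steps S101 to S103 can be sketched end to end as three small functions. This is a minimal illustration under stated assumptions, not the patented implementation: the record layout, the function names, and the `add_host` rule are hypothetical, and URL decryption is omitted because the text does not name the cipher.

```python
from collections import Counter
from urllib.parse import urlsplit


def generate_first_bill(access_log):
    """S101: one record per access, carrying the source URL."""
    return [{"user": user, "url": url} for user, url in access_log]


def add_host(record):
    """Example rule (hypothetical): derive the host from the source URL."""
    return {"host": urlsplit(record["url"]).hostname or ""}


def preprocess(first_bill, rules):
    """S102: apply every rule to every record; the enriched records form the second bill."""
    second_bill = []
    for record in first_bill:
        enriched = dict(record)
        for rule in rules:
            enriched.update(rule(record))
        second_bill.append(enriched)
    return second_bill


def analyze(second_bill, field):
    """S103: aggregate on a precomputed field, with no URL parsing."""
    return Counter(record[field] for record in second_bill)


if __name__ == "__main__":
    log = [
        ("user-1", "http://www.sina.com/news/1004.htm"),
        ("user-2", "http://www.sina.com/news/1005.htm"),
        ("user-1", "http://example.com/index.htm"),
    ]
    print(analyze(preprocess(generate_first_bill(log), [add_host]), "host"))
```

Passing the rules in as a list mirrors the statement that the predetermined rules may be extended with other similar rules without changing the analysis step.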
As shown in Figure 2, the step of performing Internet access type URL analysis on the data in the first bill in step S102 comprises:
Step S1021: adding to the first bill a URL type field for storing the category to which the URL belongs;
Step S1022: parsing the source URL in the first bill;
Step S1023: looking up the category corresponding to the source URL in a preset URL category lookup table, and writing it into the URL type field corresponding to the source URL in the second bill.
The process of performing Internet access type URL analysis on the data in the first bill is illustrated below with a concrete example. Suppose the first bill data is as shown in Table 2 below:
Table 2
The URL classification standard, that is, the preset URL category lookup table, is shown in Table 3 below:
Table 3
The result of preprocessing by Internet access type URL analysis is shown in Table 4 below:
Table 4
It follows that summary data for a given class of URLs that the user wants, such as news, can be computed from the category of each URL in the second bill. Table 4 contains two news URLs: http://www.sina.com/news/1004.htm and http://www.sina.com/news/1005.htm.
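Steps S1021 to S1023 might look like the following sketch. The contents of Tables 2 to 4 are not available here, so the lookup table below is a hypothetical stand-in keyed by host and first path segment; only the two sina.com news URLs come from the text.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for the preset URL category lookup table (Table 3),
# keyed by (host, first path segment).
CATEGORY_TABLE = {
    ("www.sina.com", "news"): "news",
    ("www.sina.com", "sports"): "sports",
}


def classify(url, table=CATEGORY_TABLE, default="other"):
    """S1022 and S1023: parse the source URL and look up its category."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    segments = [segment for segment in parts.path.split("/") if segment]
    first_segment = segments[0] if segments else ""
    return table.get((host, first_segment), default)


def add_url_type(first_bill):
    """S1021 and S1023: add the URL type field and fill it for each record."""
    return [dict(record, url_type=classify(record["url"])) for record in first_bill]


if __name__ == "__main__":
    first_bill = [
        {"url": "http://www.sina.com/news/1004.htm"},
        {"url": "http://www.sina.com/news/1005.htm"},
        {"url": "http://www.sina.com/sports/88.htm"},
    ]
    for record in add_url_type(first_bill):
        print(record["url_type"], record["url"])
```

Counting the records whose `url_type` is `news` in the resulting second bill gives the two news URLs mentioned above.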
As shown in Figure 3, the step of performing designated website traffic analysis on the data in the first bill in step S102 comprises:
Step S1024: adding to the first bill a new URL field for storing the converted new URL;
Step S1025: converting the source URL in the first bill according to predetermined conversion rules;
Step S1026: writing the converted source URL into the new URL field corresponding to the source URL in the second bill.
The predetermined conversion rules may be a conversion rule table drawn up according to the rules configured in the system HOST file. For example, for a given HOST there are rules such as those shown in Table 5, where each URL has a "process extension" option and an "ignore parameters" option.
Table 5
According to this conversion rule table, the source URL in the first bill can be converted into a new URL and written into the corresponding new URL field in the second bill. The access traffic of a designated website, or of specific content within it, can then be computed from the new URL field in the second bill.
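One way to picture the conversion in steps S1024 to S1026 is sketched below. Table 5 is not available, so the semantics are assumptions: "ignore parameters" is taken to mean dropping the query string, and "process extension" is taken to mean stripping the file extension from the last path segment. The rule table and the `from=wap` parameter are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit

# Hypothetical per-HOST rule table standing in for Table 5.
RULES = {
    "www.sina.com": {"process_extension": True, "ignore_parameters": True},
}


def convert(url, rules=RULES):
    """S1025: convert a source URL according to the rule for its host."""
    parts = urlsplit(url)
    rule = rules.get(parts.hostname or "")
    if rule is None:
        return url  # no rule for this host: leave the URL unchanged
    path, query = parts.path, parts.query
    if rule["ignore_parameters"]:
        query = ""  # assumed meaning: drop the query string
    if rule["process_extension"]:
        path, _extension = posixpath.splitext(path)  # assumed meaning: strip ".htm" etc.
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


def add_new_url(first_bill):
    """S1024 and S1026: add the new URL field and fill it for each record."""
    return [dict(record, new_url=convert(record["url"])) for record in first_bill]


if __name__ == "__main__":
    print(convert("http://www.sina.com/news/1004.htm?from=wap"))
```

Because differently parameterized requests for the same page collapse to one converted URL, grouping the second bill by the new URL field yields per-page traffic directly.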
Note that when the first bill data is preprocessed, the three preprocessing modes described in this embodiment, namely Internet access type URL analysis, designated website traffic analysis, and advertisement access traffic analysis of the data in the first bill, can be combined. The resulting second bill then allows the types of websites users visit, the access traffic of designated websites, and advertisement access traffic to be computed at the same time.
Data tests comparing the solution of the embodiments of the present invention for analyzing mobile phone users' online behavior with the conventional solution gave the results shown in Table 6 below:
Table 6
As Table 6 shows, compared with the conventional technique, the solution of the embodiments of the present invention analyzes users' Internet access data faster, greatly increases the speed of bill data processing, reduces the processing load on the system database, and resolves the performance bottleneck in analyzing mobile phone users' online behavior.
In this embodiment, a preprocessing device such as an interface machine preprocesses the bill data before it is loaded into the database. The preprocessing includes classifying and aggregating the URLs generated by users' Internet access, converting URLs according to certain rules, and so on; the new bill data produced by this series of preprocessing steps is then loaded into the database. The system database subsequently performs statistical analysis on the second bill data. URL parsing is thus delegated to the interface machine, the parsed results form a new bill, and the system database runs its statistical analysis directly on those results. This removes the need to analyze large volumes of URL data in the database, which greatly improves the efficiency of bill data processing and resolves the performance bottleneck in analyzing mobile phone users' online behavior.
As shown in Figure 4, the step of performing advertisement access traffic analysis on the data in the first bill in step S102 comprises:
Step S1027: adding to the first bill an advertisement URL field for storing advertisement URLs;
Step S1028: separating out the advertisement URL according to a predetermined identifier carried by the source URL in the first bill;
Step S1029: writing the advertisement URL into the advertisement URL field corresponding to the source URL in the second bill.
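Steps S1027 to S1029 can be sketched as follows. The text does not say what the predetermined identifier looks like, so this sketch assumes it is a query parameter named `adid`; that name, the field name `ad_url`, and the example.com URL are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

AD_MARKER = "adid"  # hypothetical predetermined identifier


def extract_ad_url(source_url: str, marker: str = AD_MARKER) -> str:
    """S1028: return the source URL if its query carries the marker, else ''."""
    query = parse_qs(urlsplit(source_url).query, keep_blank_values=True)
    return source_url if marker in query else ""


def add_ad_url_field(first_bill):
    """S1027 and S1029: add the advertisement URL field and fill it per record."""
    return [dict(record, ad_url=extract_ad_url(record["url"])) for record in first_bill]


if __name__ == "__main__":
    first_bill = [
        {"url": "http://example.com/banner?adid=42"},
        {"url": "http://www.sina.com/news/1004.htm"},
    ]
    for record in add_ad_url_field(first_bill):
        print(repr(record["ad_url"]))
```

Records with a non-empty advertisement URL field can then be counted in the database without inspecting the source URL again.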
如图5所示,本发明一实施例提出一种基于手机用户上网行为的数据处理装置,包括:原始话单生成模块501、新话单生成模块502以及新话单数据处理模块503,其中:As shown in Figure 5, an embodiment of the present invention proposes a data processing device based on mobile phone users' online behavior, including: an original bill generation module 501, a new bill generation module 502, and a new bill data processing module 503, wherein:
原始话单生成模块501,用于根据用户上网数据生成包含有用户访问URL的第一话单;The original bill generation module 501 is used to generate the first bill that includes the user's access URL according to the user's online data;
在本实施例中,用户可以通过手机上网,访问各种网站,以获取相应的网络信息。当用户通过手机上网时,移动业务系统中原始话单生成模块501根据手机用户的访问网址获取网络数据,产生原始话单,即本实施例所称第一话单,用户访问量越多,移动业务系统产生的话单量相应增加。In this embodiment, the user can surf the Internet through the mobile phone and visit various websites to obtain corresponding network information. When a user surfs the Internet through a mobile phone, the original call list generation module 501 in the mobile service system obtains network data according to the mobile phone user's access URL, and generates the original call list, which is the first call list in this embodiment. The amount of bills generated by the business system increases accordingly.
其中,话单中包含有用户访问的URL。URL是用于完整地描述Internet上网页和其他资源的地址的一种标识方法。Internet上的每一个网页都具有一个唯一的名称标识,通常称之为URL地址,这种地址可以是本地磁盘,也可以是局域网上的某一台计算机,更多的是Internet上的站点。简单地说,URL就是Web地址,俗称“网址”。Wherein, the bill includes the URL accessed by the user. URL is an identification method used to completely describe the addresses of web pages and other resources on the Internet. Every web page on the Internet has a unique name identifier, usually called a URL address. This address can be a local disk, or a certain computer on the LAN, and more often it is a site on the Internet. Simply put, a URL is a web address, commonly known as a "web address".
当移动业务系统获取到第一话单后,需要对第一话单数据进行分析统计处理,以便根据处理结果了解手机用户上网行为,比如:用户常喜欢上哪些类型的网站、某些指定的网站的访问流量情况以及商家关心的广告访问流量等,从而根据手机用户上网行为后续采取相应的商业措施等。After the mobile service system obtains the first bill, it needs to analyze and statistically process the data of the first bill, so as to understand the online behavior of mobile phone users based on the processing results, such as: which types of websites users often like, and certain designated websites The visit flow of mobile phone users and the advertisement visit traffic that merchants care about, so as to take corresponding commercial measures according to the online behavior of mobile phone users.
新话单生成模块502,用于按照预定规则对第一话单中数据进行预处理,生成第二话单;The new bill generation module 502 is used to preprocess the data in the first bill according to predetermined rules to generate a second bill;
在本实施例中,新话单生成模块501按照预定规则对第一话单中数据进行预处理具体包括对第一话单中数据进行上网类型URL分析处理、指定网站流量分析处理和/或广告访问流量分析处理。In this embodiment, the new bill generation module 501 performs preprocessing on the data in the first bill according to predetermined rules, specifically including performing URL analysis processing on the Internet type, traffic analysis processing on a specified website and/or advertisement processing on the data in the first bill. Access traffic analysis processing.
其中,预定规则是针对运营商所关心的访问网站类型、网站访问流量以及广告访问流量等主要问题而制定,其中按照预定规则对第一话单中数据进行预处理包括:对第一话单中数据进行上网类型URL分析处理和/或指定网站流量分析处理和/或广告访问流量分析处理,具体的,比如可以对用户上网生成的URL进行分类汇总、对URL按照一定规则进行转换等。Among them, the predetermined rules are formulated for the main issues of the operator concerned, such as the type of website visited, the traffic of website visits, and the traffic of advertisements. The preprocessing of the data in the first bill according to the predetermined rules includes: The data is analyzed and processed for Internet access type URLs and/or specified website traffic analysis and/or advertising access traffic analysis processing. Specifically, for example, URLs generated by users can be classified and summarized, and URLs can be converted according to certain rules.
根据获取数据处理信息的需要,上述预定规则还可为其他类似的规则。According to the requirement of obtaining data processing information, the above predetermined rules may also be other similar rules.
In this embodiment, the data in the first bill may be preprocessed by a separate device, such as an interface machine. The interface machine first preprocesses the bill data, for example by classifying and summarizing the URLs generated when users go online and by converting URLs according to certain rules. After this series of preprocessing steps, the new bill generation module 502 generates the new bill data and stores it in the database, so that in subsequent processing the system database can perform statistical analysis on the second bill data. In this embodiment, the second bill data may be stored by using the IMP loading program to enter the preprocessed data into the database table specified by the system.
The new bill data processing module 503 is configured to perform statistical analysis on the data in the second bill.
As described above, the newly generated bill data is handed to the new bill data processing module 503 of the system database for statistical analysis. For example, based on the category to which each URL in the second bill belongs, summary data can be compiled for whichever category of URL the user wishes to obtain.
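The per-category summary described above can be sketched as follows. This is only an illustration under stated assumptions: the patent does not specify the record layout, so the field names `url_type` and `bytes` and the function name `summarize_by_category` are hypothetical.

```python
from collections import Counter


def summarize_by_category(second_bill):
    """Count visits and sum traffic per URL category.

    `second_bill` is an iterable of dict records. Each record is assumed
    (hypothetically) to carry a "url_type" field filled in during
    preprocessing and an optional "bytes" traffic field.
    """
    visits = Counter()
    traffic = Counter()
    for record in second_bill:
        category = record["url_type"]
        visits[category] += 1
        # Records without a traffic field contribute zero bytes.
        traffic[category] += record.get("bytes", 0)
    return visits, traffic
```

Because the category is already stored in each record, this step only groups and counts; it never has to parse a URL.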
In this way, parsing of the URLs in the first bill is handed to the interface machine, and the parsed result data forms a new bill. The system database performs statistical analysis directly on the result data and no longer has to analyze large batches of URL data itself, which greatly improves the efficiency of bill data processing and resolves the performance bottleneck in analyzing mobile phone users' online behavior.
As shown in Figure 6, the new bill generation module 502 includes a field adding unit 5021, a parsing unit 5022, and a writing unit 5023, wherein:
The field adding unit 5021 is configured to add to the first bill a URL type field for storing the category to which a URL belongs;
The parsing unit 5022 is configured to parse the source URL in the first bill;
The writing unit 5023 is configured to look up the category corresponding to the source URL in a preset URL category mapping table and write it into the URL type field corresponding to the source URL in the second bill.
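A minimal sketch of how the three units could cooperate on one record is given below. It assumes, hypothetically, that the category table is keyed on host name and that records are dicts with a `source_url` field; the table contents, the `url_type` field name, and both function names are illustrative and not taken from the patent.

```python
from urllib.parse import urlsplit

# Hypothetical mapping table. The patent only states that a preset
# URL-category mapping table exists, not what it contains or how it is keyed.
URL_CATEGORY_TABLE = {
    "news.example.com": "news",
    "music.example.com": "music",
}


def classify_url(source_url, table=URL_CATEGORY_TABLE, default="other"):
    """Parse the source URL and look up its category by host name."""
    host = urlsplit(source_url).hostname or ""
    return table.get(host, default)


def add_url_type(first_bill_record):
    """Return a second-bill record: the original fields plus "url_type"."""
    second = dict(first_bill_record)  # leave the first-bill record untouched
    second["url_type"] = classify_url(first_bill_record["source_url"])
    return second
```

For example, `add_url_type({"source_url": "http://news.example.com/a?id=1"})` yields a record whose `url_type` is `"news"`, while a host missing from the table falls back to `"other"`.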
Further, the field adding unit 5021 is also configured to add to the first bill a new URL field for storing the converted URL;
The parsing unit 5022 is also configured to convert the source URL in the first bill according to a predetermined conversion rule;
The writing unit 5023 is also configured to write the converted source URL into the new URL field corresponding to the source URL in the second bill.
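The patent does not state what the predetermined conversion rule is. Purely as an assumption for illustration, the sketch below reduces a source URL to its host plus the first path segment, so that many page-level URLs collapse to one site-section key; the function name `convert_url` is hypothetical.

```python
from urllib.parse import urlsplit


def convert_url(source_url):
    """Hypothetical conversion rule: host plus first path segment.

    The scheme, query string, fragment, and deeper path segments are
    dropped. `urlsplit(...).hostname` is already lower-cased.
    """
    parts = urlsplit(source_url)
    host = parts.hostname or ""
    segments = [s for s in parts.path.split("/") if s]
    return host + ("/" + segments[0] if segments else "")
```

Under this assumed rule, `http://WWW.Example.com/sports/2010/11/08/story.html?uid=42` becomes `www.example.com/sports`, which is the value that would be stored in the new URL field.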
Furthermore, the field adding unit 5021 is also configured to add to the first bill an advertisement URL field for storing advertisement URLs;
The parsing unit 5022 is also configured to separate out advertisement URLs according to a predetermined identifier carried by the source URL in the first bill;
The writing unit 5023 is also configured to write the advertisement URL into the advertisement URL field corresponding to the source URL in the second bill.
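The separation of advertisement URLs by a predetermined identifier might look like the sketch below. The patent does not say what the identifier is or where in the URL it appears; here it is assumed, hypothetically, to be a query parameter named `ad_id`, and the function name `extract_ad_url` is likewise illustrative.

```python
from urllib.parse import parse_qs, urlsplit

AD_MARKER = "ad_id"  # hypothetical identifier; the patent does not name one


def extract_ad_url(source_url, marker=AD_MARKER):
    """Return the source URL if its query string carries the marker.

    Returns an empty string otherwise, so the advertisement URL field of
    the second-bill record stays blank for non-advertisement visits.
    """
    # keep_blank_values=True so that "?ad_id=" still counts as carrying it.
    query = parse_qs(urlsplit(source_url).query, keep_blank_values=True)
    return source_url if marker in query else ""
```

Matching on a parsed query parameter rather than a raw substring avoids false positives such as a path that merely contains the text `ad_id`.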
In the data processing method and device based on the online behavior of mobile phone users according to the embodiments of the present invention, before the bill data is stored in the database, a preprocessing device such as an interface machine first preprocesses the bill data. The preprocessing includes classifying and summarizing the URLs generated when users go online and converting URLs according to certain rules. After this series of preprocessing steps, new bill data is generated and stored in the database, and the system database then performs statistical analysis on the second bill data. URL parsing is thus handed to the interface machine, the parsed result data forms a new bill, and the system database performs statistical analysis directly on the result data without having to analyze large batches of URL data. This greatly improves the efficiency of bill data processing and resolves the performance bottleneck in analyzing mobile phone users' online behavior.
The above are only preferred embodiments of the present invention and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.
Claims (7)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201010535447.0A CN102006174B (en) | 2010-11-08 | 2010-11-08 | Data processing method and device based on online behavior of mobile phone user |
| PCT/CN2011/075696 WO2012062107A1 (en) | 2010-11-08 | 2011-06-13 | Method and apparatus for data processing based on surfing behavior of mobile telephone user |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201010535447.0A CN102006174B (en) | 2010-11-08 | 2010-11-08 | Data processing method and device based on online behavior of mobile phone user |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN102006174A (en) | 2011-04-06 |
| CN102006174B (en) | 2015-01-28 |
Family
ID=43813268
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201010535447.0A Expired - Fee Related CN102006174B (en) | Data processing method and device based on online behavior of mobile phone user | 2010-11-08 | 2010-11-08 |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN102006174B (en) |
| WO (1) | WO2012062107A1 (en) |
Families Citing this family (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102006174B (en) * | 2010-11-08 | 2015-01-28 | 中兴通讯股份有限公司 | Data processing method and device based on online behavior of mobile phone user |
| CN102547663B (en) * | 2012-03-09 | 2016-05-11 | 北京思特奇信息技术股份有限公司 | A kind of surfing Internet with cell phone optimization method based on traffic matrix |
| CN104331404B (en) * | 2013-07-22 | 2018-05-01 | 中国科学院深圳先进技术研究院 | A kind of user's behavior prediction method and apparatus based on user mobile phone Internet data |
| CN104978341A (en) * | 2014-04-08 | 2015-10-14 | 北京奇虎科技有限公司 | File processing method and equipment, and network system |
| CN105791613A (en) * | 2014-12-24 | 2016-07-20 | 中兴通讯股份有限公司 | Call bill processing method and device |
| CN104866909A (en) * | 2015-04-29 | 2015-08-26 | 国网智能电网研究院 | Method and system for finishing air ticket booking function URL |
| CN105827432A (en) * | 2015-12-29 | 2016-08-03 | 广东亿迅科技有限公司 | SHELL script-based traffic log statistical method and statistical system |
| CN108287831B (en) * | 2017-01-09 | 2022-08-05 | 阿里巴巴集团控股有限公司 | URL classification method and system and data processing method and system |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN1353371A (en) * | 2000-11-10 | 2002-06-12 | 思网科技股份有限公司 | A dynamic real-time data analysis and processing system and method |
| WO2003005169A2 (en) * | 2001-07-06 | 2003-01-16 | Clickfox, Llc | System and method for analyzing system visitor activities |
| CN101562538A (en) * | 2009-04-15 | 2009-10-21 | 计世在线网络技术(北京)有限公司 | System for analyzing website access |
| CN101872347A (en) * | 2009-04-22 | 2010-10-27 | 富士通株式会社 | Method and device for judging webpage type |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102006174B (en) * | 2010-11-08 | 2015-01-28 | 中兴通讯股份有限公司 | Data processing method and device based on online behavior of mobile phone user |
- 2010-11-08: CN201010535447.0A, published as CN102006174B (not active, Expired - Fee Related)
- 2011-06-13: PCT/CN2011/075696, published as WO2012062107A1 (not active, Ceased)
Also Published As
| Publication number | Publication date |
|---|---|
| WO2012062107A1 (en) | 2012-05-18 |
| CN102006174A (en) | 2011-04-06 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN102006174B (en) | Data processing method and device based on online behavior of mobile phone user | |
| US10839038B2 (en) | Generating configuration information for obtaining web resources | |
| US8917624B2 (en) | Click quality classification and delivery | |
| US9928251B2 (en) | System and method for distributed categorization | |
| CN112347377B (en) | IP address field searching method, service scheduling method, device and electronic equipment | |
| US9600470B2 (en) | Method and system relating to re-labelling multi-document clusters | |
| US10769254B2 (en) | Method and apparatus for identifying user behavior object based on traffic analysis | |
| EP3923157B1 (en) | Data stream processing | |
| CN106936667A (en) | A kind of main frame real-time identification method based on application rs traffic distributed analysis | |
| US10122722B2 (en) | Resource classification using resource requests | |
| WO2013106595A2 (en) | Processing store visiting data | |
| WO2016101811A1 (en) | Information arrangement method and apparatus | |
| WO2016078533A1 (en) | Search method, apparatus, and device and non-volatile computer storage medium | |
| CN101071445A (en) | Classified sample set optimizing method and content-related advertising server | |
| CN114443940A (en) | A message subscription method, device and device | |
| CN103093377B (en) | A kind of advertisement placement method and system | |
| CN101257461A (en) | Category-based content filtering method and device | |
| US9525744B2 (en) | Determining a uniform user identifier for a visiting user | |
| US7937392B1 (en) | Classifying uniform resource identifier (URI) using xpath expressions | |
| CN105447020B (en) | A kind of method and device of determining business object keyword | |
| CN103986606B (en) | It is a kind of based on the parallelism recognition of MapReduce algorithms, the method for statistical web page URL | |
| CN103368835B (en) | Method for classifying network users and routing equipment | |
| CN102957721A (en) | Device and method for classifying users based on identification information | |
| CN107977381A (en) | Data configuration method, index managing method, relevant apparatus and computing device | |
| CN105530279A (en) | Data processing method and processing device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | C14 | Grant of patent or utility model | |
| | GR01 | Patent grant | |
| | TR01 | Transfer of patent right | Effective date of registration: 20170801. Patentee after: Guangzhou Verce Intelligent Technology Co.,Ltd. Address after: 510620 Tianhe District, Guangzhou, Guangdong Sports East Road 136, 138 1109, 1110 units (for office use only). Patentee before: ZTE Corp. Address before: 518057 Nanshan District Guangdong high tech Industrial Park, South Road, science and technology, ZTE building, Ministry of Justice. |
| | CP02 | Change in the address of a patent holder | Patentee: Guangzhou Verce Intelligent Technology Co.,Ltd. Address after: 510620 Room 901, Radio and Television Science and Technology Building, 163 Pingyun Road, Tianhe District, Guangzhou City, Guangdong Province. Address before: 510620 Tianhe District, Guangzhou, Guangdong Sports East Road 136, 138 1109, 1110 units (for office use only). |
| | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20150128 |