CN106815274B - Hadoop-based log data mining method and system

Info

Publication number: CN106815274B
Application number: CN201510875453.3A
Authority: CN
Inventors: 惠羿; 熊伟; 哈景楠
Original assignee: ZTE Corp
Current assignee: ZTE Corp
Priority date: 2015-12-02
Filing date: 2015-12-02
Publication date: 2022-02-18
Anticipated expiration: 2035-12-02
Also published as: CN106815274A; WO2017092444A1

Abstract

The invention discloses a Hadoop-based log data mining method. The method saves the acquired first log data set of the current time period into a Hadoop database; if the number of first log data sets saved in the Hadoop database meets a preset value, it uses a preset parallel computing model to perform parallel aggregation processing on the first log data sets in the Hadoop database to obtain a second log data set; it then divides the log data in the second log data set by dimension, according to the dimensions of that log data, and stores the resulting third log data sets, each corresponding to a different dimension, in the Hadoop database. The invention also discloses a Hadoop-based log data mining system. The invention can mine massive data quickly and effectively, and meets the storage and computation requirements of mining massive data.

Description

Hadoop-based log data mining method and system

Technical Field

The invention relates to the field of computer data processing, in particular to a log data mining method and system based on Hadoop.

Background

Since the advent of the internet era, quickly finding a more appropriate, quantifiable, and predictable precision-marketing strategy within an ever-growing mass of user information has become a core demand of numerous enterprises, including operators.

However, a traditional database has limited data processing capability and high storage cost, and cannot meet the requirements of mining massive data.

The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.

Disclosure of Invention

The main object of the invention is to provide a Hadoop-based log data mining method and system that solve the technical problems that a traditional database has limited data processing capability, has high storage cost, and cannot support the mining of massive data.

In order to achieve the above object, the invention provides a log data mining method based on Hadoop, comprising:

storing the acquired first log data set in the current time period into a Hadoop database;

if the number of the first log data sets stored in the Hadoop database meets a preset numerical value, performing parallel aggregation processing on the first log data sets in the Hadoop database by using a preset parallel operation model to obtain a second log data set;

and performing dimension division on the log data in the second log data set according to the dimensions of the log data in the second log data set, and storing the obtained third log data sets corresponding to different dimensions into the Hadoop database.

Preferably, the method further comprises:

acquiring log data in the current time period from a network side;

and carrying out aggregation processing on the log data in the current time period to obtain a first log data set in the current time period.

Preferably, the step of obtaining the log data in the current time period from the network side further includes:

performing data cleaning on the log data in the current time period to obtain cleaned log data in the current time period;

the step of performing aggregation processing on the log data in the current time period to obtain a first log data set in the current time period includes:

and carrying out aggregation processing on the cleaned log data in the current time period to obtain a first log data set in the current time period.

Preferably, the method further comprises:

if a data query instruction is received, reading a third log data set corresponding to the query dimension from the Hadoop database according to the query dimension contained in the data query instruction;

and performing data analysis on the third log data set, and displaying the result of the data analysis on a display interface.

Preferably, the performing data analysis on the third log data set includes:

performing user grouping on the users in the third log data set according to a preset clustering algorithm to obtain a user grouping list;

obtaining a level configuration table corresponding to at least two user dimensions according to log data of users in a user grouping list, wherein the user dimensions are preset, and the level configuration table comprises levels determined by the users in the user grouping list according to the user dimensions in a grading manner.

In order to achieve the above object, the present invention further provides a log data mining system based on Hadoop, including:

the first storage module is used for storing the acquired first log data set in the current time period into a Hadoop database;

the parallel aggregation module is used for performing parallel aggregation processing on the first log data set in the Hadoop database by using a preset parallel operation model to obtain a second log data set if the number of the first log data sets stored in the Hadoop database meets a preset numerical value;

and the division and storage module is used for performing dimension division on the log data in the second log data set according to the dimension of the log data in the second log data set, and storing the obtained third log data sets corresponding to different dimensions into the Hadoop database.

Preferably, the system further comprises:

the acquisition module is used for acquiring the log data in the current time period from a network side;

and the first aggregation module is used for performing aggregation processing on the log data in the current time period to obtain a first log data set in the current time period.

Preferably, the system further comprises a cleaning module;

the cleaning module is used for cleaning the log data in the current time period after the acquisition module acquires the log data in the current time period to obtain the cleaned log data in the current time period;

and the first aggregation module is specifically configured to perform aggregation processing on the cleaned log data in the current time period to obtain a first log data set in the current time period.

Preferably, the system further comprises:

the reading module is used for reading a third log data set corresponding to a query dimension from the Hadoop database according to the query dimension contained in the data query instruction if the data query instruction is received;

and the analysis module is used for carrying out data analysis on the third log data set and displaying the result of the data analysis on a display interface.

Preferably, the analysis module comprises:

the clustering module is used for carrying out user grouping on the users in the third log data set according to a preset clustering algorithm to obtain a user grouping list;

an obtaining and displaying module, configured to obtain a level configuration table corresponding to at least two user dimensions according to log data of users in a user grouping list, where the user dimensions are preset, and the level configuration table includes levels determined for the users in the user grouping list by grading them according to the user dimensions.

The invention provides a Hadoop-based log data mining method, which comprises: storing a first log data set of the current time period into a Hadoop database; if the number of first log data sets stored in the Hadoop database meets a preset numerical value, performing parallel aggregation processing on the first log data sets in the Hadoop database by using a preset parallel operation model to obtain a second log data set; and performing dimension division on the log data in the second log data set according to the dimensions of that log data, and storing the third log data sets corresponding to different dimensions into the Hadoop database, thereby completing the mining of the log data. Because the Hadoop framework has good distributed storage and parallel operation capabilities, storing the log data in a distributed manner in the Hadoop database and processing it in parallel with the parallel operation model allows massive data to be mined quickly and effectively, and meets the storage and operation requirements of mining massive data.

Drawings

FIG. 1 is a schematic flow chart of a Hadoop-based log data mining method according to a first embodiment of the present invention;

FIG. 2 is a schematic flow chart showing additional steps before step 101 of the first embodiment of FIG. 1;

FIG. 3 is a flow chart illustrating additional steps after step 103 of the first embodiment of FIG. 1;

FIG. 4 is a diagram illustrating functional modules of a Hadoop-based log data mining system according to a second embodiment of the present invention;

FIG. 5 is a diagram of additional functional modules in the second embodiment of FIG. 4;

FIG. 6 is a schematic diagram of further additional functional modules in the second embodiment of FIG. 4.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The invention provides a Hadoop-based log data mining method, which comprises: storing a first log data set of the current time period into a Hadoop database; if the number of first log data sets stored in the Hadoop database meets a preset numerical value, performing parallel aggregation processing on the first log data sets in the Hadoop database by using a preset parallel operation model to obtain a second log data set; and performing dimension division on the log data in the second log data set according to the dimensions of that log data, and storing the third log data sets corresponding to different dimensions into the Hadoop database, thereby completing the mining of the log data. Because the Hadoop framework has good distributed storage and parallel operation capabilities, storing the log data in a distributed manner in the Hadoop database and processing it in parallel with the preset parallel operation model in Hadoop allows massive data to be mined quickly and effectively, and meets the storage and operation requirements of mining massive data.

Referring to fig. 1, a schematic flow chart of a Hadoop-based log data mining method according to a first embodiment of the present invention includes:

step 101, storing the acquired first log data set in the current time period into a Hadoop database;

in the embodiment of the invention, the log data mining method based on Hadoop can be applied to a log data mining system based on Hadoop (hereinafter referred to as mining system), and the mining system stores the acquired first log data set in the current time period into a Hadoop database.

The mining system acquires the first log data set according to a time period, for example, if the time period is 15 minutes or 30 minutes, the mining system acquires the first log data set in the current 15-minute time period or acquires the first log data set in the current 30-minute time period.

The time period is a period for acquiring data, and the duration of the time period can be determined according to the size of the data volume.
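As an illustration of how a log record might be assigned to its acquisition time period, the following minimal Python sketch floors a timestamp to the start of its period. The function name `period_start` and the choice to align periods to midnight are assumptions made for this example; the text does not specify how periods are aligned.

```python
from datetime import datetime, timedelta

def period_start(timestamp: datetime, period: timedelta) -> datetime:
    """Return the start of the acquisition period that contains `timestamp`.

    Assumption: periods are aligned to midnight of the same day.
    """
    day_start = timestamp.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = timestamp - day_start
    # timedelta // timedelta gives the number of whole periods elapsed.
    return day_start + (elapsed // period) * period

ts = datetime(2015, 12, 2, 10, 37, 12)
print(period_start(ts, timedelta(minutes=15)))
print(period_start(ts, timedelta(minutes=30)))
```

With a 15-minute period, every record produced between 10:30 and 10:45 would fall into the same first log data set.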

Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). The core of the Hadoop framework consists of the Hadoop database and a parallel operation model: the Hadoop database provides distributed storage for massive data, and the parallel operation model provides parallel computation over that data.

Preferably, the parallel operation model is the MapReduce operation model.

Step 102, if the number of the first log data sets stored in the Hadoop database meets a preset numerical value, performing parallel aggregation processing on the first log data sets in the Hadoop database by using a preset parallel operation model to obtain a second log data set;

in the embodiment of the invention, the mining system stores the acquired first log data set into the Hadoop database in each time period, and if the number of the first log data sets stored in the Hadoop database meets a preset numerical value, the first log data set in the Hadoop database can be aggregated by using a preset parallel operation model in the Hadoop frame to obtain a second log data set.

In practical applications, the value may be preset according to specific needs, for example, if the time period is 15 minutes and aggregation processing needs to be performed on the first log data set within one hour, the preset value is 4; if the time period is 30 minutes and the aggregation process needs to be performed on the first log data set within 1 day, the preset value is 48.
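The arithmetic behind the preset value can be sketched as follows. This is only an illustration: the helper name `preset_count` is hypothetical, and the requirement that the aggregation window be a whole multiple of the time period is an assumption of this example.

```python
from datetime import timedelta

def preset_count(period: timedelta, window: timedelta) -> int:
    """Number of first log data sets that cover one aggregation window."""
    if period <= timedelta(0):
        raise ValueError("period must be positive")
    count, remainder = divmod(window, period)
    if remainder:
        # Assumption: partial periods are not allowed in this sketch.
        raise ValueError("window must be a whole multiple of period")
    return count

print(preset_count(timedelta(minutes=15), timedelta(hours=1)))  # 4
print(preset_count(timedelta(minutes=30), timedelta(days=1)))   # 48
```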

It will be appreciated that, building on the aggregation process described above, the mining system may derive log data sets for different time spans in a similar manner. For example, a one-hour log data set can be obtained from 4 first log data sets with 15-minute time periods, a one-day log data set from 24 one-hour log data sets, a one-month log data set from 30 one-day log data sets, and so on, so that log data sets covering different time spans are available to meet different requirements.

In the embodiment of the invention, when the mining system performs parallel aggregation processing by using the preset parallel operation model, the count values of identical log data are accumulated.
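The accumulation of counts can be pictured with a small MapReduce-style sketch. It is written in plain Python and runs in a single process; it imitates the map, shuffle, and reduce phases and does not use any Hadoop API. The record layout `((user, content, location), count)`, the function names, and the reading of "meets a preset numerical value" as "at least that many sets are stored" are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import chain

def map_phase(data_set):
    """Emit (key, count) pairs from one first log data set."""
    for key, count in data_set:
        yield key, count

def shuffle(pairs):
    """Group the emitted counts by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, count in pairs:
        groups[key].append(count)
    return groups

def reduce_phase(groups):
    """Accumulate the counts of identical log records."""
    return {key: sum(counts) for key, counts in groups.items()}

def parallel_aggregate(first_sets, preset_value):
    """Return the second log data set, or None if too few sets are stored."""
    if len(first_sets) < preset_value:
        return None
    pairs = chain.from_iterable(map_phase(s) for s in first_sets)
    return reduce_phase(shuffle(pairs))

# Four hypothetical 15-minute first log data sets covering one hour.
first_sets = [
    [(("u1", "music", "cell-A"), 3), (("u2", "video", "cell-B"), 1)],
    [(("u1", "music", "cell-A"), 2)],
    [(("u2", "video", "cell-B"), 4)],
    [(("u1", "news", "cell-A"), 1)],
]
second_set = parallel_aggregate(first_sets, preset_value=4)
print(second_set)
```

In a real deployment the map and reduce functions would run on many nodes at once; the sketch only shows that records with the same key end up with one summed count.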

Step 103, performing dimension division on the log data in the second log data set according to the dimensions of the log data in the second log data set, and storing the obtained third log data sets corresponding to different dimensions into the Hadoop database.

In the embodiment of the invention, after the mining system obtains the second log data set, it performs dimension division on the log data in the second log data set according to the dimensions of that log data, and stores the obtained third log data sets corresponding to different dimensions into the Hadoop database, thereby realizing the mining of massive log data. The stored third log data sets can serve as data sources for user data queries and support charts, graphical queries, and multi-dimensional queries on a display interface, so that the data can be presented from multiple angles and the results of the data mining can be displayed.

The log data has many dimensions, including but not limited to internet surfing content, internet surfing position, and internet surfing time. The internet surfing content refers to what the user browses, which may be a specific website, such as Baidu, Sohu, or Sina Weibo, or a type of website, such as music or movies. The internet surfing position refers to the geographical range of the IP location used by the user, and the internet surfing time refers to the time at which the log data was generated. Dimension division further describes the overall behavior of the user through the data on each dimension, according to the requirements of the system. It should be noted that different types of log data have different dimensions. For example, when the technical scheme of the embodiment of the invention is used to mine the traffic data of users in the log data, the dimensions may also include internet surfing frequency, user age, monthly consumption, and the like, in addition to internet surfing content, internet surfing position, and internet surfing time. In practical applications, dimension division can therefore be performed according to specific needs and is not limited here.

Preferably, in the embodiment of the present invention, after the mining system stores the third log data sets corresponding to different dimensions into the Hadoop database, the mining system may also store the third log data sets corresponding to different dimensions into the column storage array, so that cooperative work of the Hadoop database and the column storage array can be realized, and data requirements of different application scenarios can be met.

Preferably, the mining system executes the operations of parallel aggregation processing and dimension division only when the number of the first log data sets stored in the Hadoop database satisfies a preset value, so that the obtained third log data set actually corresponds to a time period, and the mining system can store the corresponding relationship among the dimension, the time period and the third log data set when storing the third log data set.
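One possible reading of dimension division, together with the stored correspondence among dimension, time period, and third log data set, is sketched below. The field order in `FIELDS`, the decision to keep the user identifier in every third log data set, the period label string, and the use of a plain dictionary as a stand-in for the Hadoop database are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical field order inside each aggregated key.
FIELDS = ("user", "content", "location")

def divide_by_dimension(second_set, dimensions):
    """Build one third log data set per dimension from the second log data set."""
    third_sets = {dim: defaultdict(int) for dim in dimensions}
    for key, count in second_set.items():
        record = dict(zip(FIELDS, key))
        for dim in dimensions:
            # Keep the user so that later analysis can group users.
            third_sets[dim][(record["user"], record[dim])] += count
    return {dim: dict(values) for dim, values in third_sets.items()}

def save_third_sets(store, third_sets, period):
    """Record the correspondence among dimension, period, and data set."""
    for dim, data in third_sets.items():
        store[(dim, period)] = data

second_set = {
    ("u1", "music", "cell-A"): 5,
    ("u1", "news", "cell-A"): 1,
    ("u2", "video", "cell-B"): 5,
}
store = {}  # stand-in for the Hadoop database
third_sets = divide_by_dimension(second_set, ("content", "location"))
save_third_sets(store, third_sets, period="2015-12-02 10:00 (1h)")
print(store)
```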

In the embodiment of the invention, the mining system stores the acquired first log data set of the current time period into the Hadoop database. If the number of first log data sets stored in the Hadoop database meets a preset numerical value, the mining system performs parallel aggregation processing on the first log data sets in the Hadoop database by using a preset parallel operation model to obtain a second log data set, performs dimension division on the log data in the second log data set according to the dimensions of that log data, and stores the obtained third log data sets corresponding to different dimensions into the Hadoop database, thereby completing the mining of the log data. Because the Hadoop framework has good distributed storage and parallel operation capabilities, storing the log data in a distributed manner in the Hadoop database and processing it in parallel with the parallel operation model allows massive data to be mined quickly and effectively, and meets the storage and operation requirements of mining massive data.

Referring to fig. 2, a flow chart illustrating an additional step before step 101 in the first embodiment of fig. 1 according to the present invention includes:

step 201, obtaining log data in the current time period from a network side;

in the embodiment of the present invention, the mining system obtains log data in the current time period from the network side, specifically: the mining system may acquire the log data in the current time period from the network side by means of extraction of the log data, or may acquire the log data in the current time period from the network side by using a web crawler technology, or may acquire the log data in the current time period from a BOSS accounting database of the network side, or may receive the log data in the current time period provided by a third party vendor of the network side, or may acquire the log data in the current time period by combining at least two of the above-mentioned manners.

Step 202, performing aggregation processing on the log data in the current time period to obtain a first log data set in the current time period.

In the embodiment of the invention, after acquiring the log data in the current time period, the mining system performs aggregation processing on the log data in the current time period to obtain a first log data set in the current time period.

In step 202, the aggregation may be performed by classifying the log data according to its content and merging log data with the same content, or the same class of content, into a single record with an accumulated count. The first log data set obtained after aggregation is orders of magnitude smaller than the raw log data acquired in the current time period, while the meaning of the data is fully preserved.
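A minimal sketch of this per-period aggregation is shown below, assuming each raw log record is a `(user, content, location)` tuple; that layout and the function name `aggregate_period` are illustrative assumptions. Python's `collections.Counter` merges identical records and keeps a count for each.

```python
from collections import Counter

def aggregate_period(raw_logs):
    """Merge identical log records into one entry with an accumulated count."""
    return Counter(raw_logs)

raw_logs = [
    ("u1", "music", "cell-A"),
    ("u1", "music", "cell-A"),
    ("u2", "video", "cell-B"),
    ("u1", "music", "cell-A"),
]
first_set = aggregate_period(raw_logs)
print(len(raw_logs), "raw records ->", len(first_set), "aggregated records")
print(first_set)
```

The aggregated set holds fewer entries than the raw log, yet the number of times each record occurred can still be read from its count.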

In the embodiment of the present invention, the mining system implements acquisition of the first log data set through the additional steps shown in fig. 2, and by aggregating the log data acquired from the network side in the current time period, the magnitude of the log data can be effectively reduced, so that the storage space required in the Hadoop database is reduced, and the storage space is saved.

Preferably, in the embodiment of the present invention, before performing step 202, the mining system may further perform the following step:

performing data cleaning on the log data in the current time period to obtain cleaned log data in the current time period.

If the mining system performs this cleaning step, step 202 is adapted accordingly: the cleaned log data in the current time period is aggregated to obtain the first log data set in the current time period.

The log data may be cleaned by removing log data that does not conform to the preset data type, and/or by finding recognizable errors in the log data and correcting or deleting the affected log data.

In the embodiment of the invention, the mining system can remove some useless or error log data by performing data cleaning on the log data in the current time period, reduce the number of log data processing and facilitate better data mining.
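The cleaning step might look like the following sketch. The "preset data type" is assumed here to be a tuple of three strings `(user, content, location)`, and the recognizable errors are assumed to be stray whitespace and inconsistent letter case; both are illustrative choices, not rules taken from the text.

```python
def clean_logs(raw_logs):
    """Drop malformed records and correct simple recognizable errors."""
    cleaned = []
    for record in raw_logs:
        # Remove records that do not match the assumed preset data type.
        if not (isinstance(record, tuple) and len(record) == 3
                and all(isinstance(field, str) for field in record)):
            continue
        # Correct recognizable errors: stray whitespace and letter case.
        user, content, location = (field.strip() for field in record)
        content = content.lower()
        # Delete records that are still unusable after correction.
        if not user or not content or not location:
            continue
        cleaned.append((user, content, location))
    return cleaned

raw_logs = [
    (" u1 ", "Music", "cell-A"),   # correctable
    ("u2", "video"),               # wrong shape, removed
    ("", "news", "cell-B"),        # empty user, removed
    ("u3", "news", "cell-C"),      # already clean
]
cleaned = clean_logs(raw_logs)
print(cleaned)
```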

Referring to fig. 3, a flow chart illustrating additional steps after step 103 in the first embodiment of fig. 1 according to the present invention includes:

step 301, if a data query instruction is received, reading a third log data set corresponding to a query dimension from a Hadoop database according to the query dimension contained in the data query instruction;

In the embodiment of the invention, after the mining system stores the obtained third log data sets in the Hadoop database, a user can request a data query by inputting a data query instruction. If the mining system receives the data query instruction, it reads the third log data set corresponding to the query dimension from the Hadoop database according to the query dimension contained in the data query instruction.

Preferably, the data query instruction may further include a certain time period, and the mining system reads a third log data set corresponding to the query dimension in the time period.
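A query by dimension, optionally restricted to certain time periods, can be sketched as a lookup over stored `(dimension, period)` keys. The dictionary `store` stands in for the Hadoop database, and the key layout, period labels, and function name `query` are assumptions made for this example.

```python
def query(store, dimension, periods=None):
    """Read and merge the third log data sets for one query dimension.

    If `periods` is given, only data sets for those periods are read.
    """
    result = {}
    for (dim, period), data in store.items():
        if dim != dimension:
            continue
        if periods is not None and period not in periods:
            continue
        for key, count in data.items():
            result[key] = result.get(key, 0) + count
    return result

store = {
    ("content", "10:00"): {("u1", "music"): 5},
    ("content", "11:00"): {("u1", "music"): 2, ("u2", "video"): 1},
    ("location", "10:00"): {("u1", "cell-A"): 5},
}
print(query(store, "content"))
print(query(store, "content", periods={"11:00"}))
```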

Step 302, performing data analysis on the third log data set, and displaying the result of the data analysis on a display interface.

In the embodiment of the present invention, the mining system further performs data analysis on the third log data set, and displays a result of the data analysis on a display interface, specifically: the mining system carries out user grouping on the users in the third log data set according to a preset clustering algorithm to obtain a user grouping list; obtaining a level configuration table corresponding to at least two user dimensions according to the log data of the users in the user grouping list, and displaying the level configuration table on a display interface; the user dimension is preset, and the level configuration table comprises the level determined by the user in the user grouping list according to the user dimension.

The user dimensions can be divided into a horizontal dimension and a vertical dimension, and the users are rated in each of them. For example, suppose the user groups obtained by the mining system include an all-users group and a microblog user group. For the all-users group, all users in the group are ranked according to the traffic they have used: the top 20% are five-star users, those ranked between 20% and 40% are four-star users, and so on, until the star level of every user in the all-users group is determined. This is the horizontal dimension rating. For the microblog user group, the users in the group are ranked according to the traffic generated after they open the microblog: the top 20% are five-star users, those ranked between 20% and 40% are four-star users, and so on, until the star level of every user in the microblog user group is determined. This is the vertical dimension rating. With the horizontal and vertical dimension ratings, a portrait of each user group can be displayed, and a service expert can devise a targeted scheme for the portrait of a specific group.
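The star rating in this example can be sketched as a ranking into five equal bands. The function name `star_ratings`, the sample traffic figures, and the layout of the resulting level configuration table are assumptions made for illustration; ties are broken by input order here, which the text does not address.

```python
def star_ratings(traffic_by_user):
    """Rank users by traffic and map each fifth of the ranking to a star level."""
    ranked = sorted(traffic_by_user, key=traffic_by_user.get, reverse=True)
    n = len(ranked)
    ratings = {}
    for position, user in enumerate(ranked):
        # position * 5 // n is 0 for the top fifth, 1 for the next fifth, ...
        ratings[user] = 5 - (position * 5 // n)
    return ratings

# Horizontal rating: every user, ranked by total traffic used.
all_traffic = {"u1": 900, "u2": 700, "u3": 500, "u4": 300, "u5": 100}
horizontal = star_ratings(all_traffic)

# Vertical rating: only the microblog group, ranked by microblog traffic.
microblog_traffic = {"u2": 50, "u5": 400}
vertical = star_ratings(microblog_traffic)

# A level configuration table covering the two user dimensions.
level_table = {
    user: {"horizontal": horizontal[user], "vertical": vertical.get(user)}
    for user in all_traffic
}
print(level_table)
```

A user such as `u5` can rank low among all users yet high inside the microblog group, which is the kind of contrast the two ratings are meant to expose.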

Preferably, the preset clustering algorithm may be a K-means algorithm.
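For readers unfamiliar with K-means, the following is a minimal pure-Python sketch of the algorithm applied to user grouping. Each user is represented by a hypothetical feature vector `(total traffic, microblog traffic)`; the features, the fixed iteration count, and the naive choice of the first `k` points as initial centroids are simplifications for this example and are not prescribed by the text.

```python
import math

def kmeans(points, k, iterations=20):
    """Group points into k clusters; returns (assignment, centroids)."""
    centroids = [points[i] for i in range(k)]  # naive init: first k points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centroids

users = ["u1", "u2", "u3", "u4", "u5"]
features = [(1.0, 0.5), (9.0, 8.0), (1.2, 0.4), (8.5, 9.0), (0.8, 0.6)]
assignment, centroids = kmeans(features, k=2)

# Build the user grouping list from the cluster assignment.
groups = {}
for user, cluster in zip(users, assignment):
    groups.setdefault(cluster, []).append(user)
print(groups)
```

Here the light users and the heavy users end up in separate groups; a production system would more likely rely on a library implementation with a better initialization strategy.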

The query dimension is set based on a dimension corresponding to a third log data set stored in the Hadoop database, for example: the query dimension can be any one or more of internet surfing content, internet surfing time, internet surfing position and the like.

In the embodiment of the invention, the mining system reads the third log data set corresponding to the query dimension from the Hadoop database according to the query dimension contained in the data query instruction, performs data analysis on the third log data set, and displays the result of the data analysis on the display interface, so that the result of the data mining can be effectively displayed to a user.

It should be noted that, in the embodiment of the present invention, the method for mining log data based on the Hadoop database may be applied to a precise marketing system of traffic data, for example, mining of a target user, mining of a marketing site, and the like may be implemented by using the technical solutions described in the embodiments shown in fig. 1 to fig. 3, so as to provide a data basis for targeted and refined marketing of the target user or a target base station cell by an operator.

If the target user needs to be determined, in step 301 in the embodiment shown in fig. 3, the query dimension may be internet content or internet traffic, and if the target base station cell needs to be determined, the query dimension may be an internet location.

In practical applications, the user may select the query dimension according to specific needs, which is not limited herein.

Referring to fig. 4, a schematic diagram of functional modules of a Hadoop-based log data mining system according to a second embodiment of the present invention includes:

the first saving module 401 is configured to save the acquired first log data set in the current time period to a Hadoop database;

Preferably, the parallel operation model is the MapReduce operation model.

A parallel aggregation module 402, configured to perform parallel aggregation processing on a first log data set in the Hadoop database by using a preset parallel operation model if the number of the first log data sets stored in the Hadoop database meets a preset numerical value, so as to obtain a second log data set;

It is understood that, building on the above aggregation process, the parallel aggregation module 402 can also obtain log data sets for different time spans in a similar manner. For example, a one-hour log data set can be obtained from 4 first log data sets with 15-minute time periods, a one-day log data set from 24 one-hour log data sets, a one-month log data set from 30 one-day log data sets, and so on, so that log data sets covering different time spans are available to meet different requirements.

The division and storage module 403 is configured to perform dimension division on the log data in the second log data set according to the dimensions of the log data in the second log data set, and store the obtained third log data sets corresponding to different dimensions into the Hadoop database.

In this embodiment of the present invention, a first saving module 401 saves an acquired first log data set in a current time period to a Hadoop database, if the number of the first log data sets saved in the Hadoop database satisfies a preset numerical value, a parallel aggregation module 402 performs parallel aggregation processing on the first log data set in the Hadoop database by using a preset parallel operation model to obtain a second log data set, and finally a division saving module 403 performs dimension division on the log data in the second log data set according to the dimension of the log data in the second log data set, and saves an acquired third log data set corresponding to different dimensions to the Hadoop database.

In the embodiment of the invention, the mining system stores the acquired first log data set of the current time period into the Hadoop database. If the number of first log data sets stored in the Hadoop database meets a preset numerical value, the mining system performs parallel aggregation processing on the first log data sets in the Hadoop database by using the parallel operation model in Hadoop to obtain a second log data set, performs dimension division on the log data in the second log data set according to the dimensions of that log data, and stores the third log data sets corresponding to different dimensions into the Hadoop database, thereby completing the mining of the log data. Because the Hadoop framework has good distributed storage and parallel operation capabilities, storing the log data in a distributed manner in the Hadoop database and processing it in parallel with the parallel operation model allows massive data to be mined quickly and effectively, and meets the storage and operation requirements of mining massive data.

Please refer to fig. 5, which is a schematic diagram of additional functional modules in the second embodiment shown in fig. 4, including:

an obtaining module 501, configured to obtain log data in a current time period from a network side;

in this embodiment of the present invention, the obtaining module 501 obtains log data in the current time period from a network side, specifically: the obtaining module 501 may obtain log data in the current time period from the network side by extracting the log data, or may obtain the log data in the current time period from the network side by using a web crawler technology, or may obtain the log data in the current time period from a BOSS accounting database of the network side, or may receive the log data in the current time period provided by a third party vendor of the network side, or may obtain the log data in the current time period by combining at least two of the above manners.

A first aggregation module 502, configured to perform aggregation processing on the log data in the current time period to obtain a first log data set in the current time period.

The first aggregation module 502 may classify the log data according to its content and merge log data with the same content, or the same class of content, into a single record with an accumulated count. The first log data set obtained after aggregation is orders of magnitude smaller than the raw log data acquired in the current time period, while the meaning of the data is fully preserved.

In the embodiment of the present invention, the mining system will not start executing the first saving module 401 in the embodiment shown in fig. 4 until the first aggregation module 502 is executed.

In an embodiment of the present invention, the system further comprises a cleaning module 503;

the cleaning module 503 is configured to perform data cleaning on the log data in the current time period after the obtaining module 501 obtains the log data in the current time period, so as to obtain the cleaned log data in the current time period;

and if the mining system executes the cleaning module 503, the first aggregation module 502 is specifically configured to perform aggregation processing on the cleaned log data in the current time period to obtain a first log data set in the current time period.
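As a rough illustration of how the optional cleaning module could feed the first aggregation module, the following sketch drops records that lack a required field before aggregating. The cleaning rule, the field names and the function names are all assumptions made for illustration; the specification does not define what counts as useless or erroneous log data.

```python
from collections import Counter

REQUIRED_FIELDS = ("user_id", "url")  # hypothetical cleaning criterion

def clean_logs(records):
    """Keep only records whose required fields are present and non-empty."""
    return [r for r in records if all(r.get(f) for f in REQUIRED_FIELDS)]

def aggregate_logs(records):
    """Merge records with the same (user_id, url) into one record with a count."""
    counts = Counter((r.get("user_id"), r.get("url")) for r in records)
    return [{"user_id": u, "url": url, "count": n} for (u, url), n in counts.items()]

def build_first_log_data_set(records, run_cleaning=True):
    """Aggregate the cleaned data if cleaning runs, otherwise the raw data."""
    if run_cleaning:
        records = clean_logs(records)
    return aggregate_logs(records)

# Hypothetical raw log data containing two defective records.
raw = [
    {"user_id": "u1", "url": "/home"},
    {"user_id": "u1", "url": "/home"},
    {"user_id": "", "url": "/home"},   # empty user id
    {"url": "/shop"},                  # missing user id
]
cleaned_set = build_first_log_data_set(raw)
uncleaned_set = build_first_log_data_set(raw, run_cleaning=False)
```

With cleaning enabled, only the valid records reach aggregation, so fewer records need to be processed and stored.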

In the embodiment of the present invention, the mining system acquires the first log data set through the additional steps shown in fig. 2. Aggregating the log data acquired from the network side in the current time period effectively reduces the magnitude of the log data, which in turn reduces the storage space required in the Hadoop database. In addition, by performing data cleaning on the log data in the current time period, the mining system can remove useless or erroneous log data, which reduces the amount of log data to be processed and facilitates data mining.

Please refer to fig. 6, which is a schematic diagram of additional functional modules of the second embodiment shown in fig. 4, including:

a reading module 601, configured to, if a data query instruction is received, read a third log data set corresponding to a query dimension from the Hadoop database according to the query dimension included in the data query instruction;

an analysis module 602, configured to perform data analysis on the third log data set, and display a result of the data analysis on a display interface.

Wherein the analysis module 602 comprises:

a clustering module 603, configured to perform user grouping on users in the third log data set according to a preset clustering algorithm, so as to obtain a user grouping list;

an obtaining and displaying module 604, configured to obtain level configuration tables corresponding to at least two user dimensions according to the log data of the users in the user grouping list, and display the level configuration tables on a display interface; the user dimensions are preset, and each level configuration table comprises the levels assigned to the users in the user grouping list by grading them according to the corresponding user dimension.

Preferably, the preset clustering algorithm may be a K-means algorithm.
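The user grouping and level grading can be sketched as follows. This is a plain K-means written for illustration, not the implementation of the clustering module 603. The user dimensions (`traffic_mb`, `spend`), the grading thresholds in `LEVEL_BOUNDS` and the choice of the first k points as initial centroids are assumptions; the specification does not state how levels are graded.

```python
DIMENSIONS = ("traffic_mb", "spend")          # hypothetical user dimensions
LEVEL_BOUNDS = {"traffic_mb": (200.0, 800.0),  # hypothetical grading thresholds
                "spend": (20.0, 60.0)}

def kmeans(points, k, iterations=20):
    """Plain K-means; the initial centroids are simply the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assignment = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment

def grade(value, bounds):
    """Map a value to level 1, 2 or 3 using two thresholds."""
    low, high = bounds
    return 1 if value < low else 2 if value < high else 3

def level_configuration_table(users, assignment):
    """Build one row per user: its group and its level on each user dimension."""
    table = []
    for (user_id, values), group in zip(users.items(), assignment):
        row = {"user": user_id, "group": group}
        for dim, value in zip(DIMENSIONS, values):
            row[dim + "_level"] = grade(value, LEVEL_BOUNDS[dim])
        table.append(row)
    return table

# Hypothetical per-user values read from a third log data set.
users = {"u1": (100.0, 10.0), "u2": (120.0, 12.0),
         "u3": (900.0, 80.0), "u4": (950.0, 85.0)}
groups = kmeans(list(users.values()), k=2)
table = level_configuration_table(users, groups)
```

Here the grouping list is the per-user cluster index, and each row of the table records the level a user reaches on each of the two assumed dimensions.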

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. A Hadoop-based log data mining method, characterized by comprising:

Save the acquired first log data set in the current time period to the Hadoop database;

If the number of first log data sets saved in the Hadoop database satisfies a preset value, use a preset parallel computing model to perform parallel aggregation processing on the first log data sets in the Hadoop database to obtain a second log data set;

Dimensionally divide the log data in the second log data set according to the dimensions of the log data in the second log data set, and save the obtained third log data sets corresponding to different dimensions in the Hadoop database;

If a data query instruction is received, read a third log data set corresponding to the query dimension from the Hadoop database according to the query dimension included in the data query instruction;

Perform user grouping on the users in the third log data set according to a preset clustering algorithm to obtain a user grouping list;

Obtain level configuration tables corresponding to at least two user dimensions according to the log data of the users in the user grouping list, and display the level configuration tables on a display interface; the user dimensions are preset, and each level configuration table comprises the levels determined by grading the users in the user grouping list according to the corresponding user dimension.

2. The method according to claim 1, wherein the method further comprises:

Obtain log data in the current time period from the network side;

The log data in the current time period is aggregated to obtain a first log data set in the current time period.

3. The method according to claim 2, wherein after the step of acquiring the log data in the current time period from the network side, the step further comprises:

Perform data cleaning on the log data in the current time period to obtain the cleaned log data in the current time period;

Then, the step of performing aggregation processing on the log data in the current time period to obtain the first log data set in the current time period comprises:

Perform aggregation processing on the cleaned log data in the current time period to obtain a first log data set in the current time period.

4. A Hadoop-based log data mining system, characterized by comprising:

The first saving module is used to save the acquired first log data set in the current time period into the Hadoop database;

A parallel aggregation module, configured to, if the number of first log data sets saved in the Hadoop database satisfies a preset value, use a preset parallel computing model to perform parallel aggregation processing on the first log data sets in the Hadoop database to obtain a second log data set;

The dividing and saving module is configured to perform dimension division on the log data in the second log data set according to the dimensions of the log data in the second log data set, and save the obtained third log data sets corresponding to different dimensions in the Hadoop database;

a reading module, configured to read a third log data set corresponding to the query dimension from the Hadoop database according to the query dimension included in the data query instruction if a data query instruction is received;

a clustering module, configured to perform user grouping on the users in the third log data set according to a preset clustering algorithm to obtain a user grouping list;

The acquisition and display module is used to obtain level configuration tables corresponding to at least two user dimensions according to the log data of the users in the user grouping list, and display the level configuration tables on a display interface; the user dimensions are preset, and each level configuration table comprises the levels determined by grading the users in the user grouping list according to the corresponding user dimension.

5. The system of claim 4, wherein the system further comprises:

The acquisition module is used to acquire log data in the current time period from the network side;

The first aggregation module is configured to perform aggregation processing on the log data in the current time period to obtain a first log data set in the current time period.

6. The system of claim 5, further comprising a cleaning module;

The cleaning module is configured to perform data cleaning on the log data in the current time period after the acquisition module acquires the log data in the current time period, to obtain the cleaned log data in the current time period;