CN116703467A - User data monitoring method, system, electronic device and readable storage medium - Google Patents
User data monitoring method, system, electronic device and readable storage medium Download PDFInfo
- Publication number
- CN116703467A CN116703467A CN202310814204.8A CN202310814204A CN116703467A CN 116703467 A CN116703467 A CN 116703467A CN 202310814204 A CN202310814204 A CN 202310814204A CN 116703467 A CN116703467 A CN 116703467A
- Authority
- CN
- China
- Prior art keywords
- data
- user
- user data
- information system
- streaming
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
- G06Q30/0202—Market predictions or forecasting for commercial activities
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Business, Economics & Management (AREA)
- Strategic Management (AREA)
- Data Mining & Analysis (AREA)
- Finance (AREA)
- Development Economics (AREA)
- Accounting & Taxation (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Biophysics (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Entrepreneurship & Innovation (AREA)
- Biomedical Technology (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Marketing (AREA)
- Game Theory and Decision Science (AREA)
- Economics (AREA)
- General Business, Economics & Management (AREA)
- Debugging And Monitoring (AREA)
Abstract
The application relates to the technical field of computers, the financial field and the digital medical field, and discloses a user data monitoring method, a system, electronic equipment and a readable storage medium, wherein the method comprises the following steps: the data information system receives and stores user data generated by the operation of a user in the service system in real time; the streaming data processing system obtains user data from the data information system based on a streaming processing mechanism, preprocesses the user data, stores the obtained processed data in the data information system, and gathers user activity indexes in real time according to the processed data, wherein the user activity indexes at least comprise: user liveness; the business data prediction system predicts at least one user activity index in a future preset time length based on the business data prediction model and according to the processed data. The application realizes the real-time summarization of the user activity indexes and the prediction of future user data.
Description
Technical Field
The application relates to the technical field of computers, the financial field and the digital medical field, in particular to a user data monitoring method, a system, electronic equipment and a readable storage medium.
Background
With the rapid development of internet services such as electronic commerce and online medical treatment, online user groups are expanding continuously, and in order to improve the use experience of the whole front-end application product and know the use intention of a user, information collection is often required to be carried out on a client user so as to statistically analyze relevant indexes such as user activity, user retention, user loss and the like.
For example, in many hospitals at present, an online consultation system is built or used, patient information needs to be collected and summarized for better patient service, and information such as patient liveness is further counted to know the use condition of the online consultation system.
The existing user information collection and analysis system has weaker support for window calculation, which is not beneficial to the expansion of statistical data; and the function is single; in a scene with secondary aggregation, a large amount of data operation exists, so that the query speed is slower; the problem of poor experience of monitoring a large disc under multiple platforms and multiple channels is solved.
Disclosure of Invention
In view of the foregoing, embodiments of the present application provide a method, a system, an electronic device, and a readable storage medium for monitoring user data, so as to overcome or at least partially overcome the disadvantages of the prior art.
In a first aspect, an embodiment of the present application provides a user data monitoring method, where the method is implemented by a user data monitoring platform, where the user data monitoring platform includes: the system comprises a data information system, a stream data processing system and a service data prediction system, wherein the data information system is respectively connected with the stream data processing system and the service data prediction system; the user data monitoring platform can be connected with at least one service system;
the data information system receives and stores user data generated by the operation of a user in the service system in real time;
the streaming data processing system obtains the user data from the data information system based on a streaming processing mechanism, preprocesses the user data, stores the obtained processed data in the data information system, and gathers user activity indexes in real time according to the processed data, wherein the user activity indexes at least comprise: user liveness;
the business data prediction system predicts at least one user activity index in a future preset time period based on a business data prediction model and according to the processed data, wherein the business data prediction model is obtained by training at least according to the processed data in the preset history time period obtained from the data information system.
In a second aspect, an embodiment of the present application further provides a user data monitoring platform, where the user data monitoring platform includes: the system comprises a data information system, a stream data processing system and a service data prediction system, wherein the data information system is respectively connected with the stream data processing system and the service data prediction system; the user data monitoring platform can be connected with at least one service system;
the data information system is used for receiving and storing user data generated by the operation of a user in the service system in real time and receiving and storing processed data returned by the stream data processing system;
the streaming data processing system obtains the user data from the data information system based on a streaming processing mechanism, preprocesses the user data, stores the obtained processed data in the data information system, and gathers and displays user activity indexes in real time according to the processed data, wherein the user activity indexes at least comprise: user liveness;
the business data prediction system predicts the user activity index in the future preset time length based on a business data prediction model and according to the processed data, wherein the business data prediction model is obtained by training the business data prediction system by acquiring the processed data in the preset history time length from the data information system.
In a third aspect, an embodiment of the present application further provides an electronic device, including: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the user data monitoring method described above.
In a fourth aspect, embodiments of the present application also provide a computer-readable storage medium storing one or more programs that, when executed by an electronic device comprising a plurality of application programs, cause the electronic device to perform the user data monitoring method described above.
The above at least one technical scheme adopted by the embodiment of the application can achieve the following beneficial effects:
the application designs a user data monitoring platform, which comprises: the system comprises a data information system, a stream data processing system and a service data prediction system, wherein the stream data processing system and the service data prediction system are respectively connected with the data information system; the user data monitoring platform can be externally connected with at least one service system. When a user operates in a service system, user data are generated, the user data are collected and then sent to a data information system for storage, a streaming data processing system acquires the user data from the data information system based on a streaming processing mechanism and performs preprocessing and integration, and on one hand, various activity indexes of the user are monitored and early-warned in real time through the integrated data; on the other hand, the integrated data is returned to the data information system for consumption by the service data prediction system, a service data prediction model is preset in the service data prediction system, and at least a part of training data is processed data which is stored in the data information system and is obtained by processing user data by a streaming data processing system in a historical period of time when the model is trained; the business data prediction system can predict business data in a period of time in the future of the business system based on the business data prediction model, such as the number of users, the activity of the users and the like, and can alarm and early warn some risk items. The user data monitoring method can realize real-time summarization of the user activity indexes and prediction of future user data at the same time; the data processing system based on the stream processing mechanism remarkably improves the query efficiency of the user activity index and the data instantaneity, and adapts to different service demands and statistics of various aggregation types; after the intelligent model is trained, the online alarm prediction is more accurate; the feature engineering enables the streaming data processing system to execute, so that a set of calculation engines of the streaming data processing system support model training and model prediction, the deployment is simpler, the feature processing link on user data during model training is avoided, and resources and calculation force are greatly saved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 shows a schematic diagram of a user data monitoring platform according to one embodiment of the application;
FIG. 2 shows a flow diagram of a user data monitoring method according to one embodiment of the application;
FIG. 3 depicts a schematic diagram of a Flink framed streaming data processing system in accordance with an embodiment of the application;
FIG. 4 illustrates a flow diagram of a flow chart summarizing user activity metrics by a Flink framework-based streaming data processing system 120 based on a streaming mechanism in accordance with one embodiment of the present application;
FIG. 5-a illustrates a large disk presentation effect diagram of user data monitoring according to one embodiment of the application;
FIG. 5-b illustrates a large disk presentation effect diagram of user data monitoring according to another embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to specific embodiments of the present application and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The following describes in detail the technical solutions provided by the embodiments of the present application with reference to the accompanying drawings.
In the financial field, with the rapid development of electronic commerce, online user groups are expanding continuously, so as to improve the use experience of the whole front-end application product, know the use intention of the user, and often need to collect information of the client user, so as to statistically analyze relevant indexes such as user activity, user retention, user loss and the like.
For example, the financial software system can be an insurance system, a banking system, a transaction system, an order system and the like, and the systems support functions of applying loans, credit cards or purchasing insurance, financial products and the like, so that the life of people is facilitated. When users use the system, traces are generated, and in order to better serve the users, statistics and monitoring on some user activity indexes are needed. In the digital medical field, for example, many hospitals build or use an online consultation system, in order to better serve patients, patient information needs to be collected and summarized, information indexes such as patient liveness and the like are further counted, and the use condition of the online consultation system and the business and the asking money of the hospitals are known.
The on-line inquiry system can be a front-end application APP, an applet and the like constructed by the medical party B, and can also be an on-line inquiry function constructed by a third party system. Patient users can trace when using these systems, and some patient activity metrics need to be counted and monitored in order to better serve the patient.
The existing user information collection and analysis system has weaker support for window calculation, which is not beneficial to the expansion of statistical data; and the function is single; in a scene with secondary aggregation, a large amount of data operation exists, so that the query speed is slower; the problem of poor experience of monitoring a large disc under multiple platforms and multiple channels is solved.
In this regard, the present application provides a method for monitoring user data, which may be implemented by a user data monitoring platform. Fig. 1 shows a schematic structure of a user data monitoring platform according to an embodiment of the present application, and as can be seen from fig. 1, the user data monitoring platform 100 includes a data information system 110, a stream data processing system 120 and a service data prediction system 130, where the stream data processing system 120 is communicatively connected to the data information system 110, the service data prediction system 130 is communicatively connected to the data information system 110, and the data information system 110 is externally connectable to one or more service systems 200, which can be understood as the aforementioned financial software system, and can be presented in various forms such as clients, web pages, applets, and the like. And the streaming data processing system 120 can be externally connected with the external columnar data management system 300 to store the real-time summary result.
In some embodiments of the present application, the data information system may be a data storage system with a read-write function constructed based on any one of Kakfa, database, elastomer search, and HDFS, and hereinafter, for illustration, kakfa is used as an example.
The streaming data processing system may be any data processing system based on a streaming mechanism, in some embodiments of the present application, the streaming data processing system is built based on a flank framework, and flank is a framework for streaming processing and batch processing, may be implemented based on a streaming engine, and is used for performing stateful computation on borderless and borderless data streams, may support stand-alone operation, may also support big data yarn clusters, notify exact Once and Checkpoint that supports stateful computation, is a mechanism for saving a current state during operation so as to be capable of returning to a previous state when a system crashes or fails, and may also perform storage of savepoints (is a mechanism in a database management system, allowing a flag to be created during transaction execution so as to be capable of returning to the flag when a transaction fails or rolls back) so as to restore tasks, may support multiple window computation aggregation, and may also be driven by a trigger event that comes when a flank.
In the prior art, in a user data supervision system, a store is often adopted to perform data processing, compared with the store, the throughput of the Flink is about 3-5 times of that of the store, under the same operation as a real-time computing framework, the store is used for ensuring that the consumption of an ACK mechanism is large, and the Flink realizes exact on semantics and only increases alignment operation, so that the effect on the performance of the Flink is not great under the condition that the operator concurrency is not great and slow nodes do not appear, and meanwhile, the Flink provides a mechanism for managing states (states), wherein the states refer to intermediate results and intermediate states which need to be stored and managed in the stream processing process, and the states define a storage mode and a management mode of the states in the Flink) can be stored in the states.
The business data prediction system may be any deep learning computing architecture in the prior art, and in some embodiments of the present application, the business data prediction system is constructed based on a TensorFlow.
TensorFlow is a deep learning computing framework, an open source software library, used to build and train machine learning models. It allows developers to build complex neural network models using dataflow graphs, and can perform distributed training on multiple CPUs and GPUs.
Preferably, in some embodiments, the streaming data processing system is built based on a Flink framework and the traffic data prediction system is built based on a TensorFlow. Advantage of the flank for data processing: the Flink can divide data into different batches, then each batch is calculated and processed, the processing can realize efficient calculation, meanwhile, the Flink is used as a distributed computing frame, the processing speed of the data can be remarkably improved, the Flink has quicker response time, finally, the Flink has efficient data fault tolerance, training data can be saved and not lost, and meanwhile, conversion of various data can be carried out, so that the Flink can be used for training of Tensorflow and can be used for other various systems, and is very suitable for supporting machine learning tasks of Tensorflow.
Fig. 2 shows a flow chart of a user data monitoring method according to an embodiment of the present application, and as can be seen from fig. 2, the user data monitoring method of the present application at least includes steps S210 to S230:
step S210: the data information system receives and stores user data generated by a user operating in the service system in real time.
The user operating on the business system, such as logging in, browsing, etc., generates data, which in the present application is recorded as user data, which is typically presented in the form of a log, and in the present application, the business system 200 is connected to the data information system 110, and the data is transmitted to the data information system 110 for storage.
In some embodiments of the present application, to improve the data collection efficiency, an intermediate component responsible for log collection, such as a collector, may be disposed between the data information system 110 and the service system 200, and is responsible for collecting log information, and sending the data to the topic of Kafka.
Step S120: the streaming data processing system obtains the user data from the data information system based on a streaming processing mechanism, preprocesses the user data, stores the obtained processed data in the data information system, and gathers user activity indexes in real time according to the processed data, wherein the user activity indexes at least comprise: user liveness.
In monitoring the user data, the streaming data processing system 120 reads the user data from the data information system 110, specifically, for example, the user data is read from the corresponding topic of Kafka, which is the unprocessed source data.
The processing of data by streaming data processing system 120 is based on a streaming mechanism, which is a way to process data that processes and analyzes an unlimited amount of data in real-time in data stream basis. Unlike batch processing, streaming mechanisms can process continuous data streams without waiting for the data set to arrive completely. The streaming mechanism can rapidly respond to data change, and the real-time performance and accuracy of data analysis are improved.
After the streaming data processing system 120 acquires the user data, the user data is processed in terms of feature engineering, and the processed data is referred to as processed data in the present application. For processed data, the streaming data processing system 120 returns it to the data information system 110 for consumption by the business data prediction system 130.
In another aspect, the streaming data processing system 120 may summarize at least one user activity index in real time and display a large disk report according to the processed data. User activity metrics include, but are not limited to, user activity, user retention, user churn, and like related metrics. The user activity is usually the index of most concern of enterprises, and the method supports the summarization and display of the user activity index in various dimensions.
First, to brief description of a process of processing data by the flank-framed streaming data processing system 120, referring to fig. 3, fig. 3 is a schematic diagram illustrating a structure of the flank-framed streaming data processing system according to an embodiment of the present application, and on the structure shown in fig. 3, a flow of processing data by the streaming data processing system may be briefly described as follows:
1. when a user submits a job, the submission script will first initiate a Client process responsible for compiling and submitting the job.
2. When the Dispatcher receives a job provided by a user and is responsible for pulling up a new JobManager component for the newly submitted job, jobManager is responsible for managing execution of the job, while it compiles code written by the user into a JobGraph and submits the JobGraph to a cluster, there may be multiple jobs executing simultaneously in a Flink cluster, each job has its own JobManager component, while requesting corresponding resources from the ResourceManager, and there is only one ResourceManager in the entire Flink cluster.
3. The task manager is a truly live component, and although he calls a task manager, it does not manage tasks, it is a task executor, and according to the needs of a specific task, starts up a corresponding task manager, and also manages slots (slots) therein. Slot is the smallest resource unit in a Flink cluster.
4. When a resource manager requests a corresponding Slot resource, a Task is started, one Slot is not only capable of executing one Task, only overlapping tasks can be in one Slot, after the Task is finished, whether the Task is normally finished or abnormally finished, the corresponding finishing state of the JobManager is notified, and then the Slot is marked as the occupied state but not executing the Task at the end of the Task manager.
5. In addition to normal communication logic, there is a timed heartbeat message between the resource manager and the TaskManager to synchronize the state of the Slot, and at the same time, the corresponding File System records the stateBackend storage of each node.
Because there may be multiple service systems, and possibly different platforms and different channels, in many scenarios, the data information system includes user data that user data is multiple data sources, and in some embodiments of the present application, based on the above data processing flow, the stream processing mechanism obtains the user data from the data information system, performs preprocessing on the user data, and stores the obtained processed data in the data information system, where the method includes: based on a stream processing mechanism, converting user data of each data source into a plaintext form based on an anti-sequence speech mode; merging user data of multiple data sources in a plaintext form based on a Map mode to obtain a user data stream; grouping the user data streams according to a plurality of different dimensions by setting conditions by means of the key by to obtain a plurality of user data sets; preprocessing each user data group on each user activity index to obtain processed data, wherein the preprocessing comprises the following steps: at least one of data cleansing, format conversion, missing value completion, and feature extraction.
Referring to fig. 4, fig. 4 is a flow chart illustrating a flow chart of summarizing user activity indexes by the Stream data processing system 120 based on the link framework according to an embodiment of the present application, specifically, the whole calculation process adopts Stream data flow, which is a sequence of consecutive events, and the data analysis and processing are implemented by processing and converting the events. Specifically, user data is acquired from a plurality of data sources, and the operation thereof is called Source Operator; then converting the user data into plaintext data, and then adopting the middleware map shown in fig. 4 to perform data merging; and key by state grouping, also including window processing window, and aggregation, statistics, etc., all referred to as Tranformation Operators; the outflow of the final result data is called sink operators, and the data of the flow can be stored in an external data management system, such as a column-type data management system, and ES, DB, kakfa is also supported.
More specifically, in some embodiments of the present application, the converting, based on the streaming mechanism, the user data of each data source into a plaintext form based on an anti-sequence speech manner includes: setting delay time through a watermark, and filtering user data with event time stamps larger than the delay time; based on the stream processing mechanism, the retained user data is converted into a plaintext form based on an anti-sequence speech mode.
The Flink acquires corresponding message data from the corresponding kafka Topic, converts the message data into a data plaintext in an anti-sequence conversation mode, in the process, in order to prevent the influence caused by data disorder, a watermark is used for solving the disorder problem, a watermark=maximum event time (eventTime) for entering a Flink window, namely a designated delay time (t), a proper delay time is set according to the network condition, and when a watermark timestamp is greater than or equal to the window ending time, the window is ended, and window calculation is triggered.
And then, merging a plurality of sources through maps for data of different message channels, and judging whether the watermark needs to be updated according to actual conditions.
The data source operation is finished, firstly, the key by grouping is carried out, in order to classify and count users of different platforms and channels, key by conditions can be set to be client model, platform, channel, equipment identifier, APP version and the like, and statistics of user liveness is carried out in a window after grouping is finished.
In some embodiments of the application, in the above method, the streaming data processing system comprises a plurality of computing nodes; the step of summarizing the user activity indexes in real time according to the processed data comprises the following steps: determining whether the memory of the current computing node meets the data summarizing requirement or not based on a bitmap filtering mode; if the data is not satisfied, setting an initial bit array based on a bloom filter, wherein the initial bit array comprises a plurality of array elements with minimum units; for one piece of user data, according to a plurality of hash functions, placing the user data on a corresponding array element in the initial bit array; and if all the user data hit the hash value of the corresponding array element, determining that the user data is an active user.
Because the online users basically reach the millions of levels, in the application, the user data identification type can be counted by using the long type, and the statistics of the users needs to consider whether the memory of the computing node meets the existing condition, and the conventional data filtering has a Bitmap mode, and the storage space computing mode of the Bitmap is as follows: finding the largest (assumed to be N) of all elements, and the space S required by the Bitmap is: s=n/8 Byte, when N is a 64-bit number, the required space conventional equipment cannot be satisfied, so that it can be determined whether the memory of the current computing node can satisfy the data summarizing requirement, and if not, the bloom filter mode can be used for counting the user activity index.
Taking the user liveness as an example, a bloom filter is adopted to only estimate the size of the data volume, an initial bit array is set according to the size of the data volume, each array element is one bit, the data to be judged is hashed according to a plurality of hash functions, namely the hash functions are adopted, then the data to be judged is put into the corresponding array bit, whether all hits are judged, namely the corresponding hash values are hit, and if all hits are hit, the user can be determined to be a new active user in the period of time.
Referring again to FIG. 1, the streaming data processing system 120 may also be coupled to an external columnar data management system 300; the method further comprises the steps of: the streaming data processing system 120 stores the summary result obtained by the real-time summary in the external columnar data management system, so as to perform secondary aggregation according to the summary result.
The streaming data processing system 120 puts the data obtained by statistics into an external Sink (which can be understood as a data management system), for example, the column-type data management system clickHouse has the advantage of higher query performance when the user activity needs to perform secondary aggregation, and is very suitable for aggregation calculation. The method is different from the traditional relational database in that the relational database is in line type storage, the clickHouse is in column type storage, and the column type storage is to put data in the same column in the same adjacent data block, so that IO consumption can be greatly reduced when calculation type inquiry is carried out, a return result is faster, a scene of data modification is basically not existed in statistics of user activity information, and the defect of the database clickHouse can be avoided. Referring to fig. 5-a and fig. 5-b, fig. 5-a shows a large disc display effect diagram of user data monitoring according to an embodiment of the present application, and as can be seen from fig. 5-a, the present application can realize active monitoring of all-weather user data, multi-dimensional screening, and finer granularity. Referring to fig. 5-b again, fig. 5-b shows a big disc display effect diagram of user data monitoring according to another embodiment of the present application, and as can be seen from fig. 5-b, the present application can implement monitoring of user activity of a specific feature service.
Step S130: the business data prediction system predicts at least one user activity index in a future preset time period based on a business data prediction model and according to the processed data, wherein the business data prediction model is obtained by training at least according to the processed data in the preset history time period obtained from the data information system.
As described above, the present application not only can realize real-time summarization and presentation of the user activity index, but also can realize prediction of future data, wherein the prediction of the future data is realized through the service data prediction system 130, and the service data prediction model is set in the service data prediction system 130.
As described above, the streaming data processing system 120 performs operations such as batch processing, cleaning, and the like on data, and then stores the data in the data information system 110, and the service data prediction system 130 pulls relevant data to the data information system 110, and by using these data, training of a service data prediction model and prediction of future user data can be achieved, and specifically, the streaming data processing system 120 can perform data cleaning, format conversion, missing value processing, feature extraction, and filtering of dirty data can also be performed.
The business data prediction system 130 trains, optimizes and predicts a business data prediction model based on input data, and then outputs corresponding results, and the business data prediction system 130 can use various algorithms and model structures to implement various machine learning tasks.
The results finally output by the business data prediction system 130 may also be stored in an external data storage system, such as kafka, HDFS, etc.
Furthermore, in some embodiments of the application, to maintain the prediction accuracy of the model, the method further comprises: the service data prediction system acquires processed data in a preset historical time length taking the current time as a starting point from the data information system according to a preset period; and carrying out distributed training on the service data prediction model at a plurality of training nodes based on the acquired processed data and updating the service data prediction model.
The processed data, which is newly generated for one month, is acquired from the data information system 110, and then the business data prediction model is trained and updated, for example, according to a period of 1 month. And the business data prediction system 130 of the present application supports distributed training.
It should be noted that, the streaming data processing system 120 and the service data prediction system 130 are independent from each other, and the processes of data processing may be synchronous or asynchronous, which is not limited by the present application.
As can be seen from the method shown in fig. 1, the present application designs a user data monitoring platform, which includes: the system comprises a data information system, a stream data processing system and a service data prediction system, wherein the stream data processing system and the service data prediction system are respectively connected with the data information system; the user data monitoring platform can be externally connected with at least one service system. When a user operates in a service system, user data are generated, the user data are collected and then sent to a data information system for storage, a streaming data processing system acquires the user data from the data information system based on a streaming processing mechanism and performs preprocessing and integration, and on one hand, various activity indexes of the user are monitored and early-warned in real time through the integrated data; on the other hand, the integrated data is returned to the data information system for consumption by the service data prediction system, a service data prediction model is preset in the service data prediction system, and at least a part of training data is processed data which is stored in the data information system and is obtained by processing user data by a streaming data processing system in a historical period of time when the model is trained; the business data prediction system can predict business data in a period of time in the future of the business system based on the business data prediction model, such as the number of users, the activity of the users and the like, and can alarm and early warn some risk items. The user data monitoring method can simultaneously realize the real-time summarization of the user activity indexes and the prediction of future user data; the data processing system based on the stream processing mechanism remarkably improves the query efficiency of the user activity index and the data instantaneity, and adapts to different service demands and statistics of various aggregation types; after the intelligent model is trained, the online alarm prediction is more accurate; the feature engineering enables the streaming data processing system to execute, so that a set of calculation engines of the streaming data processing system support model training and model prediction, the deployment is simpler, the feature processing link on user data during model training is avoided, and resources and calculation force are greatly saved.
Fig. 6 is a schematic structural view of an electronic device according to an embodiment of the present application. Referring to fig. 6, at the hardware level, the electronic device includes a processor, and optionally an internal bus, a network interface, and a memory. The Memory may include a Memory, such as a Random-Access Memory (RAM), and may further include a non-volatile Memory (non-volatile Memory), such as at least 1 disk Memory. Of course, the electronic device may also include hardware required for other services.
The processor, network interface, and memory may be interconnected by an internal bus, which may be an ISA (Industry Standard Architecture ) bus, a PCI (Peripheral Component Interconnect, peripheral component interconnect standard) bus, or EISA (Extended Industry Standard Architecture ) bus, among others. The buses may be classified as address buses, data buses, control buses, etc. For ease of illustration, only one bi-directional arrow is shown in FIG. 6, but not only one bus or type of bus.
And the memory is used for storing programs. In particular, the program may include program code including computer-operating instructions. The memory may include memory and non-volatile storage and provide instructions and data to the processor.
The processor reads the corresponding computer program from the nonvolatile memory into the memory and then runs the computer program to form a user data monitoring platform on a logic level. And the processor is used for executing the program stored in the memory and particularly used for executing the method.
The method performed by the user data monitoring platform disclosed in the embodiment of fig. 1 of the present application may be applied to a processor or implemented by a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; but also digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components. The disclosed methods, steps, and logic blocks in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be embodied directly in the execution of a hardware decoding processor, or in the execution of a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in a memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method.
The electronic device may further execute the method executed by the user data monitoring platform in fig. 1, and implement the functions of the user data monitoring platform in the embodiment shown in fig. 1, which is not described herein again.
The embodiments of the present application also provide a computer readable storage medium storing one or more programs, the one or more programs including instructions, which when executed by an electronic device comprising a plurality of application programs, enable the electronic device to perform a method performed by a user data monitoring platform in the embodiment shown in fig. 1, and in particular to perform the aforementioned method.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random Access Memory (RAM) and/or nonvolatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
Computer readable media, including both non-transitory and non-transitory, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static Random Access Memory (SRAM), dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), read Only Memory (ROM), electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, does not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.
Claims (10)
1. A method for monitoring user data, the method being implemented by a user data monitoring platform, the user data monitoring platform comprising: the system comprises a data information system, a stream data processing system and a service data prediction system, wherein the data information system is respectively connected with the stream data processing system and the service data prediction system; the user data monitoring platform can be connected with at least one service system;
The data information system receives and stores user data generated by the operation of a user in the service system in real time;
the streaming data processing system obtains the user data from the data information system based on a streaming processing mechanism, preprocesses the user data, stores the obtained processed data in the data information system, and gathers user activity indexes in real time according to the processed data, wherein the user activity indexes at least comprise: user liveness;
the business data prediction system predicts at least one user activity index in a future preset time period based on a business data prediction model and according to the processed data, wherein the business data prediction model is obtained by training at least according to the processed data in the preset history time period obtained from the data information system.
2. The method of claim 1, wherein the data information system is constructed based on any one of Kakfa, database, elastomer search, and HDFS;
the streaming data processing system is constructed based on a Flink framework;
the business data prediction system is constructed based on TensorFlow.
3. The method of claim 2, wherein the data information system includes user data for multiple data sources;
the streaming mechanism is used for acquiring the user data from the data information system, preprocessing the user data, and storing the obtained processed data in the data information system, and the streaming mechanism comprises the following steps:
based on a stream processing mechanism, converting user data of each data source into a plaintext form based on an anti-sequence speech mode;
merging user data of multiple data sources in a plaintext form based on a Map mode to obtain a user data stream;
grouping the user data streams according to a plurality of different dimensions by setting conditions by means of the key by to obtain a plurality of user data sets;
preprocessing each user data group on each user activity index to obtain processed data, wherein the preprocessing comprises the following steps: at least one of data cleansing, format conversion, missing value completion, and feature extraction.
4. A method according to claim 3, wherein said converting user data for each data source into plaintext based on an anti-sequence speech approach based on a streaming mechanism comprises:
Setting delay time through a watermark, and filtering user data with event time stamps larger than the delay time;
based on the stream processing mechanism, the retained user data is converted into a plaintext form based on an anti-sequence speech mode.
5. The method of claim 1, wherein the streaming data processing system comprises a plurality of computing nodes;
the step of summarizing the user activity indexes in real time according to the processed data comprises the following steps:
determining whether the memory of the current computing node meets the data summarizing requirement or not based on a bitmap filtering mode;
if the data is not satisfied, setting an initial bit array based on a bloom filter, wherein the initial bit array comprises a plurality of array elements with minimum units;
for one piece of user data, according to a plurality of hash functions, placing the user data on a corresponding array element in the initial bit array;
and if all the user data hit the hash value of the corresponding array element, determining that the user data is an active user.
6. The method of claim 1, wherein the streaming data processing system is further coupled to an external columnar data management system;
the method further comprises the steps of:
And the streaming data processing system stores the summarized results obtained by the real-time summarization in the external columnar data management system so as to carry out secondary aggregation according to the summarized results.
7. The method of claim 1, wherein the business data prediction model is constructed based on a multi-layer CNN convolutional neural network;
the method further comprises the steps of:
the service data prediction system acquires processed data in a preset historical time length taking the current time as a starting point from the data information system according to a preset period;
and carrying out distributed training on the service data prediction model at a plurality of training nodes based on the acquired processed data and updating the service data prediction model.
8. A user data monitoring platform, the user data monitoring platform comprising: the system comprises a data information system, a stream data processing system and a service data prediction system, wherein the data information system is respectively connected with the stream data processing system and the service data prediction system; the user data monitoring platform can be connected with at least one service system;
the data information system is used for receiving and storing user data generated by the operation of a user in the service system in real time and receiving and storing processed data returned by the stream data processing system;
The streaming data processing system obtains the user data from the data information system based on a streaming processing mechanism, preprocesses the user data, stores the obtained processed data in the data information system, and gathers and displays user activity indexes in real time according to the processed data, wherein the user activity indexes at least comprise: user liveness;
the business data prediction system predicts the user activity index in the future preset time length based on a business data prediction model and according to the processed data, wherein the business data prediction model is obtained by training the business data prediction system by acquiring the processed data in the preset history time length from the data information system.
9. An electronic device, comprising:
a processor; and
a memory arranged to store computer executable instructions which when executed cause the processor to perform the user data monitoring method of any of claims 1 to 7.
10. A computer readable storage medium storing one or more programs, which when executed by an electronic device comprising a plurality of application programs, cause the electronic device to perform the user data monitoring method of any of claims 1-7.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310814204.8A CN116703467A (en) | 2023-07-04 | 2023-07-04 | User data monitoring method, system, electronic device and readable storage medium |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310814204.8A CN116703467A (en) | 2023-07-04 | 2023-07-04 | User data monitoring method, system, electronic device and readable storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN116703467A true CN116703467A (en) | 2023-09-05 |
Family
ID=87823932
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202310814204.8A Pending CN116703467A (en) | 2023-07-04 | 2023-07-04 | User data monitoring method, system, electronic device and readable storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN116703467A (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118260294A (en) * | 2024-05-29 | 2024-06-28 | 蒲惠智造科技股份有限公司 | Manufacturing pain signal summarizing method, system, medium and equipment based on AI |
-
2023
- 2023-07-04 CN CN202310814204.8A patent/CN116703467A/en active Pending
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118260294A (en) * | 2024-05-29 | 2024-06-28 | 蒲惠智造科技股份有限公司 | Manufacturing pain signal summarizing method, system, medium and equipment based on AI |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US9367601B2 (en) | Cost-based optimization of configuration parameters and cluster sizing for hadoop | |
| US7673291B2 (en) | Automatic database diagnostic monitor architecture | |
| CN111143286B (en) | A cloud platform log management method and system | |
| US20180357111A1 (en) | Data center operation | |
| CN111209352A (en) | Data processing method and device, electronic equipment and storage medium | |
| US7376682B2 (en) | Time model | |
| US9037905B2 (en) | Data processing failure recovery method, system and program | |
| CN112506743A (en) | Log monitoring method and device and server | |
| CN110675194A (en) | Funnel analysis method, device, equipment and readable medium | |
| CN111177237B (en) | Data processing system, method and device | |
| CN105868070A (en) | Method and apparatus for determining resources consumed by tasks | |
| CN114090529A (en) | Log management method, device, system and storage medium | |
| CN110704484A (en) | Method and system for processing mass real-time data stream | |
| CN109033188A (en) | A kind of metadata acquisition method, apparatus, server and computer-readable medium | |
| CN108509313A (en) | A kind of business monitoring method, platform and storage medium | |
| CN113360581A (en) | Data processing method, device and storage medium | |
| WO2017015059A1 (en) | Efficient cache warm up based on user requests | |
| CN113220530B (en) | Data quality monitoring method and platform | |
| CN113297057A (en) | Memory analysis method, device and system | |
| CN116703467A (en) | User data monitoring method, system, electronic device and readable storage medium | |
| CN112286918B (en) | Method and device for fast access conversion of data, electronic equipment and storage medium | |
| CN117632860A (en) | Method and device for merging small files based on Flink engine and electronic equipment | |
| US11899693B2 (en) | Trait expansion techniques in binary matrix datasets | |
| CN110019045A (en) | Method and device is landed in log | |
| CN115033179B (en) | Data storage method, device, equipment and medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| SE01 | Entry into force of request for substantive examination |