
CN119474765B - A web-based big data analysis method and related equipment - Google Patents


Info

Publication number
CN119474765B
CN119474765B (application CN202510061862.3A)
Authority
CN
China
Prior art keywords
data
subsystem
data analysis
task
web
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202510061862.3A
Other languages
Chinese (zh)
Other versions
CN119474765A (en)
Inventor
黄颂凯
于鹏
石自军
陈兆俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Zhongzheng Huizhi Management Consulting Co ltd
Original Assignee
Shenzhen Zhongzheng Huizhi Management Consulting Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Zhongzheng Huizhi Management Consulting Co ltd filed Critical Shenzhen Zhongzheng Huizhi Management Consulting Co ltd
Priority to CN202510061862.3A priority Critical patent/CN119474765B/en
Publication of CN119474765A publication Critical patent/CN119474765A/en
Application granted granted Critical
Publication of CN119474765B publication Critical patent/CN119474765B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/10 Pre-processing; Data cleansing
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 Arrangements for executing specific programs
    • G06F 9/445 Program loading or initiating
    • G06F 9/44521 Dynamic linking or loading; Link editing at or after load time, e.g. Java class loading
    • G06F 9/44526 Plug-ins; Add-ons
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract


The present application relates to the field of the Internet and discloses a web-based big data analysis method and related equipment. The method comprises: integrating the micro-service front-end components of each subsystem into a low-code platform; integrating the shared data of the subsystems in the low-code platform via a subsystem loader; preprocessing the shared data according to preset cleaning rules; when a user's data analysis request is received, dividing the data analysis task corresponding to the request into N subtasks according to the preprocessed target shared data; and assigning the subtasks to the subsystems for data analysis. By integrating and preprocessing the subsystems' shared data on the low-code platform and, when executing a data analysis task, dividing it into multiple subtasks according to the preprocessed target shared data, the distributed intelligent task-scheduling mechanism effectively improves the processing efficiency of data analysis tasks.

Description

Web-based big data analysis method and related equipment
Technical Field
The application relates to the field of the Internet, and in particular to a web-based big data analysis method and related equipment.
Background
In the current information and digital age, enterprises and organizations increasingly rely on big data analytics to support decision making and business optimization. To achieve efficient big data analysis, systems based on a micro-service architecture are widely adopted. These systems are typically composed of multiple independent modules, each responsible for processing a different data set or function. However, because each module stores and manages its data independently, sharing and integrating that data is difficult, and as a result the data of the individual modules cannot be analyzed comprehensively during big data analysis. Moreover, many big data analysis systems adopt a centralized task scheduling mechanism, so the task division and allocation process lacks flexibility and intelligence. When there are many modules, centralized scheduling struggles to handle complex data analysis tasks efficiently.
Disclosure of Invention
The application provides a web-based big data analysis method and related equipment, which are used for solving the problem that centralized scheduling is difficult to efficiently process complex data analysis tasks in the related technology.
The first aspect of the application provides a web-based big data analysis method, which comprises the following steps:
Integrating a micro-service front-end component of the subsystem into a low-code platform;
integrating the shared data of the subsystem in the low-code platform according to a subsystem loader;
Preprocessing the shared data according to a preset cleaning rule;
when a data analysis request of a user is received, dividing a data analysis task corresponding to the data analysis request into N sub-tasks according to the preprocessed target shared data, wherein N is an integer greater than or equal to 2;
and respectively distributing the N sub-tasks to the corresponding sub-systems for data analysis.
Optionally, in a first implementation manner of the first aspect of the present application, the step of integrating the micro service front end component of the subsystem into the low code platform includes:
generating a sandbox in the low code platform, wherein the sandbox is an independent virtual environment;
generating an environment identifier for a micro-service front-end component of the subsystem according to a global manager;
the microservice front end component and the corresponding environment identifier are assigned to the sandbox.
Optionally, in a second implementation manner of the first aspect of the present application, after the step of allocating the micro service front end component and the corresponding environment identifier to the sandbox, the method further includes:
Determining a target subsystem according to the data analysis request;
acquiring a target environment identifier corresponding to the target subsystem;
Creating a window container according to the target environment identifier;
and loading the micro-service front-end component of the target subsystem to the window container.
Optionally, in a third implementation manner of the first aspect of the present application, after the step of integrating the micro service front end component of the subsystem into the low code platform, the method further includes:
according to a preset configuration file, sending initial data required for data sharing to the micro-service front-end component;
When a data query instruction of a user is detected, the data query instruction is sent to the micro-service front-end component according to an event bus;
and receiving the shared data sent by the micro-service front-end component based on the initial data and the data query instruction.
Optionally, in a fourth implementation manner of the first aspect of the present application, the method further includes:
determining a target subsystem window which is required to be queried by a user according to the data query instruction;
Predicting a subsystem window to be accessed according to the target subsystem and the user history inquiry record;
and loading the shared data of the corresponding access subsystem according to the predicted result of the subsystem window to be accessed.
Optionally, in a fifth implementation manner of the first aspect of the present application, when a data analysis request of a user is received, the step of dividing a data analysis task corresponding to the data analysis request into N sub-tasks according to the preprocessed target shared data includes:
Determining a calculation logic tree of the data analysis task according to the data analysis request;
determining a data format of the target shared data;
Dividing the data analysis task into the N sub-tasks according to the calculation logic tree and the data format.
Optionally, in a sixth implementation manner of the first aspect of the present application, after the step of allocating the N sub-tasks to the corresponding sub-systems for data analysis, the method further includes:
monitoring the task execution state of the subsystem in real time;
triggering a rescheduling mechanism when the main system detects that a sub-task has failed or its result is incomplete;
And reallocating the unfinished subtasks according to the rescheduling mechanism.
A second aspect of the present application provides a web-based big data analysis device including:
the integration module is used for integrating the micro-service front-end component of the subsystem to the low-code platform;
The data integration module is used for integrating the shared data of the subsystem in the low-code platform according to a subsystem loader;
the preprocessing module is used for preprocessing the shared data according to a preset cleaning rule;
the dividing module is used for dividing a data analysis task corresponding to a data analysis request of a user into N sub-tasks according to the preprocessed target shared data when the data analysis request is received, wherein N is an integer greater than or equal to 2;
and the distribution module is used for respectively distributing the N sub-tasks to the corresponding sub-systems for data analysis.
A third aspect of the embodiment of the present application provides an electronic device, including a memory and a processor, where the processor is configured to execute a computer program stored on the memory, and when the processor executes the computer program, each step in the web-based big data analysis method provided in the first aspect of the embodiment of the present application is implemented.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the web-based big data analysis method provided in the first aspect of the embodiments of the present application described above.
In summary, in the web-based big data analysis method and related device provided by the scheme of the application, a micro-service front-end component of each subsystem is integrated into a low-code platform; the shared data of the subsystems is integrated in the low-code platform via a subsystem loader; the shared data is preprocessed according to a preset cleaning rule; when a data analysis request of a user is received, the data analysis task corresponding to the request is divided into N sub-tasks according to the preprocessed target shared data; and the N sub-tasks are respectively distributed to the corresponding subsystems for data analysis. By implementing the scheme of the application, the shared data of the subsystems is integrated and preprocessed on the low-code platform; when a data analysis task is executed, it is divided into multiple sub-tasks according to the preprocessed target shared data, and a distributed intelligent task scheduling mechanism effectively improves the processing efficiency of data analysis tasks.
Drawings
FIG. 1 is a flow chart of a web-based big data analysis method provided by an embodiment of the application;
FIG. 2 is a schematic diagram of a program module of a web-based big data analysis device according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, features and advantages of the present application more comprehensible, the technical solutions in the embodiments of the present application will be clearly described in conjunction with the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In order to solve the problem that centralized scheduling is difficult to efficiently process complex data analysis tasks in the related art, an embodiment of the present application provides a web-based big data analysis method, as shown in fig. 1, which is a schematic flow chart of the web-based big data analysis method provided in the present embodiment, where the web-based big data analysis method includes the following steps:
step 110, integrating the micro-service front-end components of the subsystem into a low code platform.
Specifically, in this embodiment, a low-code platform is used as a core architecture of a main system, and a micro-service architecture is introduced as an organization form of a subsystem, so as to implement efficient management and unified integration of multiple system resources. In the low code platform, the main system integrates the micro-service front-end component into the main interface of the system in a mode of dynamically loading the custom component.
In an alternative implementation manner of the embodiment, the step of integrating the micro-service front-end component of the subsystem into the low-code platform comprises the steps of generating a sandbox in the low-code platform, wherein the sandbox is an independent virtual environment, generating an environment identifier for the micro-service front-end component of the subsystem according to a global manager, and distributing the micro-service front-end component and the environment identifier to the sandbox.
Specifically, in this embodiment, each micro-service component is assigned to a separate virtual environment (sandbox) when loaded. The sandbox relies on the browser's memory isolation and an independent style scope (a set of CSS styles defined for a component that applies only to that component's own DOM elements and neither leaks out nor affects the styles of other components), ensuring that the static resources of the subsystems (HTML structure, CSS styles, and JavaScript logic) do not interfere with one another. When the components of each subsystem are loaded, the host system generates a unique environment identifier through a global manager (Resource Orchestrator) and injects it into the sandbox for resource isolation and call tracking. On this basis, the main system and the subsystems communicate through the event bus customized by the low-code platform. The main system is responsible for monitoring operation events triggered by the user, loading the corresponding subsystem components according to the operation target, and completing seamless switching between the main interface and a subsystem interface. This mechanism not only enables flexible loading of the micro-service front end but also greatly enhances the maintainability and extensibility of the system.
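The sandbox-plus-identifier mechanism above can be sketched in a few lines. This is an illustrative model only: the `Sandbox` and `ResourceOrchestrator` classes and their methods are assumptions (the patent names a global manager called Resource Orchestrator but specifies no API), and real front-end isolation would additionally rely on browser facilities such as scoped CSS.

```python
import uuid


class Sandbox:
    """An isolated environment holding one subsystem's front-end resources."""

    def __init__(self, env_id: str):
        self.env_id = env_id
        self.components: dict[str, object] = {}  # component name -> resources

    def register(self, name: str, resources: object) -> None:
        self.components[name] = resources


class ResourceOrchestrator:
    """Global manager: issues a unique environment identifier per load and
    assigns each micro-service front-end component to its own sandbox."""

    def __init__(self):
        self.sandboxes: dict[str, Sandbox] = {}

    def load_component(self, subsystem: str, resources: object) -> str:
        env_id = f"env-{subsystem}-{uuid.uuid4().hex[:8]}"  # unique identifier
        box = Sandbox(env_id)
        box.register(subsystem, resources)
        self.sandboxes[env_id] = box
        return env_id
```

Because each component lives in its own sandbox keyed by its environment identifier, one subsystem's resources are never visible from another subsystem's sandbox, which mirrors the isolation and call-tracking role the identifier plays above.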
In an optional implementation manner of this embodiment, after the step of allocating the micro service front end component and the environment identifier corresponding to the micro service front end component to the sandbox, the method further includes determining a target subsystem according to the data analysis request, obtaining a target environment identifier corresponding to the target subsystem, creating a window container according to the target environment identifier, and loading the micro service front end component of the target subsystem to the window container.
Specifically, in this embodiment, to further optimize the user experience, the interface resources of multiple subsystems are integrated into the main system in a componentized, multi-window manner, achieving the effect of "multi-task windows" running in parallel. In a conventional micro front-end architecture, a user usually jumps to different subsystem interfaces through URLs, but this approach cannot show multiple systems in the same interface at the same time, resulting in a poor user experience. At the heart of the multi-window mechanism is the window management module of the low-code platform. When a user triggers a subsystem in the main interface, the main system dynamically creates a window container and loads the micro-service front-end component of the target subsystem into that container. Each window container has independent lifecycle management and style scope, and the user can freely drag, zoom, minimize, or close the window. To ensure resource independence between multiple windows, the environment identifier generated by the sandbox mechanism is introduced when each window is created: the window container is automatically bound to the corresponding sandbox environment, ensuring that the data and styles of different windows do not interfere with each other. On this basis, the window management module of the main system also supports message passing between windows. For example, when a user submits a piece of form data in one window, another window can receive and process that data in real time to complete collaborative handling of the task.
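A minimal sketch of the window-container and inter-window messaging behavior described above; the `WindowManager` and `WindowContainer` names and their methods are illustrative assumptions, not an API defined by the patent.

```python
class WindowContainer:
    """A window bound to one sandbox environment identifier."""

    def __init__(self, window_id: str, env_id: str):
        self.window_id = window_id
        self.env_id = env_id          # binds the window to its sandbox
        self.inbox: list[dict] = []   # messages received from other windows

    def receive(self, message: dict) -> None:
        self.inbox.append(message)


class WindowManager:
    """Creates window containers and relays messages between them."""

    def __init__(self):
        self.windows: dict[str, WindowContainer] = {}

    def open_window(self, window_id: str, env_id: str) -> WindowContainer:
        win = WindowContainer(window_id, env_id)
        self.windows[window_id] = win
        return win

    def send(self, sender_id: str, target_id: str, payload: dict) -> None:
        # Deliver a message from one window to another, e.g. form data
        # submitted in one window and processed in real time by another.
        self.windows[target_id].receive({"from": sender_id, **payload})
```

Each window keeps its own inbox, so messages delivered to one window are never visible to another, matching the per-window isolation described above.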
In an optional implementation manner of this embodiment, after the step of integrating the micro service front end component of the subsystem into the low code platform, the method further includes sending initial data required for data sharing to the micro service front end component according to a preset configuration file, sending a data query instruction to the micro service front end component according to an event bus when a data query instruction of a user is detected, and receiving the shared data sent by the micro service front end component based on the initial data and the data query instruction.
Specifically, in this embodiment, after system integration is completed, data interaction between the main system and a subsystem works as follows: when the main system loads a subsystem component, the initial data required for the subsystem's operation is injected into it through the configuration file of the custom component. For example, when a host-system user triggers a data query operation by clicking a button, the host system transmits the operation instruction to the micro-service front-end component of the target subsystem via the event bus. To ensure the security and real-time performance of data transmission, an authentication mechanism based on an encrypted token can be introduced: a temporary token is generated each time the main system sends data to a subsystem, and the subsystem continues to process the data only after decrypting the token and verifying its validity.
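The encrypted-token handshake can be illustrated with a small HMAC-signed, time-limited token. The patent does not specify a token format, so the signing scheme, field names, and shared key below are assumptions for illustration only.

```python
import hashlib
import hmac
import json
import time

SECRET = b"shared-secret"  # hypothetical key shared by main system and subsystem


def issue_token(payload, ttl=60, now=None):
    """Main-system side: wrap an instruction with a temporary signed token."""
    now = time.time() if now is None else now
    body = json.dumps(payload, sort_keys=True)
    expires = int(now) + ttl
    sig = hmac.new(SECRET, f"{body}|{expires}".encode(), hashlib.sha256).hexdigest()
    return {"body": body, "expires": expires, "sig": sig}


def verify_token(message, now=None):
    """Subsystem side: check validity before processing; None means reject."""
    now = time.time() if now is None else now
    expected = hmac.new(
        SECRET, f"{message['body']}|{message['expires']}".encode(), hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(expected, message["sig"]) or now > message["expires"]:
        return None
    return json.loads(message["body"])
```

The signature covers both the payload and the expiry time, so a tampered instruction or a replayed expired token is rejected before the subsystem processes any data.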
Optionally, when the user initiates a data query operation on the low-code platform, a data query instruction is generated. The instruction contains the specific content and target of the query, for example, sales data or user behavior records within a specific period of time. A query analysis module of the low-code platform parses the instruction, extracts the query content and target subsystem information, and determines the target subsystem window the user needs to query according to the parsing result. Once the target subsystem window is determined, the host system activates the corresponding subsystem window in the low-code platform through the dynamic component loading mechanism. At the same time, the host system continuously collects the user's query records on the low-code platform, including the target subsystem, query content, and timestamp of each query. The host system builds a prediction model through a machine learning algorithm: the model's input is the user's historical query records, and its output is a prediction of the subsystem window to be accessed. Commonly used prediction models include, but are not limited to, sequence-based models, for example a long short-term memory network (LSTM) used to predict the subsystem window the user may query next. When the user initiates a new data query instruction, the main system combines the target subsystem of the current query with the user's historical query records, uses the trained prediction model to predict the other subsystem windows the user may access, and preloads the shared data of the corresponding subsystems according to the prediction result.
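As a lightweight stand-in for the LSTM-based sequence model mentioned above, a first-order Markov predictor over the user's query history illustrates the prefetch idea: record which window tends to follow which, then predict the most frequent successor of the current window. The class and method names are hypothetical.

```python
from collections import Counter, defaultdict


class WindowPredictor:
    """Predicts the subsystem window a user is likely to open next,
    given the window just queried (a sketch, not the LSTM itself)."""

    def __init__(self):
        # current window -> counts of the windows that followed it
        self.transitions = defaultdict(Counter)

    def record(self, history):
        """history: chronological list of subsystem windows the user queried."""
        for prev, nxt in zip(history, history[1:]):
            self.transitions[prev][nxt] += 1

    def predict(self, current):
        """Return the most frequent follow-up window, or None if unseen."""
        counts = self.transitions.get(current)
        if not counts:
            return None
        return counts.most_common(1)[0][0]
```

The main system would call `predict` with the target subsystem of the current query and preload the shared data of the returned window; an LSTM would replace the transition table with learned sequence weights but plays the same role.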
Optionally, in modern big data analysis and micro-service architectures, to achieve efficient data management and analysis, the problems of data integration and isolation are solved by sharing an independent lightweight data warehouse and binding data regions to environment identifiers. Specifically, the lightweight data warehouse is a unified data storage center used to share data between the main system and all subsystems. Compared with a traditional database, it is simpler and more efficient and is well suited to data storage and management in a front-end environment; all data is stored centrally in one warehouse, avoiding the management complexity caused by scattered data. Although the host system and all subsystems share the same lightweight data warehouse, each subsystem can only access the data regions bound to its environment identifier. For example, subsystem A has the environment identifier EnvA and can only access data associated with EnvA in the data warehouse; that is, the environment identifier marks both the operating environment and the data-authority scope of the subsystem.
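The region-binding idea can be sketched with a dictionary-backed store in which each region is keyed by an environment identifier and reads outside a subsystem's own region are refused; the class and method names are illustrative assumptions.

```python
class LightweightWarehouse:
    """Single shared store; each data region is bound to one environment
    identifier, and a subsystem may only read regions bound to its own ID."""

    def __init__(self):
        self._regions: dict[str, dict] = {}  # env_id -> that region's records

    def write(self, env_id: str, key: str, value: object) -> None:
        self._regions.setdefault(env_id, {})[key] = value

    def read(self, env_id: str, key: str) -> object:
        region = self._regions.get(env_id, {})
        if key not in region:
            raise PermissionError(
                f"{env_id!r} has no access to {key!r} (wrong region or missing)"
            )
        return region[key]
```

All data lives in one warehouse instance, yet subsystem B presenting EnvB can never read records written under EnvA, which is exactly the authority scope the identifier encodes above.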
And 120, integrating the shared data of the subsystems in the low-code platform according to the subsystem loader.
Specifically, in this embodiment, the subsystem loader is configured to load a subsystem's resource files into the main system. When the shared data of each subsystem (such as an SQL database, NoSQL storage, or real-time streaming data) is connected to the main system, it is attached to the main system's resource manager through the corresponding subsystem loader. The subsystem loader binds each subsystem's data according to its environment identifier, so that data collection and management are logically fully isolated yet can be queried and operated on uniformly within the main system. In actual operation, when a user selects a big data analysis task (such as cross-system sales forecast analysis), the host system dynamically loads the adapter modules of the related subsystems and uses the subsystem loaders to complete the integration of multiple data sources.
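A minimal sketch of how a subsystem loader might bind a data source to its environment identifier and register it with the resource manager for unified querying. All names and the callable-based data source are hypothetical; a real loader would wrap SQL, NoSQL, or streaming adapters.

```python
class SubsystemLoader:
    """Binds one subsystem's data source to its environment identifier so the
    source stays logically isolated yet can be attached to the main system."""

    def __init__(self, env_id, fetch):
        self.env_id = env_id
        self._fetch = fetch  # callable returning the subsystem's shared data

    def load(self):
        return self.env_id, self._fetch()


class ResourceManager:
    """Main-system side: holds the attached sources, keyed by identifier."""

    def __init__(self):
        self.sources: dict[str, list] = {}

    def attach(self, loader):
        env_id, data = loader.load()
        self.sources[env_id] = data

    def query_all(self):
        # Unified query across all attached sources (multi-source integration)
        return [row for data in self.sources.values() for row in data]
```

Each source remains addressable by its own environment identifier (isolation), while `query_all` gives the uniform cross-subsystem view the analysis task needs.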
And 130, preprocessing the shared data according to a preset cleaning rule.
Specifically, in this embodiment, after data source integration is completed, the host system uses the shared lightweight data warehouse to preprocess and clean the shared data. The host system uniformly invokes predefined cleaning rules (such as deduplication, outlier handling, and data type conversion) to process the shared data, and supports dynamically loading and executing cleaning rules for different business scenarios. For example, when analyzing user behavior data, the main system can automatically clean duplicate user IDs across all subsystems, unify data formats, and reject noise data to ensure the accuracy of subsequent analysis. Typical operations include normalizing the time format of the shared data; removing duplicate records to ensure data uniqueness; filling, removing, or marking missing values; detecting and handling outliers (such as erroneous input or extreme values); and mapping field names and field values from different subsystems to a unified standard so that all data fields remain consistent. It should be understood that the execution of the cleaning rules relies on a built-in rule engine, which can dynamically load cleaning rules and process the data record by record.
It should be noted that, given time-series data $X = \{x_1, x_2, \ldots, x_n\}$, the cleaning rule in this embodiment can be expressed by the following formula:

$$
x_i' =
\begin{cases}
\mu + \alpha k \sigma, & x_i > \mu + k\sigma \\
\mu - \alpha k \sigma, & x_i < \mu - k\sigma \\
\dfrac{1}{|N(x_i)|} \displaystyle\sum_{x_j \in N(x_i)} x_j, & x_i \text{ is missing} \\
x_i, & \text{otherwise}
\end{cases}
$$

where $\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$ is the mean of the time-series data $X$; $\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2}$ is the standard deviation of the time-series data; $k$ is the decision threshold for outliers, whose value can be adjusted according to the actual application scenario; $\alpha$ is the correction factor for outliers, usually chosen so as to pull an outlier back into a reasonable range; $N(x_i)$ is the neighborhood set of data point $x_i$; and $|N(x_i)|$ is the number of elements of that set, so the third row averages the data points in the neighborhood. It can be understood that the first and second rows of the formula handle outliers: values that exceed the normal range are limited to a reasonable interval, reducing the influence of outliers on the analysis result. The third row handles missing values, using the local trend information of neighboring data to fill them so that the overall trend of the time series is not distorted. The fourth row handles noise-free data: if $x_i$ does not satisfy the outlier condition (i.e., $\mu - k\sigma \le x_i \le \mu + k\sigma$) and is not a missing value, the original value is retained directly, which preserves the integrity of normal data points and introduces no unnecessary correction. The formula is a comprehensive cleaning rule that provides an accurate and flexible correction method for the outlier, missing-value, and noise problems in time-series data. By setting the parameters reasonably (e.g., $k$ and $\alpha$), it can adapt to different data characteristics and analysis requirements, ensuring that the cleaned data truly reflects the original trend without being disturbed by abnormal data.
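A minimal numeric sketch of the piecewise cleaning rule: values beyond μ ± kσ are pulled back via the correction factor α, missing points are filled with the mean of their neighborhood, and normal points are kept unchanged. The function name, the use of `None` to mark missing values, and the symmetric window for the neighborhood are assumptions for illustration.

```python
from statistics import mean, pstdev


def clean_series(xs, k=2.0, alpha=1.0, window=1):
    """Apply the piecewise cleaning rule to a time series.

    xs: list of floats, with None marking missing values.
    k: outlier decision threshold; alpha: correction factor;
    window: half-width of the neighborhood used to fill missing values.
    """
    observed = [x for x in xs if x is not None]
    mu = mean(observed)
    sigma = pstdev(observed)  # population standard deviation
    out = []
    for i, x in enumerate(xs):
        if x is None:                   # third row: neighborhood-mean fill
            nbrs = [v for v in xs[max(0, i - window): i + window + 1]
                    if v is not None]
            out.append(mean(nbrs))
        elif x > mu + k * sigma:        # first row: upper outlier pulled back
            out.append(mu + alpha * k * sigma)
        elif x < mu - k * sigma:        # second row: lower outlier pulled back
            out.append(mu - alpha * k * sigma)
        else:                           # fourth row: normal point retained
            out.append(x)
    return out
```

For example, with `k=1.0` the series `[0, 0, 0, 0, 10]` has μ = 2 and σ = 4, so the spike at 10 exceeds μ + kσ = 6 and is clipped back to 6, while the normal points pass through untouched.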
Step 140, when a data analysis request of a user is received, dividing a data analysis task corresponding to the data analysis request into N sub-tasks according to the preprocessed target shared data;
and step 150, distributing the sub-tasks to the subsystem for data analysis.
Specifically, in this embodiment, when a user initiates a big data analysis request, the host system parses the task dependency (the cleaning step normalizes the data structure and content, such as unified data format, normalized field names, merging redundant data, etc., these operations provide explicit dependencies for subsequent distributed computing tasks), and decomposes the analysis task into multiple subtasks through the task scheduling module. These sub-tasks may be distributed to the corresponding sub-systems for execution, with the sandboxed environment of the sub-systems ensuring the independence and security of the tasks. The results of the distributed computation are gradually summarized into a main system, and the main system combines, counts and converts the data through a built-in real-time stream processing module to generate a final analysis result. For example, in a sales prediction analysis scenario, a host system may schedule sales data for each regional subsystem, generating nationwide sales prediction data through a distributed computing model.
Optionally, after the data analysis task is completed, the big data analysis result is visually displayed in a multi-view mode, and a user can open a plurality of windows in the main system interface, wherein each window displays different analysis dimensions or result data, and the windows can include but are not limited to view types including a line graph, a bar graph, a thermodynamic diagram, a geographic visual map and a dynamic dashboard. The window management module of the main system supports real-time interaction and data linkage between windows, for example, when a user selects a specific time range in a certain window, the display content of other windows can be automatically updated in a linkage manner to reflect the data in the selected range. Each window corresponds to an independent data analysis result, and the sandboxed mechanism of the window ensures the independence and performance stability of view rendering.
In an optional implementation manner of this embodiment, when a data analysis request of a user is received, the step of dividing a data analysis task corresponding to the data analysis request into N subtasks according to the preprocessed target shared data includes determining a computation logic tree of the data analysis task according to the data analysis request, determining a data format of the target shared data, and dividing the data analysis task into N subtasks according to the computation logic tree and the data format.
Specifically, in this embodiment, the task scheduling module of the host system analyzes the user's analysis requirement and converts it into a computation logic tree, which describes how the task is decomposed and the dependencies between its parts. Taking the analysis of sales data as an example: a first-level task performs sales statistics on data partitioned by region; a second-level task further subdivides the statistical results by time period; and a third-level task aggregates the sales of all regions to generate a global trend. The basis for task partitioning comes directly from the data preprocessing rules and cleaning results; for example, if the data is partitioned by region labels, the scheduling module can divide it into multiple subtasks based on these partitioning rules. The task scheduling module distributes the divided subtasks to the corresponding subsystems through the resource manager in the main system, allocating tasks according to each subsystem's computing resources (CPU, memory, etc.); for example, a subsystem with stronger computing resources can process larger-scale data or more complex computation logic. After the subtasks are distributed, each subsystem executes its task independently within its sandboxed environment and the distributed computing framework. When the corresponding data analysis is completed, each subsystem transmits the data analysis result of its subtask back to the main system, and the main system aggregates the received results using the real-time stream processing module.
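The region/period decomposition above can be sketched as a small logic-tree builder plus a subtask divider that flattens the tree's leaves into independent units of work; the tree layout and function names are illustrative assumptions.

```python
def build_logic_tree(regions, periods):
    """Three-level computation logic tree for the sales example: level 1
    splits by region, level 2 by time period, level 3 aggregates globally."""
    return {
        "aggregate": "global_trend",                       # level 3
        "children": [
            {"region": r, "children": [{"period": p} for p in periods]}
            for r in regions                               # levels 1 and 2
        ],
    }


def divide_subtasks(tree):
    """Flatten the tree's leaves into independent subtasks, one per
    (region, period) slice of the partitioned data."""
    tasks = []
    for node in tree["children"]:
        for leaf in node["children"]:
            tasks.append({"region": node["region"], "period": leaf["period"]})
    return tasks
```

Each resulting subtask touches a disjoint slice of the cleaned data, so the subtasks can be dispatched to different subsystems and executed without coordinating with one another; only the final aggregation node depends on all of them.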
In an optional implementation manner of this embodiment, after the step of respectively distributing the N sub-tasks to the corresponding sub-systems for data analysis, the method further includes monitoring a task execution state of the sub-systems in real time, triggering a rescheduling mechanism when the main system detects that a sub-task fails or a result is incomplete, and redistributing the incomplete sub-tasks according to the rescheduling mechanism.
Specifically, in this embodiment, each subtask is assigned a unique task identifier at scheduling time. While executing a task, the subsystem periodically reports the task status (such as "in progress", "completed", "failed", etc.) to the main system. By monitoring the task status and a preset timeout period, the main system determines whether a subtask has failed or timed out. When the main system detects that a subtask has failed or its result is incomplete, it triggers a rescheduling mechanism to reassign and execute the incomplete subtask, for example: a task retry mechanism, in which the main system first tries to reassign the failed subtask to the original subsystem, where the number of retries can be set according to the system configuration; and a task migration mechanism, in which the main system migrates the task to another available subsystem for execution if the number of retries exceeds the preset value. When a task is migrated, the target is selected according to the computing resources and the current load conditions of the subsystems. For partial-result handling, where a subsystem has produced a partial result, the system stores the partial result and continues the calculation from it when the task is redistributed, thereby avoiding repeated computation.
Optionally, by means of a rescheduling algorithm combined with the task retry and migration mechanisms, it is ensured that a task can be reassigned and executed in time after a failure. The rescheduling algorithm has the following formula:

Reschedule(T, S, R) = Retry(T, S),    if R < R_max
Reschedule(T, S, R) = Migrate(T, S'), if R ≥ R_max

wherein Reschedule(T, S, R) represents the rescheduling function for reallocating the task T; T represents the task to be rescheduled; S represents the subsystem to which the task was originally allocated; R represents the current number of retries of task T, with an initial value of 0; R_max indicates the maximum number of retries in the system configuration; Retry(T, S) is the retry function, which reassigns task T to subsystem S; S' represents the new subsystem to which the task is migrated, selected by the task migration mechanism; and Migrate(T, S') represents the migration function, which assigns task T to the new subsystem S'. It will be appreciated that if the current retry count R is less than the maximum retry count R_max, the Retry(T, S) function is called to reassign task T to the original subsystem S; the Retry function internally increments the retry count R and reassigns the task to subsystem S for execution. If the current retry count R has reached or exceeded the maximum retry count R_max, the Migrate(T, S') function is called to migrate task T to the new subsystem S' for execution. The new subsystem S' is selected by the task migration mechanism, typically based on the computing resources and load conditions of the subsystems. If task T is successfully completed after retry or migration, the result is saved and the task state is updated to "completed"; if task T is still not completed after all retries and migrations, the failure reason of the task is recorded and an alarm or manual intervention is triggered.
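The retry-or-migrate decision can be sketched as a small dispatch function. This is a hedged illustration of the rule, not the patent's scheduler: the hook callables (`retry_fn`, `migrate_fn`, `pick_new_subsystem`) are hypothetical stand-ins for the main system's real dispatch paths.

```python
def reschedule(task, subsystem, retries, max_retries,
               retry_fn, migrate_fn, pick_new_subsystem):
    """Retry on the original subsystem while the retry count is below the
    configured maximum; otherwise migrate to a new subsystem chosen by
    the migration mechanism. Returns (target subsystem, updated retries)."""
    if retries < max_retries:
        return retry_fn(task, subsystem), retries + 1
    return migrate_fn(task, pick_new_subsystem()), retries

# Illustrative hooks that just record what happened.
log = []
retry = lambda t, s: log.append(("retry", t, s)) or s
migrate = lambda t, s: log.append(("migrate", t, s)) or s
pick = lambda: "S_prime"

target, r = reschedule("T1", "S", retries=0, max_retries=2,
                       retry_fn=retry, migrate_fn=migrate,
                       pick_new_subsystem=pick)   # retried on original S
target2, _ = reschedule("T1", "S", retries=2, max_retries=2,
                        retry_fn=retry, migrate_fn=migrate,
                        pick_new_subsystem=pick)  # migrated to S'
```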
It should be noted that the selection algorithm for the new subsystem is as follows:
S' = argmin_{S ∈ M} ( Load(S) / Resource(S) ),

wherein M is the set of all available subsystems; Load(S) is the current load of subsystem S, representing the number of tasks the subsystem is currently processing; and Resource(S) is the computing resource of subsystem S, representing its computing capability (such as the number of CPU cores, memory size, etc.). The meaning of the formula is that the ratio of each subsystem's load to its computing resources is calculated, and the subsystem with the smallest ratio is selected as the new migration target for the task. The smaller the ratio, the lighter the subsystem's load relative to its computing resources, making it more suitable as a target for task migration.
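The argmin rule above maps directly onto a `min` with a key function. A minimal sketch, assuming each subsystem is described by a (load, resource) pair; the data layout and the example numbers are illustrative only.

```python
def select_subsystem(subsystems):
    """Pick S' = argmin over S in M of Load(S) / Resource(S): the
    subsystem whose load is lightest relative to its computing
    resources. `subsystems` maps a name to a (load, resource) pair."""
    return min(subsystems, key=lambda s: subsystems[s][0] / subsystems[s][1])

# (tasks currently in flight, CPU cores) — illustrative values.
pool = {"A": (8, 4), "B": (3, 8), "C": (5, 2)}
# Ratios: A = 2.0, B = 0.375, C = 2.5 → B is the migration target.
```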
Optionally, in big data analysis, users often need to perform aggregation analysis and prediction on multidimensional data to gain more comprehensive business insight. For example, in an e-commerce platform, a user may need to analyze sales data over a period of time, aggregate statistics along dimensions such as region and product type, and predict future sales trends. Therefore, this embodiment includes a multi-dimensional aggregation analysis and prediction algorithm, which generates complex multi-dimensional statistical results and trend predictions based on user query instructions, combining multi-dimensional data through aggregation functions and prediction models.
Specifically, in this embodiment, the target data set is extracted according to the user query instruction, and data cleaning and format unification are performed. According to the dimensions specified by the user (such as time, region, and product type), multidimensional aggregation statistics are performed on the data, and a statistical result is generated through an aggregation function (such as summation, average, maximum, minimum, and the like). Based on the aggregation result and the historical data, a prediction model is constructed to predict the future trend, and time-series analysis and machine learning algorithms are applied to generate the prediction result. It should be noted that the aggregation function in this embodiment can be expressed as:

Agg(M, d) = f(v_{1,d}, v_{2,d}, ..., v_{n,d}),

wherein Agg(M, d) represents the result of the aggregate calculation on the data set M over the dimension d; d represents the aggregation dimension specified by the user, such as time, region, or product type; v_{i,d} represents the value of the i-th record on dimension d; and f is the aggregation function, such as summation, average, maximum, or minimum.
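A minimal sketch of the aggregation function: apply f to the values of the specified dimension across every record of the data set. Record layout and field names are illustrative assumptions.

```python
def aggregate(dataset, dim, agg):
    """Agg(M, d) = f(v_1d, ..., v_nd): apply aggregation function `agg`
    to the values of dimension `dim` across all records of `dataset`."""
    return agg([record[dim] for record in dataset])

rows = [{"region": "north", "amount": 120},
        {"region": "south", "amount": 80},
        {"region": "north", "amount": 150}]
total = aggregate(rows, "amount", sum)  # summation → 350
peak = aggregate(rows, "amount", max)   # maximum → 150
```

Any callable that reduces a list of values works as `agg` here (sum, max, min, or a mean helper), matching the formula's interchangeable aggregation function f.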
The prediction formula can be expressed as:

ŷ_t = Σ_{k=1}^{p} φ_k · y_{t−k} + ε_t,

wherein ŷ_t is the predicted value at time point t; y_{t−k} is the historical value at time point t−k, where the historical values are all aggregation results produced by the aggregation function; φ_k are the model parameters, representing the influence weight of each historical value on the predicted value; and ε_t is the error term, representing the random error of the prediction model.
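The weighted-sum prediction can be sketched in a few lines. This assumes the autoregressive reading of the formula above (the prediction is a weighted sum of the most recent aggregated values); the weights here are illustrative constants, not parameters fitted from data.

```python
def predict_next(history, phi, eps=0.0):
    """y_hat_t = sum_k phi_k * y_{t-k} + eps: weighted sum of the last
    len(phi) aggregated values, most recent first, plus an error term."""
    lags = history[-len(phi):][::-1]  # y_{t-1}, y_{t-2}, ...
    return sum(p * y for p, y in zip(phi, lags)) + eps

# Aggregated sales per period (illustrative), two-lag model.
series = [100.0, 110.0, 120.0]
forecast = predict_next(series, phi=[0.7, 0.3])  # 0.7*120 + 0.3*110 = 117.0
```

In practice the weights φ_k would be estimated by a time-series or machine-learning procedure, as the embodiment describes; this sketch only shows how the fitted model is evaluated.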
After the predicted value is obtained, task allocation may be performed using the following formulas:

E_j = A_j(M) / T_j,
j* = argmax_j E_j,

wherein E_j is the computational efficiency of subsystem j in processing the data set M; A_j(M) is the result of subsystem j's multi-dimensional aggregate calculation on the data set M; T_j is the time taken by subsystem j to process the data set M; and j* denotes the subsystem with the highest computational efficiency, i.e., the optimal subsystem. According to the aggregate calculation results and the processing times of the subsystems, the computational efficiency of each subsystem is calculated, and the subsystem with the highest computational efficiency is selected for data processing, thereby optimizing computing performance.
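A minimal sketch of the efficiency rule: score each subsystem as aggregate result over processing time and take the argmax. The data layout and example numbers are assumptions for illustration.

```python
def best_subsystem(results):
    """E_j = A_j(M) / T_j; pick j* = argmax_j E_j. `results` maps a
    subsystem id to a (aggregate_result, processing_time) pair."""
    return max(results, key=lambda j: results[j][0] / results[j][1])

# Two subsystems computed the same aggregate; j2 finished faster.
runs = {"j1": (350.0, 7.0), "j2": (350.0, 5.0)}
# Efficiencies: j1 = 50, j2 = 70 → j2 is the optimal subsystem.
```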
According to the web-based big data analysis method provided by the scheme of the application, a micro-service front-end component of a subsystem is integrated into a low-code platform; shared data of the subsystem is integrated in the low-code platform according to a subsystem loader; the shared data is preprocessed according to a preset cleaning rule; when a data analysis request of a user is received, a data analysis task corresponding to the data analysis request is divided into N sub-tasks according to the preprocessed target shared data; and the N sub-tasks are respectively distributed to the corresponding subsystems for data analysis. By implementing the scheme of the application, the shared data of the subsystems is integrated and preprocessed on the low-code platform; when a data analysis task is executed, it is divided into a plurality of sub-tasks according to the preprocessed target shared data, and the processing efficiency of the data analysis task is effectively improved through a distributed intelligent task scheduling mechanism.
Fig. 2 is a diagram of a web-based big data analysis device according to an embodiment of the present application, which may be used to implement the web-based big data analysis method in the foregoing embodiment. As shown in fig. 2, the web-based big data analysis apparatus mainly includes:
an integration module 10 for integrating the micro-service front-end components of the subsystem into a low code platform;
an integration module 20, configured to integrate the shared data of the subsystem in the low-code platform according to a subsystem loader;
A preprocessing module 30, configured to preprocess the shared data according to a preset cleaning rule;
The dividing module 40 is configured to divide, when a data analysis request of a user is received, a data analysis task corresponding to the data analysis request into N sub-tasks according to the preprocessed target shared data, where N is an integer greater than or equal to 2;
and the distribution module 50 is used for respectively distributing the N sub-tasks to the corresponding sub-systems for data analysis.
In an alternative implementation manner of the embodiment, the integration module is specifically configured to generate a sandbox in the low-code platform, where the sandbox is an independent virtual environment, generate an environment identifier for a micro-service front-end component of the subsystem according to a global manager, and allocate the micro-service front-end component and the environment identifier to the sandbox.
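The sandbox-and-identifier step can be sketched as follows. The class and field names are hypothetical (the patent does not specify an implementation); the sketch only shows the shape of the flow: generate an environment identifier via a global manager and bind the component to an isolated sandbox entry keyed by that identifier.

```python
import uuid

class GlobalManager:
    """Hedged sketch: allocate an environment identifier for a
    subsystem's micro-service front-end component and assign the
    component to its own sandbox (an isolated state dictionary)."""

    def __init__(self):
        self.sandboxes = {}  # env_id -> sandboxed component record

    def assign(self, component):
        env_id = uuid.uuid4().hex  # environment identifier
        self.sandboxes[env_id] = {"component": component, "env": {}}
        return env_id

manager = GlobalManager()
eid = manager.assign("sales-frontend")  # component name is illustrative
```

A window container (see the following paragraph) would then look up the sandbox by this target environment identifier before loading the component.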
In an optional implementation manner of this embodiment, the big data analysis device further includes a determining module, an obtaining module, a creating module, and a loading module. The determining module is used for determining a target subsystem according to the data analysis request. The acquisition module is used for acquiring the target environment identifier corresponding to the target subsystem. The creating module is used for creating a window container according to the target environment identifier. And the loading module is used for loading the micro-service front-end component of the target subsystem to the window container.
In an optional implementation manner of this embodiment, the big data analysis device further includes a sending module and a receiving module. The sending module is used for sending initial data required by data sharing to the micro-service front-end component according to a preset configuration file, and sending the data query instruction to the micro-service front-end component according to an event bus when the data query instruction of a user is detected. The receiving module is used for receiving the shared data sent by the micro-service front-end component based on the initial data and the data query instruction.
Further, in an alternative implementation manner of the embodiment, the big data analysis device further comprises a prediction module. The determining module is also used for determining a target subsystem window which is required to be queried by the user according to the data query instruction. And the prediction module is used for predicting a subsystem window to be accessed according to the target subsystem and the user history query record. The loading module is also used for loading the shared data of the corresponding access subsystem according to the prediction result of the subsystem window to be accessed.
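The prediction step above can be sketched with a simple frequency heuristic: guess that the next subsystem window is the one that most often followed the target window in the user's query history. The heuristic and data shapes are assumptions; the patent does not fix a particular prediction model for this module.

```python
from collections import Counter

def predict_next_window(target, history):
    """Given the window the user is opening and their past query
    sequence, return the window that most frequently followed `target`
    in `history`, or None if there is no precedent."""
    followers = Counter(history[i + 1] for i in range(len(history) - 1)
                        if history[i] == target)
    return followers.most_common(1)[0][0] if followers else None

# Illustrative query history of subsystem windows.
history = ["sales", "inventory", "sales", "finance", "sales", "finance"]
nxt = predict_next_window("sales", history)  # "finance" followed twice
```

The loading module would then prefetch the shared data of the predicted window so it is ready before the user navigates to it.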
In an optional implementation manner of this embodiment, the dividing module is specifically configured to determine a computation logic tree of the data analysis task according to the data analysis request, determine a data format of the target shared data, and divide the data analysis task into a plurality of subtasks according to the computation logic tree and the data format.
In an alternative implementation manner of the embodiment, the big data analysis device further comprises a monitoring module and a processing module. The monitoring module is used for monitoring the task execution state of the subsystem in real time. The processing module is used for triggering a rescheduling mechanism when the main system detects that the subtasks fail or the result is incomplete, and reallocating the incomplete subtasks according to the rescheduling mechanism.
According to the web-based big data analysis device provided by the scheme of the application, a micro-service front-end component of a subsystem is integrated into a low-code platform; shared data of the subsystem is integrated in the low-code platform according to a subsystem loader; the shared data is preprocessed according to a preset cleaning rule; when a data analysis request of a user is received, a data analysis task corresponding to the data analysis request is divided into N sub-tasks according to the preprocessed target shared data; and the N sub-tasks are respectively distributed to the corresponding subsystems for data analysis. By implementing the scheme of the application, the shared data of the subsystems is integrated and preprocessed on the low-code platform; when a data analysis task is executed, it is divided into a plurality of sub-tasks according to the preprocessed target shared data, and the processing efficiency of the data analysis task is effectively improved through a distributed intelligent task scheduling mechanism.
Fig. 3 shows an electronic device provided by an embodiment of the present application. The electronic device can be used to implement the web-based big data analysis method in the foregoing embodiments, and mainly includes:
Memory 301, processor 302, and a computer program 303 stored on the memory 301 and executable on the processor 302, the memory 301 and the processor 302 being communicatively connected. When the processor 302 executes the computer program 303, the web-based big data analysis method in the foregoing embodiments is implemented. The number of processors may be one or more.
The memory 301 may be a high-speed random access memory (RAM) or a non-volatile memory, such as disk storage. The memory 301 is used for storing executable program code, and the processor 302 is coupled to the memory 301.
Further, an embodiment of the present application further provides a computer readable storage medium, which may be provided in the electronic device in each of the foregoing embodiments, and the computer readable storage medium may be a memory in the foregoing embodiment shown in fig. 3.
The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the web-based big data analysis method of the foregoing embodiments. Further, the computer-readable medium may be any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a RAM, a magnetic disk, or an optical disk.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods according to the embodiments of the present application. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
While the application has been described in detail with reference to the foregoing embodiments, it will be understood by those skilled in the art that the foregoing embodiments may be modified or equivalents may be substituted for some of the features thereof, and that the modifications or substitutions do not depart from the spirit and scope of the embodiments of the application.

Claims (10)

1. A web-based big data analysis method, comprising:
Integrating a micro-service front-end component of the subsystem into a low-code platform;
integrating the shared data of the subsystem in the low-code platform according to a subsystem loader;
Preprocessing the shared data according to a preset cleaning rule;
when a data analysis request of a user is received, dividing a data analysis task corresponding to the data analysis request into N sub-tasks according to the preprocessed target shared data, wherein N is an integer greater than or equal to 2;
Respectively distributing the N sub-tasks to the corresponding sub-systems for data analysis;
the method further comprises the steps of:
The method comprises the steps of carrying out aggregation statistics on multidimensional data of shared data according to a dimension appointed in a user query instruction, generating an aggregation result according to an aggregation function, constructing a prediction model based on the aggregation result and corresponding historical data, and generating a prediction result reflecting future trend;
The above aggregate result is calculated by the following formula:
Agg(M, d) = f(v_{1,d}, v_{2,d}, ..., v_{n,d}),
wherein Agg(M, d) is expressed as the result of the aggregate calculation on the data set M over the dimension d, d is represented as the aggregation dimension specified by the user, v_{i,d} is represented as the value of the i-th record on dimension d, and f is an aggregation function;
The above prediction result is calculated by the following formula:
ŷ_t = Σ_{k=1}^{p} φ_k · y_{t−k} + ε_t,
wherein ŷ_t is the predicted value at time point t, y_{t−k} is the historical value at time point t−k, the historical values all being aggregation results produced by the aggregation function, φ_k are model parameters representing the influence weight of the historical values on the predicted value, and ε_t is an error term representing a random error of the prediction model;
Task allocation is performed by the following formula:
E_j = A_j(M) / T_j,
j* = argmax_j E_j,
wherein E_j is the computational efficiency of subsystem j in processing the data set M, A_j(M) is the result of subsystem j's multi-dimensional aggregate calculation on the data set M, T_j is the time for subsystem j to process the data set M, and j* is the subsystem with the highest computational efficiency; the computational efficiency of each subsystem is calculated according to the aggregate calculation results and the subsystems' processing times, the subsystem with the highest computational efficiency is selected for data processing, and computing performance is optimized.
2. The web-based big data analysis method of claim 1, wherein the step of integrating the micro-service front-end components of the subsystem into a low code platform comprises:
generating a sandbox in the low code platform, wherein the sandbox is an independent virtual environment;
generating an environment identifier for a micro-service front-end component of the subsystem according to a global manager;
the microservice front end component and the corresponding environment identifier are assigned to the sandbox.
3. The web-based big data analysis method of claim 2, wherein after the step of assigning the micro-service front-end component and the corresponding environment identifier to the sandbox, further comprising:
Determining a target subsystem according to the data analysis request;
acquiring a target environment identifier corresponding to the target subsystem;
creating a window container according to the target environment identifier;
and loading the micro-service front-end component of the target subsystem to the window container.
4. The web-based big data analysis method of claim 1, wherein after the step of integrating the micro-service front-end components of the subsystem into the low code platform, further comprising:
according to a preset configuration file, initial data required by data sharing are sent to the micro-service front-end component;
When a data query instruction of a user is detected, the data query instruction is sent to the micro-service front-end component according to an event bus;
and receiving the shared data sent by the micro-service front-end component based on the initial data and the data query instruction.
5. The web-based big data analysis method of claim 4, wherein the method further comprises:
determining a target subsystem window which is required to be queried by a user according to the data query instruction;
Predicting a subsystem window to be accessed according to the target subsystem and the user history inquiry record;
and loading the shared data of the corresponding access subsystem according to the predicted result of the subsystem window to be accessed.
6. The web-based big data analysis method according to claim 1, wherein when a data analysis request of a user is received, the step of dividing a data analysis task corresponding to the data analysis request into N sub-tasks according to the preprocessed target shared data comprises:
Determining a calculation logic tree of the data analysis task according to the data analysis request;
determining a data format of the target shared data;
and dividing the data analysis task into N sub-tasks according to the calculation logic tree and the data format.
7. The web-based big data analysis method according to claim 1, wherein after the step of assigning the N sub-tasks to the respective sub-systems for data analysis, further comprising:
monitoring the task execution state of the subsystem in real time;
triggering a rescheduling mechanism when the main system detects that the subtasks fail or the result is incomplete;
And reallocating the unfinished subtasks according to the rescheduling mechanism.
8. A web-based big data analysis device, the web-based big data analysis device comprising:
the integration module is used for integrating the micro-service front-end component of the subsystem to the low-code platform;
The integration module is used for integrating the shared data of the subsystem in the low-code platform according to a subsystem loader;
the preprocessing module is used for preprocessing the shared data according to a preset cleaning rule;
The system comprises a dividing module, a data analysis module and a data analysis module, wherein the dividing module is used for dividing a data analysis task corresponding to a data analysis request into N sub-tasks according to preprocessed target shared data when the data analysis request of a user is received, wherein N is an integer greater than or equal to 2;
The distribution module is used for respectively distributing the N sub-tasks to the corresponding sub-systems for data analysis;
The distribution module is also used for carrying out aggregation statistics on the multidimensional data of the shared data according to the dimension appointed in the user query instruction and generating an aggregation result according to an aggregation function, constructing a prediction model based on the aggregation result and corresponding historical data and generating a prediction result reflecting future trend;
The above aggregate result is calculated by the following formula:
Agg(M, d) = f(v_{1,d}, v_{2,d}, ..., v_{n,d}),
wherein Agg(M, d) is expressed as the result of the aggregate calculation on the data set M over the dimension d, d is represented as the aggregation dimension specified by the user, v_{i,d} is represented as the value of the i-th record on dimension d, and f is an aggregation function;
The above prediction result is calculated by the following formula:
ŷ_t = Σ_{k=1}^{p} φ_k · y_{t−k} + ε_t,
wherein ŷ_t is the predicted value at time point t, y_{t−k} is the historical value at time point t−k, the historical values all being aggregation results produced by the aggregation function, φ_k are model parameters representing the influence weight of the historical values on the predicted value, and ε_t is an error term representing a random error of the prediction model;
Task allocation is performed by the following formula:
E_j = A_j(M) / T_j,
j* = argmax_j E_j,
wherein E_j is the computational efficiency of subsystem j in processing the data set M, A_j(M) is the result of subsystem j's multi-dimensional aggregate calculation on the data set M, T_j is the time for subsystem j to process the data set M, and j* is the subsystem with the highest computational efficiency; the computational efficiency of each subsystem is calculated according to the aggregate calculation results and the subsystems' processing times, the subsystem with the highest computational efficiency is selected for data processing, and computing performance is optimized.
9. An electronic device comprising a memory and a processor, wherein:
The processor is used for executing the computer program stored on the memory;
The processor, when executing the computer program, implements the steps of the web-based big data analysis method of any of claims 1 to 7.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the web-based big data analysis method of any of claims 1 to 7.
CN202510061862.3A 2025-01-15 2025-01-15 A web-based big data analysis method and related equipment Active CN119474765B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510061862.3A CN119474765B (en) 2025-01-15 2025-01-15 A web-based big data analysis method and related equipment


Publications (2)

Publication Number Publication Date
CN119474765A CN119474765A (en) 2025-02-18
CN119474765B true CN119474765B (en) 2025-05-09

Family

ID=94583996


Country Status (1)

Country Link
CN (1) CN119474765B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114860240A (en) * 2022-05-25 2022-08-05 杭州安恒信息技术股份有限公司 Low-code page creating method, device, equipment and medium
CN116842297A (en) * 2023-07-19 2023-10-03 广西联合征信有限公司 A microservice-based web front-end application integration system and method
CN117193987A (en) * 2023-10-13 2023-12-08 英联(厦门)金融技术服务股份有限公司 Independent distributed computing and node management method with neutral each other

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9882830B2 (en) * 2015-06-26 2018-01-30 Amazon Technologies, Inc. Architecture for metrics aggregation without service partitioning
CN111581932B (en) * 2020-03-16 2024-09-10 北京掌行通信息技术有限公司 Data-driven big data analysis method, system, device, storage medium and terminal
CN115480769A (en) * 2022-09-15 2022-12-16 苏州九维时空科技有限公司 Low-code platform architecture design method based on hierarchical driving and UI nesting
CN115934680B (en) * 2022-12-23 2023-06-23 乐元素科技(北京)股份有限公司 One-stop big data analysis processing system
CN118377610A (en) * 2024-03-13 2024-07-23 超聚变数字技术有限公司 A method for operating a micro front-end system and a server


Also Published As

Publication number Publication date
CN119474765A (en) 2025-02-18


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant