CN116069819B

CN116069819B - Methods, devices, equipment and media for presenting portrait reports

Info

Publication number: CN116069819B
Application number: CN202310108267.1A
Authority: CN
Inventors: 杨帆; 陈婷; 吴三平; 王宗泽
Original assignee: WeBank Co Ltd
Current assignee: WeBank Co Ltd
Priority date: 2023-01-30
Filing date: 2023-01-30
Publication date: 2025-12-05
Anticipated expiration: 2043-01-30
Also published as: CN116069819A

Abstract

This application discloses a method, apparatus, device, and medium for presenting a profile report, belonging to the field of data storage technology. The method includes: obtaining condition information corresponding to a profile query request and generating profile items based on the condition information; generating a database query statement based on the profile items, and using the database query statement to obtain the profile item's metadata and a profile result set; wherein, the profile metadata serves as an identifier for the condition information; generating a data bucket identifier corresponding to the profile result set based on the profile metadata according to a preset data bucket identifier generation rule; performing distributed data storage proxy on the profile result set based on the data bucket identifier; and generating and presenting a profile report based on the distributed storage of the profile result set. This solves the technical problem that a single data source cannot meet data storage needs, resulting in a storage bottleneck.

Description

Portrait report presentation method, device, equipment and medium

Technical Field

The present application relates to the field of data storage technologies, and in particular, to a portrait report presenting method, a portrait report presenting device, a portrait report presenting apparatus, and a computer readable storage medium.

Background

At present, most user portrait systems are oriented mainly toward displaying the distribution of single-dimensional data tags, so portrait result data is generally stored in a relational database with a single data source or in a non-relational (NoSQL) database. When there are too many portrait dimensions, a single relational or NoSQL database cannot meet the storage requirement as the data volume grows, and query performance on the portrait data drops sharply as the data accumulates. Existing storage schemes are therefore not suitable for complex user portrait systems that require multidimensional data queries, or even interactive, real-time multidimensional queries.

Disclosure of Invention

The main purpose of the application is to provide a portrait report presenting method, apparatus, device, and computer-readable storage medium, aiming to solve the technical problem in the prior art that a single data source cannot meet the data storage requirement and becomes a storage bottleneck.

In order to achieve the above object, the present application provides a portrait report presenting method, including:

acquiring condition information corresponding to a portrait inquiry request, and generating portrait items based on the condition information;

Generating a database query statement according to the portrait item, and acquiring item metadata and an item result set of the portrait item by using the database query statement, wherein the item metadata is the identification of the condition information;

And generating a data bucket identifier corresponding to the item result set according to a preset data bucket identifier generation rule based on the item metadata, performing distributed data storage agent on the item result set according to the data bucket identifier, and generating and presenting a portrait report based on the item result set which is stored in a distributed mode.

The step of generating the data bucket identifier corresponding to the item result set according to the preset data bucket identifier generating rule based on the item metadata includes:

Combining the item metadata to generate a data bucket identifier corresponding to the item result set;

The dimension identification and/or index identification in the data bucket identification are selected according to whether the item result set can be uniquely matched or not.

Illustratively, the step of performing the distributed data storage agent on the item result set according to the data bucket identifier comprises:

Performing multi-data source connection according to the data bucket identifier, and dumping the item result set in a distributed manner to different storage engines;

And carrying out index optimization and sub-table storage on the item result set according to the data bucket identification.

Illustratively, the step of performing multi-data source connection according to the data bucket identifier and dispersedly dumping the item result set to different storage engines includes:

And acquiring the number of dimensions and the dimension cardinality of the item result set, and storing the item result set into a data storage engine corresponding to the number of dimensions and the dimension cardinality.

Illustratively, the step of index optimizing the set of item results based on the data bucket identification includes:

And obtaining the table data volume of the item result set, and optimizing the index of the item result set to be a target index corresponding to the table data volume, wherein the target index comprises a data bucket identifier index, a dimension identifier index, and a combined dimension identifier index.

Illustratively, the step of sub-table storing the item result set according to the data bucket identifier includes:

Generating a table structure according to the number of dimensions contained in the item result set, wherein the table structure comprises the data bucket identifier, the date and the dimension identifier;

after the item result set entries are greater than a preset number, a new table is generated based on the fixed date and sequence number to store a new item result set.

Illustratively, the step of generating and presenting a portrait report based on the item result set stored in a distributed manner includes:

querying and obtaining portrait report associated data from a data storage engine storing the item result set, and generating a visual portrait report for presentation.

The present application also provides a portrait report presenting apparatus, including:

The acquisition generation module is used for acquiring condition information corresponding to the portrait inquiry request and generating portrait items based on the condition information;

The data query module is used for generating a database query statement according to the portrait item and acquiring item metadata and an item result set of the portrait item by using the database query statement, wherein the item metadata is the identification of the condition information;

And the portrait presentation module is used for generating a data bucket identifier corresponding to the item result set according to a preset data bucket identifier generation rule based on the item metadata, carrying out distributed data storage agent on the item result set according to the data bucket identifier, and generating and presenting a portrait report based on the item result set which is stored in a distributed mode.

The application also provides a portrayal report rendering device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the computer program being configured to implement the steps of the portrayal report rendering method described above.

The present application also provides a computer-readable storage medium having a computer program stored thereon, which when executed by a processor, implements the steps of the portrait report presentation method described above.

The application discloses a portrait report presenting method, a portrait report presenting device and a computer readable storage medium, which are used for acquiring condition information corresponding to a portrait inquiry request, generating a portrait item based on the condition information, generating a database inquiry statement according to the portrait item, acquiring item metadata and an item result set of the portrait item by using the database inquiry statement, wherein the item metadata is an identification of the condition information, generating a data bucket identification corresponding to the item result set based on the item metadata according to a preset data bucket identification generating rule, carrying out a distributed data storage agent on the item result set according to the data bucket identification, and generating and presenting a portrait report based on the item result set which is stored in a distributed mode.

According to the application, portrait items are generated from the condition information corresponding to a portrait query request; item metadata and an item result set of the portrait items are obtained by using a database query statement generated from the portrait items; a data bucket identifier corresponding to the item result set is generated from the item metadata according to a preset data bucket identifier generation rule; a distributed data storage agent is applied to the item result set according to the data bucket identifier; and a portrait report is generated and presented based on the item result set stored in a distributed manner. For complex user portrait systems that need multi-dimensional aggregation, portrait data aggregated over different dimensions is automatically stored in different data storage engines: for example, a MySQL database stores single-dimensional data, and an OLAP query engine such as Kylin stores high-cardinality or multi-dimensional combined data. Query performance is therefore maintained for multi-dimensional portrait queries in user portrait scenarios with many dimensions and high-cardinality dimensions. In addition, connections to various heterogeneous data sources can be managed effectively: the system automatically makes a decision based on the dimensions of the data set output by the portrait system and then dumps the data to different heterogeneous data storage engines.

Drawings

FIG. 1 is a schematic diagram of an operating device of a hardware operating environment according to an embodiment of the present application;

FIG. 2 is a flow chart of an embodiment of a portrait report presentation method according to an embodiment of the present application;

FIG. 3 is a schematic application diagram of an embodiment of a portrait report presentation method according to an embodiment of the present application;

FIG. 4 is a flow chart of another embodiment of a portrait report presentation method according to an embodiment of the present application;

FIG. 5 is a schematic diagram of an apparatus for presenting a portrait report according to an embodiment of the present application.

The achievement of the objects, functional features and advantages of the present application will be further described with reference to the accompanying drawings, in conjunction with the embodiments.

Detailed Description

It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.

Referring to fig. 1, fig. 1 is a schematic diagram of an operating device of a hardware operating environment according to an embodiment of the present application.

As shown in fig. 1, the operating device may include a processor 1001, such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 enables connection and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless Fidelity (Wi-Fi) interface). The memory 1005 may be a high-speed random access memory (RAM) or a stable non-volatile memory (NVM), such as a disk memory. The memory 1005 may optionally also be a storage device separate from the processor 1001 described above.

It will be appreciated by those skilled in the art that the structure shown in fig. 1 is not limiting of the operating device and may include more or fewer components than shown, or certain components may be combined, or a different arrangement of components.

As shown in fig. 1, an operating system, a data storage module, a network communication module, a user interface module, and a computer program may be included in the memory 1005 as one type of storage medium.

In the running device shown in fig. 1, the network interface 1004 is mainly used for data communication with other devices, the user interface 1003 is mainly used for data interaction with a user, the processor 1001 and the memory 1005 in the running device of the present application can be arranged in the running device, and the running device calls the computer program stored in the memory 1005 through the processor 1001 and performs the following operations:

Illustratively, the processor 1001 may call a computer program stored in the memory 1005, and also perform the following:

The step of generating the data bucket identifier corresponding to the item result set according to a preset data bucket identifier generation rule based on the item metadata comprises the following steps:

The step of performing distributed data storage agent on the item result set according to the data bucket identifier includes:

the step of performing multi-data source connection according to the data bucket identifier and performing distributed dump of the item result set to different storage engines comprises the following steps:

the step of index optimization of the item result set according to the data bucket identifier includes:

the step of sub-table storage of the item result set according to the data bucket identifier comprises the following steps:

the step of generating and presenting a portrait report based on the item result set stored in a distributed manner includes:

An embodiment of the present application provides a portrait report presenting method, referring to fig. 2, in an embodiment of the portrait report presenting method, the method includes:

step S10, obtaining condition information corresponding to the portrait inquiry request, and generating portrait items based on the condition information;

Referring to FIG. 3, in a first stage (S1-portrait item management), a user creates a portrait item step by step. The user selects the customer group to be observed by specifying conditions, then selects the corresponding data labels (dimensions and indexes), and finally combines the conditions and submits a portrait query request. The system then automatically generates a portrait item from the user's input for that request, such as the item, customer group, dimensions, and indexes. The portrait item comprises condition information such as item information, customer group information, dimension information, and index information.
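As a rough illustration of the first stage, a portrait item could be modeled as a small record that bundles the condition information. The class, function, and field names below (`PortraitItem`, `create_portrait_item`, `customer_group_id`, and so on) are hypothetical and do not appear in the application, and the check that at least one data label is present is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class PortraitItem:
    """Hypothetical container for the condition information of one query request."""
    item_id: str
    customer_group_id: str
    dimension_ids: List[str] = field(default_factory=list)
    index_ids: List[str] = field(default_factory=list)


def create_portrait_item(item_id, customer_group_id, dimension_ids, index_ids):
    """Build a portrait item; assumes at least one data label (dimension or index)."""
    if not dimension_ids and not index_ids:
        raise ValueError("a portrait item needs at least one dimension or index")
    return PortraitItem(item_id, customer_group_id, list(dimension_ids), list(index_ids))


# Example values are made up for illustration.
item = create_portrait_item("P001", "G42", ["age_band", "city"], ["avg_balance"])
```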

Step S20, generating a database query statement according to the portrait item, and acquiring item metadata and an item result set of the portrait item by using the database query statement, wherein the item metadata is an identification of the condition information;

Referring to FIG. 3, in a second stage (S2-portrait query engine), the portrait items generated by the system are automatically submitted to the portrait query engine. The portrait query engine automatically assembles an SQL (Structured Query Language) query statement from the metadata defined by the portrait item and submits it to a connected big data analysis platform. After the task has executed, the engine outputs the portrait item metadata and the portrait item result set to the next stage. The metadata mainly comprises identifiers of the associated condition information in the portrait item: the item ID (identifier), the customer group ID, the dimension ID, and the index ID. The result set (dataset) is the data result calculated by the portrait system.
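The query assembly in the second stage might look roughly like the following sketch, which builds a grouped aggregation statement from the dimensions and indexes of a portrait item. The function name, the table and column names, and the shape of the generated SQL are all assumptions made for illustration; the application does not specify them. Identifiers are assumed to come from trusted metadata rather than raw user input.

```python
def build_portrait_sql(table, dimension_columns, index_expressions, customer_group_filter):
    """Assemble a GROUP BY query from hypothetical portrait item metadata.

    index_expressions maps an output alias to an aggregate expression.
    """
    select_parts = list(dimension_columns) + [
        f"{expr} AS {alias}" for alias, expr in index_expressions.items()
    ]
    sql = f"SELECT {', '.join(select_parts)} FROM {table} WHERE {customer_group_filter}"
    if dimension_columns:
        # Each selected dimension becomes a grouping key for the aggregates.
        sql += f" GROUP BY {', '.join(dimension_columns)}"
    return sql


sql = build_portrait_sql(
    table="customer_tags",
    dimension_columns=["age_band", "city"],
    index_expressions={"avg_balance": "AVG(balance)"},
    customer_group_filter="customer_group_id = 'G42'",
)
```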

Step S30, generating a data bucket identifier corresponding to the item result set according to a preset data bucket identifier generation rule based on the item metadata, performing distributed data storage agent on the item result set according to the data bucket identifier, and generating and presenting a portrait report based on the item result set which is stored in a distributed mode.

Referring to FIG. 3, in a third stage (S3-data bucket distributor), the system collects the item metadata and the item result set output by the second stage into the data bucket distributor. Based on the item metadata, the data bucket distributor generates a globally unique data bucket ID for each output item result set according to a preset data bucket identifier generation rule. In a fourth stage (S4-distributed data storage agent), the system uses the distributed data storage agent to manage connections to the different data storage engines, applies the agent to the item result set according to the data bucket identifier, and finally generates and presents the portrait report based on the item result set stored in a distributed manner. The data storage engines include, but are not limited to, a traditional RDBMS (MySQL), a NoSQL database (MongoDB), a columnar storage database (ClickHouse), and a pre-calculation engine (Kylin).

Referring to FIG. 3, in a fifth stage (S5-portrait presentation), when a user opens a portrait report, the result set must be organized into a suitable structure before it can be shown in the report. The query request is therefore sent to a portrait presentation module, which automatically queries the data associated with the portrait report from the different data storage engines, based on the metadata corresponding to the relevant item context information, and generates a visual report for presentation to the user. In a practical application scenario, the technical solution of the application applies to two stages of a user portrait system: the user query request and the portrait report presentation.
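The interplay between the storage agent and the presentation module could be sketched as follows: the agent remembers which engine holds each data bucket, so report data can later be fetched by bucket ID alone. `InMemoryEngine` and `StorageAgent` are hypothetical stand-ins; a real deployment would wrap actual connections to the storage engines, which this sketch does not attempt.

```python
class InMemoryEngine:
    """Stand-in for a real storage engine connection (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self._rows = {}

    def store(self, bucket_id, rows):
        self._rows[bucket_id] = list(rows)

    def query(self, bucket_id):
        return self._rows.get(bucket_id, [])


class StorageAgent:
    """Routes result sets to engines and remembers where each bucket lives."""

    def __init__(self, engines):
        self._engines = {engine.name: engine for engine in engines}
        self._location = {}  # bucket ID -> engine name

    def store(self, bucket_id, rows, engine_name):
        self._engines[engine_name].store(bucket_id, rows)
        self._location[bucket_id] = engine_name

    def load_report_data(self, bucket_id):
        # The presentation side only needs the bucket ID, not the engine.
        engine = self._engines[self._location[bucket_id]]
        return engine.query(bucket_id)


agent = StorageAgent([InMemoryEngine("rdbms"), InMemoryEngine("olap")])
agent.store("bucket-1", [("18-25", 120)], engine_name="rdbms")
agent.store("bucket-2", [("18-25", "Shenzhen", 35)], engine_name="olap")
report_rows = agent.load_report_data("bucket-2")
```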

In this embodiment, a portrait item is generated from the condition information corresponding to a portrait query request; item metadata and an item result set of the portrait item are obtained by using a database query statement generated from the portrait item; a data bucket identifier corresponding to the item result set is generated from the item metadata according to a preset data bucket identifier generation rule; a distributed data storage agent is applied to the item result set according to the data bucket identifier; and a portrait report is generated and presented based on the item result set stored in a distributed manner. For complex user portrait systems that need multi-dimensional aggregation, portrait data aggregated over different dimensions is automatically stored in different data storage engines: for example, a MySQL database stores single-dimensional data, and an OLAP query engine such as Kylin stores high-cardinality or multi-dimensional combined data. Query performance is therefore maintained for multi-dimensional portrait queries in user portrait scenarios with many dimensions and high-cardinality dimensions. In addition, connections to various heterogeneous data sources can be managed effectively: the system automatically makes a decision based on the dimensions of the data set output by the portrait system and then dumps the data to different heterogeneous data storage engines.

In another embodiment of the portrait report presenting method of the present application, the step of generating a data bucket identifier corresponding to the item result set according to a preset data bucket identifier generating rule based on the item metadata includes:

In this embodiment, a method for generating a data bucket identifier corresponding to a project result set is provided.

The globally unique data bucket ID generator generates a globally unique data bucket ID through a fixed algorithm rule, and the data bucket ID can be used in the subsequent scenes of data set storage, data index optimization, portrait data query and the like. One of the data bucket identifier generation rules is an identifier of condition information corresponding to a combined item result set, and the generated data bucket identifier is:

Wherein the task ID identifies the repeatedly executable computing task between S2 and S3 in FIG. 3, and the items in brackets are optional: the dimension ID and/or the index ID are included as required by the output data result set, and the data bucket ID must contain at least one data tag ID (a dimension ID or an index ID). When the metadata defined by the portrait item includes a dimension ID or an index ID, these IDs are combined when the data bucket identifier is generated so that the identifier designates exactly one piece of portrait data. This prevents duplicate identifiers and failures of unique matching, and makes the data easier to distinguish and look up.
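A minimal sketch of such a generation rule is given below. The exact identifier format is not reproduced in this text, so the field order, the underscore separator, and the function name `make_bucket_id` are assumptions; only the requirement that at least one dimension ID or index ID be present comes from the description.

```python
def make_bucket_id(item_id, customer_group_id, task_id,
                   dimension_ids=(), index_ids=(), sep="_"):
    """Combine metadata identifiers into one bucket ID (format is an assumption)."""
    if not dimension_ids and not index_ids:
        raise ValueError("bucket ID needs at least one dimension ID or index ID")
    parts = [item_id, customer_group_id, task_id, *dimension_ids, *index_ids]
    return sep.join(str(part) for part in parts)


# The same metadata always yields the same identifier.
bucket_id = make_bucket_id("P001", "G42", "T7", dimension_ids=["D3"], index_ids=["I9"])
```

Because the identifier is a pure function of the metadata, it can be recomputed at query time instead of being looked up separately.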

Referring to FIG. 4, in another embodiment of the portrayal report presentation method of the present application, the step of performing a distributed data storage agent on the set of item results based on the data bucket identification includes:

Step S30A, performing multi-data source connection according to the data bucket identification, and dispersedly dumping the project result set to different storage engines;

and step S30B, carrying out index optimization and sub-table storage on the item result set according to the data bucket identification.

In this embodiment, a distributed data storage agent method is proposed. During data set storage, the method automatically optimizes indexes and splits the data across tables and databases, improving data query performance.

The distributed data storage agent performs mainly:

1. Multi-data source connection: based on the dimensions and the dimension cardinalities of the result sets output upstream, the agent automatically dumps the result sets to different storage engines;

2. Automatic index optimization: the system uses the data bucket ID generated in the third stage to automatically optimize indexes on the data set that has just been queried and stored, so as to improve data query performance;

3. Automatic sub-table storage: for conventional RDBMS (Relational Database Management System) storage engines, the system automatically stores the portrait dataset in separate tables based on storage space and query performance considerations.

In addition, current application systems mostly consider the storage of portrait data sets from a static perspective. As more and more users frequently use the portrait system and view portrait reports based on existing portrait result sets, the system also needs to consider caching data results from a dynamic perspective. The application system can therefore be extended further: in an embodiment, a cache management module can be added, with access to cache management middleware (e.g., Redis, Memcached).
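The caching idea can be illustrated with a small in-process least-recently-used cache keyed by data bucket ID. This is only a stand-in for middleware such as Redis or Memcached; the class name `ResultCache`, the capacity, and the eviction policy are assumptions and are not described in the application.

```python
from collections import OrderedDict


class ResultCache:
    """Tiny in-process LRU cache keyed by data bucket ID (illustrative only)."""

    def __init__(self, capacity=128):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get_or_load(self, bucket_id, loader):
        if bucket_id in self._entries:
            self._entries.move_to_end(bucket_id)  # mark as most recently used
            return self._entries[bucket_id]
        value = loader(bucket_id)  # cache miss: fall through to the storage engine
        self._entries[bucket_id] = value
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value


calls = []


def load_from_engine(bucket_id):
    """Hypothetical loader that records how often the engine is hit."""
    calls.append(bucket_id)
    return [bucket_id, "rows"]


cache = ResultCache(capacity=2)
first = cache.get_or_load("b1", load_from_engine)
second = cache.get_or_load("b1", load_from_engine)  # served from the cache
```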

Illustratively, step S30A, performing multi-data source connection according to the data bucket identifier, and dispersedly dumping the item result set to different storage engines, which includes:

The system judges the data set produced in the portrait inquiry process, and dumps different data sets to different data storage engines according to rules. Exemplary, specific dump rules are shown in the following table:

Hit rule	Storage engine
Dimension number <= N1	RDBMS (MySQL)
Dimension number >= N2 and dimension number < N3 and dimension cardinality < C1	RDBMS (MySQL)
Dimension number >= N2 and dimension number < N3 and dimension cardinality >= C1	OLAP (ClickHouse, Kylin)
Dimension number >= N3	OLAP (ClickHouse, Kylin)

Wherein the dimension cardinality is the number of distinct dimension values in a given dimension, OLAP stands for On-Line Analytical Processing, N1, N2, N3 and C1 are all preset values, and N1 < N2 < N3.
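The dump rules in the table can be sketched as a small routing function. The function name, the use of the largest cardinality among the dimensions, and the example threshold values are assumptions. The table states no rule for a dimension number strictly between N1 and N2, so this sketch raises an error in that case rather than guessing.

```python
def choose_storage_engine(dimension_count, max_cardinality, n1, n2, n3, c1):
    """Apply the dump rules; n1, n2, n3 and c1 are preset values with n1 < n2 < n3."""
    if not (n1 < n2 < n3):
        raise ValueError("thresholds must satisfy n1 < n2 < n3")
    if dimension_count <= n1:
        return "RDBMS"
    if dimension_count >= n3:
        return "OLAP"
    if dimension_count >= n2:
        # Mid-range dimension counts are split by cardinality.
        return "RDBMS" if max_cardinality < c1 else "OLAP"
    # The table does not state a rule for n1 < dimension_count < n2.
    raise ValueError("no dump rule matches this dimension count")


# Hypothetical thresholds chosen only for the example.
thresholds = dict(n1=1, n2=2, n3=4, c1=1000)
engine_single = choose_storage_engine(1, 50_000, **thresholds)
engine_mid_low = choose_storage_engine(3, 200, **thresholds)
engine_mid_high = choose_storage_engine(3, 50_000, **thresholds)
engine_many = choose_storage_engine(5, 10, **thresholds)
```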

Illustratively, step S30B of index optimizing the set of item results based on the data bucket identification comprises:

After the portrait system data is put in storage, users may frequently query portrait graph data through a portrait presentation module, so in order to improve data query performance, the system performs automatic index optimization on each portrait data table. Exemplary, specific index optimization rules are shown in the following table:

Wherein BucketID is the data bucket identifier, BucketID_index is the data bucket identifier index, dim is a dimension identifier, and dim_1_index is a dimension identifier index.
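One way to picture index optimization driven by table data volume is a function that emits more index statements as a table grows. The row-count thresholds, the tiering, the `DIM_n` column names, and the name `dim_combined_index` are assumptions for illustration, since the original rule table is not reproduced in this text; only the index names `BucketID_index` and `dim_1_index` follow the description.

```python
def index_statements(table, dimension_count, row_count,
                     small=100_000, large=1_000_000):
    """Return CREATE INDEX statements for a portrait table (tiers are assumed)."""
    # Every table gets an index on the data bucket identifier.
    statements = [f"CREATE INDEX BucketID_index ON {table} (BucketID)"]
    dims = [f"DIM_{i}" for i in range(1, dimension_count + 1)]
    if row_count >= small:
        # Larger tables also get one index per dimension column.
        statements += [
            f"CREATE INDEX dim_{i}_index ON {table} (DIM_{i})"
            for i in range(1, dimension_count + 1)
        ]
    if row_count >= large and len(dims) > 1:
        # The largest tables additionally get a combined dimension index.
        statements.append(
            f"CREATE INDEX dim_combined_index ON {table} ({', '.join(dims)})"
        )
    return statements


small_table = index_statements("portrait_dim2", 2, 50)
large_table = index_statements("portrait_dim2", 2, 2_000_000)
```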

Illustratively, step S30B, the sub-table storing the item result set according to the data bucket identifier includes:

The portrait system splits data across databases and tables for two main reasons. First, the portrait data sets processed by the system have different numbers of dimensions; storing them directly in one wide table wastes storage space, reduces data query performance, and makes further index optimization inconvenient. Second, over time the portrait system accumulates a very large number of user-generated portrait data sets; if the data is not split across tables, the table space keeps growing and queries become slower.

Thus, in this embodiment, the system performs sub-table storage of the portrait dataset with reference to the following two rules. Therefore, the query performance of the portrait data is greatly improved, and meanwhile, the frequently accessed data blocks are correspondingly buffer optimized. The method solves the problem of data query performance in the prior art.

Rule one: split tables by the number of data dimensions. Data sets with different numbers of dimensions are stored in different table structures, which are generated from the number of dimensions contained in the data set.

The resulting table structure is approximately as follows:

wherein BucketID is a data bucket identifier, D_DATE is a DATE, DIM_n is a dimension identifier, and IND_value is a selectable option, i.e. an index value.
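Generating a table structure from the number of dimensions could be sketched as below. The column names BucketID, D_DATE, DIM_n and IND_value follow the description; the table name pattern `portrait_dim<n>`, the column types, and the function name are assumptions.

```python
def create_table_sql(dimension_count, with_index_value=True):
    """Build a CREATE TABLE statement for result sets with a given dimension count.

    The table name pattern and the column types are assumed for illustration.
    """
    columns = ["BucketID VARCHAR(128) NOT NULL", "D_DATE DATE NOT NULL"]
    columns += [f"DIM_{i} VARCHAR(64)" for i in range(1, dimension_count + 1)]
    if with_index_value:
        columns.append("IND_value DOUBLE")  # optional index value column
    return f"CREATE TABLE portrait_dim{dimension_count} ({', '.join(columns)})"


ddl_two_dims = create_table_sql(2)
```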

Rule two: split tables once the number of data set entries is >= N. When the number of data set entries in a table exceeds the preset number N, the application system automatically stores newly output data sets in a new table. The new table is named with a fixed date format plus a serial number, and a date column is included in the table structure.
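The rollover in rule two might be sketched as follows: once the current table holds N or more entries, the next result set goes to a table with the next serial number. The `YYYYMMDD` date format, the three-digit serial number, and the function name are assumptions; the description only says that a fixed date format plus a serial number is used.

```python
from datetime import date


def next_table_name(base, current_rows, max_rows, today, sequence):
    """Return (table_name, sequence) for storing the next result set.

    The serial number advances once the current table has reached max_rows.
    """
    if current_rows >= max_rows:
        sequence += 1
    return f"{base}_{today.strftime('%Y%m%d')}_{sequence:03d}", sequence


# Hypothetical values: the current table is full, so a new one is named.
name_full, seq_full = next_table_name(
    "portrait_dim2", 1_000_000, 1_000_000, date(2023, 1, 30), 1
)
name_room, seq_room = next_table_name(
    "portrait_dim2", 10, 1_000_000, date(2023, 1, 30), 1
)
```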

Referring to fig. 5, an embodiment of the present application further provides a portrait report presenting apparatus, including:

the acquisition generation module M1 is used for acquiring condition information corresponding to the portrait inquiry request and generating portrait items based on the condition information;

The data query module M2 is used for generating a database query statement according to the portrait item and acquiring item metadata and an item result set of the portrait item by using the database query statement, wherein the item metadata is the identification of the condition information;

And the portrayal presentation module M3 is used for generating a data bucket identifier corresponding to the item result set according to a preset data bucket identifier generation rule based on the item metadata, carrying out distributed data storage agent on the item result set according to the data bucket identifier, and generating and presenting a portrayal report based on the item result set which is stored in a distributed mode.

Illustratively, the portrait rendering module is further to:

The portrait report presenting device provided by the application adopts the portrait report presenting method of the above embodiments and thereby solves the technical problem in the prior art that a single data source cannot meet the data storage requirement and becomes a storage bottleneck. Compared with the prior art, the portrait report presenting device provided by the embodiment of the application has the same beneficial effects as the portrait report presenting method provided by the above embodiments, and its other technical features are the same as those disclosed for the method, and are not repeated herein.

In addition, the embodiment of the application also provides a portrait report presenting device which comprises a memory, a processor and a computer program which is stored in the memory and can run on the processor, wherein the computer program is configured to realize the steps of the portrait report presenting method.

In addition, the embodiment of the application also provides a computer readable storage medium, wherein the computer readable storage medium stores a computer program, and the computer program realizes the steps of the portrait report presenting method when being executed by a processor.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.

From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) as described above, comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present application.

The foregoing description is only of the preferred embodiments of the present application, and is not intended to limit the scope of the application, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims

1. A portrait report presentation method, the method comprising:

generating a data bucket identifier corresponding to the item result set according to a preset data bucket identifier generation rule based on the item metadata, performing multi-data source connection according to the data bucket identifier, performing distributed dump on the item result set to different storage engines, performing index optimization and sub-table storage on the item result set according to the data bucket identifier, and generating and presenting a portrait report based on the item result set stored in a distributed mode.

2. The portrait report presentation method of claim 1 wherein the step of generating a data bucket identifier corresponding to the item result set according to a preset data bucket identifier generation rule based on the item metadata includes:

3. The portrayal report presentation method of claim 1, wherein the step of performing a multi-data source connection based on the data bucket identification to scatter dump the set of item results to different storage engines comprises:

4. The portrayal report presentation method of claim 1, wherein the step of index optimizing the set of item results based on the data bucket identification comprises:

5. The portrayal report presentation method of claim 1, wherein the step of sub-table storing the set of item results according to the data bucket identification comprises:

6. The portrayal report presentation method as recited in claim 1, wherein the step of generating and presenting a portrayal report based on the set of item results stored in a distributed manner comprises:

7. A portrayal report presentation apparatus, the portrayal report presentation apparatus comprising:

The portrait presentation module is used for generating a data bucket identifier corresponding to the item result set according to a preset data bucket identifier generation rule based on the item metadata, performing multi-data source connection according to the data bucket identifier, performing scattered dumping on the item result set to different storage engines, performing index optimization and sub-table storage on the item result set according to the data bucket identifier, and generating and presenting a portrait report based on the item result set which is stored in a distributed mode.

8. A portrayal report rendering device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the computer program being configured to implement the steps of the portrayal report rendering method according to any one of claims 1 to 6.

9. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, which when executed by a processor, implements the steps of the portrait report presentation method according to any one of claims 1 to 6.