Detailed Description
In order to explain technical contents, achieved objects, and effects of the present invention in detail, the following description is made with reference to the accompanying drawings in combination with the embodiments.
Referring to fig. 1, a data value evaluation analysis method based on data attribute analysis includes:
step S1, calculating and obtaining the first data asset value of a single attribute data set with a single data attribute under a plurality of different application scenes according to a data value evaluation analysis algorithm;
step S2, acquiring data attribute sets required under different application scenes, and combining all data attributes in the data attribute sets into a plurality of data attribute subsets, wherein each data attribute subset is associated with a corresponding application scene and at least comprises two data attributes;
step S3, taking the data sets of all data attributes under the same data attribute subset as a multi-attribute data set, and calculating and obtaining a second data asset value of each multi-attribute data set under the associated application scene according to a data value evaluation analysis algorithm;
and step S4, obtaining a data set with the highest data asset value in the current application scene according to the first data asset value and the second data asset value in the current application scene.
From the above description, the beneficial effects of the present invention are as follows: the first data asset value of each single-attribute data set, which has a single data attribute, is calculated under a plurality of different application scenes according to a data value evaluation analysis algorithm, and the second data asset value of each multi-attribute data set is calculated under its associated application scene; the data set with the highest data asset value in the current application scene is then obtained from the first data asset value and the second data asset value in that scene. In this way, the data asset value of each single data attribute is taken into account, and whether a combination of multiple data attributes can generate a higher data asset value is also evaluated. Data sets with higher data asset value are thereby mined so as to reflect the real value of the data asset of each data attribute, which indirectly improves the value of the data assets and achieves accurate value evaluation of the data assets.
Further, the calculation process of the data value evaluation analysis algorithm specifically includes:
step S11, traversing all data of a first data set, and acquiring the number of missing data fields, the number of data fields not conforming to the corresponding data attribute specification, and whether the data field values on the matching associated items of all data tables are consistent, so as to sequentially obtain an integrity value, an effectiveness value and a consistency value, wherein the first data set is a single-attribute data set or a multi-attribute data set;
step S12, acquiring all professional data related to data value evaluation and analysis in academic papers, academic journals and published patents; screening out, from all the professional data, first professional data containing integrity, effectiveness and consistency; converting the specific proportion relation of the integrity, the effectiveness and the consistency in each item of first professional data into a relative proportion relation by normalizing it so that it sums to 1; accumulating all the relative proportion relations corresponding to the integrity, the effectiveness and the consistency to obtain a quality weight ratio of the three; and calculating on the integrity value, the effectiveness value and the consistency value of the first data set according to the quality weight ratio to obtain a data quality score of the first data set;
step S13, obtaining a rarity value, a timeliness value, a consumption value, and a feasibility value in sequence according to the number of data sources and data update timeliness of different data attributes in the first data set, consumption data of a first application scenario, and a ratio between the type of the data attributes in the first data set and the data attributes required by the first application scenario, where the first application scenario is any one of a plurality of different application scenarios;
step S14, screening out, from all the professional data, second professional data containing rarity, timeliness, consumption and feasibility; converting the specific proportion relation of the rarity, the timeliness, the consumption and the feasibility in each item of second professional data into a relative proportion relation by normalizing it so that it sums to 1; accumulating all the relative proportion relations corresponding to the rarity, the timeliness, the consumption and the feasibility to obtain a scene weight ratio of the four; and calculating on the rarity value, the timeliness value, the consumption value and the feasibility value of the first data set according to the scene weight ratio to obtain a data scene score of the first data set;
and step S15, taking the product of the data quality score and the data scene score as the data asset value.
From the above description, it can be known that the calculation is performed on different dimensional data in terms of data quality and application scenarios, and the weight ratio of each dimensional data is retrieved and analyzed according to all professional data related to data value evaluation analysis in academic papers, academic journals and published patents, so that the setting of the weight ratio is more accurate, and the accurate value evaluation of the data assets is realized.
Further, screening out the first professional data containing the integrity, the validity and the consistency from all the professional data specifically includes: screening out, from all the professional data, first professional data containing at least two of the three properties of completeness, effectiveness and consistency;
screening out the second professional data containing rarity, timeliness, consumption and feasibility from all the professional data specifically includes: screening out, from all the professional data, second professional data containing at least two of the four properties of rarity, timeliness, consumption and feasibility;
when a specific proportion relation is converted into a relative proportion relation summing to 1, any property that does not appear in the specific proportion relation is counted as 0.
From the above description, professional data that covers only some of the properties (at least two) still has reference value, so including it increases the amount of data and makes the setting of the weight ratio more accurate.
Further, a value range is given in advance by an expert terminal for each of the quality weight ratios of completeness, validity and consistency and for each of the scene weight ratios of rarity, timeliness, consumability and feasibility;
if each weight ratio obtained in the step S12 is within the corresponding value range, the integrity value, the validity value and the consistency value of the first data set are calculated according to the quality weight ratio to obtain a data quality score of the first data set; otherwise, each generated weight ratio is sent to the expert terminal;
if each weight ratio obtained in the step S14 is within the corresponding value range, the rarity value, the timeliness value, the consumption value and the feasibility value of the first data set are calculated according to the scene weight ratio to obtain a data scene score of the first data set; otherwise, each generated weight ratio is sent to the expert terminal.
From the above description, a value range is set to constrain the result of the big data analysis, so that possible deviations introduced by machine learning are caught through manual review, which ensures the accuracy of the weight ratio.
Further, there are a plurality of expert terminals, and the value range is obtained through negotiation and discussion among the plurality of expert terminals.
Further, sending each generated weight ratio to the expert terminal in step S14 specifically includes:
sending each generated weight ratio, together with a plurality of items of first professional data that are closest to the generated weight ratio, to the expert terminal.
From the above description, when a weight ratio falls outside the set value range, that is, when the manual constraint and the machine learning result disagree, the items of first professional data that are closest to the generated weight ratio are sent to the expert terminal. After reading the relevant professional data, the expert judges whether the weight ratio is reasonable and reliable, so that the weight ratio is set jointly by manual review and by machine, which ensures its accuracy.
Further, the step S4 specifically includes:
and obtaining a data set with the highest data asset value in the current application scene according to the first data asset value and the second data asset value which are larger than the data cost in the current application scene.
Further, the sum of the quality weight ratios of the completeness, the effectiveness and the consistency is 1, and the sum of the scene weight ratios of the rareness, the timeliness, the consumability and the feasibility is 1.
Further, the step S1 is preceded by the following steps:
and carrying out metadata management on the original data, and taking the obtained metadata as a data set.
Referring to fig. 2, the data value evaluation and analysis apparatus based on data attribute analysis includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the data value evaluation and analysis method based on data attribute analysis when executing the computer program.
From the above description, the beneficial effects of the present invention are as follows: the first data asset value of each single-attribute data set, which has a single data attribute, is calculated under a plurality of different application scenes according to a data value evaluation analysis algorithm, and the second data asset value of each multi-attribute data set is calculated under its associated application scene; the data set with the highest data asset value in the current application scene is then obtained from the first data asset value and the second data asset value in that scene. In this way, the data asset value of each single data attribute is taken into account, and whether a combination of multiple data attributes can generate a higher data asset value is also evaluated. Data sets with higher data asset value are thereby mined so as to reflect the real value of the data asset of each data attribute, which indirectly improves the value of the data assets and achieves accurate value evaluation of the data assets.
Referring to fig. 1, a first embodiment of the present invention is:
the data value evaluation analysis method based on data attribute analysis comprises the following steps:
step S0 is to perform metadata management on the raw data, and use the obtained metadata as a data set.
That is, all collected data is converted into metadata for subsequent statistical analysis.
Step S1, calculating and obtaining the first data asset value of a single attribute data set with a single data attribute under a plurality of different application scenes according to a data value evaluation analysis algorithm;
in this embodiment, the data attribute refers to what the data represents, such as the data attribute being the purchased goods, the amount of consumption, the location information, and so on. In step S1, asset valuations are performed for only a single data attribute.
Step S2, acquiring a data attribute set required under different application scenes, and combining all data attributes in the data attribute set into a plurality of data attribute subsets, wherein each data attribute subset is associated with the corresponding application scene and at least comprises two data attributes;
in this embodiment, two application scenes A and B are involved, and there are three data attributes: purchased items, consumption amount and location information. Application scene A requires two data attributes, purchased items and consumption amount, so it has only one data attribute subset. Application scene B requires all three data attributes, purchased items, consumption amount and location information, so it has four data attribute subsets (three pairs and one triple).
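The subset counts in this example can be checked with a minimal sketch. The function name `attribute_subsets` and the attribute identifiers are illustrative assumptions, not names from the specification; the sketch simply enumerates every combination of at least two required attributes.

```python
from itertools import combinations

def attribute_subsets(required_attributes):
    """Return every subset of the required attributes that has at least two members."""
    attrs = sorted(required_attributes)
    return [
        frozenset(combo)
        for size in range(2, len(attrs) + 1)
        for combo in combinations(attrs, size)
    ]

# Attribute requirements of the two application scenes in this embodiment
# (identifier spellings are illustrative).
scenario_requirements = {
    "A": {"purchased_item", "consumption_amount"},
    "B": {"purchased_item", "consumption_amount", "location_info"},
}

subsets_by_scenario = {
    name: attribute_subsets(attrs) for name, attrs in scenario_requirements.items()
}
```

Under these assumptions, scene A yields one subset and scene B yields four, matching the counts stated above.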
Step S3, taking the data sets of all data attributes under the same data attribute subset as a multi-attribute data set, and calculating and obtaining a second data asset value of each multi-attribute data set under the associated application scene according to a data value evaluation analysis algorithm;
therefore, whether the combination of the data attributes can generate higher data asset value or not is evaluated, so that a data set with higher data asset value is mined to reflect the real value of the data asset of each data attribute, the value of the data asset is indirectly improved, and the accurate value evaluation of the data asset is realized.
And step S4, obtaining a data set with the highest data asset value in the current application scene according to the first data asset value and the second data asset value which are larger than the data cost in the current application scene.
That is, the data cost is mainly the hardware cost in data storage, and if the value of the data asset is less than the data cost, the data asset does not need to be stored, and the data asset can be directly discarded.
Referring to fig. 1, a second embodiment of the present invention is:
based on the first embodiment, the data value evaluation analysis method based on data attribute analysis specifically includes the following steps:
step S11, traversing all data of a first data set, and acquiring the number of missing data fields, the number of data fields not conforming to the corresponding data attribute specification, and whether the data field values on the matching associated items of all data tables are consistent, so as to sequentially obtain an integrity value, an effectiveness value and a consistency value, wherein the first data set is a single-attribute data set or a multi-attribute data set;
in this embodiment, the data records whose data fields are missing or incomplete are subtracted from the total, and the ratio of the remaining number of records to the total number of records is taken as the integrity value, which is 95% in this embodiment; the validity value and the consistency value are 90% and 96%, respectively.
Step S12, acquiring all professional data related to data value evaluation and analysis in academic papers, academic journals and published patents; screening out, from all the professional data, first professional data containing integrity, effectiveness and consistency; converting the specific proportion relation of the integrity, the effectiveness and the consistency in each item of first professional data into a relative proportion relation summing to 1; accumulating all the relative proportion relations corresponding to the integrity, the effectiveness and the consistency to obtain a quality weight ratio of the three; and calculating on the integrity value, the effectiveness value and the consistency value of the first data set according to the quality weight ratio to obtain a data quality score of the first data set;
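As a minimal sketch of the integrity calculation, assuming that a record counts as incomplete when any required field is absent, `None` or an empty string (the specification does not define the exact test), the ratio could be computed as follows. The function name and the sample records are hypothetical.

```python
def integrity_value(records, required_fields):
    """Share of records in which every required field is present and non-empty."""
    if not records:
        raise ValueError("records must not be empty")
    complete = sum(
        1
        for record in records
        if all(record.get(field) not in (None, "") for field in required_fields)
    )
    return complete / len(records)

# Hypothetical sample: 19 complete records and 1 record with a missing amount.
sample_records = [{"item": "book", "amount": 10}] * 19 + [{"item": "book", "amount": None}]
sample_integrity = integrity_value(sample_records, ["item", "amount"])
```

With 19 of 20 records complete, the sample gives the same 95% figure used in this embodiment.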
wherein the sum of the quality weight ratios of integrity, validity and consistency is 1.
In this embodiment, first professional data containing at least two of the three properties of integrity, effectiveness and consistency is screened out from all the professional data; that is, an item of professional data qualifies as first professional data as long as any two of the properties appear in it. The specific proportion relation is then converted into a relative proportion relation summing to 1, and any property that does not appear is counted as 0. For example, if an item of professional data lists only integrity and validity, with values of 100 and 50 respectively, they become 2/3 and 1/3 after conversion, and consistency is 0: because consistency is not listed in that item of professional data, its author is taken to regard the weight of consistency as unimportant, so it is set to 0.
In this embodiment, a value range is given in advance by the expert terminals for each of the quality weight ratios of integrity, validity and consistency and for each of the scene weight ratios of rarity, timeliness, consumption and feasibility. That is, if each weight ratio obtained in step S12 is within its corresponding value range, the integrity value, validity value and consistency value of the first data set are calculated according to the quality weight ratio to obtain the data quality score of the first data set; otherwise, each generated weight ratio and a plurality of items of professional data closest to the generated weight ratio are sent to the expert terminals;
there are at least three expert terminals, and the value range is obtained through negotiation among them: after each of the at least three expert terminals gives a value range, the value ranges of the different expert terminals are sent to all the expert terminals; each expert terminal, having received the others' value ranges, redefines its own value range and negotiates again; and this process is repeated until all the expert terminals finally agree on a unified value range.
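The conversion and accumulation can be sketched as below. The helper names `to_relative` and `accumulate` are hypothetical, and the final rescaling of the accumulated sums so that the weights add up to 1 is an assumption: the specification says the relations are accumulated and that the weight ratios sum to 1, but does not spell out the rescaling step. `Fraction` is used only to keep the 2/3 and 1/3 example exact.

```python
from fractions import Fraction

PROPERTIES = ("integrity", "validity", "consistency")

def to_relative(specific, properties=PROPERTIES):
    """Rescale one source's proportions so they sum to 1; absent properties count as 0."""
    total = sum(Fraction(specific.get(p, 0)) for p in properties)
    if total == 0:
        raise ValueError("source lists none of the properties")
    return {p: Fraction(specific.get(p, 0)) / total for p in properties}

def accumulate(sources, properties=PROPERTIES):
    """Sum the relative proportions over all sources, then rescale to sum to 1.

    The rescaling is an assumed reading of the accumulation step.
    """
    if not sources:
        raise ValueError("at least one source is required")
    relatives = [to_relative(s, properties) for s in sources]
    sums = {p: sum(r[p] for r in relatives) for p in properties}
    grand_total = sum(sums.values())
    return {p: sums[p] / grand_total for p in properties}

# The worked example from the text: integrity 100, validity 50, consistency not listed.
example = to_relative({"integrity": 100, "validity": 50})
```

Here `example` maps integrity to 2/3, validity to 1/3 and consistency to 0, as in the text.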
In this example, the value ranges of integrity, effectiveness and consistency are 30%-45%, and 10%-25%, respectively. The weight ratios of the three obtained from the professional data are 39%, 41% and 20%, respectively, which lie within the value ranges, so these weight ratios are used for the subsequent calculation. With an integrity value, a validity value and a consistency value of 95%, 90% and 96%, respectively, the data quality score is calculated as 95% × 39% + 90% × 41% + 96% × 20% = 37.05% + 36.9% + 19.2% = 93.15%.
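The data quality score is a weighted sum of the three values, which a short sketch can reproduce with the figures of this embodiment. The function name `weighted_score` and the dictionary keys are illustrative assumptions.

```python
def weighted_score(values, weights):
    """Weighted sum of dimension values; the weights must cover the same keys and sum to 1."""
    if set(values) != set(weights):
        raise ValueError("values and weights must cover the same dimensions")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(values[name] * weights[name] for name in values)

# Figures taken from this embodiment.
quality_values = {"integrity": 0.95, "validity": 0.90, "consistency": 0.96}
quality_weights = {"integrity": 0.39, "validity": 0.41, "consistency": 0.20}

quality_score = weighted_score(quality_values, quality_weights)
```

The result agrees with the 93.15% stated above, up to floating-point rounding.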
Therefore, the weight ratio of each dimension data is searched and analyzed from all professional data related to data value evaluation analysis in academic papers, academic journals and published patents, so that the setting of the weight ratio is more accurate.
Step S13, according to the number of data sources and data updating time efficiency of different data attributes in the first data set, consumption data of the first application scene and the ratio of the type of the data attributes in the first data set to the data attributes required by the first application scene, a rarity value, a time efficiency value, a consumption value and a feasibility value are obtained in sequence, wherein the first application scene is any one of a plurality of different application scenes;
thus, following the approach of step S11 described above, step S13 yields a rarity value, a timeliness value, a consumption value and a feasibility value of 60%, 40%, 25% and 50%, respectively.
Step S14, screening out, from all the professional data, second professional data containing rarity, timeliness, consumption and feasibility; converting the specific proportion relation of the rarity, the timeliness, the consumption and the feasibility in each item of second professional data into a relative proportion relation summing to 1; accumulating all the relative proportion relations corresponding to the rarity, the timeliness, the consumption and the feasibility to obtain a scene weight ratio of the four; and calculating on the rarity value, the timeliness value, the consumption value and the feasibility value of the first data set according to the scene weight ratio to obtain a data scene score of the first data set;
wherein, the sum of scene weight ratios of rarity, timeliness, consumption and feasibility is 1.
If each weight ratio obtained in the step S14 is within the corresponding value range, the rarity value, the timeliness value, the consumption value and the feasibility value of the first data set are calculated according to the scene weight ratio to obtain the data scene score of the first data set, otherwise, each generated weight ratio is sent to the expert.
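The range check that decides between scoring and escalation can be sketched as follows. The function names and the returned labels are hypothetical, and the value ranges in `expert_ranges` are placeholder numbers chosen for illustration only, since the specification gives no scene weight ranges; the weights are the scene weight ratios this embodiment reports.

```python
def out_of_range(weights, ranges):
    """Return the names whose weight falls outside its expert-given [low, high] range."""
    return [
        name
        for name, weight in weights.items()
        if not (ranges[name][0] <= weight <= ranges[name][1])
    ]

def route(weights, ranges):
    """Decide whether the weights can be used directly or must go to the expert terminal."""
    violations = out_of_range(weights, ranges)
    if violations:
        return "send_to_expert", violations
    return "use_for_scoring", []

# Placeholder ranges, NOT taken from the specification.
expert_ranges = {
    "rarity": (0.10, 0.25),
    "timeliness": (0.10, 0.30),
    "consumption": (0.30, 0.50),
    "feasibility": (0.15, 0.35),
}
# Scene weight ratios reported in this embodiment.
scene_weights = {"rarity": 0.15, "timeliness": 0.20, "consumption": 0.40, "feasibility": 0.25}

decision, violations = route(scene_weights, expert_ranges)
```

Returning the list of violating dimensions alongside the decision is a design choice of this sketch; it lets the expert terminal see which weight triggered the escalation.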
In this embodiment, second professional data containing at least two of the four properties of rarity, timeliness, consumption and feasibility is screened out from all the professional data; for details, refer to the description of step S12.
Thus, following the approach of step S12 described above, step S14 yields scene weight ratios for rarity, timeliness, consumption and feasibility of 15%, 20%, 40% and 25%, respectively, and the data scene score is then calculated as 60% × 15% + 40% × 20% + 25% × 40% + 50% × 25% = 9% + 8% + 10% + 12.5% = 39.5%.
And step S15, taking the product of the data quality score and the data scene score as the data asset value.
Thus, the data asset value in the above embodiment is 39.5% × 93.15% ≈ 36.8%.
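The scene score and the final product can be reproduced with exact decimal arithmetic. This is a sketch using the figures of this embodiment; the variable names are illustrative, and `Decimal` is chosen here only so that the intermediate percentages are exact.

```python
from decimal import Decimal

# Scene dimension values and scene weight ratios from this embodiment.
scene_values = {
    "rarity": Decimal("0.60"),
    "timeliness": Decimal("0.40"),
    "consumption": Decimal("0.25"),
    "feasibility": Decimal("0.50"),
}
scene_weights = {
    "rarity": Decimal("0.15"),
    "timeliness": Decimal("0.20"),
    "consumption": Decimal("0.40"),
    "feasibility": Decimal("0.25"),
}

# Data scene score: weighted sum over the four scene dimensions.
scene_score = sum(scene_values[name] * scene_weights[name] for name in scene_values)

# Data quality score as computed earlier in this embodiment.
quality_score = Decimal("0.9315")

# Step S15: data asset value is the product of the two scores.
asset_value = quality_score * scene_score
asset_value_rounded = asset_value.quantize(Decimal("0.001"))
```

The unrounded product is 0.3679425, which rounds to the 36.8% stated above.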
Referring to fig. 1, a third embodiment of the present invention is:
based on the second embodiment, the step S4 specifically includes the following steps:
step S41, acquiring a first single attribute data set corresponding to a first data asset value greater than the data cost and a first multi-attribute data set corresponding to a second data asset value greater than the data cost in the current application scene;
the data cost comprises the raw data cost required to acquire and store the data of each data attribute, and the data sale cost of selling one data set. The data cost of a multi-attribute data set is therefore the sum of the raw data costs of all the data attributes it contains plus one data sale cost.
Step S42, subtracting the data cost from the first data asset value of each first single-attribute data set to obtain a first data asset net profit, and subtracting the data cost from the second data asset value of each first multi-attribute data set to obtain a second data asset net profit; combining all the first single-attribute data sets and the first multi-attribute data sets according to a maximum non-repetition principle to obtain data asset selling combinations; and calculating the total profit of each data asset selling combination from the corresponding data asset net profits, wherein the maximum non-repetition principle means that the number of data attributes contained in a data asset selling combination is the theoretical maximum number of attributes and every data attribute appears only once in the data asset selling combination;
wherein, because the data sets whose data asset value is lower than the data cost are filtered out in step S41, not every data asset selling combination can contain all the data attributes. For example, the data attributes include purchased item, consumption amount and location information, where the first data asset value of consumption amount is smaller than the data cost, the first data asset values of purchased item and location information are both larger than the data cost, and the second data asset values of the four multi-attribute data sets formed from purchased item, consumption amount and location information are all larger than the data cost. The candidate data sets are therefore purchased item, location information, purchased item + consumption amount, purchased item + location information, consumption amount + location information, and purchased item + consumption amount + location information. The data set purchased item + consumption amount + location information forms one data asset selling combination; in addition, purchased item with consumption amount + location information, location information with purchased item + consumption amount, and purchased item + location information form three further data asset selling combinations, for four groups in total. The combination purchased item + location information does not contain all three data attributes, because the first data asset value of consumption amount is smaller than the data cost, but two data attributes is still the theoretical maximum number of attributes that these data sets can combine to.
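The cost rule just described can be sketched as follows. The function names are hypothetical and all cost figures are placeholder numbers for illustration; the specification gives no concrete costs.

```python
def data_cost(attributes, raw_cost, sale_cost):
    """Raw cost of every attribute in the data set plus one sale cost for the set."""
    return sum(raw_cost[attribute] for attribute in attributes) + sale_cost

def worth_keeping(asset_value, attributes, raw_cost, sale_cost):
    """Step S41 filter: keep a data set only if its asset value exceeds its data cost."""
    return asset_value > data_cost(attributes, raw_cost, sale_cost)

# Placeholder costs, NOT taken from the specification.
raw_cost = {"purchased_item": 2.0, "consumption_amount": 1.5, "location_info": 3.0}
sale_cost = 0.5

single_cost = data_cost(["purchased_item"], raw_cost, sale_cost)
bundle_cost = data_cost(["purchased_item", "consumption_amount"], raw_cost, sale_cost)
```

Note that a bundle incurs the sale cost only once, so selling two attributes together costs less than selling each separately under these assumptions.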
And step S43, taking the data asset selling combination with the highest total profit as the data set with the highest data asset value under the current application scene.
Therefore, among the four groups, the combination of location information and purchased item + consumption amount has the highest total profit; that is, the location information is sold on its own and the purchased item and consumption amount are sold as a bundle, so as to maximize the value of the data assets.
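Selecting the combination with the highest total profit can be sketched as below. The net profit figures are placeholder numbers chosen so that the outcome matches the one described in this embodiment; they are not from the specification. Reading the fourth group as the bundled purchased item + location information data set is also an assumption of this sketch.

```python
P, C, L = "purchased_item", "consumption_amount", "location_info"

# Placeholder net profits (asset value minus data cost) per sellable data set.
net_profit = {
    frozenset({P}): 3.0,
    frozenset({L}): 4.0,
    frozenset({P, C}): 7.0,
    frozenset({P, L}): 6.0,
    frozenset({C, L}): 5.0,
    frozenset({P, C, L}): 9.0,
}

# The four data asset selling combinations compared in this embodiment.
selling_combinations = [
    (frozenset({P, C, L}),),
    (frozenset({P}), frozenset({C, L})),
    (frozenset({L}), frozenset({P, C})),
    (frozenset({P, L}),),
]

def is_non_repeating(combination):
    """True when no data attribute appears in more than one data set of the combination."""
    attributes = [attribute for data_set in combination for attribute in data_set]
    return len(attributes) == len(set(attributes))

def total_profit(combination):
    """Sum of the net profits of the data sets sold in the combination."""
    return sum(net_profit[data_set] for data_set in combination)

valid = [c for c in selling_combinations if is_non_repeating(c)]
best = max(valid, key=total_profit)
```

With these placeholder profits, `best` is location information sold alone together with the purchased item + consumption amount bundle.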
In other embodiments, the number of data sets for a data asset offering portfolio may be limited to a maximum of two or three, taking into account the negative impact of multiple separate offerings in the same application scenario.
Referring to fig. 2, a fourth embodiment of the present invention is:
the data value evaluation and analysis device 1 based on data attribute analysis comprises a memory 3, a processor 2 and a computer program which is stored on the memory 3 and can run on the processor 2, wherein the processor 2 realizes the steps of the first embodiment, the second embodiment or the third embodiment when executing the computer program.
In summary, the data value evaluation analysis method and device based on data attribute analysis provided by the invention perform the calculation on data of different dimensions in terms of data quality and application scene; retrieve and analyze the weight ratio of each dimension from all professional data related to data value evaluation analysis in academic papers, academic journals and published patents; and add the value ranges provided by experts as a manual constraint, so as to obtain a more accurate data value evaluation analysis algorithm. According to this algorithm, the first data asset value of each single-attribute data set with a single data attribute is calculated under a plurality of different application scenes, the second data asset value of each multi-attribute data set is calculated under its associated application scene, and the data set with the highest data asset value in the current application scene is obtained from the first data asset value and the second data asset value in that scene. In this way, the data asset value of each single data attribute is taken into account, and whether a combination of multiple data attributes can generate a higher data asset value is also evaluated. Data sets with higher data asset value are thereby mined so as to reflect the real value of the data asset of each data attribute, which indirectly improves the value of the data assets and achieves accurate value evaluation of the data assets.
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all equivalent changes made by using the contents of the present specification and the drawings, or applied directly or indirectly to the related technical fields, are included in the scope of the present invention.