
CN113901106A - Data value evaluation analysis method and device based on data attribute analysis - Google Patents

Data value evaluation analysis method and device based on data attribute analysis

Info

Publication number
CN113901106A
Authority
CN
China
Prior art keywords
data
value
attribute
professional
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111175477.XA
Other languages
Chinese (zh)
Other versions
CN113901106B (en
Inventor
金华松
何颖
王小军
翁武焰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujian Zhongxin Wang 'an Information Technology Co ltd
Original Assignee
Fujian Zhongxin Wang 'an Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujian Zhongxin Wang 'an Information Technology Co ltd
Priority to CN202111175477.XA
Publication of CN113901106A
Application granted
Publication of CN113901106B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Databases & Information Systems (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • Mathematical Physics (AREA)
  • General Business, Economics & Management (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Fuzzy Systems (AREA)
  • Quality & Reliability (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data value evaluation and analysis method and device based on data attribute analysis. A first data asset value of each single-attribute data set (a data set with a single data attribute) is calculated under a plurality of different application scenarios according to a data value evaluation and analysis algorithm. The data attribute set required under each application scenario is acquired, and the data attributes in that set are combined into a plurality of data attribute subsets. The data sets of all data attributes in the same data attribute subset are treated as one multi-attribute data set, and a second data asset value of each multi-attribute data set is calculated under its associated application scenario according to the same algorithm. The data set with the highest data asset value in the current application scenario is then obtained from the first data asset value and the second data asset value in that scenario. The invention not only indirectly increases the value of data assets but also enables accurate evaluation of their value.

Figure 202111175477

Description

Data value evaluation analysis method and device based on data attribute analysis
Technical Field
The invention relates to the technical field of data mining, in particular to a data value evaluation analysis method and device based on data attribute analysis.
Background
The large volume of data generated across industries is increasingly becoming a digital asset comparable to tangible assets, and in China the data of certain key industries has become a strategic resource in urgent need of focused protection. The digital economy, with data as its key element, has become a new engine of industrial development and a new driver of national development.
Data assets are a new kind of intangible asset. They are non-consumable, can appreciate in value, depend on a carrier, and vary in value; they are controlled by an enterprise and attached to tangible assets. Their application value is also affected by many variable factors, such as the quality of the data and the value of the data in different application scenarios. As a result, existing approaches to valuing data assets are inadequate, and no complete data value evaluation and analysis method currently exists.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: a data value evaluation analysis method and device based on data attribute analysis are provided to accurately evaluate the value of data assets.
In order to solve the technical problems, the invention adopts the technical scheme that:
a data value evaluation analysis method based on data attribute analysis comprises the following steps:
step S1, calculating and obtaining the first data asset value of a single attribute data set with a single data attribute under a plurality of different application scenes according to a data value evaluation analysis algorithm;
step S2, acquiring data attribute sets required under different application scenes, and combining all data attributes in the data attribute sets into a plurality of data attribute subsets, wherein each data attribute subset is associated with a corresponding application scene and at least comprises two data attributes;
step S3, taking the data sets of all data attributes under the same data attribute subset as a multi-attribute data set, and calculating and obtaining a second data asset value of each multi-attribute data set under the associated application scene according to a data value evaluation analysis algorithm;
and step S4, obtaining a data set with the highest data asset value in the current application scene according to the first data asset value and the second data asset value in the current application scene.
In order to solve the technical problem, the invention adopts another technical scheme as follows:
the data value evaluation and analysis device based on data attribute analysis comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor executes the computer program to realize the steps in the data value evaluation and analysis method based on data attribute analysis.
The invention has the following beneficial effects: according to a data value evaluation and analysis algorithm, the method and device calculate the first data asset value of each single-attribute data set under a plurality of different application scenarios and the second data asset value of each multi-attribute data set under its associated application scenario, and then obtain the data set with the highest data asset value in the current application scenario from the first and second data asset values in that scenario. In this way, the data asset value of each single data attribute is taken into account, and the method also evaluates whether a combination of several data attributes yields a higher data asset value. Data sets with higher data asset value are thereby mined to reflect the real value of the data asset of each data attribute, which indirectly increases the value of the data assets and enables accurate evaluation of their value.
Drawings
FIG. 1 is a schematic flow chart of a data value evaluation analysis method based on data attribute analysis according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a data value evaluation and analysis apparatus based on data attribute analysis according to an embodiment of the present invention.
Description of reference numerals:
1. a data value evaluation and analysis device based on data attribute analysis; 2. a processor; 3. a memory.
Detailed Description
In order to explain technical contents, achieved objects, and effects of the present invention in detail, the following description is made with reference to the accompanying drawings in combination with the embodiments.
Referring to fig. 1, a data value evaluation analysis method based on data attribute analysis includes:
step S1, calculating and obtaining the first data asset value of a single attribute data set with a single data attribute under a plurality of different application scenes according to a data value evaluation analysis algorithm;
step S2, acquiring data attribute sets required under different application scenes, and combining all data attributes in the data attribute sets into a plurality of data attribute subsets, wherein each data attribute subset is associated with a corresponding application scene and at least comprises two data attributes;
step S3, taking the data sets of all data attributes under the same data attribute subset as a multi-attribute data set, and calculating and obtaining a second data asset value of each multi-attribute data set under the associated application scene according to a data value evaluation analysis algorithm;
and step S4, obtaining a data set with the highest data asset value in the current application scene according to the first data asset value and the second data asset value in the current application scene.
From the above description, the beneficial effects of the present invention are as follows: according to a data value evaluation and analysis algorithm, the first data asset value of each single-attribute data set is calculated under a plurality of different application scenarios and the second data asset value of each multi-attribute data set is calculated under its associated application scenario, and the data set with the highest data asset value in the current application scenario is obtained from the first and second data asset values in that scenario. In this way, the data asset value of each single data attribute is taken into account, and the method also evaluates whether a combination of several data attributes yields a higher data asset value. Data sets with higher data asset value are thereby mined to reflect the real value of the data asset of each data attribute, which indirectly increases the value of the data assets and enables accurate evaluation of their value.
Further, the calculation process of the data value evaluation analysis algorithm specifically includes:
step S11, traversing all data of a first data set, and acquiring the number of missing data fields, the number of data fields not conforming to the corresponding data attribute specification, and whether the data field values on the matching associated items of all data tables are consistent, so as to sequentially obtain an integrity value, an effectiveness value and a consistency value, wherein the first data set is a single-attribute data set or a multi-attribute data set;
step S12, acquiring all professional data related to data value evaluation and analysis from academic papers, academic journals and published patents; screening out, from all the professional data, first professional data that addresses integrity, validity and consistency; normalizing the specific proportional relationship among integrity, validity and consistency in each item of first professional data so that it sums to 1, giving a relative proportional relationship; accumulating the relative proportional relationships for integrity, validity and consistency over all items to obtain the quality weight ratios of the three; and weighting the integrity value, the validity value and the consistency value of the first data set by the quality weight ratios to obtain a data quality score of the first data set;
step S13, obtaining a rarity value, a timeliness value, a consumption value, and a feasibility value in sequence according to the number of data sources and data update timeliness of different data attributes in the first data set, consumption data of a first application scenario, and a ratio between the type of the data attributes in the first data set and the data attributes required by the first application scenario, where the first application scenario is any one of a plurality of different application scenarios;
step S14, screening out, from all the professional data, second professional data that addresses rarity, timeliness, consumption and feasibility; normalizing the specific proportional relationship among rarity, timeliness, consumption and feasibility in each item of second professional data so that it sums to 1, giving a relative proportional relationship; accumulating the relative proportional relationships for rarity, timeliness, consumption and feasibility over all items to obtain the scene weight ratios of the four; and weighting the rarity value, the timeliness value, the consumption value and the feasibility value of the first data set by the scene weight ratios to obtain a data scene score of the first data set;
and step S15, taking the product of the data quality score and the data scene score as the data asset value.
From the above description, the data is scored along several dimensions covering both data quality and application scenario, and the weight ratio of each dimension is derived by retrieving and analyzing all professional data related to data value evaluation and analysis in academic papers, academic journals and published patents. The weight ratios are therefore set on a broader evidence base, which supports a more accurate evaluation of data asset value.
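Steps S11 to S15 can be summarized as two weighted sums followed by a product. The symbols below are introduced here purely for illustration and do not appear in the original text: q_i are the integrity, validity and consistency values, w_i their quality weight ratios, s_j the rarity, timeliness, consumption and feasibility values, and u_j their scene weight ratios. The weighted-sum form is an assumption that is consistent with the worked figures given in the embodiments.

```latex
\begin{aligned}
Q &= \sum_{i=1}^{3} w_i\, q_i, & \sum_{i=1}^{3} w_i &= 1,\\
S &= \sum_{j=1}^{4} u_j\, s_j, & \sum_{j=1}^{4} u_j &= 1,\\
V &= Q \cdot S.
\end{aligned}
```

Here Q is the data quality score, S is the data scene score, and V is the data asset value of the first data set in the first application scenario.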
Further, screening out the first professional data that addresses integrity, validity and consistency from all the professional data specifically comprises: screening out, from all the professional data, first professional data that addresses at least two of integrity, validity and consistency;
screening out the second professional data that addresses rarity, timeliness, consumption and feasibility from all the professional data specifically comprises: screening out, from all the professional data, second professional data that addresses at least two of rarity, timeliness, consumption and feasibility;
when a specific proportional relationship is normalized into a relative proportional relationship summing to 1, any property that is absent is counted as 0.
From the above description, professional data that addresses at least two of the properties still has a certain reference value, so admitting it enlarges the data volume and helps the weight ratios to be set more accurately.
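The screening rule above can be sketched as a simple filter. This is a minimal illustration, not the patent's implementation: the dictionary representation of an item of professional data, the property names used as keys, and the function name screen are all assumptions.

```python
QUALITY_PROPS = ("integrity", "validity", "consistency")
SCENE_PROPS = ("rarity", "timeliness", "consumption", "feasibility")

def screen(professional_data, props, min_props=2):
    """Keep the items that give a proportion for at least `min_props` of `props`."""
    return [
        entry for entry in professional_data
        if sum(1 for p in props if entry.get(p, 0) > 0) >= min_props
    ]

# Hypothetical items of professional data (property -> stated proportion).
sources = [
    {"integrity": 100, "validity": 50},                    # two quality properties
    {"integrity": 40, "validity": 40, "consistency": 20},  # all three
    {"consistency": 10},                                   # only one: rejected
    {"rarity": 3, "timeliness": 2, "feasibility": 5},      # scene properties only
]
first_professional_data = screen(sources, QUALITY_PROPS)
second_professional_data = screen(sources, SCENE_PROPS)
```

Under these assumptions, the first two items qualify as first professional data and only the last item qualifies as second professional data.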
Further, the quality weight ratios of integrity, validity and consistency and the scene weight ratios of rarity, timeliness, consumption and feasibility are each assigned a value range in advance by an expert terminal;
if each weight ratio obtained in step S12 is within its corresponding value range, the integrity value, the validity value and the consistency value of the first data set are weighted by the quality weight ratios to obtain the data quality score of the first data set; otherwise, the generated weight ratios are sent to the expert terminal;
if each weight ratio obtained in step S14 is within its corresponding value range, the rarity value, the timeliness value, the consumption value and the feasibility value of the first data set are weighted by the scene weight ratios to obtain the data scene score of the first data set; otherwise, the generated weight ratios are sent to the expert terminal.
From the above description, the value ranges constrain the result of the big data analysis, so that deviations which machine learning may introduce can be caught by manual review, which helps ensure the accuracy of the weight ratios.
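The range check can be sketched as follows. The function names, the callback used to stand in for the expert terminal, and the value ranges are all hypothetical; only the scene weight ratios of 15%, 20%, 40% and 25% are taken from the embodiment.

```python
def out_of_range(weights, allowed_ranges):
    """Names of the weight ratios that fall outside their allowed (low, high) range."""
    return [
        name for name, w in weights.items()
        if not allowed_ranges[name][0] <= w <= allowed_ranges[name][1]
    ]

def accept_or_escalate(weights, allowed_ranges, send_to_expert):
    """Return the weights if all are in range; otherwise hand them to the expert side."""
    offending = out_of_range(weights, allowed_ranges)
    if offending:
        send_to_expert(weights, offending)
        return None
    return weights

# Hypothetical expert-given ranges, for illustration only.
ranges = {"rarity": (0.10, 0.25), "timeliness": (0.15, 0.30),
          "consumption": (0.30, 0.45), "feasibility": (0.20, 0.35)}
generated = {"rarity": 0.15, "timeliness": 0.20, "consumption": 0.40, "feasibility": 0.25}
drifted = {"rarity": 0.40, "timeliness": 0.20, "consumption": 0.15, "feasibility": 0.25}

escalated = []  # stands in for messages sent to the expert terminal
accepted = accept_or_escalate(generated, ranges, lambda w, bad: escalated.append(bad))
rejected = accept_or_escalate(drifted, ranges, lambda w, bad: escalated.append(bad))
```

In this sketch the first set of weights is accepted unchanged, while the second is withheld and its out-of-range entries are recorded for review.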
Furthermore, there are a plurality of expert terminals, and the value range is obtained through negotiation among them.
Further, sending each generated weight ratio to the expert terminal in step S14 specifically comprises:
sending each generated weight ratio, together with the several items of professional data closest to the generated weight ratios, to the expert terminal.
From the above description, when a weight ratio falls outside the set value range, that is, when the manual constraint and the machine learning result disagree, the several items of professional data closest to the generated weight ratios are sent to the expert terminal. After reading that professional data, the expert judges whether the weight ratios are reasonable and reliable, so the weight ratios are set jointly by manual review and by machine, which helps ensure their accuracy.
Further, the step S4 specifically includes:
and obtaining a data set with the highest data asset value in the current application scene according to the first data asset value and the second data asset value which are larger than the data cost in the current application scene.
Further, the sum of the quality weight ratios of the completeness, the effectiveness and the consistency is 1, and the sum of the scene weight ratios of the rareness, the timeliness, the consumability and the feasibility is 1.
Further, the step S1 is preceded by the following steps:
and carrying out metadata management on the original data, and taking the obtained metadata as a data set.
Referring to fig. 2, the data value evaluation and analysis apparatus based on data attribute analysis includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the data value evaluation and analysis method based on data attribute analysis when executing the computer program.
Referring to fig. 1, a first embodiment of the present invention is:
the data value evaluation analysis method based on data attribute analysis comprises the following steps:
step S0 is to perform metadata management on the raw data, and use the obtained metadata as a data set.
I.e. all collected data is converted into metadata for subsequent statistical analysis.
Step S1, calculating and obtaining the first data asset value of a single attribute data set with a single data attribute under a plurality of different application scenes according to a data value evaluation analysis algorithm;
in this embodiment, the data attribute refers to what the data represents, such as the data attribute being the purchased goods, the amount of consumption, the location information, and so on. In step S1, asset valuations are performed for only a single data attribute.
Step S2, acquiring a data attribute set required under different application scenes, and combining all data attributes in the data attribute set into a plurality of data attribute subsets, wherein each data attribute subset is associated with the corresponding application scene and at least comprises two data attributes;
in this embodiment, there are two application scenarios, A and B, and three data attributes: purchased goods, consumption amount and location information. Application scenario A requires two data attributes, purchased goods and consumption amount, so it has only one data attribute subset. Application scenario B requires all three data attributes, so it has four data attribute subsets: three pairs and one triple.
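The subset counts in this example can be reproduced by enumerating every combination of at least two attributes. This is a minimal sketch; the function name attribute_subsets and the English attribute labels are assumptions.

```python
from itertools import combinations

def attribute_subsets(required_attributes, min_size=2):
    """All subsets of the required attributes that contain at least `min_size` attributes."""
    attrs = sorted(required_attributes)
    return [
        frozenset(combo)
        for size in range(min_size, len(attrs) + 1)
        for combo in combinations(attrs, size)
    ]

scenario_a = {"purchased goods", "consumption amount"}
scenario_b = {"purchased goods", "consumption amount", "location information"}

subsets_a = attribute_subsets(scenario_a)
subsets_b = attribute_subsets(scenario_b)
print(len(subsets_a), len(subsets_b))  # 1 4
```

The number of subsets grows exponentially with the number of required attributes, so for scenarios with many attributes some pruning would likely be needed; the source does not say how this is handled.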
Step S3, taking the data sets of all data attributes under the same data attribute subset as a multi-attribute data set, and calculating and obtaining a second data asset value of each multi-attribute data set under the associated application scene according to a data value evaluation analysis algorithm;
therefore, whether the combination of the data attributes can generate higher data asset value or not is evaluated, so that a data set with higher data asset value is mined to reflect the real value of the data asset of each data attribute, the value of the data asset is indirectly improved, and the accurate value evaluation of the data asset is realized.
And step S4, obtaining a data set with the highest data asset value in the current application scene according to the first data asset value and the second data asset value which are larger than the data cost in the current application scene.
That is, the data cost here is mainly the hardware cost of storing the data. If the value of a data asset is less than its data cost, the data asset does not need to be stored and can be discarded directly.
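Step S4 as described in this embodiment can be sketched as a filter followed by a maximum. The function name, the data set labels and all numbers below are hypothetical, and the sketch assumes that asset values and data costs are expressed in the same unit.

```python
def best_dataset(values, costs):
    """Name of the highest-valued data set among those whose value exceeds their cost.

    values: data set name -> data asset value in the current application scenario
    costs:  data set name -> data cost, in the same (assumed) unit
    """
    worth_keeping = {name: v for name, v in values.items() if v > costs[name]}
    if not worth_keeping:
        return None  # every data set costs more than it is worth
    return max(worth_keeping, key=worth_keeping.get)

# Hypothetical first (single-attribute) and second (multi-attribute) asset values.
values = {"goods": 12.0, "amount": 8.0, "goods+amount": 30.0, "goods+location": 5.0}
costs = {"goods": 4.0, "amount": 9.0, "goods+amount": 14.0, "goods+location": 6.0}

winner = best_dataset(values, costs)
```

Here "amount" and "goods+location" are dropped because their cost exceeds their value, and "goods+amount" is selected from the remainder.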
Referring to fig. 1, a second embodiment of the present invention is:
based on the first embodiment, the data value evaluation analysis method based on data attribute analysis specifically includes the following steps:
step S11, traversing all data of a first data set, and acquiring the number of missing data fields, the number of data fields not conforming to the corresponding data attribute specification, and whether the data field values on the matching associated items of all data tables are consistent, so as to sequentially obtain an integrity value, an effectiveness value and a consistency value, wherein the first data set is a single-attribute data set or a multi-attribute data set;
in this embodiment, records whose data fields are missing or incomplete are subtracted from the total, and the ratio of the remaining number of records to the total number of records is taken as the integrity value, which is 95% in this embodiment; the validity value and the consistency value are 90% and 96%, respectively.
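The integrity value can be sketched as the share of records with no missing field. The record layout, the treatment of None and empty strings as missing, and the function name are assumptions made for illustration; the source does not specify how a missing field is detected.

```python
def integrity_value(records, required_fields):
    """Fraction of records in which every required field is present and non-empty."""
    if not records:
        raise ValueError("cannot compute integrity of an empty data set")
    complete = sum(
        1 for record in records
        if all(record.get(field) not in (None, "") for field in required_fields)
    )
    return complete / len(records)

# Hypothetical data set: 19 complete records and 1 with a missing amount.
records = [{"goods": "tea", "amount": 12.5} for _ in range(19)]
records.append({"goods": "tea", "amount": None})

integrity = integrity_value(records, ["goods", "amount"])
```

With 19 of 20 records complete, this yields the 95% integrity value used in the embodiment.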
Step S12, acquiring all professional data related to data value evaluation and analysis from academic papers, academic journals and published patents; screening out, from all the professional data, first professional data that addresses integrity, validity and consistency; normalizing the specific proportional relationship among integrity, validity and consistency in each item of first professional data so that it sums to 1, giving a relative proportional relationship; accumulating the relative proportional relationships for integrity, validity and consistency over all items to obtain the quality weight ratios of the three; and weighting the integrity value, the validity value and the consistency value of the first data set by the quality weight ratios to obtain the data quality score of the first data set;
wherein the sum of the quality weight ratios of integrity, validity and consistency is 1.
In this embodiment, first professional data that addresses at least two of integrity, validity and consistency is screened out from all the professional data; that is, an item of professional data qualifies as first professional data as long as it covers any two of the three properties. The specific proportional relationship is then normalized into a relative proportional relationship summing to 1, and a property that is absent is counted as 0. For example, if an item of professional data lists only integrity and validity, with values of 100 and 50, they become 2/3 and 1/3 after normalization and consistency is 0: because consistency is not listed, the author of that professional data is taken to consider its weight unimportant, so it is set to 0.
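The normalization in this example, and one possible reading of the accumulation step, can be sketched as follows. The function names are hypothetical, and rescaling the accumulated sums so that the final weight ratios sum to 1 is an assumption: the source states only that the relative proportions are accumulated and that the resulting weight ratios sum to 1.

```python
from fractions import Fraction

QUALITY_PROPS = ("integrity", "validity", "consistency")

def to_relative(entry, props):
    """Normalize one item's proportions to sum to 1; absent properties count as 0."""
    total = sum(Fraction(entry.get(p, 0)) for p in props)
    return {p: Fraction(entry.get(p, 0)) / total for p in props}

def accumulate(entries, props):
    """Sum the relative proportions over all items, then rescale to sum to 1 (assumed)."""
    sums = {p: sum(to_relative(e, props)[p] for e in entries) for p in props}
    grand_total = sum(sums.values())
    return {p: sums[p] / grand_total for p in props}

# The example from the text: integrity 100, validity 50, consistency not listed.
relative = to_relative({"integrity": 100, "validity": 50}, QUALITY_PROPS)

# A second, hypothetical item that weights all three properties equally.
weights = accumulate(
    [{"integrity": 100, "validity": 50},
     {"integrity": 1, "validity": 1, "consistency": 1}],
    QUALITY_PROPS,
)
```

Exact fractions are used so that the 2/3 and 1/3 of the example are reproduced without rounding. The sketch assumes each item has already passed screening, so its total is never zero.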
In this embodiment, the quality weight ratios of integrity, validity and consistency and the scene weight ratios of rarity, timeliness, consumption and feasibility are each assigned a value range in advance by the expert terminals. If each weight ratio obtained in step S12 is within its corresponding value range, the integrity value, the validity value and the consistency value of the first data set are weighted by the quality weight ratios to obtain the data quality score of the first data set; otherwise, each generated weight ratio, together with the several items of professional data closest to the generated weight ratios, is sent to the expert terminals;
there are at least three expert terminals, and the value range is obtained through negotiation among them: each expert terminal first gives its own value range, and the value ranges given by the different expert terminals are sent to all the expert terminals; after receiving the others' value ranges, each expert terminal redefines its own, and the process repeats until all the expert terminals have negotiated a single unified value range.
In this embodiment, the value ranges of integrity, validity and consistency are 30%-45%, and 10%-25%, respectively. The weight ratios of the three obtained from the professional data are 39%, 41% and 20%, respectively, which are within the value ranges, so these weight ratios are used in the subsequent calculation. With the integrity value, the validity value and the consistency value being 95%, 90% and 96%, respectively, the data quality score is calculated as 95% × 39% + 90% × 41% + 96% × 20% = 37.05% + 36.9% + 19.2% = 93.15%.
Therefore, the weight ratio of each dimension data is searched and analyzed from all professional data related to data value evaluation analysis in academic papers, academic journals and published patents, so that the setting of the weight ratio is more accurate.
Step S13, according to the number of data sources and data updating time efficiency of different data attributes in the first data set, consumption data of the first application scene and the ratio of the type of the data attributes in the first data set to the data attributes required by the first application scene, a rarity value, a time efficiency value, a consumption value and a feasibility value are obtained in sequence, wherein the first application scene is any one of a plurality of different application scenes;
thus, step S13 refers to step S11 described above to obtain rarity, timeliness, consumability, and feasibility values of 60%, 40%, 25%, and 50%, respectively.
Step S14, screening out, from all the professional data, second professional data that addresses rarity, timeliness, consumption and feasibility; normalizing the specific proportional relationship among rarity, timeliness, consumption and feasibility in each item of second professional data so that it sums to 1, giving a relative proportional relationship; accumulating the relative proportional relationships for rarity, timeliness, consumption and feasibility over all items to obtain the scene weight ratios of the four; and weighting the rarity value, the timeliness value, the consumption value and the feasibility value of the first data set by the scene weight ratios to obtain the data scene score of the first data set;
wherein, the sum of scene weight ratios of rarity, timeliness, consumption and feasibility is 1.
If each weight ratio obtained in the step S14 is within the corresponding value range, the rarity value, the timeliness value, the consumption value and the feasibility value of the first data set are calculated according to the scene weight ratio to obtain the data scene score of the first data set, otherwise, each generated weight ratio is sent to the expert.
In this embodiment, second professional data including at least two of rarity, timeliness, consumption and feasibility are screened out from all the professional data; for details, refer to the description of step S12.
Thus, step S14 refers to step S12 described above to obtain weight ratios of rarity, timeliness, consumption and feasibility of 15%, 20%, 40% and 25%, respectively, and then calculates a data scene score of 60% × 15% + 40% × 20% + 25% × 40% + 50% × 25% = 9% + 8% + 10% + 12.5% = 39.5%.
And step S15, taking the product of the data quality score and the data scene score as the data asset value.
Thus, the data asset value of the above embodiment is 39.5% × 93.15% ≈ 36.8%.
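The arithmetic of steps S14 and S15 can be reproduced with a short sketch. The values and weights are those of the embodiment, and 93.15% is the data quality score quoted in the embodiment; the function name `weighted_score` is hypothetical.

```python
from typing import Dict


def weighted_score(values: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of dimension values; the weights are expected to sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weight ratios must sum to 1")
    return sum(values[name] * weights[name] for name in weights)


# Step S13 values and step S14 scene weight ratios from the embodiment.
scene_values = {"rarity": 0.60, "timeliness": 0.40, "consumption": 0.25, "feasibility": 0.50}
scene_weights = {"rarity": 0.15, "timeliness": 0.20, "consumption": 0.40, "feasibility": 0.25}

scene_score = weighted_score(scene_values, scene_weights)  # data scene score
quality_score = 0.9315                                      # data quality score from the embodiment
asset_value = quality_score * scene_score                   # step S15: product of the two scores

print(f"{scene_score:.1%}, {asset_value:.1%}")  # prints "39.5%, 36.8%"
```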
Referring to fig. 1, a third embodiment of the present invention is:
based on the second embodiment, the step S4 specifically includes the following steps:
step S41, acquiring a first single attribute data set corresponding to a first data asset value greater than the data cost and a first multi-attribute data set corresponding to a second data asset value greater than the data cost in the current application scene;
The data cost comprises the original data cost required to acquire and store the data of each data attribute and the data sale cost of selling one data set. The data cost of a multi-attribute data set is therefore the sum of the original data costs of all the data attributes it contains plus one data sale cost.
Step S42, subtracting the data cost from the first data asset value of each first single-attribute data set to obtain a first data asset net profit, subtracting the data cost from the second data asset value of each first multi-attribute data set to obtain a second data asset net profit, combining all the first single-attribute data sets and the first multi-attribute data sets according to a maximum non-repetition principle to obtain data asset selling combinations, and calculating the total profit of each data asset selling combination from the corresponding data asset net profits, wherein the maximum non-repetition principle means that the number of data attributes contained in a data asset selling combination is the theoretical maximum and each data attribute appears only once in the combination;
wherein, because the data sets whose data asset value does not exceed the data cost are filtered out in step S41, not every data asset selling combination can contain all the data attributes. For example, the data attributes include purchase item, consumption amount and location information, wherein the first data asset value of the consumption amount is smaller than the data cost, the first data asset values of the purchase item and the location information are both larger than the data cost, and the second data asset values of the 4 multi-attribute data sets formed by combining the purchase item, the consumption amount and the location information are all larger than the data cost. That is, the remaining data sets are purchase item, location information, purchase item + consumption amount, purchase item + location information, consumption amount + location information, and purchase item + consumption amount + location information, of which purchase item + consumption amount + location information is one data asset selling combination. In addition, purchase item together with consumption amount + location information, location information together with purchase item + consumption amount, and purchase item + location information are three further data asset selling combinations, giving four in total. The combination purchase item + location information does not contain three data attributes, because the first data asset value of the consumption amount is smaller than the data cost, but two data attributes is the theoretical maximum number of attributes that this data set can be combined into.
And step S43, taking the data asset selling combination with the highest total profit as the data set with the highest data asset value under the current application scene.
Therefore, among the four combinations, the combination of location information and purchase item + consumption amount has the highest total profit; that is, the location information is sold separately and the purchase item and consumption amount are sold as a bundle, so that the data asset value is maximized.
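Steps S41 to S43 can be sketched for the purchase item, consumption amount and location information example as follows. All costs and data asset values are made-up numbers chosen only so that the outcome matches the embodiment, the helper names are hypothetical, and the four data asset selling combinations are taken as given from the text rather than enumerated.

```python
from typing import Dict, FrozenSet, List, Tuple

DataSet = FrozenSet[str]

# Hypothetical original data cost per attribute and sale cost per data set sold.
ORIGINAL_COST = {"purchase": 2, "amount": 3, "location": 2}
SALE_COST = 1

P, A, L = "purchase", "amount", "location"
# Hypothetical data asset values of the single- and multi-attribute data sets.
ASSET_VALUE: Dict[DataSet, int] = {
    frozenset({P}): 5, frozenset({A}): 3, frozenset({L}): 8,
    frozenset({P, A}): 12, frozenset({P, L}): 9, frozenset({A, L}): 10,
    frozenset({P, A, L}): 15,
}


def data_cost(data_set: DataSet) -> int:
    """Original cost of every contained attribute plus one sale cost."""
    return sum(ORIGINAL_COST[a] for a in data_set) + SALE_COST


def net_profit(data_set: DataSet) -> int:
    """Step S42: data asset value minus data cost."""
    return ASSET_VALUE[data_set] - data_cost(data_set)


# Step S41: keep only data sets whose asset value exceeds their data cost.
sellable = {ds for ds in ASSET_VALUE if ASSET_VALUE[ds] > data_cost(ds)}

# The four selling combinations named in the text.
combinations: List[Tuple[DataSet, ...]] = [
    (frozenset({P, A, L}),),
    (frozenset({P}), frozenset({A, L})),
    (frozenset({L}), frozenset({P, A})),
    (frozenset({P, L}),),
]


def total_profit(combo: Tuple[DataSet, ...]) -> int:
    """Sum of net profits; every data set must be sellable and no attribute repeated."""
    attrs = [a for ds in combo for a in ds]
    assert all(ds in sellable for ds in combo), "combination uses a filtered data set"
    assert len(attrs) == len(set(attrs)), "an attribute appears more than once"
    return sum(net_profit(ds) for ds in combo)


# Step S43: the combination with the highest total profit.
best = max(combinations, key=total_profit)
print(sorted(sorted(ds) for ds in best), total_profit(best))
```

With these illustrative numbers the consumption amount alone is filtered out in step S41, and selling location information separately while bundling purchase item with consumption amount gives the highest total profit.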
In other embodiments, the number of data sets in a data asset selling combination may be limited to a maximum of two or three, taking into account the negative impact of selling many data sets separately in the same application scene.
Referring to fig. 2, a fourth embodiment of the present invention is:
the data value evaluation and analysis device 1 based on data attribute analysis comprises a memory 3, a processor 2 and a computer program which is stored on the memory 3 and can run on the processor 2, wherein the processor 2 realizes the steps of the first embodiment, the second embodiment or the third embodiment when executing the computer program.
In summary, the data value evaluation analysis method and device based on data attribute analysis provided by the invention set different data dimensions for data quality and for the application scene, derive the weight ratio of each dimension by searching and analyzing all professional data related to data value evaluation analysis in academic papers, academic journals and published patents, and add value ranges provided by experts as a manual constraint, thereby obtaining a more accurate data value evaluation analysis algorithm. According to this algorithm, the first data asset value of each single-attribute data set having a single data attribute is calculated under a plurality of different application scenes, the second data asset value of each multi-attribute data set is calculated under its associated application scene, and the data set with the highest data asset value in the current application scene is obtained from the first data asset values and second data asset values in that scene. Therefore, in addition to considering the data asset value of each single data attribute, the method evaluates whether a combination of multiple data attributes can generate a higher data asset value, and mines the data sets with higher data asset value so as to reflect the real data asset value of each data attribute, thereby indirectly improving the value of the data assets and realizing accurate value evaluation of the data assets.
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all equivalent changes made by using the contents of the present specification and the drawings, or applied directly or indirectly to the related technical fields, are included in the scope of the present invention.

Claims (10)

1. A data value evaluation analysis method based on data attribute analysis is characterized by comprising the following steps:
step S1, calculating and obtaining the first data asset value of a single attribute data set with a single data attribute under a plurality of different application scenes according to a data value evaluation analysis algorithm;
step S2, acquiring data attribute sets required under different application scenes, and combining all data attributes in the data attribute sets into a plurality of data attribute subsets, wherein each data attribute subset is associated with a corresponding application scene and at least comprises two data attributes;
step S3, taking the data sets of all data attributes under the same data attribute subset as a multi-attribute data set, and calculating and obtaining a second data asset value of each multi-attribute data set under the associated application scene according to a data value evaluation analysis algorithm;
and step S4, obtaining a data set with the highest data asset value in the current application scene according to the first data asset value and the second data asset value in the current application scene.
2. The data value evaluation analysis method based on data attribute analysis according to claim 1, wherein the calculation process of the data value evaluation analysis algorithm specifically comprises:
step S11, traversing all data of a first data set, and acquiring the number of missing data fields, the number of data fields not conforming to the corresponding data attribute specification, and whether the data field values on the matching associated items of all data tables are consistent, so as to sequentially obtain an integrity value, an effectiveness value and a consistency value, wherein the first data set is a single-attribute data set or a multi-attribute data set;
step S12, acquiring all professional data related to data value evaluation and analysis in academic papers, academic journals and published patents, screening out first professional data containing integrity, effectiveness and consistency from all the professional data, normalizing the specific proportion relation of the integrity, the effectiveness and the consistency in each piece of first professional data so that it sums to 1, thereby converting it into a relative proportion relation, accumulating all the relative proportion relations corresponding to the integrity, the effectiveness and the consistency to obtain quality weight ratios of the integrity, the effectiveness and the consistency, and calculating the integrity value, the effectiveness value and the consistency value of the first data set according to the quality weight ratios to obtain a data quality score of the first data set;
step S13, obtaining a rarity value, a timeliness value, a consumption value, and a feasibility value in sequence according to the number of data sources and data update timeliness of different data attributes in the first data set, consumption data of a first application scenario, and a ratio between the type of the data attributes in the first data set and the data attributes required by the first application scenario, where the first application scenario is any one of a plurality of different application scenarios;
step S14, screening second professional data containing rarity, timeliness, consumption and feasibility from all the professional data, normalizing the specific proportion relation of the rarity, the timeliness, the consumption and the feasibility in each piece of second professional data so that it sums to 1, thereby converting it into a relative proportion relation, accumulating all the relative proportion relations corresponding to the rarity, the timeliness, the consumption and the feasibility to obtain scene weight ratios among the four, and calculating the rarity value, the timeliness value, the consumption value and the feasibility value of the first data set according to the scene weight ratios to obtain a data scene score of the first data set;
and step S15, taking the product of the data quality score and the data scene score as the data asset value.
3. The data value evaluation analysis method based on data attribute analysis according to claim 2, wherein the step of screening out the first professional data containing the integrity, the effectiveness and the consistency from all the professional data specifically comprises: screening out, from all the professional data, first professional data comprising at least two of the integrity, the effectiveness and the consistency;
the step of screening out the second professional data containing rarity, timeliness, consumption and feasibility from all the professional data specifically comprises: screening out, from all the professional data, second professional data comprising at least two of the rarity, the timeliness, the consumption and the feasibility;
when a specific proportion relation is converted into a relative proportion relation whose sum is 1, any property that is absent from the specific proportion relation is counted as 0.
4. The data value evaluation and analysis method based on data attribute analysis according to claim 2, wherein each of the quality weight ratios of the integrity, the effectiveness and the consistency and each of the scene weight ratios of the rarity, the timeliness, the consumption and the feasibility is given a value range in advance by an expert end;
if each weight ratio obtained in the step S12 is within the corresponding value range, calculating the integrity value, the effectiveness value and the consistency value of the first data set according to the quality weight ratios to obtain a data quality score of the first data set, otherwise, sending each generated weight ratio to the expert end;
if each weight ratio obtained in the step S14 is within the corresponding value range, calculating the rarity value, the timeliness value, the consumption value and the feasibility value of the first data set according to the scene weight ratios to obtain a data scene score of the first data set, otherwise, sending each generated weight ratio to the expert end.
5. The data value evaluation and analysis method based on data attribute analysis according to claim 4, wherein there are a plurality of expert ends, and the value range is obtained through negotiation among the plurality of expert ends.
6. The data value evaluation analysis method based on data attribute analysis according to claim 5, wherein the step S14 of sending each generated weight ratio to an expert end specifically comprises the following steps:
sending each generated weight ratio, together with a plurality of pieces of professional data that are closest to the generated weight ratio, to the expert end.
7. The data value evaluation analysis method based on data attribute analysis according to claim 1, wherein the step S4 specifically includes:
and obtaining a data set with the highest data asset value in the current application scene according to the first data asset value and the second data asset value which are larger than the data cost in the current application scene.
8. The data value evaluation analysis method based on data attribute analysis according to claim 2, wherein the sum of the quality weight ratios of the integrity, the effectiveness and the consistency is 1, and the sum of the scene weight ratios of the rarity, the timeliness, the consumption and the feasibility is 1.
9. The data value evaluation analysis method based on data attribute analysis according to claim 1, wherein the step S1 is preceded by the steps of:
and carrying out metadata management on the original data, and taking the obtained metadata as a data set.
10. A data value evaluation and analysis apparatus based on data attribute analysis, comprising a memory, a processor and a computer program stored in the memory and operable on the processor, wherein the processor executes the computer program to implement the steps of the data value evaluation and analysis method based on data attribute analysis according to any one of claims 1 to 9.
CN202111175477.XA 2021-10-09 2021-10-09 Data value assessment analysis method and device based on data attribute analysis Active CN113901106B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111175477.XA CN113901106B (en) 2021-10-09 2021-10-09 Data value assessment analysis method and device based on data attribute analysis

Publications (2)

Publication Number Publication Date
CN113901106A true CN113901106A (en) 2022-01-07
CN113901106B CN113901106B (en) 2025-06-13

Family

ID=79190639

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111175477.XA Active CN113901106B (en) 2021-10-09 2021-10-09 Data value assessment analysis method and device based on data attribute analysis

Country Status (1)

Country Link
CN (1) CN113901106B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117829772A (en) * 2024-01-03 2024-04-05 北京祝融视觉科技股份有限公司 Acquisition management method and system for asset data of infrastructure

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1258814A1 (en) * 2001-05-17 2002-11-20 Requisite Technology Inc. Method and apparatus for analyzing the quality of the content of a database
CN110401625A (en) * 2019-03-07 2019-11-01 中国科学院软件研究所 Methods of risk assessment and system based on association analysis
CN111667072A (en) * 2020-05-15 2020-09-15 中国电子科技集团公司电子科学研究院 Method for evaluating information use value
CN111724084A (en) * 2020-07-27 2020-09-29 腾讯科技(深圳)有限公司 Value display method, device, equipment and storage medium of data assets

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
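Zhang Zhigang; Yang Dongshu; Wu Hongxia: "Research and Application of a Data Asset Value Evaluation Model" (数据资产价值评估模型研究与应用), Modern Electronics Technique (现代电子技术), no. 20, 15 October 2015 (2015-10-15) *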

Also Published As

Publication number Publication date
CN113901106B (en) 2025-06-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant