CN119691470B - Audit method based on big data - Google Patents
- Publication number
- CN119691470B
- Authority
- CN
- China
- Prior art keywords
- data
- node
- item
- feature
- event node
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an audit method based on big data and relates to the technical field of audit methods. The method comprises: initializing the initial position features and initial importance features of each item node in a data set; calculating the support degree given to each item node by its adjacent nodes, and calculating the aggregate feature of each item node from the normalized support degrees; determining the initial importance features at which a similarity function is maximized as the target importance features of the item node; calculating the matching degree between item nodes of two data sets according to their target importance features; determining any two item nodes whose matching degree is not smaller than a threshold as associated items; and auditing each data set based on the associated items. By finding the relations between the items in the two data sets, the method provides a reference for the auditing process, thereby improving auditing efficiency and auditing quality.
Description
Technical Field
The invention relates to the technical field of auditing methods, in particular to an auditing method based on big data.
Background
An audit examines the execution of economic activities, financial balances, financial regulations, and the like in a given industry. It is organized, led, and planned from the top down by audit organizations and audit staff, and it has expanded from auditing the economic activities of individual units to auditing those of a whole industry, and from micro-level auditing to meso- and macro-level auditing. Because auditing takes data as its basis, data quality directly affects audit quality, and the quality of the data currently used for auditing needs to be improved.
Disclosure of Invention
The invention aims to provide an audit method based on big data that improves the quality of the data materials used for audit work, helps organizations and institutions conduct audits more efficiently, improves audit quality, and reduces risk while ensuring compliance.
The technical aim of the invention is realized by the following technical scheme:
In a first aspect, the present application provides an audit method based on big data, comprising the following specific steps:
acquiring at least two data sets of different types to be audited, and preprocessing the data sets, wherein the preprocessing comprises data cleaning and abnormal data detection;
based on each preprocessed data set, obtaining the initial position feature and initial importance feature of each item node in the data set by random initialization from a Gaussian distribution;
calculating the support degree given to each item node by its adjacent nodes using the initial position features and initial importance features, and calculating, from the normalized support degrees, the aggregate feature of each item node, which fuses the features of its adjacent nodes;
constructing a similarity function for each item node using the initial position features and the aggregate features, and determining the initial importance features at which the similarity function is maximized as the target importance features of each item node;
calculating, according to the target importance features of the item nodes in the at least two data sets, the matching degree between item nodes in data sets of different types, determining any two item nodes whose matching degree is not smaller than a threshold as associated items, and auditing each data set based on the associated items.
The scheme has the following advantages. First, the data sets to be audited are preprocessed by data cleaning and abnormal data detection, which removes duplicate and abnormal data and makes the data more concise and accurate. Next, the initial position features and initial importance features of all item nodes in each data set are obtained by Gaussian random initialization. Then the support degree given to each item node by its adjacent nodes is calculated from the initial position and importance features, the support degrees are normalized, and each item node's aggregate feature, which fuses its adjacent nodes' features, is computed from them. A similarity function is built for each item node from its initial position feature and aggregate feature, and the initial importance features that maximize this function are taken as the node's target importance features. Finally, the matching degree between item nodes in data sets of different types is calculated from their target importance features, and any two item nodes whose matching degree is not smaller than the threshold are determined to be associated items. A high matching degree indicates a strong correlation between the two items, so the associated items serve as a reference when auditing each data set, improving both audit efficiency and audit quality.
For example, suppose a data set of economic activities and a data set of financial balances are examined jointly. If the expenditure of a certain event in the economic activities has a certain value and the expenditure of a certain item in the financial balances has the same value, the two items show a high matching degree even when the detailed content of the activity is unclear. Thus, when data sets of different types are examined jointly, finding the relations between the items in the two data sets provides a reference for the auditing process, substantially improves the audit work and the quality of the data materials used for it, helps organizations and institutions conduct audits more efficiently, improves audit quality, and reduces risk while ensuring compliance.
On the basis of the technical scheme, the invention can be improved as follows.
Further, the data cleaning specifically includes:
discretizing the attribute items of each piece of original data in the normalized data set, and mapping each discretized attribute value, in order of magnitude, into a preset integer interval;
calculating the information gain ratio of each attribute item of the original data based on the mapped attribute values, and constructing an attribute set from the attribute items whose information gain ratio is not smaller than a preset value;
and inserting each piece of data corresponding to the attribute set into a preset prefix tree, and traversing each leaf node of the prefix tree to delete duplicate data, thereby obtaining the cleaned data set.
Performing data cleaning by calculating information gain ratios and inserting data into a prefix tree reduces the time complexity of the detection process and ensures the accuracy of the data set.
Further, the abnormal data detection specifically includes:
clustering the normalized data set using the K-Means clustering algorithm to obtain a plurality of data clusters formed by the original data in the data set;
based on the plurality of data clusters, calculating a first Euclidean distance between each piece of data in each data cluster and the other data in the same cluster, and a second Euclidean distance between each piece of data in each data cluster and the data in the other data clusters;
calculating an outlier factor for each piece of data in each data cluster based on the first Euclidean distances, and determining data whose outlier factor is not smaller than a first threshold as local isolated data;
and determining each piece of data whose second Euclidean distance is not smaller than a second threshold as global isolated data, determining the original data corresponding to the local isolated data and the global isolated data as abnormal data, and deleting the abnormal data to obtain the data set after abnormal data detection.
The beneficial effect of this further scheme is as follows: the data to be audited is generally complex and large in volume, and density-based isolated point (outlier) detection alone is unsatisfactory both in execution efficiency and in identifying global isolated points; combining it with the idea of a clustering algorithm reduces the algorithm's execution time while still identifying global isolated points.
Further, the information gain ratio of the attribute items of each original data is specifically:
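$$\mathrm{Gain\_ratio}(D,a)=\frac{\mathrm{Gain}(D,a)}{\mathrm{IV}(a)},$$
wherein:
$$\mathrm{IV}(a)=-\sum_{i=1}^{n}\frac{|D^{i}|}{|D|}\log_{2}\frac{|D^{i}|}{|D|};$$
In the formula, $\mathrm{Gain\_ratio}(D,a)$ represents the information gain ratio of attribute item $a$ in data set $D$, $\mathrm{Gain}(D,a)$ represents the information gain of attribute item $a$ in data set $D$, $\mathrm{IV}(a)$ represents the intrinsic value of attribute item $a$, $|D^{i}|$ represents the number of samples for which attribute item $a$ takes value $i$, $|D|$ represents the total number of samples in data set $D$, and $n$ represents the number of values of attribute item $a$.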
Further, the outlier factor of each data in each data cluster is specifically:
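$$\mathrm{LOF}_k(p)=\frac{1}{|N_k(p)|}\sum_{o\in N_k(p)}\frac{\mathrm{lrd}_k(o)}{\mathrm{lrd}_k(p)},\qquad\mathrm{lrd}_k(p)=\left(\frac{1}{|N_k(p)|}\sum_{o\in N_k(p)}\operatorname{reach\text{-}dist}_k(p,o)\right)^{-1};$$
In the formula, $\mathrm{LOF}_k(p)$ represents the outlier factor of data $p$; $\operatorname{reach\text{-}dist}_k(p,o)=\max\{k\text{-}\mathrm{distance}(o),\,d(p,o)\}$ represents the reachability distance of data $p$ with respect to data $o$, where $d(p,o)$ is the first Euclidean distance between them; $\mathrm{lrd}_k(p)$ represents the local reachability density of data $p$; and $N_k(p)$ represents the set formed by the $k$ data nearest to data $p$.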
Further, the support degree given by the neighboring node of each item node is specifically:
[formula not recoverable from the source text]
In the formula, $s_{mn}$ indicates the degree of support given by neighboring node $n$ to item node $m$, $\mathbf{r}_m$ represents the initial importance feature of item node $m$, and $\mathbf{r}_n$ represents the initial importance feature of neighboring node $n$;
the aggregate feature of each item node, which fuses the features of its neighboring nodes, is specifically:
$$\mathbf{a}_m=\sum_{n\in\mathcal{N}(m)}\tilde{s}_{mn}\,\mathbf{p}_n;$$
In the formula, $\mathbf{a}_m$ represents the aggregate feature of item node $m$, $\mathcal{N}(m)$ represents the set of neighboring nodes of item node $m$, $\tilde{s}_{mn}$ represents the support degree given by neighboring node $n$ to item node $m$ after normalization by the softmax function over $\mathcal{N}(m)$, and $\mathbf{p}_n$ represents the initial position feature of neighboring node $n$.
Further, the similarity function is specifically:
[formula not recoverable from the source text]
In the formula, $J_m$ represents the value of the objective function, $\mathbf{a}_m$ represents the aggregate feature of item node $m$, $\mathbf{p}_m$ represents the initial position feature of item node $m$, and the superscript T denotes the vector transpose operation;
The matching degree between item nodes is specifically:
[formula and its two auxiliary definitions not recoverable from the source text]
In the formula, $M_{mn}$ represents the matching degree between item node $m$ and item node $n$, the superscript T denotes the vector transpose operation, $\mathbf{r}^{*}_{m}$ and $\mathbf{r}^{*}_{n}$ represent the target importance features of item nodes $m$ and $n$, $\mathbf{W}_{m}$ and $\mathbf{b}_{m}$ represent the weight matrix and bias term corresponding to item node $m$, and $\mathbf{W}_{n}$ and $\mathbf{b}_{n}$ represent the weight matrix and bias term corresponding to item node $n$.
In a second aspect, the present application provides a big data based auditing system for performing the method of any implementation of the first aspect, comprising:
the first module is used for acquiring at least two data sets to be audited of different types, and preprocessing the data sets, wherein the preprocessing comprises data cleaning and abnormal data detection;
The second module is used for randomly initializing by utilizing Gaussian distribution based on each preprocessed data set to obtain initial position characteristics and initial importance characteristics of each item node in the data set;
The third module is used for calculating the support degree given to each item node by its adjacent nodes using the initial position features and initial importance features, and for calculating, from the normalized support degrees, the aggregate feature of each item node, which fuses the features of its adjacent nodes;
A fourth module, configured to construct a similarity function of each item node using the initial position feature and the aggregate feature, and determine an initial importance feature when the similarity function is maximized as a target importance feature of each item node;
and a fifth module, configured to calculate, according to the target importance characteristics of each item node in at least two data sets, a matching degree between each item node in different types of data sets, determine two item nodes with matching degrees not smaller than a threshold value as related items, and audit each data set based on each related item.
In a third aspect, the application provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the method of any implementation of the first aspect when executing the computer program.
In a fourth aspect, the present application provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the method of any implementation of the first aspect.
Compared with the prior art, the invention has at least the following beneficial effects:
According to the method, the data sets to be audited are first preprocessed by data cleaning and abnormal data detection, which removes duplicate and abnormal data so that the data is more concise and accurate. The initial position features and initial importance features of all item nodes in each data set are then obtained by Gaussian random initialization. Next, the support degree given to each item node by its adjacent nodes is calculated from these features, the aggregate feature that fuses the adjacent nodes' features is calculated from the normalized support degrees, a similarity function is built for each item node from its initial position feature and aggregate feature, and the initial importance features at which this function is maximized are taken as the target importance features. Finally, the matching degrees between item nodes are calculated from the target importance features of the item nodes in the two data sets, and any two item nodes whose matching degree is not smaller than the threshold are determined to be associated items. A high matching degree indicates a strong association between the two items, so the associated items provide a reference for auditing each data set and ultimately improve audit efficiency and quality.
In the application, when data sets of different types are audited jointly, finding the relations between the items in the two data sets provides a reference for the audit process, which substantially improves the audit work and the quality of the data materials used for it, helps organizations and institutions conduct audits more efficiently, improves audit quality, and reduces risk while ensuring compliance. In addition, performing data cleaning by calculating information gain ratios and inserting data into a prefix tree reduces the time complexity of the detection process and ensures the accuracy of the data set. Finally, because the volume of data to be audited is large, and density-based isolated point detection alone is unsatisfactory both in execution efficiency and in identifying global isolated points, combining it with the idea of a clustering algorithm reduces execution time while still identifying global isolated points.
Drawings
The accompanying drawings, which are included to provide a further understanding of embodiments of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application. In the drawings:
FIG. 1 is a method flow diagram of an audit method in an embodiment of the present invention;
FIG. 2 is a schematic diagram of the connection of an audit system in an embodiment of the present invention;
FIG. 3 is a schematic connection diagram of an electronic device according to an embodiment of the invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that like reference numerals and letters refer to like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
In the description of the embodiments of the present invention, "plurality" means at least 2.
To improve the quality of the data materials used for audit work, help organizations and institutions conduct audits more efficiently, improve audit quality, and reduce risk while ensuring compliance, this embodiment provides an auditing method based on big data which, as shown in FIG. 1, comprises the following specific steps:
S1, acquiring at least two data sets to be audited of different types, and preprocessing the data sets, wherein the preprocessing comprises data cleaning and abnormal data detection.
Optionally, the data cleaning specifically includes:
S11, discretizing the attribute items of each piece of original data in the normalized data set, and mapping each discretized attribute value, in order of magnitude, into a preset integer interval.
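S11 does not fix a discretization rule. As a hypothetical illustration only, the Python sketch below uses equal-width binning to map one numeric attribute onto the integers 0 to `num_bins - 1` while preserving the order of the values; both the binning rule and the name `discretize` are assumptions, not the patent's specification.

```python
def discretize(values, num_bins):
    """Map numeric values onto the integers 0..num_bins-1 by equal-width binning.

    Larger values never receive a smaller bin index, so the order of the
    attribute values is preserved.
    """
    lo, hi = min(values), max(values)
    if hi == lo:  # constant attribute: every value falls into bin 0
        return [0 for _ in values]
    width = (hi - lo) / num_bins
    # min(...) keeps the maximum value in the last bin instead of index num_bins
    return [min(int((v - lo) / width), num_bins - 1) for v in values]


print(discretize([0.0, 0.2, 0.5, 0.9, 1.0], 4))  # [0, 0, 2, 3, 3]
```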
S12, calculating the information gain ratio of each attribute item of the original data based on the mapped attribute values, and constructing an attribute set from the attribute items whose information gain ratio is not smaller than a preset value.
The information gain ratio of the attribute items of each original data is specifically:
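$$\mathrm{Gain\_ratio}(D,a)=\frac{\mathrm{Gain}(D,a)}{\mathrm{IV}(a)},$$
wherein:
$$\mathrm{IV}(a)=-\sum_{i=1}^{n}\frac{|D^{i}|}{|D|}\log_{2}\frac{|D^{i}|}{|D|};$$
In the formula, $\mathrm{Gain\_ratio}(D,a)$ represents the information gain ratio of attribute item $a$ in data set $D$, $\mathrm{Gain}(D,a)$ represents the information gain of attribute item $a$ in data set $D$, $\mathrm{IV}(a)$ represents the intrinsic value of attribute item $a$, $|D^{i}|$ represents the number of samples for which attribute item $a$ takes value $i$, $|D|$ represents the total number of samples in data set $D$, and $n$ represents the number of values of attribute item $a$.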
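A minimal Python sketch of S12, assuming the standard C4.5-style gain ratio (information gain divided by the attribute's intrinsic value). Information gain needs a target variable that the text does not name, so the sketch assumes each record carries a class label; `gain_ratio`, `select_attributes`, and `min_ratio` (standing in for the preset value) are illustrative names.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (base 2) of a sequence of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def gain_ratio(column, labels):
    """Information gain of one discretized attribute column divided by its intrinsic value."""
    total = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    # Information gain: class entropy minus the weighted class entropy within each value group
    gain = entropy(labels) - sum(len(g) / total * entropy(g) for g in groups.values())
    # Intrinsic value: entropy of the attribute's own value distribution (-sum |D^i|/|D| log2 |D^i|/|D|)
    iv = entropy(column)
    return gain / iv if iv > 0 else 0.0


def select_attributes(rows, labels, min_ratio):
    """Indices of attributes whose gain ratio is not smaller than min_ratio."""
    columns = list(zip(*rows))
    return [j for j, col in enumerate(columns) if gain_ratio(col, labels) >= min_ratio]


rows = [(0, 1), (0, 0), (1, 1), (1, 0)]
labels = ["a", "a", "b", "b"]
print(select_attributes(rows, labels, 0.5))  # [0]
```

Attribute 0 separates the two labels perfectly (ratio 1.0), while attribute 1 carries no information about them (ratio 0.0), so only attribute 0 enters the attribute set.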
S13, inserting each data corresponding to the attribute set into a preset prefix tree, traversing each leaf node in the prefix tree to delete the repeated data, and obtaining a data set after data cleaning.
Specifically, the preset prefix tree has the following characteristics and structural improvement points:
1) Non-leaf nodes act as index splitting entries and do not store data information.
2) Only leaf nodes store data and may store multiple pieces of data.
3) When n attributes serve as index items, the tree has n+2 layers: the first layer is the root node, and the last layer stores the sample data.
When detecting duplicate data in the prefix tree, the data in each leaf node is traversed in turn, and each sample point is compared with the other data in the same leaf node. If the similarity between two pieces of data is greater than a given threshold, the data is output to a similar data set; once a piece of data has been compared with the others, it is marked as compared. For example, if A and B are in the same leaf node, A is compared with B in the next traversal, and if their similarity is greater than the given threshold, A is output to the similar data set and deleted from its leaf node. The leaf nodes where C, D, and G are located each hold only one piece of data, and data that exists singly in a leaf node is not duplicate data.
Specifically, with the improved prefix tree, similar data is quickly gathered into the same leaf node, which reduces the number of operations; the leaf nodes are then traversed and similarity is computed only among data within the same leaf node to complete duplicate detection, which improves the efficiency of duplicate data detection.
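The prefix-tree duplicate detection described above can be sketched as follows, assuming records are tuples of already discretized values and `index_attrs` lists the positions of the attributes chosen in S12. Inner nodes only route records, leaves store them, and similarity is computed only among records sharing a leaf. The text does not define the similarity measure, so the `similarity` function (share of equal fields) is a hypothetical stand-in.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)  # index value -> child node
    records: list = field(default_factory=list)   # filled only at leaf nodes


class PrefixTree:
    """Prefix tree whose inner nodes only index records and whose leaves store them.

    With n index attributes there is a root, n index layers, and a final layer
    of stored records, i.e. n + 2 layers.
    """

    def __init__(self, index_attrs):
        self.root = TrieNode()
        self.index_attrs = index_attrs  # positions of the attributes in the attribute set

    def insert(self, record):
        node = self.root
        for j in self.index_attrs:
            node = node.children.setdefault(record[j], TrieNode())
        node.records.append(record)

    def leaves(self):
        stack = [self.root]
        while stack:
            node = stack.pop()
            if node.children:
                stack.extend(node.children.values())
            else:
                yield node


def similarity(a, b):
    """Hypothetical similarity: share of positions holding equal values."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


def deduplicate(records, index_attrs, threshold):
    tree = PrefixTree(index_attrs)
    for record in records:
        tree.insert(record)
    kept = []
    for leaf in tree.leaves():
        unique = []
        for record in leaf.records:  # only records sharing a leaf are compared
            if all(similarity(record, u) <= threshold for u in unique):
                unique.append(record)
            # otherwise the record duplicates one already kept and is dropped
        kept.extend(unique)
    return kept


data = [(1, 2, "x"), (1, 2, "x"), (1, 3, "y"), (2, 2, "x")]
print(sorted(deduplicate(data, index_attrs=[0, 1], threshold=0.9)))
# [(1, 2, 'x'), (1, 3, 'y'), (2, 2, 'x')]
```

Because only records that agree on every index attribute share a leaf, pairwise comparisons are confined to small groups, which is where the efficiency gain described above comes from.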
Optionally, the detecting of the abnormal data specifically includes:
S14, clustering the normalized data set using the K-Means clustering algorithm to obtain a plurality of data clusters formed by the original data in the data set.
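S14 uses standard K-Means. The dependency-free sketch below implements Lloyd's algorithm with randomly sampled initial centroids; the iteration cap and seed are assumptions, and in practice a library implementation such as scikit-learn's `KMeans` would be a more typical choice.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's K-Means on a list of coordinate tuples; returns (clusters, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members (keep it if empty)
        new_centroids = [
            [sum(axis) / len(members) for axis in zip(*members)] if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments can no longer change
            break
        centroids = new_centroids
    return clusters, centroids


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusters, _ = kmeans(pts, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```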
S15, based on the plurality of data clusters, calculating a first Euclidean distance between each piece of data in each data cluster and the other data in the same cluster, and a second Euclidean distance between each piece of data in each data cluster and the data in the other data clusters.
S16, calculating an outlier factor of each data in each data cluster based on the first Euclidean distance, and determining the data with the outlier factor not smaller than a first threshold value as local isolated data.
The outlier factor of each data in each data cluster is specifically:
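$$\mathrm{LOF}_k(p)=\frac{1}{|N_k(p)|}\sum_{o\in N_k(p)}\frac{\mathrm{lrd}_k(o)}{\mathrm{lrd}_k(p)},\qquad\mathrm{lrd}_k(p)=\left(\frac{1}{|N_k(p)|}\sum_{o\in N_k(p)}\operatorname{reach\text{-}dist}_k(p,o)\right)^{-1};$$
In the formula, $\mathrm{LOF}_k(p)$ represents the outlier factor of data $p$; $\operatorname{reach\text{-}dist}_k(p,o)=\max\{k\text{-}\mathrm{distance}(o),\,d(p,o)\}$ represents the reachability distance of data $p$ with respect to data $o$, where $d(p,o)$ is the first Euclidean distance between them; $\mathrm{lrd}_k(p)$ represents the local reachability density of data $p$; and $N_k(p)$ represents the set formed by the $k$ data nearest to data $p$.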
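A sketch of the local outlier factor of S16 for the points of one cluster, following the standard LOF definition (k-distance, reachability distance, local reachability density). It assumes the cluster has more than `k` points and no exact duplicates (these are removed during data cleaning), and it ignores distance ties at the k-th neighbour; `first_threshold` is a hypothetical value.

```python
import math


def k_nearest(data, i, k):
    """Indices of the k nearest neighbours of data[i] (Euclidean), excluding i itself."""
    others = [j for j in range(len(data)) if j != i]
    return sorted(others, key=lambda j: math.dist(data[i], data[j]))[:k]


def lof_scores(data, k):
    """Local outlier factor of every point in one cluster."""
    neighbours = [k_nearest(data, i, k) for i in range(len(data))]
    # k-distance of each point: distance to its k-th nearest neighbour
    k_dist = [math.dist(data[i], data[nb[-1]]) for i, nb in enumerate(neighbours)]

    def reach_dist(p, o):
        # reachability distance of p w.r.t. o: max(k-distance(o), d(p, o))
        return max(k_dist[o], math.dist(data[p], data[o]))

    # local reachability density: inverse of the mean reachability distance to the neighbours
    lrd = [len(nb) / sum(reach_dist(i, o) for o in nb) for i, nb in enumerate(neighbours)]
    # LOF: mean ratio of the neighbours' density to the point's own density
    return [sum(lrd[o] for o in nb) / (len(nb) * lrd[i]) for i, nb in enumerate(neighbours)]


cluster = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = lof_scores(cluster, k=2)
first_threshold = 1.5  # hypothetical value of the "first threshold"
print([p for p, s in zip(cluster, scores) if s >= first_threshold])  # [(5, 5)]
```

Points inside the unit square score exactly 1.0, while the isolated point (5, 5) scores about 6, so it is flagged as local isolated data.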
S17, determining each piece of data whose second Euclidean distance is not smaller than a second threshold as global isolated data, determining the original data corresponding to the local isolated data and the global isolated data as abnormal data, and deleting the abnormal data to obtain the data set after abnormal data detection.
Specifically, because the data to be audited is generally complex and large in volume, and density-based isolated point detection alone is unsatisfactory both in execution efficiency and in identifying global isolated points, combining it with the idea of a clustering algorithm reduces execution time while still identifying global isolated points.
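S17 can be sketched as below. The text does not say how a point's several second Euclidean distances are combined into one value, so this sketch assumes the minimum distance to any point of another cluster. Under that reading, legitimate but well-separated clusters are also far from each other, so `second_threshold` has to exceed the normal inter-cluster spacing; how the patent calibrates it is not stated.

```python
import math


def global_isolated(clusters, second_threshold):
    """Points whose nearest point in any other cluster is at least second_threshold away.

    ASSUMPTION: a point's second Euclidean distances are combined by taking their minimum.
    """
    flagged = []
    for ci, members in enumerate(clusters):
        others = [q for cj, other in enumerate(clusters) if cj != ci for q in other]
        for p in members:
            if others and min(math.dist(p, q) for q in others) >= second_threshold:
                flagged.append(p)
    return flagged


clusters = [[(0, 0), (0, 1)], [(1, 0), (1, 1)], [(20, 20)]]
print(global_isolated(clusters, second_threshold=10.0))  # [(20, 20)]
```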
S2, based on each preprocessed data set, obtaining the initial position feature and initial importance feature of each item node in the data set by random initialization from a Gaussian distribution.
Here, the position feature represents the environment information provided by the adjacent nodes, while the importance feature represents the node's unique support relations; compared with the position feature, the importance feature is more discriminative.
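A minimal sketch of the Gaussian initialization in S2. The text does not give the feature dimension or the distribution parameters, so `dim`, `mean`, `std`, and the seed below are assumptions, as are the hypothetical item-node identifiers.

```python
import random


def init_features(node_ids, dim, mean=0.0, std=0.1, seed=0):
    """Draw initial position and importance features for each item node from N(mean, std^2)."""
    rng = random.Random(seed)  # fixed seed makes the initialization reproducible
    position = {n: [rng.gauss(mean, std) for _ in range(dim)] for n in node_ids}
    importance = {n: [rng.gauss(mean, std) for _ in range(dim)] for n in node_ids}
    return position, importance


position, importance = init_features(["expense_001", "expense_002"], dim=4)
```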
S3, calculating the support degree given to each item node by its adjacent nodes using the initial position features and initial importance features, and calculating, from the normalized support degrees, the aggregate feature of each item node, which fuses the features of its adjacent nodes.
The support degree given by the adjacent node of each item node is specifically:
[formula not recoverable from the source text]
In the formula, $s_{mn}$ indicates the degree of support given by neighboring node $n$ to item node $m$, $\mathbf{r}_m$ represents the initial importance feature of item node $m$, and $\mathbf{r}_n$ represents the initial importance feature of neighboring node $n$;
further, the aggregate feature of each item node, which fuses the features of its adjacent nodes, is specifically:
$$\mathbf{a}_m=\sum_{n\in\mathcal{N}(m)}\tilde{s}_{mn}\,\mathbf{p}_n;$$
In the formula, $\mathbf{a}_m$ represents the aggregate feature of item node $m$, $\mathcal{N}(m)$ represents the set of neighboring nodes of item node $m$, $\tilde{s}_{mn}$ represents the support degree given by neighboring node $n$ to item node $m$ after normalization by the softmax function over $\mathcal{N}(m)$, and $\mathbf{p}_n$ represents the initial position feature of neighboring node $n$.
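The aggregation in S3 can be sketched as a softmax-weighted sum of the neighbours' position features. Because the support formula is not legible in the source, this sketch assumes the support that a neighbour gives to an item node is the dot product of their initial importance features; that choice, the `neighbours` dictionary, and all function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    top = max(xs)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_feature(m, neighbours, position, importance):
    """Softmax-weighted sum of the neighbours' initial position features."""
    nbrs = neighbours[m]
    # ASSUMPTION: support of neighbour n for node m = dot product of their importance features
    support = [dot(importance[m], importance[n]) for n in nbrs]
    weights = softmax(support)
    dim = len(position[m])
    return [sum(w * position[n][d] for w, n in zip(weights, nbrs)) for d in range(dim)]


neighbours = {"m": ["a", "b"]}
importance = {"m": [1.0, 0.0], "a": [1.0, 0.0], "b": [1.0, 0.0]}
position = {"m": [0.0, 0.0], "a": [2.0, 0.0], "b": [0.0, 2.0]}
print(aggregate_feature("m", neighbours, position, importance))  # [1.0, 1.0]
```

Neighbours whose importance features agree more strongly with node m receive larger softmax weights, so their position features dominate m's aggregate feature; with equal support, as here, the result is the plain average.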
S4, constructing a similarity function for each item node using the initial position features and the aggregate features, and determining the initial importance features at which the similarity function is maximized as the target importance features of each item node, wherein the maximum value of the similarity function is 100%, i.e., 1.
Specifically, the similarity function is:
[formula not recoverable from the source text]
In the formula, $J_m$ represents the value of the objective function, $\mathbf{a}_m$ represents the aggregate feature of item node $m$, $\mathbf{p}_m$ represents the initial position feature of item node $m$, and the superscript T denotes the vector transpose operation.
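S4 can be approximated by gradient ascent on the importance features while the position features stay fixed. The objective below is an assumption: it scores each node with the logistic sigmoid of the inner product of its aggregate and position features, which is bounded above by 1 as S4 states, and it reuses the dot-product support assumed for S3. Gradients are estimated by central finite differences to keep the sketch dependency-free; `lr`, `steps`, and `eps` are illustrative.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    # numerically stable logistic function, bounded by (0, 1)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def aggregate_feature(m, neighbours, position, importance):
    nbrs = neighbours[m]
    weights = softmax([dot(importance[m], importance[n]) for n in nbrs])
    return [sum(w * position[n][d] for w, n in zip(weights, nbrs)) for d in range(len(position[m]))]


def total_similarity(neighbours, position, importance):
    # ASSUMED objective: sigmoid(a_m . p_m), summed over nodes that have neighbours
    return sum(
        sigmoid(dot(aggregate_feature(m, neighbours, position, importance), position[m]))
        for m, nbrs in neighbours.items() if nbrs
    )


def fit_importance(neighbours, position, importance, lr=0.5, steps=200, eps=1e-5):
    """Gradient ascent on the importance features using central finite differences.

    Position features stay fixed; the returned importance features play the role
    of the target importance features.
    """
    imp = {n: list(v) for n, v in importance.items()}
    for _ in range(steps):
        grads = {}
        for n, vec in imp.items():
            grad = []
            for d in range(len(vec)):
                original = vec[d]
                vec[d] = original + eps
                up = total_similarity(neighbours, position, imp)
                vec[d] = original - eps
                down = total_similarity(neighbours, position, imp)
                vec[d] = original  # restore before probing the next coordinate
                grad.append((up - down) / (2 * eps))
            grads[n] = grad
        for n, grad in grads.items():
            imp[n] = [x + lr * g for x, g in zip(imp[n], grad)]
    return imp


neighbours = {"m": ["a", "b"]}
position = {"m": [1.0, 0.0], "a": [1.0, 0.0], "b": [-1.0, 0.0]}
importance = {"m": [0.1, 0.1], "a": [0.1, 0.0], "b": [0.0, 0.1]}
target = fit_importance(neighbours, position, importance)
print(total_similarity(neighbours, position, importance), total_similarity(neighbours, position, target))
```

In this toy graph, neighbour `a` points in the same direction as node `m` and neighbour `b` in the opposite one, so the optimized importance features shift support toward `a` and the objective rises above its initial value of 0.5.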
S5, calculating the matching degree between item nodes in data sets of different types according to the target importance features of the item nodes in each data set, determining any two item nodes whose matching degree is not smaller than a threshold as associated items, and auditing each data set based on the associated items.
Specifically, when data sets of different types are audited jointly, finding the relations among the items in the two data sets provides a reference for the audit process, which improves the audit work and the quality of the data materials used for it, helps organizations and institutions carry out audits more efficiently, improves audit quality, and reduces risk while ensuring compliance.
The matching degree between item nodes is specifically:
[formula and its two auxiliary definitions not recoverable from the source text]
In the formula, $M_{mn}$ represents the matching degree between item node $m$ and item node $n$, the superscript T denotes the vector transpose operation, $\mathbf{r}^{*}_{m}$ and $\mathbf{r}^{*}_{n}$ represent the target importance features of item nodes $m$ and $n$, $\mathbf{W}_{m}$ and $\mathbf{b}_{m}$ represent the weight matrix and bias term corresponding to item node $m$, and $\mathbf{W}_{n}$ and $\mathbf{b}_{n}$ represent the weight matrix and bias term corresponding to item node $n$.
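A sketch of S5 under an assumed form of the matching degree: each target importance feature is passed through an affine map (weight matrix and bias) for its data set, and the logistic sigmoid of the inner product of the two results is taken. The text lists these ingredients, but the exact formula is not legible, and sharing one weight matrix and bias per data set is also an assumption. The item identifiers echo the economic-activity and financial-balance example and are hypothetical.

```python
import math


def affine(W, x, b):
    """Compute W x + b for a weight matrix W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def sigmoid(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def matching_degree(r_m, r_n, W_m, b_m, W_n, b_n):
    # ASSUMED form: sigmoid((W_m r_m + b_m)^T (W_n r_n + b_n))
    h_m = affine(W_m, r_m, b_m)
    h_n = affine(W_n, r_n, b_n)
    return sigmoid(sum(a * c for a, c in zip(h_m, h_n)))


def associated_items(items_a, items_b, params_a, params_b, threshold):
    """All cross-data-set pairs whose matching degree is not smaller than threshold."""
    W_a, b_a = params_a
    W_b, b_b = params_b
    pairs = []
    for id_a, r_a in items_a.items():
        for id_b, r_b in items_b.items():
            score = matching_degree(r_a, r_b, W_a, b_a, W_b, b_b)
            if score >= threshold:
                pairs.append((id_a, id_b, score))
    return pairs


identity = [[1.0, 0.0], [0.0, 1.0]]
zero_bias = [0.0, 0.0]
activities = {"event_17": [2.0, 0.0]}                       # hypothetical economic-activity item
balances = {"item_03": [2.0, 0.0], "item_09": [0.0, 2.0]}  # hypothetical financial-balance items
pairs = associated_items(activities, balances, (identity, zero_bias), (identity, zero_bias), 0.9)
print([(a, b) for a, b, _ in pairs])  # [('event_17', 'item_03')]
```

With identity weights and zero biases, `event_17` and `item_03` score sigmoid(4), about 0.98, and are reported as associated items, while the orthogonal `item_09` scores 0.5 and falls below the threshold.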
Embodiment 2. This embodiment of the application provides an audit system based on big data, which performs the method of Embodiment 1 and, as shown in FIG. 2, comprises the following modules:
The first module is used for acquiring at least two data sets to be audited of different types, and preprocessing the data sets, wherein the preprocessing comprises data cleaning and abnormal data detection.
And the second module is used for randomly initializing by utilizing Gaussian distribution based on each preprocessed data set to obtain the initial position characteristic and the initial importance characteristic of each item node in the data set.
The third module is used for calculating the support degree given to each item node by its adjacent nodes using the initial position features and initial importance features, and for calculating, from the normalized support degrees, the aggregate feature of each item node, which fuses the features of its adjacent nodes.
And a fourth module, configured to construct a similarity function of each item node using the initial position feature and the aggregate feature, and determine an initial importance feature when the similarity function is maximized as a target importance feature of each item node.
And a fifth module, configured to calculate, according to the target importance characteristics of each item node in at least two data sets, a matching degree between each item node in different types of data sets, determine two item nodes with matching degrees not smaller than a threshold value as related items, and audit each data set based on each related item.
Embodiment 3. An embodiment of the present application provides an electronic device which, as shown in FIG. 3, comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the method of Embodiment 1 when executing the computer program.
Embodiment 4. The present application provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the method of Embodiment 1.
The foregoing description of the embodiments is provided to illustrate the general principles of the invention and is not intended to limit the scope of the invention or to restrict the invention to the particular embodiments; any modifications, equivalent substitutions, improvements, and the like made within the spirit and principles of the invention are intended to be included within its scope.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202510205822.1A CN119691470B (en) | 2025-02-25 | 2025-02-25 | Audit method based on big data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202510205822.1A CN119691470B (en) | 2025-02-25 | 2025-02-25 | Audit method based on big data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN119691470A CN119691470A (en) | 2025-03-25 |
CN119691470B true CN119691470B (en) | 2025-06-13 |
Family
ID=95027844
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202510205822.1A Active CN119691470B (en) | 2025-02-25 | 2025-02-25 | Audit method based on big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN119691470B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113076352A (en) * | 2021-03-17 | 2021-07-06 | 远光软件股份有限公司 | Auditing method, electronic device and storage medium |
CN113657549A (en) * | 2021-08-31 | 2021-11-16 | 平安医疗健康管理股份有限公司 | Medical data auditing method, device, equipment and storage medium |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8917902B2 (en) * | 2011-08-24 | 2014-12-23 | The Nielsen Company (Us), Llc | Image overlaying and comparison for inventory display auditing |
JP7012895B1 (en) * | 2021-07-19 | 2022-01-28 | 株式会社Tkc | Accounting systems, methods, and programs |
CN118312909B (en) * | 2024-06-06 | 2024-10-18 | 湖南三湘银行股份有限公司 | Bank auditing method and system based on deep neural network |
CN118396684B (en) * | 2024-06-26 | 2024-09-20 | 广东省广告集团股份有限公司 | User advertisement recommendation method and device based on fused neural network and model construction method thereof |
2025
- 2025-02-25 CN CN202510205822.1A patent/CN119691470B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN119691470A (en) | 2025-03-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US12216683B1 (en) | Systems and methods for generating and implementing knowledge graphs for knowledge representation and analysis | |
US10515090B2 (en) | Data extraction and transformation method and system | |
US20220075762A1 (en) | Method for classifying an unmanaged dataset | |
RU2268488C2 (en) | Method and system for data organization | |
US6542896B1 (en) | System and method for organizing data | |
CN102197406B (en) | fuzzy data manipulation | |
CN105469096B (en) | A kind of characteristic bag image search method based on Hash binary-coding | |
CN108647322B (en) | Method for identifying similarity of mass Web text information based on word network | |
CN107291895B (en) | A Fast Hierarchical Document Query Method | |
CN107463665A (en) | A kind of data correlation rule mining algorithms | |
US11188981B1 (en) | Identifying matching transfer transactions | |
CN109582783B (en) | Hot topic detection method and device | |
CN108564009A (en) | A kind of improvement characteristic evaluation method based on mutual information | |
US20220229854A1 (en) | Constructing ground truth when classifying data | |
CN109992676A (en) | A kind of cross-media resource retrieval method and retrieval system | |
CN119807912A (en) | Abnormal data detection method based on improved differential privacy and clustering algorithm optimization | |
CN119691470B (en) | Audit method based on big data | |
CN114328600A (en) | Method, device, equipment and storage medium for determining standard data element | |
CN115964658B (en) | A clustering-based classification label updating method and system | |
Gabor-Toth et al. | Linking Deutsche Bundesbank Company Data | |
CN111625530A (en) | Large-scale vector retrieval method and device | |
CN114328844B (en) | A text data set management method, device, equipment and storage medium | |
CN113988878B (en) | Graph database technology-based anti-fraud method and system | |
CN115186138A (en) | A method and terminal for comparison of distribution network data | |
CN111753084B (en) | Short text feature extraction and classification method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |