CN110502552B - Classification data conversion method based on fine-tuning conditional probability - Google Patents
- Publication number
- CN110502552B
- Authority
- CN
- China
- Prior art keywords
- data
- fine
- tuning
- conditional probability
- numerical
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2216/00—Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
- G06F2216/03—Data mining
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Computational Linguistics (AREA)
- Fuzzy Systems (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Computing Systems (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
Description
Technical field
The invention relates to data preprocessing in the fields of data mining and machine learning, and in particular to a categorical data conversion method based on fine-tuned conditional probabilities.
Background art
In a data mining or machine learning task, the collected data usually contains two types of attributes: numerical and categorical. However, most machine learning algorithms (such as neural networks, support vector machines and logistic regression) can only process numerical data directly; only a few, such as decision trees and Bayesian classifiers, can process categorical data directly. In addition, algorithms that process numerical data directly usually perform more efficiently than algorithms that process categorical data directly. To make numerical-input machine learning algorithms broadly usable, categorical data must be converted into numerical data. Many categorical data conversion methods have been proposed in China and abroad. However, most of them share one defect: they convert categorical data into low-quality numerical data that deviates from the true distribution of the original data, which degrades the performance and reliability of the machine learning algorithm in the next stage. Developing an efficient and well-founded categorical data conversion method is therefore extremely important.
Among the many methods for converting categorical data into numerical data, the most common is one-hot encoding, which converts each categorical value of a categorical attribute into a high-dimensional 0-1 vector. When an attribute has a large number of distinct values (a high cardinality), this method easily runs into the curse of dimensionality, increasing both the data storage overhead and the running time of downstream machine learning algorithms. To address this, patent CN109740680A discloses a classification method and system for approval data with mixed-value attributes: after one-hot encoding produces high-dimensional numerical data, a neural network performs deep encoding to reduce the attribute dimension, but finding a good neural network structure takes a great deal of time. Patent US20190164083A1 discloses a categorical data conversion and clustering method for machine learning in natural language processing, which likewise applies one-hot encoding first and then uses a clustering algorithm to reduce the attribute dimension. Apart from one-hot encoding and its refinements, patent CN109255373A discloses a data processing method for digitizing categorical data, but it applies only to environmental problems such as land use and soil type and does not generalize.
Granted patent US9619757B2 discloses a nominal attribute conversion method based on outcome likelihood, which converts each categorical value into the likelihood (or probability) of that value appearing in the data set. This method does not consider class label information and may therefore lose part of the information.
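To make the contrast between these baselines concrete, here is a minimal Python sketch of one-hot encoding, whose width equals the attribute's cardinality, and of a frequency encoding in the spirit of the likelihood-based approach described above (each value replaced by its occurrence probability in the data set, without using the class label). The function names and toy data are hypothetical; this is not code from any of the cited patents.

```python
from collections import Counter


def one_hot_encode(column):
    """One-hot encoding: one 0-1 slot per distinct value of the attribute."""
    categories = sorted(set(column))
    index = {value: k for k, value in enumerate(categories)}
    encoded = []
    for value in column:
        vector = [0] * len(categories)
        vector[index[value]] = 1
        encoded.append(vector)
    return encoded


def frequency_encode(column):
    """Replace each value by its relative frequency in the column; class labels are ignored."""
    counts = Counter(column)
    n = len(column)
    return [counts[value] / n for value in column]


# Hypothetical toy attribute with 5 distinct values.
city = ["A", "B", "C", "D", "E", "A", "A", "B"]
print(len(one_hot_encode(city)[0]))  # → 5 (width equals the cardinality)
print(frequency_encode(city)[:3])    # → [0.375, 0.25, 0.125]
```

With thousands of distinct values, the one-hot width grows to thousands of columns per attribute. The frequency encoding stays one-dimensional, but it assigns the same number to any two values that happen to be equally common, regardless of how they relate to the class label.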
Taking class label information into account, Kasif et al. proposed a conversion method based on memory-based reasoning that converts each categorical value of a categorical attribute into a conditional probability vector. However, they did not feed the converted vectors to numerical-input machine learning algorithms; they used them only to compute distances between categorical values. Hernández-Pereira et al. applied these conditional probabilities as input to a neural network and achieved good experimental results on an intrusion detection problem. Because it uses class label information, the memory-based reasoning conversion yields higher-quality numerical data. However, our in-depth analysis of this method shows that it relies on the attribute independence assumption, namely that the attributes in the data set are mutually independent. The assumption is violated whenever attributes depend on one another (and attributes are usually interdependent), so the converted conditional probabilities are less reliable and deviate somewhat from the true distribution of the original data.
Summary of the invention
The purpose of the present invention is to provide a categorical data conversion method based on fine-tuned conditional probabilities that converts the categorical values in a categorical data set into high-quality numerical vectors, so that the converted numerical data still preserves the true distribution of the original data. This improves the classification performance of the next-stage machine learning algorithm and ensures the reliability of the data mining task.
The categorical data conversion method based on fine-tuned conditional probabilities proposed by the present invention comprises:
S1. Data collection: collecting categorical data;
S2. Data preprocessing: cleaning missing, noisy and invalid entries from the categorical data;
S3. Conditional probability calculation: converting the cleaned categorical data into numerical vectors;
S4. Fine-tuning of conditional probabilities: numerically fine-tuning the vectors obtained in step S3;
S5. Numerical embedding of categorical data: using the fine-tuned vectors from step S4 to embed, or map, the original categorical data into numerical data.
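Step S2 does not fix concrete cleaning rules. The following is a minimal Python sketch, assuming that missing entries are encoded by a small set of markers and that "invalid" means a value outside a known per-attribute vocabulary. The names `MISSING_MARKERS` and `clean_rows` are hypothetical. Noise filtering is omitted because the description does not say how noisy entries are identified.

```python
# Assumption: these markers denote a missing entry; adapt to the real data source.
MISSING_MARKERS = {None, "", "?", "NA"}


def clean_rows(rows, labels, allowed=None):
    """Sketch of step S2: drop samples with missing or invalid categorical entries.

    rows: list of samples, each a list of categorical values
    labels: class label of each sample
    allowed: optional list with one set of valid values per attribute
    """
    kept_rows, kept_labels = [], []
    for row, label in zip(rows, labels):
        if label in MISSING_MARKERS or any(value in MISSING_MARKERS for value in row):
            continue  # missing data
        if allowed is not None and any(value not in valid for value, valid in zip(row, allowed)):
            continue  # invalid data: value outside the attribute's vocabulary
        kept_rows.append(row)
        kept_labels.append(label)
    return kept_rows, kept_labels


rows = [["male", "married"], ["female", "?"], ["female", "single"], ["male", "widowed"]]
labels = ["no", "yes", "yes", "no"]
allowed = [{"male", "female"}, {"married", "single"}]
print(clean_rows(rows, labels, allowed))
# → ([['male', 'married'], ['female', 'single']], ['no', 'yes'])
```

Dropping whole samples is the simplest choice. Imputing missing values instead would keep more data, but the description does not indicate which strategy is intended.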
The categorical data conversion method based on fine-tuned conditional probabilities of the present invention has the following beneficial effects:
Reliability: the categorical values in a categorical data set are converted into high-quality numerical vectors, and the converted numerical data preserves the true distribution of the original data, ensuring the reliability of the data mining task;
High performance: when the converted data is fed to the next-stage machine learning algorithm, high performance metrics (high accuracy, recall, F-score, etc.) are achieved;
Efficiency: the converted data has far fewer dimensions than one-hot encoded data, and the conversion requires less running time than one-hot encoding and its refinements;
Convenience: only a few parameters need to be preset, which spares the user the trouble of parameter tuning and suits practical application scenarios;
Generality: as a data-driven conversion method, it adapts to a wide range of categorical data sets.
Description of drawings
FIG. 1 is an algorithm flowchart of the categorical data conversion method based on fine-tuned conditional probabilities according to an embodiment of the present invention;
FIG. 2 is a diagram of the actual application environment of the method according to an embodiment of the present invention;
FIG. 3 is a sample categorical data matrix used by the method according to an embodiment of the present invention;
FIG. 4 is an architecture diagram of an application system using the method according to an embodiment of the present invention;
FIG. 5 is an example of categorical data conversion performed by the method according to an embodiment of the present invention;
In the figures: 101, data collection; 102, network; 103, database; 104, service system; 105, user equipment; 200, categorical data sample; 301, data conversion module; 302, classifier module; 303, analysis report; 401, conditional probability calculation; 402, valid range estimation; 403, fine-tuning of conditional probabilities; 404, post-fine-tuning verification; 405, condition check; 501, categorical data set; 502, categorical attribute; 505, numerical data set.
Detailed description of the embodiments
The categorical data conversion method based on fine-tuned conditional probabilities of the present invention is further described below with reference to the drawings and specific embodiments. The following embodiments serve only to illustrate the technical solution of the present invention more clearly and do not limit its scope of protection.
As shown in FIG. 1, the present invention is a categorical data conversion method based on fine-tuned conditional probabilities, comprising:
S1. Data collection 101 of categorical data;
S2. Data preprocessing: cleaning missing, noisy and invalid entries from the categorical data;
S3. Conditional probability calculation 401: converting the cleaned categorical data into numerical vectors;
S4. Fine-tuning of conditional probabilities: numerically fine-tuning the vectors obtained in step S3;
S5. Numerical embedding of categorical data: using the fine-tuned vectors from step S4 to embed, or map, the original categorical data into numerical data.
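Let $X$ be a categorical data set 501 containing $N$ samples, where each sample $x$ is represented by an $m$-dimensional vector $[a_1(x), \ldots, a_m(x)]$ and $a_i(x)$ is the categorical value of the $i$-th attribute of $x$. The class label of $X$ is $C = \{c_1, \ldots, c_l\}$. In the flowchart of the algorithm, conditional probability calculation 401 first extracts the data of each categorical attribute 502 $A_i$ and of the class label $C$, then computes the conditional probabilities of each categorical value $a_i(x)$ in attribute $A_i$ and generates the following $l$-dimensional numerical vector: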
$$a_i(x) \rightarrow \left[P(c_1 \mid a_i(x)), \ldots, P(c_j \mid a_i(x)), \ldots, P(c_l \mid a_i(x))\right] \qquad (1)$$
The conditional probability term $P(c_j \mid a_i(x))$ in formula (1) is computed by Bayesian estimation with Laplace smoothing (formula (2)).
In formula (2), $I(x, y)$ is an indicator function: $I(x, y) = 1$ when $x = y$, and $I(x, y) = 0$ otherwise; $\lambda \ge 0$ is the Laplace smoothing factor.
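Formula (2) itself is not reproduced in this text. The sketch below assumes the usual Laplace-smoothed Bayesian estimate, P(c_j | a) = (count(a, c_j) + λ) / (count(a) + l·λ). This form is consistent with the indicator function and the smoothing factor λ described above, but the source does not confirm it. It builds the vector of formula (1) for every value of one attribute. The function name and toy data are hypothetical.

```python
from collections import Counter, defaultdict


def conditional_probability_table(column, labels, classes, lam=1.0):
    """Map each value a of one attribute to [P(c_1|a), ..., P(c_l|a)] as in formula (1).

    Uses the standard Laplace-smoothed estimate (assumed form of formula (2)):
        P(c_j | a) = (count(a, c_j) + lam) / (count(a) + l * lam)
    With lam = 0 this reduces to plain relative frequencies.
    """
    value_counts = Counter(column)
    joint_counts = defaultdict(Counter)  # joint_counts[a][c] = count(a, c)
    for value, label in zip(column, labels):
        joint_counts[value][label] += 1
    n_classes = len(classes)  # l in formula (1)
    table = {}
    for value, n_value in value_counts.items():
        denominator = n_value + n_classes * lam
        table[value] = [(joint_counts[value][c] + lam) / denominator for c in classes]
    return table


# Hypothetical toy data: marital status vs. a fraud label.
marital = ["married", "married", "single", "single", "single"]
fraud = ["no", "yes", "no", "no", "yes"]
table = conditional_probability_table(marital, fraud, classes=["no", "yes"], lam=1.0)
print(table["married"])  # → [0.5, 0.5]
print(table["single"])   # → [0.6, 0.4]
```

Each vector sums to 1 as long as every label appears in `classes`. A positive λ also keeps every entry strictly between 0 and 1, even for class/value pairs that never co-occur.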
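Valid range estimation 402 uses the Valid Ranges algorithm to compute, for each categorical value $a_i(x)$ of attribute $A_i$, the valid range $[P_{\min}(c_j \mid a_i), P_{\max}(c_j \mid a_i)]$, where $0 \le P_{\min}(c_j \mid a_i) \le P_{\max}(c_j \mid a_i) \le 1$.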
Fine-tuning of conditional probabilities 403 proceeds as follows:
S01. If the number of samples correctly classified with the term $P(c_j \mid a_i(x))$ exceeds the number misclassified, i.e. $\mathrm{Neg\_ratio}(a_i, c_j) > \mathrm{pos\_ratio}(a_i, c_j)$, fine-tune this term; otherwise exit the fine-tuning process;
S02. Compute the absolute value of the difference between the average valid range of the categorical value $a_i(x)$ and the conditional probability $P(c_j \mid a_i(x))$;
S03. Update the conditional probability $P(c_j \mid a_i(x))$ using this absolute difference;
S04. Normalize the updated conditional probabilities.
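Steps S02 and S03 refer to formulas that are not reproduced here, so the following Python sketch is only one hypothetical reading. It takes the "average valid range" as the midpoint of $[P_{\min}, P_{\max}]$, moves the term toward that midpoint by a fraction `eta` of the absolute gap, and then renormalizes the vector (S04). The S01 test is passed in as a boolean because Neg_ratio and pos_ratio are not defined in this text. The function name and `eta` are assumptions; this is not the patented update rule.

```python
def fine_tune_term(probs, j, valid_range, should_tune, eta=0.5):
    """Hypothetical sketch of steps S01-S04 for a single term P(c_j | a_i(x)).

    probs: conditional probability vector [P(c_1|a), ..., P(c_l|a)]
    j: index of the class whose term is tuned
    valid_range: (p_min, p_max) for this term, from valid range estimation 402
    should_tune: outcome of the S01 test, computed elsewhere
    eta: ASSUMED step size; the source gives no explicit update formula
    """
    if not should_tune:  # S01: condition not met, leave the vector unchanged
        return list(probs)
    p_min, p_max = valid_range
    midpoint = (p_min + p_max) / 2      # S02: "average valid range", assumed to be the midpoint
    gap = abs(midpoint - probs[j])      # S02: absolute difference
    step = eta * gap if midpoint >= probs[j] else -eta * gap
    updated = list(probs)
    updated[j] = probs[j] + step        # S03: move the term toward the midpoint (assumed rule)
    total = sum(updated)                # S04: renormalize so the vector sums to 1
    return [p / total for p in updated]


print(fine_tune_term([0.2, 0.8], j=0, valid_range=(0.3, 0.5), should_tune=True))
# roughly [0.273, 0.727]
```

Renormalizing after changing one entry also shifts the other classes' entries slightly. This is the intent of S04 as described: keeping the vector a valid probability distribution.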
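Post-fine-tuning verification 404 uses a machine learning classifier to check whether the fine-tuned conditional probabilities perform better, that is, whether the fine-tuning fits the distribution of the original data more faithfully.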
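Condition check 405 determines whether the performance of the conditional probabilities improved in post-fine-tuning verification 404. If it improved, this round of fine-tuning is effective, and the process returns to fine-tuning 403 to continue; otherwise fine-tuning terminates and the program exits. In addition, to prevent the fine-tuning process from looping forever, the number of fine-tuning rounds is limited to a preset maximum of 1000.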
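The verification loop can be sketched generically in Python. `tune_once` and `evaluate` are hypothetical placeholders: the first stands for fine-tuning 403, and the second for training and scoring a classifier in verification 404. The description does not say whether the last, non-improving round is kept, so this sketch assumes it is discarded.

```python
def fine_tune_until_no_gain(table, tune_once, evaluate, max_rounds=1000):
    """Sketch of the 403 -> 404 -> 405 loop: keep fine-tuning while validation improves.

    table: current conditional probability table (any object)
    tune_once: hypothetical callable, table -> fine-tuned table (one pass of 403)
    evaluate: hypothetical callable, table -> classifier score (verification 404)
    max_rounds: cap on fine-tuning rounds; the description uses 1000
    """
    best_score = evaluate(table)
    for _ in range(max_rounds):
        candidate = tune_once(table)
        score = evaluate(candidate)
        if score <= best_score:  # condition check 405: no improvement, stop
            break                # and keep the last table that did improve (assumption)
        table, best_score = candidate, score
    return table, best_score


# Toy stand-ins: "tuning" adds 1, and the score peaks at 3.
result = fine_tune_until_no_gain(0, lambda t: t + 1, lambda t: -(t - 3) ** 2)
print(result)  # → (3, 0)
```

Requiring a strict improvement (`score <= best_score` stops the loop) prevents endless rounds on a plateau, in addition to the 1000-round cap.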
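The computing environment diagram (FIG. 2) comprises four functional blocks coupled by a communication network 102: data collection 101, storage database 103, data mining service system 104 and user equipment 105. The data collection terminals 101 may be desktop computers, laptops or mobile devices that automatically collect useful categorical data online (such as e-commerce web page data or medical monitoring data). Alternatively, the categorical data set 501 may be collected manually and then entered into the system (such as market survey data or census data). The data collection terminal 101 sends the collected categorical data set 501 over the network 102 to the database 103 for storage. The database storing the categorical data set 501 may be a local workstation, a remote server or a cloud data server. The user sends a request through the user equipment 105 to the service system 104 to analyze a data mining task (such as credit card fraud detection). On receiving the request, the service system 104 retrieves the corresponding categorical data set 501 from the database 103, performs the data mining analysis, and returns the analysis report 303 to the user equipment 105 for the user to review and make decisions.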
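Data collection 101 stores the collected categorical data sets 501 in the database 103; an example of such a data set is shown in FIG. 3. The categorical data sample 200 is a credit card fraud detection data matrix in which each row represents a credit customer and each column describes the customer's basic information (attributes such as gender, marital status, income and credit history). These attributes are categorical (for example, gender takes the values "male" and "female") rather than numerical (such as 0.12 or 1.85).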
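When the user equipment 105 requests the service system 104 to analyze a data mining task, the service system 104 serving the user is structured as shown in FIG. 4. The service system 104 first retrieves the corresponding categorical data set 501 from the database 103, then runs the data conversion module 301 and the classifier module 302, and compiles the analysis report 303. The data conversion module 301 of the present invention converts the categorical data in the database 103 into high-quality numerical data. It comprises four submodules: a data preprocessing submodule with data cleaning functions (such as cleaning missing and noisy data), a conditional probability calculation submodule, a conditional probability fine-tuning submodule and a numerical embedding submodule (data embedding or data mapping). The converted numerical data is fed into the classifier module 302. The classifier module 302 selects a suitable machine learning model (such as a neural network, support vector machine or logistic regression) and loss function (squared loss, 0-1 loss, cross-entropy loss, logarithmic loss, etc.) to train a classifier. The classifier in the classifier module 302 then evaluates the data converted by the data conversion module 301 and produces the analysis report 303. The analysis report 303 mainly contains the predicted sample labels and an evaluation of the classifier's performance and efficiency.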
An embodiment of the data conversion module 301:
The data conversion module 301 converts the categorical data in a data set into numerical data; a credit card fraud detection data set is used as an example below. The data set comes from the credit card department of a bank in a certain city, which collected 284,807 data records in 2013, each with 28 categorical attributes 502. An example of the data set is shown as categorical data sample 200.
The operation steps are as follows:
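Step1: the data preprocessing submodule of the data conversion module 301 cleans the raw data (removing missing data, noisy data, etc.) to obtain the processed categorical data set 501;
Step2: extract each categorical attribute 502 and the class label from the categorical data set 501;
Step3: compute the conditional probabilities 401 with formulas (1) and (2); for example, the conditional probability vector for the value "married" is [0.15, 0.51, ...], the vector for "single" is [0.33, 0.12, ...], and so on;
Step4: fine-tune the conditional probabilities 403 as shown in FIG. 5 of the drawings; for example, the vector [0.15, 0.51, ...] for "married" becomes [0.13, 0.47, ...] after fine-tuning;
Step5: convert the categorical data of the categorical data set 501 using the fine-tuned conditional probabilities 403, and save the converted numerical data in the numerical data set 505.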
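Step5 does not spell out the layout of the numerical data set 505. A natural reading, assumed here, is that each of the m attributes contributes its l-dimensional vector and the vectors are concatenated, giving m·l numerical features per sample. For example, 28 attributes and a two-class label would give 56 columns. The function name and the tables are hypothetical.

```python
def embed_dataset(rows, tables):
    """Sketch of Step5: replace each categorical value by its (fine-tuned) vector.

    rows: samples, each a list of m categorical values
    tables: one dict per attribute mapping value -> length-l probability vector
    Returns numerical rows of length m * l (assumed concatenation layout).
    """
    embedded = []
    for row in rows:
        features = []
        for value, table in zip(row, tables):
            features.extend(table[value])  # KeyError for values unseen when tables were built
        embedded.append(features)
    return embedded


# Hypothetical two-attribute, two-class tables.
tables = [
    {"male": [0.6, 0.4], "female": [0.7, 0.3]},
    {"married": [0.5, 0.5], "single": [0.6, 0.4]},
]
print(embed_dataset([["male", "married"], ["female", "single"]], tables))
# → [[0.6, 0.4, 0.5, 0.5], [0.7, 0.3, 0.6, 0.4]]
```

The width depends only on m and l, not on the number of distinct values per attribute, which is the source of the dimensionality advantage over one-hot encoding. The description does not address how to handle values that were absent when the tables were built.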
Claims (2)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910770010.6A CN110502552B (en) | 2019-08-20 | 2019-08-20 | Classification data conversion method based on fine-tuning conditional probability |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910770010.6A CN110502552B (en) | 2019-08-20 | 2019-08-20 | Classification data conversion method based on fine-tuning conditional probability |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110502552A CN110502552A (en) | 2019-11-26 |
CN110502552B true CN110502552B (en) | 2022-10-28 |
Family
ID=68588872
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910770010.6A Active CN110502552B (en) | 2019-08-20 | 2019-08-20 | Classification data conversion method based on fine-tuning conditional probability |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110502552B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111444400A (en) * | 2020-04-07 | 2020-07-24 | 中国汽车工程研究院股份有限公司 | Force and Flow Field Data Management Methods |
CN114549178A (en) * | 2022-02-23 | 2022-05-27 | 中国工商银行股份有限公司 | Credit evaluation method, credit evaluation device, electronic device and medium |
CN115264048B (en) * | 2022-07-26 | 2023-05-23 | 重庆大学 | Intelligent gear decision design method for automatic transmission based on data mining |
CN117009339A (en) * | 2023-08-15 | 2023-11-07 | 中国银行股份有限公司 | Data cleaning method, device, equipment and readable storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103294828A (en) * | 2013-06-25 | 2013-09-11 | 厦门市美亚柏科信息股份有限公司 | Verification method and verification device of data mining model dimension |
CN104391860A (en) * | 2014-10-22 | 2015-03-04 | 安一恒通(北京)科技有限公司 | Content type detection method and device |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7020593B2 (en) * | 2002-12-04 | 2006-03-28 | International Business Machines Corporation | Method for ensemble predictive modeling by multiplicative adjustment of class probability: APM (adjusted probability model) |
US10558766B2 (en) * | 2015-12-31 | 2020-02-11 | Palo Alto Research Center Incorporated | Method for Modelica-based system fault analysis at the design stage |
-
2019
- 2019-08-20 CN CN201910770010.6A patent/CN110502552B/en active Active
Non-Patent Citations (2)
Title |
---|
A probabilistic framework for memory-based reasoning; Simon Kasif et al.; Artificial Intelligence; 1998-09-30; Vol. 104, No. 1-2; pp. 287-311 * |
Handling nominal features in anomaly intrusion detection problems; Mei-Ling Shyu et al.; 15th International Workshop on Research Issues in Data Engineering: Stream Data Mining and Applications; 2005-09-06; pp. 55-62 * |
Also Published As
Publication number | Publication date |
---|---|
CN110502552A (en) | 2019-11-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |