CN104731916A - Optimizing initial center K-means clustering method based on density in data mining - Google Patents
- Publication number
- CN104731916A CN104731916A CN201510131975.2A CN201510131975A CN104731916A CN 104731916 A CN104731916 A CN 104731916A CN 201510131975 A CN201510131975 A CN 201510131975A CN 104731916 A CN104731916 A CN 104731916A
- Authority
- CN
- China
- Prior art keywords
- density
- data object
- data
- cluster
- initial
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
Abstract
The invention relates to a K-means clustering method for data mining that optimizes the initial centers based on density. The method comprises the following steps: step 1, a required data set is given and the cluster number K is determined; step 2, the density of every data object in the data set is calculated, and the average density of the data set is calculated from the obtained densities; step 3, the minimum density distance value of each data object in the data set is calculated; step 4, the minimum density distance values of the data objects are sorted in descending order and, according to the determined cluster number K, the data objects corresponding to the first K minimum density distance values whose density is greater than the average density are selected as the initial cluster centers; step 5, starting from the obtained initial cluster centers, the data set is clustered with the K-means clustering method until the clustering result is output. The method reduces computational complexity, improves classification accuracy, is highly stable, and converges quickly.
Description
Technical field
The present invention relates to a clustering method, in particular to a density-based K-means clustering method with optimized initial centers for data mining, and belongs to the technical field of cluster analysis.
Background technology
Data mining is one of the hot topics of current computer research. Cluster analysis, an unsupervised machine learning method, studies how to automatically divide a set of data objects into different clusters so that, under a given criterion, objects in the same cluster have high similarity while objects in different clusters have low similarity. Cluster analysis is widely applied in frontier disciplines such as machine learning, data mining, speech recognition, image segmentation, business analysis, and bioinformatics. Traditional clustering algorithms fall into five main classes: partition-based, hierarchical, density-based, grid-based, and model-based.
Among clustering algorithms, the K-means algorithm belongs to the partition-based class and is known for being concise, fast, and efficient. However, the original K-means algorithm has several defects: 1) it requires the user to supply the value K, i.e. the number of clusters, which is usually chosen by experience, so determining K correctly is difficult; 2) it is sensitive to the initial cluster centers, and the quality of the chosen initial centers affects both the clustering result and the efficiency of the algorithm; 3) it is sensitive to abnormal data, which can trap the result in a locally optimal solution.
At present, some scholars have improved on the initial-center problem; to keep the result from falling into a local optimum, widely dispersed points far from one another are usually chosen as initial center points. If only the distance factor is considered, however, outliers are easily chosen, which harms the clustering effect. Other scholars therefore filter out outliers from the angle of density. A remaining problem is that initial center points may still be chosen inside the same cluster: even if a point's density is relatively large, once some point of the corresponding cluster has already been selected as a center point, a representative point in another class should be selected instead; otherwise the result again tends to fall into a locally optimal solution.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art by providing a density-based K-means clustering method with optimized initial centers for data mining; it reduces computational complexity, improves classification accuracy, is highly stable, and converges quickly.
According to the technical scheme provided by the invention, the density-based K-means clustering method with optimized initial centers for data mining comprises the following steps:
Step 1: specify the required data set and determine the cluster number K.
Step 2: calculate the density of every data object in the data set, and calculate the average density of the data set from the obtained densities.
Step 3: calculate the minimum density distance value of each data object in the data set.
Step 4: sort the minimum density distance values of the data objects in descending order and, according to the determined cluster number K, select as the initial cluster centers the data objects corresponding to the first K minimum density distance values whose density is greater than the average density.
Step 5: starting from the initial cluster centers obtained above, cluster the data set with the K-means clustering method and output the clustering result.
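The five steps above can be sketched end to end. The sketch below is a minimal illustration, not the patent's reference implementation; the function and variable names are our own, and the neighbourhood radius R is taken as a given parameter:

```python
import math

def density_based_centers(X, K, R):
    """Steps 2-4 of the method: densities, average density, minimum
    density distance values, then the first K high-density objects."""
    n = len(X)
    # Step 2: density = number of objects strictly inside the R-neighbourhood
    dens = [sum(1 for xj in X if 0 < math.dist(xi, xj) < R) for xi in X]
    mean_dens = sum(dens) / n
    # Step 3: minimum density distance value of every object
    mdd = []
    for i in range(n):
        d_to = [math.dist(X[i], xj) for xj in X]
        higher = [d_to[j] for j in range(n) if dens[j] > dens[i]]
        mdd.append(min(higher) if higher else max(d_to))
    # Step 4: sort descending by MDD, keep only objects denser than average
    order = sorted(range(n), key=mdd.__getitem__, reverse=True)
    return [X[i] for i in order if dens[i] > mean_dens][:K]
```

The centers returned here then seed an ordinary K-means run (step 5).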
Step 5 comprises the following steps:
Step 5.1: according to the selected initial cluster centers, assign each data object in the data set to its nearest initial cluster center, and calculate the error sum of squares of the data objects over the K clusters to obtain the initial error sum of squares.
Step 5.2: after the data objects have been assigned to their nearest initial cluster centers, calculate the cluster center of each of the K clusters to obtain the revised cluster centers.
Step 5.3: according to the revised cluster centers, determine the error sum of squares of the data objects over the K clusters to obtain the revised error sum of squares.
Step 5.4: when the difference between the revised and the initial error sum of squares does not meet the convergence condition, take the revised cluster centers as the new initial cluster centers and repeat the above steps until the difference between the revised and the initial error sum of squares meets the convergence condition.
For a data set X = {x_i | i = 1, 2, ..., n} whose data objects have m-dimensional features, the density of a data object x_i is the number of data objects inside its neighbourhood:

density(x_i) = |{x_j ∈ X : 0 < d(x_i, x_j) < R}|

where d(x_i, x_j) is the Euclidean distance between data objects x_i and x_j, and R is the neighbourhood radius of x_i.
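Under this definition the density is simply a neighbour count; a minimal sketch (the helper name is our own):

```python
import math

def density(X, i, R):
    """Number of data objects strictly inside x_i's R-neighbourhood,
    excluding x_i itself (0 < d(x_i, x_j) < R)."""
    return sum(1 for xj in X if 0 < math.dist(X[i], xj) < R)
```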
For a data object x_i, compute the distances from x_i to every data object whose density is larger than that of x_i; the minimum density distance value of x_i is then the minimum of these distances. When x_i is the data object with the maximum density, its minimum density distance value is instead the maximum distance between x_i and any data object in the data set.
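This definition translates directly into code; a sketch assuming the densities have already been computed (names are our own):

```python
import math

def min_density_distance(X, dens, i):
    """Distance from x_i to its nearest strictly-higher-density object;
    for the densest object, the farthest distance in the data set."""
    d_to = [math.dist(X[i], xj) for xj in X]
    higher = [d_to[j] for j in range(len(X)) if dens[j] > dens[i]]
    return min(higher) if higher else max(d_to)
```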
Advantages of the invention: considering the overall characteristics of a class, ordinary points inside a class are distinguished from its maximum-density point. Point densities within the same class are correlated, and this correlation is reflected in the minimum distance from a point to any point of higher density: for an ordinary point, the point realizing this distance necessarily lies in the same cluster, so the distance value is small; for the center point of a class, i.e. the maximum-density point of the class, the point realizing this distance is a high-density point of another class, so the distance value is large. Ordinary points and center points can thus be distinguished. The distance value of an outlier is also large, but outliers are filtered out by the average density. In this way high-quality initial cluster center points are screened out, and the K-means clustering method is finally applied. Simulation results show that, compared with the existing K-means method, the invention achieves higher accuracy and fewer iterations and can converge quickly.
Brief description of the drawings
Fig. 1 is a schematic diagram of cluster center points selected at random.
Fig. 2 is a schematic diagram of the initial cluster center points obtained by the invention.
Fig. 3 shows the distribution of the data set containing 240 data objects.
Fig. 4 is the flow chart of the invention.
Embodiment
The invention is further described below in conjunction with the specific drawings and embodiments.
As shown in Fig. 4, in order to improve classification accuracy, stability, and convergence speed, the clustering method of the invention comprises the following steps:
Step 1: specify the required data set and determine the cluster number K.
In the embodiment of the invention, for a data set X = {x_i | i = 1, 2, ..., n} whose data objects have m-dimensional features, C_j (j = 1, 2, ..., K) denotes the K clusters and c_j (j = 1, 2, ..., K) denotes the initial cluster centers.
Step 2: calculate the density of every data object in the data set, and calculate the average density of the data set from the obtained densities.
In the embodiment of the invention, the density of a data object x_i in the data set is

density(x_i) = |{x_j ∈ X : 0 < d(x_i, x_j) < R}|

where d(x_i, x_j) is the Euclidean distance between data objects x_i and x_j, and R is the neighbourhood radius of x_i. The neighbourhood radius is obtained as follows: compute the distances between all data objects, sort them in ascending order, and take as R the value at position num × percent, where num is the number of distances between data objects (uniquely determined by the given data set) and percent is a coverage percentage; in a specific implementation, a percent of 1%-2% works well. For any data object x_i, the circular region centered on x_i with radius R is called the neighbourhood of x_i, i.e. neighbourhood = {x | 0 < d(x, x_i) < R}. The average density of the data set is the arithmetic mean of the densities of all data objects; the concrete computation is well known to those skilled in the art and is not repeated here.
Step 3: calculate the minimum density distance value of each data object in the data set.
In the embodiment of the invention, for a data object x_i, compute the distances from x_i to every data object whose density is larger than that of x_i; the minimum density distance value of x_i is then the minimum of these distances. When x_i is the data object with the maximum density, its minimum density distance value is instead the maximum distance between x_i and any data object in the data set.
Step 4: sort the minimum density distance values of the data objects in descending order and, according to the determined cluster number K, select as the initial cluster centers the data objects corresponding to the first K minimum density distance values whose density is greater than the average density.
In the embodiment of the invention, the minimum density distance values of the data objects are sorted in descending order, and the data objects corresponding to the first K minimum density distance values whose density is greater than the average density are chosen as the initial cluster centers, giving c_j (j = 1, 2, ..., K).
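Given the density and minimum-density-distance arrays, the selection in step 4 is a sort and a filter; a minimal sketch (names are our own):

```python
def choose_initial_centers(X, dens, mdd, K):
    """Descending sort by minimum density distance value; keep the first
    K objects whose density exceeds the average density."""
    mean_dens = sum(dens) / len(dens)
    order = sorted(range(len(X)), key=mdd.__getitem__, reverse=True)
    return [X[i] for i in order if dens[i] > mean_dens][:K]
```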
Step 5: starting from the initial cluster centers obtained above, cluster the data set with the K-means clustering method and output the clustering result.
In the embodiment of the invention, clustering the data set with the K-means clustering method is well known to those skilled in the art; in particular, step 5 comprises the following steps:
Step 5.1: according to the selected initial cluster centers, assign each data object in the data set to its nearest initial cluster center, and calculate the error sum of squares of the data objects over the K clusters to obtain the initial error sum of squares.
The initial error sum of squares is computed as

E = Σ_{i=1}^{K} Σ_{t=1}^{n_i} d(x_it, c_i)²

where x_it is the t-th data object of the i-th cluster and n_i is the number of data objects in the i-th cluster. The smaller the initial error sum of squares E, the higher the similarity within the clusters; conversely, the larger E, the lower the similarity.
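As a sketch, the error sum of squares over an explicit cluster assignment (names are our own):

```python
import math

def sum_of_squared_error(clusters, centers):
    """E = sum over all clusters of the squared distances from each
    data object to its cluster centre."""
    return sum(math.dist(x, c) ** 2
               for cluster, c in zip(clusters, centers)
               for x in cluster)
```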
Step 5.2: after the data objects have been assigned to their nearest initial cluster centers, calculate the cluster center of each of the K clusters to obtain the revised cluster centers.
In the embodiment of the invention, the revised cluster centers are obtained by

c_j = (1 / n_j) Σ_{x ∈ C_j} x,  j = 1, 2, ..., K

where n_j is the number of data objects in the j-th cluster.
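The revised centre of each cluster is the component-wise mean of its members; a minimal sketch:

```python
def revise_centers(clusters):
    """New centre c_j = component-wise mean of cluster C_j's objects."""
    return [tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            for cluster in clusters]
```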
Step 5.3: according to the revised cluster centers, determine the error sum of squares of the data objects over the K clusters to obtain the revised error sum of squares.
In the embodiment of the invention, the revised error sum of squares is computed in the same way as the initial error sum of squares and is not repeated here.
Step 5.4: when the difference between the revised and the initial error sum of squares does not meet the convergence condition, take the revised cluster centers as the new initial cluster centers and repeat the above steps until the difference between the revised and the initial error sum of squares meets the convergence condition.
In a specific implementation the convergence condition can be determined from the data set; usually, when the difference between the revised and the initial error sum of squares is less than a fixed value, or stabilizes at a fixed value, the convergence condition is considered satisfied.
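Steps 5.1 to 5.4 together form the standard Lloyd iteration; below is a minimal sketch in which a hypothetical tolerance `tol` stands in for the convergence condition (function name and defaults are our own):

```python
import math

def kmeans_iterate(X, centers, tol=1e-6, max_iter=100):
    """Assign objects, revise centres, and stop once the change in the
    error sum of squares falls below tol."""
    prev_e = float("inf")
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for x in X:  # steps 5.1/5.2: nearest-centre assignment
            j = min(range(len(centers)), key=lambda k: math.dist(x, centers[k]))
            clusters[j].append(x)
        # revised centres; an empty cluster keeps its old centre
        centers = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[j]
                   for j, cl in enumerate(clusters)]
        e = sum(math.dist(x, centers[j]) ** 2  # step 5.3: revised SSE
                for j, cl in enumerate(clusters) for x in cl)
        if abs(prev_e - e) < tol:  # step 5.4: convergence condition
            break
        prev_e = e
    return centers, clusters
```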
To verify the validity of the clustering method of the invention, the proposed method and the traditional random-center method were run under the same experimental environment and their performance was compared. Experimental environment: operating system Mac OS X 10.10.1, software Matlab 2014a and Eclipse, hardware Intel Core i7 2.4 GHz with 8 GB of memory.
First, the algorithm's ability to choose representative center points was tested.
A self-defined data set with 6 classes was adopted; it tests the validity of the center-point selection algorithm in choosing a representative point for each class on an irregular data set, and its ability to distinguish ordinary high-density points from representative center points. The data distribution is shown in Fig. 1 and Fig. 2: Fig. 1 shows the center points obtained by random selection, and Fig. 2 shows the center points obtained by the clustering method of the invention. Asterisks are the center points picked by the algorithm; circles are ordinary points.
As can be seen from Fig. 1, the center points obtained by the existing random selection method are unstable and not very representative; ordinary high-density points and effective center points are not distinguished, and some center points lie in the same cluster. In Fig. 2, the obtained center points are distributed one per class, a representative center point is chosen for each class, effective center points are well distinguished from ordinary high-density points, and the phenomenon of center points being chosen inside the same cluster is avoided. Moreover, during center-point selection, outliers are filtered out by the average density, laying a good foundation for the subsequent steps of the algorithm.
Second, the accuracy of the algorithm's center-point selection was tested.
The test data set is Iris from the UCI database (3 classes, 4 attributes, 150 objects). By consultation, the actual centers of the Iris data set are (5.00, 3.42, 1.46, 0.24), (6.58, 2.97, 5.55, 2.02), and (5.93, 2.77, 4.26, 1.32). For the algorithm parameter percent, the clustering effect is better when percent = 2%. The experimental results of the clustering method of the invention and of the existing random selection method, compared with the real Iris data centers, are shown in Table 1.
Table 1: comparison with the original Iris data set centers
As can be seen from Table 1, the center points obtained by the clustering method of the invention have a smaller standard error with respect to the original Iris data set centers and are closer to the actual centers; the accuracy is high and stable, and the points are more representative, showing that the algorithm is effective on the original Iris data set.
Third, the clustering effect of the algorithm was tested.
The test data are the raw Iris, Wine, and Glass Identification data sets from the UCI database. The clustering effect is measured by the error sum of squares E: the smaller E, the closer the objects in a class are to their cluster center and the better the clustering effect; conversely, the larger E, the worse the effect. Each of the two clustering methods was run 20 times on each data set and the resulting E values were averaged, as shown in Table 2. The table shows that the mean error sum of squares E of the clustering method of the invention is lower than that of the random algorithm, indicating that the invention's improvement of the K-means initial center points improves the clustering effect.
Table 2: comparison of the clustering effects of the two algorithms
Finally, the running time and iteration count of the algorithm were tested.
The K-means algorithm suits data sets with a convex structure. The experimental data were generated with mean vectors (1.0, 1.0), (3.3, 3.3), (-3.25, 3.25), (-3.25, -2.25), and (3.25, -2.25) and corresponding covariance matrices [0.9 0; 0 0.3], [0.7 0; 0 0.5], [1.5 0; 0 0.9], [0.3 0; 0 1.3], and [1.1 0; 0 0.7], producing two-dimensional normally distributed data sets of 240, 1100, 4500, and 11000 points. The distribution of the 240-point data set is shown in Fig. 3.
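Because the covariance matrices above are diagonal, each coordinate can be sampled independently; a sketch of the data generation (the seed and helper name are our own, and `random.gauss` takes a standard deviation, hence the square roots of the diagonal variances):

```python
import random

def make_gaussian_blobs(means, variances, n_per_cluster, seed=0):
    """Sample 2-D normal clusters with diagonal covariances."""
    rng = random.Random(seed)
    pts = []
    for (mx, my), (vx, vy) in zip(means, variances):
        pts += [(rng.gauss(mx, vx ** 0.5), rng.gauss(my, vy ** 0.5))
                for _ in range(n_per_cluster)]
    return pts

means = [(1.0, 1.0), (3.3, 3.3), (-3.25, 3.25), (-3.25, -2.25), (3.25, -2.25)]
variances = [(0.9, 0.3), (0.7, 0.5), (1.5, 0.9), (0.3, 1.3), (1.1, 0.7)]
data = make_gaussian_blobs(means, variances, 48)  # 5 x 48 = 240 points
```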
High-quality initial center points help reduce the running time and the iteration count of the K-means algorithm. Each of the two initial-center generation algorithms, combined with K-means, was run 20 times on each data set; the average running time and average iteration count for the different data volumes are shown in Table 3.
Table 3: running-time comparison of the two methods for different data volumes
As the number of data objects grows, the running time of both methods increases, but the clustering method of the invention takes the least time. As the data volume increases, the number of iterations used by the clustering method of the invention is clearly smaller than that of the random-center algorithm. Compared with traditional random center selection, the clustering method of the invention spends extra time computing the minimum density distance (MDD) when selecting center points, yet its actual running time is smaller than that of the traditional K-means algorithm. The reason is that the chosen initial cluster center points are highly representative and of high quality, which reduces the algorithm's iteration count and saves time in the subsequent steps; the initial cluster center points of the invention therefore let the K-means algorithm converge quickly and improve its operating efficiency.
The choice of initial cluster centers has a large influence on the clustering effect of the K-means algorithm; a bad choice makes the clustering result unstable and prone to falling into a locally optimal solution. On the basis of the computed data-object densities, the invention further computes the minimum density distance, thereby filtering out points that are dense but not representative; outliers are filtered out by the average density, so representative initial cluster center points are selected for the data set and used as one of the initial conditions of the K-means algorithm. The experimental results show that the clustering method of the invention helps the K-means algorithm converge quickly, improves its operating efficiency and accuracy, and improves the clustering effect, verifying the validity of the algorithm.
Claims (4)
1. A density-based K-means clustering method with optimized initial centers for data mining, characterized in that the clustering method comprises the following steps:
Step 1: specify the required data set and determine the cluster number K.
Step 2: calculate the density of every data object in the data set, and calculate the average density of the data set from the obtained densities.
Step 3: calculate the minimum density distance value of each data object in the data set.
Step 4: sort the minimum density distance values of the data objects in descending order and, according to the determined cluster number K, select as the initial cluster centers the data objects corresponding to the first K minimum density distance values whose density is greater than the average density.
Step 5: starting from the initial cluster centers obtained above, cluster the data set with the K-means clustering method and output the clustering result.
2. The density-based K-means clustering method with optimized initial centers for data mining according to claim 1, characterized in that step 5 comprises the following steps:
Step 5.1: according to the selected initial cluster centers, assign each data object in the data set to its nearest initial cluster center, and calculate the error sum of squares of the data objects over the K clusters to obtain the initial error sum of squares.
Step 5.2: after the data objects have been assigned to their nearest initial cluster centers, calculate the cluster center of each of the K clusters to obtain the revised cluster centers.
Step 5.3: according to the revised cluster centers, determine the error sum of squares of the data objects over the K clusters to obtain the revised error sum of squares.
Step 5.4: when the difference between the revised and the initial error sum of squares does not meet the convergence condition, take the revised cluster centers as the new initial cluster centers and repeat the above steps until the difference between the revised and the initial error sum of squares meets the convergence condition.
3. The density-based K-means clustering method with optimized initial centers for data mining according to claim 1, characterized in that, for a data set X = {x_i | i = 1, 2, ..., n} whose data objects have m-dimensional features, the density of a data object x_i is

density(x_i) = |{x_j ∈ X : 0 < d(x_i, x_j) < R}|

where d(x_i, x_j) is the Euclidean distance between data objects x_i and x_j, i = 1, 2, ..., n, j = 1, 2, ..., n, and R is the neighbourhood radius of x_i.
4. The density-based K-means clustering method with optimized initial centers for data mining according to claim 1, characterized in that, for a data object x_i, the distances from x_i to every data object whose density is larger than that of x_i are computed, and the minimum density distance value of x_i is the minimum of these distances; when x_i is the data object with the maximum density, its minimum density distance value is the maximum distance between x_i and any data object in the data set.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201510131975.2A CN104731916A (en) | 2015-03-24 | 2015-03-24 | Optimizing initial center K-means clustering method based on density in data mining |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201510131975.2A CN104731916A (en) | 2015-03-24 | 2015-03-24 | Optimizing initial center K-means clustering method based on density in data mining |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN104731916A true CN104731916A (en) | 2015-06-24 |
Family
ID=53455803
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201510131975.2A Pending CN104731916A (en) | 2015-03-24 | 2015-03-24 | Optimizing initial center K-means clustering method based on density in data mining |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN104731916A (en) |
Cited By (23)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106874923A (en) * | 2015-12-14 | 2017-06-20 | 阿里巴巴集团控股有限公司 | A kind of genre classification of commodity determines method and device |
| CN105631416A (en) * | 2015-12-24 | 2016-06-01 | 华侨大学 | Method for carrying out face recognition by using novel density clustering |
| CN105631416B (en) * | 2015-12-24 | 2018-11-13 | 华侨大学 | The method for carrying out recognition of face is clustered using novel density |
| CN105903692A (en) * | 2016-05-19 | 2016-08-31 | 四川长虹电器股份有限公司 | Lithium ion battery consistency screening method |
| CN105903692B (en) * | 2016-05-19 | 2018-05-25 | 四川长虹电器股份有限公司 | Lithium ion battery conformity classification method |
| CN107358368A (en) * | 2017-07-21 | 2017-11-17 | 国网四川省电力公司眉山供电公司 | A kind of robust k means clustering methods towards power consumer subdivision |
| CN107358368B (en) * | 2017-07-21 | 2021-07-20 | 国网四川省电力公司眉山供电公司 | A Robust k-means Clustering Method for Electricity User Segmentation |
| CN108154173A (en) * | 2017-12-21 | 2018-06-12 | 陕西科技大学 | A kind of oil-water interface measuring device of crude oil storage tank and method |
| CN108154173B (en) * | 2017-12-21 | 2021-08-24 | 陕西科技大学 | A crude oil storage tank oil-water interface measuring device and method |
| CN108536648B (en) * | 2018-03-30 | 2021-07-06 | 武汉大学 | Conversion solution and optimization method of partial discharge nonlinear model based on multiple ultrasonic sensors |
| CN108536648A (en) * | 2018-03-30 | 2018-09-14 | 武汉大学 | Shelf depreciation nonlinear model conversion based on multiple ultrasonic sensors solves and optimization method |
| CN109002513A (en) * | 2018-07-04 | 2018-12-14 | 深圳软通动力科技有限公司 | A kind of data clustering method and device |
| CN110047509A (en) * | 2019-03-28 | 2019-07-23 | 国家计算机网络与信息安全管理中心 | A kind of two-stage Subspace partition method and device |
| CN111221915A (en) * | 2019-04-18 | 2020-06-02 | 江苏大学 | Online learning resource quality analysis method based on CWK-means |
| CN111221915B (en) * | 2019-04-18 | 2024-01-09 | 西安睿德培欣教育科技有限公司 | Online learning resource quality analysis method based on CWK-means |
| CN110222747A (en) * | 2019-05-24 | 2019-09-10 | 河海大学 | A kind of clustering method of optimization |
| CN110222747B (en) * | 2019-05-24 | 2022-08-16 | 河海大学 | An optimized clustering method |
| CN110414569A (en) * | 2019-07-03 | 2019-11-05 | 北京小米智能科技有限公司 | Cluster realizing method and device |
| US11501099B2 (en) | 2019-07-03 | 2022-11-15 | Beijing Xiaomi Intelligent Technology Co., Ltd. | Clustering method and device |
| CN110825826A (en) * | 2019-11-07 | 2020-02-21 | 深圳大学 | Cluster computing method, device, terminal and storage medium |
| CN111860700A (en) * | 2020-09-22 | 2020-10-30 | 深圳须弥云图空间科技有限公司 | Energy consumption classification method and device, storage medium and equipment |
| CN114493229A (en) * | 2022-01-20 | 2022-05-13 | 广东电网有限责任公司电力调度控制中心 | Regulation and control business arrangement agent method and system based on unsupervised learning technology |
| CN114493229B (en) * | 2022-01-20 | 2024-10-15 | 广东电网有限责任公司电力调度控制中心 | Method and system for scheduling and proxy of regulation and control business based on unsupervised learning technology |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20150624 |