Disclosure of Invention
The invention aims to provide a heterogeneous feature hybrid extraction method that addresses the extraction of mixed features from data in the fields of pattern recognition and machine learning.
To achieve this purpose, the invention adopts the following technical scheme: a heterogeneous data decision table has both numerical feature attributes and category feature attributes; the heterogeneous attributes are divided into a numerical feature attribute space and a category feature attribute space; the neighborhood granularity of each sample on the union of the two spaces is calculated; the upper approximation and lower approximation of the target subset are calculated; and the mixed features are then extracted.
Compared with the prior art, the invention has the following beneficial effect: the method can extract mixed features from heterogeneous feature data and thereby serves as a preprocessing step for data classification.
Detailed Description
A heterogeneous feature hybrid extraction method comprises the following steps. The decision table of a structured data information system can be expressed as:
$$DT = \langle U, A \rangle \qquad (1)$$
wherein the universe $U$ is a non-empty finite sample set $\{x_1, x_2, \ldots, x_n\}$, $A$ is a feature attribute set $\{a_1, a_2, \ldots, a_m\}$, and $n$ and $m$ are any natural numbers.
Let $A = C \cup D$, where $C$ is the set of all conditional attributes and $D$ is the decision attribute. For arbitrary $x_i \in U$ and $B \subseteq C$, the neighborhood $\delta_B(x_i)$ of $x_i$ in feature space $B$ can be expressed as:
$$\delta_B(x_i) = \{\, x_j \mid x_j \in U,\ \Delta_B(x_i, x_j) \le \delta \,\} \qquad (2)$$
wherein $\Delta$ is a distance function, which may be the Manhattan distance, the Euclidean distance, or the Chebyshev distance, chosen according to the specific attributes in the decision table; $\delta$ is a threshold taking any non-negative real value, which determines the granularity of the neighborhood; and $i$ and $j$ are any positive integers.
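The neighborhood of expression (2) can be sketched in a few lines of Python. This is a minimal illustration rather than the claimed implementation: the function names (`manhattan`, `euclidean`, `chebyshev`, `neighborhood`) and the sample values are hypothetical and chosen only to show how the choice of distance function and threshold changes the neighborhood.

```python
import math

def manhattan(u, v):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))

def euclidean(u, v):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def chebyshev(u, v):
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(u, v))

def neighborhood(samples, i, delta, distance=euclidean):
    """Indices j with distance(samples[i], samples[j]) <= delta, as in expression (2)."""
    return {j for j, xj in enumerate(samples) if distance(samples[i], xj) <= delta}

# Hypothetical samples described by two numerical features (not taken from the patent).
samples = [(0.10, 0.20), (0.13, 0.24), (0.80, 0.90)]

for dist in (manhattan, euclidean, chebyshev):
    print(dist.__name__, neighborhood(samples, 0, delta=0.1, distance=dist))
```

With these hypothetical values, all three distances place the first two samples in the same neighborhood at a threshold of 0.1. Lowering the threshold to 0.045 keeps the second sample only under the Chebyshev distance, which illustrates why the distance function is selected per decision table.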
Let the heterogeneous feature attribute sets $B_1$ and $B_2$ represent the numerical feature attribute set and the category feature attribute set, respectively. The neighborhood granularity of a sample $x$ on the feature attribute sets $B_1$, $B_2$, and $B_1 \cup B_2$ can then be expressed as:
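A reconstruction of expressions (3) to (5) is given here as an assumption, using the notation of expression (2) and the usual neighborhood rough set convention that two samples match on category features only when their distance is zero:

```latex
\begin{align}
\delta_{B_1}(x) &= \{\, x_i \mid \Delta_{B_1}(x, x_i) \le \delta,\ x_i \in U \,\} \tag{3}\\
\delta_{B_2}(x) &= \{\, x_i \mid \Delta_{B_2}(x, x_i) = 0,\ x_i \in U \,\} \tag{4}\\
\delta_{B_1 \cup B_2}(x) &= \{\, x_i \mid \Delta_{B_1}(x, x_i) \le \delta \,\wedge\, \Delta_{B_2}(x, x_i) = 0,\ x_i \in U \,\} \tag{5}
\end{align}
```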
wherein the operator $\wedge$ represents conjunction and $i$ is any positive integer; expression (3) applies to the numerical feature attributes, expression (4) to the category feature attributes, and expression (5) to the mixed numerical and category attributes. Combining expression (3) and expression (4), the samples in a mixed neighborhood have the same values on the category features, and their distance on the numerical features does not exceed the threshold $\delta$.
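The three neighborhood granularities described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the numerical distance is taken to be Euclidean, category features are compared by exact equality, and the function names and sample values are hypothetical.

```python
import math

def numeric_neighborhood(numeric, i, delta):
    """Samples whose Euclidean distance to sample i on the numerical features is <= delta."""
    return {j for j, v in enumerate(numeric) if math.dist(numeric[i], v) <= delta}

def categorical_neighborhood(categorical, i):
    """Samples whose category feature values all equal those of sample i."""
    return {j for j, v in enumerate(categorical) if v == categorical[i]}

def mixed_neighborhood(numeric, categorical, i, delta):
    """Samples satisfying both conditions, i.e. the intersection of the two sets above."""
    return numeric_neighborhood(numeric, i, delta) & categorical_neighborhood(categorical, i)

# Hypothetical samples: one numerical feature and one category feature each.
numeric = [(0.20,), (0.25,), (0.22,), (0.90,)]
categorical = [("a",), ("a",), ("b",), ("a",)]

print(numeric_neighborhood(numeric, 0, delta=0.1))
print(categorical_neighborhood(categorical, 0))
print(mixed_neighborhood(numeric, categorical, 0, delta=0.1))
```

In this toy data the third sample is numerically close to the first but belongs to a different category, and the fourth shares the category but is numerically far away, so only the first two samples remain in the mixed neighborhood of the first sample.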
For arbitrary $X \subseteq U$, two subsets of the target $X$ in the decision table $\langle U, A \rangle$, namely its upper approximation and lower approximation, can then be obtained; these constitute the set of extracted features.
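The upper and lower approximations can be sketched in Python once the neighborhoods are known. The definitions used here are an assumption, namely the standard neighborhood rough set ones: the lower approximation collects the samples whose neighborhood lies entirely inside the target subset, and the upper approximation collects the samples whose neighborhood intersects it. The function names and the neighborhoods in the example are hypothetical.

```python
def lower_approximation(neighborhoods, target):
    """Samples whose whole neighborhood is contained in the target subset."""
    return {x for x, nb in neighborhoods.items() if nb <= target}

def upper_approximation(neighborhoods, target):
    """Samples whose neighborhood shares at least one sample with the target subset."""
    return {x for x, nb in neighborhoods.items() if nb & target}

# Hypothetical neighborhoods of six samples (each sample belongs to its own neighborhood).
neighborhoods = {
    "x1": {"x1", "x3"},
    "x2": {"x2", "x4"},
    "x3": {"x1", "x3"},
    "x4": {"x2", "x4", "x6"},
    "x5": {"x5"},
    "x6": {"x4", "x6"},
}
X1 = {"x1", "x3", "x6"}  # a target subset, e.g. one decision class

lower = lower_approximation(neighborhoods, X1)
upper = upper_approximation(neighborhoods, X1)
print(sorted(lower), sorted(upper), sorted(upper - lower))
```

For these hypothetical neighborhoods the lower approximation of `X1` is {x1, x3} and the upper approximation is {x1, x3, x4, x6}; the difference {x4, x6} contains the samples that the chosen features cannot assign to `X1` with certainty.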
Consider a heterogeneous decision table whose data set is composed of a numerical feature attribute and a category feature attribute, as shown in the table of fig. 1, where numerical_attribute is the numerical feature attribute, category_attribute is the category feature attribute, and Decision is the decision attribute.
[Table of fig. 1: columns for the numerical feature attribute, the category feature attribute, and the decision attribute]
The threshold $\delta$ is set to 0.1, and the neighborhood granularity of each sample is calculated on the numerical feature attribute according to expression (3), using the Euclidean distance;
the neighborhood granularity of each sample is calculated on the category feature attribute according to expression (4);
the two subsets induced by the decision attribute are: $X_1 = \{x_1, x_3, x_6\}$, $X_2 = \{x_2, x_4, x_5\}$;
the neighborhood granularity of each sample is calculated on the numerical and category feature attributes together according to expression (5);
the upper approximation and lower approximation of $X_1$ and $X_2$ are then calculated on the numerical and category feature attributes:
$A X_1 = \{x_1, x_3, x_6\}$, $A X_2 = \{x_2, x_4, x_5\}$,
and […] is the set of extracted features.
The above description is only a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.