Disclosure of Invention
The application provides a text classification method, device, equipment and storage medium for a power grid maintenance list, which are used to solve the technical problem that existing machine-learning-based matching methods need a large amount of training data and cannot handle complex inter-class relations, so that their results lack accuracy and reliability.
In view of this, the first aspect of the present application provides a text classification method for a power grid maintenance list, including:
selecting a keyword set of each sentence in a historical overhaul list according to a preset proportion, wherein the historical overhaul list comprises sentence categories;
integrating all the keyword sets according to the sentence categories to obtain category keyword sets;
carrying out global frequency statistics and ascending arrangement on the category keyword set in sequence to obtain a keyword sequence;
ranking and assigning a preset number of keywords in the keyword sequence to obtain a keyword coefficient sequence;
carrying out sentence scoring calculation on the text sentence to be recognized according to the keyword coefficient sequence and the keyword sequence to obtain a plurality of scoring results;
and selecting the sentence category corresponding to the maximum value among the scoring results as the target sentence category of the text sentence to be recognized.
Preferably, before the selecting of the keyword set of each sentence in the historical overhaul list according to the preset proportion, the method further includes:
carrying out category labeling processing on each sentence in the historical overhaul list to obtain the sentence categories.
Preferably, the selecting of the keyword set of each sentence in the historical overhaul list according to the preset proportion, where the historical overhaul list includes sentence categories, includes:
carrying out artificial intelligence word segmentation on each sentence in the historical overhaul list to obtain an initial word segmentation set;
counting the global frequency of each word in the sentence according to the initial word segmentation set, and sequencing the words in an ascending order to obtain a word segmentation sequence corresponding to each sentence;
and sequentially selecting keywords in the word segmentation sequence according to a preset proportion to obtain a keyword set.
Preferably, the ranking and assigning of the preset number of keywords in the keyword sequence to obtain a keyword coefficient sequence includes:
acquiring the first preset number of keywords in the keyword sequence;
and assigning, in sequence, the reverse-order ranking numbers of these keywords to them as coefficients, and assigning a coefficient of 0 to every keyword outside the first preset number, to obtain a keyword coefficient sequence.
A second aspect of the present application provides a text classification device for a power grid maintenance list, including:
the keyword selection module is used for selecting a keyword set of each sentence in a historical overhaul list according to a preset proportion, wherein the historical overhaul list comprises sentence categories;
the word integration module is used for integrating all the keyword sets according to the sentence categories to obtain category keyword sets;
the word processing module is used for carrying out global frequency statistics and ascending arrangement on the category keyword set in sequence to obtain a keyword sequence;
the ranking assignment module is used for performing ranking assignment on the keywords in the preset number in the keyword sequence to obtain a keyword coefficient sequence;
the score calculation module is used for carrying out sentence score calculation on the text sentence to be recognized according to the keyword coefficient sequence and the keyword sequence to obtain a plurality of score results;
and the text classification module is used for selecting the sentence category corresponding to the maximum value in the scoring result as the target sentence category of the text sentence to be recognized.
Preferably, the device further includes:
and the sentence marking module is used for carrying out category marking processing on each sentence in the historical overhaul list to obtain the sentence category.
Preferably, the keyword selection module is specifically configured to:
performing artificial intelligence word segmentation on each sentence in the historical overhaul list to obtain an initial word segmentation set;
counting the global frequency of each word in the sentence according to the initial word segmentation set, and sequencing the words in an ascending order to obtain a word segmentation sequence corresponding to each sentence;
and sequentially selecting keywords in the word segmentation sequence according to a preset proportion to obtain a keyword set.
Preferably, the ranking assignment module is specifically configured to:
acquiring the first preset number of keywords in the keyword sequence;
and assigning, in sequence, the reverse-order ranking numbers of these keywords to them as coefficients, and assigning a coefficient of 0 to every keyword outside the first preset number, to obtain a keyword coefficient sequence.
A third aspect of the present application provides text classification equipment for a power grid maintenance list, which includes a processor and a memory;
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute the text classification method for the power grid maintenance list according to the instructions in the program code.
A fourth aspect of the present application provides a computer-readable storage medium for storing program code for executing the text classification method for the power grid maintenance list according to the first aspect.
According to the technical scheme, the embodiment of the application has the following advantages:
the application provides a text classification method for a power grid maintenance list, which includes the following steps: selecting a keyword set of each sentence in a historical overhaul list according to a preset proportion, wherein the historical overhaul list comprises sentence categories; integrating all keyword sets according to the sentence categories to obtain category keyword sets; carrying out global frequency statistics and ascending arrangement on the category keyword sets in sequence to obtain a keyword sequence; ranking and assigning a preset number of keywords in the keyword sequence to obtain a keyword coefficient sequence; carrying out sentence scoring calculation on the text sentence to be recognized according to the keyword coefficient sequence and the keyword sequence to obtain a plurality of scoring results; and selecting the sentence category corresponding to the maximum value among the scoring results as the target sentence category of the text sentence to be recognized.
According to the text classification method for the power grid maintenance list provided by the application, the preliminary text preparation, namely keyword integration, proportional screening, ascending arrangement and similar operations on the sentences, can be completed with only a small number of representative historical overhaul lists. Each word is then assigned a value to obtain the corresponding keyword coefficient sequence, and any text sentence to be recognized can be scored with the obtained coefficients, so that the target category is matched accurately according to the score. Selecting a batch of low-frequency keywords avoids the influence of complex inter-class relations between sentences, so that the classification result is more reliable. Therefore, the application can solve the technical problem that existing machine-learning-based matching methods need a large amount of training data and cannot handle complex inter-class relations, so that their results lack accuracy and reliability.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
For ease of understanding, referring to Fig. 1, an embodiment of the text classification method for a power grid maintenance list provided by the present application includes:
Step 101, selecting a keyword set of each sentence in a historical overhaul list according to a preset proportion, wherein the historical overhaul list includes sentence categories.
Further, step 101 includes:
carrying out artificial intelligence word segmentation on each sentence in the historical overhaul list to obtain an initial word segmentation set;
counting the global frequency of each word in the sentence according to the initial word segmentation set, and sequencing the words in an ascending order to obtain a word segmentation sequence corresponding to each sentence;
and sequentially selecting the keywords in the word segmentation sequence according to a preset proportion to obtain a keyword set.
The historical overhaul list contains text sentences for various tasks, and each sentence can be subjected to artificial intelligence word segmentation according to text rules to obtain an initial word segmentation set. Words that are segmented out repeatedly can be deleted before the global frequency statistics are computed; this de-duplication ensures that each keyword is unique during the statistics, and it can be performed directly in practice, so it is not detailed here. The frequency statistics record how many times each word appears in the historical overhaul list: the more often a word appears, the stronger its association with the tasks to be performed in the historical overhaul list and with the text sentences; conversely, the less often it appears, the weaker the association. The ascending arrangement places the weakly associated, low-frequency keywords at the front for subsequent selection. The preset proportion may be denoted K and set according to the actual situation, which is not detailed here.
If the initial word segmentation set obtained after processing contains X segmented words, they can be denoted $FEN = \{A_1, \dots, A_X\}$, and the sentences can be denoted $J_1, \dots, J_Y$, where $Y$ is the total number of sentences. Keyword checking can be performed on each sentence to form a sentence-based word segmentation sequence, so the number of word segmentation sequences equals the number of sentences, and the words in each sequence are arranged in ascending order of frequency. When keywords are selected according to the preset proportion, the front part of each word segmentation sequence, that is, the words with lower frequency, is cut out and selected to form a keyword set $A_n$.
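The selection in step 101 can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: sentences are assumed to be already segmented into word lists, the global frequency is taken as the number of occurrences over all sentences, duplicates are removed within each sentence, and the proportion K is applied with rounding up and a minimum of one keyword. The function name `select_keywords` and the sample words are hypothetical.

```python
import math
from collections import Counter


def select_keywords(sentences, ratio):
    """Keep the lowest-frequency share of words from each segmented sentence."""
    # Global frequency: occurrences of each word over the whole historical list.
    global_freq = Counter(word for words in sentences for word in words)
    keyword_sets = []
    for words in sentences:
        unique = list(dict.fromkeys(words))  # de-duplicate, keep first-seen order
        # Ascending global frequency; sorted() is stable, so ties keep sentence order.
        ordered = sorted(unique, key=lambda w: global_freq[w])
        # Assumption: round the proportion up and keep at least one keyword.
        count = max(1, math.ceil(ratio * len(ordered))) if ordered else 0
        keyword_sets.append(ordered[:count])
    return keyword_sets, global_freq


# Hypothetical, already-segmented sentences.
sentences = [
    ["replace", "transformer", "bushing"],
    ["inspect", "transformer", "oil"],
    ["replace", "line", "insulator"],
]
keyword_sets, global_freq = select_keywords(sentences, ratio=0.5)
print(keyword_sets)
# [['bushing', 'replace'], ['inspect', 'oil'], ['line', 'insulator']]
```

With K = 0.5, the frequent word "transformer" is dropped from both sentences that contain it, which matches the intent of selecting the low-frequency front part of each word segmentation sequence.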
Further, before step 101, the method further includes:
and carrying out category labeling processing on each sentence in the historical overhaul list to obtain a sentence category.
Step 102, integrating all the keyword sets according to the sentence categories to obtain category keyword sets.
The keyword sets corresponding to sentences of the same category are integrated into one total keyword set, finally giving as many keyword sets as there are sentence categories. It will be appreciated that the numbers of words within these keyword sets are not equal, so a subsequent keyword selection process is required.
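A minimal sketch of this integration, assuming that each sentence's keyword set comes with a category label and that a word appearing in several sentences of the same category is kept once. The function name `integrate_by_category` and the sample data are hypothetical.

```python
from collections import defaultdict


def integrate_by_category(keyword_sets, categories):
    """Merge per-sentence keyword sets into one keyword list per category."""
    merged = defaultdict(list)
    for keywords, category in zip(keyword_sets, categories):
        for word in keywords:
            # Assumption: a category keyword set holds each word only once.
            if word not in merged[category]:
                merged[category].append(word)
    return dict(merged)


# Hypothetical per-sentence keyword sets and their category labels.
keyword_sets = [["bushing", "replace"], ["inspect", "oil"], ["line", "replace"]]
categories = ["replacement", "inspection", "replacement"]
category_keywords = integrate_by_category(keyword_sets, categories)
print(category_keywords)
# {'replacement': ['bushing', 'replace', 'line'], 'inspection': ['inspect', 'oil']}
```

The two resulting sets have different sizes (three and two words), which is the imbalance that the later preset-number selection is meant to even out.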
Step 103, carrying out global frequency statistics and ascending arrangement on the category keyword sets in sequence to obtain a keyword sequence.
Taking each category keyword set as a unit, global frequency statistics and ascending arrangement are carried out on its keywords to obtain a keyword sequence; this operation prepares for word selection.
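The ordering in step 103 might look like the sketch below. It assumes that the frequency used here is the same list-wide count used when the per-sentence keywords were selected; the text does not say whether the count is recomputed inside each category, so this is one possible reading. The name `build_keyword_sequences` and the sample counts are hypothetical.

```python
def build_keyword_sequences(category_keywords, global_freq):
    """Order each category's keywords by ascending global frequency."""
    return {
        # sorted() is stable, so equally frequent words keep their merge order.
        category: sorted(keywords, key=lambda w: global_freq.get(w, 0))
        for category, keywords in category_keywords.items()
    }


# Hypothetical inputs: merged category keyword sets and list-wide word counts.
category_keywords = {
    "replacement": ["replace", "bushing", "line"],
    "inspection": ["inspect", "oil"],
}
global_freq = {"replace": 2, "bushing": 1, "line": 1, "inspect": 1, "oil": 1}
keyword_sequences = build_keyword_sequences(category_keywords, global_freq)
print(keyword_sequences["replacement"])
# ['bushing', 'line', 'replace']
```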
Step 104, carrying out ranking assignment on the preset number of keywords in the keyword sequence to obtain a keyword coefficient sequence.
Further, step 104 includes:
acquiring the first preset number of keywords in the keyword sequence;
and assigning, in sequence, the reverse-order ranking numbers of these keywords to them as coefficients, and assigning a coefficient of 0 to every keyword outside the first preset number, to obtain a keyword coefficient sequence.
The selected keywords are still the low-frequency ones at the front, and the category keyword set corresponding to each sentence category is reduced to a set of the preset number of keywords, so that the keyword sets are expressed uniformly. The preset number may be set as needed and is not limited here.
Expressing the frequencies directly is too cumbersome and would increase the subsequent amount of calculation. To simplify the calculation, each keyword is assigned its ranking number, and the assignment follows a reverse-order method; apart from the preset number of selected keywords, the coefficients of all other keywords are assigned 0, that is, they take no effective part in the calculation.
For example, if the preset number is defined as $Num = 3$, then the coefficient of the first-ranked keyword in the keyword sequence is $\xi_{i1} = 3$; the coefficient of the second-ranked keyword is $\xi_{i2} = 2$; the coefficient of the third-ranked keyword is $\xi_{i3} = 1$; the coefficient of the fourth-ranked keyword is $\xi_{i4} = 0$; and so on.
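The reverse-order assignment in this example can be sketched in a few lines. The function name `assign_coefficients` and the placeholder keywords are hypothetical; the rule itself (Num, Num - 1, ..., 1 for the first Num keywords, then 0) follows the example above.

```python
def assign_coefficients(keyword_sequence, preset_number):
    """Give the first `preset_number` keywords Num, Num-1, ..., 1; the rest get 0."""
    return [
        max(preset_number - rank, 0)  # rank is the 0-based position in the sequence
        for rank in range(len(keyword_sequence))
    ]


# Placeholder keywords, already in ascending order of frequency.
coefficients = assign_coefficients(["k1", "k2", "k3", "k4", "k5"], preset_number=3)
print(coefficients)
# [3, 2, 1, 0, 0]
```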
If $K_{ij}$ represents a keyword in the keyword sequence, $\xi_{ij}$ represents the coefficient corresponding to each keyword, and $H_i$ represents the sentence category, the keyword list is shown in Table 1 and the keyword coefficient list in Table 2.
Table 1 Keyword list
| Sentence category | Keyword sequence | | | |
| --- | --- | --- | --- | --- |
| $H_1$ | $K_{11}$ | $K_{12}$ | …… | $K_{1j}$ |
| …… | …… | …… | …… | …… |
| $H_i$ | $K_{i1}$ | $K_{i2}$ | …… | $K_{ij}$ |
Table 2 Keyword coefficient list
| Sentence category | Keyword coefficient sequence | | | |
| --- | --- | --- | --- | --- |
| $H_1$ | $\xi_{11}$ | $\xi_{12}$ | …… | $\xi_{1j}$ |
| …… | …… | …… | …… | …… |
| $H_i$ | $\xi_{i1}$ | $\xi_{i2}$ | …… | $\xi_{ij}$ |
Step 105, carrying out sentence scoring calculation on the text sentence to be recognized according to the keyword coefficient sequence and the keyword sequence to obtain a plurality of scoring results.
For any text sentence $W$ to be recognized, it can be checked whether the words of each keyword sequence appear in the text sentence, each sentence category corresponding to one keyword sequence. If a keyword is present, it is recorded as 1; if it is absent, it is recorded as 0, so that the coefficient of an absent keyword contributes 0 as well. This can be expressed as:

$$\alpha_{ij} = \begin{cases} 1, & K_{ij} \in W \\ 0, & K_{ij} \notin W \end{cases}$$

where $\alpha_{ij}$ is the score for the presence or absence of a keyword in the sentence. The score of each sentence category is then calculated from the coefficients $\xi_{ij}$ and the scores $\alpha_{ij}$. The keyword sequence of each sentence category yields one corresponding score, so the number of scoring results equals the number of sentence categories.
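A reading of the scoring step that is consistent with the definitions above, given as an assumption rather than as the original formula, is the coefficient-weighted sum over the keywords of each category. The symbol $S_i$ for the score of category $H_i$ and the upper index $n_i$ (the length of that category's keyword sequence) are introduced here for illustration only:

$$S_i = \sum_{j=1}^{n_i} \xi_{ij}\,\alpha_{ij}$$

With coefficients $(3, 2, 1, 0)$ and presence scores $(1, 0, 1, 0)$, for instance, this gives $S_i = 3 + 1 = 4$.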
Step 106, selecting the sentence category corresponding to the maximum value among the scoring results as the target sentence category of the text sentence to be recognized.
The sentence category corresponding to the maximum score is selected as the category of the text sentence to be recognized, giving the matching result, namely the target sentence category.
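Steps 105 and 106 can be sketched together as below. The sketch assumes that a category's score is the sum of the coefficients of its keywords found in the sentence, and that the sentence to be recognized is already segmented into words. The names `score_sentence` and `classify` and the sample data are hypothetical, and the text does not specify how ties are broken; here the first category reaching the maximum is returned.

```python
def score_sentence(words, keyword_sequences, coefficient_sequences):
    """Return one score per category: the summed coefficients of matched keywords."""
    present = set(words)
    scores = {}
    for category, keywords in keyword_sequences.items():
        coefficients = coefficient_sequences[category]
        # A keyword that is absent contributes 0; a present one contributes its coefficient.
        scores[category] = sum(
            coeff for keyword, coeff in zip(keywords, coefficients) if keyword in present
        )
    return scores


def classify(words, keyword_sequences, coefficient_sequences):
    """Pick the category with the maximum score (first one wins on a tie)."""
    scores = score_sentence(words, keyword_sequences, coefficient_sequences)
    return max(scores, key=scores.get), scores


# Hypothetical keyword and coefficient sequences for two categories.
keyword_sequences = {
    "replacement": ["bushing", "line", "replace"],
    "inspection": ["inspect", "oil"],
}
coefficient_sequences = {"replacement": [3, 2, 1], "inspection": [3, 2]}
target, scores = classify(
    ["replace", "bushing", "today"], keyword_sequences, coefficient_sequences
)
print(target, scores)
# replacement {'replacement': 4, 'inspection': 0}
```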
To facilitate understanding of the present embodiment, the following case of a daily power grid maintenance record is given:
Table 3 Keyword extraction example for a daily power grid maintenance list
According to the extracted keywords, coefficients can be assigned by the assignment method described above to obtain a keyword coefficient sequence, and the scoring calculation is then carried out to obtain the best matching result.
According to the text classification method for the power grid maintenance list provided by this embodiment, the preliminary text preparation, namely keyword integration, proportional screening, ascending arrangement and similar operations on the sentences, can be completed with only a small number of representative historical overhaul lists. Each word is then assigned a value to obtain the corresponding keyword coefficient sequence, and any text sentence to be recognized can be scored with the obtained coefficients, so that the target category is matched accurately according to the score. Selecting a batch of low-frequency keywords avoids the influence of complex inter-class relations between sentences, so that the classification result is more reliable. Therefore, this embodiment can solve the technical problem that existing machine-learning-based matching methods need a large amount of training data and cannot handle complex inter-class relations, so that their results lack accuracy and reliability.
For ease of understanding, referring to Fig. 2, the present application provides an embodiment of a text classification device for a power grid maintenance list, including:
a keyword selection module 201, configured to select a keyword set of each sentence in a historical overhaul list according to a preset ratio, where the historical overhaul list includes a sentence category;
the word integration module 202 is configured to integrate all keyword sets according to sentence categories to obtain category keyword sets;
the word processing module 203 is configured to perform global frequency statistics and ascending order on the category keyword sets in sequence to obtain a keyword sequence;
the ranking assignment module 204 is configured to perform ranking assignment on a preset number of keywords in the keyword sequence to obtain a keyword coefficient sequence;
the score calculation module 205 is configured to perform sentence score calculation on the text sentence to be recognized according to the keyword coefficient sequence and the keyword sequence to obtain a plurality of score results;
and the text classification module 206 is configured to select a sentence category corresponding to the maximum value in the scoring result as a target sentence category of the text sentence to be recognized.
Further, the device also includes:
and a sentence labeling module 207, configured to perform category labeling processing on each sentence in the historical repair list, so as to obtain a sentence category.
Further, the keyword selecting module 201 is specifically configured to:
carrying out artificial intelligence word segmentation on each sentence in the historical overhaul list to obtain an initial word segmentation set;
counting the global frequency of each word in the sentence according to the initial word segmentation set, and sequencing the words in an ascending order to obtain a word segmentation sequence corresponding to each sentence;
and sequentially selecting the keywords in the word segmentation sequence according to a preset proportion to obtain a keyword set.
Further, the ranking assignment module 204 is specifically configured to:
acquiring the first preset number of keywords in the keyword sequence;
and assigning, in sequence, the reverse-order ranking numbers of these keywords to them as coefficients, and assigning a coefficient of 0 to every keyword outside the first preset number, to obtain a keyword coefficient sequence.
The application also provides text classification equipment for a power grid maintenance list, which includes a processor and a memory;
the memory is used for storing the program codes and transmitting the program codes to the processor;
the processor is used for executing the text classification method of the power grid maintenance list in the above method embodiment according to the instructions in the program code.
The application also provides a computer-readable storage medium for storing program codes for executing the text classification method of the power grid service list in the above method embodiment.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method described in the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.