
CN114625839A - Text classification method, device, equipment and storage medium for power grid maintenance list - Google Patents


Info

Publication number
CN114625839A
Authority
CN
China
Prior art keywords
sentence
keyword
sequence
keywords
category
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210270317.1A
Other languages
Chinese (zh)
Inventor
余勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Power Grid Co Ltd
Foshan Power Supply Bureau of Guangdong Power Grid Corp
Original Assignee
Guangdong Power Grid Co Ltd
Foshan Power Supply Bureau of Guangdong Power Grid Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Power Grid Co Ltd, Foshan Power Supply Bureau of Guangdong Power Grid Corp filed Critical Guangdong Power Grid Co Ltd
Priority to CN202210270317.1A priority Critical patent/CN114625839A/en
Publication of CN114625839A publication Critical patent/CN114625839A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application discloses a text classification method, device, equipment and storage medium for a power grid maintenance list. The method includes: selecting a keyword set of each sentence in a historical overhaul list according to a preset ratio, where the historical overhaul list includes sentence categories; integrating all keyword sets according to sentence category to obtain category keyword sets; performing global frequency statistics and ascending-order arrangement on each category keyword set in sequence to obtain a keyword sequence; performing ranking assignment on the first preset number of keywords in the keyword sequence to obtain a keyword coefficient sequence; performing sentence scoring calculation on the text sentence to be recognized according to the keyword coefficient sequence and the keyword sequence to obtain a plurality of scoring results; and selecting the sentence category corresponding to the maximum scoring result as the target sentence category of the text sentence to be recognized. The present application addresses the technical problem that existing methods require a large amount of training data and cannot handle complex inter-class relationships, which leaves their results lacking accuracy and reliability.

Figure 202210270317

Description

Text classification method, device, equipment and storage medium for power grid maintenance list
Technical Field
The application relates to the technical field of text classification, in particular to a text classification method, device, equipment and storage medium for a power grid maintenance list.
Background
In power production, maintenance and overhaul are important. The chief of an overhaul team often needs to arrange work, specify safety precautions, and write those precautions into the records of the pre-shift and post-shift meetings. For a given type of work the safety precautions are expressed in a fixed way, but the wording of the work tasks varies. If work tasks expressed in natural language can be analyzed and accurately matched to the corresponding safety precautions, the efficiency of work arrangement can be greatly improved, and the pre-shift and post-shift records can even be written automatically.
Among existing text matching methods, classification schemes based on machine learning are the most efficient, but they need a large amount of training data, and for more complex text relations it is difficult to obtain a high-accuracy matching result, so the effect in actual application is poor.
Disclosure of Invention
The application provides a text classification method, device, equipment and storage medium for a power grid maintenance list, which are used to solve the technical problems that existing machine-learning matching methods need a large amount of training data and cannot handle complex inter-class relations, so that their results lack accuracy and reliability.
In view of this, the first aspect of the present application provides a text classification method for a power grid maintenance list, including:
selecting a keyword set of each sentence in a historical overhaul list according to a preset proportion, wherein the historical overhaul list comprises sentence categories;
integrating all the keyword sets according to the sentence categories to obtain category keyword sets;
carrying out global frequency statistics and ascending arrangement on the category keyword set in sequence to obtain a keyword sequence;
performing ranking assignment on the first preset number of keywords in the keyword sequence to obtain a keyword coefficient sequence;
carrying out sentence scoring calculation on the text sentence to be recognized according to the keyword coefficient sequence and the keyword sequence to obtain a plurality of scoring results;
and selecting the sentence category corresponding to the maximum value among the scoring results as the target sentence category of the text sentence to be recognized.
Preferably, before selecting the keyword set of each sentence in the historical overhaul list according to the preset proportion, the method further includes:
carrying out category labeling processing on each sentence in the historical overhaul list to obtain the sentence categories.
Preferably, selecting the keyword set of each sentence in the historical overhaul list according to the preset proportion, wherein the historical overhaul list comprises sentence categories, includes:
carrying out artificial intelligence word segmentation on each sentence in the historical overhaul list to obtain an initial word segmentation set;
counting the global frequency of each word in the sentence according to the initial word segmentation set, and sorting the words in ascending order to obtain a word segmentation sequence corresponding to each sentence;
and sequentially selecting keywords from the word segmentation sequence according to the preset proportion to obtain the keyword set.
Preferably, performing ranking assignment on the first preset number of keywords in the keyword sequence to obtain the keyword coefficient sequence includes:
acquiring the first preset number of keywords in the keyword sequence;
and assigning the reverse-order ranking numbers of the keywords, in sequence, as coefficients to the first preset number of keywords, and assigning a coefficient of 0 to the keywords outside the first preset number, to obtain the keyword coefficient sequence.
A second aspect of the present application provides a text classification device for a power grid maintenance list, including:
the keyword selection module is used for selecting a keyword set of each sentence in a historical overhaul list according to a preset proportion, wherein the historical overhaul list comprises sentence categories;
the word integration module is used for integrating all the keyword sets according to the sentence categories to obtain category keyword sets;
the word processing module is used for carrying out global frequency statistics and ascending arrangement on the category keyword set in sequence to obtain a keyword sequence;
the ranking assignment module is used for performing ranking assignment on the first preset number of keywords in the keyword sequence to obtain a keyword coefficient sequence;
the score calculation module is used for carrying out sentence score calculation on the text sentence to be recognized according to the keyword coefficient sequence and the keyword sequence to obtain a plurality of score results;
and the text classification module is used for selecting the sentence category corresponding to the maximum value in the scoring result as the target sentence category of the text sentence to be recognized.
Preferably, the device further includes:
and the sentence marking module is used for carrying out category marking processing on each sentence in the historical overhaul list to obtain the sentence category.
Preferably, the keyword selection module is specifically configured to:
performing artificial intelligence word segmentation on each sentence in the historical overhaul list to obtain an initial word segmentation set;
counting the global frequency of each word in the sentence according to the initial word segmentation set, and sequencing the words in an ascending order to obtain a word segmentation sequence corresponding to each sentence;
and sequentially selecting keywords in the word segmentation sequence according to a preset proportion to obtain a keyword set.
Preferably, the ranking assignment module is specifically configured to:
acquiring the first preset number of keywords in the keyword sequence;
and assigning the reverse-order ranking numbers of the keywords, in sequence, as coefficients to the first preset number of keywords, and assigning a coefficient of 0 to the keywords outside the first preset number, to obtain the keyword coefficient sequence.
A third aspect of the present application provides text classification equipment for a power grid maintenance list, which comprises a processor and a memory;
the memory is used for storing program code and transmitting the program code to the processor;
the processor is configured to execute the text classification method for the power grid maintenance list of the first aspect according to the instructions in the program code.
A fourth aspect of the present application provides a computer-readable storage medium for storing program code for executing the text classification method for the power grid maintenance list according to the first aspect.
According to the technical scheme, the embodiment of the application has the following advantages:
the application provides a text classification method for a power grid maintenance list, which comprises the following steps: selecting a keyword set of each sentence in a historical overhaul list according to a preset proportion, wherein the historical overhaul list comprises sentence categories; integrating all keyword sets according to sentence categories to obtain category keyword sets; carrying out global frequency statistics and ascending arrangement on the category keyword set in sequence to obtain a keyword sequence; performing ranking assignment on the first preset number of keywords in the keyword sequence to obtain a keyword coefficient sequence; carrying out sentence scoring calculation on the text sentence to be recognized according to the keyword coefficient sequence and the keyword sequence to obtain a plurality of scoring results; and selecting the sentence category corresponding to the maximum value among the scoring results as the target sentence category of the text sentence to be recognized.
With the text classification method for the power grid maintenance list provided by the application, the preliminary text preparation, namely keyword integration, proportional screening, ascending arrangement and similar operations on the sentences, can be completed with only a small number of representative historical overhaul lists. Each word is then assigned a value to obtain the corresponding keyword coefficient sequence, and any text sentence to be recognized can be scored with the resulting coefficients, so that the target category can be matched accurately according to the score. Selecting a batch of low-frequency keywords avoids the influence of complex inter-class relations between sentences, which makes the classification result more reliable. The application can therefore solve the technical problems that existing machine-learning matching methods need a large amount of training data and cannot handle complex inter-class relations, so that their results lack accuracy and reliability.
Drawings
Fig. 1 is a schematic flowchart of a text classification method for a power grid maintenance order according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a text classification device of a power grid maintenance list provided in an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
For ease of understanding, referring to Fig. 1, an embodiment of the text classification method for a power grid maintenance list provided by the present application includes:
Step 101, selecting a keyword set of each sentence in a historical overhaul list according to a preset proportion, wherein the historical overhaul list comprises sentence categories.
Further, step 101 includes:
carrying out artificial intelligence word segmentation on each sentence in the historical overhaul list to obtain an initial word segmentation set;
counting the global frequency of each word in the sentence according to the initial word segmentation set, and sequencing the words in an ascending order to obtain a word segmentation sequence corresponding to each sentence;
and sequentially selecting the keywords in the word segmentation sequence according to a preset proportion to obtain a keyword set.
The historical overhaul list contains text sentences for various tasks, and each sentence can be segmented by artificial intelligence word segmentation according to text rules to obtain an initial word segmentation set. Words that were segmented out more than once can be deleted before the global frequency statistics are computed; this deduplication ensures that each keyword is unique during counting. It is straightforward to carry out in practice and is not described further here. The frequency statistics record how many times each word appears in the historical overhaul list: the more often a word appears, the stronger its association with the tasks to be executed in the historical overhaul list and with the text sentences; conversely, the less often it appears, the weaker the association. The ascending arrangement places the weakly associated keywords at the front for subsequent selection. The preset ratio may be denoted K and set according to the actual situation, which is not described further here.
If the initial word segmentation set obtained after this processing contains X segmented words, they can be denoted $FEN = \{A_1, \dots, A_X\}$, and the sentences can be denoted $J_1, \dots, J_Y$, where $Y$ is the total number of sentences. Each sentence can be checked for keywords to form a sentence-based word segmentation sequence, so the number of word segmentation sequences equals the number of sentences, and the words in each sequence are arranged in ascending order of frequency. When keywords are selected at the preset ratio, the front part of each word segmentation sequence, that is, the words with lower frequency, is cut off and taken to form the keyword set $A_n$.
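The sub-steps of step 101 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: sentences are assumed to be already segmented into word lists (the segmentation tool is not named in the text), the function name `select_sentence_keywords` is hypothetical, and rounding the ratio up with at least one word kept per sentence is an assumption, since the text does not say how the preset ratio K is rounded.

```python
import math
from collections import Counter

def select_sentence_keywords(tokenized_sentences, ratio):
    """Return one keyword list per sentence: its rarest words by global frequency."""
    # Global frequency: how often each word occurs across the whole historical list.
    global_freq = Counter(w for sent in tokenized_sentences for w in sent)
    keyword_sets = []
    for sent in tokenized_sentences:
        # Deduplicate while keeping first-seen order, so ties break deterministically.
        unique_words = list(dict.fromkeys(sent))
        # Ascending sort puts the least frequent words first (sorted() is stable).
        ordered = sorted(unique_words, key=lambda w: global_freq[w])
        # Keep the leading share K of the sequence (assumed: rounded up, at least one word).
        keep = max(1, math.ceil(len(ordered) * ratio))
        keyword_sets.append(ordered[:keep])
    return keyword_sets

# Toy, already-segmented sentences (invented for illustration).
sentences = [
    ["replace", "transformer", "oil"],
    ["inspect", "transformer", "bushing"],
    ["replace", "line", "insulator"],
]
print(select_sentence_keywords(sentences, ratio=0.5))
# -> [['oil', 'replace'], ['inspect', 'bushing'], ['line', 'insulator']]
```

In this toy data "transformer" and "replace" each occur twice, so they sort to the back of their sentences and are the first words to be dropped.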
Further, before step 101, the method further includes:
and carrying out category labeling processing on each sentence in the historical overhaul list to obtain a sentence category.
Step 102, integrating all the keyword sets according to sentence categories to obtain category keyword sets.
The keyword sets corresponding to sentences of the same category are integrated into one total keyword set, which finally yields as many keyword sets as there are sentence categories. It will be appreciated that these keyword sets do not all contain the same number of words, so a subsequent keyword selection step is required.
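The integration in step 102 amounts to grouping the per-sentence keyword sets by their category label. The sketch below assumes the labels arrive as a list parallel to the keyword sets; the function name `integrate_by_category` and the labels `H1`, `H2` in the example are illustrative. Repeated words are kept at this stage, which is an assumption.

```python
from collections import defaultdict

def integrate_by_category(keyword_sets, categories):
    """Merge per-sentence keyword lists into one list per sentence category."""
    merged = defaultdict(list)
    for keywords, category in zip(keyword_sets, categories):
        # Assumption: duplicates are kept here and handled in the later ordering step.
        merged[category].extend(keywords)
    return dict(merged)

keyword_sets = [["oil", "replace"], ["inspect", "bushing"], ["line", "insulator"]]
categories = ["H1", "H1", "H2"]
print(integrate_by_category(keyword_sets, categories))
# -> {'H1': ['oil', 'replace', 'inspect', 'bushing'], 'H2': ['line', 'insulator']}
```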
Step 103, carrying out global frequency statistics and ascending arrangement on the category keyword set in sequence to obtain a keyword sequence.
With each category keyword set taken as a unit, global frequency statistics and ascending arrangement are carried out on the keywords inside it to obtain a keyword sequence; this operation prepares for word selection.
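A possible reading of step 103 in Python is shown below. The text does not state whether "global frequency" is counted over the whole historical list or only within one category; this sketch assumes the former and takes the counts as an argument. The function name `build_keyword_sequence` is hypothetical.

```python
from collections import Counter

def build_keyword_sequence(category_keywords, global_freq):
    """Order one category's keywords by ascending global frequency."""
    # Each keyword appears once in the sequence; first-seen order breaks ties.
    unique = list(dict.fromkeys(category_keywords))
    return sorted(unique, key=lambda w: global_freq.get(w, 0))

# Counts assumed to come from the whole historical list (toy values).
global_freq = Counter({"oil": 1, "replace": 2, "inspect": 1, "bushing": 1})
print(build_keyword_sequence(["oil", "replace", "inspect", "bushing"], global_freq))
# -> ['oil', 'inspect', 'bushing', 'replace']
```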
Step 104, performing ranking assignment on the first preset number of keywords in the keyword sequence to obtain a keyword coefficient sequence.
Further, step 104 includes:
acquiring the first preset number of keywords in the keyword sequence;
and assigning the reverse-order ranking numbers of the keywords, in sequence, as coefficients to the first preset number of keywords, and assigning a coefficient of 0 to the keywords outside the first preset number, to obtain the keyword coefficient sequence.
The selected keywords are still the low-frequency front part of the sequence. The category keyword set of each sentence category is thereby reduced to a set with the preset number of keywords, which unifies the representation of the keyword sets. The preset number may be set as needed and is not limited here.
Expressing the weights directly as frequencies is too complicated and would increase the subsequent amount of calculation. To simplify the calculation, the rank number of each keyword is assigned to that keyword as its coefficient, using reverse-order assignment; apart from the preset number of selected keywords, all other keywords are assigned a coefficient of 0, that is, they do not take part in the effective calculation.
For example, if the preset number is defined as $Num = 3$, then the coefficient of the first-ranked keyword in the keyword sequence is $\xi_{ij} = 3$; the coefficient of the second-ranked keyword is $\xi_{ij} = 2$; the coefficient of the third-ranked keyword is $\xi_{ij} = 1$; the coefficient of the fourth-ranked keyword is $\xi_{ij} = 0$; and so on.
If $K_{ij}$ denotes a keyword in the keyword sequence, $\xi_{ij}$ denotes the coefficient corresponding to each keyword, and $H_i$ denotes the sentence category, the keyword list is shown in Table 1 and the keyword coefficient list is shown in Table 2.
TABLE 1 Keyword list
Sentence category | Keyword sequence
H1 | K11  K12  ……  K1j
…… | ……
Hi | Ki1  Ki2  ……  Kij
TABLE 2 Keyword coefficient list
Sentence category | Keyword coefficient sequence
H1 | ξ11  ξ12  ……  ξ1j
…… | ……
Hi | ξi1  ξi2  ……  ξij
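The reverse-order assignment of step 104 can be written as a short function. This is a sketch: the name `assign_coefficients` is hypothetical, and the behavior for a sequence shorter than the preset number (coefficients still start at the preset number) is an assumption the text does not address.

```python
def assign_coefficients(keyword_sequence, preset_number):
    """Return the coefficient of each keyword, aligned with keyword_sequence."""
    coefficients = []
    for rank in range(1, len(keyword_sequence) + 1):
        if rank <= preset_number:
            # Reverse-order rank: the first keyword gets preset_number, the next one less.
            coefficients.append(preset_number - rank + 1)
        else:
            # Keywords outside the first preset_number do not take part in scoring.
            coefficients.append(0)
    return coefficients

print(assign_coefficients(["K1", "K2", "K3", "K4", "K5"], preset_number=3))
# -> [3, 2, 1, 0, 0]
```

With a preset number of 3 this reproduces the coefficients 3, 2, 1, 0 given in the worked example for the first four ranks.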
Step 105, carrying out sentence scoring calculation on the text sentence to be recognized according to the keyword coefficient sequence and the keyword sequence to obtain a plurality of scoring results.
For any text sentence W to be recognized, it can be checked whether each word of a keyword sequence occurs in the text sentence, where each sentence category corresponds to one keyword sequence. A keyword that occurs in W is recorded as 1 and a keyword that does not occur is recorded as 0; a keyword whose coefficient is 0 contributes nothing whether or not it occurs. This can be expressed as:
Figure BDA0003554411070000071
where $\alpha_{ij}$ is the score for the presence or absence of keyword $K_{ij}$ in the sentence. The score is then calculated according to the following formula:
Figure BDA0003554411070000072
A score can be calculated from the keyword sequence of each sentence category, so the number of scoring results equals the number of sentence categories.
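The two formulas above survive only as figure references. A reading consistent with the surrounding definitions, given here as an assumption rather than as the original notation, would be the following, where $S_i$ is a hypothetical symbol for the score of sentence category $H_i$:

```latex
\alpha_{ij} =
\begin{cases}
1, & K_{ij} \in W \\
0, & K_{ij} \notin W
\end{cases}
\qquad
S_i = \sum_{j} \alpha_{ij}\,\xi_{ij}
```

Under this reading, a category's score is the sum of the coefficients of those of its keywords that occur in $W$, and keywords with $\xi_{ij} = 0$ never change the score.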
Step 106, selecting the sentence category corresponding to the maximum value among the scoring results as the target sentence category of the text sentence to be recognized.
The sentence category with the maximum score is taken as the category of the text sentence to be recognized, which gives the matching result, namely the target sentence category.
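Steps 105 and 106 can be sketched together as follows. The scoring rule used here (sum of the coefficients of the keywords found in the text) is an assumed reading of the formula, which is only available as an image; the function names and the toy data are illustrative. Ties are resolved in favor of the category listed first, which is also an assumption.

```python
def score_categories(text_tokens, keyword_sequences, coefficient_sequences):
    """Score a segmented text sentence against every category's keyword sequence."""
    present = set(text_tokens)
    scores = {}
    for category, keywords in keyword_sequences.items():
        coefficients = coefficient_sequences[category]
        # A keyword found in the text contributes its coefficient; others contribute 0.
        scores[category] = sum(c for k, c in zip(keywords, coefficients) if k in present)
    return scores

def classify(text_tokens, keyword_sequences, coefficient_sequences):
    """Return the best-scoring category together with all scores."""
    scores = score_categories(text_tokens, keyword_sequences, coefficient_sequences)
    # max() returns the first maximal key in dict order, so earlier categories win ties.
    return max(scores, key=scores.get), scores

keyword_sequences = {"H1": ["oil", "inspect", "bushing", "replace"], "H2": ["line", "insulator"]}
coefficient_sequences = {"H1": [3, 2, 1, 0], "H2": [3, 2]}
print(classify(["replace", "bushing", "oil"], keyword_sequences, coefficient_sequences))
# -> ('H1', {'H1': 4, 'H2': 0})
```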
To facilitate understanding of the present embodiment, the following example from a daily power grid overhaul list is given:
TABLE 3 Keyword extraction example for a daily power grid overhaul list
Figure BDA0003554411070000073
Figure BDA0003554411070000081
From the extracted keywords, values can be assigned by the assignment method above to obtain the keyword coefficient sequence, and the scoring calculation then gives the best matching result.
With the text classification method for the power grid maintenance list provided by this embodiment, the preliminary text preparation, namely keyword integration, proportional screening, ascending arrangement and similar operations on the sentences, can be completed with only a small number of representative historical overhaul lists. Each word is then assigned a value to obtain the corresponding keyword coefficient sequence, and any text sentence to be recognized can be scored with the resulting coefficients, so that the target category can be matched accurately according to the score. Selecting a batch of low-frequency keywords avoids the influence of complex inter-class relations between sentences, which makes the classification result more reliable. This embodiment can therefore solve the technical problems that existing machine-learning matching methods need a large amount of training data and cannot handle complex inter-class relations, so that their results lack accuracy and reliability.
For ease of understanding, referring to Fig. 2, the present application provides an embodiment of a text classification device for a power grid maintenance list, including:
a keyword selection module 201, configured to select a keyword set of each sentence in a historical overhaul list according to a preset ratio, where the historical overhaul list includes a sentence category;
the word integration module 202 is configured to integrate all keyword sets according to sentence categories to obtain category keyword sets;
the word processing module 203 is configured to perform global frequency statistics and ascending order on the category keyword sets in sequence to obtain a keyword sequence;
the ranking assignment module 204 is configured to perform ranking assignment on a preset number of keywords in the keyword sequence to obtain a keyword coefficient sequence;
the score calculation module 205 is configured to perform sentence score calculation on the text sentence to be recognized according to the keyword coefficient sequence and the keyword sequence to obtain a plurality of score results;
and the text classification module 206 is configured to select a sentence category corresponding to the maximum value in the scoring result as a target sentence category of the text sentence to be recognized.
Further, the device also includes:
a sentence labeling module 207, configured to perform category labeling processing on each sentence in the historical overhaul list to obtain the sentence categories.
Further, the keyword selecting module 201 is specifically configured to:
carrying out artificial intelligence word segmentation on each sentence in the historical overhaul list to obtain an initial word segmentation set;
counting the global frequency of each word in the sentence according to the initial word segmentation set, and sequencing the words in an ascending order to obtain a word segmentation sequence corresponding to each sentence;
and sequentially selecting the keywords in the word segmentation sequence according to a preset proportion to obtain a keyword set.
Further, the ranking assignment module 204 is specifically configured to:
acquiring the first preset number of keywords in the keyword sequence;
and assigning the reverse-order ranking numbers of the keywords, in sequence, as coefficients to the first preset number of keywords, and assigning a coefficient of 0 to the keywords outside the first preset number, to obtain the keyword coefficient sequence.
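One way to picture how modules 201 to 206 fit together is a single class whose methods follow the module order. The class below is a hypothetical sketch, not the structure of the claimed device: the class and method names, the rounding of the preset ratio, the use of pre-segmented input, and the scoring rule (sum of the coefficients of the keywords present) are all assumptions.

```python
import math
from collections import Counter, defaultdict

class MaintenanceListClassifier:
    """Hypothetical wrapper; comments mark which module each step mirrors."""

    def __init__(self, ratio, preset_number):
        self.ratio = ratio
        self.preset_number = preset_number
        self.keyword_sequences = {}
        self.coefficients = {}

    def fit(self, tokenized_sentences, categories):
        freq = Counter(w for sent in tokenized_sentences for w in sent)
        by_category = defaultdict(list)
        for sent, category in zip(tokenized_sentences, categories):
            # Module 201: keep the rarest share of each sentence's words.
            ordered = sorted(dict.fromkeys(sent), key=lambda w: freq[w])
            keep = max(1, math.ceil(len(ordered) * self.ratio))
            # Module 202: pool the keywords of sentences in the same category.
            by_category[category].extend(ordered[:keep])
        for category, words in by_category.items():
            # Module 203: ascending global frequency inside each category.
            sequence = sorted(dict.fromkeys(words), key=lambda w: freq[w])
            self.keyword_sequences[category] = sequence
            # Module 204: reverse-order coefficients for the first preset_number, then 0.
            self.coefficients[category] = [
                max(self.preset_number - rank, 0) for rank in range(len(sequence))
            ]
        return self

    def predict(self, text_tokens):
        present = set(text_tokens)
        # Module 205: one score per sentence category.
        scores = {
            category: sum(c for k, c in zip(seq, self.coefficients[category]) if k in present)
            for category, seq in self.keyword_sequences.items()
        }
        # Module 206: the category with the maximum score is the result.
        return max(scores, key=scores.get)

model = MaintenanceListClassifier(ratio=0.5, preset_number=3).fit(
    [["replace", "transformer", "oil"],
     ["inspect", "transformer", "bushing"],
     ["replace", "line", "insulator"]],
    ["transformer", "transformer", "line"],
)
print(model.predict(["inspect", "bushing", "oil"]))  # -> transformer
```

The sentence labeling module 207 is represented only by the `categories` argument, since labeling is done before the other modules run.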
The application also provides text classification equipment for the power grid maintenance list, which comprises a processor and a memory;
the memory is used for storing the program codes and transmitting the program codes to the processor;
the processor is used for executing the text classification method of the power grid maintenance list in the above method embodiment according to the instructions in the program code.
The application also provides a computer-readable storage medium for storing program codes for executing the text classification method of the power grid service list in the above method embodiment.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in practice; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method described in the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A text classification method for a power grid maintenance list, characterized by comprising:
selecting, according to a preset ratio, a keyword set for each sentence in a historical maintenance list, wherein the historical maintenance list includes sentence categories;
integrating all of the keyword sets according to the sentence categories to obtain a category keyword set;
performing, in turn, global frequency counting and ascending-order sorting on the category keyword set to obtain a keyword sequence;
performing ranking assignment on the first preset number of keywords in the keyword sequence to obtain a keyword coefficient sequence;
performing sentence scoring calculation on a text sentence to be recognized according to the keyword coefficient sequence and the keyword sequence to obtain a plurality of scoring results; and
selecting the sentence category corresponding to the maximum value among the scoring results as the target sentence category of the text sentence to be recognized.

2. The text classification method for a power grid maintenance list according to claim 1, characterized in that, before the selecting, according to a preset ratio, of a keyword set for each sentence in the historical maintenance list, the method further comprises:
performing category labeling on each sentence in the historical maintenance list to obtain the sentence categories.

3. The text classification method for a power grid maintenance list according to claim 1, characterized in that the selecting, according to a preset ratio, of a keyword set for each sentence in the historical maintenance list comprises:
performing artificial intelligence word segmentation on each sentence in the historical maintenance list to obtain an initial word segmentation set;
counting, according to the initial word segmentation set, the global frequency of each word in the sentence and sorting the words in ascending order to obtain a word segmentation sequence corresponding to each sentence; and
selecting keywords sequentially from the word segmentation sequence according to the preset ratio to obtain the keyword set.

4. The text classification method for a power grid maintenance list according to claim 1, characterized in that the performing of ranking assignment on the first preset number of keywords in the keyword sequence to obtain a keyword coefficient sequence comprises:
obtaining the first preset number of keywords in the keyword sequence; and
assigning the reverse-order rank numbers of the keywords, as coefficients, to the first preset number of keywords in turn, and assigning a coefficient of 0 to the keywords outside the first preset number, to obtain the keyword coefficient sequence.

5. A text classification device for a power grid maintenance list, characterized by comprising:
a keyword selection module, configured to select, according to a preset ratio, a keyword set for each sentence in a historical maintenance list, wherein the historical maintenance list includes sentence categories;
a word integration module, configured to integrate all of the keyword sets according to the sentence categories to obtain a category keyword set;
a word processing module, configured to perform, in turn, global frequency counting and ascending-order sorting on the category keyword set to obtain a keyword sequence;
a ranking assignment module, configured to perform ranking assignment on the first preset number of keywords in the keyword sequence to obtain a keyword coefficient sequence;
a scoring calculation module, configured to perform sentence scoring calculation on a text sentence to be recognized according to the keyword coefficient sequence and the keyword sequence to obtain a plurality of scoring results; and
a text classification module, configured to select the sentence category corresponding to the maximum value among the scoring results as the target sentence category of the text sentence to be recognized.

6. The text classification device for a power grid maintenance list according to claim 5, characterized by further comprising:
a sentence labeling module, configured to perform category labeling on each sentence in the historical maintenance list to obtain the sentence categories.

7. The text classification device for a power grid maintenance list according to claim 5, characterized in that the keyword selection module is specifically configured to:
perform artificial intelligence word segmentation on each sentence in the historical maintenance list to obtain an initial word segmentation set;
count, according to the initial word segmentation set, the global frequency of each word in the sentence and sort the words in ascending order to obtain a word segmentation sequence corresponding to each sentence; and
select keywords sequentially from the word segmentation sequence according to the preset ratio to obtain the keyword set.

8. The text classification device for a power grid maintenance list according to claim 5, characterized in that the ranking assignment module is specifically configured to:
obtain the first preset number of keywords in the keyword sequence; and
assign the reverse-order rank numbers of the keywords, as coefficients, to the first preset number of keywords in turn, and assign a coefficient of 0 to the keywords outside the first preset number, to obtain the keyword coefficient sequence.

9. Text classification equipment for a power grid maintenance list, characterized in that the equipment comprises a processor and a memory;
the memory is configured to store program code and transmit the program code to the processor; and
the processor is configured to execute, according to instructions in the program code, the text classification method for a power grid maintenance list according to any one of claims 1-4.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium is configured to store program code, and the program code is used to execute the text classification method for a power grid maintenance list according to any one of claims 1-4.
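For illustration only, the steps recited in claims 1, 3 and 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation. The claims do not give the scoring formula, so the sketch assumes that a sentence's score for a category is the sum of the coefficients of that category's keywords found in the sentence. It also assumes that "global frequency" means the word count over all historical sentences, that sentences are already segmented into word lists, and that ties in frequency are broken alphabetically. The names `select_keywords`, `build_model`, `classify`, `ratio` and `top_n`, and the sample data, are hypothetical.

```python
import math
from collections import Counter, defaultdict


def select_keywords(tokens, global_freq, ratio):
    """Claim 3 (sketch): sort a sentence's words by ascending global
    frequency and keep the leading share given by the preset ratio."""
    ordered = sorted(set(tokens), key=lambda w: (global_freq[w], w))
    keep = max(1, math.ceil(len(ordered) * ratio))
    return ordered[:keep]


def build_model(labeled_sentences, ratio=0.5, top_n=3):
    """Claims 1 and 4 (sketch): build one keyword-coefficient map per category.

    labeled_sentences is a list of (tokens, category) pairs.
    """
    # Assumption: global frequency = count over every historical sentence.
    global_freq = Counter(w for tokens, _ in labeled_sentences for w in tokens)

    # Merge the per-sentence keyword sets by sentence category.
    category_keywords = defaultdict(set)
    for tokens, category in labeled_sentences:
        category_keywords[category].update(
            select_keywords(tokens, global_freq, ratio)
        )

    model = {}
    for category, words in category_keywords.items():
        # Keyword sequence: ascending global frequency, as the claim recites.
        sequence = sorted(words, key=lambda w: (global_freq[w], w))
        # Reverse-order rank as coefficient: the first top_n keywords get
        # top_n, top_n - 1, ..., 1; every later keyword gets 0.
        model[category] = {
            w: max(top_n - i, 0) for i, w in enumerate(sequence)
        }
    return model


def classify(tokens, model):
    """Score the sentence against every category and return the best one.

    Assumed scoring rule: sum of coefficients of matched keywords.
    """
    words = set(tokens)
    scores = {
        category: sum(coeffs.get(w, 0) for w in words)
        for category, coeffs in model.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical, pre-segmented training sentences with labeled categories.
history = [
    (["replace", "transformer", "oil"], "work_content"),
    (["replace", "insulator", "string"], "work_content"),
    (["wear", "helmet", "gloves"], "safety_measure"),
    (["wear", "safety", "belt"], "safety_measure"),
]
model = build_model(history, ratio=0.5, top_n=3)
label, scores = classify(["check", "insulator", "oil"], model)
```

In this toy data the sentence "check insulator oil" matches two of the "work_content" keywords and none of the "safety_measure" keywords, so `label` is `"work_content"`. Because the sequence is sorted in ascending order, the largest coefficients go to the least frequent keywords; whether that matches the intended weighting depends on details the claims leave open.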
CN202210270317.1A 2022-03-18 2022-03-18 Text classification method, device, equipment and storage medium for power grid maintenance list Pending CN114625839A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210270317.1A CN114625839A (en) 2022-03-18 2022-03-18 Text classification method, device, equipment and storage medium for power grid maintenance list

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210270317.1A CN114625839A (en) 2022-03-18 2022-03-18 Text classification method, device, equipment and storage medium for power grid maintenance list

Publications (1)

Publication Number Publication Date
CN114625839A true CN114625839A (en) 2022-06-14

Family

ID=81903017

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210270317.1A Pending CN114625839A (en) 2022-03-18 2022-03-18 Text classification method, device, equipment and storage medium for power grid maintenance list

Country Status (1)

Country Link
CN (1) CN114625839A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115827867A (en) * 2022-12-12 2023-03-21 北京百度网讯科技有限公司 Text type detection method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108664473A (en) * 2018-05-11 2018-10-16 平安科技(深圳)有限公司 Recognition methods, electronic device and the readable storage medium storing program for executing of text key message
WO2019210820A1 (en) * 2018-05-03 2019-11-07 华为技术有限公司 Information output method and apparatus
CN111475601A (en) * 2020-04-09 2020-07-31 云南电网有限责任公司电力科学研究院 Method and device for acquiring hot subject of power work order
CN111861610A (en) * 2019-04-30 2020-10-30 北京嘀嘀无限科技发展有限公司 A data processing method, device, electronic device and storage medium
CN112214990A (en) * 2020-09-24 2021-01-12 交控科技股份有限公司 Method and device for extracting key words of rail transit maintenance work order
CN113378567A (en) * 2021-07-05 2021-09-10 广东工业大学 Chinese short text classification method for improving low-frequency words

Similar Documents

Publication Publication Date Title
JP5165033B2 (en) Communication text classification method and apparatus
US10366117B2 (en) Computer-implemented systems and methods for taxonomy development
CN113051291A (en) Work order information processing method, device, equipment and storage medium
CN112395881B (en) Material label construction method and device, readable storage medium and electronic equipment
CN104077407B (en) A kind of intelligent data search system and method
CN111984788B (en) Power system violation management method, device and power equipment
US20210073216A1 (en) Business intelligence system based on artificial intelligence and analysis method thereof
CN110990529A (en) Enterprise industry detail division method and system
CN107506389A (en) A kind of method and apparatus for extracting position skill requirement
CN109711424A (en) A kind of rule of conduct acquisition methods, device and equipment based on decision tree
CN115238816B (en) User classification method and related equipment based on multivariate data fusion
CN115544348A (en) Intelligent mass information searching system based on Internet big data
CN110825839A (en) Incidence relation analysis method for targets in text information
CN110347833B (en) A Classification Method for Multi-round Dialogue
CN117454217A (en) A method, device and system for identifying depressive emotions based on deep integrated learning
CN118733712A (en) An intelligent search method based on retrieval-enhanced generation
CN110955767A (en) An algorithm and device for generating a list of intent candidate sets in a robot dialogue system
CN114625839A (en) Text classification method, device, equipment and storage medium for power grid maintenance list
CN111104422B (en) Training method, device, equipment and storage medium of data recommendation model
CN111191430B (en) Automatic table building method and device, computer equipment and storage medium
CN109471934B (en) Internet-based financial risk clues mining method
CN113889274B (en) Method and device for constructing risk prediction model of autism spectrum disorder
CN116340387A (en) A method and system for statistical analysis of personal information disclosure for data tables
CN109241276B (en) Word classification method in text, and speech creativity evaluation method and system
CN115169831A (en) Enterprise risk early warning method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination