CN119739866A - A method and device for identifying electrochemical energy storage technology based on patent text analysis

Info

Publication number: CN119739866A
Application number: CN202510229385.7A
Authority: CN
Inventors: 秦全德; 范璧; 蔡孜晓; 张珺婷
Original assignee: Shenzhen University
Current assignee: Shenzhen University
Priority date: 2025-02-28
Filing date: 2025-02-28
Publication date: 2025-04-01
Anticipated expiration: 2045-02-28
Also published as: CN119739866B

Abstract

The invention provides a method and device for identifying electrochemical energy storage technologies based on patent text analysis, and relates to the technical field of data processing. The method comprises: obtaining a patent text database comprising patent texts related to electrochemical energy storage; obtaining the topic distribution corresponding to each patent text through a Dirichlet allocation topic model, and taking the topics contained in each patent text as technical features of electrochemical energy storage; obtaining the number of applications of each patent text in a plurality of patent institutions, determining the average marginal effect of each technical feature based on the topic distribution and the number of applications, and determining target technical features based on the average marginal effect; and clustering the target technical features to obtain an electrochemical energy storage technology identification result, which comprises the technical topics obtained by the clustering. The method can improve the objectivity and timeliness of the identification result.

Description

Electrochemical energy storage technology identification method and device based on patent text analysis

Technical Field

The invention relates to the technical field of data processing, in particular to an electrochemical energy storage technology identification method and device based on patent text analysis.

Background

Although new energy sources such as photovoltaic and wind power reduce carbon emissions, their intermittency and instability pose challenges to the electric power system. Electrochemical energy storage can store and release energy effectively and thereby support stable operation of the power system. Despite its rapid development, electrochemical energy storage still faces uncertainty in technical route decisions. An effective identification method is therefore needed to screen the many electrochemical energy storage technologies and to provide a scientific decision basis for enterprises and research institutions.

In the prior art, electrochemical energy storage technologies are identified mainly by qualitative analysis, which relies on expert experience and knowledge and identifies and evaluates key technologies with tools such as the Delphi method, the analytic hierarchy process and scenario analysis. A team of domain experts is first assembled and an evaluation index system designed; expert opinions are then collected through questionnaires or seminars; finally the expert scores are aggregated to reach a conclusion. Because the result is limited by the experts' knowledge and experience, and experts often struggle to track the latest technical developments promptly and comprehensively when technology iterates rapidly, the objectivity and timeliness of the evaluation result are poor.

Disclosure of Invention

The invention provides an electrochemical energy storage technology identification method and device based on patent text analysis, which overcome the defect of the prior art that qualitative analysis yields identification results with poor objectivity and timeliness, and thereby improve the objectivity and timeliness of the electrochemical energy storage technology identification result.

The invention provides an electrochemical energy storage technology identification method based on patent text analysis, which comprises the following steps:

obtaining a patent text database, wherein the patent text database comprises patent texts related to electrochemical energy storage;

obtaining the topic distribution corresponding to each patent text through a Dirichlet allocation topic model, and taking the topics contained in each patent text as technical features of electrochemical energy storage;

acquiring the number of applications of each patent text in a plurality of patent institutions, determining the average marginal effect of each technical feature based on the topic distribution of the patent text and the number of applications, and determining target technical features based on the average marginal effect;

clustering the target technical features to obtain an electrochemical energy storage technology identification result, wherein the identification result comprises the technical topics obtained by clustering the target technical features.

According to the electrochemical energy storage technology identification method based on patent text analysis provided by the invention, obtaining the topic distribution corresponding to the patent text through the Dirichlet allocation topic model comprises the following steps:

sampling the label distribution of each patent text through the Dirichlet allocation topic model, wherein the labels are Derwent manual codes;

sampling the topic distribution of each label through the Dirichlet allocation topic model to obtain the topic distribution corresponding to each label of the patent text;

sampling the topic-word distribution of each topic through the Dirichlet allocation topic model to obtain the topic-word distribution corresponding to each topic of the patent text.

According to the electrochemical energy storage technology identification method based on patent text analysis provided by the invention, the average marginal effect of each technical feature is determined based on the topic distribution of the patent text and the application quantity, and the method comprises the following steps:

constructing a fitting relation between the technical characteristics and the application quantity in the patent text through an ordered logistic regression model;

Calculating the average marginal effect of each technical feature in each patent text based on the fitting relation.

According to the electrochemical energy storage technology identification method based on patent text analysis, which is provided by the invention, the target technical characteristics are determined based on the average marginal effect, and the method comprises the following steps:

selecting, as the target technical features, the technical features whose average marginal effect in the category of patent texts with the lowest number of applications is lower than a preset threshold.

According to the electrochemical energy storage technology identification method based on patent text analysis provided by the invention, the target technical characteristics are clustered to obtain an electrochemical energy storage technology identification result, and the method comprises the following steps:

Obtaining a similarity matrix corresponding to the target technical features, wherein the similarity matrix comprises the similarity among the target technical features;

determining a clustering result from the similarity matrix by an affinity propagation algorithm, and taking each cluster in the clustering result as a technical topic;

determining the electrochemical energy storage technology identification result based on the technical topics.

According to the electrochemical energy storage technology identification method based on patent text analysis provided by the invention, after the electrochemical energy storage technology identification result is determined based on the technical subject, the method comprises the following steps:

performing technical life cycle analysis on the technical characteristics included in the technical theme to obtain the technical maturity of each technical characteristic in the technical theme;

generating an evaluation result for each technical topic in the electrochemical energy storage technology identification result based on the technical features included in the technical topic and the average of their technical maturities.

The invention also provides an electrochemical energy storage technology identification device based on patent text analysis, which comprises:

The patent text acquisition module is used for acquiring a patent text database, wherein the patent text database comprises patent texts related to electrochemical energy storage;

The text topic mining module is used for obtaining topic distribution corresponding to the patent texts through a Dirichlet distribution topic model, and taking the topics contained in each patent text as technical characteristics of electrochemical energy storage;

The technical feature extraction module is used for acquiring the application quantity of the patent text in a plurality of patent institutions, determining the average marginal effect of each technical feature based on the subject distribution of the patent text and the application quantity, and determining the target technical feature based on the average marginal effect;

The technical characteristic clustering module is used for clustering the target technical characteristics to obtain an electrochemical energy storage technology identification result, wherein the electrochemical energy storage technology identification result comprises a technical theme obtained by clustering the target technical characteristics.

The invention also provides electronic equipment, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the electrochemical energy storage technology identification method based on the patent text analysis when executing the computer program.

The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements an electrochemical energy storage technology identification method based on patent text analysis as described in any of the above.

The invention also provides a computer program product comprising a computer program which when executed by a processor implements the electrochemical energy storage technology identification method based on patent text analysis as described in any one of the above.

According to the electrochemical energy storage technology identification method and device based on patent text analysis provided by the invention, a patent text database comprising patent texts related to electrochemical energy storage is obtained; the topic distribution corresponding to each patent text is obtained through a Dirichlet allocation topic model, and the topics contained in each patent text are taken as technical features of electrochemical energy storage; the number of applications of each patent text in a plurality of patent institutions is obtained, the average marginal effect of each technical feature is determined based on the topic distribution and the number of applications, and target technical features are determined based on the average marginal effect; and the target technical features are clustered to obtain an electrochemical energy storage technology identification result comprising the technical topics obtained by the clustering.

In this way, the patent texts related to electrochemical energy storage are analyzed: the Dirichlet allocation topic model mines the technical information of the patent texts and extracts technical features; to address the inconsistent level of detail among the extracted features, target technical features are selected on the basis of patent family size, which removes features that are overly broad or of low value; finally, the selected target technical features are clustered into technical topics. Because patent applications are forward-looking and the identification rests on the analysis of a large number of patent texts, the objectivity and timeliness of the identification result are improved and decision makers receive accurate reference information.

Drawings

In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

Fig. 1 is a schematic flow chart of an electrochemical energy storage technology identification method based on patent text analysis.

Fig. 2 is a probabilistic graphical diagram of the Dirichlet allocation topic model in the electrochemical energy storage technology identification method based on patent text analysis provided by the invention.

Fig. 3 is a schematic structural diagram of an electrochemical energy storage technology recognition device based on patent text analysis.

Fig. 4 is a schematic structural diagram of an electronic device provided by the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The electrochemical energy storage technology identification method based on patent text analysis provided by the invention is described below with reference to fig. 1-2. As shown in fig. 1, the electrochemical energy storage technology identification method based on patent text analysis comprises the following steps:

S110, acquiring a patent text database, wherein the patent text database comprises patent texts related to electrochemical energy storage;

S120, obtaining topic distribution corresponding to the patent texts through a dirichlet allocation topic model, and taking topics contained in each patent text as technical characteristics of electrochemical energy storage;

S130, acquiring the application quantity of patent texts in a plurality of patent institutions, determining the average marginal effect of each technical feature based on the topic distribution and the application quantity of the patent texts, and determining the target technical feature based on the average marginal effect;

S140, clustering the target technical characteristics to obtain an electrochemical energy storage technology identification result, wherein the electrochemical energy storage technology identification result comprises a technical theme obtained by clustering the target technical characteristics.

In the electrochemical energy storage technology identification method based on patent text analysis provided by the invention, the patent texts related to electrochemical energy storage are analyzed: the Dirichlet allocation topic model mines the technical information of the patent texts and extracts technical features; to address the inconsistent level of detail among the extracted features, target technical features are selected on the basis of patent family size, which removes features that are overly broad or of low value; finally, the selected target technical features are clustered into technical topics. Because patent applications are forward-looking and the identification rests on the analysis of a large number of patent texts, the objectivity and timeliness of the identification result are improved and decision makers receive accurate reference information.

The patent text database used in the method provided by the invention is constructed by searching for and collecting electrochemical energy storage patents in a patent database. In one possible implementation, these patents are retrieved from the Derwent patent database (Derwent Innovations Index). Derwent is a worldwide patent and technical information resource whose data reflects international technology development fairly comprehensively. The Derwent database provides detailed classification information, including patent families, Derwent class codes and manual codes, which are finer-grained than conventional patent classification systems such as the International Patent Classification (IPC). Patent text data related to electrochemical energy storage is collected with the search query DC=(X16), restricted to patents from the 21st century. The data is then preprocessed and organized by field (including patent title, abstract, Derwent manual codes, application details and patent family information) to form a high-quality patent text database.

Technical information is mined from the patent texts in the patent text database through the Dirichlet allocation topic model. In the method provided by the invention, Derwent manual codes are embedded into the Dirichlet allocation topic model to constrain the model output, so that topics guided by label information are obtained. Specifically, obtaining the topic distribution corresponding to the patent text through the Dirichlet allocation topic model comprises the following steps:

sampling the label distribution of each patent text through the Dirichlet allocation topic model, wherein the labels are Derwent manual codes;

sampling the topic distribution of each label through the Dirichlet allocation topic model to obtain the topic distribution corresponding to each label of the patent text;

sampling the topic-word distribution of each topic through the Dirichlet allocation topic model to obtain the topic-word distribution corresponding to each topic of the patent text.

In the method provided by the invention, a label dimension is introduced into the document-topic-word hierarchy of the Dirichlet allocation topic model to form a four-level document-label-topic-word structure.

The four-level document-label-topic-word structure is formed as follows:

(1) For each patent text $d$, sample a label distribution $\psi_d$, where $\psi_d$ is a vector whose length equals the number of labels $|\Lambda_d|$ of the document and whose $l$-th element is $\psi_{d,l}$;

(2) For each label $l \in \Lambda_d$, sample a topic distribution $\theta_{d,l} \sim \mathrm{Dir}(\alpha)$;

(3) For each topic $k$ of label $l$, sample a topic-word distribution $\Phi_{l,k} \sim \mathrm{Dir}(\eta)$;

(4) For each word $w$ of each document $d$, first sample a label $l$ from the label distribution $\psi_d$, then sample a latent topic $z$ from the topic distribution $\theta_{d,l}$, and finally sample the word $w$ from the topic-word distribution $\Phi_{l,z}$.

As shown in FIG. 2, the black nodes represent values that can be observed directly. In the method provided by the invention, the joint document-topic distribution $\theta$ and the topic-word distribution $\Phi$ are estimated by Markov chain Monte Carlo sampling.

In the above process and in FIG. 2, $D$ is the number of patent texts in the patent text database, $d$ is one of the patent texts, $N_d$ is the number of words in patent text $d$, $L$ is the number of all labels in the patent text database, $l$ is one of the labels, $\Lambda_d$ is the set of labels owned by document $d$, $\psi_d$ is the label distribution of document $d$, $K_l$ is the number of latent topics of label $l$, $\theta_{d,l}$ is the topic distribution of label $l$ in document $d$, $\Phi_{l,k}$ is the topic-word distribution of the $k$-th topic of label $l$, $\alpha$ is the hyperparameter of the prior Dirichlet distribution of the topic distribution $\theta$, $\eta$ is the hyperparameter of the prior Dirichlet distribution of the topic-word distribution $\Phi$, $w$ is an observed word, and $z$ is the latent topic of each observed word $w$.
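The four sampling steps can be illustrated with a small generative sketch. It is only a toy under stated assumptions, not the estimation procedure of the invention: the label strings, vocabulary size, hyperparameter values and all function names are hypothetical, a symmetric Dirichlet prior is assumed throughout, and the same prior parameter is reused for the label and topic distributions for brevity.

```python
import random

def sample_dirichlet(alpha, size, rng):
    """Draw from a symmetric Dirichlet by normalizing Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [x / total for x in draws]

def sample_index(probs, rng):
    """Draw one index from a categorical distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_document(doc_labels, topics_per_label, topic_word, n_words, alpha, rng):
    """Generate (label, topic, word) triples for one document.

    Only topics that belong to the document's own labels can be drawn.
    """
    psi = sample_dirichlet(alpha, len(doc_labels), rng)           # step (1)
    theta = {l: sample_dirichlet(alpha, topics_per_label[l], rng)  # step (2)
             for l in doc_labels}
    triples = []
    for _ in range(n_words):                                       # step (4)
        label = doc_labels[sample_index(psi, rng)]
        topic = sample_index(theta[label], rng)
        word = sample_index(topic_word[(label, topic)], rng)
        triples.append((label, topic, word))
    return triples

rng = random.Random(0)
vocab_size = 6                                  # hypothetical vocabulary size
topics_per_label = {"X16-B": 2, "X16-F": 3}     # placeholder label strings
eta = 0.5                                       # assumed prior for topic-word distributions
topic_word = {(l, k): sample_dirichlet(eta, vocab_size, rng)   # step (3)
              for l, n_topics in topics_per_label.items()
              for k in range(n_topics)}

# A document that carries only the first label never uses topics of the second.
doc = generate_document(["X16-B"], topics_per_label, topic_word, 20, 0.5, rng)
```

In practice the direction is reversed: the words and labels are observed, and the distributions are inferred by sampling, as the description states.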

In the method provided by the invention, the label information ensures that each patent text draws only on topics associated with its own labels. Using the patent classification system as label information distinguishes the method from conventional unsupervised text mining: the topics mined from the patent texts are confined to a specific technical scope, so that the output topics carry the technical field information of the patent classification labels and, at a finer granularity, reveal technical content that is more detailed than the classification labels and fits the data of the subdivided technical field (electrochemical energy storage). The patent classification information provides the macroscopic technical framework, bounds the scope of analysis and reduces noise, while the text semantics mine fine-grained technical features from the patents and capture deeper semantic associations. Combining the two yields multi-level technical feature extraction from the macroscopic to the microscopic level, which improves the accuracy of technical information mining, strengthens the ability to identify cross-field technologies, and adapts to a complex technical ecosystem.

The topics output by the model are taken as technical features, and the specific technical information of each feature is interpreted through its topic-word distribution. The method also represents each patent text as a document-topic distribution, that is, the proportion of each technical feature in the document, which gives the technical feature vector of each patent. The sum of a topic's proportions over all documents is called the popularity of the topic.
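As a small illustration of the popularity measure, the sketch below sums each topic's proportion over all documents. The document-topic matrix is made-up data and the function name is hypothetical.

```python
def topic_popularity(doc_topic):
    """Sum each topic's proportion over all documents."""
    n_topics = len(doc_topic[0])
    return [sum(row[k] for row in doc_topic) for k in range(n_topics)]

# Each row is the technical feature vector of one patent (proportions sum to 1).
doc_topic = [
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.5, 0.5, 0.0],
]
popularity = topic_popularity(doc_topic)
```

Computing the same sum per application year would give the popularity time series that a life cycle analysis needs.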

In general, a patent filed in several patent institutions has higher technical value. In the method provided by the invention, the number of applications of each patent text in a plurality of patent institutions is obtained, and the average marginal effect of each technical feature is determined based on the topic distribution of the patent text and the number of applications, which specifically comprises:

constructing a fitting relation between the technical features of the patent texts and the number of applications through an ordered logistic regression model;

calculating the average marginal effect of each technical feature over the patent texts based on the fitting relation.

Specifically, the applications can be counted at the major patent institutions that receive the most applications among the patent texts in the database (for example, the China National Intellectual Property Administration, the European Patent Office, the Japan Patent Office, the Korean Intellectual Property Office and the United States Patent and Trademark Office). Taking 5 patent institutions as an example, the number of these institutions in which each patent in the database has been filed is counted, giving an integer from 0 to 5: 0 if the patent was filed in none of them, 1 if filed in one, 2 if filed in two, and so on. The integer is then coded into categories, for example 0 and 1 as category 0, 2 as category 1, 3 as category 2, and 4 and 5 as category 3. The patent technical value measure index is thus a four-category ordinal variable: the higher its value, the more patent institutions the patent has been filed in and the higher its technical value.
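The coding of the application count into four ordered categories can be sketched as follows. The office codes and function names are illustrative assumptions; only the mapping from counts to categories follows the example given above.

```python
MAJOR_OFFICES = {"CN", "EP", "JP", "KR", "US"}  # illustrative office codes

def count_major_offices(filing_offices):
    """Number of the five major offices in which a patent family was filed."""
    return len(MAJOR_OFFICES & set(filing_offices))

def value_category(n_offices):
    """Map an office count (0 to 5) to the four-category ordinal index."""
    if not 0 <= n_offices <= 5:
        raise ValueError("office count must be between 0 and 5")
    if n_offices <= 1:
        return 0
    if n_offices == 2:
        return 1
    if n_offices == 3:
        return 2
    return 3  # counts 4 and 5

categories = [value_category(n) for n in range(6)]
example = value_category(count_major_offices(["CN", "US", "JP", "AU"]))
```

Offices outside the chosen five, such as the hypothetical `"AU"` entry, do not contribute to the count.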

Based on the topic distribution of each patent text and its number of applications, a fitting relation between the technical features and the number of applications is constructed. The fitting relation may be constructed by an ordered logistic regression (Ordinal Logistic Regression, OLR) model, that is, application quantity index = OLR(patent technical feature vector). The relation between the technical feature distribution of a patent text and its number of applications can be expressed as:

$$\ln\frac{P(Y \le j)}{1 - P(Y \le j)} = \tau_j - \sum_{i=1}^{M} \beta_i x_i;$$

where $\tau_j$ is the division threshold of application quantity category $j$, $x_i$ is the $i$-th item of the patent technical feature vector, $M$ is the total number of technical features, $\beta_i$ is the regression coefficient of the technical feature corresponding to $x_i$, $Y$ is the patent technical value measure index, and $P$ denotes probability.

The average marginal effect (AME) of each technical feature can be calculated from the fitted ordered logistic regression model as follows:

$$\mathrm{AME}_k = \frac{1}{N}\sum_{n=1}^{N}\frac{\partial P(Y = j \mid \mathbf{x}_n)}{\partial x_{n,k}};$$

where $N$ is the total number of patent texts, $\partial P(Y = j \mid \mathbf{x}_n)/\partial x_{n,k}$ is the marginal effect calculated on the $n$-th patent text $\mathbf{x}_n$, and $\mathrm{AME}_k$ is the average marginal effect of the $k$-th technical feature.

The marginal effect of a technical feature calculated on a given patent text reflects how strongly that feature influences the number of applications of that text (and hence the technical value of the patent). Averaging the marginal effects of a feature over all patent texts gives the average marginal effect of the feature.
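The calculation can be sketched in plain Python for a model that has already been fitted. The sketch assumes the common proportional-odds parameterization, logit P(Y <= j) = tau_j - sum_i beta_i * x_i; the coefficients, thresholds, feature vectors and function names are invented for illustration and the fitting step itself is not shown.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def category_probs(x, beta, tau):
    """P(Y = j | x) for j = 0..len(tau), assuming logit P(Y <= j) = tau_j - x.beta."""
    score = sum(b * xi for b, xi in zip(beta, x))
    cum = [sigmoid(t - score) for t in tau] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

def marginal_effects(x, beta, tau, j):
    """Analytic derivative dP(Y = j | x) / dx_k for every feature k."""
    score = sum(b * xi for b, xi in zip(beta, x))

    def density(t):
        s = sigmoid(t - score)
        return s * (1.0 - s)

    upper = density(tau[j]) if j < len(tau) else 0.0
    lower = density(tau[j - 1]) if j > 0 else 0.0
    return [-b * (upper - lower) for b in beta]

def average_marginal_effects(X, beta, tau, j):
    """Average the per-document marginal effects over all documents."""
    effects = [marginal_effects(x, beta, tau, j) for x in X]
    return [sum(e[k] for e in effects) / len(X) for k in range(len(beta))]

# Hypothetical fitted model with three technical features and four categories.
X = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.5, 0.5, 0.0]]
beta = [1.2, -0.4, 0.0]
tau = [-0.5, 0.5, 1.5]
ame_cat0 = average_marginal_effects(X, beta, tau, 0)
```

With these made-up coefficients the first feature has a negative average marginal effect on category 0, meaning that a larger share of it lowers the probability of the lowest-value category.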

Determining the target technical features based on the average marginal effect comprises:

selecting, as target technical features, the technical features whose average marginal effect in the category of patent texts with the lowest number of applications is lower than a preset threshold.

Specifically, the average marginal effect for category 0 (the category with the lowest number of applications) may be used as the basis for selecting target technical features. It is the change in the probability that a patent belongs to category 0 (that is, its application count is 0 or 1) when the proportion of the technical feature increases by one unit. When this value is negative, the higher the proportion of the feature in a patent, the lower the probability that the patent belongs to category 0, and the higher the probability that it has high technical value. A negative average marginal effect in the lowest-value category therefore indicates that the technical feature is positively associated with patent technical value, and the larger its absolute value, the stronger the influence of the feature. Finally, all technical features whose average marginal effect in the lowest-value category is less than 0 (at the 0.05 significance level) are selected as key technical features (that is, target technical features) of electrochemical energy storage, and the absolute values of their average marginal effects are used as the numerical basis for subsequent technical value calculation.
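The selection rule can be sketched as a simple filter. The effect sizes and p-values below are invented, and how the p-values are obtained is left open here because the description does not specify it; the function name is hypothetical.

```python
def select_target_features(ame_cat0, p_values, alpha=0.05):
    """Keep features whose category-0 AME is negative and significant.

    Returns {feature index: absolute AME}, the value later used as the
    numerical basis for technical value.
    """
    return {
        k: abs(effect)
        for k, (effect, p) in enumerate(zip(ame_cat0, p_values))
        if effect < 0 and p < alpha
    }

# Hypothetical results for four technical features.
ame_cat0 = [-0.12, 0.05, -0.03, -0.20]
p_values = [0.01, 0.02, 0.30, 0.04]
targets = select_target_features(ame_cat0, p_values)
```

Feature 1 is dropped because its effect is positive, and feature 2 because its effect is not significant at the 0.05 level.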

The technical value of a patent is quantified by constructing a new technical value measure index, and a model of the influence of technical features on technical value is established with an ordered logistic regression model. By combining the technical value measure with statistical modeling, the method provided by the invention can effectively identify the key technical features that strongly influence technical value and provides a scientific basis for screening key technologies.

The target technical features selected through the above steps have clear meanings and point accurately to technical content of high technical value. To support technology management decisions, the method provided by the invention further condenses the target technical features into more representative meso-level technical topics, so as to distill the key electrochemical energy storage technologies.

Clustering the target technical features to obtain the electrochemical energy storage technology identification result comprises:

obtaining a similarity matrix corresponding to the target technical features, wherein the similarity matrix comprises the similarity between the target technical features;

determining a clustering result from the similarity matrix by an affinity propagation algorithm, and taking each cluster in the clustering result as a technical topic;

determining the electrochemical energy storage technology identification result based on the technical topics.

First, an $M \times M$ similarity matrix $S = [s_{ij}]$ is constructed from the $M$ selected target technical features, where $W_i$ is the set of the top 15 topic keywords of the $i$-th target technical feature and each element $s_{ij}$ is the Jaccard coefficient of technical features $i$ and $j$:

$$s_{ij} = \frac{|W_i \cap W_j|}{|W_i \cup W_j|};$$
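A minimal sketch of the similarity matrix follows. The keyword sets are invented and much shorter than the 15 keywords per feature used in the method; the function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard coefficient of two keyword collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similarity_matrix(keyword_sets):
    """Pairwise Jaccard coefficients between all technical features."""
    return [[jaccard(a, b) for b in keyword_sets] for a in keyword_sets]

# Hypothetical top keywords of three target technical features.
keyword_sets = [
    {"lithium", "cathode", "electrolyte"},
    {"lithium", "anode", "electrolyte"},
    {"flow", "vanadium", "membrane"},
]
S = similarity_matrix(keyword_sets)
```

The first two features share two of four distinct keywords, so their similarity is 0.5, while the third shares none with either and scores 0.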

Then the technical features are clustered with the affinity propagation (AP) algorithm based on the similarity matrix. Using an iterative message-passing mechanism, two kinds of messages, "responsibility" and "availability", are propagated between data points to determine the cluster center of each data point. The responsibility $r(i,k)$ represents how well suited data point $k$ is to serve as the cluster center of data point $i$, as judged by data point $i$. The availability $a(i,k)$ represents how appropriate it would be for data point $i$ to choose data point $k$ as its cluster center. The following updates are iterated until convergence:

$$r(i,k) \leftarrow s_{ik} - \max_{k' \ne k}\{a(i,k') + s_{ik'}\};$$

$$a(i,k) \leftarrow \min\Big\{0,\; r(k,k) + \sum_{i' \notin \{i,k\}} \max\{0, r(i',k)\}\Big\}.$$

In the final result, for each data point $i$ the $k$ that maximizes $r(i,k) + a(i,k)$ is selected as its cluster center. The responsibility matrix and the availability matrix are iterated through message exchange between points, and the number of clusters and the cluster centers are finally obtained adaptively from the two matrices, so that the technical features are clustered into meso-level technical topics.
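The message passing can be sketched in plain Python as below. This follows the standard affinity propagation formulation rather than any implementation disclosed in the patent: the damping factor, the fixed iteration count, the median-based default preference and the separate self-availability update are conventional choices assumed here, and the similarity values are invented.

```python
def affinity_propagation(S, preference=None, damping=0.5, n_iter=200):
    """Return, for each point, the index of its chosen cluster center."""
    n = len(S)
    S = [row[:] for row in S]
    if preference is None:
        # Assumed default: median of the off-diagonal similarities.
        off = sorted(S[i][k] for i in range(n) for k in range(n) if i != k)
        preference = off[len(off) // 2]
    for k in range(n):
        S[k][k] = preference
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(n_iter):
        for i in range(n):
            for k in range(n):
                best_other = max(A[i][kk] + S[i][kk] for kk in range(n) if kk != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        for k in range(n):
            pos = [max(0.0, R[ii][k]) for ii in range(n)]
            total = sum(pos) - pos[k]  # positive support from points other than k
            for i in range(n):
                if i == k:
                    new = total  # self-availability
                else:
                    new = min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]

# Invented similarities: features 0-2 overlap with each other, as do 3-5.
S = [
    [1.0, 0.75, 0.6, 0.0, 0.0, 0.0],
    [0.75, 1.0, 0.4, 0.0, 0.0, 0.0],
    [0.6, 0.4, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.75, 0.4],
    [0.0, 0.0, 0.0, 0.75, 1.0, 0.6],
    [0.0, 0.0, 0.0, 0.4, 0.6, 1.0],
]
centers = affinity_propagation(S)
```

The number of clusters is not passed in; it emerges from the similarities and the preference value placed on the diagonal, which is the property the description relies on.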

In patent topic clustering, the confirmation of topic number and clustering robustness are all the time difficult problems. In the method provided by the invention, the clustering quantity is not required to be designated in advance through a neighbor propagation clustering algorithm, all data points are used as potential clustering centers, and clustering is realized through message transmission, so that the subjectivity of the super parameter of decision clustering quantity is avoided, and meanwhile, the Jaccard coefficient is not Euclidean distance measurement, so that higher robustness is provided.

And constructing a similarity matrix by taking Jaccard coefficients of the technical feature topic keyword sets as similarity measures, adaptively determining the clustering quantity and the centers through a neighbor propagation algorithm, and organizing the technical features into technical topics with moderate concrete and strong strategic guidance. The method provided by the invention does not need to preset the clustering quantity, avoids subjectivity, has higher robustness when processing non-European space data, and can better adapt to the complexity and diversity of technical characteristics. And a more representative mesoscopic technical theme is extracted through the clustering result, so that the interpretability and the practicability of the decision support are improved.

The key technical topics are finally obtained through the clustering operation. For each key technical topic, the electrochemical energy storage key technology that it represents is identified from the specific content of the 15 topic keywords with the highest proportion in the topic.

In one possible implementation, the technical maturity of each technical topic may further be obtained, and each technical topic is evaluated by combining its technical maturity with its corresponding number of applications (which reflects the technical value of the technical features), so as to provide more reference information for technical decision makers. That is, after the electrochemical energy storage technology identification result is determined based on the technical topics, the method further comprises:

Performing technology life cycle analysis on the technical features included in each technical topic to obtain the technical maturity of each technical feature in the technical topic.

Specifically, in the method provided by the invention, the identified target technical features of electrochemical energy storage are modeled with the Gompertz S-shaped growth curve used in technology life cycle analysis, expressed as:

$$y_t = K \exp\left(-a e^{-bt}\right)$$

where $K$ is the growth saturation value, $a$ and $b$ are parameters controlling the growth rate, $t$ is the year, and $y_t$ represents the popularity of the target technical feature in year $t$. The S-curve is fitted to the popularity growth data of the target technical features, the parameters are estimated, and $y_T / K$ (i.e., the ratio of the value in the most recent year $T$ to the predicted limiting maximum) is taken as the technical maturity of the technical feature. Finally, each technical topic uses the average maturity of all technical features in its cluster as the quantitative index of its technical maturity:

$$M = \frac{1}{m} \sum_{i=1}^{m} \frac{y_{T,i}}{K_i}$$

where $m$ represents the number of technical features included in the technical topic.
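The maturity calculation can be sketched as below, assuming the standard Gompertz form $y_t = K \exp(-a e^{-bt})$ with $t$ given as a year offset (0, 1, 2, ...) rather than a calendar year. The fitting routine is a simple stand-in for a nonlinear least-squares fit: it scans candidate values of $K$ and, for each, estimates $a$ and $b$ by linear regression on $\ln \ln (K / y_t) = \ln a - bt$. All function names, the grid settings, and the synthetic data are hypothetical.

```python
import math


def gompertz(t, K, a, b):
    """Gompertz growth curve y_t = K * exp(-a * exp(-b * t))."""
    return K * math.exp(-a * math.exp(-b * t))


def fit_gompertz(ts, ys, k_steps=400, k_step=0.005):
    """Estimate (K, a, b) from positive observations ys at year offsets ts.

    Scans K over a grid above max(ys); for each K, fits ln(ln(K / y)) = ln(a) - b*t
    by ordinary least squares and keeps the K with the smallest squared error.
    """
    n = len(ts)
    t_mean = sum(ts) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    y_max = max(ys)
    best = None
    for j in range(1, k_steps + 1):
        K = y_max * (1 + k_step * j)
        zs = [math.log(math.log(K / y)) for y in ys]
        z_mean = sum(zs) / n
        slope = sum((t - t_mean) * (z - z_mean) for t, z in zip(ts, zs)) / sxx
        b = -slope
        a = math.exp(z_mean - slope * t_mean)
        sse = sum((gompertz(t, K, a, b) - y) ** 2 for t, y in zip(ts, ys))
        if best is None or sse < best[0]:
            best = (sse, K, a, b)
    return best[1], best[2], best[3]


def feature_maturity(ts, ys):
    """Maturity of one technical feature: last observed value over the fitted limit K."""
    K, _, _ = fit_gompertz(ts, ys)
    return ys[-1] / K


def topic_maturity(feature_maturities):
    """Maturity of a technical topic: mean maturity of its m technical features."""
    return sum(feature_maturities) / len(feature_maturities)


# Synthetic popularity series generated from known parameters (illustration only).
ts = list(range(15))
ys = [gompertz(t, 100.0, 5.0, 0.3) for t in ts]
maturity = feature_maturity(ts, ys)
```

On real data a dedicated nonlinear least-squares routine would usually be preferable; the grid scan is used here only to keep the sketch dependency-free.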

The method measures the application breadth and development depth of a technology through the technical value and technical maturity of its key technical features. The technical values and technical maturities of the technical features belonging to the same topic are averaged, and the resulting averages are taken as the technical value and technical maturity of the identified electrochemical energy storage key technology. The technical value and technical maturity indices of each technology are normalized and used as the vertical and horizontal axes, respectively, to construct a key technology evaluation matrix. The development status of each key technology is then classified qualitatively according to high versus low technical value (split at approximately the midpoint of the technical value distribution range, namely 0.15) and the early, middle, and late stages of technical development (split at technical maturity values of 0.25 and 0.75). Finally, the early-to-mid-stage electrochemical energy storage key technologies with high technical value are obtained.
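The evaluation matrix can be sketched as a simple classification over normalized scores. The cut-offs 0.15, 0.25, and 0.75 come from the description above; the use of min-max normalization, the treatment of values that fall exactly on a cut-off, the function names, and the sample technologies are assumptions made for illustration.

```python
def min_max(values):
    """Min-max normalize a list of numbers to [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def classify(value, maturity, value_cut=0.15, early_cut=0.25, late_cut=0.75):
    """Place one technology in the evaluation matrix.

    Returns (value level, development stage) from normalized scores.
    """
    level = "high" if value >= value_cut else "low"
    if maturity < early_cut:
        stage = "early"
    elif maturity <= late_cut:
        stage = "middle"
    else:
        stage = "late"
    return level, stage


def key_technologies(techs):
    """Select technologies with high value in the early or middle stage.

    techs maps a technology name to (raw technical value, raw technical maturity).
    """
    names = list(techs)
    values = min_max([techs[n][0] for n in names])
    maturities = min_max([techs[n][1] for n in names])
    selected = []
    for name, v, m in zip(names, values, maturities):
        level, stage = classify(v, m)
        if level == "high" and stage in ("early", "middle"):
            selected.append(name)
    return selected


# Hypothetical topic-level averages: (technical value, technical maturity).
techs = {"A": (0.02, 0.10), "B": (0.30, 0.50), "C": (0.25, 0.95), "D": (0.05, 0.60)}
selected = key_technologies(techs)
```

With these sample numbers, only technology B combines a high normalized value with a middle-stage maturity; C has high value but is late-stage, and A and D fall below the value cut-off.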

In summary, the technology life cycle is modeled with a Gompertz growth curve, the maturity of each technical topic is quantified by the within-cluster mean, a key technology evaluation matrix is constructed from technical value and technical maturity, and the early-to-mid-stage key technologies with high technical value are finally identified. This reveals how different technologies differ in development stage and market value and provides clear guidance for technology management.

The electrochemical energy storage technology identification device based on patent text analysis provided by the invention is described below; the device described below and the method described above may be referred to in correspondence with each other. As shown in Fig. 3, the device comprises:

A patent text acquisition module 310, configured to acquire a patent text database, where the patent text database includes patent texts related to electrochemical energy storage;

A text topic mining module 320, configured to obtain the topic distribution corresponding to the patent texts through a Dirichlet distribution topic model, and to take the topics included in each patent text as technical features of electrochemical energy storage;

A technical feature extraction module 330, configured to obtain the number of applications of the patent texts in a plurality of patent institutions, determine the average marginal effect of each technical feature based on the topic distribution and the number of applications of the patent texts, and determine target technical features based on the average marginal effect;

A technical feature clustering module 340, configured to cluster the target technical features to obtain an electrochemical energy storage technology identification result, where the electrochemical energy storage technology identification result includes technical topics obtained by clustering the target technical features.
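The way the four modules hand results to one another can be sketched as a small composition. This is only a structural illustration: the class name `IdentificationDevice`, the callable signatures, and the stub functions are hypothetical placeholders and do not implement the topic model, the marginal-effect analysis, or the clustering.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class IdentificationDevice:
    """Chains the four modules of the device in the order given in the description."""
    acquire_texts: Callable[[], List[str]]                  # module 310
    mine_topics: Callable[[List[str]], Any]                 # module 320
    extract_features: Callable[[List[str], Any], List[Any]]  # module 330
    cluster_features: Callable[[List[Any]], List[Any]]      # module 340

    def run(self) -> List[Any]:
        texts = self.acquire_texts()
        topic_distribution = self.mine_topics(texts)
        target_features = self.extract_features(texts, topic_distribution)
        return self.cluster_features(target_features)


# Placeholder stubs standing in for the real modules (illustration only).
device = IdentificationDevice(
    acquire_texts=lambda: ["patent text 1", "patent text 2"],
    mine_topics=lambda texts: {i: [0.5, 0.5] for i, _ in enumerate(texts)},
    extract_features=lambda texts, topics: ["feature 0", "feature 1"],
    cluster_features=lambda features: [features],
)
result = device.run()
```

Keeping each module behind a plain callable makes it straightforward to swap in a different topic model or clustering step without changing the surrounding flow.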

Fig. 4 is a schematic diagram of the physical structure of an electronic device. As shown in Fig. 4, the electronic device may include a processor 410, a communication interface 420, a memory 430, and a communication bus 440, where the processor 410, the communication interface 420, and the memory 430 communicate with one another through the communication bus 440. The processor 410 may call logic instructions in the memory 430 to execute the electrochemical energy storage technology identification method based on patent text analysis, the method including: obtaining a patent text database, where the patent text database includes patent texts related to electrochemical energy storage; obtaining the topic distribution corresponding to the patent texts through a Dirichlet distribution topic model, and taking the topics included in each patent text as technical features of electrochemical energy storage; obtaining the number of applications of the patent texts in a plurality of patent institutions, determining the average marginal effect of each technical feature based on the topic distribution and the number of applications of the patent texts, and determining target technical features based on the average marginal effect; and clustering the target technical features to obtain an electrochemical energy storage technology identification result, where the electrochemical energy storage technology identification result includes technical topics obtained by clustering the target technical features.

Further, the logic instructions in the memory 430 described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and comprises several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present invention. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.

In another aspect, the invention further provides a computer program product comprising a computer program, where the computer program may be stored on a non-transitory computer-readable storage medium and, when executed by a processor, causes a computer to perform the electrochemical energy storage technology identification method based on patent text analysis provided above, the method including: obtaining a patent text database, where the patent text database includes patent texts related to electrochemical energy storage; obtaining the topic distribution corresponding to the patent texts through a Dirichlet distribution topic model, and taking the topics included in each patent text as technical features of electrochemical energy storage; obtaining the number of applications of the patent texts in a plurality of patent institutions, determining the average marginal effect of each technical feature based on the topic distribution and the number of applications of the patent texts, and determining target technical features based on the average marginal effect; and clustering the target technical features to obtain an electrochemical energy storage technology identification result, where the electrochemical energy storage technology identification result includes technical topics obtained by clustering the target technical features.

In still another aspect, the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the electrochemical energy storage technology identification method based on patent text analysis provided above, the method including: obtaining a patent text database, where the patent text database includes patent texts related to electrochemical energy storage; obtaining the topic distribution corresponding to the patent texts through a Dirichlet distribution topic model, and taking the topics included in each patent text as technical features of electrochemical energy storage; obtaining the number of applications of the patent texts in a plurality of patent institutions, determining the average marginal effect of each technical feature based on the topic distribution and the number of applications of the patent texts, and determining target technical features based on the average marginal effect; and clustering the target technical features to obtain an electrochemical energy storage technology identification result, where the electrochemical energy storage technology identification result includes technical topics obtained by clustering the target technical features.

The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without creative effort.

From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or, of course, by means of hardware. Based on this understanding, the foregoing technical solution, in essence, or the part of it that contributes over the prior art, may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the respective embodiments or in some parts of the embodiments.

It should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention, and not for limiting the same, and although the present invention has been described in detail with reference to the above-mentioned embodiments, it should be understood by those skilled in the art that the technical solution described in the above-mentioned embodiments may be modified or some technical features may be equivalently replaced, and these modifications or substitutions do not make the essence of the corresponding technical solution deviate from the spirit and scope of the technical solution of the embodiments of the present invention.

Claims

1. A method for identifying electrochemical energy storage technology based on patent text analysis, characterized by comprising:

Obtaining a patent text database, wherein the patent text database includes patent texts related to electrochemical energy storage;

Obtaining the topic distribution corresponding to the patent texts through a Dirichlet distribution topic model, and taking the topics included in each of the patent texts as technical features of electrochemical energy storage;

Obtaining the number of applications of the patent texts in a plurality of patent institutions, determining the average marginal effect of each of the technical features based on the topic distribution of the patent texts and the number of applications, and determining target technical features based on the average marginal effect;

Clustering the target technical features to obtain an electrochemical energy storage technology identification result, wherein the electrochemical energy storage technology identification result includes technical topics obtained by clustering the target technical features.

2. The electrochemical energy storage technology identification method based on patent text analysis according to claim 1, characterized in that obtaining the topic distribution corresponding to the patent texts through the Dirichlet distribution topic model comprises:

Sampling a label for the patent text using the Dirichlet distribution topic model, wherein the label is a Derwent manual code;

Sampling the topic distribution of each of the labels through the Dirichlet distribution topic model to obtain the topic distribution corresponding to each topic and the label of the patent text;

Sampling the topic word distribution of each of the topics through the Dirichlet distribution topic model to obtain the topic word distribution corresponding to each topic word and the topic of the patent text.

3. The electrochemical energy storage technology identification method based on patent text analysis according to claim 1, characterized in that determining the average marginal effect of each of the technical features based on the topic distribution of the patent texts and the number of applications comprises:

Constructing a fitting relationship between the technical features in the patent text and the number of applications through an ordered logistic regression model;

The average marginal effect of each of the technical features in each of the patent texts is calculated based on the fitting relationship.

4. The electrochemical energy storage technology identification method based on patent text analysis according to claim 1, characterized in that determining the target technical features based on the average marginal effect comprises:

The technical feature whose average marginal effect in the patent text with the lowest number of applications is lower than a preset threshold is selected as the target technical feature.

5. The electrochemical energy storage technology identification method based on patent text analysis according to claim 2, characterized in that clustering the target technical features to obtain the electrochemical energy storage technology identification result comprises:

Obtaining a similarity matrix corresponding to the target technical features, wherein the similarity matrix includes similarities between the target technical features;

Determining a clustering result from the similarity matrix using an affinity propagation algorithm, and taking each cluster in the clustering result as a technical topic;

Determining the electrochemical energy storage technology identification result based on the technical topics.

6. The electrochemical energy storage technology identification method based on patent text analysis according to claim 5, characterized in that, after determining the electrochemical energy storage technology identification result based on the technical topics, the method further comprises:

Performing a technology life cycle analysis on the technical features included in the technical topic to obtain the technical maturity of each of the technical features in the technical topic;

Based on the average value of the technical maturity of each of the technical features included in the technical topic and each of the technical features included in the technical topic, generating an evaluation result of each of the technical topics in the electrochemical energy storage technology identification result.

7. An electrochemical energy storage technology identification device based on patent text analysis, characterized by comprising:

A patent text acquisition module, used to acquire a patent text database, wherein the patent text database includes patent texts related to electrochemical energy storage;

A text topic mining module, used to obtain the topic distribution corresponding to the patent text through the Dirichlet distribution topic model, and use the topics included in each of the patent texts as the technical features of electrochemical energy storage;

A technical feature extraction module, used to obtain the number of applications of the patent texts in a plurality of patent institutions, determine the average marginal effect of each technical feature based on the topic distribution of the patent texts and the number of applications, and determine target technical features based on the average marginal effect;

A technical feature clustering module, used to cluster the target technical features to obtain an electrochemical energy storage technology identification result, wherein the electrochemical energy storage technology identification result includes technical topics obtained by clustering the target technical features.

8. An electronic device comprising a memory, a processor, and a computer program stored in the memory and running on the processor, wherein when the processor executes the computer program, the electrochemical energy storage technology identification method based on patent text analysis as described in any one of claims 1 to 6 is implemented.

9. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, the electrochemical energy storage technology identification method based on patent text analysis as described in any one of claims 1 to 6 is implemented.

10. A computer program product, comprising a computer program, characterized in that when the computer program is executed by a processor, the electrochemical energy storage technology identification method based on patent text analysis as claimed in any one of claims 1 to 6 is implemented.