CN119739866A - A method and device for identifying electrochemical energy storage technology based on patent text analysis - Google Patents
- Publication number
- CN119739866A (application CN202510229385.7A)
- Authority
- CN
- China
- Prior art keywords
- technical
- energy storage
- electrochemical energy
- patent text
- storage technology
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a method and device for identifying electrochemical energy storage technologies based on patent text analysis, and relates to the technical field of data processing. The method comprises: obtaining a patent text database containing patent texts related to electrochemical energy storage; obtaining the topic distribution of each patent text through a Dirichlet allocation topic model, and taking the topics contained in each patent text as technical features of electrochemical energy storage; obtaining the number of applications of each patent text across a plurality of patent offices, determining the average marginal effect of each technical feature based on the topic distributions and the numbers of applications, and determining target technical features based on the average marginal effect; and clustering the target technical features to obtain an electrochemical energy storage technology identification result, which comprises the technical topics obtained by the clustering. The method can improve the objectivity and timeliness of the identification result.
Description
Technical Field
The invention relates to the technical field of data processing, in particular to an electrochemical energy storage technology identification method and device based on patent text analysis.
Background
Although new energy sources such as photovoltaic and wind power can reduce carbon emissions, their intermittency and instability pose challenges to the electric power system. Electrochemical energy storage can store and release energy efficiently and thereby helps keep the power system operating stably. Despite its rapid development, electrochemical energy storage still faces uncertainty in technical route decisions. An effective identification method is therefore needed to screen the many electrochemical energy storage technologies and to give enterprises and research institutions a sound basis for decisions.
In the prior art, electrochemical energy storage technologies are identified mainly by qualitative analysis, which relies on the experience and knowledge of experts and evaluates key technologies with tools such as the Delphi method, the analytic hierarchy process, and scenario analysis. A team of domain experts is first assembled and an evaluation index system is designed; expert opinions are then collected through questionnaires or seminars; finally, the expert scores are aggregated to reach a conclusion. Because the result is bounded by the knowledge and experience of the experts, who often struggle to follow the latest technical developments promptly and comprehensively when technology iterates rapidly, the objectivity and timeliness of the evaluation result are poor.
Disclosure of Invention
The invention provides a method and device for identifying electrochemical energy storage technologies based on patent text analysis, which address the poor objectivity and timeliness of identification results obtained by qualitative analysis in the prior art, and improve the objectivity and timeliness of the electrochemical energy storage technology identification result.
The invention provides an electrochemical energy storage technology identification method based on patent text analysis, which comprises the following steps:
obtaining a patent text database, wherein the patent text database comprises patent texts related to electrochemical energy storage;
obtaining the topic distribution corresponding to each patent text through a Dirichlet allocation topic model, and taking the topics contained in each patent text as technical features of electrochemical energy storage;
acquiring the number of applications of each patent text across a plurality of patent offices, determining the average marginal effect of each technical feature based on the topic distribution and the number of applications of the patent texts, and determining target technical features based on the average marginal effect;
clustering the target technical features to obtain an electrochemical energy storage technology identification result, wherein the identification result comprises the technical topics obtained by clustering the target technical features.
According to the electrochemical energy storage technology identification method based on patent text analysis provided by the invention, obtaining the topic distribution corresponding to the patent text through the Dirichlet allocation topic model comprises:
sampling a tag distribution for each patent text through the Dirichlet allocation topic model, wherein the tags are Derwent manual codes;
sampling a topic distribution for each tag through the Dirichlet allocation topic model, to obtain the topic distribution corresponding to each tag of the patent text;
sampling a topic-word distribution for each topic through the Dirichlet allocation topic model, to obtain the topic-word distribution corresponding to each topic of the patent text.
According to the electrochemical energy storage technology identification method based on patent text analysis provided by the invention, the average marginal effect of each technical feature is determined based on the topic distribution of the patent text and the application quantity, and the method comprises the following steps:
constructing a fitting relation between the technical characteristics and the application quantity in the patent text through an ordered logistic regression model;
Calculating the average marginal effect of each technical feature in each patent text based on the fitting relation.
According to the electrochemical energy storage technology identification method based on patent text analysis provided by the invention, determining the target technical features based on the average marginal effect comprises:
selecting, as the target technical features, the technical features whose average marginal effect in the category of patent texts with the lowest number of applications is below a preset threshold.
According to the electrochemical energy storage technology identification method based on patent text analysis provided by the invention, the target technical characteristics are clustered to obtain an electrochemical energy storage technology identification result, and the method comprises the following steps:
Obtaining a similarity matrix corresponding to the target technical features, wherein the similarity matrix comprises the similarity among the target technical features;
determining a clustering result from the similarity matrix by an affinity propagation algorithm, and taking each cluster in the clustering result as a technical topic;
And determining the recognition result of the electrochemical energy storage technology based on the technical subject.
According to the electrochemical energy storage technology identification method based on patent text analysis provided by the invention, after the electrochemical energy storage technology identification result is determined based on the technical topics, the method further comprises:
performing technical life cycle analysis on the technical features included in each technical topic to obtain the technical maturity of each technical feature in the technical topic;
generating an evaluation result for each technical topic in the electrochemical energy storage technology identification result based on the technical features included in the technical topic and the average technical maturity of those technical features.
The invention also provides an electrochemical energy storage technology identification device based on patent text analysis, which comprises:
The patent text acquisition module is used for acquiring a patent text database, wherein the patent text database comprises patent texts related to electrochemical energy storage;
The text topic mining module is used for obtaining topic distribution corresponding to the patent texts through a Dirichlet distribution topic model, and taking the topics contained in each patent text as technical characteristics of electrochemical energy storage;
The technical feature extraction module is used for acquiring the application quantity of the patent text in a plurality of patent institutions, determining the average marginal effect of each technical feature based on the subject distribution of the patent text and the application quantity, and determining the target technical feature based on the average marginal effect;
The technical characteristic clustering module is used for clustering the target technical characteristics to obtain an electrochemical energy storage technology identification result, wherein the electrochemical energy storage technology identification result comprises a technical theme obtained by clustering the target technical characteristics.
The invention also provides electronic equipment, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the electrochemical energy storage technology identification method based on the patent text analysis when executing the computer program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements an electrochemical energy storage technology identification method based on patent text analysis as described in any of the above.
The invention also provides a computer program product comprising a computer program which when executed by a processor implements the electrochemical energy storage technology identification method based on patent text analysis as described in any one of the above.
In the electrochemical energy storage technology identification method and device based on patent text analysis provided by the invention, a patent text database comprising patent texts related to electrochemical energy storage is obtained; the topic distribution corresponding to each patent text is obtained through a Dirichlet allocation topic model, and the topics contained in each patent text are taken as technical features of electrochemical energy storage; the number of applications of each patent text across a plurality of patent offices is obtained, the average marginal effect of each technical feature is determined based on the topic distributions and the numbers of applications, and target technical features are determined based on the average marginal effect; and the target technical features are clustered to obtain an electrochemical energy storage technology identification result comprising the technical topics obtained by the clustering.
In this way, patent texts related to electrochemical energy storage are analyzed: the Dirichlet allocation topic model mines the technical information in the patent texts and extracts technical features; to address the inconsistent levels of granularity among the extracted features, target technical features are selected on the basis of patent family size, which removes features that are overly broad or of low value; finally, the selected target technical features are clustered into technical topics. Because patent applications are forward-looking and the identification rests on the analysis of a large number of patent texts, the objectivity and timeliness of the identification result can be improved, giving decision makers accurate reference information.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of an electrochemical energy storage technology identification method based on patent text analysis.
Fig. 2 is a probability diagram of a dirichlet allocation topic model in the electrochemical energy storage technology identification method based on patent text analysis provided by the invention.
Fig. 3 is a schematic structural diagram of an electrochemical energy storage technology recognition device based on patent text analysis.
Fig. 4 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The electrochemical energy storage technology identification method based on patent text analysis provided by the invention is described below with reference to fig. 1-2. As shown in fig. 1, the electrochemical energy storage technology identification method based on patent text analysis comprises the following steps:
S110, acquiring a patent text database, wherein the patent text database comprises patent texts related to electrochemical energy storage;
S120, obtaining topic distribution corresponding to the patent texts through a dirichlet allocation topic model, and taking topics contained in each patent text as technical characteristics of electrochemical energy storage;
S130, acquiring the number of applications of each patent text across a plurality of patent offices, determining the average marginal effect of each technical feature based on the topic distribution and the number of applications of the patent texts, and determining target technical features based on the average marginal effect;
S140, clustering the target technical characteristics to obtain an electrochemical energy storage technology identification result, wherein the electrochemical energy storage technology identification result comprises a technical theme obtained by clustering the target technical characteristics.
In the electrochemical energy storage technology identification method based on patent text analysis, patent texts related to electrochemical energy storage are analyzed: the Dirichlet allocation topic model mines the technical information in the patent texts and extracts technical features; to address the inconsistent levels of granularity among the extracted features, target technical features are selected on the basis of patent family size, which removes features that are overly broad or of low value; finally, the selected target technical features are clustered into technical topics. Because patent applications are forward-looking and the identification rests on the analysis of a large number of patent texts, the objectivity and timeliness of the identification result can be improved, giving decision makers accurate reference information.
The patent text database used in the method provided by the invention is built by searching for and collecting patents related to electrochemical energy storage in a patent database. In one possible implementation, such patents are retrieved from the Derwent patent database (Derwent Innovations Index). The Derwent database is a worldwide source of patent and technical information and reflects international technology developments comprehensively. It provides detailed classification information, including patent families, Derwent class codes, and Derwent manual codes, which are finer-grained than conventional patent classification systems such as the International Patent Classification (IPC). Patent text data related to electrochemical energy storage is collected with the search expression DC=(X16), restricted to patents from the 21st century. The data is then preprocessed and organized by field (patent title, abstract, Derwent manual codes, application details, patent family information, and so on) to form a high-quality patent text database.
Technical information is then mined from the patent texts in the patent text database through the Dirichlet allocation topic model. In the method provided by the invention, Derwent manual codes are embedded into the Dirichlet allocation topic model to constrain the model output, so that topics directed by the tag information are obtained. Specifically, obtaining the topic distribution corresponding to the patent text through the Dirichlet allocation topic model comprises:
sampling a tag distribution for each patent text through the Dirichlet allocation topic model, wherein the tags are Derwent manual codes;
sampling a topic distribution for each tag through the Dirichlet allocation topic model, to obtain the topic distribution corresponding to each tag of the patent text;
sampling a topic-word distribution for each topic through the Dirichlet allocation topic model, to obtain the topic-word distribution corresponding to each topic of the patent text.
In the method provided by the invention, a tag dimension is introduced into the document-topic-word hierarchy of the Dirichlet allocation topic model, forming a four-level document-tag-topic-word structure.
The four-level document-tag-topic-word structure is generated as follows:
(1) For each patent text $d$, sample the tag distribution $\psi_d$, where $\psi_d$ is a vector of length $L$ whose $l$-th element is $\psi_{d,l}$;
(2) For each tag $l$, sample the topic distribution $\theta_{d,l} \sim \mathrm{Dir}(\alpha)$;
(3) For each topic $k$ of tag $l$, sample the topic-word distribution $\phi_{l,k} \sim \mathrm{Dir}(\eta)$;
(4) For each word $w_{d,n}$ of each document $d$, first sample a tag $l$ from the tag distribution $\psi_d$, then sample a latent topic $z$ from the topic distribution $\theta_{d,l}$, and finally sample the word $w_{d,n}$ from the topic-word distribution $\phi_{l,z}$.
As shown in FIG. 2, the shaded nodes represent values that can be observed directly. In the method provided by the invention, the joint document-topic distribution $\theta$ and the topic-word distribution $\phi$ are estimated by Markov chain Monte Carlo sampling.
In the above process and in FIG. 2, $D$ denotes the number of patent texts in the patent text database and $d$ one of the patent texts; $N_d$ denotes the number of words in patent text $d$; $L$ denotes the number of all tags in the patent text database and $l$ one of the tags; $\Lambda_d$ denotes the set of tags owned by document $d$; $\psi_d$ denotes the tag distribution of document $d$; $K_l$ denotes the number of latent topics of tag $l$; $\theta_{d,l}$ denotes the topic distribution of tag $l$ in document $d$; $\phi_{l,k}$ denotes the topic-word distribution of the $k$-th topic of tag $l$; $\alpha$ is the hyperparameter of the Dirichlet prior on the topic distribution $\theta$; $\eta$ is the hyperparameter of the Dirichlet prior on the topic-word distribution $\phi$; $w$ denotes an observed word; and $z$ denotes the latent topic of each observed word $w$.
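The generative process above can be illustrated with a small simulation. This is a minimal sketch rather than the patented estimator: it only draws synthetic words for one document, it assumes the tag distribution is uniform over the document's own tags, and the function name, the symbol names (`psi`, `theta`, `phi`, `alpha`, `eta`) and the example tag strings are illustrative assumptions. In a full model the topic-word distributions would be shared across documents and all distributions would be inferred by sampling, which is not shown here.

```python
import numpy as np

def generate_document(doc_tags, n_words, topics_per_tag, vocab_size,
                      alpha=0.5, eta=0.5, seed=0):
    """Simulate the document-tag-topic-word process for one document.

    Returns a list of (tag, topic index, word index) triples.
    """
    rng = np.random.default_rng(seed)
    # (1) Tag distribution: ASSUMED uniform over the document's own tags.
    psi = np.full(len(doc_tags), 1.0 / len(doc_tags))
    # (2) One topic distribution per tag, drawn from a Dirichlet prior.
    theta = {l: rng.dirichlet(np.full(topics_per_tag, alpha)) for l in doc_tags}
    # (3) One word distribution per (tag, topic), drawn from a Dirichlet prior.
    phi = {(l, k): rng.dirichlet(np.full(vocab_size, eta))
           for l in doc_tags for k in range(topics_per_tag)}
    words = []
    for _ in range(n_words):
        # (4) Sample tag, then latent topic, then word.
        l = doc_tags[rng.choice(len(doc_tags), p=psi)]
        z = rng.choice(topics_per_tag, p=theta[l])
        w = rng.choice(vocab_size, p=phi[(l, z)])
        words.append((l, int(z), int(w)))
    return words

# Illustrative tag strings only; they stand in for Derwent manual codes.
sample = generate_document(["X16-B01", "X16-E01"], n_words=20,
                           topics_per_tag=3, vocab_size=50)
```

Because every word's topic is drawn only from the topics of a tag the document actually carries, the mined topics stay inside the technical scope defined by the tags.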
In the method provided by the invention, the tag information restricts each patent text to topics associated with its own tags, with a patent classification system serving as the tag information. Unlike conventional unsupervised text mining, this confines the topics mined from the patent texts to a specific technical scope, so that the output topics carry the technical-field information of the patent classification tags and, at a finer granularity than the classification tags themselves, reveal technical content that fits the subdivided field of the data (electrochemical energy storage) more closely. The patent classification information supplies a macroscopic technical-field framework that bounds the analysis and reduces noise, while the textual semantic information uncovers fine-grained technical features in the patents and captures deep semantic associations. Together they enable multi-level technical feature extraction from the macroscopic to the microscopic level, which improves the accuracy of technical information mining, strengthens the ability to identify cross-field technologies, and suits a complex technical ecosystem.
The topics output by the model are taken as technical features, and the specific technical information of each feature is interpreted from its topic-word distribution. The method provided by the invention also represents each patent text as a document-topic distribution, that is, the proportion of each technical feature in the document, which yields a technical feature vector for each patent. The sum of a topic's proportions over all documents is called the popularity of the topic.
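As a small illustration of the feature vectors and the popularity measure just described, the sketch below assumes the document-topic proportions are already available as a matrix with one row per patent text. The function name and the example numbers are hypothetical.

```python
import numpy as np

def topic_popularity(doc_topic):
    """Sum each topic's proportion over all documents.

    doc_topic: array of shape (n_documents, n_topics); each row is the
    technical feature vector of one patent text and must sum to 1.
    """
    doc_topic = np.asarray(doc_topic, dtype=float)
    if not np.allclose(doc_topic.sum(axis=1), 1.0):
        raise ValueError("each document-topic row must sum to 1")
    return doc_topic.sum(axis=0)

# Two patent texts, three technical features (made-up proportions).
popularity = topic_popularity([[0.7, 0.2, 0.1],
                               [0.1, 0.8, 0.1]])
```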
In general, a patent filed with a plurality of patent offices has higher technical value. In the method provided by the invention, obtaining the number of applications of each patent text across a plurality of patent offices and determining the average marginal effect of each technical feature based on the topic distributions and the numbers of applications specifically comprises:
Constructing a fitting relation between technical features and the number of applications in a patent text through an ordered logistic regression model;
and calculating the average marginal effect of each technical feature in each patent text based on the fitting relation.
Specifically, the applications filed with the major patent offices that account for a large share of the patent applications in the patent text database can be counted, for example the China National Intellectual Property Administration, the European Patent Office, the Japan Patent Office, the Korean Intellectual Property Office, and the United States Patent and Trademark Office. Taking these 5 patent offices as an example, the number of offices with which each patent in the database has been filed is counted, giving an integer from 0 to 5: 0 if the patent was filed with none of them, 1 if with one, 2 if with two, and so on. The integer is then encoded into classes, for example 0 and 1 as class 0, 2 as class 1, 3 as class 2, and 4 and 5 as class 3. The patent technical value measure index is thus a four-class ordinal variable: the higher its value, the more patent offices the patent has been filed with and the higher its technical value.
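The example encoding above, from an office count between 0 and 5 to a four-class ordinal index, can be written as a short function. The function name is an assumption for illustration; the class boundaries are the ones given in the example.

```python
def encode_value_class(n_offices: int) -> int:
    """Map the number of patent offices (0 to 5) to the ordinal class 0 to 3."""
    if not 0 <= n_offices <= 5:
        raise ValueError("n_offices must be between 0 and 5")
    if n_offices <= 1:
        return 0  # filed with none or one of the five offices
    if n_offices == 2:
        return 1
    if n_offices == 3:
        return 2
    return 3      # filed with four or five offices

classes = [encode_value_class(n) for n in range(6)]
```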
Based on the topic distribution of each patent text and its number of applications, a fitting relation between the technical features in the patent text and the number of applications is constructed. The fitting relation may be constructed by an ordered logistic regression (Ordinal Logistic Regression, OLR) model, that is, application number index = OLR(patent technical feature vector). The relation between the technical feature distribution of a patent text and its number of applications can be expressed as:
$$\ln\frac{P(Y \le j)}{1 - P(Y \le j)} = \tau_j - \sum_{i=1}^{M} \beta_i x_i;$$
where $\tau_j$ denotes the threshold separating application number category $j$, $x_i$ denotes the $i$-th component of the patent technical feature vector, $M$ is the total number of technical features, $\beta_i$ denotes the regression coefficient of the technical feature corresponding to $x_i$, $Y$ denotes the patent technical value measure index, and $P$ denotes probability.
The average marginal effect (AME) of each technical feature can be calculated from the fitted ordered logistic regression model as follows:
$$ME_k^{(n)}(j) = \frac{\partial P(Y = j \mid x^{(n)})}{\partial x_k};$$
$$AME_k(j) = \frac{1}{N}\sum_{n=1}^{N} ME_k^{(n)}(j);$$
where $N$ is the total number of patent texts, $ME_k^{(n)}(j)$ is the marginal effect calculated on the $n$-th patent text $x^{(n)}$, and $AME_k(j)$ is the average marginal effect of the $k$-th technical feature.
The marginal effect of a technical feature calculated on a given patent text reflects how strongly that feature influences the number of applications of the patent text, and hence its technical value; averaging the marginal effects of the feature over all patent texts gives its average marginal effect.
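The calculation can be sketched numerically once the regression has been fitted. The sketch below assumes the common proportional-odds parameterization, in which the cumulative probability of class j or lower is the logistic function of (threshold j minus the linear predictor); that sign convention, the function names and the parameter values are assumptions for illustration, and in practice the coefficients and thresholds would come from a fitted ordinal regression.

```python
import numpy as np

def _sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def category_probs(X, beta, tau):
    """P(Y = j | x) for every row of X; tau holds the J-1 ordered thresholds."""
    lin = X @ beta
    cdf = _sigmoid(tau[None, :] - lin[:, None])            # P(Y <= j), j < J-1
    pad0, pad1 = np.zeros((len(X), 1)), np.ones((len(X), 1))
    return np.diff(np.hstack([pad0, cdf, pad1]), axis=1)   # shape (N, J)

def average_marginal_effects(X, beta, tau):
    """AME of every feature on every class probability; shape (J, M)."""
    lin = X @ beta
    cdf = _sigmoid(tau[None, :] - lin[:, None])
    pdf = cdf * (1.0 - cdf)                                # logistic density
    pad = np.zeros((len(X), 1))
    dens_diff = np.diff(np.hstack([pad, pdf, pad]), axis=1)
    # dP(Y=j)/dx_k = -beta_k * (f(tau_j - xb) - f(tau_{j-1} - xb))
    me = -dens_diff[:, :, None] * beta[None, None, :]      # shape (N, J, M)
    return me.mean(axis=0)                                 # average over patents

# Hypothetical data: 50 patents, 3 technical features, 4 ordinal classes.
rng = np.random.default_rng(0)
X = rng.dirichlet(np.ones(3), size=50)
beta = np.array([1.5, -0.5, 0.0])
tau = np.array([-0.5, 0.5, 1.5])
ame = average_marginal_effects(X, beta, tau)
```

Under this parameterization a feature with a positive coefficient has a negative average marginal effect on class 0, and the effects of any feature sum to zero across the classes because the class probabilities always sum to one.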
Determining the target technical features based on the average marginal effect comprises:
selecting, as the target technical features, the technical features whose average marginal effect in the category of patent texts with the lowest number of applications is below a preset threshold.
Specifically, the average marginal effect for category 0 (the category representing the lowest number of applications) may be used as the basis for selecting target technical features, that is, the change in the probability that a patent belongs to category 0 (an application count of 0 or 1) when the proportion of the technical feature increases by one unit. When this value is negative for a feature, a higher proportion of the feature in a patent lowers the probability that the patent belongs to category 0 and thus raises the probability that it has high technical value. Therefore, a negative average marginal effect in the category representing the lowest technical value indicates that the technical feature is positively correlated with the technical value of a patent, and the larger the absolute value, the stronger the influence. Finally, all technical features whose average marginal effect in the category representing the lowest technical value is less than 0 (at the 0.05 significance level) are selected as the key technical features (the target technical features) of electrochemical energy storage, and the absolute values of their average marginal effects serve as the numerical basis for the subsequent calculation of technical value.
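The selection rule can be sketched as a simple filter. The sketch assumes that the class-0 average marginal effects and their p-values have already been computed elsewhere; the function name and the example numbers are hypothetical.

```python
import numpy as np

def select_target_features(ame_class0, p_values, alpha=0.05):
    """Keep features with a negative, significant class-0 AME.

    Returns {feature index: |AME|}, the absolute value being the weight
    used later as the numerical basis for technical value.
    """
    ame_class0 = np.asarray(ame_class0, dtype=float)
    p_values = np.asarray(p_values, dtype=float)
    keep = (ame_class0 < 0) & (p_values < alpha)
    return {int(i): float(abs(ame_class0[i])) for i in np.flatnonzero(keep)}

# Feature 1 is positive and feature 2 is not significant, so both are dropped.
targets = select_target_features([-0.2, 0.1, -0.05, -0.3],
                                 [0.01, 0.001, 0.20, 0.04])
```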
The technical value of a patent is quantified by a newly constructed technical value measure index, and a model of the influence of technical features on technical value is built with ordered logistic regression. By combining the technical value measure with statistical modeling, the method provided by the invention can identify the key technical features that strongly influence technical value and provides a scientific basis for screening key technologies.
The target technical features selected through the above steps have clear meanings and point accurately to technical content of high technical value. To support technology management decisions, the method provided by the invention further condenses the target technical features into more representative mesoscopic technical topics, so as to distill the key electrochemical energy storage technologies.
Clustering the target technical characteristics to obtain an electrochemical energy storage technology identification result, including:
obtaining a similarity matrix corresponding to the target technical features, wherein the similarity matrix comprises the similarity among the target technical features;
determining a clustering result from the similarity matrix by an affinity propagation algorithm, and taking each cluster in the clustering result as a technical topic;
and determining an electrochemical energy storage technology identification result based on the technical subject.
First, an $M \times M$ similarity matrix $S = [s_{ij}]$ is constructed from the $M$ selected target technical features, where $T_i$ is the set of the top 15 topic keywords of the $i$-th target technical feature and each element $s_{ij}$ is the Jaccard coefficient of technical features $i$ and $j$:
$$s_{ij} = \frac{|T_i \cap T_j|}{|T_i \cup T_j|};$$
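The similarity matrix can be built directly from the keyword sets. The following is a minimal sketch in which the function name and the example keywords are illustrative; in the method each set would hold the top 15 topic keywords of one target technical feature.

```python
import numpy as np

def jaccard_similarity_matrix(keyword_sets):
    """Pairwise Jaccard coefficients |A & B| / |A | B| between keyword sets."""
    m = len(keyword_sets)
    S = np.eye(m)  # a set is identical to itself, so the diagonal is 1
    for i in range(m):
        for j in range(i + 1, m):
            union = keyword_sets[i] | keyword_sets[j]
            inter = keyword_sets[i] & keyword_sets[j]
            S[i, j] = S[j, i] = len(inter) / len(union) if union else 0.0
    return S

S = jaccard_similarity_matrix([
    {"lithium", "anode", "silicon"},
    {"lithium", "anode", "graphite"},
    {"flow", "vanadium"},
])
```

The first two sets share two of four distinct keywords and therefore have similarity 0.5, while the third shares nothing with either.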
Then the technical features are clustered with the affinity propagation (AP) algorithm applied to the similarity matrix. Through an iterative message-passing mechanism, two kinds of messages, "responsibility" and "availability", are propagated between data points to determine the cluster center of each data point. The responsibility $r(i,k)$ indicates how well suited data point $k$ is to serve as the cluster center of data point $i$. The availability $a(i,k)$ indicates how appropriate it is for data point $i$ to choose data point $k$ as its cluster center. With $s(i,k)$ denoting the similarity between data points $i$ and $k$, the following updates are iterated until convergence:
$$r(i,k) \leftarrow s(i,k) - \max_{k' \ne k}\{a(i,k') + s(i,k')\};$$
$$a(i,k) \leftarrow \min\Big\{0,\ r(k,k) + \sum_{i' \notin \{i,k\}} \max\{0, r(i',k)\}\Big\}, \quad i \ne k;$$
$$a(k,k) \leftarrow \sum_{i' \ne k} \max\{0, r(i',k)\}.$$
For each data point $i$, the $k$ that maximizes the final $r(i,k) + a(i,k)$ is selected as its cluster center. The responsibility matrix and the availability matrix are updated iteratively through the messages exchanged between data points, and the number of clusters and the cluster centers are finally obtained adaptively from the two matrices, so that the technical features are clustered into mesoscale technical topics.
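The message-passing scheme can be sketched in a few lines of NumPy. This is a minimal sketch of standard affinity propagation, not necessarily the exact variant used in the invention: the damping factor, the fixed iteration count and the use of the median similarity as the self-preference are assumptions, and the function name and the toy similarity matrix are illustrative.

```python
import numpy as np

def affinity_propagation(S, preference=None, damping=0.5, max_iter=200):
    """Cluster points from a similarity matrix; returns (labels, exemplars)."""
    S = np.array(S, dtype=float)
    n = S.shape[0]
    if preference is None:  # ASSUMPTION: median off-diagonal similarity
        preference = np.median(S[~np.eye(n, dtype=bool)])
    np.fill_diagonal(S, preference)
    R = np.zeros((n, n))    # responsibilities r(i, k)
    A = np.zeros((n, n))    # availabilities a(i, k)
    rows = np.arange(n)
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max over k' != k of (a(i,k') + s(i,k'))
        AS = A + S
        best = AS.argmax(axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new
        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k), i' not in {i,k})
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new.diagonal().copy()   # a(k,k) is not clipped at 0
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, diag)
        A = damping * A + (1 - damping) * A_new
    labels = (A + R).argmax(axis=1)      # exemplar chosen by each point
    return labels, np.unique(labels)

# Toy matrix: two groups of three features, points 0 and 3 most central.
S_demo = np.array([[1.0, 0.9, 0.9, 0.0, 0.0, 0.0],
                   [0.9, 1.0, 0.5, 0.0, 0.0, 0.0],
                   [0.9, 0.5, 1.0, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 1.0, 0.9, 0.9],
                   [0.0, 0.0, 0.0, 0.9, 1.0, 0.5],
                   [0.0, 0.0, 0.0, 0.9, 0.5, 1.0]])
labels, exemplars = affinity_propagation(S_demo)
```

The number of clusters is not passed in; it follows from the similarities and the self-preference. A library implementation such as scikit-learn's `AffinityPropagation` with `affinity="precomputed"` would be a reasonable alternative to this hand-written loop.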
In patent topic clustering, determining the number of topics and ensuring clustering robustness have long been difficult. With affinity propagation clustering, the method provided by the invention does not need the number of clusters to be specified in advance: all data points are treated as potential cluster centers and clusters emerge through message passing, which avoids the subjectivity of choosing the number of clusters as a hyperparameter. In addition, affinity propagation does not require a Euclidean distance, so it remains robust with a non-Euclidean measure such as the Jaccard coefficient.
A similarity matrix is constructed with the Jaccard coefficient of the technical features' topic keyword sets as the similarity measure, the number of clusters and the cluster centers are determined adaptively by the affinity propagation algorithm, and the technical features are organized into technical topics that are suitably concrete and offer strong strategic guidance. Because the method does not need a preset number of clusters, it avoids that source of subjectivity, is more robust on non-Euclidean data, and adapts better to the complexity and diversity of technical features. The more representative mesoscopic technical topics extracted from the clustering result improve the interpretability and practicality of decision support.
And finally obtaining the key technical theme through the clustering operation. And for each key technical theme, combining the specific content of the top 15 theme keywords with the highest content in the theme, and identifying the electrochemical energy storage key technology specifically represented by each technical theme.
In one possible implementation, the technical maturity of each technical topic may further be obtained, and each technical topic is evaluated by combining its technical maturity with the number of applications corresponding to it (which reflects the technical value of the technical features), so as to provide more reference information for technical decision makers. That is, after the electrochemical energy storage technology identification result is determined based on the technical topics, the method further comprises:
Technical life cycle analysis is carried out on the technical characteristics included in the technical theme, so that the technical maturity of each technical characteristic in the technical theme is obtained;
and generating an evaluation result of each technical topic in the electrochemical energy storage technology identification result based on the average value of the technical maturity of each technical feature included in the technical topic and each technical feature included in the technical topic.
Specifically, in the method provided by the invention, the identified target technical features of electrochemical energy storage are modeled in the technical life cycle analysis by a Gompertz growth S-curve, expressed as:
$$y(t) = K \exp\left(-a\, e^{-bt}\right)$$
where $K$ is the growth saturation value, $a$ and $b$ are parameters controlling the growth rate, $t$ is the year, and $y(t)$ represents the popularity of the target technical feature. The S-curve is fitted to the popularity growth data of all target technical features, the parameters are estimated, and $y(t_{\text{last}})/K$ (i.e., the ratio of the value in the last year to the predicted limiting maximum) is taken as the technical maturity of the technical feature. Finally, each technical topic uses the average maturity of all technical features in its cluster as the quantitative index of its technical maturity:
$$TM_{\text{topic}} = \frac{1}{m}\sum_{j=1}^{m} TM_j$$
where $TM_j$ is the technical maturity of the $j$-th technical feature and $m$ is the number of technical features included under the technical topic.
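The maturity calculation can be sketched as follows, assuming the common Gompertz parameterization $y(t) = K \exp(-a e^{-bt})$ and assuming the parameters have already been estimated by some curve-fitting step that is not shown. All parameter values and function names below are hypothetical illustrations, not figures from the patent.

```python
import math
from typing import List


def gompertz(t: float, K: float, a: float, b: float) -> float:
    """Gompertz curve y(t) = K * exp(-a * exp(-b * t)) (assumed parameterization)."""
    return K * math.exp(-a * math.exp(-b * t))


def feature_maturity(t_last: float, K: float, a: float, b: float) -> float:
    """Ratio of the fitted value in the last observed year to the saturation value K."""
    return gompertz(t_last, K, a, b) / K


def topic_maturity(maturities: List[float]) -> float:
    """Mean maturity of the m technical features clustered under one topic."""
    return sum(maturities) / len(maturities)


# Hypothetical fitted parameters (K, a, b) for three features in one topic,
# with t measured in years since the first observation and t_last = 10.
fitted = [(120.0, 5.0, 0.30), (80.0, 4.0, 0.15), (200.0, 6.0, 0.45)]
maturities = [feature_maturity(10.0, K, a, b) for K, a, b in fitted]
topic_score = topic_maturity(maturities)
```

Under this parameterization the ratio reduces to `exp(-a * exp(-b * t_last))`, so it always lies strictly between 0 and 1 for positive `a` and approaches 1 as the feature nears saturation.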
The method gauges the application breadth and development depth of a technical feature by measuring the technical value and the technical maturity of each key technical feature. The technical values and technical maturities of the technical features belonging to the same topic are averaged, and the resulting means are taken as the technical value and technical maturity of the identified electrochemical energy storage key technology. The technical value and technical maturity indices of each technology are normalized and used as the vertical and horizontal axes, respectively, to construct a key technology evaluation matrix. The development status of each key technology is then classified qualitatively by high versus low technical value (with a threshold of 0.15, approximately the midpoint of the technical value distribution range) and by early, middle, and late stages of technical development (with technical maturity thresholds of 0.25 and 0.75). Finally, the electrochemical energy storage key technologies with high technical value in the early to middle stages are obtained.
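The evaluation matrix can be illustrated with a short sketch. The thresholds 0.15, 0.25, and 0.75 come from the text; everything else is an assumption made for illustration: min-max scaling as the normalization, applying the thresholds to the normalized scores, the handling of values that fall exactly on a threshold, and the topic names and raw scores.

```python
from typing import List, Tuple


def min_max(values: List[float]) -> List[float]:
    """Scale values to [0, 1]; returns all zeros when every value is equal."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def classify(value: float, maturity: float, value_cut: float = 0.15,
             early_cut: float = 0.25, late_cut: float = 0.75) -> Tuple[str, str]:
    """Place one topic in the evaluation matrix: (value level, development stage)."""
    level = "high" if value >= value_cut else "low"
    if maturity < early_cut:
        stage = "early"
    elif maturity <= late_cut:
        stage = "middle"
    else:
        stage = "late"
    return level, stage


# Hypothetical per-topic averages before normalization.
names = ["topic_a", "topic_b", "topic_c"]
raw_value = [2.0, 10.0, 4.0]
raw_maturity = [0.1, 0.5, 0.9]

cells = [classify(v, m) for v, m in zip(min_max(raw_value), min_max(raw_maturity))]
# Key technologies: high technical value and not yet in the late stage.
key_topics = [n for n, (level, stage) in zip(names, cells)
              if level == "high" and stage in ("early", "middle")]
```

Keeping the cut-offs as function parameters makes it easy to test how sensitive the list of key technologies is to the chosen thresholds.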
The technology life cycle is modeled by a Gompertz growth curve, the maturity of each technical topic is quantified by the within-cluster mean, a key technology evaluation matrix is constructed from technical value and technical maturity, and the early-to-middle-stage key technologies with high technical value are finally identified. This reveals the patterns of different technologies in terms of development stage and market value and provides clear guidance for technology management.
The electrochemical energy storage technology recognition device based on patent text analysis provided by the invention is described below, and the electrochemical energy storage technology recognition device based on patent text analysis described below and the electrochemical energy storage technology recognition method based on patent text analysis described above can be correspondingly referred to each other. As shown in fig. 3, the electrochemical energy storage technology recognition device based on patent text analysis provided by the invention comprises:
A patent text acquisition module 310, configured to acquire a patent text database, where the patent text database includes patent text related to electrochemical energy storage;
A text topic mining module 320, configured to obtain the topic distribution corresponding to the patent texts through a latent Dirichlet allocation (LDA) topic model, and to take the topics included in each patent text as technical features of electrochemical energy storage;
The technical feature extraction module 330 is configured to obtain the number of applications of the patent text in the plurality of patent institutions, determine an average marginal effect of each technical feature based on the subject distribution and the number of applications of the patent text, and determine a target technical feature based on the average marginal effect;
The technical feature clustering module 340 is configured to cluster the target technical features to obtain an electrochemical energy storage technology identification result, where the electrochemical energy storage technology identification result includes a technical theme obtained by clustering the target technical features.
Fig. 4 illustrates a physical schematic diagram of an electronic device, as shown in fig. 4, which may include a processor (processor) 410, a communication interface (Communications Interface) 420, a memory (memory) 430, and a communication bus 440, where the processor 410, the communication interface 420, and the memory 430 perform communication with each other through the communication bus 440. The processor 410 may call logic instructions in the memory 430 to execute an electrochemical energy storage technology identification method based on patent text analysis, where the method includes obtaining a patent text database, where the patent text database includes patent texts related to electrochemical energy storage, obtaining a topic distribution corresponding to the patent texts through a dirichlet distribution topic model, taking topics included in each patent text as technical features of electrochemical energy storage, obtaining application numbers of the patent texts in a plurality of patent institutions, determining average marginal effects of each technical feature based on the topic distribution and the application numbers of the patent texts, determining target technical features based on the average marginal effects, and clustering the target technical features to obtain an electrochemical energy storage technology identification result, where the electrochemical energy storage technology identification result includes technical topics obtained by clustering the target technical features.
Further, the logic instructions in the memory 430 described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
On the other hand, the invention also provides a computer program product, which comprises a computer program, wherein the computer program can be stored on a non-transitory computer readable storage medium, and when the computer program is executed by a processor, the computer can execute the electrochemical energy storage technology identification method based on the patent text analysis provided by the methods, and the method comprises the steps of acquiring a patent text database, wherein the patent text database comprises patent texts related to electrochemical energy storage, acquiring topic distribution corresponding to the patent texts through a Dirichlet distribution topic model, taking topics included in each patent text as technical characteristics of electrochemical energy storage, acquiring the number of application of the patent texts in a plurality of patent institutions, determining the average marginal effect of each technical characteristic based on the topic distribution and the number of application of the patent texts, determining the target technical characteristics based on the average marginal effect, clustering the target technical characteristics, acquiring an electrochemical energy storage technology identification result, and acquiring technical topics obtained by clustering the target technical characteristics in the electrochemical energy storage technology identification result.
In still another aspect, the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the electrochemical energy storage technology identification method based on patent text analysis provided by the above methods. The method includes: obtaining a patent text database, where the patent text database includes patent texts related to electrochemical energy storage; obtaining the topic distribution corresponding to the patent texts through a Dirichlet distribution topic model, and taking the topics included in each patent text as technical features of electrochemical energy storage; obtaining the number of applications of the patent texts in a plurality of patent institutions, determining the average marginal effect of each technical feature based on the topic distribution and the number of applications of the patent texts, and determining target technical features based on the average marginal effect; and clustering the target technical features to obtain an electrochemical energy storage technology identification result, where the identification result includes the technical topics obtained by clustering the target technical features.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
It should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention, and not for limiting the same, and although the present invention has been described in detail with reference to the above-mentioned embodiments, it should be understood by those skilled in the art that the technical solution described in the above-mentioned embodiments may be modified or some technical features may be equivalently replaced, and these modifications or substitutions do not make the essence of the corresponding technical solution deviate from the spirit and scope of the technical solution of the embodiments of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202510229385.7A CN119739866B (en) | 2025-02-28 | 2025-02-28 | Electrochemical energy storage technology identification method and device based on patent text analysis |
Publications (2)
Publication Number | Publication Date |
---|---|
CN119739866A (en) | 2025-04-01 |
CN119739866B (en) | 2025-06-17 |
Family
ID=95134063
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202510229385.7A Active CN119739866B (en) | 2025-02-28 | 2025-02-28 | Electrochemical energy storage technology identification method and device based on patent text analysis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN119739866B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070073625A1 (en) * | 2005-09-27 | 2007-03-29 | Shelton Robert H | System and method of licensing intellectual property assets |
KR20200017575A (en) * | 2018-07-24 | 2020-02-19 | 배재대학교 산학협력단 | Similar patent search service system and method |
CN118193731A (en) * | 2024-02-21 | 2024-06-14 | 南方电网科学研究院有限责任公司 | Method and system for topic identification and clustering screening of scientific and technological texts based on SAO structure |
CN119357375A (en) * | 2024-12-20 | 2025-01-24 | 北京亦庄科技创新有限公司 | Method and system for generating relevant prompt words for science and technology text based on large model |
Non-Patent Citations (2)
Title |
---|
刘恬恬: "全球半导体设备制造上市公司动态效率的地区差异、收敛性及其影响因素分析", 《中国优秀硕士学位论文全文数据库(电子期刊)》, 30 November 2024 (2024-11-30), pages 135 - 14 * |
邱一卉;张驰雨;陈水宣;: "基于分类回归树算法的专利价值评估指标体系研究", 厦门大学学报(自然科学版), no. 02, pages 244 - 249 * |
Also Published As
Publication number | Publication date |
---|---|
CN119739866B (en) | 2025-06-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112836509B (en) | Expert system knowledge base construction method and system | |
Karrar | The effect of using data pre-processing by imputations in handling missing values | |
CN113590807B (en) | Scientific and technological enterprise credit evaluation method based on big data mining | |
CN112884570B (en) | Method, device and equipment for determining model security | |
CN118468061B (en) | Automatic algorithm matching and parameter optimizing method and system | |
CN108804577B (en) | Method for estimating interest degree of information tag | |
CN120068882B (en) | Method and system for analyzing and predicting the trend of scientific and technological literature | |
CN119128990A (en) | Dynamic data adaptive desensitization method and device based on artificial intelligence | |
CN110310012B (en) | Data analysis method, device, equipment and computer readable storage medium | |
CN117556118A (en) | Visual recommendation system and method based on scientific research big data prediction | |
CN119669872B (en) | An information-based accounting archive management method and system | |
Cao | Design and optimization of a decision support system for sports training based on data mining technology | |
CN120012004A (en) | Abnormal behavior identification method and system based on multidimensional data analysis | |
CN114780617B (en) | A technology list generation method and system based on multi-source data and topic model | |
CN120067327A (en) | System and method for calculating technology maturity based on graph convolution neural network | |
Chen et al. | Community Detection Based on DeepWalk Model in Large‐Scale Networks | |
CN119739866B (en) | Electrochemical energy storage technology identification method and device based on patent text analysis | |
US12387150B2 (en) | System and method for hierarchical factor-based forecasting | |
CN118211087A (en) | A method for generating user portraits of repeated offenders based on unbalanced data based on DBSCAN-cGAN-XGBoost model | |
Alshara | [Retracted] Multilayer Graph‐Based Deep Learning Approach for Stock Price Prediction | |
Mutasim et al. | Impute missing values in r language using ibk classification algorithm | |
Ma | The Research of Stock Predictive Model based on the Combination of CART and DBSCAN | |
CN114493899A (en) | Method and system for constructing classification prediction model of authenticable state | |
Chakraborty et al. | Data mining-based variant subset features | |
CN113516550A (en) | Training method, device and equipment for illegal funding risk prediction model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |