JP2007272361A

JP2007272361A - Plant control equipment

Info

Publication number: JP2007272361A
Application number: JP2006094762A
Authority: JP
Inventors: Hisahiro Kusumi; 尚弘楠見; Akihiko Yamada; 昭彦山田; Takaaki Sekiai; 孝明関合; Yoshiharu Hayashi; 喜治林; Masayuki Fukai; 雅之深井; Satoru Shimizu; 悟清水
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2006-03-30
Filing date: 2006-03-30
Publication date: 2007-10-18
Anticipated expiration: 2026-03-30
Also published as: WO2007116592A1; JP4741968B2

Abstract

To provide a plant control device for which a simulation model can be created easily.
The plant control device includes three main parts. A model 500 predicts the value of a measurement signal 1 obtained from a control target 100 when a predetermined operation signal 16 is applied to the control target 100. A learning unit 600 learns how to generate the model input 12 supplied to the model 500 so that the model output 13, which is the prediction result of the model 500, converges to a model output target value. An operation signal generation unit 300 generates an operation signal 15 according to the results of the learning unit 600, and the operation signal 15 is used as the operation signal 16. The device further includes an external input interface 210 that acquires the measurement signal 1 and a measurement signal database 230 that stores the values of the measurement signal 2. The learning unit 600 calculates the mean and variance of the measurement signals stored in the measurement signal database 230, and the operation signal generation unit 300 corrects the operation signal 15 using these mean and variance results.
[Selected drawing] Figure 1

Description

The present invention relates to a control device for a plant such as a power generation facility, and more particularly to a control device suitable for controlling boiler equipment.

In the control of a plant such as a thermal power generation facility, the measurement signals obtained from the plant being controlled are normally processed to calculate the operation signals to be applied to the control target. The control device therefore implements an algorithm that calculates the operation signals so that the plant measurement signals acquired from the control target achieve the operating target.

The so-called PI (proportional-integral) control algorithm has long been known as a control algorithm for plant control. The PI algorithm derives the operation signal by multiplying the deviation between the operating target value and the measurement signal by a proportional gain and adding to that product the time integral of the deviation. A learning algorithm may also be used to derive the plant operation signals.
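The PI law described above can be sketched as a discrete-time controller. This is a minimal illustration under stated assumptions, not the control logic of the patent: the integral gain `ki` (the text only says the integrated deviation is added), the sampling interval `dt`, and the hypothetical first-order plant used for the closed-loop check are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class DiscretePI:
    """Discrete-time PI controller: u = kp * e + ki * (integral of e dt)."""
    kp: float            # proportional gain
    ki: float            # integral gain (assumed; not specified in the text)
    dt: float            # sampling interval
    integral: float = 0.0

    def step(self, target: float, measurement: float) -> float:
        error = target - measurement        # deviation from the operating target value
        self.integral += error * self.dt    # rectangular (Euler) time integration
        return self.kp * error + self.ki * self.integral


def simulate(steps: int = 200) -> float:
    """Close the loop around a hypothetical first-order plant dy/dt = (-y + u) / tau."""
    pi = DiscretePI(kp=1.0, ki=0.5, dt=0.1)
    tau, y = 2.0, 0.0
    for _ in range(steps):
        u = pi.step(target=1.0, measurement=y)
        y += pi.dt * (-y + u) / tau
    return y
```

Because of the integral term, the simulated measurement settles at the target value of 1.0 with no steady-state offset.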

Control algorithms used for plant control contain several parameters, which must be tuned in advance to values suited to the control target. This tuning is generally performed against a simulation of the control target built with a physical model or a statistical model.

For statistical models in particular, a well-known approach is the so-called neural network. A neural network artificially mimics the structure of the human nervous system: neurons are represented by elements called nodes that apply linear or nonlinear functions, the nodes are arranged in layers, and signals propagate from each layer to the next.

A neural-network model is trained by supplying input signals together with the desired output signals as teacher signals; the parameters inside the model are adjusted until the model produces the desired outputs. When such a model is used to simulate a control target, the operation signals applied to the control target serve as the model's input signals and the measurement signals from the plant serve as the model's output signals.

The basic structure of neural networks and the methods for adjusting the parameters in the model are described, for example, by the back-propagation method and by back-propagation through time (BPTT), a learning method for neural networks that have a feedback mechanism (see, e.g., Non-Patent Document 1).
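The back-propagation method mentioned above can be illustrated with a one-hidden-layer network written in plain Python. This is a minimal sketch, assuming tanh hidden nodes, a linear output node, plain stochastic gradient descent, and hypothetical teacher signals in which a scalar operation signal `u` maps to a measured response `sin(u)`; none of these choices come from the patent.

```python
import math
import random


class TinyMLP:
    """One-hidden-layer network: tanh hidden nodes, linear output node."""

    def __init__(self, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]  # input -> hidden
        self.b1 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]  # hidden biases
        self.w2 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]  # hidden -> output
        self.b2 = 0.0                                                 # output bias

    def forward(self, x):
        """Propagate a scalar input layer by layer; return (output, hidden activations)."""
        h = [math.tanh(w * x + b) for w, b in zip(self.w1, self.b1)]
        y = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
        return y, h

    def train_step(self, x, target, lr):
        """One back-propagation (SGD) step on the loss 0.5 * (y - target)**2."""
        y, h = self.forward(x)
        err = y - target                                # dLoss/dy at the output node
        for j, hj in enumerate(h):
            # error signal at hidden node j, using d tanh(a)/da = 1 - tanh(a)**2
            delta = err * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] -= lr * err * hj
            self.w1[j] -= lr * delta * x
            self.b1[j] -= lr * delta
        self.b2 -= lr * err


def mean_loss(net, data):
    return sum(0.5 * (net.forward(x)[0] - t) ** 2 for x, t in data) / len(data)


# Hypothetical teacher signals: operation signal u -> measured response sin(u)
data = [(k / 10.0, math.sin(k / 10.0)) for k in range(-20, 21)]
net = TinyMLP()
loss_before = mean_loss(net, data)
for _ in range(500):
    for x, t in data:
        net.train_step(x, t, lr=0.05)
loss_after = mean_loss(net, data)
```

Repeatedly presenting the input/teacher pairs drives the parameters toward values that reproduce the desired outputs, which is the adjustment process described in the text.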

Unlike learning with teacher signals, as in neural networks, a technique called reinforcement learning has been actively studied in the field of learning without a teacher. Reinforcement learning is a learning-control framework that adapts to an environment through trial and error: the learner observes the state of the environment, takes an action, and receives a reward that depends on that action. The more correct the action is with respect to the environment, or the closer it brings the environment to its target, the larger the reward.

The learner therefore comes to select actions that earn more reward and, as a result, adapts toward actions that reach the target of the environment. The entity that selects actions in order to obtain more reward from the environment is generally called an agent. If the environment is regarded as the control target and the agent as the controller, then through trial-and-error interaction with the control target, the method of generating the operation signals applied to the control target is learned so that the measurement signals obtained from it become desirable. This is what is known as the learning-control framework.

In reinforcement learning, a scalar evaluation value computed from the signals obtained from the control target (what reinforcement learning calls the reward) serves as the cue for learning a method of generating operation signals that maximizes the expected value of the evaluation values obtained from the current state into the future. Known methods assign a positive evaluation value when the measurement signal achieves the operating target value and learn using algorithms such as Actor-Critic, Q-learning, or real-time dynamic programming (see, e.g., Non-Patent Document 2).
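Of the algorithms listed above, tabular Q-learning is the simplest to sketch. The example assumes a hypothetical toy plant whose measurement is discretized into 11 levels, an operation signal that can be lowered, held, or raised by one level, and an evaluation value of +1 only when the measurement reaches a hypothetical target level; the learning rate, discount factor, and exploration rate are illustrative choices.

```python
import random

N_STATES = 11          # hypothetical discretized measurement levels 0..10
TARGET = 7             # hypothetical operating target level
ACTIONS = (-1, 0, +1)  # decrease / hold / increase the operation signal


def plant(state, action):
    """Toy deterministic plant: the action shifts the measurement by one level."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == TARGET else 0.0   # positive evaluation value only at the target
    return nxt, reward


def greedy(row, rng):
    """Index of the best action, breaking ties at random."""
    best = max(row)
    return rng.choice([i for i, v in enumerate(row) if v == best])


def q_learning(episodes=3000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        s = rng.randrange(N_STATES)
        for _ in range(20):                  # fixed-length episode
            a = rng.randrange(len(ACTIONS)) if rng.random() < epsilon else greedy(q[s], rng)
            s2, r = plant(s, ACTIONS[a])
            # move Q(s, a) toward r + gamma * max_a' Q(s', a')
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q


def greedy_action(q, s):
    return ACTIONS[max(range(len(ACTIONS)), key=lambda i: q[s][i])]


q = q_learning()
```

After learning, the greedy policy raises the operation signal below the target, holds it at the target, and lowers it above the target.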

A framework called the Dyna architecture, which extends the above methods, is also known (see Non-Patent Document 1). In this framework, the control device holds a model that simulates the control target. The model takes the operation signal applied to the control target as its model input and calculates a model output, which is a predicted value of the measurement signal of the control target. The model is built using physical equations or statistical methods.

A method of generating model inputs is then learned, using as the cue the evaluation value calculated from this model output. In the Dyna architecture, a method of generating model inputs that achieves the model output target value is learned in advance, and the operation signal applied to the control target is determined according to the learning results.
Non-Patent Document 1: "Neural Networks and Measurement and Control" (ニューラルネットと計測制御), edited by Nishikawa (西川繕一) and Shinzo Kitamura (北村新三), Asakura Shoten, published January 25, 1995.
Non-Patent Document 2: "Reinforcement Learning" (強化学習), translated by Sadayoshi Mikami and Masaaki Minagawa, Morikita Publishing Co., Ltd., published December 20, 2000.
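The Dyna idea described above, in which a model of the control target is learned and then used to generate simulated experience for the learner, can be sketched as Dyna-Q. This is an illustration under assumptions, not the patent's implementation: the plant is the same hypothetical 11-level toy as in a plain Q-learning sketch, the learned model is a lookup table that assumes deterministic transitions, and the step counts and rates are illustrative.

```python
import random

N_STATES = 11          # hypothetical discretized measurement levels 0..10
TARGET = 7             # hypothetical operating target level
ACTIONS = (-1, 0, +1)  # decrease / hold / increase the operation signal


def real_plant(s, a):
    """Toy stand-in for the control target: the action shifts the measurement by one level."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == TARGET else 0.0)


def greedy(row, rng):
    """Index of the best action, breaking ties at random."""
    best = max(row)
    return rng.choice([i for i, v in enumerate(row) if v == best])


def dyna_q(real_steps=1000, planning_steps=20, alpha=0.5, gamma=0.9,
           epsilon=0.3, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    model = {}  # learned model: (state, action index) -> (next state, reward)

    def update(s, a, r, s2):
        q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])

    s = rng.randrange(N_STATES)
    for _ in range(real_steps):
        a = rng.randrange(len(ACTIONS)) if rng.random() < epsilon else greedy(q[s], rng)
        s2, r = real_plant(s, ACTIONS[a])   # one interaction with the real plant
        update(s, a, r, s2)                  # direct reinforcement-learning update
        model[(s, a)] = (s2, r)              # model learning (deterministic plant assumed)
        keys = list(model)
        for _ in range(planning_steps):      # planning: replay simulated experience
            ps, pa = rng.choice(keys)
            ps2, pr = model[(ps, pa)]
            update(ps, pa, pr, ps2)
        # occasionally restart from a random level so the whole range gets visited
        s = rng.randrange(N_STATES) if rng.random() < 0.05 else s2
    return q


def greedy_action(q, s):
    return ACTIONS[max(range(len(ACTIONS)), key=lambda i: q[s][i])]


q = dyna_q()
```

Most of the value updates come from the learned model rather than from the plant, which is what lets Dyna-style methods learn with comparatively few real interactions.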

As described above, designing a plant control device requires a model that adequately simulates the control target. Consider first simulating the control target with a physical model: improving accuracy requires a detailed physical model and numerical analysis. That numerical analysis requires generating a mesh (computational grid), and higher accuracy requires more mesh cells.

For example, analyzing the combustion in a large plant such as the boiler of a thermal power plant requires an enormous number of mesh cells, and a single calculation can take from several hours to several tens of days. Faster algorithms and parallel computation have long been used to shorten the calculation time, but continuously calculating a wide range of operating conditions remains impractical.

When the control target is instead simulated with a statistical model, the model fits the data used to build it well, but its accuracy drops markedly when different values are input. This phenomenon is generally called overfitting (over-learning). Conventional techniques require measures to avoid it and to build a model that generalizes, which makes them difficult to apply and limits their range of application.

The present invention was made in view of the above problems, and its object is to provide a plant control device for which a simulation model can be created easily.

This object is achieved by a plant control device that generates the operation signal needed for the value of the measurement signal obtained from a control target, when a predetermined operation signal is applied to it, to settle within the operating target value of the control target, and that uses this operation signal as the predetermined operation signal. The device has a model that predicts the value of the measurement signal obtained from the control target when the predetermined operation signal is applied; learning means that learns a method of generating the model input given to the model so that the model output, i.e., the model's prediction, converges to a model output target value; and operation signal generation means that generates the operation signal applied to the control target according to the results of the learning means, the generated operation signal being used as the predetermined operation signal. The device further includes an external input interface that acquires the measurement signal of the control target and a measurement signal database that stores the values of the measurement signals acquired through the interface. The mean and variance of the measurement signals stored in the measurement signal database are calculated, the operation signal is corrected using these results, and the predetermined operation signal is newly generated.
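The text does not say how the mean and variance of the stored measurement signals are computed. One common option, shown here as an assumption, is Welford's online algorithm, which updates both statistics as each new measurement is stored without rescanning the database. The dictionary keyed by signal name and operation setting, and the key names themselves, are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Welford's online mean/variance for one measurement signal."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def add(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)   # uses the updated mean

    @property
    def variance(self) -> float:
        """Unbiased sample variance; 0.0 until at least two samples are stored."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


# Hypothetical measurement database: one RunningStats per (signal name, operation setting)
measurement_db = defaultdict(RunningStats)
for nox in (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0):
    measurement_db[("NOx", "damper_50pct")].add(nox)
```

For these eight samples, the mean is 5.0 and the sample variance is 32/7.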

The object is also achieved by a plant control device that generates the operation signal needed for the value of the measurement signal obtained from a control target, when a predetermined operation signal is applied to it, to settle within the operating target value of the control target, and that uses this operation signal as the predetermined operation signal. The device has a model that predicts the value of the measurement signal obtained from the control target when the predetermined operation signal is applied; learning means that learns a method of generating the model input given to the model so that the model output, i.e., the model's prediction, converges to a model output target value; and operation signal generation means that generates the operation signal applied to the control target according to the results of the learning means, the generated operation signal being used as the predetermined operation signal. The device further includes an external input interface that acquires the measurement signal of the control target and a measurement signal database that stores the values of the measurement signals acquired through the interface. The mean and variance of the measurement signals stored in the measurement signal database are calculated, and when the operation signal is corrected to generate a new operation signal, the width of change of the operation signal is determined based on the variance of the measurement signal.
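The text states only that the width of change of the operation signal is determined from the variance of the measurement signal; it does not give the rule. The sketch below assumes one plausible rule: the larger the standard deviation, the smaller the allowed step, with a floor. The functions and the parameters `base_width`, `ref_std`, and `min_width` are all hypothetical.

```python
import math


def change_width(variance, base_width, ref_std, min_width):
    """Hypothetical rule: the larger the measurement's standard deviation, the smaller the allowed change."""
    std = math.sqrt(max(variance, 0.0))
    return max(min_width, base_width / (1.0 + std / ref_std))


def corrected_operation_signal(current, proposed, variance,
                               base_width=5.0, ref_std=1.0, min_width=0.5):
    """Move from the current operation signal toward the proposed one, clipped to the variance-dependent width."""
    width = change_width(variance, base_width, ref_std, min_width)
    step = max(-width, min(width, proposed - current))
    return current + step
```

With these placeholder parameters, a request to move from 50 to 60 is clipped to 55 when the variance is zero and to 51.25 when the variance is 9. A cautious rule like this reflects the idea that a large variance signals less reliable data.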

The device may also have a function that generates the operation signal using an expected value calculated from the mean and variance of the measurement signal. As an external input function, it may have a user interface for entering the distribution shape of the measurement signal into the control device, or a user interface for entering at least one of the mean, expected value, variance, and distribution shape of the measurement signal.
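The paragraph above leaves open which expected value is computed. One possibility, shown here as an assumption, is the expected evaluation value when a reward depends on whether the measurement stays below a limit. Given the mean, the variance, and a user-selected distribution shape (normal or uniform in this sketch), that expectation follows in closed form. The reward values and function names are hypothetical.

```python
import math


def prob_below(limit, mean, variance, shape="normal"):
    """P(measurement <= limit) for a distribution with the given mean, variance, and shape."""
    std = math.sqrt(variance)
    if std == 0.0:
        return 1.0 if mean <= limit else 0.0
    if shape == "normal":
        # normal CDF expressed with the error function
        return 0.5 * (1.0 + math.erf((limit - mean) / (std * math.sqrt(2.0))))
    if shape == "uniform":
        half = std * math.sqrt(3.0)   # uniform on [mean - half, mean + half] has this variance
        return min(1.0, max(0.0, (limit - (mean - half)) / (2.0 * half)))
    raise ValueError(f"unknown distribution shape: {shape}")


def expected_evaluation(limit, mean, variance, shape="normal",
                        reward_ok=1.0, reward_ng=-1.0):
    """Expected evaluation value: reward_ok below the limit, reward_ng above it (hypothetical values)."""
    p = prob_below(limit, mean, variance, shape)
    return p * reward_ok + (1.0 - p) * reward_ng
```

Two settings with the same mean can then be ranked differently: the one with the larger variance has a higher chance of exceeding the limit and a lower expected evaluation value.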

The object is also achieved when the control target is a thermal power plant and the device has: a function of taking at least one of the carbon monoxide and nitrogen oxide measurement signals of the plant into the control device; as an external input function, a function of setting at least one environmental regulation value for carbon monoxide or nitrogen oxides as a limit value of the measurement signal; and a function of generating, according to the learning results, an operation signal for at least the air damper opening. It is further achieved when the device has: a function of taking at least one of the carbon monoxide and nitrogen oxide measurement signals into the control device; as an external input function, a function of setting, as a limit value of the measurement signal, at least one environmental regulation value for carbon monoxide, nitrogen oxides, carbon dioxide, sulfur oxides, mercury, fluorine, particulates consisting of dust or mist, or volatile organic compounds; and a function of generating, according to the learning results, at least one operation signal among the air damper opening, the fuel flow rate supplied to the burners, the burner air flow rate, the air flow rate supplied to the air ports, the gas recirculation flow rate, the burner angle, and the supply air temperature.
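One way the user-set environmental limits could enter the learning is through the evaluation value (reward). The sketch below is a hypothetical formulation, not taken from the patent: the reward is +1 when every regulated component is within its limit and otherwise a penalty proportional to the total relative exceedance. The signal names and limit numbers are placeholders, not actual regulatory values.

```python
def evaluation_value(measurements, limits, penalty=10.0):
    """Hypothetical evaluation value (reward) for the learning unit.

    Returns +1.0 when every regulated component is within its user-set limit;
    otherwise returns -penalty times the summed relative exceedance.
    """
    excess = 0.0
    for name, limit in limits.items():
        value = measurements[name]
        if value > limit:
            excess += (value - limit) / limit
    return 1.0 if excess == 0.0 else -penalty * excess


# Placeholder limits entered through the external input function (NOT real regulatory values)
limits = {"CO_ppm": 100.0, "NOx_ppm": 150.0}
```

Normalizing each exceedance by its limit lets components with very different units and magnitudes share a single scalar evaluation value.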

According to the present invention, the mean and variance of the measurement signals are calculated and a model simulating the control target is built from the results. The model therefore incorporates a distribution shape that reflects the accumulated data, and the variability of the data can be read from the magnitude of the variance.

A large variance indicates that the plant operating state or other process values have a strong influence, and a small variance indicates that their influence is weak. By building the control algorithm with the magnitude of the variance taken into account, the invention can therefore avoid the low reliability that results from data fluctuation or from having little accumulated data.
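Taking the variance into account can be illustrated with a small distribution-based model. This sketch assumes one possible design, not the patent's: the model keeps the measurements observed at each operation setting, and the controller picks the setting with the lowest pessimistic estimate, mean + k·std (for example of NOx), skipping settings with too few samples to be trusted. The class, the function, `k`, and `min_samples` are hypothetical.

```python
import math
import statistics
from collections import defaultdict


class DistributionModel:
    """Hypothetical statistical model: measurements observed at each operation setting."""

    def __init__(self):
        self.samples = defaultdict(list)

    def record(self, setting, measurement):
        self.samples[setting].append(measurement)

    def predict(self, setting):
        """Return (mean, variance, sample count) of the measurement at this setting."""
        xs = self.samples.get(setting, [])
        if not xs:
            return None, None, 0
        var = statistics.variance(xs) if len(xs) > 1 else 0.0
        return statistics.fmean(xs), var, len(xs)


def choose_setting(model, settings, k=2.0, min_samples=3):
    """Pick the setting with the lowest pessimistic estimate mean + k * std.

    Settings with fewer than min_samples observations are treated as unreliable and skipped.
    """
    best, best_score = None, math.inf
    for s in settings:
        mean, var, n = model.predict(s)
        if n < min_samples:
            continue
        score = mean + k * math.sqrt(var)
        if score < best_score:
            best, best_score = s, score
    return best


m = DistributionModel()
for x in (100, 101, 99):
    m.record("A", x)   # consistent response
for x in (90, 120, 60):
    m.record("B", x)   # lower mean but large scatter
m.record("C", 50)      # too few samples to trust
```

With `k=2.0`, setting "A" is chosen even though "B" has the lower mean, because the large variance of "B" makes its result unreliable. With `k=0.0`, the choice falls back to the lowest mean.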

The plant control device according to the present invention is described in detail below with reference to the illustrated embodiments. FIG. 1 shows an embodiment in which the plant control device according to the present invention is applied to a control target 100. The embodiment comprises a control device 200, an input device 900, a maintenance tool 910, and an image display device 950.

The control device 200 acquires the measurement signal 1 from the control target 100 via the external input interface 210 and sends the operation signal 16 to the control target 100 via the external output interface 220. The measurement signal 2 acquired by the external input interface 210 is passed to the operation signal generation unit 300 and is also stored in the measurement signal database 230. The operation signal 15 generated by the operation signal generation unit 300 is passed to the external output interface 220 and is also stored in the operation signal database 240.

The operation signal generation unit 300 uses the information stored in the control logic database 250 and the learning information database 280 to generate the operation signal 15 so that the measurement signal 1 from the control target 100 achieves the operating target value. The information stored in the learning information database 280 is generated by the learning unit 600, which is connected to the model 500 for this purpose.

The model 500 simulates the characteristics of the control target 100. Just as applying the operation signal 16 to the control target 100 yields the measurement signal 1, applying the model input 12 to the model 500 yields the model output 13, which is a predicted value of the measurement signal 1. The model 500 calculates the model output 13 for a given model input 12 using model equations based on physical laws or using statistical methods.

The model creation unit 400 generates the model 500 from the previous model parameters 5 stored in the model parameter database 270 and the measurement signal 3. If the model parameter database 270 holds no previous model parameters 5, the model creation unit 400 generates a new model 500 from the measurement signal 3 and model parameters generated by random numbers or similar means.
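The warm-start behavior of the model creation unit can be sketched with a deliberately simple model. The example assumes a linear model y ≈ w·u + b fitted by gradient descent to (operation signal, measurement) pairs. It starts from the previous parameters when they exist and from random values otherwise. The model form, the function name `create_model`, and the learning settings are all hypothetical.

```python
import random


def create_model(measurements, previous=None, epochs=2000, lr=0.05, seed=0):
    """Fit y ~ w * u + b by gradient descent on 0.5 * mean squared error.

    measurements: list of (operation signal u, measured value y) pairs.
    previous: (w, b) from the model parameter store, or None for a fresh start.
    """
    if previous is not None:
        w, b = previous                                     # reuse previous model parameters
    else:
        rng = random.Random(seed)
        w, b = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)  # random initial parameters
    n = len(measurements)
    for _ in range(epochs):
        gw = sum((w * u + b - y) * u for u, y in measurements) / n
        gb = sum((w * u + b - y) for u, y in measurements) / n
        w -= lr * gw
        b -= lr * gb
    return w, b


# Hypothetical measurement data following y = 2u + 1
data = [(float(u), 2.0 * u + 1.0) for u in range(5)]
```

Starting from good previous parameters typically needs far fewer iterations than starting from random values, which is the practical benefit of keeping the previous model parameters.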

The learning unit 600 generates the model input 12 using the previous learning information 11 stored in the learning information database 280, the learning parameters 7 stored in the learning parameter database 260, and the model output 13. The evaluation value 14, calculated from the model output 13 produced by the model 500, is also input to the learning unit 600. The learning unit 600 updates the learning information using the evaluation value 14 and sends the updated learning information 10 to the learning information database 280.

The operation signal generation unit 300 generates the operation signal 15 using the learning information 9 stored in the learning information database 280 and the control logic information 6 stored in the control logic database 250. The operator of the control target 100 can access the information stored in the various databases of the control device 200 through the input device 900, which consists of a keyboard 901 and a mouse 902, and through the maintenance tool 910, which is connected to the image display device 950.

The maintenance tool 910 consists of an external input interface 920, a data transmission/reception processing unit 930, and an external output interface 940. An input signal 31 generated by the input device 900 is taken into the maintenance tool 910 via the external input interface 920. The data transmission/reception processing unit 930 then acquires the database information 30 held in the control device 200 according to the information in the input signal 32.

The data transmission/reception processing unit 930 processes the database information 30 and sends the resulting output signal 33 to the external output interface 940. The external output interface 940 supplies an output signal 34 to the image display device 950, where it is displayed as an image for the operator to monitor.

In this embodiment, all required databases are located inside the control device 200, but they may also be located outside it. Likewise, all the signal processing functions for generating the operation signal 16 are located inside the control device 200 in this embodiment, but they may also be located outside it.

Next, the operation of this embodiment is described, together with the information stored in the databases and the signal processing functions, taking as an example the application of the present invention to a thermal power plant. First, the thermal power plant that serves as the control target 100 is described with reference to FIG. 2. In the coal-fired case described here, coal is stored in the coal bunker 111 and is supplied from the coal bunker 111 to the mill 110 via the coal feeder 112.

In the mill 110, internal rollers crush the coal finely into pulverized coal. The pulverized coal is conveyed by primary (coal-transport) air to the burners 102, together with secondary air for combustion adjustment, and is fed into the furnace of the boiler 101 and burned. The primary air is supplied to the mill 110 through the pipe 133; the pulverized coal and primary air are led to the burners 102 through the pipe 134, and the secondary air is led to the burners 102 through the pipe 141.

After-air for two-stage combustion is also supplied into the furnace of the boiler 101 through the after-air port 103; this after-air is led there via the pipe 142. The high-temperature gas produced in the furnace by coal combustion flows along a predetermined path in the boiler 101, including the heat exchanger 106 of the boiler body, then passes through the air heater 104, undergoes exhaust gas treatment, and is released to the atmosphere through the stack.

Feedwater circulating through the heat exchanger 106 of the boiler 101 is pressurized by the feedwater pump 105, introduced into the boiler 101, and heated in the heat exchanger 106 to become high-temperature, high-pressure steam. Although a single heat exchanger is shown in this example, multiple heat exchangers may be provided.

The high-temperature, high-pressure steam leaving the heat exchanger 106 is led through the turbine governor 107 to the steam turbine 108, where the energy of the steam is converted into rotational energy that drives the generator 109 to produce electric power. Exhaust steam from the steam turbine 108 is sent to the condenser 113, where it is cooled and condensed, and the condensate is returned to the feedwater pump 105. To improve thermal efficiency, steam is extracted from the turbine 108 in this process, and a device is provided that heats the feedwater with the extracted steam.

A thermal power plant of this kind is equipped with many measuring instruments. FIG. 2 shows, for example, a flow meter 150, a temperature meter 151, a pressure meter 152, a power output meter 153, and a concentration meter 154. The flow meter 150 measures the flow rate of the feedwater supplied from the feedwater pump 105 to the boiler 101. The temperature meter 151 and the pressure meter 152 measure the temperature and pressure of the steam supplied to the steam turbine 108. The power output meter 153 measures the electric power generated by the generator 109.

The concentration meter 154 measures the concentrations of components subject to environmental regulation that are contained in the gas passing through the boiler 101, including at least one of CO (carbon monoxide), NOx (nitrogen oxides), carbon dioxide, sulfur oxides, mercury, fluorine, particulates consisting of dust or mist, and volatile organic compounds. A thermal power plant generally has many more measuring instruments than those shown in FIG. 2; they are omitted from the figure. The information acquired from these instruments is shown in FIG. 1 as the measurement information 1 output from the control target 100 and is transmitted to the control device 200.

Next, the paths of the primary air and secondary air supplied through the burner 102, and of the after-air supplied through the after-air port 103, are described. The primary air is drawn by the fan 120 into the pipe 130 and branches into a pipe 132 that passes through the air heater 104 and a pipe 131 that bypasses it; the two streams then merge in the pipe 133 and are led to the mill 110. The air passing through the air heater 104 is heated by the combustion gas, and the primary air is used to convey the pulverized coal produced by the mill 110 to the burner 102.

The secondary air and the after-air are drawn by the fan 121 into the pipe 140, heated in the air heater 104, and then split into the secondary-air pipe 141 and the after-air pipe 142, which lead to the burner 102 and the after-air port 103, respectively.

FIG. 3 is an enlarged view of the air heater 104 and the piping through which the primary air, secondary air, and after-air pass. As the figure shows, the pipes are fitted with air dampers 160, 161, 162, and 163. Operating these dampers changes the cross-sectional area open to the air flow and thereby adjusts the air flow rate through each pipe. The control device 200 uses the operation signal 16 it generates to operate equipment such as the feed water pump 105, the mill 110, and the air dampers 160, 161, 162, and 163.

Next, the information stored in the measurement signal database 230 and the operation signal database 240 is described with reference to FIGS. 4 and 5. FIG. 4 shows an example of the information stored in the measurement signal database 230, and FIG. 5 shows an example of the information stored in the operation signal database 240.

As shown in FIG. 4, the measurement signal database 230 stores the information measured on the control object 100 for each measuring instrument, together with the time of each measurement. For example, the flow rate F measured by the flow meter 150 in FIG. 2, the temperature T measured by the temperature meter 151, the pressure p measured by the pressure meter 152, the power output E measured by the power output meter 153, and the NOx concentration D in the exhaust gas are stored together with time information.

To make the data stored in the measurement signal database 230 easy to use, each measured quantity is assigned a unique identifier called a PID number, as shown in the figure. In FIG. 4 the data are stored at a 1-second interval, but this interval, i.e. the sampling period of data collection, can be set arbitrarily.
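As a rough illustration of how PID-keyed, time-stamped records could be organized, the sketch below keeps one time series per PID number. The class name, method names, and sample values are hypothetical; the source specifies only that each record carries a PID number, a time, and a value, and that the sampling period is configurable.

```python
from collections import defaultdict


class MeasurementSignalDB:
    """Hypothetical in-memory stand-in for the measurement signal database 230."""

    def __init__(self, sampling_period_s=1.0):
        # The sampling period is configurable; FIG. 4 uses 1 second.
        self.sampling_period_s = sampling_period_s
        self._series = defaultdict(list)  # PID number -> [(time, value), ...]

    def store(self, pid, time_s, value):
        """Append one measured value for a PID number at a given time."""
        self._series[pid].append((time_s, value))

    def values(self, pid):
        """Return the stored values for one PID number, ordered by time."""
        return [value for _, value in sorted(self._series[pid])]
```

Keying by PID number rather than by instrument name is what lets later steps (for example, extracting model input and output signals) select series without knowing the plant layout.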

As shown in FIG. 5, the operation signal database 240 stores operation signals, such as the feed water flow rate command signal, together with time information. Here too, each operation signal is assigned a unique PID number, and the time interval can be set arbitrarily.

Next, the operation of the model creation unit 400 and the model 500 is described. The model 500 represents the relationship between the measurement signals shown in FIG. 6 using the structure shown in FIG. 7. FIG. 6 plots measurement signal A against the air flow rate ratio. The number of data points available for such a plot depends on the state of the plant: in a newly built plant the data must be derived from design values and similar information, so there are few points, whereas a plant that has operated for many years provides many.

Because the amount of data differs from plant to plant, a distribution is assumed for each data item, and the difference in the amount of data is expressed through the shape of that distribution. With few data points the variance is large and the distribution is broad; with many data points the variance is small and the distribution is sharply peaked. If prior information about the data is available, the shape of the distribution can be assumed in advance; for new data without such prior information, the distribution must be estimated from the data obtained.

Many methods are known for estimating a distribution from data alone. In any case, a normal distribution may be assumed on the basis of the central limit theorem, which states that, whatever the population distribution, the distribution of the sample mean approaches a normal distribution as the number of data increases. Once the form of the distribution is assumed, its shape is determined by the mean and the variance. The normal-distribution assumption based on the central limit theorem is described in detail in, for example, "Introduction to Statistics," edited by the Department of Statistics, College of Arts and Sciences, the University of Tokyo, University of Tokyo Press, July 10, 1991.
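The point that a mean and a variance fully fix the shape of an assumed normal distribution can be shown with Python's standard-library `statistics.NormalDist` (Python 3.8+). The numbers below are purely illustrative: a variance of 100 stands in for a plant with little data and a variance of 1 for one with much data.

```python
import math
from statistics import NormalDist


def assumed_distribution(mean, variance):
    """Normal distribution fully determined by a mean and a variance."""
    return NormalDist(mu=mean, sigma=math.sqrt(variance))


few_data = assumed_distribution(50.0, 100.0)   # large variance: broad distribution
many_data = assumed_distribution(50.0, 1.0)    # small variance: sharp distribution

# Peak density at the mean is 1 / (sigma * sqrt(2*pi)), so the broad one is lower.
print(few_data.pdf(50.0), many_data.pdf(50.0))
```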

FIG. 7 illustrates the model structure used when a distribution is assumed. The model outputs the center (median) and the variance of the distribution shown in FIG. 6. It consists of an input layer, an intermediate layer, and an output layer, with the nodes of the layers connected to one another. The nodes may use linear or nonlinear functions; a sigmoid function is generally used. Each connection between nodes has a weighting coefficient that represents the strength of the relationship between those nodes.

The model parameters (described later) normally refer to these weighting coefficients. Although the intermediate layer is shown here as a single layer, it can also consist of multiple layers. Related measurement signals are given as the input signals. Simulating the control object with this model yields a model that reflects the amount of accumulated data, so the various states of the control object can be simulated easily.
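A minimal forward pass for a network shaped like FIG. 7 might look as follows: one intermediate layer of sigmoid nodes and two output nodes, one for the center and one for the variance of the assumed distribution. The linear output layer, the softplus used to keep the variance positive, and the weight layout are assumptions made for this sketch; the source states only that sigmoid nodes are typical and that the weights are the model parameters.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softplus(z):
    # Numerically stable log(1 + e^z); keeps the variance output positive
    # (an assumption of this sketch, not stated in the source).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """Return (center, variance) for one input vector.

    w_hidden[j][i]: weight from input i to intermediate node j
    w_out[k][j]:    weight from intermediate node j to output k
                    (k = 0 center, k = 1 variance)
    """
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    raw = [sum(w * h for w, h in zip(row, hidden)) + b
           for row, b in zip(w_out, b_out)]
    return raw[0], softplus(raw[1])
```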

FIG. 8 is a flowchart of the process by which the model creation unit 400 creates the model 500. The parameters needed to execute this flowchart are stored in the model parameter database 270; the form of the information stored in that database is described later.

When the process of FIG. 8 starts, step 401 selects whether to use previously set model parameters or to create new ones. If new model parameters are to be created, the process proceeds to step 402, where the initial values of the model parameters are set using random numbers.

In step 403, the measurement signals 3 that serve as the input and output signals of the model 500 are extracted from the measurement signal database 230, and the mean of the measurement signals 3 that serve as the output signals of the model 500 is calculated. The calculated mean is stored in the learning information database 280.

In step 404, the variance of the measurement signals 3 that serve as the output signals of the model 500 is calculated. If only one sample of a measurement signal is available, the variance cannot be calculated, so a relatively large variance, for example 100, is assigned as a default value. The user can change this default value at any time.

The distribution shape used here is the shape stored in the learning information database 280; if no shape has yet been stored there, a normal distribution is used. The variance thus calculated is stored in the learning information database 280.
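Steps 403 and 404 amount to computing a per-signal mean and variance, with a fallback when only one sample exists. A sketch using Python's `statistics` module follows; the use of the sample variance (divisor n − 1) and the function name are assumptions, since the source does not say which variance estimator is used.

```python
from statistics import mean, variance

DEFAULT_VARIANCE = 100.0  # step 404 default for a single sample; user-adjustable


def teacher_statistics(samples, default_variance=DEFAULT_VARIANCE):
    """Mean (step 403) and variance (step 404) of one output measurement signal."""
    if not samples:
        raise ValueError("no samples for this signal")
    m = mean(samples)
    # statistics.variance needs at least two points, so fall back to the default.
    v = variance(samples) if len(samples) >= 2 else default_variance
    return m, v
```

A single design value, as in a newly built plant, therefore yields a broad distribution (variance 100), while a long operating history yields whatever spread the data actually show.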

In step 405, the mean and variance calculated in steps 403 and 404 are set as the teacher signals of the model 500. Then, in step 406, the parameters required for learning, such as the number of learning iterations, the learning coefficient, and the number of nodes, are set. When new model parameters are being created, the default values stored in the model parameter database 270 are used.

In step 407, the model parameters are updated iteratively from their initial values by learning, using a method such as back propagation. This learning method is described in detail in "Neural Networks and Measurement Control," edited by Nishikawa and Kitamura, Asakura Shoten, January 25, 1995. Basically, the model parameters are updated so as to eliminate the difference between the teacher signal and the output signal produced when an input signal is given to the model 500.

The difference between the output signal of the model 500 and the teacher signal is generally expressed as a squared error, called the evaluation function. The partial derivative of the evaluation function with respect to each model parameter is calculated, and the parameter is changed by this derivative multiplied by the learning coefficient, in the direction that decreases the evaluation function. Repeating this process reduces the difference between the output of the model 500 and the teacher signal, and the evaluation function approaches zero.

As the evaluation function approaches zero, the partial derivatives also approach zero, and so do the parameter updates. In numerical computation, however, the evaluation function never becomes exactly zero, so in step 408 learning is regarded as complete, and model creation ends, once the evaluation function falls to or below a set value.

If the termination condition of step 408 is not satisfied, the iterative calculation is stopped when the number of learning iterations reaches the set number, and the process returns to step 406 to set the learning parameters again.
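Steps 402 and 407 to 408 can be illustrated with plain batch gradient descent by back propagation on a tiny one-input network: a squared-error evaluation function, updates equal to the learning coefficient times the partial derivatives, a threshold test, and an iteration limit. The network size, learning coefficient, tolerance, and data are hypothetical, and the real model 500 has more inputs and two outputs.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(xs, ys, hidden=4, lr=0.1, max_iter=5000, tol=1e-3, seed=0):
    """Fit y ~ f(x) with one sigmoid intermediate layer by gradient descent.

    Returns (iterations, initial_error, final_error).
    """
    rng = random.Random(seed)
    # Step 402: random initial model parameters (weights and biases).
    w1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0
    first_error, error = None, float("inf")
    for it in range(max_iter):
        error = 0.0
        g_w1, g_b1, g_w2, g_b2 = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden, 0.0
        for x, y in zip(xs, ys):
            h = [sigmoid(w1[j] * x + b1[j]) for j in range(hidden)]
            out = sum(w2[j] * h[j] for j in range(hidden)) + b2
            diff = out - y
            error += 0.5 * diff * diff          # squared-error evaluation function
            g_b2 += diff                        # back-propagated partial derivatives
            for j in range(hidden):
                g_w2[j] += diff * h[j]
                d_hidden = diff * w2[j] * h[j] * (1.0 - h[j])
                g_w1[j] += d_hidden * x
                g_b1[j] += d_hidden
        if first_error is None:
            first_error = error
        if error <= tol:                        # step 408: learning complete
            return it, first_error, error
        for j in range(hidden):                 # update = learning coefficient x derivative
            w1[j] -= lr * g_w1[j]
            b1[j] -= lr * g_b1[j]
            w2[j] -= lr * g_w2[j]
        b2 -= lr * g_b2
    return max_iter, first_error, error         # iteration limit reached
```

All gradients are accumulated over the data before any weight changes, so each pass uses a consistent set of parameters; the small learning coefficient keeps the evaluation function decreasing steadily on this toy problem.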

Returning to step 401: if the use of past model parameters is selected there, step 409 selects whether to refine them by learning, using them as initial values. If they are to be refined, the process proceeds to step 403. Otherwise, the past model parameters are used as they are, the model 500 need not be rebuilt, and model creation ends.

This embodiment uses a neural network whose nodes use sigmoid functions, but other network models may be used, such as a radial basis function network whose nodes use Gaussian functions.

FIG. 9 illustrates the form of the information stored in the model parameter database 270. As shown, the database stores an ID, the creation date and time, the learning coefficient, the number of learning iterations, the termination condition, the number of nodes, and the parameter values. The number of nodes is given separately for the input, intermediate, and output layers. The parameter values are the weighting coefficients, one for each connection between nodes, stored as W₁₁, W₁₂, and so on.

The entry with ID 000 holds the default learning parameters used when model parameters are created anew. Because it is used only for new creation, its number-of-nodes and parameter-value fields are normally blank.

Next, the operation of the model 500 and the learning unit 600 is described. Working on the model 500, which simulates the characteristics of the control object 100, the learning unit 600 learns how to generate the model input 12 so that the model output 13 achieves the model output target value. One algorithm for this kind of learning is reinforcement learning theory, as described, for example, in "Reinforcement Learning," Japanese translation by Sadayoshi Mikami and Masaaki Minagawa, Morikita Publishing, December 20, 2000.

Reinforcement learning uses evaluation values (rewards) as cues and, through interaction between the learning unit 600 and the model 500, learns how to generate the model input 12 so as to achieve the model output target value. Applying reinforcement learning makes it possible to learn a way of generating the model input 12 that maximizes the expected value of the evaluation values obtained from the current time into the future.

This embodiment is described using the Q-learning method as an example of a reinforcement learning algorithm. The learning method of the control device 200 is not limited to reinforcement learning, however; optimization techniques such as genetic algorithms and linear or nonlinear programming can also be applied.

FIG. 10 is a schematic diagram of the Q-learning method. As shown, the learning unit 600 using Q-learning consists of an agent 650, which generates the model input 12, and an evaluator 660, which evaluates the value of states.

FIGS. 11 and 12 are flowcharts of the processing performed with the Q-learning method. The design parameters needed to execute these flowcharts, such as the discount rate γ, are stored in the learning parameter database 260 and the learning information database 280. The form of the information stored in these databases, and the method of registering design parameters in them, are described later.

The flowchart of FIG. 11 is executed repeatedly while the control object 100 is being controlled. In the first step, 301, the sampling period r of the control is acquired. Next, in step 302, learning for one episode is executed: the model 500 and the learning unit 600 operate to run the reinforcement learning algorithm described above. Then, in step 303, the learning termination check is performed.

Step 303 ensures that learning ends within the control sampling period: while the learning execution time is less than r, the process returns to step 302, and learning ends once the processing time exceeds the period r.
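Steps 301 to 303 describe a time-budgeted loop: run one-episode learning, then check whether the sampling period r has been used up. A minimal sketch follows; the `run_episode` callable and the injectable `clock` are hypothetical names introduced so the loop can be tested without real timing.

```python
import time


def learn_within_period(r, run_episode, clock=time.monotonic):
    """Repeat one-episode learning (step 302) until the elapsed time reaches
    the control sampling period r (step 303). Returns the episode count."""
    start = clock()
    episodes = 0
    while True:
        run_episode()                 # step 302: one episode of learning
        episodes += 1
        if clock() - start >= r:      # step 303: stop once the period is used up
            return episodes
```

Because the check follows each episode, at least one episode always runs, and the last one can overrun r by up to one episode's duration; a stricter controller would estimate episode time and stop early.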

FIG. 12 is a flowchart of the operation performed when one episode of learning is executed in step 302 of FIG. 11. First, in step 601, the initial value of the model input is set at random. In step 602, the model input 12 generated in step 601 is given to the model 500 to obtain the model output 13. In step 603, the model output 13 is compared with the model output target value; if the model output 13 achieves the target value, the episode ends, and otherwise the process proceeds to step 604.

In step 604, the learning unit 600 determines the model input change width Δa using the information stored in the learning information database 280. The method for determining Δa is described later.

In step 605, the model input 12 is determined using the following equation (1).

In step 606, the model input 12 determined in step 605 is given to the model 500 to obtain the model output 13. Then, in step 607, an evaluation value is calculated from the model output 13 obtained in step 606 using the following equation (2).

Here the value Q(s, a) is determined as a sum over time, and there is a reason for this. The response to an actual action (here, generating the model input 12 and giving it to the model 500) often involves a time delay, and this effect is especially large in plant applications.

It is therefore more realistic to determine the value from the sum of the rewards received in the future than from the reward immediately after the action, which is why the value is defined as a sum over time. Introducing the discount rate γ has a further advantage: because rewards obtained soon after an action are weighted more heavily, the evaluation value also takes responsiveness into account.
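Equation (2) itself is not reproduced in this text. As a generic illustration of the idea described above, a value defined as a discounted sum of future rewards in which γ favors early rewards, the sketch below computes Σₖ γᵏ rₖ. This is the standard discounted-return definition from the reinforcement learning literature, not necessarily the exact form of equation (2).

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**k * rewards[k]: rewards received sooner weigh more."""
    total = 0.0
    for k, r in enumerate(rewards):
        total += (gamma ** k) * r
    return total
```

For example, with γ = 0.9 a reward of 1 received immediately counts as 1, but the same reward two steps later counts only as 0.81, which is how the discount rewards a fast plant response.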

In step 608, the agent's parameters are updated by the following equation (3) on the basis of the evaluation value calculated in step 607, and the updated result is stored in the learning information database 280.
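Equation (3) is not reproduced in this text either. For reference, the standard tabular Q-learning update (Watkins' rule, as presented in the reinforcement learning textbook cited above) is sketched below, with α a learning rate and γ the discount rate; whether equation (3) takes exactly this form cannot be confirmed from this text. States here would be tile numbers and actions candidate input changes, which is an assumption of the sketch.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha, gamma):
    """One tabular Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a))
    """
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0
```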

Finally, in step 609, the termination check is performed in the same way as in step 408: if the learning termination condition is not satisfied, the iterative calculation is stopped when the number of learning iterations reaches the set number, and the process returns to step 604.

Next, the processing by which the agent 650 of the learning unit 600 generates the model input 12 and the evaluator 660 calculates the state value is described. The case using the tile coding method is described here, but the agent 650 and the evaluator 660 may be configured using other methods.

First, as noted above, the evaluator 660 divides the state space by tile coding. Tile coding treats a continuous state as a discrete one by dividing the input space into regions and determining which region the input belongs to. FIG. 13 illustrates tile coding; each region in the figure is called a tile. For example, if the input signal 12 to the model 500 is two-dimensional, consisting of input signals A and B, and input A lies between 0 and 1 while input B lies between 1 and 2, the input belongs to the tile with state number 1.
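The tile layout of FIG. 13 is not available in this text, so the grid below is a hypothetical layout: each input is split into equal-width tiles over its operating range, and tiles are numbered row by row from 0. The operating range and the number of divisions correspond to the values the operator enters for each operation end on the learning condition setting screen.

```python
def tile_index(values, lows, highs, divisions):
    """Map a continuous input vector to a discrete tile (state) number.

    Input i is split into divisions[i] equal-width tiles over its operating
    range [lows[i], highs[i]]; tiles are numbered row by row from 0.
    """
    index = 0
    for v, lo, hi, n in zip(values, lows, highs, divisions):
        width = (hi - lo) / n
        k = int((v - lo) // width)
        k = min(max(k, 0), n - 1)  # clamp values outside the range to edge tiles
        index = index * n + k
    return index
```

With both inputs ranging over [0, 3] in three divisions, `tile_index([0.5, 1.5], [0, 0], [3, 3], [3, 3])` returns 1, and the top-right tile is 8.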

In this case, as shown in FIG. 14, the learning information database 280 stores each state number together with its corresponding value function. The evaluator 660 calculates the value of the state according to equation (3) above, using the value of the input signal 12 at the time the model output 13 was obtained and the information stored in the learning information database 280.

FIG. 15 shows further information stored in the learning information database 280: for each state number, the mean and variance of the teacher signals used to create the model 500. In step 604 described above, the model input change width Δa is determined from the variance of the teacher signal.

Accordingly, when the variance is small, the data scatter little and the sensitivity to changes in the input signal is low, so the input change width Δa is made large. Conversely, when the variance is large, the data scatter widely and the sensitivity to changes in the input signal is high, so Δa is made small.
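The source gives only the direction of the relationship (smaller variance, larger Δa). One hypothetical mapping with that property is Δa = base / (1 + k·variance), optionally floored at a minimum width; the function name, the constants, and the formula itself are assumptions for illustration.

```python
def input_change_width(variance, base_width, k=1.0, min_width=None):
    """Hypothetical mapping: smaller teacher-signal variance -> larger Δa."""
    width = base_width / (1.0 + k * variance)
    if min_width is not None:
        width = max(width, min_width)  # avoid steps too small to make progress
    return width
```

The floor matters in practice: with the default variance of 100 used for single-sample signals, the unfloored width would shrink to about 1% of the base width.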

FIG. 16 shows the form of the information stored in the learning parameter database 260: parameters, such as the learning rate, needed to execute steps 606 and 607 of the flowchart of FIG. 12. Because this reinforcement learning learns how to generate the model input 12 so as to maximize the expected evaluation value, the evaluation value should be large when the model output 13 achieves the model output target value.

One way to generate such an evaluation value is to assign a positive value, for example 1, when the model output 13 achieves the model output target value. When the target is not achieved, the evaluation value can be calculated with a function inversely proportional to the error between the model output target value and the model output 13. These methods can also be combined.
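A combination of the two reward schemes above might look as follows. Treating "achieved" as being within a tolerance, and scaling the inverse-proportional part as tolerance / error so that it equals 1 at the boundary, are assumptions of this sketch; the source only names the two ingredients. The tolerance must be positive.

```python
def evaluation_value(target, output, tolerance):
    """Reward 1 when the target is achieved (within tolerance); otherwise a
    value inversely proportional to the error, equal to 1 at the tolerance."""
    error = abs(target - output)
    if error <= tolerance:
        return 1.0
    return tolerance / error
```

Making the two pieces meet at the tolerance keeps the reward continuous, so a model input that almost reaches the target is not scored very differently from one that just reaches it.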

Next, the way an operator of the control object 100 uses the maintenance tool 910 to display database information on the image display device 950 is described with reference to FIGS. 17 to 21. The operator uses the keyboard 901 and the mouse 902 to perform operations such as entering parameter values in the blank fields of the displayed screens.

FIG. 17 shows the initial screen displayed on the image display device 950. The operator presses one of the control logic creation/editing button 951, the learning condition setting button 952, and the information display button 953 by moving the cursor 954 with the mouse 902 and clicking.

FIG. 18 shows the control logic editing screen displayed when the control logic creation/editing button 951 is clicked. On this screen, the operator presses either the new button 967 or the edit button 968. For a new logic, a blank logic diagram opens; for editing, the operator selects the logic to be edited and its logic diagram is displayed. When creating or editing, the operator selects the required modules from the pre-registered standard element modules 963 and moves them onto the logic editing area 961. Modules are connected with the connect/erase function 962.

Clicking the save button 964 saves the control logic diagram created on the screen of FIG. 18 to the control logic database 250 via the data transmission/reception processing unit 930. The operation signal generation unit 300 uses this control logic diagram to generate the operation signal 15 when the measurement signal 2 is input. The operation signal generation unit 300 can also use the information stored in the learning information database 280 in generating the operation signal 15.

The learning information database 280 stores the state numbers and the center values of the information shown in FIG. 12. Using this information, the operation signal generation unit 300 can easily generate an operation signal 15 with the same value as a model input 12 that brings the model output 13 to the desired value. To discard the control logic diagram without saving it, the operator clicks the cancel button 965; clicking the return button 966 returns to the screen of FIG. 17.

Clicking the learning condition setting button 952 on the screen of FIG. 17 displays the screen of FIG. 19. In the model creation field 971 of this screen, the operator enters, for each model-specific PID, the learning coefficient, number of learning iterations, and termination condition needed to execute the flowchart of FIG. 8, or corrects them if they have already been entered. The operator can also change the default values stored under ID 000.

In the parameter setting field 972, the operator enters the parameters needed to execute the flowcharts of FIGS. 11 and 12. In the operation end setting field 973, the operator enters the names of the operation ends whose operation method is to be learned by the flowchart of FIG. 11, their operating ranges, and the number of divisions for tile coding. Clicking the next page button 977 in FIG. 19 moves to the second part of the learning condition setting screen. The previous page button 978 and the second part of the screen are described later.

Clicking the save button 974 in FIG. 19 stores the information entered in the model creation field 971 in the model parameter database 270, the information entered in the parameter setting field 972 in the learning parameter database 260, and the information entered in the operation end setting field 973 in the learning information database 280.

Clicking the cancel button 975 discards all information entered in the model creation field 971, the parameter setting field 972, and the operation end setting field 973. Clicking the return button 976 returns to the screen of FIG. 17.

Next, the second half of the learning condition setting screen is described with reference to FIG. 20. This screen is displayed by clicking the next page button 977 in FIG. 19. In the learning information field 979, the operator enters the mean, variance, and distribution shape of the output signal of the model 500, or corrects them if they have already been entered. The model input change width used in step 604 of the flowchart of FIG. 12 is determined from this information.
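Claim 2 states that the change width of the operation signal is determined from the variance of the measurement signal. One hypothetical rule, scaling the step with the standard deviation and clamping it, is sketched below. The gain `k`, the clamp limits, and the direction of the scaling are assumptions, and the distribution shape entered in field 979 is not used here.

```python
import math


def model_input_change_width(variance: float, k: float = 0.5,
                             min_width: float = 0.1, max_width: float = 5.0) -> float:
    """Hypothetical rule: step size of the model input proportional to the
    standard deviation of the measurement signal, clamped to [min_width, max_width]."""
    if variance < 0:
        raise ValueError("variance must be non-negative")
    width = k * math.sqrt(variance)
    return min(max(width, min_width), max_width)


# Usage: a wider spread in the stored data gives a wider step, up to the clamp.
for var in (0.0, 4.0, 400.0):
    print(var, model_input_change_width(var))
```

Whether a larger variance should widen or narrow the step is a design choice the text leaves open; a cautious controller might instead shrink the step when the data are noisy.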

FIG. 21 shows the screen used to set the conditions for displaying information stored in the measurement signal database 230 and the operation signal database 240 on the image display device 950; it is displayed by clicking the information display button 953 in FIG. 17. The operator enters the measurement signal or operation signal to be displayed in the input field 981 together with its range (upper and lower limits), and enters the time period to be displayed in the time input field 982.

Clicking the display button 983 displays a trend graph such as the one shown in FIG. 22 on the image display device 950. Clicking the return button 991 on that screen returns to the screen of FIG. 21, and clicking the return button 984 returns to the screen of FIG. 17. Besides the screens described above, any information stored in the databases of the control device 200 can be selected and displayed on the image display device 950 in any form.
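The display settings above (signal, upper and lower limits, time period) amount to a simple query over the stored time series. A minimal sketch, assuming samples are `(datetime, value)` pairs and that out-of-range values are clipped to the display limits; the document does not say how such values are drawn, and `trend_points` is a hypothetical name.

```python
from datetime import datetime


def trend_points(samples, start, end, lower, upper):
    """Select (timestamp, value) samples inside [start, end] for one signal,
    sorted by time, with values clipped to the display range [lower, upper]."""
    if lower > upper:
        raise ValueError("lower limit must not exceed upper limit")
    window = sorted((t, v) for t, v in samples if start <= t <= end)
    return [(t, min(max(v, lower), upper)) for t, v in window]


# Usage with illustrative data for one signal.
samples = [
    (datetime(2024, 1, 1, 10, 0), 95.0),
    (datetime(2024, 1, 1, 9, 0), 120.0),  # above the display range
    (datetime(2024, 1, 1, 8, 0), 80.0),   # before the time window
]
points = trend_points(samples, datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 11, 0), 0.0, 100.0)
print(points)
```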

In this embodiment, the control target 100 of FIG. 1 is the thermal power plant described with reference to FIG. 2, and the control device 200 is applied to it. By operating the air dampers of the thermal power plant, the control device can control the emissions of at least one substance subject to an environmental regulation value, such as CO, NOx, carbon dioxide, sulfur oxides, mercury, fluorine, particulates consisting of dust or mist, and volatile organic compounds.

FIG. 23 shows the basic characteristics of the CO and NOx emitted from a thermal power generation facility. As illustrated, the amounts of CO and NOx are generally in a trade-off relationship: reducing CO tends to increase NOx, and reducing NOx tends to increase CO.

The amounts of CO and NOx discharged from the stack of a thermal power plant are subject to legal regulation, which is particularly strict for NOx. The gas at the boiler outlet is therefore led to a denitration device, whose treatment keeps the emissions within the regulations. The amount of ammonia consumed by the denitration device increases as the NOx concentration at its inlet increases.
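The dependence of ammonia consumption on inlet NOx can be made concrete with a rough mass balance. This sketch is not from the document: it assumes NOx is mostly NO, uses the standard SCR reaction (about one mole of NH3 per mole of NO removed), ignores ammonia slip, and the function name, parameters, and example flow rate are hypothetical.

```python
# NH3 density at 0 degC and 1 atm: molar mass (g/mol) / molar volume (L/mol) = kg/Nm3
NH3_KG_PER_NM3 = 17.031 / 22.414


def ammonia_demand_kg_per_h(gas_flow_nm3_per_h: float, nox_in_ppm: float,
                            nox_out_ppm: float, nh3_to_nox_ratio: float = 1.0) -> float:
    """Rough NH3 demand of a denitration (SCR) unit.

    Assumes NOx is mostly NO, so 4 NO + 4 NH3 + O2 -> 4 N2 + 6 H2O gives about
    one mole of NH3 per mole of NOx removed; concentrations are ppm by volume.
    """
    removed_ppm = max(nox_in_ppm - nox_out_ppm, 0.0)
    nh3_nm3_per_h = gas_flow_nm3_per_h * removed_ppm * 1e-6 * nh3_to_nox_ratio
    return nh3_nm3_per_h * NH3_KG_PER_NM3


# Usage: same outlet target, two inlet NOx levels (illustrative figures).
high = ammonia_demand_kg_per_h(1_000_000, nox_in_ppm=200, nox_out_ppm=20)
low = ammonia_demand_kg_per_h(1_000_000, nox_in_ppm=150, nox_out_ppm=20)
print(f"{high:.1f} kg/h vs {low:.1f} kg/h")
```

Because demand is proportional to the NOx removed, every ppm cut at the boiler outlet lowers ammonia use in proportion, which is the cost argument made in the text.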

Reducing the NOx at the inlet of the denitration device therefore brings a significant cost benefit, so the NOx concentration should be reduced as far as possible; this requires a control algorithm that takes the CO/NOx trade-off into account. Moreover, the stored measurement signal data differ depending on whether the thermal power plant is in the design, commissioning, or operating phase. Even after long-term operation, a large amount of accumulated data is therefore not necessarily desirable if it was collected under different operating conditions.
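The learning unit is described as using Q-learning, but the reward it optimizes is not given in this section. A hypothetical reward that encodes the CO/NOx trade-off is sketched below: NOx is minimized, CO carries a smaller weight, and exceeding either regulation value is heavily penalized. The function name, weights, and penalty are illustrative assumptions.

```python
def reward(nox_ppm: float, co_ppm: float, nox_limit: float, co_limit: float,
           co_weight: float = 0.1, violation_penalty: float = 100.0) -> float:
    """Hypothetical reward: lower NOx is better, CO is weighted less, and
    exceeding either regulation value costs a large fixed penalty."""
    r = -nox_ppm - co_weight * co_ppm
    if nox_ppm > nox_limit:
        r -= violation_penalty
    if co_ppm > co_limit:
        r -= violation_penalty
    return r


# Usage: two candidate operating points under hypothetical limits.
a = reward(nox_ppm=60.0, co_ppm=30.0, nox_limit=100.0, co_limit=50.0)
b = reward(nox_ppm=40.0, co_ppm=80.0, nox_limit=100.0, co_limit=50.0)
print(a, b)
```

With a CO limit of 50 ppm, the point with lower NOx but excess CO scores far worse, so a learner maximizing this reward would accept somewhat higher NOx to stay within the CO regulation.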

In the embodiment described above, however, the model that simulates the thermal power plant to be controlled can represent the shape of the distribution of the accumulated data by following the flowchart of FIG. 8, and the past state of the plant can be grasped from that distribution shape. A large variance indicates large fluctuations, meaning the plant state was very unstable; a small variance indicates small fluctuations, meaning the plant state was very stable. A control algorithm that takes the reliability of the accumulated data into account can therefore be constructed using the flowchart of FIG. 12.
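The mean and variance that the learning unit computes from the measurement signal database can be maintained incrementally as new data arrive. The sketch below uses Welford's algorithm, a standard choice that the document does not name, and adds the standard error of the mean as one possible, assumed indicator of how reliable the accumulated data are. `RunningStats` is a hypothetical name.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Incremental mean and variance of one measurement signal (Welford's algorithm)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def add(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)  # uses the updated mean

    @property
    def variance(self) -> float:
        """Sample variance (divides by n - 1); 0.0 until two samples exist."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

    @property
    def standard_error(self) -> float:
        """Standard error of the mean; infinite until two samples exist, then shrinks like 1/sqrt(n)."""
        return math.sqrt(self.variance / self.n) if self.n > 1 else math.inf


# Usage: feed stored samples of one signal (illustrative values).
stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.add(value)
print(stats.mean, stats.variance, stats.standard_error)
```

The variance describes how unsettled the plant was, while the standard error also reflects how many samples back up the mean, so a small data set shows up as low reliability even when its variance is modest.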

As a result, according to the embodiment, a control algorithm that accounts for a small number of data points can be constructed even when little data has been accumulated. This enables control that is robust to fluctuations in the data, so the legal regulations on CO and NOx can be satisfied at all times while the trade-off between them is taken into account.
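Claim 3 refers to generating the operation signal from an expected value computed from the mean and variance. As one hedged illustration of why these two statistics matter against a regulation value, the sketch below assumes the signal is normally distributed and computes the probability of exceeding a limit and the expected amount of exceedance. The normal assumption and the name `exceedance_stats` are ours; the document lets the operator enter other distribution shapes, which would need different formulas.

```python
import math
from statistics import NormalDist


def exceedance_stats(mean: float, variance: float, limit: float) -> tuple:
    """Return (P[X > limit], E[max(X - limit, 0)]) for X ~ Normal(mean, variance)."""
    if variance <= 0:
        raise ValueError("variance must be positive")
    sigma = math.sqrt(variance)
    std = NormalDist()  # standard normal: mu = 0, sigma = 1
    d = (mean - limit) / sigma
    p_exceed = std.cdf(d)  # P[X > L] = Phi((mu - L) / sigma)
    expected_excess = (mean - limit) * std.cdf(d) + sigma * std.pdf(d)
    return p_exceed, expected_excess


# Usage: same mean NOx, two different spreads, against a hypothetical 100 ppm limit.
for var in (25.0, 100.0):
    p, e = exceedance_stats(mean=90.0, variance=var, limit=100.0)
    print(f"variance={var}: P(exceed)={p:.4f}, expected excess={e:.3f} ppm")
```

A larger variance raises both quantities for the same mean, so an operating point that looks safe on average can still carry a meaningful risk of exceeding the limit.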

As already noted, the regulated emissions include, besides CO and NOx, carbon dioxide, sulfur oxides, mercury, fluorine, particulates consisting of dust or mist, and volatile organic compounds. According to the embodiment, emissions can also be controlled against at least one of these environmental regulation values.

FIG. 1 is a block diagram showing an embodiment of the plant control device according to the present invention.
FIG. 2 is a block diagram showing an example of a thermal power plant to be controlled in an embodiment of the present invention.
FIG. 3 is an enlarged view of the piping section and the air heater section of the example thermal power plant to be controlled in an embodiment of the present invention.
FIG. 4 is an explanatory diagram showing how data are stored in the measurement signal database in an embodiment of the present invention.
FIG. 5 is an explanatory diagram showing how data are stored in the operation signal database in an embodiment of the present invention.
FIG. 6 is an explanatory diagram showing the modeling mechanism used in an embodiment of the present invention.
FIG. 7 is an explanatory diagram of the modeling structure used in an embodiment of the present invention.
FIG. 8 is a flowchart explaining the processing of the model creation unit in an embodiment of the present invention.
FIG. 9 is an explanatory diagram showing how data are stored in the model parameter database in an embodiment of the present invention.
FIG. 10 is a block diagram of the learning unit based on the Q-learning method used in an embodiment of the present invention.
FIG. 11 is a flowchart of the algorithm used in the learning unit according to an embodiment of the present invention.
FIG. 12 is a flowchart of the execution of one learning episode in the algorithm used in the learning unit according to an embodiment of the present invention.
FIG. 13 is an explanatory diagram of the tile coding applied to the evaluator in the learning unit according to an embodiment of the present invention.
FIG. 14 is an explanatory diagram showing how data are stored in the learning information database in an embodiment of the present invention.
FIG. 15 is an explanatory diagram showing how data are stored in the learning information database in an embodiment of the present invention.
FIG. 16 is an explanatory diagram showing how data are stored in the learning parameter database in an embodiment of the present invention.
FIG. 17 is an explanatory diagram of the initial screen displayed in an embodiment of the present invention.
FIG. 18 is an explanatory diagram of the control logic creation/editing screen displayed in an embodiment of the present invention.
FIG. 19 is an explanatory diagram of the first half of the learning condition setting screen displayed in an embodiment of the present invention.
FIG. 20 is an explanatory diagram of the second half of the learning condition setting screen displayed in an embodiment of the present invention.
FIG. 21 is an explanatory diagram of the display information setting screen displayed in an embodiment of the present invention.
FIG. 22 is an explanatory diagram of a trend graph of measured values displayed in an embodiment of the present invention.
FIG. 23 is a characteristic diagram explaining the relationship between the CO and NOx discharged from a thermal power plant.

Explanation of symbols

100: Control target
200: Control device
210: External input interface
220: External output interface
230: Measurement signal database
240: Operation signal database
250: Control logic database
260: Learning parameter database
270: Model parameter database
280: Learning information database
300: Operation signal generation unit
400: Model creation unit
500: Model
600: Learning unit
900: Input device
901: Keyboard
902: Mouse
910: Maintenance tool
920: External input interface
930: Data transmission/reception processing unit
940: External output interface
950: Image display device

Claims

1. A plant control device that, when a predetermined operation signal is given to a control target, generates the operation signal needed to make the value of a measurement signal obtained from the control target converge to an operation target value of the control target, and uses the generated operation signal as the predetermined operation signal, the device comprising:
a model that predicts the value of the measurement signal obtained from the control target when the predetermined operation signal is given to the control target;
a learning unit that learns how to generate the model input given to the model so that the model output, which is the prediction result of the model, converges to a model output target value;
an operation signal generation unit that generates the operation signal given to the control target according to the result of the learning unit, the operation signal generated by the operation signal generation unit being used as the predetermined operation signal;
an external input interface that captures the measurement signal from the control target; and
a measurement signal database that stores the values of the measurement signal captured by the interface,
wherein the average and variance of the measurement signals stored in the measurement signal database are calculated, the operation signal is corrected using the calculated average and variance, and the predetermined operation signal is newly generated.

2. A plant control device that, when a predetermined operation signal is given to a control target, generates the operation signal needed to make the value of a measurement signal obtained from the control target converge to an operation target value of the control target, and uses the generated operation signal as the predetermined operation signal, the device comprising:
a model that predicts the value of the measurement signal obtained from the control target when the predetermined operation signal is given to the control target;
a learning unit that learns how to generate the model input given to the model so that the model output, which is the prediction result of the model, converges to a model output target value;
an operation signal generation unit that generates the operation signal given to the control target according to the result of the learning unit, the operation signal generated by the operation signal generation unit being used as the predetermined operation signal;
an external input interface that captures the measurement signal from the control target; and
a measurement signal database that stores the values of the measurement signal captured by the interface,
wherein, when the average and variance of the measurement signals stored in the measurement signal database are calculated and the operation signal is corrected to generate a new operation signal, the change width of the operation signal is determined based on the variance of the measurement signal.

3. The plant control device according to claim 1 or 2, further comprising a function of generating the operation signal using an expected value calculated from the average and variance of the measurement signal.

4. The plant control device according to claim 1 or 2, further comprising, as an external input function, a user interface for inputting the distribution shape of the measurement signal into the control device.

5. The plant control device according to claim 1 or 2, further comprising, as an external input function, a user interface for inputting at least one of the average value, expected value, variance, and distribution shape of the measurement signal into the control device.

6. The plant control device according to any one of claims 1 to 5, wherein the control target is a thermal power plant, the device comprising:
a function of capturing into the control device, from among the measurement signals of the thermal power plant, the measurement signal of at least one of carbon monoxide and nitrogen oxides;
as an external input function, a function of setting an environmental regulation value for at least one of carbon monoxide and nitrogen oxides as a limit value of the measurement signal; and
a function of generating at least an operation signal for the air damper opening according to the learning result.

7. The plant control device according to any one of claims 1 to 5, wherein the control target is a thermal power plant, the device comprising:
a function of capturing into the control device, from among the measurement signals of the thermal power plant, the measurement signal of at least one of carbon monoxide and nitrogen oxides;
as an external input function, a function of setting an environmental regulation value for at least one of carbon monoxide, nitrogen oxides, carbon dioxide, sulfur oxides, mercury, fluorine, particulates consisting of dust or mist, and volatile organic compounds as a limit value of the measurement signal; and
a function of generating, according to the learning result, at least one operation signal among the air damper opening, the fuel flow rate supplied to the burners, the burner air flow rate, the air flow rate supplied to the air ports, the gas recirculation amount, the burner angle, and the supply air temperature.