JP2021080038A

JP2021080038A - Elevator control device, elevator control method, machine learning device, machine learning method, and program

Info

Publication number: JP2021080038A
Application number: JP2019206594A
Authority: JP
Inventors: 正太服部; Shota Hattori; 靖大北上; Yasuhiro Kitagami
Original assignee: Kozo Keikaku Engineering Inc
Current assignee: Kozo Keikaku Engineering Inc
Priority date: 2019-11-14
Filing date: 2019-11-14
Publication date: 2021-05-27
Anticipated expiration: 2039-11-14
Also published as: JP7409831B2

Abstract

PROBLEM TO BE SOLVED: To provide an elevator control device, an elevator control method, a machine learning device, a machine learning method, and a program capable of learning and determining an optimum elevator control mode using situations generated by multi-agent simulation. SOLUTION: An elevator control device 10 includes a user identification unit 13 that recognizes users waiting for an elevator; a usage status recording unit 15 that specifies at least the waiting time of the users as the usage status of the elevator; an external information acquisition unit 17 that acquires external information affecting traffic demand; a machine learning unit 20 that determines an optimum control mode based on at least the waiting time and the external information; and a car allocation unit 19 that controls the operation of the elevator based on the optimum control mode. SELECTED DRAWING: FIG. 1

Description

The present invention relates to an elevator control device, an elevator control method, a machine learning device, a machine learning method, and a program, and more particularly to a technique for learning an optimum control mode of an elevator.

Various elevator operation control methods have been proposed. A basic control algorithm is SCAN. In SCAN, once a car starts moving in a given direction, it keeps moving in that direction until it has served every request (hall call) that matches its direction of travel and originates on a floor it has not yet passed. Similar algorithms include LOOK, C-SCAN, and C-LOOK.
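The SCAN behavior described above can be illustrated with a minimal single-car sketch. The function name `scan_service_order`, the representation of hall calls as `(floor, call_direction)` pairs, and the three-pass ordering are assumptions made for illustration; the document does not specify an implementation.

```python
def scan_service_order(start_floor, direction, hall_calls):
    """Order in which one car serves hall calls under a simplified SCAN-like policy.

    direction: +1 for up, -1 for down.
    hall_calls: iterable of (floor, call_direction) pairs (hypothetical format).
    """
    hall_calls = list(hall_calls)

    def along(d):
        # Sort key that orders floors along travel direction d.
        return lambda floor: floor * d

    # 1) Calls ahead of the car that match its current direction of travel.
    ahead = sorted(
        (f for f, d in hall_calls
         if d == direction and (f - start_floor) * direction >= 0),
        key=along(direction),
    )
    # 2) After reversing: calls in the opposite direction, farthest floor first.
    opposite = sorted(
        (f for f, d in hall_calls if d == -direction),
        key=along(-direction),
    )
    # 3) After reversing again: same-direction calls the car had already passed.
    behind = sorted(
        (f for f, d in hall_calls
         if d == direction and (f - start_floor) * direction < 0),
        key=along(direction),
    )
    return ahead + opposite + behind


calls = [(5, +1), (2, +1), (7, -1), (4, -1), (6, +1)]
print(scan_service_order(3, +1, calls))
```

With several cars each following this rule independently, nothing prevents them from converging on the same calls, which is the bunching problem that group control addresses.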

When multiple elevators are operated with a relatively simple algorithm such as SCAN, the cars may eventually end up traveling together in a cluster (bunching). Group management is a control method that prevents bunching and evens out the waiting times across floors. In group management, when a hall call occurs on a floor, the control device assigns the most suitable of the cars to that floor. In doing so, the control device forecasts traffic demand and, based on the result, selects the algorithm (control mode) used to determine the most suitable car.

Patent Document 1 describes that, in group management, the traffic demand forecast is created based on the behavior patterns of elevator users. Patent Documents 2 and 3 describe recognizing users' faces with cameras installed at the landings, measuring the waiting time of each user, and assigning cars so as to eliminate long waits.

Patent Document 1: Japanese Unexamined Patent Publication No. 2017-030894
Patent Document 2: Japanese Unexamined Patent Publication No. 2019-023124
Patent Document 3: Japanese Unexamined Patent Publication No. 2017-178475

However, although a control method such as that of Patent Document 1 can handle typical behavior patterns, such as commuting to and from work or going out at lunchtime, it cannot forecast traffic demand or select a control mode appropriately for sudden behavior caused by meetings, events, disasters, and the like.

One conceivable approach is therefore to use the actual operating data obtainable with techniques such as those of Patent Documents 2 and 3, and to select a control mode suited to the situation by a machine learning technique such as reinforcement learning. However, learning from actual operating data takes a long time. Moreover, when machine learning is applied to a control method premised on personal identification, as in Patent Document 1, new learning becomes necessary each time the users of the building change, which is inefficient.

Accordingly, an object of the present invention is to provide an elevator control device, an elevator control method, a machine learning device, a machine learning method, and a program capable of learning and determining an optimum elevator control mode using situations generated by multi-agent simulation.

One aspect of the present invention is an elevator control device including: a user identification unit that recognizes users waiting for an elevator; a usage status recording unit that specifies at least the waiting time of the users as the usage status of the elevator; an external information acquisition unit that acquires external information affecting traffic demand; a machine learning unit that determines an optimum control mode based on at least the waiting time and the external information; and a car allocation unit that controls the operation of the elevator based on the optimum control mode.
In another aspect of the present invention, the machine learning device includes: a simulator that acquires, as state variables, situation data including the maximum value of the waiting time and a control mode, by a multi-agent elevator operation simulation; a determination unit that outputs determination data indicating the suitability of the simulation result; and a learning unit that associates the situation data with the control mode using the state variables and the determination data.
In another aspect of the present invention, the learning unit includes: a reward calculation unit that obtains a reward related to the determination data; and a value function update unit that uses the reward to update a value function indicating the value of the control mode for the situation data.
Another aspect of the present invention is a machine learning device including: a simulator that acquires, as state variables, situation data including the maximum value of the users' waiting time and a control mode, by a multi-agent elevator operation simulation; a determination unit that outputs determination data indicating the suitability of the simulation result; and a learning unit that associates the situation data with the control mode using the state variables and the determination data.
Another aspect of the present invention is an elevator control method in which a computer executes: a user identification step of recognizing users waiting for an elevator; a usage status recording step of specifying at least the waiting time of the users as the usage status of the elevator; an external information acquisition step of acquiring external information affecting traffic demand; a determination step of determining an optimum control mode based on at least the waiting time and the external information; and a car allocation step of controlling the operation of the elevator based on the optimum control mode.
Another aspect of the present invention is a machine learning method in which a computer executes: a simulation step of acquiring, as state variables, situation data including the maximum value of the users' waiting time and a control mode, by a multi-agent elevator operation simulation; a determination step of outputting determination data indicating the suitability of the simulation result; and a learning step of associating the situation data with the control mode using the state variables and the determination data.
Another aspect of the present invention is a program for causing a computer to execute the above methods.

According to the present invention, it is possible to provide an elevator control device, an elevator control method, a machine learning device, a machine learning method, and a program capable of learning and determining an optimum elevator control mode using situations generated by multi-agent simulation.

FIG. 1 is a schematic functional block diagram of the elevator control device 10.
FIG. 2 is a schematic functional block diagram of the machine learning unit 20.
FIG. 3 is a schematic functional block diagram of the learning unit 207.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.
FIG. 1 is a schematic functional block diagram of the elevator control device 10. The control device 10 includes landing cameras 11a to 11n provided on the respective floors (n floors), in-car cameras 12a to 12m provided in the respective cars (m cars), a user identification unit 13, a usage status recording unit 15, an external information acquisition unit 17, a machine learning device 20, and a car allocation unit 19. Each processing unit may be implemented as a function of a CPU (central processing unit) and may be realized by the CPU operating according to software.

The landing cameras 11a to 11n are installed at the elevator landing on each floor so that the faces of all waiting users are captured. The video from the landing cameras 11a to 11n is output to the user identification unit 13.

The in-car cameras 12a to 12m are installed inside the elevator cars so that the faces of all users are captured. The video from the in-car cameras 12a to 12m is output to the user identification unit 13.

The user identification unit 13 extracts, from the images input from the landing cameras 11a to 11n, the feature values of the face images of the users waiting at the landing on each floor. It also extracts, from the images input from the in-car cameras 12a to 12m, the feature values of the face images of the users riding in each car. The facial feature values obtained here are used as information that identifies each user.

The usage status recording unit 15 identifies the floor at which each user boarded a car and the floor at which the user alighted. Various known techniques can be used to identify the boarding and alighting floors. For example, the floor at which the car was stopped when the user identification unit 13 recognized that a user entered the frame in the video of the in-car cameras 12a to 12m can be taken as the boarding floor, and the floor at which the car was stopped when the user was recognized as leaving the frame can be taken as the alighting floor. Alternatively, the floor of the landing camera 11a to 11n that first recognized the user can be taken as the boarding floor, and the floor of the landing camera 11a to 11n that next recognized the user can be taken as the alighting floor. The usage status recording unit 15 records the determined boarding and alighting floors together with the time and the user's identifier.

The usage status recording unit 15 also specifies the number of people waiting on each floor, and further specifies the waiting time of each user. Various known techniques can be used to specify the waiting time. For example, when one of the landing cameras 11a to 11n is currently capturing a user, the user identification unit 13 can calculate the time elapsed from when that user was first recognized by that landing camera until the present, and take it as the waiting time.
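The elapsed-time rule described above can be sketched as follows. The class `WaitTracker`, its method names, and the use of plain numeric timestamps in seconds are assumptions for illustration; in the described device the user identifier would come from facial feature values rather than a string.

```python
from dataclasses import dataclass, field


@dataclass
class WaitTracker:
    """Tracks when each user was first recognized at each landing (hypothetical helper)."""
    # Maps (floor, user_id) to the time the user was first recognized there.
    first_seen: dict = field(default_factory=dict)

    def observe(self, floor, user_id, now):
        # Keep only the first recognition time; later sightings do not reset it.
        self.first_seen.setdefault((floor, user_id), now)

    def waiting_time(self, floor, user_id, now):
        # Elapsed time from first recognition until the present.
        return now - self.first_seen[(floor, user_id)]

    def waiting_count(self, floor):
        return sum(1 for f, _ in self.first_seen if f == floor)

    def max_wait_on_floor(self, floor, now):
        waits = [now - t for (f, _), t in self.first_seen.items() if f == floor]
        return max(waits, default=0.0)


tracker = WaitTracker()
tracker.observe(1, "user-a", 10.0)
tracker.observe(1, "user-a", 15.0)  # seen again; the first time is kept
tracker.observe(1, "user-b", 18.0)
print(tracker.waiting_time(1, "user-a", 20.0), tracker.max_wait_on_floor(1, 20.0))
```

A complete tracker would also remove a user once boarding is detected; that step is omitted here to keep the sketch short.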

The external information acquisition unit 17 acquires various kinds of external information that are useful for selecting a control mode, that is, that affect traffic demand. The external information includes, for example, weather information obtainable from a distribution server via the Internet or the like, attendance information (information on clock-in and clock-out times) and event information (information on meetings and events being held) obtainable from groupware or the like, and calendar information (information on dates, days of the week, and holidays). The weather information may include information on the weather and temperature by time of day at the building's location and in the surrounding area. The event information may include the floor where the event is to be held, the scheduled start and end dates and times, the event name, the organizer's name, and the like. The external information acquisition unit 17 can acquire weather information, event information, calendar information, and the like for a predetermined period relative to the current time (for example, for today, or from the current time until three hours later).

The machine learning device 20 has a trained model that determines the optimum control mode. The trained model corresponds to a model structure representing the correlation between, on the one hand, the number of people waiting on each floor, the maximum value of the waiting time of the users waiting on each floor (maximum waiting time), weather information, attendance information, event information, and calendar information, and, on the other hand, the optimum control mode. That is, the machine learning device 20 receives situation data S1 indicating the current situation (the number of people waiting on each floor, the maximum waiting time on each floor, weather information, attendance information, event information, and calendar information) and, according to the model structure of the trained model, outputs the optimum control mode S2 as the determination result.
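One way to picture the mapping from situation data S1 to a control mode S2 is a lookup over a learned value table. The sketch below is an assumption-laden illustration, not the document's model: the function names, the 30-second binning of waiting times, the reduced set of S1 fields, and the mode names `"balanced"` and `"up_peak"` are all hypothetical.

```python
def situation_key(waiting_counts, max_waits, weather, is_holiday, bin_seconds=30):
    """Discretize part of S1 into a hashable key (binning scheme is an assumption)."""
    return (
        tuple(waiting_counts),
        tuple(int(w // bin_seconds) for w in max_waits),
        weather,
        is_holiday,
    )


def select_control_mode(value_table, key, modes):
    """Return the mode with the highest learned value for this situation.

    Unseen (situation, mode) pairs default to 0.0; ties go to the first mode listed.
    """
    return max(modes, key=lambda m: value_table.get((key, m), 0.0))


key = situation_key([2, 0], [45.0, 0.0], "rain", False)
table = {(key, "up_peak"): 0.8, (key, "balanced"): 0.2}  # hypothetical learned values
print(select_control_mode(table, key, ["balanced", "up_peak"]))
```

A tabular lookup only works for a small, discretized state space; a function approximator would take its place for richer situation data.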

The car allocation unit 19 controls elevator operation using the control mode S2 determined by the machine learning device 20. That is, it performs operation control (group control) in the control mode S2 that is optimal for the current situation S1, and assigns an appropriate car to each floor that needs one. Since group control is a known technique, it is not described in detail here.

Next, a method of generating the trained model in the machine learning device 20 will be described. As shown in FIG. 2, the machine learning device 20 in learning mode includes a condition generation unit 201 and a simulator 203 that perform a multi-agent elevator operation simulation, a determination unit 205 that calculates a suitability determination result D for the elevator operation simulation, and a learning unit 207 that learns the relationship between the situation data S1 and the control mode S2.

The condition generation unit 201 and the simulator 203 generate state variables S (situation data S1 and control mode S2) by multi-agent simulation. Multi-agent simulation is a simulation technique built on a mechanism in which each minimum unit capable of autonomous decision-making and action (an agent) decides its own behavior while recognizing environmental information, including the presence of other agents. In the present embodiment, each agent decides its own behavior according to constraints such as the following.
・ An agent that arrives at a landing registers a call in the direction of its destination floor and enters a waiting state.
・ When a car arrives, agents board unless the passenger capacity has been reached, and alight at their destination floor.
・ The number of people who can ride in a car varies with the weather (fewer people can ride in rain or at low temperatures).
・ On non-holidays, agents move to predetermined floors by the clock-in time and after the clock-out time.
・ By the event start time, predetermined agents move to the floor where the event is held.
・ In addition, a certain number of agents move randomly between floors.
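A few of the agent constraints listed above can be sketched as small functions. All names and numeric values here (`effective_capacity`, the 5 °C cold threshold, the reduction of 2 riders) are assumptions for illustration; the document does not give concrete values.

```python
def call_direction(current_floor, destination_floor):
    """Direction of the hall call an arriving agent registers: +1 up, -1 down."""
    return 1 if destination_floor > current_floor else -1


def effective_capacity(rated_capacity, raining, temperature_c,
                       cold_below_c=5.0, reduction=2):
    """Riders a car accepts; fewer in rain or cold (threshold and reduction are assumed)."""
    if raining or temperature_c < cold_below_c:
        return max(rated_capacity - reduction, 0)
    return rated_capacity


def board(waiting, onboard, capacity):
    """Board waiting agents in arrival order until the car is full.

    Returns (new_onboard, still_waiting).
    """
    free = max(capacity - len(onboard), 0)
    return onboard + waiting[:free], waiting[free:]


capacity = effective_capacity(rated_capacity=10, raining=True, temperature_c=20.0)
print(capacity, board(["a", "b", "c"], ["x"], 3))
```

The scheduled movements (clock-in, clock-out, events) would be driven by the generated conditions and are not shown here.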

The condition generation unit 201 randomly determines the parameters that concretely define these constraints (weather information, attendance information, event information, and calendar information). The condition generation unit 201 also randomly determines the elevator control mode S2, which affects the outcome of the agents' behavior. Since the control mode S2 can be defined as a set of control parameters, the condition generation unit 201 randomly determines a combination of control parameters.
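Random condition generation of this kind might look like the sketch below. Every field name, value range, and control parameter (`wait_time_weight`, `energy_weight`, `lobby_standby_cars`) is a hypothetical placeholder; the document only says that the parameters and the control-parameter combination are chosen at random.

```python
import random


def generate_conditions(rng, n_floors):
    """Draw one random scenario plus one random control mode (all fields assumed)."""
    return {
        "weather": rng.choice(["clear", "rain", "snow"]),
        "temperature_c": rng.uniform(-5.0, 35.0),
        "is_holiday": rng.random() < 0.3,
        "work_start_hour": rng.choice([8, 9, 10]),
        "work_end_hour": rng.choice([17, 18, 19]),
        "event_floor": rng.randrange(1, n_floors + 1),
        "event_start_hour": rng.randrange(9, 18),
        # Control mode S2 expressed as a combination of control parameters.
        "control_mode": {
            "wait_time_weight": rng.random(),
            "energy_weight": rng.random(),
            "lobby_standby_cars": rng.randrange(0, 3),
        },
    }


conditions = generate_conditions(random.Random(42), n_floors=10)
print(conditions["weather"], conditions["control_mode"])
```

Passing in a seeded `random.Random` instance keeps each generated scenario reproducible, which helps when a simulation run needs to be replayed.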

The simulator 203 executes the multi-agent simulation on the basis of the conditions generated by the condition generation unit 201. Each agent acts autonomously while obeying the constraints defined by the parameters that the condition generation unit 201 determined. Elevator operation is controlled according to a known group control algorithm, using the control mode S2. The simulator 203 measures, at fixed time intervals, the number of people waiting on each floor and the maximum waiting time on each floor, which result from the agents' behavior.

The simulator 203 records the measured number of people waiting on each floor and the maximum waiting time on each floor in a storage area (not shown), together with the weather information, attendance information, event information, calendar information, and control mode S2. In this way, the state variables S (situation data S1 and control mode S2) are collected.

The determination unit 205 calculates the suitability determination result D, which is an evaluation index for the result of the simulation by the condition generation unit 201 and the simulator 203. In the present embodiment, control is considered better the shorter the users' maximum waiting time is, and the maximum waiting time that occurred in the simulation (the maximum, within the trial, of the maximum waiting times on the individual floors) is used as the determination data D.
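The determination data D described above is a maximum over time samples of the per-floor maximum waiting times. A minimal sketch, assuming the simulator's periodic measurements are available as a list of per-floor lists (the function name and data layout are hypothetical):

```python
def determination_data(samples):
    """Largest waiting time seen in one trial.

    samples: one entry per measurement time, each a list holding the
    maximum waiting time on every floor at that time (seconds).
    """
    return max(
        (max(per_floor) for per_floor in samples if per_floor),
        default=0.0,
    )


trial = [
    [3.0, 10.0],   # t = 0: floor 1, floor 2
    [12.0, 4.0],   # t = 1
    [5.0, 5.0],    # t = 2
]
print(determination_data(trial))
```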

The learning unit 207 learns the optimum control mode S2 according to an arbitrary machine learning algorithm. Using the results of the multiple simulations performed by the condition generation unit 201 and the simulator 203, the learning unit 207 repeats learning with the state variables S and the determination data D. By repeating the learning cycle, the learning unit 207 can gradually identify the correlation between the situation data S1 and the control mode S2 and approach the optimum solution.

The learning algorithm used by the learning unit 207 is not particularly limited; the present embodiment shows an example that uses reinforcement learning. In reinforcement learning, s denotes the state indicating what the environment currently looks like, a denotes an action the agent can take, and r denotes the reward obtained when the agent takes an action in a given state; the aim is to maximize the state-action value Q (Equation 1) when the agent repeats actions by trial and error. Note that the agent referred to here differs from the agents in the multi-agent simulation described above: it is a virtual entity that searches for the optimum control mode S2 (a combination of control parameters) in reinforcement learning.
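Equation 1 itself is not reproduced in this text. For orientation only, a common textbook definition of the state-action value is given below; it is an assumption about the general form, not a reconstruction of the document's equation. The discount factor \gamma and the policy \pi are symbols introduced here and do not appear in the source.

```latex
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\!\left[\, \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1} \;\middle|\; s_t = s,\ a_t = a \right],
\qquad 0 \le \gamma < 1
```

Under this definition, maximizing Q means choosing, in each state s, the action a whose expected discounted sum of future rewards r is largest.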

FIG. 3 is a diagram showing the configuration of the learning unit 207 when it performs reinforcement learning. The learning unit 207 includes a reward calculation unit 2071 that calculates the reward r for an action a in a state s, and a value function update unit 2073 that updates the function Q based on the reward r.

The reward calculation unit 2071 outputs, for example, a positive reward r when the determination data D under the state variables S is judged appropriate (for example, when the maximum waiting time that occurred in the simulation is below a predetermined threshold), and a negative reward r when it is judged inappropriate (for example, when the maximum waiting time that occurred in the simulation exceeds the predetermined threshold). The absolute values of the positive reward r and the negative reward r may be the same or different.

Alternatively, the reward calculation unit 2071 may calculate a reward r that depends on the value of the determination data D, based on a predetermined evaluation function, evaluation table, or the like. For example, it can use an evaluation function or evaluation table in which, when D is below the threshold, the positive reward r becomes larger as D becomes smaller, and, when D exceeds the threshold, the magnitude of the negative reward r becomes larger as D becomes larger. This allows the reward r to be set in finer detail.
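Both reward schemes described above can be sketched in a few lines. The function names, the reward magnitudes, and the linear form of the graded reward are assumptions for illustration; the source does not say how D exactly equal to the threshold is handled, so treating it as negative (binary case) or zero (graded case) is also an assumption.

```python
def binary_reward(d, threshold, positive=1.0, negative=-1.0):
    """Positive reward when the maximum wait D is below the threshold, else negative.

    D equal to the threshold is treated as negative here (an assumption).
    """
    return positive if d < threshold else negative


def graded_reward(d, threshold):
    """Linear evaluation function (assumed form).

    Below the threshold the reward grows as D shrinks; above it the
    penalty grows as D grows. D == threshold gives 0.
    """
    return (threshold - d) / threshold


for d in (0.0, 30.0, 90.0):
    print(d, binary_reward(d, 60.0), graded_reward(d, 60.0))
```

The graded form gives the learner a signal even between two runs that fall on the same side of the threshold, which is the finer reward setting the text refers to.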

Using a technique such as Q-learning, SARSA, or the Monte Carlo method, the value function update unit 2073 can keep updating the function Q based on the reward r over repeated trials (each trial executing the next action a_{t+1} in the state s_t brought about by the previous action a_t). Since these techniques are well known, they are not described in detail here.
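As one example of the well-known techniques named above, a tabular Q-learning update can be sketched as follows. The dictionary-based table, the function name `q_learning_update`, and the values of the learning rate `alpha` and discount factor `gamma` are assumptions; the document does not specify which method or hyperparameters are used.

```python
def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning step; unseen entries default to 0.0.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


q = {}
actions = ["mode_a", "mode_b"]  # hypothetical control-mode labels
q_learning_update(q, "s0", "mode_a", 1.0, "s1", actions)
print(q)
```

SARSA differs only in using the value of the action actually taken next instead of the maximum over `actions`.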

That is, the learning unit 207 updates the function Q by repeatedly executing the elevator operation control simulation. This process can be carried out, for example, by the following procedure. One trial here is, for example, one run of the elevator operation control simulation over a predetermined period of time.
(1) In the first trial, under the conditions given by the condition generation unit 201, a parameter of the control mode S2 is determined at random as the action a, and the simulator 203 runs the simulation. The determination unit 205 outputs D as the trial result. The reward calculation unit 2071 calculates the reward r based on D, and the value function update unit 2073 updates the function Q based on r.
(2) In the next trial, a parameter of the control mode S2 is changed according to a predetermined rule as the next action a, and the function Q is updated.
(3) Trials like the one in (2) are repeated a fixed number of times.
(4) The state is reset to the state of (1), and the set of (2) and (3) is repeated a fixed number of times.
(5) While the conditions given by the condition generation unit 201 are changed at random, the set of (1) to (4) is repeated a fixed number of times.
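The nesting of steps (1) to (5) can be sketched as three loops. Everything concrete below is a stand-in: `toy_simulate` replaces the multi-agent simulator with a made-up formula, conditions and control modes are reduced to small integers, and the value update is a simplified running average without a next-state term rather than any of the methods named in the text.

```python
import random


def toy_simulate(condition, mode, rng):
    """Hypothetical stand-in for the simulator: returns a maximum wait D in seconds.

    Waits are shortest when the mode index matches the condition index.
    """
    return 30.0 + 20.0 * abs(condition - mode) + rng.uniform(0.0, 5.0)


def train(n_conditions=3, n_resets=5, n_trials=10,
          threshold=45.0, alpha=0.2, seed=0):
    rng = random.Random(seed)
    modes = [0, 1, 2]
    q = {}
    for _ in range(n_conditions):            # step (5): new random conditions
        condition = rng.choice([0, 1, 2])
        for _ in range(n_resets):            # step (4): reset and repeat
            mode = rng.choice(modes)         # step (1): random first action
            for _ in range(n_trials):        # steps (1) to (3): repeated trials
                d = toy_simulate(condition, mode, rng)
                r = 1.0 if d < threshold else -1.0
                old = q.get((condition, mode), 0.0)
                q[(condition, mode)] = old + alpha * (r - old)
                mode = (mode + 1) % len(modes)   # step (2): fixed change rule
    return q


q = train()
for (condition, mode), value in sorted(q.items()):
    print(condition, mode, round(value, 3))
```

In this toy setting the learned value is positive only where the mode matches the condition, which mirrors the intended outcome of associating each situation with its best control mode.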

Although embodiments of the present invention have been described above, the present invention is not limited to these embodiments and can be modified as appropriate without departing from its spirit. Within the scope of the invention, any component of the embodiments may be modified or omitted.

For example, the machine learning technique shown in the above embodiment is merely an example, and it can be replaced by another machine learning technique as long as that technique achieves the effect of learning the relationship between the situation data S1 and the control mode S2. For example, a technique using a neural network can be used in reinforcement learning. Other machine learning methods, such as supervised learning, may also be used.

The information processing of the present invention may be realized by hardware, or by a CPU executing a computer program. The computer program can be supplied to a computer on various types of non-transitory computer readable media or transitory computer readable media.

10 Elevator control device
11a to 11n Landing cameras
12a to 12m In-car cameras
13 User identification unit
15 Usage status recording unit
17 External information acquisition unit
19 Car allocation unit
20 Machine learning device
201 Condition generation unit
203 Simulator
205 Determination unit
207 Learning unit
2071 Reward calculation unit
2073 Value function update unit

Claims

1. An elevator control device comprising:
a user identification unit that recognizes users waiting for an elevator;
a usage status recording unit that specifies at least the waiting time of the users as the usage status of the elevator;
an external information acquisition unit that acquires external information affecting traffic demand;
a machine learning unit that determines an optimum control mode based on at least the waiting time and the external information; and
a car allocation unit that controls the operation of the elevator based on the optimum control mode.

2. The elevator control device according to claim 1, wherein the machine learning device comprises:
a simulator that acquires, as state variables, situation data including the maximum value of the waiting time and a control mode, by a multi-agent elevator operation simulation;
a determination unit that outputs determination data indicating the suitability of the simulation result; and
a learning unit that associates the situation data with the control mode using the state variables and the determination data.

3. The elevator control device according to claim 2, wherein the learning unit comprises:
a reward calculation unit that obtains a reward related to the determination data; and
a value function update unit that uses the reward to update a value function indicating the value of the control mode for the situation data.

4. A machine learning device comprising:
a simulator that acquires, as state variables, situation data including the maximum value of users' waiting time and a control mode, by a multi-agent elevator operation simulation;
a determination unit that outputs determination data indicating the suitability of the simulation result; and
a learning unit that associates the situation data with the control mode using the state variables and the determination data.

5. An elevator control method in which a computer executes:
a user identification step of recognizing users waiting for an elevator;
a usage status recording step of specifying at least the waiting time of the users as the usage status of the elevator;
an external information acquisition step of acquiring external information affecting traffic demand;
a determination step of determining an optimum control mode based on at least the waiting time and the external information; and
a car allocation step of controlling the operation of the elevator based on the optimum control mode.

6. A machine learning method in which a computer executes:
a simulation step of acquiring, as state variables, situation data including the maximum value of users' waiting time and a control mode, by a multi-agent elevator operation simulation;
a determination step of outputting determination data indicating the suitability of the simulation result; and
a learning step of associating the situation data with the control mode using the state variables and the determination data.

7. A program for causing a computer to execute the method according to claim 5 or 6.