JP2006302057A

JP2006302057A - Medical voice information processing apparatus and medical voice information processing program

Info

Publication number: JP2006302057A
Application number: JP2005124471A
Authority: JP
Inventors: Satoru Hayamizu (速水悟); Hirotsugu Asai (浅井博次); Hideki Tanahashi (棚橋英樹); Makoto Kanekawa (金川誠); Yohei Ishii (石井洋平)
Original assignee: Gifu University NUC; Sanyo Electric Co Ltd; Gifu Prefecture
Current assignee: Gifu University NUC; Sanyo Electric Co Ltd; Gifu Prefecture
Priority date: 2005-04-22
Filing date: 2005-04-22
Publication date: 2006-11-02

Abstract

PROBLEM TO BE SOLVED: To provide a medical voice information processing apparatus capable of detecting specific language expressions and the like in conversations and instructions exchanged in a treatment room, and thereby detecting events corresponding to treatment actions.
SOLUTION: A voice processing device 1 mainly comprises a processing device main body 4 that performs processing such as recording medical voice information 3, a plurality of microphones 6 for acquiring speech as voice information 5, and a liquid crystal display 8 and a speaker 9 for presenting treatment action presentation information 21. From the voice information 5 acquired by the microphones 6, characteristic language expressions 25 for treatment actions, stored in a language database 28, are detected within a predetermined time window, and events marking the key points of the treatment actions are detected. As a result, the various events that occur during continuously performed treatment can be detected in real time and recorded in an event index 2.
[Selection] Figure 1

Description

The present invention relates to a medical voice information processing apparatus and a medical voice information processing program. More particularly, it relates to an apparatus and program that extract specific language expressions contained in instructions, conversation, and the like exchanged during treatment. From these expressions they can detect treatment actions or events carried out in accordance with a standard treatment program.

Conventionally, in clinical settings such as emergency and critical care, standard treatment programs (for example, ACLS and JATEC) have been adopted so that physicians can accurately assess the condition of a transported patient and deliver appropriate, prompt treatment. In these programs, the sequence of treatment actions is laid out in advance as a flowchart. The programs also define the specific content of each treatment action and explicit conditions for transitioning or branching to the next one. The practitioner checks the patient's condition and the response to and results of treatment. The practitioner then decides on and carries out the next treatment action based on the transition or branch conditions defined in the standard treatment program. Providing the practitioner with accurate information for each transition from one treatment action to the next can prevent errors of judgment and allow treatment to proceed quickly. When moving from one treatment action to the next, the practitioner judges according to the transition conditions predefined in the standard treatment program. That judgment draws on information such as vital signs (heart rate, pulse rate, blood pressure), level of consciousness, the state of any injuries, and radiographic findings.

During treatment, physicians give instructions to other physicians or nurses mainly by voice, and nurses and others acknowledge them the same way. Speech in the treatment room is therefore one of the key communication tools for establishing shared understanding. Because utterances in the treatment room must support a rapid and appropriate response, they take the form of concise, precise expressions such as single words and short phrases.

Meanwhile, after performing general medical care and treatment, or treatment according to a standard treatment program as described above, practitioners create a medical chart for each patient. The chart records the specific content of each treatment and procedure, the patient's medical history, and similar information. To simplify this after-the-fact charting work, electronic medical record systems are being developed that capture the state of the treatment room as video and acquire medical information in digitized form. Systems and apparatuses that also capture the details of treatment as audio along with the video have already been disclosed (see, for example, Patent Documents 1 and 2). A system has also been developed that assigns tags in advance to the audio contained in recorded video and extracts the video or audio in association with those tags (see, for example, Non-Patent Document 1). This makes it easy to summarize video recorded continuously over time.

Patent Document 1: Japanese Patent No. 3074769
Patent Document 2: JP 2002-215797 A
Non-Patent Document 1: Katsuhiro Hashisako and two others, "Audio tag extraction for video summarization", Proceedings of the 10th Image Sensing Symposium, Image Sensing Technology Study Group, 2004, pp. 179-184

After general medical care and treatment, or treatment according to a standard treatment program, practitioners have created a chart for each patient recording the specific content of each treatment and procedure and the patient's medical history. In emergency and critical care in particular, saving the patient's life naturally takes top priority. The content of each treatment action and the results of each procedure (the medical record) have therefore always been entered into the chart after the fact. Charting has consequently relied on memory, with practitioners recalling each treatment they performed. As a result, charting has been inefficient, and inaccurate or ambiguous entries and omissions could remain.

To supplement information not recorded in the chart, some facilities install video input devices such as video cameras in the treatment room. These devices record the entire course of events as video, from the patient's arrival until all treatment actions are completed. They are typically mounted near the ceiling to capture a bird's-eye view of the whole room. However, their shooting angle is often fixed, so the required images cannot always be obtained when people administering treatment or medical equipment block the view. Using multiple video input devices mitigates this problem, and the overall flow of treatment can then be followed. It nevertheless remains very difficult to check the movements involved in individual treatment actions or the state of the affected area, or to assess the practitioner's technique. Video input devices with a variable shooting angle and zoom can be controlled manually. This is impractical, however, because it requires dedicated staff and becomes hard to manage when there are several devices. Furthermore, reviewing the recorded video and audio for chart entry and similar purposes has required a great deal of labor. Systems that add a search function based on key information are known to improve workability. However, the key information they provide is not systematically classified, so searching remains inefficient.

In particular, when a chart is prepared after treatment, the video and audio from the start to the end of treatment are played back in sequence. The scenes needed for charting then have to be located almost by hand. Moreover, almost no existing system records voice information (medical voice information) linked with a standard treatment program that defines the treatment procedure, which would make later analysis and retrieval easy. In addition, applying the same recognition processing to every utterance made during treatment would require very high recognition accuracy and substantial processing power in the apparatus. There has therefore been a demand for an approach that restricts recognition in advance to the language expressions most likely to occur. Such an approach would improve the accuracy of detecting treatment actions or the events corresponding to them.

In view of the above circumstances, an object of the present invention is to provide a medical voice information processing apparatus and a medical voice information processing program that can detect specific language expressions and the like in conversations and instructions exchanged in a treatment room. From these expressions, they detect treatment actions or the key events corresponding to treatment actions.

To solve the above problems, the medical voice information processing apparatus according to the present invention mainly comprises: "treatment action presentation means for presenting treatment action presentation information corresponding to a treatment action, based on a standard treatment program in which the procedure of treatment actions is standardized; treatment action recording means for recording the treatment actions in time series, structured in an event index, in synchronization with the presented treatment action presentation information; voice information acquisition means for acquiring voice produced during the treatment actions as voice information; language database storage means for storing, in a language database, characteristic language expressions indicating the treatment actions together with statistical information giving, for each treatment action, the likelihood that each language expression occurs; extraction means for extracting, from the acquired voice information, the language expressions stored in the language database; treatment event detection means for detecting, from the extracted language expressions and using the statistical information, a treatment action of the standard treatment program or an event corresponding to that treatment action; and voice information recording means for recording, structured in the event index, medical voice information including at least one of the detected treatment action or event, the extracted language expression, and the time at which the language expression occurred."

Here, a standard treatment program is one in which treatment actions are standardized along a series of steps. Decision criteria, such as branch or transition conditions, are defined between the treatment actions. ACLS (Advanced Cardiac Life Support), used in emergency medicine, is a typical example. ACLS is a cardiopulmonary resuscitation technique adopted at medical institutions such as emergency and critical care centers. It is performed by physicians, or by adequately trained personnel under a physician's direction, using medical equipment and drugs. It is now recognized as a global standard for cardiopulmonary resuscitation. Standard treatment programs established to date are largely limited to cardiopulmonary resuscitation and similar procedures. Programs are also being developed for other conditions such as impaired consciousness, severe burns, gas and drug poisoning, and trauma.

Treatment is carried out based on the standard treatment program. Information is recorded such as the specific treatment content and technique, the medical equipment used, the type of drug used and its administration conditions, and the patient's response to treatment. Treatment action presentation information is information for carrying out a treatment action based on the standard treatment program. It includes, for example, treatment content data indicating the current treatment and condition data indicating the criteria for deciding which treatment action to perform next. This information may take the form of treatment action group information, in which these data are linked for each treatment action.
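As an illustration only, the linked "treatment action group information" could be modeled as a small state machine. In it, each step carries its treatment content data and its condition data. The sketch assumes Python 3.8+. The class `TreatmentStep`, the step names, and the conditions are hypothetical placeholders; they are not taken from ACLS, JATEC, or the patent's embodiment.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Findings = Dict[str, object]  # observed patient findings, e.g. {"pulse": 0}


@dataclass
class TreatmentStep:
    """One node of a hypothetical standard treatment program flowchart."""
    step_id: str
    treatment_content: str  # "treatment content data": what to do at this step
    # "condition data": (description, predicate on findings, next step id), checked in order
    transitions: List[Tuple[str, Callable[[Findings], bool], str]] = field(default_factory=list)

    def presentation_info(self) -> str:
        """Text a display in the treatment room could show for this step."""
        lines = [f"[{self.step_id}] {self.treatment_content}"]
        lines += [f"  if {desc} -> {nxt}" for desc, _, nxt in self.transitions]
        return "\n".join(lines)

    def next_step(self, findings: Findings) -> Optional[str]:
        """Return the first next step whose condition holds, or None to stay here."""
        for _, predicate, nxt in self.transitions:
            if predicate(findings):
                return nxt
        return None


# Illustrative program only; NOT an actual ACLS or JATEC algorithm.
PROGRAM: Dict[str, TreatmentStep] = {
    "airway": TreatmentStep(
        "airway", "Secure the airway (e.g. tracheal intubation)",
        [("airway secured", lambda f: bool(f.get("airway_secured")), "circulation")]),
    "circulation": TreatmentStep(
        "circulation", "Assess circulation (blood pressure, pulse)",
        [("no pulse", lambda f: f.get("pulse") == 0, "defibrillation"),
         ("pulse present", lambda f: f.get("pulse", 0) > 0, "imaging")]),
    "defibrillation": TreatmentStep("defibrillation", "Prepare defibrillator"),
    "imaging": TreatmentStep("imaging", "Echo / X-ray as indicated"),
}
```

For instance, `PROGRAM["circulation"].next_step({"pulse": 0})` returns `"defibrillation"`. Findings that satisfy no condition return `None`, meaning the current step is still in effect.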

The treatment action presentation means presents the content of the standard treatment program, as selected by the practitioner, to the practitioners actually performing the treatment. For example, treatment content data and condition data are shown on a large display installed in the treatment room, so that practitioners can carry out each treatment action while confirming it. The display can also show information such as the time elapsed since the patient was brought into the treatment room and the time since the transition to the current treatment action. The content of the treatment action may also be conveyed audibly to practitioners through a speaker or the like.

Here, the event index integrates, in time series, the various events detected from video information, voice information, text information, sensor information, and so on. It records them in a structured form that includes the type of information and the content of each event. By drawing on medical expertise, the relationships between events can also be recorded. Referring to the event index makes it possible to classify events, select which information to reproduce during playback, and so on, so that the diverse information becomes easy to use.
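A minimal sketch of an event index follows. It assumes that events from every modality are reduced to a timestamp, a source, a kind, and a label, with optional links to related events. `Event`, `EventIndex`, and all field names are hypothetical; the patent does not specify a data layout.

```python
import bisect
import itertools
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Event:
    time_s: float   # seconds since the patient arrived
    source: str     # e.g. "voice", "video", "text", "sensor", "protocol"
    kind: str       # e.g. "treatment_action" or "language_expression"
    label: str      # e.g. "tracheal intubation"
    related_to: List[int] = field(default_factory=list)  # ids of related events
    event_id: int = -1  # assigned by the index on insertion


class EventIndex:
    """Time-ordered, structured store of multimodal events (hypothetical design)."""

    def __init__(self) -> None:
        self._entries: List[Tuple[float, int, Event]] = []  # kept sorted by (time, id)
        self._ids = itertools.count()

    def add(self, event: Event) -> int:
        event.event_id = next(self._ids)
        # The unique id breaks ties on time, so Event objects are never compared.
        bisect.insort(self._entries, (event.time_s, event.event_id, event))
        return event.event_id

    def between(self, start_s: float, end_s: float,
                source: Optional[str] = None) -> List[Event]:
        """Events with start_s <= time_s <= end_s, optionally filtered by source."""
        lo = bisect.bisect_left(self._entries, (start_s, -1))
        out = []
        for t, _, ev in self._entries[lo:]:
            if t > end_s:
                break
            if source is None or ev.source == source:
                out.append(ev)
        return out
```

Because the entries stay sorted by time, a playback tool can fetch only the voice events in a given interval with `between(start, end, source="voice")`. This corresponds to selecting information for reproduction as described above.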

The number and placement of voice input devices, such as microphones, can be chosen to suit the treatment room. Three arrangements are possible. The voice of each individual practitioner may be identified and captured separately. Alternatively, microphones installed in advance at several locations in the room may capture, as a single stream of voice information, the practitioners' conversations and instructions together with operating sounds and alarms from medical equipment. These two approaches may also be combined.

The language database records language expressions likely to occur during treatment, together with statistical information indicating how likely each is to occur. The expressions fall into four groups:
- names of treatment actions, such as "tracheal intubation", "tracheotomy", and "cricothyroid ligament puncture";
- names of medical devices and instruments used in treatment, such as "defibrillator", "echo" (ultrasound), and "X-ray imaging";
- reports of vital values indicating the patient's condition, such as "blood pressure ○○", "pulse △△", and "cardiac arrest";
- other words and terms, such as the names of drugs to be administered.

For treatment aimed at securing the airway, for example, the statistical information is set high for expressions such as "tracheal intubation" and "tracheotomy" and low for expressions such as "echo" and "X-ray imaging". Conversely, consider a patient who complains of abdominal pain but is conscious and breathing steadily. There, "tracheal intubation" and "tracheotomy" are very unlikely to occur, whereas device names such as "echo", used to check for internal abdominal bleeding, are likely to be spoken during treatment. Accordingly, for each treatment action, statistical information on the likelihood of occurrence is calculated in advance and recorded in association with each language expression.
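The patent says only that the likelihood of each expression is "calculated statistically in advance" for each treatment action. One plausible approach, shown here as a hypothetical sketch, estimates P(expression | treatment action) as a smoothed relative frequency from transcripts labeled with the treatment being performed. It then ranks treatment actions by the summed log-probability of the expressions heard, which is a naive-Bayes-style independence assumption. The function names, the toy vocabulary, and the toy sessions are illustrative only.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

LanguageDB = Dict[str, Dict[str, float]]  # treatment action -> {expression: probability}


def build_language_db(labeled_sessions: Iterable[Tuple[str, List[str]]],
                      vocabulary: List[str], alpha: float = 1.0) -> LanguageDB:
    """Estimate P(expression | treatment) from (treatment, expressions heard) pairs.

    Additive (Laplace) smoothing with `alpha` keeps expressions that were never
    observed for a treatment from getting probability zero.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for treatment, expressions in labeled_sessions:
        counts[treatment].update(e for e in expressions if e in vocabulary)
    db: LanguageDB = {}
    for treatment, c in counts.items():
        total = sum(c.values()) + alpha * len(vocabulary)
        db[treatment] = {e: (c[e] + alpha) / total for e in vocabulary}
    return db


def rank_treatments(db: LanguageDB, observed: List[str]) -> List[Tuple[str, float]]:
    """Rank treatments by summed log-probability of the observed expressions."""
    scores = {t: sum(math.log(p[e]) for e in observed if e in p) for t, p in db.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data for illustration only.
VOCAB = ["tracheal intubation", "tracheotomy", "echo", "X-ray imaging"]
SESSIONS = [
    ("airway", ["tracheal intubation", "tracheal intubation", "tracheotomy"]),
    ("abdominal assessment", ["echo", "echo", "X-ray imaging"]),
]
db = build_language_db(SESSIONS, VOCAB)
```

With this toy data, hearing "echo" ranks the abdominal assessment above airway management, which mirrors the abdominal-pain example in the text.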

Thus, in the medical voice information processing apparatus of the present invention, treatment action presentation information is presented based on the standard treatment program. A record of the treatment actions performed is stored in the event index in structured, time-series form. Meanwhile, the voices of practitioners and others during treatment are acquired as voice information through voice input devices such as microphones. Language expressions characteristic of treatment actions are then extracted from the acquired voice information by reference to the language database. The database stores those expressions together with statistical information quantifying how likely each is to occur in each treatment action. Because extraction is limited in advance to expressions that are likely to be detected, recognition accuracy improves. The extracted language expressions are recorded in the event index in structured form, synchronized with the corresponding treatment actions. This makes later analysis and verification of the treatment easy.
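Limiting extraction to the database vocabulary can be pictured as keyword spotting over speech recognizer output. The sketch assumes, hypothetically, that a recognizer already yields time-stamped words; it does not perform speech recognition itself. `spot_expressions` is an illustrative name.

```python
from typing import List, Tuple

Token = Tuple[float, str]  # (start time in seconds, recognized word); assumed ASR output


def spot_expressions(tokens: List[Token], vocabulary: List[str]) -> List[Tuple[float, str]]:
    """Return (time, expression) for each vocabulary phrase found in the token stream.

    Phrases are matched case-insensitively on whole words, longest phrase first.
    Matched tokens are consumed so overlapping hits are not double-counted, and
    words outside the vocabulary are simply skipped.
    """
    phrases = sorted(((tuple(p.lower().split()), p) for p in vocabulary),
                     key=lambda item: len(item[0]), reverse=True)
    words = [w.lower() for _, w in tokens]
    hits: List[Tuple[float, str]] = []
    i = 0
    while i < len(words):
        for key, phrase in phrases:
            if tuple(words[i:i + len(key)]) == key:
                hits.append((tokens[i][0], phrase))  # time of the phrase's first word
                i += len(key)
                break
        else:  # no phrase starts here
            i += 1
    return hits
```

Each hit keeps the start time of its first word. That time supplies the "time at which the language expression occurred" that the claimed apparatus records in the event index.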

In addition to the above configuration, the medical voice information processing apparatus according to the present invention may be such that "the treatment event detection means performs detection based on at least one of: the combination of multiple language expressions extracted within a time window of predetermined range before and after the occurrence of a language expression; the number of times the language expression is extracted; and the order in which the language expressions occur."

Accordingly, in the medical voice information processing apparatus of the present invention, the treatment event detection means detects treatment actions or events from several factors within a predetermined time window: the combination of language expressions, the number of times each expression is extracted, the order in which they occur, and so on. For example, take a one-minute window starting when the expression "tracheal intubation" is first detected. An event may be detected if "tracheal intubation" is detected three or more times within that minute. An event may also be detected if "tracheotomy" is detected following "tracheal intubation". A single detection of a particular expression is therefore not enough to detect a treatment action or event; detection takes into account multiple occurrences, their order, and so on. As a result, treatment actions and the like are detected more accurately, and erroneous information is less likely to be recorded in the event index. The time window, the order of occurrence, and similar parameters may be determined in advance based on medical knowledge.
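The two examples in the paragraph above can be expressed as simple rules over time-stamped hits. The first is three or more "tracheal intubation" hits within one minute; the second is "tracheotomy" following "tracheal intubation". This is a sketch, not the patent's detection logic. The function names are hypothetical, and the time window on the ordering rule is an added assumption.

```python
from typing import List, Optional, Tuple

Hit = Tuple[float, str]  # (time in seconds, extracted language expression)


def count_rule(hits: List[Hit], expr: str, window_s: float, min_count: int) -> Optional[float]:
    """Fire when `expr` occurs at least `min_count` times within `window_s` seconds
    of one of its occurrences. Return the time of the occurrence that completes
    the count, or None if the rule never fires."""
    times = sorted(t for t, e in hits if e == expr)
    for i, start in enumerate(times):
        in_window = [t for t in times[i:] if t - start <= window_s]
        if len(in_window) >= min_count:
            return in_window[min_count - 1]
    return None


def order_rule(hits: List[Hit], first: str, then: str, window_s: float) -> Optional[float]:
    """Return the time of the first `then` that follows a `first` within
    `window_s` seconds, or None."""
    ordered = sorted(hits)
    for i, (t1, e1) in enumerate(ordered):
        if e1 != first:
            continue
        for t2, e2 in ordered[i + 1:]:
            if t2 - t1 > window_s:
                break
            if e2 == then:
                return t2
    return None
```

Requiring repetition or a plausible ordering before an event is recorded trades a little latency for fewer false detections from a single misrecognized word.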

In addition to the above configuration, the medical voice information processing apparatus according to the present invention may further comprise "undetected information holding means for holding, as undetected information, the treatment actions or events not selected when the treatment event detection means selects and detects one of a plurality of candidate treatment actions or events; and correction means for correcting, on the basis of a treatment action or event detected later, the earlier detection by means of the treatment actions or events held as undetected information."

Accordingly, in the medical voice information processing apparatus of the present invention, several candidates are listed for a treatment action or event, and the most suitable one is recorded. Information about the candidates that were not selected is stored temporarily at that time. It can then play an auxiliary role when subsequent treatment actions and the like are detected. Even after a treatment action or event has been determined, later treatment may reveal that an incorrect treatment action was selected at the earlier stage. In that case, the apparatus does not go back and redo the detection. Instead, it takes the treatment action or the like that best matches the current situation from the undetected information, which the undetected information holding means recorded at the time of selection. The earlier treatment action or the like can therefore be corrected promptly, without any new detection processing. This reduces the load on the medical voice information processing apparatus, and the correction ensures that correct medical voice information is recorded in the event index.
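The hold-and-correct behavior can be sketched as a detector that keeps the losing candidates of each decision. When a later action is established, it swaps in a held runner-up if the earlier choice turns out to be inconsistent. Several parts of the sketch are simplifying assumptions for illustration: the consistency check (`allowed_predecessors`), the class names, and revisiting only the most recent detection.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Candidate = Tuple[str, float]  # (treatment action or event, detection score)


@dataclass
class Detection:
    time_s: float
    chosen: str
    chosen_score: float
    runners_up: List[Candidate]  # the held "undetected information", best first


@dataclass
class CorrectingDetector:
    # Hypothetical protocol knowledge: actions allowed to precede a given action.
    allowed_predecessors: Dict[str, Set[str]]
    history: List[Detection] = field(default_factory=list)

    def detect(self, time_s: float, candidates: List[Candidate]) -> str:
        """Record the best-scoring candidate and keep the others instead of discarding them."""
        ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
        (name, score), rest = ranked[0], ranked[1:]
        self.history.append(Detection(time_s, name, score, rest))
        return name

    def confirm(self, later_action: str) -> List[Tuple[float, str, str]]:
        """When `later_action` is established, check the most recent detection.
        If it cannot precede `later_action`, swap in the best held runner-up that
        can, without re-running detection. Returns [(time, old, new)] or []."""
        allowed = self.allowed_predecessors.get(later_action)
        if not allowed or not self.history:
            return []
        last = self.history[-1]
        if last.chosen in allowed:
            return []
        for name, score in last.runners_up:  # sorted by score, best first
            if name in allowed:
                old = (last.chosen, last.chosen_score)
                last.chosen, last.chosen_score = name, score
                remaining = [c for c in last.runners_up if c[0] != name] + [old]
                last.runners_up = sorted(remaining, key=lambda c: c[1], reverse=True)
                return [(last.time_s, old[0], name)]
        return []
```

The runner-ups are kept at decision time, so the correction is a lookup rather than a second pass over the recorded voice information. This is the load reduction the paragraph describes.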

In addition to the above configuration, the medical voice information processing apparatus according to the present invention may further comprise "first scoring means for scoring the extracted language expressions; and second scoring means for scoring modality information, namely at least one of video information, text information, and sensor information acquired in correspondence with the treatment action or event; wherein the treatment event detection means integrates the scored language expressions and the scored modality information to detect the treatment action or event."

Here, modality information covers three kinds of data. The first is video information capturing the treatment room and the like, obtained through video input devices such as cameras. The second is text information, in which treatment results, treatment actions, and the like are entered through text input devices such as consoles. The third is sensor information sensed by medical or non-medical devices. Modality information thus encompasses all information relating to the treatment actions other than the voice information described above.

The first and second scoring means compute, for example, the degree to which the detected information matches a given treatment action and express it as a numerical score. The first score is obtained from the input voice information and the second score from the other modality information. The two are combined, and treatment events are detected based on the combined value.

Accordingly, in the medical voice information processing apparatus of the present invention, both the extracted language expressions and the modality information such as video are scored (quantified), and treatment actions or events are detected from the results. Detection from voice information alone misses fine movements and the like that the spoken language expressions do not reveal. Combining modality information such as video makes it possible to recognize the content of treatment actions more precisely from those details.
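The patent does not say how the first and second scores are integrated. A weighted average per treatment action, followed by a detection threshold, is one simple option and is sketched below. The weights, the 0.5 threshold, and the assumption that every score is normalized to [0, 1] are all hypothetical.

```python
from typing import Dict, Optional, Tuple

Scores = Dict[str, float]  # treatment action -> score in [0, 1] (assumed normalization)


def fuse_scores(voice: Scores, modalities: Dict[str, Scores],
                weights: Dict[str, float],
                threshold: float = 0.5) -> Tuple[Optional[str], Scores]:
    """Weighted average of the first score (voice) and second scores (other modalities).

    A source that gives no score for a treatment contributes 0 for it.
    Returns (best treatment, or None if its fused score is below threshold; all fused scores).
    """
    sources = {"voice": voice, **modalities}
    total_w = sum(weights.get(name, 0.0) for name in sources)
    if total_w <= 0:
        raise ValueError("at least one source needs a positive weight")
    treatments = set().union(*(s.keys() for s in sources.values()))
    fused = {
        t: sum(weights.get(name, 0.0) * s.get(t, 0.0) for name, s in sources.items()) / total_w
        for t in treatments
    }
    best = max(fused, key=fused.get) if fused else None
    if best is not None and fused[best] < threshold:
        best = None
    return best, fused
```

A weighted average keeps each modality's contribution easy to inspect. A trained fusion model is a common alternative when labeled data are available.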

The medical voice information processing program according to the present invention, for its part, mainly consists of "causing a medical voice information processing apparatus to function as: treatment action presentation means for presenting treatment action presentation information corresponding to a treatment action, based on a standard treatment program in which the procedure of treatment actions is standardized; treatment action recording means for recording the treatment actions in time series, structured in an event index, in synchronization with the presented treatment action presentation information; voice information acquisition means for acquiring voice produced during the treatment actions as voice information; language database storage means for storing, in a language database, characteristic language expressions indicating the treatment actions together with statistical information giving, for each treatment action, the likelihood that each language expression occurs; extraction means for extracting, from the acquired voice information, the language expressions stored in the language database; treatment event detection means for detecting, from the extracted language expressions and using the statistical information, a treatment action of the standard treatment program or an event corresponding to that treatment action; and voice information recording means for recording, structured in the event index, medical voice information including at least one of the recognized treatment action or event, the extracted language expression, and the time at which the language expression occurred."

In addition to the above configuration, the medical voice information processing program according to the present invention may further cause the medical voice information processing apparatus to function as the event recognition means. This means detects the treatment action or event based on at least one of the following:

- the combination of multiple language expressions extracted within a predetermined time width before and after a language expression occurs;
- the number of times a language expression is extracted;
- the order in which the language expressions occur.

In addition to the above configuration, the program may further cause the medical voice information processing apparatus to function as the following means:

- unrecognized information holding means that, when one of several candidate treatment actions or events is selected and recognized, holds the non-selected treatment actions or events as unrecognized information;
- correction means that, based on a treatment action or event recognized afterwards, corrects the earlier result to a treatment action or event held in the unrecognized information.

In addition to the above configuration, the program may further cause the medical voice information processing apparatus to function as the following means:

- first scoring means for scoring the extracted language expressions;
- second scoring means for scoring modality information acquired in connection with the treatment action or event, namely at least one of video information, text information, and sensor information;
- the treatment event detection means, which integrates the scored language expressions and modality information to detect the treatment action or event.

Accordingly, executing the medical voice information processing program of the present invention makes the medical voice information processing apparatus provide the effects described above.

The present invention records each treatment action performed under the standard treatment program in synchronization with the event index. It also extracts language expressions characteristic of the treatment actions from the conversations and instructions exchanged during treatment, so that the content of the treatment actions and the related events can be detected.

Detection accuracy is markedly higher because language expressions and events are detected using statistical information stored in advance in the language database. This information accounts for how likely each expression is to occur for a given treatment action.

Language expressions are also evaluated within a predetermined time width. Events can therefore be detected unambiguously from the combination of an expression with other language expressions, the order in which they occur, and similar cues.

The improved detection accuracy in turn makes later search and playback of the records more efficient.

Next, a medical voice information processing apparatus 1 according to the present embodiment (hereinafter simply "voice processing apparatus 1") is described with reference to FIGS. 1 to 6:

- FIG. 1 is a block diagram showing the schematic configuration of the voice processing apparatus 1.
- FIGS. 2 and 3 are explanatory diagrams schematically showing speech recognition by the voice processing apparatus 1.
- FIGS. 4 and 5 are flowcharts showing the flow of processing in the voice processing apparatus 1.
- FIG. 6 is an explanatory diagram showing an example of the event index 2.

As shown in FIGS. 1 and 2, the voice processing apparatus 1 of the present embodiment is installed in a treatment room. It mainly consists of the following components:

- a processing apparatus main body 4 that performs processing such as recording medical voice information 3;
- a plurality of microphones 6, connected to the main body 4, that acquire sound produced in the treatment room as voice information 5;
- a liquid crystal display 8 and a speaker 9 that present treatment action presentation information 21 to the clinicians by image and sound. This information indicates the content, transition conditions, and similar details of each treatment action performed according to a standard treatment program 20.

The voice processing apparatus 1 is also connected to the following devices:

- an operation console 11 for entering treatment results, findings, and the like as text information 10;
- medical devices 13 that provide sensor information 12 describing the state of the treatment room, for example an electrocardiogram monitor that measures vital signs such as the patient's heart rate;
- non-contact sensors 14 that detect the movements of clinicians and others in the treatment room;
- a video input device 16 that films the treatment room and acquires video information 15.

The sensor information 12 from the medical devices 13 or the non-medical sensors 14, the text information 10 from the console 11, and the video information 15 from the video input device 16 are integrated and recorded as modality information 7. The microphones 6 correspond to the voice information acquisition means of the present invention. The liquid crystal display 8 and the speaker 9 provide part of the function of the information presentation means 19 described later.

The processing apparatus main body 4 has the following functional components:

- program storage means 22 that stores treatment action presentation information 21, which describes the content, transition conditions, and similar details of each treatment action based on the standard treatment program 20 (a program in which the treatment procedure is standardized);
- information presentation means 19 that presents the stored treatment action presentation information 21 through the liquid crystal display 8 and the speaker 9;
- treatment action recording means 23 that records each treatment action and its treatment record in the event index 2, structured in time series and synchronized with the presented treatment action presentation information 21;
- voice information acquisition means 24 that acquires, through the microphones 6, the sound produced during the treatment actions as voice information 5;
- language database storage means 29 that stores two kinds of data in a language database 28: dictionary information 26, containing characteristic language expressions 25 that represent the treatment actions, and statistical information 27, giving the precomputed likelihood of each language expression 25 occurring for each treatment action;
- extraction means 30 that extracts, from the acquired voice information 5, language expressions 25 matching the dictionary information 26;
- treatment event detection means 31 that detects a treatment action of the standard treatment program, or an event related to a treatment action, based on the statistical information 27 for the extracted language expressions 25;
- voice information recording means 34 that records the detected treatment action or event in the event index 2 as medical voice information 3. This record includes the extracted language expressions 25 and the occurrence time 51 of each expression (see FIG. 6).

The information presentation means 19 corresponds to the treatment action presentation means of the present invention.
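The sketch below shows, in simplified form, how the language database 28 and the extraction means 30 could fit together. It stores the dictionary information 26 and the statistical information 27 in one structure and keeps only the dictionary matches from a recognizer's output. Two assumptions apply: an upstream speech recognizer or word spotter already yields (expression, onset time) pairs, and the class and function names are hypothetical rather than taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class LanguageDatabase:
    """Hypothetical in-memory form of language database 28."""
    # Dictionary information 26: treatment action -> characteristic expressions.
    dictionary: dict[str, set[str]]
    # Statistical information 27: (action, expression) -> occurrence likelihood.
    statistics: dict[tuple[str, str], float] = field(default_factory=dict)

    def vocabulary(self) -> set[str]:
        """Every expression registered for any treatment action."""
        return set().union(*self.dictionary.values())


def extract_expressions(tokens, db):
    """Extraction means 30 (sketch): keep only tokens found in the dictionary.

    tokens: (expression, onset_time_s) pairs, assumed to come from an
    upstream speech recognizer or word spotter.
    """
    vocab = db.vocabulary()
    return [(expr, t) for expr, t in tokens if expr in vocab]


db = LanguageDatabase({"tracheal intubation": {"tracheal obstruction", "endotracheal tube"}})
heard = [("blood pressure", 1.0), ("tracheal obstruction", 3.2), ("endotracheal tube", 7.9)]
print(extract_expressions(heard, db))  # [('tracheal obstruction', 3.2), ('endotracheal tube', 7.9)]
```

Restricting extraction to the registered vocabulary reflects the patent's point that processing every utterance would overload the main body 4.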

The treatment event detection means 31 includes time width detection means 35. This means detects a treatment action or event from the characteristic language expressions 25 registered in the language database 28, using any of the following within a predetermined time width T:

- how many times an expression 25 is extracted;
- whether related language expressions 25 are extracted together with it (their combination);
- the order in which multiple language expressions 25 occur.

The processing apparatus main body 4 also includes the following functional components:

- undetected information recording means 39: when the treatment event detection means 31 selects one of several candidate events 36 (the selected event), this means records information about the remaining, non-selected events as undetected information 38;
- correction means 40: as treatment proceeds, this means re-detects an event from among the non-selected events in the undetected information 38 and corrects the treatment action or event accordingly;
- first scoring means 41, which converts the extracted language expressions 25 into numerical scores based on the statistical information 27 and similar data;
- second scoring means 42, which scores the modality information 7 recorded in the main body 4 other than the voice information 5, such as the video information 15 and the text information 10.

The values produced by the first scoring means 41 and the second scoring means 42 are recorded as score information 45.

The processing apparatus main body 4 records and manages various data and selectively controls the recording devices used to acquire information. In this embodiment it is built on a general-purpose personal computer. Its hardware therefore includes an arithmetic circuit such as a CPU, interface circuits and devices for sending and receiving signals, and storage media such as a hard disk and semiconductor memory for recording and storing the acquired information.

The treatment event detection means 31, the undetected information recording means 39, and the correction means 40 are now described in more detail.

As shown schematically in FIG. 2, the voice information 5 is used to extract the treatment actions performed in time series under the standard treatment program 20, together with the events that occur during those actions. The downward arrows in FIG. 2 (and FIG. 3) indicate the time-series flow of the standard treatment program 20, and each rectangle represents an individual event.

For example, in FIG. 2, treatment action A0 (upper row) is being performed. A specific language expression 25 detected within a particular time width T of treatment action A0 causes a set of candidate events 36 (lower row) that may be performed next to be presented, namely events A1 to AX.

Each event is stored with a score value 46, obtained by scoring the language expressions 25 and the other modality information 7 and integrating the results. In FIG. 2, this value appears in each box next to the event name as "<5>" to "<0>". In this embodiment the score value 46 has six levels from "<5>" to "<0>". A value closer to "<5>" means the treatment action or event is more likely to match the language expressions 25; a value closer to "<0>" means it is less likely to match.

The apparatus then searches the candidate events 36 for the treatment action or event most likely to be performed next and determines it as the selected event (shown with a double frame). The search is based on the score information 45, which scores the statistical information 27 in the language database 28, the extracted voice information 5, the progress of the standard treatment program 20, the modality information 7, and so on. In FIG. 2, event B is chosen as the selected event. The events that were not selected (the candidate events 36 other than the selected event) are listed as non-selected events, and this list is stored as undetected information 38.
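A minimal sketch of this selection step follows. It assumes that each candidate's integrated score value 46 is already available as an integer from 0 to 5. It simply takes the highest-scoring candidate as the selected event and keeps the rest, best first, as the undetected information 38. The function name and the tie-breaking rule (first listed wins) are illustrative assumptions.

```python
def select_event(candidates):
    """Choose the selected event and build the undetected information 38.

    candidates: dict mapping candidate event name -> integrated score value
    on the 0..5 scale. Ties keep the dict's insertion order (stable sort).
    Returns (selected_event, non_selected_events), the latter best first.
    """
    if not candidates:
        raise ValueError("no candidate events")
    ranked = sorted(candidates, key=candidates.get, reverse=True)
    return ranked[0], ranked[1:]


selected, undetected = select_event({"A1": 5, "A2": 2, "A3": 4})
print(selected, undetected)  # A1 ['A3', 'A2']
```

Keeping the non-selected events in ranked order lets a later correction try the next-best candidate first.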

If later detection shows that a previously detected treatment action or event was wrong, the correction means 40 corrects the detected event.

Specifically, suppose the event detected just before (or earlier than) event B in FIG. 3, here event A1, is judged to be a false detection. The correct event, here event A3, is then detected anew from the non-selected events in the stored undetected information 38. Starting from event A3, further voice information 5 is recognized, and an event that may be performed next is selected from the candidate events 36 and determined (here, event B3).

In other words, even an event that has already been determined is corrected if the determination later proves wrong. The correction uses the stored undetected information 38 and the correction means 40. This is expected to improve the accuracy of event detection based on the voice information 5.
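One way to picture the correction means 40 is the sketch below. It assumes that the standard treatment program can be summarized as a hypothetical transition table listing which events may follow each event. It also simplifies the FIG. 3 case: a stored non-selected event is accepted as the corrected result when the later event can follow it. The patent does not specify this rule; it is an illustrative assumption.

```python
def correct_previous_event(undetected, later_event, transitions):
    """Correction means 40 (sketch).

    undetected: non-selected events from the earlier step, best score
        first (the stored undetected information 38).
    later_event: the event detected afterwards.
    transitions: hypothetical table, event -> set of events that the
        standard treatment program allows to follow it.
    Returns the best-ranked stored event from which later_event can follow,
    or None if no stored event fits.
    """
    for event in undetected:
        if later_event in transitions.get(event, set()):
            return event
    return None


table = {"A1": {"B1"}, "A2": {"B2"}, "A3": {"B3"}}
print(correct_previous_event(["A3", "A2"], "B3", table))  # A3
```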

Next, the flow of processing in the voice processing apparatus 1 is described with reference to FIGS. 4 and 5. Steps S1 to S16 in these flowcharts correspond to the medical voice information processing program of the present invention.

First, the processing apparatus main body 4 and the devices connected to it, such as the microphones 6, are started and put into operation (step S1). The clinician examines the patient brought into the treatment room and makes a diagnosis based on the findings immediately after arrival. The standard treatment program to be used is then chosen.

Via the console 11 connected to the main body 4, the voice processing apparatus 1 accepts an instruction to execute the selected standard treatment program 20 and executes it (step S2). The main body 4 then presents the treatment action presentation information 21 stored for each treatment action in the program storage means 22 to the clinicians, through the liquid crystal display 8 and the speaker 9 (step S3).

After checking the presented treatment action presentation information 21, the clinicians perform the treatment action on the patient. Meanwhile, the microphones 6 installed in the treatment room acquire speech such as conversations and instructions exchanged among the clinicians, as voice information 5 (step S4).

At the same time, the other devices connected to the main body 4 acquire and record various information (step S5):

- the video input device 16 acquires video information 15 of the treatment room;
- the console 11 is used to enter text information 10 such as treatment results and findings;
- sensor information 12 is acquired from the medical devices 13 and from the sensors 14 placed in the treatment room, which include non-medical devices such as position sensors.

The main body 4 then uses the language database 28 to extract language expressions 25 from the acquired voice information 5 (step S6). These are expressions characteristic (key) to a treatment action or to an event corresponding to a treatment action.

The language database 28 consists of two parts:

- the dictionary information 26, which stores the language expressions 25 related to the treatment actions;
- the statistical information 27, which records the statistically calculated likelihood of each language expression 25 occurring for the treatment action concerned.

The apparatus therefore identifies the treatment action corresponding to the treatment action presentation information 21 presented by the standard treatment program 20, and extracts the relevant language expressions 25 using the statistical information 27.

A single language expression 25 in the voice information 5 could simply be detected and extracted by speech recognition or word spotting. To detect the treatment action or event more accurately, however, the following method is used.
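The patent states only that the occurrence likelihood in the statistical information 27 is "statistically calculated". As one plausible reading, the sketch below estimates P(expression | treatment action) from counts in past records annotated by hand. Two parts are assumptions: the training-record format, and the additive (Laplace) smoothing that keeps unseen expressions from getting zero probability.

```python
from collections import Counter


def estimate_statistics(records, vocabulary, alpha=1.0):
    """Estimate how likely each expression is to occur during each action.

    records: list of (treatment_action, [expressions heard]) pairs from
        past, manually annotated treatments (hypothetical training data).
    vocabulary: all expressions in the dictionary information 26.
    alpha: additive smoothing constant (an assumed choice).
    Returns {(action, expression): P(expression | action)}.
    """
    counts = {}
    for action, expressions in records:
        counts.setdefault(action, Counter()).update(
            e for e in expressions if e in vocabulary)
    stats = {}
    for action, counter in counts.items():
        total = sum(counter.values()) + alpha * len(vocabulary)
        for expr in vocabulary:
            stats[(action, expr)] = (counter[expr] + alpha) / total
    return stats


vocab = {"tracheal obstruction", "endotracheal tube", "adrenaline"}
past = [("intubation", ["tracheal obstruction", "endotracheal tube"]),
        ("intubation", ["endotracheal tube"])]
stats = estimate_statistics(past, vocab)
print(stats[("intubation", "endotracheal tube")])  # 0.5
```

For each action the probabilities over the vocabulary sum to 1, so they can be compared across expressions when scoring.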

As shown in FIG. 2 or FIG. 3, suppose a language expression 25a characteristic of an event is detected in the voice information 5, and related language expressions 25b and 25c occur within the predetermined time width T. Recognizing these expressions 25a to 25c together improves the accuracy of detecting the event.

The voice information 5 also contains many expressions unrelated to treatment actions or events. Recognizing all of them and associating each with events would place an excessive load on the main body 4. The apparatus therefore checks whether the language expression 25a to be detected for each treatment action presented by the standard treatment program 20 is recognized under certain conditions within the time width T.

For example, suppose the expression "tracheal obstruction" (25a) is extracted within the time width T. Suppose it is followed by related expressions 25b and 25c, such as the name of an airway device like "endotracheal tube" or a "treatment instruction" requesting intubation. The event is then detected as the "tracheal intubation" treatment action or event and recognized in line with the flow of the standard treatment program.

Two further conditions may be set:

- a language expression 25a is extracted only once it has been detected several times (for example, three times or more);
- the order of occurrence is checked, for example accepting "language expression 25a → related language expression 25b or 25c" but rejecting "related language expression 25b → language expression 25a".
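The window rule above can be sketched as follows. The code checks whether, within a window of length T that starts at an occurrence of the key expression 25a, the key appears often enough and a related expression also appears. With the default `key_first=True`, related expressions heard before the key are ignored, which enforces the "25a → 25b" order. The function name, its parameters, and the window value in the example are illustrative assumptions, and the input is assumed to be time-sorted.

```python
def detect_in_window(hits, key, related, window_s,
                     min_key_count=1, key_first=True):
    """Time width detection means 35 (sketch).

    hits: (expression, onset_time_s) pairs sorted by onset time.
    key: the characteristic expression (25a).
    related: set of related expressions (25b, 25c, ...).
    window_s: the predetermined time width T in seconds (value assumed).
    min_key_count: how often the key must occur inside the window.
    key_first: if True, only related expressions heard after the key count.
    """
    for i, (expr, t0) in enumerate(hits):
        if expr != key:
            continue
        # Hits from this key occurrence up to t0 + window_s.
        after = [e for e, t in hits[i:] if t - t0 <= window_s]
        related_hit = any(e in related for e in after)
        if not key_first:
            # Also accept related expressions in the window_s before the key.
            related_hit = related_hit or any(
                e in related for e, t in hits[:i] if t0 - t <= window_s)
        if after.count(key) >= min_key_count and related_hit:
            return True
    return False


hits = [("tracheal obstruction", 10.0), ("endotracheal tube", 25.0)]
print(detect_in_window(hits, "tracheal obstruction", {"endotracheal tube"}, 30.0))  # True
```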

The extracted language expressions 25a and so on are then scored against the statistical information 27 and the treatment action presentation information 21 of the standard treatment program 20 (step S7). At the same time, two further items are scored (steps S8 and S9):

- the state of the standard treatment program 20 when the expressions were extracted, including the presented treatment action and its records;
- the modality information 7 other than the voice information 5, such as the video information 15.

The resulting score values 46 are then integrated (step S10). The score values 46 are calculated according to predefined criteria based on the language expressions 25a and so on, their corresponding statistical information 27, their relevance to the treatment action, and similar factors.
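The patent leaves the integration criteria unspecified. The sketch below uses a normalized weighted average of per-source scores in [0, 1], mapped onto the six-level 0 to 5 scale of the score value 46. The weights, their defaults, and the [0, 1] input range are all assumptions made for illustration.

```python
def integrate_scores(language_score, modality_scores, weights=None):
    """Combine first- and second-scoring results (steps S7 to S10, sketch).

    language_score: score in [0, 1] for the extracted expressions.
    modality_scores: dict modality name -> score in [0, 1], for example
        "video", "text", "sensor"; an empty dict means speech only.
    weights: hypothetical weights; "language" defaults to 1.0 and each
        modality to 0.5 when not given.
    Returns an integer score value on the six-level scale 0..5.
    """
    weights = weights or {}
    w_lang = weights.get("language", 1.0)
    total = w_lang * language_score
    norm = w_lang
    for name, score in modality_scores.items():
        w = weights.get(name, 0.5)
        total += w * score
        norm += w
    return round(5 * total / norm)


print(integrate_scores(0.9, {"video": 0.6}))  # 4
```

Passing an empty `modality_scores` reduces this to the speech-only variant that the patent also allows.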

Based on the integrated score information 45, the treatment action or event at that point is then detected (step S11). The detected event may not fit the time-series flow that follows the event detected at the previous stage. In that case the previously detected event is judged to be a false detection (YES in step S12). Event detection is then performed again from among the non-selected events recorded as undetected information 38 during the previous detection (step S13; see the two-dot chain line in FIG. 3). If there is no false detection (NO in step S12), step S13 is skipped.

The event record for the treatment action is then recorded, structured in the event index 2, in synchronization with the record of the treatment action (step S14). FIG. 6 shows an example of the event index 2. It records in structured form the event item 50 for the detected treatment action or event, the extracted language expression 25, the occurrence time 51 of the expression, the score value 46, and so on.

Because each event is recorded in structured form, a reviewer checking the treatment afterwards can specify an event item 50, a language expression 25, or similar key. The reviewer can then easily play back the voice information 5 and the related modality information 7. Although not shown, the modality information 7 is also recorded in structured form in synchronization with the event index 2.
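The following sketch shows the kind of record FIG. 6 describes and how a reviewer might look entries up by event item or expression to find playback positions. The field names, the use of seconds for the occurrence time, and the sample entries are hypothetical illustrations, not data from the patent.

```python
from dataclasses import dataclass


@dataclass
class EventIndexEntry:
    """One row of event index 2; field names are hypothetical."""
    event_item: str          # event item 50
    expressions: list[str]   # extracted language expressions 25
    onset_time_s: float      # occurrence time 51, seconds from start
    score: int               # score value 46 on the 0..5 scale


def find_entries(index, event_item=None, expression=None):
    """Return entries matching an event item and/or an expression.

    The onset times of the returned entries are where playback of the
    voice and modality recordings would start.
    """
    return [e for e in index
            if (event_item is None or e.event_item == event_item)
            and (expression is None or expression in e.expressions)]


index = [
    EventIndexEntry("tracheal intubation",
                    ["tracheal obstruction", "endotracheal tube"], 312.5, 5),
    EventIndexEntry("event B", ["related expression"], 640.0, 3),
]
print([e.onset_time_s for e in find_entries(index, expression="endotracheal tube")])  # [312.5]
```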

After recording this information in the event index 2, the main body 4 checks for an instruction on whether to continue acquiring voice information 5. If acquisition is not to continue (YES in step S15), voice acquisition and event detection by the voice processing apparatus 1 end (step S16). If acquisition is to continue (NO in step S15), processing returns to step S4. The apparatus then continues acquiring voice information 5 through the microphones 6 and extracting language expressions 25a and so on.

As described above, the voice processing apparatus 1 detects treatment actions performed under the standard treatment program, and events related to them, from the voice information 5 acquired through the microphones 6. It records them in synchronization with the event index 2.

Two features improve the accuracy of detecting events for each treatment action: only the language expressions 25 stored in advance in the language database 28 are extracted, and the statistical information 27 gives each expression's likelihood of occurring.

Accuracy can be raised further by also scoring the modality information 7 from other devices connected to the apparatus, such as the video information 15. These scores are integrated with the scores of the extracted language expressions 25.

The other modality information 7 recorded in the event index 2 can be played back by the type of event that occurred (event item 50) and its occurrence time 51. Reviewing treatment actions afterwards, and playing back material when writing medical records, can therefore be done quickly. The detected events can also serve as tag information, for example when searching for similar cases.
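The patent only mentions using detected events as tags to search for similar cases and does not name a similarity measure. The sketch below ranks archived cases by Jaccard similarity between sets of event items. Jaccard similarity is an assumed choice, and the archive format (case ID mapped to a set of events) is hypothetical.

```python
def rank_similar_cases(query_events, cases):
    """Rank past cases by Jaccard similarity of their event tags.

    query_events: set of event items detected in the current case.
    cases: hypothetical archive, case id -> set of event items.
    Returns case ids, most similar first.
    """
    def jaccard(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    return sorted(cases, key=lambda cid: jaccard(query_events, cases[cid]),
                  reverse=True)


archive = {"c1": {"intubation"}, "c2": {"intubation", "event B"}, "c3": {"event C"}}
print(rank_similar_cases({"intubation", "event B"}, archive))  # ['c2', 'c1', 'c3']
```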

The present invention has been described above with reference to a preferred embodiment. However, the present invention is not limited to this embodiment, and various improvements and design changes can be made without departing from the gist of the present invention, as described below.

For example, the voice processing device 1 of the present embodiment detects an event or the like from a plurality of language expressions 25 extracted within a predetermined time width T, but the invention is not limited to this; an event or the like may instead be detected directly from a single language expression 25 contained in the voice information 5. Although this slightly lowers detection accuracy, it reduces the processing load on the voice processing device 1. Accordingly, in situations that do not involve complex language expressions 25, the detection mode may be switched so that only single language expressions 25 are extracted.
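The difference between detection over a time width T and direct detection from a single language expression 25 can be illustrated with the following sketch. It assumes hypothetical rules (`EventRule`) that combine the three criteria the claims name as detection bases: a combination of expressions, an extraction count, and an order of expression. The rule format, `half_width`, the `single` switch, and the choice to report only the first detection of each event are illustrative assumptions, not the device's specification.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# (onset time in seconds, extracted language expression)
Utterance = Tuple[float, str]


@dataclass
class EventRule:
    """Hypothetical detection rule built from the three criteria in the claims."""
    event: str
    required: Sequence[str]  # combination: every expression must appear
    min_count: int = 1  # extraction count required for required[0]
    ordered: bool = False  # whether `required` must appear in this order


def window(utterances: List[Utterance], center: float, half_width: float) -> List[str]:
    """Expressions within +/- half_width of `center`, i.e. a time width T = 2 * half_width."""
    return [e for t, e in sorted(utterances) if abs(t - center) <= half_width]


def matches(rule: EventRule, expressions: List[str]) -> bool:
    if not all(r in expressions for r in rule.required):
        return False
    if expressions.count(rule.required[0]) < rule.min_count:
        return False
    if rule.ordered:
        # Compare first occurrences to check the order of expression.
        positions = [expressions.index(r) for r in rule.required]
        return positions == sorted(positions)
    return True


def detect(
    utterances: List[Utterance],
    rules: List[EventRule],
    half_width: float,
    single: bool = False,
) -> Dict[str, float]:
    """Return the first time each event is detected.

    single=False: judge the expressions inside time width T around each utterance.
    single=True: judge each expression on its own (cheaper, less accurate).
    """
    detected: Dict[str, float] = {}
    for t, expr in sorted(utterances):
        context = [expr] if single else window(utterances, t, half_width)
        for rule in rules:
            if rule.event not in detected and matches(rule, context):
                detected[rule.event] = t
    return detected
```

With `single=True`, only rules that need one expression can fire. This mirrors the trade-off in the text: less work per utterance, but combinations, counts, and orderings of expressions are no longer recognized.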

Likewise, the voice processing device 1 of the present embodiment scores both the voice information 5 and the modality information 7, including the video information 15, and integrates these scores to determine an event or the like. The invention is not limited to this, however; events and the like may be extracted from the voice information 5 alone.

Fig. 1 is a block diagram showing the schematic configuration of the voice processing device of the present embodiment.
Fig. 2 is an explanatory diagram schematically showing speech recognition by the voice processing device.
Fig. 3 is an explanatory diagram schematically showing speech recognition by the voice processing device.
Fig. 4 is a flowchart showing the flow of processing of the voice processing device.
Fig. 5 is a flowchart showing the flow of processing of the voice processing device.
Fig. 6 is an explanatory diagram showing an example of the event index.

Explanation of symbols

1 Voice processing device (medical voice information processing apparatus)
2 Event index
3 Medical voice information
5 Voice information
6 Microphone (voice input device)
7 Modality information
8 Liquid crystal display (information presentation means, treatment action presentation means)
9 Speaker (information presentation means, treatment action presentation means)
10 Text information (modality information)
12 Sensor information (modality information)
15 Video information (modality information)
19 Information presentation means (treatment action presentation means)
20 Standard treatment program
21 Treatment action presentation information
23 Treatment action recording means
24 Voice information acquisition means
25 Voice information
25, 25a Language expression
25b, 25c Related language expression (language expression)
27 Statistical information
28 Language database
29 Language database storage means
30 Extraction means
31 Treatment event detection means
32 Onset time
34 Voice information recording means
35 Time width detection means
38 Undetected information
39 Undetected information recording means
40 Correction means
41 First scoring means
42 Second scoring means
T Time width

Claims

1. A medical voice information processing apparatus comprising:
treatment action presentation means for presenting treatment action presentation information corresponding to a treatment action, based on a standard treatment program in which the procedure of treatment actions is standardized;
treatment action recording means for recording the treatment action in an event index in structured, time-series form, in synchronization with the presented treatment action presentation information;
voice information acquisition means for acquiring voice generated during the treatment as voice information;
language database storage means for storing, in a language database, characteristic language expressions indicating the treatment action, together with statistical information, calculated for each treatment action, on the likelihood that each language expression occurs;
extraction means for extracting, from the acquired voice information, the language expressions stored in the language database;
treatment event detection means for detecting, from the extracted language expressions and by using the statistical information, the treatment action of the standard treatment program or an event corresponding to the treatment action; and
voice information recording means for recording, in the event index in structured form, medical voice information including at least one of the detected treatment action or event, the extracted language expression, and the onset time of the language expression.

2. The medical voice information processing apparatus according to claim 1, wherein the treatment event detection means detects the treatment action or the event based on at least one of: a combination of a plurality of language expressions extracted within a predetermined time range before and after the occurrence of a language expression; the number of times the language expression is extracted; and the order in which the language expressions occur.

3. The medical voice information processing apparatus according to claim 1 or 2, wherein the treatment event detection means comprises:
undetected information holding means for selecting and detecting one of a plurality of candidate treatment actions or events, and holding the unselected treatment actions or events as undetected information; and
correction means for correcting the treatment action or event held as undetected information on the basis of a treatment action or event detected subsequently.

4. The medical voice information processing apparatus according to any one of claims 1 to 3, further comprising:
first scoring means for scoring the extracted language expressions; and
second scoring means for scoring at least one type of modality information, among video information, text information, and sensor information, acquired in correspondence with the treatment action or the event,
wherein the treatment event detection means integrates the scored language expressions and the scored modality information to detect the treatment action or the event.

5. A medical voice information processing program causing a medical voice information processing apparatus to function as:
treatment action presentation means for presenting treatment action presentation information corresponding to a treatment action, based on a standard treatment program in which the procedure of treatment actions is standardized;
treatment action recording means for recording the treatment action in an event index in time series, in synchronization with the presented treatment action presentation information;
voice information acquisition means for acquiring voice generated during the treatment as voice information;
language database storage means for storing, in a language database, characteristic language expressions indicating the treatment action, together with statistical information, calculated for each treatment action, on the likelihood that each language expression occurs;
extraction means for extracting, from the acquired voice information, the language expressions stored in the language database;
treatment event detection means for detecting, from the extracted language expressions and by using the statistical information, the treatment action of the standard treatment program or an event corresponding to the treatment action; and
voice information recording means for recording in the event index, in structured form and as the recognized treatment action or event, medical voice information including at least one of the extracted language expression and the onset time of the language expression.

6. The medical voice information processing program according to claim 5, further causing the medical voice information processing apparatus to function as event recognition means for detecting the treatment action or the event based on at least one of: a combination of a plurality of language expressions extracted within a predetermined time range before and after the occurrence of a language expression; the number of times the language expression is extracted; and the order in which the language expressions occur.

7. The medical voice information processing program according to claim 5 or 6, further causing the medical voice information processing apparatus to function as:
unrecognized information holding means for selecting and recognizing one of a plurality of candidate treatment actions or events, and holding the unselected treatment actions or events as unrecognized information; and
correction means for correcting the treatment action or event held as unrecognized information on the basis of a treatment action or event recognized subsequently.

8. The medical voice information processing program according to any one of claims 5 to 7, further causing the medical voice information processing apparatus to function as:
first scoring means for scoring the extracted language expressions;
second scoring means for scoring at least one type of modality information, among video information, text information, and sensor information, acquired in correspondence with the treatment action or the event; and
the treatment event detection means, which integrates the scored language expressions and the scored modality information to detect the treatment action or the event.
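The undetected-information holding means and the correction means recited in the claims above can be pictured with the following sketch. It assumes, for illustration only, that the standard treatment program 20 is a simple ordered list of steps, and that a later detection corrects an earlier one when the earlier choice would violate that order; the claims do not state the correction criterion. `Detection`, `EventTracker`, and the candidate-score dictionary are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class Detection:
    time: float
    chosen: str  # the selected treatment action or event
    held: List[str] = field(default_factory=list)  # unselected candidates (undetected information)


class EventTracker:
    """Hypothetical holder of undetected information with after-the-fact correction."""

    def __init__(self, program: Sequence[str]):
        # Assumed model of the standard treatment program: step name -> position.
        self.step = {name: i for i, name in enumerate(program)}
        self.history: List[Detection] = []

    def detect(self, time: float, candidates: Dict[str, float]) -> str:
        """Select the highest-scoring candidate and hold the rest."""
        ranked = sorted(candidates, key=lambda c: candidates[c], reverse=True)
        self.history.append(Detection(time, ranked[0], ranked[1:]))
        self._correct()
        return ranked[0]

    def _correct(self) -> None:
        """Revise earlier choices that the program order says cannot precede the newest one."""
        newest = self.step[self.history[-1].chosen]
        for past in self.history[:-1]:
            if self.step[past.chosen] > newest:
                alt = next((c for c in past.held if self.step[c] <= newest), None)
                if alt is not None:
                    past.held.remove(alt)
                    past.held.insert(0, past.chosen)  # old choice becomes undetected info
                    past.chosen = alt
```

Here an earlier "suture" detection that is followed by an "incision" detection would be swapped for a held candidate that can come before an incision in the assumed program order, if one was held.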