WO2000054180A1

WO2000054180A1 - System and method for creating formatted document on the basis of conversational speech recognition

Info

Publication number: WO2000054180A1
Application number: PCT/JP2000/001339
Authority: WO
Inventors: Tadamitsu Ryu
Original assignee: CAI Co Ltd
Current assignee: CAI Co Ltd
Priority date: 1999-03-05
Filing date: 2000-03-06
Publication date: 2000-09-14
Anticipated expiration: 2001-09-05

Abstract

A formatted document creating system used for recording conversation comprises a speech recognition device, an information extracting section for extracting information to be recorded according to the results of speech recognition, a conversation record buffer for recording a sentence including the extracted information, a case database for storing cases which may occur as the content of conversation, a topicality control section for deducing undefined information on the basis of one item of information included in the sentence recorded in the conversation record buffer by referencing cases stored in the case database, a multimodal interaction control section for querying the document creator about the undefined information in order to define it, an input/output device control section for controlling the communication with the document creator, an input device for inputting the intention of the document creator into a computer system, and a conversation record archive for storing the record of the defined conversation. The system can be applied to conversation on the phone.

Description

Specification

Formatted document creation system and method based on speech recognition of conversation

Technical field

The present invention relates generally to documenting conversations such as routine consultations, and more particularly to a formatted document creation system and a formatted document creation method that, for example, convert a telephone dialogue into text in real time.

Background art

Documenting the content of conversations such as consultations helps in managing user information and consultation details. Conventionally, conversations have been recorded either by simply keeping an audio recording or by taking notes by hand. The former method requires playing back the recording and transcribing it manually, which costs a great deal of time and labor. The latter method keeps the note-taker from concentrating on the conversation, and the speed of note-taking is limited. It is therefore conceivable to convert the conversation into text with a computer's speech recognition function instead of merely recording it. Examples of such approaches include Japanese Patent Application Laid-Open No. Hei 6-253057 and Japanese Patent Application Laid-Open No. Hei 5-160925. However, no proposal has specified a technique for interactively creating a formatted record using the information contained in a speech recognition transcript.

In addition, the accuracy of such speech recognition transcripts is often insufficient for practical use. Even so, a transcript that is not good enough to serve as a document as-is still contains the information needed to create a formatted record (the information to be recorded). For example, a telephone conversation between a doctor and a patient is likely to be a consultation about an illness or injury, so the information to be recorded concerns physical condition, symptoms, and time. In consultations at service counters, for example at a travel agency, the conversation covers the purpose of the trip, the number of travelers, the dates, the destination, the cost, the itinerary, and so on. In such cases the content of the conversation is limited to some extent, and one piece of information is often closely related to the pieces that follow it. Meanwhile, advances in graphical user interface technology have made it common to display and enter information on a computer using a graphic display device combined with a pointing device. Furthermore, advances in speech recognition and speech synthesis have made it possible for a computer system and its operator to exchange information by voice.

Accordingly, an object of the present invention is to provide a formatted document creation system and a formatted document creation method that efficiently document the content of conversations such as routine consultations.

Disclosure of the invention

To achieve the above object, the present invention interactively creates a formatted record using the information contained in a speech recognition transcript.

More specifically, the present invention provides a formatted document creation system for use in recording a conversation, comprising: a speech recognition device that recognizes the speech of the conversation; an information extraction unit that extracts the information to be recorded from the speech recognition result; a conversation record buffer that records sentences containing the extracted information as the content of the conversation; a case database that stores cases, each composed of a plurality of pieces of information, that may occur as the content of a conversation; a topicality control unit that, with reference to the cases stored in the case database, infers other unconfirmed information from one piece of information in a sentence recorded in the conversation record buffer; a multimodal dialogue control unit that queries the document creator about the unconfirmed information recorded in the conversation record buffer and confirms it; a GUI control unit that controls graphical communication with the document creator; an input device through which the document creator's intention is entered into the computer system; and a conversation record archive that stores the record of the confirmed conversation.

The case database stores cases that may occur as the content of a conversation, each composed of a plurality of pieces of information. The cases that may occur can differ greatly depending on the situation in which the invention is used. Within any one usage situation, however, the content of the conversation can be expected to be limited to some extent. Each stored case contains several pieces of information that are significant for recording the conversation (the information to be recorded). These pieces of information and the relationships among them are used when confirming individual pieces of information in an actual conversation. As a result, when a conversation record is created, the system can present, as candidates for the information making up the text record, information that is correct or that has a high probability of being correct.

Such a conversation record is recorded in the conversation record buffer. A conversation record containing unconfirmed information is presented by the GUI control unit on the GUI shown on the operator's computer monitor. The computer performs the work up to this point in real time while the conversation is in progress. In response to the query about the unconfirmed information recorded in the conversation record buffer, the document creator enters his or her intention into the computer system using one of the various input devices. That is, if the unconfirmed information is correct, it is confirmed as-is; if it is wrong, the document creator either selects another candidate that is correct or enters the correct answer directly. In most cases this amounts to clicking the correct unconfirmed information to confirm it, so a conversation record containing all the necessary information can be created in a short time without disrupting the flow of the conversation. The confirmed conversation record is recorded in the conversation record archive, where it is organized and accumulated so that it can be used at any time.
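The confirmation workflow described above can be sketched roughly as follows. This is only an illustrative sketch: the `Slot` class, the `confirm` and `archive_if_complete` functions, and the list-based buffer and archive are hypothetical names chosen here, not structures defined in the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Slot:
    """One item of information in the conversation record buffer (hypothetical layout)."""
    name: str
    candidates: List[str] = field(default_factory=list)  # proposed values, best first
    value: Optional[str] = None  # set once the document creator confirms

    @property
    def confirmed(self) -> bool:
        return self.value is not None


def confirm(slot: Slot, choice: Optional[int] = None, typed: Optional[str] = None) -> Slot:
    """Apply the document creator's decision to one slot.

    No arguments: accept the top candidate (the single-click case).
    choice=i:     select another candidate from the list.
    typed=text:   enter the correct answer directly.
    """
    if typed is not None:
        slot.value = typed
    else:
        slot.value = slot.candidates[0 if choice is None else choice]
    return slot


def archive_if_complete(buffer: List[Slot], archive: List[Dict[str, str]]) -> bool:
    """Move the record to the archive only when every slot is confirmed."""
    if all(slot.confirmed for slot in buffer):
        archive.append({slot.name: slot.value for slot in buffer})
        return True
    return False
```

Accepting the top candidate is the default path here because the text states that, in most cases, confirmation is a single click on the correct candidate.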

The invention according to claim 2 is the formatted document creation system according to claim 1, wherein the input device comprises any one of, or a combination of, a speech synthesizer, a sound generator, a character input device, a pointing device, and a graphic display device.

The input device may be of any type as long as it allows the operator's intention to be entered into the computer system, but in terms of operability and accuracy it is preferably any one of, or a combination of, a speech synthesizer, a sound generator, a character input device, a pointing device, and a graphic display device.

The invention according to claim 3 is the formatted document creation system according to claim 1, wherein the topicality control unit stores the information confirmed by the document creator in the conversation record buffer in the case database as a case. The learning effect obtained from the document creator's confirmation of unconfirmed information is thereby put to use in creating subsequent conversation records.

The invention according to claim 4 is the formatted document creation system according to claim 1, wherein, when the extraction of a piece of information is only probabilistic, the information extraction unit writes a plurality of candidates for that information to the conversation record buffer together with likelihood information based on statistics or the like.

When the information extraction unit extracts the information to be recorded from the speech recognition result, a piece of information that can be identified with 100% probability is written into the text as confirmed information. Otherwise, that is, when the extraction is only probabilistic, a plurality of candidates for that information are written to the conversation record buffer together with likelihood information based on statistics or the like. The document creator refers to this likelihood information when confirming the unconfirmed information.
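As a minimal sketch of this branching, assuming the buffer is a plain dictionary and each hypothesis is a (value, likelihood) pair, the write could look like the following. The function name `write_extraction` and the `status` / `candidates` keys are illustrative assumptions, not part of the specification.

```python
from typing import Dict, List, Tuple


def write_extraction(
    buffer: Dict[str, dict],
    item: str,
    hypotheses: List[Tuple[str, float]],
) -> None:
    """Write one extracted item to the buffer (hypothetical dictionary layout).

    A single hypothesis with likelihood 1.0 is stored as confirmed text.
    Anything else is stored as a ranked list of candidates with likelihoods.
    """
    ranked = sorted(hypotheses, key=lambda h: h[1], reverse=True)
    if len(ranked) == 1 and ranked[0][1] >= 1.0:
        buffer[item] = {"status": "confirmed", "value": ranked[0][0]}
    else:
        buffer[item] = {"status": "unconfirmed", "candidates": ranked}
```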

The invention according to claim 5 is the formatted document creation system according to claim 1, wherein, when the estimate of the content to be entered for an item in the conversation record buffer is not uniquely determined, the topicality control unit writes a plurality of candidates to the conversation record buffer together with likelihood information based on statistics or the like.

As described above, the topicality control unit infers other unconfirmed information from one piece of information in a sentence recorded in the conversation record buffer. When the estimate of the content to be entered for an item in the conversation record buffer is not uniquely determined, the topicality control unit writes a plurality of candidates to the conversation record buffer together with likelihood information based on statistics or the like. By using likelihood information in this way, the topicality control unit orders or narrows down the information the user should choose from, based on the information in the speech recognition transcript and the information the user has entered or decided. This makes the creation of formatted descriptions more efficient.

The document creator is queried about such unconfirmed information recorded in the conversation record buffer, and the information is thereby confirmed. In response to the query, the document creator enters his or her intention into the computer system using one of the various input devices. That is, if the correct answer is among the candidates, the document creator selects and confirms it; if it is not, the document creator enters the correct answer directly. The document creator refers to the likelihood information when doing so.

The invention according to claim 6 is the formatted document creation system according to claim 1, wherein the multimodal dialogue control unit queries the document creator by voice, GUI, or character input to confirm the information in the sentences recorded in the conversation record buffer.

The text information is confirmed using whichever means the operator finds easiest to use, which further improves operability.

The invention according to claim 7 is the formatted document creation system according to claim 5, wherein the multimodal dialogue control unit queries the document creator about the information to be queried in the conversation record buffer in order, starting with the candidate of highest likelihood.

Querying the document creator in order, starting with the candidate most likely to be correct, shortens the time needed to arrive at the correct information.
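A rough sketch of this ordered inquiry, assuming candidates are (value, likelihood) pairs and that the interaction with the document creator is abstracted as a callback. The name `inquire` and the `ask` callback are hypothetical; the callback stands in for whatever voice, GUI, or text prompt is actually used.

```python
from typing import Callable, Iterable, Optional, Tuple


def inquire(
    item: str,
    candidates: Iterable[Tuple[str, float]],
    ask: Callable[[str, str], bool],
) -> Optional[str]:
    """Offer candidates to the document creator, most likely first.

    `ask(item, value)` returns True when the creator accepts the value.
    Returns the accepted value, or None if every candidate was rejected
    (in which case the creator would enter the answer directly).
    """
    for value, _likelihood in sorted(candidates, key=lambda c: c[1], reverse=True):
        if ask(item, value):
            return value
    return None
```

When the most likely candidate is the correct one, the loop ends after a single question, which is the time saving the paragraph above describes.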

The invention according to claim 8 is the formatted document creation system according to any one of claims 1 to 7, further comprising a telephone that enables conversation between remote locations, and a caller telephone number acquisition device that acquires the other party's telephone number and records it in the conversation record buffer.

The speech recognition device can also recognize the speech of a telephone conversation. It recognizes the speech of the calling side, the receiving side, or both, and the information extraction unit then extracts the information to be recorded from the recognition result. The caller telephone number acquisition device acquires the other party's telephone number and records it in the conversation record buffer. Various information about the speaker at the other end of the line can be obtained from that telephone number, which makes it easier to infer other unconfirmed information from one piece of information in a sentence recorded in the conversation record buffer.
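One way to picture the use of the caller's number is a simple lookup that pre-fills buffer items as candidates. The sketch below is an assumption-laden illustration: the `CUSTOMERS` table, its made-up entry, and the function names are all hypothetical, since the specification does not say how caller information is stored.

```python
from typing import Dict

# Hypothetical directory keyed by telephone number; the entry is made-up sample data.
CUSTOMERS: Dict[str, Dict[str, str]] = {
    "0312345678": {"name": "Example Caller", "contact": "Example Address"},
}


def normalize(number: str) -> str:
    """Keep digits only so that '03-1234-5678' and '0312345678' match."""
    return "".join(ch for ch in number if ch.isdigit())


def record_caller(buffer: Dict[str, dict], caller_number: str) -> None:
    """Record the caller's number and pre-fill known details as candidates."""
    key = normalize(caller_number)
    buffer["caller_number"] = {"status": "confirmed", "value": key}
    for item, value in CUSTOMERS.get(key, {}).items():
        # Pre-filled details stay unconfirmed until the document creator accepts them.
        buffer.setdefault(item, {"status": "unconfirmed", "candidates": [(value, 1.0)]})
```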

The invention according to the second aspect provides a formatted document creation method for use in recording a conversation, comprising: a speech recognition step of recognizing the speech of the conversation; an information extraction step of extracting the information to be recorded from the speech recognition result; a writing step of writing sentences containing the extracted information to a conversation record buffer as the content of the conversation; a case storing step of storing, in a case database, cases that may occur as the content of a conversation, each composed of a plurality of pieces of information; an inferred information setting step of inferring, with reference to the cases stored in the case database, other unconfirmed information from one piece of information in a sentence recorded in the conversation record buffer, and setting it in the conversation record buffer; an information query and update step of graphically presenting to the document creator the sentences containing unconfirmed information recorded in the conversation record buffer, querying the document creator about the unconfirmed information, and confirming its content; and a conversation recording step of storing the record of the confirmed conversation as text.

The invention according to claim 10 is the formatted document creation method according to claim 9, wherein the information query and update step is performed using an input device comprising any one of, or a combination of, a speech synthesizer, a sound generator, a character input device, a pointing device, and a graphic display device.

The invention according to claim 11 is the formatted document creation method according to claim 9, further comprising a case adding step of additionally storing, in the database as a case, the information confirmed by the document creator in the conversation record buffer.

The invention according to claim 12 is the formatted document creation method according to claim 9, wherein, when the extraction of a piece of information is only probabilistic, the writing step writes a plurality of candidates for that information to the conversation record buffer together with likelihood information based on statistics or the like.

The invention according to claim 13 is the formatted document creation method according to claim 9, wherein, when the estimate of the content to be entered for an item in the conversation record buffer is not uniquely determined, the inferred information setting step writes a plurality of candidates to the conversation record buffer together with likelihood information based on statistics or the like.

The invention according to claim 14 is the formatted document creation method according to claim 9, wherein the information query and update step queries the document creator by voice, GUI, or character input to confirm the information in the sentences recorded in the conversation record buffer.

The invention according to claim 15 is the formatted document creation method according to claim 13, wherein the inferred information setting step queries the document creator about the information to be queried in the conversation record buffer in order, starting with the candidate of highest likelihood.

The invention according to claim 16 is the formatted document creation method according to any one of claims 9 to 15, wherein the speech recognition step recognizes a conversation held over a telephone, and the method further comprises a caller telephone number acquisition step of acquiring the other party's telephone number and recording it in the conversation record buffer.

The invention according to claim 17 is the formatted document creation method according to claim 16, wherein the inferred information setting step uses the other party's telephone number recorded in the conversation record buffer to refer to the cases stored in the case database, and infers other unconfirmed information from one piece of information in a sentence recorded in the conversation record buffer.

Brief description of the drawings

FIG. 1 is a block diagram of an embodiment of the formatted document creation system according to the present invention.

FIG. 2 is a block diagram of an embodiment of the formatted document creation system according to the present invention for a conversation held over a telephone.

FIG. 3 is an explanatory diagram showing the content of a conversation between a doctor and a patient.

FIG. 4 is an explanatory diagram showing the content of a conversation record.

Best mode for carrying out the invention

The present invention will now be described in detail with reference to the preferred embodiments shown in the drawings.

FIG. 1 is a block diagram showing the overall configuration of an embodiment of the formatted document creation system based on speech recognition of conversation according to the present invention.

As shown in FIG. 1, the formatted document creation system based on speech recognition of conversation according to this embodiment comprises a microphone 1, a speech recognition device 2, an information extraction unit 3, a topicality control unit 4, a database 5, a multimodal dialogue control unit 6, a conversation record buffer 7, a speech synthesizer 8, a sound generator 9, a character input device 10, a GUI control unit 11, a pointing device 12, a graphic display device 13, and a conversation record archive 14.

The speech recognition device 2 interprets the audio signal from the microphone 1 as spoken language and outputs a character string or a morphological analysis result (text broken down into words, with information such as part of speech attached). When the speech recognition device 2 can identify the audio signal as a single piece of information with a probability of nearly 100%, that information is put into text as confirmed information and written directly to the conversation record buffer. Otherwise, that is, when the extraction of a piece of information is only probabilistic, a plurality of candidates for that information are written to the conversation record buffer together with likelihood information based on statistics or the like. By referring to this likelihood information when confirming unconfirmed information, the document creator can make better-informed decisions.

Note that if a recording device or a communication line is inserted between the microphone 1 and the input of the speech recognition device 2, speech recognition need not be performed at the time or place where the conversation takes place.

The speech recognition device 2 is also used when the multimodal dialogue control unit 6 collects information interactively from the document creator: it converts the information the document creator provides by voice into a character string or the like and sends it to the multimodal dialogue control unit 6.

The information extraction unit 3 takes the transcription-mode output of the speech recognition device 2 as input, extracts the information to be recorded, and writes it to the conversation record buffer 7. The information to be extracted, that is, the information to be kept as a record, is selected in advance according to the type of conversation expected. For example, if the system is intended for use at a travel agency counter, the information to be kept as a record comprises the parts of the conversation covering the purpose of the trip, the number of travelers, names, contact details, the travel schedule, the destination, the price, accommodation, the payment method, and so on.

Taking "purpose of the trip" as an example, if the conversation contains words (information) such as honeymoon, company outing, graduation trip, individual trip, group trip, business trip, inspection trip, parent-and-child trip, summer retreat, sea bathing, skiing, golf, scuba diving, theatergoing, watching sports, or attending an event, the information extraction unit 3 extracts them as "information to be kept as a record" and records them in the conversation record buffer 7. For "number of travelers", if the conversation contains a number word combined with "nin" (the Japanese counter for people), or a word that implies two people such as "married couple" or "newlywed", the information extraction unit 3 likewise extracts it as "information to be kept as a record" and records it in the conversation record buffer 7.
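The keyword-based extraction described above might be sketched as follows for English text. This is a simplified illustration, not the specification's method: the word lists are a small subset, the regular expression stands in for the "number word plus nin" pattern, and the function and constant names are hypothetical.

```python
import re
from typing import Dict

# Illustrative subsets of the vocabulary listed in the text (assumed, English only).
PURPOSE_WORDS = ["honeymoon", "graduation trip", "business trip", "skiing", "golf", "scuba diving"]
TWO_PEOPLE_WORDS = ["married couple", "newlywed"]
# English stand-in for "number word + nin": digits followed by "people" or "persons".
PEOPLE_PATTERN = re.compile(r"(\d+)\s+(?:people|persons)")


def extract(transcript: str) -> Dict[str, str]:
    """Pull the trip purpose and the number of travelers out of a transcript."""
    text = transcript.lower()
    found: Dict[str, str] = {}
    for word in PURPOSE_WORDS:
        if word in text:
            found["purpose"] = word
            break
    match = PEOPLE_PATTERN.search(text)
    if match:
        found["people"] = match.group(1)
    elif any(word in text for word in TWO_PEOPLE_WORDS):
        found["people"] = "2"  # words such as "married couple" imply two travelers
    return found
```

An explicit number takes precedence over an implied one in this sketch; that ordering is a design choice made here, not something the text prescribes.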

The topicality control unit 4 refers to the database 5 to estimate, from one piece of information in the conversation record held in the conversation record buffer 7, the content that should be entered as other information, and writes it to the conversation record buffer 7. If the estimate is not uniquely determined, it writes a plurality of candidates for that information to the conversation record buffer 7 together with likelihood values. The likelihood values are calculated from the information described in the database 5, for example by a Bayesian method.
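The text names a Bayesian method only as one example and gives no formula. As a simpler stand-in, the sketch below scores each candidate by its conditional relative frequency among the stored cases that share the known information. The case layout (a list of dictionaries) and the function name `estimate` are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

Case = Dict[str, str]


def estimate(
    cases: List[Case],
    known_item: str,
    known_value: str,
    target_item: str,
) -> List[Tuple[str, float]]:
    """Rank candidate values for `target_item` given one known item.

    The score is the share of matching cases holding each value, i.e. an
    empirical estimate of P(target = v | known_item = known_value).
    Returns an empty list when no stored case matches.
    """
    matching = [
        case[target_item]
        for case in cases
        if case.get(known_item) == known_value and target_item in case
    ]
    counts = Counter(matching)
    total = sum(counts.values())
    return [(value, n / total) for value, n in counts.most_common()]
```

A single returned candidate corresponds to the uniquely determined case; several candidates correspond to the case where the values are written to the buffer together with their likelihoods.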

The case database 5 stores cases used to estimate the content of other information from one piece of information given in the conversation record buffer 7. In practice, for the information the document creator has confirmed in the conversation record buffer 7, the relationships among the contents of the individual pieces of information are stored as cases. In the travel agency counter example above, for the single piece of information "honeymoon", many cases are stored along the following lines: the "number of travelers" is two; the "travel schedule" is three days to two weeks; the "destination" is mainly overseas, such as Hawaii, the Mediterranean, or the West Coast of the United States; the "airline" is a Japanese carrier such as JAL or ANA, in business class; on "cost", the couple stays in an ocean-view suite even if it is somewhat expensive; and "payment" is made in advance in cash.

Under these circumstances, suppose that although "honeymoon" is recorded as one item of information in the conversation record buffer 7, the "number of people" is recognized as three and the "travel destination" as Atami. Because "honeymoon" and a party of three are inconsistent, the multimodal dialogue control unit 6 displays "two people" as the first candidate and "three people" as the second candidate, together with their likelihood information, and queries the document creator. In this example, the number of people was two in every stored case, so the likelihood of the former is 1.000 and that of the latter is 0.000. After checking with the customer, for instance, the document creator clicks "two people" to confirm it if that is correct, or selects and confirms "three people" if that is correct. If one of the partners has a child, for example, a honeymoon taken by three people is possible even for newlyweds.
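The inquiry in this example can be sketched as follows, assuming that likelihood is simply the relative frequency of each party size among the stored honeymoon cases (an assumption; the specification does not fix the calculation). The function `build_query` and the case layout are hypothetical.

```python
def build_query(cases, purpose, recognized_size):
    """List the candidate party sizes for a purpose, most likely first.

    The recognized value is always offered as a candidate, even if it never
    appears in the stored cases, so the document creator can still confirm it.
    """
    sizes = [case["party_size"] for case in cases if case["purpose"] == purpose]
    values = set(sizes) | {recognized_size}
    scored = [
        (value, sizes.count(value) / len(sizes) if sizes else 0.0)
        for value in values
    ]
    # Sort by descending likelihood, then by value for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [f"{value} people (likelihood {p:.3f})" for value, p in scored]


# Hypothetical stored cases: every honeymoon so far had two travelers.
honeymoon_cases = [{"purpose": "honeymoon", "party_size": 2} for _ in range(3)]

print(build_query(honeymoon_cases, "honeymoon", 3))
# ['2 people (likelihood 1.000)', '3 people (likelihood 0.000)']
```

Keeping the zero-likelihood candidate in the list matters here, since the text notes that a party of three can still be the correct answer.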

Alternatively, the recognition of "honeymoon" (新婚旅行, shinkon ryokō) may itself be wrong: the customer may have shortened "new employee training trip" (新人研修旅行) to "shinken ryokō" (新研旅行) in conversation, and the speech recognition device 2 may have misrecognized this as "shinkon ryokō". In that case, the document creator corrects the "honeymoon" entry to "shinken ryokō" using the input device 10, such as a keyboard, and then confirms it. A new employee training trip is also consistent with Atami as the travel destination. Note that at the initial stage of use no cases have yet been accumulated through learning, so the expected interrelationships among items of information are written in by the designer.

To supply likelihood information to the database 5, the topicality control unit 4 stores the information that the document creator has confirmed in the conversation record buffer 7 in the database 5 as a case. The multimodal dialogue control unit 6 reads the conversation record buffer 7 and collects information by querying the document creator about items whose content is not yet confirmed. If spoken dialogue is judged appropriate, the query is made by voice; otherwise, the information is collected through the GUI or the character input device 10.

At the request of the document creator, the multimodal dialogue control unit 6 can present the contents of the conversation record buffer 7 to the document creator through the graphic display device 13, or through the speech synthesis device 8 and the sound generation device 9. In particular, it can present the transcription produced by the speech recognition device 2 and held in the conversation record buffer 7 on the graphic display device 13. When collecting information interactively from the document creator, if the choices for the user's utterance are limited, the multimodal dialogue control unit 6 can improve recognition accuracy by sending those choices to the speech recognition device 2 in advance. For example, when a customer wishes to travel to Okinawa, Okinawan place names and their readings are sent to the speech recognition device 2 beforehand. This raises the probability that, when the customer names "Kunigami-gun Ōgimi-son" as the place to stay, it is correctly recognized as 国頭郡大宜味村 (Ōgimi Village, Kunigami District).

The conversation record buffer 7 holds the content of each item of information in the formatted dialogue. When the content is not yet confirmed (unconfirmed information), a list of candidates is given for it. Each candidate may be assigned a likelihood. The conversation record buffer 7 also holds the transcription produced by the speech recognition device 2.
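One way to picture the conversation record buffer described above is as a small data structure holding the transcription plus, for each item, either a confirmed value or a list of candidates with optional likelihoods. The class and field names below are hypothetical; the specification does not prescribe a layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Candidate:
    value: str
    # A likelihood "may be" given, so it is optional here.
    likelihood: Optional[float] = None


@dataclass
class Item:
    name: str
    confirmed: Optional[str] = None
    candidates: List[Candidate] = field(default_factory=list)

    def confirm(self, value: str) -> None:
        """Fix the content of this item and discard the open candidates."""
        self.confirmed = value
        self.candidates = []


@dataclass
class ConversationRecordBuffer:
    # Transcription produced by the speech recognition device.
    transcript: List[str] = field(default_factory=list)
    # Items of information in the formatted dialogue, keyed by name.
    items: Dict[str, Item] = field(default_factory=dict)

    def unconfirmed(self) -> List[Item]:
        """Items the dialogue control unit still has to ask about."""
        return [item for item in self.items.values() if item.confirmed is None]
```

With this layout, the dialogue control unit would iterate over `unconfirmed()` to decide what to ask, and call `confirm()` once the document creator has chosen a value.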

The speech synthesis device 8 converts messages from the multimodal dialogue control unit 6 to the document creator from text into speech signals.

The sound generation device 9 converts the speech signals from the speech synthesis device 8 into sound.

The character input device 10, such as a keyboard, converts typed input from the document creator into signals for the multimodal dialogue control unit 6.

The GUI control unit 11 displays option-input requests from the multimodal dialogue control unit 6 on the graphic display device 13, and conveys the selection that the document creator makes with the pointing device 12 to the multimodal dialogue control unit 6.

The pointing device 12 conveys the document creator's selection (for example, a mouse click at a certain position) to the GUI control unit 11.

The graphic display device 13 displays input requests from the multimodal dialogue control unit 6 and, at the request of the document creator, the contents of the conversation record buffer 7.

The conversation record archive 14 saves the contents of the conversation record buffer 7 at the request of the document creator.

As described above, according to the present invention, a formatted record can be created interactively using the information contained in the transcription produced by speech recognition. By using likelihood information based on statistics or the like, the present invention orders or narrows down the information from which the user must choose, making the creation of formatted descriptions more efficient.

FIG. 2 is a block diagram of an embodiment of the formatted document creation system according to the present invention for conversations conducted over a telephone.

Conversation and consultation by telephone remain an important means of communication even today, when the Internet is well developed. For example, elderly people who can hardly leave home often find it difficult to learn to operate a computer and often lack the means to buy the necessary equipment. Medical consultations and the like are therefore expected to be conducted by telephone in the future. In that case, the physician must of course conduct the telephone conversation properly, and must also keep a record of the consultation, the prescriptions, and any other instructions. In a face-to-face consultation, it causes no problem if the physician takes time to write in the medical chart as needed. In a consultation by telephone, however, patients tend to become anxious when the conversation is interrupted for a long time, so there has been a demand for a system that can record the conversation in real time.

This embodiment meets that demand. Basically, the microphone 1 in the system of FIG. 1 described above is replaced by a telephone 15, and a caller telephone number acquisition device 16, which acquires the other party's telephone number and records it in the conversation record buffer 7, is added. In this embodiment, the speech recognition device 2 can be configured to be switchable so that it recognizes only the outgoing speech, only the incoming speech, or both. In general, the recognition rate of speech recognition improves as the number of kinds of sound source decreases. Accordingly, even though it takes somewhat longer, the physician can repeat the patient's statements so that only the physician's speech is kept as the conversation record. In situations where the record need not be made in real time and the information to be recorded is contained only in the other party's speech, the system can instead be configured to recognize only the other party's speech.

The caller telephone number acquisition device 16 acquires the other party's telephone number and records it in the conversation record buffer 7. Various information about the speaker at the other end of the line can be obtained from this telephone number, for example name, sex, age, whether the caller holds a health insurance card, disease records, and treatment history. This makes it easier to infer other unconfirmed information from one item of information in the sentences recorded in the conversation record buffer. Of course, the system can also be configured so that the other party's telephone number is asked for during the conversation and then used to obtain the same information from a database stored on the document creator's computer.
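As a rough sketch of this lookup, assuming the patient information is held in a dictionary keyed by telephone number and the buffer is a plain dictionary (both assumptions), the caller's number can be used to pre-fill items before the conversation starts. The directory contents, the key names, and the function `prefill_from_caller` are hypothetical and the data is fictitious.

```python
from typing import Dict

# Hypothetical patient directory keyed by telephone number (fictitious data).
PATIENT_DIRECTORY: Dict[str, Dict[str, str]] = {
    "03-0000-0000": {"name": "Patient A", "age": "72", "insurance_card": "yes"},
}


def prefill_from_caller(buffer, caller_number, directory=PATIENT_DIRECTORY):
    """Record the caller's number and copy any known profile fields.

    Fields already present in the buffer are kept, so information confirmed
    during the conversation is not overwritten by the stored profile.
    """
    buffer["caller_number"] = caller_number
    for key, value in directory.get(caller_number, {}).items():
        buffer.setdefault(key, value)
    return buffer
```

An unknown number simply leaves the buffer with the number alone, which corresponds to the fallback of asking the patient directly.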

(Example 1)

Next, a formatted document creation system intended for medical consultation by telephone is described. First, the designer writes the expected interrelationships among items of information into the case database 5, because at the initial stage of use no cases have yet been accumulated through learning.

The information to be recorded includes "body temperature", "blood pressure", "concerns", "symptoms", "what time", "pain", "nausea", "chills", "cough", "breathing", "diarrhea", "where", "how", "who", "what", "health insurance card", "payment", and "diagnosis". When the speech recognition device 2 outputs words relating to these, the information extraction unit 3 records them in the conversation record buffer 7 as information to be recorded.

Meanwhile, the case database 5 accumulates, as cases relating to a cold for example, sentences such as "I have a fever of nearly 40 degrees, and I am coughing and sneezing.", "I have no fever, but the joints in my arms and legs hurt and my face feels flushed.", "I have chills and my nose will not stop running.", "I have a headache, a cough, and occasionally yellow phlegm.", and "My throat feels scratchy, and the back of my chest hurts a little when I cough." In the case of a cold, there is no association with "blood pressure", while the association with the various physical abnormalities that appear as "symptoms" has a high likelihood. For other illnesses as well, various cases containing multiple items of information to be recorded are accumulated in the same way.

This system is used to record a telephone consultation between a physician and a patient. FIG. 3 shows the content of the conversation between the physician and the patient. In this example, both sides of the telephone call are transcribed and recorded.

The speech recognition device 2 converts all signals sent and received by the telephone 15 into speech signals and outputs them to the information extraction unit 3. The words described above are registered in the memory of the information extraction unit 3 as information to be recorded. In the speech recognition device 2's transcription of utterances A1 to A9, utterances A3 to A8 contain information designated for recording; in its transcription of utterances B1 to B9, utterances B2 to B4 and B6 contain such information. The information designated for recording is underlined in the table.

The information extraction unit 3 records the transcription containing this information in the conversation record buffer 7 as follows: "What seems to be the trouble? I have a fever and chills. When I took my temperature it was nearly 40 degrees.", "Do you have any other noticeable symptoms? My joints hurt, and my [misrecognized word] feels scratchy.", "How about [misrecognized word] or sneezing? No. But the back of my lungs hurts a little.", "How about a runny nose? No.", "When did the fever start? Since last night. My joints only started hurting this morning.", "It's a cold.", "Keep warm and rest for the day. Please also take an over-the-counter antipyretic." Here, for ease of understanding, the parts of the transcription by the speech recognition device 2 that differ from the actual conversation are shown in bold and underlined.

From the case "My throat feels scratchy, and the back of my chest hurts a little when I cough." recorded in the case database 5, the topicality control unit 4 selects "throat" as the candidate for the first misrecognized word, substitutes it, and updates the record in the conversation record buffer 7. Likewise, from the case "I have a fever of nearly 40 degrees, and I am coughing and sneezing.", it selects "cough" as the candidate for the second misrecognized word, substitutes it, and updates the record in the conversation record buffer 7. Under the control of the GUI control unit 11, the multimodal dialogue control unit 6 then displays the following sentences on the GUI of the monitor: "What seems to be the trouble?", "I have a fever and chills. When I took my temperature it was nearly 40 degrees.", "Do you have any other noticeable symptoms?", "My joints hurt, and my throat feels scratchy.", "How about coughing or sneezing?", "No. But the back of my lungs hurts a little.", "How about a runny nose?", "No.", "When did the fever start?", "Since last night. My joints only started hurting this morning.", "It's a cold.", and "Keep warm and rest for the day. Please also take an over-the-counter antipyretic." Of course, instead of the GUI, the speech synthesis device 8 and the sound generation device 9 can be used to query the document creator by voice.
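The specification does not say how the topicality control unit picks the replacement word from a case. One simple stand-in, shown here purely as an assumption, is to look for a word in the stored case sentences that sits between the same neighboring words as the misrecognized token. The function name, the English example sentences, and the premise that the position of the misrecognized token is already known are all hypothetical.

```python
from collections import Counter


def propose_from_cases(tokens, bad_index, case_sentences):
    """Propose replacements for tokens[bad_index] from stored case sentences.

    A word is proposed when, in some case sentence, it has the same left and
    right neighbors as the misrecognized token. Proposals are returned with
    the most frequently matching word first.
    """
    prev_word = tokens[bad_index - 1] if bad_index > 0 else None
    next_word = tokens[bad_index + 1] if bad_index + 1 < len(tokens) else None

    counts = Counter()
    for sentence in case_sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            left = words[i - 1] if i > 0 else None
            right = words[i + 1] if i + 1 < len(words) else None
            if left == prev_word and right == next_word:
                counts[word] += 1
    return [word for word, _ in counts.most_common()]


# Hypothetical English renderings of two stored cold cases.
case_sentences = [
    "my throat feels scratchy and my chest hurts a little when i cough",
    "the fever is close to 40 degrees with coughing and sneezing",
]
tokens = ["my", "???", "feels", "scratchy"]
proposals = propose_from_cases(tokens, 1, case_sentences)
```

In this sketch `proposals` contains only "throat"; when several words match, each would be offered to the document creator as a candidate rather than substituted silently.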

Using the character input device 10, or the pointing device 12 working together with the graphic display device 13 such as a monitor, the document creator selects and confirms the correct content for each item queried. In the illustrated preferred example there was only one candidate for each unconfirmed item, but there will be two or more when the recognition accuracy of the speech recognition device 2 is low or when several candidates have similar names. In that case, as noted above, it is preferable to attach likelihood information to each candidate.

The multimodal dialogue control unit 6 further rewrites the confirmed conversation record into formatted sentences containing each item of information and records them in the conversation record buffer 7. In this example, the following sentences are recorded as the confirmed conversation record: "Fever of 40 degrees, with chills.", "Joints hurt, and throat feels scratchy.", "The back of the lungs hurts a little.", "The fever started last night.", "The joints started hurting this morning.", "It is a cold.", "Keep warm and rest for the day.", and "Take an antipyretic." This conversation record is stored in the conversation record archive 14 and is also saved in the case database 5 as a case containing multiple items of information to be recorded (FIG. 4). The information in the patient remarks column is obtained when the caller telephone number acquisition device 16 acquires the other party's telephone number at the moment the patient calls the physician, but it can of course also be obtained by asking the patient directly.

Claims

The scope of the claims

1. A formatted document creation system used for recording a conversation, comprising:

a speech recognition device for recognizing the speech of the conversation;

an information extraction unit for extracting information to be recorded from the speech recognition result;

a conversation record buffer for recording sentences containing the extracted information as the content of the conversation;

a case database for storing cases that are each composed of a plurality of items of information and can occur as the content of a conversation;

a topicality control unit for inferring, with reference to the cases stored in the case database, other unconfirmed information from one item of information in the sentences recorded in the conversation record buffer;

a multimodal dialogue control unit for querying the document creator about the unconfirmed information in the conversation record in order to confirm it;

an input/output device control unit for controlling communication with the document creator using graphics;

an input device for entering the document creator's intention into the computer system; and

a conversation record archive for storing the record of the confirmed conversation.

2. The formatted document creation system according to claim 1, wherein the input/output device is controlled through a GUI, the input device includes a character input device, a pointing device, or a combination thereof, and the output device includes a speech synthesis device, a sound generation device, a graphic display device, or a combination thereof.

3. The formatted document creation system according to claim 1, wherein the topicality control unit is configured to store the information confirmed by the document creator in the conversation record buffer in the case database as a case.

4. The formatted document creation system according to claim 1, wherein, when the extraction of a certain item of information is probabilistic, the information extraction unit is configured to write a plurality of candidates for the content of that information into the conversation record buffer together with likelihood information based on statistics or the like.

5. The formatted document creation system according to claim 1, wherein, when the estimate of the content to be entered for an item of information in the conversation record buffer is not uniquely determined, the topicality control unit is configured to write a plurality of candidates into the conversation record buffer together with likelihood information based on statistics or the like.

6. The formatted document creation system according to claim 1, wherein the multimodal dialogue control unit is configured to confirm the information in the sentences recorded in the conversation record buffer by querying the document creator by voice, GUI, or character input.

7. The formatted document creation system according to claim 5, wherein the multimodal dialogue control unit is configured to query the document creator about the information to be queried in the conversation record buffer in descending order of likelihood.

8. The formatted document creation system according to any one of claims 1 to 7, further comprising: a telephone that enables conversation between distant locations; and a caller telephone number acquisition device that acquires the other party's telephone number and records it in the conversation record buffer.

9. A formatted document creation method used for recording a conversation, comprising:

a speech recognition step of recognizing the speech of the conversation;

an information extraction step of extracting information to be recorded from the speech recognition result;

a writing step of writing sentences containing the extracted information into a conversation record buffer as the content of the conversation;

a case storing step of storing, in a case database, cases that are each composed of a plurality of items of information and can occur as the content of a conversation;

a guess information setting step of inferring, with reference to the cases stored in the case database, other unconfirmed information from one item of information in the sentences recorded in the conversation record buffer, and setting the inferred information in the conversation record buffer;

an information inquiry updating step of presenting the sentences containing the unconfirmed information recorded in the conversation record buffer to the document creator in graphic form, querying the document creator about the unconfirmed information, and confirming its content; and

a conversation recording step of storing the record of the confirmed conversation as sentences.

10. The formatted document creation method according to claim 9, wherein the information inquiry updating step is performed using an input device that includes any one of a speech synthesis device, a sound generation device, a character input device, a pointing device, and a graphic display device, or a combination thereof.

11. The formatted document creation method according to claim 9, further comprising a case addition step of additionally storing the information confirmed by the document creator in the conversation record buffer in the case database as a case.

12. The formatted document creation method according to claim 9, wherein, when the extraction of a certain item of information is probabilistic, the writing step writes a plurality of candidates for the content of that information into the conversation record buffer together with likelihood information based on statistics or the like.

13. The formatted document creation method according to claim 9, wherein, when the estimate of the content to be entered for an item of information in the conversation record buffer is not uniquely determined, the guess information setting step writes a plurality of candidates into the conversation record buffer together with likelihood information based on statistics or the like.

14. The formatted document creation method according to claim 9, wherein the information inquiry updating step confirms the information in the sentences recorded in the conversation record buffer by querying the document creator by voice, GUI, or character input.

15. The formatted document creation method according to claim 13, wherein the guess information setting step queries the document creator about the information to be queried in the conversation record buffer in descending order of candidate likelihood.

16. The formatted document creation method according to any one of claims 9 to 15, wherein the speech recognition step recognizes a conversation conducted over a telephone, the method further comprising a caller telephone number acquisition step of acquiring the other party's telephone number and recording it in the conversation record buffer.

17. The formatted document creation method according to claim 16, wherein the guess information setting step refers to the cases stored in the case database on the basis of the other party's telephone number recorded in the conversation record buffer, and infers other unconfirmed information from one item of information in the sentences recorded in the conversation record buffer.