JP2015076774A

JP2015076774A - Communication support system, communication support method, and communication support program

Info

Publication number: JP2015076774A
Application number: JP2013212625A
Authority: JP
Inventors: Masayoshi Shimomoto; Tomoyuki Tanaka; Kazuhiro Omura; Yuki Nagaoka; Koichi Suzuki; Koichi Mazaki
Original assignee: Mizuho Information and Research Institute Inc
Current assignee: Mizuho Information and Research Institute Inc
Priority date: 2013-10-10
Filing date: 2013-10-10
Publication date: 2015-04-20
Anticipated expiration: 2033-10-10
Also published as: JP6293449B2

Abstract

[Problem] To provide a communication support system, a communication support method, and a communication support program for supporting efficient communication using a computer terminal. [Solution] A control unit 21 of a support server 20 acquires motion data of a conversation participant and executes sign language recognition processing using the motion data. The control unit 21 then outputs the sign language recognition result to user terminals 10 and 15. The control unit 21 also acquires the voice of a conversation participant, executes speech recognition processing, and outputs the speech recognition result to the user terminals 10 and 15. Further, the control unit 21 identifies words related to the speech recognition result, weights those related words, and uses the weights to adjust the sign language recognition processing. [Selected drawing] FIG. 1

Description

The present invention relates to a communication support system, a communication support method, and a communication support program for supporting communication performed using a computer terminal.

An explanation support system for efficiently explaining documents using computer terminals has been studied (see, for example, Patent Document 1). In the technique described in that document, the screens of two tablet terminals are synchronized.

Techniques for communicating with visually or hearing-impaired people have also been studied (see, for example, Patent Document 2). In the technique described in that document, the sender uses his or her mobile phone to input video such as sign language, characters typed on the keyboard, or voice from the microphone. The video and voice information, character information obtained by converting the video information "sign language → text" and the voice information "voice → text", and voice information obtained by converting the character information "text → voice" are then transmitted to the receiver's mobile phone. The receiver communicates by selecting the video, character, or voice information.

A technique has also been studied that converts sign language into text in real time to support smooth communication between hearing-impaired and hearing people and to realize a barrier-free society, letting users interact with a computer through body movements and voice without holding or touching anything (see, for example, Non-Patent Document 1).

Patent Document 1: Japanese Patent Laid-Open No. 2013-25608 (page 1, FIG. 1)
Patent Document 2: Japanese Patent Laid-Open No. 2004-248022 (page 1, FIG. 1)

Non-Patent Document 1: Mizuho Information & Research Institute and Chiba University, "Mizuho Information & Research Institute and Chiba University begin joint development of a 'Sign Language Recognition System'", [online], September 4, 2013, Mizuho Information & Research Institute website, [retrieved September 9, 2013], Internet <http://www.mizuho-ir.co.jp/company/release/2013/shuwa0904.html>

As described above, sign language recognition technology is being studied to realize a barrier-free environment. However, when a given input is recognized and converted into text, accurate recognition may be difficult depending on the recognition method and the environment. For example, if the recognition process identifies several text candidates with different meanings (homonyms) for a particular sign language motion or utterance, smooth communication suited to the purpose of the conversation may not be possible. Moreover, a participant who sees only the recognized text cannot grasp the other party's state, which can prevent accurate mutual understanding.

The present invention has been made to solve the above problems, and its object is to provide a communication support system, a communication support method, and a communication support program for supporting efficient communication using a computer terminal.

(1) A communication support system that solves the above problems includes a first acquisition unit that acquires first input information, a second acquisition unit that acquires second input information, a control unit that generates recognition results by converting the first and second input information into text, and an output unit that outputs the recognition results. The control unit includes a first recognition processing unit that performs text conversion based on the first input information acquired by the first acquisition unit, a second recognition processing unit that performs text conversion based on the second input information acquired by the second acquisition unit, and a recognition adjustment unit that, based on the recognition result of at least one of the first and second recognition processing units, adjusts the text conversion of the other recognition processing unit. With this configuration, communication using different recognition methods can be realized. Furthermore, the recognition result of one method can be used to support the recognition processing of the other.

(2) In the above communication support system, it is preferable to weight the word candidates contained in the recognition result of whichever of the first and second recognition processing units has the higher recognition rate, and to adjust the text conversion of the other recognition processing unit based on that weighting. Because the participants in a conversation share a common topic, weighting word candidates based on the recognition result with the higher recognition rate (the rate of correct recognition) can support the recognition processing with the lower recognition rate.

(3) In the above communication support system, it is preferable to calculate a recognition rate for each recognition processing unit according to the corrections made to its recognition results, and to identify the recognition processing unit with the higher recognition rate based on those rates. This makes it possible to identify the unit with the higher recognition rate from the correction history and to use it to support the recognition processing with the lower rate. Therefore, even when the relative recognition rates of the first and second recognition processing units change with circumstances, recognition support suited to the situation can be provided.
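The text does not define how the recognition rate is computed from corrections. As a minimal sketch, assuming the rate is the share of recognized words that the user left uncorrected in the word candidate selection steps, the rate tracking and the choice of the supporting recognition processing unit might look as follows (all names are hypothetical):

```python
from dataclasses import dataclass


@dataclass
class RecognizerStats:
    """Running tally of how often a recognizer's output was accepted unchanged (sketch)."""
    name: str
    recognized: int = 0  # words output by the recognizer
    corrected: int = 0   # words the user replaced from the candidate list

    def record(self, was_corrected: bool) -> None:
        self.recognized += 1
        if was_corrected:
            self.corrected += 1

    @property
    def rate(self) -> float:
        # Assumed definition: share of outputs the user left as-is.
        if self.recognized == 0:
            return 0.0
        return (self.recognized - self.corrected) / self.recognized


def higher_rate(a: RecognizerStats, b: RecognizerStats) -> RecognizerStats:
    """Return the recognizer whose results should weight the other's (ties go to a)."""
    return a if a.rate >= b.rate else b


sign = RecognizerStats("sign language")
speech = RecognizerStats("speech")
for c in [False, True, True, False]:
    sign.record(c)
for c in [False, False, False, True]:
    speech.record(c)
print(sign.rate, speech.rate, higher_rate(sign, speech).name)  # 0.5 0.75 speech
```

Because the tally is updated on every confirmation, the unit chosen as the "higher-rate" side can switch during a session, which is the situation described in (3).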

(4) In the above communication support system, it is preferable to acquire a face image together with at least one of the first and second input information, and to output the face image to the output unit in association with the recognition result corresponding to that input information. This allows the participants to communicate while checking the other party's facial expression.

(5) In the above communication support system, it is preferable to output the recognition result so that it does not overlap the region in which the face image is displayed. This allows the participants to check the facial expression together with the recognition result on the output unit while communicating.
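One simple way to satisfy this condition, assuming the face image and each text area are axis-aligned rectangles in screen pixels, is to test candidate text areas for overlap with the face region and use the first free one. The Rect type, the place_text function, and the coordinates below are illustrative assumptions, not part of the described system:

```python
from typing import List, NamedTuple, Optional


class Rect(NamedTuple):
    x: int
    y: int
    w: int
    h: int

    def overlaps(self, other: "Rect") -> bool:
        # Two axis-aligned rectangles overlap unless one lies entirely to one side of the other.
        return not (self.x + self.w <= other.x or other.x + other.w <= self.x or
                    self.y + self.h <= other.y or other.y + other.h <= self.y)


def place_text(face: Rect, candidates: List[Rect]) -> Optional[Rect]:
    """Return the first candidate text area that does not cover the face image, or None."""
    for area in candidates:
        if not area.overlaps(face):
            return area
    return None


face = Rect(0, 0, 800, 600)
candidates = [Rect(100, 500, 400, 80), Rect(0, 620, 800, 100)]
print(place_text(face, candidates))  # Rect(x=0, y=620, w=800, h=100)
```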

According to the present invention, efficient communication using a computer terminal can be supported.

FIG. 1 is an explanatory diagram of a communication support system according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram of the processing procedure of this embodiment, in which (a) shows the sign language handling process and (b) shows the voice handling process.
FIG. 3 is an explanatory diagram of the processing procedure of this embodiment, in which (a) shows the recognition adjustment process and (b) shows the sign language recognition process.
FIG. 4 is an explanatory diagram of the display screen on a user terminal in this embodiment.
FIG. 5 is an explanatory diagram of the processing procedure of another embodiment, in which (a) shows the sign language handling process, (b) shows the voice handling process, and (c) shows the recognition adjustment process.
FIG. 6 is an explanatory diagram of the processing procedure of another embodiment.
FIG. 7 is an explanatory diagram of the processing procedure of another embodiment.
FIG. 8 is an explanatory diagram of the processing procedure of another embodiment.

<First Embodiment>
An embodiment of a communication support system embodying the present invention will be described below with reference to FIGS. 1 to 4. This embodiment describes a case in which a customer visiting a financial institution makes a request, such as for a procedure, at the counter and a counter clerk handles it. Here, it is assumed that the customer communicates in sign language and the counter clerk communicates by speaking. Two recognition techniques, sign language recognition for the first input information (sign language motions) and speech recognition for the second input information (voice), are combined to support the communication. Specifically, assuming that speech recognition has a higher recognition rate (rate of correct recognition) than sign language recognition, the speech recognition results are used to support sign language recognition.

As shown in FIG. 1, this embodiment uses user terminals 10 and 15 (output units) installed at the counter of a financial institution. The user terminals 10 and 15 are connected to a support server 20 via a network.

The user terminal 10 is a computer terminal (tablet terminal) used by the counter clerk who handles visiting customers' requests, and the user terminal 15 is a computer terminal (tablet terminal) used by the visiting customer. The user terminals 10 and 15 communicate with the support server 20 by wireless LAN or the like. The communication method is not limited to wireless communication; wired communication may also be used.

The user terminals 10 and 15 each include a control unit and a touch panel display.
The touch panel display functions as an input/output means: it outputs information on the display and, when it detects a touch on the display surface, identifies the touch position (coordinates) and performs various operation processes (pointing, key input, and so on). For example, for a written conversation, handwriting is entered on the touch panel display.

The support server 20 is a computer system for supporting communication using the user terminals 10 and 15. The support server 20 includes a control unit 21, a sign language recognition dictionary 22, a speech recognition dictionary 23, and a related word storage unit 24. A camera 31 and a microphone 32 are also connected to the support server 20.

In the sign language recognition dictionary 22 (first recognition dictionary), word data is recorded for each motion pattern (feature quantity) used in sign language.
In the speech recognition dictionary 23 (second recognition dictionary), word data is recorded for each speech pattern (feature quantity) used in utterances.
In the related word storage unit 24, mutually related words are associated and registered as groups. Words belonging to the same group are treated as related words of one another.

The camera 31, serving as the first acquisition unit, functions as an imaging means. Here, the camera 31 captures the customer's face and movements (motion including depth information) and generates them as motion data. These movements include sign language motions such as upper-arm and mouth movements, and therefore contain the information needed to identify the content of the signing.
The microphone 32, serving as the second acquisition unit, functions as a sound collection means. Here, the microphone 32 collects the counter clerk's voice.

The control unit 21 includes control means (CPU, RAM, ROM, etc.) and performs the processes described later (the sign language recognition stage, speech recognition stage, recognition adjustment stage, terminal control stage, writing support stage, and so on). By executing a communication support program for this purpose, the control unit 21 functions as a sign language recognition unit 211, a speech recognition unit 212, a recognition adjustment unit 213, a terminal control unit 214, and a writing support unit 215, as shown in FIG. 1.

The sign language recognition unit 211, serving as the first recognition processing unit, identifies sign language motions in the customer's movements captured by the camera 31, using body image patterns of the mouth, arms, hands, and so on. The sign language recognition unit 211 then calculates feature quantities of these movements and converts the signing into text using the sign language recognition dictionary 22.

The speech recognition unit 212, serving as the second recognition processing unit, calculates feature quantities of the counter clerk's voice collected by the microphone 32. Based on these feature quantities, the speech recognition unit 212 converts the speech into text using the speech recognition dictionary 23.

The recognition adjustment unit 213 adjusts the recognition method using the text generated by sign language recognition and the text generated by speech recognition. The recognition adjustment unit 213 includes a weighting memory, which stores the related words of speech-recognized words in association with their frequencies (weights). The weighting memory is reset when the counter has finished handling one procedure request from a visiting customer.

The terminal control unit 214 controls display and input on the touch panel displays of the user terminals 10 and 15. In this embodiment, it displays text generated by sign language or speech recognition, and character images from written conversation, on the touch panel displays, and acquires information entered by touch on the touch panel displays.
The writing support unit 215 generates an image of the drawn characters based on the touch input trajectories acquired from the user terminals 10 and 15.

Next, the operation of this communication support system will be described with reference to FIGS. 2 to 4, in the following order: the sign language handling process, voice handling process, recognition adjustment process, sign language recognition process, and output process.

(Sign language handling process)
First, the sign language handling process for the visiting customer will be described with reference to FIG. 2(a).
Here, the control unit 21 of the support server 20 executes an imaging process (step S1-1). Specifically, the sign language recognition unit 211 of the control unit 21 uses the camera 31 to capture the visiting customer's movements and generate motion data, and acquires this motion data from the camera 31.

Next, the control unit 21 of the support server 20 determines whether handwritten input has been made (step S1-2). Specifically, the sign language recognition unit 211 of the control unit 21 determines whether a touch input has been made on the touch panel display of the user terminal 15. If the touch input draws a continuous trajectory on the touch panel display, it is determined to be handwritten input.
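The criterion for a "continuous trajectory" is not specified. A minimal sketch, assuming touch samples arrive as (time, x, y) tuples and using hypothetical thresholds for the number of points, the maximum gap between samples, and the minimum stroke length, could be:

```python
from typing import Sequence, Tuple

TouchSample = Tuple[float, float, float]  # (time in seconds, x, y); assumed format


def is_handwriting(samples: Sequence[TouchSample],
                   min_points: int = 5,
                   max_gap_s: float = 0.05,
                   min_length_px: float = 20.0) -> bool:
    """Treat a touch as handwritten input if it forms one continuous trajectory.

    All three thresholds are illustrative assumptions, not values from the patent.
    """
    if len(samples) < min_points:
        return False  # a tap or very short touch is not a stroke
    length = 0.0
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        if t1 - t0 > max_gap_s:
            return False  # finger lifted or paused too long: not continuous
        length += ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
    return length >= min_length_px


stroke = [(i * 0.01, i * 10.0, 0.0) for i in range(6)]
print(is_handwriting(stroke), is_handwriting([(0.0, 5.0, 5.0)]))  # True False
```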

If handwritten input is determined ("YES" in step S1-2), the control unit 21 of the support server 20 executes a trajectory identification process (step S1-3). Specifically, the sign language recognition unit 211 of the control unit 21 hands the process over to the writing support unit 215, which acquires from the user terminal 15 the trajectory of the touch input on the touch panel display.
As described later, this trajectory is output to the touch panel displays of the user terminals 10 and 15 (step S1-7).

On the other hand, if there is no touch input on the touch panel display and the input is determined not to be handwritten ("NO" in step S1-2), the control unit 21 of the support server 20 executes the sign language recognition process (step S1-4). Specifically, the sign language recognition unit 211 of the control unit 21 identifies word candidates using the sign language recognition dictionary 22, based on the motion patterns contained in the motion data acquired from the camera 31. This process will be described later with reference to FIG. 3(b).

Next, the control unit 21 of the support server 20 executes a word candidate output process (step S1-5). Specifically, the sign language recognition unit 211 of the control unit 21 outputs the word candidates identified using the sign language recognition dictionary 22 to the touch panel display of the user terminal 15.

Next, the control unit 21 of the support server 20 executes a word candidate selection process (step S1-6). Specifically, the customer checks the word candidates displayed on the touch panel display of the user terminal 15. If a word recognized from the signing is wrong, the customer selects that word on the touch panel display. The sign language recognition unit 211 of the control unit 21 then outputs to the touch panel display of the user terminal 15 a list of other word candidates similar to the motion pattern, and the recognized word is confirmed when the customer selects the correct word on the touch panel display of the user terminal 15. If the recognized word is correct, it is confirmed simply by leaving it as is.

Next, the control unit 21 of the support server 20 executes an output process (step S1-7). Specifically, the terminal control unit 214 of the control unit 21 outputs the sign language recognition result to the touch panel displays of the user terminals 10 and 15. Details of the content output to the touch panel displays will be described later with reference to FIG. 4.

(Voice handling process)
Next, the voice handling process for the counter clerk will be described with reference to FIG. 2(b). This process is performed in parallel with the sign language handling process shown in FIG. 2(a).
Here, the control unit 21 of the support server 20 executes a voice acquisition process (step S2-1). Specifically, the speech recognition unit 212 of the control unit 21 acquires the counter clerk's voice using the microphone 32.

Next, the control unit 21 of the support server 20 executes a speech recognition process (step S2-2). Specifically, the speech recognition unit 212 of the control unit 21 calculates feature quantities of the voice collected by the microphone 32 and retrieves the word associated with these feature quantities from the speech recognition dictionary 23. Here, the speech pattern whose feature quantities are closest to those of the input voice is identified, and the word associated with that speech pattern is taken as the word candidate.
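Step S2-2 selects the dictionary pattern whose feature quantities are closest to the input. Practical speech recognizers rely on statistical acoustic and language models rather than a single distance, so the following is only an illustration of the "closest pattern" rule, assuming each dictionary entry is a fixed-length feature vector and using Euclidean distance (the vectors and the nearest_word name are hypothetical):

```python
import math
from typing import Dict, Sequence


def nearest_word(features: Sequence[float],
                 dictionary: Dict[str, Sequence[float]]) -> str:
    """Return the word whose stored pattern is closest (Euclidean) to the input features."""
    return min(dictionary, key=lambda w: math.dist(features, dictionary[w]))


d = {"account": [1.0, 0.0, 0.2], "transfer": [0.0, 1.0, 0.5]}
print(nearest_word([0.9, 0.1, 0.2], d))  # account
```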

Next, the control unit 21 of the support server 20 executes a word candidate output process (step S2-3). Specifically, the speech recognition unit 212 of the control unit 21 outputs the word candidates identified using the speech recognition dictionary 23 to the touch panel display of the user terminal 10.

Next, the control unit 21 of the support server 20 executes a word candidate selection process (step S2-4). Specifically, the counter clerk checks the word candidates displayed on the touch panel display of the user terminal 10. If a speech-recognized word is wrong, the clerk selects that word on the touch panel display. The speech recognition unit 212 of the control unit 21 then outputs to the touch panel display of the user terminal 10 a list of other word candidates similar to the speech pattern, and the recognized word is confirmed when the clerk selects the correct word on the touch panel display of the user terminal 10. If the recognized word is correct, it is confirmed simply by leaving it as is.

Next, the control unit 21 of the support server 20 executes the recognition adjustment process (step S2-5). This process will be described later with reference to FIG. 3(a).
Next, the control unit 21 of the support server 20 executes an output process (step S2-6). Specifically, the terminal control unit 214 of the control unit 21 outputs the speech recognition result to the touch panel displays of the user terminals 10 and 15. Details of the content output to the touch panel displays will be described later with reference to FIG. 4.

(Recognition adjustment process)
Next, the recognition adjustment process (step S2-5) will be described with reference to FIG. 3(a).
Here, the control unit 21 of the support server 20 executes a related word identification process (step S3-1). Specifically, the recognition adjustment unit 213 of the control unit 21 acquires from the speech recognition unit 212 the word identified by the speech recognition process, and then acquires the related words of that word from the related word storage unit 24.

Next, the control unit 21 of the support server 20 executes a weighting process (step S3-2). Specifically, the recognition adjustment unit 213 of the control unit 21 checks whether each related word acquired from the related word storage unit 24 is already recorded in the weighting memory. If it is not, the related word is recorded together with a frequency of 1. If it is already recorded, 1 is added to the frequency associated with that related word.
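Steps S3-1 and S3-2, together with the reset of the weighting memory described for the recognition adjustment unit 213, can be sketched as a frequency table. This assumes the related word storage unit 24 is a list of word groups and that a word is not counted as its own related word; the class, method names, and sample words are hypothetical:

```python
from collections import Counter
from typing import Iterable, Set


class WeightingMemory:
    """Frequency table of related words for one customer request (illustrative sketch)."""

    def __init__(self, groups: Iterable[Iterable[str]]):
        # Stand-in for the related word storage unit 24: each group holds mutually related words.
        self.groups = [set(g) for g in groups]
        self.freq: Counter = Counter()

    def related(self, word: str) -> Set[str]:
        # Step S3-1: every other word that shares a group with the recognized word.
        out: Set[str] = set()
        for group in self.groups:
            if word in group:
                out |= group - {word}
        return out

    def update(self, recognized_word: str) -> None:
        # Step S3-2: a new related word starts at frequency 1; a known one gains 1.
        for word in self.related(recognized_word):
            self.freq[word] += 1

    def reset(self) -> None:
        # Called when the counter finishes handling one procedure request.
        self.freq.clear()


memory = WeightingMemory([
    {"account", "open", "seal", "passbook"},
    {"account", "transfer", "fee"},
])
memory.update("account")   # the clerk says "account"
memory.update("passbook")  # then "passbook"
print(memory.freq["open"], memory.freq["fee"], memory.freq["account"])  # 2 1 1
```

Counter returns 0 for missing keys, so the "record with frequency 1" and "add 1" branches of step S3-2 collapse into a single increment.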

(Sign language recognition process)
Next, the sign language recognition process (step S1-4) will be described with reference to FIG. 3(b).
Here, the control unit 21 of the support server 20 executes a process of identifying the movements of the mouth and forearms (step S4-1). Specifically, the sign language recognition unit 211 of the control unit 21 identifies the mouth region and the upper-arm region in the motion data acquired from the camera 31, using body image patterns. The sign language recognition unit 211 then identifies, in the motion data, the movements of the identified mouth and upper arms (the sign language motions).

Next, the control unit 21 of the support server 20 executes a feature quantity extraction process (step S4-2). Specifically, the sign language recognition unit 211 of the control unit 21 calculates feature quantities relating to the direction and magnitude of the movements of the mouth and upper arms (the sign language motions).
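The text says only that the features describe the direction and magnitude of the movement. Assuming the motion data yields, for each frame, the (x, y, depth) position of a tracked body point such as a hand, one hypothetical feature set is the per-frame displacement direction on the image plane together with its 3-D magnitude:

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, depth) of one tracked body point per frame; assumed


def motion_features(track: Sequence[Point]) -> List[Tuple[float, float]]:
    """Per-frame (direction in degrees on the image plane, 3-D magnitude) of a tracked point."""
    feats = []
    for (x0, y0, z0), (x1, y1, z1) in zip(track, track[1:]):
        dx, dy, dz = x1 - x0, y1 - y0, z1 - z0
        direction = math.degrees(math.atan2(dy, dx))  # 0 = +x, 90 = +y; 0.0 when there is no planar motion
        magnitude = math.sqrt(dx * dx + dy * dy + dz * dz)
        feats.append((direction, magnitude))
    return feats


f = motion_features([(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 2.0)])
print(f[1])  # (0.0, 2.0)
```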

Next, the control unit 21 of the support server 20 executes a dictionary comparison process (step S4-3). Specifically, the sign language recognition unit 211 of the control unit 21 compares the calculated motion feature quantities with the motion patterns recorded in the sign language recognition dictionary 22 and calculates a degree of match. All words corresponding to motion patterns whose degree of match is equal to or greater than a reference value are identified as word candidates.
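Neither the degree-of-match measure nor the reference value is specified. The sketch below assumes cosine similarity between feature vectors and an illustrative threshold; it keeps every dictionary word at or above the threshold, as step S4-3 requires, sorted best first (the dictionary entries are made up):

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, used here as an assumed 'degree of match'."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def match_candidates(features: Sequence[float],
                     dictionary: Dict[str, Sequence[float]],
                     threshold: float = 0.8) -> List[Tuple[str, float]]:
    """All words whose motion pattern matches at or above the threshold, best match first."""
    scored = [(w, cosine(features, p)) for w, p in dictionary.items()]
    return sorted((s for s in scored if s[1] >= threshold), key=lambda s: s[1], reverse=True)


dictionary = {"deposit": [1.0, 0.0], "withdraw": [1.0, 1.0], "hello": [0.0, 1.0]}
res = match_candidates([1.0, 0.0], dictionary, threshold=0.6)
print([w for w, _ in res])  # ['deposit', 'withdraw']
```

Returning every candidate above the threshold, rather than only the best one, is what leaves room for the meaning estimation step to choose among similar signs.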

Next, the control unit 21 of the support server 20 executes a meaning estimation process (step S4-4). Specifically, the sign language recognition unit 211 of the control unit 21 matches the word candidates against the related words recorded in the weighting memory of the recognition adjustment unit 213. The sign language recognition unit 211 then selects the word candidates that score high on two measures: the frequency recorded for the word as a related word in the weighting memory, and the degree of match with the motion pattern.
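Step S4-4 says only that candidates with both a high degree of match and a high related-word frequency are preferred. It gives no formula. The sketch below illustrates the idea with a hypothetical scoring rule that adds a normalized frequency bonus to the match degree.

```python
def estimate_meaning(candidates, weighting_memory, alpha=0.5):
    """Re-rank (word, match_degree) candidates using related-word frequencies.

    Hypothetical score: match_degree + alpha * frequency / highest stored frequency.
    `weighting_memory` is a dict of word -> frequency standing in for the weighting memory.
    """
    top = max(weighting_memory.values(), default=0)

    def score(item):
        word, degree = item
        bonus = weighting_memory.get(word, 0) / top if top else 0.0
        return degree + alpha * bonus

    return sorted(candidates, key=score, reverse=True)
```

Normalizing by the largest stored frequency keeps the bonus between 0 and `alpha`. A frequently mentioned related word can therefore reorder close candidates without overwhelming a clearly better motion match.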

(Output processing)
Next, the output processing (steps S1-7 and S2-6) will be described with reference to FIG. 4. The description covers the display screen output to the touch panel display of the user terminal 10 used by the counter clerk.

Specifically, the control unit 21 of the support server 20 identifies the customer's face image in the motion data acquired from the camera 31. The terminal control unit 214 of the control unit 21 then outputs the face image 500 to the touch panel display of the user terminal 10. The face image is displayed large enough for the customer's facial expression to be read.

The terminal control unit 214 of the control unit 21 also displays, in chronological order, the sign language recognition results 510 and 511 produced by the sign language recognition unit 211. Specifically, in FIG. 4, the latest sign language recognition result 511 is displayed at the lower left of the touch panel display, and the older sign language recognition result 510 is moved upward.

Furthermore, the terminal control unit 214 of the control unit 21 displays, in chronological order, the speech recognition results 520 and 521 produced by the speech recognition unit 212. Specifically, in FIG. 4, the latest speech recognition result 521 is displayed at the lower left of the touch panel display, and the older speech recognition result 520 is moved upward. The terminal control unit 214 outputs the recognition results so that they do not overlap the area of the touch panel display where the face image is shown.
Only the sign language recognition results 510 and 511 and the speech recognition results 520 and 521 are output to the touch panel display of the user terminal 15 used by the visiting customer.
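The chronological layout in FIG. 4 puts the newest result at the bottom and pushes older results upward. This can be modeled as a bounded queue per result type. The sketch below is minimal: the class name, the line limit, and the use of `collections.deque` are assumptions, and rendering to an actual touch panel is out of scope.

```python
from collections import deque


class ResultPane:
    """Hypothetical display pane: newest result at the bottom, older ones scroll upward."""

    def __init__(self, max_lines=5):
        # When the deque is full, appending discards the oldest entry at the top.
        self.lines = deque(maxlen=max_lines)

    def add(self, text):
        self.lines.append(text)

    def render(self):
        """Return lines top to bottom: oldest first, newest last (lowest on screen)."""
        return list(self.lines)
```

Using one pane for sign language results and another for speech results would correspond to the two result series 510/511 and 520/521. Keeping both panes outside the face-image area is left to the layout code.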

As described above, the present embodiment provides the following effects.
(1) In the above embodiment, the control unit 21 of the support server 20 executes the image capture process (step S1-1), the sign language recognition process (step S1-4), and the output process (step S1-7). The control unit 21 also executes the voice acquisition process (step S2-1), the speech recognition process (step S2-2), and the output process (step S2-6). As a result, the parties can communicate through text even when one of them cannot hear speech or cannot use sign language. Communication can therefore proceed smoothly even when the two parties use different means of communication.

(2) In the above embodiment, the control unit 21 of the support server 20 executes the recognition adjustment process (step S2-5). This process consists of the related word identification process (step S3-1) and the weighting process (step S3-2). Thus, when the speech recognition rate is higher than the sign language recognition rate, the speech recognition results can be used to support the sign language recognition process.

(3) In the above embodiment, when the input is determined to be handwritten input ("YES" in step S1-2), the control unit 21 of the support server 20 executes the trajectory identification process (step S1-3). Thus, when sign language recognition is difficult, the parties can switch to written conversation.

(4) In the above embodiment, the control unit 21 of the support server 20 identifies the customer's face image in the motion data acquired from the camera 31. The terminal control unit 214 of the control unit 21 then outputs the face image 500 to the touch panel display of the user terminal 10. This lets the counter clerk watch the visiting customer's facial expressions during the conversation.

<Second Embodiment>
Next, a second embodiment will be described with reference to FIG. 5. The first embodiment assumed that the speech recognition rate is higher than the sign language recognition rate, and it used the results of the speech recognition process to support the sign language recognition process. In the second embodiment, the recognition process that supports the other one (the priority recognition method) is chosen according to the recognition rate of each recognition process. Detailed description of parts similar to the first embodiment is omitted.
In the initial stage of a conversation, before the priority recognition method has been determined, the sign language recognition process and the speech recognition process run independently of each other. The priority recognition method is then determined according to the progress of the conversation.

(Sign language handling processing)
First, the sign language handling process will be described with reference to FIG. 5(a).
Here, the control unit 21 of the support server 20 executes an image capture process as in step S1-1 (step S5-1).

Next, the control unit 21 of the support server 20 determines whether the input is handwritten input, as in step S1-2 (step S5-2).
When the input is determined to be handwritten input ("YES" in step S5-2), the control unit 21 of the support server 20 executes a trajectory identification process as in step S1-3 (step S5-3).

When there is no touch input on the touch panel display and the input is therefore determined not to be handwritten input ("NO" in step S5-2), the control unit 21 of the support server 20 executes the sign language recognition process (step S5-4). If the priority recognition method has not yet been determined, or if sign language recognition is the priority recognition method, the sign language recognition unit 211 of the control unit 21 lists word candidates in order of how closely they match the feature amounts of the motion patterns recorded in the sign language recognition dictionary 22. If speech recognition is the priority recognition method, the sign language recognition unit 211 instead works as in step S1-4: it selects the word candidates that score high on both the frequency recorded for the word as a related word in the weighting memory and the degree of match with the motion pattern.
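The behavior of step S5-4, and symmetrically of step S6-2, depends on which recognition method currently has priority. The sketch below is hypothetical. Candidates are `(word, match_degree)` pairs, the method labels `"sign"` and `"speech"` are invented for illustration, and the frequency bonus is an assumed scoring rule rather than the one used in the patent.

```python
def rank_candidates(candidates, own_method, priority, weighting_memory, alpha=0.5):
    """Rank (word, match_degree) candidates for one recognizer.

    own_method: this recognizer's label, "sign" or "speech" (hypothetical labels).
    priority: None while undecided, otherwise the label of the priority method.
    weighting_memory: dict of related word -> frequency.
    """
    if priority is None or priority == own_method:
        # Undecided, or this recognizer has priority: rank by pattern closeness only.
        return sorted(candidates, key=lambda wd: wd[1], reverse=True)

    # The other method has priority: add an assumed bonus from its related words.
    top = max(weighting_memory.values(), default=0)

    def score(wd):
        word, degree = wd
        return degree + (alpha * weighting_memory.get(word, 0) / top if top else 0.0)

    return sorted(candidates, key=score, reverse=True)
```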

Next, as in steps S1-5 to S1-7, the control unit 21 of the support server 20 executes a word candidate output process (step S5-5), a word candidate selection process (step S5-6), and an output process (step S5-7).

(Voice handling processing)
Next, the voice handling process will be described with reference to FIG. 5(b).
Here, the control unit 21 of the support server 20 executes a voice acquisition process as in step S2-1 (step S6-1).

Next, the control unit 21 of the support server 20 executes the speech recognition process (step S6-2). If the priority recognition method has not yet been determined, or if speech recognition is the priority recognition method, the speech recognition unit 212 of the control unit 21 lists word candidates in order of how closely they match the feature amounts of the voice patterns recorded in the speech recognition dictionary 23. If sign language recognition is the priority recognition method, the speech recognition unit 212 instead matches the candidates against the related words recorded in the weighting memory of the recognition adjustment unit 213. It then selects the word candidates that score high on both the frequency recorded for the word as a related word in the weighting memory and the degree of match with the voice pattern.

Next, as in steps S2-3, S2-4, and S2-6, the control unit 21 of the support server 20 executes a word candidate output process (step S6-3), a word candidate selection process (step S6-4), and an output process (step S6-5).

(Recognition adjustment processing)
Next, the recognition adjustment process will be described with reference to FIG. 5(c). This process determines the priority recognition method used in steps S5-6 and S6-4. It runs once the conversation has progressed past a reference point. The reference point may be, for example, the point at which both recognition methods have converted a predetermined number of words, or the point at which a predetermined time has elapsed.
Here, the control unit 21 of the support server 20 executes a recognition rate comparison process (step S7-1). Specifically, the recognition adjustment unit 213 of the control unit 21 compares the recognition rates of sign language recognition and speech recognition. In the present embodiment, the recognition adjustment unit 213 defines the recognition rate as the proportion of first-output word candidates that were not corrected during recognition up to the reference point.
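The recognition rate in step S7-1 is the share of first-output word candidates that were not corrected. Below is a minimal sketch of that definition. It also includes a hypothetical reference-point check built from the two examples in the text: a word count reached by both methods, or an elapsed time. The threshold values are assumptions.

```python
def recognition_rate(corrected_flags):
    """Share of words whose first-output candidate was NOT corrected (step S7-1).

    corrected_flags holds one bool per recognized word: True if the user corrected it.
    """
    if not corrected_flags:
        return 0.0  # assumption: with no evidence yet, report a rate of 0
    return sum(1 for corrected in corrected_flags if not corrected) / len(corrected_flags)


def reached_reference_point(sign_words, speech_words, elapsed_s, min_words=10, limit_s=120.0):
    """Hypothetical trigger: both methods converted min_words words, or limit_s seconds passed."""
    return (sign_words >= min_words and speech_words >= min_words) or elapsed_s >= limit_s
```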

Next, the control unit 21 of the support server 20 determines the priority recognition method based on the recognition rates (step S7-2). Specifically, the recognition adjustment unit 213 of the control unit 21 selects the recognition method with the higher recognition rate as the priority recognition method. It then notifies the sign language recognition unit 211 and the speech recognition unit 212 which recognition method takes priority. In response to this notification, the sign language recognition process (step S5-4) and the speech recognition process (step S6-2) are executed as described above.
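Step S7-2 selects the method with the higher recognition rate. The text does not say what happens on a tie. The sketch below assumes the priority stays undecided in that case, so both recognizers keep working independently. The method labels are hypothetical.

```python
from typing import Optional


def decide_priority(sign_rate: float, speech_rate: float) -> Optional[str]:
    """Return the label of the method with the higher recognition rate (labels are hypothetical)."""
    if sign_rate > speech_rate:
        return "sign"
    if speech_rate > sign_rate:
        return "speech"
    return None  # assumed tie handling: leave the priority undecided
```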

As described above, the present embodiment provides the following effect.
(5) In the above embodiment, the control unit 21 of the support server 20 executes the recognition rate comparison process (step S7-1) and determines the priority recognition method based on the recognition rates (step S7-2). Depending on the communication environment, the method with the higher recognition rate can change. For example, voice communication is difficult in noisy surroundings. Even then, the recognition rates can be evaluated from how often results are corrected, and the results of the method with the higher recognition rate can be used to support the other recognition process.

Each of the above embodiments may be modified as follows.
・In each of the above embodiments, the system supports communication at the counter of a financial institution. The present invention is not limited to this use. It can be applied to any mechanism in which multiple types of recognition processing cooperate to improve recognition rates.

・In the second embodiment, the priority recognition method is determined based on the recognition rates. Instead, weighting may be performed using the related words of the words recognized by both recognition methods. In this case, the control unit 21 of the support server 20 executes the following recognition adjustment process.

This recognition adjustment process will be described with reference to FIG. 6.
Here, the control unit 21 of the support server 20 executes a matching process (step S8-1). Specifically, the recognition adjustment unit 213 of the control unit 21 extracts from the related word storage unit 24 the related words of the word selected in the sign language recognition process and the related words of the word selected in the speech recognition process. The recognition adjustment unit 213 then matches the two sets of related words against each other.

Next, the control unit 21 of the support server 20 executes a weighting process based on the matching (step S8-2). Specifically, the recognition adjustment unit 213 of the control unit 21 records each related word found in both sets in the weighting memory, associated with a high weight (high frequency).
In this way, multiple recognition methods are combined to recognize input accurately and keep communication smooth.
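The modification in steps S8-1 and S8-2 boosts related words that both recognizers produced. Below is a minimal sketch. It assumes the weighting memory is a plain dictionary of word frequencies and uses a hypothetical fixed boost of 3.

```python
def weight_common_related(sign_related, speech_related, memory, boost=3):
    """Steps S8-1/S8-2 sketch: give related words shared by both recognizers a high frequency.

    `memory` is a dict of word -> frequency standing in for the weighting memory;
    `boost` is an assumed value, not one taken from the source.
    """
    common = set(sign_related) & set(speech_related)  # matching: words found by both
    for word in common:
        memory[word] = memory.get(word, 0) + boost
    return common
```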

・In each of the above embodiments, the user's face image, the sign language recognition results, and the speech recognition results are output to the user terminal 10. The user terminals 10 and 15 are not limited to tablet terminals. Desktop or notebook computers may also be used. The display means may also be a prompter that projects the recognition results onto an inclined half mirror, or a head-mounted display.

・In each of the above embodiments, the sign language recognition results and the speech recognition results are output to the user terminals 10 and 15. The display form may also change based on the recognition results. For example, the voice handling process may include a communication support process that outputs choices in response to the speech recognition result.

This communication support process will be described with reference to FIG. 7.
First, the control unit 21 of the support server 20 executes the voice handling process shown in FIG. 2(b) (step S9-1).

Next, the control unit 21 of the support server 20 determines whether the speech recognition result is a question (step S9-2). Specifically, the terminal control unit 214 of the control unit 21 checks whether the speech recognition result is in interrogative form. For example, if the sentence ends with a string (final particle) that marks a question, such as 「ですか」 (desu ka) or 「でしょうか」 (deshō ka), the result is judged to be a question.
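The question test in step S9-2 checks for a question-marking ending. The sketch below is hedged: it uses only the two endings quoted in the text and strips trailing punctuation first. Both are simplifications, not the procedure specified in the patent.

```python
# Question-marking endings quoted in the text; a production list would need more.
QUESTION_ENDINGS = ("ですか", "でしょうか")


def is_question(text: str) -> bool:
    """Step S9-2 sketch: True if the text ends with a question-marking final particle."""
    # Drop trailing half-width and full-width spaces and sentence punctuation first.
    core = text.rstrip(" \u3000。？?")
    return core.endswith(QUESTION_ENDINGS)
```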

If the result is judged not to be a question ("NO" in step S9-2), the control unit 21 of the support server 20 executes a normal output process (step S9-3), which outputs the speech recognition result as is.

If the result is judged to be a question ("YES" in step S9-2), the control unit 21 of the support server 20 executes a choice output process (step S9-4). Specifically, the terminal control unit 214 of the control unit 21 outputs the speech recognition result together with choices such as "Yes" and "No" to the touch panel display of the visiting customer's user terminal 15.

Next, the control unit 21 of the support server 20 outputs the selected choice (step S9-5). Specifically, when the terminal control unit 214 of the control unit 21 detects that a choice has been selected on the touch panel display of the visiting customer's user terminal 15, it outputs the selection result to the user terminals 10 and 15.
The choices let the customer answer questions quickly, which makes communication more efficient.
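Steps S9-3 and S9-4 differ only in whether choices are attached to the text sent to the customer's terminal. The hypothetical payload builder below shows the branching. The dictionary keys and the fixed Yes/No pair are assumptions.

```python
def build_customer_output(recognized_text: str, is_question: bool) -> dict:
    """Hypothetical payload for the customer's user terminal 15 (steps S9-3/S9-4)."""
    payload = {"text": recognized_text}  # normal output: the recognition result as is
    if is_question:
        payload["options"] = ["Yes", "No"]  # choices the customer can tap to answer
    return payload
```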

・In each of the above embodiments, the sign language recognition results and the speech recognition results are output to the user terminals 10 and 15. In addition, a communication support process may output related service menus to the user terminal 15 according to the recognition results. In this case, the support server 20 also has a service menu storage unit. For each service menu item, this unit stores data on related words that may be recognized through sign language recognition or speech recognition.

This communication support process will be described with reference to FIG. 8.
First, the control unit 21 of the support server 20 extracts related menus according to the recognized words (step S10-1). Specifically, the recognition adjustment unit 213 of the control unit 21 extracts from the service menu storage unit the service menu items that list a word recognized by sign language recognition or speech recognition as a related word.

Next, the control unit 21 of the support server 20 weights the related menus (step S10-2). Specifically, for each service menu item recorded in the service menu storage unit, the recognition adjustment unit 213 of the control unit 21 counts how many of its related words have been recognized by sign language recognition or speech recognition. The recognition adjustment unit 213 then gives greater weight to service menu items with a larger count.

Next, the control unit 21 of the support server 20 changes the display according to the weighting (step S10-3). Specifically, the recognition adjustment unit 213 of the control unit 21 gives service menu items with greater weight priority on the touch panel display of the user terminal 15, for example by displaying them in a form that is easy to select.
Presenting the most relevant service menu items makes communication more efficient.
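Steps S10-1 to S10-3 count recognized related words for each service menu item and show the highest-scoring items first. This sketch makes three assumptions: the service menu storage unit is a dictionary from menu item to a set of related words, the weight is the raw count of recognitions, and the menu names are invented.

```python
def rank_service_menu(menu_related, recognized_words):
    """Steps S10-1 to S10-3 sketch.

    menu_related: {menu item: set of related words} (stand-in for the service menu storage unit)
    recognized_words: words recognized so far by sign language or speech recognition
    Returns matching menu items, most recognitions first (to be displayed preferentially).
    """
    # S10-2: count recognitions of each item's related words (repeats count again).
    counts = {
        item: sum(1 for w in recognized_words if w in related)
        for item, related in menu_related.items()
    }
    # S10-1: keep only items with at least one recognized related word.
    hits = [item for item, n in counts.items() if n > 0]
    # S10-3: order by weight so heavier items can be shown where they are easy to select.
    return sorted(hits, key=lambda item: counts[item], reverse=True)
```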

・In each of the above embodiments, the writing support unit 215 of the control unit 21 of the support server 20 generates an image of the drawn characters from the touch input trajectory acquired from the user terminals 10 and 15. Instead, the writing support unit 215 may run OCR processing that converts the handwriting into text from that trajectory. In this case, the support server 20 has an OCR dictionary that records data on words corresponding to trajectory patterns. Handwritten characters can then be checked as text as well.
The OCR recognition results may also be used to support sign language recognition and speech recognition. In this case, the related words used in the sign language recognition process and the speech recognition process are identified from the OCR recognition results.

・In each of the above embodiments, the control unit 21 of the support server 20 executes the word candidate output process (steps S1-5 and S5-5) and the word candidate selection process (steps S1-6 and S5-6). The sign language recognition unit 211 outputs the word candidates to the touch panel display of the user terminal 15, where the customer checks them. If a word recognized from sign language is wrong, the customer selects that word on the touch panel display. In addition, the sign language recognition unit 211 may also output the word candidates to the touch panel display of the user terminal 10 so that the counter clerk can check them. The control unit 21 of the support server 20 then executes the word candidate selection process (steps S1-6 and S5-6) based on the selection made on the user terminal 10. Because sign language recognition can then proceed on the counter clerk's judgment, the visiting customer has less work to do.

・In each of the above embodiments, the output process displays the visiting customer's face image 500 on the touch panel display of the user terminal 10. In addition, a face image of the counter clerk captured by another camera may be displayed on the touch panel display of the user terminal 15.

DESCRIPTION OF SYMBOLS
10, 15 ... user terminal, 20 ... support server, 21 ... control unit, 211 ... sign language recognition unit, 212 ... speech recognition unit, 213 ... recognition adjustment unit, 214 ... terminal control unit, 215 ... writing support unit, 22 ... sign language recognition dictionary, 23 ... speech recognition dictionary, 24 ... related word storage unit, 31 ... camera, 32 ... microphone.

Claims

1. A communication support system comprising:
a first acquisition unit that acquires first input information;
a second acquisition unit that acquires second input information;
a control unit that generates recognition results by converting the first and second input information into text; and
an output unit that outputs the recognition results,
wherein the control unit comprises:
a first recognition processing unit that performs text conversion based on the first input information acquired by the first acquisition unit;
a second recognition processing unit that performs text conversion based on the second input information acquired by the second acquisition unit; and
a recognition adjustment unit that adjusts the text conversion performed by the other recognition processing unit based on the recognition result of at least one of the first and second recognition processing units.

2. The communication support system according to claim 1, wherein word candidates included in the recognition result of whichever of the first and second recognition processing units has the higher recognition rate are weighted, and the text conversion of the other recognition processing unit is adjusted based on the weighting.

3. The communication support system according to claim 1, wherein a recognition rate is calculated for each recognition processing unit according to corrections made to its recognition results, and the recognition processing unit with the higher recognition rate is identified based on the recognition rates.

4. The communication support system according to any one of claims 1 to 3, wherein a face image is acquired together with at least one of the first input information and the second input information, and the face image is output to the output unit in association with the recognition result corresponding to that input information.

5. The communication support system according to claim 4, wherein the recognition result is output so as not to overlap the region in which the face image is output.

6. A communication support method using a communication support system that includes:
a first acquisition unit that acquires first input information;
a second acquisition unit that acquires second input information;
a control unit that generates recognition results by converting the first and second input information into text; and
an output unit that outputs the recognition results,
the method comprising the control unit executing:
a first recognition process that performs text conversion based on the first input information acquired by the first acquisition unit;
a second recognition process that performs text conversion based on the second input information acquired by the second acquisition unit; and
a recognition adjustment process that adjusts the text conversion of the other recognition process based on the recognition result of at least one of the first and second recognition processes.

7. A communication support program for a communication support system that includes:
a first acquisition unit that acquires first input information;
a second acquisition unit that acquires second input information;
a control unit that generates recognition results by converting the first and second input information into text; and
an output unit that outputs the recognition results,
the program causing the control unit to function as:
a first recognition processing unit that performs text conversion based on the first input information acquired by the first acquisition unit;
a second recognition processing unit that performs text conversion based on the second input information acquired by the second acquisition unit; and
a recognition adjustment unit that adjusts the text conversion of the other recognition processing unit based on the recognition result of at least one of the first and second recognition processing units.