JP7586367B1 - Audio processing device, audio processing method, and program

Info

Publication number: JP7586367B1
Application number: JP2024104877A
Authority: JP
Inventors: 沙也加矢部; 佳奈子豊田; 真緒前川; 健一郎西脇
Original assignee: Toppan Holdings Inc
Current assignee: Toppan Holdings Inc
Priority date: 2024-06-28
Filing date: 2024-06-28
Publication date: 2024-11-19
Anticipated expiration: 2044-06-28
Also published as: JP2026006104A

Abstract

A speech processing device, a speech processing method, and a program are provided that are capable of easily correcting the results of simultaneous interpretation in multiple languages.
[Solution] A voice processing device comprising: a voice conversion unit that converts voice data indicating the content of a user's speech in a first language into text data indicating the translation results into a plurality of second languages different from the first language through simultaneous interpretation; a monitoring processing unit that accepts corrections to the text data for one specified second language among the plurality of second languages; and a translation unit that translates the corrected text data into the second language that was not specified at the time of correction, and obtains text data indicating the translation results in which the corrections are reflected for each of the plurality of second languages.
[Selected figure] Figure 2

Description

本発明は、音声処理装置、音声処理方法、及びプログラムに関する。 The present invention relates to an audio processing device, an audio processing method, and a program.

近年、ＡＩ（ＡｒｔｉｆｉｃｉａｌＩｎｔｅｌｌｉｇｅｎｃｅ）の発展により、ＡＩを用いた音声認識や機械翻訳の精度は向上しているが、認識結果や翻訳結果に誤認識や誤訳が含まれることがある。 In recent years, advances in AI (Artificial Intelligence) have improved the accuracy of AI-based speech recognition and machine translation, but the recognition and translation results may still contain misrecognitions or mistranslations.

これに関連し、下記特許文献１には、音声認識による認識結果を修正するための技術が開示されている。当該技術は、例えば、音声データを含む動画データにおいて、音声認識結果に基づき生成される字幕を修正する際に用いられる。 In this regard, the following Patent Document 1 discloses a technique for correcting the results of speech recognition. This technique is used, for example, when correcting subtitles generated based on the results of speech recognition in video data that includes audio data.

特開２０１９－１４８６８１号公報JP 2019-148681 A

ところで、音声認識と機械翻訳を組み合わせることで同時通訳が可能となる。同時通訳においても、認識結果又は翻訳結果に誤認識又は誤訳が含まれることがある。このため、多言語の同時通訳において、同時通訳結果に誤認識及び誤訳の少なくとも一方でも含まれていると、チェッカーは１つ１つの言語ごとに同時通訳結果を修正する必要があり、修正が容易ではなかった。 By the way, simultaneous interpretation is possible by combining speech recognition and machine translation. However, even in simultaneous interpretation, the recognition or translation results may contain misrecognition or mistranslation. For this reason, in simultaneous interpretation of multiple languages, if the simultaneous interpretation results contain at least one misrecognition or mistranslation, the checker must correct the simultaneous interpretation results for each language, which is not easy to do.

上述の課題を鑑み、本発明の目的は、多言語の同時通訳における同時通訳結果を容易に修正することが可能な音声処理装置、音声処理方法、及びプログラムを提供することにある。 In view of the above problems, the object of the present invention is to provide a speech processing device, a speech processing method, and a program that can easily correct the results of simultaneous interpretation in multiple languages.

上述の課題を解決するために、本発明の一態様に係る音声処理装置は、第１の言語によるユーザの発話内容を示す音声データを、同時通訳によって、前記第１の言語と異なる複数の第２の言語への翻訳結果を示すテキストデータに変換する音声変換部と、複数の前記第２の言語のうち指定された１つの前記第２の言語について、前記テキストデータに対する修正を受け付けるモニタリング処理部と、修正後の前記テキストデータを、修正時に指定されなかった前記第２の言語に翻訳し、複数の前記第２の言語ごとに修正が反映された翻訳結果を示すテキストデータを取得する翻訳部と、を備える音声処理装置である。 In order to solve the above-mentioned problems, a voice processing device according to one aspect of the present invention is a voice processing device that includes a voice conversion unit that converts voice data indicating the contents of a user's utterance in a first language into text data indicating the translation results into a plurality of second languages different from the first language by simultaneous interpretation, a monitoring processing unit that accepts corrections to the text data for one specified second language among the plurality of second languages, and a translation unit that translates the corrected text data into the second language that was not specified at the time of correction and obtains text data indicating the translation results in which the corrections are reflected for each of the plurality of second languages.

本発明の一態様に係る音声処理方法は、第１の言語によるユーザの発話内容を示す音声データを、同時通訳によって、前記第１の言語と異なる複数の第２の言語への翻訳結果を示すテキストデータに変換する音声変換過程と、複数の前記第２の言語のうち指定された１つの前記第２の言語について、前記テキストデータに対する修正を受け付けるモニタリング処理過程と、修正後の前記テキストデータを、修正時に指定されなかった前記第２の言語に翻訳し、複数の前記第２の言語ごとに修正が反映された翻訳結果を示すテキストデータを取得する翻訳過程と、を含むコンピュータにより実行される音声処理方法である。 A speech processing method according to one aspect of the present invention is a speech processing method executed by a computer, including a speech conversion process for converting speech data indicating the contents of a user's speech in a first language into text data indicating the translation results into a plurality of second languages different from the first language by simultaneous interpretation, a monitoring process for accepting corrections to the text data for one specified second language among the plurality of second languages, and a translation process for translating the corrected text data into the second language that was not specified at the time of correction, and acquiring text data indicating the translation results in which the corrections are reflected for each of the plurality of second languages.

本発明の一態様に係るプログラムは、コンピュータを、第１の言語によるユーザの発話内容を示す音声データを、同時通訳によって、前記第１の言語と異なる複数の第２の言語への翻訳結果を示すテキストデータに変換する音声変換手段と、複数の前記第２の言語のうち指定された１つの前記第２の言語について、前記テキストデータに対する修正を受け付けるモニタリング処理手段と、修正後の前記テキストデータを、修正時に指定されなかった前記第２の言語に翻訳し、複数の前記第２の言語ごとに修正が反映された翻訳結果を示すテキストデータを取得する翻訳手段と、として機能させるためのプログラムである。 A program according to one aspect of the present invention is a program for causing a computer to function as: a speech conversion means for converting speech data indicating the contents of a user's speech in a first language into text data indicating the translation results into a plurality of second languages different from the first language by simultaneous interpretation; a monitoring processing means for accepting corrections to the text data for a specified one of the plurality of second languages; and a translation means for translating the corrected text data into the second language that was not specified at the time of correction, and acquiring text data indicating the translation results in which the corrections are reflected for each of the plurality of second languages.

本発明によれば、多言語の同時通訳における同時通訳結果を容易に修正することができる。 The present invention makes it possible to easily correct the results of simultaneous interpretation in multiple languages.

第１の実施形態に係る字幕表示システムの構成の一例を示す図である。 FIG. 1 is a diagram showing an example of the configuration of a subtitle display system according to the first embodiment.
第１の実施形態に係る音声処理装置の機能構成の一例を示すブロック図である。 FIG. 2 is a block diagram showing an example of the functional configuration of an audio processing device according to the first embodiment.
第１の実施形態に係るモニタリング画面の一例を示す図である。 FIG. 3 is a diagram showing an example of a monitoring screen according to the first embodiment.
第１の実施形態に係る修正操作手順の一例を示す図である。 FIG. 4 is a diagram showing an example of a correction operation procedure according to the first embodiment.
第１の実施形態に係る字幕表示システムにおける処理の流れの一例を示すシーケンス図である。 FIG. 5 is a sequence diagram showing an example of the processing flow in the subtitle display system according to the first embodiment.
第２の実施形態に係る字幕表示システムの構成の一例を示す図である。 FIG. 6 is a diagram showing an example of the configuration of a subtitle display system according to the second embodiment.
第２の実施形態に係る音声処理装置の機能構成の一例を示すブロック図である。 FIG. 7 is a block diagram showing an example of the functional configuration of an audio processing device according to the second embodiment.
第２の実施形態に係るモニタリング画面の一例を示す図である。 FIG. 8 is a diagram showing an example of a monitoring screen according to the second embodiment.
第２の実施形態に係る修正操作手順の一例を示す図である。 FIG. 9 is a diagram showing an example of a correction operation procedure according to the second embodiment.
第２の実施形態に係る変換優先度の変更の一例を示す図である。 FIG. 10 is a diagram showing an example of a change in conversion priority according to the second embodiment.
第２の実施形態に係る字幕表示システムにおける処理の流れの一例を示すシーケンス図である。 FIG. 11 is a sequence diagram showing an example of the processing flow in the subtitle display system according to the second embodiment.
第２の実施形態に係る表示準備処理の流れの一例を示すシーケンス図である。 FIG. 12 is a sequence diagram showing an example of the flow of a display preparation process according to the second embodiment.

以下、図面を参照しながら本発明の実施形態について詳しく説明する。 The following describes in detail an embodiment of the present invention with reference to the drawings.

＜＜１．第１の実施形態＞＞
図１から図５を参照して、第１の実施形態について説明する。以下では、字幕表示システムについて説明する。字幕表示システムは、ユーザの発話内容を、当該ユーザが用いる言語とは異なる複数の言語に翻訳（多言語翻訳）し、各言語の字幕（多言語字幕）を表示するためのシステムである。
以下では、講演会にて講演するユーザ（講演者）の発話内容（講演内容）が多言語翻訳された字幕を、講演を聴講するユーザ（聴講者）へ配信する例を一例として、第１の実施形態について説明する。なお、第１の実施形態では、講演者が発話する言語（第１の言語）が任意の１つの言語（例えば日本語以外のいずれかの言語）であり、多言語翻訳された字幕の言語（第２の言語）が任意の複数の言語（例えば日本語を含む複数の言語）であるものとする。 <<1. First embodiment>>
A first embodiment will be described with reference to Figures 1 to 5. A subtitle display system will be described below. The subtitle display system is a system for translating a user's speech into multiple languages different from the language used by the user (multilingual translation) and displaying subtitles in each language (multilingual subtitles).
In the following, the first embodiment is described using an example in which subtitles, obtained by translating the speech content (lecture content) of a user giving a lecture (the lecturer) into multiple languages, are distributed to the users listening to the lecture (the listeners). In the first embodiment, the language spoken by the lecturer (the first language) is any one language (e.g., any language other than Japanese), and the languages of the multilingually translated subtitles (the second languages) are any plurality of languages (e.g., multiple languages including Japanese).

＜１－１．字幕表示システムの構成＞
図１を参照して、第１の実施形態に係る字幕表示システムの構成について説明する。図１は、第１の実施形態に係る字幕表示システムの構成の一例を示す図である。 <1-1. Configuration of the subtitle display system>
The configuration of a subtitle display system according to the first embodiment will be described with reference to Fig. 1. Fig. 1 is a diagram showing an example of the configuration of a subtitle display system according to the first embodiment.

図１に示すように、字幕表示システム１は、集音装置１０と、音声処理装置２０と、同時通訳エンジン２１と、機械翻訳エンジン２２と、モニタリング端末３０と、表示装置４０とを備える。
各装置と端末は、有線接続、無線接続、又はネットワーク接続によって、各種情報を送受信可能に接続される。ネットワークには、例えば、ＬＡＮ（ＬｏｃａｌＡｒｅａＮｅｔｗｏｒｋ）、ＷＡＮ（ＷｉｄｅＡｒｅａＮｅｔｗｏｒｋ）、電話網（携帯電話網、固定電話網等）、地域ＩＰ（ＩｎｔｅｒｎｅｔＰｒｏｔｏｃｏｌ）網、インターネット等が適用される。 As shown in FIG. 1, the subtitle display system 1 includes a sound collection device 10, a sound processing device 20, a simultaneous interpretation engine 21, a machine translation engine 22, a monitoring terminal 30, and a display device 40.
Each device and terminal are connected to each other via a wired connection, a wireless connection, or a network connection so as to be able to transmit and receive various information. The network may be, for example, a local area network (LAN), a wide area network (WAN), a telephone network (such as a mobile phone network or a landline telephone network), a regional Internet Protocol (IP) network, the Internet, or the like.

（１）集音装置１０
集音装置１０は、講演者が発話することで生じる音声を集音する装置である。集音装置１０は、例えば、マイクである。
集音装置１０は、有線接続又は無線接続によって音声処理装置２０と通信可能に接続されている。集音装置１０は、講演者の音声を集音すると、集音した音声をデータ化し、データ化した音声データを音声処理装置２０へ送信する。 (1) Sound collection device 10
The sound collection device 10 is a device that collects the sound generated when a speaker speaks, and is, for example, a microphone.
The sound collection device 10 is connected to the sound processing device 20 via a wired connection or a wireless connection so as to be able to communicate with the sound processing device 20. When the sound collection device 10 collects the voice of the lecturer, it converts the collected voice into data and transmits the converted voice data to the sound processing device 20.

（２）音声処理装置２０
音声処理装置２０は、同時通訳の結果を字幕として表示するための処理を行う装置である。音声処理装置２０は、例えば、１又は複数のサーバ（例えば、クラウドサーバ）、ＰＣ（ＰｅｒｓｏｎａｌＣｏｍｐｕｔｅｒ）などの装置によって実現される。当該装置では、音声処理装置２０として機能させるためのプログラムによって各種処理が実行される。
音声処理装置２０は、集音装置１０から受信する音声データに基づき、各種処理を実行する。音声処理装置２０は、例えば、音声変換処理、モニタリング処理、機械翻訳処理、字幕表示処理などを実行する。 (2) Audio Processing Device 20
The voice processing device 20 is a device that performs processing for displaying the results of simultaneous interpretation as subtitles. The voice processing device 20 is realized by, for example, one or more servers (e.g., cloud servers), PCs (Personal Computers), etc. In the device, various processes are executed by a program for functioning as the voice processing device 20.
The voice processing device 20 executes various processes based on the voice data received from the sound collection device 10. The voice processing device 20 executes, for example, a voice conversion process, a monitoring process, a machine translation process, a subtitle display process, and the like.

音声変換処理は、音声データがテキストデータに変換される処理である。音声変換処理にて、音声処理装置２０は、集音装置１０から受信する音声データを、後述する同時通訳エンジン２１の機能によって音声認識及び機械翻訳し、翻訳結果を示すテキストデータを取得する。音声変換処理では、第１の言語で示される１つのテキストデータが第１の言語とは異なる複数の第２の言語で示される複数のテキストデータに機械翻訳される。 The voice conversion process is a process in which voice data is converted into text data. In the voice conversion process, the voice processing device 20 performs voice recognition and machine translation of the voice data received from the sound collection device 10 using the functions of the simultaneous interpretation engine 21 described below, and obtains text data indicating the translation result. In the voice conversion process, one piece of text data expressed in a first language is machine translated into multiple pieces of text data expressed in multiple second languages different from the first language.

モニタリング処理は、音声データに対する音声変換処理の結果（音声変換結果）をチェッカーがモニタリングするために実行される処理である。モニタリング処理にて、音声処理装置２０は、モニタリング画面をモニタリング端末３０に表示させる。モニタリング画面は、音声データから変換されたテキストデータの表示可否の選択操作と、テキストデータに対する修正操作とを受け付け可能な画面である。 The monitoring process is executed so that the checker can monitor the result of the voice conversion process on the voice data (the voice conversion result). In the monitoring process, the voice processing device 20 causes the monitoring terminal 30 to display a monitoring screen. The monitoring screen can accept both a selection operation indicating whether the text data converted from the voice data may be displayed and a correction operation on that text data.

チェッカーは、モニタリング端末３０にてモニタリング画面に表示される音声変換結果を確認することで、音声変換結果をモニタリングすることができる。チェッカーは、モニタリングにて、音声変換処理にて音声データから変換されたテキストデータを字幕として表示可能か否か（表示可否）を判定する。表示可否の判定基準は、例えば、音声変換結果に誤認識又は誤変換があるか否か、あるいは、テキストデータの意味が通じるか否かなどである。音声変換結果に誤認識又は誤変換がある、あるいは、テキストデータの意味が通じない場合、チェッカーは、テキストデータを字幕として表示不可と判定する。一方、音声変換結果に誤認識又は誤変換がなく、かつ、テキストデータの意味が通じる場合、チェッカーは、テキストデータを字幕として表示可能と判定する。 The checker can monitor the voice conversion result by checking the voice conversion result displayed on the monitoring screen of the monitoring terminal 30. During monitoring, the checker determines whether the text data converted from the voice data in the voice conversion process can be displayed as subtitles (displayability). The criteria for determining whether displayability is possible include, for example, whether there is a misrecognition or misconversion in the voice conversion result, or whether the meaning of the text data is understandable. If there is a misrecognition or misconversion in the voice conversion result, or if the meaning of the text data is incomprehensible, the checker determines that the text data cannot be displayed as subtitles. On the other hand, if there is no misrecognition or misconversion in the voice conversion result and the meaning of the text data is understandable, the checker determines that the text data can be displayed as subtitles.

チェッカーは、判定結果に応じて、モニタリング画面にて表示可否の選択操作を行い、必要に応じて修正操作を行う。表示可能と判定した場合、チェッカーは、モニタリング画面にて表示可能を選択する操作をする。一方、表示不可と判定した場合、チェッカーは、モニタリング画面にて表示不可を選択する操作をし、表示可能な内容となるようにテキストデータを修正する。チェッカーによる修正操作が完了すると、修正されたテキストデータの表示可否は、音声処理装置２０によって自動で表示可能に変更される。 Depending on the result of the determination, the checker performs a displayability selection operation on the monitoring screen and, if necessary, a correction operation. If the checker determines that the text data can be displayed, the checker selects "displayable" on the monitoring screen. If the checker determines that the text data cannot be displayed, the checker selects "not displayable" on the monitoring screen and corrects the text data so that its content becomes displayable. Once the checker completes the correction operation, the audio processing device 20 automatically changes the displayability of the corrected text data to "displayable".
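The checker workflow described above can be sketched as a small state model. This is a minimal illustration only: the names `Displayability`, `Segment`, `select`, and `complete_correction` are hypothetical, and the patent does not specify any particular data model.

```python
from dataclasses import dataclass
from enum import Enum


class Displayability(Enum):
    """Hypothetical states for one piece of converted text data."""
    PENDING = "pending"
    DISPLAYABLE = "displayable"
    NOT_DISPLAYABLE = "not displayable"


@dataclass
class Segment:
    """One piece of text data shown on the monitoring screen (assumed shape)."""
    text: str
    state: Displayability = Displayability.PENDING
    corrected: bool = False


def select(segment: Segment, displayable: bool) -> None:
    """Record the checker's displayability selection."""
    segment.state = (
        Displayability.DISPLAYABLE if displayable else Displayability.NOT_DISPLAYABLE
    )


def complete_correction(segment: Segment, corrected_text: str) -> None:
    """Apply the checker's correction; the device then marks the text displayable."""
    segment.text = corrected_text
    segment.corrected = True
    segment.state = Displayability.DISPLAYABLE
```

In this sketch a segment that the checker rejects stays in `NOT_DISPLAYABLE` until `complete_correction` is called, which mirrors the automatic switch to "displayable" after a correction.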

なお、モニタリング画面には、表示可否の自動選択機能が設けられてもよい。自動選択機能は、モニタリング画面にテキストデータが表示されてから所定の時間が経過後、自動で表示可能又は表示不可が自動で選択される機能である。所定の時間は、モニタリング画面にて任意のユーザ（例えばチェッカーや管理者など）が任意の時間（例えば３秒など）を設定可能である。また、表示可能又は表示不可のどちらを自動で選択するかは、モニタリング画面にて任意のユーザが設定可能である。 The monitoring screen may be provided with an automatic selection function for whether or not to display. The automatic selection function is a function that automatically selects whether to display or not display after a predetermined time has elapsed since text data was displayed on the monitoring screen. The predetermined time can be set by any user (e.g., a checker or an administrator) on the monitoring screen to any time (e.g., 3 seconds). Also, whether to automatically select whether to display or not display can be set by any user on the monitoring screen.
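The automatic selection function can be illustrated with a pure function over timestamps. The function name `auto_select` and its parameters are hypothetical. The rule that a manual choice by the checker takes precedence over the automatic default is an assumption of this sketch, not something the text states.

```python
from typing import Optional


def auto_select(
    shown_at: float,
    now: float,
    timeout_s: float,
    default_displayable: bool,
    manual_choice: Optional[bool] = None,
) -> Optional[bool]:
    """Return the displayability decision, or None if still undecided.

    shown_at / now: times in seconds on the same clock.
    timeout_s: the user-configurable waiting time (e.g. 3 seconds).
    default_displayable: the user-configurable automatic choice.
    manual_choice: the checker's own selection, if any (assumed to win).
    """
    if manual_choice is not None:
        return manual_choice
    if now - shown_at >= timeout_s:
        return default_displayable
    return None
```

Passing the current time in explicitly, rather than reading a clock inside the function, keeps the rule easy to test.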

また、モニタリング画面に表示される音声変換結果は、１つの言語のテキストデータのみである。チェッカーは、モニタリング画面にて、音声変換処理にて変換された複数の第２の言語のテキストデータのうち、１つの第２の言語のテキストデータのみについてモニタリングすればよい。なお、複数の第２の言語のテキストデータのうちモニタリング画面に表示する第２の言語のテキストデータは、例えばチェッカーが扱うことが可能な言語に応じて、適宜選択可能である。 The voice conversion result displayed on the monitoring screen is text data in only one language. The checker only needs to monitor, on the monitoring screen, the text data in one second language among the text data in multiple second languages converted by the voice conversion process. Note that the text data in the second language to be displayed on the monitoring screen among the text data in multiple second languages can be appropriately selected according to, for example, the language that the checker can handle.

機械翻訳処理は、テキストデータが翻訳される処理である。機械翻訳処理にて、音声処理装置２０は、モニタリング処理にて表示不可と判定され修正されたテキストデータを、後述する機械翻訳エンジン２２の機能によって機械翻訳し、翻訳結果を示すテキストデータを取得する。機械翻訳処理では、第１の言語で示される１つのテキストデータが第１の言語とは異なる複数の第２の言語で示される複数のテキストデータに機械翻訳される。このため、音声処理装置２０は、機械翻訳処理により、ある言語について修正された１つのテキストデータから、他の複数の言語について修正が反映されたテキストデータを取得することができる。 Machine translation processing is processing in which text data is translated. In machine translation processing, the voice processing device 20 machine translates the text data that has been determined to be undisplayable in the monitoring processing and has been corrected, using the functions of the machine translation engine 22 described below, to obtain text data indicating the translation result. In machine translation processing, one piece of text data expressed in a first language is machine translated into multiple pieces of text data expressed in multiple second languages different from the first language. Therefore, the voice processing device 20 can obtain text data in which the corrections are reflected in multiple other languages from one piece of text data corrected in one language, using machine translation processing.
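The propagation of a single correction to the other languages can be sketched as follows. The `translate` callable stands in for the machine translation engine 22. Its signature, the function name `propagate_correction`, and the stub `fake_translate` are assumptions made for illustration, not an actual engine API.

```python
from typing import Callable, Dict

# Assumed signature: (text, source_language, target_language) -> translated text
Translate = Callable[[str, str, str], str]


def propagate_correction(
    subtitles: Dict[str, str],
    corrected_lang: str,
    corrected_text: str,
    translate: Translate,
) -> Dict[str, str]:
    """Return per-language text data in which the correction is reflected.

    The corrected language keeps the checker's text as-is; every other
    language is re-translated from the corrected text.
    """
    updated: Dict[str, str] = {}
    for lang in subtitles:
        if lang == corrected_lang:
            updated[lang] = corrected_text
        else:
            updated[lang] = translate(corrected_text, corrected_lang, lang)
    return updated


def fake_translate(text: str, src: str, dst: str) -> str:
    """Placeholder for a real engine; only tags the text with the direction."""
    return f"[{src}->{dst}] {text}"
```

Under this sketch the checker edits only one language, and the remaining languages are regenerated from that edit rather than corrected one by one.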

字幕表示処理は、字幕が表示される処理である。字幕表示処理にて、音声処理装置２０は、機械翻訳処理における翻訳結果を示すテキストデータを、字幕として表示装置４０に表示させる。
なお、音声処理装置２０は、指定された１つの第２の言語の字幕のみを表示装置４０に表示させてもよいし、複数の第２の言語の字幕を表示装置４０に表示させてもよい。 The subtitle display process is a process in which subtitles are displayed. In the subtitle display process, the voice processing device 20 causes the display device 40 to display text data indicating the translation result in the machine translation process as subtitles.
Note that the audio processing device 20 may cause the display device 40 to display only subtitles in one specified second language, or may cause the display device 40 to display subtitles in a plurality of second languages.

（３）同時通訳エンジン２１
同時通訳エンジン２１は、第１の言語を第２の言語に同時通訳するエンジン（プログラム）である。同時通訳エンジン２１は、音声処理装置２０から入力される第１の言語の音声データを音声認識によって第１の言語のテキストデータに変換し、当該第１の言語のテキストデータを第２の言語のテキストデータに機械翻訳（変換）する。同時通訳エンジン２１は、１つの第１の言語を異なる複数の第２の言語に機械翻訳する。即ち、同時通訳エンジン２１は、１つの第１の言語のテキストデータから複数の第２の言語のテキストデータを生成する。
なお、同時通訳エンジン２１の機能は、音声処理装置２０とは異なる装置又は端末によって提供されてもよいし、音声処理装置２０によって提供されてもよい。 (3) Simultaneous Interpretation Engine 21
The simultaneous interpretation engine 21 is an engine (program) that performs simultaneous interpretation from a first language into a second language. The simultaneous interpretation engine 21 converts the voice data in the first language input from the voice processing device 20 into text data in the first language by voice recognition, and machine translates (converts) that text data into text data in the second language. The simultaneous interpretation engine 21 machine translates one first language into multiple different second languages. In other words, the simultaneous interpretation engine 21 generates text data in multiple second languages from text data in one first language.
The function of the simultaneous interpretation engine 21 may be provided by a device or terminal different from the voice processing device 20, or may be provided by the voice processing device 20.
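The two stages of the simultaneous interpretation engine 21 (voice recognition, then translation into several second languages) can be sketched as below. The callables `recognize` and `translate`, their signatures, and the stubs are hypothetical stand-ins; no real speech or translation library is implied.

```python
from typing import Callable, Dict, Iterable

# Assumed signatures for the two stages.
Recognize = Callable[[bytes, str], str]       # (audio, language) -> text
Translate = Callable[[str, str, str], str]    # (text, source, target) -> text


def interpret(
    audio: bytes,
    first_lang: str,
    second_langs: Iterable[str],
    recognize: Recognize,
    translate: Translate,
) -> Dict[str, str]:
    """Convert first-language audio into text data for each second language."""
    # Stage 1: voice recognition yields text in the first language.
    source_text = recognize(audio, first_lang)
    # Stage 2: that one text is translated into every second language.
    return {lang: translate(source_text, first_lang, lang) for lang in second_langs}


def fake_recognize(audio: bytes, lang: str) -> str:
    """Placeholder recognizer: decodes the bytes instead of doing real ASR."""
    return audio.decode("utf-8")


def fake_translate(text: str, src: str, dst: str) -> str:
    """Placeholder translator: only tags the text with the direction."""
    return f"[{src}->{dst}] {text}"
```

For example, `interpret(b"hello", "en", ["ja", "fr"], fake_recognize, fake_translate)` returns one entry per second language, each derived from the same recognized text.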

（４）機械翻訳エンジン２２
機械翻訳エンジン２２は、第１の言語を第２の言語に機械翻訳するエンジン（プログラム）である。機械翻訳エンジン２２は、音声処理装置２０から入力される第１の言語のテキストデータを第２の言語のテキストデータに機械翻訳（変換）する。機械翻訳エンジン２２は、１つの第１の言語を異なる複数の第２の言語に機械翻訳する。即ち、機械翻訳エンジン２２は、１つの第１の言語のテキストデータから複数の第２の言語のテキストデータを生成する。
なお、機械翻訳エンジン２２の機能は、音声処理装置２０とは異なる装置又は端末によって提供されてもよいし、音声処理装置２０によって提供されてもよい。 (4) Machine translation engine 22
The machine translation engine 22 is an engine (program) that performs machine translation from a first language to a second language. The machine translation engine 22 performs machine translation (conversion) of text data in the first language input from the speech processing device 20 into text data in the second language. The machine translation engine 22 performs machine translation from one first language to multiple different second languages. In other words, the machine translation engine 22 generates text data in multiple second languages from text data in one first language.
The function of the machine translation engine 22 may be provided by a device or terminal different from the speech processing device 20, or may be provided by the speech processing device 20.

（５）モニタリング端末３０
モニタリング端末３０は、チェッカーがモニタリングのために使用する端末である。モニタリング端末３０は、例えば、ＰＣ、スマートフォン、タブレット端末などの端末である。当該端末では、モニタリング端末３０として機能させるためのプログラムによって各種処理が実行される。
モニタリング端末３０は、音声処理装置２０から受信する画面情報に基づき、モニタリング画面を表示し、チェッカーによるモニタリングに関する各種操作を受け付ける。 (5) Monitoring terminal 30
The monitoring terminal 30 is a terminal used by a checker for monitoring. The monitoring terminal 30 is, for example, a PC, a smartphone, a tablet terminal, etc. In the monitoring terminal 30, various processes are executed by a program for functioning as the monitoring terminal 30.
The monitoring terminal 30 displays a monitoring screen based on the screen information received from the sound processing device 20, and accepts various operations related to monitoring by the checker.

モニタリング端末３０には、例えば、モニタリング機能を利用するためのアプリケーション（以下、「モニタリングアプリ」とも称される）によって、モニタリング画面が表示される。チェッカーは、モニタリングアプリによってモニタリング端末３０に表示されるモニタリング画面を操作することで、同時通訳結果のモニタリングを行うことができる。
なお、モニタリングアプリの機能は、各端末にモニタリングアプリをインストールすること（即ちネイティブアプリ）で提供されてもよいし、Ｗｅｂシステム（即ちＷｅｂアプリ）によって提供されてもよい。Ｗｅｂアプリの場合、モニタリングアプリはサーバで管理されており、その機能はＷｅｂブラウザを介して提供される。 For example, a monitoring screen is displayed on the monitoring terminal 30 by an application for using the monitoring function (hereinafter also referred to as the "monitoring app"). The checker can monitor the results of the simultaneous interpretation by operating the monitoring screen that the monitoring app displays on the monitoring terminal 30.
The functions of the monitoring app may be provided by installing the monitoring app on each terminal (i.e., a native app) or may be provided by a web system (i.e., a web app). In the case of a web app, the monitoring app is managed by a server, and its functions are provided via a web browser.

（６）表示装置４０
表示装置４０は、字幕を表示する装置である。表示装置４０は、例えば、スクリーン４１などのようなディスプレイ装置であってもよいし、スマートフォン４２などのようなディスプレイを有する装置であってもよい。表示装置４０は、音声処理装置２０と通信可能に接続され、音声処理装置２０から受信する画面情報に基づき字幕を表示する。 (6) Display device 40
The display device 40 is a device that displays subtitles. The display device 40 may be, for example, a display device such as a screen 41, or may be a device having a display such as a smartphone 42. The display device 40 is communicably connected to the audio processing device 20, and displays subtitles based on screen information received from the audio processing device 20.

表示装置４０には、例えば、字幕表示機能を利用するためのアプリケーション（以下、「字幕表示アプリ」とも称される）によって、字幕表示画面が表示される。聴講者は、字幕表示アプリによって表示装置４０に表示される字幕表示画面を操作することで、同時通訳結果を示す字幕を確認することができる。
なお、字幕表示アプリの機能は、各装置に字幕表示アプリをインストールすること（即ちネイティブアプリ）で提供されてもよいし、Ｗｅｂシステム（即ちＷｅｂアプリ）によって提供されてもよい。Ｗｅｂアプリの場合、字幕表示アプリはサーバで管理されており、その機能はＷｅｂブラウザを介して提供される。 For example, a subtitle display screen is displayed on the display device 40 by an application for using the subtitle display function (hereinafter also referred to as the "subtitle display application"). The listeners can check the subtitles showing the results of the simultaneous interpretation by operating the subtitle display screen that the subtitle display application displays on the display device 40.
The function of the subtitle display application may be provided by installing the subtitle display application in each device (i.e., a native application), or may be provided by a Web system (i.e., a Web application). In the case of a Web application, the subtitle display application is managed by a server, and the function is provided via a Web browser.

＜１－２．音声処理装置の機能構成＞
以上、第１の実施形態に係る字幕表示システム１の構成について説明した。続いて、図２から図４を参照して、第１の実施形態に係る音声処理装置２０の機能構成について説明する。図２は、第１の実施形態に係る音声処理装置２０の機能構成の一例を示すブロック図である。
図２に示すように、音声処理装置２０は、通信部２１０と、記憶部２２０と、第１制御部２３０と、第２制御部２４０とを備える。 <1-2. Functional configuration of the voice processing device>
The configuration of the subtitle display system 1 according to the first embodiment has been described above. Next, the functional configuration of the audio processing device 20 according to the first embodiment will be described with reference to Fig. 2 to Fig. 4. Fig. 2 is a block diagram showing an example of the functional configuration of the audio processing device 20 according to the first embodiment.
As shown in FIG. 2, the voice processing device 20 includes a communication unit 210, a storage unit 220, a first control unit 230, and a second control unit 240.

（１）通信部２１０
通信部２１０は、各種情報を送受信する機能を有する。通信部２１０は、集音装置１０と、同時通訳エンジン２１と、機械翻訳エンジン２２と、モニタリング端末３０と、表示装置４０と通信可能に接続されており、各種情報を送受信する。 (1) Communication unit 210
The communication unit 210 has a function of transmitting and receiving various information. The communication unit 210 is communicably connected to the sound collection device 10, the simultaneous interpretation engine 21, the machine translation engine 22, the monitoring terminal 30, and the display device 40, and transmits and receives various information.

（２）記憶部２２０
記憶部２２０は、各種情報を記憶する機能を有する。記憶部２２０は、音声処理装置２０がハードウェアとして備える記憶媒体、例えば、ＨＤＤ（ＨａｒｄＤｉｓｋＤｒｉｖｅ）、ＳＳＤ（ＳｏｌｉｄＳｔａｔｅＤｒｉｖｅ）、フラッシュメモリ、ＥＥＰＲＯＭ（ＥｌｅｃｔｒｉｃａｌｌｙＥｒａｓａｂｌｅＰｒｏｇｒａｍｍａｂｌｅＲｅａｄＯｎｌｙＭｅｍｏｒｙ）、ＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓｒｅａｄ／ｗｒｉｔｅＭｅｍｏｒｙ）、ＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）、又はこれらの記憶媒体の任意の組み合わせによって構成される。 (2) Storage unit 220
The storage unit 220 has a function of storing various information. The storage unit 220 is configured by a storage medium that the audio processing device 20 includes as hardware, such as a hard disk drive (HDD), a solid state drive (SSD), flash memory, electrically erasable programmable read-only memory (EEPROM), random access memory (RAM), read-only memory (ROM), or any combination of these storage media.

記憶部２２０は、例えば、変換候補情報を記憶する。変換候補情報は、テキストデータの変換候補を示す情報である。変換候補情報には、チェッカーによる修正実績に基づき、変換候補となるテキストが蓄積される。
なお、変換候補は、例えば講演単位で蓄積される。変換候補は、講演の終了時に削除されてもよいし、蓄積したまま残されてもよい。蓄積されたまま残された変換候補は、他の講演に用いられてもよい。 The storage unit 220 stores, for example, conversion candidate information. The conversion candidate information is information that indicates conversion candidates for text data. The conversion candidate information accumulates text that is a conversion candidate based on the correction history by the checker.
The conversion candidates are stored, for example, on a lecture-by-lecture basis. The conversion candidates may be deleted when the lecture ends, or may remain stored. The conversion candidates that remain stored may be used for another lecture.
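One possible shape for the conversion candidate information is a per-lecture mapping from an original text to the corrections the checker has made for it. This layout is an assumption: the text only says that candidates accumulate from the checker's correction history, are held per lecture, and may be deleted or kept when the lecture ends. The class and method names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


class ConversionCandidates:
    """Hypothetical in-memory store of conversion candidates per lecture."""

    def __init__(self) -> None:
        # lecture id -> original text -> list of corrected texts
        self._by_lecture: Dict[str, Dict[str, List[str]]] = defaultdict(dict)

    def record(self, lecture_id: str, original: str, corrected: str) -> None:
        """Accumulate a candidate from one correction made by the checker."""
        candidates = self._by_lecture[lecture_id].setdefault(original, [])
        if corrected not in candidates:
            candidates.append(corrected)

    def lookup(self, lecture_id: str, original: str) -> List[str]:
        """Return the candidates recorded for this text in this lecture."""
        return list(self._by_lecture.get(lecture_id, {}).get(original, []))

    def end_lecture(self, lecture_id: str, keep: bool = False) -> None:
        """Delete the lecture's candidates unless they should remain stored."""
        if not keep:
            self._by_lecture.pop(lecture_id, None)
```

Calling `end_lecture(lecture_id, keep=True)` leaves the candidates in place, which corresponds to the option of reusing them for another lecture.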

（３）第１制御部２３０
第１制御部２３０は、同時通訳に関する処理を制御する機能を有する。第１制御部２３０は、例えば、音声処理装置２０がハードウェアとして備えるＣＰＵ（ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）又はＧＰＵ（ＧｒａｐｈｉｃｓＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）にプログラムを実行させることによって実現される。
図２に示すように、第１制御部２３０は、音声データ取得部２３１と、音声変換部２３２とを備える。 (3) First control unit 230
The first control unit 230 has a function of controlling processes related to simultaneous interpretation. The first control unit 230 is realized, for example, by causing a central processing unit (CPU) or a graphics processing unit (GPU) provided as hardware in the speech processing device 20 to execute a program.
As shown in FIG. 2, the first control unit 230 includes a voice data acquisition unit 231 and a voice conversion unit 232.

（３－１）音声データ取得部２３１
音声データ取得部２３１は、音声データを取得する機能を有する。音声データ取得部２３１は、通信部２１０が集音装置１０から受信する音声データを取得し、音声変換部２３２へ入力する。 (3-1) Voice data acquisition unit 231
The voice data acquisition unit 231 has a function of acquiring voice data. The voice data acquisition unit 231 acquires the voice data received by the communication unit 210 from the sound collection device 10, and inputs the voice data to the voice conversion unit 232.

（３－２）音声変換部２３２
音声変換部２３２は、音声変換処理を実行する機能を有する。音声変換部２３２は、同時通訳エンジン２１を用いて音声変換処理を実行する。音声変換処理にて、音声変換部２３２は、音声データ取得部２３１によって取得される、第１の言語による講演者の発話内容を示す音声データを同時通訳エンジン２１へ入力し、同時通訳エンジン２１から出力されるテキストデータを翻訳結果として取得する。これにより、音声変換部２３２は、第１の言語による講演者の発話内容を示す音声データを、同時通訳によって、第１の言語と異なる複数の第２の言語への翻訳結果を示すテキストデータに変換することができる。
音声変換部２３２は、音声変換処理によって得られた複数の第２の言語のテキストデータをモニタリング処理部２４１へ入力する。 (3-2) Voice conversion unit 232
The voice conversion unit 232 has a function of executing a voice conversion process. The voice conversion unit 232 executes the voice conversion process using the simultaneous interpretation engine 21. In the voice conversion process, the voice conversion unit 232 inputs the voice data indicating the contents of the speaker's speech in the first language acquired by the voice data acquisition unit 231 to the simultaneous interpretation engine 21, and acquires text data output from the simultaneous interpretation engine 21 as a translation result. In this way, the voice conversion unit 232 can convert the voice data indicating the contents of the speaker's speech in the first language into text data indicating the translation results into a plurality of second languages different from the first language by simultaneous interpretation.
The voice conversion unit 232 inputs the text data in the plurality of second languages obtained by the voice conversion process to the monitoring processing unit 241.

（４）第２制御部２４０
第２制御部２４０は、モニタリングと字幕表示に関する処理を制御する機能を有する。第２制御部２４０は、例えば、音声処理装置２０がハードウェアとして備えるＣＰＵ又はＧＰＵにプログラムを実行させることによって実現される。
図２に示すように、第２制御部２４０は、モニタリング処理部２４１と、機械翻訳部２４２と、字幕処理部２４３とを備える。 (4) Second control unit 240
The second control unit 240 has a function of controlling processes related to monitoring and subtitle display. The second control unit 240 is realized, for example, by causing a CPU or a GPU included in the audio processing device 20 as hardware to execute a program.
As shown in FIG. 2, the second control unit 240 includes a monitoring processing unit 241, a machine translation unit 242, and a subtitle processing unit 243.

（４－１）モニタリング処理部２４１
モニタリング処理部２４１は、モニタリング処理を実行する機能を有する。モニタリング処理にて、モニタリング処理部２４１は、通信部２１０から画面情報をモニタリング端末３０へ送信し、モニタリング端末３０にモニタリング画面を表示する。モニタリング処理部２４１は、モニタリング端末３０に表示されたモニタリング画面を介して、チェッカーからテキストデータの表示可否の選択操作と、テキストデータに対する修正操作とを受け付ける。 (4-1) Monitoring Processing Unit 241
The monitoring processing unit 241 has a function of executing a monitoring process. In the monitoring process, the monitoring processing unit 241 transmits screen information from the communication unit 210 to the monitoring terminal 30 and causes the monitoring terminal 30 to display a monitoring screen. Via the monitoring screen displayed on the monitoring terminal 30, the monitoring processing unit 241 accepts from the checker a selection operation specifying whether or not the text data may be displayed, and a correction operation on the text data.

モニタリング処理部２４１は、音声変換部２３２から入力される複数の第２の言語のテキストデータのうち、モニタリングの対象として予め指定されている１つの言語のテキストデータをモニタリング画面に表示する。このため、モニタリング処理部２４１は、複数の第２の言語のうち指定された１つの第２の言語について、テキストデータに対する修正を受け付ける。
なお、モニタリングの対象として予め指定されている１つの言語は、例えばチェッカーが希望する言語である。第１の実施形態では、一例として、チェッカーが日本人であり、チェッカーが希望する言語が日本語であるとする。 The monitoring processing unit 241 displays on the monitoring screen the text data of one language that has been designated in advance as a target for monitoring, among the text data of the plurality of second languages input from the voice conversion unit 232. For this reason, the monitoring processing unit 241 accepts corrections to the text data for the one second language designated among the plurality of second languages.
Note that one language designated in advance as a target for monitoring is, for example, a language desired by the checker. In the first embodiment, as an example, it is assumed that the checker is Japanese and the language desired by the checker is Japanese.

モニタリング処理部２４１は、モニタリング画面に表示されたテキストデータについて、表示装置４０への表示可否の選択操作を受け付けるためのＵＩ（ＵｓｅｒＩｎｔｅｒｆａｃｅ）をモニタリング画面に表示する。当該ＵＩは、例えばボタンであるが、チェックボックス、プルダウンなどであってもよい。
モニタリング処理部２４１は、表示可否の選択操作を受け付けると、当該操作の対象であるテキストデータの字幕の表示を制御する。モニタリング処理部２４１の制御により、モニタリング画面にて表示可否に不可が選択されているテキストデータの字幕は表示装置４０に表示されず、モニタリング画面にて表示可否に可能が選択されているテキストデータの字幕は表示装置４０に表示される。 The monitoring processing unit 241 displays, on the monitoring screen, a UI (User Interface) for receiving a selection operation as to whether or not the text data displayed on the monitoring screen should be displayed on the display device 40. The UI is, for example, a button, but may also be a check box, a pull-down, or the like.
When the monitoring processing unit 241 receives a selection operation for display or non-display, it controls the display of subtitles of the text data that is the target of the operation. By the control of the monitoring processing unit 241, subtitles of text data for which "not possible" has been selected as the display option on the monitoring screen are not displayed on the display device 40, and subtitles of text data for which "possible" has been selected as the display option on the monitoring screen are displayed on the display device 40.

モニタリング処理部２４１は、テキストデータを表示するためのＵＩをモニタリング画面に表示する。当該ＵＩは、例えばテキストフィールドである。モニタリング処理部２４１は、モニタリング画面にて、テキストデータをテキストフィールドに表示し、当該テキストフィールドに対する操作によって表示可否の選択を受け付けてもよい。例えば、モニタリング処理部２４１は、テキストフィールドの内部が選択されると表示可否を不可に切り替え、テキストフィールドの内部を選択後にテキストフィールドの外部が選択されると表示可否を可能に切り替える。 The monitoring processing unit 241 displays a UI for displaying text data on the monitoring screen. The UI is, for example, a text field. The monitoring processing unit 241 may display the text data in the text field on the monitoring screen and accept the display/non-display selection through an operation on the text field. For example, the monitoring processing unit 241 switches the display setting to "not possible" when the inside of the text field is selected, and switches it to "possible" when the outside of the text field is selected after the inside has been selected.

また、モニタリング処理部２４１は、テキストデータが表示されたテキストフィールドに対する操作によって、テキストデータに対する修正操作を受け付ける。モニタリング処理部２４１は、例えば、チェッカーによってテキストフィールドの内部が選択されると、選択されたテキストフィールドに表示されているテキストデータに対する修正を受け付ける。チェッカーは、例えば手入力によってテキストデータを修正できる。 The monitoring processing unit 241 also accepts correction operations for the text data by operations on the text field in which the text data is displayed. For example, when the inside of a text field is selected by the checker, the monitoring processing unit 241 accepts corrections to the text data displayed in the selected text field. The checker can correct the text data, for example, by manual input.

モニタリング処理部２４１は、モニタリング画面にて、テキストデータの修正箇所が選択されると修正箇所の近傍に変換候補を表示してもよい。モニタリング処理部２４１は、チェッカーによって変換候補から選択されたテキストを修正箇所に挿入する。このように、チェッカーは、手入力だけでなく、変換候補を選択することでテキストデータを修正することもできる。 When a portion of text data to be corrected is selected on the monitoring screen, the monitoring processing unit 241 may display conversion candidates near the portion to be corrected. The monitoring processing unit 241 inserts text selected by the checker from the conversion candidates into the portion to be corrected. In this way, the checker can correct text data not only by manual input, but also by selecting conversion candidates.

なお、モニタリング処理部２４１は、修正前のテキストデータと修正後のテキストデータとを比較して差分として検出されるテキストのうち、変換候補にないテキストを新しく変換候補に追加する。この場合、モニタリング処理部２４１は、記憶部２２０に記憶されている変換候補情報に新しい変換候補を追加する。これにより、変換候補情報には、チェッカーによる修正実績に基づき、変換候補となるテキストが蓄積されていく。 The monitoring processing unit 241 adds, to the conversion candidates, any text that is not included in the conversion candidates and is detected as a difference when comparing the text data before and after correction. In this case, the monitoring processing unit 241 adds the new conversion candidate to the conversion candidate information stored in the storage unit 220. As a result, text that is a conversion candidate is accumulated in the conversion candidate information based on the correction history by the checker.
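The accumulation of conversion candidates from the before/after difference can be sketched as follows. This is only an illustration under stated assumptions: it diffs whitespace-separated words with Python's standard `difflib`, whereas Japanese text would first need a tokenizer, and the function names are hypothetical rather than taken from the description.

```python
import difflib
from typing import List, Set


def new_fragments(before: str, after: str) -> List[str]:
    """Return the word runs in `after` that replaced or were added to `before`."""
    a, b = before.split(), after.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    fragments = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            fragments.append(" ".join(b[j1:j2]))
    return fragments


def update_candidates(candidates: Set[str], before: str, after: str) -> Set[str]:
    """Add every corrected fragment that is not yet a conversion candidate."""
    for fragment in new_fragments(before, after):
        if fragment not in candidates:
            candidates.add(fragment)
    return candidates


candidates = update_candidates(
    {"cooperation"},
    "the system concatenation works",
    "the system linkage works",
)
```

In this sketch `candidates` plays the role of the conversion candidate information held in the storage unit 220; a persistent store would replace the in-memory set.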

テキストフィールドに表示されたテキストデータの修正が不要であり、表示可否に可能が選択された場合、モニタリング処理部２４１は、音声変換部２３２による音声変換処理によって得られた複数の第２の言語のテキストデータを字幕処理部２４３へ入力する。一方、テキストフィールドに表示されたテキストデータの修正が必要であり、表示可否に不可が選択された場合、モニタリング処理部２４１は、修正後のテキストデータを機械翻訳部２４２へ入力する。 If the text data displayed in the text field does not need to be corrected and "possible" is selected as the display option, the monitoring processing unit 241 inputs the text data in the plurality of second languages obtained by the voice conversion process of the voice conversion unit 232 to the subtitle processing unit 243. On the other hand, if the text data displayed in the text field needs to be corrected and "not possible" is selected as the display option, the monitoring processing unit 241 inputs the corrected text data to the machine translation unit 242.

なお、表示装置４０にて既に字幕が表示されているテキストデータの表示可否に不可が選択された場合、モニタリング処理部２４１は、不可が選択されたテキストデータの字幕を非表示にし、修正操作を受け付け可能とする。 When "no" is selected as the display option for text data for which subtitles are already displayed on the display device 40, the monitoring processing unit 241 hides the subtitles for the text data for which "no" is selected, and makes it possible to accept correction operations.

ここで、図３を参照して、第１の実施形態に係るモニタリング画面について説明する。図３は、第１の実施形態に係るモニタリング画面の一例を示す図である。 Here, the monitoring screen according to the first embodiment will be described with reference to FIG. 3. FIG. 3 is a diagram showing an example of the monitoring screen according to the first embodiment.

図３に示すモニタリング画面Ｇ１のボタンＢ１、ボタンＢ２、及びプルダウンＰＤは、表示可否の自動選択に関する設定を行うためのＵＩである。ボタンＢ１は、表示可否として可能（○）を自動で選択することを設定するためのボタンである。ボタンＢ２は、表示可否として不可（×）を自動で選択することを設定するためのボタンである。プルダウンＰＤは、モニタリング画面Ｇ１にテキストデータが表示されてから表示可否を自動で選択するまでの所定の時間を設定するためのプルダウンである。
図３に示す例では、一例として、ボタンＢ１がオン、ボタンＢ２がオフ、所定の時間が３秒に設定されている。この場合、モニタリング画面にテキストデータが表示されてから３秒経過後に、表示可否として可能が自動で選択される。一方、ボタンＢ１がオフ、ボタンＢ２がオンであった場合、モニタリング画面にテキストデータが表示されてから３秒経過後に、表示可否として不可が自動で選択される。 The buttons B1, B2, and pull-down PD on the monitoring screen G1 shown in Fig. 3 are UIs for making settings related to automatic selection of display/non-display. The button B1 is a button for setting automatic selection of possible (o) as display/non-display. The button B2 is a button for setting automatic selection of not possible (x) as display/non-display. The pull-down PD is a pull-down for setting a predetermined time from when text data is displayed on the monitoring screen G1 until when display/non-display is automatically selected.
In the example shown in FIG. 3, button B1 is on, button B2 is off, and the predetermined time is set to three seconds. In this case, "possible" is automatically selected as the display option three seconds after the text data is displayed on the monitoring screen. On the other hand, if button B1 is off and button B2 is on, "not possible" is automatically selected as the display option three seconds after the text data is displayed on the monitoring screen.
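The automatic selection controlled by buttons B1 and B2 and the pull-down PD can be expressed as a small decision function. The sketch below is an assumption-laden illustration: the class and field names are hypothetical, and the case where both buttons are on or both are off is not covered by the description, so it is treated here as "no automatic selection".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AutoSelectSettings:
    auto_allow: bool       # button B1: automatically select "possible"
    auto_deny: bool        # button B2: automatically select "not possible"
    delay_seconds: float   # pull-down PD: waiting time before auto selection


def auto_selection(settings: AutoSelectSettings,
                   elapsed_seconds: float) -> Optional[bool]:
    """Return True (possible), False (not possible), or None (nothing selected yet)."""
    if elapsed_seconds < settings.delay_seconds:
        return None
    if settings.auto_allow and not settings.auto_deny:
        return True
    if settings.auto_deny and not settings.auto_allow:
        return False
    # Both on or both off: not described in the source, so no auto selection.
    return None


fig3 = AutoSelectSettings(auto_allow=True, auto_deny=False, delay_seconds=3.0)
```

With the FIG. 3 settings, `auto_selection(fig3, 2.9)` yields `None` and `auto_selection(fig3, 3.0)` yields `True`.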

図３のモニタリング画面Ｇ１のテキストフィールドＦ１～Ｆ４は、表示されたテキストデータをチェッカーが修正するためのＵＩである。モニタリング処理部２４１は、音声変換部２３２から入力される複数の第２の言語のテキストデータのうち、モニタリングの対象として予め指定されている１つの言語のテキストデータを、当該テキストフィールドに表示する。
図３に示す例では、一例として、４つの音声データから音声変換された４つのテキストデータが、それぞれテキストフィールドＦ１～Ｆ４に時系列で表示されている。 Text fields F1 to F4 on the monitoring screen G1 in FIG. 3 are UIs for the checker to correct the displayed text data. The monitoring processing unit 241 displays, in these text fields, the text data in the one language designated in advance as the monitoring target, among the text data in the plurality of second languages input from the voice conversion unit 232.
In the example shown in FIG. 3, four pieces of text data converted from four pieces of voice data are displayed in chronological order in text fields F1 to F4, respectively.

図３のモニタリング画面Ｇ１のボタンＢ３及びボタンＢ４は、チェッカーがテキストデータの表示可否を選択するためのＵＩである。ボタンＢ３は、表示可否として不可（×）を選択するためのボタンである。ボタンＢ４は、表示可否として可能（○）を選択するためのボタンである。ボタンＢ３及びボタンＢ４は、テキストフィールドＦごとに表示される。なお、ボタンＢ３及びボタンＢ４の選択は、チェッカーによる選択だけでなく、所定の時間の経過後、テキストデータの修正開始時、テキストデータの修正後などに、モニタリング処理部２４１の制御によって切り替えられる場合もある。 Buttons B3 and B4 on the monitoring screen G1 in FIG. 3 are UIs that allow the checker to select whether or not to display the text data. Button B3 is a button for selecting "not possible" (x) as whether or not to display. Button B4 is a button for selecting "possible" (o) as whether or not to display. Buttons B3 and B4 are displayed for each text field F. Note that the selection of buttons B3 and B4 may not only be made by the checker, but may also be switched under the control of the monitoring processing unit 241 after a predetermined time has elapsed, when editing the text data starts, or after editing the text data.

図３に示す例では、一例として、テキストフィールドＦ１～Ｆ４のそれぞれにボタンＢ３－１～Ｂ３－４と、ボタンＢ４－１～Ｂ４－４が表示されている。
テキストフィールドＦ１のボタンＢ３－１とボタンＢ４－１の例では、チェッカーによって、テキストフィールドＦ１に表示されているテキストデータが正確であり表示可能であると判定され、ボタンＢ４－１が選択されている。
テキストフィールドＦ２のボタンＢ３－２とボタンＢ４－２の例では、チェッカーによって、テキストフィールドＦ２に表示されているテキストデータの意味が通じないと判定され、ボタンＢ３－２が選択されている。
テキストフィールドＦ３のボタンＢ３－３とボタンＢ４－３の例では、チェッカーによって、テキストフィールドＦ３に表示されているテキストデータが正確であり表示可能であると判定され、ボタンＢ４－３が選択されている。
テキストフィールドＦ４のボタンＢ３－４とボタンＢ４－４の例では、チェッカーによって、テキストフィールドＦ４に表示されているテキストデータの修正必要かつ表示不可であると判定され、ボタンＦ３－４が選択されたが、テキストデータの修正後にモニタリング処理部２４１の制御によってボタンＢ４－４が選択されている。 In the example shown in FIG. 3, as an example, buttons B3-1 to B3-4 and buttons B4-1 to B4-4 are displayed in each of the text fields F1 to F4.
In the example of buttons B3-1 and B4-1 in text field F1, the checker has determined that the text data displayed in text field F1 is correct and can be displayed, and button B4-1 has been selected.
In the example of buttons B3-2 and B4-2 in text field F2, the checker determines that the meaning of the text data displayed in text field F2 is incomprehensible, and button B3-2 is selected.
In the example of buttons B3-3 and B4-3 in text field F3, the checker has determined that the text data displayed in text field F3 is correct and can be displayed, and button B4-3 has been selected.
In the example of buttons B3-4 and B4-4 in text field F4, the checker determined that the text data displayed in text field F4 needed to be corrected and could not be displayed, and button B3-4 was selected, but after the text data was corrected, button B4-4 was selected under the control of the monitoring processing unit 241.

ここで、図４を参照して、第１の実施形態に係る修正操作手順について説明する。図４は、第１の実施形態に係る修正操作手順の一例を示す図である。図４には、テキストフィールドＦ５に表示されたテキストデータをチェッカーが修正する例が示されている。なお、ボタンＢ３－５とボタンＢ４－５では、初期選択としてボタンＢ４－５が選択されているものとする。 Now, referring to FIG. 4, the correction operation procedure according to the first embodiment will be described. FIG. 4 is a diagram showing an example of the correction operation procedure according to the first embodiment. FIG. 4 shows an example in which a checker corrects text data displayed in text field F5. Note that, between buttons B3-5 and B4-5, it is assumed that button B4-5 is selected as the initial selection.

図４に示すように、まず、チェッカーは、テキストフィールドＦ５に表示されたテキストデータを確認し、修正が必要であるためテキストフィールドＦ５の任意の位置を選択（タッチ）する（ステップＳ１）。図４に示す例では、修正箇所（位置Ｐ）が選択されたものとする。 As shown in FIG. 4, first, the checker checks the text data displayed in the text field F5, and since correction is required, selects (touches) any position in the text field F5 (step S1). In the example shown in FIG. 4, it is assumed that the correction location (position P) is selected.

チェッカーによるテキストフィールドＦ５の選択後、モニタリング処理部２４１は、テキストフィールドＦ５を選択状態（例えば太枠表示）にし、表示可否の選択をボタンＢ４－５からボタンＢ３－５へ切り替え、テキストフィールドＦ５の内部にカーソルＫを表示し、修正箇所の近傍に変換候補を示すウィンドウＷ１を表示する（ステップＳ２）。カーソルＫの位置が修正箇所からずれている場合、チェッカーは、カーソルＫの位置を修正箇所へ移動する。
なお、テキストフィールドＦ５に表示されているテキストデータの字幕が既に表示装置４０に表示されている場合、チェッカーによって修正箇所が選択されたタイミングで、テキストフィールドＦ５に表示されているテキストデータの字幕が非表示（削除）される。 After the checker selects the text field F5, the monitoring processing unit 241 places the text field F5 in a selected state (e.g., displays it with a bold frame), switches the display/non-display selection from button B4-5 to button B3-5, displays a cursor K inside the text field F5, and displays a window W1 showing conversion candidates near the correction location (step S2). If the cursor K is not positioned at the correction location, the checker moves the cursor K to the correction location.
Note that if the subtitles of the text data displayed in text field F5 are already being displayed on the display device 40, those subtitles are hidden (deleted) at the time the checker selects the part to be corrected.

チェッカーは、修正対象のテキストを削除する（ステップＳ３）。図４に示す例では、チェッカーは、「連結」を示すテキストを削除している。 The checker deletes the text to be corrected (step S3). In the example shown in FIG. 4, the checker deletes the text indicating "concatenation."

チェッカーは、ウィンドウＷ１に示されている変換候補の中から、正しいテキストを選択する（ステップＳ４）。図４に示す例では、チェッカーは、「連携」を示すテキストを選択している。 The checker selects the correct text from among the conversion candidates shown in window W1 (step S4). In the example shown in FIG. 4, the checker selects the text indicating "linkage."

チェッカーによる正しいテキストの選択後、モニタリング処理部２４１は、テキストフィールドＦ５の内部の修正箇所に、選択されたテキストを挿入する（ステップＳ５）。なお、ウィンドウＷ１に表示されている変換候補の中に正しいテキストがない場合、チェッカーは、正しいテキストを修正箇所に手入力することができる。 After the checker selects the correct text, the monitoring processing unit 241 inserts the selected text into the correction location inside the text field F5 (step S5). If the correct text is not among the conversion candidates displayed in the window W1, the checker can manually input the correct text into the correction location.

チェッカーは、修正が完了したため、テキストフィールドＦ５の外部を選択する（ステップＳ６）。これにより、モニタリング処理部２４１は、テキストフィールドＦ５を非選択状態に戻し（太枠表示の解除）、表示可否の選択をボタンＢ３－５からボタンＢ４－５へ切り替える。さらに、モニタリング処理部２４１は、修正後のテキストデータを機械翻訳部２４２へ入力する。 As the correction is complete, the checker selects the outside of text field F5 (step S6). As a result, the monitoring processing unit 241 returns text field F5 to an unselected state (cancels the bold frame display) and switches the display/non-display selection from button B3-5 to button B4-5. Furthermore, the monitoring processing unit 241 inputs the corrected text data to the machine translation unit 242.
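Steps S1 to S6 amount to a small state machine on each text field. The following sketch models that behavior under stated assumptions: the class, its method names, and the `send_to_translation` callback are hypothetical, and the subtitle hiding on the display device 40 is omitted.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MonitoredField:
    text: str
    displayable: bool = True   # button B4 ("possible") is the initial selection
    selected: bool = False
    edited: bool = False

    def select_inside(self) -> None:
        # Steps S1-S2: selecting the field switches the display option to "not possible".
        self.selected = True
        self.displayable = False

    def replace(self, wrong: str, correct: str) -> None:
        # Steps S3-S5: delete the wrong text and insert the chosen or typed text.
        if not self.selected:
            raise RuntimeError("field must be selected before editing")
        if wrong not in self.text:
            raise ValueError(f"{wrong!r} not found in field text")
        self.text = self.text.replace(wrong, correct, 1)
        self.edited = True

    def select_outside(self, send_to_translation: Callable[[str], None]) -> None:
        # Step S6: leaving the field restores "possible" and forwards the corrected text.
        self.selected = False
        self.displayable = True
        if self.edited:
            send_to_translation(self.text)
            self.edited = False


sent = []
f5 = MonitoredField("system concatenation is ready")
f5.select_inside()
f5.replace("concatenation", "linkage")
f5.select_outside(sent.append)
```

After the last call, `sent` holds the corrected text that would be passed to the machine translation unit 242.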

（４－２）機械翻訳部２４２
機械翻訳部２４２は、機械翻訳処理を実行する機能を有する。機械翻訳部２４２は、機械翻訳エンジン２２を用いて機械翻訳処理を実行する。機械翻訳処理にて、機械翻訳部２４２は、モニタリング処理部２４１によって修正されたテキストデータを機械翻訳エンジン２２へ入力し、機械翻訳エンジン２２から出力されるテキストデータを翻訳結果として取得する。これにより、機械翻訳部２４２は、修正後のテキストデータを、修正時に指定されなかった第２の言語に翻訳し、複数の第２の言語ごとに修正が反映された翻訳結果を示すテキストデータを取得することができる。
機械翻訳部２４２は、モニタリング処理によって得られた修正後のテキストデータと、機械翻訳処理によって得られた複数の第２の言語のテキストデータを字幕処理部２４３へ入力する。 (4-2) Machine translation unit 242
The machine translation unit 242 has a function of executing a machine translation process. The machine translation unit 242 executes the machine translation process using the machine translation engine 22. In the machine translation process, the machine translation unit 242 inputs the text data corrected by the monitoring processing unit 241 to the machine translation engine 22, and acquires the text data output from the machine translation engine 22 as a translation result. In this way, the machine translation unit 242 can translate the corrected text data into a second language that was not specified at the time of correction, and acquire text data indicating the translation result in which the correction is reflected for each of the multiple second languages.
The machine translation unit 242 inputs the corrected text data obtained by the monitoring process and the multiple pieces of text data in a second language obtained by the machine translation process to the subtitle processing unit 243.
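How a correction made in one language reaches the other second languages can be sketched as below. The `translate` callable is a hypothetical stand-in for the machine translation engine 22, whose actual interface is not given in the description, and the language codes are illustrative.

```python
from typing import Callable, Dict, List

# Hypothetical signature: (text, source language, target language) -> translated text.
Translate = Callable[[str, str, str], str]


def propagate_correction(corrected_text: str, corrected_lang: str,
                         all_langs: List[str], translate: Translate) -> Dict[str, str]:
    """Return corrected text for every second language.

    The corrected language keeps the checker's text as-is; every other
    language is re-translated from that corrected text.
    """
    results = {corrected_lang: corrected_text}
    for lang in all_langs:
        if lang != corrected_lang:
            results[lang] = translate(corrected_text, corrected_lang, lang)
    return results


def fake_translate(text: str, src: str, dst: str) -> str:
    # Stand-in that only tags the text with the target language.
    return f"{dst}:{text}"


subtitles = propagate_correction("修正", "ja", ["ja", "en", "fr"], fake_translate)
```

The returned dictionary corresponds to what the machine translation unit 242 passes on to the subtitle processing unit 243: the corrected text plus its translations.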

（４－３）字幕処理部２４３
字幕処理部２４３は、字幕表示処理を実行する機能を有する。字幕表示処理にて、字幕処理部２４３は、音声変換部２３２によって変換されてモニタリング処理部２４１から入力される複数の第２の言語のテキストデータ、又は、機械翻訳部２４２から入力される複数の第２の言語のテキストデータを用いて、表示装置４０に字幕を表示する。
モニタリング処理部２４１から入力される複数の第２の言語のテキストデータを用いる場合、字幕処理部２４３は、モニタリング処理にて修正が不要と判定された翻訳結果を示すテキストデータを字幕として表示装置４０に表示することができる。一方、機械翻訳部２４２から入力される複数の第２の言語のテキストデータを用いる場合、字幕処理部２４３は、モニタリング処理にて修正が必要と判定され、修正が反映された翻訳結果を示すテキストデータを字幕として表示装置４０に表示することができる。 (4-3) Subtitle processing unit 243
The subtitle processing unit 243 has a function of executing a subtitle display process. In the subtitle display process, the subtitle processing unit 243 displays subtitles on the display device 40 using either the text data in the plurality of second languages converted by the voice conversion unit 232 and input from the monitoring processing unit 241, or the text data in the plurality of second languages input from the machine translation unit 242.
When using text data in a plurality of second languages input from the monitoring processing unit 241, the subtitle processing unit 243 can display text data indicating a translation result for which correction is determined not to be required in the monitoring process as subtitles on the display device 40. On the other hand, when using text data in a plurality of second languages input from the machine translation unit 242, the subtitle processing unit 243 can display text data indicating a translation result for which correction is determined to be required in the monitoring process and in which the correction is reflected as subtitles on the display device 40.

＜１－３．処理の流れ＞
以上、第１の実施形態に係る音声処理装置２０の機能構成について説明した。続いて、図５を参照して、第１の実施形態に係る字幕表示システム１における処理の流れについて説明する。図５は、第１の実施形態に係る字幕表示システム１における処理の流れの一例を示すシーケンス図である。 <1-3. Processing flow>
The functional configuration of the audio processing device 20 according to the first embodiment has been described above. Next, a process flow in the subtitle display system 1 according to the first embodiment will be described with reference to Fig. 5. Fig. 5 is a sequence diagram showing an example of a process flow in the subtitle display system 1 according to the first embodiment.

図５に示すように、まず、集音装置１０は、集音した音声の音声データを音声処理装置２０へ送信する（ステップＳ１０１）。音声処理装置２０の音声データ取得部２３１は、通信部２１０が集音装置１０から受信する音声データを取得する。 As shown in FIG. 5, first, the sound collection device 10 transmits audio data of the collected audio to the audio processing device 20 (step S101). The audio data acquisition unit 231 of the audio processing device 20 acquires the audio data that the communication unit 210 receives from the sound collection device 10.

次に、音声処理装置２０の音声変換部２３２は、音声データ取得部２３１によって取得された音声データについて、同時通訳エンジン２１へ同時通訳を依頼する（ステップＳ１０２）。音声変換部２３２は、通信部２１０を介して、音声データを同時通訳エンジン２１へ送信する。 Next, the voice conversion unit 232 of the voice processing device 20 requests the simultaneous interpretation engine 21 to perform simultaneous interpretation of the voice data acquired by the voice data acquisition unit 231 (step S102). The voice conversion unit 232 transmits the voice data to the simultaneous interpretation engine 21 via the communication unit 210.

次に、同時通訳エンジン２１は、音声処理装置２０から受信する音声データを同時通訳（音声認識及び機械翻訳）し、同時通訳の結果を音声処理装置２０へ送信する（ステップＳ１０３）。同時通訳の結果は、複数の第２の言語のテキストデータである。 Next, the simultaneous interpretation engine 21 simultaneously interprets (speech recognition and machine translation) the voice data received from the voice processing device 20 and transmits the result of the simultaneous interpretation to the voice processing device 20 (step S103). The result of the simultaneous interpretation is text data in multiple second languages.

次に、音声処理装置２０のモニタリング処理部２４１は、モニタリング画面の表示処理を行う（ステップＳ１０４）。モニタリング処理部２４１は、通信部２１０を介して画面情報をモニタリング端末３０へ送信し、モニタリング画面を表示させる。 Next, the monitoring processing unit 241 of the voice processing device 20 performs a process of displaying the monitoring screen (step S104). The monitoring processing unit 241 transmits screen information to the monitoring terminal 30 via the communication unit 210, and displays the monitoring screen.

モニタリング端末３０は、音声処理装置２０から受信する画面情報に基づき、モニタリング画面を表示する（ステップＳ１０５）。
モニタリング画面の表示後、モニタリング端末３０は、モニタリング画面にてチェッカーによるテキストデータの修正を受け付け、修正内容を示す修正情報を音声処理装置２０へ送信する（ステップＳ１０６）。 The monitoring terminal 30 displays a monitoring screen based on the screen information received from the sound processing device 20 (step S105).
After displaying the monitoring screen, the monitoring terminal 30 accepts corrections to the text data made by the checker on the monitoring screen, and transmits correction information indicating the contents of the corrections to the voice processing device 20 (step S106).

モニタリング処理部２４１は、通信部２１０がモニタリング端末３０から修正情報を受信するか否かに応じて、テキストデータの修正があるか否かを判定する（ステップＳ１０７）。修正がある場合（ステップＳ１０７／ＹＥＳ）、処理はステップＳ１０８へ進む。一方、修正がない場合（ステップＳ１０７／ＮＯ）、処理はステップＳ１１０へ進む。 The monitoring processing unit 241 determines whether or not the text data has been corrected, depending on whether or not the communication unit 210 receives correction information from the monitoring terminal 30 (step S107). If there has been a correction (step S107/YES), the process proceeds to step S108. On the other hand, if there has been no correction (step S107/NO), the process proceeds to step S110.

処理がステップＳ１０８へ進んだ場合、音声処理装置２０の機械翻訳部２４２は、チェッカーによって修正されたテキストデータについて、機械翻訳エンジン２２へ機械翻訳を依頼する（ステップＳ１０８）。機械翻訳部２４２は、通信部２１０を介して、テキストデータを機械翻訳エンジン２２へ送信する。 When the process proceeds to step S108, the machine translation unit 242 of the speech processing device 20 requests the machine translation engine 22 to perform machine translation of the text data corrected by the checker (step S108). The machine translation unit 242 transmits the text data to the machine translation engine 22 via the communication unit 210.

次に、機械翻訳エンジン２２は、音声処理装置２０から受信するテキストデータを機械翻訳し、機械翻訳の結果を音声処理装置２０へ送信する（ステップＳ１０９）。機械翻訳の結果は、複数の第２の言語のテキストデータである。送信後、処理はステップＳ１１１へ進む。 Next, the machine translation engine 22 machine translates the text data received from the speech processing device 20 and transmits the results of the machine translation to the speech processing device 20 (step S109). The results of the machine translation are text data in multiple second languages. After transmission, the process proceeds to step S111.

処理がステップＳ１１０へ進んだ場合、モニタリング処理部２４１は、テキストデータの表示可否を判定する（ステップＳ１１０）。表示可否が表示可能である場合（ステップＳ１１０／ＹＥＳ）、処理はステップＳ１１１へ進む。一方、表示可否が表示不可である場合（ステップＳ１１０／ＮＯ）、処理は終了する。 When the process proceeds to step S110, the monitoring processing unit 241 determines whether the text data can be displayed (step S110). If the display is possible (step S110/YES), the process proceeds to step S111. On the other hand, if the display is not possible (step S110/NO), the process ends.

処理がステップＳ１１１へ進んだ場合、音声処理装置２０の字幕処理部２４３は、字幕表示処理を実行する（ステップＳ１１１）。字幕処理部２４３は、通信部２１０を介して、複数の第２の言語のテキストデータを表示装置４０へ送信する。
表示装置４０は、音声処理装置２０から受信する第２の言語のテキストデータを字幕として表示する。（ステップＳ１１２）。 When the process proceeds to step S111, the subtitle processing unit 243 of the audio processing device 20 executes a subtitle display process (step S111). The subtitle processing unit 243 transmits text data in a plurality of second languages to the display device 40 via the communication unit 210.
The display device 40 displays the text data in the second language received from the audio processing device 20 as subtitles (step S112).
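The branching in steps S107 to S111 can be summarized in a few lines. This is a sketch under assumptions: the function name and the `retranslate` callback (standing in for steps S108 and S109) are hypothetical, and network transmission is left out.

```python
from typing import Callable, Dict, Optional


def decide_subtitles(interpreted: Dict[str, str],
                     correction: Optional[str],
                     displayable: bool,
                     retranslate: Callable[[str], Dict[str, str]]
                     ) -> Optional[Dict[str, str]]:
    """Return the texts to show as subtitles, or None if nothing is shown."""
    if correction is not None:
        # S107/YES -> S108, S109: re-translate the corrected text, then S111.
        return retranslate(correction)
    if displayable:
        # S107/NO -> S110/YES: show the interpretation results as they are.
        return dict(interpreted)
    # S110/NO: processing ends without displaying subtitles.
    return None


interpreted = {"ja": "A", "en": "B"}


def retranslate(text: str) -> Dict[str, str]:
    # Stand-in for the machine translation round trip.
    return {"ja": text, "en": text.upper()}
```

Calling `decide_subtitles(interpreted, None, False, retranslate)` returns `None`, which matches the path where the process ends without a subtitle.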

以上、第１の実施形態に係る処理の流れについて説明した。
以上説明したように、第１の実施形態に係る音声処理装置２０は、第１の言語によるユーザの発話内容を示す音声データを、同時通訳によって、第１の言語と異なる複数の第２の言語への翻訳結果を示すテキストデータに変換する音声変換部２３２と、複数の第２の言語のうち指定された１つの第２の言語について、テキストデータに対する修正を受け付けるモニタリング処理部２４１と、修正後のテキストデータを、修正時に指定されなかった第２の言語に翻訳し、複数の第２の言語ごとに修正が反映された翻訳結果を示すテキストデータを取得する機械翻訳部２４２と、を備える。 The process flow according to the first embodiment has been described above.
As described above, the voice processing device 20 according to the first embodiment includes a voice conversion unit 232 that converts voice data indicating the content of a user's speech in a first language into text data indicating the translation results into a plurality of second languages different from the first language by simultaneous interpretation, a monitoring processing unit 241 that accepts corrections to the text data for one specified second language among the plurality of second languages, and a machine translation unit 242 that translates the corrected text data into a second language that was not specified at the time of correction and obtains text data indicating the translation results in which the corrections are reflected for each of the plurality of second languages.

かかる構成により、多言語の同時通訳において、同時通訳結果に誤認識又は誤訳が含まれる場合、チェッカーは、翻訳語の多言語のうち１つの言語の同時通訳結果のみを修正するだけで、他の言語の同時通訳結果も修正することができる。
よって、第１の実施形態に係る音声処理装置２０は、多言語の同時通訳における同時通訳結果を容易に修正することを可能とする。 With this configuration, when a multilingual simultaneous interpretation result contains a misrecognition or mistranslation, the checker only needs to correct the result in one of the translated languages for the results in the other languages to be corrected as well.
Therefore, the speech processing device 20 according to the first embodiment makes it possible to easily correct the results of simultaneous interpretation in multiple languages.

また、講演者の発話内容の同時通訳結果に誤認識又は誤訳が含まれる場合、チェッカーは、同時通訳結果（字幕）が聴講者へ提示される前に誤認識又は誤訳を修正することができる。これにより、聴講者には、誤認識又は誤訳が含まれる同時通訳結果は提示されず、修正後の誤認識又は誤訳が含まれない同時通訳結果のみが提示される。
よって、第１の実施形態に係る音声処理装置２０は、ユーザが同時翻訳の内容を正しく理解することを可能とする。 Furthermore, if the simultaneous interpretation of the speaker's speech contains a misrecognition or mistranslation, the checker can correct the misrecognition or mistranslation before the simultaneous interpretation result (subtitles) is presented to the audience. This ensures that the audience will not be shown the simultaneous interpretation result that contains the misrecognition or mistranslation, but only the corrected simultaneous interpretation result that does not contain the misrecognition or mistranslation.
Therefore, the speech processing device 20 according to the first embodiment enables the user to correctly understand the content of the simultaneous translation.

＜＜２．第２の実施形態＞＞
以上、第１の実施形態について説明した。続いて、図６から図１２を参照して、第２の実施形態について説明する。第２の実施形態では、モニタリング画面におけるテキストデータの修正方法について、第１の実施形態とは異なる修正方法について説明する。以下では、第１の実施形態における説明と重複する説明については、適宜省略する。
なお、第２の実施形態では、講演者が発話する言語（第１の言語）が任意の１つの言語（例えば日本語）であり、多言語翻訳された字幕の言語（第２の言語）が任意の複数の言語（例えば日本語以外の言語）であるものとする。 <<2. Second embodiment>>
The first embodiment has been described above. Next, the second embodiment will be described with reference to Figs. 6 to 12. In the second embodiment, a method of correcting text data on a monitoring screen that is different from that in the first embodiment will be described. In the following, descriptions that overlap with those in the first embodiment will be omitted as appropriate.
In the second embodiment, the language spoken by the speaker (first language) is any one language (e.g., Japanese), and the languages of the multilingually translated subtitles (second languages) are any plurality of languages (e.g., languages other than Japanese).

＜２－１．字幕表示システムの構成＞
図６を参照して、第２の実施形態に係る字幕表示システム１ａの構成について説明する。図６は、第２の実施形態に係る字幕表示システム１ａの構成の一例を示す図である。
図６に示すように、字幕表示システム１ａは、集音装置１０と、音声処理装置２０ａと、機械翻訳エンジン２２と、音声認識エンジン２３と、変換候補ＡＰＩ（ＡｐｐｌｉｃａｔｉｏｎＰｒｏｇｒａｍｍｉｎｇＩｎｔｅｒｆａｃｅ）２４と、モニタリング端末３０と、表示装置４０とを備える。 <2-1. Configuration of subtitle display system>
A configuration of a subtitle display system 1a according to the second embodiment will be described with reference to Fig. 6. Fig. 6 is a diagram showing an example of the configuration of a subtitle display system 1a according to the second embodiment.
As shown in FIG. 6, the subtitle display system 1a includes a sound collection device 10, a sound processing device 20a, a machine translation engine 22, a voice recognition engine 23, a conversion candidate API (Application Programming Interface) 24, a monitoring terminal 30, and a display device 40.

（１）集音装置１０
第２の実施形態に係る集音装置１０は、第１の実施形態に係る集音装置１０と同様であるため、その説明を省略する。 (1) Sound collection device 10
The sound collection device 10 according to the second embodiment is similar to the sound collection device 10 according to the first embodiment, and therefore a description thereof will be omitted.

（２）音声処理装置２０ａ
第２の実施形態に係る音声処理装置２０ａは、第１の実施形態に係る音声処理装置２０と比較してモニタリング画面における修正方法が異なることにより、実行する処理が異なる。音声処理装置２０ａは、集音装置１０から受信する音声データに基づき、音声変換処理、モニタリング処理、機械翻訳処理、字幕表示処理に加え、形態素解析処理、変換優先度処理、自動変換処理部をさらに行う。 (2) Audio processing device 20a
The audio processing device 20a according to the second embodiment differs from the audio processing device 20 according to the first embodiment in the processes it executes, because the correction method on the monitoring screen is different. Based on the audio data received from the sound collection device 10, the audio processing device 20a performs not only audio conversion processing, monitoring processing, machine translation processing, and subtitle display processing, but also morphological analysis processing, conversion priority processing, and automatic conversion processing.

（３）機械翻訳エンジン２２
第２の実施形態に係る機械翻訳エンジン２２は、第１の実施形態に係る機械翻訳エンジン２２と同様であるため、その説明を省略する。 (3) Machine translation engine 22
The machine translation engine 22 according to the second embodiment is similar to the machine translation engine 22 according to the first embodiment, and therefore a description thereof will be omitted.

（４）音声認識エンジン２３
音声認識エンジン２３は、音声データに対する音声認識を行うエンジン（プログラム）である。音声認識エンジン２３は、音声処理装置２０ａから入力される第１の言語の音声データを音声認識によって第１の言語のテキストデータに変換する。
なお、音声認識エンジン２３の機能は、音声処理装置２０ａとは異なる装置又は端末によって提供されてもよいし、音声処理装置２０ａによって提供されてもよい。 (4) Voice recognition engine 23
The voice recognition engine 23 is an engine (program) that performs voice recognition on voice data. The voice recognition engine 23 converts the voice data in the first language input from the voice processing device 20a into text data in the first language by voice recognition.
The function of the voice recognition engine 23 may be provided by a device or terminal different from the voice processing device 20a, or may be provided by the voice processing device 20a.

（５）変換候補ＡＰＩ２４
変換候補ＡＰＩ２４は、テキストデータの読みに対する変換候補を取得するＡＰＩである。変換候補ＡＰＩ２４は、音声処理装置２０ａから入力される第１の言語のテキストデータの読みを示す情報に基づき、第１の言語のテキストデータの読みに対応する変換候補を取得する。
なお、変換候補ＡＰＩ２４の機能は、音声処理装置２０ａとは異なる装置又は端末によって提供されてもよいし、音声処理装置２０ａによって提供されてもよい。 (5) Conversion candidate API 24
The conversion candidate API 24 is an API that acquires conversion candidates for the reading of text data. The conversion candidate API 24 acquires conversion candidates corresponding to the reading of the text data of the first language based on information indicating the reading of the text data of the first language input from the voice processing device 20a.
The function of the conversion candidate API 24 may be provided by a device or terminal different from the voice processing device 20a, or may be provided by the voice processing device 20a.
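A lookup from a reading to its conversion candidates might look like the sketch below. Everything here is illustrative: the in-memory table, its entries, and the function name are assumptions, since the description does not specify the data source or the interface of the conversion candidate API 24.

```python
from typing import Dict, List

# Illustrative data only; the real dictionary behind the API is not described.
_READING_TO_CANDIDATES: Dict[str, List[str]] = {
    "れんけい": ["連携", "連係"],
    "れんけつ": ["連結"],
}


def get_conversion_candidates(reading: str) -> List[str]:
    """Return candidate spellings for a reading, or an empty list if unknown."""
    # Return a copy so callers cannot mutate the shared table.
    return list(_READING_TO_CANDIDATES.get(reading, []))


candidates_for_renkei = get_conversion_candidates("れんけい")
```

An unknown reading yields an empty list, which leaves the checker to enter the correction manually.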

（６）モニタリング端末３０
第２の実施形態に係るモニタリング端末３０は、第１の実施形態に係るモニタリング端末３０と同様であるため、その説明を省略する。 (6) Monitoring terminal 30
The monitoring terminal 30 according to the second embodiment is similar to the monitoring terminal 30 according to the first embodiment, and therefore a description thereof will be omitted.

（７）表示装置４０
第２の実施形態に係る表示装置４０は、第１の実施形態に係る表示装置４０と同様であるため、その説明を省略する。 (7) Display device 40
The display device 40 according to the second embodiment is similar to the display device 40 according to the first embodiment, and therefore a description thereof will be omitted.

＜２－２．音声処理装置の機能構成＞
以上、第２の実施形態に係る字幕表示システム１ａの構成について説明した。続いて、図７から図１０を参照して、第２の実施形態に係る音声処理装置２０ａの機能構成について説明する。図７は、第２の実施形態に係る音声処理装置２０ａの機能構成の一例を示すブロック図である。
図７に示すように、音声処理装置２０ａは、通信部２１０ａと、記憶部２２０ａと、第１制御部２３０ａと、第２制御部２４０ａとを備える。 <2-2. Functional configuration of the voice processing device>
The configuration of the subtitle display system 1a according to the second embodiment has been described above. Next, the functional configuration of the voice processing device 20a according to the second embodiment will be described with reference to FIGS. 7 to 10. FIG. 7 is a block diagram showing an example of the functional configuration of the voice processing device 20a according to the second embodiment.
As shown in FIG. 7, the voice processing device 20a includes a communication unit 210a, a storage unit 220a, a first control unit 230a, and a second control unit 240a.

（１）通信部２１０ａ
通信部２１０ａは、集音装置１０と、機械翻訳エンジン２２と、音声認識エンジン２３と、変換候補ＡＰＩ２４と、モニタリング端末３０と、表示装置４０と通信可能に接続されており、各種情報を送受信する。 (1) Communication unit 210a
The communication unit 210a is communicatively connected to the sound collection device 10, the machine translation engine 22, the voice recognition engine 23, the conversion candidate API 24, the monitoring terminal 30, and the display device 40, and transmits and receives various information.

（２）記憶部２２０ａ
記憶部２２０ａは、優先度情報も記憶する点が、第１の実施形態に係る記憶部２２０と異なる。優先度情報は、テキストデータが形態素解析によって分割された形態素単位での自動変換の優先度を示す情報である。当該優先度は、チェッカーによるモニタリングにおける過去の修正履歴に基づき算出される。 (2) Storage unit 220a
The storage unit 220a differs from the storage unit 220 according to the first embodiment in that it also stores priority information. The priority information is information indicating the priority of automatic conversion for each morpheme into which text data is divided by morphological analysis. The priority is calculated based on the past correction history during monitoring by the checker.

（３）第１制御部２３０ａ
図７に示すように、第１制御部２３０ａは、音声データ取得部２３１ａと、音声変換部２３２ａとを備える。 (3) First control unit 230a
As shown in FIG. 7, the first control unit 230a includes a voice data acquisition unit 231a and a voice conversion unit 232a.

（３－１）音声データ取得部２３１ａ
第２の実施形態に係る音声データ取得部２３１ａは、第１の実施形態に係る音声データ取得部２３１と同様であるため、その説明を省略する。 (3-1) Voice data acquisition unit 231a
The voice data acquisition unit 231a according to the second embodiment is similar to the voice data acquisition unit 231 according to the first embodiment, and therefore a description thereof will be omitted.

（３－２）音声変換部２３２ａ
音声変換部２３２ａは、同時通訳ではなく音声認識のみを行う点が第１の実施形態に係る音声変換部２３２と異なる。
音声変換部２３２ａは、音声認識エンジン２３を用いて音声変換処理として音声認識処理を実行する。音声認識処理にて、音声変換部２３２ａは、音声データ取得部２３１ａによって取得される、第１の言語による講演者の発話内容を示す音声データを音声認識エンジン２３へ入力し、音声認識エンジン２３から出力されるテキストデータを音声認識結果として取得する。これにより、音声変換部２３２ａは、第１の言語によるユーザの発話内容を示す音声データを、音声認識によって、第１の言語によるユーザの発話内容を示すテキストデータに変換することができる。
音声変換部２３２ａは、音声認識処理によって得られた第１の言語のテキストデータをモニタリング処理部２４１ａへ入力する。 (3-2) Voice conversion unit 232a
The voice conversion unit 232a differs from the voice conversion unit 232 according to the first embodiment in that it performs only voice recognition, not simultaneous interpretation.
The voice conversion unit 232a executes a voice recognition process as a voice conversion process using the voice recognition engine 23. In the voice recognition process, the voice conversion unit 232a inputs voice data indicating the contents of the speaker's speech in the first language acquired by the voice data acquisition unit 231a to the voice recognition engine 23, and acquires text data output from the voice recognition engine 23 as a voice recognition result. In this way, the voice conversion unit 232a can convert the voice data indicating the contents of the user's speech in the first language into text data indicating the contents of the user's speech in the first language by voice recognition.
The voice conversion unit 232a inputs the text data in the first language obtained by the voice recognition process to the monitoring processing unit 241a.

（４）第２制御部２４０ａ
図７に示すように、第２制御部２４０ａは、モニタリング処理部２４１ａと、機械翻訳部２４２ａと、字幕処理部２４３ａと、形態素解析部２４４と、変換優先度処理部２４５と、自動変換処理部２４６とを備える。 (4) Second control unit 240a
As shown in FIG. 7, the second control unit 240a includes a monitoring processing unit 241a, a machine translation unit 242a, a subtitle processing unit 243a, a morphological analysis unit 244, a conversion priority processing unit 245, and an automatic conversion processing unit 246.

（４－１）モニタリング処理部２４１ａ
モニタリング処理部２４１ａは、モニタリング処理にて、モニタリング端末３０に表示されたモニタリング画面を介して、チェッカーからテキストデータに対する修正操作を受け付けるが、テキストデータの表示可否の選択操作は受け付けない。また、モニタリング処理部２４１ａは、モニタリング画面にて、テキストデータを複数のテキストに分割して表示し、分割されたテキスト単位でテキストデータに対する修正を受け付ける。このため、第１の実施形態と第２の実施形態とでは、モニタリング画面におけるテキストデータの修正方法が異なる。 (4-1) Monitoring processing unit 241a
During the monitoring process, the monitoring processing unit 241a accepts correction operations on the text data from the checker via the monitoring screen displayed on the monitoring terminal 30, but does not accept an operation selecting whether or not to display the text data. The monitoring processing unit 241a also divides the text data into a plurality of texts for display on the monitoring screen, and accepts corrections to the text data in units of the divided texts. For this reason, the method of correcting the text data on the monitoring screen differs between the first embodiment and the second embodiment.

モニタリング処理部２４１ａは、テキストデータを表示するためのＵＩをモニタリング画面に表示する。当該ＵＩは、例えばテキストフィールドである。モニタリング処理部２４１ａは、モニタリング画面にて、１つのテキストデータについて複数のテキストフィールドを表示する。モニタリング処理部２４１ａは、後述する形態素解析部２４４による形態素解析の結果に基づき、テキストデータを形態素単位のテキストに分割する。形態素単位は、例えば、品詞単位である。この場合、モニタリング処理部２４１ａは、形態素解析部２４４がテキストデータを品詞単位で分割した結果を示す情報（以下、「品詞情報」とも称される）に基づき、テキストデータを品詞単位に分割する。品詞情報には、例えば、分割後のテキストと、各テキストの読みを示す情報が含まれる。
分割後、モニタリング処理部２４１ａは、分割した複数のテキストをそれぞれのテキストフィールドに表示する。モニタリング処理部２４１ａは、１つのテキストデータについて、複数のテキストフィールドごとに修正操作を受け付ける。これにより、チェッカーは、形態素単位でテキストデータを修正することができる。 The monitoring processing unit 241a displays a UI for displaying text data on the monitoring screen. The UI is, for example, a text field. The monitoring processing unit 241a displays a plurality of text fields for one piece of text data on the monitoring screen. The monitoring processing unit 241a divides the text data into texts in morpheme units based on the result of morpheme analysis by the morpheme analysis unit 244 described later. The morpheme units are, for example, parts of speech units. In this case, the monitoring processing unit 241a divides the text data into parts of speech units based on information indicating the result of the morpheme analysis unit 244 dividing the text data into parts of speech units (hereinafter, also referred to as "part of speech information"). The part of speech information includes, for example, the text after division and information indicating the reading of each text.
After the division, the monitoring processing unit 241a displays each of the divided texts in its own text field. For one piece of text data, the monitoring processing unit 241a accepts a correction operation for each of the text fields. This allows the checker to correct the text data on a morpheme-by-morpheme basis.
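The division into text fields described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the embodiment: the class `Morpheme` and the functions `analyze` and `to_text_fields` are hypothetical names, and `analyze` simply returns a hard-coded result for the FIG. 8 example in place of a real morphological analyzer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Morpheme:
    surface: str  # divided text shown in one text field
    reading: str  # reading of the divided text


def analyze(text: str) -> List[Morpheme]:
    """Hypothetical stand-in for the morphological analysis unit 244."""
    # Hard-coded part-of-speech information for the FIG. 8 example only.
    canned = {
        "会社では車掌をつけます": [
            ("会社", "かいしゃ"), ("で", "で"), ("は", "は"),
            ("車掌", "しゃしょう"), ("を", "を"),
            ("つけ", "つけ"), ("ます", "ます"),
        ],
    }
    return [Morpheme(s, r) for s, r in canned[text]]


def to_text_fields(text: str) -> List[str]:
    """Return one text-field value per morpheme, in display order."""
    return [m.surface for m in analyze(text)]
```

Keeping the reading next to each divided text mirrors the part-of-speech information described above, which is what later allows conversion candidates to be looked up by reading.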

モニタリング処理部２４１ａは、モニタリング画面にて、テキストデータの修正対象となるテキストが選択されると、選択されたテキストの近傍に変換候補を表示する。モニタリング処理部２４１ａは、変換候補ＡＰＩ２４を用いて、変換候補を取得する。モニタリング処理部２４１ａは、形態素解析部２４４によって取得される品詞情報を変換候補ＡＰＩ２４へ入力し、変換候補ＡＰＩ２４から出力される変換候補を示す情報（以下、「変換候補情報」とも称される）を取得する。変換候補ＡＰＩ２４は、品詞情報に含まれるテキストの読みを参照し、同音異義語を変換候補として取得する。これにより、モニタリング処理部２４１ａは、取得した変換候補情報に基づき、モニタリング画面に変換候補を表示することができる。
モニタリング画面に表示した変換候補が選択された場合、モニタリング処理部２４１ａは、修正対象となるテキストを変換候補から選択されたテキストで置き換える。このように、チェッカーは、変換候補を選択することで、形態素単位でテキストデータを修正することができる。 When text to be corrected in the text data is selected on the monitoring screen, the monitoring processing unit 241a displays conversion candidates near the selected text. The monitoring processing unit 241a acquires conversion candidates using the conversion candidate API 24. The monitoring processing unit 241a inputs part-of-speech information acquired by the morpheme analysis unit 244 to the conversion candidate API 24, and acquires information indicating the conversion candidates (hereinafter also referred to as "conversion candidate information") output from the conversion candidate API 24. The conversion candidate API 24 refers to the reading of the text included in the part-of-speech information, and acquires homonyms as conversion candidates. This allows the monitoring processing unit 241a to display conversion candidates on the monitoring screen based on the acquired conversion candidate information.
When a conversion candidate displayed on the monitoring screen is selected, the monitoring processing unit 241a replaces the text to be corrected with the selected conversion candidate. In this way, the checker can correct the text data on a morpheme-by-morpheme basis by selecting a conversion candidate.
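The lookup of homonyms by reading can be illustrated with a short sketch. The table `HOMOPHONES` and the function `conversion_candidates` are hypothetical and merely stand in for the conversion candidate API 24, whose actual interface is not described here. The entries use the words that appear in the examples of FIGS. 8 to 10.

```python
from typing import Dict, List

# Hypothetical reading -> homonym table standing in for the conversion candidate API 24.
HOMOPHONES: Dict[str, List[str]] = {
    "しゃしょう": ["車掌", "社章"],
    "こうえん": ["公園", "講演", "公演", "口演"],
}


def conversion_candidates(surface: str, reading: str) -> List[str]:
    """Return the homonyms for `reading`, excluding the text currently displayed."""
    return [word for word in HOMOPHONES.get(reading, []) if word != surface]
```

For the selected text "車掌" with the reading "しゃしょう", this sketch would offer "社章" as the only candidate.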

なお、モニタリング処理部２４１ａは、変換候補の表示と共に、テキストの入力フィールドも表示し、当該入力フィールドへのテキストの入力による修正も受け付ける。これにより、チェッカーは、例えば変換候補の中に適切な変換候補がない場合に、入力フィールドに適切なテキストを入力することで、テキストデータを適切に修正することができる。 The monitoring processing unit 241a also displays a text input field along with displaying the conversion candidates, and accepts corrections by inputting text into the input field. This allows the checker to appropriately correct the text data by inputting appropriate text into the input field, for example, when there is no appropriate conversion candidate among the conversion candidates.

モニタリング処理部２４１ａは、モニタリング画面にテキストデータを表示する際に、当該テキストデータを過去の修正実績に基づき変換してから表示してもよい。例えば、表示対象となるテキストデータの中に、過去に別のテキストデータにて修正されたテキストに対応するテキストが含まれているとする。対応するテキストは、例えば、同音異義語である。この場合、モニタリング処理部２４１ａは、表示対象のテキストデータで該当するテキストを過去に修正されたテキストに変換してから表示する。即ち、モニタリング処理部２４１ａは、第１のテキストデータに対する修正の実施以降に行われる音声認識によって得られる第２のテキストデータを表示する際に、第１のテキストデータにて修正された第１のテキストと対応する第２のテキストが第２のテキストデータに含まれる場合、第２のテキストを修正後の第１のテキストに変換して表示する。
これにより、過去の修正実績があるテキストは、モニタリング処理部２４１ａによって自動変換されるため、チェッカーによるモニタリングにおける負荷を軽減することができる。 When displaying text data on the monitoring screen, the monitoring processing unit 241a may convert the text data based on past correction records before displaying the text data. For example, it is assumed that the text data to be displayed contains text corresponding to text previously corrected in other text data. The corresponding text is, for example, a homonym. In this case, the monitoring processing unit 241a converts the corresponding text in the text data to be displayed into text previously corrected before displaying the text data. That is, when displaying second text data obtained by speech recognition performed after the first text data is corrected, if the second text data contains second text corresponding to the first text corrected in the first text data, the monitoring processing unit 241a converts the second text into the corrected first text before displaying the second text data.
As a result, text that has been corrected in the past is automatically converted by the monitoring processing unit 241a, which reduces the load of monitoring by the checker.

モニタリング処理部２４１ａは、モニタリング画面にて、音声データから変換されたテキストデータだけでなく、当該テキストデータが異なる言語に翻訳されたテキストデータも表示してよい。音声データから変換されたテキストデータがモニタリング画面にて修正された場合、モニタリング処理部２４１ａは、機械翻訳エンジン２２を用いて修正が反映された翻訳結果を示すテキストデータを取得し、取得したテキストデータで翻訳の表示を更新する。
なお、モニタリング処理部２４１ａがモニタリング画面に表示する翻訳は、複数の第２の言語のうち、１つの言語の翻訳であってもよいし、複数の言語の翻訳であってもよい。 The monitoring processing unit 241a may display not only the text data converted from the voice data, but also the text data obtained by translating the text data into a different language on the monitoring screen. When the text data converted from the voice data is corrected on the monitoring screen, the monitoring processing unit 241a obtains text data indicating the translation result reflecting the correction using the machine translation engine 22, and updates the display of the translation with the obtained text data.
The translation that the monitoring processing unit 241a displays on the monitoring screen may be a translation into one of the plurality of second languages, or translations into two or more of them.

ここで、図８を参照して、第２の実施形態に係るモニタリング画面について説明する。図８は、第２の実施形態に係るモニタリング画面の一例を示す図である。 Here, the monitoring screen according to the second embodiment will be described with reference to FIG. 8. FIG. 8 is a diagram showing an example of the monitoring screen according to the second embodiment.

図８に示すモニタリング画面Ｇ２には、一例として、２つの音声データについて、音声認識処理が実行されたことで取得されたテキストデータと、それぞれのテキストデータに対する機械翻訳によって取得された翻訳結果とがそれぞれ表示されている。 The monitoring screen G2 shown in FIG. 8 displays, as an example, text data obtained by executing a voice recognition process for two pieces of voice data, and the translation results obtained by machine translation of each piece of text data.

１つ目の音声データは、例えば「かいしゃではしゃしょうをつけます」という発話内容を示す音声である。当該音声データは、音声変換部２３２ａの音声認識処理によって、「会社では車掌をつけます」というテキストデータに変換される。モニタリング処理部２４１ａは、当該テキストデータを、形態素解析の結果に基づき品詞単位に分割してからテキストフィールドＦ１１に表示する。図８に示す例では、テキストデータが「会社」、「で」、「は」、「車掌」、「を」、「つけ」、「ます」の７つの品詞（単語）に分割されて、テキストフィールドＦ１１に表示されている。
さらに、モニタリング処理部２４１ａは、当該テキストデータが機械翻訳部２４２ａによって翻訳された結果を示すテキストデータをテキストフィールドＦ１２に表示する。図８に示す例では、テキストデータが「Ｔｈｅｒｅｉｓａｃｏｎｄｕｃｔｏｒｉｎｔｈｅｃｏｍｐａｎｙ．」に翻訳されて、テキストフィールドＦ１２に表示されている。
なお、テキストフィールドＦ１１に表示されているテキストデータは、ある任意の時刻Ｔ１～Ｔ２の間における講演者の発話内容を示す音声データが音声認識されたテキストデータであり、第１のテキストデータである。時刻Ｔ２は時刻Ｔ１よりも後の時刻であるとする。
テキストフィールドＦ１２に表示されているテキストデータは、多言語翻訳の対象となる複数の言語（複数の第２の言語）のうち、予め設定された任意の言語への翻訳結果を示すテキストデータである。なお、第２の実施形態では、一例として、日本語から英語への翻訳結果が表示されるよう設定されているものとする。 The first voice data is, for example, speech with the content "kaisha de wa shashō o tsukemasu" (かいしゃではしゃしょうをつけます). The voice recognition process of the voice conversion unit 232a converts this voice data into the text data "会社では車掌をつけます" ("The company provides a conductor"). The monitoring processing unit 241a divides the text data into part-of-speech units based on the result of the morphological analysis and then displays it in the text field F11. In the example shown in FIG. 8, the text data is divided into seven parts of speech (words), "会社" (company), "で", "は", "車掌" (conductor), "を", "つけ", and "ます", and displayed in the text field F11.
Furthermore, the monitoring processing unit 241a displays text data indicating the result of the text data being translated by the machine translation unit 242a in the text field F12. In the example shown in Fig. 8, the text data is translated into "There is a conductor in the company." and displayed in the text field F12.
The text data displayed in the text field F11 is text data obtained by performing voice recognition on speech data indicating the contents of a speech by a lecturer between certain arbitrary times T1 and T2, and is the first text data. Time T2 is assumed to be a time later than time T1.
The text data displayed in the text field F12 is text data showing the translation result into a preset arbitrary language among a plurality of languages (a plurality of second languages) that are the subject of multilingual translation. In the second embodiment, as an example, it is assumed that the translation result from Japanese to English is set to be displayed.

２つ目の音声データは、例えば「しゃしょうはおもにだいじなしきでつけます」という発話内容を示す音声である。当該音声データは、音声変換部２３２ａの音声認識処理によって、「社章は主に大事な式でつけます」というテキストデータに変換される。モニタリング処理部２４１ａは、当該テキストデータを、形態素解析の結果に基づき品詞単位に分割してからテキストフィールドＦ１３に表示する。図８に示す例では、テキストデータが「社章」、「は」、「主に」、「大事」、「な」、「式」、「で」、「つけ」、「ます」の９つの品詞（単語）に分割されて、テキストフィールドＦ１３に表示されている。
さらに、モニタリング処理部２４１ａは、当該テキストデータが機械翻訳部２４２ａによって翻訳された結果を示すテキストデータをテキストフィールドＦ１４に表示する。図８に示す例では、テキストデータが「Ｗｅｐｕｔｔｈｅｃｏｍｐａｎｙｅｍｂｌｅｍｍａｉｎｌｙａｔｉｍｐｏｒｔａｎｔｃｅｒｅｍｏｎｉｅｓ．」に翻訳されて、テキストフィールドＦ１４に表示されている。
なお、テキストフィールドＦ１３に表示されているテキストデータは、ある任意の時刻Ｔ３～Ｔ４の間における講演者の発話内容を示す音声データが音声認識されたテキストデータであり、第２のテキストデータである。時刻Ｔ３は時刻Ｔ２よりも後の時刻であり、時刻Ｔ４は時刻Ｔ３よりも後の時刻であるとする。 The second voice data is, for example, speech with the content "shashō wa omo ni daiji na shiki de tsukemasu" (しゃしょうはおもにだいじなしきでつけます). The voice recognition process of the voice conversion unit 232a converts this voice data into the text data "社章は主に大事な式でつけます" ("The company emblem is worn mainly at important ceremonies"). The monitoring processing unit 241a divides the text data into part-of-speech units based on the result of the morphological analysis and then displays it in the text field F13. In the example shown in FIG. 8, the text data is divided into nine parts of speech (words), "社章" (company emblem), "は", "主に" (mainly), "大事" (important), "な", "式" (ceremony), "で", "つけ", and "ます", and displayed in the text field F13.
Furthermore, the monitoring processing unit 241a displays text data indicating the result of the translation of the text data by the machine translation unit 242a in the text field F14. In the example shown in Fig. 8, the text data is translated into "We put the company emblem mainly at important ceremonies." and displayed in the text field F14.
The text data displayed in the text field F13 is text data obtained by performing voice recognition on the speech data indicating the contents of the speech of the speaker between any given times T3 and T4, and is the second text data. Time T3 is assumed to be later than time T2, and time T4 is assumed to be later than time T3.

ここで、図９を参照して、第２の実施形態に係る修正操作手順について説明する。図９は、第２の実施形態に係る修正操作手順の一例を示す図である。図９には、図８のテキストフィールドＦ１１のテキストデータを修正する例が示されている。 Now, referring to FIG. 9, a correction operation procedure according to the second embodiment will be described. FIG. 9 is a diagram showing an example of a correction operation procedure according to the second embodiment. FIG. 9 shows an example of correcting the text data in the text field F11 in FIG. 8.

図９に示すように、まず、チェッカーは、テキストフィールドＦ１１に表示されたテキストデータを確認し、修正が必要なテキストを選択（タッチ）する（ステップＳ１１）。図９に示す例では、修正するテキストとして「車掌」が選択されたものとする。 As shown in FIG. 9, first, the checker checks the text data displayed in the text field F11 and selects (touches) the text that needs to be corrected (step S11). In the example shown in FIG. 9, it is assumed that "conductor" is selected as the text to be corrected.

チェッカーによるテキストの選択後、モニタリング処理部２４１ａは、選択されたテキストの近傍に変換候補ウィンドウＷ１１、テキストフィールドＦ１５、追加ボタンＢ５、及び削除ボタンＢ６を表示する（ステップＳ１２）。チェッカーは、変換候補ウィンドウＷ１１に表示された変換候補の中から、修正に適切なテキストを選択する。変換候補の中に適切なテキストがない場合、チェッカーは、テキストフィールドＦ１５（入力フィールド）に適切なテキストを入力し、追加ボタンＢ５を押下する。テキストフィールドＦ１１で選択したテキストを削除したい場合、チェッカーは、削除ボタンＢ６を押下する。
なお、チェッカーによってテキストフィールドＦ１１から選択されたテキストが自動変換されたテキストである場合、テキストフィールドＦ１５には、自動変換前のテキストが表示される。この場合、チェッカーは、追加ボタンＢ５を押下することで、テキストフィールドＦ１１で選択したテキストを自動変換前のテキストに戻すことができる。 After the checker selects the text, the monitoring processor 241a displays a conversion candidate window W11, a text field F15, an add button B5, and a delete button B6 near the selected text (step S12). The checker selects appropriate text for correction from among the conversion candidates displayed in the conversion candidate window W11. If there is no appropriate text among the conversion candidates, the checker inputs appropriate text into the text field F15 (input field) and presses the add button B5. If the checker wants to delete the text selected in the text field F11, the checker presses the delete button B6.
If the text selected by the checker from the text field F11 is automatically converted text, the text before the automatic conversion is displayed in the text field F15. In this case, the checker can return the text selected in the text field F11 to the text before the automatic conversion by pressing the Add button B5.
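The correction operations of step S12 can be summarized in a small sketch. The class `Field` and the functions below are hypothetical names chosen for illustration, and the sketch models only the state changes, not the actual screen. It assumes each text field remembers the text before automatic conversion so that the input field F15 can be pre-filled with it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Field:
    text: str                          # text currently displayed
    before_auto: Optional[str] = None  # pre-conversion text, if auto-converted


def input_field_default(field: Field) -> str:
    """Initial content of the input field F15 for the selected text."""
    return field.before_auto or ""


def select_candidate(fields: List[Field], i: int, candidate: str) -> None:
    """Replace the selected text with a candidate chosen in window W11."""
    fields[i].text = candidate
    fields[i].before_auto = None


def press_add(fields: List[Field], i: int, entered: str) -> None:
    """Add button B5: replace the selected text with the input field content."""
    fields[i].text = entered
    fields[i].before_auto = None


def press_delete(fields: List[Field], i: int) -> None:
    """Delete button B6: remove the selected text."""
    del fields[i]
```

Under these assumptions, reverting an automatically converted text amounts to calling `press_add` with the value returned by `input_field_default`.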

チェッカーによる修正操作後、モニタリング処理部２４１は、テキストフィールドＦ１１で選択されたテキストを、修正後のテキストで表示する（ステップＳ１３）。さらに、モニタリング処理部２４１ａは、テキストフィールドＦ１１に示す修正後のテキストデータが翻訳されたテキストデータを、テキストフィールドＦ１２に表示する。
図９に示す例では、「車掌」が「社章」へ修正されている。これにより、図８に示す例では、２つ目の音声データの音声認識において「車掌」と認識されたテキストが、「社章」へ自動変換されてからテキストフィールドＦ１３に表示されている。 After the checker performs the correction operation, the monitoring processing unit 241a displays the text selected in the text field F11 as the corrected text (step S13). Furthermore, the monitoring processing unit 241a displays, in the text field F12, text data obtained by translating the corrected text data shown in the text field F11.
In the example shown in FIG. 9, "車掌" (shashō, conductor) is corrected to its homonym "社章" (shashō, company emblem). As a result, in the example shown in FIG. 8, the text recognized as "車掌" in the voice recognition of the second voice data is automatically converted to "社章" before being displayed in the text field F13.

（４－２）機械翻訳部２４２ａ
機械翻訳部２４２ａは、音声変換部２３２ａによる音声認識処理にて第１の言語の音声データから変換されたテキストデータを、第１の言語と異なる第２の言語に翻訳し、翻訳結果を示すテキストデータを取得する。これにより、機械翻訳部２４２ａは、モニタリング画面に音声データから変換されたテキストデータと共に表示する、翻訳結果を示すテキストデータを取得することができる。
また、機械翻訳部２４２ａは、モニタリング画面に表示されているテキストデータが修正されると、修正後のテキストデータを翻訳し、修正が反映された翻訳結果を示すテキストデータを取得する。これにより、機械翻訳部２４２ａは、チェッカーがモニタリング画面にて修正後の翻訳を確認することが可能、かつそのまま字幕として表示装置４０に表示することが可能な、修正が反映されたテキストデータを取得することができる。 (4-2) Machine translation unit 242a
The machine translation unit 242a translates the text data converted from the voice data of the first language by the voice recognition process by the voice conversion unit 232a into a second language different from the first language, and obtains text data indicating the translation result. In this way, the machine translation unit 242a can obtain text data indicating the translation result to be displayed on the monitoring screen together with the text data converted from the voice data.
Furthermore, when text data displayed on the monitoring screen is corrected, the machine translation unit 242a translates the corrected text data and obtains text data indicating the translation result reflecting the correction. In this way, the machine translation unit 242a can obtain text data reflecting the correction that the checker can check on the monitoring screen and that can be displayed as is on the display device 40 as subtitles.

（４－３）字幕処理部２４３ａ
第２の実施形態に係る字幕処理部２４３ａは、第１の実施形態に係る字幕処理部２４３と同様であるため、その説明を省略する。 (4-3) Subtitle processing unit 243a
The subtitle processing unit 243a according to the second embodiment is similar to the subtitle processing unit 243 according to the first embodiment, and therefore a description thereof will be omitted.

（４－４）形態素解析部２４４
形態素解析部２４４は、形態素解析を行う機能を有する。形態素解析部２４４は、音声データから変換されたテキストデータに対して形態素解析を行う。形態素解析部２４４は、形態素解析により、テキストデータを例えば品詞単位で複数のテキストに分割する。 (4-4) Morphological analysis unit 244
The morphological analysis unit 244 has a function of performing morphological analysis. The morphological analysis unit 244 performs morphological analysis on the text data converted from the voice data. Through morphological analysis, the morphological analysis unit 244 divides the text data into a plurality of texts, for example in units of parts of speech.

（４－５）変換優先度処理部２４５
変換優先度処理部２４５は、優先度情報に関する処理を行う機能を有する。優先度情報は、テキストの自動変換において変換に用いるテキストの優先度を示す情報である。優先度は、過去の修正履歴に基づき決定される。変換優先度処理部２４５は、モニタリング画面に表示されたテキストデータが修正された場合、修正内容に応じて優先度（重み）を変更する。 (4-5) Conversion Priority Processing Unit 245
The conversion priority processing unit 245 has a function of performing processing related to priority information. Priority information is information indicating the priority of text used for conversion in automatic text conversion. The priority is determined based on the history of past corrections. When text data displayed on the monitoring screen is corrected, the conversion priority processing unit 245 changes the priority (weight) according to the content of the correction.

ここで、図１０を参照して、第２の実施形態に係る変換優先度の変更について説明する。図１０は、第２の実施形態に係る変換優先度の変更の一例を示す図である。図１０に示す左側の表は、変更前の優先度の一例を示し、右側の表は、変更後の優先度の一例を示している。 Here, referring to FIG. 10, a change in conversion priority according to the second embodiment will be described. FIG. 10 is a diagram showing an example of a change in conversion priority according to the second embodiment. The table on the left side of FIG. 10 shows an example of a priority before the change, and the table on the right side shows an example of a priority after the change.

図１０に示す各表には、変換前のテキストと、変換後のテキストと、優先度（重み）とが優先度情報として示されている。当該優先度情報は、音声認識されたテキストデータに、変換前のテキストが含まれている場合、優先度が高い変換後のテキストに変換することを示している。
左側の表には、「公園」を「講演」に変換する優先度が「１．０５」、「公園」を「公演」に変換する優先度が「１．００」、「公園」を「口演」に変換する優先度が「０．９５」、「講演」を「公演」に変換する優先度が「１．００」であることが示されている。「公園」については複数の優先度情報が示されている。音声認識されたテキストデータに「公園」が含まれている場合、優先度が最も高い「講演」に自動変換される。
音声認識されたテキストデータがチェッカーによって修正されると、変換優先度処理部２４５は、優先度情報を変更する。例えば、音声認識されたテキストデータに「講演」が含まれており、チェッカーによって「講演」が「公演」に修正されたとする。この場合、変換優先度処理部２４５は、図１０の左側に示す表を右側に示す表のように変更する。「講演」が「公演」に修正されたため、変換優先度処理部２４５は、「公演」に変換する優先度を上げ、「公演」以外に変換する優先度を下げている。 Each table shown in FIG. 10 lists pre-conversion text, post-conversion text, and a priority (weight) as priority information. The priority information indicates that, when pre-conversion text is included in speech-recognized text data, it is converted to the post-conversion text with the higher priority.
The table on the left shows that the priority of converting "公園" (park) to "講演" (lecture) is 1.05, the priority of converting "公園" to "公演" (performance) is 1.00, the priority of converting "公園" to "口演" (oral performance) is 0.95, and the priority of converting "講演" to "公演" is 1.00. All four words are read "kōen". Multiple pieces of priority information are shown for "公園". If speech-recognized text data includes "公園", it is automatically converted to "講演", which has the highest priority.
When the speech-recognized text data is corrected by the checker, the conversion priority processing unit 245 changes the priority information. For example, assume that the speech-recognized text data includes "講演" (lecture) and the checker corrects "講演" to "公演" (performance). In this case, the conversion priority processing unit 245 changes the table shown on the left side of FIG. 10 to the table shown on the right side. Because "講演" has been corrected to "公演", the conversion priority processing unit 245 increases the priority of conversion to "公演" and decreases the priority of conversion to anything other than "公演".
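The priority change can be sketched as follows, using the left-hand table as described above. The adjustment amount `STEP`, the initial weight given to a previously unseen pair, and the function name `update_priorities` are assumptions made for illustration. The text does not state the values of the right-hand table of FIG. 10, so the results of this sketch should not be read as those values. The sketch also assumes that the table holds entries for a single group of homonyms.

```python
from typing import Dict, Tuple

# (pre-conversion text, post-conversion text) -> priority (weight)
Priorities = Dict[Tuple[str, str], float]

# Left-hand table of FIG. 10 as described in the text.
BEFORE: Priorities = {
    ("公園", "講演"): 1.05,
    ("公園", "公演"): 1.00,
    ("公園", "口演"): 0.95,
    ("講演", "公演"): 1.00,
}

STEP = 0.05  # assumed adjustment amount; not stated in the text


def update_priorities(table: Priorities, before: str, after: str,
                      step: float = STEP) -> Priorities:
    """Raise the weights of conversions to `after` and lower all other weights."""
    updated = dict(table)  # leave the caller's table untouched
    updated.setdefault((before, after), 1.00)  # assumed weight for a new pair
    return {
        pair: round(weight + step if pair[1] == after else weight - step, 2)
        for pair, weight in updated.items()
    }
```

With these assumed values, calling `update_priorities(BEFORE, "講演", "公演")` raises both entries that convert to "公演" and lowers the entries that convert to "講演" and "口演".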

（４－６）自動変換処理部２４６
自動変換処理部２４６は、テキストの自動変換を行う機能を有する。自動変換処理部２４６は、複数のテキストに分割されたテキストデータが表示される前に、優先度情報を参照する。複数のテキストに自動変換の対象となるテキストが含まれる場合、自動変換処理部２４６は、過去の修正に用いられたテキストの中から優先度に応じたテキストを選択し、対象となるテキストを選択したテキストで変換する。過去の修正に用いられたテキストは、チェッカーが過去にテキストデータのテキストを修正した際の修正後のテキストであり、図１０に示す優先度情報における変換後のテキストである。
自動変換処理部２４６が選択する優先度に応じたテキストは、例えば、優先度が最も高いテキストである。１つのテキストについて１つの優先度情報のみが存在する場合、自動変換処理部２４６は、当該優先度情報が示すテキストを選択し、自動変換を行う。１つのテキストについて複数の優先度情報が存在する場合、自動変換処理部２４６は、優先度が最も高い優先度情報が示すテキストを選択し、自動変換を行う。 (4-6) Automatic conversion processing unit 246
The automatic conversion processing unit 246 has a function of automatically converting text. Before the text data divided into a plurality of texts is displayed, the automatic conversion processing unit 246 refers to priority information. When the plurality of texts includes a text to be automatically converted, the automatic conversion processing unit 246 selects a text according to priority from among the texts used for past corrections, and converts the target text with the selected text. The text used for past corrections is the corrected text when the checker corrected the text of the text data in the past, and is the converted text in the priority information shown in FIG. 10.
The text according to the priority selected by the automatic conversion processing unit 246 is, for example, the text with the highest priority. When only one piece of priority information exists for one text, the automatic conversion processing unit 246 selects the text indicated by the priority information and performs automatic conversion. When multiple pieces of priority information exist for one text, the automatic conversion processing unit 246 selects the text indicated by the priority information with the highest priority and performs automatic conversion.
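The selection rule described above, in which the post-conversion text with the highest priority is chosen, can be sketched as follows. The function name `auto_convert` and the representation of the priority information as a dictionary keyed by (pre-conversion text, post-conversion text) are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# (pre-conversion text, post-conversion text) -> priority (weight)
Priorities = Dict[Tuple[str, str], float]


def auto_convert(fields: List[str], priorities: Priorities) -> List[str]:
    """Convert each divided text to its highest-priority post-conversion text."""
    converted = []
    for text in fields:
        # Collect every post-conversion text registered for this divided text.
        options = {dst: w for (src, dst), w in priorities.items() if src == text}
        # Texts without priority information are displayed unchanged.
        converted.append(max(options, key=options.get) if options else text)
    return converted
```

Each divided text is looked up once, so a converted text is not converted a second time within the same pass.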

＜２－３．処理の流れ＞
以上、第２の実施形態に係る音声処理装置２０ａの機能構成について説明した。続いて、図１１から図１２を参照して、第２の実施形態に係る処理の流れについて説明する。 <2-3. Processing flow>
The functional configuration of the voice processing device 20a according to the second embodiment has been described above. Next, the flow of processing according to the second embodiment will be described with reference to FIGS. 11 and 12.

（１）字幕表示システム１ａにおける処理の流れ
図１１を参照して、第２の実施形態に係る字幕表示システム１ａにおける処理の流れについて説明する。図１１は、第２の実施形態に係る字幕表示システム１ａにおける処理の流れの一例を示すシーケンス図である。 (1) Processing flow in the subtitle display system 1a
The processing flow in the subtitle display system 1a according to the second embodiment will be described with reference to FIG. 11. FIG. 11 is a sequence diagram showing an example of the processing flow in the subtitle display system 1a according to the second embodiment.

図１１に示すように、まず、集音装置１０は、集音した音声の音声データを音声処理装置２０ａへ送信する（ステップＳ２０１）。音声処理装置２０ａの音声データ取得部２３１ａは、通信部２１０ａが集音装置１０から受信する音声データを取得する。 As shown in FIG. 11, first, the sound collection device 10 transmits audio data of the collected audio to the audio processing device 20a (step S201). The audio data acquisition unit 231a of the audio processing device 20a acquires the audio data received by the communication unit 210a from the sound collection device 10.

次に、音声処理装置２０ａの音声変換部２３２ａは、音声変換処理を行う（ステップＳ２０２）。音声変換部２３２ａは、通信部２１０ａを介して、音声データ取得部２３１ａによって取得された音声データを音声認識エンジン２３へ送信し、音声認識を依頼する。音声認識エンジン２３は、音声処理装置２０ａから受信する音声データを音声認識し、音声認識の結果を音声処理装置２０ａへ送信する。 Next, the voice conversion unit 232a of the voice processing device 20a performs a voice conversion process (step S202). The voice conversion unit 232a transmits the voice data acquired by the voice data acquisition unit 231a to the voice recognition engine 23 via the communication unit 210a and requests voice recognition. The voice recognition engine 23 performs voice recognition on the voice data received from the voice processing device 20a and transmits the result of the voice recognition to the voice processing device 20a.

次に、音声処理装置２０ａの機械翻訳部２４２ａは、機械翻訳処理を行う（ステップＳ２０３）。機械翻訳部２４２ａは、通信部２１０ａを介して、音声変換部２３２ａによって取得されたテキストデータを機械翻訳エンジン２２へ送信し、機械翻訳を依頼する。機械翻訳エンジン２２は、音声処理装置２０ａから受信するテキストデータを機械翻訳し、機械翻訳の結果を音声処理装置２０ａへ送信する。 Next, the machine translation unit 242a of the voice processing device 20a performs a machine translation process (step S203). The machine translation unit 242a transmits the text data acquired by the voice conversion unit 232a to the machine translation engine 22 via the communication unit 210a and requests machine translation. The machine translation engine 22 performs machine translation of the text data received from the voice processing device 20a and transmits the result of the machine translation to the voice processing device 20a.

次に、音声処理装置２０ａは、表示準備処理を行う（ステップＳ２０４）。表示準備処理は、モニタリング画面の表示を行うための準備処理である。表示準備処理の詳細は、後述する。 Next, the audio processing device 20a performs a display preparation process (step S204). The display preparation process is a preparation process for displaying the monitoring screen. The details of the display preparation process will be described later.

次に、音声処理装置２０ａのモニタリング処理部２４１ａは、モニタリング画面の表示処理を行う（ステップＳ２０５）。モニタリング処理部２４１ａは、通信部２１０ａを介して画面情報をモニタリング端末３０へ送信し、モニタリング画面を表示させる。 Next, the monitoring processing unit 241a of the audio processing device 20a performs a process of displaying the monitoring screen (step S205). The monitoring processing unit 241a transmits screen information to the monitoring terminal 30 via the communication unit 210a, and displays the monitoring screen.

モニタリング端末３０は、音声処理装置２０ａから受信する画面情報に基づき、モニタリング画面を表示する（ステップＳ２０６）。
モニタリング画面の表示後、モニタリング端末３０は、モニタリング画面にてチェッカーによるテキストデータの修正を受け付け、修正内容を示す修正情報を音声処理装置２０ａへ送信する（ステップＳ２０７）。 The monitoring terminal 30 displays a monitoring screen based on the screen information received from the sound processing device 20a (step S206).
After displaying the monitoring screen, the monitoring terminal 30 accepts corrections to the text data made by the checker on the monitoring screen, and transmits correction information indicating the contents of the corrections to the voice processing device 20a (step S207).

モニタリング処理部２４１ａは、通信部２１０ａがモニタリング端末３０から修正情報を受信するか否かに応じて、テキストデータの修正があるか否かを判定する（ステップＳ２０８）。修正がある場合（ステップＳ２０８／ＹＥＳ）、処理はステップＳ２０９へ進む。一方、修正がない場合（ステップＳ２０８／ＮＯ）、処理はステップＳ２１２へ進む。 The monitoring processing unit 241a determines whether or not the text data has been corrected, depending on whether or not the communication unit 210a receives correction information from the monitoring terminal 30 (step S208). If there has been a correction (step S208/YES), the process proceeds to step S209. On the other hand, if there has been no correction (step S208/NO), the process proceeds to step S212.

処理がステップＳ２０９へ進んだ場合、音声処理装置２０ａの機械翻訳部２４２ａは、機械翻訳処理を行う（ステップＳ２０９）。機械翻訳部２４２ａは、通信部２１０ａを介して、チェッカーによって修正されたテキストデータを機械翻訳エンジン２２へ送信し、機械翻訳を依頼する。機械翻訳エンジン２２は、音声処理装置２０ａから受信するテキストデータを機械翻訳し、機械翻訳の結果を音声処理装置２０ａへ送信する。 When the process proceeds to step S209, the machine translation unit 242a of the voice processing device 20a performs a machine translation process (step S209). The machine translation unit 242a transmits the text data corrected by the checker to the machine translation engine 22 via the communication unit 210a and requests machine translation. The machine translation engine 22 performs machine translation of the text data received from the voice processing device 20a and transmits the result of the machine translation to the voice processing device 20a.

Next, the monitoring processing unit 241a performs update processing for the monitoring screen (step S210). The monitoring processing unit 241a updates the translation displayed on the monitoring screen based on the machine translation result obtained by the machine translation unit 242a.

Next, the conversion priority processing unit 245 of the audio processing device 20a updates the priority (step S211). Based on the correction information that the communication unit 210a receives from the monitoring terminal 30, the conversion priority processing unit 245 updates the priority in the priority information corresponding to that correction information.

When the process proceeds to step S212, the subtitle processing unit 243a of the audio processing device 20a executes subtitle display processing (step S212). The subtitle processing unit 243a transmits the text data in the plurality of second languages to the display device 40 via the communication unit 210a.
The display device 40 displays the second-language text data received from the audio processing device 20a as subtitles (step S213).
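The branch at step S208 and the steps that follow it can be sketched in Python as below. This is only an illustration under stated assumptions, not the implementation described in the embodiment: the names `Correction`, `Session`, `handle_segment`, and the `translate` callback (a stand-in for the machine translation engine 22) are hypothetical, the checker is assumed to correct the recognized first-language text, and the rule of adding one to the priority per correction is an assumption, since the text only states that the priority is updated.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Correction:
    """Hypothetical shape of the correction information sent in step S207."""
    original_text: str   # text before the checker's edit
    corrected_text: str  # text after the checker's edit


@dataclass
class Session:
    source_text: str              # recognized text in the first language
    translations: dict[str, str]  # current text per second language
    # Stand-in for the priority information held in the storage unit.
    priorities: dict[tuple[str, str], int] = field(default_factory=dict)


def handle_segment(
    session: Session,
    correction: Optional[Correction],
    translate: Callable[[str, str], str],
) -> dict[str, str]:
    """Return the second-language texts sent to the display device (S212)."""
    if correction is not None:  # step S208 / YES
        session.source_text = correction.corrected_text
        # Step S209: re-translate the corrected text into every second language.
        for lang in session.translations:
            session.translations[lang] = translate(session.source_text, lang)
        # Step S210 (monitoring screen update) is omitted from this sketch.
        # Step S211: assumed update rule, one increment per correction.
        key = (correction.original_text, correction.corrected_text)
        session.priorities[key] = session.priorities.get(key, 0) + 1
    # Step S208 / NO skips directly to the subtitle output.
    return dict(session.translations)
```

Passing `None` as the correction models the S208/NO path, in which the existing translations are forwarded unchanged.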

(2) Flow of display preparation processing
The flow of the display preparation processing according to the second embodiment will be described with reference to FIG. 12. FIG. 12 is a sequence diagram showing an example of the flow of the display preparation processing according to the second embodiment.

As shown in FIG. 12, first, the morphological analysis unit 244 of the audio processing device 20a performs morphological analysis on the text data acquired by the speech conversion unit 232a (step S301).

Next, the automatic conversion processing unit 246 of the audio processing device 20a acquires the priority information stored in the storage unit 220a (step S302).

The automatic conversion processing unit 246 refers to the priority information and checks whether the morphologically analyzed text data contains a part of speech (text) with a high automatic conversion priority (step S303). If there is such a part of speech (step S303/YES), the process proceeds to step S304. If there is not (step S303/NO), the process proceeds to step S305.

When the process proceeds to step S304, the automatic conversion processing unit 246 performs automatic conversion (step S304). After the automatic conversion, the process proceeds to step S305.

When the process proceeds to step S305, the monitoring processing unit 241a of the audio processing device 20a transmits the part-of-speech information acquired by the morphological analysis unit 244 to the conversion candidate API 24 via the communication unit 210a (step S305).

The conversion candidate API 24 acquires conversion candidate information based on the part-of-speech information received from the audio processing device 20a and transmits it to the audio processing device 20a (step S306).

Note that the processing from step S302 to step S306 is performed per part of speech (per text unit), based on the result of the morphological analysis in step S301. The processing from step S302 to step S306 is therefore repeated until conversion candidate information has been acquired for every part of speech. Once conversion candidate information has been acquired for every part of speech, the display preparation processing ends.
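The per-unit loop over steps S302 to S306 might look like the following Python sketch. It is illustrative only: `prepare_display`, the `HIGH_PRIORITY` threshold, the layout of `priority_info` (text mapped to a replacement and a priority), and the `fetch_candidates` callback standing in for the conversion candidate API 24 are all assumptions, because the text does not specify how "high priority" is decided or what the API exchanges.

```python
from typing import Callable

HIGH_PRIORITY = 3  # assumed threshold; the text only says "high priority"


def prepare_display(
    tokens: list[str],
    priority_info: dict[str, tuple[str, int]],
    fetch_candidates: Callable[[str], list[str]],
) -> list[tuple[str, list[str]]]:
    """Run steps S302 to S306 once for each unit produced in step S301."""
    prepared = []
    for token in tokens:
        # S302: look up the stored priority information for this unit.
        replacement, priority = priority_info.get(token, (token, 0))
        if priority >= HIGH_PRIORITY:  # S303 / YES
            token = replacement        # S304: automatic conversion
        # S305 and S306: request and receive conversion candidates.
        candidates = fetch_candidates(token)
        prepared.append((token, candidates))
    # The loop ends once every unit has its conversion candidates.
    return prepared
```

A unit with no stored priority information is treated here as priority 0, so it passes through unchanged and only receives conversion candidates.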

The flow of processing according to the second embodiment has been described above.
As described above, the audio processing device 20a according to the second embodiment includes a speech conversion unit 232a that converts speech data indicating the content of a user's speech into text data by speech recognition, and a monitoring processing unit 241a that divides the text data into a plurality of texts, displays them, and accepts corrections to the text data in units of the divided texts. When displaying second text data obtained by speech recognition performed after a correction has been made to first text data, if the second text data contains a second text corresponding to a first text that was corrected in the first text data, the monitoring processing unit 241a converts the second text into the corrected first text and displays it.

With this configuration, once the checker corrects a misrecognition that occurred in speech recognition of certain speech data, the same misrecognition is automatically converted into the correct text even if it occurs again in speech recognition of other speech data after that correction. This eliminates the need for the checker to make the same correction every time the same misrecognition occurs.
Therefore, the audio processing device 20a according to the second embodiment makes it possible to reduce the workload of correcting misrecognitions in speech recognition.
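The effect described above, in which a correction made once is reapplied to later recognition results, can be illustrated with a small Python sketch. The class `CorrectionMemory` and its methods are hypothetical names, and the sketch assumes that the text before and after correction is divided into the same number of units; the embodiment additionally gates automatic conversion on priority, which is left out here.

```python
class CorrectionMemory:
    """Remembers checker corrections and reapplies them to later text."""

    def __init__(self) -> None:
        self._fixes: dict[str, str] = {}

    def record(self, before: list[str], after: list[str]) -> None:
        """Store every unit the checker changed (equal-length lists assumed)."""
        for old, new in zip(before, after):
            if old != new:
                self._fixes[old] = new

    def apply(self, units: list[str]) -> list[str]:
        """Replace any unit that matches a previously corrected one."""
        return [self._fixes.get(unit, unit) for unit in units]


memory = CorrectionMemory()
# First text data: the checker corrects the misrecognized unit "Topan".
memory.record(["welcome", "to", "Topan"], ["welcome", "to", "Toppan"])
# Second text data: the same misrecognition is converted automatically.
second = memory.apply(["visit", "Topan", "today"])
```

Units that the checker never touched are returned as they are, so only previously corrected misrecognitions are rewritten.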

Furthermore, if the simultaneous interpretation result for the speaker's speech contains a misrecognition or mistranslation, the checker can correct it before the simultaneous interpretation result (subtitles) is presented to the audience. As a result, the audience is never shown a simultaneous interpretation result that contains the misrecognition or mistranslation, and is shown only the corrected result.
Therefore, the audio processing device 20a according to the second embodiment enables the user to correctly understand the content of the simultaneous interpretation.

<<3. Modified Examples>>
The embodiments have been described above. Next, modified examples of the embodiments will be described. The modified examples described below may be applied to the embodiments individually or in combination. Each modified example may be applied in place of a configuration described in the embodiments, or in addition to it.

The first and second embodiments described above may be implemented in combination. For example, the monitoring screen of the first embodiment may display translations in the same manner as the monitoring screen of the second embodiment.
Likewise, the monitoring screen of the second embodiment may display a UI through which the checker can select whether or not the text data may be displayed, in the same manner as the monitoring screen of the first embodiment.

In the first embodiment described above, an example was described in which the first language is a language other than Japanese and the second languages are a plurality of languages including Japanese, but the present invention is not limited to this example. For example, the first language may be Japanese and the second languages may be languages other than Japanese.
In the second embodiment described above, an example was described in which the first language is Japanese and the second languages are a plurality of languages other than Japanese, but the present invention is not limited to this example. For example, the first language may be a language other than Japanese and the second languages may be a plurality of languages including Japanese.
In addition, in each of the embodiments described above, the display device 40 may be capable of displaying subtitles in the first language.

The modified examples of the embodiments have been described above.
Some or all of the functions of the subtitle display systems 1 and 1a and the audio processing devices 20 and 20a in the embodiments described above may be realized by a computer. In that case, a program for realizing these functions may be recorded on a computer-readable recording medium, and the functions may be realized by having a computer system read and execute the program recorded on the recording medium. The term "computer system" here includes an OS and hardware such as peripheral devices.
The term "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into a computer system. The term may further include media that hold a program dynamically for a short time, such as a communication line used when a program is transmitted over a network such as the Internet or over a communication line such as a telephone line, and media that hold a program for a certain period of time, such as volatile memory inside a computer system that acts as a server or client in that case.
The program may realize only some of the functions described above, may realize those functions in combination with a program already recorded in the computer system, or may be realized using a programmable logic device such as an FPGA (Field Programmable Gate Array).

Embodiments of the present invention have been described above in detail with reference to the drawings, but the specific configuration is not limited to those described, and various design changes and the like can be made without departing from the gist of the present invention.

1, 1a...subtitle display system, 10...sound collection device, 20, 20a...audio processing device, 21...simultaneous interpretation engine, 22...machine translation engine, 23...speech recognition engine, 24...conversion candidate API, 30...monitoring terminal, 40...display device, 41...screen, 42...smartphone, 210, 210a...communication unit, 220, 220a...storage unit, 230, 230a...first control unit, 231, 231a...speech data acquisition unit, 232, 232a...speech conversion unit, 240, 240a...second control unit, 241, 241a...monitoring processing unit, 242, 242a...machine translation unit, 243, 243a...subtitle processing unit, 244...morphological analysis unit, 245...conversion priority processing unit, 246...automatic conversion processing unit

Claims

An audio processing device comprising:
a speech conversion unit that converts speech data indicating the content of a user's speech in a first language into text data indicating results of translation, by simultaneous interpretation, into a plurality of second languages different from the first language;
a monitoring processing unit that accepts a correction to the text data for one designated second language among the plurality of second languages; and
a translation unit that translates the corrected text data into each second language that was not designated at the time of correction, and obtains, for each of the plurality of second languages, text data indicating a translation result in which the correction is reflected.

The audio processing device according to claim 1, further comprising:
a subtitle processing unit that displays, as subtitles on a display device, the text data indicating the translation result in which the correction is reflected.

The audio processing device according to claim 1, wherein the monitoring processing unit displays, on a monitoring terminal, a monitoring screen capable of accepting a correction operation on the text data converted from the speech data and a selection operation as to whether or not the text data may be displayed.

The audio processing device according to claim 3, wherein subtitles of the text data for which "cannot be displayed" is selected on the monitoring screen are not displayed on a display device, and subtitles of the text data for which "can be displayed" is selected on the monitoring screen are displayed on the display device.

The audio processing device according to claim 3, wherein the monitoring processing unit displays the text data in a text field on the monitoring screen, switches the display permission to "cannot be displayed" when the inside of the text field is selected, and switches the display permission to "can be displayed" when the outside of the text field is selected after the inside of the text field has been selected.

The audio processing device according to claim 3, wherein, when a portion of the text data to be corrected is selected on the monitoring screen, the monitoring processing unit displays conversion candidates near that portion and inserts text selected from the conversion candidates into that portion.

The audio processing device according to claim 6, wherein the monitoring processing unit adds, to the conversion candidates, text that is not included in the conversion candidates and is detected as a difference by comparing the text data before correction with the text data after correction.

A computer-implemented audio processing method comprising:
a speech conversion process of converting speech data indicating the content of a user's speech in a first language into text data indicating results of translation, by simultaneous interpretation, into a plurality of second languages different from the first language;
a monitoring process of accepting a correction to the text data for one designated second language among the plurality of second languages; and
a translation process of translating the corrected text data into each second language that was not designated at the time of correction, and obtaining, for each of the plurality of second languages, text data indicating a translation result in which the correction is reflected.

A program for causing a computer to function as:
a speech conversion means for converting speech data indicating the content of a user's speech in a first language into text data indicating results of translation, by simultaneous interpretation, into a plurality of second languages different from the first language;
a monitoring processing means for accepting a correction to the text data for one designated second language among the plurality of second languages; and
a translation means for translating the corrected text data into each second language that was not designated at the time of correction, and obtaining, for each of the plurality of second languages, text data indicating a translation result in which the correction is reflected.