JP2000324395A - How to add timing information to subtitles

Info

Publication number: JP2000324395A
Application number: JP11134755A
Authority: JP
Inventors: Eiji Sawamura; 英治沢村; Takao Monma; 隆雄門馬; Takahiro Fukushima; 孝博福島; Ichiro Maruyama; 一郎丸山; Terumasa Ebara; 暉将江原; Katsuhiko Shirai; 克彦白井
Original assignee: Mitsubishi Electric Corp; NEC Corp; Nippon Hoso Kyokai NHK; Telecommunications Advancement Organization; NHK Engineering Services Inc; Japan Broadcasting Corp
Current assignee: Mitsubishi Electric Corp; NEC Corp; National Institute of Information and Communications Technology; Japan Broadcasting Corp; NHK Engineering System Inc
Priority date: 1999-05-14
Filing date: 1999-05-14
Publication date: 2000-11-24
Anticipated expiration: 2019-05-14
Also published as: JP4140745B2

Abstract

(57) [Abstract]
[Problem] To provide a method for adding timing information to subtitles that can automatically assign high-precision timing information, corresponding to each division position, to each of the presentation unit subtitles obtained by dividing the subtitle text on which the subtitles are based at appropriate positions according to a predetermined presentation format.
[Solution] Reference timing information is assigned in advance to various positions in at least the undivided subtitle text on which the subtitles are based. The subtitle text is then divided at appropriate positions according to the predetermined presentation format to form presentation unit subtitles. At least one of the start point and end point of each presentation unit subtitle is assigned timing information that is estimated by inference from the reference timing information and from character information including at least one of (a) the character types and number of characters of the presentation unit subtitle and (b) its phonetic symbol string.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a method for adding timing information to subtitles, applied to a subtitle program production system that produces subtitled programs on the assumption that a substantially common digitized manuscript is used for both the announcement and the subtitles. In particular, it relates to a method that takes the subtitle text on which the subtitles are based, to which timing information on subtitle presentation has been assigned at various positions such as sentence beginnings, and automatically assigns high-precision timing information, corresponding to each division position, to each of the presentation unit subtitles obtained by dividing that text at appropriate positions according to a predetermined presentation format.

[0002]

[Prior Art] Although today's society is generally described as an advanced information society, hearing-impaired people find it harder to obtain information than people with normal hearing.

[0003] Take TV broadcast programs, which are widely used as an information medium, as an example. The proportion of subtitled programs among all TV broadcast programs reaches 33 to 70% in Europe and the United States, whereas in Japan it currently remains extremely low at only about 10%.

[0004]

[Problems to Be Solved by the Invention] The main reason the proportion of subtitled programs among all TV broadcast programs in Japan is low compared with Europe and the United States is that subtitle program production technology is underdeveloped. Specifically, partly because of problems peculiar to the Japanese language, most of the subtitle program production process is done by hand and requires a great deal of labor, time, and cost.

[0005] The present inventors therefore surveyed how subtitled programs are actually produced today, in order to identify what is hindering the development of subtitle program production technology.

[0006] The left side of FIG. 8 shows the subtitle program production flow in general use today. In step S101, the subtitle program producer receives three materials for creating the subtitle manuscript from the broadcasting station: program data with the time code superimposed on the video, a program tape with the time code recorded on an audio channel, and the program script. Note that "time code" is sometimes abbreviated as "TC" in the drawings.

[0007] In step S103, a specialist such as a person with broadcasting experience creates the subtitle manuscript from the materials received in step S101 by performing the following tasks in order: (1) transcribing a summary of the program announcement, (2) laying the text out as subtitle presentation images according to the separately defined manuscript preparation guidelines that serve as the standard for subtitle presentation, and (3) entering the start and end time codes.

[0008] In step S105, an input operator creates digitized subtitles from the subtitle manuscript created in step S103.

[0009] In step S107, the digitized subtitles created in step S105 are previewed and corrected in the presence of three people (the subtitle production supervisor in charge, the manuscript author, and the input operator) to produce the completed subtitles.

[0010] Recently, the improved subtitle production flow shown on the right side of FIG. 8 has also come into partial use. It relies on training personnel called caption operators, who are skilled in both summary transcription of program announcements and digitization of subtitles.

[0011] That is, in step S111, the subtitle program producer receives two materials for creating the subtitle manuscript from the broadcasting station: a program tape with the time code recorded on an audio channel, and the program script.

[0012] In step S113, the caption operator plays back the program tape with the time code recorded on an audio channel and clicks a mouse button at the start of a line of dialogue, which extracts and records the start time code from the audio channel at that point. The operator then listens to the dialogue and enters it as summarized electronic data, and clicks the mouse button again at the point in the dialogue corresponding to a break defined by the subtitle manuscript preparation guidelines, which extracts and records the end time code from the audio channel at that point. These operations are repeated until the end of the program so that the subtitles for the whole program are digitized.

[0013] In step S117, the digitized subtitles created in step S105 are previewed and corrected in the presence of two people (the subtitle production supervisor in charge and the caption operator) to produce the completed subtitles.

[0014] In this improved production flow, the caption operator uses only the program tape with the time code recorded on an audio channel to summarize the dialogue and convert it into electronic data. By clicking the mouse button at the dialogue timings corresponding to the start and end points of each subtitle divided into presentation units, the operator extracts and records the corresponding time codes from the audio channel. This can be regarded as an effective subtitle production flow that saves considerable labor.

[0015] Within the sequence of processing in the current subtitle production flows described above, the parts that require especially many man-hours are the tasks in steps S103 to S105, or in step S113: (1) transcribing a summary of the program announcement, (2) laying out the subtitle presentation images, and (3) entering the start and end time codes. These tasks depend heavily on the knowledge and experience of skilled workers.

[0016] However, among the subtitled programs currently on air, there are some for which an announcement manuscript appears to be prepared in advance and then used as the actual broadcast subtitles with almost no modification. For example, an examination of the subtitled information program "生きもの地球紀行" (Ikimono Chikyu Kiko) shows that the announcement audio and the subtitle content are almost identical, which suggests that a common manuscript is used for both the announcement and the subtitles.

[0017] For a program in which the announcement audio and the subtitle content are this similar, a substantially common manuscript is used for both, and that manuscript is available in digital form, task (1), the summary transcription of the program announcement, is almost unnecessary. The remaining tasks are (2) laying out the subtitle presentation images and (3) entering the start and end time codes. As a result of intensive research aimed at simplifying these tasks, the present inventors arrived at a new technique that can automate task (3), the entry of start and end time codes, without human intervention.

[0018] The present invention has been made in view of the circumstances described above. Its object is to provide a method for adding timing information to subtitles that takes the subtitle text on which the subtitles are based, to which timing information on subtitle presentation has been assigned at various positions such as sentence beginnings, and automatically assigns high-precision timing information, corresponding to each division position, to each of the presentation unit subtitles obtained by dividing that text at appropriate positions according to a predetermined presentation format.

[0019]

[Means for Solving the Problems] To solve the above problem, the invention of claim 1 is a method for adding timing information to subtitles, used in producing a subtitled program when timing information corresponding to each division position is assigned to each of the presentation unit subtitles obtained by dividing at least the subtitle text on which the subtitles are based at appropriate positions according to a predetermined presentation format. In this method, reference timing information is assigned in advance to various positions in the subtitle text before it is divided at the appropriate positions. The subtitle text is divided at the appropriate positions to form presentation unit subtitles. The timing information to be assigned to at least one of the start point and end point of each presentation unit subtitle after division is estimated by inference from the reference timing information and from character information that includes the character types and number of characters, or the phonetic symbol string, of each presentation unit subtitle. The estimated timing information is then automatically assigned to each presentation unit subtitle.

[0020] According to the invention of claim 1, reference timing information is assigned in advance to various positions in the subtitle text before it is divided at appropriate positions according to the predetermined presentation format. The subtitle text is divided at the appropriate positions to form presentation unit subtitles. The timing information to be assigned to at least one of the start point and end point of each presentation unit subtitle is estimated by inference from the reference timing information and from character information that includes the character types and number of characters, or the phonetic symbol string, of each presentation unit subtitle, and the estimated timing information is automatically assigned to each presentation unit subtitle. This yields a method that can automatically assign high-precision timing information, corresponding to each division position, to each presentation unit subtitle obtained by dividing the subtitle text at appropriate positions according to the predetermined presentation format.

[0021] The invention of claim 2 is the method for adding timing information to subtitles according to claim 1, in which the timing information to be assigned to at least one of the start point and end point of each presentation unit subtitle after division is estimated from the reference timing information and from character information that includes the character types and number of characters of each presentation unit subtitle. For this estimation, the reading time of other characters, including kanji, Arabic numerals, and Latin letters, is converted to a predetermined multiple, obtained from a statistical survey, of the reading time of hiragana and katakana characters.

[0022] According to the invention of claim 2, the timing information to be assigned to at least one of the start point and end point of each presentation unit subtitle after division is estimated from the reference timing information and from character information that includes the character types and number of characters of each presentation unit subtitle, with the reading time of other characters (including kanji, Arabic numerals, and Latin letters) converted to a predetermined multiple, obtained from a statistical survey, of the reading time of hiragana and katakana characters. Consequently, there is no need to apply synchronization detection, which is complex and requires a certain amount of processing time, to every subtitle character, and subtitle presentation can be expected to remain close to real time.

[0023] The invention of claim 3 is the method for adding timing information to subtitles according to claim 2, in which the predetermined multiple obtained from the statistical survey is about 1.86.

[0024] According to the invention of claim 3, the predetermined multiple obtained from the statistical survey can be set to, for example, about 1.86.
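The character-type conversion of claims 2 and 3 can be sketched in Python as follows. This is only an illustrative sketch: the function names, the Unicode ranges used to detect kana, and the choice to give punctuation and whitespace zero reading time are assumptions, not details stated in the text. Only the factor of about 1.86 comes from the source.

```python
import unicodedata

KANA_UNIT = 1.0      # reading time of one hiragana/katakana character (relative unit)
OTHER_FACTOR = 1.86  # multiple for kanji, numerals, Latin letters, etc. (claim 3)


def is_kana(ch: str) -> bool:
    """Return True for characters in the hiragana or katakana Unicode blocks."""
    return "\u3041" <= ch <= "\u309f" or "\u30a0" <= ch <= "\u30ff"


def reading_units(text: str) -> float:
    """Estimate the relative reading time of a subtitle string.

    Assumption: punctuation and whitespace contribute no reading time.
    """
    total = 0.0
    for ch in text:
        if ch.isspace() or unicodedata.category(ch).startswith("P"):
            continue
        total += KANA_UNIT if is_kana(ch) else OTHER_FACTOR
    return total
```

For example, `reading_units("かな")` counts two kana units, while `reading_units("漢字")` counts two characters at the higher weight.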

[0025] The invention of claim 4 is the method for adding timing information to subtitles according to claim 1, in which the timing information to be assigned to at least one of the start point and end point of each presentation unit subtitle after division is estimated from the reference timing information and from character information that includes the phonetic symbol string of each presentation unit subtitle. For this estimation, the method refers to a phoneme time table, in which the reading time corresponding to the phoneme of each phonetic symbol has been tabulated by a statistical method, and sums the phoneme times of the phonetic symbol string contained in each presentation unit subtitle.

[0026] According to the invention of claim 4, the timing information to be assigned to at least one of the start point and end point of each presentation unit subtitle after division is estimated from the reference timing information and from character information that includes the phonetic symbol string of each presentation unit subtitle, by referring to a phoneme time table in which the reading time corresponding to the phoneme of each phonetic symbol has been tabulated by a statistical method and summing the phoneme times of the phonetic symbol string contained in each presentation unit subtitle. Consequently, as with the invention of claim 2, there is no need to apply synchronization detection, which is complex and requires a certain amount of processing time, to every subtitle character, and subtitle presentation can be expected to remain close to real time.
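The table lookup and summation of claim 4 can be sketched as below. The phoneme inventory and every duration in `PHONEME_TIME_MS` are hypothetical placeholders chosen for illustration; the actual phoneme time table (FIG. 5 of the patent) is not reproduced in this section.

```python
# Hypothetical phoneme durations in milliseconds; NOT the values from the patent.
PHONEME_TIME_MS = {
    "a": 90, "i": 70, "u": 70, "e": 85, "o": 85,
    "k": 60, "s": 80, "t": 55, "n": 60, "N": 75,
}


def phoneme_duration_ms(phonemes: list[str]) -> int:
    """Sum the tabulated reading time of each phoneme in a phonetic symbol string.

    Raises KeyError if a phoneme is missing from the table, so that gaps in the
    table are noticed instead of being silently treated as zero.
    """
    return sum(PHONEME_TIME_MS[p] for p in phonemes)
```

A presentation unit subtitle whose phonetic symbol string is `["k", "a", "i"]` would then be estimated at 60 + 90 + 70 ms under these placeholder values.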

[0027] The invention of claim 5 is the method for adding timing information to subtitles according to any one of claims 2 to 4, in which the timing information is estimated using a time ratio method.

[0028] According to the invention of claim 5, the timing information is estimated using a time ratio method, so that relatively high-precision estimation of timing information can be achieved with a simple technique.
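One plausible reading of the time ratio method is proportional interpolation between two reference time codes: the interval is shared among the presentation unit subtitles in proportion to their estimated reading times. The sketch below assumes that reading; the function name, the use of seconds, and the exact formula are illustrative assumptions rather than the patent's stated procedure.

```python
def boundary_times(t_start: float, t_end: float,
                   weights: list[float]) -> list[tuple[float, float]]:
    """Split [t_start, t_end] among units in proportion to their weights.

    weights: estimated reading time of each presentation unit subtitle
             (for example character-type units or summed phoneme times).
    Returns one (start, end) pair per unit, in seconds.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    span = t_end - t_start
    times = []
    acc = 0.0
    for w in weights:
        start = t_start + span * acc / total
        acc += w
        end = t_start + span * acc / total
        times.append((start, end))
    return times
```

With reference time codes at 10.0 s and 20.0 s and two units weighted 1 and 3, the boundary between the units falls at 12.5 s.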

[0029]

[Embodiments of the Invention] An embodiment of the method for adding timing information to subtitles according to the present invention is described in detail below with reference to the drawings.

[0030] FIG. 1 is a functional block diagram of an automatic subtitle program production system that embodies the method for adding timing information to subtitles according to the present invention. FIG. 2 shows the results of a survey of the average number of readings (morae) in actual TV news text. FIG. 3 shows the results of a trial calculation of the time error of the timing information adding method based on character type. FIG. 4 shows divided subtitle sentences used to explain the present invention. FIG. 5 shows an example of the phoneme time table used in the timing information adding method based on the phonetic symbol string. FIGS. 6 and 7 are used to explain the technique for detecting the synchronization of subtitle transmission timing with the announcement audio.

[0031] The following description takes as its example of the predetermined presentation format one in which the maximum number of characters per line, N, is 15 and a presentation unit subtitle consisting of two lines is replaced as a whole.
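To make the presentation format concrete, the sketch below packs text into presentation units of two lines of at most N = 15 characters each. It is deliberately naive: it breaks at fixed widths, whereas the method described here divides the text at linguistically appropriate positions. The function name and parameters are illustrative assumptions.

```python
N = 15      # maximum characters per line (from the embodiment)
LINES = 2   # lines per presentation unit subtitle (from the embodiment)


def naive_units(text: str, n: int = N, lines: int = LINES) -> list[list[str]]:
    """Pack text into presentation units of `lines` rows of at most `n` characters.

    Fixed-width breaking only; it ignores word, phrase, and pause boundaries.
    """
    rows = [text[i:i + n] for i in range(0, len(text), n)]
    return [rows[i:i + lines] for i in range(0, len(rows), lines)]
```

A 40-character text therefore yields two units: one full unit of 15 + 15 characters and one unit holding the remaining 10.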

[0032] As already noted, among the subtitled programs currently on air there are some for which an announcement manuscript appears to be prepared in advance and then used as the actual broadcast subtitles with almost no modification. For example, an examination of the subtitled information program "生きもの地球紀行" (Ikimono Chikyu Kiko) shows that the announcement audio and the subtitle content are almost identical, which suggests that a substantially common manuscript is used for both the announcement and the subtitles.

[0033] Assuming such a program, in which the announcement audio and the subtitle content are very similar, a common manuscript is used for both, and that manuscript is available in digital form, the present inventors arrived at a method for adding timing information to subtitles. The method takes the subtitle text on which the subtitles are based, to which timing information on subtitle presentation has been assigned at various positions, at least at sentence beginnings, and automatically assigns high-precision timing information, corresponding to each division position, to each of the presentation unit subtitles obtained by dividing that text at appropriate positions according to a predetermined presentation format.

[0034] The background that led to the present invention is as follows. When the problem of dividing subtitle text is considered from the standpoint of making subtitles easier to read and understand, the natural question is what kind of subtitle is easy to read and understand. No quantitatively clear answer to this question has yet been found. However, valuable experience in producing experimental subtitled programs and running subtitle evaluation experiments is making the factors to be considered clear, at least qualitatively.

[0035] From the standpoint of readability and ease of understanding, it is generally considered desirable that a certain minimum number of characters be presented at the same time and that the presentation continue for the time needed. The number of characters and the presentation duration, however, depend strongly on how the presented subtitles are read.

[0036] Consider, for example, a hearing-impaired person watching a subtitled television program. The viewer takes in both the video information and the audio information visually, looking at one and then the other, so the subtitles can only ever be viewed intermittently. It is therefore desirable to present the audio information as subtitles that are easier to read and understand, so that the share of time spent looking at the subtitles is as small as possible and correspondingly more time can be spent watching the video.

[0037] How subtitles are viewed in this case depends on the presentation format. Taking as an example a format in which a two-line presentation unit subtitle is replaced as a whole, a viewer trying to catch all the presented subtitles will generally read ahead of, read back from, or read on both sides of a reference subtitle character (for example, the character corresponding to the current position of the spoken announcement).

[0038] Reading ahead, reading back, or both becomes necessary because there are periods during which the viewer's eyes are away from the subtitles, for example while watching the video, blinking, or glancing aside. Empirically, each such period appears to last about 0.5 to 2 seconds.

[0039] Assuming a subtitle presentation rate of 200 characters per minute, the maximum period of 2 seconds corresponds to about 7 characters. This shows that a single lapse of attention may cause the viewer to miss up to 7 subtitle characters.
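The arithmetic behind the figure of about 7 characters can be checked with a short sketch. The function name is an illustrative assumption; the rate of 200 characters per minute and the 0.5 to 2 second lapse range come from the text.

```python
def chars_missed(rate_per_min: float, lapse_s: float) -> float:
    """Number of subtitle characters presented during a lapse of attention."""
    return rate_per_min / 60.0 * lapse_s


# At 200 characters/minute, a 2-second lapse covers 200 / 60 * 2, roughly 6.7
# characters, which the text rounds to about 7.
longest = chars_missed(200, 2.0)
shortest = chars_missed(200, 0.5)
```

The same calculation for the shortest lapse of 0.5 seconds gives fewer than 2 characters, so the 7-character figure is the worst case within the stated range.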

[0040] It follows that 14 consecutive characters centered on the reference subtitle character are needed as the minimum presentation unit. If a further 5 to 7 characters on each side are allowed for reading and recognizing the subtitle once the gaze has returned to it, it is desirable to present about 24 to 29 characters of continuous subtitle content on the screen at the same time. Incidentally, current subtitle broadcasts mostly present two lines of 15 characters each, for a maximum of about 30 characters.

[0041] Following the above analysis, assume that in the worst case about 2 seconds pass between a subtitle being presented and actually being read. A subtitle of 7 characters or fewer that is presented only for the time corresponding to its character count may then not be read at all. In Japanese, for example, the negating word of a negative sentence comes at the end of the sentence. A division that leaves this negating part in the situation just described could have a very harmful effect, so such divisions must be avoided wherever possible.

[0042] As a countermeasure, it is desirable either to avoid dividing the text into units with few characters or to lengthen the presentation time of units with few characters.
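The second countermeasure, lengthening the presentation time of short units, could be sketched as a simple lower bound on display time. The 2-second floor is an assumption motivated by the worst-case reading delay discussed above, and the function name and default rate are illustrative; the text does not specify a concrete rule.

```python
def presentation_time_s(n_chars: int,
                        rate_per_min: float = 200.0,
                        floor_s: float = 2.0) -> float:
    """Display time for a unit, never shorter than `floor_s` seconds.

    n_chars * 60 / rate_per_min is the time proportional to the character count;
    floor_s is a hypothetical minimum so that very short units can still be read.
    """
    return max(n_chars * 60.0 / rate_per_min, floor_s)
```

Under these assumptions a 5-character unit is held for the 2-second floor instead of 1.5 seconds, while a 20-character unit keeps its proportional 6 seconds.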

[0043] The next issue is the handling of pauses, that is, silent intervals such as those between sentences. Where the subtitle text contains a long pause, the text before and after the pause is likely to concern different content, so presenting a subtitle that spans the pause is undesirable. Conversely, where the pause is very short, the text before and after it is likely to concern the same content, so it is preferable to treat it as continuous subtitle text. It is therefore desirable to apply a division method that takes the length of the pause into account.
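A pause-aware grouping step could look like the sketch below, which closes a group of sentences whenever the following pause is long. The threshold of 2.0 seconds and the input shape (sentence text paired with the pause that follows it) are hypothetical; the text gives no numeric criterion for a "long" pause.

```python
def group_by_pause(sentences: list[tuple[str, float]],
                   long_pause_s: float = 2.0) -> list[list[str]]:
    """Group sentences so that no group spans a long pause.

    sentences: (text, pause_after_seconds) pairs in spoken order.
    long_pause_s: hypothetical threshold above which a pause ends the group.
    """
    groups: list[list[str]] = []
    current: list[str] = []
    for text, pause_after in sentences:
        current.append(text)
        if pause_after >= long_pause_s:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups
```

Sentences separated only by short pauses stay in one group and may share a presentation unit, while a long pause forces a new group.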

【００４４】さらに、ひとかたまりの文字群は可能な限
り分割せず、同一行に提示するのが望ましい。この例と
して、通常の単語のみならず、連続する漢字、カタカ
ナ、アラビア数字、英字などがあり、（xxx）や「xxx」
などと表わさるルビ、略称に対する正式呼称、注釈など
もこの範疇として取り扱う。Further, it is desirable to present a cohesive group of characters on the same line, without dividing it, wherever possible. Examples include not only ordinary words but also runs of consecutive kanji, katakana, Arabic numerals, and alphabetic characters; ruby (reading glosses), formal names given for abbreviations, and annotations written as (xxx) or 「xxx」 are also treated as belonging to this category.
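
As a rough illustration of keeping such character groups intact, the sketch below lists the offsets at which a line break would not cut through a run of consecutive kanji, katakana, numerals, or alphabetic characters. The Unicode ranges and the name `allowed_breaks` are assumptions made for this example; word boundaries, ruby, and bracketed annotations would need additional handling that is not shown.

```python
import re

# Runs of a single character class that should stay on one line
# (the classes and Unicode ranges are assumptions for this sketch).
_RUN = re.compile(
    r"[\u4e00-\u9fff]+"                      # consecutive kanji
    r"|[\u30a0-\u30ff]+"                     # consecutive katakana, incl. the long-vowel mark
    r"|[0-9\uff10-\uff19]+"                  # Arabic numerals, ASCII or full-width
    r"|[A-Za-z\uff21-\uff3a\uff41-\uff5a]+"  # alphabetic characters, ASCII or full-width
)


def allowed_breaks(text):
    """Return the list of offsets where a break does not split a run."""
    forbidden = set()
    for match in _RUN.finditer(text):
        # Offsets strictly inside a run would cut the group in two.
        forbidden.update(range(match.start() + 1, match.end()))
    return [i for i in range(1, len(text)) if i not in forbidden]
```

For the string "東京でＮＨＫニュース" the only permitted offsets are 2, 3, and 6, that is, on either side of "で" and between "ＮＨＫ" and "ニュース".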

【００４５】このように、より読みやすく、理解しやす
い字幕を得ることを目的として字幕文テキストを分割す
るにあたっては、上述した要素を充分考慮する必要があ
る。ところが、この字幕文テキストの分割に伴い、適切
箇所で分割後の提示単位字幕の各々に対し、その分割箇
所に対応したタイミング情報を付与しなければならない
といった新たな課題を生ずる。As described above, the factors discussed so far must be given full consideration when dividing the subtitle sentence text to obtain subtitles that are easier to read and understand. Dividing the subtitle sentence text, however, raises a new problem: each presentation unit subtitle produced by division at an appropriate position must be given timing information corresponding to its division point.

【００４６】そこで、本発明は、本発明で提案するアナ
ウンス音声と字幕文テキストの同期検出技術、及び日本
語の読み及びその発音に関する統計的特徴解析手法等を
適用することにより、所定の提示形式に従って適切箇所
で分割された提示単位字幕の各々に対し、その分割箇所
に対応した高精度のタイミング情報の自動付与を実現す
るようにしている。Therefore, the present invention applies the technique proposed herein for detecting synchronization between announcement speech and subtitle sentence text, together with methods for statistically analyzing the characteristics of Japanese readings and their pronunciation, so that high-precision timing information corresponding to each division point is automatically added to each presentation unit subtitle divided at an appropriate position according to a predetermined presentation format.

【００４７】さて、本実施形態の説明に先立って、以下
の説明で使用する用語の定義付けを行うと、本実施形態
の説明において、提示対象となる字幕文の全体集合を
「字幕文テキスト」と言い、字幕文テキストのうち、適
宜の句点で区切られたひとかたまりの字幕文の部分集合
を「単位字幕文」と言い、ディスプレイの表示画面上に
おいて提示単位となる字幕を「提示単位字幕」と言い、
提示単位字幕に含まれる各行の個々の字幕を表現すると
き、これを「提示単位字幕行」と言い、提示単位字幕行
のうちの任意の文字を表現するとき、これを「字幕文
字」と言うことにする。なお、表示画面上に単独行の提
示単位字幕を提示するとき、「提示単位字幕」と「提示
単位字幕行」とは同義となるため、この場合、「提示単
位字幕行」の表現はあえて使用しないことととする。Before describing this embodiment, the terms used in the following description are defined. The entire set of subtitle sentences to be presented is called the "subtitle sentence text"; a subset of the subtitle sentence text consisting of a group of subtitle sentences delimited at a suitable period is called a "unit subtitle sentence"; the subtitle that forms one unit of presentation on the display screen is called a "presentation unit subtitle"; each individual line of a presentation unit subtitle is called a "presentation unit subtitle line"; and any single character in a presentation unit subtitle line is called a "subtitle character". When a presentation unit subtitle consisting of a single line is presented on the display screen, "presentation unit subtitle" and "presentation unit subtitle line" are synonymous, so in that case the expression "presentation unit subtitle line" is deliberately not used.

【００４８】まず、本発明に係る字幕へのタイミング情
報付与方法を具現化する自動字幕番組制作システム１１
の概略構成について、図１を参照して説明する。First, the schematic configuration of an automatic subtitle program production system 11 that embodies the method of adding timing information to subtitles according to the present invention will be described with reference to FIG. 1.

【００４９】同図に示すように、自動字幕番組制作シス
テム１１は、電子化原稿記録媒体１３と、同期検出装置
１５と、統合化装置１７と、形態素解析部１９と、分割
ルール記憶部２１と、番組素材ＶＴＲ例えばディジタル
・ビデオ・テープ・レコーダ（以下、「Ｄ−ＶＴＲ」と
言う）２３と、を含んで構成されている。As shown in FIG. 1, the automatic subtitle program production system 11 comprises a digitized manuscript recording medium 13, a synchronization detection device 15, an integrating device 17, a morphological analysis unit 19, a division rule storage unit 21, and a program material VTR 23, for example a digital video tape recorder (hereinafter referred to as the "D-VTR").

【００５０】電子化原稿記録媒体１３は、例えばハード
ディスク記憶装置やフロッピーディスク装置等より構成
され、提示対象となる字幕の全体集合を表す字幕文テキ
ストを記憶している。なお、本実施形態では、ほぼ共通
の電子化原稿をアナウンス用と字幕用の双方に利用する
形態を想定しているので、電子化原稿記録媒体１３に記
憶される字幕文テキストの内容は、提示対象字幕と一致
するばかりでなく、素材ＶＴＲに収録されたアナウンス
音声とも一致しているものとする。The digitized manuscript recording medium 13 consists of, for example, a hard disk storage device or a floppy disk device, and stores the subtitle sentence text representing the entire set of subtitles to be presented. This embodiment assumes that essentially the same digitized manuscript is used both for the announcement and for the subtitles, so the content of the subtitle sentence text stored in the digitized manuscript recording medium 13 matches not only the subtitles to be presented but also the announcement speech recorded on the material VTR.

【００５１】同期検出装置１５は、同期検出点付字幕文
と、これを読み上げたアナウンス音声との間における時
間同期を検出する機能等を有している。さらに詳しく述
べると、同期検出装置１５は、統合化装置１７で付与し
た同期検出点付字幕文が送られてくると、この字幕文に
関し、番組素材ＶＴＲから取り込んだこの字幕文に対応
するアナウンス音声及びそのタイムコードを参照して、
指定された同期検出点のタイミング情報、すなわちタイ
ムコードを検出するとともに、このアナウンス音声に含
まれるポーズ点を検出し、検出したタイムコードやポー
ズ点を統合化装置１７宛に送出する機能を有している。The synchronization detection device 15 has, among other functions, the function of detecting time synchronization between a subtitle sentence marked with synchronization detection points and the announcement speech in which it is read aloud. More specifically, when the integrating device 17 sends it a subtitle sentence to which synchronization detection points have been assigned, the synchronization detection device 15 refers to the corresponding announcement speech and its time code taken from the program material VTR, detects the timing information (that is, the time code) of each designated synchronization detection point, detects the pause points contained in the announcement speech, and sends the detected time codes and pause points to the integrating device 17.

【００５２】なお、上述したタイミング情報としてのタ
イムコードの同期検出は、本発明者らが研究開発したア
ナウンス音声を対象とした音声認識処理を含むアナウン
ス音声と字幕文テキスト間の同期検出技術を適用するこ
とで高精度に実現可能である。The synchronization detection of the time code serving as the timing information described above can be achieved with high accuracy by applying the technique, researched and developed by the present inventors, for detecting synchronization between announcement speech and subtitle sentence text, which includes speech recognition processing for announcement speech.

【００５３】すなわち、字幕送出タイミング検出の流れ
は、図６に示すように、まず、かな漢字交じり文で表記
されている字幕文テキストを、音声合成などで用いられ
ている読付け技術を用いて発音記号列に変換する。この
変換には、「日本語読付けシステム」を用いる。次に、
あらかじめ学習しておいた音響モデル（ＨＭＭ：隠れマ
ルコフモデル）を参照し、「音声モデル合成システム」
によりこれらの発音記号列をワード列ペアモデルと呼ぶ
音声モデル（ＨＭＭ）に変換する。そして、「最尤照合
システム」を用いてワード列ペアモデルにアナウンス音
声を通して比較照合を行うことにより、字幕送出タイミ
ングの同期検出を行う。That is, as shown in FIG. 6, subtitle transmission timing is detected as follows. First, the subtitle sentence text, written in mixed kana and kanji, is converted into a phonetic symbol string using a reading-assignment technique of the kind used in speech synthesis; a "Japanese reading-assignment system" performs this conversion. Next, with reference to an acoustic model (HMM: hidden Markov model) trained in advance, a "speech model synthesis system" converts the phonetic symbol string into a speech model (HMM) called a word string pair model. Finally, a "maximum likelihood matching system" passes the announcement speech through the word string pair model for comparison and matching, thereby detecting the synchronization of the subtitle transmission timing.

【００５４】字幕送出タイミング検出の用途に用いるア
ルゴリズム(ワード列ペアモデル)は、キーワードスポッ
ティングの手法を採用している。キーワードスポッティ
ングの手法として、フォワード・バックワードアルゴリ
ズムにより単語の事後確率を求め、その単語尤度のロー
カルピークを検出する方法が提案されている。ワード列
ペアモデルは、図７に示すように、これを応用して字幕
と音声を同期させたい点、すなわち同期点の前後でワー
ド列１ (Keywords1)とワード列２ (Keywords2)とを連結
したモデルになっており、ワード列の中点（Ｂ）で尤度
を観測してそのローカルピークを検出し、ワード列２の
発話開始時間を高精度に求めることを目的としている。
ワード列は、音素ＨＭＭの連結により構成され、ガーベ
ジ (Garbage)部分は全音素ＨＭＭの並列な枝として構成
されている。また、アナウンサが原稿を読む場合、内容
が理解しやすいように息継ぎの位置を任意に定めること
から、ワード列１，２間にポーズ (Pause)を挿入してい
る。なお、ポーズ時間の検出に関しては、素材ＶＴＲか
ら音声とそのタイムコードが供給され、その音声レベル
が指定レベル以下で連続する開始、終了タイムコードか
ら、周知の技術で容易に達成できる。The algorithm used for detecting subtitle transmission timing (the word string pair model) adopts a keyword spotting technique. One proposed keyword spotting method obtains the posterior probability of a word with the forward-backward algorithm and detects local peaks in the word likelihood. As shown in FIG. 7, the word string pair model applies this idea: it concatenates word string 1 (Keywords1) and word string 2 (Keywords2), which lie immediately before and after the point at which the subtitle is to be synchronized with the speech (the synchronization point), observes the likelihood at the midpoint (B) of the concatenated word strings, and detects its local peak, with the aim of determining the utterance start time of word string 2 with high accuracy. Each word string is built by concatenating phoneme HMMs, and the garbage (Garbage) part is built as parallel branches of all phoneme HMMs. Because an announcer reading a manuscript chooses where to take breaths freely so that the content is easy to understand, a pause (Pause) is inserted between word strings 1 and 2. Pause durations can easily be detected with well-known techniques: the material VTR supplies the audio and its time code, and a pause is given by the start and end time codes of an interval over which the audio level remains at or below a specified level.
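
The level-based pause detection mentioned at the end of this paragraph might look like the following minimal sketch. It assumes the audio has already been reduced to one level value per frame with a matching time code per frame; the name `find_pauses` and the `min_frames` parameter are hypothetical and not part of the described system.

```python
def find_pauses(levels, timecodes, threshold, min_frames=1):
    """Return (start, end) time-code pairs of low-level intervals.

    levels[i] is the audio level of frame i and timecodes[i] its time code.
    An interval counts as a pause when the level stays at or below
    `threshold` for at least `min_frames` consecutive frames (assumed rule).
    """
    pauses = []
    start = None
    for i, level in enumerate(levels):
        if level <= threshold:
            if start is None:
                start = i  # first quiet frame of a candidate pause
        elif start is not None:
            if i - start >= min_frames:
                pauses.append((timecodes[start], timecodes[i]))
            start = None
    # A pause may still be open when the material ends.
    if start is not None and len(levels) - start >= min_frames:
        pauses.append((timecodes[start], timecodes[-1]))
    return pauses
```

The end of each pair is the time code of the first frame whose level rises above the threshold, or the last available time code when the quiet interval runs to the end of the input.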

【００５５】統合化装置１７は、電子化原稿記録媒体１
３から読み出した字幕文テキストのうち、文頭を起点と
した所要文字数範囲を目安とした単位字幕文を順次抽出
する単位字幕文抽出機能と、単位字幕文抽出機能を発揮
することで抽出した単位字幕文を、所望の提示形式に従
う提示単位字幕に変換する提示単位字幕化機能と、提示
単位字幕化機能を発揮することで変換された提示単位字
幕に対し、同期検出装置１５から送出されてきたタイム
コード及びポーズ点を利用してタイミング情報を付与す
るタイミング情報付与機能と、を有している。The integrating device 17 has: a unit subtitle sentence extraction function that sequentially extracts, from the subtitle sentence text read from the digitized manuscript recording medium 13, unit subtitle sentences sized to a required range of character counts starting from the beginning of a sentence; a presentation unit subtitle conversion function that converts each extracted unit subtitle sentence into presentation unit subtitles conforming to a desired presentation format; and a timing information adding function that adds timing information to the converted presentation unit subtitles using the time codes and pause points sent from the synchronization detection device 15.

【００５６】形態素解析部１９は、漢字かな交じり文で
表記されている単位字幕文を対象として、形態素毎に分
割する分割機能と、分割機能を発揮することで分割され
た各形態素毎に、表現形、品詞、読み、標準表現などの
付加情報を付与する付加情報付与機能と、各形態素を文
節や節単位にグループ化し、いくつかの情報素列を得る
情報素列取得機能と、を有している。これにより、単位
字幕文は、表面素列、記号素列（品詞列）、標準素列、
及び情報素列として表現される。The morphological analysis unit 19 operates on unit subtitle sentences written in mixed kanji and kana and has: a division function that divides the sentence into morphemes; an additional information adding function that attaches additional information such as surface form, part of speech, reading, and standard form to each morpheme; and an information element sequence acquisition function that groups the morphemes into phrases (bunsetsu) and clauses to obtain several information element sequences. As a result, a unit subtitle sentence is represented as a surface element sequence, a symbol element sequence (part-of-speech sequence), a standard element sequence, and an information element sequence.

【００５７】分割ルール記憶部２１は、単位字幕文を対
象とした改行・改頁箇所の最適化を行う際に参照される
分割ルールを記憶する機能を有している。The division rule storage unit 21 stores the division rules that are consulted when optimizing line break and page break positions in a unit subtitle sentence.

【００５８】Ｄ−ＶＴＲ２３は、番組素材が収録されて
いる番組素材ＶＴＲテープから、映像、音声、及びそれ
らのタイムコードを再生出力する機能を有している。The D-VTR 23 has a function of reproducing and outputting video, audio, and their time codes from a program material VTR tape on which the program material is recorded.

【００５９】次に、自動字幕番組制作システム１１にお
いて主要な役割を果たす統合化装置１７の内部構成につ
いて説明していく。Next, the internal configuration of the integrating device 17 which plays a major role in the automatic subtitle program production system 11 will be described.

【００６０】統合化装置１７は、単位字幕文抽出部３３
と、提示単位字幕化部３５と、タイミング情報付与部３
７と、を含んで構成されている。The integrating device 17 comprises a unit subtitle sentence extraction unit 33, a presentation unit subtitle conversion unit 35, and a timing information adding unit 37.

【００６１】単位字幕文抽出部３３は、電子化原稿記録
媒体１３から読み出した、単位字幕文が提示時間順に配
列された字幕文テキストのなかから、例えば７０〜９０
字幕文字程度を目安とし、付加した区切り可能箇所情報
等を活用するなどして処理単位とするテキスト文を順次
抽出する機能を有している。なお、区切り可能箇所情報
としては、形態素解析部１９で得られた文節データ付き
形態素解析データ、及び分割ルール記憶部２１に記憶さ
れている分割ルール（改行・改頁データ）を利用するこ
ともできる。ここで、上述した分割ルール（改行・改頁
データ）について述べると、分割ルール（改行・改頁デ
ータ）で定義される改行・改頁推奨箇所は、第１に句点
の後ろ、第２に読点の後ろ、第３に文節と文節の間、第
４に形態素品詞の間、を含んでおり、分割ルール（改行
・改頁データ）を適用するにあたっては、上述した記述
順の先頭から優先的に適用するのが好ましい。The unit subtitle sentence extraction unit 33 sequentially extracts the text to be processed as one unit from the subtitle sentence text read from the digitized manuscript recording medium 13, in which the unit subtitle sentences are arranged in order of presentation time; it aims at roughly 70 to 90 subtitle characters per unit and makes use of, for example, attached information on possible delimiting positions. The morphological analysis data with phrase (bunsetsu) data obtained by the morphological analysis unit 19, and the division rules (line break / page break data) stored in the division rule storage unit 21, can also be used as this delimiting position information. The recommended line break / page break positions defined by the division rules are: first, after a period (句点); second, after a comma (読点); third, between phrases (bunsetsu); and fourth, between morphemes (parts of speech). When the division rules are applied, it is preferable to apply them with priority in the order listed.
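
The prioritized use of recommended break positions can be sketched as follows. The candidate labels, the `choose_break` name, and the tie-break rule (take the latest position among equally ranked candidates) are assumptions for illustration; the text specifies only the priority order.

```python
# Lower number = higher priority, following the order given in the text.
PRIORITY = {"period": 0, "comma": 1, "bunsetsu": 2, "morpheme": 3}


def choose_break(candidates, max_len):
    """Pick a break offset from (offset, kind) pairs within max_len characters.

    Returns None when no candidate fits. Among candidates of the same kind
    the latest offset is chosen, which is an assumption of this sketch.
    """
    usable = [
        (PRIORITY[kind], -offset)
        for offset, kind in candidates
        if 0 < offset <= max_len
    ]
    if not usable:
        return None
    _, negative_offset = min(usable)
    return -negative_offset
```

For example, with candidates at offsets 4 (morpheme), 9 (comma), 13 (bunsetsu), and 20 (period) and a limit of 15 characters, the comma at offset 9 is selected because the period lies beyond the limit.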

【００６２】提示単位字幕化部３５は、単位字幕文抽出
部３３で抽出した単位字幕文、単位字幕文に付加した区
切り可能箇所情報、及び同期検出装置１５からの情報等
に基づいて、単位字幕文抽出部３３で抽出した単位字幕
文を、所望の提示形式に従う少なくとも１以上の提示単
位字幕に変換する提示単位字幕化機能を有している。The presentation unit subtitle conversion unit 35 has a presentation unit subtitle conversion function that converts each unit subtitle sentence extracted by the unit subtitle sentence extraction unit 33 into one or more presentation unit subtitles conforming to a desired presentation format, based on the extracted unit subtitle sentence, the delimiting position information attached to it, information from the synchronization detection device 15, and the like.

【００６３】タイミング情報付与部３７は、提示単位字
幕化部３５で変換された提示単位字幕に対し、同期検出
装置１５から送出されてきたタイムコード及びポーズ点
を利用し、後述のタイミング内挿手法を用いてタイミン
グ情報を付与するタイミング情報付与機能を有してい
る。The timing information adding unit 37 has a timing information adding function that adds timing information to the presentation unit subtitles produced by the presentation unit subtitle conversion unit 35, using the time codes and pause points sent from the synchronization detection device 15 and the timing interpolation method described later.

【００６４】次に、本発明に係る字幕へのタイミング情
報付与方法について、図２乃至図５を参照しつつ説明す
る。Next, the method of adding timing information to subtitles according to the present invention will be described with reference to FIGS. 2 to 5.

【００６５】既述したように、アナウンス音声に対応す
る字幕に関するタイミング情報の同期検出は、本発明者
らが研究開発したアナウンス音声を対象とした音声認識
処理を含むアナウンス音声と字幕文テキスト間の同期検
出技術を適用することで高精度に実現可能であるが、こ
の同期検出処理はかなり複雑であり、一定の処理時間を
要するために、各提示単位字幕の全ての始点／終点タイ
ムコードを対象として同期検出技術を適用したのでは、
同期検出点が過多となり、字幕の提示に関する即時性が
損なわれてしまうおそれがある。As described above, synchronization detection of the timing information for subtitles corresponding to announcement speech can be achieved with high accuracy by applying the technique, researched and developed by the present inventors, for detecting synchronization between announcement speech and subtitle sentence text, which includes speech recognition processing for announcement speech. This synchronization detection process is, however, quite complex and requires a certain amount of processing time. If it were applied to every start-point and end-point time code of every presentation unit subtitle, there would be too many synchronization detection points, and the immediacy of subtitle presentation could be impaired.

【００６６】ここで、字幕へのタイミング情報付与時期
に着目して字幕へのタイミング情報付与方法を分析する
と、分割後の字幕に基づくタイミング情報付与形態と、
分割前の字幕に基づくタイミング情報付与形態と、に大
別することができる。Here, when methods of adding timing information to subtitles are analyzed with attention to when the timing information is added, they can be broadly divided into a form in which timing information is added on the basis of the subtitles after division and a form in which it is added on the basis of the subtitles before division.

【００６７】分割後の字幕に基づくタイミング情報付与
形態では、付与対象となる字幕が確定しているので、そ
の始点／終点においてアナウンス音声と字幕文テキスト
間を比較することで同期検出を行い、始点／終点毎のタ
イミング情報を各々付与すればよい。In the form based on the subtitles after division, the subtitles to which timing information is to be added are already fixed, so synchronization is detected by comparing the announcement speech with the subtitle sentence text at each start point and end point, and timing information is added for each start point and end point.

【００６８】この形態は、字幕に対して直接的にタイミ
ング情報を割り付け付与することから最も確実でその同
期精度も高い反面、同一の字幕文テキストを基に種々の
提示形式に従う字幕を制作する場合であっても、各提示
形式毎に複雑かつ一定の処理時間を要する同期検出を行
わなければならない結果として、字幕の提示に関する即
時性が損なわれてしまうおそれがあるといった課題を内
在している。Because this form assigns timing information directly to the subtitles, it is the most reliable and offers high synchronization accuracy. On the other hand, even when subtitles in several presentation formats are produced from the same subtitle sentence text, synchronization detection, which is complex and requires a certain amount of processing time, must be performed separately for each presentation format, with the inherent problem that the immediacy of subtitle presentation may be impaired.

【００６９】これに対し、分割前の字幕に基づくタイミ
ング情報付与形態は、同一の字幕文テキストから種々の
提示形式に従う字幕を制作する場合にも適したものであ
る。この場合、まず、分割前の字幕文テキストに対し、
例えば文頭などの各所に適当な間隔をおいて、同期検出
技術を適用することで基準となるタイミング情報を付与
しておき、その後、字幕文テキストを所定の提示形式に
従う適切箇所で分割していくことで提示単位字幕化を行
い、基準となるタイミング情報と、提示単位字幕が呈す
る文字種及び文字数、又は発音記号列などを含む文字情
報と、に基づいて、後述する内挿法を適用することで類
推演算したタイミング情報を、各提示単位字幕の始点／
終点のうち少なくともいずれか一方に付与するといった
手順を踏むので、各提示単位字幕の全ての始点／終点を
対象とした複雑かつ一定の処理時間を要する同期検出技
術の適用を要しない結果として、字幕の提示に関する即
時性の良好な維持を期待することができる。In contrast, the form based on the subtitles before division is also suited to producing subtitles in several presentation formats from the same subtitle sentence text. In this form, reference timing information is first added, by applying the synchronization detection technique, at suitably spaced positions in the subtitle sentence text before division, such as the beginnings of sentences. The subtitle sentence text is then divided at appropriate positions according to a predetermined presentation format to produce presentation unit subtitles. Finally, timing information estimated with the interpolation method described later, from the reference timing information and from character information containing the character types and character counts of the presentation unit subtitles, or their phonetic symbol strings, is added to at least one of the start point and end point of each presentation unit subtitle. Because this procedure does not require applying the complex and time-consuming synchronization detection technique to every start point and end point of every presentation unit subtitle, the immediacy of subtitle presentation can be expected to be well maintained.

【００７０】ここで、分割前の字幕に基づくタイミング
情報内挿付与形態は、さらに、文字種に着目したタイミ
ング情報付与方法と、発音記号列に着目したタイミング
情報付与方法と、に大別することができる。なお、以下
の説明において、文字種に着目したタイミング情報付与
方法を第１のタイミング情報付与方法と呼ぶ一方、発音
記号列に着目したタイミング情報付与方法を第２のタイ
ミング情報付与方法と呼ぶ場合があることを付言してお
く。The form in which timing information is added by interpolation on the basis of the subtitles before division can be further divided into a timing information adding method that focuses on character types and one that focuses on phonetic symbol strings. In the following description, the method that focuses on character types is sometimes called the first timing information adding method, and the method that focuses on phonetic symbol strings the second timing information adding method.

【００７１】第１のタイミング情報付与方法では、提示
単位字幕が呈する文字情報として文字種及び文字数を利
用し、タイミング情報を類推演算するにあたっては、漢
字・アラビア数字・英字などを含むその他の文字の読み
時間を、ひらがな又はカタカナを含む文字の読み時間に
対し、例えば図２に示すように、実際のＴＶニュース文
に含まれるこれら文字種の発音数を対象とした統計的な
調査から得られる、約１．８６倍などの所定倍率に時間
換算し、ひらがな又はカタカナが呈する読み時間と、そ
の他の文字が呈する読み時間換算値と、の積算値、及び
基準となるタイミング情報に基づいて、字幕に付与する
タイミング情報を類推演算する。そして、この類推演算
結果をタイミング情報として、分割後の提示単位字幕に
内挿付与するのである。The first timing information adding method uses character types and character counts as the character information of the presentation unit subtitles. To estimate the timing information, the reading time of other characters, including kanji, Arabic numerals, and alphabetic characters, is converted to a predetermined multiple of the reading time of hiragana or katakana characters, for example about 1.86 times, a value obtained from a statistical survey of the number of pronounced sounds per character of each type in actual TV news text, as shown in FIG. 2. The timing information to be added to the subtitle is then estimated from the sum of the reading time of the hiragana and katakana characters and the converted reading time of the other characters, together with the reference timing information. The estimate is added by interpolation to the divided presentation unit subtitle as its timing information.

【００７２】第１のタイミング情報付与方法について、
図４に示すニュース文を例示してさらに詳しく述べる
と、分割字幕文１の文頭「ｔ」と、分割字幕文２の文末
「ａ」と、に予め基準となるタイミング情報が付与され
ており、それぞれのタイミング情報をＴＢ，ＴＥと想定
した場合において、分割字幕文２の文頭「ｉ」のタイミ
ング情報ＴＭは、下記の手順によって類推演算する。The first timing information adding method is described in more detail using the news sentence shown in FIG. 4. Suppose that reference timing information has been added in advance to the beginning "t" of divided subtitle sentence 1 and to the end "a" of divided subtitle sentence 2, and let these timings be TB and TE respectively. The timing information TM of the beginning "i" of divided subtitle sentence 2 is then estimated by the following procedure.

【００７３】まず、分割字幕文１に含まれるひらがな又
はカタカナの文字数は１２、その他の漢字等の文字数は
７であり、また、分割字幕文２に含まれるひらがな又は
カタカナの文字数は１１、その他の漢字等の文字数は３
である。ひらがな又はカタカナの読み時間を「１」と想
定したとき、分割字幕文１，２の総読み時間ＴＲ１，Ｔ
Ｒ２は次式１，２により求められる。First, divided subtitle sentence 1 contains 12 hiragana or katakana characters and 7 other characters such as kanji, and divided subtitle sentence 2 contains 11 hiragana or katakana characters and 3 other characters such as kanji. Taking the reading time of one hiragana or katakana character as "1", the total reading times TR1 and TR2 of divided subtitle sentences 1 and 2 are given by Equations 1 and 2 below.

【００７４】
ＴＲ１＝（１２＊１）＋（７＊１．８６）＝２５ …（式１）
ＴＲ２＝（１１＊１）＋（３＊１．８６）＝１６．６ …（式２）
この計算結果である分割字幕文１，２の各総読み時間Ｔ
Ｒ１，ＴＲ２を活用して、分割字幕文１，２間の分割点
である分割字幕文２の文頭「ｉ」のタイミング情報ＴＭ
を、時間比率の手法を用いて次式３によって類推演算す
る。
TR1 = (12 * 1) + (7 * 1.86) ≈ 25 … (Equation 1)
TR2 = (11 * 1) + (3 * 1.86) ≈ 16.6 … (Equation 2)
Using the total reading times TR1 and TR2 of divided subtitle sentences 1 and 2 calculated above, the timing information TM of the beginning "i" of divided subtitle sentence 2, which is the division point between divided subtitle sentences 1 and 2, is estimated by proportional allocation of time according to Equation 3 below.

【００７５】
ＴＭ＝ＴＢ＋（ＴＥ−ＴＢ）＊ＴＲ１／（ＴＲ１＋ＴＲ２）＝ＴＢ＋（ＴＥ−ＴＢ）＊０．６ …（式３）
このようにして、分割字幕文２の文頭「ｉ」のタイミン
グ情報ＴＭを類推演算することができ、この類推演算結
果ＴＭをタイミング情報として、分割字幕文２の文頭
「ｉ」に付与するのである。なお、分割字幕文２の文頭
「ｉ」のタイミング情報ＴＭは、分割字幕文１の文末
「ｅ」のタイミング情報として取り扱うこともできる。
TM = TB + (TE − TB) * TR1 / (TR1 + TR2) ≈ TB + (TE − TB) * 0.6 … (Equation 3)
In this way the timing information TM of the beginning "i" of divided subtitle sentence 2 can be estimated, and the estimate TM is added to the beginning "i" of divided subtitle sentence 2 as its timing information. The timing information TM of the beginning "i" of divided subtitle sentence 2 can also be treated as the timing information of the end "e" of divided subtitle sentence 1.
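
The character-type calculation above can be reproduced with a short sketch. The factor 1.86, the character counts, and the interpolation formula come from the text; the function names, the Unicode ranges used to recognize kana, and the decision to count every non-kana character (including any punctuation) as "other" are assumptions of this example.

```python
OTHER_FACTOR = 1.86  # reading-time factor for non-kana characters, from the text


def is_kana(ch):
    """True for hiragana or katakana (Unicode ranges assumed for this sketch)."""
    return "\u3041" <= ch <= "\u309f" or "\u30a0" <= ch <= "\u30ff"


def count_types(text):
    """Return (kana count, other count); punctuation is not treated specially."""
    kana = sum(1 for ch in text if is_kana(ch))
    return kana, len(text) - kana


def reading_time(n_kana, n_other, factor=OTHER_FACTOR):
    """Relative reading time with one kana character taken as 1."""
    return n_kana * 1.0 + n_other * factor


def interpolate(tb, te, tr1, tr2):
    """Timing of the division point between two sentences spanning tb..te."""
    return tb + (te - tb) * tr1 / (tr1 + tr2)


tr1 = reading_time(12, 7)  # divided subtitle sentence 1
tr2 = reading_time(11, 3)  # divided subtitle sentence 2
tm = interpolate(0.0, 10.0, tr1, tr2)  # TB = 0 s and TE = 10 s are example values
```

With the example values TB = 0 s and TE = 10 s, the division point falls at about 6.0 s, matching the ratio of roughly 0.6 in Equation 3.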

【００７６】ここで、統計的手法によって求めた所定文
字種の平均読み数を利用した第１のタイミング情報付与
方法では、漢字・アラビア数字・英字を含むその他の文
字の多少にかかわらず、どの字幕文に対しても例えば約
１．８６倍等の同一の倍率を適用する結果として、必然
的に時間誤差を生ずるおそれがある。そこで、第１のタ
イミング情報付与方法における時間誤差が与える影響に
ついて考察してみる。The first timing information adding method, which uses the statistically determined average number of readings for given character types, applies the same factor, for example about 1.86, to every subtitle sentence regardless of how many kanji, Arabic numerals, alphabetic characters, and other such characters it contains, and therefore inevitably risks introducing a time error. The effect of this time error in the first timing information adding method is examined below.

【００７７】まず、前提として、字幕の基となる字幕文
テキストには、３０文字毎の間隔をおいて同期検出技術
を用いて検出した正確なタイミング情報が付与され、ま
た、一行１５文字の二行提示単位字幕とし、前頁二行目
開始点と、着目している現頁一行目終了点と、の各々に
は正確なタイミング情報が付与されているものとし、さ
らに、平均読み数の標準値を１．８６２と想定する。そ
して、上述した前提下において、平均読み数が上記標準
値とは異なる場合の着目している現頁一行目における開
始点に該当するタイミング情報が呈する時間誤差を試算
した。この時間誤差の試算結果を図３に示している。図
３において、前頁二行目は全て漢字、現頁一行目は全て
ひらがなとし、字幕速度が５，６，７文字／秒の場合を
それぞれ示した。The following conditions are assumed. Accurate timing information, detected with the synchronization detection technique, is attached to the underlying subtitle sentence text at intervals of 30 characters. Presentation unit subtitles consist of two lines of 15 characters each, and accurate timing information is attached to both the start point of the second line of the previous page and the end point of the first line of the current page, which is the line of interest. The standard value of the average number of readings is taken to be 1.862. Under these conditions, the time error in the timing information for the start point of the first line of the current page was estimated for cases where the average number of readings differs from the standard value. FIG. 3 shows the results. In FIG. 3, the second line of the previous page consists entirely of kanji and the first line of the current page entirely of hiragana, and results are shown for subtitle speeds of 5, 6, and 7 characters per second.

【００７８】同図に示すように、この試算での最大時間
誤差は０．１６２秒（遅れ）であるが、本発明者らが別
途研究している字幕提示タイミングにおける時間誤差の
許容範囲に関する評価実験結果から、概ね±１．０秒程
度の時間誤差は許容範囲にあるとみなすことができるの
で、したがって、上述した統計的手法によって求めた文
字種の平均読み数を利用した第１のタイミング情報付与
方法は、簡便ながらかなり実用的な手法であると言うこ
とができる。As FIG. 3 shows, the maximum time error in this estimate is 0.162 seconds (a delay). According to the results of evaluation experiments on the tolerable range of time error in subtitle presentation timing, which the present inventors are conducting separately, a time error of roughly ±1.0 second can be regarded as tolerable. The first timing information adding method, which uses the statistically determined average number of readings per character type, can therefore be said to be simple yet quite practical.

【００７９】次に、発音記号列に着目した第２のタイミ
ング情報付与方法では、提示単位字幕が呈する文字情報
として、各提示単位字幕に含まれる発音記号列を利用し
て、各発音記号の音素にそれぞれ対応する読み時間を統
計的手法を用いてテーブル化した例えば図５に示すよう
な音素時間表を参照しながら、字幕に付与するタイミン
グ情報を類推演算し、この類推演算結果をタイミング情
報として、分割後の提示単位字幕に付与するのである。The second timing information adding method, which focuses on phonetic symbol strings, uses the phonetic symbol string contained in each presentation unit subtitle as its character information. The timing information to be added to a subtitle is estimated with reference to a phoneme time table, such as the one shown in FIG. 5, that tabulates statistically derived reading times for the phoneme of each phonetic symbol, and the estimate is added to the divided presentation unit subtitle as its timing information.

【００８０】第２のタイミング情報付与方法について、
図４に示すニュース文を例示してさらに詳しく述べる
と、分割字幕文１の文頭「ｔ」と、分割字幕文２の文末
「ａ」と、に予め基準となるタイミング情報が付与され
ており、それぞれのタイミング情報をＴＢ，ＴＥと想定
した場合において、分割字幕文２の文頭「ｉ」のタイミ
ング情報ＴＭは、下記の手順によって類推演算する。な
お、図４における日本語読付け結果は、「，」で区切ら
れた発音記号列であり、各発音記号で表示される
「ｔ」，「ａ」，「ｉ」…などがそれぞれ音素である。
この音素については、音声データベースの解析から得た
図５に示す音素時間表を予め用意されているので、日本
語読付け結果である音素の列びと、その音素に対応する
読み時間である音素時間と、に基づいて、分割字幕文２
の文頭「ｉ」のタイミング情報ＴＭを次述の内挿法を用
いて類推演算することができる。The second timing information adding method is described in more detail using the news sentence shown in FIG. 4. Suppose that reference timing information has been added in advance to the beginning "t" of divided subtitle sentence 1 and to the end "a" of divided subtitle sentence 2, and let these timings be TB and TE respectively. The timing information TM of the beginning "i" of divided subtitle sentence 2 is then estimated by the following procedure. The Japanese reading-assignment result in FIG. 4 is a phonetic symbol string delimited by ",", and each phonetic symbol shown, such as "t", "a", and "i", is a phoneme. Because the phoneme time table shown in FIG. 5, obtained by analyzing a speech database, has been prepared in advance for these phonemes, the timing information TM of the beginning "i" of divided subtitle sentence 2 can be estimated with the interpolation method described next, from the sequence of phonemes in the reading-assignment result and the phoneme time, that is, the reading time, corresponding to each phoneme.

【００８１】すなわち、分割字幕文１，２の各々に対応
する読付け１，２の音素列びから得られる総読み時間Ｔ
Ｒ３，ＴＲ４は次式４，５により求められる。That is, the total reading times TR3 and TR4 obtained from the phoneme sequences of readings 1 and 2, which correspond to divided subtitle sentences 1 and 2 respectively, are given by Equations 4 and 5 below.

【００８２】
ＴＲ３＝Ｔｔ＋Ｔａ＋Ｔｉ＋…＋Ｔｉ＋Ｔｔ＋Ｔｅ …（式４）
ＴＲ４＝Ｔｉ＋Ｔｋ＋Ｔｅ＋…＋Ｔｉ＋Ｔｔ＋Ｔａ …（式５）
ここで、例えば、「Ｔｔ」とは、図５に示す音素時間表
における音素「ｔ」に対応する読み時間５．６２７３１
６であり、また、「Ｔａ」とは、音素時間表における音
素「ａ」に対応する読み時間７．１３０９４１であり、
以下同様に、各音素に対応する読み時間を音素時間表か
ら取り出すことができる。
TR3 = Tt + Ta + Ti + … + Ti + Tt + Te … (Equation 4)
TR4 = Ti + Tk + Te + … + Ti + Tt + Ta … (Equation 5)
Here, for example, "Tt" is the reading time 5.627316 corresponding to the phoneme "t" in the phoneme time table shown in FIG. 5, and "Ta" is the reading time 7.130941 corresponding to the phoneme "a" in the same table; the reading time for every other phoneme can be taken from the phoneme time table in the same way.

【００８３】この積算結果である分割字幕文１，２の各
々に対応する読付け１，２の音素列びから得られる総読
み時間ＴＲ３，ＴＲ４を活用して、分割字幕文１，２間
の分割点である分割字幕文２の文頭「ｉ」のタイミング
情報ＴＭを、時間比率の手法を用いて次式６によって類
推演算する。Using the total reading times TR3 and TR4 accumulated above from the phoneme sequences of readings 1 and 2, which correspond to divided subtitle sentences 1 and 2 respectively, the timing information TM of the beginning "i" of divided subtitle sentence 2, which is the division point between divided subtitle sentences 1 and 2, is estimated by proportional allocation of time according to Equation 6 below.

【００８４】
ＴＭ＝ＴＢ＋（ＴＥ−ＴＢ）＊ＴＲ３／（ＴＲ３＋ＴＲ４） …（式６）
このようにして、分割字幕文２の文頭「ｉ」のタイミン
グ情報ＴＭを類推演算することができ、この類推演算結
果ＴＭをタイミング情報として、分割字幕文２の文頭
「ｉ」に内挿付与するのである。なお、分割字幕文２の
文頭「ｉ」のタイミング情報ＴＭは、分割字幕文１の文
末「ｅ」のタイミング情報として取り扱うこともでき
る。
TM = TB + (TE − TB) * TR3 / (TR3 + TR4) … (Equation 6)
In this way the timing information TM of the beginning "i" of divided subtitle sentence 2 can be estimated, and the estimate TM is added by interpolation to the beginning "i" of divided subtitle sentence 2 as its timing information. The timing information TM of the beginning "i" of divided subtitle sentence 2 can also be treated as the timing information of the end "e" of divided subtitle sentence 1.
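
The phoneme-based interpolation can be sketched in the same way, applying the time-ratio rule to the total reading times TR3 and TR4 obtained from the two phoneme sequences. Only the reading times for "t" and "a" below are taken from the text; the values for "i", "k", and "e", the short phoneme sequences, and the function names are hypothetical placeholders, because the full phoneme time table of FIG. 5 and the full readings of FIG. 4 are not reproduced here.

```python
# Phoneme time table: "t" and "a" are quoted in the text, the rest are
# placeholder values invented for this sketch.
PHONEME_TIME = {
    "t": 5.627316,
    "a": 7.130941,
    "i": 6.0,  # hypothetical
    "k": 5.0,  # hypothetical
    "e": 7.0,  # hypothetical
}


def total_reading_time(phonemes, table=PHONEME_TIME):
    """Sum the tabulated reading time of every phoneme in the sequence."""
    return sum(table[p] for p in phonemes)


def division_timing(tb, te, reading1, reading2, table=PHONEME_TIME):
    """Interpolate the division point from comma-delimited readings."""
    tr3 = total_reading_time(reading1.split(","), table)
    tr4 = total_reading_time(reading2.split(","), table)
    return tb + (te - tb) * tr3 / (tr3 + tr4)
```

A call such as `division_timing(0.0, 10.0, "t,a,i", "i,k,e")` places the division point slightly after the midpoint because the first reading has the larger summed phoneme time under these placeholder values.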

[0085] A simple experiment was used to estimate the time error of the timing information added by the second timing information adding method, and the error was confirmed to converge to about 0.4 seconds. In view of evaluation experiment results indicating that a time error of roughly ±1.0 second is within the acceptable range, the second timing information adding method can be said to be a quite practical and effective technique, like the first timing information adding method described above, which focuses on character type.

[0086] As described above, the method for adding timing information to subtitles according to the present invention applies the technique proposed in the present invention for detecting synchronization between announcement speech and subtitle sentence text, together with statistical feature analysis of Japanese readings and their pronunciation. This makes it possible to automatically add high-precision timing information, corresponding to each division position, to each of the presentation unit subtitles obtained by dividing the text at appropriate positions according to a predetermined presentation format.

[0087] Needless to say, the present invention is not limited to the embodiment described above and can be carried out in other forms by making appropriate changes within the scope of the claims.

[0088]

[Effects of the Invention] As described in detail above, the invention of claim 1 provides a method for adding timing information to subtitles that can automatically add high-precision timing information, corresponding to each division position, to each of the presentation unit subtitles obtained by dividing the subtitle sentence text at appropriate positions according to a predetermined presentation format.

[0089] According to the invention of claim 2, the synchronization detection technique, which is complex and requires a certain amount of processing time, does not have to be applied to every subtitle character. As a result, the immediacy of subtitle presentation can be expected to be well maintained.

[0090] Likewise, according to the invention of claim 4, as with the invention of claim 2, the synchronization detection technique, which is complex and requires a certain amount of processing time, does not have to be applied to every subtitle character. As a result, the immediacy of subtitle presentation can be expected to be well maintained.

[0091] According to the invention of claim 5, the analogy calculation of timing information can be achieved with relatively high precision by a simple technique, which is an extremely advantageous effect.

[Brief description of the drawings]

FIG. 1 is a functional block diagram of an automatic subtitle program production system that embodies the method for adding timing information to subtitles according to the present invention.

FIG. 2 is a diagram showing the results of a survey of the average number of readings in actual TV news sentences.

FIG. 3 is a diagram showing the estimated time error of the timing information adding method that focuses on character type.

FIG. 4 is a diagram showing divided subtitle sentences used to explain the present invention.

FIG. 5 is a diagram showing an example of the phoneme time table used in the timing information adding method that focuses on phonetic symbol strings.

FIG. 6 is a diagram used to explain the technique for detecting the synchronization of subtitle transmission timing with announcement speech.

FIG. 7 is a diagram used to explain the technique for detecting the synchronization of subtitle transmission timing with announcement speech.

FIG. 8 is an explanatory diagram of the current subtitle production flow and an improved version of that flow.

[Explanation of symbols]

11 Automatic subtitle program production system
13 Electronic manuscript recording medium
15 Synchronization detection device
17 Integration device
19 Morphological analysis unit
21 Division rule storage unit
23 Digital video tape recorder (D-VTR)
33 Unit subtitle sentence extraction unit
35 Presentation unit subtitle conversion unit
37 Timing information adding unit

Continued from the front page
(71) Applicant 000006013 Mitsubishi Electric Corporation, 2-2-3 Marunouchi, Chiyoda-ku, Tokyo
(71) Applicant 000004352 Japan Broadcasting Corporation, 2-2-1 Jinnan, Shibuya-ku, Tokyo
(72) Inventor Eiji Sawamura, c/o Telecommunications Advancement Organization, 2-31-19 Shiba, Minato-ku, Tokyo
(72) Inventor Takao Monma, c/o Telecommunications Advancement Organization, 2-31-19 Shiba, Minato-ku, Tokyo
(72) Inventor Takahiro Fukushima, c/o Telecommunications Advancement Organization, 2-31-19 Shiba, Minato-ku, Tokyo
(72) Inventor Ichiro Maruyama, c/o Telecommunications Advancement Organization, 2-31-19 Shiba, Minato-ku, Tokyo
(72) Inventor Terumasa Ebara, c/o Telecommunications Advancement Organization, 2-31-19 Shiba, Minato-ku, Tokyo
(72) Inventor Katsuhiko Shirai, c/o Telecommunications Advancement Organization, 2-31-19 Shiba, Minato-ku, Tokyo
F-terms (reference): 5C023 AA18 AA38 BA11 BA16 CA01 CA05; 5C025 CA09 CB10 DA10

Claims

[Claims]

1. A method for adding timing information to subtitles, used when, in producing a subtitle program, at least a subtitle sentence text serving as the basis of the subtitles is divided at appropriate positions according to a predetermined presentation format and timing information corresponding to each division position is added to each of the resulting presentation unit subtitles, the method comprising: adding reference timing information at various positions in the subtitle sentence text before it is divided at the appropriate positions according to the predetermined presentation format; converting the subtitle sentence text into presentation unit subtitles by dividing it at the appropriate positions; and calculating by analogy, based on the reference timing information and on character information including the character type and number of characters, or the phonetic symbol string, presented by each presentation unit subtitle, the timing information to be added to at least one of the start point and the end point of each presentation unit subtitle after division at the appropriate positions, whereby the timing information calculated by analogy is automatically added to each of the presentation unit subtitles obtained by dividing the subtitle sentence text at the appropriate positions.

2. The method for adding timing information to subtitles according to claim 1, wherein, in calculating by analogy the timing information to be added to at least one of the start point and the end point of each presentation unit subtitle after division at the appropriate positions, the calculation is based on the reference timing information and on character information including the character type and number of characters presented by each presentation unit subtitle, and the reading time of other characters, including kanji, Arabic numerals, and alphabetic characters, is converted to a predetermined multiple, obtained from a statistical survey, of the reading time of characters including katakana.

3. The method for adding timing information to subtitles according to claim 2, wherein the predetermined multiple obtained from the statistical survey is about 1.86 times.

4. The method for adding timing information to subtitles according to claim 1, wherein, in calculating by analogy the timing information to be added to at least one of the start point and the end point of each presentation unit subtitle after division at the appropriate positions, the calculation is based on the reference timing information and on character information including the phonetic symbol string presented by each presentation unit subtitle, and the phoneme times of the phonetic symbol string contained in each presentation unit subtitle are summed with reference to a phoneme time table in which the reading time corresponding to each phoneme of the phonetic symbols has been tabulated by a statistical method.

5. The method for adding timing information to subtitles according to claim 2, wherein the timing information is calculated by analogy using a time-ratio method.
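As an illustration only, the character-type weighting recited in claims 2 and 3 can be sketched in Python. The sketch assumes that hiragana and katakana both count as the base unit and that every other character is weighted by the stated multiple of about 1.86. The Unicode ranges, the names `is_kana` and `weighted_length`, and the treatment of punctuation as an "other" character are assumptions of this sketch and are not taken from the claims.

```python
# Unicode blocks assumed to represent kana (hiragana, katakana).
KANA_RANGES = ((0x3040, 0x309F), (0x30A0, 0x30FF))
OTHER_FACTOR = 1.86  # multiple stated in claim 3


def is_kana(ch):
    """Return True if the character lies in an assumed kana block."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in KANA_RANGES)


def weighted_length(text, factor=OTHER_FACTOR):
    """Relative reading time of a text, counted in kana units.

    Kana count as 1.0 each; all other characters (kanji, Arabic
    numerals, alphabetic characters, and, as a simplification,
    anything else) count as `factor` each.
    """
    return sum(1.0 if is_kana(ch) else factor for ch in text)
```

The weighted lengths of two adjacent presentation unit subtitles could then stand in for the reading times in a time-ratio calculation, since only their ratio matters.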