CN111971744A - Handling speech to text conversion - Google Patents

Handling speech to text conversion

Info

Publication number
CN111971744A
Authority
CN
China
Prior art keywords
links
glyph
link
text
estimated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201980020915.XA
Other languages
Chinese (zh)
Other versions
CN111971744B (en)
Inventor
保罗·安东尼·埃文斯
科林·穆加特罗伊德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Clear Xyz Ltd
Original Assignee
Clear Xyz Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Clear Xyz Ltd
Publication of CN111971744A
Application granted
Publication of CN111971744B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B19/00 Teaching not covered by other main groups of this subclass
    • G09B19/04 Speaking

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Machine Translation (AREA)

Abstract

The speech analysis is performed by: receiving an estimated glyph corresponding to an estimate made by speech-to-text software of what the user says aloud; comparing the estimated glyph to a reference glyph representing text from which the user is attempting to read; and providing feedback related to the user's speech based on the comparison of the estimated glyph to the reference glyph.

Description

Handling speech-to-text conversion

Technical field

The present invention relates to handling speech-to-text (StT) conversion, that is, transcription. It includes methods, apparatus, computer programs and non-transitory computer-readable storage media that may provide improvements and/or advantages in relation to handling speech-to-text transcriptions.

Background

StT transcription software generates a text transcript of the words that the software estimates a person has spoken aloud. StT software can usually do this while the person is speaking. In some cases, the transcript may be output to, for example, a word processor, so that the StT software provides a dictation-transcription service that allows a user to compose letters or other documents by speaking aloud. In other cases, the text transcript may be parsed and interpreted as commands, for example to control devices such as mobile phones, light bulbs or climate control systems.

The claimed invention seeks to provide improved handling of speech-to-text transcriptions, and additional functionality that uses speech-to-text transcriptions.

Summary of the invention

In one aspect, the present invention provides a method of analyzing speech, comprising: receiving an estimate, made by speech-to-text software, of a reader's spoken reading of at least part of a reference text, the estimate comprising a plurality of estimated glyphs, each estimated glyph representing at least one grapheme; comparing the estimated glyphs with a plurality of reference glyphs representing the at least part of the reference text, each reference glyph representing at least one grapheme; and providing feedback relating to the reader's speech based on the comparison of the estimated glyphs with the reference glyphs.

Embodiments of the invention provide an efficient way of analyzing a speech-to-text transcription of a reading from a reference text, so that feedback on the reader's fluency can be provided. In particular, because the comparison is between glyphs representing graphemes or combinations of graphemes, embodiments of the invention are language-independent.

In one embodiment, each reference glyph represents at least one word in the reference text, and each estimated glyph represents a set of one or more alternative words or phrases that the speech-to-text software estimates the reader (user) to have spoken. This embodiment improves the handling of homophones (for example "sail" and "sale"), near-homophones ("are" and "our"), and errors or uncertainty in the transcript caused by stuttering or by a reader's strong accent.

In one embodiment, comparing the estimated glyphs with the reference glyphs comprises: linking each estimated glyph to any matching reference glyph to produce a plurality of links; identifying conflicts between the links; and pruning the links by removing some of the conflicting links so as to resolve all of the identified conflicts. Linking matching glyphs and pruning conflicting links eliminates disfluencies in the transcript, such as stutters, errors and repeated words. Linking all matching glyphs ensures that the largest possible set of links is available for analysis. The removed disfluencies can still be analyzed to provide feedback, either directly or indirectly from their absence; for example, there will be time gaps in the remaining transcript that indicate where disfluencies occurred.

In one embodiment, identifying conflicts between links comprises identifying links that violate at least one of a set of rules, the set of rules comprising: (1) a reference glyph must not be linked to more than one estimated glyph, that is, no reference glyph may have two links; (2) an estimated glyph must not be linked to more than one reference glyph; and (3) no two links may cross each other. Applying these three rules removes links that represent typical errors in reading, giving an accurate picture of the reader's fluency.

In one embodiment, pruning the links comprises: selecting a first link and identifying a set of links that conflict with the first link; determining a cost of keeping each link in the identified set of conflicting links, the cost including the number of links that conflict with each link in the set; and removing the conflicting links except for the conflicting link with the lowest cost. Repeatedly selecting a link, determining the cost associated with keeping that link or the links that conflict with it, and keeping the lowest-cost link helps to ensure that the end result provides the best correlation between the transcript and the reference text.
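The cost-based pruning described in this embodiment can be read in more than one way. The following is a minimal sketch of one possible reading, assuming that each link is stored as a pair (i, j) of reference-glyph and estimated-glyph indices, that the cost of a link is simply the number of surviving links it conflicts with, and that ties are broken by sort order. The function names `in_conflict` and `prune` and the example links are hypothetical and are not taken from the description.

```python
def in_conflict(a, b):
    """True if two distinct (i, j) links break one of the three rules."""
    (i1, j1), (i2, j2) = a, b
    if a == b:
        return False
    # Same reference glyph, same estimated glyph, or crossing links.
    return i1 == i2 or j1 == j2 or (i1 - i2) * (j1 - j2) < 0


def prune(links):
    """Greedy, cost-based pruning; returns a conflict-free subset of links."""
    remaining = set(links)

    def cost(link):
        # Assumed cost of keeping a link: how many surviving links it conflicts with.
        return sum(in_conflict(link, other) for other in remaining)

    for first in sorted(links):
        if first not in remaining:
            continue  # already removed while resolving an earlier conflict
        # The selected link together with every surviving link it conflicts with.
        group = {l for l in remaining if in_conflict(first, l)} | {first}
        keep = min(sorted(group), key=cost)
        remaining -= group - {keep}
    return remaining


# Hypothetical link set: four diagonal links plus two spurious matches.
links = {(1, 1), (1, 4), (2, 2), (3, 3), (4, 1), (4, 4)}
pruned = prune(links)
```

In this example the two spurious links each conflict with five others, so they cost more to keep than the diagonal links they clash with and are the ones removed.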

In one embodiment, the method comprises generating a plurality of link bundles from the plurality of links, each link bundle comprising one or more links that form a contiguous sequence of estimated glyphs matching a contiguous sequence of reference glyphs; and identifying conflicts between links comprises identifying conflicts between link bundles. Pruning the links then comprises removing some of the conflicting link bundles to resolve the identified conflicts. Creating link bundles produces linked sections that represent completely fluent reading. This allows transcripts that contain reading errors or similar "noise" to be handled better and more consistently. Defining a link bundle to include any single link or any contiguous group of links creates the largest possible set of link bundles, which further improves the analysis.
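As an illustration of how link bundles might be enumerated, the sketch below treats a bundle as a run of links (i, j), (i+1, j+1), ... in which both indices advance together, and emits every single link and every contiguous run. The pair representation, the function name `link_bundles` and the example links are assumptions made for illustration only.

```python
def link_bundles(links):
    """Return every bundle: each single link and each contiguous diagonal run."""
    link_set = set(links)
    bundles = []
    for (i, j) in sorted(link_set):
        run = []
        # Extend the run while the next reference glyph is linked
        # to the next estimated glyph.
        while (i, j) in link_set:
            run.append((i, j))
            bundles.append(tuple(run))
            i, j = i + 1, j + 1
    return bundles


# Hypothetical links: a fluent run of three glyphs plus one stray match.
bundles = link_bundles({(1, 1), (2, 2), (3, 3), (1, 3)})
largest = max(bundles, key=len)
```

Here `largest` is the three-link run, which is the kind of bundle that an embodiment preferring larger bundles would tend to keep.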

In one embodiment, pruning the links further comprises keeping larger link bundles in preference to smaller link bundles when resolving the identified conflicts. A large link bundle represents fluent reading with little "noise", so prioritizing large bundles over single links or smaller bundles helps to ensure that the best estimate of the reading is retained.

In another aspect, the present invention provides a method of analyzing a user's speech, comprising: receiving one or more estimated glyphs corresponding to an estimate, made by speech-to-text software, of what the user has said aloud; comparing the one or more estimated glyphs with one or more reference glyphs that the user is attempting to read aloud; and providing feedback relating to the user's speech based on the comparison of the one or more estimated glyphs with the one or more reference glyphs.

Optionally, providing feedback on the user's speech comprises: generating a plurality of links, each link connecting one of the one or more estimated glyphs to one of the one or more reference glyphs; and applying a pruning process to the plurality of links to identify conflicts between the links and to resolve those conflicts.

Optionally, the pruning process comprises applying at least one rule to the plurality of links, the at least one rule specifying that: a glyph of the one or more reference glyphs must not be linked to more than one of the one or more estimated glyphs; a glyph of the one or more estimated glyphs must not be linked to more than one of the one or more reference glyphs; or links connecting glyphs of the one or more estimated glyphs to glyphs of the one or more reference glyphs must not cross each other.

Optionally, the pruning process comprises removing one or more links that do not comply with the at least one rule.

Optionally, the pruning process comprises determining a cost of retaining a link and comparing that cost with the cost of retaining a different link, to determine which of the two links to remove.

Optionally, each link comprises a pair of indices identifying the corresponding glyph in each of the one or more estimated glyphs and the one or more reference glyphs.

Optionally, the method further comprises generating a plurality of link bundles, each link bundle comprising at least one of the plurality of links.

Optionally, the one or more estimated glyphs corresponding to the speech-to-text software's estimate of what the user has said aloud are generated based on an output signal provided by a microphone.

Optionally, the feedback comprises one or more of: at least one parameter indicative of the fluency of the user's speech; and at least one indication of the user's current reading position within the one or more reference glyphs.

Aspects of the invention may be implemented as, for example, an apparatus, a computer program or a non-transitory computer-readable storage medium.

Description of the drawings

Specific embodiments of the invention will now be described by way of non-limiting example, in which:

Figure 1 shows an example of the output of StT transcription software;

Figure 2 shows a method of correlating words in a text to be read with words in an StT transcript;

Figure 3 shows an example of a first rule for pruning correlations between words in the text to be read and words in the StT transcript;

Figure 4 shows an example of a second rule for pruning correlations between words in the text to be read and words in the StT transcript;

Figure 5 shows an example of a third rule for pruning correlations between words in the text to be read and words in the StT transcript;

Figure 6 shows an example of a maximal link set that correlates words in the text to be read with words in the StT transcript;

Figure 7 shows the example of Figure 6, in which a series of links from consecutive text words to consecutive transcript words is indicated by bold arrows and white boxes;

Figure 8 shows a variation of the example maximal link set of Figure 6;

Figure 9 shows the link set of Figure 8 after the first, second and third pruning rules of Figures 3 to 5 have been applied;

Figure 10 shows a variation of the example maximal link set of Figure 6;

Figure 11 shows the link set of Figure 10 after the first, second and third pruning rules of Figures 3 to 5 have been applied;

Figure 12 shows a set of links grouped into link bundles b1 and b2;

Figure 13 shows the link set of Figure 12 after the pruning rule of Figure 3 has been applied to the link bundles;

Figure 14 shows a set of links grouped into link bundles b1 and b2;

Figure 15 shows a set of links grouped into link bundles b1 and b3;

Figure 16 shows a set of links grouped into link bundles b1 and b4;

Figure 17 shows a set of links grouped into link bundles b1 and b6;

Figure 18 schematically shows a method for determining whether a pair of link bundles overlap each other;

Figure 19 shows the method of Figure 2 applied to the example output of Figure 1;

Figure 20 shows the example of Figure 19 part-way through the link pruning process;

Figure 21 shows the example of Figure 19 after the link pruning process has completed; and

Figures 22, 23 and 24 show the link bundle method applied to the output of Figure 1.

Detailed description

The embodiments of the invention represent the best ways currently known to the applicant of putting the invention into practice, but they are not the only ways in which this can be achieved. They are illustrated, and will now be described, by way of example only.

In some cases, it may be desirable to compare the text transcript output by StT transcription software with another text, for example a text that the user is reading. This may, for example, allow the user's current reading position in the text to be established. It may also allow an indication to be given of the accuracy of the speech of the user reading the text. This may allow users who wish to improve their spoken English to read stories, newspapers and other texts aloud and to receive feedback on data points such as reading accuracy, reading speed, fluency and pronunciation. The user may be given feedback on how clearly he or she has spoken the text aloud.

The reader's current reading position can be determined by comparing the StT transcript with the text being read. Under ideal conditions this may be straightforward: the StT transcript may match the text being read exactly, so that there is a one-to-one correspondence between the transcript and the text. In that case, the word most recently added to the StT transcript can be used to identify the current reading position. In practice, however, many factors can make it difficult to track the current reading position. A reader, particularly one who is learning to read, may not read the text fluently. For example, he or she may slow down or stop mid-sentence on encountering an unfamiliar word, and then resume the previous reading speed. The reader may use filler words (for example "um", "so" or "like"). The reader may realize that he or she has made a mistake and go back to the beginning of the word, line, sentence or paragraph to attempt another, fluent reading. The reader may miss out a word; inadvertently skip a line of text; re-read a word, line or sentence; read a short passage of text out of order; or lose his or her place in the text. In addition, the StT software may sometimes transcribe what the reader says incorrectly, particularly if, for example, the reader has an accent that the StT software cannot interpret correctly. Any of these problems can lead to errors in the transcript, which make comparing the text that the user is reading with the output of the StT software difficult and error-prone.

Figure 1 shows an example of the output of StT software when a user gives a disfluent reading. In the example shown, the user attempts to read aloud the text "Come along, Paddington. We'll take you home and you can have a nice hot bath" from Michael Bond's "Please Look After This Bear". The StT software transcribes the user's attempt, and the alternative words that the StT software estimates were spoken are shown in the table of Figure 1, displayed in rows and ordered from left to right by decreasing confidence. For example, the StT software has interpreted the user's attempt at the word "come" at the beginning of the excerpt as (in order of decreasing confidence) "come", "came", "Kim" or "cam". Likewise, the StT software has interpreted "along" as "along", "long", "alone" or "on".

The user's reading starts fluently with "come along", followed by several stuttering attempts at "Paddington" before getting it right. Next comes "we'll take you home", although what the reader actually says is "we'll take you home" followed by a self-correction to "take you home". Finally, we get "and you can have a", followed by mistranscriptions of "nice" and "hot". "Bath" appears to have been missed altogether, but on closer inspection of the alternative words in the last row of the StT output, "hardball's" appears; with a little interpretation, this can be seen as "hot bath".

All of the disfluencies described above can make it difficult to determine, from the StT transcript, the reader's current position in the text at any given time. It is therefore desirable to try to identify and eliminate disfluencies in the StT transcript. To do this, one can first try to correlate the words in the StT transcript with the text being read.

For words that occur infrequently, the correlation may be trivial. For example, a word that occurs once in the text being read and once in the transcript can be correlated with confidence. However, this is not usually the case. In the short example above, the word "you" occurs twice, and if the user read further into the book from which the excerpt is taken, other words would also occur multiple times, including less common words such as "Paddington".

One way of establishing a correlation between the text being read and the StT transcript might be to look for hesitations. For example, one could look for hesitation words or phrases in the transcript, such as "erm", "um", "I mean" and so on, and remove them from consideration when correlating. However, this approach raises several problems. The list of hesitation words and phrases is likely to keep growing as new ones are discovered. Different readers are likely to use different "pet" hesitation phrases, so one reader might use "I mean" while another uses "er, sorry"; when implemented in software, this would lead to frequent software updates, which may be hard to maintain. Hesitation words may also appear legitimately in the text being read, and so should not simply be removed.

Another approach might be to look for repetitions. As the example of Figure 1 shows, readers sometimes stutter and repeat a word or phrase, or even start reading again from the beginning of a sentence or paragraph. One could look for repeated words or word sequences, identify them as repetitions and exclude them from consideration when correlating. However, this approach also has a number of problems. Repeated words may appear legitimately in the text being read. For example, the construction "had had" in the sentence "I had had too many chocolates and so didn't feel very well." is a legitimate use of a repeated word. Because some readers repeat only the misread word, while others go back to the beginning of the line, sentence or paragraph, it is impossible to predict how long a repeated word sequence to look for. In addition, repetitions are often mixed with hesitation phrases and so are not necessarily easy to spot. For example, it is hard to codify how to pick out the repeated and hesitation words in "Today is Wednes, er I mean, today is Wednesday".

Figure 2 shows a general method, according to an embodiment of the invention, of correlating words in a text to be read with words in an StT transcript. The text words are shown along the top of the figure in the order in which they are intended to be read. Each word in the text to be read is given an index (shown as "i" in the illustrated example, where i is a number between 1 and 5), and similarly each word or group of words that the StT software estimates to have been read is given an index (shown as "j" in the illustrated example, where j is a number between 1 and 5).

Correlations are modeled with links, each of which correlates a text word with a transcript word. In this context, a "transcript word" may in fact comprise several alternative transcript words provided by the StT software. The transcript word at index j=5 in Figure 2 illustrates this concept: the StT software has provided several options for what it estimates the final utterance in the illustrated transcript to be, represented by the cascaded squares at index j=5 in the row of transcript words in Figure 2. If one of the alternative transcript words at index j=5 matches a text word, that text word and the set of alternative transcript words are correlated by a single link. For simplicity of presentation, alternative transcript words are generally not shown in the figures, but any transcript word in Figure 2 or the other figures may have several alternative transcript words.
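To make the idea of one link per set of alternatives concrete, the sketch below tests a text word against all alternatives offered for a single utterance. The alternatives are those quoted from Figure 1; the function name `matches` and the case-insensitive exact comparison are assumptions, since the description does not say how a match is decided.

```python
from typing import Sequence


def matches(text_word: str, alternatives: Sequence[str]) -> bool:
    """True if the text word equals any alternative offered for one utterance."""
    target = text_word.casefold()
    # Assumption: a match is a case-insensitive exact comparison.
    return any(alt.casefold() == target for alt in alternatives)


# Alternatives for the first two utterances of Figure 1,
# in order of decreasing confidence.
come = ["come", "came", "Kim", "cam"]
along = ["along", "long", "alone", "on"]

first_word_linked = matches("Come", come)    # text word vs. its own utterance
wrong_pairing = matches("Come", along)       # text word vs. the next utterance
```

A positive result would give rise to a single link between the text word and the whole set of alternatives, whichever alternative produced the match.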

In the figures, a correlation between a text word and a transcript word is depicted as an arrow connecting the text word to the transcript word. A link may sometimes be shown in bold for emphasis. However, correlations may be represented and/or stored in other ways. For example, a given link l may be represented as a pair of values (i, j) indicating the start and end points of the link, that is, the word in the text to be read and the word or group of words in the StT transcript that the link connects. For example, the link shown in Figure 2 may be represented as (3, 2), because it connects the word at index i=3 in the text to be read to the word or group of words at index j=2 in the StT software output.
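One simple way to hold such a pair in code is a named tuple. This is only a sketch: the class name `Link` echoes the Link(i, j) notation used later in the description, while the field names are hypothetical.

```python
from typing import NamedTuple


class Link(NamedTuple):
    """A correlation between one text word and one transcript word."""
    text_index: int        # i: position of the word in the text to be read
    transcript_index: int  # j: position of the word or group of words in the StT output


# The link drawn in Figure 2 joins text word i=3 to transcript word j=2.
link = Link(text_index=3, transcript_index=2)
```

Because a named tuple is still a tuple, the link compares equal to the plain pair (3, 2) and can be stored in a set of candidate links.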

Ultimately, the set of links between text words and transcript words should model a fluent reading of the text as closely as possible. A fluent reading can be characterized as one in which the words are read in the correct order, each word in the text is correlated with at most one word in the transcript and, conversely, each word in the transcript is correlated with at most one word in the text. These constraints are captured by three rules, all of which must be satisfied by every link correlating a text word with a transcript word if the set of links is to represent a fluent reading.

Rule 1: no two links may link the same text word to different transcript words. Because any word in the text should be read only once in a valid reading, there should be at most one link from a text word to a transcript word. As a consequence of Rule 1, the two links shown in Figure 3 cannot both be part of a fluent reading, because they correlate one text word with two transcript words. A text word may not be linked to any transcript word, for example if the word was skipped (that is, not read) by the reader, or if it was transcribed incorrectly by the StT software.

Rule 2: no two links may link the same transcript word to different text words. A word in the transcript should correspond to at most one word in the text. In a perfect reading, every text word would be linked to one transcript word. As a consequence of Rule 2, the links shown in Figure 4 cannot both be part of a fluent reading, because they correlate two text words with one transcript word. A transcript word may not be linked to any text word, for any of a number of reasons: an incorrectly transcribed word, which the reader never said, may not link to the text; hesitation words will not link to text words; and re-read words, for example, may not link.

Rule 3: no two links may cross. Crossing links indicate that words were read out of order. As a consequence of Rule 3, the links shown in Figure 5 cannot all be part of a fluent reading.

Links that violate Rule 1, Rule 2 or Rule 3 are said to be in conflict. Ways of resolving such conflicts are set out in the following paragraphs.
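The three rules can be checked pairwise on links held as (i, j) pairs. The sketch below reports which rule, if any, a pair of links breaks; the function name `violated_rule` and the example pairs are hypothetical, and the crossing test simply checks whether the two links order the text words and the transcript words in opposite directions.

```python
from typing import Optional, Tuple

Link = Tuple[int, int]  # (i, j): text-word index, transcript-word index


def violated_rule(a: Link, b: Link) -> Optional[int]:
    """Return 1, 2 or 3 for the rule that links a and b break, or None."""
    (i1, j1), (i2, j2) = a, b
    if a == b:
        return None  # a link cannot conflict with itself
    if i1 == i2:
        return 1     # one text word linked to two transcript words
    if j1 == j2:
        return 2     # one transcript word linked to two text words
    if (i1 - i2) * (j1 - j2) < 0:
        return 3     # the links cross: words read out of order
    return None


rule_1 = violated_rule((3, 2), (3, 4))
rule_2 = violated_rule((1, 2), (3, 2))
rule_3 = violated_rule((1, 3), (2, 1))
no_conflict = violated_rule((1, 1), (2, 2))
```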

要尝试获得一组表示流利阅读的链接,必须首先在文本单词和转换本单词之间创建最大的链接集。最大的链接集将在文本单词与转换本单词匹配的任何地方都包含一个链接(如上所述,一个转换本单词可能包括StT软件针对单个话语提出的几个替代转换本单词)。因此,最大的链接集不太可能为流利的阅读建模。To try to get a set of links that represent fluent reading, you must first create the largest set of links between text words and translation words. The largest set of links will contain a link wherever a text word matches a translation word (as mentioned above, a translation word may include several alternative translation words proposed by the StT software for a single utterance). Therefore, the largest link sets are unlikely to model fluent reading.

在下面的段落中将更详细地讨论创建最大链接集的步骤,其中,文本中的每个文本单词te在文本Te中都有一个索引,如图2中的以字母i所示。因此,可将文本视为文本单词的数组。类似地,转换本Tr中的每个转换本单词tr都有一个索引,如图2中的以字母J所示。这些步骤可表示如下。The steps to create the maximal link set are discussed in more detail in the following paragraphs, where each text word te in the text has an index in the text Te, as indicated by the letter i in Figure 2. Therefore, text can be thought of as an array of text words. Similarly, each translation word tr in the translation Tr has an index, as indicated by the letter J in FIG. 2 . These steps can be represented as follows.

1. Set L = {}

L is the set of candidate links. It is initially empty and is populated as candidate links are identified.

2. Set i = 1

3. Set te_i to the text word at index i in Te.

4. Set Tr_i = match(te_i, Tr)

Tr_i is the set of transcript words that match te_i. match(te, Tr) examines each transcript word tr in Tr and returns tr if te matches any of the alternative words in that transcript word.

5. For each tr in Tr_i

5.1. Set L = L ∪ {Link(i, j)}, where j is the index of transcript word tr.

Link creates a link between the text word te at index i and the transcript word tr at index j. Note that only the indices are recorded, not the words.

6. Set i = i + 1

7. If i > te_last, where te_last is the index of the last word in Te,

stop

otherwise

go to step 4.

As noted above, this creates the maximal link set, which associates as many text words as possible with as many transcript words as possible. The maximal link set is unlikely to represent a fluent reading of the text because, in most cases, some words occur in the text more than once, so Rule 2 will be broken as soon as such a word is transcribed correctly at least once. Other rules may also be violated.
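The steps above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the representation of the transcript as a list of lists of alternative words, the function names `match` and `maximal_link_set`, and the case-insensitive comparison are all assumptions made for the example.

```python
from typing import List, Set, Tuple

Link = Tuple[int, int]  # (text index i, transcript index j), both 1-based


def match(te: str, transcript: List[List[str]]) -> List[int]:
    """Return the 1-based indices of transcript words having an alternative equal to te."""
    wanted = te.lower()  # assumption: matching ignores letter case
    return [
        j
        for j, alternatives in enumerate(transcript, start=1)
        if wanted in (alt.lower() for alt in alternatives)
    ]


def maximal_link_set(text: List[str], transcript: List[List[str]]) -> Set[Link]:
    links: Set[Link] = set()                  # step 1: L = {}
    for i, te in enumerate(text, start=1):    # steps 2, 3, 6 and 7
        for j in match(te, transcript):       # steps 4 and 5
            links.add((i, j))                 # step 5.1: only indices are stored
    return links


# Hypothetical three-word text and transcript with alternatives per utterance.
text = ["you", "can", "you"]
transcript = [["you", "ewe"], ["can", "cam"], ["hue", "you"]]
links = maximal_link_set(text, transcript)
print(sorted(links))  # [(1, 1), (1, 3), (2, 2), (3, 1), (3, 3)]
```

Because "you" occurs twice in the hypothetical text and is matched twice in the transcript, the resulting set already breaks Rules 1, 2 and 3, which is why pruning is needed.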

Once the maximal link set has been established, a pruning process or method ("pruning") can be applied to it to identify links that break one or more of Rules 1, 2 and 3, and to remove links until all the rules are satisfied. The pruning process should maximize the number of links that remain, since this gives the best correlation between the text and the transcript words.

The preferred pruning process removes links by evaluating the cost of keeping conflicting links. The cost of keeping a link is the number of conflicting links that would have to be removed in order to satisfy the three rules. Given a set of links, the pruning process considers each link and its conflicting links in turn and keeps the link with the lowest cost, since that is the link whose retention removes the fewest conflicting links.

The steps of the preferred pruning process for links can be expressed as follows.

1. Set L_remaining = {}, L_removed = {}.

These sets keep track of the links that have been pruned (removed) and the links that remain. They are initially empty and are populated as the pruning process proceeds.

2. Set L_unprocessed = L − (L_remaining ∪ L_removed)

3. If L_unprocessed = {}

stop

4. Select a link l in L_unprocessed

5. Set L_conflicting = conflicts(l, L_unprocessed)

conflicts(l, L) returns the subset of L containing the links that conflict with l, i.e. that violate at least one of Rules R1, R2 or R3.

6. Set c_l = |L_conflicting|

This is the size of the set L_conflicting, i.e. the number of links that conflict with l.

7. Set c_min = min({c_l′ : c_l′ = |conflicts(l′, L_unprocessed)|, l′ ∈ L_conflicting})

In other words, find the cost c_l′ of keeping each link l′ in L_conflicting, then find the lowest of these costs and store it in c_min.

8. If c_l <= c_min

8.1. L_removed = L_removed ∪ L_conflicting

8.2. L_remaining = L_remaining ∪ {l}

otherwise

8.3. L_removed = L_removed ∪ {l}

Decision time. If c_l <= c_min, link l is kept: l is added to L_remaining and all of its conflicting links are added to L_removed.

If c_l > c_min, l is removed, so l is added to L_removed.

9. Go to step 2

Note that in step 8 links from L are added to either L_remaining or L_removed. Eventually, therefore, L_remaining ∪ L_removed = L and L_unprocessed will be {}, so the pruning process terminates at step 3. On termination, all the links in L_remaining satisfy Rules R1, R2 and R3. The pruning process works on links which, as described earlier, can be represented as pairs of indices. Although the indices refer to words in the text and the transcript, the pruning process makes no use of the words themselves. Nor does it assume any order in which the links are to be considered (for example left to right or right to left). Thus, although the examples described and shown use English words for the purposes of illustration, the pruning process is language-independent and should work equally well for other languages. As well as to words, the pruning process can be applied to any grapheme or combination of graphemes, such as letters, digits, punctuation, syllables, logograms, pictograms, ideograms, shorthand symbols or other representations of spoken language or utterances. In this document the term "glyph" is intended to cover any such representation of spoken language or utterances. In this document, one or more glyphs may make up a "text" (for example the "reference text" referred to in the embodiments described and shown).
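The pruning steps can be sketched in Python as follows, with links held as (i, j) index pairs. This is an illustrative sketch under stated assumptions: the names `in_conflict`, `conflicts` and `prune` are hypothetical, the description does not fix which unprocessed link is selected at step 4 (the sketch takes the smallest pair), and c_min is taken to be 0 when a link has no conflicts.

```python
from typing import Iterable, Set, Tuple

Link = Tuple[int, int]  # (text index i, transcript index j)


def in_conflict(a: Link, b: Link) -> bool:
    """True if two distinct links break Rule 1, 2 or 3 between them."""
    if a == b:
        return False
    (i1, j1), (i2, j2) = a, b
    same_text_word = i1 == i2                 # Rule 1
    same_transcript_word = j1 == j2           # Rule 2
    crossing = (i1 - i2) * (j1 - j2) < 0      # Rule 3
    return same_text_word or same_transcript_word or crossing


def conflicts(link: Link, links: Set[Link]) -> Set[Link]:
    return {other for other in links if in_conflict(link, other)}


def prune(all_links: Iterable[Link]) -> Set[Link]:
    links = set(all_links)
    remaining: Set[Link] = set()              # step 1
    removed: Set[Link] = set()
    while True:
        unprocessed = links - remaining - removed       # step 2
        if not unprocessed:                             # step 3
            return remaining
        link = min(unprocessed)                         # step 4 (assumed order)
        conflicting = conflicts(link, unprocessed)      # step 5
        c_l = len(conflicting)                          # step 6
        c_min = min(                                    # step 7
            (len(conflicts(other, unprocessed)) for other in conflicting),
            default=0,
        )
        if c_l <= c_min:                                # step 8
            removed |= conflicting
            remaining.add(link)
        else:
            removed.add(link)


kept = prune({(1, 1), (1, 2), (2, 2)})
print(sorted(kept))  # [(1, 1), (2, 2)]
```

In the small example, keeping (1, 1) costs one link whereas keeping (1, 2) would cost two, so (1, 2) is the link removed.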

Figure 6 shows a set of links to be pruned. Suppose the pruning process considers link l4 first. l4 conflicts with link l3 (l3 and l4 violate Rule 1, since they associate the same text word with two transcript words) and with link l5 (l4 and l5 violate Rule 2, since they associate two text words with the same transcript word). Each row of the table below gives, for l4 and each of its conflicting links, the cost of keeping that link, the links that conflict with it, and the rule broken by each conflict.

| Link | Cost | Conflicting links |
|------|------|-------------------|
| l3 | 3 | l1 (R3), l2 (R3), l4 (R1) |
| l4 | 2 | l3 (R1), l5 (R2) |
| l5 | 1 | l4 (R2) |

In this example l5 has the lowest cost of keeping (c_min = c5 = 1 versus c4 = 2), so l4 is removed. Suppose the pruning process next considers link l1. Only l3 conflicts with l1. The cost of keeping l1 is c1 = 1, whereas the cost of keeping l3 is c3 = 2, because once l4 has been removed l3 conflicts with l1 and l2 (both under Rule 3). l3 is therefore removed. The remaining links (l1, l2 and l5) do not conflict with one another, so the pruning process ends.

As noted above, the pruning process assumes no particular order in which the links are considered for pruning. In practice, the order does matter. Considering the links of the same transcript in a different order can lead to subtly different correlations, some with more links than others, and the aim of the pruning process is to maximize the number of correlated words in the transcript. Such variation in the number of correlated links tends to arise in many non-fluent readings, especially where a passage of text contains repeated word sequences. That said, even a non-fluent reading will usually contain fluent segments. These segments can be identified as sequences of links from consecutive text words to consecutive transcript words, as shown in Figure 7, where the fluent reading is indicated by bold arrows and white boxes.

In the example of Figure 7, the order in which links are considered for pruning affects which links survive, as will now be described with reference to Figures 8, 9, 10 and 11. In each example the links are numbered in the order in which they are considered for pruning (i.e. link l1 is considered first, then link l2, and so on).

In the example of Figure 8, the pruning process considers link l1 first. The cost of keeping l1 is 1 (l4 must be removed in order to keep l1), and l4 is the only link that conflicts with l1. The cost of keeping l4 is 3 (l1, l2 and l3 would have to be removed), so l4 is removed. No links conflict with l2. One link, l5, conflicts with l3. The cost of keeping l3 is 1 (l4 has already been removed, so the only remaining conflicting link is l5). The cost of keeping l5 is also 1 (its only conflicting link is l3). At this point the pruning process makes an arbitrary decision to keep l3 and remove l5: when the costs of keeping two links are equal, the link under consideration is kept in preference to the link that conflicts with it. In such cases the pruning process could use the weights of the links (i.e. how far down the list of alternative words the search had to go before a match was found) to distinguish between conflicting links, but even this can result in a tie, so an arbitrary decision about which link to keep and which to remove would still be needed.

The pruning process above results in links l1, l2 and l3 being kept, as shown in Figure 9. This is a good result in the sense that these links represent the "fluent" reading indicated by the white boxes and bold arrows in Figure 7.

Figure 10 shows the same example with the links considered in a different order. In the case of Figure 10, the cost of keeping l1 is 2 (the links conflicting with l1 are l4 and l5). The cost of keeping l4 is 3 (l4 conflicts with l1, l2 and l3). The cost of keeping l5 is 1 (l5 conflicts with l1), so l1 is removed. The cost of keeping l2 is 1 (the link conflicting with l2 is l4). The cost of keeping l4 is 2 (the links conflicting with l4 are l2 and l3), so l4 is removed. No links conflict with l3 or l5.

This pruning leaves links l2, l3 and l5, which is sub-optimal: l1, which forms part of the fluent reading of the whole set of words, has been removed.

Hence there may be cases where the order in which links are considered leads to a different overall result, particularly where the maximal link set is especially "noisy", i.e. contains a large number of false links. It is therefore desirable to reduce the likelihood that noisy links survive pruning while "good" links are removed. Even a non-fluent reading will usually contain fluent segments, which can be identified as sequences of links associating consecutive text words with consecutive transcript words. One way to reduce the likelihood of keeping noisy links at the expense of good links is to group these sequences of good links into link bundles, with the aim of running the pruning process described above on the link bundles. To do so, Rules 1, 2 and 3 must be redefined slightly to take account of the behaviour of link bundles rather than individual links, as discussed in the following paragraphs.

The original definition of Rule 1 requires that no two links share the same text word. In the context of link bundles, this means that no link in one link bundle may share a text word with a link in another link bundle. Consequently, link bundles b1 and b2 shown in Figure 12 violate Rule 1; specifically, the conflicting links shown in bold violate Rule 1. If the pruning process is applied to bundles b1 and b2, an entire link bundle is removed, so its non-bold links are removed along with its conflicting links. If bundle b1 is considered first, the pruning process will decide to remove bundle b2. In fact, the best result would be to remove only the two bold links in b2, with the surviving links forming a new bundle, shown as b_n in Figure 13. More generally, bundle b_n would ideally contain all the links in b2 that do not violate Rule 1.

Bundles that violate Rule 1 can be identified as follows. Let b be a link bundle, with b_from an integer giving the index of the first text word included in bundle b, b_to an integer giving the index of the transcript word matched to that text word, and b_size the number of links in the bundle. Link bundles b and b′ violate R1 if:

$$\max(b_{from} + b_{size},\; b'_{from} + b'_{size}) - \min(b_{from},\; b'_{from}) < b_{size} + b'_{size}$$

Figure 18 shows schematically why this expression identifies link bundles that violate Rule 1. The upper half of Figure 18 illustrates the left-hand side of the expression. The first link bundle b is shown as a rectangle containing a row of squares, each square representing one link in bundle b (or, put another way, one word of the text from which a link in bundle b extends). Bundle b starts at the left of the figure (at index b_from) and extends to the right by a distance (a number of squares) equal to the number of links in bundle b (b_size), i.e. the number of consecutive text words linked by the links in bundle b. Bundle b therefore ends at b_from + b_size. The second link bundle b′ (also shown as a rectangle) starts at b′_from and extends to the right by a distance equal to the number of links in bundle b′ (b′_size), so bundle b′ ends at b′_from + b′_size. The overall span of text covered by the two bundles therefore runs from the smaller of b_from and b′_from (the leftmost point in Figure 18) to the larger of b_from + b_size and b′_from + b′_size (the rightmost point in Figure 18).

The lower half of Figure 18 illustrates the right-hand side of the expression. If the two link bundles b and b′ do not overlap (i.e. no text word is linked both by a link in bundle b and by a link in bundle b′), the smallest span of text they can cover is b_size + b′_size (shown as two rectangles, one per bundle, placed end to end). Hence, if the two bundles overlap (i.e. together they violate Rule 1), the left-hand side of the expression (upper half of Figure 18) is smaller than the right-hand side (lower half of Figure 18). If the two bundles do not overlap (i.e. they do not violate Rule 1), the left-hand side is at least as large as the right-hand side.
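The span test can be tried on concrete numbers. The sketch below encodes a bundle as a hypothetical `Bundle` named tuple with fields `b_from`, `b_to` and `b_size`; the bundle values are invented for illustration and are not taken from the figures.

```python
from typing import NamedTuple


class Bundle(NamedTuple):
    b_from: int  # index of the first text word in the bundle
    b_to: int    # index of the transcript word linked to that text word
    b_size: int  # number of links in the bundle


def violates_rule1(b: Bundle, other: Bundle) -> bool:
    """True if the two bundles share at least one text word."""
    span = max(b.b_from + b.b_size, other.b_from + other.b_size) - min(b.b_from, other.b_from)
    return span < b.b_size + other.b_size


first = Bundle(b_from=1, b_to=1, b_size=3)     # covers text words 1 to 3
second = Bundle(b_from=3, b_to=5, b_size=2)    # covers text words 3 to 4
third = Bundle(b_from=4, b_to=6, b_size=1)     # covers text word 4 only

print(violates_rule1(first, second))  # True: both bundles use text word 3
print(violates_rule1(first, third))   # False: the bundles sit end to end
```

For `first` and `second` the combined span is 5 − 1 = 4 words, which is less than 3 + 2 = 5, so the bundles must overlap; for `first` and `third` the span of 4 equals 3 + 1.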

Rule 2 can be modified in a similar way to identify bundles that violate it. Link bundles b and b′ violate Rule 2 if:

$$\max(b_{to} + b_{size},\; b'_{to} + b'_{size}) - \min(b_{to},\; b'_{to}) < b_{size} + b'_{size}$$

Rule 3 can also be extended to the context of link bundles. Link bundles b and b′ violate Rule 3 if:

$$\operatorname{sgn}(b_{from} - b'_{from}) + \operatorname{sgn}\big(b_{to} + b_{size} - (b'_{to} + b'_{size})\big) = 0$$

where sgn is the sign (signum) function, defined as sgn(i) = +1 for i > 0, 0 for i = 0, and −1 for i < 0.

This condition detects link bundles that violate Rule 3, and also some cases in which link bundles violate Rule 1 or Rule 2. Because the pruning process operates on sets, this has no effect on how it works: adding a bundle to a set more than once, for example because it has been detected as violating both Rule 3 and Rule 1, does not result in multiple instances of the same bundle in the set.
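The Rule 3 condition for bundles can be checked numerically. The following sketch implements the expression literally; bundles are written as (b_from, b_to, b_size) tuples, and the function names and sample bundles are hypothetical. Note that the expression also evaluates to 0 for two identical bundles, so callers are assumed to compare distinct bundles only.

```python
from typing import Tuple

Bundle = Tuple[int, int, int]  # (b_from, b_to, b_size)


def sgn(x: int) -> int:
    """Sign function: +1 for x > 0, 0 for x == 0, -1 for x < 0."""
    return (x > 0) - (x < 0)


def violates_rule3(b: Bundle, other: Bundle) -> bool:
    b_from, b_to, b_size = b
    o_from, o_to, o_size = other
    return sgn(b_from - o_from) + sgn(b_to + b_size - (o_to + o_size)) == 0


# Crossing: text words 1-2 go to transcript 5-6, text words 3-4 go to transcript 1-2.
print(violates_rule3((1, 5, 2), (3, 1, 2)))  # True

# In order: text words 1-2 go to transcript 1-2, text words 3-4 go to transcript 3-4.
print(violates_rule3((1, 1, 2), (3, 3, 2)))  # False

# A small bundle nested inside a larger one is also flagged, although it
# is really a Rule 1 and Rule 2 violation rather than a crossing.
print(violates_rule3((1, 1, 5), (2, 2, 2)))  # True
```

For single-link bundles (size 1) the expression reduces to sgn(i − i′) + sgn(j − j′) = 0, which holds exactly when one link starts earlier in the text but ends later in the transcript, i.e. when the two links cross.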

Figure 14 shows an example of link bundles that cross. In this example the bundles simply cross, and the pruning process will remove one of b1 and b2.

As with links, it is advantageous to create a maximal set of candidate link bundles. On the face of it this seems straightforward: simply collect the links in L (the maximal set of candidate links) that associate consecutive text words with consecutive transcript words into link bundles. However, that would not give the maximal set of link bundles. Given a link bundle b containing, say, three consecutive links l1, l2 and l3, link bundles containing l1 and l2, l2 and l3, and each of l1, l2 and l3 on its own are also created. This gives a maximal set of candidate link bundles, each containing only consecutive links, and solves the problem highlighted above when discussing Rule 1 in the context of link bundles. The example in Figure 12 shows two bundles b1 and b2 that violate Rule 1. The pruning process will remove b2, so that Rule 1 is no longer violated. What the example in Figure 12 does not show (for clarity) is that other candidate link bundles also have to be considered, such as those shown in Figures 15 and 16. The maximal set of link bundles created will include bundles b3 and b4. These also conflict with b1 (under Rule 1) and so will be removed by the pruning process. However, because a maximal set of candidate link bundles has been created, there will also be another candidate link bundle b6 (shown in Figure 17 and corresponding to bundle b_n of Figure 13) that does not conflict with b1. The problem highlighted for R1 is therefore solved without any special handling.
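One way to build such a maximal set of candidate bundles is sketched below. The description names the operation createLinkBundles but gives no listing, so the function body, the name `create_link_bundles` and the (b_from, b_to, b_size) tuple encoding are assumptions made for illustration: every run of consecutive links, and every contiguous sub-run of it, becomes a bundle.

```python
from typing import Iterable, Set, Tuple

Link = Tuple[int, int]         # (text index, transcript index)
Bundle = Tuple[int, int, int]  # (b_from, b_to, b_size)


def create_link_bundles(links: Iterable[Link]) -> Set[Bundle]:
    link_set = set(links)
    bundles: Set[Bundle] = set()
    for i, j in link_set:
        size = 1
        # Extend the run that starts at (i, j) one consecutive link at a time,
        # recording a bundle for every length reached.
        while (i + size - 1, j + size - 1) in link_set:
            bundles.add((i, j, size))
            size += 1
    return bundles


bundles = create_link_bundles({(1, 1), (2, 2), (3, 3), (5, 2)})
print(sorted(bundles))
# [(1, 1, 1), (1, 1, 2), (1, 1, 3), (2, 2, 1), (2, 2, 2), (3, 3, 1), (5, 2, 1)]
```

The three consecutive links (1, 1), (2, 2) and (3, 3) yield six bundles: the full run, the two two-link sub-runs and the three single links. The isolated link (5, 2) yields a single bundle of size 1.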

The steps of the preferred pruning process for link bundles can be expressed as follows.

Starting from a set of candidate links L, the steps of the pruning process are:

1. Set B = createLinkBundles(L)

createLinkBundles implements the steps outlined above to create the maximal set of candidate link bundles.

2. Set B_remaining = {}, B_removed = {}

These sets keep track of the link bundles that have been pruned (removed) and the link bundles that remain. They are initially empty and are populated as the pruning process proceeds.

3. Set i = max({b_size : b ∈ B − B_removed})

This finds the size of the largest bundle, so that the original pruning process can be run on the largest bundles first, then on the next largest bundles, and so on.

4. Set B_i = {b : b_size == i, b ∈ B − B_removed}

5. Set B_remaining_i = prune(B_i)

This is the original pruning process described above, applied to link bundles using Rules 1, 2 and 3 as extended to link bundles above.

6. Set B_remaining = B_remaining ∪ B_remaining_i, and B_removed = B_removed ∪ (B_i − B_remaining_i)

B_remaining_i is the set of link bundles that survived pruning; these need to be added to B_remaining. B_i − B_remaining_i is the set of link bundles that were pruned; these need to be added to B_removed.

7. For each link bundle b in B_remaining

7.1. Set B_conflicting = conflicts(b, {b′ : b′ ∈ B ∧ b′_size < i})

conflicts(b, B) returns the subset of B containing the link bundles that conflict with b, i.e. that violate at least one of Rules 1, 2 or 3 as modified for link bundles.

7.2. Set B_removed = B_removed ∪ B_conflicting

Any bundle b′ ∈ B that conflicts with b and has size < i should also be removed. The main purpose of link bundles is to remove noisy links, so a larger link bundle is kept in preference to any smaller link bundle that conflicts with it.

8. If i == 0

stop

otherwise

go to step 3.

If i == 0 the pruning process ends: there are no link bundles of size 0. Otherwise there may still be link bundles that have not been considered for pruning.
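The bundle pruning steps can be sketched in Python as follows. This is a hedged illustration rather than the patented implementation. Bundles are (b_from, b_to, b_size) tuples; all function names are hypothetical; the choice of which unprocessed bundle to consider first is an assumption; and instead of recomputing the largest size and looping until i == 0, the sketch simply visits the distinct bundle sizes from largest to smallest, which is assumed to be the intended effect of steps 3 and 8.

```python
from typing import Iterable, Set, Tuple

Bundle = Tuple[int, int, int]  # (b_from, b_to, b_size)


def sgn(x: int) -> int:
    return (x > 0) - (x < 0)


def overlaps(start_a: int, size_a: int, start_b: int, size_b: int) -> bool:
    """Span test used by the bundle forms of Rules 1 and 2."""
    span = max(start_a + size_a, start_b + size_b) - min(start_a, start_b)
    return span < size_a + size_b


def bundles_conflict(b: Bundle, c: Bundle) -> bool:
    if b == c:
        return False
    rule1 = overlaps(b[0], b[2], c[0], c[2])
    rule2 = overlaps(b[1], b[2], c[1], c[2])
    rule3 = sgn(b[0] - c[0]) + sgn(b[1] + b[2] - (c[1] + c[2])) == 0
    return rule1 or rule2 or rule3


def prune(items: Set[Bundle]) -> Set[Bundle]:
    """Cost-based pruning, applied here to bundles of a single size."""
    remaining: Set[Bundle] = set()
    removed: Set[Bundle] = set()
    while True:
        unprocessed = items - remaining - removed
        if not unprocessed:
            return remaining
        b = min(unprocessed)  # assumption: no selection order is prescribed
        conflicting = {c for c in unprocessed if bundles_conflict(b, c)}
        c_min = min(
            (sum(bundles_conflict(c, d) for d in unprocessed) for c in conflicting),
            default=0,
        )
        if len(conflicting) <= c_min:
            remaining.add(b)
            removed |= conflicting
        else:
            removed.add(b)


def prune_bundles(bundles: Iterable[Bundle]) -> Set[Bundle]:
    all_bundles = set(bundles)                   # B (step 1 output)
    remaining: Set[Bundle] = set()               # B_remaining (step 2)
    removed: Set[Bundle] = set()                 # B_removed
    for size in sorted({b[2] for b in all_bundles}, reverse=True):   # step 3
        tier = {b for b in all_bundles if b[2] == size} - removed    # step 4
        kept = prune(tier)                                           # step 5
        remaining |= kept                                            # step 6
        removed |= tier - kept
        smaller = {b for b in all_bundles if b[2] < size}
        for b in remaining:                                          # step 7
            removed |= {c for c in smaller if bundles_conflict(b, c)}
    return remaining


candidates = [
    (1, 1, 3), (1, 1, 2), (2, 2, 2), (1, 1, 1), (2, 2, 1), (3, 3, 1),
    (3, 5, 2), (3, 5, 1), (4, 6, 1),
]
print(sorted(prune_bundles(candidates)))  # [(1, 1, 3), (4, 6, 1)]
```

In this invented example the three-link bundle (1, 1, 3) is kept, the two-link bundle (3, 5, 2) is removed because it shares text word 3 with it, and the single-link bundle (4, 6, 1) survives, mirroring the role of the new bundle discussed for Rule 1.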

Figure 19 shows the method of Figure 2 applied to the example of a non-fluent reading illustrated and described in the context of Figure 1. The reader attempts to read aloud the text "Come along, Paddington. We'll take you home and you can have a nice hot bath" from "Please Look After This Bear" by Michael Bond. The left-hand column of the figure provides an index that can be used with both the text to be read (whose words are in the middle column) and the output of the StT software (whose words, including the alternative words proposed by the StT software for each utterance, are in the right-hand column).

An initial set of links has been created connecting words in the text to words in the transcript. Each link connects a word in the text to a word, or a group of alternative words, in the transcript, and is shown in the figure as an arrow. This link set is the maximal link set, since a link is created wherever a word in the text matches one of the alternative words in the StT software output. For example, the word "Come" at index 1 of the text to be read is linked to the word group "come", "came", "Kim" and "cam" at index 1 of the StT software output, and to no other word or word group in that output, because the word "come" does not appear anywhere else in the StT software output. The word "take" at index 8 of the text to be read, however, is linked to the word groups at index 8 and index 11 of the StT output, because the word "take" appears at both of those positions in the StT software output. Similarly, the word "you" at index 12 of the text to be read is linked to the word groups at indices 9, 12 and 15 of the StT output, because each of those word groups contains the word "you".

As noted above, although links are depicted as arrows in the figures, they may in practice be represented and/or stored in other ways. For example, a given link l can be represented as a pair of values (i, j) indicating the start and end of the link, i.e. the index of the word in the text to be read and the index of the word or word group in the StT software output to which it is linked. For example, the link connecting the word "hot" in the text to be read to the word group "hot", "hope", "hop", "hawks", "hotdogs" etc. can be represented as (20,20), because it connects the word at index 20 of the text to be read to the word group at index 20 of the StT software output.

Once the maximal link set has been created, the pruning process is applied to it in an attempt to reduce the link set to one that is as close as possible to a fluent reading of the text to be read. The pruning process involves applying Rules 1 to 3 discussed above to the maximal link set, in an attempt to identify and remove the links that make the reading non-fluent.

In the first instance, the sets L_remaining and L_removed are empty and the set L_unprocessed contains all 19 links shown in Figure 19 (links (1,1), (2,2), (6,6) and so on). Since L_unprocessed is not empty, link (1,1) is selected as the first link to be tested. No link conflicts with link (1,1), so the set L_conflicting = {} (the empty set). Accordingly, c_l = 0 and c_min = 0 (since there are no conflicting links, the cost of keeping them is 0). Because c_l = c_min, L_removed = {} and L_remaining = L_remaining ∪ {(1,1)}. (1,1) is then removed from L_unprocessed, leaving 18 links still to be processed.

Thus, after the pruning process has been applied to link (1,1), the link is kept in L_remaining and removed from L_unprocessed. The same applies when links (2,2), (6,6) and (7,7) are considered for pruning.

Next, link (8,8) (still in L_unprocessed) is selected. The set L_conflicting = {(8,11)}, because that link conflicts with link (8,8) under Rule 1. Set c_l = 1 and c_min = 4 (because (8,11) conflicts with link (8,8) under Rule 1 and with links (12,9), (13,10) and (15,9) under Rule 3, four conflicts in total). In this case c_l < c_min. Therefore L_removed = L_removed ∪ {(8,11)} and L_remaining = L_remaining ∪ {(8,8)} = {(1,1), (2,2), (6,6), (7,7), (8,8)}. (8,8) is removed from L_unprocessed.

(12,9) is taken as the next element of L_unprocessed to be considered. The set L_conflicting = {(12,12), (12,15), (15,9)}, because (12,9) conflicts with (12,12) and (12,15) under Rule 1 and with (15,9) under Rule 2. Set c_l = 3 (because (12,9) has three conflicts in total) and c_min = 5. The cost of keeping link (12,12) is 5, because it conflicts with (12,9) and (12,15) under Rule 1, with (15,12) under Rule 2, and with (13,10) and (15,9) under Rule 3. The cost of keeping (12,15) is 8, because it conflicts with (12,9) and (12,12) under R1, with (15,15) under R2, and with (13,10), (13,13), (14,14), (15,9) and (15,12) under R3. The cost of keeping (15,9) is 7, because it conflicts with (12,9) under R2 and with (13,10), (12,12), (13,13), (14,14), (15,12) and (15,15) under R3. In this case c_l < c_min. Therefore L_removed = {(8,11)} ∪ {(12,12), (12,15), (15,9)} and L_remaining = {(1,1), (2,2), (6,6), (7,7), (8,8)} ∪ {(12,9)}. (12,9) is removed from L_unprocessed. L_unprocessed still contains elements.

……...

图20示出了过程中此刻保留在Lremaining或Lunprocessed中的链接。Figure 20 shows the links remaining in L remaining or L unprocessed at this point in the process.

将(13,10)作为Lunprocessed的下一个要考虑的元素。集合Lconflicting={(13,13)},因为(13,10)根据规则1与(13,13)冲突。设置cl=1(因为(13,10)仅与(13,13)冲突)且cmin=2(保留(13,13)的成本为2,因为它根据规则1与(13,10)冲突并且根据规则与(15,12)冲突)。在这种情况下,cl<cmin。因此集合Lremoved={(8,11),(12,12),(12,15),(15,9)}∪{(13,13)}且Lremaining={(1,1),(2,2),(6,6),(7,7),(8,8),(12,9)}∪{(13,10)}。从Lunprocessed中去除(13,10)。Lunprocessed中仍然包含元素。Take (13,10) as the next element to consider for L unprocessed . The set L conflicting = {(13, 13)} because (13, 10) conflicts with (13, 13) according to rule 1. Setting cl = 1 (because (13,10) only conflicts with (13,13)) and cmin = 2 (retaining (13,13) costs 2 because it conflicts with (13,10) according to rule 1 and conflict with (15,12) according to the rules). In this case, c l <c min . Hence the set L removed = {(8,11), (12,12), (12,15), (15,9)}∪{(13,13)} and L remaining ={(1,1),( 2,2), (6,6), (7,7), (8,8), (12,9)}∪{(13,10)}. Remove (13,10) from L unprocessed . L unprocessed still contains elements.

Take (14,14) as the next element of L_unprocessed to consider. Set L_conflicting = {(15,12)}, because (14,14) conflicts with (15,12) under rule 3. Set c_l = 1 (because (14,14) conflicts only with (15,12)) and c_min = 2 (the cost of keeping (15,12) is 2, because it conflicts with (15,15) under rule 1 and with (14,14) under rule 3). In this case, c_l < c_min. Therefore set L_removed = {(8,11), (12,12), (12,15), (15,9), (13,13)} ∪ {(15,12)} and L_remaining = {(1,1), (2,2), (6,6), (7,7), (8,8), (12,9), (13,10)} ∪ {(14,14)}. Remove (14,14) from L_unprocessed. L_unprocessed still contains elements.

None of the remaining unprocessed links has a conflicting link, so no further links are removed when the remaining elements of L_unprocessed are processed. After the last iteration of the pruning process, which uses link (20,20), L_removed = {(8,11), (12,12), (12,15), (15,9), (13,13), (15,12)} and L_remaining = {(1,1), (2,2), (6,6), (7,7), (8,8), (12,9), (13,10), (14,14), (15,15), (16,16), (17,17), (18,18), (20,20)}. The pruning process terminates when L_unprocessed = L - (L_remaining ∪ L_removed) = {}. Figure 21 shows the links in L_remaining at that point, i.e. when the pruning process is complete.
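The iterations walked through above can be replayed with a short Python sketch. Several points are assumptions rather than statements from the text: the function name `prune`, the processing order (sorted by reference index, then transcript index), and the branch taken when c_l is not less than c_min, which never occurs in this example and is filled in here by discarding the current link. Under those assumptions the sketch reproduces the final L_remaining and L_removed given above.

```python
def conflicts(a, b):
    """True if links a and b violate rule 1, rule 2 or rule 3."""
    (ja, ka), (jb, kb) = a, b
    return a != b and (ja == jb or ka == kb or (ja - jb) * (ka - kb) < 0)


def prune(links):
    """Greedy link pruning; returns (remaining, removed)."""
    remaining, removed = [], set()
    for link in sorted(links):  # ASSUMPTION: links are processed in sorted order
        if link in removed:
            continue
        active = [l for l in links if l not in removed]
        conflicting = [l for l in active if conflicts(link, l)]
        c_l = len(conflicting)
        # Cheapest cost of keeping any one of the conflicting links instead.
        c_min = min((sum(conflicts(c, o) for o in active) for c in conflicting),
                    default=None)
        if c_min is None or c_l < c_min:
            remaining.append(link)       # keep this link
            removed.update(conflicting)  # and drop everything it conflicts with
        else:
            removed.add(link)  # ASSUMPTION: branch not shown in the worked example
    return remaining, removed


# Full link set L, reconstructed from the final L_remaining and L_removed.
L = [(1, 1), (2, 2), (6, 6), (7, 7), (8, 8), (8, 11), (12, 9), (12, 12),
     (12, 15), (13, 10), (13, 13), (14, 14), (15, 9), (15, 12), (15, 15),
     (16, 16), (17, 17), (18, 18), (20, 20)]

remaining, removed = prune(L)
print(remaining)
print(sorted(removed))
```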

Figure 22 shows the same example of text and StT transcript output, this time using link bundles. Figure 22 shows the maximal set of link bundles. Link bundles containing more than one link are shown as large dashed arrows. The links contained in a link bundle are shown as unfilled hollow arrows. Link bundles containing only one link are shown as simple black line arrows. There are 9 link bundles in total in Figure 22: 4 shown as large arrows and 5 shown as black line arrows. For clarity, the link bundles are not explicitly labelled in the figure. Link bundles are identified using the notation (j,k,m), denoting a link bundle containing m links whose first link runs from text word j to transcript (StT output) word k. Thus the link bundle at the top of Figure 22 (the one containing links (1,1) and (2,2)) is identified as (1,1,2), because the first link in the bundle links the text word at index 1 ("come") with the transcript word at index 1 ("come", "came", "Kim", "cam"), and the bundle contains 2 links in total.
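One way to derive maximal link bundles in the (j,k,m) notation from a set of links is sketched below in Python. The function name `maximal_bundles` and the approach (start a bundle at every link that has no diagonal predecessor, then extend it) are illustrative assumptions, and the link list is reconstructed from the worked example. On that data the sketch yields 9 bundles, 4 of which contain more than one link.

```python
def maximal_bundles(links):
    """Group links into maximal runs (j,k), (j+1,k+1), ... and return (j, k, m) triples."""
    link_set = set(links)
    bundles = []
    for j, k in sorted(link_set):
        if (j - 1, k - 1) in link_set:
            continue  # this link continues an earlier run, so it does not start a bundle
        m = 1
        while (j + m, k + m) in link_set:
            m += 1
        bundles.append((j, k, m))
    return bundles


# Links of the worked example (reconstructed from the sets quoted in the text).
L = [(1, 1), (2, 2), (6, 6), (7, 7), (8, 8), (8, 11), (12, 9), (12, 12),
     (12, 15), (13, 10), (13, 13), (14, 14), (15, 9), (15, 12), (15, 15),
     (16, 16), (17, 17), (18, 18), (20, 20)]

bundles = maximal_bundles(L)
print(bundles)
```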

Applying the link bundle pruning process described above to the example shown in Figure 22 proceeds as follows.

Set B = {b_{1,1,2}, b_{6,6,3}, b_{8,11,1}, b_{12,9,2}, b_{12,15,1}, b_{12,12,7}, b_{15,9,1}, b_{15,12,1}, b_{20,20,1}, …}. B also includes other link bundles that are sub-bundles of those listed here. For example, in addition to b_{1,1,2} there are also b_{1,1,1} and b_{2,2,1}. For simplicity, however, these sub-bundles are not listed in the steps above or below.

Initially, the sets B_remaining and B_removed are empty. Set i = 7 (the size of the largest bundle, which in this example is b_{12,12,7}). B_i is the set of link bundles of size i. In the example shown, b_{12,12,7} is the only link bundle of size 7, so for i = 7 set B_i = {b_{12,12,7}}. B_remaining_i is the result of running the link pruning process on B_i. Because B_i contains only one bundle, the link pruning process prunes nothing. Therefore set B_remaining = B_remaining ∪ B_remaining_i = {b_{12,12,7}} and B_removed = {}.

Next, for each b in B_remaining (in this case only b_{12,12,7}), determine B_conflicting: B_conflicting = {b_{12,9,2}, b_{12,15,1}, b_{15,9,1}, b_{15,12,1}} (because b_{12,12,7} conflicts with b_{12,9,2} and b_{15,9,1} under rule 2 and with b_{12,15,1} and b_{15,12,1} under rule 1). Set B_removed = B_removed ∪ B_conflicting = {b_{12,9,2}, b_{12,15,1}, b_{15,9,1}, b_{15,12,1}}. Figure 23 shows the link bundles remaining at this point in the link bundle pruning process.

Next, set i = 3 (the next-largest link bundle size remaining in B - B_removed). Set B_i = {b_{6,6,3}}, because only one bundle of size 3 remains. Set B_remaining_i = {b_{6,6,3}}. For b_{6,6,3}, B_conflicting is {b_{8,11,1}} (because b_{6,6,3} conflicts only with b_{8,11,1}, under rule 2). Set B_removed = {b_{12,9,2}, b_{12,15,1}, b_{15,9,1}, b_{15,12,1}} ∪ {b_{8,11,1}} = {b_{8,11,1}, b_{12,9,2}, b_{12,15,1}, b_{15,9,1}, b_{15,12,1}}. Figure 24 shows the link bundles remaining at this point.

Because no conflicting link bundles remain to be removed, further iterations of the link bundle pruning process make no further changes to the link bundles. Eventually, after link bundles of sizes 2 and 1 have been tested for conflicts (i = 0), the link bundle pruning process stops.
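The size-ordered bundle pruning traced above can be approximated with the following Python sketch. It rests on simplifying assumptions: two bundles are treated as conflicting when any link of one conflicts with any link of the other, sub-bundles are ignored (as in the walkthrough), and conflicts between bundles of equal size are resolved first-come rather than by running the link pruning process on B_i. All function names are hypothetical. On the example data it arrives at the same B_removed as the text.

```python
def links_conflict(a, b):
    """True if two links share a reference word, share a transcript word, or cross."""
    (ja, ka), (jb, kb) = a, b
    return a != b and (ja == jb or ka == kb or (ja - jb) * (ka - kb) < 0)


def bundle_links(bundle):
    """Expand a (j, k, m) bundle into its m links."""
    j, k, m = bundle
    return [(j + n, k + n) for n in range(m)]


def bundles_conflict(a, b):
    """ASSUMPTION: bundles conflict if any pair of their links conflicts."""
    return a != b and any(links_conflict(x, y)
                          for x in bundle_links(a) for y in bundle_links(b))


def prune_bundles(bundles):
    """Keep larger bundles in preference to smaller ones; returns (remaining, removed)."""
    remaining, removed = [], set()
    for size in sorted({m for _, _, m in bundles}, reverse=True):
        for b in [x for x in bundles if x[2] == size]:
            if b in removed:
                continue
            remaining.append(b)
            removed.update(o for o in bundles
                           if o not in remaining and bundles_conflict(b, o))
    return remaining, removed


B = [(1, 1, 2), (6, 6, 3), (8, 11, 1), (12, 9, 2), (12, 15, 1),
     (12, 12, 7), (15, 9, 1), (15, 12, 1), (20, 20, 1)]

b_remaining, b_removed = prune_bundles(B)
print(b_remaining)
print(sorted(b_removed))
```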

In both the link case shown in Figure 21 and the link bundle case shown in Figure 24, the result is a set of correlations between words in the reference text (e.g. the text the user is reading) and words in the StT software transcript (e.g. the output of the software, which has interpreted an input signal from a microphone or other device that detects the sounds produced by the reader).

The pruning processes and methods described above make it possible to provide various kinds of feedback on the user's speech. For example, they allow the reader's current reading position to be tracked from the StT transcript, in a way that tolerates a large amount of non-fluent reading. This enables features such as highlighting the reader's current position on screen. For example, if the user is reading text shown on a display (such as a computer screen, phone screen, tablet screen or other display whose output can be controlled), the display may highlight, underline or otherwise visually indicate the user's current position in the text as the user reads. Depending on user preference and the particular situation, the display may highlight the current word, line, sentence or paragraph. This helps readers keep their place so that they do not get lost in the text. It can also help ensure that the speaker says every word accurately. For example, if the StT software determines that the speaker did not say a particular word accurately (e.g. the speaker mispronounced the word or skipped it), the highlighting may not advance past that word until it is determined that the user has said it correctly. This can help improve the speaker's ability to read text aloud confidently and accurately.
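As an illustration of the strict highlighting behaviour described here, the following Python sketch places the highlight on the first reference word that has no surviving link. The function name `highlight_index`, the 1-based indexing and the "stop at the first unlinked word" policy are assumptions made for this example; a real implementation could equally tolerate some skipped words.

```python
def highlight_index(remaining_links, n_reference_words):
    """Return the 1-based index of the first reference word without a link.

    remaining_links holds (reference index, transcript index) pairs left after pruning.
    Returns n_reference_words + 1 once every reference word has been linked.
    """
    linked = {j for j, _ in remaining_links}
    position = 1
    while position <= n_reference_words and position in linked:
        position += 1
    return position


# Words 1 and 2 are linked, word 3 is not, so the highlight waits on word 3.
print(highlight_index([(1, 1), (2, 2), (6, 6)], 20))
```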

The pruning processes and methods also make it possible to track how many words of a given reference text the reader has read correctly, and to identify the words that were not read correctly. For example, they can be used to maintain an ordered list of words that are unlinked or only intermittently linked. The list can be ordered by the number of times a word was left unlinked as a proportion of the number of times it was encountered in reading, which gives a kind of error rate (whether the error originates with the reader or with the software). In some embodiments, the list may contain only words whose error rate is above a threshold.
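A minimal sketch of such an ordered error-rate list follows. The input format (a sequence of `(word, was_linked)` pairs), the function name `error_rate_list` and the default threshold are all hypothetical choices for illustration.

```python
from collections import Counter


def error_rate_list(encounters, threshold=0.0):
    """Rank words by (times unlinked) / (times encountered), highest rate first.

    encounters: iterable of (word, was_linked) pairs, one per reference word read.
    Only words whose error rate is strictly above the threshold are returned.
    """
    seen, missed = Counter(), Counter()
    for word, was_linked in encounters:
        seen[word] += 1
        if not was_linked:
            missed[word] += 1
    rates = {word: missed[word] / seen[word] for word in seen}
    ranked = sorted(rates.items(), key=lambda item: item[1], reverse=True)
    return [(word, rate) for word, rate in ranked if rate > threshold]


# Hypothetical reading history.
history = [("the", True), ("the", False), ("the", True),
           ("yacht", False), ("yacht", False), ("come", True)]
print(error_rate_list(history, threshold=0.3))
```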

The pruning processes and methods make it possible to distinguish reader errors from StT software errors in the list described above. For example, if a word appears in the list for the first time and the reader is known to have read it correctly many times before (e.g. because the word has occurred in texts that the reader has already read, many times and without error), then the StT software has probably made a mistake. This is especially true of common short words (such as "the", "and" and "you"), which StT software sometimes ignores. New words that appear in the list of unlinked words and that the reader has never encountered before, or unlinked words that are already in the list, are likely to be reading errors and are placed in the reader's "practice" list. When the reader has practised a word in the list and pronounced it correctly, the word moves down the list.

The pruning processes and methods make it possible to produce metrics such as "correct words / words read" (where the number of correct words is based on the number of links remaining after pruning and/or the number of links pruned) and "words read per minute", the latter based for example on timing information in the transcript provided by the StT software. Such timing information can be used to detect silent hesitations and/or to determine whether the reader interpreted punctuation correctly, i.e. introduced appropriate pauses into the speech on encountering marks such as ",", "." and ":".
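The two metrics and the pause detection mentioned here might be computed as in the sketch below. The function names, the timing format (a list of `(start, end)` times in seconds per transcript word) and the 0.5 s pause threshold are assumptions; the text does not specify how the StT software reports timing.

```python
def reading_metrics(n_remaining_links, n_words_read, start_s, end_s):
    """Return (correct words / words read, words read per minute)."""
    accuracy = n_remaining_links / n_words_read
    words_per_minute = n_words_read / ((end_s - start_s) / 60.0)
    return accuracy, words_per_minute


def silent_pauses(word_times, min_gap_s=0.5):
    """Return (index of the word before the pause, pause length) for each long gap."""
    pauses = []
    for i, ((_, end), (next_start, _)) in enumerate(zip(word_times, word_times[1:])):
        gap = next_start - end
        if gap >= min_gap_s:
            pauses.append((i, gap))
    return pauses


# Hypothetical session: 13 surviving links, 20 words read in 30 seconds.
accuracy, wpm = reading_metrics(13, 20, 0.0, 30.0)
pauses = silent_pauses([(0.0, 0.4), (0.5, 0.9), (1.8, 2.2)])
print(accuracy, wpm, pauses)
```

A detected pause could then be compared against the position of punctuation in the reference text to judge whether the pause was appropriate.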

The pruning processes and methods can be implemented in software executable on a mobile phone, tablet, laptop, desktop computer, other personal computing device, or any apparatus that can receive signals or data representing sounds produced by the user and that has a processor capable of processing those signals or data. For example, the processor may receive data output by speech-to-text software. The data may have been generated by the processor itself from signals produced by a microphone or other voice input device. The apparatus preferably includes a display with which the user can obtain substantially instantaneous feedback on his or her reading of the reference text.

The pruning processes and methods may alternatively or additionally produce metrics from information about how often words in the reference text are linked to the first of the "alternative words" for a given utterance in the transcript. The pruning process may make use of weight values, such as the index in the array of alternative words at which a particular match was found. In that case the weights can be set so that the smaller the weight, the more confident the StT software is in the match. A weight of 0 may, for example, indicate a reference word linked to the first of the alternative words. This can potentially give the speaker an indication of how accurate his or her pronunciation is.
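The index-based weight can be illustrated with a short Python sketch. The function names `link_weight` and `first_choice_rate`, the case-insensitive comparison and the second example pair are assumptions; the first pair uses the alternatives quoted earlier for transcript index 1.

```python
def link_weight(reference_word, alternatives):
    """Return the index of the first alternative matching the reference word, or None."""
    target = reference_word.lower()
    for index, alternative in enumerate(alternatives):
        if alternative.lower() == target:
            return index
    return None


def first_choice_rate(pairs):
    """Fraction of matched reference words that matched the first alternative (weight 0)."""
    weights = [link_weight(word, alternatives) for word, alternatives in pairs]
    matched = [w for w in weights if w is not None]
    if not matched:
        return 0.0
    return sum(1 for w in matched if w == 0) / len(matched)


pairs = [("come", ["come", "came", "Kim", "cam"]),   # weight 0
         ("here", ["hear", "here"])]                 # hypothetical pair, weight 1
print(link_weight("come", pairs[0][1]), first_choice_rate(pairs))
```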

Embodiments of the invention may be performed by an apparatus comprising a processor and a memory. In this context, a processor may be any device or structure capable of executing instructions that cause the processes and methods described above to be performed. The term "processor" is intended to include any suitable type of processor architecture. Similarly, in this context, a memory may be any device or structure capable of storing data, whether temporarily or permanently, so that the processes and methods described above can be carried out. The term "memory" is intended to include any suitable type of memory, including both volatile and non-volatile types.

Claims (16)

1. A method of analysing speech, comprising:
receiving an estimate, made by speech-to-text software, of a reader's spoken reading of at least part of a reference text, the estimate comprising a plurality of estimated glyphs, each estimated glyph representing at least one grapheme;
comparing the estimated glyphs with a plurality of reference glyphs representing at least part of the reference text, each reference glyph representing at least one grapheme; and
providing feedback relating to the reader's speech based on the comparison of the estimated glyphs with the reference glyphs.

2. The method of claim 1, wherein each reference glyph represents at least one word in the reference text, and each estimated glyph represents a group of one or more alternative words or phrases estimated by the speech-to-text software for a word spoken by the reader.

3. The method of claim 1 or 2, wherein comparing the estimated glyphs with the reference glyphs comprises:
linking each estimated glyph to any matching reference glyph to produce a plurality of links;
identifying conflicts between the links; and
pruning the links by removing some of the conflicting links to resolve the identified conflicts.

4. The method of claim 3, wherein identifying conflicts between the links comprises identifying links that violate at least one rule of a set of rules, the set of rules comprising:
(1) a reference glyph must not be linked to more than one estimated glyph; no reference glyph can have two links;
(2) an estimated glyph must not be linked to more than one reference glyph; and
(3) no two links can cross each other.

5. The method of claim 3 or 4, wherein pruning the links comprises:
selecting a first link and identifying a set of links that conflict with the first link;
determining a cost of keeping each link in the identified set of conflicting links, the cost comprising the number of links that conflict with each link in the set; and
removing the conflicting links except for the conflicting link with the lowest cost.

6. The method of claim 3, further comprising generating a plurality of link bundles from the plurality of links, each link bundle comprising one or more links that form a contiguous sequence of estimated glyphs matching a contiguous sequence of reference glyphs; and wherein:
identifying conflicts between the links comprises identifying conflicts between link bundles; and
pruning the links comprises removing some of the conflicting link bundles to resolve the identified conflicts.

7. The method of claim 6, wherein pruning the links further comprises keeping larger link bundles in preference to smaller link bundles to resolve the identified conflicts.

8. The method of any one of claims 3 to 7, wherein each link comprises a pair of indices identifying an estimated glyph and a matching reference glyph.

9. The method of any preceding claim, wherein the feedback comprises one or more of:
at least one parameter indicative of the reader's speech fluency; and
at least one representation of the reader's current reading position in the reference text.

10. An apparatus comprising a processor and a memory, wherein the memory is configured to:
receive an estimate, made by speech-to-text software, of a reader's spoken reading of at least part of a reference text, the estimate comprising a plurality of estimated glyphs, each estimated glyph representing at least one grapheme;
compare the estimated glyphs with a plurality of reference glyphs representing at least part of the reference text, each reference glyph representing at least one grapheme; and
provide feedback relating to the reader's speech based on the comparison of the estimated glyphs with the reference glyphs.

11. The apparatus of claim 10, further comprising a microphone for detecting the sound of the reader's spoken reading and for producing an output signal for the speech-to-text software.

12. The apparatus of claim 10 or 11, configured to perform the method of any one of claims 1 to 9.

13. A computer program comprising instructions which, when the program is executed by a processor, cause the processor to:
compare a plurality of estimated glyphs, corresponding to an estimate made by speech-to-text software of a reader's spoken reading of at least part of a reference text, with a plurality of reference glyphs representing at least part of the reference text, each estimated glyph and each reference glyph representing at least one grapheme; and
provide feedback relating to the reader's speech based on the comparison of the estimated glyphs with the reference glyphs.

14. The computer program of claim 13, comprising instructions which, when the program is executed by a processor, cause the processor to perform the method of any one of claims 1 to 9.

15. A non-transitory computer-readable storage medium encoded with instructions which, when executed by a processor, cause the processor to:
compare a plurality of estimated glyphs, corresponding to an estimate made by speech-to-text software of a reader's spoken reading of at least part of a reference text, with a plurality of reference glyphs representing at least part of the reference text, each estimated glyph and each reference glyph representing at least one grapheme; and
provide feedback relating to the reader's speech based on the comparison of the estimated glyphs with the reference glyphs.

16. The non-transitory computer-readable storage medium of claim 15, encoded with instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1 to 9.
CN201980020915.XA 2018-03-23 2019-03-14 Handling speech-to-text conversion Expired - Fee Related CN111971744B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
EP18163773.7A EP3544001B8 (en) 2018-03-23 2018-03-23 Processing speech-to-text transcriptions
EP18163773.7 2018-03-23
PCT/EP2019/056515 WO2019179884A1 (en) 2018-03-23 2019-03-14 Processing speech-to-text transcriptions

Publications (2)

Publication Number Publication Date
CN111971744A (en) 2020-11-20
CN111971744B (en) 2024-06-25

Family

ID=61800347

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980020915.XA Expired - Fee Related CN111971744B (en) 2018-03-23 2019-03-14 Handling speech-to-text conversion

Country Status (4)

Country Link
US (1) US11790917B2 (en)
EP (1) EP3544001B8 (en)
CN (1) CN111971744B (en)
WO (1) WO2019179884A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11875780B2 (en) * 2021-02-16 2024-01-16 Vocollect, Inc. Voice recognition performance constellation graph
US20230023691A1 (en) * 2021-07-19 2023-01-26 Sterten, Inc. Free-form text processing for speech and language education

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999066493A1 (en) * 1998-06-19 1999-12-23 Kurzweil Educational Systems, Inc. Computer audio reading device providing highlighting of either character or bitmapped based text images
US6622121B1 (en) * 1999-08-20 2003-09-16 International Business Machines Corporation Testing speech recognition systems using test data generated by text-to-speech conversion
WO2006031536A2 (en) * 2004-09-10 2006-03-23 Soliloquy Learning, Inc. Intelligent tutoring feedback
CN101031913A (en) * 2004-09-30 2007-09-05 皇家飞利浦电子股份有限公司 Automatic text correction
CN101188110A (en) * 2006-11-17 2008-05-28 陈健全 Method for improving text and voice matching efficiency
US20080140413A1 (en) * 2006-12-07 2008-06-12 Jonathan Travis Millman Synchronization of audio to reading
CN102272827A (en) * 2005-06-01 2011-12-07 泰吉克通讯股份有限公司 Method and device for solving ambiguous manual input text input by voice input
CN103714048A (en) * 2012-09-29 2014-04-09 国际商业机器公司 Method and system used for revising text
CN105103221A (en) * 2013-03-05 2015-11-25 微软技术许可有限责任公司 Speech recognition assisted evaluation on text-to-speech pronunciation issue detection
GB201519494D0 (en) * 2015-11-04 2015-12-16 Univ Oxford Speech processing system and method
CN105810197A (en) * 2014-12-30 2016-07-27 联想(北京)有限公司 Voice processing method, voice processing device and electronic device
CN105913841A (en) * 2016-06-30 2016-08-31 北京小米移动软件有限公司 Voice recognition method, voice recognition device and terminal
CN107305768A (en) * 2016-04-20 2017-10-31 上海交通大学 Easy wrongly written character calibration method in interactive voice

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8109765B2 (en) * 2004-09-10 2012-02-07 Scientific Learning Corporation Intelligent tutoring feedback
US20060069562A1 (en) * 2004-09-10 2006-03-30 Adams Marilyn J Word categories
US9520068B2 (en) * 2004-09-10 2016-12-13 Jtt Holdings, Inc. Sentence level analysis in a reading tutor
US7433819B2 (en) * 2004-09-10 2008-10-07 Scientific Learning Corporation Assessing fluency based on elapsed time
TW200926140A (en) * 2007-12-11 2009-06-16 Inst Information Industry Method and system of generating and detecting confusion phones of pronunciation
US11682318B2 (en) * 2020-04-06 2023-06-20 International Business Machines Corporation Methods and systems for assisting pronunciation correction
KR102478076B1 (en) * 2022-06-13 2022-12-15 주식회사 액션파워 Method for generating learning data for speech recognition errer detection

Also Published As

Publication number Publication date
EP3544001A1 (en) 2019-09-25
CN111971744B (en) 2024-06-25
US20210027786A1 (en) 2021-01-28
US11790917B2 (en) 2023-10-17
EP3544001B1 (en) 2021-12-08
WO2019179884A1 (en) 2019-09-26
EP3544001B8 (en) 2022-01-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20240625