
GB2390930A - Foreign language speech recognition - Google Patents

Foreign language speech recognition

Info

Publication number
GB2390930A
Authority
GB
United Kingdom
Prior art keywords
current
text
written text
word
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB0324945A
Other versions
GB2390930B (en)
GB2390930A8 (en)
GB0324945D0 (en)
Inventor
Robert J Tippe
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Custom Speech USA Inc
Original Assignee
Custom Speech USA Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Custom Speech USA Inc filed Critical Custom Speech USA Inc
Priority claimed from GB0118231A external-priority patent/GB2361569B/en
Publication of GB0324945D0 publication Critical patent/GB0324945D0/en
Publication of GB2390930A publication Critical patent/GB2390930A/en
Application granted granted Critical
Publication of GB2390930B publication Critical patent/GB2390930B/en
Publication of GB2390930A8 publication Critical patent/GB2390930A8/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/26 Speech to text systems
    • G10L17/00 Speaker identification or verification techniques
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Abstract

An apparatus for substantially simplifying the production of a foreign language speech model for a speech recognition program, wherein a voice dictation recording is recorded, a transcribed file is produced by a human transcriptionist and a written text is produced by a speech recognition program, said written text being at least temporarily synchronized to said voice dictation recording, the apparatus comprising:

- means for comparing a copy of the written text with said transcribed file, resulting in a sequential list of unmatched words, said sequential list having a beginning, an end and a current unmatched word, said current unmatched word being successively advanced from said beginning to said end;
- means for searching for said current unmatched word simultaneously within a first buffer containing said written text and a second buffer associated with said sequential list; and
- means for correcting said current unmatched word in said second buffer, including means for displaying said current unmatched word in a manner substantially visually isolated from other text in said copy of said written text, and means for playing a portion of said synchronized voice dictation recording from said first buffer associated with said current unmatched word.

Description

SYSTEM AND METHOD FOR AUTOMATING TRANSCRIPTION SERVICES

The present invention relates in general to computer speech recognition systems and, in particular, to a system and method for automating the text transcription of voice dictation by various end users.
Speech recognition programs are well known in the art. While these programs are ultimately useful in automatically converting speech into text, many users are dissuaded from using these programs because they require each user to spend a significant amount of time training the system. Usually this training begins by having each user read a series of pre-selected materials for approximately 20 minutes. Then, as the user continues to use the program and words are improperly transcribed, the user is expected to stop and train the program as to the intended word, thus advancing the ultimate accuracy of the acoustic model. Unfortunately, most professionals (doctors, dentists, veterinarians, lawyers) and business executives are unwilling to spend the time developing the necessary acoustic model to truly benefit from automated transcription.

There are systems for using computers for routing transcription from a group of end users. Most often these systems are used in large multi-user settings such as hospitals.
In those systems, a voice user dictates into a general-purpose computer or other recording device and the resulting file is transferred automatically to a human transcriptionist. The human transcriptionist transcribes the file, which is then returned to the original author for review. These systems have the perpetual overhead of employing a sufficient number of human transcriptionists to transcribe all of the dictation files.
According to the present invention there is provided an apparatus for substantially simplifying the production of a foreign language speech model for a speech recognition program, wherein said foreign language provides a sufficient set of words to teach, a voice dictation recording based upon a transcribed file produced by a human transcriptionist and a written text produced by the speech recognition program, wherein said written text is at least temporarily synchronized to said voice dictation recording, said apparatus comprising:

- means for sequentially comparing a copy of said written text with said transcribed file resulting in a sequential list of unmatched words culled from said copy of said written text, said sequential list having a beginning, an end and a current unmatched word, said current unmatched word being successively advanced from said beginning to said end;

- means for incrementally searching for said current unmatched word contemporaneously within a first buffer associated with the speech recognition program containing said written text and a second buffer associated with said sequential list; and

- means for correcting said current unmatched word in said second buffer, said correcting means including means for displaying said current unmatched word in a manner substantially visually isolated from other text in said copy of said written text and means for playing a portion of said synchronized voice dictation recording from said first buffer associated with said current unmatched word.

The correcting means may include means for alternatively viewing said current unmatched word in context within said copy of said written text.
The manner substantially visually isolated from other text may be manually selected from a group containing word-by-word display, sentence-by-sentence display, and said current unmatched word display.
The present invention will now be described, by way of example, with reference to the accompanying drawings, in which:

Fig. 1 of the drawings is a block diagram of one potential embodiment of the present system for substantially automating transcription services for one or more voice users;

Fig. 1b of the drawings is a block diagram of a general-purpose computer which may be used as a dictation station, a transcription station and the control means within the present system;

Fig. 2a of the drawings is a flow diagram of the main loop of the control means of the present system;

Fig. 2b of the drawings is a flow diagram of the enrollment stage portion of the control means of the present system;

Fig. 2c of the drawings is a flow diagram of the training stage portion of the control means of the present system;
Fig. 2d of the drawings is a flow diagram of the automation stage portion of the control means of the present system;

Fig. 3 of the drawings is a directory structure used by the control means in the present system;

Fig. 4 of the drawings is a block diagram of a portion of a preferred embodiment of the manual editing means; and

Fig. 5 of the drawings is an elevation view of the remainder of a preferred embodiment of the manual editing means.
While the present invention may be embodied in many different forms, there is shown in the drawings and discussed herein a few specific embodiments with the understanding that the present disclosure is to be considered only as an exemplification of the principles of the invention and is not intended to limit the invention to the embodiments illustrated.

Fig. 1 of the drawings generally shows one potential embodiment of the present system for substantially automating transcription services for one or more voice users.
The present system must include some means for receiving a voice dictation file from a current user. This voice dictation file receiving means can be a digital audio recorder, an analog audio recorder, or standard means for receiving computer files on magnetic media or via a data connection.
As shown, in one embodiment, the system 100 includes multiple digital recording stations 10, 11, 12 and 13. Each digital recording station has at least a digital audio recorder and means for identifying the current voice user.
Preferably, each of these digital recording stations is implemented on a general-purpose computer (such as computer 20), although a specialized computer could be developed for this specific purpose. The general-purpose computer, though, has the added advantage of being adaptable to varying uses in addition to operating within the present system 100. In general, the general-purpose computer should have, among other elements, a microprocessor (such as the Intel Corporation PENTIUM, Cyrix K6 or Motorola 68000 series); volatile and non-volatile memory; one or more mass storage devices (i.e. HDD (not shown), floppy drive 21, and other removable media devices 22 such as a CD-ROM drive, DITTO, ZIP or JAZ drive (from Iomega Corporation) and the like); various user input devices, such as a mouse 23, a keyboard 24 or a microphone 25; and a video display system 26. In one embodiment, the general-purpose computer is controlled by the WINDOWS 9.x operating system. It is contemplated, however, that the present system would work equally well using a MACINTOSH computer or even another operating system such as a WINDOWS CE, UNIX or a JAVA based operating system, to name a few.
Regardless of the particular computer platform used, in an embodiment utilizing an analog audio input (via microphone 25) the general-purpose computer must include a sound-card (not shown). Of course, in an embodiment with a digital input no sound card would be necessary.
In the embodiment shown in Fig. 1, digital audio recording stations 10, 11, 12 and 13 are loaded and configured to run digital audio recording software on a PENTIUM-based computer system operating under WINDOWS 9.x. Such digital recording software is available as a utility in the WINDOWS 9.x operating system or from various third party vendors such as The Programmers' Consortium, Inc. of Oakton, Virginia (VOICEDOC), Syntrillium Corporation of Phoenix, Arizona (COOL EDIT) or Dragon Systems Corporation (Dragon Naturally Speaking Professional Edition). These various software programs produce a voice dictation file in the form of a "WAV" file.
However, as would be known to those skilled in the art, other audio file formats, such as MP3 or DSS, could also be used to format the voice dictation file, without departing from the spirit of the present invention. In one embodiment where VOICEDOC software is used, that software also automatically assigns a file handle to the WAV file; however, it would be known to those of ordinary skill in the art to save an audio file on a computer system using standard operating system file management methods.
Another means for receiving a voice dictation file is dedicated digital recorder 14, such as the Olympus Digital Voice Recorder D-1000 manufactured by the Olympus Corporation. Thus, if the current voice user is more comfortable with a more conventional type of dictation device, they can continue to use a dedicated digital recorder 14. In order to harvest the digital audio text file, upon completion of a recording, dedicated digital recorder 14 would be operably connected to one of the digital audio recording stations, such as 13, toward downloading the digital audio file into that general-purpose computer. With this approach, for instance, no audio card would be required.

Another alternative for receiving the voice dictation file may consist of using one form or another of removable magnetic media containing a pre-recorded audio file. With this alternative an operator would input the removable magnetic media into one of the digital audio recording stations toward uploading the audio file into the system.

In some cases it may be necessary to pre-process the audio files to make them acceptable for processing by the speech recognition software. For instance, a DSS file format may have to be changed to a WAV file format, or the sampling rate of a digital audio file may have to be upsampled or downsampled. For instance, in using the Olympus Digital Voice Recorder with Dragon Naturally Speaking, Olympus' sampling rate needs to be upsampled to 11 MHz. Software to accomplish such pre-processing is available from a variety of sources including Syntrillium Corporation and Olympus Corporation.

The other aspect of the digital audio recording stations is some means for identifying the current voice user. The identifying means may include keyboard 24 upon which the user (or a separate operator) can input the current user's unique identification code. Of course, the user identification can be input using a myriad of computer input devices such as pointing devices (e.g. mouse 23), a touch screen (not shown), a light pen (not shown), a bar-code reader (not shown) or audio cues via microphone 25, to name a few.

In the case of a first time user the identifying means may also assign that user an identification number after receiving potentially identifying information from that user, including (1) name; (2) address; (3) occupation; (4) vocal dialect or accent; etc. As discussed in association with the control means, based upon this input information, a voice user profile and a sub-directory within the control means are established. Thus, regardless of the particular identification means used, a user identification must be established for each voice user and subsequently provided with a corresponding digital audio file for each use such that the control means can appropriately route and the system ultimately transcribe the audio.
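The sample-rate pre-processing described above can be sketched as a simple linear-interpolation resampler. This is an illustrative, modern sketch only, not the patent's actual implementation (which relied on third-party utilities from Syntrillium and Olympus); the function name and the use of plain Python lists of PCM samples are assumptions for clarity.

```python
def resample(samples, src_rate, dst_rate):
    """Linearly interpolate a sequence of PCM samples from src_rate to
    dst_rate (hypothetical helper; real tools resample WAV files directly)."""
    if src_rate == dst_rate:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # fractional index into the source
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        # Weighted average of the two neighbouring source samples.
        out.append(round(samples[lo] * (1 - frac) + samples[hi] * frac))
    return out
```

Doubling the rate of `[0, 10]` yields `[0, 5, 10, 10]`: each inserted sample is the midpoint of its neighbours, with the final position clamped to the last source sample.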
In one embodiment of the present invention, the identifying means may also seek the manual selection of a specialty vocabulary. It is contemplated that the specialty vocabulary sets may be general for various users such as medical (i.e. Radiology, Orthopedic Surgery, Gynecology) and legal (i.e. corporate, patent, litigation) or highly specific. Such specialty vocabulary parameters could be further limited based on the particular circumstances of a particular dictation file. For instance, if the current voice user is a Radiologist dictating the reading of an abdominal CT scan, the nomenclature is highly specialized and different from the nomenclature for a renal ultrasound. By narrowly segmenting each selectable vocabulary set an increase in the accuracy of the automatic speech converter is likely.

As shown in Fig. 1, the digital audio recording stations may be operably connected to system 100 as part of computer network 30 or, alternatively, they may be operably connected to the system via internet host 15. As shown in Fig. 1b, the general-purpose computer can be connected to both network jack 97 and telephone jack. With the use of an internet host, connection may be accomplished by e-mailing the audio file via the Internet. Another method for completing such connection is by way of direct modem connection via remote control software, such as PC ANYWHERE, which is available from Symantec Corporation of Cupertino, California. It is also possible, if the IP address of digital audio recording station 10 or internet host 15 is known, to transfer the audio file using basic file transfer protocol. Thus, as can be seen from the foregoing, the present system allows great flexibility for voice users to provide audio input into the system.

Control means 200 controls the flow of the voice dictation file based upon the training status of the current voice user. As shown in Figs. 2a, 2b, 2c and 2d, control means 200 comprises a software program operating on general purpose computer 40. In particular, the program is initialized in step 201 where variables are set, buffers cleared and the particular configuration for this particular installation of the control means is loaded. Control means continually monitors a target directory (such as "current" (shown in Fig. 3)) to determine whether a new file has been moved into the target, step 202. Once a new file is found (such as "6723.id" (shown in Fig. 3)), a determination is made as to whether or not the current user 5 (shown in Fig. 1) is a new user, step 203.
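The monitoring and routing loop just described (steps 202 through 205) can be sketched in a few lines. This is a minimal, hypothetical sketch assuming the directory layout of Fig. 3; the function names, the `seen` set and the dictionary-free polling approach are illustrative choices, not the patent's code.

```python
import os

def scan_target_directory(target_dir, seen):
    """One pass of the control-means monitoring loop (step 202): report
    files that have newly appeared in the target directory."""
    current = set(os.listdir(target_dir))
    new_files = sorted(current - seen)
    seen |= current
    return new_files

def is_new_user(target_dir, user_id):
    """Step 203: a '<user>.pro' profile sitting in the target directory
    marks a first-time user whose subdirectory is not yet established."""
    return os.path.exists(os.path.join(target_dir, user_id + ".pro"))

def establish_user(target_dir, user_id):
    """Steps 204-205: create the user's subdirectory and move the user
    profile into it, so later jobs can be routed by user id."""
    subdir = os.path.join(target_dir, user_id)
    os.makedirs(subdir, exist_ok=True)
    profile = user_id + ".pro"
    src = os.path.join(target_dir, profile)
    if os.path.exists(src):
        os.replace(src, os.path.join(subdir, profile))
    return subdir
```

In a running system the scan would sit inside a loop with a short sleep, and each new ".id"/".wav" pair would be dispatched according to the training status parsed from the moved profile.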
For each new user (as indicated by the existence of a ".pro" file in the "current" subdirectory), a new subdirectory is established, step 204 (such as the "usern" subdirectory (shown in Fig. 3)). This subdirectory is used to store all of the audio files ("xxxx.wav"), written text ("xxxx.wrt"), verbatim text ("xxxx.vb"), transcription text ("xxxx.txt") and user profile ("usern.pro") for that particular user. Each particular job is assigned a unique number "xxxx" such that all of the files associated with a job can be associated by that number. With this directory structure, the number of users is practically limited only by storage space within general-purpose computer 40.

Now that the user subdirectory has been established, the user profile is moved to the subdirectory, step 205. The contents of this user profile may vary between systems. The contents of one potential user profile is shown in Fig. 3 as containing the user name, address, occupation and training status. Aside from the training status variable, which is necessary, the other data is useful in routing and transcribing the audio files.

The control means, having selected one set of files by the handle, determines the identity of the current user by comparing the ".id" file with its "user.tbl," step 206. Now that the user is known, the user profile may be parsed from that user's subdirectory and the current training status determined, step 207. Steps 208-211 are the triage of the current training status into one of: enrollment, training, automate, and stop automation.

Enrollment is the first stage in automating transcription services. As shown in Fig. 2b, the audio file is sent to transcription, step 301. In particular, the "xxxx.wav" file is transferred to transcriptionist stations 50 and 51. In a preferred embodiment, both stations are general-purpose computers, which run both an audio player and manual input means. The audio player is likely to be a digital audio player, although it is possible that an analog audio file could be transferred to the stations. Various audio players are commonly available, including a utility in the WINDOWS 9.x operating system and various other third parties such as The Programmers' Consortium, Inc. of Oakton, Virginia (VOICESCRIBE). Regardless of the audio player used to play the audio file, manual input means is running on the computer at the same time. This manual input means may comprise any text editor or word processor (such as MS WORD, WordPerfect, AmiPro or Word Pad) in combination with a keyboard, mouse, or other user-interface device. In one embodiment of the present invention, this manual input means may, itself, also be speech recognition software, such as Naturally Speaking from Dragon Systems of Newton, Massachusetts, Via Voice from IBM Corporation of Armonk, New York, or Speech Magic from Philips Corporation of Atlanta, Georgia.
Human transcriptionist 6 listens to the audio file created by current user 5 and, as is known, manually inputs the perceived contents of that recorded text, thus establishing the transcribed file, step 302. Being human, human transcriptionist 6 is likely to impose experience, education and biases on the text and thus not input a verbatim transcript of the audio file. Upon completion of the human transcription, the human transcriptionist saves the file and indicates that it is ready for transfer to the current user's subdirectory as "xxxx.txt", step 303.
Inasmuch as this current user is only at the enrollment stage, a human operator will have to listen to the audio file, manually compare it to the transcribed file and create a verbatim file, step 304. That verbatim file "xxxx.vb" is also transferred to the current user's subdirectory, step 305. Now that verbatim text is available, control means 200 starts the automatic speech conversion means, step 306. This automatic speech conversion means may be a preexisting program, such as Dragon Systems' Naturally Speaking, IBM's Via Voice or Philips' Speech Magic, to name a few. Alternatively, it could be a unique program that is designed to specifically perform automated speech recognition.

In a preferred embodiment, Dragon Systems' Naturally Speaking has been used by running an executable simultaneously with Naturally Speaking that feeds phantom keystrokes and mousing operations through the WIN32API, such that Naturally Speaking believes that it is interacting with a human being, when in fact it is being controlled by control means 200. Such techniques are well known in the computer software testing art and, thus, will not be discussed in detail. It should suffice to say that by watching the application flow of any speech recognition program, an executable to mimic the interactive manual steps can be created.
If the current user is a new user, the speech recognition program will need to establish the new user, step 307. Control means provides the necessary information from the user profile found in the current user's subdirectory. All speech recognition programs require significant training to establish an acoustic model of a particular user. In the case of Dragon, initially the program seeks approximately 20 minutes of audio, usually obtained by the user reading a canned text provided by Dragon Systems. There is also functionality built into Dragon that allows "mobile training." Using this feature, the verbatim file and audio file are fed into the speech recognition program to begin training the acoustic model for that user, step 308. Regardless of the length of that audio file, control means 200 closes the speech recognition program at the completion of the file, step 309.
As the enrollment stage is too scant to use the automatically created text, a copy of the transcribed file is sent to the current user using the address information contained in the user profile, step 310. This address can be a street address or an e-mail address.
Following that transmission, the program returns to the main loop on Fig. 2a.
After a certain number of minutes of training have been conducted for a particular user, that user's training status may be changed from enrollment to training.
The border for this change is subjective, but perhaps a good rule of thumb is once Dragon appears to be creating written text with 80% accuracy or more, the switch between states can be made. Thus, for such a user the next transcription event will prompt control means 200 into the training state. As shown in Fig. 2c, steps 401-403 are the same human transcription steps as steps 301-303 in the enrollment phase. Once the transcribed file is established, control means 200 starts the automatic speech conversion means (or speech recognition program) and selects the current user, step 404. The audio file is fed into the speech recognition program and a written text is established within the program buffer, step 405. In the case of Dragon, this buffer is given the same file handle on every instance of the program. Thus, that buffer can be easily copied using standard operating system commands and manual editing can begin, step 406.
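The 80% rule of thumb above implies some measure of how close the recognizer's written text is to a verbatim reference. The patent does not prescribe a formula; the sketch below shows one plausible word-level measure, based on longest matching word subsequences, as an assumption for illustration.

```python
import difflib

def word_accuracy(written_text, verbatim_text):
    """Fraction of verbatim words the recognizer got right, judged by the
    longest matching word subsequences between the two texts (one possible
    metric; not the patent's prescribed calculation)."""
    hyp = written_text.lower().split()
    ref = verbatim_text.lower().split()
    if not ref:
        return 1.0
    matcher = difflib.SequenceMatcher(None, hyp, ref)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref)
```

Under this measure, a user whose jobs consistently score `word_accuracy(...) >= 0.8` would be switched from the enrollment state to the training state.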
In one particular embodiment utilizing the VOICEWARE system from The Programmers' Consortium, Inc. of Oakton, Virginia, the user inputs audio into the VOICEWARE system's VOICEDOC program, thus creating a ".wav" file. In addition, before releasing this ".wav" file to the VOICEWARE server, the user selects a "transcriptionist." This "transcriptionist" may be a particular human transcriptionist or may be the "computerized transcriptionist." If the user selects a "computerized transcriptionist" they may also select whether that transcription is handled locally or remotely. This file is assigned a job number by the VOICEWARE server, which routes the job to the VOICESCRIBE portion of the system. Normally, VOICESCRIBE is used by the human transcriptionist to receive and playback the job's audio (".wav") file. In addition, the audio file is grabbed by the automatic speech conversion means. In this VOICEWARE system embodiment, by placing VOICESCRIBE in "auto mode" new jobs (i.e. an audio file newly created by VOICEDOC) are automatically downloaded from the VOICEWARE server and a VOICESCRIBE window having a window title formed by the job number of the current ".wav" file is opened. An executable file, running in the background, "sees" the VOICESCRIBE window open and, using the WIN32API, determines the job number from the VOICESCRIBE window title. The executable then launches the automatic speech conversion means. In Dragon Systems' Naturally Speaking, for instance, there is a built-in function for performing speech recognition on a preexisting ".wav" file. The executable program feeds phantom keystrokes to Naturally Speaking to open the ".wav" file from the "current" directory (see Fig. 3) having the job number of the current job.
In this embodiment, after Naturally Speaking has completed automatically transcribing the contents of the ".wav" file, the executable file resumes operation by selecting all of the text in the open Naturally Speaking window and copying it to the WINDOWS 9.x operating system clipboard. Then, using the clipboard utility, the clipboard is saved as a text file using the current job number with a "dmt" suffix. The executable file then "clicks" the "complete" button in VOICESCRIBE to return the "dmt" file to the VOICEWARE server. As would be understood by those of ordinary skill in the art, the foregoing procedure can be done utilizing other digital recording software and other automatic speech conversion means. Additionally, functionality analogous to the WINDOWS clipboard exists in other operating systems. It is also possible to require human intervention to activate or prompt one or more of the foregoing steps. Further, although the various programs executing various steps of this could be running on a number of interconnected computers (via a LAN, WAN, internet connectivity, email and the like), it is also contemplated that all of the necessary software can be running on a single computer.
Another alternative approach is also contemplated wherein the user dictates directly into the automatic speech conversion means and the VOICEWARE server picks up a copy in the reverse direction. This approach works as follows: without actually recording any voice, the user clicks on the "complete" button in VOICEDOC, thus creating an empty ".wav" file. This empty file is nevertheless assigned a unique job number by the VOICEWARE server. The user (or an executable file, running in the background) then launches the automatic speech conversion means and the user dictates directly into that program, in the same manner previously used in association with such automatic speech conversion means. Upon completion of the dictation the user presses a button labeled "return" (generated by a background executable file), which executable then commences a macro that gets the current job number from VOICEWARE (in the manner described above), selects all of the text in the document and copies it to the clipboard. The clipboard is then saved to the file "<jobnumber>.dmt," as discussed above. The executable then "clicks" the "complete" button (via the WIN32API) in VOICESCRIBE, which effectively returns the automatically transcribed text file back to the VOICEWARE server, which, in turn, returns the completed transcription to the VOICESCRIBE user. Notably, although the various programs executing various steps of this could be running on a number of interconnected computers (via a LAN, WAN, internet connectivity, email and the like), it is also contemplated that all of the necessary software can be running on a single computer. As would be understood by those of ordinary skill in the art, the foregoing procedure can be done utilizing other digital recording software and other automatic speech conversion means. Additionally, functionality analogous to the WINDOWS clipboard exists in other operating systems. It is also possible to require human intervention to activate or prompt one or more of the foregoing steps.
Manual editing is not an easy task. Human beings are prone to errors. Thus, the present invention also includes means for improving on that task. As shown in Fig. 4, the transcribed file ("3333.txt") and the copy of the written text ("3333.wrt") are sequentially compared word by word 406a toward establishing a sequential list of unmatched words 406b that are culled from the copy of the written text. This list has a beginning and an end and a pointer 406c to the current unmatched word. Underlying the sequential list is another list of objects which contains the original unmatched words, as well as the words immediately before and after that unmatched word, the starting location in memory of each unmatched word in the sequential list of unmatched words 406b and the length of the unmatched word.
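The word-by-word comparison of Fig. 4 can be sketched with a standard sequence-matching routine. This is an illustrative sketch only: the function name, the use of `difflib`, and the dictionary representation of the underlying objects (word, neighbours, starting location, length) are assumptions, not the patent's implementation.

```python
import difflib

def build_unmatched_list(transcribed_text, written_text):
    """Sequentially compare the transcribed file with the copy of the
    written text and cull the unmatched words from the written-text copy,
    recording each word's neighbours, character offset and length."""
    trans_words = transcribed_text.split()
    written_words = written_text.split()

    # Starting location of each written word inside the written-text buffer.
    offsets, pos = [], 0
    for w in written_words:
        pos = written_text.index(w, pos)
        offsets.append(pos)
        pos += len(w)

    unmatched = []
    matcher = difflib.SequenceMatcher(None, trans_words, written_words)
    for tag, _, _, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        for j in range(j1, j2):      # words present only in the written text
            unmatched.append({
                "word": written_words[j],
                "before": written_words[j - 1] if j > 0 else "",
                "after": written_words[j + 1] if j + 1 < len(written_words) else "",
                "start": offsets[j],
                "length": len(written_words[j]),
            })
    return unmatched
```

Comparing a transcribed "the trash can was full" against a recognized "the cash can was full" yields a single entry for "cash" with its neighbours "the" and "can", through which a correction window can advance from beginning to end.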
As shown in Fig. 5, the unmatched word pointed at by pointer 406c from list 406b is displayed in substantial visual isolation from the other text in the copy of the written text on a standard computer monitor 500 in an active window 501. As shown in Fig. 5, the context of the unmatched word can be selected by the operator to be shown within the sentence in which it resides, word by word, or in phrase context, by clicking on buttons 514, 515, and 516, respectively.
Associated with active window 501 is background window 502, which contains the copy of the written text file. As shown in background window 502, an incremental search has located (see pointer 503) the next occurrence of the current unmatched word "cash." Contemporaneously therewith, within window 505 containing the buffer from the speech recognition program, the same incremental search has located (see pointer 506) the next occurrence of the current unmatched word. A human user, likely viewing only active window 501, can activate the audio replay from the speech recognition program by clicking on "play" button 510, which plays the audio synchronized to the text at pointer 506. Based on that snippet of speech, which can be played over and over by clicking on the play button, the human user can manually input the correction to the current unmatched word via keyboard, mousing actions, or possibly even audible cues to another speech recognition program running within this window.
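The incremental search that positions pointers 503 and 506 might look like this minimal sketch. Whole-word matching via a regular expression is an assumption; the patent does not specify the matching rule.

```python
import re

def incremental_search(buffer_text: str, word: str, start: int = 0) -> int:
    """Return the character offset of the next whole-word occurrence of
    `word` in `buffer_text` at or after `start`, or -1 if none is found.
    The reverse button would use a symmetric backward scan."""
    match = re.compile(r"\b" + re.escape(word) + r"\b").search(buffer_text, start)
    return match.start() if match else -1

written = "please take the cash to the bank and get cash back"
first = incremental_search(written, "cash")               # locates the pointer
second = incremental_search(written, "cash", first + 1)   # advance to the next hit
print(first, second)
```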
In the present example, even with the choice of isolated context offered by buttons 514, 515 and 516, it may still be difficult to determine the correct verbatim word out of context; accordingly, there is a switch window button 513 that will move background window 502 to the foreground with visible pointer 503 indicating the current location within the copy of the written text. The user can then return to the active window and input the correct word, "trash." This change will only affect the copy of the written text displayed in background window 502.
When the operator is ready for the next unmatched word, the operator clicks on the advance button 511, which advances pointer 406c down the list of unmatched words and activates the incremental search in both windows 502 and 505. This unmatched word is now displayed in isolation, and the operator can play the synchronized speech from the speech recognition program and correct this word as well. If at any point in the operation the operator would like to return to a previous unmatched word, the operator clicks on the reverse button 512, which moves pointer 406c back a word in the list and causes a backward incremental search to occur. This is accomplished by using the underlying list of objects which contains the original unmatched words. This list is traversed in object-by-object fashion, but alternatively each of the records could be padded such that each item has the same word size to assist in bi-directional traversing of the list. As the unmatched words in this underlying list are read-only, it is possible to return to the original unmatched word such that the operator can determine whether a different correction should have been made.
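The advance/reverse navigation over the read-only list of unmatched words can be sketched as a small class. The method names and the separate corrections dictionary are illustrative assumptions, not terms from the specification.

```python
class UnmatchedWordList:
    """Navigable list of unmatched words with a current-word pointer,
    mirroring the advance (511) and reverse (512) buttons."""

    def __init__(self, words):
        self._words = tuple(words)   # read-only originals, as described above
        self.corrections = {}        # index -> corrected word, kept separately
        self.pointer = 0

    def current(self):
        return self._words[self.pointer]

    def advance(self):
        if self.pointer < len(self._words) - 1:
            self.pointer += 1        # a forward incremental search would fire here
        return self.current()

    def reverse(self):
        if self.pointer > 0:
            self.pointer -= 1        # a backward incremental search would fire here
        return self.current()

    def correct(self, word):
        self.corrections[self.pointer] = word

nav = UnmatchedWordList(["cash", "there", "two"])
nav.correct("trash")           # fix the first unmatched word
nav.advance(); nav.advance()   # move forward to "two"
nav.reverse()                  # step back to "there"
print(nav.current(), nav.corrections)
```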
Ultimately, the copy of the written text is finally corrected, resulting in a verbatim copy, which is saved to the user's subdirectory. The verbatim file is also passed to the speech recognition program for training, step 407. The new (and improved) acoustic model is saved, step 408, and the speech recognition program is closed, step 409. As the system is still in training, the transcribed file is returned to the user, as in step 310 from the enrollment phase.
As shown in Fig. 4, the system may also include means for determining the accuracy rate from the output of the sequential comparing means. Specifically, by counting the number of words in the written text and the number of words in list 406b, the ratio of words in said sequential list to words in said written text can be determined, thus providing an accuracy percentage. As before, it is a matter of choice when to advance users from one stage to another. Once that goal is reached, the user's profile is changed to the next stage, step 21.
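The accuracy computation reduces to a one-line ratio. This sketch simply assumes, as described above, that accuracy is the complement of the unmatched-to-total word ratio.

```python
def accuracy_percentage(written_word_count: int, unmatched_count: int) -> float:
    """Accuracy as the complement of the ratio of unmatched words
    (list 406b) to total words in the written text."""
    if written_word_count == 0:
        return 0.0
    return 100.0 * (1.0 - unmatched_count / written_word_count)

# 5 unmatched words out of 200 gives 97.5% accuracy.
print(accuracy_percentage(200, 5))
```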
One potential enhancement or derivative functionality is provided by the determination of the accuracy percentage. In one embodiment, this percentage could be used to evaluate a human transcriptionist's skills. In particular, by using either a known verbatim file or a well-established user, the associated ".wav" file would be played for the human transcriptionist and the foregoing comparison would be performed on the transcribed text versus the verbatim file created by the foregoing process. In this manner, additional functionality can be provided by the present system.
As understood, currently, manufacturers of speech recognition programs use recordings of foreign languages, dictation, etc. with manually established verbatim files to program speech models. It should be readily apparent that the foregoing manner of establishing verbatim text could be used in the initial development of these speech files, simplifying this process greatly.
Once the user has reached the automation stage, the greatest benefits of the present system can be achieved. The speech recognition software is started, step 600, and the current user selected, step 601. If desired, a particularized vocabulary may be selected, step 602. Then automatic conversion of the digital audio file recorded by the current user may commence, step 603. When completed, the written file is transmitted to the user based on the information contained in the user profile, step 604, and the program is returned to the main loop.

Unfortunately, there may be instances where the voice users cannot use automated transcription for a period of time (during an illness, after dental work, etc.) because their acoustic model has been temporarily (or even permanently) altered. In that case, the system administrator may set the training status variable to a stop automation state in which steps 301, 302, 303, 305 and 310 (see Fig. 4b) are the only steps performed.

The foregoing description and drawings merely explain and illustrate the invention, and the invention is not limited thereto. Those of ordinary skill in the art who have the disclosure before them will be able to make modifications and variations therein without departing from the scope of the present invention. For instance, it is possible to implement all of the elements of the present system on a single general-purpose computer by essentially time-sharing the machine between the voice user, the transcriptionist, and the speech recognition program. The resulting cost savings make this system accessible to more types of office situations, not simply large medical clinics, hospitals, law firms or other large entities.
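The main-loop dispatch on the user's training status variable might be sketched as follows. The status names and the returned action strings are illustrative placeholders, not terms from the specification.

```python
from enum import Enum

class TrainingStatus(Enum):
    ENROLLMENT = "enrollment"
    TRAINING = "training"
    AUTOMATION = "automation"
    STOP_AUTOMATION = "stop automation"  # acoustic model temporarily altered

def next_action(status: TrainingStatus) -> str:
    """Choose the processing path for a job based on the user's training
    status variable, as the main loop described above would."""
    if status is TrainingStatus.AUTOMATION:
        return "convert audio automatically and transmit the written file"
    if status is TrainingStatus.STOP_AUTOMATION:
        return "route audio to human transcription only"
    return "transcribe manually and train the acoustic model"

print(next_action(TrainingStatus.STOP_AUTOMATION))
```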

Claims (4)

CLAIMS:

1. An apparatus for substantially simplifying the production of a foreign language speech model for a speech recognition program, wherein said foreign language provides a sufficient set of words to teach a voice dictation recording based upon a transcribed file produced by a human transcriptionist and a written text produced by a speech recognition program, wherein said written text is at least temporarily synchronized to said voice dictation recording, said apparatus comprising:

means for sequentially comparing a copy of said written text with said transcribed file resulting in a sequential list of unmatched words culled from said copy of said written text, said sequential list having a beginning, an end and a current unmatched word, said current unmatched word being successively advanced from said beginning to said end;

means for incrementally searching for said current unmatched word contemporaneously within a first buffer associated with the speech recognition program containing said written text and a second buffer associated with said sequential list; and

means for correcting said current unmatched word in said second buffer, said correcting means including means for displaying said current unmatched word in a manner substantially visually isolated from other text in said copy of said written text and means for playing a portion of said synchronized voice dictation recording from said first buffer associated with said current unmatched word.
2. The invention according to Claim 1 wherein said correcting means further includes means for alternatively viewing said current unmatched word in context within said copy of said written text.
3. The invention according to Claim 2 wherein said manner substantially visually isolated from other text can be manually selected from a group consisting of sentence context display, word-by-word display and phrase context display of said current unmatched word.
4. An apparatus for substantially simplifying the production of a foreign language speech model for a speech recognition program, substantially as herein described with reference to the accompanying drawings.
GB0324945A 1999-02-05 2000-02-04 System and method for automating transcription services Expired - Fee Related GB2390930B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US11894999P 1999-02-05 1999-02-05
GB0118231A GB2361569B (en) 1999-02-05 2000-02-04 System and method for automating transcription services

Publications (4)

Publication Number Publication Date
GB0324945D0 GB0324945D0 (en) 2003-11-26
GB2390930A true GB2390930A (en) 2004-01-21
GB2390930B GB2390930B (en) 2004-03-10
GB2390930A8 GB2390930A8 (en) 2004-06-07

Family

ID=29738068

Family Applications (2)

Application Number Title Priority Date Filing Date
GB0324946A Expired - Fee Related GB2391100B (en) 1999-02-05 2000-02-04 System and method for automating transcription services
GB0324945A Expired - Fee Related GB2390930B (en) 1999-02-05 2000-02-04 System and method for automating transcription services

Family Applications Before (1)

Application Number Title Priority Date Filing Date
GB0324946A Expired - Fee Related GB2391100B (en) 1999-02-05 2000-02-04 System and method for automating transcription services

Country Status (1)

Country Link
GB (2) GB2391100B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7689420B2 (en) 2006-04-06 2010-03-30 Microsoft Corporation Personalizing a context-free grammar using a dictation language model
US7752152B2 (en) 2006-03-17 2010-07-06 Microsoft Corporation Using predictive user models for language modeling on a personal device with user behavior models based on statistical modeling
US8032375B2 (en) 2006-03-17 2011-10-04 Microsoft Corporation Using generic predictive models for slot values in language modeling

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106409296A (en) * 2016-09-14 2017-02-15 安徽声讯信息技术有限公司 Voice rapid transcription and correction system based on multi-core processing technology

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5031113A (en) * 1988-10-25 1991-07-09 U.S. Philips Corporation Text-processing system
GB2302199A (en) * 1996-09-24 1997-01-08 Allvoice Computing Plc Text processing

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6122613A (en) * 1997-01-30 2000-09-19 Dragon Systems, Inc. Speech recognition using multiple recognizers (selectively) applied to the same input sample
US7062441B1 (en) * 1999-05-13 2006-06-13 Ordinate Corporation Automated language assessment using speech recognition modeling

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5031113A (en) * 1988-10-25 1991-07-09 U.S. Philips Corporation Text-processing system
GB2302199A (en) * 1996-09-24 1997-01-08 Allvoice Computing Plc Text processing


Also Published As

Publication number Publication date
GB2391100A8 (en) 2004-06-07
GB2390930B (en) 2004-03-10
GB2390930A8 (en) 2004-06-07
GB0324946D0 (en) 2003-11-26
GB2391100A (en) 2004-01-28
GB2391100B (en) 2004-03-17
GB0324945D0 (en) 2003-11-26

Similar Documents

Publication Publication Date Title
CA2351705C (en) System and method for automating transcription services
EP1183680B1 (en) Automated transcription system and method using two speech converting instances and computer-assisted correction
US7006967B1 (en) System and method for automating transcription services
US6961699B1 (en) Automated transcription system and method using two speech converting instances and computer-assisted correction
US7516070B2 (en) Method for simultaneously creating audio-aligned final and verbatim text with the assistance of a speech recognition program as may be useful in form completion using a verbal entry method
US6728680B1 (en) Method and apparatus for providing visual feedback of speed production
US20060190249A1 (en) Method for comparing a transcribed text file with a previously created file
US20080255837A1 (en) Method for locating an audio segment within an audio file
US20020095290A1 (en) Speech recognition program mapping tool to align an audio file to verbatim text
Berweck It worked yesterday: On (re-) performing electroacoustic music
US20050131559A1 (en) Method for locating an audio segment within an audio file
JP2009522614A (en) Method and system for text editing and score reproduction
CA2362462A1 (en) System and method for automating transcription services
GB2390930A (en) Foreign language speech recognition
Baume et al. A contextual study of semantic speech editing in radio production
CN118200299A (en) Metaverse conference hosting method, device, equipment, storage medium and program product
AU2004233462B2 (en) Automated transcription system and method using two speech converting instances and computer-assisted correction
Turunen et al. Mobidic-a mobile dictation and notetaking application.
Apperley et al. Application of imperfect speech recognition to navigation and editing of audio documents
JP2021140084A (en) Voice recognition error correction support device, program and method, and voice recognition device

Legal Events

Date Code Title Description
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1062498

Country of ref document: HK

REG Reference to a national code

Ref country code: HK

Ref legal event code: GR

Ref document number: 1062498

Country of ref document: HK

PCNP Patent ceased through non-payment of renewal fee

Effective date: 20070204