CN108733649B - Method, device and system for inserting voice recognition text into script document - Google Patents

Info

Publication number
CN108733649B
Authority
CN
China
Prior art keywords
text recognition
text
content
information
current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810377108.0A
Other languages
Chinese (zh)
Other versions
CN108733649A (en
Inventor
卢闪明
张亚鹏
李行
单衍景
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING HUAXIA DENTSU TECHNOLOGY CO LTD
Original Assignee
BEIJING HUAXIA DENTSU TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING HUAXIA DENTSU TECHNOLOGY CO LTD
Priority to CN201810377108.0A
Publication of CN108733649A
Application granted
Publication of CN108733649B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Telephonic Communication Services (AREA)

Abstract

An embodiment of the application discloses a method, a device, and a system for inserting speech recognition text into a transcript document. The method comprises: receiving current text recognition information of a target audio substream, the current text recognition information comprising text recognition content, a text recognition state identifier, and a text length; and inserting the text recognition content into the corresponding position of the transcript document according to the text recognition state identifier of the current text recognition information. Under this scheme, the text recognition content returned by the recognition server is inserted into the transcript document promptly, whether or not it has been confirmed. This addresses the problem that differences in speakers' language habits cannot be corrected in a uniform manner, as well as the problem that text is inserted into the transcript document slowly when network or server issues delay confirmation of the recognized text, and it greatly improves the user experience.

Description

Method, device and system for inserting voice recognition text into script document
Technical Field
The present application relates to the field of speech recognition technologies, and in particular, to a method, an apparatus, and a system for inserting a speech recognition text into a script document.
Background
With the development of speech recognition technology, it is used more and more widely across industries. For example, during a court trial or a meeting, speech recognition can convert speech into text and insert the text into the transcript document in different colors in real time. This greatly reduces the workload of the recording personnel, avoids omissions and mishearings, and can even replace the recording personnel entirely, saving labor.
In the speech recognition process, the recognition server obtains the audio stream of the role currently speaking and generates recognized text for that audio stream progressively, by repeatedly slicing the audio stream and analyzing it in combination with the surrounding context and semantics. If the text recognition content in the text recognition information cannot be confirmed, the recognition server repeats the recognition processing on the current audio stream until the text recognition content is confirmed; until then, the text recognition content is not inserted into the transcript document. During recognition, if the speaker talks too fast and pauses only briefly, the recognition server's automatic sentence segmentation goes wrong (the audio streams corresponding to two sentences are treated as one sentence). The recognition server then needs more rounds of comparison and analysis on the current audio stream to reach recognized text in the final confirmed state, so the user experience is poor.
Disclosure of Invention
The embodiments of the application aim to provide a method, a device, and a system for inserting speech recognition text into a transcript document, in order to solve the technical problem that the existing transcript insertion experience is poor.
In order to achieve the above object, an embodiment of the present application provides a method for inserting a speech recognition text into a transcript document, including:
receiving current text recognition information of a target audio substream, the current text recognition information comprising text recognition content, a text recognition state identifier, and a text length;
and inserting the corresponding text recognition content into the corresponding position of the transcript document according to the text recognition state identifier of the current text recognition information.
Preferably, the step of inserting the corresponding text recognition content into the corresponding position of the transcript document according to the text recognition state identifier of the current text recognition information includes:
if the text recognition state identifier in the current text recognition information is a non-confirmation identifier and the text recognition state identifier in the previous text recognition information is a non-confirmation identifier, inserting the text recognition content of the current text recognition information into the corresponding position of the transcript document according to the text length and text recognition content in the previous text recognition information and the text length and text recognition content in the current text recognition information;
if the text recognition state identifier in the current text recognition information is a non-confirmation identifier and the text recognition state identifier in the previous text recognition information is a confirmation identifier, inserting the text recognition content of the current text recognition information into the corresponding position of the transcript document;
if the text recognition state identifier in the current text recognition information is a confirmation identifier and the text recognition state identifier in the previous text recognition information is a non-confirmation identifier, inserting the text recognition content of the current text recognition information into the corresponding position of the transcript document according to the text length and text recognition content in the previous text recognition information and the text length and text recognition content in the current text recognition information;
and if the text recognition state identifier in the current text recognition information is a confirmation identifier and the text recognition state identifier in the previous text recognition information is a confirmation identifier, inserting the text recognition content of the current text recognition information into the corresponding position of the transcript document.
Preferably, the step of inserting the text recognition content of the current text recognition information into the corresponding position of the transcript document according to the text length and text recognition content in the previous text recognition information and the text length and text recognition content in the current text recognition information comprises:
comparing the portion of the current text recognition content that runs from its starting position to the position given by the text length of the previous text recognition information against the text recognition content of the previous text recognition information; if the two are the same, removing that portion from the current text recognition content and inserting the remaining content after the previous text recognition content in the transcript document; and if they differ, deleting the previous text recognition content from the transcript document and inserting the current text recognition content at the position it occupied.
Preferably, the step of inserting the text recognition content of the current text recognition information into the corresponding position of the transcript document comprises:
if the text recognition state identifier in the previous text recognition information is a non-confirmation identifier and the text recognition state identifier in the current text recognition information is a non-confirmation identifier, obtaining the insertion position of the text recognition content in the current text recognition information from the bookmark used when the text recognition content in the previous text recognition information was inserted, inserting the text recognition content in the current text recognition information at that position, and updating the range covered by the bookmark;
and if the text recognition state identifier in the previous text recognition information is a confirmation identifier, obtaining the insertion position of the text recognition content in the current text recognition information through a positioning function, inserting the text recognition content in the current text recognition information at that position, removing the shading effect of the bookmark that covers the text inserted for the previous text recognition information, and creating a new bookmark that covers the position area of the text recognition content in the current text recognition information.
In order to achieve the above object, an embodiment of the present application further provides a method for inserting a speech recognition text into a transcript document, including:
receiving an audio stream;
segmenting the audio stream to obtain audio substreams;
determining the target audio substream that currently needs to be recognized according to the text recognition state identifier in the previous text recognition information;
recognizing the target audio substream to obtain current text recognition information, the current text recognition information comprising text recognition content, a text recognition state identifier, and a text length;
and sending the current text recognition information to the transcript-insertion end, so that the text recognition content in the current text recognition information is inserted into the transcript document.
Preferably, the step of determining the target audio substream currently to be identified comprises:
if the text recognition state identifier in the previous text recognition information is a non-confirmed identifier, the target audio substream to be recognized currently is the audio substream corresponding to the previous text recognition information;
and if the text recognition state identifier in the last text recognition information is the confirmation identifier, the target audio substream needing to be recognized currently is the next audio substream.
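The selection rule above can be illustrated with a minimal Python sketch. The function name `next_target_index` and the use of integer indices for audio substreams are assumptions made for illustration; the source does not say how substreams are addressed.

```python
def next_target_index(current_index: int, previous_confirmed: bool) -> int:
    """Return the index of the audio substream to recognize next.

    current_index: index of the substream that produced the previous
        text recognition information (hypothetical addressing scheme).
    previous_confirmed: True if the previous text recognition
        information carried the confirmation identifier.
    """
    # Not confirmed: recognize the same substream again.
    # Confirmed: move on to the next substream.
    return current_index + 1 if previous_confirmed else current_index
```

With this rule, a substream is retried for as long as its result stays unconfirmed, and the server advances only after a confirmation.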
In order to achieve the above object, an embodiment of the present application provides an apparatus for inserting a speech recognition text into a transcript document, including:
a receiving unit, configured to receive current text recognition information of a target audio substream, the current text recognition information comprising text recognition content, a text recognition state identifier, and a text length;
and an insertion unit, configured to insert the corresponding text recognition content into the corresponding position of the transcript document according to the text recognition state identifier of the current text recognition information.
Preferably, the insertion unit includes:
a first insertion module, configured to insert the text recognition content of the current text recognition information into the corresponding position of the transcript document according to the text length and text recognition content in the previous text recognition information and the text length and text recognition content in the current text recognition information, if the text recognition state identifier in the current text recognition information is a non-confirmation identifier and the text recognition state identifier in the previous text recognition information is a non-confirmation identifier;
a second insertion module, configured to insert the text recognition content of the current text recognition information into the corresponding position of the transcript document, if the text recognition state identifier in the current text recognition information is a non-confirmation identifier and the text recognition state identifier in the previous text recognition information is a confirmation identifier;
a third insertion module, configured to insert the text recognition content of the current text recognition information into the corresponding position of the transcript document according to the text length and text recognition content in the previous text recognition information and the text length and text recognition content in the current text recognition information, if the text recognition state identifier in the current text recognition information is a confirmation identifier and the text recognition state identifier in the previous text recognition information is a non-confirmation identifier;
and a fourth insertion module, configured to insert the text recognition content of the current text recognition information into the corresponding position of the transcript document, if the text recognition state identifier in the current text recognition information is a confirmation identifier and the text recognition state identifier in the previous text recognition information is a confirmation identifier.
In order to achieve the above object, an apparatus for inserting a speech recognition text into a script document according to an embodiment of the present application includes:
a receiving unit for receiving an audio stream;
a segmentation unit, configured to segment the audio stream to obtain audio substreams;
a target audio substream confirmation unit, configured to determine the target audio substream that currently needs to be recognized according to the text recognition state identifier in the previous text recognition information;
a recognition unit, configured to recognize the target audio substream to obtain current text recognition information, the current text recognition information comprising text recognition content, a text recognition state identifier, and a text length;
and a sending unit, configured to send the current text recognition information to the transcript-insertion end, so that the text recognition content in the current text recognition information is inserted into the transcript document.
Preferably, the target audio substream confirmation unit includes:
the first confirmation module is used for judging that the current target audio substream to be recognized is the audio substream corresponding to the previous text recognition information if the text recognition state identifier in the previous text recognition information is a non-confirmation identifier;
and the second confirmation module is used for determining that the target audio substream needing to be identified currently is the next audio substream if the text identification state identifier in the previous text identification information is the confirmation identifier.
As can be seen from the above, recognized text is returned too slowly because of speakers' speaking habits, the network and recognition server configuration, and the practice of returning text recognition content only once recognition is confirmed, which results in a poor user experience. Under the present scheme, the text recognition content returned by the recognition server is inserted into the transcript document promptly, whether or not it has been confirmed. This solves the problem that differences in speakers' language habits cannot be corrected in a uniform manner, and it also solves the problem that text recognition content is inserted into the transcript document slowly when network or server issues delay confirmation of the recognized text, greatly improving the user experience.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only some embodiments described in the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic diagram of a system for inserting a speech recognition text into a transcript document according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating a method for inserting a speech recognition text into a transcript document according to an embodiment of the present application;
FIG. 3 is a second flowchart of a method for inserting a speech recognition text into a transcript document according to an embodiment of the present application;
FIG. 4 is a functional block diagram of an apparatus for inserting a speech recognition text into a transcript document according to an embodiment of the present application;
FIG. 5 is a second functional block diagram of an apparatus for inserting a speech recognition text into a transcript document according to an embodiment of the present application;
FIG. 6 is a schematic view of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art without any inventive work based on the embodiments in the present application shall fall within the scope of protection of the present application.
Fig. 1 is a schematic diagram of a system for inserting a speech recognition text into a transcript document according to an embodiment of the present application. The system comprises a transcript-insertion terminal and a speech recognition server. The speech recognition server acquires an audio stream from the voice collector and, after noise processing, segments it into a plurality of audio substreams. The speech recognition server performs recognition processing on each audio substream, builds the result into text recognition information, and sends the text recognition information to the transcript-insertion terminal whether or not the recognized content of the audio substream has been confirmed. If the recognized content of the current audio substream is confirmed, the speech recognition server can proceed to recognize the next audio substream. If it is still unconfirmed, the speech recognition server continues to recognize the current audio substream. The transcript-insertion terminal inserts the text recognition content from the returned text recognition information into the corresponding position of the transcript document according to the text recognition state.
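The server behavior described above can be sketched in Python as follows. This is only an illustration under stated assumptions: `serve`, `recognize`, `send`, and `fake_recognize` are hypothetical names, substreams are represented as plain strings, and `max_passes` is a safety bound added here that the source does not mention.

```python
from typing import Callable, List, Tuple

# (text recognition content, state identifier, text length);
# state 1 = confirmed, 0 = unconfirmed, as in the embodiment.
Info = Tuple[str, int, int]


def serve(substreams: List[str],
          recognize: Callable[[str, int], Tuple[str, int]],
          send: Callable[[Info], None],
          max_passes: int = 10) -> None:
    """Recognize each substream and send every result, confirmed or not."""
    for substream in substreams:
        # max_passes is an assumed bound; if it is reached without a
        # confirmation, this sketch simply moves on to the next substream.
        for attempt in range(max_passes):
            content, state = recognize(substream, attempt)
            # The result is sent regardless of its state.
            send((content, state, len(content)))
            if state == 1:
                break  # confirmed: move on to the next substream


def fake_recognize(substream: str, attempt: int) -> Tuple[str, int]:
    """Stand-in recognizer: reveals two more characters per pass."""
    end = min(len(substream), 2 * (attempt + 1))
    return substream[:end], int(end == len(substream))


sent: List[Info] = []
serve(["abcde", "xy"], fake_recognize, sent.append)
```

Here `sent` receives two unconfirmed partial results for the first substream before its confirmed result, which is the early delivery the scheme relies on.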
This technical scheme applies to scenarios in which only one role speaks at a time. The transcript-insertion terminal creates a storage unit and stores in it the text recognition information returned by the recognition server. The storage unit holds information such as the speech content and the recognition state identifier. Each time real-time recognized text is received, the transcript-insertion terminal calculates the text insertion position from the recognition state identifier held in the storage unit, so that the recognized text of a single role's speech is inserted into the corresponding position of the transcript document.
In this embodiment, the unconfirmed state of recognized content means the following: the speech recognition server has performed slicing analysis and other recognition operations on the acquired audio stream to generate text recognition content, but that content is only part of the final text for the current audio substream, and individual fields in it still need to be corrected through re-recognition. The confirmed state means the following: the recognition server has performed slicing analysis and other recognition operations on the acquired audio stream to generate text recognition content that, in combination with contextual semantic analysis, has been finally confirmed and does not need to be recognized again.
Based on the above description, an embodiment of the present application provides a method for inserting a speech recognition text into a transcript document, as shown in fig. 2. The method is executed by the transcript-insertion terminal, which can be, for example, a desktop computer, a tablet computer, a notebook computer, a smart phone, a digital assistant, an intelligent wearable device, a shopping guide terminal, a television, or another device with a data processing function. Alternatively, the terminal may be software capable of running on such an electronic device. Consistent with the application scenario stated above, the method applies to the case in which a single role speaks at a time, and can comprise the following steps:
Step 201): receiving current text recognition information of a target audio substream, the current text recognition information comprising text recognition content, a text recognition state identifier, and a text length.
In this technical scheme, the text recognition content is the speech content recognized in the current target audio substream. The text recognition state identifier indicates whether that recognized speech content needs to be recognized again. In this embodiment, a text recognition state identifier of 1 means that the speech content recognized in the current audio substream has been finally confirmed, in combination with contextual semantic analysis, as text that does not need to be recognized again. A text recognition state identifier of 0 means that the speech content recognized in the current audio substream is only part of the final text for that substream, and individual fields in it still need to be corrected through re-recognition. The text length is the length of the speech content that the recognition server has recognized in the current target audio substream.
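As an illustration, the three fields could be modeled as a small Python record. The names `TextRecognitionInfo`, `make_info`, `CONFIRMED`, and `UNCONFIRMED` are hypothetical, and measuring the text length in characters is an assumption; the source does not state the unit.

```python
from dataclasses import dataclass

CONFIRMED = 1    # no further recognition needed
UNCONFIRMED = 0  # partial text that may still be corrected


@dataclass(frozen=True)
class TextRecognitionInfo:
    content: str  # text recognition content
    state: int    # text recognition state identifier (1 or 0)
    length: int   # text length (assumed here to be a character count)

    @property
    def confirmed(self) -> bool:
        return self.state == CONFIRMED


def make_info(content: str, state: int) -> TextRecognitionInfo:
    """Build a record, deriving the length from the content."""
    if state not in (CONFIRMED, UNCONFIRMED):
        raise ValueError("state must be 0 (unconfirmed) or 1 (confirmed)")
    return TextRecognitionInfo(content, state, len(content))


partial = make_info("hello", UNCONFIRMED)
final = make_info("hello world", CONFIRMED)
```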
In this embodiment, the transcript-insertion terminal provides a storage unit on its processor dedicated to storing the current text recognition information returned by the speech recognition server. The storage unit is divided into several storage areas, and the different parts of the text recognition information are stored in different areas, for example one area dedicated to the text recognition state identifier and another to the text recognition content. In this technical scheme, the storage unit holds the previous text recognition information. When the transcript-insertion terminal receives the current text recognition information returned by the speech recognition server, it inserts the text recognition content of the current text recognition information into the transcript document according to the text recognition state identifiers of the current and previous text recognition information, deletes the previous text recognition information from the storage unit, and stores the current text recognition information there. The transcript-insertion terminal also has a memory that stores the result of insertion into the transcript document. The previous text recognition information held in the storage unit is used to determine the insertion position accurately when text recognition content is inserted, so the memory and the storage unit hold different content.
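A minimal Python sketch of such a storage unit follows. The class name `StorageUnit` and its `swap` method are hypothetical, and the separate storage areas are reduced to the fields of one tuple. The sketch stores the current information and hands back the previous one in a single call, which has the same effect as the delete-then-store sequence described above.

```python
from typing import Optional, Tuple

# (text recognition content, state identifier, text length)
Info = Tuple[str, int, int]


class StorageUnit:
    """Holds only the most recent text recognition information."""

    def __init__(self) -> None:
        self._previous: Optional[Info] = None

    def swap(self, current: Info) -> Optional[Info]:
        """Store `current` and return the information it replaces.

        Returns None for the first message, when nothing is stored yet.
        """
        previous, self._previous = self._previous, current
        return previous


unit = StorageUnit()
first = unit.swap(("he", 0, 2))      # nothing stored before the first message
second = unit.swap(("hello", 1, 5))  # returns the first message
```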
Step 202): inserting the corresponding text recognition content into the corresponding position of the transcript document according to the text recognition state identifier of the current text recognition information.
In this technical scheme, the step of inserting the corresponding text recognition content into the corresponding position of the transcript document according to the text recognition state identifier of the current text recognition information includes:
if the text recognition state identifier in the current text recognition information is a non-confirmation identifier and the text recognition state identifier in the previous text recognition information is a non-confirmation identifier, inserting the text recognition content of the current text recognition information into the corresponding position of the transcript document according to the text length and text recognition content in the previous text recognition information and the text length and text recognition content in the current text recognition information;
if the text recognition state identifier in the current text recognition information is a non-confirmation identifier and the text recognition state identifier in the previous text recognition information is a confirmation identifier, inserting the text recognition content of the current text recognition information into the corresponding position of the transcript document;
if the text recognition state identifier in the current text recognition information is a confirmation identifier and the text recognition state identifier in the previous text recognition information is a non-confirmation identifier, inserting the text recognition content of the current text recognition information into the corresponding position of the transcript document according to the text length and text recognition content in the previous text recognition information and the text length and text recognition content in the current text recognition information;
and if the text recognition state identifier in the current text recognition information is a confirmation identifier and the text recognition state identifier in the previous text recognition information is a confirmation identifier, inserting the text recognition content of the current text recognition information into the corresponding position of the transcript document.
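The four cases can be summarized as a lookup table. The Python sketch below is illustrative only: the strategy labels `compare_with_previous` and `insert_directly` and the function `choose_strategy` are hypothetical names, with states encoded as 1 (confirmed) and 0 (unconfirmed) as in the embodiment.

```python
# Keys are (previous_state, current_state); 1 = confirmed, 0 = unconfirmed.
STRATEGY = {
    (0, 0): "compare_with_previous",  # case 1: use lengths and contents
    (1, 0): "insert_directly",        # case 2
    (0, 1): "compare_with_previous",  # case 3: use lengths and contents
    (1, 1): "insert_directly",        # case 4
}


def choose_strategy(previous_state: int, current_state: int) -> str:
    """Return the insertion strategy for a pair of state identifiers."""
    return STRATEGY[(previous_state, current_state)]
```

Read this way, the choice between comparing and inserting directly depends only on the previous state: an unconfirmed previous result means the current result covers the same audio substream again.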
In this technical scheme, the step of inserting the text recognition content of the current text recognition information into the corresponding position of the transcript document according to the text length and text recognition content in the previous text recognition information and the text length and text recognition content in the current text recognition information comprises the following steps:
comparing the portion of the current text recognition content that runs from its starting position to the position given by the text length of the previous text recognition information against the text recognition content of the previous text recognition information; if the two are the same, removing that portion from the current text recognition content and inserting the remaining content after the previous text recognition content in the transcript document; and if they differ, deleting the previous text recognition content from the transcript document and inserting the current text recognition content at the position it occupied.
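The comparison step can be sketched in Python on a plain string. This is a simplified model under stated assumptions: the transcript document is represented as a `str`, `start` is the offset at which the previous content was inserted, and the name `merge_with_previous` is hypothetical.

```python
def merge_with_previous(document: str, start: int,
                        previous: str, current: str) -> str:
    """Insert `current`, given that document[start:start + len(previous)]
    holds the previously inserted content `previous`."""
    end = start + len(previous)
    if current[:len(previous)] == previous:
        # Same leading portion: keep the previous text and insert only
        # the remainder of the current content after it.
        return document[:end] + current[len(previous):] + document[end:]
    # Different: delete the previous text and put the current content
    # at the position it occupied.
    return document[:start] + current + document[end:]


doc = "A: hello"
extended = merge_with_previous(doc, 3, "hello", "hello world")
corrected = merge_with_previous(doc, 3, "hello", "yellow")
```

In the first call only " world" is added to the document; in the second, "hello" is replaced by "yellow" because the leading portion differs.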
In the technical scheme, the step of inserting the text recognition content of the current text recognition information into the corresponding position of the record document comprises the following steps:
if the text recognition state identifier in the previous text recognition information is a non-confirmation identifier and the text recognition state identifier in the current text recognition information is a non-confirmation identifier, obtaining the insertion position of the text recognition content in the current text recognition information through the bookmark used when the text recognition content in the previous text recognition information was inserted, inserting the text recognition content in the current text recognition information into the corresponding position, and updating the range covered by the bookmark;
and if the text recognition state identifier in the previous text recognition information is a confirmation identifier, obtaining the insertion position of the text recognition content in the current text recognition information through a positioning function, inserting the text recognition content in the current text recognition information into the corresponding position, removing the shading effect from the text covered by the bookmark used when the text recognition content in the previous text recognition information was inserted, and creating a new corresponding bookmark that covers the position area of the text recognition content in the current text recognition information.
Specifically, to describe in detail how text is inserted into the record document when a single role is speaking, the processing flow of the insertion record terminal is as follows:
1. Role A speaks for the first time. The audio collector captures the speech of role A to obtain an audio stream, and the recognition server segments the audio stream to obtain audio substreams. The recognition server recognizes the first audio substream and returns, for the first time, the text recognition content Sa1, the text recognition state identifier Ta1 and the text length L1. A storage unit corresponding to role A is created to store the text recognition content, the text recognition state identifier and the text length in the text recognition information. If Ta1 is 1, the recognition server regards the currently returned text recognition content Sa1 as confirmed text; if Ta1 is 0, the recognition server regards Sa1 as unconfirmed text. In either case, the text recognition content Sa1 and the text recognition state identifier Ta1 in the currently returned text recognition information are stored in the storage unit, the text recognition content Sa1 is inserted into the record document, and the terminal waits for the recognition server to return the next text recognition information.
2. The recognition server returns the next text recognition information, comprising the text recognition content Sa2, the text recognition state identifier Ta2 and the text length L2. If the text recognition state identifier Ta1 of the previous text recognition information is 1, the current text recognition information is obtained by the recognition server recognizing the next audio substream. If Ta1 is 0, the current text recognition information is obtained by the recognition server recognizing again the audio substream corresponding to the previous text recognition information.
2.1 if Ta2 is 0, that is, the recognized text in the current text recognition information is unconfirmed, the content of length L1 starting from the start position of the text recognition content Sa2 is compared with the text recognition content Sa1. If the comparison result is the same, the remaining portion S21 of Sa2, of length L2-L1, is obtained by string truncation and appended to the end of the text in the record document; if the comparison result is different, Sa2 is not truncated and is inserted into the record document by overlay insertion (deleting Sa1 and inserting Sa2). The contents of the storage unit are then updated: the text recognition content Sa1, the text recognition state identifier Ta1 and the text length L1 of the previous text recognition information are deleted, and the text recognition content Sa2, the text recognition state identifier Ta2 and the text length L2 of the current text recognition information are stored in the storage unit.
2.2 if Ta2 is 1, that is, the recognized text in the current text recognition information is confirmed, with text length L2 and text recognition content Sa2, the content of length L1 starting from the start position of Sa2 is compared with Sa1. If the comparison result is the same, the remaining portion of Sa2, of length L2-L1, is obtained by string truncation and appended to the end of the text in the record document; if the comparison result is different, Sa2 is not truncated and is inserted into the record document by overlay insertion (deleting the Sa1 content in the document and inserting Sa2). Meanwhile, the text recognition content Sa1, the text recognition state identifier Ta1 and the text length L1 of the previous text recognition information are deleted, and the text recognition content Sa2, the text recognition state identifier Ta2 and the text length L2 of the current text recognition information are stored in the storage unit.
3. The recognition server returns the third text recognition information, and the insertion record terminal receives it. If the text recognition state identifier Ta2 in the previous text recognition information is 0, the insertion logic of step 2.1 in the processing flow of the insertion record terminal is executed, regardless of whether the text recognition state identifier Ta3 in the third text recognition information is 1 or 0. If Ta2 is 1, the text processing and insertion logic restarts from step 1 of the processing flow. Finally, the text recognition content Sa2, the text recognition state identifier Ta2 and the text length L2 of the previous text recognition information are deleted, and the text recognition content Sa3, the text recognition state identifier Ta3 and the text length L3 of the current text recognition information are stored in the storage unit.
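The processing flow above amounts to a small state machine keyed on the state identifier of the previous result. The following is a minimal sketch of one reading of steps 1 to 3, not the implementation of the embodiment: the names `RecognitionInfo` and `RoleTranscript` are hypothetical, the record document is modelled as a plain string, and `length` is assumed to equal the character count of `content`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecognitionInfo:
    content: str     # text recognition content (Sa1, Sa2, ...)
    confirmed: bool  # text recognition state identifier (True for 1, False for 0)
    length: int      # text length (L1, L2, ...), assumed equal to len(content)


class RoleTranscript:
    """Hypothetical storage unit for one role plus a string stand-in for the document."""

    def __init__(self) -> None:
        self.document = ""
        self.previous: Optional[RecognitionInfo] = None

    def receive(self, cur: RecognitionInfo) -> None:
        prev = self.previous
        if prev is None or prev.confirmed:
            # Step 1: no pending unconfirmed text, so insert the new content directly.
            self.document += cur.content
        elif cur.content[: prev.length] == prev.content:
            # Steps 2.1/2.2, same prefix: append only the tail of length L2 - L1.
            self.document += cur.content[prev.length:]
        else:
            # Steps 2.1/2.2, different prefix: overlay insertion (delete old, insert new).
            self.document = self.document[: len(self.document) - prev.length] + cur.content
        # Overwrite the stored previous result with the current one.
        self.previous = cur
```

Feeding the sketch the results "how are" (unconfirmed), "how are you" (unconfirmed), "how old are you" (confirmed) and " today" (unconfirmed) exercises the append, overlay and restart branches in turn.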
When text recognition content is inserted into the record document, a shading effect is added in real time to the text newly inserted for each role, and the shading effect is removed from text in the document whose recognition state in the previously returned text recognition content was confirmed, so that the shading always follows the most recently inserted recognition text. Specifically, the step of inserting the corresponding text recognition content into the corresponding position of the record document according to the text recognition state identifier of the current text recognition information further includes:
after inserting the text recognition content in the current text recognition information into the corresponding position, judging the text recognition state identification of the previous text recognition information, if the text recognition state identification of the previous text recognition information is a confirmation identification, removing the shading effect of the text recognition content of the previous text recognition information, inserting the text recognition content in the current text recognition information, and setting the shading effect; and if the text recognition state identifier of the last text recognition information is a non-confirmed identifier, inserting the text recognition content in the current text recognition information, and setting a shading effect.
Continuing the above example, the logic for adding the shading effect to each text recognition content while role A is speaking is as follows:
1. The insertion record terminal receives the first text recognition information of role A returned by the recognition server, creates a storage unit corresponding to role A, and stores the text recognition content Sa1, the text recognition state identifier Ta1 and the text length L1 of the current text recognition information in an overwriting manner.
1.1 if Ta1 is equal to 0, the insertion position of the text recognition content of the current text recognition information is calculated through the relevant positioning function provided by WordAPI, the text recognition content is inserted into the record document, a bookmark (Bookmark) B<a> corresponding to role A and containing Sa1 is created, and the corresponding shading effect is added to the inserted text content through the bookmark. Go to step 2 to continue the logic flow.
1.2 if Ta1 is equal to 1, the insertion position of the text recognition content of the current text recognition information is calculated through the relevant positioning function provided by WordAPI, the text recognition content is inserted into the record document, a bookmark (Bookmark) B<a> corresponding to role A and containing Sa1 is created, and the corresponding shading effect is added to the inserted text content through the bookmark. Return to step 1 to continue the logic flow.
2. The recognition server returns the second text recognition information to the insertion record terminal. The insertion record terminal stores the text recognition content Sa2, the text recognition state identifier Ta2 and the text length L2 of the current text recognition information in the storage unit in an overwriting manner, and deletes the text recognition content Sa1, the text recognition state identifier Ta1 and the text length L1 of the first text recognition information.
2.1 if the text recognition state identifier Ta2 is equal to 0, the insertion position of the text recognition content Sa2 is calculated through bookmark B<a>, the text recognition content Sa2 is inserted into the record document, the range covered by bookmark B<a> is updated, and the corresponding shading effect is added to the updated bookmark B<a>. Go to step 3 to continue the logic flow.
2.2 if the text recognition state identifier Ta2 is equal to 1, the insertion position of the text recognition content Sa2 is calculated through bookmark B<a>, the text recognition content Sa2 is inserted into the record document, the range covered by bookmark B<a> is updated, and the corresponding shading effect is added to the updated bookmark B<a>. Go to step 3 to continue the logic flow.
3. The recognition server returns the third text recognition information to the insertion record terminal. The insertion record terminal stores the text recognition content Sa3, the text recognition state identifier Ta3 and the text length L3 of the current text recognition information in the storage unit in an overwriting manner, and deletes the text recognition content Sa2, the text recognition state identifier Ta2 and the text length L2 of the second text recognition information.
3.1 if Ta2 is equal to 0 and Ta3 is equal to 0, perform step 2.1.
3.2 if Ta2 is equal to 0 and Ta3 is equal to 1, perform step 2.2.
3.3 if Ta2 = 1 and Ta3 = 0, or Ta2 = 1 and Ta3 = 1, the insertion logic flow is executed again from step 1, and the shading effect of the text contained in bookmark B<a> is cleared.
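The bookmark and shading logic above can be illustrated with an in-memory model. This sketch does not call any Word interface: `ShadedDocument` is a hypothetical class in which the bookmark B<a> is a character range and the shaded text is whatever that range covers. For brevity, the unconfirmed branch always overwrites the bookmarked range, which yields the same document text as the append-or-overlay comparison described earlier.

```python
from typing import Optional, Tuple


class ShadedDocument:
    """Stand-in for a record document with one bookmark range whose text is shaded."""

    def __init__(self) -> None:
        self.text = ""
        self.bookmark: Optional[Tuple[int, int]] = None  # [start, end) range of B<a>
        self.prev_confirmed = True  # nothing is pending before the first result

    def insert(self, content: str, confirmed: bool) -> None:
        if self.bookmark is None or self.prev_confirmed:
            # Previous result was final: the old shading is dropped, the insertion
            # position is the end of the document, and a new bookmark is created.
            start = len(self.text)
            self.text += content
        else:
            # Previous result was not final: the bookmark gives the insertion
            # position, its text is replaced, and its range is updated.
            start, end = self.bookmark
            self.text = self.text[:start] + content + self.text[end:]
        self.bookmark = (start, start + len(content))
        self.prev_confirmed = confirmed

    def shaded_text(self) -> str:
        """Return the text currently covered by the bookmark, i.e. the shaded text."""
        if self.bookmark is None:
            return ""
        start, end = self.bookmark
        return self.text[start:end]
```

In this model the shading always tracks the latest inserted result, and text from a confirmed result loses its shading as soon as the next result arrives.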
The embodiment of the application provides another method for inserting speech recognition text into a script document, as shown in Fig. 3. The method is applied to a speech recognition server. Specifically, the speech recognition server may be an electronic device with data computation, storage and network interaction functions, or software that runs in such an electronic device to support data processing, storage and network interaction. The number of servers is not particularly limited in this embodiment: the server may be one server, several servers, or a server cluster formed by several servers. The method for inserting speech recognition text into the script document may comprise the following steps:
Step 301): receiving an audio stream.
In this embodiment, the voice collector collects the voice of a user in the application scene in real time and performs noise reduction on the collected voice to obtain an audio stream.
Step 302): segmenting the audio stream to obtain audio substreams.
In this embodiment, to improve the accuracy of speech recognition, the audio stream fed back by the voice collector is segmented, so that one large audio stream becomes a plurality of small audio substreams. Because the amount of audio data handled in each recognition is then small, the recognition accuracy is greatly improved.
Step 303): determining the target audio substream that currently needs to be recognized according to the text recognition state identifier in the previous text recognition information.
In this technical scheme, if the recognition server cannot confirm the recognition result of the audio information currently being recognized, the recognition result is still fed back to the insertion record terminal, and the unconfirmed content is inserted into the record document. The recognition server then recognizes the same audio information again and, whether or not the new recognition result is confirmed, feeds it back to the insertion record terminal for insertion into the record document. The next audio information is not recognized until the text recognition information of the audio information being processed is confirmed. If the recognition result of the audio information currently being recognized is in a confirmed state, the recognition result is fed back to the insertion record terminal, the confirmed content is inserted into the record document, and the recognition server then recognizes the next audio information.
In a conventional technical scheme, when the recognition server cannot confirm the recognition result of the audio information currently being recognized, nothing is fed back to the insertion record terminal; the recognition result is fed back for insertion only after the recognition server has confirmed it. Insertion therefore takes longer in the conventional scheme than in the present technical scheme, which greatly degrades the user experience. In the present technical scheme, the recognition information is inserted into the record document in real time whether or not it is confirmed, which improves the user experience. Therefore, in the present technical scheme, the step of determining the target audio substream that currently needs to be recognized includes:
if the text recognition state identifier in the previous text recognition information is a non-confirmed identifier, the target audio substream to be recognized currently is the audio substream corresponding to the previous text recognition information;
and if the text recognition state identifier in the last text recognition information is the confirmation identifier, the target audio substream needing to be recognized currently is the next audio substream.
Step 304): recognizing the target audio substream to obtain current text recognition information; the current text recognition information comprises text recognition content, a text recognition state identifier and a text length;
Step 305): sending the current text recognition information to the insertion record terminal, so that the text recognition content in the current text recognition information is inserted into the record document.
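Steps 301) to 305) can be sketched as a generator that keeps re-recognizing the same substream until its result is confirmed and emits every intermediate result. Everything named here is an assumption made for illustration: the fixed-size `segment` rule, the `Recognizer` callable signature, and the toy `fake_recognizer` are hypothetical, and a real recognizer that never confirms would need a retry limit that the sketch omits.

```python
from typing import Callable, Iterator, List, Tuple

# Hypothetical recognizer signature: (audio substream, attempt number) -> (text, confirmed)
Recognizer = Callable[[bytes, int], Tuple[str, bool]]


def segment(audio: bytes, size: int) -> List[bytes]:
    """Step 302: cut the audio stream into fixed-size substreams (assumed rule)."""
    return [audio[i:i + size] for i in range(0, len(audio), size)]


def recognize_stream(
    audio: bytes, size: int, recognizer: Recognizer
) -> Iterator[Tuple[str, bool, int]]:
    """Steps 303-305: yield (content, confirmed, length) for every result."""
    for sub in segment(audio, size):
        attempt = 0
        confirmed = False
        while not confirmed:
            # Previous result unconfirmed: the target stays the same substream.
            attempt += 1
            text, confirmed = recognizer(sub, attempt)
            # Sent to the insertion record terminal whether confirmed or not.
            yield text, confirmed, len(text)
        # Previous result confirmed: the outer loop moves to the next substream.


def fake_recognizer(sub: bytes, attempt: int) -> Tuple[str, bool]:
    """Toy recognizer: a partial guess on the first attempt, confirmed on the second."""
    text = sub.decode()
    return (text, True) if attempt >= 2 else (text[:1], False)


results = list(recognize_stream(b"abcd", 2, fake_recognizer))
```

With the toy recognizer, each two-byte substream produces one unconfirmed result followed by one confirmed result before the next substream is processed.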
In this technical scheme, the text that the recognition server generates while slicing, comparing, analyzing and computing on the audio stream is inserted into the document in real time, whether or not it is confirmed, which solves the problem of poor user experience caused by the slow return of recognized text.
Fig. 4 is a functional block diagram of an apparatus for inserting speech recognition text into a script document according to an embodiment of the present application. In practical application, the apparatus is the insertion record terminal. The apparatus comprises:
a receiving unit 401, configured to receive current text recognition information of a target audio substream; the current text recognition information comprises text recognition content, a text recognition state identifier and a text length;
and an insertion record unit 402, configured to insert corresponding text recognition content into a corresponding position of the record document according to the text recognition state identifier of the current text recognition information.
In this embodiment, the insertion record unit includes:
a first insertion record module, configured to insert the text recognition content of the current text recognition information into the corresponding position of the record document according to the text length and the text recognition content in the previous text recognition information and the text length and the text recognition content in the current text recognition information if the text recognition state identifier in the current text recognition information is a non-confirmation identifier and the text recognition state identifier in the previous text recognition information is a non-confirmation identifier;
a second insertion record module, configured to insert the text recognition content of the current text recognition information into the corresponding position of the record document if the text recognition state identifier in the current text recognition information is a non-confirmation identifier and the text recognition state identifier in the previous text recognition information is a confirmation identifier;
a third insertion record module, configured to insert the text recognition content of the current text recognition information into the corresponding position of the record document according to the text length and the text recognition content in the previous text recognition information and the text length and the text recognition content in the current text recognition information if the text recognition state identifier in the current text recognition information is a confirmation identifier and the text recognition state identifier in the previous text recognition information is a non-confirmation identifier;
and a fourth insertion record module, configured to insert the text recognition content of the current text recognition information into the corresponding position of the record document if the text recognition state identifier in the current text recognition information is a confirmation identifier and the text recognition state identifier in the previous text recognition information is a confirmation identifier.
Fig. 5 is a second functional block diagram of an apparatus for inserting speech recognition text into a script document according to the embodiment of the present application. In practical application, the apparatus is used on the recognition server side. The apparatus comprises:
a receiving unit 501, configured to receive an audio stream;
a segmentation unit 502, configured to segment the audio stream to obtain audio substreams;
a target audio substream confirming unit 503, configured to determine the target audio substream that currently needs to be recognized according to the text recognition state identifier in the previous text recognition information;
a recognition unit 504, configured to recognize the target audio substream to obtain current text recognition information; the current text recognition information comprises text recognition content, a text recognition state identifier and a text length;
a sending unit 505, configured to send the current text recognition information to the insertion record terminal, so that the text recognition content in the current text recognition information is inserted into the record document.
Fig. 6 is a schematic diagram of an electronic device according to an embodiment of the present application. The electronic device includes: a memory a and a processor b, wherein the memory a stores a computer program, and when the computer program is executed by the processor b, the computer program realizes the following functions:
receiving current text recognition information of a target audio substream; the current text recognition information comprises text recognition content, a text recognition state identifier and a text length;
and inserting the corresponding text recognition content into the corresponding position of the record document according to the text recognition state identification of the current text recognition information.
In this embodiment, when the corresponding text recognition content is inserted into the corresponding position of the record document according to the text recognition status identifier of the current text recognition information, the computer program implements the following functions when executed by the processor b:
if the text recognition state identifier in the current text recognition information is a non-confirmed identifier and the text recognition identifier in the previous text recognition information is a non-confirmed identifier, inserting the text recognition content of the current text recognition information into the corresponding position of the record document according to the text length and the text recognition content in the previous text recognition information and the text length and the text recognition content in the current text recognition information;
if the text recognition state identifier in the current text recognition information is a non-confirmation identifier and the text recognition state identifier in the previous text recognition information is a confirmation identifier, inserting the text recognition content of the current text recognition information into the corresponding position of the record document;
if the text recognition state identifier in the current text recognition information is a confirmation identifier and the text recognition state identifier in the previous text recognition information is a non-confirmation identifier, inserting the text recognition content of the current text recognition information into the corresponding position of the record document according to the text length and the text recognition content in the previous text recognition information and the text length and the text recognition content in the current text recognition information;
and if the text recognition state identifier in the current text recognition information is a confirmation identifier and the text recognition identifier in the previous text recognition information is a confirmation identifier, inserting the text recognition content of the current text recognition information into the corresponding position of the record document.
In this embodiment, when inserting the text recognition content of the current text recognition information into the corresponding position of the record document according to the text length and the text recognition content in the previous text recognition information and the text length and the text recognition content in the current text recognition information, the computer program implements the following functions when executed by the processor b:
comparing the portion of the text recognition content of the current text recognition information, from its starting position up to the text length of the previous text recognition information, with the text recognition content of the previous text recognition information; if the comparison result is the same, removing that portion from the text recognition content of the current text recognition information and inserting the remaining content after the text recognition content of the previous text recognition information in the record document; and if the comparison result is different, deleting the text recognition content of the previous text recognition information from the record document and inserting the text recognition content of the current text recognition information at the position it occupied.
In this embodiment, when the corresponding text recognition content is inserted into the corresponding position of the record document according to the text recognition status identifier of the current text recognition information, the computer program implements the following functions when executed by the processor b:
after inserting the text recognition content in the current text recognition information into the corresponding position, judging the text recognition state identification of the previous text recognition information, if the text recognition state identification of the previous text recognition information is a confirmation identification, removing the shading effect of the text recognition content of the previous text recognition information, inserting the text recognition content in the current text recognition information, and setting the shading effect; and if the text recognition state identifier of the last text recognition information is a non-confirmed identifier, inserting the text recognition content in the current text recognition information, and setting a shading effect.
An embodiment of the present application provides another electronic device, where the electronic device includes: a memory a and a processor b, wherein the memory a stores a computer program, and when the computer program is executed by the processor b, the computer program realizes the following functions:
receiving an audio stream;
segmenting the audio stream to obtain audio substreams;
determining a target audio substream needing to be identified currently according to the text identification state identifier in the previous text identification information;
recognizing the target audio substream to obtain current text recognition information; the current text recognition information comprises text recognition content, a text recognition state identifier and a text length;
and sending the current text recognition information to the insertion record terminal, so that the text recognition content in the current text recognition information is inserted into the record document.
In this embodiment, when determining the target audio substream that currently needs to be recognized, the computer program implements the following functions when executed by the processor b:
if the text recognition state identifier in the previous text recognition information is a non-confirmed identifier, the target audio substream to be recognized currently is the audio substream corresponding to the previous text recognition information;
and if the text recognition state identifier in the last text recognition information is the confirmation identifier, the target audio substream needing to be recognized currently is the next audio substream.
In this embodiment, the Memory includes, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Cache (Cache), a Hard Disk Drive (HDD), or a Memory Card (Memory Card).
In this embodiment, the processor may be implemented in any suitable manner. For example, the processor may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so forth.
The specific functions implemented by the memory and the processor of the electronic device provided in the embodiments of this specification can be understood with reference to the foregoing embodiments of this specification and achieve the same technical effects, so they are not described again here.
In the 1990s, an improvement to a technology could be clearly distinguished as either an improvement in hardware (for example, an improvement to a circuit structure such as a diode, a transistor or a switch) or an improvement in software (an improvement to a method flow). However, as technology has advanced, many of today's method-flow improvements can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Thus, it cannot be said that an improvement to a method flow cannot be realized by hardware physical modules. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. A designer programs a digital system onto a single PLD without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Furthermore, instead of manually fabricating integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compilers used in program development; the source code to be compiled must likewise be written in a specific programming language, called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used.
It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can readily be obtained with only a little logic programming of the method flow into an integrated circuit using the above hardware description languages.
Those skilled in the art will also appreciate that, in addition to implementing a client, server as pure computer readable program code, the same functionality can be implemented by logically programming method steps such that the client, server are in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, such a client and a server may be considered as a hardware component, and a device included therein for implementing various functions may be considered as a structure within the hardware component. Or even means for performing the functions may be regarded as being both a software module for performing the method and a structure within a hardware component.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the present application may be embodied, in whole or in part, as a software product, which may be stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disk, and which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments, or in some parts of the embodiments, of the present application.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the embodiments of the client, reference may be made to the introduction of the embodiments of the method described above for a comparative explanation.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
Although the present application has been described in terms of embodiments, those of ordinary skill in the art will recognize that it admits numerous variations and permutations that do not depart from its spirit, and it is intended that the appended claims encompass such variations and permutations.

Claims (8)

1. A method for inserting voice recognition text into a script document, applied to a script insertion terminal, comprising the following steps:
receiving current text recognition information of a target audio substream, wherein the current text recognition information comprises text recognition content, a text recognition state identifier, and a text length;
inserting the corresponding text recognition content into the corresponding position of the script document according to the text recognition state identifier of the current text recognition information, wherein the inserting comprises:
if the text recognition state identifier in the current text recognition information is a non-confirmed identifier and the text recognition state identifier in the previous text recognition information is a non-confirmed identifier, inserting the text recognition content of the current text recognition information into the corresponding position of the script document according to the text length and the text recognition content in the previous text recognition information and the text length and the text recognition content in the current text recognition information;
if the text recognition state identifier in the current text recognition information is a non-confirmed identifier and the text recognition state identifier in the previous text recognition information is a confirmed identifier, inserting the text recognition content of the current text recognition information into the corresponding position of the script document;
if the text recognition state identifier in the current text recognition information is a confirmed identifier and the text recognition state identifier in the previous text recognition information is a non-confirmed identifier, inserting the text recognition content of the current text recognition information into the corresponding position of the script document according to the text length and the text recognition content in the previous text recognition information and the text length and the text recognition content in the current text recognition information;
and if the text recognition state identifier in the current text recognition information is a confirmed identifier and the text recognition state identifier in the previous text recognition information is a confirmed identifier, inserting the text recognition content of the current text recognition information into the corresponding position of the script document.
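The four cases of claim 1 can be read as a two-way choice that depends only on the previous state identifier: a length-and-content comparison when the previous result was non-confirmed, and a plain insertion when it was confirmed. The following is a minimal sketch of that reading, not the patented implementation. The names `RecognitionInfo` and `choose_insert_mode` are hypothetical, and treating a missing previous result as "confirmed" is an assumption the claim does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecognitionInfo:
    """Hypothetical container for one piece of text recognition information."""
    content: str     # text recognition content
    confirmed: bool  # text recognition state identifier (True = confirmed)
    length: int      # text length


def choose_insert_mode(current: RecognitionInfo,
                       previous: Optional[RecognitionInfo]) -> str:
    """Return 'compare' when claim 1 requires the length/content comparison,
    and 'insert' when the current content is inserted directly.

    Assumption: with no previous result, the content is inserted directly.
    """
    if previous is not None and not previous.confirmed:
        # Cases 1 and 3: the previous result was provisional, so the current
        # content must be reconciled with it, whatever the current state is.
        return "compare"
    # Cases 2 and 4: the previous result was final; insert at the new position.
    return "insert"
```

Under this reading, the current state identifier does not change how the text is placed; it only determines how the next result will be handled.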
2. The method of claim 1, wherein the step of inserting the text recognition content of the current text recognition information into the corresponding position of the script document according to the text length and the text recognition content of the previous text recognition information and the text length and the text recognition content of the current text recognition information comprises:
comparing the portion of the text recognition content of the current text recognition information that runs from its starting position to the position matching the text length of the previous text recognition information with the text recognition content of the previous text recognition information; if they are the same, removing that portion from the text recognition content of the current text recognition information and inserting the remaining content after the text recognition content of the previous text recognition information in the script document; and if they differ, deleting the text recognition content of the previous text recognition information from the script document and inserting the text recognition content of the current text recognition information at the position that content occupied.
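The comparison in claim 2 amounts to a prefix check. A minimal sketch follows, assuming the script document is modeled as a plain string whose tail is the previous provisional content; the function name `merge_with_previous` is hypothetical, and `len(previous)` stands in for the transmitted text length.

```python
def merge_with_previous(document: str, previous: str, current: str) -> str:
    """Reconcile the current recognition content with the previous one.

    Assumption: `document` currently ends with `previous`, i.e. the previous
    provisional content is the last text that was inserted.
    """
    if not document.endswith(previous):
        raise ValueError("document does not end with the previous content")

    prefix_len = len(previous)  # plays the role of the previous text length
    if current[:prefix_len] == previous:
        # Same prefix: keep what is already in the document and append
        # only the remaining content.
        return document + current[prefix_len:]

    # Different prefix: delete the previous content and put the current
    # content in its place.
    return document[:len(document) - prefix_len] + current


# Usage: the recognizer first extends, then revises, a provisional result.
doc = "Court is now "
doc = merge_with_previous(doc + "in sess", "in sess", "in session")
doc = merge_with_previous(doc, "in session", "in recess")
```

When the prefix matches, only the new tail is written, so text that is already displayed does not have to be rewritten.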
3. The method of claim 2, wherein the step of inserting the text recognition content of the current text recognition information into the corresponding position of the script document comprises:
if the text recognition state identifier in the previous text recognition information is a non-confirmed identifier and the text recognition state identifier in the current text recognition information is a non-confirmed identifier, obtaining the insertion position of the text recognition content in the current text recognition information from the bookmark used when the text recognition content in the previous text recognition information was inserted, inserting the text recognition content in the current text recognition information into the corresponding position, and updating the range covered by the bookmark;
and if the text recognition state identifier in the previous text recognition information is a confirmed identifier, obtaining the insertion position of the text recognition content in the current text recognition information through a positioning function, inserting the text recognition content in the current text recognition information into the corresponding position, removing the shading effect of the bookmark that covered the text content used when the text recognition content in the previous text recognition information was inserted, and creating a new corresponding bookmark, wherein the new bookmark covers the position area of the text recognition content in the current text recognition information.
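The bookmark handling in claim 3 can be illustrated with a toy document model. This is a sketch under several assumptions, not the behavior of any real word processor API: `ScriptDocument`, `locate_insertion_point`, and `insert` are hypothetical names, the positioning function is assumed to return the end of the document, and the bookmarked range is simply replaced by the current content, which yields the same text as either branch of claim 2. The sketch also branches on the previous state only, which goes slightly beyond the two cases the claim spells out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ScriptDocument:
    """Toy stand-in for a script document with a single bookmark."""
    text: str = ""
    bookmark: Optional[Tuple[int, int]] = None  # [start, end) of provisional text
    shaded: Optional[Tuple[int, int]] = None    # range currently shown with shading

    def locate_insertion_point(self) -> int:
        # Hypothetical positioning function: assume the end of the document.
        return len(self.text)

    def insert(self, content: str, previous_confirmed: bool) -> None:
        if not previous_confirmed and self.bookmark is not None:
            # Previous result was provisional: the bookmark tells us where it
            # sits, so rewrite that range with the current content.
            start, end = self.bookmark
            self.text = self.text[:start] + content + self.text[end:]
        else:
            # Previous result was final: drop its shading and insert at the
            # position reported by the positioning function.
            self.shaded = None
            start = self.locate_insertion_point()
            self.text = self.text[:start] + content + self.text[start:]
        # Update (or re-create) the bookmark so it covers the current content.
        self.bookmark = (start, start + len(content))
        self.shaded = self.bookmark


# Usage: two provisional updates of one sentence, then a new sentence.
doc = ScriptDocument()
doc.insert("hel", previous_confirmed=True)
doc.insert("hello", previous_confirmed=False)
doc.insert(" world", previous_confirmed=True)
```

Keeping the insertion position in a bookmark means that edits made elsewhere in the document by a human clerk need not disturb where the next recognition result lands, which appears to be the motivation for the claim.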
4. The method of claim 1, further comprising the following steps, applied to a speech recognition server:
receiving an audio stream;
segmenting the audio stream to obtain audio substreams;
determining the target audio substream that currently needs to be recognized according to the text recognition state identifier in the previous text recognition information;
recognizing the target audio substream to obtain current text recognition information, wherein the current text recognition information comprises text recognition content, a text recognition state identifier, and a text length;
and sending the current text recognition information to the script insertion terminal so that the text recognition content in the current text recognition information is inserted into the script document.
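The claim does not say how the audio stream is segmented. As one possible illustration, the sketch below cuts a byte stream into fixed-size substreams; the fixed size and the name `segment_audio` are assumptions, and a real server might instead cut on pauses in speech.

```python
from typing import Iterator


def segment_audio(stream: bytes, chunk_size: int) -> Iterator[bytes]:
    """Yield consecutive substreams of at most `chunk_size` bytes.

    Assumption: fixed-size segmentation; the claim leaves the rule open.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(stream), chunk_size):
        yield stream[offset:offset + chunk_size]


# Usage: a 7-byte stream cut into substreams of 3 bytes.
substreams = list(segment_audio(b"abcdefg", 3))
```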
5. The method of claim 4, wherein the step of determining a target audio substream currently in need of identification comprises:
if the text recognition state identifier in the previous text recognition information is a non-confirmed identifier, the target audio substream that currently needs to be recognized is the audio substream corresponding to the previous text recognition information;
and if the text recognition state identifier in the previous text recognition information is a confirmed identifier, the target audio substream that currently needs to be recognized is the next audio substream.
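The selection rule of claim 5 can be sketched as a small server-side loop. This is an illustration under stated assumptions: substreams are addressed by index, and `next_target_index`, `run_recognition`, and `fake_recognize` are hypothetical names. The fake recognizer merely confirms a substream on its second attempt so that the loop can be exercised; it does not model real speech recognition.

```python
from typing import Callable, List, Tuple


def next_target_index(previous_index: int, previous_confirmed: bool) -> int:
    """Stay on the same substream until its result is confirmed,
    then move on to the next substream."""
    return previous_index + 1 if previous_confirmed else previous_index


def run_recognition(
    substreams: List[bytes],
    recognize: Callable[[bytes, int], Tuple[str, bool]],
) -> List[Tuple[int, str, bool]]:
    """Drive recognition over all substreams and collect
    (substream index, text, confirmed) results in order.

    Note: the loop only terminates if `recognize` eventually confirms.
    """
    results = []
    index, attempt = 0, 0
    while index < len(substreams):
        text, confirmed = recognize(substreams[index], attempt)
        results.append((index, text, confirmed))
        new_index = next_target_index(index, confirmed)
        attempt = attempt + 1 if new_index == index else 0
        index = new_index
    return results


def fake_recognize(chunk: bytes, attempt: int) -> Tuple[str, bool]:
    # Hypothetical recognizer: provisional on the first pass, confirmed after.
    return chunk.decode("ascii"), attempt >= 1


results = run_recognition([b"a", b"b"], fake_recognize)
```

Each tuple in `results` corresponds to one piece of text recognition information that the server would send to the script insertion terminal.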
6. A device for inserting voice recognition text into a script document, applied to a script insertion terminal, comprising:
a receiving unit for receiving current text recognition information of a target audio substream, wherein the current text recognition information comprises text recognition content, a text recognition state identifier, and a text length;
a script insertion unit for inserting the corresponding text recognition content into the corresponding position of the script document according to the text recognition state identifier of the current text recognition information, wherein the script insertion unit comprises:
a first script insertion module, configured to insert the text recognition content of the current text recognition information into the corresponding position of the script document according to the text length and the text recognition content in the previous text recognition information and the text length and the text recognition content in the current text recognition information if the text recognition state identifier in the current text recognition information is a non-confirmed identifier and the text recognition state identifier in the previous text recognition information is a non-confirmed identifier;
a second script insertion module, configured to insert the text recognition content of the current text recognition information into the corresponding position of the script document if the text recognition state identifier in the current text recognition information is a non-confirmed identifier and the text recognition state identifier in the previous text recognition information is a confirmed identifier;
a third script insertion module, configured to insert the text recognition content of the current text recognition information into the corresponding position of the script document according to the text length and the text recognition content in the previous text recognition information and the text length and the text recognition content in the current text recognition information if the text recognition state identifier in the current text recognition information is a confirmed identifier and the text recognition state identifier in the previous text recognition information is a non-confirmed identifier;
and a fourth script insertion module, configured to insert the text recognition content of the current text recognition information into the corresponding position of the script document if the text recognition state identifier in the current text recognition information is a confirmed identifier and the text recognition state identifier in the previous text recognition information is a confirmed identifier.
7. The device of claim 6, wherein, as applied to a speech recognition server, the receiving unit is configured to receive an audio stream;
the device further comprises: a segmentation unit for segmenting the audio stream to obtain audio substreams;
a target audio substream confirmation unit for determining the target audio substream that currently needs to be recognized according to the text recognition state identifier in the previous text recognition information;
a recognition unit for recognizing the target audio substream to obtain current text recognition information, wherein the current text recognition information comprises text recognition content, a text recognition state identifier, and a text length;
and a sending unit for sending the current text recognition information to the script insertion terminal so that the text recognition content in the current text recognition information is inserted into the script document.
8. The device of claim 7, wherein the target audio substream confirmation unit comprises:
a first confirmation module, configured to determine that the target audio substream that currently needs to be recognized is the audio substream corresponding to the previous text recognition information if the text recognition state identifier in the previous text recognition information is a non-confirmed identifier;
and a second confirmation module, configured to determine that the target audio substream that currently needs to be recognized is the next audio substream if the text recognition state identifier in the previous text recognition information is a confirmed identifier.
CN201810377108.0A 2018-04-25 2018-04-25 Method, device and system for inserting voice recognition text into script document Active CN108733649B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810377108.0A CN108733649B (en) 2018-04-25 2018-04-25 Method, device and system for inserting voice recognition text into script document

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810377108.0A CN108733649B (en) 2018-04-25 2018-04-25 Method, device and system for inserting voice recognition text into script document

Publications (2)

Publication Number Publication Date
CN108733649A CN108733649A (en) 2018-11-02
CN108733649B true CN108733649B (en) 2022-05-06

Family

ID=63939266

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810377108.0A Active CN108733649B (en) 2018-04-25 2018-04-25 Method, device and system for inserting voice recognition text into script document

Country Status (1)

Country Link
CN (1) CN108733649B (en)

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1599867B1 (en) * 2003-03-01 2008-02-13 Robert E. Coifman Improving the transcription accuracy of speech recognition software
EP1471502A1 (en) * 2003-04-25 2004-10-27 Sony International (Europe) GmbH Method for correcting a text produced by speech recognition
US9710819B2 (en) * 2003-05-05 2017-07-18 Interactions Llc Real-time transcription system utilizing divided audio chunks
US8019602B2 (en) * 2004-01-20 2011-09-13 Microsoft Corporation Automatic speech recognition learning using user corrections
US8140338B2 (en) * 2005-12-08 2012-03-20 Nuance Communications Austria Gmbh Method and system for speech based document history tracking
CN103000176B (en) * 2012-12-28 2014-12-10 安徽科大讯飞信息科技股份有限公司 Speech recognition method and system
CN106486113A (en) * 2015-08-26 2017-03-08 重庆西线科技有限公司 A kind of minutes method
CN106448675B (en) * 2016-10-21 2020-05-01 科大讯飞股份有限公司 Method and system for correcting recognition text
CN106802885A (en) * 2016-12-06 2017-06-06 乐视控股(北京)有限公司 A kind of meeting summary automatic record method, device and electronic equipment
CN106782551B (en) * 2016-12-06 2020-07-24 北京华夏电通科技有限公司 Voice recognition system and method
CN107562723A (en) * 2017-08-24 2018-01-09 网易乐得科技有限公司 Meeting processing method, medium, device and computing device
CN107564531A (en) * 2017-08-25 2018-01-09 百度在线网络技术(北京)有限公司 Minutes method, apparatus and computer equipment based on vocal print feature
CN107679032A (en) * 2017-09-04 2018-02-09 百度在线网络技术(北京)有限公司 Voice changes error correction method and device

Also Published As

Publication number Publication date
CN108733649A (en) 2018-11-02

Similar Documents

Publication Publication Date Title
CN111797632B (en) Information processing method and device and electronic equipment
CN110910863B (en) Method, device and equipment for extracting audio segment from audio file and storage medium
US11398228B2 (en) Voice recognition method, device and server
CN107341143B (en) Sentence continuity judgment method and device and electronic equipment
CN111079408A (en) Language identification method, device, equipment and storage medium
CN112995749A (en) Method, device and equipment for processing video subtitles and storage medium
CN113705300B (en) Method, device and equipment for acquiring phonetic and text training corpus and storage medium
CN114120304A (en) Entity identification method, device and computer program product
CN113434631B (en) Emotion analysis method and device based on event, computer equipment and storage medium
CN109271598B (en) Method, device and storage medium for extracting news webpage content
CN107493370B (en) Flow template determining method, flow information identification method and device
CN113657120B (en) Man-machine interaction intention analysis method and device, computer equipment and storage medium
CN108733649B (en) Method, device and system for inserting voice recognition text into script document
EP3961433A2 (en) Data annotation method and apparatus, electronic device and storage medium
CN115146692B (en) Data clustering method, device, electronic device and readable storage medium
CN111310452A (en) Word segmentation method and device
CN108647190B (en) Method, device and system for inserting voice recognition text into script document
CN112395414B (en) Text classification method, training method of classification model, training device of classification model, medium and training equipment
CN111785259A (en) Information processing method, device and electronic device
CN114386407B (en) Word segmentation method and device for text
CN110796137A (en) Method and device for identifying image
CN110858142B (en) Application starting method and device
CN105787032A (en) Webpage snapshot generating method and device
CN112214669A (en) Home decoration material formaldehyde release data processing method and device and monitoring server
CN104657845A (en) Smart SOP prompt function system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100094 5 floor 101, 6 building, No. 3, Fung Xiu Road, Haidian District, Beijing.

Applicant after: BEIJING HUAXIA DENTSU TECHNOLOGY Co.,Ltd.

Address before: 100094 5 floor 101, 6 building, No. 3, Fung Xiu Road, Haidian District, Beijing.

Applicant before: BEIJING CHINASYS TECHNOLOGIES Co.,Ltd.

GR01 Patent grant