WO2018227761A1 - Correction device for recorded and broadcasted data for teaching - Google Patents
Correction device for recorded and broadcasted data for teaching
- Publication number
- WO2018227761A1 (PCT/CN2017/099055)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- text
- voice data
- voice
- unit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Definitions
- The invention relates to network teaching recording and broadcasting technology, which can be used to record and play back a teaching activity or a conference held through network teaching or an online conference, and in particular to a device capable of correcting recorded teaching voice data.
- The recorder mainly includes a camera and a wireless digital microphone to record the video information and voice data of the courseware.
- The first network transmits the courseware information to the server.
- The server, on the one hand, further processes the courseware information to generate courseware data and, on the other hand, searches for and retrieves the courseware data in the database and converts it back into courseware information.
- The database stores the courseware data.
- The second network connects the client to the server.
- The client lets the user query and invoke courseware information.
- Said patent application discloses a fairly typical technique for recording courses in a streaming media format. Its main disadvantage is that the recorded files are relatively large, so uploading and downloading are slow and the required storage space is large.
- Recent teaching and recording technologies, such as CN105306861A (publication date February 3, 2016), disclose an effective classroom teaching and recording method and system.
- A multimedia whiteboard can be provided to users.
- The functional voice, lecture/speech voice, communication with other users, and/or coaching, etc. are recorded as different data streams, and the network teaching recording system generates a unified time stamp for the various data streams.
- The receiving end obtains the data streams, reproduces them according to the time stamp, and plays them in combination for display to the user, thereby completing on-demand browsing.
- The patent application discloses a classroom recording method that stores classroom teaching data separately in three data stream formats according to time stamps.
- CN101354748A (publication date January 28, 2009) discloses a character recognition apparatus including an image pickup device, a character recognition device, a voice conversion device, and a voice output device. The image pickup device takes in text information by photographing it and sends the captured text information as a picture to the character recognition device;
- the character recognition device is configured to identify the text information in the picture and send the text information to the voice conversion device;
- the voice conversion device is configured to convert the text information into voice data and send the voice data to the voice output device;
- the voice output device is configured to play the voice data.
- the patent application discloses a technique for collecting and recognizing text symbols in image information and then converting the text symbols into speech.
- CN102956231A discloses a speech key information recording apparatus and method based on semi-automatic correction, in the field of speech recognition technology. The apparatus comprises a key information extraction unit and an information correction unit connected to it. The key information extraction unit obtains the uncorrected text data, extracts the key information, and outputs it to the information correction unit; the information correction unit outputs the text data confirmed by user feedback.
- The invention reduces the workload of manual correction by using a semi-automatic information correction unit; uses a database to correct special nouns such as place names and professional tool names, thereby reducing the influence of the operator's limited knowledge in manual correction; and extracts the key information in the voice data, thereby increasing the amount of usable information in the record.
- the patent application aims to solve the problem of semi-automatic correction of text data after speech conversion into text.
- CN105159870A (publication date December 16, 2015) discloses a processing system for accurately converting continuous natural speech into text. The processing system comprises a cloud speech recognition engine and a speech recognition post-correction platform connected to the cloud speech recognition engine. The post-correction platform comprises a display unit, a correction operation unit, a control unit and a three-dimensional integrated generation unit, and the correction operation unit supports voice correction, keyboard correction, mouse correction and combined keyboard-and-mouse correction modes. The application also discloses that the voice file to be recognized can be finely segmented to achieve accurate recognition.
- CN105808197A discloses an information processing method applied to an electronic device having a speech recognition module, the method comprising: receiving input voice data; and, after the voice data is input and a recognition result is obtained, when first information in the recognition result (at least one character of the recognition result) needs to be corrected, correcting the first information by means of input through an operating body. Because only the targeted part is corrected in this way, the user does not need to input the voice data again.
- The desired result can thus be obtained, the operation is simple, and the overall speed of information input is improved.
- The patent application discloses that, after speech recognition, only the content that needs correcting has to be corrected, which speeds up correction; however, such correction applies only to the recognized text data. During speech recognition, the information to be recognized is compared with standard voice data, thereby improving recognition accuracy.
- CN106328145A (publication date January 11, 2017) discloses a voice correction method and apparatus, comprising: acquiring voice data input by a user; recognizing the voice data to obtain the corresponding text content; when the text content includes a first preset keyword, dividing the text content into original text and edit text according to the first preset keyword, wherein the edit text is used to correct the original text; extracting the text to be corrected from the original text according to the edit text; and correcting the original text according to the edit text and the text to be corrected to obtain the corrected text.
- The patent application discloses that the text to be corrected in the original text, together with the edit text, can be obtained by means of keyword recognition, so that the correction is made in a targeted manner.
- CN102215233A (publication date October 12, 2011) discloses an information system client installed in a user's terminal device, which can be applied to a microblog, a blog, a forum or a personal space, etc., including a user interaction module and a voice module connected to the user interaction module.
- Preferably, the client further includes a feedback module and a conversion module. The voice module includes a voice collection unit, a voice recognition unit, and a voice synthesis unit. The voice collection unit is configured to collect the user's voice; the voice recognition unit recognizes the collected voice as text and outputs it to the user interaction module; the voice synthesis unit converts the text obtained by the user interaction module from the information system server into voice and outputs it to the user. The feedback module, connected to the voice recognition unit, is configured to confirm whether the voice recognition is correct: if it is, the feedback module outputs the text to the user interaction module; if not, the feedback module causes the voice collection unit to re-collect the user's voice, or the voice recognition unit to correct the text, until the text is confirmed to be correct.
- The patent application discloses a technology for converting between voice and text in both directions, and aims to convert information of one format into information of another format. If the output text information is incorrect, the feedback module re-collects the user's voice or directly corrects the output text information.
- CN106486113A discloses a method of recording a meeting, comprising: obtaining a voice signal; converting the voice signal into corresponding text information using voice conversion software and displaying the text information in a document, wherein the text information includes correct text information and erroneous text information; marking the erroneous text information in the document and linking the marked erroneous text information to the corresponding voice signal; when the erroneous text information is clicked, using the voice conversion software to perform secondary recognition on the linked voice signal and displaying the secondarily recognized text information editably in the document; and correcting and editing the erroneous text information in the editable display to obtain corrected text information, which replaces the erroneous text information.
- The present invention aims to provide a teaching recording and broadcasting data correction device which, on the basis of correcting the text converted from voice, replaces the voice segment in the originally recorded voice data that corresponds to the corrected text content with the standard voice data of the corrected text, forming standard voice data and the corresponding text. When the recorded data is played back after the event, correct voice that differs from the originally recorded voice data can then be played, and correct subtitle information displayed.
- The invention aims to provide a teaching recording and broadcasting data correction device with a voice correction function, which: converts a voice signal in network teaching or an online conference into time-stamped original voice data using a recording device; converts the original voice data into original text data using a voice recognition model; corrects the original text data by replacing the old text content to be corrected with new text content, forming corrected text data; and, using the time stamps for positioning, replaces the voice data segment corresponding to the old text content with the standard voice data of the new text content, forming the modified voice data.
- Although the description mainly describes an embodiment of the present invention in the name of a network teaching recording system or a network conference system, it can be understood that the apparatus of the present invention can also be used for recording and playing back other online network communication processes.
- That is, the invention relates to methods, systems and computer program products for recording and playing back teaching or conference processes in network teaching, online training, emergency command (map annotation and voice recording), financial systems (marketing explanation) or online conferences.
- Wherever the recording of voice data is involved, the text data obtained by recognizing and converting the voice data is corrected, and the standard voice data of the corrected text content replaces the corresponding voice data of the original recording, so that the recorded voice data can be corrected.
- The invention provides a teaching recording and broadcasting data correction device for the recording and on-demand review of a multimedia classroom (or network classroom) or the like. When a multimedia classroom is recorded, the voice data, the action data on the multimedia whiteboard (electronic whiteboard), the operation data on the screen of the user terminal, the video data recorded by the recording device, etc. are each time-stamped and saved in data stream format, forming the recorded data.
- The recorded data is then obtained over a wired or wireless local area or wide area network, and the classroom teaching process is reproduced or simulated on the user terminal by means of the time stamps, thereby realizing review playback or on-demand playback of the recorded classroom.
- The teaching recording data correction device of the present invention comprises a file identification generating unit, a voice data collecting unit, a voice data correcting unit, an other-data collecting unit, a recorded data playing unit and an error information feedback unit, wherein:
- the file identification generating unit is configured to generate a file identification ID when the recording of the teaching process starts;
- the voice data collecting unit is configured to convert a voice signal into original voice data by using an audio collecting device, and save it in a voice data stream format;
- the voice data correcting unit is configured to correct the voice data that needs to be corrected in the original voice data, to form corrected voice data;
- the other-data collecting unit is configured to collect at least one of the following data: action data on the multimedia whiteboard, operation data on the screen of the user terminal, and video data of the video recording device; to add the time stamp to each kind of data collected and save each separately in a data stream format; and, together with the modified voice data stream and the modified text data, to form recorded data that can be played;
- the recorded data playing unit lets the user acquire the recorded data on the terminal through a network and combines the different data streams according to the time stamp, thereby playing the recorded data on the terminal, reproducing and/or simulating the teaching process, and realizing learning and/or review of the teaching process;
- the error information feedback unit lets the user, when playing the recorded data on the terminal, select and submit erroneous text content found in the modified text data; after the administrator has handled the feedback, the modified text data is updated and the voice data replacing unit is run again to update the modified voice data.
- The voice data correcting unit further includes a voice data recognition unit, a text data correction unit, and a voice data replacing unit, wherein:
- the voice data recognition unit is configured to recognize the original voice data and convert it into original text data;
- the text data correction unit is configured to correct the original text data, changing the old text content that needs to be corrected into accurate new text content to form corrected text data;
- the voice data replacing unit is configured to replace the voice data stream segment of the old text content in the original voice data with the standard voice data of the new text content to form a modified voice data stream.
- The voice data collecting unit is configured to collect at least one piece of voice data from at least one voice source, add a time stamp, and save it in a voice data stream format;
- the voice data recognition unit is configured to recognize the voice data stream and convert it into text data; the text data includes the time stamp, and the time of each piece of text content in the text data can be determined from the time stamp coordinate;
- the voice data replacing unit is configured to retrieve the standard voice data of the new text content from a standard voice database and, according to the time stamp, replace the voice data stream segment corresponding to the old text content in the original voice data with the standard voice data, thereby forming a modified voice data stream.
- The modified text data is displayed on the screen of the terminal as subtitles according to the time stamp, preferably in the screen area in which video data is played; more preferably, the text data is displayed in an editable and selectable manner in a specific area of the terminal.
- a correction history record is formed, which may include correction time, correction content, correction operator, problem finder, and the like.
- The voice data replacing unit is configured to calculate a smoothing coefficient from the pronunciation time of the replaced old text content in the original voice data and the pronunciation time of the standard voice data of the new text content, and to adjust the pronunciation time of the new text content according to the smoothing coefficient, so that the voice data is smooth and synchronized before and after replacement.
- The old text content may be empty, meaning that content was missing from the original and the new text content now needs to be added.
- The new text content may be empty, meaning that the old text content is redundant and now needs to be deleted.
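- The replacement described above, including the two empty-content cases, can be sketched as a single splice over the voice stream. This is a minimal illustration only: the function name `replace_segment` is hypothetical, the stream is modeled as a plain list of samples, and the time stamps are assumed to have already been converted to sample indices.

```python
from typing import List, Sequence


def replace_segment(original: Sequence[int], start: int, end: int,
                    standard: Sequence[int]) -> List[int]:
    """Return a copy of `original` with samples [start, end) replaced by `standard`.

    Hypothetical helper: `start` and `end` are sample indices assumed to be
    derived from the time stamps of the old text content.
    - start == end  -> old content is empty, `standard` is inserted
    - standard == [] -> new content is empty, the old segment is deleted
    """
    if not 0 <= start <= end <= len(original):
        raise ValueError("segment bounds fall outside the original stream")
    return list(original[:start]) + list(standard) + list(original[end:])


original = [1, 2, 3, 4, 5, 6]
substituted = replace_segment(original, 2, 4, [9, 9, 9])  # replace old with new
inserted = replace_segment(original, 3, 3, [7])           # old content empty
deleted = replace_segment(original, 1, 3, [])             # new content empty
```

- Treating insertion and deletion as degenerate replacements keeps one code path for all three kinds of correction.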
- The quality of classroom recording is improved: the various data are saved separately and identified by time stamps, the voice data is recognized and converted into text, the text data is corrected, and the content that needs correcting in the originally recorded voice data is then corrected according to the corrected text content. This overcomes the problems caused by "less talk, wrong talk and miss talk" in the classroom, and yields doubly corrected voice data and text data (subtitle information).
- FIG. 1 is a block diagram of a recording and broadcasting system according to the present invention.
- FIG. 2 is a flow chart showing the recording and broadcasting steps in accordance with the present invention.
- FIG. 3 is a flow chart of speech correction in accordance with the present invention.
- Network teaching in the invention is not limited to classroom teaching between students and teachers. It may include online, remote and local network teaching with teachers, students or trainers as participants; online, remote and local web conferencing in which employees of enterprises and institutions participate; and other forms of communication/interaction that use the network for online communication and/or presentation of file content, such as remote collaborative work.
- The teacher 1 and the student 2 each connect to the teaching server 3 via the Internet using a terminal device on which a client of the network teaching recording and broadcasting system is installed, thereby realizing network lecturing, listening, recording, on-demand playback and review of the multimedia classroom.
- the terminal device includes: a processor, a network module, a control module, a display module, and a smart operating system, and can be a smart phone, a PAD, a notebook computer, a desktop computer, or the like.
- the terminal may be provided with a plurality of data interfaces for connecting various extension devices and accessories through a data bus.
- the intelligent operating system includes Windows, Android and its improvements, iOS, on which application software can be installed and run, and functions of various application software, services, and application stores/platforms under the intelligent operating system are realized.
- Terminal devices can be connected to the Internet via RJ45/Wi-Fi/Bluetooth/2G/3G/4G/G.hn/Zigbee/Z-Wave/RFID connections and connected to other terminals or other computers and devices via the Internet.
- Through data interfaces or buses such as 1394/USB/Serial/SATA/SCSI/PCI-E/Thunderbolt/data card interfaces, and through audio and video interfaces such as HDMI/YPbPr/SPDIF/AV/DVI/VGA/TRS/SCART/DisplayPort, various expansion devices and accessories are connected to form a conference/teaching equipment interactive system.
- The reading device realizes image access, sound access, use control and screen recording of the electronic whiteboard, and an RFID reading function, and can access and control mobile storage devices, digital devices and other devices through the corresponding interfaces; DLNA/IGRS technology and Internet technology are used to implement functions such as manipulation, interaction and screen switching between multi-screen devices.
- A processor is defined to include, but is not limited to, an instruction execution system such as a computer/processor-based system, an application specific integrated circuit (ASIC), a computing device, or a hardware and/or software system that can fetch or obtain logic from a non-transitory storage medium or non-transitory computer-readable storage medium and execute the instructions contained therein.
- the processor may also include any controller, state machine, microprocessor, internetwork-based entity, service or feature, or any other analog, digital, and/or mechanical implementation thereof.
- the Internet may include a local area network and a wide area Internet, and may be a wired Internet or a wireless Internet, or any combination of these networks.
- the main steps of the network teaching recording according to the present invention are as follows:
- The user logs in using the terminal, and the intelligent electronic whiteboard, the teacher terminal's screen operation capture program, the cameras, the microphones and other multimedia teaching equipment enter the working state. There may be more than one camera, and there is at least one microphone, used respectively to capture the teacher's voice and the students' voices. The teaching server of the recording and broadcasting system can be used to generate digital time stamps.
- S200 Start online teaching: the teacher starts classroom teaching, and the recording and broadcasting system generates a teaching document ID.
- The teacher uses the intelligent electronic whiteboard for display (as a teaching board or to explain problems), explains using real-time voice and communicates through real-time interactive voice, and can also display and explain electronic documents such as PPT documents on the teacher terminal, so as to carry out multimedia teaching and interactive question-and-answer communication with students.
- S300 Recording data saving: During the recording process, the actions on the intelligent whiteboard are transmitted and saved in the form of "action data stream + time stamp"; the voice in the teaching and interaction process is transmitted and saved in the form of "voice data stream + time stamp"; the operation actions on electronic documents such as PPT documents on the teacher terminal are transmitted and saved in the form of "electronic document operation data stream + time stamp"; and the collected video data is transmitted and saved in the form of "video data stream + time stamp". All of these data streams throughout the course are bound to the teaching document ID so that the recorded course can be identified. These data can be added or deleted as needed.
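- One way to picture the "data stream + time stamp" storage bound to a teaching document ID is the small container below. It is a sketch under stated assumptions, not the format used by the invention: the class name `Recording`, its method names, and the stream names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Recording:
    """Hypothetical container: every stream is bound to one teaching document ID."""
    file_id: str
    # stream name -> list of (time stamp in seconds, payload)
    streams: Dict[str, List[Tuple[float, Any]]] = field(default_factory=dict)

    def append(self, stream: str, timestamp: float, payload: Any) -> None:
        """Save one time-stamped item into the named stream, creating it if needed."""
        self.streams.setdefault(stream, []).append((timestamp, payload))

    def drop_stream(self, stream: str) -> None:
        """Delete a whole stream; streams can be added or removed as needed."""
        self.streams.pop(stream, None)


rec = Recording(file_id="lesson-001")           # illustrative ID
rec.append("voice", 0.0, b"\x00\x01")           # voice data stream + time stamp
rec.append("whiteboard", 0.4, {"stroke": 1})    # action data stream + time stamp
rec.append("document", 1.2, {"slide": 2})       # document operation stream + time stamp
rec.drop_stream("document")
```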
- the recorded data includes voice data, video data, and PPT document presentation data
- the PPT document presentation data can usually be displayed in the form of video data. You must use an action action to reproduce it.
- Classified recording with split-screen display is a relatively mature technology.
- the various data recorded can be saved to a local database or a terminal database, and then uploaded to the remote teaching server through the network, or directly saved to the remote teaching server.
- a voice acquisition device such as various available microphones, can be used to acquire the voice signal, and the voice signal can be converted to voice data for storage in a data stream format.
- the gender of the speech source can be marked so that the standard speech of the corresponding gender can be selected for subsequent speech correction (replacement) operations.
- When there are multiple voice sources, each source and its gender can be identified separately, and the voice data of each source can be time-stamped and saved separately; the details are not repeated here.
- S400 Voice data conversion: For the recorded original voice data, the original text data is first formed by the voice model, and then the original text data is corrected. When the original text data is formed, the time stamp of the original voice data is added to the text data so that the text content in the text data can be time-located.
- The text content may be at least one character, word, sentence or paragraph in the text data.
- Time positioning yields the clock data of the audio data along the time dimension, that is, the clock parameter of the point in time at which a given data segment is located within a piece of audio data.
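- As an illustration of time positioning, the sketch below looks up the time span of a piece of text content in time-stamped text data. The token layout `(text, start, end)` and the function name `locate` are assumptions made for this example; the source does not specify how the time stamps are stored.

```python
from typing import List, Optional, Tuple

# Hypothetical token layout: (text, start time in seconds, end time in seconds)
Token = Tuple[str, float, float]


def locate(tokens: List[Token], phrase: List[str]) -> Optional[Tuple[float, float]]:
    """Return (start, end) of the first occurrence of `phrase`, or None if absent."""
    n = len(phrase)
    if n == 0:
        return None
    for i in range(len(tokens) - n + 1):
        if [t[0] for t in tokens[i:i + n]] == phrase:
            # Span runs from the first token's start to the last token's end.
            return tokens[i][1], tokens[i + n - 1][2]
    return None


tokens = [("we", 0.0, 0.3), ("visit", 0.3, 0.8), ("Hu", 0.8, 1.0), ("Jian", 1.0, 1.4)]
span = locate(tokens, ["Hu", "Jian"])  # time span of the voice segment to replace
```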
- The original voice data can be recognized and converted into the original text data using any of the available speech models. When the voice data is recognized and converted, the gender of the voice source is recognized first, and the gender information is added to the text data.
- Proofreading corrections for text data include manual proofreading, semi-automatic proofreading, and voice proofreading.
- Voice data correction: the original text data is corrected using a voice correction command, that is, using a voice proofreading method (CN106406807A), but the present invention is not limited thereto.
- The voice proofreading comprises: receiving a voice correction instruction; identifying, in the text data to be corrected, all text that sounds the same as the voice correction instruction, together with the time stamp of that text content; determining the to-be-corrected text among all the identified text; displaying the alternative text list corresponding to the to-be-corrected text; accepting an alternative text selection instruction; and performing the replacement operation to form the corrected text data, thereby completing the text correction.
- the standard pronunciation information of the corrected text is retrieved from the standard speech database, and the corresponding speech data segment is replaced with the standard pronunciation information according to the time stamp of the corrected text to form the corrected speech data.
- The standard speech database may include a female standard speech database, a male standard speech database, and/or a personalized standard speech database.
- The personalized standard speech database is either a standard speech database formed by recording a specific speaker, or is generated from a voice model of a specific speaker formed by corpus training; such a model can be used for voice recognition and can also be used to generate a personalized standard speech database.
- the corresponding standard voice is selected according to the voice source gender information of the original text data, or other personalized information.
- The old text content may be empty, meaning that content was missing from the original and the new text content now needs to be added.
- The new text content may be empty, meaning that the old text content is superfluous and now needs to be deleted.
- the specific steps of the speech correction are as follows:
- The voice correction instruction is received. For example, the user can issue a "select Hu Jian" voice instruction through the unit to initiate the correction of the problem text "Hu Jian".
- The user can clarify which text needs to be corrected by using a further voice instruction.
- For example, if the words identified as being pronounced "hujian", in order from first to last, are "Hu Jian", "mutual see", "shoulder shoulder", etc., and the user currently wants to correct the first identified text, the user can say "first" to designate the first identified text as the current text to be corrected.
- A list of homophone alternatives is displayed near the text to be corrected, so that the user can then select an alternative text. For example, if the first word pronounced "hujian" in the text data, "Hu Jian", is determined to be the text to be corrected, then in this step a list of alternative texts is displayed near it: 1. Fujian; 2. accessories; 3. shoulder pads; 4. mutual see; ...
- The user can say the position of the alternative text in the alternative text list to complete the selection. For example, Fujian is used to replace Hu Jian.
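- The selection steps above can be sketched in a few lines. Everything here is illustrative: the pronunciation table `pron`, the function names, and the word list are made-up toy data, and a real system would obtain pronunciations and alternatives from its speech model rather than from a hand-written dictionary.

```python
from typing import Dict, List


def find_homophones(words: List[str], pron: Dict[str, str], sound: str) -> List[int]:
    """Positions of all words whose (hypothetical) pronunciation equals `sound`."""
    return [i for i, w in enumerate(words) if pron.get(w) == sound]


def apply_choice(words: List[str], position: int,
                 alternatives: List[str], choice: int) -> List[str]:
    """Replace the word at `position` with the 1-based `choice` from the list."""
    if not 1 <= choice <= len(alternatives):
        raise ValueError("choice is outside the alternative text list")
    corrected = list(words)
    corrected[position] = alternatives[choice - 1]
    return corrected


# Toy data mirroring the example in the text.
pron = {"Hu Jian": "hujian", "mutual see": "hujian"}
words = ["we", "visit", "Hu Jian", "and", "mutual see"]
hits = find_homophones(words, pron, "hujian")   # all text that sounds like the instruction
target = hits[0]                                # the user says "first"
alternatives = ["Fujian", "accessories", "shoulder pads", "mutual see"]
corrected = apply_choice(words, target, alternatives, 1)  # the user picks item 1
```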
- the time position information of the text to be corrected is marked with a time stamp, thereby accurately positioning the time position information of the voice data corresponding to the corrected text.
- a correction history record is formed, the correction history record including correction time, correction content, correction operator, and the like.
- The standard voice data is retrieved according to the alternative text; if the alternative text consists of several words or sentences, their standard voice data is combined into a new voice data segment.
- The text data includes the gender information of the voice source, so when the search is performed, a female pronunciation or a male pronunciation, or voice data in various registers such as treble and bass, may be obtained according to the gender information.
- The new voice data segment replaces the corresponding voice data segment in the original voice data, according to the previously described time position information, to form new voice data.
- The pronunciation times of the two segments are not necessarily the same.
- The pronunciation times of the two voice segments and a smoothing coefficient may therefore be calculated first; the standard pronunciation is then sped up or slowed down according to the smoothing coefficient, so that the pronunciation duration of the text content is the same after the replacement as before it.
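- The source does not give a formula for the smoothing coefficient. A minimal sketch, assuming it is simply the ratio of the old segment's duration to the standard segment's duration, is shown below. The function names are hypothetical, and the stretch is done by naive linear-interpolation resampling, which also shifts pitch; a production system would more likely use a pitch-preserving time-stretch.

```python
from typing import List, Sequence


def smoothing_coefficient(old_duration: float, new_duration: float) -> float:
    """Assumed definition: ratio of the old duration to the standard voice duration."""
    if old_duration <= 0 or new_duration <= 0:
        raise ValueError("durations must be positive")
    return old_duration / new_duration


def stretch(samples: Sequence[float], coefficient: float) -> List[float]:
    """Resample to about len(samples) * coefficient samples by linear interpolation.

    coefficient > 1 slows the segment down, coefficient < 1 speeds it up.
    """
    if not samples:
        return []
    n_out = max(1, round(len(samples) * coefficient))
    if len(samples) == 1 or n_out == 1:
        return [float(samples[0])] * n_out
    step = (len(samples) - 1) / (n_out - 1)
    out = []
    for i in range(n_out):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


k = smoothing_coefficient(old_duration=1.0, new_duration=0.5)  # standard voice is shorter
slowed = stretch([0.0, 1.0, 2.0, 3.0], k)                      # now matches the old length
```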
- The user logs in to the recording and broadcasting system from a terminal over the Internet and can review, or play on demand, the recorded classes.
- These recorded classes may also be process record files of online conferences. The recording and broadcasting system sends the teaching file ID that the user requested for review or on-demand playback to the teaching server through a Socket encrypted channel.
- Using the teaching file ID, the time-stamped action data stream, voice data stream, electronic document operation data stream, video data stream, and text data of the course are obtained and sent to the user terminal that requested that teaching file ID; the user terminal then reproduces them, or simulates their reproduction, locally according to the time stamps.
- These data streams can be displayed, or switched between, in the functional areas of the user terminal. Video can generally be reproduced directly on the user terminal, whereas electronic whiteboard operations can be reproduced in simulation by the whiteboard's simulation program.
- The user can choose to play only some of these data streams (at least one), for example listening to the voice only.
- The text data can be displayed as subtitles in a specific area of the user terminal, such as the video display area.
- The text data serving as subtitles can be displayed in a specific editable area so that the user can perform operations such as selection; for non-standard voice data or text information that the user finds, only the corresponding text needs to be selected and the information can then be fed back.
- After receiving feedback from a user, the administrator of the recording and broadcasting system verifies it; if an error is confirmed, the earlier correction steps for the text data and the voice data stream are repeated, so that the text data and voice data are continuously improved.
- The terminal and the server are configured to be connectable to a communication network including the Internet, so the medium may also be one that carries the program code in a streaming manner via the communication network.
- When the program code is downloaded from the communication network as described above, the download program may be stored in the main device in advance or installed from another recording medium.
- The present invention can also be realized in the form of a computer data signal, embodied in a carrier wave, in which the program code is embodied by electronic transmission.
- The teaching recording and broadcasting data correction device improves the level of classroom recording: data of each kind is saved separately under time stamp identification; the voice data is recognized and converted and the text data corrected; the voice data is corrected according to the corrected text content; and the content that needs correction in the originally recorded voice data is corrected. This overcomes the problems caused in class by saying too little, misspeaking, and omitting things, and yields doubly corrected voice data and text data (subtitle information).
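The candidate list and selection by spoken position described in the excerpts above can be sketched as follows. This is only an illustration: the `CANDIDATES` table and both function names are hypothetical, and a real system would draw its candidates from a full pronunciation dictionary rather than a hard-coded mapping.

```python
# Hypothetical lexicon keyed by the pinyin of the word to be corrected.
# The entries mirror the example list given in the text.
CANDIDATES = {
    "hujian": ["Fujian", "accessories", "shoulder pads", "mutual see"],
}


def list_alternatives(pinyin: str) -> list[str]:
    """Return the numbered alternatives to display near the word."""
    options = CANDIDATES.get(pinyin, [])
    return [f"{i}, {word}" for i, word in enumerate(options, start=1)]


def select_alternative(pinyin: str, spoken_position: int) -> str:
    """Pick the alternative at the 1-based position the user spoke."""
    options = CANDIDATES.get(pinyin, [])
    if not 1 <= spoken_position <= len(options):
        raise ValueError(f"position {spoken_position} is not in the list")
    return options[spoken_position - 1]
```

Speaking "one" for the word read as "hujian" would then correspond to `select_alternative("hujian", 1)`, which picks "Fujian".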
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Electrically Operated Instructional Devices (AREA)
Abstract
Description
The invention relates to network teaching recording and broadcasting technology, which can be used to record and play back teaching activities or conference processes based on network teaching, online conferences, and the like, and in particular relates to a device capable of correcting recorded teaching voice data.
In recent years, traditional teaching models have become less and less able to meet users' demand for new teaching methods that are multimedia-based, information-based, and easy to play back. With the rapid development and popularization of Internet technology, especially mobile Internet technology, network teaching recording and broadcasting systems of all kinds have flourished. In network teaching, the teaching process is captured by classroom recording so that teaching resources can be shared on the Internet; users can access these resources online from a terminal, which meets their needs for remote learning and review.
Among earlier technologies for teaching recording and broadcasting, CN101141271A (published March 12, 2008) discloses a recording and broadcasting system for network teaching that includes a recorder, a processor, a first network, a second network, a server, a database, and three clients. The recorder mainly comprises a camera and a wireless digital microphone for recording the video information and voice data of the courseware. The first network transmits the courseware information to the server. The server further processes the courseware information to generate courseware data, and also searches for and retrieves the courseware data in the database and converts it back into courseware information. The database stores the courseware data. The second network connects the clients to the server. The clients let users query and retrieve courseware information. That patent application discloses a fairly typical technique for recording courses in a streaming media format; seen from today, its main drawbacks are that the recorded files are large, uploading and downloading are slow, and a large amount of storage space is required.
Among more recent technologies, CN105306861A (published February 3, 2016) discloses a classroom teaching recording and broadcasting method and system. During network teaching or an online conference, it records the user's functional operations on the multimedia whiteboard, lecture or speech voice, and voice exchanged with other users in discussion and/or tutoring, forming separate data streams. The recording and broadcasting system generates a unified time stamp to mark each data stream, rather than recording the whole event in a streaming media format. Network users can therefore conveniently download the data streams they need from a cloud server or LAN server at any time and place; the client on the user terminal then reproduces the received data streams according to the time stamps and plays them back in combination, completing on-demand browsing. That patent application discloses a classroom recording method that stores and records classroom teaching data separately in three data stream formats according to time stamps.
As expectations for the quality of recorded courses rise, more and more teaching recording and broadcasting systems adopt speech recognition, typically converting speech into text that is displayed as subtitles on screen or saved in text format. In the prior art there are many patent applications on speech recognition, in particular on converting speech to text or text to speech, for example:
CN101354748A (published January 28, 2009) discloses a text recognition device comprising a camera device, a character recognition device, a voice conversion device, and a voice output device. The camera device captures text information and sends it as a picture to the character recognition device; the character recognition device recognizes the text information in the picture and sends it to the voice conversion device; the voice conversion device converts the text information into voice data and sends it to the voice output device; the voice output device plays the voice data. That patent application discloses a technique for capturing and recognizing text symbols in image information and then converting them into speech.
CN102956231A (published March 6, 2013) discloses, in the field of speech recognition, an apparatus and method for recording key speech information based on semi-automatic correction. The apparatus comprises a key information extraction unit and an information correction unit connected to it: the key information extraction unit obtains uncorrected text data, extracts the key information, and outputs it to the information correction unit, which outputs the text data after user feedback and confirmation. According to that application, the semi-automatic information correction unit reduces the workload of manual correction; correcting special nouns such as place names and professional tool names against a database reduces the effect of the operator's limited knowledge during manual correction; and extracting the key information from the voice data increases the amount of useful information recorded. That patent application aims to solve the problem of semi-automatically correcting text data after speech has been converted into text.
CN105159870A (published December 16, 2015) discloses a processing system for accurately converting continuous natural speech into text. The system comprises a cloud speech recognition engine and a post-recognition correction platform connected to that engine. The correction platform comprises a display unit, a correction operation unit, a control unit, and a three-dimensional integrated generation unit, and the correction operation unit supports correction by voice, by keyboard, by mouse, and by keyboard plus mouse. The application also discloses that the voice file to be recognized can be finely segmented to achieve accurate recognition.
CN105808197A (published July 27, 2016) discloses an information processing method applied to an electronic device having a speech recognition module. The method comprises: receiving input voice data; and, after the input voice data is recognized according to a preset speech recognition model to obtain a recognition result, when first information in the recognition result (at least one character of the result) is content that needs correction, correcting that first information by input through an operating body. Only the part to be corrected is corrected, and the user does not need to input the voice data again to obtain the intended result, so the operation is simple and the overall speed of information input is improved. That patent application discloses that correction can be sped up by correcting only the first place that needs correction after speech recognition; however, this correction applies only to the recognized text data. During speech recognition, it compares the information to be recognized against standard voice data to improve recognition accuracy.
CN106328145A (published January 11, 2017) discloses a voice correction method and apparatus, comprising: acquiring voice data input by a user; recognizing the voice data to obtain the corresponding text content; when the text content contains a first preset keyword, dividing the text content into original text and editing text according to that keyword, where the editing text is used to correct the original text; extracting the text to be corrected from the original text according to the editing text; and correcting the original text according to the editing text and the text to be corrected, to obtain the corrected text. That patent application discloses that the text to be edited in the original text can be obtained by keyword recognition and then corrected in a targeted way.
CN102215233A (published October 12, 2011) discloses an information system client installed on a user's terminal device, applicable to microblogs, blogs, forums, personal spaces, and the like. It comprises a user interaction module and a voice module connected to the user interaction module, and preferably also a feedback module and a conversion module. The voice module comprises a voice collection unit, a speech recognition unit, and a speech synthesis unit: the voice collection unit collects the user's voice; the speech recognition unit recognizes the collected voice as text and outputs it to the user interaction module; the speech synthesis unit converts text that the user interaction module obtains from the information system server into speech and outputs it to the user. The feedback module is connected to the speech recognition unit and confirms whether the voice was correctly recognized as text: if so, it outputs the text to the user interaction module; if not, it has the voice collection unit collect the user's voice again, or has the speech recognition unit correct the text, until the text is confirmed correct. That patent application discloses a technique for converting between speech and text in both directions, with the aim of converting information in one format into another; if the output text is incorrect, the feedback module either re-collects the user's voice or directly corrects the output text.
CN106486113A (published March 8, 2017) discloses a method for recording meeting minutes, comprising: acquiring a voice signal; converting the voice signal into corresponding text information with voice conversion software and displaying it in a document, where the text information includes correct text information and erroneous text information; marking the erroneous text information in the document and linking each marked item to its corresponding voice signal; when an item of erroneous text information is clicked, using the voice conversion software to recognize the linked voice signal a second time and displaying the re-recognized text in the document in editable form; and correcting the erroneous text information in the editable display to obtain corrected text information, which replaces the erroneous text information.
In summary, neither the teaching recording and broadcasting field nor the speech recognition and conversion field in the prior art addresses the idea of correcting the speech to be recognized itself; the concern has been the accuracy of speech recognition and conversion, especially speech-to-text conversion. Yet in any teaching or conference process, any speaker may misspeak, omit something, pronounce words in a non-standard way, or even phrase things in a non-standard way. These problems are usually handled at speech recognition time, that is, when the speech is converted into text (for example, presented as subtitles), by adding text annotations (for example, an explanation in parentheses).
In particular, for a teaching recording and broadcasting system, because the lectured course is recorded and reproduced to users over the network, the effect of misstatements, omissions, non-standard expression, and similar problems becomes prominent and significant once the voice data is compressed. On the one hand, users usually find it hard to recognize these errors. On the other hand, even if the errors are marked in subtitles, the usage environment may make it inconvenient for users to read subtitles, so they can only listen to the audio; unclear spoken expression then further impairs learning.
To address these problems in the prior art, the invention aims to provide a teaching recording and broadcasting data correction device. On the basis of correcting the text converted from speech, for each specifically corrected piece of text it uses standard voice data to replace the corresponding voice segment in the originally recorded voice data, forming standard voice data and the corresponding text. When the recorded data is later reviewed on demand, the correct speech, which differs from the originally recorded voice data, can be played and the corresponding correct subtitle information displayed.
Summary of the invention
The invention aims to provide a teaching recording and broadcasting data correction device with a voice correction function. A recording device converts the voice signal captured during network teaching or an online conference into time-stamped original voice data; a speech recognition model recognizes and converts the original voice data into original text data; the original text data is proofread, and old text content that needs correction is replaced with new text content, yielding corrected text data; using the time stamps for positioning, the standard voice data of the new text content replaces the voice data segments corresponding to the old text content, forming corrected voice data.
It should be understood that, although the specification describes embodiments of the invention mainly in terms of a network teaching recording and broadcasting system or a network conference system, the device of the invention can also be used to record and play back other online communication processes. That is, the invention relates to methods, systems, and computer program products for recording and playing back teaching activities or conference processes in systems based on network teaching, online training, emergency command (map annotation and voice recording), financial systems, online conferences, and the like. In network teaching, online training, emergency command (map annotation and voice recording), financial systems (trading commentary), or online conferences, wherever voice data is recorded, the recorded voice data can be corrected by correcting the text data obtained from recognizing and converting that voice data and then replacing the originally recorded voice data with the standard voice data of the corrected text content.
The invention provides a teaching recording and broadcasting data correction device for recording a multimedia classroom (or network classroom) or similar scene and reviewing it on demand. In particular, when a multimedia classroom is recorded, the voice data, the action data on the multimedia whiteboard (electronic whiteboard writing), the operation data on the user terminal screen, the video data recorded by the video recording device, and so on are each time-stamped and saved separately in data stream format, forming the recorded data. After logging in to the network teaching recording and broadcasting system, a user obtains the recorded data over a wired or wireless local or wide area network, and the teaching process is reproduced or simulated on the user terminal by means of the time stamps, realizing review playback or on-demand playback of the recorded class.
The teaching recording and broadcasting data correction device of the invention comprises a file identification generation unit, a voice data collection unit, a voice data correction unit, an other-data collection unit, a recorded data playback unit, and an error information feedback unit, wherein:
the file identification generation unit generates a file identification ID when recording of the teaching process begins;
the voice data collection unit uses an audio capture device to convert the voice signal into original voice data, saved in voice data stream format;
the voice data correction unit corrects the voice data that needs correction in the original voice data, forming corrected voice data;
the other-data collection unit collects at least one of the following: action data on the multimedia whiteboard, operation data on the user terminal screen, and video data from the video recording device; each kind of collected data is time-stamped and saved separately in data stream format and, together with the corrected voice data stream and the corrected text data, forms playable recorded data;
the recorded data playback unit lets a user obtain the recorded data over the network from a terminal and combines the different data streams according to the time stamps, so that the recorded data is played on the terminal and the teaching process is reproduced and/or simulated for study and/or review;
the error information feedback unit lets a user who is playing the recorded data on the terminal select erroneous text content found in the corrected text data and submit it as feedback; after an administrator confirms the feedback, the corrected text data is updated and the voice data replacement unit is run again to update the corrected voice data.
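How the playback unit might combine separately saved streams by time stamp can be sketched as below. This is a minimal illustration, not the patent's implementation: the `Event` record, its field names, and the assumption that each stream is already sorted by time stamp are all hypothetical.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    timestamp_ms: int  # shared time stamp, in milliseconds from recording start
    stream: str        # e.g. "voice", "whiteboard", "video", "text"
    payload: str       # placeholder for the actual data chunk


def merge_streams(*streams: Iterable[Event]) -> Iterator[Event]:
    """Interleave streams by time stamp; each input must already be sorted."""
    return heapq.merge(*streams, key=lambda e: e.timestamp_ms)


def play(streams: list[list[Event]], selected: set[str]) -> list[Event]:
    """Return the merged timeline restricted to the streams the user chose."""
    return [e for e in merge_streams(*streams) if e.stream in selected]
```

Passing `selected={"voice"}` corresponds to a user who only wants to listen to the audio, while the full set of stream names reproduces the whole class.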
The voice data correction unit further comprises a voice data recognition unit, a text data correction unit, and a voice data replacement unit, wherein:
the voice data recognition unit recognizes and converts the original voice data into original text data;
the text data correction unit proofreads the original text data and corrects the old text content that needs correction into accurate new text content, forming corrected text data;
the voice data replacement unit uses the standard voice data of the new text content to replace the voice data stream segment of the old text content in the original voice data, forming a corrected voice data stream.
The voice data collection unit collects at least one item of voice data from at least one voice source, adds a time stamp, and saves it in voice data stream format.
The voice data recognition unit recognizes and converts the voice data stream into text data; the text data contains the time stamps, from which the time coordinate of each piece of text content in the text data can be determined.
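One way to picture text data that carries time coordinates is a list of words with start and end offsets. The sketch below is an assumption about the data layout, not something the text specifies: `TimedWord`, `word_at`, and `time_span` are hypothetical names, and the word list is assumed to be sorted by start time.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TimedWord:
    text: str
    start_ms: int  # offset from the start of the recording
    end_ms: int    # exclusive end offset


def word_at(words: list[TimedWord], t_ms: int) -> Optional[TimedWord]:
    """Return the word being spoken at t_ms, or None if no word covers it."""
    starts = [w.start_ms for w in words]
    i = bisect_right(starts, t_ms) - 1  # last word starting at or before t_ms
    if i >= 0 and t_ms < words[i].end_ms:
        return words[i]
    return None


def time_span(words: list[TimedWord], text: str) -> Optional[tuple[int, int]]:
    """Return (start_ms, end_ms) of the first word equal to text."""
    for w in words:
        if w.text == text:
            return (w.start_ms, w.end_ms)
    return None
```

With this layout, the time span of a word selected for correction gives the position of the matching segment in the voice data stream.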
The voice data replacement unit retrieves the standard voice data of the new text content from a standard voice database and, according to the time stamps, uses that standard voice data to replace the voice data stream segment corresponding to the old text content in the original voice data, thereby forming a corrected voice data stream.
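The replacement by time stamp can be illustrated on a plain list of audio samples. This is a sketch under simplifying assumptions: the audio is modeled as uncompressed mono samples, time stamps are millisecond offsets from the start of the recording, and `replace_segment` is a hypothetical helper rather than part of any audio library.

```python
def replace_segment(
    samples: list[int],
    sample_rate: int,
    start_ms: int,
    end_ms: int,
    standard: list[int],
) -> list[int]:
    """Swap samples[start_ms:end_ms] for the standard pronunciation."""
    # Convert millisecond offsets to sample indices.
    start = start_ms * sample_rate // 1000
    end = end_ms * sample_rate // 1000
    if not 0 <= start <= end <= len(samples):
        raise ValueError("time span lies outside the recording")
    # Build a new list so the original recording is left untouched.
    return samples[:start] + list(standard) + samples[end:]
```

Because the function returns a new list, the original recording stays available, which fits the idea of keeping a correction history alongside the corrected stream.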
The corrected text data is displayed as subtitles on the terminal screen according to the time stamps, preferably in the screen area where the video data is played; more preferably, the text data is displayed in a specific area of the terminal in an editable form, for example a selectable form.
When the text data and voice data are corrected or updated, a correction history record is formed, which may include the correction time, the corrected content, the correction operator, the person who found the problem, and so on.
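A correction history record with the fields listed above could be modeled as follows. The class and field names are hypothetical, chosen only to mirror the listed items (correction time, corrected content, operator, and the person who found the problem).

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class CorrectionRecord:
    corrected_at: datetime             # correction time
    old_text: str                      # corrected content: before
    new_text: str                      # corrected content: after
    operator: str                      # who applied the correction
    reported_by: Optional[str] = None  # who found the problem, if known


class CorrectionHistory:
    """Append-only log of corrections applied to one recording."""

    def __init__(self) -> None:
        self.records: list[CorrectionRecord] = []

    def log(self, old_text: str, new_text: str, operator: str,
            reported_by: Optional[str] = None) -> CorrectionRecord:
        record = CorrectionRecord(
            datetime.now(timezone.utc), old_text, new_text, operator, reported_by
        )
        self.records.append(record)
        return record
```

An administrator confirming user feedback would then call something like `history.log("Hu Jian", "Fujian", operator="admin", reported_by="student")`.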
The voice data replacement unit calculates a smoothing coefficient from the pronunciation time of the replaced old text content in the original voice data and the pronunciation time of the standard voice data of the new text content, then adjusts the pronunciation time of the new text content according to that coefficient, so that the voice data is smooth and synchronized before and after replacement.
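The text does not give a formula for the smoothing coefficient. One plausible reading, shown here purely as an assumption, is the ratio of the original pronunciation time to the standard pronunciation time, so that scaling the standard segment by the coefficient restores the original duration. All function names are hypothetical.

```python
def smoothing_coefficient(original_ms: float, standard_ms: float) -> float:
    """Assumed definition: original duration divided by standard duration."""
    if original_ms <= 0 or standard_ms <= 0:
        raise ValueError("both durations must be positive")
    return original_ms / standard_ms


def adjusted_duration(standard_ms: float, coefficient: float) -> float:
    """Duration of the standard segment after stretching by the coefficient."""
    return standard_ms * coefficient


def playback_rate(coefficient: float) -> float:
    """Speed factor to apply: below 1 slows the segment, above 1 speeds it up."""
    return 1.0 / coefficient
```

For example, if the original word lasted 600 ms and the standard pronunciation lasts 480 ms, the coefficient under this assumption is 1.25, so the standard segment would be played at 0.8 times normal speed to fill the same 600 ms. Insertions, where the old content is empty and has no duration, would need separate handling.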
The old text content may be empty; in that case, the new text content that replaces it is content that was omitted and now needs to be added.
The new text content may be empty; in that case, the old text content being replaced is superfluous and now needs to be deleted.
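Treating empty old or new content as insertion or deletion means one routine can cover all three cases. The sketch below works on a list of words and is only illustrative; `apply_correction` and its index-based addressing are assumptions, since the text locates content by time stamp rather than by list position.

```python
def apply_correction(words: list[str], index: int, old: str, new: str) -> list[str]:
    """Replace old with new at index; empty old inserts, empty new deletes."""
    removed = [] if old == "" else [old]
    added = [] if new == "" else [new]
    # Guard against correcting the wrong position.
    if words[index:index + len(removed)] != removed:
        raise ValueError(f"expected {old!r} at position {index}")
    return words[:index] + added + words[index + len(removed):]
```

An omitted word is restored with `apply_correction(words, i, "", "missing")`, and a superfluous one is dropped with `apply_correction(words, i, "extra", "")`.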
The method of the invention improves the level of classroom recording: data of each kind is saved separately under time stamp identification; the voice data is recognized and converted and the text data corrected; the voice data is corrected according to the corrected text content; and the content that needs correction in the originally recorded voice data is corrected. This overcomes the problems caused in class by saying too little, misspeaking, and omitting things, and yields doubly corrected voice data and text data (subtitle information).
The above and further objects and features of the invention will become clearer and more complete from the following detailed description taken in conjunction with the accompanying drawings.
FIG. 1 is an architecture diagram of the recording and broadcasting system according to the invention;
FIG. 2 is a flow chart of the recording and broadcasting steps according to the invention; and
FIG. 3 is a flow chart of voice correction according to the invention.
本发明的较佳实施方式Preferred embodiment of the invention
以下,将结合附图对本发明的具体实施方式进行进一步详细的描述。 Hereinafter, specific embodiments of the present invention will be described in further detail with reference to the accompanying drawings.
In the invention, network teaching is not limited to classroom teaching between students and teachers. It may include online, remote, and local network teaching in which teachers and students, or trainers, are the participants; online, remote, and local network conferences in which the participants are, for example, employees of enterprises and institutions; and other forms of communication or interaction that use a network for online exchange and/or presentation of document content, such as remote collaborative work.
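As shown in FIG. 1, teacher 1 and student 2 each use a terminal device on which the client of the network teaching recording and broadcasting system is installed, and connect through the Internet to teaching server 3, thereby enabling online teaching, attendance, recording, on-demand playback, and review of the multimedia classroom.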
The terminal device includes a processor, a network module, a control module, a display module, and a smart operating system, and may be a smartphone, tablet (PAD), notebook computer, desktop computer, or the like. The terminal may be provided with multiple data interfaces for connecting various expansion devices and accessories through a data bus. The smart operating system includes Windows, Android and its derivatives, and iOS; application software can be installed and run on it, providing the functions of the various applications, services, and application stores/platforms available under that operating system.
The terminal device can connect to the Internet through RJ45, Wi-Fi, Bluetooth, 2G/3G/4G, G.hn, Zigbee, Z-ware, RFID, or similar connections, and through the Internet to other terminals, computers, and devices. Through data interfaces or buses such as 1394, USB, serial, SATA, SCSI, PCI-E, Thunderbolt, and data card interfaces, and through audio/video interfaces such as HDMI, YpbPr, SPDIF, AV, DVI, VGA, TRS, SCART, and Displayport, it connects various expansion devices and accessories to form an interactive conference/teaching equipment system. Voice control and motion control are implemented by a sound capture control module and a motion capture control module, provided either in software or as hardware on the data bus. Display, projection, sound input, audio/video playback, and digital or analog audio/video input and output are implemented by connecting display/projection modules, microphones, sound equipment, and other audio/video devices through the audio/video interfaces. Image input, sound input, control and screen recording of the electronic whiteboard, and RFID reading are implemented by connecting cameras, microphones, electronic whiteboards, and RFID readers through the data interfaces; removable storage devices, digital devices, and other devices can also be accessed and managed through the corresponding interfaces. Functions including control, interaction, and screen casting among multi-screen devices are implemented through DLNA/IGRS technology and Internet technology.
In the invention, a processor is defined to include, but is not limited to, an instruction execution system such as a computer/processor-based system, an application-specific integrated circuit (ASIC), a computing device, or a hardware and/or software system that can fetch or obtain logic from a non-transitory storage medium or non-transitory computer-readable storage medium and execute the instructions contained in it. The processor may also include any controller, state machine, microprocessor, Internet-based entity, service, or feature, or any other analog, digital, and/or mechanical implementation thereof.
In the invention, the Internet may include local area networks and the wide-area Internet, and may be wired, wireless, or any combination of these networks.
As shown in FIG. 2, the main steps of network teaching recording and broadcasting according to the invention are as follows:
S100: Start the recording and broadcasting system. The user logs in from a terminal, and the multimedia teaching equipment (the smart electronic whiteboard, the program that captures screen operations on the teacher terminal, cameras, microphones, and so on) enters its working state. There may be more than one camera. There is at least one microphone, used respectively to capture the teacher's voice and the students' voices. The teaching server of the recording and broadcasting system can be used to generate digital timestamps.
S200: Start network teaching. The teacher begins the class and the recording and broadcasting system generates a teaching file ID. During teaching, the teacher may, for example, present on the smart electronic whiteboard (as a lecture board or a board for working through problems), explain using real-time voice, communicate using real-time interactive voice, and also present and explain electronic documents such as PPT documents on the teacher terminal, thereby delivering a multimedia lesson and holding interactive question-and-answer exchanges with students.
S300: Save the recorded data. During recording, actions on the smart electronic whiteboard are transmitted and saved as "action data stream + timestamp"; speech during teaching and interaction as "voice data stream + timestamp"; operations on electronic documents such as PPT documents on the teacher terminal as "electronic document operation data stream + timestamp"; and captured video as "video data stream + timestamp". All of these data streams from the whole lesson are bound to the teaching file ID, which identifies the recorded course. Data types can be added or removed as needed. In a typical case the recorded data comprises voice data, video data, and PPT presentation data; the PPT presentation data can usually also be presented as video data and need not be reproduced by replaying operations. Recording by category and displaying on split screens is a relatively mature technique in the prior art. The recorded data can first be saved to a local database or terminal database and then uploaded from there over the network to the remote teaching server, or it can be saved directly to the remote teaching server.
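The "data stream + timestamp" storage described in S300 can be pictured with a small sketch. It is only an illustration under stated assumptions: the names `StreamRecord`, `Recording`, `append`, and `window`, the millisecond unit, and the ID format are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class StreamRecord:
    timestamp_ms: int  # digital timestamp, assumed here to be in milliseconds
    payload: bytes     # one chunk of action, voice, document, or video data


@dataclass
class Recording:
    teaching_file_id: str  # every stream is bound to this ID
    streams: dict = field(default_factory=dict)  # stream name -> list of StreamRecord

    def append(self, stream: str, timestamp_ms: int, payload: bytes) -> None:
        """Store one timestamped chunk under the named stream."""
        self.streams.setdefault(stream, []).append(StreamRecord(timestamp_ms, payload))

    def window(self, stream: str, start_ms: int, end_ms: int) -> list:
        """Return the chunks of one stream with start_ms <= timestamp < end_ms."""
        return [r for r in self.streams.get(stream, [])
                if start_ms <= r.timestamp_ms < end_ms]


rec = Recording("lesson-001")  # hypothetical teaching file ID
rec.append("voice", 0, b"a")
rec.append("voice", 20, b"b")
rec.append("whiteboard", 10, b"stroke")
```

Keeping each stream in its own list mirrors the separate storage in the text, while the shared timestamp axis is what later lets a text correction be mapped back onto the voice stream.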
In one example, voice data is collected with a voice capture device, such as any available microphone, which acquires the voice signal; the signal is converted into voice data and saved in a data stream format. With a single voice source, the gender of the source can be marked so that standard speech of the corresponding gender can be selected in the later voice correction (replacement) operation. With multiple voice sources, the gender of each source can be identified separately; the sources can be distinguished from one another and each saved separately with its timestamps. Distinguishing multiple voice sources can be done with existing techniques and is not described further here.
S400: Convert the voice data. The recorded original voice data is first recognized and converted by a speech model into original text data, which is then proofread and corrected. When the original text data is formed, the timestamps of the original voice data are added to it so that text content within the text data can be located in time. The text content may be at least one character, word, sentence, or paragraph of the text data. This time positioning provides clock data that marks the time dimension of the audio data, that is, clock parameters that can locate, in relative terms, the time point of a given data segment within the audio data.
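As a rough illustration of the time positioning in S400, the sketch below attaches a start and end time to each recognized unit of text and looks up the time span of a phrase. The `TimedToken` type, the `locate` function, and the sample timings are assumptions made for the example; the patent does not prescribe a data layout.

```python
from dataclasses import dataclass


@dataclass
class TimedToken:
    text: str      # a character or word produced by speech recognition
    start_ms: int  # where the token starts in the original voice data
    end_ms: int    # where the token ends


def locate(tokens, phrase):
    """Return (start_ms, end_ms) of the first run of consecutive tokens
    that spells `phrase`, or None if the phrase does not occur."""
    for i in range(len(tokens)):
        joined = ""
        for j in range(i, len(tokens)):
            joined += tokens[j].text
            if joined == phrase:
                return tokens[i].start_ms, tokens[j].end_ms
            if not phrase.startswith(joined):
                break
    return None


# Invented timings for a three-token transcript.
tokens = [
    TimedToken("我", 0, 200),
    TimedToken("来自", 200, 600),
    TimedToken("胡建", 600, 1100),
]
span = locate(tokens, "胡建")
```

With these sample timings, `span` covers 600 ms to 1100 ms, which is the segment of voice data a later correction of that word would need to replace.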
Any available speech model can be used to recognize and convert the original voice data into original text data. During this conversion, the gender of the voice source is identified first and the gender information is added to the text data. Proofreading and correction of the text data includes manual proofreading, semi-automatic proofreading, voice proofreading, and so on.
S500: Correct the voice data. The original text data is corrected using voice correction instructions, that is, by the voice proofreading method (CN106406807A), although the invention is not limited to this. The voice proofreading unit accepts a voice correction instruction; identifies, in the text data to be corrected, all text with the same pronunciation as the instruction, together with the timestamps of that text; determines which of the identified text is to be corrected; displays a list of candidate text for the text to be corrected; accepts a candidate selection instruction; and performs the replacement, forming corrected text data and completing the text correction.
During text correction, the standard pronunciation information for the corrected text is retrieved from a standard speech database and, according to the timestamp of the corrected text, replaces the corresponding segment of voice data, forming corrected voice data. The standard speech database may include a female standard speech database, a male standard speech database, and/or a personalized standard speech database. A personalized standard speech database is either a standard speech database recorded from a specific speaker or one generated from a speaker-specific speech model formed by corpus training; such a model can be used for speech recognition as well as for generating the personalized standard speech database.
When standard pronunciation information is retrieved from the standard speech data, the corresponding standard speech is selected according to the voice-source gender information in the original text data, or other personalization information. Optionally, the old text content may be empty, in which case the new text content is omitted content that now needs to be added; or the new text content may be empty, in which case the old text content is superfluous and now needs to be deleted.
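The three cases mentioned here (replacement, addition when the old content is empty, deletion when the new content is empty) reduce to one splice operation on the audio samples. The following is a minimal sketch assuming the audio is held as a plain list of samples and that the time span has already been converted to sample indices; `replace_segment` is a hypothetical name.

```python
def replace_segment(audio, start, end, new_segment):
    """Return a copy of `audio` with audio[start:end] replaced by `new_segment`.

    start == end        -> the old content is empty: pure insertion
    new_segment is []   -> the new content is empty: pure deletion
    """
    if not 0 <= start <= end <= len(audio):
        raise ValueError("segment bounds fall outside the audio")
    return audio[:start] + list(new_segment) + audio[end:]


audio = [1, 2, 3, 4]
replaced = replace_segment(audio, 1, 3, [9])     # misspoken content corrected
inserted = replace_segment(audio, 2, 2, [7, 7])  # omitted content added
deleted = replace_segment(audio, 1, 3, [])       # superfluous content removed
```

Treating addition and deletion as degenerate replacements keeps a single code path, which matches the way the text describes both as replacing or being replaced by empty content.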
As shown in FIG. 3, in one example the specific steps of voice correction are as follows:
S11: Receive instruction
When a problem is found in the recognized text data, for example the text "胡建" (hujian) needs correction, a voice correction instruction is received. For instance, the user can issue the spoken instruction "select 胡建" to start correcting the problem text "胡建".
S12: Find text
All text in the original text data with the same pronunciation as that specified by the voice correction instruction is identified.
S13: Determine text
The text to be corrected is determined from among the text identified in the text data.
When several pieces of text with the pronunciation specified by the voice correction instruction occur in the text data, the user can indicate by a further voice instruction which one needs correction. For example, if the text pronounced "hujian" identified from beginning to end of the text data is, in order, "胡建", "互见", "护肩", and so on, and the user wants to correct the first of these, the user can say "the first one" to designate it as the text currently to be corrected.
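Steps S12 and S13 can be sketched as a lookup by pronunciation followed by selection by spoken ordinal. The pronunciation table below is a toy stand-in, not a real pinyin library, and the function names are hypothetical; a real system would obtain readings from its speech model or a pronunciation lexicon.

```python
# Toy pronunciation table (an assumption for this example only).
PRONUNCIATION = {
    "胡建": "hujian",
    "互见": "hujian",
    "护肩": "hujian",
    "来自": "laizi",
}


def find_homophones(words, spoken_reading):
    """S12: return (index, word) for every word with the spoken reading,
    in document order."""
    return [(i, w) for i, w in enumerate(words)
            if PRONUNCIATION.get(w) == spoken_reading]


def pick(hits, ordinal):
    """S13: resolve a spoken ordinal such as "the first one" (ordinal=1)."""
    if not 1 <= ordinal <= len(hits):
        raise ValueError(f"there is no match number {ordinal}")
    return hits[ordinal - 1]


words = ["我", "来自", "胡建", "请", "互见", "护肩"]
hits = find_homophones(words, "hujian")
target = pick(hits, 1)
```

Here `target` identifies the first match by its position in the text, so later steps can reach its timestamp rather than relying on the characters alone.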
S14: Candidate list
A list of candidate text corresponding to the text to be corrected is displayed; the candidates are homophones of the text to be corrected.
Once the text to be corrected has been selected, a list of homophone candidates is displayed near it so that the user can then choose one. For example, if the first occurrence of text pronounced "hujian", "胡建", is designated as the text to be corrected, this step displays the candidate list near it: 1. 福建; 2. 附件; 3. 护肩; 4. 互见, ...
S15: Selection instruction
A candidate selection instruction is received.
The user can select a candidate by speaking its position in the candidate list, for example choosing 福建 to replace 胡建.
S16: Correct text
The text to be corrected is replaced with the candidate specified by the selection instruction. During the replacement, the time position of the text to be corrected is marked with a timestamp, so that the time position of the voice data corresponding to the corrected text is located precisely. Preferably, a correction history is created while the text data and voice data stream are corrected; it includes the correction time, the corrected content, the person who made the correction, and so on.
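One way to picture S16 is a function that swaps the text of a timed token, returns its time span for the later voice replacement, and appends an entry to a correction history. Everything named here (`TimedToken`, `apply_correction`, the history fields, the operator name) is an assumption for illustration; the patent only lists what the history should contain.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class TimedToken:
    text: str
    start_ms: int
    end_ms: int


def apply_correction(tokens, index, new_text, operator, history):
    """Replace tokens[index] with new_text, keep its time span, log the change,
    and return the (start_ms, end_ms) span of the affected voice data."""
    old = tokens[index]
    tokens[index] = TimedToken(new_text, old.start_ms, old.end_ms)
    history.append({
        "corrected_at": datetime.now(timezone.utc).isoformat(),  # correction time
        "old_text": old.text,                                    # corrected content
        "new_text": new_text,
        "span_ms": (old.start_ms, old.end_ms),
        "operator": operator,                                    # who corrected it
    })
    return old.start_ms, old.end_ms


tokens = [TimedToken("我", 0, 200), TimedToken("来自", 200, 600),
          TimedToken("胡建", 600, 1100)]
history = []
span = apply_correction(tokens, 2, "福建", "editor-01", history)
```

Returning the span from the same call that edits the text keeps the text correction and the voice correction tied to one timestamp, which is the link the step relies on.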
S17: Voice segment
The standard voice data for the candidate text is looked up in the standard speech library; for a multi-character word or a sentence, the pieces are combined into a new voice data segment. Preferably, the text data contains the gender of the voice source, so that the lookup can return a female or male pronunciation, or other variants such as higher or lower pitch, accordingly.
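A minimal sketch of S17, assuming the standard speech library is a nested dictionary keyed first by voice (for example gender) and then by character, with each entry holding a list of samples. The library contents, the key names, and `build_segment` are invented for the example; simple concatenation stands in for whatever joining a real speech library would perform.

```python
def build_segment(library, text, voice):
    """Concatenate the standard samples for each character of `text`,
    taken from the sub-library selected by `voice`."""
    bank = library[voice]
    missing = [ch for ch in text if ch not in bank]
    if missing:
        raise KeyError(f"no standard pronunciation for: {missing}")
    samples = []
    for ch in text:
        samples.extend(bank[ch])
    return samples


# Invented sample values; a real library would hold recorded audio.
library = {
    "female": {"福": [11, 12], "建": [13]},
    "male": {"福": [21, 22], "建": [23]},
}
segment = build_segment(library, "福建", "female")
```

Selecting the sub-library by the gender recorded in the text data is what lets the replacement segment resemble the original speaker more closely.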
S18: Voice replacement
Using the time position information described above, the new voice data segment replaces the corresponding segment of the original voice data, forming new voice data. The duration of the standard speech and the duration of the replaced speech are not necessarily the same, even when the text is identical. Preferably, therefore, for a smooth, seamless replacement, a smoothing coefficient is first calculated from the durations of the two segments, and the standard pronunciation is sped up or slowed down according to that coefficient so that the same text content lasts as long after replacement as before.
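The patent does not give a formula for the smoothing coefficient. The sketch below assumes one plausible reading: the coefficient is the ratio of the old segment's duration to the standard segment's duration, and the standard segment is resampled to that length. The function names are hypothetical, and the linear-interpolation resampling used here also shifts pitch, so a production system would more likely use a pitch-preserving time-stretch method.

```python
def smoothing_coefficient(old_duration_ms, new_duration_ms):
    """Assumed definition: how much longer (>1) or shorter (<1) the standard
    pronunciation must become to match the replaced segment."""
    if old_duration_ms <= 0 or new_duration_ms <= 0:
        raise ValueError("durations must be positive")
    return old_duration_ms / new_duration_ms


def stretch(samples, coefficient):
    """Resample `samples` to round(len * coefficient) values by linear
    interpolation (naive: changes pitch as well as duration)."""
    if not samples:
        return []
    n_out = max(1, round(len(samples) * coefficient))
    if len(samples) == 1 or n_out == 1:
        return [samples[0]] * n_out
    step = (len(samples) - 1) / (n_out - 1)
    out = []
    for k in range(n_out):
        pos = k * step
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


# The old segment lasted 400 ms; the standard pronunciation lasts 200 ms.
coefficient = smoothing_coefficient(400, 200)
stretched = stretch([1.0, 1.0], coefficient)
```

Because the stretched segment has the same length as the segment it replaces, everything after the splice keeps its original timestamps, so the other streams stay synchronized.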
Users can log in to the recording and broadcasting system from a terminal over the Internet to replay or request on-demand playback of a recorded class. For some users, such as participants in online network conferences, the recorded class may instead be the process record of an online conference. The recording and broadcasting system sends the teaching file ID that the user has asked to review or play to the teaching server over an encrypted Socket channel. The teaching file ID is used to fetch the course's timestamped action data stream, voice data stream, electronic document operation data stream, video data stream, and text data, which are sent to the requesting user terminal. The terminal then restores (reproduces or simulates) the whole classroom session locally according to the timestamps. The data streams can be displayed in separate functional areas of the terminal or in a switchable display. Video can generally be reproduced directly on the terminal, whereas electronic whiteboard operations can be reproduced in simulation by an electronic whiteboard simulation program.
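Restoring a session from timestamps amounts to merging the separately stored streams into one timeline. The sketch below assumes each stream is a list of `(timestamp_ms, payload)` pairs already sorted by time; the stream names and the `replay_order` function are illustrative only.

```python
import heapq


def replay_order(streams):
    """Merge per-stream event lists into one list of
    (timestamp_ms, stream_name, payload) ordered by timestamp."""
    tagged = [
        [(ts, name, payload) for ts, payload in events]
        for name, events in streams.items()
    ]
    # key= restricts comparison to the timestamp, so payloads are never compared.
    return list(heapq.merge(*tagged, key=lambda event: event[0]))


streams = {
    "voice": [(0, "v0"), (40, "v1")],
    "whiteboard": [(10, "w0")],
    "video": [(0, "f0")],
}
timeline = replay_order(streams)
```

A terminal that wants only some of the streams can pass a smaller dictionary, which corresponds to playing, say, the voice alone.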
The user can of course choose to play only some of these data streams, for example listening to the voice alone. Text data can be displayed as subtitles in a particular area of the terminal, such as within the video display area.
In one example, the text data serving as subtitles is displayed in a dedicated editable area in which the user can perform operations such as selection, so that to report non-standard voice data or text the user only needs to select the relevant text. After receiving the user's feedback, the administrator of the recording and broadcasting system verifies it and, if an error is confirmed, repeats the text data and voice data stream correction steps described above, so that the text data and voice data are continually refined and improved.
In the embodiments above, the terminal and the server are configured so that they can be connected to a communication network including the Internet, so the program code may also be carried on a medium in a fluid manner, by downloading it via the communication network. When the program code is downloaded from a communication network in this way, the download program may be stored in the main device in advance or installed from another recording medium. The invention can also be realized in the form of a computer data signal embedded in a carrier wave, in which the program code is embodied by electronic transmission.
The preferred embodiments described above are intended to make the spirit of the invention clearer and easier to understand, not to limit the invention. Any modification, replacement, or improvement made within the spirit and principles of the invention shall fall within the scope of protection outlined by the appended claims.
The teaching recording and broadcasting data correction device provided by this application raises the quality of classroom recording. The various data are stored separately and identified by timestamps; the voice data is recognized and converted to text, the text data is corrected, and the voice data is then corrected according to the corrected text. This corrects the content of the originally recorded voice data that needs correction, overcomes the problems caused by saying too little, misspeaking, or leaving things out in class, and yields doubly corrected voice data and text data (subtitle information).
Claims (10)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201710444172.1A CN107220228B (en) | 2017-06-13 | 2017-06-13 | A kind of teaching recorded broadcast data correction device |
| CN201710444172.1 | 2017-06-13 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2018227761A1 (en) | 2018-12-20 |
Family
ID=59948760
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2017/099055 Ceased WO2018227761A1 (en) | 2017-06-13 | 2017-08-25 | Correction device for recorded and broadcasted data for teaching |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN107220228B (en) |
| WO (1) | WO2018227761A1 (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110459233A (en) * | 2019-03-19 | 2019-11-15 | 深圳壹秘科技有限公司 | Processing method, device and the computer readable storage medium of voice |
| CN110534100A (en) * | 2019-08-27 | 2019-12-03 | 北京海天瑞声科技股份有限公司 | A kind of Chinese speech proofreading method and device based on speech recognition |
Families Citing this family (19)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109324811B (en) * | 2017-07-28 | 2021-10-15 | 深圳市鹰硕技术有限公司 | Device for updating teaching recorded broadcast data |
| CN107767871B (en) * | 2017-10-12 | 2021-02-02 | 安徽听见科技有限公司 | Text display method, terminal and server |
| JP7069631B2 (en) * | 2017-10-16 | 2022-05-18 | 富士フイルムビジネスイノベーション株式会社 | Information processing equipment and information processing programs |
| CN107820112A (en) * | 2017-11-15 | 2018-03-20 | 安徽声讯信息技术有限公司 | A kind of audio written broadcasting live system |
| CN108320318B (en) * | 2018-01-15 | 2023-07-28 | 腾讯科技(深圳)有限公司 | Image processing method, device, computer equipment and storage medium |
| CN110390930A (en) * | 2018-04-15 | 2019-10-29 | 高翔 | A kind of method and system of audio text check and correction |
| CN108962293B (en) * | 2018-07-10 | 2021-11-05 | 武汉轻工大学 | Video correction method, system, terminal device and storage medium |
| CN110858492A (en) * | 2018-08-23 | 2020-03-03 | 阿里巴巴集团控股有限公司 | Audio editing method, device, equipment and system and data processing method |
| CN109300468B (en) * | 2018-09-12 | 2022-09-06 | 科大讯飞股份有限公司 | Voice labeling method and device |
| CN109243484A (en) * | 2018-10-16 | 2019-01-18 | 上海庆科信息技术有限公司 | A kind of generation method and relevant apparatus of conference speech record |
| CN109782986A (en) * | 2018-12-14 | 2019-05-21 | 浙江学海教育科技有限公司 | A kind of production method of teaching courseware, storage medium and application system |
| CN109858005B (en) * | 2019-03-07 | 2024-01-12 | 百度在线网络技术(北京)有限公司 | Method, device, equipment and storage medium for updating document based on voice recognition |
| CN110880316A (en) * | 2019-10-16 | 2020-03-13 | 苏宁云计算有限公司 | Audio output method and system |
| CN110930997B (en) * | 2019-12-10 | 2022-08-16 | 四川长虹电器股份有限公司 | Method for labeling audio by using deep learning model |
| CN111399800A (en) * | 2020-03-13 | 2020-07-10 | 胡勇军 | Voice input method system |
| CN113571061B (en) * | 2020-04-28 | 2024-12-13 | 阿里巴巴集团控股有限公司 | Speech transcription text editing system, method, device and equipment |
| CN112562638B (en) * | 2020-11-26 | 2025-01-07 | 北京达佳互联信息技术有限公司 | Voice preview method, device and electronic device |
| CN113590871B (en) * | 2021-02-05 | 2025-08-29 | 腾讯科技(深圳)有限公司 | Audio classification method, device and computer-readable storage medium |
| CN116524910B (en) * | 2023-06-25 | 2023-09-08 | 安徽声讯信息技术有限公司 | Manuscript prefabrication method and system based on microphone |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103366731A (en) * | 2012-03-31 | 2013-10-23 | 盛乐信息技术(上海)有限公司 | Text to speech (TTS) method and system |
| CN103366741A (en) * | 2012-03-31 | 2013-10-23 | 盛乐信息技术(上海)有限公司 | Voice input error correction method and system |
| CN105306861A (en) * | 2015-10-15 | 2016-02-03 | 深圳市时尚德源文化传播有限公司 | Online teaching recording and playing method and system |
| CN106710597A (en) * | 2017-01-04 | 2017-05-24 | 广东小天才科技有限公司 | Voice data recording method and device |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103207769B (en) * | 2012-01-16 | 2016-10-05 | 联想(北京)有限公司 | The method of voice correction and user equipment |
| CN105244022B (en) * | 2015-09-28 | 2019-10-18 | 科大讯飞股份有限公司 | Audio-video method for generating captions and device |
2017
- 2017-06-13 CN CN201710444172.1A patent/CN107220228B/en active Active
- 2017-08-25 WO PCT/CN2017/099055 patent/WO2018227761A1/en not_active Ceased
Also Published As
| Publication number | Publication date |
|---|---|
| CN107220228A (en) | 2017-09-29 |
| CN107220228B (en) | 2019-08-16 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2018227761A1 (en) | Correction device for recorded and broadcasted data for teaching | |
| CN109324811B (en) | Device for updating teaching recorded broadcast data | |
| US12462808B2 (en) | Systems and methods for team cooperation with real-time recording and transcription of conversations and/or speeches | |
| CN111538851B (en) | Method, system, equipment and storage medium for automatically generating demonstration video | |
| CN109698920B (en) | Follow teaching system based on internet teaching platform | |
| JP6472898B2 (en) | Recording / playback method and system for online education | |
| US7458013B2 (en) | Concurrent voice to text and sketch processing with synchronized replay | |
| CN104408983B (en) | Intelligent tutoring information processing system based on recorded broadcast equipment | |
| WO2019095446A1 (en) | Following teaching system having speech evaluation function | |
| JP2002202941A (en) | Multimedia electronic learning system and learning method | |
| Valor Miró et al. | Evaluating intelligent interfaces for post-editing automatic transcriptions of online video lectures | |
| KR20130115484A (en) | System for providing lecture contents using lecture data synchronized with teaching materials | |
| KR101858204B1 (en) | Method and apparatus for generating interactive multimedia contents | |
| KR101198091B1 (en) | Method and system for learning contents | |
| KR100395883B1 (en) | Realtime lecture recording system and method for recording a files thereof | |
| CN116312083A (en) | Course file generation method, device, electronic device and storage medium | |
| JP2004266578A (en) | Moving image editing method and apparatus | |
| JP4085015B2 (en) | STREAM DATA GENERATION DEVICE, STREAM DATA GENERATION SYSTEM, STREAM DATA GENERATION METHOD, AND PROGRAM | |
| CN118200299A (en) | Metaverse conference hosting method, device, equipment, storage medium and program product | |
| Paul | Building a specialised audio-visual corpus | |
| KR20030025771A (en) | System for providing educational contents on Internet and method thereof | |
| KR20200039907A (en) | Smart language learning services using scripts and their service methods | |
| KR20240113179A (en) | Method for providng lecture content using virtual human | |
| US20210397783A1 (en) | Rich media annotation of collaborative documents | |
| JP3816901B2 (en) | Stream data editing method, editing system, and program |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17913820; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 17913820; Country of ref document: EP; Kind code of ref document: A1 |