
CN1202471C - Book production system and method - Google Patents

Book production system and method

Info

Publication number
CN1202471C
CN1202471C (application CN 01141820 / CN01141820A)
Authority
CN
China
Prior art keywords
data
illustration
video data
book
books
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
CN 01141820
Other languages
Chinese (zh)
Other versions
CN1409213A (en)
Inventor
吴昌隆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Newsoft Tech Corp
Original Assignee
Newsoft Tech Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Newsoft Tech Corp
Priority to CN 01141820
Publication of CN1409213A
Application granted
Publication of CN1202471C
Anticipated expiration
Legal status: Expired - Lifetime (current)

Landscapes

  • Television Signal Processing For Recording (AREA)

Abstract

The invention provides a book production system for generating a book comprising a text part and an illustration part. The system comprises a video receiving module, a decoding module, a text extraction module, an illustration extraction module, and a book generation module. The video receiving module receives original video data; the decoding module decodes the original video data into video data; the text extraction module obtains the text part from the video data according to a production policy; the illustration extraction module extracts at least one key frame from the video data as the illustration part according to the production policy; and the book generation module typesets the extracted text part and illustration part to generate the book. The invention also discloses a book production method implemented by the system.

Description

Book production system and method

Technical Field

The present invention relates to a book production system and method, and more particularly to a book production system and method that uses computer software to analyze a video source in order to automatically generate book files such as picture books, photo albums, comics, and e-books.

Background Art

With current technology, the content of picture books, photo albums, comics, e-books, and similar publications is generally still created by manual drawing, or by editing individual images one by one on a computer, before being compiled into a book.

However, with the growing popularity of electronic products such as digital video cameras, TV tuner cards, set-top boxes, DVDs, and VCDs, users can easily obtain digital video. Using a computer to process a video source into book files has therefore become an important application and demand in the field of computer multimedia.

As noted above, when the available material is not a single image but a video source of continuous images, the user must first decompose the video into individual frames before the computer can edit and compile them into a book. For typical video content, however, one second of playback is a continuous sequence of 29.97 frames under the NTSC standard, or 25 frames under the PAL standard, so one minute of video contains 1500 to 1800 frames. Editing each frame one by one would be extremely time-consuming and inefficient.
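The frame counts above follow directly from the quoted frame rates; a quick sketch of the arithmetic (plain illustration, not part of the patent):

```python
# Frames contained in a clip at a given frame rate, per the NTSC (29.97 fps)
# and PAL (25 fps) standards mentioned above.
def frame_count(seconds: float, fps: float) -> int:
    return round(seconds * fps)

pal_frames = frame_count(60, 25)      # one minute of PAL video: 1500 frames
ntsc_frames = frame_count(60, 29.97)  # one minute of NTSC video: 1798 frames
```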

Therefore, how to efficiently use video content to generate book files such as picture books, photo albums, comics, and e-books is an important current problem.

Summary of the Invention

To overcome the deficiencies of the prior art, the object of the present invention is to provide a book production system and method that can automatically analyze a video source to generate book files such as picture books, photo albums, comics, and e-books.

To achieve the above object, the book production system of the present invention is used to generate a book comprising a text part and an illustration part, and comprises a video receiving module, a decoding module, a text extraction module, an illustration extraction module, and a book generation module. In the present invention, the video receiving module receives original video data, and the decoding module decodes the original video data, which may be in any video format, to obtain video data. The text extraction module obtains the text part from the video data according to a production policy; the illustration extraction module extracts at least one key frame from the video data as the illustration part according to the production policy; and the book generation module then generates the book from the obtained text part and illustration part.

In addition, the book production system according to the present invention further comprises an editing module, a book format (template) selection module, and a production policy selection module. In the present invention, the production policy selection module lets a user select the desired production policy; the editing module receives the user's operations to edit the content of the book; the book format selection module receives the user's selection of at least one desired book format; and the book generation module applies the selected book format to typeset the text part and the illustration part into the book.

As mentioned above, the production policies selectable through the production policy selection module include an audio analysis algorithm, a caption analysis algorithm, a scene/shot change analysis algorithm, and an image analysis algorithm. The audio analysis algorithm analyzes the audio data of the video data; the caption analysis algorithm analyzes the subtitle data of the video data; the scene/shot change analysis algorithm analyzes the scene/shot change data of the video data; and the image analysis algorithm analyzes the image data of the video data. The image analysis algorithm may compare the image data against pre-provided image example data, compare it against pre-provided object data, or detect caption image data within the image data.

Therefore, the text extraction module and the illustration extraction module can obtain the text part, illustration part, and other material needed to produce the book according to the above audio analysis, caption analysis, scene/shot change analysis, or image analysis algorithms. The book generation module then fits the text part and illustration part into the book format, automatically producing book files such as picture books, photo albums, comics, and e-books.

The present invention also provides a book production method, which comprises a video receiving step, a decoding step, a text extraction step, an illustration extraction step, and a book generation step. In the present invention, the video receiving step first receives the original video data; the decoding step then decodes the original video data to obtain the video data; the text extraction step and the illustration extraction step respectively extract from the video data the text part and the illustration part needed to produce the book; and finally the book generation step generates the book from the text part and the illustration part.

In addition, the book production method according to the present invention further comprises an editing step for editing the content of the book after it is generated, a book format selection step allowing the user to select the desired book format so that the book generation step can apply that format when generating the book, and a production policy selection step allowing the user to select the desired production policy.

The advantage of the present invention is that, because the book production system and method can automatically analyze a video source, support multiple video formats, and integrate technologies such as video content analysis, character recognition, and speech recognition to produce book files such as picture books, photo albums, comics, and e-books, video content can be used efficiently to generate book files.

Brief Description of the Drawings

The present invention is described in detail below with reference to the accompanying drawings and embodiments:

FIG. 1 is a schematic diagram showing the structure of a book production system according to a preferred embodiment of the present invention;

FIG. 2 is a flow chart showing the flow of a book production method according to a preferred embodiment of the present invention;

FIG. 3 is a schematic diagram showing the capture of key frames in the book production method according to a preferred embodiment of the present invention.

Reference numerals in the figures:

1    Book production system
101  Video receiving module
102  Decoding module
103  Production policy selection module
104  Text extraction module
105  Illustration extraction module
106  Book format selection module
107  Book generation module
108  Editing module
2    Book production method
201–209  Steps of the book production method according to the preferred embodiment of the present invention
301  Single image
302  Key frame
40   Original video data
41   Video data
411  Audio data
412  Subtitle data
413  Image data
50   Production policy
501  Audio analysis algorithm
502  Caption analysis algorithm
503  Image analysis algorithm
5031 Image example data
5032 Object data
504  Scene/shot change analysis algorithm
60   Computer device
601  Signal source interface
602  Memory
603  Central processing unit
604  Input device
605  Storage device
70   Book format
80   Book
801  Text part
802  Illustration part

Detailed Description of the Preferred Embodiments

The book production system and method of the preferred embodiments of the present invention are described below with reference to the related drawings, in which identical components are denoted by the same reference numerals.

Referring to FIG. 1, the book production system 1 of the preferred embodiment of the present invention is used to generate a book 80 comprising a text part 801 and an illustration part 802, and comprises a video receiving module 101, a decoding module 102, a production policy selection module 103, a text extraction module 104, an illustration extraction module 105, a book format selection module 106, a book generation module 107, and an editing module 108.

In this embodiment, the book production system 1 can run on a computer device 60, which may be a conventional computer comprising a signal source interface 601, a memory 602, a central processing unit (CPU) 603, an input device 604, and a storage device 605. The signal source interface 601 connects to a signal source output device or a signal source recording device; it may be, for example, an optical drive, a FireWire (IEEE 1394) interface, or a Universal Serial Bus (USB) port. The signal source output device may be, for example, a digital video camera, and the signal source recording medium may be, for example, a VCD or DVD. The memory 602 may be any one or more kinds of temporary memory provided in a computer device, such as DRAM or EEPROM. The central processing unit 603 may adopt any existing CPU architecture, including, for example, an ALU, registers, and a controller, to process and compute various data and to control the operation of the components of the computer device 60. The input device 604 may be a mouse, keyboard, or other device through which the user can input information or operate the software modules. The storage device 605 may be any one or more computer-readable data storage devices, such as a hard disk drive or floppy disk drive.

Each module in this embodiment is a software module stored in the storage device 605 or on a recording medium. After the central processing unit 603 loads the modules, their functions are realized through the components of the computer device 60. Note, however, that those skilled in the art may also implement the software modules disclosed in this embodiment as hardware, such as an application-specific integrated circuit (ASIC) chip, without departing from the spirit and scope of the present invention.

The functions of the modules in this embodiment are described in detail below.

In this embodiment, the video receiving module 101 receives original video data 40; the decoding module 102 decodes the original video data 40 to obtain video data 41; the production policy selection module 103 accepts a user's operation to select a desired production policy 50; the text extraction module 104 obtains the text part 801 from the video data 41 according to the production policy 50; the illustration extraction module 105 extracts at least one key frame from the video data 41 as the illustration part 802 according to the production policy 50; the book format selection module 106 receives the user's selection to provide at least one book format 70; the book generation module 107 applies the book format 70 and generates the book 80 from the obtained text part 801 and illustration part 802; and finally, after the book 80 is generated, the editing module 108 accepts user operations to edit its content.
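This module flow can be sketched roughly as follows (a minimal illustration only; every callable here is a hypothetical stand-in for the corresponding module of FIG. 1, not the patent's implementation):

```python
def make_book(raw_video, decode, policy, template):
    """Receive -> decode -> policy-driven extraction -> typesetting,
    mirroring the order in which modules 101-107 cooperate."""
    video = decode(raw_video)                           # decoding module 102
    text = policy["extract_text"](video)                # text extraction module 104
    illustrations = policy["extract_keyframes"](video)  # illustration extraction module 105
    return template(text, illustrations)                # book generation module 107

# Toy usage with dummy stand-ins:
book = make_book(
    "raw",
    decode=lambda v: v.upper(),
    policy={
        "extract_text": lambda v: [v + "-text"],
        "extract_keyframes": lambda v: [v + "-frame"],
    },
    template=lambda t, i: list(zip(t, i)),
)
```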

As mentioned above, the video receiving module 101 works with the signal source interface 601; for example, it may obtain the original video data 40 stored in a digital video camera through FireWire (IEEE 1394), or read it from a VCD or DVD through an optical drive. The original video data 40 is a video source stored, transmitted, broadcast, or received by various video capture or receiving devices such as digital video cameras, TV tuner cards, and set-top boxes, or by various video storage media such as DVDs and VCDs, and it can be stored, transmitted, broadcast, or received in various video data formats (such as MPEG-1, MPEG-2, MPEG-4, AVI, ASF, and MOV).

The decoding module 102 decodes the incoming original video data 40 according to its video format, encoding method, or compression method, restoring it to the pre-encoding data or an approximation thereof; for example, if the encoding uses lossy compression, only an approximation of the pre-encoding data can be recovered after decoding. The result is the video data 41. In this embodiment, the video data 41 comprises audio data 411, subtitle data 412, and image data 413. The audio data 411 is the sound played with the video data 41; the subtitle data 412 is the caption stream that appears on screen in synchrony with the image data 413; and the image data 413 is the set of all single images displayed by the video data 41, where one second of video data 41 usually consists of 25 or 29.97 single images played in succession.

The production policy selection module 103 works with the input device 604 so that the user can select the policy to follow when producing the book 80. The production policies 50 provided in this embodiment comprise an audio analysis algorithm 501, a caption analysis algorithm 502, an image analysis algorithm 503, and a scene/shot change analysis algorithm 504.

Continuing from the above, the audio analysis algorithm 501 analyzes the audio data 411 of the video data 41 using feature extraction and feature matching. The features of the audio data 411 include spectral features, volume, zero crossing rate, and pitch. To extract spectral features, the audio data 411 undergoes noise reduction and segmentation, is transformed to the frequency domain by a fast Fourier transform, and a bank of frequency filters then extracts feature values that together form a spectral feature vector. Volume is an easily measured feature whose value can be represented by the root mean square (RMS); volume analysis also assists segmentation, in that silence detection helps determine the boundaries of segments in the audio data 411. The zero crossing rate counts how many times the sound waveform of each clip crosses the zero axis. Pitch is the fundamental frequency of the sound waveform. The audio data 411 can thus be analyzed by comparing the feature vector formed from these features and their values against the features of audio templates in order to find the required audio data 411; the text part 801 is then obtained by speech recognition, and the image data 413 in the video data 41 corresponding to the required audio data 411 is taken as the illustration part 802.

In this embodiment, the audio analysis algorithm 501 can provide audio template categories in advance, such as music, speech, animal sound, male speech, and female speech, for the user to choose the audio category to search for. Feature matching then finds, within an allowed distance range, the audio template category with the shortest Euclidean distance to the feature vector of the audio data 411; if this closest category matches the category the user selected, the audio data 411 satisfies the search condition. The inverse of the shortest distance can also serve as the confidence of the selected audio data 411. The video clips corresponding to the matching audio data 411 are then located, and from each shot of those clips an image meeting the capture criteria is chosen as the illustration part 802. In addition, if the video data 41 includes a caption stream, the caption stream corresponding to the selected audio data 411 is interpreted as the text part 801 of the book 80; if it does not, the selected audio data 411 is interpreted by speech analysis, performing voice-to-text conversion to obtain the text part 801. The computational complexity of the audio analysis algorithm 501 is lower than that of image or visual analysis, so it can also provide guidance and auxiliary data for such analysis.
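The RMS volume, zero-crossing-rate, and shortest-Euclidean-distance matching described above can be sketched as follows (a simplified illustration assuming NumPy; spectral features and pitch are omitted, and the two-feature template vectors are invented for the example):

```python
import numpy as np

def audio_features(clip: np.ndarray) -> np.ndarray:
    """Two of the features named in the text: RMS volume and zero crossing rate."""
    rms = np.sqrt(np.mean(clip ** 2))                  # volume (root mean square)
    zcr = np.mean(np.abs(np.diff(np.sign(clip))) > 0)  # fraction of samples crossing zero
    return np.array([rms, zcr])

def match_category(clip: np.ndarray, templates: dict):
    """Pick the audio template with the shortest Euclidean distance to the
    clip's feature vector; confidence is the inverse of that distance."""
    feats = audio_features(clip)
    name, dist = min(
        ((n, float(np.linalg.norm(feats - t))) for n, t in templates.items()),
        key=lambda pair: pair[1],
    )
    return name, 1.0 / (dist + 1e-9)

# Invented templates for illustration only (feature order: [rms, zcr]):
templates = {"speech": np.array([0.2, 1.0]), "silence": np.array([0.0, 0.0])}
```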

The caption analysis algorithm 502 analyzes the subtitle data 412 in the video data 41 and selects the video frames that carry captions. In other words, if the video data 41 includes a caption stream, the stream is interpreted as the text part 801, and the first video frame corresponding to and time-synchronized with each caption is taken as the illustration part 802. If the video data 41 does not include a caption stream but the captions are embedded in the video images, character recognition extracts the captions from the images as the text part 801, and image processing removes the captions from the selected images (for example, by computing over the data of neighboring frames) to obtain caption-free images for the illustration part 802. The character recognition mentioned above mainly relies on optical character recognition (OCR).
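The two caption paths above (caption stream present vs. captions burned into the frames) can be sketched as follows (a hypothetical illustration: `Video`, `ocr`, and `remove_caption` are invented stand-ins, not the patent's interfaces):

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class Video:                                   # hypothetical container
    captions: Optional[List[Tuple[str, int]]]  # [(caption text, frame index)] or None
    frames: List[object]                       # decoded frames

def extract_by_captions(video: Video,
                        ocr: Callable = None,
                        remove_caption: Callable = None):
    """Caption analysis: prefer the caption stream; otherwise OCR each frame
    and strip the burned-in caption before using the frame as an illustration."""
    if video.captions is not None:
        text = [t for t, _ in video.captions]
        illustrations = [video.frames[i] for _, i in video.captions]
    else:
        text = [ocr(f) for f in video.frames]
        illustrations = [remove_caption(f) for f in video.frames]
    return text, illustrations
```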

The image analysis algorithm 503 analyzes the image data 413 in the video data 41, judging by basic visual features such as color, texture, shape, motion, and position. In this embodiment, when captions are embedded in the video images, character recognition extracts them as the text part 801. The video data 41 can also be compared against image example data 5031 to find frames whose visual features are highly similar, or highly dissimilar, for use as the illustration part 802; or compared against object data 5032, for example using face detection to find video frames in the video data 41 that contain a human face for use as the illustration part 802. In this embodiment, when frames whose visual features are highly similar to the image example data 5031 or the object data 5032, or highly dissimilar, are selected as the video data 41 satisfying the selection criteria, the system can be set to select only one frame per shot as the illustration part 802.
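The one-frame-per-shot selection rule described above can be sketched as follows (a minimal illustration; the `matches` predicate stands in for whatever visual test is configured, such as face detection or similarity to the image example data):

```python
def pick_one_per_shot(frames, shot_ids, matches):
    """Walk the frames in order; keep the first frame in each shot for which
    the visual predicate holds, so each shot yields at most one illustration."""
    chosen, done_shots = [], set()
    for frame, shot in zip(frames, shot_ids):
        if shot not in done_shots and matches(frame):
            chosen.append(frame)
            done_shots.add(shot)
    return chosen
```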

The scene/shot change analysis algorithm 504 analyzes the scene/shot changes in the image data 413 of the video data 41, and selects the first qualifying frame after each scene/shot change to serve both as an illustration part 802 of the book 80 and as a point dividing the video data 41 into segments. That is, if the video data 41 includes a subtitle stream, the subtitle data 412 within each segment is interpreted to serve as the text part 801 of the book 80; if the video data 41 does not include a subtitle stream, the audio data 411 within each segment is interpreted and converted from speech to text by speech analysis to serve as the text part 801. Generally speaking, the video data 41 is a video sequence, which is usually composed of many scenes, and each scene is in turn composed of multiple shots.
In a film, the smallest unit is a shot, and a film is built up from many shots. In a script, the smallest unit is a scene: a scene represents a segment of a story or subject, with a definite starting point and a definite ending point, and such a span of time is called a scene. Typically, a shot consists of frames with consistent visual characteristics, such as color, texture, shape, and motion, and it varies with changes in camera direction and view angle. For example, shooting the same scene from different view angles produces different shots, as does shooting different areas from the same view angle. Since shots can be distinguished by basic visual characteristics, segmenting the video data 41 into a series of consecutive shots is relatively easy to achieve. The technique mainly analyzes statistics of basic visual characteristics, such as a visual-characteristic histogram: when the visual characteristics of a frame differ from those of the previous frame beyond a certain degree, a cut is placed between that frame and the previous one. This shot-detection technique is also widely used in video editing software. As described above, the purpose of scene change analysis is to group consecutive, related shots into one scene. Strictly speaking, this requires understanding the semantics and content of the video data 41, but a combined analysis of audio and visual characteristics can achieve a reasonably sound scene change analysis. A scene change usually produces simultaneous changes in audio characteristics (such as music, speech, noise, and silence) and visual characteristics (such as color and motion); whereas shot segmentation analyzes only visual characteristics, scene change analysis must rely on both audio and visual characteristics.
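The histogram-based shot detection described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the patent's implementation: the 16-bin gray-level histogram, the L1 distance, and the 0.5 threshold are all assumptions chosen for clarity.

```python
def gray_histogram(frame, bins=16):
    """Build a normalized, coarse intensity histogram for one frame.

    `frame` is a flat sequence of 0-255 gray levels; coarse bins make
    the comparison robust to small lighting changes.
    """
    hist = [0] * bins
    for px in frame:
        hist[px * bins // 256] += 1
    total = len(frame)
    return [h / total for h in hist]


def shot_boundaries(frames, threshold=0.5):
    """Return the indices where the histogram difference against the
    previous frame exceeds `threshold`, i.e. likely shot cuts."""
    cuts = []
    prev = gray_histogram(frames[0])
    for i in range(1, len(frames)):
        cur = gray_histogram(frames[i])
        # L1 distance between normalized histograms, in [0, 2]
        diff = sum(abs(a - b) for a, b in zip(prev, cur))
        if diff > threshold:
            cuts.append(i)
        prev = cur
    return cuts
```

On a sequence of three dark frames followed by three bright frames, the only reported boundary is at the dark-to-bright transition, which matches the "cut where visual characteristics change beyond a certain degree" rule above.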

The text extraction module 104 and the illustration extraction module 105 can each be a software module stored in the storage device 605; through computation by the central processing unit 603, they extract the required text part 801 and illustration part 802 according to the production policy 50, to serve as the content for producing the book 80.

The book formats 70 provided by the book format selection module 106 include picture books, photo albums, e-books, comics, and the like. Different filters, such as artistic filters, sketch filters, and edge filters, can be applied to the obtained illustration part 802 to achieve the image-processing effects the user wants. The book formats 70 and the various filters are stored in the storage device 605.
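As a sketch of what such an edge filter does to an illustration, the following minimal Python example computes a gradient-magnitude image. It is a hypothetical stand-in, not the patent's filter; a real product would use an imaging library, and the gradient formula here is one common choice among many.

```python
def edge_filter(img):
    """Simple edge filter: for each interior pixel, sum the absolute
    horizontal and vertical gray-level gradients, clamped to 255.

    `img` is a list of rows of 0-255 gray values; border pixels are
    left at zero for brevity.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]  # horizontal gradient
            gy = img[y + 1][x] - img[y - 1][x]  # vertical gradient
            out[y][x] = min(255, abs(gx) + abs(gy))
    return out
```

Flat regions come out black while a vertical brightness step produces a bright line, which is the outline-like effect an edge filter lends a captured frame.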

The book generation module 107 can be a software module stored in the storage device 605. Through computation by the central processing unit 603, it applies the book format 70 and uses image-processing functions such as rescaling, image composing, and frame drawing to process the obtained text part 801 and illustration part 802, producing the book 80 in accordance with the book format 70 and the font and size selected by the user.

Finally, the editing module 108 can cooperate with the input device 604, so that after the book 80 is generated, the user can further edit its content by operating the input device 604.

To make the content of the invention easier to understand, an example is given below to illustrate the flow of the book production method according to the preferred embodiment of the invention.

Referring to FIG. 2, in the book production method 2 according to the preferred embodiment of the invention, step 201 receives the original video data 40. For example, data recorded by a digital camcorder can be sent to the signal source interface 601 via a transmission cable, providing the frames and content for producing the book 80.

In step 202, the decoding module 102 identifies the format of the original video data 40 and decodes it to produce the decoded video data 41. For example, if the original video data 40 is in interlaced MPEG-2 format, that is, each frame is composed of two fields, this step can first perform MPEG-2 decoding and then deinterlace by interpolation to obtain the video data 41.
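The deinterlace-by-interpolation idea can be illustrated with a minimal "bob" deinterlacer. This Python sketch assumes one common approach (rebuilding a full frame from a single field by averaging adjacent field lines); the patent does not specify which interpolation scheme is used.

```python
def bob_deinterlace(top_field):
    """Expand one field (the even scan lines of an interlaced frame)
    into a full-height frame.  Even output lines are copied from the
    field; the missing odd lines are linearly interpolated as the
    average of the field lines above and below."""
    frame = []
    n = len(top_field)
    for i, line in enumerate(top_field):
        frame.append(list(line))  # even line: copy as-is
        if i + 1 < n:
            nxt = top_field[i + 1]
            # odd line: average of neighboring field lines
            frame.append([(a + b) // 2 for a, b in zip(line, nxt)])
        else:
            frame.append(list(line))  # last odd line: repeat
    return frame
```

A two-line field thus becomes a four-line frame, with the interpolated line halfway in brightness between its neighbors.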

In step 203, the text extraction module 104 and the illustration extraction module 105 analyze the video data 41 according to the production policy 50 to obtain the text part 801 and the illustration part 802. Based on the audio analysis algorithm 501, the subtitle analysis algorithm 502, the image analysis algorithm 503, and the scene/shot change analysis algorithm 504, every frame and all content (including audio content) of the video data 41 is analyzed, searched, and filtered to obtain a text part 801 and an illustration part 802 that satisfy the production policy 50. For example, if the video data 41 includes a subtitle stream, the subtitle stream is interpreted as the text part 801; if it does not, the audio of the video data 41 is interpreted and converted from speech to text by speech analysis to serve as the text part 801. Key frames are then captured from the images corresponding to the subtitle stream or audio as the illustration part 802; note that this embodiment can capture multiple key frames as the illustration part 802. As shown in FIG. 3, decoding the original video data 40 yields the video data 41, which comprises many individual frames 301 (25 or 29.97 per second); after the analysis and search according to the production policy 50, key frames 302 are extracted from these individual frames as the illustration part 802.
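The "first qualifying frame per segment" selection used for key frames can be sketched as below. Mean brightness is used here purely as a stand-in qualifying condition; the patent leaves the actual condition to the production policy, so the function name, threshold, and criterion are all illustrative assumptions.

```python
def pick_keyframes(frames, cuts, min_brightness=30):
    """For the opening segment and each segment starting at a cut
    index, return the index of the first frame whose mean gray level
    clears `min_brightness`.  `frames` is a list of flat 0-255 frames;
    `cuts` is a sorted list of segment start indices (e.g. from a
    shot-boundary detector)."""
    starts = [0] + list(cuts)
    keys = []
    for s_idx, start in enumerate(starts):
        end = starts[s_idx + 1] if s_idx + 1 < len(starts) else len(frames)
        for i in range(start, end):
            if sum(frames[i]) / len(frames[i]) >= min_brightness:
                keys.append(i)  # first qualifying frame of the segment
                break
    return keys
```

With a black opening frame, the first segment's key frame is the second frame rather than the first, mirroring the idea of skipping unusable frames right after a transition.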

Step 204 determines whether the analysis and comparison of all the content in the video data 41 has been completed. If not, step 203 is repeated; if so, the method proceeds to step 205.

Step 205 determines whether a book format 70 needs to be applied to the book 80. If so, the method proceeds to step 206; if not, it proceeds to step 207.

In step 206, the book format selection module 106 lets the user select the desired book format 70. The book formats 70 include various book templates containing pictures, images, photos, paintings, or drawings, such as comics, picture books, photo albums, and e-books, as well as various layouts.

In step 207, the book generation module 107 works from the text part 801 and the illustration part 802 obtained in step 203. If step 206 was performed, it applies the book format 70 provided there and uses different filters, such as artistic filters, sketch filters, and edge filters, to process the illustration part 802 for the desired image-processing effects. It then uses image-processing functions such as rescaling, image composing, and frame drawing to obtain images conforming to the book format 70, and finally converts the text part 801 and the illustration part 802 in accordance with the book format 70 and the chosen font and size to produce the book 80.

Step 208 determines whether the user wants to edit the book 80 manually; if so, the method proceeds to step 209.

In step 209, the user uses the editing module 108 to preview, refine, and modify the content of the book 80. For example, the user can underline or bold the text of important passages of the book 80, or insert additional graphics.

In summary, because the book production system and method of the preferred embodiment of the invention can analyze the video data 41, integrating technologies such as video content analysis, text recognition, and speech recognition over its audio data 411, subtitle data 412, and image data 413, video data can be used efficiently to generate book files.

The above description is illustrative only and not restrictive. Any equivalent modification or change that does not depart from the spirit and scope of the invention shall be included in the protection scope of this patent.

Claims (20)

1. A book production system for producing a book, the book comprising a text part and an illustration part, the book production system comprising:
a video receiving module, which receives an original video data;
a decoding module, which decodes the original video data to obtain a video data, wherein the video data comprises at least an image data, and an audio data or a subtitle data;
a text extraction module, which extracts the text part from the video data according to a production policy;
an illustration extraction module, which extracts a key frame from the video data according to the production policy to serve as the illustration part; and
a book generation module, which generates the book according to the extracted text part and illustration part;
wherein the production policy comprises at least one algorithm selected from the group consisting of an audio analysis algorithm, a subtitle analysis algorithm, an image analysis algorithm, and a scene/shot change analysis algorithm.

2. The book production system according to claim 1, further comprising:
an editing module, which, after the book is generated, receives an operation of a user to edit the content of the book.

3. The book production system according to claim 1, further comprising:
a book format selection module, which receives a selection of a user to provide at least one book format, the book generation module applying the book format to generate the book.

4. The book production system according to claim 1, further comprising:
a production policy selection module, which receives a selection of a user to provide the production policy.

5. The book production system according to claim 1, wherein the audio analysis algorithm analyzes the audio data in the video data, the text extraction module extracts from the audio data according to the audio analysis algorithm to obtain the text part, and the illustration extraction module extracts the image data corresponding to the audio data to serve as the illustration part.

6. The book production system according to claim 1, wherein the subtitle analysis algorithm analyzes the subtitle data in the video data, the text extraction module extracts from the subtitle data according to the subtitle analysis algorithm to obtain the text part, and the illustration extraction module extracts the image data corresponding to the subtitle data to serve as the illustration part.

7. The book production system according to claim 1, wherein the image analysis algorithm analyzes the image data in the video data according to an image pattern, the illustration extraction module extracts from the image data according to the image analysis algorithm to obtain the illustration part, and the text extraction module obtains the text part from the video data corresponding to the image data.

8. The book production system according to claim 1, wherein the image analysis algorithm analyzes the image data in the video data according to an object, the illustration extraction module extracts from the image data according to the image analysis algorithm to obtain the illustration part, and the text extraction module obtains the text part from the video data corresponding to the image data.

9. The book production system according to claim 1, wherein the image analysis algorithm analyzes the image data in the video data, the text extraction module extracts subtitles in the image data to serve as the text part, and the illustration extraction module extracts the image data to serve as the illustration part.

10. The book production system according to claim 1, wherein the scene/shot change analysis algorithm analyzes scene/shot changes of the image data in the video data, and the text extraction module and the illustration extraction module use the scene/shot change analysis algorithm as the basis for selecting and segmenting the text part and the illustration part.

11. A book production method for producing a book, the book comprising a text part and an illustration part, the book production method comprising:
a video receiving step, which receives an original video data;
a decoding step, which decodes the original video data to obtain a video data, wherein the video data comprises at least an image data, and an audio data or a subtitle data;
a text extraction step, which extracts the text part from the video data according to a production policy;
an illustration extraction step, which extracts a key frame from the video data according to the production policy to serve as the illustration part; and
a book generation step, which generates the book according to the extracted text part and illustration part;
wherein the production policy comprises at least one algorithm selected from the group consisting of an audio analysis algorithm, a subtitle analysis algorithm, an image analysis algorithm, and a scene/shot change analysis algorithm.

12. The book production method according to claim 11, further comprising:
an editing step, which, after the book is generated, receives an operation of a user to edit the content of the book.

13. The book production method according to claim 11, further comprising:
a book format selection step, which receives a selection of a user to provide at least one book format, the book generation step applying the book format to generate the book.

14. The book production method according to claim 11, further comprising:
a production policy selection step, which receives a selection of a user to provide the production policy.

15. The book production method according to claim 11, wherein the audio analysis algorithm analyzes the audio data in the video data, the text extraction step extracts from the audio data according to the audio analysis algorithm to obtain the text part, and the illustration extraction step extracts the image data corresponding to the audio data to serve as the illustration part.

16. The book production method according to claim 11, wherein the subtitle analysis algorithm analyzes the subtitle data in the video data, the text extraction step extracts from the subtitle data according to the subtitle analysis algorithm to obtain the text part, and the illustration extraction step extracts the image data corresponding to the subtitle data to serve as the illustration part.

17. The book production method according to claim 11, wherein the image analysis algorithm analyzes the image data in the video data according to an image pattern, the illustration extraction step extracts from the image data according to the image analysis algorithm to obtain the illustration part, and the text extraction step obtains the text part from the video data corresponding to the image data.

18. The book production method according to claim 11, wherein the image analysis algorithm analyzes the image data in the video data according to an object, the illustration extraction step extracts from the image data according to the image analysis algorithm to obtain the illustration part, and the text extraction step obtains the text part from the video data corresponding to the image data.

19. The book production method according to claim 11, wherein the image analysis algorithm analyzes the image data in the video data, the text extraction step extracts subtitles in the image data to serve as the text part, and the illustration extraction step extracts the image data to serve as the illustration part.

20. The book production method according to claim 11, wherein the scene/shot change analysis algorithm analyzes scene/shot changes of the image data in the video data, and the text extraction step and the illustration extraction step use the scene/shot change analysis algorithm as the basis for selecting and segmenting the text part and the illustration part.
CN 01141820 2001-09-19 2001-09-19 Book production system and method Expired - Lifetime CN1202471C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 01141820 CN1202471C (en) 2001-09-19 2001-09-19 Book production system and method


Publications (2)

Publication Number Publication Date
CN1409213A CN1409213A (en) 2003-04-09
CN1202471C true CN1202471C (en) 2005-05-18

Family

ID=4676429


Country Status (1)

Country Link
CN (1) CN1202471C (en)




Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CX01 Expiry of patent term

Granted publication date: 20050518