
HK1126605B - Storage medium recording text-based subtitle stream, apparatus and method reproducing thereof - Google Patents

Storage medium recording text-based subtitle stream, apparatus and method reproducing thereof

Info

Publication number
HK1126605B
HK1126605B HK09105277.4A
Authority
HK
Hong Kong
Prior art keywords
dialog
information
text
type
subtitle
Prior art date
Application number
HK09105277.4A
Other languages
Chinese (zh)
Other versions
HK1126605A1 (en)
Inventor
郑吉洙 (Jung Kil-soo)
朴成煜 (Park Sung-wook)
金光玟 (Kim Kwang-min)
Original Assignee
Samsung Electronics Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from KR1020040032290A (KR100727921B1)
Application filed by Samsung Electronics Co., Ltd.
Publication of HK1126605A1
Publication of HK1126605B


Description

Storage medium recording text-based subtitle stream, and apparatus and method of reproducing the same
This application is a divisional of the patent application filed on 28 February 2005, Application No. 200580000307.0, entitled "Storage medium recording text-based subtitle stream, and apparatus and method of reproducing the same".
Technical Field
The present invention relates to the reproduction of multimedia images, and more particularly, to a storage medium recording multimedia image streams and text-based subtitle streams, and to a reproducing apparatus and method for reproducing the multimedia image streams and text-based subtitle streams recorded on the storage medium.
Background
Video streams and audio streams providing high-definition (HD) multimedia images, presentation graphic streams providing subtitles, and interactive graphic streams providing buttons or menus for interacting with a user are multiplexed into a main stream, also referred to as a video and audio 'AV' data stream, and recorded on a storage medium. In particular, in order to display a subtitle or caption on an image, the presentation graphic stream for providing a subtitle provides it as a bitmap-based image.
Disclosure of Invention
Technical problem
In addition to its large size, bitmap-based subtitle data has the problem that generating subtitle or caption data and editing the generated data are difficult. This is because the subtitle data is multiplexed with other data streams such as video, audio, and interactive graphic streams. Moreover, the output style of the subtitle cannot easily be changed in various ways, that is, changed from one output style to another.
Technical solution
Aspects of the present invention advantageously provide a storage medium in which text-based subtitle streams are recorded, and a reproducing apparatus and method of reproducing text-based subtitle data recorded on such a storage medium.
Advantageous effects
The present invention advantageously provides a storage medium storing a text-based subtitle data stream separately from image data, and a reproducing apparatus and a reproducing method of reproducing such a text-based subtitle data stream, so that generation of subtitle data and editing of the generated subtitle data can be made simpler. In addition, regardless of the number of subtitle data items, the caption may be provided in a plurality of languages.
Drawings
The present invention will be more clearly understood from the following detailed description of exemplary embodiments and the claims when read in conjunction with the accompanying drawings, all forming part of the disclosure of the present invention. Even though the following described and illustrated disclosure focuses on disclosing exemplary embodiments of the invention, it should be clearly understood that the description and illustrations of the exemplary embodiments of the invention are not limited thereto. The spirit and scope of the present invention are to be limited only by the terms of the appended claims. The following represents a brief description of the drawings, in which:
fig. 1 is a diagram for explaining a structure of multimedia data recorded on a storage medium according to an embodiment of the present invention;
fig. 2 illustrates an exemplary data structure of the clip AV stream and a text-based subtitle stream shown in fig. 1 according to an embodiment of the present invention;
fig. 3 is a diagram for explaining an exemplary data structure of a text-based subtitle stream according to an embodiment of the present invention;
fig. 4 illustrates a text-based subtitle stream having the data structure illustrated in fig. 3 according to an embodiment of the present invention;
FIG. 5 illustrates the dialog style unit (DSU) shown in FIG. 3 according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating an exemplary data structure of a dialog style unit (DSU), according to an embodiment of the present invention;
FIG. 7 is a diagram illustrating an exemplary data structure of a dialog style unit (DSU), according to another embodiment of the present invention;
FIG. 8 illustrates an exemplary dialog style unit (DSU) shown in FIG. 6 or FIG. 7 according to an embodiment of the present invention;
fig. 9A and 9B illustrate an exemplary clip information file including a plurality of font sets referred to by font information according to an embodiment of the present invention;
fig. 10 is a diagram showing positions of a plurality of font files referred to by font file information shown in fig. 9A and 9B;
fig. 11 is a diagram for explaining an exemplary data structure of the dialog presentation unit shown in fig. 3, according to an embodiment of the present invention;
fig. 12A and 12B are diagrams for explaining an exemplary data structure of the dialog presentation unit shown in fig. 3, according to an embodiment of the present invention;
fig. 13 illustrates a dialog presentation unit shown in fig. 11 through 12B according to an embodiment of the present invention;
fig. 14 is a diagram for explaining an exemplary data structure of dialog text information shown in fig. 13;
FIG. 15 illustrates the dialog text information illustrated in FIG. 13 in accordance with an embodiment of the present invention;
fig. 16 is a diagram for explaining constraints in continuously reproducing consecutive Dialog Presentation Units (DPUs);
fig. 17 is a diagram explaining an exemplary reproducing apparatus for reproducing text-based subtitle streams according to an embodiment of the present invention;
fig. 18 is a diagram for explaining a preloading procedure of text-based subtitle streams in an exemplary reproducing apparatus according to an embodiment of the present invention;
fig. 19 is a diagram for explaining a reproduction procedure of a Dialog Presentation Unit (DPU) in an exemplary reproducing apparatus according to an embodiment of the present invention;
fig. 20 is a diagram explaining a process in which text-based subtitle streams are synchronized with a moving image and output in a reproducing apparatus according to an embodiment of the present invention;
fig. 21 is a diagram for explaining a process in which text-based subtitle streams are output to a screen of an exemplary reproducing apparatus according to an embodiment of the present invention;
fig. 22 is a diagram for explaining a process of translating text-based subtitle streams in a reproducing apparatus according to an embodiment of the present invention;
fig. 23 illustrates an exemplary status register placed in an exemplary reproducing apparatus for reproducing a text-based subtitle data stream according to an embodiment of the present invention; and
fig. 24 is a flowchart of a method of reproducing a text-based subtitle stream according to an embodiment of the present invention.
Best mode for carrying out the invention
According to an aspect of the present invention, an apparatus for reproducing image data and text-based subtitle data recorded on a storage medium, to display a caption on an image based on the image data, includes: a video decoder for decoding the image data; and a subtitle decoder for converting presentation information items into bitmap images based on style information and controlling the converted presentation information items to be output in synchronization with the decoded image data. The text-based subtitle data includes the presentation information items, each a unit in which the caption is displayed, and the style information, which specifies the output style of the caption.
The subtitle decoder may decode the text-based subtitle data, which is recorded separately from the image data, and output the subtitle data layered on the decoded image data. The style information and the presentation information may be formed in units of Packetized Elementary Streams (PESs), and the subtitle decoder may parse and process the style information and the presentation information in units of the PES.
The style information may be formed as one PES packet recorded at the front of the subtitle data, followed by a plurality of presentation information items recorded in units of PES, and the subtitle decoder applies the one style information item to the plurality of presentation information items.
In addition, the presentation information may include: text information indicating the content of the caption; and composition information for controlling output of the bitmap image obtained by converting the text information. The subtitle decoder may control the time at which the converted text information is output by referring to the composition information.
The presentation information may specify one or more window regions in which the caption is to be output on the screen, and the subtitle decoder may output the converted text information into the one or more window regions.
The output start time and the output end time of the presentation information in the composition information may be defined as time information on the global time axis used in a playlist, which is a reproduction unit of the image data, and the subtitle decoder may synchronize the output of the converted text information with the output of the decoded image data by referring to the output start time and the output end time.
The subtitle decoder may continuously reproduce two presentation information items if an output end time of a presentation information item being reproduced is the same as an output start time of a next presentation information item.
The subtitle decoder may reset its internal buffer between the output start time and the output end time if the next presentation information item does not have to be reproduced continuously, and may preserve the buffer without resetting it if the next presentation information item has to be reproduced continuously.
The style information may be a set of output styles predefined by the manufacturer of the storage medium and applied to the presentation information, and the subtitle decoder may convert the plurality of presentation information items recorded after it into bitmap images based on the style information.
In addition, the text information in the presentation information may include text to be converted into a bitmap image and inline style information to be applied to only a portion of that text, and the subtitle decoder may emphasize the designated portion of the text by applying the inline style information, on top of the style information predefined by the manufacturer, to only that portion of the text.
As the inline style information, the subtitle decoder may apply to the portion of text a value relative to predetermined font information included in the style information predefined by the manufacturer, or a predetermined absolute value.
In addition, the style information may further include user-changeable style information, and after receiving from the user selection information on one of the user-changeable style information items, the subtitle decoder may apply to the text the style information predefined by the manufacturer, then the inline style information, and finally the user-changeable style information item corresponding to the selection information.
As the user-changeable style information, the subtitle decoder may apply to the text a value relative to predetermined font information in the style information item predefined by the manufacturer.
If the storage medium permits, in addition to the style information predefined by the manufacturer, predetermined style information defined in the reproducing apparatus, the subtitle decoder may apply that predetermined style information to the text.
In addition, the style information may include a set of palettes to be applied to the presentation information, and the subtitle decoder converts all presentation information items following the style information into bitmap images based on the colors defined in the palettes.
The presentation information may include a set of palettes and a color update flag in addition to the set of palettes included in the style information; the subtitle decoder may apply the set of palettes included in the presentation information if the color update flag is set to '1', and may apply the initial set of palettes included in the style information if the color update flag is set to '0'.
The subtitle decoder may perform a fade-in/fade-out effect by setting the color update flag to '1' and gradually changing the transparency values of the palettes included in a plurality of consecutive presentation information items, and may reset the color lookup table (CLUT) in the subtitle decoder based on the initial set of palettes included in the style information when the fade-in/fade-out effect is completed.
In addition, the style information may include: region information indicating the position of the window region into which the converted presentation information is to be output on the image; and font information required for converting the presentation information into a bitmap image, and the subtitle decoder may convert the presentation information into a bitmap image by using the region information and the font information.
The font information may include at least one of the output start position, output direction, alignment, line spacing, font identifier, font style, font size, or color of the converted presentation information, and the subtitle decoder converts the presentation information into a bitmap image based on the font information.
For the font identifier, the subtitle decoder may refer to indication information about a font file that is included in a clip information file storing attribute information of the recording unit of the image data.
In addition, the subtitle decoder may buffer subtitle data and a font file referred to by the subtitle data before the image data is reproduced.
In addition, if a plurality of subtitle data items supporting a plurality of languages are recorded on a storage medium, a subtitle decoder may receive selection information about a desired language from a user and reproduce a subtitle data item corresponding to the selection information among the plurality of subtitle data items.
According to another aspect of the present invention, a method of reproducing data from a storage medium storing image data and text-based subtitle data, to display a caption on an image based on the image data, includes: decoding the image data; reading the style information and the presentation information items; converting the presentation information items into bitmap images based on the style information; and controlling the converted presentation information to be output in synchronization with the decoded image data. The text-based subtitle data includes: the presentation information, each item of which is a unit for displaying the caption; and the style information, which specifies the output style of the caption.
According to another aspect of the present invention, there is provided a storage medium storing: image data; and text-based subtitle data for displaying a caption on an image based on the image data, wherein the subtitle data includes: a style information item specifying the output style of the caption; and a plurality of presentation information items, each a display unit of the caption, and the subtitle data is recorded separately from the image data.
Additional aspects and/or advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Modes for carrying out the invention
The present invention will now be described in more detail with reference to the appended drawings, in which exemplary embodiments of the invention are shown.
Referring to fig. 1, a storage medium (e.g., the medium 230 shown in fig. 2) according to an exemplary embodiment of the present invention is formed of a plurality of layers in order to manage the multimedia data structure 100 of the multimedia image streams recorded thereon. The multimedia data structure 100 includes: clips 110, which are recording units of the multimedia images; playlists 120, which are reproduction units of the multimedia images; movie objects 130, which include navigation commands for reproducing the multimedia images; and an index table 140 for specifying the movie object to be reproduced first and the titles of the movie objects 130.
The clip 110 is implemented as one object, which includes a clip audio-visual (AV) stream 112, the AV data stream for a high-image-quality movie, and clip information 114, the attributes corresponding to the AV data stream. For example, the AV data stream may be compressed according to a standard such as Moving Picture Experts Group (MPEG). However, in all aspects of the invention, such a clip 110 does not require the AV data stream 112 to be compressed. In addition, the clip information 114 may include the audio/video attributes of the AV data stream 112, an entry point map in which the locations of random-access entry points are recorded in units of predetermined segments, and the like.
The playlist 120 is a set of reproduction time intervals of the clips 110, and each reproduction time interval is referred to as a play item 122. The movie objects 130 are formed of a navigation command program, and these navigation commands start reproduction of the playlist 120, switch between the movie objects 130, or manage reproduction of the playlist 120 according to the user's preference.
The index table 140 is a table located at an upper layer of the storage medium to define a plurality of titles and menus and includes start position information of all the titles and menus, whereby titles and menus selected by a user operation such as title search or menu call can be reproduced. The index table 140 also includes start position information of titles and menus that are automatically reproduced first when the storage medium is placed on the reproducing apparatus.
Among these items, the structure of a clip AV stream in which multimedia images are compression-encoded will now be explained with reference to fig. 2. Fig. 2 illustrates an exemplary data structure of the AV data stream 210 shown in fig. 1 and a text-based subtitle stream 220 according to an embodiment of the present invention.
Referring to fig. 2, in order to solve the problem related to the bitmap-based caption data described above, a text-based subtitle data stream 220 according to an embodiment of the present invention is provided separately from a clip AV data stream 210 recorded on a storage medium 230 such as a Digital Versatile Disc (DVD). The AV data stream 210 includes a video stream 202, an audio stream 204, a presentation graphic stream 206 for providing subtitle data, and an interactive graphic stream 208 for providing buttons and menus for interacting with a user, all of which are multiplexed into a moving picture main stream called an audio-visual 'AV' data stream and recorded in a storage medium 230.
The text-based subtitle data 220 according to an embodiment of the present invention is data for providing subtitles or captions for the multimedia images recorded on the storage medium 230, and may be implemented using a markup language, such as the extensible markup language (XML), or using binary data. Hereinafter, text-based subtitle data 220 that provides a caption for a multimedia image using binary data will simply be referred to as a "text-based subtitle stream". The presentation graphic stream 206 for providing subtitle or caption data, by contrast, provides bitmap-based subtitle data to display subtitles (or captions) on the screen.
Since the text-based subtitle data stream 220 is recorded separately from the AV data stream 210 and is not multiplexed with it, the size of the text-based subtitle data stream 220 is not limited. As a result, subtitles or captions can be provided in a plurality of languages. Moreover, the text-based subtitle data stream 220 can be conveniently produced and efficiently edited.
The text-based subtitle stream 220 is then converted into a bitmap graphic image and output onto the screen, layered on the multimedia image. The process of converting such text-based data into a graphic-based bitmap image is called rendering. The text-based subtitle stream 220 includes the information required for rendering the caption text.
The structure of the text-based subtitle stream 220 including the rendering information will now be explained with reference to fig. 3. Fig. 3 is a diagram for explaining an exemplary data structure of the text-based subtitle stream 220 according to an embodiment of the present invention.
Referring to fig. 3, the text-based subtitle stream 220 according to an embodiment of the present invention includes a dialog style unit (DSU) 310 and a plurality of Dialog Presentation Units (DPUs) 320 through 340. The DSU 310 and the DPUs 320 through 340 are also referred to as dialog units. Each dialog unit 310 through 340 forming the text-based subtitle stream 220 is recorded in the form of a Packetized Elementary Stream (PES), simply referred to as a PES packet 350. In addition, the PES packets of the text-based subtitle stream 220 are recorded and transmitted in units of Transport Packets (TP) 362. A series of TPs is called a Transport Stream (TS).
However, as shown in fig. 2, the text-based subtitle stream 220 according to an embodiment of the present invention is not multiplexed together with the AV data stream 210 and is recorded as a separate TS on the storage medium 230.
Referring to fig. 3, one dialog unit is recorded in one PES packet 350 of the text-based subtitle stream 220. The text-based subtitle stream 220 includes one DSU 310 located at the front and a plurality of DPUs 320 through 340 located after the DSU 310. The DSU 310 includes information specifying the output style of the dialogs in a caption displayed on the screen on which the multimedia image is reproduced. Meanwhile, the plurality of DPUs 320 through 340 include text information items on the dialog contents to be displayed and information on their respective output times.
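Because each dialog unit occupies exactly one PES packet, splitting a preloaded subtitle stream into its one leading DSU and the DPUs that follow reduces to a loop over PES payloads. The following is a minimal Python sketch of that split; the one-byte unit-type codes and all names are illustrative assumptions, not values taken from the format itself.

```python
# Hypothetical split of a text-based subtitle stream into dialog units,
# assuming each PES payload carries one unit whose first byte tells DSU
# from DPU. The 0x81/0x82 codes are invented for this example.
from dataclasses import dataclass, field

DSU_TYPE, DPU_TYPE = 0x81, 0x82  # assumed segment-type codes

@dataclass
class SubtitleStream:
    dsu: bytes = b""
    dpus: list = field(default_factory=list)

def split_dialog_units(pes_payloads):
    """Collect the one leading DSU and the DPUs that follow it."""
    stream = SubtitleStream()
    for payload in pes_payloads:
        unit_type, body = payload[0], payload[1:]
        if unit_type == DSU_TYPE:
            stream.dsu = body          # style unit: located at the front
        elif unit_type == DPU_TYPE:
            stream.dpus.append(body)   # presentation units follow it
    return stream
```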
Fig. 4 illustrates a text-based subtitle stream 220 having the data structure illustrated in fig. 3 according to an embodiment of the present invention.
Referring to fig. 4, text-based subtitle stream 220 includes a DSU 410 and a plurality of DPUs 420.
In an exemplary embodiment of the present invention, the number of DPUs is defined by num_of_dialog_presentation_units. Alternatively, the number of DPUs may be left unspecified; in that case, a statement such as while (processed_length < end_of_file) can be used to iterate over the DPUs.
Now, the data structures of the DSU and the DPU will be explained in detail with reference to fig. 5. Fig. 5 illustrates the dialog style unit (DSU) shown in fig. 3 according to an embodiment of the present invention.
Referring to fig. 5, the DSU 310 defines dialog_style() 510, a set of dialog style information items that collects the output style information for the displayed dialogs of the caption. The DSU 310 includes information on the position of the region in which a dialog is displayed within the caption, information required for rendering the dialog, style information that the user can control, and the like. The details of this data will be explained later.
Fig. 6 is a diagram for explaining an exemplary data structure of a dialog style unit (DSU) according to an embodiment of the present invention.
Referring to fig. 6, the DSU 310 includes a palette set 610 and a region style set 620. The palette set 610 is a set of palettes defining the colors to be used in the caption. Color combinations and color information such as transparency contained in the palette set 610 may be applied to all of the DPUs located after the DSU.
The region style set 620 is a set of output style information items for each dialog forming the caption. Each region style includes: region information 622 indicating the position at which a dialog is to be displayed on the screen; text style information 624 indicating the output style of the text to be applied to each dialog; and a user-changeable style set 626 indicating the text styles, applied to each dialog, that the user may arbitrarily change.
Fig. 7 is a diagram for explaining an exemplary data structure of a dialog style unit (DSU) according to another embodiment of the present invention.
Referring to fig. 7, unlike fig. 6, no palette set 610 is included. That is, no palette set is defined in the DSU 310; instead, a palette set is defined in the DPU, as will be explained with reference to figs. 12A and 12B. The data structure of each region style 710 is the same as described above with reference to fig. 6.
Fig. 8 illustrates a dialog style unit (DSU) shown in fig. 6 or 7 according to an embodiment of the present invention.
Referring to figs. 8 and 6, the DSU 310 includes a palette set (860, 610) and a plurality of region styles (820, 620). As described above, the palette set 610 is a set of palettes defining the colors to be used in the caption. Color combinations and color information such as transparency included in the palette set 610 may be applied to all of the DPUs located after the DSU.
Meanwhile, each region style (820, 620) includes region information (830, 622) indicating the window region of the screen in which the caption is to be displayed; the region information (830, 622) includes the X and Y coordinates, width, height, background color, and the like of that window region.
In addition, each region style (820, 620) includes text style information (840, 624) indicating the output style of the text to be applied to each dialog. That is, it may include the X and Y coordinates of the position at which the dialog text is to be displayed within the above-described window region, the output direction of the text (left to right, or top to bottom), the alignment, the line spacing, the identifier of the font to be referenced, a font style such as bold or italic, the font size, font color information, and the like.
Also, each region style (820, 620) may further include a user-changeable style set (850, 626) indicating the styles that the user may arbitrarily change. However, the user-changeable style sets (850, 626) are optional. The user-changeable style sets (850, 626) may include change information for the window region position, the text output position, the font size, and the line spacing among the text style information items (840, 624). Each change information entry may be expressed as a value relatively increased or decreased with respect to the text style information (840, 624) to be applied to each dialog.
Summarizing the above, there are three kinds of style-related information: region style information (region_style) 620 defined in the region styles (820, 620); inline style information (inline_style) 1510, explained later, which emphasizes a portion of the text; and user-changeable style information (user_changeable_style) 850. The order in which these information items are applied is as follows (see the sketch after this list):
1) Basically, the region style information 620 defined in the region style is applied.
2) If inline style information is present, the inline style information 1510 is applied on top of the region style information for a portion of the caption text, so as to emphasize that portion.
3) If user-changeable style information 850 is present, it is applied last. User-changeable style information is optional.
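The cascade just described can be summarized in a short sketch. The dictionary field names and the use of numeric deltas for the user-changeable style are assumptions for illustration; only the application order (region style, then inline style, then user-changeable deltas) reflects the text above.

```python
# Sketch of the three-stage style cascade: region style as the base,
# inline style layered over a portion, user-changeable deltas last.
def resolve_style(region_style, inline_style=None, user_style_deltas=None):
    style = dict(region_style)            # 1) base region style
    if inline_style:
        style.update(inline_style)        # 2) inline style overrides a portion
    if user_style_deltas:
        for key, delta in user_style_deltas.items():
            style[key] = style.get(key, 0) + delta  # 3) relative user change
    return style

region = {"font_size": 32, "line_space": 40, "font_style": "normal"}
inline = {"font_style": "bold"}           # emphasize part of the text
user   = {"font_size": +8}                # user picked a larger-text style
print(resolve_style(region, inline, user))  # {'font_size': 40, ...}
```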
Meanwhile, for the text style information items (840, 624) to be applied to the text of each dialog, the font file to be referenced through the font identifier (font_id) 842 may be defined as follows.
Fig. 9A illustrates an exemplary clip information file 910 including a plurality of font sets referenced by the font information 842 shown in fig. 8 according to an embodiment of the present invention.
Referring to figs. 9A, 8, 2 and 1, StreamCodingInfo() 930, a stream coding information structure included in the clip information file (910, 110), contains information on the various streams recorded on the storage medium according to the present invention, i.e., information on the video stream 202, the audio stream, the presentation graphic stream, the interactive graphic stream, the text-based subtitle stream, and so on. Specifically, for the text-based subtitle stream 220, information on the language for displaying the caption (textST_language_code) 932 may be included. In addition, a font name 936 and the file name 938 of the file storing the font information corresponding to the font identifier font_id (842, 934) shown in fig. 8 may be defined. A method for finding the font file corresponding to a font identifier defined here will be explained with reference to fig. 10.
Fig. 9B illustrates an exemplary clip information file 940 including a plurality of font sets referenced by the font information 842 shown in fig. 8 according to another embodiment of the present invention.
Referring to fig. 9B, a structure, ClipInfo(), may be defined in the clip information file (910, 110). In this structure, the plurality of font sets referenced by the font information 842 shown in fig. 8 may be defined. That is, the font file name 952 corresponding to font_id 842, the identifier of the font to be referenced shown in fig. 8, is specified. Now, a method for finding the font file corresponding to a font identifier defined here will be explained.
Fig. 10 is a diagram illustrating the locations of a plurality of font files referenced by the font file names 938 and 952 illustrated in fig. 9A and 9B.
Referring to fig. 10, a directory structure of files regarding multimedia images recorded on a storage medium according to an embodiment of the present invention is illustrated. Specifically, by using the directory structure, the position of the font file such as 11111.font 1010 or 99999.font 1020 in the auxiliary data (AUXDATA) directory can be easily found.
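Resolving a font therefore takes two steps: look up the file name registered for the font_id in the clip information file, then open that file from the AUXDATA directory. A hypothetical sketch follows; the mapping contents and the disc-root layout are assumptions, with only the AUXDATA directory name taken from fig. 10.

```python
# Illustrative font lookup: clip information maps font_id -> file name,
# and the file itself lives in the AUXDATA directory of the disc.
from pathlib import Path

clipinfo_fonts = {1: "11111.font", 2: "99999.font"}  # assumed ClipInfo() contents

def font_path(disc_root, font_id):
    """Return the on-disc path of the font file referenced by font_id."""
    name = clipinfo_fonts[font_id]
    return Path(disc_root) / "AUXDATA" / name

print(font_path("/media/disc", 1))  # /media/disc/AUXDATA/11111.font
```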
Meanwhile, the structure of the DPU, the other kind of dialog unit, will now be explained in more detail with reference to fig. 11.
Fig. 11 is a diagram for explaining an exemplary data structure of the DPU 320 shown in fig. 3 according to an embodiment of the present invention.
Referring to figs. 11 and 3, the DPU 320, which carries text information on the dialog contents to be output and information on the display time, includes: time information 1110 indicating the time at which the dialog is to be output on the screen; palette reference information 1120 specifying the palette to be referenced; and dialog region information 1130 for the dialog to be output on the screen. Specifically, the dialog region information 1130 includes: style reference information 1132 specifying the output style to be applied to the dialog; and dialog text information 1134 specifying the text of the dialog to be actually output on the screen. In this case, it is assumed that the palette set specified by the palette reference information 1120 is defined in the DSU (see 610 of fig. 6).
Meanwhile, fig. 12A is a diagram explaining an exemplary data structure of the DPU 320 shown in fig. 3, according to another embodiment of the present invention.
Referring to figs. 12A and 3, the DPU 320 includes: time information 1210 indicating the time at which the dialog is to be output on the screen; a palette set 1220; and dialog region information 1230 for the dialog to be output on the screen. In this case, the palette set 1220 is not defined in the DSU but is defined directly in the DPU 320.
Meanwhile, fig. 12B is a diagram explaining an exemplary data structure of the DPU 320 shown in fig. 3, according to yet another embodiment of the present invention.
Referring to fig. 12B, the DPU 320 includes: time information 1250 indicating the time at which the dialog is to be output on the screen; a color update flag 1260; a palette set 1270 used when the color update flag is set to 1; and dialog region information 1280 for the dialog to be output on the screen. In this case, in addition to the palette set defined in the DSU as in fig. 11, a palette set 1270 is also stored in the DPU 320. Specifically, to express a fade-in/fade-out effect during continuous reproduction, the palette set 1270 used for expressing the fade-in/fade-out is defined in the DPU 320 in addition to the basic palette set defined in the DSU, and the color update flag 1260 may be set to 1. This will be explained in more detail with reference to fig. 19.
Fig. 13 illustrates the DPU 320 illustrated in figs. 11 through 12B, according to an embodiment of the present invention.
Referring to figs. 13, 11, 12A, and 12B, the DPU includes dialog start time information (dialog_start_PTS) and dialog end time information (dialog_end_PTS) 1310 as the time information 1110 indicating the time at which the dialog is to be output on the screen. In addition, a dialog palette identifier (dialog_palette_id) is included as the palette reference information 1120. In the case of fig. 12A, a palette set 1220 may be included instead of the palette reference information 1120. Dialog text information (region_subtitle) 1334 is included as the dialog region information for the dialog to be output, and a region style identifier (region_style_id) 1332 may be included in order to specify the output style to be applied to it. The example shown in fig. 13 is only one embodiment of the DPU, and a DPU having one of the data structures shown in figs. 11 through 12B may be implemented with various modifications.
Fig. 14 is a diagram for explaining the data structure of the dialog text information (region_subtitle) shown in fig. 13.
Referring to fig. 14, the dialog text information (1134 shown in fig. 11, 1234 shown in fig. 12A, 1284 shown in fig. 12B, and 1334 shown in fig. 13) includes inline style information 1410, an output style for an emphasized portion of the dialog, and the dialog text 1420.
Fig. 15 illustrates the dialog text information 1334 shown in fig. 13 according to an embodiment of the present invention. As shown in fig. 15, the dialog text information 1334 is implemented with inline style information (inline_style) 1510 and dialog text (text_string) 1520. In the embodiment shown in fig. 15, information indicating the end of an inline style is preferably included: unless the end of the inline style is defined, the specified inline style may continue to be applied afterwards, contrary to the manufacturer's intent.
Meanwhile, fig. 16 is a diagram explaining constraints in continuously reproducing consecutive DPUs.
Referring to fig. 16 and 13, when it is necessary to continuously reproduce the above-described plurality of DPUs, the following constraints are required.
1) The dialog start time information (dialog_start_PTS) 1310 defined in a DPU indicates the time at which the dialog object starts to be output onto the Graphics Plane (GP), which will be explained later with reference to fig. 17.
2) The dialog end time information (dialog_end_PTS) 1310 defined in a DPU indicates the time at which the text-based subtitle decoder processing the text-based subtitle is reset, as will be explained later with reference to fig. 17.
3) When the above-described DPUs are to be reproduced continuously, the dialog end time information (dialog_end_PTS) of the current DPU should be identical to the dialog start time information (dialog_start_PTS) of the DPU reproduced continuously after it. That is, in fig. 16, in order to reproduce DPU #2 and DPU #3 continuously, the dialog end time information included in DPU #2 should be the same as the dialog start time information included in DPU #3 (a sketch of this check follows the list).
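Constraint 3) is easy to state as a predicate over the (dialog_start_PTS, dialog_end_PTS) pairs of successive DPUs. The tuple representation and the 90 kHz tick values below are purely illustrative.

```python
# Continuity check: DPUs may be presented seamlessly only when the
# current unit's dialog_end_PTS equals the next unit's dialog_start_PTS.
def is_continuous(dpus):
    """dpus: list of (dialog_start_PTS, dialog_end_PTS) pairs."""
    return all(cur_end == nxt_start
               for (_, cur_end), (nxt_start, _) in zip(dpus, dpus[1:]))

# DPU #2 ends exactly when DPU #3 starts, so the buffers are preserved:
print(is_continuous([(0, 90000), (90000, 180000)]))   # True
print(is_continuous([(0, 90000), (95000, 180000)]))   # False -> decoder reset
```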
Meanwhile, it is preferable that the DSU according to the present invention satisfies the following limitations.
1) The text-based subtitle stream 220 includes one DSU.
2) The number of user-changeable style information items (user_control_style) included in each region style (region_style) should be the same.
Meanwhile, it is preferable that the DPU according to the present invention satisfies the following constraint.
1) At most two window regions for the caption should be defined.
A structure of an exemplary reproducing apparatus based on a data structure of the text-based subtitle stream 220 recorded on the storage medium according to an embodiment of the present invention will now be explained with reference to fig. 17.
Fig. 17 is a diagram explaining a structure of an exemplary reproducing apparatus for reproducing text-based subtitle streams according to an embodiment of the present invention.
Referring to fig. 17, a reproducing apparatus 1700, also referred to as a playback apparatus, includes a buffer unit, comprising a Font Preloading Buffer (FPB) 1712 for storing font files and a Subtitle Preloading Buffer (SPB) 1710 for storing a text-based subtitle file, and a text-based subtitle decoder 1730 for decoding and reproducing the text-based subtitle stream recorded on the storage medium and outputting it through a Graphics Plane (GP) 1750 and a color lookup table (CLUT) 1760.
Specifically, the buffer unit includes the Subtitle Preloading Buffer (SPB) 1710, in which the text-based subtitle data stream 220 is preloaded, and the Font Preloading Buffer (FPB) 1712, in which the font information is preloaded.
The subtitle decoder 1730 includes a text subtitle processor 1732, a Dialog Composition Buffer (DCB) 1734, a Dialog Buffer (DB) 1736, a text subtitle renderer 1738, a dialog presentation controller 1740, and a Bitmap Object Buffer (BOB) 1742.
The text subtitle processor 1732 receives the text-based subtitle data stream 220 from the Subtitle Preloading Buffer (SPB) 1710, transmits the above-described style-related information included in the DSU and the dialog output time information included in the DPU to the Dialog Composition Buffer (DCB) 1734, and transmits the dialog text information included in the DPU to the Dialog Buffer (DB) 1736.
The presentation controller 1740 controls the text subtitle renderer 1738 by using the style-related information stored in the Dialog Composition Buffer (DCB) 1734, and controls the time at which the bitmap image rendered into the Bitmap Object Buffer (BOB) 1742 is output to the Graphics Plane (GP) 1750 by using the dialog output time information.
The text subtitle renderer 1738, under the control of the presentation controller 1740, converts the dialog text information into a bitmap image, i.e., performs rendering, by applying to the dialog text information stored in the Dialog Buffer (DB) 1736 the corresponding font information among the font information preloaded in the Font Preloading Buffer (FPB) 1712. The rendered bitmap image is stored in the Bitmap Object Buffer (BOB) 1742 and is output to the Graphics Plane (GP) 1750 under the control of the presentation controller 1740. At this time, the colors specified in the DSU are applied by referring to the color lookup table (CLUT) 1760.
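The dataflow through these components can be caricatured in a few lines of Python. Everything here is a stand-in: the dictionaries model the DSU/DPU fields named above, and render_text() abstracts away the actual glyph rasterization.

```python
# Schematic dataflow of the subtitle decoder in fig. 17, with each buffer
# reduced to a plain variable.
def render_text(text, style, font):          # placeholder for rasterization
    return f"bitmap[{text}|{font}]"

def decode_dialog_unit(dsu, dpu, fpb):
    # text subtitle processor: style/time info -> DCB, dialog text -> DB
    dcb = {"styles": dsu["styles"], "window": (dpu["start"], dpu["end"])}
    db = dpu["text"]
    # renderer: apply the referenced region style and the preloaded font
    style = dcb["styles"][dpu["region_style_id"]]
    bob = render_text(db, style, fpb[style["font_id"]])   # rendered image -> BOB
    # presentation controller: output BOB to the GP within [start, end)
    return dcb["window"], bob

dsu = {"styles": {0: {"font_id": 1}}}
dpu = {"start": 0, "end": 90000, "region_style_id": 0, "text": "Hello"}
print(decode_dialog_unit(dsu, dpu, {1: "11111.font"}))
```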
As the style-related information to be applied to the dialog text, information defined in the DSU by the manufacturer may be used, and style-related information defined by the user may also be applied. As shown in fig. 17, the reproducing apparatus 1700 applies user-defined style information in preference to the style-related information defined by the manufacturer.
As described with reference to fig. 8, as the style-related information to be applied to the dialog text, the region style information (region_style) defined in the DSU by the manufacturer is used as the base, and if an inline style is included in the DPU containing the dialog text, the inline style information (inline_style) is applied to the corresponding portion on top of the region style information applied to the DPU. In addition, if the manufacturer has defined user-changeable styles in the DSU and the user selects one of them, the region style and/or inline style is applied first, and then the user-changeable style is applied last. In addition, as described with reference to fig. 15, information indicating where application of an inline style ends is preferably included in the inline style content.
Also, the manufacturer may specify whether use of style-related information defined in the reproducing apparatus itself, separate from the style-related information that the manufacturer defines and records on the storage medium, is allowed or prevented.
Fig. 18 is a diagram for explaining a preloading procedure of the text-based subtitle data stream 220, for example, in the reproducing apparatus 1700, shown in fig. 17, according to an embodiment of the present invention.
Referring to fig. 18, the text-based subtitle data stream 220 shown in fig. 2 is defined in a sub path of the playlist described above. In the sub path, a plurality of text-based subtitle data streams 220 supporting a plurality of languages may be defined. In addition, the font files to be applied to the text-based subtitles may be defined in the clip information file 910 or 940 described above with reference to figs. 9A and 9B. Up to 255 text-based subtitle data streams 220 may be included in one storage medium and defined in each playlist. Likewise, up to 255 font files may be included in one storage medium. However, in order to guarantee seamless presentation, the size of a text-based subtitle data stream 220 should be smaller than or equal to the size of the preloading buffer 1710 of the reproducing apparatus 1700 shown in fig. 17.
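The buffer constraint in the last sentence amounts to a simple admission check before presentation starts. A sketch follows; the text specifies only the comparison, so the buffer capacities below are invented for the example.

```python
# Preload feasibility check: the subtitle stream must fit the subtitle
# preloading buffer (SPB) and its fonts the font preloading buffer (FPB).
SPB_SIZE = 4 * 2**20   # assumed 4 MiB subtitle preloading buffer
FPB_SIZE = 4 * 2**20   # assumed 4 MiB font preloading buffer

def can_preload(subtitle_bytes, font_file_sizes):
    return (len(subtitle_bytes) <= SPB_SIZE
            and sum(font_file_sizes) <= FPB_SIZE)

print(can_preload(b"\x00" * 1024, [200_000, 300_000]))  # True
```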
Fig. 19 is a diagram for explaining a reproducing process of a DPU in a reproducing apparatus according to the present invention.
Referring to figs. 19, 13, and 17, a process of reproducing a DPU is illustrated. As shown in fig. 17, the presentation controller 1740 controls the time at which the rendered dialog is output to the Graphics Plane (GP) 1750 by using the dialog start time information (dialog_start_PTS) and dialog end time information (dialog_end_PTS) that specify the output time 1310 of the dialog included in the DPU. At this time, the dialog start time specifies the time by which transfer of the rendered dialog bitmap image, stored in the Bitmap Object Buffer (BOB) 1742 of the text-based subtitle decoder 1730, to the Graphics Plane (GP) 1750 must be completed. That is, at the dialog start time defined in the DPU, the bitmap information required to compose the dialog should have finished being transferred to the Graphics Plane (GP) 1750 and be ready for use. In addition, the dialog end time information specifies the time at which reproduction of the DPU is completed. At this time, the subtitle decoder 1730 and the Graphics Plane (GP) 1750 are both reset. It is preferable that the buffers in the subtitle decoder 1730, such as the Bitmap Object Buffer (BOB) 1742, be reset between the start time and the end time of a DPU, except in the case of continuous reproduction described below.
However, when a plurality of DPUs are to be reproduced continuously, the subtitle decoder 1730 and the Graphics Plane 1750 are not reset, and the contents stored in each buffer, such as the Dialog Composition Buffer (DCB) 1734, the Dialog Buffer (DB) 1736, and the Bitmap Object Buffer (BOB) 1742, should be preserved. That is, when the dialog end time information of the DPU currently being reproduced is the same as the dialog start time information of the DPU to be reproduced continuously after it, the contents of each buffer are retained rather than reset.
Specifically, the fade-in/fade-out effect is an example of applying continuous reproduction of a plurality of DPUs. The fade-in/fade-out effect may be performed by changing the color lookup table (CLUT) 1760 for a bitmap object transmitted to the Graphics Plane (GP) 1750. That is, the first DPU includes composition information such as color, style, and output time, and the plurality of DPUs that follow have the same composition information as the first DPU but update only the palette information. In this case, by gradually changing the transparency in the color entries from 0% to 100%, the fade-in/fade-out effect can be performed.
Specifically, when the data structure of the DPU shown in fig. 12B is used, the fade-in/fade-out effect can be performed effectively by using the color update flag 1260. That is, if the dialog presentation controller 1740 determines that the color update flag 1260 included in the DPU is set to '0', i.e., in the general case where no fade-in/fade-out effect is required, the color information included in the DSU shown in fig. 6 is used. However, if the presentation controller 1740 determines that the color update flag 1260 is set to '1', i.e., if a fade-in/fade-out effect is required, the effect is performed by using the color information 1270 included in the DPU instead of the color information 610 included in the DSU shown in fig. 6. At this time, by adjusting the transparency of the color information 1270 included in the DPU, the fade-in/fade-out effect can be performed simply.
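A fade-in, then, is a run of DPUs that share one composition, set the color update flag to 1, and carry palettes whose transparency steps toward opaque. A sketch under assumed representations: palette entries are modeled as (Y, Cr, Cb, alpha) tuples and DPUs as dictionaries, purely for illustration.

```python
def fade_in_dpus(base_dpu, steps=5):
    """Clone base_dpu into `steps` DPUs whose palette alpha ramps 0 -> 255."""
    dpus = []
    for i in range(steps):
        alpha = round(255 * i / (steps - 1))
        dpu = dict(base_dpu)
        dpu["color_update_flag"] = 1          # use the DPU palette, not the DSU's
        dpu["palette"] = [(y, cr, cb, alpha)
                          for (y, cr, cb, _) in base_dpu["palette"]]
        dpus.append(dpu)
    # after the last DPU, the decoder resets its CLUT from the DSU palette set
    return dpus

base = {"text": "Hello", "palette": [(235, 128, 128, 0)]}
print([d["palette"][0][3] for d in fade_in_dpus(base)])  # [0, 64, 128, 191, 255]
```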
Therefore, after the fade-in/fade-out effect has been displayed, it is preferable to update the color lookup table (CLUT) 1760 with the initial color information included in the DSU. Otherwise, the color information once specified would continue to be applied thereafter, contrary to the manufacturer's intent.
Fig. 20 is a diagram explaining a process in which a text-based subtitle stream is synchronized with moving image data and output, in a reproducing apparatus according to an embodiment of the present invention.
Referring to fig. 20, the dialog start time information and dialog end time information included in a DPU of the text-based subtitle data stream 220 should be defined as points on the global time axis used in the playlist, so as to be synchronized with the output times of the AV data stream 210 of the multimedia image. Accordingly, a discontinuity between the System Time Clock (STC) of the AV data stream and the dialog output times (PTS) of the text-based subtitle data stream 220 can be prevented.
Fig. 21 is a diagram for explaining a process in which a text-based subtitle data stream is output onto a screen in a reproducing apparatus according to an embodiment of the present invention.
Referring to fig. 21, shown is a process in which dialog text information 2104 is converted into a bitmap image 2106 by applying rendering information 2102 including the style-related information, and the converted bitmap image is output to the corresponding position on the Graphics Plane (GP) 1750 based on the output position information (such as region_horizontal_position and region_vertical_position) included in composition information 2108.
The rendering information 2102 represents style information such as the region width and height, foreground color, background color, text alignment, font name, font style, and font size. As described above, the rendering information 2102 is defined in the region style set in the DSU. Meanwhile, the composition information 2108 indicates the start time and end time of the presentation, the horizontal and vertical position of the window region on the Graphics Plane (GP) 1750, and the like. This information is defined in the DPU.
Fig. 22 is a diagram explaining a process of rendering the text-based subtitle data stream 220 in the reproducing apparatus 1700 shown in fig. 17, according to an embodiment of the present invention.
Referring to figs. 22, 21, and 8, the window region specified by region_horizontal_position, region_vertical_position, region_width, and region_height (the window position information 830 defined for the caption in the DSU) designates the region in which the caption is displayed on the Graphics Plane (GP) 1750. The bitmap image of the rendered dialog is displayed from the start position specified by text_horizontal_position and text_vertical_position, the output position 840 of the dialog within the window region.
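In coordinates, this placement reduces to offsetting the text start point by the window region's origin. A sketch follows, assuming (as the description suggests) that text_horizontal_position and text_vertical_position are given relative to the window region.

```python
# Placing a rendered caption on the graphics plane: window region from the
# DSU region information, text start point from the dialog output position.
def place_caption(region, text_pos):
    """Keys mirror the identifiers quoted above; values are plane pixels."""
    x0 = region["region_horizontal_position"]
    y0 = region["region_vertical_position"]
    # text_* positions are assumed relative to the window region's origin:
    pen = (x0 + text_pos["text_horizontal_position"],
           y0 + text_pos["text_vertical_position"])
    return (x0, y0, region["region_width"], region["region_height"]), pen

region = {"region_horizontal_position": 200, "region_vertical_position": 800,
          "region_width": 1520, "region_height": 200}
print(place_caption(region, {"text_horizontal_position": 40,
                             "text_vertical_position": 20}))
# ((200, 800, 1520, 200), (240, 820))
```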
Meanwhile, the reproducing apparatus according to the present invention stores the style information (style_id) selected by the user in a system register area. Fig. 23 illustrates an exemplary status register provided in a reproducing apparatus for reproducing a text-based subtitle data stream according to an embodiment of the present invention.
Referring to fig. 23, the player status register (hereinafter referred to as PSR) stores the style information selected by the user (selected style 2310) in its twelfth register, PSR 12. Therefore, for example, even after the reproducing apparatus 1700 shown in fig. 17 performs a menu call or another operation, if the user presses a style change button, the style information previously selected by the user can be applied first by referring to PSR 12. The register in which this information is stored may vary.
Now, a method of reproducing the text-based subtitle data stream 220 recorded on a storage medium, using the structure of the reproducing apparatus described above, will be explained with reference to fig. 24. Fig. 24 is a flowchart of the operations of a method of reproducing the text-based subtitle data stream 220 according to an embodiment of the present invention.
In operation 2410, the text-based subtitle data stream 220 including DSU information and DPU information is read from, for example, the storage medium 230 shown in fig. 2. In operation 2420, the caption text included in the DPU information is converted into a bitmap image based on the rendering information included in the DSU information. In operation 2430, the converted bitmap image is output on the screen according to the time information and position information, which are the composition information included in the DPU information.
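Put together, operations 2410 through 2430 have the shape of the short driver below. The data layout repeats the illustrative conventions of the earlier sketches and stands in for the real stream syntax; present() is a caller-supplied output step.

```python
# Compact end-to-end sketch of the method of fig. 24.
def reproduce_subtitles(dsu, dpus, present):
    """present(bitmap, start, end) performs the synchronized output step."""
    for dpu in dpus:                                          # operation 2410
        style = dsu["styles"][dpu["region_style_id"]]
        bitmap = f"bitmap[{dpu['text']}|{style['font_id']}]"  # operation 2420
        present(bitmap, dpu["start"], dpu["end"])             # operation 2430

dsu = {"styles": {0: {"font_id": 1}}}
dpus = [{"region_style_id": 0, "text": "Hi", "start": 0, "end": 90000}]
reproduce_subtitles(dsu, dpus, lambda b, s, e: print(b, s, e))
```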
As described above, the present invention advantageously provides a storage medium storing a text-based subtitle data stream separate from image data, a reproducing apparatus and a reproducing method for reproducing such a text-based subtitle data stream, so that generation of subtitle data and editing of subtitle data become simpler. In addition, regardless of the number of subtitle data items, the caption may be provided in a plurality of languages.
In addition, since the subtitle data is formed of one style information item (DSU) and a plurality of presentation information items (DPUs), the output style to be applied to the entire presentation data may be defined in advance and changed in various ways, and an inline style emphasizing part of a caption and user-changeable styles may also be defined.
Also, by using a plurality of adjacent presentation information items, continuous reproduction of the description becomes possible, and by applying this, fade-in/fade-out and other effects can be easily performed.
The exemplary embodiments of the present invention can also be written as computer programs and can be executed in general-use digital computers that execute the programs using a computer readable medium. Examples of the computer readable medium include magnetic storage media (e.g., ROM, floppy disks, hard disks, etc.), optical recording media (e.g., CD-ROMs, DVDs, etc.), and storage media such as carrier waves (e.g., transmission through the internet). The computer readable medium can also be distributed over network coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.
While exemplary embodiments of the invention have been shown and described, it will be understood by those skilled in the art that various changes and modifications may be made, and equivalents may be substituted for elements thereof without departing from the spirit and scope of the invention. Many modifications may be made to adapt the teachings of the present invention to a particular situation without departing from its scope. For example, any computer readable medium or data storage device may be applied as long as text-based subtitle data and AV data are separately recorded on the computer readable medium or data storage device. In addition, as shown in fig. 3 or 4, text-based subtitle data may also be configured differently. Also, the reproducing apparatus shown in fig. 17 may also be implemented as a partial recording apparatus, or otherwise as a single apparatus for performing a recording and/or reproducing function with respect to a storage medium. Likewise, the CPU may be implemented as a chipset having firmware or otherwise as a general or special purpose computer programmed to perform the methods described with reference to FIG. 24, for example. Therefore, it is intended that the invention not be limited to the disclosed exemplary embodiments, but that the invention will include all embodiments falling within the scope of the appended claims.
Industrial applicability
The present invention is applied to a storage medium in which text-based subtitle streams are recorded, and a reproducing apparatus and method of reproducing text-based subtitle data recorded on such a storage medium.
The present invention advantageously provides a storage medium storing a text-based subtitle data stream separately from image data, and a reproducing apparatus and a reproducing method of reproducing such a text-based subtitle data stream, so that generation of subtitle data and editing of the generated subtitle data can be made simpler. In addition, regardless of the number of subtitle data items, the caption may be provided in a plurality of languages.

Claims (3)

1. A method of reproducing data from a storage medium storing image data and text-based subtitle data to display a dialog on an image based on the image data, comprising:
decoding the image data;
receiving text-based subtitle data including a dialog presentation unit and a dialog style unit, and converting text for a dialog included in the dialog presentation unit into a bitmap image based on the dialog style unit;
outputting the converted text for the dialog in synchronization with the decoded image data,
wherein the dialog presentation unit includes the text for the dialog and output time information indicating a time when the dialog is to be output on a screen, and the dialog style unit includes text style information specifying an output style of the text to be applied to the corresponding dialog and a palette set including a set of a plurality of palettes defining colors of the text to be applied to the corresponding dialog.
2. The method of claim 1, wherein the dialog presentation unit and the dialog style unit are formed in units of packetized elementary streams, and the dialog presentation unit and the dialog style unit are parsed and processed in units of packetized elementary streams.
3. The method of claim 1, wherein if a plurality of subtitle data items supporting a plurality of languages are recorded on the storage medium, selection information about a desired language from a user is received, and a subtitle data item corresponding to the selection information among the plurality of subtitle data items is reproduced.
HK09105277.4A 2004-02-28 2006-08-04 Storage medium recording text-based subtitle stream, apparatus and method reproducing thereof HK1126605B (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
KR20040013827 2004-02-28
KR10-2004-0013827 2004-02-28
KR10-2004-0032290 2004-05-07
KR1020040032290A KR100727921B1 (en) 2004-02-28 2004-05-07 Storage medium, reproducing apparatus for recording text-based subtitle stream, and reproducing method thereof
HK06108663.3A HK1088434B (en) 2004-02-28 2005-02-28 Storage medium recording text-based subtitle stream, apparatus and method reproducing thereof

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
HK06108663.3A Addition HK1088434B (en) 2004-02-28 2005-02-28 Storage medium recording text-based subtitle stream, apparatus and method reproducing thereof

Related Child Applications (1)

Application Number Title Priority Date Filing Date
HK06108663.3A Division HK1088434B (en) 2004-02-28 2005-02-28 Storage medium recording text-based subtitle stream, apparatus and method reproducing thereof

Publications (2)

Publication Number Publication Date
HK1126605A1 HK1126605A1 (en) 2009-09-04
HK1126605B true HK1126605B (en) 2011-11-11


Similar Documents

Publication Publication Date Title
US8437612B2 (en) Storage medium recording text-based subtitle stream, reproducing apparatus and reproducing method for reproducing text-based subtitle stream recorded on the storage medium
US8615158B2 (en) Reproduction device, reproduction method, program storage medium, and program
KR101119116B1 (en) Text subtitle decoder and method for decoding text subtitle streams
US8374486B2 (en) Recording medium storing a text subtitle stream, method and apparatus for a text subtitle stream to display a text subtitle
JP5307099B2 (en) Recording medium and device for reproducing data from recording medium
RU2378722C2 (en) Recording medium, method and device for playing back text subtitle streams
JP2007522596A (en) Recording medium and method and apparatus for decoding text subtitle stream
CN100473133C (en) Method for reproducing text subtitle and text subtitle decoding system
WO2005076605A1 (en) Method for reproducing text subtitle and text subtitle decoding system
HK1126605B (en) Storage medium recording text-based subtitle stream, apparatus and method reproducing thereof
HK1088434B (en) Storage medium recording text-based subtitle stream, apparatus and method reproducing thereof
HK1116588B (en) Method for reproducing storage medium recording text-based subtitle stream