HK1131493B

HK1131493B - Performing trick play functions in a digital video recorder with efficient use of resources

Info

Publication number: HK1131493B
Application number: HK09111355.7A
Authority: HK
Inventors: 埃里克‧文尼尔
Original assignee: Tivo Solutions Inc.
Priority date: 2006-10-31
Filing date: 2007-10-31
Publication date: 2013-05-10

Abstract

Techniques for selecting a picture of a video program for display in accordance with a selected trick play mode of a DVR are described herein. Sometimes, when desired playback speed is faster than normal, a DVR is forced to select some pictures to play and some pictures to skip. Ideally, in order to preserve 'smooth' viewing quality, the next picture to be displayed should be temporally close to the currently displayed picture. Since some pictures can take longer to decode than others, accounting for picture decoding costs helps ensure that the best picture can be selected for decoding given playback speed requirements. According to one embodiment of the invention, for each picture of a plurality of pictures in a video data stream, a DVR determines a decoding cost for that picture. Based on the determined costs, the DVR selects a particular picture. The DVR outputs the selected picture for display.

Description

Performing trick play functions in a digital video recorder with efficient use of resources

Priority

The present invention claims priority from U.S. provisional patent application No. 60/855,890 entitled "trick play for advanced codec" filed on 2006, 10, 31, which is incorporated herein by reference for all purposes.

Technical Field

Embodiments of the present invention relate generally to Digital Video Recorders (DVRs). More particularly, embodiments of the invention relate to techniques for performing fast forward, rewind, and other trick play functions for performing digital video recording run by a digital video recorder.

Background

The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Accordingly, unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.

The Moving Picture Experts Group (MPEG) specifies several standards for encoding video streams. The MPEG standard specifies that an encoded video stream may contain a plurality of frames. The encoded video stream may be "interlaced" or "progressive". If the encoded video stream is interlaced, each frame in the video stream comprises two fields. The "top" field of an interlaced frame represents the odd lines of pixels in the frame and the "bottom" field of the interlaced frame represents the even lines of pixels in the frame. As used herein, a "picture" represents a frame (in the case of a progressive video stream) or a field (in the case of an interlaced video stream) and encodes the frame or field. The image of the encoded frame is referred to as a "frame image". Pictures that encode a single field are referred to as "field pictures".

In addition to being a frame picture or a field picture, a given picture may also be an intra-coded picture ("I-picture"), a predictive-coded picture ("P-picture"), or a bi-directionally predictive-coded picture ("B-picture"). I-pictures independently represent a complete frame or field in a video stream; no data for other pictures in the video stream is required to decode and render the frames or fields of the I-picture representation. In contrast, P-pictures and B-pictures cannot independently represent frames or fields in a video stream. P-pictures and B-pictures rely on data in the video stream that is encoded by one or more other pictures (in addition to data encoded by the P-pictures and B-pictures themselves) to adequately represent a complete frame or field in the video stream. More particularly, the sub-components ("blocks") of P-pictures and B-pictures refer to (refer to) other pictures in the video stream.

Each picture in an MPEG encoded video stream is subdivided into "macroblocks". Each "macroblock" is a set of 256 pixels 16 pixels high and 16 pixels wide. Each macroblock is also subdivided into "blocks". A "block" is a collection of pixels. The size of the pixel blocks may vary depending on the particular MPEG standard used to encode the video stream.

In an MPEG encoded video stream, pictures appear in "decoding order" (the order in which those pictures will be decoded) rather than "presentation order" (the order in which the content represented by those pictures will be presented). Since a particular image cannot be fully decoded until all other images to which the particular image block refers are decoded, the particular image is placed later in the decoding order MPEG encoded video stream than the other images at encoding time. As a result, when the specific picture is decoded, other pictures to which the specific picture block refers have already been decoded.

The I-picture and P-picture are referred to as "reference pictures" because blocks of other pictures refer to the I-picture and P-picture. According to some coding standards, B-pictures are not reference pictures because blocks of other pictures do not reference B-pictures under the standard. A block in a P-picture may refer back to a previous (referring to presentation order) reference picture in the video stream. A block in a B-picture may refer to a pair of other pictures in the video stream. The pair of other pictures includes a preceding (referring to presentation order) reference picture in the video stream and a succeeding (referring to presentation order) reference picture in the video stream. Blocks in an I-picture do not refer to any other picture in the video stream.

The MPEG-2 standard complies with certain specific restrictions on which other pictures blocks of a particular picture refer to. The MPEG-2 standard requires that a picture to which a block of a P-picture refers be the same picture for all blocks of the P-picture that refer to another picture; according to the MPEG-2 standard, different blocks of the same P-picture are not allowed to refer to different pictures in the video stream. Similarly, the MPEG-2 standard requires that a pair of pictures to which a block of a B-picture refers be the same pair of pictures for all blocks of the B-picture reference pair; according to the MPEG-2 standard, different blocks of the same B-picture are not allowed to refer to different pairs of pictures in the video stream. The VC-1 coding standard also complies with the aforementioned limitations. In contrast, the MPEG-4 standard is not similarly limited; different blocks of a given picture in an MPEG-4 encoded video stream may refer to different pictures (in the case of P-pictures) or different pairs of pictures (in the case of B-pictures) in the video stream.

Furthermore, the MPEG-2 standard specifies that only the two most recently decoded frames of a reference picture are retained in the frame buffer so that blocks of other pictures can reference the decoded frames. Whenever a new frame of reference pictures is encountered in an MPEG-2 encoded video stream, if two decoded frames already exist in the frame buffer, one of the decoded frames is removed from the frame buffer to make room for the new frame. This imposes a restriction on the set of other frames that a block in the MPEG-2 encoded stream can refer to. The VC-1 coding standard also has the aforementioned limitations. In contrast, under the MPEG-4 standard, 16 decoded frames of reference pictures (or 32 decoded fields of reference pictures) may be retained in a frame buffer so that blocks of other pictures can refer to the decoded frames. Thus, under the MPEG-4 standard, the set of other frames that a block can reference is less limited.

Furthermore, under the MPEG-2 standard, at least the most recently decoded frame is selected for removal whenever a frame needs to be removed from the frame buffer, as described above. In contrast, under the MPEG-4 standard, any given frame in the frame buffer may be selected for removal regardless of how recently the given frame was decoded, regardless of when the frame needs to be removed from the frame buffer.

The functions of a Digital Video Recorder (DVR) include playback, random access, and "trick play" of content. Trick play functions include play pause, fast forward and rewind performed at various frame rates or play speeds. Despite the differences between MPEG-2 and other more advanced standards, such as VC1(SMPTE-421M) and AVC (MPEG-4 part 10 or h.264), commercially available DVRs typically run trick play functions as if the DVR had to run under the limitations of at least some of the old MPEG-2 standards. As a result, commercially available DVRs provide their users with a relatively simple and natural trick play experience. Conventional methods of performing trick play functions of DVRs, which have the characteristics of intra-stream inaccurate reconfiguration (rearrangement), low number of frames per second, etc., typically utilize a large amount of resources-including processor resources, memory, and/or hard disk space-or provide a tedious viewing experience. There is a need for a method of providing trick play functionality in a DVR with an advanced codec or legacy codec that occupies a limited amount of additional resources beyond that required for conventional playback while providing a high quality viewing experience.

Drawings

The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements; wherein:

FIG. 1 shows an example of a representation of a series of images in a data stream portion representing a moving video program;

FIG. 2 shows an example of a dependency graph constructed by a DVR based on the series of images shown in FIG. 1, according to one embodiment of the invention;

FIG. 3 is a block diagram of the internal structure and operation of a DVR, according to one embodiment of the invention; and

fig. 4 is a block diagram showing a digital video recorder on which an implementation of the embodiment is based.

Detailed Description

The present invention describes a method and apparatus for performing trick play functions in a digital video recorder by efficiently utilizing resources. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It may be evident, however, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.

Specific embodiments are described herein according to the following summary:

1.0 general overview

2.0DVR overview

3.0 performing trick play functions in a digital video recorder with efficient utilization of resources

4.0 execution mechanisms-hardware overview

5.0 extensions and alternatives

1.0 general overview

The needs identified in the foregoing background, and other needs and objects that will become apparent for the following description, are achieved by the present invention, which comprises, in one aspect, a method for selecting frames of a multi-frame video program for play in accordance with a selected trick play mode of a DVR. For example, a DVR may perform such a method.

The DVR should be capable of playing back video programs at speeds desired by the DVR user, including speeds faster than normal playback speeds. For example, a DVR user may desire that the DVR play a video stream at three times the normal playback speed. However, even assuming that the DVR has the processing power required to decode the video stream pictures at three times the normal playback speed, the DVR-driven player cannot play the pictures at three times the normal playback speed. For example, an NTSC-compatible player can play up to 35 images per second. In this case, the DVR may waste the DVR's processing resources by decoding certain pictures that cannot be displayed in playback anyway.

Because DVRs store programs in encoded form (to conserve storage space), and because the pictures of these video programs require time and processing resources to decode, when the desired playback speed is faster than normal, it is often not possible for a DVR (which has limited processing power) to decode and play all temporally adjacent pictures of a program. Sometimes, certain pictures must be skipped during playback to maintain the desired playback speed.

For example, when a DVR user desires to view a program at twice the normal playback speed, the DVR's limited resources do not allow the DVR to decode twice as many pictures in the same amount of time as would be required to decode half of the pictures at normal playback speed. Based on the DVR's resources, the DVR is forced to skip certain pictures. According to one approach, if the data stream is an MPEG-2 data stream, the DVR may simply skip decoding and playing of some or all of the B-pictures in the data stream, since the B-pictures are not reference pictures. However, sometimes when this operation is performed, the quality of the viewing experience can be significantly degraded. When skipping the image of a program in playback, the program may appear jittery. The images played may appear to be disjointed and have no relationship to each other. There is little or no visual transition between the images being played.

Ideally, in order to maintain "fluid" viewing quality to the greatest extent possible while meeting playback speed requirements, the next image to be played in playback should be as close in time (meaning presentation order) to the currently playing image as possible within the limits of playback speed. Some pictures take longer to decode than others. As described above, under the MPEG-4 standard, 16 decoded frames of reference pictures (or 32 decoded fields of reference pictures) can be retained in a frame buffer so that blocks of other pictures can refer to the decoded frames. A particular picture that contains blocks that refer back to a large number of other pictures can be computationally expensive to decode, since in this case the DVR needs to ensure that each of the other pictures is decoded first-if the decoding of the particular picture is not skipped, then the decoding of the other pictures to which the blocks of the particular picture refer cannot be skipped. Calculating the cost of image decoding can help ensure that the "best" image is selected for decoding given user-specified playback speed requirements. Pictures that are too costly to decode at the required playback speed can be excluded from the decoding process faster than normal playback.

Certain embodiments of the present invention take into account the decoding cost of pictures when selecting which pictures of a program to decode and play. According to one embodiment of the invention, for each of a plurality of pictures in a data stream representing the program, a cost associated with decoding the picture is determined. Selecting a particular image from the plurality of images based on costs associated with images of the plurality of images. The specific picture is decoded and played.

In one embodiment of the invention, the DVR constructs a dependency graph that indicates that: for each particular picture in the data stream, the DVR needs to decode the minimum set of other pictures in order to decode that particular picture. The DVR can use the information indicated in the dependency graph to calculate the cost required to decode any picture in the data stream. Thus, in one embodiment of the invention, the DVR constructs a dependency graph for the data stream and then selects pictures to decode and play based on both the information in the dependency graph and the playback speed currently specified by the DVR user.

In further aspects, the invention includes a computer apparatus and a computer readable medium configured to execute the foregoing steps.

2.0DVR overview

FIG. 3 is a block diagram illustrating an example of the internal structure and operation of a DVR, according to one embodiment of the invention. The internal structure and operation of a DVR is also described in U.S. patent No. 6,233,389, which is incorporated by reference as if originally disclosed.

The DVR shown in fig. 3 comprises an input module 301, a media converter 302 and an output module 303. Input module 301 receives various forms of Television (TV) input streams. For example, the television input stream received by input module 301 may be a National Television Standards Committee (NTSC) compliant signal or a Phase Alternating Line (PAL) compliant broadcast signal. As another example, the television input stream received by input module 301 may be in digital form, such as a Digital Satellite System (DSS) compatible signal, a Digital Broadcast Service (DBS) compatible signal, or an Advanced Television Standards Committee (ATSC) system compatible signal. DBS, DSS, and ATSC are based on standards known as moving Picture experts group 2(MPEG-2) and MPEG-2 transport. MPEG-2 transmission is a standard for formatting a digital data stream from a television source transmitter so that a television receiver can parse the input stream to find a program in a multiplexed signal. According to one embodiment of the invention, the input module 301 generates an MPEG stream. According to another embodiment of the invention, the input module 301 generates streams encoded with different codecs.

MPEG-2 transport multiplexing supports multiple programs in the same broadcast channel, which multiplexes video and audio inputs as well as private data. The input module 301 tunes the channel to a particular program, extracts a specified MPEG stream from the channel, and inputs the MPEG stream to the rest of the system. Analog television signals are encoded into a similar MPEG format using separate video and audio encoders so that the rest of the system is unaware of how the signal was obtained. Information can be modulated into the Vertical Blanking Interval (VBI) of an analog television signal in a number of standard ways; for example, the north american teletext standard (NABTS) is used to modulate information onto a particular line of an NTSC signal, and the united states Federal Communications Commission (FCC) requires that certain other lines be used for Closed Captioning (CC) and Extended Data Services (EDS). These signals are decoded by input module 301 and passed to other modules as if the signals were passed over an MPEG-2 dedicated data channel.

The media converter 302 is between the microprocessor CPU 306, the hard disk or storage device 305 and the memory 304. The input stream is converted to an MPEG stream and sent to the media converter 302. The media converter 302 buffers the MPEG stream into the memory 304. If the DVR user is watching real-time television, the media converter 302 performs two operations: the media converter 302 sends the MPEG stream to the output module 303 and simultaneously writes the MPEG stream to the hard disk or storage device 305.

Output module 303 receives the MPEG stream as input and generates an analog television signal according to NTSC, PAL, or other television standards. The output module 303 includes an MPEG decoder, an On Screen Display (OSD) generator, an analog television encoder, and audio logic. The OSD generator allows the program logic to provide an image that can be superimposed on top of the generated television analog signal. In addition, output module 303 can modulate information provided by the program logic onto the VBI of the output signal in a variety of standard formats including NABTS, CC and EDS.

3.0 performing trick functions in a digital video recorder with efficient utilization of resources

3.1 dependency graph

According to one embodiment of the invention, the DVR constructs a dependency graph for pictures in the data stream based on information contained in the headers of the pictures in the data stream. For example, the DVR may determine, from a header of a particular picture in the data stream, a specified set of other pictures that the particular picture block refers to. The DVR may obtain information from, for example, a picture header in an MPEG-4 data stream. If the data stream is an MPEG-4 data stream, a particular tile may reference up to 16 other frame pictures (or up to 32 other field pictures) in the data stream.

In one embodiment of the invention, a DVR receives a data stream representing pictures in decoding order. When the DVR receives the data stream, the DVR examines the picture header (header) in the data stream and creates and stores a dependency graph based on dependency information contained in the picture header. For a particular picture in each data stream, the particular picture header identifies other pictures to which the block of the particular picture refers (and thus on which the block depends). In one embodiment of the invention, a DVR records a data stream from a source (e.g., satellite, cable, etc.) and creates a dependency graph when the data stream is recorded. Thus, by the time the DVR has recorded the entire data stream, the DVR will completely construct a dependency graph for the data stream. This does not mean that the dependency graph cannot be used until it is completed; in one embodiment of the invention, at any time while the DVR records a received data stream, the DVR is able to utilize the partially constructed dependency graph to more efficiently perform certain operations.

Fig. 1 shows an example of a representation of a series of images in a partial data stream. The pictures represented in fig. 1 by the letters indicating the picture type (I, P or B) appear in the data stream in decoding order rather than presentation order. The numbers in parentheses of each image indicate the position of the image in the presentation order. In decoding order, the pictures are: i (2), B (0), B (1), P (5), B (3), B (4), P (8), B (6) and B (7). In presentation order, the images are: b (0), B (1), I (2), B (3), B (4), P (5), B (6), B (7) and P (8). As shown in fig. 1, B (0) refers to I (2), B (1) refers to B (0) and I (2), P (5) refers to B (1), B (3) refers to I (2) and P (5), B (4) refers to B (3), P (8) refers to B (3), B (6) refers to P (8) and B (4), and B (7) refers to B (6). The portion of the data stream shown is assumed to conform to a coding standard that allows B-pictures to be reference pictures, although some coding standards do not allow B-pictures to be reference pictures.

Fig. 2 shows an example of a dependency graph that a DVR constructs based on the series of pictures shown in fig. 1. The DVR is started to add I (2) to the dependency graph. The DVR determines that B (0) refers to I (2), and thus adds B (0) to the dependency graph and the connections from B (0) to I (2) to the dependency graph. The DVR determines that B (1) refers to B (0) and I (2). The DVR adds B (1) to the dependency graph and adds the connection of B (1) to B (0) to the dependency graph. Since B (1) is already connected to B (0), and B (0) is already connected to I (2), the DVR does not need to add a direct connection between B (1) and I (2). The DVR determines that P (5) references B (1) and thus adds P (5) to the dependency graph and the connection from P (5) to B (1) to the dependency graph. The DVR determines that B (3) refers to I (2) and P (5). The DVR adds B (3) to the dependency graph and adds the connections from B (3) to P (5) to the dependency graph. Since B (3) is already connected to P (5), and since P (5) is already directly connected to I (2) through the link chain from P (5) to B (1) to B (0) to I (2), the DVR need not add a direct connection between B (3) and I (2). The DVR determines that B (4) refers to B (3) and thus adds B (4) to the dependency graph and adds the connections from B (4) to B (3) to the dependency graph. The DVR determines that P (8) also references B (3), and thus adds P (8) to the dependency graph and the connection from P (8) to B (3) to the dependency graph. The DVR determines that B (6) references P (8) and B (4) and thereby adds B (6) to the dependency graph and adds one connection between B (6) and P (8) and another connection between B (6) and B (4) to the dependency graph. The DVR determines that B (7) refers to B (6) and P (8). The DVR adds B (7) to the dependency graph and adds the connections from B (7) to B (6) to the dependency graph. Since B (7) is already connected to B (6), and B (6) is already connected to P (8), the DVR does not need to add a direct connection between B (7) and P (8).

It can be deduced from the above example that in one embodiment of the invention, whenever a DVR is adding a picture to a dependency graph, the DVR first checks whether two pictures have been indirectly connected through other existing connection chains in the dependency graph before adding (based on reference information in the picture header) a connection between the two pictures in the dependency graph. If the DVR determines that two pictures are already connected in this manner, the DVR need not add a direct connection between the two pictures to the dependency graph. Alternatively, if the DVR determines that two pictures have not been connected in the manner described above, the DVR adds a direct connection between the two pictures (assuming that at least one of the picture header information indicates that such a connection should exist) to the dependency graph.

After the DVR has constructed the dependency graph shown in fig. 2, the DVR can determine, for example, that the DVR needs to decode the following pictures before the DVR has completely decoded and presented B (4): i (2), B (0), B (1), P (5) and B (3). By determining that the decoding of B (4) also requires decoding of the other three pictures, the DVR begins to estimate the cost of decoding B (4). The DVR can take advantage of this cost and the costs associated with other pictures in the dependency graph when selecting the next picture to decode and play while operating in a "trick play" mode.

In determining the cost of decoding a particular picture, the DVR may utilize the picture cache to determine which other pictures that the particular picture depends on have already been decoded. The DVR reduces the estimated cost of decoding the particular picture if one or more other pictures on which the particular picture depends are already in decoded form in the picture cache. For example, although decoding of B (4) requires decoding of I (2), B (0), B (1), P (5), and B (3) in the above example, if I (2), B (0), B (1), P (5), and B (3) have already been decoded and are currently located in the DVR picture buffer, the cost of decoding B (4) may be as little as the cost of decoding B (4) itself.

In an alternative embodiment of the invention, each image is subdivided into two or more regions, and instead of reflecting dependencies from one complete image to another complete image, the dependency graph reflects dependencies from one image region to another image region; each node of the dependency graph may represent a region of an image rather than the entire image. For example, in one embodiment of the invention, each node of the dependency graph may represent half of the image (e.g., left or right). Thus, in one embodiment of the invention, the dependency graph may indicate dependencies at a finer granularity than the entire image. In such embodiments of the invention, the DVR may determine the cost of decoding a particular picture based, at least in part, on the cost of decoding the region into which the particular picture is subdivided.

Once the DVR builds the dependency graph, the DVR can utilize the dependency graph to improve the performance of various operations that a user may command the DVR to perform. One such operation is "random access," in which a user specifies a location in a data stream (e.g., through a timeline) that the user wants to view immediately; the user may instruct the DVR to "jump" to a specified location in the data stream and begin presenting the program represented by the data stream at the specified location. Another such operation (or class of operations) is "trick play," in which a user indicates a multiplier (e.g., two times faster, three times faster, etc. forward or reverse). In performing such an operation, the DVR presents pictures from the data stream in fast forward or rewind at the user-specified faster speed, when the user specifies.

3.2 random Access operation

For example, the user may instruct the DVR to begin presenting the program at a time position corresponding to B (3) (the picture header in the data stream may indicate a time stamp (time stamp) indicating the time at which the pictures are expected to be presented relative to each other). Under such conditions, B (3) is the "target" image. This is a random access operation. Thus, the DVR can determine from the dependency graph that the cost of decoding B (3) is based on the fact that: to decode B (3), the DVR needs to first decode I (2), B (0), B (1), and P (5) (and also B (3) itself) -assuming no such pictures have been decoded and are currently located in the DVR picture buffer (typically, when performing a random access operation, the temporal location to which the DVR user wants to "jump" is far enough away from the current location that pictures in the DVR picture buffer that have been decoded are not useful in performing the operation). The greater the number of other pictures that need to be decoded before a particular picture (e.g., B (3)) is decoded, the higher the estimated cost of decoding that particular picture will typically have.

Thus, in one embodiment of the invention, the initial estimated cost for decoding a particular picture (or region) is equal to the total number of other pictures (or regions) that the DVR needs to decode in order to decode the particular picture (or region). If any of the other pictures on which B (3) depends are already in the DVR picture cache, then the DVR may take advantage of this fact to reduce the estimated cost of decoding B (3). For example, if P (5) is already decoded and currently located in the DVR picture cache, the DVR may determine that the cost of decoding only B (3) itself may impact the total cost of decoding B (3) (if P (5) is already located in the picture cache, then since I (2), B (0), or B (1) is already used to decode P (5), even if B (3) also depends on these pictures, there is no need to decode any of I (2), B (0), or B (1).

After the DVR has determined the decoding cost of B (3), e.g., B (3) is a picture that the DVR user would like to "skip" to in a random access operation, the DVR may also determine the decoding cost of other pictures that are temporally close to B (3) in the presentation order to decode in a similar manner. For example, the DVR may also determine the decoding cost of decoding I (2) and B (4), both I (2) and B (4) being close to B (3) in the presentation order. The DVR may determine that the cost of decoding I (2) is significantly lower than the cost of decoding B (3), because the decoding of I (2) does not require the decoding of any other pictures (I (2) is not dependent on any other pictures).

For each picture in the set of pictures that is temporally close (in presentation order) to the "target" picture (e.g., B (3)), the DVR may determine a weighted cost for the picture. The weighted cost is based on (a) a decoding cost of the picture and (b) a temporal distance (in presentation order) from a target picture. Images that are very close in time to the target image have more weight than images that are not close in time-therefore, the target image is more important than all other images. Regardless of the more weights, the cost of weighting of the target image still exceeds the cost of weighting of other temporally close images due to the potentially higher decoding cost of the target image. In one embodiment of the invention, to determine the weighted cost for a given picture, the DVR multiplies the decoding cost for that picture by a value that is based on the temporal distance (e.g., in units of time) of that picture from the target picture. For example, to calculate a weighted cost for a picture that is 2 seconds from the target picture, the DVR multiplies the decoding cost for that picture by 2. In this embodiment, the weighted cost of the target picture may be set equal to the decoding cost of the target picture (to avoid multiplication by 0).

In various embodiments of the present invention, the decoding cost of an image may have a different degree of impact on the weighted cost of the image relative to the temporal distance of the image from the target image; in one embodiment of the invention said decoding cost affects the weighted cost to a greater extent than said temporal distance, and in another embodiment of the invention said decoding cost affects the weighted cost to a lesser extent than the temporal distance.

Based on the weighted cost associated with each picture under consideration (which is a selected subset of all pictures, e.g., the set of 10 (or some other specified number) pictures that are temporally closest to the target picture, both earlier and later in presentation order), the DVR selects one of the pictures based on the weighted cost of that picture. For example, the DVR may select the picture associated with the lowest weighted cost. The image may or may not be the target image. These pictures may or may not be the pictures with the lowest decoding cost. After selecting the picture, in one embodiment, the DVR decodes the selected picture and any other pictures that the DVR needs to decode in order to decode the selected picture. The DVR then presents the decoded selected picture and proceeds to present the content of the data stream from that point in the data stream, thus completing the random access operation.

By selecting the picture with the lowest weighted cost, the DVR can reduce the amount of time that the user must wait for the DVR to jump to the desired location in the data stream, while also jumping to a point that is not actually distinguishable from the point actually specified by the user because it is very close in time to the point actually specified by the user. The techniques described above may be contrasted with the method in which a DVR typically jumps to a target picture; this approach sometimes results in an intolerably long delay when the DVR decodes a large number of pictures. The techniques described above may also be contrasted with the approach in which a DVR typically skips to an I-picture that is closest in decoding order to the target picture in a data stream; this approach can sometimes cause the DVR to jump to a location in the presentation order that has a longer distance from the location that the user actually desires to jump to.

3.3 trick Play operation

As described above, the DVR can also use the dependency graph to select pictures that should be decoded and played in "trick play" mode operation. There are 4 conventional types of trick play operations: (1) playing the data stream forward at a speed faster than a normal playback speed; (2) playing back the data stream at normal speed or faster than normal playback speed; (3) playing the data stream forward at a slower than normal playback speed; and (4) playing back the data stream at a slower than normal playback speed.

No special consideration is needed when the DVR plays the data stream forward at a slower than normal playback speed. Assuming that the DVR can decode and play all pictures in the data stream at normal playback speed, the DVR should also be capable of decoding and playing all such pictures at any speed slower than normal playback speed. Thus, when performing such trick play operations, the DVR can actually decode and play each picture in the data stream. In such a case, the DVR need not skip decoding or presentation of any pictures.

However, according to one embodiment of the invention, when the DVR plays the data stream forward or backward at a speed faster than the normal playback speed, the DVR may not decode and present each picture (due to processing limitations) at the speed specified by the DVR user. For example, a DVR user may desire that the DVR present the program represented by the data stream at three times the normal playback speed, but the DVR may not be capable of decoding pictures at three times the normal playback speed (e.g., the DVR may be capable of decoding pictures at half that speed). Thus, the DVR needs to select, from among the pictures in the data stream, a subset of the pictures that the DVR decodes and presents to the user. According to one embodiment of the invention, the DVR uses the information in the dependency graph to better select the pictures to be decoded and presented to the user, to ensure temporal presentation consistency as much as possible while also meeting the user's playback speed requirements.

According to one embodiment of the invention, when the user-specified playback speed is greater than the speed at which the DVR is capable of decoding pictures, the DVR decodes pictures at the speed at which the pictures can be decoded while skipping decoding and presentation of at least some of the pictures in the data stream to maintain the user-specified playback speed. To provide a "smooth" viewing experience for DVR users (by not being able to wait too long between presentations of different pictures), DVRs select, decode, and present pictures based at least in part on the decoding costs of the pictures. The decoding cost and its estimation have been discussed above in the context of random access operations.

In one embodiment of the invention, in the faster-than-normal fast-forward playback mode, at a user-specified speed that is faster than the speed at which the DVR can decode pictures, the DVR divides the user-specified playback speed (e.g., 60 pictures per second) by the fastest speed at which the DVR can actually decode pictures (e.g., 7.5 pictures per second). The quotient is taken as "skipped digits". The DVR locates a picture that is many seconds ahead in time of the current picture (in presentation order), and if the result is not an integer, approximates the nearest integer; for example, if the current picture is at a position of 3 seconds in the presentation direction, and if the skip number is 8 (i.e., 60/7.5), the DVR positions the picture at a position of 13 seconds (i.e., 3+8) in the presentation direction. In some embodiments of the invention, the skip number may be selected to be some specified amount greater than the quotient discussed above to compensate for the time that the DVR takes to locate (locate) and decode the appropriate picture.

For example, the image at the 13-second position is B (97). This makes B (97) a "target" picture-a picture that the DVR should decode and present if the DVR is able to be sufficiently fast. However, if B (97) has a very high decoding cost (e.g., because B (97) depends on a large number of other pictures), the DVR may not be able to maintain the user-specified speed if the DVR actually decodes and presents B (97). The DVR calculates a decoding cost for B (97) and determines whether the decoding cost is below a specified threshold. The specified threshold is a value selected based on, for example, information about how fast the DVR is capable of decoding a single picture; this value depends on the DVR hardware and structure and varies from DVR to DVR. If the decoding cost is below a specified threshold, this indicates that the DVR is capable of decoding and presenting B at a sufficiently fast speed to maintain the user-specified playback speed (97). In such a case, the DVR decodes B (97) (and any pictures that need to be decoded in order to decode B (97), as shown in the dependency graph) and presents B (97) to the user before performing the above-described process again, when B (97) is the current picture.

Alternatively, if the decoding cost is not below the specified threshold, this indicates that the DVR is not capable of decoding and presenting B (97) sufficiently fast to guarantee the user-specified playback speed. In such a case, the DVR looks up another picture (a) that can be decoded at a speed sufficiently fast to ensure that the user specifies the playback speed and (B) that the picture is as close in time (in presentation order) as possible to B (97), the "target" picture in this example.

In one embodiment of the invention, to find the picture, the DVR forms a collection of other pictures (e.g., 16 pictures or some other specified number of pictures) that are temporally close to the target picture (B (97) in this example) in presentation order (before and after the target picture). Using the dependency graph, the DVR calculates a decoding cost for each of the other pictures. If at least one other picture has a decoding cost lower than the particular threshold, the DVR selects, from among the pictures whose decoding cost is lower than the particular threshold, the picture that is temporally closest (in presentation order) to the target picture. In this case, the DVR decodes the selected picture (and any other pictures that need to be decoded in order to decode the selected picture) and presents the selected picture to the user, before performing the above process again, when the target picture is B (97) (not the selected picture, to prevent adding temporal "sliding" in playback) as the new current picture.

Alternatively, if no other pictures have a decoding cost below the specified threshold, then in one embodiment of the invention, the DVR only presents the current picture before performing the above process again, with the target picture (e.g., B (97), which in this case is not decoded and presented) as the new current picture. In this case, the DVR cannot locate any suitable picture to decode and present while maintaining the user-specified playback speed, and thus the DVR simply presents the same picture again before proceeding. Ideally, this situation is generally avoided as much as possible, since it reduces the "fluency" of the presentation and ultimately leads to a larger time interval between the presented images.

In at least some embodiments of the invention, whenever the DVR calculates the decoding cost for any picture, the DVR reduces the decoding cost for a significant number of the pictures that the DVR depends on if one or more of the pictures are already present in the DVR's picture cache in decoded form. In this case, the DVR does not need much time to decode the picture, and therefore the decoding cost of the picture is reduced to a degree based on the time saved due to the presence of other pictures in the DVR's picture cache.

In one embodiment of the invention, in a mode faster than regular backward playback, at a user-specified speed faster than the speed at which the DVR can decode pictures, the DVR utilizes techniques similar to the mode faster than regular forward playback described above, except that instead of locating the target picture that temporally precedes the current picture in presentation order, the DVR locates the target picture that temporally follows the current picture in presentation order based on the skipped digits.

In one embodiment of the invention, if the DVR is capable of decoding pictures at a user-specified playback speed, regardless of whether the user-specified playback speed is greater than the normal playback speed, the DVR decodes and plays all pictures in the data stream. Thus, in one embodiment of the invention, in the event that the user-specified playback speed is greater than the DVR maximum picture decoding speed, the DVR skips decoding and rendering only certain pictures.

4.0 implementation mechanisms-hardware overview

FIG. 4 illustrates a block diagram of a computer system 400 upon which an embodiment of the invention may execute. Computer system 400 includes a bus 402 or other communication mechanism for communicating information, and a processor 404 coupled with bus 402 for processing information. Computer system 400 also includes a main memory 406, such as a random access memory ("RAM") or other dynamic storage device, coupled to bus 402 for storing information and instructions to be executed by processor 404. Main memory 406 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 404. Computer system 400 also includes a read only memory ("ROM") 408 or other static storage device coupled to bus 402 for storing static information and instructions for processor 404. A storage device 410, such as a magnetic disk or optical disk, is provided and coupled to bus 402 for storing information and instructions.

Computer system 400 may be coupled via bus 402 to a display 412, such as a cathode ray tube ("CRT"), for displaying information to a computer user. An input device 414, including alphanumeric and other keys, is coupled to bus 402 for communicating information and command selections to processor 404. Another type of user input device is cursor control 416, such as a mouse, a trackball, a stylus, or cursor direction keys (cursor direction keys) for communicating direction information and command selections to processor 404 and for controlling cursor movement on display 412. The input device typically has degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.

The present invention is directed to the use of computer system 400 for selecting frames of a multi-frame video program to be played based on a selected DVR trick play mode. According to one embodiment of the invention, selecting a frame of a multi-frame video program to play in accordance with a selected DVR trick play mode is provided by computer system 400 in response to processor 404 executing one or more sequences of one or more instructions contained in main memory 406. Such instructions may be read into main memory 406 from another computer-readable medium, such as storage device 410. Execution of the sequences of instructions contained in main memory 406 causes processor 404 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the invention. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and software.

The term "computer-readable medium" as used herein refers to any medium that participates in providing instructions to processor 404 for execution. Such a medium may take many forms, including but not limited to, non-volatile media, and transmission media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 410. Volatile media includes dynamic memory, such as main memory 406. Transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 402. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infrared data transmissions.

Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or any other type of magnetic medium, a CD-ROM, any other optical medium, punch cards (punchcard), paper tape (papertape), any other physical medium with patterns of holes, a RAM, a PROM, and EPROM, a FLASH-EPROM, any other memory chip or cartridge (cartridge), a carrier wave as described hereinafter, or any other medium from which a computer can read.

Various forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to processor 404 for execution. For example, the instructions may initially be carried on a magnetic disk of a remote computer. The remote computer can download the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 400 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal. An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 402. Bus 402 carries the data to main memory 406, from which main memory 406 processor 404 can retrieve and execute the instructions. The instructions received by main memory 406 may optionally be stored on storage device 410 either before or after execution by processor 404.

Computer system 400 also includes a communication interface 418 coupled to bus 402. Communication interface 418 provides a two-way data communication coupling to a network link 420, network link 420 connects to a local network 422. For example, communication interface 418 may be an integrated services digital network ("ISDN") card or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 418 may be a local area network ("LAN") card to provide a data communication connection to a compatible LAN. Wireless links may also be used. In any such implementation, communication interface 418 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.

Network link 420 typically provides data communication through one or more networks to other data devices. For example, network link 420 may provide a connection through local network 422 to a host computer 424 or to digital equipment operated by an Internet service provider ("ISP") 426. ISP 426 in turn provides data communication services through the world wide packet data communication network now commonly referred to as the "Internet" 428. Local network 422 and internet 428 both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 420 and through communication interface 418, which carry the digital data to and from computer system 400, are exemplary forms of carrier waves transporting the information.

Computer system 400 can send messages and receive data, including program code, through the network(s), network link 420 and communication interface 418. In the Internet example, a server 430 might transmit a requested code for an application program through Internet 428, ISP 426, local network 422 and communication interface 418. In accordance with the present invention, one such downloaded application provides for selecting frames of a multi-frame video program to be played in accordance with a selected DVR trick play mode as described herein.

The received code may be executed by processor 404 as it is received, and/or stored in storage device 410, or other non-volatile storage for later execution. In this manner, computer system 400 may obtain application program code in the form of a carrier wave.

4.0 extensions and alternatives

In the foregoing specification, the invention has been described with reference to specific embodiments. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Claims

1. A method for selecting pictures of a multi-picture video program for playback, the method comprising:

determining a cost associated with decoding the picture for each picture of a plurality of pictures representing the program in a data stream;

selecting a particular image from the plurality of images based at least in part on costs associated with images of the plurality of images;

wherein the specific image is an image that is not a target image to which the user wants to jump the digital video recorder DVR; and

outputting the specific image for playing instead of the target image in response to a request from a user to jump to the target image;

wherein selecting the particular picture comprises selecting the picture with the lowest weighted cost that is not the picture with the lowest decoding cost, wherein the weighted cost is determined based on both the decoding cost of the picture and the temporal distance from the target picture.

2. The method of claim 1, further comprising:

receiving at least a portion of the data stream at the DVR;

encoding each image of the plurality of images based on a codec; and

a dependency graph is generated and stored on the DVR that represents which of the plurality of pictures depend on which other of the plurality of pictures.

3. The method of claim 1, wherein the step of determining a cost for each image of the plurality of images comprises:

a minimum number of other images that need to be decoded before each individual image of the plurality of images can be played is determined for that individual image.

4. The method of claim 1, wherein the step of determining a cost for each image of the plurality of images comprises performing the following steps for each individual image of the plurality of images:

determining a set of one or more other images on which the individual image depends to represent a complete frame; and

it is determined how many pictures in the picture set have been decoded and have been presented in the DVR picture cache.

5. The method of claim 1, wherein each picture of the plurality of pictures is at least one of an I-picture, a P-picture, or a B-picture.

6. The method of claim 1, wherein selecting the particular picture based on a cost associated with an access unit of the plurality of access units comprises:

when the digital video recorder is operating in the designated mode, selecting a particular image from the plurality of images based at least in part on whether a cost associated with an image in the plurality of images exceeds a designated value, wherein the designated value corresponds to a maximum amount of time allowed for a time interval during which a different image of the program is played.

7. An apparatus for selecting pictures of a multi-picture video program for playback, the apparatus comprising:

means for determining a cost associated with decoding each picture of a plurality of pictures in a data stream representing a program;

means for selecting a particular image from the plurality of images based at least in part on costs associated with images of the plurality of images;

wherein the particular image is an image that is not a target image to which the user wants to jump the apparatus; and

a module for outputting the specific image instead of the target image for playing in response to a request from a user to jump to the target image;

wherein the module that selects the particular picture is configured to select the picture with the lowest weighted cost that is not the picture with the lowest decoding cost, wherein the weighted cost is determined based on both the decoding cost of the picture and the temporal distance from the target picture.

8. The apparatus of claim 7, further comprising:

a module that receives at least a portion of the data stream;

a module for encoding each of the plurality of images based on a codec; and

a module that generates and stores on a device a dependency graph representing which images of the plurality of images depend on which other images of the plurality of images.

9. The apparatus of claim 7, further comprising:

means for determining, for each individual image of the plurality of images, a minimum amount of other images that need to be decoded before the image corresponding to the individual image is played.

10. The apparatus of claim 7, further comprising: for each individual image of the plurality of images,

a module for determining a set of one or more other images on which each individual image depends to represent a complete frame; and

means for determining how many pictures in the set of pictures have been decoded and have been presented in a picture buffer of a device.

11. The apparatus of claim 7, wherein each picture of the plurality of pictures is at least one of an I-picture, a P-picture, or a B-picture.

12. The apparatus of claim 7, further comprising:

means for selecting a particular picture from the plurality of pictures based at least in part on whether a cost associated with a picture in the plurality of pictures exceeds a specified value when the digital video recorder is operating in a specified mode, wherein the specified value corresponds to a maximum amount of time allowed for a time interval during which different pictures of the program are played.