HK1188353A - Interactive encoded content system including object models for viewing on a remote device - Google Patents
- Publication number: HK1188353A (application HK14101270.3A)
- Authority: HK (Hong Kong)
Description
U.S. Patent Application entitled "MPEG Objects and Systems and Methods for Using MPEG Objects," assigned to the same assignee and filed contemporaneously herewith on January 11, 2008, is related generally to the subject matter of the present application and is incorporated herein by reference in its entirety and attached as Appendix B.
The present application claims priority from U.S. provisional applications serial no. 60/884,773, filed January 12, 2007, serial no. 60/884,744, filed January 12, 2007, and serial no. 60/884,772, filed January 12, 2007, the full disclosures of which are hereby incorporated herein by reference.
The present invention relates to systems and methods for providing interactive content to a remote device and more specifically to systems and methods wherein an object model is associated with pre-encoded video content.
In cable television systems, the cable headend transmits content to one or more subscribers in an encoded form. Typically, the content is encoded as digital MPEG video, and each subscriber has a set-top box or cable card capable of decoding the MPEG video stream. Beyond providing linear content, cable providers can now provide interactive content, such as web pages or walled-garden content. As the Internet has become more dynamic, with web pages including video content that requires applications or scripts for decoding, cable providers have adapted to allow subscribers to view these dynamic web pages. In order to composite a dynamic web page for transmission to a requesting subscriber in encoded form, the cable headend retrieves the requested web page and renders it. Thus, the cable headend must first decode any encoded content that appears within the dynamic web page. For example, if a video is to be played on the web page, the headend must retrieve the encoded video and decode each frame of the video. The cable headend then renders each frame to form a sequence of bitmap images of the Internet web page. Thus, the web page can only be composited together if all of the content that forms the web page is first decoded. Once the composite frames are complete, the composited video is sent to an encoder, such as an MPEG encoder, to be re-encoded. The compressed MPEG video frames are then sent in an MPEG video stream to the user's set-top box.
Creating such composite encoded video frames in a cable television network requires intensive CPU and memory processing, since all encoded content must first be decoded, then composited, rendered, and re-encoded. In particular, the cable headend must decode and re-encode all of the content in real time. Thus, allowing users to operate in an interactive environment with dynamic web pages is quite costly to cable operators because of the required processing. Such systems have the additional drawback that image quality is degraded by re-encoding the encoded video.
Embodiments of the invention disclose a system for encoding at least one composite encoded video frame for display on a display device. The system includes a markup language-based graphical layout, the graphical layout including frame locations within the composite frame for at least a first encoded source and a second encoded source. Additionally, the system has a stitcher module for stitching together the first encoded source and the second encoded source according to the frame locations of the graphical layout. The stitcher forms an encoded frame without having to decode the block-based transform encoded data for at least the first source. The encoded video may be encoded using one of the MPEG standards, AVS, VC-1, or another block-based encoding protocol.
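The layout-driven stitching described above can be sketched as follows. This is an illustrative model, not the patent's implementation: each encoded slice is treated as an opaque byte string that the stitcher copies into its layout position without ever decoding it; the function and data names are assumptions for the sketch.

```python
# Sketch: stitch two pre-encoded sources into one composite frame
# according to a graphical layout, without decoding the slice data.

def stitch_frame(layout, sources):
    """layout: list of (source_name, first_row, last_row) entries.
    sources: dict mapping source_name -> list of encoded slice rows.
    Returns the composite frame as an ordered list of encoded slices."""
    frame = {}
    for name, first_row, last_row in layout:
        rows = sources[name]
        for i, row in enumerate(range(first_row, last_row + 1)):
            frame[row] = rows[i]  # opaque copy: no decode/re-encode step
    return [frame[r] for r in sorted(frame)]

# Hypothetical layout: a background occupies rows 0-2, a video rows 3-5.
layout = [("background", 0, 2), ("video", 3, 5)]
sources = {
    "background": [b"bg-slice-0", b"bg-slice-1", b"bg-slice-2"],
    "video": [b"vid-slice-0", b"vid-slice-1", b"vid-slice-2"],
}
composite = stitch_frame(layout, sources)
```

The key property illustrated is that the stitcher only rearranges already-encoded data, which is why the costly decode/render/re-encode cycle of the prior art is avoided.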
In certain embodiments of the invention, the system allows a user to interact with graphical elements on a display device. The processor maintains state information about one or more graphical elements identified in the graphical layout. The graphical elements in the graphical layout are associated with one of the encoded sources. A user transmits a request to change state of one of the graphical elements through a client device in communication with the system. The request for the change in state causes the processor to register the change in state and to obtain a new encoded source. The processor causes the stitcher to stitch the new encoded source in place of the encoded source representing the graphic element. The processor may also execute or interpret computer code associated with the graphic element.
For example, the graphic element may be a button object that has a plurality of states, associated encoded content for each state, and methods associated with each of the states. The system may also include a transmitter for transmitting to the client device the composited video content. The client device can then decode the composited video content and cause the composited video content to be displayed on a display device. In certain embodiments each graphical element within the graphical layout is associated with one or more encoded MPEG video frames or portions of a video frame, such as one or more macroblocks or slices. The compositor may use a single graphical element repeatedly within the MPEG video stream. For example, the button may be only a single video frame in one state and a single video frame in another state, and the button may be composited together with MPEG encoded video content wherein the encoded macroblocks representing the button are stitched into the MPEG encoded video content in each frame.
Other embodiments of the invention disclose a system for creating one or more composite MPEG video frames forming an MPEG video stream. The MPEG video stream is provided to a client device that includes an MPEG decoder. The client device decodes the MPEG video stream and outputs the video to a display device. The composite MPEG video frames are created by obtaining a graphical layout for a video frame. The graphical layout includes frame locations within the composite MPEG video frame for at least a first MPEG source and a second MPEG source. Based upon the graphical layout the first and second MPEG sources are obtained. The first and second MPEG sources are provided to a stitcher module. The stitcher module stitches together the first MPEG source and the second MPEG source according to the frame locations of the graphical layout to form an MPEG frame without having to decode the macroblock data of the MPEG sources.
In certain embodiments, the MPEG sources are only decoded to the slice layer and a processor maintains the positions of the slices within the frame for the first and second MPEG sources. This process is repeated for each frame of MPEG data in order to form an MPEG video stream.
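The per-frame repetition noted above can be sketched as a simple streaming loop. This is a hedged model only: frames are represented as lists of opaque encoded slices, and the names are illustrative rather than taken from the patent.

```python
# Sketch: apply layout-ordered stitching to every frame of the groomed
# sources in turn, yielding one composite frame per input frame.

def stitch_stream(layout, source_streams):
    """source_streams: dict name -> list of frames (each a list of slices).
    Yields one composite frame per input frame, with slices concatenated
    in layout order and never decoded."""
    n_frames = min(len(frames) for frames in source_streams.values())
    for i in range(n_frames):
        frame = []
        for name in layout:  # the layout fixes the slice/row ordering
            frame.extend(source_streams[name][i])
        yield frame

# Hypothetical two-frame streams: a static background and a video element.
streams = {"bg": [[b"b0"], [b"b0"]], "vid": [[b"v0"], [b"v1"]]}
out = list(stitch_stream(["bg", "vid"], streams))
```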
In certain embodiments, the system includes a groomer. The groomer grooms the MPEG sources so that each MPEG element of the MPEG source is converted to an MPEG P-frame format. The groomer module may also identify any macroblocks in the second MPEG source that include motion vectors that reference other macroblocks in a section of the first MPEG source and re-encodes those macroblocks as intracoded macroblocks.
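The motion-vector rule in the grooming step above can be illustrated with a small sketch. This is an assumption-laden model, not MPEG syntax parsing: macroblocks are plain dictionaries, and the region check stands in for the real boundary analysis the groomer would perform on the bitstream.

```python
# Sketch: macroblocks whose motion vectors reference data outside their
# own element's region are re-encoded as intracoded macroblocks, so each
# element can be stitched next to arbitrary neighbors.

def groom_macroblocks(macroblocks, region):
    """region: set of (x, y) macroblock coordinates owned by this element.
    A motion vector (dx, dy) that references a macroblock outside the
    region forces intra coding for that macroblock."""
    groomed = []
    for mb in macroblocks:
        mv = mb.get("motion_vector")
        if mv is not None:
            ref = (mb["pos"][0] + mv[0], mb["pos"][1] + mv[1])
            if ref not in region:
                # Reference crosses the region boundary: drop the vector
                # and mark the macroblock as intracoded.
                mb = {"pos": mb["pos"], "type": "intra"}
        groomed.append(mb)
    return groomed

region = {(0, 0), (1, 0)}
mbs = [
    {"pos": (0, 0), "type": "inter", "motion_vector": (1, 0)},  # refs (1,0)
    {"pos": (1, 0), "type": "inter", "motion_vector": (1, 0)},  # refs (2,0)
]
result = groom_macroblocks(mbs, region)
```

The first macroblock's reference stays inside the region and is untouched; the second one's reference falls outside and is converted to intra, which is the behavior the paragraph above describes.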
The system may include an association between an MPEG source and a method for the MPEG source forming an MPEG object. In such a system, a processor would receive a request from a client device and in response to the request, a method of the MPEG object would be used. The method may change the state of the MPEG object and cause the selection of a different MPEG source. Thus, the stitcher may replace a first MPEG source with a third MPEG source and stitch together the third and second MPEG sources to form a video frame. The video frame would be streamed to the client device and the client device could decode the updated MPEG video frame and display the updated material on the client's display. For example, an MPEG button object may have an "on" state and an "off" state and the MPEG button object may also include two MPEG graphics composed of a plurality of macroblocks forming slices. In response to a client requesting to change the state of the button from off to on, a method would update the state and cause the MPEG encoded graphic representing an "on" button to be passed to the stitcher.
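The MPEG-object association described above, bundling per-state encoded graphics with a method, can be sketched as a minimal class. The class and method names are illustrative assumptions; the patent does not specify an API.

```python
# Sketch: an MPEG button object pairs each state with pre-encoded slice
# data and exposes a method that updates state and returns the graphic
# the stitcher should place in the layout.

class MpegButtonObject:
    def __init__(self, graphics):
        self.state = "off"
        self.graphics = graphics  # state -> pre-encoded slice data

    def handle_request(self, new_state):
        """Invoked in response to a client request; returns the encoded
        graphic to be stitched in place of the previous one."""
        self.state = new_state
        return self.graphics[new_state]

button = MpegButtonObject({"off": b"slices-off", "on": b"slices-on"})
graphic = button.handle_request("on")  # client changes button off -> on
```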
In certain embodiments, the video frame may be constructed from an unencoded graphic or a graphic that is not MPEG encoded and a groomed MPEG video source. The unencoded graphic may first be rendered. For example, a background may be rendered as a bit map. The background may then be encoded as a series of MPEG macroblocks divided up into slices. The stitcher can then stitch together the background and the groomed MPEG video content to form an MPEG video stream. The background may then be saved for later reuse. In such a configuration, the background would have cut-out regions wherein the slices in those regions would have no associated data, thus video content slices could be inserted into the cut-out. In other embodiments, real-time broadcasts may be received and groomed for creating MPEG video streams.
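The cut-out mechanism above, a cached encoded background with empty slice positions into which groomed video slices are dropped, can be sketched as follows. All data here is placeholder bytes under assumed names.

```python
# Sketch: a reusable encoded background keeps None at each cut-out slice
# position; video slices are inserted into those positions at stitch time.

def fill_cutout(background_rows, video_rows):
    """background_rows: list of encoded slices, None marking the cut-out.
    video_rows: encoded video slices for the cut-out region, in order."""
    video = iter(video_rows)
    return [next(video) if row is None else row for row in background_rows]

background = [b"bg0", None, None, b"bg3"]  # rows 1-2 form the cut-out
frame = fill_cutout(background, [b"vid0", b"vid1"])
```

Because the background is encoded once and cached, only the video slices change from frame to frame.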
The foregoing features of the invention will be more readily understood by reference to the following detailed description, taken with reference to the accompanying drawings, in which:
- Fig. 1 is a block diagram showing a communications environment for implementing one version of the present invention;
- Fig. 1A shows the regional processing offices and the video content distribution network;
- Fig. 1B is a sample composite stream presentation and interaction layout file;
- Fig. 1C shows the construction of a frame within the authoring environment;
- Fig. 1D shows breakdown of a frame by macroblocks into elements;
- Fig. 2 is a diagram showing multiple sources composited onto a display;
- Fig. 3 is a diagram of a system incorporating grooming;
- Fig. 4 is a diagram showing a video frame prior to grooming, after grooming, and with a video overlay in the groomed section;
- Fig. 5 is a diagram showing how grooming is done, for example, removal of B-frames;
- Fig. 6 is a diagram showing an MPEG frame structure;
- Fig. 7 is a flow chart showing the grooming process for I, B, and P frames;
- Fig. 8 is a diagram depicting removal of region boundary motion vectors;
- Fig. 9 is a diagram showing the reordering of the DCT coefficients;
- Fig. 10 shows an alternative groomer;
- Fig. 11 shows an environment for a stitcher module;
- Fig. 12 is a diagram showing video frames starting in random positions relative to each other;
- Fig. 13 is a diagram of a display with multiple MPEG elements composited within the picture;
- Fig. 14 is a diagram showing the slice breakdown of a picture consisting of multiple elements;
- Fig. 15 is a diagram showing slice based encoding in preparation for stitching;
- Fig. 16 is a diagram detailing the compositing of a video element into a picture;
- Fig. 17 is a diagram detailing compositing of a 16x16 sized macroblock element into a background comprised of 24x24 sized macroblocks;
- Fig. 18 is a diagram depicting elements of a frame;
- Fig. 19 is a flowchart depicting compositing multiple encoded elements;
- Fig. 20 is a diagram showing that the composited element does not need to be rectangular nor contiguous;
- Fig. 21 shows a diagram of elements on a screen wherein a single element is non-contiguous;
- Fig. 22 shows a groomer for grooming linear broadcast content for multicasting to a plurality of processing offices and/or session processors;
- Fig. 23 shows an example of a customized mosaic when displayed on a display device;
- Fig. 24 is a diagram of an IP based network for providing interactive MPEG content;
- Fig. 25 is a diagram of a cable based network for providing interactive MPEG content;
- Fig. 26 is a flow-chart of the resource allocation process for a load balancer for use with a cable based network; and
- Fig. 27 is a system diagram used to show communication between cable network elements for load balancing.
As used in the following detailed description and in the appended claims the term "region" shall mean a logical grouping of MPEG (Moving Picture Experts Group) slices that are either contiguous or non-contiguous. When the term MPEG is used it shall refer to all variants of the MPEG standard including MPEG-2 and MPEG-4. The present invention as described in the embodiments below provides an environment for interactive MPEG content and communications between a processing office and a client device having an associated display, such as a television. Although the present invention specifically references the MPEG specification and encoding, principles of the invention may be employed with other encoding techniques that are based upon block-based transforms. As used in the following specification and appended claims, the terms encode, encoded, and encoding shall refer to the process of compressing a digital data signal and formatting the compressed digital data signal to a protocol or standard. Encoded video data can be in any state other than a spatial representation. For example, encoded video data may be transform coded, quantized, and entropy encoded, or any combination thereof. Therefore, data that has been transform coded will be considered to be encoded.
Although the present application refers to the display device as a television, the display device may be a cell phone, a Personal Digital Assistant (PDA), or other device that includes a display. A client device including a decoding device, such as a set-top box that can decode MPEG content, is associated with the display device of the user. In certain embodiments, the decoder may be part of the display device. The interactive MPEG content is created in an authoring environment allowing an application designer to design the interactive MPEG content, creating an application having one or more scenes from various elements including video content from content providers and linear broadcasters. An application file is formed in an Active Video Markup Language (AVML). The AVML file produced by the authoring environment is an XML-based file defining the video graphical elements (i.e. MPEG slices) within a single frame/page, the sizes of the video graphical elements, the layout of the video graphical elements within the page/frame for each scene, links to the video graphical elements, and any scripts for the scene. In certain embodiments, an AVML file may be authored directly in a text editor as opposed to being generated by an authoring environment. The video graphical elements may be static graphics, dynamic graphics, or video content. It should be recognized that each element within a scene is really a sequence of images, and a static graphic is an image that is repeatedly displayed and does not change over time. Each of the elements may be an MPEG object that can include both MPEG data for graphics and operations associated with the graphics. The interactive MPEG content can include multiple interactive MPEG objects within a scene with which a user can interact. For example, the scene may include a button MPEG object that provides encoded MPEG data forming the video graphic for the object and also includes a procedure for keeping track of the button state.
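The kind of information the paragraph above attributes to an AVML file (element graphics, sizes, layout, states, and scripts) could take a form like the following hypothetical fragment. The actual AVML schema is not reproduced in this section, so every element and attribute name here is illustrative only:

```xml
<avml>
  <scene name="scene1">
    <element type="background" src="bg.mpg" x="0" y="0"
             width="720" height="480"/>
    <element type="button" name="button1" object="MpegButton"
             x="64" y="400" width="96" height="32">
      <state name="off" src="button_off.mpg"/>
      <state name="on" src="button_on.mpg"/>
    </element>
    <script>onKeyPress("select", toggle("button1"))</script>
  </scene>
</avml>
```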
The MPEG objects may work in coordination with the scripts. For example, an MPEG button object may keep track of its state (on/off), but a script within the scene will determine what occurs when that button is pressed. The script may associate the button state with a video program so that the button will indicate whether the video content is playing or stopped. MPEG objects always have an associated action as part of the object. In certain embodiments, the MPEG objects, such as a button MPEG object, may perform actions beyond keeping track of the status of the button. In such embodiments, the MPEG object may also include a call to an external program, wherein the MPEG object will access the program when the button graphic is engaged. Thus, for a play/pause MPEG object button, the MPEG object may include code that keeps track of the state of the button, provides a graphical overlay based upon a state change, and/or causes a video player object to play or pause the video content depending on the state of the button.
Once an application is created within the authoring environment and an interactive session is requested by a client device, the processing office assigns a processor for the interactive session.
The assigned processor operational at the processing office runs a virtual machine and accesses and runs the requested application. The processor prepares the graphical part of the scene for transmission in the MPEG format. Upon receipt of the MPEG transmission by the client device and display on the user's display, a user can interact with the displayed content by using an input device in communication with the client device. The client device sends input requests from the user through a communication network to the application running on the assigned processor at the processing office or other remote location. In response, the assigned processor updates the graphical layout based upon the request and the state of the MPEG objects hereinafter referred to in total as the application state. New elements may be added to the scene or replaced within the scene or a completely new scene may be created. The assigned processor collects the elements and the objects for the scene, and either the assigned processor or another processor processes the data and operations according to the object(s) and produces the revised graphical representation in an MPEG format that is transmitted to the transceiver for display on the user's television. Although the above passage indicates that the assigned processor is located at the processing office, the assigned processor may be located at a remote location and need only be in communication with the processing office through a network connection. Similarly, although the assigned processor is described as handling all transactions with the client device, other processors may also be involved with requests and assembly of the content (MPEG objects) of the graphical layout for the application.
The content provider 160 may encode the video content as MPEG video/audio or the content may be in another graphical format (e.g. JPEG, BITMAP, H263, H264, VC-1 etc.). The content may be subsequently groomed and/or scaled in a Groomer/Scaler 190 to place the content into a preferable encoded MPEG format that will allow for stitching. If the content is not placed into the preferable MPEG format, the processing office will groom the format when an application that requires the content is requested by a client device. Linear broadcast content 170 from broadcast media services, like content from the content providers, will be groomed. The linear broadcast content is preferably groomed and/or scaled in Groomer/Scaler 180 that encodes the content in the preferable MPEG format for stitching prior to passing the content to the processing office.
The video content from the content producers 160 along with the applications created by application programmers are distributed through a video content distribution network 150 and are stored at distribution points 140. These distribution points are represented as the proxy/cache within Fig. 1 . Content providers place their content for use with the interactive processing office in the video content distribution network at a proxy/cache 140 location. Thus, content providers 160 can provide their content to the cache 140 of the video content distribution network 150, and one or more processing offices that implement the present architecture may access the content through the video content distribution network 150 when needed for an application. The video content distribution network 150 may be a local network, a regional network or a global network. Thus, when a virtual machine at a processing office requests an application, the application can be retrieved from one of the distribution points and the content as defined within the application's AVML file can be retrieved from the same or a different distribution point.
An end user of the system can request an interactive session by sending a command through the client device 110, such as a set-top box, to a processing office 105. In Fig. 1 , only a single processing office is shown. However, in real-world applications, there may be a plurality of processing offices located in different regions, wherein each of the processing offices is in communication with a video content distribution network as shown in Fig. 1B . The processing office 105 assigns a processor for the end user for an interactive session. The processor maintains the session including all addressing and resource allocation. As used in the specification and the appended claims the term "virtual machine" 106 shall refer to the assigned processor, as well as other processors at the processing office that perform functions such as session management between the processing office and the client device and resource allocation (i.e. assignment of a processor for an interactive session).
The virtual machine 106 communicates its address to the client device 110 and an interactive session is established. The user can then request presentation of an interactive application (AVML) through the client device 110. The request is received by the virtual machine 106 and in response, the virtual machine 106 causes the AVML file to be retrieved from the proxy/cache 140 and installed into a memory cache 107 that is accessible by the virtual machine 106. It should be recognized that the virtual machine 106 may be in simultaneous communication with a plurality of client devices 110 and the client devices may be different device types. For example, a first device may be a cellular telephone, a second device may be a set-top box, and a third device may be a personal digital assistant, wherein each device accesses the same or a different application.
In response to a request for an application, the virtual machine 106 processes the application and requests elements and MPEG objects that are part of the scene to be moved from the proxy/cache into memory 107 associated with the virtual machine 106. An MPEG object includes both a visual component and an actionable component. The visual component may be encoded as one or more MPEG slices or provided in another graphical format. The actionable component may store the state of the object, perform computations, access an associated program, or display overlay graphics to identify the graphical component as active. An overlay graphic may be produced by a signal being transmitted to a client device wherein the client device creates a graphic in the overlay plane on the display device. It should be recognized that a scene is not a static graphic, but rather includes a plurality of video frames wherein the content of the frames can change over time.
The virtual machine 106 determines, based upon the scene information, including the application state, the size and location of the various elements and objects for a scene. Each graphical element may be formed from contiguous or non-contiguous MPEG slices. The virtual machine keeps track of the location of all of the slices for each graphical element. All of the slices that define a graphical element form a region. The virtual machine 106 keeps track of each region. Based on the display position information within the AVML file, the slice positions for the elements and background within a video frame are set. If the graphical elements are not already in a groomed format, the virtual machine passes that element to an element renderer. The renderer renders the graphical element as a bitmap and passes the bitmap to an MPEG element encoder 109. The MPEG element encoder encodes the bitmap as an MPEG video sequence. The MPEG encoder processes the bitmap so that it outputs a series of P-frames. An example of content that is not already pre-encoded and pre-groomed is personalized content. For example, if a user has stored music files at the processing office and the graphic element to be presented is a listing of the user's music files, this graphic would be created in real time as a bitmap by the virtual machine. The virtual machine would pass the bitmap to the element renderer 108, which would render the bitmap and pass it to the MPEG element encoder 109 for grooming.
After the graphical elements are groomed by the MPEG element encoder, the MPEG element encoder 109 passes the graphical elements to memory 107 for later retrieval by the virtual machine 106 for other interactive sessions by other users. The MPEG encoder 109 also passes the MPEG encoded graphical elements to the stitcher 115. The rendering of an element and MPEG encoding of an element may be accomplished in the same or a separate processor from the virtual machine 106. The virtual machine 106 also determines if there are any scripts within the application that need to be interpreted. If there are scripts, the scripts are interpreted by the virtual machine 106.
Each scene in an application can include a plurality of elements including static graphics, object graphics that change based upon user interaction, and video content. For example, a scene may include a background (static graphic), along with a media player for playback of audio, video, and multimedia content (object graphic) having a plurality of buttons, and a video content window (video content) for displaying the streaming video content. Each button of the media player may itself be a separate object graphic that includes its own associated methods.
The virtual machine 106 acquires each of the graphical elements (background, media player graphic, and video frame) for a frame and determines the location of each element. Once all of the objects and elements (background, video content) are acquired, the elements and graphical objects are passed to the stitcher/compositor 115 along with positioning information for the elements and MPEG objects. The stitcher 115 stitches together each of the elements (video content, buttons, graphics, background) according to the mapping provided by the virtual machine 106. Each of the elements is placed on a macroblock boundary and when stitched together the elements form an MPEG video frame. On a periodic basis all of the elements of a scene frame are encoded to form a reference P-frame in order to refresh the sequence and avoid dropped macroblocks. The MPEG video stream is then transmitted to the address of the client device through the downstream network. The process continues for each of the video frames. Although the specification refers to MPEG as the encoding process, other encoding processes may also be used with this system.
The virtual machine 106 or other processor or process at the processing office 105 maintains information about each of the elements and the location of the elements on the screen. The virtual machine 106 also has access to the methods for the objects associated with each of the elements. For example, a media player may have a media player object that includes a plurality of routines. The routines can include, play, stop, fast forward, rewind, and pause. Each of the routines includes code and upon a user sending a request to the processing office 105 for activation of one of the routines, the object is accessed and the routine is run. The routine may be a JAVA-based applet, a script to be interpreted, or a separate computer program capable of being run within the operating system associated with the virtual machine.
The processing office 105 may also create a linked data structure for determining the routine to execute or interpret based upon a signal received by the processor from the client device associated with the television. The linked data structure may be formed by an included mapping module. The data structure associates each resource and associated object relative to every other resource and object. For example, if a user has already engaged the play control, a media player object is activated and the video content is displayed. As the video content is playing in a media player window, the user can depress a directional key on the user's remote control. In this example, the depression of the directional key is indicative of pressing a stop button. The transceiver produces a directional signal and the assigned processor receives the directional signal. The virtual machine 106 or other processor at the processing office 105 accesses the linked data structure and locates the element in the direction of the directional key press. The database indicates that the element is a stop button that is part of a media player object and the processor implements the routine for stopping the video content. The routine will cause the requested content to stop. The last video content frame will be frozen and a depressed stop button graphic will be interwoven by the stitcher module into the frame. The routine may also include a focus graphic to provide focus around the stop button. For example, the virtual machine can cause the stitcher to enclose the graphic having focus with a border that is 1 macroblock wide. Thus, when the video frame is decoded and displayed, the user will be able to identify the graphic/object that the user can interact with. The frame will then be passed to a multiplexor and sent through the downstream network to the client device.
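The directional-navigation lookup described in the paragraph above can be sketched as a small linked structure. The field names and routines here are illustrative assumptions, not the patent's data structure.

```python
# Sketch: each on-screen element records its neighbor in each direction,
# so a directional key press from the remote maps to the element whose
# routine should then be run.

def navigate(elements, focused, direction):
    """elements: dict name -> {"neighbors": {direction: name},
    "on_select": callable}. Returns the element reached by the press."""
    return elements[focused]["neighbors"].get(direction, focused)

# Hypothetical media-player controls: play with stop to its right.
elements = {
    "play": {"neighbors": {"right": "stop"}, "on_select": lambda: "playing"},
    "stop": {"neighbors": {"left": "play"}, "on_select": lambda: "stopped"},
}
focused = navigate(elements, "play", "right")     # directional key press
action_result = elements[focused]["on_select"]()  # run the stop routine
```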
The MPEG encoded video frame is decoded by the client device and displayed on either the client device (cell phone, PDA) or on a separate display device (monitor, television). This process occurs with a minimal delay. Thus, each scene from an application results in a plurality of video frames, each representing a snapshot of the media player application state.
The virtual machine 106 will repeatedly receive commands from the client device and in response to the commands will either directly or indirectly access the objects and execute or interpret the routines of the objects in response to user interaction and application interaction model. In such a system, the video content material displayed on the television of the user is merely decoded MPEG content and all of the processing for the interactivity occurs at the processing office and is orchestrated by the assigned virtual machine. Thus, the client device only needs a decoder and need not cache or process any of the content.
It should be recognized that through user requests from a client device, the processing office could replace a video element with another video element. For example, a user may select from a list of movies to display and therefore a first video content element would be replaced by a second video content element if the user selects to switch between two movies. The virtual machine, which maintains a listing of the location of each element and region forming an element can easily replace elements within a scene creating a new MPEG video frame wherein the frame is stitched together including the new element in the stitcher 115.
The authoring environment includes a graphical editor as shown in Fig. 1C for developing interactive applications. An application includes one or more scenes. As shown in Fig. 1B the application window shows that the application is composed of three scenes (scene 1, scene 2 and scene 3). The graphical editor allows a developer to select elements to be placed into the scene forming a display that will eventually be shown on a display device associated with the user. In some embodiments, the elements are dragged-and-dropped into the application window. For example, a developer may want to include a media player object and media player button objects and will select these elements from a toolbar and drag and drop the elements in the window. Once a graphical element is in the window, the developer can select the element and a property window for the element is provided. The property window includes at least the location of the graphical element (address), and the size of the graphical element. If the graphical element is associated with an object, the property window will include a tab that allows the developer to switch to a bitmap event screen and alter the associated object parameters. For example, a user may change the functionality associated with a button or may define a program associated with the button.
As shown in Fig. 1D, the stitcher of the system creates a series of MPEG frames for the scene based upon the AVML file that is the output of the authoring environment. Each element/graphical object within a scene is composed of different slices defining a region. A region defining an element/object may be contiguous or non-contiguous. The system snaps the slices forming the graphics on macroblock boundaries; each element need not have contiguous slices. For example, the background has a number of non-contiguous slices, each composed of a plurality of macroblocks. The background, if it is static, can be defined by intracoded macroblocks. Similarly, graphics for each of the buttons can be intracoded; however, the buttons are associated with a state and have multiple possible graphics. For example, a button may have a first state "off" and a second state "on", wherein the first graphic shows an image of the button in a non-depressed state and the second graphic shows the button in a depressed state. Fig. 1C also shows a third graphical element, which is the window for the movie. The movie slices are encoded with a mix of intracoded and intercoded macroblocks and dynamically change based upon the content. Similarly, if the background is dynamic, the background can be encoded with both intracoded and intercoded macroblocks, subject to the requirements below regarding grooming.
When a user selects an application through a client device, the processing office will stitch together the elements in accordance with the layout from the graphical editor of the authoring environment. The output of the authoring environment includes an Active Video Mark-up Language (AVML) file. The AVML file provides state information about multi-state elements such as a button, the address of the associated graphic, and the size of the graphic. The AVML file indicates the locations within the MPEG frame for each element, indicates the objects that are associated with each element, and includes the scripts that define changes to the MPEG frame based upon the user's actions. For example, a user may send an instruction signal to the processing office, and the processing office will use the AVML file to construct a set of new MPEG frames based upon the received instruction signal. A user may want to switch between various video elements and may send an instruction signal to the processing office. The processing office will remove the first video element from the layout for a frame and will select the second video element, causing the second video element to be stitched into the MPEG frame at the location of the first video element. This process is described below.
The application programming environment outputs an AVML file. The AVML file has an XML-based syntax. The AVML file syntax includes a root object <AVML>. Other top level tags include <initialscene> that specifies the first scene to be loaded when an application starts. The <script> tag identifies a script and a <scene> tag identifies a scene. There may also be lower level tags to each of the top level tags, so that there is a hierarchy for applying the data within the tag. For example, a top level stream tag may include <aspect ratio> for the video stream, <video format>, <bit rate>, <audio format> and <audio bit rate>. Similarly, a scene tag may include each of the elements within the scene. For example, <background> for the background, <button> for a button object, and <static image> for a still graphic. Other tags include <size> and <pos> for the size and position of an element and may be lower level tags for each element within a scene. An example of an AVML file is provided in Fig. 1B . Further discussion of the AVML file syntax is provided in Appendix A attached hereto.
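Based on the tags described above, the hierarchy of a minimal AVML file can be navigated with a standard XML parser. The fragment below is purely illustrative (the authoritative syntax is given in Appendix A); the scene name, sizes, and positions are assumptions:

```python
import xml.etree.ElementTree as ET

# Hypothetical AVML fragment using the tags described in the text:
# <AVML> root, <initialscene>, a <scene> containing <background> and
# <button> elements, each with <size> and <pos> lower-level tags.
avml = """<AVML>
  <initialscene>scene1</initialscene>
  <scene name="scene1">
    <background><size>720x480</size><pos>0,0</pos></background>
    <button><size>64x32</size><pos>100,200</pos></button>
  </scene>
</AVML>"""

root = ET.fromstring(avml)
first_scene = root.findtext("initialscene")   # scene loaded at startup
buttons = root.findall(".//scene/button")     # all button objects in scenes
print(first_scene, len(buttons))              # -> scene1 1
```

Because the AVML syntax is XML-based, any conforming XML toolchain can extract the layout information the stitcher and virtual machine need.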
The process of stitching is described below and can be performed in a much more efficient manner if the elements have been groomed first.
Grooming removes some of the interdependencies present in compressed video. The groomer will convert I and B frames to P frames and will fix any stray motion vectors that reference a section of another video frame that has been cropped or removed. Thus, a groomed video stream can be used in combination with other groomed video streams and encoded still images to form a composite MPEG video stream. Each groomed video stream includes a plurality of frames, and the frames can be easily inserted into another groomed frame, wherein the composite frames are grouped together to form an MPEG video stream. It should be noted that the groomed frames may be formed from one or more MPEG slices and may be smaller in size than an MPEG video frame in the MPEG video stream.
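The stray-motion-vector problem can be illustrated with a short sketch: a vector is "stray" when it references pixels that fall outside a frame that has been cropped or had a region removed. This is a simplified model, not the groomer's actual implementation; the coordinate convention and function names are assumptions:

```python
def find_stray_vectors(mvs, width, height):
    """Flag motion vectors that reference pixels outside a frame cropped
    to width x height; the groomer must re-encode such macroblocks before
    the stream can safely be stitched with other content.
    mvs: list of (mb_x, mb_y, dx, dy) with vectors in whole pixels.
    Illustrative sketch only."""
    stray = []
    for mb_x, mb_y, dx, dy in mvs:
        # top-left corner of the 16x16 reference block this vector points to
        ref_x, ref_y = mb_x * 16 + dx, mb_y * 16 + dy
        if not (0 <= ref_x <= width - 16 and 0 <= ref_y <= height - 16):
            stray.append((mb_x, mb_y))
    return stray

# A vector pointing 4 pixels left of the frame edge is stray:
print(find_stray_vectors([(0, 0, -4, 0), (2, 2, 0, 0)], 64, 64))  # -> [(0, 0)]
```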
As shown, video element 420 is inserted within the background video frame 410 (also for example only; this element could also consist of multiple slices per row). If a macroblock within the original video frame 410 references another macroblock in determining its value and the referenced macroblock is removed from the frame because the video image 420 is inserted in its place, the macroblock's value needs to be recalculated. Similarly, if a macroblock references another macroblock in a subsequent frame and that macroblock is removed and other source material is inserted in its place, the macroblock values need to be recalculated. This is addressed by grooming the video 430. The video frame is processed so that the rows contain multiple slices, some of which are specifically sized and located to match the substitute video content. After this process is complete, it is a simple task to replace some of the current slices with the overlay video, resulting in a groomed video with overlay 440. The groomed video stream has been specifically defined to address that particular overlay; a different overlay would dictate different grooming parameters. Thus, this type of grooming addresses the process of segmenting a video frame into slices in preparation for stitching. It should be noted that there is never a need to add slices to the overlay element. Slices are only added to the receiving element, that is, the element into which the overlay will be placed. The groomed video stream can contain information about the stream's groomed characteristics. The characteristics that can be provided include: (1) the locations of the upper left and lower right corners of the groomed window, or (2) the location of the upper left corner only, together with the size of the window. The size of each slice is accurate to the pixel level.
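The slice substitution that turns groomed video 430 into groomed video with overlay 440 reduces to a positional replacement once the slice boundaries line up. The sketch below models encoded slices as opaque tokens; the function name and row representation are assumptions:

```python
def overlay_slices(background_row, overlay, start, count):
    """Replace `count` slices of a groomed background row with overlay
    slices, starting at slice index `start`. This only works because
    grooming pre-cut the row so that slice boundaries match the overlay;
    slices here are stand-in strings for encoded MPEG slices."""
    assert len(overlay) == count, "overlay must exactly fill the hole"
    return background_row[:start] + overlay + background_row[start + count:]

row = ["bg0", "bg1", "bg2", "bg3"]
print(overlay_slices(row, ["ov0", "ov1"], 1, 2))
# -> ['bg0', 'ov0', 'ov1', 'bg3']
```

Note that, as the text states, slices are only ever added to the receiving (background) element; the overlay is dropped in unchanged.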
There are also two ways to provide the characteristic information in the video stream. The first is to provide that information in the slice header. The second is to provide the information in the extended data slice structure. Either of these options can be used to successfully pass the necessary information to future processing stages, such as the virtual machine and stitcher.
| # | Parameter | Setting |
| A | Picture Coding Type | P-Frame |
| B | Intra DC Precision | Match stitcher setting |
| C | Picture structure | Frame |
| D | Frame prediction frame DCT | Match stitcher setting |
| E | Quant scale type | Match stitcher setting |
| F | Intra VLC format | Match stitcher setting |
| G | Alternate scan | Normal scan |
| H | Progressive frame | Progressive scan |
Next, the slice overhead information 740 must be modified. The parameters to modify are given in the table below.
| # | Parameter | Modification |
| A | Quantizer Scale Code | Will change if there is a "scale type" change in the picture header. |
Next, the macroblock overhead 750 information may require modification. The values to be modified are given in the table below.
| # | Parameter | Modification |
| A | Macroblock type | Change the variable length code from that for an I frame to that for a P frame |
| B | DCT type | Set to frame if not already |
| C | Concealment motion vectors | Removed |
Finally, the block information 760 may require modification. The items to modify are given in the table below.
| # | Parameter | Modification |
| A | DCT coefficient values | Require updating if there were any quantizer changes at the picture or slice level. |
| B | DCT coefficient ordering | Need to be reordered if "alternate scan" was changed from what it was before. |
Once the block changes are complete, the process can start over with the next frame of video.
If the frame type is a B-frame 705, the same steps required for an I-frame are also required for the B-frame. In addition, however, the motion vectors 770 need to be modified. There are two scenarios: a B-frame immediately following an I-frame or P-frame, or a B-frame following another B-frame. Should the B-frame follow either an I or a P frame, the motion vector, using the I or P frame as a reference, can remain the same and only the residual needs to change. This may be as simple as converting the forward-looking motion vector to be the residual.
For the B-frames that follow another B-frame, the motion vector and its residual will both need to be modified. The second B-frame must now reference the newly converted B to P frame immediately preceding it. First, the B-frame and its reference are decoded and the motion vector and the residual are recalculated. It must be noted that while the frame is decoded to update the motion vectors, there is no need to re-encode the DCT coefficients. These remain the same. Only the motion vector and residual are calculated and modified.
The last frame type is the P-frame; this frame type also follows the same path as an I-frame. Fig. 8 diagrams the motion vector modification for macroblocks adjacent to a region boundary. It should be recognized that motion vectors on a region boundary are most relevant to background elements into which other video elements are being inserted. Therefore, grooming of the background elements may be accomplished by the application creator. Similarly, if a video element is cropped and is being inserted into a "hole" in the background element, the cropped element may include motion vectors that point to locations outside of the "hole". Grooming motion vectors for a cropped image may be done by the content creator if the content creator knows the size to which the video element needs to be cropped, or the grooming may be accomplished by the virtual machine in combination with the element renderer and MPEG encoder if the video element to be inserted is larger than the size of the "hole" in the background.
In addition to updating motion vectors and changing frame types, the groomer may also convert field based encoded macroblocks to frame based encoded macroblocks. Fig. 9 shows the conversion of a field based encoded macroblock to frame based. For reference, a frame based set of blocks 900 is compressed. The compressed block set 910 contains the same information in the same blocks, but now it is contained in compressed form. On the other hand, a field based macroblock 940 is also compressed. When this is done, all the even rows (0, 2, 4, 6) are placed in the upper blocks (0 & 1) while the odd rows (1, 3, 5, 7) are placed in the lower blocks (2 & 3). When the compressed field based macroblock 950 is converted to a frame based macroblock 970, the coefficients need to be moved from one block to another 980. That is, the rows must be reconstructed in numerical order rather than in even-odd order. Rows 1 & 3, which in the field based encoding were in blocks 2 & 3, are moved back up to blocks 0 or 1 respectively. Correspondingly, rows 4 & 6 are moved from blocks 0 & 1 and placed down in blocks 2 & 3.
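The row reordering described above can be sketched directly, using the text's 8-row illustration. The sketch operates on row labels as stand-ins; in the actual groomer this movement happens on DCT coefficient data between blocks, not on pixel rows:

```python
def field_to_frame(field_rows):
    """Reorder the rows of a field-coded macroblock into frame order.
    Field coding (per the text's 8-row illustration) stores even rows
    (0, 2, 4, 6) in the upper blocks and odd rows (1, 3, 5, 7) in the
    lower blocks; frame order is simply 0..7 top to bottom.
    Illustrative sketch: real conversion moves DCT coefficients."""
    evens, odds = field_rows[:4], field_rows[4:]
    frame_rows = []
    for even_row, odd_row in zip(evens, odds):
        frame_rows += [even_row, odd_row]   # interleave back to 0,1,2,...
    return frame_rows

# field layout is [r0, r2, r4, r6, r1, r3, r5, r7]:
print(field_to_frame([0, 2, 4, 6, 1, 3, 5, 7]))  # -> [0, 1, 2, 3, 4, 5, 6, 7]
```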
This particular type of encoding is called "slice based encoding". A slice based encoder/virtual machine is one that is aware of the desired slice structure of the output frame and performs its encoding appropriately. That is, the encoder knows the size of the slices and where they belong. It knows where to leave holes if that is required. By being aware of the desired output slice configuration, the virtual machine provides an output that is easily stitched.
It is also possible for there to be an overlap in the composited video frame. Referring back to Fig. 17 , the element 1840 consisted of four slices. Should this element actually be five slices, it would overlap with the background element 1800 in the composited video frame 1880. There are multiple ways to resolve this conflict with the easiest being to composite only four slices of the element and drop the fifth. It is also possible to composite the fifth slice into the background row, break the conflicting background row into slices and remove the background slice that conflicts with the fifth element slice (then possibly add a sixth element slice to fill any gap).
The possibility of different slice sizes requires the compositing function to check the incoming background and video elements to confirm that they are proper, that is, that each one is complete (e.g., a full frame), that there are no sizing conflicts, and so on.
The performance of the stitcher can be improved (build frames faster with less processor power) by providing the stitcher advance information on the frame format. For example, the virtual machine may provide the stitcher with the start location and size of the areas in the frame to be inserted. Alternatively, the information could be the start location for each slice and the stitcher could then figure out the size (the difference between the two start locations). This information could be provided externally by the virtual machine or the virtual machine could incorporate the information into each element. For instance, part of the slice header could be used to carry this information. The stitcher can use this foreknowledge of the frame structure to begin compositing the elements together well before they are required.
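The size derivation mentioned above (the difference between two start locations) is a one-line computation. The sketch below assumes offsets measured in macroblocks and that the last slice runs to the end of the row; the function name is an assumption:

```python
def slice_sizes(starts, row_length):
    """Given the start offset of each slice in a row (advance information
    the virtual machine could supply to the stitcher), derive each slice's
    size as the difference between consecutive start locations. The last
    slice extends to the end of the row. Offsets in macroblocks; sketch only."""
    ends = starts[1:] + [row_length]
    return [end - start for start, end in zip(starts, ends)]

print(slice_sizes([0, 10, 25], 45))  # -> [10, 15, 20]
```

With either form of advance information, the stitcher can begin compositing elements well before the frame is due.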
The source for the element slices can be any one of a number of options. It can come from a real-time encoded source. It can be a complex slice that is built from separate slices, one having a background and the other having text. It can be a pre-encoded element that is fetched from a cache. These examples are for illustrative purposes only and are not intended to limit the options for element sources.
When a client device sends a request for a mosaic application, the processing office associated with the client device assigns a processor/virtual machine for the client device for the requested mosaic application. The assigned virtual machine constructs the personalized mosaic by compositing the groomed content from the desired channels using a stitcher. The virtual machine sends the client device an MPEG stream that has a mosaic of the channels that the client has requested. Thus, by grooming the content first so that the content can be stitched together, the virtual machines that create the mosaics do not need to first decode the desired channels, render the channels within the background as a bitmap and then encode the bitmap.
An application, such as a mosaic, can be requested either directly through a client device or indirectly through another device, such as a PC, for display of the application on a display associated with the client device. The user could log into a website associated with the processing office by providing information about the user's account. The server associated with the processing office would provide the user with a selection screen for selecting an application. If the user selected a mosaic application, the server would allow the user to select the content that the user wishes to view within the mosaic. In response to the selected content for the mosaic and using the user's account information, the processing office server would direct the request to a session processor and establish an interactive session with the client device of the user. The session processor would then be informed by the processing office server of the desired application. The session processor would retrieve the desired application, the mosaic application in this example, and would obtain the required MPEG objects. The processing office server would then inform the session processor of the requested video content, and the session processor would operate in conjunction with the stitcher to construct the mosaic and provide the mosaic as an MPEG video stream to the client device. Thus, the processing office server may include scripts or applications for performing the functions of the client device in setting up the interactive session, requesting the application, and selecting content for display. While the mosaic elements may be predetermined by the application, they may also be user configurable, resulting in a personalized mosaic.
These additional resources add cost to the system. As a result, the desire is to minimize the number of additional resources that are required to deliver a level of performance to the user that mimics a non-blocking system such as an IP network. Since there is not a one-to-one correspondence between the cable network resources and the users on the network, the resources must be shared. Shared resources must be managed so they can be assigned when a user requires a resource and then freed when the user is finished utilizing that resource. Proper management of these resources is critical to the operator because without it, the resources could be unavailable when needed most. Should this occur, the user either receives a "please wait" message or, in the worst case, a "service unavailable" message.
- (1) The Set Top 2609 requests content 2610 from the Controller 2607
- (2) The Controller 2607 requests QAM bandwidth 2620 from the SRM 2603
- (3) The SRM 2603 checks QAM availability 2625
- (4) The SRM 2603 allocates the QAM modulator 2630
- (5) The QAM modulator returns confirmation 2635
- (6) The SRM 2603 confirms QAM allocation success 2640 to the Controller
- (7) The Controller 2607 allocates the Session processor 2650
- (8) The Session processor confirms allocation success 2653
- (9) The Controller 2607 allocates the content 2655
- (10) The Controller 2607 configures 2660 the Set Top 2609. This includes:
- a. Frequency to tune
- b. Programs to acquire or alternatively PIDs to decode
- c. IP port to connect to the Session processor for keystroke capture
- (11) The Set Top 2609 tunes to the channel 2663
- (12) The Set Top 2609 confirms success 2665 to the Controller 2607
The Controller 2607 allocates the resources based on a request for service from a set top box 2609. It frees these resources when the set top or server sends an "end of session". While the controller 2607 can react quickly with minimal delay, the SRM 2603 can only allocate a set number of QAM sessions per second, e.g., 200. Demand that exceeds this rate results in unacceptable delays for the user. For example, if 500 requests come in at the same time, the last user would have to wait 2.5 seconds before the request was granted. It is also possible that rather than the request being granted, an error message could be displayed such as "service unavailable".
While the example above describes the request and response sequence for an AVDN session over a cable TV network, the example below describes a similar sequence over an IPTV network. Note that the sequence in itself is not a claim, but rather illustrates how AVDN would work over an IPTV network.
- (1) Client device requests content from the Controller via a Session Manager (i.e. controller proxy).
- (2) Session Manager forwards request to Controller.
- (3) Controller responds with the requested content via Session Manager (i.e. client proxy).
- (4) Session Manager opens a unicast session and forwards Controller response to client over unicast IP session.
- (5) Client device acquires Controller response sent over unicast IP session.
- (6) The Session Manager may simultaneously narrowcast the response over a multicast IP session, sharing it with other clients in the node group that request the same content at the same time, as a bandwidth-usage optimization technique.
A first issue is the assignment of QAMs 2770 and QAM channels 2775 by the SRM 2720. In particular, the resources must be managed to prevent SRM overload, that is, eliminating the delay the user would see when requests to the SRM 2720 exceed its sessions per second rate.
To prevent SRM "overload", "time based modeling" may be used. For time based modeling, the Controller 2700 monitors the history of past transactions, in particular, high load periods. By using this previous history, the Controller 2700 can predict when a high load period may occur, for example, at the top of an hour. The Controller 2700 uses this knowledge to pre-allocate resources before the period comes. That is, it uses predictive algorithms to determine future resource requirements. As an example, if the Controller 2700 thinks 475 users are going to join at a particular time, it can start allocating those resources 5 seconds early so that when the load hits, the resources have already been allocated and no user sees a delay.
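The lead time implied by the example above is just the predicted burst size divided by the SRM's allocation rate. The sketch below is a simplified model of the "time based modeling" idea; a real controller would derive the predicted join count from its transaction history, and the function name is an assumption:

```python
import math

def lead_time(predicted_joins, srm_rate=200):
    """Seconds before a predicted spike at which the Controller should
    begin pre-allocating QAM sessions, so that every session exists by
    the time the load hits. srm_rate is the SRM's allocation limit in
    sessions per second (the text's example figure). Sketch only."""
    return math.ceil(predicted_joins / srm_rate)

# 475 predicted joins at 200 sessions/sec -> start allocating 3 s early
print(lead_time(475))  # -> 3
```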
Secondly, the resources could be pre-allocated based on input from an operator. Should the operator know a major event is coming, e.g., a pay per view sporting event, he may want to pre-allocate resources in anticipation. In both cases, the SRM 2720 releases unused QAM 2770 resources when not in use and after the event.
Thirdly, QAMs 2770 can be allocated based on a "rate of change" which is independent of previous history. For example, if the controller 2700 recognizes a sudden spike in traffic, it can then request more QAM bandwidth than needed in order to avoid the QAM allocation step when adding additional sessions. An example of a sudden, unexpected spike might be a button as part of the program that indicates a prize could be won if the user selects this button.
Currently, there is one request to the SRM 2720 for each session to be added. Instead the controller 2700 could request the whole QAM 2770 or a large part of a single QAM's bandwidth and allow this invention to handle the data within that QAM channel 2775. Since one aspect of this system is the ability to create a channel that is only 1, 2, or 3 Mb/sec, this could reduce the number of requests to the SRM 2720 by replacing up to 27 requests with a single request.
The user will also experience a delay when requesting different content, even within an active session. Currently, if a set top 2790 is in an active session and requests a new set of content 2730, the Controller 2700 has to tell the SRM 2720 to de-allocate the QAM 2770, then the Controller 2700 must de-allocate the session processor 2750 and the content 2730, and then request another QAM 2770 from the SRM 2720 and allocate a different session processor 2750 and content 2730. Instead, the controller 2700 can change the video stream 2755 feeding the QAM modulator 2770, thereby leaving the previously established path intact. There are two ways to accomplish the change. First, since the QAM modulators 2770 are on a network, the controller 2700 can merely change the session processor 2750 driving the QAM 2770. Second, the controller 2700 can leave the session processor 2750 to set top 2790 connection intact but change the content 2730 feeding the session processor 2750, e.g., from "CNN Headline News" to "CNN World Now". Both of these methods eliminate the QAM initialization and set top tuning delays.
Thus, resources are intelligently managed to minimize the amount of equipment required to provide these interactive services. In particular, the Controller can manipulate the video streams 2755 feeding the QAM 2770. By profiling these streams 2755, the Controller 2700 can maximize the channel usage within a QAM 2770. That is, it can maximize the number of programs in each QAM channel 2775 reducing wasted bandwidth and the required number of QAMs 2770. There are three primary means to profile streams: formulaic, pre-profiling, and live feedback.
The first profiling method, formulaic, consists of adding up the bit rates of the various video streams used to fill a QAM channel 2775. In particular, there may be many video elements that are used to create a single video stream 2755. The maximum bit rate of each element can be added together to obtain an aggregate bit rate for the video stream 2755. By monitoring the bit rates of all video streams 2755, the Controller 2700 can create a combination of video streams 2755 that most efficiently uses a QAM channel 2775. For example, if there were four video streams 2755: two that were 16 Mb/sec and two that were 20 Mb/sec then the controller could best fill a 38.8 Mb/sec QAM channel 2775 by allocating one of each bit rate per channel. This would then require two QAM channels 2775 to deliver the video. However, without the formulaic profiling, the result could end up as 3 QAM channels 2775 as perhaps the two 16 Mb/sec video streams 2755 are combined into a single 38.8 Mb/sec QAM channel 2775 and then each 20 Mb/sec video stream 2755 must have its own 38.8 Mb/sec QAM channel 2775.
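The formulaic profiling decision in the four-stream example is a small bin-packing problem: sum the element bit rates per stream, then combine streams so each channel is filled efficiently. The sketch below uses a first-fit-decreasing heuristic, which is an assumption (the text does not specify the packing algorithm), with the text's example 38.8 Mb/sec channel capacity:

```python
def pack_streams(rates, capacity=38.8):
    """Pack video streams (aggregate bit rates in Mb/sec) into QAM
    channels using first-fit decreasing. Returns a list of channels,
    each a list of the stream rates assigned to it. Sketch only; the
    heuristic choice is an assumption."""
    channels = []
    for rate in sorted(rates, reverse=True):
        for channel in channels:
            if sum(channel) + rate <= capacity:
                channel.append(rate)   # stream fits in an existing channel
                break
        else:
            channels.append([rate])    # open a new QAM channel
    return channels

# Two 20 Mb/sec and two 16 Mb/sec streams fit in two channels, not three:
print(pack_streams([16, 16, 20, 20]))  # -> [[20, 16], [20, 16]]
```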
A second method is pre-profiling. In this method, a profile for the content 2730 is either received or generated internally. The profile information can be provided in metadata with the stream or in a separate file. The profiling information can be generated from the entire video or from a representative sample. The controller 2700 is then aware of the bit rate at various times in the stream and can use this information to effectively combine video streams 2755 together. For example, if two video streams 2755 both had a peak rate of 20 Mb/sec, they would need to be allocated to different 38.8 Mb/sec QAM channels 2775 if they were allocated bandwidth based on their peaks. However, if the controller knew that the nominal bit rate was 14 Mb/sec and knew their respective profiles so there were no simultaneous peaks, the controller 2700 could then combine the streams 2755 into a single 38.8 Mb/sec QAM channel 2775. The particular QAM bit rate is used for the above examples only and should not be construed as a limitation.
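The pre-profiling check in the example above amounts to comparing the pointwise sum of two bit-rate profiles against the channel capacity, rather than the sum of their peaks. A minimal sketch, assuming profiles sampled at the same instants:

```python
def can_share_channel(profile_a, profile_b, capacity=38.8):
    """Return True if two bit-rate profiles (Mb/sec, sampled at the same
    time points) can share one QAM channel: their *sum* must never exceed
    capacity, even though each stream's individual peak would rule out
    peak-based allocation. Sketch of the pre-profiling idea only."""
    return all(a + b <= capacity for a, b in zip(profile_a, profile_b))

# Both streams peak at 20 Mb/sec, but never at the same instant:
a = [14, 20, 14, 12]
b = [12, 14, 14, 20]
print(can_share_channel(a, b))  # -> True
```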
A third method for profiling is via feedback provided by the system. The system can inform the controller 2700 of the current bit rate for all video elements used to build streams and the aggregate bit rate of the stream after it has been built. Furthermore, it can inform the controller 2700 of bit rates of stored elements prior to their use. Using this information, the controller 2700 can combine video streams 2755 in the most efficient manner to fill a QAM channel 2775.
It should be noted that it is also acceptable to use any or all of the three profiling methods in combination. That is, there is no restriction that they must be used independently.
The system can also address the usage of the resources themselves. For example, if a session processor 2750 can support 100 users and currently there are 350 users that are active, it requires four session processors. However, when the demand goes down to say 80 users, it would make sense to reallocate those resources to a single session processor 2750, thereby conserving the remaining resources of three session processors. This is also useful in failure situations. Should a resource fail, the invention can reassign sessions to other resources that are available. In this way, disruption to the user is minimized.
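The consolidation arithmetic above (350 users need four session processors; 80 users need one) can be sketched as a simple ceiling division, using the text's example capacity of 100 users per processor:

```python
import math

def processors_needed(active_users, capacity=100):
    """Number of session processors to keep allocated for the current
    load; as demand falls, the Controller consolidates sessions onto
    fewer processors and frees the rest. Sketch only; capacity is the
    text's example figure."""
    return max(1, math.ceil(active_users / capacity))

print(processors_needed(350), processors_needed(80))  # -> 4 1
```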
The system can also repurpose functions depending on the expected usage. The session processors 2750 can implement a number of different functions, for example, processing video, processing audio, etc. Since the controller 2700 has a history of usage, it can adjust the functions on the session processors 2750 to meet expected demand. For example, if in the early afternoons there is typically a high demand for music, the controller 2700 can reassign additional session processors 2750 to process music in anticipation of the demand. Correspondingly, if in the early evening there is a high demand for news, the controller 2700 anticipates the demand and reassigns the session processors 2750 accordingly. The flexibility and anticipation of the system allow it to provide the optimum user experience with the minimum amount of equipment. That is, no equipment sits idle because it has only a single purpose and that purpose is not required.
The present invention may be embodied in many different forms, including, but in no way limited to, computer program logic for use with a processor (e.g., a microprocessor, microcontroller, digital signal processor, or general purpose computer), programmable logic for use with a programmable logic device (e.g., a Field Programmable Gate Array (FPGA) or other PLD), discrete components, integrated circuitry (e.g., an Application Specific Integrated Circuit (ASIC)), or any other means including any combination thereof. In an embodiment of the present invention, predominantly all of the reordering logic may be implemented as a set of computer program instructions that is converted into a computer executable form, stored as such in a computer readable medium, and executed by a microprocessor within the array under the control of an operating system.
Computer program logic implementing all or part of the functionality previously described herein may be embodied in various forms, including, but in no way limited to, a source code form, a computer executable form, and various intermediate forms (e.g., forms generated by an assembler, compiler, linker, or locator.) Source code may include a series of computer program instructions implemented in any of various programming languages (e.g., an object code, an assembly language, or a high-level language such as Fortran, C, C++, JAVA, or HTML) for use with various operating systems or operating environments. The source code may define and use various data structures and communication messages. The source code may be in a computer executable form (e.g., via an interpreter), or the source code may be converted (e.g., via a translator, assembler, or compiler) into a computer executable form.
The computer program may be fixed in any form (e.g., source code form, computer executable form, or an intermediate form) either permanently or transitorily in a tangible storage medium, such as a semiconductor memory device (e.g., a RAM, ROM, PROM, EEPROM, or Flash-Programmable RAM), a magnetic memory device (e.g., a diskette or fixed disk), an optical memory device (e.g., a CD-ROM), a PC card (e.g., PCMCIA card), or other memory device. The computer program may be fixed in any form in a signal that is transmittable to a computer using any of various communication technologies, including, but in no way limited to, analog technologies, digital technologies, optical technologies, wireless technologies, networking technologies, and internetworking technologies. The computer program may be distributed in any form as a removable storage medium with accompanying printed or electronic documentation (e.g., shrink wrapped software or a magnetic tape), preloaded with a computer system (e.g., on system ROM or fixed disk), or distributed from a server or electronic bulletin board over the communication system (e.g., the Internet or World Wide Web.)
Hardware logic (including programmable logic for use with a programmable logic device) implementing all or part of the functionality previously described herein may be designed using traditional manual methods, or may be designed, captured, simulated, or documented electronically using various tools, such as Computer Aided Design (CAD), a hardware description language (e.g., VHDL or AHDL), or a PLD programming language (e.g., PALASM, ABEL, or CUPL.)
While the invention has been particularly shown and described with reference to specific embodiments, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended clauses.
Embodiments of the present invention may be described, without limitation, by the following clauses. While these embodiments have been described in the clauses by process steps, an apparatus comprising a computer with associated display capable of executing the process steps in the clauses below is also included in the present invention. Likewise, a computer program product including computer executable instructions for executing the process steps in the clauses below and stored on a computer readable medium is included within the present invention.
- Embodiment 1. A system for encoding at least one composite encoded video frame for display on a display device from first and second encoded sources that include block-based transform encoded data, the system comprising: a markup language-based graphical layout, the graphical layout including frame locations within the composite frame for at least the first encoded source and the second encoded source; and a stitcher module for stitching together the first encoded source and the second encoded source according to the frame locations of the graphical layout to form an encoded frame without having to decode the block-based transform encoded data for at least the first source.
- Embodiment 2. A system according to the features of embodiment 1 wherein a Fourier-based transform is used to encode the first and second encoded sources.
- Embodiment 3. A system according to the features of embodiment 1 wherein a discrete cosine transform is used to encode the first and second encoded sources.
- Embodiment 4. A system according to the features of embodiment 1 wherein the encoded frame is encoded using MPEG-1.
- Embodiment 5. A system according to the features of embodiment 1 wherein the encoded frame is encoded using MPEG-2.
- Embodiment 6. A system according to the features of embodiment 1 wherein the encoded frame is encoded using H.264.
- Embodiment 7. A system according to the features of embodiment 1 wherein the encoded frame is encoded using VC-1.
- Embodiment 8. A system according to the features of embodiment 1 wherein the encoded frame is encoded using AVS.
- Embodiment 9. A system according to the features of embodiment 1 wherein the first encoded source is encoded using an MPEG standard.
- Embodiment 10. A system according to the features of embodiment 9 wherein the second encoded source is encoded using an MPEG standard.
- Embodiment 11. A system according to the features of embodiment 9 wherein the first encoded source contains transform coded macroblock data.
- Embodiment 12. A system according to the features of embodiment 10 wherein the second encoded source contains transform coded macroblock data and one or more motion vectors.
- Embodiment 13. A system according to the features of embodiment 1 wherein at least the second encoded source is video that includes a plurality of video frames and the stitcher module stitches together the first encoded source and the second encoded source to form a sequence of stitched frames.
- Embodiment 14. A system according to the features of embodiment 9 further comprising: a groomer module wherein the first encoded source is groomed so that each sequential frame element is converted to an MPEG P-frame format.
- Embodiment 15. The system according to the features of embodiment 14 wherein the groomer module identifies any macroblocks that include motion vectors that reference other macroblocks in a section of the first encoded source and re-encodes those macroblocks as intracoded macroblocks.
- Embodiment 16. A system allowing for interactivity with video content that is transmitted to a client device in a compressed form, the system comprising: a processor for controlling an interactive session with the client device and maintaining state information about graphical elements within the compressed video content; a stitcher responsive to the processor for stitching together two or more compressed graphical elements; wherein when the processor receives a signal requesting a change in state of a first graphical element, the processor retrieves a new compressed graphical element indicative of the state change of the first graphical element and the stitcher stitches the new compressed graphical element into the compressed video stream.
- Embodiment 17. A system according to the features of embodiment 16 further comprising: a transmitter for transmitting the new compressed video content to the requesting client device.
- Embodiment 18. A system according to the features of embodiment 16, wherein the first graphical element is one or more MPEG encoded video frames.
- Embodiment 19. A system according to the features of embodiment 16 wherein each of the graphical elements is block-based encoded.
- Embodiment 20. A system for interacting with composite block-based encoded content within a video frame, the system comprising: a processor for receiving a first block-based encoded source including block-based encoded data, a second block-based encoded source including block-based encoded data, and a mark-up language layout, the mark-up language layout including frame locations within the video frame for at least the data from the first block-based encoded source and the data from the second block-based encoded source and reference to an object model corresponding to the second block-based encoded source; and a stitcher module for stitching together the data from the first block-based encoded source and the data from the second block-based encoded source according to the frame locations of the mark-up language layout to form the video frame; wherein the processor receives user input, and in response to the user input in conjunction with the object model instructs the stitcher module to stitch together the first block-based encoded source and a third block-based encoded source replacing the second block-based encoded source in accordance with the user input.
- Embodiment 21. A system according to the features of embodiment 20 wherein the first and the second block-based encoded sources are encoded using an MPEG standard.
- Embodiment 22. A system according to the features of embodiment 20 wherein the first and the second block-based encoded sources are encoded using H.264.
- Embodiment 23. A system according to the features of embodiment 20 wherein the first and second block-based encoded sources are encoded using VC-1.
- Embodiment 24. A system according to the features of embodiment 20 wherein the first and the second block-based encoded sources are encoded using AVS.
- Embodiment 25. The system according to the features of embodiment 20 wherein the object model represents a button that changes graphically to the third block-based encoded source and maintains an indication of a change in state.
- Embodiment 26. A method for using groomed video content to provide interactive encoded content comprising: establishing an interactive session with a client device associated with a user; in response to user input, retrieving a selected mark-up language layout wherein the mark-up language layout includes reference to encoded video content and groomed video content; stitching the groomed video content into the encoded video content creating an encoded video stream; and sending the encoded video stream to the client device associated with the user.
- Embodiment 27. The method according to the features of embodiment 26, wherein the encoded video content is encoded prior to receipt of a request from a user for an interactive session.
- Embodiment 28. The method according to the features of embodiment 26 further comprising: receiving a real-time broadcast; and grooming the real-time broadcast to form the groomed video content.
- Embodiment 29. The method according to the features of embodiment 26 wherein stitching occurs in response to a signal received from a client device of the user.
- Embodiment 30. The method according to the features of embodiment 26, wherein the mark-up language layout includes an addressable location for insertion of the groomed video content into the encoded video content and wherein the stitcher places the groomed video content into the encoded video content at the addressable location.
- Embodiment 31. The method according to the features of embodiment 26, wherein the groomed video content is in an MPEG format and is part of an MPEG object associated with at least one method.
- Embodiment 32. The method according to the features of embodiment 27, further comprising: in response to receiving a second signal from the client device of the user, executing the method of the MPEG object, causing the stitcher to place new groomed video into the encoded video content.
- Embodiment 33. A groomer for grooming MPEG encoded video content, the groomer comprising: an input for receiving the MPEG encoded video content and a graphical layout that references the MPEG encoded video content and indicates a portion of the MPEG encoded video content to remove; and a grooming module for converting each frame of the MPEG encoded video content to a P-frame and for intracoding all macroblocks that contain motion vectors that reference the portion of the MPEG encoded video content to be removed.
- Embodiment 34. A groomer according to the features of embodiment 33 wherein the grooming module removes the portion of the MPEG encoded video content from the MPEG encoded video content as indicated in the graphical layout.
- Embodiment 35. A method for creating composite MPEG content from multiple MPEG encoded sources that include a macroblock and a slice level in a processor, the method comprising: retrieving a mark-up language frame layout including regions within the frame layout for compositing a first MPEG encoded source and a second MPEG encoded source; receiving the first MPEG encoded source; receiving the second MPEG encoded source; decoding the first and second MPEG encoded sources to the slice level; associating data at the slice level from the first and second MPEG sources with the regions from the mark-up language frame layout; compositing the data at the slice level of the first and second MPEG encoded sources according to the mark-up language frame layout.
- Embodiment 36. The method according to the features of embodiment 35, wherein the region within the frame layout includes discontinuous positions for the first MPEG encoded source.
- Embodiment 37. A method for creating a custom MPEG mosaic, the method comprising: receiving a request from a requestor for a custom MPEG mosaic containing at least a first video source; receiving one or more groomed MPEG content streams including the first video source; compositing the groomed MPEG content stream including the first video source with a background, creating an MPEG mosaic; and transmitting the MPEG mosaic to the requestor.
- Embodiment 38. The method according to the features of embodiment 37 wherein the request from the requestor requires at least two video sources; and wherein compositing requires that groomed MPEG content streams including the first video source and the second video source are composited with the background.
- Embodiment 39. The method according to the features of embodiment 37 further comprising: in response to the request from a requestor, retrieving a mark-up language layout corresponding to a number of video sources selected by the requestor for display in the custom MPEG mosaic.
- Embodiment 40. The system according to the features of embodiment 1 further comprising: a processor for controlling an interactive session that retrieves from memory in response to user input the mark-up language graphical layout.
- Embodiment 41. The system according to the features of embodiment 40 wherein the processor maintains state information about one or more objects associated with the encoded frame and, responsive to a request for changing the state of the object, the processor replaces the first encoded source with a new encoded source.
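The grooming behavior recited in embodiments 14, 15, and 33 (converting each frame to a P-frame and intracoding macroblocks whose motion vectors reference removed content) can be illustrated with a minimal sketch. This is a hypothetical illustration only, not the patented implementation: the `Macroblock` and `Frame` data model and the `groom` and `points_into_removed_region` names are invented for the example, and real grooming operates on actual MPEG bitstream syntax rather than Python objects.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

@dataclass
class Macroblock:
    # (dx, dy) motion vector; None means the block is intracoded.
    motion_vector: Optional[Tuple[int, int]] = None

@dataclass
class Frame:
    frame_type: str                                  # "I", "P", or "B"
    macroblocks: List[Macroblock] = field(default_factory=list)

def groom(frames: List[Frame],
          points_into_removed_region: Callable[[Macroblock], bool]) -> List[Frame]:
    """Convert every frame to P-frame format and re-encode as intracoded
    any macroblock whose motion vector references a removed region."""
    for frame in frames:
        frame.frame_type = "P"                       # sequential elements become P-frames
        for mb in frame.macroblocks:
            if mb.motion_vector is not None and points_into_removed_region(mb):
                mb.motion_vector = None              # drop the dependency: intracode it
    return frames
```

Removing the cross-region motion-vector dependencies is what lets a stitcher later splice the groomed element into an arbitrary composite frame without decoding it.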
Claims (15)
- A stitching system for receiving multimedia content (1210, 1220) from a plurality of sources and stitching together MPEG video elements originating from the multimedia content to form an MPEG compliant stream (1290), the system comprising: a demultiplexer (1225) for receiving encoded multimedia content (1220) from a source and demultiplexing the multimedia content into encoded video content (1229) and encoded audio content (1227); a stitcher (1200) for receiving encoded video content from multiple sources, in particular from the demultiplexer (1225), and stitching together MPEG video elements which are part of the encoded video content to form a sequence of MPEG video frames; and a multiplexor (1280) for receiving and combining the sequence of MPEG video frames and the encoded audio content to form an MPEG compliant stream.
- A stitching system according to claim 1 further comprising: an encoder (109/1215) for receiving unencoded multimedia content (1210), for encoding video content (1219) of the multimedia content and outputting the multimedia content as one or more MPEG video elements to said stitcher (1200).
- A stitching system according to claim 2, wherein the encoder (1215) demultiplexes the multimedia content into video content (1219) and audio content (1217).
- A stitching system according to claim 3 wherein the encoder provides the audio content to an audio selector (1230).
- A stitching system according to one of the preceding claims, wherein the stitcher (1200) further includes a frame synchronization module (1240) for receiving encoded video content from multiple sources and synchronizing the encoded video content.
- A stitching system according to claim 5, wherein the stitcher (1200) further includes: a buffer (1250) coupled to the frame synchronization module (1240) for storing encoded video content prior to construction of the sequence of MPEG video frames.
- A stitching system according to claim 6, wherein the stitcher (1200) further includes: a controller (1275) for controlling the selection of MPEG video elements for forming a complete MPEG video frame.
- A stitching system according to claim 7, wherein the stitcher (1200) further includes: a frame constructor (1270) for combining the selected MPEG video elements to form a complete MPEG video frame and creating an MPEG compliant stream.
- A stitching system according to one of the preceding claims wherein multimedia content can include still images, video images and/or audio data.
- A stitching system according to one of the preceding claims, further comprising: an audio selector (1230) for selectively receiving the encoded audio content from the demultiplexer (1225) and selecting between the encoded audio content from the multiple sources.
- A stitching system according to claim 10, further comprising: a delay means (1260) for delaying the selected encoded audio content so that the encoded audio content will be synchronized with the sequence of MPEG video frames at the multiplexor.
- A method for stitching together a scene composed of a sequence of multiple MPEG video frames using multiple encoded MPEG video elements from multiple sources, the method comprising: for the scene: iteratively determining frame composition (2010) for a current MPEG video frame, including which MPEG video elements are to be used from the multiple sources; for each row of the current MPEG video frame (2020): fetching the next slice of MPEG video elements from one or more sources (2030); inserting the MPEG video elements into the MPEG video frame structure (2040); and determining if the row has been completed (2050); determining if all of the rows of the current MPEG video frame have been completed and, if so, beginning iteratively determining frame composition (2010) on the next MPEG video frame in the scene (2080); determining if the sequence of MPEG video frames has been completed (2090); and outputting the sequence of MPEG video frames.
- A method for stitching together a scene composed of a sequence of multiple MPEG video frames according to claim 12, further comprising: synchronizing the frame with other frames (2015) for a scene.
- A computer program product adapted to perform the method in accordance with claim 12 or 13.
- A computer-readable storage medium comprising the program in accordance with claim 14.
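The row-by-row stitching loop of method claim 12 can be sketched in a few lines. This is an illustrative sketch under simplifying assumptions, not the claimed implementation: `stitch_scene`, the `layout` mapping of rows to source identifiers, and the slice values are all hypothetical, and real slices are MPEG bitstream segments rather than strings.

```python
from typing import Dict, Iterator, List

def stitch_scene(layout: Dict[int, List[str]],
                 sources: Dict[str, Iterator[str]],
                 num_frames: int,
                 rows_per_frame: int) -> List[List[List[str]]]:
    """Build each frame row by row: for every row, fetch the next slice
    from each source the layout assigns to that row and insert it."""
    scene = []
    for _ in range(num_frames):                       # next frame in the scene (2010/2080)
        frame = []
        for row in range(rows_per_frame):             # for each row of the frame (2020)
            row_slices = [next(sources[src])          # fetch the next slice (2030)
                          for src in layout[row]]     # insert into the frame (2040)
            frame.append(row_slices)                  # row completed (2050)
        scene.append(frame)
    return scene                                      # sequence completed (2090)
```

Because each source supplies pre-encoded slices, the loop composites frames without decoding below the slice level, which is the point of the claimed stitcher.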
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US60/884,772 | 2007-01-12 | ||
| US60/884,744 | 2007-01-12 | ||
| US60/884,773 | 2007-01-12 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| HK1188353A true HK1188353A (en) | 2014-04-25 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| EP2106665B1 (en) | Interactive encoded content system including object models for viewing on a remote device | |
| US20080212942A1 (en) | Automatic video program recording in an interactive television environment | |
| US9826197B2 (en) | Providing television broadcasts over a managed network and interactive content over an unmanaged network to a client device | |
| US20090328109A1 (en) | Providing Television Broadcasts over a Managed Network and Interactive Content over an Unmanaged Network to a Client Device | |
| JP5936805B2 (en) | Method, system, and computer software for streaming parallel user sessions | |
| US7688889B2 (en) | Methods, apparatus, and systems for insertion of overlay content into a video signal with transrating capabilities | |
| US10743039B2 (en) | Systems and methods for interleaving video streams on a client device | |
| HK1188353A (en) | Interactive encoded content system including object models for viewing on a remote device | |
| HK1134729B (en) | Interactive encoded content system including object models for viewing on a remote device | |
| HK1188661B (en) | Interactive encoded content system including object models for viewing on a remote device | |
| US9219930B1 (en) | Method and system for timing media stream modifications |