HK1172720A - Multifunction multimedia device - Google Patents
Multifunction multimedia device
- Publication number
- HK1172720A (application HK12113581.4A)
- Authority
- HK
- Hong Kong
- Prior art keywords
- content
- media content
- media
- user
- fingerprint
- Prior art date
Description
Technical Field
The present invention relates to a multifunctional multimedia device.
Background
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, the approaches described in this section should not be assumed to qualify as prior art merely by virtue of their inclusion in this section.
A multimedia content stream may be received by a multimedia player for display to a user. In addition, a general description of the multimedia content may be received by the multimedia player for display to the user. Multimedia content is typically presented in a fixed, non-editable format. The user may jump to a particular point within the media content through a scene selection authored by the producer. Thus, viewing media content is generally passive, and user interaction is minimal.
Drawings
The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like reference numerals refer to similar elements and in which:
FIG. 1A is a block diagram illustrating an exemplary system according to an embodiment;
FIG. 1B is a block diagram illustrating an exemplary media device according to an embodiment;
FIG. 2 illustrates a flow diagram for presenting additional content according to an embodiment;
FIG. 3 illustrates a flow diagram for determining a position in the playing of media content according to an embodiment;
FIG. 4 illustrates a flow diagram for detecting advertisement play according to an embodiment;
FIG. 5 illustrates a flow diagram for acquiring fingerprints from media content according to an embodiment;
FIG. 6 illustrates an exemplary architecture for collecting and storing fingerprints obtained from media devices;
FIG. 7 illustrates a flow diagram for presenting a message according to an embodiment;
FIG. 8 illustrates a flow diagram for interpreting a voice command according to an embodiment;
FIG. 9 illustrates a flow diagram for associating annotations with media content according to an embodiment;
FIG. 10 illustrates an exemplary system for configuring an environment in accordance with one or more embodiments;
FIG. 11 illustrates a flow diagram for selecting media content for recording based on one or more fingerprints obtained from the media content in accordance with one or more embodiments;
FIG. 12 illustrates a flow diagram for replacing an incomplete copy of media content with a complete copy of the media content in accordance with one or more embodiments;
FIG. 13 illustrates a flow diagram for beginning recording of media content in a content stream based on one or more fingerprints obtained from the media content in accordance with one or more embodiments;
FIG. 14 illustrates a flow diagram for stopping recording of media content in a content stream based on one or more fingerprints obtained from the media content in accordance with one or more embodiments;
FIG. 15 illustrates a block diagram of a system upon which an embodiment of the invention may be implemented.
Detailed Description
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It may be evident, however, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.
Some features are described below, each of which can be used independently of the other features or in any combination with the other features. However, any individual feature may not solve any of the problems described above, or may only solve one of the problems described above. Some of the above problems may not be adequately addressed by any of the features described herein. Although headings are provided, information regarding a particular heading that does not appear in the section having that heading may also be found elsewhere in the specification.
Exemplary features are described according to the following outline:
1.0 Functional overview
2.0 System architecture
3.0 Presenting additional content based on media content fingerprints
4.0 Determining the playback position based on media content fingerprints
5.0 Recording based on media content fingerprints
6.0 Publishing recording or viewing information
7.0 Obtaining fingerprints from media content
8.0 Presenting messages
9.0 Interpreting commands
10.0 Associating input with media content
11.0 Obtaining annotations via a personal media device
12.0 Tagging media content
13.0 Publication of media content annotations
14.0 Automatically generated annotations
15.0 Environment configuration
16.0 Hardware overview
17.0 Extensions and alternatives
1.0 Functional overview
In an embodiment, media content is received and presented to a user. The fingerprint obtained from the media content is then used to query a server to identify the media content. Based on the media content (identified based on the fingerprint), additional content is obtained and presented to the user.
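The identification flow just described can be sketched in a few lines of code. This is an illustrative assumption, not the claimed method: the fingerprint here is a plain hash, and the fingerprint server (130) is stood in for by an in-memory dictionary.

```python
# Hypothetical sketch of fingerprint-based content identification.
import hashlib

# A toy "fingerprint database" standing in for the fingerprint server (130).
FINGERPRINT_DB = {}

def fingerprint(frame_bytes: bytes) -> str:
    """Derive a compact fingerprint from raw frame data (here, a plain hash)."""
    return hashlib.sha256(frame_bytes).hexdigest()[:16]

def register(frame_bytes: bytes, metadata: dict) -> None:
    """Server side: associate a fingerprint with identifying metadata."""
    FINGERPRINT_DB[fingerprint(frame_bytes)] = metadata

def identify(frame_bytes: bytes):
    """Client side: query with a fingerprint to identify the media content."""
    return FINGERPRINT_DB.get(fingerprint(frame_bytes))

register(b"frame-data-001", {"title": "Example Show", "episode": 3})
print(identify(b"frame-data-001"))  # metadata for the identified content
print(identify(b"unknown-frame"))   # None: content not recognized
```

A real fingerprint would be robust to re-encoding and scaling, unlike a cryptographic hash; the structure of the query, however, is the same.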
In embodiments, the additional content may include advertisements (e.g., for products, services, or other media content) that are selected based on the identified media content.
In an embodiment, the fingerprint is dynamically obtained from the media content upon receiving a command to render the media content. In an embodiment, the fingerprint is dynamically obtained from the media content upon receiving a command to render additional content that is related to the media content being rendered.
In an embodiment, a face is detected in media content based on a fingerprint acquired from the media content. The name of the person associated with the face is determined and presented in the additional content. Detecting a face and/or determining a name of a person associated with the face may be performed in response to receiving a user command.
In embodiments, features (e.g., objects, structures, landscape, location, etc.) in a frame of media content may be detected based on a fingerprint obtained from the media content. The features may be identified and the identification may be presented. The features may be identified and/or the identification may be presented in response to a user command.
In an embodiment, the fingerprint may be dynamically acquired while the media content is being played. The location in the media content playback may then be determined based on the fingerprint.
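One way to picture the position lookup above: each frame's fingerprint is indexed by its timestamp, so recognizing the currently displayed frame yields the playback position. The indexing scheme is an assumption for illustration only.

```python
# Sketch: locate the playback position from a frame fingerprint.

def build_index(frame_fingerprints):
    """Map fingerprint -> timestamp (seconds) for known media content."""
    return {fp: t for t, fp in enumerate(frame_fingerprints)}

def playback_position(index, current_fp):
    """Return the position in seconds, or None if the frame is unknown."""
    return index.get(current_fp)

index = build_index(["fp0", "fp1", "fp2", "fp3"])
print(playback_position(index, "fp2"))  # 2
```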
In an embodiment, additional content may be presented based on the playback position in the media content. In an embodiment, the additional content based on the playback position may be presented in response to a user command.
In embodiments, the playing of media content may be synchronized across multiple devices based on the position in the playing of the media content. In an embodiment, synchronization across multiple devices may be achieved by starting the playing of the media content on the multiple devices simultaneously, by seeking to a particular position of the media content on each device, or by delaying the playing of the media content on one or more of the devices. During the synchronized playing of media content on multiple devices, a command to fast-forward, rewind, pause, stop, seek, or play received on one device may be performed on all of the synchronized devices. In an embodiment, it may be determined that an advertisement is playing based on the position in the playing of the media content. The advertisement may be skipped or fast-forwarded based on the position in the playing of the media content. In embodiments, a notification of the advertisement being played, or of the speed at which the advertisement is played, may be provided. In an embodiment, advertisements may be selected based on the position in the playing of the media content.
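The delay-based synchronization described above can be sketched as follows: each device reports its current position, and every device ahead of the slowest one pauses for the difference. The coordination protocol and device names are assumptions.

```python
# Sketch: compute per-device pauses to synchronize playback positions.

def sync_delays(positions):
    """Given {device: position_sec}, return the pause each device applies."""
    earliest = min(positions.values())
    return {dev: pos - earliest for dev, pos in positions.items()}

delays = sync_delays({"living_room": 125.0, "bedroom": 121.5, "mobile": 123.0})
print(delays)  # {'living_room': 3.5, 'bedroom': 0.0, 'mobile': 1.5}
```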
In an embodiment, the playing of an advertisement may be detected by determining that one or more fingerprints of the media content being played are associated with an advertisement portion of the media content. In an embodiment, the advertisement may be detected by identifying a person associated with a face in an advertisement portion of the media content and determining that the identified person is not an actor listed for the media content. In an embodiment, the advertisement may be augmented with additional content pertaining to the product or service being advertised. In embodiments, the advertisement may be automatically fast-forwarded, muted, or replaced with a replacement advertisement. In an embodiment, only non-advertisement portions of the media content may be recorded, by skipping detected advertisement portions of the media content.
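The fingerprint-based advertisement detection described above amounts to checking recent frame fingerprints against a set known to belong to advertisement portions. The fingerprint values and the majority-match threshold are illustrative assumptions.

```python
# Sketch: flag an advertisement when most recent fingerprints match known ads.

AD_FINGERPRINTS = {"ad_fp_1", "ad_fp_2"}  # hypothetical ad fingerprints

def is_advertisement(recent_fps, threshold=0.5):
    """True when the fraction of ad-matching fingerprints meets the threshold."""
    matches = sum(1 for fp in recent_fps if fp in AD_FINGERPRINTS)
    return matches / len(recent_fps) >= threshold

print(is_advertisement(["ad_fp_1", "ad_fp_2", "show_fp"]))  # True
print(is_advertisement(["show_fp", "show_fp", "ad_fp_1"]))  # False
```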
In an embodiment, a command is received to record particular media content on a first device associated with a first user, and the particular media content is scheduled for recording on the first device. A notification that the recording of the particular media content is scheduled on the first device is provided to a second device associated with a second user. The second device may then schedule a recording of the particular media content. In response to the notification, the second device may schedule the recording of the particular media content either automatically, without any user command, or after receiving a user confirmation.
In an embodiment, a command may be received from the second user via the second device to record all media content scheduled for recording on the first device, any of the plurality of designated devices, or a device associated with any of the plurality of designated users.
In an embodiment, scheduled recordings of particular media content on multiple devices may be detected. In response to detecting that the particular media content is scheduled for recording on the plurality of devices, a notification that the particular media content is scheduled for recording on the plurality of devices may be provided to at least one of the plurality of devices. The particular media content may then be played simultaneously on the multiple devices. Based on user availability calendars accessible through each device, one of the devices may select a time to play the particular media content synchronously on the multiple devices. A time may also be suggested, and a user confirmation of the suggested time may be received.
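The calendar-based time selection described above can be sketched as intersecting each user's free slots and suggesting the earliest common one. The hour-slot granularity and the data model are assumptions for illustration.

```python
# Sketch: pick a synchronous viewing time from user availability calendars.

def suggest_time(calendars):
    """calendars: list of sets of free hour slots; return earliest common slot."""
    common = set.intersection(*calendars)
    return min(common) if common else None

alice_free = {18, 19, 20, 21}
bob_free = {19, 21, 22}
print(suggest_time([alice_free, bob_free]))  # 19
```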
In an embodiment, a command may be received to record or play particular media content on a device associated with a user. In response to the command, the particular media content may be recorded or played and information may be issued relating to a user indication that the user is recording or playing the particular media content. The information may be automatically published to a web service for further action, such as display on a web page. In response to the command, information related to the particular media content may be obtained and presented to the user. In embodiments, groups may be automatically created (e.g., on a social networking site) for users associated with a device that plays or records particular media content.
In an embodiment, a media device that satisfies an idle criterion may be detected. In response to detecting that the idle criterion is satisfied, media content may be sent to the media device. The media device may be configured to receive a particular content stream, or a stream accessible via the internet, that includes media content. The media device may obtain a fingerprint from the media content and send the fingerprint to a fingerprint database along with additional data pertaining to the media content, such as a title, an outline, closed-caption text, etc. Detecting that the media device satisfies the idle criterion may involve receiving a signal from the media device, the media device completing a period of time without receiving a user command, or determining that the media device has resource availability for acquiring fingerprints.
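One way to express the idle criterion described above: a device is eligible for background fingerprinting work if it has seen no user command for a given period and has spare resources. The thresholds and parameter names are illustrative assumptions.

```python
# Sketch: decide whether a device satisfies an idle criterion.
import time

def satisfies_idle_criteria(last_command_time, cpu_load,
                            idle_seconds=600, max_load=0.25, now=None):
    """Return True if the device may be used to acquire fingerprints."""
    now = time.time() if now is None else now
    idle_long_enough = (now - last_command_time) >= idle_seconds
    has_resources = cpu_load <= max_load
    return idle_long_enough and has_resources

print(satisfies_idle_criteria(last_command_time=0, cpu_load=0.1, now=700))    # True
print(satisfies_idle_criteria(last_command_time=650, cpu_load=0.1, now=700))  # False
```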
In an embodiment, a message is received while audio/video (AV) content is being played. The message is interpreted based on message preferences associated with the user and presented to the user based on the message preferences. In an embodiment, one or more messages may be filtered based on message preferences.
In an embodiment, presenting the message includes overlaying information related to the message on one or more video frames of the AV content being played to the user. Presenting the message may include playing audio information associated with the message. In an embodiment, the AV content is paused or muted while the message is presented.
In an embodiment, the message is provided by another user as audio input, text input, or graphical input. The audio input may include speech that is associated with the sender of the message, the recipient of the message, a particular fictional or non-fictional character, or a combination thereof. The message may be played only to the message recipient.
In an embodiment, the message may be presented for a period of time specified by the message preferences. The message may be held during the playing of the AV content until a commercial occurs, and may be presented during the commercial. In embodiments, the message may be received from a messaging service associated with a social networking site.
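The message handling described in this section can be sketched as a filter driven by per-user preferences: muted senders are dropped, and remaining messages are either shown immediately or held until a commercial. The preference field names are assumptions.

```python
# Sketch: apply message preferences to incoming messages.

def process_messages(messages, prefs, in_commercial):
    """Split incoming messages into (show_now, held) per the preferences."""
    show_now, held = [], []
    for msg in messages:
        if msg["sender"] in prefs.get("muted_senders", set()):
            continue  # filtered out entirely
        if prefs.get("hold_until_commercial") and not in_commercial:
            held.append(msg)  # retained until a commercial occurs
        else:
            show_now.append(msg)
    return show_now, held

prefs = {"muted_senders": {"spammer"}, "hold_until_commercial": True}
msgs = [{"sender": "alice", "text": "hi"}, {"sender": "spammer", "text": "buy"}]
print(process_messages(msgs, prefs, in_commercial=False))  # nothing shown, alice's message held
```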
In an embodiment, a user-defined alert condition is received from a user. The AV content is played while monitoring for the occurrence of the user-defined alert condition, and the occurrence of the user-defined alert condition is detected. An alert may be presented in response to detecting the occurrence of the user-defined alert condition.
In an embodiment, detecting the alert condition includes determining that media content of interest to the user is available on a content stream. In an embodiment, detecting the alert condition includes determining that media content related to information requested by the user is available on a content stream. Detecting the alert condition may include receiving a notification indicating that the alert condition occurred. In an embodiment, detecting the occurrence of the alert condition may include obtaining information using optical character recognition (OCR) and detecting the occurrence of the alert condition based on the information.
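The monitoring loop described above can be sketched as testing each incoming event (e.g., program-guide updates or OCR'd on-screen text) against the user-defined condition. The event shape and keyword condition are illustrative assumptions.

```python
# Sketch: monitor events for a user-defined alert condition.

def make_keyword_alert(keyword):
    """User-defined condition: alert when the keyword appears in event text."""
    return lambda event: keyword.lower() in event.get("text", "").lower()

def monitor(events, condition):
    """Return the first event that triggers the alert, or None."""
    for event in events:
        if condition(event):
            return event
    return None

alert = make_keyword_alert("election results")
events = [{"text": "Weather update"}, {"text": "Election Results at 9pm"}]
print(monitor(events, alert))  # the second event triggers the alert
```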
In an embodiment, a voice command is received from a user, and the user is identified based on the voice command. The voice command is then interpreted based on preferences associated with the identified user to determine an action of a plurality of actions. The action is then performed.
In an embodiment, the applicable user among a plurality of users is determined for a voice command. The applicable user may be determined by recognizing the user based on the voice input.
In embodiments, the actions based on the user's preferences may include configuring the multimedia device or environment, presenting information, making a purchase, or performing another suitable action. In embodiments, the action may be presented for user confirmation prior to being performed, or a check may be made to ensure that the user is permitted to perform the action. In an embodiment, the voice command may be interpreted based on the language in which the voice command is received.
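The per-user interpretation described above can be sketched as a two-step lookup: identify the speaker, then resolve the phrase against that user's preferences. Real speaker recognition is assumed away here; the voiceprint table, users, and actions are all hypothetical.

```python
# Sketch: interpret the same phrase differently per identified user.

VOICEPRINTS = {"voice_a": "alice", "voice_b": "bob"}  # stand-in for recognition
PREFERENCES = {
    "alice": {"play news": "play:CNN"},
    "bob":   {"play news": "play:BBC"},
}

def interpret(voiceprint, phrase):
    """Identify the speaker, then map the phrase to that user's action."""
    user = VOICEPRINTS.get(voiceprint)
    if user is None:
        return None
    return PREFERENCES.get(user, {}).get(phrase)

print(interpret("voice_a", "play news"))  # play:CNN  (Alice's preference)
print(interpret("voice_b", "play news"))  # play:BBC  (Bob's preference)
```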
In an embodiment, annotations are received from a user while media content is being played on a multimedia device. The annotations are stored in association with the media content. In embodiments, the annotations may include audio input, textual input, and/or graphical input. In an embodiment, a second playing of the media content is accompanied by the audio input received from the user. Playing the media content a second time may involve playing only the video portion of the media content together with the audio input received from the user.
In embodiments, multiple versions of an annotation may be received during different playings of the media content, and each annotation may be stored in association with the media content. The annotations may be provided in a language different from the original language of the audio portion of the media content. An annotation may include instructions related to its intended playback. The annotations may include audio automatically generated based on information obtained using optical character recognition. In an embodiment, annotations may be analyzed to derive annotation patterns related to the media content. The annotations may be obtained from a user and may include commentary on the media content. In an embodiment, a user profile may be generated based on the annotations. The annotations may mark time intervals or particular points in the playing of the media content, which may serve as bookmarks from which to resume the playing of the media content. An interval marked by an annotation may be skipped during subsequent playings of the media content or used to create a play order.
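The interval-skipping use of annotations described above can be sketched by removing the marked spans from the play timeline, yielding the segments that are actually played. Times are in seconds; the data model is an assumption.

```python
# Sketch: derive the segments to play after skipping annotated intervals.

def play_segments(duration, skip_intervals):
    """Return the (start, end) segments to play after removing marked spans."""
    segments, pos = [], 0.0
    for start, end in sorted(skip_intervals):
        if start > pos:
            segments.append((pos, start))
        pos = max(pos, end)
    if pos < duration:
        segments.append((pos, duration))
    return segments

# A 60-minute program with two annotated intervals to skip:
print(play_segments(3600, [(600, 780), (1800, 2000)]))
# [(0.0, 600), (780, 1800), (2000, 3600)]
```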
Although specific components are described herein as performing method steps, in other embodiments, media or mechanisms representing specified components may perform the method steps. Further, while certain aspects of the invention are discussed with respect to components in a system, the invention may be practiced with components distributed across multiple systems. Embodiments of the present invention also include any system that includes means for performing the method steps described herein. Embodiments of the present invention also include computer-readable media having instructions that, when executed, cause the method steps described herein to be performed.
2.0 System architecture
Although a particular computer architecture is described herein, other embodiments of the invention are applicable to any architecture that can be used to perform the functions described herein.
FIG. 1A shows media device A (100), a media source (110), media device N (120), a fingerprint server (130), a network device (140), and a network server (150). Each of these components is presented to clarify the functionality described herein and may not be necessary to practice the invention. In addition, components not shown in FIG. 1A may also be used to perform the functionality described herein. Functionality described as being performed by one component may instead be performed by another component.
In embodiments, the media source (110) generally represents any content source from which media device A (100) may receive media content. The media source (110) may be a broadcaster (including a broadcast service) that streams media content to media device A (100). The media source (110) may be a media content server from which media device A (100) downloads media content. The media source (110) may be an audio and/or video player from which media device A (100) receives the media content being played. The media source (110) may be a computer-readable storage or input medium (e.g., physical memory, a compact disc, or a digital video disc) that media device A (100) reads to obtain media content. The terms streaming, broadcasting, and downloading to a device may be used interchangeably herein and should not be construed as being limited to one particular method of a device obtaining data. Media device A (100) may receive data by streaming, broadcasting, downloading, etc., from a broadcast service, a network server, another media device, or any suitable system with data or content accessible by the media device. Different sources may be mentioned in the different examples presented below. Examples describing a particular source should not be construed as limited to only that source.
In an embodiment, the fingerprint server (130) generally represents any server that stores fingerprints acquired from media content. The fingerprint server (130) may be accessed by media device A (100) to download and/or upload fingerprints obtained from media content. The fingerprint server (130) may be managed by a content source (e.g., a broadcast service, a web service, or any other content source) for storing a database of fingerprints obtained from media content. The content source may select the media content to be fingerprinted. Media device A (100) may obtain a fingerprint from the selected media content and provide the fingerprint to the fingerprint server (130). In an embodiment, the fingerprint server (130) may act as a database for identifying media content, or metadata associated with the media content, based on fingerprints obtained from the media content. In an embodiment, at least a portion of the fingerprint server (130) is implemented on one or more media devices. The media devices may be updated continuously, periodically, or according to another suitable schedule as the fingerprint server (130) is updated.
In an embodiment, the network device (140) generally represents any component that is part of media device A (100), or a completely separate device, that includes functionality to communicate over a network (e.g., the internet, an intranet, the world wide web, etc.). For example, the network device (140) may be a computer communicatively coupled to media device A (100), or a network card of media device A (100). The network device (140) may include functionality to publish information related to media device A (100) (e.g., media content scheduled for recording on media device A (100), media content recorded on media device A (100), media content being played on media device A (100), media content previously played on media device A (100), media content displayed on media device A (100), user preferences/statistics collected by media device A (100), user settings on media device A (100), etc.). The network device (140) may publish the information on a website, provide the information in an electronic message or text message, print the information on a network printer, or publish the information in any other suitable manner. The network device (140) may include functionality to provide information directly to another media device (e.g., media device N (120)). The network device (140) may include functionality to obtain information from a network. For example, the network device (140) may search for metadata or any other additional data related to media content and provide the search results to media device A (100). Another example may involve the network device (140) obtaining information related to media content scheduled, recorded, and/or played on media device N (120).
In an embodiment, media device A (100) (or media device N (120)) generally represents any media device that includes a processor and is configured to present media content. Media device A (100) may refer to a single device or to any combination of devices (e.g., a receiver and a television) that may be configured to present media content. Examples of media device A (100) include one or more of a receiver, a digital video recorder, a digital video player, a television, a display, a Blu-ray player, an audio content player, a video content player, a digital photo frame, a handheld mobile device, a computer, a printer, and the like. Media device A (100) may present media content by playing the media content (e.g., audio and/or visual media content), displaying the media content (e.g., still images), printing the media content (e.g., coupons), electronically transmitting the media content (e.g., electronic mail), publishing the media content (e.g., on a website), or any other suitable method. In an embodiment, media device A (100) may be a management device in communication with one or more other media devices in the system. For example, media device A (100) may receive commands from a media device (e.g., a DVD player, a remote control, a joystick, etc.) and communicate the commands to another media device (e.g., a display, a receiver, etc.). In embodiments, media device A (100) may represent any apparatus having one or more subsystems configured to perform the functionality described herein.
In an embodiment, media device A (100) may include functionality to obtain fingerprints from media content. For example, media device A (100) may obtain fingerprints from media content recorded in associated memory or stored anywhere else accessible (e.g., an external hard drive, a DVD, etc.). Media device A (100) may also obtain fingerprints from media content available on a content stream. The media content available on a content stream includes any media content accessible by media device A (100). For example, the content available on a content stream may include content being played out by a broadcast service, content available for download from a network server, a peer device, or another system, or content otherwise accessible by media device A (100). In embodiments, media device A (100) may include functionality to obtain the media content being played and to dynamically acquire fingerprints from the media content being played or stored on the media device. In embodiments, media device A (100) may include processing and storage capabilities to decompress media content (e.g., video frames), modify and/or edit media content, and compress media content.
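As a minimal stand-in for deriving a fingerprint from a decoded video frame, the sketch below computes an "average hash" over a small grayscale grid. This is a common illustrative technique, not the fingerprint algorithm of this specification.

```python
# Sketch: derive a bit-string fingerprint from a small grayscale frame.

def average_hash(pixels):
    """pixels: 2D list of grayscale values; returns a bit-string fingerprint."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    # Each pixel contributes one bit: 1 if at or above the frame's mean.
    return "".join("1" if p >= mean else "0" for p in flat)

frame = [
    [200, 200, 10],
    [200,  10, 10],
    [ 10,  10, 10],
]
print(average_hash(frame))  # 110100000
```

Because the hash depends only on each pixel's relation to the frame mean, it tolerates uniform brightness changes, which is the kind of robustness a content fingerprint needs.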
In embodiments, media device A (100) may include functionality to emulate the functionality of other media devices (e.g., media device N (120)) by recording or playing the same media content as the other media devices. For example, media device A (100) may include functionality to receive a notification that media content is being recorded on media device N (120) and to obtain the same media content from a content source. Media device A (100) may record the media content automatically, or may provide the notification to the user and record the media content in response to a user command.
FIG. 1B illustrates an exemplary block diagram of a media device in accordance with one or more embodiments. As shown in FIG. 1B, the media device (100) may include components such as a memory system (155), a disk (160), a central processing unit (CPU) (165), a display subsystem (170), an audio/video input (175), a tuner (180), a network module (190), a peripheral unit (195), a text/audio converter (167), and/or other components necessary to perform the functionality described herein.
In an embodiment, the audio/video input (175) may correspond to any component that includes functionality to receive audio and/or video input (e.g., HDMI 176, DVI 177, analog 178) from an external source. For example, the audio/video input (175) may be a DisplayPort or a High-Definition Multimedia Interface (HDMI) input, which may receive input from different devices. The audio/video input (175) may receive input from a set-top box, a Blu-ray disc player, a personal computer, a video game console, an audio/video receiver, a compact disc player, an enhanced versatile disc player, a high-definition disc player, a holographic versatile disc player, a LaserDisc player, a MiniDisc player, a film disc player, a RAM disc player, a vinyl disc player, a floppy disk drive, a hard drive, etc. Media device A (100) may include multiple audio/video inputs (175).
In an embodiment, the tuner (180) generally represents any input component capable of receiving a content stream (e.g., via cable, satellite, the internet, a network, or a terrestrial antenna). The tuner (180) may allow one or more received frequencies to pass through while filtering out others (e.g., through the use of electronic resonance). A television tuner may convert an RF television transmission into audio and video signals that can be further processed to produce sound and/or an image.
In an embodiment, input may also be received from the network module (190). Network module (190) generally represents any input component capable of receiving information over a network (e.g., the internet, intranet, world wide web, etc.). Examples of network module (190) include a network card, a network adapter, a Network Interface Controller (NIC), a network interface card, a local area network adapter, an ethernet card, and/or any other component that can receive information over a network. The network module (190) may also be used to directly connect other devices (e.g., media devices, computers, secondary storage devices, etc.).
In an embodiment, input may be received by the media device (100) from a device communicatively connected by wired and/or wireless communication segments. The input received by the media device (100) may be stored in the memory system (155) or on the disk (160). The memory system (155) may include one or more different types of physical memory to store data. For example, one or more memory buffers (e.g., a high-definition frame buffer) in the memory system (155) may include storage capacity to load one or more uncompressed high-definition (HD) video frames for editing and/or fingerprinting. The memory system (155) may also store frames in a compressed format (e.g., MPEG-2, MPEG-4, or any other suitable format), which are then decompressed into a frame buffer for modification, fingerprinting, replacement, and/or display. The memory system (155) may include flash memory, DRAM memory, EEPROM, conventional rotating disk drives, etc. The disk (160) generally represents secondary storage accessible by the media device (100).
In embodiments, the central processing unit (165) may perform the functions described herein using any input received by media device a (100). For example, the central processing unit (165) may be used to dynamically acquire fingerprints from frames of media content stored in the memory system (155). The central processing unit (165) may be configured to mark or identify the media content or portions of the media content based on a tag, hash value, fingerprint, timestamp, or other suitable information related to the media content. The central processing unit (165) may be used to modify media content (e.g., scale video frames), analyze media content, decompress media content, compress media content, and so on. Video frames (e.g., high definition video frames) stored in the video frame buffer may be dynamically modified by the central processing unit (165) to overlay additional content (e.g., information about the frame, program information, chat messages, system messages, web page content, pictures, electronic program guides, or any other suitable content) over the video frames, manipulate the video frames (e.g., stretch, rotate, shrink, etc.), or replace the video frames in real-time. Thus, an electronic program guide, dynamically selected advertising information, media content information, or any other text/graphics may be written onto the video frames stored in the frame buffer to superimpose additional content on the stored video frames. The central processing unit (165) may be used to handle communications with any input and/or output device associated with the media device (100). For example, real-time dynamically modified video frames may be subsequently transmitted for display. The central processing unit (165) may be used to communicate with other media devices to perform functions related to synchronization or data distribution.
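The frame-buffer overlay described above can be sketched as writing the pixel values of additional content over a region of the decompressed frame before it is sent for display. Frames are represented here as toy 2D arrays; real frame buffers and pixel formats are assumed away.

```python
# Sketch: overlay additional content onto a frame held in a frame buffer.

def overlay(frame, patch, top, left):
    """Write `patch` onto `frame` in place at (top, left); return the frame."""
    for r, row in enumerate(patch):
        for c, value in enumerate(row):
            frame[top + r][left + c] = value
    return frame

frame = [[0] * 4 for _ in range(3)]  # a 3x4 "video frame" of background pixels
patch = [[9, 9], [9, 9]]             # a 2x2 block of "overlay" pixels
print(overlay(frame, patch, top=1, left=2))
# [[0, 0, 0, 0], [0, 0, 9, 9], [0, 0, 9, 9]]
```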
In an embodiment, the text/audio converter (167) generally represents any software and/or hardware for converting text to audio and/or for converting audio to text. For example, the text/audio converter may include functionality to convert text corresponding to closed-caption data into an audio file. The audio file may be based on computerized speech, or may be trained to use the voice of a user, a fictional or non-fictional person, etc. In an embodiment, the automatically generated speech for a particular message may use the voice of the user who generated the message. The text/audio converter may include functionality to switch languages when converting from speech to text or from text to speech. For example, audio input in French may be converted into a text message in English.
In an embodiment, the peripheral unit (195) generally represents input and output for any peripheral communicatively connected to the media device (100) (e.g., via USB, external serial advanced technology attachment (eSATA), parallel ATA, serial ATA, Bluetooth, infrared, etc.). Examples of peripherals may include remote controls, USB drives, keyboards, mice, microphones, and voice recognition devices that may be used to operate the media device (100). In embodiments, multiple microphones may be used to detect speech, identify user location, and the like. In an embodiment, the microphone may be part of the media device (100) or of another device (e.g., a remote control) communicatively connected to the media device (100). In an embodiment, when audio input is received from a user (e.g., via a microphone), the media device (100) may include functionality to identify the media content being played (e.g., a particular program, or a location in a particular program).
In an embodiment, the display subsystem (170) generally represents any software and/or device comprising functionality to output one or more images (e.g., via video output display 171) and/or to actually display one or more images. Examples of display devices include kiosks, handheld devices, computer screens, displays, televisions, and the like. The display device may use different types of screens, such as liquid crystal displays, cathode ray tubes, projectors, plasma screens, etc. The output of the media device (100) may be in a specialized format for the type of display device being used, the size of the display device, the resolution (e.g., 720p, 1080i, 1080p, or other suitable resolution), and so forth.
3.0 Presenting Additional Content Based on Media Content Fingerprints
Fig. 2 shows a flow diagram for presenting additional content, according to an embodiment. One or more of the steps described below may be omitted, repeated, and/or performed in a different order. Accordingly, the particular arrangement of steps shown in FIG. 2 should not be construed as limiting the scope of the invention.
Initially, according to an embodiment, a command is received to present media content (step 202). The received command may be entered by the user via a keyboard or a remote control. The command may be a user selection in an Electronic Program Guide (EPG) for recording and/or playing media content. The command may be a channel selection entered by a user. The command may be a request to display a picture slideshow. The command may be a command to play an audio file. The command may be a command requesting to play a movie (e.g., on a Blu-ray player). In an embodiment, receiving a command to present media content may include a user entering the title of the media content in a search field on a user interface. In an embodiment, media content is presented (step 204). Presenting media content may include playing audio and/or visual media content (e.g., video content), displaying or printing images, and so forth. Presenting media content may also involve overlaying the media content over other media content being presented.
In an embodiment, a fingerprint is acquired from media content (step 206). An example of obtaining a fingerprint from media content includes projecting intensity values of one or more video frames onto a set of projection vectors and obtaining a set of projection values. The fingerprint bits may then be computed and concatenated based on each of the projected values to compute a fingerprint of the media content. Another example may include applying a mathematical function to a spectrogram of an audio file. In accordance with one or more embodiments, other fingerprint acquisition techniques may also be used to acquire fingerprints from media content. In an embodiment, the fingerprint is dynamically retrieved from the media content as the media content is being played. For example, media content received from a content source may be played and fingerprinted simultaneously. Fingerprints may be obtained for media content identification, e.g., identifying a particular program, movie, etc. Media streams containing 3-dimensional video may also be fingerprinted. In an embodiment, fingerprinting a 3-dimensional video may involve selecting a fingerprint portion of the 3-dimensional video. For example, objects that are close in a 3-dimensional video stream (e.g., objects that appear closer when viewing a 3-dimensional video) may be selected for fingerprinting to identify faces or structures. The near object may be selected based on a depth mark field associated with the object or by the relative size of the object to other objects.
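The projection-based fingerprint acquisition described above can be sketched in a few lines of Python. This is a minimal illustration only: the tiny "frame", the projection vectors, and the zero-threshold bit rule are hypothetical choices, not the specific technique claimed by the invention.

```python
def fingerprint_frame(intensities, projection_vectors):
    """Project a frame's intensity values onto a set of projection vectors,
    then derive one fingerprint bit per projection value and concatenate."""
    projections = [
        sum(i * v for i, v in zip(intensities, vec))
        for vec in projection_vectors
    ]
    # Hypothetical bit rule: bit is 1 when the projection value is positive.
    bits = [1 if p > 0 else 0 for p in projections]
    # Concatenate the bits to form the frame's fingerprint.
    return "".join(str(b) for b in bits)

# Hypothetical 4-pixel "frame" and two fixed projection vectors.
frame = [10, 200, 30, 90]
vectors = [[1, -1, 1, -1], [-1, 1, -1, 1]]
print(fingerprint_frame(frame, vectors))  # -> "01"
```

In practice the same routine would run per frame (or per audio spectrogram window) as content plays, so fingerprinting and playback proceed simultaneously.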
In an embodiment, a command is received to present additional content related to the media content being presented (step 208). A command may be received to identify general additional content (e.g., any feature in the media content). For example, the command may request information about the media content being played, such as a synopsis of a movie, the actors in the movie, the year the movie was made, the duration of the media content, the director or producer of the movie, the genre of the movie, etc. In an embodiment, specific information may be requested. For example, a command may request the real-world geographic location of the current scene being played. Another example may involve a command requesting identification of a person in the current scene being played. Another example may involve a request for the year and model of a car in a movie scene. Another example may involve a request to save or publish information about the content (including timestamps, offsets from the beginning, and other contextual data) for later use or reference. Thus, a specific information request may include an identification of a location, object, or person in a scene of the media content.
When a command for additional content is received, the additional content requested by the user may not be available. Accordingly, upon receiving the command, the additional content is dynamically identified based on the fingerprint of the media content (step 210). For example, a fingerprint obtained from media content may be used to query a web server and receive an identification of an object, location, or person in a scene matching the fingerprint. Fingerprints may also be used to identify media content being played to obtain metadata that has been associated with the media content. In an embodiment, the fingerprint may be dynamically acquired from the media content after receiving a command to present the additional information.
In an embodiment, additional content is presented (step 212). Presenting the additional content may include overlaying the additional content on the media content being presented to the user. Presenting the additional content may further include overlaying the additional content on portions of the frames freed up by scaling, cropping, or otherwise altering the original content. To overlay additional content on the original or altered media content, an uncompressed HD frame may be loaded into a frame buffer and additional data may be written to the same frame buffer, overwriting the original frame information with the additional data. The additional information may relate to the media content being played, EPG display data, channel indications in banner display format (as described in U.S. Patent 6,642,939, owned by the present applicant and incorporated herein by reference), program synopses, and the like. For example, in a movie, the geographical location of a scene may be displayed on the screen simultaneously with the scene. In another example, a field may display the name of the current actor in the scene at any given time. A visual indication may be displayed that associates the name of an object, place, or person with that object, place, or person on the screen. For example, a line may be drawn between a car in the scene and information identifying the car. The additional content may also provide links to advertisers, businesses, etc. related to the displayed images. For example, the additional information about a car displayed on the screen may include identification information about the car, the name of a car dealer selling the car, a link to a car dealer selling the car, price information related to the car, safety information related to the car, or any other information directly or indirectly related to the identified car. Another example may involve presenting information about available content streams (e.g., received from a broadcast service or from a network server). The content itself may be overlaid on the frame, or a link with a description may be overlaid on the frame, where the link may be selected by user input. The additional content may be presented as closed caption data. In another example, subtitles in a language selected by a user may be overlaid on content, such as a movie or television program.
Subtitles can be obtained by various methods, including downloading from an existing subtitle file database or real-time computational translation of the closed caption text of the original content. Another example may involve a synchronized overlay of lyrics over a music video or concert performance. The system may perform this operation over several frames or until the user instructs it to delete the overlay. At that point, the system may stop writing additional information to the frame buffer. In embodiments, audio content may replace or be mixed over audio from the original content. One example may involve replacing the audio stream of a national broadcast of a football match with the audio stream of a local radio announcer. Another example may involve real-time mixing of audio from the original media with additional audio (e.g., commentary by actors in the scene). These examples may involve modification of the original and additional audio, such as amplification.
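The frame-buffer overwrite described above can be sketched as follows. This is a schematic illustration, not the device's implementation: a tiny 2D list stands in for an uncompressed HD frame, and the "banner" pixel values are hypothetical.

```python
def overlay(frame, content, top, left):
    """Write `content` (a small 2D pixel block, e.g., a rendered EPG banner)
    into `frame` in place, overwriting the original frame information,
    as if writing additional data into the same frame buffer."""
    for r, row in enumerate(content):
        for c, value in enumerate(row):
            frame[top + r][left + c] = value
    return frame

# Hypothetical 4x4 frame buffer of luminance values, initially all zero.
frame = [[0] * 4 for _ in range(4)]
banner = [[9, 9], [9, 9]]  # stand-in for rendered banner/caption pixels
overlay(frame, banner, top=2, left=1)
print(frame)
```

Deleting the overlay then amounts to no longer writing the additional data, so subsequent frames loaded into the buffer are displayed unmodified.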
4.0 Determining Playback Position Based on Media Content Fingerprints
Fig. 3 illustrates a flow diagram for determining a position in a media content playback, according to an embodiment. One or more of the steps described below may be omitted, repeated, and/or performed in a different order. Accordingly, the particular arrangement of steps shown in FIG. 3 should not be construed as limiting the scope of the invention.
Initially, according to an embodiment, a command is received to present media content (step 302) and the media content is presented (step 304). Steps 302 and 304 are essentially the same as steps 202 and 204 described above.
In an embodiment, a fingerprint is obtained from the media content being played (step 306) to determine a location in the media content playing on the first device (step 308). For example, as the media device receives media content in a content stream (or from any other source), the media device may display the media content and obtain a fingerprint from the particular frame being displayed. The media device may also obtain fingerprints from every nth frame, from intra-coded frames (I-frames), or based on any other frame selection mechanism. The content fingerprint obtained from one or more frames may then be compared to a fingerprint database to identify a database fingerprint that matches the frame fingerprint. The fingerprint database may be implemented locally on the media device itself or on a server communicatively connected to the media device. The match between the content fingerprint and the database fingerprint may be an exact match, or the two fingerprints may satisfy a similarity threshold (e.g., at least a threshold number of signature bits in the fingerprints match). Once a match is identified in the database, the metadata stored in association with the database fingerprint is obtained. The metadata may include a location in the media content. For example, the metadata may indicate that the fingerprint corresponds to the kth frame of a total of n frames in the media content. Based on the location information and/or the number of frames per second, a location in the media content playback may be determined. The metadata may also explicitly indicate a location. For example, the metadata may indicate that the fingerprint corresponds to a play position 35 minutes and 3 seconds from the beginning of the media content.
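The threshold-based fingerprint lookup described above can be sketched as follows. The fingerprint strings, metadata shape, and helper names here are hypothetical; the point is the similarity rule (at least a threshold number of signature bits must agree) and the metadata retrieval.

```python
def matches(fp_a, fp_b, threshold_bits):
    """Two fingerprints match when at least `threshold_bits` of their
    signature bits agree (an exact match is the special case where
    every bit agrees)."""
    agreeing = sum(1 for a, b in zip(fp_a, fp_b) if a == b)
    return agreeing >= threshold_bits

def lookup(frame_fp, database, threshold_bits):
    """Return the metadata stored with the first matching database
    fingerprint, e.g., a playback position in the media content."""
    for db_fp, metadata in database:
        if matches(frame_fp, db_fp, threshold_bits):
            return metadata
    return None

# Hypothetical 8-bit fingerprints with associated playback positions.
db = [
    ("10110100", {"position_s": 2103}),  # 35 min 3 s into the content
    ("01101001", {"position_s": 60}),
]
print(lookup("10110101", db, threshold_bits=7))  # 7 of 8 bits agree
```

The same lookup could run against a local database on the media device or be issued as a query to a communicatively connected server.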
According to one or more embodiments, based on the location in the play of the media content on the first device, the second device may synchronize with the first device by simultaneously playing the same media content on the second device (step 310). Once the location in the media content playback is determined for the first device, playback of the media content on the second device can begin at the location. If the playing of the media content has already started on the second device, the playing of the media content on the second device may be stopped and restarted at the location. Alternatively, the playing of the media content on the second device may fast forward or rewind to the location.
In an embodiment, viewing of live or stored programs may be synchronized using a buffer incorporated into the media device. For example, content received in a content stream may be stored at multiple devices as it is received. Thereafter, the devices can communicate to synchronously begin playing of the media content, pausing of the media content, fast forwarding of the media content, and rewinding of the media content. A large buffer capable of storing the entire media content may be used in embodiments. Alternatively, a smaller buffer may be used, and video frames may be deleted as they are played and replaced by new video frames received in the content stream. Synchronized playback of a live or stored program may involve playing back a particular frame stored in the memory buffer at a particular time to achieve frame-level synchronization. For example, the two devices may exchange information indicating the second in which a particular frame stored in memory is to be played and the rate at which future frames are to be played. Thus, based on the same start time, frames may be displayed at exactly the same time or approximately the same time on different media devices. In addition, additional frame/time combinations may be determined to ensure that synchronization is maintained. When the media devices are being used in different time zones, the time may be adjusted to account for the time difference. For example, Greenwich Mean Time (GMT) may be used on all media devices for synchronized playback of media content.
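The frame/time agreement described above can be sketched as a small calculation. Assuming the devices agree (e.g., in GMT) on the instant at which frame 0 plays and on the frame rate, each device independently derives the same display time for any frame; the times and rate below are hypothetical.

```python
def frame_display_time(start_time_gmt, frame_index, fps):
    """Derive the display time of a given frame from a shared start
    time and an agreed playback rate, so every device schedules the
    same frame at the same instant."""
    return start_time_gmt + frame_index / fps

# Hypothetical shared schedule: frame 0 plays at t = 1000.0 s GMT, 25 fps.
t = frame_display_time(1000.0, frame_index=50, fps=25)
print(t)  # -> 1002.0
```

Exchanging additional frame/time pairs periodically lets the devices detect drift and re-derive the schedule to maintain synchronization.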
In an embodiment, synchronization may be maintained after multiple devices playing the same media content are synchronized. To maintain synchronization, any play functions received on one device (e.g., stop, fast forward, rewind, play, pause, etc.) may be performed on both devices (step 312).
In an embodiment, the playing of the advertisement may be detected based on a location in the playing of the media content (step 314). For example, media content available on a content stream may include television programs and advertisements interspersed at different times during the television programs. The composition information for the media content may indicate that the television program was broadcast for twenty-five minutes, followed by a five minute advertisement, followed by another twenty-five minute television program, and followed by another five minute advertisement. Thus, if the position in the media content playback is determined to be twenty minutes after the start, the television program is being played. However, if the position of the media content playing is determined to be twenty-seven minutes after the start, the advertisement is being played.
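The composition-based detection in step 314 can be sketched as a walk over the composition information. The schedule below mirrors the example in the text (25-minute program, 5-minute advertisement, repeated); the data shape and function name are illustrative.

```python
def segment_at(position_min, schedule):
    """Given the playback position and the composition information
    (ordered (duration_min, kind) pairs), return what is playing."""
    elapsed = 0
    for duration, kind in schedule:
        elapsed += duration
        if position_min < elapsed:
            return kind
    return None  # position falls past the end of the known composition

schedule = [(25, "program"), (5, "ad"), (25, "program"), (5, "ad")]
print(segment_at(20, schedule))  # -> "program" (20 min after the start)
print(segment_at(27, schedule))  # -> "ad" (27 min after the start)
```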
In an embodiment, the playing of an advertisement may be detected without determining a location in the playing of the media content. For example, if the media content includes a television program and advertisements interspersed between television programs, the advertisements may be detected based on a fingerprint obtained from the media content currently being played. The fingerprint obtained from the media content currently being played may be compared to fingerprints obtained from television programs only or advertisements only. Based on the comparison, the media content being played simultaneously may be determined to be part of a television program or part of an advertisement.
In an embodiment, the playing of an advertisement may be detected based on elements presented in the media content. For example, the faces of actors in the media content may be recognized based on fingerprints obtained from the media content being played. The names of actors may then be compared to the names of actors in the list of actors in the television program. If the detected actors in the media content being played match the actors listed in the television program, the television program is being played. Alternatively, if the detected actors in the media content being played do not match the actors listed in the television program, then the advertisement is being played. In an embodiment, a time window may be used for detection of known actors in a television program, where at least one actor listed in the television program must be detected within the time window to infer that the television program is playing.
In accordance with one or more embodiments, many different actions may be taken in response to determining that an advertisement is being played. In an embodiment, advertisements may be automatically fast forwarded. For example, whenever the playing of an advertisement is detected, an automatic fast forward function may be applied to the playing of the media content until the advertisement is complete (e.g., when the playing of a television program is again detected based on a fingerprint). Likewise, advertisements may also be automatically muted, with the mute function being deselected in response to detecting completion of the advertisement.
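The automatic fast-forward behavior above can be sketched as a small state machine over per-frame classifications (each frame labeled "program" or "ad" via fingerprint comparison). The labels and action names are illustrative only.

```python
def playback_actions(classifications):
    """Map a stream of per-frame classifications to playback actions:
    engage fast-forward when an ad is first detected, resume normal
    play when the program is detected again."""
    actions = []
    in_ad = False
    for kind in classifications:
        if kind == "ad" and not in_ad:
            actions.append("fast_forward")
            in_ad = True
        elif kind == "program" and in_ad:
            actions.append("play")
            in_ad = False
    return actions

print(playback_actions(["program", "ad", "ad", "program"]))
```

The same structure applies to automatic muting: substitute "mute" and "unmute" for the two actions.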
In an embodiment, if media content is being recorded, advertisements may be automatically skipped for recording. For example, in a recording of a movie being received from a content source, non-advertising portions of the media content (e.g., movie portions) may be recorded, while advertising portions of the media content may be skipped for recording.
In an embodiment, a replacement advertisement may be displayed. When the content stream is received and displayed, a detected advertising portion of the content stream may be replaced with a replacement advertisement. For example, a media device at a sports bar may be programmed to display advertisements for drink specials instead of the advertisements received in the content stream. Alternatively, advertisements stored in memory or streamed from a server of a local provider may be displayed instead of the advertisements received in the content stream. Replacement advertisements may be selected based on the media content. For example, during a sporting event, advertisements directed at sports viewers may be selected.
In embodiments, the advertisement may be augmented with additional content related to the advertisement. When a content stream is received, a detected advertising portion of the content stream may be scaled, cropped, or otherwise modified, and the resulting empty display space may be programmatically filled with additional content. For example, an advertisement for an upcoming movie may be augmented by showing movie theaters within 15 miles of the device. The user may also be presented with one or more interactive functions related to the additional content, such as an option to store information about the movie being advertised, including selected local movie theaters and show times, for future presentation, reference, ticketing, or other related activities. In another example, the advertisement may be augmented with games, quizzes, surveys, video, and audio related to the advertisement. In embodiments, the advertisement may be augmented with information about actions taken by the user's social network connections that relate to the advertisement. For example, an advertisement for a digital camera may be augmented with a picture taken by the user's friend using the same digital camera. In another example, an advertisement for a recently released DVD movie may be augmented with a friend's ratings and reviews of the movie.
In embodiments, advertisements may be augmented with additional content unrelated to the advertisement. When a content stream is received, a detected advertising portion of the content stream may be scaled, cropped, or otherwise modified, and the resulting empty display space may be programmatically filled with additional content. In one embodiment, a user may instruct the system to use a portion of the display to show personalized content during an advertisement. In one example, the personalized content may include the latest scores and statistics of the user's favorite sports teams. In another example, the content may include all or part of messages that the user has recently received, such as emails, Short Message Service (SMS) messages, instant messages, social network notifications, and voicemails. In another example, information about additional content related to the content interrupted by the advertisement may be presented to the user. In another example, the user may be presented with the chance to take his or her turn in a previously started game. In embodiments, the user may also be presented with one or more interactive functions related to the additional content, such as an option to store content information for future presentation, reference, or other activities. In one example, the user may choose to use a keyboard or microphone to respond to a short message, email, voicemail, or instant message.
In an embodiment, a notification of the playing of an advertisement by a media device may be provided to an interested party (e.g., a vendor or broadcaster). For example, if a provider's advertisement is played on a media device, the content source may be informed that the provider's advertisement was actually played. In addition, if the provider's advertisement is fast-forwarded through, the content source may be informed that the provider's advertisement was fast-forwarded through. This information may be provided to the provider to enable the provider to determine the effectiveness of the advertisement. Additional information, including whether the advertisement was played from a previously stored recording or directly as it was received from a content source, may be provided to interested parties.
In embodiments, cumulative statistics for users may also be collected based on ad detection. For example, particular types of advertisements or media content viewed by the user may be saved to determine user interests. These user interests may be provided to the provider, stored on a server, published on an interactive web page associated with the user, or otherwise presented. Anonymous information for multiple users may be collected to build reports based on user viewing or input. U.S. patent application 10/189,989 (owned by the applicant and incorporated herein by reference) describes this practice.
5.0 Recording Based on Media Content Fingerprints
In an embodiment, fingerprints obtained from media content in a content stream may be used to begin and/or end recording of media content in a content stream, as shown in fig. 13 and 14.
A recording of particular media content in a content stream or known content streams available at a future time may be scheduled (step 1302). The scheduling of particular media content may be based on a time interval for the playout of the media content in the content stream, as indicated by an Electronic Program Guide (EPG). However, according to one or more embodiments, a particular time interval is not necessary for scheduling a recording.
The content in the content stream may be monitored using fingerprints (step 1304) obtained from the content received in the content stream. The monitoring of the content stream may begin a specified time period prior to the expected start time (e.g., as indicated by the EPG) of the particular media content scheduled for recording. The fingerprints may then be used to query a fingerprint database and identify the content in the content stream (step 1306). If the content in the content stream matches the particular media content scheduled for recording (step 1308), recording of the content in the content stream begins (step 1310). If the content in the content stream does not match the particular media content scheduled for recording, the media content may continue to be monitored. Because the start of the recording is based on identifying the particular media content in the content stream, the above method records the particular media content in its entirety even if it is played out before the scheduled start time.
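The monitor-then-record loop of steps 1304-1310 can be sketched as follows. The fingerprint strings are hypothetical stand-ins for real fingerprint comparisons (which, as described above, may use a similarity threshold rather than equality).

```python
def record_on_match(stream_fps, target_fp):
    """Monitor fingerprints obtained from the content stream; return the
    index of the first frame whose fingerprint matches the scheduled
    content (where recording begins, step 1310), or None if no match
    is seen (keep monitoring, step 1308)."""
    for index, fp in enumerate(stream_fps):
        if fp == target_fp:
            return index
    return None

# Hypothetical stream: two ad fingerprints, then the scheduled program.
stream = ["ad01", "ad02", "show1", "show2"]
print(record_on_match(stream, "show1"))  # -> 2
```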
Fig. 14 shows an example of ending a recording of particular media content based on fingerprints obtained from content received in the content stream. Recording of the particular media content in the content stream begins (step 1402). Recording may be initiated using the fingerprint-based method shown in fig. 13, or may simply be initiated based on an expected start time (e.g., as indicated by an EPG). Fingerprints may then be obtained from the content in the content stream (step 1404). Fingerprints may be acquired continuously or periodically once the broadcast (including streaming) of the particular media content begins, or when the broadcast nears the expected end time of the particular media content. For example, monitoring for the end may begin with the playout of the particular media content, or may begin fifteen minutes before the scheduled end time. Thereafter, the fingerprint database may be queried with the fingerprints to identify the content in the content stream (step 1406). Recording of the content in the content stream continues as long as the content in the content stream matches the particular media content scheduled for recording (step 1408). However, when the content in the content stream no longer matches the particular media content, recording is stopped (step 1410). For example, the user may select a recording of a football game from the EPG. The end time of the stream of the football game may not be known because the length of the football game may not be known in advance. In this example, content in the content stream that includes the football game may be continuously or periodically fingerprinted to determine if the football game is still being played. The recording may stop once it is determined that the football game is no longer being played.
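Steps 1408-1410 can be sketched as follows: recording spans from the first stream fingerprint that matches the program to the last one that does, regardless of the EPG-scheduled interval. The fingerprint labels are hypothetical.

```python
def recording_span(stream_fps, program_fps):
    """Return the (start, end) indices of the portion of the stream
    whose fingerprints match the program being recorded; recording
    stops once the stream no longer matches (steps 1408-1410)."""
    start = end = None
    for i, fp in enumerate(stream_fps):
        if fp in program_fps:
            if start is None:
                start = i
            end = i
    return (start, end)

# Hypothetical football-game fingerprints surrounded by other content.
game = {"g1", "g2", "g3"}
stream = ["pre", "g1", "g2", "g3", "post"]
print(recording_span(stream, game))  # -> (1, 3)
```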
In an embodiment, the acquired fingerprints may be used to identify which of a particular set of media content is playing. For example, the EPG data may indicate that a football game will be available in a content stream from 5pm to 8pm, followed by a comedy show from 8pm to 9pm. However, the football game may run shorter or longer than the 5pm-to-8pm time interval indicated by the EPG data. Therefore, the end time of the football game will not be determinable based solely on the EPG data. Fingerprints may be continuously or periodically obtained from the content in the content stream, from some time prior to the projected end time indicated by the EPG data until the content is no longer available on the content stream. Continuing with the previous example, fingerprints may be obtained from 7:30pm to 8:30pm, or from 7:30pm until the football game is no longer available on the content stream.
In this example, the system may determine (e.g., based on the EPG data) that the comedy show will follow the football game regardless of whether the football game ends early or late. Thus, each acquired fingerprint may be analyzed to determine whether the corresponding media content is one of: (1) the football game or (2) the comedy show. Determining which media content corresponds to a fingerprint from a limited set of possible media content requires less computational and/or processing power than identifying media content from a large database of media content files. For example, the acquired fingerprints may be used only to determine whether the corresponding media content frames include the faces of actors starring in the comedy show or features known to appear in the comedy show's opening. Fingerprints may also be taken from a smaller set of features in each media content file to simplify the fingerprint acquisition computation. Based on the fingerprints of the content stream, the end time of the football game may be determined, and the start time of the comedy show may be determined.
In an embodiment, one or more advertisements may be displayed in the content stream. To distinguish advertisements from subsequent programs in the content stream, fingerprints may be taken for a minimum duration after the program being recorded appears to have ended, to ensure that the program is no longer available in the content stream. For example, fingerprints may be taken in a window of 10 minutes (longer than most commercial breaks) after the last frame identified as the media content being recorded. Thereafter, if no matching media content is found in the content stream within the 10-minute window or other specified time, it may be determined that playout of the media content in the content stream has ended. Additional content that is not part of the media content may then be deleted. In the previous example, if non-football-game content is continuously displayed for a minimum of 10 minutes near the expected end time of the football game, the system may determine that the playout of the football game has ended and that the last 10 minutes of the recording was replacement content that was not part of the football game. This last 10 minutes of the recording can be deleted.
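The minimum-duration window above can be sketched as a run-length check: recording ends once a window's worth of consecutive frames fail to match the program, and the trailing non-program run is trimmed. The single-character frame labels and predicate are hypothetical stand-ins for fingerprint matching.

```python
def trim_after_end(frames, is_program, window):
    """Keep recording until `window` consecutive frames fail to match
    the program (i.e., the break is longer than a commercial break);
    then drop the trailing non-program run from the recording."""
    run = 0  # length of the current run of non-matching frames
    for i, frame in enumerate(frames):
        run = 0 if is_program(frame) else run + 1
        if run >= window:
            # Trim back to just after the last matching frame.
            return frames[: i - window + 1]
    return frames

# "g" = frame matching the game; "x" = other content (e.g., an ad).
frames = ["g", "g", "x", "g", "x", "x", "x"]
print(trim_after_end(frames, lambda f: f == "g", window=3))
```

Note that the short "x" run in the middle (a commercial break) survives, while the trailing run of length 3 triggers the end-of-program determination.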
In an embodiment, the recording schedule may be modified based on an unplanned extension or shortening of media content in the content stream. An unscheduled extension of a program may shift the overall broadcast schedule for the rest of the day or night. For example, if a football game runs twenty minutes past its scheduled end, the subsequent shows and/or scheduled broadcasts will all shift by twenty minutes. In an embodiment, the change may be identified based on fingerprints obtained from content in the content stream, and the recording schedule on the multimedia device may be changed to match the change in the scheduled playout.
As shown in fig. 11, in accordance with one or more embodiments, media content may be selected for recording by a media device based on a fingerprint obtained from the media content. One or more fingerprints may be obtained from content in a content stream being monitored (step 1102). The fingerprints may then be compared to a fingerprint database to identify the media content (step 1104). Content streams that the user views more frequently may be selected for monitoring. In another example, a content stream specified by the user may be monitored. Thereafter, if the identified media content matches a user-specified characteristic or the user's viewing history (step 1106), the media content may be recorded (step 1108). Examples of user-specified characteristics may include content type, actor or actress, geographic region, language, sound, or any other characteristic that the user has specified. In an embodiment, fingerprints are used to identify user-specified features in media content that are otherwise unavailable (e.g., not present in metadata associated with the media content). In another example, if the media content in the content stream is similar to programs viewed and/or recorded by the user, the media content may be recorded.
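The decision in steps 1106-1108 can be sketched as a simple match between the metadata looked up via the fingerprint and the user-specified characteristics. The metadata keys and values below are hypothetical.

```python
def should_record(identified_metadata, user_prefs):
    """Record (step 1108) when any user-specified characteristic
    (genre, actor, language, ...) matches the metadata identified
    for the content via its fingerprint (steps 1102-1106)."""
    return any(
        identified_metadata.get(key) == value
        for key, value in user_prefs.items()
    )

# Hypothetical metadata returned by the fingerprint database lookup.
metadata = {"genre": "comedy", "actor": "A. Actor", "language": "en"}
print(should_record(metadata, {"genre": "comedy"}))  # -> True
print(should_record(metadata, {"genre": "drama"}))   # -> False
```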
As shown in fig. 12, in accordance with one or more embodiments, an incomplete copy of media content may be replaced by a complete copy of the media content. For example, after a copy of media content is recorded (step 1202), it is determined that the recorded copy is incomplete (step 1204). The determination may be made by detecting that the duration of the recorded copy is shorter than the expected duration of the media content. The expected duration of the media content may be obtained from an Electronic Program Guide (EPG), from metadata associated with the media content, from an online search, from a database query, or from any other suitable source.
In an embodiment, a new complete copy of the media content is obtained (step 1206). Obtaining a new copy of the media content may involve identifying an accessible content stream carrying the media content and obtaining the media content from that content stream. In another example, a new copy of the media content may be requested from a network server or broadcast service. In another example, the new copy of the media content may be searched for and downloaded over a network (e.g., the internet). In an embodiment, the identified partial recording may be concatenated with a separately recorded portion of the media content to obtain a complete recording of the media content. The missing portion of the recorded copy of the media content may first be identified based on fingerprints obtained from the recorded media content. For example, fingerprints obtained from the partial recording may be compared to fingerprints known to correspond to a complete recording of the media content. Based on the comparison, the fingerprints absent from the partial recording, and thus the corresponding missing portions of the recording, may be identified. Thereafter, only the missing portion (rather than an entire new copy) may be obtained according to the techniques described above.
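The fingerprint comparison described above, which isolates the missing portion of a partial recording so only that portion need be fetched, can be sketched as follows; the per-frame fingerprint values are hypothetical.

```python
def missing_frames(complete_fps, partial_fps):
    """Return indices (into the complete fingerprint sequence) of frames
    that are absent from the partial recording's fingerprints."""
    recorded = set(partial_fps)
    return [i for i, fp in enumerate(complete_fps) if fp not in recorded]

# Fingerprints known to correspond to the entire media content...
complete = ["f0", "f1", "f2", "f3", "f4", "f5"]
# ...versus fingerprints obtained from the partial recording.
partial = ["f2", "f3", "f4", "f5"]

print(missing_frames(complete, partial))  # [0, 1]: fetch only this portion
```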
A portion of a media content recording may be cut off when previously played media content has an unplanned extension. In the above example, the content stream may be scheduled to record a comedy program requested by the user from 8pm to 9pm. However, due to the twenty-minute overrun of the football match, the comedy program is not available on the content stream until 8:20pm. Thus, a recording of the 8pm-9pm content would include 20 minutes of the football match followed by only the first 40 minutes of the comedy program. Alternatively, a shortened recording from 8:20pm to 9:00pm would include only a portion of the original comedy program. In an embodiment, the fingerprint may be used to determine a position in the playing of the video and to adjust the recording interval accordingly. For example, the content available in the content stream at 8:20pm may be identified as the beginning of the comedy program based on a fingerprint obtained from the content. Based on this identification, the recording interval may be changed from 8:00pm-9:00pm to 8:20pm-9:20pm, or from 8:00pm-9:00pm to 8:00pm-9:20pm. In another embodiment, the recording may simply continue until the fingerprint obtained from the content in the content stream no longer matches a fingerprint associated with the comedy program. In embodiments, fingerprints for media content in a content stream may be sent to the media device in advance, such that the media device may compare the received fingerprints, known to correspond to the entire media content, with fingerprints obtained from the media content accessible on the content stream.
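The recording-interval adjustment described above can be sketched as a simple time shift, assuming times are expressed as minutes since midnight; the function name is illustrative only.

```python
def shift_interval(scheduled_start, scheduled_end, detected_start):
    """Shift a recording interval by the delay detected via fingerprinting.

    Times are minutes since midnight; detected_start is the minute at which
    a fingerprint first matched the beginning of the program."""
    delay = detected_start - scheduled_start
    return (scheduled_start + delay, scheduled_end + delay)

# 8:00pm-9:00pm recording; fingerprint identifies the comedy start at 8:20pm.
start, end = shift_interval(20 * 60, 21 * 60, 20 * 60 + 20)
print(start, end)  # 1220 1280, i.e. the interval becomes 8:20pm-9:20pm
```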
In embodiments, playing back recorded content may include selecting a start position (other than the beginning of the recorded content) and/or an end position (other than the end of the recorded content). For example, if a one-hour recording of a comedy program includes a 20-minute football match followed by 40 minutes of the comedy program, the fingerprint may be used to determine that the comedy program begins at the 20-minute position in the recording. Based on this information, when playback of the comedy program is selected, playback may begin at the 20-minute position. Likewise, alternate content may have been recorded at the end of the comedy program recording. In this example, playback may be automatically stopped by the multimedia device in response to determining that the remainder of the recording does not include the comedy program. Starting and/or stopping playback of recorded content may also be used to skip advertisements at the beginning or end of a recording, based on fingerprints obtained from the content. For example, in playback of a 30-minute recording, if the first two minutes of the recording include only advertisements, playback may begin at the two-minute position.
In an embodiment, a partial recording of the comedy program (e.g., a recording shortened to forty minutes, or an hour-long recording of which only forty minutes correspond to the comedy program) may be identified based on a fingerprint obtained from the recording, the length of the recording, or another suitable mechanism. In an embodiment, in response to identifying the partial recording of the media content, the media content may be automatically re-recorded, as shown in FIG. 12 and described above.
In embodiments, fingerprint-based tags may be generated to mark the start point and/or end point of media content. For example, a tag marking a particular frame that indicates the start and/or end time of a program may be generated, based on the acquired fingerprint, by a media device receiving the content stream. In another example, a content source may use a fingerprint obtained from media content to identify the exact start and end times of the media content, and then tag frames before streaming the content to a media device to indicate the start point and/or end point. In embodiments, any other fingerprint-based implementation may be used in which the start point and/or end point of the media content is detected from a fingerprint obtained from the media content.
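One way to derive start/end tags from fingerprints, as described above, is to find the first and last frames whose fingerprints belong to the program; the frame and fingerprint labels below are hypothetical.

```python
def tag_boundaries(frame_fps, program_fps):
    """Return (start, end) frame indices of a program within a stream,
    based on which frame fingerprints belong to the program."""
    hits = [i for i, fp in enumerate(frame_fps) if fp in program_fps]
    if not hits:
        return None  # program not present in this stream
    return hits[0], hits[-1]

stream = ["ad1", "ad2", "p1", "p2", "p3", "ad3"]  # per-frame fingerprints
program = {"p1", "p2", "p3"}                       # known program fingerprints
print(tag_boundaries(stream, program))  # (2, 4)
```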
6.0 Publishing Recording or Viewing Information
Fig. 4 illustrates a flow diagram for detecting playback of an advertisement, according to an embodiment. One or more of the steps described below may be omitted, repeated, and/or performed in a different order. Accordingly, the particular arrangement of steps shown in FIG. 4 should not be construed as limiting the scope of the invention.
In an embodiment, a command is received to view or record media content on a first device associated with a first user (step 402). The command to view or record media content may be received through a selection in an Electronic Program Guide (EPG). The command may be for a single recording of media content (e.g., a movie, sporting event, or particular television show) or for a series of recordings of media content (e.g., a multi-episode television show). A command may also be received to play a media content file stored locally in memory (e.g., a DVD player may receive a command to play a DVD, and a digital video recorder may receive a command to play a stored recording). In an embodiment, a single media device may receive all such commands and instruct other devices (e.g., a DVD player or Blu-ray player) accordingly.
According to an embodiment, the viewing or recording of media content on the first device is published (step 404). Publication of the viewing or recording of media content may be user-specified. For example, the viewing or recording of the media content may be posted on a web page associated with the user (e.g., the user's page on a social networking website such as MySpace® or Facebook®; MySpace® is a registered trademark of MySpace, Inc., Beverly Hills, CA, and Facebook® is a registered trademark of Facebook, Inc., Palo Alto, CA), published on a group page (e.g., a web page designated for the group), sent to other users via e-mail, provided as a text message, or published in any other manner. In an embodiment, all of the user's viewings or recordings may be automatically sent to other users on a list who have elected to accept messages from the user (e.g., using Twitter®; Twitter® is a registered trademark of Twitter, Inc., San Francisco, CA). The published viewing or recording of the media content may also include a fee associated with the media content. For example, if the user selects a pay-per-view movie, the price of the movie may also be published. In an embodiment, publishing the viewing or recording of the media content may involve publishing the name of the user (or a user name associated with the user) on a publication associated with the media content. For example, all users who have viewed particular media content may be listed on a single web page of a social networking website. Any user who has responded to a publication related to particular media content (e.g., "like," "approve," "share," etc.), thereby indicating that the user has viewed the particular media content, may likewise be listed on a single web page.
In an embodiment, in response to receiving a command to record media content on a first device associated with a first user, the media content is recorded on both the first device and a second device associated with a second user (step 406). For example, the first device may notify the second device of a scheduled recording of media content, and the second device may automatically record the media content. In another example, the second device may prompt the second user about recording the media content in response to the notification by the first device; the second device then records the media content upon receiving the second user's command to do so. In an embodiment, the recording of the media content on the second device may follow the publication of the recording on the first device (e.g., on a website), as described above. For example, the second user may select a link on the website, related to the publication of the media content recording on the first device, to record the media content on the second device associated with the second user. In an embodiment, a media device may emulate other media devices by recording all programs recorded by those other media devices.
According to an embodiment, the recording of the same media content on multiple devices may be detected (step 408). For example, different users within a user group may schedule recordings of the same media content on their respective media devices. The scheduled recordings for each media device associated with a user within the user group may be collected and compared (e.g., by a server, a service, or one of the media devices) to detect any overlapping scheduled recordings. In embodiments, media content that has already been recorded on a media device may be compared to media content that has been recorded on, or is scheduled for recording on, another media device.
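The overlap detection of step 408 can be sketched by grouping scheduled recordings by content title; the device identifiers and titles below are hypothetical.

```python
from collections import defaultdict

def overlapping_recordings(schedules):
    """schedules: device id -> set of scheduled content titles.
    Returns content scheduled on more than one device (step 408)."""
    devices_by_content = defaultdict(set)
    for device, titles in schedules.items():
        for title in titles:
            devices_by_content[title].add(device)
    return {t: d for t, d in devices_by_content.items() if len(d) > 1}

schedules = {
    "device-A": {"News Hour", "Comedy Show"},
    "device-B": {"Comedy Show", "Nature Doc"},
    "device-C": {"Comedy Show"},
}
print(overlapping_recordings(schedules))  # only "Comedy Show" overlaps
```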
In an embodiment, a media device may be configured to automatically schedule the recording of any media content scheduled for recording by another specified media device. Thus, the media device may be configured to emulate another media device identified by a device identification number. The media device may also be configured to emulate any device associated with a given user. For example, based on the second user's publications on a social networking website, the first user may determine that the second user has selected several new shows or programs. The first user may then choose to emulate the second user's television-watching habits by submitting an emulation request with the identification number of the media device associated with the second user, or with the second user's name. Alternatively, the first user may indicate this preference on the social networking website. The social networking website may then communicate the identities of the first user and the second user to a content source, which configures the media device associated with the first user to record the same programs recorded by the media device associated with the second user.
In an embodiment, each media device may be configured to access a database of media device recording schedules (e.g., on a server, provided by a third-party service, etc.). Users can access this database using their own media devices and emulate the recordings of another media device, identified by the name or identification of a particular user. For example, a user may select a particular program that is also being recorded by another user. In an embodiment, users are able to access other recording-related statistics to select programs for viewing or recording. For example, the media device recording database may indicate the most popular programs based on scheduled future recordings, on completed recordings, or on the number of users who watched the programs when they were available on the content stream.
According to an embodiment, media content may be scheduled for simultaneous play on multiple devices (step 410). The play time of the media content may be selected automatically or based on input from one or more users. For example, all users associated with media devices that are scheduled to record (or have recorded) particular media content may be notified of the overlapping selections, and one user may select a time at which all users view the media content simultaneously on their respective media devices. In another example, each media device may access a user availability calendar to determine available viewing times for the respective user. Thereafter, the synchronized viewing of the program may be scheduled at a time when all users (or most users) are available.
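The calendar-based selection of a simultaneous viewing time (step 410) amounts to intersecting the users' availability; the slot labels below are hypothetical.

```python
def common_slots(calendars):
    """Intersect per-user availability calendars to find candidate
    simultaneous viewing times. Each calendar is a set of free slots."""
    return sorted(set.intersection(*calendars))

user1 = {"Fri 8pm", "Sat 7pm", "Sun 3pm"}
user2 = {"Sat 7pm", "Sun 3pm"}
user3 = {"Sat 7pm", "Mon 9pm"}
print(common_slots([user1, user2, user3]))  # ['Sat 7pm']
```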
According to an embodiment, viewers/recorders of the same media content may be automatically registered with a group related to the media content (step 412). For example, in response to each recording/viewing of a particular movie, all viewers and/or recorders of that movie may be automatically registered with a social network group related to the movie. The automatically registered group may be used by users as a forum to discuss the media content, to find other users with similar viewing preferences, to schedule viewing times for similar recordings, or for any other suitable purpose. A forum may be initiated by two or more users associated with multiple devices that are playing media content synchronously. The forum may be populated by inviting users to an instant messaging chat (e.g., Yahoo!® Instant Messaging, Google® Chat, AIM®, Twitter®, etc.; Yahoo!® is a registered trademark of Yahoo!, Inc., Sunnyvale, CA; Google® is a registered trademark of Google, Inc., Mountain View, CA; AIM® is a registered trademark of AOL LLC, Dulles, VA; and Twitter® is a registered trademark of Twitter, Inc., San Francisco, CA), a video chat (e.g., Skype®; Skype® is a registered trademark of Skype Limited Corp., Dublin, Ireland), a website topic, or an electronic mail (email) thread. A forum may include two users or any number of users. Forums may be created for users who are already known to be connected. For example, a forum may be created if the users are friends on a social networking site. In embodiments, a forum may be created to introduce suppliers to potential customers. For example, during the playing of a football game, an invitation may be presented to chat with a provider of tickets to football games. In embodiments, the forum may act as a dating portal. For example, a man and a woman in the same geographic area who subscribe to a dating service and are viewing the same program may be invited to chat by the media device. Another example relates to an activity portal.
For example, the media device may be configured to invite viewers of a cooking channel program to cook together, or the media device may be configured to invite viewers of a travel channel program to travel together to a featured destination. As described above, the media device may be configured to communicate with any other computing device (e.g., other media devices or personal computers).
7.0 Obtaining Fingerprints from Media Content
Fig. 5 illustrates a flow diagram for acquiring fingerprints from media content, according to an embodiment. One or more of the steps described below may be omitted, repeated, and/or performed in a different order. Accordingly, the particular arrangement of steps shown in FIG. 5 should not be construed as limiting the scope of the invention.
In an embodiment, a media device is monitored to determine whether the media device satisfies an idleness criterion (step 502). The idleness criterion may be based on a percentage of inactivity or usage of the media device or a component (e.g., the fraction of total bandwidth that is available, or the fraction of total processing capacity that is available). The media device may be self-monitoring or monitored by a server. Monitoring the media device for the idleness criterion may involve detecting the end of a period of time during which no user command was received. Monitoring the media device for the idleness criterion may involve detecting the availability of resources for receiving media content and/or acquiring fingerprints from it. Monitoring the media device may include monitoring different components of the media device independently. For example, if a user is watching a recording stored on the media device and is not recording any additional content being streamed to the media device, the tuner may be idle. Based on this information, a determination may be made that the tuner satisfies the idleness criterion. Thus, different components of the media device may be associated with independent idleness criteria. In another example, the components necessary to obtain a fingerprint from media content may together satisfy an idleness criterion.
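Per-component idleness checks like those above can be sketched as threshold comparisons on available capacity; the component names and threshold values below are illustrative assumptions, not values from the specification.

```python
# Hypothetical per-component thresholds: a component satisfies its idleness
# criterion when its available capacity, as a fraction of total capacity,
# meets or exceeds the threshold.
IDLE_THRESHOLDS = {"tuner_bandwidth": 0.5, "processor": 0.3}

def idle_components(available_fraction):
    """Evaluate each monitored component against its own idleness criterion."""
    return {comp: available_fraction[comp] >= thresh
            for comp, thresh in IDLE_THRESHOLDS.items()}

# Tuner has 80% of bandwidth free, processor only 10% free.
print(idle_components({"tuner_bandwidth": 0.8, "processor": 0.1}))
```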
In an embodiment, the media device receives media content from a content source for the purpose of obtaining a fingerprint from the media content (step 504). The media device may receive the media content in response to notifying the content source that the media device (or a component within the media device) satisfies the idleness criterion. In an embodiment, the content source may automatically detect whether the media device satisfies an idleness criterion. For example, the content source may determine that the media device has not requested any particular media content for viewing (e.g., broadcast content, web page content, etc.); thus, the tuner is likely to have bandwidth available to download media content. In an embodiment, a media device may include functionality to receive multiple content streams. In this case, the content source may determine how many content streams are currently being received by the media device. Based on the known configuration and/or functionality of the media device, the content source may determine the bandwidth available for the tuner to receive additional media content. Once the idleness criterion is met, the content source may transmit the particular media content to the media device so that the media device can generate a fingerprint.
In an embodiment, a content source may build a fingerprint database for media content by distributing the media content to be played out among a plurality of media devices that satisfy the idleness criterion. For example, if five thousand media devices meet the idleness criterion and twenty thousand unique media content files are to be fingerprinted, the content source may transfer four unique media content files to each of the five thousand media devices, each device generating the corresponding fingerprints. In embodiments, the content source may send each unique media content file to two or more media devices in case a fingerprint obtained from one media device contains errors, or a media device is interrupted while obtaining the fingerprint. The content source may also instruct a media device to fingerprint content that has already been downloaded to the media device (e.g., in response to a user's command). In embodiments, a user may resume using the media device, thereby interrupting or preventing the media device from acquiring fingerprints. In an embodiment, before the media content is downloaded to the media device, the content source may prompt the user for permission to use the media device when the idleness criterion is satisfied. The content source may also provide incentives, such as credits toward pay-per-view movies, if the user allows the content source to use the media device to implement and/or perform a particular function (e.g., acquiring fingerprints).
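The distribution of content files among idle devices, with redundant copies in case a device errs or is interrupted, can be sketched as a round-robin assignment; the device and file names below are hypothetical.

```python
def assign_files(files, devices, copies=2):
    """Assign each file to `copies` devices, round-robin, so a failed or
    interrupted fingerprinting job is covered by another device.
    Consecutive round-robin slots are distinct as long as
    copies <= len(devices)."""
    assignments = {d: [] for d in devices}
    slot = 0
    for f in files:
        for _ in range(copies):
            assignments[devices[slot % len(devices)]].append(f)
            slot += 1
    return assignments

plan = assign_files(["file1", "file2", "file3"], ["dev1", "dev2", "dev3"])
print(plan)  # each file appears on two distinct devices
```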
In an embodiment, the fingerprint is obtained from the media content on the media device (step 506). Any technique may be used to obtain a fingerprint from media content. One example is to obtain a fingerprint from a video frame based on the intensity values of pixels within the video frame. A function (e.g., a function downloaded to the media device) may be applied to each intensity value, and based on the result a signature bit (e.g., "0" or "1") may be assigned to the intensity value. A similar technique can be used for audio fingerprinting by applying the method to a spectrogram generated from the audio data.
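The intensity-based fingerprinting described above can be sketched as follows, using a simple thresholding function as the per-pixel signature-bit assignment; the threshold value is an illustrative assumption, and a real implementation would use the function supplied by the content source.

```python
def frame_fingerprint(intensities, threshold=128):
    """Derive a bit-string fingerprint from pixel intensity values: each
    intensity maps to a signature bit, '1' if the value is at or above
    the threshold, '0' otherwise."""
    return "".join("1" if v >= threshold else "0" for v in intensities)

frame = [200, 3, 130, 90, 255, 17]  # toy 1x6 "frame" of intensities
print(frame_fingerprint(frame))  # 101010
```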
The fingerprint may be obtained by the media device based on specific instructions from the content source. For example, a fingerprint may be obtained from every video frame of a particular media content file. Alternatively, a fingerprint may be obtained from every nth frame, or from every I-frame, received by the media device. In an embodiment, particular frames to be fingerprinted may be marked. Marking techniques are described in applications 09/665,921, 11/473,990, and 11/473,543, all of which are owned by the present applicant and incorporated herein by reference. Once the media device receives a marked frame, the media device may decompress the frame, analyze it, and obtain a fingerprint from it. The video frame fingerprints may be classified by the media device according to the media content (e.g., by name, episode number, etc.).
In an embodiment, a media device may obtain fingerprints for media content being viewed by a user. For example, a user may select a particular program in an electronic program guide displayed by the media device. The media device may then request, from a content source, a content stream that includes the particular program. As an optional step, the content source may indicate whether fingerprints are needed for the particular program requested by the media device. The indication may be a flag in the data received by the media device. If the flag indicates that the particular program requires fingerprinting, the media device may decompress the corresponding video frames, load the decompressed video frames into memory, and analyze them to obtain fingerprints. In an embodiment, a user may change channels partway through the playing of the media content being fingerprinted, and the tuner may then be used to receive a different content stream. In this case, the media device may have acquired a fingerprint for only a portion of the media content. The media device may generate metadata indicating the start and end positions, in the playing of the media content, for which a fingerprint has been acquired.
In an embodiment, the media device may then upload the fingerprint obtained from the media content (or from a portion of the media content) to a fingerprint server (step 508). Thus, a fingerprint database may be built by a plurality of media devices, each uploading fingerprints of media content. A fingerprint received for only a portion of media content may be combined with other fingerprints from the same media content to produce a complete fingerprint. For example, if one media device generates and uploads fingerprints for the video frames of the first half of a program, while a second media device generates and uploads fingerprints for the video frames of the second half of the same program, the fingerprints received from the two devices may be combined to derive fingerprints for all of the video frames of the program.
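Combining partial fingerprints uploaded by different devices, as in the example above, can be sketched as a merge keyed by frame index; the fingerprint values are hypothetical.

```python
def combine_partials(total_frames, *partials):
    """Merge per-frame fingerprints uploaded by different devices.
    Each partial maps frame index -> fingerprint; also report whether
    the merged set now covers the whole program."""
    combined = {}
    for part in partials:
        combined.update(part)
    complete = len(combined) == total_frames
    return combined, complete

first_half = {0: "fA", 1: "fB"}    # uploaded by one media device
second_half = {2: "fC", 3: "fD"}   # uploaded by another media device
fps, complete = combine_partials(4, first_half, second_half)
print(complete)  # True
```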
FIG. 6 illustrates an exemplary architecture for collecting and storing fingerprints acquired by media devices, in accordance with one or more embodiments. The fingerprint management engine (604) generally represents any hardware and/or software that may be configured to obtain fingerprints acquired by media devices (e.g., media device A (606), media device B (608), media device C (610), media device N (620), etc.). The fingerprint management engine (604) may be implemented by a content source or by another system/service that includes functionality to obtain fingerprints acquired by a media device. The fingerprint management engine (604) may obtain fingerprints for media content that has already been received by a media device (e.g., in response to a user's selection of the media content or of a content stream containing the media content). The fingerprint management engine (604) may transmit media content to a media device specifically for the purpose of acquiring fingerprints. The fingerprint management engine (604) may transmit media content to a media device for fingerprinting in response to detecting that the media device is idle. In an embodiment, the fingerprint management engine (604) maintains a fingerprint database (602) for storing and querying the fingerprints obtained by media devices.
8.0 Presenting Messages
Fig. 7 shows a flow diagram for presenting a message, according to an embodiment. One or more of the steps described below may be omitted, repeated, and/or performed in a different order. Accordingly, the particular arrangement of steps shown in FIG. 7 should not be construed as limiting the scope of the invention.
Initially, message preferences associated with a user are received (step 702). Message preferences generally refer to preferences related to message content, message timing, message filtering, message priority, message presentation, or any other message-related characteristic. For example, a message preference may indicate that a message is to be presented as soon as it is received, or held until a particular time (e.g., when an advertisement is being played). Message preferences may indicate different treatment based on the source of the message or the recipient of the message. For example, messages from a particular website, Really Simple Syndication (RSS) feed, or particular user may be classified as high-priority messages to be presented first or upon receipt, while low-priority messages may be held for a particular period of time. Message preferences may indicate whether a message is to be presented upon receipt, converted to text, converted to audio, presented in a particular manner/format/style, and so on. Message preferences may be associated with automated operations, in which receipt of a particular message causes a specified action to be performed automatically. One or more preferences (e.g., message preferences), viewing history, and/or other information related to the user form a user profile.
In an embodiment, the message preferences may include user-defined alarm conditions. For example, an alarm condition may include receiving an email, a voicemail, a text message, an instant message, a Twitter message, etc., that satisfies a particular condition. An alarm condition may include a particular operation performed by a user on a specified list. For example, the alarm condition may be that a particular user posts a hiking activity invitation on a web page. An alarm condition may be based on particular keywords in a communication, a topic related to the communication, and so on. For example, an alarm condition may be satisfied if the word "emergency" or "urgent" is found in a communication. An alarm condition may be related to security (e.g., a house alarm or a car alarm going off). An alarm condition may be associated with a kitchen appliance; for example, an oven timer going off. An alarm condition may include a change in the status of an entity specified by the user. For example, the alarm condition may be associated with the time at which a user of a social networking website changes status from "in a relationship" to "single." An alarm condition may include the availability, in a content stream, of particular media content selected based on the user profile. For example, the user profile may include a viewing history, names of actors, types of media content, or languages associated with media content. If media content matches any portion of the user profile, the alarm condition may be satisfied and an alert may be issued accordingly.
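Keyword-based alarm conditions like the "emergency"/"urgent" example above can be sketched as a simple set test; the keyword list comes from the example, and the punctuation handling is an illustrative assumption.

```python
# User-defined alert keywords (from the example above).
ALERT_KEYWORDS = {"emergency", "urgent"}

def alarm_triggered(message_text):
    """True if the communication contains any user-defined alert keyword."""
    words = {w.strip(".,:;!?") for w in message_text.lower().split()}
    return not ALERT_KEYWORDS.isdisjoint(words)

print(alarm_triggered("Urgent: please call home"))  # True
print(alarm_triggered("Dinner at eight?"))          # False
```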
In embodiments, the message preferences may be received as direct input from the user, determined based on the user's profile, or obtained from the internet (e.g., from a web page or other profile associated with the user, by querying a database, etc.). Message preferences may also be obtained by monitoring usage patterns of the media device. For example, if the usage pattern indicates that the user checks a message immediately upon receiving notification of it, the message preferences may indicate that messages are to be displayed or played immediately. The user's message preferences may also be based on the sender; for example, the sender of a message may indicate a delivery method and/or delivery preferences. Message preferences may be modified at any time (e.g., by user input), periodically, or continuously.
In an embodiment, a command to play media content is received (step 704). The command may be submitted by the user via a keyboard, remote control, mouse, joystick, microphone, or any other suitable input device. The command may be a user selection, in an Electronic Program Guide (EPG), of media content for playing. The command may be a channel selection entered by the user. The command may be a request to display a picture slideshow. The command may be a request to play an audio file. The command may be a request to play a movie (e.g., on a Blu-ray player). In an embodiment, receiving a command to present media content may include the user entering a title of the media content in a search field on a user interface. The command to play media content may also be a user selection of particular media content stored in memory.
In an embodiment, media content is played (step 706). In embodiments, the media content may be played in response to a command or without receiving a command. For example, a user may turn on a media device that is automatically configured to receive a content stream on a last selected channel or a default channel. In embodiments, a media device may select media content to play based on user preferences, or in response to the playing or recording of the media content on another media device.
In an embodiment, a message may be received while media content is being played (step 708). The message may be received from a local resource or from a remote resource over a network (e.g., the internet, an intranet, a broadcast service, etc.). The message may be received from a web service over an internet connection. For example, friend messages or status changes related to a social networking website may be received from a web service. The web service may be configured to provide all information related to the social networking website or filtered message selections related to particular preferences. Another example may include a Really Simple Syndication (RSS) feed, which may be received from a web service related to news, sports, entertainment, weather, stocks, or any other suitable category. In an embodiment, the message may be received from a content source related to a service provided by the content source. For example, the message may indicate the availability of a car procurement service, or the availability of a particular car for sale.
The message may be directed to a user or group of users (e.g., a voicemail, text message, email, etc.). The message may be received in a format different from its original format. For example, a text message may be received as an audio file, or the text message may be converted to an audio file by the media device upon receipt. Conversely, an audio file may be received as a text message or converted to a text message. In embodiments, symbols, abbreviations, images, etc., may be used to represent messages. In embodiments, messages received in one language may be translated into a different language.
In an embodiment, receiving the message may include detecting the occurrence of a user-defined alarm condition. For example, all messages may be monitored and compared against the user-defined alarm conditions. In embodiments, EPG data, RSS feeds, web pages, event logs, display information obtained using optical character recognition (OCR), or any other source of information may be monitored for the occurrence of an alarm condition. If any received message matches an alarm condition, the occurrence of the alarm condition may be identified, and an alert indicating that the alarm condition has occurred may be presented immediately. The message indicating that the alarm condition occurred may be presented based on user preferences.
It may be determined whether to present the message immediately, present the message later, or not present the message at all (step 710). Based on the user preferences, the received message may be presented immediately (step 717) or retained until a later time. The message may be presented during an advertisement break, when the user elects to view messages, at a scheduled time, or at any other suitable time. Messages may also be filtered based on user preferences. For example, each received message may be compared to a user-defined alert condition to determine whether the message matches it. Messages that match the user-defined alert condition are presented, while messages that do not match are filtered out.
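The filtering step described above can be sketched as follows. This is an illustrative sketch only; the function names and the keyword-matching scheme are assumptions, not part of the specification:

```python
# Illustrative sketch: filter incoming messages against user-defined
# alert conditions (all names are hypothetical, not from the patent).

def matches_alert(message: str, alert_conditions: list) -> bool:
    """A message matches if it contains any alert keyword (case-insensitive)."""
    text = message.lower()
    return any(cond.lower() in text for cond in alert_conditions)

def filter_messages(messages: list, alert_conditions: list) -> list:
    """Keep only messages that match a user-defined alert condition;
    non-matching messages are filtered out (never presented)."""
    return [m for m in messages if matches_alert(m, alert_conditions)]
```

A real device would likely match against structured conditions (source, sender, category) rather than raw keywords; substring matching stands in for that here.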
In an embodiment, presenting the message may include displaying the message in a visual format and/or playing the message in an audio format. For example, a message may be rendered by loading media content into a frame buffer and writing the message content into the frame buffer, overwriting a portion of the media content frame. The contents of the frame buffer may then be displayed on the screen. In another exemplary implementation, separate buffers may be used for the media content and the message content, with the content on the display screen obtained from both buffers. In an embodiment, presenting the message may include displaying the message information while simultaneously playing an audio file containing the message information. The message information displayed on the screen and played in the audio file may be the same or different. For example, the display screen may show the face of a person associated with or announcing the message, while the audio file contains the actual message. In an embodiment, playing the audio message may include muting or reducing the volume associated with playing the media content.
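The frame-buffer technique above can be sketched in miniature. This is a pure-Python stand-in for illustration only (a real device would blit pixel data in hardware); the function and parameter names are hypothetical:

```python
# Minimal sketch of overlaying message content onto a video frame held in a
# frame buffer, overwriting a rectangular portion of the media content.

def overlay_message(frame, message_rows, top, left):
    """Overwrite a region of `frame` (a list of rows of pixels) with
    `message_rows`, starting at row `top`, column `left`."""
    out = [row[:] for row in frame]                 # copy the frame buffer
    for dy, msg_row in enumerate(message_rows):
        for dx, pixel in enumerate(msg_row):
            out[top + dy][left + dx] = pixel        # overwrite media pixels
    return out
```

The two-buffer variant mentioned in the text would instead compose the display output from the media buffer and a separate message buffer at scan-out time, leaving the media frames untouched.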
9.0 interpreting commands
FIG. 8 illustrates a flow diagram for interpreting a voice command, according to an embodiment. One or more of the steps described below may be omitted, repeated, and/or performed in a different order. Accordingly, the particular arrangement of steps shown in FIG. 8 should not be construed as limiting the scope of the invention.
Initially, one or more users present in the vicinity of the multimedia device are identified (step 802). The one or more users may be identified based on voice input received by the multimedia device or by an input device associated with the multimedia device (e.g., microphone, remote control). For example, the multimedia device (or associated input device) may be configured to periodically sample detectable speech input and compare the speech input to data representing the speech of known users in order to identify a known user. The data representing a user's voice may be generated by the user performing voice training, whereby the multimedia device receives voice samples associated with the user. The user may be identified in an active or a passive mode: for example, in response to an explicit user command to identify the user, or automatically without any specific user command. Although speech recognition is used as an example, other methods of identifying the user may be used. For example, a user name may be entered via an input device (e.g., keyboard, mouse, remote control, joystick, etc.). The user may be identified based on metadata associated with the household. The user may be identified using fingerprint detection on the media device or on another communicatively connected device (e.g., a remote control).
In an embodiment, a voice command of a user is received (step 804). The voice command may be preceded by the user indicating that a voice command is about to be given. For example, a user may speak a keyword (e.g., "command") or make an input on a device (e.g., a remote control) indicating that the user is about to submit a voice command. Alternatively, the voice command may be received by continually processing all voice input and comparing it to known commands to determine whether a voice command has been submitted. For example, the speech input from the last n seconds may be continuously submitted for analysis to determine whether a voice command was received within the last n seconds. In an embodiment, different portions of a voice command may be received from different users. For example, the command "record" may be received from a first user, while the title of a program/show may be received from a different user. Examples of other commands include "order pizza", "tweet this game is amazing", "wall post who wants to come watch the Emmys", etc. Although voice commands are used in this example, any type of input may be accepted (e.g., using a mouse, keyboard, or joystick).
The command may be interpreted (step 806) based on preferences associated with the one or more identified users (e.g., in a user profile) to determine an action to be performed (step 808). Interpreting the command may involve determining whether the command applies to a single user (e.g., the user who submitted the command) or to multiple users (e.g., the multiple users identified in step 802). A particular command word may indicate a single-user command or a multi-user command. For example, a message-posting command may be interpreted by default as applicable to a single user, e.g., the user submitting the command. Further, commands may be interpreted based on user preferences/settings. If the user submitting the command "tweet this game is amazing" is associated with a Twitter account, the action to be performed is to generate a message for the user's Twitter account containing the text "this game is amazing". Another example of a single-user command is "wall post who wants to come watch the Emmys". In this case, the command may be identified as a Facebook wall post, and the message "who wants to come watch the Emmys" may be published on the user's Facebook profile. The multimedia device may be configured such that certain types of commands are treated as multi-user commands. For example, an order for a food item may be associated with all identified users. The command "order pizza" may be interpreted as an order for a pizza with toppings that match the preferences of all identified users. The command "buy tickets" may be interpreted as an order to purchase, for all identified users, tickets to a football game currently being advertised on the television. Commands may thus be intentionally ambiguous, to be fully interpreted based on the identified users.
For example, the command "play recorded show" may cause each recorded show on the media device to be evaluated to determine, based on user preferences, how many of the identified users like it. Thereafter, the recorded program that matches the preferences of the largest number of identified users is selected for playback.
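The selection logic above can be sketched as follows; the data model (titles and per-user preference sets) is a hypothetical simplification for illustration:

```python
# Hypothetical sketch of interpreting the ambiguous command "play recorded
# show": pick the recording liked by the greatest number of identified users.

def select_recording(recordings, user_preferences):
    """`recordings` is a list of recorded titles; `user_preferences` maps
    each identified user to the set of titles that user likes."""
    def score(title):
        # Count how many identified users like this recording.
        return sum(1 for likes in user_preferences.values() if title in likes)
    return max(recordings, key=score)
```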
In embodiments, all or a portion of the command interpretation may be confirmed by the user prior to execution. For example, when ordering a pizza, the pizza toppings selected based on user preferences may be presented for confirmation. As another example, confirmation may be required for any command that involves spending money or exceeds a monetary threshold.
In an embodiment, the command may be interpreted based on permissions associated with the user, and the command may be executed only if the user giving the command has permission to give it. For example, recording and/or playing an R-rated movie may be limited to users over the age of seventeen. A profile including the age of the user may be stored for each user. If an identified user over seventeen gives a command to record/play an R-rated movie, the command is executed. However, if a user under seventeen gives such a command, the command is rejected. In an embodiment, the command may be interpreted based on the user's religious and/or political beliefs. For example, if a Democrat user submits a command, election programs hosted by Democrats may be recorded, and if a Republican user submits a command, election programs hosted by Republicans may be recorded.
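A minimal sketch of the age-based permission check above, assuming a simple profile and command structure (both hypothetical):

```python
# Illustrative permission check: a command to record or play an R-rated
# movie executes only for users over seventeen.

MIN_AGE_R_RATED = 18  # "over the age of seventeen"

def command_permitted(command, user_profile):
    if command.get("rating") == "R":
        return user_profile.get("age", 0) >= MIN_AGE_R_RATED
    return True  # non-restricted content needs no age check

def execute(command, user_profile):
    return "executed" if command_permitted(command, user_profile) else "rejected"
```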
In an embodiment, the language used to submit a command may itself be used to interpret the command. For example, if a command to record a program is submitted in French, French subtitles may be selected from the set of available subtitle streams and recorded with the program. In another example, if multiple audio streams are available in different languages, an audio stream may be selected based on the language of the command.
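The stream-selection rule above can be sketched as follows; the stream records and the fallback-to-default behavior are illustrative assumptions:

```python
# Sketch: select a subtitle/audio stream matching the detected language of
# the voice command, falling back to a default language if none matches.

def select_stream(command_language, available_streams, default="en"):
    """`available_streams` is a list of dicts with a "lang" key."""
    for stream in available_streams:
        if stream["lang"] == command_language:
            return stream
    # No stream in the command's language; fall back to the default.
    return next(s for s in available_streams if s["lang"] == default)
```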
10.0 associating input with media content
Fig. 9 illustrates a flow diagram for associating annotations with media content, according to an embodiment. One or more of the steps described below may be omitted, repeated, and/or performed in a different order. Accordingly, the particular arrangement of steps shown in FIG. 9 should not be construed as limiting the scope of the invention. Further, although specific types of annotations (e.g., audio, text, graphics, etc.) may be discussed in the examples below, embodiments of the present invention are applicable to any type of annotation.
In an embodiment, media content is played (step 902). The media content may include audio and video content, or video content alone. While the media content is playing, audio input received from the user may be recorded (step 904). The audio input received from the user may be a general reaction to the media content. For example, the audio input may include laughter, excitement (e.g., gasps, "wow", etc.), commentary, criticism, praise, or any other reaction to the media content. In an embodiment, the commentary may include audio input intended for subsequent playbacks of the media content. For example, in a documentary about travel destinations, a user can submit voice input that includes stories or memories related to a particular travel destination featured in the documentary. In another example, a band may provide lyrics during a particular portion of the media content, to be recorded in association with that portion. In another embodiment, the user may provide commentary, storylines, character dialog, or any other information about the media content in an alternate language during playback of the media content in the original language. Different versions of audio input (e.g., by the same user or by different users) may be recorded in association with particular media content. In an embodiment, the audio input may be accompanied by playback instructions. For example, the playback instructions may indicate that the submitted audio is to completely replace the original audio, or to be played simultaneously with the original audio. In an embodiment, the audio input may be automatically generated by a text-to-speech converter that generates speech based on text associated with the media content. For example, speech in an alternate language may be generated based on closed captioning text in that language.
In an embodiment, optical character recognition may be used to identify building names, letters, team names, etc. displayed on a screen and convert them to audio for visually impaired viewers or for viewers unable to read information (e.g., due to language barriers or age). In embodiments, the audio input may be received while a particular portion of the media content is played and stored in association with the particular portion of the media content.
In an embodiment, the media content is then played with the audio input received during a previous playing of the media content (step 906). Playing the additional audio input received during the previous playing may include playing it in place of, or concurrently with, the original audio stream. In an embodiment, the additional audio input may be a feature that can be turned on or off during playback of the corresponding media content. In an embodiment, multiple versions of additional audio input may be available, and a user may select a particular additional audio input to play during the playing of the media content. For example, an online community may be established for submitting and downloading commentaries that play with different movies. Different users may record audio input related to a particular movie (or other content) using different media devices and then upload the audio input to be associated with the movie. When a purchaser downloads a movie, the purchaser can select another user's commentary (e.g., audio input) to be downloaded/played with the movie. If the purchaser finds a particular user's commentary interesting, the purchaser can set that user as a default commentator when downloading movies (or other media content) and download any commentary by that user.
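The commentary-selection behavior above can be sketched as follows; the data model (a map of commentator names to track identifiers) and the precedence order are illustrative assumptions:

```python
# Sketch of selecting which user-submitted commentary track accompanies a
# downloaded movie: an explicit request wins, then the purchaser's default
# commentator, else no commentary.

def choose_commentary(tracks, requested_user=None, default_user=None):
    """`tracks` maps commentator name -> audio track identifier."""
    for user in (requested_user, default_user):
        if user in tracks:
            return tracks[user]
    return None  # play the original audio only
```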
Although audio input is used as an example of an annotation of media content, any type of annotation may be used in accordance with embodiments of the invention. For example, during the playing of media content, text may be entered or images may be submitted by one or more users. In embodiments, all or a portion of the annotations or annotation sets may be processed or analyzed to obtain new content. In embodiments, annotation sets related to the same media content may be compared to identify annotation patterns. For example, the annotation sets may be analyzed to determine the most-annotated points within the media content. Thus, scenes or actors that elicit strong excitement (or other emotion) from users may be identified via the annotations. In another example, user content included in an annotation set (e.g., a text or voice note) may be analyzed to determine collective user sentiment (e.g., the most interesting scenes in a movie, or the most interesting movies released in 2009).
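Finding the most-annotated point can be sketched by bucketing annotation timestamps into fixed windows; the window size and function names are illustrative assumptions:

```python
# Sketch: find the most-annotated point in media content by bucketing
# annotation timestamps (in seconds) into fixed-width windows.

from collections import Counter

def most_annotated_window(timestamps, window=30):
    """Return the start (in seconds) of the `window`-second span containing
    the most annotations across all users' annotation sets."""
    buckets = Counter(int(t) // window * window for t in timestamps)
    return buckets.most_common(1)[0][0]
```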
11.0 obtaining annotations by personal media device
In embodiments, annotations (including audio input, text input, graphical input, etc.) may be made before, during, or after presentation of the media content on a personal media device associated with the user. The annotations may be prompted at the option of an administrator, content producer, content director, etc. For example, upon completion of each performance in a talent competition show within the content stream received and displayed by the media device, the media device may prompt the user for an annotation (e.g., votes, comments, criticisms, praise, etc.). In embodiments, the resulting annotations (or other annotations) may be associated with the entire media content rather than with a specific point in the media content (as with the audio input submissions described above). The annotations of one or more users may then be processed (e.g., tallying votes, scores, etc.) for the media content.
In an embodiment, audio input is obtained by a media device from a user to establish a user profile. For example, reactions to different media content may be obtained from a user. Based on the reactions, a user profile may be automatically created that may include the user's interests, likes, dislikes, values, political views, and the like. The automatically created profile may be used for dating services, social networking websites, and the like. The automatically generated profile may be published on a web page (e.g., a page on a social networking website).
In an embodiment, the system can solicit user annotations to identify information related to the media content. For example, annotations may be solicited to identify faces that have been detected but cannot be automatically recognized. The system may also prompt a parent, after media content has been played, for an annotation indicating whether the media content is suitable for children.
12.0 tagging media content
In an embodiment, the user may use annotations to mark locations in the playing of media content. For example, a user may submit audio input or text input during the playing of media content that includes specific keywords such as "mark", "note", "record", etc., which instruct the system to mark the current location in the playing of the media content. The system may also automatically mark a particular location based on the user's reaction. For example, user input above a certain frequency or decibel level may indicate that the user is excited, and this point of excitement may be automatically stored. In an embodiment, the marked points may include a start point and/or an end point. For example, periods of high user activity, which may correspond to exciting parts of a sports game, may be marked by a start point and an end point. Parents may mark the beginning and end of media content that is not appropriate for children, so that the marked portions are skipped during playback unless a password is provided. The user may mark a segment of a home video for any number of purposes. The annotations may be stored in association with the marked points, whether the points are marked explicitly by the user or automatically based on the user's reaction. When a user marks a point, the annotation may include a reference to the original content, a time or frame offset from the beginning of the original content, and the UTC time. Although audio input is used as an example, input may also be submitted by pressing a key on a remote control device, clicking a mouse, entering a command on a keyboard, or using any other input method.
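The automatic excitement marking above can be sketched as a simple threshold detector over sampled input volume; the threshold value, sample format, and function names are illustrative assumptions:

```python
# Sketch: automatically mark start/end points of "excitement" wherever the
# sampled user input level (dB) rises above a threshold.

def mark_excitement(samples, threshold_db=70):
    """`samples` is a list of (time_sec, level_db) pairs in time order.
    Return a list of (start, end) spans where the level exceeds the
    threshold."""
    spans, start = [], None
    for t, level in samples:
        if level > threshold_db and start is None:
            start = t                          # excitement begins
        elif level <= threshold_db and start is not None:
            spans.append((start, t))           # excitement ends
            start = None
    if start is not None:                      # still excited at the end
        spans.append((start, samples[-1][0]))
    return spans
```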
In an embodiment, marking (or identifying) a particular point in the media content may involve tagging a media frame. For example, the media frames may be tagged with a tag, as described in applicant's own patent application 09/665,921 filed on 9/20/2000, which is incorporated herein by reference. Another example may involve tagging media frames with hash values, as described in applicant's own patent application 11/473,543 filed on 6/22/2006, which is incorporated herein by reference. In an embodiment, marking a particular point in the media content may involve deriving a fingerprint from one or more frames of the media content and using the fingerprint to identify the particular point. In an embodiment, a particular point may be marked by storing the time interval from the beginning of the media content to that point in its playing.
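The fingerprint-based lookup above can be sketched as follows. The hashing scheme shown (a truncated SHA-256 of raw frame bytes) is a stand-in for illustration only, not the fingerprinting method of the referenced applications:

```python
# Sketch of identifying a marked point by frame fingerprints: derive a
# compact fingerprint from frame data and look it up in a database mapping
# fingerprints to playback offsets.

import hashlib

def fingerprint(frame_bytes):
    # Illustrative fingerprint: truncated cryptographic hash of frame bytes.
    return hashlib.sha256(frame_bytes).hexdigest()[:16]

def locate(frame_bytes, fingerprint_db):
    """Return the playback offset (seconds) for a known frame, else None."""
    return fingerprint_db.get(fingerprint(frame_bytes))
```

A real fingerprint would be robust to re-encoding and scaling, unlike an exact hash; the lookup structure is the point here.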
In an embodiment, the locations marked by users may be selected by a user at a later time. For example, a user can browse the points marked by different users by pressing next or scan buttons during the playing of media content. An image of each marked point may be presented to the user, and the user may select a particular image to begin/resume playing the media content from the corresponding user-marked point. User annotations may be used to dynamically segment media content into different portions. The user annotations may also be used to filter out certain portions of the media content (e.g., periods with no annotations/excitement) and play only the remainder in the next playing of the media content.
13.0 publication of media content annotations
In embodiments, all or part of an annotation may be published (e.g., referenced or presented on a website or web service). In an embodiment, all or part of the annotations may be automatically presented to a user on another system. In one example, the user may request that the system send all or part of an annotation to an email or SMS address. In another example, a user may request that the system automatically add a movie to an online shopping cart or queue when another user (e.g., a movie critic or friend) has positively reviewed the movie. In embodiments, media content annotations may be offered by users for sale or trade in an online community for media content. In embodiments, annotations (e.g., media content with embedded annotations) may be transferred directly from one media device to another (e.g., via email, intranet, internet, or any other available communication method).
14.0 automatically generated annotations
In an embodiment, the system may derive annotation content for media content from the closed caption portion of the media content. In an example, the system may generate an annotation that includes a proper name identified by a natural language processing system and/or a semantic analysis system, and then associate the annotation with the point in the video content where the proper name appears in the closed captioning. In another example, when the phrase "we'll be back after these messages" or a similar phrase is recognized in the closed captioning, the system may generate an annotation indicating the beginning of an advertisement break. Another example includes the system producing annotations denoting portions of the media content whose closed captioning contains explicit language. The system may then provide an option to automatically mute the audio portions of the media content associated with the explicit closed caption language.
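The advertisement-break detection above can be sketched as phrase matching over timed captions; the phrase list and annotation labels are illustrative assumptions:

```python
# Sketch: generate an "ad break" annotation when a known break phrase
# appears in the closed-caption text.

import re

BREAK_PHRASES = [r"we'?ll be (right )?back after these messages",
                 r"stay tuned"]

def detect_ad_break(caption):
    text = caption.lower()
    return any(re.search(p, text) for p in BREAK_PHRASES)

def annotate(captions):
    """From (time, text) captions, yield (time, label) break annotations."""
    return [(t, "ad-break-start") for t, text in captions
            if detect_ad_break(text)]
```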
In an embodiment, the system may generate annotations using an optical character recognition system. In an example, the system may generate an annotation that includes the title of a movie being advertised. For example, the annotation may display the movie title (e.g., at the bottom of the screen) once it is identified, or at the end of the movie trailer. In another example, the system may generate an audio annotation that includes the names of cast members from the credits list of the video content. Another example may involve the system generating annotations indicating score changes during a sports game by analyzing, via OCR, the on-screen score box displayed during the game.
In an example, the system may detect that a user is browsing an Electronic Program Guide (EPG) by identifying a set of program and movie titles using OCR. The system may then generate a visual annotation in the EPG recommending the highest-rated program listed in the EPG. In embodiments, the annotations may also include other contextual information that may be used to further refine the recommendation. For example, the annotations may be based on content recently viewed by the user and may recommend content from the EPG of the same genre or featuring the same actors.
In an embodiment, the system may obtain annotation content using a speech-to-text system. For example, at the request of a user who has muted the audio or is hearing impaired, the system may generate a transcript of the dialog in the media content to be used in future presentations. In embodiments, the captured transcript may be processed by a separate system that monitors topics or persons of interest and then automatically generates annotations relating to those topics or persons.
15.0 Environment configuration
FIG. 10 illustrates an exemplary system for configuring an environment in accordance with one or more embodiments. In an embodiment, environment configuration engine (1015) generally represents any software and/or hardware configurable to determine an environment configuration (1025). The environment configuration engine (1015) may be implemented within the media device, as shown in FIG. 1B, or may be implemented as a separate component. The environment configuration engine (1015) may identify one or more users (e.g., user A (1005), user N (1010), etc.) in proximity to the environment configuration engine (1015) and identify user preferences (1020) associated with the identified users. The users may be identified based on speech recognition or based on other input that identifies them. Based on the user preferences (1020), the environment configuration engine may configure a user interface, audio system configuration, room lighting, game console, music playlist, seating configuration, or any other suitable environment configuration (1025). For example, if five friends associated with group user preferences are identified, a channel playing a sports game may be automatically selected, and surround sound for the audio stream associated with the sports game may be selected. Another example may involve identifying a couple and automatically starting to play a romantic comedy.
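The preference-intersection behavior of the environment configuration engine can be sketched as follows; the settings chosen, the preference schema, and the fallback genre are illustrative assumptions:

```python
# Sketch of an environment configuration engine: intersect the preferences
# of all identified users to choose channel genre and audio settings.

def configure_environment(identified_users, user_preferences):
    """Return settings shared by every identified user, with defaults when
    no common preference exists."""
    prefs = [user_preferences[u] for u in identified_users]
    common_genres = set.intersection(*(set(p["genres"]) for p in prefs))
    return {
        # Pick a genre every identified user likes, else fall back.
        "genre": sorted(common_genres)[0] if common_genres else "news",
        # Enable surround sound only if every user prefers it.
        "surround_sound": all(p.get("surround", False) for p in prefs),
    }
```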
16.0 hardware overview
According to one embodiment, the techniques described herein are implemented by one or more special-purpose computing devices. A special-purpose computing device may be hard-wired to perform the techniques, or may include digital electronic devices such as one or more Application Specific Integrated Circuits (ASICs) or Field Programmable Gate Arrays (FPGAs) that are persistently programmed to perform the techniques, or may include one or more general-purpose hardware processors programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination thereof. Such special-purpose computing devices may also combine custom hard-wired logic, ASICs, or FPGAs with custom programming to accomplish the techniques. A special-purpose computing device may be a desktop computer system, portable computer system, handheld device, network device, or any other device that incorporates hard-wired and/or programmed logic to implement the techniques.
For example, FIG. 11 is a block diagram illustrating a system 1100 upon which an embodiment of the invention may be implemented. System 1100 includes a bus 1102 or other communication mechanism for communicating information, and a hardware processor 1104 coupled with bus 1102 for processing information. Hardware processor 1104 can be, for example, a general purpose microprocessor.
System 1100 also includes a main memory 1106, such as a Random Access Memory (RAM) or other dynamic storage device, coupled to bus 1102 for storing information and instructions to be executed by processor 1104. Main memory 1106 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 1104. Such instructions, when stored in a storage medium accessible to processor 1104, render system 1100 a special-purpose machine that is customized to perform the operations specified in the instructions.
System 1100 further includes a Read Only Memory (ROM) 1108 or other static storage device coupled to bus 1102 for storing static information and instructions for processor 1104. A storage device 1110, such as a magnetic disk or optical disk, is provided and coupled to bus 1102 for storing information and instructions.
System 1100 may be coupled via bus 1102 to a display 1112, such as a Cathode Ray Tube (CRT), for displaying information to a computer user. An input device 1114, including alphanumeric and other keys, is coupled to bus 1102 for communicating information and command selections to processor 1104. Another type of user input device is cursor control 1116 (e.g., a mouse, a trackball, or cursor direction keys) for communicating direction information and command selections to processor 1104 and for controlling cursor movement on display 1112. This input device typically has two degrees of freedom in two axes, a first axis (e.g., the x-axis) and a second axis (e.g., the y-axis), that allow the device to specify positions in a plane.
System 1100 may implement the techniques described herein using custom hard-wired logic, one or more ASICs or FPGAs, firmware, and/or program logic which, in combination with the system, causes or programs system 1100 to be a special-purpose machine. According to one embodiment, the techniques herein are performed by system 1100 in response to processor 1104 executing one or more sequences of one or more instructions contained in main memory 1106. Such instructions may be read into main memory 1106 from another storage medium, such as storage device 1110. Execution of the sequences of instructions contained in main memory 1106 causes processor 1104 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.
The term "storage medium" as used herein refers to any medium that stores data and/or instructions that cause a machine to operate in a specific fashion. Such storage media may comprise non-volatile media and/or volatile media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 1110. Volatile media includes dynamic memory, such as main memory 1106. Common forms of storage media include, for example, a floppy disk, a flexible disk, a hard disk, a solid state disk, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, or any other memory chip or cartridge.
Storage media is distinct from, but can be used concurrently with, transmission media. Transmission media participate in the transfer of information between storage media. For example, transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 1102. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
Various forms of media may be involved in carrying one or more sequences of one or more instructions to processor 1104 for execution. For example, the instructions may initially be carried on a magnetic disk or a solid state drive of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to system 1100 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal. An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on data bus 1102. The bus 1102 carries the data to the main memory 1106, from which the processor 1104 reads and executes the instructions. The instructions received by main memory 1106 may optionally be stored on storage device 1110 either before or after execution by processor 1104.
System 1100 also includes a communication interface 1118 coupled to bus 1102. Communication interface 1118 provides a two-way data communication coupling to a network link 1120, and network link 1120 is connected to a local network 1122. For example, communication interface 1118 may be an Integrated Services Digital Network (ISDN) card, a cable modem, a satellite modem, or a modem to provide a data communication connection for a corresponding type of telephone line. As another example, communication interface 1118 may be a Local Area Network (LAN) card to provide a data communication connection to a compatible LAN. Wireless connections may also be implemented. In any such implementation, communication interface 1118 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
Network link 1120 typically provides data communication through one or more networks to other data devices. For example, network link 1120 may provide data communication through local network 1122 to a host computer 1124 or to data equipment operated by an Internet Service Provider (ISP) 1126. ISP 1126 in turn provides data communication through the world-wide packet data communication network now commonly referred to as the "Internet" 1128. Local network 1122 and Internet 1128 both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 1120 and through communication interface 1118, which carry the digital data to and from system 1100, are exemplary forms of transmission media.
System 1100 can send messages and receive data, including program code, through the network(s), network link 1120 and communication interface 1118. In the Internet example, a server 1130 might transmit requested code for an application program through Internet 1128, ISP 1126, local network 1122 and communication interface 1118.
The received code may be executed by processor 1104 as it is received, and/or stored in storage device 1110, or other non-volatile storage for later execution.
17.0 Extensions and Substitutions
In the foregoing specification, embodiments of the invention have been described with reference to numerous specific details that may vary from implementation to implementation. Thus, the sole and exclusive indicator of what is the invention, and is intended by the applicants to be the invention, is the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction. Any definitions expressly set forth herein for terms contained in such claims shall govern the meaning of such terms as used in the claims. Hence, no limitation, element, property, feature, advantage or attribute that is not expressly recited in a claim should limit the scope of such claim in any way. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
Claims (22)
1. A method, comprising:
scheduling a recording of a particular media content in a content stream at a predetermined start time;
receiving content in the content stream before the predetermined start time;
obtaining a fingerprint from the content and querying a fingerprint database to identify content in the content stream as the particular media content;
starting recording of the particular media content in the content stream before the predetermined start time;
wherein the method is performed by a device comprising a processor.
2. The method of claim 1, wherein the predetermined start time is based on information related to an Electronic Program Guide (EPG).
3. A method, comprising:
recording a content stream comprising first media content;
monitoring the content stream for identifying additional media content different from the first media content;
identifying additional media content in the content stream that is different from the first media content by:
obtaining a fingerprint from the additional media content;
querying a fingerprint database using the fingerprint to determine that additional media content in the content stream is different from the first media content;
in response to identifying additional media content in the content stream that is different from the first media content, stopping recording of the content stream;
wherein the method is performed by a device comprising a processor.
4. The method of claim 3, wherein the recording stops at an actual end time of the first media content, the actual end time of the first media content being different from a predetermined end time of the first media content indicated by an Electronic Program Guide (EPG).
5. The method of claim 3, wherein an Electronic Program Guide (EPG) indicates that the additional media content is available on the content stream after the first media content.
6. The method of claim 3, further comprising:
detecting that the additional media content is available on the content stream after a predetermined start time of the additional media content;
modifying a predetermined recording time interval of the additional media content in response to the detecting step.
7. A method, comprising:
recording content received in a content stream from a predetermined start time of a first media content to a predetermined end time of the first media content to obtain a content record;
obtaining a fingerprint from the content record;
querying a fingerprint database using the fingerprint to determine that a first portion of the content record includes second media content and a second portion of the content record includes the first media content;
in response to a command to play the content record, beginning playback of the content record at the second portion of the content record;
wherein the method is performed by a device comprising a processor.
8. A method, comprising:
obtaining a fingerprint from content available in a content stream;
querying a fingerprint database using the fingerprint to identify the content;
determining that the identified content is related to a user-specified characteristic;
recording the identified content in response to the determination;
wherein the method is performed by a device comprising a processor.
9. The method of claim 8, wherein the content is received in the content stream outside of a time interval indicated for the content by an Electronic Program Guide (EPG).
10. The method of claim 8, wherein the user-specified characteristics include one or more of:
a type of content;
an actor or actress associated with the content;
a geographic area related to the content;
a language related to the content;
a sound associated with the content.
11. A method, comprising:
obtaining a fingerprint from content available in a content stream;
querying a fingerprint database using the fingerprint to identify the content;
determining that the identified content is related to a user viewing history;
recording the identified content in response to the determination;
wherein the method is performed by a device comprising a processor.
12. The method of claim 11, wherein the determining step comprises:
determining that one or more characteristics of the identified content are equivalent to one or more characteristics of media content included in the user viewing history.
13. A method, comprising:
recording a first copy of the media content;
detecting that the first copy of the media content is an incomplete copy of the media content;
in response to the detecting step, obtaining a second copy of the media content, wherein the second copy is a complete copy of the media content.
14. The method of claim 13, wherein the obtaining step comprises one or more of:
requesting a second copy of the media content from a broadcast service and receiving the second copy in response to the request;
downloading a second copy of the media content from a network server;
identifying a content stream comprising the media content, and recording the second copy of the media content from the content stream.
15. The method of claim 13, wherein the detecting step comprises:
determining that a duration of the first copy of the media content is shorter than an expected duration of the media content.
16. The method of claim 13, wherein the detecting step comprises:
determining that second media content, available on the content stream prior to the media content, plays beyond an expected end time of the second media content.
17. A method, comprising:
recording media content;
detecting that a portion of the media content is missing from the recorded media content;
in response to the detecting step, obtaining the missing portion of the media content.
18. The method of claim 17, wherein the detecting step comprises:
obtaining a fingerprint from the recorded media content;
querying a fingerprint database using the fingerprint to identify the missing portion of the recorded media content based on the fingerprint.
19. The method of claim 17, wherein the obtaining step comprises one or more of:
requesting a missing portion of the recorded media content from a broadcast service and receiving the missing portion in response to the request;
downloading the missing portion of the recorded media content from a network server;
identifying a content stream comprising the media content, and recording the missing portion of the media content from the content stream.
20. A computer-readable storage medium comprising a set of instructions which, when executed by a processor, perform the steps of one or more of claims 1 to 19.
21. An apparatus comprising means configured to perform the steps of one or more of claims 1-19.
22. An apparatus comprising at least one device configured to perform the steps of one or more of claims 1-19.
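The fingerprint-driven recording flow recited in claims 1, 3, and 8 can be sketched, purely for illustration, as follows. All names here (`compute_fingerprint`, `FingerprintDB`, `monitor_stream`) are hypothetical, and the hash-based fingerprint is only a stand-in for the perceptual audio/video fingerprints the claims contemplate:

```python
import hashlib

def compute_fingerprint(frame_bytes: bytes) -> str:
    """Stand-in fingerprint: a hash of sampled frame data.
    A real device would derive a perceptual audio/video fingerprint."""
    return hashlib.sha256(frame_bytes).hexdigest()

class FingerprintDB:
    """Maps known fingerprints to content identifiers."""
    def __init__(self):
        self._index = {}

    def register(self, frame_bytes: bytes, content_id: str) -> None:
        self._index[compute_fingerprint(frame_bytes)] = content_id

    def identify(self, frame_bytes: bytes):
        # Returns the content identifier for a recognized frame, else None.
        return self._index.get(compute_fingerprint(frame_bytes))

def monitor_stream(frames, db: FingerprintDB, target_id: str):
    """Record frames once the stream is identified as target_id; stop when
    different (additional) content is identified, as in claims 1 and 3."""
    recording, started = [], False
    for frame in frames:
        content_id = db.identify(frame)
        if content_id == target_id:
            started = True
            recording.append(frame)
        elif started:
            if content_id is None:
                recording.append(frame)  # unidentified frame: assume continuation
            else:
                break  # additional, different content identified: stop recording
    return recording
```

In this sketch, frames arriving before the target content is identified are discarded, and recording stops at the actual boundary between programs rather than at an EPG-scheduled end time.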
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US12/631,790 | 2009-12-04 | ||
| US12/631,786 | 2009-12-04 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| HK1172720A true HK1172720A (en) | 2013-04-26 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US12155891B2 (en) | Multifunction multimedia device | |
| US8682145B2 (en) | Recording system based on multimedia content fingerprints | |
| HK1218814A1 (en) | Multifunction multimedia device | |
| US20110137976A1 (en) | Multifunction Multimedia Device | |
| HK1172720A (en) | Multifunction multimedia device | |
| HK1172178B (en) | Multifunction multimedia device | |
| HK1172179B (en) | Multifunction multimedia device | |
| HK1172179A (en) | Multifunction multimedia device |