
WO2009112803A1 - Processing a sequence of frames - Google Patents


Info

Publication number
WO2009112803A1
WO2009112803A1 (PCT/GB2009/000560)
Authority
WO
WIPO (PCT)
Prior art keywords
frames
content stream
fingerprint
sequence
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/GB2009/000560
Other languages
French (fr)
Inventor
Winfried Antonius Henricus Berkvens
Roel Peter Geert Cuppen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ambx UK Ltd
Original Assignee
Ambx UK Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ambx UK Ltd filed Critical Ambx UK Ltd
Priority to GB1016922.5A priority Critical patent/GB2472162B/en
Publication of WO2009112803A1 publication Critical patent/WO2009112803A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/4104Peripherals receiving signals from specially adapted client devices
    • H04N21/4131Peripherals receiving signals from specially adapted client devices home appliance, e.g. lighting, air conditioning system, metering devices
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/005Reproducing at a different information rate from the information rate of recording
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/19Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
    • G11B27/28Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8455Structuring of content, e.g. decomposing content into time segments involving pointers to the content, e.g. pointers to the I-frames of the video stream
    • HELECTRICITY
    • H05ELECTRIC TECHNIQUES NOT OTHERWISE PROVIDED FOR
    • H05BELECTRIC HEATING; ELECTRIC LIGHT SOURCES NOT OTHERWISE PROVIDED FOR; CIRCUIT ARRANGEMENTS FOR ELECTRIC LIGHT SOURCES, IN GENERAL
    • H05B47/00Circuit arrangements for operating light sources in general, i.e. where the type of light source is not relevant
    • H05B47/10Controlling the light source
    • H05B47/155Coordinated control of two or more light sources
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B2220/00Record carriers by type
    • G11B2220/20Disc-shaped record carriers
    • G11B2220/25Disc-shaped record carriers characterised in that the disc is based on a specific recording technology
    • G11B2220/2537Optical discs
    • G11B2220/2562DVDs [digital versatile discs]; Digital video discs; MMCDs; HDCDs

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Automation & Control Theory (AREA)
  • Television Signal Processing For Recording (AREA)
  • Image Analysis (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
  • Collating Specific Patterns (AREA)

Abstract

A method of determining one or more characteristics of a content stream (16) comprising a sequence (18) of frames (20), the method comprising the steps of receiving the sequence of frames, and, for each of a plurality of frames in the sequence of frames: measuring an attribute of one or more portions (34) of the frame, calculating a fingerprint (28) from the measured attribute(s), accessing a timetable (30) of predetermined fingerprints (28) for the content stream, and identifying the calculated fingerprint (28) in the timetable (30). Finally, the process determines the (or each) characteristic of the content stream (16) (such as the speed and/or direction of the content stream) from at least two of the calculated and identified fingerprints (28).

Description

Processing a sequence of frames
FIELD OF THE INVENTION
This invention relates to a method of, and system for, determining one or more characteristics of a content stream comprising a sequence of frames. In one embodiment, this can be used to determine whether the playback speed of the content stream is different from a normal playback.
BACKGROUND OF THE INVENTION
When a user is participating in an entertainment experience such as watching a film in their home, it is now possible to provide augmentation of the user's experience. This can be delivered by one or more additional devices that are supplementary to the conventional devices used to watch the film, such as a television and perhaps an audio output device. The additional devices could be lighting devices, temperature-controlling devices such as heaters and cooling fans, or devices that provide vibration through furniture. All of the additional devices go towards providing the ambient environment around the user and augment the user's experience of the film.
This experience creation for television is seen as a good and appreciated development. The method of delivering the augmentation can be chosen by the user according to the user's own specification. One currently available technology is known as amBX (see www.ambx.com) and is seen as a technology that could be used to deliver these experiences to the user. The technology amBX defines scripts that can be used to control the additional devices used in providing the ambient environment. To be able to achieve a good amBX-based experience when watching television, a system has been defined that synchronizes amBX scripts with the original audio/visual (AV) content (the film). This synchronization process works well for normal-speed playback of the original content. For example, International Patent Application Publication WO 2007/072326 discloses a system in which a content stream and a script are synchronized for outputting one or more sensory effects in a multimedia system. A fingerprint is calculated from a portion of the content stream. A time value is determined that corresponds to the fingerprint. The time value may be stored in a fingerprint database that is accessed utilizing the fingerprint, and thereby the time value is retrieved. A script clock is synchronized to the time value and thereby to the portion of the content stream. The script is rendered in synchronization with the portion of the content stream utilizing the synchronized script clock. The script is utilized to produce one or more sensory effects that are output in an effects signal for an effects controller. The effects signal is produced in synchronization with the rendering of the portion of the content stream.
The current synchronization solutions, such as those described in the above patent application, provide good results for conventional playback at normal speed, and can also detect the pausing and stopping of the original content to which the script is being synchronized. However, during trick-play situations, in which the playback speed is varied by user actions such as fast-forward, slow-motion and rewind, correct system performance will not be achieved with the current synchronization solution. In these situations, it is possible that no time values will be found, so that the system needs to resynchronize or even to re-identify the content. It is also possible that incorrect time values will be found by the synchronization mechanism, which can result in linear playback being detected wrongly. Finally, after resume (restart of linear playback) it may take longer to resynchronize at the correct time point. All these results lead to a lower-quality experience from the point of view of the user, who may experience incorrect delivery of ambient effects, or no delivery at all.
SUMMARY OF THE INVENTION
It is therefore an object of the invention to improve upon the known art. According to a first aspect of the present invention, there is provided a method of determining one or more characteristics of a content stream comprising a sequence of frames, the method comprising the steps of receiving the sequence of frames, for each of a plurality of frames in the sequence of frames: measuring an attribute of one or more portions of the frame, calculating a fingerprint from the measured attribute(s), accessing a timetable of predetermined fingerprints for the content stream, and identifying the calculated fingerprint in the timetable, and determining the or each characteristic of the content stream from at least two of the calculated and identified fingerprints.
According to a second aspect of the present invention, there is provided a system for determining one or more characteristics of a content stream comprising a sequence of frames, the system comprising a receiver arranged to receive the sequence of frames, and a processor arranged, for each of a plurality of frames in the sequence of frames: to measure an attribute of one or more portions of the frame, to calculate a fingerprint from the measured attribute(s), to access a timetable of predetermined fingerprints for the content stream, and to identify the calculated fingerprint in the timetable, and to determine the or each characteristic of the content stream from at least two of the calculated and identified fingerprints.
According to a third aspect of the present invention, there is provided a computer program product on a computer readable medium for determining one or more characteristics of a content stream comprising a sequence of frames, the product comprising instructions for receiving the sequence of frames, for each of a plurality of frames in the sequence of frames: measuring an attribute of one or more portions of the frame, calculating a fingerprint from the measured attribute(s), accessing a timetable of predetermined fingerprints for the content stream, and identifying the calculated fingerprint in the timetable, and determining the or each characteristic of the content stream from at least two of the calculated and identified fingerprints. Owing to the invention, it is possible to detect accurately one or more characteristics of the content stream. Advantageously, the step of determining the (or each) characteristic of the content stream from at least two of the calculated and identified fingerprints comprises identifying the speed and/or direction of the content stream. In the proposed system, a solution is provided for synchronizing augmentation effects with the original content stream during playback of the content stream at a non-standard speed, such as fast-forward and rewind. The speed and direction of the playback can be determined, without the need to re-synchronize the effects delivery with the playback of the content stream once normal playback is resumed.
Furthermore, the invention is able to provide for the delivery of effects during playback at non-normal speed, so that users still get some experience belonging to, and possibly aligned with, the content (movie) during these trick-play situations, and are not pulled out of the content completely during these actions. It could be expected that users would like to experience some effects that are aligned with the (approximate) time-position of the fast-forward/rewind (but with less dynamics), and this can be provided for by a system constructed according to the invention. The invention can be used for the improvement of the synchronization system used to synchronize a content stream with a script (or a second content stream). It is currently foreseen that this invention can be used in the amBX decoder, which could be packaged as a separate device or could become part of future TVs and/or disc-player devices in the Home Cinema domain. Preferably, the step of measuring an attribute of one or more portions of the frame comprises measuring a mean luminance of the portion(s). The use of mean luminance within one or more portions of an image frame has a number of advantages, stemming from the fact that frames close together will often have similar luminance values. During playback situations at a non-normal speed such as fast-forward, slow-motion or rewind, a certain number of frames in the content stream are discarded and never output to the user. By basing the fingerprint calculation around mean luminance values, the discarding of frames (for example by using only every 3rd or 4th frame) will not affect the efficacy of matching based around such fingerprints. Ideally, the step of calculating a fingerprint from the measured attribute(s) comprises generating a bit sequence from the measured attribute(s).
Additionally, or alternatively, the step of calculating a fingerprint from the measured attribute(s) may comprise generating a single value from the measured attribute(s). In order to facilitate the matching of an extracted fingerprint to a stored predetermined fingerprint, the use of a bit sequence is a simple and effective method of being able to look-up in the timetable to identify the fingerprint.
Preferably, the generating of a bit sequence from the measured attribute(s) comprises performing a spatial difference operation on a pair of portions of the frame. The advantage of using spatial differences within a frame is that the information that is produced and used is a set of relative values instead of absolute values. Absolute values are affected by all kinds of external influences such as the settings of a DVD player, noise on the analogue or digital path, horizontal and vertical shifts of a frame at the receiver etc. By switching to relative values a more robust process is possible.
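The spatial-difference approach described above can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the function names, the portion layout (flat lists of pixel luminance values), and the pairing of adjacent portions are all assumptions made for the example.

```python
def mean_luminance(portion):
    """Mean luminance of a portion, given as a flat list of pixel values (0-255)."""
    return sum(portion) / len(portion)

def spatial_fingerprint(portions):
    """Derive a bit sequence by comparing mean luminances of adjacent
    portions of one frame: bit = 1 if the earlier portion of the pair is
    brighter.  Using relative comparisons rather than absolute values makes
    the fingerprint robust to global brightness shifts introduced by
    player settings or the analogue path."""
    means = [mean_luminance(p) for p in portions]
    return tuple(1 if means[i] > means[i + 1] else 0
                 for i in range(len(means) - 1))

# A uniform brightness offset leaves the fingerprint unchanged:
frame = [[10, 20], [40, 50], [30, 30], [80, 90]]
brighter = [[v + 25 for v in p] for p in frame]
assert spatial_fingerprint(frame) == spatial_fingerprint(brighter) == (0, 1, 0)
```

The assertion demonstrates the robustness claim: adding a constant offset to every pixel (as a differently calibrated player might) does not change any of the pairwise orderings, so the bit sequence is identical.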
BRIEF DESCRIPTION OF THE DRAWINGS
Embodiments of the present invention will now be described, by way of example only, with reference to the accompanying drawings, in which:
Fig. 1 is a schematic diagram of an entertainment system,
Fig. 2 is a schematic diagram of a content stream,
Fig. 3 is a schematic diagram of a sequence of frames and respective fingerprints,
Fig. 4 is a flowchart of a method of determining one or more characteristics of the content stream,
Fig. 5 is a schematic diagram of a sequence of frames, fingerprints and a timetable of predetermined fingerprints, and
Fig. 6 is a schematic diagram showing generation of a fingerprint.
DETAILED DESCRIPTION OF THE EMBODIMENTS
Fig. 1 shows a DVD player 10 connected to a display device 12 through a device 14. A signal 16, comprising a content stream, is output from the DVD player 10 in a conventional fashion. The device 14 intercepts the signal 16 in order to monitor it, while nevertheless passing it on unaltered to the display device 12. The device 14 is shown as an independent device, but the functionality that it provides could be located within the DVD player 10 or within the display device 12, or be distributed amongst more than one device.
To illustrate the invention, Fig. 1 shows a conventional DVD player 10 and a conventional display device 12, with the device 14 located as a separate function in-between the two devices 10 and 12. However, the device 14 need not be interposed between the player 10 and the display device 12; it could be connected to a separate output of the DVD player 10, or the output of the DVD player 10 could be split in two, with one output going to the display 12 and the other going to the device 14. The DVD player 10 is only one example of a device that outputs a display signal. Other devices, such as computers, may also be used in the system of Fig. 1.
The content stream 16, shown in Fig. 2, comprises a sequence 18 of frames 20. The content stream 16 can also include within it specific data, which contains information about the frames 20 within the sequence 18. The data need not be present within the signal 16, nor need it be on a frame-by-frame basis. If the data is not present in the signal 16, then the device 14 will obtain it from another location, such as by connecting to a remote server. The device 14 is a system for determining one or more characteristics (such as the playback speed) of the sequence 18 of image frames 20, and comprises a receiver 24, which is arranged to receive the sequence 18 of frames 20, and a processor 26, which is arranged to perform various calculations and processes on the frames 20, in order to detect the pause or stopping of the DVD player 10.
Fig. 3 shows schematically how the processor 26 calculates fingerprints 28 from the frames 20 that make up the sequence 18 of frames 20 within the content stream 16. Each fingerprint 28 is calculated from one or more portions within each respective frame 20. The portion that is used to generate the fingerprint 28 may make up the entirety of the respective frame 20. The processor 26 measures an attribute, such as mean luminance within the selected portion(s), and uses this to generate a bit sequence that makes up the fingerprint 28. Different methods of generating the fingerprints 28 are discussed in more detail below. As discussed above with reference to the prior art, the current technique used to synchronize the AV content stream with any additional effects delivery (such as described by an amBX script) makes use of the general assumption that the content stream is being played at normal speed. By playing the content linearly, all of the frames are played in a linear order, which allows the next frame being played to be predicted from the current frame. Furthermore, it allows the use of temporal information for the creation of the fingerprints. Finally, the fact that known systems can easily combine the fingerprints created for a set of successive frames provides more information and therefore gives the system a better chance of arriving at the correct time prediction. This means that synchronization will work in cases where the detected fingerprints contain some bit errors. The present invention is directed to the situation where the content stream 16 is played at a speed different from normal, for example when the user is fast-forwarding or rewinding the content stream. When the speed at which the content stream 16 is being played is different from normal, not all of the frames (in the original content) are being played anymore.
Depending on the speed, multiple frames are skipped and other frames are shown on the screen for a number of frame-times. During fast-forward and rewind, each frame that is shown on the screen is presented for a normal frame duration of 40 ms (in a PAL system). So, to allow a faster move through the content, frames have to be skipped.
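The frame-skipping behaviour during trick play can be illustrated with a small sketch. The exact skipping pattern is implementation-dependent, as the description notes; this example assumes the simplest scheme, where an integer-speed player presents every Nth frame for the full 40 ms PAL frame duration:

```python
FRAME_MS = 40  # PAL frame duration; each shown frame keeps this duration

def shown_frames(total_frames, speed):
    """Frames actually presented at an integer trick-play speed under the
    assumed 'every Nth frame' scheme: every |speed|-th frame of the
    content, reversed in order for rewind (negative speed)."""
    step = abs(speed)
    frames = list(range(0, total_frames, step))
    return frames[::-1] if speed < 0 else frames

assert shown_frames(10, 4) == [0, 4, 8]      # 4x fast-forward
assert shown_frames(10, -4) == [8, 4, 0]     # 4x rewind
```

At 4x, three of every four frames never reach the screen, which is why temporal (inter-frame) fingerprint information becomes unusable and single-frame spatial fingerprints are needed.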
The system of Figs. 1 to 3 must be sufficiently robust to deal with the fact that, within the content stream output from the DVD player 10, frames are skipped and others are repeated for a number of frame-times. So, the system makes use of single-frame (spatial) fingerprints 28 (not using temporal properties in the fingerprints), and at least a pair of one-frame fingerprints 28 is required to give reliable predictions. This is because the searched position in the content stream 16 is more random, and because bit errors are still expected in the spatial fingerprints 28 (for example due to the analogue path), so finding a position based on a single measurement gives only a small chance of a correct match. Similarly, owing to the possibility of the content stream 16 being output at different speeds (for example 2x, 4x, 8x, 16x or 32x, in both the positive and negative direction), the distance between successively presented frames 20 cannot be predicted, is not constant, and therefore cannot be used or stored. It is not possible to store the combined results of a number of measurements; each single spatial fingerprint has to be stored.
Fig. 4 illustrates the method carried out by the device 14 to determine, for example, the speed and direction of the playback embodied by the content stream 16. Step S1 comprises the receipt of the sequence 18 of frames 20, which are then passed to the processor 26 for handling. Steps S2 to S5 are then repeated for a plurality of the frames 20 within the sequence 18.
Step S2 comprises measuring an attribute (for example mean luminance) of one or more defined portion(s) in a received video frame 20. Optionally, before the mean luminance calculations of the defined portion(s) are used, the spatial differences of these mean luminance calculations are first taken. This is discussed in more detail below, with reference to Fig. 6. Effectively, the processor 26 derives data from the pixel content of the frame 20 that can be used to generate a suitable fingerprint.
The next step is step S3, which is the calculation of a fingerprint 28 from the measured attribute. The processor 26 represents the (set of) attributes in an optimized way, which can be considered as a sort of 'fingerprint' of the respective frame 20. The fingerprint 28 can be a bit sequence, or can be a single value which will be, for example, a (single) mean luminance or a representation of this mean luminance by a bit sequence. The latter can be done if fixed mean luminance ranges are used within a content item. After a fingerprint has been calculated, the next step is step S4, in which a timetable of predetermined fingerprints is accessed. This can be recalled by the device 14 from local storage, may be accessed from data uploaded from the DVD player 10 at the start of the film, or may be downloaded from a remote storage device. The processor 26 is then configured, in step S5, to search in the "extra data" (for instance in the timetable) to find the accompanying time position or time period for the representation of these characteristics/features. This "extra data" contains, for each frame of the content, some kind of bit sequence. The processor 26 identifies the calculated fingerprint from step S3 within the accessed timetable. The steps S2 to S5 are repeated for at least two of the frames 20 within the sequence 18 of frames 20. In general, the two frames 20 for which the steps S2 to S5 are repeated are not successive frames, but will be spaced apart from each other.
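Steps S4 and S5 amount to a nearest-match lookup of the calculated fingerprint in the timetable. Since bit errors are expected (for example from the analogue path), an exact-match lookup is not enough; a sketch of a Hamming-distance search is shown below. The timetable layout (a mapping from content time position to stored bit sequence) and the error budget are assumptions for the example:

```python
def lookup(fingerprint, timetable, max_bit_errors=2):
    """Find the timetable position whose stored fingerprint is closest
    to the calculated one in Hamming distance, tolerating up to
    max_bit_errors differing bits.  Returns None if nothing is close
    enough."""
    best_pos, best_dist = None, max_bit_errors + 1
    for pos, stored in timetable.items():
        dist = sum(a != b for a, b in zip(fingerprint, stored))
        if dist < best_dist:
            best_pos, best_dist = pos, dist
    return best_pos

# Hypothetical timetable: content position (in frames) -> stored fingerprint.
timetable = {0: (0, 1, 0, 1), 40: (1, 1, 0, 0), 80: (1, 0, 1, 0)}
assert lookup((1, 1, 0, 0), timetable) == 40  # exact match
assert lookup((0, 1, 1, 1), timetable) == 0   # one bit error tolerated
```

In practice the timetable would hold one spatial fingerprint per frame of the content, as the description requires, and the search would be indexed rather than linear.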
The final step of the process is step S6, which uses the resulting time positions or time periods of a number (two or more) of these frames 20 (the result of step S5) and decides, based on these values and the time elapsed between the measurements, whether the content is being wound (fast-forwarded or rewound) or played back continuously. The processor 26 determines one or more characteristics of the content stream 16 from the positions of the calculated and identified fingerprints 28. For example, if within the content stream 16 received by the device 14, fingerprints 28 are calculated from frame numbers twenty and twenty-five, and these fingerprints 28 are then matched to predetermined fingerprints twenty and thirty of the timetable, then the output of the process is that the content stream 16 is being played forward at twice the normal speed. Based on whether the time positions/periods (from the measurements) are increasing or decreasing, it can be decided whether the winding is a fast-forward, slow-motion or rewind. It is also possible to provide information about the winding speed itself. In one embodiment, the device 14 provides that the (change of) position within the content, when the user is winding through the (video) content, can be detected by calculating the mean luminance intensity of the captured frames 20, or even by using the mean luminance intensity difference between two following captured frames, and comparing these luminance intensities with values stored in the timetable (the timetable has to be extended with this possibility).
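The speed/direction decision of step S6 is a simple ratio, which can be sketched as follows. The function name and argument layout are illustrative assumptions; the arithmetic follows the worked example in the description (received frames 20 and 25 matching timetable entries 20 and 30):

```python
def playback_speed(cap1, pos1, cap2, pos2):
    """cap1/cap2: indices of two captured frames as received by the
    device (i.e. real elapsed time in frame units); pos1/pos2: the
    timetable positions their fingerprints matched.  Returns content
    frames advanced per received frame: 1.0 is normal playback,
    magnitudes above 1 are fast winding, and a negative sign means
    rewind."""
    return (pos2 - pos1) / (cap2 - cap1)

# Example from the description: received frames 20 and 25 match
# timetable entries 20 and 30 -> 2x forward playback.
assert playback_speed(20, 20, 25, 30) == 2.0
# Decreasing timetable positions indicate rewind:
assert playback_speed(20, 30, 25, 20) == -2.0
```

Because bit errors can cause false matches, a real system would base the decision on more than two measurements and discard outliers before trusting the estimate.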
Being able to do this is based on the assumptions that, firstly, sets of succeeding frames have mean luminance intensity values within a bounded range (typically the range is small compared to the maximum range from 0 to 255), and, secondly, that such sets of succeeding frames are often longer than the distance between two following frames output by a disc-player device during any speed of fast-forward or rewind.
In video content 'particles' can be discerned. A particle is defined as a set of succeeding frames for which the mean luminance intensity for (each of) these frames lies within a specified mean luminance intensity range. The boundary of a range can be chosen dynamically, and depends on the structure of the frames of a particular piece of content. A particle boundary is defined as a point in the content where the mean luminance intensity changes to a value outside the mean luminance intensity range of the previous particle (two succeeding frames having mean luminance intensity values that fall into two different ranges). Often, these boundaries will be crossed at moments where scene or shot changes are occurring in the content, but this is not necessarily the case (for example, a light flashing up during a dark evening scene also results in a particle boundary but does not indicate a scene or shot change).
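The particle definition above can be made concrete with a short segmentation sketch. The tolerance value and the greedy min/max range-tracking strategy are assumptions for the example (the description says the range boundary can be chosen dynamically per content item):

```python
def particles(mean_lums, tolerance=16):
    """Segment per-frame mean luminance values into 'particles': runs of
    succeeding frames whose values stay within a bounded range.  A new
    particle starts when a frame's value would stretch the current
    particle's range beyond the tolerance."""
    segments = []
    cur, lo, hi = [0], mean_lums[0], mean_lums[0]
    for i in range(1, len(mean_lums)):
        v = mean_lums[i]
        if max(hi, v) - min(lo, v) <= tolerance:
            cur.append(i)                       # still inside the particle
            lo, hi = min(lo, v), max(hi, v)
        else:
            segments.append(cur)                # particle boundary crossed
            cur, lo, hi = [i], v, v
    segments.append(cur)
    return segments

# A bright flash (frames 3-4) inside a dark scene forms its own particle,
# even though no scene change occurs:
assert particles([10, 12, 11, 200, 205, 15]) == [[0, 1, 2], [3, 4], [5]]
```

The example mirrors the point made in the description: a light flashing up during a dark scene creates a particle boundary without indicating a scene or shot change.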
By storing the mean luminance intensity range in the timetable as reference material (extra annotations), the processor 26 can detect when a particle boundary is crossed. At that time the mean luminance intensity of the next frame no longer falls within the mean luminance intensity range of the previous particle and should fall within the range of the next particle of the content (during fast-forward) or within the range of the previous particle of the content (during rewind). This concept is shown in Fig. 5, which provides the detection of the crossing of a particle boundary. The timetable 30 is used in determining the direction of the content stream 16. The original content 32 is shown, as is the sequence of frames that is outputted by the DVD player 10 as the content stream 16. The processor 26 is calculating fingerprints 28 (shown as single values of mean luminance for each frame 20) which are then matched in the timetable 30.
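The boundary-crossing check of Fig. 5 can be sketched as follows, assuming the timetable has been extended with one (lo, hi) mean-luminance range per particle as described. Checking the forward neighbour before the backward one is an arbitrary choice made for the example:

```python
def crossed_to(lum, ranges, current):
    """Given per-particle (lo, hi) mean-luminance ranges stored in the
    timetable, decide which particle a captured frame belongs to: the
    current one if its luminance is still in range, otherwise the next
    particle (fast-forward) or the previous one (rewind)."""
    lo, hi = ranges[current]
    if lo <= lum <= hi:
        return current                       # still inside the particle
    for idx in (current + 1, current - 1):   # forward neighbour first
        if 0 <= idx < len(ranges):
            lo, hi = ranges[idx]
            if lo <= lum <= hi:
                return idx
    return None                              # no neighbouring match

ranges = [(0, 40), (90, 130), (30, 60)]
assert crossed_to(100, ranges, 0) == 1   # fast-forward into next particle
assert crossed_to(20, ranges, 1) == 0    # rewind back into previous particle
```

This only yields a rough position estimate (the particle, not the frame), which is exactly what the description needs: enough to play a general script that goes with that part of the content.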
To avoid incorrect change detections, some hysteresis can be built in. Because the mean luminance intensity ranges for particles are related to the video content, the sequence of particle mean luminance intensity ranges is not expected to have a regular pattern, and is therefore useful for this mechanism. During fast-forward or rewind a disc-player will only play out some of the frames of the content. The length of the gap between two following played-out frames depends on the fast-forward or rewind speed and the specific implementation of this functionality in the disc-player device. If, for every played-out frame during this trick play, the mean luminance intensity is calculated, the crossing of a boundary can be detected, meaning that a rough estimate can be given of the location within the content. If this is known, a general script could be played that goes with this particle. To make this embodiment more robust, the solution could be improved by using only relative changes of mean luminance values. This could be done, for instance, by taking the mean luminance intensity difference between two following captured frames. This solution is advantageous because the actual mean luminance intensity at the output of a disc-player can vary between players due to different user/system settings or different hardware (even if the content is exactly the same).
If absolute values are being used in the calculation of the fingerprint from the attribute of mean luminance, it is important to be able to calibrate the luminance range of the disc-player. One option to do this, in the case of content preceded by a black screen title, is to add typical frames in the black screen title to calibrate the system. In other cases, it is an option to use mean luminance intensity values calculated at a known position in the content (for instance during normal playback), a position for which the synchronization system is very sure of having found the correct spot in the timetable, and to compare these with the calculated mean luminance intensity values. Another optimization could be to check not only the directly surrounding particles as good candidates, but also to extend the set of possible candidates. This could depend on the length (the number of frames belonging to the particle) of the current particle that is detected and the length of the directly surrounding particles. To make sure that particles do not become too small, it could be an option, for situations where there is a very small set of frames outside the luminance intensity range of the surrounding sets of frames (before and after), to combine the smaller particles into one larger particle. In order to provide a workable solution, it is necessary to add to the timetable 30 the mean luminance intensity of the frames that do not fall within the luminance intensity range. If this particular intensity is then measured during the winding situation, it can be recognized as such and will not lead to problems in the process.
This embodiment provides a means of knowing, to a reasonable level of accuracy, the position in the content when performing fast-forward or rewind operations. With this, fast-forward and rewind can be discerned from playback, pause, and stop situations. Resynchronization can also be reached more quickly after the user returns to normal playback (a rough position within the content is still known). Furthermore, the system could keep playing some augmenting effects that align with the content, so that the user stays in the atmosphere of the content even while fast-forwarding, for example. Many extensions to this embodiment are possible. For example, instead of using the mean luminance intensity (difference) itself to decide upon crossing a particle boundary, the following measures could also be used: the mean luminance in parts of a frame, the total luminance in a frame, the total luminance difference between two successively captured frames, and/or the variation in the sum of temporal difference values (STD). In each case, as long as the calculated value for two successively captured frames is below a threshold, both frames are identified as being part of the same particle.
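The thresholding rule just stated can be sketched as follows, purely for illustration (not part of the claimed subject-matter): a sequence of per-frame measures is segmented into particles, a new particle starting whenever the difference between two following frames reaches the threshold. The name `split_into_particles` and the `(start, length)` representation are assumptions:

```python
def split_into_particles(mean_lums, threshold):
    """Group a sequence of per-frame mean luminance values into
    particles: as long as the difference between two following
    frames stays below the threshold, both frames belong to the
    same particle. Returns a list of (start_index, length) pairs."""
    particles = []
    start = 0
    for i in range(1, len(mean_lums)):
        if abs(mean_lums[i] - mean_lums[i - 1]) >= threshold:
            # boundary crossed: close the current particle
            particles.append((start, i - start))
            start = i
    particles.append((start, len(mean_lums) - start))
    return particles
```

The particle lengths produced this way also support the detection described below: a particle much shorter than expected indicates that something other than normal playback is taking place.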
The mechanism described above could also be used to detect that normal playback is no longer taking place, by counting the number of frames with a given mean luminance intensity (difference) in one particle. If the number of succeeding frames within one particle is (much) smaller than the number of frames expected in this particle, something other than normal playback is happening. This same mechanism could perhaps be used for detecting channel zapping by the user. Furthermore, it may also help to discern between pause (receiving an expected mean luminance value) and stop (receiving a non-expected mean luminance value). During the lock-in phase for synchronization, or during a resynchronization phase, the mean luminance intensity information could also be used to limit the set of candidates from the timetable used to find the next time-point in the content (part of the candidate selection mechanism). This could be used during the identification process. A further embodiment of the invention is now described, with reference to Fig. 6. In this embodiment, the processor creates a spatial fingerprint 28, which is a fingerprint 28 created from a spatial component of the frame 20, and these fingerprints 28 are used to get time values from the timetable 30. By subtracting the time values found for two successive frames 20 (it is also possible to select two frames at a certain time distance), resulting in a time period, and comparing this with the time that would have passed in the case of normal playback, it can be decided how fast the content is being played.
Fig. 6 illustrates the spatial fingerprint creation. A video frame 20 is split up into portions (numbered 34 in the Figure) of pixels. In the implementation of this Figure, the portions of the frame 20 comprise four rows of blocks with nine blocks per row, giving a total of thirty-six portions. The spatial fingerprint 28 is computed by calculating the mean luminance difference (ML.Diff.) between two neighboring blocks of pixels in the same row. If the result of this calculation is negative, a bit with value '0' is added to the spatial fingerprint 28; if the mean luminance difference is zero or positive, a bit with value '1' is added. Repeating this process for all of the portions 34 within the whole frame 20 results in a thirty-two bit spatial fingerprint 28. Of course, the number of bits in a spatial fingerprint 28 could differ from thirty-two, as spatial fingerprints with a different number of bits are also possible.
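For illustration only (not part of the claimed subject-matter), the construction of Fig. 6 could be sketched as follows, assuming the frame is a 2-D array of luma values whose dimensions divide evenly into the block grid; the sign convention (right block minus left block) is an assumption, as the text does not fix the direction of the difference:

```python
import numpy as np

def spatial_fingerprint(frame, rows=4, cols=9):
    """Spatial fingerprint as in Fig. 6: the frame is split into
    rows x cols blocks, and each pair of horizontally neighbouring
    blocks in the same row contributes one bit -- '0' if the mean
    luminance difference is negative, '1' if zero or positive.
    With 4 rows of 9 blocks this yields 4 * 8 = 32 bits."""
    h, w = frame.shape
    bh, bw = h // rows, w // cols
    bits = 0
    for r in range(rows):
        for c in range(cols - 1):
            left = frame[r*bh:(r+1)*bh, c*bw:(c+1)*bw].mean()
            right = frame[r*bh:(r+1)*bh, (c+1)*bw:(c+2)*bw].mean()
            bits = (bits << 1) | (1 if right - left >= 0 else 0)
    return bits
```

A frame whose blocks grow brighter from left to right would thus produce the all-ones fingerprint.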
The spatial fingerprints 28 are handled in the same manner as in the previous embodiments in order to determine the speed of the playback of the content stream 16. The processor 26 takes the spatial fingerprints 28 from received frames N and N + Y and searches for their time values in the timetable 30. The processor 26 then subtracts the time value found for frame N from the time value found for frame N + Y, and this result is divided by Y x Frame-Duration. The result gives a measure of the content speed. Instead of using time information, the distance in the timetable between the two frames could also be used; the result then has to be divided by the number of received frames (Y) in the final step.
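The speed calculation just described can be sketched as follows, for illustration only (not part of the claimed subject-matter); the 25 frames-per-second default is an assumption, any known frame rate works:

```python
FRAME_DURATION = 1.0 / 25.0  # assumed frame rate of 25 fps

def content_speed(t_n, t_n_plus_y, y, frame_duration=FRAME_DURATION):
    """Speed measure from the timetable entries of frames N and
    N + Y: (time(N+Y) - time(N)) / (Y x Frame-Duration). A result
    of 1.0 indicates normal playback, 2.0 double-speed
    fast-forward, and a negative value rewind."""
    return (t_n_plus_y - t_n) / (y * frame_duration)
```

For example, if the timetable places frame N at 10.0 s and frame N + 5 at 10.4 s, five received frames cover 0.4 s of content and the content is being played at double speed.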
Optimization of this process is possible. For example, the search for a time value in the timetable 30 based on the exact spatial fingerprint 28 could result in no match being found. Therefore, it is an option to allow for some weak bits (toggling the bits in the spatial fingerprint 28 that have the greatest chance of being wrong) and to use the resulting variants as well to find a candidate in the identifying step of the embodiment.
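As an illustrative sketch (not part of the claimed subject-matter), the weak-bit option could be realized by enumerating every variant of the fingerprint with some subset of the weak bit positions toggled; how the weak positions themselves are identified is left open here, as the text only says they are the bits most likely to be wrong:

```python
from itertools import combinations

def candidate_fingerprints(fingerprint, weak_bits):
    """Return the fingerprint itself plus every variant obtained by
    toggling any subset of the given weak bit positions, so that a
    near-miss can still be matched against the timetable."""
    candidates = [fingerprint]
    for k in range(1, len(weak_bits) + 1):
        for subset in combinations(weak_bits, k):
            fp = fingerprint
            for b in subset:
                fp ^= 1 << b  # toggle one weak bit
            candidates.append(fp)
    return candidates
```

With w weak bits this yields 2^w candidates, so in practice w would be kept small.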
Another optimization could be to perform the operation not only on the frame pair N and N + Y, but on a set of pairs; those pairs that give non-expected peaks can then be filtered out. Additionally, or alternatively, instead of using a fixed frame number distance Y, the pair of frames that is used can be selected based on the strength of the spatial fingerprints, the distance then being a dynamic value. Yet another optimization could be to limit the search in the timetable 30 to a subset of the timetable 30, being the part that lies around the last time value given by a detected fingerprint 28.
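The last of these optimizations can be sketched as follows, for illustration only (not part of the claimed subject-matter); the representation of the timetable 30 as a list of (fingerprint, time) entries and the window size parameter are assumptions:

```python
def windowed_search(timetable, fingerprint, last_index, window):
    """Look the fingerprint up only in the slice of the timetable
    around the last matched position, instead of scanning the whole
    table. timetable is a list of (fingerprint, time) entries;
    returns (index, time) on a match, or None if no entry in the
    window carries the fingerprint."""
    lo = max(0, last_index - window)
    hi = min(len(timetable), last_index + window + 1)
    for i in range(lo, hi):
        if timetable[i][0] == fingerprint:
            return i, timetable[i][1]
    return None
```

Restricting the search this way both reduces the work per frame and lowers the chance of matching an identical fingerprint from a distant part of the content.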


CLAIMS:
1. A method of determining one or more characteristics of a content stream (16) comprising a sequence (18) of frames (20), the method comprising the steps of:
receiving the sequence (18) of frames (20),
for each of a plurality of frames (20) in the sequence (18) of frames (20):
- measuring an attribute of one or more portions (34) of the frame (20),
- calculating a fingerprint (28) from the measured attribute(s),
- accessing a timetable (30) of predetermined fingerprints (28) for the content stream (16), and
- identifying the calculated fingerprint (28) in the timetable (30), and
determining the or each characteristic of the content stream (16) from at least two of the calculated and identified fingerprints (28).
2. A method according to claim 1, wherein the step of determining the or each characteristic of the content stream (16) from at least two of the calculated and identified fingerprints (28) comprises identifying the speed and/or direction of the content stream (16).
3. A method according to claim 1 or 2, wherein the step of measuring an attribute of one or more portions (34) of the frame (20) comprises measuring a mean luminance of the portion(s) (34).
4. A method according to claim 1, 2 or 3, wherein the step of calculating a fingerprint (28) from the measured attribute(s) comprises generating a bit sequence from the measured attribute(s).
5. A method according to claim 4, wherein the generating of a bit sequence from the measured attribute(s) comprises performing a spatial difference operation on a pair of portions (34) of the frame (20).
6. A method according to any preceding claim, wherein the step of calculating a fingerprint (28) from the measured attribute(s) comprises generating a single value from the measured attribute(s).
7. A system for determining one or more characteristics of a content stream (16) comprising a sequence (18) of frames (20), the system comprising:
a receiver (24) arranged to receive the sequence (18) of frames (20), and
a processor (26) arranged, for each of a plurality of frames (20) in the sequence (18) of frames (20):
- to measure an attribute of one or more portions (34) of the frame (20),
- to calculate a fingerprint (28) from the measured attribute(s),
- to access a timetable (30) of predetermined fingerprints (28) for the content stream (16), and
- to identify the calculated fingerprint (28) in the timetable (30), and
to determine the or each characteristic of the content stream (16) from at least two of the calculated and identified fingerprints (28).
8. A system according to claim 7, wherein the processor (26) is arranged, when determining the or each characteristic of the content stream (16) from at least two of the calculated and identified fingerprints (28), to identify the speed and/or direction of the content stream (16).
9. A system according to claim 7 or 8, wherein the processor (26) is arranged, when measuring an attribute of one or more portions (34) of the frame (20), to measure a mean luminance of the portion(s) (34).
10. A system according to claim 7, 8 or 9, wherein the processor (26) is arranged, when calculating a fingerprint (28) from the measured attribute(s), to generate a bit sequence from the measured attribute(s).
11. A system according to claim 10, wherein the processor (26) is arranged, when generating a bit sequence from the measured attribute(s), to perform a spatial difference operation on a pair of portions (34) of the frame (20).
12. A system according to any one of claims 7 to 11, wherein the processor (26) is arranged, when calculating a fingerprint (28) from the measured attribute(s), to generate a single value from the measured attribute(s).
13. A computer program product on a computer readable medium for determining one or more characteristics of a content stream (16) comprising a sequence (18) of frames (20), the product comprising instructions for:
receiving the sequence (18) of frames (20),
for each of a plurality of frames (20) in the sequence (18) of frames (20):
- measuring an attribute of one or more portions (34) of the frame (20),
- calculating a fingerprint (28) from the measured attribute(s),
- accessing a timetable (30) of predetermined fingerprints (28) for the content stream (16), and
- identifying the calculated fingerprint (28) in the timetable (30), and
determining the or each characteristic of the content stream (16) from at least two of the calculated and identified fingerprints (28).
14. A computer program product according to claim 13, wherein the instructions for determining the or each characteristic of the content stream (16) from at least two of the calculated and identified fingerprints (28) comprise instructions for identifying the speed and/or direction of the content stream (16).
15. A computer program product according to claim 13 or 14, wherein the instructions for measuring an attribute of one or more portions (34) of the frame (20) comprise instructions for measuring a mean luminance of the portion(s) (34).
16. A computer program product according to claim 13, 14 or 15, wherein the instructions for calculating a fingerprint (28) from the measured attribute(s) comprise instructions for generating a bit sequence from the measured attribute(s).
17. A computer program product according to claim 16, wherein the instructions for generating a bit sequence from the measured attribute(s) comprise instructions for performing a spatial difference operation on a pair of portions (34) of the frame (20).
18. A computer program product according to any one of claims 13 to 17, wherein the instructions for calculating a fingerprint (28) from the measured attribute(s) comprise instructions for generating a single value from the measured attribute(s).
PCT/GB2009/000560 2008-03-14 2009-03-02 Processing a sequence of frames Ceased WO2009112803A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1016922.5A GB2472162B (en) 2008-03-14 2009-03-02 Processing in sequence of frames

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP08152736.8 2008-03-14
EP08152736 2008-03-14

Publications (1)

Publication Number Publication Date
WO2009112803A1 true WO2009112803A1 (en) 2009-09-17

Family

ID=40602417

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2009/000560 Ceased WO2009112803A1 (en) 2008-03-14 2009-03-02 Processing a sequence of frames

Country Status (2)

Country Link
GB (1) GB2472162B (en)
WO (1) WO2009112803A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002065782A1 (en) * 2001-02-12 2002-08-22 Koninklijke Philips Electronics N.V. Generating and matching hashes of multimedia content
WO2005006758A1 (en) * 2003-07-11 2005-01-20 Koninklijke Philips Electronics N.V. Method and device for generating and detecting a fingerprint functioning as a trigger marker in a multimedia signal
WO2007072326A2 (en) * 2005-12-23 2007-06-28 Koninklijke Philips Electronics N.V. Script synchronization using fingerprints determined from a content stream


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
OOSTVEEN J ET AL: "FEATURE EXTRACTION AND A DATABASE STRATEGY FOR VIDEO FINGERPRINTING", LECTURE NOTES IN COMPUTER SCIENCE, SPRINGER VERLAG, BERLIN; DE, vol. 2314, 11 March 2002 (2002-03-11), pages 117 - 128, XP009017770, ISSN: 0302-9743 *

Also Published As

Publication number Publication date
GB201016922D0 (en) 2010-11-24
GB2472162B (en) 2012-12-26
GB2472162A (en) 2011-01-26


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09720422

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 1016922

Country of ref document: GB

Kind code of ref document: A

Free format text: PCT FILING DATE = 20090302

WWE Wipo information: entry into national phase

Ref document number: 1016922.5

Country of ref document: GB

122 Ep: pct application non-entry in european phase

Ref document number: 09720422

Country of ref document: EP

Kind code of ref document: A1