
WO2008125861A2 - Video data transmission - Google Patents


Info

Publication number
WO2008125861A2
WO2008125861A2 (PCT/GB2008/001332)
Authority
WO
WIPO (PCT)
Prior art keywords
video frame
signatures
signature
portions
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/GB2008/001332
Other languages
French (fr)
Other versions
WO2008125861A3 (en)
Inventor
Phil Endecott
Nigel Dickens
David Milway
Brian Knight
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ADVENTIQ Ltd
Original Assignee
ADVENTIQ Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ADVENTIQ Ltd filed Critical ADVENTIQ Ltd
Publication of WO2008125861A2 publication Critical patent/WO2008125861A2/en
Publication of WO2008125861A3 publication Critical patent/WO2008125861A3/en
Priority to GB0917453A priority Critical patent/GB2460588A/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/14Digital output to display device ; Cooperation and interconnection of the display device with other functional units
    • G06F3/1454Digital output to display device ; Cooperation and interconnection of the display device with other functional units involving copying of the display data of a local workstation or window to a remote workstation or window so that an actual copy of the data is displayed simultaneously on two or more displays, e.g. teledisplay
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09GARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G2370/00Aspects of data communication
    • G09G2370/24Keyboard-Video-Mouse [KVM] switch

Definitions

  • the present invention relates to a method of transmitting video and in particular to a method of reducing the amount of video data transmitted between computers over a network.
  • Remote access software packages are available, such as VNC provided by RealVNC Ltd, that allow the content of one computer's display to be displayed, for instance in a window, on another remote computer's display over a network connection.
  • Such software can also communicate keyboard and mouse data in the reverse direction, thereby allowing full remote control.
  • VNC uses a protocol called RFB to send the display data over the network (such as a LAN or a WAN, such as the Internet).
  • RFB stands for "Remote Frame Buffer".
  • An objective of the design of RFB is to minimise the data that needs to be sent. For example, rather than simply sending the values of all of the pixels in every frame, it sends only those parts of the screen that have changed each time.
  • VNC operates entirely in software. In contrast, the present invention relates to systems in which video capture is carried out in hardware.
  • Such systems, including hardware video capture and sending display data over a network to a remote software viewer, are sometimes referred to as "KVM via IP", indicating that Keyboard, Video and Mouse data are sent via Internet Protocol over the network.
  • A hardware RFB encoder already exists, and is currently provided under the name AdderLink IP by Adder Technology Ltd.
  • the input to the AdderLink IP encoder is video from a standard PC monitor lead, and the input video data is captured into memory within the device.
  • Embedded software encodes the data from this memory in the RFB format and transmits the encoded data to the remote viewer computer via the network interface.
  • AdderLink IP attempts to identify and send only those regions of the display that have changed in each frame, in a process known as
  • the special case relates to where there is a translation of a portion of video data between a first frame and a second frame.
  • a translation is called a "copyrect" (meaning a copy rectangle, where "rectangle" includes both rectangular and square regions).
  • a copyrect is a message in the RFB protocol that tells the displaying computer to copy a region of its screen to a new location, rather than requiring the contents of that frame portion to be resent.
  • Copyrects are commonly generated when a user moves or scrolls a window. These are frequent operations which require only a handful of bytes of network data when encoded as copyrects, but would need many kilobytes of data if sent in any other way.
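  • As an illustrative sketch (not part of the original disclosure), the saving can be shown by packing an RFB CopyRect message. The wire format here, a 12-byte rectangle header with encoding type 1 followed by the 2-byte source x and y, follows the published RFB protocol; the function name is ours:

```python
import struct

def copyrect_message(dst_x, dst_y, w, h, src_x, src_y):
    # RFB rectangle header: x, y, width, height (u16 each) + encoding (s32);
    # encoding 1 selects CopyRect, whose payload is just src-x, src-y.
    header = struct.pack(">HHHHi", dst_x, dst_y, w, h, 1)   # 12 bytes
    payload = struct.pack(">HH", src_x, src_y)              # 4 bytes
    return header + payload

# Scrolling a 640x300 region up by 20 lines: 16 bytes as a copyrect,
# versus 640 * 300 * 3 = 576,000 bytes if resent as raw 24-bit pixels.
msg = copyrect_message(0, 100, 640, 300, 0, 120)
```

Sixteen bytes on the wire replace hundreds of kilobytes of pixel data, which is the "handful of bytes" saving described above.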
  • the detection of copyrects is alternatively referred to herein as motion detection.
  • Embodiments of the current invention provide a method and apparatus implementing a hardware encoder which can detect copyrects from incoming video data by comparing a new frame with the last one.
  • “hardware encoder” means a combination of hardware and embedded software as the split between hardware and software is arbitrary and dependent upon a particular implementation of the encoder.
  • the present invention relates to video from PCs (i.e. personal computers and similar). Work in the field of camera-image video (e.g., television), has quite different characteristics. Video compression formats such as MPEG make use of different forms of motion detection to identify regions that have moved from one frame to the next (e.g., camera panning, or a subject moving within the frame).
  • the above discussion of the prior art relates predominantly to a situation in which the PC video data to be encoded is in a digital format, for instance DVI (digital video interface).
  • DVI digital video interface
  • the video data received by the encoder is the same as the original data displayed upon the remote computer, for example when the input to the system is a DVI cable.
  • many PC video connections use an analogue signal. This can be converted to digital data by an analogue-to-digital converter in the hardware encoder.
  • this analogue data will typically include noise, including noise introduced during the analogue to digital conversion.
  • the process of encoding the data for use in a KVM via IP system is further complicated by the presence of noise.
  • Embodiments of the present invention provide more efficient detection methods. In particular embodiments of the present invention, all three techniques can be applied within a single system.
  • the present invention is not limited to a comparison of a first frame with a second immediately subsequent frame. Indeed, any two frames may be compared even if widely separated in time, for instance by intervening frames. Furthermore, the second frame may precede the first frame (again, possibly with intervening frames). The present invention further extends to instances in which the first and second frames are one and the same frame, for which differences are detected between portions of the same frame, or translated and copied portions of video data within the same frame can be detected.
  • Embodiments of the present invention provide improved KVM via IP systems for use with both digital and analogue PC video data.
  • the invention is not, however, limited to KVM via IP applications, and may be more generally applied for processing video data, and in particular PC video data, in other applications.
  • a method of detecting a translation of a portion of video data received from a video source comprising: computing first signatures for each of a plurality of portions of a video frame; computing a second signature for a portion of a video frame; comparing the second signature with the first signatures; and determining whether a portion of video data has been translated according to a result of said comparison.
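  • As a minimal illustration of this first aspect (our own sketch, with a placeholder signature function rather than the CRC described later in this document), the comparison step can be implemented as a lookup of the second signature in a table keyed by the first signatures:

```python
def signature(block):
    # placeholder signature: Python's built-in hash of the raw bytes;
    # a real encoder would use a CRC or similar fixed-width function
    return hash(bytes(block))

def find_translation(reference_blocks, new_block):
    """reference_blocks maps a position (x, y) to its pixel block.
    Returns the source position whose signature matches new_block,
    or None if the block is not a simple translation."""
    table = {signature(b): pos for pos, b in reference_blocks.items()}
    return table.get(signature(new_block))

reference = {(0, 0): [1, 2, 3, 4], (16, 0): [5, 6, 7, 8]}
# [5, 6, 7, 8] appearing elsewhere in the new frame is detected as a
# translation of the block originally at (16, 0)
```

Since distinct blocks can in principle share a signature, a practical encoder would confirm a signature match with an exact pixel comparison before emitting a copyrect.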
  • An advantage of the first aspect of the present invention is that by detecting translated portions of video data, a reduced amount of information relating to the amount by which the video data has been translated needs to be transmitted across a network in a KVM via IP system rather than the translated video data itself. This can provide significant reductions in the amount of information transmitted across a network for many typical applications, for instance where a remote computer is displaying conventional desktop publishing applications and a user is scrolling portions of the displayed data.
  • Said first signatures may be computed for a plurality of portions of a first video frame and said second signature is computed for a portion of a second video frame, the method comprising determining whether a portion of video data has been translated between the first and the second frames according to a result of said comparison.
  • said first signatures may be computed for a plurality of portions of a video frame and said second signature is computed for a different portion of the same video frame, the method comprising determining whether a portion of video data has been translated within the video frame according to a result of said comparison.
  • Said signatures may be computed from video data contained within the respective portions of a video frame.
  • the method may further comprise computing second signatures for each of a plurality of portions of a video frame, and comparing each second signature with the first signatures.
  • a second signature may be computed at every pixel position in a frame of video data.
  • the plurality of portions of the video frame for which first signatures are computed may comprise aligned and non-overlapping portions of the video frame.
  • the portions of each video frame may comprise rectangular video portions of a predetermined size.
  • the method may further comprise computing a magnitude of the translation of the video data.
  • the method may further comprise: storing at least some of the second signatures associated with aligned and non-overlapping portions of the video frame; and repeating the method using the stored second signatures in place of the first signatures.
  • the method may further comprise: identifying a first subset of the first signatures; comparing the first subset of first signatures with the or each second signature; and determining whether a region of the video frame has been translated according to a result of said comparison.
  • the first subset may comprise one or more first signatures for which a value of the signature is unique amongst the first signatures computed for the video frame.
  • the method may further comprise computing an amount of the translation of the region of the video frame.
  • the signatures may be computed using a rolling signature algorithm, which allows a new signature to be computed from a previous signature of an overlapping portion of the video frame together with the values of those pixels that are included in the previous overlapping portion of the video frame but not in the current portion, or that are included in the current portion but not in the previous overlapping portion.
  • the rolling signature algorithm may comprise a two-dimensional rolling signature algorithm.
  • the two dimensional rolling signature algorithm may comprise: calculating a first signature using video data for each of a plurality of columns of pixels; and calculating a second signature using each of the first signatures.
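  • The two-dimensional rolling scheme above can be sketched as follows (our own illustration: a plain sum stands in for the per-column signature so the incremental update is easy to see; a real implementation would use an invertible hash with better mixing):

```python
def column_signature(frame, x, y, h):
    # first signature: one value per h-pixel column of the window
    return sum(frame[y + i][x] for i in range(h))

def rolling_row_signature(frame, y, w, h):
    """Yield (x, signature) for every horizontal window position on row y.
    After the first window, each signature is derived from the previous one
    by dropping the leftmost column and adding the new rightmost column."""
    width = len(frame[0])
    cols = [column_signature(frame, x, y, h) for x in range(width)]
    sig = sum(cols[:w])                       # full computation at x = 0
    yield 0, sig
    for x in range(1, width - w + 1):
        sig += cols[x + w - 1] - cols[x - 1]  # rolling update, O(1) per step
        yield x, sig
```

The rolling update makes it affordable to compute a signature at every pixel position, as the method above requires, instead of recomputing each window from scratch.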
  • the method may further comprise: generating an encoded representation of a video frame; wherein the encoded representation of the video frame allows the identification of portions of video data that have been translated.
  • the method may further comprise: transmitting the encoded representation of the video frame over a computer network to a decoder associated with a remote computer; wherein the decoder is adapted to reproduce the video frame using the encoded representation of the video frame and a previously decoded representation of another video frame.
  • the method may further comprise: computing third signatures for each of a plurality of aligned and non-overlapping portions of a video frame; computing fourth signatures for each of a plurality of aligned and non-overlapping portions of a video frame; and comparing the third and fourth signatures to identify portions of a video frame that are the same as corresponding portions of another video frame; wherein an encoded representation of a video frame only includes an encoded representation of portions of the video frame that are not the same as portions of another video frame.
  • the method may further comprise: receiving analogue video data received from a video output of a further computer; converting the analogue video data to digital video data; computing fifth signatures for each of a plurality of portions of a video frame; computing sixth signatures for each of a plurality of portions of a video frame; comparing the fifth and sixth signatures to identify portions of a video frame that are the same as other portions of a video frame; wherein the signatures are computed using a function which provides substantially the same value irrespective of the presence of noise in the digital video data.
  • the method may further comprise: receiving a signal indicative of a user input from the remote computer and using the signal to operate the computer.
  • a carrier medium carrying computer readable code for controlling a data processing device to carry out the above described method.
  • a data processing device for detecting a translation of a portion of video data received from a video source, the apparatus comprising: a program memory storing processor readable instructions; and a processor configured to read and execute instructions stored in said program memory; wherein the processor readable instructions comprise instructions controlling the processor to carry out the above described method.
  • a method of processing analogue video data received from a video source comprising: converting the analogue video data to digital video data; computing first signatures for each of a plurality of portions of a video frame; computing second signatures for each of a plurality of portions of a video frame; comparing the first and second signatures to identify portions of a video frame that are the same as other portions of a video frame; wherein the signatures are computed using a function which provides substantially the same value irrespective of the presence of noise in the digital video data.
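  • The text above does not specify the noise-tolerant function. One hypothetical construction (our assumption, not the disclosed method) discards the low-order bits of each sample, where analogue-to-digital conversion noise concentrates, before applying an ordinary CRC:

```python
import zlib

def noise_tolerant_signature(pixels, drop_bits=3):
    # discard the low-order bits of each 8-bit sample before hashing, so
    # small ADC noise does not change the signature; note that a sample
    # sitting exactly on a quantisation boundary can still flip
    quantized = bytes((p >> drop_bits) << drop_bits for p in pixels)
    return zlib.crc32(quantized)

clean = [128, 66, 200, 11]
noisy = [129, 65, 201, 10]   # the same pixels with +/-1 LSB of noise
# both inputs quantise identically, so their signatures agree
```

The boundary-flip limitation is why a practical analogue encoder might combine such a signature with a tolerance-based pixel comparison rather than rely on it alone.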
  • An advantage of the fourth aspect of the invention is that by providing an efficient means of encoding analogue PC video data, the amount of information that needs to be transmitted across a network in a KVM via IP system can be reduced.
  • Said first signatures may be computed for a plurality of portions of a first video frame and said second signatures are computed for portions of a second video frame, the method comprising identifying portions of video data within the second video frame that are the same as corresponding portions of video data within the first video frame.
  • said first signatures may be computed for a plurality of portions of a video frame and said second signatures are computed for a different plurality of portions of the same video frame, the method comprising identifying portions of video data that are the same as other portions of video data within the same video frame.
  • the method may further comprise: generating an encoded representation of the video frame; wherein the encoded representation of the video frame allows the identification of portions of the video frame that are the same as corresponding portions of another video frame.
  • the method may further comprise: transmitting the encoded representation of a video frame over a computer network to a decoder associated with a remote computer; wherein the decoder is adapted to reproduce the video frame using the encoded representation of the video frame and a previously decoded representation of another video frame.
  • the method may further comprise: receiving a signal indicative of a user input from the remote computer and using the signal to operate the computer.
  • a carrier medium carrying computer readable code for controlling a data processing device to carry out the above described method.
  • a data processing device for processing analogue video data received from a video source, the apparatus comprising: a program memory storing processor readable instructions; and a processor configured to read and execute instructions stored in said program memory; wherein the processor readable instructions comprise instructions controlling the processor to carry out the method of any one of claims 24 to 29.
  • an apparatus for detecting a translation of a portion of video data received from a video source comprising: a signature generator operable to compute first signatures for each of a plurality of portions of a video frame and to compute a second signature for a portion of a video frame; a comparator operable to compare the second signature with the first signatures and to determine whether a portion of video data has been translated.
  • the signature generator may be operable to compute first signatures for a plurality of portions of a first video frame and said second signature for a portion of a second video frame, the comparator being operable to determine whether a portion of video data has been translated between the first and the second frames.
  • the signature generator may be operable to compute first signatures for a plurality of portions of a video frame and said second signature for a portion of the same video frame, the comparator being operable to determine whether a portion of video data has been translated within the same video frame.
  • the apparatus may comprise a hardware video encoder which is connectable to a video output of a computer.
  • the hardware video encoder may comprise a program memory storing processor readable instructions; and a processor configured to read and execute instructions stored in said program memory such that the processor comprises the signature generator and the comparator.
  • the signature generator may be operable to compute signatures from video data contained within the respective portions of a video frame.
  • the signature generator may be operable to compute second signatures for each of a plurality of portions of a video frame, and the comparator is operable to compare each second signature with the first signatures.
  • the signature generator may be operable to compute a second signature at all pixel locations within a video frame except pixel locations at a top edge or at a left hand edge of the video frame.
  • the plurality of portions of a video frame for which first signatures are computed may comprise aligned and non-overlapping portions of a video frame.
  • the comparator may be in the form of a pipeline comprising a series of processing stages arranged to compare a second signature iteratively against a plurality of first signatures to determine whether a portion of video data has been translated.
  • the apparatus may implement a binary tree storage structure for storing the plurality of first signatures, arranged such that at each processing stage the pipeline is operable to compare the second signature with a single first signature, the number of stages being less than the number of first signatures.
  • Each processing stage may comprise a memory, a comparator and a logic device.
  • the logic device may implement a function which: appends a 0 to an address and sets a match signal, if a preceding stage has indicated an exact match; appends a 1 to the address and sets the match signal, if the comparator of the stage indicates an exact match between a data signal and the content of the memory of the stage; or appends a 0 to the address and does not set the match signal, if the comparator of the stage indicates that the data signal is less than the content of the memory of the stage; or appends a 1 to the address and does not set the match signal, if the comparator of the stage indicates that the data signal is greater than the content of the memory of the stage; wherein a final match signal is indicative of whether a portion of video data has been translated, and a final address is indicative of the first position of the translated portion of video data within the video frame.
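  • The stage function above can be simulated in software (our sketch, assuming the binary tree holds one level per pipeline stage and the number of signatures is 2^d - 1; the address bits accumulate exactly as in the four rules listed):

```python
def levels_from_sorted(sorted_sigs):
    # lay a sorted list of 2**d - 1 signatures out as a complete binary
    # search tree, one list per pipeline stage (level 0 is the root)
    d = (len(sorted_sigs) + 1).bit_length() - 1
    return [[sorted_sigs[(2 * i + 1) * 2 ** (d - 1 - k) - 1]
             for i in range(2 ** k)] for k in range(d)]

def pipeline_search(levels, key):
    """Each iteration models one pipeline stage: read the memory entry
    addressed so far, then append one address bit per the stage rules."""
    addr, match = 0, False
    for level in levels:
        node = level[addr]
        if match:                 # earlier stage matched: append 0
            addr = addr << 1
        elif key == node:         # exact match here: append 1, set match
            addr = (addr << 1) | 1
            match = True
        elif key < node:          # key is less: go left, append 0
            addr = addr << 1
        else:                     # key is greater: go right, append 1
            addr = (addr << 1) | 1
    return match, addr
```

Seven stored signatures are searched in three stages, and because each stage touches only its own memory, a new second signature can enter the pipeline on every cycle.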
  • the apparatus may further comprise: a transmitter arranged to transmit an encoded representation of a video frame to a decoder associated with a remote computer; wherein the encoded representation of the video frame allows the identification of portions of video data that have been translated and wherein the decoder is adapted to reproduce the video frame using the encoded representation of the video frame and a previously decoded representation of another video frame.
  • the apparatus may further comprise: a second signature generator operable to compute third signatures for each of a plurality of aligned and non-overlapping portions of a video frame and fourth signatures for each of a plurality of aligned and non-overlapping portions of a video frame; and a second comparator operable to compare the third and fourth signatures to identify portions of a video frame that are the same as other portions of a video frame; wherein an encoded representation of a video frame allows the identification of portions of the video frame that are the same as portions of another video frame.
  • the apparatus may further comprise: an analogue to digital converter arranged to convert analogue video data received from a video output of a further computer to digital video data; a third signature generator operable to compute fifth signatures for each of a plurality of portions of a video frame and to compute sixth signatures for each of a plurality of portions of a video frame; and a third comparator operable to compare the fifth and sixth signatures to identify portions of a video frame that are the same as other portions of a video frame; wherein the fifth and sixth signatures are computed using a function which provides substantially the same value irrespective of the presence of noise in the digital video data.
  • the apparatus may further comprise: a receiver for receiving a signal indicative of a user input from the remote computer and using the signal to operate the computer.
  • an apparatus for processing analogue video data received from a video source comprising: an analogue to digital converter arranged to convert analogue video data to digital video data; a signature generator operable to compute first signatures for each of a plurality of portions of a video frame and to compute second signatures for each of a plurality of portions of a video frame; and a comparator operable to compare the first and second signatures to identify portions of a video frame that are the same as other portions of a video frame; wherein the first and second signatures are computed using a function which provides substantially the same value irrespective of the presence of noise in the digital video data.
  • Said signature generator may be operable to compute first signatures for a plurality of portions of a first video frame and second signatures for portions of a second video frame, the comparator being operable to identify portions of video data within the second video frame that are the same as corresponding portions of video data within the first video frame.
  • Said signature generator may be operable to compute first signatures for a plurality of portions of a video frame and second signatures for portions of the same video frame, the comparator being operable to identify portions of video data within the video frame that are the same as other portions of video data within the video frame.
  • the apparatus may further comprise: a transmitter arranged to transmit an encoded representation of the video frame to a decoder associated with a remote computer; wherein the encoded representation of the video frame allows the identification of portions of video data that have been translated, and wherein the decoder is adapted to reproduce the video frame using the encoded representation of the video frame and a previously decoded representation of another video frame.
  • the apparatus may further comprise: a receiver for receiving a signal indicative of a user input from the remote computer and using the signal to operate the computer.
  • Embodiments of the present invention may be implemented in software.
  • a carrier medium carrying computer readable code for controlling a computer or a bespoke hardware encoder to carry out the above aspects of the invention may be provided.
  • a computer apparatus comprising a program memory storing processor readable instructions and a processor configured to read and execute instructions stored in said program memory.
  • the processor readable instructions stored in said program memory may comprise instructions controlling the processor to carry out the above aspects of the invention.
  • the computer apparatus may comprise bespoke hardware.
  • certain embodiments of the present invention are implemented in a bespoke encoder, which comprise bespoke hardware and embedded software for processing video data.
  • FIG. 1 schematically illustrates a network within which a KVM via IP system in accordance with certain embodiments of the present invention can be implemented
  • FIG. 2 schematically illustrates a KVM via IP video encoder suitable for video data which is in a digital format, in accordance with an embodiment of the present invention
  • Figure 3 schematically illustrates a rolling signature generator forming part of the video encoder of Figure 2
  • Figure 4 schematically illustrates a KVM via IP video encoder suitable for video data which is in an analogue format, in accordance with a further embodiment of the present invention
  • Figure 5 schematically illustrates an analogue signature generator forming part of the video encoder of Figure 4;
  • Figure 6 shows a schematic block diagram of a stage of an apparatus for implementing a pipelined process for detecting translated portions of video data in frames of PC video data, in accordance with an embodiment of the present invention.
  • Figure 7 shows a graphical representation of the storage of data for use within the pipelined process partially implemented by the apparatus of Figure 6.
  • FIG. 1 this schematically illustrates an example of a computer network within which a KVM via IP system in accordance with embodiments of the present invention can be implemented.
  • a first computer 100 is connected to a decoder 160 via a network 150.
  • the network 150 comprises an Internet Protocol (IP) network.
  • the first computer 100 generates PC video data that may be displayed upon a local video display device connected to the first computer.
  • the decoder 160 may comprise a stand alone data processing device, for instance a hardware decoder (where the term "hardware" should be interpreted broadly to include a hardware processor arranged to implement embedded software). Alternatively, the decoder 160 may be connected to a computer to process the decoded video data. The decoder 160 may be integrated with a computer. A decoder 160 in combination with a display device 161 may be referred to as a KVM terminal. PC video data is encoded via a hardware encoder 110 and transmitted to the decoder 160 for display upon a display device 161 to a user located there.
  • user input data for instance derived from user input devices such as a computer keyboard 162 or mouse 163 connected to the decoder 160 (optionally via a separate computer), is transmitted from the decoder 160 to the first computer 100 via the network 150 in order to control the first computer 100.
  • a user operating the keyboard 162 or mouse 163 can interact with the first computer 100 much as if they were local to it. Since network speed is typically limited, denser encoding of the video data has the benefit of improving the responsiveness of the system and hence the quality of the user's experience.
  • the network 150 may comprise any form of network, such as a local area network (LAN) or a wide area network (WAN), for instance the internet.
  • the decoder 160 may optionally be arranged to receive encoded video data from a plurality of computers 100 each connected to the network 150 by separate encoders 110 or by a single encoder 110 connected via a switch to all of the computers 100.
  • Raw PC video data from the first computer comprises a sequence of frames, each made up of a sequence of rows and each row made up of a sequence of pixels.
  • Raw video data can be encoded to a compressed format in order to reduce the amount of data needed for the representation while retaining the same or similar appearance.
  • Embodiments of the present invention relate to efficient methods and apparatuses for performing such encoding.
  • the invention is not, however, limited to KVM via IP applications, and may be more generally applied for processing video data, and in particular PC video data, in other applications.
  • Figure 2 shows the organisation of the video processing portion of a KVM via IP system for DVI digital video incorporating the current invention.
  • Figure 2 illustrates a hardware PC video data encoder 110 for encoding digital PC video data prior to transmission across a network.
  • the term "hardware encoder” is not to be interpreted as the encoder processing video data exclusively in hardware.
  • a practical hardware encoder typically includes a processor adapted to implement embedded software for performing at least part of the encoding.
  • the hardware encoder illustrated in Figure 2 is particularly adapted for processing DVI PC video data.
  • FIG. 2 illustrates a computer 100 which outputs digital video in DVI format which in turn is input to a KVM via IP video encoder 110.
  • the KVM via IP video encoder 110 encodes this DVI video data using encoding methods in accordance with embodiments of the present invention and outputs encoded video data to a network 150 which transmits the encoded data to a remote decoder 160 connected to a remote display 170.
  • Keyboard and mouse data for controlling computer 100 may be transmitted back across network 150, in accordance with known techniques that will not be further described here.
  • FIG. 2 further illustrates the internal organisation of the KVM via IP video encoder 110 as will now be described.
  • Incoming DVI video data to the encoder 110 is first decoded to raw video data by means of a DVI receiver 112.
  • the video encoder 110 shown in Figure 2 could be modified to operate with any other form of digital PC video signal by replacing the DVI receiver 112 with an appropriate receiver.
  • the raw video data is in raster format, that is with pixels represented in order from left to right and from top to bottom of a video frame.
  • the present invention is not limited to any particular format for the raw video data, beyond the raw video data being stored in a digital format.
  • the raw video data is stored in a frame buffer memory 114 from where it can be read by an encoder 116
  • the encoder 116 is responsible for generating the encoded video data to be transmitted across the network, as will be described below.
  • the raw video data is also sent to a region signature generator 118 which computes a 32-bit signature using a cyclic redundancy check (CRC) for each aligned non-overlapping region of a frame (e.g. 16x16 pixel squares or some other size or shape). That is, the frame of video data is subdivided into a series of separate video data portions, for each of which a separate 32-bit region signature is generated using a CRC.
  • the generated region signatures are stored in a region signature memory 120 from where they can be read by the encoder 116.
  • the 32-bit CRC specified by clause 3.2.8 of IEEE standard 802.3 is used to generate region signatures, although it will be appreciated that there are a number of other suitable known signature types.
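By way of an illustrative software sketch (not the patent's hardware implementation), the per-region CRC signatures can be modelled using the IEEE 802.3 CRC-32, which Python exposes as `zlib.crc32`. The one-byte-per-pixel raster frame layout used here is a simplification of real 16- or 24-bit pixels:

```python
import zlib

TILE = 16  # region size in pixels (16x16 is the example size given above)

def region_signatures(frame, width, height, tile=TILE):
    """Compute a 32-bit CRC signature for each aligned, non-overlapping
    tile of a frame. `frame` is a bytes object holding one byte per
    pixel in raster order (a simplification for illustration)."""
    sigs = {}
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            # Gather the tile's rows from the raster-ordered frame.
            data = b"".join(
                frame[(ty + r) * width + tx : (ty + r) * width + tx + tile]
                for r in range(tile)
            )
            # zlib.crc32 implements the IEEE 802.3 CRC-32 polynomial.
            sigs[(tx, ty)] = zlib.crc32(data) & 0xFFFFFFFF
    return sigs
```

Two frames then yield two tables of region signatures that can be compared region by region.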
  • the raw video data from the DVI receiver 112 is also sent to a rolling signature generator 122 which computes a 16-bit signature for each region (e.g. 16x16 pixel squares or some other size or shape) of a first frame.
  • the rolling signatures that are computed for aligned non-overlapping region positions corresponding to the same non-overlapping portions of the screen operated upon by the region signature generator 118 are stored in an aligned rolling signature memory 124 from where they can be read by the encoder 116.
  • a second frame is studied and rolling signatures from the rolling signature generator 122 are generated for both aligned and unaligned regions. That is, a rolling signature is generated for a plurality of pixel locations, except those near to the edge of the frame. Those signatures that correspond to aligned non- overlapping regions are stored in a further signature memory 126. In certain embodiments of the present invention rolling signatures are generated for every pixel location.
  • data previously stored in the signature memory 126 for a preceding frame, derived from the aligned rolling signature memory 124, is compared with the rolling signature generated for each pixel of the present frame to identify whether any rolling signature generated for a pixel position in the new frame matches a rolling signature for an aligned position in the old frame. A matched rolling signature corresponds to a detected copyrect. When a match is found, the rolling signature value and the coordinates of the region of the previous frame it corresponds to are sent via a run-length encoder 128 to a match log memory 130, which records details of detected copyrects.
  • the match log memory 130 records details of the pairs of matched signatures stored for the first frame in the aligned rolling signature memory 124 and currently generated by the rolling signature generator 122 for the second frame.
  • the run-length encoder 128 is used to limit the amount of memory required to store this information, but may be omitted in certain embodiments of the invention.
  • the amount of match log data that needs to be stored can be reduced by coalescing neighbouring match log entries, for neighbouring regions of the video frame that contain the same pixel values, into a single entry augmented with a count of the number of neighbouring matched regions. From the match log memory 130 this data regarding detected copyrects can be read by the encoder 116.
  • the encoder 116 first reads data stored in the region signature memory 120 for the current frame and compares it with the corresponding data for a previous frame retained within the region signature memory 120.
  • the encoder 116 performs differencing, that is detecting portions of the frame that have not changed from the preceding frame. When the signatures are equal, it is determined that portion of the screen has not changed, and no data corresponding to it is included in the encoded video data output by the encoder 116.
  • the remote decoder 160 is operable to detect those portions of the screen that are not included within the encoded video signal and to copy that portion of the frame from the previous frame generated at the remote computer. Thus for portions of a video image that change infrequently (for instance menu bars), the video data need not be continually retransmitted.
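The differencing step above can be sketched as a straightforward comparison of two frames' region-signature tables (mappings from tile coordinates to signature values; the names are illustrative):

```python
def changed_regions(prev_sigs, curr_sigs):
    """Differencing: return the coordinates of regions whose signature
    differs from the previous frame. Only these regions need any data
    in the encoded output; the decoder reuses the previous frame for
    every region that is absent from the output."""
    return [pos for pos, sig in curr_sigs.items() if prev_sigs.get(pos) != sig]
```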
  • when the encoder 116 determines that a portion of the screen has changed, it next reads data stored in the match log memory 130 to determine whether that portion of the screen matches any portion of the previous frame (i.e. it performs motion detection by detecting copyrects), or any data that is cached by the viewer.
  • the process of detecting copyrects is described in greater detail below.
  • the receiver of the encoded video data retains certain old data (a "cache"); it may, for example, retain a certain number of the most recent frames of data (as well as the immediately preceding frame), or a certain number of most recent bytes of data, or certain data specifically indicated for retention by the encoder.
  • a new frame contains content that has occurred before and is stored in the receiver's cache the encoder need not send the content but can instead place a code in the compressed data indicating that the cached data should be retrieved and re-used.
  • the signatures are used to facilitate efficient encoding with caching in the present invention.
  • the encoder 116 keeps track of the signatures of the content to be stored in the receiver's cache and marks the corresponding locations in the signature memory to this effect.
  • when a match is found against one of these marked signatures, a match log entry is stored. These match log entries are subsequently used to place a cache code in the compressed video data transmitted across the network 150.
  • the signatures themselves can be used as identifiers for the cached content.
  • When a copyrect is detected, the encoder 116 outputs into the encoded video data a code describing the match, rather than having to include the whole of the video data for that portion of the frame. Information about regions that have moved by the same displacement can be identified and merged in order to compress the data further. After processing the current frame to detect unchanged frame regions and copyrects, if the region signature memory 120 indicates that a portion of the screen has changed and the match log memory 130 provides no information about matches for that area, the encoder 116 reads the raw video data from the frame buffer memory 114. The encoder 116 may apply additional known forms of encoding or compression to this raw data using conventional techniques before outputting the video data into the encoded video data stream transmitted across the network.
  • the encoder 116 also updates the signature memory 126 with the current contents of the aligned rolling signature memory 124, in order to detect copyrects for the following frame, so that rolling signatures at aligned positions in the current frame will result in match log entries when they are found in the next frame to be processed.
  • the rolling signature function used is chosen for the property that it can be economically updated by storing a new value and retrieving an old value without needing to recompute the signature from scratch. That is, if we know f(D0...Dn) then f(D1...Dn+1) can be economically computed given D0 and Dn+1.
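A minimal sketch of this update property, using a simple additive windowed sum in place of the patent's actual signature function:

```python
def full(window, mask=0xFFFF):
    """Reference: compute the windowed signature from scratch."""
    return sum(window) & mask

def roll(sig, old, new, mask=0xFFFF):
    """Economical update: given f(D0...Dn), compute f(D1...Dn+1) using
    only the outgoing value D0 and the incoming value Dn+1, without
    revisiting the values in between."""
    return (sig - old + new) & mask
```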
  • a signature function is a general term for a function that computes an identifier for a chunk of data.
  • An example of a signature function is the CRC used within the region signature generator 118. If two portions of data have the same signature, then it is likely that the data itself is the same. However, there is always a chance that two different chunks of data will have the same signature. This is called a collision. Fallback methods can be used to cope in the rare case when a signature collision occurs. However, the more bits that are used in the signature, the less likely are collisions and so with sufficient bits in the signature, the need to use fallback methods can be obviated.
  • the signature function used to detect copyrects in the embodiment of the invention shown in Figure 2 is a 16-bit rolling signature, generated by the rolling signature generator 122.
  • Embodiments of the present invention compute and store signatures (each having a length of, e.g., 16 to 64 bits) for rectangular or square regions of the display (e.g., 16 by 16 pixels). It will be appreciated, however, that any shape that can be tiled across a frame may alternatively be used, though only rectangular frame regions will be described for simplicity.
  • Using signatures allows regions that have changed to be detected efficiently.
  • the signatures for a new frame of video data are compared with the signatures for the last frame processed, which is typically the immediately preceding frame, in order to detect copyrects. This requires less data processing than a method based simply on comparing pixels. For example, one 32-bit signature can be compared for a 16x16 pixel block, rather than 256 16-bit pixel comparisons, providing a 100-fold saving.
  • the motion-detection algorithm is not limited to detecting only copyrects that are aligned to pixel region boundaries, since users can move or scroll windows with any pixel displacement. As discussed above, this is achieved by computing signatures for an old video frame only for aligned pixel regions. For the new video frame, rolling region signatures are computed at a large number of pixel positions, and the signatures for the new frame are compared as they are generated with the stored signatures for the last frame. This algorithm allows copyrects with any displacement to be detected, not only displacements that are a multiple of the pixel region size, and stores and compares only a relatively small number of signatures.
  • the rolling signature function used in the current invention is similar to, though simpler than, the Adler-32 [RFC 1950], Fletcher [RFC 1146] and Rsync [A. Tridgell, P. Mackerras, "The rsync algorithm", Technical report, Australian National University] algorithms. These known algorithms relate only to the processing of one dimensional data. There is no suggestion within these algorithms or within the cited technical report that they could be modified to be applied to two dimensional data. It will be appreciated that the precise signature function chosen may vary.
  • the method of detecting copyrects uses a 2D rolling signature algorithm to generate a 2D rolling signature by first computing a set of 1D rolling signatures for columns, and then taking a 1D rolling signature of those signatures. More particularly, a rolling signature is maintained for each pixel column of the screen, relating to the region of the frame identified by the current pixel. From the incoming PC video data for the new frame, as each pixel value arrives (in raster-scan order) the pixel data value is incorporated into the appropriate column signature, and the pixel value that arrived one tile height of rows earlier within the same frame is removed from the signature for that column.
  • a single rolling signature is calculated for the other dimension, i.e., across the pixel rows for the region of the frame identified by that pixel.
  • the row signature is recalculated with the new column signature, and the column signature from m columns earlier is removed.
  • this signature is a rolling signature for the region of the frame of which the current pixel is the bottom-right corner. That is, a signature is computed for the sub-frame region associated with each pixel, so that signatures are produced for regions of the frame at every offset, not just for aligned, non-overlapping portions of the screen.
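The two-stage column-then-row scheme can be sketched as follows, again using additive sums rather than the patent's actual signature functions (which also involve a pixel hash and an exclusive OR). The sketch produces, for every pixel far enough from the top and left edges, a signature of the n-by-m tile of which that pixel is the bottom-right corner:

```python
def tile_signatures_2d(frame, width, height, n, m, mask=0xFFFF):
    """2D rolling signature sketch: a 1D rolling signature per pixel
    column (over the last m rows), then a 1D rolling signature across
    the last n column signatures. `frame` is a flat list of pixel
    values in raster order; n and m are the tile width and height."""
    col = [0] * width  # one rolling column signature per screen column
    sigs = {}
    for y in range(height):
        row_sig = 0    # rolling signature across column signatures
        for x in range(width):
            p = frame[y * width + x]
            # incorporate the new pixel; drop the pixel m rows earlier
            old = frame[(y - m) * width + x] if y >= m else 0
            col[x] = (col[x] + p - old) & mask
            # incorporate the new column signature; drop the one from
            # n columns earlier (already updated for this row)
            row_sig = (row_sig + col[x] - (col[x - n] if x >= n else 0)) & mask
            if x >= n - 1 and y >= m - 1:
                sigs[(x, y)] = row_sig
    return sigs
```

Each signature equals the sum (mod 2^16 here) of the pixels in the n-by-m region ending at that pixel, yet only one column update and one row update are performed per arriving pixel.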
  • a content-addressable memory can be used for the rolling signature memory 126.
  • one location is used for each signature actually stored, rather than one for each possible signature. This makes it possible to use a signature with more bits and hence less likelihood of collisions.
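In software terms, a memory with one location per signature actually stored, rather than one per possible signature value, is an associative lookup; a dict-based sketch with illustrative names:

```python
class SignatureCAM:
    """Associative store keyed by signature value: one entry per stored
    signature rather than one per possible signature, so widening the
    signature costs no extra memory (a software model of the CAM)."""
    def __init__(self):
        self.by_sig = {}

    def store(self, sig, x, y):
        # remember where in the previous frame this signature occurred
        self.by_sig.setdefault(sig, []).append((x, y))

    def lookup(self, sig):
        # all previous-frame positions whose region had this signature
        return self.by_sig.get(sig, [])
```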
  • FIG. 3 illustrates the internal structure of the rolling signature generator 122.
  • Raw video data derived from the DVI receiver is first processed by a hash function 200, which outputs compacted pixel values for each new pixel value in a frame.
  • Digital PC video data pixel values are processed in raster-scan order: typically left to right and top to bottom. Once the pixels in the last row of a frame have been transmitted, the pixel values for the next frame are transmitted, starting again at the first row and the first column.
  • the purpose of the hash function is to reduce the amount of data to be stored while generating the rolling signature.
  • the compacted pixel values are stored temporarily in a buffer 202 so that a pair of compacted pixel values can be presented to a column rolling signature function 204 corresponding to one (new) pixel and another (old) pixel from m rows earlier (where m is equal to the number of rows of pixels within each tile). That is, each pixel is presented to the column rolling signature function 204 together with the corresponding compacted pixel value from one tile height earlier within the same frame.
  • the column rolling signature function 204 calculates a separate rolling signature for each column of pixels within a region of a video frame identified by the current pixel.
  • a column rolling signature memory 206 stores current rolling signatures for each column of the screen. Given the current rolling signature, the new compacted pixel value and the old compacted pixel value (the corresponding compacted pixel value from one tile height earlier within the same frame) the column signature function 204 computes a new column rolling signature. The new column rolling signature is sent to the column rolling signature memory 206 where it replaces the previous signature for that column. By updating the column signature in this way each time a pixel is changed, the column rolling signature function 204 provides an efficient means of computing signatures of regions of an image (the regions of an image being referred to herein as "tiles") providing maximal re-use of a previously computed signature in the creation of a signature for a neighbouring region.
  • Each new column rolling signature is in turn also stored temporarily in a buffer 208 so that a pair of values can be presented to a region rolling signature function 210 corresponding to a new column signature and an old column signature from n columns earlier (where n is the number of columns in each tile). That is, a new column signature is presented to the region rolling signature function with the column signature from n columns earlier within the same frame. Given the current region rolling signature, which it stores internally, and the new and old column rolling signatures, the region rolling signature function 210 computes the new region rolling signature.
  • each time a new pixel value is transmitted by the hash function, the region rolling signature function 210 generates a rolling signature for the region of the frame identified by that pixel (that is, rolling signatures are generated for every overlapping portion of video data), except near the top and left edges of a frame.
  • the rolling signature is a function of all the pixels in the region of the frame of which the current pixel is the bottom right corner.
  • the process of creating the rolling signatures reduces the information encoded within a tile of pixels to a small number of compacted pixel values. If, by way of example, the compacted pixels are each represented by four bits of information, then for a region of the video frame that is all the same colour there are at most sixteen different tile signatures. Consequently there would be an unacceptably high probability of a signature collision when all of the pixels in such a tile move from one position to another within a frame.
  • the region rolling signature is combined with the raw video data for one pixel by means of a bitwise exclusive OR operation by the exclusive OR function 212 to produce the final output of the rolling signature generator 122.
  • some signatures may be identified as "special" by storing a different value than normal in the signature memory 126. For example, a random subset of the signatures or those signatures that occurred only once in the first frame may be selected.
  • when a match against one of these special signatures is found, the match log entry is stored in a separate part of the match log from other match log entries.
  • a subset of the matches is recorded and built up in this separate part of the match log; these can be more readily assessed to identify the nature of any translations of regions of video data between two frames. For example, if the user has scrolled a window within an application currently being displayed on the first computer, it will normally be possible to identify the distance that the window contents have moved by studying only a few of the regions that fall within the window.
  • the "non-special" part of the match log is partitioned according to the position of the match in the second frame relative to the region dimensions.
  • match log entries are written to memory in contiguous locations as the matches are found.
  • Each match log entry is augmented with the address, or part of the address, of the last entry in the same partition. It is then possible to study all of the match log entries in a particular partition by following this chain of addresses.
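A sketch of such a partitioned match log, in which entries are appended to contiguous locations as matches are found and each entry carries the address of the previous entry in its partition, so that a partition can later be walked as a chain. The partition key (match position modulo the region dimensions) follows the description above; the entry layout is hypothetical:

```python
class MatchLog:
    """Partitioned match log: contiguous append, chained partitions."""
    def __init__(self, tile=16):
        self.tile = tile
        self.entries = []            # contiguous match log memory
        self.last_in_partition = {}  # partition key -> last entry address

    def add(self, x, y, src):
        """Record a match at (x, y) against source region `src`."""
        part = (x % self.tile, y % self.tile)
        prev = self.last_in_partition.get(part)
        addr = len(self.entries)
        # each entry is augmented with the address of the previous
        # entry in the same partition (None terminates the chain)
        self.entries.append((x, y, src, prev))
        self.last_in_partition[part] = addr

    def partition(self, x, y):
        """Walk the chain to yield all entries in one partition."""
        addr = self.last_in_partition.get((x % self.tile, y % self.tile))
        while addr is not None:
            e = self.entries[addr]
            yield e[:3]
            addr = e[3]
```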
  • FIG. 4 schematically illustrates the organisation of the video processing part (that is, a hardware encoder) of a KVM via IP system for analogue video in accordance with an embodiment of the present invention.
  • a computer 100 outputs analogue PC video data which is input to an analogue KVM via IP video encoder 310.
  • PC video data is converted from digital data to analogue form by a converter within a computer for transmission to a display device.
  • the display device can then convert the data back to a digital format for display. Small amounts of noise may be inadvertently introduced at each stage of this conversion process.
  • the KVM via IP video encoder 310 encodes this analogue video data and outputs encoded video data to a network 150 which transmits it to a remote decoder 160 connected to a remote display 170.
  • FIG 4 further illustrates the internal organisation of the KVM via IP video encoder for analogue video 310.
  • Incoming analogue video data is first digitised by means of an analogue to digital converter 312.
  • the raw video data is stored in a frame buffer memory 314 from where it can be accessed by an encoder 316.
  • the raw video data is in the same format as for the raw video data in Figure 2 after the DVI receiver 112.
  • it is generally not appropriate to apply a rolling signature generator to the raw video data shown in Figure 4 because the analogue to digital converter 312 produces a digital signal that is only an approximation to the analogue signal due to the addition of noise.
  • the raw video data is also sent to an analogue signature generator 318 which computes a 64-bit analogue signature for each aligned non-overlapping region (e.g. 16x16 squares or some other size or shape) of the frame.
  • analogue region signatures are stored in a region signature memory 320 from where they can be read by the encoder 316.
  • the encoder 316 first reads data stored in the region signature memory 320 and compares it with the corresponding data for the previous frame also stored in the region signature memory 320 (i.e. it performs differencing in the same way as the encoder 116 using signature data from memory 120 in Figure 2). When the pair of signatures are equal (within a predetermined threshold chosen to allow for the noise present in the analogue input), it is determined that a portion of the frame has not changed, and no data corresponding to that portion of the frame is included in the encoded video data output by the encoder 316.
  • when the encoder 316 determines that a portion of the screen has changed, it reads raw video data for that portion of the frame from the frame buffer memory 314. The encoder 316 may apply additional types of encoding or compression to this raw data in accordance with known techniques before outputting the video data into the encoded video data transmitted across network 150 to remote decoder 160 for displaying on remote display 170.
  • FIG. 5 shows in greater detail the internal organisation of the analogue signature generator 318.
  • Each of the sub-components within the analogue signature generator 318 may be optionally enabled or disabled. That is, an analogue signature generator 318 may comprise only certain of the processing steps described below.
  • a signature function that can compute signatures for analogue signals such that small perturbations to the input due to noise result in small or zero changes to the output is used.
  • Raw video data input to the analogue signature generator 318 is first optionally (and if appropriate for the raw data format) processed by an RGB to YUV conversion function 410 which converts the video data from red-green-blue format with 8 bits for each colour component to luminance/chrominance format with more bits for luminance and fewer bits for chrominance.
  • the data is next optionally processed by an edge scaling function 412 which is arranged to double the magnitude of values corresponding to pixels at the edges of the regions into which the frame is divided for the purposes of signature calculation. This increases the sensitivity of the signature to movement of elements of the image that only slightly overlap the region, and would therefore otherwise have little influence on the signature.
  • the data is next differentiated by a differentiator 414. That is, for each of the three colour components, the difference between the current value and the last processed value (the value for the last processed pixel) is calculated. Pixel values at the leading edge of a frame tile may have come from a neighbouring tile. In order to prevent the signatures of pixels in one tile affecting those for a neighbouring tile, values at the leading edge of a tile are set to a zero output.
  • the output of the differentiator 414 is six values: the three original colour components and the three differentiated colour components.
  • a pseudo-random code generator 416 uses a linear feedback shift register to generate a pseudo-random code for each pixel position.
  • This code is a function of the pixel's position within its region and the initial value of the linear feedback shift register, but is the same for corresponding pixels within different regions of a frame.
  • This code pattern may alternatively either be chosen at random or be based upon prior evaluation of the system's performance using that code pattern with typical video sequences.
  • the pseudo-random code generator is reset at the first column of the tile. The value set is dependent upon the row number of the tile within the frame. Consequently, the same pseudo-random code is used for each tile in each frame that is in the same position within the display.
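A sketch of a per-tile-row pseudo-random code generator built on a 16-bit Fibonacci linear feedback shift register. The tap positions, seeding rule and number of code bits are assumptions for illustration; the patent does not specify them:

```python
def lfsr16(state):
    """One step of a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11, a
    maximal-length polynomial; the specific taps are an assumption)."""
    bit = ((state >> 0) ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

def tile_codes(tile_row, tile=16):
    """Pseudo-random code for each pixel position within a tile. The
    seed depends only on the tile's row number within the frame, so
    every tile in the same horizontal band of the display receives the
    same sequence of codes, as the description above requires."""
    state = (tile_row + 1) & 0xFFFF  # reset value per tile row (never 0)
    codes = []
    for _ in range(tile * tile):
        state = lfsr16(state)
        codes.append(state & 0x7)    # a few bits select scale/permutation
    return codes
```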
  • a scaler 418 optionally multiplies the six values output by the differentiator 414 by either -1, +1/2, +1 or +3/2, as determined by the code supplied by the pseudo-random code generator 416.
  • a permutator 420 optionally permutes the six values output by the scaler 418, along with two zero inputs, in a pattern determined by the pseudo-random code generator 416, to output eight values.
  • the two zero inputs are added to prevent the output of the accumulator from being set to a saturated value too frequently, and from remaining at the saturated value for too long.
  • a partial signature memory 422 stores partially-computed analogue signatures.
  • An accumulator 424 retrieves partially-computed analogue signatures from the partial signature memory 422 and adds to them the values output by the permutator 420.
  • the number of bits used to store each value is chosen such that the accumulated values do not overflow too frequently.
  • the output of the accumulator 424 is the output from the analogue signature generator 318.
  • the eight values, each of eight bits, are combined to form an analogue region signature.
  • a zero is written to the partial signature memory 422 in preparation for generating the next region signature.
  • the output from the accumulator 424 is written to the partial signature memory 422.
  • the analogue signature may be truncated, retaining only the most significant bits, in order to reduce the effect of noise.
  • the number of bits may not be equal for each value.
  • a hardware encoder for a KVM via IP product capable of processing both digital and analogue PC video data is generated by combining the encoders 110 and 310 shown in Figures 2 and 4.
  • encoder 110 is used
  • analogue PC video data is output from computer 100
  • encoder 310 is used. Whether digital or analogue PC video data is being output from computer 100 may be simply detected using known techniques that will not be described in further detail here.
  • the process of detecting copyrects may be implemented in a hardware encoder arranged to implement a pipelined process in order to increase the efficiency of the process, as will now be described in greater detail.
  • the pipelined process is implemented by a pipelined content-addressable memory (CAM) that compares the rolling signatures for a new frame with the aligned rolling signatures computed for the last video frame. It uses a tree organization as graphically illustrated in Figure 7. The tree structure illustrated in Figure 7 allows each new rolling signature generated for the current frame to be assessed against all of the stored aligned rolling signatures for the previous frame without requiring the current signature to be individually compared to every stored signature. This streamlines the processing of new rolling signatures, resulting in significant efficiency savings, as will be described in relation to Figure 7.
  • the pipelined CAM is an alternative implementation of the signature memory 126, run length encoder 128 and match log memory 130 of Figure 2.
  • the CAM is implemented as a series of separate processing stages. At each stage, the current rolling signature for the current frame is compared to a single stored rolling signature for the previous frame. The processing stage determines whether the two signatures are the same, or which one is the larger. The result of this comparison is used to determine which stored signature is compared to the current rolling signature in the following stage (unless a match has already been found).
  • FIG. 6 shows a generic stage 10 for the CAM.
  • Each stage generally comprises a random access memory 12 having an address input 13 for receiving address data.
  • the RAM 12 stores the aligned rolling checksums from the previous frame.
  • An output from the RAM 12 supplies a single aligned rolling signature value from the previous frame to an input of a comparator 14, the aligned rolling signature being addressed by the address input 13.
  • a further input 15 of the comparator 14 is supplied with the current rolling signature value for the new frame from the rolling signature generator 122.
  • the logic device 16 provides an indication of a match, or which signature is greater.
  • the logic device 16 also receives a match signal 18 as a further input (which comprises the results of the previous processing stages), and supplies an updated match signal 18 as an output which is passed to the next processing stage (the match signal output 18 comprising the results of the processing at all of the previous stages including the current stage).
  • the logic device 16 provides a further output to the address line 13 which determines the address to be provided to the RAM 12 in the following stage (that is, determining which stored aligned rolling signature from the previous frame to be assessed in the next processing stage).
  • Each processing stage 10 is controlled by a clock signal 20 which prevents the data signal 15, the updated match signal 18 and the address signal 13 from being passed to the next processing stage 10 until the current processing stage 10 has completed its processing. It will be appreciated that in certain embodiments of the present invention there may only be a single RAM 12, comparator 14 and logic device 16; the output data line 15, match line 18 and address line 13 being looped back to their respective inputs for processing during the next processing stage.
  • Each stage compares an input value on data input 15 (comprising the current rolling signature value for the current frame) with a single stored aligned rolling signature value from the previous frame.
  • the result of the comparison determines a further result bit upon the match line 18 (indicating whether the current stage provided a matched rolling signature, or which signature was the larger) which is applied to a next stage in the pipeline (for assessing the next stage of the tree structure shown in Figure 7).
  • The logic-implemented function f operates as follows. If the match input is set, then that indicates that there was an exact match between a rolling signature for the current frame and a stored aligned rolling signature in an earlier stage. The current stage appends a 0 to the address signal on line 13 and the match output 18 remains set. That is, if there is a match in a previous processing stage then the current rolling signature has already been matched and thus the current processing stage is redundant.
  • if the two signatures compared at the current stage are equal, the current stage appends a 1 to the address signal 13 (indicating to future stages which stage identified a match) and sets the match output 18.
  • Otherwise, the current stage appends a 0 to the address signal 13 if the current rolling signature is less than the rolling signature from the RAM, or appends a 1 to the address signal 13 if it is greater; in either case the match output 18 is not set.
  • In this way, each processing stage determines the rolling signature stored in the RAM 12 to be assessed in the following processing stage. After all processing stages are complete, if the match output 18 is set then this indicates that the current rolling signature has been matched to a stored aligned rolling signature for the previous frame.
  • The final address signal 13 indicates which stored aligned rolling signature was matched, according to the position of a 1 within the final address signal.
  • The first and last stages may be of a simpler construction, for the following reasons.
  • The first stage is simplified because at this stage there is no previous comparison which could have found a match, so the test applied in computing a match is simplified to a single comparison of two values.
  • The last stage is simplified because at the end of the last stage there is no need to add an additional address bit; either because the last stage produces a match at the currently determined address (from the output of all previous stages) such that the address of the match is already known, or because the last stage does not produce a match, in which case the predetermined address is not useful anyway.
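The per-stage behaviour of the function f described above can be sketched as a small pure function. This is an illustrative model only: the signal names and the bit-string representation of the address are assumptions, and the simplified first and last stages are not modelled.

```python
def stage_logic(match_in, addr_in, current_sig, stored_sig):
    """Model of one pipeline stage's logic function f.

    match_in    -- True if an earlier stage already found an exact match
    addr_in     -- address bits accumulated by earlier stages (a bit string)
    current_sig -- the rolling signature computed for the current frame
    stored_sig  -- the stored aligned rolling signature read from the RAM

    Returns (match_out, addr_out) to pass to the next stage.
    """
    if match_in:
        # An earlier stage matched: this stage is redundant, append a 0
        # and leave the match output set.
        return True, addr_in + "0"
    if current_sig == stored_sig:
        # Exact match found at this stage: append a 1 and set match.
        return True, addr_in + "1"
    # No match so far: the appended bit steers the next stage through
    # the binary tree (0 = branch "less than", 1 = branch "greater than").
    return False, addr_in + ("1" if current_sig > stored_sig else "0")
```

After all stages have run, the match flag says whether the current signature was matched and the address bits identify where, exactly as in the hardware description above.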
  • Figure 7 shows a graphic representation 30 of fifteen stages 10 arranged as a pipelined binary tree.
  • Each of the fifteen words 32 within Figure 7 is equivalent to a separate stored aligned signature within the RAM 12 for a previous frame.
  • Each word (corresponding to an aligned rolling signature) is stored in the RAM 12 as illustrated in Figure 6.
  • Each word is arranged within the tree structure of Figure 7 according to its assigned value (that is, rolling signatures are arranged according to their numeric value).
  • In stage 1, location 0 is read from the RAM and "lazy" is compared with "fox" ("fox" being chosen as its address within the RAM is 0 - the address of each word being shown above the word). Match is not set as they are not equal, and the next bit of the address is set to 1 as "lazy" > "fox".
  • In stage 2, location 01 is read from the RAM and "lazy" is compared with "lazy". Match is set as they are equal, and the next bit of the address is set to 1.
  • In stage 3, match is already set so the next bit of the address is set to 0. The final result indicates a match at address 0110 (6 in decimal). For rolling signatures, this is equivalent to the newly generated rolling signature successfully being matched to a stored aligned rolling signature at stage 2.
  • For searching for a value found at the last stage, consider "sea".
  • In stage 1, location 1 is read from the RAM and "sea" is compared with "shells". Match is not set as they are not equal, and the next bit of the address is set to 0 as "sea" < "shells". In stage 2, location 10 is read from the RAM and "sea" is compared with "sells". Match is not set as they are not equal, and the next bit of the address is set to 0 as "sea" < "sells".
  • In stage 3, location 100 is read from the RAM and "sea" is compared with "sea". Match is set as they are equal, and the next bit of the address is set to 1. The final result indicates a match at address 1001.
  • As an example of a search which does not find a match, consider "foo". In stage 1, "foo" is compared with "by". Match is not set as they are not equal, and the next bit of the address is set to 1 as "foo" > "by".
  • Location 001 is read from the RAM and "foo" is compared with "dog".
  • Match is not set as they are not equal, and the next bit of the address is undefined. The final result indicates no match, and so the address is immaterial.
  • The process continues by calculating a new rolling signature for the next pixel position and then processing the new rolling signature, in the same way as shown above in Figures 6 and 7, against each of the same stored aligned rolling signature positions for the old frame.
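The staged searches worked through above can be modelled end to end. The sketch below builds a complete binary search tree with one level per processing stage, addressed by the accumulated address bits; the numeric values and the level-keyed layout are illustrative assumptions rather than the exact arrangement of Figure 7.

```python
def build_levels(sorted_vals):
    """Arrange 2**d - 1 sorted values as a complete binary search tree,
    stored as one dict per level, keyed by the address bits that reach
    each node (the root has the empty address)."""
    levels = []

    def place(vals, level, addr):
        if not vals:
            return
        while len(levels) <= level:
            levels.append({})
        mid = len(vals) // 2
        levels[level][addr] = vals[mid]          # subtree root at this node
        place(vals[:mid], level + 1, addr + "0")  # smaller values: bit 0
        place(vals[mid + 1:], level + 1, addr + "1")  # larger values: bit 1

    place(sorted_vals, 0, "")
    return levels


def pipelined_search(levels, key):
    """One comparison per stage: append 1 on an exact match (and set
    match), pad with 0s once matched, otherwise steer down the tree."""
    match, addr = False, ""
    for level in levels:
        if match:
            addr += "0"            # match already found in an earlier stage
            continue
        stored = level[addr]       # the single stored value for this stage
        if key == stored:
            match, addr = True, addr + "1"
        else:
            addr += "1" if key > stored else "0"
    return match, addr
```

For seven stored values [10, 20, 30, 40, 50, 60, 70], searching for 30 yields (True, "011"): bit 0 at the root (30 < 40), bit 1 at the next level (30 > 20), then a match at the last stage.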
  • Embodiments of the present invention described above predominantly relate to a comparison of two separate video frames that follow consecutively within a continuous series of video data.
  • the present invention is not limited to a comparison of a first frame with a second immediately subsequent frame. Indeed, any two frames may be compared even if widely separated in time, for instance by intervening frames.
  • the second frame may precede the first frame (again, possibly with intervening frames).
  • the present invention further extends to instances in which the first and second frames are one and the same frame, for which differences are detected between portions of the same frame, or translated and copied portions of video data within the same frame can be detected. Further modifications and applications of the present invention will be readily apparent to the appropriately skilled person from the teaching herein, without departing from the scope of the appended claims.


Abstract

A method and apparatus for detecting a translation of a portion of video data received from a video output of a computer. The method comprises computing first signatures for each of a plurality of portions of a video frame and a second signature for a portion of a video frame. The second signature is compared with the first signatures and a determination is made whether a portion of video data has been translated according to a result of said comparison. In accordance with a further aspect of the invention analogue video data is received from a video output of a computer, the method comprising converting the analogue video data to digital video data, computing first signatures for each of a plurality of portions of a video frame and second signatures for each of a plurality of portions of a video frame. The first and second signatures are compared to identify portions of a video frame that are the same as other portions of a video frame, the signatures being computed using a function which provides the same value irrespective of the presence of noise in the digital video data.

Description

Video Data Transmission
The present invention relates to a method of transmitting video and in particular to a method of reducing the amount of video data transmitted between computers over a network.
Remote access software packages are available, such as VNC provided by RealVNC Ltd, that allow the content of one computer's display to be displayed, for instance in a window, on another remote computer's display over a network connection. Such software can also communicate keyboard and mouse data in the reverse direction, thereby allowing full remote control.
VNC uses a protocol called RFB to send the display data over the network (such as a LAN or a WAN, such as the Internet). RFB stands for "Remote Frame Buffer". An objective of the design of RFB is to minimise the data that needs to be sent. For example, rather than simply sending the values of all of the pixels in every frame, it sends only those parts of the screen that have changed each time.
VNC operates entirely in software. In contrast, the present invention relates to systems in which video capture is carried out in hardware. Such systems, including hardware video capture and sending display data over a network to a remote software viewer, are sometimes referred to as "KVM via IP", indicating that Keyboard, Video and Mouse data are sent via Internet Protocol over the network.
An important consideration in providing an effective KVM via IP system is to efficiently generate compact RFB data from the captured video stream.
A hardware RFB encoder already exists, and is currently provided under the name AdderLink IP by Adder Technology Ltd. The input to the AdderLink IP encoder is video from a standard PC monitor lead and the input video data is captured into memory within the device. Embedded software encodes the data from this memory in the RFB format and transmits the encoded data to the remote viewer computer via the network interface.
To increase operational efficiency, the AdderLink IP attempts to identify and send only those regions of the display that have changed in each frame, in a process known as
"differencing". This is done by storing the previous frame and simply comparing old and new frames. For some types of display data, this gives performance similar to that achieved by the VNC software server. However, there is a special case of changes between frames of PC video data that is detected by the VNC software server but not by the AdderLink IP. The present invention recognises this and provides a method and apparatus for detecting this case in hardware.
The special case relates to where there is a translation of a portion of video data between a first frame and a second frame. Such a translation is called a "copyrect" (meaning a copy rectangle, where "rectangle" includes both rectangular and square regions). A copyrect is a message in the RFB protocol that tells the displaying computer to copy a region of its screen to a new location, rather than requiring the contents of that frame portion to be resent. Copyrects are commonly generated when a user moves or scrolls a window. These are frequent operations which require only a handful of bytes of network data when encoded as copyrects, but would need many kilobytes of data if sent in any other way. The detection of copyrects is alternatively referred to herein as motion detection.
It is relatively easy for software to generate copyrects as it can observe how the operating system is manipulating the screen at a relatively high level. On the other hand, a hardware encoder only has access to the data coming out of the PC's video cable, where this high- level information is not present.
Embodiments of the current invention provide a method and apparatus implementing a hardware encoder which can detect copyrects from incoming video data by comparing a new frame with the last one. Herein, "hardware encoder" means a combination of hardware and embedded software, as the split between hardware and software is arbitrary and dependent upon a particular implementation of the encoder. The present invention relates to video from PCs (i.e. personal computers and similar). Work in the field of camera-image video (e.g. television) has quite different characteristics. Video compression formats such as MPEG make use of different forms of motion detection to identify regions that have moved from one frame to the next (e.g. camera panning, or a subject moving within the frame). The central difference between camera images and PC video is that PC video requires identifying exact copies, while MPEG motion detection only looks for approximate matches. MPEG generally uses an iterative search method to find matches, e.g. the "three step search" algorithm, which attempts to find a good match by zooming in on promising-looking areas. However, such methods are not suitable for PC video in which exact copies need to be identified.
While brute-force methods for detecting copyrects can be considered, they are impractical owing to the resources required. The brute-force method simply involves comparing every pixel of the new frame with every pixel of the old frame. This will lead to many false matches where pixels just happen to be of the same colour. However, when displacements from old position to new position for matching pixel-pairs are calculated, a spike in the correlation distribution will be found corresponding to the displacement of the window-move or scroll that has occurred. This can be used to identify a copyrect from the incoming PC video data alone.
This brute force method of naively comparing the pixels would require a content-addressable memory (CAM) with one entry for each pixel, i.e. millions of entries each of 16 or 24 bits for typical PC screen modes. This would require an impractical silicon area for most real world applications, though could be viable using current technologies for cost-insensitive supercomputing applications.
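For illustration, the displacement-correlation idea behind the brute-force method can be sketched as follows. This is a toy model with made-up frame contents, and (as noted above) is impractical at real screen resolutions; it only shows where the spike in the correlation distribution comes from.

```python
from collections import Counter

def displacement_spike(old, new):
    """Vote for every (dy, dx) displacement that maps a pixel of `old`
    onto an equal-valued pixel of `new`; a genuine window move shows up
    as a spike in the vote histogram.  O(n^2) in the pixel count."""
    votes = Counter()
    for oy, old_row in enumerate(old):
        for ox, ov in enumerate(old_row):
            for ny, new_row in enumerate(new):
                for nx, nv in enumerate(new_row):
                    if ov == nv:
                        votes[(ny - oy, nx - ox)] += 1
    return votes.most_common(1)[0][0]

# A 4x4 "frame" whose contents scroll down by one row:
old_frame = [[1, 2, 3, 4],
             [5, 6, 7, 8],
             [9, 10, 11, 12],
             [13, 14, 15, 16]]
new_frame = [[0, 0, 0, 0]] + old_frame[:3]   # fresh top row, rest shifted
```

Here every surviving pixel votes for the displacement (1, 0), i.e. content moved down one row, which is exactly the copyrect that would be emitted.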
The above discussion of the prior art relates predominantly to a situation in which the PC video data to be encoded is in a digital format, for instance DVI (Digital Visual Interface). For a digital PC video system, the video data received by the encoder is the same as the original data displayed upon the remote computer, for example when the input to the system is a DVI cable. However, many PC video connections use an analogue signal. This can be converted to digital data by an analogue-to-digital converter in the hardware encoder. However, this analogue data will typically include noise, including noise introduced during the analogue to digital conversion. The process of encoding the data for use in a KVM via IP system is further complicated by the presence of noise.
In general, there are three circumstances in which video encoding can reduce the amount of data needed to represent a video stream.
Firstly, data for regions of the image that have not changed from one frame to the next need not be repeated. This first optimisation is here called "differencing".
Secondly, where copyrects can be detected, this corresponds to a region of the image that has moved from one frame to the next (as would occur for example when a window is scrolled). Only a description of the moved region's position in the old and new frame need be given, rather than the region's content. This second optimisation is also called "motion detection".
Thirdly, when a region of a frame is equal to the same or a different region of the same or an earlier frame (not necessarily the immediately previous frame), it is only necessary to indicate that the previous content should be repeated. This third optimisation is called "caching".
In principle all of these circumstances could be detected by exhaustive comparison of the new frame and previous frames. Embodiments of the present invention provide more efficient detection methods. In particular embodiments of the present invention, all three techniques can be applied within a single system.
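As a concrete illustration of the first optimisation ("differencing"), aligned non-overlapping tiles of two frames can be compared so that only the changed tiles need be encoded. The tile size and function names below are illustrative assumptions, not taken from the embodiments.

```python
def changed_tiles(old, new, tile=16):
    """Differencing sketch: compare two equal-sized frames tile by tile
    and return the top-left coordinates of every tile whose pixels
    differ (only these tiles would need to be re-encoded and sent)."""
    height, width = len(old), len(old[0])
    changed = []
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            old_t = [row[tx:tx + tile] for row in old[ty:ty + tile]]
            new_t = [row[tx:tx + tile] for row in new[ty:ty + tile]]
            if old_t != new_t:
                changed.append((ty, tx))
    return changed
```

In practice a hardware encoder would compare stored per-tile signatures rather than raw pixels, but the effect on the amount of transmitted data is the same.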
Furthermore, the present invention is not limited to a comparison of a first frame with a second immediately subsequent frame. Indeed, any two frames may be compared even if widely separated in time, for instance by intervening frames. Furthermore, the second frame may precede the first frame (again, possibly with intervening frames). The present invention further extends to instances in which the first and second frames are one and the same frame, for which differences are detected between portions of the same frame, or translated and copied portions of video data within the same frame can be detected.
It is an object of embodiments of the present invention to obviate or mitigate one or more of the problems associated with the prior art, whether identified herein or elsewhere. In particular, it is an object of embodiments of the present invention to provide a hardware based approach for detecting copyrects in PC video data. Embodiments of the present invention provide improved KVM via IP systems for use with both digital and analogue PC video data. The invention is not, however, limited to KVM via IP applications, and may be more generally applied for processing video data, and in particular PC video data, in other applications.
The scope of the present invention is defined by the appended claims.
According to a first aspect of the present invention there is provided a method of detecting a translation of a portion of video data received from a video source, the method comprising: computing first signatures for each of a plurality of portions of a video frame; computing a second signature for a portion of a video frame; comparing the second signature with the first signatures; and determining whether a portion of video data has been translated according to a result of said comparison.
An advantage of the first aspect of the present invention is that by detecting translated portions of video data, a reduced amount of information relating to the amount by which the video data has been translated needs to be transmitted across a network in a KVM via IP system rather than the translated video data itself. This can provide significant reductions in the amount of information transmitted across a network for many typical applications, for instance where a remote computer is displaying conventional desktop publishing applications and a user is scrolling portions of the displayed data.
Said first signatures may be computed for a plurality of portions of a first video frame and said second signature is computed for a portion of a second video frame, the method comprising determining whether a portion of video data has been translated between the first and the second frames according to a result of said comparison. Alternatively, said first signatures may be computed for a plurality of portions of a video frame and said second signature is computed for a different portion of the same video frame, the method comprising determining whether a portion of video data has been translated within the video frame according to a result of said comparison.
Said signatures may be computed from video data contained within the respective portions of a video frame.
The method may further comprise computing second signatures for each of a plurality of portions of a video frame, and comparing each second signature with the first signatures. A second signature may be computed at every pixel position in a frame of video data.
The plurality of portions of the video frame for which first signatures are computed may comprise aligned and non-overlapping portions of the video frame. The portions of each video frame may comprise rectangular video portions of a predetermined size. The method may further comprise computing a magnitude of the translation of the video data.
The method may further comprise: storing at least some of the second signatures associated with aligned and non-overlapping portions of the video frame; and repeating the method using the stored second signatures in place of the first signatures.
The method may further comprise: identifying a first subset of the first signatures; comparing the first subset of first signatures with the or each second signature; and determining whether a region of the video frame has been translated according to a result of said comparison. The first subset may comprise one or more first signatures for which a value of the signature is unique amongst the first signatures computed for the video frame.
The method may further comprise computing an amount of the translation of the region of the video frame.
The signatures may be computed using a rolling signature algorithm, the rolling signature algorithm allowing a new signature to be computed from a previous signature of an overlapping portion of the video frame and the values of those pixels either included in the previous overlapping portion of the video frame and not in the current portion of the video frame or are included in the current portion of the video frame and not in the previous overlapping portion of the video frame. The rolling signature algorithm may comprise a two-dimensional rolling signature algorithm. The two dimensional rolling signature algorithm may comprise: calculating a first signature using video data for each of a plurality of columns of pixels; and calculating a second signature using each of the first signatures.
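The two-dimensional rolling scheme described above can be sketched with plain sums standing in for the signature function. A real implementation would use a stronger rolling hash; the sum is chosen here only because its rolling update (add the element entering the window, subtract the element leaving it) is obvious.

```python
def rolling_signatures(frame, w, h):
    """Compute a signature for every w x h window of `frame` using the
    two-stage rolling scheme: first a rolling sum down each column,
    then a rolling sum across the resulting column signatures."""
    rows, cols = len(frame), len(frame[0])
    # Stage 1: vertical rolling sums, one per column, for every row offset.
    col_sig = [[0] * cols for _ in range(rows - h + 1)]
    for x in range(cols):
        s = sum(frame[y][x] for y in range(h))
        col_sig[0][x] = s
        for y in range(1, rows - h + 1):
            s += frame[y + h - 1][x] - frame[y - 1][x]   # pixel in, pixel out
            col_sig[y][x] = s
    # Stage 2: horizontal rolling sums over the column signatures.
    sigs = {}
    for y in range(rows - h + 1):
        s = sum(col_sig[y][x] for x in range(w))
        sigs[(y, 0)] = s
        for x in range(1, cols - w + 1):
            s += col_sig[y][x + w - 1] - col_sig[y][x - 1]  # column in/out
            sigs[(y, x)] = s
    return sigs
```

Each new signature is derived from the previous overlapping one plus the pixels (or column signatures) entering the window and minus those leaving it, so a signature is available at every pixel position for roughly the cost of one update per position.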
The method may further comprise: generating an encoded representation of a video frame; wherein the encoded representation of the video frame allows the identification of portions of video data that have been translated. The method may further comprise: transmitting the encoded representation of the video frame over a computer network to a decoder associated with a remote computer; wherein the decoder is adapted to reproduce the video frame using the encoded representation of the video frame and a previously decoded representation of another video frame. The method may further comprise: computing third signatures for each of a plurality of aligned and non-overlapping portions of a video frame; computing fourth signatures for each of a plurality of aligned and non-overlapping portions of a video frame; and comparing the third and fourth signatures to identify portions of a video frame that are the same as corresponding portions of another video frame; wherein an encoded representation of a video frame only includes an encoded representation of portions of the video frame that are not the same as portions of another video frame.
The method may further comprise: receiving analogue video data received from a video output of a further computer; converting the analogue video data to digital video data; computing fifth signatures for each of a plurality of portions of a video frame; computing sixth signatures for each of a plurality of portions of a video frame; comparing the fifth and sixth signatures to identify portions of a video frame that are the same as other portions of a video frame; wherein the signatures are computed using a function which provides substantially the same value irrespective of the presence of noise in the digital video data. The method may further comprise: receiving a signal indicative of a user input from the remote computer and using the signal to operate the computer.
According to a second aspect of the present invention there is provided a carrier medium carrying computer readable code for controlling a data processing device to carry out the above described method.
According to a third aspect of the present invention there is provided a data processing device for detecting a translation of a portion of video data received from a video source, the apparatus comprising: a program memory storing processor readable instructions; and a processor configured to read and execute instructions stored in said program memory; wherein the processor readable instructions comprise instructions controlling the processor to carry out the above described method.
According to a fourth aspect of the present invention there is provided a method of processing analogue video data received from a video source, the method comprising: converting the analogue video data to digital video data; computing first signatures for each of a plurality of portions of a video frame; computing second signatures for each of a plurality of portions of a video frame; comparing the first and second signatures to identify portions of a video frame that are the same as other portions of a video frame; wherein the signatures are computed using a function which provides substantially the same value irrespective of the presence of noise in the digital video data.
An advantage of the fourth aspect of the invention is that by providing an efficient means of encoding analogue PC video data, the amount of information that needs to be transmitted across a network in a KVM via IP system can be reduced.
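The text does not specify the noise-tolerant signature function itself. One plausible sketch (purely an assumption, not the patented function) quantises away the low-order bits of each sample before hashing, so that small analogue/ADC noise usually cannot change the resulting signature.

```python
import zlib

def noise_tolerant_signature(pixels, drop_bits=3):
    """Discard the low-order bits of each 8-bit sample, then hash the
    quantised samples.  Noise that does not push a sample across a
    quantisation boundary leaves the signature unchanged."""
    quantised = bytes(p >> drop_bits for p in pixels)
    return zlib.crc32(quantised)
```

Samples that sit near a quantisation boundary can still flip buckets under noise, which is why this is only a sketch of the idea rather than a robust scheme.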
Said first signatures may be computed for a plurality of portions of a first video frame and said second signatures are computed for portions of a second video frame, the method comprising identifying portions of video data within the second video frame that are the same as corresponding portions of video data within the first video frame. Alternatively, said first signatures may be computed for a plurality of portions of a video frame and said second signatures are computed for a different plurality of portions of the same video frame, the method comprising identifying portions of video data that are the same as other portions of video data within the same video frame.
The method may further comprise: generating an encoded representation of the video frame; wherein the encoded representation of the video frame allows the identification of portions of the video frame that are the same as corresponding portions of another video frame. The method may further comprise: transmitting the encoded representation of a video frame over a computer network to a decoder associated with a remote computer; wherein the decoder is adapted to reproduce the video frame using the encoded representation of the video frame and a previously decoded representation of another video frame. The method may further comprise: receiving a signal indicative of a user input from the remote computer and using the signal to operate the computer.
According to a fifth aspect of the present invention there is provided a carrier medium carrying computer readable code for controlling a data processing device to carry out the above described method.
According to a sixth aspect of the present invention there is provided a data processing device for processing analogue video data received from a video source, the apparatus comprising: a program memory storing processor readable instructions; and a processor configured to read and execute instructions stored in said program memory; wherein the processor readable instructions comprise instructions controlling the processor to carry out the method of any one of claims 24 to 29.
According to a seventh aspect of the present invention there is provided an apparatus for detecting a translation of a portion of video data received from a video source, the apparatus comprising: a signature generator operable to compute first signatures for each of a plurality of portions of a video frame and to compute a second signature for a portion of a video frame; a comparator operable to compare the second signature with the first signatures and to determine whether a portion of video data has been translated. The signature generator may be operable to compute first signatures for a plurality of portions of a first video frame and said second signature for a portion of a second video frame, the comparator being operable to determine whether a portion of video data has been translated between the first and the second frames. Alternatively, the signature generator may be operable to compute first signatures for a plurality of portions of a video frame and said second signature for a portion of the same video frame, the comparator being operable to determine whether a portion of video data has been translated within the same video frame.
The apparatus may comprise a hardware video encoder which is connectable to a video output of a computer. The hardware video encoder may comprise a program memory storing processor readable instructions; and a processor configured to read and execute instructions stored in said program memory such that the processor comprises the signature generator and the comparator.
The signature generator may be operable to compute signatures from video data contained within the respective portions of a video frame.
The signature generator may be operable to compute second signatures for each of a plurality of portions of a video frame, and the comparator is operable to compare each second signature with the first signatures. The signature generator may be operable to compute a second signature at all pixel locations within a video frame except pixel locations at a top edge or at a left hand edge of the video frame.
The plurality of portions of a video frame for which first signatures are computed may comprise aligned and non-overlapping portions of a video frame.
The comparator may be in the form of a pipeline comprising a series of processing stages arranged to compare a second signature iteratively against a plurality of first signatures to determine whether a portion of video data has been translated. The apparatus may implement a binary tree storage structure for storing the plurality of first signatures, arranged such that at each processing stage the pipeline is operable to compare the second signature with a single first signature, the number of stages being less than the number of first signatures. Each processing stage may comprise a memory, a comparator and a logic device. The logic device may implement a function which: appends a 0 to an address and sets a match signal, if a preceding stage has indicated an exact match; appends a 1 to the address and sets the match signal, if the comparator of the stage indicates an exact match between a data signal and the content of the memory of the stage; or appends a 0 to the address and does not set the match signal, if the comparator of the stage indicates that the data signal is less than the content of the memory of the stage; or appends a 1 to the address and does not set the match signal, if the comparator of the stage indicates that the data signal is greater than the content of the memory of the stage; wherein a final match signal is indicative of whether a portion of video data has been translated, and a final address is indicative of the first position of the translated portion of video data within the video frame.
The apparatus may further comprise: a transmitter arranged to transmit an encoded representation of a video frame to a decoder associated with a remote computer; wherein the encoded representation of the video frame allows the identification of portions of video data that have been translated and wherein the decoder is adapted to reproduce the video frame using the encoded representation of the video frame and a previously decoded representation of another video frame.
The apparatus may further comprise: a second signature generator operable to compute third signatures for each of a plurality of aligned and non-overlapping portions of a video frame and fourth signatures for each of a plurality of aligned and non-overlapping portions of a video frame; and a second comparator operable to compare the third and fourth signatures to identify portions of a video frame that are the same as other portions of a video frame; wherein an encoded representation of a video frame allows the identification of portions of the video frame that are the same as portions of another video frame. The apparatus may further comprise: an analogue to digital converter arranged to convert analogue video data received from a video output of a further computer to digital video data; a third signature generator operable to compute fifth signatures for each of a plurality of portions of a video frame and to compute sixth signatures for each of a plurality of portions of a video frame; and a third comparator operable to compare the fifth and sixth signatures to identify portions of a video frame that are the same as other portions of a video frame; wherein the fifth and sixth signatures are computed using a function which provides substantially the same value irrespective of the presence of noise in the digital video data. The apparatus may further comprise: a receiver for receiving a signal indicative of a user input from the remote computer and using the signal to operate the computer.
According to an eighth aspect of the present invention there is provided an apparatus for processing analogue video data received from a video source, the apparatus comprising: an analogue to digital converter arranged to convert analogue video data to digital video data; a signature generator operable to compute first signatures for each of a plurality of portions of a video frame and to compute second signatures for each of a plurality of portions of a video frame; and a comparator operable to compare the first and second signatures to identify portions of a video frame that are the same as other portions of a video frame; wherein the first and second signatures are computed using a function which provides substantially the same value irrespective of the presence of noise in the digital video data.
Said signature generator may be operable to compute first signatures for a plurality of portions of a first video frame and second signatures for portions of a second video frame, the comparator being operable to identify portions of video data within the second video frame that are the same as corresponding portions of video data within the first video frame. Said signature generator may be operable to compute first signatures for a plurality of portions of a video frame and second signatures for portions of the same video frame, the comparator being operable to identify portions of video data within the video frame that are the same as other portions of video data within the video frame.
The apparatus may further comprise: a transmitter arranged to transmit an encoded representation of the video frame to a decoder associated with a remote computer; wherein the encoded representation of the video frame allows the identification of portions of video data that have been translated between video frames, and wherein the decoder is adapted to reproduce the video frame using the encoded representation of the second video frame and a previously decoded representation of another video frame.
The apparatus may further comprise: a receiver for receiving a signal indicative of a user input from the remote computer and using the signal to operate the computer.
Embodiments of the present invention may be implemented in software. For example a carrier medium carrying computer readable code for controlling a computer or a bespoke hardware encoder to carry out the above aspects of the invention may be provided.
Alternatively, a computer apparatus comprising a program memory storing processor readable instructions and a processor configured to read and execute instructions stored in said program may be provided. The processor readable instructions stored in said program memory may comprise instructions controlling the processor to carry out the above aspects of the invention. Alternatively, the computer apparatus may comprise bespoke hardware.
In particular, certain embodiments of the present invention are implemented in a bespoke encoder, which comprises bespoke hardware and embedded software for processing video data.
Embodiments of the present invention will now be described, by way of example only, and with reference to the accompanying drawings, in which:
Figure 1 schematically illustrates a network within which a KVM via IP system in accordance with certain embodiments of the present invention can be implemented;
Figure 2 schematically illustrates a KVM via IP video encoder suitable for video data which is in a digital format, in accordance with an embodiment of the present invention;
Figure 3 schematically illustrates a rolling signature generator forming part of the video encoder of Figure 2;
Figure 4 schematically illustrates a KVM via IP video encoder suitable for video data which is in an analogue format, in accordance with a further embodiment of the present invention;
Figure 5 schematically illustrates an analogue signature generator forming part of the video encoder of Figure 4;
Figure 6 shows a schematic block diagram of a stage of an apparatus for implementing a pipelined process for detecting translated portions of video data in frames of PC video data, in accordance with an embodiment of the present invention; and
Figure 7 shows a graphical representation of the storage of data for use within the pipelined process partially implemented by the apparatus of Figure 6.
Similar items in different Figures share common reference numerals unless indicated otherwise.
Referring first to Figure 1, this schematically illustrates an example of a computer network within which a KVM via IP system in accordance with embodiments of the present invention can be implemented. A first computer 100 is connected to a decoder 160 via a network 150. The network 150 comprises an Internet Protocol (IP) network. The first computer 100 generates PC video data that may be displayed upon a local video display device connected to the first computer. It will be appreciated that where the term "PC" (Personal Computer) is used herein, this should not be interpreted as being limited only to PC data, for example excluding Apple computers. Rather, the term PC should be interpreted in a general sense as being synonymous with computer or general purpose computer. The decoder 160 may comprise a stand alone data processing device, for instance a hardware decoder (where the term "hardware" should be interpreted broadly to include a hardware processor arranged to implement embedded software). Alternatively, the decoder 160 may be connected to a computer to process the decoded video data. The decoder 160 may be integrated with a computer. A decoder 160 in combination with a display device 161 may be referred to as a KVM terminal. PC video data is encoded via a hardware encoder 110 and transmitted to the decoder 160 for display upon a display device 161 to a user located there. Furthermore, user input data, for instance derived from user input devices such as a computer keyboard 162 or mouse 163 connected to the decoder 160 (optionally via a separate computer), is transmitted from the decoder 160 to the first computer 100 via the network 150 in order to control the first computer 100. By this means a user operating the keyboard 162 or mouse 163 can interact with the first computer 100 much as if they were local to it.
Since network speed is typically limited, denser encoding of the video data has the benefit of improving the responsiveness of the system and hence the quality of the user's experience.
It will be appreciated that the network 150 may comprise any form of network, such as a local area network (LAN) or a wide area network (WAN), for instance the internet.
The decoder 160 may optionally be arranged to receive encoded video data from a plurality of computers 100 each connected to the network 150 by separate encoders 110 or by a single encoder 110 connected via a switch to all of the computers 100.
"Raw" PC video data from the first computer comprises a sequence of frames, each made up of a sequence of rows and each row made up of a sequence of pixels. Raw video data can be encoded to a compressed format in order to reduce the amount of data needed for the representation while retaining the same or similar appearance. Embodiments of the present invention relate to efficient methods and apparatuses for performing such encoding. The invention is not, however, limited to KVM via IP applications, and may be more generally applied for processing video data, and in particular PC video data, in other applications.
Referring now to Figure 2, this shows the organisation of the video processing portion of a KVM via IP system for DVI digital video incorporating the current invention. In particular, Figure 2 illustrates a hardware PC video data encoder 110 for encoding digital PC video data prior to transmission across a network. As noted above in the introduction, the term "hardware encoder" is not to be interpreted as the encoder processing video data exclusively in hardware. Indeed, a practical hardware encoder typically includes a processor adapted to implement embedded software for performing at least part of the encoding. The hardware encoder illustrated in Figure 2 is particularly adapted for processing DVI PC video data.
Figure 2 illustrates a computer 100 which outputs digital video in DVI format which in turn is input to a KVM via IP video encoder 110. The KVM via IP video encoder 110 encodes this DVI video data using encoding methods in accordance with embodiments of the present invention and outputs encoded video data to a network 150 which transmits the encoded data to a remote decoder 160 connected to a remote display 170. Keyboard and mouse data for controlling computer 100 may be transmitted back across network 150, in accordance with known techniques that will not be further described here.
Figure 2 further illustrates the internal organisation of the KVM via IP video encoder 110 as will now be described.
Incoming DVI video data to the encoder 110 is first decoded to raw video data by means of a DVI receiver 112. It will be appreciated that the video encoder 110 shown in Figure 2 could be modified to operate with any other form of digital PC video signal by replacing the DVI receiver 112 with an appropriate receiver. In one embodiment of the present invention the raw video data is in raster format, that is with pixels represented in order from left to right and from top to bottom of a video frame. However the present invention is not limited to any particular format for the raw video data, beyond the raw video data being stored in a digital format. The raw video data is stored in a frame buffer memory 114 from where it can be read by an encoder 116. The encoder 116 is responsible for generating the encoded video data to be transmitted across the network, as will be described below.
The raw video data is also sent to a region signature generator 118 which computes a 32-bit signature using a cyclic redundancy check (CRC) for each aligned non-overlapping region of a frame (e.g. 16x16 pixel squares or some other size or shape). That is, the frame of video data is subdivided into a series of separate video data portions, for each of which a separate 32-bit region signature is generated using a CRC. The generated region signatures are stored in a region signature memory 120 from where they can be read by the encoder 116. In one embodiment of the present invention the 32-bit CRC specified by clause 3.2.8 of IEEE standard 802.3 is used to generate region signatures, although it will be appreciated that there are a number of other suitable known signature types.
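By way of illustration only, the per-region CRC computation performed by the region signature generator 118 might be sketched as follows. The 16x16 tiling is taken from the text; the use of zlib's crc32 (which implements the IEEE 802.3 CRC-32 polynomial) and the representation of each frame row as a bytes object are assumptions made for this sketch, not details of the embodiment.

```python
import zlib

def tile_signatures(frame, tile=16):
    """Compute a 32-bit CRC signature for each aligned, non-overlapping
    tile of a frame, in the manner of the region signature generator 118.

    frame: a list of rows, each row a bytes object of pixel data (the
    byte layout per pixel is an assumption made for illustration).
    Returns a dict mapping (tile_x, tile_y) -> 32-bit CRC value.
    """
    sigs = {}
    for ty in range(len(frame) // tile):
        for tx in range(len(frame[0]) // tile):
            crc = 0
            # Chain the CRC across the tile's rows using the running
            # value argument of zlib.crc32.
            for row in frame[ty * tile:(ty + 1) * tile]:
                crc = zlib.crc32(row[tx * tile:(tx + 1) * tile], crc)
            sigs[(tx, ty)] = crc
    return sigs
```

Two tiles containing identical pixel data receive identical signatures, which is what the encoder 116 later exploits when differencing.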
The raw video data from the DVI receiver 112 is also sent to a rolling signature generator 122 which computes a 16-bit signature for each region (e.g. 16x16 pixel squares or some other size or shape) of a first frame. It will be appreciated that in alternative embodiments of the present invention alternative signature lengths may be used and that the invention is not limited to the use of any particular signature. The rolling signature algorithm that is used will be described in greater detail below. For example, for a screen of size 1024 by 768 pixels with a region size of 16 by 16 pixels, (1024/16) * (768/16) = 3072 aligned non-overlapping signatures will be generated. The rolling signatures that are computed for aligned non-overlapping region positions corresponding to the same non-overlapping portions of the screen operated upon by the region signature generator 118 are stored in an aligned rolling signature memory 124 from where they can be read by the encoder 116.
After the first frame has been processed, a second frame is studied and rolling signatures from the rolling signature generator 122 are generated for both aligned and unaligned regions. That is, a rolling signature is generated for a plurality of pixel locations, except those near to the edge of the frame. Those signatures that correspond to aligned non-overlapping regions are stored in a further signature memory 126. In certain embodiments of the present invention rolling signatures are generated for every pixel location. The rolling signature generator 122 generates a 16-bit signature for each 16x16 portion of the video data starting from each selected pixel location within the video frame. For example, for a 1024 by 768 pixel screen there are (1024-15) * (768-15) = 759777 possible 16 by 16 pixel square regions whose signatures will be computed.
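The aligned and rolling signature counts quoted in the two preceding paragraphs follow directly from the screen and region dimensions, as this hypothetical helper illustrates:

```python
def signature_counts(width, height, tile=16):
    """Number of aligned non-overlapping tile signatures, and number of
    possible rolling (every-offset) tile signatures, for a screen of the
    given dimensions."""
    aligned = (width // tile) * (height // tile)
    # A tile of side `tile` can start at any of (dimension - tile + 1)
    # offsets along each axis.
    rolling = (width - tile + 1) * (height - tile + 1)
    return aligned, rolling
```

For the 1024 by 768 example in the text this yields 3072 aligned signatures and 759777 rolling signatures.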
In the signature memory 126, data previously stored for a preceding frame, derived from the aligned rolling signature memory 124, is compared to the rolling signature generated for each pixel of the present frame to identify whether or not rolling signatures generated for any pixel position in the new frame match any rolling signatures for aligned positions in the old frame. If a rolling signature is matched, this corresponds to a detected copyrect. Based on this record of current signature data and previous signature data, a rolling signature value and the coordinates of the region of the previous frame it corresponds to are sent via a run-length encoder 128 to a match log memory 130, which records details of detected copyrects. The match log memory 130 records details of the pairs of matched signatures stored for the first frame in the aligned rolling signature memory 124 and currently generated by the rolling signature generator 122 for the second frame.
The run-length encoder 128 is used to limit the amount of memory required to store this information, but may be omitted in certain embodiments of the invention. In the case where a frame contains several adjacent regions with equal signatures, the amount of match log data that needs to be stored can be reduced by coalescing neighbouring match log entries, for neighbouring regions of the video frame that contain the same pixel values, into a single entry augmented with a count of the number of neighbouring matched regions. From the match log memory 130 this data regarding detected copyrects can be read by the encoder 116.
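A minimal sketch of the coalescing behaviour just described, assuming match log entries of the hypothetical form (x, y, source_x, source_y) arriving in raster order; the actual entry format used by the run-length encoder 128 is not specified in the text:

```python
def coalesce_matches(matches, tile=16):
    """Run-length encode the match log: merge runs of horizontally
    adjacent matched regions that moved by the same displacement into a
    single entry augmented with a count of neighbouring matches."""
    out = []
    for x, y, sx, sy in matches:
        if out:
            px, py, psx, psy, n = out[-1]
            same_row = (y == py and x == px + n * tile)
            same_shift = (x - sx, y - sy) == (px - psx, py - psy)
            if same_row and same_shift:
                # Extend the run instead of storing a new entry.
                out[-1] = (px, py, psx, psy, n + 1)
                continue
        out.append((x, y, sx, sy, 1))
    return out
```

Three adjacent matches sharing one displacement thus collapse to a single five-tuple, limiting the match log memory required.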
The operation of the complete KVM via IP encoder 110 for processing PC video data (or other video data) will now be described.
The encoder 116 first reads data stored in the region signature memory 120 for the current frame and compares it with the corresponding data for a previous frame retained within the region signature memory 120. The encoder 116 performs differencing, that is detecting portions of the frame that have not changed from the preceding frame. When the signatures are equal, it is determined that the portion of the screen has not changed, and no data corresponding to it is included in the encoded video data output by the encoder 116. The remote decoder 160 is operable to detect those portions of the screen that are not included within the encoded video signal and to copy that portion of the frame from the previous frame generated at the remote computer. Thus for portions of a video image that change infrequently (for instance menu bars), the video data need not be continually retransmitted. When the encoder 116 determines that a portion of the screen has changed, it next reads data stored in the match log memory 130 to determine whether that portion of the screen matches any portion of the previous frame (i.e. it performs motion detection by detecting copyrects), or any data that is cached by the viewer. The process of detecting copyrects is described in greater detail below.
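The differencing step can be sketched as a simple signature comparison; the dictionary representation of the region signature memory is an assumption made for illustration:

```python
def unchanged_tiles(prev_sigs, curr_sigs):
    """Return the tile positions whose region signatures match those of
    the previous frame.  The encoder omits these tiles from its output,
    and the decoder copies them from its previously decoded frame."""
    return {pos for pos, sig in curr_sigs.items()
            if prev_sigs.get(pos) == sig}
```

Only tiles outside this set need be considered for copyrect matching or raw transmission.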
For caching, the receiver of the encoded video data retains certain old data (a "cache"); it may, for example, retain a certain number of the most recent frames of data (as well as the immediately preceding frame), or a certain number of most recent bytes of data, or certain data specifically indicated for retention by the encoder. When a new frame contains content that has occurred before and is stored in the receiver's cache the encoder need not send the content but can instead place a code in the compressed data indicating that the cached data should be retrieved and re-used.
The signatures are used to facilitate efficient encoding with caching in the present invention. The encoder 116 keeps track of the signatures of the content to be stored in the receiver's cache and marks the corresponding locations in the signature memory to this effect. When cached content occurs in a new frame, by the same means described below for detecting copyrects, a match log entry is stored. These match log entries are subsequently used to place a cache code in the compressed video data transmitted across the network 150. In one embodiment of the present invention, the signatures themselves can be used as identifiers for the cached content.
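The use of signatures as identifiers for cached content might be sketched as follows; the operation codes and data structures here are hypothetical, chosen only to illustrate the idea of replacing repeated content with a cache reference:

```python
def encode_frame(tiles, cache):
    """tiles: dict mapping tile position -> (signature, raw_bytes).
    cache: dict mapping signature -> position, mirroring what the
    receiver is assumed to retain.

    Emits a hypothetical ('cached', ...) operation when the signature
    identifies content the receiver already holds; otherwise sends the
    raw data and records its signature in the cache model."""
    ops = []
    for pos in sorted(tiles):
        sig, data = tiles[pos]
        if sig in cache:
            ops.append(('cached', pos, sig))   # cache code, not pixel data
        else:
            cache[sig] = pos
            ops.append(('raw', pos, data))
    return ops
```

A real encoder would also bound the cache size and agree an eviction policy with the receiver, details the text leaves open.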
A particular apparatus and method of detecting copyrects is described in greater detail below, with reference to Figures 6 and 7.
When a copyrect is detected, the encoder 116 outputs into the encoded video data a code describing the match, rather than having to include the whole of the video data for that portion of the frame. Information about regions that have moved by the same displacement can be identified and merged in order to compress the data further. After processing the current frame to detect unchanged frame regions and copyrects, if the region signature memory 120 indicates that a portion of the screen has changed and the match log memory 130 provides no information about matches for that area the encoder 116 reads the raw video data from the frame buffer memory 114. The encoder 116 may apply additional known forms of encoding or compression to this raw data using conventional techniques before outputting the video data into the encoded video data stream transmitted across the network.
The encoder 116 also updates the signature memory 126 with the current contents of the aligned rolling signature memory 124, in order to detect copyrects for the following frame, so that rolling signatures at aligned positions in the current frame will result in match log entries when they are found in the next frame to be processed.
The rolling signature function used is chosen for the property that it can be economically updated by storing a new value and retrieving an old value without needing to recompute the signature from scratch. That is, if we know f(D0...Dn) then f(D1...Dn+1) can be economically computed given D0 and Dn+1.
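The update property just described can be illustrated with a deliberately simple additive rolling signature. The invention is not limited to this function; a plain modular sum is chosen here only because it makes the remove-old/add-new update obvious:

```python
def roll(sig, old, new, mod=1 << 16):
    """Update a simple additive rolling signature over a fixed-length
    window: remove the oldest value and add the newest, without touching
    the values in between."""
    return (sig - old + new) % mod

# Verify that the rolling update agrees with recomputation from scratch.
data = [3, 1, 4, 1, 5, 9, 2, 6]
window = 4
sig = sum(data[:window]) % (1 << 16)
for i in range(window, len(data)):
    sig = roll(sig, data[i - window], data[i])
    assert sig == sum(data[i - window + 1:i + 1]) % (1 << 16)
```

Each update costs two additions, independent of the window length, which is what makes computing a signature at every pixel offset tractable.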
A signature function is a general term for a function that computes an identifier for a chunk of data. An example of a signature function is the CRC used within the region signature generator 118. If two portions of data have the same signature, then it is likely that the data itself is the same. However, there is always a chance that two different chunks of data will have the same signature. This is called a collision. Fallback methods can be used to cope in the rare case when a signature collision occurs. However, the more bits that are used in the signature, the less likely are collisions and so with sufficient bits in the signature, the need to use fallback methods can be obviated. The signature function used to detect copyrects in the embodiment of the invention shown in Figure 2 is a 16-bit rolling signature, generated by the rolling signature generator 122.
Embodiments of the present invention compute and store signatures (each having a length of, e.g., 16 to 64 bits) for rectangular or square regions of the display (e.g., 16 by 16 pixels). It will be appreciated, however, that any shape that can be tiled across a frame may alternatively be used, though only rectangular frame regions will be described for simplicity. Using signatures allows regions that have changed to be detected efficiently. The signatures for a new frame of video data are compared with the signatures for the last frame processed, which is typically the immediately preceding frame, in order to detect copyrects. This requires less data processing than a method based simply on comparing pixels. For example, one 32-bit signature can be compared for a 16x16 pixel block, rather than 256 16-bit pixel comparisons, providing a 100-fold saving.
The motion-detection algorithm is not limited to detecting only copyrects that are aligned to pixel region boundaries, since users can move or scroll windows with any pixel displacement. As discussed above, this is achieved by computing signatures for an old video frame only for aligned pixel regions. For the new video frame, rolling region signatures are computed at a large number of pixel positions, and the signatures for the new frame are compared as they are generated with the stored signatures for the last frame. This algorithm allows copyrects with any displacement to be detected, not only displacements that are a multiple of the pixel region size, and stores and compares only a relatively small number of signatures.
The rolling signature function used in the current invention is similar to, though simpler than, the Adler-32 [RFC 1950], Fletcher [RFC 1146] and Rsync [A. Tridgell, P. Mackerras, "The rsync algorithm", Technical report, Australian National University] algorithms. These known algorithms relate only to the processing of one dimensional data. There is no suggestion within these algorithms or within the cited technical report that they could be modified to be applied to two dimensional data. It will be appreciated that the precise signature function chosen may vary.
The method of detecting copyrects uses a 2D rolling signature algorithm to generate a 2D rolling signature by first computing a set of 1D rolling signatures for columns, and then taking a 1D rolling signature of those signatures. More particularly, for an n-by-m pixel video frame, for each of the n pixel columns of the screen a rolling signature relating to a region of the frame identified by the pixel is maintained. From the incoming PC video data for the new frame, as each pixel value arrives (in raster-scan order) the pixel data value is incorporated into the appropriate column signature, and the pixel value that arrived m lines earlier (i.e., the corresponding pixel for the last frame) is removed from the signature for that column.
At the same time, a single rolling signature is calculated for the other dimension, i.e., across the pixel rows for the region of the frame identified by that pixel. As each pixel arrives, after its column signature has been updated, the row signature is recalculated with the new column signature, and the column signature from m columns earlier is removed. Hence, this signature is a rolling signature for the region of the frame of which the current pixel is the bottom-right corner. That is, the rolling signature function calculates a signature for the region of the frame, smaller than the whole frame, associated with the current pixel; in this way signatures are produced for every region of the frame at every offset, not just for aligned and non-overlapping portions of the screen.
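The two-stage computation just described, column signatures feeding a row signature, might be sketched as follows. A plain modular sum again stands in for the unspecified signature function, and the region dimensions a by b, together with the interpretation of the removal steps as spanning one region height and one region width, are assumptions of this sketch:

```python
def rolling_2d_signatures(frame, a=4, b=4, mod=1 << 16):
    """Compute an illustrative 2D rolling signature for every a-wide,
    b-tall region of a frame of pixel values, reusing per-column rolling
    signatures in the manner described in the text.

    Returns a dict mapping (x, y) -> signature of the region whose
    bottom-right corner is pixel (x, y)."""
    h, w = len(frame), len(frame[0])
    col = [0] * w              # one rolling signature per pixel column
    sigs = {}
    for y in range(h):
        row_sig = 0
        for x in range(w):
            # Incorporate the new pixel and remove the pixel b rows up.
            col[x] = (col[x] + frame[y][x]
                      - (frame[y - b][x] if y >= b else 0)) % mod
            # Incorporate the new column signature and remove the column
            # signature from a columns earlier.
            row_sig = (row_sig + col[x]
                       - (col[x - a] if x >= a else 0)) % mod
            if y >= b - 1 and x >= a - 1:
                sigs[(x, y)] = row_sig
    return sigs
```

Every pixel thus yields the signature of one fully-populated region at the cost of a constant number of operations, rather than a full recomputation over a*b pixels.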
As a large number of separate signature values are to be stored, there is an incentive to use relatively small signatures. However, small signatures have an increased chance of collision. There are two options for reducing the probability of collisions.
Firstly, a content-addressable memory can be used for the rolling signature memory 126. In this case, one location is used for each signature actually stored, rather than one for each possible signature. This makes it possible to use a signature with more bits and hence less likelihood of collisions.
Secondly, an approach of detecting collisions can be adopted. Relatively small signatures are used for motion detection. Having identified a likely translation, the source and destination pixel values are compared, either directly or by computing and comparing signatures with more bits than were used in detection, to confirm the matched copyrect. If this comparison indicates that the source and destination differ, it can be determined that a collision has occurred. The process of generating a rolling signature will now be described in greater detail with reference to Figure 3. Figure 3 illustrates the internal structure of the rolling signature generator 122.
Raw video data derived from the DVI receiver is first processed by a hash function 200, which outputs compacted pixel values for each new pixel value in a frame. Digital PC video data pixel values are processed in raster-scan order: typically left to right and top to bottom. Once the pixels in the last row of a frame have been transmitted, the pixel values for the next frame are transmitted, starting again at the first row and the first column. The purpose of the hash function is to reduce the amount of data to be stored while generating the rolling signature.
The compacted pixel values are stored temporarily in a buffer 202 so that a pair of compacted pixel values can be presented to a column rolling signature function 204 corresponding to one (new) pixel and another (old) pixel from m rows earlier (where m is equal to the number of rows of pixels within each video frame). That is, each pixel is presented to the column rolling signature function 204 together with the corresponding pixel value from the preceding frame. The column rolling signature function 204 calculates a separate rolling signature for each column of pixels within a region of a video frame identified by the current pixel.
A column rolling signature memory 206 stores current rolling signatures for each column of the screen. Given the current rolling signature, the new compacted pixel value and the old compacted pixel value (the corresponding compacted pixel value from one tile height earlier within the same frame) the column signature function 204 computes a new column rolling signature. The new column rolling signature is sent to the column rolling signature memory 206 where it replaces the previous signature for that column. By updating the column signature in this way each time a pixel is changed, the column rolling signature function 204 provides an efficient means of computing signatures of regions of an image (the regions of an image being referred to herein as "tiles") providing maximal re-use of a previously computed signature in the creation of a signature for a neighbouring region. Each new column rolling signature is in turn also stored temporarily in a buffer 208 so that a pair of values can be presented to a region rolling signature function 210 corresponding to a new column signature and an old column signature from n columns earlier (where n is the number of columns in each tile). That is, a new column signature is presented to the region rolling signature function with the column signature from n columns earlier within the same frame. Given the current region rolling signature, which it stores internally, and the new and old column rolling signatures, the region rolling signature function 210 computes the new region rolling signature. In this way, each time a new pixel value is transmitted by the hash function, the region rolling signature function 210 generates a rolling signature for the region of the frame identified by that pixel (that is, rolling signatures are generated for every overlapping portion of video data) except near the top and left edges of a frame. 
The rolling signature is a function of all the pixels in the region of the frame of which the current pixel is the bottom right corner.
The process of creating the rolling signatures reduces the information encoded within a tile of pixels to a small number of compacted pixel values. If, by way of example, the compacted pixels are each represented by four bits of information, then for a region of the video frame that is all the same colour there are at most sixteen different tile signatures. Consequently there would be an unacceptably high probability of a signature collision when all of the pixels in such a tile move from one position to another within a frame. In order to reduce the risk of a signature collision the region rolling signature is combined with the raw video data for one pixel by means of a bitwise exclusive OR operation by the exclusive OR function 212 to output the final output of the rolling signature generator 122.
The process of detecting copyrects can be optimised further as will now be described.
Firstly, after studying the first frame some signatures may be identified as "special" by storing a different value than normal in the signature memory 126. For example, a random subset of the signatures or those signatures that occurred only once in the first frame may be selected. When these signatures are found in the second frame their match log entry is stored in a separate part of the match log from other match log entries. A subset of the matches is recorded and built up in this separate part of the match log; these can be more readily assessed to identify the nature of any translations of regions of video data between two frames. For example, if the user has scrolled a window within an application currently being displayed on the first computer, it will normally be possible to identify the distance that the window contents have moved by studying only a few of the regions that fall within the window.
Having identified the displacement of the motion between the two frames based on a subset of the regions, it is necessary to identify all of the regions that have moved by this displacement. At this point the second optimisation is used. The "non-special" part of the match log is partitioned according to the position of the match in the second frame relative to the region dimensions. If it has been established that the motion of interest has a displacement of (dx,dy) where x and y are orthogonal axes within the frame, then since the signatures recorded in the first frame were for aligned regions, all of the corresponding matches in the second frame will be at positions where (x mod a, y mod b) equals (dx mod a, dy mod b) where (a,b) is the region size in pixels. Only this part of the match log needs to be studied: entries in other partitions must relate to motion with other displacements or matches where regions are equal but not as a result of motion.
The partitioning of the match log required for the above two optimisations is implemented as follows. Match log entries are written to memory in contiguous locations as the matches are found. Each match log entry is augmented with the address, or part of the address, of the last entry in the same partition. It is then possible to study all of the match log entries in a particular partition by following this chain of addresses.
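The chained partitioning described in the two preceding paragraphs might be sketched as follows; Python lists and dictionaries here stand in for the contiguous match log memory and the per-entry address chains of the hardware implementation:

```python
class MatchLog:
    """Match log partitioned by (x mod a, y mod b).  Each entry stores
    the index of the previous entry in the same partition, so that one
    partition can be walked without scanning the whole log."""

    def __init__(self, a=16, b=16):
        self.a, self.b = a, b
        self.entries = []   # (x, y, index of previous entry in partition)
        self.last = {}      # partition key -> index of its latest entry

    def add(self, x, y):
        part = (x % self.a, y % self.b)
        self.entries.append((x, y, self.last.get(part, -1)))
        self.last[part] = len(self.entries) - 1

    def partition(self, dx, dy):
        """Yield (newest first) all entries in the partition selected by
        a displacement (dx, dy), i.e. positions congruent to it modulo
        the region dimensions."""
        i = self.last.get((dx % self.a, dy % self.b), -1)
        while i != -1:
            x, y, i = self.entries[i]
            yield (x, y)
```

Having established a displacement of interest, the encoder need only follow one such chain rather than examine every match log entry.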
Referring now to Figure 4, this schematically illustrates the organisation of the video processing part (that is, a hardware encoder) of a KVM via IP system for analogue video in accordance with an embodiment of the present invention. A computer 100 outputs analogue PC video data which is input to an analogue KVM via IP video encoder 310. Commonly, PC video data is converted from digital data to analogue form by a converter within a computer for transmission to a display device. The display device can then convert the data back to a digital format for display. Small amounts of noise may be inadvertently introduced at each stage of this conversion process. The KVM via IP video encoder 310 encodes this analogue video data and outputs encoded video data to a network 150 which transmits it to a remote decoder 160 connected to a remote display 170.
Figure 4 further illustrates the internal organisation of the KVM via IP video encoder for analogue video 310. Incoming analogue video data is first digitised by means of an analogue to digital converter 312. The raw video data is stored in a frame buffer memory 314 from where it can be accessed by an encoder 316. The raw video data is in the same format as for the raw video data in Figure 2 after the DVI receiver 112. However, it is generally not appropriate to apply a rolling signature generator to the raw video data shown in Figure 4 because the analogue to digital converter 312 produces a digital signal that is only an approximation to the analogue signal due to the addition of noise.
The raw video data is also sent to an analogue signature generator 318 which computes a 64-bit analogue signature for each aligned non-overlapping region (e.g. 16x16 squares or some other size or shape) of the frame. These analogue region signatures are stored in a region signature memory 320 from where they can be read by the encoder 316.
The operation of the analogue hardware KVM via IP encoder 310 shown in Figure 4 will now be described.
The encoder 316 first reads data stored in the region signature memory 320 and compares it with the corresponding data for the previous frame also stored in the region signature memory 320 (i.e. it performs differencing in the same way as the encoder 116 using signature data from memory 120 in Figure 2). When the pair of signatures are equal (within a predetermined threshold chosen to allow for the noise present in the analogue input), it is determined that a portion of the frame has not changed, and no data corresponding to that portion of the frame is included in the encoded video data output by the encoder 316.
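A sketch of the thresholded comparison, assuming scalar signatures whose numeric difference reflects the analogue noise; the threshold value is arbitrary:

```python
def changed_tiles(prev_sigs, curr_sigs, threshold=8):
    """Return tile positions considered changed.  A tile whose analogue
    signature differs from the stored one by no more than the noise
    threshold is treated as unchanged and omitted from the output."""
    changed = set()
    for pos, sig in curr_sigs.items():
        old = prev_sigs.get(pos)
        if old is None or abs(sig - old) > threshold:
            changed.add(pos)
    return changed
```

Exact equality, as used for the digital encoder 116, is simply the special case of a zero threshold.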
When the encoder 316 determines that a portion of the screen has changed, the encoder 316 reads raw video data for that portion of the frame from the frame buffer memory 314. The encoder 316 may apply additional types of encoding or compression to this raw data in accordance with known techniques before outputting the video data into the encoded video data transmitted across network 150 to remote decoder 160 for displaying on remote display 170.
Figure 5 shows in greater detail the internal organisation of the analogue signature generator 318. Each of the sub-components within the analogue signature generator 318 may be optionally enabled or disabled. That is, an analogue signature generator 318 may comprise only certain of the processing steps described below. To deal with analogue originating video data, a signature function that can compute signatures for analogue signals such that small perturbations to the input due to noise result in small or zero changes to the output is used.
Raw video data input to the analogue signature generator 318 is first optionally (and if appropriate for the raw data format) processed by an RGB to YUV conversion function 410 which converts the video data from red-green-blue format with 8 bits for each colour component to luminance/chrominance format with more bits for luminance and fewer bits for chrominance. YUV is also known as YCbCr and is defined by ISO/IEC standard 13818. This conversion has the benefit that the magnitude of the difference between two colour values more closely matches the human visual system's sensitivity to those differences (humans are more sensitive to some colours than others).
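A conversion of this kind might be sketched as below. The coefficients used are the common ITU-R BT.601 ones; the text cites ISO/IEC 13818 but does not spell out the matrix, so treat the exact numbers as an assumption.

```python
def rgb_to_yuv(r, g, b):
    """Convert one 8-bit RGB pixel to luminance/chrominance form.

    BT.601 coefficients (an assumption; the patent only names the
    conversion).  Downstream, Y would be kept at higher precision than
    U and V, matching the eye's greater sensitivity to luminance."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance
    u = 0.492 * (b - y)                     # blue-difference chrominance
    v = 0.877 * (r - y)                     # red-difference chrominance
    return y, u, v
```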
The data is next optionally processed by an edge scaling function 412 which is arranged to double the magnitude of values corresponding to pixels at the edges of the regions into which the frame is divided for the purposes of signature calculation. This increases the sensitivity of the signature to movement of elements of the image that only slightly overlap the region, and would therefore otherwise have little influence on the signature.
The data is next differentiated by a differentiator 414. That is, for each of the three colour components, the difference between the current value and the last processed value (the value for the last processed pixel) is calculated. Pixel values at the leading edge of a frame tile may have come from a neighbouring tile. In order to prevent the signatures of pixels in one tile affecting those of a neighbouring tile, values at the leading edge of a tile are set to a zero output. The output of the differentiator 414 is six values: the three original colour components and the three differentiated colour components.
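The differentiation of one colour component along a scan line, with the zero forced at tile boundaries, can be sketched as follows (an illustrative model, not the hardware):

```python
def differentiate_row(values, tile_start_cols):
    """Output, for one colour component, the difference between each
    value and the previously processed one, forced to zero at the first
    column of each tile so that pixels in one tile cannot influence a
    neighbouring tile's signature."""
    out, prev = [], 0
    for x, v in enumerate(values):
        out.append(0 if x in tile_start_cols else v - prev)
        prev = v
    return out
```

In the hardware this is done per component, and each pixel yields six values: the three original components alongside the three differentiated ones.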
A pseudo-random code generator 416 uses a linear feedback shift register to generate a pseudo-random code for each pixel position. This code is a function of the pixel's position within its region and the initial value of the linear feedback shift register, but is the same for corresponding pixels within different regions of a frame. This code pattern may alternatively either be chosen at random or be based upon prior evaluation of the system's performance using that code pattern with typical video sequences. The pseudo-random code generator is reset at the first column of the tile. The value set is dependent upon the row number of the tile within the frame. Consequently, the same pseudo-random code is used for each tile in each frame that is in the same position within the display.
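A linear feedback shift register of the kind described might look like the following sketch. The register width and tap polynomial are illustrative assumptions (the text does not give them); the seed would be derived from the tile's row number, so every tile in the same display position sees the same code sequence.

```python
def lfsr_codes(row_seed, n, width=16, taps=(16, 14, 13, 11)):
    """Generate n pseudo-random codes from a Fibonacci LFSR.

    width and taps are placeholders; taps (16, 14, 13, 11) give a
    maximal-length sequence for a 16-bit register."""
    mask = (1 << width) - 1
    state = row_seed & mask or 1          # an LFSR must not start at zero
    codes = []
    for _ in range(n):
        bit = 0
        for t in taps:                    # XOR the tapped bits
            bit ^= (state >> (t - 1)) & 1
        state = ((state << 1) | bit) & mask
        codes.append(state)
    return codes
```

Because the generator is reset at the first column of each tile with a row-dependent seed, corresponding pixels in tiles at the same display position always receive the same code, which is what makes the signatures comparable frame to frame.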
A scaler 418 optionally multiplies the six values output by the differentiator 414 by either −1, +1/2, +1 or +3/2, as determined by the code supplied by the pseudo-random code generator 416.
A permutator 420 optionally permutes the six values output by the scaler 418, along with two zero inputs, in a pattern determined by the pseudo-random code generator 416, to output eight values. The two zero inputs are included to prevent the output of the accumulator being held at a saturated value too frequently or for too long.
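The scaler and permutator stages together might be sketched as below. The four multipliers come from the text; the mapping of code bits to multipliers, and the use of a rotation as the permutation, are illustrative assumptions.

```python
SCALE_FACTORS = (-1.0, 0.5, 1.0, 1.5)   # the four multipliers named in the text

def scale_and_permute(six_values, code):
    """Scale each of the six differentiator outputs by a factor selected
    by two bits of the pseudo-random code, then rotate them together
    with two zero inputs into eight outputs.  The bit layout here is a
    placeholder, not the patented arrangement."""
    scaled = [v * SCALE_FACTORS[(code >> (2 * i)) & 3]
              for i, v in enumerate(six_values)]
    eight = scaled + [0.0, 0.0]          # zeros limit accumulator saturation
    rot = (code >> 12) & 7               # rotation amount from the code
    return eight[rot:] + eight[:rot]
```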
A partial signature memory 422 stores partially-computed analogue signatures.
An accumulator 424 retrieves partially-computed analogue signatures from the partial signature memory 422 and adds to them the values output by the permutator 420. The number of bits used to store each value is chosen such that the accumulated values do not overflow too frequently.
When the last pixel of each region has been processed the output of the accumulator 424 is the output from the analogue signature generator 318. The eight values, each of eight bits, are combined to form an analogue region signature. When the final signature for a region is output from the accumulator 424, a zero is written to the partial signature memory 422 in preparation for generating the next region signature. For all other pixels within a region the output from the accumulator 424 is written to the partial signature memory 422.
The analogue signature may be truncated, retaining only the most significant bits, in order to reduce the effect of noise. The number of bits may not be equal for each value. The collection of values together form the analogue signature.
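The accumulation and truncation steps can be sketched as follows. The 8-bit accumulator width matches the text; the number of bits retained after truncation is a placeholder, and as the text notes it need not be the same for every value.

```python
ACC_MASK = 0xFF   # eight bits per accumulated value, as in the text

def accumulate(partial, contributions):
    """Add the permutator's eight outputs into the eight partial sums,
    wrapping at the accumulator width."""
    return [(p + c) & ACC_MASK for p, c in zip(partial, contributions)]

def truncate_signature(values, keep_bits=5):
    """Keep only the most significant bits of each accumulated value,
    discarding the low-order bits where analogue noise concentrates.
    keep_bits=5 is illustrative."""
    return [v >> (8 - keep_bits) for v in values]
```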
In accordance with a further embodiment of the present invention, a hardware encoder for a KVM via IP product capable of processing both digital and analogue PC video data is generated by combining the encoders 110 and 310 shown in Figures 2 and 4. When digital PC video data is output from computer 100, encoder 110 is used, and when analogue PC video data is output from computer 100, encoder 310 is used. Whether digital or analogue PC video data is being output from computer 100 may be simply detected using known techniques that will not be described in further detail here.
In accordance with certain embodiments of the present invention the process of detecting copyrects (that is, portions of video data that are translated from a first frame to a second frame, alternatively referred to herein as motion detection) may be implemented in a hardware encoder arranged to implement a pipelined process in order to increase the efficiency of the process, as will now be described in greater detail.
The pipelined process is implemented by a pipelined content-addressable memory (CAM) that compares the rolling signatures for a new frame with the aligned rolling signatures computed for the last video frame. It uses a tree organisation as graphically illustrated in Figure 7. The tree structure illustrated in Figure 7 allows each new rolling signature generated for the current frame to be assessed against all of the stored aligned rolling signatures for the previous frame without requiring the current signature to be individually compared to every stored signature. This streamlines the processing of new rolling signatures, resulting in significant efficiency savings, as will be described in relation to Figure 7. The pipelined CAM is an alternative implementation of the signature memory 126, run length encoder 128 and match log memory 130 of Figure 2.
The CAM is implemented as a series of separate processing stages. At each stage, the current rolling signature for the current frame is compared to a single stored rolling signature for the previous frame. The processing stage determines whether the two signatures are the same, or which one is the larger. The result of this comparison is used to determine which stored signature is compared to the current rolling signature in the following stage (unless a match has already been found).
Figure 6 shows a generic stage 10 for the CAM. Each stage generally comprises a random access memory 12 having an address input 13 for receiving address data. The RAM 12 stores the aligned rolling checksums from the previous frame. An output from the RAM 12 supplies a single aligned rolling signature value from the previous frame to an input of a comparator 14, the aligned rolling signature being addressed by the address input 13. A further input 15 of the comparator 14 is supplied with the current rolling signature value for the new frame from the rolling signature generator 122. The output of the comparator 14 (providing an indication of a match, or which signature is greater) is applied as an input to a logic device 16 which implements a function f as described below. The logic device 16 also receives a match signal 18 as a further input (which comprises the results of the previous processing stages), and supplies an updated match signal 18 as an output which is passed to the next processing stage (the match signal output 18 comprising the results of the processing at all of the previous stages including the current stage). The logic device 16 provides a further output to the address line 13 which determines the address to be provided to the RAM 12 in the following stage (that is, determining which stored aligned rolling signature from the previous frame is to be assessed in the next processing stage).
Each processing stage 10 is controlled by a clock signal 20 which prevents the data signal 15 (comprising the current rolling signature for the current frame), the updated match signal 18 and the address signal 13 from being passed to the next processing stage 10 until the current processing stage 10 has completed its processing. It will be appreciated that in certain embodiments of the present invention there may only be a single RAM 12, comparator 14 and logic device 16; the output data line 15, match line 18 and address line 13 being looped back to their respective inputs for processing during the next processing stage.
Each stage compares an input value on data input 15 (comprising the current rolling signature value for the current frame) with a single stored aligned rolling signature value from the previous frame. The result of the comparison determines a further result bit upon the match line 18 (indicating whether the current stage provided a matched rolling signature, or which signature was the larger) which is applied to the next stage in the pipeline (for assessing the next stage of the tree structure shown in Figure 7).
The logic function f operates as follows. If the match input is set, then that indicates that there was an exact match between a rolling signature for the current frame and a stored aligned rolling signature in an earlier stage. The current stage appends a 0 to the address signal on line 13 and sets the match output 18. That is, if there was a match in a previous processing stage then the current rolling signature has already been matched and thus the current processing stage is redundant. The match output 18 remains set.
Otherwise, if the comparator of the current stage indicates an exact match between the rolling signature for the current frame and the stored aligned rolling signature read in this stage, then the current stage appends a 1 to the address signal 13 (indicating to future stages which stage identified a match) and sets the match output 18.
Otherwise, the current stage appends a 0 to the address signal 13 if the current rolling signature is less than the rolling signature from the RAM or appends a 1 to the address signal 13 if it is greater and the match output 18 is not set.
In this way, each processing stage determines the rolling signature stored in the RAM 12 to be assessed in the following processing stage. After all processing stages are complete, if the match output 18 is set then this indicates that the current rolling signature has been matched to a stored aligned rolling signature for the previous frame. The final address signal 13 indicates which stored aligned rolling signature was matched according to the position of a 1 within the final address signal.
In this manner, an efficient method is provided of searching for a match between a newly generated rolling signature and each of the stored aligned rolling signatures from the previous frame. It is not necessary to compare the newly generated rolling signature individually against every one of the stored aligned rolling signatures for the previous frame.
The first and last stages may be of a simpler construction for the following reasons. The first stage is simplified because at this stage there is no previous comparison which could have found a match, so the test applied in computing a match reduces to a single comparison of two values. Similarly, the last stage is simplified because at the end of the last stage there is no need to add an additional address bit: either the last stage produces a match at the currently determined address (from the output of all previous stages), such that the address of the match is already known, or the last stage does not produce a match, in which case the address is not useful anyway.
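The stage logic f and the tree addressing can be modelled in software as follows. This is an illustrative sketch (not the hardware design): `tree[s]` holds the 2^s stored signatures compared at stage s, and the numeric values used in the usage note below are placeholders rather than the word values from Figure 7.

```python
def cam_search(tree, key):
    """Search a CAM arranged as an implicit binary tree.

    Each stage performs one comparison, mirroring the function f: an
    exact match appends a 1 and sets the match flag; once the flag is
    set, later stages append 0; otherwise the appended bit records
    which value was larger and doubles as part of the next stage's RAM
    address."""
    match, addr, bits = False, 0, []
    for stage in tree:
        value = stage[addr]
        if match:
            bit = 0                      # match already found upstream
        elif key == value:
            bit, match = 1, True
        else:
            bit = 1 if key > value else 0
        bits.append(bit)
        addr = (addr << 1) | bit         # address for the next stage's RAM
    return match, ''.join(map(str, bits))
```

With fifteen entries laid out as a balanced tree over the sorted values 1 to 15, searching for 6 yields a match at address 0110 and searching for 9 a match at address 1001, reproducing the shape of the "lazy" and "sea" walks in the worked example that follows.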
An example of operation of the pipelined CAM, by analogy to a system for searching natural language, will now be described with reference to Figure 7, which shows a graphic representation 30 of fifteen entries arranged as a four-stage pipelined binary tree. Each of the fifteen words 32 within Figure 7 is equivalent to a separate stored aligned signature within the RAM 12 for a previous frame.
Consider a 15-entry CAM 30 with 4 stages, containing the following words 32: "the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog", "she", "sells", "sea", "shells", "by", "shore". The 15 entries each correspond to a separate aligned rolling signature for a previous frame. The value of each word (corresponding to the numeric value of each rolling signature) is shown below each word. The words are arranged in a particular fashion such that at each stage only a single comparison needs to be performed (that is, only four comparisons are required) in order to search through all fifteen words. It should be noted that duplicate values (e.g. "the") are not included for simplicity, and that the values are padded with a "maxint" or "minint" value if they do not fill the CAM (in this example, the four-stage tree provides more positions than there are words, and so the final position is padded with the value MAX). The values of each word (each corresponding to an aligned rolling signature) are stored in the RAM 12 as illustrated in Figure 6. Each word is arranged within the tree structure of Figure 7 according to its assigned value (that is, rolling signatures are arranged according to their numeric value).
Consider searching for "lazy". This is equivalent to comparing a newly generated rolling signature that corresponds to a particular stored aligned rolling signature for a previous frame with the stored aligned rolling signatures for the previous frame. In stage 0, "lazy" is compared with "quick". The match signal 18 is not set as they are not equal, and the most significant bit of the address is set to 0 as "lazy" < "quick". That is, the value assigned to "lazy" (6) is less than the value assigned to "quick" (8) - for rolling signatures, this is equivalent to the newly generated rolling signature having a lower numeric value than the first stored aligned rolling signature.
In stage 1, location 0 is read from the RAM and "lazy" is compared with "fox" ("fox" being chosen as its address within the RAM is 0 - the address of each word being shown above the word). Match is not set as they are not equal, and the next bit of the address is set to 1 as "lazy" > "fox". In stage 2, location 01 is read from the RAM and "lazy" is compared with "lazy". Match is set as they are equal, and the next bit of the address is set to 1. In stage 3, match is already set so the next bit of the address is set to 0. The final result indicates a match at address 0110 (decimal 6). For rolling signatures, this is equivalent to the newly generated rolling signature successfully being matched to a stored aligned rolling signature at stage 2.
Consider searching for a value from the last stage, "sea". In stage 0, "sea" is compared with "quick". Match is not set as they are not equal, and the most significant bit of the address is set to 1 as "sea" > "quick". In stage 1, location 1 is read from the RAM and "sea" is compared with "shells". Match is not set as they are not equal, and the next bit of the address is set to 0 as "sea" < "shells". In stage 2, location 10 is read from the RAM and "sea" is compared with "sells". Match is not set as they are not equal, and the next bit of the address is set to 0 as "sea" < "sells". In stage 3, location 100 is read from the RAM and "sea" is compared with "sea". Match is set as they are equal, and the next bit of the address is set to 1. The final result indicates a match at address 1001 (decimal 9).
Consider searching for non-existent value "foo". This is equivalent to the current rolling signature for a frame not matching any of the stored aligned rolling signatures for an old frame. In stage 0, "foo" is compared with "quick". Match is not set as they are not equal, and the most significant bit of the address is set to 0 as "foo" < "quick". In stage 1, location 0 is read from the RAM and "foo" is compared with "fox". Match is not set as they are not equal, and the next bit of the address is set to 0 as "foo" < "fox". In stage 2, location 00 is read from the RAM and "foo" is compared with "by". Match is not set as they are not equal, and the next bit of the address is set to 1 as "foo" > "by". In stage 3, location 001 is read from the RAM and "foo" is compared with "dog". Match is not set as they are not equal, and the next bit of the address is undefined. The final result indicates no match, and so the address is immaterial.
For detecting copyrects within a current frame, the process continues by calculating a new rolling signature for the next pixel position and then processing that new rolling signature, in the same way as shown above in Figures 6 and 7, against the same stored aligned rolling signatures for the old frame.
Embodiments of the present invention described above predominantly relate to a comparison of two separate video frames that follow consecutively within a continuous series of video data. However, the present invention is not limited to a comparison of a first frame with a second, immediately subsequent frame. Indeed, any two frames may be compared, even if widely separated in time, for instance by intervening frames. Furthermore, the second frame may precede the first frame (again, possibly with intervening frames). The present invention further extends to instances in which the first and second frames are one and the same frame, in which case differences between portions of the same frame can be detected, or translated and copied portions of video data within the same frame can be detected. Further modifications and applications of the present invention will be readily apparent to the appropriately skilled person from the teaching herein, without departing from the scope of the appended claims.

Claims

1. A method of detecting a translation of a portion of video data received from a video source, the method comprising: computing first signatures for each of a plurality of portions of a video frame; computing a second signature for a portion of a video frame; comparing the second signature with the first signatures; and determining whether a portion of video data has been translated according to a result of said comparison.
2. A method according to claim 1, wherein said first signatures are computed for a plurality of portions of a first video frame and said second signature is computed for a portion of a second video frame, the method comprising determining whether a portion of video data has been translated between the first and the second frames according to a result of said comparison.
3. A method according to claim 1, wherein said first signatures are computed for a plurality of portions of a video frame and said second signature is computed for a different portion of the same video frame, the method comprising determining whether a portion of video data has been translated within the video frame according to a result of said comparison.
4. A method according to any one of the preceding claims, further comprising computing second signatures for each of a plurality of portions of a video frame, and comparing each second signature with the first signatures.
5. A method according to any one of the preceding claims, further comprising: storing at least some of the second signatures associated with aligned and non-overlapping portions of the video frame; and repeating the method using the stored second signatures in place of the first signatures.
6. A method according to any one of the preceding claims, further comprising: identifying a first subset of the first signatures; comparing the first subset of first signatures with the or each second signature; and determining whether a region of the video frame has been translated according to a result of said comparison.
7. A method according to claim 6, wherein the first subset comprises one or more first signatures for which a value of the signature is unique amongst the first signatures computed for the video frame.
8. A method according to any one of the preceding claims, the method further comprising computing a magnitude of the translation of the video data.
9. A method according to any one of the preceding claims, wherein the signatures are computed using a rolling signature algorithm, the rolling signature algorithm allowing a new signature to be computed from a previous signature of an overlapping portion of the video frame and the values of those pixels either included in the previous overlapping portion of the video frame and not in the current portion of the video frame or are included in the current portion of the video frame and not in the previous overlapping portion of the video frame.
10. A method according to any one of the preceding claims, further comprising: generating an encoded representation of a video frame; and transmitting the encoded representation of the video frame over a computer network to a decoder associated with a remote computer; wherein the encoded representation of the video frame allows the identification of portions of video data that have been translated; and wherein the decoder is adapted to reproduce the video frame using the encoded representation of the video frame and a previously decoded representation of another video frame.
11. A method according to claim 10, further comprising: computing third signatures for each of a plurality of aligned and non-overlapping portions of a video frame; computing fourth signatures for each of a plurality of aligned and non-overlapping portions of a video frame; and comparing the third and fourth signatures to identify portions of a video frame that are the same as corresponding portions of another video frame; wherein an encoded representation of a video frame only includes an encoded representation of portions of the video frame that are not the same as portions of another video frame.
12. A method according to any one of the preceding claims, further comprising: receiving analogue video data received from a video output of a further computer; converting the analogue video data to digital video data; computing fifth signatures for each of a plurality of portions of a video frame; computing sixth signatures for each of a plurality of portions of a video frame; comparing the fifth and sixth signatures to identify portions of a video frame that are the same as other portions of a video frame; wherein the signatures are computed using a function which provides substantially the same value irrespective of the presence of noise in the digital video data.
13. A method of processing analogue video data received from a video source, the method comprising: converting the analogue video data to digital video data; computing first signatures for each of a plurality of portions of a video frame; computing second signatures for each of a plurality of portions of a video frame; comparing the first and second signatures to identify portions of a video frame that are the same as other portions of a video frame; wherein the signatures are computed using a function which provides substantially the same value irrespective of the presence of noise in the digital video data.
14. A method according to claim 13, wherein said first signatures are computed for a plurality of portions of a first video frame and said second signatures are computed for portions of a second video frame, the method comprising identifying portions of video data within the second video frame that are the same as corresponding portions of video data within the first video frame.
15. A method according to claim 13, wherein said first signatures are computed for a plurality of portions of a video frame and said second signatures are computed for a different plurality of portions of the same video frame, the method comprising identifying portions of video data that are the same as other portions of video data within the same video frame.
16. A method according to any one of claims 13 to 15, further comprising: generating an encoded representation of the video frame; and transmitting the encoded representation of a video frame over a computer network to a decoder associated with a remote computer; wherein the encoded representation of the video frame allows the identification of portions of the video frame that are the same as corresponding portions of another video frame; and wherein the decoder is adapted to reproduce the video frame using the encoded representation of the video frame and a previously decoded representation of another video frame.
17. A carrier medium carrying computer readable code for controlling a data processing device to carry out the method of any one of claims 1 to 16.
18. A data processing device for processing video data, the apparatus comprising: a program memory storing processor readable instructions; and a processor configured to read and execute instructions stored in said program memory; wherein the processor readable instructions comprise instructions controlling the processor to carry out the method of any one of claims 1 to 16.
19. An apparatus for detecting a translation of a portion of video data received from a video source, the apparatus comprising: a signature generator operable to compute first signatures for each of a plurality of portions of a video frame and to compute a second signature for a portion of a video frame; a comparator operable to compare the second signature with the first signatures and to determine whether a portion of video data has been translated.
20. An apparatus according to claim 19, wherein the apparatus is further arranged to implement the method of any one of claims 2 to 12.
21. An apparatus according to claim 19 or claim 20, wherein the comparator is in the form of a pipeline comprising a series of processing stages arranged to compare a second signature iteratively against a plurality of first signatures to determine whether a portion of video data has been translated.
22. An apparatus according to claim 21, wherein the apparatus implements a binary tree storage structure for storing the plurality of first signatures, arranged such that at each processing stage the pipeline is operable to compare the second signature with a single first signature, the number of stages being less than the number of first signatures.
23. An apparatus for processing analogue video data received from a video source, the apparatus comprising: an analogue to digital converter arranged to convert analogue video data to digital video data; a signature generator operable to compute first signatures for each of a plurality of portions of a video frame and to compute second signatures for each of a plurality of portions of a video frame; and a comparator operable to compare the first and second signatures to identify portions of a video frame that are the same as other portions of a video frame; wherein the first and second signatures are computed using a function which provides substantially the same value irrespective of the presence of noise in the digital video data.
24. An apparatus according to claim 23, wherein the apparatus is further arranged to implement the method of any one of claims 14 to 16.
PCT/GB2008/001332 2007-04-16 2008-04-16 Video data transmission Ceased WO2008125861A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB0917453A GB2460588A (en) 2007-04-16 2009-10-06 Video data transmission

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0707276.2 2007-04-16
GBGB0707276.2A GB0707276D0 (en) 2007-04-16 2007-04-16 Video data transmission

Publications (2)

Publication Number Publication Date
WO2008125861A2 true WO2008125861A2 (en) 2008-10-23
WO2008125861A3 WO2008125861A3 (en) 2009-04-02

Family

ID=38116778

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2008/001332 Ceased WO2008125861A2 (en) 2007-04-16 2008-04-16 Video data transmission

Country Status (2)

Country Link
GB (2) GB0707276D0 (en)
WO (1) WO2008125861A2 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013040708A1 (en) * 2011-09-19 2013-03-28 Quickplay Media Inc. Media processor
US8671021B2 (en) 2006-12-13 2014-03-11 Quickplay Media Inc. Consumption profile for mobile media
WO2014088707A1 (en) * 2012-12-05 2014-06-12 Silicon Image, Inc. Method and apparatus for reducing digital video image data
EP2647000A4 (en) * 2010-11-30 2015-09-30 Ati Technologies Ulc Method and apparatus for providing static frame
US9448760B2 (en) 2009-04-23 2016-09-20 Vmware, Inc. Method and system for identifying drawing primitives for selective transmission to a remote display
US9866604B2 (en) 2008-04-04 2018-01-09 Quickplay Media Inc Progressive download playback
US10327044B2 (en) 2006-12-13 2019-06-18 Quickplay Media Inc. Time synchronizing of distinct video and data feeds that are delivered in a single mobile IP data network compatible stream

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE69419439T2 (en) * 1993-01-11 1999-12-16 Canon K.K., Tokio/Tokyo Device and method for motion detection
US5990852A (en) * 1996-10-31 1999-11-23 Fujitsu Limited Display screen duplication system and method
US6331855B1 (en) * 1999-04-28 2001-12-18 Expertcity.Com, Inc. Method and apparatus for providing remote access, control of remote systems and updating of display information
EP1752891A3 (en) * 1999-11-29 2007-02-21 Sony Corporation Method and apparatus for establishing and browsing a hierarchical video camera motion transition graph.
US20020131500A1 (en) * 2001-02-01 2002-09-19 Gandhi Bhavan R. Method for determining a motion vector for a video signal
JP2005501355A (en) * 2001-08-27 2005-01-13 コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ Caching method
US6983020B2 (en) * 2002-03-25 2006-01-03 Citrix Online Llc Method and apparatus for fast block motion detection

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10327044B2 (en) 2006-12-13 2019-06-18 Quickplay Media Inc. Time synchronizing of distinct video and data feeds that are delivered in a single mobile IP data network compatible stream
US8671021B2 (en) 2006-12-13 2014-03-11 Quickplay Media Inc. Consumption profile for mobile media
US11675836B2 (en) 2006-12-13 2023-06-13 Directv, Llc Mobile media pause and resume
US8805270B2 (en) 2006-12-13 2014-08-12 Quickplay Media Inc. Seamlessly switching among unicast, multicast, and broadcast mobile media content
US11182427B2 (en) 2006-12-13 2021-11-23 Directv, Llc Mobile media pause and resume
US11113333B2 (en) 2006-12-13 2021-09-07 The Directv Group, Inc. Automated content tag processing for mobile media
US10459977B2 (en) 2006-12-13 2019-10-29 Quickplay Media Inc. Mediation and settlement for mobile media
US10409862B2 (en) 2006-12-13 2019-09-10 Quickplay Media Inc. Automated content tag processing for mobile media
US10078694B2 (en) 2006-12-13 2018-09-18 Quickplay Media Inc. Mediation and settlement for mobile media
US10180982B2 (en) 2006-12-13 2019-01-15 Quickplay Media Inc. Mobile media pause and resume
US9866604B2 (en) 2008-04-04 2018-01-09 Quickplay Media Inc Progressive download playback
EP2244182B1 (en) * 2009-04-23 2020-03-11 VMWare, Inc. Method and system for identifying drawing primitives for selective transmission to a remote display
US10067732B2 (en) 2009-04-23 2018-09-04 Vmware, Inc. Method and system for identifying drawing primitives for selective transmission to a remote display
US9448760B2 (en) 2009-04-23 2016-09-20 Vmware, Inc. Method and system for identifying drawing primitives for selective transmission to a remote display
US10572214B2 (en) 2009-04-23 2020-02-25 Vmware, Inc. Method and system for identifying drawing primitives for selective transmission to a remote display
US11003412B2 (en) 2009-04-23 2021-05-11 Vmware, Inc. Method and system for identifying drawing primitives for selective transmission to a remote display
US11397553B2 (en) 2009-04-23 2022-07-26 Vmware, Inc. Method and system for identifying drawing primitives for selective transmission to a remote display
EP2647000A4 (en) * 2010-11-30 2015-09-30 Ati Technologies Ulc Method and apparatus for providing static frame
WO2013040708A1 (en) * 2011-09-19 2013-03-28 Quickplay Media Inc. Media processor
EP2759156A4 (en) * 2011-09-19 2015-10-21 Quickplay Media Inc Media processor
WO2014088707A1 (en) * 2012-12-05 2014-06-12 Silicon Image, Inc. Method and apparatus for reducing digital video image data

Also Published As

Publication number Publication date
WO2008125861A3 (en) 2009-04-02
GB0707276D0 (en) 2007-05-23
GB2460588A (en) 2009-12-09
GB0917453D0 (en) 2009-11-18

Similar Documents

Publication Publication Date Title
US7986844B2 (en) Optimized video compression using hashing function
US7606314B2 (en) Method and apparatus for caching, compressing and transmitting video signals
US7684483B2 (en) Method and apparatus for digitizing and compressing remote video signals
CN112204618B (en) Method, apparatus and system for mapping 3D point cloud data into a 2D surface
US10783668B2 (en) Handling duplicate points in point cloud compression
WO2008125861A2 (en) Video data transmission
US8787460B1 (en) Method and apparatus for motion vector estimation for an image sequence
EP3097539B1 (en) A method and system for interactive graphics streaming
US7450129B2 (en) Compression of streams of rendering commands
US20130148740A1 (en) Method and apparatus for processing partial video frame data
WO2006024011A2 (en) Method and apparatus for capturing and transmitting screen images
KR20190027716A (en) Hardware friendly virtual frame buffer
CN101099174A (en) Image Compression for Computer Graphics
US7421130B2 (en) Method and apparatus for storing image data using an MCU buffer
JP2008526107A (en) Using graphics processors in remote computing
US20170201763A1 (en) Digital watermarking for securing remote display protocol output
US8099578B2 (en) Method and system for finding scrolled regions within a tile cache
US20210350582A1 (en) Point cloud global tetris packing
EP0987656A2 (en) Method of graphics data compression
CN119071496A (en) Remote desktop screen frame transmission method, system, electronic device and storage medium
Chang et al. Histogram-based reversible data hiding based on pixel differences with prediction and sorting
CA2463824C (en) Lossless variable-bit signature compression
Shafee et al. A secure steganography algorithm using compressive sensing based on HVS feature
JP2002539663A (en) Data compression method and apparatus by embedded run-length encoding
Cherckesova et al. Video stream processing using steganographic information concealing method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08736994

Country of ref document: EP

Kind code of ref document: A2

ENP Entry into the national phase

Ref document number: 0917453

Country of ref document: GB

Kind code of ref document: A

Free format text: PCT FILING DATE = 20080416

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08736994

Country of ref document: EP

Kind code of ref document: A2