US20120314775A1 - Video decoder with transposing vector processor and methods for use therewith - Google Patents
Video decoder with transposing vector processor and methods for use therewith
- Publication number
- US20120314775A1 (application US 13/162,075)
- Authority
- US
- United States
- Prior art keywords
- data
- vector
- orientation
- matrix
- read
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/40—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video transcoding, i.e. partial or full decoding of a coded input stream followed by re-encoding of the decoded output stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/12—Selection from among a plurality of transforms or standards, e.g. selection between discrete cosine transform [DCT] and sub-band transform or selection between H.263 and H.264
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/42—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/44—Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/30—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
Definitions
- the present invention relates to coding used in devices such as video encoders/decoders for stereoscopic television signals.
- Video encoding has become an important issue for modern video processing devices. Robust encoding algorithms allow video signals to be transmitted with reduced bandwidth and stored in less memory. However, the accuracy of these encoding methods faces the scrutiny of users who are becoming accustomed to greater resolution and higher picture quality. Standards have been promulgated for many encoding methods including the H.264 standard that is also referred to as MPEG-4, part 10 or Advanced Video Coding (AVC), and the VP8 standard set forth by On2 Technologies, Inc. While these standards set forth many powerful techniques, further improvements are possible to improve the performance and speed of implementation of such methods. The video signal encoded by these encoding methods must be similarly decoded for playback on most video display devices.
- SVC Scalable Video Coding
- Efficient and fast encoding and decoding of video signals is important to the implementation of many video devices, particularly video devices that are destined for home use. Further limitations and disadvantages of conventional and traditional approaches will become apparent to one of ordinary skill in the art through comparison of such systems with the present invention.
- FIGS. 1-3 present pictorial diagram representations of various video devices in accordance with embodiments of the present invention.
- FIG. 4 presents a block diagram representation of a video system in accordance with an embodiment of the present invention.
- FIG. 5 presents a block diagram representation of a video decoder 102 in accordance with an embodiment of the present invention.
- FIG. 6 presents a block diagram representation of a multi-format video decoder 150 in accordance with an embodiment of the present invention.
- FIG. 7 presents a block diagram representation of a multi-format video decoder 150 in accordance with an embodiment of the present invention.
- FIG. 8 presents a block diagram representation of a decoding process in accordance with an embodiment of the present invention.
- FIG. 9 presents a block diagram representation of a matrix vector processing unit 190 in accordance with another embodiment of the present invention.
- FIG. 10 presents a block diagram representation of a filter vector processing unit in accordance with another embodiment of the present invention.
- FIG. 11 presents a block diagram representation of a VPU instruction 180 in accordance with another embodiment of the present invention.
- FIG. 12 presents a block diagram representation of a VPU 200 in accordance with an embodiment of the present invention.
- FIG. 13 presents a block diagram representation of a VPU 300 in accordance with an embodiment of the present invention.
- FIG. 14 presents a graphical representation of a write operation in accordance with an embodiment of the present invention.
- FIG. 15 presents a graphical representation of a write operation in accordance with an embodiment of the present invention.
- FIG. 16 presents a graphical representation of a read operation in accordance with an embodiment of the present invention.
- FIG. 17 presents a graphical representation of a read operation in accordance with an embodiment of the present invention.
- FIG. 18 presents a block diagram representation of a VPU 325 in accordance with an embodiment of the present invention.
- FIG. 19 presents a block diagram representation of a video distribution system 375 in accordance with an embodiment of the present invention.
- FIG. 20 presents a block diagram representation of a video storage system 179 in accordance with an embodiment of the present invention.
- FIG. 21 presents a flow diagram representation of a method in accordance with an embodiment of the present invention.
- FIG. 22 presents a flow diagram representation of a method in accordance with an embodiment of the present invention.
- FIGS. 1-3 present pictorial diagram representations of various video devices in accordance with embodiments of the present invention.
- set top box 10 with built-in digital video recorder functionality or a stand alone digital video recorder, television or monitor 20 and portable computer 30 illustrate electronic devices that incorporate a video decoder in accordance with one or more features or functions of the present invention. While these particular devices are illustrated, the present invention can be implemented in any device that is capable of decoding and/or transcoding video content in accordance with the methods and systems described in conjunction with FIGS. 4-15 and the appended claims.
- FIG. 4 presents a block diagram representation of a video decoder 102 in accordance with an embodiment of the present invention.
- this video device includes a receiving module 100 , such as a server, cable head end, television receiver, cable television receiver, satellite broadcast receiver, broadband modem, 3G transceiver or other information receiver or transceiver that is capable of receiving a received signal 98 and generating a video signal 110 that has been encoded via a video encoding format.
- Video processing device 125 includes video decoder 102 and is coupled to the receiving module 100 to decode or transcode the video signal for storage, editing, and/or playback in a format corresponding to video display device 104 .
- Video processing device 125 can include set top box 10 with built-in digital video recorder functionality or a stand alone digital video recorder. While shown as separate from video display device 104, video processing device 125, including video decoder 102, can be incorporated in television or monitor 20 and portable computer 30 or other device that includes a video decoder, such as video decoder 102.
- the received signal 98 is a broadcast video signal, such as a television signal, high definition television signal, enhanced definition television signal or other broadcast video signal that has been transmitted over a wireless medium, either directly or through one or more satellites or other relay stations or through a cable network, optical network or other transmission network.
- received signal 98 can be generated from a stored video file, played back from a recording medium such as a magnetic tape, magnetic disk or optical disk, and can include a streaming video signal that is transmitted over a public or private network such as a local area network, wide area network, metropolitan area network or the Internet.
- Video signal 110 can include a digital video signal complying with a digital video codec standard such as H.264, MPEG-4 Part 10 Advanced Video Coding (AVC) including an SVC signal, an encoded stereoscopic video signal having a base layer that includes a 2D compatible base layer and an enhancement layer generated by processing in accordance with an MVC extension of MPEG-4 AVC, or another digital format such as a Motion Picture Experts Group (MPEG) format (such as MPEG1, MPEG2 or MPEG4), Quicktime format, Real Media format, Windows Media Video (WMV) or Audio Video Interleave (AVI), video coding one (VC-1), VP8, etc.
- Video display devices 104 can include a television, monitor, computer, handheld device or other video display device that creates an optical image stream either directly or indirectly, such as by projection, based on the processed video signal 112 either as a streaming video signal or by playback of a stored digital video file.
- FIG. 5 presents a block diagram representation of a video decoder 102 in accordance with an embodiment of the present invention.
- Video decoder 102 includes an entropy decoding device 140 having a processing module 142 that generates entropy decoded (EDC) data 146 from an encoded video signal such as video signal 110 .
- the EDC data 146 can include run level data, motion vector differential data, and macroblock header data and/or other data that results from the entropy decoding of an encoded video signal.
- Multi-format video decoding device 150 includes a processing module 152 , a memory module 154 and a hardware accelerator module 156 that operate to generate a decoded video signal, such as processed video signal 112 , from the EDC data 146 .
- the entropy decoding device 140 and the multi-format video decoding device 150 operate contemporaneously in a pipelined process where the multi-format video decoding device 150 generates a first portion of the decoded video signal during at least a portion of time that the entropy decoding device 140 generates EDC data 146 from a second portion of the encoded video signal.
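By way of illustration only, the pipelined operation described above can be sketched as two concurrent stages joined by a small queue. The function names `entropy_decode`, `multi_format_decode` and `run_pipeline` are hypothetical placeholders, and the use of threads and a queue is an assumption of this sketch rather than a statement of how devices 140 and 150 are coupled.

```python
import queue
import threading


def entropy_decode(portion):
    # Placeholder for entropy decoding: tags the portion as EDC data.
    return ("EDC", portion)


def multi_format_decode(edc):
    # Placeholder for the multi-format decoding stage.
    return ("decoded", edc[1])


def run_pipeline(encoded_portions):
    """Run both stages concurrently; stage two decodes portion k
    while stage one may already be entropy decoding portion k+1."""
    edc_queue = queue.Queue(maxsize=2)  # small buffer between the stages
    decoded = []

    def stage_one():
        for portion in encoded_portions:
            edc_queue.put(entropy_decode(portion))
        edc_queue.put(None)  # sentinel: no more EDC data

    def stage_two():
        while True:
            edc = edc_queue.get()
            if edc is None:
                break
            decoded.append(multi_format_decode(edc))

    producer = threading.Thread(target=stage_one)
    consumer = threading.Thread(target=stage_two)
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return decoded
```

Because there is a single producer and a single consumer, the decoded portions come out in the order in which the encoded portions were supplied.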
- the processing modules 142 and 152 can each be implemented using a single processing device or a plurality of processing devices.
- a processing device may be a microprocessor, co-processors, a micro-controller, digital signal processor, microcomputer, central processing unit, field programmable gate array, programmable logic device, state machine, logic circuitry, analog circuitry, digital circuitry, and/or any device that manipulates signals (analog and/or digital) based on operational instructions that are stored in a memory, such as memory modules 144 and 154 .
- These memories may each be a single memory device or a plurality of memory devices.
- Such a memory device can include a hard disk drive or other disk drive, read-only memory, random access memory, volatile memory, non-volatile memory, static memory, dynamic memory, flash memory, cache memory, and/or any device that stores digital information.
- the processing modules 142 and 152 implement one or more of their functions via a state machine, analog circuitry, digital circuitry, and/or logic circuitry
- the memory storing the corresponding operational instructions may be embedded within, or external to, the circuitry comprising the state machine, analog circuitry, digital circuitry, and/or logic circuitry.
- the processing modules 142 and 152 each includes a processor produced by ARC International to implement the neighbor management module 218 , however other processor configurations could likewise be employed.
- FIG. 6 presents a block diagram representation of a multi-format video decoder 150 in accordance with an embodiment of the present invention.
- the memory module 154 includes a software library 160 that stores format configuration data corresponding to a plurality of video coding formats such as H.264, MPEG-4 Part 10 Advanced Video Coding (AVC) including the SVC and MVC extensions, MPEG2, MPEG4, Quicktime format, Real Media format, Windows Media Video (WMV) or Audio Video Interleave (AVI), video coding one (VC-1), VP8, or other video coding/compression format, etc.
- the hardware accelerator module 156 includes a plurality of vector processor units (VPU 1, VPU 2, . . . VPU N) that operate in conjunction with processing module 152 to generate a decoded video signal from the EDC data 146.
- the plurality of vector processing units and the processing module 152 are configured, based on the configuration data, to a selected one of the plurality of video coding formats. In this fashion, a single video decoder 102 can be configured for operation on the particular video coding format or formats of the video signal 110.
- the multi-format video decoder 150 can receive selection data from a user or designer that indicates the particular video coding format.
- EDC data 146 can be analyzed by processing module 152 to identify the video coding format of the video signal 110 .
- the multi-format video decoder 150 responds to the selection by retrieving the configuration data from the software library 160 and by configuring the processing module 152 and the vector processing units to decode the selected video coding format.
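The selection and configuration step described above might be sketched as a lookup into a table of per-format configuration data. Everything named here (`SOFTWARE_LIBRARY`, `MultiFormatDecoder`, the dictionary keys and the program names) is hypothetical and serves only to illustrate the flow from selection to configuration.

```python
# Hypothetical software library: format name -> configuration data.
# All keys and values are illustrative and not taken from the source.
SOFTWARE_LIBRARY = {
    "H.264": {
        "processing_module": {"program": "h264_main"},
        "vpu": {"program": "h264_vpu"},
    },
    "VP8": {
        "processing_module": {"program": "vp8_main"},
        "vpu": {"program": "vp8_vpu"},
    },
}


class MultiFormatDecoder:
    def __init__(self, library, num_vpus):
        self.library = library
        self.num_vpus = num_vpus
        self.selected_format = None
        self.processing_module_config = None
        self.vpu_configs = []

    def select_format(self, fmt):
        """Retrieve the configuration data for fmt and apply it to the
        processing module and to every vector processing unit."""
        if fmt not in self.library:
            raise ValueError(f"unsupported video coding format: {fmt}")
        config = self.library[fmt]
        self.processing_module_config = dict(config["processing_module"])
        self.vpu_configs = [dict(config["vpu"]) for _ in range(self.num_vpus)]
        self.selected_format = fmt


decoder = MultiFormatDecoder(SOFTWARE_LIBRARY, num_vpus=3)
decoder.select_format("VP8")
```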
- Configuration data can include program instructions that are loaded and executed by the processing module 152 and the vector processing units of the hardware accelerator module 156, along with other data used in conjunction with the decoding of the EDC data 146.
- the VPU instructions include one or more instructions that configure the vector processing units of hardware accelerator module 156 to the selected coding format, in addition to instructions that perform the particular decoding operations performed by the vector processing units in accordance with the selected video coding format.
- the vector processors can include one or more matrix vector processors that perform parallel matrix operations such as matrix addition, multiplication, transposition, inversion and/or other matrix operation.
- the vector processors can further include one or more filter vector processors that perform parallel filter operations.
- These vector processing units are configured via VPU programming instructions that include vector instructions, scalar instructions and branching instructions, to operate in accordance with the selected video coding format.
- FIG. 7 presents a block diagram representation of a multi-format video decoder 150 in accordance with an embodiment of the present invention.
- multi-format video decoder 150 includes a processing module 152 and a memory module 154 as described in conjunction with FIG. 5 .
- the multi-format video decoding device 150 further includes a bus 221 , a signal interface 158 , decode motion compensation module 204 , neighbor management module 218 , deblocking filter module 222 , inverse transform module 276 , inverse quantization module 274 , and inverse intra prediction module 211 . While a particular bus architecture is shown that represents the functionality of communication between the various modules of multi-format video decoding device 150 , other architectures can be implemented in accordance with the broad scope of the present invention.
- the signal interface 158 receives EDC data 146 and optionally buffers and preprocesses the EDC data 146 for processing by the other modules of multi-format video decoding device 150 .
- the decoded video signal generated via processing by the other modules of multi-format video decoding device 150 is optionally buffered, such as via a ring buffer or other buffer structure implemented in conjunction with memory locations of memory module 154 and formatted for output as processed video signal 112 .
- the decode motion compensation module 204 , neighbor management module 218 , deblocking filter module 222 , inverse transform module 276 , inverse quantization module 274 , and inverse intra prediction module 211 are configured to operate to decode the EDC data 146 in accordance with the selected video format such as VP8, H.264 (including MVC and/or SVC), VC-1 or other compression standard.
- the decode motion compensation module 204 , neighbor management module 218 , deblocking filter module 222 , inverse transform module 276 , inverse quantization module 274 , inverse intra prediction module 211 are implemented using software stored in memory module 154 and executed via processing module 152 as well as via vector processing unit instructions executed by the plurality of vector processing units of hardware accelerator module 156 .
- the decode motion compensation module 204 , deblocking filter module 222 , and inverse intra prediction module 211 are implemented using three separate filter vector processing units, one for each module.
- the inverse transform module 276 and the inverse quantization module 274 are implemented via two separate matrix vector processing units, one for each module.
- the neighbor management module 218 is implemented via software executed by processing module 152 .
- In operation, neighbor management module 218 generates motion vector data, macroblock mode data and deblock strength data, based on the motion vector differential data and the macroblock header data.
- a data structure such as a linked list, array or one or more registers is used to associate and store neighbor data for each macroblock of a processed picture.
- the neighbor management module 218 stores the motion vector data for a group of macroblocks that neighbor a current macroblock and generates the motion vector data for the current macroblock based on both the macroblock mode data and the motion vector data for the group of macroblocks that neighbor the current macroblock.
- the neighbor management module 218 calculates a motion vector magnitude and adjusts the deblock strength data based on the motion vector magnitude.
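As a hedged illustration of the neighbor-based calculation described above, the sketch below forms a prediction from three neighboring motion vectors and adds the motion vector differential. The component-wise median predictor and the choice of left, top and top-right neighbors are assumptions borrowed from common H.264-style practice; the source does not specify the predictor, and the function names are hypothetical.

```python
import math


def median3(a, b, c):
    # Middle value of three numbers.
    return sorted((a, b, c))[1]


def reconstruct_motion_vector(mvd, neighbors):
    """mvd: (dx, dy) motion vector differential for the current macroblock.
    neighbors: three (x, y) motion vectors of neighboring macroblocks
    (assumed here to be left, top and top-right)."""
    left, top, top_right = neighbors
    pred_x = median3(left[0], top[0], top_right[0])
    pred_y = median3(left[1], top[1], top_right[1])
    return (pred_x + mvd[0], pred_y + mvd[1])


def motion_vector_magnitude(mv):
    # Euclidean length of the motion vector.
    return math.hypot(mv[0], mv[1])
```

For example, with neighbors (4, 0), (2, 6) and (8, 3), the predictor is (4, 3), so a differential of (1, -2) yields the motion vector (5, 1).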
- the decode motion compensation module 204 generates inter-prediction data based on the motion vector data when the macroblock mode data indicates an inter-prediction mode.
- the inverse intra-prediction module 211 generates intra-prediction data when the macroblock mode data indicates an intra-prediction mode.
- the inverse quantization module 274 dequantizes run level data. The dequantized run level data is inverse transformed, such as via an inverse discrete cosine transform or other inverse transform via inverse transform module 276 to generate residual data.
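A minimal sketch of the dequantization of run level data is given below, assuming that each pair holds a count of preceding zero coefficients followed by a nonzero level, and that dequantization is a uniform scaling by a single quantizer step. Both assumptions, and the names `expand_run_level` and `dequantize`, are illustrative only; actual formats apply per-coefficient scaling tables, and the inverse transform is omitted.

```python
def expand_run_level(pairs, block_size=16):
    """Expand (run, level) pairs into a flat list of coefficients.
    run = number of zero coefficients preceding the level."""
    coeffs = [0] * block_size
    pos = 0
    for run, level in pairs:
        pos += run
        if pos >= block_size:
            raise ValueError("run-level data overruns the block")
        coeffs[pos] = level
        pos += 1
    return coeffs


def dequantize(coeffs, qstep):
    # Uniform scaling by one quantizer step (a simplifying assumption).
    return [c * qstep for c in coeffs]
```

For instance, the pairs (0, 5), (2, -1), (1, 3) in a block of eight coefficients expand to [5, 0, 0, -1, 0, 3, 0, 0].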
- the inverse intra-prediction module 211 generates reconstructed picture data based on the residual data and on the inter-prediction data when the macroblock mode data indicates an inter-prediction mode and based on the residual data and on the intra-prediction data when the macroblock mode data indicates an intra-prediction mode.
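The mode-dependent reconstruction described above can be sketched as adding the residual data to whichever prediction the macroblock mode selects. The function name `reconstruct_block`, the string mode values and the clipping of samples to the 8-bit range 0 to 255 are assumptions of this sketch.

```python
def reconstruct_block(residual, inter_pred, intra_pred, mb_mode):
    """Add residual data to the prediction selected by the macroblock mode.
    All blocks are lists of rows of integer samples."""
    if mb_mode == "inter":
        prediction = inter_pred
    elif mb_mode == "intra":
        prediction = intra_pred
    else:
        raise ValueError(f"unknown macroblock mode: {mb_mode}")
    return [
        [max(0, min(255, p + r)) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]
```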
- the deblocking filter module 222 generates the decoded video signal from the reconstructed picture data, based on the deblock strength data.
- the deblocking filter module 222 operates to smooth horizontal and vertical edges of a block that may correspond to exterior boundaries of a macroblock of a frame or field of video signal 110 or edges that occur in the interior of a macroblock.
- a boundary strength that is determined based on quantization parameters, adjacent macroblock type, etcetera, can vary the amount of filtering to be performed.
- the H.264 standard defines two parameters, α and β, that are used to determine the strength of filtering on a particular edge.
- the parameter α is a boundary edge parameter applied to data that includes macroblock boundaries.
- the parameter β is an interior edge parameter applied to data that is within a macroblock interior.
- motion vector magnitude is used by neighbor management module 218 to generate deblock strength data that adjusts the values for α and β for deblocking filter module 222. For instance, when the motion vector magnitude indicates large motion vectors, e.g. magnitudes above a first magnitude threshold, a larger value of α can be selected. Further, when the motion vector magnitude indicates small motion vectors, e.g. magnitudes below the same or other threshold, a smaller value of α can be selected.
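The threshold-based adjustment described above might look like the following sketch, in which an edge filtering parameter is raised for large motion vectors and lowered for small ones. The threshold values, the step size and the name `adjust_edge_parameter` are hypothetical; the source gives no numeric values.

```python
def adjust_edge_parameter(base_value, mv_magnitude,
                          high_threshold=16.0, low_threshold=4.0, step=2):
    """Return an adjusted filtering parameter based on motion vector magnitude.
    Thresholds and step are illustrative placeholders."""
    if mv_magnitude > high_threshold:
        return base_value + step          # large motion: stronger filtering
    if mv_magnitude < low_threshold:
        return max(0, base_value - step)  # small motion: weaker filtering
    return base_value
```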
- FIG. 8 presents a block diagram representation of a decoding process in accordance with an embodiment of the present invention.
- the neighbor management module 218 receives macroblock header and motion vector differential data 230 from the EDC data 146 via buffer 300 .
- the neighbor management module 218 checks the macroblock (MB) mode from the MB header.
- In inter-prediction mode, the neighbor management module 218 calculates motion vectors and also calculates deblock strength data and passes this data along with other EDC data, such as run level data 272, to one or more frame buffers, represented in the process flow as buffers 302, 304, 308, 310 and 318 implemented via memory module 154.
- the decode motion compensation module 204 generates inter-prediction data based on the motion vectors and on reference frames retrieved from the frame buffer and stores the results in buffer 314 , such as a ring buffer.
- the inverse intra prediction module 211 generates intra-prediction data.
- the inverse quantization module 274 retrieves run level data 272 from buffer 304 and inverse quantizes the data with data from the frame buffer 302 and generates de-quantized data that is stored in buffer 306 .
- Inverse transform module 276 inverse transforms the de-quantized data based on the frame buffered data to generate residual data that is stored in buffer 312.
- the residual data is combined in inverse intra-prediction module 211 with either intra-prediction data or inter-prediction data supplied in response to the mode determination by neighbor management module 218 , to generate current reconstructed frames/fields that are buffered in the buffer 316 .
- Deblocking filter module 222 applies deblocking filtering to the reconstructed frames/fields in accordance with the deblock strength data from neighbor management module 218 to generate decoded video output in the form of filtered pictures 226 that are buffered via buffer 320 .
- the buffers 306 , 312 , 314 , 316 , 318 and 320 can each be a ring buffer implemented via memory module 154 , however other buffer configurations are likewise possible.
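For readers unfamiliar with the structure, a ring buffer of fixed capacity can be sketched as follows. The class name `RingBuffer` and its methods are hypothetical, and the sketch says nothing about how buffers 306 through 320 are actually laid out in memory module 154.

```python
class RingBuffer:
    """Fixed-capacity first-in, first-out buffer over a circular array."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._slots = [None] * capacity
        self._head = 0   # index of the oldest stored item
        self._count = 0  # number of items currently stored

    def push(self, item):
        if self._count == len(self._slots):
            raise RuntimeError("ring buffer is full")
        tail = (self._head + self._count) % len(self._slots)
        self._slots[tail] = item
        self._count += 1

    def pop(self):
        if self._count == 0:
            raise RuntimeError("ring buffer is empty")
        item = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)
        self._count -= 1
        return item

    def __len__(self):
        return self._count
```

A producing module pushes each result as it is generated and the consuming module pops the results in order; the write position wraps around to reuse freed slots.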
- FIG. 9 presents a block diagram representation of a matrix vector processing unit 190 in accordance with another embodiment of the present invention.
- matrix vector processing unit 190 includes a dedicated hardware block that performs parallel matrix operations such as matrix addition, multiplication, transposition, inversion and/or other matrix operations on an input matrix 192 to generate an output matrix 194 .
- the matrix vector processing unit 190 is configured via VPU instructions 180 that include vector instructions, scalar instructions and branching instructions. These VPU instructions 180 include configuration data and commands 170 that configure the matrix VPU 190 in accordance with the selected video coding format and command the matrix vector processing unit to perform the corresponding functions such as all or part of an inverse discrete cosine transform, inverse quantization or other matrix function of the multi-format video decoder 150 .
- the VPU instructions 180 further include vector and/or scalar data used in conjunction with vector and scalar operations of the device.
- FIG. 10 presents a block diagram representation of a filter vector processing unit 195 in accordance with another embodiment of the present invention.
- filter vector processing unit 195 includes a dedicated hardware block that performs parallel filter operations such as an n-tap one-dimensional horizontal filter, an n-tap one-dimensional vertical filter, or an n-tap two-dimensional filter.
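The one-dimensional horizontal and vertical filters mentioned above can be illustrated in software as below. This is a sequential sketch of what the hardware block would perform in parallel; the edge clamping, the integer taps and the function names are assumptions. The vertical filter is expressed here by transposing the block, filtering its rows and transposing back.

```python
def filter_row(row, taps):
    """Apply an n-tap FIR filter along one row, clamping indices at the edges."""
    half = len(taps) // 2
    out = []
    for i in range(len(row)):
        acc = 0
        for k, coeff in enumerate(taps):
            j = min(max(i + k - half, 0), len(row) - 1)  # clamp to valid range
            acc += coeff * row[j]
        out.append(acc)
    return out


def filter_horizontal(block, taps):
    # Filter every row of the block independently.
    return [filter_row(row, taps) for row in block]


def filter_vertical(block, taps):
    # Transpose, filter the former columns as rows, then transpose back.
    columns = [list(col) for col in zip(*block)]
    filtered = [filter_row(col, taps) for col in columns]
    return [list(row) for row in zip(*filtered)]
```

A two-dimensional separable filter could be obtained by applying `filter_horizontal` followed by `filter_vertical`.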
- the filter VPU 195 operates to filter input data 196, such as a block of pixels, a row of pixels, a column of pixels of a video picture or other data to generate filtered data 198.
- the filter vector processing unit 195 is configured via VPU instructions 181 that include vector instructions, scalar instructions and branching instructions. These VPU instructions 181 include configuration data and commands 172 that configure the filter VPU 195 in accordance with the selected video coding format, such as by programming the filter parameters (e.g. the number of taps, type of filter, and the particular filter coefficients), and command the filter vector processing unit to perform the corresponding functions such as all or part of the generation of inter-prediction data, intra-prediction data and/or filtered picture data of the multi-format video decoder 150.
- the VPU instructions 181 further include vector and/or scalar data used in conjunction with vector and scalar operations of the device.
- FIG. 11 presents a block diagram representation of a VPU instruction 180 or 181 in accordance with another embodiment of the present invention.
- the VPU instructions include three portions, vector instruction 182 , scalar instruction 184 , and branching instruction 186 .
- a vector processing unit such as matrix vector processing unit 190 or filter vector processing unit 195 can be configured/programmed to move blocks of data, to perform vector or scalar operations on the data, to perform conditional or unconditional branching, or to perform other logical or arithmetic operations.
- the vector instruction 182 can include commands and data to perform multiple simultaneous logical or arithmetic operations via a single instruction.
- the vector data can include data blocks of 32 bits or more and the matrix or vector filter operations include any of the operations discussed in conjunction with either matrix VPU 190 or filter VPU 195 .
- the scalar instruction 184 can include commands and data to perform single scalar logical or arithmetic operations via a single instruction.
- the scalar data can include scalar data blocks of 32 bits or less or long scalar blocks of more than 32 bits.
- Matrix or filter scalar operations include mask creation, data masking, addressing instructions, data move operations, flag calculations, etc.
- Branching instructions include conditional or unconditional branching instructions based on logical or arithmetic conditions.
- the filter VPU 195 implements a deblocking filter as part of deblocking filter module 222 .
- the filter VPU 195 executes filter VPU instructions 181 in a similar fashion to a function or subroutine call.
- the filter VPU 195 can execute a data move command to configure a particular n-tap deblocking filter, based on the selection of the particular video coding format, by loading filter coefficients and other configuration data to establish an initial filter configuration.
- the deblock strength is retrieved to optionally adjust the filter coefficients or otherwise adjust the filter configuration to a current deblock strength.
- input data 196 is retrieved, filtered and transferred to a buffer in response to filter commands.
- FIG. 12 presents a block diagram representation of a VPU 200 in accordance with an embodiment of the present invention.
- vector processor unit 200 is a further example of any of the vector processor units (VPU 1 , VPU 2 , . . . VPU N) of hardware accelerator module 156 , including matrix VPU 190 and filter VPU 195 or other vector processor or a component of any of the foregoing.
- VPU 200 includes a vector function module 202 that generates vector function data 204 based on a vector function of a first input vector V 1 and a second input vector V 2 .
- a selection module 210 selects each element of a vector output 216 as either a corresponding element of the vector function data 204 or a corresponding element of a third input vector V 3 .
- VPU 200 includes a control register 210 for storing the selection data 212 , based on selection instruction 214 that is input to configure the vector processing unit 200 .
- the vector function can include an arithmetic function, such as a binary addition, subtraction, multiplication, division or other arithmetic operator.
- the vector function can be a logical function or any other function.
- the vector function can include a selection of the vector function data 204 as either the first input vector V 1 or the second input vector V 2 .
- the third input vector V 3 can be either the first input vector V 1 or the second input vector V 2 .
- the vector output 216 can be made up of interspersed elements of the first input vector V 1 and a second input vector V 2 .
- the particular vector function of vector function module 202 can be configured via function instruction 215 .
- the vector function module 202 can include a plurality of different vector functions that are selected via corresponding values of the function instruction 215 .
- the function instruction 215 can be a scalar or vector instruction.
- the selection module 210 selects each element of the vector output 216 based on selection data 212 .
- the selection data 212 indicates a subset of elements of the vector output 216 that correspond to the vector function data 204 and further another subset of elements of the vector output 216 that correspond to the third input vector V 3 .
- the selection data 212 indicates which of the elements of the third input vector V 3 will be modified to be the vector function data 204 and which other elements of the third input vector V 3 will be left alone, unmodified.
- the selection instruction 214 can be a vector instruction, such as a binary selection vector that indicates, via a binary value for each element of the third input vector V 3 , which elements of V 3 will be modified and which will be left unmodified.
- the selection instruction 214 can be a scalar instruction, such as a scalar value that corresponds to one of a plurality of sets of selection data 212 that indicate which elements of V 3 will be modified and which will be left unmodified.
- V_1 = (a_1, a_2, a_3, ..., a_n)
- V_2 = (b_1, b_2, b_3, ..., b_n)
- V_3 = (c_1, c_2, c_3, ..., c_n)
- V_f = (d_1, d_2, d_3, ..., d_n)
- selection data 212 is a vector as follows:
- V_o = (e_1, e_2, e_3, ..., e_n)
- the values of the vector output 216 can be calculated as:
- the vector processing unit 200 allows implementation of a wide range of logical/arithmetic vector functions without branching instructions.
- conditional branching is implemented as a hardware function or selection, without the need to implement software branching, conditional statements, etc.
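- as a minimal sketch of the element-wise selection described above, the following Python computes the vector function data from V 1 and V 2 and then forms each output element from either that data or V 3 . The function name `vpu_select`, the parameter names, and the convention that a selection value of 1 picks the vector function data while 0 keeps the V 3 element are assumptions made for illustration.

```python
from operator import add
from typing import Callable, List, Sequence


def vpu_select(
    v1: Sequence[int],
    v2: Sequence[int],
    v3: Sequence[int],
    selection: Sequence[int],
    func: Callable[[int, int], int] = add,
) -> List[int]:
    """Return the vector output for one pass through the modeled unit."""
    if not (len(v1) == len(v2) == len(v3) == len(selection)):
        raise ValueError("all vectors must have the same length")
    # Vector function data: the configured function applied to each element pair.
    vf = [func(a, b) for a, b in zip(v1, v2)]
    # Per-element selection (assumed convention): function data where the
    # selection value is nonzero, the V3 element otherwise.
    return [d if s else c for d, c, s in zip(vf, v3, selection)]
```

- the conditional lives in the data rather than in control flow: every element follows the same instruction sequence, and the selection vector alone decides which V 3 elements are overwritten.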
- FIG. 13 presents a block diagram representation of a VPU 300 in accordance with an embodiment of the present invention.
- vector processor unit 300 is a further example of any of the vector processor units (VPU 1 , VPU 2 , . . . VPU N) of hardware accelerator module 156 , including matrix VPU 190 and filter VPU 195 or other vector processor or a component of any of the foregoing.
- VPU 300 includes a control register 310 for storing read/write instructions 314 of the operational instructions of a program stored in memory to be executed to generate a decoded video signal from EDC data.
- the matrix memory 304 can be a single memory device or a plurality of memory devices. Such a memory device can include a random access memory, volatile memory, non-volatile memory, static memory, dynamic memory, flash memory, cache memory, and/or any device that stores digital information.
- the read/write instructions 314 include vector read instructions that include vector read orientation data and vector write instructions that include vector write orientation data that are input to configure the vector processor unit 300 based on command data 312 .
- VPU 300 includes a matrix memory 304 that stores matrix data corresponding to a plurality of rows and columns and that generates vector read data 306 in a first read orientation when the vector read orientation data has a first value and that generates vector read data 306 in a second read orientation when the vector read orientation data has a second value.
- the matrix memory stores vector write data 302 in a first write orientation when the vector write orientation data has a third value and that stores vector write data 302 in a second write orientation when the vector write orientation data has a fourth value.
- read/write instructions 314 can be formatted in accordance with the following table:
| Read/write indicator | Orientation indicator | Command data |
|---|---|---|
| Read | Row | Read in row orientation |
| Read | Column | Read in column orientation |
| Write | Row | Write in row orientation |
| Write | Column | Write in column orientation |

- It should be noted that the values of the read/write indicator, the orientation indicator and the command data 312 can be represented by different digital values.
- read/write instructions 314 are loaded in control register 310 .
- vector read data 306 is read from the matrix memory 304 in either column or row orientation, or vector write data 302 is written to the matrix memory 304 in either column or row orientation.
- vector data can be stored and retrieved in a traditional fashion.
- vector data can be automatically transposed, without the need for further data manipulation and further instructions.
- vector processor unit 300 is shown as a separate unit, in other embodiments the components of vector processor unit 300 can be included as components of another vector processor unit such as a matrix multiplication unit or a unit that performs other matrix mathematical functions that employ matrix transpositions as part of the input/output manipulation of matrix data.
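- as a minimal sketch of an orientation-aware matrix memory, the following Python models vector reads and writes that carry a row or column orientation, together with a dispatcher for the read/write and orientation indicators. The class name `MatrixMemory`, the string encodings of the indicators, the square matrix shape and the explicit `index` argument (which row or column is addressed) are assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple


class MatrixMemory:
    """Square matrix store whose vector accesses carry an orientation."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.cells: List[List[int]] = [[0] * size for _ in range(size)]

    def write(self, orientation: str, index: int, vector: Sequence[int]) -> None:
        if len(vector) != self.size:
            raise ValueError("vector length must match the matrix size")
        if orientation == "row":
            self.cells[index] = list(vector)
        elif orientation == "column":
            for k, value in enumerate(vector):
                self.cells[k][index] = value
        else:
            raise ValueError(f"unknown orientation: {orientation}")

    def read(self, orientation: str, index: int) -> List[int]:
        if orientation == "row":
            return list(self.cells[index])
        if orientation == "column":
            return [row[index] for row in self.cells]
        raise ValueError(f"unknown orientation: {orientation}")

    def execute(
        self,
        instruction: Tuple[str, str],
        index: int,
        vector: Optional[Sequence[int]] = None,
    ) -> Optional[List[int]]:
        """Dispatch one (read/write indicator, orientation indicator) pair."""
        rw, orientation = instruction
        if rw == "read":
            return self.read(orientation, index)
        if rw == "write":
            if vector is None:
                raise ValueError("a write instruction needs vector write data")
            self.write(orientation, index, vector)
            return None
        raise ValueError(f"unknown read/write indicator: {rw}")
```

- because the orientation travels with each instruction, a caller can mix row and column accesses on the same stored matrix without a separate transpose step.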
- Examples of read and write operations for the matrix memory 304 are shown in conjunction with FIGS. 14-17 that follow.
- FIG. 14 presents a graphical representation of a write operation in accordance with an embodiment of the present invention.
- an example write operation of matrix memory 304 is shown.
- vector write data 302 is represented by (x 1 , x 2 , . . . ) and is stored in row orientation as the kth row as (x k1 , x k2 , . . . x ki , . . . ).
- by repeating this operation for a plurality of vector write data 302 and writing different rows, an entire matrix can be stored.
- FIG. 15 presents a graphical representation of a write operation in accordance with an embodiment of the present invention.
- vector write data 302 is also represented by (x 1 , x 2 , . . . ) but is stored in column orientation as the ith column as (x 1i , x 2i , . . . x ki , . . . ).
- by repeating this operation for a plurality of vector write data 302 and writing different columns, an entire matrix can be stored.
- FIG. 16 presents a graphical representation of a read operation in accordance with an embodiment of the present invention.
- vector read data 306 is represented by (x 1 , x 2 , . . . ) and is retrieved in row orientation as the kth row as (x k1 , x k2 , . . . x ki , . . . ).
- by repeating this operation for a plurality of vector read data 306 and reading different rows, an entire matrix can be read.
- FIG. 17 presents a graphical representation of a read operation in accordance with an embodiment of the present invention.
- vector read data 306 is also represented by (x 1 , x 2 , . . . ) and is retrieved in column orientation as the ith column as (x 1i , x 2i , . . . x ki , . . . ).
- by repeating this operation for a plurality of vector read data 306 and reading different columns, an entire matrix can be read.
- vector data can be stored and retrieved in a traditional fashion.
- vector data can be automatically transposed, without the need for further data manipulation and further instructions. For example, writing data as shown in FIG. 14 and reading the data as shown in FIG. 16 yields no transposition. However, writing data as shown in FIG. 14 and reading the data as shown in FIG. 17 yields a transposition of the matrix data.
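- the two cases above can be checked with a short worked example. The following Python is a sketch only: the helper names `store_rows`, `read_row` and `read_column` and the 3x3 sample values are assumptions made for illustration.

```python
from typing import List, Sequence


def store_rows(vectors: Sequence[Sequence[int]]) -> List[List[int]]:
    """Store each vector as one row (row-orientation writes)."""
    return [list(v) for v in vectors]


def read_row(matrix: List[List[int]], k: int) -> List[int]:
    """Retrieve the kth row (row-orientation read)."""
    return list(matrix[k])


def read_column(matrix: List[List[int]], i: int) -> List[int]:
    """Retrieve the ith column (column-orientation read)."""
    return [row[i] for row in matrix]


vectors = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
matrix = store_rows(vectors)

# Row writes followed by row reads return the data unchanged.
unchanged = [read_row(matrix, k) for k in range(3)]

# Row writes followed by column reads return the transposed data.
transposed = [read_column(matrix, i) for i in range(3)]
```

- here `unchanged` equals the vectors that were written, and `transposed` holds the columns of the stored matrix as its rows.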
- FIG. 18 presents a block diagram representation of a VPU 325 in accordance with an embodiment of the present invention.
- vector processor unit 325 is a further example of any of the vector processor units (VPU 1 , VPU 2 , . . . VPU N) of hardware accelerator module 156 , including matrix VPU 190 and filter VPU 195 or other vector processor or a component of any of the foregoing.
- VPU 325 includes a matrix multiplier 320 that generates output data 326 based on a matrix multiplication of input data 322 and input data 324 .
- VPU 325 includes a control register 330 for storing a matrix instruction 334 that includes matrix input configuration data 332 that configures the matrix multiplier 320 .
- the matrix input configuration data 332 indicates the dimensionality of the input data 322 and the input data 324 , and by inference the dimensionality of the output data 326 .
- different values of the matrix input configuration data can correspond to input data formatted as a 1×8 matrix, an 8×8 matrix, a 4×4 matrix or other dimensions.
- matrix instructions 334 can be formatted in accordance with the following table:
- matrix instructions 334 are loaded in control register 330 .
- the matrix multiplier 320 multiplies the input data 322 by the input data 324 to generate the output data 326 .
- the matrix multiplier includes a plurality of multipliers and adders that are configured, based on the matrix input configuration data 332 to perform the mathematical functions associated with matrix multiplication.
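- as a minimal sketch of a multiplier configured by dimensionality, the following Python maps a configuration value to the expected input shapes and performs the multiply-accumulate steps of a matrix product. The numeric configuration values, the pairing of first and second input shapes for each value, and the names `INPUT_SHAPES` and `matrix_multiply` are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Matrix = List[List[int]]
Shape = Tuple[int, int]

# Hypothetical encoding: configuration value -> (first input shape, second input shape).
INPUT_SHAPES: Dict[int, Tuple[Shape, Shape]] = {
    0: ((1, 8), (8, 8)),
    1: ((8, 8), (8, 8)),
    2: ((4, 4), (4, 4)),
}


def shape_of(m: Matrix) -> Shape:
    return (len(m), len(m[0]))


def matrix_multiply(config: int, a: Matrix, b: Matrix) -> Matrix:
    """Multiply a by b after checking both against the configured shapes."""
    shape_a, shape_b = INPUT_SHAPES[config]
    if shape_of(a) != shape_a or shape_of(b) != shape_b:
        raise ValueError("input data does not match the configured dimensionality")
    rows, inner = shape_a
    cols = shape_b[1]
    # Each output element is a sum of products (one multiplier per term, then adders).
    return [
        [sum(a[r][k] * b[k][c] for k in range(inner)) for c in range(cols)]
        for r in range(rows)
    ]
```

- the output dimensionality is never passed in: it follows from the configured input shapes, which mirrors the inference described for the output data 326 .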
- FIG. 19 presents a block diagram representation of a video distribution system 375 in accordance with an embodiment of the present invention.
- video signal 110 is transmitted from a video encoder via a transmission path 122 to a video decoder 102 .
- the video decoder 102 operates to decode the video signal 110 for display on display device 12 or 14 or another display device.
- video decoder 102 can be implemented in a set-top box, digital video recorder, router or home gateway.
- decoder 102 can optionally be incorporated directly in the display device 12 or 14 .
- the transmission path 122 can include a wireless path that operates in accordance with a wireless local area network protocol such as an 802.11 protocol, a WIMAX protocol, a Bluetooth protocol, etc. Further, the transmission path can include a wired path that operates in accordance with a wired protocol such as a Universal Serial Bus protocol, an Ethernet protocol or other high speed protocol.
- FIG. 20 presents a block diagram representation of a video storage system 179 in accordance with an embodiment of the present invention.
- device 11 is a set top box with built-in digital video recorder functionality, a stand alone digital video recorder, a DVD recorder/player or other device that stores the video signal 110 .
- device 11 can include video decoder 102 that operates to decode the video signal 110 when retrieved from storage to generate a processed video signal 112 in a format that is suitable for display by video display device 12 .
- video storage system 179 can include a hard drive, flash memory device, computer, DVD burner, or any other device that is capable of generating, storing, decoding, transcoding and/or displaying the video content of video signal 110 in accordance with the methods and systems described in conjunction with the features and functions of the present invention as described herein.
- FIG. 21 presents a flow diagram representation of a method in accordance with an embodiment of the present invention. In particular, a method is presented for use in conjunction with one or more functions and features described in conjunction with FIGS. 1-20 .
- entropy decoded (EDC) data is generated from an encoded video signal.
- a decoded video signal is generated from the EDC data, via a plurality of vector processor units, in response to a plurality of operational instructions including at least one vector read instruction that includes vector read orientation data, wherein at least one of the plurality of vector processor units operates by: storing matrix data in a matrix memory corresponding to a plurality of rows and columns; generating vector read data in a first read orientation when the vector read orientation data has a first value; and generating vector read data in a second read orientation when the vector read orientation data has a second value.
- the vector read data in the first read orientation corresponds to matrix data in one of the plurality of rows of the matrix memory.
- the vector read data in the second read orientation corresponds to matrix data in one of the plurality of columns of the matrix memory.
- the plurality of operational instructions can further include at least one vector write instruction that includes vector write orientation data, and wherein the at least one of the plurality of vector processor units further operates by: storing vector write data in a first write orientation when the vector write orientation data has a third value; and storing vector write data in a second write orientation when the vector write orientation data has a fourth value.
- the vector write data in the first write orientation corresponds to matrix data in one of the plurality of rows of the matrix memory.
- the vector write data in the second write orientation corresponds to matrix data in one of the plurality of columns of the matrix memory.
- FIG. 22 presents a flow diagram representation of a method in accordance with an embodiment of the present invention. In particular, a method is presented for use in conjunction with one or more functions and features described in conjunction with FIGS. 1-21 .
- entropy decoded (EDC) data is generated from an encoded video signal.
- a decoded video signal is generated from the EDC data, via a plurality of vector processor units, in response to a plurality of operational instructions including at least one matrix multiply instruction that includes matrix input configuration data, wherein at least one of the plurality of vector processor units operates by: generating output data based on a multiplication of first input data and second input data in accordance with the matrix input configuration data, wherein the matrix input configuration data indicates the dimensionality of the first input data and the second input data.
- the first input data is formatted as a 1×8 matrix when the matrix input configuration data has a first value and is formatted as an 8×8 matrix when the matrix input configuration data has a second value.
- the second input data is formatted as a 1×8 matrix when the matrix input configuration data has a first value and is formatted as an 8×8 matrix when the matrix input configuration data has a second value.
- the first input data and the second input data can both be formatted as a 4×4 matrix when the matrix input configuration data has another value.
- the term “substantially” or “approximately”, as may be used herein, provides an industry-accepted tolerance to its corresponding term and/or relativity between items. Such an industry-accepted tolerance ranges from less than one percent to twenty percent and corresponds to, but is not limited to, component values, integrated circuit process variations, temperature variations, rise and fall times, and/or thermal noise. Such relativity between items ranges from a difference of a few percent to magnitude differences.
- the term “coupled”, as may be used herein, includes direct coupling and indirect coupling via another component, element, circuit, or module where, for indirect coupling, the intervening component, element, circuit, or module does not modify the information of a signal but may adjust its current level, voltage level, and/or power level.
- inferred coupling (i.e., where one element is coupled to another element by inference) includes direct and indirect coupling between two elements in the same manner as “coupled”.
- the term “compares favorably”, as may be used herein, indicates that a comparison between two or more elements, items, signals, etc., provides a desired relationship. For example, when the desired relationship is that signal 1 has a greater magnitude than signal 2 , a favorable comparison may be achieved when the magnitude of signal 1 is greater than that of signal 2 or when the magnitude of signal 2 is less than that of signal 1 .
- a module includes a functional block that is implemented in hardware, software, and/or firmware that performs one or more module functions such as the processing of an input signal to produce an output signal.
- a module may contain submodules that themselves are modules.
Abstract
A multi-format video decoder includes an entropy decoding device that generates entropy decoded (EDC) data from an encoded video signal. A multi-format video decoding device includes a memory module that stores a plurality of operational instructions including at least one vector read instruction that includes vector read orientation data. A plurality of vector processor units generate a decoded video signal from the EDC data, wherein at least one of the plurality of vector processor units includes a matrix memory that stores matrix data corresponding to a plurality of rows and columns and that generates vector read data in a first read orientation when the vector read orientation data has a first value and that generates vector read data in a second read orientation when the vector read orientation data has a second value.
Description
- The present application claims priority under 35 USC 119(e) to the provisionally filed application entitled, “VIDEO DECODER WITH VECTOR PROCESSING UNIT AND METHODS FOR USE THEREWITH,” having Ser. No. 61/494,614, filed on Jun. 8, 2011, the contents of which are incorporated herein by reference thereto.
- The present invention relates to coding used in devices such as video encoders/decoders for stereoscopic television signals.
- Video encoding has become an important issue for modern video processing devices. Robust encoding algorithms allow video signals to be transmitted with reduced bandwidth and stored in less memory. However, the accuracy of these encoding methods faces the scrutiny of users who are becoming accustomed to greater resolution and higher picture quality. Standards have been promulgated for many encoding methods including the H.264 standard that is also referred to as MPEG-4, part 10 or Advanced Video Coding (AVC), and the VP8 standard set forth by On2 Technologies, Inc. While these standards set forth many powerful techniques, further improvements are possible to improve the performance and speed of implementation of such methods. The video signal encoded by these encoding methods must be similarly decoded for playback on most video display devices.
- The Motion Picture Expert Group (MPEG) has presented a Scalable Video Coding (SVC) Annex G extension to H.264/MPEG-4 AVC for standardization. SVC provides for encoding of video bitstreams that include subset bitstreams that can represent lower spatial resolution, lower temporal resolution or otherwise lower quality video. A subset bitstream can be derived by dropping packets from the total bitstream. SVC streams allow end devices to flexibly scale the temporal resolution, spatial resolution or video fidelity, for example, to match the capabilities of a particular device.
- Efficient and fast encoding and decoding of video signals is important to the implementation of many video devices, particularly video devices that are destined for home use. Further limitations and disadvantages of conventional and traditional approaches will become apparent to one of ordinary skill in the art through comparison of such systems with the present invention.
- FIGS. 1-3 present pictorial diagram representations of various video devices in accordance with embodiments of the present invention.
- FIG. 4 presents a block diagram representation of a video system in accordance with an embodiment of the present invention.
- FIG. 5 presents a block diagram representation of a video decoder 102 in accordance with an embodiment of the present invention.
- FIG. 6 presents a block diagram representation of a multi-format video decoder 150 in accordance with an embodiment of the present invention.
- FIG. 7 presents a block diagram representation of a multi-format video decoder 150 in accordance with an embodiment of the present invention.
- FIG. 8 presents a block diagram representation of a decoding process in accordance with an embodiment of the present invention.
- FIG. 9 presents a block diagram representation of a matrix vector processing unit 190 in accordance with another embodiment of the present invention.
- FIG. 10 presents a block diagram representation of a filter vector processing unit in accordance with another embodiment of the present invention.
- FIG. 11 presents a block diagram representation of a VPU instruction 180 in accordance with another embodiment of the present invention.
- FIG. 12 presents a block diagram representation of a VPU 200 in accordance with an embodiment of the present invention.
- FIG. 13 presents a block diagram representation of a VPU 300 in accordance with an embodiment of the present invention.
- FIG. 14 presents a graphical representation of a write operation in accordance with an embodiment of the present invention.
- FIG. 15 presents a graphical representation of a write operation in accordance with an embodiment of the present invention.
- FIG. 16 presents a graphical representation of a read operation in accordance with an embodiment of the present invention.
- FIG. 17 presents a graphical representation of a read operation in accordance with an embodiment of the present invention.
- FIG. 18 presents a block diagram representation of a VPU 325 in accordance with an embodiment of the present invention.
- FIG. 19 presents a block diagram representation of a video distribution system 375 in accordance with an embodiment of the present invention.
- FIG. 20 presents a block diagram representation of a video storage system 179 in accordance with an embodiment of the present invention.
- FIG. 21 presents a flow diagram representation of a method in accordance with an embodiment of the present invention.
- FIG. 22 presents a flow diagram representation of a method in accordance with an embodiment of the present invention.
- FIGS. 1-3 present pictorial diagram representations of various video devices in accordance with embodiments of the present invention. In particular, set top box 10 with built-in digital video recorder functionality or a stand alone digital video recorder, television or monitor 20 and portable computer 30 illustrate electronic devices that incorporate a video decoder in accordance with one or more features or functions of the present invention. While these particular devices are illustrated, the present invention can be implemented in any device that is capable of decoding and/or transcoding video content in accordance with the methods and systems described in conjunction with FIGS. 4-15 and the appended claims.
- FIG. 4 presents a block diagram representation of a video decoder 102 in accordance with an embodiment of the present invention. In particular, this video device includes a receiving module 100, such as a server, cable head end, television receiver, cable television receiver, satellite broadcast receiver, broadband modem, 3G transceiver or other information receiver or transceiver that is capable of receiving a received signal 98 and generating a video signal 110 that has been encoded via a video encoding format. Video processing device 125 includes video decoder 102 and is coupled to the receiving module 100 to decode or transcode the video signal for storage, editing, and/or playback in a format corresponding to video display device 104. Video processing device 125 can include set top box 10 with built-in digital video recorder functionality or a stand alone digital video recorder. While shown as separate from video display device 104, video processing device 125, including video decoder 102, can be incorporated in television or monitor 20 and portable computer 30 or other device that includes a video decoder, such as video decoder 102.
- In an embodiment of the present invention, the received signal 98 is a broadcast video signal, such as a television signal, high definition television signal, enhanced definition television signal or other broadcast video signal that has been transmitted over a wireless medium, either directly or through one or more satellites or other relay stations or through a cable network, optical network or other transmission network. In addition, received signal 98 can be generated from a stored video file, played back from a recording medium such as a magnetic tape, magnetic disk or optical disk, and can include a streaming video signal that is transmitted over a public or private network such as a local area network, wide area network, metropolitan area network or the Internet.
- Video signal 110 can include a digital video signal complying with a digital video codec standard such as H.264, MPEG-4 Part 10 Advanced Video Coding (AVC) including an SVC signal, an encoded stereoscopic video signal having a base layer that includes a 2D compatible base layer and an enhancement layer generated by processing in accordance with an MVC extension of MPEG-4 AVC, or another digital format such as a Motion Picture Experts Group (MPEG) format (such as MPEG1, MPEG2 or MPEG4), Quicktime format, Real Media format, Windows Media Video (WMV) or Audio Video Interleave (AVI), video coding one (VC-1), VP8, etc.
- Video display devices 104 can include a television, monitor, computer, handheld device or other video display device that creates an optical image stream either directly or indirectly, such as by projection, based on the processed video signal 112 either as a streaming video signal or by playback of a stored digital video file.
- FIG. 5 presents a block diagram representation of a video decoder 102 in accordance with an embodiment of the present invention. Video decoder 102 includes an entropy decoding device 140 having a processing module 142 that generates entropy decoded (EDC) data 146 from an encoded video signal such as video signal 110. The EDC data 146 can include run level data, motion vector differential data, and macroblock header data and/or other data that results from the entropy decoding of an encoded video signal. Multi-format video decoding device 150 includes a processing module 152, a memory module 154 and a hardware accelerator module 156 that operate to generate a decoded video signal, such as processed video signal 112, from the EDC data 146.
- In an embodiment of the present invention, the entropy decoding device 140 and the multi-format video decoding device 150 operate contemporaneously in a pipelined process where the multi-format video decoding device 150 generates a first portion of the decoded video signal during at least a portion of time that the entropy decoding device 140 generates EDC data 146 from a second portion of the encoded video signal.
- The processing modules 142 and 152 can each be implemented using a single processing device or a plurality of processing devices. Such a processing device may be a microprocessor, co-processors, a micro-controller, digital signal processor, microcomputer, central processing unit, field programmable gate array, programmable logic device, state machine, logic circuitry, analog circuitry, digital circuitry, and/or any device that manipulates signals (analog and/or digital) based on operational instructions that are stored in a memory, such as memory modules 144 and 154. These memories may each be a single memory device or a plurality of memory devices. Such a memory device can include a hard disk drive or other disk drive, read-only memory, random access memory, volatile memory, non-volatile memory, static memory, dynamic memory, flash memory, cache memory, and/or any device that stores digital information. Note that when the processing modules 142 and 152 implement one or more of their functions via a state machine, analog circuitry, digital circuitry, and/or logic circuitry, the memory storing the corresponding operational instructions may be embedded within, or external to, the circuitry comprising the state machine, analog circuitry, digital circuitry, and/or logic circuitry. In an embodiment of the present invention the processing modules 142 and 152 each includes a processor produced by ARC International to implement the neighbor management module 218, however other processor configurations could likewise be employed.
- FIG. 6 presents a block diagram representation of a multi-format video decoder 150 in accordance with an embodiment of the present invention. The memory module 154 includes a software library 160 that stores format configuration data corresponding to a plurality of video coding formats such as H.264, MPEG-4 Part 10 Advanced Video Coding (AVC) including the SVC and MVC extensions, MPEG2, MPEG4, Quicktime format, Real Media format, Windows Media Video (WMV) or Audio Video Interleave (AVI), video coding one (VC-1), VP8, or other video coding/compression format, etc. The hardware accelerator module 156 includes a plurality of vector processor units (VPU 1, VPU 2, . . . VPU N) that operate in conjunction with processing module 152 to generate a decoded video signal from the EDC data 146. The plurality of vector processing units and the processing module 152 are configured, based on the configuration data, to a selected one of the plurality of video coding formats. In this fashion, a single video decoder 102 can be configured for operation of the particular video coding format or formats of the video signal 110.
- In an embodiment of the present invention, the multi-format video decoder 150 can receive selection data from a user or designer that indicates the particular video coding format. In another embodiment of the present invention, EDC data 146 can be analyzed by processing module 152 to identify the video coding format of the video signal 110. In either case, the multi-format video decoder 150 responds to the selection by retrieving the configuration data from the software library 160 and by configuring the processing module 152 and the vector processing units to decode the selected video coding format.
- Configuration data can include loading program instructions executed by the processing module 152 and the vector processing units of the hardware accelerator module 156 along with other data used in conjunction with the decoding of the EDC data 146. For example, when a particular video coding format is selected, software for processing module 152 and VPU instructions for the hardware accelerator module 156 are selected to be executed. In one mode of operation, the VPU instructions include one or more instructions that configure the vector processing units of hardware accelerator module 156 to the selected coding format, in addition to instructions that perform the particular decoding operations performed by the vector processing units in accordance with the selected video coding format.
- As will be discussed further in conjunction with FIGS. 7-12, the vector processors can include one or more matrix vector processors that perform parallel matrix operations such as matrix addition, multiplication, transposition, inversion and/or other matrix operation. The vector processors can further include one or more filter vector processors that perform parallel filter operations. These vector processing units are configured via VPU programming instructions that include vector instructions, scalar instructions and branching instructions, to operate in accordance with the selected video coding format.
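- as a minimal sketch of the format selection flow described above, the following Python looks up configuration data in a software library keyed by coding format and returns the program and VPU instructions to load. The class name `FormatConfig`, the program names, the instruction names and the two library entries are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class FormatConfig:
    """Configuration data for one coding format (all values hypothetical)."""
    processor_program: str            # software for the processing module
    vpu_instructions: Tuple[str, ...]  # instructions for the vector processing units


# Hypothetical software library: coding format -> configuration data.
SOFTWARE_LIBRARY: Dict[str, FormatConfig] = {
    "H.264": FormatConfig(
        "h264_decode",
        ("configure_h264", "inverse_quantize", "inverse_transform", "deblock"),
    ),
    "VP8": FormatConfig(
        "vp8_decode",
        ("configure_vp8", "inverse_quantize", "inverse_transform", "deblock"),
    ),
}


def configure_decoder(selected_format: str) -> FormatConfig:
    """Retrieve the configuration data for the selected coding format."""
    try:
        return SOFTWARE_LIBRARY[selected_format]
    except KeyError:
        raise ValueError(f"unsupported video coding format: {selected_format}") from None
```

- in this sketch the first VPU instruction of each entry stands for the configuring instruction, and the remaining entries stand for the decoding operations carried out in the selected format.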
FIG. 7 presents a block diagram representation of a multi-format video decoder 150 in accordance with an embodiment of the present invention. In particular, multi-format video decoder 150 includes a processing module 152 and a memory module 154 as described in conjunction with FIG. 5. In addition, the multi-format video decoding device 150 further includes a bus 221, a signal interface 158, decode motion compensation module 204, neighbor management module 218, deblocking filter module 222, inverse transform module 276, inverse quantization module 274, and inverse intra prediction module 211. While a particular bus architecture is shown that represents the functionality of communication between the various modules of multi-format video decoding device 150, other architectures can be implemented in accordance with the broad scope of the present invention. - In operation, the
signal interface 158 receives EDC data 146 and optionally buffers and preprocesses the EDC data 146 for processing by the other modules of multi-format video decoding device 150. Similarly, the decoded video signal generated via processing by the other modules of multi-format video decoding device 150 is optionally buffered, such as via a ring buffer or other buffer structure implemented in conjunction with memory locations of memory module 154, and formatted for output as processed video signal 112. - The decode
motion compensation module 204, neighbor management module 218, deblocking filter module 222, inverse transform module 276, inverse quantization module 274, and inverse intra prediction module 211 are configured to operate to decode the EDC data 146 in accordance with the selected video format such as VP8, H.264 (including MVC and/or SVC), VC-1 or other compression standard. In an embodiment of the present invention, the decode motion compensation module 204, neighbor management module 218, deblocking filter module 222, inverse transform module 276, inverse quantization module 274, and inverse intra prediction module 211 are implemented using software stored in memory module 154 and executed via processing module 152 as well as via vector processing unit instructions executed by the plurality of vector processing units of hardware accelerator module 156. In a particular embodiment, the decode motion compensation module 204, deblocking filter module 222, and inverse intra prediction module 211 are implemented using three separate filter vector processing units, one for each module. In addition, the inverse transform module 276 and the inverse quantization module 274 are implemented via two separate matrix vector processing units, one for each module. In an embodiment of the present invention, the neighbor management module 218 is implemented via software executed by processing module 152. - In operation,
neighbor management module 218 generates motion vector data, macroblock mode data and deblock strength data, based on the motion vector differential data and the macroblock header data. In an embodiment of the present invention, a data structure, such as a linked list, array or one or more registers, is used to associate and store neighbor data for each macroblock of a processed picture. In particular, the neighbor management module 218 stores the motion vector data for a group of macroblocks that neighbor a current macroblock and generates the motion vector data for the current macroblock based on both the macroblock mode data and the motion vector data for the group of macroblocks that neighbor the current macroblock. In addition, the neighbor management module 218 calculates a motion vector magnitude and adjusts the deblock strength data based on the motion vector magnitude. - The decode
motion compensation module 204 generates inter-prediction data based on the motion vector data when the macroblock mode data indicates an inter-prediction mode. The inverse intra-prediction module 211 generates intra-prediction data when the macroblock mode data indicates an intra-prediction mode. The inverse quantization module 274 dequantizes run level data. The dequantized run level data is inverse transformed, such as via an inverse discrete cosine transform or other inverse transform via inverse transform module 276, to generate residual data. The inverse intra-prediction module 211 generates reconstructed picture data based on the residual data and on the inter-prediction data when the macroblock mode data indicates an inter-prediction mode and based on the residual data and on the intra-prediction data when the macroblock mode data indicates an intra-prediction mode. - The
deblocking filter module 222 generates the decoded video signal from the reconstructed picture data, based on the deblock strength data. In operation, the deblocking filter module 222 operates to smooth horizontal and vertical edges of a block that may correspond to exterior boundaries of a macroblock of a frame or field of video signal 110 or edges that occur in the interior of a macroblock. A boundary strength, determined based on quantization parameters, adjacent macroblock type, etcetera, can vary the amount of filtering to be performed. For example, the H.264 standard defines two parameters, α and β, that are used to determine the strength of filtering on a particular edge. The parameter α is a boundary edge parameter applied to data that includes macroblock boundaries. The parameter β is an interior edge parameter applied to data that is within a macroblock interior. In accordance with the present invention, motion vector magnitude is used by neighbor management module 218 to generate deblock strength data that adjusts the values for α and β for deblocking filter module 222. For instance, when the motion vector magnitude indicates large motion vectors, e.g. magnitudes above a first magnitude threshold, a larger value of α can be selected. Further, when the motion vector magnitude indicates small motion vectors, e.g. magnitudes below the same or other threshold, a smaller value of α can be selected. -
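The threshold-based adjustment of α described above can be illustrated with a short Python sketch. It is only a model under stated assumptions: the function name `adjust_alpha`, the Euclidean definition of motion vector magnitude, the two threshold values and the step size are all hypothetical and are not specified in this description.

```python
import math


def adjust_alpha(base_alpha: int, motion_vector: tuple,
                 high_threshold: float = 16.0,
                 low_threshold: float = 2.0,
                 step: int = 1) -> int:
    """Return an adjusted alpha for one edge.

    All numeric defaults and the Euclidean magnitude are assumptions
    made for illustration only.
    """
    magnitude = math.hypot(motion_vector[0], motion_vector[1])
    if magnitude > high_threshold:
        return base_alpha + step           # large motion: larger alpha
    if magnitude < low_threshold:
        return max(0, base_alpha - step)   # small motion: smaller alpha
    return base_alpha                      # otherwise alpha is unchanged


# Arbitrary example: a motion vector of (30, 40) has magnitude 50.
large_motion_alpha = adjust_alpha(10, (30, 40))
small_motion_alpha = adjust_alpha(10, (1, 0))
```

Separate thresholds are used here because the description allows "the same or other threshold" for the small-motion case.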
FIG. 8 presents a block diagram representation of a decoding process in accordance with an embodiment of the present invention. In this embodiment, the neighbor management module 218 receives macroblock header and motion vector differential data 230 from the EDC data 146 via buffer 300. The neighbor management module 218 checks the macroblock (MB) mode from the MB header. In inter-prediction mode, the neighbor management module 218 calculates motion vectors and also calculates deblock strength data and passes this data along with other EDC data, such as run level data 272, to one or more frame buffers, represented in the process flow as buffers 302, 304, 308, 310 and 318 implemented via memory module 154. The decode motion compensation module 204 generates inter-prediction data based on the motion vectors and on reference frames retrieved from the frame buffer and stores the results in buffer 314, such as a ring buffer. In intra-prediction mode, the inverse intra prediction module 211 generates intra-prediction data. - The
inverse quantization module 274 retrieves run level data 272 from buffer 304 and inverse quantizes the data with data from the frame buffer 302 and generates de-quantized data that is stored in buffer 306. Inverse transform module 276 inverse transforms the de-quantized data based on the frame buffered data to generate residual data that is stored in buffer 312. The residual data is combined in inverse intra-prediction module 211 with either intra-prediction data or inter-prediction data supplied in response to the mode determination by neighbor management module 218, to generate current reconstructed frames/fields that are buffered in the buffer 316. -
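The mode-dependent combination of residual data with prediction data can be sketched as follows. This is an illustrative model only: the description says the data is "combined" without giving the arithmetic, so the element-wise addition, the clipping to an 8-bit sample range, and the name `reconstruct_block` are assumptions.

```python
def reconstruct_block(residual, inter_pred, intra_pred, mode):
    """Combine residual data with the prediction selected by the MB mode.

    Assumptions: 'combine' means element-wise addition, and samples are
    clipped to the 8-bit range 0..255.
    """
    if mode == "inter":
        prediction = inter_pred
    elif mode == "intra":
        prediction = intra_pred
    else:
        raise ValueError("mode must be 'inter' or 'intra'")
    return [[min(255, max(0, r + p)) for r, p in zip(res_row, pred_row)]
            for res_row, pred_row in zip(residual, prediction)]


# Arbitrary 2x2 example values.
residual = [[5, -10], [300, 0]]
inter_pred = [[100, 5], [10, 20]]
intra_pred = [[0, 0], [0, 0]]
inter_result = reconstruct_block(residual, inter_pred, intra_pred, "inter")
intra_result = reconstruct_block(residual, inter_pred, intra_pred, "intra")
```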
Deblocking filter module 222 applies deblocking filtering to the reconstructed frames/fields in accordance with the deblock strength data from neighbor management module 218 to generate decoded video output in the form of filtered pictures 226 that are buffered via buffer 320. - The
buffers 306, 312, 314, 316, 318 and 320 can each be a ring buffer implemented via memory module 154; however, other buffer configurations are likewise possible. -
FIG. 9 presents a block diagram representation of a matrix vector processing unit 190 in accordance with another embodiment of the present invention. In particular, matrix vector processing unit 190 includes a dedicated hardware block that performs parallel matrix operations such as matrix addition, multiplication, transposition, inversion and/or other matrix operations on an input matrix 192 to generate an output matrix 194. - The matrix
vector processing unit 190 is configured via VPU instructions 180 that include vector instructions, scalar instructions and branching instructions. These VPU instructions 180 include configuration data and commands 170 that configure the matrix VPU 190 in accordance with the selected video coding format and command the matrix vector processing unit to perform the corresponding functions such as all or part of an inverse discrete cosine transform, inverse quantization or other matrix function of the multi-format video decoder 150. The VPU instructions 180 further include vector and/or scalar data used in conjunction with vector and scalar operations of the device. -
FIG. 10 presents a block diagram representation of a filter vector processing unit 195 in accordance with another embodiment of the present invention. In particular, filter vector processing unit 195 includes a dedicated hardware block that performs parallel filter operations such as an n-tap one-dimensional horizontal filter, an n-tap one-dimensional vertical filter, or an n-tap two-dimensional filter. The filter VPU 195 operates to filter input data 196, such as a block of pixels, a row of pixels, a column of pixels of a video picture or other data, to generate filtered data 198. - The filter
vector processing unit 195 is configured via VPU instructions 181 that include vector instructions, scalar instructions and branching instructions. These VPU instructions 181 include configuration data and commands 172 that configure the filter VPU 195 in accordance with the selected video coding format, such as by programming the filter parameters (e.g. the number of taps, type of filter, and the particular filter coefficients), and command the filter vector processing unit to perform the corresponding functions such as all or part of the generation of inter-prediction data, intra-prediction data and/or filtered picture data of the multi-format video decoder 150. The VPU instructions 181 further include vector and/or scalar data used in conjunction with vector and scalar operations of the device. -
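As an illustration of a filter configured by a number of taps, a direction and a set of coefficients, the following Python sketch applies an n-tap one-dimensional filter to a block of pixels either horizontally or vertically. It is a software model only, not the dedicated hardware block: the names `filter_1d` and `filter_block`, the replication of edge samples, and the use of unnormalized integer coefficients are assumptions.

```python
def filter_1d(samples, taps):
    """Apply an n-tap FIR filter to a list of samples.

    Assumption: samples beyond either end are replaced by the nearest
    edge sample, and the taps are centered on the current sample.
    """
    half = len(taps) // 2
    last = len(samples) - 1
    out = []
    for i in range(len(samples)):
        acc = 0
        for k, coeff in enumerate(taps):
            j = min(max(i + k - half, 0), last)  # clamp index to the valid range
            acc += coeff * samples[j]
        out.append(acc)
    return out


def filter_block(block, taps, direction):
    """Filter every row (horizontal) or every column (vertical) of a block."""
    if direction == "horizontal":
        return [filter_1d(row, taps) for row in block]
    if direction == "vertical":
        columns = [filter_1d(list(col), taps) for col in zip(*block)]
        return [list(row) for row in zip(*columns)]
    raise ValueError("direction must be 'horizontal' or 'vertical'")


# Arbitrary 3-tap example with coefficients (1, 2, 1).
taps = [1, 2, 1]
horizontal = filter_block([[1, 2, 3], [1, 2, 3]], taps, "horizontal")
vertical = filter_block([[1, 1], [2, 2], [3, 3]], taps, "vertical")
```

Changing the `taps` list changes both the number of taps and the coefficients, which mirrors the idea of reprogramming the filter parameters per coding format.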
FIG. 11 presents a block diagram representation of a VPU instruction 180 or 181 in accordance with another embodiment of the present invention. As previously discussed, the VPU instructions include three portions, vector instruction 182, scalar instruction 184, and branching instruction 186. Through the use of these instructions, a vector processing unit, such as matrix vector processing unit 190 or filter vector processing unit 195, can be configured/programmed to move blocks of data, to perform vector or scalar operations on the data, to perform conditional or unconditional branching, or to perform other logical or arithmetic operations. - In an embodiment of the present invention, the
vector instruction 182 can include commands and data to perform multiple simultaneous logical or arithmetic operations via a single instruction. In an embodiment of the present invention, the vector data can include data blocks of 32 bits or more and the matrix or vector filter operations include any of the operations discussed in conjunction with either matrix VPU 190 or filter VPU 195. The scalar instruction 184 can include commands and data to perform single scalar logical or arithmetic operations via a single instruction. In an embodiment of the present invention, the scalar data can include scalar data blocks of 32 bits or less or long scalar blocks of more than 32 bits. Matrix or filter scalar operations include mask creation, data masking, addressing instructions, data move operations, flag calculations, etc. Branching instructions include conditional or unconditional branching instructions based on logical or arithmetic conditions. - In an example of operation, the
filter VPU 195 implements a deblocking filter as part of deblocking filter module 222. In one mode of operation, the filter VPU 195 executes filter VPU instructions 181 in a similar fashion to a function or subroutine call. For example, in an initial VPU instruction 181, the filter VPU 195 can execute a data move command to configure a particular n-tap deblocking filter, based on the selection of the particular video coding format, by loading filter coefficients and other configuration data to establish an initial filter configuration. In subsequent VPU instructions 181, the deblock strength is retrieved to optionally adjust the filter coefficients or otherwise adjust the filter configuration to a current deblock strength. In addition, input data 196 is retrieved, filtered and transferred to a buffer in response to filter commands. -
FIG. 12 presents a block diagram representation of a VPU 200 in accordance with an embodiment of the present invention. In particular, vector processor unit 200 is a further example of any of the vector processor units (VPU 1, VPU 2, . . . VPU N) of hardware accelerator module 156, including matrix VPU 190 and filter VPU 195 or other vector processor or a component of any of the foregoing. VPU 200 includes a vector function module 202 that generates vector function data 204 based on a vector function of a first input vector V1 and a second input vector V2. A selection module 210 selects each element of a vector output 216 as either a corresponding element of the vector function data 204 or a corresponding element of a third input vector V3. As shown, VPU 200 includes a control register 210 for storing the selection data 212, based on selection instruction 214 that is input to configure the vector processing unit 200. - The vector function can include an arithmetic function, such as a binary addition, subtraction, multiplication, division or other arithmetic operator. In addition, the vector function can be a logical function or any other function. For example, the vector function can include a selection of the
vector function data 204 as either the first input vector V1 or a second input vector V2. Further, third input vector V3 can be either the first input vector V1 or the second input vector V2. In this fashion, the vector output 216 can be made up of interspersed elements of the first input vector V1 and a second input vector V2. In an embodiment of the present invention, the particular vector function of vector function module 202 can be configured via function instruction 215. For example, the vector function module 202 can include a plurality of different vector functions that are selected via corresponding values of the function instruction 215. The function instruction 215 can be a scalar or vector instruction. - In operation, the
selection module 210 selects each element of the vector output 216 based on selection data 212. For example, the selection data 212 indicates a subset of elements of the vector output 216 that correspond to the vector function data 204 and further another subset of elements of the vector output 216 that correspond to the third input vector V3. In this fashion, the selection data 212 indicates which of the elements of the third input vector V3 will be modified to be the vector function data 204 and which other elements of the third input vector V3 will be left alone, unmodified. The selection instruction 214 can be a vector instruction, such as a binary selection vector that indicates, via a binary value for each element of the third input vector V3, which elements of V3 will be modified and unmodified. In an alternative embodiment, the instruction 214 can be a scalar instruction, such as a scalar value that corresponds to one of a plurality of sets of selection data 212 that indicates which elements of V3 will be modified and unmodified. - Consider the following example, where:
- $V_1 = (a_1, a_2, a_3, \ldots, a_n)$
- $V_2 = (b_1, b_2, b_3, \ldots, b_n)$
- $V_3 = (c_1, c_2, c_3, \ldots, c_n)$
- where the vector function data 204 is represented as $V_f$
- $V_f = (d_1, d_2, d_3, \ldots, d_n)$
- where the $i$th element $d_i$ is generated as
- $d_i = f(a_i, b_i)$
- for $i = 1, 2, 3, \ldots, n$ and where $f$ is the vector function. Consider further that the selection data 212 is a vector as follows:
- $S = (s_1, s_2, s_3, \ldots, s_n)$
- where $s_i = 1$ indicates a selection of the vector function data 204 in the $i$th position of the vector output 216 and $s_i = 0$ indicates a selection of the third input vector $V_3$ in the $i$th position of the vector output 216. Representing the vector output 216 as
- $V_o = (e_1, e_2, e_3, \ldots, e_n)$
- for $i = 1, 2, 3, \ldots, n$, the values of the vector output 216 can be calculated as:
- $e_i = d_i$, if $s_i = 1$, and
- $e_i = c_i$, if $s_i = 0$
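The element-wise selection in the example above can be modeled in a few lines of Python. This is only an illustrative software model, not the hardware of vector processing unit 200: the function name `vpu_select` and the use of plain lists are hypothetical, and addition is chosen arbitrarily as the vector function f.

```python
from typing import Callable, Sequence


def vpu_select(v1: Sequence[int], v2: Sequence[int], v3: Sequence[int],
               s: Sequence[int], f: Callable[[int, int], int]) -> list:
    """Model of the selection example (hypothetical helper).

    d_i = f(a_i, b_i) is computed for every position, then
    e_i = d_i where s_i == 1 and e_i = c_i where s_i == 0.
    """
    if not (len(v1) == len(v2) == len(v3) == len(s)):
        raise ValueError("all vectors must have the same length n")
    vf = [f(a, b) for a, b in zip(v1, v2)]     # vector function data Vf
    return [d if sel == 1 else c               # per-element selection
            for d, c, sel in zip(vf, v3, s)]


# Arbitrary example values; f is assumed to be addition here.
v1 = [1, 2, 3, 4]
v2 = [10, 20, 30, 40]
v3 = [0, 0, 0, 0]
s = [1, 0, 1, 0]
vo = vpu_select(v1, v2, v3, s, lambda a, b: a + b)
```

In this model, passing V1 as the third input and a function that returns its second argument reproduces the interspersing of V1 and V2 elements described earlier.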
- The
vector processing unit 200 allows implementation of a wide range of logical/arithmetic vector functions without branching instructions. In effect, the conditional branching is implemented as a hardware function or selection, without the need to implement software branching, conditional statements, etc. -
FIG. 13 presents a block diagram representation of a VPU 300 in accordance with an embodiment of the present invention. In particular, vector processor unit 300 is a further example of any of the vector processor units (VPU 1, VPU 2, . . . VPU N) of hardware accelerator module 156, including matrix VPU 190 and filter VPU 195 or other vector processor or a component of any of the foregoing. As shown, VPU 300 includes a control register 310 for storing read/write instructions 314 of the operational instructions of a program stored in memory to be executed to generate a decoded video signal from EDC data. The matrix memory 304 can be a single memory device or a plurality of memory devices. Such a memory device can include a random access memory, volatile memory, non-volatile memory, static memory, dynamic memory, flash memory, cache memory, and/or any device that stores digital information. - The read/write
instructions 314 include vector read instructions that include vector read orientation data and vector write instructions that include vector write orientation data that are input to configure the vector processor unit 300 based on command data 312. VPU 300 includes a matrix memory 304 that stores matrix data corresponding to a plurality of rows and columns and that generates vector read data 306 in a first read orientation when the vector read orientation data has a first value and that generates vector read data 306 in a second read orientation when the vector read orientation data has a second value. The matrix memory stores vector write data 302 in a first write orientation when the vector write orientation data has a third value and stores vector write data 302 in a second write orientation when the vector write orientation data has a fourth value. - For example, read/write
instructions 314 can be formatted in accordance with the following table: -
| Read/write indicator | Orientation indicator | Command data |
| --- | --- | --- |
| Read | Row | Read in row orientation |
| Read | Column | Read in column orientation |
| Write | Row | Write in row orientation |
| Write | Column | Write in column orientation |
It should be noted that the values of the read/write indicator, the orientation indicator and the command data 312 can be represented by different digital values. - In operation, read/write
instructions 314 are loaded in control register 310. In response to command data 312 that implements these instructions, matrix memory 304 reads vector read data 306 from the matrix memory 304 in either column or row orientation or writes vector write data to the matrix memory 304 in either column or row orientation. By selection of the same orientation for read and write operations, vector data can be stored and retrieved in a traditional fashion. By selection of different orientations for read and write operations, however, vector data can be automatically transposed, without the need for further data manipulation and further instructions. - It should be noted that, while
vector processor unit 300 is shown as a separate unit, in other embodiments the components of vector processor unit 300 can be included as components of another vector processor unit such as a matrix multiplication unit or a unit that performs other matrix mathematical functions that employ matrix transpositions as part of the input/output manipulation of matrix data. - Examples of read and write operations for the
matrix memory 304 are shown in conjunction with FIGS. 14-17 that follow. -
FIG. 14 presents a graphical representation of a write operation in accordance with an embodiment of the present invention. In particular, an example write operation of matrix memory 304 is shown. In this example, vector write data 302 is represented by (x1, x2, . . . ) and is stored in row orientation as the kth row as (xk1, xk2, . . . xki, . . . ). By repeating this operation for a plurality of vector write data 302 and writing different rows, an entire matrix can be stored. -
FIG. 15 presents a graphical representation of a write operation in accordance with an embodiment of the present invention. In particular, another example write operation of matrix memory 304 is shown. In this further example, vector write data 302 is also represented by (x1, x2, . . . ) but is stored in column orientation as the ith column as (x1i, x2i, . . . xki, . . . ). By repeating this operation for a plurality of vector write data 302 and writing different columns, an entire matrix can be stored. -
FIG. 16 presents a graphical representation of a read operation in accordance with an embodiment of the present invention. In particular, an example read operation of matrix memory 304 is shown. In this example, vector read data 306 is represented by (x1, x2, . . . ) and is retrieved in row orientation as the kth row as (xk1, xk2, . . . xki, . . . ). By repeating this operation for a plurality of vector read data 306 and reading different rows, an entire matrix can be read. -
FIG. 17 presents a graphical representation of a read operation in accordance with an embodiment of the present invention. In particular, another example read operation of matrix memory 304 is shown. In this example, vector read data 306 is also represented by (x1, x2, . . . ) and is retrieved in column orientation as the ith column as (x1i, x2i, . . . xki, . . . ). By repeating this operation for a plurality of vector read data 306 and reading different columns, an entire matrix can be read. - As discussed in conjunction with
FIG. 13, by selection of the same orientation for read and write operations, vector data can be stored and retrieved in a traditional fashion. By selection of different orientations for read and write operations, however, vector data can be automatically transposed, without the need for further data manipulation and further instructions. For example, writing data as shown in FIG. 14 and reading the data as shown in FIG. 16 yields no transposition. However, writing data as shown in FIG. 14 and reading the data as shown in FIG. 17 yields a transposition of the matrix data. -
FIG. 18 presents a block diagram representation of a VPU 325 in accordance with an embodiment of the present invention. In particular, vector processor unit 325 is a further example of any of the vector processor units (VPU 1, VPU 2, . . . VPU N) of hardware accelerator module 156, including matrix VPU 190 and filter VPU 195 or other vector processor or a component of any of the foregoing. VPU 325 includes a matrix multiplier 320 that generates output data 326 based on a matrix multiplication of input data 322 and input data 324. As shown, VPU 325 includes a control register 330 for storing a matrix instruction 334 that includes matrix input configuration data 332 that configures the matrix multiplier 320. In particular, the matrix input configuration data 332 indicates the dimensionality of the input data 322 and the input data 324, and by inference the dimensionality of the output data 326. - In an embodiment of the present invention, different values of the matrix input configuration data can correspond to input data formatted as a 1×8 matrix, an 8×8 matrix, a 4×4 matrix or other dimensions. For example,
matrix instructions 334 can be formatted in accordance with the following table: - 
| Dimensions of input data 322 | Dimensions of input data 324 | Dimensions of output data 326 |
| --- | --- | --- |
| 1 × 8 | 8 × 8 | 1 × 8 |
| 8 × 8 | 1 × 8 | 8 × 1 |
| 4 × 4 | 4 × 4 | 4 × 4 |
It should be noted that the dimensions of the input data 322, input data 324 and output data 326 can be represented by different digital values. - In operation,
matrix instructions 334 are loaded in control register 330. In response to matrix input configuration data 332, the matrix multiplier 320 multiplies the input data 322 by the input data 324 to generate the output data 326. In an embodiment of the present invention, the matrix multiplier includes a plurality of multipliers and adders that are configured, based on the matrix input configuration data 332, to perform the mathematical functions associated with matrix multiplication. -
FIG. 19 presents a block diagram representation of a video distribution system 375 in accordance with an embodiment of the present invention. In particular, video signal 110 is transmitted from a video encoder via a transmission path 122 to a video decoder 102. The video decoder 102 operates to decode the video signal 110 for display on display devices 12 or 14 or other display device. In an embodiment of the present invention, video decoder 102 can be implemented in a set-top box, digital video recorder, router or home gateway. In the alternative, decoder 102 can optionally be incorporated directly in the display device 12 or 14. - The
transmission path 122 can include a wireless path that operates in accordance with a wireless local area network protocol such as an 802.11 protocol, a WIMAX protocol, a Bluetooth protocol, etc. Further, the transmission path can include a wired path that operates in accordance with a wired protocol such as a Universal Serial Bus protocol, an Ethernet protocol or other high speed protocol. -
FIG. 20 presents a block diagram representation of a video storage system 179 in accordance with an embodiment of the present invention. In particular, device 11 is a set top box with built-in digital video recorder functionality, a stand alone digital video recorder, a DVD recorder/player or other device that stores the video signal 110. In this configuration, device 11 can include video decoder 102 that operates to decode the video signal 110 when retrieved from storage to generate a processed video signal 112 in a format that is suitable for display by video display device 12. While these particular devices are illustrated, video storage system 179 can include a hard drive, flash memory device, computer, DVD burner, or any other device that is capable of generating, storing, decoding, transcoding and/or displaying the video content of video signal 110 in accordance with the methods and systems described in conjunction with the features and functions of the present invention as described herein. -
FIG. 21 presents a flow diagram representation of a method in accordance with an embodiment of the present invention. In particular, a method is presented for use in conjunction with one or more functions and features described in conjunction with FIGS. 1-20. In step 400, entropy decoded (EDC) data is generated from an encoded video signal. In step 402, a decoded video signal is generated from the EDC data, via a plurality of vector processor units, in response to a plurality of operational instructions including at least one vector read instruction that includes vector read orientation data, wherein at least one of the plurality of vector processor units operates by: storing matrix data in a matrix memory corresponding to a plurality of rows and columns; generating vector read data in a first read orientation when the vector read orientation data has a first value; and generating vector read data in a second read orientation when the vector read orientation data has a second value. - In an embodiment of the present invention, the vector read data in the first read orientation corresponds to matrix data in one of the plurality of rows of the matrix memory, and the vector read data in the second read orientation corresponds to matrix data in one of the plurality of columns of the matrix memory.
- The plurality of operational instructions can further include at least one vector write instruction that includes vector write orientation data, and wherein the at least one of the plurality of vector processor units further operates by: storing vector write data in a first write orientation when the vector write orientation data has a third value; and storing vector write data in a second write orientation when the vector write orientation data has a fourth value. In an embodiment of the present invention, the vector write data in the first write orientation corresponds to matrix data in one of the plurality of rows of the matrix memory, and the vector write data in the second write orientation corresponds to matrix data in one of the plurality of columns of the matrix memory.
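The orientation-selectable read and write behavior recited above, and the transposition discussed in conjunction with FIGS. 14-17, can be modeled with a small Python class. This is a sketch under stated assumptions rather than the matrix memory 304 itself: the class name `MatrixMemory`, the string values "row" and "column" standing in for the orientation data, and the explicit `index` argument that selects which row or column is accessed are all hypothetical.

```python
class MatrixMemory:
    """Software model of a matrix memory with selectable access orientation."""

    def __init__(self, rows: int, cols: int):
        self.rows, self.cols = rows, cols
        self.data = [[0] * cols for _ in range(rows)]

    def write(self, orientation: str, index: int, vector) -> None:
        """Store a vector as row `index` or as column `index`."""
        if orientation == "row":
            if len(vector) != self.cols:
                raise ValueError("row vector has the wrong length")
            self.data[index] = list(vector)
        elif orientation == "column":
            if len(vector) != self.rows:
                raise ValueError("column vector has the wrong length")
            for r, value in enumerate(vector):
                self.data[r][index] = value
        else:
            raise ValueError("orientation must be 'row' or 'column'")

    def read(self, orientation: str, index: int) -> list:
        """Return row `index` or column `index` as a vector."""
        if orientation == "row":
            return list(self.data[index])
        if orientation == "column":
            return [self.data[r][index] for r in range(self.rows)]
        raise ValueError("orientation must be 'row' or 'column'")


# Write a 3x3 matrix row by row, then read it back both ways.
source = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
mem = MatrixMemory(3, 3)
for k, row in enumerate(source):
    mem.write("row", k, row)
same = [mem.read("row", k) for k in range(3)]           # same orientation
transposed = [mem.read("column", i) for i in range(3)]  # different orientation
```

Reading in the orientation used for writing returns the stored vectors unchanged, while reading in the other orientation returns the transpose without any separate transpose step.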
-
FIG. 22 presents a flow diagram representation of a method in accordance with an embodiment of the present invention. In particular, a method is presented for use in conjunction with one or more functions and features described in conjunction with FIGS. 1-21. In step 410, entropy decoded (EDC) data is generated from an encoded video signal. In step 412, a decoded video signal is generated from the EDC data, via a plurality of vector processor units, in response to a plurality of operational instructions including at least one matrix multiply instruction that includes matrix input configuration data, wherein at least one of the plurality of vector processor units operates by: generating output data based on a multiplication of first input data and second input data in accordance with the matrix input configuration data, wherein the matrix input configuration data indicates the dimensionality of the first input data and the second input data. - In an embodiment of the present invention, the first input data is formatted as a 1×8 matrix when the matrix input configuration data has a first value and is formatted as an 8×8 matrix when the matrix input configuration data has a second value. The second input data is formatted as a 1×8 matrix when the matrix input configuration data has a first value and is formatted as an 8×8 matrix when the matrix input configuration data has a second value. The first input data and the second input data can both be formatted as a 4×4 matrix when the matrix input configuration data has another value.
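A multiplication whose operand dimensions are fixed by configuration data can be sketched in Python as follows. This is an illustrative model under stated assumptions: the configuration codes 0 and 1, the dictionary `MATRIX_CONFIGS` and the function `configured_multiply` are hypothetical, and only the 1×8 by 8×8 and 4×4 by 4×4 cases from the table presented in conjunction with FIG. 18 are covered.

```python
from typing import List, Tuple

Matrix = List[List[int]]

# Hypothetical encodings: each configuration value maps to the expected
# (rows, cols) of the first and second input.
MATRIX_CONFIGS = {
    0: ((1, 8), (8, 8)),   # 1x8 times 8x8 gives 1x8
    1: ((4, 4), (4, 4)),   # 4x4 times 4x4 gives 4x4
}


def shape(m: Matrix) -> Tuple[int, int]:
    return len(m), len(m[0])


def configured_multiply(config: int, a: Matrix, b: Matrix) -> Matrix:
    """Multiply a by b after checking both against the selected configuration."""
    expected_a, expected_b = MATRIX_CONFIGS[config]
    if shape(a) != expected_a or shape(b) != expected_b:
        raise ValueError("input dimensions do not match the configuration")
    rows, inner = expected_a
    cols = expected_b[1]
    # Each output element is a sum of products, as in ordinary matrix
    # multiplication.
    return [[sum(a[r][k] * b[k][c] for k in range(inner))
             for c in range(cols)]
            for r in range(rows)]


# Arbitrary example: a 1x8 row vector times the 8x8 identity matrix.
identity8 = [[1 if r == c else 0 for c in range(8)] for r in range(8)]
row_vector = [[1, 2, 3, 4, 5, 6, 7, 8]]
out = configured_multiply(0, row_vector, identity8)
```

The output dimensions follow from the two input dimensions, which corresponds to the statement that the dimensionality of the output data is indicated by inference.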
- While particular combinations of various functions and features of the present invention have been expressly described herein, other combinations of these features and functions are possible that are not limited by the particular examples disclosed herein and are expressly incorporated within the scope of the present invention.
- As one of ordinary skill in the art will appreciate, the term “substantially” or “approximately”, as may be used herein, provides an industry-accepted tolerance to its corresponding term and/or relativity between items. Such an industry-accepted tolerance ranges from less than one percent to twenty percent and corresponds to, but is not limited to, component values, integrated circuit process variations, temperature variations, rise and fall times, and/or thermal noise. Such relativity between items ranges from a difference of a few percent to magnitude differences. As one of ordinary skill in the art will further appreciate, the term “coupled”, as may be used herein, includes direct coupling and indirect coupling via another component, element, circuit, or module where, for indirect coupling, the intervening component, element, circuit, or module does not modify the information of a signal but may adjust its current level, voltage level, and/or power level. As one of ordinary skill in the art will also appreciate, inferred coupling (i.e., where one element is coupled to another element by inference) includes direct and indirect coupling between two elements in the same manner as “coupled”. As one of ordinary skill in the art will further appreciate, the term “compares favorably”, as may be used herein, indicates that a comparison between two or more elements, items, signals, etc., provides a desired relationship. For example, when the desired relationship is that signal 1 has a greater magnitude than signal 2, a favorable comparison may be achieved when the magnitude of signal 1 is greater than that of signal 2 or when the magnitude of signal 2 is less than that of signal 1.
- As the term module is used in the description of the various embodiments of the present invention, a module includes a functional block that is implemented in hardware, software, and/or firmware that performs one or more module functions such as the processing of an input signal to produce an output signal. As used herein, a module may contain submodules that themselves are modules.
- Thus, there has been described herein an apparatus and method, as well as several embodiments including a preferred embodiment, for implementing a video decoder. Various embodiments of the present invention herein-described have features that distinguish the present invention from the prior art.
- It will be apparent to those skilled in the art that the disclosed invention may be modified in numerous ways and may assume many embodiments other than the preferred forms specifically set out and described above. Accordingly, it is intended by the appended claims to cover all modifications of the invention which fall within the true spirit and scope of the invention.
Claims (14)
1. A video decoder comprising:
an entropy decoding device that generates entropy decoded (EDC) data from an encoded video signal;
a video decoding device, coupled to the entropy decoding device, that includes:
a memory module that stores a plurality of operational instructions including at least one vector read instruction that includes vector read orientation data;
a plurality of vector processor units, coupled to the memory, for generating a decoded video signal from the EDC data, wherein at least one of the plurality of vector processor units includes:
a matrix memory that stores matrix data corresponding to a plurality of rows and columns and that generates vector read data in a first read orientation when the vector read orientation data has a first value and that generates vector read data in a second read orientation when the vector read orientation data has a second value.
2. The video decoder of claim 1 wherein the vector read data in the first read orientation corresponds to matrix data in one of the plurality of rows of the matrix memory.
3. The video decoder of claim 1 wherein the vector read data in the second read orientation corresponds to matrix data in one of the plurality of columns of the matrix memory.
4. The video decoder of claim 1 wherein the plurality of operational instructions further includes at least one vector write instruction that includes vector write orientation data;
wherein the matrix memory stores vector write data in a first write orientation when the vector write orientation data has a third value and that stores vector write data in a second write orientation when the vector write orientation data has a fourth value.
5. The video decoder of claim 4 wherein the vector write data in the first write orientation corresponds to matrix data in one of the plurality of rows of the matrix memory.
6. The video decoder of claim 4 wherein the vector write data in the second write orientation corresponds to matrix data in one of the plurality of columns of the matrix memory.
7. The video decoder of claim 1 wherein the encoded video signal is encoded in accordance with a VP8 coding standard.
8. A method comprising:
generating entropy decoded (EDC) data from an encoded video signal;
generating a decoded video signal from the EDC data, via a plurality of vector processor units, in response to a plurality of operational instructions including at least one vector read instruction that includes vector read orientation data, wherein at least one of the plurality of vector processor units operates by:
storing matrix data in a matrix memory corresponding to a plurality of rows and columns;
generating vector read data in a first read orientation when the vector read orientation data has a first value; and
generating vector read data in a second read orientation when the vector read orientation data has a second value.
9. The method of claim 8 wherein the vector read data in the first read orientation corresponds to matrix data in one of the plurality of rows of the matrix memory.
10. The method of claim 8 wherein the vector read data in the second read orientation corresponds to matrix data in one of the plurality of columns of the matrix memory.
11. The method of claim 8 wherein the plurality of operational instructions further includes at least one vector write instruction that includes vector write orientation data, and wherein the at least one of the plurality of vector processor units operates by:
storing vector write data in a first write orientation when the vector write orientation data has a third value; and
storing vector write data in a second write orientation when the vector write orientation data has a fourth value.
12. The method of claim 11 wherein the vector write data in the first write orientation corresponds to matrix data in one of the plurality of rows of the matrix memory.
13. The method of claim 11 wherein the vector write data in the second write orientation corresponds to matrix data in one of the plurality of columns of the matrix memory.
14. The method of claim 8 wherein the encoded video signal is encoded in accordance with a VP8 coding standard.
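The orientation-selectable reads and writes recited in the claims above can be sketched in Python as follows. This is an illustrative model only, not the claimed hardware. The class name `MatrixMemory`, the `ROW`/`COLUMN` encodings, and the reuse of the same two encodings for both reads and writes are assumptions; the claims speak only of first and second values for reads and third and fourth values for writes.

```python
class MatrixMemory:
    """Toy model of a matrix memory addressed by whole rows or whole columns."""

    ROW = 0     # hypothetical encoding of the row orientation
    COLUMN = 1  # hypothetical encoding of the column orientation

    def __init__(self, rows, cols):
        self.rows = rows
        self.cols = cols
        self.cells = [[0] * cols for _ in range(rows)]

    def vector_write(self, index, vector, orientation):
        """Store a vector into row `index` or column `index`."""
        if orientation == self.ROW:
            if len(vector) != self.cols:
                raise ValueError("row vector needs one element per column")
            self.cells[index] = list(vector)
        elif orientation == self.COLUMN:
            if len(vector) != self.rows:
                raise ValueError("column vector needs one element per row")
            for r in range(self.rows):
                self.cells[r][index] = vector[r]
        else:
            raise ValueError(f"unknown orientation: {orientation}")

    def vector_read(self, index, orientation):
        """Return row `index` or column `index` as a new list."""
        if orientation == self.ROW:
            return list(self.cells[index])
        if orientation == self.COLUMN:
            return [self.cells[r][index] for r in range(self.rows)]
        raise ValueError(f"unknown orientation: {orientation}")


# Usage: write a 4x4 block row by row, then read it back column by column.
memory = MatrixMemory(4, 4)
for r in range(4):
    memory.vector_write(r, [4 * r + c for c in range(4)], MatrixMemory.ROW)
transposed = [memory.vector_read(c, MatrixMemory.COLUMN) for c in range(4)]
```

Writing in one orientation and reading in the other yields the transpose of the stored block without a separate transposition pass, which is the effect the sketch is meant to show.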
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US13/162,075 US20120314775A1 (en) | 2011-06-08 | 2011-06-16 | Video decoder with transposing vector processor and methods for use therewith |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201161494614P | 2011-06-08 | 2011-06-08 | |
| US13/162,075 US20120314775A1 (en) | 2011-06-08 | 2011-06-16 | Video decoder with transposing vector processor and methods for use therewith |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| US20120314775A1 (en) | 2012-12-13 |
Family
ID=47293189
Family Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US13/162,075 Abandoned US20120314775A1 (en) | 2011-06-08 | 2011-06-16 | Video decoder with transposing vector processor and methods for use therewith |
| US13/162,265 Active 2034-09-05 US9503741B2 (en) | 2011-06-08 | 2011-06-16 | Video decoder with multi-format vector processor and methods for use therewith |
Family Applications After (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US13/162,265 Active 2034-09-05 US9503741B2 (en) | 2011-06-08 | 2011-06-16 | Video decoder with multi-format vector processor and methods for use therewith |
Country Status (1)
| Country | Link |
|---|---|
| US (2) | US20120314775A1 (en) |
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20130343456A1 (en) * | 2011-11-25 | 2013-12-26 | Panasonic Corporation | Image processing method and image processing apparatus |
| US20160366430A1 (en) * | 2015-06-15 | 2016-12-15 | Microsoft Technology Licensing, Llc | Multiple Bit Rate Video Decoding |
| US9832476B2 (en) | 2015-06-15 | 2017-11-28 | Microsoft Technology Licensing, Llc | Multiple bit rate video decoding |
| US11347503B2 (en) * | 2013-07-15 | 2022-05-31 | Texas Instruments Incorporated | Method and apparatus for vector based matrix multiplication |
| US20240160448A1 (en) * | 2022-11-10 | 2024-05-16 | Azurengine Technologies Zhuhai Inc. | Mixed scalar and vector operations in multi-threaded computing |
Families Citing this family (39)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10275243B2 (en) | 2016-07-02 | 2019-04-30 | Intel Corporation | Interruptible and restartable matrix multiplication instructions, processors, methods, and systems |
| US11163565B2 (en) | 2017-03-20 | 2021-11-02 | Intel Corporation | Systems, methods, and apparatuses for dot production operations |
| WO2019009870A1 (en) | 2017-07-01 | 2019-01-10 | Intel Corporation | Context save with variable save state size |
| US11093247B2 (en) | 2017-12-29 | 2021-08-17 | Intel Corporation | Systems and methods to load a tile register pair |
| US11023235B2 (en) | 2017-12-29 | 2021-06-01 | Intel Corporation | Systems and methods to zero a tile register pair |
| US11816483B2 (en) | 2017-12-29 | 2023-11-14 | Intel Corporation | Systems, methods, and apparatuses for matrix operations |
| US11809869B2 (en) | 2017-12-29 | 2023-11-07 | Intel Corporation | Systems and methods to store a tile register pair to memory |
| US11669326B2 (en) | 2017-12-29 | 2023-06-06 | Intel Corporation | Systems, methods, and apparatuses for dot product operations |
| US11789729B2 (en) | 2017-12-29 | 2023-10-17 | Intel Corporation | Systems and methods for computing dot products of nibbles in two tile operands |
| US10664287B2 (en) | 2018-03-30 | 2020-05-26 | Intel Corporation | Systems and methods for implementing chained tile operations |
| US11093579B2 (en) | 2018-09-05 | 2021-08-17 | Intel Corporation | FP16-S7E8 mixed precision for deep learning and other algorithms |
| US10970076B2 (en) | 2018-09-14 | 2021-04-06 | Intel Corporation | Systems and methods for performing instructions specifying ternary tile logic operations |
| US11579883B2 (en) | 2018-09-14 | 2023-02-14 | Intel Corporation | Systems and methods for performing horizontal tile operations |
| US10866786B2 (en) | 2018-09-27 | 2020-12-15 | Intel Corporation | Systems and methods for performing instructions to transpose rectangular tiles |
| US10990396B2 (en) | 2018-09-27 | 2021-04-27 | Intel Corporation | Systems for performing instructions to quickly convert and use tiles as 1D vectors |
| US10719323B2 (en) | 2018-09-27 | 2020-07-21 | Intel Corporation | Systems and methods for performing matrix compress and decompress instructions |
| US10963256B2 (en) | 2018-09-28 | 2021-03-30 | Intel Corporation | Systems and methods for performing instructions to transform matrices into row-interleaved format |
| US10896043B2 (en) | 2018-09-28 | 2021-01-19 | Intel Corporation | Systems for performing instructions for fast element unpacking into 2-dimensional registers |
| US10929143B2 (en) | 2018-09-28 | 2021-02-23 | Intel Corporation | Method and apparatus for efficient matrix alignment in a systolic array |
| US10963246B2 (en) | 2018-11-09 | 2021-03-30 | Intel Corporation | Systems and methods for performing 16-bit floating-point matrix dot product instructions |
| US10929503B2 (en) | 2018-12-21 | 2021-02-23 | Intel Corporation | Apparatus and method for a masked multiply instruction to support neural network pruning operations |
| US11886875B2 (en) | 2018-12-26 | 2024-01-30 | Intel Corporation | Systems and methods for performing nibble-sized operations on matrix elements |
| US11294671B2 (en) | 2018-12-26 | 2022-04-05 | Intel Corporation | Systems and methods for performing duplicate detection instructions on 2D data |
| US20200210517A1 (en) | 2018-12-27 | 2020-07-02 | Intel Corporation | Systems and methods to accelerate multiplication of sparse matrices |
| US10942985B2 (en) | 2018-12-29 | 2021-03-09 | Intel Corporation | Apparatuses, methods, and systems for fast fourier transform configuration and computation instructions |
| US10922077B2 (en) | 2018-12-29 | 2021-02-16 | Intel Corporation | Apparatuses, methods, and systems for stencil configuration and computation instructions |
| US11016731B2 (en) | 2019-03-29 | 2021-05-25 | Intel Corporation | Using Fuzzy-Jbit location of floating-point multiply-accumulate results |
| US11269630B2 (en) | 2019-03-29 | 2022-03-08 | Intel Corporation | Interleaved pipeline of floating-point adders |
| US10990397B2 (en) | 2019-03-30 | 2021-04-27 | Intel Corporation | Apparatuses, methods, and systems for transpose instructions of a matrix operations accelerator |
| US11175891B2 (en) | 2019-03-30 | 2021-11-16 | Intel Corporation | Systems and methods to perform floating-point addition with selected rounding |
| US11403097B2 (en) | 2019-06-26 | 2022-08-02 | Intel Corporation | Systems and methods to skip inconsequential matrix operations |
| US11334647B2 (en) | 2019-06-29 | 2022-05-17 | Intel Corporation | Apparatuses, methods, and systems for enhanced matrix multiplier architecture |
| US11714875B2 (en) | 2019-12-28 | 2023-08-01 | Intel Corporation | Apparatuses, methods, and systems for instructions of a matrix operations accelerator |
| US11972230B2 (en) | 2020-06-27 | 2024-04-30 | Intel Corporation | Matrix transpose and multiply |
| US12112167B2 (en) | 2020-06-27 | 2024-10-08 | Intel Corporation | Matrix data scatter and gather between rows and irregularly spaced memory locations |
| US11941395B2 (en) | 2020-09-26 | 2024-03-26 | Intel Corporation | Apparatuses, methods, and systems for instructions for 16-bit floating-point matrix dot product instructions |
| US12474928B2 (en) | 2020-12-22 | 2025-11-18 | Intel Corporation | Processors, methods, systems, and instructions to select and store data elements from strided data element positions in a first dimension from three source two-dimensional arrays in a result two-dimensional array |
| US12001385B2 (en) | 2020-12-24 | 2024-06-04 | Intel Corporation | Apparatuses, methods, and systems for instructions for loading a tile of a matrix operations accelerator |
| US12001887B2 (en) | 2020-12-24 | 2024-06-04 | Intel Corporation | Apparatuses, methods, and systems for instructions for aligning tiles of a matrix operations accelerator |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5812791A (en) * | 1995-05-10 | 1998-09-22 | Cagent Technologies, Inc. | Multiple sequence MPEG decoder |
| US20060146060A1 (en) * | 2001-10-31 | 2006-07-06 | Stephen Barlow | Data access in a processor |
| US20100061444A1 (en) * | 2008-09-11 | 2010-03-11 | On2 Technologies Inc. | System and method for video encoding using adaptive segmentation |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP5352448B2 (en) * | 2007-03-28 | 2013-11-27 | パナソニック株式会社 | Inverse quantization circuit, inverse quantization method, and image reproduction apparatus |
| US8208745B2 (en) * | 2008-01-31 | 2012-06-26 | Analog Devices, Inc. | Spatial domain video enhancement/scaling system and method |
| US8311111B2 (en) * | 2008-09-11 | 2012-11-13 | Google Inc. | System and method for decoding using parallel processing |
| US9706214B2 (en) * | 2010-12-24 | 2017-07-11 | Microsoft Technology Licensing, Llc | Image and video decoding implementations |
Cited By (17)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9313499B2 (en) * | 2011-11-25 | 2016-04-12 | Panasonic Intellectual Property Corporation Of America | Image processing method and image processing apparatus |
| US9414064B2 (en) | 2011-11-25 | 2016-08-09 | Sun Patent Trust | Image processing method and image processing apparatus |
| US20130343456A1 (en) * | 2011-11-25 | 2013-12-26 | Panasonic Corporation | Image processing method and image processing apparatus |
| US12363345B2 (en) | 2011-11-25 | 2025-07-15 | Sun Patent Trust | Image processing method and image processing apparatus |
| US9961371B2 (en) | 2011-11-25 | 2018-05-01 | Sun Patent Trust | Image processing method and image processing apparatus |
| US10587895B2 (en) | 2011-11-25 | 2020-03-10 | Sun Patent Trust | Image processing method and image processing apparatus |
| US10924764B2 (en) | 2011-11-25 | 2021-02-16 | Sun Patent Trust | Image processing method and image processing apparatus |
| US11463731B2 (en) | 2011-11-25 | 2022-10-04 | Sun Patent Trust | Image processing method and image processing apparatus |
| US11895335B2 (en) | 2011-11-25 | 2024-02-06 | Sun Patent Trust | Image processing method and image processing apparatus |
| US12007904B2 (en) | 2013-07-15 | 2024-06-11 | Texas Instruments Incorporated | Method and apparatus for vector based matrix multiplication |
| US12450165B2 (en) | 2013-07-15 | 2025-10-21 | Texas Instruments Incorporated | Vector based matrix multiplication |
| US11347503B2 (en) * | 2013-07-15 | 2022-05-31 | Texas Instruments Incorporated | Method and apparatus for vector based matrix multiplication |
| US9832476B2 (en) | 2015-06-15 | 2017-11-28 | Microsoft Technology Licensing, Llc | Multiple bit rate video decoding |
| US9883194B2 (en) * | 2015-06-15 | 2018-01-30 | Microsoft Technology Licensing, Llc | Multiple bit rate video decoding |
| US20160366430A1 (en) * | 2015-06-15 | 2016-12-15 | Microsoft Technology Licensing, Llc | Multiple Bit Rate Video Decoding |
| US20240160448A1 (en) * | 2022-11-10 | 2024-05-16 | Azurengine Technologies Zhuhai Inc. | Mixed scalar and vector operations in multi-threaded computing |
| US12131157B2 (en) * | 2022-11-10 | 2024-10-29 | Azurengine Technologies Zhuhai Inc. | Mixed scalar and vector operations in multi-threaded computing |
Also Published As
| Publication number | Publication date |
|---|---|
| US20120314774A1 (en) | 2012-12-13 |
| US9503741B2 (en) | 2016-11-22 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| US9503741B2 (en) | Video decoder with multi-format vector processor and methods for use therewith | |
| US8971416B2 (en) | Video decoder with vector processor and methods for use therewith | |
| US9025666B2 (en) | Video decoder with shared memory and methods for use therewith | |
| US12170783B2 (en) | Video display preference filtering | |
| US20140269923A1 (en) | Method of stabilizing video, post-processing circuit and video decoder including the same | |
| JP7326439B2 (en) | Apparatus and method for deblocking filter in video coding | |
| US9088800B2 (en) | General video decoding device for decoding multilayer video and methods for use therewith | |
| US8077769B2 (en) | Method of reducing computations in transform and scaling processes in a digital video encoder using a threshold-based approach | |
| KR20230162801A (en) | Externally enhanced prediction for video coding | |
| US9025660B2 (en) | Video decoder with general video decoding device and methods for use therewith | |
| CN110290384A (en) | Image filtering method, device and video codec | |
| US9369713B2 (en) | Multi-format video decoder with vector processing instructions and methods for use therewith | |
| US20120230410A1 (en) | Multi-format video decoder and methods for use therewith | |
| US20070147515A1 (en) | Information processing apparatus | |
| US20120002719A1 (en) | Video encoder with non-syntax reuse and method for use therewith | |
| KR20250174382A (en) | Method and appratus for video encoding and decoding based on non-linear geometric partitioning mode | |
| US20130101023A9 (en) | Video encoder with video decoder reuse and method for use therewith | |
| KR20250175506A (en) | Method and appratus for video encoding and decoding based on adaptive coding of directional intra prediction mode | |
| Rao et al. | VP6 Video Coding Standard | |
| Song et al. | 1080p 60 Hz intra-frame video CODEC chip design and its implementation | |
| HK40098263A (en) | Signaling of eob for one-dimensional transform skipping | |
| Das et al. | A Cost-Shared Quantization Algorithm and Its Implementation for Multi-Standard Video Codecs | |
| Jagadish | Implementation of Serial and Parallel algorithm of SAO in HEVC | |
| JP2008141577A (en) | Re-encoding apparatus and method | |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: VIXS SYSTEMS, INC., A CORPORATION, CANADA; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: LAKSONO, INDRA; LIU, DONG; WANG, HONGRI (GRACE); AND OTHERS; REEL/FRAME: 026918/0204; Effective date: 20110818 |
| | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |