WO2005119920A2 - Memory compression - Google Patents
- Publication number: WO2005119920A2 (application PCT/EP2005/005994)
- Authority: WIPO (PCT)
- Prior art keywords
- values
- memory
- generating
- value
- replacing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C7/00—Arrangements for writing information into, or reading information out from, a digital store
- G11C7/10—Input/output [I/O] data interface arrangements, e.g. I/O data control circuits, I/O data buffers
- G11C7/1006—Data managing, e.g. manipulating data before writing or reading out, data bus switches or control circuits therefor
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
Definitions
- the invention enables size and cost reductions for low-cost electronic devices based on embedded microprocessors with memory requirements for fixed data such as look-up tables or programs.
- Electronic devices often use microprocessors to perform a variety of tasks ranging from the very simple to the very complicated.
- a device such as an electronic communication device, a remote control device, other mobile devices, etc.
- the programs corresponding to those functions are usually, once development is complete, stored in a Read Only Memory (ROM).
- Other fixed data may also be stored in ROM, such as a look-up table for values of some function required for computation, such as an exponential or logarithm function, or trigonometric functions.
- a low cost and compact form of ROM is Mask Programmed ROM.
- the address is decoded to provide a logic level on one of a number of signal lines corresponding to the address.
- the address lines cross the bit lines corresponding to the desired output word, and a transistor is placed at the crossing of every address line and bit line where a binary "1" is the desired bit output for that address.
- Such a ROM is in fact equivalent to a gigantic OR-gate for each output bit, determining it to be a "1" if address A1.OR.A2.OR.A3.... is active, else determining it to be a "0".
- the transistors used to indicate a stored '1' provide the inputs to each multi-input OR gate.
- Figure 1 shows such a ROM, wherein a word of length 32 bits is stored against each of 1024 addresses.
- the pattern of Figure 1 may be repeated to form successive rows of 1024 words, and row-select lines are provided to enable output from a selected row, which, together with activation of a particular column address line, selects a particular word.
- the transistors are placed by means of a custom-designed diffusion and/or contact and/or metallization mask used during the fabrication process, adapted for the desired bit pattern. Since mask-making is expensive, this is used only when the ROM contents are expected to be fixed for high volumes of production units.
- Some solutions that could give the advantages of Masked ROM while preserving some flexibility comprise storing mature sections of program, i.e., subroutines, or fixed tables for mathematical functions or character set displays in Masked ROM, but linking and invoking them by a smaller program in reprogrammable memory. That way, if a Masked-ROM routine needs to be replaced, the replacement routine only need be placed in reprogrammable memory and the Masked-ROM version bypassed, a process known as "patching".
- the area advantages of Masked ROM have still not been sufficient to encourage widespread use of this technique.
- SUMMARY Exemplary embodiments of the present invention comprise a method for compressing data for storage in a memory.
- the method comprises forming a set of values based on a monotonically ordered series of look-up table values. For one or more paired values in the set, the exemplary method generates a difference of the values in each pair, then modifies the set by replacing one value in each pair with a value based on that difference; according to one embodiment, the remaining value in each modified pair is unaltered. After this first iteration, final values based on the modified set are stored in memory. Alternatively, additional iterations may be performed to further compress the data storage.
- the exemplary method forms pairs between the unmodified values and forms pairs between the modified values, and generates new differences of the values in the new pairs.
- the exemplary method modifies the current set by replacing one of the values in the pairs with a value based on the differences. This process repeats until the predetermined number of iterations has been completed.
- final values based on the modified set of values produced by the final iteration are stored in memory.
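The iterative pairing-and-differencing described above can be sketched in Python as follows. This is an illustrative behavioral model only, with assumed function names and an assumed in-place array layout; it is not the patent's circuit, but it implements the described rule that each iteration pairs unmodified values with unmodified values (and differences with differences) and replaces one member of each pair with their difference:

```python
def compress(values, iterations):
    # Iteration k pairs elements 2**k apart within blocks of 2**(k+1):
    # the first member of each pair is kept, the second is replaced by
    # the difference within the pair. len(values) is assumed to be a
    # multiple of 2**iterations.
    out = list(values)
    for k in range(iterations):
        step = 2 ** k
        for i in range(0, len(out), 2 * step):
            for r in range(step):
                out[i + step + r] -= out[i + r]
    return out

def decompress(stored, iterations):
    # Inverse: undo the iterations in reverse order with integer adds,
    # so reconstruction is bit-exact.
    out = list(stored)
    for k in reversed(range(iterations)):
        step = 2 ** k
        for i in range(0, len(out), 2 * step):
            for r in range(step):
                out[i + step + r] += out[i + r]
    return out
```

Because differences of a smooth table are much smaller than the table values themselves, the replaced entries need fewer stored bits, which is the source of the compression.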
- the remaining values in the modified pairs are replaced with values generated based on the sum of the values in the pair. After this first iteration, concatenating the sum and difference of each pair generates the final values.
- additional iterations may be performed to further compress the data storage.
- the exemplary method forms pairs between the sum values and forms pairs between the difference values, and generates new sums and differences of the values in each of the new pairs. This process repeats until the predetermined number of iterations has been completed. After completing the final iteration, concatenating the sums and differences corresponding to the pairings of the final iteration generates the final values.
- a ROM storing 2^N1 quantities of word length L1 is compressed from a total size of L1 x 2^N1 bits to a ROM that stores 2^N2 words of L2 bits, a total of L2 x 2^N2 bits, where L2 > L1 and N2 < N1, by use of an inventive algorithm.
- One application is for providing a look-up table for a regular function F(x) where the table is addressed by a binary bit pattern corresponding to x and outputs the value F pre-stored at that location.
- a first order compression replaces alternate values of the function with their delta values from the preceding address.
- each pair of adjacent values may be replaced by their sum, with the least significant bit (LSB) discarded, and their difference.
- a 2nd-order algorithm can comprise repeating the compression procedure, taking as input the array of even-address values from the first stage, and separately the odd-address delta-value array, thereby obtaining a function value, two first-order differences and a second-order difference in replacement for four function values.
- the second order differences are usually even smaller than the first order differences, and therefore can be stored using even fewer bits than the first order differences, achieving further compression.
- a systematic procedure is disclosed for constructing compression algorithms of higher order using a Fast Walsh Transform based on a modified Butterfly operation in which the LSB of the sum term is discarded. Values are reconstructed using the inverse transform upon reading the ROM.
- a second implementation stores blocks of numerically adjacent values by storing a common or nearly common most significant bit (MSB) pattern and different LSB patterns, thus reducing the average number of bits per value stored. It can allow more convenient block sizes, i.e., regular block sizes, if the common most significant bit patterns in a block are allowed to differ by up to + or -1. Extra bits associated with the least significant parts then determine if the common most significant part must be incremented or decremented for a particular retrieved value, or not.
- the second implementation allows simpler reconstruction than the first implementation.
- the present invention also comprises memory for storing a plurality of look-up table values, where each look-up table value is associated with an address comprising a plurality of address symbols.
- One exemplary memory comprises a decoder, an encoder, and one or more patterns of crisscrossed interconnect lines.
- the decoder generates an enable signal for one of the decoder outputs by decoding one or more of the plurality of address symbols, while the encoder generates an output word based on the enable signal.
- the patterns of crisscrossed interconnect lines connect each of the decoder outputs to an encoder input.
- some aspects of the present invention form the patterns of crisscrossed interconnection lines using one or more planar layers of conductor tracks vertically interleaved with isolating material to utilize the vertical dimension of the memory.
- One exemplary use for this memory allows compressing memory size for storing values of an irregular function, or even storing random values, such as computer program instructions.
- the inventive algorithm operates on the function or data values in a sorted numerical order, which can be achieved by permuting address lines on the chip in a fixed pattern.
- the compression procedures described for regular functions can then be used to replace certain absolute data values with their deltas relative to the closest absolute data value stored anywhere, using fewer bits to store the deltas than to store absolute data values.
- each OR gate input requires one transistor, and is thus equivalent to a memory bit in terms of silicon area occupied.
- Reconstruction of an addressed value that has been compressed using an algorithm of order M comprises reading a word of length L2 bits whenever one of M adjacent address lines is activated. Then, its constituent parts comprising a base value, first order and higher differences are combined according to which of the address lines is active.
- Each group of M address lines may be combined using an M-line to log 2 (M)-line priority encoder to obtain log 2 (M) bits for systematically enabling reconstructive combinatory logic.
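The M-line to log2(M)-line priority encoder can be modeled behaviorally as below. This is a hypothetical Python sketch for illustration (the patent describes a gate-level circuit, not software):

```python
def priority_encode(lines):
    # Returns the binary index of the highest-numbered active line.
    # With M grouped address lines, the log2(M)-bit result is what
    # systematically enables the reconstructive combinatory logic.
    for i in reversed(range(len(lines))):
        if lines[i]:
            return i
    return 0  # no line active
```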
- ROM silicon area can be reduced typically by a factor between 2 and 5 depending on the order of the algorithm and the amount of reconstructive combinatorial logic used. Bit-exact reconstruction always occurs, as is of course required when the compressed data is a computer program.
- compression of random ROM contents is achieved by sorting the data into numerical order by permuting address lines. Storing the same number of bits of information with fewer actual bit elements in the ROM than information bits is possible because information is effectively stored within the permutation pattern.
- Figure 1 illustrates a mask programmed Read Only Memory (ROM).
- Figure 2 illustrates a plot of one exemplary regular function.
- Figure 3 illustrates one exemplary compression algorithm.
- Figure 4 illustrates exemplary reconstruction logic.
- Figure 5 illustrates a plot comparison between a true monotonic function and an exponential approximation.
- Figures 6A - 6D illustrate one exemplary process for eliminating a first layer of a Butterfly circuit for reconstructing stored data.
- Figure 7 illustrates a conventional MOS transistor.
- Figure 8 illustrates the construction of one exemplary memory circuit.
- Figures 9A and 9B illustrate one exemplary row address decoder.
- Figure 10 illustrates one exemplary process for arranging stored data in a sorted order.
- Figure 11 illustrates one exemplary implementation for a memory using first-order compression of random data.
- Figures 12a - 12C illustrate one exemplary implementation for address decoding.
- Figure 13 illustrates one exemplary priority encoder.
- Figures 14A and 14B compare exemplary encoder and decoder combinations.
- a plot of this function for base-2 is shown in Figure 2, for values of x between 0 and 16383 divided by 512, i.e., values of x with nine bits to the right of the binary point and 5 bits to the left of the point.
- the function is plotted versus the complement of 512x, denoted by X_M, which ranges from 0 to 32 and gives a monotonically increasing function value as the argument X_M is increased.
- base-e may be used, in which case similar accuracy is achieved with 24 binary places.
- the uncompressed look-up table for the latter would comprise 16384, 24-bit words, a total of 393216 bits.
- the values of the words should be rounded to the nearest LSB, a fact which will have a significance described later. However, whatever those values are, any compression and decompression algorithms applied to these look-up table values are required to reproduce the desired rounded values in a bit-exact manner. For values of x greater than 17.32, the function is zero to 24-bit accuracy in the base-e case, and is zero in the base-2 case when x > 24.
- a first order compression algorithm comprises storing 24-bit values only for even address values of X_M, and in an odd address, storing the delta-value from the previous even value.
- Direct computation for the exemplary function shows that the highest delta encountered is 16376 (times 2^-24), which can be accommodated within a 14-bit word. Therefore, the ROM can be reconfigured as 8192 words of 24 bits and 8192 words of 14 bits. Only the 24-bit value is required for an even address X_M, and both the 24-bit and the 14-bit value are required to be read and added for an odd address X_M. The ROM has thus been compressed from 16384 x 24 = 393216 bits to 8192 x (24 + 14) = 311296 bits.
- a second order algorithm can be derived by applying the process of replacing pairs of adjacent values by a value and a delta-value to the first array of 8192 24-bit values and separately to the second array of 8192 14-bit delta-values.
- the first application yields delta-values two apart in the original function, which are roughly double the single-spaced deltas, so requiring 15 bits of storage;
- the second application of the algorithm to the array of 14-bit first-order deltas yields second-order deltas, which theoretically should be in the range 0 to 31, requiring 5 bits.
- due to rounding, however, the range of the second-order deltas increases to -1 minimum to 33 maximum, requiring 6 bits of storage. This arises because one value may be rounded down while the adjacent word is rounded up, so increasing (or decreasing) the delta by 1 or 2.
- Reconstruction of a desired value comprises addressing the 4096x59-bit ROM with the most significant 12 bits of X_M to obtain the four sub-words of length 24, 15, 14, and 6 bits respectively; then the least significant 2 bits of X_M are used to control reconstruction of the desired value from the four sub-words as follows, denoting the 24, 15, 14, and 6-bit values by F, D2, D1, and DD respectively:
- X_M LSBs 00: the desired value is F
- X_M LSBs 01: F + D1
- X_M LSBs 10: F + D2
- X_M LSBs 11: F + D1 + D2 + DD
- Forming F + D1 requires a 14-bit adder
- forming F + D2 requires a 15-bit adder
- forming D2 + DD requires a 6-bit adder. Therefore, 35 bits of full adder cells are needed to reconstruct the desired value.
- the 6-bit adder adds either 1 or DD to D2 according to the LSB of X M , and in general could add a value DD0 or DD1 , to allow any range of DD from negative to positive to be accommodated.
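The four-way selection above can be sketched as follows. The stored quantities F, D2, D1 and DD are taken from the text; the helper name and the numeric test values are invented for illustration only:

```python
def reconstruct(f, d2, d1, dd, lsb2):
    # lsb2 = the two least significant bits of the address X_M
    if lsb2 == 0b00:
        return f
    if lsb2 == 0b01:
        return f + d1
    if lsb2 == 0b10:
        return f + d2
    return f + d1 + d2 + dd  # lsb2 == 0b11
```

With F = f0, D1 = f1 - f0, D2 = f2 - f0 and DD = (f3 - f2) - (f1 - f0) for four adjacent table values, the four cases reproduce f0 through f3 exactly, using only the integer adds counted in the text.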
- Figure 3 depicts this algorithm for compression by iterating the process of taking differences between adjacent values.
- the desired values to be stored are initially the 16384 24-bit values denoted by f0, f1, f2, f3, and so on to f16383.
- the differences are computed to be at maximum 14-bit-long values, i.e., 1024 times smaller than the original 24-bit values.
- the function values are represented by 8192 pairs of values, one being of 24-bits length and the other being of 14 bits length.
- every 2nd even value, i.e., every 4th one of the original values, now remains unchanged as a 24-bit value, but the even values in between, such as f2 - f0, have become 15-bit differences.
- Every alternate d-value likewise remains an unchanged 14-bit value while those in between, such as d6 - d4, have been reduced to 6-bit second-order differences.
- the original 16384 24-bit values have been compressed to 4096 sets of values comprising a 24-bit, a 15-bit, a 14-bit and a 6-bit value.
- Figure 4 shows the reconstruction logic for a second-order difference compression algorithm.
- the original method of storing 16384, 24-bit words shown in the upper picture comprises a look up table to which a 14-bit address is applied.
- the lower picture depicts the second-order difference compression technique in which the look-up table has been reduced to 4096, 59-bit values.
- the 59 bits, comprising a 24-bit absolute value, a 15-bit double-spaced difference, a 14-bit single-spaced difference and a 6-bit 2nd-order difference, are applied to respective adders as shown.
- the first column of adders is enabled by the bit of second least significance: if that bit is a '1', the 15-bit difference is added to the 24-bit value and the 6-bit second-order difference is added to the 14-bit difference; else, if the enabling bit is a '0', the 24-bit value is passed without adding the 15-bit value and the 14-bit value is passed without adding the 6-bit value.
- the correction is a second order difference represented by D1 - (D2/2), which, by direct calculation is found to occupy the range -1 to 8, requiring 4 bits of storage.
- this 2nd-order difference is nearly equal to 1/4 of the value DD. Therefore it may be replaced by the difference from DD/4, which by direct calculation has the range -1 to +1, requiring 2 bits of storage, and is effectively a 3rd-order difference, which may be denoted DDD.
- the stored values comprise 4096 sets of four values as follows:
- a 24-bit value: F(4i)
- a 15-bit value: D2(4i) = F(4i + 2) - F(4i)
- a 6-bit value: DD(4i) = (F(4i + 3) - F(4i + 2)) - (F(4i + 1) - F(4i))
- a 2-bit value: DDD(4i) = INT(D2(4i)/2) - (F(4i + 1) - F(4i)) - INT(DD(4i)/4)
- f0 is replaced with the sum s0 = f0 + f1
- f1 is replaced with f1 - f0
- a circuit that computes the sum and difference of a pair of values is called a butterfly circuit. All of the sums are arranged adjacently followed by all the differences. The differences as already seen will be shorter words than the sums.
- the process is then repeated on the block of sum values, and then separately on the block of difference values, each half the length of the original, to obtain a second order algorithm and so on for higher order algorithms.
- the highest order algorithm for 16384 values is 14. At that point the blocks have only one value each and thus cannot be compressed further.
- the 14th order algorithm is in fact a 16384-point Walsh Transform, and lower order algorithms stop at the intermediate stages of the full Walsh Transform.
- the sum and difference values associated with the final iteration are concatenated to form the final compressed values stored in memory. It will be appreciated that the final iteration may be any selected iteration, and therefore, is not required to be the 14th iteration.
- the word length can increase by one bit.
- the sum and the difference always have the same least significant bit, which can therefore be discarded on one of them without losing information.
- by retaining only INT(sum/2), i.e., discarding the sum's LSB, the sum is prevented from growing in length.
- the difference is added to or subtracted from the sum and its LSB is used twice by feeding it into the carry/borrow input of the first adder/subtractor stage to replace the missing sum LSB.
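The modified Butterfly and its inverse can be sketched as below (an illustrative Python model with assumed variable names), showing why the discarded sum LSB is always recoverable: the sum and the difference of two integers share the same least significant bit.

```python
def mod_butterfly(a, b):
    # a + b and a - b always have the same least significant bit, so the
    # sum's LSB may be discarded without losing information.
    return (a + b) >> 1, a - b

def inv_butterfly(s, d):
    # Restore the discarded sum LSB from the difference's LSB, analogous
    # to feeding it into the carry/borrow input of the first adder stage.
    total = (s << 1) | (d & 1)
    return (total + d) // 2, (total - d) // 2
```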
- The bit count in parentheses is that obtained when a Walsh Transform using the standard Butterfly operation is used.
- the inferior compression of the standard Walsh Transform is partly due to growth of the word lengths of sum terms, and partly due to an amplification of the rounding errors in the original table values when higher-order differences are formed, which is less noticeable when the modified Butterfly is used and the LSBs of the sum terms are discarded.
- bit-exact reconstruction of the original values is achieved when the appropriate inverse algorithm is applied.
- the look-up table reduces to a single word 32085 (23188) bits wide, which is less than 2 bits per table value on average; yet each value can be reconstructed to 24-bit (23-bit) accuracy.
- since a single word is no longer a table and needs no address, as it will always be "read", the values can just be hard-wired into the inputs of the reconstruction adders.
- 4 levels of reconstruction adders are required and the number of single-bit adder cells required is around 65000. Because an adder cell is larger than a memory bit, this does not represent the minimum silicon area. The minimum area is achieved at some compromise between the number of bits stored and the number of adder cells required for value reconstruction.
- This function is an approximation to the above exemplary function which can suffice over a portion of the argument range. It is useful to use this approximation over the whole range and to store only corrections needed, which have a shorter word length.
- the bit patterns for this base-2 exponential function repeat with a shift when the argument changes from 0.X_2 to 1.X_2, 2.X_2, etc., so only the bit patterns for the principal range, denoted by 0.X_2 where X_2 is a 9-bit value, need be synthesized.
- the following table gives the total number of bits required to be stored to synthesize the function, rounded to 23 bits after the binary point, for values of the argument X_2 from 0 to 511.
- the final row constitutes a machine for computing the function without a memory, using only controlled adders with fixed, hardwired inputs.
- the differences between the desired values of F_a and F_s and the exponential function are plotted for the base-2 case in Figure 5. These differences can be stored in a look-up table that occupies less space than the original functions, the chip area being indicated approximately by the triangular area under the curve to the bottom right.
- Figure 6A shows a conventional butterfly circuit for adding or subtracting the same 14-bit value to/from the 24-bit value to produce one of two alternative values according to the SELECT control line.
- Figure 6B shows that the butterfly circuit can be simplified into a 14-bit adder for adding or subtracting the 14-bit value to/from the 14 LSBs of the 24-bit value to produce one of two alternative 14 LSBs of the result, plus a carry or borrow bit from the add or subtract operation which is combined with the 10 MSBs of the 24-bit value to produce one of two alternative 10-bit MSB patterns.
- the MSB patterns may be the same if no borrow or carry is generated by the 14-bit add/subtractor.
- the two alternative 14-bit results are simply pre-stored: that corresponding to the difference (subtract) operation is stored along with the 10 MSBs in place of the 24-bit value, while that corresponding to the sum result is stored in place of the 14-bit value.
- a 15th bit is needed to indicate whether the 10 MSBs are the same or incremented by 1 for the sum case.
- the 15th bit is always a zero in the difference case and is shown as an input to the selector alongside the 14 LSBs from the 24-bit word. If zeros are implemented in memory as the absence of a transistor, the column of zeros takes up no silicon area.
- the carry propagator is still needed in Figure 6C to increment the 10 MSBs if necessary.
- the selector in Figure 6C does not need to be explicitly provided. It is inherent in memories that the address bits select one or other of a number of stored values. Therefore the two alternative 14 (or 15) bit patterns are simply selected by the select line, which is the least significant bit of the 14-bit address. The 10 MSBs are addressed by the 13 MSBs of the address. Thus, the memory comprises 8192 10-bit words and 16384 14/15-bit words, a total of 319488 bits, which is 8192 bits more for the benefit of eliminating a 14-bit adder. Finally, Figure 6D shows the omission of the carry propagator by delaying incrementing the 10 MSBs with the carry bit and simply feeding forward the carry bit to be used at a convenient opportunity in a subsequent adder.
- the division of the 24-bit word into a 10-bit part and a 14-bit part in Figure 6 arises from the exemplary function exhibiting a delta between even and odd addressed values that is never more than a 14-bit value.
- the 10 MSBs are always the same or differing only by 1 between an even and adjacent odd-addressed value.
- the difference between values 2 apart is never more than a 15-bit value
- the difference between values 4 apart is never more than a 16-bit value. Therefore, a group of 4 (or even 5) adjacent values have the same most significant 8 bits except for a possible increment of 1, while having different least significant 16-bit parts.
- an alternative realization is to arrange the memory as 4096 8-bit values and 16384 17-bit values, the 17th bit indicating whether the 8 MSBs have to be incremented by 1 or not. This reduces the number of bits to 311296.
- a table showing the number of bits of memory for different splits is shown below.
- the table indicates there is an optimum split which minimizes the number of bits.
- This process can now be iterated by recognizing that the LSB patterns within each group are monotonically increasing values. For example, in each group of 64 values in the last case, no two adjacent values can differ by more than a 14-bit value. Therefore these values can be divided into 32 groups of 2, storing for each pair 7 MSBs (which may have to be incremented by 1) and two alternative 14- or 15-bit LSB patterns, the 15th bit indicating if the 7 MSBs should be incremented by 1. The different ways to represent groups of 64 21-bit values are shown below, with the overall number of memory bits implied.
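The MSB/LSB block-split storage can be sketched as follows. This is a hypothetical Python model: the helper names and the split parameters are illustrative, and it assumes, as the text requires, that the MSB patterns within a block differ by at most 1:

```python
def split_store(values, lsb_bits, block):
    # For each block of numerically adjacent values, store one common MSB
    # pattern plus, per value, its LSBs and a 1-bit increment flag used
    # when that value's true MSBs are one above the common pattern.
    stored = []
    for i in range(0, len(values), block):
        grp = values[i:i + block]
        base = min(v >> lsb_bits for v in grp)  # common MSB pattern
        entries = [((v >> lsb_bits) - base, v & ((1 << lsb_bits) - 1))
                   for v in grp]                # (increment flag, LSBs)
        stored.append((base, entries))
    return stored

def split_read(stored, index, lsb_bits, block):
    base, entries = stored[index // block]
    inc, lsbs = entries[index % block]
    return ((base + inc) << lsb_bits) | lsbs
```

The saving comes from storing the common MSB part once per block instead of once per value, at the cost of one extra flag bit per value.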
- Figure 7 shows the construction of a MOS transistor.
- a conducting gate electrode is made of a conducting material with a high melting point so as to withstand processing temperatures, and is separated from the silicon substrate by a thin insulating layer, generally of silicon dioxide.
- the gate and its underlying oxide insulation are generally produced by covering the whole substrate with gate structure first and then etching away except where gates are desired. Then drain and source electrodes are formed by implanting impurities on either side of the gate, the gate acting as its own mask to ensure that drain and source adjoin the gate exactly. The implanted impurities have to be allowed to
- FIG. 8 shows a ROM constructed in this way. Alternating stripes of source and drain diffusion form a multitude of transistors with their sources and drains paralleled. The transistors are located where the gate structures have not been etched away. A transistor is placed where a binary 1 is desired to be stored, and no transistor is placed where a binary 0 is to be stored. Actually in Figure 8, a transistor is shown as comprising a gate from the source line above the drain line and a gate from the source line below the drain line. Both are enabled by the same gate signal and thus are in effect two transistors in parallel.
- the number of separate drain lines corresponds to the number of bits in a word. All of the source lines for the same word are connected at the left hand side to a word row enable signal, which is pulled to ground to read a word in that row. Different columns of transistors correspond to different words in the row.
- An address decoder accepts a binary address input and puts an enable signal on all the gates in one column, thus enabling all the bits of a particular word to the drain lines.
- the drain line is pulled down by a turned-on transistor if the word contains a one in that position, or else is not pulled down if a transistor is absent, corresponding to a 0 in the word.
- Multiple rows of words can be constructed and the address lines run top to bottom through all rows.
- a cascadable building block comprises two MOS transistors with their sources commoned to the "cascoding input" C.
- An address line is connected to the gate of one transistor and its drain develops a complementary address signal, which is connected to the gate of the other transistor. The result is that either one transistor or the other conducts between source and drain depending on whether the address line is high or low.
- FIG. 9B A cascade of this building block is shown in Figure 9B, in which the drain connections DR1 and DR2 of a first device controlled by address bit A1 are connected to the source connections of two identical building blocks higher in the chain which are both controlled by address bit A2. Their drain connections are in turn connected to pairs of building blocks that are all controlled by address bit A3 to present 8 drain connections, and so forth.
- an N-bit address bit pattern is decoded by the binary tree of building blocks to produce a conducting path from one of 2^N final drain connections to the first source connection annotated "Chip Enable".
- FIG. 10 shows how a memory containing random values can be arranged in order of numerical value. The words are simply sorted and arranged in numerical order, while remaining connected to the same output line of the address decoder.
- FIG. 11 illustrates the implementation of a memory for first-order compression of random data using a decoder, crisscross line pattern, and an encoder. Every alternate full word, in value-sorted order, has been removed compared with Figure 10.
- the preceding word to the removed word is now addressed by its own address as well as the address of the omitted word that was adjacent to it by including an OR gate to OR the two addresses.
- the address line of the omitted word now enables a shortened delta-word, which is to be added to the preceding word by the reconstruction logic in order to recreate the omitted word.
- the number of bits of memory has been reduced to a full word and a delta word for every pair of original full words at the cost of a two input OR gate per pair.
- a two-input OR-gate is at maximum 4 transistors, which can be counted as four bits. This should be added to the delta-word length in estimating the amount of silicon area saved.
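The Figure-11 arrangement can be sketched behaviorally as below. This is an illustrative Python model under stated assumptions: the fixed address permutation (wired on chip) is represented as a lookup, the stored words are assumed distinct (duplicates are handled separately in the text), and the function names are invented:

```python
def compress_sorted(words):
    # Sort the stored words into numerical order; the permutation from
    # original address to sorted position is fixed wiring on the chip.
    # Keep full words at even sorted positions, deltas at odd positions.
    order = sorted(range(len(words)), key=lambda a: words[a])
    s = sorted(words)
    full = s[0::2]
    delta = [s[i] - s[i - 1] for i in range(1, len(s), 2)]
    return order, full, delta

def read(order, full, delta, address):
    pos = order.index(address)   # position after the fixed permutation
    v = full[pos // 2]
    if pos % 2:                  # odd position: add the short delta word
        v += delta[pos // 2]
    return v
```

Because the words are value-sorted, adjacent entries are numerically close, so each delta word is much shorter than a full word; this is the saving the OR-gate overhead is weighed against.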
- the address-line crisscross pattern also requires chip area.
- the crisscross connection pattern may run over the top of the area occupied by the full-words using a suitable number of metallization layers insulated or mutually isolated from each other. Such layers may be fabricated using photo-lithographic techniques. For the purposes of this invention it is assumed that an essentially unlimited number of metallization layers can be used, and that insulative layers isolate the crisscrossed interconnection lines and/or patterns of crisscrossed interconnection lines from unwanted contact. Indeed, using layer upon layer of interconnect patterns, in order to store more data using the same chip area, exploits the vertical dimension. One has to be careful in extending the concept of higher order difference algorithms to random data, as differences of greater than first order may show no meaningful regularity.
- the logical OR of 4 addresses must then address the value of f0.
- a four input gate is a maximum of 8 transistors.
- the silicon area occupied by this 4-input OR gate is the cost for the savings of reducing 3 full-words f1, f2, and f3 to three delta-words d1, d2, and d3.
- the 4-input OR gate is only twice as complex as the 2-input OR-gate but the memory reduction is now approximately 3 times greater. Therefore the overhead of the OR- gate has reduced in comparison to the savings achieved. Wherever two data values to be stored are identical and thus appear next to one another in value sorted order, they are replaced by a single value and no delta need be stored. Then the address line corresponding to the zero delta is not needed, and the OR of the two address lines to the single stored value may be absorbed into the last stage of address decoding by using an AND-OR gate as shown in Figure 12.
- Figure 12A shows two decoders of the Figure 9 type, each decoding, for example, 8 address bits to one of 256 address lines. Because Figure 9 has open-drain outputs, pull-up transistors (or resistors) have to be added to reset the outputs to a known state between memory reads. The 256 lines of one decoder (A0 ... A255) are then crossed over the 256 lines (B0 ... B255) of the other to give 65536 crossing points. When an individual one of the 65536 lines must be decoded, a 2-input AND gate is placed at that junction, as shown in Figure 12B, giving an address decoder complexity of 4 transistors per output line, as the number of transistors in the two 8:256-line decoders is negligible in comparison.
- Figure 12C shows how the 8 transistors of two AND gates can be reconnected to form an AND-OR gate using no extra transistors. An estimate of the savings possible using the above technique was made by analyzing some typical DSP programs used in mobile devices.
- a first program comprising 8464 16-bit instructions was determined to comprise only 911 different 16-bit values. Clearly many values are repeated multiple times, and the above technique of storing each value only once and using AND-OR address decoding can save approximately a factor of 9 in chip area without needing to use any delta values.
- a second program of 2296 words was found to comprise only 567 different 16-bit values, also allowing a significant compression factor. The combined programs compress from 10760 words to 1397 words.
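The arithmetic behind the quoted figures can be checked directly. A short Python sketch, using only the numbers stated in the text:

```python
# Dedup compression figures quoted for the two analyzed DSP programs.
prog1_total, prog1_distinct = 8464, 911   # first program
prog2_total, prog2_distinct = 2296, 567   # second program

combined_total = prog1_total + prog2_total   # 10760 words
combined_distinct = 1397  # distinct values across both programs (given;
                          # less than 911 + 567 because values are shared)

factor1 = prog1_total / prog1_distinct       # ~9.3x for program one
combined_factor = combined_total / combined_distinct  # ~7.7x combined
```

Note that the combined distinct count (1397) is smaller than the sum of the per-program counts (1478), reflecting 16-bit values common to both programs.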
- the 4 transistors per address used in AND-OR address decoding were already required in the conventional memory's address decoder, and therefore do not add complexity.
- a more accurate estimate of the size reduction is to add the 4 transistors per address of the address decoder to the 16 transistors per word of the memory array, obtaining 20 transistors per word for the conventional memory; compression by the factor of 9 reduces this to 4 + 16/9 transistors per word, which represents approximately a factor of 3 reduction.
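As a sanity check of this estimate, the transistor accounting can be written out. This is a sketch of the arithmetic only, taking the text's assumed figures (4 decoder transistors per address, 16 array transistors per 16-bit word, a roughly 9x dedup factor):

```python
DECODER_PER_ADDRESS = 4    # AND-OR address decoding, per output line
ARRAY_PER_WORD = 16        # one transistor per bit of a 16-bit word
DEDUP_FACTOR = 9           # distinct-value compression from the analysis

conventional = DECODER_PER_ADDRESS + ARRAY_PER_WORD    # 20 per word
compressed = DECODER_PER_ADDRESS + ARRAY_PER_WORD / DEDUP_FACTOR
reduction = conventional / compressed                  # roughly 3.5x
```

The decoder cost is unchanged by compression, which is why the overall reduction (about 3x) is smaller than the 9x shrinkage of the array alone.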
- when the number of words to be stored exceeds the number of possible values, for example when storing 1 megaword of 16-bit values, it is evident that it is never necessary to store more than 65536 values, using AND-OR address decoding to combine all addresses at which the same word is to be stored to enable that particular 16-bit word.
- a memory of this type benefits from the use of more overlapping interconnecting layers to handle the complex interconnection pattern of AND-OR addressing.
- the technique is thus a method of using the vertical dimension on chips (more interconnecting layers) to realize more efficient memories. In the latter case, the 65536 possible words do not even need to be stored.
- a device called a priority encoder can be used as a 65536:16 line encoder, which is the inverse function of a 16:65536 line address decoder.
- a priority encoder can be constructed with at most 6 transistors per input, which is less than the 16-bits per word otherwise needed.
- a priority encoder is shown in Figure 13. A set of N inputs numbered 0 to N - 1 is divided into a first half 0 to N/2 - 1 and a second half N/2 to N - 1.
- Pairs of corresponding lines are selected from the first and second halves, for example lines 0 and N/2 as shown, to be ORed together.
- the NOR gate gives a logic '0' out if either of the inputs is a '1'.
- the N-type transistor conducts if the lower input was a '1' and connects the upper input, which must then be a '0', to output B.
- the P-type transistor conducts if either input was a '1', and connects the upper input to the output B if the upper input was a '1'. Therefore, output B has the polarity of the upper input if either of the inputs was a '1'; otherwise output B is open circuit.
- all the B outputs are paralleled, forming a wired OR that gives a '1' out if the active input line lay in the upper half of the inputs, else a '0'. This is the most significant bit of the desired encoding.
- the process is then repeated for the N/2 outputs A0 to A(N/2 - 1) to produce the second most significant output bit of the desired encoding, and so forth.
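The halving procedure described above can be captured in a few lines of Python. This is a behavioral sketch of the encoder, not the transistor circuit: each stage emits one output bit saying which half held the active line, then ORs the two halves together to form the next stage's inputs.

```python
def encode_one_hot(lines):
    """Binary-encode a one-hot line vector by repeated halving.
    Each iteration models one stage of the priority encoder:
    the wired-OR B output becomes one bit of the result, and the
    A outputs (lower half OR upper half) feed the next stage."""
    bits = []
    while len(lines) > 1:
        half = len(lines) // 2
        lower, upper = lines[:half], lines[half:]
        bits.append(1 if any(upper) else 0)            # output B
        lines = [a | b for a, b in zip(lower, upper)]  # outputs A
    return bits  # most significant bit first

# One-hot vector with line 5 of 8 active encodes to binary 101:
lines = [0] * 8
lines[5] = 1
encode_one_hot(lines)  # -> [1, 0, 1]
```

As in the hardware, log2(N) stages suffice, and the work per stage halves, matching the small per-input transistor count claimed in the text.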
- the combination of an address decoder and its inverse, a priority encoder, simply reproduces the input, as shown in Figure 14.
- any 1:1 mapping, also known as a substitution box (S-box) or an information-lossless coding, can be produced by permuting the lines between the decoder and the encoder.
- when some addresses are AND-ORed, implying that some output values are missing, a many-to-one mapping or information-lossy coding is produced.
- any read only memory having an output word length equal to the address word length can be produced in this way.
- the information is evidently stored in the address-line crisscross or permutation pattern, as that is the only difference between Figures 14A and 14B. Using an estimated 4 transistors per line in the address decoder and 6 in the priority encoder shows a saving for memory sizes above 1024 10-bit words.
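The decoder-permutation-encoder structure can be modeled to show that the permutation pattern alone is the stored data. The sketch below covers the 1:1 (lossless) case of Figure 14; the function name and the example table are illustrative assumptions:

```python
def rom_read(address, perm, n_bits):
    """Model of the Figure 14 structure: decode the address to a
    one-hot line, route it through the crisscross wiring `perm`
    (where the data actually resides), then priority-encode the
    single active line back to binary."""
    N = 1 << n_bits
    lines = [0] * N
    lines[address] = 1            # address decoder: one-hot output
    crossed = [0] * N
    for i, active in enumerate(lines):
        if active:
            crossed[perm[i]] = 1  # crisscross interconnect pattern
    return crossed.index(1)       # priority encoder: line -> binary

# Hypothetical 3-bit example: the permutation IS the table contents.
table = [4, 1, 7, 0, 3, 6, 2, 5]
rom_read(2, table, 3)  # -> 7
```

Reading any address simply returns `perm[address]`, confirming that a read-only memory with output width equal to address width needs no storage cells at all, only wiring.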
- many interconnecting patterns can be produced, all sharing the same priority encoder and address decoder, provided there is a method of selecting the desired interconnecting pattern.
- a series pass-switch per line, comprising a P-type and an N-type transistor in parallel as shown in Figure 13, may be used for this.
- a number M of interconnecting patterns may be selected, and a memory of size M·2^N N-bit words may be created using 2 + 10/M transistors per word, showing increasing efficiency as the number of layers increases.
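The scaling claim is easy to verify numerically. A sketch of the cost model, under the text's assumed figures (2 pass-switch transistors per word, with the roughly 4-transistor-per-line decoder and 6-transistor-per-line encoder shared across all M patterns):

```python
def transistors_per_word(M):
    """Per-word transistor cost with M stacked interconnect patterns:
    each word pays for its own 2-transistor pass switch, while the
    shared decoder (~4/line) and encoder (~6/line) are amortized."""
    return 2 + 10 / M

# Cost falls toward the 2-transistor floor as layers are added:
# M=1 -> 12.0, M=5 -> 4.0, M=10 -> 3.0
```

The asymptote of 2 transistors per word is well below the 16 transistors per word of a conventional 16-bit memory array, which is the sense in which adding layers keeps increasing efficiency.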
- single-transistor pass switches could be used, for example by creating a power supply of -Vt (for a P-type switch) or Vcc + Vt (for an N-type switch) to ensure that the switch conducts for both a logic '1' and a logic '0' level.
- tables of stored data that represent monotonic functions can be compressed by storing only one base value per group of numerically adjacent values, plus the deltas from the base value for the other values in the group.
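This base-plus-delta grouping can be sketched in Python. The group size and table values below are illustrative assumptions; the point is that each group stores one full-width base and several narrow deltas, and reconstruction is a single addition per value:

```python
def compress_monotonic(values, group):
    """Store one base value per group of numerically adjacent values,
    plus the (small, non-negative) deltas from that base."""
    out = []
    for i in range(0, len(values), group):
        base = values[i]
        deltas = [v - base for v in values[i + 1:i + group]]
        out.append((base, deltas))
    return out

def decompress(groups):
    """Rebuild the full table: each stored value is base + delta."""
    values = []
    for base, deltas in groups:
        values.append(base)
        values.extend(base + d for d in deltas)
    return values

# Hypothetical monotonic table, groups of 4:
table = [100, 102, 103, 107, 110, 111, 115, 118]
assert decompress(compress_monotonic(table, 4)) == table
```

Because the function is monotonic, the deltas within a group are bounded by the group's span, so they fit in far fewer bits than the full word width.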
- groups of values may be transformed using a Walsh transform with a modified butterfly operation. In the limiting case of the latter, any monotonic function can be synthesized with an array of adders with fixed inputs, the memory table having disappeared entirely.
- an alternative to storing deltas was disclosed, comprising storing the LSB part of the precomputed result of adding the delta, plus an extra bit to indicate whether a carry to the MSB part is required, thus eliminating the majority of the reconstruction adders.
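The precomputed-LSB variant can also be sketched briefly. The helper names and the split point are illustrative assumptions; the idea is that the low bits of the final value are stored directly, and one flag records whether the group's base needs its high part incremented:

```python
def split_with_carry(base, value, lsb_bits):
    """Store only the low `lsb_bits` bits of the precomputed value,
    plus a carry flag: 1 if the value's high part is one greater than
    the base's high part, 0 if they are equal. Assumes deltas are small
    enough that the high parts never differ by more than one."""
    mask = (1 << lsb_bits) - 1
    lsb = value & mask
    carry = (value >> lsb_bits) - (base >> lsb_bits)
    return lsb, carry

def reconstruct(base, lsb, carry, lsb_bits):
    """No adder needed for the LSBs: just concatenate the (possibly
    incremented) high part of the base with the stored low bits."""
    high = (base >> lsb_bits) + carry
    return (high << lsb_bits) | lsb

# A delta that crosses the LSB boundary sets the carry flag:
base, value = 0x1F3, 0x205
lsb, carry = split_with_carry(base, value, 4)
assert reconstruct(base, lsb, carry, 4) == value
```

Only the occasional increment of the base's high part remains, which is why most of the reconstruction adders disappear.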
- the technique was extended to piecewise monotonic functions, where the function is monotonic for each compression group.
- the technique was then extended to random data, such as a computer program, by imagining the data to be first sorted in numerical order by permuting the address lines. When any delta is zero, neither the delta nor an address line to address it need be stored, thus allowing simplification of the OR-ing of the addresses needed for the base word. Analysis of a typical DSP program showed that the majority of the compression could be achieved using this technique alone.
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| KR1020067026117A KR101236673B1 (en) | 2004-06-04 | 2005-06-03 | Memory compression |
| CN2005800182352A CN1965486B (en) | 2004-06-04 | 2005-06-03 | Memory compression |
| EP05776097A EP1774657A2 (en) | 2004-06-04 | 2005-06-03 | Memory compression |
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US57738604P | 2004-06-04 | 2004-06-04 | |
| US60/577,386 | 2004-06-04 | ||
| US11/143,157 US8316068B2 (en) | 2004-06-04 | 2005-06-02 | Memory compression |
| US11/143,157 | 2005-06-02 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2005119920A2 true WO2005119920A2 (en) | 2005-12-15 |
| WO2005119920A3 WO2005119920A3 (en) | 2006-02-16 |
Family
ID=35276991
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/EP2005/005994 Ceased WO2005119920A2 (en) | 2004-06-04 | 2005-06-03 | Memory compression |
Country Status (2)
| Country | Link |
|---|---|
| EP (1) | EP1774657A2 (en) |
| WO (1) | WO2005119920A2 (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11163533B2 (en) | 2019-07-18 | 2021-11-02 | International Business Machines Corporation | Floating point unit for exponential function implementation |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7930170B2 (en) * | 2001-01-11 | 2011-04-19 | Sasken Communication Technologies Limited | Computationally efficient audio coder |
- 2005-06-03 EP EP05776097A patent/EP1774657A2/en not_active Ceased
- 2005-06-03 WO PCT/EP2005/005994 patent/WO2005119920A2/en not_active Ceased
Non-Patent Citations (1)
| Title |
|---|
| S.W. SMITH: "The Scientist and Engineer's Guide to Digital Signal Processing", 1999, CALIFORNIA TECHNICAL PUBLISHING |
Also Published As
| Publication number | Publication date |
|---|---|
| EP1774657A2 (en) | 2007-04-18 |
| WO2005119920A3 (en) | 2006-02-16 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| AK | Designated states |
Kind code of ref document: A2 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
|
| AL | Designated countries for regional patents |
Kind code of ref document: A2 Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
| DPEN | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed from 20040101) | ||
| DPE1 | Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101) | ||
| WWE | Wipo information: entry into national phase |
Ref document number: 200580018235.2 Country of ref document: CN |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| WWW | Wipo information: withdrawn in national office |
Country of ref document: DE |
|
| WWE | Wipo information: entry into national phase |
Ref document number: 1020067026117 Country of ref document: KR |
|
| WWE | Wipo information: entry into national phase |
Ref document number: 2005776097 Country of ref document: EP |
|
| WWP | Wipo information: published in national office |
Ref document number: 1020067026117 Country of ref document: KR |
|
| WWP | Wipo information: published in national office |
Ref document number: 2005776097 Country of ref document: EP |