WO2001061543A1 - Method for compression of small computer data files - Google Patents
Method for compression of small computer data files Download PDFInfo
- Publication number
- WO2001061543A1 WO2001061543A1 PCT/US2001/004986 US0104986W WO0161543A1 WO 2001061543 A1 WO2001061543 A1 WO 2001061543A1 US 0104986 W US0104986 W US 0104986W WO 0161543 A1 WO0161543 A1 WO 0161543A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- compression
- entries
- storing
- array
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
Definitions
- the present invention generally relates to compression of binary data for ease of transmission between a source and receiving device, and more particularly to compression of binary data files by selecting an optimum compression method from a plurality of compression methods.
- the present invention although useful for many applications, was originally designed for the purpose of reducing file sizes of data recorded at remote locations of natural gas flow, such as where the natural gas flowed from the well into a pipeline. Many natural gas wells are located in places that are difficult to reach. Prior to the development of the present invention, data was recorded and stored on site. Approximately once a month, a
- the present invention has reduced the size of the file needed to be transmitted to a point where it is more economical to use satellite communication than to visit each site.
- Computers store and use numbers in binary format (base 2), instead of the more familiar base 10 format.
- Computers typically use a 16-bit or 32-bit segment of binary numbers, regardless of the number's actual value.
- Binary digits, or bits may have a value of either 0 or 1.
- 1024 (base 10) is expressed as 00000100 00000000 (base 2 or binary).
- 1023 is expressed as 00000011 11111111 , and uses one less meaningful bit than the value of 1024.
- the zeros included in the 16-bit number before the first digit that is not zero do not have any numerical significance.
- 000001023 is identical to 1023
- 00000011 1111 1111 is identical to 11 11111111.
- the present invention uses several methods to compress the number of bits used to store data entries, either by removing unnecessary bits or by designating the number in a fashion that requires fewer bits.
- the present invention examines data arrays found in gas flow records and evaluates, for example, five different compression algorithms for each data array. The best algorithm choice depends upon the underlying characteristics of the array.
- a data sample might include time, differential pressure, static pressure, temperature, flow time, C , pressure extension, volume and energy data values.
- each of five algorithms is applied to each array and the method is chosen for each array that allows maximum compression.
- Each compression method makes use of a few compression techniques that further reduce the size of the compressed file.
- the present invention uses dynamic data field sizing throughout the compressed file to identify data values. If four bits are sufficient to positively identify a value, the present invention will use only four bits instead of the 16 bits or 32 bits used in the standard computer format. Many of the compression methods and techniques are in themselves novel, such as Ratio Compression, Strip Top Bit, and Outlier Compression. The present invention also includes novel methods for specifying time and summary data.
- Some data arrays may include many repetitions of the same number.
- the present invention searches for the most frequently repeated value in the data array, or the mode of the array.
- the present invention will include the mode only one time in the compression record. If the array has variations from the mode, the variations are treated as outliers and treated individually.
- the present invention identifies the position of outlier values and their values in the compressed file. This compression method is called Mode compression.
- Some data arrays, such as static pressure or C readings may change by small increments in relation to the value of the reading. The present invention will search for the
- Some data arrays such as temperature readings or storage tank level readings, increase or decrease regularly by relatively small amounts.
- the present invention will include the first reading in the compressed file. It will then subtract that reading from the subsequent reading and include that difference in the compressed file. This compression method is called Difference From Last.
- Some data arrays are proportional to another array or a combination of other arrays in the data file.
- volume is proportional to C, pressure extension, and flow time.
- Energy is proportional to differential pressure and static pressure.
- the present invention creates an array of estimated values and finds a ratio that when multiplied to the estimated values will minimize the range of deltas between the multiplied values and the actual values. The ratio and the deltas are included in the compressed file. This compression method is called Ratio compression.
- an outlier bitmap that has one bit for each element of the array.
- the bit of the bitmap that corresponds to each outlier will have a value of 1, while all non-outlier values will be 0.
- an outlier bitmap is a series of bits that "maps" which values of an array must be treated as outliers. It is an object of the present invention to provide a method of compression that substantially improves the compression ratio for small computer data files.
- Fig. 1 is a top-level processing diagram, or flowchart, showing the process of the present invention
- Fig. 2a is a table illustrating a sample data array
- Fig. 2b is a compression record created for the data contained in Fig. 2a;
- Fig. 3a is a table illustrating a sample data array
- Fig. 3b is a compression record created for the data contained in Fig. 3a;
- Fig. 4 is a compression record illustrating the Strip Top Bit compression method;
- Fig. 5 is a processing diagram, or flowchart, showing the process of the Strip Zeros compression method;
- Fig. 6a is a table illustrating a sample data array
- Fig. 6b is a compression record created for the data contained in Fig. 6a
- Fig. 7a is a table illustrating a sample data array with binary values
- Fig. 7b is a compression record created for the data contained in Fig. 7a;
- Fig. 7c is another compression record for the data contained in Fig. 7a
- Fig. 7d is another compression record for the data contained in Fig. 7a
- Fig. 8 is a top-level processing diagram, or flowchart, showing the process of
- Fig. 9 is a top-level processing diagram, or flowchart, showing the process of the Delta From Base compression method
- Fig. 10a is a table illustrating a sample data array
- Fig. 10b is a compression record for the data contained in Fig. 10a
- Fig. 1 la is a table illustrating a sample data array
- Fig. 1 lb is a compression record for the data contained in Fig. 1 la
- Fig. 12a is a table illustrating a sample data array
- Fig. 12b is a compression record for the data contained in Fig. 12a
- Fig. 13a is a table illustrating a sample data array
- Fig. 13b is a compression record for the data contained in Fig. 13a
- Fig. 14 is a top-level processing diagram, or flowchart, showing the process of the Difference From Last compression method
- Fig. 15a is a table illustrating a sample data array
- Fig. 15b is a compression record for the data contained in Fig. 15a;
- Fig. 16a is a table illustrating a sample data array
- Fig. 16b is a compression record for the data contained in Fig. 16a;
- Fig. 17a a table illustrating a sample data array
- Fig. 17b a compression record for the data contained in Fig.17a
- Fig. 18 is a top-level processing diagram, or flowchart, showing the process of the Ratio compression method
- Fig. 19 is a table illustrating a sample data array
- Figs. 20a - 20d are tables showing iterations of the mantissa determination of
- Fig. 21a is a table illustrating a sample data array
- Fig. 21b is a table showing estimate data used by the Ratio compression
- Fig. 21c is a table showing deltas determined by the Ratio compression
- Fig. 21 d is a compression record for the data contained in Fig. 21a;
- Fig. 22a is a table illustrating a sample data array;
- Fig. 22b is a table showing estimates determined by the Ratio compression method;
- Fig. 22c is a compression record for the data contained in Fig. 22a;
- Fig. 23a is a table illustrating a sample data array
- Fig. 23b is a table illustrating an intermediate step of the Ratio compression method
- Fig. 23c is a compression record for the data contained in Fig. 23a. DETAILED DESCRIPTION OF THE INVENTION
- the present invention was designed to compress the binary representation of numbers. It performs its compression by removing unnecessary bits from binary representations and designating numbers in more compact fashion.
- Computer systems use 16-bit or 32-bit binary values to designate values, regardless of how many bits are actually required. In a 16-bit system, 1024 is designated as 00000100 00000000. Under normal
- N bits 2 N - 1.
- 4 bits can hold a maximum value of 2 4 - 1 or 15. 15 is represented in binary form as 1111.
- base 10 the maximum value of N
- N digits 10 N - 1.
- 4 digits in base 10 can hold a maximum value of 10 4 - 1 or 9999.
- the present invention reduces the values of some data entries so that those entries may be designated by fewer bits.
- a flowchart illustrates the basic steps included within the present invention.
- Raw data is input.
- the present invention immediately compresses time data (time of day that readings were taken) and removes "no flow" data. Time data readings are typically one hour apart at the top of the hour. Under these circumstances, the present invention will only send the original time value. If variations
- the outlier compression method is used, as discussed below.
- records where flow did not occur are identified when all readings are 0 except for temperature and static temperature. These records are removed from the data to be compressed, and indices for which records are no flow are included in the compressed file.
- each of five algorithms are applied to each data array and the best method is selected.
- the five different compression methods are:(l) Strip Zeros; (2) Mode; (3) Delta From Base; (4) Difference From Last; and (5) Ratio Compression.
- Strip Zeros compression finds the largest value in an array and determines how many bits are required to hold that
- Mode Compression searches every parameter array for the value most frequently repeated, or the mode value. For example, when flow time is consistently 60 minutes, Mode Compression will store one value along with an instruction to repeat that value in each field of the array. Any deviations from the mode are identified with the Outlier Compression. If there are no exceptions, the present invention will only save the beginning time value and need not include the delta standard value.
- the Delta From Base method finds the lowest value in the array and subtracts that number from all array values. If the remainders are small relative to the base, then the compressed output containing the base and the deltas will be much smaller than the original array.
- the Difference From Last method is similar to Delta From Base, but the deltas are the differences from the previous values instead of from a base number. This method works particularly well for parameters that change relatively uniformly over time, such as temperature and tank levels. Finally the Ratio Compression method is used when the elements of an array are related proportionally to a data set that can be created from other arrays already included in the compression record. The present invention finally performs a daily summary compression.
- Strip Top Bit Each technique minimizes the number of bits required to identify the data to
- Outlier values are those in data arrays that require unique identification, allowing the remainder of the elements to be expressed in a more compact format.
- Fig. 2a shows an array of 24 data entries having 21 identical entries and three
- the varying entries being the first, third, and fourth locations.
- the three varying entries are considered outliers.
- bitmap indicates with a 1 in the first, third, and fourth locations of the 24-element bitmap that the first, third, and fourth data entries are outliers. If the present invention were to use this configuration, it would require 24 bits to specify which elements were outliers, plus an additional bit to specify that more than one outlier exists. The value of the 21 identical entries is included only once. Although the outlier values must be stored individually, the total amount of data stored is significantly reduced. When just a few data values are outliers and the array contains seven or more elements, such as the array shown in Fig. 2a, the present invention will use a second outlier method for identification. See Fig. 2b. Fig. 2b shows a compression record where that same identification is accomplished with only 13 bits. As shown in Fig. 2b, the first bit indicates that it is a "few outliers" type data array. The next four bits indicate the number of outliers.
- the number of outliers indicator is safely reduced by one because at least one outlier must exist to invoke this record.
- the next bit indicates whether the index is recorded in reverse order from the actual order of the data array. In situations where the outliers are towards the front of the data array, subsequent indices are specified with fewer bits when expressed in the backwards direction.
- the present invention examines both directions of a data array and chooses the direction that results in the smallest resulting data file. If only one outlier exists, the direction indicator is not used as it would be meaningless.
- example is 3. Five bits are used to store this index because 5 bits are sufficient to indicate any number between 0 (00000 2 ) and the 23 indices of this example (10111 2 ).
- the next outlier must be at least one less than the first index, or 2 in this example. Therefore, the position can be indicated in only two bits.
- the final index can only be one less than 2, so the final index (0) is stored with one bit. Therefore, 13 bits identify the outliers in place of the 24-bit outlier bitmap.
- Fig. 3a shows a 24-element data array where elements 15, 17 and 20 - 24 (indicated by index numbers 14, 16 and 19 - 23) are outliers. Because five outliers are clustered at the end of the array, the last four positions do not have
- Fig. 3a The compression record for Fig. 3a is shown in Fig. 3b.
- the present invention uses varying length data fields to identify data entries throughout its process. When the data files are decompressed, the invention must know the lengths of the fields. The present invention will size the fields to match the largest size necessary to convey the required information. For example, assume that Delta From Base was used to compress an array. If the deltas are 17, 0, 23, 45, 3, 8 and 29, the present invention will set the data field to 6 bits - the number of bits necessary to identify the number 45 in binary form.
- the present invention may determine that the first bit of every data entry is 1. For example, in a Delta From Base array, the base number might be 2000. The binary representation of 2000 is 11111010000. The initial "1" of that binary string represents a value of 1024. If the greatest Delta From Base value is less than 976, the present invention knows that all values of the first bit for each data entry are "1" and will remove the
- Strip Top Bit must deal with 0 as an element. It does this by setting the field width value to maximum. For example, when compressing an array with a maximum value of 2 30 - 1, five bits are necessary to specify the field width. If the largest value is in fact 2 30 - 1, then the file size will be stored as 11110. To express 0, the field width will be set to the maximum size or 11111. In other words, 0 is specified as 2 N - 1 , where N is
- the present invention in its optimal form, works only with positive integer values.
- the upper limit is bound by Strip Top Bit, which uses the value 31 to specify subsequent data values of 0.
- This number limits bit widths for data to 30, which has a maximum decimal amount of 1,073,741,823. To ensure error free execution, the present invention may adjust the raw data to remain below this maximum number.
- the present invention adds an offset to all temperature data to make certain that the data consists only of positive integers.
- all temperature readings have 40 degrees added to them.
- the actual value of the increase will depend on the accuracy of the reading. If temperature is measured in lOOths of a degree, the present invention will multiply the readings by 100 to turn the fractions into integers. The 40- degree offset becomes 4000 under these circumstances. This offset is subtracted when the data is decompressed.
- the present invention After adding the temperature offset, the present invention checks all the data values to verify that they fall within the acceptable data range. If they do not. the present invention either raises them to 0 (if negative) or lowers them to 1,073,741,823 (the maximum value the present invention can handle). For example, if temperature us -5000, adding the offset results in -1000, which is less than 0. 0 is substituted for -1000.
- the present invention is easily modified to accommodate negative numbers or a mix of positive and negative numbers.
- the offset technique applied to temperature reasons is easily expanded to address negative data entries.
- the present invention could also address negative and positive numbers by adding a sign entry where necessary, either as an inflection indicator or a sign indicator for each data entry.
- Strip Zeros The first compression method analyzed by the present invention is Strip Zeros.
- a typical number is stored in binary format, many of the binary digits at the beginning of the number are zeros. For example, 100,000 is expressed in binary format as:
- Strip Zeros saves 15 bits for a single data entry less the bits necessary to define the data field.
- Strip Zeros evaluates four distinct methods of compressing arrays of numbers and selects the most compact method. The four methods are: (1) uniform field size; (2) single
- Fig. 6a shows a 6-index array
- Fig. 6b shows the compression record for that array, demonstrating how the uniform size indicator of the
- Strip Zeros compression method was used and how each element was identified with 6 bits. In this example, each data entry was reduced by a size of 5 bits.
- the single field size Strip Zeros compression method is used when the uniform field size does not apply. In other words, some values will require different size fields than others.
- the bit width of the storage field is selected to be the bit width necessary to describe the largest element. Any preceding zeros are stripped from the values, but the top bit is not stripped. Elements that require smaller field widths retain zeros in the beginning bit positions as shown in Fig. 7a.
- the individual field sizes Strip Zeros compression method is most effective when a large variation of field sizes exists. Storing the individual field sizes may be more compact than setting the width to accommodate the largest element value. The present invention will evaluate both methods and choose the smallest result.
- the individual field sizes method can take advantage of the Strip Top Bit technique since each element is separately compressed. Before each element in the compression record is a field size specifying how many bits are required to hold the element (without the top bit). Elsewhere throughout the present invention, this field will always be 5 bits wide. However, in the individual field sizes method, the maximum size of the field can be predetermined. In the example illustrated in Fig. 7c, only 4 bits are needed to identify the maximum field width of 10, which is the number of bits required to store 1025 after the top bit has been removed. The
- Fig. 7a demonstrates that the individual field sizes method can save storage space.
- the original record was 76 bits, while the compressed record is 56 bits.
- the two field sizes Strip Zeros compression method divides the array elements into two groups.
- the present invention starts with the first group being limited to data values
- the present invention may determine that the smallest data value requires 4 bits to identify. All data values that can be identified in 4 bits will belong in the small size group. The size of the compressed file is calculated. Strip Zeros next increases the bit width of the smaller group by one, so that the small group includes the previous data entries as well as all entries identifiable with one more bit. The present invention then
- Fig. 7d shows a compression record using the two field sizes method for the data array shown in Fig. 7a in the individual field size method. The result in this case is 66 bits. While that size causes the present invention to choose the single field size method over the two field sizes method for this array, it shows a compression of 10 bits. In some circumstances, the present invention may be able to save an additional bit by listing the larger field width before the smaller field width. For example, if the large group requires 11 bits, the smaller size must be a bit width of 10 or less, expressed 1010 2 and requiring only 4 bits to store (a savings of 1 bit). Simply reversing the process
- the Mode Compression method searches through an array for the value most frequently repeated - the mode of the array. It stores this value in the compression record only once, and treats all other values as outliers.
- the Mode Compression method yields excellent compression for parameters that are relatively constant such as the flow time parameter in the gas flow data record. For example, a reading may be taken every hour, resulting in precisely 60 minutes between readings.
- the present invention identifies the value of the data, strips the top bit, and transmits the value one time instead of once for each element.
- the number of array elements is either a fixed number or specified at the beginning of the compressed file, so the decompression is easily accomplished. If outliers exist in the array, Mode Compression adds an outlier map and the values of the outlier elements, and applies the Strip Top Bit method.
- Mode Compression is capable of reducing the data file by the number of bits necessary to store all data elements minus one (the one that must be transmitted) less the bits necessary to define the compression. Decompression occurs simply by copying the saved data value in each entry of the array.
- the Delta From Base compression method finds the lowest value in the array. This value is called the base and is subtracted from all the array elements to create deltas. Delta From Base yield particularly good compression when the base is significantly greater than the largest delta. For example, suppose an array had values of 6000, 6003, 6015, and 6008. The binary value of those numbers is: 0001011 1 01 1 10000; 00010111 01110011; 00010111 01111111; and
- the present invention can send the base number one time, and follow it with the deltas from the base.
- the deltas are all 4 designated in 4 bits, resulting in a 12 bit savings per data entry, less the bits
- Delta From Base employs a number of strategies for adjusting the base, specifying the deltas and determining the outliers to minimize the overall compression record
- Delta From Base begins with the consideration of one delta bit width and no outliers.
- An example of an array that Delta From Base compresses efficiently is shown in
- Fig. 10a with a resulting compression record shown in Fig. 10b.
- the base is 347 and the largest delta is 4022.
- the base bit width, after Strip Top Bit is applied, is 8 bits.
- the largest delta, 4022 requires 12 bits to identify.
- the base is transmitted one time in 8 bits and the remaining deltas are each transmitted in 12 bits.
- Delta From Base next considers dividing the deltas into two separate groups, or two size bins.
- Bin size refers to the field width necessary to identify the data of the array, with the large bin size sufficient to identify the largest value stored.
- Delta From Base examines every smaller bin size, ranging from 0 to one less than the largest bin size. If any combination results in an overall smaller compression record size, Delta From Base saves those record parameters. Delta From Base performs this analysis by starting with the largest element value as the base. For example, examine the data array shown in Fig. 1 la. The largest value, selected for the base, is 30803. The deltas range from 0 to 493. The largest delta thus requires 9 bits to identify.
- Fig. 1 lb shows the compression record for this array and this method, indicating that the base width requirement is 14. The top bit is stripped, and the larger delta of 493 is assigned its width of 9 bits. The smaller delta size is assigned a width of 4 bits.
- This example has small bin deltas clustered at the end of the array, so the compression record need not include indices for all of them.
- An outlier is defined as a delta that is too
- Delta From Base can express the outlier as a raw value or as a value outside the delta range.
- the goal behind removing outliers is to shrink the number of bits required to identify the remaining records that vary little from the base number, and thereby reduce the overall number of bits in the compressed file. For example, suppose that in an array of 24 elements, 23 elements required 4 bits to store and one element required 5 bits to store. By keeping the delta size within 4 bits, 23 bits were saved. If the number of bits required to identify the outlier is less than 23, a net compression has occurred. Delta From Base examines the elements in three ways: 1) it removes elements from largest to smallest; 2) it removes elements from smallest to largest, and 3) removes elements from top and bottom in
- Delta From Base adjusts the base to the new value, saves the new upper and lower delta values, and determines the field width necessary to handle the base and the
- Fig. 12b Delta From Base - Outliers/Delta From Range considers specifying outliers as the distance form the record's delta range. For example, consider an array containing the elements 0, 23, 15, 4, 31, 19, 33, 39. The deltas range from 0 to 39, requiring 6 bits to store. If the delta fields were reduced by 1 to 5 bits each, all of the values would fit except for 33 and 39. Delta From Base can use a separate size field of six bits to hold 33 and 39. Another approach is to subtract the smaller delta range of 33, leaving 0 and 6, which fit in only 3 bits. This approach has the advantage of using fewer bits for specifying outliers, but does take an extra five bits to specify the outlier range as compared to using raw values described earlier.
- Fig. 13a shows a 24-element array with a delta range from 0 to 543, taking a field of 10 bits to store. Delta From Base determines that removing the maximum delta, or 543, allows the remaining deltas to fit in 9 bit fields. The outlier value is shifted down by 512 (2 9 ) to 31 , which takes 5 bits to store. The adjusted base is represented only once and removed from the remaining elements. The outlier element has an additional
- Difference From Last compression is similar to Delta From Base, but the deltas are the differences from the previous values and not from a base value. This compression type works particularly well for parameters that steadily ramp up/down or cycle through a sampling period, such as a tank level or temperature sensor. Difference From Last begins by subtracting each element from its subsequent element to determine the differences. It then identifies if and where there are inflections, or changes in the signs of the differences. For example, the values of 6000, 6003, and 6015 have binary values of: 00010111 01110000; 000101 11 01110011 ; and 000101 11 0111 1111. Difference From Last may report the second and third values as 0011 and 1111, saving the first 12 bits of each entry.
- Decompression requires that the first delta be added to or subtracted from the number before it, and the next delta be added to or subtracted from the result of the first delta calculation, and so on. If there is a change in sign of the deltas, then either an inflection map must be included in the compression record or else the values that change inflection must be treated as outliers.
- An inflection map is similar to an outlier bit- map. It consists of a number of bits equivalent to the number of data entries in the data array.
- Difference From Last employs two methods for optimizing the compression record size. The first method splits the differences into a large size bin and a small size bin. It next finds the optimum lower bin size in a fashion similar to that previously discussed. The second method shrinks the difference size and adds outliers to those removed because of inflection. The logic flow of the Difference From Last method can be seen in Fig. 14.
- Fig. 15b shows the compression record for the array.
- the compression record indicates no inflections, no outliers, change in the positive direction, a field size, a base
- Difference From Last next considers dividing the differences into two size bins. Difference From Last searches every possible lower bin size, which can range from 0 to one less than the largest bin size. If any two-size combination results in an overall smaller compression record size, then Difference From Last saves the record parameters.
- Fig. 16a shows an array where the initial value is 4547 and the differences range from -1131 to 994. Difference From Last will create an inflection record to identify when the differences change between positive and negative values. The inflection record specifies where the differences
- Fig. 16b shows the compression record for this example.
- the record contains an inflections indicator, few outliers indicator, number of outliers, index direction backwards indicator, and the outlier index records. It then indicates the initial size of the field to hold the initial value and the initial value itself. Two different size bins are then indicated with more than one outlier.
- the record then includes the larger and smaller difference field sizes, followed by the values of the differences as well as a bit map which indicates which data entries are small bin entries and which bin entries are large bin entries.
- Difference From Last will consider designating specific differences as outliers.
- Difference From Last splits the differences based upon sign of the value. Those values whose sign (positive or negative) is occurs less frequently are considered outliers. The remaining differences are set to positive values and the difference size is based upon the largest non-outlier difference. Difference From Last then shrinks the difference size, designating the now excluded values as outliers. If a particular difference size results in a
- Fig. 17a shows an example array where the initial value is 1048 and the differences range from -1 to -739. There are no inflections. Difference From Last sets all the differences to positive. Difference From Last starts with 10 bits to store 739 and then shrinks the difference size from 10 to 0 bits. In this example, the optimal size occurs at 5 bits, leaving three outliers in index positions 1, 3 and 4.
- the compression record is shown in Fig. 17b. Decompression of any Difference From Last compression method is achieved by reversing the compression steps taken.
- Ratio compression is used when the elements of an array are related proportionally to a data set that can be created from other arrays already included in the compressed record. Processing begins by creating an array of estimated values, and then finding a ratio that minimizes the range of deltas between the estimated and actual values.
- the present invention searches for the optimum way to describe the deltas.
- the logic flow of ratio compression is shown in Fig. 18.
- ratio compression For compressing natural gas flow data, ratio compression is used for three types of parameters: pressure extension, volume and energy. For each, an estimated array is
- the estimated arrays are created in the following ways:
- the estimated pressure extension array is calculated by taking the square root of the product of the differential and static pressures. 2. The estimated volume array is created by multiplying flow time by pressure extension and then by C. If the pressure extension array is not included in the gas flow data record, then the estimated pressure extension array is used. If either the flow time or C array is not included, then they are assumed to be the value of 1.
- the estimated energy array is simply the actual volume array.
- the estimated arrays are created in a similar manner from arrays included in the compression record.
- the present invention checks to make sure that multiplying the differential and static pressures will not result in an overflow before performing the square root, i.e., that the result will not exceed the maximum storage of 32 bits.
- the product size is estimated by rewriting the product calculation in logarithms: log 2 (product) ⁇ log 2 (differential pressure) + log 2 (static pressure).
- estimated volume pressure extension x flow time x C.
- the present invention performs the calculation in two steps: first multiplying the pressure extension by flow time and then multiplying the result by C. Either step can result in an overflow, defined in this case as exceeding the maximum Strip Top Bit representation of a number, meaning 30 bits.
- the product is re-written in logarithmic form as previously described and Shrink Product is used to avoid overflows.
- the two equations are
- the present invention will correct this with an adjusting ratio.
- the data arrays may be amenable for Ratio compression. If an array is proportionally related to another, a simple constant multiplied to one array will provide the values of another. In this situation, the original array may be used as the estimate array. In the situation where the second array is proportional to the first with a constant offset added to the multiplication result, the estimate
- ratios may include two offsets that vary over time. In this case, the best approach is to use the lowest values of each array as separate offsets. The value of the offset must be included in the compression record.
- Ratio compression assumes that the ideal ratio is the one that minimizes the sum of deltas when specified as:
- the present invention uses a more compact representation by limiting the mantissa to 15 bits (14 bits after Strip Top Bit) and the exponent to 5 bits. With a flag field and a mantissa size field, the ratio record can range from 1 bit to 24 bits. Allowing the mantissa to become greater than 15 bits does not add any benefits. The product of the mantissa with the estimate is limited to 31 bits. so increasing the precision of the mantissa only reduces it for the estimate.
- K x estimate-, (mantissa x estimate-,) x 2 "exponent .
- the dynamic range of the ratio is from 4.66 x 10 "10 to 3.28 x 10 4 .
- the present invention aborts further ratios processing if either array sum is 0, the calculated mantissa is 0, or the calculated exponent is less than 0.
- the present invention will also take steps to prevent an overflow in summing ratio arrays. Before summing either the estimate or actual arrays, the present invention will verify that the result will not exceed the maximum storage size of 32 bits. The sum is quickly estimated as:
- the table presented in Fig. 19 is a sample gas flow record with 22 volume values and estimated volume values. The estimated volume was created by multiplying flow time, C and the square root of the product of differential and static pressures, as previously described. First, the present invention checks for a potential overflow in summing the actual and estimated volume arrays.
- an overflow may exist.
- the present invention determines that adjusting all of the estimated volume elements downward by a factor of 2 2 is necessary. After completing the calculations, the present invention determines that the actual volume sum is 15016.
- the present invention divides the actual volume sum (3936354304) by the estimated volume sum (184651) to create the mantissa (21317).
- the present invention recognizes that the exponent is greater than 31 , and that it needs make no further adjustments.
- the present invention must then determine the compression record size corresponding to this ratio. Next, it will shrink the mantissa by one bit and reduce the exponent by one iteratively. With each iteration, the present invention calculates what the compression record size would be for the new ratio along with 1 added to the ratio and 1 subtracted from the ratio. If a new mantissa and exponent result in a smaller compression record, then the present invention saves these values.
- the present invention treats the new ratio as a better approximation of the optimum ratio.
- the present invention then adjusts the new mantissa back up to 15 bits and the exponent upward, provided that the exponent does not exceed 31.
- the present invention repeats the mantissa shrinkage cycle repetitively until the compression record size no longer decreases. Tabular examples of this
- Fig. 21a For example, consider the pressure extension array of Fig. 21a.
- the present invention creates an estimate array by taking the square root of the product of the differential and static pressures as shown in Fig. 21b.
- the present invention determines that the optimum ratio has a mantissa of 289 and an exponent of 8. This is the equivalent of 1.129. Multiplying the estimated pressure extensions by this ratio creates a range of deltas from -2 to 13, as seen in Fig. 21c.
- the present invention picks the lowest delta as the base and adjusts the deltas upwards. Since the largest delta is 15, the deltas can fit into 4 bits.
- Ratio Compression next considers dividing the deltas into two size bins. Ratio searches every possible lower bin size, which can range from 0 to one less than the largest bin size. Ratio repeats this process by using the largest delta as the base and subtracts each delta from it. If any two-size combination from either search results in an overall smaller compression record, ratio compression will save the parameters. For example, see Fig. 22a. Flow time, C and pressure extension values are not available, so the present invention uses the square root of the product of the differential and static pressures for the volume estimate. The present invention determines that the optimum ratio has a mantissa of 3 and an exponent of 2. This is the equivalent of 0.75 (3x2 ⁇ 2 ). Multiplying the estimated volume values by this ratio crates a range of deltas from -455 to 15 as seen in Fig. 22b. The present invention picks
- Ratio compression next considers specifying outliers as the value of the
- delta bin size or field bit width, one bit at a time and then calculates the value of the difference between the top of the data range and the outliers. Ratio repeats this process by using the largest delta as the base and subtracting each delta from it. If any combination
- Fig. 23a shows an example pressure extension array.
- the present invention uses the square root of the product of the differential and static pressures for the pressure extension estimate.
- the present invention determines that the optimum ratio has a mantissa of 1 and an exponent of 0. Since the ratio is 1 , the present invention uses a single flag to indicate a ratio of one in the compression record.
- Fig. 23b shows an array for the same example readings as Fig. 23a with volume, the ratio estimate, the delta, and a value for (base - delta).
- the present invention picked the highest delta of 187 and subtracted the other deltas from it. Using a delta size of 7 and treating all deltas below 68 as outliers provides the optimal size compression file.
- Outlier values range up to 993 outside the acceptable delta range, requiring 10 bits to store.
- Fig. 23c shows the compression record for this example.
- the first 5 lines of the record indicate that the data is compressed by ratio compression with many outliers and indicate the positions of those outliers.
- the base is indicated in the next three lines as positive 68.
- a flag indicating direction from base for the outliers is next, bit lengths are specified, and the outlier values are identified.
- Ratio Compression applies the selected ratio equation to the other decompressed values of the transmitted data, followed by addition or subtraction of the deltas that designate the variation for each data entry from the ratio.
- the present invention applies Time Compression, No Flow Record Removal, Daily Summary Compression, and One
- Readings are typically taken once an hour for the purposes that the present invention was designed.
- the present invention can compress the time array very compactly if every element is taken at the top of the hour and every
- the present invention identifies that there are no outliers and specifies the first time reading by hour after midnight.
- the present invention creates an outlier bitmap for all the data entries.
- the first entry is flagged as an outlier. Any other entry that is not exactly 60 minutes from the preceding entry is also flagged as an outlier.
- the full value of all outliers is stored in the compressed file. During decompression, the outlier values are placed in the time array, and all other values are determined by adding 60 minutes to the preceding data value.
- the invention searches the data for records for "no flow" records. No flow records exist when all readings are zero with the exception of static pressure and temperature. The invention then removes the zero elements form the differential pressure, flow time, C, pressure extension, volume and energy arrays. The records remaining for compression are reduced by the number of no flow records. The present invention uses the outlier method to identify which records are no flow. The present invention creates a summary value of each given parameter array after compression. The summary value may be the daily average or it may be a total across
- the present invention begins by calculating the average or total of a parameter array and then subtracts the estimated value from the actual value. A number of summation rules are used. To estimate the summary average, the present invention first checks for an overflow, i.e., the sum exceeds the maximum storage space of 32 bits. The upper limit of the sum is estimated by:
- log 2 (sum) ⁇ elements ⁇ largest element x number of elements.
- log 2 (sum) log 2 (largest element) + log 2 (number of elements). If the sum were to overflow, then log 2 (sum) would exceed 32 by no more than N bits.
- N log 2 (largest element) + log 2 (number of elements) - 32. If the present invention determines that N is greater than 0, it can prevent an overflow by shifting down each element N bits before summing and shifting the average back up by N bits.
- N log 2 (largest element) + log 2 (24) - 32. If N is greater than 0, the present invention can prevent an overflow by first shifting down ⁇ (elements) by N bits before dividing by (number of elements), then shifting the total back up
- total ( ⁇ [ ( ⁇ elements) / 2 N ] x 24 ⁇ / number of elements) x 2 N .
- the present invention determines the difference with the actual summary. Depending on the size of the difference, the present invention will create on of three types of compression records. If the difference is between -8 and +7, the small size summary compression record is used.
- the compression record includes a small size indicator (1 bit), a negative sign indicator (1 bit) and a value for
- the compression record will include a flag for the summary size being other than small size, a flag to indicate the value is within the calculated range, and the summary value.
- the present invention will create a large size compression record that includes the summary size field.
- the compression record must add a designator for the size of the field needed to hold the value (after Strip Top Bit is applied).
- each array in the input record has only one reading
- the present invention will put all of the parameters in one array instead of having an array for each type of data reading. If daily summaries are set for that reading, they are interspersed with the readings from that hour. The composite array is then compressed by Strip Zeros.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
Description
Claims
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| AU2001238365A AU2001238365A1 (en) | 2000-02-15 | 2001-02-15 | Method for compression of small computer data files |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US50415300A | 2000-02-15 | 2000-02-15 | |
| US09/504,153 | 2000-02-15 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2001061543A1 true WO2001061543A1 (en) | 2001-08-23 |
| WO2001061543A9 WO2001061543A9 (en) | 2002-10-17 |
Family
ID=24005072
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2001/004986 Ceased WO2001061543A1 (en) | 2000-02-15 | 2001-02-15 | Method for compression of small computer data files |
Country Status (2)
| Country | Link |
|---|---|
| AU (1) | AU2001238365A1 (en) |
| WO (1) | WO2001061543A1 (en) |
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| GB2385958A (en) * | 2001-11-14 | 2003-09-03 | Hewlett Packard Co | Data compression system |
| US7672262B2 (en) | 2005-04-22 | 2010-03-02 | Baker Hughes Incorporated | System, method, and apparatus for command and control of remote instrumentation |
| EP2153610A4 (en) * | 2007-06-01 | 2010-09-15 | Research In Motion Ltd | Interactive compression with multiple units of compression state information |
| GB2467239B (en) * | 2010-03-09 | 2011-02-16 | Quantum Corp | Controlling configurable variable data reduction |
| EP1292036B1 (en) * | 2001-08-23 | 2012-08-01 | Nippon Telegraph And Telephone Corporation | Digital signal decoding methods and apparatuses |
| CN107409152A (en) * | 2015-03-12 | 2017-11-28 | 英特尔公司 | Method and apparatus for compressing the data received by network |
| CN115085737A (en) * | 2022-07-29 | 2022-09-20 | 上海电气风电集团股份有限公司 | Data compression method and electronic equipment |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US4782325A (en) * | 1983-12-30 | 1988-11-01 | Hakan Jeppsson | Arrangement for data compression |
| US4988998A (en) * | 1989-09-05 | 1991-01-29 | Storage Technology Corporation | Data compression system for successively applying at least two data compression methods to an input data stream |
| US5546575A (en) * | 1994-05-23 | 1996-08-13 | Basil E. Potter & Associates, Inc. | Encoding method for compressing a tabular database by selecting effective compression routines for each field and structure of partitions of equal sized records |
| US5649151A (en) * | 1992-06-29 | 1997-07-15 | Apple Computer, Inc. | Efficient method and apparatus for access and storage of compressed data |
| US5678043A (en) * | 1994-09-23 | 1997-10-14 | The Regents Of The University Of Michigan | Data compression and encryption system and method representing records as differences between sorted domain ordinals that represent field values |
-
2001
- 2001-02-15 WO PCT/US2001/004986 patent/WO2001061543A1/en not_active Ceased
- 2001-02-15 AU AU2001238365A patent/AU2001238365A1/en not_active Abandoned
Patent Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US4782325A (en) * | 1983-12-30 | 1988-11-01 | Hakan Jeppsson | Arrangement for data compression |
| US4988998A (en) * | 1989-09-05 | 1991-01-29 | Storage Technology Corporation | Data compression system for successively applying at least two data compression methods to an input data stream |
| US5649151A (en) * | 1992-06-29 | 1997-07-15 | Apple Computer, Inc. | Efficient method and apparatus for access and storage of compressed data |
| US5546575A (en) * | 1994-05-23 | 1996-08-13 | Basil E. Potter & Associates, Inc. | Encoding method for compressing a tabular database by selecting effective compression routines for each field and structure of partitions of equal sized records |
| US5678043A (en) * | 1994-09-23 | 1997-10-14 | The Regents Of The University Of Michigan | Data compression and encryption system and method representing records as differences between sorted domain ordinals that represent field values |
Cited By (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP1292036B1 (en) * | 2001-08-23 | 2012-08-01 | Nippon Telegraph And Telephone Corporation | Digital signal decoding methods and apparatuses |
| GB2385958B (en) * | 2001-11-14 | 2006-01-18 | Hewlett Packard Co | Data compression/decompression system |
| GB2385958A (en) * | 2001-11-14 | 2003-09-03 | Hewlett Packard Co | Data compression system |
| US7672262B2 (en) | 2005-04-22 | 2010-03-02 | Baker Hughes Incorporated | System, method, and apparatus for command and control of remote instrumentation |
| EP2153610A4 (en) * | 2007-06-01 | 2010-09-15 | Research In Motion Ltd | Interactive compression with multiple units of compression state information |
| US8090046B2 (en) | 2007-06-01 | 2012-01-03 | Research In Motion Limited | Interactive compression with multiple units of compression state information |
| US8451940B2 (en) | 2007-06-01 | 2013-05-28 | Research In Motion Limited | Interactive compression with multiple units of compression state information |
| GB2467239B (en) * | 2010-03-09 | 2011-02-16 | Quantum Corp | Controlling configurable variable data reduction |
| CN107409152A (en) * | 2015-03-12 | 2017-11-28 | 英特尔公司 | Method and apparatus for compressing the data received by network |
| EP3269126A4 (en) * | 2015-03-12 | 2018-09-05 | Intel Corporation | Method and apparatus for compaction of data received over a network |
| US10701168B2 (en) | 2015-03-12 | 2020-06-30 | Intel Corporation | Method and apparatus for compaction of data received over a network |
| CN107409152B (en) * | 2015-03-12 | 2021-10-15 | 英特尔公司 | Method and apparatus for compressing data received over a network |
| CN115085737A (en) * | 2022-07-29 | 2022-09-20 | 上海电气风电集团股份有限公司 | Data compression method and electronic equipment |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2001061543A9 (en) | 2002-10-17 |
| AU2001238365A1 (en) | 2001-08-27 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CA2475186C (en) | Method and apparatus for windowing in entropy encoding | |
| CA2283591C (en) | Data coding network | |
| US7973680B2 (en) | Method and system for creating an in-memory physical dictionary for data compression | |
| US9337863B1 (en) | Methods and apparatus for rational compression and decompression of numbers | |
| US20100299378A1 (en) | Sortable floating point numbers | |
| US8509554B2 (en) | Systems and methods for optimizing bit utilization in data encoding | |
| CN113377332B (en) | A Hardware Implementation Method of Softmax Based on Linear Segmentation | |
| CN104834539B (en) | A kind of data increment update method | |
| CN112506880B (en) | Data processing method and related equipment | |
| US5995970A (en) | Method and apparatus for geographic coordinate data storage | |
| AU4690496A (en) | Method and apparatus for digital data compression | |
| EP1995974B1 (en) | Method for realizing arithmetic coding | |
| WO2001061543A1 (en) | Method for compression of small computer data files | |
| US7685213B2 (en) | Conversion of floating-point numbers from binary into string format | |
| CN107291832B (en) | A Data Storage Method Based on List Storage Structure | |
| CN111491169A (en) | Digital image compression method, device, equipment and medium | |
| US7921144B2 (en) | Fast correctly-rounding floating-point conversion | |
| US6240431B1 (en) | Decompression of limited range floating point numbers | |
| US12307217B1 (en) | Dynamic adjustment of floating point exponent bias for exponent compression | |
| US6938062B1 (en) | Apparatus and method for providing higher radix redundant digit lookup tables for recoding and compressing function values | |
| US20060224648A1 (en) | Method and apparatus for providing a base-2 logarithm approximation to a binary number | |
| Chandrasekaran et al. | Polynomial testing of the query “Is ab≥ cd?” with application to finding a minimal cost reliability ratio spanning tree | |
| EP0472730B1 (en) | Data compression and restoration method and device therefor | |
| US11928566B2 (en) | System and method for off-chip data compression and decompression for machine learning networks | |
| JP3018990B2 (en) | Arithmetic coding device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW |
|
| AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
| DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
| AK | Designated states |
Kind code of ref document: C2 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW |
|
| AL | Designated countries for regional patents |
Kind code of ref document: C2 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
|
| COP | Corrected version of pamphlet |
Free format text: PAGES 1/30-30/30, DRAWINGS, REPLACED BY NEW PAGES 1/33-33/33; DUE TO LATE TRANSMITTAL BY THE RECEIVING OFFICE |
|
| REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 Ref country code: DE Ref legal event code: 8642 |
|
| 122 | Ep: pct application non-entry in european phase | ||
| NENP | Non-entry into the national phase |
Ref country code: JP |