WO2001061543A1

WO2001061543A1 - Method for compression of small computer data files

Info

Publication number: WO2001061543A1
Application number: PCT/US2001/004986
Authority: WO
Inventors: Wallace Walter
Original assignee: Orbcomm Global LP
Current assignee: Orbcomm Global LP
Priority date: 2000-02-15
Filing date: 2001-02-15
Publication date: 2001-08-23
Anticipated expiration: 2002-08-15
Also published as: WO2001061543A9; AU2001238365A1

Abstract

A method for compressing small computer files through the use of reduction of values and removal of unnecessary bits in the compressed computer file. The method examines data in arrays or groups that share certain characteristics and evaluates each of a plurality of compression methods for each array. Compression methods include, without limitation, Mode Compression, Strip Zeros compression, Delta From Base compression, Difference from Last compression, and Ratio Compression. It chooses the best compression method for each array. The method also applies certain compression techniques to all data and uses varying field widths throughout to minimize the compressed file. It results in substantially reduced file sizes compared to traditional compression methods for small computer files.

Description

METHOD FOR COMPRESSION OF SMALL COMPUTER DATA FILES

FIELD OF THE INVENTION

The present invention generally relates to compression of binary data for ease of transmission between a source and receiving device, and more particularly to compression of binary data files by selecting an optimum compression method from a plurality of compression methods.

BACKGROUND OF THE INVENTION

Many data compression methods are known in the art. Most available data compression methods examine the frequency of data character strings and assign variable length codes to represent that data. More frequently appearing and longer strings receive shorter codes and rare byte strings receive longer codes. When the resulting compressed file is prepared for transmission, it must include a definition of the codes used in the compression.

These compression programs have two major faults when used to compress small files. First, the definition of the codes is large relative to the data to be compressed, and second, when data changes slightly from record to record, few repetitive byte values can be assigned compact codes to represent them.

The present invention, although useful for many applications, was originally designed for the purpose of reducing file sizes of data recorded at remote locations of natural gas flow, such as where the natural gas flowed from the well into a pipeline. Many natural gas wells are located in places that are difficult to reach. Prior to the development of the present invention, data was recorded and stored on site. Approximately once a month, a

person would visit the site with the resulting expenses in time and travel costs and download the data into a laptop computer. One method to avoid the expense of such a process was to transmit the data from the remote site to a central location. However, standard telephone lines are not typically run to these sites, and the cost of installing a telephone line or similar communication network line is prohibitive. The introduction of satellite communication systems provided another alternative for data transmission.

However, even the satellite communication option proved to be prohibitively expensive. The present invention has reduced the size of the file needed to be transmitted to a point where it is more economical to use satellite communication than to visit each site.

Traditional compression methods, such as PKZip^®, examine the frequency of individual bytes of data and assign variable length codes to represent that data, with frequently appearing byte strings receiving shorter codes than more rare byte strings. In contrast, the present invention eliminates unnecessary bits and makes use of the principle that smaller numbers are identified by fewer bits than larger numbers.

The prior art teaches few compression methods that are optimized for small files composed of data arrays. In most applications, reducing a file of 900 bytes to 200 bytes represents a substantial cost savings over other small file compression methods.

SUMMARY OF THE INVENTION

Computers store and use numbers in binary format (base 2), instead of the more familiar base 10 format. Computers typically use a 16-bit or 32-bit segment of binary numbers, regardless of the number's actual value. Binary digits, or bits, may have a value of either 0 or 1. In a 16-bit computer system, 1024 (base 10) is expressed as 00000100 00000000 (base 2 or binary). 1023 is expressed as 00000011 11111111 , and uses one less meaningful bit than the value of 1024. Just as in base 10 notation, the zeros included in the 16-bit number before the first digit that is not zero do not have any numerical significance. In other words, just as in base 10 format 000001023 is identical to 1023, in binary format, 00000011 1111 1111 is identical to 11 11111111. The present invention uses several methods to compress the number of bits used to store data entries, either by removing unnecessary bits or by designating the number in a fashion that requires fewer bits.

The present invention examines data arrays found in gas flow records and evaluates, for example, five different compression algorithms for each data array. The best algorithm choice depends upon the underlying characteristics of the array. A data sample might include time, differential pressure, static pressure, temperature, flow time, C , pressure extension, volume and energy data values. After these measurements are arranged in arrays by the present invention, each of five algorithms is applied to each array and the method is chosen for each array that allows maximum compression. Each compression method makes use of a few compression techniques that further reduce the size of the compressed file.

The present invention uses dynamic data field sizing throughout the compressed file to identify data values. If four bits are sufficient to positively identify a value, the present invention will use only four bits instead of the 16 bits or 32 bits used in the standard computer format. Many of the compression methods and techniques are in themselves novel, such as Ratio Compression, Strip Top Bit, and Outlier Compression. The present invention also includes novel methods for specifying time and summary data.

Some data arrays, such as flow time, may include many repetitions of the same number. The present invention searches for the most frequently repeated value in the data array, or the mode of the array. The present invention will include the mode only one time in the compression record. If the array has variations from the mode, the variations are treated as outliers and treated individually. The present invention identifies the position of outlier values and their values in the compressed file. This compression method is called Mode compression. Some data arrays, such as static pressure or C readings, may change by small increments in relation to the value of the reading. The present invention will search for the

smallest value in the array and include that value only once in the compressed file. It will then subtract that value from the remaining values and include the difference, or delta, in the compressed file. This compression method is called Delta From Base.

Some data arrays, such as temperature readings or storage tank level readings, increase or decrease regularly by relatively small amounts. The present invention will include the first reading in the compressed file. It will then subtract that reading from the subsequent reading and include that difference in the compressed file. This compression method is called Difference From Last.

Some data arrays are proportional to another array or a combination of other arrays in the data file. In the gas flow application, volume is proportional to C, pressure extension, and flow time. Energy is proportional to differential pressure and static pressure. The present invention creates an array of estimated values and finds a ratio that when multiplied to the estimated values will minimize the range of deltas between the multiplied values and the actual values. The ratio and the deltas are included in the compressed file. This compression method is called Ratio compression.

Some data arrays are most compactly expressed simply by stripping zeros that are not necessary to identify the values of the array. The present invention examines the array

and determines the maximum number of bits necessary to identify the values included. For example, 100,000 is expressed in binary form as 00000000 00000001 10000110 10100000. If 100,000 is the largest value of the array, the present invention can safely strip the first 15 bits from the binary representation, as they will be 0 for every value contained in the array. This compression method is called Strip Zeros. Each of the five compression methods also makes use of other compression techniques to further reduce the size of the compressed file. Strip Top Bit is routinely used to remove the first bit of compressed values when it is safe to assume that the value of the top bit must be 1. The outlier compression method allows the program to identify which elements of an array are "outliers" or expressed differently than the majority of other

elements. It will create an outlier bitmap that has one bit for each element of the array. The bit of the bitmap that corresponds to each outlier will have a value of 1, while all non-outlier values will be 0. In other words, an outlier bitmap is a series of bits that "maps" which values of an array must be treated as outliers. It is an object of the present invention to provide a method of compression that substantially improves the compression ratio for small computer data files.

It is a further object of the invention to optimize the compression of files by choosing from a plurality of compression methods for different sections of data that are

transmitted. It is a further object of the invention to provide a method of compressing data

by identifying an array of data values as a ratio of other data values.

It is a further object of the invention to provide a methods of removing the first bit of an array of data entries when that bit may be assumed to be there.

It is a further object of the invention to optimize the compression file for each compression method by identifying and compressing outliers separately, thereby reducing the record size for the remaining elements.

It is a further object of the invention to provide a method for compressing the

average or sum value of an array. BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a top-level processing diagram, or flowchart, showing the process of the present invention;

Fig. 2a is a table illustrating a sample data array; Fig. 2b is a compression record created for the data contained in Fig. 2a;

Fig. 3a is a table illustrating a sample data array;

Fig. 3b is a compression record created for the data contained in Fig. 3a; Fig. 4 is a compression record illustrating the Strip Top Bit compression method; Fig. 5 is a

processing diagram, or flowchart, showing the process of the Strip Zeros compression method;

Fig. 6a is a table illustrating a sample data array; Fig. 6b is a compression record created for the data contained in Fig. 6a; Fig. 7a is a table illustrating a sample data array with binary values; Fig. 7b is a compression record created for the data contained in Fig. 7a;

Fig. 7c is another compression record for the data contained in Fig. 7a; Fig. 7d is another compression record for the data contained in Fig. 7a; Fig. 8 is a top-level processing diagram, or flowchart, showing the process of

the Mode compression method; Fig. 9 is a top-level processing diagram, or flowchart, showing the process of the Delta From Base compression method;

Fig. 10a is a table illustrating a sample data array; Fig. 10b is a compression record for the data contained in Fig. 10a; Fig. 1 la is a table illustrating a sample data array; Fig. 1 lb is a compression record for the data contained in Fig. 1 la; Fig. 12a is a table illustrating a sample data array; Fig. 12b is a compression record for the data contained in Fig. 12a; Fig. 13a is a table illustrating a sample data array; Fig. 13b is a compression record for the data contained in Fig. 13a; Fig. 14 is a top-level processing diagram, or flowchart, showing the process of the Difference From Last compression method;

Fig. 15a is a table illustrating a sample data array; Fig. 15b is a compression record for the data contained in Fig. 15a;

Fig. 16a is a table illustrating a sample data array; Fig. 16b is a compression record for the data contained in Fig. 16a;

Fig. 17a a table illustrating a sample data array; Fig. 17b a compression record for the data contained in Fig.17a; Fig. 18 is a top-level processing diagram, or flowchart, showing the process of the Ratio compression method; Fig. 19 is a table illustrating a sample data array;

Figs. 20a - 20d are tables showing iterations of the mantissa determination of

the Ratio compression method;

Fig. 21a is a table illustrating a sample data array;

Fig. 21b is a table showing estimate data used by the Ratio compression

method;

Fig. 21c is a table showing deltas determined by the Ratio compression

method;

Fig. 21 d is a compression record for the data contained in Fig. 21a; Fig. 22a is a table illustrating a sample data array; Fig. 22b is a table showing estimates determined by the Ratio compression method;

Fig. 22c is a compression record for the data contained in Fig. 22a;

Fig. 23a is a table illustrating a sample data array; Fig. 23b is a table illustrating an intermediate step of the Ratio compression method; and

Fig. 23c is a compression record for the data contained in Fig. 23a. DETAILED DESCRIPTION OF THE INVENTION

The present invention was designed to compress the binary representation of numbers. It performs its compression by removing unnecessary bits from binary representations and designating numbers in more compact fashion. Computer systems use 16-bit or 32-bit binary values to designate values, regardless of how many bits are actually required. In a 16-bit system, 1024 is designated as 00000100 00000000. Under normal

circumstances, computer systems will always use all 16 bits regardless of whether they are necessary. One way the present invention will reduce the amount of bits required to store the number is to remove the 0's preceding the first occurring value of 1. Although computers keep track of values in the 16-bit or 32-bit format, the preceding 0's are not necessary to designate the value of the number in the same way that 0's preceding base 10 numbers don't mean anything. For example, in base 10, 00000576 identifies the same value as 576. Likewise, 00000100 00000000 in binary format identifies the same value as 100 00000000.

The number of bits necessary to hold a particular base 10 number is identified by the equation N bits = 2^N - 1. In other words, 4 bits can hold a maximum value of 2⁴ - 1 or 15. 15 is represented in binary form as 1111. Similarly, in base 10, the maximum value of N

digits is expressed by the equation N digits = 10^N - 1. 4 digits in base 10 can hold a maximum value of 10⁴ - 1 or 9999. As discussed more fully below, the present invention reduces the values of some data entries so that those entries may be designated by fewer bits.

Referring to Fig. 1 , a flowchart illustrates the basic steps included within the present invention. Raw data is input. In the gas flow application, the present invention immediately compresses time data (time of day that readings were taken) and removes "no flow" data. Time data readings are typically one hour apart at the top of the hour. Under these circumstances, the present invention will only send the original time value. If variations

occur, the outlier compression method is used, as discussed below. In the gas flow data application, records where flow did not occur are identified when all readings are 0 except for temperature and static temperature. These records are removed from the data to be compressed, and indices for which records are no flow are included in the compressed file.

After compressing the time records and removing the no flow records, each of five algorithms are applied to each data array and the best method is selected. The five different compression methods are:(l) Strip Zeros; (2) Mode; (3) Delta From Base; (4) Difference From Last; and (5) Ratio Compression. In simple terms, Strip Zeros compression finds the largest value in an array and determines how many bits are required to hold that

value. The present invention then saves the data in data fields limited to that number of bits, eliminating the string of zeros customarily in front of the actual data. Mode Compression searches every parameter array for the value most frequently repeated, or the mode value. For example, when flow time is consistently 60 minutes, Mode Compression will store one value along with an instruction to repeat that value in each field of the array. Any deviations from the mode are identified with the Outlier Compression. If there are no exceptions, the present invention will only save the beginning time value and need not include the delta standard value. The Delta From Base method finds the lowest value in the array and subtracts that number from all array values. If the remainders are small relative to the base, then the compressed output containing the base and the deltas will be much smaller than the original array. The Difference From Last method is similar to Delta From Base, but the deltas are the differences from the previous values instead of from a base number. This method works particularly well for parameters that change relatively uniformly over time, such as temperature and tank levels. Finally the Ratio Compression method is used when the elements of an array are related proportionally to a data set that can be created from other arrays already included in the compression record. The present invention finally performs a daily summary compression.

Most of the compression algorithms of the present invention use two techniques to compress the data included in the compressed file: Outlier Compression and

Strip Top Bit. Each technique minimizes the number of bits required to identify the data to

be compressed. Outlier Compression

Outlier values are those in data arrays that require unique identification, allowing the remainder of the elements to be expressed in a more compact format. For example, Fig. 2a shows an array of 24 data entries having 21 identical entries and three

varying entries, the varying entries being the first, third, and fourth locations. The three varying entries are considered outliers. A bitmap indicating which entries are outliers, including one bit for each data entry, appears as 10110000 00000000 00000000. The outlier

bitmap indicates with a 1 in the first, third, and fourth locations of the 24-element bitmap that the first, third, and fourth data entries are outliers. If the present invention were to use this configuration, it would require 24 bits to specify which elements were outliers, plus an additional bit to specify that more than one outlier exists. The value of the 21 identical entries is included only once. Although the outlier values must be stored individually, the total amount of data stored is significantly reduced. When just a few data values are outliers and the array contains seven or more elements, such as the array shown in Fig. 2a, the present invention will use a second outlier method for identification. See Fig. 2b. Fig. 2b shows a compression record where that same identification is accomplished with only 13 bits. As shown in Fig. 2b, the first bit indicates that it is a "few outliers" type data array. The next four bits indicate the number of outliers.

The number of outliers indicator is safely reduced by one because at least one outlier must exist to invoke this record. The next bit indicates whether the index is recorded in reverse order from the actual order of the data array. In situations where the outliers are towards the front of the data array, subsequent indices are specified with fewer bits when expressed in the backwards direction. The present invention examines both directions of a data array and chooses the direction that results in the smallest resulting data file. If only one outlier exists, the direction indicator is not used as it would be meaningless. The first outlier index in this

example is 3. Five bits are used to store this index because 5 bits are sufficient to indicate any number between 0 (00000₂) and the 23 indices of this example (10111₂). The next outlier must be at least one less than the first index, or 2 in this example. Therefore, the position can be indicated in only two bits. The final index can only be one less than 2, so the final index (0) is stored with one bit. Therefore, 13 bits identify the outliers in place of the 24-bit outlier bitmap.

When outliers are congregated at one end of an array, the outlier method can further compress the record. For example, Fig. 3a shows a 24-element data array where elements 15, 17 and 20 - 24 (indicated by index numbers 14, 16 and 19 - 23) are outliers. Because five outliers are clustered at the end of the array, the last four positions do not have

to be indicated individually. During the decompression process, the present invention keeps track of how many outlier positions remain to be determined. When that number matches the remaining number of indices, the remaining indices must be outliers and do not need indicators. The compression record for Fig. 3a is shown in Fig. 3b. Strip Top Bit

The present invention uses varying length data fields to identify data entries throughout its process. When the data files are decompressed, the invention must know the lengths of the fields. The present invention will size the fields to match the largest size necessary to convey the required information. For example, assume that Delta From Base was used to compress an array. If the deltas are 17, 0, 23, 45, 3, 8 and 29, the present invention will set the data field to 6 bits - the number of bits necessary to identify the number 45 in binary form.

In some situations, the present invention may determine that the first bit of every data entry is 1. For example, in a Delta From Base array, the base number might be 2000. The binary representation of 2000 is 11111010000. The initial "1" of that binary string represents a value of 1024. If the greatest Delta From Base value is less than 976, the present invention knows that all values of the first bit for each data entry are "1" and will remove the

top bit from the compressed file. When the file is decompressed, that "1" is added back in. This method saves one bit for every entry after the base entry less bits needed to define the modification. A compression record illustrating this example is shown in Fig. 4.

Sometimes, Strip Top Bit must deal with 0 as an element. It does this by setting the field width value to maximum. For example, when compressing an array with a maximum value of 2³⁰ - 1, five bits are necessary to specify the field width. If the largest value is in fact 2³⁰ - 1, then the file size will be stored as 11110. To express 0, the field width will be set to the maximum size or 11111. In other words, 0 is specified as 2^N - 1 , where N is

the size of the filed width value. Data Input Limits

The present invention, in its optimal form, works only with positive integer values. The upper limit is bound by Strip Top Bit, which uses the value 31 to specify subsequent data values of 0. This number limits bit widths for data to 30, which has a maximum decimal amount of 1,073,741,823. To ensure error free execution, the present invention may adjust the raw data to remain below this maximum number.

For example, the present invention adds an offset to all temperature data to make certain that the data consists only of positive integers. By default, all temperature readings have 40 degrees added to them. Of course, the actual value of the increase will depend on the accuracy of the reading. If temperature is measured in lOOths of a degree, the present invention will multiply the readings by 100 to turn the fractions into integers. The 40- degree offset becomes 4000 under these circumstances. This offset is subtracted when the data is decompressed.

After adding the temperature offset, the present invention checks all the data values to verify that they fall within the acceptable data range. If they do not. the present invention either raises them to 0 (if negative) or lowers them to 1,073,741,823 (the maximum value the present invention can handle). For example, if temperature us -5000, adding the offset results in -1000, which is less than 0. 0 is substituted for -1000.

It should be noted that the present invention is easily modified to accommodate negative numbers or a mix of positive and negative numbers. The offset technique applied to temperature reasons is easily expanded to address negative data entries. The present invention could also address negative and positive numbers by adding a sign entry where necessary, either as an inflection indicator or a sign indicator for each data entry.

Strip Zeros The first compression method analyzed by the present invention is Strip Zeros. When a typical number is stored in binary format, many of the binary digits at the beginning of the number are zeros. For example, 100,000 is expressed in binary format as:

00000000000000011000011010100000. The present invention recognizes that 15 of these bytes are not necessary and only the last 17 bits are required to identify the value that needs to be stored, resulting in:

1 10000110 10100000.

In this example, Strip Zeros saves 15 bits for a single data entry less the bits necessary to define the data field. Strip Zeros evaluates four distinct methods of compressing arrays of numbers and selects the most compact method. The four methods are: (1) uniform field size; (2) single

field size; (3) individual field size; and (4) two field sizes. When Strip Zeros is selected as the most efficient compression method by the present invention, only one of the four methods is used. A flow chart showing the logic behind the choice is demonstrated in Fig. 5. In the uniform method field size, the present invention determines that all array elements are exactly N bits in size. Only N - 1 bits are stored because the present

invention knows the first bit is 1 and applies Strip Top Bit. Fig. 6a shows a 6-index array

including the elements, their binary value, and the stripped binary value element field size in bits. In this example, a value of 64 (2⁶) was stripped from the element values. Fig. 6b shows the compression record for that array, demonstrating how the uniform size indicator of the

Strip Zeros compression method was used and how each element was identified with 6 bits. In this example, each data entry was reduced by a size of 5 bits.

The single field size Strip Zeros compression method is used when the uniform field size does not apply. In other words, some values will require different size fields than others. The bit width of the storage field is selected to be the bit width necessary to describe the largest element. Any preceding zeros are stripped from the values, but the top bit is not stripped. Elements that require smaller field widths retain zeros in the beginning bit positions as shown in Fig. 7a.

The individual field sizes Strip Zeros compression method is most effective when a large variation of field sizes exists. Storing the individual field sizes may be more compact than setting the width to accommodate the largest element value. The present invention will evaluate both methods and choose the smallest result. The individual field sizes method can take advantage of the Strip Top Bit technique since each element is separately compressed. Before each element in the compression record is a field size specifying how many bits are required to hold the element (without the top bit). Elsewhere throughout the present invention, this field will always be 5 bits wide. However, in the individual field sizes method, the maximum size of the field can be predetermined. In the example illustrated in Fig. 7c, only 4 bits are needed to identify the maximum field width of 10, which is the number of bits required to store 1025 after the top bit has been removed. The

present invention subtracts 2 from the number of bits to hold the field width of 10 (which is 4 bits) and includes it as the maximum size indicator (2 = 4 - 2). Two can safely be subtracted from the maximum field size because the cases of 0 and 1 correspond to arrays whose largest elements are 0 and 1, respectively. In these two cases, other Strip Zeros methods are found to be more compact. The example shown in Fig. 7a demonstrates that the individual field sizes method can save storage space. The original record was 76 bits, while the compressed record is 56 bits.

The two field sizes Strip Zeros compression method divides the array elements into two groups. The present invention starts with the first group being limited to data values

that require the smaller field widths to identify, while the larger bin holds all other data entries. For example, the present invention may determine that the smallest data value requires 4 bits to identify. All data values that can be identified in 4 bits will belong in the small size group. The size of the compressed file is calculated. Strip Zeros next increases the bit width of the smaller group by one, so that the small group includes the previous data entries as well as all entries identifiable with one more bit. The present invention then

recalculates the results. This process is repeated until the bit width of the smaller group is one less than the bit width of the larger group. If any result is the smallest overall compression, the present invention chooses that method. Fig. 7d shows a compression record using the two field sizes method for the data array shown in Fig. 7a in the individual field size method. The result in this case is 66 bits. While that size causes the present invention to choose the single field size method over the two field sizes method for this array, it shows a compression of 10 bits. In some circumstances, the present invention may be able to save an additional bit by listing the larger field width before the smaller field width. For example, if the large group requires 11 bits, the smaller size must be a bit width of 10 or less, expressed 1010₂and requiring only 4 bits to store (a savings of 1 bit). Simply reversing the process

decompresses data entries that are compressed by Strip Zeros compression.

Mode Compression

The Mode Compression method searches through an array for the value most frequently repeated - the mode of the array. It stores this value in the compression record only once, and treats all other values as outliers. The Mode Compression method yields excellent compression for parameters that are relatively constant such as the flow time parameter in the gas flow data record. For example, a reading may be taken every hour, resulting in precisely 60 minutes between readings. The present invention identifies the value of the data, strips the top bit, and transmits the value one time instead of once for each element. The number of array elements is either a fixed number or specified at the beginning of the compressed file, so the decompression is easily accomplished. If outliers exist in the array, Mode Compression adds an outlier map and the values of the outlier elements, and applies the Strip Top Bit method. The logic behind Mode Compression is shown in Fig. 8. Mode compression is capable of reducing the data file by the number of bits necessary to store all data elements minus one (the one that must be transmitted) less the bits necessary to define the compression. Decompression occurs simply by copying the saved data value in each entry of the array. Delta From Base

The Delta From Base compression method finds the lowest value in the array. This value is called the base and is subtracted from all the array elements to create deltas. Delta From Base yield particularly good compression when the base is significantly greater than the largest delta. For example, suppose an array had values of 6000, 6003, 6015, and 6008. The binary value of those numbers is: 0001011 1 01 1 10000; 00010111 01110011; 00010111 01111111; and

00010111 01111000. The first 12 bits of each of these numbers is identical. The present invention can send the base number one time, and follow it with the deltas from the base. In this example, the deltas are all 4 designated in 4 bits, resulting in a 12 bit savings per data entry, less the bits

necessary to define the compression. Decompression occurs by translating the base number

(restoring the top bit, if necessary) and adding each delta to that base for individual data

entries.

Delta From Base employs a number of strategies for adjusting the base, specifying the deltas and determining the outliers to minimize the overall compression record

size, such as using a single delta size, two delta sizes, and removal of outliers.. The logic flow of Delta From Base can be seen in Fig. 9. Delta From Base considers several different

methods of compression, chooses the best one, and then compares it to the results of other compression methods.

Delta From Base begins with the consideration of one delta bit width and no outliers. An example of an array that Delta From Base compresses efficiently is shown in

Fig. 10a, with a resulting compression record shown in Fig. 10b. When applying the no outliers Delta From Base method to the array produced in Fig. 10a, the base is 347 and the largest delta is 4022. The base bit width, after Strip Top Bit is applied, is 8 bits. The largest delta, 4022, requires 12 bits to identify. The base is transmitted one time in 8 bits and the remaining deltas are each transmitted in 12 bits.

Delta From Base next considers dividing the deltas into two separate groups, or two size bins. Bin size refers to the field width necessary to identify the data of the array, with the large bin size sufficient to identify the largest value stored. Delta From Base examines every smaller bin size, ranging from 0 to one less than the largest bin size. If any combination results in an overall smaller compression record size, Delta From Base saves those record parameters. Delta From Base performs this analysis by starting with the largest element value as the base. For example, examine the data array shown in Fig. 1 la. The largest value, selected for the base, is 30803. The deltas range from 0 to 493. The largest delta thus requires 9 bits to identify. The greatest compression ratio occurs, however, when the deltas are split into two groups, with the smaller deltas assigned 4 bits, or a maximum value of 15. Fig. 1 lb shows the compression record for this array and this method, indicating that the base width requirement is 14. The top bit is stripped, and the larger delta of 493 is assigned its width of 9 bits. The smaller delta size is assigned a width of 4 bits. Direction and lower size indicators are presented, and the deltas follow. This example has small bin deltas clustered at the end of the array, so the compression record need not include indices for all of them.

After Delta From Base has searched for the best no outlier representation, it then searches for the best outlier representation. An outlier is defined as a delta that is too

large to be included compactly in the compression record. Delta From Base can express the outlier as a raw value or as a value outside the delta range.

The goal behind removing outliers is to shrink the number of bits required to identify the remaining records that vary little from the base number, and thereby reduce the overall number of bits in the compressed file. For example, suppose that in an array of 24 elements, 23 elements required 4 bits to store and one element required 5 bits to store. By keeping the delta size within 4 bits, 23 bits were saved. If the number of bits required to identify the outlier is less than 23, a net compression has occurred. Delta From Base examines the elements in three ways: 1) it removes elements from largest to smallest; 2) it removes elements from smallest to largest, and 3) removes elements from top and bottom in

an optimized removal process towards the center of the element values. If a more compact record is found, Delta From Base adjusts the base to the new value, saves the new upper and lower delta values, and determines the field width necessary to handle the base and the

outliers.

Consider the array shown in Fig. 12a. If the base value were set to 28, the maximum delta would be 730 and would require 10 bits to store. If three values were excluded and the base was adjusted to 706, the maximum delta is 52, requiring only 6 bits to store. The number of saved bits is 4 per element, or 76 in this example. As the identification of the outliers will take 19 bits, the record is compressed by an additional 57 bits. The

compression record is demonstrated in Fig. 12b. Delta From Base - Outliers/Delta From Range considers specifying outliers as the distance form the record's delta range. For example, consider an array containing the elements 0, 23, 15, 4, 31, 19, 33, 39. The deltas range from 0 to 39, requiring 6 bits to store. If the delta fields were reduced by 1 to 5 bits each, all of the values would fit except for 33 and 39. Delta From Base can use a separate size field of six bits to hold 33 and 39. Another approach is to subtract the smaller delta range of 33, leaving 0 and 6, which fit in only 3 bits. This approach has the advantage of using fewer bits for specifying outliers, but does take an extra five bits to specify the outlier range as compared to using raw values described earlier. If the outlier deltas are in the negative direction, then the absolute values are stored. Delta From Base uses a sign flag to indicate whether the outliers extend towards the top of the range or towards the bottom. Fig. 13a shows a 24-element array with a delta range from 0 to 543, taking a field of 10 bits to store. Delta From Base determines that removing the maximum delta, or 543, allows the remaining deltas to fit in 9 bit fields. The outlier value is shifted down by 512 (2⁹) to 31 , which takes 5 bits to store. The adjusted base is represented only once and removed from the remaining elements. The outlier element has an additional

512 removed from it. The remaining values are the deltas from base. The compression record for this example is shown in Fig. 13b.

Difference From Last

Difference From Last compression is similar to Delta From Base, but the deltas are the differences from the previous values and not from a base value. This compression type works particularly well for parameters that steadily ramp up/down or cycle through a sampling period, such as a tank level or temperature sensor. Difference From Last begins by subtracting each element from its subsequent element to determine the differences. It then identifies if and where there are inflections, or changes in the signs of the differences. For example, the values of 6000, 6003, and 6015 have binary values of: 00010111 01110000; 000101 11 01110011 ; and 000101 11 0111 1111. Difference From Last may report the second and third values as 0011 and 1111, saving the first 12 bits of each entry. Decompression requires that the first delta be added to or subtracted from the number before it, and the next delta be added to or subtracted from the result of the first delta calculation, and so on. If there is a change in sign of the deltas, then either an inflection map must be included in the compression record or else the values that change inflection must be treated as outliers. An inflection map is similar to an outlier bit- map. It consists of a number of bits equivalent to the number of data entries in the data array.

The bit that corresponds to each entry that changes inflection from the previous entry is stored

as 1 while other bits are stored as 0.

Difference From Last employs two methods for optimizing the compression record size. The first method splits the differences into a large size bin and a small size bin. It next finds the optimum lower bin size in a fashion similar to that previously discussed. The second method shrinks the difference size and adds outliers to those removed because of inflection. The logic flow of the Difference From Last method can be seen in Fig. 14.

Difference From Last begins by considering the simplest case of just one difference size and no outliers. An inflection record is included if necessary. Fig. 15a shows

a 24-element array where the initial value is 116. The differences range from 10 to 19 without inflections. Fig. 15b shows the compression record for the array. The compression record indicates no inflections, no outliers, change in the positive direction, a field size, a base

size, and the deltas in order.

Difference From Last next considers dividing the differences into two size bins. Difference From Last searches every possible lower bin size, which can range from 0 to one less than the largest bin size. If any two-size combination results in an overall smaller compression record size, then Difference From Last saves the record parameters. Fig. 16a shows an array where the initial value is 4547 and the differences range from -1131 to 994. Difference From Last will create an inflection record to identify when the differences change between positive and negative values. The inflection record specifies where the differences

change signs, i.e., index numbers 2, 3, 7 and 15. The difference array is then converted to all positive values. Next, Difference From Last determines that the larger bin size must be 11 bits wide. It calculates the resulting final output for smaller bin sizes of 0 to 10 bits. In this example, the most advantageous example occurs when the smaller bin size is 8 bits. Fig. 16b shows the compression record for this example. The record contains an inflections indicator, few outliers indicator, number of outliers, index direction backwards indicator, and the outlier index records. It then indicates the initial size of the field to hold the initial value and the initial value itself. Two different size bins are then indicated with more than one outlier. The record then includes the larger and smaller difference field sizes, followed by the values of the differences as well as a bit map which indicates which data entries are small bin entries and which bin entries are large bin entries.

Finally, Difference From Last will consider designating specific differences as outliers. Difference From Last splits the differences based upon sign of the value. Those values whose sign (positive or negative) is occurs less frequently are considered outliers. The remaining differences are set to positive values and the difference size is based upon the largest non-outlier difference. Difference From Last then shrinks the difference size, designating the now excluded values as outliers. If a particular difference size results in a

smaller compression record, Difference From Last saves the parameters. Fig. 17a shows an example array where the initial value is 1048 and the differences range from -1 to -739. There are no inflections. Difference From Last sets all the differences to positive. Difference From Last starts with 10 bits to store 739 and then shrinks the difference size from 10 to 0 bits. In this example, the optimal size occurs at 5 bits, leaving three outliers in index positions 1, 3 and 4. The compression record is shown in Fig. 17b. Decompression of any Difference From Last compression method is achieved by reversing the compression steps taken.

Ratio Compression

Ratio compression is used when the elements of an array are related proportionally to a data set that can be created from other arrays already included in the compressed record. Processing begins by creating an array of estimated values, and then finding a ratio that minimizes the range of deltas between the estimated and actual values.

The present invention then searches for the optimum way to describe the deltas. The logic flow of ratio compression is shown in Fig. 18.

For compressing natural gas flow data, ratio compression is used for three types of parameters: pressure extension, volume and energy. For each, an estimated array is

created from the arrays that have already been compressed, and the deltas from the actual arrays are then calculated. If the deltas are small, then a compression output record is created. The estimated arrays are created in the following ways:

1. The estimated pressure extension array is calculated by taking the square root of the product of the differential and static pressures. 2. The estimated volume array is created by multiplying flow time by pressure extension and then by C. If the pressure extension array is not included in the gas flow data record, then the estimated pressure extension array is used. If either the flow time or C array is not included, then they are assumed to be the value of 1.

3. The estimated energy array is simply the actual volume array. For non-gas flow applications, the estimated arrays are created in a similar manner from arrays included in the compression record.

The present invention checks to make sure that multiplying the differential and static pressures will not result in an overflow before performing the square root, i.e., that the result will not exceed the maximum storage of 32 bits. The product size is estimated by rewriting the product calculation in logarithms: log₂ (product) < log₂ (differential pressure) + log₂ (static pressure). Log₂ is defined as the number of bits required to store a value. If the product were to overflow, then log₂ (product) would exceed 32 by no more than N bits where N is calculated as N = log₂ (differential pressure) + log₂ (static pressure) - 32. If the present invention determines that N is greater than 0, it will prevent an overflow by first shifting

down both elements by a total of N bits before multiplying them, where N is first rounded up to be an even number. The present invention shifts down the larger of the two elements first until it is the smaller, and then alternates shifting down the respective elements. This technique is called "Shrink Product". Shrink Product then shifts the square root back up by N/2 bits. The equation for the process is: estimate = [(differential pressure/2^A) x (static pressure/2^B)]° ⁵ x 2^N/2, where N = A + B. A and B are the shift values and are increased one at a time while the larger of differential or static pressure is reduced by a factor of 2. When A + B equals N, then the shifting process has completed. Similarly, the present invention prevents an estimated volume overflow with the following equation: estimated volume = pressure extension x flow time x C. The present invention performs the calculation in two steps: first multiplying the pressure extension by flow time and then multiplying the result by C. Either step can result in an overflow, defined in this case as exceeding the maximum Strip Top Bit representation of a number, meaning 30 bits. The product is re-written in logarithmic form as previously described and Shrink Product is used to avoid overflows. In this case, the two equations are

rewritten log, (product) < log₂ (multiplicand 1) + log, (multiplicand2). If the product were to overflow, then log, (product) would exceed 30 by N, where N = log₂ (multiplicand 1) + log₂ (multiplicand2) - 30,

and the result would be off by a factor of 2^N1+N2. The present invention will correct this with an adjusting ratio.

For some applications of the present invention, the data arrays may be amenable for Ratio compression. If an array is proportionally related to another, a simple constant multiplied to one array will provide the values of another. In this situation, the original array may be used as the estimate array. In the situation where the second array is proportional to the first with a constant offset added to the multiplication result, the estimate

is more difficult. If the offset is small compared to the results of the multiplication, it can be ignored. If it is large, it may skew the ratio calculation. In this situation, the safest method is

to remove the offset before invoking ratio compression. However, this offset must be known, and would have to be included in the present invention build for the purpose intended. Some

ratios may include two offsets that vary over time. In this case, the best approach is to use the lowest values of each array as separate offsets. The value of the offset must be included in the compression record.

Ratio compression assumes that the ideal ratio is the one that minimizes the sum of deltas when specified as:

∑ delta_n = ∑ [actual-, - (K x estimate,,)] = ∑ actual„ - K x ^ estimate-,. If ∑ delta_n is set to equal 0, then K = ∑ actual-/^ estimate,,. If K were a floating-point number, then it would take 32 bits to specify in the compression record. The present invention uses a more compact representation by limiting the mantissa to 15 bits (14 bits after Strip Top Bit) and the exponent to 5 bits. With a flag field and a mantissa size field, the ratio record can range from 1 bit to 24 bits. Allowing the mantissa to become greater than 15 bits does not add any benefits. The product of the mantissa with the estimate is limited to 31 bits. so increasing the precision of the mantissa only reduces it for the estimate.

The exponent is essentially the number of places to shift the product to the right. Therefore, the product of the ratio and the estimate is the following equation:

K x estimate-, = (mantissa x estimate-,) x 2^"exponent. The dynamic range of the ratio is from 4.66 x 10^"10 to 3.28 x 10⁴. The present invention

calculates the initial ratio in four steps:

1. Calculate the sums of the estimate and actual arrays, while ensuring that neither sum will overflow. The exponent is initialized to 0 and adjusted to prevent an overflow.

2. Adjust the actual array sum upward and the estimate array sum downward to size the mantissa at 15 bits, and adjust the exponent accordingly.

3. Divide the estimate array sum into the actual array sum to create the mantissa.

4. If the exponent is greater than 31, then reduce the mantissa downward N

bits to bring the exponent down to 31.

The present invention aborts further ratios processing if either array sum is 0, the calculated mantissa is 0, or the calculated exponent is less than 0. The present invention will also take steps to prevent an overflow in summing ratio arrays. Before summing either the estimate or actual arrays, the present invention will verify that the result will not exceed the maximum storage size of 32 bits. The sum is quickly estimated as:

Sum = ∑ elements < largest element x number of elements

The equation can be written logarithmically as: log, (sum) < log, (largest element) + log, (number of elements) If the sum were to overflow, than log,(sum) would exceed 32 by no more than N bits, where N is calculated as:

N = log, (largest element) + log, (number of elements) - 32. If the present invention determines that N is greater than 0, it can prevent an overflow by first shifting down each element N bits before summing pursuant to the equation: sum = ∑ (elements/2^N). The present invention increases the exponent by N when preventing an estimated array sum overflow and reduces the exponent by N when preventing an actual array sun overflow. The table presented in Fig. 19 is a sample gas flow record with 22 volume values and estimated volume values. The estimated volume was created by multiplying flow time, C and the square root of the product of differential and static pressures, as previously described. First, the present invention checks for a potential overflow in summing the actual and estimated volume arrays. In this example, an overflow may exist. The present invention determines that adjusting all of the estimated volume elements downward by a factor of 2² is necessary. After completing the calculations, the present invention determines that the actual volume sum is 15016. Next, the present invention adjusts the estimated and actual volume sums so that the calculated ratio will be 15 bits in size. For this example, the present invention adjusts the estimated volume sum downward (378166712/2" = 184651) and the actual volume sum upward (15016 x 2¹⁸ = 3936354304). The exponent is increased by 11 and then by 18 to a new value of 31. Third, the present invention divides the actual volume sum (3936354304) by the estimated volume sum (184651) to create the mantissa (21317). Finally, the present invention recognizes that the exponent is greater than 31 , and that it needs make no further adjustments. With a mantissa of 21317 and exponent of 31 , the initial ratio has the effective value of 21317 x 2^"31 = 9.9265 x 10^"6. After finding the initial ratio, the present invention must then determine the compression record size corresponding to this ratio. Next, it will shrink the mantissa by one bit and reduce the exponent by one iteratively. With each iteration, the present invention calculates what the compression record size would be for the new ratio along with 1 added to the ratio and 1 subtracted from the ratio. If a new mantissa and exponent result in a smaller compression record, then the present invention saves these values. The present invention treats the new ratio as a better approximation of the optimum ratio. The present invention then adjusts the new mantissa back up to 15 bits and the exponent upward, provided that the exponent does not exceed 31. The present invention repeats the mantissa shrinkage cycle repetitively until the compression record size no longer decreases. Tabular examples of this

process, through four iterations, are demonstrated in Figs. 20a - 20d.

When the present invention completes the mantissa shrinkage cycle, it has the

parameters necessary to create a no outlier compression record with one delta size. For example, consider the pressure extension array of Fig. 21a. The present invention creates an estimate array by taking the square root of the product of the differential and static pressures as shown in Fig. 21b. The present invention determines that the optimum ratio has a mantissa of 289 and an exponent of 8. This is the equivalent of 1.129. Multiplying the estimated pressure extensions by this ratio creates a range of deltas from -2 to 13, as seen in Fig. 21c. The present invention picks the lowest delta as the base and adjusts the deltas upwards. Since the largest delta is 15, the deltas can fit into 4 bits. The compression record for the pressure

extension array is shown in Fig. 2 Id.

Ratio Compression - No Outliers/Two Delta Sizes

Ratio Compression next considers dividing the deltas into two size bins. Ratio searches every possible lower bin size, which can range from 0 to one less than the largest bin size. Ratio repeats this process by using the largest delta as the base and subtracts each delta from it. If any two-size combination from either search results in an overall smaller compression record, ratio compression will save the parameters. For example, see Fig. 22a. Flow time, C and pressure extension values are not available, so the present invention uses the square root of the product of the differential and static pressures for the volume estimate. The present invention determines that the optimum ratio has a mantissa of 3 and an exponent of 2. This is the equivalent of 0.75 (3x2^~2). Multiplying the estimated volume values by this ratio crates a range of deltas from -455 to 15 as seen in Fig. 22b. The present invention picks

the highest delta of 15 and subtracts the deltas from it. The deltas now range from 0 to 470. where the larger bin is 9 bits wide. The greatest compression comes by splitting the deltas into two bins, where the smaller bin holds deltas that are 61 or smaller (6 bits wide). The compression record is shown in Fig. 22c. Ratio Compression - Outliers/Delta from Range

Ratio compression next considers specifying outliers as the value of the

record's delta range. The method is the same as that described for the Delta From Base - Outliers/Delta From Range method previously discussed. Ratio compression shrinks the

delta bin size, or field bit width, one bit at a time and then calculates the value of the difference between the top of the data range and the outliers. Ratio repeats this process by using the largest delta as the base and subtracting each delta from it. If any combination

results in an smaller compression record, the parameters are saved. Fig. 23a shows an example pressure extension array. The present invention uses the square root of the product of the differential and static pressures for the pressure extension estimate. The present invention determines that the optimum ratio has a mantissa of 1 and an exponent of 0. Since the ratio is 1 , the present invention uses a single flag to indicate a ratio of one in the compression record. Fig. 23b shows an array for the same example readings as Fig. 23a with volume, the ratio estimate, the delta, and a value for (base - delta). The present invention picked the highest delta of 187 and subtracted the other deltas from it. Using a delta size of 7 and treating all deltas below 68 as outliers provides the optimal size compression file. Outlier values range up to 993 outside the acceptable delta range, requiring 10 bits to store. Fig. 23c shows the compression record for this example. The first 5 lines of the record indicate that the data is compressed by ratio compression with many outliers and indicate the positions of those outliers. The base is indicated in the next three lines as positive 68. A flag indicating direction from base for the outliers is next, bit lengths are specified, and the outlier values are identified.

Decompression of records compressed by Ratio Compression applies the selected ratio equation to the other decompressed values of the transmitted data, followed by addition or subtraction of the deltas that designate the variation for each data entry from the ratio. Other Compression Methods

In addition to the five core compression methods, the present invention applies Time Compression, No Flow Record Removal, Daily Summary Compression, and One

Record Compression to source files. Readings are typically taken once an hour for the purposes that the present invention was designed. The present invention can compress the time array very compactly if every element is taken at the top of the hour and every

subsequent reading is one hour later. Under these circumstances, the present invention identifies that there are no outliers and specifies the first time reading by hour after midnight.

If either of these conditions is not met, an outlier record is created.

If any element in the time array violates either condition (i.e., not at the top of the hour exactly one hour after the previous reading), the present invention creates an outlier bitmap for all the data entries. The first entry is flagged as an outlier. Any other entry that is not exactly 60 minutes from the preceding entry is also flagged as an outlier. The full value of all outliers is stored in the compressed file. During decompression, the outlier values are placed in the time array, and all other values are determined by adding 60 minutes to the preceding data value.

After the time array is compressed, the invention searches the data for records for "no flow" records. No flow records exist when all readings are zero with the exception of static pressure and temperature. The invention then removes the zero elements form the differential pressure, flow time, C, pressure extension, volume and energy arrays. The records remaining for compression are reduced by the number of no flow records. The present invention uses the outlier method to identify which records are no flow. The present invention creates a summary value of each given parameter array after compression. The summary value may be the daily average or it may be a total across

24 hours of records. The present invention begins by calculating the average or total of a parameter array and then subtracts the estimated value from the actual value. A number of summation rules are used. To estimate the summary average, the present invention first checks for an overflow, i.e., the sum exceeds the maximum storage space of 32 bits. The upper limit of the sum is estimated by:

sum = ∑ elements < largest element x number of elements. The equation is rewritten as: log₂ (sum) < log₂ (largest element) + log₂ (number of elements). If the sum were to overflow, then log₂ (sum) would exceed 32 by no more than N bits where

N = log₂ (largest element) + log₂ (number of elements) - 32. If the present invention determines that N is greater than 0, it can prevent an overflow by shifting down each element N bits before summing and shifting the average back up by N bits. The equation employed is: average = {[ ∑ (elements / 2^N) ] / number of elements} x 2^N. To estimate the summary total, the present invention will check during the summation whether the total exceeds the upper limit of 31 bits. If the limit is exceeded, the present invention will set the total to the upper limit. Next, it extrapolates the total upwards if there are fewer than 24 records. The revised total is calculated by: total = [( ∑ elements) x 24] / number of elements.

The equation is rewritten as: log, (total) < log, ( ∑ elements) + log₂ (24 / number of elements). If the log₂ (total) can equal or exceed 32, then the present invention sets the total to the upper limit. Otherwise, it checks if the [( ∑ elements) x 24] portion of the extrapolation will exceed 32 bits by calculating

N = log₂ (largest element) + log₂ (24) - 32. If N is greater than 0, the present invention can prevent an overflow by first shifting down ∑ (elements) by N bits before dividing by (number of elements), then shifting the total back up

by N bits. The equation used is: total = ({[ ( ∑ elements) / 2^N] x 24} / number of elements) x 2^N.

After creating the estimate of the average or total, the present invention determines the difference with the actual summary. Depending on the size of the difference, the present invention will create on of three types of compression records. If the difference is between -8 and +7, the small size summary compression record is used. The compression record includes a small size indicator (1 bit), a negative sign indicator (1 bit) and a value for

the difference (3 bits).

If the difference is outside the -8 to +7 range, the actual summary value must be included in the compression record. If an average summary value is less than or equal to

the size of the largest element in the parameter array, or if a total summary value is less than or equal to the size of the total estimate, the present invention can assume the size of the summary value in the compression record. The compression record will include a flag for the summary size being other than small size, a flag to indicate the value is within the calculated range, and the summary value.

If the summary value cannot fit into the either of the two previous fields, the present invention will create a large size compression record that includes the summary size field. In this case, the compression record must add a designator for the size of the field needed to hold the value (after Strip Top Bit is applied).

Finally, if each array in the input record has only one reading, the present invention will put all of the parameters in one array instead of having an array for each type of data reading. If daily summaries are set for that reading, they are interspersed with the readings from that hour. The composite array is then compressed by Strip Zeros.

Claims

I claim:

1. A method for compressing data files comprising the steps of: acquiring raw data entries and organizing the raw data entries into data arrays; predicting the size of a compressed file from each of a plurality of compression methods; selecting the optimum compression method for each data array; applying the optimum compression method for each data array to that data array; and creating a computer compression record that includes identification of the manner in which data was stored and the data itself of a significantly smaller size than the original data file.

2. A method according to claim 1 wherein one of the plurality of compression methods comprises identifying that each data entry is identical to every other data entry and storing the entry only one time.

3. A method according to claim 1 wherein one of the plurality of compression methods comprises removing unnecessary zeros from the beginning of a character string that is a binary representation of a data value and storing only the meaningful portion of the character string.

4. A method according to claim 1 wherein one of the plurality of compression methods comprises selecting one data entry as a base value of a data array, subtracting the base value from remaining data entries of the data array, and storing the base value for one data entry and the calculated differences for the remaining data entries.

5. A method according to claim 1 wherein one of the plurality of compression methods comprises selecting and storing a first data entry of a data array as a base, and determining the difference between each remaining data entry and its preceding data entry and storing the differences for the remaining data arrays.

6. A method according to claim 1 wherein one of the plurality of compression methods comprises determining a mathematical equation that will approximately describe the data entries of one data array as a comparison of two or more other data arrays, determining

the variation of the actual data entries from the results of the mathematical equation applied to the corresponding data entries of the two or more other data arrays, and storing the mathematical equation and each variation.

7. A method according to claim 1 wherein the plurality of compression comprises: identifying that each data entry is identical to every other data entry and storing the entry only one time; removing unnecessary zeros from the beginning of a character string that is a binary representation of a data value and storing only the meaningful portion of the character string; selecting one data entry as a base value of a data array, subtracting the base value from remaining data entries of the data array, and storing the base value for one data entry and the calculated differences for the remaining data entries; selecting and storing a first data entry of a data array as a base, and determining the difference between each remaining data entry and its preceding data entry and storing the difference; and determining a mathematical equation that will approximately describe the data entries of one data array as a comparison of two or more other data arrays, determining the variation of the actual data entries from the results of the mathematical equation applied to the corresponding data entries of the two or more other data arrays, and storing the mathematical equation and each the variation.

8. A method for compressing data files comprising the steps of: acquiring raw data and organizing the data into data arrays of similar type where the raw data represents readings from instruments, including, but not limited to, entries identifying the time of readings and amount of time between readings; determining which entries representing the amount of time between each reading was a predetermined uniform amount of time, and, if so, storing only the beginning reading time; determining whether data entries indicate that no activity occurred between a time reading and a next time reading, and, if so, removing the record corresponding to the next time reading; predicting the size of a compressed file that would result from each of a plurality of compression methods applied to each data array;

selecting the optimum compression method for each data array; storing instructions indicating data compression methods and data field sizes; storing data compressed by the plurality of compression methods; storing data entry values that are excluded from the data stored by the plurality of compression records;

compressing and storing summaries of a day's worth of data; and creating a computer compression record that includes identification of the manner in which data was stored and the data itself, resulting in a compressed computer file of significantly smaller size than the original data file.

9. A method according to claim 8 wherein one of the plurality of compression methods comprises identifying that each data entry is identical to every other data entry and

storing the entry one time.

10. A method according to claim 8 wherein one of the plurality of compression methods comprises removing unnecessary zeros from the beginning of a character string that is a binary representation of a data value and storing only the meaningful portion of the character string.

11. A method according to claim 8 wherein one of the plurality of compression methods comprises selecting one data entry as a base value of a data array. subtracting the base value from remaining data entries of the data array, and storing the base value for one data entry and the calculated differences for the remaining data entries.

12. A method according to claim 8 wherein one of the plurality of compression methods comprises selecting and storing a first data entry of a data array as a base, and determining the difference between each remaining data entry and its preceding data entry and storing the difference.

13. A method according to claim 8 wherein one of the plurality of compression methods comprises determining a mathematical equation that will approximately describe the data entries of one data array as a comparison of two or more other data arrays, determining the variation of the actual data entries from the results of the mathematical equation applied to the corresponding data entries of the two or more other data arrays, and storing the mathematical equation and each variation.

14. A method according to claim 8 wherein the plurality of compression

methods comprises identifying that each data entry is identical to every other data entry and storing the entry only one time; removing unnecessary zeros from the beginning of a character string that is a binary representation of a data value and storing only the meaningful portion of the character string; selecting one data entry as a base value of a data array, subtracting the base value from remaining data entries of the data array, and storing the base value for one data entry and the calculated differences for the remaining data entries; selecting and storing a first data entry of a data array as a base, and determining the difference between each remaining data entry and its preceding data entry and storing the difference; and determining a mathematical equation that will approximately describe the data entries of one data array as a comparison of two or more other data arrays, determining the variation of the actual data entries from the results of the mathematical equation applied to the corresponding data entries of the two or more other data arrays, and storing the mathematical equation and each the variation.

15. A method for compressing data files comprising the steps of: acquiring raw data entries and organizing the data entries into data arrays of similar

type; determining an equation that closely describes each data entry of one data array as a comparison of a plurality of other corresponding data entries from other data arrays; determining a variation of each actual data entry from the result of the equation; storing the equation; and storing the variations.

16. A method for compressing data files comprising the steps of: acquiring raw data entries and organizing the data entries into data arrays of similar

type; determining the number of bits necessary to identify each data entry; storing each data entry in a data field where the data field width equals the number of

bits necessary to store the data entry.

17. A method according to claim 1 further comprising the steps of: acquiring raw data entries and organizing the data entries into data arrays of similar type; determining the number of bits necessary to identify each data entry; storing each data entry in a data field where the data field width equals the number of bits necessary to store the data entry.

18. A method according to claim 7 further comprising the steps of: acquiring raw data entries and organizing the data entries into data arrays of similar

type; determining the number of bits necessary to identify each data entry; storing each data entry in a data field where the data field width equals the number of bits necessary to store the data entry.

19. A method according to claim 8 further comprising the steps of: acquiring raw data entries and organizing the data entries into data arrays of similar

bits necessary to store the data entry.

20. A method according to claim 14 further comprising the steps of: acquiring raw data entries and organizing the data entries into data arrays of similar

21. A method for compressing data files comprising the steps of:

acquiring raw data entries and organizing the data entries into data arrays; applying a compression method to the data entries; identifying data entries that would allow more efficient compression if they were excluded from the data entries to which the compression method was applied; storing the excluded data entry separately from the data entries to which the compression method was applied.

22. A method according to claim 1 further comprising the steps of: identifying data entries that would allow more efficient compression by the plurality of compression methods if the identified data entries were excluded from the data entries to which the plurality of compression methods was applied; storing the excluded data entry separately from the data entries to which the compression method was applied.

23. A method according to claim 7 further comprising the steps of: identifying data entries that would allow more efficient compression by the plurality of compression methods if the identified data entries were excluded from the data entries to which the plurality of compression methods was applied; storing the excluded data entry separately from the data entries to which the compression method was applied.

24. A method according to claim 8 further comprising the steps of: identifying data entries that would allow more efficient compression by the plurality of compression methods if the identified data entries were excluded from the data entries to which the plurality of compression methods was applied; storing the excluded data entry separately from the data entries to which the compression method was applied.

25. A method according to claim 14 further comprising the steps of: identifying data entries that would allow more efficient compression by the plurality of compression methods if the identified data entries were excluded from the data entries to which the plurality of compression methods was applied; storing the excluded data entry separately from the data entries to which the compression method was applied.

26. A method according to claim 1 further comprising the steps of: reading the computer compression record, identifying how data entries were stored, and reversing the compression process to recreate the original data array.

27. A method according to claim 7 further comprising the steps of: reading the computer compression record, identifying how data entries were stored, and reversing the compression process to recreate the original data array.

28. A method according to claim 8 further comprising the steps of: reading the computer compression record, identifying how data entries were stored, and reversing the compression process to recreate the original data array.

29. A method according to claim 14 further comprising the steps of: reading the computer compression record, identifying how data entries were stored, and

reversing the compression process to recreate the original data array.

30. A method according to claim 17 further comprising the steps of: reading the computer compression record, identifying how data entries were stored, and reversing the compression process to recreate the original data array.

31. A method according to claim 21 further comprising the steps of:

reading the computer compression record, identifying how data entries were stored, and reversing the compression process to recreate the original data array.

32. A method according to claim 22 further comprising the steps of: