AUTOMATED METHOD FOR COMPRESSING BINARY STRINGS WITHOUT INFORMATION LOSS, AND RELATED AUTOMATED METHOD FOR DECOMPRESSING.
* * * * * * * The present invention concerns an automated method for compressing binary strings without information loss, or lossless compression method, and related automated method for decompressing, which makes possible, in a simple, fast and inexpensively producible way, to minimise in a lossless way the dimension of binary strings. The present invention further concerns the instruments necessary to perform the automated methods and to the apparatus performing the same.
It is known that one of the main problems of computer technology and, more broadly, of telecommunications, lies in the larger and larger amount of data needed for processing. This causes a heavy occupation of memories readable by computers, such as removable (e.g. CD-ROMs and floppy disks) or fixed (the so-called Hard Disks of computers) memory media, and a consequent long occupation of telecommunications networks during information transmission, which involves higher transmission error probability, higher transmission cost, and physical limitation to the feasibility of specific real time applications. In this regard, it is possible to consider some applications for transmitting large data quantity by employing transmission channels having limited capacity; in particular, it is possible to recall the attempt for carrying out an interactive television
service, known as Video-on-Demand, based upon the twin wire telephone network, that, due to the great limitations of the channel, has to necessarily transmit a smaller data quantity, making consequently the television image resolution worse. Considering that the quantity of information presently stored in computers and/or exchanged through telecommunications networks is rapidly growing, also due to the big push exerted by the Internet which extensively uses information having large data quantity (such as digital images), the problem of reducing the quantity of data representing both in- formation and application programs is extremely critical.
In order to solve such problem, many solutions have been developed, providing for an encoding of data, represented in binary format, according to suitable methods in order to reduce the memory occupation. Generally, such encoding methods are particularly advantageous when applied to specific types of data to be encoded, having specific characteristics. In other words, each individual encoding method is more or less advantageous depending on if it is applied, for instance, to data representing stationary images or to data representing video images or to data representing sounds. The compression methods can be classified according to the congruence between decompressed data and original data, dividing the concerned compression methods into methods having no information loss or "lossless" methods, when the data reconstructed or decompressed from the compressed data are identical to the original ones, and methods hav-
ing information losses or "lossy" methods, in which the reconstructed data lose a portion of the original data information. Obviously lossless methods are preferable and, in some specific applications when all the information must be kept, are the only ones that may be used. Examples of lossless methods are the Huffman encoding method and the Lempel-Ziv-Welch, or LZW, encoding method.
Such methods have some drawbacks.
For instance, the Huffman method operates only on static strings, creating a corresponding dictionary for each string, while the LZW method operates on dynamic strings, creating a dynamically updated dictionary. Consequently, both said methods are complex and their computer execution is slow.
Moreover, just because each encoding method is suitably designed in connection with specific types of data to be encoded, newer and newer encoding requirements for data having particular characteristics are being encountered. This causes a certain difficulty of use, especially in case of transmission of composite data, requiring the use of different encoding methods.
Therefore, the search of improved encoding methods aimed at even more compressing the data to be stored and/or transmitted is being even more developed.
It is an object of this invention, therefore, to provide an automated method for compressing binary strings without information loss, or lossless compression method, which is simple, fast executed by a com-
puter, and which makes possible minimise the number of bits needed to represent the number under consideration.
It is another object of this invention to provide such a compression method which generates compressed data strings suitably organised so as to be possibly further compressed.
It is still object of this invention to provide the related automated method for decompressing, and the instruments necessary to perform the automated methods and the apparatus performing the same.
It is specific subject matter of this invention an automated method for compressing, without information loss, or lossless compression method, a binary string S of input data into a compressed binary string E of output data, characterised in that it comprises the following steps:
A. fixing a sorting O of the 2" binary values (B), which may be assumed by a binary word comprising n bits, according to a rule of sorting: O = {B0,B B2,...,B2»_l} = {Bk}k=0 l 2 r_λ ;
B. inizialising the output string E to the void string:
£ ="" ;
C. subdividing the input string S into words each comprising n bits;
D. for each binary value Bk of the sorting O, for k = 0,1,2,...,(2" -2), as- suming that S0 = S ,
D.1 locating within the string Sk the Nk words, with Nk ≥ 0 , having the binary value Bk , D.2 in the case when Nk > 0 , storing in an array Ak the distance dt k , for 7 = 0,1,2,...,Nk - 1 , of each word located during step D.1
with respect to the preceding word having binary value Bk , in
such a way that the distance d " of the first located word is
equal to the number of words of the string Sk which precede it
and the distance dt k of the located words following the first
one is equal to the number of words interposed between the
word located during step D.1 and the preceding word having
binary value Bk ,
D.3 juxtaposing to the output string E a datum related to the pres¬
ence of at least one word having binary value Bk within the
D string Sk and, in the case when Nk > 0 , the elements di * , for
i - 0,1,2,..., Nk - 1 , stored in the array Ak during step D.2,
D.4 constructing a string Sk+i obtained by eliminating from the
string Sk the Nk words, with N ≥ 0 , located during step D.1
having the binary value Bk , and repeating loop D,
E. juxtaposing to the output string E the number N2»_ι of words of the
last string S2"-ι -
Preferably according to the invention, the datum related to the pres¬
ence of at least one word having binary value Bk within the string Sk ,
which is juxtaposed to the output string E during step D.3, is a flag bit, still
more preferably equal to "1", in the case when at least one word having
binary value Bk is present within the string Sk , or otherwise is equal to
"0", in the case when no word having binary value Bk is present within
the string Sk .
Alternatively, the output string E may further comprise an initial flag
bit for indicating whether at least one binary value Bk is not present within the input string S (and, therefore, the specific flags of each individual binary value Bk must be read) or all the binary values Bk are present (and, therefore, the output string E does not comprise specific flags of each in- dividual binary value Bk because all the values must be read).
Always according to the invention, the datum related to the presence of at least one word having binary value Bk within the string Sk , which is juxtaposed to the output string E during step D.3, may be the number N^ of words located within the string Sk . Still according to the invention, the sorting rule may assume the 2" binary values (B) as positive binary values and the sorting 0 fixed during step A may be either the decreasing sorting or the increasing sorting of such positive binary values.
Furthermore according to the invention, during step A the method may count the number of occurrences of each binary value Bk within the input binary string S, and the sorting rule may fix the sorting O according to either the decreasing sorting or the increasing sorting of the occurrences of the binary values Bk within the input binary string S, the method further comprising the step: F. juxtaposing to the output string E data related to the sorting O fixed during step A.
In this way, the method is automatically adaptive to the type of file to be compressed, in conformity to the occurrences of the binary values.
Preferably according to the invention, during step D.3 the elements
d k are juxtaposed to the output string E according to a binary encoded
representation.
Still more preferably according to the invention, said encoded repre¬
sentation defines a first positive integer value gk , with gk ≥0 , and a sec-
ond positive integer value χk , with xk > 0 , and the value of dt * is repre¬
sented according to:
- a first mode, in the case when
- a second mode, in the case when
- a third mode, in the case when
D
Always according to the invention, according to said first mode, di *
may be represented:
- in the case when gk > 0 , by (gk + 1) binary bits according to the
formula ([3]):
where "&" is the juxtaposition operator and [z\ is the positive
two's complement binary representation having t bits of the value Z;
and
in the case when gk = 0 , by only one binary bit equal to
"0"
Still according to the invention, according to said second mode, d
k
may be represented by (g
k +x
k + 2) binary bits according to the formula:
Furthermore according to the invention, according to said third mode,
di k is represented:
A -2*»
- in the case when χk > 0, by gk + χk + 2 +M binary
bits, where In[x] is the operator returning the integer part of X,
according to the formula:
W where & "l" is the operator constructing a string including W bits y=ι
W_ equal to "1" and Re is the operator returning the remainder of Y
W the division — , i.e. it is equal to W -In\ ; and
Y
in the case when binary
bits according to the formula:
Always according to the invention, for each binary value B
k of the sorting O, for k = 0,1,2,...,(2" - 2), it may be
8k = 8 xk = x Still according to the invention, the method may also comprise the step:
G. juxtaposing to the output string E data related to the first positive integer Value g and to the second positive integer value x. . Preferably according to the invention, the method may determine for each array Ak , for k = 0,1,2,...,(2" - 2], the optimum first and second positive integer values gk and xk , i.e. which are apt to minimise the size of memory needed to represent the array Ak according to the binary encoded representation, the method juxtaposing to the output string E, during step D.3, data related to the optimum first and second positive integer values gk and xk .
Furthermore according to the invention, the method may determine the optimum first and second positive integer values gk and xk by means of a trial and error process.
Always according to the invention, the trial and error process may verify the size of memory needed to represent the array Ak according to the binary encoded representation for all the values of the first and second positive integer values gk and xk ranging, respectively, from 0 and G, with G > 0 , and from 0 and X, with X > 0.
Preferably according to the invention, G=12 and/or X=3.
Still preferably according to the invention, the number n of bits, included in the words into which the input string S is subdivided during step C, is an involution of 2: n = 2m Furthermore according to the invention, the method may furthermore comprise the step:
H. juxtaposing to the output string E data related to the number n of bits included in the words into which the input string S is subdivided during step C. Preferably according to the invention, the method still comprises the step:
I. juxtaposing to the output string E data related to the size of the input string S.
Still preferably according to the invention, the method is iterated, being applied at each h-Vn iteration, with h > \ , to the output string E obtained by the preceding (Λ-l)-th iteration.
Always according to the invention, for at least two consecutive iterations (h-\) and h, with h > \ , the corresponding numbers nh_x and nh of bits, included in the words into which the corresponding input strings S are subdivided during step C, may be different with respect to one another: nh_x ≠ nh It is still specific subject matter of this invention an automated method for decompressing a compressed binary string E of input data into a binary string S of output data, characterised in that the compressed bi-
nary string E of input data has been obtained by applying to the binary string S the just described automated compression method, and in that it comprises the following steps:
J. reading within the compressed string E the number N2»_ι of words having the binary value τ5 "-ι '>
K. constructing a string S
2»_, comprising N
2»_χ words having the binary value B
2"_
\ \ L. for each binary value B
k of the sorting 0, for k =
- 3J, ,2,1,0 , constructing a corresponding string S
k executing the steps of the loop:
L 1 initialising the string Sk to the string S +Ϊ, L.2 reading within the compressed string E the datum related to the presence of at least one word having binary value Bk within the string Sk , L.3 in the case when at least one word having binary value Bk is present within the string Sk , reading within the compressed string E the values d k , for i = 0,1,2, ,Nk - \ , with Nk ≥ 0 , of the distances of each word having binary value Bk from the preceding word having binary value Bk , and inserting into the string Sk words having the binary value Bk at the positions located by the values d " L 4 repeating the loop L, M. assuming the binary string S of output data equal to the string S0 :
S = S0 .
Always according to the invention, the decompression method may be iterated, the compressed binary string E of input data having being obtained by applying to the binary string S the iterated automated compression method. It is also subject matter of this invention an electronic apparatus, comprising at least one central processing unit and at least one memory unit, characterised in that it executes the automated compression method previously illustrated.
It is further subject matter of this invention an electronic apparatus, comprising at least one central processing unit and at least one memory unit, characterised in that it executes the automated decompression method described before.
It is still subject matter of this invention an electric, magnetic or electromagnetic signal modulated by a digital signal, characterised in that said digital signal comprises at least one compressed binary string E obtained by means of the automated compression method previously illustrated.
It is another subject matter of this invention a computer program comprising code means adapted to execute, when running on a computer, the automated compression method previously described. It is further subject matter of this invention a memory medium readable by a computer, storing a program, characterised in that the program is the computer program just described.
It is another subject matter of this invention a computer program comprising code means adapted to execute, when running on a computer,
the automated decompression method illustrated before.
It is further subject matter of this invention a memory medium readable by a computer, storing a program, characterised in that the program is the computer program just described. The present invention will be now described, by way of illustration and not by way of limitation, according to its preferred embodiments, by particularly referring to the Figures of the enclosed drawings, in which:
Figure 1 schematically shows a portion of a binary string S of input data to be compressed by means of a preferred embodiment of the auto- mated method according to the invention;
Figure 2 schematically shows the portion of the string S of Figure 1 ;
Figure 3 schematically shows a flow chart of the trial and error process for determining the first and the second optimum positive integer values gk and xk according to the preferred embodiment of the automated method according to the invention;
Figure 4 schematically shows a first portion of the compressed binary string E of output data generated by the preferred embodiment of the automated method according to the invention;
Figure 5 schematically shows a portion of a second string Si gener- ated during the execution of the preferred embodiment of the automated compression method according to the invention;
Figure 6 schematically shows a second portion of the compressed binary string E of output data generated by the preferred embodiment of the automated method according to the invention;
Figure 7 schematically shows a third portion of the compressed bi¬
nary string E of output data generated by the preferred embodiment of the automated method according to the invention; and
Figure 8 schematically shows a flow chart of the preferred embodi-
ment of the automated compression method according to the invention.
In the following of the description same references will be used to indicate alike elements in the Figures.
With reference to Figure 1 , it may be observed that the automated
compression method firstly subdivides into words 1 of equal length the
string S, having binary representation, which is to be compressed. Each
word 1 comprises n bits, where in a preferred embodiment of the invention
n = 8. Advantageously, in case the size of the string S is not a multiple of
n, in order to obtain a last word of n bits the string S may be padded with a
tail of bits equal to "0".
Each word 1 may assume one binary value B among 2n different bi¬
nary values, which, for n = 8, are equal to 2
8 = 256. The method fixes a sorting O of such 2
n values according to a corresponding sorting rule:
In the embodiment of the method shown in the Figures, the words 1 are
considered as positive binary numbers the sorting of which is the de¬
creasing sorting from the decimal value 255, equal to the binary value "11111111", to the value 0, equal to the binary value "00000000".
With reference to Figure 2, the method scans the string S for determining the position of the words 1' having the binary value BQ which is in
the first position within the sorting 0, and it stores their reciprocal distances df" in a memory array AQ ; in the embodiment of Figure 1 , such first binary value is the value "11111111". In particular, in the embodiment of Figure 2, the distance df" of the first word V equal to B0 ="11111111" is equal to the number of words 1 of the string S which precede it, while the distance df° (for /' > 0) between two successive words V equal to B0-"\ 1 11111" is equal to the number of words 1 which separate them, in such a way that in the case when two words equal to BQ='"\ 1111111" are consecutive, as in the case of the words 1" of Figure 2, their reciprocal distance is equal to 0, so that in Figure 2 it is df" = 0.
Assuming that the string S comprises N
0 words, with N
0 > 0 , having binary value B
0 , at the end of such step of scanning the string S for searching all the words having binary value B
0 , the array A
Q contains N
0 elements related to the distances
for /
' = 0,1,2,..., N
0 - 1 , which locate the positions within the string S of the N
0 words V having binary value
%
In the embodiment of the method shown in the Figures, each one of the No elements of the array AQ , in case N0 > 0 , is represented according to a suitable encoded binary representation. In particular, being de- fined two positive integer values g0 and xQ , with g0 ≥ 0 and x0 > 0 , depending on the value of df" this may be represented in one of three different ways: 1 ) in the case when df' < 28" [2]
df" is represented by (g0 + 1) binary bits, the first of which is equal to "0" and the following g0 bits form the positive two's complement binary representation having g0 bits of the value of df" ; in other words, in this case df" is represented according to the formula:
"0 "& H [3]
where "&" is the juxtaposition operator and [z]
t is the positive two's complement binary representation having t bits of the value Z; ) in the case when
df" is represented by (go + χo + 2) binary bits the first of which is equal to "1", the second one is equal to "0" and the following (g
0 + x
Q) bits are the positive two's complement binary representation having (g
0 + χ
0) bits of the value of ψ
i ° - 2
8° j; in other words, in this case d ,pB, is represented according to the formula:
"10"& ^' - 2*^ [5] o +*o) ) finally, in the case when
2(s« +x«) < dB" [6]
d ,pB, is represented by g
0 + XQ + 2 binary bits, where
In[x] is the operator returning the integer part of X, in such a way that
the first bit of the encoded representation is equal to "1", the following
bits are equal to "1", the following bit is equal to "0'
and the last (go +*o)
D'
ts are tne positive two's complement binary
Bn d. ■ 28" representation having (go + *o) D'ts °f tne value of Re
>(g„ +*o )
W_ where Re is the operator returning the remainder of the division Y
W_ W_ i.e. it is equal to W -Iή in other words, in this third case Y Y
d ,pBB is represented according to the formula:
w where & T is the operator constructing a string including W bits
equal to "1".
In the embodiment of the method shown in the Figures, the two posi-
tive integer values g0 and x0 are fixed through a trial and error process
which finds the optimum values of g0 and x0 that minimise the size of
memory needed to represent the array AQ according to the encoded rep¬
resentation. Specifically, with reference to Figure 3, immediately under¬
standable to one skilled in the art, for each trial value of g0 between 0
and G, with G > 0 , the method verifies the memory occupation of the rep¬
resentation of the array AQ for each value of xQ between 0 and X, with
X > 0 , and it considers as optimum values of g0 and χ0 the ones which
minimise such memory occupation. Preferably, G=12 and X=3.
With reference to Figure 4, it may be observed that at the end of the step of encoded representation of the array AQ , the method constructs an output string E comprising:
- a first flag bit equal to "1", for indicating that at least one word of value B0 exists within the string S (i.e. the array AQ comprises at least one element df" ),
- the values of g0 and x0 , and
- the No elements of the array AQ having encoded representation.
As it will be illustrated later on, it is not necessary to insert into the output string E the value of N0. However, alternative embodiments of the method may provide, instead of the first flag bit of indicating the presence of at least one word of value BQ within the string S, for the insertion into the output string E of the value of N0 (which assumes the value 0, in case of absence of words of value v30 within the string S). Moreover, as shown in Figure 5, a second binary string S: is constructed, either physically or logically, which is obtained by eliminating from the initial string S all the words 1 ' having the binary value B0.
In the case when the string S does not comprise any word of value equal to B0 , that is N0 = 0 , the output string E comprises only one flag bit equal to "0".
Afterward, similarly to what executed during the preceding step, the method scans the string Si for determining the position of the words having the binary value Bλ which is in the second position within the sorting O, and it stores their reciprocal distances df in a corresponding memory
array Aγ ; in the embodiment of the method shown in the Figures, such second binary value is the value "11111110".
Assuming that the string Si comprises Ni words, with Ni > 0 , having the binary value Bγ , at the end of such scanning step of the string Si for searching all the words having the binary value Bx , the array Aλ contains Nj elements related to the distances df , for i - 0,1,2,...,N - I , which locate the positions within the string SL of the Ni words having binary value Bγ.
Similarly to what done for the array AQ , in case Ni > 0 , the embodi- ment of the method shown in the Figures fixes two positive integer values gi and xλ which minimise the size of memory needed to represent the array Aγ according to the encoded representation, illustrated above with reference to the array AQ .
With reference to Figure 6, it may be observed that, in case Ni > 0 , at the end of the step of encoded representation of the array Ax , the method juxtaposes at the head of the output string E
- a first flag bit of flag equal to "1", for indicating that at least one word of value Bγ exists within the string S (i.e. the array A comprises at least one element di ' ), - the values of gi and xi , and
- the Ni elements of the array A having encoded representation.
In case the string S does not comprises any word of value equal to τ5ι , that is Ni = 0 , only one flag bit equal to "0" is juxtaposed to the output string E.
A third binary string S2 is then constructed, either physically or logically, which is obtained by eliminating from the second string Si all the words having the binary value τ9ι .
The method executes similar operations on the third string S2 , scanning it for determining the position of the words having the binary value B2 which is in the third position within the sorting O, and it stores their reciprocal distances df2 in a memory array A2 ; in the embodiment of the method shown in the Figures, such third binary value is the value "11111101". Assuming that the string S2 comprises N2 words, with N2 > 0 , having the binary value B2 , at the end of such scanning step the array A2 contains N2 elements related to the distances df , for /' = 0,1,2,..., N2 - 1. Similarly to what previously done, in case N2 > 0 the method fixes two positive integer values g2 and x2 which minimise the size of memory needed to represent the array A2 according to the already illustrated encoded representation, and, as shown in Figure 7, at the head of the output string E there are juxtaposed:
- a first flag bit equal to "1",
- the values of g2 and x2 , and
- the N2 elements of the array A2 having encoded representation. Also, in case the string S does not comprise any word of value equal to B2 , that is N2 = 0, only one flag bit equal to "0" is juxtaposed to the output string E.
Afterward, a fourth binary string S3 is constructed, either physically or logically, which is obtained by eliminating from the third string S2 all
the words having the binary value B2.
In other words, the method iterates the preceding steps for all the 2"
different binary values Bt according to the sorting O. However, for the last
binary value Br_λ , that in the embodiment of Figure 1 is the value
"00000000", the method stores only the number N2»_ι of words having
such binary value Br_γ , since the last string S2»_, comprises only words
having such binary value Bτ_γ , so that the related reciprocal distances
dt 2"-' have constant value equal to 0. Therefore, for the last binary value
B _γ it is not even executed an encoded representation as previously
illustrated.
Finally, the method advantageously stores the initial size of the
compressed string S, without the possible tail of bits equal to "0".
The embodiment of the method shown in the Figures provides for the
following specific cases of the encoded representation for specific values
of gk and xk , for £ = 0,1,2,...,2" - 2 :
- in the case when gk = 0 , if the inequality [2] occurs, the element
D d k is represented only by the bit "0"; and
- in the case when χ = 0 , the inequality [4] may never occur and,
therefore, in case the inequality [6] occurs, the encoded repre-
sentation of the element dfk does not comprise the first bit "1",
i.e. this is represented according to the formula
Figure 8 shows a flow chart which schematically summarises the method steps illustrated above.
The embodiment of the method shown in the Figures may be iterated, that is the output string E may be subjected to the same steps of the compression method, schematically shown in Figure 8, being treated as it were an input string S so as to obtain a new output string Eγ . In this case, the initial size of the input string S is advantageously stored only once at the head of the last output string.
In particular, the string E is subdivided in words comprising the same number n of bits, if the new output string Eγ which is obtained has smaller size, otherwise the string E is subdivided in words comprising a number nx of bits that is different from n bits, preferably n < n . Still more preferably, the sizes nh of the words in which the strings to be compressed are subdivided ad each Λ-th iteration are equal to involutions of 2, according to the formula: nh = 2m> The number nh of bits of the words into which the input string to be compressed has been subdivided is stored at the output string head.
The automated decompression method reconstructs the input string S starting from the output string E. In particular, the output string E is read according to a sorting reversed with respect to the sorting O used for constructing it, carrying out the following steps:
- reading the number nh of bits of the words forming the string S;
- constructing a string S _l comprising N _{ words having the binary
value B _x
- for each successive binary value Bk , for k decreasing from (2" - 2) to
0,
- reading the flag bit and, in case it is equal to "1", reading the
values of gk and xk and constructing a string S^ by inserting
into the string Sk+l words having the binary value Bk at the po-
sitions located by the values dχ k interpreted according to the
encoded representation of formulas [3], [5], [7] and [8].
It is clear that, knowing the number of words included in the string
S£+1, it is not necessary to know the number N^ of words having binary
value Bk to be inserted into the string S +λ, thanks to the fact that the at-
D tempt of reading f k , for i ≥ Nk , would be inconsistent with the number
of words included in the string Sk+γ.
In case the compression method has been iterated, the decompres-
sion method must also be iterated up to reconstructing the input string S.
Advantageously, an input data string to be compressed may prelimi¬
narily be subdivided in equal parts or "segments", all of which may sepa¬
rately be compressed for being then, for example, transmitted and, at re¬
ception, decompressed and composed so as to form again the original
string. This segmentation is done in order on the one hand to lighten the
compression process and on the other hand to allow data to be sent in
several segments along one of common data transmission lines, in such a
way that the receiving apparatus may incrementally reconstruct the origi¬
nal input data string starting from the different consecutive segments, by
incrementally juxtaposing the segments obtained from decompression of the compressed segments.
The preferred embodiments have been above described and some modifications of this invention have been suggested, but it should be un- derstood that those skilled in the art can make other variations and changes, without so departing from the related scope of protection, as defined by the following claims.