CN1536768A - Compression method for 2-byte character data - Google Patents

Compression method for 2-byte character data

Info

Publication number
CN1536768A
Authority
CN
China
Prior art keywords
byte
character
data
code word
code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2003101242211A
Other languages
Chinese (zh)
Other versions
CN100474781C (en)
Inventor
赵畇衍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Pan Thai Co ltd
Original Assignee
Pantech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pantech Co Ltd
Publication of CN1536768A
Application granted
Publication of CN100474781C
Anticipated expiration
Expired - Fee Related

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04B: TRANSMISSION
    • H04B1/00: Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission
    • H04B1/38: Transceivers, i.e. devices in which transmitter and receiver form a structural unit and in which at least one part is used for functions of transmitting and receiving
    • H04B1/40: Circuits

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The present invention provides a compression method for 2-byte character data in which the information processing module of a terminal compresses messages in units of 2-byte characters (Korean, Chinese) before storing them, thereby reducing the storage space required. The method comprises: a step of generating a plurality of compressible codewords according to frequency of occurrence, storing them in a basic dictionary table, and initializing a variable that indicates the next codeword to be registered; an input step of receiving the input message data and identifying whether it is a 2-byte character; a step of checking whether the input data is among the compressible codewords and, if so, searching the dictionary table for the matching code through a mapping process and outputting it, and registering the matching code in the dictionary when the dictionary does not contain it; a step of judging whether the end of the data has been reached and, if not, returning to the input step to read the next message data; and a step of performing a flush process when the end of the data has been reached. When the matching code obtained by encoding a compressible codeword is smaller than the threshold below which the codeword can be shortened by one bit, it is output with log2(C1+1)-1 bits; when it is larger than the threshold, it is output with log2(C1+1) bits, where C1 is the number of codewords currently assigned.

Description

Compression method for 2-byte character data
Technical field
The present invention relates to a compression method for 2-byte character data and, more particularly, to a compression method that uses a 2-byte character compression algorithm to reduce the storage space taken up by SMS (Short Message Service) and EMS (Enhanced Messaging Service) messages in a mobile communication terminal.
Background art
In general, users exchange all kinds of information through the message sending and receiving functions (SMS, EMS) of mobile communication terminals. Most mobile communication terminals barely compress these messages, and the few terminals that do compress them use compression algorithms designed for the English alphabet.
With such algorithms, however, languages such as Korean and Chinese, whose text is for the most part highly redundant, compress relatively poorly and need more memory, so the storage space cannot be reduced effectively.
[Patent Document 1] Japanese Laid-Open Patent Publication No. Hei 2-255977 (Japanese patent publication No. 1990-255977)
[Patent Document 2] Japanese Laid-Open Patent Publication No. Hei 9-069785 (Japanese patent publication No. 1997-069785)
Summary of the invention
The present invention overcomes the above deficiency. Its object is to provide a compression method for 2-byte character data in which the information processing module of a terminal compresses messages in units of 2-byte characters (Korean, Chinese) before storing them, thereby reducing the storage space required.
To achieve this object, the compression method for 2-byte character data of the present invention comprises: a step of generating a plurality of compressible codewords according to frequency of occurrence, storing them in a basic dictionary table, and initializing a variable that indicates the next codeword to be registered; an input step of receiving the input message data and identifying whether it is a 2-byte character; a step of checking whether the input data is among the compressible codewords and, if so, searching the dictionary table for the matching code through a mapping process and outputting it, and registering the matching code in the dictionary when the dictionary does not contain it; a step of judging whether the end of the data has been reached and, if not, returning to the input step to read the next message data; and a step of performing a flush process when the end of the data has been reached. When the matching code obtained by encoding a compressible codeword is smaller than the threshold below which the codeword can be shortened by one bit, it is output with log2(C1+1)-1 bits; when it is larger than the threshold, it is output with log2(C1+1) bits, where C1 is the number of codewords currently assigned.
The beneficial effect of the present invention is that storage space can be reduced by compressing messages made up of 2-byte characters (Korean, Chinese, etc.) in the information processing module of the terminal before storing them. Specifically, when the method of the present invention is used to compress text that mixes English and Korean characters, the average compression ratio improves by about 22% compared with existing compression methods.
Description of drawings
Fig. 1 is a flowchart of the compression method for 2-byte character data according to one embodiment of the present invention.
Fig. 2 is a flowchart detailing the step (compression step) of searching the dictionary table for the matching code through the mapping process and outputting it, in the compression method for 2-byte character data according to one embodiment of the present invention.
Fig. 3 is a flowchart detailing the dictionary generation/management step that manages the dictionary of matching codes, in the compression method for 2-byte character data according to one embodiment of the present invention.
Embodiment
For convenience of description, the compression method for 2-byte character data of the present invention is explained using Korean as the example. It applies equally to other languages written with 2-byte characters, such as Chinese and Japanese. Accordingly, the present embodiment describes only the compression of Korean, but it will be readily apparent to those skilled in the art that the present invention is not limited to Korean.
Embodiments of the present invention are described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of the compression method for 2-byte character data according to one embodiment of the present invention. It is described below.
First, the maximum string length (N7), the number of codewords (N2), the initial dictionary entry number (N5) and so on are initialized, the characters with a high frequency of occurrence are collected in the basic dictionary table, and the variable C1, which indicates the next codeword to be registered, is initialized (S101). The codewords used for character compression are laid out as shown in Table 1 below. To find the codewords needed for character compression, the frequency of occurrence of the 2350 precomposed (Wansung-type) Korean characters is measured in files that mix Korean and English, the results are sorted and examined, and the 470 most frequently used characters, 20% of the total, are registered as codewords. These 470 characters together account for more than 85% of all occurrences. The initial value of the variable C1 can therefore be 471.
Table 1:
Code range    Content
0~255         ASCII
256~725       Korean character codes (470 characters)
726~1023      10-bit coding
1024~2047     11-bit coding
2048~4095     12-bit coding
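The construction of the basic dictionary table can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the starting code 256 (taken from Table 1) and the assumption that the input is an EUC-KR-style byte stream, in which any byte of 0x80 or above starts a 2-byte character, are all choices made for the example. The Hangul byte ranges 0xB0 to 0xC8 and 0xA1 to 0xFE are the ones given in the description of Fig. 2.

```python
from collections import Counter

FIRST_HANGUL_CODE = 256  # Table 1: codes 0-255 are reserved for single bytes
BASE_SIZE = 470          # number of frequent characters kept, per the text


def is_wansung_hangul(b1: int, b2: int) -> bool:
    """True if (b1, b2) lies in the precomposed-Hangul byte ranges."""
    return 0xB0 <= b1 <= 0xC8 and 0xA1 <= b2 <= 0xFE


def count_hangul(data: bytes) -> Counter:
    """Count 2-byte Hangul characters in a mixed 1-byte/2-byte stream."""
    counts = Counter()
    i = 0
    while i < len(data):
        if i + 1 < len(data) and is_wansung_hangul(data[i], data[i + 1]):
            counts[data[i:i + 2]] += 1
            i += 2
        elif data[i] >= 0x80 and i + 1 < len(data):
            i += 2  # assumption: some other 2-byte character, skipped whole
        else:
            i += 1  # single-byte (ASCII) character
    return counts


def build_base_dictionary(corpus: bytes, size: int = BASE_SIZE) -> dict:
    """Map the `size` most frequent Hangul characters to codes 256, 257, ..."""
    ranked = [ch for ch, _ in count_hangul(corpus).most_common(size)]
    return {ch: FIRST_HANGUL_CODE + k for k, ch in enumerate(ranked)}
```

With a representative corpus and the default size of 470, the resulting codes span 256 to 725, which matches the second row of Table 1.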
Then, with reference to the initialized variable, the additional compressible codewords are stored in an additional dictionary table that includes the basic dictionary table, and the variable C1 indicating the next codeword to be registered is reinitialized (S102). The number of bits of the matching code obtained by encoding a compressible codeword is determined by the following formulas.
Formula 1: $C1 + lim \le 2^{\lceil \log_2 (C1+1) \rceil} - 1$
Formula 2: $lim = C3 - C1 - 1$
Formula 3: $C3 = 2^{\lceil \log_2 (C1+1) \rceil}$
Here, C1 is the number of codewords currently assigned, and lim is the threshold up to which a codeword can be shortened by one bit. When a codeword is converted into a bit string, it is output with log2(C1+1)-1 bits if it does not exceed the threshold (lim), and with log2(C1+1) bits if it exceeds the threshold, where log2(C1+1) is rounded up to a whole number of bits.
For example, when C1 is 750, lim = (1024-750-1) = 273. During compression, a codeword between 0 and 273 is output as a 9-bit code, while a codeword between 274 and 749 has 274 added to it and is output as a 10-bit code.
During decompression, the codeword bits are first read as 9 bits. If the value read is smaller than 274, it is taken as the codeword. If the value read is 274 or larger, the code is read again as 10 bits and the value minus 274 is taken as the codeword. Table 2 below shows the dictionary table structure of the present invention for this case.
Table 2:
Compressible codeword    Encoded code    Decimal value
0                        000000000       0
1                        000000001       1
2                        000000010       2
...                      ...             ...
273                      100010001       273
274                      1000100100      548 (274+274)
275                      1000100101      549 (274+275)
...                      ...             ...
749                      1111111111      1023 (274+749)
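The variable-length coding of Formulas 1 to 3 and Table 2 can be sketched as follows. This is an illustrative reading rather than the patent's own code: the function names are hypothetical, bit strings are used instead of a packed bit stream for readability, and the sketch assumes C1 is at least 2 and that the logarithm is rounded up, as the C1 = 750 example implies.

```python
def code_params(c1: int) -> tuple[int, int]:
    """Return (full_bits, lim) for c1 currently assigned codewords."""
    full_bits = c1.bit_length()  # equals log2(c1 + 1) rounded up, for c1 >= 1
    c3 = 1 << full_bits          # Formula 3
    lim = c3 - c1 - 1            # Formula 2
    return full_bits, lim


def encode_codeword(w: int, c1: int) -> str:
    """Encode codeword w (0 <= w < c1) as a bit string."""
    full_bits, lim = code_params(c1)
    if w <= lim:
        # Short form: one bit fewer than the full width.
        return format(w, f"0{full_bits - 1}b")
    # Long form: shift past the short codes, then use the full width.
    return format(w + lim + 1, f"0{full_bits}b")


def decode_codeword(bits: str, pos: int, c1: int) -> tuple[int, int]:
    """Decode one codeword starting at bits[pos]; return (w, new_pos)."""
    full_bits, lim = code_params(c1)
    short = full_bits - 1
    value = int(bits[pos:pos + short], 2)
    if value <= lim:
        return value, pos + short
    value = int(bits[pos:pos + full_bits], 2)
    return value - (lim + 1), pos + full_bits
```

For C1 = 750 this reproduces the rows of Table 2: `encode_codeword(273, 750)` gives `100010001` and `encode_codeword(274, 750)` gives `1000100100`. The decoder can tell the two forms apart because the first 9 bits of every 10-bit code are at least 274.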
Thereafter, the message data is input sequentially. The input data is checked against the compressible codewords and, if it is among them, the matching code is found in the dictionary table through a mapping process and output (S103). It is then checked whether the matching code exists in the dictionary; if it does not, a dictionary generation step registers it in the dictionary (S104).
Next, it is judged whether the end of the data has been reached; if not, the process returns to the step of sequentially inputting message data (S105).
If the end of the data has been reached, a flush process is carried out (S106). Memory stores data in units of 8 or 16 bits, whereas compressed data has a variable bit length; the flush process therefore fills the remaining bit positions with 0 when the last piece of stored data does not come to 8 or 16 bits.
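The flush process of step S106 amounts to zero-padding the final storage unit. A minimal sketch, assuming the compressed output has been accumulated as a string of "0" and "1" characters (a representation chosen for the example; `flush_bits` is a hypothetical name):

```python
def flush_bits(bits: str, unit: int = 8) -> bytes:
    """Pad a bit string with trailing zeros to a multiple of `unit` bits
    (8 or 16) and pack it into bytes, most significant bit first."""
    if unit not in (8, 16):
        raise ValueError("unit must be 8 or 16")
    pad = (-len(bits)) % unit  # number of zero bits needed to fill the last unit
    padded = bits + "0" * pad
    if not padded:
        return b""
    return int(padded, 2).to_bytes(len(padded) // 8, "big")
```

A single 9-bit code such as `100010001` therefore occupies two bytes after flushing, with seven padding bits at the end.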
Fig. 2 is a flowchart detailing the step (compression step) of searching the dictionary table for the matching code through the mapping process and outputting it, in the compression method for 2-byte character data according to one embodiment of the present invention. It is described below.
First, the first byte of the input data is read (S201).
Next, it is judged whether the first byte lies within a first specified range (S202). For precomposed (Wansung-type) Korean characters, the first byte takes one of the 25 values from hexadecimal B0 to C8, so the first specified range can be hexadecimal B0 to C8.
If the first byte lies within the first specified range, the second byte of the input data is read (S203).
If, on the other hand, the first byte does not lie within the first specified range, the input is not a precomposed Korean character and is therefore determined to be an ASCII character (S207).
Next, it is judged whether the second byte lies within a second specified range (S204). For precomposed Korean characters, the second byte takes one of the 94 values from hexadecimal A1 to FE, so the second specified range can be hexadecimal A1 to FE.
If the second byte lies within the second specified range, it is judged whether the input data is contained in the dictionary table (S205).
If, on the other hand, the second byte does not lie within the second specified range, the input is not a precomposed Korean character and is therefore determined to be an ASCII character (S207).
If the input data is contained in the dictionary table, the matching code value is determined (S206).
If, on the other hand, the input data is not contained in the dictionary table, it is not a Korean character with a high frequency of occurrence and is therefore determined to be an ASCII character (S207).
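The decision flow of Fig. 2 can be sketched as a single function. The name `map_input`, the dictionary layout (2-byte sequence mapped to an integer code) and the fallback behaviour are assumptions for illustration: the text only says the input is "determined to be an ASCII character" at S207, which this sketch reads as emitting the single byte as its own code in the 0 to 255 range.

```python
def map_input(data: bytes, pos: int, dictionary: dict) -> tuple[int, int]:
    """Return (code, bytes_consumed) for the character starting at data[pos]."""
    b1 = data[pos]                                     # S201: read first byte
    if 0xB0 <= b1 <= 0xC8 and pos + 1 < len(data):     # S202: first range
        b2 = data[pos + 1]                             # S203: read second byte
        if 0xA1 <= b2 <= 0xFE:                         # S204: second range
            code = dictionary.get(data[pos:pos + 2])   # S205: dictionary lookup
            if code is not None:
                return code, 2                         # S206: matching code
    return b1, 1                                       # S207: single-byte code
```

Under this reading the scheme stays lossless even for Korean characters missing from the dictionary, because each of their bytes is emitted unchanged as a single-byte code.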
Fig. 3 is a flowchart detailing the dictionary management step in the compression method for 2-byte character data according to one embodiment of the present invention. This step checks whether the matching code exists in the dictionary, registers it if it does not, and removes codes registered in the dictionary that are not often used. It is described below.
First, it is judged whether the string length of the codeword exceeds the maximum string length (N7); if it does, the dictionary management step ends (S301).
If the string length of the codeword does not exceed the maximum string length (N7), it is judged whether the string exists in the dictionary table; if it does, the dictionary management step ends (S302).
If the string does not exist in the dictionary table, the string is assigned to the codeword indicated by the variable C1 (S303).
Then, the value of C1 is incremented so that it designates the codeword for the next string to be generated (S304).
Next, it is judged whether the incremented variable C1 is greater than the number of codewords (N2) (S305).
If the incremented variable C1 is greater than the number of codewords (N2), C1 is set to the initial dictionary entry number (N5); if it is smaller than the number of codewords (N2), it is not set to the dictionary entry number (N5) (S306).
Then, it is judged whether the node designated by the incremented variable C1 is a leaf node, that is, a node representing the last character of a string, or an unused node (C1==NULL). If the node is neither a leaf node nor an unused node, the process returns to the step in which C1 is incremented to the codeword for the next string (S307).
If the node designated by the incremented variable C1 is a leaf node or an unused node, the entry for C1 is removed from the dictionary so that the codeword is ready to be assigned to a new string (S308).
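One possible reading of the Fig. 3 flow is a dictionary whose codeword slots are recycled in a ring from N5 to N2. The sketch below is an interpretation under stated assumptions, not the patented implementation: the class `RecyclingDictionary`, its two lookup maps and the linear-scan leaf test (an entry is a leaf when no other entry extends its string) are all hypothetical choices made for clarity.

```python
class RecyclingDictionary:
    """Codeword slots n5..n2 are assigned in order and reused when full."""

    def __init__(self, n2: int, n5: int, n7: int):
        self.n2, self.n5, self.n7 = n2, n5, n7
        self.by_code = {}    # codeword -> string
        self.by_string = {}  # string -> codeword
        self.c1 = n5         # next codeword to assign

    def _is_leaf(self, code: int) -> bool:
        s = self.by_code[code]
        return not any(t != s and t.startswith(s) for t in self.by_string)

    def register(self, s: bytes) -> bool:
        """Run S301-S308 for string s; return True if it was registered."""
        if len(s) > self.n7:        # S301: string too long
            return False
        if s in self.by_string:     # S302: already present
            return False
        self.by_code[self.c1] = s   # S303: assign string to codeword C1
        self.by_string[s] = self.c1
        while True:
            self.c1 += 1            # S304: move to the next codeword
            if self.c1 > self.n2:   # S305/S306: wrap around to N5
                self.c1 = self.n5
            if self.c1 not in self.by_code:  # S307: unused slot
                return True
            if self._is_leaf(self.c1):       # S307: leaf node found
                old = self.by_code.pop(self.c1)  # S308: free the slot
                del self.by_string[old]
                return True
```

The loop always terminates because the longest registered string is necessarily a leaf. A production version would more likely store the strings in a trie so that the leaf test does not need a scan over all entries.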
The present invention is not limited to the scope disclosed in the above embodiment. Various improvements and changes can be made within the technical subject matter of the present invention; such improvements and changes also belong to the technical scope of the present invention and are protected by it.

Claims (9)

1. A compression method for 2-byte character data, characterized by comprising:
a step of generating a plurality of compressible codewords according to frequency of occurrence, storing them in a basic dictionary table, and initializing a variable that indicates the next codeword to be registered;
an input step of receiving the input message data and identifying whether it is a 2-byte character;
a step of checking whether the input data is among the compressible codewords and, if so, searching the dictionary table for the matching code through a mapping process and outputting it, and registering the matching code in the dictionary when the dictionary does not contain it;
a step of judging whether the end of the data has been reached and, if the data has not been fully input, returning to the input step of sequentially inputting message data; and
a step of performing a flush process when the end of the data has been reached,
wherein, when the matching code obtained by encoding the compressible codeword is smaller than the threshold below which the compressible codeword can be shortened by one bit, it is output with log2(C1+1)-1 bits, and when the matching code is larger than the threshold, it is output with log2(C1+1) bits, C1 being the number of codewords currently assigned.
2. The compression method for 2-byte character data according to claim 1, characterized by further comprising:
a step of, with reference to the initialized variable, storing the additional compressible codewords in an additional dictionary table that includes the basic dictionary table, and reinitializing the variable that indicates the next codeword to be registered.
3. The compression method for 2-byte character data according to claim 1, characterized in that:
in order to find the compressible codewords, the frequency of occurrence of the precomposed 2-byte characters is measured in files that mix 2-byte characters and 1-byte characters, the results are sorted and analysed, and the frequently used characters among them are registered as codewords.
4. The compression method for 2-byte character data according to claim 1, characterized in that:
frequency of occurrence is measured for characters represented by a combination of 2 or more bytes, and only the frequently used characters are registered in the dictionary as basic codewords.
5. The compression method for 2-byte character data according to claim 3, characterized in that:
the 2-byte characters are Chinese characters, and the 1-byte characters are English characters.
6. The compression method for 2-byte character data according to claim 3, characterized in that:
the 2-byte characters are Korean characters, and the 1-byte characters are English characters.
7. The compression method for 2-byte character data according to claim 1, characterized in that:
the step of searching the dictionary table for the matching code through the mapping process and outputting it comprises:
a step of reading the first byte of the input data;
a step of judging whether the first byte lies within a first specified range;
a step of reading the second byte of the input data when the first byte lies within the first specified range;
a step of determining that the input is an ASCII character when the first byte does not lie within the first specified range, because it is not a precomposed Korean character;
a step of judging whether the second byte lies within a second specified range;
a step of judging whether the input data is contained in the dictionary table when the second byte lies within the second specified range;
a step of determining that the input is an ASCII character when the second byte does not lie within the second specified range, because it is not a precomposed Korean character;
a step of determining the matching code value when the input data is contained in the dictionary table; and
a step of determining that the input is an ASCII character when the input data is not contained in the dictionary table, because it is not a Korean character with a high frequency of occurrence.
8. The compression method for 2-byte character data according to claim 5, characterized in that:
the first specified range is from hexadecimal B0 to C8.
9. The compression method for 2-byte character data according to claim 5, characterized in that:
the second specified range is from hexadecimal A1 to FE.
CNB2003101242211A 2003-04-08 2003-12-31 Compression method of two-byte character data Expired - Fee Related CN100474781C (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
KR1020030021924 2003-04-08
KR10-2003-0021924A KR100494876B1 (en) 2003-04-08 2003-04-08 Data compression method for multi-byte character language
KR10-2003-0021924 2003-04-08

Publications (2)

Publication Number Publication Date
CN1536768A true CN1536768A (en) 2004-10-13
CN100474781C CN100474781C (en) 2009-04-01

Family

ID=34374057

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2003101242211A Expired - Fee Related CN100474781C (en) 2003-04-08 2003-12-31 Compression method of two-byte character data

Country Status (2)

Country Link
KR (1) KR100494876B1 (en)
CN (1) CN100474781C (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100755533B1 (en) * 2005-07-25 2007-09-06 주식회사 팬택 Character set generation method and apparatus
KR101386169B1 (en) * 2007-08-09 2014-04-17 삼성전자주식회사 Apparatus and method for compression and restoration SMS
KR102633001B1 (en) * 2023-03-27 2024-02-05 주식회사 무브먼츠 Method for implementing underground facilities as ar in an offline environment using combined data precessing of qr code and nfc

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101751451B (en) * 2008-12-11 2012-04-25 高德软件有限公司 A Chinese data compression and decompression method and related equipment
CN104054316B (en) * 2011-11-15 2017-04-12 思杰系统有限公司 Systems and methods for load balancing SMS centers and establishing virtual private networks
CN106471743A (en) * 2014-06-20 2017-03-01 甲骨文国际公司 Encoding of plain ASCII data streams
CN106354699A (en) * 2015-07-13 2017-01-25 富士通株式会社 Encoding computer program, encoding method, encoding apparatus, decoding computer program, decoding method, and decoding apparatus
CN106354699B (en) * 2015-07-13 2021-05-18 富士通株式会社 Encoding method, encoding device, decoding method, and decoding device
CN112416315A (en) * 2020-06-16 2021-02-26 上海哔哩哔哩科技有限公司 CSS code compression method, electronic device and storage medium
CN112416315B (en) * 2020-06-16 2024-05-14 上海哔哩哔哩科技有限公司 Compression method of CSS code, electronic device and storage medium
CN114880523A (en) * 2022-04-27 2022-08-09 深圳市优必选科技股份有限公司 Character string processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN100474781C (en) 2009-04-01
KR20040087503A (en) 2004-10-14
KR100494876B1 (en) 2005-06-14

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1070189

Country of ref document: HK

C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
REG Reference to a national code

Ref country code: HK

Ref legal event code: WD

Ref document number: 1070189

Country of ref document: HK

C41 Transfer of patent application or patent right or utility model
C56 Change in the name or address of the patentee
CP01 Change in the name or title of a patent holder

Address after: Seoul, South Korea

Patentee after: Pantech property management Co.

Address before: Seoul, South Korea

Patentee before: PANTECH Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20161026

Address after: Seoul, South Korea

Patentee after: PANTECH CO.,LTD.

Address before: Seoul, South Korea

Patentee before: Pantech property management Co.

TR01 Transfer of patent right

Effective date of registration: 20200609

Address after: Seoul, South Korea

Patentee after: Pan Thai Co.,Ltd.

Address before: Seoul, South Korea

Patentee before: Pantech Co.,Ltd.

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20090401