CN118075701A

CN118075701A - Implementation method for splitting and recombining long short messages for processing multiple coding formats

Info

Publication number: CN118075701A
Application number: CN202410224107.8A
Authority: CN
Inventors: 祁景晨; 韩弘光
Original assignee: Hainan Shenzhou Hope Network Co ltd
Current assignee: Hainan Shenzhou Hope Network Co ltd
Priority date: 2024-02-29
Filing date: 2024-02-29
Publication date: 2024-05-24
Anticipated expiration: 2044-02-29
Also published as: CN118075701B

Abstract

The invention discloses a method for splitting and recombining long short messages in multiple encoding formats. The method first judges whether the short message content contains Chinese characters. If not, the byte length of each short message is defined as 160; otherwise it is defined as 140, and the cases of all-Chinese, all-English, and mixed Chinese-English content are each handled. A method is provided for converting short message content in multiple encoding formats into byte streams and converting byte streams back into short message content, so that splitting and recombination are compatible with content in multiple encoding formats. The invention also provides an algorithm for allocating the split that reduces the number of split short messages as far as possible while complying with the relevant short message specifications and preserving readability, thereby reducing users' short message charges. The invention is reasonably designed and worth popularizing.

Description

Implementation method for splitting and recombining long short messages for processing multiple coding formats

Technical Field

The invention relates to the technical field of Short Message Service (SMS), and in particular to a method for splitting and recombining long short messages in multiple encoding formats.

Background

Short Message Service (SMS) is a widely used communication method whose history dates back to the 1980s. With the development of mobile communication technology, SMS has been widely adopted in industries such as banking and securities, transportation, commerce and logistics, administration, and public services. SMS data is generated by a service provider and forwarded to the user's mobile terminal through an industry gateway that connects to a telecommunications carrier over the Internet. Under the relevant short message specifications, a single short message is limited to 160 English characters or 140 bytes of binary data, i.e., 70 Chinese characters (including punctuation). To send an ultra-long message exceeding 140 bytes (70 Chinese characters), there are generally two approaches. The first sends the long message as several independent short messages: the handset receives several separate messages, which are not guaranteed to arrive in order and may be lost, and the user must read all of them to understand the full content, giving a poor user experience. The second splits the long message into several short messages sent through the UDHI (User Data Header Indicator) mechanism; the receiving end combines them in sequence and displays them as a single long message, giving a better user experience.

Many implementations of the second approach exist on the market, but most consider only strings that are entirely English or entirely Chinese. When the content mixes character encodings, for example both Chinese and English, English letters are usually encoded as 7-bit ASCII and occupy 1 byte each, while Chinese characters are encoded as 16-bit UCS-2 and occupy 2 bytes each. Because the splitting process does not know whether a given byte belongs to a Chinese or an English character, the two bytes of one Chinese character may be separated, with the first byte placed at the end of one short message and the second byte at the start of the next. This garbles the displayed message content and degrades the user experience.
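The failure mode described above can be reproduced with a short Python sketch. It is illustrative only and not part of the patent: `naive_split` is a hypothetical helper that cuts a byte string every `size` bytes without regard to character boundaries, and GB2312 is used because step 1 selects it for Chinese content.

```python
def naive_split(data: bytes, size: int) -> list[bytes]:
    """Cut a byte string every `size` bytes, ignoring character boundaries."""
    return [data[i:i + size] for i in range(0, len(data), size)]


text = "ab中文"               # two ASCII letters followed by two Chinese characters
raw = text.encode("gb2312")   # 1 + 1 + 2 + 2 = 6 bytes
parts = naive_split(raw, 3)   # the first cut falls between the two bytes of "中"

for part in parts:
    # errors="replace" makes the damage visible instead of raising UnicodeDecodeError
    print(part, part.decode("gb2312", errors="replace"))
```

Each piece on its own no longer decodes to the intended characters, even though joining the pieces restores the original text; this is the garbling that the Chinese byte identifier introduced in step 2 is designed to prevent.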

Therefore, a method for splitting and recombining long short messages in multiple encoding formats is urgently needed.

Disclosure of Invention

The technical problem to be solved by the invention is as follows: most existing implementations of the second approach consider only strings that are entirely English or entirely Chinese. When the content mixes Chinese characters (2 bytes each in UCS-2) with English letters (1 byte each in 7-bit ASCII), the splitting process cannot tell which bytes belong to which character, so the two bytes of a Chinese character may be split across two consecutive short messages, garbling the displayed content and degrading the user experience.

To solve the above technical problem, the invention provides a method for splitting and recombining long short messages in multiple encoding formats. Its main feature is that long short messages containing both Chinese and English are split and recombined correctly, avoiding garbled text during splitting, so that the receiving end finally receives the long short message correctly; at the same time, the number of split short messages is reduced as far as possible while complying with the relevant short message specifications, thereby reducing users' short message charges.

The technical solution provided by the invention is a method for splitting and recombining long short messages in multiple encoding formats, comprising the following steps:

Step 1: defining the encoding format and length of the short message. The content and signature of the incoming short message are first examined to determine whether they contain Chinese characters. If so, the encoding format of the short message is set to GB2312 and the length of each short message is 140 bytes; otherwise, the encoding format is ASCII and the length of each short message is 160 bytes;

Step 2: calling a method that splits the short message content into a custom byte object array, whose input parameters are the short message content and its encoding format. The custom byte object encapsulates three attributes: whether the byte belongs to a Chinese character, a Chinese byte identifier, and a single byte of data. To form the array, the byte data corresponding to each character of the content is processed in turn, finally yielding the custom byte object array;

Step 3: executing the operation of the step 2 again on the short message signature, and splitting the short message signature content into a custom byte object array;

Step 4: calculating the maximum byte length available in the first short message, obtained by subtracting the length of the signature's custom byte object array from the short message length;

Step 5: judging whether the short message is a long short message. A long-message flag is first defined with a default value of False. If the maximum byte length available in the first short message is smaller than the length of the custom byte object array of the content, the flag is set to True, the encoding format is changed to UCS2, and the length of each short message is changed to 133 bytes; the maximum length of the first short message is then recalculated by subtracting the length of the signature's custom byte object array from the per-message length. Finally, step 2 is re-executed on the short message content to obtain a new custom byte object array;

Step 6: initializing a short message content array for storing the split pieces of short message content, and initializing a short message byte data array for storing the split pieces of short message byte data;

Step 7: if the long-message flag is False, i.e., the message is not a long short message, converting the content's custom byte object array (as it stands after step 5) into a character string and adding it to the short message content array initialized in step 6, and adding the byte data of that array to the short message byte data array initialized in step 6;

Step 8: if the long-message flag is True, i.e., the message is a long short message, entering the long short message splitting flow;

Step 9: finally returning the two arrays defined in step 6 and the number of split short messages;

Step 10: entering the short message recombination flow. Recombination is handled by a method that adds a 7-byte protocol header to each short message; it takes three parameters: the byte array of the current short message, the total number of short messages in the batch, and the sequence number of the current short message within the batch.
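Steps 1, 4 and 5 together decide the encoding, the per-message capacity and whether the long-message flow is needed. The Python sketch below shows one way to express that decision; it is not the patent's implementation. The names `plan_message` and `Plan`, the CJK Unified Ideographs regex used as the "contains Chinese" test, and the use of Python's `utf-16-be` codec as a stand-in for UCS2 are all assumptions for illustration. The sketch follows the document's own byte model (1 byte per ASCII character, 160 per message) rather than GSM 7-bit packing.

```python
import re
from dataclasses import dataclass

# Assumption: "contains Chinese" is tested with the CJK Unified Ideographs block
# (U+4E00..U+9FFF); the source does not define the test precisely.
_CJK = re.compile(r"[\u4e00-\u9fff]")

# Assumption: Python's "utf-16-be" codec stands in for UCS-2 (identical for BMP text).
UCS2 = "utf-16-be"


@dataclass
class Plan:
    encoding: str     # codec used to produce the byte objects
    per_message: int  # byte capacity of each short message
    first_max: int    # bytes left for content in the first message
    is_long: bool     # long-message flag from step 5


def plan_message(content: str, signature: str) -> Plan:
    # Step 1: GB2312 / 140 bytes if either part contains Chinese, else ASCII / 160.
    # Assumption: non-Chinese text is pure ASCII (encode("ascii") raises otherwise).
    if _CJK.search(content) or _CJK.search(signature):
        encoding, per_message = "gb2312", 140
    else:
        encoding, per_message = "ascii", 160

    # Step 4: the first message also carries the signature.
    first_max = per_message - len(signature.encode(encoding))

    # Step 5: not long if the whole content fits in the first message.
    if len(content.encode(encoding)) <= first_max:
        return Plan(encoding, per_message, first_max, False)

    # Long message: switch to UCS-2 and reserve 7 bytes per segment for the header.
    encoding, per_message = UCS2, 133
    # Assumption: the signature is re-measured in UCS-2 so both lengths use one codec;
    # the source explicitly re-runs step 2 only on the content.
    first_max = per_message - len(signature.encode(encoding))
    return Plan(encoding, per_message, first_max, True)
```

For example, `plan_message("hello", "[Sig]")` stays on ASCII with 155 bytes left for content, while 100 Chinese characters with the signature `"[签名]"` exceed the GB2312 budget and switch to the 133-byte UCS2 layout.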

Further, the Chinese byte identifier in step 2 marks the position of the current byte within a Chinese character. Since a single Chinese character corresponds to two bytes, the value 0 denotes the first byte of the character and the value 1 denotes the second byte.

Furthermore, in step 5 the length of each short message is changed to 133 bytes because, for a long short message to be recombined in order on the receiving side, a protocol header must be added to each short message, and this header is 7 bytes long. This requires the short message to use UCS2 encoding; since a single short message carries 140 bytes in total, subtracting the 7-byte protocol header leaves 133 bytes per short message.
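The byte budget can be checked with a short worked calculation, using the header layout given in step 10-2. The last line is an observation that follows from UCS2 using two bytes per character, not a figure stated in the source: an odd 133-byte budget holds at most 66 whole characters.

```latex
\begin{aligned}
\text{header} &= \underbrace{1}_{\texttt{0x06}} + \underbrace{1}_{\texttt{0x08}} + \underbrace{1}_{\texttt{0x04}} + \underbrace{2}_{\text{reference}} + \underbrace{1}_{\text{total}} + \underbrace{1}_{\text{sequence}} = 7 \text{ bytes} \\
\text{payload per segment} &= 140 - 7 = 133 \text{ bytes} \\
\text{UCS2 characters per segment} &= \left\lfloor 133 / 2 \right\rfloor = 66 \quad (132 \text{ bytes used})
\end{aligned}
```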

Further, the process of forming a custom byte object array in step 2 mainly includes the following steps:

Step 2-1: initializing an empty custom byte object array and judging the encoding format of the incoming short message; each encoding format enters its own processing branch. For US_ASCII the logic is simple: the content is split into a byte array according to the encoding format, the byte array is traversed, each byte is assigned to the byte data field of a custom byte object, the object's Chinese-character field is set (False, since ASCII contains no Chinese characters), and the object is added to the custom byte object array. Other encoding formats continue with the following steps:

Step 2-2: splitting the content of the short message into character arrays;

Step 2-3: traversing the character array, and converting the single character into a byte array corresponding to the coding format;

Step 2-4: judging the length of the byte array: if it is greater than 1, the byte array belongs to a Chinese character; otherwise, it belongs to a non-Chinese character;

Step 2-5: for the byte array of a Chinese character, traversing the byte array and assembling one custom byte object per byte: the Chinese-character field is set to True and the current byte is put into the object's single byte data field; the Chinese byte identifier field is set to 0 if the current byte is the first byte in the array and to 1 if it is the second byte; finally, the custom byte object is added to the custom byte object array;

Step 2-6: for the byte array of a non-Chinese character, assembling a custom byte object: the Chinese-character field is set to False, the current byte is put into the object's single byte data field, and finally the custom byte object is added to the custom byte object array.
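Steps 2-1 to 2-6 can be sketched in Python as follows. This is an illustrative reconstruction, not the patent's code: the `ByteObj` class, its field names, and the function `to_byte_objects` are hypothetical, and `None` is used for the Chinese byte identifier of single-byte characters, which the source leaves unspecified.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ByteObj:
    is_chinese: bool           # byte belongs to a multi-byte ("Chinese") character
    chinese_id: Optional[int]  # 0 = first byte, 1 = following byte, None = single-byte char
    data: int                  # the byte value, 0..255


def to_byte_objects(text: str, encoding: str) -> List[ByteObj]:
    """Split text into per-byte objects, following steps 2-1 to 2-6."""
    # Step 2-1: ASCII needs no per-character work; every byte is non-Chinese.
    if encoding.lower().replace("_", "-") in ("ascii", "us-ascii"):
        return [ByteObj(False, None, b) for b in text.encode("ascii")]

    objs: List[ByteObj] = []
    for ch in text:                          # Step 2-2: one character at a time
        encoded = ch.encode(encoding)        # Step 2-3: bytes for this character
        if len(encoded) > 1:                 # Step 2-4: multi-byte => "Chinese"
            for i, b in enumerate(encoded):  # Step 2-5: mark first byte 0, rest 1
                objs.append(ByteObj(True, 0 if i == 0 else 1, b))
        else:                                # Step 2-6: single-byte character
            objs.append(ByteObj(False, None, encoded[0]))
    return objs
```

Note a consequence of the length test in step 2-4: once step 5 switches to UCS2 (here `utf-16-be`), every BMP character encodes to two bytes, so English letters are marked as "Chinese" too. That is what lets the splitting flow in step 8 avoid cutting any character in half.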

Further, the process of converting the custom byte array into the character string in the step 7 mainly includes the following steps:

Step 7-1: acquiring an input custom byte array and a short message coding format parameter;

Step 7-2: initializing a byte array with the same length as the input custom byte array;

Step 7-3: traversing the custom byte object array, copying the byte data of each object into that byte array, and decoding it into a character string according to the input short message encoding format.
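Steps 7-1 to 7-3 amount to copying the stored bytes back out and decoding them. The sketch below is a hypothetical Python rendering (the names `ByteObj` and `objects_to_string` are illustrative); it also returns the raw bytes, since step 7 adds both the string and the byte data to the arrays from step 6.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ByteObj:
    is_chinese: bool
    chinese_id: Optional[int]
    data: int


def objects_to_string(objs: List[ByteObj], encoding: str) -> Tuple[str, bytes]:
    """Steps 7-1 to 7-3: rebuild the raw bytes and decode them with `encoding`."""
    buf = bytearray(len(objs))    # Step 7-2: byte array of the same length
    for i, obj in enumerate(objs):
        buf[i] = obj.data         # Step 7-3: copy each stored byte
    raw = bytes(buf)
    # Text goes to the content array, raw bytes to the byte data array (step 6).
    return raw.decode(encoding), raw
```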

Further, the long short message splitting process described in the step 8 mainly includes the following two steps:

step 8-1: adding the first short message content;

Step 8-2: adding the content of the subsequent short message;

Step 8-1 adds the first piece of data split from the long short message. Because the first short message contains the short message signature, it requires special handling; adding the first short message content comprises the following steps:

Step 8-1-1: defining an index that records the position of the byte currently being processed, initialized to 0, and initializing an array for storing the custom byte objects of the first short message;

Step 8-1-2: adding the first short message data. The custom byte object array is traversed in a loop; the index is incremented by 1 each time the loop body is entered, and the loop stops once the index equals the maximum length of the first short message. In each iteration, the custom byte object at the current position is taken from the array. If the index is not equal to the maximum length of the first short message, the object is added to the array initialized in step 8-1-1. If the index equals the maximum length, i.e., the object is the last byte of the first short message, its Chinese byte identifier field is checked: if it is not 0, the object is added to the array initialized in step 8-1-1; if it is 0, the loop is exited without adding it, so that the first byte of the Chinese character goes to the next short message together with its second byte. Finally, the custom byte object array of the first short message is converted into a character string and added to the short message content array defined in step 6, and the byte data of those objects is added to the short message byte data array defined in step 6;
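Step 8-1-2 can be sketched in Python as below. This is a hypothetical reconstruction: `split_first` and the compact `to_byte_objects` helper (included only so the block runs on its own) are illustrative names. The function returns the first message's objects together with how many objects it consumed, so that step 8-2 can continue from that position; the guard for a non-positive budget is an added assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ByteObj:
    is_chinese: bool
    chinese_id: Optional[int]  # 0 = first byte of a multi-byte char, 1 = following byte
    data: int


def to_byte_objects(text: str, encoding: str) -> List[ByteObj]:
    """Compact version of step 2, included only so this sketch runs on its own."""
    objs: List[ByteObj] = []
    for ch in text:
        enc = ch.encode(encoding)
        if len(enc) > 1:
            objs += [ByteObj(True, 0 if i == 0 else 1, b) for i, b in enumerate(enc)]
        else:
            objs.append(ByteObj(False, None, enc[0]))
    return objs


def split_first(objs: List[ByteObj], first_max: int) -> Tuple[List[ByteObj], int]:
    """Step 8-1-2: fill the first message; return its objects and how many were consumed."""
    if first_max <= 0:                 # assumption: no room left after the signature
        return [], 0
    first: List[ByteObj] = []
    index = 0                          # Step 8-1-1
    for obj in objs:
        index += 1
        if index < first_max:
            first.append(obj)
        else:                          # index == first_max: last slot of message 1
            if obj.chinese_id != 0:    # not the first half of a character: keep it
                first.append(obj)
            break                      # a first half (id 0) moves to message 2
    return first, len(first)
```

With `"ab中文"` in GB2312 and a 3-byte budget, the third byte is the first half of "中", so the first message holds only `"ab"` and the whole character moves on.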

The main flow of the step 8-2 comprises the following steps:

Step 8-2-1: defining a moving cursor that records the position of the byte object currently being processed within each subsequent short message, defaulting to 0, with a maximum value equal to the length of each short message; and defining a custom byte object array for storing the data of the subsequent short message;

Step 8-2-2: continuing to traverse the custom byte object array of the whole short message content, incrementing the moving cursor by 1 each time the loop body is entered;

Step 8-2-3: judging whether the moving cursor modulo the length of each short message equals 0, or whether the current index is the last position of the short message content;

Step 8-2-4: if the condition of step 8-2-3 is met, judging whether the custom byte object currently being processed is the first byte of a Chinese character, i.e., whether its Chinese byte identifier field is 0;

Step 8-2-5: if the identifier field is 0, converting the array storing the subsequent short message data into a character string and adding it to the short message content array defined in step 6, adding its byte data to the short message byte data array defined in step 6, resetting the array to an empty array, adding the custom byte object currently being processed to it, and resetting the moving cursor to 1;

Step 8-2-6: if the identifier field is not 0, adding the custom byte object currently being processed to the array storing the subsequent short message data, then converting that array into a character string and adding it to the short message content array defined in step 6, adding its byte data to the short message byte data array defined in step 6, and resetting the array to an empty array; unlike step 8-2-5, the moving cursor is reset to 0;

Step 8-2-7: when the condition of step 8-2-3 is not satisfied, the custom byte object currently processed is added to the custom byte object array defined in step 8-2-1.
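Steps 8-2-1 to 8-2-7 can be sketched in Python as follows. The names `split_rest`, `flush` and the compact `to_byte_objects` helper are hypothetical; `start` is the number of objects already consumed by the first message. The final flush after the loop is a defensive addition not described in the source; with well-formed input the last byte is never a first half, so it does not fire.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ByteObj:
    is_chinese: bool
    chinese_id: Optional[int]  # 0 = first byte of a multi-byte char, 1 = following byte
    data: int


def to_byte_objects(text: str, encoding: str) -> List[ByteObj]:
    """Compact version of step 2, included only so this sketch runs on its own."""
    objs: List[ByteObj] = []
    for ch in text:
        enc = ch.encode(encoding)
        if len(enc) > 1:
            objs += [ByteObj(True, 0 if i == 0 else 1, b) for i, b in enumerate(enc)]
        else:
            objs.append(ByteObj(False, None, enc[0]))
    return objs


def split_rest(objs: List[ByteObj], start: int, per_message: int,
               encoding: str) -> Tuple[List[str], List[bytes]]:
    """Steps 8-2-1 to 8-2-7: split objs[start:] into segments of at most per_message bytes."""
    texts: List[str] = []
    chunks: List[bytes] = []

    def flush(segment: List[ByteObj]) -> None:
        raw = bytes(o.data for o in segment)
        chunks.append(raw)
        texts.append(raw.decode(encoding))

    cursor = 0                                # Step 8-2-1: position inside the segment
    buf: List[ByteObj] = []
    last = len(objs) - 1
    for idx in range(start, len(objs)):       # Step 8-2-2
        obj = objs[idx]
        cursor += 1
        if cursor % per_message == 0 or idx == last:  # Step 8-2-3: split point
            if obj.chinese_id == 0:   # Step 8-2-5: a first half opens the next segment
                flush(buf)
                buf = [obj]
                cursor = 1
            else:                     # Step 8-2-6: the object still fits; close the segment
                buf.append(obj)
                flush(buf)
                buf = []
                cursor = 0
        else:                         # Step 8-2-7
            buf.append(obj)
    if buf:                           # defensive addition, not described in the source
        flush(buf)
    return texts, chunks
```

In UCS2 mode with 133-byte segments, the 133rd byte is always the first half of a character, so each full segment closes at 132 bytes (66 characters) and the next one starts with that character.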

Further, the short message reorganization process in step 10 includes the following steps:

Step 10-1: initializing a byte array of length 7 for storing the protocol header, and a byte array whose length is 7 plus the length of the current short message byte array, for storing the original byte data after it is encapsulated with the protocol header;

Step 10-2: assigning values to the protocol header array. Bytes 0-4 take default values: the first byte defaults to 0x06, the length of the remaining protocol header; the second byte defaults to 0x08, which, as specified in section 9.2.3.24.1 of the GSM 03.40 specification, indicates that the reference number identifying this batch of concatenated messages is 2 bytes long; the third byte defaults to 0x04, the length of the remaining concatenation information; the fourth and fifth bytes form the reference number of the batch (in practice the SME re-records the messages after merging them, so whether this reference is unique is not critical); the sixth byte is the total number of short messages in the batch, and the seventh byte is the sequence number of the current short message within the batch;

Step 10-3: copying the protocol header into the first 7 bytes of the byte array defined in step 10-1, copying the original byte array after the protocol header, and returning the byte data encapsulated with the protocol header.
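Steps 10-1 to 10-3 can be sketched in Python as below, using the byte values listed in step 10-2. The function name `add_udh` and its `ref` parameter are hypothetical, and the sketch assumes a 1-based sequence number, as used for concatenated messages in 3GPP TS 23.040. It builds only the user-data bytes; setting the TP-UDHI bit in the SMS PDU, which tells the handset that a header is present, is outside its scope.

```python
def add_udh(payload: bytes, total: int, seq: int, ref: int = 0) -> bytes:
    """Prefix one segment with the 7-byte concatenation header of step 10-2."""
    if not 1 <= seq <= total <= 255:
        raise ValueError("expected 1 <= seq <= total <= 255")
    header = bytes([
        0x06,               # length of the remaining header (6 bytes follow)
        0x08,               # element identifier: concatenated message, 2-byte reference
        0x04,               # length of the remaining element data (4 bytes follow)
        (ref >> 8) & 0xFF,  # reference number, high byte (hypothetical parameter)
        ref & 0xFF,         # reference number, low byte
        total,              # total number of segments in the batch
        seq,                # sequence number of this segment
    ])
    out = bytearray(len(header) + len(payload))  # step 10-1: 7 + payload length
    out[:7] = header                             # step 10-3: header first
    out[7:] = payload                            # ... then the original bytes
    return bytes(out)
```

For example, the first of two segments carrying the UCS2 bytes of `"a"` with reference `0x1234` becomes `06 08 04 12 34 02 01 00 61`.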

Compared with the prior art, the invention has the following advantages: it provides a method for splitting long short messages whose content contains both Chinese and English, and it marks the two bytes of each Chinese character, so that the two bytes of a Chinese character are never split into two different short messages; this eliminates the garbled-text problem at its root;

The invention provides a method for converting short message content in multiple encoding formats into byte streams and back into short message content, enabling the splitting and recombination of short messages in multiple encoding formats and improving the extensibility of the program;

The invention provides an algorithm for allocating the split that reduces the number of split short messages as far as possible while complying with the relevant short message specifications and preserving readability, thereby reducing users' short message charges; the invention is reasonably designed and worth popularizing.

Drawings

FIG. 1 is a flow chart of a process for processing custom byte object arrays of a method for implementing splitting and reassembling long short messages in multiple encoding formats.

Fig. 2 is a flow chart of adding a first piece of short message content in a method for implementing splitting and reorganizing long short messages with multiple coding formats.

Detailed Description

The method for splitting and recombining long short messages in multiple encoding formats is further described below with reference to the drawings.

The present invention will be described in detail with reference to fig. 1-2.

A method for realizing splitting and recombining long short messages for processing multiple coding formats is as follows:

The Chinese byte identifier described in step 2 marks the position of the current byte within a Chinese character; since a single Chinese character corresponds to two bytes, the value 0 denotes the first byte of the character and the value 1 denotes the second byte.

In step 5, the length of each short message is changed to 133 bytes because, for a long short message to be recombined in order on the receiving side, a protocol header must be added to each short message, and this header is 7 bytes long; this requires the short message to use UCS2 encoding, and since a single short message carries 140 bytes in total, subtracting the 7-byte protocol header leaves 133 bytes per short message.

The process of forming a custom byte object array in step 2 mainly includes the following steps:

Step 2-2: splitting the content of the short message into character arrays;

The process of converting the custom byte array into the character string in the step 7 mainly comprises the following steps:

The long short message splitting process in the step 8 mainly comprises the following two steps:

step 8-1: adding the first short message content;

Step 8-2: adding the content of the subsequent short message;

The main flow of the step 8-2 comprises the following steps:

The short message reorganization process in the step 10 includes the following steps:

The method for splitting and recombining long short messages in multiple encoding formats is implemented as follows:

In the embodiment of the invention, it is first judged whether the short message content contains Chinese characters: if not, the byte length of each short message is defined as 160; otherwise it is 140, and the cases of all-Chinese, all-English, and mixed Chinese-English content are each handled;

In the embodiment of the invention, a method is provided for converting short message content in multiple encoding formats into byte streams and back into short message content, so that splitting and recombination are compatible with short message content in multiple encoding formats;

In the embodiment of the invention, a custom byte object is encapsulated that includes a Chinese-character flag and a Chinese byte identifier, together with a method for splitting the short message content into custom byte objects. This lets the program know, during splitting, which bytes belong to English characters and which belong to Chinese characters, and prevents the garbled content that results when the two bytes of one Chinese character are placed in two different short messages;

In the embodiment of the invention, the first short message and the subsequent short messages are processed by two different flows, because the first short message must carry the signature while the subsequent ones do not. Both flows, however, examine the byte at the split point to judge whether it is the first byte of a Chinese character; if it is, that byte is not left alone at the end of the previous short message but is moved to the following one. With this optimized algorithm, the number of split short messages is reduced as far as possible while the relevant short message specifications and readability requirements are met, thereby reducing users' short message charges.


The invention and its embodiments have been described above in a non-limiting manner. In summary, any structure or embodiment similar to this technical solution that a person of ordinary skill in the art, informed by this disclosure, devises without inventive effort and without departing from the gist of the invention shall fall within the protection scope of the invention.

Claims

1. A method for realizing splitting and recombining long short messages for processing multiple coding formats is characterized in that: the implementation method for splitting and recombining the long short messages for processing various coding formats is as follows:

2. The method for implementing splitting and reorganizing long short messages with multiple coding formats according to claim 1, wherein the method is characterized in that: the Chinese byte identifier described in step 2 marks the position of the current byte within a Chinese character; since a single Chinese character corresponds to two bytes, the value 0 denotes the first byte of the character and the value 1 denotes the second byte.

3. The method for implementing splitting and reorganizing long short messages with multiple coding formats according to claim 1, wherein the method is characterized in that: in step 5, the length of each short message is changed to 133 bytes because, for a long short message to be recombined in order on the receiving side, a protocol header must be added to each short message, and this header is 7 bytes long; this requires the short message to use UCS2 encoding, and since a single short message carries 140 bytes in total, subtracting the 7-byte protocol header leaves 133 bytes per short message.

4. The method for implementing splitting and reorganizing long short messages with multiple coding formats according to claim 2, wherein the method is characterized in that: the process of forming a custom byte object array in step 2 mainly includes the following steps:

Step 2-2: splitting the content of the short message into character arrays;

5. The method for implementing splitting and reorganizing long short messages with multiple coding formats according to claim 1, wherein the method is characterized in that: the process of converting the custom byte array into the character string in the step 7 mainly comprises the following steps:

6. The method for implementing splitting and reorganizing long short messages with multiple coding formats according to claim 1, wherein the method is characterized in that: the long short message splitting process in the step 8 mainly comprises the following two steps:

step 8-1: adding the first short message content;

Step 8-2: adding the content of the subsequent short message;

The main flow of the step 8-2 comprises the following steps:

7. The method for implementing splitting and reorganizing long short messages with multiple coding formats according to claim 1, wherein the method is characterized in that: the short message reorganization process in the step 10 includes the following steps: