CN115168303A - Storage algorithm model based on complex data serialization

Info

Publication number: CN115168303A
Application number: CN202210883606.9A
Authority: CN
Inventors: 郑荣华; 蔡鹏祥
Original assignee: Huabi Technology Chengdu Co ltd
Current assignee: Huabi Technology Chengdu Co ltd
Priority date: 2022-07-26
Filing date: 2022-07-26
Publication date: 2022-10-11

Abstract

The invention discloses a storage algorithm model based on complex data serialization and relates to the technical field of data processing. The model comprises the following steps: extracting complex data, adopting the serialization framework Thrift, and using IDL syntax to define and describe data types and services; using the Thrift struct keyword to describe a class of objects, which the Thrift compiler translates into a class in the target language, to obtain the stored data; operating on the stored data through a client; performing a hash calculation on the key of the data at the server to obtain a number; taking the remainder of that number divided by the number of servers to obtain the serial number of the server; and performing the operation on the corresponding server. By providing an IDL for describing the data schema, the invention can readily describe any structured or unstructured data, supports cross-language reading and writing, and is convenient to operate.

Description

Storage algorithm model based on complex data serialization

Technical Field

The invention relates to the technical field of data processing, in particular to a storage algorithm model based on complex data serialization.

Background

When data is to be stored in a file or sent over a network, the data object must be converted into a byte stream, that is, serialized. Data serialization is the process of converting an in-memory object into a byte stream. It directly determines data parsing efficiency and schema evolution capability, that is, whether compatibility is maintained when the data format changes, for example when fields are added or deleted.

At present, there is no corresponding storage support algorithm model for the serialization of complex data, the processing and storage of structured and unstructured data are insufficiently controllable, and current data storage modes have poor compatibility. We therefore propose a storage algorithm model based on complex data serialization.

Disclosure of Invention

The invention aims to provide a storage algorithm model based on complex data serialization that solves the problems described in the background.

To solve the above technical problems, the invention is realized by the following technical scheme:

The invention is a storage algorithm model based on complex data serialization, comprising the following steps:

S1: extracting complex data, adopting the serialization framework Thrift, and using IDL syntax to define and describe data types and services;

S2: Thrift uses the struct keyword to describe a class of objects; after compilation by the Thrift compiler, the struct is translated into a class in the target language, yielding the stored data;

S3: operating on the stored data through the client;

S4: based on the above steps, the server performs a hash calculation on the key of the data to obtain a number;

S5: the server takes the remainder of the obtained number divided by the number of servers to obtain the serial number of the target server;

S6: the server performs the operation on the corresponding server, completing the storage algorithm operation based on complex data serialization;

S7: based on S6, the server fetches the data from the corresponding server and returns it to the client.
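Steps S4 to S7 can be sketched as follows. This is a minimal illustration, not the patented implementation: the choice of MD5 as the hash, the `Cluster` class and the in-memory dictionaries standing in for servers are all assumptions, since the source names no hash function or server interface.

```python
import hashlib


def key_to_number(key: str) -> int:
    """S4: hash the key to a number (MD5 is an assumption, not from the source)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def server_index(key: str, server_count: int) -> int:
    """S5: remainder of the hash number divided by the number of servers."""
    if server_count <= 0:
        raise ValueError("server_count must be positive")
    return key_to_number(key) % server_count


class Cluster:
    """Hypothetical in-memory stand-in for a group of storage servers."""

    def __init__(self, server_count: int) -> None:
        self.servers = [dict() for _ in range(server_count)]

    def put(self, key: str, value: bytes) -> int:
        """S6: store the serialized value on the server selected by S4 and S5."""
        idx = server_index(key, len(self.servers))
        self.servers[idx][key] = value
        return idx

    def get(self, key: str) -> bytes:
        """S7: fetch the value from the same server and return it to the caller."""
        idx = server_index(key, len(self.servers))
        return self.servers[idx][key]


cluster = Cluster(4)
placed_on = cluster.put("user:42", b"serialized-bytes")
fetched = cluster.get("user:42")
```

Because the server number depends on the server count, changing the number of servers in this scheme remaps most keys.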

A dedicated code generator generates the corresponding target-language code from the IDL file in S1 for the user to use in applications; the Thrift IDL syntax is similar to that of the C language.

Each field (domain) in the Thrift struct in S2 consists of four attributes: a field number, a field modifier, a field type and a field name.

The complex data serialization comprises two parts, serialization and deserialization, which correspond respectively to writing an object instance into a byte stream and reading the byte stream to restore the object instance.

The storage structure of the data comprises a sequential storage method, a linked storage method, an index storage method and a hash storage method.

The storage algorithm model comprises document storage. Document storage aims to build a bridge between key-value storage and traditional relational data systems, combining the advantages of both. Its data is mainly stored in JSON or JSON-like documents and is semantic. A document database can be regarded as an upgraded version of a key-value database: it allows key-value pairs to be nested in the stored values, and the document storage model can generally create indexes on its values for the convenience of upper-layer applications.
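The document storage idea (a key maps to a nested, JSON-like value, and an index can be created over a value inside the document) can be sketched as below. The `DocumentStore` class, its method names and the dotted-path convention are hypothetical and chosen only for illustration; indexed values are assumed to be hashable.

```python
from collections import defaultdict


class DocumentStore:
    """Hypothetical in-memory document store: key -> JSON-like document."""

    def __init__(self):
        self.docs = {}
        self.indexes = {}  # dotted field path -> {value -> set of keys}

    @staticmethod
    def _lookup(doc, path):
        """Follow a dotted path such as 'address.city' into a nested dict."""
        cur = doc
        for part in path.split("."):
            if not isinstance(cur, dict) or part not in cur:
                return None
            cur = cur[part]
        return cur

    def create_index(self, path):
        """Build an index over the value found at `path` in every document."""
        index = defaultdict(set)
        for key, doc in self.docs.items():
            value = self._lookup(doc, path)
            if value is not None:
                index[value].add(key)
        self.indexes[path] = index

    def put(self, key, doc):
        """Store a document and keep every existing index up to date."""
        old = self.docs.get(key)
        for path, index in self.indexes.items():
            if old is not None:
                old_value = self._lookup(old, path)
                if old_value is not None:
                    index[old_value].discard(key)
            new_value = self._lookup(doc, path)
            if new_value is not None:
                index[new_value].add(key)
        self.docs[key] = doc

    def find(self, path, value):
        """Return the keys of documents whose indexed field equals `value`."""
        return sorted(self.indexes[path].get(value, set()))


store = DocumentStore()
store.create_index("address.city")
store.put("u1", {"name": "Ann", "address": {"city": "Chengdu"}})
store.put("u2", {"name": "Bo", "address": {"city": "Beijing"}})
store.put("u3", {"name": "Cy", "address": {"city": "Chengdu"}})
matches = store.find("address.city", "Chengdu")
```

A plain key-value store could only answer lookups by `u1`, `u2` or `u3`; the index over the nested value is what allows the query by city.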

The invention has the following beneficial effects:

1. Based on the complex data serialization storage algorithm model, the invention provides an IDL for describing the data schema, can readily describe any structured or unstructured data, supports cross-language reading and writing (at least the three mainstream languages C++, Java and Python), and is highly controllable.

2. Based on the storage algorithm model of complex data serialization, the invention encodes data for storage (for example, integers can use variable-length encoding and character strings can use compression encoding) to avoid unnecessary storage waste as far as possible, while supporting schema evolution and ensuring forward and backward compatibility of the read-write modules.
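The variable-length integer encoding mentioned in item 2 can be illustrated with a base-128 varint, in which each byte carries 7 bits of the value and the high bit marks whether more bytes follow. The source does not say which encoding is intended, so this scheme and the function names below are assumptions; the sketch handles non-negative integers only.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, least significant group first."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0):
    """Decode one varint starting at `pos`; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


small = encode_varint(1)      # one byte instead of a fixed 4 or 8
medium = encode_varint(300)   # two bytes
value, end = decode_varint(medium)
```

Small values, which are common in practice, occupy a single byte, which is the storage saving the item above refers to.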

Of course, a product practicing the invention need not achieve all of the above advantages at the same time.

Drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flow chart of the operation of the storage algorithm model based on complex data serialization according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to FIG. 1, the invention is a storage algorithm model based on complex data serialization, comprising the following steps:

S1: extracting complex data, adopting the serialization framework Thrift, and using IDL syntax to define and describe data types and services; a dedicated code generator generates the corresponding target-language code from the IDL file for the user to use in applications, and the Thrift IDL syntax is similar to that of the C language;

S2: Thrift uses the struct keyword to describe a class of objects; after compilation by the Thrift compiler, the struct is translated into a class in the target language, yielding the stored data. Each field (domain) in a Thrift struct consists of four attributes: a field number, a field modifier, a field type and a field name;

Field number: each field must have a unique integer number (the numbers need not be contiguous). Thrift uses these numbers to achieve backward and forward compatibility: during schema evolution, the numbers of existing fields must not be deleted or modified, and new fields are only given new numbers.

Field modifier: one of two keywords, required and optional, which constrain the value of the field. required means that a value must be set for the field; optional means that the field may or may not have a value.

Field type: Thrift supports a rich set of data types, including basic types such as int and long and complex container types such as set, list and map; see the documentation on the official Thrift website for details.

Field name: each field name within the same struct must be unique, and a default value can be set for the field.
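The role of the four field (domain) attributes in schema evolution can be sketched in plain Python. This is not Thrift itself and does not use the Thrift library or its wire format: the `Field` class, the JSON-based encoding keyed by field number, and the `USER_V1`/`USER_V2` schemas are all hypothetical, chosen only to show why keeping field numbers stable lets old and new readers interoperate.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Field:
    number: int       # unique within the struct; never reused or changed
    required: bool    # True models "required", False models "optional"
    py_type: type     # Python type standing in for a Thrift type
    name: str         # unique within the struct
    default: Any = None


def check_schema(fields):
    """Enforce unique field numbers and unique field names."""
    if len({f.number for f in fields}) != len(fields):
        raise ValueError("duplicate field number")
    if len({f.name for f in fields}) != len(fields):
        raise ValueError("duplicate field name")


def serialize(fields, obj: dict) -> bytes:
    """Write an object as bytes, identifying each value by its field number."""
    check_schema(fields)
    wire = {}
    for f in fields:
        if f.name in obj:
            if not isinstance(obj[f.name], f.py_type):
                raise TypeError(f"field {f.name} has the wrong type")
            wire[str(f.number)] = obj[f.name]
        elif f.required:
            raise ValueError(f"missing required field {f.name}")
    return json.dumps(wire).encode("utf-8")


def deserialize(fields, data: bytes) -> dict:
    """Restore an object; unknown field numbers are ignored, missing optionals get defaults."""
    check_schema(fields)
    wire = json.loads(data.decode("utf-8"))
    obj = {}
    for f in fields:
        key = str(f.number)
        if key in wire:
            obj[f.name] = wire[key]
        elif f.required:
            raise ValueError(f"missing required field {f.name}")
        else:
            obj[f.name] = f.default
    return obj


USER_V1 = [Field(1, True, int, "id"), Field(2, False, str, "name", "")]
# Evolution: existing numbers are untouched, the new field gets a new number.
USER_V2 = USER_V1 + [Field(5, False, str, "email", "")]

new_bytes = serialize(USER_V2, {"id": 7, "name": "Ann", "email": "ann@example.com"})
old_view = deserialize(USER_V1, new_bytes)   # old reader, new data
old_bytes = serialize(USER_V1, {"id": 8})
new_view = deserialize(USER_V2, old_bytes)   # new reader, old data
```

The old reader skips field number 5, which it does not know, and the new reader falls back to the default for the field that old data lacks.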

S3: operating on the stored data through the client;

S4: based on the above steps, the server performs a hash calculation on the key of the data to obtain a number;

S5: the server takes the remainder of the obtained number divided by the number of servers to obtain the serial number of the target server;

S6: the server performs the operation on the corresponding server, completing the storage algorithm operation based on complex data serialization.

The complex data serialization comprises two parts, serialization and deserialization, which correspond respectively to writing an object instance into a byte stream and reading the byte stream to restore the object instance. The storage structure of the data comprises a sequential storage method, a linked storage method, an index storage method and a hash storage method. The storage algorithm model comprises document storage. Document storage aims to build a bridge between key-value storage and traditional relational data systems, combining the advantages of both. Its data is mainly stored in JSON or JSON-like documents and is semantic. A document database can be regarded as an upgraded version of a key-value database: it allows key-value pairs to be nested in the stored values, and the document storage model can generally create indexes on its values for the convenience of upper-layer applications.

In this scheme, the client's operation on the stored data proceeds as follows: the servers are distributed on a ring; the client starts a data operation; the server performs a hash calculation on the key of the data to obtain a number; the obtained hash value is compared with the points on the ring to find the drop point of the data on the ring; the server searches clockwise from the drop point for the closest server node; and the operation is performed on that server.
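The ring-based lookup described above corresponds to what is commonly called consistent hashing. A minimal sketch follows, assuming MD5 truncated to 32 bits as the ring position function and treating "clockwise" as increasing position with wrap-around; the `HashRing` class and the server names are hypothetical.

```python
import bisect
import hashlib


def ring_position(label: str) -> int:
    """Map a server name or data key to a point on a 32-bit ring (MD5 is an assumption)."""
    return int.from_bytes(hashlib.md5(label.encode("utf-8")).digest()[:4], "big")


class HashRing:
    def __init__(self, servers):
        # Servers are placed on the ring at the hash of their names.
        self._points = sorted((ring_position(s), s) for s in servers)
        self._positions = [pos for pos, _ in self._points]

    def locate(self, key: str) -> str:
        """Find the drop point of the key, then the nearest server clockwise."""
        if not self._points:
            raise ValueError("the ring has no servers")
        drop_point = ring_position(key)
        i = bisect.bisect_left(self._positions, drop_point)
        if i == len(self._positions):
            i = 0  # wrap around past the largest position
        return self._points[i][1]


names = ["server-a", "server-b", "server-c"]
ring = HashRing(names)
owner = ring.locate("user:42")

# Removing a server that does not own the key leaves the key where it was.
other = next(s for s in names if s != owner)
smaller_ring = HashRing([s for s in names if s != other])
owner_after_removal = smaller_ring.locate("user:42")
```

Unlike the remainder-based placement of S5, only the keys owned by an added or removed server move when the set of servers changes.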

In this scheme, the storage structure of data can be obtained by the following four basic storage methods:

Sequential storage method: logically adjacent nodes are stored in physically adjacent storage units, and the logical relationship between nodes is represented by the adjacency of the storage units. The resulting storage representation is called a sequential storage structure and is usually described by means of arrays in a programming language. The method is mainly applied to linear data structures; non-linear data structures can also be stored sequentially by some linearization method.

Linked storage method: logically adjacent nodes are not required to be physically adjacent, and the logical relationship between nodes is represented by an additional pointer field. The resulting storage representation is called a linked storage structure and is usually described by means of pointer types in a programming language.
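The contrast between the two methods can be shown with a short sketch. The `Node` class and helper functions are hypothetical; a Python list is used as the array-like sequential structure, and object references play the role of the pointer field.

```python
from dataclasses import dataclass
from typing import Optional

# Sequential storage: the order of elements is the order of the slots.
sequential = ["a", "b", "c"]
second_item = sequential[1]  # direct access by position


@dataclass
class Node:
    value: str
    next: Optional["Node"] = None  # the additional pointer field


def build_linked(values):
    """Build a singly linked chain; nodes may live anywhere in memory."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def to_list(head):
    """Walk the chain by following the pointer field."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out


def insert_after(node, value):
    """Insert without moving other nodes: only two pointers change."""
    node.next = Node(value, node.next)


head = build_linked(["a", "b", "c"])
insert_after(head, "x")
linked_order = to_list(head)
```

Insertion in the linked form touches only the neighbouring pointers, whereas inserting into the middle of the sequential form shifts every later element.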

Index storage method: in addition to storing the node information, an additional index table is usually established. The index table is composed of a number of index entries. If each node has its own index entry in the index table, the index table is called a dense index; if a group of nodes corresponds to only one index entry, the index table is called a sparse index. The general form of an index entry is:

(keyword, address)

The keyword is a data item that uniquely identifies a node. In a dense index, the address of an index entry indicates the storage position of the node; in a sparse index, it indicates the starting storage position of a group of nodes.

Hash storage method: the basic idea is to calculate the storage address of a node directly from the keyword of the node.
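The index and hash storage methods can be compared on a small sorted record set. The record values, the block size of 3, the table size of 11 and the use of chaining to resolve collisions are illustrative assumptions, not details from the source.

```python
import bisect

# Records sorted by keyword; the list position plays the role of the address.
records = [(k, f"row-{k}") for k in (3, 8, 15, 21, 34, 55, 89)]

# Dense index: one (keyword, address) entry per node.
dense_index = {key: addr for addr, (key, _) in enumerate(records)}

# Sparse index: one entry per group of BLOCK nodes, pointing at the group start.
BLOCK = 3
sparse_addrs = list(range(0, len(records), BLOCK))
sparse_keys = [records[addr][0] for addr in sparse_addrs]


def dense_lookup(key):
    addr = dense_index.get(key)
    return None if addr is None else records[addr][1]


def sparse_lookup(key):
    """Find the group whose first keyword is <= key, then scan inside the group."""
    i = bisect.bisect_right(sparse_keys, key) - 1
    if i < 0:
        return None
    start = sparse_addrs[i]
    for k, payload in records[start:start + BLOCK]:
        if k == key:
            return payload
    return None


# Hash storage: the address is computed directly from the keyword.
TABLE_SIZE = 11
table = [[] for _ in range(TABLE_SIZE)]


def hash_address(key):
    return key % TABLE_SIZE


for key, payload in records:
    table[hash_address(key)].append((key, payload))  # chaining handles collisions


def hash_lookup(key):
    for k, payload in table[hash_address(key)]:
        if k == key:
            return payload
    return None
```

The sparse index needs fewer entries than the dense one at the cost of a short scan inside the group, while hash storage needs no index table at all but must handle keywords that map to the same address.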

In the description herein, references to the description of "one embodiment," "an example," "a specific example," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise embodiments disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims

1. A storage algorithm model based on complex data serialization, characterized by comprising the following steps:

S2: Thrift uses the struct keyword to describe a class of objects; after compilation by the Thrift compiler, the struct is translated into a class in the target language, yielding the stored data;

S3: operating on the stored data through the client;

2. The storage algorithm model based on complex data serialization according to claim 1, wherein a dedicated code generator generates the corresponding target-language code from the IDL file in S1 for the user to use in applications, and the Thrift IDL syntax is similar to that of the C language.

3. The storage algorithm model based on complex data serialization according to claim 1, wherein each field (domain) in the Thrift struct in S2 consists of four attributes: a field number, a field modifier, a field type and a field name.

4. The storage algorithm model based on complex data serialization, characterized in that the complex data serialization comprises two parts, serialization and deserialization, which correspond respectively to writing an object instance into a byte stream and reading the byte stream to restore the object instance.

5. The storage algorithm model based on complex data serialization, characterized in that the storage structure of the data comprises a sequential storage method, a linked storage method, an index storage method and a hash storage method.

6. The storage algorithm model based on complex data serialization according to claim 1, characterized in that the storage algorithm model comprises document storage; the document storage aims to build a bridge between key-value storage and traditional relational data systems, combining the advantages of both; its data is mainly stored in JSON or JSON-like documents and is semantic; a document database is regarded as an upgraded version of a key-value database, allowing key-value pairs to be nested in the stored values; and the document storage model can generally create indexes on its values for the convenience of upper-layer applications.

7. The storage algorithm model based on complex data serialization according to claim 1, further comprising S7: based on S6, the server fetches the data from the corresponding server and returns it to the client.