
CN115168303A - Storage algorithm model based on complex data serialization - Google Patents

Info

Publication number
CN115168303A
Authority
CN
China
Prior art keywords
data
storage
serialization
server
complex data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202210883606.9A
Other languages
Chinese (zh)
Inventor
郑荣华
蔡鹏祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huabi Technology Chengdu Co ltd
Original Assignee
Huabi Technology Chengdu Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huabi Technology Chengdu Co ltd
Priority to CN202210883606.9A
Publication of CN115168303A

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0643 Management of files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a storage algorithm model based on complex data serialization and relates to the technical field of data processing. The method comprises the following steps: extracting complex data, adopting the serialization framework Thrift, and using its IDL syntax to define and describe data types and services; describing each class of objects with the Thrift struct keyword, which the Thrift compiler translates into a class in the target language, to obtain the data to be stored; operating on the stored data through a client; performing a hash calculation on the key of the data at the server to obtain a number; taking the remainder of that number divided by the number of servers to obtain the server number; and performing the operation on the corresponding server. By providing an IDL for describing the data schema, the invention can readily describe structured and unstructured data, supports cross-language reading and writing, and is convenient to operate.

Description

Storage algorithm model based on complex data serialization
Technical Field
The invention relates to the technical field of data processing, in particular to a storage algorithm model based on complex data serialization.
Background
When data needs to be stored in a file or sent over a network, the data object must first be converted into a byte stream; this is data serialization. Serialization, the process of converting an in-memory object into a byte stream, directly determines data parsing efficiency and schema evolution capability, that is, whether compatibility is maintained when the data format changes, for example when fields are added or deleted.
At present, the serialization of complex data lacks a corresponding storage algorithm model, the processing and storage of structured and unstructured data are insufficiently controllable, and current data storage modes have poor compatibility. We therefore propose a storage algorithm model based on complex data serialization.
Disclosure of Invention
The invention aims to provide a storage algorithm model based on complex data serialization that solves the problems described in the Background.
To solve these technical problems, the invention is realized by the following technical scheme:
The invention is a storage algorithm model based on complex data serialization, comprising the following steps:
S1: extracting complex data, adopting the serialization framework Thrift, and using its IDL syntax to define and describe data types and services;
S2: describing each class of objects with the Thrift struct keyword, which the Thrift compiler translates into a class in the target language, to obtain the data to be stored;
S3: operating on the stored data through the client;
S4: based on the above steps, performing a hash calculation on the key of the data at the server to obtain a number;
S5: taking the remainder of the obtained number divided by the number of servers to obtain the server number;
S6: performing the operation on the corresponding server, completing the storage algorithm operation based on complex data serialization;
S7: based on S6, fetching the data from the corresponding server and returning it to the client.
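Steps S3 to S7 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the source does not name a hash function, so MD5 is assumed here only because it gives the same number on every run, and the names `key_to_number`, `server_index` and `ShardedStore` are hypothetical. Each "server" is simulated by an in-memory dictionary.

```python
import hashlib


def key_to_number(key: str) -> int:
    """S4: hash the data key to obtain a number (MD5 is an assumption)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def server_index(key: str, server_count: int) -> int:
    """S5: remainder of the hash number divided by the number of servers."""
    if server_count <= 0:
        raise ValueError("server_count must be positive")
    return key_to_number(key) % server_count


class ShardedStore:
    """S3, S6, S7: routes each key to one simulated server (a dict)."""

    def __init__(self, server_count: int) -> None:
        self.servers = [dict() for _ in range(server_count)]

    def put(self, key: str, value: bytes) -> int:
        # S6: operate on the server selected by the remainder.
        idx = server_index(key, len(self.servers))
        self.servers[idx][key] = value
        return idx

    def get(self, key: str) -> bytes:
        # S7: fetch from the same server and return to the caller.
        idx = server_index(key, len(self.servers))
        return self.servers[idx][key]


store = ShardedStore(server_count=4)
placed_on = store.put("user:1001", b"serialized-bytes")
fetched = store.get("user:1001")
```

Because the server number depends on the server count, changing the number of servers remaps most keys under this scheme.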
The IDL file in S1 is processed by a dedicated code generator to produce target-language code for use in the user's application; the Thrift IDL syntax resembles that of the C language.
Each field in the Thrift struct in S2 consists of four attributes: a field number, a field modifier, a field type and a field name.
Complex data serialization comprises two parts, serialization and deserialization, which correspond respectively to writing an object instance into a byte stream and reading the byte stream to restore the object instance.
The storage structure of the data comprises a sequential storage method, a linked storage method, an index storage method and a hash storage method.
The storage algorithm model includes document storage. Document storage aims to bridge key-value storage and traditional relational data systems and to combine the advantages of both. Its data are stored mainly in JSON or JSON-like documents and carry semantics. A document database can be regarded as an upgraded version of a key-value database: it allows key-value pairs to be nested in stored values, and a document storage model can generally create indexes on those values for the convenience of upper-layer applications.
The invention has the following beneficial effects:
1. Based on the storage algorithm model of complex data serialization, the invention provides an IDL for describing the data schema, can readily describe structured and unstructured data, supports cross-language reading and writing in at least three mainstream languages (C++, Java and Python), and is highly controllable.
2. Based on the storage algorithm model of complex data serialization, the invention encodes data for storage, for example using variable-length encoding for integers and compression encoding for character strings, to avoid unnecessary storage waste as far as possible; it also supports schema evolution and ensures forward and backward compatibility between reading and writing modules.
Of course, it is not necessary for any product to practice the invention to achieve all of the above-described advantages at the same time.
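The variable-length integer encoding mentioned in the second beneficial effect can be illustrated with a base-128 "varint", in which small values occupy fewer bytes. The source does not specify which encoding is used, so the scheme below is an assumption chosen for illustration, and the function names are hypothetical.

```python
from typing import Tuple


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low-order group first."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        group = n & 0x7F
        n >>= 7
        if n:
            out.append(group | 0x80)  # high bit set: more bytes follow
        else:
            out.append(group)         # high bit clear: last byte
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


small = encode_varint(5)      # one byte
medium = encode_varint(300)   # two bytes
value, next_pos = decode_varint(medium)
```

With a fixed-width 64-bit field every integer would occupy eight bytes; here the value 5 takes one byte and 300 takes two.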
Drawings
In order to illustrate the technical solutions of the embodiments of the invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of the operation of the storage algorithm model based on complex data serialization according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIG. 1, the invention is a storage algorithm model based on complex data serialization, comprising the following steps:
S1: Extract the complex data, adopt the serialization framework Thrift, and use its IDL syntax to define and describe data types and services. A dedicated code generator produces target-language code from the IDL file for use in the user's application; the Thrift IDL syntax resembles that of the C language.
S2: Describe each class of objects with the Thrift struct keyword; the Thrift compiler translates it into a class in the target language, yielding the data to be stored. Each field in a Thrift struct consists of four attributes: a field number, a field modifier, a field type and a field name.
Field number: each field must have a unique integer number (the numbers need not be consecutive). Thrift uses these numbers to achieve backward and forward compatibility: during schema evolution, the numbers of existing fields are never deleted or modified, and new fields are only given new numbers. Field modifier: two keywords, required and optional, constrain the field's value; required means that a value must be set for the field, and optional means that the value may be present or absent. Field type: Thrift supports a rich set of data types, including basic types such as 32-bit and 64-bit integers (i32 and i64) and container types such as set, list and map; see the official Thrift documentation for details. Field name: each field name within the same struct must be unique, and a default value can be set for a field.
S3: Operate on the stored data through the client.
S4: Based on the above steps, perform a hash calculation on the key of the data at the server to obtain a number.
S5: Take the remainder of the obtained number divided by the number of servers to obtain the server number.
S6: Perform the operation on the corresponding server, completing the storage algorithm operation based on complex data serialization.
S7: Based on S6, fetch the data from the corresponding server and return it to the client.
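The field-numbering rule described under S2 is what lets readers and writers built from different schema versions interoperate. The sketch below illustrates the idea only; it is not Thrift's actual wire format. The schema dictionaries, the 6-byte header layout and the use of JSON for field payloads are all simplifying assumptions, and `serialize` and `deserialize` are hypothetical names.

```python
import json
import struct

# Hypothetical schema format: field number -> (field name, modifier, default).
SCHEMA_V1 = {
    1: ("name", "required", None),
    2: ("age", "optional", 0),
}
# Version 2 leaves numbers 1 and 2 untouched and gives the new field a new number.
SCHEMA_V2 = {
    1: ("name", "required", None),
    2: ("age", "optional", 0),
    3: ("email", "optional", ""),
}

HEADER = struct.Struct(">HI")  # field number (2 bytes), payload length (4 bytes)


def serialize(record: dict, schema: dict) -> bytes:
    """Write an object instance into a byte stream, one tagged entry per field."""
    out = bytearray()
    for number, (name, modifier, _default) in schema.items():
        if name not in record:
            if modifier == "required":
                raise ValueError(f"missing required field: {name}")
            continue  # optional fields may simply be absent
        payload = json.dumps(record[name]).encode("utf-8")
        out += HEADER.pack(number, len(payload)) + payload
    return bytes(out)


def deserialize(data: bytes, schema: dict) -> dict:
    """Read the byte stream back, skipping field numbers the schema does not know."""
    record = {name: default
              for name, modifier, default in schema.values()
              if modifier == "optional"}
    pos = 0
    while pos < len(data):
        number, length = HEADER.unpack_from(data, pos)
        pos += HEADER.size
        payload = data[pos:pos + length]
        pos += length
        if number in schema:
            record[schema[number][0]] = json.loads(payload.decode("utf-8"))
    for name, modifier, _default in schema.values():
        if modifier == "required" and name not in record:
            raise ValueError(f"missing required field: {name}")
    return record


# Old reader, new data: the unknown field number 3 is skipped.
new_bytes = serialize({"name": "Ada", "age": 36, "email": "ada@example.com"}, SCHEMA_V2)
old_view = deserialize(new_bytes, SCHEMA_V1)

# New reader, old data: the missing optional fields fall back to their defaults.
old_bytes = serialize({"name": "Ada"}, SCHEMA_V1)
new_view = deserialize(old_bytes, SCHEMA_V2)
```

Reusing or renumbering an existing field would break this: a reader would then interpret old bytes under the wrong field name.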
Complex data serialization comprises two parts, serialization and deserialization, which correspond respectively to writing an object instance into a byte stream and reading the byte stream to restore the object instance. The storage structure of the data comprises a sequential storage method, a linked storage method, an index storage method and a hash storage method. The storage algorithm model includes document storage, which aims to bridge key-value storage and traditional relational data systems and to combine the advantages of both. Its data are stored mainly in JSON or JSON-like documents and carry semantics. A document database can be regarded as an upgraded version of a key-value database: it allows key-value pairs to be nested in stored values, and a document storage model can generally create indexes on those values for the convenience of upper-layer applications.
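The document storage idea, a key-value store whose values are JSON documents that may nest further key-value pairs and can be indexed by value, can be sketched as below. The class `DocumentStore` and its methods are hypothetical, only one top-level field is indexed, and the indexed values are assumed to be hashable (strings or numbers).

```python
import json
from collections import defaultdict


class DocumentStore:
    """Key-value store whose values are JSON documents, with one value index."""

    def __init__(self, indexed_field: str) -> None:
        self.indexed_field = indexed_field
        self.documents = {}             # key -> JSON text
        self.index = defaultdict(set)   # indexed field value -> set of keys

    def put(self, key: str, document: dict) -> None:
        if key in self.documents:
            # Remove the old index entry before overwriting the document.
            old_value = json.loads(self.documents[key]).get(self.indexed_field)
            if old_value is not None:
                self.index[old_value].discard(key)
        self.documents[key] = json.dumps(document)
        value = document.get(self.indexed_field)
        if value is not None:
            self.index[value].add(key)

    def get(self, key: str) -> dict:
        return json.loads(self.documents[key])

    def find(self, value) -> list:
        """Look up keys through the index instead of scanning every document."""
        return sorted(self.index.get(value, ()))


store = DocumentStore(indexed_field="city")
store.put("u1", {"name": "Ada", "city": "Chengdu",
                 "contact": {"email": "ada@example.com"}})  # nested key-value pairs
store.put("u2", {"name": "Lin", "city": "Chengdu"})
store.put("u3", {"name": "Bo", "city": "Beijing"})
in_chengdu = store.find("Chengdu")
```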
In this scheme, the client's operation on the stored data comprises the following: the servers are distributed on a ring; the client starts a data operation; the server performs a hash calculation on the key of the data to obtain a number; the resulting hash value is compared with the points on the ring to find the data's drop point on the ring; the server searches clockwise from the drop point for the nearest server node; and the operation is performed on that server.
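The ring-based lookup described above is commonly called consistent hashing. A minimal sketch follows, assuming MD5 for both server names and keys, a ring of size 2**32, and one point per server (no virtual nodes); `ring_position` and `HashRing` are hypothetical names not taken from the source.

```python
import bisect
import hashlib


def ring_position(text: str) -> int:
    """Map a server name or data key to a point on a ring of size 2**32."""
    digest = hashlib.md5(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    def __init__(self, servers) -> None:
        # Place every server on the ring, sorted by position.
        self._points = sorted((ring_position(name), name) for name in servers)
        self._positions = [position for position, _name in self._points]

    def locate(self, key: str) -> str:
        """Find the key's drop point, then walk clockwise to the nearest server."""
        if not self._points:
            raise ValueError("the ring has no servers")
        drop_point = ring_position(key)
        i = bisect.bisect_left(self._positions, drop_point)
        if i == len(self._positions):
            i = 0  # past the last server: wrap around to the first one
        return self._points[i][1]


ring = HashRing(["server-a", "server-b", "server-c"])
owner = ring.locate("user:1001")
```

Unlike the remainder calculation in S5, adding a server to the ring moves only the keys that fall between the new server and its predecessor; all other keys keep their server.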
In this scheme, the storage structure of data can be obtained by the following four basic storage methods:
Sequential storage method: logically adjacent nodes are stored in physically adjacent storage units, and the logical relationship between nodes is represented by the adjacency of the storage units. The resulting storage representation is called a sequential storage structure and is usually described by means of arrays in a programming language. The method is applied mainly to linear data structures; non-linear data structures can also be stored sequentially after some form of linearization.
Linked storage method: logically adjacent nodes are not required to be physically adjacent; the logical relationship between nodes is represented by an additional pointer field. The resulting storage representation is called a linked storage structure and is usually described by means of pointer types in a programming language.
Index storage method: in addition to storing node information, an index table consisting of a number of index entries is established. If every node has its own index entry, the table is called a dense index; if a group of nodes corresponds to a single index entry, the table is called a sparse index. The general form of an index entry is (keyword, address), where the keyword is a data item that uniquely identifies a node. In a dense index, the address of an index entry gives the storage position of the node; in a sparse index, it gives the starting storage position of a group of nodes.
Hash storage method: the basic idea is to calculate the storage address of a node directly from the node's keyword.
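The four storage methods can be compared on one small data set. The sketch below is illustrative only: the records, the group size of 2, the table size of 8 and the use of `key % TABLE_SIZE` as the hash function are all assumptions, and Python object references stand in for pointers.

```python
import bisect

# Sequential storage: nodes kept in key order in adjacent slots of an array.
records = [(3, "c"), (8, "h"), (15, "o"), (21, "u"), (34, "x"), (42, "z")]


# Linked storage: each node carries a reference to its logical successor.
class Node:
    def __init__(self, key, value, next_node=None):
        self.key, self.value, self.next_node = key, value, next_node


head = None
for key, value in reversed(records):
    head = Node(key, value, head)


def lookup_linked(key):
    node = head
    while node is not None:
        if node.key == key:
            return node.value
        node = node.next_node
    return None


# Dense index: one (keyword, address) entry per node.
dense_index = {key: address for address, (key, _value) in enumerate(records)}


def lookup_dense(key):
    address = dense_index.get(key)
    return None if address is None else records[address][1]


# Sparse index: one (keyword, address) entry per group, pointing at its first slot.
GROUP_SIZE = 2
sparse_index = [(records[a][0], a) for a in range(0, len(records), GROUP_SIZE)]
sparse_keys = [key for key, _address in sparse_index]


def lookup_sparse(key):
    group = bisect.bisect_right(sparse_keys, key) - 1
    if group < 0:
        return None  # key is smaller than every indexed keyword
    start = sparse_index[group][1]
    for k, v in records[start:start + GROUP_SIZE]:  # scan within the group
        if k == key:
            return v
    return None


# Hash storage: the address is computed directly from the keyword.
TABLE_SIZE = 8
table = [[] for _ in range(TABLE_SIZE)]  # each slot is a list to absorb collisions


def hash_address(key: int) -> int:
    return key % TABLE_SIZE


for key, value in records:
    table[hash_address(key)].append((key, value))


def lookup_hash(key):
    for k, v in table[hash_address(key)]:
        if k == key:
            return v
    return None
```

The sparse index holds three entries for six nodes at the cost of a short scan inside each group, while the dense index holds six entries and needs no scan.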
In the description herein, references to the description of "one embodiment," "an example," "a specific example," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise embodiments disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims (7)

1. A storage algorithm model based on complex data serialization, characterized by comprising the following steps:
S1: extracting complex data, adopting the serialization framework Thrift, and using its IDL syntax to define and describe data types and services;
S2: describing each class of objects with the Thrift struct keyword, which the Thrift compiler translates into a class in the target language, to obtain the data to be stored;
S3: operating on the stored data through the client;
S4: based on the above steps, performing a hash calculation on the key of the data at the server to obtain a number;
S5: taking the remainder of the obtained number divided by the number of servers to obtain the server number;
S6: performing the operation on the corresponding server, completing the storage algorithm operation based on complex data serialization.
2. The storage algorithm model based on complex data serialization according to claim 1, wherein the IDL file in S1 is processed by a dedicated code generator to produce target-language code for use in the user's application, and the Thrift IDL syntax resembles that of the C language.
3. The storage algorithm model based on complex data serialization according to claim 1, wherein each field in the Thrift struct in S2 consists of four attributes: a field number, a field modifier, a field type and a field name.
4. The storage algorithm model based on complex data serialization, characterized in that the complex data serialization comprises two parts, serialization and deserialization, which correspond respectively to writing an object instance into a byte stream and reading the byte stream to restore the object instance.
5. The storage algorithm model based on complex data serialization, characterized in that the storage structure of the data comprises a sequential storage method, a linked storage method, an index storage method and a hash storage method.
6. The storage algorithm model based on complex data serialization according to claim 1, characterized in that the storage algorithm model includes document storage; the document storage aims to bridge key-value storage and traditional relational data systems and to combine the advantages of both; its data are stored mainly in JSON or JSON-like documents and carry semantics; a document database is regarded as an upgraded version of a key-value database that allows key-value pairs to be nested in stored values; and the document storage model can generally create indexes on those values for the convenience of upper-layer applications.
7. The storage algorithm model based on complex data serialization according to claim 1, further comprising S7: based on S6, the server fetches the data from the corresponding server and returns it to the client.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210883606.9A CN115168303A (en) 2022-07-26 2022-07-26 Storage algorithm model based on complex data serialization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210883606.9A CN115168303A (en) 2022-07-26 2022-07-26 Storage algorithm model based on complex data serialization

Publications (1)

Publication Number Publication Date
CN115168303A (en) 2022-10-11

Family

ID=83496319

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210883606.9A Withdrawn CN115168303A (en) 2022-07-26 2022-07-26 Storage algorithm model based on complex data serialization

Country Status (1)

Country Link
CN (1) CN115168303A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20221011