
CN115801877B - Data transmission platform and method - Google Patents

Data transmission platform and method

Info

Publication number
CN115801877B
CN115801877B (application CN202211425720.3A)
Authority
CN
China
Prior art keywords
transmission
data
data transmission
cluster
client
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211425720.3A
Other languages
Chinese (zh)
Other versions
CN115801877A (en)
Inventor
施力
陈明坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lianyang Guorong Beijing Technology Co ltd
Original Assignee
Lianyang Guorong Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lianyang Guorong Beijing Technology Co ltd filed Critical Lianyang Guorong Beijing Technology Co ltd
Priority to CN202211425720.3A
Publication of CN115801877A
Application granted
Publication of CN115801877B
Legal status: Active
Anticipated expiration

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract


The present invention discloses a data transmission platform and method. The data transmission platform is deployed in each cluster, and each data transmission platform includes a scheduler and at least one node. The scheduler is realized as a singleton through the election of the nodes and performs unified flow control and priority scheduling on data transmission tasks. Each node includes a receptionist, a transmission client and a transmission server: the receptionist receives the data transmission tasks issued by the scheduling client and forwards them to the scheduler; the transmission client receives and executes the data transmission tasks issued by the scheduler; the transmission server receives the data transmitted by the transmission client. Because the scheduler performs unified flow control and priority scheduling on the data transmission tasks, the network bandwidth between clusters is not fully occupied even when multiple users transmit files at the same time, and files with higher business priority are transmitted first. The scheduler is realized as a singleton through the election of the nodes, and a new election is held when the node holding the scheduler fails, achieving high-availability scheduling.

Description

Data transmission platform and method
Technical Field
The present application relates to the field of big data technologies, and in particular, to a data transmission platform and method.
Background
Enterprises often need to transmit file data across clusters and organizational levels during production and big data processing, and face the following pain points:
(1) The data volume is very large and difficult to process;
(2) Transmission is cross-region, the network is unstable, and the requirements on reliability and consistency of file data transmission are high;
(3) Bandwidth is limited: when a transfer runs for a long time and carries a large amount of data, it easily saturates the network bandwidth between clusters and affects other services that need real-time communication. Even if a single task is rate-limited, the inter-cluster bandwidth can still be squeezed when multiple users transmit at the same time;
(4) Existing tools generally have no centralized bandwidth management or priority control, so under multiple users the limited bandwidth resources are squeezed and the users affect one another;
(5) For security reasons, network access between clusters is restricted: often only one-way access is possible, or the networks cannot communicate directly at all and can only be reached indirectly through a jump server (bastion host);
(6) Files reside in a variety of file systems, such as S3/FTP/SFTP/HDFS, and must be transferable across these file systems.
When transmitting file data, transfer based on the FTP/SFTP protocol is the simplest approach, but it has a single-point problem: facing TB-scale data volumes and an unreliable network environment, a single node is easily overwhelmed, and any network fluctuation may interrupt the file transfer and hurt reliability. It also cannot support transfers across file systems. DistCp is an inter-cluster file transfer tool for the Hadoop ecosystem provided by Apache and can handle data migration at large scale. However, it cannot limit transmission bandwidth, requires network interconnection between clusters, does not support migration between Kerberos-enabled clusters whose clocks are not synchronized, and its support for file systems outside the Hadoop ecosystem is also problematic.
Accordingly, the inventors recognize that there is a need for a high availability data transfer platform that can support reliable transfer of large amounts of data, has sophisticated throttling and priority management, supports proxy transfer, and supports multiple file systems.
Disclosure of Invention
Based on the above, in order to solve the above technical problems, a high-availability data transmission platform capable of supporting flow control and priority management is provided, so as to solve the problem that existing data transmission tools in the prior art lack centralized bandwidth management and priority control.
In order to achieve the above object, the present application provides the following technical solutions:
In a first aspect, a data transmission platform is provided. The data transmission platform is deployed in each cluster; each platform comprises a scheduler and at least one node, wherein the scheduler is realized as a singleton through node election and is used for carrying out unified flow control and priority scheduling on received data transmission tasks, and each node comprises:
The receptionist is used for communicating with the scheduling client and the scheduler, receiving a data transmission task issued by the scheduling client and forwarding the data transmission task to the scheduler;
The transmission client is used for communicating with the scheduler, receiving and executing the data transmission task issued by the scheduler, transmitting the data to a transmission server of a data transmission platform of the target cluster, and returning an execution result to the scheduler;
The transmission server is used for communicating with the transmission client, receiving the data transmitted by the transmission client, automatically checking the file after the data is transmitted, and returning the checking result to the transmission client.
Optionally, the transmission client is further configured to create a transmission channel with a transmission server of the data transmission platform of the target cluster, and transmit the data to the transmission server of the data transmission platform of the target cluster through the transmission channel.
Further optionally, the transmission channel is configured to automatically throttle transmission according to a traffic limit set by the scheduler, and support multiple data compression formats.
Further optionally, the transmission channel includes a backpressure controller, a ring buffer, a back-end monitor, a compressor, a decompressor, a frame encoder, a frame decoder, a front-end monitor, a write buffer, and a read buffer.
Further optionally, the message writing process in the transmission channel includes:
writing data into the transmission channel by an external system in a data stream mode;
the backpressure controller judges whether the write buffer is in a writable state according to the usage of the write buffer reported by the front-end monitor;
The back-end monitor counts the read-write data volume of the message;
The compressor compresses the message;
the compressed message is encoded by the frame encoder and encapsulated into individual data frames;
the front-end monitor counts the data volume condition of the actual network transmission and the use condition of the writing buffer;
writing the message into a write buffer area and preparing to send;
the reading flow of the message in the transmission channel comprises the following steps:
the network message is written into the read buffer;
the front-end monitor counts the data volume condition received by the network and the service condition of the reading buffer zone;
The frame decoder decodes the network message to generate a data frame;
The decompressor decompresses the data frames to obtain the data which is actually transmitted;
The back-end monitor records the size of the actually read message and the usage of the ring buffer;
the backpressure controller judges whether the ring buffer is full; if it is full, backpressure at the reading end is triggered, and the controller polls and waits until the ring buffer has remaining space, after which the data is written into the ring buffer;
and the external system reads the data in the ring buffer in a data stream mode for subsequent processing.
Optionally, the data transmission platform adopts GOSSIP protocol.
Optionally, the transmission client and the transmission server support two data transmission modes, namely PUSH and PULL.
Further optionally, the receptionist is further configured to query, for the client, the file information that needs to be migrated, so as to facilitate the scheduling client in creating a data migration task.
In a second aspect, a data transmission method is applied to the data transmission platform provided in the first aspect, and the method includes:
the receptionist of the cluster A receives a first data transmission task issued by a first scheduling client created on a local client and forwards the first data transmission task to the scheduler of the cluster A;
The dispatcher of the cluster A performs task check and priority dispatching on the received first data transmission task, registers occupied bandwidth for the first data transmission task, and then issues the first data transmission task to a transmission client of the cluster A;
the transmission client of the cluster A establishes connection with the transmission server of the target cluster B, and performs data transmission to the transmission server of the target cluster B according to the bandwidth registered by the scheduler;
When the transmission is completed, the transmission client of the cluster A receives a verification result returned by the transmission server of the target cluster B;
after the transmission is completed, the transmission client side of the cluster A returns a task result to the dispatcher of the cluster A, and after the dispatcher of the cluster A releases the bandwidth occupied by the first data transmission task, the final result of the first data transmission task is returned to the first dispatching client side through the receptionist of the cluster A.
Optionally, the method further comprises:
When the transmission server side of the target cluster B receives data sent by the transmission client side of the cluster A, a second scheduling client side is established on the target cluster B, and the second scheduling client side sends a second data transmission task to a receptionist of the target cluster B;
The receptionist of the target cluster B receives the second data transmission task issued by the second scheduling client and forwards the second data transmission task to the scheduler of the target cluster B;
The dispatcher of the target cluster B performs task check and priority dispatching on the received second data transmission task, registers occupied bandwidth for the second data transmission task, and then transmits the second data transmission task to the transmission client of the target cluster B;
The transmission client of the target cluster B is connected with the transmission server of the target cluster C, and performs data transmission to the transmission server of the target cluster C according to the bandwidth registered by the scheduler;
When the transmission is completed, the transmission client side of the target cluster B receives a verification result returned by the transmission server side of the target cluster C;
After the transmission is completed, the transmission client side of the target cluster B returns a task result to the dispatcher of the target cluster B, and after the dispatcher of the target cluster B releases the bandwidth occupied by the second data transmission task, the final result of the second data transmission task is returned to the second dispatching client side through the receptionist of the target cluster B.
The invention has at least the following beneficial effects:
The embodiment of the invention provides a data transmission platform that is deployed in each cluster. The data transmission platform of each cluster comprises a scheduler and at least one node; the scheduler is realized as a singleton through node election and performs unified flow control and priority scheduling on received data transmission tasks. Each node comprises a receptionist, a transmission client and a transmission server: the receptionist receives the data transmission tasks issued by the scheduling client and forwards them to the scheduler; the transmission client receives and executes the data transmission tasks issued by the scheduler and transmits the data to the transmission server of the data transmission platform of the target cluster; the transmission server receives the data transmitted by the transmission client, automatically verifies the files after the data is transmitted, and returns the verification results to the transmission client. Because the scheduler performs unified flow control and priority scheduling on all received data transmission tasks, the network bandwidth between clusters is not fully occupied even when multiple users transmit files at the same time, other services that need network access keep operating normally, and files with higher business priority are transmitted first. The scheduler is realized as a singleton through the election of the nodes, and when the node where the scheduler resides fails a new election is held, so that high-availability scheduling is achieved.
The data transmission platform adopts a distributed implementation based on the GOSSIP protocol, so that the data of all nodes in the system remains consistent and the system is decentralized, highly scalable, strongly fault-tolerant and eventually consistent. Based on this protocol, the data transmission platform can scale down and up dynamically at runtime and automatically remove failed nodes, ensuring high availability and efficient operation of the system.
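As a rough illustration of the eventual-consistency behaviour described above, the sketch below shows a generic anti-entropy gossip round in Java: each node keeps versioned entries and periodically merges state with one random peer. The class names and the version-based merge rule are assumptions for illustration, not the platform's actual implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative anti-entropy gossip: each node holds versioned entries and merges
// them with a random peer, keeping the newer version of every entry, so all nodes
// converge to the same view over time (eventual consistency).
public class GossipNode {

    public record Entry(String value, long version) {}

    private final Map<String, Entry> state = new ConcurrentHashMap<>();
    private final Random random = new Random();

    /** Record or update a local entry, bumping its version. */
    public void putLocal(String key, String value) {
        state.merge(key, new Entry(value, 1),
                (old, ignored) -> new Entry(value, old.version() + 1));
    }

    /** One gossip round: pick a random peer and merge states in both directions. */
    public void gossipOnce(List<GossipNode> peers) {
        if (peers.isEmpty()) return;
        GossipNode peer = peers.get(random.nextInt(peers.size()));
        mergeFrom(peer);
        peer.mergeFrom(this);
    }

    private void mergeFrom(GossipNode other) {
        other.state.forEach((key, entry) ->
                state.merge(key, entry,
                        (mine, theirs) -> theirs.version() > mine.version() ? theirs : mine));
    }
}
```

After enough rounds every node converges to the same membership view, which is what allows failed nodes to be detected and removed and the platform to scale without a central coordinator.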
The transmission client and the transmission server support two data transmission modes of PUSH and PULL, namely, two modes of pushing data from a local cluster to a target cluster and pulling data from the target cluster by the local cluster are simultaneously supported, and bidirectional data transmission can be realized under a network environment which can only be accessed in one direction.
Drawings
FIG. 1 is a logic architecture diagram of a data transmission platform according to an embodiment of the present invention;
FIG. 2 is a block diagram of a transmission channel in one embodiment of the invention;
FIG. 3 is a timing diagram of a data transmission process according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a general file system protocol according to an embodiment of the present invention;
fig. 5 is a schematic flow chart of PUSH data transmission in proxy mode according to an embodiment of the present invention;
FIG. 6 is a flow chart of a PULL data transmission in proxy mode in accordance with one embodiment of the present invention;
fig. 7 is a flow chart of a data transmission method according to an embodiment of the present invention;
FIG. 8 is a task execution flow chart of a data transmission method according to an embodiment of the present invention;
fig. 9 is a logic diagram of proxy mode transmission according to an embodiment of the present invention.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
In one embodiment, as shown in fig. 1, a data transmission platform is provided. A set of the data transmission platform is deployed in each cluster; the data transmission platform of each cluster comprises a scheduler and at least one node (one or more nodes), each node contains the roles of receptionist, transmission client and transmission server, and the scheduler role exists as a single instance in the transmission platform.
Specifically, the data transmission platform is distributed; the scheduler is realized as a single instance through the election of the nodes in the platform and is responsible for unified flow control and priority scheduling of the received data transmission tasks. To ensure consistency of flow control and priority scheduling, each data transmission platform has only one scheduler role, and when the node where the scheduler resides fails, a new scheduler is elected on another node, so that high-availability task scheduling is realized.
Each node comprises:
the receptionist is used for communicating with the scheduling client and the scheduler, receiving a data transmission task issued by the scheduling client and forwarding the data transmission task to the scheduler;
the transmission client is used for communicating with the scheduler, receiving and executing the data transmission task issued by the scheduler, transmitting the data to a transmission server of a data transmission platform of the target cluster, and returning an execution result to the scheduler;
The transmission server is used for communicating with the transmission client, receiving the data transmitted by the transmission client, automatically checking the file after the data is transmitted, and returning the checking result to the transmission client.
That is, the user communicates with the data transmission platform of the current cluster through the scheduling client in order to submit data transmission tasks, apply for bandwidth resources, check the running state of tasks and so on; a scheduling client is also created inside the data transmission platform for message communication with the data transmission platforms of other clusters. A file transfer task from the scheduling client is first submitted to the scheduler and then either issued to the transmission client for actual transfer, or rejected because of flow-control or priority restrictions.
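To make the scheduling and bandwidth-registration step concrete, the Java sketch below shows one way a scheduler could admit tasks by priority and register bandwidth against a fixed inter-cluster budget, holding lower-priority tasks until bandwidth is released. The names (BandwidthScheduler, TransferTask) and the fixed-budget model are assumptions; the patent does not disclose the actual data structures.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Illustrative sketch only: admit transfer tasks by priority and register their
// bandwidth against a fixed budget; tasks that would exceed it wait until other
// tasks complete and release bandwidth.
public class BandwidthScheduler {

    public record TransferTask(String id, int priority, long requestedBps) {}

    private final long totalBudgetBps;   // total inter-cluster bandwidth budget
    private long registeredBps = 0;      // bandwidth currently registered to running tasks
    private final PriorityQueue<TransferTask> pending =
            new PriorityQueue<>(Comparator.comparingInt(TransferTask::priority).reversed());

    public BandwidthScheduler(long totalBudgetBps) {
        this.totalBudgetBps = totalBudgetBps;
    }

    /** Submit a task; higher-priority tasks are dispatched first when bandwidth allows. */
    public synchronized void submit(TransferTask task) {
        pending.add(task);
        dispatch();
    }

    /** Called when a task finishes: release its bandwidth and try to dispatch more. */
    public synchronized void complete(TransferTask task) {
        registeredBps -= task.requestedBps();
        dispatch();
    }

    private void dispatch() {
        // Admit the highest-priority pending tasks while the budget allows.
        while (!pending.isEmpty()
                && registeredBps + pending.peek().requestedBps() <= totalBudgetBps) {
            TransferTask next = pending.poll();
            registeredBps += next.requestedBps();
            issueToTransmissionClient(next);
        }
    }

    private void issueToTransmissionClient(TransferTask task) {
        // Placeholder for handing the task to a transmission client.
        System.out.printf("dispatch %s at %d bps (priority %d)%n",
                task.id(), task.requestedBps(), task.priority());
    }
}
```

Releasing bandwidth on completion and re-running the dispatch loop mirrors the register/release cycle described for the task flow later in this description.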
The receptionist is the reception role of the cluster: when the scheduling client first establishes a connection with the cluster, it communicates with the receptionist first. The receptionist acquires the address information of the scheduler in real time, so when the scheduling client wants to communicate with the scheduler, the receptionist forwards the communication message to the scheduler. Besides, the receptionist is also responsible for querying, on behalf of the client, the file information in the file system to be migrated, so that the scheduling client can create a data migration task.
Further, the transmission client also creates a transmission channel with the transmission server of the data transmission platform of the target cluster and transmits the data to that transmission server through the channel. After the transmission server establishes the transmission channel with the transmission client, the file transmission task can start. In other words, the transmission client and server realize bidirectional data transmission and message communication through the transmission channel; the channel automatically throttles the transmission according to the flow limit set by the scheduler and supports multiple data compression formats, realizing efficient data transmission.
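The patent only states that the channel throttles itself to the flow limit set by the scheduler; one common way to enforce such a limit is a token bucket, sketched below in Java. The algorithm choice and class name are assumptions, not the patent's actual design.

```java
// Illustrative token-bucket throttle for a transmission channel. The flow limit
// is the bandwidth registered by the scheduler; the token-bucket algorithm and
// these names are assumptions.
public class ChannelThrottle {

    private final long bytesPerSecond;   // flow limit registered by the scheduler
    private double tokens;               // bytes that may still be sent without waiting
    private long lastRefillNanos;

    public ChannelThrottle(long bytesPerSecond) {
        this.bytesPerSecond = bytesPerSecond;
        this.tokens = bytesPerSecond;            // allow at most one second of burst
        this.lastRefillNanos = System.nanoTime();
    }

    /** Call before writing `bytes` to the socket; blocks so the long-run rate stays under the limit. */
    public synchronized void acquire(int bytes) throws InterruptedException {
        refill();
        tokens -= bytes;                         // may go negative: a debt paid off by waiting
        if (tokens < 0) {
            long waitMillis = (long) Math.ceil(-tokens * 1000.0 / bytesPerSecond);
            wait(Math.max(1, waitMillis));       // releases the monitor while waiting
        }
    }

    private void refill() {
        long now = System.nanoTime();
        tokens = Math.min(bytesPerSecond,
                tokens + (now - lastRefillNanos) / 1e9 * bytesPerSecond);
        lastRefillNanos = now;
    }
}
```

A transmission client would then call acquire(chunk.length) before each socket write so the sustained rate never exceeds the bandwidth registered for the task.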
Further, the transmission channel includes a backpressure controller, a ring buffer, a back-end monitor, a compressor, a decompressor, a frame encoder, a frame decoder, a front-end monitor, a write buffer, and a read buffer. Fig. 2 shows the internal structure design of the transmission channel, and a user can read and write data from the transmission channel by means of a data stream.
The writing flow of the message is as follows:
1. writing data into the transmission channel by an external system in a data stream mode;
2. The back pressure controller judges whether the writing buffer area is in a writable state according to the use condition of the writing buffer area provided by the front-end monitor, and if the writing buffer area is not writable, triggers back pressure of the writing end to perform blocking waiting;
3. The back-end monitor counts the read-write data volume of the message;
4. The compressor compresses the message;
5. the compressed message is encoded by the frame encoder and encapsulated into individual data frames, which prevents decompression failures caused by TCP packet sticking and splitting (an illustrative frame-codec sketch follows this list);
6. the front-end monitor counts the data volume condition of the actual network transmission and the use condition of the writing buffer;
7. The message is written to the write buffer and ready to be sent.
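The sketch below illustrates steps 4 and 5 above, assuming a simple frame layout of a 4-byte length prefix followed by a DEFLATE-compressed payload; neither the layout nor the compression algorithm is specified by the patent. Length-prefixed frames let the reader recover frame boundaries even when TCP delivers data merged or split.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustrative frame codec: compress a message, then length-prefix it so the
// receiver can split frames despite partial or merged TCP reads.
public final class FrameCodec {

    public static byte[] encode(byte[] message) {
        Deflater deflater = new Deflater();
        deflater.setInput(message);
        deflater.finish();
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            compressed.write(chunk, 0, deflater.deflate(chunk));
        }
        deflater.end();
        byte[] payload = compressed.toByteArray();
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(payload.length)     // frame length prefix
                .put(payload)
                .array();
    }

    /** Expects a buffer that already contains one complete frame. */
    public static byte[] decode(ByteBuffer frame) throws Exception {
        int length = frame.getInt();        // read the length prefix
        byte[] payload = new byte[length];
        frame.get(payload);
        Inflater inflater = new Inflater();
        inflater.setInput(payload);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!inflater.finished()) {
            out.write(chunk, 0, inflater.inflate(chunk));
        }
        inflater.end();
        return out.toByteArray();
    }
}
```

A reader first accumulates the 4-byte prefix and then exactly that many payload bytes before calling decode, so a half-received frame is never handed to the decompressor.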
The message reading flow is as follows:
1. The network message is first written into the read buffer;
2. the front-end monitor counts the data volume condition received by the network and the service condition of the reading buffer zone;
3. The frame decoder decodes the network message to generate a data frame;
4. the decompressor decompresses the data frames to obtain the data which is actually transmitted;
5. The back-end monitor records the size of the actually read message and the usage of the ring buffer;
6. The backpressure controller judges whether the ring buffer is full; if it is full, backpressure at the reading end is triggered, and the controller polls and waits until the ring buffer has remaining space, after which the data is written into the ring buffer (see the illustrative buffer sketch after this list);
7. The external system reads the data in the ring buffer in a data stream mode for subsequent processing.
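The read-end backpressure of steps 6 and 7 can be illustrated with a bounded buffer whose put operation blocks when it is full; the sketch below uses ArrayBlockingQueue in place of a hand-rolled ring buffer, which is a simplification rather than the patent's design.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative read-side buffer: a bounded queue stands in for the ring buffer,
// and put() blocks (read-end backpressure) while the buffer is full, until the
// external system drains it as a data stream.
public class ReadSideBuffer {

    private final BlockingQueue<byte[]> ring;

    public ReadSideBuffer(int capacity) {
        this.ring = new ArrayBlockingQueue<>(capacity);
    }

    /** Called after decompression: blocks while the buffer is full. */
    public void deliver(byte[] decompressed) throws InterruptedException {
        ring.put(decompressed);
    }

    /** Called by the external system reading the channel as a data stream. */
    public byte[] next() throws InterruptedException {
        return ring.take();
    }
}
```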
Further, as shown in fig. 3, the transmission client and the transmission server support two data transmission modes, PUSH and PULL.
Further, as shown in FIG. 4, the data transmission platform provides a "universal file system protocol" that defines the operations common to multiple file systems, such as querying, copying, creating directories, deleting, uploading and downloading, and provides implementations of this protocol for a variety of common file systems, including S3, HDFS, FTP, SFTP, local file systems, and the like. Through this protocol, the data transmission platform realizes highly available and high-performance data transmission across file systems and clusters.
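One plausible shape for such a "universal file system protocol" is a single interface exposing the listed operations, with one adapter per backing store; the method names and signatures below are assumptions, not the patent's actual API.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;

// Illustrative shape of the "universal file system protocol": one interface
// exposing the common operations (query, copy, create directory, delete,
// upload, download), implemented separately for each backing file system.
public interface UniversalFileSystem {
    List<String> list(String path) throws Exception;            // query directory contents
    boolean exists(String path) throws Exception;
    void mkdirs(String path) throws Exception;                   // create directory
    void delete(String path, boolean recursive) throws Exception;
    void copy(String sourcePath, String targetPath) throws Exception;
    OutputStream upload(String path) throws Exception;           // open a remote file for writing
    InputStream download(String path) throws Exception;          // open a remote file for reading
}
```

An S3, HDFS, FTP, SFTP or local-file-system implementation would each wrap its native client behind this interface, which is what allows one transfer task to read from one kind of file system and write to another.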
The universal file system protocol also provides a proxy mode. Proxy-mode data transmission is aimed at isolated network environments: when cluster A cannot reach cluster C over the network, the data of cluster A is first transmitted to a proxy cluster B and then forwarded by cluster B to cluster C, realizing proxy-based data transmission.
The user uses the proxy mode to transfer data as if the data were sent to a new file system. In other words, the proxy mode is implemented based on the "universal file system protocol" shown in fig. 4, i.e., cluster B abstracts the cluster that it proxies into a separate file system.
The transmission client and the transmission server in the proxy mode also support two data transmission modes of PUSH and PULL, the flow diagram of PUSH data transmission in the proxy mode is shown in fig. 5, and the flow diagram of PULL data transmission in the proxy mode is shown in fig. 6.
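The value of having both modes is that the side allowed to open connections always initiates the socket and merely declares the direction of the transfer. The sketch below illustrates this with a one-byte mode header, which, like the class name, is an assumption rather than the patent's wire format.

```java
import java.io.DataOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;

// Illustrative sketch of why PUSH and PULL allow bidirectional transfer over a
// one-way-accessible network: the reachable side always accepts the connection,
// and the initiating side declares whether data flows out (PUSH) or back (PULL).
public class TransferClient {

    private static final byte MODE_PUSH = 0; // local cluster sends data to the target
    private static final byte MODE_PULL = 1; // local cluster asks the target to send data

    public void push(String host, int port, InputStream localData) throws Exception {
        try (Socket socket = new Socket(host, port)) {
            DataOutputStream out = new DataOutputStream(socket.getOutputStream());
            out.writeByte(MODE_PUSH);
            localData.transferTo(out);       // stream local data to the remote server
            out.flush();
        }
    }

    public void pull(String host, int port, OutputStream localSink) throws Exception {
        try (Socket socket = new Socket(host, port)) {
            DataOutputStream out = new DataOutputStream(socket.getOutputStream());
            out.writeByte(MODE_PULL);
            out.flush();
            socket.getInputStream().transferTo(localSink); // remote server streams data back
        }
    }
}
```

In PUSH mode the initiator streams its local data outward; in PULL mode the same connection carries data in the opposite direction, which is how bidirectional transfer remains possible when only one side can open connections.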
In general, the data transmission platform provided by the invention has the following technical characteristics:
(I) High availability
The data transmission platform adopts a distributed implementation based on the GOSSIP protocol, so nodes can be scaled down and up dynamically and rapidly, abnormal nodes are automatically removed when they fail, 7x24 service availability of the platform is ensured, and data transmission tasks keep operating normally;
the scheduler realizes a single instance through the election of a plurality of nodes, namely, only one node has a scheduler role at a certain time, and when the node fails, each node can reselect a new scheduler node to realize high-availability scheduling;
GOSSIP ensures dynamic scaling of nodes: GOSSIP is a common consistency protocol in distributed systems that spreads information through the whole network in a random, epidemic fashion and makes the data of all nodes consistent within a bounded time, so the system is decentralized, highly scalable, strongly fault-tolerant and eventually consistent;
A vector clock is a data structure used to generate a partial ordering of operations or events in a distributed environment; it can detect concurrent conflicts and causal violations between operations or events so as to maintain system consistency. The data transmission platform uses vector clocks to ensure that at most one scheduler instance exists at any moment, which is how unified priority scheduling and unified flow control are realized.
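A textbook vector clock is sketched below: counters are kept per node, merged by element-wise maximum, and two clocks that are ordered in neither direction indicate concurrent (conflicting) events, such as two nodes both claiming the scheduler role. This is the standard structure, not code from the patent.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative vector clock: one counter per node id; comparing two clocks tells
// whether one event happened before the other or whether they are concurrent.
public class VectorClock {

    private final Map<String, Long> counters = new HashMap<>();

    /** Record a local event on this node. */
    public void tick(String nodeId) {
        counters.merge(nodeId, 1L, Long::sum);
    }

    /** Merge a clock received from another node (element-wise maximum). */
    public void merge(VectorClock other) {
        other.counters.forEach((node, value) -> counters.merge(node, value, Math::max));
    }

    /** True if every counter here is <= the other's and at least one is strictly smaller. */
    public boolean happenedBefore(VectorClock other) {
        boolean strictlySmaller = false;
        for (Map.Entry<String, Long> e : counters.entrySet()) {
            long mine = e.getValue();
            long theirs = other.counters.getOrDefault(e.getKey(), 0L);
            if (mine > theirs) return false;
            if (mine < theirs) strictlySmaller = true;
        }
        // Counters present only in the other clock also make this one strictly smaller.
        strictlySmaller |= other.counters.keySet().stream().anyMatch(k -> !counters.containsKey(k));
        return strictlySmaller;
    }

    /** Concurrent events (ordered in neither direction) signal a potential conflict. */
    public boolean concurrentWith(VectorClock other) {
        return !happenedBefore(other) && !other.happenedBefore(this)
                && !counters.equals(other.counters);
    }
}
```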
(II) High performance
At the platform architecture layer, dynamic scaling and multi-path parallel data transmission are realized, which greatly improves the efficiency and performance of data transmission;
At the bottom layer, the data transmission platform supports virtual memory mapping (mmap), which reduces operating-system context switches during data transmission, improves transmission performance, and reduces the CPU and memory usage of the process. Multiple data compression formats are also supported, reducing bandwidth usage while transmitting the same amount of data.
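As an illustration of the mmap technique mentioned above, the Java sketch below maps a file region by region and writes it to a byte channel without copying it through a user-space array; the 64 MiB chunk size is an arbitrary assumption.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative use of virtual memory mapping when sending a file: each region is
// mapped read-only and written directly to the output channel.
public final class MappedFileSender {

    private static final long CHUNK = 64L * 1024 * 1024; // map 64 MiB at a time

    public static void send(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            for (long offset = 0; offset < size; offset += CHUNK) {
                long length = Math.min(CHUNK, size - offset);
                MappedByteBuffer region = channel.map(FileChannel.MapMode.READ_ONLY, offset, length);
                while (region.hasRemaining()) {
                    target.write(region);     // write the mapped region to the network channel
                }
            }
        }
    }
}
```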
(III) High reliability
Automatic retry of tasks that fail or time out because of the network is supported, and automatic comparison and verification are performed after file transmission completes, ensuring the reliability and consistency of file transmission.
(IV) High practicality
Unified flow control, namely uniformly distributing flow resources through a scheduler, ensuring that network bandwidth among clusters is not occupied when a plurality of users transmit files at the same time, and ensuring that normal operation of other business needing network access is not influenced;
unified priority management, wherein the scheduler performs unified scheduling according to the priority of the transmission task, so that the priority transmission of the file with higher service priority is ensured;
The transmission client and the transmission server support two data transmission modes of PUSH and PULL, namely, simultaneously support two modes of pushing data from a local cluster to a target cluster and pulling data from the target cluster by the local cluster;
A proxy mode is supported: for clusters whose networks are not interconnected and that can only be reached through a jump server, the data transmission platform can still realize bidirectional data transmission through the proxy mode;
Submitting and managing data transmission tasks is supported through various interaction modes, such as a visual page, a shell command line and a Java API, which is convenient for use and secondary development.
In summary, the transmission platform provided by the invention is highly available and highly reliable, greatly improves the stability of large-scale transmission tasks, and reduces the possibility of file transfer failure caused by network or hardware problems. It supports multi-path parallel data transmission, virtual memory mapping and data compression, greatly improving transmission efficiency and reducing server resource usage. Unified flow control and unified priority management support multi-tenant use of the platform, so that higher-priority transmission tasks are allocated more bandwidth and transmitted first without squeezing the bandwidth of other services. Data transmission is supported in various complex network environments: with the two PUSH/PULL transmission modes, bidirectional data transmission is possible even when the network between clusters allows only one-way access, and proxy-based transmission is possible between clusters whose networks do not interconnect at all. Tasks can be submitted and managed through a visual page and the shell command line, lowering the operating cost for users, and a Java API is also provided for integration with other data systems.
In one embodiment, as shown in fig. 7, a data transmission method is provided to perform a data transmission task, and the method is applied to the data transmission platform provided in the foregoing embodiment, and includes:
S701, a receptionist of the cluster A receives a first data transmission task issued by a first scheduling client created on a local client and forwards the first data transmission task to a scheduler of the cluster A;
S702, a dispatcher of the cluster A performs task check and priority dispatching on the received first data transmission task, registers occupied bandwidth for the first data transmission task, and then issues the first data transmission task to a transmission client of the cluster A;
S703, the transmission client of the cluster A creates a connection with the transmission server of the target cluster B, and performs data transmission to the transmission server of the target cluster B according to the bandwidth registered by the scheduler;
S704, when the transmission is completed, the transmission client of the cluster A receives a verification result returned by the transmission server of the target cluster B;
and S705, after the transmission is completed, the transmission client side of the cluster A returns a task result to the dispatcher of the cluster A, and after the dispatcher of the cluster A releases the bandwidth occupied by the first data transmission task, the final result of the first data transmission task is returned to the first dispatching client side through the receptionist of the cluster A.
In other words, when a user starts a data transmission, a scheduling-client role is first created locally and connected to the receptionist role of the transmission platform of the current cluster; the transmission task is sent to the receptionist, and the receptionist forwards it to the scheduler. After receiving the task, the scheduler performs task checking and priority scheduling, registers the occupied bandwidth for the task, and then issues the task to the transmission client. The transmission client establishes a connection with the transmission server of the target cluster and starts data transmission according to the bandwidth registered by the scheduler; during the transmission, the progress is pushed to the scheduling client in real time through the receptionist. After the transmission is completed, the transmission client returns the task result to the scheduler, and after the scheduler releases the bandwidth occupied by the task, the receptionist returns the final result of the task to the client, completing the whole transmission task.
Further, the method further comprises:
When the transmission server side of the target cluster B receives data sent by the transmission client side of the cluster A, a second scheduling client side is established on the target cluster B, and the second scheduling client side sends a second data transmission task to a receptionist of the target cluster B;
The receptionist of the target cluster B receives a second data transmission task issued by a second scheduling client and forwards the second data transmission task to the scheduler of the target cluster B;
The dispatcher of the target cluster B performs task check and priority dispatching on the received second data transmission task, registers occupied bandwidth for the second data transmission task, and then transmits the second data transmission task to the transmission client of the target cluster B;
The transmission client of the target cluster B is connected with the transmission server of the target cluster C, and performs data transmission to the transmission server of the target cluster C according to the registered bandwidth of the scheduler;
When the transmission is completed, the transmission client side of the target cluster B receives a verification result returned by the transmission server side of the target cluster C;
after the transmission is completed, the transmission client side of the target cluster B returns a task result to the dispatcher of the target cluster B, and after the dispatcher of the target cluster B releases the bandwidth occupied by the second data transmission task, the final result of the second data transmission task is returned to the second dispatching client side through the receptionist of the target cluster B.
By the method, data transmission in the proxy mode can be realized, that is, as shown in fig. 9, in the case that the cluster a and the cluster C are not in communication, the data transmission platforms of the two clusters cannot realize direct data transmission. However, if there is a cluster B that performs network communication with clusters a and C, respectively, a set of data transmission platforms may be deployed in the cluster B and the data transmission between clusters a and C may be implemented using this as a springboard.
In the proxy mode, the cluster a communicates with the cluster B according to the task execution logic shown in fig. 8 and transmits data, and the transmission server of the cluster B creates a scheduling client while receiving the data of the cluster a, and also communicates with the cluster C according to the task execution logic shown in fig. 8 and transmits the data received from the cluster a to the cluster C, thereby realizing the data transmission in the proxy mode.
In other words, the process is real-time, and it is not necessary to wait for the cluster a to transmit to the cluster B and then transmit to the cluster C after the cluster B completes, but the data of the cluster a is transmitted to the cluster C in real time after being transferred by the cluster B.
For the cluster A, data transmission with the cluster B uses only the "universal file system protocol" provided in fig. 4 and the flow shown in fig. 8, while the cluster B resolves the address information of the cluster C and the file system information from the protocol during the initialization shown in fig. 5 and fig. 6; the cluster B then establishes a connection with the cluster C and performs data transmission according to the flow of fig. 8.
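A minimal sketch of the real-time relay on the cluster B described above: bytes received from the cluster A are forwarded toward the cluster C as they arrive, instead of being staged until the A-to-B transfer finishes. The stream types below stand in for whatever transport the platform actually uses.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Illustrative real-time relay on the proxy cluster B: data arriving from
// cluster A is forwarded to cluster C chunk by chunk as it is received, so the
// A->B and B->C transfers overlap instead of running one after the other.
public final class ProxyRelay {

    public static void relay(InputStream fromClusterA, OutputStream toClusterC) throws IOException {
        byte[] chunk = new byte[8192];
        int read;
        while ((read = fromClusterA.read(chunk)) != -1) {
            toClusterC.write(chunk, 0, read);   // forward immediately, no staging
        }
        toClusterC.flush();
    }
}
```

In a real deployment the two streams would be the channel from the cluster A and the channel toward the cluster C, each still subject to its own scheduler-registered bandwidth.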
It should be understood that, although the steps in the flowchart of fig. 7 are shown in the sequence indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be executed in other orders. Moreover, at least a portion of the steps in fig. 7 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different moments, and their order is not necessarily sequential; they may be performed in turn or alternately with at least a portion of the sub-steps or stages of other steps.
In one embodiment, a computer device is provided, including a memory and a processor, the memory having stored therein a computer program, involving all or part of the flow of the methods of the embodiments described above.
In one embodiment, a computer readable storage medium having a computer program stored thereon is provided, involving all or part of the flow of the methods of the embodiments described above.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, or the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory. By way of illustration, and not limitation, RAM can be in various forms such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), etc.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The above examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (7)

1. The data transmission platform is characterized by being deployed in each cluster, wherein the data transmission platform of each cluster comprises a scheduler and at least one node, the scheduler realizes singleton through node election and is used for carrying out unified flow control and priority scheduling on received data transmission tasks, and each node comprises:
The receptionist is used for communicating with the scheduling client and the scheduler, receiving a data transmission task issued by the scheduling client and forwarding the data transmission task to the scheduler;
The transmission client is used for communicating with the scheduler, receiving and executing the data transmission task issued by the scheduler, transmitting the data to a transmission server of a data transmission platform of the target cluster, and returning an execution result to the scheduler;
The transmission server is used for communicating with the transmission client, receiving the data transmitted by the transmission client, automatically checking the file after the data is transmitted, and returning the checking result to the transmission client;
The transmission client is also used for creating a transmission channel with the transmission server of the data transmission platform of the target cluster, and transmitting data to the transmission server of the data transmission platform of the target cluster through the transmission channel;
the transmission channel comprises a backpressure controller, an annular buffer zone, a back-end monitor, a compressor, a decompressor, a frame encoder, a frame decoder, a front-end monitor, a writing buffer zone and a reading buffer zone;
the message writing flow in the transmission channel comprises the following steps:
writing data into the transmission channel by an external system in a data stream mode;
the backpressure controller judges whether the write buffer is in a writable state according to the usage of the write buffer reported by the front-end monitor;
The back-end monitor counts the read-write data volume of the message;
The compressor compresses the message;
the compressed message is encoded by the frame encoder and encapsulated into individual data frames;
the front-end monitor counts the data volume condition of the actual network transmission and the use condition of the writing buffer;
writing the message into a write buffer area and preparing to send;
the reading flow of the message in the transmission channel comprises the following steps:
the network message is written into the read buffer;
the front-end monitor counts the data volume condition received by the network and the service condition of the reading buffer zone;
The frame decoder decodes the network message to generate a data frame;
The decompressor decompresses the data frames to obtain the data which is actually transmitted;
The back-end monitor records the size of the actually read message and the usage of the ring buffer;
the backpressure controller judges whether the ring buffer is full; if it is full, backpressure at the reading end is triggered, and the controller polls and waits until the ring buffer has remaining space, after which the data is written into the ring buffer;
and the external system reads the data in the ring buffer in a data stream mode for subsequent processing.
2. The data transmission platform of claim 1, wherein the transmission channel is configured to automatically throttle transmission according to a flow limit set by a scheduler and support a plurality of data compression formats.
3. The data transmission platform of claim 1, wherein the data transmission platform employs GOSSIP protocol.
4. The data transmission platform according to claim 1, wherein the transmission client and the transmission server support two data transmission modes, PUSH and PULL.
5. The data transmission platform of claim 1, wherein the receptionist is further configured to query, for the client, the file information that needs to be migrated, so as to facilitate the scheduling client in creating a data migration task.
6. A data transmission method applied to the data transmission platform of any one of claims 1 to 5, the method comprising:
the receptionist of the cluster A receives a first data transmission task issued by a first scheduling client created on a local client and forwards the first data transmission task to the scheduler of the cluster A;
The dispatcher of the cluster A performs task check and priority dispatching on the received first data transmission task, registers occupied bandwidth for the first data transmission task, and then issues the first data transmission task to a transmission client of the cluster A;
the transmission client of the cluster A establishes connection with the transmission server of the target cluster B, and performs data transmission to the transmission server of the target cluster B according to the bandwidth registered by the scheduler;
When the transmission is completed, the transmission client of the cluster A receives a verification result returned by the transmission server of the target cluster B;
after the transmission is completed, the transmission client side of the cluster A returns a task result to the dispatcher of the cluster A, and after the dispatcher of the cluster A releases the bandwidth occupied by the first data transmission task, the final result of the first data transmission task is returned to the first dispatching client side through the receptionist of the cluster A.
7. The data transmission method according to claim 6, characterized in that the method further comprises:
When the transmission server side of the target cluster B receives data sent by the transmission client side of the cluster A, a second scheduling client side is established on the target cluster B, and the second scheduling client side sends a second data transmission task to a receptionist of the target cluster B;
The receptionist of the target cluster B receives the second data transmission task issued by the second scheduling client and forwards the second data transmission task to the scheduler of the target cluster B;
The dispatcher of the target cluster B performs task check and priority dispatching on the received second data transmission task, registers occupied bandwidth for the second data transmission task, and then transmits the second data transmission task to the transmission client of the target cluster B;
The transmission client of the target cluster B is connected with the transmission server of the target cluster C, and performs data transmission to the transmission server of the target cluster C according to the bandwidth registered by the scheduler;
When the transmission is completed, the transmission client side of the target cluster B receives a verification result returned by the transmission server side of the target cluster C;
After the transmission is completed, the transmission client side of the target cluster B returns a task result to the dispatcher of the target cluster B, and after the dispatcher of the target cluster B releases the bandwidth occupied by the second data transmission task, the final result of the second data transmission task is returned to the second dispatching client side through the receptionist of the target cluster B.
CN202211425720.3A 2022-11-15 2022-11-15 Data transmission platform and method Active CN115801877B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211425720.3A CN115801877B (en) 2022-11-15 2022-11-15 Data transmission platform and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211425720.3A CN115801877B (en) 2022-11-15 2022-11-15 Data transmission platform and method

Publications (2)

Publication Number Publication Date
CN115801877A CN115801877A (en) 2023-03-14
CN115801877B true CN115801877B (en) 2025-01-14

Family

ID=85437649

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211425720.3A Active CN115801877B (en) 2022-11-15 2022-11-15 Data transmission platform and method

Country Status (1)

Country Link
CN (1) CN115801877B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103207814A (en) * 2012-12-27 2013-07-17 北京仿真中心 Decentralized cross cluster resource management and task scheduling system and scheduling method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106657371B (en) * 2017-01-03 2020-01-21 北京奇虎科技有限公司 Scheduling method and device for transmission node
CN113485821A (en) * 2021-09-08 2021-10-08 北京交通大学 High-reliability video conference system, control method thereof and storage medium
CN114143039B (en) * 2021-11-05 2024-04-16 中国电子科技集团公司第十五研究所 A global multi-level unified and secure data transmission method and server cluster

Also Published As

Publication number Publication date
CN115801877A (en) 2023-03-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant