
CN105955819B - Hadoop-based data transmission method and system - Google Patents


Info

Publication number
CN105955819B
Authority
CN
China
Prior art keywords
intermediate result
file
index
task
reduce task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610243294.XA
Other languages
Chinese (zh)
Other versions
CN105955819A (en)
Inventor
曹政
郭嘉梁
李强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-04-18
Filing date
2016-04-18
Publication date
2019-06-18
Application filed by Institute of Computing Technology of CAS
Priority to CN201610243294.XA
Publication of CN105955819A
Application granted
Publication of CN105955819B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5066 Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a Hadoop-based data transmission method and system. The method comprises: an intermediate result file generation step, in which one intermediate result file is established to store, as they are produced, the intermediate results generated by all Map tasks; an index establishment step, in which an index file is established and kept up to date according to the intermediate result file; and a transmission step, in which, when it is determined from the index file that the intermediate result file contains untransmitted intermediate results and the corresponding Reduce task has started, the untransmitted intermediate results are actively sent to that Reduce task. The invention reduces the execution time of Hadoop jobs and increases the concurrency between Map tasks and Reduce tasks. It also improves system resource utilization and reduces the storage overhead of the system.

Description

Data transmission method and system based on Hadoop
Technical field
The present invention relates to the field of big data processing systems, and in particular to a Hadoop-based data transmission method and system.
Background art
For a clearer understanding of the present invention, several terms are first explained below:
Hadoop system: a distributed system infrastructure developed by the Apache Software Foundation, with which users can develop distributed programs without needing to understand the low-level details of distribution. Hadoop implements a distributed file system, the Hadoop Distributed File System, HDFS for short.
MapReduce computing framework: a software framework for processing large data sets in parallel on top of the Hadoop distributed file system; together with HDFS it forms the two core components of Hadoop. The MapReduce framework runs the Map tasks and Reduce tasks implemented by the user.
Fig. 1 is a schematic diagram of the data processing flow of a Hadoop system. Each Map task generates one intermediate result file; each intermediate result file contains multiple regions, and the number of regions equals the number of Reduce tasks. After a Map task has finished executing, the Reduce tasks request the intermediate results generated by that Map task, merge-sort them, execute the Reduce processing logic, and finally write the results to HDFS.
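The assignment of each intermediate record to a region (one region per Reduce task) is usually done by hashing the key. A minimal sketch, assuming a hash partitioner in the spirit of Hadoop's default HashPartitioner, which takes the key's hash code modulo the number of Reduce tasks. The helper name `region_for` is hypothetical, and `zlib.crc32` stands in for Java's `hashCode()` only because it is deterministic across Python processes:

```python
import zlib

def region_for(key: str, num_reduce_tasks: int) -> int:
    """Return the region (Reduce task index) for a key, in [0, num_reduce_tasks)."""
    if num_reduce_tasks <= 0:
        raise ValueError("num_reduce_tasks must be positive")
    # crc32 is non-negative and stable across runs, unlike Python's salted hash().
    return zlib.crc32(key.encode("utf-8")) % num_reduce_tasks

# Each Map output record becomes a <key, value, region> triple.
records = [(k, 1, region_for(k, 3)) for k in ["apple", "banana", "cherry"]]
```

Whatever hash is used, the only requirement is that every Map task sends a given key to the same region, so that one Reduce task sees all values for that key.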
Although this design has been used in Hadoop to this day, it still has shortcomings.
In the above method, the Reduce tasks must request intermediate results from the Map tasks. Moreover, a Reduce task can obtain intermediate results only after passively waiting for the Map task to finish. Thus, even if a Map task has already produced part of its intermediate results during execution, those results cannot be sent to the Reduce task for processing immediately and must wait, which reduces system efficiency. Meanwhile, network resources sit idle because the Map tasks have not finished, and the Reduce tasks, having no input data until the Map tasks complete, cannot proceed with subsequent computation. The above approach therefore leaves idle gaps in which multiple parts of the system have to wait.
Fig. 2 is a schematic diagram of how Map tasks store intermediate results in a Hadoop system. Each Map task has its own circular buffer. Each Map task (Map1, Map2, Map3) first stores the intermediate results <key, value, region> it generates in its circular buffer. When the buffer's usage reaches a threshold, the data in the circular buffer are sorted by region, the data within each region are sorted by key, and the sorted result is written to a temporary spill file. After the Map task finishes executing, all spill files belonging to that Map task are merge-sorted into a single intermediate result file. Thus each Map task produces its own intermediate result file.
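The buffer-and-spill behaviour described for Fig. 2 can be sketched as follows. This is an illustration under simplifying assumptions, not Hadoop's implementation: a Python list stands in for the circular buffer, the threshold counts records rather than bytes, each spill file is modelled as an in-memory list, and `SpillingBuffer` is a hypothetical name.

```python
from typing import List, Tuple

Record = Tuple[str, str, int]  # (key, value, region)

class SpillingBuffer:
    """Per-Map-task buffer that spills sorted runs when a threshold is reached."""

    def __init__(self, threshold: int):
        self.threshold = threshold            # record count, not bytes (simplification)
        self.buffer: List[Record] = []        # stands in for the circular buffer
        self.spills: List[List[Record]] = []  # each inner list models one spill file

    def collect(self, record: Record) -> None:
        self.buffer.append(record)
        if len(self.buffer) >= self.threshold:
            self.spill()

    def spill(self) -> None:
        if not self.buffer:
            return
        # Sort by region first, then by key within each region.
        self.spills.append(sorted(self.buffer, key=lambda r: (r[2], r[0])))
        self.buffer = []
```

Sorting on the tuple `(region, key)` produces the two-level order described above in a single pass.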
This way of storing intermediate results causes every server to produce a large number of intermediate result files. For a cluster of 1,000 nodes processing 100 TB of data, 1,000,000 Map tasks will be generated, so each node runs 1,000 Map tasks on average. Each node may therefore need to open 1,000 intermediate result files simultaneously for data transmission, which makes the system storage overhead high.
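The numbers in this example fit together as follows. The text states only the cluster size, the data volume and the task count; the per-task input size on the right is derived from them, assuming decimal units:

```latex
\[
\frac{10^{6}\ \text{Map tasks}}{10^{3}\ \text{nodes}} = 10^{3}\ \text{Map tasks per node},
\qquad
\frac{100\ \text{TB}}{10^{6}\ \text{Map tasks}} = \frac{10^{14}\ \text{B}}{10^{6}} = 10^{8}\ \text{B} = 100\ \text{MB per Map task}.
\]
```

One intermediate result file per Map task therefore means up to 1,000 open files per node, against one file per server under the scheme of the invention.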
Patent document 1 (publication No. CN102209087A) discloses a method for MapReduce data transmission in a data center having a storage area network (SAN). The data center includes multiple servers on which a Job Server, Map task servers and Reduce task servers are deployed. The method comprises: in response to receiving a Map task assigned by the Job Server, the Map task server executes the Map task and generates Map task output results; the Map task server writes the Map task output results to the storage network; and in response to receiving a Reduce task assigned by the Job Server, the Reduce task server reads the Map task output results from the storage network. However, that patent focuses on exploiting the network bandwidth advantage of the storage network to speed up the transmission of intermediate results, while still using the conventional approach in which Reduce tasks request intermediate results, and therefore has the problems described above.
Summary of the invention
The technical problem solved by the present invention is to provide a Hadoop-based data transmission method and system that allow intermediate results to be transmitted while Map tasks are executing, increasing the concurrency between Map tasks and Reduce tasks.
Further, the invention improves system resource utilization: Reduce tasks can execute their subsequent computation logic as early as possible, so that computing resources and network resources are fully utilized.
Further, the invention generates only one intermediate result file per server, which reduces the number of files opened when transmitting intermediate results and thus reduces system storage overhead.
To solve the above problems, the invention discloses a Hadoop-based data transmission method, comprising:
An intermediate result file generation step: establishing one intermediate result file that stores, as they are produced, the intermediate results generated by all Map tasks;
An index establishment step: establishing an index file and keeping it updated according to the intermediate result file;
A transmission step: when it is determined from the index file that the intermediate result file contains an untransmitted intermediate result and the corresponding Reduce task has started, actively sending the untransmitted intermediate result to that Reduce task.
The index file sets one index entry for each intermediate result; the entry records the corresponding Reduce task information, a transmission flag bit, and the offset position of the intermediate result in the intermediate result file.
The transmission step further comprises: after the Reduce task successfully receives the intermediate result, updating the transmission flag bit in the index entry corresponding to that intermediate result to "transmitted".
The intermediate result file generation step further comprises:
Step 10: each Map task stores the intermediate results it generates in a buffer;
Step 20: when the usage of the buffer reaches a buffer threshold, all intermediate results stored in the buffer are merge-sorted and the sorted intermediate results are stored in a temporary spill file;
Step 30: when the number of temporary spill files reaches a spill file threshold, the intermediate results in all temporary spill files are merge-sorted and the sorted intermediate results are stored in the intermediate result file.
Each intermediate result comprises a key, a value and a region; the merge sort in both Step 20 and Step 30 sorts by region first, and then by key within each region.
The invention also discloses a Hadoop-based data transmission system, comprising:
An intermediate result file generation unit, configured to establish one intermediate result file that stores, as they are produced, the intermediate results generated by all Map tasks;
An index establishment unit, configured to establish an index file and keep it updated according to the intermediate result file;
A transmission unit, configured to actively send an untransmitted intermediate result to the corresponding Reduce task when it is determined from the index file that the intermediate result file contains the untransmitted intermediate result and that Reduce task has started.
The index file sets one index entry for each intermediate result; the entry records the corresponding Reduce task information, a transmission flag bit, and the offset position of the intermediate result in the intermediate result file.
The transmission unit further comprises an updating unit, configured to update the transmission flag bit in the index entry corresponding to an intermediate result to "transmitted" after the Reduce task successfully receives that intermediate result.
The intermediate result file generation unit further comprises:
A first storage unit, configured to store, for each Map task, the intermediate results it generates in a buffer;
A second storage unit, configured to merge-sort all intermediate results stored in the buffer when the usage of the buffer reaches the buffer threshold, and to store the sorted intermediate results in a temporary spill file;
A third storage unit, configured to merge-sort the intermediate results in all temporary spill files when the number of temporary spill files reaches the spill file threshold, and to store the sorted intermediate results in the intermediate result file.
Each intermediate result comprises a key, a value and a region; the merge sort in both the second storage unit and the third storage unit sorts by region first, and then by key within each region.
The invention also discloses a distributed file system comprising the Hadoop-based data transmission system.
The beneficial effects of the present invention are:
1. Shorter execution time of Hadoop jobs: by increasing the concurrency between subtasks, the daemon transmits intermediate results while the Map tasks are still executing, so that Map tasks and Reduce tasks run with a higher degree of concurrency.
2. Improved system resource utilization: intermediate results can be transmitted during Map task execution, and Reduce tasks can execute their subsequent computation logic as early as possible, so that computing resources and network resources are fully utilized.
3. Reduced system storage overhead: only one intermediate result file is generated per server, which reduces the number of files opened when transmitting intermediate results.
4. Through the multi-level buffering and repeated merge sorting of intermediate results in the invention, the data finally stored in the intermediate result file are ordered. On this basis, the intermediate results destined for the same Reduce task can be collected first and then sent together, improving transmission efficiency.
Brief description of the drawings
Fig. 1 is a schematic diagram of the data processing flow of a Hadoop system;
Fig. 2 is a schematic diagram of how Map tasks store intermediate results in a Hadoop system;
Fig. 3 is a flowchart of the Hadoop-based data transmission method of the invention;
Fig. 4A is a flowchart of the method by which Map tasks store intermediate results in the invention;
Fig. 4B is a schematic diagram of the structure of the index file of the invention;
Fig. 5 is a detailed flowchart of the Hadoop-based data transmission method of the invention;
Fig. 6 is a flowchart of how a Reduce task receives intermediate results in the invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the Hadoop-based data transmission method and system of the invention are described below with reference to the drawings. It should be understood that the specific embodiments described herein serve only to explain the invention and are not intended to limit it.
To reduce the waiting time of Reduce tasks, increase the concurrency between Map tasks and Reduce tasks, and reduce system storage overhead, the invention improves on the background-art approach in which each Map task generates its own intermediate result file. Instead, all Map tasks running on the same server store their intermediate results in the same intermediate result file, which is used together with an index file so that intermediate results can be transmitted while the Map tasks are executing: intermediate results are generated continuously and actively pushed to the Reduce tasks. This increases the concurrency between Map tasks and Reduce tasks and improves system resource utilization.
Fig. 3 is a flowchart of the Hadoop-based data transmission method of the invention, which includes the following steps:
An intermediate result file generation step: establishing one intermediate result file that stores, as they are produced, the intermediate results generated by all Map tasks;
An index establishment step: establishing an index file and keeping it updated according to the intermediate result file;
A transmission step: when it is determined from the index file that the intermediate result file contains an untransmitted intermediate result and the corresponding Reduce task has started, actively sending the untransmitted intermediate result to that Reduce task.
Specifically, the invention stores the intermediate results generated by all Map tasks on a server in a single intermediate result file. As the Map tasks continue to execute, the intermediate results they produce are stored in the intermediate result file in the order in which they are generated.
In another preferred embodiment, the intermediate results may instead be stored as follows. Fig. 4A shows the method by which Map tasks store intermediate results in the invention, which includes the following steps:
Step 410, storing intermediate results in a buffer: the Map task stores the intermediate results <key, value, region> it generates in a buffer.
Step 420, storing intermediate results in spill files: when the usage of the buffer reaches the buffer threshold, all intermediate results in the buffer are merge-sorted and the sorted intermediate results are stored in a temporary spill file. The merge sort first sorts by the "region" field of the intermediate results, and then sorts the intermediate results within the same region by the "key" field.
Step 430, storing intermediate results in the intermediate result file: when the number of temporary spill files reaches the spill file threshold, the intermediate results in all temporary spill files are merge-sorted and the sorted intermediate results are appended to the intermediate result file. This merge sort again sorts first by the "region" field and then by the "key" field within each region.
Thus, as the Map tasks continue to execute, intermediate results are generated continuously, and through the multi-level buffering and repeated merge sorting described above they are continuously stored in the intermediate result file.
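Step 430 can be sketched as a k-way merge of the sorted spill runs whose output is appended to the shared intermediate result file, returning each record's region and byte offset so that an index entry can be created for it. Assumptions: the file is modelled by `io.BytesIO`, records are encoded as tab-separated lines (the text does not specify a record format), and `merge_spills_into_shared_file` is a hypothetical name. Because several Map tasks append to the same file, a real implementation would also have to serialize concurrent appends, which this sketch does not address.

```python
import heapq
import io
from typing import List, Tuple

Record = Tuple[str, str, int]  # (key, value, region)

def merge_spills_into_shared_file(spills: List[List[Record]],
                                  shared: io.BytesIO) -> List[Tuple[int, int]]:
    """K-way merge sorted spill runs and append them to the shared file.

    Returns one (region, offset) pair per appended record for indexing.
    """
    appended: List[Tuple[int, int]] = []
    shared.seek(0, io.SEEK_END)  # append after whatever earlier merges wrote
    # Every run is already sorted by (region, key), so heapq.merge keeps that order.
    for key, value, region in heapq.merge(*spills, key=lambda r: (r[2], r[0])):
        offset = shared.tell()
        # Hypothetical record encoding: "region<TAB>key<TAB>value<LF>".
        shared.write(f"{region}\t{key}\t{value}\n".encode("utf-8"))
        appended.append((region, offset))
    return appended
```

Each call appends an independently sorted batch, so in this sketch the shared file is ordered within each appended batch rather than globally.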
The invention also establishes an index file, which holds an index entry for every intermediate result in the intermediate result file. Therefore, whenever an intermediate result is added to the intermediate result file, a corresponding index entry is added to the index file.
Fig. 4B is a schematic diagram of the structure of the index file.
Each index entry records the corresponding Reduce task information, a transmission flag bit, and the offset position of the intermediate result in the intermediate result file.
Specifically, since the intermediate result file contains multiple regions and each region corresponds to one Reduce task, the corresponding Reduce task information can be determined from the "region" field recorded in the intermediate result. As shown in Fig. 4B, the Reduce process number serves as the corresponding Reduce task information.
The transmission flag bit marks whether the intermediate result corresponding to the index entry has been transmitted to the Reduce task. When an intermediate result enters the intermediate result file and the corresponding index entry is generated, its transmission flag bit is set to "not transmitted". In this embodiment, "0" denotes "not transmitted" and "1" denotes "transmitted".
The offset position of the intermediate result in the intermediate result file indicates where the intermediate result is actually stored in that file; the intermediate result can be located in the intermediate result file by this offset.
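To make the structure of Fig. 4B concrete, the sketch below encodes one index entry as a fixed-size binary record. The field widths, byte order and the names `ENTRY`, `IndexEntry` and `flag_position` are assumptions for illustration; the text specifies only the three fields and the 0/1 flag values. A fixed-size layout has the convenience that a single entry's flag can later be changed in place by writing one byte at a computable position.

```python
import struct
from dataclasses import dataclass

NOT_TRANSMITTED, TRANSMITTED = 0, 1  # flag values used in this embodiment

# Hypothetical layout: little-endian, no padding; uint32 Reduce process
# number, uint8 transmission flag, uint64 offset into the intermediate file.
ENTRY = struct.Struct("<IBQ")

@dataclass
class IndexEntry:
    reduce_id: int
    flag: int
    offset: int

    def pack(self) -> bytes:
        return ENTRY.pack(self.reduce_id, self.flag, self.offset)

    @classmethod
    def unpack(cls, raw: bytes) -> "IndexEntry":
        return cls(*ENTRY.unpack(raw))

def flag_position(entry_number: int) -> int:
    """Byte position of the n-th entry's flag (it follows the 4-byte reduce_id)."""
    return entry_number * ENTRY.size + 4
```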
Fig. 5 is a detailed flowchart of the Hadoop-based data transmission method of the invention. The invention installs a daemon in Hadoop, and the daemon executes the method of Fig. 5:
Step 11: the Map tasks execute, generating the intermediate result file and its index file; the structure of the index file is as shown in Fig. 4B;
Step 12: the daemon traverses the index file and checks the transmission flag bits of all intermediate results;
Step 13: if all Map tasks on the server have finished and all intermediate results have been transmitted, jump to step 18; otherwise, execute step 14;
Step 14: if the daemon finds an intermediate result that has not yet been transmitted and the corresponding Reduce task has started, execute step 15; otherwise, return to step 12;
Step 15: the daemon actively sends the untransmitted intermediate result to the Reduce task;
Guided by the index file, the daemon finds in the intermediate result file each intermediate result whose transmission flag bit is 0 and sends it to the Reduce task corresponding to the Reduce process number;
Step 16: if the daemon receives a response message, execute step 17; otherwise, return to step 12;
Step 17: the daemon changes the transmission flag bit, in the index file, of the intermediate result transmitted in step 15 to "transmitted", and returns to step 12;
After the daemon receives the response message from the Reduce task, it sets the transmission flag bit of that intermediate result in the index file to 1;
Step 18: the daemon exits.
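The daemon loop of Fig. 5 can be sketched as follows, with the index held in memory. Assumptions: `send` is a hypothetical callback standing in for the network transport that returns True when the Reduce task's response message arrives; each traversal handles every eligible entry rather than one at a time; and `max_passes` is a safety bound added only for the sketch. A real daemon would also re-read the growing index file and pause between passes.

```python
from dataclasses import dataclass
from typing import Callable, List, Set

@dataclass
class Entry:
    reduce_id: int             # Reduce process number
    offset: int                # position in the intermediate result file
    transmitted: bool = False  # flag bit: False = 0, True = 1

def daemon_pass(index: List[Entry], started: Set[int],
                send: Callable[[int, int], bool]) -> int:
    """One traversal of the index (steps 12, 14 to 17). Returns entries newly sent."""
    sent = 0
    for entry in index:
        if entry.transmitted or entry.reduce_id not in started:
            continue                             # step 14: nothing to send yet
        if send(entry.reduce_id, entry.offset):  # step 15, step 16
            entry.transmitted = True             # step 17: set flag to "transmitted"
            sent += 1
    return sent

def run_daemon(index: List[Entry], maps_finished: Callable[[], bool],
               started: Set[int], send: Callable[[int, int], bool],
               max_passes: int = 1000) -> bool:
    """Repeat passes until the exit condition of step 13 holds (then step 18)."""
    for _ in range(max_passes):
        if maps_finished() and all(e.transmitted for e in index):
            return True
        daemon_pass(index, started, send)
    return False
```

An entry whose Reduce task has not started, or whose send was not acknowledged, keeps its flag at 0 and is simply retried on the next pass, mirroring the jumps back to step 12.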
Correspondingly, a Reduce task receives intermediate results as follows: the Reduce task receives the intermediate results sent by the daemon and returns a successful-receipt message to the daemon; once the Reduce task has successfully received all intermediate results, it performs the subsequent computation. As shown in Fig. 6, this includes the following steps:
Step 21: the Reduce task collects intermediate results;
Step 22: if the Reduce task successfully receives an intermediate result, jump to step 24; otherwise, execute step 23;
Step 23: the Reduce task does not respond;
Step 24: the Reduce task returns a successful-receipt message to the daemon that sent the intermediate result;
Step 25: if the Reduce task has successfully received all intermediate results, execute step 26; otherwise, return to step 21;
Step 26: the Reduce task performs the subsequent computation.
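On the Reduce side, Fig. 6 can be sketched as a message handler. The text does not say how a Reduce task knows it has received all intermediate results, so the expected count is passed in here as an assumption; the message format `(msg_id, key, value)` and the class name `ReduceReceiver` are likewise hypothetical. Because the daemon resends any result whose response message it did not receive (steps 16 and 12), a lost response could cause a duplicate delivery, so the sketch ignores message IDs it has already seen.

```python
from typing import Callable, List, Optional, Set, Tuple

MsgId = Tuple[str, int]           # hypothetical: (sending server, offset)
Message = Tuple[MsgId, str, str]  # (msg_id, key, value)

class ReduceReceiver:
    def __init__(self, expected: int,
                 compute: Callable[[List[Tuple[str, str]]], object]):
        self.expected = expected  # assumed to be known in advance
        self.compute = compute
        self.received: List[Tuple[str, str]] = []
        self.seen: Set[MsgId] = set()
        self.result: object = None

    def on_message(self, msg: Optional[Message]) -> bool:
        """Steps 21 to 26; returns True to send the receipt message, False for none."""
        if msg is None:                # step 22 failed -> step 23: no response
            return False
        msg_id, key, value = msg
        if msg_id not in self.seen:    # drop resends caused by a lost response
            self.seen.add(msg_id)
            self.received.append((key, value))
            if len(self.received) == self.expected:                # step 25
                self.result = self.compute(sorted(self.received))  # step 26
        return True                    # step 24: acknowledge receipt
```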
From the above, the beneficial effects of the invention are:
1. Shorter execution time of Hadoop jobs: by increasing the concurrency between subtasks, the daemon transmits intermediate results while the Map tasks are still executing, so that Map tasks and Reduce tasks run with a higher degree of concurrency.
2. Improved system resource utilization: intermediate results can be transmitted during Map task execution, and Reduce tasks can execute their subsequent computation logic as early as possible, so that computing resources and network resources are fully utilized.
3. Reduced system storage overhead: only one intermediate result file is generated per server, which reduces the number of files opened when transmitting intermediate results.
4. Through the storage scheme of multi-level buffering and repeated merge sorting of intermediate results shown in Fig. 4A, the data finally stored in the intermediate result file are ordered. On this basis, in step 15 of Fig. 5 the intermediate results destined for the same Reduce task can be collected first and then sent together, improving transmission efficiency.
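The batching in effect 4 can be sketched by walking the untransmitted index entries in file order and grouping consecutive entries that share a Reduce task, so that each group can be sent in one transmission. The tuple layout `(reduce_id, flag, offset)` and the function name `batch_untransmitted` are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple

IndexEntry = Tuple[int, int, int]  # (reduce_id, flag, offset); flag 0 = not transmitted

def batch_untransmitted(index: List[IndexEntry]) -> List[Tuple[int, List[int]]]:
    """Group consecutive untransmitted entries that share a Reduce task.

    `index` is assumed to be in file order; each returned (reduce_id, offsets)
    batch could then be sent to that Reduce task in a single transmission.
    """
    pending = [e for e in index if e[1] == 0]
    return [(reduce_id, [offset for _, _, offset in run])
            for reduce_id, run in groupby(pending, key=lambda e: e[0])]
```

Because the file is sorted by region within each merged batch, entries for one Reduce task tend to form long runs, which is what makes this grouping worthwhile.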
The invention has been described in detail above. Specific examples have been used herein to explain the principles and implementations of the invention; the description of the above embodiments is intended only to help understand the method of the invention and its core idea. Meanwhile, those of ordinary skill in the art may, in accordance with the idea of the invention, make changes to the specific implementations and the scope of application. In summary, the contents of this specification should not be construed as limiting the invention.

Claims (7)

1. A Hadoop-based data transmission method, characterized in that the method comprises:
an intermediate result file generation step: establishing one intermediate result file to store, at any time, the intermediate results generated by all Map tasks; specifically comprising: each Map task storing the intermediate results it generates in a buffer; when the usage of the buffer reaches a buffer threshold, merge-sorting all intermediate results stored in the buffer and storing the sorted intermediate results in a temporary spill file; when the number of temporary spill files reaches a spill file threshold, merge-sorting the intermediate results in all temporary spill files and storing the sorted intermediate results in the intermediate result file;
an index establishment step: establishing an index file, the index file setting one index entry for each intermediate result, the index entry recording the corresponding Reduce task information, a transmission flag bit, and the offset position of the intermediate result in the intermediate result file, and updating the index file at any time according to the intermediate result file;
a transmission step: when it is determined according to the index file that there is an untransmitted intermediate result in the intermediate result file and the corresponding Reduce task has been started, actively sending the untransmitted intermediate result to the Reduce task.
2. The method of claim 1, wherein the transmission step further comprises: after the Reduce task successfully receives the intermediate result, updating the transmission flag bit in the index entry corresponding to the intermediate result to "transmitted".
3. The method of claim 1, wherein the intermediate result comprises a key, a value and a region, and the merge sort in the intermediate result file generation step is: sorting by region, and sorting by key within each region.
4. A Hadoop-based data transmission system, characterized in that the system comprises:
an intermediate result file generation unit, configured to establish one intermediate result file to store, at any time, the intermediate results generated by all Map tasks, and comprising a first storage unit, a second storage unit and a third storage unit, wherein the first storage unit is configured to store, for each Map task, the intermediate results it generates in a buffer; the second storage unit is configured to merge-sort all intermediate results stored in the buffer when the usage of the buffer reaches a buffer threshold, and to store the sorted intermediate results in a temporary spill file; and the third storage unit is configured to merge-sort the intermediate results in all temporary spill files when the number of temporary spill files reaches a spill file threshold, and to store the sorted intermediate results in the intermediate result file;
an index establishment unit, configured to establish an index file, the index file setting one index entry for each intermediate result, the index entry recording the corresponding Reduce task information, a transmission flag bit, and the offset position of the intermediate result in the intermediate result file, and to update the index file at any time according to the intermediate result file;
a transmission unit, configured to actively send an untransmitted intermediate result to the corresponding Reduce task when it is determined according to the index file that there is the untransmitted intermediate result in the intermediate result file and that Reduce task has been started.
5. The system of claim 4, wherein the transmission unit further comprises an updating unit, configured to update the transmission flag bit in the index entry corresponding to the intermediate result to "transmitted" after the Reduce task successfully receives the intermediate result.
6. The system of claim 4, wherein the intermediate result comprises a key, a value and a region, and the merge sort in both the second storage unit and the third storage unit is: sorting by region, and sorting by key within each region.
7. A distributed file system, characterized by comprising the Hadoop-based data transmission system according to any one of claims 4 to 6.
CN201610243294.XA 2016-04-18 2016-04-18 Hadoop-based data transmission method and system Active CN105955819B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610243294.XA CN105955819B (en) 2016-04-18 2016-04-18 Hadoop-based data transmission method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610243294.XA CN105955819B (en) 2016-04-18 2016-04-18 Hadoop-based data transmission method and system

Publications (2)

Publication Number Publication Date
CN105955819A 2016-09-21
CN105955819B 2019-06-18

Family

ID=56917916

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610243294.XA Active CN105955819B (en) 2016-04-18 2016-04-18 Hadoop-based data transmission method and system

Country Status (1)

Country Link
CN (1) CN105955819B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109388615B (en) * 2018-09-28 2022-04-01 智器云南京信息科技有限公司 Spark-based task processing method and system
CN111444148B (en) * 2020-04-09 2023-09-05 南京大学 Data transmission method and device based on MapReduce

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102209087B (en) * 2010-03-31 2014-07-09 国际商业机器公司 Method and system for MapReduce data transmission in data center having SAN
CN102214184B (en) * 2010-04-07 2013-08-14 腾讯科技(深圳)有限公司 Intermediate file processing device and intermediate file processing method of distributed computing system
CN101996079A (en) * 2010-11-24 2011-03-30 南京财经大学 MapReduce programming framework operation method based on pipeline communication
CN103327128A (en) * 2013-07-23 2013-09-25 百度在线网络技术(北京)有限公司 Intermediate data transmission method and system for MapReduce
CN103605576B (en) * 2013-11-25 2017-02-08 华中科技大学 Multithreading-based MapReduce execution system

Also Published As

Publication number Publication date
CN105955819A (en) 2016-09-21


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant