TWI506456B - System and method for dispatching hadoop jobs in multi-cluster environment - Google Patents
System and method for dispatching hadoop jobs in multi-cluster environment Download PDFInfo
- Publication number
- TWI506456B TWI506456B TW102118168A TW102118168A TWI506456B TW I506456 B TWI506456 B TW I506456B TW 102118168 A TW102118168 A TW 102118168A TW 102118168 A TW102118168 A TW 102118168A TW I506456 B TWI506456 B TW I506456B
- Authority
- TW
- Taiwan
- Prior art keywords
- cluster
- feature
- work
- matrix equation
- module
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims description 10
- 239000011159 matrix material Substances 0.000 claims description 99
- 230000003068 static effect Effects 0.000 claims description 28
- 238000004458 analytical method Methods 0.000 claims description 13
- 238000007405 data analysis Methods 0.000 claims description 11
- 238000012544 monitoring process Methods 0.000 claims description 11
- 238000012545 processing Methods 0.000 claims description 7
- 230000008859 change Effects 0.000 claims description 4
- 238000004364 calculation method Methods 0.000 claims description 3
- 239000013598 vector Substances 0.000 description 7
- 230000006399 behavior Effects 0.000 description 4
- 230000008901 benefit Effects 0.000 description 3
- 238000010586 diagram Methods 0.000 description 2
- 230000009471 action Effects 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000007796 conventional method Methods 0.000 description 1
- 238000013500 data storage Methods 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 239000002360 explosive Substances 0.000 description 1
- 238000001914 filtration Methods 0.000 description 1
- 230000006870 function Effects 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 230000008569 process Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 238000010187 selection method Methods 0.000 description 1
Landscapes
- Debugging And Monitoring (AREA)
- Multi Processors (AREA)
Description
本發明係關於一種基於Hadoop多叢集環境的工作分派系統及方法,特別係指一種結合叢集特徵分析、工作特徵分析,對叢集與運算任務進行特徵的分析判斷,並透過執行環境的選擇進行運算任務的派送,以達成簡便且效率高之服務運算資源之調配。The invention relates to a work assignment system and method based on Hadoop multi-cluster environment, in particular to a combination cluster feature analysis, work feature analysis, analysis and judgment of cluster and operation tasks, and computing tasks through execution environment selection. Delivery to achieve a simple and efficient deployment of service computing resources.
近年來因為大量的資訊化,使得一般企業與政府機構面對的是爆炸性成長的資料量,無論是在資料儲存、資料庫或資料檢索與資料探勘的領域中,都遭遇相同的問題,資料過濾與整理的龐大且耗時的工作,已無法由一台超級電腦負荷,轉而導向透過大量的群組電腦同時進行運算,進而獲得最大的效益。現今的資訊領域採用雲端服務的技術提供分散式運算來解決上述的問題,其中又以Apache Hadoop為主要的開放原始碼解決方案之一。In recent years, due to the large amount of informationization, the average enterprise and government agencies are facing an explosive growth in data volume. In the fields of data storage, database or data retrieval and data exploration, the same problem is encountered. The huge and time-consuming work that has been done is no longer burdened by a supercomputer, but instead is directed to computing through a large number of group computers, so as to obtain maximum benefits. Today's information field uses cloud-based services to provide decentralized computing to solve the above problems, including Apache Hadoop as one of the main open source solutions.
Hadoop實做出一個分散式運算的處理框架概念稱為MapReduce,透過將對數據進行的運算工作分發給網路上的每個節點處理,每個節點會周期性的把完成的工作和狀態的更新報告回來,進而達成大規模的數據運算分析。在此處理 框架之下,工作的排程與分派預設為FIFO(First In First Out)演算法,雖然架構上簡單,卻因此忽略運算工作本質上需求的差異,可能造成某項工作長期占用資源的情況。此外,系統參數的調校是否能與運算工作本質上的需求相符合,也是另一項在Hadoop系統當中相當重要的因素,但是若需要滿足此項條件,使用者往往需要針對不同的運算工作重新設定整體系統環境參數,以便讓整體系統的效能與運作可以配合運算工作的需求。由此可見,上述習用的方法仍有諸多缺失,實非一良善之設計者,而亟待加以改良。如何大量的群組電腦同時進行運算,卻又能針對運算工作進行最佳化,本案發明人鑑於上述習用方法所衍生的各項缺點,乃亟思加以改良創新,並經多年苦心孤詣潛心研究後,終於成功研發完成本件基於Hadoop多叢集環境的工作分派之系統。Hadoop actually makes a decentralized operation processing framework called MapReduce. By distributing the data processing work to each node on the network, each node periodically reports the completed work and status updates. Come back, and then achieve a large-scale data operation analysis. Processing here Under the framework, the scheduling and assignment of work is preset to the FIFO (First In First Out) algorithm. Although the architecture is simple, it ignores the difference in the essential requirements of the computing work, which may cause a certain task to occupy resources for a long time. In addition, whether the adjustment of the system parameters can be consistent with the essential requirements of the computing work is another important factor in the Hadoop system, but if this condition is met, the user often needs to re-work for different operations. Set the overall system environment parameters so that the overall system performance and operation can match the needs of computing work. It can be seen that there are still many shortcomings in the above-mentioned methods, which are not a good designer, but need to be improved. How can a large number of group computers perform calculations at the same time, but they can optimize the computing work. The inventors of the present invention have improved and innovated in view of the shortcomings derived from the above-mentioned conventional methods, and after years of painstaking research. Finally, I successfully developed this system based on Hadoop multi-cluster environment.
本發明之目的即在於提供一種裝置與系統,特別是應用在多個大量資料處理的分散式電腦叢集,能夠根據執行程式特徵,待處理資料特性,與電腦叢集的動態行為,選擇最佳的執行環境。可以降低不同運算特性的工作之排程等待時間,有效的加快運算分析的速度,並提升整體資源使用效率。The object of the present invention is to provide an apparatus and system, in particular, a distributed computer cluster applied to a plurality of large-scale data processing, which can select the best execution according to the characteristics of the execution program, the characteristics of the data to be processed, and the dynamic behavior of the computer cluster. surroundings. It can reduce the scheduling waiting time of work with different computing characteristics, effectively speed up the calculation and analysis, and improve the overall resource utilization efficiency.
可達成上述發明目的之基於Hadoop多叢集環境的工作分派系統及方法,係利用一組叢集特徵與監控模組、工作資料與程式分析模組以及執行環境選擇模組的結合,提供最佳化的Hadoop多叢集環境工作分派系統給使用者執行大資料運算服務。其方法係透過掌握叢集特徵、監控叢集運作情 形、分析運算資料特性與程式運算特性等影響參數,進而運算比對找出最合適的叢集,再透過執行環境選擇模組找到對應的叢集,並將用戶工作,包含用戶程式與輸入資料派送到對應的叢集中執行。A work distribution system and method based on the Hadoop multi-cluster environment that achieves the above objects, and utilizes a combination of a cluster feature and a monitoring module, a work data and a program analysis module, and an execution environment selection module to provide optimization. The Hadoop multi-cluster environment work dispatching system performs large data computing services for users. The method is to grasp the cluster characteristics and monitor the cluster operation. Shape, analysis of the operating data characteristics and program operation characteristics and other influencing parameters, and then the operation to find the most appropriate cluster, and then through the execution environment selection module to find the corresponding cluster, and the user work, including user program and input data is sent to The corresponding cluster is executed.
1‧‧‧工作分派系統1‧‧‧Work assignment system
11‧‧‧特徵資料庫模組11‧‧‧Characteristic database module
12‧‧‧叢集特徵模組12‧‧‧ Cluster Feature Module
13‧‧‧叢集監控模組13‧‧‧ Cluster Monitoring Module
14‧‧‧工作資料分析模組14‧‧‧Work data analysis module
15‧‧‧工作程式分析模組15‧‧‧Working program analysis module
16‧‧‧執行環境選擇模組16‧‧‧Execution Environment Selection Module
2‧‧‧用戶操作介面2‧‧‧User interface
3‧‧‧客戶程式3‧‧‧Client
4‧‧‧輸入資料4‧‧‧ Input data
5‧‧‧迷你叢集5‧‧‧Mini Cluster
6‧‧‧主機叢集6‧‧‧ Host cluster
第1圖為本發明之基於Hadoop多叢集環境的工作分派系統架構圖;第2圖為本發明基於Hadoop多叢集環境的工作分派系統之運作流程圖;第3圖為本發明基於Hadoop多叢集環境的工作分派系統之執行環境選擇流程圖。1 is a schematic diagram of a work distribution system based on a Hadoop multi-cluster environment of the present invention; FIG. 2 is a flowchart of operation of a work distribution system based on a Hadoop multi-cluster environment; FIG. 3 is a Hadoop multi-cluster environment according to the present invention; The execution environment selection flow chart of the work distribution system.
如第1圖所示,為本發明基於Hadoop多叢集環境的工作分派系統及方法之一種實施範例的架構示意圖,包括有:一特徵資料庫模組11,係用以儲存後述叢集特徵模組12、叢集監控模組13、工作資料分析模組14、工作程式分析模組15的矩陣方程式;一叢集特徵模組12,係用以收集叢集中不會隨著時間改變的靜態特徵,並以叢集靜態特徵矩陣方程式來描述其收集到的靜態特徵;一叢集監控模組13,用以定期收集每個叢集的動態特徵,並分析動態特徵曲線,以建立叢集動態特徵矩陣方程式來描述叢集特徵分析結果; 一工作資料分析模組14,係用以收集工作執行中不會隨著時間改變的靜態特徵,並以工作靜態特徵矩陣方程式來描述其收集到的靜態特徵;一工作程式分析模組15,係用以分析用戶程式在執行時使用資源的情形,主要用以建立工作動態特徵矩陣方程式來描述用戶程式行為特徵;一執行環境選擇模組16,係用以由工作程式分析模組15與叢集特徵模組12建立的矩陣方程式中選出最適合用戶工作之叢集,並將其送往對應的叢集;本發明基於Hadoop多叢集環境的工作分派系統運作流程如第2圖所示,客戶將其工作(包含客戶程式3與輸入資料4)透過用戶操作介面2送至Hadoop多叢集環境的工作分派系統1,工作分派系統1由客戶工作特性與各叢集6特性找出最適合之叢集在將其送往此叢集執行,工作分派系統1中各個模組之說明如下。FIG. 1 is a schematic structural diagram of an implementation example of a work distribution system and method based on a Hadoop multi-cluster environment according to the present invention, including: a feature database module 11 for storing a cluster feature module 12 described later. , the cluster monitoring module 13, the work data analysis module 14, the work program analysis module 15 matrix equation; a cluster feature module 12, is used to collect static features that the cluster does not change over time, and clustered The static feature matrix equation is used to describe the collected static features; a cluster monitoring module 13 is used to periodically collect the dynamic features of each cluster and analyze the dynamic characteristic curves to establish a cluster dynamic feature matrix equation to describe the cluster feature analysis results. ; A work data analysis module 14 is configured to collect static features that do not change over time during work execution, and describe the collected static features by working static feature matrix equations; a work program analysis module 15, The method for analyzing the use of resources by the user program during execution is mainly used to establish a working dynamic characteristic matrix equation to describe the user program behavior feature; an execution environment selection module 16 is configured to analyze the module 15 and the cluster feature by the working program. The cluster equation established by the module 12 selects the cluster most suitable for the user's work and sends it to the corresponding cluster; the working process of the work distribution system based on the Hadoop multi-cluster environment of the present invention is shown in FIG. 2, and the customer works ( The work distribution system 1 including the client program 3 and the input data 4) is sent to the Hadoop multi-cluster environment through the user operation interface 2, and the work distribution system 1 finds the most suitable cluster by the customer work characteristics and the cluster 6 characteristics, and sends it to the cluster. This cluster is executed, and the description of each module in the work distribution system 1 is as follows.
首先,叢集監控模組13定期收集每個叢集的動態特徵(例如CPU頻率(GHz)、Disk空間、Memory的使用量),並針對動態特徵曲線進行分析,將分析結果轉換成叢集動態特徵矩陣方程式,再儲存在特徵資料庫模組11。舉例來說,定期收集N個叢集(C_1…C_N)的n個動態特徵,如每秒CPU頻率(GHz)的使用量(%)、Disk空間的使用量(%)等,並以矩陣表示:
每個叢集各取時間間隔(t 1 ~t k
),其中k為間隔總數,計算出每個時間間隔的平均使用量,並以n×k
矩陣表示:
再將每個叢集在各時間間隔的1×n 矩陣儲存到特徵資料庫模組11。Each cluster is stored in the feature database module 11 at a 1×n matrix of each time interval.
而叢集特徵模組12主要負責收集叢集中不會隨著時間改變的靜態特徵,例如CPU核心數、CPU頻率(GHz)、Disk空間大小、Memory大小等規格,並將收集到的資料轉換成叢集靜態特徵矩陣方程式儲存在特徵資料庫模組11中,第i個叢集的矩陣方程式以1xn矩陣表示:
例如:
當有新叢集加入系統時,叢集特徵模組12會收集其靜態特徵,並同樣儲存在特徵資料庫模組11。When a new cluster is added to the system, the cluster feature module 12 collects its static features and also stores them in the feature database module 11.
工作資料分析模組14收集工作執行中不會隨著時間改變的靜態特徵,例如總資料量大小、總資料筆數、資料格式型態、是否壓縮等,描述工作的靜態特徵。當有新工作進入工作分派系統時,工作資料分析模組14會收集其靜態特徵,並將收集到的資料轉換成工作靜態特徵矩陣方程式儲存在特徵資料庫模組11中,工作靜態特徵矩陣方程式以1xn矩 陣表示:[Js 1 Js 2 Js 3 …Js n ] (6) The work profile analysis module 14 collects static features that do not change over time during the execution of the work, such as the total amount of data, the total number of data, the format of the data format, whether it is compressed, etc., describing the static characteristics of the work. When a new job enters the work distribution system, the work data analysis module 14 collects its static features, and converts the collected data into a working static feature matrix equation stored in the feature database module 11, working static characteristic matrix equation Expressed in a 1xn matrix: [Js 1 Js 2 Js 3 ... Js n ] (6)
例如:[總資料筆數] 1×n (7) For example: [Total data] 1×n (7)
工作程式分析模組15,係用以分析客戶程式3在執行時使用資源的情形,係工作特徵分析模組中的一子模組,主要用以建立矩陣方程式來描述客戶程式3行為特徵;用戶於用戶操作介面2提交客戶程式3與輸入資料4的存放路徑後,工作程式分析模組15從輸入資料4擷取固定筆數的資料當作樣本,將客戶程式3與輸入資料樣本上載至迷你叢集5,要求迷你叢集5啟動程式開始處理輸入資料樣本,記錄客戶程式3在處理固定筆數資料樣本時使用迷你叢集5資源的情形(例如中央處理器、記憶體、檔案讀取與寫入要求、網路封包讀取與寫入要求)與花費時間,並將收集到的資料轉換成工作動態特徵矩陣方程式儲存在特徵資料庫模組11中,工作動態特徵矩陣方程式以1xn矩陣表示,其中Jd n 為第n個工作動態特徵參數:[Jd 1 Jd 2 Jd 3 …Jd n ] (8) The work program analysis module 15 is used to analyze the situation in which the client 3 uses resources during execution, and is a sub-module in the work feature analysis module, which is mainly used to establish a matrix equation to describe the behavior characteristics of the client 3; After the user operation interface 2 submits the storage path of the client 3 and the input data 4, the work program analysis module 15 takes a fixed number of data from the input data 4 as a sample, and uploads the client 3 and the input data sample to the mini. Cluster 5, requires the Mini Cluster 5 startup program to start processing input data samples, and record the case where the client 3 uses the Mini Cluster 5 resource when processing a fixed number of data samples (eg, CPU, memory, file read and write requirements). , network packet read and write requirements) and time spent, and the collected data is converted into a working dynamic characteristic matrix equation stored in the feature database module 11, the working dynamic characteristic matrix equation is represented by a 1xn matrix, where Jd n is the nth working dynamic characteristic parameter: [Jd 1 Jd 2 Jd 3 ... Jd n ] (8)
例如:
執行環境選擇模組16之運作流程如第3圖所示,首先從特徵儲存庫模組11取得叢集監控模組13、叢集特徵模組12、工作資料分析模組14與工作程式分析模組15分析之
叢集靜態特徵矩陣方程式、叢集動態特徵矩陣方程式、工作靜態特徵矩陣方程式[Js 1 Js 2 Js 3 …Js n ]
與工作動態特徵矩陣方程式[Jd 1 Jd 2 Jd 3 …Jd n ]
並計算出用戶程式特徵矩陣方程式與對應各叢集的叢集特徵矩陣方程式如第(10)與(11)式所示:
其中F1 job
代表用戶程式特徵矩陣方程式中的第一個特徵,其值為工作靜態特徵矩陣方程式第一項Js 1
與工作動態特徵矩陣方程式第一項Jd 1
相乘之結果,後面依此類推,共有n個特徵值,而則是代表第i個叢集的叢集特徵矩陣方程式的第一個特徵,同樣也有n個特徵值,由於叢集動態特徵矩陣方程式的值為當時叢集的平均使用率,而我們分析需要的為叢集的剩餘使用率,所以以(1-)計算出叢集剩餘使用率,其值是由叢集靜態特徵矩陣方程式第一項 與第一項叢集動態特徵矩陣剩餘使用率(1-)相乘而來,這裡以叢集1為例,其叢集特徵矩陣方程式即為,為了避免混淆之後我們以J表示用戶程式特徵矩陣方程式而Ci表示第i個叢集的叢集特徵矩陣方程式,如第3圖說明第二步是針對Ci做分群的動作,首先將不適合的叢集過濾掉,由於部分用戶程式特徵有下限值,若叢集對應的特徵值低於下限值這些用戶程式就無法在叢集上執行,舉例來說用戶程式特徵中有disk使用量,若叢集特徵中的disk剩餘量低於用戶程式所需disk使用量時,此叢集就不適合執行此用戶程式。在判別不
適合的叢集可透過比較用戶程式特徵矩陣方程式與各叢集特徵矩陣方程式,若其中的元素屬於有下限值的特徵,且叢集特徵矩陣方程式元素小於用戶程式特徵矩陣方程式,表示此叢集特徵矩陣方程式對應的叢集不適合執行目前的用戶程式,於是不適合之叢集集合Clusterunsuitable
的表示如第(12)式所示:
其中Cluster all
代表所有叢集特徵矩陣方程式集合,L代表有下限值的特徵集合,表示Ci
叢集的叢集特徵矩陣方程式的第j個元素,而Fj job
為用戶程式特徵矩陣方程式的第j個元素,過濾掉不適合的叢集後針對剩餘的叢集特徵方程式再將其分為最優先叢集特徵矩陣方程式集合與次優先叢集特徵矩陣方程式集合,首先最優先叢集特徵矩陣方程式集合在這裡定義為叢集特徵矩陣方程式的各特徵元素皆滿足用戶程式特徵方程式的所有元素,剩餘叢集特徵矩陣方程式則是次優先叢集特徵矩陣方程式集合,這兩個集合定義如下:
ClusterCluster second priortySecond priorty =Cluster=Cluster allAll -(Cluster-(Cluster first priortyFirst priorty U ClusterU Cluster unsuitableUnsuitable ) (14)) (14)
其中Cluster first priorty 為最優先叢集特徵矩陣方程式集合而Cluster second priorty 為次優先叢集特徵矩陣方程式集合,將叢集特徵矩陣方程式分群後,下一步開始從中選擇目標叢集,選擇目標叢集可分為以下步驟: Cluster first priorty is the highest priority cluster feature matrix equation set and Cluster second priorty is the sub-priority cluster feature matrix equation set. After the cluster feature matrix equations are grouped, the next step is to select the target cluster, and the target cluster can be divided into the following steps:
A.檢查最優先叢集特徵矩陣方程式集合,若非空集合,則從集合中選擇最適合之叢集特徵矩陣方程式,在這裡用戶程式特徵矩陣方程式可視為一存在於n維空間的向量,同時最優先叢集特徵矩陣方程式集合也可視為多組存在於n為空間的向量集合,於是利用向量的距離做為選擇上的依據;一般而言距離越大代表叢集會有更充裕的資源供用戶程式執行,但本發明的目的在於降低工作等待(wait to run)時間,為了避免大量的用戶程式都配置到某個特定的叢集上降低了執行效率,於是在此選擇了向量距離最近,也就是最符合當時用戶程式執行的叢集,選擇如(15)所示:
B.如最優先叢集特徵矩陣方程式集合不存在任何的叢集特徵方程式,則從次優先叢集特徵矩陣方程式集合中做選擇,在這集合中皆是無法完全滿足用戶程式的叢集集合但還是可以順利完成用戶工作的需求,這裡的選擇方法如第一步相同,將各矩陣方程式視為存在於n維空間的向量,為了避免因叢集特徵與用戶程式特徵差別過大導致用戶程式執行時間過長,這邊一樣是以選擇兩向量空間距離最少做為選擇的依據,選擇如(17)式所示:
C.如最優先與次優先叢集特徵矩陣方程式集合皆不存在任何的叢集特徵方程式,則表示目前所有存在的叢集皆不適合執行用戶工作,此時執行環境選擇模組退回用戶工作要求,並通知使用者。C. If there is no cluster characteristic equation in the set of the highest priority and the second priority cluster feature matrix, it means that all the existing clusters are not suitable for performing user work. At this time, the execution environment selection module returns the user work request and notifies the use. By.
如有找出最適合的叢集特徵矩陣方程式,執行環境選擇模組16由選出的叢集特徵矩陣方程式找到對應的叢集,並將客戶工作,包含客戶程式3與輸入資料4派送到對應的叢集中執行。If the most suitable cluster feature matrix equation is found, the execution environment selection module 16 finds the corresponding cluster from the selected cluster feature matrix equation, and sends the client work, including the client 3 and the input data 4, to the corresponding cluster to execute. .
本發明所提供之基於Hadoop多叢集環境的工作分派之系統,與其他習用技術相互比較時,更具有下列之優點:The system of work assignment based on Hadoop multi-cluster environment provided by the invention has the following advantages when compared with other conventional technologies:
1.本發明可根據待處理資料特性、運算程式之特徵與電腦叢集的動態行為,提供最佳化的執行環境給使用者,有效降低工作等待時間,提供可行、可靠、高效率之運算服務。1. The invention can provide an optimized execution environment to the user according to the characteristics of the data to be processed, the characteristics of the computing program and the dynamic behavior of the computer cluster, effectively reducing the waiting time of the work, and providing a feasible, reliable and efficient computing service.
2.本發明的工作分派之系統與方法可根據待處理資料特性進而充分使用運算設備硬體資源,降低運算服務建置成本,確保服務的穩定性與可靠性,解決運算工作本質上需求的差異之問題,進而提昇整體服務速度與效率,其經濟效益非常明顯。2. The system and method for the work assignment of the present invention can fully utilize the hardware resources of the computing device according to the characteristics of the data to be processed, reduce the cost of computing service construction, ensure the stability and reliability of the service, and solve the difference in the essential requirements of the computing work. The problem, and thus the overall service speed and efficiency, its economic benefits are very obvious.
上列詳細說明係針對本發明之一可行實施例之具體說明,惟該實施例並非用以限制本發明之專利範圍,凡未脫離本發明技藝精神所為之等效實施或變更,均應包含於本案之專利範圍中。The detailed description of the preferred embodiments of the present invention is intended to be limited to the scope of the invention, and is not intended to limit the scope of the invention. The patent scope of this case.
綜上所述,本案不但在空間型態上確屬創新,並能較習用物品增進上述多項功效,應已充分符合新穎性及進 步性之法定發明專利要件,爰依法提出申請,懇請 貴局核准本件發明專利申請案,以勵發明,至感德便。In summary, this case is not only innovative in terms of space type, but also can enhance the above-mentioned multiple functions compared with the customary items, and should fully comply with the novelty and progress. The statutory invention patent requirements of the step, apply in accordance with the law, and ask your office to approve the invention patent application, in order to invent invention, to the sense of virtue.
1‧‧‧工作分派系統1‧‧‧Work assignment system
11‧‧‧特徵資料庫模組11‧‧‧Characteristic database module
12‧‧‧叢集特徵模組12‧‧‧ Cluster Feature Module
13‧‧‧叢集監控模組13‧‧‧ Cluster Monitoring Module
14‧‧‧工作資料分析模組14‧‧‧Work data analysis module
15‧‧‧工作程式分析模組15‧‧‧Working program analysis module
16‧‧‧執行環境選擇模組16‧‧‧Execution Environment Selection Module
2‧‧‧用戶操作介面2‧‧‧User interface
3‧‧‧客戶程式3‧‧‧Client
4‧‧‧輸入資料4‧‧‧ Input data
5‧‧‧迷你叢集5‧‧‧Mini Cluster
6‧‧‧主機叢集6‧‧‧ Host cluster
Claims (7)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW102118168A TWI506456B (en) | 2013-05-23 | 2013-05-23 | System and method for dispatching hadoop jobs in multi-cluster environment |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW102118168A TWI506456B (en) | 2013-05-23 | 2013-05-23 | System and method for dispatching hadoop jobs in multi-cluster environment |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| TW201445332A TW201445332A (en) | 2014-12-01 |
| TWI506456B true TWI506456B (en) | 2015-11-01 |
Family
ID=52707054
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| TW102118168A TWI506456B (en) | 2013-05-23 | 2013-05-23 | System and method for dispatching hadoop jobs in multi-cluster environment |
Country Status (1)
| Country | Link |
|---|---|
| TW (1) | TWI506456B (en) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| TW201216073A (en) * | 2010-10-01 | 2012-04-16 | Kuan-Chang Fu | System and method for sharing network storage and computing resource |
| TW201312467A (en) * | 2011-07-28 | 2013-03-16 | Yahoo Inc | Method and system for distributed application stack deployment |
| TW201314463A (en) * | 2011-05-20 | 2013-04-01 | Soft Machines Inc | Decentralized allocation of resources and interconnect structures to support the execution of instruction sequences by a plurality of engines |
| US20130124483A1 (en) * | 2011-11-10 | 2013-05-16 | Treasure Data, Inc. | System and method for operating a big-data platform |
-
2013
- 2013-05-23 TW TW102118168A patent/TWI506456B/en not_active IP Right Cessation
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| TW201216073A (en) * | 2010-10-01 | 2012-04-16 | Kuan-Chang Fu | System and method for sharing network storage and computing resource |
| TW201314463A (en) * | 2011-05-20 | 2013-04-01 | Soft Machines Inc | Decentralized allocation of resources and interconnect structures to support the execution of instruction sequences by a plurality of engines |
| TW201312467A (en) * | 2011-07-28 | 2013-03-16 | Yahoo Inc | Method and system for distributed application stack deployment |
| US20130124483A1 (en) * | 2011-11-10 | 2013-05-16 | Treasure Data, Inc. | System and method for operating a big-data platform |
Also Published As
| Publication number | Publication date |
|---|---|
| TW201445332A (en) | 2014-12-01 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN104298550B (en) | A kind of dynamic dispatching method towards Hadoop | |
| Chen et al. | How does the workload look like in production cloud? analysis and clustering of workloads on alibaba cluster trace | |
| WO2021179462A1 (en) | Improved quantum ant colony algorithm-based spark platform task scheduling method | |
| CN104516784B (en) | A kind of method and system for predicting the task resource stand-by period | |
| US20180198855A1 (en) | Method and apparatus for scheduling calculation tasks among clusters | |
| CN107357652B (en) | A cloud computing task scheduling method based on segmentation sorting and standard deviation adjustment factor | |
| Xu et al. | Adaptive task scheduling strategy based on dynamic workload adjustment for heterogeneous Hadoop clusters | |
| JP6953800B2 (en) | Systems, controllers, methods, and programs for running simulation jobs | |
| Canali et al. | Improving scalability of cloud monitoring through PCA-based clustering of virtual machines | |
| Saklani et al. | Multicore Implementation of K-Means Clustering Algorithm | |
| Vakilinia et al. | Analysis and optimization of big-data stream processing | |
| US20220050814A1 (en) | Application performance data processing | |
| Kumar et al. | A comprehensive review of straggler handling algorithms for mapreduce framework | |
| CN103019855A (en) | Method for forecasting executive time of Map Reduce operation | |
| CN112148475A (en) | Task scheduling method and system of loongson big data all-in-one machine integrating load and power consumption | |
| CN112000460A (en) | Service capacity expansion method based on improved Bayesian algorithm and related equipment | |
| CN104077398B (en) | Work distribution system and method based on Hadoop multi-cluster environment | |
| Suleiman et al. | A framework for characterizing very large cloud workload traces with unsupervised learning | |
| Ouyang et al. | An approach for modeling and ranking node-level stragglers in cloud datacenters | |
| US10817401B1 (en) | System and method for job-to-queue performance ranking and resource matching | |
| Piao et al. | Computing resource prediction for mapreduce applications using decision tree | |
| TWI506456B (en) | System and method for dispatching hadoop jobs in multi-cluster environment | |
| Leonenkov et al. | Supercomputer efficiency: Complex approach inspired by Lomonosov-2 history evaluation | |
| Qi et al. | Data mining based root-cause analysis of performance bottleneck for big data workload | |
| Lang et al. | Implementation of load balancing algorithm based on flink cluster |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| MM4A | Annulment or lapse of patent due to non-payment of fees |