
WO2013140412A1 - A method and system for distributed computing of jobs - Google Patents

A method and system for distributed computing of jobs

Info

Publication number
WO2013140412A1
WO2013140412A1 (PCT/IN2012/000192)
Authority
WO
WIPO (PCT)
Prior art keywords
job
sub
jobs
database server
linked list
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/IN2012/000192
Other languages
French (fr)
Inventor
Sachindran Kunjumpidukkal
Pramod Krishna KAMATH
Anilkumar Krishna PATIL
Paras Shailesh DESAI
Abhijit Mukund GADGIL
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Infosys Ltd
Original Assignee
Infosys Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Infosys Ltd filed Critical Infosys Ltd
Priority to EP12871948.1A priority Critical patent/EP2828761A4/en
Priority to PCT/IN2012/000192 priority patent/WO2013140412A1/en
Publication of WO2013140412A1 publication Critical patent/WO2013140412A1/en
Priority to PH12014502110A priority patent/PH12014502110A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5017Task decomposition

Definitions

  • the present invention relates generally to a system and method for dynamically allocating and provisioning computing resources in a distributed shared computing environment. More particularly, the present invention relates to a system and method for executing jobs in a multi- machine environment comprising of a plurality of application servers.
  • System Performance is a central aspect in enterprise systems deployed in various businesses such as Information Technology, banking industry, telecommunication, military, health care and financial services. Few models concerning system performance for the enterprise systems have been continuously researched in the past. However, framework of the models used in the past has been based on anticipating an expected load on the enterprise systems rather than on modeling the system based on a runtime load. Anticipating the expected load on the enterprise systems has been known to lead to system failures and hence a poor system performance leading to customer dissatisfaction and a loss of revenue to a business running the enterprise systems.
  • performance modeling tools, based on statistical analysis and mathematical modeling, are used as another technique for improving system performance; however, these tools are known to be very expensive, involving major in-house resources to operate, and typically only focus on a particular aspect of the overall system.
  • Certain other enterprise systems employ performance monitoring tools for monitoring the system performance periodically; however, such tools tend to report problems either too early, resulting in unnecessary expenditure, or too late, resulting in system failure.
  • none of the existing systems deploy an integrated framework that can improve the overall performance of the enterprise systems, at an affordable cost.
  • the present invention discloses a system and method for distributed computing of a plurality of jobs submitted by a plurality of users.
  • the system comprises a database server configured to log the plurality of jobs and one or more application servers configured to execute the plurality of jobs by a bin agent.
  • the bin agent of an application server further comprises a job allocation module, configured to pull a logged job from the database server, and a data fetch module, configured to fetch a set of data records essential for execution of the logged job from the database server.
  • the bin agent comprises a configuration module configured to divide the logged job into a plurality of sub-jobs for distributed execution of the logged job, and a caching module configured to store the fetched set of data records in the form of a plurality of sub-linked lists in a cache server, where each sub-linked list is associated with a sub-job of the plurality of sub-jobs.
  • the bin agent comprises a computing engine configured to execute the sub-job by accessing the associated sub-linked list.
  • a method for distributed computing of a plurality of jobs may include logging the plurality of jobs in a database server, pulling by a bin agent the logged job for execution by an application server, where the bin agent exists within the application server, fetching by the bin agent a set of data records, required for execution of the logged job, from the database server, storing the fetched set of data records as a linked list in a cache server, dividing the linked list into a plurality of sub-linked lists where each sub-linked list is associated with a sequential identifier, dividing the logged job into a plurality of sub-jobs in the database server where each sub-job is associated with at least one sub-linked list in the cache server, and executing the sub-job by the bin agent, by accessing the associated sub-linked list from the cache server.
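The method steps above can be sketched end to end. The following is an illustrative, in-memory Python sketch; the class and function names (`CacheServer`, `run_logged_job`, and so on) are assumptions for clarity, since the patent does not specify an implementation language or any identifiers.

```python
# A minimal, self-contained sketch of the claimed method flow:
# store fetched records in a cache, partition them into sub-linked
# lists with sequential identifiers, and execute one sub-job per list.
# All names here are illustrative, not taken from the patent.

class CacheServer:
    """In-memory stand-in for the cache server of the claims."""
    def __init__(self):
        self.sub_lists = {}  # sequential identifier -> sub-linked list

    def store_and_partition(self, records, parts):
        # Store the fetched records, then divide them into sub-linked
        # lists, each keyed by a sequential identifier.
        chunk = max(1, len(records) // parts)
        seq_ids = []
        for i in range(parts):
            seq_id = i + 1
            if i < parts - 1:
                self.sub_lists[seq_id] = records[i * chunk:(i + 1) * chunk]
            else:
                self.sub_lists[seq_id] = records[(parts - 1) * chunk:]
            seq_ids.append(seq_id)
        return seq_ids

def run_logged_job(records, parts, work):
    """Fetch -> cache -> partition -> divide into sub-jobs -> execute."""
    cache = CacheServer()
    seq_ids = cache.store_and_partition(records, parts)
    results = []
    for seq_id in seq_ids:                  # one sub-job per sub-linked list
        sub_list = cache.sub_lists[seq_id]  # access the associated sub-linked list
        results.append([work(r) for r in sub_list])
    return results

# Example: a toy "interest calculation" over 8 records, split 4 ways.
print(run_logged_job(list(range(8)), 4, lambda r: r * 2))
# -> [[0, 2], [4, 6], [8, 10], [12, 14]]
```

In a real deployment each sub-linked list would be consumed by a computing engine on a different application server; here the loop stands in for that distribution.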
  • FIG. 1 is a schematic illustration, in accordance with an embodiment, of an environment for distributed computing of a plurality of jobs by one or more application servers communicatively coupled with a database server;
  • FIG. 2 is a schematic diagram illustrating a system for tracking the health status of the database server and the one or more application servers, in accordance with an embodiment of the invention.
  • FIG. 3 is a schematic diagram of a cache server for storing a plurality of data records fetched from the database server, as a serialized linked list, in accordance with an embodiment of the invention.
  • FIG. 4 is a flowchart illustrating an embodiment of a method for distributed computing of a plurality of jobs by one or more application servers.
  • FIG. 5 illustrates a generalized example of a computing environment 500.
  • the present invention is directed to a method and system for distributed computing of varying demand loads on the system by a plurality of shared computing resources.
  • the plurality of shared computing resources includes a plurality of application servers and service-oriented components hosted in a computing grid environment.
  • the shared computing resources can be localized in a particular location or can be distributed over a wide geographical area.
  • the varying demand loads can be visualized as execution of a plurality of jobs submitted by one or more users of the system.
  • FIG. 1 illustrates an environment 100 in which various embodiments of the invention can be practiced.
  • the environment 100 includes a plurality of users 144a, 144b to 144x, who submit a plurality of jobs viz. Job 1 142a, Job 2 142b to Job n 142x for execution, a collector table 106 installed in a database server 102, a dispatcher module 118, a cache server 120, and a plurality of application servers such as application server 1 136, application server 2 138, and application server n 140.
  • the plurality of jobs 142a, 142b to 142x can be of various types such as online jobs, batch jobs and channel jobs.
  • Online jobs usually execute only once at a predetermined time, whereas batch jobs can occur repeatedly, at predefined time intervals.
  • an instance of the batch job in the system deployed for a banking institution may be an EOD BOD (End of Day Beginning of Day) operation that needs to be repeated in a time interval of 24 hours.
  • An instance of the EOD BOD operation of the banking institution can be an interest calculation for all current accounts held by all customers of the banking institution.
  • instances of EOD BOD operations could be inter-sol reconciliation, processing of renewal of all term deposit accounts held at the banking institution and making a business day's limit available to all the customers.
  • the collector table 106 is configured to accept and store the plurality of jobs 142a, 142b to 142x that are submitted to the system for execution.
  • the jobs that are stored in the collector table 106 are said to be in the lodged state.
  • Job 1 110, Job2 112, and Job n 114 are hereinafter referred to as lodged jobs.
  • the collector table 106 is logically represented by a set of queues, where each queue, such as queue 108, holds a set of jobs based on certain predefined conditions such as priority.
  • the jobs are placed in different queues based on the predefined conditions.
  • the predefined condition of priority is set by updating a priority field, a parameter of a job.
  • the priority field determines the order of execution of the job within a job group.
  • the job group usually represents a set of jobs 142a to 142n allocated to a particular application server for execution.
  • the jobs 142a to 142n that can be executed in parallel usually have the same priority.
  • the priority field of the job 142a to 142n can be set by the user, such as user 1 144a. Online jobs are usually given a default priority of high. However, as the priority field is configurable, once a job is submitted or lodged, the priority field of the job can be changed to a new order of priority. In such a case, the default priority of the job shall be overridden by the new order of priority.
  • each queue is identified by a unique ID, known as queue ID.
  • each queue is configured to store a plurality of information associated with the plurality of jobs viz. 142a, 142b to 142x.
  • the plurality of information include parameters of the plurality of jobs, parameters of the plurality of users who submitted the plurality of jobs, the application servers on which the plurality of jobs are being executed and a status of completion of each of the plurality of jobs.
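The queue behaviour described above (a configurable priority field determining execution order, with parallelizable jobs sharing a priority) can be sketched with a heap. The class and field names below are illustrative assumptions, not identifiers from the patent.

```python
# Hypothetical sketch of one collector-table queue: jobs are lodged with
# a priority field, and equal priorities are served first-in-first-out.
import heapq

class CollectorQueue:
    """One queue of the collector table; lower number = higher priority."""
    def __init__(self, queue_id):
        self.queue_id = queue_id
        self._heap = []
        self._counter = 0  # FIFO tie-break among equal priorities

    def lodge(self, job_name, priority=1):
        # Online jobs default to high priority; the field stays
        # configurable until the job is picked up for execution.
        self._counter += 1
        heapq.heappush(self._heap, (priority, self._counter, job_name))

    def next_job(self):
        return heapq.heappop(self._heap)[2] if self._heap else None

q = CollectorQueue(queue_id=7)
q.lodge("EOD interest calculation", priority=2)  # batch job
q.lodge("online transfer", priority=1)           # online job, high priority
print(q.next_job())   # -> online transfer
```

A dispatcher polling a set of such queues would simply call `next_job()` on each in turn.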
  • a lodged job, such as Job 1 110, is picked up for dispatch by the dispatcher module 118.
  • the dispatcher module 118 is preferably a non-listening service responsible for picking up jobs at appropriate time and dispatching them to the plurality of application servers viz. 136, 138 and 140 for execution.
  • the dispatcher module 118 is configured to poll for jobs lodged in a set of queues.
  • the system may be configured to have more than one dispatcher when the demand load is high, whereby each dispatcher is configured to poll for lodged jobs in a set of queues.
  • a single dispatcher is preferably configured to poll at most ten queues.
  • in the push model, the lodged Job 1 110 gets allocated to an application server, such as application server 1 136, for execution, by populating a machine ID field of the lodged Job 1 110 with the address of that application server.
  • the Job 1 110 shall be marked as 'D-dispatched' by the dispatcher module 118.
  • Any of the known algorithms of load balancing, such as round robin and peak load distribution algorithms may be employed in the dispatcher logic 144 for populating the machine ID field of the lodged Job 1 110.
  • the Job 1 110 can be picked up for execution only by the application server to which the Job 1 110 has been allocated; in one embodiment, this is the application server 1 136.
  • In case the health status of the application server 1 136 is not conducive for execution of the Job 1 110, the Job 1 110 shall remain in the dispatched state till the health status of the application server 1 136 becomes conducive and the Job 1 110 is picked up for execution. Contrastingly, in the pull model, the machine ID field is not populated, and hence the Job 1 110 shall be marked as 'D-dispatched' by the dispatcher module 118 without allocating the Job 1 110 to any of the plurality of application servers viz. 136, 138 and 140. Further, the Job 1 110 can be picked up for execution by any one of the plurality of application servers viz. 136, 138 and 140, whose health status is conducive for executing the Job 1 110.
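The contrast between the two dispatch models can be sketched in a few lines: in the push model the dispatcher populates the machine ID field (here via round robin, one of the load-balancing algorithms the patent mentions), while in the pull model the field is left empty so any healthy server may claim the job. The function and field names are assumptions.

```python
# Sketch of the push vs. pull dispatch models described above.
import itertools

def dispatch(job, model, rr=None):
    """Mark a lodged job 'D-dispatched'; populate machine_id only in push mode."""
    job["status"] = "D-dispatched"
    if model == "push":
        # Push model: allocate to a specific server (round robin here);
        # only that server may later pick the job up.
        job["machine_id"] = next(rr)
    else:
        # Pull model: leave machine_id unset; any application server whose
        # health status is conducive may pick the job up.
        job["machine_id"] = None
    return job

rr = itertools.cycle(["app1", "app2", "app3"])
j1 = dispatch({"name": "Job 1"}, "push", rr)
j2 = dispatch({"name": "Job 2"}, "pull")
print(j1["machine_id"], j2["machine_id"])   # -> app1 None
```

Peak-load distribution or any other load-balancing algorithm could replace the round-robin iterator without changing the surrounding flow.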
  • a set of predefined parameters of the Job 1 110 are checked by the dispatcher module 118.
  • the set of predefined parameters determine whether the Job 1 110, is ready for execution.
  • the set of predefined parameters include, the priority field, a scheduled time for execution, and the availability of resources required for execution of the Job 1 110.
  • the dispatcher module 118 basically sets an executable indicator of the Job 1 110, indicating that the Job 1 110, is in the 'dispatched' state and hence ready for execution.
  • the Job 1 110 in the dispatched state is picked up for execution by the application server 1 136 only when the health status of the application server is conducive for execution.
  • the health status of application server 1 136 is determined by a resource monitoring unit 116a installed within the application server 1 136.
  • the resource monitoring unit 116a periodically checks the processor utilization, the memory utilization and the number of current jobs being executed by the application server 1 136. In case the processor utilization, memory utilization and number of current jobs being executed are below a predefined first preset level, a health status indicator 200a (refer FIG. 2) is updated in a shared memory 134a of the application server 1 136, by the resource monitoring unit 116a.
  • Setting of the health status indicator 200a signifies that the health status of the application server 1 136 is conducive for execution of at least an additional job.
  • the application server 1 136 may require the access to data records, stored within a database 104 of the database server 102.
  • when requests from the plurality of application servers viz. 136, 138 and 140 are sent to the database server 102, load on the database server 102 gets built up, as a result of which the database server 102 may cease to respond to certain requests.
  • it is essential to monitor the health status of the database server 102, before forking or picking up the Job 1 110 for execution.
  • the health status of the database server 102 is determined by an agent service of the resource monitoring unit 116a, namely the resource monitoring agent 116 installed within the database server 102.
  • the health status of the database server 102 is requested at 206 (FIG. 2) by the resource monitoring unit 116a from the resource monitoring agent 116.
  • the determined health status of the database server 102 is preferably sent by the resource monitoring agent 116 to the requesting resource monitoring unit 116a.
  • a health status indicator of the database server 202a is updated by the resource monitoring unit 116a, based on the received health status of the database server 102. Setting of the health status indicator 202a indicates that the health status of the database server 102 is conducive for execution of the Job 1 110.
  • An overall status 204a for execution of the Job 1 110 by the application server 1 136 is determined by a combination of the health status indicator 200a and the health status indicator 202a. Hence the overall status 204a is preferably set only when the health status of both the database server 102 and the application server 1 136 are conducive for executing the Job 1 110.
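The two-indicator gate just described reduces to a conjunction: the application server indicator (200a) is set when every monitored metric is below the first preset level, and the overall status (204a) is set only when both the application server and database server indicators are set. The thresholds and names below are illustrative assumptions; the patent specifies no concrete values.

```python
# Illustrative sketch of the health checks; the preset thresholds and
# field names are assumptions, not values given in the patent.

FIRST_PRESET = {"cpu": 0.80, "memory": 0.75, "jobs": 10}

def app_server_healthy(cpu, memory, running_jobs):
    """Health status indicator 200a: set when all metrics are below the
    first preset level, i.e. the server can take at least one more job."""
    return (cpu < FIRST_PRESET["cpu"]
            and memory < FIRST_PRESET["memory"]
            and running_jobs < FIRST_PRESET["jobs"])

def overall_status(app_ok, db_ok):
    """Overall status 204a: set only when both the application server
    indicator (200a) and the database server indicator (202a) are set."""
    return app_ok and db_ok

print(overall_status(app_server_healthy(0.4, 0.5, 3), db_ok=True))   # -> True
print(overall_status(app_server_healthy(0.9, 0.5, 3), db_ok=True))   # -> False
```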
  • the execution of the Job 1 110 is performed by a bin agent 122a of the application server.
  • a bin agent is usually a non-listening service configured for every application server.
  • the bin agent 122a is configured to pick the jobs allocated to the application server 1 136 by the dispatcher module 118 in the push model, and to pick jobs when the health status of the application server 1 136 is conducive in the pull model.
  • the bin agent 122a includes a job allocation module 130, for picking up a dispatched job such as Job 1 110 for execution, based on a set of predefined conditions.
  • the set of predefined conditions include the executable indicator of the Job 1 110, the health status indicator 200a of the application server 1 136, and the health status indicator 202a of the database server 102.
  • the bin agent 122a further includes a data fetch module 128, a caching module 126, a configuration module 132 and a computing engine 124.
  • the data fetch module 128 is preferably responsible for fetching a set of data records, required for execution of the Job 1 110, from the database server 102.
  • the caching module 126 is configured to store the fetched set of data records as a linked list in the cache server 120, partition the linked list into a plurality of sub-linked lists, and provide each sub-linked list with a sequential identifier such as sequential identifier 1 316 (refer FIG. 3).
  • the configuration module 132 is configured to divide a pulled job such as Job 1 110 into a plurality of sub-child jobs viz. 110a, 110b and 110c, and store each of the plurality of sub-child jobs viz. 110a, 110b and 110c in the database server 102.
  • Each sub-child job is preferably associated with at least one sub-linked list in the cache server 120.
  • a sub-child job such as 110a can be pulled from the database server 102, by the computing engine 124 for execution.
  • the computing engine 124 can access the associated sub-linked list of the sub-child job 110a, from the cache server 120, and execute the sub-child job 110a.
  • FIG. 3 illustrates an embodiment of the cache server 120.
  • the cache server 120 is basically used for storing sets of data records retrieved from the database server; for faster access of data by the plurality of application servers 136, 138 and 140.
  • the cache server 120 is preferably located centrally, and hence is preferably accessible by each of the plurality of application servers viz. 136, 138 and 140, of the system. As a result, the cache server 120 enables faster retrieval of data and thereby an enhanced system performance.
  • each lodged job, such as Job 1 110, Job 2 112, and Job n 114, is preferably represented by a Main Child Info Structure, such as Main Child Info Structure 1 300.
  • the Main Child Info Structure 1 300 includes a key, also known as the Request ID 302, that stores the Job 1 110 in a string form.
  • the Request ID 302 enables synchronizing the Job 1 110 with the associated linked list of data records.
  • the sub-child jobs field 305 indicates the number of sub-child jobs the Job 1 110 is partitioned into by the configuration module 132.
  • a sub-child record 1 308 field points to a sub-child info structure 1 314a, that contains information required for execution of a sub-child job 1.
  • the sub-child info structure 1 314a is represented by a sequential identifier 1 316, which is used for associating the sub-child job 1 with the associated sub-linked list 328a, which is pointed to by the sub-child info structure 314a.
  • the FirstRecord field 318 points to the first record, Record 1 318a, of the sub-linked list 328a, and the LastRecord field 320 stores the address of the last record viz. Record 4 318d.
  • Each record stores the Value 332 of data record, the length of the data record as ValueLength 334, and address of the next record as Next Record Pointer 336.
  • the data records are stored in the form of a serialized linked list, whereby only the addresses of the first and last records are stored in each of the sub-child info structures viz. 314a, 314b and 314x.
  • each of the sub-child info structures viz. 314a, 314b to 314x are stored in the form of a hash map in the main child info structure 300.
  • the hash map viz. 308, 310 up to 312 is basically an array of pointers.
  • Each sub-child info structure, such as sub-child info structure 1 314a, contains a field called record count 332, which signifies the number of data records associated with the sub-linked list 328a that is pointed to by the sub-child info structure 314a.
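The cache layout of FIG. 3 (a main child info structure holding a hash map of sub-child info structures, each pointing at a serialized sub-linked list through first/last record pointers) can be sketched as plain objects. The field names mirror the description, but the Python shapes are assumptions.

```python
# Sketch of the FIG. 3 cache structures; illustrative only.

class Record:
    """One node of a serialized sub-linked list."""
    def __init__(self, value):
        self.value = value              # Value 332
        self.value_length = len(value)  # ValueLength 334
        self.next_record = None         # Next Record Pointer 336

class SubChildInfo:
    """Sub-child info structure: a sequential identifier plus the
    FirstRecord/LastRecord pointers of its sub-linked list."""
    def __init__(self, seq_id, values):
        self.sequential_identifier = seq_id
        self.record_count = len(values)
        self.first_record = self.last_record = None
        for v in values:  # build the serialized linked list
            rec = Record(v)
            if self.first_record is None:
                self.first_record = rec
            else:
                self.last_record.next_record = rec
            self.last_record = rec

class MainChildInfo:
    """Main child info structure keyed by the Request ID."""
    def __init__(self, request_id, partitions):
        self.request_id = request_id          # Request ID 302, the key
        self.sub_child_jobs = len(partitions) # sub-child jobs field 305
        # Hash map of sub-child records: seq id -> sub-child info structure
        self.sub_children = {i + 1: SubChildInfo(i + 1, p)
                             for i, p in enumerate(partitions)}

m = MainChildInfo("Job1", [["r1", "r2"], ["r3", "r4"]])
print(m.sub_children[1].record_count,
      m.sub_children[1].first_record.value)   # -> 2 r1
```

Storing only the first and last record pointers per sub-list keeps each sub-child info structure small while still allowing sequential traversal and O(1) appends.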
  • a clear flag 306 of the main child info structure 1 300 is set to 'true', when Get Flag of the plurality of sub-child info structures of the main child info structure 1 300, viz. 314a, 314b, and 314x are set to 'true'.
  • Setting of the clear flag to 'true' signifies that the entire linked list of the Job 1 110 has been fetched by one or more computing engines of the plurality of application servers viz. 136, 138, and 140, for distributed execution of the Job 1 110.
  • the linked list is no longer necessary in the cache server 120 and needs to be flushed, so that a plurality of data records required for execution of the subsequent jobs, such as Job 2 112 up to Job n 114, can be stored in the cache server 120.
  • a job completion status indicator in the collector table 106, signifying the status of completion of a sub-job, is updated by a computing engine that executes the sub-job.
  • the job completion status indicator of a sub-job can be set to 'Success' if the sub-job has been executed successfully by the computing engine, and to 'Failure' if the sub-job could not be executed successfully.
  • the job completion status indicator of the plurality of sub-child jobs of Job 1 110, for instance, is continuously polled by a grid reporting agent.
  • the Job 1 110 shall be reported by the Grid Reporting Agent as being successfully completed only if the job completion status of each of the plurality of the sub-jobs, represented by the sub-child info structures viz. 314a, 314b and 314x, is set to 'Success'. If the job completion status of even one of the sub-jobs is reported as a 'Failure', the job completion status of the Job 1 110 shall be reported by the Grid Reporting Agent as a 'Failure'.
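The reporting rule above is an all-or-nothing aggregation: the parent job succeeds only if every sub-job succeeded. A minimal sketch, with assumed names:

```python
# Sketch of the grid reporting agent's aggregation rule: a job is
# 'Success' only if every sub-job's completion status is 'Success'.

def report_job_status(sub_job_statuses):
    """Return the overall completion status the grid reporting agent
    would report for the parent job."""
    if all(s == "Success" for s in sub_job_statuses):
        return "Success"
    return "Failure"  # even one failed sub-job fails the whole job

print(report_job_status(["Success", "Success", "Success"]))   # -> Success
print(report_job_status(["Success", "Failure", "Success"]))   # -> Failure
```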
  • FIG. 4 describes an embodiment of a method of practicing the instant invention.
  • a plurality of jobs submitted by at least one user, are logged in a database server.
  • a logged job is pulled by a job allocation module of a bin agent, where the bin agent is installed within an application server.
  • a set of data records are fetched at step 406, by a data fetch module of the bin agent, from the database server.
  • the set of data records are essential for execution of the pulled job.
  • the fetched set of data records are stored as a linked list in a cache server, by a caching module of the bin agent.
  • the linked list is further divided into a plurality of sub-linked lists by the caching module, and stored in the cache server, at step 410.
  • Each of the plurality of sub-linked lists is associated with a sequential identifier such as the sequential identifier 1 316.
  • the pulled job is divided into a plurality of sub-jobs in the database server, by a configuration module of the bin agent.
  • Each sub-job is associated with a sub-linked list through the sequential identifier.
  • a sub-job is pulled, by a computing engine, from the database server for execution.
  • the associated sub-linked list present in the cache server is accessed by the computing engine for execution of the sub-job.
  • the associated sub-linked list is flushed from the cache server by the caching module.
  • a job completion status indicating the success or failure of execution of the sub-job is updated in the database server, by the computing engine.
  • the job completion status is set to 'Success' if the sub-job is executed successfully by the computing engine, and is set to 'Failure', if the sub-job failed in execution.
  • a job is said to have been successfully completed, when the job completion status of all the divided sub-jobs shows a success.
  • an executable indicator of the logged job, a health status indicator of the application server and a health status indicator of the database server are monitored before pulling the logged job for execution.
  • the executable indicator of the logged job is set by a dispatcher module, based on the availability of resources required for execution of the logged job, a priority field of the logged job and a scheduled time for execution as set for the logged job.
  • the health status indicator of the application server is updated by a resource monitoring unit of the application server after every predefined time interval.
  • the health status indicator is set when processor, memory utilization and the number of current jobs being executed are below a first preset level. Setting of the health status indicator implies that the application server is capable of executing at least one additional job.
  • the resource monitoring unit can request a resource monitoring agent installed within the database server, for the health status of the database server.
  • the resource monitoring unit can update the health status indicator of the database server in a shared memory of the application server.
  • the logged job can be assigned by the dispatcher module to a particular application server, based on a dispatcher logic, when the executable indicator, the health status indicator of the application server and the health status indicator of the database server are set.
  • the dispatcher logic determines whether the execution of the logged jobs occur in a push mode or a pull mode.
  • When a machine ID field of the logged job is set to an address of an application server, the logged job shall be executed in the push mode, where the dispatcher module shall push or allocate the execution of the job to the application server. However, if the machine ID field of the logged job is not populated, then any of the application servers whose health status indicator is set can pull the job for execution. In the pull mode, the logged job can be executed by any one of the plurality of application servers that form the computing grid, whereas in the push mode the logged job shall remain in the database server until the health status indicator of the application server to which the logged job is assigned is set.
  • FIG. 5 illustrates a generalized example of a computing environment 500.
  • the computing environment 500 is not intended to suggest any limitation as to scope of use or functionality of described embodiments.
  • the computing environment 500 includes at least one processing unit 510 and memory 520.
  • the processing unit 510 executes computer-executable instructions and may be a real or a virtual processor. In a multi-processing system, multiple processing units execute computer- executable instructions to increase processing power.
  • the memory 520 may be volatile memory (e.g., registers, cache, RAM), non-volatile memory (e.g., ROM, EEPROM, flash memory, etc.), or some combination of the two. In some embodiments, the memory 520 stores software 580 implementing described techniques.
  • a computing environment may have additional features.
  • the computing environment 500 includes storage 540, one or more input devices 550, one or more output devices 560, and one or more communication connections 570.
  • An interconnection mechanism such as a bus, controller, or network interconnects the components of the computing environment 500.
  • operating system software provides an operating environment for other software executing in the computing environment 500, and coordinates activities of the components of the computing environment 500.
  • the storage 540 may be removable or non-removable, and includes magnetic disks, magnetic tapes or cassettes, CD-ROMs, CD-RWs, DVDs, or any other medium which can be used to store information and which can be accessed within the computing environment 500.
  • the storage 540 stores instructions for the software 580.
  • the input device(s) 550 may be a touch input device such as a keyboard, mouse, pen, trackball, touch screen, or game controller, a voice input device, a scanning device, a digital camera, or another device that provides input to the computing environment 500.
  • the output device(s) 560 may be a display, printer, speaker, or another device that provides output from the computing environment 500.
  • the communication connection(s) 570 enable communication over a communication medium to another computing entity.
  • the communication medium conveys information such as computer- executable instructions, audio or video information, or other data in a modulated data signal.
  • a modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
  • communication media include wired or wireless techniques implemented with an electrical, optical, RF, infrared, acoustic, or other carrier.
  • Computer-readable media are any available media that can be accessed within a computing environment.
  • Computer-readable media include memory 520, storage 540, communication media, and combinations of any of the above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)
  • Multi Processors (AREA)

Description

A METHOD AND SYSTEM FOR DISTRIBUTED COMPUTING OF JOBS
FIELD OF THE INVENTION
The present invention relates generally to a system and method for dynamically allocating and provisioning computing resources in a distributed shared computing environment. More particularly, the present invention relates to a system and method for executing jobs in a multi- machine environment comprising of a plurality of application servers.
BACKGROUND
System Performance is a central aspect in enterprise systems deployed in various businesses such as Information Technology, banking industry, telecommunication, military, health care and financial services. Few models concerning system performance for the enterprise systems have been continuously researched in the past. However, framework of the models used in the past has been based on anticipating an expected load on the enterprise systems rather than on modeling the system based on a runtime load. Anticipating the expected load on the enterprise systems has been known to lead to system failures and hence a poor system performance leading to customer dissatisfaction and a loss of revenue to a business running the enterprise systems.
Additionally, several performance engineering techniques for improving the system performance have been deployed in the prior art. In certain conventional enterprise applications, computing resources of the enterprise applications are manually assigned and provisioned for meeting various application demands. The computing resources are manually assigned keeping in view a demand level at a fixed point in time. Such an approach is ill-equipped to handle increasing and decreasing demand levels of applications. As the demand level is usually fixed as per the peak demand level, the assigned computing resources are underutilized during periods of less than peak demand. Certain enterprise systems utilize the technique of performance analysis based on load testing. However, this technique is inflexible since it relates specifically to the load existing on the system at a particular point in time. Also, performance modeling tools, based on statistical analysis and mathematical modeling, are used as another technique for improving system performance; however, these tools are known to be very expensive, involving major in-house resources to operate, and typically only focus on a particular aspect of the overall system. Certain other enterprise systems employ performance monitoring tools for monitoring the system performance periodically; however, such tools tend to report problems either too early, resulting in unnecessary expenditure, or too late, resulting in system failure. Hence, none of the existing systems deploy an integrated framework that can improve the overall performance of the enterprise systems at an affordable cost.
There is a need for a system that integrates various performance engineering tools into one framework that can adaptively meet varying load demands at different points of time, by automatically provisioning computing resources of the system. Further, there is a need for a framework that can optimize the system performance in a cost effective manner.
SUMMARY
The present invention discloses a system and method for distributed computing of a plurality of jobs submitted by a plurality of users. In accordance with a disclosed embodiment, the system comprises a database server configured to log the plurality of jobs and one or more application servers configured to execute the plurality of jobs by a bin agent. The bin agent of an application server further comprises a job allocation module, configured to pull a logged job from the database server, and a data fetch module, configured to fetch a set of data records essential for execution of the logged job from the database server. Additionally, the bin agent comprises a configuration module, configured to divide the logged job into a plurality of sub-jobs for distributed execution of the logged job, and a caching module, configured to store the fetched set of data records in the form of a plurality of sub-linked lists in a cache server, where each sub-linked list is associated with a sub-job of the plurality of sub-jobs. Finally, the bin agent comprises a computing engine configured to execute the sub-job by accessing the associated sub-linked list.
In an additional embodiment, a method for distributed computing of a plurality of jobs is disclosed. The disclosed embodiment may include logging the plurality of jobs in a database server; pulling, by a bin agent, the logged job for execution by an application server, where the bin agent exists within the application server; fetching, by the bin agent, a set of data records, required for execution of the logged job, from the database server; storing the fetched set of data records as a linked list in a cache server; dividing the linked list into a plurality of sub-linked lists, where each sub-linked list is associated with a sequential identifier; dividing the logged job into a plurality of sub-jobs in the database server, where each sub-job is associated with at least one sub-linked list in the cache server; and executing the sub-job by the bin agent, by accessing the associated sub-linked list from the cache server.
These and other features, aspects, and advantages of the present invention will be better understood with reference to the following description and claims.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic illustration, in accordance with an embodiment, of an environment for distributed computing of a plurality of jobs by one or more application servers communicatively coupled with a database server;
FIG. 2 is a schematic diagram illustrating a system for tracking the health status of the database server and the one or more application servers, in accordance with an embodiment of the invention;
FIG. 3 is a schematic diagram of a cache server for storing a plurality of data records fetched from the database server, as a serialized linked list, in accordance with an embodiment of the invention;
FIG. 4 is a flowchart illustrating an embodiment of a method for distributed computing of a plurality of jobs by one or more application servers; and
FIG. 5 illustrates a generalized example of a computing environment 500.
While the system and method of the present invention are described herein by way of example and embodiments, those skilled in the art recognize that the system and method for distributed computing of jobs are not limited to the embodiments or drawings described. It should be understood that the drawings and description are not intended to be limiting to the particular form disclosed. Rather, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the appended claims. Any headings used herein are for organizational purposes only and are not meant to limit the scope of the description or the claims. As used herein, the word "may" is used in a permissive sense (i.e., meaning having the potential to) rather than the mandatory sense (i.e., meaning must). Similarly, the words "include", "including", and "includes" mean including, but not limited to.
DETAILED DESCRIPTION
In the following detailed description, examples are provided only for a thorough understanding of the present invention; the examples in no way limit the scope of the invention. The present invention may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. In other embodiments, well known methods, procedures, components and circuitry have been described at a relatively high level, without detail, in order to avoid unnecessarily obscuring aspects of the present invention.
The present invention is directed to a method and system for distributed computing of varying demand loads on the system by a plurality of shared computing resources. The plurality of shared computing resources includes a plurality of application servers and service-oriented components hosted in a computing grid environment. The shared computing resources can be localized in a particular location or can be distributed over a wide geographical area. Further, the varying demand loads can be visualized as execution of a plurality of jobs submitted by one or more users of the system.
FIG. 1 illustrates an environment 100 in which various embodiments of the invention can be practiced. The environment 100 includes a plurality of users 144a, 144b to 144x, who submit a plurality of jobs viz. Job 1 142a, Job 2 142b to Job n 142x for execution, a collector table 106 installed in a database server 102, a dispatcher module 118, a cache server 120, and a plurality of application servers such as application server 1 136, application server 2 138, and application server n 140. The plurality of jobs viz. 142a, 142b to 142x can be of various types such as online jobs, batch jobs and channel jobs. Online jobs usually execute only once at a predetermined time, whereas batch jobs can occur repeatedly, at predefined time intervals. In one embodiment of the present invention, an instance of a batch job in the system deployed for a banking institution may be an EOD BOD (End of Day Beginning of Day) operation that needs to be repeated at a time interval of 24 hours. An instance of the EOD BOD operation of the banking institution can be an interest calculation for all current accounts held by all customers of the banking institution. Alternatively, instances of EOD BOD operations could be inter-sol reconciliation, processing of the renewal of all term deposit accounts held at the banking institution and making a business day's limit available to all the customers.
In an embodiment of the invention, the collector table 106 is configured to accept and store the plurality of jobs viz. 142a, 142b to 142x that are submitted to the system for execution. The jobs that are stored in the collector table 106 are said to be in the lodged state. In FIG. 1, Job 1 110, Job 2 112, and Job n 114 are hereinafter referred to as lodged jobs. The collector table 106 is logically represented by a set of queues, where each queue, such as queue 108, holds a set of jobs based on certain predefined conditions such as priority. In one embodiment, the jobs are placed in different queues based on the predefined conditions. The predefined condition of priority is set by updating a priority field, a parameter of a job. The priority field determines the order of execution of the job within a job group. The job group usually represents a set of jobs 142a to 142x allocated to a particular application server for execution. The jobs 142a to 142x that can be executed in parallel usually have the same priority. The priority field of a job 142a to 142x can be set by the user, such as user 1 144a. Online jobs are usually given a default priority of high. However, as the priority field is configurable, once a job is submitted or lodged, the priority field of the job can be changed to a new order of priority. In such a case, the default priority of the job shall be overridden by the new order of priority. Hence, in an instance where the system receives critical batch jobs, the priority field of the batch jobs can be set to high, instead of that of the online jobs. Further, each queue is identified by a unique ID, known as the queue ID. Further, each queue is configured to store a plurality of information associated with the plurality of jobs viz. 142a, 142b to 142x.
The plurality of information includes parameters of the plurality of jobs, parameters of the plurality of users who submitted the plurality of jobs, the application servers on which the plurality of jobs are being executed and a status of completion of each of the plurality of jobs.
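Purely as an illustrative sketch (the patent does not specify an implementation; the class, the method names and the numeric priority convention below are all assumptions), the priority-ordered queue behavior described above can be modeled with a heap, where a lower number denotes a higher priority and jobs lodged earlier win ties:

```python
import heapq

# Illustrative model of a collector-table queue (names and the numeric
# priority convention are assumptions): lower number = higher priority,
# and jobs lodged earlier win ties.
class CollectorQueue:
    def __init__(self, queue_id):
        self.queue_id = queue_id   # unique queue ID, as described above
        self._heap = []
        self._counter = 0          # tie-breaker preserving lodging order

    def lodge(self, job_name, priority):
        heapq.heappush(self._heap, (priority, self._counter, job_name))
        self._counter += 1

    def next_job(self):
        # Returns the highest-priority lodged job, or None if the queue is empty.
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

q = CollectorQueue(queue_id=1)
q.lodge("batch_eod_bod", priority=5)
q.lodge("online_balance_check", priority=0)  # online jobs default to high
print(q.next_job())  # → online_balance_check
```

In this sketch, overriding a lodged job's priority, as the passage above allows, would amount to lodging it again with the new priority value.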
A lodged job, such as Job 1 110, can be allocated to an application server such as application server 1 136 in the push model or the pull model, by the dispatcher module 118, on the basis of a dispatcher logic 144. The dispatcher module 118 is preferably a non-listening service responsible for picking up jobs at the appropriate time and dispatching them to the plurality of application servers viz. 136, 138 and 140 for execution. The dispatcher module 118 is configured to poll for jobs lodged in a set of queues. The system may be configured to have more than one dispatcher when the demand load is high, whereby each dispatcher is configured to poll for lodged jobs in a set of queues. In one disclosed embodiment, a single dispatcher is preferably configured to poll approximately ten queues.
In the push model, the lodged Job 1 110 gets allocated to an application server, such as application server 1 136, for execution, by populating a machine ID field of the lodged Job 1 110 with the address of the application server. On being allocated to an application server, the Job 1 110 shall be marked as 'D-dispatched' by the dispatcher module 118. Any of the known algorithms of load balancing, such as round robin and peak load distribution algorithms, may be employed in the dispatcher logic 144 for populating the machine ID field of the lodged Job 1 110. In the push model, the Job 1 110 can be picked up for execution only by the application server to which the Job 1 110 has been allocated; in one embodiment, this application server may be application server 1 136. In case the health status of the application server 1 136 is not conducive for execution of the Job 1 110, the Job 1 110 shall remain in the dispatched state till the health status of the application server 1 136 becomes conducive and the Job 1 110 is picked up for execution. Contrastingly, in the pull model, the machine ID field is not populated, and hence the Job 1 110 shall be marked as 'D-dispatched' by the dispatcher module 118 without allocating the Job 1 110 to any of the plurality of application servers viz. 136, 138 and 140. Further, the Job 1 110 can be picked up for execution by any one of the plurality of application servers viz. 136, 138 and 140 whose health status is conducive for executing the Job 1 110. Further, before marking the Job 1 110 as dispatched, a set of predefined parameters of the Job 1 110 is checked by the dispatcher module 118. The set of predefined parameters determines whether the Job 1 110 is ready for execution. The set of predefined parameters includes the priority field, a scheduled time for execution, and the availability of resources required for execution of the Job 1 110.
When the set of predefined parameters of the Job 1 110 is satisfied, the dispatcher module 118 basically sets an executable indicator of the Job 1 110, indicating that the Job 1 110 is in the 'dispatched' state and hence ready for execution.
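As a hedged illustration of the readiness check described above (the dictionary-based job representation, the field names and the helper functions are assumptions, not taken from the patent), a job is marked 'D-dispatched' only when its priority, scheduled time and resource availability are satisfied, and the machine ID field is populated only in the push model:

```python
import time

# Hedged sketch of the dispatcher's readiness check (the dict-based job
# representation and all field names are assumptions, not from the patent).
def ready_for_dispatch(job, now=None):
    now = time.time() if now is None else now
    return (job.get("priority") is not None
            and job.get("scheduled_time", 0) <= now
            and job.get("resources_available", False))

def dispatch(job, machine_id=None, now=None):
    """Mark the job 'D-dispatched'. Push model if machine_id is given;
    pull model (no machine ID populated) otherwise."""
    if not ready_for_dispatch(job, now):
        return job  # not ready: the job stays lodged
    job["state"] = "D-dispatched"
    if machine_id is not None:
        job["machine_id"] = machine_id  # only this server may pick it up
    return job

job = {"priority": 1, "scheduled_time": 0, "resources_available": True}
dispatch(job, machine_id="app_server_1")
print(job["state"])  # → D-dispatched
```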
As aforementioned, the Job 1 110 in the dispatched state is picked up for execution by the application server 1 136 only when the health status of the application server is conducive for execution. The health status of the application server 1 136 is determined by a resource monitoring unit 116a installed within the application server 1 136. The resource monitoring unit 116a periodically checks the processor utilization, the memory utilization and the number of current jobs being executed by the application server 1 136. In case the processor utilization, the memory utilization and the number of current jobs being executed are below a first preset level, a health status indicator 200a (refer FIG. 2) is updated in a shared memory 134a of the application server 1 136 by the resource monitoring unit 116a. Setting of the health status indicator 200a signifies that the health status of the application server 1 136 is conducive for execution of at least one additional job. For execution of the Job 1 110, the application server 1 136 may require access to data records stored within a database 104 of the database server 102. As multiple such requests for access to data records by the plurality of application servers viz. 136, 138 and 140 are sent to the database server 102, load on the database server 102 builds up, as a result of which the database server 102 may cease to respond to certain requests. Hence, it is essential to monitor the health status of the database server 102 before forking or picking up the Job 1 110 for execution. The health status of the database server 102 is determined by an agent service of the resource monitoring unit 116a, namely the resource monitoring agent 116, installed within the database server 102. In FIG. 2, the health status of the database server 102 is requested by the resource monitoring unit 116a at 206 (FIG. 2) from the resource monitoring agent 116.
The determined health status of the database server 102 is preferably sent by the resource monitoring agent 116 to the requesting resource monitoring unit 116a. A health status indicator of the database server 202a is updated by the resource monitoring unit 116a, based on the received health status of the database server 102. Setting of the health status indicator 202a indicates that the health status of the database server 102 is conducive for execution of the Job 1 110. An overall status 204a for execution of the Job 1 110 by the application server 1 136 is determined by a combination of the health status indicator 200a and the health status indicator 202a. Hence, the overall status 204a is preferably set only when the health statuses of both the database server 102 and the application server 1 136 are conducive for executing the Job 1 110.
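The combination of indicators described above can be restated as a small sketch (the threshold values in `FIRST_PRESET` and the function names are illustrative assumptions; the patent states only "below a first preset level"): the overall status 204a is set only when both the application-server indicator 200a and the database-server indicator 202a are set:

```python
# Sketch of the health-status combination (threshold values and names are
# illustrative assumptions).
FIRST_PRESET = {"cpu": 0.80, "memory": 0.80, "jobs": 50}

def app_server_healthy(cpu, memory, running_jobs, limits=FIRST_PRESET):
    # Corresponds to setting health status indicator 200a: processor,
    # memory and current-job count must all be below the first preset level.
    return (cpu < limits["cpu"]
            and memory < limits["memory"]
            and running_jobs < limits["jobs"])

def overall_status(app_indicator, db_indicator):
    # Overall status 204a: set only when indicators 200a AND 202a are set.
    return app_indicator and db_indicator

print(overall_status(app_server_healthy(0.40, 0.55, 12), True))  # → True
```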
The execution of the Job 1 110 is performed by a bin agent 122a of the application server. A bin agent is usually a non-listening service configured for every application server. In the disclosed embodiment, the bin agent 122a is configured to pick the jobs allocated to the application server 1 136 by the dispatcher module 118 in the push model, and to pick jobs when the health status of the application server 1 136 is conducive in the pull model. The bin agent 122a includes a job allocation module 130, for picking up a dispatched job such as Job 1 110 for execution, based on a set of predefined conditions. The set of predefined conditions includes the executable indicator of the Job 1 110, the health status indicator 200a of the application server 1 136, and the health status indicator 202a of the database server 102. When each of the set of predefined conditions is satisfied, the Job 1 110 shall be pulled for execution in the pull mode. The bin agent 122a further includes a data fetch module 128, a caching module 126, a configuration module 132 and a computing engine 124. The data fetch module 128 is preferably responsible for fetching a set of data records, required for execution of the Job 1 110, from the database server 102. The caching module 126 is configured to store the fetched set of data records as a linked list in the cache server 120, partition the linked list into a plurality of sub-linked lists, and provide each sub-linked list with a sequential identifier such as sequential identifier 1 316 (refer FIG. 3). The configuration module 132 is configured to divide a pulled job such as Job 1 110 into a plurality of sub-child jobs viz. 110a, 110b and 110c, and store each of the plurality of sub-child jobs viz. 110a, 110b and 110c in the database server 102. Each sub-child job is preferably associated with at least one sub-linked list in the cache server 120.
A sub-child job such as 110a can be pulled from the database server 102 by the computing engine 124 for execution. The computing engine 124 can access the associated sub-linked list of the sub-child job 110a from the cache server 120, and execute the sub-child job 110a.
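One way to picture the division of a pulled job's records into sub-linked lists, one per sub-child job, is the contiguous-chunk sketch below (the chunking strategy and the function name are assumptions; the patent specifies only that each sub-linked list carries a sequential identifier):

```python
# Contiguous-chunk sketch of splitting a job's fetched records into
# sub-linked lists keyed by sequential identifier (strategy and name assumed).
def partition_records(records, n_sub_jobs):
    chunk = -(-len(records) // n_sub_jobs)  # ceiling division
    return {i + 1: records[i * chunk:(i + 1) * chunk]
            for i in range(n_sub_jobs)}

records = [f"account_{i}" for i in range(10)]
sub_lists = partition_records(records, 3)
print({seq_id: len(recs) for seq_id, recs in sub_lists.items()})
# → {1: 4, 2: 4, 3: 2}
```

Each key plays the role of a sequential identifier such as 316, each value is one sub-linked list, and a sub-child job would execute against exactly one of them.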
FIG. 3 illustrates an embodiment of the cache server 120. The cache server 120 is basically used for storing sets of data records retrieved from the database server, for faster access of data by the plurality of application servers 136, 138 and 140. The cache server 120 is preferably located centrally, and hence is preferably accessible by each of the plurality of application servers viz. 136, 138 and 140 of the system. As a result, the cache server 120 enables faster retrieval of data and thereby an enhanced system performance. In the disclosed embodiment, each lodged job, such as Job 1 110, Job 2 112, and Job n 114, is preferably represented by a Main Child Info Structure, such as Main Child Info Structure 1 300. The Main Child Info Structure 1 300 includes a key, also known as the Request ID 302, that stores the Job 1 110 in a string form. The Request ID 302 enables synchronizing the Job 1 110 with the associated linked list of data records. The sub-child jobs field 305 illustrates the number of sub-child jobs into which the Job 1 110 is partitioned by the configuration module 132. A sub-child record 1 308 field points to a sub-child info structure 1 314a that contains information required for execution of a sub-child job 1. The sub-child info structure 1 314a is represented by a sequential identifier 1 316, which is used for associating the sub-child job 1 with the associated sub-linked list 328a, which is pointed to by the sub-child info structure 314a. The FirstRecord field 318 points to the first record, Record 1 318a, of the sub-linked list 328a, and the LastRecord field 320 stores the address of the last record viz. Record 4 318d. Each record stores the Value 332 of the data record, the length of the data record as ValueLength 334, and the address of the next record as Next Record Pointer 336.
Thus, as known in the art, the data records are stored in the form of a serialized linked list, whereby only the addresses of the first and last records are stored in each of the sub-child info structures viz. 314a, 314b and 314x.
The addresses of each of the sub-child info structures viz. 314a, 314b to 314x are stored in the form of a hash map in the main child info structure 300. The hash map viz. 308, 310 up to 312 is basically an array of pointers. Each sub-child info structure, such as sub-child info structure 1 314a, contains a field called record count 332, which signifies the number of data records associated with the sub-linked list 328a that is pointed to by the sub-child info structure 314a. When all the data records signified by the record count 332 are fetched from the database server and stored in the sub-linked list 328a, a Put Flag 326 can be set to 'True'. When all the data records of the sub-linked list 328a are accessed by the computing engine 124 for execution of the sub-child job 1, a count of the number of data records accessed by the computing engine 124 is sent to the cache server. When the count equals the record count 332, a Get Flag 324 is set to 'True'. The count of the number of data records accessed by the computing engine 124 can be received from the collector table 106. On setting the Get Flag 324 to 'True', all the data records of the sub-linked list 328a are flushed from the cache server 120, thereby making space for new data records. A clear flag 306 of the main child info structure 1 300 is set to 'True' when the Get Flags of the plurality of sub-child info structures of the main child info structure 1 300, viz. 314a, 314b, and 314x, are set to 'True'. Setting of the clear flag to 'True' signifies that the entire linked list of the Job 1 110 has been fetched by one or more computing engines of the plurality of application servers viz. 136, 138, and 140, for distributed execution of the Job 1 110. Hence, the linked list is no longer necessary in the cache server 120 and needs to be flushed, so that a plurality of data records required for execution of the subsequent jobs, such as Job 2 112 up to Job n 114, can be stored in the cache server 120.
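The interplay of the Put Flag, Get Flag and clear flag described above may be modeled by the following minimal sketch (field names follow FIG. 3; any behavior beyond what is stated above, such as flushing on full access, is a simplifying assumption):

```python
# Minimal model of the FIG. 3 cache structures (behavior partly assumed).
class SubChildInfo:
    def __init__(self, seq_id, record_count):
        self.seq_id = seq_id
        self.record_count = record_count  # record count 332
        self.records = []                 # the sub-linked list
        self.put_flag = False             # Put Flag 326: all records stored
        self.get_flag = False             # Get Flag 324: all records read
        self.accessed = 0

    def put(self, value):
        self.records.append(value)
        if len(self.records) == self.record_count:
            self.put_flag = True

    def get_all(self):
        for rec in self.records:
            self.accessed += 1
            yield rec
        if self.accessed == self.record_count:
            self.get_flag = True
            self.records = []             # flush the sub-linked list

class MainChildInfo:
    def __init__(self, request_id, sub_infos):
        self.request_id = request_id      # Request ID 302
        self.sub_infos = sub_infos

    @property
    def clear_flag(self):
        # Clear flag 306: set only once every sub-linked list is fully read.
        return all(s.get_flag for s in self.sub_infos)

sub = SubChildInfo(seq_id=1, record_count=2)
sub.put("rec_a"); sub.put("rec_b")
main = MainChildInfo("job_1", [sub])
list(sub.get_all())
print(sub.put_flag, sub.get_flag, main.clear_flag)  # → True True True
```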
A job completion status indicator in the collector table 106, signifying the status of completion of a sub-job, is updated by the computing engine that executes the sub-job. The job completion status indicator of a sub-job can be set to 'Success' if the sub-job has been executed successfully by the computing engine, and to 'Failure' if the sub-job could not be executed successfully. The job completion status indicators of the plurality of sub-child jobs of the Job 1 110, for instance, are continuously polled by a grid reporting agent. The Job 1 110 shall be reported by the grid reporting agent as being successfully completed only if the job completion status of each of the plurality of sub-jobs, represented by the sub-child info structures viz. 314a, 314b and 314x, is set to 'Success'. If the job completion status of even one of the sub-jobs is reported as a 'Failure', the job completion status of the Job 1 110 shall be reported by the grid reporting agent as a 'Failure'.
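The grid reporting agent's aggregation rule amounts to a logical AND over the sub-job statuses, as in this one-function sketch (the function name is an illustrative assumption):

```python
def job_completion(sub_job_statuses):
    # Grid-reporting rule from the passage above: 'Success' only if every
    # sub-job succeeded; a single 'Failure' fails the whole job.
    return "Success" if all(s == "Success" for s in sub_job_statuses) else "Failure"

print(job_completion(["Success", "Success", "Success"]))  # → Success
print(job_completion(["Success", "Failure", "Success"]))  # → Failure
```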
FIG. 4 describes an embodiment of a method of practicing the instant invention. In step 402, a plurality of jobs, submitted by at least one user, is logged in a database server. At step 404, a logged job is pulled by a job allocation module of a bin agent, where the bin agent is installed within an application server. A set of data records is fetched at step 406, by a data fetch module of the bin agent, from the database server. The set of data records is essential for execution of the pulled job. Further, at step 408, the fetched set of data records is stored as a linked list in a cache server, by a caching module of the bin agent. The linked list is further divided into a plurality of sub-linked lists by the caching module and stored in the cache server at step 410. Each of the plurality of sub-linked lists is associated with a sequential identifier such as the sequential identifier 1 316. At step 412, the pulled job is divided into a plurality of sub-jobs in the database server, by a configuration module of the bin agent. Each sub-job is associated with a sub-linked list through the sequential identifier. At step 414, a sub-job is pulled, by a computing engine, from the database server for execution. The associated sub-linked list present in the cache server is accessed by the computing engine for execution of the sub-job. On completing the execution of the sub-job, the associated sub-linked list is flushed from the cache server by the caching module. A job completion status indicating the success or failure of execution of the sub-job is updated in the database server by the computing engine. The job completion status is set to 'Success' if the sub-job is executed successfully by the computing engine, and is set to 'Failure' if the sub-job failed in execution. In the disclosed embodiment, a job is said to have been successfully completed when the job completion status of all the divided sub-jobs shows a success.
In the disclosed embodiment, an executable indicator of the logged job, a health status indicator of the application server and a health status indicator of the database server are monitored before pulling the logged job for execution. The executable indicator of the logged job is set by a dispatcher module, based on the availability of resources required for execution of the logged job, a priority field of the logged job and a scheduled time for execution as set for the logged job. The health status indicator of the application server is updated by a resource monitoring unit of the application server after every predefined time interval. The health status indicator is set when the processor utilization, the memory utilization and the number of current jobs being executed are below a first preset level. Setting of the health status indicator implies that the application server is capable of executing at least one additional job. Further, the resource monitoring unit can request a resource monitoring agent installed within the database server for the health status of the database server. On receiving the updated health status of the database server, the resource monitoring unit can update the health status indicator of the database server in a shared memory of the application server. Alternatively, in the disclosed embodiment, the logged job can be assigned by the dispatcher module to a particular application server, based on a dispatcher logic, when the executable indicator, the health status indicator of the application server and the health status indicator of the database server are set. The dispatcher logic determines whether the execution of the logged jobs occurs in a push mode or a pull mode. When a machine ID field of the logged job is set to an address of an application server, the logged job shall be executed in the push mode, where the dispatcher module shall push or allocate the execution of the job to that application server.
However, if the machine ID field of the logged job is not populated, then any of the application servers whose health status indicator is set can pull the job for execution. In the pull mode, the logged job can be executed by any one of the plurality of application servers that form the computing grid, whereas in the push mode the logged job shall remain in the database server until the health status indicator of the application server to which the logged job is assigned is set.
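A compact sketch of the pickup rule spanning both modes (the dictionary representation of a job and the `can_pick_up` name are assumptions) might read:

```python
# Sketch of a bin agent's pickup test combining the two modes (the dict
# representation and the function name are assumptions).
def can_pick_up(job, server_address, health_indicator_set):
    if not health_indicator_set or job.get("state") != "D-dispatched":
        return False
    machine_id = job.get("machine_id")
    if machine_id is None:
        return True                      # pull mode: any healthy server
    return machine_id == server_address  # push mode: only the addressed server

push_job = {"state": "D-dispatched", "machine_id": "app_server_1"}
print(can_pick_up(push_job, "app_server_1", True))   # → True
print(can_pick_up(push_job, "app_server_2", True))   # → False
```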
One or more of the above-described techniques can be implemented in one or more computer systems. FIG. 5 illustrates a generalized example of a computing environment 500. The computing environment 500 is not intended to suggest any limitation as to scope of use or functionality of described embodiments.
With reference to FIG. 5, the computing environment 500 includes at least one processing unit 510 and memory 520. In FIG. 5, this most basic configuration 530 is included within a dashed line. The processing unit 510 executes computer-executable instructions and may be a real or a virtual processor. In a multi-processing system, multiple processing units execute computer-executable instructions to increase processing power. The memory 520 may be volatile memory (e.g., registers, cache, RAM), non-volatile memory (e.g., ROM, EEPROM, flash memory, etc.), or some combination of the two. In some embodiments, the memory 520 stores software 580 implementing described techniques.
A computing environment may have additional features. For example, the computing environment 500 includes storage 540, one or more input devices 550, one or more output devices 560, and one or more communication connections 570. An interconnection mechanism (not shown) such as a bus, controller, or network interconnects the components of the computing environment 500. Typically, operating system software (not shown) provides an operating environment for other software executing in the computing environment 500, and coordinates activities of the components of the computing environment 500.
The storage 540 may be removable or non-removable, and includes magnetic disks, magnetic tapes or cassettes, CD-ROMs, CD-RWs, DVDs, or any other medium which can be used to store information and which can be accessed within the computing environment 500. In some embodiments, the storage 540 stores instructions for the software 580.
The input device(s) 550 may be a touch input device such as a keyboard, mouse, pen, trackball, touch screen, or game controller, a voice input device, a scanning device, a digital camera, or another device that provides input to the computing environment 500. The output device(s) 560 may be a display, printer, speaker, or another device that provides output from the computing environment 500.
The communication connection(s) 570 enable communication over a communication medium to another computing entity. The communication medium conveys information such as computer-executable instructions, audio or video information, or other data in a modulated data signal. A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media include wired or wireless techniques implemented with an electrical, optical, RF, infrared, acoustic, or other carrier.
Implementations can be described in the general context of computer-readable media. Computer-readable media are any available media that can be accessed within a computing environment. By way of example, and not limitation, within the computing environment 500, computer-readable media include memory 520, storage 540, communication media, and combinations of any of the above.
Having described and illustrated the principles of our invention with reference to described embodiments, it will be recognized that the described embodiments can be modified in arrangement and detail without departing from such principles. It should be understood that the programs, processes, or methods described herein are not related or limited to any particular type of computing environment, unless indicated otherwise. Various types of general purpose or specialized computing environments may be used with or perform operations in accordance with the teachings described herein. Elements of the described embodiments shown in software may be implemented in hardware and vice versa.
As will be appreciated by those of ordinary skill in the art, the foregoing examples, demonstrations, and method steps may be implemented by suitable code on a processor-based system, such as a general-purpose or special-purpose computer. It should also be noted that different implementations of the present technique may perform some or all of the steps described herein in different orders or substantially concurrently, that is, in parallel. Furthermore, the functions may be implemented in a variety of programming languages. Such code, as will be appreciated by those of ordinary skill in the art, may be stored or adapted for storage in one or more tangible machine-readable media, such as on memory chips, local or remote hard disks, optical disks or other media, which may be accessed by a processor-based system to execute the stored code. Note that the tangible media may comprise paper or another suitable medium upon which the instructions are printed. For instance, the instructions may be electronically captured via optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
The following description is presented to enable a person of ordinary skill in the art to make and use the invention and is provided in the context of the requirement of obtaining a patent. The present description is the best presently-contemplated method for carrying out the present invention. Various modifications to the preferred embodiment will be readily apparent to those skilled in the art, the generic principles of the present invention may be applied to other embodiments, and some features of the present invention may be used without the corresponding use of other features. Accordingly, the present invention is not intended to be limited to the embodiment shown but is to be accorded the widest scope consistent with the principles and features described herein.
While the foregoing has described certain embodiments and the best mode of practicing the invention, it is understood that various implementations, modifications and examples of the subject matter disclosed herein may be made. It is intended by the following claims to cover the various implementations, modifications, and variations that may fall within the scope of the subject matter described.

Claims

We Claim:
1. A system for distributed computing of at least one of a plurality of jobs, the system comprising: a. a database server configured to: log the plurality of jobs submitted by at least one user; and b. one or more application servers, whereby each application server comprises a bin agent for executing a job allocated to the application server, the bin agent further comprising: a job allocation module, configured to pull a logged job from the database server, based on a set of predefined conditions; a data fetch module, configured to fetch a set of data records from the database server, whereby the set of data records is essential for executing the pulled job; a caching module, configured to: store the fetched set of data records, as a linked list in a cache server; partition the linked list into a plurality of sub-linked lists; and associate each sub-linked list with a sequential identifier, whereby each sub-linked list is accessible by the one or more application servers; a configuration module, configured to: divide the pulled job into a plurality of sub-jobs, whereby each sub-job is associated with at least one sub-linked list based on the sequential identifier; and store the plurality of sub-jobs in the database server; a computing engine configured to: pull a sub-job from the database server; access the at least one associated sub-linked list from the cache server; and execute the sub-job.
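The partitioning performed by the caching module of claim 1 can be sketched as follows. This is an illustrative, non-limiting example only: the function name, the ceiling-division chunking policy, and the use of a plain Python dict as a stand-in for the cache server are assumptions, not taken from the specification.

```python
def partition_records(records, num_sublists):
    """Split a fetched set of data records into sub-lists, each keyed by a
    sequential identifier, in the spirit of the caching module of claim 1.
    Ceiling division ensures every record lands in some sub-list."""
    size = max(1, -(-len(records) // num_sublists))  # ceil(len / num)
    return {seq_id: records[start:start + size]
            for seq_id, start in enumerate(range(0, len(records), size))}
```

Each sub-job would then be associated with one or more of these sequential identifiers, so that any computing engine on any application server can locate its portion of the data.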
2. The system of claim 1, wherein the database server comprises a collector table, the collector table configured to: store a plurality of information associated with the plurality of jobs, whereby the plurality of information of a job comprises parameters of the job, parameters of the user, the application server executing the job, and a job completion status.
3. The system of claim 2, wherein the computing engine is further configured to: update the job completion status of the sub-job in the collector table, on completing the execution of the sub-job.
4. The system of claim 2, wherein the plurality of jobs comprises one or more of a batch job, an online job, and a channel job.
5. The system of claim 1, wherein the set of predefined conditions comprise: an executability indicator of the job; a health status indicator of the application server; and a health status indicator of the database server.
6. The system of claim 5, wherein each application server, further comprises a resource monitoring unit, configured to: set the health status indicator of the application server after every predefined time interval, when memory utilization, processor utilization, and a count of jobs being executed by the application server are below a first preset level; and set the health status indicator of the database server after every predefined time interval, when memory and processor utilization of the database server are below a second preset level.
7. The system of claim 6, further comprising at least one dispatcher module configured to: set the executability indicator of the logged job, when a set of predefined parameters required for execution of a logged job are satisfied; and assign a logged job to a bin agent of an application server, when the executability indicator, the health status indicator of the application server and the health status indicator of the database server are set.
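The gating logic of claims 5 to 7 can be illustrated with a minimal sketch. The threshold values, limits, and function names below are hypothetical, chosen only to show the shape of the checks; the specification does not fix any particular numbers.

```python
def app_server_healthy(mem_util, cpu_util, active_jobs,
                       mem_limit=0.8, cpu_limit=0.8, job_limit=16):
    """Claim 6: the application-server health indicator is set only when
    memory utilization, processor utilization, and the count of jobs being
    executed are all below a preset level (limits here are illustrative)."""
    return mem_util < mem_limit and cpu_util < cpu_limit and active_jobs < job_limit

def may_assign(executability_set, app_healthy, db_healthy):
    """Claim 7: the dispatcher assigns a logged job to a bin agent only when
    the executability indicator and both health status indicators are set."""
    return executability_set and app_healthy and db_healthy
```

A resource monitoring unit would re-evaluate `app_server_healthy` after every predefined time interval, and the dispatcher would consult `may_assign` before handing a logged job to a bin agent.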
8. The system of claim 7, wherein the set of predefined parameters comprise: a priority field set for the job; availability of resources required for execution of the job; and a scheduled time of execution.
9. The system of claim 8, wherein the priority field is set to a numerical value by the at least one user, the numerical value determining the order in which the logged job is to be executed.
10. The system of claim 9, wherein the job allocation module of the application server is configured to pull a logged job, when the executability indicator of the job, the health status indicator of the application server, and the health status indicator of the database server are set.
11. The system of claim 2, wherein the collector table is partitioned into a plurality of queues, whereby each queue holds a set of jobs associated with the at least one dispatcher module.
12. The system of claim 1, wherein the caching module is further configured to flush a sub-linked list when the sub-linked list is fetched by the computing engine for execution of an associated sub-job.
13. A method for distributed computing of at least one of a plurality of jobs, the method comprising the steps of: logging the plurality of jobs, submitted by at least one user, in a database server; pulling, by a job allocation module of a bin agent, a logged job for execution, whereby the bin agent exists within an application server; fetching, by a data fetch module of the bin agent, a set of data records from the database server, whereby the set of data records is essential for executing the pulled job; storing, by a caching module, the fetched set of data records, as a linked list, in a cache server; dividing, by the caching module, the linked list into a plurality of sub-linked lists, and associating each sub-linked list with a sequential identifier; dividing, by a configuration module, the pulled job into a plurality of sub-jobs in the database server, whereby each sub-job is associated with a sub-linked list, based on the sequential identifier; and pulling, by a computing engine, a sub-job from the database server, for executing the sub-job by accessing at least one associated sub-linked list from the cache server.
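Taken together, the method steps of claim 13 amount to a scatter/gather loop. The sketch below compresses them into a single function for illustration only: a plain dict stands in for the cache server, and a per-record sum stands in for the real sub-job work — both are assumptions, not part of the claimed method.

```python
def execute_job(records, num_sub_jobs):
    """Claim 13 end to end: store the fetched records as sub-linked lists
    keyed by sequential identifier, then let each sub-job pull and process
    its chunk. pop() models removing a sub-linked list from the cache once
    a computing engine has fetched it (compare claim 15)."""
    size = max(1, -(-len(records) // num_sub_jobs))  # ceiling division
    cache = {seq_id: records[start:start + size]
             for seq_id, start in enumerate(range(0, len(records), size))}
    results = []
    for seq_id in sorted(cache):       # each iteration models one sub-job
        chunk = cache.pop(seq_id)      # fetch and evict the sub-linked list
        results.append(sum(chunk))     # placeholder per-record computation
    return results
```

In the claimed system the loop body would run concurrently across computing engines on different application servers; the sequential identifiers are what let each engine find its own sub-linked list independently.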
14. The method of claim 13, wherein the plurality of sub-linked lists are accessible by a plurality of computing engines, whereby each computing engine exists within a bin agent of an application server.
15. The method of claim 13, further comprising flushing, by the caching module, a sub-linked list from the cache server, when the sub-linked list is fetched by the computing engine for execution of an associated sub-job.
16. The method of claim 13, further comprising storing a plurality of information associated with the plurality of jobs, whereby the plurality of information of a job comprises: parameters of the job, parameters of the at least one user, the application server executing the job, and a job completion status, in a collector table, whereby the collector table exists in the database server.
17. The method of claim 16, further comprising updating, by the computing engine, the job completion status of the sub-job in the collector table, on completing the execution of the sub-job.
18. The method of claim 16, wherein the step of pulling, by a job allocation module of a bin agent, a logged job for execution is performed by considering one or more of: a health status indicator of the application server; a health status indicator of the database server; and an executable indicator of the logged job.
19. The method of claim 18, further comprising: setting, by a resource monitoring unit of the application server, the health status indicator of the application server, after every predefined time interval, when memory utilization, processor utilization, and a count of jobs being executed by the application server are below a first preset level; and setting, by the resource monitoring unit, the health status indicator of the database server at predefined time intervals, when memory utilization and processor utilization by the database server are below a second preset level.
20. The method of claim 19, further comprising: setting, by a dispatcher module, the executable indicator of the logged job, based on a set of predefined parameters; and assigning, by the dispatcher module, the logged job to an application server, when the executable indicator, the health status indicator of the application server, and the health status indicator of the database server are set.
21. The method of claim 20, wherein the set of predefined parameters comprise: a priority field of the logged job; availability of resources required for execution of the job; and a scheduled time for execution.
22. The method of claim 21, wherein the priority field is set to a numerical value, by the at least one user, the numerical value determining the order in which the logged job is to be executed.
23. The method of claim 21, wherein the step of pulling, by a job allocation module, a logged job, is performed when the health status indicator of the application server, the health status indicator of the database server, and the executable indicator of the logged job are set.
24. The method of claim 20, further comprising partitioning the collector table into a plurality of queues, whereby each queue holds a set of jobs associated with a dispatcher module.
25. The method of claim 13, wherein the plurality of jobs comprises one or more of a batch job, an online job, and a channel job.
26. A computer program product consisting of a plurality of program instructions stored on a non-transitory computer-readable medium that, when executed by a computing device, performs a method for distributed computing of at least one of a plurality of jobs, the method comprising: logging the plurality of jobs, submitted by at least one user, in a database server; pulling, by a job allocation module of a bin agent, a logged job for execution, whereby the bin agent exists within an application server; fetching, by a data fetch module of the bin agent, a set of data records from the database server, whereby the set of data records is essential for executing the pulled job; storing, by a caching module, the fetched set of data records, as a linked list, in a cache server; dividing, by the caching module, the linked list into a plurality of sub-linked lists, and associating each sub-linked list with a sequential identifier; dividing, by a configuration module, the pulled job into a plurality of sub-jobs in the database server, whereby each sub-job is associated with a sub-linked list, based on the sequential identifier; and pulling, by a computing engine, a sub-job from the database server, for executing the sub-job by accessing at least one associated sub-linked list from the cache server.