
CN111209107A - Multi-cluster operation method - Google Patents

Multi-cluster operation method

Info

Publication number
CN111209107A
CN111209107A CN201911362939.1A CN201911362939A CN111209107A CN 111209107 A CN111209107 A CN 111209107A CN 201911362939 A CN201911362939 A CN 201911362939A CN 111209107 A CN111209107 A CN 111209107A
Authority
CN
China
Prior art keywords
cluster
user
operates
administrator
operation method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911362939.1A
Other languages
Chinese (zh)
Inventor
胡梦龙
张涛
原帅
吕灼恒
王家尧
胡辰
王新雷
李斌
沙超群
厉军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dawning Information Industry Co Ltd
Original Assignee
Dawning Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-12-26
Filing date
2019-12-26
Publication date
2020-05-29
Application filed by Dawning Information Industry Co Ltd
Priority to CN201911362939.1A
Publication of CN111209107A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Storage Device Security (AREA)

Abstract

Figure 201911362939

The invention discloses a multi-cluster operation method, comprising: adding an attribute for a user; and setting, by an administrator, the value of the attribute to determine whether the user operates a single cluster, the currently logged-in cluster, or multiple clusters. This technical solution avoids exposing cluster information to users.

Description

Multi-cluster operation method
Technical Field
The invention relates to the technical field of computer clusters, and in particular to a multi-cluster operation method.
Background
SLURM is an open-source cluster job scheduling system with good fault tolerance and high scalability. Its key functions are: allocating computing resources to run work; providing a framework for starting, executing, and monitoring jobs on the allocated set of nodes; and arbitrating contention for resources. A cluster consists of all nodes managed by one slurmctld daemon.
SLURM can direct commands to other clusters instead of, or in addition to, the local cluster on which the command is invoked. With this behavior enabled, a user may submit jobs to one or more clusters and receive status information from those remote clusters. Some client commands provide a "-M, --clusters=" option that accepts a comma-separated list of clusters to communicate with.
At present, a user must explicitly specify the cluster list with the "-M, --clusters" option, so the administrator has to expose cluster information to users, which does not meet the administrator's need to control that information. SLURM currently provides no function for hiding cluster information from users.
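To make the prior-art behavior concrete, the Python sketch below parses a SLURM-style `-M, --clusters` value into a list of names. It only mimics the comma-separated format of the option; SLURM's own option handling is written in C, and the cluster names are placeholders. Note that the user must already know the cluster names in order to pass them, which is exactly the exposure described above.

```python
import argparse


def parse_cluster_option(argv):
    """Parse a SLURM-style '-M/--clusters' option into a list of names.

    Illustrative only: mimics the comma-separated format of the option,
    not SLURM's actual C option parser.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-M", "--clusters", default=None)
    # Ignore everything else on the command line (job script, other flags).
    args, _unknown = parser.parse_known_args(argv)
    if args.clusters is None:
        return None  # option absent: the local cluster would be used
    # Split on commas and drop empty entries such as a trailing comma.
    return [name.strip() for name in args.clusters.split(",") if name.strip()]


print(parse_cluster_option(["-M", "clusterA,clusterB", "job.sh"]))
# ['clusterA', 'clusterB']
```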
Disclosure of Invention
In view of the above problems in the related art, the present invention provides a multi-cluster operation method that removes the need to specify cluster names explicitly.
The technical solution of the invention is implemented as follows:
According to one aspect of the present invention, a multi-cluster operation method is provided, including:
adding an attribute for the user; and
setting, by an administrator, the value of the attribute to determine whether the user operates a single cluster, the currently logged-in cluster, or multiple clusters.
According to an embodiment of the present invention, adding an attribute for the user comprises: adding a field to the user table in the database, the field being a list of the cluster names the user can operate.
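A minimal sketch of this step, using Python's built-in sqlite3 module as a stand-in for the MySQL database behind SLURM's accounting daemon. The table name `user_table`, its columns, the hypothetical column name `feandmulticluster`, and the choice to store the cluster list as one comma-separated string are all assumptions for illustration, not SLURM's actual accounting schema.

```python
import sqlite3


def add_cluster_list_field(conn):
    """Add a per-user column holding the clusters the user may operate.

    Hypothetical schema: the value is a comma-separated list of cluster
    names (e.g. "A,B"), or a keyword such as "current" or "all".
    """
    conn.execute("ALTER TABLE user_table ADD COLUMN feandmulticluster TEXT")
    conn.commit()


def set_cluster_list(conn, user, value):
    """Administrator-side update of one user's cluster list."""
    conn.execute(
        "UPDATE user_table SET feandmulticluster = ? WHERE name = ?",
        (value, user),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
# Hypothetical pre-existing user table (only the name column is needed here).
conn.execute("CREATE TABLE user_table (name TEXT PRIMARY KEY)")
conn.execute("INSERT INTO user_table (name) VALUES ('alice')")

add_cluster_list_field(conn)
set_cluster_list(conn, "alice", "A")
row = conn.execute(
    "SELECT feandmulticluster FROM user_table WHERE name = 'alice'"
).fetchone()
print(row)  # ('A',)
```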
According to an embodiment of the present invention, the administrator setting the value of the attribute includes: for a user who has permission only on a first cluster, when the administrator sets the value of the field to the name of the first cluster, the user operates the first cluster if the user logs in to the first cluster, on which the user has permission, and the user also operates the first cluster if the user logs in to a second cluster, on which the user has no permission.
According to an embodiment of the present invention, the administrator setting the value of the attribute includes: for a user who has permission on both the first cluster and the second cluster, when the administrator sets the value of the field to the name of the first cluster, the user operates the first cluster whether the user logs in to the first cluster or to the second cluster.
According to an embodiment of the present invention, the administrator setting the value of the attribute includes: for a user who has permission on both the first cluster and the second cluster, when the administrator sets the value of the field to the current cluster name, the user operates only the first cluster when logged in to the first cluster, and only the second cluster when logged in to the second cluster.
According to an embodiment of the present invention, the administrator setting the value of the attribute includes: for a user who has permission on both the first cluster and the second cluster, when the administrator sets the value of the field to the names of the first and second clusters, or to all cluster names, the user operates both the first and the second cluster, whether logged in to the first cluster or to the second cluster.
According to an embodiment of the present invention, the user operating a single cluster, the currently logged-in cluster, or multiple clusters comprises submitting jobs to a single cluster, the currently logged-in cluster, or multiple clusters.
According to an embodiment of the present invention, when a job is submitted and a multi-cluster operation needs to be performed, the user and cluster information in the database is queried in sequence and a cluster list is returned, so that a cluster can be selected from all available clusters to receive the job.
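The sequential lookup (user information first, then cluster information) can be sketched as follows. Again sqlite3 stands in for MySQL. The tables `user_table` and `cluster_table`, the hypothetical `feandmulticluster` column, and the handling of the `current` and `all` keywords are assumptions for illustration, not SLURM's real queries; an unset value is treated as "all clusters", following the default stated in this description.

```python
import sqlite3


def query_cluster_list(conn, user, login_cluster):
    """Query user info, then cluster info, and return a cluster list.

    Assumed (hypothetical) schema:
      user_table(name TEXT, feandmulticluster TEXT)
      cluster_table(name TEXT)
    """
    # 1) Look up the user's operable-cluster field.
    row = conn.execute(
        "SELECT feandmulticluster FROM user_table WHERE name = ?", (user,)
    ).fetchone()
    if row is None:
        return []  # unknown user: nothing to operate on
    value = (row[0] or "").strip()

    # 2) Look up the clusters registered in the database.
    registered = [r[0] for r in conn.execute(
        "SELECT name FROM cluster_table ORDER BY name")]

    # 3) Map the field value onto registered clusters.
    if value in ("", "all"):
        return registered  # default: every registered cluster
    if value == "current":
        wanted = [login_cluster]
    else:
        wanted = [n.strip() for n in value.split(",") if n.strip()]
    return [n for n in wanted if n in registered]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_table (name TEXT PRIMARY KEY, feandmulticluster TEXT);
    CREATE TABLE cluster_table (name TEXT PRIMARY KEY);
    INSERT INTO cluster_table VALUES ('A'), ('B');
    INSERT INTO user_table VALUES ('alice', 'A'), ('bob', 'current'), ('carol', NULL);
""")
print(query_cluster_list(conn, "alice", "B"))  # ['A']
```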
The technical solution of the invention implements a dynamically configurable multi-cluster operation method for SLURM. An attribute is added for each user, and by setting its value the administrator determines whether the user submits jobs to a single cluster, to the currently logged-in cluster, or to multiple clusters. Users therefore do not need to know about cluster information, which solves the prior-art problem of exposing cluster information to users. By default, a user may submit jobs to all clusters. In addition, the cluster administrator's control is strengthened, meeting the administrator's need to protect cluster information.
Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a flow chart of a method of multi-cluster operation according to an embodiment of the invention;
FIG. 2 is a flow chart of the batch job submission command (sbatch) according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present invention.
FIG. 1 is a flow chart of a multi-cluster operation method according to an embodiment of the invention. As shown in FIG. 1, the multi-cluster operation method of this embodiment may include the following steps:
S11: adding an attribute for the user;
S12: setting, by the administrator, the value of the attribute to determine whether the user operates a single cluster, the currently logged-in cluster, or multiple clusters.
With this technical solution, an attribute is added and, by setting this user attribute, the administrator determines whether the user operates a single cluster, the currently logged-in cluster, or multiple clusters. Therefore, unlike in the prior art, the user does not need to specify cluster names explicitly.
Specifically, a feandmulticluster field may be added to the user table in the database. The field holds the list of cluster names the user can operate, and it can be set as follows:
(1) For a user with permission only on cluster A (the first cluster), the administrator sets feandmulticluster=A.
If the user logs in to cluster A, on which the user has permission, the user operates cluster A;
if the user logs in to cluster B (the second cluster), on which the user has no permission, the user operates cluster A.
(2) For a user with permission on both clusters A and B, the administrator sets feandmulticluster=A.
If the user logs in to cluster A, the user operates cluster A;
if the user logs in to cluster B, the user operates cluster A.
(3) For a user with permission on both clusters A and B, the administrator sets feandmulticluster=current.
If the user logs in to cluster A, the user operates only cluster A;
if the user logs in to cluster B, the user operates only cluster B.
(4) For a user with permission on both clusters A and B, the administrator sets either feandmulticluster=A,B or feandmulticluster=all.
If the user logs in to cluster A, the user operates clusters A and B;
if the user logs in to cluster B, the user operates clusters A and B.
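The four cases reduce to a small mapping from the field value and the login cluster to a list of target clusters. The Python function below is a hypothetical sketch of that policy, not code from the patent; the keywords `current` and `all` follow the settings listed above, and treating an unset field as "all clusters" follows the stated default that a user may submit jobs to all clusters. Permission checks are deliberately left out, since in case (1) the user operates cluster A even when logged in to cluster B.

```python
def resolve_target_clusters(field_value, login_cluster, all_clusters):
    """Return the clusters a user's commands should target (sketch).

    - a list of names such as "A" or "A,B" -> exactly those clusters
    - "current"                           -> only the login cluster
    - "all"                               -> every known cluster
    - empty / unset                       -> every known cluster (default)
    """
    value = (field_value or "").strip()
    if value in ("", "all"):
        return list(all_clusters)
    if value == "current":
        return [login_cluster]
    return [name.strip() for name in value.split(",") if name.strip()]


clusters = ["A", "B"]
# Cases (1)/(2): field "A" -> cluster A, whichever cluster the user logs in to.
print(resolve_target_clusters("A", "B", clusters))        # ['A']
# Case (3): field "current" -> follows the login cluster.
print(resolve_target_clusters("current", "B", clusters))  # ['B']
# Case (4): field "A,B" or "all" -> both clusters.
print(resolve_target_clusters("all", "A", clusters))      # ['A', 'B']
```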
In one embodiment: SLURM has many commands that support multi-cluster operation, and the principle of the dynamic configuration is described here using the batch job submission command sbatch as an example. The sbatch processing flow is shown in FIG. 2 and includes:
1) when sbatch starts, it first parses the configuration file (slurm.conf) and stores some key parameters;
2) it parses and stores the parameters passed in from sources such as the job script, environment variables, and the command line, and handles the "-M, --clusters" option;
3) it fills in a job structure from the parameters obtained in the two previous steps; the structure contains all the information needed to execute one job;
4) it decides from the parsed options (opt) whether to perform a multi-cluster operation. If there are multiple clusters, it calls slurmdb_get_first_avail_cluster, which interacts with the database daemon slurmdbd and calls job_will_run, slurm_job_will_run2, and job_will_run_cluster in turn to select a suitable cluster, from all available clusters, to which to submit the job. Inside this function, slurmdb_get_info_cluster is called to query the user and cluster information in the MySQL database in sequence and return the appropriate cluster list;
5) otherwise, it calls slurm_submit_batch_job;
6) both step 4) and step 5) call slurm_send_recv_controller_msg, which packs the job information and sends it to slurmctld, the management-node daemon of the cluster configured for the user, where the job waits to be scheduled and executed.
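Steps 4) to 6) can be sketched in Python as below. The sketch mirrors only the control flow; the two callables are hypothetical stand-ins for SLURM's C functions (the will-run queries and the controller RPC). The earliest-start rule for choosing among several clusters is an assumption based on the documented behavior of sbatch's `--clusters` option (the job goes to the cluster offering the earliest expected initiation time), not a detail stated in this patent.

```python
from dataclasses import dataclass, field


@dataclass
class JobDesc:
    """Stand-in for the job structure filled in step 3)."""
    script: str
    options: dict = field(default_factory=dict)


def submit_batch(job, target_clusters, expected_start, send_to_controller):
    """Sketch of steps 4) to 6) of the sbatch flow.

    target_clusters:    cluster names resolved for the user
    expected_start:     callable(cluster, job) -> estimated start time
                        (hypothetical stand-in for the will-run queries)
    send_to_controller: callable(cluster, job) -> result
                        (hypothetical stand-in for the slurmctld RPC)
    """
    if not target_clusters:
        raise ValueError("no cluster available for this user")
    if len(target_clusters) > 1:
        # Step 4): multi-cluster case, pick the earliest estimated start.
        cluster = min(target_clusters, key=lambda c: expected_start(c, job))
    else:
        # Step 5): single-cluster case, submit directly.
        cluster = target_clusters[0]
    # Step 6): send the packed job to the chosen cluster's controller.
    return send_to_controller(cluster, job)


# Usage with fake estimates: cluster B is expected to start the job sooner.
estimates = {"A": 120, "B": 30}
sent = []


def fake_send(cluster, job):
    sent.append((cluster, job.script))
    return cluster


chosen = submit_batch(JobDesc("job.sh"), ["A", "B"],
                      lambda c, j: estimates[c], fake_send)
print(chosen)  # B
```

Passing the will-run and send steps as callables keeps the branching logic testable without a running SLURM installation.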
It should be understood that other commands related to multi-cluster operations may be processed similarly to fig. 2.
In summary, the technical solution of the present invention implements a dynamically configurable multi-cluster operation method for SLURM. An attribute is added for each user, and by setting its value the administrator determines whether the user submits jobs to a single cluster, to the currently logged-in cluster, or to multiple clusters. Users therefore do not need to know about cluster information, which solves the prior-art problem of exposing cluster information to users. By default, a user may submit jobs to all clusters. In addition, the cluster administrator's control is strengthened, meeting the administrator's need to protect cluster information.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (8)

1. A multi-cluster operation method, characterized by comprising: adding an attribute for a user; and determining, by an administrator setting the value of the attribute, whether the user operates a single cluster, operates the currently logged-in cluster, or operates multiple clusters.
2. The multi-cluster operation method according to claim 1, wherein adding an attribute for the user comprises: adding a field to the user table in a database, the field being a list of the cluster names that the user can operate.
3. The multi-cluster operation method according to claim 1, wherein the setting of the value of the attribute by the administrator comprises: for a user who has permission only on a first cluster, when the administrator sets the value of the field to the name of the first cluster, if the user logs in to the first cluster, on which the user has permission, the user operates the first cluster; and if the user logs in to a second cluster, on which the user has no permission, the user operates the first cluster.
4. The multi-cluster operation method according to claim 1, wherein the setting of the value of the attribute by the administrator comprises: for a user who has permission on both a first cluster and a second cluster, when the administrator sets the value of the field to the name of the first cluster, if the user logs in to the first cluster, on which the user has permission, the user operates the first cluster; and if the user logs in to the second cluster, on which the user has permission, the user operates the first cluster.
5. The multi-cluster operation method according to claim 1, wherein the setting of the value of the attribute by the administrator comprises: for a user who has permission on both a first cluster and a second cluster, when the administrator sets the value of the field to the current cluster name, if the user logs in to the first cluster, on which the user has permission, the user operates only the first cluster; and if the user logs in to the second cluster, on which the user has permission, the user operates only the second cluster.
6. The multi-cluster operation method according to claim 1, wherein the setting of the value of the attribute by the administrator comprises: for a user who has permission on both a first cluster and a second cluster, when the administrator sets the value of the field to the names of the first cluster and the second cluster, or to all cluster names, if the user logs in to the first cluster, on which the user has permission, the user operates the first cluster and the second cluster; and if the user logs in to the second cluster, on which the user has permission, the user operates the first cluster and the second cluster.
7. The multi-cluster operation method according to claim 1, wherein the user operating a single cluster, operating the currently logged-in cluster, or operating multiple clusters comprises: submitting jobs to a single cluster, the currently logged-in cluster, or multiple clusters.
8. The multi-cluster operation method according to claim 2, wherein, when a job submission is executed and a multi-cluster operation needs to be performed, the user and cluster information in the database are queried in sequence and a cluster list is returned, so that a cluster is selected from all available clusters to which the job is submitted.
CN201911362939.1A 2019-12-26 2019-12-26 Multi-cluster operation method Pending CN111209107A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911362939.1A CN111209107A (en) 2019-12-26 2019-12-26 Multi-cluster operation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911362939.1A CN111209107A (en) 2019-12-26 2019-12-26 Multi-cluster operation method

Publications (1)

Publication Number Publication Date
CN111209107A true CN111209107A (en) 2020-05-29

Family

ID=70782533

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911362939.1A Pending CN111209107A (en) 2019-12-26 2019-12-26 Multi-cluster operation method

Country Status (1)

Country Link
CN (1) CN111209107A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101645022A (en) * 2009-08-28 2010-02-10 曙光信息产业(北京)有限公司 Work scheduling management system and method for a plurality of colonies
CN103294485A (en) * 2013-06-27 2013-09-11 曙光信息产业(北京)有限公司 Web service packaging method and Web service packaging system both used for ABINIT parallel computing system
CN105183820A (en) * 2015-08-28 2015-12-23 广东创我科技发展有限公司 Multi-tenant supported large data platform and tenant access method
CN106165367A (en) * 2014-12-31 2016-11-23 华为技术有限公司 A kind of access control method, storage device and control system storing device
CN107895113A (en) * 2017-12-06 2018-04-10 北京搜狐新媒体信息技术有限公司 A kind of fine-grained data authority control method and system for supporting the more clusters of hadoop
US20190089812A1 (en) * 2016-03-31 2019-03-21 Alibaba Group Holding Limited Routing method and device
CN109740373A (en) * 2018-12-19 2019-05-10 福建新大陆软件工程有限公司 A kind of Hadoop cluster management method, system and platform

Similar Documents

Publication Publication Date Title
CN111078315B (en) Microservice orchestration, execution method and system, architecture, device, storage medium
CN108920259B (en) Deep learning job scheduling method, system and related equipment
CN116018788A (en) Configure service mesh networking resources for dynamically discovered peers or network functions
JP2020501253A (en) On-demand code execution in a localized device coordinator
WO2019218463A1 (en) Method and apparatus for automatically building kubernetes master node on basis of ansible tool, terminal device, and readable storage medium
JP6532385B2 (en) INFORMATION PROCESSING SYSTEM, CONTROL METHOD THEREOF, AND PROGRAM
CN108701132B (en) Resource management system and method
CN112395107A (en) Tax control equipment control method and device, storage medium and electronic equipment
CN103098033A (en) System and method for managing resources of a portable computing device
JP2020502643A (en) Localized device coordinator with on-demand code execution capability
US11645098B2 (en) Systems and methods to pre-provision sockets for serverless functions
JPWO2014171130A1 (en) Information processing system, deployment method, processing device, and deployment device
WO2019223099A1 (en) Application program calling method and system
JP2021518014A (en) On-demand code execution with limited memory footprint
WO2024066342A1 (en) Task processing method and apparatus, electronic device, and storage medium
JP7313351B2 (en) Resource processing method and system, storage medium, electronic device
Smirnov et al. Integration and combined use of distributed computing resources with Everest
US8676842B2 (en) Creating multiple Mbeans from a factory Mbean
CN111209107A (en) Multi-cluster operation method
US20110246553A1 (en) Validation of internal data in batch applications
CN114048460B (en) Cross-platform automatic data batch processing method, system, equipment and storage medium
JP2018084994A (en) Control system and control method
WO2024226591A1 (en) Third party interface for systems providing access management as a service
CN107784488A (en) A kind of business process management system of loose couplings
CN114237818A (en) Method, system, computing device and storage medium for sharing resources among virtual machines

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200529