CN115378812A

CN115378812A - Method and system for data center network equipment maintenance

Info

Publication number: CN115378812A
Application number: CN202210553546.4A
Authority: CN
Inventors: 马克西姆·莫内; 卢多维克·德马叙尔
Original assignee: OVH SAS
Current assignee: OVH SAS
Priority date: 2021-05-20
Filing date: 2022-05-20
Publication date: 2022-11-22
Anticipated expiration: 2042-05-20
Also published as: US20220376985A1; EP4092963B1; EP4092963A1; CN115378812B; CA3159474A1

Abstract

A method and system for maintaining network devices in a data center involve a network upgrade tool coupled to a CMDB, a rules DB, an upgrade path DB, and a network configurator. The network upgrade tool is configured to collect or obtain information about the network devices, remove non-compliant network devices from automated maintenance, cluster the remaining network devices into groups of redundant network devices, and, for all such groups, successively upgrade the network operating system of the network devices present in each group.

Description

Method and system for data center network equipment maintenance

Technical Field

The present technology relates to information technology, and more particularly to a method and system for automated maintenance of network devices in a data center.

Background

Systems have been developed to support the maintenance of network devices, such as switches and routers, in data centers. As used herein, "maintenance" may include, for example, upgrading a device from a current network operating system to a target (e.g., latest) network operating system. As used herein, a "data center" is not limited to the infrastructure located within the physical boundaries of one server farm, but includes all infrastructure, whether local or remote, that an organization deploys, controls and maintains to provide computer-cluster-based services to its own internal services or to third-party entities, i.e. all of the organization's customers. There has long been a need for a system that automates the maintenance of network devices in a data center while saving the required manpower, avoiding human error, and managing customer impact.

However, the challenge in developing such systems lies not only in the number of network devices to be addressed, but also in the diversity of such devices in the data center, including their characteristics and their roles. For example, a data center network may consist of different architectures that work side by side to provide connectivity and services, where each of these architectures is designed with, and for, specific characteristics, hardware models, enabled features and/or levels of redundancy, and all of these parallel architectures are connected to a more central architecture or core network.

A further challenge of such development is to minimize disruption to the operation of the data center, and to ensure that the unavailability of part or all of the data center due to maintenance of network devices is limited as much as possible. This in turn minimizes interruptions in the network connectivity services that the organization provides to its customers, thereby meeting contractual quality-of-service commitments, as the case may be.

There is therefore a need for a method and system that allow a data center operator without specific network knowledge to automate large-scale network device maintenance campaigns through a simple entry point, by providing a list of network devices to be upgraded and, if and when needed, entering a few simple options.

Generally speaking, the present technology aims to automate the maintenance of network devices in a data center by removing from automated maintenance those network devices with a higher risk of automated maintenance failure, and by exploiting the redundancy of the remaining network devices in order to limit the downtime in the data center caused by maintenance operations.

The subject matter discussed in the background section should not be assumed to be prior art merely because it is mentioned in the background section. Similarly, a problem mentioned in the background section, or associated with the subject matter of the background section, should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches.

Summary

Embodiments of the present technology have been developed based on the developers' appreciation of shortcomings associated with the prior art.

In one embodiment, various implementations of the present technology provide a method for maintaining network devices in a data center, the method comprising:

- collecting, for each of the network devices on a maintenance list:

- management IP (MANAGEMENTIP) information, where MANAGEMENTIP represents a virtual identification uniquely associated with the network device; and

- any tag and location (LOCATION) information, where a tag represents a key-value tuple associated with the network device, the key being any one of BU, ROLE or INFRA and the value being the actual data or a pointer to the data for the key; the BU value represents a characteristic of the network device relating to the business or product offering for which the network device is used in the data center, the ROLE value represents a characteristic relating to the position and function that the network device occupies in the data center, and the INFRA value represents a characteristic relating to the version or generation of the data center infrastructure in which the network device operates; and where the LOCATION information represents a characteristic of the network device relating to its actual physical location;

- obtaining, for each of the network devices on the maintenance list, using the corresponding MANAGEMENTIP, the hardware model and the current network operating system level of the network device;

- removing from the maintenance list the network devices that have an unsupported hardware model or current network operating system level, or an error in any of the BU, ROLE or INFRA tags or in the LOCATION information;

- clustering the remaining network devices on the maintenance list into: i clusters BU_i of network devices having the same BU tag; j clusters ROLE_j, within the BU_i clusters, of network devices having the same ROLE tag; and k clusters Cluster_ijk, within the ROLE_j clusters, of network devices having the same INFRA tag;

- creating, within each Cluster_ijk cluster, groups of redundant network devices according to the redundancy rules applicable to INFRA_k;

- verifying, in each created group, that the number of network devices in the group complies with the size rules applicable to the combination BU_i-ROLE_j-INFRA_k, and that all network devices in the group share the same LOCATION information;

- removing from the maintenance list the network devices that have not been clustered or grouped, that are present in a group not complying with the redundancy rules or the size rules, or that do not share the same LOCATION information with the other devices of the same group; and

- successively upgrading, for all groups within all Cluster_ijk clusters, using the corresponding MANAGEMENTIP and upgrade rules, the remaining network devices in each created group from the current version of the network operating system to a target version of the network operating system.
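The clustering step of the method above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation described by the patent: the `Device` record, its field names and the sample data are hypothetical. Because the clusters nest (BU_i, then ROLE_j inside it, then the INFRA split inside that), keying a dictionary on the (BU, ROLE, INFRA) triple yields the same partition as Cluster_ijk.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    # Hypothetical record; the field names are illustrative assumptions.
    device_id: str
    management_ip: str
    location: str
    bu: str
    role: str
    infra: str


def cluster_devices(devices):
    """Partition devices into clusters keyed by (BU, ROLE, INFRA)."""
    clusters = defaultdict(list)
    for device in devices:
        clusters[(device.bu, device.role, device.infra)].append(device)
    return dict(clusters)


# Made-up sample maintenance list.
devices = [
    Device("sw1", "10.0.0.1", "room-A", "cloud", "ToR", "v2"),
    Device("sw2", "10.0.0.2", "room-A", "cloud", "ToR", "v2"),
    Device("sw3", "10.0.0.3", "room-B", "server", "spine", "v1"),
]
clusters = cluster_devices(devices)
```

Here `sw1` and `sw2` land in the same cluster because they share all three tag values, while `sw3` forms a cluster of its own.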

In an embodiment, the collecting and the obtaining are performed using a unique ID assigned to each of the network devices on the maintenance list by the data center operator when the data center is deployed, the unique ID being associated with any of the tags, the MANAGEMENTIP and the LOCATION information of each network device.

In an embodiment, the removing from the maintenance list further comprises removing the network devices that do not have exactly one value for each of the BU, ROLE and INFRA tags.
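The "exactly one value" check can be illustrated with a short sketch. It assumes, hypothetically, that the tags of a device arrive as a list of `(key, value)` tuples; a device passes only if each of BU, ROLE and INFRA maps to a single distinct value. The function name and sample data are illustrative and do not come from the source.

```python
from collections import defaultdict

REQUIRED_KEYS = ("BU", "ROLE", "INFRA")


def has_exactly_one_value_per_key(tags):
    """Return True if each required key carries exactly one distinct value."""
    values = defaultdict(set)
    for key, value in tags:
        values[key].add(value)
    # A missing key yields an empty set (length 0) and fails the check.
    return all(len(values[key]) == 1 for key in REQUIRED_KEYS)


ok_tags = [("BU", "cloud"), ("ROLE", "ToR"), ("INFRA", "v2")]
missing_role = [("BU", "cloud"), ("INFRA", "v2")]
two_bus = [("BU", "cloud"), ("BU", "server"), ("ROLE", "ToR"), ("INFRA", "v2")]
```

A device tagged like `missing_role` or `two_bus` would be removed from the maintenance list under this rule.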

In an embodiment, the removing from the maintenance list further comprises creating an error list populated with the IDs of the removed network devices.

In an embodiment, the error list further comprises sub-lists of errors that can be corrected by the data center operator and of errors linked to faulty network devices.
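One possible shape for such an error list is sketched below. The class, the two category constants and the reason strings are hypothetical; the source only states that removed device IDs are recorded and that errors may be split into operator-correctable errors and errors linked to faulty devices.

```python
from collections import defaultdict

# Hypothetical category names for the two sub-lists.
OPERATOR_FIXABLE = "operator_fixable"
DEVICE_FAULT = "device_fault"


class ErrorList:
    """Records removed devices, split into sub-lists by error category."""

    def __init__(self):
        self._sublists = defaultdict(list)

    def add(self, device_id, kind, reason):
        self._sublists[kind].append((device_id, reason))

    def sublist(self, kind):
        # Return a copy so callers cannot mutate the internal state.
        return list(self._sublists.get(kind, []))

    def device_ids(self):
        """IDs of all removed devices, regardless of category."""
        return sorted({dev for items in self._sublists.values() for dev, _ in items})


errors = ErrorList()
errors.add("sw7", OPERATOR_FIXABLE, "missing ROLE tag")
errors.add("sw9", DEVICE_FAULT, "device unreachable on its MANAGEMENTIP")
```

Keeping the categories separate lets an operator fix the CMDB entries in the first sub-list and rerun the campaign, while the second sub-list points at devices that need hands-on attention.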

In an embodiment, the successive upgrading further comprises adjusting the number of Cluster_ijk clusters processed in parallel so as to minimize the downtime of the data center.

In an embodiment, the successive upgrading further comprises adjusting the order in which the Cluster_ijk clusters are processed so as to optimize the continuity of service to the customers of the organization that deploys, controls and maintains the data center.
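The two tuning knobs just described, the number of clusters processed in parallel and the processing order, can be sketched with a bounded thread pool. This is an assumption-laden illustration: `run_campaign`, `upgrade_cluster`, the `priority` callback and the sample data are hypothetical names, and the real upgrade of a device through its MANAGEMENTIP is replaced by a caller-supplied function. Within a cluster, devices are upgraded one at a time so that the redundant peers of a group stay in service.

```python
from concurrent.futures import ThreadPoolExecutor


def upgrade_cluster(groups, upgrade_device):
    """Upgrade every device of every group in turn (never two at once)."""
    done = []
    for group in groups:
        for device_id in group:
            upgrade_device(device_id)
            done.append(device_id)
    return done


def run_campaign(clusters, upgrade_device, max_parallel=2, priority=None):
    """Process clusters in priority order, at most max_parallel at a time."""
    keys = sorted(clusters, key=priority) if priority is not None else list(clusters)
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map yields results in the same order as the submitted keys.
        results = pool.map(lambda key: upgrade_cluster(clusters[key], upgrade_device), keys)
        return dict(zip(keys, results))


# Made-up clusters: each maps a (BU, ROLE, INFRA) key to its groups.
clusters = {
    ("server", "spine", "v1"): [["sp1", "sp2"]],
    ("cloud", "ToR", "v2"): [["sw1", "sw2"], ["sw3", "sw4"]],
}
upgraded = []  # stands in for the real upgrade action
result = run_campaign(clusters, upgraded.append, max_parallel=2,
                      priority=lambda key: key[0])
```

Lowering `max_parallel` trades campaign duration for a smaller blast radius; the `priority` callback is where an ordering that favors service continuity would be expressed.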

In yet another embodiment, various implementations of the present technology provide a system for maintaining network devices in a data center, the system comprising a network upgrade tool coupled to a CMDB, a rules DB, an upgrade path DB and a network configurator, the network upgrade tool being configured to:

- collect from the CMDB, for each of the network devices on a maintenance list:

- MANAGEMENTIP information, where MANAGEMENTIP represents a virtual identification uniquely associated with the network device; and

- any tag and LOCATION information, where a tag represents a key-value tuple associated with the network device, the key being any one of BU, ROLE or INFRA and the value being the actual data or a pointer to the data for the key; BU represents a characteristic of the network device relating to the business or product offering for which the network device is used in the data center, ROLE represents a characteristic relating to the position and function that the network device occupies in the data center, and INFRA represents a characteristic relating to the version or generation of the data center infrastructure in which the network device operates; and where LOCATION represents a characteristic of the network device relating to its actual physical location;

- cluster the remaining network devices on the maintenance list into: i clusters BU_i of network devices having the same BU tag; j clusters ROLE_j, within the BU_i clusters, of network devices having the same ROLE tag; and k clusters Cluster_ijk, within the ROLE_j clusters, of network devices having the same INFRA tag;

- create, within each Cluster_ijk cluster, groups of redundant network devices according to the redundancy rules applicable to INFRA_k collected from the rules DB;

- verify, in each created group, that the number of network devices in the group complies with the size rules applicable to the combination BU_i-ROLE_j-INFRA_k collected from the rules DB, and that all network devices in the group share the same LOCATION information;

- successively upgrade, through the network configurator, for all groups within all Cluster_ijk clusters, using the corresponding MANAGEMENTIP and the upgrade rules collected from the rules DB, the remaining network devices in each created group from the current version of the network operating system to a target version of the network operating system.

In an embodiment, the network upgrade tool is further configured to perform the collecting and the obtaining using a unique ID assigned to each of the network devices on the maintenance list by the data center operator when the data center is deployed, the unique ID being associated with any of the tags, the MANAGEMENTIP and the LOCATION information of each network device.

In an embodiment, the network upgrade tool is further configured to create an error list populated with the IDs of the network devices removed from the maintenance list.

In an embodiment, the network upgrade tool is further configured to adjust the number of Cluster_ijk clusters processed in parallel so as to minimize the downtime of the data center.

In an embodiment, the network upgrade tool is further configured to adjust the order in which the Cluster_ijk clusters are processed so as to optimize the continuity of service to the customers of the organization that deploys, controls and maintains the data center.

In an embodiment, the redundancy rules and the size rules are hard-coded in the network upgrade tool.

In yet another embodiment, various implementations of the present technology provide a computer-readable medium comprising instructions that cause a computing system to perform the method described above.

In yet another embodiment, various implementations of the present technology provide a method for maintaining a plurality of network devices in a data center. The method comprises:

- collecting, for at least one network device of the plurality of network devices, the at least one network device being selected from a maintenance list: a MANAGEMENTIP, being a virtual identification uniquely associated with the at least one network device; a LOCATION, being information representing the actual physical location of the at least one network device; and a tag, being a key-value tuple associated with the at least one network device, the value being one of data and a pointer to data, the tag being one of: a BU tag, being information representing the product offering for which the at least one network device is used in the data center; a ROLE tag, being information representing the function that the at least one network device occupies in the data center; and an INFRA tag, being information representing the version of the data center in which the at least one network device operates;

- obtaining, for the at least one network device on the maintenance list, based on the MANAGEMENTIP of the at least one network device, the hardware model and the current network operating system level of the at least one network device; and

- removing the at least one network device from the maintenance list in response to determining at least one of: the hardware model of the at least one network device is not supported; the current network operating system level of the at least one network device is not supported; and there is an error in at least one of the BU tag, the ROLE tag, the INFRA tag and the LOCATION tag.
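The removal of unsupported devices can be sketched as a simple filter. The support table, device inventory and function name below are hypothetical; the sketch covers only the hardware model and network operating system checks, not the tag or LOCATION error checks, and it assumes the model and level have already been obtained from each device.

```python
def split_maintenance_list(devices, supported):
    """Split devices into those kept and those removed, with a reason.

    devices:   dict mapping device ID -> (hardware_model, nos_level)
    supported: dict mapping hardware_model -> set of supported NOS levels
    """
    kept, removed = {}, {}
    for device_id, (model, nos_level) in devices.items():
        if model not in supported:
            removed[device_id] = "unsupported hardware model"
        elif nos_level not in supported[model]:
            removed[device_id] = "unsupported network operating system level"
        else:
            kept[device_id] = (model, nos_level)
    return kept, removed


# Made-up support table and inventory.
supported = {"model-x": {"1.0", "1.1"}, "model-y": {"3.2"}}
devices = {
    "sw1": ("model-x", "1.1"),
    "sw2": ("model-x", "0.9"),
    "sw3": ("model-z", "1.0"),
}
kept, removed = split_maintenance_list(devices, supported)
```

The `removed` mapping is the natural input for an error list, since it pairs each excluded device with the reason for its exclusion.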

In some implementations, the method further comprises determining at least one of: the hardware model of the at least one network device is not supported; the current network operating system level of the at least one network device is not supported; and there is an error in at least one of the BU tag, the ROLE tag, the INFRA tag and the LOCATION tag.

In some implementations, the method further comprises: collecting, for each network device of the plurality of network devices, the MANAGEMENTIP, the LOCATION and the tag; in response to removing the at least one network device from the maintenance list, clustering the remaining network devices on the maintenance list into: i clusters BU_i of network devices having the same BU tag value in one of their associated tags, j clusters ROLE_j of network devices having the same ROLE key value in one of their associated tags, and k clusters Cluster_ijk, within the ROLE_j clusters, of network devices having the same INFRA tag value in one of their associated tags; creating, within each Cluster_ijk cluster, groups of a plurality of redundant network devices; verifying, according to the redundancy rules applicable to INFRA_k, that the number of redundant network devices in each created group matches a first number; and verifying, according to the size rules applicable to the combination BU_i-ROLE_j-INFRA_k, that the number of network devices in each group matches a second number.

In some implementations, the method further comprises: verifying that the network devices of each group share the same LOCATION; and removing from the maintenance list at least one network device that is present in a given group in which the number of redundant devices does not match the first number, that is present in another given group in which the number of devices does not match the second number, or that is present in another group and does not share the same LOCATION.
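The group checks can be illustrated with a small validator. It is a simplified sketch: the distinction between the "first number" (redundancy rule) and the "second number" (size rule) is collapsed into a single hypothetical `expected_size` parameter, and a group is modeled as a list of `(device_id, location)` pairs, neither of which is prescribed by the source.

```python
def check_group(group, expected_size):
    """Return the list of rule violations for one group (empty if compliant).

    group: list of (device_id, location) pairs
    """
    problems = []
    if len(group) != expected_size:
        problems.append("size")
    if len({location for _, location in group}) > 1:
        problems.append("location")
    return problems


good_pair = [("sw1", "room-A"), ("sw2", "room-A")]
split_pair = [("sw3", "room-A"), ("sw4", "room-B")]
lone_device = [("sw5", "room-A")]
```

Devices belonging to any group for which the returned list is non-empty would be removed from the maintenance list rather than upgraded automatically.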

In some implementations, the method further comprises upgrading, using upgrade rules and the corresponding MANAGEMENTIP, the remaining network devices in each group within all Cluster_ijk clusters from the current version of the network operating system to a target version of the network operating system.

In some implementations, the collecting and the obtaining are performed using a unique ID assigned to each of the network devices on the maintenance list by the data center operator when the data center is deployed, the unique ID being associated with any of the associated tags, the MANAGEMENTIP and the LOCATION of each network device.

In some implementations, the removing from the maintenance list further comprises removing the network devices that do not have exactly one value for each of the BU, ROLE and INFRA tags.

In some implementations, the removing from the maintenance list further comprises creating an error list populated with the IDs of the removed network devices.

In some implementations, the error list further comprises sub-lists of errors that can be corrected by the data center operator and of errors linked to faulty network devices.

In some implementations, the upgrading further comprises adjusting the number of Cluster_ijk clusters processed in parallel so as to minimize the downtime of the data center.

In some implementations, the upgrading further comprises adjusting the order in which the Cluster_ijk clusters are processed so as to optimize the continuity of service to the customers of the organization that deploys, controls and maintains the data center.

In yet another implementation, various implementations of the present technology provide a system for maintaining network devices in a data center, the system comprising a network upgrade tool coupled to a CMDB, a rules DB, an upgrade path DB and a network configurator, the network upgrade tool being configured to:

- collect from the CMDB, for each network device of a plurality of network devices on a maintenance list: a MANAGEMENTIP, being a virtual identification uniquely associated with the given network device; a LOCATION, being information representing the actual physical location of the given network device; and a tag, being a key-value tuple associated with the given network device, the tag being one of: information representing the business or product offering for which the network device is used in the data center (BU); information representing the position and function that the network device occupies in the data center (ROLE); and information representing the version or generation of the data center in which the network device operates (INFRA); the value of a tag being the actual data or a pointer to the data;

- obtain, for each of the network devices on the maintenance list, using the corresponding MANAGEMENTIP, the hardware model and the current network operating system level of the given network device;

- remove from the maintenance list the network devices that have an unsupported hardware model or current network operating system level, or an error in any of their respective associated BU, ROLE and INFRA tags or in their LOCATION;

- cluster the remaining network devices on the maintenance list into: i clusters BU_i of network devices having the same BU key value in one of their associated tags; j clusters ROLE_j, within the BU_i clusters, of network devices having the same ROLE key value in one of their associated tags; and k clusters Cluster_ijk, within the ROLE_j clusters, of network devices having the same INFRA key value in one of their associated tags;

- create, within each Cluster_ijk cluster, groups of redundant network devices according to the redundancy rules applicable to INFRA_k collected from the rules DB;

- verify, according to the redundancy rules applicable to INFRA_k, that the number of redundant network devices in each group matches a first number;

- verify, according to the size rules applicable to the combination BU_i-ROLE_j-INFRA_k, that the number of network devices in each group matches a second number;

- verify that all network devices in each created group share the same LOCATION;

- remove from the maintenance list the network devices that are present in one of the groups in which the number of redundant devices does not match the first number, that are present in one of the groups in which the number of devices does not match the second number, or that are present in one of the groups and do not share the same LOCATION; and

- upgrade, through the network configurator, for all groups within all Cluster_ijk clusters, using the upgrade rules collected from the upgrade path DB and the corresponding MANAGEMENTIP, the remaining network devices in each group from the current version of the network operating system to a target version of the network operating system.

In some implementations, the network upgrade tool is further configured to perform the collecting and the obtaining using a unique ID assigned to each of the plurality of network devices on the maintenance list by the data center operator when the data center is deployed, the unique ID being associated with any of the associated tags, the MANAGEMENTIP and the LOCATION of each network device.

In some implementations, the redundancy rules and the size rules are hard-coded in the network upgrade tool.

The present technology provides a method and system that perform exactly the same maintenance process whenever network devices match the same parameters (architecture, role, hardware model, running software version and/or enabled features, etc.), thereby removing the risk of human error. Adjusting the amount of parallel execution limits the impact of data center unavailability. Identifying redundancy in the infrastructure among network devices that share certain parameters or characteristics, and maintaining such devices non-simultaneously so that service may be degraded but is never fully interrupted, minimizes interruptions to the network connectivity services provided to customers. In addition, such a method and system allow data center operators to easily take into account new device hardware or new architectures between devices.

In the context of the present description, unless expressly provided otherwise, a system may refer to, but is not limited to, an "electronic device", an "operating system", a "computing system", a "computer-based system", a "controller unit", a "monitoring device", a "control device" and/or any combination thereof appropriate to the relevant task at hand.

In the context of the present description, the functional steps shown in the figures may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. Likewise, the various functional blocks shown in the figures, such as those labeled "network device", "tool", "configurator", etc., may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a "processor", the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. In some embodiments of the present technology, the processor may be a general-purpose processor, such as a central processing unit (CPU), or a processor dedicated to a specific purpose, such as a digital signal processor (DSP). In the foregoing, explicit use of the term "processor" should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), read-only memory (ROM) for storing software, random-access memory (RAM), and non-volatile storage. Other hardware, conventional and/or custom, may also be included.

In the context of the present description, a "tag" is intended to represent a key-value tuple associated with each data center network device and stored in a database, for example a configuration management database (CMDB). The key uniquely identifies a data element, and the value is either the actual data or a pointer to the data, as the case may be. Tags are a convenient key to certain characteristics of data center network devices.
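A tag whose value is either the data itself or a pointer to the data can be modeled as below. This is purely illustrative: the `Tag` and `Pointer` types, the `resolve` helper and the lookup table standing in for the referenced storage are all hypothetical, since the source does not specify how a pointer is represented.

```python
from typing import NamedTuple, Union


class Pointer(NamedTuple):
    # Hypothetical indirection: a reference resolved against a lookup table.
    ref: str


class Tag(NamedTuple):
    key: str                    # e.g. "BU", "ROLE" or "INFRA"
    value: Union[str, Pointer]  # the actual data, or a pointer to it


def resolve(tag, store):
    """Return the tag's data, following the pointer if there is one."""
    if isinstance(tag.value, Pointer):
        return store[tag.value.ref]
    return tag.value


store = {"infra/42": "v2"}  # made-up table of referenced data
direct = Tag("ROLE", "ToR")
indirect = Tag("INFRA", Pointer("infra/42"))
```

Callers that always go through `resolve` do not need to know whether a given tag stores its value inline or by reference.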

In the context of the present description, "BU" (for business unit) is intended, as part of the tag of a data center network device, to represent a characteristic relating to the business or product offering for which the data center network device is used. The business or product offering may be that of the organization that deploys, controls and maintains the infrastructure, or that of a customer of that organization. For example, a BU may be server, cloud, hosting, etc., depending on the granularity, diversity and complexity of the catalog of business or product offerings.

In the context of the present specification, "ROLE" is intended to represent, as part of the tag of a data center network device, a characteristic related to the position and function that the data center network device occupies within the data center network infrastructure. For example, ROLE may be, without limitation: "aggregation", "Top of Rack" (ToR), "End of Row" (EoR), "spine", "megaspine", etc.

In the context of the present specification, "INFRA" is intended to represent, as part of the tag of a data center network device, a characteristic related to the version or generation of the infrastructure in which the data center network device is meant to operate, which may evolve over time with enhancements and evolutions.

In the context of the present specification, "LOCATION" is intended to represent, as part of the information associated with a data center network device, a characteristic related to the actual physical location of the data center network device in the data center. For example, LOCATION may be, without limitation: the name of a data center building, a particular data center room, etc.

In the context of the present specification, "MANAGEMENTIP" is intended to represent, as part of the information associated with a data center network device, a virtual identifier, such as an IP address, that is uniquely associated with the data center network device and that allows the device to be reached, for example by an automation tool, in order to perform operations on it such as retrieving information, changing its configuration, or upgrading it.

Still in the context of the present specification, "a" computer-readable medium and "the" computer-readable medium should not be construed as being the same computer-readable medium. To the contrary, and whenever appropriate, "a" computer-readable medium and "the" computer-readable medium may also be construed as a first computer-readable medium and a second computer-readable medium.

Still in the context of the present specification, unless expressly provided otherwise, the words "first", "second", "third", etc. have been used as adjectives only to allow the nouns they modify to be distinguished from one another, and not to describe any particular relationship between those nouns.

Implementations of the present technology each have at least one of the above-mentioned objects and/or aspects, but do not necessarily have all of them. It should be understood that some aspects of the present technology that have resulted from attempting to attain the above-mentioned objects may not satisfy those objects and/or may satisfy other objects not specifically recited herein.

Additional and/or alternative features, aspects and advantages of implementations of the present technology will become apparent from the following description, the accompanying drawings and the appended claims.

Description of the Drawings

For a better understanding of the present technology, as well as other aspects and further features thereof, reference is made to the following description, which is to be used in conjunction with the accompanying drawings, in which:

Figure 1 depicts a data center environment in which the present technology may be used;

Figure 2 presents a broad overview of the method according to the present technology;

Figures 3a to 3c provide a more detailed view of the method according to the present technology;

Figure 4 provides a logical illustration of the clustering/sub-clustering of network devices in a data center; and

Figure 5 illustrates a computing system that may be used in the present technology.

It should be noted that, unless otherwise expressly indicated herein, the drawings are not to scale. Further, elements that are identical from one figure to the next share the same reference numerals.

Detailed Description

The examples and conditional language recited herein are principally intended to aid the reader in understanding the principles of the present technology, and not to limit its scope to such specifically recited examples and conditions. It will be appreciated that those skilled in the art may devise various arrangements that, although not explicitly described or shown herein, nonetheless embody the principles of the present technology and are included within its spirit and scope.

Furthermore, as an aid to understanding, the following description may describe relatively simplified implementations of the present technology. As persons skilled in the art will understand, various implementations of the present technology may be of greater complexity.

In some cases, what are believed to be helpful examples of modifications to the present technology may also be set forth. This is done merely as an aid to understanding and, again, not to define the scope or set forth the bounds of the present technology. These modifications are not an exhaustive list, and a person skilled in the art may make other modifications while nonetheless remaining within the scope of the present technology. Further, where no examples of modifications have been set forth, it should not be interpreted that no modifications are possible and/or that what is described is the sole manner of implementing that element of the present technology.

Moreover, all statements herein reciting principles, aspects, and implementations of the present technology, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof, whether they are currently known or developed in the future. Thus, for example, it will be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative circuitry embodying the principles of the present technology. Similarly, it will be appreciated that any flowcharts, flow diagrams, state transition diagrams, pseudo-code, and the like represent various processes that may be substantially represented in non-transitory computer-readable media and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.

Software modules, or simply modules that are implied to be software, may be represented herein as any combination of flowchart elements or other elements indicating the performance of process steps and/or textual description. Such modules may be executed by hardware that is expressly or implicitly shown. Moreover, it should be understood that a module may include, for example and without limitation, computer program logic, computer program instructions, software, stack, firmware, hardware circuitry, or a combination thereof, that provides the required capabilities.

With these fundamentals in place, we will now consider some non-limiting examples to illustrate various implementations of aspects of the present technology. Figure 1 depicts a data center environment in which the present technology may be used. A network upgrade tool 100 provides a data center operator 101 with an input/output interface that allows the operator to enter inputs and options for the automation of network device maintenance activities, and to receive the status and results of such activities. The network upgrade tool 100 may be developed using the YAQL language (Yet Another Query Language) and may be interpreted as workflows within an open source software orchestrator that is used to launch and organize tasks, subtasks and standalone actions. For example, the network upgrade tool 100 may be part of a software stack 102 such as Mistral, which is one of the components of the OpenStack project (documentation available at https://docs.openstack.org/mistral/latest/). It will be apparent to those skilled in the art that other languages, software and software frameworks may be used while remaining within the teachings of the present disclosure.

The network upgrade tool 100 may further interface with a network configurator 105, an abstracted and unified application programming interface (API) system that provides the ability to interact with the network devices in the data center regardless of the diversity of their hardware models or network operating systems. The network configurator 105 may have an interface 106 with the data center network devices.

The network upgrade tool 100 may further be coupled to and interface with a CMDB 103, which may reference the network devices in the data center and store, for example, the tag and MANAGEMENTIP associated with each such referenced network device, for retrieval by the network upgrade tool 100. For example, when each new network device is deployed in the data center, it may be given a unique network device ID, for example by the data center operator 101, and that network device ID may be used to retrieve from the CMDB 103 the tag and MANAGEMENTIP associated with the deployed network device.

The network upgrade tool 100 may further interface with an upgrade path DB 104, which may reference the network devices in the data center and store, in association with each such referenced network device or with groups of such network devices as the case may be, the path to a target network operating system level (i.e. all necessary intermediate operating system levels between the current level and the target level). The network upgrade tool 100 may retrieve this path and use it for automated maintenance.
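As a minimal sketch of how such a stored path might be consulted, the Python below models the upgrade path DB 104 as an in-memory mapping. The key shape (hardware model, current level), the level strings and the function name `upgrade_steps` are illustrative assumptions, not details taken from the present disclosure.

```python
from typing import Dict, List, Tuple

# Hypothetical content of the upgrade path DB: for a given hardware model and
# current OS level, the ordered OS levels to install, ending with the target.
UPGRADE_PATHS: Dict[Tuple[str, str], List[str]] = {
    ("model-a", "1.0"): ["1.4", "2.0", "3.1"],  # two intermediate levels
    ("model-a", "2.0"): ["3.1"],                # direct upgrade to the target
}


def upgrade_steps(model: str, current_level: str) -> List[str]:
    """Return the ordered OS levels to install, or raise if no path is stored."""
    try:
        return list(UPGRADE_PATHS[(model, current_level)])
    except KeyError:
        raise LookupError(
            f"no upgrade path stored for {model} at level {current_level}"
        ) from None
```

A device for which no path is stored raises an error rather than being upgraded blindly, which matches the general approach of excluding devices that cannot be handled automatically.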

The network upgrade tool 100 may further interface with a rules DB 112, which may store rules that the network upgrade tool 100 may retrieve and use for automated maintenance. For example, the network upgrade tool 100 may check whether certain network devices comply with certain rules. A rule may, for example, detail the expected redundancy of network devices in the data center that share the same INFRA tag value. Other rules may, for example, apply to network devices that share the same combination of BU-ROLE-INFRA tag values.

Those skilled in the art will appreciate that, although represented as three separate physical and logical entities, the CMDB 103, the upgrade path DB 104 and the rules DB 112 may, in whole or in part, form part of the same physical and/or logical database, or be physically part of the network upgrade tool 100, without affecting the generality of the teachings herein. Further, depending on the corporate and development environment of the organization that deploys, controls and maintains the infrastructure, the rules DB 112 may not even be a separate database, and the rules may be hard-coded, for example in the network upgrade tool 100.

The data center may include a plurality of network devices 109 interconnected through connections 110 in the infrastructure, as shown in Figure 1 as a simplified example only. Each network device 109 may have associated with it, as part of its tag:

- a ROLE;

- an INFRA 111; and

- a BU 108.

In addition, LOCATION information 107a-107b is also associated with each network device 109.
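The per-device information described above can be pictured as a simple record. The following Python is only a sketch: the class name `DeviceRecord`, the field names and the sample values are hypothetical, and storing a list of values per tag is an assumption made so that missing or duplicated tag values remain representable.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DeviceRecord:
    """Hypothetical CMDB record for one network device 109."""

    device_id: str       # unique network device ID
    management_ip: str   # MANAGEMENTIP used to reach the device
    location: str        # LOCATION, e.g. a building or room name
    # Tag values keyed by tag name (BU, ROLE, INFRA); one list per key so
    # that zero or several values can be stored and detected later.
    tags: Dict[str, List[str]] = field(default_factory=dict)


example = DeviceRecord(
    device_id="ND01",
    management_ip="192.0.2.10",  # address from a documentation-only range
    location="building-A/room-2",
    tags={"BU": ["cloud"], "ROLE": ["spine"], "INFRA": ["gen3"]},
)
```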

Figure 2 presents a broad overview of the method according to the present technology. At steps 201 and 202, certain parameters of the network devices 109 to be maintained in the data center (which may appear on a list of devices to be maintained) may be collected/obtained. At step 201, this includes the one or more tags associated with each of the network devices 109, including their BU, ROLE and INFRA, as well as their MANAGEMENTIP and LOCATION information. At step 202, this includes obtaining from the network devices 109, using their respective MANAGEMENTIP information, certain parameters related to their hardware model and their current network operating system.

At step 203, those network devices 109 that have unsupported hardware or software characteristics (such as an unsupported hardware model or current network operating system), i.e. that are not amenable to the automated maintenance of the present technology, or whose BU, ROLE or INFRA tags or LOCATION information are erroneous, may be removed from the maintenance list. For example, their network operating system level may not be a supported level, or their hardware model may be outdated or unrecognized. As another example, their tags may not have a value recognized by the automated maintenance system of the present technology, or may have (strictly) more or fewer than one single value for any of the BU, ROLE or INFRA tags.

At step 204, the remaining network devices 109 on the list may be clustered as follows:

- forming i clusters BU_i of network devices 109 sharing the same BU tag;

- within each BU_i cluster, forming j clusters ROLE_j of network devices 109 sharing the same ROLE tag;

- within each ROLE_j cluster, forming k clusters Cluster_ijk of network devices 109 sharing the same INFRA tag.

Those skilled in the art will appreciate that the network devices 109 may be clustered in an order different from the one described to arrive at the Cluster_ijk clusters, while remaining within the teachings of the present disclosure. For example, clusters may first be formed of network devices 109 sharing the same ROLE tag, then of network devices 109 sharing the same BU tag, and so on.
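The clustering of step 204 can be sketched in a few lines of Python. The sketch assumes each device has already been reduced to a plain dictionary holding exactly one BU, ROLE and INFRA value; the dictionary keys, sample values and function name are hypothetical. Grouping on the composite key (BU, ROLE, INFRA) yields the same final clusters whichever tag is considered first.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

ClusterKey = Tuple[str, str, str]  # (BU, ROLE, INFRA)


def cluster_devices(devices: List[dict]) -> Dict[ClusterKey, List[str]]:
    """Group device IDs by their (BU, ROLE, INFRA) tag combination."""
    clusters: Dict[ClusterKey, List[str]] = defaultdict(list)
    for device in devices:
        key = (device["BU"], device["ROLE"], device["INFRA"])
        clusters[key].append(device["id"])
    return dict(clusters)


devices = [
    {"id": "ND01", "BU": "cloud", "ROLE": "spine", "INFRA": "gen3"},
    {"id": "ND02", "BU": "cloud", "ROLE": "spine", "INFRA": "gen3"},
    {"id": "ND03", "BU": "cloud", "ROLE": "tor", "INFRA": "gen3"},
    {"id": "ND04", "BU": "hosting", "ROLE": "spine", "INFRA": "gen2"},
]
clusters = cluster_devices(devices)
```

With the sample data, ND01 and ND02 end up in the same cluster, while ND03 and ND04 each form a cluster of their own.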

At step 205, redundancy between network devices 109 may be identified within each Cluster_ijk formed. This redundancy may be defined at the INFRA_k level by redundancy rules that the automated system of the present technology may apply to gather redundant network devices 109 into groups within each Cluster_ijk. A group may include one network device 109 (which then has no redundancy) or two or more network devices 109.

At step 206, within each group formed in a Cluster_ijk cluster, the compliance of the network devices 109 present in the group may be checked against specific size rules applicable to the particular combination BU_i-ROLE_j-INFRA_k. For example, a size rule may state that all groups in a Cluster_ijk cluster must include 3 network devices 109. All network devices 109 belonging to a group that includes (strictly) more or fewer than 3 network devices 109 are then non-compliant. In addition, a check may be performed as to whether all network devices 109 in a group share the same LOCATION information.
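The two checks of step 206 can be sketched as follows, assuming groups are lists of device IDs and LOCATION values are available in a dictionary. The function name, argument names and data shapes are illustrative assumptions only.

```python
from typing import Dict, List


def noncompliant_devices(
    groups: List[List[str]],
    locations: Dict[str, str],
    expected_size: int,
) -> List[str]:
    """Return the IDs of all devices in groups that fail the size or LOCATION check."""
    rejected: List[str] = []
    for group in groups:
        wrong_size = len(group) != expected_size
        # More than one distinct LOCATION within a group is a mismatch.
        mixed_location = len({locations[device_id] for device_id in group}) > 1
        if wrong_size or mixed_location:
            rejected.extend(group)  # the whole group is excluded
    return rejected
```

Note that a failing check excludes every device of the group, not only the odd one out, since the group as a whole can no longer be trusted to be redundant.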

At step 207, those network devices 109 that were not clustered or grouped at steps 204 and 205, that do not comply with the redundancy rules or size rules, or that, following steps 205 and 206, do not share the same LOCATION information while being present in the same group, may be removed from the maintenance list.

At step 208, the automated process according to the present technology may upgrade all remaining network devices 109 of a group in turn. Because these network devices 109 are redundant, disruption to the functioning of the data center and to the network connectivity services provided to customers of the organization that deploys, controls and maintains the data center is minimized, and partial or total technical unavailability of the data center owing to network device maintenance is limited as much as possible. The same may be done for all groups created in a Cluster_ijk and for all Cluster_ijk clusters formed. The same may be performed in parallel for a suitably adjusted number of Cluster_ijk clusters so as to speed up the maintenance process, while minimizing both the overall downtime of the data center and disruption of service to customers of the organization that deploys, controls and maintains the data center. The order in which Cluster_ijk clusters are handled may also be adjusted to optimize continuity of service to the organization's customers.
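A minimal sketch of this scheduling, under stated assumptions: devices within a group are upgraded one at a time, while a bounded number of clusters is processed in parallel. The callable `upgrade_one`, the `max_parallel` parameter and the use of a thread pool are illustrative choices, not details taken from the present disclosure.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def upgrade_cluster(
    groups: List[List[str]], upgrade_one: Callable[[str], None]
) -> List[str]:
    """Upgrade every group of one cluster; devices are handled one at a time."""
    done: List[str] = []
    for group in groups:
        for device_id in group:
            upgrade_one(device_id)  # blocks until this device is finished
            done.append(device_id)
    return done


def upgrade_all(
    clusters: Dict[str, List[List[str]]],
    upgrade_one: Callable[[str], None],
    max_parallel: int = 2,
) -> Dict[str, List[str]]:
    """Process up to max_parallel clusters concurrently."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {
            name: pool.submit(upgrade_cluster, groups, upgrade_one)
            for name, groups in clusters.items()
        }
        return {name: future.result() for name, future in futures.items()}
```

Because each cluster is handled by a single worker, at most one device of any redundant group is being upgraded at a given time, which is what keeps its peers available.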

Figures 3a to 3c provide a more detailed view of the method according to the present technology. Referring to Figure 3a, at step 301, an attempt may be made to obtain the current network operating system level and the hardware model of the network devices to be maintained in the data center. These network devices may be identified by a list of IDs on a maintenance list. Referring back to Figure 1, this may be performed, for example, by: (i) providing the network upgrade tool 100 (by the data center operator 101 or otherwise) with a maintenance list of network device IDs, (ii) the network upgrade tool 100 collecting the corresponding MANAGEMENTIP from the CMDB 103, and (iii) the network upgrade tool 100 using the retrieved MANAGEMENTIP to query each network device to be maintained for its current network operating system level and hardware model. The collection/obtaining is not necessarily successful: at steps 302 and 307, the IDs of network devices for which it failed may be added to an error list and removed from the maintenance list.
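The collect-or-reject pattern of steps 301, 302 and 307 can be sketched as below. The `probe` callable stands in for whatever query is sent to a device via its MANAGEMENTIP; it, the dictionary shapes and the function name are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def collect_device_facts(
    maintenance_list: List[str],
    management_ips: Dict[str, str],
    probe: Callable[[str], dict],
) -> Tuple[List[str], Dict[str, dict], List[Tuple[str, str]]]:
    """Query each device; failures go to the error list, not the maintenance list."""
    facts: Dict[str, dict] = {}
    errors: List[Tuple[str, str]] = []
    for device_id in maintenance_list:
        try:
            ip = management_ips[device_id]  # missing MANAGEMENTIP is an error too
            facts[device_id] = probe(ip)    # e.g. hardware model and OS level
        except Exception as exc:            # broad on purpose in this sketch
            errors.append((device_id, str(exc)))
    remaining = [d for d in maintenance_list if d in facts]
    return remaining, facts, errors
```

Keeping the reason alongside each rejected ID costs nothing at this stage and makes a finer-grained error report possible later.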

At step 303, a determination may be made, for the remaining network devices on the maintenance list (i.e. those whose IDs were not added to the error list after steps 301/302), as to whether the hardware model obtained at step 301 is a supported hardware model. As used herein, a "supported hardware model" is a hardware model that the maintenance operations of the data center are able to handle. Referring back to Figure 1, this may be performed, for example, by: (i) storing (by the data center operator 101 or otherwise) the supported hardware models in the rules DB 112, (ii) the network upgrade tool 100 retrieving those supported hardware models from the rules DB 112, and (iii) the network upgrade tool 100 comparing the retrieved supported hardware models with the hardware models obtained at step 301. If the determination is unsuccessful, or the hardware model is not a supported hardware model, the ID of the corresponding network device may be added to the error list and removed from the maintenance list at steps 304 and 307.

At step 305, an attempt may be made to collect the tags, as well as the LOCATION and MANAGEMENTIP information, of the remaining network devices on the maintenance list. This may be performed by the network upgrade tool 100 collecting that information from the CMDB 103. The collection is not necessarily successful: at steps 306 and 307, the IDs of network devices for which it failed may be added to the error list and removed from the maintenance list.

Referring to Figure 3b, at step 310, a determination may be made as to whether each of the remaining network devices on the maintenance list has one and only one value for each of the BU, ROLE and INFRA tags (as collected at step 305). At steps 311 and 312, the IDs of network devices that have no value, or more than one value, for any of the BU, ROLE or INFRA tags may be added to the error list and removed from the maintenance list.
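The "one and only one value" check of step 310 might look as follows in Python, assuming tag values were collected as lists keyed by tag name. The constant `REQUIRED_TAGS`, the function name and the data shape are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

REQUIRED_TAGS = ("BU", "ROLE", "INFRA")


def split_by_tag_validity(
    tags_by_device: Dict[str, Dict[str, List[str]]],
) -> Tuple[List[str], List[str]]:
    """Separate devices having exactly one value per required tag from the rest."""
    valid: List[str] = []
    errors: List[str] = []
    for device_id, tags in tags_by_device.items():
        # A missing tag counts as zero values.
        ok = all(len(tags.get(name, [])) == 1 for name in REQUIRED_TAGS)
        (valid if ok else errors).append(device_id)
    return valid, errors
```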

At step 313, the remaining network devices on the maintenance list may be grouped into clusters sharing the same BU tag. At steps 314 and 312, the IDs of network devices having an unknown/unrecognized value for BU may be added to the error list and removed from the maintenance list.

At step 316, within each BU cluster, the remaining network devices on the maintenance list may be grouped into sub-clusters sharing the same ROLE tag. At steps 317 and 312, the IDs of network devices having an unknown/unrecognized value for ROLE may be added to the error list and removed from the maintenance list.

Figure 4 provides a logical illustration of the clustering/sub-clustering of network devices in a data center according to steps 313 and 316 of Figure 3b. Only a portion of the data center network devices 401 is represented. The network devices 401 each have a unique ID, illustrated as "NDxx". As a result of step 313, the network devices 401 are each clustered into BU clusters 402, illustrated as BU01 and BU11. As a result of step 316, the network devices 401 are each sub-clustered into ROLE sub-clusters 403, illustrated as ROLE01, ROLE07, ROLE13, etc.

Turning now to Figure 3c, at step 320, for each cluster BU_i created at step 313 and each sub-cluster ROLE_j created at step 316, the network devices may be further grouped into sub-sub-clusters Cluster_ijk sharing the same INFRA tag k (INFRA_k).

At step 321, the network devices that appear to be redundant within each Cluster_ijk may be further gathered into groups of redundant network devices, according to the redundancy rules applicable to the corresponding INFRA_k. Referring back to Figure 1, this may be performed, for example, by: (i) storing (by the data center operator 101 or otherwise) redundancy rules specific to INFRA_k in the rules DB 112, (ii) the network upgrade tool 100 retrieving those redundancy rules from the rules DB 112, and (iii) the network upgrade tool 100 gathering the network devices into groups within each Cluster_ijk according to those redundancy rules. For example, a rule may be a normalization of network device IDs such that the IDs inherently carry the redundancy information. For example, a rule may be that the IDs of redundant network devices in an INFRA_k end with the same string of characters specific to that INFRA_k.
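One way to picture the ID-suffix rule is the Python sketch below. It assumes, purely for illustration, that the shared string is whatever follows the last separator character in the ID; the actual normalization is defined per INFRA_k in the redundancy rules and is not specified here. The function name and the sample IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_suffix(device_ids: List[str], separator: str = "-") -> List[List[str]]:
    """Gather device IDs that end with the same suffix into redundancy groups."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for device_id in device_ids:
        # Text after the last separator; the whole ID if no separator is present.
        suffix = device_id.rsplit(separator, 1)[-1]
        groups[suffix].append(device_id)
    # Sorted output keeps the result deterministic.
    return [sorted(members) for _, members in sorted(groups.items())]
```

A device whose suffix matches no other device forms a group of one, i.e. a device without redundancy, which the size check can then accept or reject.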

At step 322, a check may be made as to whether the hardware model of each network device present in a group is one of the one or several expected hardware models, according to the rules applicable to the corresponding combination BU_i-ROLE_j-INFRA_k. Referring back to Figure 1, this may be performed, for example, by: (i) storing (by the data center operator 101 or otherwise) expected hardware model rules specific to BU_i-ROLE_j-INFRA_k in the rules DB 112, (ii) the network upgrade tool 100 retrieving those expected hardware model rules from the rules DB 112, and (iii) the network upgrade tool 100 checking the network devices in the groups of each Cluster_ijk against those rules.

At step 323, the actual number of network devices present in each group may be checked against the size rules applicable to the corresponding combination BU_i-ROLE_j-INFRA_k. Referring back to Figure 1, this may be performed, for example, by: (i) storing (by the data center operator 101 or otherwise) size rules (the expected number of network devices) specific to BU_i-ROLE_j-INFRA_k in the rules DB 112, (ii) the network upgrade tool 100 retrieving those size rules from the rules DB 112, and (iii) the network upgrade tool 100 checking the number of network devices in the groups of each Cluster_ijk against those size rules.

At step 324, it may be verified whether all network devices present in each group share the same LOCATION information. This LOCATION information has been available to the network upgrade tool 100 since step 305.

At steps 325 and 326, all network devices that were not grouped at step 320 or 321, that did not match one of the expected hardware models at step 322, that did not add up to the expected number in their group at step 323, or that did not share the same LOCATION while being in the same group, may be added to the error list and removed from the maintenance list.

At step 327, the remaining network devices on the maintenance list may be maintained: this is performed in turn for all network devices in a group, for all Cluster_ijk. Referring back to Figure 1, this may be performed, for example, by: (i) storing (by the data center operator 101 or otherwise) in the upgrade path DB 104 the upgrade rules for bringing a network device from its current network operating system level to the target network operating system level, (ii) the network upgrade tool 100 retrieving those upgrade rules from the upgrade path DB 104, and (iii) the network upgrade tool 100 applying the upgrade through the network configurator 105.

With the present technology, the network devices involved in providing the same service to the same customers of the data center have been identified and grouped together and, after removal of those network devices that are non-compliant and for which automated maintenance would be too risky, maintenance is performed in turn on all network devices belonging to a group. The service is therefore not fully interrupted, but only degraded, as the case may be.

Those skilled in the art will appreciate that the error list referred to at steps 307, 312 and 326 may be given a finer granularity by keeping track of the cause of each error. For example, the error list may be broken down into sub-lists that distinguish between (i) errors that the data center operator 101 may be able to correct, such as errors in tagging, classification or identification, and (ii) errors linked to a faulty network device, such as through its network operating system level.
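A cause-aware error list of this kind could be sketched as follows. The cause codes, the two category sets and the sub-list names are all hypothetical; only the split between operator-correctable errors and device-related errors comes from the description above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical cause codes, split by who can act on them.
OPERATOR_FIXABLE = {"missing_tag", "duplicate_tag", "unknown_tag_value", "location_mismatch"}
DEVICE_RELATED = {"unreachable", "unsupported_os_level", "unsupported_hardware"}


def split_error_list(errors: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Break (device_id, cause) pairs down into sub-lists by category of cause."""
    sublists: Dict[str, List[str]] = defaultdict(list)
    for device_id, cause in errors:
        if cause in OPERATOR_FIXABLE:
            sublists["operator_fixable"].append(device_id)
        elif cause in DEVICE_RELATED:
            sublists["device_related"].append(device_id)
        else:
            sublists["other"].append(device_id)  # unclassified causes are kept, not dropped
    return dict(sublists)
```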

Figure 5 illustrates a computing system that may be used in the present technology. An example of an implementation of a computing system 500 that may be used for the network upgrade tool 100 and/or the network configurator 105 is presented. As will be appreciated by those skilled in the art, such a computing system may be implemented in any other suitable hardware, software and/or firmware, or a combination thereof, and may be a single physical entity or several separate physical entities with distributed functionality.

In some aspects of the present technology, the computing system 500 may include various hardware components, including one or more single-core or multi-core processors collectively represented by a processor 501, a solid-state drive 502, a memory 503, and an input/output interface 504. In this context, the processor 501 may or may not be included in an FPGA. In some other aspects, the computing system 500 may be an "off-the-shelf" general-purpose computing system. In some aspects, the computing system 500 may also be distributed among multiple systems. The computing system 500 may also be dedicated to implementing the present technology. As will be appreciated by those skilled in the art, many variations on how the computing system 500 is implemented may be envisaged without departing from the scope of the present technology.

Communication between the various components of the computing system 500 may be enabled by one or more internal and/or external buses 505 (e.g., a PCI bus, a universal serial bus, an IEEE 1394 "FireWire" bus, a SCSI bus, a Serial ATA bus, an ARINC bus, etc.) to which the various hardware components are electronically coupled.

The input/output interface 504 may enable networking capabilities such as wired or wireless access. As an example, the input/output interface 504 may include a networking interface such as, but not limited to, a network port, a network socket, a network interface controller, and the like. Multiple examples of how the networking interface may be implemented will be apparent to those skilled in the art. According to implementations of the present technology, the solid-state drive 502 may store program instructions, such as those forming part of a library, an application, etc., suitable for being loaded into the memory 503 and executed by the processor 501 for the methods and process steps according to the present technology.

Although the above implementations have been described and shown with reference to particular steps performed in a particular order, it will be understood that these steps may be combined, subdivided, or reordered without departing from the teachings of the present disclosure. At least some of the steps may be performed in parallel or in series. Accordingly, the order and grouping of the steps is not a limitation of the present technology. It should further be expressly understood that not all technical effects mentioned herein need to be enjoyed in each and every implementation of the present technology.

Modifications and improvements to the above-described implementations of the present technology may become apparent to those skilled in the art. The foregoing description is intended to be exemplary rather than limiting. The scope of the present technology is therefore intended to be limited solely by the scope of the appended claims.

Claims

1. A method for maintaining a plurality of network devices in a data center, comprising:

collecting, for at least one network device of the plurality of network devices, the at least one network device selected from a maintenance list:

-a management IP, the management IP being a virtual identity uniquely associated with the at least one network device;

-a location, which is information representative of an actual physical location of the at least one network device; and

-a tag, the tag being a set of key values associated with the at least one network device, each key value being one of data and a pointer to the data,

wherein the tag is one of:

-a BU tag, said BU tag being information representative of a product provisioning application of said at least one network device in said data center;

-a ROLE tag being information representative of a function occupied by the at least one network device in the data center; and

-an INFRA tag, the INFRA tag being information representative of a version of the data center in which the at least one network device is operating;

for the at least one network device on the maintenance list, acquiring a hardware model and a current network operating system level of the at least one network device based on the management IP of the at least one network device; and

in response to determining at least one of:

the hardware model of the at least one network device is not supported,

the current network operating system level of the at least one network device is not supported, and

an error in at least one of the BU tag, the ROLE tag, the INFRA tag, and the location,

removing the at least one network device from the maintenance list.

2. The method of claim 1, further comprising determining at least one of:

the hardware model of the at least one network device is not supported, and

an error in at least one of the BU tag, the ROLE tag, the INFRA tag, and the location.

3. The method of claim 1, further comprising:

collecting, for each network device of the plurality of network devices:

the management IP,

the location, and

the tag;

in response to removing the at least one network device from the maintenance list, clustering remaining network devices on the maintenance list into:

i clusters BU_i of network devices having the same BU tag value among the associated tags of the remaining network devices,

j clusters ROLE_j of network devices having the same ROLE key value in one of the associated tags of the remaining network devices, and

k clusters Cluster_ijk, within the ROLE_j clusters, of network devices having the same INFRA tag value in one of the associated tags of the remaining network devices;

creating, within each Cluster_ijk cluster, groups of redundant network devices of the plurality of network devices;

verifying, according to redundancy rules applicable to INFRA_k, whether the number of redundant network devices in each created group matches a first number; and

verifying, according to size rules applicable to the combination BU_i-ROLE_j-INFRA_k, whether the number of network devices in each group matches a second number.

4. The method of claim 3, further comprising:

verifying whether the network devices of each group share the same location;

removing from the maintenance list the following network devices: at least one network device present in a given group and having a number of redundancies that does not match the first number; at least one network device present in another given group in a number that does not match the second number; or at least one network device present in another group and not sharing the same location.

5. The method of claim 4, further comprising:

upgrading, for all groups within all Cluster_ijk clusters, the remaining network devices in each group from a current version of the network operating system to a target version of the network operating system, using upgrade rules and the corresponding management IPs.

6. The method of claim 1, wherein the collecting and the acquiring are performed using a unique ID assigned to each of the network devices on the maintenance list by a data center operator at a time of deployment of the data center, and the unique ID is associated with any of the associated tags of each network device, the management IP, and the location.

7. The method of claim 5, wherein removing from the maintenance list further comprises: removing network devices that do not have exactly one value for each of the BU tag, the ROLE tag, and the INFRA tag.

8. The method of claim 6, wherein removing from the maintenance list further comprises: creating an error list populated with the IDs of the removed network devices.

9. The method of claim 8, wherein the error list further comprises: a sub-list of errors that can be corrected by the data center operator and a sub-list of errors that are linked to a faulty network device.

10. The method of claim 6, wherein the upgrading further comprises: adjusting the number of Cluster_ijk clusters processed in parallel to minimize downtime of the data center.

11. The method of claim 5, wherein the upgrading further comprises: adjusting the order in which the Cluster_ijk clusters are processed to optimize continuity of service to customers of an organization that deploys, controls, and maintains the data center.

12. A system for maintaining network devices in a data center, the system comprising a network upgrade tool coupled to a CMDB, a rules DB, an upgrade path DB, and a network configurator, the network upgrade tool configured to:

-collecting from the CMDB for each of a plurality of network devices on a maintenance list:

-a management IP, said management IP being a virtual identity uniquely associated with a given network device;

-a location, which is information representative of an actual physical location of the given network device; and

-a tag, the tag being a key-value tuple associated with the given network device,

wherein the tag is one of:

-information, BU, representative of a service or product offering for which the given network device is used in the data center;

-information, ROLE, representative of the position and function occupied by the given network device in the data center; and

-information, INFRA, representative of a version or generation of the data center in which the given network device operates;

wherein the value of the tag is either the actual data or a pointer to the data;

-for each of the network devices on the maintenance list, obtaining the hardware model and current network operating system level of the given network device using the respective management IP;

-removing from the maintenance list: a network device having an unsupported hardware model; or a network device having a current network operating system level that is not supported; or a network device having an error in any of its associated BU tag, ROLE tag, and INFRA tag or in its location;

-clustering the remaining network devices on the maintenance list into: i clusters BU_i of network devices having the same BU key value in one of the associated tags of the remaining network devices; within the BU_i clusters, j clusters ROLE_j of network devices having the same ROLE key value in one of the associated tags of the remaining network devices; and within the ROLE_j clusters, k clusters Cluster_ijk of network devices having the same INFRA key value in one of the associated tags of the remaining network devices;

-creating, within each Cluster_ijk cluster, groups of redundant network devices according to the redundancy rules applicable to INFRA_k collected from the rules DB;

-verifying, according to the redundancy rules applicable to INFRA_k, whether the number of redundant network devices in each group matches a first number;

-verifying, according to the size rules applicable to the combination BU_i-ROLE_j-INFRA_k, whether the number of network devices in each group matches a second number;

-verifying whether all network devices in each created group share the same location;

-removing from the maintenance list the following network devices: network devices present in one of the groups and having a number of redundancies that does not match the first number; network devices present in one of the groups in a number that does not match the second number; or network devices that are present in one of the groups and that do not share the same location; and

-upgrading, through the network configurator, for all groups within all Cluster_ijk clusters, the remaining network devices in each group from a current version of the network operating system to a target version of the network operating system, using the upgrade rules collected from the upgrade path DB and the corresponding management IPs.

13. The system of claim 12, wherein the network upgrade tool is further configured to perform the collecting and the obtaining using a unique ID assigned to each of the plurality of network devices on the maintenance list by a data center operator at the time of deployment of the data center, the unique ID being associated with any of the associated tags of each network device, the management IP, and the location.

14. The system of claim 12, wherein the redundancy rules and the size rules are hard-coded in the network upgrade tool.

15. A computer-readable medium comprising instructions that cause a computing system to perform the method of claim 1.