
WO2023217396A1 - Test channel scheduling using artificial intelligence - Google Patents


Info

Publication number
WO2023217396A1
WO2023217396A1 · PCT/EP2022/063105 · EP2022063105W
Authority
WO
WIPO (PCT)
Prior art keywords
testbed
lists
task
task lists
testbeds
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/EP2022/063105
Other languages
French (fr)
Inventor
Sahar TAHVILI
Chen Song
Jiecong YANG
Yulin CUI
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Telefonaktiebolaget LM Ericsson AB
Original Assignee
Telefonaktiebolaget LM Ericsson AB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Telefonaktiebolaget LM Ericsson AB filed Critical Telefonaktiebolaget LM Ericsson AB
Priority to PCT/EP2022/063105 priority Critical patent/WO2023217396A1/en
Publication of WO2023217396A1 publication Critical patent/WO2023217396A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Prevention of errors by analysis, debugging or testing of software
    • G06F11/3668Testing of software
    • G06F11/3672Test management
    • G06F11/3688Test management for test execution, e.g. scheduling of test suites
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Prevention of errors by analysis, debugging or testing of software
    • G06F11/3698Environments for analysis, debugging or testing of software
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/006Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Definitions

  • For testing cloud-based Radio Access Network (RAN) applications such as a virtual Distributed Unit (vDU) and a virtual Central Unit (vCU), an “epic” needs to be created manually.
  • An “epic” refers to a high-level requirement, task, or feature set that teams can break down into smaller user stories, and is a term used in software development, such as for agile software development.
  • For testing an epic, several test cases need to be executed on a cloud-native infrastructure, such as a test channel. Each test channel may have a different configuration and thereby incurs different costs.
  • a manual mapping between the created epics and test channels needs to be done daily. Manual scheduling of the epics to the test channels has required a team that masters the configuration and cost of each test channel.
  • Embodiments disclosed herein are able to handle (read, parse, and analyze) different kinds of unstructured natural text, such as, for example, Word documents, and Excel worksheets, as input.
  • An epic may consist of a set of task lists defining a testing protocol for a cloud-based application.
  • a cloud-native test channel needs to be assigned that satisfies the required configuration for the test. For example, depending on the set of user stories or tasks within the epic, different configurations may be needed.
  • each cloud-native test channel has different configurations and capabilities, such as Radio Gateway (RGW), 5G Spectrum Sharing, Traffic, and external simulators.
  • the cost of building and using each cloud-native test channel depends on the test channel’s configuration and capabilities.
  • a set of test cases should be defined and executed. Mapping a test channel with a proper configuration and capacities has required efforts from subject matter experts (SMEs).
  • each epic has a different priority, starting date, and also duration. During a study, it was assessed that using a test channel amounts to a significant cost.
  • an AI-based decision support system for dynamic scheduling of the upcoming epics on the cloud-native test channels is introduced, implemented, and evaluated.
  • the system includes two main phases including different artificial intelligence techniques such as natural language processing and reinforcement learning.
  • the system reads an epic (e.g., in the form of a Jira ticket written in an unstructured natural text) as input and schedules it for the testing considering several criteria such as test channel configuration, availability, and also cost.
  • the system can easily be adapted for other testing activities such as reading a troubleshooting ticket as input and scheduling it for root cause analysis and further testing.
  • a method for testing a system using testbeds includes obtaining a set of task lists.
  • a task list is associated with a set of compatible testbed configurations, a priority, a starting time, and a duration, and the set of task lists defines a testing protocol for the system.
  • the method includes scheduling the set of task lists on a set of testbeds using reinforcement learning. The scheduling is based on one or more of a cost of a testbed matching an associated testbed configuration, an availability of the testbed matching the associated testbed configuration, a compatibility of the testbed matching the associated testbed configuration, an associated priority, and an associated starting time and duration for a given task list.
  • the system comprises a cloud-based application
  • the testbeds comprise cloud-native test channels
  • the testbed configurations comprise software, hardware, and networking resources.
  • the method further includes initiating execution of the set of task lists on the set of testbeds based on the step of scheduling the set of task lists on a set of testbeds using reinforcement learning.
  • obtaining a set of task lists comprises extracting information about the task lists from natural text using natural language processing.
  • extracting information about the task lists from natural text using natural language processing comprises: tokenizing the task lists from natural text into a group of tokenized words; removing stop words from the group of tokenized words; applying part-of-speech tagging on the group of tokenized words; and selecting textual features from the tagged group of tokenized words.
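The four preprocessing steps above (tokenization, stop-word removal, part-of-speech tagging, textual feature selection) can be sketched in Python. This is a minimal illustration, not the patented implementation: the stop-word list is abbreviated, the tagger is a toy suffix heuristic standing in for a trained tagger, and the epic text and capability vocabulary are hypothetical.

```python
import re

# Abbreviated stop-word list for illustration; a real system would use a
# full list from an NLP library.
STOP_WORDS = {"the", "a", "an", "for", "on", "with", "and", "is", "to", "of"}

def tokenize(text):
    # Step 1: split raw epic text into lowercase word tokens
    return re.findall(r"[a-z0-9]+", text.lower())

def remove_stop_words(tokens):
    # Step 2: drop tokens that carry no scheduling-relevant meaning
    return [t for t in tokens if t not in STOP_WORDS]

def pos_tag(tokens):
    # Step 3: toy part-of-speech tagger (suffix heuristic); a production
    # system would use a trained tagger instead
    tagged = []
    for t in tokens:
        if t.endswith("ing"):
            tagged.append((t, "VBG"))   # gerund/participle
        elif t.isdigit():
            tagged.append((t, "CD"))    # number
        else:
            tagged.append((t, "NN"))    # default: treat as noun
    return tagged

def select_features(tagged, vocabulary):
    # Step 4: keep only tokens matching a domain vocabulary,
    # e.g. known test-channel capabilities
    return [t for t, _ in tagged if t in vocabulary]

# Hypothetical epic fragment and capability vocabulary
epic = "Testing the vDU with traffic and 5g spectrum sharing on RedHat"
vocab = {"vdu", "traffic", "5g", "spectrum", "sharing", "redhat"}
features = select_features(pos_tag(remove_stop_words(tokenize(epic))), vocab)
# features: ['vdu', 'traffic', '5g', 'spectrum', 'sharing', 'redhat']
```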
  • obtaining a set of task lists further comprises generating the associated set of compatible testbed configurations using a machine learning based classifier based on the information about the task lists extracted from natural text. In some embodiments, obtaining a set of task lists further comprises generating the associated set of compatible testbed configurations using a rules-based classification based on the information about the task lists extracted from natural text. In some embodiments, the method further includes identifying a set of compatible testbeds for the set of compatible testbed configurations. In some embodiments, the task list is a collection of tasks related to a software development project goal. In some embodiments, the task list is in the form of an unstructured natural text.
  • a server configured for testing a system using testbeds.
  • the server is configured to obtain a set of task lists.
  • a task list is associated with a set of compatible testbed configurations, a priority, a starting time, and a duration, and the set of task lists defines a testing protocol for the cloud-based application.
  • the server is configured to schedule the set of task lists on a set of testbeds using reinforcement learning, wherein the scheduling is based on one or more of a cost of a testbed matching an associated testbed configuration, an availability of the testbed matching the associated testbed configuration, a compatibility of the testbed matching the associated testbed configuration, an associated priority, and an associated starting time and duration for a given task list.
  • FIG. 1 illustrates a framework according to an embodiment.
  • FIG. 2 illustrates an overview of a test channel configuration for one cloud-native test channel, according to an embodiment.
  • FIG. 3 illustrates a sample of an epic in a Jira ticket format, according to an embodiment.
  • FIG. 4 illustrates using a rule-based or ML-based classification, according to an embodiment.
  • FIG. 5 illustrates a customized version of the Q-learning model according to an embodiment.
  • FIG. 6 illustrates a graphical user interface (GUI) such as may be used according to an embodiment
  • FIG. 7 illustrates a schematic overview of a cloud implementation according to an embodiment.
  • FIG. 8 is a flowchart illustrating a process according to an embodiment.
  • FIG. 9 is a block diagram of an apparatus according to an embodiment.
  • An AI-based decision support system implemented in Python is disclosed herein. Other implementation languages are possible.
  • the system is able to read an epic and generate the required cloud-based configuration for performing testing. Moreover, the system ranks and assigns each epic for testing based on the cloud-native infrastructure costs and availability, and it also considers the priority of each epic. Note that, due to the embedded natural language processing (NLP) techniques, the system is not limited to the current version of the epics (Jira tickets); it can easily read, parse, and analyze any text document written in an unstructured natural text.
  • FIG. 1 illustrates a framework according to an embodiment.
  • the framework 100 includes a first phase 104 for performing text analysis using NLP and a second phase 106 for performing dynamic scheduling using reinforcement learning.
  • the first phase 104 results in an output 108, which may include for each input epic 102 a testbed configuration appropriate for executing the epic and metadata such as priority, starting date, and duration.
  • the second phase 106 considers criteria 110 such as cost, availability, configuration, and priority.
  • the framework 100 results in an output 112, in the form of a schedule. For example, the schedule may assign different epics (shown as rectangles with different fill patterns) to one of the test channels for a given time period.
  • since the input data 102 for this approach is an epic (e.g., a Jira ticket) written in an unstructured natural language, NLP techniques need to be used to analyze the inputs.
  • the epic may be in other formats, such as in a structured or unstructured language format.
  • Embodiments disclosed herein are generally applicable for testing a system using testbeds.
  • the system being tested is not limited to any particular system, and may include in some embodiments a cloud-based system, such as, for example, a cloud RAN application.
  • the testbeds are not limited to any particular testbed, and may include in some embodiments cloud-native test channels.
  • An example is provided in the specific context of a cloud RAN application, but this is not limiting and other embodiments may involve other cloud-based systems, or even non-cloud-based systems.
  • the first phase of framework 100 is about capturing the raw data, i.e., the raw text in the epics which are written in an unstructured natural text.
  • the text may be in any language (e.g., English).
  • the users provide some information for each epic which needs to be tested by creating a Jira ticket.
  • the provided information in each epic might include some irrelevant information for scheduling purposes.
  • human-generated epics may suffer from ambiguity, uncertainty, grammar errors, or the lack of relevant information for scheduling purposes.
  • designing and employing a unique text analysis framework is required to transform unstructured natural text into machine-readable data.
  • two types of information extraction are needed: test channel configuration extraction and metadata extraction.
  • Structured data is in a standardized format, has a well-defined structure, conforms to a data model, follows a persistent order, and generally is easily parsed by a computer.
  • Unstructured text, by contrast, is not in a standardized format, does not have a well-defined structure, does not follow a persistent order, and is not easily parsed by a computer.
  • To understand unstructured text generally requires more sophisticated techniques, such as natural language processing, as compared to structured data.
  • FIG. 2 illustrates an overview of a test channel configuration for one cloud-native test channel, according to an embodiment.
  • a cloud-native test channel has several features (capabilities).
  • the configuration in FIG. 2 is only partial and for illustrative purposes. Understanding the configuration of the testbed may require a heavy knowledge of the domain of the system being tested. For example, understanding the configuration of cloud-native test channel requires a heavy knowledge of the Cloud RAN domain.
  • a system under test can comprise software, hardware, and networking resources.
  • this property makes scheduling testbeds particularly complex due to the interaction of the software, hardware, and networking resources, among other things.
  • the current cloud-native test channel can be classified into four main groups: 1-Fronthaul Capability, 2-Traffic Capability, 3-Traffic Special Capability, and 4-Platform Capability.
  • some of the mentioned features (capabilities) may be eliminated or merged, or a new feature may need to be added to a cloud-native test channel in the future. The system is designed to adapt itself to such future changes.
  • Metadata Extraction examples include the priority of each epic (e.g., A, B, C or 1.1, 2.5), the starting date (e.g., Week 16 or July 1, Q2, Release 1), the test duration (2 weeks, or starting date week 14 and ending date week 16), the Platform (RedHat or WindRiver), and so on. This information is specified in the epic and may be extracted using NLP.
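As a rough illustration of how such metadata might be pulled from an epic's text, the sketch below uses regular expressions. The patterns, field names, and example epic are assumptions for demonstration only; the actual system's extraction rules are not reproduced here.

```python
import re

def extract_metadata(epic_text):
    """Illustrative regex-based metadata extraction. The patterns below
    are hypothetical, not the patent's actual extraction rules."""
    meta = {}
    # Priority given as a letter (A, B, C) or a number like 1.1
    m = re.search(r"[Pp]riority[:\s]+([A-C]|\d+\.\d+)", epic_text)
    if m:
        meta["priority"] = m.group(1)
    # Starting date expressed as a week number, e.g. "Week 16"
    m = re.search(r"[Ww]eek\s+(\d+)", epic_text)
    if m:
        meta["start_week"] = int(m.group(1))
    # Duration expressed as a number of weeks, e.g. "2 weeks"
    m = re.search(r"(\d+)\s+weeks?", epic_text)
    if m:
        meta["duration_weeks"] = int(m.group(1))
    # Platform name, e.g. RedHat or WindRiver
    m = re.search(r"\b(RedHat|WindRiver)\b", epic_text)
    if m:
        meta["platform"] = m.group(1)
    return meta

example = "Priority: B, start in Week 16, duration 2 weeks on RedHat"
print(extract_metadata(example))
```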
  • Table 4 describes the embedded NLP techniques in the system, where the input is an epic (such as shown in FIG. 3), together with example outputs.
  • Table 3 The utilized NLP techniques in the system for feature extraction.
  • the required configuration for an epic (e.g., in a Jira ticket format) needs to be generated.
  • Using the extracted features from an epic can help the system for generating the required configurations for testing a cloud-based application (e.g., vDU, vCU).
  • Two classification approaches may be used: 1) machine learning (ML) classification and 2) rule-based classification.
  • an ML-based classifier can handle a large set of data and a Rule-based classifier performs better on a smaller dataset.
  • the system is able to generate automatically the required cloud-native test channel configuration for testing a cloud-based application.
  • a support vector machine (SVM) method is embedded in the system, though other machine learning classifiers may also be used in some embodiments. It needs to be considered that, for a large dataset (e.g., a large set of different configurations, test channels, and also upcoming epics), an ML-based classifier performs better, both in terms of execution time and accuracy.
  • FIG. 4 illustrates the two approaches, using a rule-based or ML-based classification, according to an embodiment.
  • the extracted features 402 from the epic (such as a Jira ticket) are inputs to the classification 404, which may either be a rules-based classifier 404a or an ML-based classifier 404b.
  • the result of the classification is a configuration 406, that is, a specific cloud-native test channel configuration applicable to the epic.
  • the system is able to generate the required cloud-native configuration for testing a cloud-based application. Several compatible configurations may be generated for an epic if the provided information is not sufficient. Moreover, as presented in Table 3, for each cloud-native test channel configuration there might be just one or several test channels. However, since this information is already stored in the embedded database, the system is able to suggest which test channel can be used for each epic. Furthermore, since the main goal of the system is optimization, several criteria may be considered for the scheduling.
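A rule-based classifier of the kind described can be sketched as a table of feature-to-configuration rules. The rules and configuration names below are hypothetical; note how an under-specified epic can match several compatible configurations, as the text describes.

```python
# Hypothetical rule table: each entry maps a set of required feature
# keywords (extracted from an epic) to one compatible test-channel
# configuration. These are illustrative names, not actual configurations.
RULES = [
    ({"traffic", "rgw"}, "config_traffic_rgw"),
    ({"traffic"},        "config_traffic_basic"),
    ({"oam"},            "config_oam_only"),
]

def classify(features):
    """Return every configuration whose required features are all present
    in the epic's extracted features. Several compatible configurations
    may be returned when the epic's information is not specific enough."""
    feats = set(features)
    return [cfg for required, cfg in RULES if required <= feats]

print(classify(["traffic", "rgw", "5g"]))  # two traffic configs match
print(classify(["oam"]))                   # a single match
```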
  • Phase 2 - Scheduling upcoming requests using reinforcement learning and multi-criteria decision making:
  • the system will schedule the epics for testing, considering one or more of the following criteria: (Criteria 1) The cost of each test channel. As presented in Table 3, for each configuration there might be several test channels. However, the cost of each test channel may be different, even if it has the same capabilities as another channel. The cost of the current version of the cloud-native test channels is significant.
  • the cost of a cloud-native test channel can be divided into direct and indirect costs, where the total cost is a sum of direct and indirect costs. These costs include hardware (direct cost) and server cost. For mid-band and high-band, the hardware cost is higher compared to low-band.
  • each epic has a different priority due to the delivery plan.
  • This priority can be specified as A, B, and C, or as numbers such as 1.2 and 2.2, and so on. This information is acquired as part of the metadata extraction, as presented in Table 4. Knowing the priority of each epic in advance can help to change some of the decisions (preemption). For instance, knowing the priority may allow for stopping the test execution of a lower-priority epic on a test channel and executing a new upcoming epic which has a higher priority.
  • the time-sharing scenario on each test channel can also be considered. Since the cloud-native test channels might be located in different areas, the test channels can be utilized for several epics where the testers are located in different countries and time zones.
  • the system is able to add more features for the test channels or add more criteria (or remove some of the criteria) for the scheduling phase.
  • the system may use a reinforcement learning approach called the Q-learning model for dynamic scheduling of the cloud-native test channels.
  • FIG. 5 illustrates a customized version of the Q-learning model according to an embodiment.
  • the Q-table is initialized.
  • the Q-function and reward are defined.
  • an action to perform is selected and performed.
  • the reward is measured.
  • the Q-table is updated.
  • the optimal scheduling decision is generated. Steps 506 to 510 represent the learning loop over multiple episodes.
  • the Q-Table is used to store expected reward values for each of the combinations of all possible states/environments and actions. It helps to find the best action for each state.
  • each state represents a different scheduling state, and each action represents assigning one epic (e.g., Jira ticket) to one infrastructure (e.g., test channel).
  • Table 5 represents an example of the Q-table in the system, where the action column (X, Y) represents epic (X) which is assigned to test channel (Y).
  • the state column in Table 5 shows the status of each epic if it is assigned or not.
  • the Q-function and reward together define how the learning process should be performed.
  • the Q-function and reward are related to a makespan, which is the total time for scheduling, cost of the test channel, and priority of an Epic.
  • TD = r + γ · max(Q) − Q_old (Equation 1)
  • TD means the temporal difference
  • γ means the discount factor
  • max(Q) means the estimated optimal future (next state) value
  • Q_old means the current Q value of the given state and action.
  • Q_new = Q_old + α · TD (Equation 2)
  • α means the learning rate
  • Q_new means the new Q value of the given state and action.
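Equations 1 and 2 together define one Q-learning update step, which can be written directly in Python. The Q-table, states, actions, and reward below are toy values for illustration; in the described system a state encodes which epics are assigned, and an action assigns one epic to one test channel.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One Q-learning update step:
        TD    = r + gamma * max(Q[next_state]) - Q_old   (Equation 1)
        Q_new = Q_old + alpha * TD                       (Equation 2)
    """
    q_old = q_table[state][action]
    # Equation 1: temporal difference against the best next-state value
    td = reward + gamma * max(q_table[next_state].values()) - q_old
    # Equation 2: move the old estimate toward the target by the learning rate
    q_table[state][action] = q_old + alpha * td
    return q_table[state][action]

# Toy Q-table: states are scheduling states, actions are (epic, channel) pairs
q = {
    "s0": {("E1", "TC1"): 0.0, ("E1", "TC2"): 0.0},
    "s1": {("E2", "TC1"): 1.0, ("E2", "TC2"): 0.5},
}
new_q = q_update(q, "s0", ("E1", "TC1"), reward=2.0, next_state="s1")
# new_q = 0.0 + 0.1 * (2.0 + 0.9 * 1.0 - 0.0), i.e. approximately 0.29
```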
  • FIG. 6 illustrates a graphical user interface (GUI) such as may be used with the system in an embodiment.
  • the GUI may increase the overall performance.
  • the required information for generating a proper configuration for each epic is provided as a multi-select answer option under the “Functions” part of the system. This option can help to minimize the calculation time for the embedded NLP techniques inside of the system.
  • the current unstructured natural text inside of each epic will be transformed into a structured natural language text, which directly impacts the performance of the system in terms of accuracy.
  • the user (decision-maker in this case) needs to select the required functions before scheduling.
  • some of the functions such as “Traffic” and “OAM” are mutually exclusive, which means the user cannot select both, since the current version of the cloud-native test channels at Ericsson has either traffic or non-traffic (OAM).
  • a set of labelled data is provided by the SMEs.
  • the process of assigning a proper cloud-native test channel to an epic is performed fully manually today.
  • several SMEs need to check the test channel configurations using their own knowledge and checking the configuration pictures (such as FIG. 2) for identifying a proper test channel for each epic.
  • analyzing the unstructured natural text inside of each epic (e.g., the Jira ticket in FIG. 3) is also a time-consuming process that suffers from human judgment, uncertainty, and ambiguity.
  • due to the lack of time, a data set of 19 epics in total with corresponding labels is used for the performance evaluation.
  • the labeling is performed manually based on the information provided in an epic, which might suffer from ambiguity and insufficient data, even for the SMEs, when mapping an epic to a test channel. Moreover, as mentioned before, several configurations might be compatible with each epic based on the information provided in the epic. However, in the data labeled by the SMEs, just one configuration is selected.
  • TC10 is an OAM test channel without traffic, which was assigned to an epic which required a traffic test channel.
  • Equation 3 shows the calculation of the total overall cost of a complete schedule, as defined herein.
  • Equation 3 The total overall cost of the complete schedule.
  • v means the overall cost of the schedule
  • i1, i2, and i3 represent the weights of the different measurement parts
  • makespan shows the total duration of a schedule
  • cost means the cost of each test channel
  • wT means the sum of the waiting time for an epic to be scheduled
  • p shows the priority of each epic.
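The exact functional form of Equation 3 is not reproduced in this text, so the sketch below assumes a simple weighted linear combination of the listed terms (makespan, test-channel costs, and priority-weighted waiting time). Both that combination and the priority weighting of waiting time are assumptions made here for illustration.

```python
def schedule_cost(makespan, channel_costs, waiting_times, priorities,
                  i1=1.0, i2=1.0, i3=1.0):
    """Hypothetical weighted-sum reading of Equation 3: v combines the
    makespan, the cost of the used test channels, and the epics' waiting
    times weighted by their priorities. The linear form and the priority
    weighting are assumptions, not the patent's actual formula."""
    cost = sum(channel_costs)
    # weight each epic's waiting time by its priority (higher = more urgent)
    wT = sum(p * w for p, w in zip(priorities, waiting_times))
    return i1 * makespan + i2 * cost + i3 * wT

v = schedule_cost(makespan=10, channel_costs=[3.0, 2.0],
                  waiting_times=[1.0, 4.0], priorities=[2, 1])
print(v)  # 10 + (3 + 2) + (2*1 + 1*4) = 21.0
```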
  • Equation 4 shows how the reward is calculated:
  • FIG. 7 illustrates a schematic overview of a cloud implementation according to an embodiment.
  • Embodiments of the system may be implemented in Python, though other languages could also be used. The system can also be implemented in the cloud, since it is designed to optimize the usage of the cloud-native infrastructure.
  • cloud implementation 700 may include a web clients module 702, a serving frontend 704, a FaaS (classification) module 706, a FaaS (scheduling) module 708, an Object Storage (model) module 710, a NoSQL Database (data) module 712, a Machine Learning Training Cluster module 714, a CI/CD module 716, and a Code Repository module 718. Exemplary connections between the modules are shown for illustrative purposes.
  • the cloud implementation 700 may have a production part and a development part.
  • the modules for cloud implementation 700 may be implemented in a single server, or distributed across multiple servers. Cloud implementation 700 may be capable of performing the text analysis and dynamic scheduling aspects of disclosed embodiments.
  • FIG. 8 is a flowchart illustrating a process 800, according to an embodiment, for testing a system using testbeds.
  • Process 800 may begin in step s802.
  • Step s802 comprises obtaining a set of task lists.
  • a task list is associated with a set of compatible testbed configurations, a priority, a starting time, and a duration, and the set of task lists defines a testing protocol for the system.
  • Step s804 comprises scheduling the set of task lists on a set of testbeds using reinforcement learning.
  • the scheduling is based on one or more of a cost of a testbed matching an associated testbed configuration, an availability of the testbed matching the associated testbed configuration, a compatibility of the testbed matching the associated testbed configuration, an associated priority, and an associated starting time and duration for a given task list.
  • Step s806 (which is optional) comprises initiating execution of the set of task lists on the set of testbeds based on the step of scheduling the set of task lists on a set of testbeds using reinforcement learning.
  • Process 800 may be performed by the framework 100, such as that described above with respect to FIGS. 1-7.
  • the system comprises a cloud-based application
  • the testbeds comprise cloud-native test channels
  • the testbed configurations comprise software, hardware, and networking resources.
  • the method further includes identifying a set of compatible testbeds for the set of compatible testbed configurations.
  • providing a set of task lists comprises extracting information about the task lists from natural text using natural language processing.
  • extracting information about the task lists from natural text using natural language processing comprises: tokenizing the task lists from natural text into a group of tokenized words; removing stop words from the group of tokenized words; applying part-of-speech tagging on the group of tokenized words; and selecting textual features from the tagged group of tokenized words.
  • providing a set of task lists further comprises generating the associated test channel configuration using a machine learning based classifier based on the information about the task lists extracted from natural text.
  • providing a set of task lists further comprises generating the associated test channel configuration using a rules-based classification based on the information about the task lists extracted from natural text.
  • the task list is a collection of tasks related to a software development project goal.
  • the task list is in the form of an unstructured natural text.
  • the task list may be an epic in the form of a Jira ticket.
  • FIG. 9 is a block diagram of apparatus 900 (e.g., a server), according to some embodiments, for performing the methods disclosed herein.
  • apparatus 900 may comprise: processing circuitry (PC) 902, which may include one or more processors (P) 955 (e.g., a general purpose microprocessor and/or one or more other processors, such as an application specific integrated circuit (ASIC), field-programmable gate arrays (FPGAs), and the like), which processors may be co-located in a single housing or in a single data center or may be geographically distributed (i.e., apparatus 900 may be a distributed computing apparatus); at least one network interface 948 comprising a transmitter (Tx) 945 and a receiver (Rx) 947 for enabling apparatus 900 to transmit data to and receive data from other nodes connected to a network 910 (e.g., an Internet Protocol (IP) network) to which network interface 948 is connected (directly or indirectly) (e.g., network interface
  • CPP 941 includes a computer readable medium (CRM) 942 storing a computer program (CP) 943 comprising computer readable instructions (CRI) 944.
  • CRM 942 may be a non-transitory computer readable medium, such as, magnetic media (e.g., a hard disk), optical media, memory devices (e.g., random access memory, flash memory), and the like.
  • the CRI 944 of computer program 943 is configured such that when executed by PC 902, the CRI causes apparatus 900 to perform steps described herein (e.g., steps described herein with reference to the flow charts).
  • apparatus 900 may be configured to perform steps described herein without the need for code. That is, for example, PC 902 may consist merely of one or more ASICs. Hence, the features of the embodiments described herein may be implemented in hardware and/or software.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A method for testing a system using testbeds is provided. The method includes obtaining a set of task lists. A task list is associated with a set of compatible testbed configurations, a priority, a starting time, and a duration, and the set of task lists defines a testing protocol for the system. The method includes scheduling the set of task lists on a set of testbeds using reinforcement learning. The scheduling is based on one or more of a cost of a testbed matching an associated testbed configuration, an availability of the testbed matching the associated testbed configuration, a compatibility of the testbed matching the associated testbed configuration, an associated priority, and an associated starting time and duration for a given task list.

Description

TEST CHANNEL SCHEDULING
USING ARTIFICIAL INTELLIGENCE
TECHNICAL FIELD
[0001] Disclosed are embodiments related to test channel scheduling using artificial intelligence.
BACKGROUND
[0002] For testing cloud-based Radio Access Network (RAN) applications such as a virtual Distributed Unit (vDU) and a virtual Central Unit (vCU), an “epic” needs to be created manually. An “epic” refers to a high-level requirement, task, or feature set that teams can break down into smaller user stories, and is a term used in software development, such as for agile software development. For testing an epic, several test cases need to be executed on a cloud-native infrastructure, such as a test channel. Each test channel may have a different configuration and thereby incurs different costs. A manual mapping between the created epics and test channels needs to be done daily. Manual scheduling of the epics to the test channels has required a team that masters the configuration and cost of each test channel. Having a large set of daily created epics with different priorities and the cost of each test channel requires a more accurate and automated effort. Scheduling of the epics to test channels is a resource-consuming manual process that has suffered from human judgments, errors, ambiguity, and uncertainty. Moreover, the current manual process cannot handle a large set of epics and test channels. Embodiments disclosed herein are able to handle (read, parse, and analyze) different kinds of unstructured natural text, such as, for example, Word documents and Excel worksheets, as input.
SUMMARY
[0003] Full-stack virtualization of the 5G new radio (NR) central unit (CU) and distributed unit (DU) based on commercial off-the-shelf (COTS) hardware using cloud-native technologies is currently being considered [1]. In order to accelerate product development, several cloud-native infrastructures and technologies need to be utilized. Moreover, the products need to be tested at several testing levels, such as unit, integration, and system acceptance testing [2]. For testing different parts of the products and applications, an epic needs to be created. An epic corresponds to a macro functionality of the system to develop. Each epic includes a set of user stories that need to be tested separately. The feature will be specified gradually during the testing process, and the epic is completed with the addition/deletion of user stories. The development of an epic may cover several versions. The features that need to be tested are usually described textually in an epic [3]. An epic may consist of a set of task lists defining a testing protocol for a cloud-based application. For testing the features described inside of an epic, a cloud-native test channel needs to be assigned that satisfies the required configuration for the test. For example, depending on the set of user stories or tasks within the epic, different configurations may be needed.
[0004] In fact, each cloud-native test channel has different configurations and capabilities, such as Radio Gateway (RGW), 5G Spectrum Sharing, Traffic, and external simulators. The cost of building and using each cloud-native test channel depends on the test channel’s configuration and capabilities. However, for testing a product, a set of test cases should be defined and executed. Mapping a test channel with a proper configuration and capabilities has required efforts from subject matter experts (SMEs). Furthermore, each epic has a different priority, starting date, and also duration. During a study, it was assessed that using a test channel amounts to a significant cost. Therefore, assigning a simple epic that requires a simple test channel (e.g., a test channel without traffic) to a test channel with a complex configuration (e.g., an external simulator such as VIAVI) can directly cause a waste of money and testing resources. Simultaneously, the delivery plan of the product and also the priority of the upcoming epics need to be considered as well. In this regard, having an AI-based decision support system, which can manage epic priorities, on-time delivery, feasible configurations, test channel availabilities, and also the cost of each cloud-native test channel, is very beneficial.
[0005] In this disclosure, an AI-based decision support system for dynamic scheduling of the upcoming epics on the cloud-native test channels is introduced, implemented, and evaluated. The system includes two main phases involving different artificial intelligence techniques such as natural language processing and reinforcement learning. The system reads an epic (e.g., in the form of a Jira ticket written in an unstructured natural text) as input and schedules it for testing considering several criteria such as test channel configuration, availability, and also cost. Furthermore, the system can easily be adapted for other testing activities, such as reading a troubleshooting ticket as input and scheduling it for root cause analysis and further testing.
[0006] Existing approaches currently have problems, including that such techniques do not consider inputs such as an epic (or any type of cloud-based request) written in a non-formal natural text. The related art is not suitable for the cloud RAN domain with cloud-native infrastructure. The related art is also not able to handle (read, parse, and analyze) an unstructured natural text as input. Other shortcomings also exist.
[0007] References
[0008] [1] https://www.ericsson.com/en/blog/2020/2/virtualized-5g-ran-why-when-and-how
[0009] [2] Tahvili, S., “Multi-Criteria Optimization of System Integration Testing”, Mälardalen University, 2018.
[0010] [3] Moretti, C., “USER STORY AND EPIC. WHAT IS IT?”, 2016, https://lyontesting.fr/en/user-story-and-epic-what-is-it/
[0011] [4] Mayer, R., “Comparative expression processing”, US11087087B1, 2021.
[0012] [5] Bishop et al., “Systems and methods for planning, scheduling, and management”, CA2501902A1, 2004.
[0013] [6] Eran et al., “Managing a device cloud”, US10567479B2, 2018.
[0014] [7] Fang et al., “Flow shop scheduling method based on deep reinforcement learning”, CN112987664A, 2018.
[0015] According to a first aspect, a method for testing a system using testbeds is provided. The method includes obtaining a set of task lists. A task list is associated with a set of compatible testbed configurations, a priority, a starting time, and a duration, and the set of task lists defines a testing protocol for the system. The method includes scheduling the set of task lists on a set of testbeds using reinforcement learning. The scheduling is based on one or more of a cost of a testbed matching an associated testbed configuration, an availability of the testbed matching the associated testbed configuration, a compatibility of the testbed matching the associated testbed configuration, an associated priority, and an associated starting time and duration for a given task list.
[0016] In some embodiments, the system comprises a cloud-based application, the testbeds comprise cloud-native test channels, and the testbed configurations comprise software, hardware, and networking resources. In some embodiments, the method further includes initiating execution of the set of task lists on the set of testbeds based on the step of scheduling the set of task lists on a set of testbeds using reinforcement learning. In some embodiments, scheduling the set of task lists using reinforcement learning comprises: initializing a Q-table containing expected reward values for each state and action combination; defining a Q-function and reward; selecting and performing an action; measuring a reward of the action; and updating the Q-table, wherein an entry Q_old in the Q-table is updated to Q_new by the equation Q_new = Q_old + α * TD, where α is a learning rate, where TD refers to the temporal difference and is given by TD = r + γ * max(Q) - Q_old, where γ is a discount factor, and where max(Q) is an estimated optimal future next-state value.
[0017] In some embodiments, measuring a reward of the action comprises calculating a reward r such that r = 1/v, where v refers to the overall cost of the schedule and is given by v = i1 * makespan + i2 * cost + i3 * wT * p, where i1, i2, i3 are weights, makespan is the total duration of the schedule, cost is the cost of each test channel, wT is the sum of the waiting time for the task lists to be scheduled, and p is the priority of each task list. In some embodiments, obtaining a set of task lists comprises extracting information about the task lists from natural text using natural language processing. In some embodiments, extracting information about the task lists from natural text using natural language processing comprises: tokenizing the task lists from natural text into a group of tokenized words; removing stop words from the group of tokenized words; applying part-of-speech tagging on the group of tokenized words; and selecting textual features from the tagged group of tokenized words.
[0018] In some embodiments, obtaining a set of task lists further comprises generating the associated set of compatible testbed configurations using a machine learning based classifier based on the information about the task lists extracted from natural text. In some embodiments, obtaining a set of task lists further comprises generating the associated set of compatible testbed configurations using a rules-based classification based on the information about the task lists extracted from natural text. In some embodiments, the method further includes identifying a set of compatible testbeds for the set of compatible testbed configurations. In some embodiments, the task list is a collection of tasks related to a software development project goal. In some embodiments, the task list is in the form of an unstructured natural text.
[0019] According to a second aspect, a server configured for testing a system using testbeds is provided. The server is configured to obtain a set of task lists. A task list is associated with a set of compatible testbed configurations, a priority, a starting time, and a duration, and the set of task lists defines a testing protocol for the cloud-based application. The server is configured to schedule the set of task lists on a set of testbeds using reinforcement learning, wherein the scheduling is based on one or more of a cost of a testbed matching an associated testbed configuration, an availability of the testbed matching the associated testbed configuration, a compatibility of the testbed matching the associated testbed configuration, an associated priority, and an associated starting time and duration for a given task list.
BRIEF DESCRIPTION OF THE DRAWINGS
[0020] The accompanying drawings, which are incorporated herein and form part of the specification, illustrate various embodiments.
[0021] FIG. 1 illustrates a framework according to an embodiment.
[0022] FIG. 2 illustrates an overview of a test channel configuration for one cloud-native test channel, according to an embodiment.
[0023] FIG. 3 illustrates a sample of an epic in a Jira ticket format, according to an embodiment.
[0024] FIG. 4 illustrates using a rule-based or ML-based classification, according to an embodiment.
[0025] FIG. 5 illustrates a customized version of the Q-learning model according to an embodiment.
[0026] FIG. 6 illustrates a graphical user interface (GUI) such as may be used according to an embodiment.
[0027] FIG. 7 illustrates a schematic overview of a cloud implementation according to an embodiment.
[0028] FIG. 8 is a flowchart illustrating a process according to an embodiment.
[0029] FIG. 9 is a block diagram of an apparatus according to an embodiment.
DETAILED DESCRIPTION
[0030] An AI-based decision support system implemented in Python is disclosed herein. Other implementation languages are possible. The system is able to read an epic and generate the required cloud-based configuration for performing testing. Moreover, the system ranks and assigns each epic for testing based on the cloud-native infrastructure costs and availability, and it also considers the priority of each epic. Note that, due to the embedded natural language processing (NLP) techniques, the system is not limited to the current version of the epics (Jira tickets); it can easily read, parse, and analyze any text document written in an unstructured natural text.
[0031] FIG. 1 illustrates a framework according to an embodiment. As shown in FIG. 1, the framework 100 includes a first phase 104 for performing text analysis using NLP and a second phase 106 for performing dynamic scheduling using reinforcement learning. The first phase 104 results in an output 108, which may include for each input epic 102 a testbed configuration appropriate for executing the epic and metadata such as priority, starting date, and duration. The second phase 106 considers criteria 110 such as cost, availability, configuration, and priority. The framework 100 results in an output 112, in the form of a schedule. For example, the schedule may assign different epics (shown as rectangles with different fill patterns) to one of the test channels for a given time period.
[0032] Since the input data 102 for this approach is an epic (e.g., a Jira ticket) written in an unstructured natural language, NLP techniques need to be used to analyze the inputs. In some embodiments, the epic may be in other formats, such as in a structured or unstructured language format.
[0033] Embodiments disclosed herein are generally applicable for testing a system using testbeds. The system being tested is not limited to any particular system, and may include in some embodiments a cloud-based system, such as, for example, a cloud RAN application. Similarly, the testbeds are not limited to any particular testbed, and may include in some embodiments cloud-native test channels. An example is provided in the specific context of a cloud RAN application, but this is not limiting and other embodiments may involve other cloudbased systems, or even non-cloud-based systems.
[0034] Phase 1. Text analysis using NLP:
[0035] The first phase of framework 100 is about capturing the raw data, i.e., the raw text in the epics, which is written in an unstructured natural text. The text may be in any language (e.g., English). Generally, the users provide some information for each epic which needs to be tested by creating a Jira ticket. However, as long as the input is a natural text, the system is able to read and analyze it. The information provided in each epic might include some information irrelevant for scheduling purposes. Furthermore, human-generated epics may suffer from ambiguity, uncertainty, grammar errors, or the lack of relevant information for scheduling purposes. In this regard, designing and employing a unique text analysis framework is required to transform an unstructured natural text into machine-readable data. In order to schedule the upcoming epics, two types of information extraction are needed: test channel configuration extraction and metadata extraction.
[0036] Structured data is in a standardized format, has a well-defined structure, complies with a data model, follows a persistent order, and generally is easily parsed by a computer. On the other hand, unstructured text is not standardized: it is not in a standardized format, does not have a well-defined structure, does not follow a persistent order, and is not easily parsed by a computer. Understanding unstructured text generally requires more sophisticated techniques, such as natural language processing, compared to structured data.
[0037] The test channel configuration extraction: testbeds (e.g., cloud-native test channels) have different features (capabilities) that impact the cost of the testbed. FIG. 2 illustrates an overview of a test channel configuration for one cloud-native test channel, according to an embodiment. As shown, a cloud-native test channel has several features (capabilities). The configuration in FIG. 2 is only partial and for illustrative purposes. Understanding the configuration of the testbed may require a heavy knowledge of the domain of the system being tested. For example, understanding the configuration of a cloud-native test channel requires a heavy knowledge of the Cloud RAN domain. A system under test can comprise software, hardware, and networking resources. For a cloud-based system, this property makes scheduling testbeds particularly complex due to the interaction of the software, hardware, and networking resources, among other things. For example, in the context of a cloud-based system, the current cloud-native test channel’s features can be classified into four main groups: 1-Fronthaul Capability, 2-Traffic Capability, 3-Traffic Special Capability, and 4-Platform Capability. However, some of the mentioned features (capabilities) can be eliminated or merged, or it might be the case that a new feature needs to be added to a cloud-native test channel in the future. In the way that the system is designed, it is able to adapt itself to possible future changes. Paying no attention to the test channels’ configurations might lead to incorrect scheduling, which directly impacts the total cost of utilizing the resources. In Table 2, a list is provided of the current features (capabilities) embedded in the cloud-native test channels at Ericsson, where some examples are provided for each feature. The list in Table 2 is illustrative, and other cloud-native test channels having different features are also within the scope of disclosed embodiments.
[0038] Table 2. Current cloud-native test channel features and capability types.
[The table content is reproduced as an image in the original publication.]
[0039] During several meetings with the SMEs at Ericsson, all current cloud-native test channels features (capabilities) are extracted from the test channel configuration and stored in the system database. In total, 12 unique configurations are identified where each configuration might have one or more than one test channel. In other words, the problem of scheduling an epic on a cloud-native test channel is not necessarily a one-to-one mapping problem. Therefore, selecting a proper test channel among available test channels impacts the cost and on-time delivery of the product. Table 3 provides a list of different test channel configurations. The list in Table 3 is illustrative, and other cloud-native test channels having different combinations of features are also within the scope of disclosed embodiments.
[0040] Table 3. Identified unique test channel configurations.
[The table content is reproduced as an image in the original publication.]
[0041] It should be considered that the cost of each feature embedded in a cloud-native test channel is different, and the total cost of utilizing a test channel is also measured and considered for scheduling purposes by the system. In order to get a better overview of the input to the system, a sample of an epic in a Jira ticket format is provided in FIG. 3. As can be seen in FIG. 3, the description and definition of done (DoD) are specified textually. Using FIG. 3 as input, the system is able to generate the required cloud configuration for testing an epic. In other words, the features needed for testing an epic (such as those provided in Table 2) may be automatically extracted from the epic by employing different NLP techniques. Furthermore, another piece of information that needs to be extracted from an epic for scheduling purposes is the metadata.
[0042] Metadata Extraction: examples of metadata used for scheduling include the priority of each epic (e.g., A, B, C or 1.1, 2.5), the starting date (e.g., Week 16 or July 1, Q2, Release 1), the test duration (2 weeks, or starting date week 14 and ending date week 16), the Platform (RedHat or WindRiver), and so on. This information is specified in the epic and may be extracted using NLP.
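The metadata fields described above can be pulled out of the epic text with simple pattern matching. The sketch below is illustrative: the regular expressions and field names are assumptions about ticket conventions, not the actual patterns used in the system.

```python
import re

def extract_metadata(text):
    """Extract scheduling metadata (priority, start week, duration) from
    an epic written in natural text. The regexes are illustrative and
    would need to match the ticket conventions actually in use."""
    meta = {}
    # Priority given as a letter (A-C) or a number such as 1.1 or 2.5.
    m = re.search(r"[Pp]riority[:\s]+([A-C]|\d+(?:\.\d+)?)", text)
    if m:
        meta["priority"] = m.group(1)
    # Starting time given as a week number, e.g., "week 16".
    m = re.search(r"[Ww]eek\s+(\d+)", text)
    if m:
        meta["start_week"] = int(m.group(1))
    # Duration given as a number of weeks, e.g., "2 weeks".
    m = re.search(r"(\d+)\s+weeks?", text)
    if m:
        meta["duration_weeks"] = int(m.group(1))
    return meta

ticket = "Priority: B. Start in week 16, estimated effort 2 weeks."
print(extract_metadata(ticket))
# {'priority': 'B', 'start_week': 16, 'duration_weeks': 2}
```

Any field not matched is simply absent from the result, mirroring the disclosure's fallback behavior (e.g., a default duration when none is specified).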
[0043] Feature Extraction using NLP:
[0044] In order to extract the relevant information in the form of features, several NLP techniques may be utilized in one or more different steps. Table 4 provides the NLP techniques embedded in the system, where the input is an epic (such as shown in FIG. 3), and example outputs are described.
[0045] Table 4. The utilized NLP techniques in the system for feature extraction.
[The table content is reproduced as an image in the original publication.]
[0046] Configuration Generation using Machine Learning and Rule-based Approaches:
[0047] In this step, the required configuration for an epic (e.g., in a Jira ticket format) needs to be generated. Using the extracted features from an epic can help the system generate the required configurations for testing a cloud-based application (e.g., vDU, vCU). For performing this step, two different approaches are proposed: 1) machine learning classification and 2) rule-based classification. However, since each of the mentioned classification approaches has its advantages and disadvantages, both approaches were utilized. Generally, an ML-based classifier can handle a large set of data, and a rule-based classifier performs better on a smaller dataset.
[0048] Approach 1. Using an ML-based approach for Generating the Cloud-Native Test Channel Configuration
[0049] Utilizing the provided features in Table 4, the system is able to automatically generate the required cloud-native test channel configuration for testing a cloud-based application. In this regard, a support vector machine (SVM) method is embedded in the system, though other machine learning classifiers are also used in some embodiments. It needs to be considered that, for a large dataset (e.g., a large set of different configurations, test channels, and also upcoming epics), an ML-based classifier performs better, both in terms of execution time and accuracy.
[0050] Approach 2. Using a Rule-based Classification for Generating the Cloud-Native Test Channel Configuration
[0051] As mentioned before, if there is a limited number of configurations, test channels, and also upcoming epics, then employing a rule-based classifier can be considered a more efficient approach in terms of execution time and also accuracy. Later in this disclosure, the performance of both mentioned classifiers is evaluated.
[0052] FIG. 4 illustrates the two approaches, using a rule-based or ML-based classification, according to an embodiment. As shown, the extracted features 402 from the epic (such as a Jira ticket) are inputs to the classification 404, which may either be a rule-based classifier 404a or an ML-based classifier 404b. The result of the classification is a configuration 406, that is, a specific cloud-native test channel configuration applicable to the epic.
[0053] As can be seen in FIG. 4, by applying the proposed NLP techniques in Table 4 on an epic (such as a Jira ticket like the one presented in FIG. 3), the system is able to generate the required cloud-native configuration for testing a cloud-based application. It might be the case that several compatible configurations are generated for an epic if the provided information is not sufficient. Moreover, as presented in Table 3, for each cloud-native test channel configuration, there might be just one or several test channels. However, since this information is already stored in the embedded database, the system is able to suggest which test channel can be used for each epic. Furthermore, since the main goal of the system is optimization, several criteria may be considered for the scheduling.
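A rule-based classifier of the kind shown in FIG. 4 can be sketched as a small set of if/then rules over the extracted features. The feature names and configuration labels below are hypothetical examples for illustration only, not the actual Ericsson configurations.

```python
def classify_configuration(features):
    """Map extracted textual features to one or more compatible test
    channel configurations using hand-written rules. Feature and
    configuration names are illustrative."""
    features = {f.lower() for f in features}
    configs = []
    if "traffic" in features:
        # Traffic and OAM (non-traffic) channels are mutually exclusive.
        if "simulator" in features:
            configs.append("traffic-with-external-simulator")
        else:
            configs.append("traffic-basic")
    else:
        configs.append("oam-only")
    return configs  # possibly several compatible configurations

print(classify_configuration(["Traffic", "simulator"]))
# ['traffic-with-external-simulator']
print(classify_configuration(["fronthaul"]))
# ['oam-only']
```

Returning a list rather than a single label reflects the observation above that several configurations may be compatible with one epic when the provided information is insufficient.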
[0054] Phase 2. Scheduling upcoming requests using reinforcement learning and multi-criteria decision making:
[0055] After the required configurations for testing a cloud application are generated in Phase 1, the system will schedule the epics for testing, considering one or more of the following criteria:
[0056] (Criteria 1) The cost of each test channel - As presented in Table 3, for each configuration, there might be several test channels. However, the cost of each test channel may be different, even if it has the same capabilities as another channel. The cost of the current version of the cloud-native test channels is significant. The cost of a cloud-native test channel can be divided into direct and indirect costs, where the total cost is the sum of direct and indirect costs. These costs include hardware (direct cost) and server cost. For mid-band and high-band, the hardware cost is higher compared to low-band. These costs also include software and license cost (direct cost); platform cost (e.g., RedHat, WindRiver); operation cost (test channel configuration, complexity, and manpower cost (indirect cost)); footprint and size (indirect cost); and power and connection cost (indirect cost). As mentioned before, the total cost of utilizing a cloud-native test channel is significant. Therefore, assigning a non-complex epic to a complex test channel can be costly.
[0057] (Criteria 2) The availability of each test channel - Besides the required total cost for utilizing a cloud-native test channel, the availability of a test channel needs to be monitored as well. The task of assigning an epic to a cloud-native test channel is a dynamic task, which means several epics might be submitted per day, and each of them requires a different configuration and therefore a different test channel. For each decision (assigning an epic to a test channel) the availability of all test channels needs to be checked in advance.
[0058] (Criteria 3) The compatibility between each epic and test channel configuration - As mentioned before, a wrong decision during the scheduling process can directly impact the required cost. For instance, if the decision-maker assigns a rather simple epic to a complex test channel, then a senior tester who can master the complexity of the test channel needs to be assigned as well. On the other hand, assigning a complex epic to a simple test channel (e.g., an epic which requires traffic being assigned to an OAM test channel) also impacts the cost. In this scenario, the assigned tester needs to check the test channel configuration first, and it will take time to realize that the suggested test channel is not compatible with the epic.
[0059] (Criteria 4) The priority of each request - As presented in FIG. 3, each epic has a different priority due to the delivery plan. This priority can be specified as A, B, and C, or as numbers such as 1.2 and 2.2, and so on. This information is acquired as part of the metadata extraction, as presented in Table 4. Knowing the priority of each epic in advance can help to change some of the decisions (preemption). For instance, knowing the priority may allow for stopping the test execution of a lower-priority epic on a test channel and executing a new upcoming epic which has a higher priority. Furthermore, a time-sharing scenario on each test channel can also be considered. Since the cloud-native test channels might be located in different areas, the test channels can be utilized for several epics where the testers are located in different countries using different time zones.
[0060] (Criteria 5) The starting date and duration of each request - Each epic might need a different starting date and also duration. If this information is specified in an epic, the system will consider it for scheduling. This information is already presented as the metadata in Table 4. However, in any case, if the starting date is not specified, a priority comparison between the current epics will be performed by the system. Furthermore, the default test duration for each epic is assumed as 4 weeks of full-time work, using a simple regression model on the previous similar epics. The default duration value will be changed if and only if the test duration is specified in the epic.
[0061] In addition, other criteria are also possible and may be considered. In the way that the system has been designed, the system is able to add more features for the test channels or add more criteria (or remove some of the criteria) for the scheduling phase. Considering the captured and extracted information in Phase 1 and also the above-mentioned criteria, the system may use a reinforcement learning approach called the Q-learning model for dynamic scheduling of the cloud-native test channels.
[0062] FIG. 5 illustrates a customized version of the Q-learning model according to an embodiment. As shown, at step 502, the Q-table is initialized. At step 504, the Q-function and reward are defined. At step 506, an action to perform is selected and performed. At step 508, the reward is measured. At step 510, the Q-table is updated. At step 512, the most optimal scheduling decision is generated. Steps 506 to 510 represent the learning loop for multiple episodes. These steps are now further described.
[0063] Initialize the Q-table:
[0064] In reinforcement learning, the Q-table is used to store expected reward values for each of the combinations of all possible states/environments and actions. It helps to find the best action for each state. In the system, each state represents a different scheduling state, and each action represents assigning one epic (e.g., Jira ticket) to one infrastructure (e.g., test channel). Table 5 represents an example of the Q-table in the system, where the action column (X, Y) represents epic (X) which is assigned to test channel (Y). The state column in Table 5 shows the status of each epic, i.e., whether it is assigned or not.
[0065] Table 5. A part of the Q-table, where (X, Y) means EPIC X is assigned to test channel Y (in the action part) and 0 or 1 means the EPIC is assigned or not (in the state part). If all EPICs are assigned, then it is the final state.
[The table content is reproduced as an image in the original publication.]
[0066] For example, in the right-most column in Table 5, there is (2,1) in the action line, which means epic 2 is assigned to test channel 1 (an example of an action). The 0s in the state column represent the expected reward for the given action in different states.
[0067] Define Q-function and reward:
[0068] The Q-function and reward together define how the learning process should be performed. In the system, the Q-function and reward are related to a makespan, which is the total time for scheduling, cost of the test channel, and priority of an Epic.
[0069] Learning loop for multiple episodes:
[0070] Select and perform an action to perform: first, the model will capture the current state, and later it will choose the highest Q-value for an action.
[0071] Measure the reward: in order to train the model in a more optimal way, the rewards need to be measured and the Q-table needs to be updated dynamically. In fact, the final goal is to select a scheduling decision that has the maximum reward. More information is provided later in this disclosure using Equation 3 and Equation 4.
[0072] Update the Q-table: using the measured reward, the expected reward corresponding to the selected action and current state will be updated based on Equation 1 and Equation 2.
TD = r + γ * max(Q) - Q_old (Equation 1)
where TD is the temporal difference, γ is the discount factor, max(Q) is an estimated optimal future (next state) value, and Q_old is the current Q value of the given state and action.
Q_new = Q_old + α * TD (Equation 2)
where α is the learning rate and Q_new is the new Q value of the given state and action.
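Equations 1 and 2 translate directly into code. The sketch below shows one Q-table update for a toy scheduling problem (two epics, two test channels); the state encoding, the reward value, and the learning parameters are simplified stand-ins for the full system.

```python
# Toy Q-learning update following Equation 1 and Equation 2.
ALPHA = 0.1   # learning rate (alpha)
GAMMA = 0.9   # discount factor (gamma)

# State: tuple of 0/1 flags, one per epic (1 = already assigned).
# Action: (epic, test_channel). Q maps (state, action) -> expected reward.
Q = {}

def q_update(state, action, reward, next_state, actions):
    """Update Q[state, action] using the temporal difference rule."""
    q_old = Q.get((state, action), 0.0)
    # Estimated optimal future value: best Q over actions in next_state.
    max_q_next = max((Q.get((next_state, a), 0.0) for a in actions),
                     default=0.0)
    td = reward + GAMMA * max_q_next - q_old        # Equation 1
    Q[(state, action)] = q_old + ALPHA * td         # Equation 2
    return Q[(state, action)]

actions = [(1, 1), (1, 2), (2, 1), (2, 2)]
# One learning step: assign epic 1 to channel 2, moving state (0, 0)
# to (1, 0) with a measured reward of 0.5.
new_q = q_update((0, 0), (1, 2), 0.5, (1, 0), actions)
print(round(new_q, 3))  # 0.05
```

Repeating such updates over many episodes fills the Q-table, after which the scheduling decision with the highest expected reward (lowest overall cost) is selected.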
[0073] Generate the most optimal scheduling decision:
[0074] The scheduling decision with the lowest overall cost which has been found during the learning loop will be selected as the final decision.
[0075] Implementation
[0076] As mentioned, the system is able to manage any request (epic) written in an unstructured natural text. FIG. 6 illustrates a graphical user interface (GUI) such as may be used with the system in an embodiment. The GUI may increase the overall performance. As can be seen in FIG. 6, the required information for generating a proper configuration for each epic is provided as a multi-select answer option under the “Functions” part of the system. This option can help to minimize the calculation time for the embedded NLP techniques inside of the system. In other words, the current unstructured natural text inside of each epic will be transformed into a structured natural language text, which directly impacts the performance of the system in terms of accuracy. The user (the decision-maker in this case) needs to select the required functions before scheduling. Moreover, some of the functions such as “Traffic” and “OAM” are binary, which means the user cannot select both, since the current version of the cloud-native test channels at Ericsson has either traffic or non-traffic (OAM).
[0077] Furthermore, all other required information such as priority, start date, and end date (duration of each epic) will also be captured by the system. This embedded feature in the system can also help to minimize the required calculation time and keep the same format for all epics. Selecting all the mandatory functions, the system will provide a list of all test channels which satisfy the configuration. Moreover, as can be seen in FIG. 6, based on the given starting date and priority, the system will highlight the available cloud-native test channel (e.g., with a green light). Finally, some optional information such as the platform type and version (e.g., RedHat version 4.8) is provided under the “OS” part in FIG. 6.
[0078] Performance Evaluation
[0079] In order to evaluate the performance of the system, a set of labelled data is provided by the SMEs. As mentioned before, the process of assigning a proper cloud-native test channel to an epic (a test support request such as a Jira ticket) is today performed fully manually. In fact, several SMEs need to check the test channel configurations using their own knowledge and checking the configuration pictures (such as FIG. 2) to identify a proper test channel for each epic. On the other hand, analyzing the unstructured natural text inside of each epic (e.g., the Jira ticket in FIG. 3) is also a time-consuming process that suffers from human judgment, uncertainty, and ambiguity.
[0080] Performance Measurements for phase 1 :
[0081] Using the labelled data provided by the SMEs at PDU Cloud RAN, and also measuring the precision, recall, and Fl score, the results are summarized in Table 6.
[0082] Table 6. Performance evaluation of the system against the labeled data provided by the SMEs, (Train/Test split ratio: 0.67/0.33).
Figure imgf000018_0001
[0083] As can be seen in Table 6, using a rule-based approach for the classification shows better performance compared to the ML-based approach. However, there are several reasons behind the obtained results:
[0084] The data set size: in total, only 19 epics and corresponding labels were used for the performance evaluation due to the lack of time.
[0085] The lack of relevant and sufficient information in the epic, where the information provided on the epic page was not enough, or wrong information was provided. For instance, a user did not specify whether a test channel with traffic or without traffic was needed, which is one of the mandatory pieces of information for configuration generation. In some cases, a user provided contradictory information, such as requesting a test channel both with traffic and without traffic (OAM) at the same time.
[0086] The labelling was performed manually based on the information provided in an epic, which might suffer from ambiguity, and the data may not have been sufficient even for the SMEs to map an epic to a test channel. Moreover, as mentioned before, several configurations might be compatible with each epic based on the information provided in it. However, in the data labelled by the SMEs, just one configuration is selected.
[0087] Moreover, during the performance evaluation, some wrong decisions were identified; for instance, TC10 is an OAM test channel without traffic, which was assigned to an epic that required a traffic test channel.
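The precision, recall, and F1 score reported in Table 6 may be computed as in the following non-limiting sketch; the label and prediction vectors below are hypothetical examples, not the actual SME-labelled data set.

```python
# Sketch: precision, recall, and F1 for a binary classification of epics.
# y_true / y_pred are illustrative placeholders.

def precision_recall_f1(y_true, y_pred, positive=1):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

y_true = [1, 1, 0, 1, 0, 0, 1]  # hypothetical SME labels
y_pred = [1, 0, 0, 1, 0, 1, 1]  # hypothetical classifier output
p, r, f1 = precision_recall_f1(y_true, y_pred)
```

In practice a library such as scikit-learn would typically be used for these metrics; the pure-Python version above only illustrates the definitions.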
[0088] Performance Measurements for phase 2:
[0089] In this part, a self-defined metric indicating the overall cost is used as the performance measurement, which is related to the makespan, the cost of the test channels, and the priority of each epic. Equation 3 shows the calculation of the total overall cost of a complete schedule:

v = i1 * makespan + i2 * cost + i3 * wT * p

Equation 3. The total overall cost of the complete schedule, where v is the overall cost of the schedule, i1, i2, i3 represent the weights of the different measurement parts, makespan is the total duration of a schedule, cost is the cost of each test channel, wT is the sum of the waiting time for an epic to be scheduled, and p is the priority of each epic.
[0090] Equation 4 shows how the reward is calculated:

r = 1 / v

Equation 4. The reward of a complete schedule, where r is the reward; the higher the reward, the better the decision.

[0091] FIG. 7 illustrates a schematic overview of a cloud implementation according to an embodiment. Embodiments of the system may be implemented in Python, though other languages could also be used. The system can also be implemented in the cloud, since it is designed to optimize the usage of the cloud-native infrastructure. As shown, cloud implementation 700 may include a web clients module 702, a serving frontend 704, a FaaS (classification) module 706, a FaaS (scheduling) module 708, an Object Storage (model) module 710, a NoSQL Database (data) module 712, a Machine Learning Training Cluster module 714, a CI/CD module 716, and a Code Repository module 718. Exemplary connections between the modules are shown for illustrative purposes. The cloud implementation 700 may have a production part and a development part. The modules for cloud implementation 700 may be implemented in a single server, or distributed across multiple servers. Cloud implementation 700 may be capable of performing the text analysis and dynamic scheduling aspects of disclosed embodiments.
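Equations 3 and 4 may be sketched in Python as follows; the weights and schedule values shown are illustrative assumptions, not values taken from the evaluation.

```python
# Sketch of Equation 3 (overall cost v) and Equation 4 (reward r = 1/v).

def overall_cost(makespan, cost, waiting_time, priority, i1=1.0, i2=1.0, i3=1.0):
    # v = i1 * makespan + i2 * cost + i3 * wT * p  (Equation 3)
    return i1 * makespan + i2 * cost + i3 * waiting_time * priority

def reward(v):
    # r = 1 / v  (Equation 4): a lower overall cost yields a higher reward
    return 1.0 / v

v = overall_cost(makespan=10.0, cost=5.0, waiting_time=2.0, priority=3.0)
r = reward(v)
```

With equal unit weights, a schedule with makespan 10, test channel cost 5, waiting time 2, and priority 3 has overall cost v = 21, so a shorter makespan or lower waiting time directly increases the reward.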
[0092] FIG. 8 is a flowchart illustrating a process 800, according to an embodiment, for testing a system using testbeds. Process 800 may begin in step s802.
[0093] Step s802 comprises obtaining a set of task lists. A task list is associated with a set of compatible testbed configurations, a priority, a starting time, and a duration, and the set of task lists defines a testing protocol for the system.
[0094] Step s804 comprises scheduling the set of task lists on a set of testbeds using reinforcement learning. The scheduling is based on one or more of a cost of a testbed matching an associated testbed configuration, an availability of the testbed matching the associated testbed configuration, a compatibility of the testbed matching the associated testbed configuration, an associated priority, and an associated starting time and duration for a given task list.
[0095] Step s806 (which is optional) comprises initiating execution of the set of task lists on the set of testbeds based on the step of scheduling the set of task lists on a set of testbeds using reinforcement learning.
[0096] Process 800 may be performed by the framework 100, such as that described above with respect to FIGS. 1-7.

[0097] In some embodiments, the system comprises a cloud-based application, the testbeds comprise cloud-native test channels, and the testbed configurations comprise software, hardware, and networking resources. In some embodiments, the method further includes identifying a set of compatible testbeds for the set of compatible testbed configurations.
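The identification of compatible testbeds for a required configuration may be sketched as follows; the testbed records and the required functions below are hypothetical placeholders, whereas real configurations would cover software, hardware, and networking resources.

```python
# Sketch: selecting testbeds compatible with a task list's required configuration.

testbeds = [
    {"name": "TC1", "functions": {"Traffic", "IPv6"}, "available": True},
    {"name": "TC2", "functions": {"OAM"}, "available": True},
    {"name": "TC3", "functions": {"Traffic", "IPv6"}, "available": False},
]

def compatible_testbeds(required, testbeds):
    # A testbed is compatible if it offers every required function and is free.
    return [tb["name"] for tb in testbeds
            if required <= tb["functions"] and tb["available"]]

matches = compatible_testbeds({"Traffic", "IPv6"}, testbeds)
```

Here only TC1 matches: TC2 lacks the required functions and TC3 is not available, mirroring how the scheduler narrows the candidate set before assignment.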
[0098] In some embodiments, scheduling the set of task lists using reinforcement learning comprises: initializing a Q-table containing expected reward values for each state and action combination; defining a Q-function and reward; selecting and performing an action; measuring a reward of the action; and updating the Q-table, wherein an entry Qold in the Q-table is updated to Qnew by the equation Qnew = Qold + α * TD, where α is a learning rate, where TD refers to temporal difference and is given by TD = r + γ * max(Q) − Qold, where γ is a discount factor, and where max(Q) is an estimated optimal future next state value. In some embodiments, measuring a reward of the action comprises calculating a reward r such that r = 1/v, where v refers to the overall cost of the schedule and is given by v = i1 * makespan + i2 * cost + i3 * wT * p, where i1, i2, i3 are weights, makespan is the total duration of the schedule, cost is the cost of each test channel, wT is the sum of the waiting time for the task lists to be scheduled, and p is the priority of each task list.
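The tabular Q-learning update described in paragraph [0098] may be sketched as follows; the state and action indices are hypothetical placeholders, whereas in the system a state would encode testbed availability and an action would assign a task list to a test channel.

```python
# Minimal sketch of the tabular Q-learning update:
#   Q_new = Q_old + alpha * TD, with TD = r + gamma * max(Q[next_state]) - Q_old

n_states, n_actions = 4, 3
alpha, gamma = 0.1, 0.9               # learning rate and discount factor
Q = [[0.0] * n_actions for _ in range(n_states)]

def update(state, action, r, next_state):
    # Temporal difference toward the reward plus discounted best next value.
    td = r + gamma * max(Q[next_state]) - Q[state][action]
    Q[state][action] += alpha * td

update(state=0, action=1, r=1.0, next_state=2)
```

Starting from a zero-initialized table, one update with reward 1.0 moves Q[0][1] to 0.1, and repeated episodes of action selection, reward measurement (r = 1/v), and updates let the scheduler converge toward low-cost schedules.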
[0099] In some embodiments, obtaining a set of task lists comprises extracting information about the task lists from natural text using natural language processing. In some embodiments, extracting information about the task lists from natural text using natural language processing comprises: tokenizing the task lists from natural text into a group of tokenized words; removing stop words from the group of tokenized words; applying part-of-speech tagging on the group of tokenized words; and selecting textual features from the tagged group of tokenized words. In some embodiments, obtaining a set of task lists further comprises generating the associated test channel configuration using a machine learning based classifier based on the information about the task lists extracted from natural text. In some embodiments, obtaining a set of task lists further comprises generating the associated test channel configuration using a rules-based classification based on the information about the task lists extracted from natural text. In some embodiments, the task list is a collection of tasks related to a software development project goal. In some embodiments, the task list is in the form of an unstructured natural text. For example, the task list may be an epic in the form of a Jira ticket.
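The first two text-processing steps above (tokenization and stop-word removal) may be sketched in pure Python as follows; the epic text and the small stop-word set are illustrative assumptions, and in practice a library such as NLTK or spaCy would also supply the part-of-speech tagging step.

```python
# Sketch: tokenizing an epic's text and removing stop words.
import re

STOP_WORDS = {"a", "an", "the", "is", "with", "for", "and", "to"}  # tiny illustrative set

def tokenize(text):
    # Lowercase and split on alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())

def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]

epic_text = "A test channel with traffic is needed for the IPv6 feature"
tokens = remove_stop_words(tokenize(epic_text))
# Part-of-speech tagging and textual feature selection would follow,
# e.g. keeping nouns such as "traffic" and "ipv6" as classifier features.
```

The surviving content words then feed the ML-based or rule-based classifier that generates the test channel configuration.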
[0100] FIG. 9 is a block diagram of apparatus 900 (e.g., a server), according to some embodiments, for performing the methods disclosed herein. As shown in FIG. 9, apparatus 900 may comprise: processing circuitry (PC) 902, which may include one or more processors (P) 955 (e.g., a general purpose microprocessor and/or one or more other processors, such as an application specific integrated circuit (ASIC), field-programmable gate arrays (FPGAs), and the like), which processors may be co-located in a single housing or in a single data center or may be geographically distributed (i.e., apparatus 900 may be a distributed computing apparatus); at least one network interface 948 comprising a transmitter (Tx) 945 and a receiver (Rx) 947 for enabling apparatus 900 to transmit data to and receive data from other nodes connected to a network 910 (e.g., an Internet Protocol (IP) network) to which network interface 948 is connected (directly or indirectly) (e.g., network interface 948 may be wirelessly connected to the network 910, in which case network interface 948 is connected to an antenna arrangement); and a storage unit (a.k.a., “data storage system”) 908, which may include one or more non-volatile storage devices and/or one or more volatile storage devices. In embodiments where PC 902 includes a programmable processor, a computer program product (CPP) 941 may be provided. CPP 941 includes a computer readable medium (CRM) 942 storing a computer program (CP) 943 comprising computer readable instructions (CRI) 944. CRM 942 may be a non-transitory computer readable medium, such as magnetic media (e.g., a hard disk), optical media, memory devices (e.g., random access memory, flash memory), and the like. In some embodiments, the CRI 944 of computer program 943 is configured such that when executed by PC 902, the CRI causes apparatus 900 to perform steps described herein (e.g., steps described herein with reference to the flow charts).
In other embodiments, apparatus 900 may be configured to perform steps described herein without the need for code. That is, for example, PC 902 may consist merely of one or more ASICs. Hence, the features of the embodiments described herein may be implemented in hardware and/or software.
[0101] While various embodiments are described herein, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of this disclosure should not be limited by any of the above described exemplary embodiments. Moreover, any combination of the above-described embodiments in all possible variations thereof is encompassed by the disclosure unless otherwise indicated herein or otherwise clearly contradicted by context.
[0102] Additionally, while the processes described above and illustrated in the drawings are shown as a sequence of steps, this was done solely for the sake of illustration. Accordingly, it is contemplated that some steps may be added, some steps may be omitted, the order of the steps may be re-arranged, and some steps may be performed in parallel.

Claims

1. A method for testing a system using testbeds, the method comprising: obtaining (s802) a set of task lists (102), wherein a task list (102) is associated with a set of compatible testbed configurations, a priority, a starting time, and a duration, and wherein the set of task lists (102) defines a testing protocol for the system; and scheduling (s804) the set of task lists (102) on a set of testbeds using reinforcement learning, wherein the scheduling is based on one or more of a cost of a testbed matching an associated testbed configuration, an availability of the testbed matching the associated testbed configuration, a compatibility of the testbed matching the associated testbed configuration, an associated priority, and an associated starting time and duration for a given task list (102).
2. The method of claim 1, wherein the system comprises a cloud-based application, the testbeds comprise cloud-native test channels, and the testbed configurations comprise software, hardware, and networking resources.
3. The method of any one of claims 1-2, further comprising: initiating (s806) execution of the set of task lists (102) on the set of testbeds based on the step of scheduling (s804) the set of task lists on a set of testbeds using reinforcement learning.
4. The method of any one of claims 1-3, wherein scheduling the set of task lists (102) using reinforcement learning comprises: initializing a Q-table containing expected reward values for each state and action combination; defining a Q-function and reward; selecting and performing an action; measuring a reward of the action; and updating the Q-table, wherein an entry Qold in the Q-table is updated to Qnew by the equation Qnew = Qold + α * TD, where α is a learning rate, where TD refers to temporal difference and is given by TD = r + γ * max(Q) − Qold, where γ is a discount factor, and where max(Q) is an estimated optimal future next state value.
5. The method of claim 4, wherein measuring a reward of the action comprises calculating a reward r such that r = 1/v, where v refers to the overall cost of the schedule and is given by v = i1 * makespan + i2 * cost + i3 * wT * p, where i1, i2, i3 are weights, makespan is the total duration of the schedule, cost is the cost of each test channel, wT is the sum of the waiting time for the task lists to be scheduled, and p is the priority of each task list.
6. The method of any one of claims 1-5, wherein obtaining a set of task lists (102) comprises extracting information about the task lists (102) from natural text using natural language processing.
7. The method of claim 6, wherein extracting information about the task lists (102) from natural text using natural language processing comprises: tokenizing the task lists (102) from natural text into a group of tokenized words; removing stop words from the group of tokenized words; applying part-of-speech tagging on the group of tokenized words; and selecting textual features from the tagged group of tokenized words.
8. The method of claim 6, wherein obtaining a set of task lists (102) further comprises generating the associated set of compatible testbed configurations using a machine learning based classifier based on the information about the task lists (102) extracted from natural text.
9. The method of claim 6, wherein obtaining a set of task lists (102) further comprises generating the associated set of compatible testbed configurations using a rules-based classification based on the information about the task lists (102) extracted from natural text.
10. The method of any one of claims 1-9, further comprising identifying a set of compatible testbeds for the set of compatible testbed configurations.
11. The method of any one of claims 1-10, wherein the task list (102) is a collection of tasks related to a software development project goal.
12. The method of any one of claims 1-11, wherein the task list (102) is in the form of an unstructured natural text.
13. A computer program (943) comprising instructions which when executed by processing circuitry (902) of a server (900), causes the server (900) to perform the method of any one of claims 1-12.
14. A carrier containing the computer program (943) according to claim 13, wherein the carrier is one of an electronic signal, an optical signal, a radio signal, and a computer readable storage medium (942).
15. A server (900) configured for testing a system using testbeds, the server (900) being further configured to: obtain a set of task lists (102), wherein a task list (102) is associated with a set of compatible testbed configurations, a priority, a starting time, and a duration, and wherein the set of task lists defines a testing protocol for the system; and schedule the set of task lists (102) on a set of testbeds using reinforcement learning, wherein the scheduling is based on one or more of a cost of a testbed matching an associated testbed configuration, an availability of the testbed matching the associated testbed configuration, a compatibility of the testbed matching the associated testbed configuration, an associated priority, and an associated starting time and duration for a given task list.
16. The server of claim 15, wherein the system comprises a cloud-based application, the testbeds comprise cloud-native test channels, and the testbed configurations comprise software, hardware, and networking resources.
17. The server of any one of claims 15-16, further configured to: initiate execution of the set of task lists (102) on the set of testbeds based on the step of scheduling the set of task lists on a set of testbeds using reinforcement learning.
18. The server of any one of claims 15-17, wherein scheduling the set of task lists (102) using reinforcement learning comprises: initializing a Q-table containing expected reward values for each state and action combination; defining a Q-function and reward; selecting and performing an action; measuring a reward of the action; and updating the Q-table, wherein an entry Qold in the Q-table is updated to Qnew by the equation Qnew = Qold + α * TD, where α is a learning rate, where TD refers to temporal difference and is given by TD = r + γ * max(Q) − Qold, where γ is a discount factor, and where max(Q) is an estimated optimal future next state value.
19. The server of claim 18, wherein measuring a reward of the action comprises calculating a reward r such that r = 1/v, where v refers to the overall cost of the schedule and is given by v = i1 * makespan + i2 * cost + i3 * wT * p, where i1, i2, i3 are weights, makespan is the total duration of the schedule, cost is the cost of each test channel, wT is the sum of the waiting time for the task lists to be scheduled, and p is the priority of each task list.
20. The server of any one of claims 15-19, wherein obtaining a set of task lists (102) comprises extracting information about the task lists (102) from natural text using natural language processing.
21. The server of claim 20, wherein extracting information about the task lists (102) from natural text using natural language processing comprises: tokenizing the task lists (102) from natural text into a group of tokenized words; removing stop words from the group of tokenized words; applying part-of-speech tagging on the group of tokenized words; and selecting textual features from the tagged group of tokenized words.
22. The server of claim 20, wherein obtaining a set of task lists (102) further comprises generating the associated testbed configuration using a machine learning based classifier based on the information about the task lists (102) extracted from natural text.
23. The server of claim 20, wherein obtaining a set of task lists (102) further comprises generating the associated testbed configuration using a rules-based classification based on the information about the task lists extracted from natural text.
24. The server of any one of claims 15-23, further comprising identifying a set of compatible testbeds for the set of compatible testbed configurations.
25. The server of any one of claims 15-24, wherein the task list (102) is a collection of tasks related to a software development project goal.
26. The server of any one of claims 15-25, wherein the task list (102) is in the form of an unstructured natural text.
PCT/EP2022/063105 2022-05-13 2022-05-13 Test channel scheduling using artificial intelligence Ceased WO2023217396A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2022/063105 WO2023217396A1 (en) 2022-05-13 2022-05-13 Test channel scheduling using artificial intelligence


Publications (1)

Publication Number Publication Date
WO2023217396A1 true WO2023217396A1 (en) 2023-11-16

Family

ID=82019297


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12028224B1 (en) * 2023-02-17 2024-07-02 International Business Machines Corporation Converting an architecture document to infrastructure as code

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2501902A1 (en) 2002-10-11 2004-04-22 Invistics, Inc. Systems and methods for planning, scheduling, and management
WO2019180617A1 (en) * 2018-03-19 2019-09-26 Critical Software, Sa System for software testing and monitoring at a plurality of mobile devices for ensuring compatibility with a diverse set of mobile device configurations and variable execution conditions
US10567479B2 (en) 2015-08-05 2020-02-18 Facebook, Inc. Managing a device cloud
CN112987664A (en) 2021-02-09 2021-06-18 东北大学 Flow shop scheduling method based on deep reinforcement learning
EP3839731A1 (en) * 2019-12-17 2021-06-23 The Boeing Company Apparatus and method to assign threads to a plurality of processor cores for virtualization of a hardware configuration
US11087087B1 (en) 2017-02-15 2021-08-10 Robert Mayer Comparative expression processing
EP3872626A1 (en) * 2020-02-26 2021-09-01 Accenture Global Solutions Limited Utilizing artificial intelligence and machine learning models to reverse engineer an application from application artifacts


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ANONYMOUS: "Q-learning - Wikipedia", 3 May 2022 (2022-05-03), pages 1 - 8, XP093009486, Retrieved from the Internet <URL:https://web.archive.org/web/20220503160758/https://en.wikipedia.org/wiki/Q-learning> [retrieved on 20221219] *
C. MORETTI, USER STORY AND EPIC. WHAT IS IT?, 2016, Retrieved from the Internet <URL:https://lyontesting.fr/en/user-story-and-epic-what-is-it>
HELGE SPIEKER ET AL: "Reinforcement Learning for Automatic Test Case Prioritization and Selection in Continuous Integration", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 9 November 2018 (2018-11-09), XP081048480, DOI: 10.1145/3092703.3092709 *
TAHVILI. S: "Multi-Criteria Optimization of System Integration Testing", MALARDALENS UNIVERSITY, 2018


