NL2033997B1

NL2033997B1 - Method, system and computer program product for root cause analysis in an industrial machine

Info

Publication number: NL2033997B1
Application number: NL2033997A
Authority: NL
Inventors: Smeenk Jeroen; Nusselder Robin; Wouteres Janszen Cornelis; Santana Sanz Yordie
Original assignee: Vmi Holland Bv
Priority date: 2023-01-20
Filing date: 2023-01-20
Publication date: 2024-07-30
Also published as: WO2024155182A4; EP4652508A1; WO2024155182A1; CN118805150A

Abstract

The invention relates to a method for root cause analysis in an industrial machine, wherein the method comprises the steps of: a) logging errors as they occur in the industrial machine; b) assigning a first causality time window and a second causality time window to a first logged error and a second logged error, respectively; c) determining, when an output of the industrial machine has been interrupted during an interruption time window, which logged errors are still open during said interruption time window; d) classifying at least one of the first logged error and the second logged error as a possible cause for the interruption in the output of the industrial machine during the interruption time window. The invention further relates to a system and a computer program product for root cause analysis in an industrial machine.

Description

P141712NLOO

Method, system and computer program product for root cause analysis in an industrial machine

BACKGROUND

The invention relates to a method, a system and a computer program product for root cause analysis in an industrial machine.

Industrial machines, such as tire building machines, are becoming increasingly modular. Each module is typically configured to repeat a limited number of tasks before delivering a semi-finished product to the next module. The semi-finished product is transferred between modules when the respective modules are ready to supply and receive. On a modular level, there is no exchange of information other than ‘ready to supply’ and ‘ready to receive’. As such, the modules are not aware of each other's state and a single error in one of the modules can cause a chain reaction of seemingly unrelated errors in downstream modules. With several errors occurring simultaneously and/or across multiple industrial machines, it is difficult to distinguish between root causes and effects. Consequently, a human operator may spend time solving an effect, without effectively addressing the root cause.

Moreover, on a system level, there may be an interruption in an otherwise regular output of the industrial machine, having one or more root causes that may have occurred simultaneously or in succession in one or more upstream modules. Depending on the lead time of the industrial machine and the severity of the one or more root causes, the errors associated with said one or more root causes may already have been cleared prior to the occurrence of the interruption. Therefore, retroactive determination of the root cause(s) for the interruption is difficult and again requires an extensive investigation, including manually timing the processes in the various modules, studying the causality between local module errors and the output of the industrial machine and manually generating downtime reports.

In practice, at least one in five interruptions cannot be traced back to a specific root cause and/or is classified as an acceptable short stoppage that does not require further investigation. In addition, if there are multiple root causes for the same interruption, it is difficult to apportion blame for the lost time during the interruption.

In summary, the known root cause analysis is time consuming, partially incomplete and not very accurate.

Consequently, lost time as a result of interruptions in the output of the industrial machine cannot be addressed effectively and the efficiency of the industrial machine is not fully optimized.

SUMMARY OF THE INVENTION

It is an object of the present invention to provide a method, a system and a computer program product for root cause analysis in an industrial machine, wherein the root cause analysis and/or the efficiency of the industrial machine can be improved.

According to a first aspect, the invention provides a method for root cause analysis in an industrial machine, wherein the method comprises the steps of: a) logging a plurality of errors as they occur in the industrial machine; b) assigning a first causality time window and a second causality time window to a first logged error and a second logged error, respectively, of the plurality of errors, wherein the first logged error and the second logged error are considered open as long as the first causality time window and the second causality time window, respectively, are open; c) determining, when an output of the industrial machine has been interrupted during an interruption time window, which logged errors of the first logged error and the second logged error are still open during said interruption time window; and d) classifying at least one of the first logged error and the second logged error as a possible cause for the interruption in the output of the industrial machine if it is determined in step c) that the at least one of the first logged error and the second logged error is still open during the interruption time window.

Although the embodiment above describes steps b), c) and d) for only two errors, it will be appreciated that there may be more than two errors occurring simultaneously and/or successively, to which the same steps may be applied.

The causality time windows can be used to determine a causal relationship between the logged errors occurring at a moment in time that does not necessarily overlap with the interruption time window. In particular, the length of the causality time windows can be chosen such that they extend at least up to a time when the effects of the logged errors on the output of the industrial machine are expected to manifest themselves. Consequently, the determination in step c) and classification in step d) can be automated and do not require an extensive and time consuming, manual investigation. In this manner, a greater number of root causes for the interruption in the output of the industrial machine can be determined, analyzed and/or addressed more effectively, thereby ultimately reducing downtime and improving efficiency of the industrial machine.
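By way of illustration only, steps b) to d) may be sketched as follows in Python. The sketch assumes that each time window is represented as a closed interval in seconds. The names Window, LoggedError and possible_causes, as well as all numeric values, are hypothetical and do not appear in the embodiments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Closed time interval [start, end] in seconds (hypothetical representation)."""
    start: float
    end: float

    def overlaps(self, other: "Window") -> bool:
        # Two closed intervals overlap when each starts before the other ends.
        return self.start <= other.end and other.start <= self.end


@dataclass(frozen=True)
class LoggedError:
    name: str
    causality: Window  # step b): causality time window assigned to the error


def possible_causes(errors, interruption):
    """Steps c) and d): names of errors still open during the interruption."""
    return [e.name for e in errors if e.causality.overlaps(interruption)]


# Hypothetical example: A2 has already closed when the output is interrupted.
errors = [
    LoggedError("A1", Window(10.0, 70.0)),
    LoggedError("A2", Window(20.0, 40.0)),
]
interruption = Window(50.0, 65.0)
causes = possible_causes(errors, interruption)
```

In this sketch, error A2 is no longer open when the interruption starts at 50 s, so only A1 is classified as a possible cause.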

In one embodiment the method further comprises the steps of: - storing the first logged error and the second logged error in a buffer for as long as the first causality time window and the second causality time window, respectively, are open; - clearing the first logged error from the buffer when the first causality time window closes; - clearing the second logged error from the buffer when the second causality time window closes; and - determining in step c) which logged errors are still open by checking which logged errors are still in the buffer during the interruption time window. In this embodiment, step c) only requires retrieving the logged errors which are still in the buffer from said buffer as the interruption time window starts or during a time when the interruption time window is still open. The causality time windows themselves are merely used as a tool to keep the logged errors in the buffer for a certain amount of time. Once the relevant causality time window has expired, the associated logged error is cleared and can no longer be retrieved as a possible cause for the interruption in the output during the interruption time window.
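The buffer-based embodiment may, purely as a non-limiting sketch, be modelled as a mapping from each logged error to the time at which its causality time window closes. The class ErrorBuffer, its method names and the example values are assumptions made for illustration.

```python
class ErrorBuffer:
    """Keeps logged errors only while their causality time window is open."""

    def __init__(self):
        self._expiry = {}  # error name -> time at which its window closes

    def log(self, name, timestamp, window_length):
        # The causality time window runs from the timestamp of the error.
        self._expiry[name] = timestamp + window_length

    def purge(self, now):
        # Clear every error whose causality time window has closed.
        for name in [n for n, end in self._expiry.items() if end <= now]:
            del self._expiry[name]

    def open_errors(self, now):
        # Step c): whatever is still in the buffer is still open.
        self.purge(now)
        return sorted(self._expiry)


buf = ErrorBuffer()
buf.log("A1", timestamp=10.0, window_length=60.0)  # closes at 70 s
buf.log("A2", timestamp=20.0, window_length=20.0)  # closes at 40 s
at_start = buf.open_errors(now=50.0)   # interruption starts at 50 s
later = buf.open_errors(now=75.0)      # both windows have closed
```

Once purged, an error cannot be retrieved again, which reflects the trade-off of this embodiment: the lookup is trivial, but no historical analysis is possible from the buffer alone.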

Alternatively, the method further comprises the steps of: - determining in step c) which logged errors are still open by checking which causality time windows are still open during the interruption time window and looking up the associated logged errors. This alternative embodiment has the technical advantage over the previous embodiment that the logged errors may remain available, even after expiry of the assigned causality time windows, together with the assigned causality time windows, such that the causal relationship between the logged errors and the interruption in the output may also be determined later, based on a historical overlap between one of the causality time windows and the interruption time window.

In another embodiment the first logged error and the second logged error are both classified as a first possible cause and a second possible cause, respectively, for the interruption in the output of the industrial machine in step d) if it is determined in step c) that the first logged error and the second logged error are both still open during the interruption time window. Hence, both possible causes can share the blame for the interruption in the output of the industrial machine during the interruption time window.

Preferably, the method further comprises the step of: e) apportioning the interruption time window to the first possible cause and the second possible cause. The apportionment can provide additional insight into the extent in which different causes are to blame for the interruption in the output of the industrial machine.

More preferably, the interruption time window is apportioned in step e) according to a start time or an end time of one of the first causality time window and the second causality time window. Hence, the properties of the causality time windows can control the way in which the interruption time window is apportioned.

In a further embodiment the interruption time window is apportioned in step e) by assigning the first possible cause and the second possible cause to a first time section and a second time section, respectively, of the interruption time window. By actually dividing the interruption time window into time sections, the possible causes can be assigned to the respective time sections and stored as such for further analysis.

In particular when the first causality time window starts prior to the second causality time window, the interruption time window is apportioned according to one or more of the following conditions: if the first causality time window and the second causality time window overlap and the first causality time window ends before the second causality time window has ended, then the first time section ends and the second time section starts when the first causality time window ends; or if the first causality time window and the second causality time window overlap and the second causality time window ends before the first causality time window has ended, then the second time section ends and the first time section starts when the second causality time window ends; or if the first causality time window and the second causality time window do not overlap, then the first time section ends and the second time section starts when the second causality time window starts. By using one or more of the conditions specified above, the apportionment can be controlled in such a way that the time sections more or less correspond to the length of the overlap of one of the causality time windows with the interruption time window, while the remaining length of the interruption time window is assigned to the possible cause associated with the other causality time window.
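The three conditions above may be sketched as follows. This is a minimal illustration, assuming that windows are (start, end) tuples in seconds and that the first causality time window starts first. Clamping the split point to the interruption time window and the handling of windows that end at exactly the same time are assumptions of the sketch, not features stated in the embodiments. The function name apportion is hypothetical.

```python
def apportion(w1, w2, interruption):
    """Split the interruption between the first and second possible cause.

    w1, w2 and interruption are (start, end) tuples; w1 starts before w2.
    Returns (first_section, second_section) as (start, end) tuples.
    """
    i_start, i_end = interruption

    def clamp(t):
        # Assumption: the split point is kept inside the interruption window.
        return min(max(t, i_start), i_end)

    overlap = w2[0] <= w1[1]
    if not overlap:
        # Third condition: split where the second causality window starts.
        split = clamp(w2[0])
        return (i_start, split), (split, i_end)
    if w1[1] <= w2[1]:
        # First condition (ties treated the same way, by assumption).
        split = clamp(w1[1])
        return (i_start, split), (split, i_end)
    # Second condition: the second window closes first, so its section is first.
    split = clamp(w2[1])
    return (split, i_end), (i_start, split)


# Hypothetical interruption from 40 s to 70 s.
case_a = apportion((0, 50), (30, 80), (40, 70))
case_b = apportion((0, 90), (30, 60), (40, 70))
case_c = apportion((0, 45), (55, 90), (40, 70))
```

In case_b the second possible cause is assigned the earlier part of the interruption, because its causality time window is the first to close.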

In another embodiment the method further comprises the steps of: - generating a first timestamp and a second timestamp for the first logged error and the second logged error, respectively; and - starting the first causality time window and the second causality time window from the first timestamp and the second timestamp, respectively. Hence, the timestamp generated at the occurrence of the respective logged errors can be used as a starting point for the respective causality time windows.

In another embodiment the first causality time window and the second causality time window have a first window length and a second window length, respectively, which are determined independently.


Preferably, during a normal operation of the industrial machine, the industrial machine has an overall lead time and a cycle time, wherein the method further comprises the step of: - setting the first window length to be at least equal to a first error duration plus a first time-out that is longer than the cycle time and shorter than the overall lead time; and/or - setting the second window length to be at least equal to a second error duration plus a second time-out that is longer than the cycle time and shorter than the overall lead time. Hence, the length of the causality time windows is likely to extend up to a time when the effects of the logged errors on the output of the industrial machine are expected to manifest themselves.

More preferably, the first time-out is longer than the cycle time and shorter than a remaining lead time of the overall lead time measured from a position in the industrial machine where the first logged error has occurred; and/or wherein the second time-out is longer than the cycle time and shorter than a remaining lead time of the overall lead time measured from a position in the industrial machine where the second logged error has occurred. In this way, the length of the causality time windows can be chosen such that they extend at least up to a time when the effects of the logged errors on the output of the industrial machine are expected to manifest themselves.
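As a simple worked illustration, the minimum window length may be computed as the error duration plus a time-out that is checked against the cycle time and the remaining lead time. The function name min_window_length and the numeric values are hypothetical; how the time-out itself is chosen within the permitted range is left open here.

```python
def min_window_length(error_duration, time_out, cycle_time, remaining_lead_time):
    """Minimum causality window length: error duration plus a valid time-out."""
    # The time-out must be longer than the cycle time and shorter than the
    # remaining lead time measured from where the error occurred.
    if not cycle_time < time_out < remaining_lead_time:
        raise ValueError("time-out must lie between cycle time and remaining lead time")
    return error_duration + time_out


# Hypothetical values in seconds: 40 s cycle time, 300 s remaining lead time.
length = min_window_length(error_duration=15.0, time_out=120.0,
                           cycle_time=40.0, remaining_lead_time=300.0)
```

With these assumed values the causality time window would stay open for at least 135 s from the timestamp of the error.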

In another embodiment, during a normal operation of the industrial machine, the industrial machine has a cycle time, wherein the method further comprises the step of: - starting the interruption time window when the output has an output interval that deviates from the cycle time. Preferably, the output interval deviates from the cycle time when said output interval exceeds the cycle time with at least ten percent of said cycle time, and preferably at least twenty percent of said cycle time. By using a tolerance of at least ten percent or at least twenty percent, it can be prevented that minor cycle time deviations trigger the start of the interruption time window.
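The start condition for the interruption time window may be sketched as a comparison of the observed output interval with the cycle time plus a tolerance. The function name interruption_started and the example values are assumptions for illustration.

```python
def interruption_started(output_interval, cycle_time, tolerance=0.10):
    """True when the output interval exceeds the cycle time by the tolerance."""
    excess = output_interval - cycle_time
    return excess >= tolerance * cycle_time


# Hypothetical 40 s cycle time.
minor_deviation = interruption_started(42.0, 40.0)        # 5 percent over
clear_deviation = interruption_started(50.0, 40.0)        # 25 percent over
strict_tolerance = interruption_started(46.0, 40.0, 0.20) # 15 percent over
```

A larger tolerance makes the detection less sensitive, so that minor cycle time deviations are not reported as interruptions.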

In another embodiment the method further comprises the steps of: - monitoring user input; - classifying a special error indicative of no operation as a possible cause for the interruption in the output of the industrial machine when no user input has been detected for a predetermined time period during the interruption time window. In this manner, it can be prevented that the interruption in the output because of ‘no operation’ is classified as ‘unknown’ or ‘undefined’ and is unnecessarily investigated.
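One possible reading of the ‘no operation’ check is sketched below. It assumes that user input is available as a list of timestamps and that only idle time inside the interruption time window counts. The function name no_operation and all values are hypothetical.

```python
def no_operation(input_times, interruption, idle_period):
    """True if no user input was seen for at least idle_period seconds
    at some point during the interruption window (start, end)."""
    start, end = interruption
    inside = sorted(t for t in input_times if start <= t <= end)
    # Measure the gaps between consecutive inputs, bounded by the window edges.
    marks = [start] + inside + [end]
    longest_gap = max(later - earlier for earlier, later in zip(marks, marks[1:]))
    return longest_gap >= idle_period


# Hypothetical interruption from 100 s to 400 s, idle threshold 120 s.
idle = no_operation([105.0, 110.0], (100.0, 400.0), idle_period=120.0)
busy = no_operation([150.0, 220.0, 300.0, 380.0], (100.0, 400.0), idle_period=120.0)
```

When the check returns True, the special ‘no operation’ error would be added to the possible causes instead of leaving the interruption unexplained.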

In a further embodiment the method further comprises the step of: - ending the interruption time window when the output of the industrial machine is resumed.

In another embodiment the industrial machine comprises a plurality of modules, wherein the method further comprises the steps of: - distinguishing between errors occurring as a root cause in one of the modules and errors occurring as an effect of the root cause in one of the modules and classifying the errors accordingly; and - assigning the first causality time window and the second causality time window only to logged errors which have been classified as root causes. Hence, it can be prevented that errors that merely occur as an effect of the root cause are classified as possible causes for the interruption in the output of the industrial machine.

According to a second aspect, the invention provides a method for root cause analysis in an industrial machine, wherein the industrial machine comprises a plurality of modules, wherein the method comprises the steps of: - assigning a status to each module of the plurality of modules reflecting a first state in which no human intervention is required and a second state in which human intervention is required; - changing the status of a first module of the plurality of modules from the first state to the second state upon occurrence of an earliest error in said first module that requires human intervention in said first module; and - classifying the earliest error that requires human intervention in said first module as a root cause for the change in status of said first module from the first state to the second state.

By only classifying the earliest error as the root cause, it can be prevented that any further errors in the same module are also classified as root cause. In other words, a module that changes status from the first state to the second state can do so only because of a single root cause. Any further errors are considered effects of said root cause, even if said further errors also require human intervention. By only showing a single root cause, the amount of error data can be reduced significantly and the human operator can be directed to the single root cause quickly to take appropriate action. Solving the root cause may also automatically resolve the errors that were generated as an effect of the root cause.

Preferably, any further errors occurring in said first module, after the earliest error that requires human intervention in said first module and before a change of the status of said first module back from the second state to the first state, are ignored as root cause candidates for the change in status of said first module from the first state to the second state. More preferably, said further errors occurring in said first module are classified as effects of the root cause. For example, an error that is generated as a human operator moves through an active safety light curtain in the first module is classified as a root cause if there is no earlier error in said first module that requires the human operator to move through the safety light curtain. However, if there is an earlier error in said first module that requires human intervention, then the error generated by the safety light curtain is only an effect of the earliest error.
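The classification per module may be sketched as a small state machine. The names State, Module, report and resolve are hypothetical, and the treatment of errors that do not require human intervention while the module is in the first state (no classification, no status change) is an assumption of this sketch.

```python
from enum import Enum


class State(Enum):
    NO_INTERVENTION = 1  # first state
    INTERVENTION = 2     # second state


class Module:
    def __init__(self, name):
        self.name = name
        self.state = State.NO_INTERVENTION
        self.root_cause = None
        self.effects = []

    def report(self, error, needs_human):
        if self.state is State.INTERVENTION:
            # Any further error is an effect, even if it needs a human.
            self.effects.append(error)
            return "effect"
        if needs_human:
            # Earliest error requiring human intervention: the root cause.
            self.state = State.INTERVENTION
            self.root_cause = error
            return "root cause"
        return "no status change"  # assumption: auto-resolvable error

    def resolve(self):
        # Status changes back from the second state to the first state.
        self.state = State.NO_INTERVENTION
        self.root_cause = None
        self.effects.clear()


m1 = Module("M1")
first = m1.report("material jam", needs_human=True)
second = m1.report("safety light curtain interrupted", needs_human=True)
m1.resolve()
third = m1.report("safety light curtain interrupted", needs_human=True)
```

The same light curtain error is an effect while the material jam is unresolved, and a root cause when it is the earliest error requiring human intervention.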

In a further embodiment the first state is representative of: - a respective module of the plurality of modules running without errors; - a respective module of the plurality of modules being on hold for a reason external to said respective module; or - a respective module of the plurality of modules having an error that can be resolved automatically without human intervention. It will be clear that a module running without errors does not require human intervention.

A module that is on hold for some reason external to the module, for example another module that is not supplying a semi-finished product, may resume its process once the external reason is cleared. Similarly, a module that can automatically resume its process when the relevant error (internal or external) is resolved also does not require human intervention.

Preferably, the method further comprises the step of: - using two or more substates of the first state to distinguish between the respective module running without errors, being on hold or having an error that can be resolved automatically without human intervention. The further distinction can be used to visually inform a human operator of the difference between a module that is running normally and a module that is on hold without requiring human intervention.

In another embodiment the method further comprises the steps of: - changing the status of a second module of the plurality of modules from the first state to the second state upon occurrence of an earliest error in said second module that requires human intervention in said second module; and - classifying the earliest error that requires human intervention in said second module as a root cause for the change in status of said second module from the first state to the second state. Different modules may change status as a result of different root causes. This embodiment has the same advantages as described earlier in relation to the first module, but now applied to the second module. Hence, error information about the first module and the second module can be limited to a single root cause per module.

In another embodiment the method further comprises the steps of: - providing a human machine interface; - displaying the statuses of the plurality of modules on the human machine interface; and - displaying only errors on the human machine interface which are classified as a root cause for a change in status of any module of the plurality of modules from the first state to the second state. By only showing root causes, the amount of error data can be reduced significantly and the human operator can more quickly determine the appropriate course of action for each module.

Optionally, a human operator may choose to see more error data. Also, more detailed error data may be shown on human machine interfaces located closely to or at a respective module. Additionally or alternatively, error data may be filtered based on location or distance of the human machine interface relative to the machine, providing more detail once the human operator approaches the industrial machine.

Preferably, the method further comprises the step of: - filtering out any further errors which are classified as effects of a root cause from being displayed on the human machine interface. Hence, it can be prevented that the human operator takes action on an error that has not been classified as the root cause for the respective module, possibly causing further errors when operation is resumed without solving the root cause.

In another embodiment the method further comprises the steps of: - assigning one or more roles of a plurality of roles to one or more error messages in a list of error messages used for generating errors in the first module; - associating a human machine interface with a human operator having a first role of the plurality of roles; and - displaying, upon occurrence of one or more errors in the first module, on the human machine interface only the error messages, related to the one or more errors, that have been assigned the first role. Consequently, the human interface only displays those errors which are of interest to the human operator having the first role. This allows for a further reduction of error information and a quicker and more effective response of the human operator.
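Role-based filtering of error messages may be sketched with a lookup table that maps each error message to the roles it has been assigned. The error messages, role names and the function messages_for are invented for illustration and are not taken from the embodiments.

```python
# Hypothetical list of error messages with their assigned roles.
ERROR_ROLES = {
    "E101 splice misaligned": {"operator", "process engineer"},
    "E204 servo drive fault": {"maintenance"},
    "E310 material roll empty": {"operator"},
}


def messages_for(role, occurred):
    """Error messages to display on the interface associated with the given role."""
    return [msg for msg in occurred if role in ERROR_ROLES.get(msg, set())]


occurred = ["E204 servo drive fault", "E310 material roll empty"]
operator_view = messages_for("operator", occurred)
maintenance_view = messages_for("maintenance", occurred)
```

Each human machine interface then shows only the subset that its operator can act on.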

In another embodiment, that can be applied to the methods according to the first aspect and the second aspect of the invention, the industrial machine is a tire building machine. As described in the background of the invention, tire building machines are becoming increasingly more modular. The methods according to the present invention therefore can be particularly advantageous when applied to tire building machines.

According to a third aspect, the invention provides a system for root cause analysis in an industrial machine, wherein the system comprises a control unit that is connectable to the industrial machine, wherein the control unit, when connected to the industrial machine, is configured for executing the steps of the method according to any one of the embodiments of the first aspect or the second aspect of the invention.

The system according to the third aspect of the invention is used to execute the previously discussed methods and therefore has the same technical advantages, which will not be repeated hereafter.

In one embodiment the system further comprises a buffer for storing the first logged error and the second logged error for as long as the first causality time window and the second causality time window, respectively, are open, wherein the first logged error and the second logged error are cleared from the buffer when the first causality time window and the second causality time window, respectively, close, wherein the control unit is configured for determining in step c) which logged errors are still open by checking which logged errors are still in the buffer during the interruption time window.

Alternatively, the control unit is configured for determining in step c) which logged errors are still open by checking which causality time windows are still open during the interruption time window and looking up the associated logged errors.

According to a fourth aspect, the invention provides a computer program product comprising a non-transitory computer-readable medium holding instructions that, when executed by a processor, cause the system according to any one of the embodiments of the third aspect of the invention to perform the steps of the method according to any one of the embodiments of the first aspect or the second aspect of the invention.

The computer program product according to the fourth aspect of the invention is used to cause the aforementioned system to perform the steps of the previously discussed methods and therefore has the same technical advantages, which will not be repeated hereafter.

The various aspects and features described and shown in the specification can be applied, individually, wherever possible. These individual aspects, in particular the aspects and features described in the attached dependent claims, can be made subject of divisional patent applications.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will be elucidated on the basis of an exemplary embodiment shown in the attached schematic drawings, in which: figure 1 shows a schematic top view of an industrial machine with a plurality of modules and a control unit; figure 2 shows a schematic top view of the industrial machine of figure 1 supplemented with error messages; figure 3 shows a flow chart of the steps of a method for root cause analysis in the modules of the industrial machine of figures 1 and 2; figure 4 shows a schematic top view of the industrial machine of figure 1 in which only the error messages related to a root cause in a module are shown; figure 5 shows a flow chart of the steps of a method for root cause analysis of an interruption in an output of the industrial machine; and figures 6-12 show different event graphs representative of different operational scenarios and combining, at the top, statuses and buffer histories of one or more modules, and, at the bottom, the output, the overall status and an event history of the industrial machine.

DETAILED DESCRIPTION OF THE INVENTION

Figure 1 schematically shows an industrial machine M, in particular a tire building machine for manufacturing a green or unvulcanized tire or parts thereof. The invention may also be applied to another type of industrial machine.

The industrial machine M is modular or comprises a system of modules M1-M10. Each module M1-M10 is arranged, programmed and/or configured for repeating one or more tasks on a semi-finished product (not shown). The tasks may include, but are not limited to: supplying, positioning, cutting, splicing, joining, assembling and transferring.

In this example, there are ten modules M1-M10 arranged in two parallel lines M1-M3, M4-M6 and a third line M7-M9 that receives and assembles semi-finished products from the two parallel lines M1-M3, M4-M6, which assembly is ultimately delivered to an output module M10.

It will be apparent to one skilled in the art that many variations in the configuration of the plurality of modules M1-M10 are encompassed by the scope of the present invention.

The industrial machine M has a lead time or an overall lead time that is to be interpreted as a duration from the start of a process, i.e. at the first module M1, M4 of each parallel line M1-M3, M4-M6, to the completion of the process, i.e. at the output module M10. In addition, the industrial machine M has a cycle time which is to be interpreted as the average duration or a regular interval between completion times at the output module M10.

As further shown in figure 1, the industrial machine M and/or the system of modules M1-M10 comprises a control unit U that is functionally, electronically and/or operationally connected to the plurality of modules M1-M10 to control their respective operations and/or processes.

The control unit U is provided with a processor and a computer readable medium. The computer readable medium is preferably non-transitory or tangible, e.g. a physical data carrier such as a hard-drive, a USB-drive, a RAM memory or the like. The computer readable medium stores, carries or loads instructions that, when executed by the processor, cause the industrial machine M and/or the system of modules M1-M10 to operate and/or perform a process or task. It will be appreciated that each module M1-M10 may also have its own control unit and associated components (not shown).

Although the control unit U is shown as a central unit, it may also comprise several decentralized units in the respective modules M1-M10.

The control unit U is arranged, programmed and/or configured to assign a module state G1-G10 to each module M1-M10. In addition, the control unit U is arranged, programmed and/or configured to assign a machine status G to the industrial machine M.

The industrial machine M further includes a buffer B for storing relevant process data, such as process information, the module statuses G1-G10, the machine status G, alerts, error messages and/or events that occur in the industrial machine M and/or the plurality of modules M1-M10. Like the control unit U, the buffer B is shown as a centralized unit, but it may be decentralized as well, for example one local buffer (not shown) for each module M1-M10. The control unit U is functionally, electronically and/or operationally connected to the buffer B to access data stored therein.

Figures 6-12 show exemplary event graphs representative of different operational scenarios, including the module statuses G1, G4, G8 of several modules M1, M4, M8 and the machine status G.

The module statuses G1-G10 may include a first state in which no human intervention is required and a second state in which human intervention is required.

The first state may include one or more substates distinguishing between: - a respective module M1-M10 of the plurality of modules M1-M10 running normally, i.e. without errors, as represented schematically in figures 6-12 with a check mark symbol; - a respective module M1-M10 of the plurality of modules M1-M10 being on hold for a reason external to said respective module M1-M10 or having an error that can be resolved automatically without human intervention, as represented schematically in figures 6-12 with a triangle shape pointing right (‘play’ symbol), surrounded by a circular arrow (‘automation’ symbol).

The second state may be a single state, as represented schematically in figures 6-12 with an exclamation mark surrounded by a triangle shape.

Note that the machine status G uses similar symbols to represent the different machine states.

The event graphs in figures 6-12 further include a buffer history B1, B4, B8 for the respective modules M1, M4, M8 and an event history E for the industrial machine M.

All information in the event graphs of figures 6-12 is plotted along a time axis T. In this example, the time axis T includes information about the output P of the industrial machine M. In particular, the time axis includes times T1, T2, T3, T4 at which a product is completed (hereafter referred to as ‘completion times’), i.e. the times when a product is discharged from the industrial machine M at the output module M10. These completion times T1, T2, T3, T4 may be represented by grid lines, labels or ticks on or intersecting with the time axis T. The event graphs may optionally show the cycle time C of the industrial machine M.

Note that figure 6 includes a special moment in time T3' when, based on the cycle time C and/or a regular interval between the completion times T1, T2, T3, T4, a product was expected to be completed, yet the product did not arrive at the output module M10. Hence, the output P of the industrial machine M is interrupted until the actual completion time T3 of the product. The time during which the output P has been interrupted can be considered as ‘lost time’ in the output P of the industrial machine M, as represented with the lost time window L. The interruption in the output P of the industrial machine M may therefore be expressed in the aforementioned ‘lost time’, or alternatively in a ‘lost amount’ of products compared to an optimal amount of products to be manufactured during a time period, for example compared to a production target for a day.
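By way of non-limiting illustration only, the lost time L may be derived from the completion times as sketched below in Python. The function name `lost_time` is hypothetical, times are assumed to be expressed in seconds, and the expected completion time (such as T3') is assumed to equal the previous completion time plus the cycle time C; none of these choices is prescribed by the description.

```python
def lost_time(completion_times, cycle_time):
    """Return a list of (expected, actual, lost) tuples, one per late product.

    Assumption (not taken from the description): the expected completion
    time of a product is the previous actual completion time plus the
    cycle time C.
    """
    gaps = []
    for previous, actual in zip(completion_times, completion_times[1:]):
        expected = previous + cycle_time               # e.g. T3'
        if actual > expected:
            # The product arrived late; the difference is the lost time L.
            gaps.append((expected, actual, actual - expected))
    return gaps
```

For example, with completion times of 0, 60, 150 and 210 seconds and a cycle time of 60 seconds, the third product was expected at 120 seconds and arrived at 150 seconds, giving a lost time of 30 seconds.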

The interruption in the output P of the industrial machine M may have various causes, one or more of which can be identified as a root cause. The present invention provides a method for root cause analysis in the industrial machine M, in particular on a modular level, as will be detailed below with reference to figures 1-4.

Figure 1 shows a situation in which the industrial machine M is running ‘normally’, i.e. without any errors.

Figure 2 shows a situation in which several faults or errors A1-A6 occur in the modules M1-M10 of the industrial machine M. The errors A1-A6 may occur simultaneously or successively. Upon occurrence of said errors A1-A6, corresponding alerts, warnings or error messages are generated, either at a module level or at a system level, as represented schematically with rectangular text balloons.

Figure 3 shows, on the left side, the steps S1-S6 for root cause analysis in the first module M1. Step S1 starts the root cause analysis. At the start of the root cause analysis, the module status G1 of the first module M1 is set or reset to the first state, representative of the first module M1 running normally, i.e. without requiring human intervention. Step S2 involves collecting errors A1, as they occur, from the first module M1. In step S3, it is determined, for each error A1 in the first module M1, if the collected error A1 is the earliest error in the first module M1 that requires human intervention. If ‘yes’ (represented with ‘Y’ in figure 3), then the collected error A1 is considered a root cause. If ‘no’ (represented with ‘N’ in figure 3), then the collected error A1 is considered an effect rather than a root cause. In both cases, the collected error(s) is/are stored in the buffer B. In addition, when the collected error A1 is a root cause, the status of the first module M1 is changed in step S5 from the first state to the second state, and the root cause analysis for the first module M1 is terminated in step S6.

The same or a similar root cause analysis is performed for any further modules that have generated error messages. Figure 3 shows, on the right side, an exemplary root cause analysis for the errors A2, A3, A4 occurring in the fourth module M4, with steps S11-S16 corresponding to steps S1-S6. Note that of the three errors A2, A3, A4, only the earliest error A2 that requires human intervention in the fourth module M4 is considered a root cause and the other errors A3, A4 are considered effects of the earliest error A2.
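By way of non-limiting illustration only, the module-level analysis of figure 3 may be sketched in Python as follows. The names `Error` and `analyse_module`, the `needs_human` flag and the string labels for the two states are hypothetical and are introduced for this sketch only; the description does not prescribe any particular data structure.

```python
from dataclasses import dataclass


@dataclass
class Error:
    name: str           # e.g. "A2"
    timestamp: float    # time at which the error occurred
    needs_human: bool   # True if the error requires human intervention


def analyse_module(errors):
    """Split the errors of one module into a root cause and its effects.

    Returns (root_cause, effects, state), where state is "first" (no human
    intervention required) or "second" (human intervention required).
    """
    root_cause = None
    effects = []
    for error in sorted(errors, key=lambda e: e.timestamp):
        if root_cause is None and error.needs_human:
            root_cause = error        # step S3 'Y': earliest error needing a human
        else:
            effects.append(error)     # step S3 'N': considered an effect
    state = "second" if root_cause is not None else "first"   # step S5
    return root_cause, effects, state
```

Applied to the errors A2, A3, A4 of the fourth module M4, the sketch returns A2 as the root cause, A3 and A4 as effects, and the second state as the module status.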

The method optionally further includes assigning a role R1, R2, schematically represented as a human operator, to the errors A1, A2 which are considered to be a root cause, which roles R1, R2 may be used to selectively distribute or display certain errors A1, A2 only to human operators subscribed to or having the respective role, for example ‘maintenance’, ‘loader’, ‘quality’ or ‘process’.

As further shown in figure 3, the method includes a root cause analysis on a system level in the industrial machine M, starting with step S21. In step S22, the earliest errors A1, A2 that require human intervention are collected from the modules M1, M4. In step S23, said earliest errors A1, A2 are classified as root causes. In step S24, the errors A1, A2 that have been classified as root cause are displayed on a human machine interface HMI, optionally filtered based on the roles R1, R2 assigned to said errors A1, A2.
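By way of non-limiting illustration only, the system-level steps S22-S24, including the optional role-based filtering, may be sketched in Python as follows. The function name `root_causes_for_role` and the dictionary layout are hypothetical and serve only to illustrate the principle.

```python
def root_causes_for_role(module_root_causes, roles, operator_role=None):
    """Return the names of the root-cause errors to display on the HMI.

    module_root_causes: dict mapping a module name to the name of its
                        root-cause error, or None if it has no root cause.
    roles:              dict mapping an error name to its assigned role.
    operator_role:      if given, only errors with this role are returned.
    """
    # Steps S22/S23: collect the earliest errors requiring human intervention.
    collected = [e for e in module_root_causes.values() if e is not None]
    if operator_role is None:
        return collected
    # Step S24: optionally filter on the role assigned to each error.
    return [e for e in collected if roles.get(e) == operator_role]
```

An operator subscribed to the ‘quality’ role would, in this sketch, only be shown the root causes to which that role has been assigned.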

Figure 4 shows the system of modules M1-M10 and their module statuses G1-G10 as they are being displayed on the human machine interface HMI. Note that, compared to the errors A1-A6 generated in the situation of figure 2, the amount of error data that is ultimately displayed on the human machine interface HMI is significantly reduced by only displaying the errors A1, A2 that have been classified as root cause for the respective modules M1, M4. In addition, a human operator can easily determine based on the module statuses G1-G10 which modules M1, M4 have a second state that requires human intervention and which modules M2, M3, M5-M10 have a first state in which they can automatically resume their processes when the root causes are cleared.

The human machine interface HMI may be a fixed screen or a wearable, such as a smart watch, a mobile phone, a tablet, virtual reality glasses or any other form of human-machine interface.

Optionally, the method may further include the step of ignoring an error in a module as a root cause when the error is caused by a reason external to said module, even if the error is the earliest error in said module that requires human intervention. For example, as shown in figure 2, the sixth module M6 has a fifth error A5 which is the earliest error in said sixth module M6 that requires human intervention. However, in this scenario, the fifth error A5 itself may be caused by the errors A2, A3, A4 in the fourth module M4. Hence, the reason for the fifth error A5 lies external to the sixth module M6, in the fourth module M4. Consequently, the control unit U may be adapted, arranged, programmed and/or configured to ignore the fifth error A5 as a root cause in the sixth module M6.

Figure 5 shows the steps S101-S108, S201, S202 of another method for root cause analysis in the industrial machine M, in particular on a system level, based on various operational scenarios as further detailed below with reference to figures 6-12.

Step S101 is the start of the root cause analysis of lost time L in the output P of the industrial machine M in any of the operational scenarios as shown in figures 6, 7 and 9-12.

In step S102, a plurality of errors A1-A5 are logged as they occur in the industrial machine M. A timestamp is generated for each error A1-A5.

In step S103, a distinction is made between root causes and effects, for example based on the steps of the method in figure 3. Preferably, only the errors A1, A2 that have been classified as a root cause are sent to a buffer B.

In step S201, causality time windows W1, W2, represented schematically with stopwatches, are assigned to the errors A1, A2 that have been classified as root cause.

Specifically, the causality time windows W1, W2 are started from the timestamps of the respective errors A1, A2. The errors A1, A2 are stored in the buffer B for as long as the respective causality time windows W1, W2 are open. When one of the causality time windows W1, W2 expires, the corresponding error A1, A2 is cleared from the buffer B.

Alternatively, the causality time windows W1, W2 themselves may be stored in the buffer B, optionally together with the respective errors A1, A2. In that case, the causality time windows W1, W2 and their associated errors A1, A2 may be retained in the buffer B for a period longer than the respective causality time windows W1, W2, for later reference and look-up purposes.
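By way of non-limiting illustration only, the first buffering variant, in which an error is cleared from the buffer B when its causality time window expires, may be sketched in Python as follows. The class name `CausalityBuffer` and its methods are hypothetical; the description leaves the implementation of the buffer B open.

```python
class CausalityBuffer:
    """Holds root-cause errors while their causality time windows are open."""

    def __init__(self):
        self._windows = {}   # error name -> (window start, window end)

    def add(self, name, timestamp, window_length):
        # The causality time window starts at the timestamp of the error.
        self._windows[name] = (timestamp, timestamp + window_length)

    def purge(self, now):
        # Clear every error whose causality time window has expired.
        self._windows = {n: w for n, w in self._windows.items() if w[1] > now}

    def open_errors(self, now):
        # Return the names of the errors that are still open at time `now`.
        self.purge(now)
        return sorted(self._windows)
```

In the alternative variant, `purge` would simply be deferred or omitted, so that expired causality time windows remain available for later look-up.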

Meanwhile, in step S104, the interval between the completion times T1, T2, T3, T4 of the products is received or monitored.

In step S105, the interval is compared with the cycle time C of the industrial machine M. If the interval corresponds or substantially corresponds to the cycle time C of the industrial machine M, then the monitoring is continued, as represented with ‘N’. If it is determined that the interval exceeds the cycle time C, then an interruption time window X is opened and the next step in the method is initiated, as represented with ‘Y’.

Preferably, the opening of the interruption time window X is delayed with a time delay D, as shown in figure 6, corresponding to a tolerance of at least ten percent or at least twenty percent of said cycle time C. By using said tolerance, it can be prevented that minor cycle time deviations trigger the start of the interruption time window X. Note that, for example in figure 6, the lost time L in the output P of the industrial machine M already starts at the expected time of completion T3' of one of the products. However, the interruption time window X is started with a time delay D after said expected time of completion T3'.
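By way of non-limiting illustration only, the comparison of step S105 with the time delay D may be sketched in Python as follows. The function name `interruption_start` and the parameter `tolerance` are hypothetical; the default of ten percent merely reflects the lower of the two tolerances mentioned above.

```python
def interruption_start(last_completion, now, cycle_time, tolerance=0.10):
    """Return the start of the interruption time window X, or None.

    `tolerance` is the fraction of the cycle time C used as time delay D.
    """
    expected = last_completion + cycle_time   # expected completion, e.g. T3'
    delay = tolerance * cycle_time            # time delay D
    if now > expected + delay:
        return expected + delay               # X opens a delay D after T3'
    return None                               # step S105 'N': keep monitoring
```

With a cycle time of 60 seconds and the last product completed at 120 seconds, the interruption time window X in this sketch opens at 186 seconds rather than at the expected completion time of 180 seconds.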

In step S106, a request is sent to the buffer B to return all errors A1, A2 which are still open, as represented with the arrow from step S106 to step S202. In response, in step S202, it is determined which errors A1, A2 are still open. Depending on how the errors A1, A2 and/or their respective causality time windows W1, W2 are stored in the buffer B, the determination in step S202 may either be based on: - the errors A1, A2 which are still in the buffer B at the start of the interruption time window X or the errors A1, A2 which are entered into the buffer B during the interruption time window X; or - the errors A1, A2 for which the causality time windows W1, W2 at least partially overlap with the interruption time window X, as determined at the start of the interruption time window X, during the interruption time window X or even after expiry of the interruption time window X, provided that the causality time windows W1, W2 are kept in the buffer B for a longer period of time.

The errors A1, A2 which are still open are subsequently received in step S106, as represented with the arrow from step S202 to step S106.

In step S107, the logged errors A1, A2 which were open during the interruption time window X are classified as possible causes for the interruption in the output P of the industrial machine M.
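By way of non-limiting illustration only, the overlap-based determination of step S202 followed by the classification of step S107 may be sketched in Python as follows. The function names `overlaps` and `possible_causes` are hypothetical, and time windows are assumed to be represented as (start, end) pairs.

```python
def overlaps(window, interruption):
    """Return True if two (start, end) intervals at least partially overlap."""
    return window[0] < interruption[1] and interruption[0] < window[1]


def possible_causes(causality_windows, interruption):
    """Return the names of the errors that were open during window X.

    causality_windows: dict mapping an error name to its (start, end) window.
    interruption:      (start, end) of the interruption time window X.
    """
    return sorted(name for name, window in causality_windows.items()
                  if overlaps(window, interruption))
```

Because the check only uses stored start and end times, it can equally be evaluated after expiry of the interruption time window X, provided the causality time windows have been retained.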

The interruption time window X lasts until or ends when the output P is resumed.

In step S108, the possible causes for the interruption in the output P of the industrial machine M are presented, for example on the human machine interface HMI of the industrial machine M.

The method of figure 5 can be applied to the various operational scenarios as specified in more detail below.

Figure 6 shows an operational scenario in which the module status G1 of the first module M1 changes from the first state (checkmark symbol) to the second state (exclamation mark) upon occurrence of an earliest error A1 (hereafter ‘the first error A1’) that requires human intervention. The first error A1 is classified as the root cause for the change in the module status G1 of the first module M1, in accordance with the method of figure 3. A first causality time window W1 is assigned to the first error A1.

Note that the first causality time window W1 extends beyond the duration of the first error A1, i.e. beyond the point where the module status G1 of the first module M1 returns to the first state (checkmark symbol). In particular, the length of the first causality time window W1 is set to be at least equal to a duration of the first error A1 plus a first time-out V1 that is longer than the cycle time C and shorter than the overall lead time of the industrial machine M. In particular, the first time-out V1 is shorter than a remaining lead time of the overall lead time at and including the first module M1. In this way, the length of the first causality time window W1 can be chosen such that it extends at least up to a time when the effects of the first error A1 on the output P of the industrial machine M are expected to manifest themselves.
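By way of non-limiting illustration only, the minimum length of a causality time window may be computed as sketched below in Python. The function name `causality_window_length` is hypothetical, and raising an exception for a time-out outside the stated bounds is a design choice of this sketch, not a requirement of the description.

```python
def causality_window_length(error_duration, time_out, cycle_time,
                            remaining_lead_time):
    """Return the minimum causality window length: duration plus time-out.

    The time-out (e.g. V1) must be longer than the cycle time C and shorter
    than the remaining lead time measured from the module in which the
    error occurred.
    """
    if not cycle_time < time_out < remaining_lead_time:
        raise ValueError(
            "time-out must lie between cycle time and remaining lead time")
    return error_duration + time_out
```

For an error lasting 30 seconds, a time-out of 120 seconds, a cycle time of 60 seconds and a remaining lead time of 600 seconds, the sketch returns a minimum window length of 150 seconds.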

The first error A1 is stored in the buffer history B1 associated with the first module M1 for as long as the causality time window W1 is open. Note that the output P of the industrial machine M initially continues uninterrupted until the effect of the error A1 manifests itself in the form of the output interval exceeding the cycle time C. The interruption time window X is opened and the buffer history B1 of the first module M1 is accessed to ascertain which errors are still open. In this example, only the first error A1 is open. Hence, the first error A1 is recorded and/or presented as possible cause or root cause for the lost time L during the interruption time window X. When no further errors are open or entered into the buffer B during the interruption time window X, the first error A1 remains the possible cause or root cause for said lost time L. Hence, the first error A1 is recorded as possible cause or root cause for the entire length of the interruption time window X.

Figure 7 shows an alternative operational scenario that differs from the operational scenario of figure 6 in that a further error A6 occurs, in this example in the eighth module M8. However, this further error A6 does not change the module status G8 of the eighth module M8 to the second state (exclamation mark). Instead, the eighth module M8 remains in the first state (checkmark) or only changes to a substate of the first state (triangle pointing right surrounded by a circular arrow) indicating that the eighth module M8 is ready to automatically resume its operation when a root cause external to said eighth module M8 is resolved. Hence, the further error A6 is not classified as a root cause and is ignored as a root cause candidate for the lost time L during the interruption time window X. As a result, despite the first error A1 and the further error A6 occurring simultaneously, only the first error A1 is recorded and/or presented as a possible cause or root cause for the lost time L during the interruption time window X.

Figure 8 shows a further alternative operational scenario that differs from the previously discussed operational scenarios in that the first causality time window W1 expires before an interruption in the output P of the industrial machine M occurs, or in that no interruption occurs at all. In this scenario, the first error A1 is cleared from the buffer history B1 of the first module M1 without any further action.

Figure 9 shows a further alternative operational scenario in which a first error A1 occurs in the first module M1 and a second error A2 occurs in the fourth module M4. Both errors A1, A2 are determined to be root causes in accordance with the method of figure 3 and both errors A1, A2 are assigned causality time windows W1, W2. The time window lengths of both causality time windows W1, W2 are determined independently, based on the actual duration of the respective errors A1, A2 and/or different time-outs V1, V2, for example depending on the severity of the respective error A1, A2 and/or the remaining lead time from the respective modules M1, M4 to the output module M10. Hence, the window lengths of the respective causality time windows W1, W2 may vary considerably.

As both errors A1, A2 are still open during the interruption time window X, both errors A1, A2 are recorded and/or presented as possible causes or root causes for the lost time L during the interruption time window X. In particular, the interruption time window X is split up or divided into two time sections K1, K2 which are apportioned to the errors A1, A2.

In the example of figure 9, the causality time windows W1, W2 do not overlap. Since the first causality time window W1 ends before the second causality time window W2 starts, the first error A1 associated with the first causality time window W1 is recorded as possible cause or root cause for the first time section K1. The first time section K1 ends and the second time section K2 starts at the start of the second causality time window W2.

Figure 10 shows a further alternative operational scenario that differs from the further alternative operational scenario in figure 9 in that the causality time windows W1, W2 overlap and the first causality time window W1 ends before the second causality time window W2. Again, the first error A1 associated with the first causality time window W1 is recorded as possible cause or root cause for the first time section K1 and the second error A2 associated with the second causality time window W2 is recorded as possible cause or root cause for the second time section K2. However, in this case, the first time section K1 ends and the second time section K2 starts when the first causality time window W1 ends.

Figure 11 shows a further alternative operational scenario, again with overlapping causality time windows W1, W2. However, the further alternative operational scenario of figure 11 differs from the further alternative operational scenario in figure 10 in that the second causality time window W2 ends before the first causality time window W1. In this case, the second error A2 associated with the second causality time window W2 is recorded as possible cause or root cause for the first time section K1 and the first error A1 associated with the first causality time window W1 is recorded as possible cause or root cause for the second time section K2. The first time section K1 ends and the second time section K2 starts when the second causality time window W2 ends.
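By way of non-limiting illustration only, the apportioning of the interruption time window X over the time sections K1, K2 in the scenarios of figures 9, 10 and 11 may be sketched in Python as follows. The function name `split_interruption` is hypothetical, time windows are assumed to be (start, end) pairs, the first causality time window is assumed to start before the second, and clamping the boundary to the interruption time window is an assumption of this sketch.

```python
def split_interruption(x, w1, w2):
    """Apportion the interruption time window x between errors A1 and A2.

    x, w1, w2 are (start, end) tuples; w1 is assumed to start before w2.
    Returns [(K1, error), (K2, error)] with K1, K2 as (start, end) tuples.
    """
    overlap = w2[0] < w1[1]
    if not overlap:
        boundary, first, second = w2[0], "A1", "A2"   # figure 9
    elif w1[1] <= w2[1]:
        boundary, first, second = w1[1], "A1", "A2"   # figure 10
    else:
        boundary, first, second = w2[1], "A2", "A1"   # figure 11
    # Assumption: the boundary between K1 and K2 is kept within window x.
    boundary = min(max(boundary, x[0]), x[1])
    return [((x[0], boundary), first), ((boundary, x[1]), second)]
```

In each of the three scenarios, the first time section K1 runs from the start of the interruption time window X to the boundary and the second time section K2 from the boundary to the end of the interruption time window X; only the position of the boundary and the order of the errors differ.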

Figure 12 shows a further alternative operational scenario that differs from the previously discussed scenarios in that user inputs I of the human operator are monitored and plotted against the time axis T. When, at the expected time of completion T3' of a product, no output P is detected at the output module M10, and no user inputs I are received for a predetermined time period F from the expected time of completion T3', it is assumed that the industrial machine M is not being operated, for example because the human operator is absent. A special error Z indicative of ‘no operation’ is then classified, recorded and/or presented as possible cause or root cause for the interruption in the output P during the interruption time window X. In this manner, it can be prevented that the interruption in the output P because of ‘no operation’ is classified as ‘unknown’ or ‘undefined’ and is unnecessarily investigated.
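By way of non-limiting illustration only, the ‘no operation’ check of figure 12 may be sketched in Python as follows. The function name `no_operation` and the representation of the user inputs I as a list of times are hypothetical and are chosen for this sketch only.

```python
def no_operation(expected_completion, user_input_times, period,
                 output_seen=False):
    """Return "Z" (the special 'no operation' error) or None.

    Z applies when no output was detected at the expected completion time
    (e.g. T3') and no user input I was received within the predetermined
    time period F (`period`) counted from that time.
    """
    if output_seen:
        return None
    window_end = expected_completion + period
    had_input = any(expected_completion <= t <= window_end
                    for t in user_input_times)
    return None if had_input else "Z"
```

User inputs received before the expected time of completion are disregarded in this sketch, since the predetermined time period F is counted from that time.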

In a further alternative operational scenario, the control unit U may be adapted, arranged, programmed or configured to classify a ‘specification change’ or a ‘manual mode’, in which manual operations such as a specification change can take place, as a special root cause for the interruption time window X if the configuration of the industrial machine M is being changed, for example during a tire size change in a tire building machine or a changeover to a different compound for the green tire that is being made in said tire building machine. Although such a specification change or manual mode is not an error, it may be useful to include it as a root cause because it can provide insight into how long the industrial machine M is down as a result of the specification change or the manual mode. In particular, if the specification change takes longer than expected, an investigation can be launched into the reasons for the delay.

It is to be understood that the above description is included to illustrate the operation of the preferred embodiments and is not meant to limit the scope of the invention. From the above discussion, many variations will be apparent to one skilled in the art that would yet be encompassed by the scope of the present invention.

LIST OF REFERENCE NUMERALS

A1-A6 errors

B buffer

B3 third buffer history

B4 fourth buffer history

C cycle time

D delay

E root cause history

F predetermined time period

G machine status

G1-G10 module statuses

HMI human machine interface

I input

K1 first time section

K2 second time section

L lost time window

M industrial machine

M1-M10 modules

P output

R1 first role

R2 second role

S1 start of first module root cause analysis

S2 collecting errors from the first module

S3 is this the earliest error in the first module that requires human intervention?

S4 store error in (module) buffer

S5 change module status

S6 end of first module root cause analysis

S11 start of fourth module root cause analysis

S12 collecting errors from the fourth module

S13 is this the earliest error in the fourth module that requires human intervention?

S14 store error in (module) buffer

S15 change module status

S16 end of fourth module root cause analysis

S21 start of root cause analysis in the industrial machine

S22 collect the earliest errors that require human intervention from the modules

S23 classify earliest errors that require human intervention as root causes

S24 generate human machine interface with root causes

S101 start of root cause analysis of an interruption in the output of the industrial machine

S102 logging a plurality of errors

S103 distinguishing between root causes and effects

S104 monitoring the output interval

S105 does the output interval exceed the cycle time?

S106 receiving one or more logged errors which are still open

S107 classifying the one or more logged errors as possible cause for the interruption in the output of the industrial machine

S108 presenting the possible causes for the interruption in the output of the industrial machine

S201 assigning causality time windows to the logged errors

S202 determining which logged errors of the first logged error and the second logged error are still open

T time axis

T1 time of completion of first product

T2 time of completion of second product

T3' expected time of completion of third product

T3 actual time of completion of third product

T4 time of completion of fourth product

U control unit

V1 first time-out

V2 second time-out

W1 first causality time window

W2 second causality time window

X interruption time window

Z special error ‘no operation’

Claims

CLAIMS

1. A method for root cause analysis in an industrial machine (M), the method comprising the steps of: a) logging (S102) a plurality of errors (A1-A6) as they occur in the industrial machine (M); b) assigning (S201) a first causality time window (W1) and a second causality time window (W2) to a first logged error (A1) and a second logged error (A2) of the plurality of errors (A1-A6), respectively, the first logged error (A1) and the second logged error (A2) being considered open for as long as the first causality time window (W1) and the second causality time window (W2), respectively, are open; c) determining (S202), when an output (P) of the industrial machine (M) is interrupted during an interruption time window (X), which logged errors (A1, A2) of the first logged error (A1) and the second logged error (A2) are still open during the interruption time window (X); and d) classifying (S107) at least one of the first logged error (A1) and the second logged error (A2) as a possible cause for the interruption in the output (P) of the industrial machine (M) if it is determined in step c) that the at least one of the first logged error (A1) and the second logged error (A2) is still open during the interruption time window (X).

2. The method according to claim 1, wherein the method further comprises the steps of: - storing the first logged error (A1) and the second logged error (A2) in a buffer (B) for as long as the first causality time window (W1) and the second causality time window (W2), respectively, are open; - clearing the first logged error (A1) from the buffer (B) when the first causality time window (W1) closes; - clearing the second logged error (A2) from the buffer (B) when the second causality time window (W2) closes; and - determining (S202) in step c) which logged errors (A1, A2) are still open by checking which logged errors are still in the buffer (B) during the interruption time window (X).

3. The method according to claim 1, wherein the method further comprises the step of: - determining (S202) in step c) which logged errors (A1, A2) are still open by checking which causality time windows (W1, W2) are still open during the interruption time window (X) and looking up the corresponding logged errors (A1, A2).

4. The method according to any one of the preceding claims, wherein the first logged error (A1) and the second logged error (A2) are both classified (S107) as a first possible cause and a second possible cause, respectively, for the interruption in the output (P) of the industrial machine (M) in step d) if it is determined (S202) in step c) that the first logged error (A1) and the second logged error (A2) are both still open during the interruption time window (X).

5. The method according to claim 4, wherein the method further comprises the step of: e) dividing the interruption time window (X) between the first possible cause and the second possible cause.

6. The method according to claim 5, wherein the interruption time window (X) is divided in step e) based on a start time or an end time of one of the first causality time window (W1) and the second causality time window (W2).

7. The method according to claim 5 or 6, wherein the interruption time window (X) is divided in step e) by assigning the first possible cause and the second possible cause to a first time section (K1) and a second time section (K2) of the interruption time window (X), respectively.

8. The method according to claim 7, wherein the first causality time window (W1) starts before the second causality time window (W2), the interruption time window (X) being divided according to one or more of the following conditions: if the first causality time window (W1) and the second causality time window (W2) overlap and the first causality time window (W1) ends before the second causality time window (W2) has ended, then the first time section (K1) ends and the second time section (K2) starts when the first causality time window (W1) ends; or if the first causality time window (W1) and the second causality time window (W2) overlap and the second causality time window (W2) ends before the first causality time window (W1) has ended, then the second time section (K2) ends and the first time section (K1) starts when the second causality time window (W2) ends; or if the first causality time window (W1) and the second causality time window (W2) do not overlap, then the first time section (K1) ends and the second time section (K2) starts when the second causality time window (W2) starts.

9. The method according to any one of the preceding claims, wherein the method further comprises the steps of: - generating a first timestamp and a second timestamp for the first logged error (A1) and the second logged error (A2), respectively; and - starting the first causality time window (W1) and the second causality time window (W2) from the first timestamp and the second timestamp, respectively.

10. The method according to any one of the preceding claims, wherein the first causality time window (W1) and the second causality time window (W2) have a first window length and a second window length, respectively, which are determined independently.

11. The method according to claim 10, wherein, during normal operation of the industrial machine (M), the industrial machine (M) has an overall lead time and a cycle time (C), the method further comprising the step of: - setting the first window length to be at least equal to a first error duration plus a first time-out (V1) that is longer than the cycle time (C) and shorter than the overall lead time; and/or - setting the second window length to be at least equal to a second error duration plus a second time-out (V2) that is longer than the cycle time (C) and shorter than the overall lead time.

12. The method according to claim 11, wherein the first time-out (V1) is longer than the cycle time (C) and shorter than a remaining lead time of the overall lead time measured from a position in the industrial machine (M) where the first logged error (A1) occurred; and/or wherein the second time-out (V2) is longer than the cycle time (C) and shorter than a remaining lead time of the overall lead time measured from a position in the industrial machine (M) where the second logged error (A2) occurred.

13. The method according to any one of the preceding claims, wherein, during normal operation of the industrial machine (M), the industrial machine (M) has a cycle time (C), the method further comprising the step of: - starting the interruption time window (X) when the output (P) has an output interval that deviates from the cycle time (C).

14. The method according to claim 13, wherein the output interval deviates from the cycle time (C) when the output interval exceeds the cycle time (C) by at least ten percent of the cycle time (C) and preferably by at least twenty percent of the cycle time (C).

15. The method according to any one of the preceding claims, wherein the method further comprises the steps of: - monitoring a user input (I); and - classifying a special error (Z) indicative of no operation as a possible cause for the interruption in the output (P) of the industrial machine (M) when no user input (I) is detected for a predetermined time period (F) during the interruption time window (X).

16. The method according to any one of the preceding claims, wherein the method further comprises the step of: - terminating the interruption time window (X) when the output (P) of the industrial machine (M) is resumed.

17. The method according to any one of the preceding claims, wherein the industrial machine (M) comprises a plurality of modules (M1-M10), the method further comprising the steps of: - distinguishing (S3, S13) between errors (A1, A2) occurring as a root cause in one of the modules (M1-M10) and errors (A3-A6) occurring as an effect of the root cause in one of the modules (M1-M10) and classifying the errors (A1-A6) as such; and - assigning (S201) the first causality time window (W1) and the second causality time window (W2) only to logged errors (A1, A2) classified as root causes.

18. A method for root cause analysis in an industrial machine (M), the industrial machine (M) comprising a plurality of modules (M1-M10), the method comprising the steps of: - assigning a status (G1-G10) to each module (M1-M10) of the plurality of modules (M1-M10) representing a first state in which no human intervention is necessary and a second state in which human intervention is necessary; - changing (S5) the status (G1) of a first module (M1) of the plurality of modules (M1-M10) from the first state to the second state upon the occurrence of an earliest error (A1) in the first module (M1) requiring human intervention in the first module (M1); and - classifying (S13) the earliest error (A1) requiring human intervention in the first module (M1) as a root cause for the change in status (G1) of the first module (M1) from the first state to the second state.

19. The method according to claim 18, wherein any further errors occurring in the first module (M1) after the earliest error (A1) requiring human intervention in the first module (M1) and before the status (G1) of the first module (M1) changes back from the second state to the first state are ignored as candidates for the root cause for the change in status (G1) of the first module (M1) from the first state to the second state.

20. The method according to claim 19, wherein the further errors occurring in the first module (M1) are classified as an effect of the root cause.

21. The method according to any one of claims 18 to 20, wherein the first state is representative of: - a respective module (M1-M10) of the plurality of modules (M1-M10) running without errors; - a respective module (M1-M10) of the plurality of modules (M1-M10) being on hold for a reason external to the respective module (M1-M10); or - a respective module (M1-M10) of the plurality of modules (M1-M10) having an error that can be resolved automatically without human intervention.

22. The method according to claim 21, wherein the method further comprises the step of: - using one or more substates of the first state to distinguish between the respective module (M1-M10) running without errors, being on hold or having an error that can be resolved automatically without human intervention.

23. The method according to any one of claims 18 to 22, wherein the method further comprises the steps of: - changing (S15) the status (G4) of a second module (M4) of the plurality of modules (M1-M10) from the first state to the second state upon the occurrence of an earliest error (A2) in the second module (M4) requiring human intervention in the second module (M4); and - classifying (S23) the earliest error (A2) requiring human intervention in the second module (M4) as a root cause for the change in status (G4) of the second module (M4) from the first state to the second state.

24. A method according to any one of claims 18 to 23, wherein the method further comprises the steps of: - providing a human-machine interface (HMI); - displaying (S24) the statuses (G1-G10) of the plurality of modules (M1-M10) on the human-machine interface (HMI); and - displaying (S24) only faults (A1, A2) on the human-machine interface (HMI) that are classified as a source cause for a change in status (G1, G4) of a module (M1, M4) of the plurality of modules (M1-M10) from the first state to the second state.

25. The method of claim 24, wherein the method further comprises the step of: - filtering out further errors (A3-A6) classified as effects of a source cause from display on the human-machine interface (HMI).
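
As a rough illustration of the display filtering in claims 24 and 25, the following Python sketch keeps only faults classified as a source cause. The `Fault` record, its `classification` field and the function name are assumptions made for this example; the claims do not prescribe any data structure.

```python
from typing import List, NamedTuple


class Fault(NamedTuple):
    """Hypothetical fault record; field names are illustrative only."""
    fault_id: str
    module: str
    classification: str  # either "source_cause" or "effect"


def faults_to_display(faults: List[Fault]) -> List[Fault]:
    """Return only the source causes; effects are filtered out of the display."""
    return [f for f in faults if f.classification == "source_cause"]


logged = [
    Fault("A1", "M1", "source_cause"),
    Fault("A3", "M1", "effect"),
    Fault("A2", "M4", "source_cause"),
    Fault("A4", "M4", "effect"),
]
print([f.fault_id for f in faults_to_display(logged)])
```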

26. A method according to any one of claims 18 to 25, wherein the method further comprises the steps of: - assigning one or more roles (R1, R2) of a plurality of roles (R1, R2) to one or more error messages in a list of error messages used to generate errors in the first module (M1); - associating a human-machine interface (HMI) with a human operator having a first role (R1) of the plurality of roles (R1, R2); and - displaying (S24), upon the occurrence of one or more errors (A1-A6) in the first module (M1), only the error messages on the human-machine interface (HMI) related to the one or more errors (A1) assigned to the first role (R1).
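
The role-based display of claim 26 could be sketched as follows. The mapping `ERROR_ROLES` and its contents are invented for illustration; in practice the role assignment would come from the list of error messages of the module.

```python
from typing import Dict, List, Set

# Hypothetical assignment of roles (R1, R2) to error messages.
ERROR_ROLES: Dict[str, Set[str]] = {
    "A1": {"R1"},
    "A2": {"R2"},
    "A3": {"R1", "R2"},
}


def messages_for_role(occurred: List[str], role: str,
                      error_roles: Dict[str, Set[str]]) -> List[str]:
    """Keep only the occurred errors whose message is assigned to `role`."""
    return [e for e in occurred if role in error_roles.get(e, set())]


# An HMI associated with an operator in role R1 shows only A1 and A3.
print(messages_for_role(["A1", "A2", "A3"], "R1", ERROR_ROLES))
```

Errors without any role assignment are hidden in this sketch; whether such errors should instead be shown to every role is a design choice the claim leaves open.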

27. A method according to any one of the preceding claims, wherein the industrial machine (M) is a tire building machine.

28. A system for root cause analysis in an industrial machine (M), the system comprising a control unit (U) connectable to the industrial machine (M), the control unit (U), when connected to the industrial machine (M), being configured to perform the steps (S1-S6, S11-S16, S21-S24, S101-S108, S201, S202) of the method according to any one of claims 1-27.

29. The system of claim 28, wherein the system further comprises a buffer (B) for storing the first logged error (A1) and the second logged error (A2) for as long as the first causality time slot (W1) and the second causality time slot (W2), respectively, are open, the first logged error (A1) and the second logged error (A2) being cleared from the buffer (B) when the first causality time slot (W1) and the second causality time slot (W2), respectively, close, the control unit (U) being configured to determine (S202) in step c) which logged errors (A1, A2) are still open by checking which logged errors (A1, A2) are still in the buffer (B) during the interrupt time slot (X).

30. The system of claim 28, wherein the control unit (U) is configured to determine (S202) in step c) which logged errors (A1, A2) are still open by checking which causality time slots (W1, W2) are still open during the interrupt time slot (X) and looking up the corresponding logged errors (A1, A2).
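
The two system variants above, one using a buffer (B) and one checking the causality time slots directly, can be compared in a short Python sketch. All names (`LoggedError`, `ErrorBuffer`, `open_during`) are hypothetical, times are plain numbers in arbitrary units, and treating "still open during the interrupt time slot" as an interval overlap is an assumption of this example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LoggedError:
    error_id: str
    start: float   # time at which the error was logged
    window: float  # length of its causality time slot (W1, W2)

    @property
    def end(self) -> float:
        return self.start + self.window


class ErrorBuffer:
    """Buffer variant: an error stays stored only while its slot is open."""

    def __init__(self) -> None:
        self._errors: Dict[str, LoggedError] = {}

    def log(self, error: LoggedError) -> None:
        self._errors[error.error_id] = error

    def open_errors(self, now: float) -> List[str]:
        # Clear errors whose causality time slot has closed, then report
        # whatever is still in the buffer.
        self._errors = {k: e for k, e in self._errors.items() if e.end > now}
        return sorted(self._errors)


def open_during(errors: List[LoggedError],
                x_start: float, x_end: float) -> List[str]:
    """Lookup variant: find slots overlapping the interrupt time slot (X)."""
    return sorted(e.error_id for e in errors
                  if e.start <= x_end and e.end >= x_start)


a1 = LoggedError("A1", start=0.0, window=10.0)
a2 = LoggedError("A2", start=2.0, window=3.0)
buf = ErrorBuffer()
buf.log(a1)
buf.log(a2)
print(buf.open_errors(now=6.0))
print(open_during([a1, a2], x_start=6.0, x_end=8.0))
```

The buffer variant answers the question only for the current moment and discards history, while the lookup variant can be evaluated afterwards for any interrupt time slot as long as the logged errors are retained.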

31. A computer program product comprising a non-transitory computer-readable medium containing instructions which, when executed by a processor, cause a system according to any one of claims 28-30 to perform the steps (S1-S6, S11-S16, S21-S24, S101-S108, S201, S202) of the method according to any one of claims 1-27.