CN106021642A

CN106021642A - Reachable identification based process model repairing method

Info

Publication number: CN106021642A
Application number: CN201610291898.1A
Authority: CN
Inventors: 杜玉越; 祁宏达; 李鹏; 王路; 刘伟
Original assignee: Shandong University of Science and Technology
Current assignee: Shandong University of Science and Technology
Priority date: 2016-05-05
Filing date: 2016-05-05
Publication date: 2016-10-12
Anticipated expiration: 2036-05-05
Also published as: CN106021642B

Abstract

The invention discloses a process model restoration method based on reachable marks. The process model repair method firstly proposes extensions to the calibration, adds the reachable marks of the Petri net, proposes the concept of extended calibration, and attempts to calibrate through the extended To identify the various types of extended actions that may occur. The optimal extended calibration can provide support for the repair of various deviations. By separately analyzing the extended log action and the extended model action, the present invention proposes algorithms to repair these two types of deviations. A simulation experiment of this method was carried out through the event log of the Thoracic Surgery Department of a certain hospital, and it was found that the method of the present invention has improved accuracy and generalization. Compared with existing methods, the method of the present invention has better performance in each dimension.

Description

A Process Model Restoration Method Based on Reachability Mark

技术领域technical field

本发明涉及一种基于可达标识的过程模型修复方法。The invention relates to a process model restoration method based on reachable marks.

背景技术Background technique

过程挖掘是在1998年由R.Agrawal等人提出的。目前国际上比较认可的过程挖掘定义为“过程挖掘是指那些从实际执行集合中提取出结构化过程描述的方法”。Process mining was proposed by R.Agrawal et al. in 1998. At present, the definition of process mining recognized internationally is "process mining refers to the methods for extracting structured process descriptions from actual execution sets".

过程挖掘技术一共有三类应用：过程发现(process discovery)、一致性检测(conformance checking)和过程增强(process enhancement)。过程发现技术的目标是通过事件日志中构建一个过程模型。在构建过程模型时，主要考虑控制流维度。一致性检测则是将一个已知的过程模型与这个模型的事件日志进行对比，来检查记录在日志中的实际情况是否符合这个模型。过程增强其理念是利用实际过程产生的事件日志来扩展或改进一个已存在的过程。与一致性检测不同，第三种过程挖掘场景的目的在于改进和扩展已知模型。There are three types of applications of process mining technology: process discovery, conformance checking and process enhancement. The goal of process discovery techniques is to construct a process model from event logs. When constructing a process model, the control flow dimension is mainly considered. Consistency detection is to compare a known process model with the event log of this model to check whether the actual situation recorded in the log conforms to the model. The idea of process enhancement is to extend or improve an existing process by using the event log generated by the actual process. Unlike consistency detection, the third process mining scenario aims at improving and extending known models.

过程模型的“好坏”需要通过不同方面来进行评价，主要的评价标准有：拟合度(fitness)、精确度(precision)、简化度(simplicity)和泛化度(generalization)。其中，拟合度是过程模型评价最为重要的指标。一个模型有着极佳的拟合度意味着这个模型能够重演日志中所有的迹(trace)。高精确度代表着模型只会重演事件日志中的事件，简化度意味着最好的模型是能够解释日志中所见行为且最为简单的模型，泛化度则代表着模型是否能够允许未来的行为。The "good or bad" of the process model needs to be evaluated through different aspects. The main evaluation criteria are: fitness, precision, simplicity and generalization. Among them, the degree of fit is the most important index for process model evaluation. A model with an excellent fit means that the model is able to reproduce all the traces in the log. High accuracy means that the model will only replay the events in the event log, simplicity means the best model is the simplest model that can explain the behavior seen in the log, and generalization means whether the model can allow future behavior .

在一致性检测和过程增强技术方面，目前最大的挑战就是在过程模型上进行重演，来发现事件日志中观察到的行为的最优路径。In terms of consistency detection and process enhancement techniques, the biggest challenge at present is to perform replay on the process model to find the optimal path of the behavior observed in the event log.

现有技术中给出的基于事件日志的校准方法，不仅能够对事件日志进行重演，还能够解决在重演过程中出现的诸如重复变迁，不可见变迁，复杂的控制流和循环此类问题。另外，对于过程模型中出现的偏差，校准也能够发现偏差出现的地点，并提供偏差的诊断信息。进一步地，通过对不同偏差赋予不同程度的严重性，可以利用校准来实现不同角度的分析。The event log-based calibration method given in the prior art can not only replay the event log, but also solve problems such as repeated transitions, invisible transitions, complex control flow and loops during the replay process. In addition, for deviations in the process model, calibration can also find out where the deviations occur and provide diagnostic information for the deviations. Furthermore, by assigning different degrees of severity to different deviations, calibration can be used to implement analysis from different angles.

通过对事件日志进行校准，能够发现日志中出现的各类偏差。如何对校准中出现的偏差进行修复，使其能够更好的在过程模型中重演也逐渐成为了关注的重点。模型修复技术是一种新的过程挖掘的应用技术，它属于过程增强这一范畴。模型修复以事件日志和过程模型作为输入，如果存在事件日志中的迹不能够在过程模型中重演，则需要对过程模型进行修改。修改后的模型要尽量与原始模式相似，即不改变原始模型的基本结构。By calibrating the event log, various deviations in the log can be found. How to repair the deviation in the calibration so that it can be better reproduced in the process model has gradually become the focus of attention. Model restoration technology is a new application technology of process mining, which belongs to the category of process enhancement. Model repair takes event log and process model as input. If the traces in the event log cannot be reproduced in the process model, the process model needs to be modified. The modified model should be as similar as possible to the original model, that is, the basic structure of the original model should not be changed.

目前存在一些方法，其运用编辑距离矩阵(edit distance matrices)来发现改进的模型，这与修复模型的目标一致，但并不属于模型修复技术。基于模型和日志的偏差来对模型修复，目前来看，现有的方法并不多。尽管现有技术中提出了一种模型修复方法，该模型修复方法利用校准发现模型和日志的偏差通过校准，进而对模型进行修复。但是，此种模型修复方法主要关注于拟合度这一维度，对于其他维度诸如简化度和精确度方面，欠缺考虑。在这一背景下，本发明综合考虑各个维度的指标，进而去修复过程模型。There exist methods that use edit distance matrices to discover improved models, which is consistent with the goal of inpainting models, but is not a model inpainting technique. Based on the deviation of the model and the log to repair the model, at present, there are not many existing methods. Although a model repairing method is proposed in the prior art, the model repairing method uses calibration to discover the deviation of the model and the log to pass the calibration, and then repairs the model. However, this model repair method mainly focuses on the dimension of fit, and lacks consideration for other dimensions such as simplification and accuracy. Under this background, the present invention comprehensively considers the indicators of each dimension, and then repairs the process model.

发明内容Contents of the invention

针对现有技术中存在上述技术问题，本发明的提出了一种基于可达标识的过程模型修复方法，该方法通过综合考虑以上各个维度，便于对过程模型进行修复。Aiming at the above-mentioned technical problems in the prior art, the present invention proposes a method for repairing a process model based on reachable marks, which facilitates repairing the process model by comprehensively considering the above dimensions.

为了实现上述目的，本发明采用如下技术方案：In order to achieve the above object, the present invention adopts the following technical solutions:

一种基于可达标识的过程模型修复方法，其特征在于，包括如下步骤：A method for repairing a process model based on reachability identification, characterized in that it includes the following steps:

a参数定义a parameter definition

定义迹和事件日志Define traces and event logs

设是活动集，迹σ∈A*是活动队列，而事件日志L则是迹的多重集合，即L∈1B(A*)；Assume is the activity set, the trace σ∈A* is the activity queue, and the event log L is a multiple set of traces, that is, L∈1B(A*);

定义Petri网Defining Petri nets

四元组PN＝(P,T；F,M)称为一个Petri网PN，当且仅当：The quadruple PN=(P, T; F, M) is called a Petri net PN if and only if:

(1)N＝(P,T；F)为一个网，其中，P是一个有限库所集，T是一个有限变迁集，是一个有限弧集；(1) N=(P, T; F) is a network, wherein, P is a finite place set, T is a finite transition set, is a finite arc set;

(2)M:P→N称为网PN的一个标识，其中m_i为初始标识，m_f为终止标识；(2) M:P→N is called an identifier of the network PN, where m _i is the initial identifier, and m _f is the termination identifier;

(3)PN具有下面的变迁发生规则：(3) PN has the following transition rules:

(a)对于变迁t∈T，如果则称变迁t在标识M下使能，记为M[t>；(a) For transition t∈T, if Then the transition t is said to be enabled under the mark M, denoted as M[t>;

(b)若M[t>，则在标识M下，变迁t可以发生，从标识M引发变迁t得到一个新的标识M′，记为M[t>M′，且对 (b) If M[t>, then under the label M, the transition t can occur, and a new label M' can be obtained from the transition t triggered by the label M, which is denoted as M[t>M', and for

在Petri网定义中，^·t表示t的输入集，t^·表示t的输出集，M(p)表示在库所p处的标识，M′(p)表示引发变迁t后在库所p处的标识；In the definition of Petri net ^, t represents the input set of t ^, t represents the output set of t, M(p) represents the identity at place p, and M′(p) represents the position at place p after triggering transition t logo;

定义校准define calibration

γ∈(A^>>×T^>>)*是迹σ和模型PN之间的校准，当且仅当：γ∈(A ^>> ×T ^>> )* is the calibration between the trace σ and the model PN if and only if:

(1)π₁(γ)_↓A＝σ，即迹中的移动序列产生了这条迹；(1) π ₁ (γ) _↓A = σ, that is, the movement sequence in the trace produces this trace;

(2)即模型中的移动序列产生一个完全发生序列；(2) That is, the sequence of moves in the model produces a complete sequence of occurrences;

其中，Γ_σ,PN是迹σ和模型PN之间所有校准的集合，π₁(γ)_↓A表示校准中第一个元素，π₂(γ)_↓T表示校准中第二个元素；where Γ _σ,PN is the set of all calibrations between the trace σ and the model PN, π ₁ (γ) _↓A represents the first element in the calibration, and π ₂ (γ) _↓T represents the second element in the calibration;

校准中存在三类动作：同步动作、日志动作和模型动作；其中，There are three types of actions in calibration: synchronous actions, log actions, and model actions; among them,

同步动作指在迹和模型中均会发生的动作，日志动作指在日志中发生但模型中不发生的动作，模型动作则是指在模型中发生但在日志中不发生的动作；Synchronous actions refer to actions that occur in both the trace and the model, log actions refer to actions that occur in the log but not in the model, and model actions refer to actions that occur in the model but not in the log;

定义最优校准Define Optimal Calibration

函数lc:A^>>×T^>>→IR是校准的可能性代价函数，迹σ和模型PN之间的校准γ∈Γ_σ,PN是一个最优校准当且仅当任意校准γ’∈Γ_σ,PN:Σ_(a,t)∈γlc((a,t))≤Σ_{(a’,t’)∈γ’}lc((a’,t’))；The function lc:A ^>> ×T ^>> →IR is the likelihood cost function of the calibration, the calibration γ∈Γ between the trace σ and the model PN _σ,PN is an optimal calibration if and only if any calibration γ'∈Γ _σ,PN : Σ _(a,t)∈γlc ((a,t))≤Σ _{(a',t')∈γ'lc} ((a',t'));

其中，Γ_σ,PN,lc是可能性代价函数为lc的情况下迹σ和模型PN之间的最优校准集，Σ_(a,t)∈γlc((a,t))表示校准γ的可能下代价值，Σ_{(a’,t’)∈γ’}lc((a’,t’))表示校准γ’的可能性代价值；where Γ _σ,PN,lc is the optimal calibration set between the trace σ and the model PN when the possibility cost function is lc, and Σ _(a,t)∈γ lc((a,t)) denotes the calibration γ The possible next generation value of , Σ _{(a',t')∈γ'} lc((a',t')) represents the possible cost value of calibrating γ';

定义扩展移动队列Define an extended move queue

β∈(A^>>×T^>>×M)*是迹σ和模型PN之间的扩展移动队列，当且仅当：β∈(A ^>> ×T ^>> ×M)* is the extended movement queue between trace σ and model PN if and only if:

(1)π₁(β)_↓A≤σ，即扩展移动队列在迹中的移动均是迹σ的前缀；(1) π ₁ (β) _↓A ≤ σ, that is, the movement of the extended mobile queue in the trace is the prefix of the trace σ;

(2)存在一个完全发生序列有即扩展移动队列在模型中的移动是完全发生序列的前缀；(2) There is a complete occurrence sequence Have That is, the movement of the extended movement queue in the model is the prefix of the complete occurrence sequence;

其中，表示完全发生序列，T*表示在变迁集T下的有限队列；in, Represents a complete sequence, T* represents a finite queue under the transition set T;

(3)m_i[π₂(β)_↓T>m，即扩展移动队列在模型中的移动m∈R(m_i)；(3) m _i [π ₂ (β) _↓T >m, that is, the movement m∈R(m _i ) of the extended mobile queue in the model;

对于扩展移动队列中的三元组(a,t,m)∈β，分别有以下扩展动作：For the triplet (a, t, m) ∈ β in the extended mobile queue, there are the following extended actions:

(1)若a∈A且t＝>>，则称之为扩展日志动作；(1) If a∈A and t=>>, it is called an extended log action;

(2)若a＝>>且t∈T，则称之为扩展模型动作；(2) If a=>> and t∈T, it is called an extended model action;

(3)若a∈A且t∈T，则称之为扩展同步动作；(3) If a ∈ A and t ∈ T, it is called an extended synchronous action;

(4)其他类型称之为非法扩展动作；(4) Other types are called illegal extension actions;

通过扩展移动队列的定义进一步给出扩展校准的定义：The definition of the extended calibration is given further by the definition of the extended mobile queue:

定义扩展校准Define Extended Calibration

扩展校准β是迹σ和模型PN之间的扩展移动队列，当且仅当：The extended calibration β is an extended mobile queue between the trace σ and the model PN if and only if:

(1)π₁(β)_↓A＝σ，即扩展移动队列在迹中的移动是这条迹σ；(1) π ₁ (β) _↓A = σ, that is, the movement of the extended mobile queue in the trace is this trace σ;

(2)即扩展移动队列在模型中的移动是一条完全发生序列；(2) That is, the movement of the extended movement queue in the model is a complete occurrence sequence;

(3)m_i[π₂(β)_↓T>m_f，即扩展移动队列在模型中的移动发生后，最终会到达终止标识，(3) m _i [π ₂ (β) _↓T >m _f , that is, after the movement of the extended movement queue in the model occurs, it will eventually reach the termination mark,

其中，B_σ,PN是迹σ和模型PN之间所有扩展校准的集合；其中，π₁(β)_↓A表示校准中第一个元素在活动集A上的映射，π₂(β)_↓T表示校准中第二个元素在变迁集T中的映射；where B _σ,PN is the set of all extended calibrations between the trace σ and the model PN; where π ₁ (β) _↓A represents the mapping of the first element in the calibration on the active set A, and π ₂ (β) _{↓ T} represents the mapping of the second element in the transition set T in the calibration;

定义最优扩展校准Define Optimal Extended Calibration

迹σ和模型PN之间的扩展校准β∈B_σ,PN是一个最优扩展校准当且仅当任意的扩展校准β’∈B_σ,PN:Σ_(a,t,m)∈βlc((a,t))≤Σ_{(a’,t’，m’)∈β’}lc((a’,t’))；An extended calibration β∈B _σ, PN between the trace σ and the model PN is an optimal extended calibration if and only if any extended calibration β'∈B _σ,PN : Σ _{(a,t,m)∈βlc} ( (a,t))≤Σ _{(a',t',m')∈β'} lc((a',t'));

其中，B_σ,PN,lc是可能性代价函数为lc的情况下迹σ和模型PN之间的最优扩展校准集；where B _σ,PN,lc is the optimal extended calibration set between the trace σ and the model PN when the likelihood cost function is lc;

c偏差修复c bias fix

定义扩展事件日志和扩展迹集Define extended event logs and extended trace sets

对于事件日志L，fitness(L,PN)＝1，事件日志L’是扩展事件日志当且仅当：For event log L, fitness(L, PN) = 1, event log L' is an extended event log if and only if:

(1)L’∩L＝L；(1) L'∩L=L;

(2) (2)

(3)0<fitness(L’,PN)<1；(3) 0<fitness(L',PN)<1;

对于扩展后新出现的迹，其集合PA＝L’–L，称之为扩展迹集；For the new trace after expansion, its set PA=L'-L is called the extended trace set;

其中，fitness(L,PN)表示事件日志L在模型PN下的拟合度，fitness(L’,PN)表示事件日志L’在模型PN下的拟合度；Among them, fitness(L, PN) represents the fitting degree of the event log L under the model PN, and fitness(L', PN) represents the fitting degree of the event log L' under the model PN;

c.1扩展日志动作修复c.1 extended log action fix

定义日志动作偏差集Define log action bias set

设最优扩展校准集Β_L,PN,lc是基于事件日志L和Petri网PN的最优扩展校准集，给定一个库所p，MLS是日志动作偏差集，当且仅当：Let the optimal extended calibration set Β _{L, PN, lc} be the optimal extended calibration set based on event log L and Petri net PN, given a place p, MLS is the log action deviation set, if and only if:

$M m L L S S = = {{m m l l s the s | | &ForAll; &ForAll; m m l l s the s [[i i]] &Element; &Element; m m l l s the s,, {m m}_{i i} ((p p)) = = 11 {andπ andπ}_{22} ((m m l l s the s [[i i]])) = = > > > >}};;$

收集指定库所处日志动作，其步骤如下：Collect the log action of the specified library, the steps are as follows:

Step1:输入最优扩展校准集Β_L,PN,lc和指定库所p，遍历整个最优扩展校准集Β_L,PN,lc；Step1: input optimal extended calibration set Β _{L, PN, lc} and designated place p, traverse the entire optimal extended calibration set Β _{L, PN, lc} ;

遍历最优扩展校准集中的校准β_i，并在遍历过程中执行以下操作：Traverse the calibration β _i in the optimal extended calibration set, and perform the following operations during the traversal:

1)若存在β_i[j..k]∈β_i，对于有m_l(p)＝1，andπ₂(β_i[l])＝>>，则L←L∪γ_i[j..k]；1) If there exists β _i [j..k]∈β _i , for There m _l (p)=1, andπ ₂ (β _i [l])=>>, then L←L∪γ _i [j..k];

其中，β_i[j..k]表示β_i中的部分元素，j，k用于确定范围；β_i[l]表示β_i的一个元素，m_l(p)表示标识m_l在库所p处的值；Among them, β _i [j..k] represents some elements in β _i , and j and k are used to determine the range; β _i [l] represents an element of β _i , and m _l (p) represents the identity of m _l in the place the value at p;

2)若不存在符合1)的扩展序列，则跳转至Step1；2) If there is no extension sequence conforming to 1), then jump to Step1;

Step2：遍历完最优扩展校准集Β_L,PN,lc中所有的校准，输出日志动作偏差集MLS；Step2: After traversing all the calibrations in the optimal extended calibration set Β _{L, PN, lc} , output the log action deviation set MLS;

对扩展日志的日志动作进行修复，其具体步骤如下：To repair the log action of the extended log, the specific steps are as follows:

Step1：输入日志动作偏差集MLS和指定库所p，利用InductiveMiner算法挖掘日志动作偏差集MLS，置PN’←InductiveMiner(MLS)；Step1: Input the log action deviation set MLS and the specified place p, use the InductiveMiner algorithm to mine the log action deviation set MLS, and set PN’←InductiveMiner(MLS);

Step2：对于所有属于m_i’的后集的变迁t，执行以下操作：Step2: For all transitions t belonging to the posterior set of m _i ', perform the following operations:

1)：将弧(p,t)加入到原有模型PN的弧集F中；1): Add the arc (p, t) to the arc set F of the original model PN;

2)：将弧(m_i’,t)从挖掘模型PN’的弧集F’中删除；2): Delete the arc (m _i ', t) from the arc set F' of the mining model PN';

Step3：对于所有属于m_f’的前集的变迁t，执行以下操作：Step3: For all transitions t belonging to the previous set of m _f ', perform the following operations:

1)：将弧(t,p)加入到原有模型PN的弧集F中；1): Add the arc (t,p) to the arc set F of the original model PN;

2)：将弧(t,m_f’)从挖掘模型PN’的弧集F’中删除；2): Delete the arc (t, m _f ') from the arc set F' of the mining model PN';

Step4：删除挖掘模型PN’的库所集P’中的初始库所m_i’和终止库所m_f，’并将库所集P’所有元素添加到原有模型PN的库所集P中；Step4: Delete the initial place m _i ' and the termination place m _f in the place set P' of the mining model PN', and add all elements of the place set P' to the place set P of the original model PN ;

Step5：将库所集T’所有元素添加到原有模型PN的库所集T中；Step5: Add all elements of the place set T' to the place set T of the original model PN;

Step6：将库所集F’所有元素添加到原有模型PN的库所集F中；Step6: Add all elements of the place set F' to the place set F of the original model PN;

Step7：输出模型PN；Step7: Output model PN;

c.2对扩展日志的模型动作进行修复，其具体步骤如下：c.2 Repair the model action of the extended log, the specific steps are as follows:

Step1：输入校准β和模型动作出现的位置i，在原有模型PN的变迁集中添加不可见变迁τ；Step1: Input the calibration β and the position i where the model action appears, and add the invisible transition τ to the transition set of the original model PN;

Step2：对于所有属于π₂(β[i])的前集的库所p，执行以下操作：Step2: For all places p belonging to the previous set of π ₂ (β[i]), perform the following operations:

1)：将库所p加入到原有模型PN的库所集P中；1): Add the place p to the place set P of the original model PN;

2)：将弧(p,τ)加入到原有模型PN的弧集F中；2): Add the arc (p, τ) to the arc set F of the original model PN;

Step3：对于所有属于π₂(β[i])的后集的库所p，执行以下操作：Step3: For all places p belonging to the back set of π ₂ (β[i]), perform the following operations:

2)：将弧(τ,p)加入到原有模型PN的弧集F中；2): Add the arc (τ, p) to the arc set F of the original model PN;

Step4：输出模型PN。Step4: Output model PN.

本发明具有如下优点：The present invention has the following advantages:

本发明首先对校准提出了扩展，加入了Petri网的可达标识，提出了扩展校准的概念，并试图通过扩展校准来识别可能出现的各类扩展动作。最优扩展校准则能为各类偏差的修复提供支持。通过对扩展日志动作和扩展模型动作分别进行分析，本发明分别提出了算法对这两类偏差进行了修复。通过某医院胸外科的事件日志对这一方法进行仿真实验发现，本发明在精确度和泛化度上均有提升。相较于现有方法，本发明在各个维度上均有着较好的表现。The present invention firstly proposes an extension to the calibration, adds the reachability mark of the Petri net, proposes the concept of the extended calibration, and attempts to identify various possible extended actions through the extended calibration. The optimal extended calibration can provide support for the repair of various deviations. By separately analyzing the extended log action and the extended model action, the present invention proposes algorithms to repair these two types of deviations. A simulation experiment of this method is carried out through the event log of the Thoracic Surgery Department of a certain hospital, and it is found that the present invention has improved accuracy and generalization. Compared with the existing methods, the present invention has better performance in each dimension.

附图说明Description of drawings

图1为本发明中Petri网的模型示意图；Fig. 1 is the model schematic diagram of Petri net among the present invention;

图2为本发明中基于Petri网的迹的扩展校准示意图；Fig. 2 is the extended calibration schematic diagram of the trace based on Petri net among the present invention;

图3为本发明中基于Petri网的事件日志L中所有的迹的最优扩展校准示意图；Fig. 3 is a schematic diagram of optimal extended calibration of all traces in the event log L based on Petri net in the present invention;

图4为本发明中日志动作的修复示意图；Fig. 4 is the restoration sketch map of log action in the present invention;

图5为本发明中基于Petri网的迹的扩展校准示意图；Fig. 5 is the extended calibration schematic diagram of the trace based on Petri net among the present invention;

图6为本发明中模型动作的修复示意图；Fig. 6 is a schematic diagram of repairing model actions in the present invention;

图7为本发明实例中某医院胸外科的Petri网模型示意图；Fig. 7 is the Petri net model schematic diagram of certain hospital thoracic surgery in the example of the present invention;

图8为本发明实例中某医院胸外科修复后的Petri网模型示意图。Fig. 8 is the schematic diagram of the Petri net model after the repair of a certain hospital thoracic surgery in the example of the present invention.

具体实施方式detailed description

本发明的基本思想是：本发明试图综合考虑以上各个维度来对过程模型进行修复。校准能够对事件日志进行重演，发现各类偏差，即日志动作和模型动作，却无法确定偏差在Petri网中出现的位置，因此，本发明基于Petri网的可达标识，提出了扩展校准的概念，这样便能确定偏差的位置。进一步地，本发明针对扩展校准中出现的日志动作和模型动作分别提出了算法进行修复。最后，通过实验证明了修复算法在各个维度上均有着较好的表现。The basic idea of the present invention is: the present invention attempts to restore the process model by comprehensively considering the above dimensions. Calibration can replay the event log and find various deviations, namely log actions and model actions, but cannot determine where the deviations appear in the Petri net. Therefore, the present invention proposes the concept of extended calibration based on the reachability mark of the Petri net , so that the location of the deviation can be determined. Furthermore, the present invention proposes algorithms for repairing log actions and model actions that appear in the extended calibration. Finally, the experiment proves that the restoration algorithm has better performance in each dimension.

下面结合附图以及具体实施方式对本发明作进一步详细说明：Below in conjunction with accompanying drawing and specific embodiment the present invention is described in further detail:

一种基于可达标识的过程模型修复方法，其包括如下步骤：A method for repairing a process model based on reachability identification, which includes the following steps:

1、参数定义1. Parameter definition

定义迹和事件日志Define traces and event logs

设是活动集，迹σ∈A*是活动队列，而事件日志L则是迹的多重集合，即 Assume is the activity set, the trace σ∈A* is the activity queue, and the event log L is a multiple set of traces, namely

Petri网是一个包含库所(place)和变迁(transition)的有向二分图。Petri net is a directed bipartite graph including places and transitions.

定义Petri网Defining Petri nets

以一个过程模型作为参考，活动在执行过程没有偏离模型，意味着在当前状态下活动是被模型所允许的。也就是说，给定一个无任何偏差的事件日志，活动是完全按照模型的顺序来执行的。基于这种思路，现有技术中给出了校准这一概念，用以比较事件日志中的活动与模型上的活动。显而易见地，过程模型的一个实例，不能够直接影响相同过程下的其它实例。因此，对于校准来说，每一个实例都要对应一个校准。Taking a process model as a reference, the activity does not deviate from the model during its execution, which means that the activity is allowed by the model in its current state. That is, given an event log without any bias, activities are executed exactly in the order of the model. Based on this idea, the concept of calibration is given in the prior art to compare the activities in the event log with the activities on the model. Obviously, an instance of a process model cannot directly affect other instances of the same process. So, for calibration, there is one calibration per instance.

定义校准define calibration

校准中存在三类动作：同步动作、日志动作和模型动作。There are three types of actions in calibration: synchronous actions, log actions, and model actions.

同步动作指在迹和模型中均会发生的动作，日志动作指在日志中发生但模型中不发生的动作，模型动作则是指在模型中发生但在日志中不发生的动作。Synchronous actions refer to actions that occur in both the trace and the model, log actions refer to actions that occur in the log but not in the model, and model actions refer to actions that occur in the model but not in the log.

给定一个迹和一个Petri网，可能会存在多个校准。在实际过程中，偏差的可能性会随着活动的不同，人为制定的不同而产生不同的影响。例如，Given a trace and a Petri net, multiple calibrations may exist. In the actual process, the possibility of deviation will have different effects according to the different activities and different artificial formulations. For example,

可以认为日志动作产生的偏差要大于模型动作产生的偏差。为了能够产生最有可能的校准，需要对每一个移动赋予一个可能性代价值。在本发明中，对于日志动作和模型动作，其可能性代价值为1，对于同步扩展动作，其可能性代价值为0。根据制定的可能性代价值，能够找到可能性代价值最低的校准，这种校准称之为最优校准。It can be considered that the bias produced by the log action is larger than the bias produced by the model action. In order to be able to produce the most probable alignment, it is necessary to assign a likelihood cost to each move. In the present invention, for the log action and the model action, the possibility cost value is 1, and for the synchronous extension action, the possibility cost value is 0. According to the established possible cost value, the calibration with the lowest possible cost value can be found, and this kind of calibration is called the optimal calibration.

定义最优校准Define Optimal Calibration

设函数lc:A^>>×T^>>→IR是校准的可能性代价函数，迹σ和模型PN之间的校准γ∈Γ_σ,PN是一个最优校准当且仅当：γ’∈Γ_σ,PN:Σ_(a,t)∈γlc((a,t))≤Σ_{(a’,t’)∈γ’}lc((a’,t’))；Let the function lc:A ^>> ×T ^>> →IR be the possibility cost function of the calibration, the calibration between the trace σ and the model PN γ∈Γ _σ,PN is an optimal calibration if and only if: γ'∈Γ _σ,PN : Σ _(a,t)∈γlc ((a,t))≤Σ _{(a',t')∈γ'lc} ((a',t'));

其中，Γ_σ,PN,lc是可能性代价函数为lc的情况下迹σ和模型PN之间的最优校准集。where Γ _σ,PN,lc is the optimal calibration set between the trace σ and the model PN with the likelihood cost function lc.

校准和最优校准能够将事件日志和过程模型关联起来，并且最优校准能够处理过程模型中的不可见变迁、复杂的控制流和循环等。当偏差出现时，校准能够展示出偏差所在的位置，并且为进一步的研究提供诊断信息。可能性代价函数则为校准提供了一种严重度的度量方式。Calibration and optimal calibration can correlate event logs with process models, and optimal calibration can handle invisible transitions in process models, complex control flows and loops, etc. Calibration can show where deviations occur when they occur and provide diagnostic information for further investigation. The likelihood cost function provides a measure of severity for calibration.

2、扩展校准2. Extended Calibration

过程模型修复属于过程增强的具体应用，其理念是利用实际过程中产生的事件日志扩展和改进一个已经存在的过程模型。校准能够对事件日志进行重演，发现各类偏差，即日志动作和模型动作，却无法确定偏差在Petri网中出现的位置。因此，本发明通过引入可达标识对校准进行扩展，来确定偏差出现的位置。进而才有可能在相应的位置处对偏差进行修复。Process model repair belongs to the specific application of process enhancement, and its idea is to use the event log generated in the actual process to extend and improve an existing process model. Calibration can replay the event log and find various deviations, that is, log actions and model actions, but it cannot determine where the deviations appear in the Petri net. Therefore, the present invention expands the calibration by introducing reachable marks to determine where the deviation occurs. Then it is possible to repair the deviation at the corresponding position.

过程模型修复需要有评价指标，拟合度是过程模型最为重要的评价指标，为了便于描述，本发明使用fitness(L,N)来表示事件日志L在模型N下的拟合度。Process model restoration requires an evaluation index, and the fitting degree is the most important evaluation index of the process model. For the convenience of description, the present invention uses fitness(L, N) to represent the fitting degree of the event log L under the model N.

对于其他指标，要在尽可能保证高拟合度的情况下进行考虑，因此本发明正是在保证高拟合度的同时，使过程模型有更好的精确度和简化度。For other indexes, it should be considered under the condition of ensuring high fitting degree as much as possible, so the present invention makes the process model have better accuracy and simplification while ensuring high fitting degree.

为了对过程模型进行修复，需要对校准来进行一些扩展以适应需求。In order to fix the process model, some extensions to the calibration are needed to suit the needs.

定义扩展移动队列Define an extended move queue

定义扩展校准Define Extended Calibration

以图1所示的Petri网为例，假设存在一条迹σ＝<a,d,e,f>，其扩展校准如图2所示。以迹σ＝<a,d,e,f>为例，可以得到图2所示的扩展校准。对于以上两个扩展校准，能够发现扩展移动(a,a,m₁)是一个同步扩展动作，(f,>>,m₅)是一个日志扩展动作，(>>,b,m₃)是一个模型扩展动作。其中可达标识m₂:m₂(p₂)＝m₂(p₅)＝1,m₂(p₁)＝m₂(p₂)＝m₂(p₃)＝m₂(p₄)＝m₂(p₆)＝0。Taking the Petri net shown in Figure 1 as an example, assuming there is a trace σ=<a, d, e, f>, its extended calibration is shown in Figure 2. Taking the trace σ=<a,d,e,f> as an example, the extended calibration shown in Figure 2 can be obtained. For the above two extended calibrations, it can be found that the extended move (a,a,m ₁ ) is a synchronous extended action, (f,>>,m ₅ ) is a log extended action, and (>>,b,m ₃ ) is A model extension action. Among them, the reachable identifier m ₂ : m ₂ (p ₂ )=m ₂ (p ₅ )=1, m ₂ (p ₁ )=m ₂ (p ₂ )=m ₂ (p ₃ )=m ₂ (p ₄ ) =m ₂ (p ₆ )=0.

在实际中偏差的严重度会随着活动的不同、人为制定的不同而产生不同的影响。为了能够产生最有可能的扩展校准，需要对每一个扩展动作赋予一个可能性代价值。根据制定的可能性代价函数，便能够找到可能性代价值最低的扩展校准，这种校准称之为最优扩展校准。In practice, the severity of the deviation will have different impacts depending on the activities and human formulations. In order to be able to generate the most probable extended calibration, it is necessary to assign a likelihood cost value to each extended action. According to the formulated possibility cost function, the extended calibration with the lowest possible cost value can be found, which is called the optimal extended calibration.

定义最优扩展校准Define Optimal Extended Calibration

在本发明中，对于日志扩展动作和模型扩展动作，其可能性代价值为1，对于同步扩展动作，其可能性代价值为0。因此，对于图2中的两个扩展校准β₁和β₂，其可能性代价值lc(β₁)＝2，lc(β₂)＝2，显然，β₁，β₂均是是迹σ＝<a,e,i,f,g>的最优扩展校准。In the present invention, for the log extension action and the model extension action, the possibility cost value is 1, and for the synchronous extension action, the possibility cost value is 0. Therefore, for the two extended calibrations β ₁ and β ₂ in Fig. 2, the possible cost values lc(β ₁ )=2, lc(β ₂ )=2, obviously, β ₁ and β ₂ are traces σ = Optimal extended calibration for <a,e,i,f,g>.

3、偏差修复方法3. Deviation repair method

在现实生活中，过程模型往往通过过程挖掘算法或者手动进行建立。因此存在着这样一种情形：一开始的事件日志与过程模型完全符合的，但随着实际过程的发展，事件日逐渐扩大，出现了一些原有的过程模型中不存在的迹。这类日志本发明中称之为扩展日志。In real life, process models are often built through process mining algorithms or manually. Therefore, there is such a situation: the event log is completely consistent with the process model at the beginning, but with the development of the actual process, the event log gradually expands, and some traces that do not exist in the original process model appear. This type of log is called extended log in the present invention.

在此本发明给出了扩展日志的形式化定义。Herein, the present invention provides a formalized definition of an extended log.

对于事件日志L，fitness(L,PN)＝1。事件日志L’是扩展事件日志当且仅当：For event log L, fitness(L,PN)=1. The event log L' is an extended event log if and only if:

(1)L’∩L＝L；(1) L'∩L=L;

(2) (2)

(3)0<fitness(L’,PN)<1。(3) 0<fitness(L',PN)<1.

对于扩展后新出现的迹，其集合PA＝L’–L，称之为扩展迹集。For the newly appearing traces after expansion, the set PA=L’–L is called the extended trace set.

对于扩展事件日志来说，扩展迹集中的迹导致扩展事件日志的拟合度出现了下降。因此，只要发现扩展迹集中出现的各类偏差，并对各类偏差进行修复，就可以完成扩展事件日志的过程模型修复工作。For the extended event log, the traces in the extended trace set lead to a decrease in the fit of the extended event log. Therefore, as long as various deviations appearing in the extended trace set are found and repaired, the process model restoration work of the extended event log can be completed.

对于最优扩展校准来说，存在两类基本的偏差类型，扩展日志动作和扩展模型动作。在此，分别对这两类偏差进行处理修复。For optimal extended calibration, there are two basic types of biases, extended log actions and extended model actions. Here, the two types of deviations are processed and repaired respectively.

3.1扩展日志动作修复3.1 Extended log action repair

扩展日志动作的出现，说明事件日志中出现了过程模型中不存在的活动，需要在原有的过程模型上进行扩展，因而需要在出现日志动作的库所处添加一个自环来满足这类修复需求。当一条迹存在多条最优扩展校准，并且其偏差完全相同时，取其中任意一条作为参考并不会影响过程模型的修复。因此，本发明对此类情况仅仅取其中任意一条最优扩展校准。对于事件日志来说，同一库所处不同的迹可能出现不同的日志动作，因此需要对该库所处的日志动作进行归类整理。将其以事件日志的形式进行保存，之后调用挖掘算法，来挖掘这一事件日志，在挖掘出相应的结构后，将其添加到该库所处。The appearance of the extended log action indicates that there are activities in the event log that do not exist in the process model, and the original process model needs to be extended. Therefore, it is necessary to add a self-loop to the repository where the log action appears to meet this type of repair requirement. . When there are multiple optimal extended calibrations for a trace, and their deviations are exactly the same, taking any one of them as a reference will not affect the restoration of the process model. Therefore, the present invention only selects any one of the optimal extended calibrations for such cases. For event logs, different log actions may occur in different traces of the same library, so it is necessary to classify the log actions of the library. Save it in the form of an event log, and then call the mining algorithm to mine the event log, and add it to the library after mining the corresponding structure.

定义扩展校准时可知，π₃(β)是模型中变迁发生后的可达标识，由Petri网的发生规则可知，对于任意的π₃(β[j])∈π₃(β),至少存在一个库所p，其可达标识m_j(p)＝1，若可达标识存在多个库所使得其可达标识值为1时，需要对其进行预处理，以免出现多次重复修复。因而，可以任意取一个作为日志收集的库所，本发明则是选取了最早出现的库所。When defining the extended calibration, it can be known that π ₃ (β) is the reachable label after the transition in the model occurs. According to the occurrence rules of Petri nets, for any π ₃ (β[j])∈π ₃ (β), there exists at least For a place p, its reachability flag m _j (p)=1, if there are multiple places in the reachable flag so that the reachable flag value is 1, it needs to be preprocessed to avoid repeated repairs. Therefore, one can arbitrarily select a storehouse for log collection, and the present invention selects the earliest storehouse.

首先需要设计一个算法，来实现对某一库所处日志动作的收集。其输入是事件日志对应的最优扩展校准集和指定的库所，其思想是依次对每一条扩展校准进行遍历，当发现指定库所并且是扩展日志动作时，将其存入日志MLS中，对于日志MLS，在此称之为日志动作偏差集。遍历完所有的扩展最优校准后将会得到一个事件日志，这个事件日志为下一步进行过程模型修复打下了基础。下面给出日志动作偏差集的形式化定义：First of all, it is necessary to design an algorithm to realize the collection of log actions in a certain library. Its input is the optimal extended calibration set corresponding to the event log and the designated location. The idea is to traverse each extended calibration in turn. When the specified location is found and it is an extended log action, it will be stored in the log MLS. For log MLS, it is called log action deviation set here. After traversing all the extended optimal calibrations, an event log will be obtained, which lays the foundation for the next step of process model repair. The formal definition of log action deviation set is given below:

定义日志动作偏差集Define log action bias set

设是活动集，L是基于活动集A的事件日志，PN＝(P,T,F,M)是基于活动集A的Petri网，设最优扩展校准集Β_L,PN,lc是基于事件日志和Petri网的最优扩展校准集，给定一个库所p，MLS是日志动作偏差集，当且仅当：Assume is the active set, L is the event log based on the active set A, PN=(P, T, F, M) is the Petri net based on the active set A, and the optimal extended calibration set Β _{L, PN, lc} is based on the event log and the optimal extended calibration set of Petri nets, given a place p, MLS is a log action deviation set if and only if:

$M m L L S S = = {{m m l l s the s | | &ForAll; &ForAll; m m l l s the s [[i i]] &Element; &Element; m m l l s the s,, {m m}_{i i} ((p p)) = = 11 {andπ andπ}_{22} ((m m l l s the s [[i i]])) = = > > > >}} . .$

算法1收集指定库所处日志动作，其步骤如下：Algorithm 1 collects the log actions of the specified library, and its steps are as follows:

Step2：遍历完最优扩展校准集Β_σ,PN,lc中所有的校准，输出日志动作偏差集MLS；Step2: After traversing all the calibrations in the optimal extended calibration set Β _{σ, PN, lc} , output the log action deviation set MLS;

以图1的Petri网为例，假设事件日志L＝{<a,b,f,g,h,d,e>,<a,b,f,h,d,e>,<a,b,f,k,h,d,e>}，分别求其最优扩展校准，如图3所示。Taking the Petri net in Figure 1 as an example, suppose the event log L={<a,b,f,g,h,d,e>,<a,b,f,h,d,e>,<a,b, f, k, h, d, e>}, respectively seek their optimal extended calibration, as shown in Figure 3.

通过最优扩展校准使用上述步骤得到其日志动作偏差集MLS＝{<f,g,h>,<f,h>,<f,k,h>}。The log action bias set MLS={<f,g,h>,<f,h>,<f,k,h>} is obtained through the optimal extended calibration using the above steps.

在收集到所有指定库所处的日志动作，将其存储到日志动作偏差集中。下一步是对日志动作偏差集进行处理，可以通过挖掘算法来挖掘过程模型，之后将这一模型添加到指定的库所处。目前，国内外的很多学者都致力于研究过程挖掘算法。比如，基于活动顺序关系产生的α算法以及其衍生算法，基于语义技术提出的区域挖掘和ILP挖掘等等。本发明所使用的过程挖掘算法是归纳挖掘。下面，给出日志动作的修复算法，其主要思想是通过挖掘算法挖掘出日志动作偏差集对应的过程模型，再将这一过程模型插入到指定的库所处。After collecting the log actions of all specified libraries, store them in the log action deviation set. The next step is to process the log action deviation set. The process model can be mined through the mining algorithm, and then this model is added to the specified repository. At present, many scholars at home and abroad are devoted to the research of process mining algorithms. For example, the α algorithm based on the sequence relationship of activities and its derivative algorithms, the region mining and ILP mining based on semantic technology, and so on. The process mining algorithm used in the present invention is inductive mining. In the following, the restoration algorithm of the log action is given. The main idea is to dig out the process model corresponding to the log action deviation set through the mining algorithm, and then insert this process model into the specified repository.

算法2对扩展日志的日志动作进行修复，其具体步骤如下：Algorithm 2 repairs the log action of the extended log, and its specific steps are as follows:

Step7：输出模型PN。Step7: Output model PN.

继续以事件日志L为例，根据其日志动作偏差集MLS＝{<f,g,h>,<f,h>,<f,k,h>}，按照算法2，制定库所为p₃，可以得到如图4的过程模型，其中修复的部分已标出。Continuing to take the event log L as an example, according to its log action deviation set MLS={<f,g,h>,<f,h>,<f,k,h>}, according to Algorithm 2, formulate the place as p ₃ , the process model shown in Figure 4 can be obtained, where the repaired part has been marked.

3.2扩展模型动作修复3.2 Extended Model Action Repair

模型动作的出现说明事件日志中不存在这种活动，为了使事件日志在过程模型中重演，添加一个不可见变迁来扩展模型，使其与事件日志保持高拟合性。The appearance of model actions indicates that such activities do not exist in the event log. In order to make the event log replay in the process model, an invisible transition is added to extend the model so that it maintains a high fit with the event log.

下面给出该情形下的过程模型修复算法，算法的输入是最优扩展校准中的模型动作，通过模型动作来确定其在过程模型中对应的变迁，进而确定变迁的前集和后集，进一步添加一个不可见变迁，最后将过程模型输出。The process model restoration algorithm in this situation is given below. The input of the algorithm is the model action in the optimal extended calibration, and the corresponding transition in the process model is determined through the model action, and then the front set and the back set of the transition are determined, and further Add an invisible transition, and finally output the process model.

算法3对扩展日志的模型动作进行修复，其具体步骤如下：Algorithm 3 repairs the model action of the extended log, and its specific steps are as follows:

Step4：输出模型PN。Step4: Output model PN.

以图1所示的Petri网为例，对于迹σ＝<a,d,e>来说，其最优扩展校准如图5所示。按照上述步骤对其进行修复，可以得到修复后的过程模型，如图6所示。Taking the Petri net shown in Figure 1 as an example, for the trace σ=<a,d,e>, its optimal extended calibration is shown in Figure 5. According to the above steps to repair it, the repaired process model can be obtained, as shown in Figure 6.

通过确认扩展日志动作和扩展模型动作，可以确定事件日志中的偏差在模型中出现的位置，分别对这两类偏差进行修复，模型最终将会得到修复。By confirming the extended log action and the extended model action, you can determine where the deviations in the event log appear in the model, and repair these two types of deviations respectively, and the model will eventually be repaired.

本发明以某医院胸外科的业务流程为例，通过对原有事件日志的基础上进行挖掘建模，得到图7的Petri网模型。其主要活动过程描述如下：首先，病人可以进行预约，预约的方式有两种：分诊台预约(reserve at triage station)和电话预约(reserve by phone)，在预约完成后，进行预约登记(booking)和取得预约号(get reservation number)。病人也可以不进行预约直接进行就诊(consult without reservation)，因此病人需要进行挂号(registration)。对于预约的病人和挂号的病人，医院需要进行排号(arranging)、顺序叫号(call number by order)，病人依照顺序找医生问诊(inquiry)。医生通过问诊来确定是否进行影像检查和临床检验。影像检查一共有三种类型：普通CT(common ct)、PET-CT和胸部增强(chest enhanced scan)。临床检验有4种类型，血沉(ESR)、生化全套(Biochemicalfull set)、血气分析(blood gas analysis)和血常规(blood routine)。之后，医生根据检查结果进行诊断(diagnosis)。若病人病情无碍，则可离开医院(leave)。若病人病情严重，则需要住院治疗(hospitalization)。The present invention takes the business process of the thoracic surgery department of a certain hospital as an example, and obtains the Petri net model in FIG. 7 by mining and modeling on the basis of the original event log. Its main activity process is described as follows: First, patients can make an appointment. There are two ways to make an appointment: reserve at triage station and reserve by phone. After the appointment is completed, make an appointment registration (booking ) and get reservation number. Patients can also consult without reservation directly, so patients need to register. For patients who make appointments and registered patients, the hospital needs to arrange and call number by order, and the patients will seek a doctor's consultation (inquiry) according to the order. Doctors ask questions to determine whether to order imaging tests and clinical tests. There are three types of imaging examinations: common CT (common ct), PET-CT, and chest enhanced scan. There are 4 types of clinical tests, erythrocyte sedimentation rate (ESR), biochemical full set, blood gas analysis and blood routine. After that, the doctor makes a diagnosis based on the test results. If the patient is in good condition, they can leave the hospital. If the patient is seriously ill, hospitalization is required.

由于医院的数据量庞大，在此仅取了其中部分作为事件日志，本次实验用到的事件日志由2016条迹组成，分别对每一条迹进行校准，能够得到所有的最优扩展校准集。这一事件日志在原有的事件日志的基础上，出现了新的活动，使得这一事件日志并不能够被上述的过程模型所重演，即这一模型相对于真实流程出现了活动的缺失。Due to the huge amount of data in the hospital, only part of it is taken as the event log here. The event log used in this experiment consists of 2016 traces, and each trace is calibrated separately to obtain all the optimal extended calibration sets. On the basis of the original event log, new activities appear in this event log, so that this event log cannot be reproduced by the above process model, that is, there is a lack of activities in this model relative to the real process.

以迹σ＝<Reserve by Internet,Booking,Get reservation number,Arranging,Call number by order,Inquiry,Common ct,ESR,Diagnosis,Hospitalization>为例，给出其最优扩展校准。Taking the trace σ=<Reserve by Internet, Booking, Get reservation number, Arranging, Call number by order, Inquiry, Common ct, ESR, Diagnosis, Hospitalization> as an example, the optimal extended calibration is given.

通过该最优扩展校准可以发现，(Reserve by Internet,>>,m₁)是一个扩展日志动作，说明在此处存在一个模型中不存在的活动，这一活动可能便是模型缺失的活动。而(>>,Reserve by Phone,m₂)是一个扩展模型动作，说明这一活动本应存在于迹中，然而事实并未存在。对于这两种动作，均需要相应的修复算法对模型进行修复。对所有的迹进行扩展校准，便能够发现事件日志中存在的所有的扩展日志动作和扩展模型动作。进而，可以对这些动作进行修复。Through the optimal extended calibration, it can be found that (Reserve by Internet,>>,m ₁ ) is an extended log action, indicating that there is an activity that does not exist in the model, and this activity may be the missing activity in the model. And (>>,Reserve by Phone,m ₂ ) is an extended model action, indicating that this activity should exist in the trace, but the fact does not exist. For these two actions, corresponding repair algorithms are required to repair the model. Extensive calibration of all traces enables discovery of all extended log actions and extended model actions present in the event log. In turn, these actions can be repaired.

为了修复扩展日志动作，需要在事件日志中进行收集这些动作。本发明所有的算法均在ProM6.0平台中实现，ProM是一个开源的过程挖掘工具，其提供了源代码供人们学习和应用。In order to fix extended log actions, it is necessary to collect these actions in the event log. All the algorithms of the present invention are implemented in the ProM6.0 platform. ProM is an open source process mining tool, which provides source codes for people to learn and apply.

首先，我们将事件日志和过程模型作为输入，通过Replay a Log on Petri Netfor All Alignments插件来获得事件日志的最优校准集Γ_L,PN,lc。First, we take the event log and the process model as input, and obtain the optimal calibration set Γ _L,PN,lc of the event log through the Replay a Log on Petri Net for All Alignments plug-in.

在最优校准集的基础上，本发明对其添加了可达标识，得到了其最优扩展校准集Β_L,PN,lc。On the basis of the optimal calibration set, the present invention adds a reachability mark to it, and obtains its optimal extended calibration set BL _,PN,lc .

在获得了事件日志的最优扩展校准集Β_L,PN,lc后，需要对日志动作偏差集进行收集。以最优扩展校准集Β_L,PN,lc和库所p₁作为输入，调用算法1。After obtaining the optimal extended calibration set Β _{L, PN, lc} of the event log, it is necessary to collect the log action deviation set. Algorithm 1 is invoked with the optimal extended calibration set Β _{L, PN, lc} and place p ₁ as input.

对最优扩展校准集中Β_L,PN,lc所有的扩展校准进行遍历，可以发现，其中部分最优扩展校准存在β_i[j..k]∈β_i，并且对于有m_l(p₁)＝1且π₂(β_i[l])＝>>，算法1将其添加到一个日志L中。若最优扩展校准不满足这一条件，算法将继续进行遍历。Traversing all the extended calibrations in the optimal extended calibration set Β _{L, PN, lc} , it can be found that some of the optimal extended calibrations exist β _i [j..k]∈β _i , and for With m _l (p ₁ )=1 and π ₂ (β _i [l])=>>, Algorithm 1 adds it to a log L. If the optimal extended calibration does not satisfy this condition, the algorithm will continue to traverse.

遍历完最优扩展校准集Β_σ,PN,lc后，算法1会输出在p₁处的日志动作偏差集MLS。After traversing the optimal extended calibration set Β _σ,PN,lc , Algorithm 1 will output the log action deviation set MLS at p ₁ .

相似地，对过程模型所有的库所进行算法1的处理，便可以得到整个过程模型各个库所处的日志动作偏差集。Similarly, by performing Algorithm 1 processing on all the locations of the process model, the log action deviation set of each location of the entire process model can be obtained.

对于本发明所使用到的事件日志，通过算法1，得到的日志动作偏差集，如表1所示。For the event log used in the present invention, the log action deviation set obtained through Algorithm 1 is shown in Table 1.

表1各个位置处的日志动作偏差集Table 1 Log Action Bias Sets at Various Positions

这些动作均存在于事件日志中，而过程模型中不存在，因此需要对过程模型进行修复。以p₁处的日志动作偏差集MLS和指定库所p₁作为输入，调用算法2。These actions are present in the event log but not in the process model, so the process model needs to be fixed. With the log action deviation set MLS at p ₁ and the designated place p ₁ as input, call Algorithm 2.

算法2首先会调用ProM里已经存在的算法InductiveMiner算法来挖掘日志动作偏差集MLS，会得到一个中间变量的Petri网PN。’Algorithm 2 will first call the InductiveMiner algorithm that already exists in ProM to mine the log action deviation set MLS, and obtain a Petri net PN of an intermediate variable. '

紧接着，算法2会对PN’进行处理，对于所有属于m_i’的后集的变迁t，会将弧(p,t)加入到原有模型PN的弧集F中并且弧(m_i’,t)从挖掘模型PN’的弧集F’中删除；Next, Algorithm 2 will process PN'. For all transitions t belonging to the back set of _{mi ', the arc (p,t) will be added to the arc set F of the original model PN and the arc (m i} _' ,t) is deleted from the arc set F' of the mining model PN';

对于所有属于m_f’的前集的变迁t，会将弧(t,p)加入到原有模型PN的弧集F中并且将弧(t,m_f’)从挖掘模型PN’的弧集F’中删除。For all transitions t belonging to the previous set of m _f ', the arc (t,p) will be added to the arc set F of the original model PN and the arc (t,m _f ') will be added to the arc set of the mining model PN'F' is deleted.

之后，算法2会删除挖掘模型PN’的库所集P’中的初始库所m_i’和终止库所m_f’，并将库所集P’所有元素添加到原有模型PN的库所集P中。Afterwards, Algorithm 2 will delete the initial place m _i ' and the termination place m _f ' in the place set P' of the mining model PN', and add all elements of the place set P' to the places of the original model PN Set P in.

进一步地，算法2将库所集T’所有元素添加到原有模型PN的库所集T中；并且将库所集F’所有元素添加到原有模型PN的库所集F中。Further, Algorithm 2 adds all elements of place set T' to place set T of the original model PN; and adds all elements of place set F' to place set F of the original model PN.

最后，算法2会输出模型PN。Finally, Algorithm 2 outputs the model PN.

相似的，对于p₁₂处的日志动作偏差集MLS和指定库所p₁₂，通过算法2对过程模型进行修复。Similarly, for the log action deviation set MLS at p ₁₂ and the designated place p ₁₂ , the process model is repaired by Algorithm 2.

最后是对于扩展模型动作的修复，以迹σ的最优扩展校准β为例，可以发现当i＝2时，出现了一个扩展模型动作。以最优扩展校准β和i＝2为输入，调用算法3。Finally, for the restoration of the extended model action, taking the optimal extended calibration β of the trace σ as an example, it can be found that when i=2, an extended model action appears. Algorithm 3 is invoked with the optimal extended calibration β and i=2 as input.

首先，算法3会在原有模型PN的变迁集T中添加不可见变迁τ。First, Algorithm 3 will add invisible transition τ to the transition set T of the original model PN.

接着，算法3对于所有属于π₂(β[i])的前集的库所p，将库所p加入到原有模型PN的库所集P中；将弧(p,τ)加入到原有模型PN的弧集F中。Next, in Algorithm 3, for all the places p belonging to the previous set of π ₂ (β[i]), add the places p to the place set P of the original model PN; add the arc (p,τ) to the original There is an arc set F of model PN.

之后，算法3对于所有属于π₂(β[i])的后集的库所p，将库所p加入到原有模型PN的库所集P中；将弧(τ,p)加入到原有模型PN的弧集F中。Afterwards, Algorithm 3, for all the places p belonging to the back set of π ₂ (β[i]), adds the places p to the original model PN’s place set P; adds the arc (τ,p) to the original There is an arc set F of model PN.

最后，算法3输出模型PN。Finally, Algorithm 3 outputs the model PN.

相似的，遍历整个最优扩展校准集Β_σ,PN,lc，修复所有出现的扩展模型动作。Similarly, traverse the entire optimal extended calibration set Β _σ,PN,lc and repair all the extended model actions that appear.

修复后的过程模型在p₁处添加了两个活动，网上预约(reserve by Internet)和微信预约(reserve by WeChat)。在p₁₃处添加了两类新的诊断措施，保守治疗(Conservative treatment)和转院治疗(Referral treatment)。另外如图所示，Petri网模型中添加了两个不可见变迁，用以修复最优扩展校准中出现的模型动作。 _The repaired process model adds two activities at p1, reserve by Internet and reserve by WeChat. Added two new categories of diagnostic measures at p ₁₃ , Conservative treatment and Referral treatment. In addition, as shown in the figure, two invisible transitions are added to the Petri net model to repair the model actions that occur in the optimal extended calibration.

对于修复后的过程模型，不难发现，其事件日志可以被其完全重演了。即这一过程模型符合了真实的流程。为了更好的对这一模型进行评价，给出了这一模型与现有方法在拟合度、精确度、泛化度和时间上的比较，其指标如表2所示。For the repaired process model, it is not difficult to find that its event log can be completely replayed by it. That is, this process model conforms to the real process. In order to better evaluate this model, the comparison between this model and existing methods in terms of fitting degree, accuracy, generalization degree and time is given, and its indicators are shown in Table 2.

表2过程模型的一致性指标Table 2 Consistency indicators of the process model

拟合度Fit 精确度Accuracy 泛化度Generalization 时间(s)time(s) 原模型original model 0.85660.8566 0.75110.7511 0.90210.9021 现有的方法existing methods 11 0.81880.8188 0.976020.97602 6.56.5 本发明方法The method of the invention 11 0.85020.8502 0.988020.98802 6.46.4

根据表2可以发现，本发明方法和现有方法在拟合度上都为1，说明事件日志能够完全在过程模型中重演。在精确度和泛化度方面，本发明的方法较现有方法有所提高。而对于时间复杂度而言，由于两个方法使用的同一种挖掘方法，因而两种方法在时间上差距很小。According to Table 2, it can be found that the fitting degree of the method of the present invention and the existing method is 1, indicating that the event log can be completely reproduced in the process model. In terms of accuracy and generalization, the method of the invention is improved compared with the existing methods. As for the time complexity, since the two methods use the same mining method, the time difference between the two methods is very small.

当然，以上说明仅仅为本发明的较佳实施例，本发明并不限于列举上述实施例，应当说明的是，任何熟悉本领域的技术人员在本说明书的教导下，所做出的所有等同替代、明显变形形式，均落在本说明书的实质范围之内，理应受到本发明的保护。Of course, the above descriptions are only preferred embodiments of the present invention, and the present invention is not limited to the above-mentioned embodiments. It should be noted that all equivalent substitutions made by any person skilled in the art under the teaching of this specification , obvious deformation forms, all fall within the essential scope of this specification, and should be protected by the present invention.

Claims

1. A process model repairing method based on reachable identification is characterized by comprising the following steps:

a parameter definition

Defining traces and event logs

Is provided withIs an active set, trace σ ∈ A is an active queue, and event log L is a multiple set of traces, i.e.Defining a Petri Net

The quadruple PN (P, T; F, M) is called a Petri net PN if and only if:

(1) n-is a net, where P is a finite set of libraries, T is a finite set of transitions,is a finite arc set;

(2) p → N is called an identifier of the net PN, where M_iFor initial identification, m_fIs a termination identifier;

(3) PN has the following transition occurrence rules:

(a) for transition T ∈ T, ifThen the transition t is called to be enabled under the identifier M and is marked as M [ t ]>；

(b) If M [ t ]>Then, under the mark M, transition t can occur, and a new mark M' is obtained by triggering transition t from the mark M and is marked as M [ t ]>M' and to

In the definition of the Petri net,^·t represents an input set of t, t^·An output set representing t, M (p) represents the identity at repository p, M' (p) represents the identity at repository p after the transition t is initiated;

defining calibration

γ∈(A^>>×T^>>) Is the alignment between trace σ and model PN, if and only if:

(1)π₁(γ)_↓Aσ, i.e. the sequence of movements in the trace produces this trace;

(2)that is, the moving sequence in the model generates a completely occurring sequence;

wherein,_σ,PNis the set of all calibrations between trace sigma and model PN, pi₁(γ)_↓ADenotes the first element in the alignment, π₂(γ)_↓TRepresents the second element in the calibration;

there are three types of actions in calibration: a synchronization action, a log action, and a model action; wherein,

synchronous action refers to action that occurs in both the trace and the model, log action refers to action that occurs in the log but does not occur in the model, and model action refers to action that occurs in the model but does not occur in the log;

defining optimal calibration

Function lc: A^>>×T^>>→ IR is the possible cost function of calibration, calibration γ ∈ between trace σ and model PN_σ,PNIs an optimal calibration and only if the gamma' ∈ is arbitrarily calibrated_σ,PN:∑_(a,t)∈γlc((a,t))≤∑_{(a’,t’)∈γ’}lc((a’,t’))；

Wherein,_σ,PN,lcis the optimal calibration set between trace σ and model PN for a probability cost function of lc, ∑_(a,t)∈γlc ((a, t)) represents the cost value of the possibility of calibrating γ, ∑_{(a’,t’)∈γ’}lc ((a ', t ')) represents the cost value of the possibility to calibrate γ ';

b extended calibration

Defining extended mobile queues

β∈(A^>>×T^>>× M) is the extended move queue between trace σ and model PN, if and only if:

(1)π₁(β)_↓Asigma is less than or equal to, namely the movement of the extended mobile queue in the trace is the prefix of the trace sigma;

(2) there is a fully occurring sequenceIs provided withNamely, the movement of the extended movement queue in the model is the prefix of the completely-occurring sequence;

wherein,indicating a completely occurring sequence, T indicates a limited queue under the transition set T;

(3)m_i[π₂(β)_↓T>m, i.e. the movement of the extended movement queue in the model m ∈ R (m)_i)；

For a triplet (a, t, m) e β in the extended move queue, there are the following extended actions, respectively:

(1) if a belongs to A and t is > >, the operation is called an extended log operation;

(2) if a > and T belongs to T, the method is called an extended model action;

(3) if a belongs to A and T belongs to T, the method is called as an extended synchronization action;

(4) the other type is called illegal extension action;

the definition of extended alignment is further given by the definition of extended mobility queue:

defining extended calibration

Spread calibration β is the spread move queue between trace σ and model PN if and only if:

(1)π₁(β)_↓Aσ, that is, the movement of the extended move queue in the trace is the trace σ;

(2)namely, the movement of the extended movement queue in the model is a completely occurring sequence;

(3)m_i[π₂(β)_↓T>m_fi.e. after the extended mobile queue moves in the model, the termination flag is finally reached,

wherein, B_σ,PNIs the set of all spread calibrations between trace σ and model PN; wherein, pi₁(β)_↓ARepresenting the mapping of the first element in the alignment on the active set A, π₂(β)_↓TRepresenting a mapping of a second element in the calibration in the transition set T;

defining optimal extended calibration

Spread calibration β∈ B between trace σ and model PN_σ,PNIs an optimal spread calibration and only if any spread calibration β' ∈ B_σ,PN:∑_(a,t,m)∈βlc((a,t))≤∑_{(a’,t’，m’)∈β’}lc((a’,t’))；

Wherein, B_σ,PN,lcThe optimal extended calibration set between the trace sigma and the model PN under the condition that the probability cost function is lc;

c bias repair

Defining extended event logs and extended trace sets

For event log L, the fitness (L, PN) is 1, event log L' is an extended event log if and only if:

(1)L’∩L＝L；

(2)

(3)0<fitness(L’,PN)<1；

for a trace which newly appears after the extension, a set PA (PA) is L' -L and is called an extension trace set;

wherein, the fitness (L, PN) represents the fitting degree of the event log L under the model PN, and the fitness (L ', PN) represents the fitting degree of the event log L' under the model PN;

c.1 extended Log action repair

Defining a set of log action biases

Set the optimal extended calibration set B_L,PN,lcIs an optimal extended calibration set based on event logs L and Petri nets PN, given a repository p, MLS is a set of log action deviations, if and only if:

M L S = {m l s | &ForAll; m l s [i] &Element; m l s, m_{i} (p) = 1 {andπ}_{2} (m l s [i]) = > >};

collecting the log action of the designated library, which comprises the following steps:

step1 input the optimal extended calibration set B_L,PN,lcAnd a designated library place p, traversing the whole optimal extended calibration set B_L,PN,lc；

Traversing calibrations β in the optimal extended calibration set_iAnd executing the following operations in the traversal process:

1) if present β_i[j..k]∈β_iTo aHas m_l(p)＝1，andπ₂(β_i[l])＝>>Then L ← L ∪ gamma_i[j..k]；

Wherein, β_i[j..k]Representation β_iβ, j and k are used for determining the range_i[l]Representation β_iAn element of (1), m_l(p) represents the symbol m_lThe value at bin p;

2) if no spreading sequence according to 1) exists, jumping to Step 1;

step 2: traversing the optimal extended calibration set B_L,PN,lcCalibrating all the devices, and outputting a log action deviation set MLS;

the method for repairing the log action of the expansion log comprises the following specific steps:

step1: inputting a log action deviation set MLS and a specified library p, mining the log action deviation set MLS by using an InduceMiner algorithm, and setting PN' ← InduceMiner (MLS);

step 2: for all of m_i' transition t of the latter set, performing the following operations:

1): adding arcs (p, t) into an arc set F of the original model PN;

2): arc (m)_i', t) are deleted from the arc set F ' of the mining model PN ';

step 3: for all of m_f' transition t of the previous set, performing the following operations:

1): adding arcs (t, p) into an arc set F of the original model PN;

2): will arc (t, m)_f') is deleted from the arc set F ' of the mining model PN ';

step 4: deleting an initial repository m in a repository set P' of a mining model PN_i' and terminating library m_f'and adding all elements of the library set P' into the library set P of the original model PN;

step 5: adding all elements of the library set T' into the library set T of the original model PN;

step 6: adding all elements of the library set F' into the library set F of the original model PN;

step 7: outputting a model PN;

c.2, repairing the model action of the expansion log, and specifically comprising the following steps:

step1: inputting a calibration beta and a position i where model action occurs, and adding an invisible transition tau in a transition set of an original model PN;

step 2: for all belonging to pi₂(β[i]) The library of the previous set p, performing the following operations:

1): adding the place P into a place P of the original model PN;

2): adding the arcs (p, tau) into an arc set F of the original model PN;

step 3: for all belonging to pi₂(β[i]) The library of the latter set p, the following operations are performed:

1): adding the place P into a place P of the original model PN;

2): adding arcs (tau, p) into an arc set F of the original model PN;

step 4: and outputting the model PN.