CN118228255A

CN118228255A - A method, device and equipment for detecting risk of application program

Info

Publication number: CN118228255A
Application number: CN202410257025.3A
Authority: CN
Inventors: 周璟; 但家旺; 田胜; 刘云飞; 王宝坤; 孟昌华; 王维强
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Priority date: 2024-03-06
Filing date: 2024-03-06
Publication date: 2024-06-21

Abstract

The embodiments of the present specification disclose a method, apparatus and device for detecting risks of an application program, the method comprising: obtaining target data for detecting whether a target application program has a preset operational risk, the target data comprising at least access log data of a user before the user uses the target application program; determining the user's behavior sequence data based on the access log data, and determining sequence diagram structure data based on the access log data and the behavior sequence data; encoding the behavior sequence data through a sequence encoding sub-model in a pre-trained encoder to obtain a sequence representation corresponding to the behavior sequence data; inputting the sequence diagram structure data into a sequence diagram encoding sub-model in the encoder to encode the sequence diagram structure data to obtain a sequence diagram structure representation corresponding to the sequence diagram structure data; and determining whether the target application program has a preset operational risk based on the sequence representation and the sequence diagram structure representation.

Description

Application risk detection method, device and equipment

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a risk detection method, apparatus, and device for an application program.

Background

Applets play a vital role in the digitalization of industry services. A host program often has a huge user base, and applets guide those users to the digital services of different industries, which brings more potential users to each industry and promotes service development.

Meanwhile, the merchants behind an applet may be exposed to a series of operational risks (such as false marketing, privacy data disclosure, illegal deduction, illegal lending, and performance problems). These risks may create legal and regulatory risks for the merchants, and may also cause economic loss, privacy data disclosure, and similar harm to users. In addition, a merchant may fail to fulfill its obligations under the contract (for example, it may fail to provide products or services on time, or fail to guarantee their quality), which may lead to user complaints and refund requests and damage the merchant's reputation and credit. It is therefore particularly important to understand the digitalized industry entities that applets represent. For this reason, a better risk detection and governance scheme for application programs is needed to improve the understanding of new digitalized industry entities.

Disclosure of Invention

An aim of the embodiments of the present specification is to provide a better risk detection and governance scheme for application programs, so as to improve the understanding of new digitalized industry entities.

To achieve the above aim, the embodiments of the present specification are implemented as follows:

An embodiment of the present specification provides a risk detection method for an application program, the method comprising: acquiring target data for detecting whether a target application program has a preset operational risk, wherein the target data comprises at least access log data of a user before the user uses the target application program; determining, based on the access log data, behavior sequence data of the user before the user uses the target application program, and determining, based on the access log data and the behavior sequence data, sequence diagram structure data constructed from the different data nodes contained in the behavior sequence data; encoding, based on the behavior sequence data, the relative positions of the different data nodes contained in the behavior sequence data through a sequence encoding sub-model in a pre-trained encoder, and passing the hidden state corresponding to the data at the current processing time point in the behavior sequence data to the next time point through a recurrence strategy in the sequence encoding sub-model, so as to encode the behavior sequence data and obtain a sequence representation corresponding to the behavior sequence data; inputting the sequence diagram structure data into a sequence diagram encoding sub-model in the encoder, and determining, through an attention module in the sequence diagram encoding sub-model, the association relationships between entities of different types and between entities of the same type in the sequence diagram structure data, so as to encode the sequence diagram structure data and obtain a sequence diagram structure representation corresponding to the sequence diagram structure data; and determining, based on the sequence representation and the sequence diagram structure representation, whether the target application program has the preset operational risk.

An embodiment of the present specification provides a risk detection apparatus for an application program, the apparatus comprising: a data acquisition module configured to acquire target data for detecting whether a target application program has a preset operational risk, wherein the target data comprises at least access log data of a user before the user uses the target application program; a data processing module configured to determine, based on the access log data, behavior sequence data of the user before the user uses the target application program, and to determine, based on the access log data and the behavior sequence data, sequence diagram structure data constructed from the different data nodes contained in the behavior sequence data; a sequence encoding module configured to encode, based on the behavior sequence data, the relative positions of the different data nodes contained in the behavior sequence data through a sequence encoding sub-model in a pre-trained encoder, and to pass the hidden state corresponding to the data at the current processing time point in the behavior sequence data to the next time point through a recurrence strategy in the sequence encoding sub-model, so as to encode the behavior sequence data and obtain a sequence representation corresponding to the behavior sequence data; a diagram encoding module configured to input the sequence diagram structure data into a sequence diagram encoding sub-model in the encoder, and to determine, through an attention module in the sequence diagram encoding sub-model, the association relationships between entities of different types and between entities of the same type in the sequence diagram structure data, so as to encode the sequence diagram structure data and obtain a sequence diagram structure representation corresponding to the sequence diagram structure data; and a risk determining module configured to determine, based on the sequence representation and the sequence diagram structure representation, whether the target application program has the preset operational risk.

An embodiment of the present specification provides a risk detection device for an application program, the device comprising: a processor; and a memory arranged to store computer-executable instructions that, when executed, cause the processor to: acquire target data for detecting whether a target application program has a preset operational risk, wherein the target data comprises at least access log data of a user before the user uses the target application program; determine, based on the access log data, behavior sequence data of the user before the user uses the target application program, and determine, based on the access log data and the behavior sequence data, sequence diagram structure data constructed from the different data nodes contained in the behavior sequence data; encode, based on the behavior sequence data, the relative positions of the different data nodes contained in the behavior sequence data through a sequence encoding sub-model in a pre-trained encoder, and pass the hidden state corresponding to the data at the current processing time point in the behavior sequence data to the next time point through a recurrence strategy in the sequence encoding sub-model, so as to encode the behavior sequence data and obtain a sequence representation corresponding to the behavior sequence data; input the sequence diagram structure data into a sequence diagram encoding sub-model in the encoder, and determine, through an attention module in the sequence diagram encoding sub-model, the association relationships between entities of different types and between entities of the same type in the sequence diagram structure data, so as to encode the sequence diagram structure data and obtain a sequence diagram structure representation corresponding to the sequence diagram structure data; and determine, based on the sequence representation and the sequence diagram structure representation, whether the target application program has the preset operational risk.

An embodiment of the present specification further provides a storage medium for storing computer-executable instructions that, when executed by a processor, implement the following: acquiring target data for detecting whether a target application program has a preset operational risk, wherein the target data comprises at least access log data of a user before the user uses the target application program; determining, based on the access log data, behavior sequence data of the user before the user uses the target application program, and determining, based on the access log data and the behavior sequence data, sequence diagram structure data constructed from the different data nodes contained in the behavior sequence data; encoding, based on the behavior sequence data, the relative positions of the different data nodes contained in the behavior sequence data through a sequence encoding sub-model in a pre-trained encoder, and passing the hidden state corresponding to the data at the current processing time point in the behavior sequence data to the next time point through a recurrence strategy in the sequence encoding sub-model, so as to encode the behavior sequence data and obtain a sequence representation corresponding to the behavior sequence data; inputting the sequence diagram structure data into a sequence diagram encoding sub-model in the encoder, and determining, through an attention module in the sequence diagram encoding sub-model, the association relationships between entities of different types and between entities of the same type in the sequence diagram structure data, so as to encode the sequence diagram structure data and obtain a sequence diagram structure representation corresponding to the sequence diagram structure data; and determining, based on the sequence representation and the sequence diagram structure representation, whether the target application program has the preset operational risk.

Drawings

To describe the embodiments of the present specification or the technical solutions of the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some of the embodiments of the present specification, and a person skilled in the art may obtain other drawings from them without inventive effort.

FIG. 1 is a diagram illustrating an embodiment of a risk detection method for an application program according to the present disclosure;

FIG. 2 is a schematic diagram illustrating another embodiment of a risk detection method for an application program according to the present disclosure;

FIG. 3 is a schematic diagram of a media map structure of the present disclosure;

FIG. 4 is a schematic diagram of another media diagram configuration of the present disclosure;

FIG. 5 is a schematic diagram illustrating another embodiment of a risk detection method for an application program according to the present disclosure;

FIG. 6 is a diagram illustrating another embodiment of a risk detection method for an application program according to the present disclosure;

FIG. 7 is a diagram illustrating another exemplary risk detection method for an application according to the present disclosure;

FIG. 8 is a diagram illustrating an exemplary embodiment of a risk detection apparatus for an application program according to the present disclosure;

FIG. 9 is a schematic diagram illustrating an embodiment of a risk detection apparatus for an application program according to the present disclosure.

Detailed Description

The embodiment of the specification provides a risk detection method, device and equipment for an application program.

In order to make the technical solutions in the present specification better understood by those skilled in the art, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only some embodiments of the present specification, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.

An embodiment of the present specification provides a risk detection mechanism for an application program. An applet is a lightweight application program provided to users on the basis of the development framework and tools of an open platform, and developers can develop applets for free within the ecosystem of that open platform. Unlike an ordinary application program, an applet does not need to be downloaded and installed: a user can search for, open, and use it directly inside the host program. Applets start quickly, occupy little storage space, and offer a high level of security. They cover many industries, including retail, catering, travel, and life services, and provide users with diversified digital service scenarios such as online shopping, food ordering, ride hailing, and wealth management. They also give merchants a diversified marketing channel: merchants can interact and trade with users directly through the applet platform, which improves transaction efficiency and reduces marketing cost.

Applets play a vital role in the digitalization of industry services. A host program often has a huge user base, and applets guide those users to the digital services of different industries, which brings more potential users to each industry and promotes service development. Because an applet is part of the host program, a user can enter it directly from the host program without downloading an additional application program, which greatly improves convenience and user experience. An applet is tightly integrated with the host program system, so it can offer the full functions of the host program (such as the host program's payment function), which gives the digital services of each industry a convenient solution and helps the corresponding services run smoothly. Applets cover many industries, including retail, catering, travel, and life services, provide users with diversified digital service scenarios, and meet users' diverse needs. Applets also have mature development tools and a developer ecosystem; developers can develop and publish through the applet platform to reach more users and opportunities, which promotes the rapid growth of applets.

Meanwhile, the merchants behind an applet may be exposed to a series of operational risks (such as false marketing, disclosure of user privacy, illegal deduction, illegal lending, and performance problems). These risks may create legal and regulatory risks for the merchants, and may also cause economic loss, privacy data disclosure, and similar harm to users. In addition, a merchant may fail to fulfill its obligations under the contract (for example, it may fail to provide products or services on time, or fail to guarantee their quality), which may lead to user complaints and refund requests and damage the merchant's reputation and credit. It is therefore particularly important to understand the digitalized industry entities that applets represent.

In the task of understanding digitalized industry entities, several categories of algorithms are available when both graph structure data and sequence information are to be considered. Graph convolutional neural network (GCN, Graph Convolutional Neural Networks): a GCN is a neural network for processing graph structure data. In each network layer it learns node representations by aggregating the information of each node's neighbors, and the learned node representations are then used for classification or regression tasks. A GCN can jointly model the node features and edge relationships in the graph structure, and performs information propagation and feature learning through multiple convolution layers. Graph attention network (GAT, Graph Attention Networks): a GAT is a graph neural network that uses an attention mechanism to dynamically compute the weights between nodes. By considering the relationship between each node and its neighbors, it assigns different attention weights to different nodes, so that important nodes and relationships receive more focus, and these weights are applied when node representations are aggregated and updated. Graph-RNN (Graph Neural Networks with Recurrent Neural Networks): a Graph-RNN combines a graph neural network with a recurrent neural network (RNN, Recurrent Neural Networks) to process the sequence information in graph structure data. The nodes in the graph structure data are encoded in sequence order, and the RNN then models the sequence information, so that the temporal relationships among the nodes can be captured and used for prediction or classification in downstream tasks. Graph autoencoder (Graph Autoencoders): a graph autoencoder is a neural network trained by unsupervised learning to learn low-dimensional embedded representations of graph structure data. The encoder maps the nodes in the graph structure data to low-dimensional vectors, and the decoder reconstructs the original graph structure data from those vectors. A graph autoencoder can be used for tasks such as feature learning, node classification, and link prediction on graph structure data.
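The neighbor aggregation that a GCN layer performs can be illustrated with a minimal sketch. It assumes a simplified variant that averages each node's features with those of its neighbors (a self-loop is included), applies a linear map, and then a ReLU; standard GCNs use a degree-normalized adjacency matrix instead of a plain mean. The function name `gcn_layer` and the toy graph are hypothetical and not part of the described embodiments.

```python
def gcn_layer(features, adjacency, weight):
    """One simplified graph-convolution layer (illustrative assumption):
    mean-aggregate each node with its neighbours, apply a linear map, ReLU."""
    n = len(features)
    in_dim = len(features[0])
    out_dim = len(weight[0])
    output = []
    for i in range(n):
        # Neighbourhood of node i, with a self-loop so the node keeps its own features.
        neigh = sorted({i} | {j for j in range(n) if adjacency[i][j]})
        agg = [sum(features[j][d] for j in neigh) / len(neigh) for d in range(in_dim)]
        # Linear transform followed by ReLU.
        row = [max(0.0, sum(agg[d] * weight[d][k] for d in range(in_dim)))
               for k in range(out_dim)]
        output.append(row)
    return output


# Toy path graph 0 - 1 - 2 with two-dimensional node features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out = gcn_layer(features, adjacency, identity)
print(out[0])  # [0.5, 0.5]
```

Stacking several such layers lets information travel more than one hop, which is the multi-layer propagation the paragraph above refers to.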

However, in practical applications, graph structure data and sequence information are often sparse, that is, there are few connections between nodes. Such sparsity can make it difficult for a model to capture global relationships and patterns, which degrades model performance. Moreover, the related art often cannot take into account, at the same time, the information of an entity itself, the information of the page jump sequence, the information of the nodes in the jump sequence, and the association information between the entity and other entities. As a result, it cannot consider both the sequence information and the information contained in the graph structure data.

Therefore, the embodiments of the present specification explore a risk governance scheme for application programs and provide a new solution for understanding digitalized industry entities. With a modeling approach based on graph neural networks and sequence information, the scheme addresses the fusion of graph structure data with sequence information, mitigates data sparsity, and takes into account both the sequence information and the information contained in the graph structure data. The specific processing is described in the following embodiments.

As shown in fig. 1, the embodiment of the present disclosure provides a risk detection method for an application program, where an execution subject of the method may be a terminal device or a server, where the terminal device may be a mobile terminal device such as a mobile phone, a tablet computer, or a computer device such as a notebook computer or a desktop computer, or may also be an IoT device (specifically, such as a smart watch, an in-vehicle device, or the like), and where the server may be a separate server, or may be a server cluster formed by a plurality of servers, and the server may be a background server such as a financial service or an online shopping service, or may be a background server of a certain application program. In this embodiment, the execution subject is taken as a server for example for detailed description, and for the case that the execution subject is a terminal device, the following processing of the case of the server may be referred to, and will not be described herein. The method specifically comprises the following steps:

In step S102, target data for detecting whether the target application program has a preset operational risk is acquired, where the target data at least includes access log data of the user before the user uses the target application program.

The target application may be any application program, for example, the target application may be an application program for performing a resource transaction, an instant messaging application, a shopping application, or the like, and the target application may be an applet installed in a host program, for example, an applet installed in the instant messaging application, or an applet installed in an application program for performing a resource transaction, which may be specifically set according to the actual situation, and the embodiment of the present disclosure is not limited thereto. The operational risk may include various kinds, for example, the operational risk may include false marketing, illegal deduction, illegal lending, and performance problems, etc., which may be specifically set according to actual situations, and the embodiment of the present specification is not limited thereto. The user may be any user that needs to access the target application. The access log data may be data in a log for recording access of the user to different pages or data, and the access log data may be recorded with network address data of the page accessed by the user, a name of the page, a page type, page content, and the like, and may be specifically set according to actual situations, which is not limited in the embodiment of the present specification.

In implementation, a user often goes through a series of page jump operations before using an application program (i.e., the target application program). Understanding the target application program therefore requires information at several levels, such as information about the page jump behavior sequence that leads to the target application program and information about individual pages (i.e., data nodes) in that sequence. The page jump behavior sequence arises because the user goes through multiple page jumps on the way to the target application program; it can be obtained from the user's recorded access log data and reveals the behavior patterns and habits with which users reach and use the target application program. The information about an individual page in the sequence can likewise be obtained from the recorded access log data for each data node (including the name of the page, the page type, the page content, and so on), and reveals whether the user's preceding sequence contains visits to high-risk pages. To acquire this information, the access log data of the user before the user starts the target application program may be looked up in the user's pre-recorded access log data. In addition, related information about the target application program (such as its name, category, and developer information) may be acquired. The acquired data may then be used as the target data for detecting whether the target application program has a preset operational risk.
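The lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiments: the record fields, the `LogRecord` class, the `collect_target_data` function, the integer timestamps, and the example URLs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    user_id: str
    timestamp: int  # assumed unit: seconds since some reference time
    url: str        # network address of the accessed page
    page_name: str
    page_type: str


def collect_target_data(access_log, user_id, launch_time, app_info):
    """Return the user's log records from before the target application was
    launched, oldest first, together with descriptive application info."""
    prior = [r for r in access_log
             if r.user_id == user_id and r.timestamp < launch_time]
    prior.sort(key=lambda r: r.timestamp)
    return {"app_info": app_info, "access_log": prior}


log = [
    LogRecord("u1", 30, "https://example.com/pay", "pay", "payment"),
    LogRecord("u1", 10, "https://example.com/home", "home", "portal"),
    LogRecord("u2", 15, "https://example.com/home", "home", "portal"),
    LogRecord("u1", 20, "https://example.com/search", "search", "search"),
]
target_data = collect_target_data(
    log, "u1", launch_time=25,
    app_info={"name": "demo_applet", "category": "retail"},
)
print([r.page_name for r in target_data["access_log"]])  # ['home', 'search']
```

Only records of the given user that precede the launch time are kept, which mirrors the requirement that the target data contain the access log data from before the user uses the target application program.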

In step S104, based on the above access log data, behavior sequence data of the user before the user uses the target application program is determined, and based on the access log data and the behavior sequence data, sequence diagram structure data constructed based on different data nodes included in the behavior sequence data is determined.

In implementation, the user may go through multiple page jumps before the target application program is started, and these user behaviors are recorded in the access log data. The access log data can therefore be analyzed, the user behaviors corresponding to the page jumps can be arranged in chronological order to construct a behavior sequence, and this sequence can be used as the behavior sequence data of the user before the user uses the target application program. In addition, for each data node (including the name of the page, the page type, the page content, and so on) in the page jump behavior sequence of the target application program, the access log data and the behavior sequence data may be analyzed. The different data nodes contained in the behavior sequence data are taken as entities, edges are constructed to express the association relationships between the different data nodes, and graph structure data is constructed from the data nodes and the corresponding edges. This graph structure data may be used as the sequence diagram structure data constructed from the different data nodes contained in the behavior sequence data.
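One possible way to derive both structures from access log data is sketched below. It assumes that each log entry is a dictionary with hypothetical `page` and `time` keys, that consecutive page jumps become directed edges, and that the edge weight counts how often a jump occurs; the embodiments do not prescribe these choices.

```python
def build_behavior_sequence(records):
    """Order page visits by time; each visited page is one data node."""
    return [r["page"] for r in sorted(records, key=lambda r: r["time"])]


def build_sequence_graph(sequence, page_types):
    """Build graph structure data from a behavior sequence.

    Nodes are the distinct pages (entities) with their types; a directed
    edge (src, dst) is added for every consecutive jump, weighted by count.
    """
    nodes = {}
    for page in sequence:
        nodes.setdefault(page, page_types.get(page, "unknown"))
    edges = {}
    for src, dst in zip(sequence, sequence[1:]):
        edges[(src, dst)] = edges.get((src, dst), 0) + 1
    return nodes, edges


records = [
    {"page": "search", "time": 2},
    {"page": "home", "time": 1},
    {"page": "home", "time": 3},
    {"page": "applet_entry", "time": 5},
    {"page": "search", "time": 4},
]
page_types = {"home": "portal", "search": "search", "applet_entry": "applet"}

sequence = build_behavior_sequence(records)
nodes, edges = build_sequence_graph(sequence, page_types)
print(sequence)  # ['home', 'search', 'home', 'search', 'applet_entry']
print(edges[("home", "search")])  # 2
```

Keeping the ordered sequence and the graph as separate outputs matches the two inputs used later: the sequence feeds the sequence encoding sub-model, and the graph feeds the sequence diagram encoding sub-model.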

In step S106, based on the behavior sequence data, the relative positions of the different data nodes contained in the behavior sequence data are encoded through a sequence encoding sub-model in a pre-trained encoder, and the hidden state corresponding to the data at the current processing time point in the behavior sequence data is passed to the next time point through a recurrence strategy in the sequence encoding sub-model, so as to encode the behavior sequence data and obtain a sequence representation corresponding to the behavior sequence data.

The encoder may be a model for encoding data of different modalities (such as serialized data and non-serialized data) to obtain the corresponding representations. The encoder may be constructed with a variety of algorithms or networks; for example, it may be constructed with one or more neural networks, or with one or more feature extraction algorithms. This may be set according to the actual situation, and the embodiments of the present specification are not limited thereto. The sequence encoding sub-model may be a sub-model for encoding or characterizing serialized data. It can also encode the relative positions of the different data nodes contained in the behavior sequence data, and pass the hidden state corresponding to the data at the current processing time point in the behavior sequence data to the next time point through its recurrence strategy. There may be one or more sequence encoding sub-models, each of which may be set for a different type of serialized data. A sequence encoding sub-model encodes the specified serialized data to obtain the corresponding encoded information, which can be used as the representation of the serialized data; the representation expresses the specified data in a more concise form. The sequence encoding sub-model may be constructed with a variety of algorithms or networks; for example, the serialized data may be mapped into a matrix or vector by a preset mapping algorithm, or the sequence encoding sub-model may be constructed with a neural network. The sequence representation may be a numerical value, a character string, a matrix, or a vector. All of these may be set according to the actual situation, and the embodiments of the present specification are not limited thereto.

In implementation, an encoder may be preset in order to encode data of different modalities (such as serialized data and graph structure data) and obtain the corresponding representations. To encode or characterize serialized data, a sequence encoding sub-model may be preset in the encoder: a suitable algorithm is obtained and the architecture of the sequence encoding sub-model is constructed from it. The input of the sequence encoding sub-model may be specified serialized data and its output may be a representation of that data. For example, the sequence encoding sub-model may include an encoding sub-model built on a neural network with a self-attention mechanism. Training samples can then be obtained and used to perform supervised training of the sequence encoding sub-model, yielding the trained sequence encoding sub-model.

The sequence encoding sub-model in the pre-trained encoder can then encode the relative positions of the different data nodes contained in the behavior sequence data. At the same time, its recurrence strategy passes the hidden state corresponding to the data at the current processing time point in the behavior sequence data to the next time point. In this way the behavior sequence data is encoded and the sequence representation corresponding to the behavior sequence data is obtained.
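The combination of relative position encoding and hidden-state passing can be illustrated with a deliberately small sketch. It is not the sub-model of the embodiments: it assumes a plain recurrent cell with randomly initialized, untrained weights, and it assumes that the relative position of a data node is its distance from the last step of the sequence. The class name `TinySequenceEncoder` and all parameters are hypothetical.

```python
import math
import random


def make_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class TinySequenceEncoder:
    """Toy recurrent encoder with relative position embeddings (untrained)."""

    def __init__(self, vocab, dim=4, max_offset=16, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.max_offset = max_offset
        # sorted() keeps the initialisation deterministic for a given seed.
        self.node_emb = {page: [rng.uniform(-0.5, 0.5) for _ in range(dim)]
                         for page in sorted(vocab)}
        self.pos_emb = make_matrix(max_offset + 1, dim, rng)
        self.w_in = make_matrix(dim, dim, rng)
        self.w_hidden = make_matrix(dim, dim, rng)

    def encode(self, sequence):
        hidden = [0.0] * self.dim
        last = len(sequence) - 1
        for t, page in enumerate(sequence):
            # Relative position: distance from this node to the final step.
            offset = min(last - t, self.max_offset)
            x = [a + b for a, b in zip(self.node_emb[page], self.pos_emb[offset])]
            # Recurrence: the hidden state of step t is passed on to step t + 1.
            pre = [u + v for u, v in zip(matvec(self.w_in, x),
                                         matvec(self.w_hidden, hidden))]
            hidden = [math.tanh(p) for p in pre]
        return hidden  # sequence representation


encoder = TinySequenceEncoder({"home", "search", "applet_entry"})
representation = encoder.encode(["home", "search", "applet_entry"])
print(len(representation))  # 4
```

Because the hidden state carries forward from one time point to the next, the final vector depends on the order of the visited pages and not only on which pages were visited.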

In step S108, the sequence diagram structure data is input into a sequence diagram encoding sub-model in the encoder, and the association relationships between entities of different types and between entities of the same type in the sequence diagram structure data are determined through an attention module in the sequence diagram encoding sub-model, so as to encode the sequence diagram structure data and obtain a sequence diagram structure representation corresponding to the sequence diagram structure data.

The sequence diagram coding sub-model may be a sub-model for coding or obtaining the diagram structure data or the sequence diagram structure data as described above, and the sequence diagram coding sub-model may include an attention module, and may determine association relations between different types of entities in the sequence diagram structure data and association relations between entities of the same type through the attention module. The sequence diagram coding sub-model may include one or more sequence diagram coding sub-models, which may be set for the diagram structure data, and may encode the diagram structure data or the sequence diagram structure data as described above to obtain coding information corresponding to the data, and may use the coding information as a sequence diagram structural representation corresponding to the sequence diagram structure data. The sequence diagram coding submodel may be constructed by a plurality of different algorithms or networks, for example, the diagram structure data or the sequence diagram structure data as described above may be mapped into a matrix or vector by a preset mapping algorithm, or the sequence diagram coding submodel may be constructed by a neural network, etc., and may be specifically set according to practical situations. The structural representation of the sequence chart can be a numerical value, a string of characters, a matrix or a vector, and can be specifically set according to practical situations, and the embodiment of the specification is not limited to the numerical representation.

In an implementation, in order to encode or characterize graph structure data such as the sequence diagram structure data described above, a sequence diagram coding sub-model may be preset in the encoder. A corresponding algorithm may be obtained, and the architecture of the sequence diagram coding sub-model may be constructed based on that algorithm. The input data of the sequence diagram coding sub-model may be graph structure data, and the output data may be a data representation. For example, the sequence diagram coding sub-model may be a neural-network coding sub-model that includes an attention module for determining the association relationships between different types of entities and between entities of the same type in the sequence diagram structure data. Training samples may then be obtained and used to perform supervised training on the sequence diagram coding sub-model to obtain a trained sequence diagram coding sub-model.

The sequence diagram structure data may be input into the sequence diagram coding sub-model in the encoder, and the attention module in the sequence diagram coding sub-model determines the association relationships between different types of entities and between entities of the same type in the sequence diagram structure data. In this way, the sequence diagram structure data is encoded and the corresponding sequence diagram structure representation is obtained. The sequence diagram coding sub-model can thus capture relationships between different types of entities, such as connections between data nodes of different types, as well as relationships between entities of the same type, such as connections between data nodes of the same type.

In step S110, it is determined whether the target application program has a preset operational risk based on the sequence representation and the sequence diagram structural representation.

In implementation, the sequence representation and the sequence diagram structure representation may be fused to obtain a fusion feature, and classification prediction of the operational risk may be performed on the target application program based on the fusion feature. Whether the target application program has the preset operational risk may be determined from the prediction result: if it does, the target application program may be refused online operation; if it does not, the target application program may be allowed to run online.
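
The fusion and classification step described above can be illustrated with a minimal sketch. It assumes fusion by direct concatenation and uses a logistic classifier as a stand-in for the classification prediction; the function names, the weights, the bias, and the 0.5 threshold are hypothetical and are not specified in this embodiment.

```python
import math

def fuse_by_concatenation(*representations):
    """Concatenate representation vectors into a single fusion feature."""
    fused = []
    for rep in representations:
        fused.extend(rep)
    return fused

def predict_risk_probability(fusion_feature, weights, bias):
    """Hypothetical logistic classifier over the fusion feature."""
    if len(weights) != len(fusion_feature):
        raise ValueError("weights and fusion feature must have equal length")
    logit = sum(w * x for w, x in zip(weights, fusion_feature)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

def decide_online(fusion_feature, weights, bias, threshold=0.5):
    """Return 'reject' when the predicted risk reaches the threshold, else 'allow'."""
    probability = predict_risk_probability(fusion_feature, weights, bias)
    return "reject" if probability >= threshold else "allow"

# Illustrative values only; real representations come from the encoder.
sequence_repr = [0.2, -0.1]
sequence_diagram_repr = [0.7, 0.4]
fusion = fuse_by_concatenation(sequence_repr, sequence_diagram_repr)
weights = [1.0, 0.5, 2.0, -1.0]  # assumed trained parameters
decision = decide_online(fusion, weights, bias=-0.5)
```

Because the fusion function accepts any number of vectors, the same sketch would apply when further representations are added to the fusion.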

The embodiments of this specification provide a risk detection method for an application program. Target data for detecting whether a target application program has a preset operational risk is acquired, the target data including at least access log data of a user before the user uses the target application program. Based on the access log data, the behavior sequence data of the user before using the target application program is determined, and based on the access log data and the behavior sequence data, sequence diagram structure data constructed from the different data nodes contained in the behavior sequence data is determined. A sequence coding sub-model in a pre-trained encoder then performs relative position coding on the different data nodes contained in the behavior sequence data, and the hidden state corresponding to the data at the currently processed time point in the behavior sequence data is transferred to the next time point through a circulation strategy in the sequence coding sub-model, so that the behavior sequence data is encoded and a sequence representation corresponding to the behavior sequence data is obtained. The sequence diagram structure data is input into a sequence diagram coding sub-model in the encoder, and an attention module in the sequence diagram coding sub-model determines the association relationships between different types of entities and between entities of the same type in the sequence diagram structure data, so that the sequence diagram structure data is encoded and a sequence diagram structure representation corresponding to the sequence diagram structure data is obtained. Whether the target application program has the preset operational risk is then determined based on the sequence representation and the sequence diagram structure representation.

In this way, a relative position coding mechanism is introduced into the sequence coding sub-model in the encoder, and relative position representations are used to encode the relationships between the positions of different data nodes in the sequence data, so that the encoder can better capture the dependencies present in long sequence data and the modeling capability of the model is improved. In addition, a circulation mechanism is introduced into the sequence coding sub-model in the encoder, so that the hidden state of each time step can be transferred to the next time step. The encoder thereby retains a memory of previous time steps when processing sequence data and can better capture long-term dependencies that may exist; through the circulation mechanism, the encoder can better process long sequence data, and its performance and generalization capability are improved. Furthermore, an attention mechanism is introduced into the sequence diagram coding sub-model in the encoder to construct the relationships between different types of entities and between entities of the same type in the sequence diagram structure data, so that the encoder can capture the complex relationships between different types of entities in the heterogeneous graph. More accurate representations and prediction results can thus be obtained, and the efficiency and accuracy of risk detection for the application program are improved.

In practical applications, there may be other application programs related to the target application program. For example, the target application program may be an applet that has association relationships with other applets, and whether the target application program has the preset operational risk may be determined with reference to those other application programs. Specifically, the target data may further include information related to the target application program. This information may include other application programs different from the target application program, information of the merchant corresponding to the target application program, and information of the merchants corresponding to the other application programs. It may further include information of the developer corresponding to the target application program, information of the developers corresponding to the other application programs, identity information authenticated by a merchant (including the merchant corresponding to the target application program and the merchants corresponding to the other application programs), business license information authenticated by the merchant, resource account information bound by the merchant (such as bank card account information or account information of a financial application), and so on. Based on this, as shown in fig. 2, the embodiment may further include the following step S112 and step S114.

In step S112, based on the information related to the target application program, medium diagram structure data constructed from the target application program, the other application programs, the information of the merchant corresponding to the target application program, and the information of the merchants corresponding to the other application programs is determined.

In an implementation, the relationship between different application programs may be constructed through strong medium associations, which may be of multiple types. For example, different application programs may be associated through a merchant: as shown in fig. 3, where the target application program is an applet, two different applets, APPID1 and APPID2, may be associated with each other through information of the merchant. Alternatively, the association between different applets may be constructed through the merchants corresponding to the applets: for example, as shown in fig. 4, there are two different applets, APPID1 and APPID2, the merchant corresponding to applet APPID1 is PID1, the merchant corresponding to applet APPID2 is PID2, and merchant PID1 and merchant PID2 may be associated through a medium such as identity information. In this way, the corresponding medium diagram structure data may be constructed.
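
As a rough illustration of the fig. 4 style association, the following sketch builds a small heterogeneous graph from applets, merchants, and media, and then finds applets linked through a shared medium. The data layout, the edge type names `belongs_to` and `bound_to`, and the medium identifier `ID_CARD_A` are assumptions made for the example, not details given in this embodiment.

```python
from collections import defaultdict

def build_medium_graph(app_to_merchant, merchant_to_media):
    """Build typed nodes and typed edges for a medium graph.

    app_to_merchant: applet id -> merchant id
    merchant_to_media: merchant id -> iterable of medium identifiers
    """
    nodes = {}      # node id -> node type
    edges = set()   # (source id, edge type, target id)
    for app_id, merchant_id in app_to_merchant.items():
        nodes[app_id] = "applet"
        nodes[merchant_id] = "merchant"
        edges.add((app_id, "belongs_to", merchant_id))
    for merchant_id, media in merchant_to_media.items():
        nodes.setdefault(merchant_id, "merchant")
        for medium in media:
            nodes[medium] = "medium"
            edges.add((merchant_id, "bound_to", medium))
    return nodes, edges

def applets_linked_by_medium(edges):
    """Return sorted pairs of applets reachable from one another via a common medium."""
    merchant_apps = defaultdict(set)
    medium_merchants = defaultdict(set)
    for source, edge_type, target in edges:
        if edge_type == "belongs_to":
            merchant_apps[target].add(source)
        elif edge_type == "bound_to":
            medium_merchants[target].add(source)
    pairs = set()
    for merchants in medium_merchants.values():
        apps = set()
        for merchant in merchants:
            apps |= merchant_apps[merchant]
        for a in apps:
            for b in apps:
                if a < b:
                    pairs.add((a, b))
    return sorted(pairs)

# Two applets whose merchants share one piece of identity information.
nodes, edges = build_medium_graph(
    {"APPID1": "PID1", "APPID2": "PID2"},
    {"PID1": ["ID_CARD_A"], "PID2": ["ID_CARD_A"]},
)
linked = applets_linked_by_medium(edges)
```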

In step S114, the medium diagram structure data and the sequence diagram structure representation are input into a medium diagram coding sub-model in the encoder, and an attention module in the medium diagram coding sub-model determines the association relationships between different types of entities and between entities of the same type in the medium diagram structure data, so as to encode the medium diagram structure data and obtain the medium diagram structure representation corresponding to the medium diagram structure data.

The medium diagram coding sub-model may be a sub-model that encodes the medium diagram structure data or obtains a representation of it. The medium diagram coding sub-model may include an attention module, which may be used to determine the association relationships between different types of entities and between entities of the same type in the medium diagram structure data. There may be one or more medium diagram coding sub-models, each set up for graph structure data. A medium diagram coding sub-model may encode graph structure data or the medium diagram structure data described above to obtain corresponding coding information, and the coding information may be used as the medium diagram structure representation corresponding to the medium diagram structure data. The medium diagram coding sub-model may be constructed using a variety of algorithms or networks; for example, the graph structure data or the medium diagram structure data may be mapped into a matrix or vector by a preset mapping algorithm, or the medium diagram coding sub-model may be constructed using a neural network, which may be set according to the actual situation. The medium diagram structure representation may be a numerical value, a character string, a matrix, or a vector, which may be set according to the actual situation and is not limited in the embodiments of this specification.

In an implementation, in order to encode or characterize the medium diagram structure data, a medium diagram coding sub-model may be preset in the encoder. A corresponding algorithm may be obtained, and the architecture of the medium diagram coding sub-model may be constructed based on that algorithm. The input data of the medium diagram coding sub-model may be graph structure data (such as the medium diagram structure data), and the output data may be a data representation. In particular, the medium diagram coding sub-model may be a neural-network coding sub-model that includes an attention module for determining the association relationships between different types of entities and between entities of the same type in the medium diagram structure data. Training samples may then be obtained and used to perform supervised training on the medium diagram coding sub-model to obtain a trained medium diagram coding sub-model; alternatively, the medium diagram coding sub-model may be obtained in the manner described above for the sequence diagram coding sub-model. This may be set according to the actual situation and is not limited in the embodiments of this specification.

The medium diagram structure data and the sequence diagram structure representation may be input into the medium diagram coding sub-model in the encoder, and the attention module in the medium diagram coding sub-model determines the association relationships between different types of entities and between entities of the same type in the medium diagram structure data. In this way, the medium diagram structure data is encoded and the corresponding medium diagram structure representation is obtained. The medium diagram coding sub-model can thus capture relationships between different types of entities, such as connections between data nodes of different types, as well as relationships between entities of the same type, such as connections between data nodes of the same type.

Based on the processing of step S112 and step S114 above, step S110 may be carried out in the following manner, which may specifically include the following: determining whether the target application program has the preset operational risk based on the sequence representation, the sequence diagram structural representation, and the medium diagram structural representation.

In implementation, the sequence representation, the sequence diagram structural representation, and the medium diagram structural representation may be fused (e.g., the sequence representation, the sequence diagram structural representation, and the medium diagram structural representation may be directly spliced to perform fusion processing, or a preset data fusion algorithm may be used to perform fusion processing on the sequence representation, the sequence diagram structural representation, and the medium diagram structural representation), so as to obtain fusion features, a classification prediction of an operational risk may be performed on a target application based on the fusion features, whether the target application has a preset operational risk may be determined according to a prediction result, if the target application has the preset operational risk, the target application may be refused to perform online operation, and if the target application does not have the preset operational risk, the target application may be allowed to perform online operation.

In practical applications, whether the target application program has the preset operational risk may also be determined based on information of the target application program itself. For example, where the target application program is an applet, the information of the applet itself may give rise to the preset operational risk. Specifically, the target data may further include information of the target application program, which may include the name, category, description information, and developer information of the target application program, and so on. Based on this, as shown in fig. 5, the embodiments of this specification may further include the processing of step S116 described below.

In step S116, the information of the target application program is input into the information encoding sub-model in the encoder, and the information representation corresponding to the information of the target application program is obtained.

The information coding sub-model can be a sub-model for coding text information or obtaining corresponding characterization of the text information. The information coding sub-model may include one or more information coding sub-models, which may be constructed by a plurality of different algorithms or networks, for example, text information may be mapped into a matrix or vector by a preset mapping algorithm, or the information coding sub-model may be constructed by a neural network, and may be specifically set according to practical situations. The information representation may be a numerical value, a string of characters, or a matrix or vector, which may be specifically set according to the actual situation, which is not limited in the embodiment of the present specification.

In implementation, in order to code or characterize text information, an information coding sub-model may be preset in an encoder, a corresponding algorithm may be obtained, a framework of the information coding sub-model may be constructed based on the algorithm, input data of the information coding sub-model may be text information, output data may be a data characterization, then a training sample may be obtained, and the training sample may be used to perform supervised model training on the information coding sub-model, so as to obtain a trained information coding sub-model. The information of the target application program can be input into an information coding sub-model in the encoder, and the information of the target application program is coded through the information coding sub-model, so that the information representation corresponding to the information of the target application program is obtained.

In addition, based on the processing of the above step S116, the above step S110 may be further processed in the following manner, which may specifically include the following: and determining whether the target application program has preset operational risk or not based on the sequence characterization, the sequence diagram structural characterization and the information characterization.

In implementation, the sequence representation, the sequence diagram structural representation, and the information representation may be fused (e.g., the sequence representation, the sequence diagram structural representation, and the information representation may be directly spliced to perform fusion processing, or a preset data fusion algorithm may be used to perform fusion processing on the sequence representation, the sequence diagram structural representation, and the information representation), so as to obtain a fusion feature, a classification prediction of an operational risk may be performed on a target application based on the fusion feature, whether the target application has a preset operational risk may be determined according to a prediction result, if the target application has the preset operational risk, the target application may be refused to be run online, and if the target application does not have the preset operational risk, the target application may be allowed to run online.

In practical applications, based on the processing of the step S112, the step S114, and the step S116, the step S110 may further be processed in the following manner, which may specifically include the following: and determining whether the target application program has preset operational risk or not based on the sequence characterization, the sequence diagram structural characterization, the medium diagram structural characterization and the information characterization.

In an implementation, the sequence representation, the sequence diagram structural representation, the medium diagram structural representation and the information representation may be fused (for example, the sequence representation, the sequence diagram structural representation, the medium diagram structural representation and the information representation may be directly spliced to perform fusion processing, or a preset data fusion algorithm may be used to perform fusion processing on the sequence representation, the sequence diagram structural representation, the medium diagram structural representation and the information representation), so as to obtain fusion characteristics, a classification prediction of an operational risk may be performed on a target application based on the fusion characteristics, whether the target application has a preset operational risk may be determined according to a prediction result, if the target application has the preset operational risk, the target application may be refused to perform online operation, and if the target application does not have the preset operational risk, the target application may be allowed to perform online operation.

In practical applications, the specific processing manner of the step S106 may be varied, and an alternative processing manner is provided below, and as shown in fig. 6, the following processing of the step S1062 and the step S1064 may be specifically included.

In step S1062, the behavior sequence data is input into the first pre-trained characterization model, to obtain an initial characterization sequence corresponding to the behavior sequence data.

The first characterization model may include various types, and the first characterization model may also be a pre-trained model, for example, the first characterization model may be constructed through a neural network, or the first characterization model may be constructed through a specified language processing model, for example, the first characterization model may be constructed through BERT, or the first characterization model may be a model constructed based on RoBERTa, or the like, which may be specifically set according to an actual situation, and embodiments of the present disclosure do not limit this.

In implementation, in the process of acquiring the target data in step S102, behavior sequence data that takes the target application program as its endpoint may first be sampled. Specifically, each piece of behavior sequence data may be truncated to a length of n (where n is a positive integer), the set of page-jump sequences of the target application program within a preset duration before the current moment (such as within the preceding 1 month or 3 months) may be counted, and the page-jump sequences whose frequency is greater than a preset threshold may be selected as the page-jump sequences of the target application program. On this basis, for each application program, a number of pieces of behavior sequence data (for example, 20 pieces) with frequency greater than the preset threshold and length n may be maintained; if a piece of behavior sequence data is shorter than n, it may be padded to length n with preset (or default) data.
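
The sampling procedure above can be sketched as follows. This is only an illustration under stated assumptions: sequences are lists of page names ending at the target applet, truncation keeps the last n items so that the endpoint is preserved, padding is added on the left, and the padding token `<pad>` and the function name are hypothetical.

```python
from collections import Counter

PAD = "<pad>"  # hypothetical preset (default) padding data

def sample_behavior_sequences(raw_sequences, n, min_frequency, max_kept=20):
    """Truncate, count, filter by frequency, and pad behavior sequences.

    raw_sequences: list of page sequences, each ending at the target application
    n: target length of each kept sequence
    min_frequency: a sequence is kept only if it occurs more often than this
    max_kept: upper bound on the number of sequences maintained per application
    """
    # Keep the last n pages so the target application stays the endpoint.
    truncated = [tuple(seq[-n:]) for seq in raw_sequences]
    counts = Counter(truncated)
    # most_common() orders by descending frequency.
    frequent = [seq for seq, count in counts.most_common() if count > min_frequency]
    kept = frequent[:max_kept]
    # Left-pad sequences shorter than n (padding side is an assumption).
    return [[PAD] * (n - len(seq)) + list(seq) for seq in kept]

raw = [
    ["home", "search", "applet_A"],
    ["home", "search", "applet_A"],
    ["pay", "applet_A"],
    ["home", "search", "applet_A"],
    ["pay", "applet_A"],
    ["scan", "applet_A"],
]
sampled = sample_behavior_sequences(raw, n=3, min_frequency=1)
```

Here the sequence that appears only once is discarded, and the two-page sequence is padded to length 3.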

Then, training samples for the first characterization model may be obtained and used to train the first characterization model, yielding a trained first characterization model. The model parameters of the trained first characterization model are not updated during training of the encoder; that is, they are fixed while the encoder is trained. Consider a piece of behavior sequence data k to be input, of length n, which may be text-type data, where x_ik denotes the i-th data item of behavior sequence data k. The behavior sequence data k may be input into the first characterization model, which encodes it to obtain the corresponding initial characterization sequence {h_1k, h_2k, ..., h_nk}, where each characterization h_ik may be a-dimensional (for example, 32-dimensional or 16-dimensional).

In step S1064, the initial characterization sequence is input into a sequence coding sub-model in the encoder, the relative position coding module in the sequence coding sub-model performs the coding processing of the relative positions on the initial characterizations corresponding to different data nodes included in the initial characterization sequence, and the hidden state corresponding to the data of the current processing time point in the initial characterization sequence is transferred to the next time point through the circulation strategy in the sequence coding sub-model, so as to perform the coding processing on the initial characterization sequence, and the sequence characterization corresponding to the behavior sequence data is obtained.

The sequence coding sub-model can be a sub-model for coding or characterizing the characterization sequence, and the sequence coding sub-model can also realize the coding processing of the relative positions of different data nodes contained in the characterization sequence, and transfer the hidden state corresponding to the data of the current processing time point in the characterization sequence to the next time point through a circulation strategy in the sequence coding sub-model. The sequence coding submodel may be constructed by a plurality of different algorithms or networks, for example, the characterization sequence may be mapped into a matrix or vector by a preset mapping algorithm, or the sequence coding submodel may be constructed by a neural network, etc., and may be specifically set according to practical situations.

In implementation, in order to code or characterize each characterization sequence, a sequence coding sub-model may be preset in the encoder, a corresponding algorithm may be obtained, and an architecture of the sequence coding sub-model may be constructed based on the algorithm, input data of the sequence coding sub-model may be a specific characterization sequence, and output data may be a data characterization, for example, the sequence coding sub-model may include a coding sub-model of a neural network of a self-attention mechanism. Then, a training sample can be obtained, and the training sample can be used for performing supervised model training on the sequence coding sub-model to obtain a trained sequence coding sub-model.

The characterization sequence can be input into a sequence coding sub-model in an encoder, the sequence coding sub-model in the encoder trained in advance is used for carrying out coding processing on the relative positions of different data nodes contained in the characterization sequence, meanwhile, the hidden state corresponding to the data of the current processing time point in the characterization sequence can be transmitted to the next time point through a circulation strategy in the sequence coding sub-model, so that the characterization sequence is subjected to coding processing, the sequence characterization corresponding to the characterization sequence is obtained, and the sequence characterization corresponding to the characterization sequence can be used as the sequence characterization corresponding to the behavior sequence data.

In practical applications, the first characterization model may be a model constructed based on ALBERT, RoBERTa, or ELECTRA.

In addition, the sequence coding sub-model may be a sub-model constructed based on Transformer-XL. Transformer-XL is a network that improves on the Transformer model and can be used for modeling tasks that process long sequence data. Because of the limitations of the self-attention mechanism of the Transformer model, modeling long sequence data often runs into performance and efficiency problems. Transformer-XL addresses these problems by introducing a relative position coding mechanism (in the relative position coding module) and a recurrence mechanism (corresponding to the circulation strategy). The Transformer model uses absolute position coding to provide position information for each data node in the input sequence data. However, absolute position coding can be problematic when processing long sequence data, because the model cannot directly learn information about relative positions within the input sequence data. To solve this problem, Transformer-XL introduces a relative position coding mechanism and uses relative position representations to encode the relationships between the positions of different data nodes in the sequence data. In this way, Transformer-XL can better capture the dependencies present in long sequence data, and the modeling capability of the model is improved. In addition, Transformer-XL introduces a recurrence mechanism, so that the hidden state of each time step can be transferred to the next time step. The model thereby retains a memory of previous time steps when processing sequence data and can better capture long-term dependencies that may exist; through the recurrence mechanism, Transformer-XL can better process long sequence data, and the performance and generalization capability of the model are improved.
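
The two ideas above, relative position information and hidden states carried forward, can be shown with a heavily simplified sketch. It is not the Transformer-XL implementation: it assumes a single attention layer without learned projections, replaces the full relative position parameterization with a scalar bias per relative distance, and uses the previous segment's outputs as memory. All names and values are hypothetical.

```python
import math

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend_segment(segment, memory, rel_bias):
    """Causal attention over memory + segment with a relative-distance bias.

    segment: vectors of the current segment
    memory: hidden vectors cached from the previous segment
    rel_bias: dict mapping distance (query position - key position) to a scalar
    """
    context = memory + segment
    offset = len(memory)
    dim = len(segment[0])
    outputs = []
    for i, query in enumerate(segment):
        q_pos = offset + i
        keys = context[: q_pos + 1]  # only the current and earlier positions
        scores = [
            dot(query, key) / math.sqrt(dim) + rel_bias.get(q_pos - j, 0.0)
            for j, key in enumerate(keys)
        ]
        weights = softmax(scores)
        # Keys double as values because this sketch has no projections.
        outputs.append(
            [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]
        )
    return outputs

def encode_long_sequence(vectors, segment_len, rel_bias):
    """Process a long sequence segment by segment, carrying hidden states forward."""
    memory, hidden = [], []
    for start in range(0, len(vectors), segment_len):
        segment = vectors[start:start + segment_len]
        out = attend_segment(segment, memory, rel_bias)
        hidden.extend(out)
        memory = out  # hidden states passed on to the next segment
    return hidden

rel_bias = {0: 0.0, 1: -0.1, 2: -0.2, 3: -0.3}  # assumed learned values
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
hidden = encode_long_sequence(vectors, segment_len=2, rel_bias=rel_bias)
```

Because the bias depends only on the distance between positions, the same table applies to every segment, which is what lets cached states from an earlier segment be reused without re-indexing positions.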

Based on the above, the processing of step S1064 may include: inputting the initial characterization sequence {h_1k, h_2k, ..., h_nk} into the sequence coding sub-model constructed based on Transformer-XL in the encoder; performing relative position coding on the initial characterizations corresponding to the different data nodes contained in the initial characterization sequence through the relative position coding module in that sub-model; and transferring the hidden state corresponding to the data at the currently processed time point in the initial characterization sequence to the next time point through the circulation strategy in that sub-model, so as to encode the initial characterization sequence and obtain the sequence representation corresponding to the behavior sequence data.

It should be noted that the sequence coding sub-model constructed based on Transformer-XL yields a sequence representation for a piece of behavior sequence data. For the multiple pieces of behavior sequence data selected above because their frequency is greater than the preset threshold, the sequence representation corresponding to each piece may be determined through the sequence coding sub-model constructed based on Transformer-XL, and the sequence representations corresponding to the multiple pieces may then be fused to obtain a fused sequence representation corresponding to the acquired pieces of behavior sequence data. The fusion may be performed in various ways: for example, a fusion algorithm (such as a wavelet transform algorithm or a neural network algorithm) may be preset and used to fuse the sequence representations, or the sequence representations may be fused through a max pooling operation. This may be set according to the actual situation and is not limited in the embodiments of this specification.
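
The max pooling option mentioned above can be sketched as an element-wise maximum across the sequence representations. The function name and the example vectors are illustrative assumptions; the representations are taken to be equal-length vectors.

```python
def max_pool_fusion(sequence_representations):
    """Fuse several equal-length representation vectors by element-wise maximum."""
    if not sequence_representations:
        raise ValueError("at least one sequence representation is required")
    length = len(sequence_representations[0])
    if any(len(rep) != length for rep in sequence_representations):
        raise ValueError("all sequence representations must have the same length")
    # zip(*...) groups the values found at the same dimension across all vectors.
    return [max(values) for values in zip(*sequence_representations)]

# Three illustrative representations, one per kept behavior sequence.
fused_sequence_repr = max_pool_fusion([
    [0.1, 0.9, -0.3],
    [0.4, 0.2, -0.1],
    [0.0, 0.5, -0.7],
])
```

The fused vector has the same dimensionality as each input, regardless of how many behavior sequences were kept.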

In practical applications, the sequence diagram coding sub-model and the medium diagram coding sub-model may each be a sub-model constructed based on a target algorithm. The target algorithm may include an algorithm for processing heterogeneous graph structure data; for example, the target algorithm may include a graph convolutional network (GCN) algorithm, an algorithm combining a graph neural network with a recurrent neural network, and the like, which may be set according to the actual situation and is not limited in the embodiments of this specification.

In practical applications, the information coding sub-model may be a sub-model constructed based on a multi-layer perceptron (MLP), or a sub-model constructed based on an AutoInt network.

For the information coding sub-model constructed based on an MLP, the sub-model may include one or more fully connected layers, through which the input discrete features are converted into low-dimensional continuous features; such a sub-model is suitable when the number of feature values to be processed is small. For the information coding sub-model constructed based on an AutoInt network: since the AutoInt network is an adaptive feature interaction network, it can better model the nonlinear relationships between features by automatically learning the interaction weights between them. In the AutoInt network, the interaction weights between different features are learned adaptively through an attention mechanism, and this adaptive modeling of feature interactions improves the expressive power of the model.
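The attention-based feature interaction described for the AutoInt network can be sketched as follows. This is a single-head illustration with identity projections, a residual connection and a ReLU; the embeddings, the function names and the omission of learned projection matrices are assumptions, not details from this specification.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a non-empty list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def interaction_weights(fields: List[Vector]) -> List[List[float]]:
    """Row i holds how strongly feature field i attends to every field."""
    scale = math.sqrt(len(fields[0]))
    return [
        softmax([sum(a * b for a, b in zip(q, k)) / scale for k in fields])
        for q in fields
    ]


def interact(fields: List[Vector]) -> List[Vector]:
    """Mix each field with the others, then apply residual and ReLU."""
    weights = interaction_weights(fields)
    mixed = []
    for field, row in zip(fields, weights):
        combined = [
            sum(w * other[d] for w, other in zip(row, fields))
            for d in range(len(field))
        ]
        mixed.append([max(0.0, c + x) for c, x in zip(combined, field)])
    return mixed


# Hypothetical embeddings of three attributes of the target application.
field_embeddings = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.5]]
weights = interaction_weights(field_embeddings)
# Flatten the interacted fields into one information characterization.
information_characterization = [
    x for field in interact(field_embeddings) for x in field
]
```

In this toy input the first field attends more to the second (similar) field than to the third (dissimilar) one, which is the adaptive weighting the paragraph refers to.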

In practical applications, the target algorithm may include the Heterogeneous Graph Transformer (HGT) algorithm, and the target application program may be an applet loaded in a host program.

The HGT (Heterogeneous Graph Transformer) algorithm is an algorithm for processing heterogeneous graph data. It combines the ideas of the graph neural network and the Transformer module, and can be used for tasks such as data node classification and link prediction. By introducing multiple attention mechanisms and hierarchical information transfer, the HGT algorithm can effectively model and exploit the features and relationships of the different types of data nodes and edges in a heterogeneous graph. The core idea of the HGT algorithm is to treat the data nodes and edges in the heterogeneous graph as entities of different types, each entity corresponding to its own representation vector (or token). In the HGT algorithm, the features of each entity are first mapped through a linear transformation to obtain an initial entity representation vector, and the relationships among the entities are then modeled through multiple attention mechanisms. The attention mechanism in the HGT algorithm may include two levels: a cross-type attention mechanism and an internal attention mechanism. The cross-type attention mechanism is used to model relationships between entities of different types, for example connection relationships between nodes of different types; the internal attention mechanism is used to model relationships between entities of the same type, for example connection relationships between nodes of the same type. Through these two levels of attention, the HGT algorithm can capture the complex relationships between different types of entities in the heterogeneous graph.

In addition, the HGT algorithm introduces a hierarchical information transfer mechanism, in which the representation of each entity is iteratively updated through multiple layers of Transformer modules. In each Transformer module, the HGT algorithm uses a multi-head attention mechanism to model the relationships between entities and optimizes the transfer and combination of information through a residual connection layer and a normalization layer. Through multi-layer information transfer, features at different levels can be gradually extracted and combined, yielding more accurate characterization and prediction results.

Based on the above, the processing of step S108 may include: inputting the sequence diagram structure data into a sequence diagram coding sub-model constructed based on the HGT algorithm in the encoder; determining the association relationships between entities of different types in the sequence diagram structure data through the cross-type attention mechanism in the attention module of that sub-model; determining the association relationships between entities of the same type through the internal attention mechanism in the attention module; optimizing the transfer and combination of information through the residual connection layer and the normalization layer of that sub-model; and gradually extracting and combining features at different levels, so as to encode the sequence diagram structure data and obtain the sequence diagram structure representation corresponding to the sequence diagram structure data. More accurate characterization and prediction results can be obtained through the sequence diagram coding sub-model constructed based on the HGT algorithm.
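As a rough illustration of type-aware attention with a residual connection, the sketch below applies one simplified layer to a tiny heterogeneous graph. It is not the HGT algorithm itself: it uses a single head, one projection matrix per node type for both queries and keys, no edge-type parameters and no normalization layer. All node names, types and matrices are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(matrix: Matrix, vector: Vector) -> Vector:
    """Multiply a matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a non-empty list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hetero_attention_layer(
    features: Dict[str, Vector],
    node_type: Dict[str, str],
    edges: List[Tuple[str, str]],      # (source, target) pairs
    projection: Dict[str, Matrix],     # one assumed projection per node type
) -> Dict[str, Vector]:
    """One simplified type-aware attention layer with a residual connection."""
    updated: Dict[str, Vector] = {}
    for target, h_target in features.items():
        sources = [s for s, t in edges if t == target]
        if not sources:
            updated[target] = list(h_target)  # nothing to aggregate
            continue
        query = matvec(projection[node_type[target]], h_target)
        keys = [matvec(projection[node_type[s]], features[s]) for s in sources]
        scale = math.sqrt(len(query))
        weights = softmax(
            [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
        )
        message = [
            sum(w * key[d] for w, key in zip(weights, keys))
            for d in range(len(query))
        ]
        updated[target] = [x + m for x, m in zip(h_target, message)]  # residual
    return updated


features = {"applet": [1.0, 0.0], "page_a": [0.0, 1.0], "page_b": [1.0, 1.0]}
node_type = {"applet": "applet", "page_a": "page", "page_b": "page"}
edges = [("page_a", "applet"), ("page_b", "applet")]
projection = {
    "applet": [[1.0, 0.0], [0.0, 1.0]],
    "page": [[0.5, 0.0], [0.0, 0.5]],
}
encoded = hetero_attention_layer(features, node_type, edges, projection)
```

Stacking several such layers corresponds to the multi-layer information transfer described above; the applet node's entry in `encoded` would serve as the graph-side representation in this toy setting.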

In practical applications, the processing in step S104 of determining, based on the access log data and the behavior sequence data, the sequence diagram structure data constructed from the different data nodes contained in the behavior sequence data may be implemented in various ways. One optional implementation is provided below; as shown in fig. 7, it may specifically include the processing of the following steps S1042 and S1044.

In step S1042, a correlation coefficient matrix between different data nodes in the behavior sequence data is constructed based on the access log data and the behavior sequence data.

In implementation, as described above, in the process of acquiring the target data in step S102, the behavior sequence data ending at the target application program may be sampled, so that, for each application program, the pieces of behavior sequence data whose frequency is greater than the preset threshold and whose length is n are retained (for example, 20 pieces of behavior sequence data are retained). A piece of behavior sequence data k to be input, of length n, may be denoted as {x_1k, x_2k, ..., x_nk}. The behavior sequence data k may be input into the first characterization model, which encodes it to obtain the corresponding initial characterization sequence {h_1k, h_2k, ..., h_nk}. The similarity between different data nodes in the initial characterization sequence may then be described by a similarity algorithm (such as a cosine similarity algorithm or a Euclidean distance similarity algorithm), with a trainable parameter W added, and the correlation coefficient matrix between different data nodes in the behavior sequence data may be constructed, based on the access log data and the initial characterization sequence, by the following formula (taking the cosine similarity algorithm as an example).

M_ij = cos(W h_i, W h_j)

Wherein M_ij represents the element of the correlation coefficient matrix for data nodes i and j, h_i and h_j respectively represent the initial characterizations of two different data nodes in the behavior sequence data, W is the trainable parameter, and cos represents cosine similarity.
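The formula can be evaluated directly once W and the initial characterizations are available. The sketch below assumes W is a square matrix (shown as an identity placeholder rather than a trained parameter) and uses made-up two-dimensional characterizations.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(matrix: Matrix, vector: Vector) -> Vector:
    """Multiply a matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0.0 else 0.0


def correlation_matrix(h: List[Vector], w: Matrix) -> Matrix:
    """M_ij = cos(W h_i, W h_j) for every pair of data nodes."""
    projected = [matvec(w, h_i) for h_i in h]
    return [[cosine(p_i, p_j) for p_j in projected] for p_i in projected]


# Hypothetical initial characterizations of three data nodes.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [[1.0, 0.0], [0.0, 1.0]]  # placeholder for the trainable parameter W
m = correlation_matrix(h, w)
```

Because cosine similarity is symmetric, only the upper triangle of the matrix is needed when candidate edges are later selected.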

It should be noted that the initial characterization sequence here is determined in the same manner as described above; in practical applications, however, it may also be determined in other ways. For example, the data of each data node in the behavior sequence data may be mapped to a corresponding matrix or vector to obtain the initial characterization sequence corresponding to the behavior sequence data, or feature extraction may be performed on the data of each data node in the behavior sequence data by a preset feature extraction algorithm to obtain the initial characterization sequence. This may be set according to the actual situation, and the embodiments of the present specification are not limited thereto.

In step S1044, a joint-edge sparse threshold is determined based on the correlation coefficient matrix between different data nodes in the behavior sequence data, and sequence diagram structure data is constructed based on the joint-edge sparse threshold and the data nodes included in the behavior sequence data.

In implementation, a non-negative threshold (i.e., the joint-edge sparse threshold) is applied to the correlation coefficient matrix calculated for the different data nodes in the behavior sequence data: a connecting edge is retained between two data nodes whose correlation coefficient is greater than the joint-edge sparse threshold, and no connecting edge is retained between two data nodes whose correlation coefficient is smaller than the threshold. The joint-edge sparse threshold may be determined by designating a relative proportion, as shown in the following formula.

Wherein a_ij denotes the connecting edge between two data nodes. For example, if ε is set to 0.2, the top 20% of the connecting edges are retained; for behavior sequence data of length 20, the number of connecting edges may be 20 × (20 - 1) × 0.5 × 0.2 = 38. After the connecting edges are sparsified in this way, the target application program and the connected data nodes in the behavior sequence data form a connected graph, namely the sequence diagram structure data.
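The proportional sparsification can be sketched as follows, under the assumption that an edge is kept when its correlation coefficient ranks within the top ε fraction of all node pairs. The function name and the toy correlation matrix are hypothetical.

```python
from typing import List, Tuple


def sparsify_edges(m: List[List[float]], epsilon: float) -> List[Tuple[int, int]]:
    """Keep the top `epsilon` fraction of node pairs ranked by M_ij."""
    n = len(m)
    # Each unordered pair (i, j) with i < j is one candidate edge.
    pairs = [(m[i][j], i, j) for i in range(n) for j in range(i + 1, n)]
    keep = round(len(pairs) * epsilon)
    pairs.sort(key=lambda p: p[0], reverse=True)
    return sorted((i, j) for _, i, j in pairs[:keep])


# Toy symmetric matrix for a sequence of length 20: closer nodes correlate more.
n = 20
m = [[1.0 / (1 + abs(i - j)) for j in range(n)] for i in range(n)]
edges = sparsify_edges(m, 0.2)
```

With n = 20 there are 190 candidate pairs, so ε = 0.2 keeps 38 edges, matching the worked example; the implied threshold is the smallest coefficient among the retained pairs.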

The encoder may be trained in the following manner, which may specifically include the processing of the following steps a02 to a10.

In step a02, a data sample for detecting whether the target application program has a preset operational risk is obtained, where the data sample includes at least historical access log data of the user before the user uses the target application program.

In step a04, a behavior sequence data sample of the user before the user uses the target application program is determined based on the historical access log data, and a sequence diagram structure data sample constructed based on different data nodes contained in the behavior sequence data sample is determined based on the historical access log data and the behavior sequence data sample.

In step a06, based on the behavior sequence data sample, encoding processing of relative positions is performed on different data nodes included in the behavior sequence data sample through a sequence encoding sub-model in the encoder, and a hidden state corresponding to the data sample at a current processing time point in the behavior sequence data sample is transferred to a next time point through a circulation strategy in the sequence encoding sub-model, so as to encode the behavior sequence data sample, and a sample sequence representation corresponding to the behavior sequence data sample is obtained.

In step a08, the sequence diagram structural data sample is input into a sequence diagram coding sub-model in the encoder, and the association relationship between different types of entities and the association relationship between the same type of entities in the sequence diagram structural data sample are determined by an attention module in the sequence diagram coding sub-model, so as to encode the sequence diagram structural data sample, and obtain a sample sequence diagram structural representation corresponding to the sequence diagram structural data sample.

In step a10, based on the sample sequence representation and the sample sequence diagram structural representation, decoding the sample sequence representation and the sample sequence diagram structural representation by a decoder corresponding to the encoder, and performing joint training on the encoder and the decoder based on a decoding result and a preset loss function to obtain a trained encoder.

In implementation, the sample sequence representation and the sample sequence diagram structure representation may be spliced to obtain a spliced representation, and the spliced representation may be input to the decoder, which decodes it. For the specific processing of steps a02 to a10, reference may be made to the related content above, which is not repeated here.
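The splice-then-decode step and the loss computation can be sketched as below. The specification only refers to a decoder and a preset loss function; the single logistic layer and the binary cross-entropy loss used here are assumptions chosen for illustration, as are all names and values.

```python
import math
from typing import List, Sequence


def splice(*representations: Sequence[float]) -> List[float]:
    """Concatenate several representations into one spliced representation."""
    return [x for r in representations for x in r]


def decode(spliced: Sequence[float], weights: Sequence[float],
           bias: float = 0.0) -> float:
    """Assumed decoder: one logistic layer producing a risk score in (0, 1)."""
    z = sum(w * x for w, x in zip(weights, spliced)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(score: float, label: int) -> float:
    """Assumed loss for supervised training on labelled samples."""
    eps = 1e-12
    p = min(max(score, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


# Hypothetical sample representations from the two encoder sub-models.
sample_sequence_repr = [0.2, -0.1]
sample_graph_repr = [0.5]
spliced = splice(sample_sequence_repr, sample_graph_repr)
risk_score = decode(spliced, weights=[0.0, 0.0, 0.0])  # untrained weights
loss = binary_cross_entropy(risk_score, label=1)
```

In joint training, the gradient of this loss would be propagated through the decoder into every encoder sub-model; the same `splice` call extends naturally to three or four representations.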

In practical application, step a06 may also be implemented as follows: inputting the behavior sequence data sample into a pre-trained first characterization model to obtain a sample initial characterization sequence corresponding to the behavior sequence data sample; and inputting the sample initial characterization sequence into the sequence coding sub-model in the encoder, performing relative position coding on the sample initial characterizations corresponding to the different data nodes contained in the sample initial characterization sequence through the relative position coding module in the sequence coding sub-model, and transmitting the hidden state corresponding to the data at the current processing time point in the sample initial characterization sequence to the next time point through the circulation strategy in the sequence coding sub-model, so as to encode the sample initial characterization sequence and obtain the sample sequence characterization corresponding to the behavior sequence data sample.

In practical application, the processing in step a04 of determining, based on the historical access log data and the behavior sequence data sample, the sequence diagram structure data sample constructed from the different data nodes contained in the behavior sequence data sample may also be implemented as follows: constructing a correlation coefficient matrix between different data nodes in the behavior sequence data sample based on the historical access log data and the behavior sequence data sample; and determining a sample joint-edge sparse threshold based on the correlation coefficient matrix between different data nodes in the behavior sequence data sample, and constructing the sequence diagram structure data sample based on the sample joint-edge sparse threshold and the data nodes contained in the behavior sequence data sample.

In practical application, the data sample further includes information related to the target application program, where the information related to the target application program includes information of other application programs different from the target application program, information of merchants corresponding to the target application program, and information of merchants corresponding to the other application programs. In this case, the method may further include: determining, based on the information related to the target application program, a medium diagram structure data sample constructed from the target application program, the other application programs, the information of merchants corresponding to the target application program and the information of merchants corresponding to the other application programs; and inputting the medium diagram structure data sample and the sample sequence diagram structure representation into a medium diagram coding sub-model in the encoder, and determining the association relationship between different types of entities in the medium diagram structure data sample and the association relationship between the same type of entities through an attention module in the medium diagram coding sub-model, so as to encode the medium diagram structure data sample and obtain a sample medium diagram structure representation corresponding to the medium diagram structure data sample.

Accordingly, step a10 may be: based on the sample sequence representation, the sample sequence diagram structural representation and the sample medium diagram structural representation, decoding the sample sequence representation, the sample sequence diagram structural representation and the sample medium diagram structural representation through a decoder corresponding to the encoder, and based on a decoding result and a preset loss function, performing joint training on the encoder (comprising a sequence coding sub-model, a sequence diagram coding sub-model and a medium diagram coding sub-model) and the decoder to obtain the trained encoder.

In implementation, the sample sequence characterization, the sample sequence diagram structural characterization, and the sample media diagram structural characterization may be spliced to obtain spliced characterization, and the spliced characterization may be input to a decoder, through which the spliced characterization is decoded.

In practical application, the data sample further includes information of the target application program, in which case the method may further include: inputting the information of the target application program into an information coding sub-model in the encoder to obtain a sample information characterization corresponding to the information of the target application program.

Accordingly, step a10 may be: based on the sample sequence representation, the sample sequence diagram structural representation, the sample medium diagram structural representation and the sample information representation, decoding the sample sequence representation, the sample sequence diagram structural representation, the sample medium diagram structural representation and the sample information representation through a decoder corresponding to the encoder, and performing joint training on the encoder (comprising a sequence coding sub-model, a sequence diagram coding sub-model, a medium diagram coding sub-model and an information coding sub-model) and the decoder based on a decoding result and a preset loss function to obtain the trained encoder.

In implementation, the sample sequence characterization, the sample sequence diagram structural characterization, the sample medium diagram structural characterization and the sample information characterization may be subjected to splicing processing to obtain a spliced characterization, and the spliced characterization may be input to a decoder, through which the spliced characterization is subjected to decoding processing.

During the training of the encoder, supervised learning can be performed on known samples based on the above-described structure. After the encoder training is completed, prediction scoring can be performed on all applets, thereby completing the operational risk cognition of the applets. For details, reference may be made to the related content above, which is not repeated here.

In addition, the scheme can effectively fuse graph structure data (including the sequence diagram structure data, the medium diagram structure data and the like) with the behavior sequence data, which improves the cognition capability for the target application program, increases the processing speed and reduces the misjudgment rate. The information of the target application program, the page jump behavior sequence data, the page node information and the information of other application programs related to the target application program can also be fused, so that the characteristics and potential risks of the target application program are understood comprehensively. This novel scheme for the cognition of industry digitization subjects effectively solves the problem of fusing graph structure data with sequence data, thereby improving the cognition capability for applets; it offers greater flexibility and adaptability, can provide more accurate and reliable analysis results, and provides strong support for judging operational risk.

The following provides a detailed description of a risk detection method for an application program according to an embodiment of the present disclosure in conjunction with a specific application scenario, where the target application program may be an applet installed in a host program, and the method may specifically include the following steps:

In step B02, a data sample for detecting whether the applet has a preset operational risk is acquired, where the data sample includes historical access log data of the user before the user uses the applet, information related to the applet, and information of the applet.

Wherein the information related to the applet includes information of other applets different from the applet, information of the merchant corresponding to the applet, and information of the merchants corresponding to the other applets.

In step B04, a behavior sequence data sample of the user before the user uses the applet is determined based on the history access log data.

In step B06, the behavior sequence data sample is input into a pre-trained first characterization model constructed based on RoBERTa, and a sample initial characterization sequence corresponding to the behavior sequence data sample is obtained.

In step B08, the sample initial characterization sequence is input into a sequence coding sub-model constructed based on Transformer-XL in the encoder, the relative position coding module in the sequence coding sub-model constructed based on Transformer-XL performs relative position coding on the sample initial characterizations corresponding to the different data nodes included in the sample initial characterization sequence, and the hidden state corresponding to the data at the current processing time point in the sample initial characterization sequence is transferred to the next time point through the circulation strategy in the sequence coding sub-model, so as to encode the sample initial characterization sequence and obtain the sample sequence characterization corresponding to the behavior sequence data sample.

In step B10, a correlation coefficient matrix between different data nodes in the behavior sequence data sample is constructed based on the historical access log data and the sample initial characterization sequence corresponding to the behavior sequence data sample.

In step B12, a sample joint-edge sparse threshold is determined based on the correlation coefficient matrix between different data nodes in the behavior sequence data sample, and a sequence diagram structural data sample is constructed based on the sample joint-edge sparse threshold and the data nodes included in the behavior sequence data sample.

In step B14, the sequence diagram structural data sample is input into a sequence diagram coding sub-model constructed based on the HGT algorithm in the encoder, and the correlation between different types of entities and the correlation between the same type of entities in the sequence diagram structural data sample are determined through the attention module in the sequence diagram coding sub-model constructed based on the HGT algorithm, so as to encode the sequence diagram structural data sample, and obtain a sample sequence diagram structural representation corresponding to the sequence diagram structural data sample.

In step B16, a media map structure data sample constructed from the applet, the other applets, the information of the merchant corresponding to the applet, and the information of the merchants corresponding to the other applets is determined based on the information related to the applet.

In step B18, the media map structural data sample and the sample sequence map structural representation are input into a media map coding sub-model constructed based on the HGT algorithm in the encoder, and the correlation between different types of entities in the media map structural data sample and the correlation between the same types of entities are determined through the attention module in the media map coding sub-model constructed based on the HGT algorithm, so as to encode the media map structural data sample, and obtain the sample media map structural representation corresponding to the media map structural data sample.

In step B20, the information of the applet is input into an information coding sub-model constructed based on AutoInt networks in an encoder, and sample information characterization corresponding to the information of the applet is obtained.

In step B22, the sample sequence representation, the sample sequence diagram structural representation, the sample medium diagram structural representation and the sample information representation are decoded by a decoder corresponding to the encoder, and the encoder and the decoder are jointly trained based on the decoding result and a preset loss function, so as to obtain the trained encoder.

During the training of the encoder, supervised learning can be performed on known samples based on the above-described structure. After the encoder training is completed, prediction scoring can be performed on a specified applet, thereby completing the operational risk cognition of the applet; for details, see the following steps.

In step B24, target data for detecting whether the applet has a preset operational risk is obtained, where the target data includes at least access log data of the user before the user uses the applet, information related to the applet, and information of the applet.

In step B26, based on the access log data, behavior sequence data of the user before the user uses the applet is determined.

In step B28, the behavior sequence data is input into the pre-trained first characterization model constructed based on RoBERTa, so as to obtain an initial characterization sequence corresponding to the behavior sequence data.

In step B30, the initial characterization sequence is input into the sequence coding sub-model constructed based on Transformer-XL in the encoder, the relative position coding module in the sequence coding sub-model constructed based on Transformer-XL performs relative position coding on the initial characterizations corresponding to the different data nodes included in the initial characterization sequence, and the hidden state corresponding to the data at the current processing time point in the initial characterization sequence is transferred to the next time point through the circulation strategy in the sequence coding sub-model, so as to encode the initial characterization sequence and obtain the sequence characterization corresponding to the behavior sequence data.

In step B32, a correlation coefficient matrix between different data nodes in the behavior sequence data is constructed based on the access log data and the initial characterization sequence corresponding to the behavior sequence data.

In step B34, a joint-edge sparse threshold is determined based on the correlation coefficient matrix between different data nodes in the behavior sequence data, and sequence-diagram structural data is constructed based on the joint-edge sparse threshold and the data nodes included in the behavior sequence data.

In step B36, the sequence diagram structure data is input into a sequence diagram coding sub-model constructed based on the HGT algorithm in the encoder, and the association relationship between different types of entities and the association relationship between the same type of entities in the sequence diagram structure data are determined through the attention module in the sequence diagram coding sub-model constructed based on the HGT algorithm, so as to encode the sequence diagram structure data, and obtain a sequence diagram structure representation corresponding to the sequence diagram structure data.

In step B38, media map structure data constructed from the applet, the other applets, the information of the merchant corresponding to the applet, and the information of the merchants corresponding to the other applets is determined based on the information related to the applet.

In step B40, the above media map structure data and the above sequence map structure representation are input into a media map coding sub-model constructed based on the HGT algorithm in the encoder, and the association relationship between different types of entities in the media map structure data and the association relationship between the same types of entities are determined by the attention module in the media map coding sub-model constructed based on the HGT algorithm, so as to encode the media map structure data, and obtain the media map structure representation corresponding to the media map structure data.

In step B42, the information of the applet is input into an information coding sub-model constructed based on AutoInt networks in the encoder, and information characterization corresponding to the information of the applet is obtained.

In step B44, it is determined whether the applet has a preset operational risk based on the sequence characterization, the sequence map structural characterization, the media map structural characterization and the information characterization.

Based on the same concept as the risk detection method for an application program provided above, the embodiments of the present specification further provide a risk detection device for an application program, as shown in fig. 8.

The risk detection device for the application program comprises: a data acquisition module 801, a data processing module 802, a sequence encoding module 803, a graph encoding module 804, and a risk determination module 805, wherein:

A data acquisition module 801, configured to acquire target data for detecting whether a target application program has a preset operational risk, where the target data at least includes access log data of a user before the user uses the target application program;

A data processing module 802, configured to determine, based on the access log data, behavior sequence data of the user before the user uses the target application, and determine, based on the access log data and the behavior sequence data, sequence diagram structure data constructed based on different data nodes included in the behavior sequence data;

A sequence encoding module 803, configured to perform, based on the behavior sequence data, relative position coding on the different data nodes contained in the behavior sequence data through a sequence coding sub-model in a pre-trained encoder, and to transmit the hidden state corresponding to the data at the current processing time point in the behavior sequence data to the next time point through a circulation strategy in the sequence coding sub-model, so as to encode the behavior sequence data and obtain a sequence representation corresponding to the behavior sequence data;

A graph encoding module 804, configured to input the sequence diagram structural data into a sequence diagram coding sub-model in the encoder, and to determine the association relations between different types of entities and the association relations between entities of the same type in the sequence diagram structural data through an attention module in the sequence diagram coding sub-model, so as to encode the sequence diagram structural data and obtain a sequence diagram structural representation corresponding to the sequence diagram structural data;

A risk determination module 805, configured to determine, based on the sequence representation and the sequence diagram structural representation, whether a preset operational risk exists for the target application.

In this embodiment of the present disclosure, the target data further includes information related to the target application, where the information related to the target application includes information of other applications different from the target application, information of merchants corresponding to the target application, and information of merchants corresponding to the other applications, and the apparatus further includes:

The medium diagram construction module is used for determining medium diagram structure data constructed by the target application program, the other application programs, the information of the merchants corresponding to the target application program and the information of the merchants corresponding to the other application programs based on the information related to the target application program;

The medium diagram coding module is configured to input the medium diagram structure data and the sequence diagram structure representation into a medium diagram coding sub-model in the encoder, and to determine, through an attention module in the medium diagram coding sub-model, the association relationships between entities of different types and between entities of the same type in the medium diagram structure data, so as to encode the medium diagram structure data and obtain a medium diagram structure representation corresponding to the medium diagram structure data;

The risk determination module 805 determines whether a preset operational risk exists for the target application based on the sequence representation, the sequence diagram structure representation, and the medium diagram structure representation.
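As a rough illustration of the medium diagram described above, the following sketch builds a small heterogeneous graph from a mapping of applications to merchants. The node types, the edge type names (`operated_by`, `shares_merchant`), and the rule that two applications are linked when they share a merchant are assumptions made for the example, not details taken from this specification.

```python
from collections import defaultdict

def build_medium_graph(app_to_merchant):
    """Build typed nodes and typed edges from a mapping app_id -> merchant_id."""
    nodes = {
        "app": sorted(app_to_merchant),
        "merchant": sorted(set(app_to_merchant.values())),
    }
    edges = defaultdict(list)
    by_merchant = defaultdict(list)
    for app, merchant in sorted(app_to_merchant.items()):
        # Edge between entities of different types: application -> merchant.
        edges[("app", "operated_by", "merchant")].append((app, merchant))
        by_merchant[merchant].append(app)
    for apps in by_merchant.values():
        # Edge between entities of the same type: applications sharing a merchant.
        for i, first in enumerate(apps):
            for second in apps[i + 1:]:
                edges[("app", "shares_merchant", "app")].append((first, second))
    return nodes, dict(edges)

graph_nodes, graph_edges = build_medium_graph(
    {"target_app": "m1", "app_b": "m1", "app_c": "m2"}
)
```

Keeping edges keyed by a (source type, relation, target type) triple is one common way to hand such data to a heterogeneous graph encoder.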

In this embodiment of the present disclosure, the target data further includes information of the target application program, and the apparatus further includes:

The information coding module inputs the information of the target application program into an information coding sub-model in the encoder to obtain an information representation corresponding to the information of the target application program;

the risk determination module 805 determines whether a preset operational risk exists for the target application based on the sequence characterization, the sequence diagram structural characterization, and the information characterization.
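A minimal sketch of an information coding sub-model, assuming the multi-layer perceptron option and a numeric feature vector describing the target application. The layer sizes, the ReLU activation, and the random placeholder weights are illustrative assumptions.

```python
import numpy as np

def mlp_encode(features, layers):
    """Apply a stack of (weight, bias) layers with ReLU between hidden layers."""
    hidden = np.asarray(features, dtype=float)
    for index, (weight, bias) in enumerate(layers):
        hidden = hidden @ weight + bias
        if index < len(layers) - 1:  # no activation after the output layer
            hidden = np.maximum(hidden, 0.0)
    return hidden

rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(4, 8)), np.zeros(8)),  # hypothetical hidden layer
    (rng.normal(size=(8, 3)), np.zeros(3)),  # hypothetical output layer
]
info_repr = mlp_encode([1.0, 0.0, 3.5, 2.0], layers)
```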

In the embodiment of the present disclosure, the sequence encoding module 803 includes:

The initial characterization unit is used for inputting the behavior sequence data into a pre-trained first characterization model to obtain an initial characterization sequence corresponding to the behavior sequence data;

The sequence coding unit is configured to input the initial representation sequence into a sequence coding sub-model in the encoder, to perform relative position encoding on the initial representations corresponding to the different data nodes contained in the initial representation sequence through a relative position encoding module in the sequence coding sub-model, and to pass the hidden state corresponding to the data at the currently processed time point in the initial representation sequence to the next time point through a recurrence strategy in the sequence coding sub-model, so as to encode the initial representation sequence and obtain a sequence representation corresponding to the behavior sequence data.
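The combination of relative position encoding and recurrence can be sketched as follows. This is a heavily simplified, single-head illustration in the spirit of segment-level recurrence, not the sub-model of this specification: there are no learned projections, the relative position term is a plain additive bias, and the names `encode_segment`, `encode_sequence`, `rel_bias`, and `seg_len` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def encode_segment(segment, memory, rel_bias):
    """Attend from the current segment over [memory; segment].

    segment:  (L, d) initial representations of the current segment.
    memory:   (M, d) hidden states carried over from the previous segment.
    rel_bias: 1-D array; rel_bias[k] is added to the score of a key lying
              k positions before the query (clipped at the last entry).
    """
    context = np.concatenate([memory, segment], axis=0)   # (M + L, d)
    seg_len, dim = segment.shape
    mem_len = memory.shape[0]
    scores = segment @ context.T / np.sqrt(dim)           # (L, M + L)
    q_pos = np.arange(mem_len, mem_len + seg_len)[:, None]
    k_pos = np.arange(mem_len + seg_len)[None, :]
    dist = q_pos - k_pos                                  # >= 0: key is not in the future
    bias = rel_bias[np.clip(dist, 0, len(rel_bias) - 1)]  # depends only on relative distance
    scores = np.where(dist >= 0, scores + bias, -np.inf)  # mask future positions
    return softmax(scores) @ context                      # (L, d)

def encode_sequence(sequence, seg_len, rel_bias):
    """Process the sequence segment by segment, reusing earlier hidden states."""
    memory = np.zeros((0, sequence.shape[1]))
    outputs = []
    for start in range(0, len(sequence), seg_len):
        hidden = encode_segment(sequence[start:start + seg_len], memory, rel_bias)
        outputs.append(hidden)
        memory = hidden  # recurrence: hidden states are passed to the next segment
    hidden_all = np.concatenate(outputs, axis=0)
    return hidden_all, hidden_all[-1]

rng = np.random.default_rng(0)
initial_repr = rng.normal(size=(7, 4))  # stand-in for the initial representation sequence
hidden_all, sequence_repr = encode_sequence(initial_repr, seg_len=3, rel_bias=np.zeros(5))
```

Because the bias is indexed by the distance between query and key rather than by absolute position, the same parameters apply wherever a segment falls in the sequence, which is what lets the cached hidden states be reused.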

In this embodiment of the present disclosure, the first characterization model is a model constructed based on ALBERT, RoBERTa, or ELECTRA; the sequence coding sub-model is a sub-model constructed based on Transformer-XL; the sequence diagram coding sub-model and the medium diagram coding sub-model are each sub-models constructed based on a target algorithm, the target algorithm including an algorithm for processing heterogeneous graph structure data; and the information coding sub-model is a sub-model constructed based on a multi-layer perceptron (MLP) or on an AutoInt network.

In this embodiment of the present disclosure, the target algorithm includes the Heterogeneous Graph Transformer (HGT) algorithm, and the target application is an applet loaded in a host program.
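To illustrate how an attention module can treat entities of different types differently, here is a simplified sketch of type-dependent attention over the neighbors of one node. It keeps only per-node-type query, key, and value projections; the edge-type-specific parameters and multi-head structure of the full HGT algorithm are deliberately omitted, and the function and parameter names are hypothetical.

```python
import numpy as np

def typed_attention(target_vec, target_type, neighbors, proj):
    """Aggregate neighbor vectors with attention that depends on node type.

    neighbors: list of (node_type, vector) pairs.
    proj: dict node_type -> {"q": W, "k": W, "v": W}, each W of shape (d, d).
    """
    dim = target_vec.shape[0]
    query = target_vec @ proj[target_type]["q"]
    keys = np.stack([vec @ proj[ntype]["k"] for ntype, vec in neighbors])
    values = np.stack([vec @ proj[ntype]["v"] for ntype, vec in neighbors])
    scores = keys @ query / np.sqrt(dim)
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()  # softmax over the neighbors
    return weights @ values, weights

# Identity projections keep the toy example easy to follow.
identity = {"q": np.eye(2), "k": np.eye(2), "v": np.eye(2)}
proj = {"user": identity, "app": identity}
neighbor_vec = np.array([0.5, 2.0])
aggregated, attn = typed_attention(
    np.array([1.0, 0.0]), "app", [("user", neighbor_vec), ("app", neighbor_vec)], proj
)
```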

In the embodiment of the present disclosure, the data processing module 802 includes:

the correlation coefficient matrix construction unit is used for constructing a correlation coefficient matrix between different data nodes in the behavior sequence data based on the access log data and the behavior sequence data;

The sequence diagram construction unit is configured to determine an edge sparsity threshold based on the correlation coefficient matrix between the different data nodes in the behavior sequence data, and to construct the sequence diagram structure data based on the edge sparsity threshold and the data nodes contained in the behavior sequence data.
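The two units above can be sketched as follows, under stated assumptions: each data node is described by a numeric feature vector, the correlation coefficient is the Pearson coefficient, and the threshold used to sparsify edges is chosen as a quantile of the absolute correlations so that roughly a fixed share of candidate edges is kept. The specification does not say how the threshold is derived, so `keep_ratio` and the quantile rule are hypothetical.

```python
import numpy as np

def build_sequence_graph(node_features, keep_ratio=0.3):
    """Return the correlation matrix, the chosen threshold, and the kept edges."""
    corr = np.corrcoef(node_features)          # (n, n) Pearson correlation between rows
    n = corr.shape[0]
    upper = np.triu_indices(n, k=1)            # each unordered node pair once
    strengths = np.abs(corr[upper])
    threshold = np.quantile(strengths, 1.0 - keep_ratio)
    edges = [
        (int(i), int(j))
        for i, j, s in zip(upper[0], upper[1], strengths)
        if s >= threshold                      # keep only strongly correlated pairs
    ]
    return corr, threshold, edges

node_features = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [1.0, -1.0, 1.0, -1.0],
    [4.0, 1.0, 3.0, 2.0],
])
corr, threshold, edges = build_sequence_graph(node_features, keep_ratio=0.3)
```

Deriving the threshold from the matrix itself, rather than fixing it in advance, keeps the graph density comparable across users with very different activity levels.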

In an embodiment of the present disclosure, the apparatus further includes:

The sample acquisition module is configured to acquire a data sample for detecting whether the target application program has a preset operational risk, wherein the data sample includes at least historical access log data of a user before the user uses the target application program;

the sample processing module is used for determining a behavior sequence data sample of the user before the user uses the target application program based on the historical access log data and determining a sequence diagram structure data sample constructed based on different data nodes contained in the behavior sequence data sample based on the historical access log data and the behavior sequence data sample;

The sequence sample coding module is configured to, based on the behavior sequence data sample, perform relative position encoding on the different data nodes contained in the behavior sequence data sample through the sequence coding sub-model in the encoder, and to pass the hidden state corresponding to the data sample at the currently processed time point in the behavior sequence data sample to the next time point through a recurrence strategy in the sequence coding sub-model, so as to encode the behavior sequence data sample and obtain a sample sequence representation corresponding to the behavior sequence data sample;

a module configured to input the sequence diagram structure data sample into the sequence diagram coding sub-model in the encoder, and to determine, through an attention module in the sequence diagram coding sub-model, the association relationships between entities of different types and between entities of the same type in the sequence diagram structure data sample, so as to encode the sequence diagram structure data sample and obtain a sample sequence diagram structure representation corresponding to the sequence diagram structure data sample;

The training module is configured to decode the sample sequence representation and the sample sequence diagram structure representation through a decoder corresponding to the encoder, and to jointly train the encoder and the decoder based on the decoding result and a preset loss function to obtain a trained encoder.
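Joint training of an encoder and a decoder can be illustrated with a deliberately small sketch. It assumes a linear encoder, a logistic decoder that predicts a binary risk label, and binary cross-entropy as the preset loss function; none of these choices is stated in the specification, and the gradients are written out by hand only to keep the example dependency-free.

```python
import numpy as np

def train_jointly(features, labels, hidden_dim=3, lr=0.1, steps=300, seed=0):
    """Update encoder and decoder parameters together from one shared loss."""
    rng = np.random.default_rng(seed)
    n, d = features.shape
    w_enc = rng.normal(scale=0.1, size=(d, hidden_dim))  # encoder parameters
    w_dec = rng.normal(scale=0.1, size=hidden_dim)       # decoder parameters
    b_dec = 0.0
    eps = 1e-12
    losses = []
    for _ in range(steps):
        z = features @ w_enc                              # encoder output
        p = 1.0 / (1.0 + np.exp(-(z @ w_dec + b_dec)))    # decoder prediction
        loss = -np.mean(labels * np.log(p + eps) + (1 - labels) * np.log(1 - p + eps))
        losses.append(float(loss))
        g = (p - labels) / n                              # gradient of the loss w.r.t. the logits
        grad_w_dec = z.T @ g
        grad_b_dec = g.sum()
        grad_w_enc = features.T @ np.outer(g, w_dec)      # backpropagated through the decoder
        w_enc -= lr * grad_w_enc
        w_dec -= lr * grad_w_dec
        b_dec -= lr * grad_b_dec
    return w_enc, w_dec, b_dec, losses

rng = np.random.default_rng(1)
sample_repr = rng.normal(size=(64, 5))                    # stand-in for sample inputs
sample_labels = (sample_repr[:, 0] > 0).astype(float)     # synthetic labels for the demo
w_enc, w_dec, b_dec, losses = train_jointly(sample_repr, sample_labels)
```

After training, only the encoder parameters would be kept for inference, which matches the statement that the result of the procedure is a trained encoder.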

The embodiment of the present disclosure provides a risk detection apparatus for an application program. Target data for detecting whether a target application program has a preset operational risk is acquired, the target data including at least access log data of a user before the user uses the target application program. Based on the access log data, behavior sequence data of the user before the user uses the target application program is determined, and based on the access log data and the behavior sequence data, sequence diagram structure data constructed from the different data nodes contained in the behavior sequence data is determined. Based on the behavior sequence data, a sequence encoding sub-model in a pre-trained encoder performs relative position encoding on the different data nodes contained in the behavior sequence data, and a recurrence strategy in the sequence encoding sub-model passes the hidden state corresponding to the data at the currently processed time point to the next time point, so that the behavior sequence data is encoded into a corresponding sequence representation. The sequence diagram structure data is input into a sequence diagram encoding sub-model in the encoder, and an attention module in that sub-model determines the association relationships between entities of different types and between entities of the same type in the sequence diagram structure data, so that the sequence diagram structure data is encoded into a corresponding sequence diagram structure representation. Finally, whether the target application program has the preset operational risk is determined based on the sequence representation and the sequence diagram structure representation.

In this way, a relative position encoding mechanism is introduced into the sequence encoding sub-model in the encoder, and the relative position representation encodes the relationship between the positions of different data nodes in the sequence data, so that the encoder can better capture the dependencies present in long sequence data, which improves the modeling capability of the model. In addition, a recurrence mechanism is introduced into the sequence encoding sub-model, so that the hidden state of each time step can be passed to the next time step; the encoder thus retains the memory of previous time steps when processing sequence data and can better capture long-term dependencies that may exist. Through the recurrence mechanism, the encoder can better process long sequence data, which improves its performance and generalization capability. In addition, an attention mechanism is introduced into the sequence diagram encoding sub-model in the encoder to model the relationships between entities of different types and between entities of the same type in the sequence diagram structure data, so that the encoder can capture the complex relationships between different types of entities in the heterogeneous graph, obtain more accurate representations and prediction results, and improve the efficiency and accuracy of risk detection for the application program.

In addition to the risk detection apparatus for an application program described above, and based on the same concept, an embodiment of the present disclosure further provides a risk detection device for an application program, as shown in fig. 9.

The risk detection device of the application program may be the terminal device, the server, or the like provided in the above-described embodiments.

The risk detection device of the application program may vary considerably depending on its configuration or performance, and may include one or more processors 901 and a memory 902, where the memory 902 may store one or more application programs or data. The memory 902 may be transient storage or persistent storage. An application program stored in the memory 902 may include one or more modules (not shown in the figures), each of which may include a series of computer-executable instructions for the risk detection device of the application program. Further, the processor 901 may be configured to communicate with the memory 902 and to execute, on the risk detection device of the application program, the series of computer-executable instructions in the memory 902. The risk detection device of the application program may also include one or more power supplies 903, one or more wired or wireless network interfaces 904, one or more input/output interfaces 905, and one or more keyboards 906.

In particular, in this embodiment, the risk detection device of the application program includes a memory and one or more programs, where the one or more programs are stored in the memory and may include one or more modules, each module may include a series of computer-executable instructions for the risk detection device of the application program, and the one or more programs are configured to be executed by the one or more processors and include computer-executable instructions for:

acquiring target data for detecting whether a target application program has preset operational risk or not, wherein the target data at least comprises access log data of a user before the user uses the target application program;

determining behavior sequence data of the user before the user uses the target application program based on the access log data, and determining sequence diagram structure data constructed based on different data nodes contained in the behavior sequence data based on the access log data and the behavior sequence data;

based on the behavior sequence data, performing relative position encoding on the different data nodes contained in the behavior sequence data through a sequence coding sub-model in a pre-trained encoder, and passing the hidden state corresponding to the data at the currently processed time point in the behavior sequence data to the next time point through a recurrence strategy in the sequence coding sub-model, so as to encode the behavior sequence data and obtain a sequence representation corresponding to the behavior sequence data;

Inputting the sequence diagram structure data into a sequence diagram coding sub-model in the encoder, and determining the association relationship between different types of entities and the association relationship between the same type of entities in the sequence diagram structure data through an attention module in the sequence diagram coding sub-model so as to encode the sequence diagram structure data to obtain a sequence diagram structure representation corresponding to the sequence diagram structure data;

determining whether the target application program has the preset operational risk based on the sequence representation and the sequence diagram structure representation.
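The first two instructions above start from raw access logs. A minimal sketch of extracting a behavior sequence, assuming each log record is a dictionary with hypothetical `user`, `time`, `action`, and `target` fields and that the sequence is cut at the user's first use of the target application:

```python
def behavior_sequence(access_log, user_id, app_id):
    """Return the user's events that precede their first use of app_id, in time order."""
    events = sorted(
        (event for event in access_log if event["user"] == user_id),
        key=lambda event: event["time"],  # ISO 8601 strings sort chronologically
    )
    sequence = []
    for event in events:
        if event["target"] == app_id:
            break  # stop at the first use of the target application
        sequence.append((event["action"], event["target"]))
    return sequence

access_log = [
    {"user": "u1", "time": "2024-03-01T10:00:00", "action": "search", "target": "coupon"},
    {"user": "u2", "time": "2024-03-01T10:00:30", "action": "open", "target": "target_app"},
    {"user": "u1", "time": "2024-03-01T10:01:00", "action": "click", "target": "banner"},
    {"user": "u1", "time": "2024-03-01T10:02:00", "action": "open", "target": "target_app"},
    {"user": "u1", "time": "2024-03-01T10:03:00", "action": "pay", "target": "order"},
]
sequence = behavior_sequence(access_log, "u1", "target_app")
```

Each element of the resulting list corresponds to one data node of the behavior sequence, from which the sequence diagram structure data can then be built.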

In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the risk detection device embodiment of the application program, since it is substantially similar to the method embodiment, the description is relatively simple, and reference is made to the partial description of the method embodiment for relevant points.

The embodiment of the present specification provides a risk detection device for an application program. Target data for detecting whether a target application program has a preset operational risk is acquired, the target data including at least access log data of a user before the user uses the target application program. Based on the access log data, behavior sequence data of the user before the user uses the target application program is determined, and based on the access log data and the behavior sequence data, sequence diagram structure data constructed from the different data nodes contained in the behavior sequence data is determined. Based on the behavior sequence data, a sequence encoding sub-model in a pre-trained encoder performs relative position encoding on the different data nodes contained in the behavior sequence data, and a recurrence strategy in the sequence encoding sub-model passes the hidden state corresponding to the data at the currently processed time point to the next time point, so that the behavior sequence data is encoded into a corresponding sequence representation. The sequence diagram structure data is input into a sequence diagram encoding sub-model in the encoder, and an attention module in that sub-model determines the association relationships between entities of different types and between entities of the same type in the sequence diagram structure data, so that the sequence diagram structure data is encoded into a corresponding sequence diagram structure representation. Finally, whether the target application program has the preset operational risk is determined based on the sequence representation and the sequence diagram structure representation.

In this way, a relative position encoding mechanism is introduced into the sequence encoding sub-model in the encoder, and the relative position representation encodes the relationship between the positions of different data nodes in the sequence data, so that the encoder can better capture the dependencies present in long sequence data, which improves the modeling capability of the model. In addition, a recurrence mechanism is introduced into the sequence encoding sub-model, so that the hidden state of each time step can be passed to the next time step; the encoder thus retains the memory of previous time steps when processing sequence data and can better capture long-term dependencies that may exist. Through the recurrence mechanism, the encoder can better process long sequence data, which improves its performance and generalization capability. In addition, an attention mechanism is introduced into the sequence diagram encoding sub-model in the encoder to model the relationships between entities of different types and between entities of the same type in the sequence diagram structure data, so that the encoder can capture the complex relationships between different types of entities in the heterogeneous graph, obtain more accurate representations and prediction results, and improve the efficiency and accuracy of risk detection for the application program.

Further, based on the methods shown in fig. 1 to 7, one or more embodiments of the present disclosure further provide a storage medium for storing computer-executable instruction information. In a specific embodiment, the storage medium may be a USB flash drive, an optical disc, a hard disk, or the like, and the computer-executable instruction information stored in the storage medium can implement the following flow when executed by a processor:

In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for one of the above-described storage medium embodiments, since it is substantially similar to the method embodiment, the description is relatively simple, and reference is made to the description of the method embodiment for relevant points.

The embodiment of the present specification provides a storage medium. Target data for detecting whether a target application program has a preset operational risk is acquired, the target data including at least access log data of a user before the user uses the target application program. Based on the access log data, behavior sequence data of the user before the user uses the target application program is determined, and based on the access log data and the behavior sequence data, sequence diagram structure data constructed from the different data nodes contained in the behavior sequence data is determined. Based on the behavior sequence data, a sequence encoding sub-model in a pre-trained encoder performs relative position encoding on the different data nodes contained in the behavior sequence data, and a recurrence strategy in the sequence encoding sub-model passes the hidden state corresponding to the data at the currently processed time point to the next time point, so that the behavior sequence data is encoded into a corresponding sequence representation. The sequence diagram structure data is input into a sequence diagram encoding sub-model in the encoder, and an attention module in that sub-model determines the association relationships between entities of different types and between entities of the same type in the sequence diagram structure data, so that the sequence diagram structure data is encoded into a corresponding sequence diagram structure representation. Finally, whether the target application program has the preset operational risk is determined based on the sequence representation and the sequence diagram structure representation.

In this way, a relative position encoding mechanism is introduced into the sequence encoding sub-model in the encoder, and the relative position representation encodes the relationship between the positions of different data nodes in the sequence data, so that the encoder can better capture the dependencies present in long sequence data, which improves the modeling capability of the model. In addition, a recurrence mechanism is introduced into the sequence encoding sub-model, so that the hidden state of each time step can be passed to the next time step; the encoder thus retains the memory of previous time steps when processing sequence data and can better capture long-term dependencies that may exist. Through the recurrence mechanism, the encoder can better process long sequence data, which improves its performance and generalization capability. In addition, an attention mechanism is introduced into the sequence diagram encoding sub-model in the encoder to model the relationships between entities of different types and between entities of the same type in the sequence diagram structure data, so that the encoder can capture the complex relationships between different types of entities in the heterogeneous graph, obtain more accurate representations and prediction results, and improve the efficiency and accuracy of risk detection for the application program.

The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

In the 1990s, an improvement to a technology could be clearly distinguished as an improvement in hardware (for example, an improvement to a circuit structure such as a diode, a transistor, or a switch) or an improvement in software (an improvement to a method flow). However, with the development of technology, many improvements to method flows today can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be implemented by a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic functions are determined by the user programming the device. A designer programs to "integrate" a digital system onto a single PLD, without requiring a chip manufacturer to design and fabricate a dedicated application-specific integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, this programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must also be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used.
It will also be apparent to those skilled in the art that a hardware circuit implementing the logical method flow can be readily obtained simply by describing the method flow in one of the hardware description languages above and programming it into an integrated circuit.

The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer-readable program code, it is entirely possible to logically program the method steps so that the controller achieves the same functionality in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for performing various functions may also be regarded as structures within the hardware component. Alternatively, the means for performing the various functions may be regarded both as software modules implementing the method and as structures within the hardware component.

The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

For convenience of description, the above devices are described as being divided by function into various units. Of course, when implementing one or more embodiments of the present description, the functions of the units may be implemented in one or more pieces of software and/or hardware.

It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Moreover, one or more embodiments of the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

Embodiments of the present description are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include non-permanent memory in computer-readable media, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.

Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

One or more embodiments of the present specification may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. One or more embodiments of the present description may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

In this specification, the embodiments are described in a progressive manner; identical or similar parts of the embodiments may be understood by reference to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the corresponding parts of the description of the method embodiments.

The foregoing describes merely embodiments of the present specification and is not intended to limit it. Various modifications and variations of the present specification will be apparent to those skilled in the art. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present specification shall fall within the scope of its claims.

Claims

1. A risk detection method for an application, the method comprising:

2. The method of claim 1, the target data further comprising information related to the target application, the information related to the target application comprising information of other applications different from the target application, information of merchants corresponding to the target application, and information of merchants corresponding to the other applications, the method further comprising:

determining, based on the information related to the target application program, medium diagram structure data constructed from the target application program, the other application programs, the information of merchants corresponding to the target application program, and the information of merchants corresponding to the other application programs;

inputting the medium diagram structure data and the sequence diagram structure representation into a medium diagram encoding sub-model in the encoder, and determining, through an attention module in the medium diagram encoding sub-model, the association relationships between entities of different types and between entities of the same type in the medium diagram structure data, so as to encode the medium diagram structure data and obtain a medium diagram structure representation corresponding to the medium diagram structure data;

the determining whether the target application program has a preset operational risk based on the sequence representation and the sequence diagram structure representation comprising:

determining whether the target application program has the preset operational risk based on the sequence representation, the sequence diagram structure representation, and the medium diagram structure representation.
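
As an illustration of the kind of typed graph that claim 2 describes, the following minimal sketch builds a small heterogeneous graph over application programs and merchants. The input shape (`app_to_merchant`), the relation names `operated_by` and `same_merchant`, and the rule that two application programs are linked when they share a merchant are all assumptions made for this example; the claim does not specify how the edges are derived.

```python
from collections import defaultdict


def build_medium_graph(app_to_merchant):
    """Build a typed graph from a hypothetical app -> merchant mapping.

    Returns (nodes, edges): nodes maps node id -> node type, and edges is a
    sorted list of (source, relation, target) triples. App ids and merchant
    ids are assumed not to collide.
    """
    nodes = {}
    edges = set()
    apps_by_merchant = defaultdict(list)
    for app, merchant in app_to_merchant.items():
        nodes[app] = "app"
        nodes[merchant] = "merchant"
        # Edge between entities of different types (app -> merchant).
        edges.add((app, "operated_by", merchant))
        apps_by_merchant[merchant].append(app)
    for apps in apps_by_merchant.values():
        apps = sorted(apps)
        for i, first in enumerate(apps):
            for second in apps[i + 1:]:
                # Edge between entities of the same type (app <-> app).
                edges.add((first, "same_merchant", second))
    return nodes, sorted(edges)


if __name__ == "__main__":
    mapping = {"target_app": "m1", "other_app_a": "m1", "other_app_b": "m2"}
    graph_nodes, graph_edges = build_medium_graph(mapping)
    for edge in graph_edges:
        print(edge)
```

A heterogeneous graph encoder would then attend over such edges per relation type; that step is not shown here.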

3. The method of claim 2, the target data further comprising information of the target application, the method further comprising:

inputting the information of the target application program into an information encoding sub-model in the encoder to obtain an information representation corresponding to the information of the target application program;

and determining whether the target application program has the preset operational risk based on the sequence representation, the sequence diagram structure representation, and the information representation.

4. The method of claim 3, wherein performing, based on the behavior sequence data, relative position encoding on the different data nodes contained in the behavior sequence data through a sequence encoding sub-model in a pre-trained encoder, and passing the hidden state corresponding to the data at the current processing time point in the behavior sequence data to the next time point through a recurrence strategy in the sequence encoding sub-model, so as to encode the behavior sequence data and obtain a sequence representation corresponding to the behavior sequence data, comprises:

inputting the behavior sequence data into a pre-trained first characterization model to obtain an initial characterization sequence corresponding to the behavior sequence data;

inputting the initial characterization sequence into the sequence encoding sub-model in the encoder, performing relative position encoding on the initial characterizations corresponding to the different data nodes contained in the initial characterization sequence through a relative position encoding module in the sequence encoding sub-model, and passing the hidden state corresponding to the data at the current processing time point in the initial characterization sequence to the next time point through the recurrence strategy in the sequence encoding sub-model, so as to encode the initial characterization sequence and obtain the sequence representation corresponding to the behavior sequence data.
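
The two ideas in claim 4, position information that depends only on the distance between nodes and a hidden state handed forward from one processing step to the next, can be illustrated with a deliberately small sketch. It is not the claimed sub-model: it uses a single attention step without learned projections, a hypothetical `rel_bias` table in place of a learned relative position encoding, and plain Python lists in place of tensors.

```python
from math import exp


def softmax(scores):
    peak = max(scores)
    exps = [exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend_segment(segment, memory, rel_bias):
    """Encode one segment, attending causally over cached memory plus the segment.

    rel_bias maps a relative distance (query position minus key position) to a
    scalar added to the attention score; unseen distances contribute 0.0.
    """
    context = memory + segment
    offset = len(memory)
    outputs = []
    for i, query in enumerate(segment):
        q_pos = offset + i
        keys = context[: q_pos + 1]  # only the current and earlier positions
        scores = [dot(query, key) + rel_bias.get(q_pos - j, 0.0)
                  for j, key in enumerate(keys)]
        weights = softmax(scores)
        outputs.append([sum(w * key[d] for w, key in zip(weights, keys))
                        for d in range(len(query))])
    return outputs


def encode_sequence(vectors, segment_len, mem_len, rel_bias):
    """Process the sequence segment by segment, carrying hidden states forward."""
    memory, hidden = [], []
    for start in range(0, len(vectors), segment_len):
        segment = vectors[start:start + segment_len]
        out = attend_segment(segment, memory, rel_bias)
        hidden.extend(out)
        # Hidden states of this step become the memory for the next step.
        memory = (memory + out)[-mem_len:] if mem_len > 0 else []
    return hidden
```

Calling `encode_sequence` with `mem_len=0` disables the carried-over state, which makes the effect of the recurrence easy to compare on the same input.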

5. The method of claim 4, the first characterization model being a model constructed based on ALBERT, RoBERTa, or ELECTRA; the sequence encoding sub-model being a sub-model constructed based on Transformer-XL; the sequence diagram encoding sub-model and the medium diagram encoding sub-model each being a sub-model constructed based on a target algorithm, the target algorithm comprising an algorithm for processing heterogeneous graph data; and the information encoding sub-model being a sub-model constructed based on a multi-layer perceptron (MLP) or on an AutoInt network.

6. The method of claim 5, the target algorithm comprising the Heterogeneous Graph Transformer (HGT) algorithm, and the target application being an applet hosted in a host program.

7. The method of claim 6, the determining, based on the access log data and the behavior sequence data, sequence diagram structure data constructed based on different data nodes included in the behavior sequence data, comprising:

constructing a correlation coefficient matrix between different data nodes in the behavior sequence data based on the access log data and the behavior sequence data;

and determining an edge sparsification threshold based on the correlation coefficient matrix between the different data nodes in the behavior sequence data, and constructing the sequence diagram structure data based on the edge sparsification threshold and the data nodes contained in the behavior sequence data.
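
The threshold-based construction in claim 7 can be sketched as follows. The claim does not say which correlation measure is used or how the threshold is derived from the matrix, so this example assumes Pearson correlation over hypothetical per-session count vectors and picks the threshold so that roughly a chosen fraction (`keep_ratio`) of node pairs remain connected. All function and parameter names are illustrative.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def correlation_matrix(features):
    """features: dict node -> feature vector. Returns dict (u, v) -> correlation."""
    nodes = sorted(features)
    return {(u, v): pearson(features[u], features[v]) for u in nodes for v in nodes}


def sparsity_threshold(corr, keep_ratio):
    """Assumed rule: keep about keep_ratio of the strongest off-diagonal pairs."""
    values = sorted((abs(c) for (u, v), c in corr.items() if u < v), reverse=True)
    k = max(1, round(keep_ratio * len(values)))
    return values[k - 1]


def build_sequence_graph(features, keep_ratio=0.5):
    """Return (nodes, edges, threshold); an edge joins nodes with |corr| >= threshold."""
    corr = correlation_matrix(features)
    threshold = sparsity_threshold(corr, keep_ratio)
    edges = sorted((u, v) for (u, v), c in corr.items()
                   if u < v and abs(c) >= threshold)
    return sorted(features), edges, threshold


if __name__ == "__main__":
    # Hypothetical per-session counts for four behavior nodes.
    counts = {
        "open_home": [1, 2, 3, 4],
        "search": [2, 4, 6, 8],
        "pay": [1, 0, 0, 1],
        "share": [0, 1, 0, 1],
    }
    print(build_sequence_graph(counts))
```

A quantile-style threshold keeps the graph density roughly constant across users; a fixed absolute cutoff would be an equally plausible reading of the claim.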

8. The method of claim 7, the method further comprising:

acquiring a data sample for detecting whether the target application program has the preset operational risk, the data sample comprising at least historical access log data of a user before the user uses the target application program;

determining, based on the historical access log data, a behavior sequence data sample of the user before the user uses the target application program, and determining, based on the historical access log data and the behavior sequence data sample, a sequence diagram structure data sample constructed from the different data nodes contained in the behavior sequence data sample;

performing, based on the behavior sequence data sample, relative position encoding on the different data nodes contained in the behavior sequence data sample through the sequence encoding sub-model in the encoder, and passing the hidden state corresponding to the data sample at the current processing time point in the behavior sequence data sample to the next time point through the recurrence strategy in the sequence encoding sub-model, so as to encode the behavior sequence data sample and obtain a sample sequence representation corresponding to the behavior sequence data sample;

inputting the sequence diagram structure data sample into the sequence diagram encoding sub-model in the encoder, and determining, through an attention module in the sequence diagram encoding sub-model, the association relationships between entities of different types and between entities of the same type in the sequence diagram structure data sample, so as to encode the sequence diagram structure data sample and obtain a sample sequence diagram structure representation corresponding to the sequence diagram structure data sample;

and decoding the sample sequence representation and the sample sequence diagram structure representation through a decoder corresponding to the encoder, and jointly training the encoder and the decoder based on the decoding result and a preset loss function to obtain the trained encoder.
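
To make the joint-training step concrete, the toy sketch below fits a linear encoder and a linear decoder together by gradient descent. It only illustrates the idea that one loss computed on the decoder output drives updates to both parts. The mean squared reconstruction loss, the two-dimensional inputs, and the hyperparameters are assumptions for this example; the claim leaves the decoder and the preset loss function unspecified.

```python
def train_autoencoder(samples, epochs=500, lr=0.02):
    """Jointly fit a linear encoder (2-D -> 1-D) and decoder (1-D -> 2-D).

    Returns (encoder_weights, decoder_weights, loss_history).
    """
    enc = [0.1, 0.1]  # encoder weights
    dec = [0.1, 0.1]  # decoder weights
    n = len(samples)
    history = []
    for _ in range(epochs):
        g_enc, g_dec, loss = [0.0, 0.0], [0.0, 0.0], 0.0
        for x in samples:
            z = enc[0] * x[0] + enc[1] * x[1]      # encode
            x_hat = [dec[0] * z, dec[1] * z]       # decode
            err = [x_hat[0] - x[0], x_hat[1] - x[1]]
            loss += (err[0] ** 2 + err[1] ** 2) / n
            # Gradient of the loss with respect to the code z.
            dz = 2 * (err[0] * dec[0] + err[1] * dec[1]) / n
            for k in range(2):
                g_dec[k] += 2 * err[k] * z / n
                g_enc[k] += dz * x[k]
        history.append(loss)
        # A single step updates encoder and decoder together.
        enc = [w - lr * g for w, g in zip(enc, g_enc)]
        dec = [w - lr * g for w, g in zip(dec, g_dec)]
    return enc, dec, history


if __name__ == "__main__":
    data = [(-1.0, -2.0), (-0.5, -1.0), (0.5, 1.0), (1.0, 2.0)]
    _, _, losses = train_autoencoder(data)
    print(losses[0], losses[-1])
```

After training, only the encoder would be kept for downstream risk detection, which matches the claim's wording that the result of joint training is the trained encoder.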

9. A risk detection apparatus for an application, the apparatus comprising:

a data acquisition module, configured to acquire target data for detecting whether a target application program has a preset operational risk, the target data comprising at least access log data of a user before the user uses the target application program;

a data processing module, configured to determine, based on the access log data, behavior sequence data of the user before the user uses the target application program, and to determine, based on the access log data and the behavior sequence data, sequence diagram structure data constructed from the different data nodes contained in the behavior sequence data;

a sequence encoding module, configured to perform, based on the behavior sequence data, relative position encoding on the different data nodes contained in the behavior sequence data through a sequence encoding sub-model in a pre-trained encoder, and to pass the hidden state corresponding to the data at the current processing time point in the behavior sequence data to the next time point through a recurrence strategy in the sequence encoding sub-model, so as to encode the behavior sequence data and obtain a sequence representation corresponding to the behavior sequence data;

a diagram encoding module, configured to input the sequence diagram structure data into a sequence diagram encoding sub-model in the encoder, and to determine, through an attention module in the sequence diagram encoding sub-model, the association relationships between entities of different types and between entities of the same type in the sequence diagram structure data, so as to encode the sequence diagram structure data and obtain a sequence diagram structure representation corresponding to the sequence diagram structure data;

and a risk determining module, configured to determine whether the target application program has the preset operational risk based on the sequence representation and the sequence diagram structure representation.

10. A risk detection device for an application program, the risk detection device comprising:

A processor; and

A memory arranged to store computer executable instructions that, when executed, cause the processor to: