CN119863295B

CN119863295B - Risky store cluster detection method and its device, equipment, and medium

Info

Publication number: CN119863295B
Application number: CN202411978733.2A
Authority: CN
Inventors: 梁轩
Original assignee: Guangzhou Shangyan Network Technology Co ltd
Current assignee: Guangzhou Shangyan Network Technology Co ltd
Priority date: 2024-12-30
Filing date: 2024-12-30
Publication date: 2025-09-23
Anticipated expiration: 2044-12-30
Also published as: CN119863295A

Abstract

The application relates to a risk store station group detection method and a device, equipment, and medium thereof. The method comprises: marking a store as a risk store when the number of transaction-abnormal commodities in the store meets a preset condition, and collecting a plurality of risk stores to form a risk store set; calculating the commodity total number ratio of every two risk stores in the risk store set, and marking the two risk stores as a suspected risk store pair when the ratio falls within a preset ratio interval, the suspected risk store pairs forming a suspected risk store pair set; filtering some suspected risk store pairs out of the suspected risk store pair set; calculating the commodity similarity coverage rate of the two risk stores corresponding to each suspected risk store pair; and determining the risk station group based on the commodity similarity coverage rates of all suspected risk store pairs in the set. Through a multi-stage screening mechanism, the application narrows the detection range step by step and finally identifies the risk store station group accurately.

Description

Risk shop station group detection method and device, equipment and medium thereof

Technical Field

The application relates to the technical field of electronic commerce, in particular to a detection method for a station group of a risk store, and a device, equipment and medium thereof.

Background

An e-commerce platform usually hosts a large number of online stores, each actually operated by a different operating entity, and the differences among operators and their operating modes cause the stores to present different risk levels. The risk level of an online store is an abstract description of the store's operating health and is generally reflected in its operating credit: a store with a low risk level has good operating credit, whereas a store with a high risk level, that is, a risk store, has relatively poor operating credit. The risk level of the online stores on an e-commerce platform therefore largely determines the platform's survival.

When identifying risk stores, the prior art generally relies on independent analysis of a single store. Specifically, it mainly applies preset rules or models to assess and classify the risk of each store individually. However, to circumvent platform supervision, disperse risk, or carry out other improper actions, some operators set up multiple online stores. These stores appear to operate independently, with different store names and business subjects, but are in fact controlled by the same business entity. In this case, when a risk arises in one store, failing to promptly identify and manage the store station group to which that store belongs exposes the e-commerce platform to large potential risks and losses.

Therefore, how to design a detection method capable of identifying a risk shop station group becomes a technical problem to be solved by the e-commerce platform.

Disclosure of Invention

The primary object of the present application is to solve at least one of the above problems and provide a method for detecting a group of risk shops, and a device, apparatus and medium thereof.

In order to meet the purposes of the application, the application adopts the following technical scheme:

The application provides a detection method for a risk shop station group, which is suitable for one of the purposes of the application, and comprises the following steps:

Marking a store as a risk store when the number of transaction-abnormal commodities in the store meets a preset condition, and collecting a plurality of risk stores to form a risk store set;

Calculating the commodity total number ratio of every two risk stores in the risk store set, and marking the two risk stores as a suspected risk store pair when the ratio falls within a preset ratio interval, the suspected risk store pairs forming a suspected risk store pair set;

Filtering some suspected risk store pairs out of the suspected risk store pair set based on the number of transaction-abnormal commodities corresponding to each suspected risk store pair and a preset similarity determination rule;

Calculating, based on the filtered suspected risk store pair set, the commodity similarity coverage rate of the two risk stores corresponding to each suspected risk store pair, and determining the risk station group stores based on the commodity similarity coverage rates of all suspected risk store pairs in the set.

In another aspect, a risk store station group detection device according to one of the objects of the present application includes:

The risk store marking module is used for marking a store as a risk store when the number of transaction-abnormal commodities in the store meets a preset condition, and collecting a plurality of risk stores to form a risk store set;

The risk store pair forming module is used for calculating the commodity total number ratio of every two risk stores in the risk store set, and marking the two risk stores as a suspected risk store pair when the ratio falls within a preset ratio interval, the suspected risk store pairs forming a suspected risk store pair set;

The risk store pair filtering module is used for filtering some suspected risk store pairs out of the suspected risk store pair set based on the number of transaction-abnormal commodities corresponding to each suspected risk store pair and a preset similarity determination rule;

The risk store station group determining module is used for calculating, based on the filtered suspected risk store pair set, the commodity similarity coverage rate of the two risk stores corresponding to each suspected risk store pair, and determining the risk station group stores based on the commodity similarity coverage rates of all suspected risk store pairs in the set.

In yet another aspect, a computer device adapted to one of the objects of the present application comprises a central processor and a memory, the central processor being configured to invoke and run a computer program stored in the memory so as to perform the steps of the risk store station group detection method of the present application.

In yet another aspect, a computer program product is provided adapted to another object of the application, comprising a computer program/instruction which, when executed by a processor, carries out the steps of the method described in any of the embodiments of the application.

The technical scheme of the application has various advantages, including but not limited to the following aspects:

On the one hand, the application comprehensively considers the relevance among a plurality of risk shops, and can more effectively identify the risk shop station group controlled by the same business entity, thereby not only improving the detection coverage range, but also more accurately capturing the behavior of a merchant for dispersing risks by operating a plurality of shops.

On the other hand, a multi-stage screening mechanism narrows the detection range step by step until the risk station group stores are determined. Specifically, risk stores are first screened out according to the preset condition to form a risk store set, and suspected risk store pairs are then screened out, based on the commodity total number ratio of every two risk stores in that set, to form a suspected risk store pair set. In other words, quantifying the commodity scale relationship between stores gives a preliminary identification of risk store station groups that may be controlled by the same business entity. The suspected risk store pairs are then filtered based on the number of transaction-abnormal commodities and the preset similarity determination rule, so that pairs failing the rule are removed and only the more strongly correlated pairs remain, which reduces the false alarm rate and improves detection accuracy. Finally, by calculating the commodity similarity coverage rate and determining the risk station group stores, risk station groups controlled by the same business entity can be identified comprehensively.

According to the application, the detection range is gradually reduced through a multi-stage screening mechanism, the risk shop station group controlled by the same business entity is finally and accurately identified, the false alarm rate is effectively reduced, the detection accuracy and reliability are improved, and more efficient risk shop station group detection is provided for an e-commerce platform.

Drawings

The foregoing and/or additional aspects and advantages of the application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a flow chart of an exemplary embodiment of a method for detecting a group of risk stores according to the present application;

FIG. 2 is a schematic flow chart of marking a risk store according to the forbidden-goods proportion in an embodiment of the application;

FIG. 3 is a schematic flow chart of marking a risk store according to a fraud risk ratio in an embodiment of the present application;

FIG. 4 is a flow chart of filtering suspected risk store pairs based on the number of abnormal articles in a transaction according to an embodiment of the present application;

FIG. 5 is a schematic flow chart of determining whether the first similarity determination condition is met in an embodiment of the present application;

FIG. 6 is a schematic flow chart of determining risk station group stores based on commodity similarity coverage in an embodiment of the present application;

FIG. 7 is a schematic flow chart of determining similar coverage of commodities in an embodiment of the present application;

FIG. 8 is a schematic block diagram of a risk store station group detection apparatus of the present application;

FIG. 9 is a schematic structural diagram of a computer device used in the present application.

Detailed Description

The risk store station group detection method of the application can be implemented as a computer program product and deployed on a server. In the exemplary application scenario of the application, it can be deployed on a server of an e-commerce platform, which may be an e-commerce platform that offers an open independent-station service. An independent station is a new type of official website built on a SaaS technology platform. It has its own domain name, proprietary content, data, and rights and interests, holds independent operating authority and operating responsibility, is supported by social cloud computing capability, and can freely and independently integrate with third-party software tools, promotional media, and channels. A commodity seller typically sets up and operates an online store on an independent station so that the store can display its commodities there and provide e-commerce services to commodity buyers (i.e., users), such as commodity search, adding to a shopping cart, order placement, checkout, logistics delivery, and after-sales service. Buyers carry out these interactions with the e-commerce platform on a terminal device.

In some embodiments of the present application, goods carrying a forbidden or restricted-sale risk and goods carrying a fraud risk are regarded as transaction-abnormal commodities. Goods with a forbidden or restricted-sale risk (hereinafter called forbidden goods) are goods whose sale the e-commerce platform explicitly restricts or prohibits, such as contraband, illegal goods, and dangerous goods; in the field of cross-border e-commerce, forbidden goods can also include seafood, agricultural products, drinks, and the like. Goods with a fraud risk (hereinafter called fraudulent goods) are goods associated with improper practices such as false delivery or abnormally low pricing. When listing fraudulent goods, a merchant typically attracts consumers with a price far below the market standard price so that they overlook doubts about the goods' authenticity. Fraudulent goods are usually counterfeit or inferior products with extremely low cost, which merchants sell at low prices to acquire profit quickly, evade platform supervision, or induce consumers to place orders.

In an exemplary embodiment of the present application, the number of transaction-abnormal commodities in a store is obtained, whether the store is a risk store is determined from that number, and a plurality of risk stores are collected to form a risk store set. In the risk stores of this set, transaction-abnormal commodities account for a high proportion of the total number of commodities, which matches the characteristics of stores belonging to a risk store station group. Moreover, if one store controlled by a business entity is a risk store, the other stores controlled by that entity are very likely to be risk stores as well. Next, the commodity total number ratio is calculated for every two risk stores in the risk store set, and when the ratio falls within a preset ratio interval the two risk stores are marked as a suspected risk store pair. This step relies on the observation that risk stores in the same station group tend to carry similar numbers of commodities: an entity that controls several stores usually applies a unified operating strategy and commodity listing pattern, so the stores' total commodity counts fluctuate within a limited range. By calculating the commodity total number ratio and setting a reasonable preset ratio interval, suspected risk store pairs with similar characteristics can be identified effectively.

The similarity and relatedness of the two stores in each suspected risk store pair are then further evaluated, based on the number of transaction-abnormal commodities of the pair and a preset similarity determination rule, to confirm whether they belong to the same risk store station group. In some embodiments, the similarity determination rule is preset based on the first commodity categories, the forbidden-goods risk ratio, the second commodity categories, and the fraudulent-goods risk ratio; the specific steps are described in the detailed embodiments below and are not repeated here. Through this step, suspected risk store pairs that do not satisfy the preset similarity determination rule are filtered out of the suspected risk store pair set. The similarity determination rule is preset by those skilled in the art as required.

Finally, risk station group stores are determined by calculating the commodity similarity coverage rate of the two risk stores in each suspected risk store pair. In one embodiment, each store in the suspected risk store pair set is marked as a node in a risk station group store graph, and when the commodity similarity coverage rate of two stores is greater than a preset coverage rate threshold, an edge is established between them; the resulting graph is used to determine the risk station group to which each store belongs. The coverage rate threshold is set by those skilled in the art based on business data or experience.

The risk shop station group detection method of the application can be programmed into a computer program product and deployed in a server for operation, for example, in the exemplary application scenario of the application, the method can be deployed in the server of an e-commerce customer service platform, thereby being executed by accessing an interface opened after the computer program product is operated, and performing man-machine interaction with the process of the computer program product through a graphical user interface.

Referring to fig. 1, the method for detecting a risk shop station group according to the present application, in an exemplary embodiment, includes the following steps:

Step S5100, when the commodity number of the abnormal commodity traded in the shops meets the preset condition, marking the shops as risk shops, and collecting a plurality of risk shops to form a risk shop set;

In one embodiment, commodities with a forbidden or restricted-sale risk are taken as the transaction-abnormal commodities, and the preset condition is that the forbidden-goods proportion is greater than a preset proportion threshold. The forbidden-goods proportion is obtained by dividing the number of commodities in the store that carry a forbidden or restricted-sale risk by the total number of commodities in the store, and the proportion threshold is set by a person skilled in the art as required. Specifically, every valid commodity in the store is checked for forbidden or restricted-sale risk, and a corresponding risk label is added to each commodity found to carry that risk. Once the labels have been added, the number of labels in the store can be counted directly to obtain the number of transaction-abnormal commodities. When this number meets the preset condition, that is, when the forbidden-goods proportion is greater than the preset proportion threshold, the store is marked as a risk store.

In another embodiment, commodities with a fraud risk are taken as the transaction-abnormal commodities, and the preset condition is that the fraudulent-goods proportion is greater than a preset second proportion threshold. When the number of commodities with a fraud risk in the store meets this condition, the store is marked as a risk store. In yet another embodiment, both commodities with a forbidden or restricted-sale risk and commodities with a fraud risk are taken as transaction-abnormal commodities, and both preset conditions must be met: the store is marked as a risk store only when the forbidden-goods proportion is greater than the preset proportion threshold and the fraudulent-goods proportion is greater than the preset second proportion threshold.

After the stores to be examined have been checked and the risk stores among them marked, a plurality of risk stores are collected to form a risk store set. To make the later station group determination comprehensive, the risk store set can be built by checking and marking every store on the platform. This step uses a screening mechanism based on transaction-abnormal commodities to pick out potential risk stores and form the risk store set, providing the data basis for the subsequent risk store station group detection.
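The marking logic of step S5100 can be sketched as follows. This is a minimal illustration, not the implementation disclosed in the application: the `StoreStats` record, the `mode` parameter, and the threshold values of 0.3 are assumptions introduced here, and the per-store counts are assumed to have been produced already by the risk labelling described above.

```python
from dataclasses import dataclass


@dataclass
class StoreStats:
    """Hypothetical per-store counts, assumed to come from prior risk labelling."""
    store_id: str
    total_items: int      # all valid commodities in the store
    forbidden_items: int  # commodities labelled with a forbidden or restricted-sale risk
    fraud_items: int      # commodities labelled with a fraud risk


def is_risk_store(s, mode="forbidden", forbidden_threshold=0.3, fraud_threshold=0.3):
    """Apply one of the three embodiments: 'forbidden', 'fraud', or 'both'.

    The thresholds are placeholders; the text leaves them to the practitioner.
    """
    if s.total_items == 0:
        return False  # no commodities, so no proportion can be formed
    forbidden_hit = s.forbidden_items / s.total_items > forbidden_threshold
    fraud_hit = s.fraud_items / s.total_items > fraud_threshold
    if mode == "forbidden":
        return forbidden_hit
    if mode == "fraud":
        return fraud_hit
    if mode == "both":
        return forbidden_hit and fraud_hit
    raise ValueError(f"unknown mode: {mode}")


def build_risk_store_set(stores, **kwargs):
    """Collect the ids of all stores marked as risk stores."""
    return [s.store_id for s in stores if is_risk_store(s, **kwargs)]
```

Calling `build_risk_store_set(stores, mode="both")` would keep only stores that exceed both proportion thresholds, which corresponds to the strictest of the three embodiments.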

Step S5200, calculating commodity total number ratios of every two risk shops in the risk shop set, and marking the corresponding two risk shops as suspected risk shop pairs when the commodity total number ratios are in a preset ratio interval, wherein the suspected risk shop pairs are used for forming a suspected risk shop pair set;

For every two risk stores in the risk store set, the commodity total number ratio of the two stores is calculated and compared with a preset ratio interval. The preset ratio interval is set in advance according to platform rules and historical data and is used to judge whether two risk stores are similar in commodity scale. When the ratio falls within the preset ratio interval, the two risk stores are marked as a suspected risk store pair. In this way, pairs that are similar in commodity scale are selected to form the suspected risk store pair set.

The risk stores in a risk store station group are operated by the same entity or by related entities, so their commodity counts are often highly consistent. Screening for store pairs whose commodity total number ratio lies within the preset interval therefore identifies risk stores that may belong to the same station group and provides more accurate data support for the subsequent station group detection.
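A minimal sketch of the pairwise screening in step S5200 is given below. The text does not say how the ratio is oriented or what the interval is, so the sketch assumes the ratio is the first store's total divided by the second's and uses a placeholder interval of [0.8, 1.25]; the function name `suspected_pairs` is likewise hypothetical.

```python
from itertools import combinations


def suspected_pairs(totals, low=0.8, high=1.25):
    """Return store-id pairs whose commodity total number ratio lies in [low, high].

    `totals` maps a store id to its total commodity count. The interval bounds
    are assumed values; the text leaves them to platform rules and history.
    """
    pairs = []
    # Sort so that the pair order is deterministic.
    for (a, n_a), (b, n_b) in combinations(sorted(totals.items()), 2):
        if n_a == 0 or n_b == 0:
            continue  # a ratio is undefined for an empty store
        if low <= n_a / n_b <= high:
            pairs.append((a, b))
    return pairs
```

Because every unordered pair is visited once, the cost grows quadratically with the size of the risk store set, which is one reason the earlier risk store marking is useful as a first filter.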

Step S5300, filtering out part of the suspected risk shop pairs from the suspected risk shop pair set based on the commodity quantity of the corresponding abnormal commodity and a preset similarity judgment rule;

In one embodiment, for each suspected risk store pair in the suspected risk store pair set, the forbidden-goods risk ratio and the fraudulent-goods risk ratio of each store are calculated from the pair's numbers of transaction-abnormal commodities (the number of forbidden goods and the number of fraudulent goods). Each pair is then screened in two stages against the preset similarity determination conditions: the first stage judges similarity in terms of forbidden goods, and the second stage judges similarity in terms of fraudulent goods. The application does not limit the order of the two stages, and changing the order does not depart from the inventive spirit of the application.

In the first-stage similarity judgment, the difference between the forbidden-goods risk ratios of the two stores is calculated first, taken as the absolute value of the difference. This difference is compared with a preset difference threshold; if it is smaller than the threshold, the two stores are preliminarily judged to be similar in forbidden-goods risk ratio. The forbidden-goods categories of the two stores are then analyzed: for each store, the several commodity categories containing the most forbidden goods (the first commodity categories) are obtained, and the coincidence rate of the two stores' first commodity categories is determined. The coincidence rate is the number of categories shared by the two stores' first commodity categories divided by the total number of first commodity categories of either store. If the coincidence rate is greater than a preset coincidence rate threshold, the two stores are judged to be highly similar in forbidden-goods categories and to conform to the preset similarity determination rule. If both the risk ratio difference and the category coincidence rate meet the preset conditions, the pair proceeds to the second-stage similarity judgment; otherwise the suspected risk store pair is filtered out of the suspected risk store pair set.
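The first-stage check can be illustrated with the sketch below. It is an assumption-laden example rather than the disclosed implementation: the helper names, the number of top categories `k`, and both thresholds are hypothetical, and the coincidence rate is divided by the size of the first store's category set on the assumption that both stores contribute the same number of top categories.

```python
from collections import Counter


def top_categories(item_categories, k=3):
    """Return the k categories that contain the most flagged commodities."""
    return {cat for cat, _ in Counter(item_categories).most_common(k)}


def passes_stage(ratio_a, ratio_b, cats_a, cats_b,
                 diff_threshold=0.1, coincidence_threshold=0.5):
    """Check one screening stage for a suspected risk store pair.

    ratio_a / ratio_b: risk ratios of the two stores (flagged / total).
    cats_a / cats_b: top-category sets of the two stores.
    Both thresholds are placeholder values.
    """
    # Condition 1: the absolute risk-ratio difference must be below the threshold.
    if abs(ratio_a - ratio_b) >= diff_threshold:
        return False
    if not cats_a:
        return False  # nothing to compare
    # Condition 2: shared categories divided by one store's category count.
    coincidence = len(cats_a & cats_b) / len(cats_a)
    return coincidence > coincidence_threshold
```

The same function could serve the second stage by passing fraudulent-goods ratios and categories instead, since the text describes the two stages as structurally parallel.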

In the second-stage similarity judgment, pairs are further screened based on the similarity of their fraudulent goods. For each store, the several commodity categories containing the most fraudulent goods (the second commodity categories) are obtained and the fraudulent-goods risk ratio is calculated. The second commodity categories and fraudulent-goods risk ratios of the two stores are then analyzed to judge whether the pair conforms to the preset similarity determination rule. For example, a coincidence rate threshold for the categories and a difference threshold for the fraudulent-goods risk ratio may be set; if the two stores meet the preset conditions on both, the pair is considered to conform to the preset similarity determination rule, and otherwise it is filtered out of the suspected risk store pair set. The specific steps are described in the embodiments below and are not repeated here.

This two-stage screening mechanism removes suspected risk store pairs that do not meet the preset similarity determination conditions and retains pairs that are highly similar in both forbidden goods and fraudulent goods. The fine-grained screening improves the accuracy of risk store station group detection and provides a more reliable data basis for the subsequent station group determination. Parameters such as the preset difference threshold and the preset coincidence rate threshold can also be adjusted to actual business needs so that the method suits risk detection in different scenarios, which further improves its flexibility and applicability.

Step S5400, calculating, based on the filtered suspected risk store pair set, the commodity similarity coverage rate of the two risk stores corresponding to each suspected risk store pair, and determining the risk station group stores based on the commodity similarity coverage rates of all suspected risk store pairs in the set.

In one embodiment, for each suspected risk store pair in the filtered set, the commodity titles of the two stores are converted into vector representations to obtain commodity title vectors. The conversion can be implemented with natural language processing techniques: for example, a word embedding model (such as Word2Vec or BERT) maps the words of a commodity title to vectors, and a weighted average or pooling operation combines them into a single commodity title vector. Cosine similarity is then calculated between the commodity title vectors of the two stores to obtain a commodity similarity matrix. Cosine similarity is the dot product of two commodity title vectors divided by the product of their norms; its value lies in [-1, 1], and the closer it is to 1, the more semantically similar the two commodity titles are.
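The cosine similarity and the resulting matrix can be sketched in plain Python as follows. The sketch assumes the title vectors have already been produced by some embedding model, which is not shown; the function names are hypothetical, and returning 0.0 for a zero vector is a convention chosen here, not something the text specifies.

```python
import math


def cosine(u, v):
    """Dot product of u and v divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    # Assumed convention: similarity with a zero vector is treated as 0.0.
    return dot / norm if norm else 0.0


def similarity_matrix(vecs_a, vecs_b):
    """Entry [i][j] is the cosine similarity between title i of store A
    and title j of store B."""
    return [[cosine(u, v) for v in vecs_b] for u in vecs_a]
```

For stores with many commodities, a vectorized library would normally replace these loops; the pure-Python form is used only to keep the arithmetic visible.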

The commodity similarity coverage rate of the two stores is determined from the commodity similarity matrix. Specifically, for each commodity title vector of one store, the maximum cosine similarity between it and the commodity title vectors of the other store is found in the matrix and compared with a preset similarity threshold. The number of commodities whose maximum cosine similarity is greater than the similarity threshold is counted and divided by the total number of commodities of either of the two stores to obtain the commodity similarity coverage rate. This rate reflects how similar the two stores are in commodity title semantics: the higher the coverage rate, the more likely the two stores' commodities belong to the same categories or share the same operating characteristics.
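The coverage calculation can be illustrated as below. The sketch takes a precomputed similarity matrix whose rows correspond to one store's commodities and whose columns correspond to the other's; the function name, the similarity threshold of 0.8, and the choice of passing the denominator explicitly are assumptions made for illustration.

```python
def similar_coverage(sim_matrix, total_items, sim_threshold=0.8):
    """Share of commodities whose best match in the other store exceeds the threshold.

    sim_matrix: rows are commodities of one store, columns those of the other.
    total_items: total commodity count of either store, used as the denominator.
    sim_threshold: placeholder value; the text leaves it to be preset.
    """
    if total_items == 0:
        return 0.0
    # Count rows whose maximum cosine similarity is above the threshold.
    covered = sum(1 for row in sim_matrix if row and max(row) > sim_threshold)
    return covered / total_items
```

With four commodities in the first store, two of which have a best match above 0.8, the function returns 0.5 when `total_items` is 4.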

In another embodiment, a risk station group store graph is constructed based on the commodity similarity coverage rates of all suspected risk store pairs. Each store is taken as a node in the graph, and when the commodity similarity coverage rate of two stores is greater than a preset coverage rate threshold, an edge is established between them. In this way, highly similar store pairs are connected to form the risk station group store graph. Based on this graph, tightly connected subgraphs can be identified with a graph analysis algorithm (for example, a community detection algorithm) to determine the set of potential risk station group stores. This graph construction based on commodity similarity coverage rate makes it possible to identify risk store station groups operated by the same entity or by related entities, supporting risk management on the e-commerce platform.
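The graph construction can be sketched as follows. Note that the text names community detection as one possible analysis; this sketch substitutes plain connected components as the simplest stand-in, and the function name and the coverage threshold of 0.6 are assumptions.

```python
from collections import defaultdict


def station_groups(pair_coverage, coverage_threshold=0.6):
    """Group stores by connected components of the coverage graph.

    pair_coverage maps a (store_a, store_b) tuple to its coverage rate.
    Connected components are used here in place of community detection.
    """
    graph = defaultdict(set)
    for (a, b), coverage in pair_coverage.items():
        if coverage > coverage_threshold:
            graph[a].add(b)  # undirected edge between the two stores
            graph[b].add(a)

    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        groups.append(sorted(component))
    return groups
```

Stores that never exceed the threshold with any partner do not appear in the output. A stricter grouping could be obtained by swapping in a community detection algorithm, at the cost of an extra dependency.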

In another embodiment, a large number of stores must be examined at the same time, so computing commodity vector representations in real time and analyzing the commodity similarity coverage rate for every pair of stores would be inefficient. In this embodiment, all commodity titles are preprocessed in advance, and the corresponding commodity title vectors are obtained and stored offline. An index is also built for each commodity from its store identifier and commodity serial number so that it can be retrieved quickly. During risk store station group detection, commodity data is divided into historical commodity data and newly listed commodity data. For historical commodity data, the offline-stored commodity title vectors are fetched directly through the commodity index. For newly listed commodity data, the commodity title vectors are computed, stored offline, and the index is updated accordingly. This markedly reduces the time cost of real-time computation and provides efficient technical support for large-scale risk store station group detection.
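The offline storage and indexing idea can be illustrated with a small cache keyed by store identifier and commodity serial number. This is only a sketch: the class `TitleVectorStore` is hypothetical, an in-memory dictionary stands in for whatever offline storage the platform uses, and `toy_embed` is a placeholder for a real embedding model.

```python
class TitleVectorStore:
    """Cache of commodity title vectors indexed by (store_id, item_no)."""

    def __init__(self, embed):
        self._embed = embed  # function mapping a title string to a vector
        self._vectors = {}   # stand-in for offline storage plus index
        self.computed = 0    # number of embedding calls actually made

    def get(self, store_id, item_no, title):
        key = (store_id, item_no)
        if key not in self._vectors:
            # Newly listed commodity: compute once, store, and index it.
            self._vectors[key] = self._embed(title)
            self.computed += 1
        # Historical commodity: served directly from the index.
        return self._vectors[key]


def toy_embed(title):
    """Placeholder embedding: title length and number of spaces."""
    return [float(len(title)), float(title.count(" "))]
```

Requesting the same commodity twice triggers only one embedding call, which is the saving the embodiment aims for.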

As can be appreciated from the exemplary embodiments of the present application, the technical solution of the present application has various advantages, including but not limited to the following aspects:

On the one hand, a multi-stage screening mechanism narrows the detection range step by step until the risk station group stores are determined. Specifically, risk stores are first screened out according to the preset condition to form a risk store set, and suspected risk store pairs are then screened out, based on the commodity total number ratio of every two risk stores in that set, to form a suspected risk store pair set. In other words, quantifying the commodity scale relationship between stores gives a preliminary identification of risk store station groups that may be controlled by the same business entity. The suspected risk store pairs are then filtered based on the number of transaction-abnormal commodities and the preset similarity determination rule, so that pairs failing the rule are removed and only the more strongly correlated pairs remain, which reduces the false alarm rate and improves detection accuracy. Finally, by calculating the commodity similarity coverage rate and determining the risk station group stores, risk station groups controlled by the same business entity can be identified comprehensively. The application thus provides an e-commerce platform with more efficient, accurate, and reliable risk store station group detection.

On the other hand, storing commodity title vectors offline greatly reduces the real-time calculation requirement and, particularly when large-scale commodity data are processed, avoids repeated calculation and wasted resources. The offline storage and indexing mechanisms also enhance the scalability and flexibility of the present application. As the number of commodities on the platform continues to grow, the accuracy and timeliness of risk shop station group detection can be ensured by periodically updating the commodity title vectors and indexes stored offline. The technical scheme of the application can therefore adapt to the continuously changing data scale and business requirements of the e-commerce platform and provide continuous and reliable risk management support for the platform.

In a further embodiment, referring to fig. 2, when the number of the abnormal commodity in the store meets a preset condition, marking the store as a risk store, and collecting a plurality of risk stores to form a risk store set, including the following steps:

Step S5110, acquiring commodity titles of all effective commodities in a store, wherein the commodity titles comprise commodity main titles and commodity sub-titles;

Commodity information of the target shop can be accessed through a data interface or a database of the e-commerce platform, and the commodity titles of all effective commodities can be extracted from it. An effective commodity is a commodity currently on sale, that is, one that has not been taken off the shelf or deleted. The commodity title consists of a commodity main title and a commodity subtitle. The main title is the core description of the commodity and summarizes its main characteristics, such as brand, model, and function. The subtitle supplements the main title and may include promotion information, specification parameters, or other details. When extracting commodity titles, the main title and the subtitle should be acquired together to keep the commodity information complete and accurate. For some commodities the subtitle may be empty, in which case only the main title is used as the commodity title.

In one embodiment, the main title and the subtitle of the commodity are spliced together to form a new text. When the commodity title is in English, for example, all text is converted to lower case to eliminate the influence of case differences on text analysis. The text may then be segmented by means of BPE (Byte Pair Encoding), and the segmentation result is mapped to a sequence of integers by means of a predefined vocabulary. Each integer in the sequence corresponds to one token of the original text, thereby converting the text into a numerical form that can be processed by the subsequent model. Finally, the processed text is input into a preset text pre-training model to extract text features, which are represented as a vector. The text pre-training model employs a contrastive learning framework so that similar texts are closer in vector space. The vector obtained from the text pre-training model is L2-normalized so that the sum of squares of all its components is 1, which ensures the consistency and comparability of vectors in subsequent calculations.
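The title assembly at the start of this pipeline (splicing the main title and the subtitle, falling back to the main title when the subtitle is empty, and lower-casing English text) might look like the sketch below. The function name, the single-space separator, and the sample titles are illustrative assumptions. Tokenization and the pre-trained model are deliberately left out.

```python
def build_title_text(main_title: str, subtitle: str = "") -> str:
    """Splice main title and subtitle into one lower-cased text.

    The single-space separator is an assumption made for this example.
    """
    parts = [main_title.strip()]
    if subtitle and subtitle.strip():
        parts.append(subtitle.strip())
    # Lower-casing removes case differences before tokenization.
    return " ".join(parts).lower()


# Sample titles are made up for illustration.
full = build_title_text("ACME Wireless Mouse M1", "Free Shipping 2.4GHz")
only_main = build_title_text("ACME Wireless Mouse M1", "")
```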

Step S5120, inputting the commodity title into a pre-trained limited sales risk detection model, and determining whether an effective commodity corresponding to the commodity title has limited sales risk, wherein the effective commodity with limited sales risk is characterized as a commodity which is limited or forbidden to be sold by an electronic commerce platform where a store is located;

The acquired commodity titles (including the commodity main titles and the commodity subtitles) are input into a pre-trained forbidden and limited sales risk detection model. In one embodiment, the forbidden and limited sales risk detection model is constructed based on natural language processing technology and machine learning algorithms, and can perform semantic analysis and forbidden and limited sales risk classification on commodity titles. The training data of the model comprise a large number of labeled commodity title samples, in which positive samples are titles of commodities whose sale is prohibited or restricted by the platform, and negative samples are titles of compliant commodities. Through training, the model learns key features and semantic patterns of forbidden and limited sales commodity titles, such as vocabulary and descriptions related to contraband, illegal commodities, dangerous goods, and the like. The commodity title obtained in the previous step is input into the model, and the model outputs a risk probability value indicating how likely the commodity title is to carry a forbidden and limited sales risk. The risk probability value is compared with a preset risk threshold. If the risk probability value is larger than the risk threshold, the effective commodity corresponding to the commodity title is judged to have a forbidden and limited sales risk, and a forbidden and limited sales risk label is added to that commodity. The risk threshold is set as desired by a person skilled in the art. Through this step, forbidden and limited sales commodities that may exist in the store can be identified efficiently and accurately.

Step S5130, counting the commodity number of the abnormal commodity with the forbidden sale risk in the store, and obtaining the forbidden sale commodity occupation ratio of the store based on the commodity number of the abnormal commodity and the total number of the effective commodity of the store;

In one embodiment, all valid commodities in the store are traversed to check whether each is marked as having a forbidden and limited sales risk. The number of transaction abnormal commodities with forbidden and limited sales risk in the store is counted by querying the forbidden and limited sales risk label. The forbidden limit commodity ratio is then calculated from this count and the total number of effective commodities of the store: forbidden limit commodity ratio = (number of transaction abnormal commodities with forbidden and limited sales risk / total number of effective commodities in the store) × 100%. The forbidden limit commodity ratio reflects the relative proportion of forbidden and limited sales commodities in the store and is an important index for evaluating the risk degree of the store.

Step S5140, marking the store as a risk store when the forbidden limit commodity ratio is larger than a preset proportion threshold value, and acquiring a plurality of risk stores to form a risk store set.

And comparing the calculated forbidden limit commodity proportion with a preset proportion threshold value. The preset proportion threshold is set by a person skilled in the art according to rules, historical data and/or risk management requirements of the e-commerce platform and is used for judging whether the store has high forbidden and limited sales risk. For example, assuming that the preset proportion threshold is 3%, if the rate of the forbidden articles of a store is 5%, the preset proportion threshold is exceeded, which indicates that the store has a high forbidden articles risk, the store is marked as a risk store, and in one embodiment, a risk store label is added to the store with the forbidden articles rate greater than the preset proportion threshold. And traversing all shops on the platform, screening out shops with the forbidden and limited commodity proportion larger than a preset threshold value, and forming a risk shop set.
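Steps S5120 to S5140 can be illustrated with the sketch below, which assumes that the risk detection model has already produced one risk probability per effective commodity. The function names, the 0.5 risk threshold, and the sample data are assumptions for the example. Only the 3% proportion threshold comes from the text above.

```python
from typing import Dict, List


def forbidden_ratio(probabilities: List[float], risk_threshold: float) -> float:
    """Share of effective commodities whose risk probability exceeds the threshold."""
    if not probabilities:
        return 0.0  # a store with no effective commodities has no measurable ratio
    flagged = sum(1 for p in probabilities if p > risk_threshold)
    return flagged / len(probabilities)


def collect_risk_stores(
    stores: Dict[str, List[float]],
    risk_threshold: float,
    ratio_threshold: float,
) -> List[str]:
    """Return the identifiers of stores whose ratio exceeds the proportion threshold."""
    return sorted(
        store_id
        for store_id, probs in stores.items()
        if forbidden_ratio(probs, risk_threshold) > ratio_threshold
    )


# Hypothetical model outputs: one probability per effective commodity.
stores = {
    "store-a": [0.9] * 5 + [0.1] * 95,  # 5 of 100 flagged -> 5%
    "store-b": [0.9] * 2 + [0.1] * 98,  # 2 of 100 flagged -> 2%
    "store-c": [],
}
risk_store_set = collect_risk_stores(stores, risk_threshold=0.5, ratio_threshold=0.03)
```

With a 3% proportion threshold, only the store at 5% is kept, matching the worked example in the text.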

In this embodiment, the risk level of the corresponding store is reflected according to the ratio of the forbidden commodity by using the forbidden commodity as the abnormal commodity, and the risk level is quantified to determine whether the corresponding store is a risk store, and finally, a plurality of risk stores are collected to form a risk store set. The risk assessment mechanism based on the forbidden commodity proportion can accurately identify potential high-risk shops.

In a further embodiment, referring to fig. 3, when the number of the abnormal commodity in the store meets a preset condition, marking the store as a risk store, and collecting a plurality of risk stores to form a risk store set, including the following steps:

Step S5150, acquiring commodity prices of all effective commodities in a store and a preset standard price library;

In one embodiment, commodity information of a store is accessed through a data interface or a database of the e-commerce platform, and the commodity prices of all effective commodities are extracted. The commodity price includes one or more of the selling price, the promotion price, the discount price, and similar information of the commodity. A preset standard price library is also acquired. The standard price library contains standard price ranges of the various commodities on the platform and is set based on historical sales data, market quotations, and platform rules. The standard price library can be built through data mining and statistical analysis so that it reflects the reasonable price intervals of similar commodities on the market. This step ensures that the identification of risk stores rests on a reliable price basis.

Step S5160, judging whether the commodity price of each effective commodity is low or not by adopting a fraud risk detection model according to a preset standard price library, when the commodity price is low, confirming the effective commodity corresponding to the commodity price as having fraud risk, and calibrating the effective commodity as a transaction abnormal commodity;

In one embodiment, the standard price range of the corresponding commodity is looked up in the standard price library based on the commodity title corresponding to the commodity price, and the commodity price and the corresponding standard price range are input into a pre-trained fraud risk detection model. The fraud risk detection model determines the degree of deviation between the commodity price and the standard price range, the degree of deviation being obtained by comparison with the lowest price in the standard price range. Combined with a preset risk threshold, this determines whether the commodity price of each effective commodity is too low and therefore whether the commodity has a fraud risk. For example, if the commodity price of a valid commodity falls below the risk threshold (e.g., 30%) set relative to the corresponding standard price in the standard price library, the commodity is marked as having a fraud risk and is calibrated as a transaction abnormal commodity. Merchants often list commodities at extremely low prices to attract consumers, but the goods that consumers receive after purchase often differ from the description on the e-commerce platform. It should be noted that this embodiment is generally applied to standard commodities, that is, commodities with definite standards and specifications, and the standard price range is related to the brand and specification of the corresponding commodity. For example, a 64-inch television and a 32-inch television may differ significantly in price because of their different sizes.

In this embodiment, after detection by the fraud risk detection model, a fraud risk label is added to each effective commodity found to have a fraud risk, and that commodity is calibrated as a transaction abnormal commodity. The fraud risk label is later used to count the number of transaction abnormal commodities with fraud risk in the store.
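The price check can be approximated by a simple rule, sketched below as a stand-in for the pre-trained fraud risk detection model, which the application does not specify in detail. The sketch assumes that the degree of deviation is the fraction by which the price falls below the lowest standard price and that the 30% risk threshold applies to that fraction. The text could also be read as "price below 30% of the standard price", in which case the comparison would change accordingly. All names and prices are hypothetical.

```python
from typing import Tuple


def price_deviation(price: float, lowest_standard_price: float) -> float:
    """Fraction by which the price falls below the lowest standard price (0 if not below)."""
    if lowest_standard_price <= 0:
        raise ValueError("lowest standard price must be positive")
    return max(0.0, (lowest_standard_price - price) / lowest_standard_price)


def has_fraud_risk(
    price: float,
    standard_range: Tuple[float, float],
    deviation_threshold: float = 0.30,  # assumed reading of the 30% example
) -> bool:
    """Rule-based stand-in: flag the commodity when the deviation exceeds the threshold."""
    lowest, _highest = standard_range
    return price_deviation(price, lowest) > deviation_threshold


tv_range = (2000.0, 3000.0)  # hypothetical standard price range for one television model
suspicious = has_fraud_risk(1000.0, tv_range)  # 50% below the lowest standard price
ordinary = has_fraud_risk(1800.0, tv_range)    # 10% below the lowest standard price
```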

Step S5170, counting the number of the abnormal commodity in the trade with fraud risk, determining whether the corresponding store is a risk store based on the number of the abnormal commodity in the trade and the preset condition, and collecting a plurality of risk stores to form a risk store set.

In one embodiment, all valid items in the store are traversed and checked for marking as having a fraud risk. And counting the commodity quantity of the abnormal commodity with the fraud risk in the shop by inquiring the fraud risk label. And then obtaining the total number of the effective commodities of the store, and calculating the duty ratio of the fraudulent commodities based on the counted number of the fraudulent risk commodities and the total number of the effective commodities. The ratio of the fraudulent commodity reflects the relative proportion of the fraudulent commodity in the store, the calculated ratio of the fraudulent commodity is compared with a preset ratio threshold value, and the preset ratio threshold value is set according to the rule of the electronic commerce platform and the historical data and is used for judging whether the store has higher fraud risk. If the proportion of the fraudulent commodities is larger than a preset proportion threshold value, marking the shops as risk shops, screening shops with the proportion of the fraudulent commodities larger than the preset threshold value, and forming a risk shop set.

In this embodiment, by combining the standard price library with the fraud risk detection model, commodities with abnormally low prices can be efficiently identified as commodities with fraud risk and treated as transaction abnormal commodities. The fraudulent commodity ratio reflects the risk degree of the corresponding store, and quantifying this risk degree determines whether the store is a risk store. Finally, a plurality of risk stores are collected to form a risk store set. This risk assessment mechanism based on the proportion of fraudulent commodities can accurately identify potential high-risk stores.

In a further embodiment, referring to fig. 4, based on the number of goods of the corresponding abnormal goods of the transaction by the suspected risk shop and a preset similarity determination rule, filtering part of the suspected risk shop pairs from the set of suspected risk shop pairs, including the following steps:

Step S5310, for the two risk shops corresponding to the suspected risk shop pair, obtaining, for each of the two risk shops, a plurality of first commodity categories having the largest numbers of forbidden and limited sales commodities, and the forbidden and limited commodity risk ratio;

In one embodiment, for each of the two risk stores in the suspected risk store pair, all valid commodities in the risk store are traversed to check whether each commodity is marked as having a forbidden and limited sales risk. By querying the forbidden and limited sales risk label, the category distribution of the restricted commodities in each store is counted, that is, the number of restricted commodities under each category. Specifically, the commodity title, description text, or other related information is analyzed through natural language processing technology to identify the category to which the commodity belongs, and the identified category is matched against a preset list of forbidden and limited sales categories. The list is constructed based on platform rules, laws and regulations, and historical data, and includes commodity categories whose sale is explicitly prohibited or restricted, for example, in the cross-border e-commerce field, "aquatic products", "agricultural products", "drinks", "electronic products", "cosmetics", and the like. If the commodity category matches a category in the list, a corresponding forbidden and limited sales category label is added to the commodity. For example, if a commodity belongs to the category "seafood", a "no sell-seafood" tag is added to the commodity. It should be noted that the list of forbidden and limited sales categories may be dynamically updated according to changes in platform rules, laws and regulations, and actual business requirements. After the list is updated, the commodities in the store are scanned again, each commodity category is checked against the new list, and the corresponding category labels are updated.

The categories are then sorted by the number of forbidden and limited sales commodities, and a plurality of first commodity categories with the largest numbers are screened out. The number of first commodity categories can be set according to actual requirements, for example, the top three categories by number of forbidden and limited sales commodities. The first commodity categories reflect the main risk distribution of the store over forbidden and limited sales commodities and are the basis for judging the similarity of stores. The forbidden and limited commodity risk ratio is also obtained. It reflects how concentrated the store's risk is in forbidden and limited sales commodities and is a key index for evaluating the risk level of the store. For example, risk store A has a total of 1000 valid commodities, of which 100 are marked as forbidden and limited sales commodities, so its forbidden and limited commodity risk ratio is 10%. Suppose that, of these 100 commodities, 40 fall under the "aquatic products" category, 30 under "agricultural products", 20 under "drinks", 5 under "electronic products", and 5 under "cosmetics". Selecting the top three categories by number then gives "aquatic products", "agricultural products", and "drinks" as the first commodity categories.

Through the above steps, data on the forbidden and limited sales commodity categories and risk ratios of the two risk stores in the suspected risk store pair are obtained, providing a comprehensive and accurate basis for the subsequent similarity judgment.
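The category counting in the example for risk store A can be reproduced with a short sketch. The function name and the way the category labels are supplied (one label per flagged commodity) are assumptions for illustration. The figures are those of the example above.

```python
from collections import Counter
from typing import List


def top_categories(flagged_categories: List[str], k: int = 3) -> List[str]:
    """Return the k categories with the most flagged commodities, largest first."""
    return [category for category, _count in Counter(flagged_categories).most_common(k)]


# One category label per forbidden and limited sales commodity of risk store A.
store_a_flagged = (
    ["aquatic products"] * 40
    + ["agricultural products"] * 30
    + ["drinks"] * 20
    + ["electronic products"] * 5
    + ["cosmetics"] * 5
)
total_valid_commodities = 1000

first_categories = top_categories(store_a_flagged, k=3)
risk_ratio = len(store_a_flagged) / total_valid_commodities
```

When two categories tie at the cut-off, `Counter.most_common` keeps the one encountered first, so a production rule would need an explicit tie-breaking policy, which the text does not specify.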

Step S5320, determining whether the suspected risk shop pair meets a preset first similar judgment condition based on the first commodity category and the forbidden commodity risk ratio, and if not, filtering the corresponding suspected risk shop pair from the suspected risk shop pair set;

In one embodiment, assume that risk store A has a total of 1000 available items, with 100 items marked as no-sell items, the corresponding no-sell item risk ratio is 10%, and risk store B has a total of 800 available items, with 80 items marked as no-sell items, the no-sell item risk ratio is 10%. The difference in the risk ratios of the forbidden articles of the two stores, i.e., |10% -10% |=0%, is calculated and compared with a preset first difference threshold (e.g., 5%). If the difference is smaller than a preset first difference threshold, the two stores are preliminarily judged to have similarity in the risk proportion of the forbidden and limited commodities.

When the difference is smaller than the preset first difference threshold, the forbidden and limited sales commodity categories of the two stores are further analyzed, and the plurality of first commodity categories with the largest numbers of forbidden and limited sales commodities in each store are obtained. For example, the forbidden and limited sales commodities of risk store A are mainly distributed in the three categories "aquatic products" (40), "agricultural products" (30), and "drinks" (20), and those of risk store B are mainly distributed in the three categories "aquatic products" (35), "agricultural products" (25), and "drinks" (15). The coincidence rate of the two sets of first commodity categories is calculated by dividing the number of categories that the two stores have in common by the total number of first commodity categories of one store. In this example, the first commodity categories of both stores are "aquatic products", "agricultural products", and "drinks", so the number of common categories is 3, the total number of first commodity categories of one store is 3, and the coincidence rate is 3/3 × 100% = 100%. The coincidence rate is compared with a preset first coincidence rate threshold (for example, 80%). If the coincidence rate is larger than the threshold, the two stores are determined to have high similarity in forbidden and limited sales commodity categories. In other words, the first similar judgment condition is met when the difference between the forbidden and limited commodity risk ratios of the two stores is smaller than the preset first difference threshold and the coincidence rate of the first commodity categories is larger than the preset first coincidence rate threshold.

If the forbidden commodity risk ratio difference value and the category overlapping ratio of the two shops meet preset conditions, the suspected risk shop pair is reserved, otherwise, the suspected risk shop pair is filtered out of the suspected risk shop pair set. Through the steps, suspected risk shop pairs with high similarity in the risk proportion and category distribution of the forbidden and limited commodities can be effectively screened out.
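The first similar judgment condition described above reduces to two comparisons, sketched here with the figures from the example. The function names are hypothetical, and dividing by the category count of the first store is one reading of "the total number of first commodity categories of one store".

```python
from typing import List


def coincidence_rate(categories_a: List[str], categories_b: List[str]) -> float:
    """Number of shared categories divided by the category count of the first store."""
    unique_a = set(categories_a)
    if not unique_a:
        return 0.0
    return len(unique_a & set(categories_b)) / len(unique_a)


def meets_similarity_condition(
    ratio_a: float,
    ratio_b: float,
    categories_a: List[str],
    categories_b: List[str],
    difference_threshold: float,
    coincidence_threshold: float,
) -> bool:
    """True when the ratio difference is small enough and the categories overlap enough."""
    if abs(ratio_a - ratio_b) >= difference_threshold:
        return False
    return coincidence_rate(categories_a, categories_b) > coincidence_threshold


top3 = ["aquatic products", "agricultural products", "drinks"]
pair_is_similar = meets_similarity_condition(
    ratio_a=0.10,                # risk store A: 100 of 1000
    ratio_b=0.10,                # risk store B: 80 of 800
    categories_a=top3,
    categories_b=top3,
    difference_threshold=0.05,   # first difference threshold from the example
    coincidence_threshold=0.80,  # first coincidence rate threshold from the example
)
```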

Step S5330, when the suspected risk store pair meets the preset first similar judgment condition, obtaining, for each of the two risk stores corresponding to the suspected risk store pair, a plurality of second commodity categories and the fraudulent commodity risk ratio, wherein the second commodity categories are the categories having the largest numbers of fraudulent commodities in the corresponding store, and the fraudulent commodity risk ratio is the ratio of the number of transaction abnormal commodities with fraud risk in the corresponding store to the total number of effective commodities of the store;

For the two risk stores in a suspected risk store pair that meets the first similar judgment condition, the fraud risk labels of all effective commodities in each risk store are extracted, the fraud risk labels being generated by a pre-trained fraud risk detection model. The number of fraudulent commodities in each store is counted according to the fraud risk labels and summarized by commodity category. For example, the fraudulent commodities of risk store A are mainly distributed in the three categories "luxury goods" (30), "electronic products" (20), and "health products" (10), and the fraudulent commodities of risk store B are mainly distributed in the three categories "luxury goods" (25), "electronic products" (15), and "virtual goods" (10). A plurality of second commodity categories with the largest numbers of fraudulent commodities are screened out, for example, the top three categories. At the same time, the fraudulent commodity risk ratio of each store is calculated, that is, the ratio of the number of transaction abnormal commodities with fraud risk in the store to the total number of effective commodities in the store.

Step S5340, determining whether the suspected risk shop pair meets a preset second similar judgment condition based on the second commodity category and the fraudulent commodity risk ratio, and if not, filtering the corresponding suspected risk shop pair from the suspected risk shop pair set.

In one embodiment, for the two risk stores in the suspected risk store pair, the difference between their fraudulent commodity risk ratios is calculated, that is, the absolute value of the difference between the two ratios, and the difference is compared with a preset second difference threshold. For example, if the fraudulent commodity risk ratio of risk store A is 5% and that of risk store B is 6%, the difference is 1%. If the preset second difference threshold is 2%, the difference is smaller than the threshold, and the two stores are preliminarily judged to be similar in fraudulent commodity risk ratio. The distribution of fraudulent commodity categories of the two stores is then further analyzed: the plurality of second commodity categories with the largest numbers of fraudulent commodities are obtained, and the coincidence rate of the second commodity categories is calculated. For example, the second commodity categories of risk store A are "luxury goods", "electronic products", and "health products", and those of risk store B are "luxury goods", "electronic products", and "virtual goods". The number of common categories is 2 and the total number of second commodity categories of one store is 3, so the coincidence rate is 2/3 ≈ 66.7%. The coincidence rate is compared with a preset second coincidence rate threshold (for example, 60%). If the coincidence rate is larger than the threshold, the two stores are judged to have high similarity in fraudulent commodity categories and to meet the second similar judgment condition.

If both the fraudulent commodity risk ratio difference and the category coincidence rate of the two stores meet the preset conditions, the suspected risk store pair is retained. Otherwise, it is filtered out of the suspected risk store pair set. For example, if the fraudulent commodity risk ratio of risk store A is 5% and that of risk store B is 8%, the difference is 3%, which is greater than the preset second difference threshold. In that case, or if the category coincidence rate is 50% and therefore below the preset second coincidence rate threshold, the suspected risk store pair is filtered from the suspected risk store pair set. Through the above steps, suspected risk store pairs with high similarity in fraudulent commodity risk ratio and category distribution can be effectively screened out.
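Steps S5310 to S5340 together form a two-stage filter over the suspected risk store pair set, which can be sketched as follows. The `StoreProfile` structure, the helper names, and store C are assumptions made for the example. The ratios, categories, and thresholds for stores A and B are taken from the examples above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class StoreProfile:
    """Hypothetical per-store summary used by both similarity checks."""
    forbidden_ratio: float
    forbidden_top: List[str]
    fraud_ratio: float
    fraud_top: List[str]


def similar(ratio_a: float, ratio_b: float, top_a: List[str], top_b: List[str],
            diff_threshold: float, rate_threshold: float) -> bool:
    """Ratio difference below threshold and category coincidence rate above threshold."""
    if abs(ratio_a - ratio_b) >= diff_threshold or not top_a:
        return False
    return len(set(top_a) & set(top_b)) / len(set(top_a)) > rate_threshold


def filter_pairs(pairs: List[Tuple[str, str]],
                 profiles: Dict[str, StoreProfile]) -> List[Tuple[str, str]]:
    """Keep only pairs that pass the first check and then the second check."""
    kept = []
    for a, b in pairs:
        pa, pb = profiles[a], profiles[b]
        first_ok = similar(pa.forbidden_ratio, pb.forbidden_ratio,
                           pa.forbidden_top, pb.forbidden_top, 0.05, 0.80)
        second_ok = first_ok and similar(pa.fraud_ratio, pb.fraud_ratio,
                                         pa.fraud_top, pb.fraud_top, 0.02, 0.60)
        if second_ok:
            kept.append((a, b))
    return kept


forbidden_top = ["aquatic products", "agricultural products", "drinks"]
profiles = {
    "A": StoreProfile(0.10, forbidden_top, 0.05,
                      ["luxury goods", "electronic products", "health products"]),
    "B": StoreProfile(0.10, forbidden_top, 0.06,
                      ["luxury goods", "electronic products", "virtual goods"]),
    "C": StoreProfile(0.10, forbidden_top, 0.08,  # hypothetical third store
                      ["luxury goods", "electronic products", "health products"]),
}
remaining_pairs = filter_pairs([("A", "B"), ("A", "C")], profiles)
```

The pair (A, B) passes both checks (1% difference, 2/3 coincidence), while (A, C) fails the second check because the 3% difference exceeds the 2% threshold.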

In the embodiment, the shop pairs with high similarity in risk characteristics are further screened from the suspected risk shop pair set based on multi-dimensional similarity judgment of the forbidden and fraudulent commodities, so that the accuracy and reliability of risk shop station group detection are remarkably improved. The multi-level similarity judging mechanism not only improves the screening accuracy, but also avoids the misjudgment possibly caused by single-dimension judgment. By filtering out suspected risk shop pairs which do not meet the similarity condition, shops possibly belonging to the same risk station group can be accurately identified, and more reliable data support is provided for subsequent risk station group detection. According to the embodiment, through multi-dimensional and multi-level similarity judgment, the comprehensiveness and robustness of detection of the risk shop station group are obviously improved.

In a further embodiment, referring to fig. 5, based on the first category of goods and the risk ratio of the forbidden articles, determining whether the pair of suspected risk shops meets a preset first similarity determination condition includes the following steps:

Step S5321, calculating the difference between the forbidden and limited commodity risk ratios of the two risk shops corresponding to the suspected risk shop pair;

For the two risk shops in the suspected risk shop pair, the difference between their forbidden and limited commodity risk ratios is calculated, that is, the absolute value of the difference between the two ratios. The difference reflects how far apart the two shops are in forbidden and limited commodity risk ratio and is a key index for judging whether the two shops are similar. Calculating the difference provides a quantitative basis for the subsequent similarity judgment, so that shop pairs with consistent forbidden and limited commodity risk characteristics can be screened out.

Step S5322, when the difference value is smaller than a preset difference value threshold value, obtaining the same number of first commodity categories of the two risk shops, dividing the number by the total number of first commodity categories corresponding to any one of the two risk shops, and obtaining the coincidence rate of the first commodity categories of the two risk shops;

And respectively determining first commodity categories corresponding to the two risk shops, wherein the first commodity categories are obtained by counting a plurality of categories with a large number of forbidden and restricted commodities in each shop. And when the difference value is smaller than a preset difference value threshold (namely, a first difference value threshold), comparing the first commodity categories of the two risk shops, and finding out the same category number. The same category number is then divided by the total number of first merchandise categories for either of the two risk stores. And obtaining the coincidence rate of the first commodity category of the two risk shops, wherein the coincidence rate reflects the similarity degree of the two shops on the forbidden commodity category, and the higher the coincidence rate is, the more similar the distribution of the two shops on the forbidden commodity category is, so that a quantitative basis is provided for subsequent similarity judgment.

And step S5323, when the coincidence rate is larger than a preset coincidence rate threshold value, judging that the suspected risk shop pair meets a preset first similar judgment condition.

The preset coincidence rate threshold is preset according to actual service demands and historical data and is used for measuring whether the similarity of two stores in the restricted commodity category reaches an acceptable standard. For example, assuming that the preset coincidence rate threshold is 60%, if the coincidence rate of the first commodity category of the two risk shops is 66.7% after calculation, the coincidence rate is greater than the preset coincidence rate threshold, which indicates that the two shops have high similarity in the distribution of the forbidden commodity categories. At this time, the suspected risk shop pair is judged to meet a preset first similarity judgment condition, namely, the two shops meet similarity requirements in terms of the risk ratio and category distribution of the forbidden and limited commodities, so that the shop pair is reserved for subsequent risk station group detection analysis.

In this embodiment, the difference between the forbidden and limited commodity risk ratios of the two shops in the suspected risk shop pair and the coincidence rate of their first commodity categories are calculated, and the similarity judgment is made based on the preset difference threshold and coincidence rate threshold. Shop pairs with similar risk characteristics are retained and shop pairs that do not meet the conditions are filtered out, which improves the accuracy and reliability of screening suspected risk shop pairs. Because the judgment relies on quantitative indexes, it is objective and avoids the deviation that subjective judgment may introduce.

In a further embodiment, referring to fig. 6, calculating, based on the filtered suspected risk store pair set, the commodity similarity coverage rate of the two risk stores corresponding to each suspected risk store pair, and determining the risk station group stores based on the commodity similarity coverage rates of all suspected risk store pairs in the set, includes the following steps:

step S5410, converting all commodity titles in the two corresponding risk shops of the suspected risk shop pair into vector representations based on the filtered suspected risk shop pairs in the suspected risk shop pair set, to obtain commodity title vectors;

All commodity titles of the two risk stores corresponding to the suspected risk store pair are extracted. In one embodiment, the extracted titles undergo text preprocessing, including text cleaning, word segmentation and stop-word removal. The preprocessed titles are then converted into vector representations using a pre-trained natural language processing model (e.g., Word2Vec or BERT). For each commodity title, each word in the title is mapped to a high-dimensional vector by a word embedding technique, and these word vectors are aggregated into a fixed-dimension commodity title vector by weighted averaging or a pooling operation (e.g., mean pooling or max pooling). Finally, the commodity title vectors are L2-normalized so that every vector has unit norm, which keeps the subsequent cosine similarity calculation consistent. Through these steps, all commodity titles of each store are converted into vector representations, providing the vector data for the subsequent commodity similarity calculation and risk station group detection.
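The aggregation and normalization steps can be illustrated with a small sketch. It assumes the word vectors are already available as a plain dictionary (the `embeddings` argument is a hypothetical stand-in for a pre-trained model such as Word2Vec) and uses mean pooling, which is one of the pooling options the text mentions.

```python
import math


def mean_pool(word_vectors):
    """Average a list of equal-length word vectors into one fixed-dimension vector."""
    count = len(word_vectors)
    dim = len(word_vectors[0])
    return [sum(vec[k] for vec in word_vectors) / count for k in range(dim)]


def l2_normalize(vec):
    """Scale a vector to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def title_vector(tokens, embeddings):
    """Build a unit-norm title vector from preprocessed tokens.

    `embeddings` maps a token to its word vector (hypothetical lookup table).
    Tokens without an embedding are skipped; None is returned if none remain.
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    return l2_normalize(mean_pool(vectors))
```

Because every title vector has unit norm after this step, the cosine similarity of two title vectors reduces to their dot product.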

Step S5420, calculating cosine similarity between commodity title vectors of the two risk shops to obtain a commodity similarity matrix of the risk shops;

For each pair of commodity title vectors drawn from the two corresponding risk stores, the cosine similarity between the two vectors is calculated. Cosine similarity is the dot product of the two vectors divided by the product of their norms; its value lies in [-1, 1], and the closer it is to 1, the more similar the semantics of the two commodity titles. Specifically, the cosine similarity between the i-th commodity title vector of store A and the j-th commodity title vector of store B is calculated and stored in row i, column j of the similarity matrix. Traversing all commodity title vector pairs of the two stores yields the commodity similarity matrix, in which each element represents the semantic similarity of the corresponding commodity titles of the two stores.
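A minimal sketch of the matrix construction, assuming each store's title vectors are given as lists of floats; the function names are illustrative. Rows correspond to store A's titles and columns to store B's titles, as described above.

```python
import math


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # similarity is undefined for a zero vector; treat as 0
    return dot / (norm_u * norm_v)


def similarity_matrix(titles_a, titles_b):
    """Entry [i][j] is the cosine similarity of A's i-th and B's j-th title vector."""
    return [[cosine_similarity(u, v) for v in titles_b] for u in titles_a]
```

For stores with m and n titles this computes m × n similarities, which is one reason the earlier screening stages reduce the number of store pairs before this step.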

Step S5430, determining commodity similar coverage rates of the suspected risk shops to the corresponding two risk shops based on the commodity similar matrix;

In one embodiment, for each row in the commodity similarity matrix (i.e., each commodity title vector of store A), the maximum cosine similarity between that commodity title vector of store A and all commodity title vectors of store B is found and compared with a preset similarity threshold. If the maximum cosine similarity is greater than the similarity threshold, the commodity of store A is considered semantically highly similar to some commodity in store B and is counted as a similar commodity. The number of similar commodities in store A that meet this condition is counted and divided by the total number of commodities of store A to obtain the commodity similarity coverage rate of store A relative to store B. Similarly, for each column in the commodity similarity matrix (i.e., each commodity title vector of store B), the maximum cosine similarity between that commodity title vector of store B and all commodity title vectors of store A is found, and the same comparison and counting are performed to obtain the commodity similarity coverage rate of store B relative to store A. Finally, the two coverage rates are combined (for example, by taking their average or a weighted average) to obtain the commodity similarity coverage rate of the two risk stores corresponding to the suspected risk store pair. The commodity similarity coverage rate reflects the overall degree of semantic similarity between the commodity titles of the two stores and provides a quantitative basis for the subsequent risk station group detection.
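The coverage calculation can be sketched as below, assuming the similarity matrix is a list of rows. The 0.8 similarity threshold and the plain average of the two directional rates are assumptions chosen for illustration; the text leaves the threshold to business needs and names the average only as one way to combine the rates.

```python
def directional_coverage(maxima, threshold):
    """Share of commodities whose best match in the other store exceeds the threshold."""
    if not maxima:
        return 0.0
    similar = sum(1 for value in maxima if value > threshold)
    return similar / len(maxima)


def pair_coverage(sim, threshold=0.8):  # threshold value is an assumption
    """Combine the A-to-B (row) and B-to-A (column) coverage rates by averaging."""
    if not sim or not sim[0]:
        return 0.0
    row_maxima = [max(row) for row in sim]        # best match for each A commodity
    col_maxima = [max(col) for col in zip(*sim)]  # best match for each B commodity
    cov_a = directional_coverage(row_maxima, threshold)
    cov_b = directional_coverage(col_maxima, threshold)
    return (cov_a + cov_b) / 2
```

For the matrix `[[0.9, 0.1, 0.2], [0.3, 0.4, 0.85]]`, both of store A's commodities have a match above 0.8 (coverage 1.0) while two of store B's three commodities do (coverage 2/3), so the averaged coverage rate is 5/6.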

Step S5440, marking each store in the suspected risk store pair set as a node in a risk station group store map, and when the commodity similarity coverage is greater than a preset coverage threshold, establishing edge connection between two corresponding stores to form a risk station group store map for determining a risk station group to which the store belongs.

An initial undirected graph is constructed with all stores in the suspected risk store pair set as nodes, each node representing one risk store. All suspected risk store pairs are traversed; for each pair, the corresponding commodity similarity coverage rate is obtained and compared with a preset coverage rate threshold. If the commodity similarity coverage rate is greater than the preset coverage rate threshold, an undirected edge is established between the two corresponding store nodes, indicating that the two stores are highly similar in commodity semantics and may belong to the same risk station group. Once all store pairs have been traversed and the edges established, the risk station group store graph is complete. Nodes in the graph represent risk stores and edges represent commodity semantic similarity between stores; closely connected subgraphs can then be identified by a graph analysis algorithm (such as a community detection algorithm) to determine the set of potential risk station group stores.
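The graph construction can be sketched as follows. Note the substitution: the text suggests a community detection algorithm for finding closely connected subgraphs, whereas this sketch uses plain connected components as a simpler stand-in. The store identifiers, the 0.5 coverage threshold and the function names are assumptions.

```python
def build_store_graph(pair_coverages, coverage_threshold=0.5):  # assumed threshold
    """Map each store to its set of neighbours.

    `pair_coverages` maps a (store_a, store_b) tuple to that pair's coverage rate.
    Every store in a pair becomes a node; an edge is added only above the threshold.
    """
    graph = {}
    for (store_a, store_b), coverage in pair_coverages.items():
        graph.setdefault(store_a, set())
        graph.setdefault(store_b, set())
        if coverage > coverage_threshold:
            graph[store_a].add(store_b)
            graph[store_b].add(store_a)
    return graph


def connected_groups(graph):
    """Return connected components with at least two stores (depth-first search)."""
    seen = set()
    groups = []
    for start in graph:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        if len(component) > 1:
            groups.append(component)
    return groups
```

Connected components treat any chain of similar stores as one group, so a single spurious edge can merge two unrelated groups; a community detection algorithm, as the text suggests, is less sensitive to that.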

In this embodiment, the commodity titles of the risk stores are converted into vector representations, and the commodity similarity coverage rate between stores is calculated from the commodity title vectors so as to construct the risk station group store graph. The screening mechanism based on commodity similarity coverage rate effectively identifies suspected risk store pairs that are highly similar in their commodity operation characteristics, so that risk store station groups possibly operated by the same entity or by related entities can be located more accurately.

In a further embodiment, referring to fig. 7, determining the commodity similarity coverage rate of the suspected risk shops to the corresponding two risk shops based on the commodity similarity matrix includes:

step S5431, determining a cosine similarity maximum value of each row in the commodity similarity matrix, and comparing the cosine similarity maximum value with a preset similarity threshold;

In one embodiment, for each row in the commodity similarity matrix (i.e., each commodity title vector of store A), all matrix elements in that row are traversed to determine the maximum cosine similarity between that commodity title vector of store A and all commodity title vectors of store B. The maximum cosine similarity indicates the highest semantic similarity between the current commodity of store A and any commodity of store B. The maximum is then compared with a preset similarity threshold, which is set in advance according to actual business needs and historical data and measures whether two commodity titles are semantically similar enough. If the maximum cosine similarity is greater than the preset similarity threshold, the commodity of store A is considered semantically highly similar to a commodity in store B and is included in the subsequent count of similar commodities; otherwise, the commodity is considered insufficiently similar to the commodities in store B and is not counted.

Step S5432, counting the number of cosine similarity maxima that are greater than the similarity threshold, and dividing that number by the total number of commodities of either of the two risk stores to obtain the commodity similarity coverage rate of the two risk stores corresponding to the suspected risk store pair.

In one embodiment, based on the comparison in the previous step, the number of commodities of store A whose maximum cosine similarity relative to store B is greater than the similarity threshold is counted and divided by the total number of commodities of store A, giving the commodity similarity coverage rate of store A relative to store B. Similarly, for each column in the commodity similarity matrix, the number of maximum cosine similarity values greater than the preset similarity threshold is counted, that is, the number of commodities in store B whose titles are semantically similar to a commodity of store A, and that number is divided by the total number of commodities of store B to obtain the commodity similarity coverage rate of store B relative to store A. Finally, the two coverage rates are combined (for example, by taking their average or a weighted average) to obtain the commodity similarity coverage rate of the two risk stores corresponding to the suspected risk store pair.

In this embodiment, the maximum cosine similarity of each row in the commodity similarity matrix is determined and compared with a preset similarity threshold, so that commodity pairs of the two stores whose titles are semantically highly similar can be identified accurately. The number of maxima greater than the similarity threshold is counted and the commodity similarity coverage rate is calculated from that count, which quantifies the overall degree of similarity between the two stores in commodity semantics. Because risk station group stores are often operated by the same entity or by related entities, their commodity selection policies, operation modes and target customer groups are highly consistent, that is, the commodities they list show marked convergence in category, title, description and function. By quantifying the similarity of commodity titles between two stores through the commodity similarity coverage rate, this embodiment determines the association between the two stores and locates potential risk station groups more accurately.

Referring to fig. 8, a risk store station group detection device provided in accordance with one of the purposes of the present application is a functional embodiment of the risk store station group detection method of the present application. The device comprises a risk store marking module 5100, a risk store pair constructing module 5200, a risk store pair filtering module 5300 and a risk store station group determining module 5400. The risk store marking module 5100 is used for marking a store as a risk store when the number of abnormal-transaction commodities in the store meets a preset condition, and collecting a plurality of risk stores to form a risk store set. The risk store pair constructing module 5200 is used for calculating the commodity total number ratio of every two risk stores in the risk store set and, when the commodity total number ratio is within a preset ratio interval, marking the two corresponding risk stores as a suspected risk store pair, the suspected risk store pairs being used to form a suspected risk store pair set. The risk store pair filtering module 5300 is used for filtering some suspected risk store pairs out of the suspected risk store pair set based on the number of abnormal-transaction commodities corresponding to each suspected risk store pair and a preset similarity judgment rule. The risk store station group determining module 5400 is used for calculating, based on the filtered suspected risk store pair set, the commodity similarity coverage rate of the two risk stores corresponding to each suspected risk store pair, and determining the risk station group stores based on the commodity similarity coverage rates of all suspected risk store pairs in the set.

In a further embodiment, the risk store marking module 5100 includes a commodity title obtaining submodule, a sales restriction risk determining submodule, a restricted commodity proportion calculating submodule and a risk store marking submodule. The commodity title obtaining submodule is used for obtaining the commodity titles of all valid commodities in a store, the commodity titles including main titles and subtitles. The sales restriction risk determining submodule is used for inputting the commodity titles into a pre-trained sales restriction risk detection model and determining whether the valid commodity corresponding to each commodity title has a sales restriction risk, a valid commodity with a sales restriction risk being a commodity whose sale is restricted or prohibited by the e-commerce platform where the store is located. The restricted commodity proportion calculating submodule is used for counting the abnormal-transaction commodities in the store that have a sales restriction risk, and obtaining the prohibited or restricted commodity proportion of the store from that count and the total number of valid commodities of the store. The risk store marking submodule is used for marking the store as a risk store when the prohibited or restricted commodity proportion is greater than a preset proportion threshold, and collecting a plurality of risk stores to form a risk store set.

In a further embodiment, the risk store marking module 5100 includes a commodity price obtaining submodule, a fraud risk determining submodule and a risk store determining submodule. The commodity price obtaining submodule is used for obtaining the commodity prices of all valid commodities in a store and a preset standard price library. The fraud risk determining submodule is used for determining, by means of a fraud risk detection model and according to the preset standard price library, whether the commodity price of each valid commodity is artificially low; when the commodity price is artificially low, the valid commodity corresponding to that price is determined to have a fraud risk and is marked as an abnormal-transaction commodity. The risk store determining submodule is used for counting the abnormal-transaction commodities in the store that have a fraud risk, determining whether the store is a risk store based on that count and the preset condition, and collecting a plurality of risk stores to form a risk store set.

In a further embodiment, the risk store pair filtering module 5300 includes a first commodity category obtaining submodule, a first risk store pair filtering submodule, a second commodity category obtaining submodule and a second risk store pair filtering submodule. The first commodity category obtaining submodule is used for obtaining, for the two risk stores corresponding to a suspected risk store pair, a plurality of first commodity categories, namely the prohibited or restricted commodity categories of the two risk stores that contain a large number of commodities, and the prohibited or restricted commodity risk ratio. The first risk store pair filtering submodule is used for determining, based on the first commodity categories and the prohibited or restricted commodity risk ratio, whether the suspected risk store pair meets a preset first similarity judgment condition and, if not, filtering the corresponding suspected risk store pair out of the suspected risk store pair set. The second commodity category obtaining submodule is used for obtaining, when the suspected risk store pair meets the preset first similarity judgment condition, a plurality of second commodity categories, namely the fraud commodity categories of the two risk stores that contain a large number of commodities, and the fraud commodity risk ratio. The second risk store pair filtering submodule is used for determining, based on the second commodity categories and the fraud commodity risk ratio, whether the suspected risk store pair meets a preset second similarity judgment condition and, if not, filtering the corresponding suspected risk store pair out of the suspected risk store pair set.

In a further embodiment, the first risk store pair filtering submodule comprises a risk ratio difference calculating submodule, a coincidence rate calculating submodule and a condition judging submodule. The risk ratio difference calculating submodule is used for calculating the difference between the prohibited or restricted commodity risk ratios of the two risk stores corresponding to the suspected risk store pair. The coincidence rate calculating submodule is used for obtaining, when the difference is smaller than a preset difference threshold, the number of first commodity categories that the two risk stores have in common, and dividing that number by the total number of first commodity categories of either of the two risk stores to obtain the coincidence rate of the first commodity categories of the two risk stores. The condition judging submodule is used for determining that the suspected risk store pair meets the preset first similarity judgment condition when the coincidence rate is greater than the preset coincidence rate threshold.

In a further embodiment, the risk store station group determining module 5400 includes a commodity title vector conversion submodule, a commodity similarity matrix determining submodule, a commodity similarity coverage rate determining submodule and a risk station group store graph determining submodule. The commodity title vector conversion submodule is used for converting, for the suspected risk store pairs in the filtered suspected risk store pair set, all commodity titles of the two risk stores corresponding to each suspected risk store pair into vector representations to obtain commodity title vectors. The commodity similarity matrix determining submodule is used for calculating the cosine similarity between the commodity title vectors of the two risk stores to obtain the commodity similarity matrix of the risk stores. The commodity similarity coverage rate determining submodule is used for determining, based on the commodity similarity matrix, the commodity similarity coverage rate of the two risk stores corresponding to the suspected risk store pair. The risk station group store graph determining submodule is used for marking each store in the suspected risk store pair set as a node in a risk station group store graph and, when the commodity similarity coverage rate is greater than a preset coverage rate threshold, establishing an edge between the two corresponding stores to form the risk station group store graph, which is used for determining the risk station group to which each store belongs.

In a further embodiment, the commodity similarity coverage rate determining submodule comprises a cosine similarity comparing submodule and a commodity similarity coverage rate calculating submodule, wherein the cosine similarity comparing submodule is used for determining the maximum value of cosine similarity of each row in the commodity similarity matrix, comparing the maximum value of cosine similarity with a preset similarity threshold value, and the commodity similarity coverage rate calculating submodule is used for counting the number of the maximum value of cosine similarity which is larger than the similarity threshold value, dividing the number by the total number of commodities of any one of the two risk shops, and obtaining the commodity similarity coverage rate of the suspected risk shops to the corresponding two risk shops.

In order to solve the above technical problems, an embodiment of the present application also provides a computer device, whose internal structure is shown schematically in fig. 9. The computer device includes a processor, a computer readable storage medium, a memory and a network interface connected by a system bus. The computer readable storage medium of the computer device stores an operating system, a database and computer readable instructions; the database can store a control information sequence, and the computer readable instructions, when executed by the processor, cause the processor to implement the risk store station group detection method. The processor of the computer device provides computing and control capabilities and supports the operation of the entire computer device. The memory of the computer device may store computer readable instructions that, when executed by the processor, cause the processor to perform the risk store station group detection method of the present application. The network interface of the computer device is used for connecting to and communicating with a terminal. Those skilled in the art will appreciate that the structure shown in fig. 9 is merely a block diagram of the part of the structure relevant to the solution of the present application and does not limit the computer device to which the solution is applied; a particular computer device may include more or fewer components than shown, combine some of the components, or arrange the components differently.

The processor in this embodiment is configured to execute the specific functions of each module and its submodules in fig. 8, and the memory stores the program code and the various data required for executing these modules or submodules. The network interface is used for data transmission with a user terminal or a server. The memory in this embodiment stores the program code and data required for executing all modules and submodules of the risk store station group detection device of the present application, and the server can call its program code and data to execute the functions of all submodules.

The present application also provides a storage medium storing computer readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the method for detecting a group of risk stores of any of the embodiments of the present application.

Those skilled in the art will appreciate that all or part of the processes implementing the methods of the above embodiments of the present application may be implemented by a computer program instructing the relevant hardware, where the computer program may be stored on a computer readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. The storage medium may be a computer readable storage medium such as a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM).

Those skilled in the art will appreciate that the steps, measures and schemes in the various operations, methods and processes discussed in the present application may be alternated, altered, combined or deleted. Further, other steps, measures and schemes in the various operations, methods and processes discussed in the present application may also be alternated, altered, rearranged, decomposed, combined or deleted. Further, steps, measures and schemes in the prior art that correspond to the various operations, methods and processes disclosed in the present application may also be alternated, altered, rearranged, decomposed, combined or deleted.

The foregoing is only a partial embodiment of the present application, and it should be noted that it will be apparent to those skilled in the art that modifications and adaptations can be made without departing from the principles of the present application, and such modifications and adaptations are intended to be comprehended within the scope of the present application.

Claims

1. A method for detecting risky store clusters, comprising the following steps:

When the number of abnormally traded goods in a store meets the preset conditions, the store is marked as a risky store, and multiple risky stores are collected to form a risky store set;

Calculating the ratio of the total number of goods between every two risky stores in the risky store set; when the ratio of the total number of goods is within a preset ratio interval, marking the corresponding two risky stores as a suspected risky store pair, the suspected risky store pairs being used to form a suspected risky store pair set;

Based on the quantity of the abnormally traded commodities corresponding to the suspected risky store pairs and a preset similarity determination rule, filtering out some of the suspected risky store pairs from the set of suspected risky store pairs;

Based on the filtered set of suspected risky store pairs, calculating the product similarity coverage rate of the two risky stores corresponding to each suspected risky store pair, and determining the stores of the risky store cluster based on the product similarity coverage rates of all suspected risky store pairs in the set of suspected risky store pairs.

2. The method for detecting risky store clusters according to claim 1 is characterized in that when the number of abnormally traded goods in a store meets a preset condition, the store is marked as a risky store, and multiple risky stores are collected to form a risky store set, including:

Obtaining the product titles of all valid products in the store, the product titles including main product titles and product subtitles;

Inputting the product titles into a pre-trained sales restriction risk detection model to determine whether the valid product corresponding to each product title has a sales restriction risk, wherein a valid product with a sales restriction risk is a product whose sale is restricted or prohibited by the e-commerce platform where the store is located;

Counting the number of products with abnormal transactions that are at risk of being banned or restricted in sale in the store, and calculating the percentage of banned or restricted products in the store based on the number of products with abnormal transactions and the total number of valid products in the store;

When the proportion of the banned or restricted goods is greater than a preset ratio threshold, the store is marked as a risky store, and multiple risky stores are collected to form a risky store set.

3. The method for detecting risky store clusters according to claim 1 is characterized in that when the number of abnormally traded goods in a store meets a preset condition, the store is marked as a risky store, and multiple risky stores are collected to form a risky store set, including:

Obtaining the product prices of all valid products in the store and a preset standard price library;

Using a fraud risk detection model to determine, based on the preset standard price library, whether the price of each valid product is artificially low; if the price is artificially low, confirming that the valid product corresponding to the price has a fraud risk and marking it as an abnormally traded product;

Counting the number of abnormally traded products with a fraud risk in the store, determining whether the corresponding store is a risky store based on the number of abnormally traded products and the preset condition, and collecting multiple risky stores to form a risky store set.

4. The method for detecting risky store clusters according to claim 1, characterized in that, based on the number of products of the abnormally traded products corresponding to the suspected risky store pairs and a preset similarity determination rule, some of the suspected risky store pairs are filtered out from the set of suspected risky store pairs, comprising:

Based on the two risky stores corresponding to the suspected risky store pair, obtaining multiple first product categories, being the banned or restricted product categories of the two risky stores that contain a large number of products, and the risk ratio of banned or restricted products;

Based on the first product categories and the risk ratio of banned or restricted products, determining whether the suspected risky store pair meets a preset first similarity judgment condition; if not, filtering the corresponding suspected risky store pair from the set of suspected risky store pairs;

When the suspected risky store pair meets the preset first similarity judgment condition, obtaining, based on the two risky stores corresponding to the suspected risky store pair, multiple second product categories, being the fraudulent product categories of the two risky stores that contain a large number of products, and the fraudulent product risk ratio, where the fraudulent product risk ratio is the ratio of the number of abnormally traded products with a fraud risk in the corresponding store to the total number of valid products in the store;

Based on the second product categories and the fraudulent product risk ratio, determining whether the suspected risky store pair meets a preset second similarity judgment condition; if not, filtering the corresponding suspected risky store pair from the set of suspected risky store pairs.

5. The method for detecting risky store clusters according to claim 4, wherein determining whether the suspected risky store pair meets a preset first similarity judgment condition based on the first product category and the risk ratio of the prohibited or restricted products comprises:

Calculating the difference between the risk ratios of banned or restricted products of the two risky stores corresponding to the suspected risky store pair;

When the difference is less than a preset difference threshold, the number of identical first product categories of the two risky stores is obtained, and the number is divided by the total number of first product categories corresponding to any of the two risky stores to obtain the overlap rate of the first product categories of the two risky stores;

When the overlap rate is greater than a preset overlap rate threshold, it is determined that the pair of suspected risky stores meets a preset first similarity judgment condition.

6. The method for detecting risky store clusters according to claim 1 is characterized in that, based on the filtered set of suspected risky store pairs, the product similarity coverage ratio of the suspected risky store pairs corresponding to the two risky stores is calculated, and based on the corresponding product similarity coverage ratios of all suspected risky store pairs in the set of suspected risky store pairs, the risky store cluster is determined, including:

Based on the suspected risky store pairs in the filtered set of suspected risky store pairs, convert all product titles in the two risky stores corresponding to the suspected risky store pairs into vector representations to obtain product title vectors;

Calculating the cosine similarity between the product title vectors of the two risky stores to obtain a product similarity matrix of the risky stores;

Determining, based on the product similarity matrix, the product similarity coverage rate of the suspected risky store to the corresponding two risky stores;

Each store in the set of suspected risk store pairs is marked as a node in the risk station group store graph. When the product similarity coverage is greater than the preset coverage threshold, an edge connection is established between the corresponding two stores to form a risk station group store graph for determining the risk station group to which the store belongs.

7. The method for detecting risky store clusters according to claim 6, wherein determining the product similarity coverage of the suspected risky store to the corresponding two risky stores based on the product similarity matrix comprises:

Determining the maximum cosine similarity of each row in the product similarity matrix, and comparing the maximum cosine similarity with a preset similarity threshold;

Counting the number of products whose maximum cosine similarity is greater than the similarity threshold, and dividing that number by the total number of products in either of the two risky stores to obtain the product similarity coverage rate of the suspected risky store pair for the corresponding two risky stores.
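The coverage computation of claims 6 and 7 can be sketched in plain Python. This is a minimal illustration under stated assumptions: the title vectors are taken as given (the vectorization model is not specified here), rows of the matrix correspond to products of the first store, the first store's product count is used as the denominator, and the similarity threshold of 0.9 is a placeholder. All function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 for a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors_a, vectors_b):
    """Row i, column j holds the similarity of A's product i to B's product j."""
    return [[cosine_similarity(u, v) for v in vectors_b] for u in vectors_a]


def similarity_coverage(matrix, similarity_threshold=0.9):
    """Fraction of rows whose best match exceeds the (assumed) threshold."""
    if not matrix:
        return 0.0
    covered = sum(
        1 for row in matrix if row and max(row) > similarity_threshold
    )
    return covered / len(matrix)


# Toy title vectors: three products in store A, two in store B.
vectors_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
vectors_b = [[1.0, 0.0], [1.0, 0.1]]
matrix = similarity_matrix(vectors_a, vectors_b)
coverage = similarity_coverage(matrix)  # only A's first product is covered
```

Because the row-wise maximum is taken from store A's side, swapping the two stores can yield a different coverage value; the claim permits either store's product count as the divisor.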

8. A device for detecting risky store clusters, comprising:

a risky store marking module, configured to mark a store as a risky store when the number of abnormally traded products in the store meets a preset condition, and to collect multiple risky stores to form a risky store set;

a risky store pair formation module, configured to calculate the ratio of the total number of products between every two risky stores in the risky store set, and, when the ratio is within a preset ratio interval, to mark the corresponding two risky stores as a suspected risky store pair, the suspected risky store pairs being used to form a set of suspected risky store pairs;

a risky store pair filtering module, configured to filter out some of the suspected risky store pairs from the set of suspected risky store pairs based on the quantity of the abnormally traded products corresponding to the suspected risky store pairs and a preset similarity determination rule;

a risky store cluster determination module, configured to calculate, based on the filtered set of suspected risky store pairs, the product similarity coverage rate of each suspected risky store pair for its corresponding two risky stores, and to determine the risky store cluster based on the product similarity coverage rates of all suspected risky store pairs in the set of suspected risky store pairs.

9. A computer device comprising a central processing unit and a memory, wherein the central processing unit is configured to call and run a computer program stored in the memory to execute the steps of the method according to any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that it stores, in the form of computer-readable instructions, a computer program implementing the method according to any one of claims 1 to 7, and when the computer program is called and executed by a computer, it performs the steps of the corresponding method.