HK40016958A - Two-dimensional code risk identification method and system - Google Patents
- Publication number
- HK40016958A
- Authority
- HK
- Hong Kong
- Prior art keywords
- url
- dimensional code
- code
- user
- protocol
Description
Technical Field
The application relates to the technical field of information security, and in particular to a two-dimensional code risk identification method and system.
Background
Two-dimensional codes have made everyday life considerably more convenient: scanning a code can make a payment, add a friend, or open a web page, which reduces both the effort and the time each operation takes. As information technology develops, more and more offline scenarios are connected through two-dimensional codes, and people have become accustomed to scanning them. However, a two-dimensional code also conceals the information it encodes, which gives lawbreakers an opening and creates many security risks. For example, black-market operators can convert a phishing link into a two-dimensional code and have the user scan it. This makes the link harder for the user to recognize, so the user is deceived more easily than by a plain unknown link, and the user's information can be stolen. Lawbreakers also disguise malicious links as two-dimensional codes and induce users to scan them, forge payment bills to induce users to transfer money by scanning a code, or replace the offline two-dimensional code used in a legitimate scenario.
Common approaches to safe code scanning include blacklist mechanisms, which cannot identify risk in two-dimensional codes that are not on the blacklist, and after-the-fact, strongly adversarial control schemes based on user reports. Neither comes close to meeting the need for risk identification of two-dimensional codes.
In addition, malicious URLs corresponding to two-dimensional codes are currently detected by string matching: security vendors collect a large number of malicious URLs and store them in a feature library, and detection only compares character strings.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
Compared with the traditional after-the-fact, strongly adversarial control mode based on user reports, this scheme provides a weakly adversarial fraud prevention and control scheme that integrates code scanning behavior identification, code content identification, derived risk identification, and follow-up risk control. The scheme can also be applied to real-time, policy-based prevention and control of fraudulent transactions.
The technical scheme establishes a fraud two-dimensional code recognition algorithm that draws on multiple dimensions and can accurately find the two-dimensional codes used by black-market operators for fraud among the two-dimensional codes that a user provides or interacts with. Compared with the prior art, in which such codes can be recognized only through users' own reports, recognition is more proactive and covers a wider range.
In addition, the derived risk identification scheme differs from the existing approach of simply blocking or invalidating a fraudulent two-dimensional code. Based on a discovered fraudulent two-dimensional code, it derives and finds the fraudsters and the deceived users, along with the collection media used for the fraud and new fraudulent two-dimensional codes. This control scheme significantly raises the cost of crime for black-market operators and improves the protection of deceived users.
To address the problem of malicious URL detection, the technical scheme of the present disclosure allows a set of formalization rules to be defined that convert a URL character string into uniform formalized data to be detected. This data is matched against a feature library to judge whether the URL is a malicious link. The purpose of the formalization rules is to eliminate the variability of the URL string format, discard redundant information that has no practical significance for detection, and supplement default information that is absent from the URL string, so as to form a URL string to be detected in the format "scheme://domain:port/path".
Specifically, the present disclosure provides a formalization detection method for a malicious URL in a two-dimensional code, including:
according to RFC specifications, splitting the URL to be detected into syntax element character strings according to a URL syntax structure;
extracting a designated character string from the character string obtained by splitting;
judging whether the protocol character string and the port number character string exist or not and performing completion processing on the nonexistent character string part;
reordering the character strings obtained after completion processing to obtain a new URL so as to calculate the hash value of the new URL and take the hash value as the hash value corresponding to the URL to be detected; and
traversing the malicious URL feature library, and comparing the feature data in the malicious URL feature library with the hash value corresponding to the URL to be detected.
In another embodiment of the present disclosure, a formalization detection system for malicious URLs in two-dimensional codes is provided, including:
means for splitting the URL to be detected into a syntax element string according to a URL syntax structure according to RFC specifications;
means for extracting a specified character string from the split character string;
means for judging whether the protocol character string and the port number character string exist or not and performing completion processing on the nonexistent character string part;
means for reordering the character strings obtained after completion processing to obtain a new URL, calculating a hash value of the new URL, and using the hash value as the hash value corresponding to the URL to be detected; and
means for traversing the malicious URL feature library and comparing the feature data in the malicious URL feature library with the hash value corresponding to the URL to be detected.
In yet another embodiment of the present disclosure, a computer-readable storage medium storing instructions for formalized detection of malicious URLs in two-dimensional codes is provided, the instructions comprising:
instructions for splitting the URL to be detected into syntax element character strings according to the URL syntax structure defined in the RFC specifications;
instructions for extracting a specified character string from the split character strings;
instructions for judging whether the protocol character string and the port number character string exist and performing completion processing on the nonexistent character string part;
instructions for reordering the character strings obtained after completion processing to obtain a new URL, calculating a hash value of the new URL, and using the hash value as the hash value corresponding to the URL to be detected; and
instructions for traversing the malicious URL feature library and comparing the feature data in the malicious URL feature library with the hash value corresponding to the URL to be detected.
Other aspects, features and embodiments of the disclosure will become apparent to those ordinarily skilled in the art upon review of the following description of specific exemplary embodiments of the disclosure in conjunction with the accompanying figures. While features of the disclosure may be discussed below with respect to certain embodiments and figures, all embodiments of the disclosure may include one or more of the advantageous features discussed herein. In other words, while one or more embodiments may have been discussed as having certain advantageous features, one or more of such features may also be used in accordance with the various embodiments of the present disclosure discussed herein. In a similar manner, although example embodiments may be discussed below as device, system, or method embodiments, it should be appreciated that such example embodiments may be implemented in a variety of devices, systems, and methods.
Drawings
So that the manner in which the above recited features of the present disclosure can be understood in detail, a more particular description of the disclosure, briefly summarized above, may be had by reference to aspects, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only certain typical aspects of this disclosure and are therefore not to be considered limiting of its scope, for the description may admit to other equally effective aspects.
FIG. 1 illustrates an application environment in which embodiments of the present disclosure may be implemented.
Fig. 2 shows a block diagram of a two-dimensional code fraud risk identification module according to one embodiment of the present disclosure.
FIG. 3 illustrates a block diagram of a seed input component in accordance with one embodiment of the present disclosure.
Fig. 4 illustrates a block diagram of a two-dimensional code identification component in accordance with one embodiment of the present disclosure.
Fig. 5 illustrates a block diagram of a two-dimensional code-derived risk identification component in accordance with one embodiment of the present disclosure.
Fig. 6 illustrates a block diagram of a two-dimensional code risk management component according to one embodiment of the present disclosure.
Fig. 7 illustrates a data flow diagram for two-dimensional code fraud risk identification, according to one embodiment of the present disclosure.
Fig. 8 shows a flow diagram of a method for two-dimensional code fraud risk identification according to one embodiment of the present disclosure.
Fig. 9 shows a flow diagram of a method for formalized detection of malicious URLs in two-dimensional codes according to one embodiment of the present disclosure.
Detailed Description
Various embodiments will be described more fully hereinafter with reference to the accompanying drawings, which form a part hereof, and which show specific exemplary embodiments. Embodiments may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of these embodiments to those skilled in the art. Embodiments may be implemented as a method, system or device. Accordingly, embodiments may take the form of a hardware implementation, an entirely software implementation, or an implementation combining software and hardware aspects. The following detailed description is, therefore, not to be taken in a limiting sense.
FIG. 1 illustrates an application environment 100 in which embodiments of the present disclosure may be implemented.
Offline code-scan payment by mobile phone is increasingly common, and many people now go about their day without cash. In recent years, lawbreakers have committed fraud by swapping merchants' collection two-dimensional codes, by issuing fake parking tickets bearing two-dimensional codes, and by misappropriating payment two-dimensional codes. Fake two-dimensional codes have even been pasted on popular shared bicycles to get users to transfer money.
For example, in FIG. 1, a fraudster 102 on a subway or bus uses various commercial promotions as a pretext (by way of example and not limitation) to get a victim 106 to scan a two-dimensional code 104 that the fraudster has generated. The victim's mobile phone is then either implanted with malicious code such as a Trojan horse program or redirected to a phishing website, and the victim's bank card or other accounts are fraudulently charged. In recent years, with the rise and spread of e-commerce, fraudsters have also had victims place orders on legitimate e-commerce platforms and then, on the pretext of keeping the platform from detecting order brushing, sent a collection two-dimensional code directly to the victim and had the victim scan it and transfer money. Fraudsters have devised many ways of exploiting the convenience of two-dimensional codes, and these schemes are hard to guard against.
In FIG. 1, the two-dimensional code 104 is composed of many small blocks, in which white represents 0 and black represents 1. Arranging and combining these values yields a matrix, into which information such as text, pictures, links, accounts, installation packages, and videos is encoded by a specific algorithm.
The fraudster 102 creates a fraudulent two-dimensional code 104. Fraudulent two-dimensional codes 104 can be classified into transaction codes and non-transaction codes: a transaction code shows a payment interface after being identified, and a non-transaction code opens a URL link after being identified. The URL link can be called the two-dimensional code id or the two-dimensional code content. Alternatively, identifying a non-transaction code may trigger a login action, an action to add a contact, and so on.
The fraudster 102 then presents the fraudulent two-dimensional code 104 to the victim 106 through various channels. For example, the fraudster 102 displays the fraudulent two-dimensional code on his electronic device and induces the victim 106 to scan it; or the fraudster sends the fraudulent two-dimensional code 104 as a picture to the victim 106 through a tool such as social software, and the victim 106 identifies the code by long-pressing the received picture; or the victim 106 otherwise downloads the fraudulent two-dimensional code 104 as a picture and identifies the code in that picture from an album.
After the victim 106 identifies the fraudulent two-dimensional code 104 in any of the ways described above, if the code is a transaction code, the victim 106 sees a payment interface through which the victim 106 may remit money to the fraudster 102 for various reasons fabricated by the fraudster 102. For example, a fraudster has a victim place an order on a legitimate e-commerce platform and then, on the pretext of keeping the platform from detecting order brushing, sends a collection two-dimensional code directly to the victim and has the victim scan it and transfer money.
In addition, the fraudster may produce a physical fraudulent two-dimensional code, for example by printing the code on paper or a card, and then place it over a legitimate two-dimensional code. For example, the fraudster may cover the collection two-dimensional code that a merchant displays for checkout, so that money the consumer pays to the merchant is directed into the fraudster's own collection account.
On the other hand, if the fraudulent two-dimensional code 104 is a non-transaction code, the victim 106, upon identifying the code, may download a Trojan horse program made by the fraudster 102. Once implanted in the electronic device of the victim 106, the program may steal sensitive information of the victim 106, such as game accounts, social accounts, bank account numbers, account passwords, and the online transaction passwords of various financial apps, resulting in stolen accounts, assets, or funds.
After identifying the fraudulent two-dimensional code 104, the victim 106 may also be redirected to a fraudulent page made by the fraudster 102. The page induces the victim 106 to enter the sensitive information described above, and once the fraudster 102 has obtained that information, the fraudster can use it to steal the victim's assets or accounts.
To address the fraud scenarios above, the present disclosure provides a self-circulating two-dimensional code fraud risk identification method. Specifically, when the victim 106 identifies a two-dimensional code, the code is uploaded to the server 108. The two-dimensional code recognition component in the server 108 evaluates the uploaded code using a two-dimensional code recognition algorithm. The evaluation may be based on various factors, such as two-dimensional code type, code scanning action, two-dimensional code id characteristics, code scanning amount analysis, whether the code is being scanned for the first time, and two-dimensional code associated case analysis. As those skilled in the art will appreciate, a fraudulent two-dimensional code may also be identified based on other factors, and the respective weights of the factors may be adjusted as desired.
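The weighted combination of factors described above can be pictured with a minimal Python sketch. The factor names follow the list in the text, but the integer weights, the threshold, and the function names `fraud_score` and `is_fraudulent` are hypothetical values chosen for illustration; the disclosure does not specify any of them.

```python
# Hypothetical integer weights (percent). The disclosure names the factors
# but gives no weights, so these numbers are illustrative only.
WEIGHTS = {
    "non_transaction_code": 25,     # two-dimensional code type
    "album_scan": 15,               # code scanning action
    "suspicious_id_features": 30,   # two-dimensional code id characteristics
    "abnormal_scan_amount": 10,     # code scanning amount analysis
    "first_scan": 5,                # whether the code is scanned for the first time
    "linked_to_reported_case": 15,  # associated case analysis
}
THRESHOLD = 50  # hypothetical decision threshold


def fraud_score(signals: dict) -> int:
    """Sum the weights of every factor whose signal is true."""
    return sum(w for name, w in WEIGHTS.items() if signals.get(name, False))


def is_fraudulent(signals: dict) -> bool:
    """Flag the code when the weighted score reaches the threshold."""
    return fraud_score(signals) >= THRESHOLD
```

Adjusting a weight or the threshold changes how aggressively codes are flagged, which is one way to read the statement that the weights may be tuned as desired.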
When the evaluated two-dimensional code is determined to be fraudulent, it is added to a fraudulent two-dimensional code library in the server 108, and the fraudster and the deceived user associated with that code are determined.
For the fraudster, all two-dimensional codes newly generated by the fraudster are evaluated by the two-dimensional code recognition algorithm, and every code determined to be fraudulent is added back into the fraudulent two-dimensional code library. This completes the two-dimensional code reflow, so that more fraudsters can be found. These steps are carried out by the derived risk identification component and the risk management and control component in the server 108.
For the deceived user, the derived risk identification component in the server 108 acquires the collection media of the suspicious transactions made before and after the deceived user scanned the code, as well as historical suspicious collection media within a certain period of time, and the risk management component in the server 108 continuously intercepts transactions to those collection media. At the same time, the risk management component in the server 108 identifies users other than the identified deceived user who have remitted, or are attempting to remit, money to those collection media as potential deceived users. The two-dimensional codes that these users used for the suspicious transactions are then examined, which enables more fraudsters to be discovered.
Therefore, compared with the prior art, in which identification relies solely on users' own reports, the technical scheme of the present disclosure is more proactive and covers a wider range. In addition, unlike the existing approach of simply blocking or invalidating a fraudulent two-dimensional code, the technical scheme derives and finds more fraudulent two-dimensional codes, more fraudsters, and more deceived users, based on the two-dimensional codes newly made by fraudsters and those identified by deceived users and the potential deceived users associated with them. This realizes a self-circulating derived risk identification scheme.
Specifically, the scheme does not identify the two-dimensional code, the fraudster's collection media, and the deceived users (including potential deceived users) only once. Instead, the identification results are fed back as inputs to the different stages. These results are the two-dimensional codes that the fraudster subsequently makes or that are scanned for the first time, and the two-dimensional codes used in suspicious transactions by potential deceived users who are found through the suspect collection media used by the fraudster. In this way more two-dimensional codes, fraudsters, collection media, and deceived users (including potential deceived users) are identified, and control is greatly improved in terms of raising costs for black-market operators and protecting deceived users.
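The feedback loop described above can be sketched as a fixed-point expansion over transaction records. This is a simplified illustration under stated assumptions, not the disclosed implementation: the `Transaction` record, the `derive_risk` function, and the `is_fraud_code` callback (standing in for the two-dimensional code recognition algorithm) are all hypothetical.

```python
from typing import Callable, Iterable, NamedTuple


class Transaction(NamedTuple):
    payer: str   # user who remitted or attempted to remit money
    medium: str  # collection medium (receiving account)
    code: str    # id of the two-dimensional code used for the payment


def derive_risk(
    seed_codes: Iterable[str],
    transactions: list,
    is_fraud_code: Callable[[str], bool],
):
    """Expand from seed codes until no new fraudulent code is found."""
    codes = set(seed_codes)
    media = set()
    payers = set()
    while True:
        # Collection media reached through codes already in the library.
        media |= {t.medium for t in transactions if t.code in codes}
        # Users who paid those media are (potential) deceived users.
        payers |= {t.payer for t in transactions if t.medium in media}
        # Codes used toward suspect media are evaluated and flow back in.
        candidates = {t.code for t in transactions if t.medium in media} - codes
        new_codes = {c for c in candidates if is_fraud_code(c)}
        if not new_codes:
            return codes, media, payers
        codes |= new_codes
```

Each pass can only add codes, so the loop stops once a pass finds nothing new. In practice the evaluation would run continuously as new transactions arrive rather than over a fixed list.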
The foregoing aspects of the disclosure are described in detail below with reference to various block diagrams, data flow diagrams, and method flow diagrams.
Fig. 2 shows a block diagram of a fraud risk identification module 200 according to one embodiment of the present disclosure. Fraud risk identification module 200 includes a seed input component 202, a two-dimensional code identification component 204, a derivative risk identification component 206, and a risk management component 208. Those skilled in the art will appreciate that the various functions of fraud risk identification module 200 and the various components included therein may be implemented on the user's computing device, or on a server/cloud, or partially on the user's computing device and partially on the server/cloud.
Referring to FIG. 2, the seed input component 202 is configured to receive a two-dimensional code provided by a reporting user as a seed input. In one embodiment of the invention, the reporting user may be a user who finds that he has been deceived or has suffered a loss because of a two-dimensional code provided by a fraudster, and who reports the fraud case. In addition, the seed input component 202 also receives two-dimensional codes provided by potentially deceived users as additional seed inputs. In one embodiment of the present disclosure, a potentially deceived user may be a user associated with the fraudulent collection media in a deceived user's case (e.g., a user who has remitted, or is attempting to remit, money to the suspect collection media).
After finding that he has been deceived, the user can upload the two-dimensional codes identified within a certain period of time to the server through his own computing device. In one embodiment of the invention, by way of example and not limitation, the user may capture an image of the fraudulent two-dimensional code with the camera on his computing device and then upload the image to the server. As those skilled in the art will appreciate, the user can provide the fraudulent two-dimensional code to the seed input component 202 in a variety of other ways.
In one embodiment of the present disclosure, fraudulent two-dimensional codes may be classified into transaction two-dimensional codes and non-transaction two-dimensional codes. In the case of a transaction two-dimensional code, by way of example and not limitation, the user being deceived may mean that a fraudster induces the user, on some pretext, to remit money to the fraudster via the fraudulent two-dimensional code. In the case of a non-transaction two-dimensional code, by way of example and not limitation, the user being deceived may mean that the user identifies the fraudulent two-dimensional code and is then redirected to an external link, downloads the installation package of a Trojan horse program, or the like.
Referring to fig. 2, the two-dimensional code recognition component 204 is configured to recognize the two-dimensional code received by the seed input component 202. Specifically, the two-dimensional code identification component 204 can determine whether the two-dimensional code is a rogue two-dimensional code based on at least one of a two-dimensional code type, a code scanning action, a two-dimensional code id characteristic, a code scanning amount, whether the code is scanned for the first time, two-dimensional code association case analysis, and the like of the two-dimensional code. Each of the above features is specifically described below.
Two-dimensional code types are generally divided into (1) transaction two-dimensional codes, which present a payment interface after being identified so that payment can be made directly; and (2) non-transaction two-dimensional codes, which jump to a website link or a download link after being identified. In one embodiment of the present disclosure, when the two-dimensional code is a non-transaction two-dimensional code (external link code), it is likely to induce the user to enter sensitive information (e.g., on a phishing website) or to download the installation package of an application of unknown risk (e.g., a fee-siphoning app or a Trojan horse program), and is therefore considered more likely to be a fraudulent two-dimensional code.
This is because a user tends to lower his guard when identifying a two-dimensional code does not lead directly to a transfer or remittance. A less alert user may not notice right away when the code jumps to a phishing page that induces the user to enter sensitive information, or downloads a Trojan horse program that steals sensitive information directly from the user's computing device. In the art, most fraudulent two-dimensional codes take the form of non-transaction two-dimensional codes.
In one embodiment of the present disclosure, by way of example and not limitation, a non-transaction two-dimensional code (external link two-dimensional code) that is more likely to be fraudulent is not blocked outright, as is common in the art. Instead, the user is allowed to open the external link so that the true purpose of the two-dimensional code can be discovered, such as opening a page where the user enters sensitive information or downloading the installation package of a Trojan horse program. Once the user opens the external link, the user is treated as a key object of security control. For example, when the user has finished entering sensitive information and clicks a button such as "submit" or "complete" to submit the form to the server, the submission is intercepted and the user is informed of the risk. The page is then marked as a fraudulent page, and other users are warned of the security risk when they open it.
The code scanning action can be divided into camera scanning, long-press identification, and album identification. Camera scanning means that a scanning function on the user's computing device (built into the device or into an application installed on it) invokes the device's camera to scan and identify the two-dimensional code. Long-press identification means identifying the two-dimensional code by long-pressing an image that contains it. Album identification means choosing the album option in the scanning function, selecting a picture already stored in the album, and identifying the two-dimensional code in that picture.
In different fraud scenarios, the code scanning action can reflect different degrees of risk. For example, in a virtual commodity selling scene, a fraudster sends a generated two-dimensional code picture to a user through a social chat tool, the user stores the two-dimensional code in an album, and the content of the two-dimensional code is read in an album identification mode.
For two-dimensional code id features, blacklist techniques are commonly employed in the art. A blacklist technique records all discovered malicious websites in an address list, the so-called blacklist, and uses that list to determine whether a website the user accesses is malicious. The technique is simple to implement, but the blacklist is difficult to keep up to date. Most current browser vendors adopt this method: they maintain a blacklist library on the client and update it every few days. This is a relatively simple way for a browser to identify malicious web addresses, but it has the significant disadvantage that it cannot identify unknown web pages.
The two-dimensional code id features in the present disclosure include the length of the URL corresponding to the code, text editing features, and URL source code features. Generally, the URLs of two-dimensional codes made by the same fraud group or fraudster share a certain similarity.
According to the RFC specification, the syntax format of a URL is "scheme://username:password@domain:port/path?query_string#fragment_id" (see RFC 1738, http://www.ietf.org/rfc/rfc1738.txt), and all URLs must follow this rule. The protocol (scheme) part defaults to HTTP; the username and password (username:password) part can be omitted; the port number (port) defaults to 80 for HTTP and can also be omitted; and the fragment_id part has no practical value in detecting whether a URL is malicious.
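As an illustration of this syntax, the following Python sketch splits a URL into the elements named above using the standard library's `urllib.parse.urlsplit`. The helper name `split_url`, the dictionary keys, and the example URL are assumptions made for this sketch.

```python
from urllib.parse import urlsplit


def split_url(url: str) -> dict:
    """Split a URL into the syntax elements named in the text."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "username": parts.username,   # None when omitted
        "password": parts.password,   # None when omitted
        "domain": parts.hostname,     # lowercased by urlsplit
        "port": parts.port,           # None when omitted
        "path": parts.path,
        "query_string": parts.query,
        "fragment_id": parts.fragment,
    }


elements = split_url("http://user:pw@www.example.com:8080/a/b?x=1#top")
```

Note that `urlsplit` only recognizes the domain when the string carries a `scheme://` prefix, so a bare string such as `www.example.com` needs the default protocol supplied first.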
By way of example and not limitation, a URL is more likely to be malicious if it has the following characteristics:
the domain is an IP address;
the port is not 80 or 443;
the domain contains more than 4 "." characters;
the path depth is large;
the URL contains characters such as @ and -;
the URL contains sensitive words such as 'secure', 'account', 'webscr', 'login', 'ebayisapi', 'sign', 'bank', 'confirm', 'submit', 'update';
the URL length exceeds 23 characters;
the domain approximates a legitimate domain name, e.g., l is replaced with 1 or o with 0;
multiple common top-level domains appear in the main domain name.
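Several of the URL characteristics listed above can be computed directly from the string. The sketch below is one possible reading, with assumptions stated plainly: the function name `url_features`, the feature keys, and the `COMMON_TLDS` sample are hypothetical; the word list assumes the spellings 'login' and 'ebayisapi'; and the look-alike domain check (l replaced with 1, o with 0) is omitted because it needs a reference list of legitimate domains.

```python
import ipaddress
from urllib.parse import urlsplit

SENSITIVE_WORDS = ("secure", "account", "webscr", "login", "ebayisapi",
                   "sign", "bank", "confirm", "submit", "update")
COMMON_TLDS = ("com", "net", "org", "cn")  # hypothetical sample, not exhaustive


def url_features(url: str) -> dict:
    """Compute simple lexical features of a URL string."""
    if "://" not in url:
        url = "http://" + url  # assume HTTP when no protocol is given
    parts = urlsplit(url)
    host = parts.hostname or ""
    try:
        ipaddress.ip_address(host)
        domain_is_ip = True
    except ValueError:
        domain_is_ip = False
    labels = host.split(".")
    lowered = url.lower()
    return {
        "domain_is_ip": domain_is_ip,
        "unusual_port": parts.port not in (None, 80, 443),
        "dots_in_domain": host.count("."),
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
        "has_at_or_dash": "@" in url or "-" in url,
        "sensitive_words": [w for w in SENSITIVE_WORDS if w in lowered],
        "long_url": len(url) > 23,
        # Top-level-domain labels appearing before the real final label.
        "tld_labels_in_domain": sum(1 for lab in labels[:-1] if lab in COMMON_TLDS),
    }
```

How these features are combined into a decision (thresholds, weights, or a trained model) is left open here, as the text lists the characteristics without prescribing a rule.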
Further, by way of example and not limitation, a page of a URL to which a two-dimensional code corresponds is more likely to have a malicious URL if it has the following characteristics:
a large number of static links;
a number of non-native domain name links, resources, methods;
a large number of hidden blocks;
there are iframes that point to other domain names;
very few backlinks;
the copyright notice or ICP number is false or missing;
whether there is a login window. Phishing websites often entice users to reveal personal sensitive information through a login window. The following logic is usually adopted to judge whether a web page contains a login window: first, all <form> tags in the page are found; then the <input> tags in the page are found; finally, each <input> tag is matched against keywords such as password and pass. If neither password nor pass is matched, the check falls back to matching keywords such as login and sign in all <form> tags. Other methods may also be employed to determine whether a login window is present.
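The login window check described above can be sketched with Python's standard `html.parser` module. This is a minimal illustration under two assumptions that the text leaves open: only <input> tags inside a <form> are examined, and the fallback keywords are matched against the attribute values of the <form> tag itself. The names `LoginFormDetector` and `has_login_window` are hypothetical.

```python
from html.parser import HTMLParser


class LoginFormDetector(HTMLParser):
    """Track password-like inputs and login-like form attributes."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.password_input = False
        self.form_keyword = False

    def handle_starttag(self, tag, attrs):
        # Join all attribute values so keywords can match name, id, type, etc.
        text = " ".join(value.lower() for _, value in attrs if value)
        if tag == "form":
            self.in_form = True
            if any(k in text for k in ("login", "sign")):
                self.form_keyword = True
        elif tag == "input" and self.in_form:
            if any(k in text for k in ("password", "pass")):
                self.password_input = True

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def has_login_window(html: str) -> bool:
    """Primary check on <input> keywords, fallback on <form> keywords."""
    detector = LoginFormDetector()
    detector.feed(html)
    detector.close()
    return detector.password_input or detector.form_keyword
```

Substring matching is deliberately loose and will produce false positives (for example, a form whose id contains "design"), so a real deployment would likely refine the keyword rules.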
However, according to the above characteristics of the URL format, there may be variability in the URL format, and multiple URLs that are not identical may point to the same link address, for example, www.xxxx.com and http:// www.xxxx.com:80 are the same link address, and there may be multiple pairs of usernames and passwords with the same access right in the URL link of the ftp protocol, so that even in ftp connections with different usernames and passwords, the pointed access files are consistent, for example, ftp:// username: password @ ftp.
Therefore, in view of the variability of the above-mentioned URL format, a malicious URL, which is a simple change, without changing the nature of the malicious link, the content of the malicious URL is not completely consistent with the features in the virus library to be matched, so that the malicious two-dimensional code is missed, many hackers use such a vulnerability, often transform the format of the URL address of the horse-hanging website to avoid recognition, and some malicious codes, when propagating themselves through the network, often modify the value in the query part of the horse-hanging URL address, and the value may be randomly generated, thereby ensuring that the malicious URL link has longer timeliness.
In an embodiment of the present disclosure, for the above situation, a set of formalization rules can be defined to convert a URL string into uniform, formalized data to be detected, which is matched against a feature library to determine whether the URL is a malicious link. The purpose of the formalization rules is to eliminate the variability of the URL string format, discard redundant information that has no practical significance for detection, and supplement default information that is absent from the URL string, so as to form a URL string to be detected with the format 'scheme://domain:port/path'.
Specifically, a formalization detection method for a malicious URL in a two-dimensional code is provided, which includes:
a, according to RFC specifications, splitting a URL to be detected into syntax element character strings according to a URL syntax structure;
b, extracting specified character strings from the character strings obtained by splitting, wherein the specified character strings comprise protocols, domain names, port numbers and paths;
c, judging whether the protocol character string and the port number character string exist or not, and performing completion processing on the nonexistent character string part;
d, reordering the character strings obtained after completion processing to obtain a new URL, and calculating the hash value of the new URL to serve as the hash value corresponding to the URL to be detected;
and e, traversing the malicious URL feature library, and comparing and detecting the feature data in the malicious URL feature library with the hash value corresponding to the URL to be detected.
Further, the completion processing of the nonexistent protocol character string or the nonexistent port number character string includes the steps of:
judging whether the protocol character string exists or not, and if not, supplementing HTTP as a default protocol;
and judging whether the port number character string exists or not, and if not, supplementing a default port number according to the protocol type in the protocol character string.
Further, supplementing the default port according to the protocol type in the protocol string comprises supplementing 80 as a default port number if the protocol type is an HTTP protocol; supplementing 21 as a default port number if the protocol type is the FTP protocol; other protocols are processed in a unified mode, the port number is not supplemented, and an empty character string is added to serve as the port number.
The malicious URL feature library is a list of hash values built in advance: each malicious URL captured by an antivirus vendor is treated as a URL to be detected, and its corresponding hash value is obtained through steps a to d.
The method regularizes the string format to address the variability of malicious URL link addresses: it discards the variable parts that are meaningless for detection, supplements the parts omitted by default, and forms data to be detected that carries sufficient information. The regularized URL string has the format 'scheme://domain:port/path' and retains the protocol, the domain name, the port, and the path. These data completely determine the address that a URL points to, so detecting data in this format is equivalent to detecting the source URL. To form the URL string to be detected, the default HTTP protocol is added if the 'scheme' part is omitted; if the 'port' part is omitted, the default 80 is added for the HTTP protocol and 21 for the FTP protocol; and the user name, password, query_string, and fragment_id parts defined in the RFC specification of the URL are deleted. For example, the URL link address 'www.test.com/main/index.html' is converted into 'http://www.test.com:80/main/index.html' after formatting with the above rules. To facilitate detection and to control the size of the feature library, the URL string to be detected is hashed, and the hash is used as the detection data.
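Steps a to e can be illustrated with a short sketch built on Python's urllib.parse. The function names normalize_url, url_hash, and is_malicious are hypothetical, and SHA-256 is an assumed choice: the disclosure calls for a hash value but does not name a hash algorithm.

```python
import hashlib
from urllib.parse import urlsplit

# Default ports named in the text; other protocols get an empty port string.
DEFAULT_PORTS = {"http": "80", "ftp": "21"}


def normalize_url(url):
    """Return the URL in the form 'scheme://domain:port/path'."""
    # Step c (protocol): supplement HTTP when no protocol is present.
    if "://" not in url:
        url = "http://" + url
    # Steps a and b: split the URL and keep protocol, domain, port, path.
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # drops any username:password@
    # Step c (port): supplement the default port for the protocol.
    if parts.port is not None:
        port = str(parts.port)
    else:
        port = DEFAULT_PORTS.get(scheme, "")
    # Step d: reassemble; the query string and fragment are discarded.
    return f"{scheme}://{host}:{port}{parts.path}"


def url_hash(url):
    """Step d: hash the normalized URL (SHA-256 is an assumption)."""
    return hashlib.sha256(normalize_url(url).encode("utf-8")).hexdigest()


def is_malicious(url, feature_library):
    """Step e: compare against a set of hashes of known malicious URLs."""
    return url_hash(url) in feature_library
```

With these rules, 'www.test.com/main/index.html' and 'http://www.test.com:80/main/index.html?x=1#top' produce the same hash, so a feature library built from one form also matches the other.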
In another embodiment of the present disclosure, if the URL to be detected in the two-dimensional code is a short URL, it is first restored to the corresponding long URL by a restoration method, and processing then begins at step a. In addition, for URLs that cannot be judged by rules (such as RFC specifications), feature fields are extracted to build a prediction file, and the URLs are predicted by a classifier that is trained and continuously updated offline.
Specifically, the classifier is trained offline as follows: relevant features of URLs are extracted from a URL knowledge base to construct a training file, and a model is then trained, optimized, and stored using a classification algorithm, the classification algorithm being at least one of a decision tree, a support vector machine, logistic regression, and a random forest, or a combination of several of them. The offline training of the classifier is updated periodically or aperiodically as the URL knowledge base changes. When the maliciousness of a URL that cannot be judged by the predefined rules is to be detected, the relevant feature fields of the URL are extracted to construct a prediction file, and the stored model is then applied to the prediction file to obtain and output a prediction result.
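The train, store, and predict cycle can be sketched with a deliberately tiny stand-in model. The example below fits a depth-1 decision tree (a decision stump) in pure Python; it is a simplification chosen for brevity, not the classifier of the disclosure. The lexical features in extract_features, the toy training URLs, and all function names are assumptions for illustration.

```python
import json


def extract_features(url):
    """Hypothetical lexical features; the real feature fields may differ."""
    return [
        float(len(url)),                         # URL length
        float(url.count(".")),                   # number of dots
        float(sum(ch.isdigit() for ch in url)),  # number of digits
        float("@" in url),                       # contains '@'
    ]


def train_stump(samples, labels):
    """Pick the (feature, threshold) pair with the fewest training errors.

    A sample is predicted malicious when its feature value exceeds the
    threshold; only this one direction is tried, to keep the sketch short.
    """
    best = None
    for feature in range(len(samples[0])):
        for threshold in sorted({row[feature] for row in samples}):
            errors = sum(
                (row[feature] > threshold) != bool(label)
                for row, label in zip(samples, labels)
            )
            if best is None or errors < best["errors"]:
                best = {"feature": feature, "threshold": threshold, "errors": errors}
    return best


def predict(model, url):
    """Apply a stored model to the features of a single URL."""
    return extract_features(url)[model["feature"]] > model["threshold"]


# Toy knowledge base: (URL, 1 = malicious, 0 = benign). Invented for the example.
TRAINING_DATA = [
    ("http://www.example.com/index.html", 0),
    ("http://shop.example.org/cart", 0),
    ("http://192.168.10.25/login.php?id=8841", 1),
    ("http://paypa1-secure.example.net/verify/0093771", 1),
]

samples = [extract_features(url) for url, _ in TRAINING_DATA]
labels = [label for _, label in TRAINING_DATA]
model = train_stump(samples, labels)

stored = json.dumps(model)     # "store" the trained model
restored = json.loads(stored)  # load it again before prediction
```

Retraining when the knowledge base changes amounts to rerunning train_stump on the new data and replacing the stored model.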
In another embodiment of the disclosure, the URL in the binary data of the two-dimensional code pattern to be detected is parsed and matched against the hash library of malicious URLs; if the match succeeds, the two-dimensional code to be detected is a malicious two-dimensional code. Specifically, the URL in the two-dimensional code is extracted, white-listed URLs are filtered out, and a Bloom filter is used to store the malicious URLs of the malicious URL library.
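A Bloom filter stores set membership compactly at the cost of occasional false positives, never false negatives. The sketch below is a minimal version; the bit-array size, the number of hash functions, the use of salted SHA-256 digests, and the names BloomFilter and is_malicious_code are all assumptions for illustration.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter backed by a bytearray."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive several bit positions by salting the item with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def is_malicious_code(url, whitelist, malicious_filter):
    """Filter white-listed URLs first, then query the Bloom filter."""
    if url in whitelist:
        return False
    return url in malicious_filter
```

Because a Bloom filter can report a false positive, a positive answer could be confirmed against the full malicious URL library before the two-dimensional code is blocked.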
In another embodiment of the present disclosure, the server may collect in advance illegal information such as malicious links, trojans, viruses, and URLs of phishing websites, and add it to the blacklist; likewise, the server may collect in advance security information such as legitimate links and URLs of legitimate websites, and add it to the white list. The blacklist and the white list can be updated regularly, and the updated lists are sent to the user equipment; alternatively, the user equipment logs in to the server side to download the blacklist and the white list collected by the server side in advance.
In this embodiment, a server-side website information query technique can be adopted to intercept malicious websites, with the security information of websites queried in real time from the server side serving as the basis library. Whenever the user equipment is about to access a website, it can query the server side in real time as to whether the website is safe and decide based on the query information returned by the server side: a malicious website is intercepted in real time, and no action is taken for a normal website. Meanwhile, the security information of websites is updated seamlessly on the server side, so that webpage browsing on the user equipment is protected quickly and effectively in real time.
In addition, the URL of the webpage accessed by the user equipment browser, the URL linked in the webpage content and the URL of the downloaded file are encrypted into ciphertext through a cloud query protocol and sent to the server side, the server side carries out intelligent analysis and comparison in a server side website library according to the website information ciphertext submitted by the user equipment, the comparison result is returned to the user equipment, and the user equipment determines whether the webpage access behavior of the user equipment browser is safe or not according to the judgment result returned by the server side.
In another embodiment of the present disclosure, URL matching is performed on a URL included in the content of the communication message and a preset URL blacklist; if the URL is successfully matched, the URL is stored in a malicious URL library, otherwise: matching the IP address of the sending terminal equipment of the communication message with a preset IP address blacklist; and if the IP address is successfully matched, storing the URL into a malicious URL library.
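The two-stage match (URL blacklist first, sender IP blacklist second) can be sketched as below. The regular expression used to pull URLs out of a message and the function name collect_malicious_urls are assumptions for the example.

```python
import re

# Simplified URL pattern, adequate for an illustration only.
URL_PATTERN = re.compile(r"(?:https?|ftp)://[^\s<>\"']+")


def collect_malicious_urls(message, sender_ip, url_blacklist, ip_blacklist,
                           malicious_url_library):
    """Add URLs from a message to the library if either blacklist matches."""
    for url in URL_PATTERN.findall(message):
        # The URL blacklist is checked first; the sender's IP address is
        # only consulted when the URL itself is not blacklisted.
        if url in url_blacklist or sender_ip in ip_blacklist:
            malicious_url_library.add(url)
    return malicious_url_library
```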
The code scanning amount in the present disclosure includes: in the user dimension, the number of times the user scans codes and the number of different codes scanned by the user; in the equipment dimension, the number of code scans performed by the equipment; and in the code dimension, the number of users who have identified each two-dimensional code. If a two-dimensional code has been identified by more than a certain number of users (for example, and not by way of limitation, more than 1000), the two-dimensional code may be considered essentially free of suspicion of fraud.
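The three dimensions of the code scanning amount can be computed from a scan log in a single pass. In this sketch the event layout (user id, device id, code id) and the function names are assumptions; the default of 1000 users follows the example figure in the text.

```python
from collections import defaultdict


def scan_statistics(events):
    """Aggregate scan events given as (user_id, device_id, code_id) tuples."""
    user_scans = defaultdict(int)
    user_codes = defaultdict(set)
    device_scans = defaultdict(int)
    code_users = defaultdict(set)
    for user_id, device_id, code_id in events:
        user_scans[user_id] += 1           # user dimension: total scans
        user_codes[user_id].add(code_id)   # user dimension: distinct codes
        device_scans[device_id] += 1       # equipment dimension
        code_users[code_id].add(user_id)   # code dimension
    return {
        "user_scans": dict(user_scans),
        "user_distinct_codes": {u: len(c) for u, c in user_codes.items()},
        "device_scans": dict(device_scans),
        "code_users": {c: len(u) for c, u in code_users.items()},
    }


def is_widely_scanned(stats, code_id, min_users=1000):
    """True when more than min_users distinct users identified the code."""
    return stats["code_users"].get(code_id, 0) > min_users
```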
Whether a scan is the first scan of a code is used as a label in determining whether the code is a fraudulent two-dimensional code. Specifically, if a user is the first code scanning user of the current two-dimensional code, that user may be a fraudster performing a scan test, because after manufacturing a fraudulent two-dimensional code the fraudster needs to scan it once himself to check whether it behaves as expected, for example, whether it jumps to a specified phishing page, downloads a specified trojan program, or presents a specified transaction code.
The two-dimensional code associated case analysis refers to complaints filed by users who were defrauded after identifying the two-dimensional code. Generally, the more complaining users are associated with a two-dimensional code, the more likely the two-dimensional code is a fraudulent two-dimensional code.
By combining at least one of the above characteristics with historical fraudulent two-dimensional code data, the two-dimensional code recognition component 204 establishes a fraudulent two-dimensional code recognition algorithm, analyzes the two-dimensional code scanned by the reporting user, and finally determines whether the two-dimensional code is a fraudulent two-dimensional code. By way of example and not limitation, in one embodiment of the disclosure, a two-dimensional code is determined to be a fraudulent two-dimensional code if its type is an external link code, the code scanning action is identifying the two-dimensional code from an album, and the two-dimensional code id characteristics conform to one or more characteristics of a fraudulent URL. By way of example and not limitation, in another embodiment of the present disclosure, a two-dimensional code is determined to be a fraudulent two-dimensional code if the number of complaints against it exceeds a certain threshold. By way of example and not limitation, in yet another embodiment of the disclosure, if a user is the first code scanner for at least a threshold percentage (e.g., 80%, 70%, or another threshold percentage) of the two-dimensional codes that the user has identified (e.g., the user has identified 100 two-dimensional codes and was the first code scanner of 80 of them), the two-dimensional codes that the user scanned first are considered to be fraudulent two-dimensional codes.
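The three example rules can be written down directly. In the sketch below, the record layout, the string labels for code type and scan action, and the complaint threshold of 5 are assumptions; the 80% ratio is the example value from the text. Combining the first two rules with a logical OR is also an assumption, since the disclosure presents them as separate embodiments.

```python
from dataclasses import dataclass

COMPLAINT_THRESHOLD = 5   # assumed value; the text says "a certain threshold"
FIRST_SCAN_RATIO = 0.8    # example percentage given in the text


@dataclass
class ScannedCode:
    code_type: str         # e.g. "external_link" or "transaction" (assumed labels)
    scan_action: str       # e.g. "album" or "camera" (assumed labels)
    url_feature_hits: int  # number of fraudulent-URL characteristics matched
    complaint_count: int   # number of complaints associated with the code


def is_fraud_code(code):
    """Apply the external-link rule and the complaint rule."""
    album_link_rule = (
        code.code_type == "external_link"
        and code.scan_action == "album"
        and code.url_feature_hits >= 1
    )
    complaint_rule = code.complaint_count > COMPLAINT_THRESHOLD
    return album_link_rule or complaint_rule


def first_scan_fraud_codes(identified, first_scanned, ratio=FIRST_SCAN_RATIO):
    """Flag a user's first-scanned codes when the first-scan share is high.

    identified: set of codes the user has identified.
    first_scanned: the subset for which the user was the first scanner.
    """
    if identified and len(first_scanned) / len(identified) >= ratio:
        return set(first_scanned)
    return set()
```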
The two-dimensional code identification component 204 automatically cleans the code scanning data of deceived users every day and performs fraud risk identification. It writes the fraudulent two-dimensional codes recognized each day into a full table, i.e., a database of fraudulent two-dimensional codes, partitioned by date, and ensures that the data in the table are not duplicated.
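A date-partitioned table without duplicates can be modeled with a dictionary keyed by date. This is only an in-memory illustration; the function name write_daily_codes and the table layout are assumptions, and a production system would use a real database with partitions.

```python
def write_daily_codes(table, date, codes):
    """Append newly recognized fraud codes to the partition for a date.

    table maps a date string to the list of codes first recorded that day.
    Codes already present in any partition are skipped, so the full table
    never holds a duplicate. Returns the codes that were actually added.
    """
    known = {code for partition in table.values() for code in partition}
    fresh = []
    for code in codes:
        if code not in known:
            known.add(code)
            fresh.append(code)
    table.setdefault(date, []).extend(fresh)
    return fresh
```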
When a two-dimensional code is determined to be a fraudulent two-dimensional code, the prior art typically just blocks the two-dimensional code directly, whereas in the present disclosure the derivative risk identification component 206 first determines all users of each fraudulent two-dimensional code, i.e., all users who have identified the two-dimensional code.
The derivative risk identification component 206 then determines the first code-scanning user of the fraudulent two-dimensional code as the fraudster, rather than directly blocking the two-dimensional code, because a fraudster typically identifies a fraudulent two-dimensional code himself for testing after producing it. For the identified fraudster, the derivative risk identification component 206 monitors all two-dimensional codes that the fraudster subsequently manufactures or scans first and passes these two-dimensional codes as suspected fraudulent two-dimensional codes to the risk management component 208 for risk assessment and reflow.
In another embodiment of the present disclosure, if a user has identified more than a threshold number of fraudulent two-dimensional codes (counting both first and non-first scans), the derivative risk identification component 206 identifies the user as a suspected fraudster. As will be appreciated by those skilled in the art, the number of fraudulent two-dimensional codes that an ordinary user identifies is limited, so a user who has identified an excessive number of them can be considered a suspected fraudulent user. The two-dimensional codes that the suspected fraudulent user has identified or newly identifies are therefore closely monitored to find more fraudulent two-dimensional codes.
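The two ways of flagging users described above (first scanner of a fraudulent code, and more than a threshold number of fraudulent codes identified) can be sketched over a chronological scan log. The log layout, the function name find_fraudsters, and the default threshold of 3 are assumptions.

```python
def find_fraudsters(scans, fraud_codes, threshold=3):
    """Derive fraudsters, suspected fraudsters, and codes to monitor.

    scans: chronological list of (user_id, code_id) tuples.
    fraud_codes: set of codes already determined to be fraudulent.
    """
    first_scanner = {}
    fraud_seen = {}
    for user, code in scans:
        first_scanner.setdefault(code, user)  # earliest scanner wins
        if code in fraud_codes:
            fraud_seen.setdefault(user, set()).add(code)
    # The first scanner of a fraudulent code is treated as its fraudster.
    fraudsters = {first_scanner[c] for c in fraud_codes if c in first_scanner}
    # Users who identified more fraudulent codes than the threshold.
    suspected = {
        user for user, codes in fraud_seen.items()
        if len(codes) > threshold and user not in fraudsters
    }
    # Other codes first scanned by a fraudster are passed on for assessment.
    watchlist = {
        code for code, user in first_scanner.items()
        if user in fraudsters and code not in fraud_codes
    }
    return fraudsters, suspected, watchlist
```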
In another embodiment of the present invention, the derivative risk identification component 206 obtains all the two-dimensional codes that the fraudulent user has identified in the past and identifies in the future, and transmits the information of these two-dimensional codes to the risk management component 208 for risk evaluation, so that fraudulent two-dimensional codes can be traced and prevented and more fraudulent two-dimensional codes can be found. By way of example and not limitation, among all two-dimensional codes that a fraudulent user has identified or identifies in the future, those for which the fraudulent user is the first code-scanning user are highly likely to be fraudulent two-dimensional codes.
In another embodiment of the invention, derivative risk identification component 206 periodically collects evidence of fraud by the identified fraudsters and pushes it to the police for accurate offline strikes.
In addition, the derivative risk identification component 206 determines each non-first code-scanning user of the fraudulent two-dimensional code as a deceived person. In one embodiment of the present disclosure, if what the deceived person identified is a transaction code, the derivative risk identification component 206 determines the current suspicious collection medium associated with the current suspicious transaction conducted with the transaction code. In one embodiment of the present disclosure, the current suspicious collection medium may be the fraudster's own online or offline funds account, or another person's online or offline funds account that the fraudster controls.
In another embodiment of the present disclosure, if what the deceived person identified is a non-transaction code, the derivative risk identification component 206 lists the deceived user as a key security control object. Specifically, when a deceived user identifies a non-transaction fraudulent two-dimensional code and a phishing page is opened, the derivative risk identification component 206 prevents the user from submitting the page and informs the user of the security risk; or, when a deceived user identifies a non-transaction fraudulent two-dimensional code and a Trojan program starts to download, the derivative risk identification component 206 prevents the Trojan program from being downloaded and informs the user of the risk.
In addition to determining the current suspicious transaction and the current suspicious collection media, derivative risk identification component 206 also determines historical suspicious transactions and historical suspicious collection media associated with the fraudster. As will be appreciated by those skilled in the art, historical suspicious transactions may include all historical transactions related to the current fraud case. By way of example and not limitation, historical suspicious transactions may also be all transactions with non-acquaintances within the 12 hours prior to the current transaction, and so on.
After derivative risk identification component 206 determines the current suspicious collection medium and the historical suspicious collection medium, risk management component 208 continuously intercepts suspicious transactions conducted through these collection media to prevent the fraudster from receiving and transferring the stolen money. These suspicious transactions may refer to all transactions associated with these suspicious collection media. In addition, risk management component 208 also identifies users other than the fraudster who have transacted with, or have attempted or are attempting to remit money to, the suspicious collection media (including current and historical suspicious collection media) as potentially deceived users, thereby enabling, through the current fraud case, the discovery of fraudsters involved in other fraud cases and of fraudulent two-dimensional codes associated with those cases, so as to achieve self-loop-derived risk identification.
By way of example and not limitation, after the two-dimensional code identification component 204 determines a fraudulent two-dimensional code A, the following steps are performed: (1) The fraudster 1 and the deceived persons 1-N (i.e., all non-first-scan users of the fraudulent two-dimensional code A) associated with the fraudulent two-dimensional code A are identified. Then, for the deceived persons, (2) the current suspicious transactions and the historical suspicious transactions carried out by the deceived persons 1-N through the fraudulent two-dimensional code A are determined, the current and historical suspicious collection media related to these transactions are determined from them, and suspicious transactions carried out through these collection media are continuously intercepted. Next, (3) potentially deceived persons 1-M who have transacted with, or attempted to transact through, the suspicious collection media are identified. Subsequently, (4) the one or more two-dimensional codes used by the potentially deceived persons 1-M to conduct the suspicious transactions are determined and traversed, and a fraudulent two-dimensional code B among them is determined. For the fraudster, (5) all two-dimensional codes that the fraudster newly manufactures or scans for the first time are identified as fraudulent two-dimensional codes C.
For the newly identified one or more fraudulent two-dimensional codes B and C, steps (1) to (5) above are repeated: the fraudster associated with the fraudulent two-dimensional codes B and C (the fraudster 1, or one or more other fraudsters 2) and the associated deceived persons 1 to K are identified, and the series of derived risk identification and risk management steps is performed again. Repeating this cycle finds more fraudsters, more deceived persons, and more fraudulent two-dimensional codes, and each newly identified fraudulent two-dimensional code triggers another execution of steps (1) to (5), thereby realizing self-circulating derived risk identification and risk management.
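The self-circulating steps (1) to (5) behave like a worklist algorithm: every newly found fraudulent code is queued and processed in turn until nothing new appears. The sketch below runs over in-memory logs. The data layout, the function name derive_risk, and the is_fraud callback (a stand-in for the two-dimensional code identification component) are assumptions, and historical suspicious transactions are not modeled.

```python
from collections import deque


def derive_risk(seed_codes, scans, transactions, is_fraud):
    """Run steps (1) to (5) until no new fraudulent code is found.

    scans: chronological list of (user_id, code_id).
    transactions: list of (payer_id, code_id, payee_account).
    is_fraud: callable deciding whether a candidate code is fraudulent.
    """
    first_scanner = {}
    for user, code in scans:
        first_scanner.setdefault(code, user)

    fraud_codes, fraudsters, deceived, blocked = set(), set(), set(), set()
    queue = deque(seed_codes)
    while queue:
        code = queue.popleft()
        if code in fraud_codes:
            continue
        fraud_codes.add(code)
        # (1) First scanner is the fraudster; other scanners are deceived.
        fraudster = first_scanner.get(code)
        if fraudster is not None:
            fraudsters.add(fraudster)
        victims = {u for u, c in scans if c == code and u != fraudster}
        deceived |= victims
        # (2) Collection media used in the victims' transactions are blocked.
        accounts = {a for p, c, a in transactions if c == code and p in victims}
        blocked |= accounts
        # (3)+(4) Codes used by other payers to the same media are evaluated.
        for payer, other, account in transactions:
            if (account in accounts and payer not in victims
                    and other not in fraud_codes and is_fraud(other)):
                queue.append(other)
        # (5) Other codes first scanned by the fraudster are treated as fraud.
        for other, user in first_scanner.items():
            if user == fraudster and other not in fraud_codes:
                queue.append(other)
    return fraud_codes, fraudsters, deceived, blocked
```

Starting from a single seed code, the loop ends when the queue is empty, which corresponds to a cycle in which no further fraudulent two-dimensional code is discovered.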
In one embodiment of the present disclosure, upon identifying a potentially deceived user, the risk management component 208 further identifies all suspicious transactions by that user within a period of time before and after the occurrence of the fraud and causes all of these transactions to fail. In particular, the risk management component 208 determines the time at which the potentially deceived user conducts or attempts to conduct a transaction with a suspicious collection medium, identifies suspicious transactions within a certain time period (e.g., 12 hours) around that time, and causes all of them to fail, to prevent the potentially deceived user from being cheated.
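Selecting the transactions that fall inside the time window is a simple filter. In this sketch, transactions are dictionaries with a "time" field and the function name transactions_to_fail is hypothetical; the 12-hour default follows the example in the text.

```python
from datetime import datetime, timedelta


def transactions_to_fail(transactions, fraud_time, window=timedelta(hours=12)):
    """Return the transactions within the window before or after fraud_time."""
    return [t for t in transactions if abs(t["time"] - fraud_time) <= window]
```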
FIG. 3 illustrates a block diagram of the seed input component 202 according to one embodiment of the present disclosure. Referring to FIG. 3, the seed input component 202 includes a reporting user two-dimensional code receiving subcomponent 302 for receiving as seed input a two-dimensional code provided by a reporting user in association with a reported fraudulent case, and a potentially deceived user two-dimensional code receiving subcomponent 304 for receiving a two-dimensional code through which a potentially deceived user conducts a suspicious transaction in association with a suspicious collection medium.
In one embodiment of the present disclosure, fraud encountered by the user may refer to the user remitting to the fraudster by identifying a transaction two-dimensional code made by the fraudster; the user opens a phishing website made by the fraudster by identifying an external link two-dimensional code made by the fraudster and inputs sensitive information and causes the assets or funds to be stolen; the user downloads the Trojan horse program made by the fraudster by identifying the external link two-dimensional code made by the fraudster and causes sensitive information to be leaked or funds to be stolen, and the like. The technical solution in the present disclosure is not limited to the above-described case.
By way of example and not limitation, when a user finds that fraud is encountered, the user reports the fraud case through the mobile computing device or the desktop computing device and uploads information related to the fraud case to the server. Seed input component 202, implemented on the server, then receives this information via its reporting user two-dimensional code receiving subcomponent 302 and determines therefrom the two-dimensional code associated with the fraudulent case, i.e., the two-dimensional code used by the fraudulent user to commit fraud.
In one embodiment of the disclosure, after a two-dimensional code is determined to be a fraudulent two-dimensional code, the fraudster and the deceived persons of the fraudulent two-dimensional code are determined, and the suspicious collection media used by the fraudster through the fraudulent two-dimensional code are determined. Subsequently, users associated with suspicious transactions conducted through these suspicious collection media are identified as potentially deceived users. In one embodiment of the present disclosure, by way of example and not limitation, a potentially deceived user being associated with suspicious transactions conducted via the suspicious collection media may mean that the user has had a transaction with the suspicious collection media, or has attempted or is attempting to conduct a transaction via the suspicious collection media.
Subsequently, the potentially deceived user two-dimensional code receiving subcomponent 304 in the seed input component 202 receives the information of the potentially deceived users, determines from it the two-dimensional codes that the potentially deceived users used to conduct these suspicious transactions, and inputs these codes as additional seeds to discover more fraudulent two-dimensional codes, more fraudsters, and more deceived persons.
When the two-dimensional codes used by the reporting user and the potentially deceived users are determined, the seed input component 202 passes these two-dimensional codes to the two-dimensional code identification component 204.
Fig. 4 illustrates a block diagram of two-dimensional code identification component 204, according to one embodiment of the present disclosure. Referring to fig. 4, when the two-dimensional code recognition component 204 receives a two-dimensional code from the seed input component 202, the received two-dimensional code is recognized by the two-dimensional code recognition algorithm 402 to determine whether it is a rogue two-dimensional code.
Specifically, the two-dimensional code recognition component 204 uses the two-dimensional code recognition algorithm 402 to determine whether the two-dimensional code is a fraudulent two-dimensional code based on the two-dimensional code type, the code scanning action, the two-dimensional code id characteristics, the code scanning amount, whether the code is scanned for the first time, and the two-dimensional code associated case analysis. If a fraudulent two-dimensional code is determined, it is added to the fraud two-dimensional code library 404, with a check that it does not duplicate a fraudulent two-dimensional code already in the library.
In one embodiment of the present disclosure, the fraud two-dimensional codes identified daily are written into a full table, i.e., the fraud two-dimensional code library 404, in date divisions.
In one embodiment of the present disclosure, the two-dimensional code recognition component 204 periodically samples and checks the accuracy of the two-dimensional code recognition and the cases of missed detection, analyzes the causes, and adjusts or optimizes the two-dimensional code recognition algorithm 402 accordingly to further improve the recognition accuracy.
FIG. 5 illustrates a block diagram of the derivative risk identification component 206 according to one embodiment of the present disclosure. Referring to FIG. 5, when a fraudulent two-dimensional code is received, derivative risk identification component 206 determines, via a fraudster/deceived person determination subcomponent 502, a fraudster (i.e., the user that first scanned the fraudulent two-dimensional code) and a deceived person (i.e., a user that scanned the fraudulent two-dimensional code but was not the first to do so) associated with the received fraudulent two-dimensional code. Subsequently, for the identified fraudster, derivative risk identification component 206 monitors all two-dimensional codes subsequently newly manufactured or first scanned by the fraudster via new fraud two-dimensional code monitoring subcomponent 504 and passes these two-dimensional codes as suspected fraud two-dimensional codes 508 to risk management component 208 for risk assessment and reflow.
For an identified deceived person, derivative risk identification component 206 first determines, via the suspicious collection media determination subcomponent 506, a fraudulent transaction conducted by the deceived person through the fraudulent two-dimensional code, and thereby determines the suspicious collection media 510 used for the fraudulent transaction.
In one embodiment of the present disclosure the suspect collection media 510 also includes historical suspect collection media associated with historical transactions conducted within a certain time period prior to the current fraudulent transaction conducted by the fraudster to discover more fraudulent cases.
Fig. 6 illustrates a block diagram of the risk management component 208 according to one embodiment of the present disclosure. Referring to fig. 6, risk management component 208 identifies and adds the received suspected rogue two-dimensional code as a rogue two-dimensional code 608 to the rogue two-dimensional code repository through rogue two-dimensional code identification and reflow subcomponent 602.
In one embodiment of the present disclosure, the fraud two-dimensional code recognition and reflow subcomponent 602 may directly recognize all suspected fraud two-dimensional codes, i.e., all two-dimensional codes that were subsequently newly manufactured or first scanned by a fraudster, as fraud two-dimensional codes 608, add them to the fraud two-dimensional code library 404, and ensure that they do not duplicate fraud two-dimensional codes already in the library. This triggers the fraudster/deceived person determination subcomponent 502 to obtain the new fraudsters and new deceived persons associated with these fraud two-dimensional codes and to perform the subsequent risk prevention and control steps, thereby enabling self-loop-derived risk recognition.
In another embodiment of the present disclosure, the fraud two-dimensional code identification and reflow subcomponent 602 can also evaluate the received suspected fraud two-dimensional code by the two-dimensional code identification component 204 to determine if it is a fraud two-dimensional code 608 and, if it is determined to be a fraud two-dimensional code, add it to the fraud two-dimensional code repository 404 and trigger subsequent loop-derived risk prevention steps, as described above.
Returning to FIG. 6, a collection medium interception subcomponent 604 in risk management component 208 is used to continually intercept all transactions related to suspicious collection medium 510, including current suspicious transactions and historical suspicious transactions, to prevent further fraud.
In addition, a potentially deceived person determination subcomponent 606 in risk management component 208 further identifies users other than the current fraudster associated with such suspicious transactions as potentially deceived persons 610 and provides them to the potentially deceived user two-dimensional code receiving subcomponent 304, which receives the two-dimensional codes through which the potentially deceived users conducted suspicious transactions associated with the suspicious collection media. The two-dimensional code identification component 204 subsequently determines whether these are fraudulent two-dimensional codes, thereby enabling the discovery of further fraudulent two-dimensional codes for self-loop derived risk prevention and control.
FIG. 7 illustrates a data flow diagram for fraud risk identification according to one embodiment of the present disclosure. Referring to fig. 7, the seed input component 202 receives and passes the two-dimensional code 702 provided by the reporting user to the two-dimensional code recognition component 204. Two-dimensional code identification component 204 determines whether received two-dimensional code 702 is a rogue two-dimensional code based on a rogue two-dimensional code identification algorithm. If the received two-dimensional code 702 is a fraudulent two-dimensional code, the two-dimensional code identification component 204 passes the identified fraudulent two-dimensional code 704 to the derivative risk identification component 206.
Subsequently, derivative risk identification component 206 determines a fraudster (typically the producer and first scanner of the fraudulent two-dimensional code) and a deceived person (typically a non-first scanner) associated with the fraudulent two-dimensional code 704. For the fraudster, derivative risk identification component 206 monitors the two-dimensional codes 706 that the fraudster subsequently manufactures or scans first and submits them to risk management component 208 for further evaluation and reflow. In this way, a fraudulent two-dimensional code newly manufactured by the fraudster is found and recorded in the prevention and control system before the fraudster has used it, so that fraud prevention and control can be carried out in advance. For the deceived person, derivative risk identification component 206 determines the transactions conducted by the deceived person via the fraudulent two-dimensional code 704, including current suspicious transactions as well as historical suspicious transactions, for example and not by way of limitation, non-acquaintance transactions within a period of time prior to the current suspicious transaction. Derivative risk identification component 206 thus determines the suspicious collection media 708, including current and historical suspicious collection media, used for these suspicious transactions. Derivative risk identification component 206 then passes these suspicious collection media 708, along with the two-dimensional codes 706 newly identified by the fraudster, to risk management component 208.
Risk management component 208 identifies the two-dimensional code 706 first scanned by the fraudster as a new fraudulent two-dimensional code 710 and provides it to derivative risk identification component 206 to determine new fraudsters (if any) and new victims (if any). In the same way that new fraudulent two-dimensional code 710 was discovered, a two-dimensional code 718 first scanned by a new fraudster is discovered and identified as a new fraudulent two-dimensional code 722, thereby enabling self-loop derivative risk management for the fraudster.
In addition, risk management component 208 identifies the users associated with suspicious collection media 708 as potential fraudsters 712 and provides them to seed input component 202, so that two-dimensional code identification component 204 can determine whether a two-dimensional code 714 scanned by a potential fraudster is a fraudulent two-dimensional code. More new fraudulent two-dimensional codes 716 are discovered in this way, as with new fraudulent two-dimensional code 710 described above, and from them new suspicious collection media 720 and new potential fraudsters 722, as with suspicious collection media 708 and potential fraudsters 712 described above, thereby enabling self-loop derivative risk management for the victim.
As shown in FIG. 7, whenever a new fraudulent two-dimensional code is found, the subsequent derivative risk control steps for the fraudster and for the victim can be triggered, so that a fraudster's new fraudulent two-dimensional codes can be identified in advance and risk control efficiency is improved. Whenever a new fraudulent two-dimensional code is found during those derivative risk control steps, further derivative risk control steps are triggered again, and the process repeats in a loop to realize self-circulating, automatic fraud risk identification and prevention and control. As will be appreciated by those skilled in the art, the vertical ellipses in FIG. 7 represent the continuation of the self-loop derivative risk management steps shown.
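By way of illustration only, the fraudster-side loop of FIG. 7 can be sketched as a worklist that expands from a seed fraudulent two-dimensional code to the users who produced or first scanned it, and from those users to the other codes they produced or first scanned. The record layout (`producer`, `first_scanner`) and the function name `derive_fraud_codes` are hypothetical and do not appear in the disclosure. The sketch follows the embodiment in which every such code is treated directly as fraudulent, and for brevity it ignores the requirement that the codes be subsequent in time.

```python
from collections import deque


def derive_fraud_codes(seed_codes, code_records):
    """Expand seed fraud codes into derived fraud codes and fraudsters.

    `code_records` maps a code id to a dict with the hypothetical keys
    "producer" and "first_scanner" (both user ids).
    """
    fraud_codes = set(seed_codes)
    fraudsters = set()
    queue = deque(seed_codes)
    while queue:
        code = queue.popleft()
        record = code_records.get(code)
        if record is None:
            continue  # nothing is known about this code
        for user in (record["producer"], record["first_scanner"]):
            if user in fraudsters:
                continue  # this user's codes were already expanded
            fraudsters.add(user)
            # Every other code this user produced or first scanned is
            # treated as a new fraud code and re-enters the loop.
            for other, other_record in code_records.items():
                if other in fraud_codes:
                    continue
                if user in (other_record["producer"], other_record["first_scanner"]):
                    fraud_codes.add(other)
                    queue.append(other)
    return fraud_codes, fraudsters
```

The loop terminates because each code and each user is processed at most once. An implementation of the other embodiment would replace the unconditional `fraud_codes.add(other)` with a call to the identification algorithm.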
Fig. 8 shows a flow diagram of a method 800 for fraud risk identification according to one embodiment of the present disclosure.
In step 802, a two-dimensional code provided by a reporting user is received as a seed input. The received two-dimensional code is associated with a fraud case reported by the user.
At step 804, the received two-dimensional code is analyzed to determine whether it is a fraudulent two-dimensional code. The identification may be based on various factors, such as code type, code scanning action, code id characteristics, code scanning volume analysis, whether the code is scanned for the first time, and code association case analysis.
When a fraudulent two-dimensional code is identified, the method 800 continues with the following steps:
At step 806, a fraudster and a victim associated with the fraudulent two-dimensional code are determined. In one embodiment of the present disclosure, a fraudster may refer to the first code-scanning user of an identified fraudulent two-dimensional code, while a victim may refer to a non-first code-scanning user of the fraudulent two-dimensional code.
At step 808, the two-dimensional codes that the fraudster subsequently scans for the first time are examined to identify new fraudulent two-dimensional codes. In one embodiment of the disclosure, all two-dimensional codes subsequently scanned for the first time or newly manufactured by the fraudster are directly recognized as fraudulent two-dimensional codes. In another embodiment of the present disclosure, all two-dimensional codes subsequently scanned for the first time or newly manufactured by the fraudster are evaluated based on code type, code scanning action, code id characteristics, code scanning volume analysis, whether the code is scanned for the first time, and code association case analysis to determine whether they are fraudulent two-dimensional codes. The method 800 then returns to step 806 to implement self-loop derivative risk prevention and control for the fraudster.
At step 810, the current suspicious transaction and the historical suspicious transactions performed by the victim through the fraudulent two-dimensional code are determined. In one embodiment of the present disclosure, the historical suspicious transactions may include all historical transactions related to the current fraud case. By way of example and not limitation, the historical suspicious transactions may also be all transactions with non-acquaintances within the 12 hours prior to the current transaction.
At step 812, a current suspect collection medium and a historical suspect collection medium associated with the current suspect transaction and the historical suspect transaction are determined.
At step 814, suspicious transactions associated with the current suspicious collection media and the historical suspicious collection media are intercepted on an ongoing basis. In one embodiment of the present disclosure, the suspicious transactions may refer to all transactions associated with the suspicious collection media.
At step 816, the users associated with the suspicious transactions, other than the victim, are identified as potential fraudsters. In one embodiment of the present disclosure, upon identifying a potential fraudster, all suspicious transactions involving the potential fraudster within a period of time before and after the occurrence of the fraud are further identified, and these transactions are all made to fail in order to prevent fraud.
At step 818, the two-dimensional codes associated with the suspicious transactions are determined. The method 800 then returns to step 802 to implement self-loop derivative risk prevention and control for the victim.
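As a non-limiting illustration, steps 810 through 816 can be sketched as follows for the example in which historical suspicious transactions are non-acquaintance transactions within the 12 hours before the current transaction. The transaction fields (`id`, `payer`, `payee`, `medium`, `time`), the `acquaintances` set and the function name are all hypothetical, and treating the payee of a suspicious collection medium as the potential fraudster is an assumption of this sketch rather than a statement of the disclosure.

```python
from datetime import datetime, timedelta


def suspicious_media_and_payees(transactions, victim, current_tx_id,
                                acquaintances, window=timedelta(hours=12)):
    """Return (suspicious collection media, potential fraudsters)."""
    current = next(t for t in transactions if t["id"] == current_tx_id)
    suspicious = [current]  # step 810: the current suspicious transaction
    for t in transactions:
        if t["id"] == current_tx_id or t["payer"] != victim:
            continue
        # Step 810: historical transactions of the victim with a
        # non-acquaintance inside the look-back window.
        in_window = current["time"] - window <= t["time"] < current["time"]
        if in_window and t["payee"] not in acquaintances:
            suspicious.append(t)
    # Step 812: collection media used by the suspicious transactions.
    media = {t["medium"] for t in suspicious}
    # Step 816 (assumed reading): payees on those media, excluding the victim.
    potential_fraudsters = {t["payee"] for t in transactions
                            if t["medium"] in media} - {victim}
    return media, potential_fraudsters
```

The returned media could then drive the interception of step 814, and the codes scanned by the returned users could be fed back to step 802 as new seed input.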
FIG. 9 shows a flow diagram of a method 900 for formal detection of malicious URLs in two-dimensional codes according to one embodiment of the present disclosure. The method comprises the following steps:
at 902, splitting the URL to be detected into syntax element character strings according to the URL syntax structure defined in the RFC specifications;
at 904, extracting specified character strings, including a protocol, a domain name, a port number, and a path, from the character strings obtained by splitting;
at 906, judging whether the protocol character string and the port number character string exist, and performing completion processing on any character string that does not exist;
at 908, reordering the completed character strings to obtain a new URL, and calculating a hash value of the new URL as the hash value corresponding to the URL to be detected;
at 910, traversing the malicious URL feature library and comparing the feature data in the malicious URL feature library with the hash value corresponding to the URL to be detected.
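Purely as an illustrative sketch, steps 902 through 910 might be realized as shown below. The default values follow claims 2 and 3 (HTTP as the default protocol, port 80 for HTTP, port 21 for FTP, an empty port string otherwise). The following are assumptions of the sketch and are not specified in the disclosure: the reassembly order `protocol://domain:port/path`, the use of SHA-256 as the hash function, the substitution of `/` for an empty path, the representation of the feature library as a Python set (so that a membership test stands in for the traversal of step 910), and all function names.

```python
import hashlib
from urllib.parse import urlsplit

# Default ports per claim 3; any other protocol gets an empty port string.
DEFAULT_PORTS = {"http": "80", "ftp": "21"}


def normalize_url(url):
    """Split, complete and reassemble a URL (steps 902 to 908)."""
    scheme, sep, rest = url.strip().partition("://")
    if not sep:
        # No protocol string present: supplement HTTP as the default.
        scheme, rest = "http", scheme
    scheme = scheme.lower()
    parts = urlsplit("//" + rest)  # the "//" prefix makes urlsplit parse a netloc
    host = parts.hostname or ""    # hostname is returned in lower case
    if parts.port is not None:
        port = str(parts.port)
    else:
        port = DEFAULT_PORTS.get(scheme, "")
    path = parts.path or "/"       # assumption: an empty path is treated as "/"
    return f"{scheme}://{host}:{port}{path}"


def url_hash(url):
    """Hash of the normalized URL (step 908); SHA-256 is an assumed choice."""
    return hashlib.sha256(normalize_url(url).encode("utf-8")).hexdigest()


def build_feature_library(malicious_urls):
    """Hash list built by passing known malicious URLs through the same steps."""
    return {url_hash(u) for u in malicious_urls}


def is_malicious(url, feature_library):
    """Step 910: compare the URL's hash against the feature library."""
    return url_hash(url) in feature_library
```

Because completion happens before hashing, variants such as `example.com/login` and `HTTP://Example.com:80/login` map to the same hash value and are matched by a single library entry.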
As will be appreciated by one of skill in the art, the steps in this flowchart may be performed by hardware (e.g., processors, engines, memory, circuitry), software (e.g., operating system, applications, drivers, machine/processor-executable instructions), or a combination thereof. As one of ordinary skill in the art will appreciate, embodiments may include more or fewer steps than those shown.
Embodiments of the present invention are described above with reference to block diagrams and/or operational illustrations of methods and systems according to embodiments of the invention. The functions/acts noted in the blocks may occur out of the order noted in any flowchart. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
The above specification, examples and data provide a complete description of the manufacture and use of the composition of the invention. Since many embodiments of the invention can be made without departing from the spirit and scope of the invention, the invention resides in the claims hereinafter appended.
Claims (10)
1. A malicious two-dimensional code detection method, comprising the following steps:
according to RFC specifications, splitting a URL to be detected in the two-dimensional code into syntax element character strings according to a URL syntax structure;
extracting specified character strings including a protocol, a domain name, a port number and a path from the character strings obtained by splitting;
judging whether the protocol character string and the port number character string exist or not and performing completion processing on the nonexistent character string part;
reordering the character strings obtained after completion processing to obtain a new URL, calculating a hash value of the new URL, and taking the hash value as the hash value corresponding to the URL to be detected; and
traversing the malicious URL feature library, and comparing the feature data in the malicious URL feature library with the hash value corresponding to the URL to be detected.
2. The method of claim 1, wherein the completion processing of the non-existent protocol string or the non-existent port number string comprises the steps of:
judging whether the protocol character string exists or not, and if not, supplementing HTTP as a default protocol;
and judging whether the port number character string exists or not, and if not, supplementing a default port number according to the protocol type in the protocol character string.
3. The method of claim 2, wherein supplementing a default port according to a protocol type in the protocol string comprises:
supplementing 80 as a default port number if the protocol type is the HTTP protocol;
supplementing 21 as a default port number if the protocol type is the FTP protocol;
other protocols are processed uniformly without supplementing port numbers, and empty character strings are added as port numbers.
4. The method of claim 1, further comprising, if the URL to be detected is a short URL, restoring the short URL to its corresponding long URL using a restoration method, and using the long URL as the URL to be detected.
5. The method of claim 1, further comprising extracting feature fields to construct a prediction file for a URL that cannot be judged according to RFC specifications, and model-predicting the URL by an offline-trained and constantly-updated classifier.
6. The method of claim 5, wherein the offline training mode of the classifier is:
constructing a training file based on relevant characteristics of the extracted URL in the malicious URL characteristic library; and
training, optimizing and storing the model by using a classification algorithm, wherein the classification algorithm comprises at least one of a decision tree, a support vector machine, logistic regression, and a random forest, or a combination of several of these,
wherein offline training of the classifier is updated periodically or non-periodically as the malicious URL feature library changes.
7. The method of claim 1, wherein if the two-dimensional code corresponding to the URL to be detected is received in a communication message, further performing IP address matching on an IP address of a sending-end device of the communication message with a preset IP address blacklist, and if the IP address matching is successful, storing the URL to be detected in the malicious URL feature library.
8. The method of claim 1, wherein the malicious URL feature library is a hash value list consisting of hash values corresponding to each malicious URL obtained by taking each malicious URL captured previously as a URL to be detected and passing through the steps in the method.
9. A computer system comprising means for performing the method of any one of claims 1-8.
10. A computer-readable storage medium having instructions that, when executed, cause a machine to perform the method of any of claims 1-8.
Publications (1)
Publication Number | Publication Date |
---|---|
HK40016958A (en) | 2020-09-18 |