CN119341816B

CN119341816B - Phishing website detection method based on YOLOv5 and Resnet-101

Info

Publication number: CN119341816B
Application number: CN202411472624.3A
Authority: CN
Inventors: 朱二周; 刘豪; 赵俊
Original assignee: Anhui University
Current assignee: Anhui University
Priority date: 2024-10-22
Filing date: 2024-10-22
Publication date: 2025-10-21
Anticipated expiration: 2044-10-22
Also published as: CN119341816A

Abstract

The invention discloses a phishing website detection method based on YOLOv5 and Resnet-101. The method comprises constructing and training a phishing website detection network model that comprises a target detection module and a similarity calculation module. The target detection module combines a YOLOv5s network with an attention module, and the similarity calculation module is based on a Resnet-101 network. Target information of the website to be detected is obtained through the target detection module; the similarity calculation module then performs feature extraction and cosine similarity calculation on the detection target Logo image and the legal Logo image. When the cosine similarity is higher than a set threshold, the detection target Logo image is judged to be correct, and the website to be detected is marked as a legal website. The invention requires no training on phishing data and has a low time cost, high detection accuracy, and strong extensibility.

Description

Phishing website detection method based on YOLOv5 and Resnet-101

Technical Field

The invention belongs to the field of network security technology, and particularly relates to a phishing website detection method based on YOLOv5 and Resnet-101.

Background

In recent years, the number of phishing incidents has increased dramatically, and phishing detection methods based on URLs, HTML, and website screenshots have emerged to cope with the increasingly severe threat. Phishing detection based on target detection aims to identify the key information of a webpage, namely the legal brand Logo, in a website screenshot. A binary phishing report (legal or illegal) is then generated for the user in combination with the domain name extracted from the URL.

Common conventional target detection algorithms include HOG+SVM and DPM. These methods have limitations in target detection tasks: feature extraction depends on manual design, sliding windows and candidate region generation are inefficient, detection precision is low, and multi-target detection is difficult to handle. Logos in webpages are mostly small targets, so low detection precision leads to a large number of missed detections and false alarms.

With the development of deep learning, many target detection algorithms based on convolutional neural networks have been proposed. They learn features automatically from data, eliminate manual feature design, and improve both detection speed and detection precision. YOLO is one of the representative algorithms and has been improved continuously since it first appeared. YOLOv5 inherits the high speed and efficiency of the YOLO family and is further optimized to be more lightweight, achieving a better balance between accuracy and speed. Thanks to its range of model sizes, ease of use, and strong community support, YOLOv5 has become one of the most widely applied target detection algorithms and is suitable for practical use in environments ranging from embedded devices to high-performance computing. YOLOv5 is therefore used to detect legal Logos in webpages.

The original YOLOv5 model has lower detection precision on small targets. To improve it, an attention module is embedded in the YOLOv5 feature extraction network, which strengthens the model's ability to learn features of small targets. Such modules make the model focus on the key regions of small targets by dynamically adjusting weights along the channel and spatial dimensions.

Because YOLOv5 detects very quickly, its detection results can be verified by another means after detection, which further raises the detection accuracy. A Resnet-101 pretrained on the large ImageNet dataset extracts features from the improved YOLOv5 detection result and from the screenshot of the legal brand Logo, and the cosine similarity of the two feature vectors is calculated. The detection is considered correct when the similarity is higher than a certain threshold; otherwise, the detection is considered to have failed.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a phishing website detection method based on YOLOv5 and Resnet-101.

To solve the above problems, the invention detects legal brand Logos in a webpage with an improved YOLOv5 and uses Resnet-101 to extract features from the target detection result and from the real brand Logo. An interpretable phishing detection report is ultimately generated for the user. The report provides detailed information that helps the user decide whether to access the target webpage.

The technical scheme is as follows: a phishing website detection method based on YOLOv5 and Resnet-101 comprises the following steps:

Step S1: acquiring legal Logos and making an image data set, which is split into a training set and a verification set for the YOLOv5s+SE target detection module; acquiring the URL addresses and corresponding webpage screenshots of legal websites, then acquiring the URL addresses and corresponding webpage screenshots of phishing websites; merging the two types of data and dividing them into a phishing detection data set and a target detection data set;

Step S1: first acquiring the URL address of a legal website and the corresponding legal Logo image, then acquiring the URL address of a phishing website and the corresponding webpage screenshot; combining the two types of data as the phishing detection data set, and at the same time dividing out the target detection data set, which comprises the legal Logo images and the phishing webpage screenshots;

Step S2: constructing and training a phishing website detection network model, wherein the phishing website detection network model comprises a target detection module and a similarity calculation module, the target detection module combines a YOLOv5s network with an attention module, and the similarity calculation module is based on a Resnet-101 network;

Step S3: inputting the URL address of the website to be detected and the corresponding webpage screenshot into the trained phishing website detection network model, and obtaining target information of the website to be detected through the target detection module;

The target information comprises category information and coordinate information. A legal Logo image and the corresponding URL domain name are obtained according to the category information; a target area is cropped from the webpage screenshot according to the coordinate information, and the cropped target area is scaled into a detection target Logo image whose height is consistent with that of the obtained legal Logo image;

Step S4: performing feature extraction on the detection target Logo image and the legal Logo image obtained in step S3 by using the similarity calculation module, calculating a cosine similarity value, and judging that the detection target Logo image is correct when the obtained cosine similarity value is higher than a set threshold;

Step S5: combining the domain name in the URL address with the correct detection target Logo image to generate an interpretable phishing detection report.
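
The cropping and scaling described in step S3 can be illustrated with a short geometric sketch. It only computes the clamped crop box and the size of the scaled crop; keeping the aspect ratio while matching the height is an assumption, since the text requires only that the heights agree. The function names and the sample numbers are hypothetical.

```python
def clamp_box(box, img_w, img_h):
    """Clamp an (x1, y1, x2, y2) pixel box to the screenshot bounds."""
    x1, y1, x2, y2 = box
    x1, x2 = sorted((max(0, min(x1, img_w)), max(0, min(x2, img_w))))
    y1, y2 = sorted((max(0, min(y1, img_h)), max(0, min(y2, img_h))))
    return x1, y1, x2, y2


def scaled_size(box, ref_height):
    """Size (width, height) of the crop after its height is scaled to
    ref_height. Assumption: the aspect ratio of the crop is preserved."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    if w <= 0 or h <= 0:
        raise ValueError("empty detection box")
    return max(1, round(w * ref_height / h)), ref_height


# Hypothetical detection on a 1280x720 screenshot; legal Logo height is 64 px.
box = clamp_box((100, 40, 340, 120), img_w=1280, img_h=720)
print(scaled_size(box, ref_height=64))  # (192, 64)
```

An image library would then crop the screenshot to the clamped box and resize the crop to the computed size before feature extraction.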

Further, in step S1, the legal Logo images are randomly rotated and scaled, different backgrounds are added, and invalid URL addresses in the data set are deleted.

Further, the target detection module is based on the YOLOv5s network, and an attention module (a CBAM attention module or an SE attention module) is added before the spatial pyramid pooling fast (SPPF) module of the YOLOv5s network. SPPF accelerates the spatial pyramid pooling computation while retaining the ability to fuse multi-scale features, and the added attention module improves the detection precision.

According to the category information in the target information, a legal domain database and a legal Logo image database are queried to obtain the legal domain name and the legal Logo image corresponding to the detection target.
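
To make the channel attention idea concrete, the sketch below applies the standard squeeze-and-excitation (SE) computation to a tiny feature map in plain Python: global average pooling, two fully connected layers with ReLU and sigmoid, then channel-wise rescaling. It is only an illustration of the general SE mechanism; biases are omitted, the weights are made-up values, and the patent does not specify the reduction ratio or where the weights come from.

```python
import math


def se_reweight(feature_map, w1, w2):
    """Apply SE channel attention to a feature map.

    feature_map: list of C channels, each an H x W list of floats.
    w1: (C // r) x C weights of the first FC layer (r = reduction ratio).
    w2: C x (C // r) weights of the second FC layer.
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation: FC -> ReLU -> FC -> sigmoid yields a gate in (0, 1).
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every value of a channel by that channel's gate.
    scaled = [[[v * g for v in row] for row in ch]
              for ch, g in zip(feature_map, gates)]
    return scaled, gates


# Two 2x2 channels and illustrative (not trained) weights.
fmap = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
scaled, gates = se_reweight(fmap, w1=[[1.0, 1.0]], w2=[[1.0], [-1.0]])
print(gates)  # first channel is emphasized, second is suppressed
```

In a real detector the same operation would run on tensors inside the network, with the weights learned during training.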

Further, the processing procedure of the similarity calculation module is as follows:

Step 4.1: first resizing the input image to a fixed size and normalizing it, so that the pixel values are scaled to a specific range; then inputting the resulting image into the ResNet-101 model and performing forward propagation through each layer of the ResNet-101 model to obtain a feature vector with dimension 1×1×2048;

Step 4.2: calculating the cosine similarity between the feature vector of step 4.1 and the features of the legal Logo image.
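
Step 4.2 and the threshold decision of step S4 can be sketched in a few lines. The feature vectors here are short made-up lists standing in for the 2048-dimensional ResNet-101 features; the default threshold of 0.6 is the value selected in the embodiment, and using "greater than or equal to" follows the embodiment's wording. Returning 0.0 for a zero vector is an assumption of this sketch.

```python
import math

SIMILARITY_THRESHOLD = 0.6  # value selected in the embodiment


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector is treated as dissimilar
    return dot / (norm_a * norm_b)


def logo_is_correct(detected_feat, legal_feat, threshold=SIMILARITY_THRESHOLD):
    """True when the detected Logo is judged to match the legal Logo."""
    return cosine_similarity(detected_feat, legal_feat) >= threshold


# Toy 2-dimensional "features" for illustration only.
print(logo_is_correct([1.0, 0.0], [1.0, 1.0]))  # True, cosine is about 0.707
print(logo_is_correct([1.0, 0.0], [1.0, 2.0]))  # False, cosine is about 0.447
```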

Further, the specific method for generating the interpretable phishing report in step S5 is as follows:

Based on the correct Logo image in the webpage screenshot and its corresponding list of legal domain names, and in combination with the domain name information extracted from the URL, four types of phishing detection report are obtained:

(a) A legal website;

(b) A phishing website that uses a legal Logo to deceive the user;

(c) A phishing website that uses a legal domain name to deceive the user;

(d) A phishing website without legal information.

The invention has the advantages that it requires no training on phishing data and has a low time cost, high detection accuracy, and strong extensibility. The target detection module of the invention uses the spatial pyramid pooling fast (SPPF) module of YOLOv5s, which accelerates the spatial pyramid pooling computation while retaining the ability to fuse multi-scale features. In addition, an attention module is added in front of the SPPF module, before feature fusion, which improves the detection accuracy.

Drawings

FIG. 1 is a schematic overall flow chart of the present invention;

FIG. 2 is a schematic diagram of a target detection module according to the present invention;

FIG. 3 is a schematic diagram of a target detection result in an embodiment;

FIG. 4 is a schematic diagram of the Resnet-101 structure in an embodiment;

FIG. 5 is a schematic diagram of the number of labels of 16 categories of the target detection training set in an embodiment;

FIG. 6 is a graph showing the performance of 10 models at different cosine similarity thresholds in an embodiment;

FIG. 7 is a schematic diagram of the interpretable phishing detection reports of the present invention.

Detailed Description

The technical scheme of the present invention is described in detail below, but the scope of the present invention is not limited to the embodiments.

As shown in FIG. 1, the phishing website detection method based on YOLOv5 and Resnet-101 of the invention comprises the following steps:

In step S1 of this embodiment, the legal Logo images are randomly rotated and scaled, different backgrounds are added, and invalid URL addresses in the data set are deleted.

For example, in this embodiment, 16 categories of Logos from 5 brands are first collected, as shown in FIG. 5: google, search, chrome, mail, map, play, chat, picture, meeting, amazon_1, amazon_2, alibaba, twitter_1, twitter_2, facebook_1, and facebook_2. The legal Logo images are then rotated, scaled, and placed on different backgrounds, and LabelMe is used to label the legal Logos in the images, producing the training set and the verification set for the target detection module. The training set of the target detection module contains 435 images and the verification set contains 201 images.

Meanwhile, the PhishTank website is accessed to acquire the URLs of phishing websites reported since 2023 together with their webpage screenshots; the data are cleaned and URLs that are no longer valid are deleted. The cleaned phishing webpage images are divided into two parts: one part contains only images, in which the legal Logos are labeled with LabelMe, and serves as the test set of the target detection module; the other part contains the images and their URLs and is used for the phishing detection test set. In addition, 131 legal website URLs and the corresponding webpage screenshots are collected manually and divided between the test set of the target detection module and the phishing detection data set. The test set of the target detection module contains 2203 URLs and their website images, of which 2122 URLs are phishing websites and 81 are legal websites. The phishing detection test set contains 1108 URLs and their website images, of which 1058 URLs are phishing websites and 50 are legal websites.

The target detection module is based on the YOLOv5s network, with an attention module added before the spatial pyramid pooling fast (SPPF) module of the YOLOv5s network. According to the category information in the target information, the legal domain database and the legal Logo image database are queried to obtain the legal domain names and the legal Logo image corresponding to the detected Logo. The processing procedure of the similarity calculation module in this embodiment is as follows:

For example, all target crops obtained in the previous step are taken as a data set, denoted Target, and two values for each image, its category and its judgment value, are recorded row by row in a csv file in the same order as in Target. The judgment value is either 0 or 1, where 1 means the detection is correct and 0 means it is wrong. Ten models are selected, namely Resnet-101, Resnet-50, and EfficientNet-B0 to EfficientNet-B7, and the following steps are performed for each model:

A Target image is obtained, and the legal Logo image database is queried according to the category in the corresponding csv row to obtain the legal Logo. The model extracts features from both images, and the similarity is calculated with the cosine similarity formula. The output is 1 when the similarity is higher than a certain threshold, indicating that the two images are similar; otherwise the output is 0, indicating that they are dissimilar. Finally, the output is compared with the judgment value to determine whether the decision is correct.

Examples

mAP, Accuracy, Precision, Recall, and F₁-Score are used as the evaluation metrics of the model.

In the field of computer vision, Average Precision (AP) is commonly used to evaluate the performance of target detection and image classification models. AP measures the average precision of the model in predicting a particular category of object over the complete dataset. As shown in equation (1), the calculation of AP considers both precision and recall by forming a precision-recall curve (recall on the x-axis, precision on the y-axis), and the value of AP is the area under that curve:

$$AP = \int_0^1 P(R)\, dR \qquad (1)$$

where P(R) is the precision at recall R.

For multi-category target detection tasks, the AP value of a single category may not fully measure the effectiveness of the detection model. It is therefore necessary to average the AP values of all categories in the dataset. As shown in equation (2), the mean AP (mAP) reflects the overall performance of the detection model:

$$mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i \qquad (2)$$

where N is the number of categories and AP_i is the AP of the i-th category.
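
As a rough numerical illustration of the AP and mAP definitions above, the sketch below integrates a precision-recall curve with the trapezoidal rule and averages per-category AP values. The integration rule, the function names, and the sample numbers are assumptions made for illustration; detection toolkits often use an interpolated precision envelope instead, and the text does not state which variant was used.

```python
def average_precision(recalls, precisions):
    """Area under a precision-recall curve by trapezoidal integration.

    Both lists describe the same curve points, sorted by increasing recall.
    """
    area = 0.0
    for i in range(1, len(recalls)):
        width = recalls[i] - recalls[i - 1]
        area += width * (precisions[i] + precisions[i - 1]) / 2.0
    return area


def mean_average_precision(ap_per_category):
    """Mean of the per-category AP values (dict: category -> AP)."""
    return sum(ap_per_category.values()) / len(ap_per_category)


# Made-up curve: precision stays at 1.0 up to recall 0.5, then falls to 0.5.
ap = average_precision([0.0, 0.5, 1.0], [1.0, 1.0, 0.5])
print(ap)  # 0.875
print(mean_average_precision({"google": ap, "amazon_1": 0.625}))  # 0.75
```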

TABLE 1 mAP of eight target detection models

TABLE 2 Performance of the five YOLOv5 models

Table 1 gives the experimental results for the 8 target detection models. In this table, mAP_Train and mAP_test represent the mAP values of the training and testing phases, respectively, Time is the average time cost of detecting one image, and detectedNum is the number of Logos detected. The results in Table 1 show that all 8 models have high mAP values in the test phase of target detection. However, the YOLOv8 models (YOLOv8s, YOLOv8m, and YOLOv8l) are not selected, for two reasons. (1) The YOLOv8 series identifies many background elements as targets. The target detection test set has only 201 labeled Logos, yet, as shown in column 5 of Table 1, YOLOv8m and YOLOv8l each identified more than 800 Logos; the YOLOv8 models therefore produce too many false positives. (2) The YOLOv8 models incur a high time cost. As shown in column 4 of Table 1, the minimum detection time of the YOLOv8 series is higher than that of the YOLOv5 series.

To determine the most effective target detection model, the detection results of the remaining 5 models (YOLOv5s, YOLOv5m, YOLOv5l, and the YOLOv5s+CBAM and YOLOv5s+SE of the present invention) are first compared with the ground-truth annotations in the target detection verification set. The evaluation indexes Precision, Accuracy, Recall, and F₁-Score are then calculated, as shown in FIG. 6.

The results are shown in Table 2. Adding the CBAM or SE attention module to YOLOv5s significantly improves all four index values compared with YOLOv5s alone. Furthermore, YOLOv5s+CBAM and YOLOv5s+SE outperform the more complex YOLOv5m and YOLOv5l models. Adding an attention mechanism to YOLOv5s is thus an effective approach. The results in Table 2 also show that the Precision of YOLOv5s+SE is 20.2% higher than that of YOLOv5s, which means that YOLOv5s+SE produces very few false positives in target prediction and makes fewer errors of identifying background elements as Logos. Its F₁-Score, the largest among the compared models, indicates that the overall performance of YOLOv5s+SE is the best.

Based on the Logo and its category detected in the webpage screenshot by the target detection module, the similarity calculation module obtains the corresponding legal Logo image from the legal Logo image library. The feature extraction module then extracts features from the detected Logo and from the corresponding legal Logo image, and the cosine similarity between the two is calculated.

To verify the performance of the feature extraction module in the technical scheme of the invention, 10 different models, namely Resnet-50, Resnet-101, and EfficientNet-B0 to EfficientNet-B7, are used to extract features from Logo images. The experimental results show that the Resnet-101 adopted by the invention performs best.

After the detection target Logo image is obtained, it and the legal Logo image are converted to feature vectors, and cosine similarity evaluates how similar they are by computing the cosine of the angle between the two vectors. If the angle is close to 0 degrees, the cosine is close to 1, indicating that the two vectors are similar. Conversely, if the angle is close to 180 degrees, the cosine is close to -1, indicating that the two vectors are dissimilar. Cosine similarity ranges over [-1, 1], where 1 represents two vectors pointing in the same direction, -1 represents two vectors pointing in opposite directions, and 0 represents two orthogonal vectors.

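For example, given two vectors a and b, where a represents the feature of the detected Logo image extracted by Resnet-101 and b represents the feature of the legal Logo image extracted by Resnet-101, their cosine similarity can be calculated by the following formula:

$$\cos(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{a}\| \, \|\mathbf{b}\|} = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2} \, \sqrt{\sum_{i=1}^{n} b_i^2}}$$

where n is the dimension of the feature vectors.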

In the experimental results shown in FIG. 6, when the cosine similarity threshold is set between 0.5 and 0.6, the recall of the Resnet-50 and Resnet-101 models is close to 1, and the precision of Resnet-101 is higher than 0.85 throughout, as shown in FIG. 6(a). Furthermore, the Accuracy and F₁-Score of Resnet-101 are the best among the 10 models, as shown in FIGS. 6(b) and 6(c). The results show that two different images can be distinguished by combining the feature extraction capability of the Resnet-101 model with cosine similarity calculation. Resnet-101 obtains its highest F₁-Score (0.926) when the cosine similarity threshold reaches 0.6. Since F₁-Score is a comprehensive index reflecting the overall performance of a model, the similarity threshold is set to 0.6. In other words, when the similarity is greater than or equal to the threshold, the target detection result is considered correct.
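
The threshold selection described above can be sketched as a small sweep: for each candidate threshold, the similarity scores are turned into 0/1 decisions, compared with the recorded judgment values, and the threshold with the highest F₁-Score is kept. The similarity scores, labels, and candidate list below are invented for illustration and are not the experimental data.

```python
def prf1(similarities, labels, threshold):
    """Precision, recall and F1 when 'similarity >= threshold' predicts 1."""
    preds = [1 if s >= threshold else 0 for s in similarities]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def best_threshold(similarities, labels, candidates):
    """Candidate threshold with the highest F1 (first one wins on ties)."""
    return max(candidates, key=lambda t: prf1(similarities, labels, t)[2])


# Invented scores and judgment values (1 = detection is correct).
sims = [0.92, 0.71, 0.63, 0.58, 0.41, 0.30]
labels = [1, 1, 1, 0, 0, 0]
print(best_threshold(sims, labels, [0.4, 0.5, 0.6, 0.7]))  # 0.6
```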

Step S5 of this embodiment generates four types of interpretable phishing reports, as shown in FIG. 7:

(a) Legal website. A URL is considered a legal website only if the domain name extracted from the URL matches one of the legal domain names corresponding to the correct Logo image. For example, the report reads "Legal// Domain: di Logo li", where di is the domain name and li is the category of the correct Logo image.

(b) Phishing website that uses a legal Logo to deceive the user. The target detection and similarity calculation modules determine that the Logo image on the target webpage is authentic, but the input URL does not contain the domain name to which the Logo belongs. In this case, the report is "Phish// No domain. Deceive the user with legal logo".

(c) Phishing website that uses a legal domain name to deceive the user. The domain name contained in the input URL is consistent with a legal domain name in the legal domain name database, but the target detection and similarity calculation modules determine that the Logo image in the screenshot is not authentic. In this case, the report is "Phish// Fake logo. Deceive the user with legal domain".

(d) Phishing website without legal information. The webpage screenshot contains no Logo image, and the URL contains no legal domain name. In this case, the report is "Phish// No Legal Information".
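
The four cases above reduce to two yes/no checks, one on the Logo and one on the domain name. The sketch below is one possible way to encode that decision; the helper names are hypothetical, the report strings are paraphrased from the list above, and treating subdomains of a legal domain as matches is an assumption that the text does not spell out.

```python
from urllib.parse import urlparse


def url_in_domains(url, legal_domains):
    """True if the URL host equals, or is a subdomain of, a legal domain.

    Assumption: subdomains count as matches; the URL must include a scheme.
    """
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in legal_domains)


def phishing_report(logo_ok, domain_ok):
    """Map the Logo check and the domain check to one of the four reports."""
    if logo_ok and domain_ok:
        return "Legal"
    if logo_ok:
        return "Phish // No domain. Deceives the user with a legal logo"
    if domain_ok:
        return "Phish // Fake logo. Deceives the user with a legal domain"
    return "Phish // No legal information"


# Hypothetical inputs: an authentic Logo shown on a look-alike host.
domain_ok = url_in_domains("https://google.com.evil.example/login", ["google.com"])
print(phishing_report(logo_ok=True, domain_ok=domain_ok))
```

Matching on the parsed host rather than on a substring of the URL avoids accepting look-alike hosts that merely contain a legal domain name.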

Claims

1. A phishing website detection method based on YOLOv5 and Resnet-101, characterized by comprising the following steps:

Step S1: Obtain legitimate logos and create an image dataset, which is then split into a training set and a validation set for the target detection module. Obtain the URL address and corresponding webpage screenshots of legitimate websites, then obtain the URL address and corresponding webpage screenshots of phishing websites, merge the two types of data, and then divide them into a phishing detection dataset and a target detection dataset, respectively.

Step S2: constructing and training a phishing website detection network model, wherein the phishing website detection network model includes a target detection module and a similarity calculation module; the target detection module combines a YOLOv5s network and an attention module, and the similarity calculation module includes a Resnet-101 network;

Step S3: Input the URL address of the website to be detected and the corresponding webpage screenshot into the trained phishing website detection network model, and obtain the target information of the website to be detected through the target detection module;

The target information includes category information and coordinate information. Based on the category information, a legitimate logo image and the corresponding URL domain name are obtained. Based on the coordinate information, the target area is cropped from the webpage screenshot and scaled into a detection target logo image whose height is consistent with that of the obtained legitimate logo image.

Step S4: Use a similarity calculation module to perform feature extraction on the detection target logo image and the legal logo image obtained in step S3, and then calculate the cosine similarity value. When the obtained cosine similarity value is higher than a set threshold, the detection target logo image is judged to be correct;

Compare the URL domain name corresponding to the legal logo image obtained in step S3 based on the category information with the URL domain name of the website to be detected. If the two are consistent, the URL domain name of the website to be detected is determined to be legal;

Step S5: Combine the domain name detection result and the logo image detection result to generate an interpretable phishing detection report. The specific method for generating an interpretable phishing report is as follows:

Based on the correct logo image in the webpage screenshot and its corresponding legitimate domain name list, combined with the domain name information extracted from the URL, four phishing detection reports are obtained:

(a) Legal website;

(b) Phishing websites that use legitimate logos to deceive users;

(c) Phishing websites that use legitimate domain names to deceive users;

(d) Phishing websites that do not contain legitimate information.

2. The phishing website detection method based on YOLOv5 and Resnet-101 according to claim 1 is characterized in that in step S1, the legitimate logo image is randomly rotated and scaled, and different backgrounds are added; and invalid URL addresses in the data set are deleted.

3. The phishing website detection method based on YOLOv5 and Resnet-101 according to claim 1, characterized in that the target detection module is based on the YOLOv5s network, and an attention module is added before the spatial pyramid pooling fast module SPPF of the YOLOv5s network;

According to the category information in the target information, the legal domain database and legal logo image database are queried to obtain the legal logo domain name and legal logo image corresponding to the detection target.

4. The phishing website detection method based on YOLOv5 and Resnet-101 according to claim 1, wherein the processing process of the similarity calculation module is as follows:

Step S4.1: First, resize the input image to a fixed size and normalize it, scaling the pixel values to a specific range. Then, input the resulting image into the ResNet-101 model and perform forward propagation through the layers of the ResNet-101 model to obtain a feature vector of dimension 1x1x2048.

Step S4.2: Calculate the cosine similarity between the feature vector obtained in step S4.1 and the features of the legal logo image.