WO2014032619A1

WO2014032619A1 - Web address access method and system

Info

Publication number: WO2014032619A1
Application number: PCT/CN2013/082729
Authority: WO
Inventors: 肖鹏; 李晓波; 宋申雷; 刘起
Original assignee: Beijing Qihoo Technology Co Ltd; Qizhi Software Beijing Co Ltd
Current assignee: Beijing Qihoo Technology Co Ltd; Qizhi Software Beijing Co Ltd
Priority date: 2012-08-31
Filing date: 2013-08-30
Publication date: 2014-03-06
Anticipated expiration: 2015-02-28
Also published as: CN102833258B; CN102833258A

Description

Web address access method and system

Technical Field

The present invention relates to the field of network technologies, and in particular to a web address access method and system.

Background

Among all the channels through which Trojan horses and malware spread, more than 70% of security threats originate from web browsing. The main vectors include Trojans planted in web pages, phishing, and malicious downloads. Malicious websites of all kinds seriously threaten users' personal information security, national information security, and the healthy development of the Internet, so real-time interception of malicious websites is a core function that every information security vendor must provide.

In the prior art, the malicious URL library resides on the local client. Because malicious websites on the Internet change constantly, the malicious URL library must be regenerated continuously, and the prior art can maintain its interception effectiveness only by having the local client repeatedly upgrade to a new local library. However, the upgrade cycle of a local malicious URL library is too long and tends to lag: it cannot keep pace with the endless stream of new malicious URLs appearing on the Internet, so malicious websites cannot be intercepted quickly and effectively.

Summary of the Invention

In view of the above problems, the present invention is proposed to provide a web address access method, and a corresponding web address access system, that overcome the above problems or at least partially solve them.

According to one aspect of the present invention, a web address access method is provided, comprising:

The client obtains the web address information corresponding to the web address requested to be accessed;

The client extracts a web address ciphertext from the web address information;

The client submits the web address ciphertext to a server;

The server matches the web address ciphertext against the ciphertexts stored in a database;

If the web address ciphertext matches a ciphertext marked as a malicious web address in the database, a malicious web address query result is returned to the client, and the client blocks access to the web address according to the malicious web address query result;

If the web address ciphertext does not match any ciphertext marked as a malicious web address in the database, a normal web address query result is returned to the client, and the client continues accessing the web address according to the normal web address query result.
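The client/server exchange described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the text does not name the function that produces the "ciphertext", so a SHA-256 hex digest is assumed here, and the `Server` class, `query` method, and `client_access` function are hypothetical names with an in-memory set standing in for the server-side database.

```python
import hashlib


def feature_value(text: str) -> str:
    # Assumption: the "ciphertext" is a SHA-256 hex digest; the patent does not name the algorithm.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class Server:
    """Hypothetical server holding the ciphertexts marked as malicious (in-memory stand-in for the database)."""

    def __init__(self, malicious_ciphertexts):
        self.malicious = set(malicious_ciphertexts)

    def query(self, ciphertext: str) -> str:
        # Return a malicious query result on a match, otherwise a normal query result.
        return "malicious" if ciphertext in self.malicious else "normal"


def client_access(url: str, server: Server) -> bool:
    """Return True if the client may continue accessing url, False if access is blocked."""
    ciphertext = feature_value(url)    # extract the web address ciphertext
    result = server.query(ciphertext)  # submit only the ciphertext to the server
    return result == "normal"          # block on "malicious", continue on "normal"


server = Server({feature_value("http://bad.example.com/x")})
print(client_access("http://bad.example.com/x", server))  # False
print(client_access("http://www.example.org/", server))   # True
```

In this sketch only the digest leaves the client, and the decision to block or continue stays on the client, mirroring the division of work in the steps above.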

Optionally, the web address information is specifically at least one first URL.

Optionally, the ciphertexts marked as malicious web addresses in the database include one or more of the following: a feature value of a malicious URL, a feature value of the host name of a malicious URL, and a feature value of a sub-domain of a malicious URL.

Optionally, the at least one first URL includes: the URL of the web page corresponding to the requested web address, the URL of a link in the content of that web page, the URL of a downloaded file, or any combination thereof.

Optionally, the client obtaining the web address information corresponding to the requested web address includes: obtaining, through a specified response event interface, the URL of the web page corresponding to the web address the client requests to access.

Optionally, the client obtaining the web address information corresponding to the requested web address includes:

obtaining a page object inside the client's browser;

obtaining, by calling a method of the page object, the URLs of links in the content of the web page corresponding to the web address the client requests to access.
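Collecting link URLs from page content can be sketched outside a browser. The patent obtains links by calling methods of the browser's own page object; that object is not available here, so as a stand-in this sketch parses the HTML with Python's standard `html.parser` and resolves each `<a href>` against the page URL. `LinkCollector` and `links_in_page` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href targets of <a> elements, resolved against the page URL."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this method.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def links_in_page(base_url: str, html: str) -> list:
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


page = '<p><a href="/news">News</a> <a href="http://other.example.com/d.exe">Get</a></p>'
print(links_in_page("http://www.example.com/index.html", page))
# ['http://www.example.com/news', 'http://other.example.com/d.exe']
```

Resolving relative links matters because a relative `href` such as `/news` would otherwise yield a host-less string that cannot be checked.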

Optionally, the client obtaining the web address information corresponding to the requested web address includes:

monitoring download-related functions inside the client's browser;

when a download occurs in the browser, obtaining the URL of the downloaded file.
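The idea of monitoring a download function can be illustrated with a function wrapper. This is only an analogy: the patent hooks native functions inside the browser process (a HOOK mechanism), which a Python sketch cannot reproduce. Here `start_download` is a hypothetical stand-in for a browser-internal download routine, and `hook_download` records the URL argument before letting the call proceed.

```python
import functools

intercepted_urls = []


def hook_download(download_fn):
    """Wrap a download function so its URL argument is recorded before the call proceeds."""

    @functools.wraps(download_fn)
    def wrapper(url, *args, **kwargs):
        intercepted_urls.append(url)  # the monitoring step: capture the download URL
        return download_fn(url, *args, **kwargs)

    return wrapper


# Hypothetical stand-in for a browser-internal download routine.
def start_download(url, save_path):
    return f"saving {url} to {save_path}"


start_download = hook_download(start_download)
start_download("http://files.example.com/setup.exe", "/tmp/setup.exe")
print(intercepted_urls)  # ['http://files.example.com/setup.exe']
```

A real hook would typically hand the captured URL to the URL check before allowing the download to continue, rather than only recording it.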

Optionally, before the client extracts the web address ciphertext from the web address information, the method further includes: the client normalizing the at least one first URL.

Optionally, the client normalizing the at least one first URL includes: unifying the letter case of the first URL;

removing repeated, redundant path separators and parameters from the first URL.
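The two normalization steps can be sketched with the standard `urllib.parse` module. The exact rules are assumptions: the whole URL is lowercased (as the text describes, although URL paths can be case-sensitive on some servers), runs of `/` in the path are collapsed, duplicate `name=value` query pairs are dropped, and the fragment is discarded. `normalize_url` is a hypothetical name.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    # Unify letter case by lowercasing the whole URL (assumption: paths are treated as case-insensitive).
    parts = urlsplit(url.strip().lower())
    # Collapse repeated path separators, e.g. "//a///b.html" -> "/a/b.html"; an empty path becomes "/".
    path = re.sub(r"/{2,}", "/", parts.path) or "/"
    # Drop repeated query parameters, keeping the first occurrence of each name=value pair.
    seen, params = set(), []
    for pair in parse_qsl(parts.query, keep_blank_values=True):
        if pair not in seen:
            seen.add(pair)
            params.append(pair)
    # The fragment is dropped (assumption: it is never sent to the server anyway).
    return urlunsplit((parts.scheme, parts.netloc, path, urlencode(params), ""))


print(normalize_url("HTTP://WWW.Example.COM//a///b.html?x=1&x=1&y=2#top"))
# http://www.example.com/a/b.html?x=1&y=2
```

Normalizing before hashing is what lets trivially different spellings of the same address produce the same feature value.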

Optionally, the client extracting the web address ciphertext from the web address information includes:

obtaining the host name of the first URL and the first domain name segment of the first URL;

calculating, respectively, the feature value of the first URL, the feature value of the host name of the first URL, and the feature value of the first domain name segment of the first URL;

the feature value of the first URL, the feature value of its host name, and the feature value of its first domain name segment together constitute the web address ciphertext.
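Computing the three feature values can be sketched as below. The patent does not specify how a feature value is calculated, so a SHA-256 hex digest is assumed; the first domain name segment is passed in as a parameter rather than derived, and `feature_value` and `url_ciphertext` are hypothetical names.

```python
import hashlib
from urllib.parse import urlsplit


def feature_value(text: str) -> str:
    # Assumption: SHA-256 hex digest as the "feature value"; the patent does not name the algorithm.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def url_ciphertext(url: str, first_domain_segment: str) -> dict:
    """Return the three feature values that together form the web address ciphertext."""
    host = urlsplit(url).hostname or ""
    return {
        "url": feature_value(url),
        "host": feature_value(host),
        "segment": feature_value(first_domain_segment),
    }


c = url_ciphertext("http://news.sina.com.cn/a.html", "sina.com.cn")
print(len(c["url"]), c["host"] == feature_value("news.sina.com.cn"))  # 64 True
```

Sending three values at different granularities lets the server match on an exact URL, a whole host, or a whole registered domain with a single request.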

Optionally, if the first-level root domain of the host name of the first URL, read from right to left, is an international top-level domain, the first domain name segment of the first URL is the first-level sub-domain of that host name; if the first-level root domain of the host name, read from right to left, is a country or region top-level domain and the first-level sub-domain contains an international top-level domain, the first domain name segment of the first URL is the second-level sub-domain of that host name; and if the host name of the first URL uses a dynamic domain name, the first domain name segment of the first URL is the next-level sub-domain of the host name extracted starting from the dynamic domain name.
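The three rules for the first domain name segment can be sketched as follows. Several points are assumptions: the domain lists are tiny illustrative samples (a real system would need complete, maintained lists), the dynamic-domain rule is read as "the dynamic domain plus one more label", and a country-code domain not preceded by a generic label (e.g. `example.cn`) is handled like the generic case, which the text does not cover. `first_domain_segment` is a hypothetical name.

```python
# Illustrative, deliberately incomplete lists (assumptions, not from the patent).
GENERIC_TLDS = {"com", "net", "org", "edu", "gov", "info", "biz"}
COUNTRY_TLDS = {"cn", "uk", "jp", "de", "tw", "hk"}
DYNAMIC_DOMAINS = {"3322.org", "dyndns.org", "no-ip.org"}


def first_domain_segment(host: str) -> str:
    labels = host.lower().rstrip(".").split(".")
    # Rule 3: dynamic domain name -> the dynamic domain plus the next label to its left.
    for dyn in DYNAMIC_DOMAINS:
        dyn_labels = dyn.split(".")
        n = len(dyn_labels)
        if len(labels) > n and labels[-n:] == dyn_labels:
            return ".".join(labels[-(n + 1):])
    # Rule 2: country TLD whose next label is a generic TLD (e.g. "com.cn") -> second-level sub-domain.
    if len(labels) >= 3 and labels[-1] in COUNTRY_TLDS and labels[-2] in GENERIC_TLDS:
        return ".".join(labels[-3:])
    # Rule 1 (and the assumed default): first-level sub-domain, i.e. one label plus the TLD.
    return ".".join(labels[-2:])


for h in ("www.example.com", "news.sina.com.cn", "a.b.3322.org"):
    print(h, "->", first_domain_segment(h))
# www.example.com -> example.com
# news.sina.com.cn -> sina.com.cn
# a.b.3322.org -> b.3322.org
```

The dynamic-domain check runs first because such domains end in a generic TLD and would otherwise collapse every user of the provider into one segment.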

Optionally, returning the malicious web address query result to the client when the web address ciphertext matches a ciphertext marked as a malicious web address in the database is specifically: if any one of the feature value of any first URL in the at least one first URL, the feature value of the host name of any first URL, and the feature value of the first domain name segment of any first URL matches a ciphertext marked as a malicious web address in the database, the malicious web address query result is returned to the client.

Optionally, the server matching the web address ciphertext against the ciphertexts stored in the database includes: matching the feature value of each first URL in the at least one first URL against the ciphertexts marked as malicious web addresses in the database; if the feature value of any first URL matches, returning the malicious web address query result to the client;

if the feature value of no first URL matches, matching the feature value of the host name of each first URL against the ciphertexts marked as malicious web addresses in the database; if the feature value of the host name of any first URL matches, returning the malicious web address query result to the client;

if the feature value of no host name matches, matching the feature value of the first domain name segment of each first URL against the ciphertexts marked as malicious web addresses in the database; if the feature value of the first domain name segment of any first URL matches, returning the malicious web address query result to the client; and if the feature value of no first domain name segment matches, returning the normal web address query result to the client.
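The tiered matching can be sketched as follows: URL feature values are checked first, then host name values, then first domain name segment values, and the search stops at the first hit. The feature values are assumed to be SHA-256 hex digests, and `Ciphertext`, `fv`, and `query` are hypothetical names.

```python
import hashlib
from typing import Iterable, NamedTuple


class Ciphertext(NamedTuple):
    url: str      # feature value of a first URL
    host: str     # feature value of its host name
    segment: str  # feature value of its first domain name segment


def fv(text: str) -> str:
    # Assumption: SHA-256 hex digest; the patent does not name the algorithm.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def query(ciphertexts: Iterable[Ciphertext], malicious: set) -> str:
    """Match URL values first, then host values, then segment values; stop at the first hit."""
    items = list(ciphertexts)
    for field in ("url", "host", "segment"):
        if any(getattr(c, field) in malicious for c in items):
            return "malicious"
    return "normal"


db = {fv("evil.example.com")}  # a host name marked as malicious
page = Ciphertext(fv("http://www.example.org/"), fv("www.example.org"), fv("example.org"))
link = Ciphertext(fv("http://evil.example.com/a"), fv("evil.example.com"), fv("example.com"))
print(query([page], db))        # normal
print(query([page, link], db))  # malicious
```

The verdict is the same as checking all three kinds of values at once; the ordering only decides how early a match is found, with the most specific values tried first.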

Optionally, the method further includes a step of constructing the database;

The step of constructing the database includes:

obtaining at least one second URL that is known to be a malicious web address, all second URLs having the same first domain name segment;

obtaining, among the at least one second URL, a third URL containing the largest number of sub-domain levels, tracing the sub-domains contained in the third URL level by level from right to left, and extracting at least one level of sub-domain;

if the first domain name segment of the second URLs belongs to a preset trusted list, marking the feature value of each second URL and the feature value of the host name of each second URL as ciphertexts of malicious web addresses and storing them in the database; if the first domain name segment of the second URLs belongs to a preset untrusted list, obtaining, among the at least one second URL, a fourth URL containing the smallest number of sub-domain levels, and marking as ciphertexts of malicious web addresses, and storing in the database, the feature value of each second URL, the feature value of the host name of each second URL, and the feature values of those traced sub-domains, other than the host names of the second URLs, whose level is higher than that of the fourth URL.

Optionally, the number of sub-domain levels extracted by the tracing is a set threshold.
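Building the database entries for one group of known-malicious URLs can be sketched as below. Several details are interpretations rather than statements from the text: a sub-domain's level is measured as its number of labels, "higher than the fourth URL" is read as "deeper than the fourth URL's host", tracing starts at the shared first domain name segment, a segment on neither list is handled like a trusted one, and feature values are SHA-256 digests. `build_entries`, `depth`, and the `max_levels` threshold are hypothetical names.

```python
import hashlib
from urllib.parse import urlsplit


def fv(text: str) -> str:
    # Assumption: SHA-256 hex digest as the feature value.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def depth(host: str) -> int:
    # Sub-domain level measured here simply as the number of labels in the host name.
    return len(host.split("."))


def build_entries(second_urls, segment, trusted, untrusted, max_levels=5):
    """Return the feature values to store as malicious ciphertexts for one group of second URLs."""
    hosts = [urlsplit(u).hostname for u in second_urls]
    entries = {fv(u) for u in second_urls} | {fv(h) for h in hosts}
    if segment in trusted or segment not in untrusted:
        # Trusted segment: mark only exact URLs and host names.
        # A segment on neither list is not covered by the text; treated the same way here (assumption).
        return entries
    third = max(hosts, key=depth)   # host of the third URL (most sub-domain levels)
    fourth = min(hosts, key=depth)  # host of the fourth URL (fewest sub-domain levels)
    labels = third.split(".")
    # Trace from the first domain name segment leftward, one label at a time, up to the threshold.
    traced = [".".join(labels[-n:]) for n in range(depth(segment), len(labels) + 1)][:max_levels]
    for sub in traced:
        if sub not in hosts and depth(sub) > depth(fourth):
            entries.add(fv(sub))
    return entries


urls = ["http://a.b.c.evil.com/x", "http://c.evil.com/y"]
e = build_entries(urls, "evil.com", trusted=set(), untrusted={"evil.com"})
print(len(e), fv("b.c.evil.com") in e)  # 5 True
```

Under this reading, an untrusted domain gets its intermediate sub-domains blocked as well, while the registered domain itself (`evil.com` here) is not blocked outright.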

According to another aspect of the present invention, a web address access system is provided, comprising a client and a server; the client includes:

a monitoring module, configured to obtain the web address information corresponding to the web address requested to be accessed;

an extraction module, configured to extract a web address ciphertext from the web address information;

a communication module, configured to submit the web address ciphertext to the server;

a protection module, configured to block access to the web address according to a malicious web address query result returned by the server; and

an access module, configured to continue accessing the web address according to a normal web address query result returned by the server.

The server includes:

a database, configured to store ciphertexts; and

a query module, configured to match the web address ciphertext against the ciphertexts stored in the database; if the web address ciphertext matches a ciphertext marked as a malicious web address in the database, return a malicious web address query result to the client; and if it does not match any ciphertext marked as a malicious web address, return a normal web address query result to the client.

Optionally, the monitoring module is specifically configured to obtain at least one first URL corresponding to the requested web address, the at least one first URL including: the URL of the web page corresponding to the requested web address, the URL of a link in the content of that web page, the URL of a downloaded file, or any combination thereof; the ciphertexts marked as malicious web addresses in the database include one or more of the following: a feature value of a malicious URL, a feature value of the host name of a malicious URL, and a feature value of a sub-domain of a malicious URL.

Optionally, the monitoring module includes:

a first monitoring unit, configured to obtain, through a specified response event interface, the URL of the web page corresponding to the web address the client requests to access;

a second monitoring unit, configured to obtain a page object inside the client's browser and, by calling a method of the page object, obtain the URLs of links in the content of the web page corresponding to the web address the client requests to access; and

a third monitoring unit, configured to monitor download-related functions inside the client's browser and, when a download occurs in the browser, obtain the URL of the downloaded file.

Optionally, the client further includes a processing module, configured to normalize the at least one first URL.

Optionally, the processing module includes:

a unifying unit, configured to unify the letter case of the first URL; and

a removing unit, configured to remove repeated, redundant path separators and parameters from the first URL.

Optionally, the extraction module includes:

an acquiring unit, configured to acquire the host name of the first URL and the first domain name segment of the first URL; and

a calculating unit, configured to calculate, respectively, the feature value of the first URL, the feature value of the host name of the first URL, and the feature value of the first domain name segment of the first URL.

Optionally, if the first-level root domain of the host name of the first URL, read from right to left, is an international top-level domain, the acquiring unit is specifically configured to take the first-level sub-domain of the host name as the first domain name segment of the first URL; if the first-level root domain of the host name, read from right to left, is a country or region top-level domain and the first-level sub-domain contains an international top-level domain, the acquiring unit is specifically configured to take the second-level sub-domain of the host name as the first domain name segment of the first URL; and if the first URL uses a dynamic domain name, the acquiring unit is specifically configured to take the next-level sub-domain, extracted starting from the dynamic domain name, as the first domain name segment of the first URL.

Optionally, the query module is specifically configured to match the web address ciphertext against the ciphertexts stored in the database and, if any one of the feature value of any first URL in the at least one first URL, the feature value of the host name of any first URL, and the feature value of the first domain name segment of any first URL matches a ciphertext marked as a malicious web address in the database, return the malicious web address query result to the client.

Optionally, the query module is specifically configured to:

match the feature value of each first URL in the at least one first URL against the ciphertexts marked as malicious web addresses in the database and, if the feature value of any first URL matches, return the malicious web address query result to the client;

if the feature value of no first URL matches, match the feature value of the host name of each first URL against the ciphertexts marked as malicious web addresses in the database and, if the feature value of any host name matches, return the malicious web address query result to the client;

if the feature value of no host name matches, match the feature value of the first domain name segment of each first URL against the ciphertexts marked as malicious web addresses in the database and, if the feature value of any first domain name segment matches, return the malicious web address query result to the client; otherwise, return the normal web address query result to the client.

Optionally, the server further includes a building module, configured to build the database;

The building module includes:

a first obtaining unit, configured to obtain at least one second URL that is known to be a malicious web address, all second URLs having the same first domain name segment;

a second obtaining unit, configured to obtain, among the at least one second URL, a third URL containing the largest number of sub-domain levels, trace the sub-domains contained in the third URL level by level from right to left, and extract at least one level of sub-domain;

a first marking unit, configured to, if the first domain name segment of the second URLs belongs to a preset trusted list, mark the feature value of each second URL and the feature value of the host name of each second URL as ciphertexts of malicious web addresses and store them in the database; and

a second marking unit, configured to, if the first domain name segment of the second URLs belongs to a preset untrusted list, obtain, among the at least one second URL, a fourth URL containing the smallest number of sub-domain levels, and mark as ciphertexts of malicious web addresses, and store in the database, the feature value of each second URL, the feature value of the host name of each second URL, and the feature values of those traced sub-domains, other than the host names of the second URLs, whose level is higher than that of the fourth URL.

With the web address access method and system provided by this embodiment, when the client requests access to a web address, it extracts a web address ciphertext from the web address information and submits it to the server; the server matches the ciphertext against the ciphertexts stored in the database to complete the security query and verification of the web address, and the client decides, based on the server's verification result, whether to continue accessing the web address. The method does not rely on a database local to the client; the security query and verification of web addresses are performed on the server side. Because the server-side database can be updated promptly with the various malicious web addresses appearing on the Internet, its update cycle is far shorter than that of a client-local database; moreover, the server-side database stores a large volume of malicious web address information with broad coverage, so malicious websites can be intercepted quickly and effectively.

The above description is only an overview of the technical solutions of the present invention. Specific embodiments of the present invention are set forth below so that the technical means of the present invention can be understood more clearly and implemented in accordance with the contents of the specification, and so that the above and other objects, features, and advantages of the present invention become more readily apparent.

Brief Description of the Drawings

Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be regarded as limiting the present invention. Throughout the drawings, the same reference numerals denote the same components. In the drawings:

Fig. 1 shows a flowchart of a web address access method according to an embodiment of the present invention;

Fig. 2 shows a flowchart of a web address access method according to an embodiment of the present invention;

Fig. 3 shows a flowchart of the web address ciphertext matching process in an embodiment of the present invention;

Fig. 4 shows a schematic structural diagram of a web address access system according to an embodiment of the present invention;

Fig. 5 schematically shows an implementation of the web address access system according to the present invention; and

Fig. 6 schematically shows a storage unit for holding or carrying program code that implements the method according to the present invention.

Detailed Description

Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and its scope can be fully conveyed to those skilled in the art.

Fig. 1 shows a flowchart of a web address access method according to an embodiment of the present invention. In this embodiment, the web address accessed by the client is described using a Uniform Resource Locator (URL) as an example. As shown in Fig. 1, the method includes the following steps:

Step 101: The client obtains the web address information corresponding to the web address requested to be accessed.

The web page access behavior of the various types of browsers on the client is monitored, and the information of the web address requested for access is called a first URL. The first URL may include the following:

i. The URL of the web page corresponding to the requested web address;

For example, if the client requests access to the Sina home page, the URL of that page is http://www.sina.com.cn/.

ii. The URLs of links in the content of the web page corresponding to the requested web address;

The content of the web page the client requests may contain links, and the URLs of these links are also within the scope of monitoring.

iii. The URL of a downloaded file.

When the client requests to download a file, the URL of that file is also within the scope of monitoring.

A single web page access by the client may involve one or more of the above three kinds of URLs; that is, the first URL includes any one of the above three kinds of URLs or any combination of them.

Step 102: The client extracts the web address ciphertext from the web address information.

The client extracts the web address ciphertext corresponding to the first URL from the information contained in the first URL.

Step 103: The client submits the web address ciphertext to the server.

Step 104: The server matches the web address ciphertext against the ciphertexts stored in the database, which include ciphertexts marked as malicious web addresses. If the web address ciphertext matches a ciphertext marked as a malicious web address in the database, go to step 105; otherwise, go to step 107.

In this embodiment, a database is built in advance on the server side, and it stores at least the ciphertexts marked as malicious web addresses. These ciphertexts are derived from a large number of URLs known to be malicious.

Step 105: The server returns a malicious web address query result to the client; go to step 106.

A match between the web address ciphertext submitted by the client and a ciphertext marked as a malicious web address in the database indicates that the first URL the client wants to access is a malicious web address; in this case, the server returns a malicious web address query result to the client.

Step 106: The client blocks access to the web address according to the malicious web address query result, and the process ends.

Step 107: The server returns a normal web address query result to the client; go to step 108.

If the web address ciphertext submitted by the client does not match any ciphertext marked as a malicious web address in the database, the first URL the client wants to access is a normal web address; in this case, the server returns a normal web address query result to the client.

Step 108: The client continues accessing the web address according to the normal web address query result, and the process ends.

With the web address access method provided by this embodiment, when the client requests access to a web address, it extracts a web address ciphertext from the web address information and submits it to the server; the server matches the ciphertext against the ciphertexts stored in the database to complete the security query and verification of the web address, and the client decides, based on the server's verification result, whether to continue accessing the web address. The method does not rely on a database local to the client; the security query and verification of web addresses are performed on the server side. Because the server-side database can be updated promptly with the various malicious web addresses appearing on the Internet, its update cycle is far shorter than that of a client-local database; moreover, the server-side database stores a large volume of malicious web address information with broad coverage, so malicious websites can be intercepted quickly and effectively.

Fig. 2 shows a flowchart of a web address access method according to an embodiment of the present invention. This embodiment provides a cloud-security-based web address access method that does not depend on a web address database local to the client and performs the security query and verification of web addresses on the server side. As shown in Fig. 2, the method includes the following steps:

Step 201: The client obtains at least one first URL requested to be accessed.

The at least one first URL in this embodiment may include any one of the three kinds of URLs described in the above embodiment, or any combination of them.

The three types of URL are obtained as follows:

The URL of the web page corresponding to the web address the client requests is obtained through a designated event-response interface, for example one provided by the browser's standard plug-in mechanism. In Internet Explorer (IE), for instance, the Browser Helper Object (BHO) plug-in mechanism is used: by responding to the "BeforeNavigate2" event, the URL that IE is currently loading can be obtained. In Firefox, the designated event-response interface provided by the Firefox extension mechanism is used to obtain the URL Firefox is currently loading. In Google Chrome, the Netscape Plugin Application Programming Interface (NPAPI) plug-in mechanism is used to obtain the URL Chrome is currently loading.

Link URLs in the content of the web page the browser is accessing, including but not limited to hyperlink addresses within the page, are obtained from the browser environment. Specifically, the page object inside the browser is obtained, and the link URLs in the page content are retrieved by calling methods of that page object. The page object can be obtained through the standard plug-in mechanism provided by the browser.

The URL of a file the browser is downloading is obtained from the browser environment. Specifically, the browser's internal download-related functions are monitored; when a download by the browser is detected, the URL of the file being downloaded can be determined by analysis. A hook (HOOK) mechanism can be used to monitor these download-related functions.

Step 202: The client normalizes the at least one first URL.

The normalization may include: unifying the letter case throughout the first URL, covering the protocol, host name, path, file name, parameters and so on; and removing redundant duplicate path separators and parameters from the first URL.

For example, the first URL is: HTTp://www.A.com//aBc/abc.Php?A=1

Converting all letters to lowercase gives http://www.a.com//abc/abc.php?a=1; removing the redundant path separator then gives http://www.a.com/abc/abc.php?a=1.
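The two normalization operations can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the helper name `normalize_url` is hypothetical, the whole URL is lowercased exactly as in the example (path and parameters included), and "redundant parameters" is interpreted as exact repeats of an earlier `key=value` pair, which the text does not define precisely.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Hypothetical sketch of step 202: unify case, drop duplicate separators/parameters."""
    # Unify letter case across protocol, host, path, file name and parameters.
    parts = urlsplit(url.strip().lower())
    # Collapse runs of '/' in the path, e.g. '//abc/abc.php' -> '/abc/abc.php'.
    path = re.sub(r"/{2,}", "/", parts.path)
    # Assumption: a "redundant parameter" is an exact repeat of an earlier key=value pair.
    kept = []
    for pair in parts.query.split("&"):
        if pair and pair not in kept:
            kept.append(pair)
    query = "&".join(kept)
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))
```

For the example above, `normalize_url("HTTp://www.A.com//aBc/abc.Php?A=1")` returns `"http://www.a.com/abc/abc.php?a=1"`. Lowercasing the path and parameters follows the text, although it can map distinct resources on case-sensitive servers to the same key.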

Step 203: The client extracts the web address ciphertext from the first URL.

For a first URL, three pieces of information are key: the first URL itself (url), the host name of the first URL (host), and the first domain name segment of the first URL (domain). After the first URL is obtained, its host name and first domain name segment are determined. The host name is the host part that remains after the path, protocol header, port number and similar information are removed from the first URL; the first domain name segment is obtained by tracing the host name level by level from right to left. Preferably, at most 7 levels are traced from right to left when obtaining the first domain name segment.

If the first-level (rightmost) root domain of the host name is an international top-level domain, the first domain name segment is the first-level subdomain of the host name. International top-level domains here are common top-level domains such as "com", "net", "org", "edu" and "gov". For example, for the host name www.a.com, the rightmost root domain is "com", so its first-level subdomain "a.com" is extracted as the first domain name segment.

If the rightmost root domain of the host name is a country or region top-level domain and the first-level subdomain contains an international top-level domain, the first domain name segment is the second-level subdomain of the host name. Country and region top-level domains are special top-level domains such as "cn" and "hk". For example, for the host name www.a.com.cn, the rightmost root domain is "cn" and the first-level subdomain is "com.cn", so the second-level subdomain "a.com.cn" is extracted as the first domain name segment.

If the host name uses a dynamic domain name, the first domain name segment is the dynamic domain name extended by the next subdomain level. Dynamic domain names are certain second- or third-level dynamic domains such as "3322.org", "s.3322.org" and "s.3322.net". For example, the host name www.a.3322.org uses the dynamic domain name "3322.org", so, starting from the dynamic domain name, the next-level subdomain "a.3322.org" is extracted as the first domain name segment.

In this embodiment, a feature value is computed for each of the three pieces of key information, and these feature values form the web address ciphertext. The feature value may specifically be a hash value; preferably, it is a hash computed with Message Digest Algorithm 5 (MD5), or a SHA-1 code, a CRC (Cyclic Redundancy Check) code, or another characteristic code that uniquely identifies the original content. In the examples below, the feature value is a 32-character hexadecimal MD5 hash.
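The three extraction rules can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the TLD and dynamic-domain sets contain just the examples named in the text, and a country TLD not preceded by an international TLD (e.g. `a.cn`) falls back to the two-label rule, which the text does not specify. The preferred 7-level tracing cap is not modeled, because none of these rules inspects more than a few labels.

```python
from urllib.parse import urlsplit

# Example sets taken from the text; a real deployment would need complete lists.
INTERNATIONAL_TLDS = {"com", "net", "org", "edu", "gov"}
COUNTRY_TLDS = {"cn", "hk"}
DYNAMIC_DOMAINS = {"3322.org", "s.3322.org", "s.3322.net"}


def host_of(url: str) -> str:
    """Host name with scheme, path, port and credentials removed (lowercased by urlsplit)."""
    return urlsplit(url).hostname or ""


def first_domain_segment(host: str) -> str:
    labels = host.lower().split(".")
    # Dynamic domain: try the longest dynamic suffix first, then add the label to its left.
    for n in range(len(labels) - 1, 0, -1):
        if ".".join(labels[-n:]) in DYNAMIC_DOMAINS:
            return ".".join(labels[-(n + 1):])
    # Country/region TLD preceded by an international TLD: www.a.com.cn -> a.com.cn
    if len(labels) >= 3 and labels[-1] in COUNTRY_TLDS and labels[-2] in INTERNATIONAL_TLDS:
        return ".".join(labels[-3:])
    # International TLD (and, as an assumption, any other case): www.a.com -> a.com
    return ".".join(labels[-2:])
```

Checking the longest dynamic suffix first matters when one dynamic domain ends with another, as `s.3322.org` does with `3322.org`.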

For example, for the first URL http://www.a.com/abc/abc.php?a=1, the method above gives the host name www.a.com and the first domain name segment a.com.

The 32-character MD5 hash of the first URL itself is:

md5(http://www.a.com/abc/abc.php?a=1, 32) = e2a6b69ff15c6a8e276f089250ab3f7d

The 32-character MD5 hash of the host name of the first URL is:

md5(www.a.com, 32) = 30f4a7bbefe70d75616707c80921a7e8

The 32-character MD5 hash of the first domain name segment of the first URL is:

md5(a.com, 32) = b3655bd7aad56513fcdacbd4254ed6b7

With a single first URL, the three MD5 hashes computed above (of the first URL, of its host name, and of its first domain name segment) constitute the web address ciphertext of that URL. With several first URLs, the MD5 hashes of the three pieces of key information are computed for each first URL and grouped per URL, so the web address ciphertext consists of one such group of hashes for each first URL.

For the example first URL http://www.a.com/abc/abc.php?a=1, the resulting group of web address ciphertexts is:

domain | host | url
a.com | www.a.com | http://www.a.com/abc/abc.php?a=1
b3655bd7aad56513fcdacbd4254ed6b7 | 30f4a7bbefe70d75616707c80921a7e8 | e2a6b69ff15c6a8e276f089250ab3f7d
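Computing one group of the web address ciphertext could look like the sketch below. The function names are hypothetical, UTF-8 encoding of the input strings is an assumption (the text does not state an encoding), and the host and domain are passed in rather than recomputed so that the block stands alone.

```python
import hashlib


def feature_value(text: str) -> str:
    """32-character lowercase hexadecimal MD5 digest (UTF-8 encoding assumed)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def url_ciphertext(url: str, host: str, domain: str) -> dict:
    """One group of the web address ciphertext: feature values of domain, host and url."""
    return {
        "domain": feature_value(domain),
        "host": feature_value(host),
        "url": feature_value(url),
    }


def ciphertexts_for(triples):
    """Several first URLs give one group per URL; each triple is (url, host, domain)."""
    return [url_ciphertext(u, h, d) for (u, h, d) in triples]
```

For instance, `url_ciphertext("http://www.a.com/abc/abc.php?a=1", "www.a.com", "a.com")` yields the three digests for the example row. Only the digests, not the plaintext URL, would be sent to the server.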

Step 204: The client submits the web address ciphertext to the server.

Step 205: The server matches the web address ciphertext against the ciphertexts stored in the database, which include at least ciphertexts marked as malicious web addresses. If the web address ciphertext matches a ciphertext marked as malicious in the database, step 206 is performed; otherwise, step 208 is performed.

In this embodiment, a web address database is built in advance on the server side; it stores at least the ciphertexts marked as malicious web addresses. Specifically, the keys in the database are the feature values of the three kinds of key information (the url, host and domain of a web address), and each key can be marked as either a normal or a malicious web address. The ciphertexts marked as malicious include one or more of the following: the feature value of a malicious URL, the feature value of the host name of a malicious URL, and the feature value of a subdomain of a malicious URL.

All ciphertexts in the web address database are derived from a large number of URLs known to be malicious.

In this embodiment, building the web address database may include the following steps:

(a) Obtain at least one second URL that is known to be malicious and that shares the same first domain name segment. After a large number of URLs known to be malicious are collected, their host names and first domain name segments are obtained with the same method the client uses. Among these malicious URLs, several often share the same first domain name segment, for example:

http://a.b.c.d.e.f.g.com/abc/abc1.php?a=1

http://b.c.d.e.f.g.com/abc/abc.php?a=1

http://d.e.f.g.com/abc/abc.php?a=1

All of them have the first domain name segment g.com. These three URLs are referred to here as second URLs.

(b) Among the at least one second URL, obtain the third URL, i.e. the one containing the most subdomain levels; trace the subdomains contained in the third URL from right to left, level by level, and extract at least one level of subdomain.

In the example above, the third URL, which has the most subdomain levels among the three second URLs, is http://a.b.c.d.e.f.g.com/abc/abc1.php?a=1; it contains 7 levels of subdomain. Tracing its subdomains from right to left yields the following 7 levels:

First-level subdomain: g.com

Second-level subdomain: f.g.com

Third-level subdomain: e.f.g.com

Fourth-level subdomain: d.e.f.g.com

Fifth-level subdomain: c.d.e.f.g.com

Sixth-level subdomain: b.c.d.e.f.g.com

Seventh-level subdomain: a.b.c.d.e.f.g.com

Preferably, the number of subdomain levels traced and extracted in this step is a set threshold N. Because the subdomains under one domain may host both malicious and normal web addresses, which typically happens within the first 6 levels, N is preferably greater than or equal to 6.
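Step (b) can be sketched as below: pick the deepest second URL and emit its subdomains from the first domain segment outward, up to N levels. The function names are hypothetical, and the default `n=7` simply reproduces the worked example; the text only prefers N ≥ 6. Level 1 is taken to be the first domain segment itself, matching the list above.

```python
def subdomain_level(host: str, first_segment: str) -> int:
    """Level of a host relative to its first domain segment (the segment itself is level 1)."""
    return host.count(".") - first_segment.count(".") + 1


def trace_subdomains(host: str, first_segment: str, n: int = 7) -> list:
    """Trace the subdomains of host from right to left, up to n levels (threshold N)."""
    labels = host.split(".")
    base = first_segment.count(".") + 1  # number of labels in the first domain segment
    traced = []
    for level in range(1, n + 1):
        k = base + level - 1  # labels taken from the right for this level
        if k > len(labels):
            break
        traced.append(".".join(labels[-k:]))
    return traced


def deepest_host(hosts, first_segment):
    """Host of the third URL: the second URL with the most subdomain levels."""
    return max(hosts, key=lambda h: subdomain_level(h, first_segment))
```

With the hosts of the three example second URLs and first segment `g.com`, `deepest_host` picks `a.b.c.d.e.f.g.com`, and `trace_subdomains` returns the seven subdomains from `g.com` to `a.b.c.d.e.f.g.com` in the order listed above.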

(c) If the first domain name segment of the second URLs belongs to a preset trusted list, for example a whitelist, the feature value of each second URL and the feature value of the host name of each second URL are marked as ciphertexts of malicious web addresses and stored in the database.

High-traffic legitimate websites, such as sina.com.cn and sohu.com, can be written into the preset trusted list. If the first domain name segment of the second URLs belongs to such a trusted list, the feature value of each second URL and the feature value of each second URL's host name are marked as ciphertexts of malicious web addresses and stored in the database.

In the example above, if g.com belongs to the preset trusted list, the ciphertexts marked as malicious comprise the feature values of the following:

Each second URL:

http://a.b.c.d.e.f.g.com/abc/abc1.php?a=1

http://b.c.d.e.f.g.com/abc/abc.php?a=1

http://d.e.f.g.com/abc/abc.php?a=1

The host name of each second URL:

a.b.c.d.e.f.g.com

b.c.d.e.f.g.com

d.e.f.g.com

The feature values of the above items are stored in the cloud web address database and marked as ciphertexts of malicious web addresses. The feature values of the other traced subdomains, which host no malicious web address, may be marked as normal web addresses and also stored in the cloud web address database; these are:

g.com

f.g.com

e.f.g.com

c.d.e.f.g.com
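A minimal sketch of step (c) follows. The function names and the `"malicious"`/`"normal"` label strings are hypothetical; labels are kept on plaintext keys for readability and only converted to MD5-keyed records at the end, on the assumption that the server uses the same feature-value type as the client. Storing the remaining traced subdomains as normal is optional in the text.

```python
import hashlib
from urllib.parse import urlsplit


def mark_trusted(second_urls, traced_subdomains):
    """Step (c): first domain segment is on the trusted list (whitelist).

    Returns {plaintext: label}; hashing happens in to_records().
    """
    labels = {}
    for url in second_urls:
        labels[url] = "malicious"                     # each second URL
        labels[urlsplit(url).hostname] = "malicious"  # its host name
    for sub in traced_subdomains:
        labels.setdefault(sub, "normal")  # optional: subdomains hosting no malicious URL
    return labels


def to_records(labels):
    """Key the database by MD5 feature value, the same type the client submits."""
    return {hashlib.md5(k.encode("utf-8")).hexdigest(): v for k, v in labels.items()}
```

On the worked example this marks the three second URLs and their three host names as malicious, and exactly g.com, f.g.com, e.f.g.com and c.d.e.f.g.com as normal. The whitelisted domain g.com is never marked malicious, so unrelated sites under it are not blocked.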

(d) If the first domain name segment of the second URLs belongs to a preset untrusted list, for example a blacklist, obtain the fourth URL, i.e. the second URL with the fewest subdomain levels. Then mark as ciphertexts of malicious web addresses, and store in the database, the feature value of each second URL, the feature value of each second URL's host name, and the feature values of those traced subdomains, other than the second URLs' host names, whose level is higher than that of the fourth URL.

Low-traffic websites can be written into the untrusted list. If the first domain name segment of the second URLs belongs to such an untrusted list, the fourth URL, with the fewest subdomain levels among the second URLs, is obtained, and the feature values of each second URL, of each second URL's host name, and of the traced subdomains (other than those host names) whose level is higher than that of the fourth URL are marked as ciphertexts of malicious web addresses and stored in the database.

In the example above, if g.com belongs to the preset untrusted list, the fourth URL, with the fewest subdomain levels, is http://d.e.f.g.com/abc/abc.php?a=1, which contains 4 levels of subdomain. The ciphertexts marked as malicious then comprise the feature values of the following:

Each second URL:

http://a.b.c.d.e.f.g.com/abc/abc1.php?a=1

http://b.c.d.e.f.g.com/abc/abc.php?a=1

http://d.e.f.g.com/abc/abc.php?a=1

The host name of each second URL:

a.b.c.d.e.f.g.com

b.c.d.e.f.g.com

d.e.f.g.com

The traced subdomains whose level is higher than that of the fourth URL are a.b.c.d.e.f.g.com, b.c.d.e.f.g.com and c.d.e.f.g.com. Of these, a.b.c.d.e.f.g.com and b.c.d.e.f.g.com are host names of second URLs, so the only traced subdomain, other than the second URLs' host names, whose level is higher than that of the fourth URL is:

c.d.e.f.g.com

The remaining traced subdomains, which are not marked as malicious under this rule, are:

g.com

f.g.com

e.f.g.com
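Step (d) differs from step (c) only in which traced subdomains are marked. The sketch below assumes the same hypothetical label strings and plaintext-then-hash layout; `level` treats the first domain segment as level 1, so in the example the fourth URL (d.e.f.g.com) sits at level 4.

```python
import hashlib
from urllib.parse import urlsplit


def level(host: str, first_segment: str) -> int:
    # The first domain segment itself is level 1 (g.com -> 1, d.e.f.g.com -> 4).
    return host.count(".") - first_segment.count(".") + 1


def mark_untrusted(second_urls, traced_subdomains, first_segment):
    """Step (d): first domain segment is on the untrusted list (blacklist)."""
    hosts = [urlsplit(u).hostname for u in second_urls]
    lowest = min(level(h, first_segment) for h in hosts)  # level of the fourth URL
    labels = {u: "malicious" for u in second_urls}
    for h in hosts:
        labels[h] = "malicious"
    for sub in traced_subdomains:
        if level(sub, first_segment) > lowest:
            labels[sub] = "malicious"  # host names are already present; others are added
    return labels


def to_records(labels):
    return {hashlib.md5(k.encode("utf-8")).hexdigest(): v for k, v in labels.items()}
```

On the worked example this adds only c.d.e.f.g.com beyond the second URLs and their host names. Compared with step (c), any unseen site at level 5 or deeper under d.e.f.g.com's parent chain (for example a new x.c.d.e.f.g.com) would now be caught through its c.d.e.f.g.com ancestor.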

The feature values referred to in this step must be of the same type as those submitted by the client. They may specifically be hash values, preferably MD5 hashes.

The server matches the web address ciphertext submitted by the client against the ciphertexts marked as malicious in the cloud web address database, as follows:

If the feature value of any first URL, the feature value of the host name of any first URL, or the feature value of the first domain name segment of any first URL matches a ciphertext marked as malicious in the cloud web address database, step 206 is performed; otherwise, step 208 is performed.

Fig. 3 shows a flowchart of the web address ciphertext matching process in an embodiment of the present invention. The matching process shown in Fig. 3 is a preferred implementation, but the present invention is not limited to it. As shown in Fig. 3, matching the ciphertext submitted by the client against the ciphertexts stored in the database may include the following steps:

Step 301: Match the feature value of each first URL against the ciphertexts marked as malicious in the database; if there is a match, perform step 206; otherwise, perform step 302.

Step 302: Match the feature value of the host name of each first URL against the ciphertexts marked as malicious in the database; if there is a match, perform step 206; otherwise, perform step 303.

Step 303: Match the feature value of the first domain name segment of each first URL against the ciphertexts marked as malicious in the database; if there is a match, perform step 206; otherwise, perform step 208.

In summary, the matching process covers the following three cases:

(1) Any feature value of the three pieces of key information of any first URL matches a ciphertext marked as malicious in the cloud web address database: step 206 is performed;

(2) None of the feature values of the three pieces of key information of the first URLs matches a ciphertext marked as malicious in the cloud web address database: step 208 is performed;

(3) One of the feature values of the three pieces of key information matches a ciphertext marked as normal in the cloud web address database, and none of the other feature values matches a ciphertext marked as malicious: step 208 is performed.
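The server-side matching order of steps 301 to 303, together with the three cases, can be sketched as a single lookup function. The name `query`, the dict layout of each ciphertext group, and the label strings are assumptions made for illustration; the database is modeled as an in-memory dict from feature value to label.

```python
def query(ciphertexts, database):
    """Server-side match following steps 301-303.

    ciphertexts: list of {"url", "host", "domain"} feature-value dicts, one per first URL.
    database: {feature_value: "malicious" or "normal"}.
    """
    for key in ("url", "host", "domain"):  # step 301, then 302, then 303
        for ct in ciphertexts:
            if database.get(ct[key]) == "malicious":
                return "malicious"  # case (1): go to step 206
    # Cases (2) and (3): no malicious match, even if some value is marked normal.
    return "normal"  # go to step 208
```

Checking the most specific key (the full URL) first means a whitelisted domain that also has individually blacklisted pages still blocks those pages, while the domain-level key only decides when nothing more specific matched as malicious.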

Step 206: The server returns a malicious-URL query result to the client, and step 207 is performed.

Step 207: According to the malicious-URL query result, the client blocks access to the web address and prompts the user, and the process ends.

Step 208: The server returns a normal-URL query result to the client, and step 209 is performed.

Step 209: The client continues accessing the web address according to the normal-URL query result, and the process ends.

According to the web address access method of this embodiment, when the client requests access to a web address, it extracts the web address ciphertext from the web address information and submits it to the server. The server matches this ciphertext against the ciphertexts stored in its database, thereby performing the security query and verification of the web address, and the client decides from the server's verification result whether to continue accessing the web address. The method does not depend on a database local to the client; the security query and verification are performed on the server side. Because the server-side database can be updated promptly with the malicious websites that keep appearing on the Internet, its update cycle is far shorter than that of a client-local database; moreover, the server-side database holds a large volume of malicious-URL information with wide coverage. Malicious websites can therefore be intercepted quickly and effectively.

Fig. 4 is a schematic structural diagram of a web address access system according to an embodiment of the present invention. As shown in Fig. 4, the web address access system includes a client 1 and a server 2.

The client 1 includes a monitoring module 10, an extraction module 11, a communication module 12, a protection module 13 and an access module 14. The monitoring module 10 obtains the web address information corresponding to the web address requested for access; the extraction module 11 extracts the web address ciphertext from the web address information; the communication module 12 submits the web address ciphertext to the server 2; the protection module 13 blocks access to the web address according to a malicious-URL query result returned by the server 2; and the access module 14 continues access to the web address according to a normal-URL query result returned by the server 2.

The server 2 includes a database 20 and a query module 21. The database 20 stores ciphertexts; the query module 21 matches the web address ciphertext against the ciphertexts stored in the database 20. If the web address ciphertext matches a ciphertext marked as malicious in the database 20, a malicious-URL query result is returned to the client 1; otherwise, a normal-URL query result is returned to the client 1.

Further, the monitoring module 10 is specifically configured to obtain at least one first URL corresponding to the web address requested for access. The at least one first URL includes the URL of the web page corresponding to the requested web address, the URL of a link in that web page's content, the URL of a downloaded file, or any combination of these. The ciphertexts marked as malicious in the database include one or more of the following: the feature value of a malicious URL, the feature value of the host name of a malicious URL, and the feature value of a subdomain of a malicious URL.

The monitoring module 10 may include a first monitoring unit 10a, configured to obtain, through a designated event-response interface, the URL of the web page corresponding to the web address requested by the client 1.

The monitoring module 10 may also include a second monitoring unit 10b, configured to obtain the page object inside the browser of the client 1 and, by calling methods of the page object, obtain the URLs of links in the content of the web page corresponding to the web address requested by the client 1.

The monitoring module 10 may further include a third monitoring unit 10c, configured to monitor the download-related functions inside the browser of the client 1 and, when the browser performs a download, obtain the URL of the downloaded file.

The client 1 may further include a processing module 15, configured to normalize the at least one first URL. The processing module 15 may include a unifying unit 15a, which unifies the letter case of the first URL, and a removal unit 15b, which removes redundant duplicate path separators and parameters from the first URL.

The extraction module 11 may include an acquisition unit 11a and a calculation unit 11b. The acquisition unit 11a obtains the host name and the first domain name segment of the first URL; the calculation unit 11b computes the feature value of the first URL, the feature value of its host name, and the feature value of its first domain name segment. These three feature values constitute the web address ciphertext.

If the rightmost root domain of the host name of the first URL is an international top-level domain, the acquisition unit 11a takes the first-level subdomain of the host name as the first domain name segment;

If the rightmost root domain of the host name of the first URL is a country or region top-level domain and the first-level subdomain contains an international top-level domain, the acquisition unit 11a takes the second-level subdomain of the host name as the first domain name segment;

If the first URL uses a dynamic domain name, the acquisition unit 11a takes the dynamic domain name extended by the next-level subdomain as the first domain name segment.

The query module 21 is specifically configured to match the web address ciphertext against the ciphertexts stored in the database 20, and to return a malicious-URL query result to the client 1 if the feature value of any first URL, the feature value of the host name of any first URL, or the feature value of the first domain name segment of any first URL matches a ciphertext marked as malicious in the database 20.

In a preferred implementation, the query module 21 may be specifically configured to:

The server 2 further includes a construction module 22, which may include a first acquisition unit 22a, a second acquisition unit 22b, a first marking unit 22c, and a second marking unit 22d. The first acquisition unit 22a is configured to obtain at least one second URL that is known to be a malicious URL and that has the same first domain name segment. The second acquisition unit 22b is configured to obtain, among the at least one second URL, the third URL containing the highest subdomain level, trace the subdomains contained in the third URL level by level from right to left, and extract at least one level of subdomain. The first marking unit 22c is configured, if the first domain name segment of the second URLs belongs to a preset trusted list, to mark the feature value of each second URL and the feature value of the host name of each second URL as ciphertext of a malicious URL and store them in the database 20. The second marking unit 22d is configured, if the first domain name segment of the second URLs belongs to a preset untrusted list, to obtain, among the at least one second URL, the fourth URL containing the lowest subdomain level, and to mark as ciphertext of a malicious URL, and store in the database 20, the feature value of each second URL, the feature value of the host name of each second URL, and the feature values of those traced subdomains, other than the host names of the second URLs, whose level is higher than that of the fourth URL.
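The construction steps are the most involved part of the description, so a sketch may help. It rests on several labeled interpretations that the specification does not confirm. The subdomain level of a host is taken to be its number of labels, and SHA-256 stands in for the unspecified feature-value function. The traced subdomains are taken to be all right-to-left suffixes of the third URL's host. "Higher than the fourth URL" is read as having strictly more labels than the fourth URL's host, and a segment on neither list is handled like a trusted one. `build_entries` and `level` are hypothetical names.

```python
import hashlib
from typing import List, Set
from urllib.parse import urlsplit


def feature_value(text: str) -> str:
    # Assumption: SHA-256 stands in for the unspecified feature-value function.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def level(host: str) -> int:
    # Assumption: a host's subdomain level is its number of dot-separated labels.
    return len(host.split("."))


def build_entries(second_urls: List[str], segment: str,
                  untrusted: Set[str]) -> Set[str]:
    """Sketch of construction module 22 for known-malicious URLs sharing `segment`."""
    hosts = [urlsplit(u).hostname or "" for u in second_urls]
    # Both marking units (22c and 22d) mark every second URL and its host name.
    entries = {feature_value(u) for u in second_urls}
    entries |= {feature_value(h) for h in hosts}
    if segment not in untrusted:
        # Trusted segment (assumption: unlisted segments are treated the same way).
        return entries
    # Unit 22b: the third URL is the one whose host has the most levels;
    # trace its subdomains from right to left (every suffix of the host).
    labels = max(hosts, key=level).split(".")
    traced = [".".join(labels[i:]) for i in range(len(labels))]
    # Unit 22d: the fourth URL is the one whose host has the fewest levels.
    fourth_level = min(level(h) for h in hosts)
    for sub in traced:
        if sub not in hosts and level(sub) > fourth_level:
            entries.add(feature_value(sub))
    return entries


urls = ["http://x.a.b.evil.com/p", "http://c.evil.com/q"]
marked = build_entries(urls, "evil.com", untrusted={"evil.com"})
print(feature_value("a.b.evil.com") in marked)  # True
print(feature_value("evil.com") in marked)      # False
```

With these example URLs under an untrusted segment, the sketch additionally marks a.b.evil.com but not b.evil.com or evil.com, so under this reading the shared parent domain is not blocked wholesale.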

In the URL access system provided by this embodiment, when the client requests access to a URL, it extracts the URL ciphertext from the URL information and submits it to the server. The server matches the URL ciphertext against the ciphertext stored in its database to perform the security query and verification of the URL, and the client decides, based on the server's verification result, whether to continue accessing the URL. This approach does not rely on a local database on the client: the security query and verification of the URL are performed on the server side. Because the server-side database can be updated promptly with the various malicious URLs appearing on the Internet, its update cycle is much shorter than that of a local client database. Moreover, the server-side database stores a large volume of malicious-URL information with broad coverage. Malicious websites can therefore be intercepted quickly and effectively.

Figure 5 shows a URL access system that can implement the present invention. The URL access system includes a processor 510 and a computer program product or computer-readable medium in the form of a memory 520. The memory 520 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read-only memory), an EPROM, a hard disk, or a ROM. The memory 520 has a storage space 530 for program code 531 for performing any of the method steps described above. For example, the storage space 530 for program code may include individual program codes 531 for implementing the various steps of the above method, respectively. These program codes may be read from, or written to, one or more computer program products. Such computer program products include program code carriers such as hard disks, compact disks (CDs), memory cards, or floppy disks, and are typically portable or fixed storage units as described with reference to Figure 6. The storage unit may have storage segments, storage space, and the like arranged similarly to the memory 520 in the server of Figure 5. The program code may, for example, be compressed in an appropriate form. Typically, the storage unit includes computer-readable code 531', that is, code readable by a processor such as 510, which, when run by the server, causes the server to perform the steps of the method described above.

The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the teachings herein, and the structure required to construct such systems is apparent from the description above. Moreover, the invention is not directed to any particular programming language. It should be understood that the invention described herein may be implemented in a variety of programming languages, and that the above description of a specific language is given to disclose the best mode of the invention.

Numerous specific details are set forth in the description provided herein. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.

Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, various features of the invention are sometimes grouped together into a single embodiment, figure, or description thereof in the above description of exemplary embodiments. However, this method of disclosure is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.

Those skilled in the art will appreciate that the modules in the devices of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of the embodiments may be combined into one module, unit, or component, and may furthermore be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.

In addition, those skilled in the art will understand that although some embodiments described herein include certain features that are included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.

The component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that, in practice, a microprocessor or digital signal processor (DSP) may be used to implement some or all of the functions of some or all of the components of the URL access system according to embodiments of the present invention. The invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the methods described herein. Such a program implementing the invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.

It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any order; these words may be interpreted as names.

Furthermore, it should be noted that the language used in this specification has been chosen principally for readability and instructional purposes, and not to delineate or circumscribe the inventive subject matter. Therefore, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. With respect to the scope of the invention, the disclosure made herein is illustrative rather than restrictive, and the scope of the invention is defined by the appended claims.

Claims

1. A website access method, including:

The client obtains the URL information corresponding to the URL requested to be accessed;

The client extracts the URL ciphertext based on the URL information;

The client submits the ciphertext of the URL to the server;

The server matches the URL ciphertext with the ciphertext stored in the database;

If the URL ciphertext matches the ciphertext marked as a malicious URL in the database, the malicious URL query result is returned to the client; the client blocks access to the URL based on the malicious URL query result;

If the URL ciphertext does not match the ciphertext marked as a malicious URL in the database, the normal URL query result is returned to the client; the client continues to access the URL based on the normal URL query result.
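The claimed exchange can be summarized in a short in-process sketch. It assumes a single SHA-256 digest as the URL ciphertext and replaces the network round trip with a direct method call; `Server`, `ciphertext`, and `client_visit` are hypothetical names, not part of the claims.

```python
import hashlib
from typing import Set


def ciphertext(url: str) -> str:
    # Assumption: one SHA-256 digest serves as the URL ciphertext, so the
    # server only ever sees the digest, never the plaintext URL.
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


class Server:
    def __init__(self, malicious_ciphertexts: Set[str]) -> None:
        # Stand-in for the server database of ciphertext marked as malicious.
        self.malicious = malicious_ciphertexts

    def query(self, url_ciphertext: str) -> str:
        return "malicious" if url_ciphertext in self.malicious else "normal"


def client_visit(url: str, server: Server) -> bool:
    """Return True if the client continues to the URL, False if it blocks it."""
    result = server.query(ciphertext(url))  # submit ciphertext, receive result
    return result == "normal"


server = Server({ciphertext("http://bad.example/")})
print(client_visit("http://bad.example/", server))   # False: access blocked
print(client_visit("http://good.example/", server))  # True: access continues
```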

2. The method according to claim 1, wherein the URL information is specifically at least one first URL.

3. The method according to claim 2, wherein the ciphertext marked as a malicious URL in the database includes one or more of the following: the characteristic value of the malicious URL, the characteristic value of the host name of the malicious URL, and the characteristic value of a subdomain of the malicious URL.

4. The method according to claim 2, wherein the at least one first URL includes: the URL of the web page corresponding to the URL requested to be accessed, the URL of a link in the web page content corresponding to the URL requested to be accessed, the URL of a file downloaded from that web page content, or any combination thereof.

5. The method according to claim 4, wherein the client obtaining the URL information corresponding to the URL requested to be accessed comprises:

Obtain, through a specified response event interface, the URL of the web page corresponding to the URL requested to be accessed by the client.

6. The method according to claim 4, wherein the client obtaining the URL information corresponding to the URL requested to be accessed comprises:

Obtain the page object inside the client's browser;

By calling the method of the page object, the URL linked in the web page content corresponding to the URL requested by the client is obtained.

7. The method according to claim 4, wherein the client obtaining the URL information corresponding to the URL requested to be accessed comprises:

Monitor download-related functions within the client's browser;

When a download occurs in the browser, the URL of the downloaded file is obtained.

8. The method according to any one of claims 2 to 7, further comprising, before the client extracts the URL ciphertext according to the URL information: the client normalizing the at least one first URL.

9. The method according to claim 8, wherein the client normalizing the at least one first URL comprises:

Unify the case of the letters in the first URL;

Remove redundant path characters and parameters from the first URL.
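A minimal sketch of the claim 9 normalization using Python's standard `urllib.parse`. The claim does not define "redundant path characters" or "parameters", so this sketch assumes the former means repeated slashes plus `.`/`..` segments and the latter means the query string and fragment. It also lowercases the whole URL, as the claim states, even though URL paths are case-sensitive on many servers. `normalize` is a hypothetical name.

```python
import posixpath
import re
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Sketch of claim 9 normalization under the assumptions stated above."""
    # Unify letter case (the claim lowercases the first URL as a whole).
    parts = urlsplit(url.lower())
    path = parts.path or "/"
    # Redundant path characters: collapse repeated slashes first (normpath
    # keeps a leading "//"), then resolve "." and ".." segments.
    cleaned = posixpath.normpath(re.sub(r"/{2,}", "/", path))
    if path.endswith("/") and cleaned != "/":
        cleaned += "/"  # keep a directory-style trailing slash
    # Parameters: drop the query string and the fragment.
    return urlunsplit((parts.scheme, parts.netloc, cleaned, "", ""))


print(normalize("HTTP://WWW.Example.COM//a/./b/../c.html?x=1#top"))
# http://www.example.com/a/c.html
```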

10. The method according to any one of claims 2 to 7, wherein the client extracting the URL ciphertext according to the URL information comprises:

Obtain the host name of the first URL and the first domain name segment of the first URL;

Calculate, respectively, the characteristic value of the first URL, the characteristic value of the host name of the first URL, and the characteristic value of the first domain name segment of the first URL;

The characteristic value of the first URL, the characteristic value of the host name of the first URL, and the characteristic value of the first domain name segment of the first URL constitute the URL ciphertext.

11. The method according to claim 10, wherein, if the first-level root domain of the host name of the first URL, read from right to left, is an international top-level domain, the first domain name segment of the first URL is the first-level subdomain of the host name of the first URL;

If the first-level root domain of the host name of the first URL, read from right to left, is a country or region top-level domain and the first-level subdomain includes an international top-level domain, the first domain name segment of the first URL is the second-level subdomain of the host name of the first URL;

If the host name of the first URL uses a dynamic domain name, the first domain name segment of the first URL is the next-level subdomain of the host name of the first URL extracted to the right, starting from the dynamic domain name.

12. The method according to claim 10, wherein returning the malicious URL query result to the client if the URL ciphertext matches the ciphertext marked as a malicious URL in the database specifically comprises: returning the malicious URL query result to the client if any one of the characteristic value of any first URL in the at least one first URL, the characteristic value of the host name of any first URL in the at least one first URL, and the characteristic value of the first domain name segment of any first URL in the at least one first URL matches the ciphertext marked as a malicious URL in the database.

13. The method according to claim 12, wherein the server matching the URL ciphertext with the ciphertext stored in the database comprises:

Match the characteristic value of any first URL in the at least one first URL with the ciphertext marked as a malicious URL in the database; if the characteristic value of any first URL in the at least one first URL matches the ciphertext marked as a malicious URL in the database, return the malicious URL query result to the client;

If none of the characteristic values of the at least one first URL matches the ciphertext marked as a malicious URL in the database, match the characteristic value of the host name of any first URL in the at least one first URL with the ciphertext marked as a malicious URL in the database; if the characteristic value of the host name of any first URL in the at least one first URL matches the ciphertext marked as a malicious URL in the database, return the malicious URL query result to the client;

If none of the characteristic values of the host names of the at least one first URL matches the ciphertext marked as a malicious URL in the database, match the characteristic value of the first domain name segment of any first URL in the at least one first URL with the ciphertext marked as a malicious URL in the database; if the characteristic value of the first domain name segment of any first URL in the at least one first URL matches the ciphertext marked as a malicious URL in the database, return the malicious URL query result to the client; if none of the characteristic values of the first domain name segments of the at least one first URL matches the ciphertext marked as a malicious URL in the database, return the normal URL query result to the client.
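Claim 13 fixes an order: all URL values are checked first, then all host-name values, then all first-domain-segment values. A sketch, assuming each first URL is represented by a tuple of its three characteristic values (placeholder strings below); `staged_query` and the returned stage index are hypothetical additions for illustration.

```python
from typing import List, Set, Tuple

# One entry per first URL: (url_value, host_value, segment_value).
Ciphertext = Tuple[str, str, str]


def staged_query(ciphertexts: List[Ciphertext],
                 malicious: Set[str]) -> Tuple[str, int]:
    """Check all URL values, then all host values, then all segment values.

    Returns the query result and the index of the stage that matched
    (0 = URL, 1 = host name, 2 = first domain name segment), or -1.
    """
    for stage in range(3):
        if any(values[stage] in malicious for values in ciphertexts):
            return "malicious", stage
    return "normal", -1


print(staged_query([("u1", "h1", "s1")], {"h1", "s1"}))  # ('malicious', 1)
print(staged_query([("u1", "h1", "s1")], set()))         # ('normal', -1)
```

Because exact-URL values are tried first, the query stops as soon as an exact-URL hit is found and only falls back to the broader host and segment values when needed.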

14. The method according to claim 3, further comprising the step of constructing the database;

The step of constructing the database comprises:

Obtain at least one second URL that is known to be a malicious URL and that has the same first domain name segment;

Obtain, among the at least one second URL, the third URL containing the highest subdomain level, trace the subdomains contained in the third URL level by level from right to left, and extract at least one level of subdomain;

If the first domain name segment of the second URLs belongs to a preset trusted list, mark the characteristic value of each second URL and the characteristic value of the host name of each second URL as ciphertext of a malicious URL, and store them in the database;

If the first domain name segment of the second URLs belongs to a preset untrusted list, obtain, among the at least one second URL, the fourth URL containing the lowest subdomain level; mark the characteristic value of each second URL, the characteristic value of the host name of each second URL, and the characteristic values of those retroactively extracted subdomains, other than the host names of the second URLs, whose level is higher than that of the fourth URL as ciphertext of a malicious URL, and store them in the database.

15. A website access system, comprising a client and a server;

The client comprises:

Monitoring module, used to obtain the URL information corresponding to the URL requested to be accessed;

An extraction module, used to extract the URL ciphertext based on the URL information;

Communication module, used to submit the ciphertext of the website address to the server;

The protection module is used to block access to the URL based on the malicious URL query results returned by the server;

The access module is used to continue accessing the URL based on the normal URL query results returned by the server.

The server comprises:

Database, used to store ciphertext;

A query module, used to match the URL ciphertext with the ciphertext stored in the database; if the URL ciphertext matches the ciphertext marked as a malicious URL in the database, return the malicious URL query result to the client; if the URL ciphertext does not match the ciphertext marked as a malicious URL in the database, return the normal URL query result to the client.

16. A computer program, comprising computer-readable code which, when run on a server, causes the server to execute the website access method according to any one of claims 1-14.

17. A computer-readable medium storing the computer program according to claim 16.