
CN115186166B - A Tor core site discovery method based on hidden service association - Google Patents

A Tor core site discovery method based on hidden service association

Info

Publication number
CN115186166B
Authority
CN
China
Prior art keywords
hidden service
domain name
hidden
group
sites
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210854926.1A
Other languages
Chinese (zh)
Other versions
CN115186166A (en)
Inventor
杨明
邢琳
顾晓丹
宋炳辰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN202210854926.1A
Publication of CN115186166A
Application granted
Publication of CN115186166B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9532 Query formulation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9558 Details of hyperlinks; Management of linked annotations
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254 Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a Tor core site discovery method based on hidden service association, comprising the steps of (1) a hidden service association algorithm: for websites with similar content but different domain names, a hidden service association algorithm based on page structure and content is designed; (2) calculating the hidden service survival rate; (3) measuring the hidden service visit volume; and (4) Tor core site discovery: analyzing the hidden services in each group clustered in (1) using the hidden service survival rate and visit volume obtained in (2) and (3) to identify the core sites therein. The present invention can realize the discovery of core sites with high analysis value in the Tor dark web.

Description

Tor core site discovery method based on hidden service association
Technical Field
The invention belongs to the technical field of anonymous networks (Anonymity Network), and particularly relates to a Tor core site discovery method based on hidden service association.
Background
Because of its strong anonymity, Tor is used by many criminals to conduct illegal transactions and to engage in illegal activities such as the sale of guns and drugs and the trading of private information; it is also used by some organizations to carry out large-scale network attacks. Effective governance of darknet content therefore requires efficient crawling of the Tor darknet. However, hidden services in the darknet differ in importance, and the amount of useful information they carry varies greatly, so a full-network crawler yields little valuable information and the quality of the data collected from hidden services is low. In addition, a large number of domain names in the darknet serve almost identical site content, that is, the web pages under different domain names are essentially the same. This causes crawlers to spend substantial parsing, storage, and computing resources on duplicate content, which seriously restricts the detection and understanding of the darknet space. Core site discovery for the Tor darknet is therefore necessary.
Disclosure of Invention
To address the problem that the Tor darknet contains a large amount of illegal content urgently in need of governance while the data currently collected from the darknet is of low quality, the invention provides a Tor core site discovery method based on hidden service association.
The invention adopts the following technical scheme:
A method for discovering a Tor core site based on hidden service association, the method comprising the steps of:
(1) Hidden service association algorithm: for Web sites with similar content but different domain names, designing a hidden service association algorithm based on page structure and content;
(2) Calculating the hidden service survival rate: indirectly determining whether a hidden service is online by whether its descriptor exists, and using the result as one of the features for core site determination;
(3) Hidden service access volume measurement: deploying a hidden service directory server (HSDir) to collect requests for hidden service blind public keys, and then obtaining the access volume of each hidden service by analysis and comparison;
(4) Tor core site discovery: analyzing the hidden services in each group clustered in (1) using the survival rate and access volume obtained in (2) and (3), and identifying the core sites among them.
Further, the step (1) specifically includes:
(11) Clustering using the redirect links in the Response Header: some domain names, when accessed, return a 301 status code and automatically redirect to another page, and the Location field in the Response Header gives the domain name of the redirected page; the domain name and the redirect target domain name are therefore clustered into one group;
(12) Grouping identical sites with meaningful titles: the titles of default pages of sites in the darknet are defined as meaningless, including "Index of /", "Apache2 Debian Default Page", "401 Authorization Required", Apache, and Nginx; sites with meaningless titles and sites without title information are each placed in their own group, while sites that have meaningful title information and identical title text are placed in one group;
(13) Clustering by combining the HTML DOM tree, CSS style, and page keywords: a page is extracted from each group with a meaningful title; the DOM tree structure, class attribute values, id attribute values, and top 20 keywords of each page are computed; and a similarity algorithm is used to compare the DOM tree structure similarity, the class and id attribute value similarity, and the page keyword similarity of the pages.
Further, the step (2) specifically includes:
(21) Reading from the database the domain names whose hidden service survival rate is to be calculated;
(22) Deploying multiple Tor processes, with the client sending query requests to the hidden server through the Tor control protocol, so that multiple processes execute concurrently;
(23) If the descriptor query is not abnormal, determining from the returned information whether the descriptor exists and saving the result: if the descriptor exists, the domain name is considered online; if it does not exist, the domain name is considered offline;
(24) If the descriptor query is abnormal and the number of queries does not exceed 5, putting the domain name back into the queue to be queried again later, and returning to step (22);
(25) Saving the detection results, according to the returned information, for calculating the hidden service survival rate.
Further, the step (3) specifically includes:
(31) Calculating, for each v3 domain name, all blind public keys within a given period;
(32) Comparing the offline-calculated blind public keys with the blind public key data collected from the hidden service directory server to obtain the total access volume of each v3 domain name;
(33) Dividing the total access volume of each v3 domain name by the number of days counted to obtain the average daily access volume of the hidden service v3 domain name.
Further, the step (4) specifically includes:
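(41) For each group clustered in (1), the survival rate sr_{j_i} of the group is calculated as the maximum survival rate over all domain names in the group, where the survival rate of a domain name is expressed as follows:
$$sr = \frac{online\_num}{check\_num}$$
wherein online_num is the total number of times the domain name was measured as online, and check_num is the total number of times the domain name was measured;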
(42) For each group clustered in (1), the access volume view_{j_i} of the group is calculated: for websites that declare mirror sites, view_{j_i} is the sum of the access volumes of all domain names in the group; for websites that do not declare mirror sites, view_{j_i} is the maximum access volume among all domain names in the group;
(43) Modeling core site discovery as a binary classification problem in machine learning, taking the access volume, survival rate, number of similar pages, and in-degree and out-degree as classification attributes, and using an XGBoost model to discover core sites;
(44) For pages classified as core sites, the discrimination probability x of the classification is calculated at the same time, and based on this probability the identified core sites are further divided into 3 levels of importance: pages with x ≥ 0.9 are regarded as the most important core sites, pages with 0.75 ≤ x < 0.9 as the second most important, and pages with 0.5 ≤ x < 0.75 as the least important core sites.
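The thresholds in step (44) can be illustrated with a short Python sketch. The function name `importance_level` and the numbering of the levels (1 for most important) are assumptions made for illustration; only the three probability intervals come from the text.

```python
def importance_level(x):
    """Map a classifier probability x to an importance level.

    Returns 1 (most important), 2, or 3 (least important core site),
    or None when x is below 0.5 and the page is not a core site.
    The level numbering is a hypothetical convention.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if x >= 0.9:
        return 1
    if x >= 0.75:
        return 2
    if x >= 0.5:
        return 3
    return None
```

The intervals are half-open, so a page with x exactly equal to 0.9 or 0.75 falls into the higher of the two adjacent levels.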
Compared with the prior art, the invention has the remarkable advantages that:
1. Hidden service probing efficiency is improved: from the moment the Tor client sends a request until it reaches the hidden service, the complete process must pass through 15 onion router hops, whereas the probing method of the invention passes through only 3 onion router hops, which significantly improves probing efficiency.
2. Traditional schemes that deploy hidden service directory servers to collect access volume are based on the Tor v2 protocol, whereas this method extracts the relevant code from the Tor source to calculate the blind public keys of v3 domain names offline, and finally obtains the Tor v3 hidden service access volume by analysis and comparison.
3. Existing importance rankings of Tor hidden services do not take the characteristics of the Tor protocol into account. The invention combines core site discovery with Tor protocol characteristics, including hidden service survival rate and access volume, so that hidden service core sites can be discovered more effectively.
Drawings
FIG. 1 is a schematic diagram of the comprehensive cluster analysis algorithm of the present invention.
FIG. 2 is a flow chart of hidden service liveness probing of the present invention.
FIG. 3 is a system deployment diagram for hidden service liveness probing and access volume measurement of the present invention.
FIG. 4 is a flow chart of model training for core site discovery of the present invention.
Detailed Description
The invention designs and implements a Tor core site discovery technique based on hidden service association to discover core sites in the darknet. The method comprises four parts: hidden service association, hidden service liveness detection, hidden service access volume measurement, and the core site discovery scheme, described as follows:
1. Hidden service association
The hidden service association algorithm comprises three steps: clustering by the redirect links in the Response Header, grouping by meaningful titles, and clustering by combining the HTML DOM tree, CSS style, page keywords, and other content.
Step one, clustering by the redirect links in the Response Header: the Location field in the Response Header gives the domain name of the redirected page, so the domain name and the redirect target domain name are clustered into one group.
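As an illustration of step one, the following Python sketch groups domains linked by redirects using a union-find structure. The input format (a mapping from requested domain to the value of the Location header) and the function name `cluster_by_redirect` are assumptions; the text does not describe how the clustering is implemented.

```python
from urllib.parse import urlparse


def cluster_by_redirect(redirects):
    """Group domains that are connected by redirects.

    `redirects` maps a requested domain to the Location header value of
    its redirect response (hypothetical input format).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for domain, location in redirects.items():
        # A relative Location (no hostname) stays on the same domain.
        target = urlparse(location).hostname or domain
        union(domain, target)

    groups = {}
    for domain in list(parent):
        groups.setdefault(find(domain), set()).add(domain)
    return sorted(sorted(group) for group in groups.values())
```

Union-find also merges chains of redirects: if a.onion redirects to b.onion and b.onion redirects to c.onion, all three end up in one group.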
Step two, grouping by meaningful titles: the invention considers the titles of the default pages of Web servers (such as Apache and Nginx), for example "Index of /", "Apache2 Debian Default Page", and "401 Authorization Required", to be meaningless. Building on step one (the title of a group successfully clustered in step one is taken to be the title of the redirect target domain name), sites with meaningless titles and sites without title information are each placed in their own group, and sites that have meaningful title information and identical title text are placed in one group.
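Step two can be sketched in Python as follows. The set of meaningless titles is taken from the examples in the text; the case-insensitive, whitespace-normalized exact match and the function name `group_by_title` are assumptions made for illustration.

```python
from collections import defaultdict

# Default-page titles treated as meaningless (examples from the text).
MEANINGLESS_TITLES = {
    "index of /",
    "apache2 debian default page",
    "401 authorization required",
    "apache",
    "nginx",
}


def group_by_title(titles):
    """Group domains by page title.

    `titles` maps domain -> title string, or None/"" when the page has
    no title (hypothetical input format).
    """
    groups = defaultdict(list)
    for domain, title in titles.items():
        # Assumption: titles are compared after collapsing whitespace
        # and lowercasing.
        normalized = " ".join((title or "").split()).lower()
        if not normalized:
            key = ("no_title", "")
        elif normalized in MEANINGLESS_TITLES:
            key = ("meaningless", "")
        else:
            key = ("meaningful", normalized)
        groups[key].append(domain)
    return dict(groups)
```

Only the groups keyed as `meaningful` carry a usable association signal; the other two buckets exist so that default pages and untitled pages are not mistaken for copies of one another.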
Step three, clustering by combining the HTML DOM tree, CSS style, page keywords, and other content: a page is extracted from each group with a meaningful title; the DOM tree structure, class attribute values, id attribute values, and top 20 keywords of each page are computed; and a similarity algorithm is used to compare the DOM tree structure similarity, the class and id attribute value similarity, and the page keyword similarity of the pages. The overall flow is shown in FIG. 1. Specifically, the DOM tree similarity of every two pages is calculated by a sequence comparison method and denoted similarity 1; the similarity of the class and id attribute values of every two page documents is calculated with the Jaccard similarity coefficient and denoted similarity 2; and the similarity of the keyword information of every two pages is likewise calculated with the Jaccard coefficient and denoted similarity 3. The three similarities are combined to determine whether two pages belong to the same group.
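A minimal Python sketch of the three similarities follows. Several choices here are assumptions, not the method of the text: the DOM tree is approximated by the sequence of opening tags, the sequence comparison uses `difflib.SequenceMatcher`, and the three scores are combined by a plain average against a hypothetical threshold of 0.8, since the text does not say how they are combined.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class PageFeatures(HTMLParser):
    """Collects the opening-tag sequence and class/id attribute values."""

    def __init__(self):
        super().__init__()
        self.tag_seq = []
        self.style_keys = set()

    def handle_starttag(self, tag, attrs):
        self.tag_seq.append(tag)
        for name, value in attrs:
            if name == "class" and value:
                self.style_keys.update(value.split())
            elif name == "id" and value:
                self.style_keys.add("#" + value)


def extract(html):
    parser = PageFeatures()
    parser.feed(html)
    return parser.tag_seq, parser.style_keys


def jaccard(a, b):
    """Jaccard similarity coefficient |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def page_similarity(html_a, keywords_a, html_b, keywords_b):
    tags_a, keys_a = extract(html_a)
    tags_b, keys_b = extract(html_b)
    sim1 = SequenceMatcher(None, tags_a, tags_b, autojunk=False).ratio()
    sim2 = jaccard(keys_a, keys_b)
    sim3 = jaccard(keywords_a, keywords_b)
    return sim1, sim2, sim3


def same_group(sims, threshold=0.8):
    # Assumption: plain average of the three scores, hypothetical threshold.
    return sum(sims) / len(sims) >= threshold
```

The keyword lists are passed in rather than computed, because the text does not specify how the top 20 keywords of a page are chosen.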
2. Hidden service liveness detection scheme
In this scheme, whether a hidden service is online is determined indirectly by whether its descriptor exists. Analysis of the Tor protocol shows that a client must query a hidden service directory server for the hidden service descriptor before communicating with the hidden service, and that the directory server's response falls into one of three cases:
(1) Query success: the descriptor exists and is returned successfully;
(2) Query failure: the descriptor does not exist;
(3) Query exception: no descriptor information is returned for some reason, such as a query timeout or the hidden service directory server rejecting the request.
Each hidden service periodically (at intervals of no more than two hours) sends its descriptor to the hidden service directory server, and the hidden service directory server periodically clears expired descriptors, so whether a hidden service is online can be determined indirectly by whether its descriptor exists.
The overall liveness detection flow is shown in FIG. 2; the specific steps are as follows:
(1) Reading the domain names to be tested from the database;
(2) Deploying multiple Tor processes, with the client sending query requests to the hidden server through the Tor control protocol, so that multiple processes execute concurrently;
(3) If the descriptor query is not abnormal, determining from the returned information whether the descriptor exists and saving the result: if the descriptor exists, the domain name is considered online; if it does not exist, the domain name is considered offline;
(4) If the descriptor query is abnormal and the number of queries does not exceed 5, putting the domain name back into the queue to be queried again later, and returning to step (2);
(5) Saving the online detection result of the hidden service according to the returned information.
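The retry logic of steps (1) to (5) can be sketched in Python as below. The actual descriptor query is abstracted behind a hypothetical callable `query_descriptor`, which is assumed to return "found", "missing", or "error"; the sketch runs sequentially, whereas the text uses multiple Tor processes. Reading "does not exceed 5" as at most five queries per domain, and marking a domain "unknown" once the limit is reached, are both assumptions.

```python
from collections import deque

MAX_QUERIES = 5  # one reading of "the number of queries does not exceed 5"


def probe_domains(domains, query_descriptor):
    """Probe each domain, re-queueing queries that end abnormally.

    `query_descriptor(domain)` is a hypothetical callable returning
    "found", "missing", or "error" (timeout, HSDir refusal, ...).
    """
    queue = deque((domain, 0) for domain in domains)
    results = {}
    while queue:
        domain, attempts = queue.popleft()
        outcome = query_descriptor(domain)
        attempts += 1
        if outcome == "found":
            results[domain] = "online"
        elif outcome == "missing":
            results[domain] = "offline"
        elif attempts < MAX_QUERIES:
            queue.append((domain, attempts))  # query again later
        else:
            results[domain] = "unknown"  # assumption: give up after the limit
    return results
```

Appending a failed domain to the back of the queue means the other pending domains are probed before it is retried, which gives a transient HSDir problem some time to clear.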
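For each hidden service, the survival rate sr can be expressed as follows:
$$sr = \frac{online\_num}{check\_num}$$
where online_num is the total number of times the domain name was measured as online, and check_num is the total number of times the domain name was measured.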
3. Hidden service access volume measurement scheme
A hidden service domain name in the Tor network must first be looked up through an HSDir when it is accessed, so the Tor source code can be modified accordingly to record and count client access requests. This is the general idea of the hidden service access volume measurement method proposed by the invention.
When the client sends the descriptor ID corresponding to a domain name to the selected HSDir, a call to the cache_lookup_v3_as_dir function in the cache/src/feature/hs/hs_cache.c file of the Tor source code is triggered to check whether the descriptor ID exists in the cache; it returns 1 if found and 0 otherwise. Code can therefore be added to this function to record client access requests. However, the HSDir cannot directly obtain the hidden service domain name: when collecting access data it sees only the blind public key, which can be calculated offline. The specific measurement flow is as follows:
(1) Calculating, for each domain name, all blind public keys within a given period;
(2) Comparing the offline-calculated blind public keys with the blind public key data collected from the HSDir to obtain the total access volume of each v3 domain name;
(3) Dividing the total access volume of each domain name by the number of days counted to obtain the average daily access volume of the hidden service domain name.
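The matching and averaging in steps (1) to (3) can be sketched in Python as follows. The real blind public key derivation of Tor v3 is not implemented here: `placeholder_blinded_key` is a stand-in hash so that the sketch runs, and the assumption of one key per domain per day, the log format (a flat list of observed key strings), and all function names are hypothetical.

```python
import hashlib
from collections import Counter
from datetime import date, timedelta


def placeholder_blinded_key(domain, day):
    # NOT the real Tor v3 key blinding: a stand-in hash for illustration.
    return hashlib.sha3_256(f"{domain}|{day.isoformat()}".encode()).hexdigest()


def daily_average_access(domains, start, days, observed_keys,
                         blinded_key=placeholder_blinded_key):
    """Match offline-computed keys against keys logged on the HSDir."""
    # Step (1): every key of every domain in the period
    # (assumption: one key per domain per day).
    key_to_domain = {}
    for domain in domains:
        for offset in range(days):
            day = start + timedelta(days=offset)
            key_to_domain[blinded_key(domain, day)] = domain

    # Step (2): count logged requests whose key belongs to a known domain.
    totals = Counter({domain: 0 for domain in domains})
    for key in observed_keys:
        domain = key_to_domain.get(key)
        if domain is not None:
            totals[domain] += 1

    # Step (3): average daily access = total / number of days counted.
    return {domain: totals[domain] / days for domain in domains}
```

Building the key-to-domain table first turns the comparison into a dictionary lookup per log entry, so the cost grows with the log size rather than with the product of log size and domain count.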
FIG. 3 shows the overall deployment scheme for hidden service liveness probing and access volume measurement.
4. Core site discovery scheme
The core site discovery scheme combines the survival rate and access volume features obtained in parts 2 and 3; the overall algorithm flow is as follows:
(1) Calculating the survival rate and access volume of each group of hidden services: the survival rate of each group is denoted sr_{j_i}, whose value is the maximum survival rate over all domain names in the group, and the access volume of each group is denoted view_{j_i}. For a website that declares mirror sites, view_{j_i} is the sum of the access volumes of all domain names in the group; for a website that does not declare mirror sites, view_{j_i} is the maximum access volume among all domain names in the group.
(2) Data preprocessing: the access volume of each group of hidden services is normalized. Let the normalized access volume of each group of hidden services be view'_{j_i}; the normalization formula is not reproduced in this text.
(3) Training a classification model to obtain the core sites: core site discovery is modeled as a classification problem in machine learning, the preprocessed data are used as classification attributes, and an XGBoost model is used for core site discovery. The overall model training flow is shown in FIG. 4.
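Step (1) of the flow above, the per-group aggregation, can be sketched in Python. The per-domain survival rate is assumed to be the ratio of online measurements to total measurements, as the definitions of online_num and check_num in the claims suggest; the record layout (dicts with `online_num`, `check_num`, and `views` keys) and the function names are hypothetical.

```python
def survival_rate(online_num, check_num):
    """Share of measurements in which the domain was online."""
    return online_num / check_num if check_num else 0.0


def aggregate_group(members, declares_mirrors):
    """Return (group survival rate, group access volume).

    `members` is a list of dicts with the hypothetical keys
    "online_num", "check_num", and "views" (daily average access).
    """
    # Group survival rate: maximum over all domain names in the group.
    sr = max(survival_rate(m["online_num"], m["check_num"]) for m in members)
    views = [m["views"] for m in members]
    # Declared mirrors: sum the access volumes; otherwise take the maximum.
    view = sum(views) if declares_mirrors else max(views)
    return sr, view
```

For example, a group with two domains measured online 18 and 12 times out of 24, with daily access volumes 100 and 40, gets a survival rate of 0.75 and an access volume of 140 if mirrors are declared or 100 if they are not.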
It should be noted that the above embodiments are only for illustrating the technical solution of the present application and not for limiting the scope of protection thereof, and although the present application has been described in detail with reference to the above embodiments, it should be understood by those skilled in the art that various changes, modifications or equivalents may be made to the specific embodiments of the application after reading the present application, and these changes, modifications or equivalents are within the scope of protection of the claims appended hereto.

Claims (2)

1. A Tor core site discovery method based on hidden service association, characterized in that the method comprises the following steps:
(1) Hidden service association algorithm: for Web sites with similar content but different domain names, designing a hidden service association algorithm based on page structure and content;
(11) clustering using the redirect links in the Response Header;
(12) grouping identical sites that have meaningful titles into one group;
(13) clustering by combining the HTML DOM tree, CSS style, and page keywords;
(2) Calculating the hidden service survival rate: indirectly determining whether a hidden service is online by whether its descriptor exists, and using the result as one of the features for core site determination;
(21) reading from the database the domain names whose hidden service survival rate is to be calculated;
(22) deploying multiple Tor processes, with the client sending query requests to the hidden server through the Tor control protocol, so that multiple processes execute concurrently;
(23) if the descriptor query is not abnormal, determining from the returned information whether the descriptor exists and saving the result: if the descriptor exists, the domain name is considered online; if it does not exist, the domain name is considered offline;
(24) if the descriptor query is abnormal and the number of queries does not exceed 5, putting the domain name back into the queue to be queried again later, and returning to step (22);
(25) saving, according to the returned information, the detection results used to calculate the hidden service survival rate;
(3) Hidden service access volume measurement: deploying a hidden service directory server (HSDir) to collect requests for hidden service blind public keys, and then obtaining the access volume of each hidden service by analysis and comparison;
(31) calculating, for each v3 domain name, all blind public keys within a given period;
(32) comparing the offline-calculated blind public keys with the blind public key data collected from the hidden service directory server to obtain the total access volume of each v3 domain name;
(33) dividing the total access volume of each v3 domain name by the number of days counted to obtain the average daily access volume of the hidden service v3 domain name;
(4) Tor core site discovery: analyzing the hidden services in each group clustered in (1) using the hidden service survival rate and access volume obtained in (2) and (3), and identifying the core sites among them;
(41) for each group clustered in (1), calculating the survival rate sr_{j_i} of the group, whose value is the maximum survival rate over all domain names in the group, where the survival rate of a domain name is expressed by the following formula:
$$sr = \frac{online\_num}{check\_num}$$
wherein online_num is the total number of times the domain name was measured as online, and check_num is the total number of times the domain name was measured;
(42) for each group clustered in (1), calculating the access volume view_{j_i} of the group: for websites that declare mirror sites, view_{j_i} is the sum of the access volumes of all domain names in the group; for websites that do not declare mirror sites, view_{j_i} is the maximum access volume among all domain names in the group;
(43) modeling core site discovery as a binary classification problem in machine learning, taking the access volume, survival rate, number of similar pages, and in-degree and out-degree as classification attributes, and using an XGBoost model to discover core sites;
(44) for pages classified as core sites, calculating at the same time the discrimination probability x of the classification, and based on this probability further dividing the identified core sites into 3 levels of importance, wherein pages with x ≥ 0.9 are regarded as the most important core sites, pages with 0.75 ≤ x < 0.9 as the second most important, and pages with 0.5 ≤ x < 0.75 as the least important core sites.
2. The Tor core site discovery method based on hidden service association according to claim 1, characterized in that:
step (11) is specifically: some domain names, when accessed, return a 301 status code and automatically redirect to another page, and the Location field in the Response Header gives the domain name of the redirected page; the domain name and the redirect target domain name are therefore clustered into one group;
step (12) is specifically: the titles of default pages of sites in the darknet are defined as meaningless, including "Index of /", "Apache2 Debian Default Page", "401 Authorization Required", Apache, and Nginx; sites with meaningless titles and sites without title information are each placed in their own group, while sites that have meaningful title information and identical title text are placed in one group;
step (13) is specifically: clustering by combining the HTML DOM tree, CSS style, and page keywords: a page is extracted from each group with a meaningful title; the DOM tree structure, class attribute values, id attribute values, and top 20 keywords of each page are computed; and a similarity algorithm is used to compare the DOM tree structure similarity, the class and id attribute value similarity, and the page keyword similarity of the pages.
CN202210854926.1A 2022-07-20 2022-07-20 A Tor core site discovery method based on hidden service association Active CN115186166B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210854926.1A CN115186166B (en) 2022-07-20 2022-07-20 A Tor core site discovery method based on hidden service association

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210854926.1A CN115186166B (en) 2022-07-20 2022-07-20 A Tor core site discovery method based on hidden service association

Publications (2)

Publication Number Publication Date
CN115186166A (en) 2022-10-14
CN115186166B (en) 2025-10-14

Family

ID=83519364

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210854926.1A Active CN115186166B (en) 2022-07-20 2022-07-20 A Tor core site discovery method based on hidden service association

Country Status (1)

Country Link
CN (1) CN115186166B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118413452B (en) * 2024-06-24 2024-08-23 中国电子科技集团公司第三十研究所 A Tor key node discovery method and device based on time series network

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109800364A (en) * 2018-12-15 2019-05-24 深圳壹账通智能科技有限公司 Amount of access statistical method, device, equipment and storage medium based on block chain
CN114095242A (en) * 2021-11-18 2022-02-25 东南大学 Storage type hidden channel scheme based on Tor hidden service domain name state

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8560413B1 (en) * 2005-07-14 2013-10-15 John S. Quarterman Method and system for detecting distributed internet crime
US20100205297A1 (en) * 2009-02-11 2010-08-12 Gurusamy Sarathy Systems and methods for dynamic detection of anonymizing proxies
EP3719685A1 (en) * 2019-04-03 2020-10-07 Deutsche Telekom AG Method and system for clustering darknet traffic streams with word embeddings
CN114157713B (en) * 2021-10-09 2023-06-16 北京邮电大学 Method and system for capturing hidden service traffic
CN114238736A (en) * 2021-12-24 2022-03-25 上海谋乐网络科技有限公司 A method and device for monitoring dark web data

Also Published As

Publication number Publication date
CN115186166A (en) 2022-10-14

Similar Documents

Publication Publication Date Title
Bonchi et al. Web log data warehousing and mining for intelligent web caching
Tyagi et al. An algorithmic approach to data preprocessing in web usage mining
CN113785289B (en) System and method for dynamically generating a set of API endpoints
US11755626B1 (en) Systems and methods for classifying data objects
US12181956B1 (en) Machine-learning based prioritization of alert groupings
CN101127043A (en) A lightweight personalized search engine and its search method
US7340460B1 (en) Vector analysis of histograms for units of a concept network in search query processing
CN101369276A (en) A Forensics Method of Web Browser Cache Data
US8065729B2 (en) Method and apparatus for generating network attack signature
Sujatha et al. Improved user navigation pattern prediction technique from web log data
Lakshmi et al. An overview of preprocessing on web log data for web usage analysis
CN114500122B (en) Specific network behavior analysis method and system based on multi-source data fusion
CN118710461A (en) A smart campus archive data security management system
CN119006029A (en) User portrait construction method, equipment, medium and product based on multi-source data
CN115186166B (en) A Tor core site discovery method based on hidden service association
Dua et al. Discovery of Web frequent patterns and user characteristics from Web access logs: a framework for dynamic Web personalization
US20120030164A1 (en) Method and system for gathering and usage of live search trends
Lu et al. Web log mining
KR20200066428A (en) A unit and method for processing rule based action
CN118018598A (en) Intelligence push method and device based on user subscription
US8909795B2 (en) Method for determining validity of command and system thereof
Wu et al. A data warehousing and data mining framework for web usage management
US12189624B1 (en) Facilitating management and storage of configurations
Hawwash et al. Mining and tracking evolving web user trends from large web server logs
Haider Ramadhan et al. A classification of techniques for web usage analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant