TW201815142A

TW201815142A - Method for detecting domain flux botnet through proxy server log

Info

Publication number: TW201815142A
Application number: TW105130289A
Authority: TW
Inventors: 鄭棕翰; 陳建智; 周國森; 黃秀娟; 施君熹
Original assignee: Chunghwa Telecom Co., Ltd. (中華電信股份有限公司)
Priority date: 2016-09-20
Filing date: 2016-09-20
Publication date: 2018-04-16
Also published as: TWI634769B

Abstract

The invention relates to a method for detecting a domain flux botnet through proxy server logs. A website analysis module performs the following steps: accessing proxy server log data; filtering ordinary user-agent connections out of the log data according to a node filtering algorithm; grouping the remaining records according to a website grouping algorithm to obtain sets of associated external websites in the log content; obtaining the connection topology of the external websites and determining, with a connection feature algorithm, whether each set conforms to the connection characteristics of domain flux; and filtering out, from the remaining records, domains contacted by normal applications or CDNs, the remaining domains being judged to belong to domain flux botnets.

Description

Method for detecting domain flux botnet through proxy server logs

The present invention relates to a method for detecting malicious networks through proxy logs, and more particularly to a method for detecting a domain flux botnet through proxy server logs.

Organizing and analyzing proxy server log data is a foundation of present-day information security management. With the arrival of the big data era, extracting valuable information from massive volumes of log data has become a major problem for practitioners in the field. A method that can pick out clusters of malicious domains, and further pinpoint the victim hosts inside an enterprise, would clearly benefit enterprises that require strict security control.

For example, Republic of China (Taiwan) Patent Publication No. 1455546 discloses a method and system for detecting malicious domains that use fast-changing (fast flux) domain techniques. It mainly relies on information contained in router data, such as router host names and the autonomous system numbers of network addresses, and decides whether a domain is malicious based on whether specific parts of the router host names are identical, or whether the network packet transmission time exceeds a preset check value. However, its accuracy is limited, and it may misreport normal applications as malicious domains.

Beyond the above prior art, techniques have also been developed that query web search engines and WHOIS websites to obtain the set of domains related to a domain name, identify suspicious websites that return only a few search results, and finally decide, based on the set of related domains and their numbers of search results, whether a domain is a relay-station domain of a suspicious botnet. Such techniques still cannot accurately pick out domain flux botnets. When applied to a botnet, domain flux is a technique that uses existing DNS services or domain generation algorithms to associate multiple domain names with the same IP address, thereby evading URL-based detection.

Accordingly, the various malicious-website protection methods built on proxy server log data still leave much room for improvement as attack techniques keep changing, and each can be refined for a specific problem. A screening method aimed specifically at domain flux botnets is what the field urgently needs.

The present invention proposes a method for detecting a domain flux botnet through proxy server logs. To raise the survival rate of a botnet, attackers frequently use domain flux so that it is not easily discovered and blocked. However, every connection a malicious program makes to a specific external website is recorded in detail in the proxy server's log data. The idea of the invention is therefore to analyze the proxy server logs, build unions of related domains, extract the unions whose connection behavior matches domain flux, and report them as likely domain flux botnets.

In the method of the present invention, a website analysis module performs several steps. First, the website analysis module accesses the proxy server log data and, according to a node filtering algorithm, filters the connections represented by the user-agent information in each log record, so as to remove log content corresponding to connections to ordinary websites. The node filtering algorithm uses the node degree of user agents to filter out well-known websites. Specifically, this step uses the MapReduce framework with the terminal (destination) URL of each external website as the key and the user agent as the value, which efficiently yields a list of how many different user agents have connected to each external website. Filtering out the websites with a large degree in this list removes the relatively well-known websites in a first pass. The invention retains only external websites whose degree over long-term traffic data is below a preset threshold (for example, 10). The rationale is that, beyond excluding well-known websites, an external website contacted by only a handful of user agents usually serves a special purpose, and such websites have a high probability of being malicious relay stations.
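The node filtering step can be sketched in a single process as below. This is a minimal sketch, not the patent's implementation: the map and reduce phases are emulated with a Python generator and a dictionary rather than a real MapReduce cluster, and the record field names (`client_ip`, `dest_url`, `user_agent`) are hypothetical. A URL's degree is taken to be the number of distinct user-agent strings that contacted it; the default threshold of 10 is the example value given in the text.

```python
from collections import defaultdict


def map_phase(records):
    """Mapper: emit (destination URL, user agent) for each log record."""
    for rec in records:
        yield rec["dest_url"], rec["user_agent"]


def reduce_phase(pairs):
    """Reducer: a URL node's degree is the number of distinct user agents seen for it."""
    agents_by_url = defaultdict(set)
    for url, agent in pairs:
        agents_by_url[url].add(agent)
    return {url: len(agents) for url, agents in agents_by_url.items()}


def filter_low_degree(records, threshold=10):
    """Keep only records whose destination URL has a degree below the threshold."""
    degree = reduce_phase(map_phase(records))
    return [rec for rec in records if degree[rec["dest_url"]] < threshold]
```

With `threshold=2`, a URL seen with two distinct user agents is dropped, while a URL seen with a single user agent is kept.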

Next, the website analysis module applies a website grouping algorithm to the connection records remaining after the previous step, matching external websites that share the same Client IP and user-agent connection records, so as to group the log content into sets of associated external websites. This step uses the traffic information in the logs to establish associations between external websites: all external websites contacted by the same program are regarded as related. Specifically, the invention places external websites that share the same Client IP and user-agent connection records into the same set, indicating that these websites are related. Because the features used to build these sets (Client IP and user agent) can be decided by whether the corresponding fields in the log records match exactly (Exact Match), the website grouping algorithm can also be implemented with the key-value model of MapReduce.
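Because the grouping key is an exact match on the Client IP and user-agent fields, the first pass of this step reduces to a group-by. The sketch below assumes records are dicts with hypothetical `client_ip`, `user_agent`, and `dest_url` keys; each resulting URL set is only a seed set, which still has to be merged with the sets it overlaps.

```python
from collections import defaultdict


def group_by_client_and_agent(records):
    """Map each exact (Client IP, user agent) key to the set of destination URLs it contacted."""
    groups = defaultdict(set)
    for rec in records:
        key = (rec["client_ip"], rec["user_agent"])  # exact-match grouping key
        groups[key].add(rec["dest_url"])
    return dict(groups)
```

In a MapReduce job the tuple would be the map output key and the reducer would collect the URL values; here a dictionary plays both roles.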

The website grouping algorithm may perform the grouping by union-find. After the website analysis module builds a number of sets from the connection records of the remaining log content, it performs the following steps: taking the individual connection records in each set as elements, it judges sets to intersect when their connection records reach external websites with the same Client IP and user agent, and merges the sets whose elements intersect; deletes sets whose number of elements exceeds a preset threshold; marks sets that do not intersect any remaining set as independent sets; and repeats the above three steps until all sets have been marked independent.
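One way to mirror the three repeated steps is the round-based loop below. It is a sketch under stated assumptions: each set holds destination URLs, two sets intersect when they share a URL (as in the FIG. 4 example), and each round merges a set only with the sets it directly overlaps, the way a single MapReduce round would. `merge_until_disjoint` and `max_size` are hypothetical names.

```python
def merge_until_disjoint(url_sets, max_size):
    """Iteratively merge overlapping URL sets following the three repeated steps.

    Each round: (1) replace every pending set by its union with all pending sets
    it directly overlaps, dropping duplicates; (2) delete sets larger than
    max_size; (3) move sets that overlap no other remaining set to the
    independent result. Rounds repeat until nothing is pending.
    """
    pending = {frozenset(s) for s in url_sets}
    independent = set()
    while pending:
        # Step 1: one round of merging with directly overlapping sets.
        merged = set()
        for s in pending:
            union = s
            for t in pending:
                if s & t:
                    union |= t
            merged.add(union)
        # Step 2: delete sets that grew beyond the size threshold.
        merged = {s for s in merged if len(s) <= max_size}
        # Step 3: sets that overlap nothing else are finished.
        done = {s for s in merged if not any(s & t for t in merged if t != s)}
        independent |= done
        pending = merged - done
    return independent
```

Applied to the four seed sets of the FIG. 4 example ({CnC1, CnC2}, {CnC2, CnC3}, {CnC4, CnC5}, {CnC4, CnC6}) with `max_size=10`, one round yields the two independent sets {CnC1, CnC2, CnC3} and {CnC4, CnC5, CnC6}; with `max_size=2` both merged sets are deleted. Each round either shrinks the number of pending sets or strictly grows one of them, so the loop terminates.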

Following these steps, the website analysis module obtains the Client IP connection records in the log content and the connection topology between the Client IPs and the external websites they contact. After the connection records have been classified by the disjoint URL sets produced above, the connection records of one set connect only to the domains of that same set; they therefore represent network behavior produced by programs with the same purpose. This step obtains the connection topology between Client IPs and external URLs, together with the Client IP connection information in the proxy server logs. For example, a group ID can be derived by hashing the URL strings of each disjoint set; combining it with the connection information recorded for each URL, and using the URL as the key in MapReduce's Multiple Input mechanism, then yields for each group ID the corresponding set of connection records and the group's connection topology graph.
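The hash-then-join idea can be illustrated in one process as follows. This is a hedged sketch: the choice of SHA-1 truncated to 32 bits for the group ID is an assumption (the text only says a hash of the URL strings is used), and the reduce-side join that MapReduce's Multiple Input would perform is emulated with a URL-to-group dictionary. Function names are hypothetical.

```python
import hashlib
from collections import defaultdict


def group_id(url_set):
    """Derive a stable group ID by hashing the sorted URLs of one disjoint set."""
    joined = "\n".join(sorted(url_set)).encode("utf-8")
    return int(hashlib.sha1(joined).hexdigest()[:8], 16)  # 32-bit ID (assumed width)


def join_records_to_groups(url_sets, records):
    """Emulate a join keyed on URL: URL -> group ID, then bucket the log records by group."""
    url_to_group = {}
    for s in url_sets:
        gid = group_id(s)
        for url in s:
            url_to_group[url] = gid
    grouped = defaultdict(list)
    for rec in records:
        gid = url_to_group.get(rec["dest_url"])
        if gid is not None:  # skip records whose URL is in no disjoint set
            grouped[gid].append(rec)
    return dict(grouped)
```

Sorting before hashing makes the ID independent of set iteration order, so every worker that sees the same URL set derives the same group ID.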

Next, the website analysis module presents the relationship between the external websites and their associated sets as a connection topology, and uses a connection feature algorithm to determine whether each set conforms to the connection characteristics of domain flux. The connection characteristic of domain flux is that a few Client IPs, using the same user agent, connect to many different external URL domains. In practice, the connection feature algorithm computes, for a set, the ratio of the number of terminal URLs to the number of Client IPs; if the ratio exceeds a preset threshold, the websites in the set exhibit the domain-changing behavior characteristic of domain flux.
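The ratio test described above is small enough to state directly. A minimal sketch, assuming each group is a list of dict records with hypothetical `client_ip` and `dest_url` keys; the threshold value is left to the caller because the text does not fix one, and the "same user agent" condition is not checked here.

```python
def is_domain_flux(group_records, ratio_threshold):
    """Flag a group whose distinct-URL count per distinct-Client-IP count exceeds the threshold."""
    urls = {r["dest_url"] for r in group_records}
    client_ips = {r["client_ip"] for r in group_records}
    if not client_ips:  # empty group: nothing to judge
        return False
    return len(urls) / len(client_ips) > ratio_threshold
```

One client IP reaching nine distinct domains gives a ratio of 9 and is flagged at a threshold of 5, whereas three clients all reaching the same CDN host give a ratio of 1/3 and are not.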

Finally, the website analysis module takes the sets that conform to the domain flux connection characteristics and filters out the domains that correspond to normal network applications (for example, antivirus software) or CDNs. The module then judges the remaining domains to belong to domain flux botnets and raises an alert.

The above is the method of the present invention for detecting domain flux botnets through proxy server logs. Using the characteristics that domain flux botnets are likely to exhibit, it mines large volumes of long-term log data for external websites suspected of belonging to domain flux botnets, so that further preventive measures can be taken.

A‧‧‧Computer

B‧‧‧Computer

C‧‧‧Computer

E‧‧‧Computer

F‧‧‧Computer

G‧‧‧Computer

H‧‧‧Computer

1‧‧‧Attacker

2‧‧‧Attacker

10‧‧‧Cluster

11‧‧‧Relay station

12‧‧‧Relay station

13‧‧‧Relay station

20‧‧‧Cluster

21‧‧‧Relay station

22‧‧‧Relay station

23‧‧‧Relay station

30‧‧‧Relay station

S201~S205‧‧‧Method steps

FIG. 1 is a schematic diagram of a scenario in which a domain flux botnet is detected through proxy server logs according to the present invention.

FIG. 2 is a flowchart of the method of the present invention for grouping external websites through proxy logs.

FIG. 3 is a schematic diagram of an example statistical list of the number of different user agents connecting to each website according to the present invention.

FIG. 4 is a schematic diagram of an example of converting simple log data into sets of external websites according to the present invention.

FIG. 5 is a schematic diagram of an example of tracing network connection groups back from the sets according to the present invention.

FIG. 6 is a schematic diagram of an example domain flux network behavior topology according to the present invention.

FIG. 7 is a schematic diagram of an example of antivirus software connecting through a domain flux method.

FIG. 8 is a schematic diagram of an example of proxy server log data.

FIG. 9 is a schematic diagram of an embodiment in which a domain flux botnet is screened out by the method of the present invention.

FIG. 10 is a schematic diagram of an embodiment in which a domain flux botnet is screened out by the method of the present invention.

To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.

FIG. 1 is a schematic diagram of a scenario in which a domain flux botnet is detected through proxy server logs according to the present invention. After implanting a malicious program in a victim host, an attacker turns it into a bot, which can then carry out malicious actions such as stealing confidential or sensitive data. To raise the survival rate of their botnets, attackers today use domain flux to evade inspection; however, because every connection a malicious program makes to a specific external website is recorded in the proxy server logs, the present invention can still uncover it.

In FIG. 1, attacker 1 uses a malicious program to turn computers A, B, and C in the target enterprise's internal network into bots; the malicious programs on computers A, B, and C connect to relay stations 11, 12, and 13, while computer C also connects to a well-known website during normal use. Likewise, attacker 2 uses a malicious program to turn computers D, E, and H into bots; the malicious programs on computers D, E, and H connect to relay stations 21, 22, and 23, while computer D also connects to a well-known website during normal use. Computers C and D therefore connect both to relay stations and to a well-known website. In addition, the well-known website 30 is contacted by computers C, D, E, F, and G, which run normal applications. Because the computers and relay stations are interconnected in an interleaved way, and the botnets use domain flux to evade ordinary inspection, the method of the present invention is needed to efficiently trace, from the records of bot computers A, B, E, and H in the proxy server's log data, the domains suspected to be botnet relay stations.

FIG. 2 is a flowchart of the method of the present invention for grouping external websites through proxy logs. First, step S201 filters out well-known websites by user-agent degree: using the MapReduce framework, the website analysis module obtains a list of how many different user agents have connected to each external website in the logs, and filtering out the websites whose degree is too large removes the well-known websites first. FIG. 3 shows an example of such a statistical list. Most websites have been contacted by only a single user agent (92.77%), whereas a well-known website such as www.google.com has been contacted by 22,825 different user agents. This observation shows that examining how many different user agents have connected to each external website allows the invention to perform a preliminary screening of well-known websites. In the example of FIG. 1, the well-known website 30 is removed in step S201.
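The FIG. 3 observation (most sites have degree 1, while a few have very large degrees) can be reproduced from a URL-to-degree table with a short summary like the one below. It is a hypothetical helper that assumes the degrees were already computed as the number of distinct user agents per URL; apart from the www.google.com degree quoted in the text, the example values are illustrative only.

```python
from collections import Counter


def degree_summary(degree_by_url, top_n=3):
    """Return (share of URLs with degree 1, top_n URLs by degree) for a URL -> degree table."""
    total = len(degree_by_url)
    counts = Counter(degree_by_url.values())
    share_degree_one = counts[1] / total if total else 0.0
    top = sorted(degree_by_url.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    return share_degree_one, top
```

For a toy table in which three of five URLs have degree 1, one has degree 3, and www.google.com has degree 22825, it returns a share of 0.6 and lists www.google.com first; the long tail of degree-1 sites is what step S201 keeps.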

Continuing with FIG. 2, step S202 obtains the associated sets of external websites by union-find, essentially re-expressing the proxy server log data as sets of external websites. The website analysis module places external websites that share the same Client IP and user-agent connection records into the same set, indicating that these websites are related; this can be implemented with MapReduce's key-value model. FIG. 4 shows an example of converting simple log data into sets of external websites, with Client IP and user agent as the key and the terminal URL (Dest Url) as the value for the union-find. For the example log data, union-find produces two disjoint sets, “CnC1, CnC2, CnC3” and “CnC4, CnC5, CnC6”. How the union-find is carried out is described in detail in a later paragraph.

Next, step S203 traces network connection groups back from the sets. The website analysis module groups the connection records by the mutually disjoint URL sets; the connection records of one group connect only to the domains of the same set, meaning that they represent network behavior produced by programs with the same purpose. In this step, the module obtains the connection topology between Client IPs and external URLs, together with the Client IP connection information in the proxy server logs. As shown in the example of FIG. 5, the website analysis module derives a Group-ID for each of the two disjoint sets “CnC1, CnC2, CnC3” and “CnC4, CnC5, CnC6” by hashing, and then assigns each terminal URL CnC1 to CnC6 the Group-ID of the set it belongs to. Combining this with the CnC1 to CnC6 connection information recorded in the connection records, and using MapReduce's Multiple Input with the URL as the key, yields for each Group-ID the corresponding group of connection records and the group's connection topology graph, shown at the bottom of the figure.
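To draw topology graphs like those in FIG. 5 and FIG. 6, one group's records can be rendered as Graphviz DOT text. This is an illustrative sketch, not part of the described method: `to_dot` is a hypothetical helper, records are assumed to be dicts with `client_ip` and `dest_url` keys, and node names are not escaped, so it assumes they contain no double quotes.

```python
def to_dot(group_id, records):
    """Render one group's Client IP -> destination URL edges as Graphviz DOT text."""
    edges = sorted({(r["client_ip"], r["dest_url"]) for r in records})  # dedupe repeated connections
    lines = [f'digraph "group_{group_id}" {{']
    lines += [f'  "{ip}" -> "{url}";' for ip, url in edges]
    lines.append("}")
    return "\n".join(lines)
```

Repeated connections collapse to one edge, so a domain flux group shows up as one or a few client nodes fanning out to many URL nodes.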

Step S204 then determines whether a group's behavior conforms to the domain flux characteristics. The website analysis module presents each connection group as a network connection topology graph and determines whether the group conforms to the connection characteristics of domain flux, namely that a few Client IPs connect to many external URLs. FIG. 6 shows an example domain flux network behavior topology: within a single set, only a few Client IPs use the same user agent to connect to many different domains. In the figure, Client IP 10.107.56.20 uses a single user agent to connect to domains beginning with sp-install, c-sp-storage, sp-download, sp-alive, sp-setting, sp-storage, Orbtr-install, spms-download, c-api.sec, and so on, which matches the typical domain flux pattern of a few Client IPs using the same user agent to connect to many different domains.

Step S205 then filters out domains with legitimate behavior. After taking the sets that conform to the domain flux connection characteristics, the website analysis module filters out the domains corresponding to normal network applications (for example, antivirus software) or CDNs. FIG. 7 shows an example of antivirus software connecting through a domain flux method; such known, normal application behavior is not the botnet behavior the invention aims to detect, so it is excluded in this step, and only the remaining domains are judged to belong to domain flux botnets. In the example of FIG. 7, this normal application can be filtered out by adding a regular expression for "iavs9x.u.avast.com" to the filter list.
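The filter-list idea can be sketched with Python's `re` module. The allow-list contents are hypothetical apart from the avast host named for FIG. 7, and the exact pattern is an assumption: it is anchored so that it matches that host and its subdomains but not an unrelated name that merely ends with the same characters.

```python
import re

# Hypothetical allow-list of known-good domain flux users; the avast entry follows FIG. 7.
ALLOW_PATTERNS = [re.compile(r"(^|\.)iavs9x\.u\.avast\.com$")]


def drop_known_good(domains, patterns=ALLOW_PATTERNS):
    """Return the domains that match none of the allow-list patterns."""
    return [d for d in domains if not any(p.search(d) for p in patterns)]
```

Under this pattern, `iavs9x.u.avast.com` and `a1.iavs9x.u.avast.com` are removed, while `notiavs9x.u.avast.com` is kept because the match must start at a label boundary.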

FIG. 8 shows an example of proxy server log data. Each line is one log record with fields such as its creation timestamp (Timestamp), client IP (Client IP), terminal URL (Dest Url), terminal port (Dest Port), and user agent (User-agent); it may additionally include bytes sent (Sent Byte), bytes received (Receive Byte), method (Method), path (Path), and so on. The invention mainly uses only the client IP, terminal URL, and user-agent fields to detect domain flux botnets. The node degree determines whether a website is well known: a low degree denotes an external website contacted by only a few user agents, which has a higher probability of being a malicious relay station.
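A parser that keeps only the three fields the method needs might look like the sketch below. The on-disk format is an assumption: tab-separated lines with the columns in the order Timestamp, Client IP, Dest Url, Dest Port, User-agent, optionally followed by further columns such as Sent Byte and Receive Byte. FIG. 8's actual layout may differ, and `ProxyRecord` and `COLUMNS` are hypothetical names.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class ProxyRecord:
    client_ip: str
    dest_url: str
    user_agent: str


# Assumed column order; any trailing columns (Sent Byte, Receive Byte, ...) are ignored.
COLUMNS = ["timestamp", "client_ip", "dest_url", "dest_port", "user_agent"]


def parse_proxy_log(text):
    """Parse tab-separated proxy log lines into records holding Client IP, Dest Url, User-agent."""
    reader = csv.DictReader(
        io.StringIO(text), fieldnames=COLUMNS, delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return [ProxyRecord(row["client_ip"], row["dest_url"], row["user_agent"]) for row in reader]
```

`QUOTE_NONE` keeps any double quotes inside user-agent strings as literal text instead of treating them as CSV quoting.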

In step S202, the website grouping algorithm merges the intersecting sets among the external website sets, producing disjoint sets. The union-find steps are as follows: (1) using the elements of the sets as units, find the intersections between sets and merge intersecting sets into one set (sets whose size exceeds a preset threshold are filtered out in this step); (2) identify the disjoint sets (those that intersect no other set) and set them aside as independent, returning the remaining sets to step (1); (3) repeat steps (1) and (2) until every set has been set aside as independent. Taking FIG. 4 as an example, with the MapReduce framework, using Client IP and user agent as the key and the terminal URL as the value over Item 1 to Item 8 first yields four sets: Items 1 and 2 yield CnC1, CnC2; Items 3 and 4 yield CnC2, CnC3; Items 5 and 6 yield CnC4, CnC5; and Items 7 and 8 yield CnC4, CnC6. These four sets serve as the input to union-find, whose final result is the two disjoint URL sets “CnC1, CnC2, CnC3” and “CnC4, CnC5, CnC6”. Then, as in the example of FIG. 5, step S203 yields the connection topology graphs of the two sets.
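On a single machine, the same FIG. 4 result can also be reached with a textbook disjoint-set (union-find) structure, sketched below as an alternative to the round-based merging just described. One difference is worth stating: here the size threshold is applied only to the final components, whereas the described procedure deletes oversized sets during merging, which can leave some smaller sets unmerged. Class and function names are hypothetical.

```python
class DisjointSet:
    """Textbook union-find with path halving and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        self.size.setdefault(x, 1)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def disjoint_url_sets(seed_sets, max_size):
    """Union all URLs that co-occur in a seed set; return components no larger than max_size."""
    dsu = DisjointSet()
    for s in seed_sets:
        urls = list(s)
        for url in urls:
            dsu.find(url)  # register every URL, including singletons
        for url in urls[1:]:
            dsu.union(urls[0], url)
    components = {}
    for url in dsu.parent:
        components.setdefault(dsu.find(url), set()).add(url)
    return [frozenset(c) for c in components.values() if len(c) <= max_size]
```

Fed the four FIG. 4 seed sets with `max_size=10`, it returns the two components {CnC1, CnC2, CnC3} and {CnC4, CnC5, CnC6}.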

The present invention identifies botnets by the one-to-many relationship in which the same Client IP connects to many different terminal URLs, a connection topology that matches the domain flux connection characteristics. For example, a set may be judged to exhibit domain flux domain-changing behavior when the number of terminal URLs in the set divided by the number of client IPs exceeds a preset threshold.
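Written as a formula, with notation introduced here for illustration ($U(S)$ the distinct terminal URLs of a set $S$, $C(S)$ its distinct client IPs, and $\theta$ the preset threshold), the test reads:

```latex
r(S) = \frac{|U(S)|}{|C(S)|}, \qquad
S \text{ is flagged as domain flux} \iff r(S) > \theta
```

For instance, a set with one client IP and nine terminal URLs (the number of domain prefixes listed for FIG. 6) gives $r(S) = 9/1 = 9$, which exceeds any threshold $\theta < 9$, while three client IPs sharing one CDN host give $r(S) = 1/3$.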

Finally, FIGS. 9 and 10 show an embodiment in which a domain flux botnet is screened out by the method of the present invention. After the process shown in FIG. 2, the resulting group 739663648 matches the characteristics of a domain flux botnet set. The domains in this group were then queried on the VirusTotal website; as shown in FIG. 10, every domain in the group obtained by the invention is registered as a malicious URL, confirming that the method of the present invention for detecting domain flux botnets through proxy server logs is effective.

The above detailed description specifically describes a preferred embodiment of the present invention. This embodiment is not intended to limit the patent scope of the invention; any equivalent implementation or modification that does not depart from the technical spirit of the invention shall fall within the patent scope of this case.

In summary, the present invention is technically innovative and offers several benefits that the prior art lacks. It fully satisfies the statutory requirements of novelty and inventive step for an invention patent, and this application is therefore filed in accordance with the law. The Office is respectfully requested to grant this invention patent application.

Claims

A method for detecting a domain flux botnet through proxy server logs, in which a website analysis module performs the following steps: accessing at least one piece of log data stored by an external proxy server; filtering, according to a node filtering algorithm, the connections represented by the user-agent information in the log data, so as to filter out log content corresponding to connections to ordinary websites; according to a website grouping algorithm, matching, in the connection records of the remaining log content, external websites that share the same Client IP and user-agent connection records, and grouping them to obtain sets of associated external websites in the log content; obtaining the Client IP connection records in the log content and the connection topology between the Client IPs and the external websites they contact; presenting the relationship between the external websites and the associated sets through the connection topology, and determining with a connection feature algorithm whether each set conforms to the connection characteristics of domain flux, where the connection characteristics of domain flux are that Client IPs using the same user agent connect to multiple different domains; and taking the sets that conform to the domain flux connection characteristics, filtering out the domains connected to normal network applications or CDNs, and judging the remaining domains to be domain flux botnets.

The method for detecting a domain flux botnet through proxy server logs as claimed in claim 1, wherein the node filtering algorithm is performed by the website analysis module on long-term log data: with the terminal URL as the key and the user-agent information as the value, the MapReduce framework computes a list of the number of different user agents that have connected to each external website in the log data, this number being the degree of the corresponding node; the size of the degree is used to determine whether the external website represented by the node is a well-known website, and only external websites whose degree is below a preset threshold are retained.

The method for detecting a domain flux botnet through proxy server logs as claimed in claim 1, wherein the website grouping algorithm performs the grouping by union-find: after the website analysis module builds a number of sets from the connection records of the log content, it performs the following steps: taking the individual connection records in each set as elements, judging sets to intersect when their connection records reach external websites with the same Client IP and user agent, and merging the sets whose elements intersect; deleting sets whose number of elements exceeds a preset threshold; judging sets that do not intersect any other set to be independent sets; and repeating the above three steps until all sets are judged to be independent sets.

The method for detecting a domain flux botnet through proxy server logs as claimed in claim 1, wherein the connection feature algorithm computes the ratio of the number of terminal URLs to the number of Client IPs in a set; if the ratio exceeds a preset threshold, the websites of the set conform to the domain-changing behavior characteristic of domain flux connections.