HK40000813B

HK40000813B - System and methods for detecting online fraud

Info

Publication number: HK40000813B
Application number: HK19124260.1A
Authority: HK
Inventors: A-O Damian
Original assignee: Bitdefender IPR Management Ltd.
Priority date: 2016-07-11
Filing date: 2017-07-10
Publication date: 2023-08-11

Description

System and method for detecting online fraud

Background

The present invention relates to computer security systems and methods, and in particular, to systems and methods for detecting online fraud (e.g., fraudulent web pages).

The rapid growth of services such as electronic communication, online commerce, and online banking has been accompanied by an increase in electronic crime. Internet fraud, particularly in the form of phishing and identity theft, poses a growing threat to internet users worldwide. Sensitive identity information and credit card details fraudulently obtained by international criminal networks operating on the internet are used to fund various online transactions and/or are further sold to third parties. Besides the direct financial loss to individuals, internet fraud causes a range of undesirable side effects, such as increased security costs for companies, higher retail prices and banking fees, declining stock values, lower wages, and decreased tax revenue.

In an exemplary phishing attempt, a fake website masquerades as a genuine web page belonging to an online retailer or a financial institution, and invites the user to enter personal information (e.g., user name, password) and/or financial information (e.g., credit card number, account number, security code). Once an unsuspecting user submits the information, the fake website may harvest it. In addition, the user may be redirected to another web page, which may install malware on the user's computer. Malware (e.g., viruses, Trojan horses) may continue to steal personal information by logging the keys that the user presses while visiting certain web pages, and may turn the user's computer into a platform for launching other malicious attacks.

Software running on an internet user's computer system may be used to identify fraudulent online documents and to warn the user about and/or block access to such documents. Several methods for identifying fraudulent web pages have been proposed. Exemplary strategies include matching the address of a web page against a list of known fraudulent and/or trusted addresses (techniques known as blacklisting and whitelisting, respectively). To evade such detection, fraudsters frequently change the addresses of their websites.

There is a continuing effort to develop methods of detecting and preventing online fraud, particularly methods capable of proactive detection.

Disclosure of Invention

According to one aspect, a computer system comprises at least one hardware processor configured to execute a reverse address mapper, a registration data filter connected to the reverse address mapper, and a content analyzer connected to the registration data filter. The reverse address mapper is configured to identify a set of co-hosted internet domains according to a known fraudulent internet domain, wherein the known fraudulent internet domain is located at a target Internet Protocol (IP) address, and wherein identifying the set of co-hosted internet domains comprises selecting the set of co-hosted internet domains so that all members of the set of co-hosted internet domains are located at the target IP address. The registration data filter is configured to filter the set of co-hosted internet domains to produce a subset of fraud candidate domains. Filtering the set of co-hosted internet domains comprises determining whether a selection condition is satisfied according to domain name registration data characterizing a domain of the set of co-hosted internet domains, and in response, when the selection condition is satisfied, selecting the domain into the subset of fraud candidate domains. The content analyzer is configured to analyze an electronic document distributed by a candidate domain selected from the subset of fraud candidate domains to determine whether the electronic document is fraudulent, and in response, when the electronic document is fraudulent, to determine that the candidate domain is fraudulent.

According to another aspect, a method of identifying fraudulent internet domains comprises employing at least one hardware processor to identify a set of co-hosted internet domains according to a known fraudulent internet domain, wherein the known fraudulent internet domain is located at a target Internet Protocol (IP) address, and wherein identifying the set of co-hosted internet domains comprises selecting the set of co-hosted internet domains so that all members of the set of co-hosted internet domains are located at the target IP address. The method further comprises employing the at least one hardware processor to filter the set of co-hosted internet domains to produce a subset of fraud candidate domains. Filtering the set of co-hosted internet domains comprises determining whether a selection condition is satisfied according to domain name registration data characterizing a domain of the set of co-hosted internet domains, and in response, when the selection condition is satisfied, selecting the domain into the subset of fraud candidate domains. The method further comprises employing the at least one hardware processor to analyze an electronic document distributed by a candidate domain selected from the subset of fraud candidate domains to determine whether the electronic document is fraudulent. The method further comprises, in response to analyzing the electronic document, determining that the candidate domain is fraudulent when the electronic document is fraudulent.

According to another aspect, a non-transitory computer-readable medium stores instructions which, when executed by at least one hardware processor, cause the hardware processor to form a reverse address mapper, a registration data filter connected to the reverse address mapper, and a content analyzer connected to the registration data filter. The reverse address mapper is configured to identify a set of co-hosted internet domains according to a known fraudulent internet domain, wherein the known fraudulent internet domain is located at a target Internet Protocol (IP) address, and wherein identifying the set of co-hosted internet domains comprises selecting the set of co-hosted internet domains so that all members of the set of co-hosted internet domains are located at the target IP address. The registration data filter is configured to filter the set of co-hosted internet domains to produce a subset of fraud candidate domains. Filtering the set of co-hosted internet domains comprises determining whether a selection condition is satisfied according to domain name registration data characterizing a domain of the set of co-hosted internet domains, and in response, when the selection condition is satisfied, selecting the domain into the subset of fraud candidate domains. The content analyzer is configured to analyze an electronic document distributed by a candidate domain selected from the subset of fraud candidate domains to determine whether the electronic document is fraudulent, and in response, when the electronic document is fraudulent, to determine that the candidate domain is fraudulent.

Drawings

The foregoing aspects and advantages of the invention will be better understood upon reading the following detailed description and upon reference to the drawings in which:

FIG. 1 illustrates an exemplary set of client systems protected from online fraud according to some embodiments of the present invention.

FIG. 2-A illustrates an exemplary hardware configuration of a client system according to some embodiments of the invention.

FIG. 2-B illustrates an exemplary hardware configuration of a server computer system according to some embodiments of the invention.

FIG. 3 illustrates exemplary software components executing on a client system according to some embodiments of the invention.

FIG. 4 illustrates an exemplary data exchange between a client system and a security server according to some embodiments of the invention.

FIG. 5 illustrates an exemplary sequence of steps performed by a fraud prevention module and a security server to protect a client system from electronic fraud, according to some embodiments of the invention.

FIG. 6 illustrates exemplary components of a fraud identification server according to some embodiments of the invention.

FIG. 7 illustrates an exemplary sequence of steps performed by a fraud identification server according to some embodiments of the present invention.

Detailed Description

In the following description, it is understood that all recited connections between structures can be direct operative connections or indirect operative connections through intermediary structures. A set of elements includes one or more elements. Any recitation of an element is understood to refer to at least one element. A plurality of elements includes at least two elements. Unless otherwise required, any described method steps need not necessarily be performed in a particular illustrated order. A first element (e.g., data) derived from a second element encompasses a first element equal to the second element, as well as a first element generated by processing the second element and optionally other data. Making a determination or decision according to a parameter encompasses making the determination or decision according to the parameter and optionally according to other data. Unless otherwise specified, an indicator of some quantity/data may be the quantity/data itself, or an indicator different from the quantity/data itself. A computer program is a sequence of processor instructions carrying out a task. Computer programs described in some embodiments of the present invention may be stand-alone software entities or sub-entities (e.g., subroutines, libraries) of other computer programs. Unless otherwise specified, computer security encompasses protecting equipment and data against unauthorized access, modification, and/or destruction. Unless otherwise specified, the term online fraud is not limited to fraudulent websites, but also encompasses other illegitimate or unsolicited commercial electronic communications, such as email, instant messages, and telephone text and multimedia messages. An internet domain (or simply, domain) is a subset of computing resources (real or virtual computer systems, network addresses) owned, controlled, or operated by a particular individual or organization.
A fraudulent internet domain is a domain hosting and/or distributing fraudulent electronic documents. A domain name is an alphanumeric alias representing an individual internet domain. A fraudulent domain name is the domain name of a fraudulent domain. Computer-readable media encompass non-transitory media such as magnetic, optical, and semiconductor storage media (e.g., hard disk drives, optical disks, flash memory, DRAM), as well as communication links such as conductive cables and fiber optic links. According to some embodiments, the present invention provides, inter alia, computer systems comprising hardware (e.g., one or more processors) programmed to perform the methods described herein, as well as computer-readable media encoding instructions to perform the methods described herein.

The following description illustrates embodiments of the invention by way of example and not necessarily by way of limitation.

Fig. 1 illustrates an exemplary fraud prevention system according to some embodiments of the invention. The security server 14 and fraud identification server 12 protect the plurality of client systems 10a-d from online fraud. Client systems 10a-d generally represent any electronic device having a processor and memory and capable of connecting to a communication network. Exemplary client devices include, among other things, personal computers, laptop computers, mobile computing devices (e.g., tablet computers), mobile phones, wearable devices (e.g., watches, fitness monitors), game consoles, TVs, and home appliances (e.g., refrigerators, media players). Client systems 10a-d are interconnected via a communication network 13, such as a corporate network or the internet. Portions of network 13 may include a Local Area Network (LAN) and/or a telecommunications network (e.g., a 3G network).

Each server 12, 14 generically represents a set of communicatively coupled computer systems, which may or may not be in physical proximity to each other. In some embodiments, security server 14 is configured to receive a query from a client system, the query indicating an electronic document such as a web page or an electronic message, and to respond with an assessment indicator indicating whether the respective document is likely to be fraudulent. In some embodiments, the likelihood of fraud is determined according to a location indicator of the respective document. Exemplary location indicators include a domain name, a host name, and an Internet Protocol (IP) address of a computer system hosting or distributing the respective electronic document. Domain name is a term commonly used in the art to denote a unique sequence of characters identifying a specific realm of the internet owned and/or controlled by an individual or organization. A domain name constitutes an abstraction (e.g., an alias) of a set of network addresses (e.g., IP addresses) of computers hosting and/or distributing electronic documents. Domain names typically comprise a concatenated sequence of labels delimited by dots, e.g., www.bitdefender.com.

Fraud identification server 12 is configured to gather information about online fraud, including, for instance, a list of location indicators (domain names, IP addresses, etc.) of fraudulent documents. In some embodiments, fraud identification server 12 stores fraud-indicative information in a fraud domain database 15, which may be further used by security server 14 in determining the likelihood that an electronic document is fraudulent. Details of such functionality are given below.

Fig. 2-A illustrates an exemplary hardware configuration of client system 10 (e.g., systems 10a-d in Fig. 1). For simplicity, the illustrated client system is a computer system; the hardware configuration of other client systems, such as mobile telephones and smartwatches, may differ somewhat from the illustrated configuration. Client system 10 comprises a set of physical devices, including a hardware processor 20 and a memory unit 22. Processor 20 comprises a physical device (e.g., a microprocessor, a multi-core integrated circuit formed on a semiconductor substrate, etc.) configured to execute computational and/or logical operations with a set of signals and/or data. In some embodiments, such operations are delivered to processor 20 in the form of a sequence of processor instructions (e.g., machine code or other type of encoding). Memory unit 22 may comprise volatile computer-readable media (e.g., DRAM, SRAM) storing instructions and/or data accessed or generated by processor 20.

Input devices 24 may include computer keyboards, mice, and microphones, among others, including the respective hardware interfaces and/or adapters allowing a user to introduce data and/or instructions into client system 10. Output devices 26 may include display devices (e.g., monitors, liquid crystal displays) and speakers, as well as hardware interfaces/adapters such as graphics cards, allowing client system 10 to communicate data to a user. In some embodiments, input devices 24 and output devices 26 may share a common piece of hardware, as in the case of touch-screen devices. Storage unit 28 comprises computer-readable media enabling the non-volatile storage, reading, and writing of software instructions and/or data. Exemplary storage units 28 include magnetic and optical disks and flash memory devices, as well as removable media such as CD and/or DVD disks and drives. The set of network adapters 32 enables client system 10 to connect to computer networks and/or to other electronic devices. Controller hub 30 represents the plurality of system, peripheral, and/or chipset buses, and/or all other circuitry enabling communication between processor 20 and devices 22, 24, 26, 28, and 32. For instance, controller hub 30 may include a memory controller, an input/output (I/O) controller, and an interrupt controller, among others. In another example, controller hub 30 may comprise a northbridge connecting processor 20 to memory 22 and/or a southbridge connecting processor 20 to devices 24, 26, 28, and 32.

Fig. 2-B illustrates an exemplary hardware configuration of fraud identification server 12, according to some embodiments of the present invention. The security server 14 may have a similar configuration. Fraud identification server 12 includes at least one hardware processor 120 (e.g., microprocessor, multi-core integrated circuit), physical memory 122, server storage 128, and a set of server network adapters 132. Adapter 132 may contain a network card and other communication interfaces that enable fraud identification server 12 to connect to communication network 13. The server storage 128 may store at least a subset of records from the fraud domain database 15. In an alternative embodiment, server 12 may access fraud records from database 15 via network 13. In some embodiments, server 12 further includes input and output devices, which may function similarly to input/output devices 24 and 26, respectively, of client system 10.

Fig. 3 illustrates exemplary software executing on client system 10 according to some embodiments of the invention. An operating system (OS) 34 provides an interface between the hardware of client system 10 and a set of software applications. Exemplary OSs include Windows®, MacOS®, Linux®, iOS®, and Android®, among others. Application 36 generically represents any user application, such as word processing, image processing, spreadsheet, calendar, online gaming, social media, web browser, and electronic communication applications, among others.

Fraud prevention module 38 protects client system 10 against electronic fraud, for instance by preventing client system 10 from accessing fraudulent electronic documents (e.g., fraudulent websites, email messages, etc.). In some embodiments, the operation of fraud prevention module 38 may be turned on and/or off by a user of client system 10. Fraud prevention module 38 may be a stand-alone application, or may form part of a suite of computer programs protecting client system 10 against computer security threats such as malware (malicious code), spyware, and unauthorized intrusion. Module 38 may operate at various processor privilege levels (e.g., user mode, kernel mode). In some embodiments, module 38 is integrated with application 36, for instance as a plug-in, add-on, or toolbar.

In some embodiments, fraud prevention module 38 may include a network filter 39 configured to intercept requests by client system 10 to access remote documents, and to selectively block the respective requests. Exemplary access requests detected by module 38 include Hypertext Transfer Protocol (HTTP) requests issued by client system 10. Network filter 39 may operate, for instance, as a driver registered with OS 34. In embodiments wherein OS 34 and application 36 execute within a virtual machine, fraud prevention module 38 (or at least network filter 39) may execute outside the respective virtual machine, e.g., at the processor privilege level of the hypervisor. Such configurations may effectively protect module 38 and/or network filter 39 from malicious software that may infect the virtual machine. In yet another embodiment, fraud prevention module 38 may operate, at least in part, on an electronic device distinct from client system 10, for instance on a router, proxy server, or gateway device used to connect client system 10 to an extended network such as the internet.

Fig. 4 illustrates the operation of fraud prevention module 38 via an exemplary data exchange between client system 10 and security server 14. Fig. 5 further illustrates an exemplary sequence of steps performed by fraud prevention module 38 and/or security server 14 to protect client system 10 from electronic fraud, according to some embodiments of the present invention. In an illustrative example wherein application 36 comprises a web browser, when a user attempts to access a remote document (e.g., a website), application 36 may send a request to access the respective document to a service provider server over communication network 13. A typical request may include an encoding of the location of the respective resource. Exemplary location encodings include domain names, host names, Uniform Resource Identifiers (URIs), Uniform Resource Locators (URLs), and Internet Protocol (IP) addresses, among others.

Upon detecting an access request (e.g., an HTTP request issued by a web browser), some embodiments of fraud prevention module 38 at least temporarily suspend transmission of the respective request to its intended destination, and instead transmit a document indicator 42 to security server 14. In some embodiments, document indicator 42 includes an encoding of the location of the requested document (e.g., a domain name, URL, or IP address), and may further include other information obtained by fraud prevention module 38 by analyzing the intercepted access request. Such information may include an indicator of the type of the requested document, an indicator of the requesting application, and an identifier of the requesting user, among others. In response to receiving document indicator 42, in a sequence of steps 208-210 (Fig. 5), some embodiments of security server 14 formulate an assessment indicator 44 indicating whether the requested document is likely to be fraudulent, and transmit indicator 44 to client system 10. In some embodiments, the likelihood of fraud is quantified as a Boolean value (e.g., 0/1, yes/no), or as a number between a lower and an upper bound (e.g., between 0 and 100).
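As a minimal sketch of how a document indicator might be derived from an intercepted request, the Python fragment below extracts the host name from a requested URL using the standard `urllib.parse` module. The `DocumentIndicator` fields and the `build_document_indicator` helper are hypothetical names chosen for illustration; the source does not specify a data format for indicator 42.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class DocumentIndicator:
    """Hypothetical layout of document indicator 42 (field names are illustrative)."""
    domain: str                            # host name taken from the requested URL
    url: str                               # full location of the requested document
    requesting_app: Optional[str] = None   # e.g. the browser that issued the request
    user_id: Optional[str] = None          # identifier of the requesting user


def build_document_indicator(url: str, app: Optional[str] = None,
                             user: Optional[str] = None) -> DocumentIndicator:
    """Derive a document indicator from the URL of an intercepted access request."""
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError(f"no host in URL: {url!r}")
    # urlsplit().hostname is already lower-cased and has the port removed;
    # strip a trailing root dot so 'example.com.' and 'example.com' compare equal.
    domain = parts.hostname.rstrip(".")
    return DocumentIndicator(domain=domain, url=url, requesting_app=app, user_id=user)


if __name__ == "__main__":
    indicator = build_document_indicator("https://WWW.Example.com:8443/login?x=1", app="browser")
    print(indicator.domain)  # www.example.com
```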

In some embodiments, in a step 212, fraud prevention module 38 determines according to assessment indicator 44 whether the requested document is likely to be fraudulent. When it is not, a step 214 allows client system 10 (e.g., application 36) to access the respective document, for instance by transmitting the original access request to its intended destination. When it is, a step 216 may block access to the respective document. Some embodiments may further display a notification (e.g., a warning screen, icon, explanation, etc.) to the user and/or may notify a system administrator of client system 10.
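The decision of steps 212-216 can be sketched as follows, assuming the assessment indicator arrives either as a Boolean or as a score between 0 and 100. The threshold of 50 and the function names are hypothetical; the source leaves the interpretation of numeric scores open.

```python
from typing import Union

# Hypothetical cut-off for numeric scores in [0, 100]; the source does not fix one.
SCORE_THRESHOLD = 50


def is_likely_fraudulent(assessment: Union[bool, int, float],
                         threshold: float = SCORE_THRESHOLD) -> bool:
    """Interpret assessment indicator 44, given as a Boolean or a score from 0 to 100."""
    # bool is a subclass of int in Python, so test for it first.
    if isinstance(assessment, bool):
        return assessment
    if not 0 <= assessment <= 100:
        raise ValueError("score must lie between 0 and 100")
    return assessment >= threshold


def handle_request(assessment: Union[bool, int, float]) -> str:
    """Steps 212-216: block likely fraudulent documents, otherwise release the request."""
    if is_likely_fraudulent(assessment):
        return "block"   # step 216 (a real module might also notify the user)
    return "allow"       # step 214: forward the suspended request to its destination
```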

In an alternative embodiment, fraud prevention module 38, executing on client system 10 or on a router connecting client system 10 to the internet, may redirect all requests to access remote documents to security server 14 for analysis. Security server 14 may thus act as a proxy server between client system 10 and the remote servers providing access to the respective resources. In such embodiments, steps 212-214-216 may be carried out by security server 14.

In an exemplary embodiment protecting a user of client system 10 against fraudulent electronic messages (e.g., email), fraud prevention module 38 may be installed as a plug-in or add-on to a message reader application. Upon receiving a message, module 38 may parse the header of the respective message to extract a document indicator comprising, for instance, the electronic address of the sender of the respective message and/or the domain name of the email server that delivered the respective message. Module 38 may then transmit document indicator 42 to security server 14 and, in response, receive assessment indicator 44 from server 14. Fraud prevention module 38 may determine according to indicator 44 whether the respective message is likely to be fraudulent and, if so, prevent the content of the respective message from being displayed to the user. In some embodiments, module 38 may move messages deemed likely to be fraudulent into a separate message folder.
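A minimal sketch, using only Python's standard `email` package, of extracting a sender-based document indicator from a raw message. It assumes the indicator is the domain of the `From` address; the function name is hypothetical, and extracting the delivering mail server from `Received` headers, which the text also mentions, is omitted here because those headers vary widely in format.

```python
from email import message_from_string
from email.utils import parseaddr
from typing import Optional


def sender_indicator(raw_message: str) -> Optional[str]:
    """Return the lower-cased domain of the sender address in the From header, if any."""
    msg = message_from_string(raw_message)
    # parseaddr splits 'Display Name <user@host>' into ('Display Name', 'user@host').
    _, address = parseaddr(msg.get("From", ""))
    if "@" not in address:
        return None
    return address.rsplit("@", 1)[1].lower()
```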

In an alternative embodiment, fraud prevention module 38 may execute on a server computer system (e.g., an email server) managing electronic messaging on behalf of multiple client systems (e.g., client systems 10a-d in Fig. 1). When a message is determined to be likely fraudulent, module 38 may block distribution of the respective message to its intended recipient.

To determine the likelihood of fraud, security server 14 may query fraud domain database 15 (step 208 in Fig. 5). In some embodiments, database 15 comprises a set of records, each record corresponding to a fraudulent domain name; such sets of records are sometimes known in the art as blacklists. In some embodiments, step 208 comprises determining whether the domain name indicated by document indicator 42 matches any of the blacklisted records of database 15. If so, security server 14 may determine that the requested document is likely to be fraudulent.
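Step 208 reduces to a membership test against the blacklist. The sketch below uses an in-memory set as a stand-in for fraud domain database 15; the class name is hypothetical, and names are normalized so that letter case and a trailing root dot do not defeat the match.

```python
from typing import Iterable


def normalize_domain(name: str) -> str:
    """Canonical form for comparison: trimmed, lower case, no trailing root dot."""
    return name.strip().lower().rstrip(".")


class FraudDomainDatabase:
    """Hypothetical in-memory stand-in for fraud domain database 15 (a blacklist)."""

    def __init__(self, domains: Iterable[str] = ()) -> None:
        self._records = {normalize_domain(d) for d in domains}

    def add(self, domain: str) -> None:
        """Record a newly identified fraudulent domain name."""
        self._records.add(normalize_domain(domain))

    def is_blacklisted(self, domain: str) -> bool:
        """Step 208: exact match of the queried domain name against the records."""
        return normalize_domain(domain) in self._records
```

This sketch performs exact matching only, as the text describes; whether a subdomain such as `www.phish.example` should inherit the verdict of its parent domain is a separate design decision.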

Fraud domain database 15 may be populated and maintained by fraud identification server 12. In some embodiments, server 12 identifies a set of previously unknown fraudulent domains based on knowledge derived from analyzing known fraudulent internet domains, herein referred to as seed domains. The domain names of newly discovered fraudulent domains may then be added to database 15. Fig. 6 illustrates exemplary components of fraud identification server 12, according to some embodiments of the present invention. Server 12 may include a reverse address mapper 52, a registration data filter 54 connected to reverse address mapper 52, and a content analyzer 56 connected to filter 54. Fig. 7 illustrates an exemplary sequence of steps performed by fraud identification server 12 to discover fraudulent internet domains according to some embodiments of the present invention.

Some embodiments of the present invention rely on the observation that physical computing resources belonging to one fraudulent domain typically also belong to other fraudulent domains. For instance, the same server and/or IP address may host multiple fraudulent websites. Such servers or network addresses may be owned by fraudsters, or may be hijacked without the knowledge of their legitimate owners/operators, for instance using sophisticated malware. The following description shows how knowledge of one fraudulent domain may be used to uncover other, previously unknown fraudulent domains.

In some embodiments, reverse address mapper 52 is configured to receive an indicator of a seed domain (e.g., seed domain name 62 in Fig. 6) and to output a set of co-hosted domains 64 (step 234 in Fig. 7). Seed domains represent known fraudulent domains, i.e., domains known to host or distribute fraudulent documents. Examples of such domains include domains hosting fake banking websites, fake online betting sites, and fake lending sites, among others. Seed domain names may, for instance, be detected by researchers at a computer security company, or may be reported by internet users or by authorities investigating online fraud. Seed domain names may also be discovered automatically using tools known in the art, such as honeypots.

In some embodiments, co-hosted domains 64 comprise a set of domains sharing a common network address (e.g., a common public IP address) with the seed domain. An exemplary set of co-hosted domains 64 uses the same physical server to distribute electronic documents. Since a single network/IP address may correspond to multiple distinct computer systems, co-hosted domains 64 need not reside on the same physical machine as the seed domain. However, the Domain Name System maps seed domain name 62 and the domain names of all co-hosted domains 64 to the same network address. To identify co-hosted domains 64, fraud identification server 12 may use any method known in the art of computer networking. Such operations are commonly known as reverse IP analysis, reverse Domain Name System (DNS) lookup, or reverse DNS resolution. In one exemplary approach, server 12 uses a name server to perform forward DNS lookups (i.e., to determine an IP address from a domain name), and constructs a reverse DNS map from the results. Another approach may look up pointer (PTR) DNS records for a particular domain, such as in-addr.
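The reverse-mapping approach based on forward lookups can be sketched as the inversion of a domain-to-address table. In this minimal example the forward results are supplied as a dictionary; in live use they could come from, e.g., `socket.gethostbyname_ex()`, whose third return value is the list of IPv4 addresses of a name. The function names and sample addresses are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_reverse_map(forward: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Invert forward DNS results (domain -> IP addresses) into IP -> domains."""
    reverse: Dict[str, Set[str]] = defaultdict(set)
    for domain, addresses in forward.items():
        for ip in addresses:
            reverse[ip].add(domain)
    return dict(reverse)


def co_hosted_domains(seed: str, target_ip: str,
                      reverse: Dict[str, Set[str]]) -> Set[str]:
    """Step 234: every domain resolving to the seed's target IP address, excluding the seed."""
    return reverse.get(target_ip, set()) - {seed}
```

Usage, with addresses from the documentation ranges:

```python
forward = {
    "seed.example": ["203.0.113.7"],
    "a.example": ["203.0.113.7"],
    "b.example": ["198.51.100.2"],
    "c.example": ["203.0.113.7", "198.51.100.2"],
}
print(sorted(co_hosted_domains("seed.example", "203.0.113.7", build_reverse_map(forward))))
# ['a.example', 'c.example']
```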

Not all co-hosted domains 64 are necessarily fraudulent. As shown above, computer systems belonging to a legitimate domain are sometimes hijacked by fraudsters, who then use the hijacked machine to host a set of fraudulent domains. Such fraudulent domains are sometimes hosted on a given machine only for a short time, before being moved to another server to avoid detection or countermeasures. In some embodiments, registration data filter 54 of fraud identification server 12 is configured to filter the set of co-hosted domains 64 to select a set of fraud candidate domains 66 (step 236 in Fig. 7), representing domains suspected of being fraudulent. Fraud candidate domains 66 may then be subjected to further scrutiny, as shown below.

Step 236 may be seen as an optimization, since the fraud analysis described below may be computationally expensive. Pre-filtering the set of co-hosted domains 64 may reduce the computational burden by using relatively inexpensive rules to select a subset of candidate domains for fraud analysis. Some embodiments of registration data filter 54 select fraud candidate domains 66 according to the domain name registration record of each co-hosted domain. Registration records are generated and/or maintained by a domain registration authority (e.g., an internet registrar). For each registered domain name, an exemplary registration record may include contact data (e.g., name, address, telephone number, email address, etc.) of the registrant, owner, or administrator of the respective domain name, as well as automatically generated data such as the registrant's ID and various timestamps indicating when the respective domain name was registered, when the respective registration record was last modified, when the respective registration expires, etc.
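Registration records are commonly retrieved over WHOIS, a simple text protocol (RFC 3912) in which the client sends the query followed by CR LF on TCP port 43 and reads the reply until the server closes the connection. The sketch below pairs such a query with a naive parser for `Key: value` lines. Both helpers are hypothetical, real WHOIS output differs between registries and registrars, and the choice of WHOIS server is left to the caller.

```python
import socket
from typing import Dict, List


def whois_query(domain: str, server: str, timeout: float = 10.0) -> str:
    """Plain WHOIS (RFC 3912): send the query plus CR LF on TCP port 43, read until close.

    Assumes an ASCII (e.g. punycode) domain name.
    """
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks: List[bytes] = []
        while True:
            data = sock.recv(4096)
            if not data:          # server closed the connection: reply complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_registration_record(text: str) -> Dict[str, str]:
    """Collect 'Key: value' lines into a dict (keys lower-cased, first occurrence wins)."""
    record: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")   # split at the first colon only
        if not sep or not value.strip():
            continue                           # skip comments, banners, empty fields
        record.setdefault(key.strip().lower(), value.strip())
    return record
```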

Some domain name registration data is public and may be queried via specific computer instructions and/or protocols (e.g., WHOIS). In some embodiments, registration data filter 54 obtains domain name registration data associated with co-hosted domains 64 from a domain registration database 17, for instance using the WHOIS protocol. Filter 54 may then search the registration data of each co-hosted domain for fraud-indicative patterns, to determine whether the respective domain is likely to be fraudulent. Some embodiments rely on the observation that registrations of fraudulent domain names tend to be clustered in time (bursts of domain name registrations); such embodiments may compare the registration timestamp of seed domain name 62 with the registration timestamp of each co-hosted domain 64, and select the respective co-hosted domain into the set of fraud candidate domains 66 according to a result of the comparison (e.g., when the two registrations are very close in time).
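The registration-burst criterion can be sketched as a time-window comparison. The three-day window and the function names are hypothetical (the text only says the registrations are very close in time), and the sketch assumes every timestamp is in ISO 8601 form with an explicit UTC marker, such as `2016-07-01T10:00:00Z`.

```python
from datetime import datetime, timedelta
from typing import Dict, Set

# Hypothetical window; the source does not quantify "very close in time".
BURST_WINDOW = timedelta(days=3)


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2016-07-01T10:00:00Z' into an aware datetime."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def select_by_registration_burst(seed_created: str,
                                 candidates: Dict[str, str],
                                 window: timedelta = BURST_WINDOW) -> Set[str]:
    """Keep co-hosted domains registered within `window` of the seed domain's registration."""
    seed_time = parse_timestamp(seed_created)
    return {
        domain
        for domain, created in candidates.items()
        if abs(parse_timestamp(created) - seed_time) <= window
    }
```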

Another exemplary fraud-indicative feature is the registrant (e.g., owner, administrator, etc.) of a domain name. Some embodiments of filter 54 may attempt to match the registrant's contact details against a list of known names, telephone numbers, addresses, email addresses, etc., harvested from the domain name registration data of known fraudulent domains, such as seed domain name 62. A match may indicate that the respective co-hosted domain is likely to be fraudulent, thus justifying its inclusion in the set of fraud candidate domains 66.
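A minimal sketch of registrant matching: contact details from the registration records of known fraudulent domains are normalized into a set, and a co-hosted domain matches if any of its contact details appears there. The field names (`name`, `email`, `phone`) and the normalization rules are assumptions for illustration.

```python
import re
from typing import Dict, Iterable, Iterator, Set, Tuple

Contact = Tuple[str, str]   # (field name, normalized value)


def normalize_contact(field: str, value: str) -> str:
    """Normalize a contact value so trivially different spellings compare equal."""
    value = value.strip().lower()
    if field == "phone":
        return re.sub(r"\D", "", value)     # keep digits only
    return re.sub(r"\s+", " ", value)       # collapse runs of whitespace


def contact_pairs(record: Dict[str, str]) -> Iterator[Contact]:
    """Yield the non-empty (field, normalized value) pairs of a registration record."""
    for field, value in record.items():
        normalized = normalize_contact(field, value or "")
        if normalized:
            yield field, normalized


def build_known_contacts(records: Iterable[Dict[str, str]]) -> Set[Contact]:
    """Collect contact details from the registration records of known fraudulent domains."""
    return {pair for record in records for pair in contact_pairs(record)}


def matches_known_registrant(record: Dict[str, str], known: Set[Contact]) -> bool:
    """True if any contact detail of `record` was also used for a known fraudulent domain."""
    return any(pair in known for pair in contact_pairs(record))
```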

In some embodiments, filter 54 may search the registrant's telephone number for certain fraud-indicative features. In one example, certain area or country codes may be deemed fraud-indicative. In another example, certain digit combinations within a telephone number correspond to automatic call-forwarding services; the respective telephone number may appear to be a legitimate number, but calling it causes the call to be forwarded to another number, possibly in another country. Such call-forwarding patterns may be deemed fraud-indicative. Some embodiments of registration data filter 54 may perform a reverse phone number lookup and compare the result of the lookup with other domain registration data, such as an address or a name. Any mismatch may be considered fraud-indicative and may lead to the inclusion of the respective co-hosted domain in the set of fraud candidates.
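Two of the telephone heuristics above can be sketched as follows: a prefix check against caller-supplied, fraud-indicative country or area codes, and a consistency check between a reverse phone lookup and the registered address. The reverse lookup itself relies on an external service and is represented only by its result (`lookup_country`); all names are hypothetical, and call-forwarding digit patterns are not modeled.

```python
import re
from typing import Iterable, Optional


def normalize_phone(number: str) -> str:
    """Reduce a number such as '+1.555.0100' to '+15550100' (digits with a leading '+')."""
    digits = re.sub(r"\D", "", number)
    return "+" + digits if digits else ""


def phone_is_fraud_indicative(number: str,
                              flagged_prefixes: Iterable[str],
                              lookup_country: Optional[str] = None,
                              registrant_country: Optional[str] = None) -> bool:
    """Return True if the registrant's phone number looks fraud-indicative.

    flagged_prefixes: country/area code prefixes deemed suspicious (caller-supplied).
    lookup_country: country reported by a reverse phone lookup (external, assumed).
    registrant_country: country stated in the domain registration record.
    """
    phone = normalize_phone(number)
    # Ignore prefixes that normalize to nothing, or every number would match.
    prefixes = [p for p in (normalize_phone(q) for q in flagged_prefixes) if p]
    # Rule 1: the number starts with a flagged country or area code.
    if phone and any(phone.startswith(p) for p in prefixes):
        return True
    # Rule 2: the reverse lookup disagrees with the registration data.
    if lookup_country and registrant_country:
        return lookup_country.strip().upper() != registrant_country.strip().upper()
    return False
```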

Yet another exemplary criterion for selecting a domain into the set of fraud candidate domains 66 is the registrant's email address. Some embodiments of filter 54 may attempt to match individual email addresses against a blacklist of email addresses collected from known fraudulent documents (e.g., web pages, email messages). The blacklist may also include email addresses collected from domain registration data of known rogue domains. Some embodiments of filter 54 may look for patterns in the registrant's email address, such as an apparently random sequence of characters, an abnormally long address, etc. Such patterns may indicate that the respective address was automatically generated, which may be indicative of fraud. In some embodiments, filter 54 may determine whether to include the co-hosted domain in the fraud candidate set according to the provider of the email address, e.g., according to whether the respective provider allows anonymous email accounts, whether the respective email address is provided free of charge, etc. Some embodiments may identify email servers that handle emails addressed to and/or originating from individual email addresses, and determine whether to include co-hosted domains in the fraud candidate set according to the identity of such servers.
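
A sketch of the email heuristics follows. It uses Shannon entropy of the local part as one possible proxy for "an apparently random sequence of characters", which is an assumption rather than the method of the description; the blacklist, provider lists and thresholds are hypothetical placeholders. The mail-server check is omitted because it needs a DNS MX query, which the Python standard library does not provide.

```python
import math
from collections import Counter
from typing import List

# Hypothetical example lists and thresholds, for illustration only.
EMAIL_BLACKLIST = {"fraudster@mail.example"}
FREE_PROVIDERS = {"freemail.example"}
ANONYMOUS_PROVIDERS = {"anon.example"}
MAX_LOCAL_PART_LENGTH = 30
MIN_LENGTH_FOR_ENTROPY = 8
ENTROPY_THRESHOLD = 3.5  # bits per character


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution of `text`, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def email_flags(email: str) -> List[str]:
    """Return fraud-indicative flags for a registrant email address."""
    email = email.strip().lower()
    local_part, _, provider = email.rpartition("@")
    flags = []
    if email in EMAIL_BLACKLIST:
        flags.append("blacklisted")
    if len(local_part) > MAX_LOCAL_PART_LENGTH:
        flags.append("abnormally_long")
    # Long, high-entropy local parts look machine-generated rather than chosen by a person.
    if (len(local_part) >= MIN_LENGTH_FOR_ENTROPY
            and shannon_entropy(local_part) > ENTROPY_THRESHOLD):
        flags.append("looks_random")
    if provider in ANONYMOUS_PROVIDERS:
        flags.append("anonymous_provider")
    if provider in FREE_PROVIDERS:
        flags.append("free_provider")
    return flags


print(email_flags("x7qk2vz9pw4m@anon.example"))    # ['looks_random', 'anonymous_provider']
print(email_flags("John.Smith@freemail.example"))  # ['free_provider']
```

Entropy alone is a coarse signal: a short random string cannot exceed log2 of its length, hence the minimum-length guard.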

In response to selecting fraud candidate domains 66, in some embodiments, content analyzer 56 performs content analysis to determine whether any of the domains in the fraud candidate set are actually fraudulent (step 238 in fig. 7). The content analysis may include accessing fraud candidate domains and analyzing the content of electronic documents hosted or distributed by the respective domains. When the content analysis determines that an electronic document is fraudulent, step 240 may determine that the respective fraud candidate domain is indeed fraudulent and may save the newly identified fraudulent domain name to fraud domain database 15.

Exemplary content analysis of hypertext markup language (HTML) documents includes, among others, determining whether the respective document comprises a user authentication (login) page. Such a determination may include determining whether the respective web page includes a form field and/or any of a plurality of user authentication keywords (e.g., "user name", "password", a financial institution name and/or acronym).
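
The login-page test can be sketched with Python's standard `html.parser` module. This is a minimal illustration under stated assumptions: the keyword list is hypothetical, financial institution names are supplied by the caller through `extra_keywords`, and a page is flagged only when it has both a form (or password field) and at least one keyword.

```python
from html.parser import HTMLParser
from typing import Iterable

# Hypothetical default keyword list; institution names are passed in separately.
AUTH_KEYWORDS = ("user name", "username", "password", "log in", "login", "sign in")


class LoginPageDetector(HTMLParser):
    """Records whether a document contains a form or a password field, plus its text."""

    def __init__(self) -> None:
        super().__init__()
        self.has_form = False
        self.has_password_input = False
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.has_form = True
        elif tag == "input":
            # Attribute values may be None (e.g. a bare attribute), hence the `or ""`.
            if (dict(attrs).get("type") or "").lower() == "password":
                self.has_password_input = True

    def handle_data(self, data):
        self.text_parts.append(data)


def looks_like_login_page(html: str, extra_keywords: Iterable[str] = ()) -> bool:
    detector = LoginPageDetector()
    detector.feed(html)
    detector.close()
    text = " ".join(detector.text_parts).lower()
    keywords = AUTH_KEYWORDS + tuple(k.lower() for k in extra_keywords)
    has_keyword = any(k in text for k in keywords)
    return (detector.has_form or detector.has_password_input) and has_keyword


login_html = ('<form action="/auth"><label>User name</label><input type="text" name="u">'
              '<label>Password</label><input type="password" name="p"/></form>')
print(looks_like_login_page(login_html))                   # True
print(looks_like_login_page("<p>Welcome to our blog</p>"))  # False
```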

The content analysis may further include comparing the respective HTML document to known sets of fraudulent and/or legitimate documents. When the document is sufficiently similar to a known fraudulent document, some embodiments determine that the respective document is fraudulent. These methods rely on the observation that fraudsters often reuse successful document templates; therefore, there are typically several fraudulent documents that share approximately the same design and/or layout.
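
One way to detect a reused template is to compare the documents' tag structure while ignoring their text. The sketch below, which is an assumed technique rather than the one claimed, turns each document into a set of consecutive tag triples ("shingles") and scores pairs with Jaccard similarity; the shingle size and the 0.8 threshold are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, Optional, Set, Tuple


class TagSequence(HTMLParser):
    """Collects the sequence of opening tags, ignoring text and attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_shingles(html: str, k: int = 3) -> Set[Tuple[str, ...]]:
    parser = TagSequence()
    parser.feed(html)
    parser.close()
    tags = parser.tags
    return {tuple(tags[i:i + k]) for i in range(len(tags) - k + 1)}


def jaccard(a: set, b: set) -> float:
    # Two structureless documents are treated as dissimilar, to avoid spurious matches.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar_fraud_template(html: str, known_fraudulent: Dict[str, str],
                                threshold: float = 0.8) -> Tuple[Optional[str], float]:
    """Return (template id, score) of the closest known fraudulent document,
    or (None, score) when no template reaches `threshold`."""
    target = tag_shingles(html)
    best_id, best_score = None, 0.0
    for doc_id, reference in known_fraudulent.items():
        score = jaccard(target, tag_shingles(reference))
        if score > best_score:
            best_id, best_score = doc_id, score
    return (best_id, best_score) if best_score >= threshold else (None, best_score)


template = "<html><body><form><input><input><button>Go</button></form></body></html>"
clone = "<html><body><form><input><input><button>Continue</button></form></body></html>"
print(most_similar_fraud_template(clone, {"kit-1": template}))  # ('kit-1', 1.0)
```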

A document may also be fraudulent when it is sufficiently similar to a particular legitimate document. In one such instance, a web page may attempt to fool the user by masquerading as the legitimate web page of a financial institution (e.g., bank, insurance company, etc.). Therefore, some embodiments of content analyzer 56 use content analysis to determine whether an HTML document located at a fraud candidate domain is an illegitimate clone of a legitimate web page. Such a determination may include analyzing a set of graphical elements (e.g., images, logos, color schemes, fonts, font styles, font sizes, etc.) of the document under review and comparing these elements to graphical elements collected from a set of legitimate web pages.
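
The clone check can be sketched as follows, assuming that image file names, hex colors and `font-family` values are adequate stand-ins for the graphical elements listed above. The feature extraction, the 0.6 overlap threshold and the rule that a page on the legitimate site's own domain is never a clone are illustrative assumptions.

```python
import posixpath
import re
from html.parser import HTMLParser
from typing import Set, Tuple
from urllib.parse import urlparse

COLOR_RE = re.compile(r"#[0-9a-fA-F]{6}\b")
FONT_RE = re.compile(r"font-family\s*:\s*([^;}]+)", re.IGNORECASE)


class VisualFeatures(HTMLParser):
    """Collects (kind, value) pairs for images, colors and fonts."""

    def __init__(self) -> None:
        super().__init__()
        self.features: Set[Tuple[str, str]] = set()

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "img" and attr_map.get("src"):
            # Compare image file names, since a clone often re-hosts the same logo.
            name = posixpath.basename(urlparse(attr_map["src"]).path)
            self.features.add(("image", name.lower()))
        self._add_css(attr_map.get("style") or "")

    def handle_data(self, data):
        # Text inside <style> blocks also arrives here.
        self._add_css(data)

    def _add_css(self, css: str) -> None:
        for color in COLOR_RE.findall(css):
            self.features.add(("color", color.lower()))
        for font in FONT_RE.findall(css):
            self.features.add(("font", font.strip().lower()))


def visual_features(html: str) -> Set[Tuple[str, str]]:
    parser = VisualFeatures()
    parser.feed(html)
    parser.close()
    return parser.features


def is_suspected_clone(candidate_html: str, candidate_domain: str,
                       legit_html: str, legit_domain: str, threshold: float = 0.6) -> bool:
    """Flag a page that reuses most of a legitimate page's visual features elsewhere."""
    if candidate_domain == legit_domain:
        return False
    legit = visual_features(legit_html)
    if not legit:
        return False
    overlap = len(visual_features(candidate_html) & legit) / len(legit)
    return overlap >= threshold


legit = ('<img src="https://bank.example/static/logo.png">'
         '<div style="color:#003366;font-family:Arial">Welcome</div>')
clone = '<img src="/img/logo.png"><p style="font-family:Arial; color:#003366">Sign in</p>'
print(is_suspected_clone(clone, "bank-secure.example", legit, "bank.example"))  # True
```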

The content analysis may further include analyzing text portions of the respective electronic documents. Such text analysis may include searching for certain keywords, calculating the frequency of occurrence of certain terms and/or sequences of terms, determining the relative position of certain terms with respect to other terms, and the like. Some embodiments determine an inter-document distance indicating a degree of similarity between the target document and a reference document (either fraudulent or legitimate) and determine whether the target document is legitimate according to the calculated distance.
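
The description does not fix a particular inter-document distance. As one common choice, assumed here for illustration, the sketch below uses cosine distance between term-frequency vectors and reports the nearest labeled reference document; the crude tokenizer and the reference texts are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Iterable, Tuple

TOKEN_RE = re.compile(r"[a-z0-9]+")


def term_frequencies(text: str) -> Counter:
    return Counter(TOKEN_RE.findall(text.lower()))


def cosine_distance(a: str, b: str) -> float:
    """1 - cosine similarity of term-frequency vectors (0 = same term profile)."""
    ta, tb = term_frequencies(a), term_frequencies(b)
    dot = sum(ta[t] * tb[t] for t in ta.keys() & tb.keys())
    norm = (math.sqrt(sum(v * v for v in ta.values()))
            * math.sqrt(sum(v * v for v in tb.values())))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def nearest_reference(target: str, references: Iterable[Tuple[str, str]]) -> Tuple[str, float]:
    """Return (label, distance) of the reference document closest to `target`.

    `references` holds (label, text) pairs, e.g. ("fraudulent", ...) or ("legitimate", ...).
    """
    return min(((label, cosine_distance(target, text)) for label, text in references),
               key=lambda pair: pair[1])


refs = [("fraudulent", "please verify your account password now"),
        ("legitimate", "quarterly earnings report for shareholders")]
print(nearest_reference("verify your password", refs))
```

Nearness to a legitimate reference is not automatically reassuring: as noted for cloned pages, a close match to a legitimate document hosted on an unrelated domain may itself indicate fraud.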

Another example of text-based content analysis includes identifying and extracting contact information (e.g., address, contact phone number, contact email address, etc.) from an electronic document, such as an HTML document or an email message. Content analyzer 56 may then attempt to match the respective contact data against a blacklist of similar data extracted from known fraudulent documents. For example, when a web page lists a contact phone number that also appears on a fraudulent website, some embodiments may infer that the web page is also fraudulent. Other embodiments look for fraud-indicative patterns in contact data, such as phone numbers with certain country and/or area codes, or digit patterns indicating call redirection services (see above for the analysis of domain registration data).
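
A minimal sketch of contact extraction and blacklist matching. The regular expressions are deliberately simple assumptions: the phone pattern will also match some dates and other digit runs, so a practical extractor would need stricter rules. Phone numbers are reduced to digits before comparison so that differently formatted copies of the same number still match.

```python
import re
from typing import List, Set, Tuple

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Loose pattern: optional '+', then digits possibly separated by spaces, dots, dashes, parentheses.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def normalize_phone(raw: str) -> str:
    """Keep digits only, so '+1 (555) 010-0199' and '1.555.010.0199' compare equal."""
    return re.sub(r"\D", "", raw)


def extract_contacts(text: str) -> Tuple[Set[str], Set[str]]:
    emails = {e.lower() for e in EMAIL_RE.findall(text)}
    phones = {normalize_phone(p) for p in PHONE_RE.findall(text)}
    return emails, phones


def blacklisted_contacts(text: str, email_blacklist: Set[str],
                         phone_blacklist: Set[str]) -> List[str]:
    """Return extracted contacts that also appear in known fraudulent documents."""
    emails, phones = extract_contacts(text)
    return sorted(emails & email_blacklist) + sorted(phones & phone_blacklist)


page = "Call us at +1 (555) 010-0199 or write to Support@Bank-Help.example."
print(blacklisted_contacts(page, {"support@bank-help.example"}, {"15550100199"}))
# ['support@bank-help.example', '15550100199']
```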

Another set of exemplary content analysis methods identifies segments of code (e.g., tracking code) placed within an electronic document. Web analytics services use instances of such code to calculate and report various data related to web page usage: number of visits, referrers, country of origin of visitors, etc. Such code typically includes a unique client ID (e.g., a tracking ID) that allows the respective analytics service to associate the respective electronic document with a particular client. Some embodiments of content analyzer 56 may identify tracking IDs and attempt to match the respective IDs against a blacklist of such IDs collected from known fraudulent documents. A match may indicate that the document currently being analyzed is also fraudulent.
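
Tracking-ID matching can be sketched with a regular expression. Each real analytics service defines its own ID syntax, so the `TRK-<digits>-<digits>` format below is purely hypothetical and would have to be replaced by the actual pattern of each service of interest.

```python
import re
from typing import List, Set

# Hypothetical tracking-ID format "TRK-<account>-<property>", for illustration only.
TRACKING_ID_RE = re.compile(r"\bTRK-\d{4,10}-\d{1,4}\b")


def extract_tracking_ids(html: str) -> Set[str]:
    """Return all tracking IDs embedded anywhere in the document, including scripts."""
    return set(TRACKING_ID_RE.findall(html))


def blacklisted_tracking_ids(html: str, blacklist: Set[str]) -> List[str]:
    """Return tracking IDs shared with known fraudulent documents."""
    return sorted(extract_tracking_ids(html) & blacklist)


page = "<script>track('TRK-123456-2');</script><script>track('TRK-999999-1');</script>"
print(blacklisted_tracking_ids(page, {"TRK-999999-1"}))  # ['TRK-999999-1']
```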

The exemplary systems and methods described above allow automatic detection of internet fraud, such as fraudulent web pages and electronic messages. Some embodiments automatically identify rogue internet domain names, i.e., domains that host or distribute fraudulent documents, and prevent users from accessing the respective rogue domain names. Alternative embodiments display an alert and/or notify a system administrator when an attempt is made to access a known rogue domain name.

Some embodiments automatically discover a previously unknown set of rogue domain names based on knowledge derived from analyzing known rogue domain names. Such automatic detection enables a quick response to emerging fraud attempts and may even prevent fraud proactively, by detecting domain names that have been registered but not yet used to perform fraudulent activity.

Some embodiments select a fraud candidate domain from a set of domains hosted on the same machine as the known fraud domain. The candidate set may be further refined based on domain registration data. Content analysis may then be used to identify truly rogue domains within the candidate set.
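
The three-stage flow (co-hosted domains, registration-data filter, content analysis) can be outlined as below. This is a sketch, not the claimed implementation: every stage is injected as a callable with a hypothetical name, because resolving a domain's IP address (e.g., with `socket.gethostbyname`), finding co-hosted domains (typically via passive DNS or reverse-IP data, which the standard library does not offer), and fetching documents all depend on external services.

```python
from typing import Callable, Iterable, List, Set


def detect_new_fraud_domains(
    seed_domain: str,
    resolve_ip: Callable[[str], str],                       # hypothetical DNS resolution step
    reverse_ip_lookup: Callable[[str], Iterable[str]],      # hypothetical co-hosting data source
    registration_filter: Callable[[str, str], bool],        # (seed, candidate) -> select?
    document_is_fraudulent: Callable[[str], bool],          # content analysis of a domain's documents
    known_fraud_domains: Set[str],
) -> List[str]:
    """Return newly confirmed fraudulent domains co-hosted with `seed_domain`."""
    ip = resolve_ip(seed_domain)
    # Stage 1: domains located at the same IP address as the known rogue domain.
    co_hosted = [d for d in reverse_ip_lookup(ip)
                 if d != seed_domain and d not in known_fraud_domains]
    # Stage 2: cheap registration-data checks narrow the set to fraud candidates.
    candidates = [d for d in co_hosted if registration_filter(seed_domain, d)]
    # Stage 3: the more expensive content analysis runs on candidates only.
    confirmed = [d for d in candidates if document_is_fraudulent(d)]
    known_fraud_domains.update(confirmed)  # e.g. persist to the fraud domain database
    return confirmed


known = {"seed.example"}
result = detect_new_fraud_domains(
    "seed.example",
    resolve_ip=lambda d: "203.0.113.7",
    reverse_ip_lookup=lambda ip: ["seed.example", "a.example", "b.example", "c.example"],
    registration_filter=lambda seed, d: d in {"a.example", "c.example"},
    document_is_fraudulent=lambda d: d == "a.example",
    known_fraud_domains=known,
)
print(result)  # ['a.example']
```

Ordering the stages from cheapest to most expensive is the main design point: content analysis, which requires fetching and parsing documents, only runs on domains that survived the registration-data filter.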

It will be clear to a person skilled in the art that the above-described embodiments may be varied in many ways without departing from the scope of the invention. Accordingly, the scope of the invention should be determined by the appended claims and their legal equivalents.

Claims

1. A computer system comprising at least one hardware processor configured to operate a reverse address mapper, a registration data filter connected to the reverse address mapper, and a content analyzer connected to the registration data filter, wherein:

the reverse address mapper is configured to identify a set of co-hosted internet domains from a known rogue internet domain, wherein the known rogue internet domain is located at a target internet protocol, IP, address, and wherein identifying the set of co-hosted internet domains comprises selecting the set of co-hosted internet domains such that all members of the set of co-hosted internet domains are located at the target IP address;

the registration data filter is configured to:

receive the set of co-hosted internet domains from the reverse address mapper; and

filter the received set of co-hosted internet domains to generate a subset of fraud candidate domains, wherein filtering the set of co-hosted internet domains comprises:

determining whether a selection condition is met according to domain name registration data characterizing a domain of the set of co-hosted internet domains, and

in response, selecting the domain into the subset of fraud candidate domains when the selection condition is met; and

the content analyzer is configured to:

receive the subset of fraud candidate domains from the registration data filter,

analyze an electronic document distributed by a candidate domain selected from the received subset of fraud candidate domains to determine whether the electronic document is fraudulent, and

in response, determine that the candidate domain is fraudulent when the electronic document is fraudulent.

2. The computer system of claim 1, wherein determining whether the selection condition is met comprises comparing the domain name registration data characterizing the domain with domain name registration data characterizing the known rogue internet domain.

3. The computer system of claim 2, wherein determining whether the selection condition is met comprises comparing a registration timestamp of the domain with a registration timestamp of the known rogue internet domain.

4. The computer system of claim 1, wherein the domain name registration data characterizing the domain comprises an email address, and wherein the registration data filter is configured to determine whether the selection condition is met based on the email address.

5. The computer system of claim 4, wherein the registration data filter is configured to determine whether the selection condition is met according to a length of the email address.

6. The computer system of claim 4, wherein the registration data filter is configured to determine whether the selection condition is met based on an identity of a mail server that processes emails sent to the email address.

7. The computer system of claim 4, wherein the registration data filter is configured to determine whether the selection condition is met based on whether a provider of the email address allows an anonymous email account.

8. The computer system of claim 4, wherein the registration data filter is configured to determine whether the selection condition is met based on a likelihood that the email address was automatically generated.

9. The computer system of claim 1, wherein the domain name registration data characterizing the domain comprises a telephone number, and wherein the registration data filter is configured to determine whether the selection condition is met based on the telephone number.

10. The computer system of claim 9, wherein determining whether the selection condition is met comprises performing a reverse phone number lookup to determine an entity owning the phone number, and wherein the registration data filter is configured to determine whether the selection condition is met based on a result of the reverse phone number lookup.

11. A method of identifying fraudulent internet domains, the method comprising:

identifying, using at least one hardware processor, a set of co-hosted internet domains from a known rogue internet domain, wherein the known rogue internet domain is located at a target internet protocol, IP, address, and wherein identifying the set of co-hosted internet domains comprises selecting the set of co-hosted internet domains such that all members of the set of co-hosted internet domains are located at the target IP address;

in response to identifying the set of co-hosted internet domains, filtering the set of co-hosted internet domains using the at least one hardware processor to generate a subset of fraud candidate domains, wherein filtering the set of co-hosted internet domains comprises:

determining whether a selection condition is met according to domain name registration data characterizing a domain of the set of co-hosted internet domains, and

in response, selecting the domain into the subset of fraud candidate domains when the selection condition is met;

in response to generating the subset of fraud candidate domains, analyzing, using the at least one hardware processor, an electronic document distributed by a candidate domain selected from the subset of fraud candidate domains to determine whether the electronic document is fraudulent; and

in response to analyzing the electronic document, determining that the candidate domain is fraudulent when the electronic document is fraudulent.

12. The method of claim 11, wherein determining whether the selection condition is met comprises comparing the domain name registration data characterizing the domain with domain name registration data characterizing the known rogue internet domain.

13. The method of claim 12, wherein determining whether the selection condition is met comprises comparing a registration timestamp of the domain with a registration timestamp of the known rogue internet domain.

14. The method of claim 11, wherein the domain name registration data characterizing the domain includes an email address, and wherein the method includes determining whether the selection condition is met based on the email address.

15. The method of claim 14, comprising determining whether the selection condition is met according to a length of the email address.

16. The method of claim 14, comprising determining whether the selection condition is met based on an identity of a mail server that processes emails sent to the email address.

17. The method of claim 14, comprising determining whether the selection condition is met based on whether a provider of the email address allows an anonymous email account.

18. The method of claim 14, comprising determining whether the selection condition is met based on a likelihood that the email address was automatically generated.

19. The method of claim 11, wherein the domain name registration data characterizing the domain comprises a telephone number, and wherein the method comprises determining whether the selection condition is met based on the telephone number.

20. The method of claim 19, wherein determining whether the selection condition is met comprises performing a reverse phone number lookup to determine an entity owning the phone number, and wherein the method comprises determining whether the selection condition is met based on a result of the reverse phone number lookup.

21. A non-transitory computer-readable medium storing instructions that, when executed by at least one hardware processor, cause the hardware processor to form a reverse address mapper, a registration data filter connected to the reverse address mapper, and a content analyzer connected to the registration data filter, wherein:

the registration data filter is configured to:

the content analyzer is configured to: