CN114662034A - User marking method, user marking system, electronic device and storage medium - Google Patents
- Publication number: CN114662034A (application CN202210393449.3A)
- Authority: CN (China)
- Prior art keywords: target, information, user, data, label
- Legal status: Pending (Google notes that the legal status is an assumption and not a legal conclusion; it has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G: Physics
- G06: Computing or calculating; counting
- G06F: Electric digital data processing
- G06F16/00: Information retrieval; database structures therefor; file system structures therefor
- G06F16/90: Details of database functions independent of the retrieved data types
- G06F16/95: Retrieval from the web
- G06F16/955: Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
- G06F16/958: Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
Abstract
Description
Technical Field
The present application relates to the field of computer technology, and in particular to a user marking method, a user marking system, an electronic device, and a storage medium.
Background
A cloud protection system can provide security protection for users' website access. When a user accesses websites through the cloud protection system, the user can be labeled so that differentiated, fine-grained protection services can be provided. At present, users who access websites through a cloud protection system are labeled in only a limited number of ways, so the resulting labels tend to be one-sided and inaccurate.
Summary of the Invention
Embodiments of the present application provide a user marking method, a user marking system, an electronic device, and a storage medium that can mark users who access websites through a cloud protection system completely and accurately.
A first aspect of the embodiments of the present application provides a user marking method, which may include:
obtaining user data of a target user from a user information system, and determining a first label set according to the user data, where the first label set includes at least one first label corresponding to the user data;
obtaining network access log data corresponding to the target user from a cloud protection system;
analyzing the network access log data based on preset rules, and determining a second label set according to the analysis result, where the second label set includes at least one second label corresponding to the analysis result; and
determining, according to the first label set and the second label set, a target label set corresponding to the target user, where the labels in the target label set are used to mark the target user.
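The four steps above can be read as a small pipeline with two independent label sources. The sketch below is illustrative only: it assumes the target label set is simply the union of the two sets (the text says only that it is determined "according to" both), and `mark_user` and the labeler callables are hypothetical placeholders for the per-source analysis described in the optional steps that follow.

```python
from typing import Callable, Iterable, Set

# Hypothetical labeler signature: turns one data source into a set of label strings.
Labeler = Callable[[Iterable[str]], Set[str]]


def mark_user(
    user_data: Iterable[str],
    access_logs: Iterable[str],
    first_labeler: Labeler,
    second_labeler: Labeler,
) -> Set[str]:
    """Return a target label set for one user (sketch, not the patented method)."""
    # First label set: derived from the user information system's profile data.
    first_labels = first_labeler(user_data)
    # Second label set: derived from the cloud protection system's access logs.
    second_labels = second_labeler(access_logs)
    # Assumption: the target label set is the union of both sets.
    return first_labels | second_labels


if __name__ == "__main__":
    labels = mark_user(
        ["location: Beijing"],
        ["GET /index.html"],
        first_labeler=lambda data: {"region:Beijing"},
        second_labeler=lambda logs: {"url:static-page"},
    )
    print(sorted(labels))  # ['region:Beijing', 'url:static-page']
```

Keeping the two labelers as injected callables mirrors the separation between the first and second data processing modules, so either source can be swapped out without touching the combination step.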
Optionally, determining the first label set according to the user data includes: extracting information from the user data using named entity recognition (NER) to obtain entity information corresponding to the target user, and determining the first label set according to the entity information. The entity information may include regional information, in which case determining the first label set includes determining the regional information as a first label in the first label set; and/or the entity information may include institution name information, in which case determining the first label set includes obtaining the industry type corresponding to the institution name information and determining that industry type as a first label in the first label set.
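A production system would run a trained NER model at this step. As a hedged stand-in that needs no model, the sketch below uses a tiny gazetteer for regions and a regular expression for institution names over English text; `REGIONS`, `ORG_PATTERN`, and the `region:` label prefix are illustrative assumptions, not part of the application.

```python
import re
from typing import Dict, List, Set

# Hypothetical region gazetteer, standing in for location entities an NER model would tag.
REGIONS = {"Beijing", "Shanghai", "Guangdong", "Zhejiang"}

# Hypothetical institution pattern: capitalised words ending in a common organisation suffix.
ORG_PATTERN = re.compile(
    r"\b((?:[A-Z][\w&]*\s)+(?:University|Bank|Hospital|Group|Company))\b"
)


def extract_entities(text: str) -> Dict[str, List[str]]:
    """Pull regional and institution-name entities out of free-text user data."""
    regions = [r for r in sorted(REGIONS) if re.search(rf"\b{re.escape(r)}\b", text)]
    organizations = [m.group(1) for m in ORG_PATTERN.finditer(text)]
    return {"region": regions, "organization": organizations}


def region_labels(entities: Dict[str, List[str]]) -> Set[str]:
    """Each regional entity becomes a first label directly."""
    return {f"region:{r}" for r in entities["region"]}
```

For example, `extract_entities("Located in Beijing; works at Tsinghua University.")` yields the region `Beijing` and the institution `Tsinghua University`; the institution name would then go through the industry mapping rather than becoming a label itself.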
Optionally, obtaining the industry type corresponding to the institution name information and determining it as a first label in the first label set includes: determining, based on a preset industry division rule, a target industry type corresponding to the institution name information, where the preset industry division rule contains the industry types corresponding to a plurality of institution names; and determining the target industry type as a first label in the first label set.
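The preset industry division rule could be as simple as an ordered keyword table. In the sketch below, `INDUSTRY_RULES` and its keyword/industry pairs are hypothetical; rules are checked in order, so more specific keywords (such as "Hospital" in "Peking University Hospital") should be listed first.

```python
from typing import Iterable, List, Optional, Set, Tuple

# Hypothetical preset industry division rule: (keyword in institution name, industry type).
INDUSTRY_RULES: List[Tuple[str, str]] = [
    ("Hospital", "healthcare"),
    ("University", "education"),
    ("School", "education"),
    ("Bank", "finance"),
    ("Securities", "finance"),
]


def industry_type(institution_name: str) -> Optional[str]:
    """Return the first industry whose keyword occurs in the name, or None."""
    name = institution_name.lower()
    for keyword, industry in INDUSTRY_RULES:
        if keyword.lower() in name:
            return industry
    return None


def industry_labels(institution_names: Iterable[str]) -> Set[str]:
    """Map each recognised institution to an industry first label."""
    labels = set()
    for name in institution_names:
        industry = industry_type(name)
        if industry is not None:  # unmatched institutions produce no label
            labels.add(f"industry:{industry}")
    return labels
```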
Optionally, analyzing the network access log data based on preset rules and determining the second label set according to the analysis result includes: analyzing the network access log data to obtain analysis data, where the analysis data includes at least one of the website address corresponding to the target user, the network resource information corresponding to the request uniform resource locator (URL), the target URL, and historical access summary information of the website; and determining, based on preset rules, the data type corresponding to the analysis data, and determining the second label set according to that data type.
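As a hedged illustration of the first half of this step, the sketch below turns already-parsed log records of the form (timestamp, host, path) into three of the four kinds of analysis data listed above. The record shape, the `http://` scheme used to rebuild URLs, and the hour-of-day granularity of the access summary are assumptions; the target URL is not modelled because the text does not say how it differs from the request URL.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime
from typing import Iterable, List, Set, Tuple


@dataclass
class AnalysisData:
    # Website addresses (hosts) the target user visited.
    website_addresses: Set[str] = field(default_factory=set)
    # Request URLs, rebuilt from host and path (assumed http scheme).
    request_urls: List[str] = field(default_factory=list)
    # Historical access summary: (host, hour of day) -> number of requests.
    access_summary: Counter = field(default_factory=Counter)


def build_analysis_data(records: Iterable[Tuple[datetime, str, str]]) -> AnalysisData:
    """Aggregate one user's parsed log records into analysis data."""
    data = AnalysisData()
    for ts, host, path in records:
        data.website_addresses.add(host)
        data.request_urls.append(f"http://{host}{path}")
        data.access_summary[(host, ts.hour)] += 1
    return data
```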
Optionally, when the analysis data includes the website address corresponding to the target user, determining the data type based on preset rules and determining the second label set according to the data type includes: parsing the website address to determine the root domain name corresponding to it; determining, based on a preset first classification rule that contains a plurality of domain name types, the target domain name type corresponding to the root domain name; and determining the target domain name type as a second label in the second label set.
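A minimal sketch of root-domain extraction followed by a rule lookup is shown below. It assumes a hand-picked set of multi-label suffixes (a real system would consult the Public Suffix List) and a hypothetical `DOMAIN_TYPE_RULES` table standing in for the "first classification rule".

```python
from typing import Optional
from urllib.parse import urlsplit

# Small hand-picked set of multi-label suffixes; a real system would use the Public Suffix List.
MULTI_LABEL_SUFFIXES = {"com.cn", "net.cn", "org.cn", "gov.cn", "edu.cn"}

# Hypothetical first classification rule: public suffix of the root domain -> domain name type.
DOMAIN_TYPE_RULES = {
    "gov.cn": "government",
    "edu.cn": "education",
    "edu": "education",
    "com.cn": "commercial",
    "com": "commercial",
    "org": "organization",
}


def root_domain(address: str) -> str:
    """Reduce a URL or bare host (port allowed) to its registrable root domain."""
    host = urlsplit(address if "://" in address else f"//{address}").hostname or ""
    labels = host.rstrip(".").split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def domain_type(address: str) -> Optional[str]:
    """Classify the root domain by its suffix; None if no rule matches."""
    root = root_domain(address)
    suffix = root.split(".", 1)[1] if "." in root else root
    return DOMAIN_TYPE_RULES.get(suffix)
```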
Optionally, when the analysis data includes the network resource information corresponding to the request URL, determining the data type based on preset rules and determining the second label set according to the data type includes: obtaining the web service corresponding to the resource type to which the network resource information belongs, together with first feature information of that web service; matching the first feature information against the preset feature information of a plurality of web services; if the match succeeds, determining the target resource type corresponding to first target feature information, where the first target feature information is the feature information, among that of the plurality of web services, that matches the first feature information; and determining the target resource type as a second label in the second label set.
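The text leaves "web service" and "first feature information" abstract. The sketch below makes one possible reading concrete: features are taken from the request URL's path segments and file extension, and each preset web service is a hypothetical signature, a set of features that must all be present. The signature table and resource-type names are illustrative assumptions.

```python
from pathlib import PurePosixPath
from typing import List, Optional, Set, Tuple
from urllib.parse import urlsplit

# Hypothetical preset web-service signatures: (resource type, features that must all be present).
SERVICE_SIGNATURES: List[Tuple[str, Set[str]]] = [
    ("wordpress-cms", {"segment:wp-content"}),
    ("wordpress-cms", {"segment:wp-admin"}),
    ("php-application", {"ext:.php"}),
    ("java-web-application", {"ext:.jsp"}),
    ("aspnet-application", {"ext:.aspx"}),
    ("static-resource", {"ext:.css"}),
    ("static-resource", {"ext:.js"}),
]


def url_features(request_url: str) -> Set[str]:
    """Derive first feature information from a request URL's path."""
    path = PurePosixPath(urlsplit(request_url).path)
    features = {f"segment:{part}" for part in path.parts if part != "/"}
    if path.suffix:
        features.add(f"ext:{path.suffix.lower()}")
    return features


def resource_type(request_url: str) -> Optional[str]:
    """Return the first signature whose features are all observed, else None."""
    features = url_features(request_url)
    for rtype, required in SERVICE_SIGNATURES:
        if required <= features:  # subset test: every signature feature is present
            return rtype
    return None
```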
Optionally, when the analysis data includes the target URL, determining the data type based on preset rules and determining the second label set according to the data type includes: obtaining second feature information of the target URL; matching the second feature information against the preset feature information of a plurality of URLs; if the match succeeds, determining the URL type corresponding to second target feature information, where the second target feature information is the feature information, among that of the plurality of URLs, that matches the second feature information; and determining the URL type as a second label in the second label set.
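One way to realize "preset feature information of a plurality of URLs" is an ordered list of regular expressions, each tied to a URL type. The patterns and type names below are hypothetical examples; the first pattern that matches anywhere in the target URL decides the label.

```python
import re
from typing import List, Optional, Pattern, Tuple

# Hypothetical preset URL feature patterns, checked in order.
URL_TYPE_PATTERNS: List[Tuple[Pattern[str], str]] = [
    (re.compile(r"/(login|signin|auth)(/|\?|$)", re.IGNORECASE), "login-page"),
    (re.compile(r"/(admin|manage|console)(/|\?|$)", re.IGNORECASE), "admin-console"),
    (re.compile(r"/api(/|\?|$)", re.IGNORECASE), "api-endpoint"),
    (re.compile(r"\.(zip|rar|exe|apk|tar\.gz)(\?|$)", re.IGNORECASE), "file-download"),
]


def url_type(target_url: str) -> Optional[str]:
    """Return the URL type of the first matching pattern, or None if nothing matches."""
    for pattern, utype in URL_TYPE_PATTERNS:
        if pattern.search(target_url):
            return utype
    return None
```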
Optionally, when the analysis data includes the historical access summary information of the website, determining the data type based on preset rules and determining the second label set according to the data type includes: obtaining, from the historical access summary information, first access summary information for a preset time period, where the first access summary information includes the number of times the target user accessed the website; clustering the access counts at each moment in the preset time period to obtain an access curve for that period, which reflects how many times the target user accessed the website at each moment; matching third feature information of the access curve against the feature information corresponding to each of a plurality of preset access curves; if the match succeeds, determining the target label corresponding to third target feature information, where the third target feature information is the feature information, among that of the preset access curves, that matches the third feature information; and determining the target label as a second label in the second label set.
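The sketch below reads "clustering the access counts at each moment" as grouping visits into 24 hourly bins, and "matching feature information" as cosine similarity against preset curves with a 0.9 threshold. Both readings, the template shapes in `PRESET_CURVES`, and the label names are assumptions made for illustration.

```python
import math
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Sequence


def access_curve(visit_times: Iterable[datetime], start: datetime, end: datetime) -> List[int]:
    """Bin visits inside [start, end) into 24 hourly counts."""
    curve = [0] * 24
    for t in visit_times:
        if start <= t < end:
            curve[t.hour] += 1
    return curve


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


# Hypothetical preset access curves (24 hourly weights each).
PRESET_CURVES: Dict[str, List[int]] = {
    "office-hours-visitor": [0] * 9 + [1] * 9 + [0] * 6,
    "night-time-visitor": [1] * 6 + [0] * 18,
    "round-the-clock-visitor": [1] * 24,
}


def match_curve(curve: Sequence[int], threshold: float = 0.9) -> Optional[str]:
    """Return the most similar preset curve's label if it clears the threshold."""
    scores = {label: cosine(curve, template) for label, template in PRESET_CURVES.items()}
    label, score = max(scores.items(), key=lambda kv: kv[1])
    return label if score >= threshold else None
```

Cosine similarity compares the shape of the curve rather than its volume, so a light user and a heavy user who both browse only during working hours receive the same label.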
A second aspect of the embodiments of the present application provides a user marking system, which may include:
a first data acquisition module, configured to obtain user data of a target user from a user information system;
a first data processing module, configured to determine a first label set according to the user data, where the first label set includes at least one first label corresponding to the user data;
a second data acquisition module, configured to obtain network access log data corresponding to the target user from a cloud protection system;
a second data processing module, configured to analyze the network access log data based on preset rules and determine a second label set according to the analysis result, where the second label set includes at least one second label corresponding to the analysis result; and
a user marking module, configured to determine, according to the first label set and the second label set, a target label set corresponding to the target user, where the labels in the target label set are used to mark the target user.
Optionally, the first data processing module is specifically configured to extract information from the user data using named entity recognition (NER) to obtain entity information corresponding to the target user, and to determine the first label set according to the entity information. If the entity information includes regional information, the module determines the regional information as a first label in the first label set; and/or, if the entity information includes institution name information, the module obtains the industry type corresponding to the institution name information and determines that industry type as a first label in the first label set.
Optionally, the first data processing module is specifically configured to determine, based on a preset industry division rule that contains the industry types corresponding to a plurality of institution names, the target industry type corresponding to the institution name information, and to determine the target industry type as a first label in the first label set.
Optionally, the second data processing module is specifically configured to analyze the network access log data to obtain analysis data, where the analysis data includes at least one of the website address corresponding to the target user, the network resource information corresponding to the request uniform resource locator (URL), the target URL, and historical access summary information of the website; and to determine, based on preset rules, the data type corresponding to the analysis data and determine the second label set according to that data type.
Optionally, when the analysis data includes the website address corresponding to the target user, the second data processing module is specifically configured to parse the website address to determine the corresponding root domain name; determine, based on a preset first classification rule that contains a plurality of domain name types, the target domain name type corresponding to the root domain name; and determine the target domain name type as a second label in the second label set.
Optionally, when the analysis data includes the network resource information corresponding to the request URL, the second data processing module is specifically configured to obtain the web service corresponding to the resource type to which the network resource information belongs, together with first feature information of that web service; match the first feature information against the preset feature information of a plurality of web services; if the match succeeds, determine the target resource type corresponding to first target feature information, where the first target feature information is the feature information, among that of the plurality of web services, that matches the first feature information; and determine the target resource type as a second label in the second label set.
Optionally, when the analysis data includes the target URL, the second data processing module is specifically configured to obtain second feature information of the target URL; match the second feature information against the preset feature information of a plurality of URLs; if the match succeeds, determine the URL type corresponding to second target feature information, where the second target feature information is the feature information, among that of the plurality of URLs, that matches the second feature information; and determine the URL type as a second label in the second label set.
Optionally, when the analysis data includes the historical access summary information of the website, the second data processing module is specifically configured to obtain, from the historical access summary information, first access summary information for a preset time period, including the number of times the target user accessed the website; cluster the access counts at each moment in the preset time period to obtain an access curve for that period, which reflects how many times the target user accessed the website at each moment; match third feature information of the access curve against the feature information corresponding to each of a plurality of preset access curves; if the match succeeds, determine the target label corresponding to third target feature information, where the third target feature information is the feature information, among that of the preset access curves, that matches the third feature information; and determine the target label as a second label in the second label set.
A third aspect of the embodiments of the present application provides an electronic device, which may include:
a memory storing executable program code; and
a processor coupled to the memory;
where the processor invokes the executable program code stored in the memory, and the executable program code, when executed by the processor, causes the processor to implement the method according to the first aspect of the embodiments of the present application.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing executable program code which, when executed by a processor, implements the method according to the first aspect of the embodiments of the present application.
A fifth aspect of the embodiments of the present application discloses a computer program product which, when run on a computer, causes the computer to perform any of the methods disclosed in the first aspect of the embodiments of the present application.
A sixth aspect of the embodiments of the present application discloses an application publishing platform for publishing a computer program product, where the computer program product, when run on a computer, causes the computer to perform any of the methods disclosed in the first aspect of the embodiments of the present application.
As can be seen from the above technical solutions, the embodiments of the present application have the following advantages:
In the embodiments of the present application, the electronic device obtains user data of a target user from a user information system and determines a first label set according to the user data, where the first label set includes at least one first label corresponding to the user data; the first labels in this set can be used to mark the actual circumstances of the target user in the user information system. The electronic device obtains network access log data corresponding to the target user from a cloud protection system, analyzes the log data based on preset rules, and determines a second label set according to the analysis result, where the second label set includes at least one second label corresponding to the analysis result; the second labels can be used to mark the target user's online activity records in the cloud protection system. Finally, the electronic device determines the target label set corresponding to the target user according to the first label set and the second label set. Because the target user is marked using both the user data and the network access log data, the resulting labels are more diverse and complete, and users who access websites through the cloud protection system can be marked completely and accurately.
Description of Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the accompanying drawings needed for describing the embodiments and the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and other drawings may also be derived from them.
FIG. 1 is a schematic diagram of a scenario of the user marking method in an embodiment of the present application;
FIG. 2 is a schematic diagram of an embodiment of the user marking method in an embodiment of the present application;
FIG. 3 is a schematic diagram of another embodiment of the user marking method in an embodiment of the present application;
FIG. 4 is a schematic diagram of an embodiment of the user marking system in an embodiment of the present application;
FIG. 5 is a schematic diagram of an embodiment of the electronic device in an embodiment of the present application.
Detailed Description
Embodiments of the present application provide a user marking method, a user marking system, an electronic device, and a storage medium that can mark users who access websites through a cloud protection system completely and accurately.
To help those skilled in the art better understand the solutions of the present application, the technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All embodiments derived from the embodiments in this application shall fall within the scope of protection of this application.
It should be noted that the execution subject of the embodiments of the present application may be the user marking system or an electronic device, and the electronic device may include, but is not limited to, a server. The technical solution of the present application is further described below, taking the user marking system as an example.
In some embodiments, FIG. 1 is a schematic diagram of a scenario of the user marking method in an embodiment of the present application. In FIG. 1, when a user terminal 101 accesses websites through the cloud protection system of a server 102, user data of at least one user and network access log data of at least one user are generated; the user information system of the server 102 may store the user data, and the cloud protection system may store the network access log data.
The server 102 may obtain the user data of the target user from the user information system and determine the first label set according to that user data; it may obtain the network access log data of the target user from the cloud protection system and determine the second label set according to that log data; finally, the server 102 determines the target label set corresponding to the target user according to the first label set and the second label set.
FIG. 2 is a schematic diagram of an embodiment of the user marking method in an embodiment of the present application. The method may include:
201. Obtain user data of the target user from the user information system.
In some embodiments, the user information system may store the user data of at least one user whose user terminal accesses websites through the cloud protection system. Optionally, each user may have one or more items of user data; when a first user has multiple items of user data, the items are all different from one another, where the first user is any one of the users above.
Optionally, the user data of the first user may include, but is not limited to, at least one of the following: the first user's login account, login password, user name, location information, and work information.
In some embodiments, the target user is any one of the at least one user; that is, each of these users may serve as the target user. In other words, the electronic device can retrieve the target user's user data from the user data of the at least one user stored in the user information system, so that it can subsequently process that user data more accurately.
202. Determine the first label set according to the user data.
The first label set includes at least one first label corresponding to the user data.
In some embodiments, because the target user has at least one item of user data and each item may correspond to one first label, the electronic device can determine, from the at least one item of user data, the first label corresponding to each item, and then determine the first label set corresponding to the target user from these first labels.
Each first label in the first label set can be used to mark the actual circumstances of the target user in the user information system.
In some embodiments, determining the first label set according to the user data may include: the electronic device first analyzes the user data to obtain entity information, and then uses the entity information to determine the first label set.
Optionally, the entity information may include regional information and/or institution name information.
The electronic device may analyze the target user's location information to determine the corresponding regional information, and/or analyze the target user's work information to determine the corresponding institution name information. It then accurately determines the first label corresponding to the regional information and/or the institution name information, and determines the first label set corresponding to the target user from those first labels.
203. Obtain the network access log data corresponding to the target user from the cloud protection system.
In some embodiments, when the cloud protection system detects that a user terminal is using it, the cloud protection system may record the request data generated when the user terminal accesses at least one website through it, thereby obtaining the network access log data corresponding to the user of that user terminal.
The network access log data refers to the Hypertext Transfer Protocol (HTTP) logs generated when a user terminal accesses at least one website through the cloud protection system.
Optionally, the request data may include, but is not limited to, at least one of the following: the request time, the request method, and the request Uniform Resource Locator (URL).
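The application does not specify the log format. Assuming the cloud protection system writes lines in the Common Log Format (which nginx's default `combined` format extends), the request time, method, and URL could be extracted as sketched below; the regular expression and the `RequestRecord` type are illustrative, not taken from the application.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Assumed Common Log Format line, e.g.
# 203.0.113.7 - - [01/Apr/2022:09:15:02 +0800] "GET /index.html HTTP/1.1" 200 512
LOG_PATTERN = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+) [^"]*" (?P<status>\d{3})'
)


class RequestRecord(NamedTuple):
    client: str
    time: datetime  # request time
    method: str     # request method
    url: str        # request URL (path and query as logged)
    status: int


def parse_log_line(line: str) -> Optional[RequestRecord]:
    """Parse one access-log line; return None if it does not fit the assumed format."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    return RequestRecord(
        client=m["client"],
        time=datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"),
        method=m["method"],
        url=m["url"],
        status=int(m["status"]),
    )
```

Because the pattern is anchored only at the start of the line, lines in the longer combined format (with referrer and user-agent fields appended) also parse.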
Optionally, one or more user terminals may use the cloud protection system; that is, when the cloud protection system detects that at least one user terminal is using it, it may store the network access log data corresponding to the user of each such terminal. When multiple user terminals use the cloud protection system, the network access log data stored for their users may be the same or different; this is not specifically limited here.
In other words, the electronic device can retrieve, within the cloud protection system, the network access log data corresponding to the target user from the log data of the users of the at least one user terminal, so that it can subsequently process the target user's log data more accurately.
204、基于预设规则对网络访问日志数据进行分析,并根据分析结果确定第二标签集合。204. Analyze the network access log data based on a preset rule, and determine a second tag set according to the analysis result.
其中,该第二标签集合包括至少一个与该分析结果对应的第二标签。Wherein, the second label set includes at least one second label corresponding to the analysis result.
在一些实施例中,电子设备基于预设规则,对获取的网络访问日志数据进行分析,得到至少一个分析结果;该电子设备根据该至少一个分析结果,可确定与该至少一个分析结果分别对应的至少一个第二标签;然后,该电子设备根据该至少一个第二标签,确定与该目标用户对应的第二标签集合。In some embodiments, the electronic device analyzes the acquired network access log data based on a preset rule to obtain at least one analysis result; the electronic device may determine the corresponding at least one analysis result according to the at least one analysis result. at least one second tag; then, the electronic device determines a second tag set corresponding to the target user according to the at least one second tag.
其中,这个第二标签集合下的每个第二标签都可用于标记该目标用户在该云防护系统中的上网记录信息。Wherein, each second label under the second label set can be used to mark the Internet record information of the target user in the cloud protection system.
在一些实施例中,电子设备对获取的网络访问日志数据进行分析,可得到与该网络访问日志数据对应的至少一个分析数据,然后,该电子设备再基于该预设规则,确定该至少一个分析数据分别对应的至少一个数据类型,最后,该电子设备根据该至少一个数据类型确定与该目标用户对应的第二标签集合。In some embodiments, the electronic device analyzes the acquired network access log data to obtain at least one analysis data corresponding to the network access log data, and then the electronic device determines the at least one analysis data based on the preset rule At least one data type corresponding to the data, and finally, the electronic device determines a second tag set corresponding to the target user according to the at least one data type.
其中,该预设规则指的是电子设备中预置的对网络访问日志数据进行分析的规则。The preset rule refers to a rule preset in the electronic device for analyzing network access log data.
Optionally, the analysis data may include, but is not limited to, at least one of the following: the website address corresponding to the target user, the web resource information corresponding to the request URL, the target URL, and the historical access summary information of the website.
The website address corresponding to the target user refers to the URL path information of the websites that the target user has visited in the cloud protection system through the user terminal.
The web resource information corresponding to the request URL refers to the data resource corresponding to that URL. The request URL is generated when the target user, using the cloud protection system through the user terminal, accesses a website. Such data resources fall into two kinds, static web resources and dynamic web resources. For a static web resource, the data that the website's web page presents to the target user never changes. For a dynamic web resource, that data is generated by a program, so different users, or the same user at different points in time, may see different content when visiting the page. Optionally, static web resources may include, but are not limited to, at least one of the following: HyperText Markup Language (HTML), Cascading Style Sheets (CSS), JavaScript (JS), plain text (TXT), audio, video, images, and the like. Dynamic web resources may include, but are not limited to, at least one of the following: JavaServer Pages (JSP) pages, Servlet programs, Thymeleaf template pages, and the like.
The target URL is defined as follows. When using the cloud protection system through the user terminal, the target user may request one or more URLs, and the electronic device may take any URL whose request count exceeds a preset count as a target URL. The preset count may be set before the electronic device leaves the factory or customized by the target user, which is not specifically limited here.
The historical access summary information of a website may include, but is not limited to, at least one of the following: the access times and access content generated when the target user visits the website, the number of visits the target user made to the website within a preset time period, and the like. The preset time period may be set before the electronic device leaves the factory or customized by the target user, which is not specifically limited here.
It should be noted that the order of steps 201-202 relative to steps 203-204 is not limited.
205. Determine a target tag set corresponding to the target user according to the first tag set and the second tag set.
The tags in the target tag set are used to mark the target user.
In some embodiments, the first tags in the first tag set mark the target user's actual circumstances in the user information system, and the second tags in the second tag set mark the target user's Internet browsing records in the cloud protection system. The target tag set that the electronic device derives from both sets is therefore relatively complete. With this set, the electronic device can mark the target user more accurately, and the cloud protection system can later provide the target user with finer-grained services based on it.
In some embodiments, after acquiring the first tag set and the second tag set, the electronic device may deduplicate the first tags and the second tags to obtain the target tag set corresponding to the target user. Alternatively, the electronic device may select some of the first tags and some of the second tags and combine them into the target tag set. Either method yields a target tag set that marks the target user more accurately.
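Both combination strategies reduce to simple collection operations. The sketch below assumes tags are plain strings. The selection predicates passed to `merge_by_selection` are hypothetical, since the source does not say how the partial subsets are chosen.

```python
from typing import Callable, Iterable, List


def merge_by_dedup(first_tags: Iterable[str], second_tags: Iterable[str]) -> List[str]:
    """Union of both tag sets with duplicates removed, keeping first-seen order."""
    return list(dict.fromkeys([*first_tags, *second_tags]))


def merge_by_selection(
    first_tags: Iterable[str],
    second_tags: Iterable[str],
    keep_first: Callable[[str], bool],
    keep_second: Callable[[str], bool],
) -> List[str]:
    """Keep only the selected part of each set, then combine them without duplicates."""
    chosen_first = [t for t in first_tags if keep_first(t)]
    chosen_second = [t for t in second_tags if keep_second(t)]
    return merge_by_dedup(chosen_first, chosen_second)


first = ["Hangzhou", "e-commerce"]
second = ["e-commerce", "Java", "file"]
print(merge_by_dedup(first, second))  # ['Hangzhou', 'e-commerce', 'Java', 'file']
```

Using `dict.fromkeys` instead of `set` keeps the output order deterministic, which makes the resulting tag list easier to compare across runs.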
In this embodiment of the present application, the method runs as follows:

- The electronic device obtains the user data of the target user from the user information system and determines a first tag set according to that data. The first tag set includes at least one first tag corresponding to the user data, and the first tags can be used to mark the target user's actual circumstances in the user information system.
- The electronic device obtains the network access log data corresponding to the target user from the cloud protection system, analyzes it based on the preset rule, and determines a second tag set according to the analysis result. The second tag set includes at least one second tag corresponding to the analysis result, and the second tags can be used to mark the target user's Internet browsing records in the cloud protection system.
- Finally, the electronic device determines the target tag set corresponding to the target user according to the first tag set and the second tag set.

Because the target user is marked using both user data and network access log data, user marking becomes more diverse. The tags in the target tag set also let the electronic device mark users who visit websites through the cloud protection system completely and accurately.
FIG. 3 is a schematic diagram of another embodiment of the user marking method in the embodiments of the present application, which may include:
301. Obtain user data of the target user from the user information system.
It should be noted that step 301 is similar to step 201 shown in FIG. 2 of this embodiment, and details are not repeated here.
302. Extract information from the user data using named entity recognition (NER) to obtain entity information corresponding to the target user.
Named entity recognition (NER), also known as proper-name recognition, is a common and widely used data processing technique in natural language processing. Named entities usually refer to entity information in unstructured user data that has special meaning or strong referentiality. Such entity information may include, but is not limited to, at least one of the following: user names, region information, organization names, time expressions, proper nouns, and the like.
In some embodiments, the electronic device uses NER to extract information from the target user's user data and accurately obtain the entity information corresponding to the target user. The electronic device can then process that entity information effectively and accurately. The entity information may include, but is not limited to, region information and/or organization name information.
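The source relies on NER but does not name a model. To keep the example self-contained, the sketch below swaps in plain dictionary (gazetteer) lookup, which is a much cruder technique than statistical NER. The gazetteer contents and the names `extract_entities` and `EntityInfo` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical gazetteers; a production system would use a trained NER model instead.
REGIONS = {"Beijing", "Shanghai", "Shenzhen", "Hangzhou"}
ORGANIZATIONS = {"Ping An Insurance", "Haidilao", "China Telecom"}


@dataclass
class EntityInfo:
    regions: List[str] = field(default_factory=list)
    organizations: List[str] = field(default_factory=list)


def extract_entities(user_data: str) -> EntityInfo:
    """Return every gazetteer entry that occurs verbatim in the user data."""
    return EntityInfo(
        regions=sorted(name for name in REGIONS if name in user_data),
        organizations=sorted(name for name in ORGANIZATIONS if name in user_data),
    )


info = extract_entities("Account owner: Li Lei, China Telecom branch office, Hangzhou")
print(info.regions, info.organizations)  # ['Hangzhou'] ['China Telecom']
```

Substring matching like this cannot recognize names missing from the gazetteer and can produce false positives. Those are exactly the gaps that a learned NER model is meant to close.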
303. Determine a first tag set according to the entity information.
In some embodiments, after acquiring the entity information corresponding to the target user, the electronic device may determine the first tags corresponding to that entity information and combine them into the first tag set, so as to mark the target user's actual circumstances.
Optionally, determining the first tag set according to the entity information may include, but is not limited to, at least one of the following implementations:
Implementation 1: the entity information includes region information; the electronic device determines the region information as a first tag in the first tag set.
In some embodiments, region information can mark the target user directly. The electronic device can therefore use the region information as a first tag in the first tag set without further processing.
Implementation 2: the entity information includes organization name information; the electronic device obtains the industry type corresponding to the organization name and determines that industry type as a first tag in the first tag set.
Optionally, obtaining the industry type and determining it as a first tag includes two steps. First, the electronic device determines, based on a preset industry classification rule, the target industry type corresponding to the organization name. Second, the electronic device determines the target industry type as a first tag in the first tag set.
The preset industry classification rule may include the industry types corresponding to multiple organization names.
An organization name by itself does not reflect the target user's relevant characteristics, so it cannot mark the target user effectively. The corresponding industry type does reflect those characteristics and can mark the user. The electronic device stores the industry types corresponding to multiple organization names, that is, the preset industry classification rule. It can therefore first look up the industry type corresponding to the organization name in that rule, and then determine that industry type as a first tag in the first tag set. This ensures that the target user is marked effectively.
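The two implementations above can be sketched as building a set from region names plus rule lookups. This sketch assumes the preset industry classification rule is a plain exact-match table from organization name to industry type; the table contents and `first_tag_set` are hypothetical. An organization missing from the table produces no tag, which is the case the Bayesian classifier described below is meant to cover.

```python
from typing import Dict, Iterable, Set

# Hypothetical preset industry classification rule: organization name -> industry type.
INDUSTRY_RULES: Dict[str, str] = {
    "Ping An Insurance": "insurance",
    "Haidilao": "catering",
    "China Telecom": "telecommunications",
}


def first_tag_set(
    regions: Iterable[str],
    organizations: Iterable[str],
    rules: Dict[str, str] = INDUSTRY_RULES,
) -> Set[str]:
    tags = set(regions)  # Implementation 1: region information is used directly as a tag.
    for org in organizations:
        industry = rules.get(org)  # Implementation 2: map organization name to industry type.
        if industry is not None:
            tags.add(industry)
    return tags


print(sorted(first_tag_set(["Hangzhou"], ["China Telecom", "Unknown Studio"])))
# ['Hangzhou', 'telecommunications']
```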
Optionally, the industry types preset in the electronic device may include, but are not limited to, at least one of the following: insurance, catering, telecommunications, real estate, services, apparel, advertising, and the like.
Optionally, the electronic device may determine the target industry type for the organization name by applying a Bayesian classifier on the basis of the preset industry classification rule.
Starting from prior probabilities, the Bayesian classifier uses Bayes' formula to compute the posterior probability for the organization name, that is, the probability that the organization name belongs to a given industry type. It then selects the industry type with the largest posterior probability as the target industry type corresponding to the organization name.
With the Bayesian classifier, the electronic device can classify the organization name accurately and obtain the corresponding target industry type.
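The decision rule above picks the class c that maximizes P(c) · Π P(t_i | c) over the tokens t_i of the organization name. The sketch below implements this rule as a multinomial naive Bayes classifier. It assumes whitespace-tokenized names, Laplace smoothing with alpha = 1, and hypothetical training examples. The source does not specify the features or the smoothing, and Chinese organization names would need word segmentation or character n-grams instead of `split()`.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def tokenize(name: str) -> List[str]:
    # Assumption: whitespace tokens; Chinese names would need segmentation or n-grams.
    return name.lower().split()


class NaiveBayesClassifier:
    """Multinomial naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.class_counts: Counter = Counter()
        self.token_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: set = set()

    def fit(self, samples: Iterable[Tuple[str, str]]) -> "NaiveBayesClassifier":
        for text, label in samples:
            tokens = tokenize(text)
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text: str) -> str:
        total = sum(self.class_counts.values())
        best_label, best_score = "", -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior P(c)
            denom = sum(self.token_counts[label].values()) + self.alpha * len(self.vocab)
            for token in tokenize(text):
                # Smoothed log likelihood P(t | c); unseen tokens do not zero the product.
                score += math.log((self.token_counts[label][token] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label  # class with the largest (log) posterior


# Hypothetical training data derived from a preset industry classification rule.
TRAINING = [
    ("Ping An Insurance", "insurance"),
    ("China Life Insurance", "insurance"),
    ("Haidilao Hotpot Restaurant", "catering"),
    ("Quanjude Roast Duck Restaurant", "catering"),
    ("China Telecom", "telecommunications"),
    ("China Mobile Communications", "telecommunications"),
]

classifier = NaiveBayesClassifier().fit(TRAINING)
print(classifier.predict("Pacific Insurance"))  # insurance
```

Working in log space avoids floating-point underflow when names contain many tokens. The same class could be reused for the domain name, web service, and URL classifications mentioned later.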
304. Obtain the network access log data corresponding to the target user from the cloud protection system.
It should be noted that step 304 is similar to step 203 shown in FIG. 2 of this embodiment, and details are not repeated here.
305. Analyze the network access log data to obtain analysis data.
In some embodiments, the electronic device analyzes the network access log data to obtain analysis data for subsequent processing.
Optionally, analyzing the network access log data to obtain analysis data may include, but is not limited to, at least one of the following implementations:
Implementation 1: the electronic device analyzes the network access log data to obtain the URL path information of the websites that the target user has visited in the cloud protection system through the user terminal. It then uses this URL path information as the website address corresponding to the target user.
In some embodiments, the electronic device may analyze the network access log data to determine which websites the target user has visited in the cloud protection system. It then obtains the URL path information shown in the address bar for each such website and uses that information directly as the website address corresponding to the target user.
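Implementation 1 can be sketched with the standard library's URL parser. This sketch makes two assumptions. First, the log records absolute URLs; if it stores only the path plus a Host header, the two must be joined first. Second, "URL path information" is taken here to mean scheme, host and path without the query string or fragment. Both function names are hypothetical.

```python
from typing import Iterable, List
from urllib.parse import urlsplit, urlunsplit


def website_address(url: str) -> str:
    """Keep scheme, host and path; drop the query string and fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def visited_addresses(urls: Iterable[str]) -> List[str]:
    """Distinct website addresses in first-visit order."""
    return list(dict.fromkeys(website_address(u) for u in urls))


print(website_address("https://shop.example.com/cart/view?id=42#top"))
# https://shop.example.com/cart/view
```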
Implementation 2: the electronic device analyzes the network access log data to obtain the request data generated when the target user accessed the network through the user terminal while using the cloud protection system; this request data is a request URL. The electronic device then determines the data resource corresponding to the request URL and uses it as the web resource information corresponding to that URL.
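The source does not say how the data resource behind a request URL is identified. A simple heuristic, assumed here, classifies the URL by file extension into the static and dynamic kinds defined earlier. The extension sets and `resource_kind` are hypothetical. JSP appears in the text; `.jspx` and `.do` are added as common Java web conventions.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Extensions for the static kinds listed in the text (HTML, CSS, JS, TXT, audio/video, images).
STATIC_EXTENSIONS = {".html", ".htm", ".css", ".js", ".txt",
                     ".mp3", ".mp4", ".png", ".jpg", ".jpeg", ".gif"}
# .jsp is named in the text; .jspx and .do are assumptions about common Java web stacks.
DYNAMIC_EXTENSIONS = {".jsp", ".jspx", ".do"}


def resource_kind(request_url: str) -> str:
    suffix = PurePosixPath(urlsplit(request_url).path).suffix.lower()
    if suffix in STATIC_EXTENSIONS:
        return "static"
    if suffix in DYNAMIC_EXTENSIONS:
        return "dynamic"
    return "unknown"  # e.g. Servlet or Thymeleaf routes usually carry no extension


print(resource_kind("https://a.example/assets/app.JS"))  # static
```

Extensionless routes, which are typical of Servlet and Thymeleaf applications, fall through to `"unknown"`. Those need other evidence, such as the web service features discussed in step 306.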
Implementation 3: the electronic device analyzes the network access log data to obtain at least one request URL generated by the target user through the user terminal while using the cloud protection system. It counts the requests for each request URL and takes each URL whose request count exceeds a preset count as a target URL.
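Implementation 3 is a frequency count with a strict "greater than" threshold, which the sketch below follows literally. `target_urls` is a hypothetical name, and the sorted output is only there to make results stable.

```python
from collections import Counter
from typing import Iterable, List


def target_urls(request_urls: Iterable[str], preset_count: int) -> List[str]:
    """URLs requested strictly more than preset_count times, sorted for stable output."""
    counts = Counter(request_urls)
    return sorted(url for url, n in counts.items() if n > preset_count)


requests = ["/a", "/b", "/a", "/a", "/c", "/b"]
print(target_urls(requests, preset_count=1))  # ['/a', '/b']
```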
Implementation 4: the electronic device analyzes the network access log data to identify the websites the target user has used in the cloud protection system through the user terminal, and obtains the historical access summary information of each website. This summary information may include, but is not limited to, at least one of the following: the access times and access content generated when the target user used the website, the number of visits to the website within a preset time period, and the like.
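The sketch below builds a per-website summary from the log data. It assumes the log has already been reduced to `(website, timestamp)` pairs, and it records first and last access times plus the visit count inside a preset window. Access content is omitted. `AccessSummary` and `summarize` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple


@dataclass
class AccessSummary:
    first_access: datetime
    last_access: datetime
    visits_in_window: int  # visits inside the preset time period


def summarize(
    visits: Iterable[Tuple[str, datetime]], now: datetime, window: timedelta
) -> Dict[str, AccessSummary]:
    """Group (website, timestamp) pairs by website and summarize each group."""
    by_site: Dict[str, List[datetime]] = {}
    for site, ts in visits:
        by_site.setdefault(site, []).append(ts)
    return {
        site: AccessSummary(
            first_access=min(stamps),
            last_access=max(stamps),
            visits_in_window=sum(1 for t in stamps if now - window <= t <= now),
        )
        for site, stamps in by_site.items()
    }


now = datetime(2023, 10, 10, 12, 0)
visits = [
    ("a.example", datetime(2023, 10, 9, 13, 0)),
    ("a.example", datetime(2023, 10, 10, 9, 0)),
    ("a.example", datetime(2023, 10, 1, 8, 0)),
    ("b.example", datetime(2023, 10, 10, 11, 0)),
]
report = summarize(visits, now, window=timedelta(days=1))
print(report["a.example"].visits_in_window)  # 2
```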
The electronic device can then process the four outputs of implementations 1 to 4: the website address corresponding to the target user, the web resource information corresponding to the request URL, the target URL, and the historical access summary information of the website.
306. Based on the preset rule, determine the data type corresponding to the analysis data, and determine a second tag set according to the data type.
In some embodiments, the analysis data by itself does not reflect the target user's relevant characteristics and so cannot mark the target user effectively. The corresponding data type does reflect those characteristics and can. Therefore, after acquiring the analysis data, the electronic device first determines, from multiple preset data types, the data type corresponding to the analysis data. It then determines the second tag set according to that data type, thereby marking the target user effectively.
Optionally, determining the data type and the second tag set in this way may include, but is not limited to, at least one of the following implementations:
Implementation 1: the analysis data includes the website address corresponding to the target user. The electronic device parses the website address to determine its root domain name. Based on a preset first classification rule that includes multiple domain name types, it determines the target domain name type corresponding to the root domain name. The electronic device then determines the target domain name type as a second tag in the second tag set.
In some embodiments, a website address may include, but is not limited to, the website's root domain name, website program, website space, and the like. After obtaining the website address, the electronic device may parse it to obtain the corresponding root domain name. It then determines the target domain name type for that root domain name from multiple preset domain name types and uses that type as a second tag in the second tag set.
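The root domain extraction and rule lookup can be sketched as follows. This sketch takes the last two labels of the host name, or the last three when the last two form a known multi-part suffix such as `com.cn`. The suffix set is a tiny illustrative subset; correct handling requires the full Public Suffix List. The domain-type table and function names are hypothetical, and an exact-match lookup stands in for the Bayesian classifier mentioned below.

```python
from urllib.parse import urlsplit

# Small illustrative subset; real code should consult the full Public Suffix List.
MULTI_PART_SUFFIXES = {"com.cn", "net.cn", "org.cn", "co.uk"}

# Hypothetical preset first classification rule: root domain -> domain name type.
DOMAIN_TYPES = {
    "taobao.com": "e-commerce",
    "weibo.com": "social media",
    "icbc.com.cn": "finance",
}


def root_domain(website_address: str) -> str:
    host = (urlsplit(website_address).hostname or "").rstrip(".")
    labels = host.split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in MULTI_PART_SUFFIXES:
        return ".".join(labels[-3:])  # e.g. icbc.com.cn
    return ".".join(labels[-2:])  # e.g. taobao.com


def domain_type_tag(website_address: str) -> str:
    return DOMAIN_TYPES.get(root_domain(website_address), "unclassified")


print(root_domain("https://www.icbc.com.cn/"))  # icbc.com.cn
```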
Optionally, the electronic device may determine the target domain name type for the root domain name by applying a Bayesian classifier on the basis of the preset first classification rule.
With the Bayesian classifier, the electronic device can classify the root domain name accurately and obtain the corresponding target domain name type.
Implementation 2: the analysis data includes the web resource information corresponding to the request URL, and proceeds as follows:

- The electronic device obtains the web service corresponding to the resource type of the web resource information, and the first feature information of that web service.
- It matches the first feature information against preset feature information of multiple web services.
- If the match succeeds, it determines the target resource type corresponding to the first target feature information. The first target feature information is the entry, among the feature information of the multiple web services, that successfully matches the first feature information.
- The electronic device determines the target resource type as a second tag in the second tag set.

In some embodiments, after acquiring the web resource information, the electronic device needs to determine whether it is a static or a dynamic web resource, that is, which resource type it belongs to. The electronic device then obtains the web service corresponding to that resource type and the first feature information of the web service. Here, a web service refers to the client and server applications that communicate over the World Wide Web (WWW) using the Hypertext Transfer Protocol (HTTP). The first feature information refers to the development language in which the web service was built. The electronic device then searches the preset feature information of multiple web services for first target feature information corresponding to that development language. If it finds a match, it obtains the corresponding development language type, that is, the target resource type, and determines that type as a second tag in the second tag set.
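The source identifies a web service by its development language but does not say which observable features reveal that language. One common heuristic, assumed here, looks at the request path extension and at well-known session cookie names. `JSESSIONID` is the default session cookie of Java servlet containers, `PHPSESSID` that of PHP, and `ASP.NET_SessionId` that of ASP.NET. The preset feature table and function names are hypothetical, the first matching entry wins, and rule matching stands in for the Bayesian classifier mentioned below.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Optional, Set, Tuple
from urllib.parse import urlsplit

# Hypothetical preset feature information: (feature kind, value, development language).
PRESET_FEATURES: List[Tuple[str, str, str]] = [
    ("extension", ".jsp", "Java"),
    ("cookie", "JSESSIONID", "Java"),
    ("extension", ".php", "PHP"),
    ("cookie", "PHPSESSID", "PHP"),
    ("cookie", "ASP.NET_SessionId", "ASP.NET"),
]


def web_service_features(request_url: str, cookie_names: Iterable[str]) -> Set[Tuple[str, str]]:
    """Build the first feature information from the URL extension and cookie names."""
    features = {("cookie", name) for name in cookie_names}
    ext = PurePosixPath(urlsplit(request_url).path).suffix.lower()
    if ext:
        features.add(("extension", ext))
    return features


def match_resource_type(
    features: Set[Tuple[str, str]], preset: List[Tuple[str, str, str]] = PRESET_FEATURES
) -> Optional[str]:
    """Return the development language of the first matching preset entry, or None."""
    for kind, value, language in preset:
        if (kind, value) in features:
            return language
    return None


print(match_resource_type(web_service_features("https://a.example/login.jsp", [])))  # Java
```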
Optionally, the electronic device may use a Bayesian classifier to match the first feature information against the preset feature information of multiple web services. If the match succeeds, it determines the target resource type corresponding to the first target feature information.
With the Bayesian classifier, the electronic device can classify the first feature information accurately and obtain the target resource type corresponding to the first target feature information.
Implementation 3: the analysis data includes the target URL, and proceeds as follows:

- The electronic device obtains the second feature information of the target URL.
- It matches the second feature information against preset feature information of multiple URLs.
- If the match succeeds, it determines the URL type corresponding to the second target feature information. The second target feature information is the entry, among the feature information of the multiple URLs, that successfully matches the second feature information.
- The electronic device determines the URL type as a second tag in the second tag set.

In some embodiments, the feature information of a URL refers to its textual features, which may include, but are not limited to, lexical features and/or structural features.
Multiple lexical features and/or structural features are preset in the electronic device, and each distinct lexical and/or structural feature corresponds to one URL type. Optionally, URL types may include, but are not limited to, at least one of the following: root (directory) type, subroot (secondary directory) type, path type, file type, and the like. A root-type URL points to the website's home page. A subroot-type URL points to the home page of a second-level section of the website. A path-type URL points to a list page of the website, which contains rich link information. A file-type URL points to a content page of the website.
After acquiring the lexical and/or structural features of the target URL, the electronic device searches the preset lexical and/or structural features for the URL type corresponding to those features. If it finds one, it determines that URL type as a second tag in the second tag set.
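The four URL types can be approximated from path structure alone. The sketch below assumes simple heuristic rules: an empty path is root; a single directory-like segment is subroot; deeper directory-like paths are path; and a last segment with a file extension is file. These rules are hypothetical stand-ins for the preset lexical and structural features, and they will misjudge cases such as `/index.html`, which is really a home page.

```python
from urllib.parse import urlsplit


def url_type(url: str) -> str:
    """Classify a URL as root, subroot, path or file from its path structure (heuristic)."""
    path = urlsplit(url).path
    segments = [s for s in path.split("/") if s]
    if not segments:
        return "root"  # site home page
    looks_like_directory = path.endswith("/") or "." not in segments[-1]
    if looks_like_directory:
        # One directory level: second-level section home page; deeper: list page.
        return "subroot" if len(segments) == 1 else "path"
    return "file"  # last segment carries an extension: content page


print(url_type("https://news.example.com/sports/football/"))  # path
```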
可选的,电子设备将该第二特征信息与预置的多个URL的特征信息进行匹配;若匹配成功,则确定与第二目标特征信息对应的URL类型,可以包括:电子设备利用贝叶斯分类器将该第二特征信息与预置的多个URL的特征信息进行匹配;若匹配成功,则确定与第二目标特征信息对应的URL类型。Optionally, the electronic device matches the second feature information with the preset feature information of multiple URLs; if the matching is successful, then determining the URL type corresponding to the second target feature information, which may include: the electronic device utilizes Bayeux. The classifier matches the second feature information with the preset feature information of multiple URLs; if the matching is successful, the URL type corresponding to the second target feature information is determined.
该电子设备利用该贝叶斯分类器可准确对第二特征信息进行分类,得到与该第二目标特征信息对应的URL类型。The electronic device can accurately classify the second feature information by using the Bayesian classifier, and obtain a URL type corresponding to the second target feature information.
实现方式4:该分析数据包括该网站的历史访问汇总信息;电子设备在该历史访问总汇信息中,获取预设时间段内的第一访问汇总信息,该第一访问汇总信息可以包括但不限于该目标用户对该网站的访问次数;该电子设备将该预设时间段内各个时刻的访问次数进行聚类,得到该预设时间段对应的访问曲线,该访问曲线用于反映该目标用户在该各个时刻对该网站的访问次数;该电子设备将该访问曲线的第三特征信息与预置的多个访问曲线分别对应的特征信息进行匹配;若匹配成功,则确定与第三目标特征信息对应的目标标签,该第三目标特征信息为该多个访问曲线分别对应的特征信息中与该第三特征信息匹配成功的特征信息;该电子设备将该目标标签确定为第二标签集合下的第二标签。Implementation mode 4: the analysis data includes the historical visit summary information of the website; the electronic device obtains the first visit summary information within a preset time period from the historical visit summary information, and the first visit summary information may include but is not limited to The number of visits of the target user to the website; the electronic device clusters the number of visits at each moment in the preset time period to obtain an access curve corresponding to the preset time period, and the access curve is used to reflect the target user's The number of visits to the website at each moment; the electronic device matches the third feature information of the access curve with the feature information corresponding to a plurality of preset access curves; if the matching is successful, it is determined to match the third target feature information The corresponding target tag, the third target feature information is the feature information that is successfully matched with the third feature information in the feature information corresponding to the multiple access curves respectively; the electronic device determines the target tag as the second tag set. Second tab.
In some embodiments, different users correspond to different preset access curves, and different preset access curves correspond to different labels.
After obtaining the historical visit summary information for the target user, the electronic device can obtain the first visit summary information within the preset time period. The first visit summary information can include both the target user's visit counts for the website and the individual moments within the preset time period. The electronic device can therefore use it to cluster the visit counts at each moment and obtain the access curve for the preset time period. In this access curve, the moments and their corresponding visit counts constitute the feature information of the target user's access curve. The electronic device stores the feature information of the access curves of multiple users. If it finds, among the feature information of the preset access curves, the entry that corresponds to the target user's access curve, it can directly obtain the target label associated with that feature information and determine it as a second label in the second label set.
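The application describes the visit counts as being "clustered" into a curve but gives no algorithm. The Python sketch below makes three assumptions. It treats that step as aggregating the visits in the preset window into 24 hour-of-day counts. It matches the curve against preset templates using cosine similarity with a threshold. The template names, their shapes, and the 0.5 threshold are hypothetical.

```python
from __future__ import annotations

import math
from datetime import datetime

# Hypothetical preset access curves (hour-of-day templates) and their labels.
PRESET_CURVES: dict[str, list[float]] = {
    "office-hours": [1.0 if 9 <= h < 18 else 0.0 for h in range(24)],
    "night-owl": [1.0 if h < 6 else 0.0 for h in range(24)],
}


def access_curve(visits: list[datetime], start: datetime, end: datetime) -> list[int]:
    """Aggregate the target user's visits in [start, end) into 24 hourly counts."""
    curve = [0] * 24
    for ts in visits:
        if start <= ts < end:
            curve[ts.hour] += 1
    return curve


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_curve(curve: list[int], presets: dict[str, list[float]] = PRESET_CURVES,
                threshold: float = 0.5) -> str | None:
    """Return the label of the most similar preset curve, or None if none reaches threshold."""
    best_label, best_sim = None, threshold
    for label, template in presets.items():
        sim = cosine(curve, template)
        if sim >= best_sim:
            best_label, best_sim = label, sim
    return best_label


day = datetime(2022, 4, 1)
visits = [day.replace(hour=h) for h in (10, 11, 14)]
curve = access_curve(visits, day, day.replace(day=2))
print(match_curve(curve))  # office-hours
```

Cosine similarity compares the shape of the curve rather than its volume. A light user and a heavy user with the same daily rhythm would therefore receive the same label, which may or may not be what a deployment wants.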
It should be noted that the order of steps 301-303 relative to steps 304-306 is not limited.
307. Determine the target label set corresponding to the target user according to the first label set and the second label set.
In some embodiments, the electronic device processes the first label set and the second label set to accurately determine the target label set corresponding to the target user.
Optionally, determining the target label set corresponding to the target user according to the first label set and the second label set may include, but is not limited to, one of the following implementations:
Implementation 1: The electronic device deduplicates the first labels in the first label set against the second labels in the second label set to obtain the target label set corresponding to the target user.
In some embodiments, the first label set and the second label set may contain identical labels. The electronic device can remove these identical labels from the first labels and combine the remaining first labels with the second labels in the second label set to obtain the target label set corresponding to the target user. Alternatively, it can remove the identical labels from the second labels and combine the remaining second labels with the first labels in the first label set. In either case, the electronic device avoids marking the target user with the same label twice, which improves the completeness and accuracy of the marking.
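A minimal Python sketch of Implementation 1 follows. It assumes that labels are plain strings; the `region:`/`industry:`/`domain:` prefixes are a hypothetical naming convention. The code removes from the first labels any label that also appears among the second labels and then concatenates the two lists, preserving order.

```python
from __future__ import annotations


def merge_dedup(first_labels: list[str], second_labels: list[str]) -> list[str]:
    """Implementation 1: drop first labels that also occur in the second set, then combine."""
    second = set(second_labels)
    remaining_first = [label for label in first_labels if label not in second]
    # dict.fromkeys keeps first-seen order and removes any leftover repeats.
    return list(dict.fromkeys(remaining_first + second_labels))


first = ["region:Beijing", "industry:finance"]
second = ["domain:news", "industry:finance"]
print(merge_dedup(first, second))  # ['region:Beijing', 'domain:news', 'industry:finance']
```

Swapping the arguments gives the alternative variant, which removes duplicates from the second labels instead. The resulting set of labels is the same; only the order differs.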
Implementation 2: The electronic device obtains a first number of first labels from the first label set and a second number of second labels from the second label set. It then obtains the target label set corresponding to the target user from these first labels and second labels.
Optionally, the first number and the second number may be the same or different; this is not specifically limited here.
In some embodiments, the first number of first labels and the second number of second labels are sufficient to mark the target user effectively. The electronic device can therefore combine them to obtain the target label set corresponding to the target user and mark the target user effectively.
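The application does not say how the first number and second number of labels are chosen. The Python sketch below assumes that each label carries a confidence score and keeps the top `n1` first labels and top `n2` second labels. The scores and label names are hypothetical.

```python
from __future__ import annotations


def merge_top_n(first: dict[str, float], second: dict[str, float], n1: int, n2: int) -> list[str]:
    """Implementation 2: keep the n1 best first labels and the n2 best second labels."""
    top_first = sorted(first, key=lambda label: first[label], reverse=True)[:n1]
    top_second = sorted(second, key=lambda label: second[label], reverse=True)[:n2]
    return list(dict.fromkeys(top_first + top_second))  # order-preserving combination


first = {"region:Beijing": 0.9, "industry:finance": 0.6, "region:Shanghai": 0.2}
second = {"domain:news": 0.8, "url:login": 0.7}
print(merge_top_n(first, second, n1=2, n2=1))
# ['region:Beijing', 'industry:finance', 'domain:news']
```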
Whether the target label set is obtained by Implementation 1 or by Implementation 2, it is relatively complete. The target user can therefore be marked comprehensively and effectively, which allows the cloud protection system to subsequently provide the target user with services that have less deviation and finer granularity.
In this embodiment of the present application, the electronic device first obtains the user data of the target user from the user information system. It uses named entity recognition (NER) to extract information from the user data, accurately obtaining the entity information corresponding to the target user. From this entity information it determines the first label set, whose first labels can mark the actual situation of the target user in the user information system. Next, the electronic device obtains the network access log data corresponding to the target user from the cloud protection system and analyzes it to obtain analysis data. It determines the data type of the analysis data based on preset rules and derives the second label set from that data type. The second labels in the second label set can mark the target user's Internet access records in the cloud protection system. Finally, the electronic device determines the target label set corresponding to the target user according to the first label set and the second label set. Because user data and network access log data are combined to mark the target user, the user labels are more diverse. The labels in the target label set also allow the electronic device to mark users who visit websites through the cloud protection system completely and accurately.
FIG. 4 is a schematic diagram of an embodiment of the user marking system in the embodiments of the present application. The system may include:
a first data collection module 401, configured to obtain user data of a target user from a user information system;
a first data processing module 402, configured to determine a first label set according to the user data, the first label set including at least one first label corresponding to the user data;
a second data collection module 403, configured to obtain network access log data corresponding to the target user from a cloud protection system;
a second data processing module 404, configured to analyze the network access log data based on preset rules and determine a second label set according to the analysis result, the second label set including at least one second label corresponding to the analysis result; and
a user marking module 405, configured to determine a target label set corresponding to the target user according to the first label set and the second label set, the labels in the target label set being used to mark the target user.
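The module split of FIG. 4 can be pictured as a small pipeline. The Python sketch below shows one hypothetical wiring, not the application's implementation. Modules 401 to 404 are modelled as injected callables with assumed signatures. Module 405 is reduced to a set union, which is one way of realising the deduplicating combination of Implementation 1.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class UserMarkingSystem:
    """Hypothetical wiring of modules 401-405; each stage is a pluggable callable."""
    collect_user_data: Callable[[str], dict]                 # module 401
    derive_first_labels: Callable[[dict], set[str]]          # module 402
    collect_access_logs: Callable[[str], list[str]]          # module 403
    derive_second_labels: Callable[[list[str]], set[str]]    # module 404

    def mark(self, user_id: str) -> set[str]:
        """Module 405: combine both label sets into the target label set."""
        first = self.derive_first_labels(self.collect_user_data(user_id))
        second = self.derive_second_labels(self.collect_access_logs(user_id))
        return first | second


system = UserMarkingSystem(
    collect_user_data=lambda uid: {"region": "Beijing"},
    derive_first_labels=lambda data: {f"region:{data['region']}"},
    collect_access_logs=lambda uid: ["GET https://news.example.com/"],
    derive_second_labels=lambda logs: {"domain:news"},
)
print(sorted(system.mark("u1")))  # ['domain:news', 'region:Beijing']
```

Injecting each stage keeps the user information system and the cloud protection system replaceable by stubs, as in the usage above.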
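Optionally, in some embodiments of the present application, the first data processing module 402 is specifically configured to use named entity recognition (NER) to extract information from the user data and obtain the entity information corresponding to the target user. It then determines the first label set according to the entity information as follows. If the entity information includes region information, the region information is determined as a first label in the first label set. And/or, if the entity information includes institution name information, the industry type corresponding to the institution name information is obtained and determined as a first label in the first label set.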
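A real deployment would run a trained NER model over the user data. To keep the example self-contained, the Python sketch below swaps in a much simpler technique: a gazetteer lookup for regions and a regular expression for organisation names. Both `REGION_GAZETTEER` and `ORG_PATTERN` are hypothetical, and the industry lookup is passed in as a callable. The sketch only shows how region entities map directly to labels while organisation entities go through an industry mapping.

```python
from __future__ import annotations

import re
from collections.abc import Callable

# Hypothetical gazetteer and pattern standing in for a trained NER model.
REGION_GAZETTEER = {"Beijing", "Shanghai", "Guangdong"}
ORG_PATTERN = re.compile(r"\b((?:[A-Z][a-z]+ )+(?:Bank|Hospital|University))\b")


def extract_entities(text: str) -> dict[str, list[str]]:
    """Return LOC and ORG entities found in free-text user data."""
    return {
        "LOC": [r for r in sorted(REGION_GAZETTEER) if r in text],
        "ORG": ORG_PATTERN.findall(text),
    }


def first_label_set(text: str, industry_of: Callable[[str], str | None]) -> set[str]:
    """Region entities become labels directly; organisations are mapped to an industry."""
    entities = extract_entities(text)
    labels = {f"region:{region}" for region in entities["LOC"]}
    for org in entities["ORG"]:
        industry = industry_of(org)
        if industry is not None:
            labels.add(f"industry:{industry}")
    return labels


text = "Registered in Beijing; employer: Northern Commercial Bank."
print(sorted(first_label_set(text, lambda org: "finance" if org.endswith("Bank") else None)))
# ['industry:finance', 'region:Beijing']
```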
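Optionally, in some embodiments of the present application, the first data processing module 402 is specifically configured to determine the target industry type corresponding to the institution name information based on a preset industry division rule. The preset industry division rule includes the industry types corresponding to multiple pieces of institution name information. The module then determines the target industry type as a first label in the first label set.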
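The preset industry division rule is described as a mapping from institution names to industry types. The Python sketch below implements that mapping as an exact-name table. It also adds an ordered keyword fallback for names not in the table; that fallback is an assumption and is not stated in the application. All names and keywords are hypothetical.

```python
from __future__ import annotations

# Hypothetical preset industry division rule: exact institution names first...
NAME_TO_INDUSTRY: dict[str, str] = {
    "Northern Commercial Bank": "finance",
    "Riverside Medical Group": "healthcare",
}
# ...then keyword fallbacks, checked in order (an assumption, not from the application).
KEYWORD_RULES: list[tuple[str, str]] = [
    ("bank", "finance"),
    ("insurance", "finance"),
    ("hospital", "healthcare"),
    ("university", "education"),
]


def industry_type(org_name: str) -> str | None:
    """Map an institution name to an industry type, or None if no rule applies."""
    if org_name in NAME_TO_INDUSTRY:
        return NAME_TO_INDUSTRY[org_name]
    lowered = org_name.lower()
    for keyword, industry in KEYWORD_RULES:
        if keyword in lowered:
            return industry
    return None


print(industry_type("Riverside Medical Group"))  # healthcare
```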
Optionally, in some embodiments of the present application, the second data processing module 404 is specifically configured to analyze the network access log data to obtain analysis data. The analysis data includes at least one of the following: the website address corresponding to the target user, the network resource information corresponding to the request uniform resource locator (URL), the target URL, and the historical visit summary information of the website. The module determines, based on preset rules, the data type corresponding to the analysis data, and determines the second label set according to that data type.
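The cloud protection log format is not given in the application. The Python sketch below therefore assumes that each log line is a JSON object with hypothetical `user` and `url` fields. It distills the target user's records into three of the kinds of analysis data listed above: website addresses, request URLs, and a per-site visit count that can feed the historical visit summary.

```python
from __future__ import annotations

import json
from collections import Counter
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class AnalysisData:
    """Per-user analysis data distilled from access logs (assumed shape)."""
    site_addresses: set[str] = field(default_factory=set)
    request_urls: list[str] = field(default_factory=list)
    visit_summary: Counter[str] = field(default_factory=Counter)  # host -> visit count


def analyze_logs(lines: list[str], user: str) -> AnalysisData:
    """Parse JSON log lines (hypothetical format) and keep only the target user's records."""
    data = AnalysisData()
    for line in lines:
        record = json.loads(line)
        if record.get("user") != user:
            continue
        url = record["url"]
        host = urlsplit(url).hostname or ""
        data.site_addresses.add(host)
        data.request_urls.append(url)
        data.visit_summary[host] += 1
    return data


logs = [
    '{"user": "u1", "url": "https://news.example.com/a.html"}',
    '{"user": "u2", "url": "https://shop.example.org/"}',
    '{"user": "u1", "url": "https://news.example.com/b.js"}',
]
print(analyze_logs(logs, "u1").visit_summary)  # Counter({'news.example.com': 2})
```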
Optionally, in some embodiments of the present application, the second data processing module 404 is specifically configured to handle the case in which the analysis data includes the website address corresponding to the target user. In that case, the module parses the website address to determine its root domain name. It then determines the target domain name type corresponding to the root domain name based on a preset first classification rule, which includes multiple domain name types. Finally, it determines the target domain name type as a second label in the second label set.
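The Python sketch below illustrates root-domain extraction followed by a rule lookup. Correct root-domain extraction needs the Public Suffix List; the sketch substitutes a tiny hard-coded set of multi-label suffixes, and it does not handle IP addresses. The first classification rule (`ROOT_DOMAIN_TYPES`, `SUFFIX_TYPES`) is hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlsplit

# Tiny stand-in for the Public Suffix List; use the full list in real code.
MULTI_LABEL_SUFFIXES = {"com.cn", "net.cn", "org.cn", "gov.cn", "edu.cn", "co.uk"}

# Hypothetical "first classification rule": root domain -> type, plus suffix fallbacks.
ROOT_DOMAIN_TYPES = {"example.com.cn": "news", "example.net": "e-commerce"}
SUFFIX_TYPES = {"gov.cn": "government", "edu.cn": "education"}


def root_domain(address: str) -> str:
    """Reduce a website address (with or without scheme) to its registrable root domain."""
    host = urlsplit(address if "//" in address else f"//{address}").hostname or ""
    labels = host.rstrip(".").split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def domain_type(address: str) -> str | None:
    """Return the domain type label for an address, or None if no rule matches."""
    root = root_domain(address)
    if root in ROOT_DOMAIN_TYPES:
        return ROOT_DOMAIN_TYPES[root]
    for suffix, kind in SUFFIX_TYPES.items():
        if root.endswith("." + suffix):
            return kind
    return None


print(root_domain("https://news.example.com.cn/a?b=1"))  # example.com.cn
```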
Optionally, in some embodiments of the present application, the second data processing module 404 is specifically configured to handle the case in which the analysis data includes the network resource information corresponding to the request URL. In that case, the module obtains the web service corresponding to the resource type to which the network resource information belongs, together with first feature information of that web service. It matches the first feature information against the feature information of a plurality of preset web services. If the matching succeeds, it determines the target resource type corresponding to the first target feature information. The first target feature information is the feature information, among that of the preset web services, that successfully matches the first feature information. The module then determines the target resource type as a second label in the second label set.
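The application leaves "web service" and "feature information" open. The Python sketch below is one hypothetical reading. The resource's features are the file extension of the request path plus, when available, the product name from an HTTP `Server` header. Each preset web service is a feature set. Matching uses Jaccard similarity against an assumed threshold of 0.3. The preset sets and resource-type names are illustrative only.

```python
from __future__ import annotations

from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Hypothetical preset web services: resource type -> characteristic feature set.
PRESET_SERVICES: dict[str, set[str]] = {
    "dynamic-php": {"ext:php", "server:nginx"},
    "dynamic-java": {"ext:jsp", "ext:do"},
    "static-asset": {"ext:js", "ext:css", "ext:png"},
}


def resource_features(url: str, server_header: str | None = None) -> set[str]:
    """Derive features from the request URL's extension and an optional Server header."""
    ext = PurePosixPath(urlsplit(url).path).suffix.lstrip(".").lower()
    features = {f"ext:{ext}"} if ext else set()
    if server_header:
        features.add(f"server:{server_header.split('/')[0].lower()}")
    return features


def match_resource_type(features: set[str], threshold: float = 0.3) -> str | None:
    """Return the preset resource type with the highest Jaccard similarity >= threshold."""
    best_type, best_score = None, threshold
    for resource_type, preset in PRESET_SERVICES.items():
        union = features | preset
        score = len(features & preset) / len(union) if union else 0.0
        if score >= best_score:
            best_type, best_score = resource_type, score
    return best_type


print(match_resource_type(resource_features("https://a.example.com/index.php", "nginx/1.20.1")))
# dynamic-php
```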
Optionally, in some embodiments of the present application, the second data processing module 404 is specifically configured to handle the case in which the analysis data includes the target URL. In that case, the module obtains second feature information of the target URL and matches it against the feature information of a plurality of preset URLs. If the matching succeeds, it determines the URL type corresponding to the second target feature information. The second target feature information is the feature information, among that of the preset URLs, that successfully matches the second feature information. The module then determines the URL type as a second label in the second label set.
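The Python sketch below focuses on the step of obtaining the second feature information of a target URL. It extracts a few lexical features: whether the host is a literal IP address, path depth, first path segment, query keys, and length. It then checks the features against an ordered list of preset URL-type predicates. The feature set, the type names, and the predicate-based matching are all hypothetical; the Bayesian classifier described earlier is another way to perform the matching step.

```python
from __future__ import annotations

import ipaddress
from urllib.parse import parse_qs, urlsplit


def url_features(url: str) -> dict[str, object]:
    """Extract simple lexical features from a URL (assumed feature set)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False
    segments = [s for s in parts.path.split("/") if s]
    return {
        "host_is_ip": host_is_ip,
        "path_depth": len(segments),
        "first_segment": segments[0].lower() if segments else "",
        "query_keys": sorted(parse_qs(parts.query)),
        "length": len(url),
    }


# Hypothetical preset URL types, checked in order against the feature dict.
PRESET_URL_TYPES = [
    ("api", lambda f: f["first_segment"] == "api"),
    ("admin-console", lambda f: f["first_segment"] in {"admin", "manage"}),
    ("search", lambda f: "q" in f["query_keys"] or "keyword" in f["query_keys"]),
    ("direct-ip", lambda f: f["host_is_ip"]),
]


def url_type(url: str) -> str | None:
    """Return the first preset URL type whose predicate matches, or None."""
    features = url_features(url)
    for name, predicate in PRESET_URL_TYPES:
        if predicate(features):
            return name
    return None


print(url_type("https://example.com/search?q=test"))  # search
```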
Optionally, in some embodiments of the present application, the second data processing module 404 is specifically configured to handle the case in which the analysis data includes the historical visit summary information of the website. In that case, the module obtains from the historical visit summary information the first visit summary information within a preset time period, which includes the number of times the target user visited the website. It clusters the visit counts at each moment in the preset time period to obtain the access curve for that period; the curve reflects the target user's number of visits to the website at each moment. It then matches third feature information of the access curve against the feature information of each of a plurality of preset access curves. If the matching succeeds, it determines the target label corresponding to the third target feature information. The third target feature information is the feature information, among that of the preset access curves, that successfully matches the third feature information. Finally, the module determines the target label as a second label in the second label set.
FIG. 5 is a schematic diagram of an embodiment of the electronic device in the embodiments of the present application. The electronic device may include a memory 501 and a processor 502 coupled to the memory 501. The processor 502 invokes the executable program code stored in the memory 501.
In this embodiment of the present application, the processor 502 further has the following functions:
obtaining user data of a target user from a user information system, and determining a first label set according to the user data, the first label set including at least one first label corresponding to the user data;
obtaining network access log data corresponding to the target user from a cloud protection system;
analyzing the network access log data based on preset rules, and determining a second label set according to the analysis result, the second label set including at least one second label corresponding to the analysis result; and
determining a target label set corresponding to the target user according to the first label set and the second label set, the labels in the target label set being used to mark the target user.
Optionally, the processor 502 further has the following function:
using named entity recognition (NER) to extract information from the user data to obtain the entity information corresponding to the target user, and determining the first label set according to the entity information. If the entity information includes region information, the region information is determined as a first label in the first label set. And/or, if the entity information includes institution name information, the industry type corresponding to the institution name information is obtained and determined as a first label in the first label set.
Optionally, the processor 502 further has the following function:
determining, based on a preset industry division rule, the target industry type corresponding to the institution name information, where the preset industry division rule includes the industry types corresponding to multiple pieces of institution name information; and determining the target industry type as a first label in the first label set.
Optionally, the processor 502 further has the following function:
analyzing the network access log data to obtain analysis data, where the analysis data includes at least one of the website address corresponding to the target user, the network resource information corresponding to the request uniform resource locator (URL), the target URL, and the historical visit summary information of the website; determining, based on preset rules, the data type corresponding to the analysis data; and determining the second label set according to the data type.
Optionally, the processor 502 further has the following function:
when the analysis data includes the website address corresponding to the target user: parsing the website address to determine the root domain name corresponding to it; determining, based on a preset first classification rule that includes multiple domain name types, the target domain name type corresponding to the root domain name; and determining the target domain name type as a second label in the second label set.
Optionally, the processor 502 further has the following function:
when the analysis data includes the network resource information corresponding to the request URL: obtaining the web service corresponding to the resource type to which the network resource information belongs, together with first feature information of that web service; matching the first feature information against the feature information of a plurality of preset web services; if the matching succeeds, determining the target resource type corresponding to the first target feature information, the first target feature information being the feature information, among that of the preset web services, that successfully matches the first feature information; and determining the target resource type as a second label in the second label set.
Optionally, the processor 502 further has the following function:
when the analysis data includes the target URL: obtaining second feature information of the target URL; matching the second feature information against the feature information of a plurality of preset URLs; if the matching succeeds, determining the URL type corresponding to the second target feature information, the second target feature information being the feature information, among that of the preset URLs, that successfully matches the second feature information; and determining the URL type as a second label in the second label set.
Optionally, the processor 502 further has the following function:
when the analysis data includes the historical visit summary information of the website: obtaining from the historical visit summary information the first visit summary information within a preset time period, the first visit summary information including the number of times the target user visited the website; clustering the visit counts at each moment in the preset time period to obtain the access curve for that period, the access curve reflecting the target user's number of visits to the website at each moment; matching third feature information of the access curve against the feature information of each of a plurality of preset access curves; if the matching succeeds, determining the target label corresponding to the third target feature information, the third target feature information being the feature information, among that of the preset access curves, that successfully matches the third feature information; and determining the target label as a second label in the second label set.
All or part of the above embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used, they may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above correspond to the processes in the foregoing method embodiments, to which reference may be made; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation. Multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
Units described as separate components may or may not be physically separate. Components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the present application may be embodied in the form of a software product. This applies to the solutions in essence, to the part that contributes to the prior art, or to all or part of the technical solutions. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing embodiments are merely intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand the following. The technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.
Claims (11)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210393449.3A CN114662034A (en) | 2022-04-14 | 2022-04-14 | User marking method, user marking system, electronic device and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN114662034A | 2022-06-24 |
Family ID: 82035165
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202210393449.3A Pending CN114662034A (en) | 2022-04-14 | 2022-04-14 | User marking method, user marking system, electronic device and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN114662034A (en) |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106504099A * | 2015-09-07 | 2017-03-15 | 国家计算机网络与信息安全管理中心 | System for building a user portrait |
| CN107528749A * | 2017-08-28 | 2017-12-29 | 杭州安恒信息技术有限公司 | Website availability detection method, apparatus and system based on cloud protection logs |
| CN109561162A * | 2017-09-26 | 2019-04-02 | 北京国双科技有限公司 | Method and device for mining user access preferences |
| CN110276075A * | 2019-06-21 | 2019-09-24 | 腾讯科技(深圳)有限公司 | Model training method, named entity recognition method, device, equipment and medium |
| WO2021068608A1 * | 2019-10-11 | 2021-04-15 | 深圳壹账通智能科技有限公司 | Method and apparatus for extracting user portrait, and computer device and storage medium |
| CN114254242A * | 2022-03-01 | 2022-03-29 | 互联网域名系统北京市工程研究中心有限公司 | User profiling method and device based on recursive resolution logs |
- 2022-04-14: Application CN202210393449.3A filed in CN (publication CN114662034A, status: pending)
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115311003A * | 2022-07-12 | 2022-11-08 | 车智互联(北京)科技有限公司 | User label generation method, computing device and storage medium |
| CN116991954A (en) * | 2023-08-21 | 2023-11-03 | 青岛以萨数据技术有限公司 | Account data marking method and device, storage medium and electronic equipment |
| CN118734306A (en) * | 2024-06-14 | 2024-10-01 | 恒安嘉新(北京)科技股份公司 | A method, device, electronic device and storage medium for generating a label library |
| CN118734306B (en) * | 2024-06-14 | 2025-09-09 | 恒安嘉新(北京)科技股份公司 | Tag library generation method and device, electronic equipment and storage medium |
Similar Documents
| Publication | Title |
|---|---|
| US10043199B2 | Method, device and system for publishing merchandise information |
| US9529780B2 | Displaying content on a mobile device |
| CN102708174B | Method and device for displaying rich media information in a browser |
| US20090089278A1 | Techniques for keyword extraction from urls using statistical analysis |
| CN114662034A | User marking method, user marking system, electronic device and storage medium |
| WO2016173200A1 | Malicious website detection method and system |
| CN103685604B | Domain name pre-resolution method and device |
| WO2004084097A1 | Method and apparatus for detecting invalid clicks on the internet search engine |
| CN105868290B | Method and device for displaying search results |
| WO2014153457A1 | Merging web page style addresses |
| CN113076294B | Information sharing method and device |
| CN103647767A | Website information display method and apparatus |
| CN104023046B | Mobile terminal recognition method and device |
| US20110270691A1 | Method and system for providing url possible new advertising |
| CN105808642B | Recommendation method and device |
| WO2018145637A1 | Method and device for recording web browsing behavior, and user terminal |
| CN111415183A | Method and apparatus for processing access requests |
| CN111639283A | Corpus construction method, device, electronic device and medium |
| CN106919600A | Method and terminal for accessing an invalid web address |
| KR101499685B1 | Method for Providing Keywords Tree |
| KR20150048831A | Social context for offsite advertisements |
| JP2011209886A | Method, program, and device for annotation |
| KR20090049507A | Public opinion analysis method and system through communication network and recording medium therefor |
| CN116484133B | Webpage identification processing method and device, computer equipment and readable storage medium |
| CN107066510A | An information processing method and device |
Legal Events
| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |