CN112818249B - A method and system for constructing a multi-dimensional portrait of a specific tendency group - Google Patents
- Publication number
- CN112818249B (application CN202110244522.6A)
- Authority
- CN
- China
- Prior art keywords
- tendency
- data
- library
- target
- specific
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to a method for constructing a multi-dimensional portrait of a specific-tendency group, comprising the following steps. S1: take an input method data source as the data source for the target population. S2: from the input method data source and actual requirements, build at least one feature library, each containing a set of at least one specific-tendency feature. S3: match the input method data of the population to be analyzed against the feature library and screen out the target population with a specific tendency. S4: extract the raw text entered by the target population and build a single-channel target-population portrait library. S5: export the individual identification information and associate the network accounts that each individual holds on all Internet platforms. S6: fuse network data across Internet platforms by merging the heterogeneous network data of each platform, adjust each individual's specific tendency according to the analysis results to form a multi-dimensional portrait library of the specific-tendency target population, and adjust and refine the corresponding graded tendency feature library from S2.
Description
Technical field
The invention relates to a method for constructing a multi-dimensional portrait of a specific-tendency group.
The invention also relates to a system for constructing a multi-dimensional portrait of a specific-tendency group.
Background
With the spread of the mobile Internet and social media, the dissemination of illegal information has intensified, the means of dissemination have become increasingly covert, and the shortcomings of existing monitoring methods have become increasingly apparent. The development of the Internet poses unprecedented challenges for the departments responsible for special tasks such as countering illegal activities. The popularity of network applications, including cloud storage, instant messaging, forums and blogs, and online finance, has led people with specific illegal tendencies to make heavy use of private, group-based network tools, which greatly facilitates inciting, recruiting for, organizing, planning, and carrying out illegal activities. The traditional practice of collecting web page data with web crawlers suffers from difficult discovery, difficult tracking, insufficient deterrence, and high cost. A different approach is needed: a new mode of managing and controlling topic-specific information that genuinely improves the ability to perceive illegal and non-compliant information. Big data technology has become one of the important technical means of transforming social governance.
As the dissemination of illegal and non-compliant information that constitutes a public nuisance online intensifies and its means become more covert, the shortcomings of traditional monitoring become more apparent. It is therefore necessary to change the way of thinking, find a new approach, and innovate the management and control of such information, including information on illegal online activities, so as to improve the ability to perceive, prevent, and control its production and dissemination. In recent years the total volume of global Internet data has kept growing at a high rate. On the one hand, monitoring coverage lags far behind the growth of network-wide data, and simply adding staff or technical resources does not achieve the desired effect. On the other hand, new communication technologies and applications create monitoring blind spots, mainly closed platforms such as social media, network disks, mailing lists, and instant messaging groups, and semi-closed platforms such as friend feeds, content sharing communities, and live-stream bullet comments. As a result, the control of topic-specific harmful information is "defended at every layer, yet hard to defend at any layer".
In addition, compared with ordinary harmful information, the dissemination of illegal information is more organized and purposeful and has distinctive characteristics: content production is more covert, dissemination takes place in private groups, outbreaks are more random, and exchanges between domestic and overseas sources are frequent. These characteristics make such information very difficult to identify and monitor, and its source is hard to find or predict in advance; yet once it appears it does serious harm, which leaves regulators permanently reactive. Only a thorough study of how such illegal information is produced and disseminated can avoid "looking for a needle in a haystack" and make the work targeted and far more efficient.
Summary of the invention
An object of the invention is to provide a method for constructing a multi-dimensional portrait of a specific-tendency group. The method quickly and accurately detects people with a specific tendency, performs multi-platform tracking and comprehensive assessment based on individual identification information, and forms a multi-dimensional portrait library of the individuals in the group. This facilitates research on, and tracking analysis of, the group and its individuals, helps to identify the sources that produce and disseminate illegal and harmful information in good time, and allows users to subsequently manage that information and the people who produce and disseminate it in accordance with laws and regulations.
Another object of the invention is to provide a system for constructing a multi-dimensional portrait of a specific-tendency group that implements the above method.
The method of the invention for constructing a multi-dimensional portrait of a specific-tendency group comprises the following steps:
S1: take the input method data source as the data source for the population to be analyzed;
S2: build at least one feature library from the input method data source and actual requirements, each feature library containing a set of at least one specific-tendency feature;
S3: match the input method data of the population to be analyzed against the feature library and screen out the target population with at least one specific tendency;
S4: extract the raw text entered by the target population and build a single-channel, input-method-based target-population portrait library;
S5: export the individual identification information from the single-channel target-population portrait library, push it to each Internet platform, and associate the network accounts that the individual holds on all Internet platforms;
S6: fuse network data across Internet platforms by merging the heterogeneous network data of the platforms from S5; adjust the individual's specific tendency according to the analysis results to form a multi-dimensional portrait library of the specific-tendency target population, and adjust and refine the corresponding graded tendency feature library from S2.
With this method, people with a specific tendency can be detected quickly and accurately, multi-platform tracking and comprehensive assessment can be performed based on individual identification information, and a multi-dimensional portrait library of the individuals in the group can be formed. This facilitates research on, and tracking analysis of, the group and its individuals, helps to identify the sources that produce and disseminate illegal and harmful information in good time, and allows users to subsequently manage that information and the people who produce and disseminate it in accordance with laws and regulations.
Further, step S5 comprises:
S5.1: export the individual identification information from the single-channel target-population portrait library, including but not limited to the device identifier of the device on which the user uses the input method, the mobile phone number registered with the input method, and the email address registered with the input method;
S5.2: each Internet platform checks the individual identification information and provides the individual's account on that platform together with related data; the network accounts that members of the target population may hold on the various platforms are obtained and associated with one another to form a target-population network account library.
With this method, an individual's Internet accounts are associated through individual identification information such as the device identifier and the phone number and email address registered with the input method, and individuals in the target population are tracked and analyzed across platforms, so that information related to each individual can be tracked and managed more comprehensively.
Step S6 comprises:
S6.1: for each individual in the target-population network account library, fuse the input method data with the data from each Internet platform;
S6.2: perform a comprehensive assessment based on the fused data from S6.1, adjust the specific tendency and/or grade of each individual in the target population to form a graded multi-dimensional portrait library of the specific-tendency target population, and adjust and refine the corresponding tendency feature library from S2.
With this method, the comprehensive assessment draws on fused data, so an accurate graded multi-dimensional portrait library of the specific-tendency target population is formed from relatively complete Internet data, and the corresponding feature library is adjusted and refined accordingly to improve screening accuracy.
The method further comprises step S7, user business support: based on the graded multi-dimensional portrait library of the specific-tendency target population, different usage models are developed according to actual conditions, including but not limited to entity discovery, restoration and tracking of a target's activity trajectory, companion relationship analysis, information source tracing and diffusion analysis, social network restoration and social network mining, and other standardized basic data analysis models.
With this method, different usage models are subsequently developed on the fused data to meet complex and varied user needs.
Step S1 comprises:
S1.1 Multi-source data collection: collect data from different input methods, including but not limited to input text, input time, the platform in which the text was entered, device identifier, and registered account;
S1.2 Multi-source heterogeneous data processing: preprocess the collected data, removing noise or blank records according to a cleaning or filtering mechanism;
S1.3 Establishing the input method base library: build the input method data source base library from the preprocessed data, manage its storage, and establish a query and retrieval mechanism.
With this method, on the one hand, data is collected from different types of input method tools and includes, without limitation, input text, input time, the platform in which the text was entered, device identifier, and registered account. Data collected in this way is multi-source and rich, and therefore relatively complete, which facilitates subsequent analysis and management of the target population. On the other hand, the data is cleaned or filtered before processing to remove noise and blank records, which improves the validity of the data on the population to be analyzed.
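The preprocessing and base-library steps S1.2 and S1.3 can be illustrated with a minimal sketch. The patent does not specify a record schema, a noise rule, or a storage engine, so everything named here is an assumption made for illustration: the `InputRecord` fields, the rule that text without any alphanumeric character counts as noise, and the in-memory `BaseLibrary` with a simple equality-based query.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List


@dataclass(frozen=True)
class InputRecord:
    """One collected record. Field names are hypothetical, modeled on S1.1."""
    text: str
    timestamp: str   # input time, e.g. an ISO 8601 string
    platform: str    # application in which the text was entered
    device_id: str
    account: str


def is_blank(record: InputRecord) -> bool:
    # Blank record: empty text or whitespace only.
    return not record.text.strip()


def is_noise(record: InputRecord) -> bool:
    # Assumed noise rule: the text has no letter or digit at all
    # (str.isalnum is also true for CJK characters).
    return not any(ch.isalnum() for ch in record.text)


def preprocess(records: Iterable[InputRecord]) -> List[InputRecord]:
    """S1.2: drop blank and noise records, keep the rest in order."""
    return [r for r in records if not is_blank(r) and not is_noise(r)]


class BaseLibrary:
    """S1.3: in-memory stand-in for the base library with a query mechanism."""

    def __init__(self, records: Iterable[InputRecord]) -> None:
        self._records = list(records)

    def query(self, **criteria: Any) -> List[InputRecord]:
        # Return records whose fields equal every given criterion.
        return [
            r for r in self._records
            if all(getattr(r, name) == value for name, value in criteria.items())
        ]


raw = [
    InputRecord("hello", "2021-03-05T10:00:00", "chat", "d1", "a1"),
    InputRecord("   ", "2021-03-05T10:01:00", "chat", "d1", "a1"),
    InputRecord("!!!", "2021-03-05T10:02:00", "forum", "d2", "a2"),
    InputRecord("你好", "2021-03-05T10:03:00", "forum", "d2", "a2"),
]
library = BaseLibrary(preprocess(raw))
forum_records = library.query(platform="forum")
```

A production system would replace the in-memory list with a real store and a richer cleaning mechanism; the sketch only shows where cleaning sits relative to storage and retrieval.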
Step S2 comprises:
S2.1 Initial feature library construction: build an initial feature library covering at least one specific tendency according to actual requirements;
S2.2 Graded labeling: grade and label the attributes of each specific tendency in the initial feature library according to a grading standard, so that the degree of the tendency can be distinguished when a specific-tendency feature is characterized;
S2.3 Feature library supplementation: analyze and study the text from S6, and add newly discovered text that characterizes a specific tendency to the corresponding specific-tendency feature library;
S2.4 Feature library adjustment: check the text that characterizes a specific tendency against the text entered through the input method by the target population in S6, and adjust the content and grade labels of the corresponding specific-tendency feature library according to the result.
With this method, an initial feature library containing a set of at least one specific-tendency feature is built first, and within that library the target population is graded and labeled according to how often the relevant feature words appear, so that people can later be managed differently according to their grade. The feature library and grading standard are then adjusted and refined according to the final matching results, which continuously improves the accuracy of the method's assessments. In addition, the raw input method data of part of the target population is analyzed intelligently to extract new words that reflect a specific tendency. These include text with a certain discriminating power, such as specific terms, code words, slang, jargon, slogans, abbreviations, nicknames, and aliases, and combinations of these, as well as names of people, places, and organizations that carry a specific tendency, and combinations of these. Extraction algorithms are built for such text so that new words and veiled expressions are discovered in time and the feature library is continuously updated and refined, improving both the accuracy and the speed of screening.
The invention also provides a system for constructing a multi-dimensional portrait of a specific-tendency group, comprising:
an input method data source subsystem, which collects and stores the input method data of the population to be analyzed;
a tendency feature library subsystem, which holds a set of features and feature grading information for at least one specific tendency;
a tendency matching subsystem, which compares the data in the input method data source subsystem with the feature information in the tendency feature library subsystem, labels the people who show a given specific tendency, and screens out the target population with at least one specific tendency;
a target-population network account subsystem, which associates the network accounts that an individual holds on all Internet platforms according to the individual identification information of the target population; and
a cross-platform network data fusion subsystem, which fuses the heterogeneous network data of each Internet platform, adjusts the individual's specific tendency according to the analysis results to form a multi-dimensional portrait library of the specific-tendency target population, and adjusts and refines the corresponding graded tendency feature library.
The target-population network account subsystem comprises:
a target-population individual identification database, formed by exporting, from the single-channel input-method-based target-population portrait library of the tendency matching subsystem, the device identifier, registered mobile phone number, registered email address, and other individual identification information involved when the individual uses the input method; on this basis, Internet companies, including video websites, cloud storage services, and social networks, are required to check these identifiers on their own platforms;
a target-population network account library, formed when each Internet company checks its platform against the individual identification database and collects the network accounts that individuals with one or more specific tendencies may have registered on the platform it operates.
The cross-platform network data fusion subsystem comprises:
a multi-source heterogeneous network data fusion module, which, based on the target-population network account library, requires each Internet platform to provide the network data of the relevant accounts, and fuses the heterogeneous network data of each individual with a specific tendency from multiple Internet platforms with that individual's input method data;
a comprehensive tendency assessment and grading module, which performs a comprehensive assessment of each individual's fused cross-platform data, confirms the individual's specific tendency and grade a second time, and adjusts and refines the corresponding graded tendency feature library in the tendency feature library subsystem;
a graded multi-dimensional portrait library of the specific-tendency target population, formed from relatively complete Internet data according to the analysis and confirmation results of the comprehensive tendency assessment and grading module.
The system further comprises a tendency feature discovery and verification subsystem, which comprises:
a target-population raw input text library, which extracts the raw text entered by the target population from the single-channel input-method-based target-population portrait library in the tendency matching subsystem and serves as the basic data source of this subsystem;
a tendency feature discovery module, which extracts from the raw input text library the text that characterizes a tendency but has not yet been entered in the tendency feature library subsystem and, after verification, adds it to the corresponding feature library as a new feature;
a tendency feature verification module, which performs a second comprehensive assessment of the raw text entered by the target population in the single-channel input-method-based target-population portrait library of the tendency matching subsystem, verifies the tendency shown by the target population and its grade, and adjusts the content and grade labels of the tendency feature library subsystem according to the verification result.
The system further comprises a user business support subsystem, which comprises:
a standardized basic data analysis model library, which uses the graded multi-dimensional portrait library of the specific-tendency target population as its data and consists of data analysis modules with fine granularity, low coupling, and a unified interface; each analysis module runs independently and has standardized input and output interfaces;
a user business support module library, which builds on the standardized basic data analysis model library and, according to the user's own business needs, assembles the analysis modules into the required user business support modules.
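The idea of fine-grained, loosely coupled analysis modules with a unified interface that are assembled into business support modules can be sketched as follows. This is an illustration under stated assumptions, not the patented design: the dictionary-in, dictionary-out interface (`AnalysisModule`), the two toy modules `count_items` and `distinct_sources`, and the `assemble` helper are all hypothetical names.

```python
from typing import Any, Callable, Dict, List

# Assumed unified interface: every analysis module takes a data dictionary
# and returns a dictionary of named results.
AnalysisModule = Callable[[Dict[str, Any]], Dict[str, Any]]


def count_items(data: Dict[str, Any]) -> Dict[str, Any]:
    """Toy module: number of items in the input."""
    return {"count": len(data["items"])}


def distinct_sources(data: Dict[str, Any]) -> Dict[str, Any]:
    """Toy module: sorted list of distinct item sources."""
    return {"sources": sorted({item["source"] for item in data["items"]})}


def assemble(modules: List[AnalysisModule]) -> AnalysisModule:
    """Combine independent modules into one support module.

    Each module runs on the same input and knows nothing about the
    others; their results are merged into a single dictionary.
    """
    def support_module(data: Dict[str, Any]) -> Dict[str, Any]:
        result: Dict[str, Any] = {}
        for module in modules:
            result.update(module(data))
        return result

    return support_module


summary = assemble([count_items, distinct_sources])
report = summary({"items": [{"source": "b"}, {"source": "a"}, {"source": "b"}]})
```

Because the assembled module has the same signature as its parts, assembled modules can themselves be passed to `assemble`, which is one way to obtain the low coupling the text asks for.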
In this embodiment, the graded multi-dimensional portrait library of the specific-tendency target population is the core database formed by the system and supports users in carrying out a variety of business tasks.
The invention takes the data obtained by input method tools as its basic data source. It quickly and accurately detects people with a specific tendency, performs multi-platform tracking and comprehensive assessment based on individual identification information, and forms a multi-dimensional portrait library of the individuals in the group, which facilitates research on, and tracking analysis of, the group and its individuals and helps to keep the relevant information current for subsequent analysis and management. Individuals in the target population are tracked and analyzed across platforms, and their specific tendency and grade are assessed comprehensively, so that the specific tendency of the target population and its grading can be further confirmed and adjusted. The input method data source can be used to sort people by tendency and, in particular, to classify those who disseminate dangerous information, which helps to identify the sources of such information in good time and to deal promptly with the information and the people who disseminate it.
Description of the drawings
FIG. 1 is a schematic flowchart of the method for constructing a multi-dimensional portrait of a specific-tendency group according to the invention.
Detailed description
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.
In the description of the invention, it should be noted that orientations or positional relationships indicated by terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", and "outer" are based on those shown in the drawings. They are used only to facilitate and simplify the description and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they therefore should not be construed as limiting the invention. The terms "first", "second", and "third" are used for descriptive purposes only and should not be construed as indicating or implying relative importance. Unless otherwise expressly specified and limited, the terms "installed", "coupled", and "connected" should be understood broadly: a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediate medium; and it may be an internal connection between two elements. A person of ordinary skill in the art can understand the specific meaning of these terms in the invention according to the circumstances.
实施例1Example 1
如图1所示,本实施例提供一种特定倾向性人群的多维画像构建方法,具体包括如下步骤:As shown in FIG. 1 , this embodiment provides a method for constructing a multi-dimensional portrait of a specific tendency group, which specifically includes the following steps:
S1:将输入法数据源作为待分析人群数据来源;S1: Use the input method data source as the data source of the population to be analyzed;
S2:根据输入法数据源和实际需求构建至少一个特征库,所述特征库具有至少一个特定倾向性特征的集合;S2: constructing at least one feature library according to the input method data source and actual requirements, the feature library having at least one set of specific tendency features;
S3:将待分析人群的输入法数据源与特征库进行匹配,筛选出具有至少一个特定倾向性的目标人群;S3: Match the input method data source of the population to be analyzed with the feature database, and screen out the target population with at least one specific tendency;
S4:提取目标人群的输入的原始文本数据,并构建基于输入法的单通道目标人群画像库;S4: Extract the input original text data of the target group, and construct a single-channel target group portrait library based on the input method;
S5:导出单通道目标人群画像库中的个体标识类信息,并将其推送给各互联网平台,并将该个体在所有互联网平台存在的网络账号进行关联;S5: Export the individual identification information in the single-channel target group portrait database, push it to each Internet platform, and associate the individual's online accounts on all Internet platforms;
S6:跨互联网平台网络数据融合,将S5中的各个互联网平台的异构网络数据融合,根据分析结果对该个体的特定倾向性进行调整并形成特定倾向性目标人群多维画像库,且对S2中的相关倾向性分级特征库进行调整和完善。S6: Cross-Internet platform network data fusion, integrate the heterogeneous network data of each Internet platform in S5, adjust the specific tendency of the individual according to the analysis results, and form a multi-dimensional portrait library of the target population with specific tendency, and analyze the data in S2. The related propensity grading feature library can be adjusted and improved.
For example, in practice a target individual may rarely enter text related to a specific tendency when using the input method, yet share resources related to that tendency through programs such as cloud drives. In that case, the individual's specific tendency and attribute level need to be adjusted according to the tracked data, to further improve the accuracy of the screened target population.
In this embodiment, the data obtained by input method tools serves as the basic data source for screening and classifying populations with a particular tendency, which helps keep information on the relevant populations current and facilitates subsequent analysis and management. Compared with other analysis methods, it offers high screening accuracy, high identifiability of the target population, and convenient follow-up management and tracking. Individuals in the target population are tracked and analyzed across multiple platforms by means of individual identification information, and their specific tendencies and grades are comprehensively assessed, so that the specific tendencies and tendency grades of the target population can be further confirmed and adjusted. This multi-platform tracking and comprehensive assessment also yields a multi-dimensional portrait library of individuals with tendencies, which supports research on and tracking analysis of such populations at both the group and the individual level. Through multi-dimensional perception of the population, this embodiment classifies the people who spread dangerous information, which helps identify the sources of such information promptly and facilitates timely handling of the dangerous information and of the people spreading it.
In this embodiment, the input method data sources are obtained in a lawful and compliant manner. Compared with traditional web-based data collection (such as web crawlers), the basic data obtained through input method tools has the following advantages:
First, the data source is comprehensive and stable. An input method is essential software for Internet users to enter text and publish information online; forums, Weibo, QQ, and WeChat all depend on it. Input methods have broad user coverage, high user stickiness, a low replacement rate, and a high permission level. To improve the user experience, each input method has a built-in data collection function: when a user uses an Internet-connected device, the input method automatically collects the user's input content, the device identification code, the network platform in use, and other information, and uploads it to a back-end database. It should be noted that when an input method tool is installed, the user must accept the vendor's lengthy "user agreement" before installation can proceed, and that agreement expressly provides for the collection of this information; in other words, the input method collects data with the user's permission.
Second, the supply side of input method data is relatively concentrated. At present, the input method products with the largest market shares account for the vast majority of the market, which makes it easier to coordinate data sources.
Third, the data types are relatively comprehensive. Each input method's built-in data collection function automatically collects the input time, the input text, the device identification code, the network platform where the input occurs, and other information while the software is in use, and uploads it to a back-end database for storage. This provides important support for the data analysis carried out by users of the method.
In this embodiment, the method is applied to screening and tracking online populations engaged in illegal activities, for example in the telecommunications field. The responsible departments can apply different management measures to key individuals according to the analysis results, improving the efficiency of supervision and the speed of tracking and handling for such populations. That is, this embodiment intelligently classifies populations with specific tendencies, which helps identify promptly the sources that produce and spread such illegal and harmful information, and allows users to manage that information and the people who produce and spread it in a timely manner and in accordance with laws and regulations.
Further, step S5 includes:
S5.1: Export the individual identification information from the single-channel target population portrait library, including but not limited to: the identification code of the device on which the user uses the input method, the mobile phone number registered with the input method, and the email address registered with the input method;
S5.2: Each Internet platform checks the individual identification information and provides the individual's accounts and related data on that platform. The network accounts that members of the target population may hold on the platforms are obtained and associated with one another to form a target population network account library.
This embodiment uses individual identification information, such as the device identification code and the mobile phone number and email address registered with the input method, to associate each individual's Internet accounts, and tracks and analyzes individuals in the target population across multiple platforms, so that information related to each individual can be tracked and managed more comprehensively.
Step S6 includes:
S6.1: For each individual in the target population network account library, fuse the input method data with the data from the Internet platforms;
S6.2: Carry out a comprehensive assessment based on the fused data from S6.1, adjust the specific tendency and/or grade of each individual in the target population to form a graded multi-dimensional portrait library of the target population with specific tendencies, and adjust and refine the feature library for the relevant tendency from S2.
This embodiment carries out a comprehensive assessment on the fused data, so that an accurate graded multi-dimensional portrait library of the target population with specific tendencies is formed from relatively comprehensive Internet data, and the corresponding feature library is adjusted and refined accordingly to improve screening accuracy.
The method further includes step S7: user business support. Using the data fused in S5, different usage models are developed according to actual conditions, including but not limited to: entity discovery, restoration and tracking of target activity trajectories, accompanying-relationship analysis, information tracing and diffusion analysis, social network restoration and social network mining, or other standardized basic data analysis models. These models can also be combined in custom ways to generate modules including but not limited to: ① a sensitive topic discovery and tracking module; ② a gang discovery and control module; ③ a gang organizational structure and personnel relationship analysis module; ④ a gang regional movement monitoring module; ⑤ a group behavior early warning module; ⑥ a module for monitoring links between persons with specific tendencies inside and outside the country; ⑦ an information visualization and monitoring module, so as to meet the user's complex and diverse business needs. This embodiment thus uses the fused data to develop different usage models that meet complex and diverse user requirements.
Step S1 of this embodiment includes:
S1.1 Multi-source data collection: collect data from different input methods, including but not limited to: input text, input time, the platform where the input occurs, device identification code, and registered account;
S1.2 Multi-source heterogeneous data processing: preprocess the collected data, removing noise or blank information according to a cleaning or filtering mechanism;
S1.3 Establishment of the input method data source base library: construct the base library from the preprocessed data, manage its storage, and establish a query and retrieval mechanism.
With this method, on the one hand, input data is collected from different types of input method tools, including but not limited to input text, input time, the platform where the input occurs, device identification code, and registered account. Data collected this way is multi-source and rich, which makes it relatively complete and comprehensive and facilitates subsequent analysis and management of the target population. On the other hand, the data is cleaned or filtered before processing to remove noise or blank information, which improves the validity of the data on the population to be analyzed.
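The cleaning step in S1.2 is described only in general terms. As a minimal sketch of what "removing noise or blank information" could look like, the following Python filters a list of records. The `Record` layout, the field names, and the noise heuristic (fewer than two characters, or no letters or digits) are assumptions made for illustration; the source does not specify any of them.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Record:
    """Hypothetical record layout; the field names are illustrative only."""
    text: str
    timestamp: str
    platform: str


def is_blank(record: Record) -> bool:
    """A record is blank when its text is empty or whitespace only."""
    return not record.text.strip()


def is_noise(record: Record, min_chars: int = 2) -> bool:
    """Assumed heuristic: too short, or containing no letters or digits."""
    stripped = record.text.strip()
    return len(stripped) < min_chars or not any(ch.isalnum() for ch in stripped)


def clean(records: Iterable[Record]) -> List[Record]:
    """Keep only records that are neither blank nor noise, preserving order."""
    return [r for r in records if not is_blank(r) and not is_noise(r)]
```

A real cleaning mechanism would need rules tuned to the actual data; the two predicates here only show where such rules would plug in.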
Step S2 includes:
S2.1 Initial feature library construction: construct an initial feature library covering at least one specific tendency according to actual requirements;
S2.2 Graded labeling: grade and label the attributes of a given specific tendency in the initial feature library according to a grading standard, so that the degree of tendency can be distinguished when a feature is used to characterize that tendency;
S2.3 Feature library supplementation: analyze and study the text information from S6, and add newly discovered text that characterizes a specific tendency to the corresponding feature library;
S2.4 Feature library adjustment: check the text that characterizes specific tendencies against the text entered through the input method by the target population in S6, and adjust the content and grade labels of the corresponding feature library according to the result.
In plain terms, the specific scheme of this embodiment is as follows. S1: Use input method data sources as the source of data on the population to be analyzed;
S2: Construct at least one feature library according to the input method data sources and actual requirements. Each feature library comprises a set of features for at least one specific tendency, each feature carries a grade label, and each feature library can be updated iteratively;
A specific tendency feature is defined as text, or a combination of text, with a certain degree of distinctiveness, such as specific terms, code words, argot, jargon, slogans, short names, abbreviations, and aliases; or text, or a combination of text, that carries a specific tendency meaning, such as personal names, place names, and organization names;
The grade label of a specific tendency feature indicates the degree of tendency that the feature expresses when it characterizes a given tendency.
S3: Match the input method data of the population to be analyzed against the feature library to determine the specific tendencies of the population and their grades, screen out a target population having at least one specific tendency, and grade the target population according to the grade labels of the matched features.
This embodiment first constructs an initial feature library comprising a set of features for at least one specific tendency. Within that library, the target population is graded and labeled according to the relevant feature words or the frequency with which feature words occur in the target population's input, so that individuals can subsequently be managed differently according to their labeled grade. In addition, the feature library and grading standard are adjusted and refined according to the final matching results, continuously improving the accuracy of the method's analysis. At the same time, the raw input method data of part of the target population with specific tendencies is analyzed to extract new words that reflect those tendencies: text, or combinations of text, with a certain degree of distinctiveness, such as specific terms, code words, argot, jargon, slogans, short names, abbreviations, and aliases, or text carrying a specific tendency meaning, such as personal names, place names, and organization names. Corresponding extraction algorithms are constructed so that new words and obscure expressions are discovered promptly and the feature library is continuously updated and refined, improving both the accuracy of the method's analysis and its screening speed.
In summary, in this embodiment individuals in the target population are tracked and analyzed across multiple platforms, and their specific tendencies and grades are comprehensively assessed, so that the specific tendencies and tendency grades of the target population can be further confirmed and adjusted. The fused data can also be used to develop different usage models that meet complex and diverse user requirements.
Embodiment 2
This embodiment provides a system for constructing a multi-dimensional portrait of a population with a specific tendency, comprising:
an input method data source subsystem, which collects and stores the input method data of the population to be analyzed;
a tendency feature library subsystem, which holds a set of feature information for at least one specific tendency;
a tendency matching subsystem, which compares the data in the input method data source subsystem with the feature information in the tendency feature library subsystem, marks the people who have a given specific tendency, and screens out a target population having at least one specific tendency;
a target population network account subsystem, which associates the network accounts that an individual holds on the Internet platforms according to the individual identification information of the target population; and
a cross-platform network data fusion subsystem, which fuses the heterogeneous network data from the Internet platforms, adjusts each individual's specific tendency according to the analysis results to form a multi-dimensional portrait library of the target population with specific tendencies, and adjusts and refines the relevant graded tendency feature library.
The input method data source subsystem includes:
a multi-source data collection module, which gathers the various input method data of the population to be analyzed, including but not limited to: input text, input time, the platform where the input occurs, device identification code, and registered account;
a multi-source data preprocessing module, which preprocesses the collected data, removing noise or blank information according to a cleaning or filtering mechanism; and
an input method database module, which constructs the input method data source base library from the preprocessed data, manages its storage, establishes a query and retrieval mechanism, and provides multiple data interfaces to support the mining of populations with specific tendencies.
The tendency feature library subsystem includes:
a feature library initialization module, which constructs an initial feature library with features for at least one specific tendency according to actual requirements, or imports an existing initial feature library; and
a feature grading and labeling module, which grades and labels the attributes of a given specific tendency in the initial feature library according to a grading standard.
The tendency feature library subsystem updates or refines the feature library initialization module and the feature grading and labeling module according to the matching results of the tendency matching subsystem.
The tendency matching subsystem includes:
a population-specific tendency matching module, which performs correlation analysis between the data of the input method database module in the input method data source subsystem and the feature information of the feature grading and labeling module in the tendency feature library subsystem, marks the people who have one or more specific tendencies, screens out a target population having at least one specific tendency, and assigns grade labels to those tendencies; and
a single-channel target population portrait library based on the input method, which fuses the text entered through the input method by the individuals identified by the matching module with the input time, the platform where the input occurs, the device identification code, the registered account, and other information, and analyzes the result to form the single-channel target population portrait library.
In this embodiment, the input method data source subsystem, on the one hand, collects input data from different types of input method tools, including but not limited to input text, input time, the platform where the input occurs, device identification code, and registered account. Data collected this way is multi-source and rich, which makes it relatively complete and comprehensive and facilitates subsequent analysis and management of the target population. On the other hand, the data is cleaned or filtered before processing to remove noise or blank information, which improves the validity of the data on the population to be analyzed. The tendency feature library subsystem first constructs an initial feature library covering at least one specific tendency. Within that library, the target population is graded and labeled according to the relevant feature words or the frequency with which feature words occur in the target population's input, so that individuals can subsequently be managed differently according to their labeled grade. In addition, the feature library and grading standard are adjusted and refined according to the final matching results, continuously improving the accuracy of the analysis.
In other words, this embodiment uses the data obtained by input method tools as the basic data source for screening and classifying populations with a particular tendency, which helps keep information on the relevant populations current and facilitates subsequent analysis and management. Compared with other analysis methods, it offers high screening accuracy, high identifiability of the target population, and convenient follow-up management and tracking. Individuals in the target population are tracked and analyzed across multiple platforms by means of individual identification information, and their specific tendencies and grades are comprehensively assessed, so that the specific tendencies and tendency grades of the target population can be further confirmed and adjusted. Through multi-dimensional perception of the population, this embodiment classifies the people who spread dangerous information, which helps identify the sources of such information promptly and facilitates timely handling of the dangerous information and of the people spreading it.
The target population network account subsystem includes:
a target population individual identification information database, formed by exporting, from the single-channel target population portrait library of the tendency matching subsystem, the device identification code, registered mobile phone number, registered email address, and other individual identification information associated with each individual's use of the input method; and
a target population network account library, formed when each Internet company checks the individual identification information database and collects the network accounts that individuals with one or more specific tendencies may have registered on the Internet platforms it operates.
In this embodiment, the single-channel target population portrait library of the tendency matching subsystem outputs the device identification code, registered mobile phone number, registered email address, and other individual identification information associated with each person's use of the input method, and this information is collected to form the database. On that basis, Internet companies, including video websites, cloud drive services, and social networks, are required to check this individual identification information against their own records. After each company has done so, the network accounts that members of the target population with one or more specific tendencies may have registered on its platforms are collected, forming a comprehensive target population network account library that supports more complete tracking and management of information related to each individual.
The cross-platform network data fusion subsystem includes:
a multi-source heterogeneous network data fusion module: based on the target population network account library, each Internet platform is required to provide the network data of the relevant accounts, and the heterogeneous network data of each individual with a specific tendency across the platforms is fused with that individual's input method data;
a comprehensive tendency assessment and grading module, which assesses each individual's fused cross-platform data, confirms the individual's specific tendency and grade a second time, and adjusts and refines the relevant graded tendency feature library in the tendency feature library subsystem; and
a graded multi-dimensional portrait library of the target population with specific tendencies, formed from the analysis and confirmation results of the comprehensive tendency assessment and grading module.
In this embodiment, the target population network account library is passed to the Internet platforms, which provide the network data of the relevant accounts. The heterogeneous network data from the platforms is fused and, after comprehensive tendency assessment and grading, a highly accurate graded multi-dimensional portrait library of the target population with specific tendencies is generated.
Further, this embodiment also includes a tendency feature discovery and verification subsystem, which includes:
a target population raw input text library, which extracts the raw text entered by the target population from the single-channel target population portrait library in the tendency matching subsystem and serves as the basic data source of this subsystem;
a tendency feature discovery module, which extracts from the raw input text library any text that characterizes a specific tendency but has not yet been recorded in the tendency feature library subsystem and, after verification, adds it to the corresponding feature library as a new feature; and
a tendency feature verification module, which carries out a second comprehensive assessment of the raw text entered by the target population in the single-channel target population portrait library, verifies the tendencies and grades that the target population exhibits, and adjusts the content and grade labels of the tendency feature library subsystem according to the result.
In this embodiment, the tendency feature discovery module extracts specific tendency features that are not yet stored and, after verification, adds them to the corresponding feature library as new features. The raw input method data of the target population with specific tendencies is analyzed to extract newly discovered features, so that new words, obscure expressions, and their combinations are found promptly and the feature library is continuously updated and refined, improving the accuracy and adaptability of the method's analysis.
The tendency feature verification module carries out a second comprehensive assessment of the raw text entered by the target population from S3, further verifying the tendencies and grades that this population exhibits. Based on the result, the content and grade labels of the relevant tendency feature library in S2 are adjusted, so that the library is continuously refined and the accuracy and adaptability of the method's analysis improve.
Further, this embodiment also includes a user business support subsystem, which includes:
a standardized basic data analysis model library: with the graded multi-dimensional portrait library of the target population with specific tendencies as its data support, data analysis modules are developed that are fine-grained, loosely coupled, and share a unified interface; each module can run independently and exposes standardized input and output interfaces, and together the modules form the model library; and
a user business support module library: built on the standardized basic data analysis model library and driven by the user's own business needs, the required user business support modules are formed by assembling the analysis modules.
In this embodiment, on the one hand, data analysis components that are fine-grained, loosely coupled, and share a unified interface are developed on top of the graded multi-dimensional portrait library of the target population with specific tendencies. Each analysis model can run independently to analyze data and provide important leads, and can also be combined with others through its standardized input and output interfaces, so that together the models form a standardized basic data analysis model library. The models that can be developed include but are not limited to: entity discovery, restoration and tracking of target activity trajectories, accompanying-relationship analysis, information tracing and diffusion analysis, and social network restoration and social network mining. Scripts written to a predetermined standard can quickly assemble these into specific data analysis models, forming a chain of monitoring, control, and perception.
On the other hand, based on the standardized basic data analysis model library and the user's own business needs, the above analysis modules can be assembled into user business support modules, including but not limited to: a sensitive topic discovery and tracking module, a gang discovery and control module, a gang organizational structure and personnel relationship analysis module, a gang regional movement monitoring module, a group behavior early-warning module, a module for monitoring links between domestic and overseas persons with specific tendencies, and an information visualization and monitoring module.
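The idea of small, loosely coupled modules with a unified interface that are assembled by script can be sketched generically as follows. This is only an illustrative sketch: the patent does not specify an interface, so the record type, the `assemble` helper, and the two placeholder modules are hypothetical.

```python
from typing import Any, Callable, Dict, List

# Assumed uniform interface: every module takes a list of records
# and returns a list of records, so modules can be chained freely.
Record = Dict[str, Any]
Module = Callable[[List[Record]], List[Record]]


def assemble(*modules: Module) -> Module:
    """Chain independently runnable modules into one composite module."""
    def composite(records: List[Record]) -> List[Record]:
        for module in modules:
            records = module(records)
        return records
    return composite


# Two hypothetical placeholder modules; each one also works on its own.
def drop_empty(records: List[Record]) -> List[Record]:
    """Discard records whose text field is missing or blank."""
    return [r for r in records if r.get("text", "").strip()]


def add_length(records: List[Record]) -> List[Record]:
    """Annotate each record with the length of its text field."""
    return [{**r, "length": len(r["text"])} for r in records]


pipeline = assemble(drop_empty, add_length)
```

Because the composite has the same signature as its parts, an assembled pipeline can itself be passed to `assemble` as a module.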
In summary, this embodiment takes the hierarchical multi-dimensional portrait library of the specific-tendency target group as the core database formed by the system, supporting users in carrying out diversified business work.
Example 3
This embodiment provides a method for constructing a multi-dimensional portrait of a group with a specific tendency, which comprises the following steps:
S1: Use the input method data source as the data source for the population to be analyzed;
S1.1 Multi-source data collection: collect data from different input methods, including but not limited to: input text, input time, the platform on which the input was made, device identification code, and registered account. The input methods include Tencent Input Method, Baidu Input Method, Sogou Input Method, and the like. Data collected in this way comes from multiple sources and is rich in content, which makes it relatively complete and comprehensive and facilitates subsequent analysis and management of the target group;
S1.2 Multi-source heterogeneous data processing: preprocess the collected data and remove noise or blank information according to a cleaning or screening mechanism. The raw data is of many types and may contain a large amount of irrelevant, useless information; cleaning or screening is achieved by, for example, removing stop words, discarding data that is too short, too long, or blank, or restricting the data to specified source types;
S1.3 Establishment of the input method data source basic library: build the input method data source basic library from the preprocessed data, manage its storage, and establish a query and retrieval mechanism. This provides efficient storage management and query retrieval for massive text information, so that the data source can be used subsequently and reused efficiently across multiple analysis directions;
S2: Construct a feature library from the input method data source, the feature library having a certain tendency feature;
S2.1 Initial feature library construction: build an initial feature library having at least one tendency feature according to regulatory requirements;
S2.2 Graded identification: grade and label the attributes of the tendency features in the regulatory department's feature library, so that the regulatory department can adopt different tracking or management approaches for persons at different grades and thereby deploy its supervision work with focus;
S2.3 Feature library supplementation: analyze and study the text information in S6, and add newly discovered text information that characterizes a specific tendency feature to the relevant specific-tendency feature library;
S2.4 Feature library adjustment: based on the text information entered through the input method by the target group in S6, check the text information that characterizes specific tendency features, and adjust the content and graded labels of the corresponding specific-tendency feature library according to the check result. The feature library and grading standards are adjusted and refined according to the final matching results, so as to continuously improve the accuracy with which the supervised target group is screened;
S3: Match the input method data source of the population to be analyzed against the feature library and screen out the target group having a certain tendency, so as to determine its risk level or degree of danger; the regulatory department takes corresponding measures as needed;
S4: Extract the original text data entered by the target group, and construct a single-channel target group portrait library based on the input method;
S5: Export the individual identification information in the single-channel target group portrait library, push it to each Internet platform, and associate the network accounts that the individual holds on all Internet platforms;
Specifically, step S5 includes:
S5.1 Export the individual identification information in the single-channel target group portrait library, including but not limited to: the device identification code of the device on which the user uses the input method, the mobile phone number registered with the input method, and the email address registered with the input method;
S5.2 Each Internet platform checks the individual identification information and provides the individual's accounts and related data on its platform; the network accounts that the target group may hold on each Internet platform are obtained and associated with one another to form a target group network account library;
S6: Cross-platform network data fusion: fuse the heterogeneous network data from the Internet platforms in S5, adjust the individual's specific tendency according to the analysis results to form a hierarchical multi-dimensional portrait library of the target group, and adjust and refine the relevant graded tendency feature library in S2;
Specifically, step S6 includes:
S6.1 Fuse, for each individual in the target group network account library, the input method data source with the data from each Internet platform;
S6.2 Make a comprehensive assessment based on the fused data from S6.1, grade the tendency of each individual in the target group to form the hierarchical multi-dimensional portrait library of the target group, and adjust and refine the tendency feature library in S2.
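As an illustration of the generic cleaning mechanism named in step S1.2 (removing stop words and discarding entries that are blank, too short, or too long), a minimal sketch might look like the following. The stop-word list, the length thresholds, and the function name are assumptions made for the example and are not given in the patent. Tokens are split on whitespace for simplicity; Chinese text would first need a word segmenter.

```python
from typing import Iterable, List, Set

STOP_WORDS: Set[str] = {"the", "a", "of"}  # hypothetical stop-word list
MIN_LEN, MAX_LEN = 2, 200                  # hypothetical length thresholds


def clean(texts: Iterable[str]) -> List[str]:
    """Remove stop words, then drop blank, too-short, or too-long entries."""
    cleaned: List[str] = []
    for text in texts:
        # Whitespace tokenization; split() also swallows blank-only input.
        tokens = [t for t in text.split() if t.lower() not in STOP_WORDS]
        result = " ".join(tokens)
        if MIN_LEN <= len(result) <= MAX_LEN:
            cleaned.append(result)
    return cleaned
```

Filtering on length after stop-word removal means an entry consisting only of stop words is treated the same as a blank one.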
In this embodiment, the data obtained through input method tools serves as the basic data source for screening and classifying groups with a tendency, which helps keep track of the relevant population information in a timely manner and facilitates subsequent analysis and management. Compared with other analysis methods, it offers high screening accuracy, high identifiability of the target group, and convenient follow-up management and tracking. Individuals in the target group are also tracked and analyzed across multiple platforms, and their specific tendencies and grades are assessed comprehensively, so that the specific tendencies and tendency grades of the groups of key concern can be further confirmed and adjusted. Classifying the groups that spread dangerous information helps identify the sources of such information in a timely manner and facilitates prompt handling of both the dangerous information and the groups spreading it.
This embodiment intelligently analyzes the raw input method data of part of the target group having a tendency and extracts new words that reflect a specific tendency. These include text information, and combinations thereof, such as specific terms with a certain degree of discrimination, code words, slang, jargon, slogans, short forms, abbreviations, alternative names, and aliases, as well as text information, and combinations thereof, such as personal names, place names, and organization names that carry a specific tendency meaning. Corresponding extraction algorithms are constructed to discover new words and obscure expressions in a timely manner and to continuously update and refine the feature library, thereby improving the accuracy of tendency judgment and analysis and the speed of screening.
The above is a further detailed description of the present invention in connection with specific preferred embodiments, and the specific implementation of the present invention shall not be considered limited to these descriptions. For those skilled in the art to which the present invention belongs, several equivalent substitutions or obvious modifications with the same performance or use may be made without departing from the concept of the present invention, and all of them shall be regarded as falling within the protection scope of the present invention.
Claims (6)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110244522.6A CN112818249B (en) | 2021-03-04 | 2021-03-04 | A method and system for constructing a multi-dimensional portrait of a specific tendency group |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112818249A (en) | 2021-05-18 |
CN112818249B (en) | 2022-06-21 |
Family
ID=75862895
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110244522.6A Active CN112818249B (en) | 2021-03-04 | 2021-03-04 | A method and system for constructing a multi-dimensional portrait of a specific tendency group |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112818249B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||