HK1179010A - Method and system for automated social networking graph mining and visualization - Google Patents
Method and system for automated social networking graph mining and visualization Download PDFInfo
- Publication number
- HK1179010A HK1179010A HK13106022.4A HK13106022A HK1179010A HK 1179010 A HK1179010 A HK 1179010A HK 13106022 A HK13106022 A HK 13106022A HK 1179010 A HK1179010 A HK 1179010A
- Authority
- HK
- Hong Kong
- Prior art keywords
- social
- vertex
- vertices
- entity
- entities
- Prior art date
Links
Description
Many social networking applications have recently been online on the internet to allow people to connect socially as well as professionally. Typically, such social networking applications require a user to create a user Identification (ID) and password and identify their friends in order to create a profile. However, many web pages on the world Wide Web or Internet, such as news sites, blogs, reviews, etc., describe social activities of people and other entities, even though such information is not listed on social networking application sites.
In addition, while there are many social networking applications, there are not many ways to easily determine and view social connections or relationships between people and other entities.
SUMMARY
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
The automated social networking graph mining and visualization techniques described herein mine social connections from general (not necessarily social application specific) web pages and allow for visualization of social networking relationships.
More specifically, in one embodiment, the automated social networking graph mining and visualization technique automatically mines and lays out a social networking graph of an entity. The entities may be, for example, people, organizations, or even keywords. The technique uses the distance between the name of an entity on one or more web pages and the names of related entities to determine connections between entities and the strength of those connections.
In the mining process, the input is a generic web page. The names of people or other entities and the social connections between them are automatically extracted and synthesized. For each entity name, its social network graph is defined as the union of the set of names in one or more Web pages that are connected to the entity name and the set of social connections between the connected entity names identified from the Web. These social connections may be weighted by a relationship ranking process.
In one embodiment, the social networking graph mining and visualization technique may use a layout process to generate a two-dimensional (2-D) social networking graph around names of people or other entities. In one embodiment of the technique, the layout is automatically generated using a force directed model and an energy reduction process. In this layout, the social connection weight is represented by the distance between its two endpoints. The shorter the distance, the greater the connection weight. Also, in one embodiment, the layout process aggregates names of people or other entities that are tightly connected to each other.
Drawings
The specific features, aspects, and advantages of the disclosure will become better understood with regard to the following description, appended claims, and accompanying drawings where:
FIG. 1 is an exemplary architecture for employing one exemplary embodiment of the automated social networking graph mining and visualization techniques described herein.
FIG. 2 depicts a flow diagram of an exemplary process for employing automated social networking graph mining and visualization techniques.
FIG. 3 depicts a flowchart of an exemplary process for creating a social networking graph in accordance with one exemplary embodiment of an automated social networking graph mining and visualization technique.
FIG. 4 depicts an exemplary social networking graph created by one exemplary embodiment of an automated social networking graph mining and visualization technique.
FIG. 5 depicts a graphical user interface employed in one exemplary embodiment of an automated social networking graph mining and visualization technique.
FIG. 6 is a schematic diagram of an exemplary computing device that may be used to implement automated social networking graph mining and visualization techniques.
Detailed Description
In the following description of automated social networking graph mining and visualization techniques, reference is made to the accompanying drawings, which form a part hereof, and which are shown by way of illustrative examples in which the automated social networking graph mining and visualization techniques described herein may be implemented. It is to be understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the claimed subject matter.
1.0Automated social networking graph mining and visualization techniques
The following sections provide an overview of automated social networking graph mining and visualization techniques, as well as an exemplary architecture, process, and user interface for employing the techniques.
1.1Technical overview
Social networks have recently received increased attention. Social networking applications and services such as Facebook and Twitter allow users to grow their own private social networks. However, the internet actually contains a public social network that is implicit in the text in a general web page. For example, web pages that describe social activities provide implicit social connections between people or entities mentioned on those web pages. These public implied social connections may be viewed as a vast social network graph. Little work has been done to automatically identify and present social connections between people and other entities from web pages that are not specifically designed for social networking applications.
In one embodiment, the automated social networking graph mining and visualization technique automatically identifies social networking connections from a generic web page and provides a social networking graph in a 2-D layout that allows a viewer to easily identify connections between people or entities in the graph and the strength of those connections.
In one embodiment, the techniques create a 2-D visualization layout that provides a social network diagram of a particular entity as a set of vertices (names) and edges (social connections) arranged in a radial layout based on connections and connection strengths between the entity and other entities. The owner of the social graph (e.g., the entity for which the graph was generated) is centrally located. Using the mining and layout processes described above, social networking graphs from the web are automatically generated and visualized.
1.2Exemplary architecture
FIG. 1 provides an exemplary architecture 100 for employing one embodiment of automated social networking graph mining and visualization techniques. As shown in FIG. 1, the architecture 100 employs a social networking graph mining and visualization module 102 that resides on a computing device 600 that will be discussed in more detail with reference to FIG. 6.
The social networking graph mining and visualization module 102 is used to mine and present the social networking graph 104 to a user on a display 106, which the user 108 may manipulate via a user interface 110. The web page 112 is input into the social networking graph mining and visualization module 102. Web page visual parser 114 parses web page content from each web page entered into information blocks 116. The information block 116 is input to a name extractor 118, which identifies the entity name from the information block. These entities may be, for example, people, organizations, or keywords.
The entity name tagged information block 120 is input into a social connection ranker 122 that determines and ranks one or more connections between the entities identified in the tagged information block 116. The social connection ranker 122 also identifies and assigns weights to the strengths of the connections. One exemplary process for determining the rank is discussed in more detail below. The social connection ranker 122 outputs ranked entity names and their connection weights 124. These connection weights 124 are input into a social connection aggregator 126, which aggregates all the ranked entity names and their connection weights 124, and creates the social network diagram 104 using the force direction model and outputs the diagram on the display 106. Once the portions of the social networking graph 104 are displayed on the display 106, the user 108 can manipulate the social networking graph 104 via the user interface 110 to show other portions thereof, as will be discussed in more detail below.
1.3.Exemplary Processes employed by automated social network graph mining and visualization techniques
The following paragraphs provide a description of an exemplary process for employing automated social networking graph mining and visualization techniques. It is to be understood that the order of the acts may be interchanged in some cases, and that portions of the acts may even be omitted in some cases.
1.3.1Automatic social networking graph mining
A high-level flow diagram of a process for employing one embodiment of the automated social networking graph mining and visualization technique is shown in FIG. 2. This embodiment of the technology mines social connections and their strengths from a generic web page. As shown in FIG. 2, at block 202, the web pages are entered into the process. These entered web pages may be found, for example, by a web crawler crawling the internet.
The content of each entered web page is parsed into chunks, as shown at block 204. For example, a visual parser that identifies text blocks or other contiguous data blocks may be used to parse the content of a web page. Various conventional techniques may be used to identify and parse the text or other contiguous data block. The name of the person or entity found in the information block is then identified, as shown in block 206. This may be done, for example, by a conventional name finder. Here, conventional name finder refers to any computer algorithm capable of automatically finding an entity name from a block of information.
The social connections between the entity names in the information block are then ranked, as shown in block 208. In one embodiment of the technique, the ordering takes into account the location of the name and surrounding text in the information block (e.g., the location of the name and a particular distance or number of words separating the name in the text). The closer the two names are to each other, the stronger the connection is considered to be. Details regarding the sorting process employed by one embodiment of the technique are provided below. The ranked social connections from all the pieces of information are then aggregated to determine the strength of the social connection between the entities found on the web page, as shown at block 210. In one embodiment of the social networking graph mining and visualization technique, the integration takes into account the frequency of finding the same set of names on a web page in addition to the proximity of names in the information blocks in determining the strength of the connection found for each information block. The strength of the social connection may then be used for various purposes, such as determining who a person's friends are, or for mapping these social connections to provide visual assistance in determining the strength of the social connection between the persons or other entities, as shown at block 212.
1.3.2Visual social connections
FIG. 3 provides a flow diagram of one exemplary process 300 used by automated social networking graph mining and visualization techniques to visualize social connections. In this embodiment of the technology, social connections between names of people/entities extracted from a generic web page are represented in the form of a 2-D graph having a set of vertices representing names and a set of edges representing social connections. The 2-D map is represented using a force directed model, and the layout of the 2-D map is substantially optimized or enhanced using an energy minimization process.
As shown in block 302, a list of ranked social connections is entered. The owner of the social graph (the person for whom the graph is displayed) is placed at the center of the 2D graph as the center vertex, as shown at block 304. Vertices representing names of people or other entities and having social connections to owners of the social graph are placed in different tracks around a center vertex (or in tracks around the center vertex based on ranking), as shown at block 306. The shorter the radius of a track, the stronger the social connection between the vertex and the center vertex in that track. A track around the center vertex is created for each social connection, with the vertex (entity) with the strongest social connection to the owner closest to the center vertex. As shown at block 308, the vertices may then be aggregated into different clusters according to connectivity between the vertices (e.g., aggregated according to connections between them and an energy minimization process that will be described in more detail below). Vertices in the same cluster (e.g., vertices with more than one connection between them) are placed close to each other. The vertex clusters are also placed such that the vertex clusters do not overlap each other, as shown in block 310. As indicated at block 312, the force directed model is then used to substantially optimize the uniformity of the layout of the 2D map.
FIG. 4 provides one example of a 2D map layout 400 produced by one embodiment of the technology. In this embodiment of the social networking graph mining and visualization technique, the force direction model substantially optimizes the layout to uniformity by employing a set of force and energy minimization processes:
1) a repulsive force 402 between every two vertices (an example of which is shown in fig. 4);
2) an attraction force 404 along the edge (e.g., between social connections between vertices 402);
3) repulsive forces 406 between adjacent tracks; and
4) impenetrable boundaries 408 that are modeled to isolate clusters that are not connected to each other.
5) Each object in the figure, such as each vertex, edge, track and impenetrable boundary, is either radial/tangential free, or both directions are free, and can thus move according to the force directed to the model.
Thus, there are four different forces in the force direction model at each vertex:
attraction along the edge
Repulsive force between every two vertexes
Repulsive force between adjacent tracksAnd
impenetrable boundary
Wherein i, j represents the ith vertex and the jth vertex; d represents a distance; o iskRepresents the k-th track; u shapemRepresents the m-th impenetrable boundary; c1,2,3Represents a constant; θ represents the orthogonal angle of the line passing through the apex and the central apex and the impenetrable boundary; and τ represents a constant threshold. Given the above definition, the aggregate force at one vertex is
When the forces are balanced at one vertex, the aggregate force at that vertex is zero. Thus, the technique seeks to minimize the sum of the aggregate forces of all vertices. The layout algorithm then finds the proper placement of the vertices, where
Can be considered as the energy of the layout, and thereforeThe layout algorithm actually reduces the total energy of the force model. In one embodiment of the technique, the energy reduction process is performed in an iterative manner. In each iteration, each vertex moves in the direction of the focusing force. The process stops when the energy converges to a certain level. More specifically, as the energy drops, the total displacement of the apex also decreases. This technique sets a constant threshold. When the total displacement is less than this value, the iterative process stops. Before that time, the vertex remains moving, i.e. its position changes, in each iteration.
1.3.3Ranking social connections
As previously discussed, certain embodiments of social networking graph mining and visualization techniques employ a ranking process to rank social connections between entities. The following section gives additional information about the ranking process used in one embodiment of the technique.
The ordering employed by one embodiment of the technique may be described as follows. Given entity name list X = { X =0,x1,.. } and a web page list W = { W = }0,w1,.., each web page has an information block B = { B = }0,b1,. if xi,xjIn an information block, the weight of the relationship between the two names in the information block is defined as
Rb(xi,xj)=Rd·Rc
Wherein R isdA distance measure representing a relationship, and RcA contextual measure representing a relationship. These two metrics are defined as follows:
Rc=(xi,xj)=1.0K(xi,xj)=φ
here, d (x)i,xj) Representing x in an information blocki,xjThe character distance between. If xi,xjNot all present in the information block, then d (x)i,xj) Is infinite. Namely, RdEqual to zero. Variable K (x)i,xj) Representing a set of relational keywords found between two names in the block of information, such as "wife", "friend", etc. Variable kmRepresenting a predefined keyword weight, km>1.0. In one embodiment, each relationship keyword has a predefined weight, e.g., "wife" has a weight of 2.0 and "friend" has a weight of 1.6. K (x)i,xj) Being relational keywords corresponding to two namesAnd (4) setting the weight. If for xiAnd xjWithout a relational keyword, K (x)i,xj) Is an empty collection composed ofAnd (4) showing. Otherwise, K (x)i,xj) Will have one or more values, such as {1.6, 2.0. In one embodiment, the set of keywords is collected manually. When a key from the set exists in a block of information having two entity names, a process is used to decide whether to use the key to describe the relationship between the two entity names. Thus, xi,xjIs defined as:
the ranking process may thus compute relationship weights between all entities in the information block, and may rank the entities and their associated relationship weights accordingly. However, it should be noted that other ranking processes may be used to create the ranked list of social connections.
Note that a social graph is defined as G = (V, E), where V is a set of vertices and E is a set of edges. The name of each unique entity/person found from the web is the vertex in the graph, i.e., V = { x = { (x) }0,x1,...}. At each of the weights having a non-zero weightxi,xjBetween pairs, define edge eij. Thus, E = { E = { E }ijWherein R (x)i,xj)>0。
1.4Exemplary user interface:
FIG. 5 illustrates an exemplary user interface 500 for displaying and manipulating a social graph in accordance with one embodiment of automated social networking graph mining and visualization techniques. In this embodiment, a user may search for an entity name (e.g., a person, an organization, or a keyword) by entering the name (e.g., "C" in this example) in the search box 502. The entity name (e.g., C) is shown as central vertex 504 and the other connected entities are shown as vertices 506 surrounding central vertex 504. The strength of the social connection between vertices 506 is shown as edge 508. The shorter the edge 508 between two vertices 506, the stronger the relationship between the entities represented by the connected vertices. The longer the edge 508 between two vertices 506, the weaker the relationship between the entities represented by the connected vertices. In one embodiment, the color of the vertex 506 and optionally the background behind it varies from 0 ° to 360 ° in a rainbow-like color sequence (e.g., red 510, orange 512, yellow 514, green 516, blue 518, and purple 520). In addition, the closer the nodes or vertices 506, the more similar the color of the nodes (and optionally the background behind the nodes) are. In one embodiment, the node/vertex color changes according to polar coordinates from 0 ° to 360 °.
In one embodiment, the exemplary user interface 500 also provides animation. For example, when a user clicks (e.g., selects) one of vertices 506 with a user input device, the entire social graph/map 522 changes to a new map centered around the clicked (e.g., selected) vertex. During this switch transition, there are three animations:
a) the vertices that were not originally displayed on the new map are no longer displayed.
b) Moving the initially displayed vertex, which is still in the new map, to a new location; and
c) new vertices that were not in the original map will appear and move to the location.
The user interface 500 may also include cursor controls 524 to allow the user to move over the map 522 to display other portions of the map. The user may also select individual vertices 506 to find additional information about them. For example, in one embodiment, if a user selects a vertex with a mouse or other input device, a pop-up window or display screen will be displayed with additional information about the entity associated with the selected vertex.
2.0Computing environment
Automated social networking graph mining and visualization techniques are designed to operate in a computing environment. The following description is intended to provide a brief, general description of a suitable computing environment in which automated social networking graph mining and visualization techniques may be implemented. The technology is operational with numerous general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable include, but are not limited to, personal computers, server computers, hand-held or laptop devices (e.g., media players, notebook computers, cellular telephones, personal digital assistants, voice recorders), multiprocessor systems, multiprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
FIG. 6 illustrates an example of a suitable computing system environment. The computing system environment is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the technology. Neither should the computing environment be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment. With reference to FIG. 6, an exemplary system for implementing automated social networking graph mining and visualization techniques includes a computing device, such as computing device 600. In its most basic configuration, computing device 600 typically includes at least one processing unit 602 and memory 604. Depending on the exact configuration and type of computing device, memory 604 may be volatile (such as RAM), non-volatile (such as ROM, flash memory, etc.) or some combination of the two. This most basic configuration is illustrated in fig. 6 by dashed line 606. Additionally, device 600 may also have additional features/functionality. For example, device 600 may also include additional storage (removable and/or non-removable) including, but not limited to, magnetic or optical disks or tape. Such additional storage is illustrated in FIG. 6 by removable storage 608 and non-removable storage 610. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Memory 604, removable storage 608 and non-removable storage 610 are all examples of computer storage media. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can accessed by device 600. Computer-readable media include transitory, propagating signals and computer (readable) storage media. Any such computer storage media may be part of device 600.
Device 600 may also contain communication connections 612 that allow the device to communicate with other devices and networks. Communication connection(s) 612 is one example of communication media. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal, thereby changing the configuration or state of a receiving device for the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. The term "computer-readable media" as used herein includes both storage media and communication media.
Device 600 may have various input devices 614 such as a display, keyboard, mouse, pen, camera, touch input device, etc. Output device(s) 616 such as a display 622, speakers, printer, etc. may also be included. All of these devices are well known in the art and need not be discussed at length here.
The automated social network diagram mining and visualization techniques may be described in the general context of computer-executable instructions, such as program modules, being executed by a computing device. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Automated social networking graph mining and visualization techniques may be implemented in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
It should also be noted that any or all of the above-described alternative embodiments described herein may be used in any combination desired to form additional hybrid embodiments. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. The specific features and acts described above are disclosed as example forms of implementing the claims.
Claims (10)
1. A computer-implemented process for identifying social connections between entities, comprising:
using a computing device to:
receiving a set of web pages identifying social connections between entities;
analyzing the content of each webpage into information blocks;
identifying a name of an entity in the information block;
ordering social connections between entity names in the information block taking into account the location of the names in the information block and surrounding text; and
the ranked social connections from all the pieces of information are aggregated to determine a strength of connection between entities associated with the entity name.
2. The computer-implemented process of claim 1 further comprising creating a social network diagram based on a strength of connections between entities associated with the entity name.
3. The computer-implemented process of claim 1 wherein ranking social connections between the entity names further comprises associating a connection weight with an entity name.
4. The computer-implemented process of claim 1 wherein parsing content of each web page comprises using a visual parser to parse the content into chunks of information.
5. A system for displaying social connections between entity names extracted from a generic web page in the form of creating a 2-D graph having a set of vertices representing names and a set of edges representing social connections, the system comprising:
a general purpose computing device;
computer program comprising program modules executable by the general purpose computing device, wherein the computing device is directed by the program modules of the computer program to:
inputting a ranked list of social connections between the social graph owner and the additional entities;
placing the social graph owner at the center of the 2D graph as a center vertex;
for each entity in the sorted list, placing vertices representing names of entities in the sorted list in different tracks around the center vertex, wherein the shorter the track radius, the stronger the social connection between a vertex in the track and the center vertex;
clustering the vertices into different clusters according to connectivity between the vertices;
optimizing uniformity of the 2D layout using a force direction model, wherein the force direction model comprises:
repulsive force between every two vertices;
an attractive force between the edges;
repulsive forces between adjacent tracks; and
isolating impenetrable boundaries of clusters that are not connected to each other.
6. The system of claim 5, wherein each vertex, edge, track, and non-traversable boundary is either radially or tangentially free, or both radially and tangentially free, to allow movement in response to force.
7. The system of claim 5, further comprising a graphical user interface for displaying a 2D map on a display and allowing a user to manipulate the 2D map.
8. The system of claim 5, wherein the graphical user interface further displays the vertices of the 2D map in color, wherein vertices that are close to each other are displayed in a similar color.
9. The system of claim 5, wherein the color of the vertex changes according to a polar coordinate from 0 ° to 360 ° of the position of the vertex.
10. The system of claim 5, wherein the graphical user interface further comprises an animation that includes the 2D map changing to a new 2D map when a user selects a vertex.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/780,522 | 2010-05-14 |
Publications (2)
Publication Number | Publication Date |
---|---|
HK1179010A true HK1179010A (en) | 2013-09-19 |
HK1179010B HK1179010B (en) | 2018-03-29 |
Family
ID=
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11657105B2 (en) | Automated networking graph mining and visualization | |
Hosseini et al. | Analysis of citation networks in building information modeling research | |
US8683389B1 (en) | Method and apparatus for dynamic information visualization | |
CN105760495B (en) | A kind of knowledge based map carries out exploratory searching method for bug problem | |
CN102750336B (en) | A method for personalized resource recommendation based on user relevance | |
Zhu et al. | A new reconstruction method for 3D buildings from 2D vector floor plan | |
US20140101557A1 (en) | Valence graph tool for custom network maps | |
US11789985B2 (en) | Method for determining competitive relation of points of interest, device | |
US20180181807A1 (en) | Conversion of static images into interactive maps | |
US20130013650A1 (en) | Visual and context-oriented curation platform | |
Lou et al. | Schedule a rich sentimental travel via sentimental POI mining and recommendation | |
CN104063383A (en) | Information recommendation method and device | |
US20150138203A1 (en) | Visualizing large graphs | |
TWI525460B (en) | Electronic calculating apparatus, personalized information recommending method thereof, and computer program product thereof | |
KR102338381B1 (en) | Big data based emotional information analysis and evaluation system and Driving method of the Same | |
CN114036398A (en) | Content recommendation and ranking model training method, device, equipment and storage medium | |
CN112380104A (en) | User attribute identification method and device, electronic equipment and storage medium | |
US20230289839A1 (en) | Data selection based on consumption and quality metrics for attributes and records of a dataset | |
CN113688164A (en) | Interest point query method and system based on knowledge graph correlation analysis | |
KR102603985B1 (en) | AI-based Seminar Custom Advertisement Production System | |
HK1179010A (en) | Method and system for automated social networking graph mining and visualization | |
HK1179010B (en) | Method and system for automated social networking graph mining and visualization | |
CN103870520B (en) | For searching for the device and method of information | |
Zhang et al. | An approach for exploring spatial associations in multi-layer networks based on convergent and divergent flow structures | |
Hu et al. | A multi-attribute decision analysis method based on rough sets dealing with uncertain information |