Information pushing method and device based on webpage type
Technical field
The present invention relates to the field of mobile communication technology, and in particular to an information pushing method and device based on webpage type.
Background
Data clustering is a focus of current Internet applications. After decades of development, the number of network users and the scale of the Internet have grown explosively, and the small amount of useful information is often drowned in massive Internet data; users who rely only on actively browsing webpages find it difficult to obtain key information effectively. Under these circumstances, the Internet has begun to shift from passively displaying information to actively pushing it. To make pushed information fast and accurate, all Internet information must first be screened, and data clustering is an information classification approach that establishes associations between pieces of Internet information.
Because pushed information is usually not requested by the user, it easily annoys the user, so push accuracy is particularly important. Pushed information generally includes search results, news, lifestyle and entertainment information, and advertisements. Accurate targeting of pushed information is receiving increasing attention, and pushing related information based on the type of the webpage the user is currently browsing is one way to achieve it. For example, content-based advertisement targeting adds an advertisement to the page returned to the browser, with the advertisement category matching the webpage type as closely as possible. Through data clustering, network pushing can give preference to information with a higher degree of association; however, because the webpage the user is currently browsing must be classified online in real time, stringent performance requirements are placed on the classification algorithm.
At present, webpage classification generally uses machine learning algorithms such as Naive Bayes, KNN (K-Nearest Neighbor), SVM (Support Vector Machine), and artificial neural network (ANN) algorithms. These algorithms are all based on the vector space model of documents: a model is trained on a large number of documents labeled with categories, and the trained model then predicts the category of a new webpage.
The main problems of these prior-art machine learning algorithms are as follows:
(1) A large number of category-labeled samples are required, the workload is heavy, and the quality of the classifier depends strongly on the quality of the labeled samples. Labeled webpages are typically obtained by manual labeling, which yields high-quality samples but consumes a great deal of manpower. Other approaches crawl pages from categorized Internet navigation sites or through targeted search-engine queries; these can label automatically, but the samples are of low quality and noisy, and the categories do not necessarily match one's own needs. In other words, labeled webpages are obtained either with low efficiency or with low accuracy.
(2) Some algorithms (such as ANN and SVM) are inherently complex and expensive to run; they are suitable only for offline processing and cannot be used for online real-time processing with high performance requirements, i.e., their real-time performance is poor.
When information is pushed on the basis of webpages classified in these ways, pushing is inefficient and has poor real-time performance.
Summary of the invention
In view of the defects of the prior art, the technical problem to be solved by the present invention is how to push information accurately, in real time, and efficiently.
To solve the above problem, one aspect of the present invention provides an information pushing method based on webpage type, the method comprising the steps of:
using the co-occurrence relation of history page description words obtained in advance to obtain, for each history page description word, type weights corresponding to different page types, wherein the co-occurrence relation represents the coexistence state between words; building a word classification attribute library with the type weights as word attributes; querying the word classification attribute library with current page description words obtained in real time, to obtain the type weights of the current page description words for each page type; calculating, for each page type, the sum of the type weights of the current page description words, and setting the page type with the largest sum as the type of the currently browsed webpage; and pushing network information into the webpage currently browsed by the user based on the type of the currently browsed webpage.
Preferably, the step of using the co-occurrence relation of the history page description words obtained in advance to obtain the type weights of each history page description word for different page types comprises:
building a word network using the co-occurrence relation of the history page description words;
obtaining the strength of association between the history page description words in the word network according to the co-occurrence relation;
traversing the word network to obtain the distance between the history page description words;
obtaining the type weights of each history page description word for different page types according to the initial weight given in advance to each preset category core word, the distances, the strengths of association, and a preset decay intensity.
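The first three sub-steps can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each history webpage has already been reduced to a list of description words, counts a co-occurrence at most once per page for each unordered word pair, and takes the "distance" between words to be the hop count found by breadth-first traversal, which the text does not define explicitly. The names `build_word_network` and `hop_distances` are hypothetical.

```python
from collections import Counter, deque
from itertools import combinations


def build_word_network(pages):
    """Build an undirected word network from per-page description word lists.

    Returns {word: {neighbor: co-occurrence count}}; each page adds at most 1
    to the count of every unordered word pair it contains.
    """
    counts = Counter()
    for words in pages:
        for a, b in combinations(sorted(set(words)), 2):
            counts[(a, b)] += 1
    graph = {}
    for (a, b), c in counts.items():
        graph.setdefault(a, {})[b] = c
        graph.setdefault(b, {})[a] = c
    return graph


def hop_distances(graph, source):
    """Breadth-first traversal: number of edges from `source` to each reachable word."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, {}):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist
```

Words in a different connected cluster (for example, game-related words when starting from a reading-related word) simply do not appear in the distance map.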
Preferably, the step of obtaining the strength of association between the history page description words in the word network according to the co-occurrence relation comprises:
obtaining, according to the co-occurrence relation, the number of times each pair of history page description words occurs together;
obtaining the strength of association between the history page description words by the following formula:
$S_{ij} = C_{ij} / \max(C)$
where $C_{ij}$ is the number of co-occurrences of word $i$ and word $j$, and $\max(C)$ is the maximum co-occurrence count between any two words.
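The normalization $S_{ij} = C_{ij} / \max(C)$ can be sketched as below, assuming per-page word lists as input and counting each word pair at most once per page; `association_strengths` is a hypothetical name. The most frequent pair always receives $S = 1$, and every other pair a proportional fraction of it.

```python
from collections import Counter
from itertools import combinations


def association_strengths(pages):
    """Return {(word_a, word_b): S_ab} with S_ab = C_ab / max(C).

    Pairs are stored as alphabetically sorted tuples; each page contributes
    at most one co-occurrence per pair.
    """
    counts = Counter()
    for words in pages:
        for pair in combinations(sorted(set(words)), 2):
            counts[pair] += 1
    max_c = max(counts.values(), default=0)
    if max_c == 0:
        return {}  # no pairs at all, so nothing to normalise
    return {pair: c / max_c for pair, c in counts.items()}
```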
Preferably, the step of obtaining the type weights of each history page description word for different page types according to the initial weight given to each category core word, the distances, the strengths of association, and the preset decay intensity comprises:
obtaining the type weights of each history page description word for different page types using the following formula:
where $w_j$ is the type weight of word node $j$; $i$ and $j$ are two associated word nodes in the word network; $S_{ij}$ is the strength of association between node $i$ and node $j$; $w_i$ is the type weight of node $i$; $\alpha$ is the preset decay intensity; and $d_i$ is the distance between node $i$ and the category core word. In the first calculation, $w_i$ is the initial weight given to the category core word.
Preferably, the step of pushing network information into the webpage currently browsed by the user based on the type of the currently browsed webpage comprises:
querying a preset network information database for network information whose type is the same as or similar to that of the currently browsed webpage;
pushing the obtained network information into the currently browsed webpage.
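The query-and-push step might look like the sketch below. The record layout (`type` and `content` keys), the `similar_types` table used to model "similar" types, and the `limit` parameter are all illustrative assumptions; the text does not specify how the network information database is organized or how similarity between types is defined.

```python
def select_push_items(page_type, info_db, similar_types=None, limit=3):
    """Pick network information whose type matches, or is listed as similar to, page_type.

    info_db:       list of records such as {"type": "game", "content": ...} (assumed layout)
    similar_types: optional {page_type: [similar types]} table (hypothetical)
    Exact-type matches are returned before similar-type matches.
    """
    similar = set((similar_types or {}).get(page_type, ()))
    exact = [item for item in info_db if item["type"] == page_type]
    close = [item for item in info_db if item["type"] in similar and item["type"] != page_type]
    return (exact + close)[:limit]
```

Ordering exact matches first reflects the text's preference for the "same" type over a merely "similar" one; any other ranking would work equally well in this sketch.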
In another aspect, the present invention also provides an information pushing device based on webpage type, the device comprising:
a first weight acquisition module, configured to use the co-occurrence relation of history page description words obtained in advance to obtain the type weights of each history page description word for different page types, wherein the co-occurrence relation represents the coexistence state between words;
an attribute library building module, configured to build a word classification attribute library with the type weights as word attributes;
a second weight acquisition module, configured to query the word classification attribute library with current page description words obtained in real time, to obtain the type weights of the current page description words for each page type;
a page type determining module, configured to calculate, for each page type, the sum of the type weights of the current page description words, and to set the page type with the largest sum as the type of the currently browsed webpage;
an information pushing module, configured to push network information into the webpage currently browsed by the user based on the type of the currently browsed webpage.
Preferably, the first weight acquisition module comprises:
a word network building unit, configured to build a word network using the co-occurrence relation of the history page description words obtained in advance;
a word association strength acquisition unit, configured to obtain the strength of association between the history page description words in the word network according to the co-occurrence relation;
a traversal unit, configured to traverse the word network to obtain the distance between the history page description words;
an acquisition unit, configured to obtain the type weights of each history page description word for different page types according to the initial weight given in advance to each preset category core word, the distances, the strengths of association, and a preset decay intensity.
Preferably, the word association strength acquisition unit obtains, according to the co-occurrence relation, the number of times each pair of history page description words occurs together, and obtains the strength of association between the history page description words by the following formula:
$S_{ij} = C_{ij} / \max(C)$
where $C_{ij}$ is the number of co-occurrences of word $i$ and word $j$, and $\max(C)$ is the maximum co-occurrence count between any two words.
Preferably, the acquisition unit obtains the type weights of each history page description word for different page types using the following formula:
where $w_j$ is the type weight of word node $j$; $i$ and $j$ are two associated word nodes in the word network; $S_{ij}$ is the strength of association between node $i$ and node $j$; $w_i$ is the type weight of node $i$; $\alpha$ is the preset decay intensity; and $d_i$ is the distance between node $i$ and the category core word. In the first calculation, $w_i$ is the initial weight given to the category core word.
Preferably, the information pushing module comprises:
an information query unit, configured to query a preset network information database for network information whose type is the same as or similar to that of the currently browsed webpage;
an information pushing unit, configured to push the obtained network information into the currently browsed webpage.
Compared with the prior art, the present invention provides an information pushing method and device based on webpage type. The type weights of words relative to different page types are determined from the co-occurrence relation of history page description words obtained in advance, and a word attribute library is built from these type weights. When the user browses a webpage, its page description words are obtained in real time and used to query the word attribute library, yielding the type weights of each of these words relative to the different page types. The type weights are then summed within each page type, giving a total for each page type, and the page type with the largest total is set as the type of the current page, so the current page type can be determined more accurately. The corresponding network information is then selected and pushed according to the determined page type. Because the page type can be determined accurately, repeated page type judgment and pushing are unnecessary, and accurate related information can be pushed to the user. The technical solution of the present invention therefore operates in real time, judges accurately, and greatly improves the accuracy and efficiency of information pushing.
Brief description of the drawings
Fig. 1 is a schematic flowchart of an information pushing method based on webpage type in an embodiment of the present invention;
Fig. 2 is a schematic diagram of the topology of a word network constructed in an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an information pushing device based on webpage type in an embodiment of the present invention.
Detailed description of the invention
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are preferred embodiments for implementing the present invention; the description illustrates the general principles of the invention and does not limit its scope. The scope of protection of the present invention is defined by the claims, and all other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within that scope.
Prior-art machine learning algorithms depend on a large number of labeled documents, and obtaining these labeled documents has become the bottleneck limiting their performance. The technical solution of the present invention no longer depends on document labeling: it builds a word network from the co-occurrence relation of webpage description words and uses the mapping relations between words in the network to judge page types accurately, thereby ensuring push accuracy.
Fig. 1 is a schematic flowchart of the information pushing method based on webpage type. In one embodiment of the present invention, the method comprises the following steps:
S1: Using the co-occurrence relation of history page description words obtained in advance, obtain the type weights of each history page description word for different page types, where the co-occurrence relation represents the coexistence state between words. Proceed to step S2.
S2: Build a word classification attribute library with the type weights as word attributes. Proceed to step S3.
S3: Query the word classification attribute library with current page description words obtained in real time, to obtain the type weights of the current page description words for each page type. Proceed to step S4.
Preferably, step S3 includes, but is not limited to, the following steps:
obtaining the current page description words from the webpage currently browsed by the user;
querying the word classification attribute library using the current page description words as the index, to obtain the business categories of each current page description word and its type weight relative to each of those categories, where the word classification attribute library contains the mapping relations between page description words and each page type.
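For illustration only, the word classification attribute library can be modeled as a dictionary keyed by word whose values map each page type to a type weight. The sketch below assumes the per-type weights from step S1 arrive as `{page_type: {word: weight}}`, inverts them into the library (step S2), and then performs the indexed lookup of step S3. Skipping words that are missing from the library is an assumption, and both function names are hypothetical.

```python
from typing import Dict, List

TypeWeights = Dict[str, float]  # page type -> type weight


def build_library(weights_by_type: Dict[str, TypeWeights]) -> Dict[str, TypeWeights]:
    """Step S2: invert {page_type: {word: weight}} into {word: {page_type: weight}}."""
    library: Dict[str, TypeWeights] = {}
    for page_type, word_weights in weights_by_type.items():
        for word, weight in word_weights.items():
            library.setdefault(word, {})[page_type] = weight
    return library


def lookup_type_weights(words: List[str], library: Dict[str, TypeWeights]) -> Dict[str, TypeWeights]:
    """Step S3: use each current page description word as the index into the library.

    Words absent from the library are skipped (an assumption for this sketch).
    """
    return {word: library[word] for word in words if word in library}
```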
S4: Calculate, for each page type, the sum of the type weights of the current page description words, and set the page type with the largest sum as the type of the currently browsed webpage. Proceed to step S5.
Preferably, step S4 includes, but is not limited to, the following steps:
calculating, for each business category, the sum of the type weights of the current page description words relative to that category;
setting the page type with the largest sum of type weights as the type of the currently browsed webpage.
S5: Push network information into the webpage currently browsed by the user based on the type of the currently browsed webpage.
In one or more embodiments, the history webpages can be processed offline; that is, steps S1 and S2 can be performed offline without occupying the real-time resources of the system. The history webpages can be the webpages accumulated from the local user's browsing over a period of time, or a collection of webpages gathered on the server side. Accordingly, the word classification attribute library can be built or updated locally on the user side, or built or updated on the server side and then delivered to the user side for storage and use. Processing the currently browsed webpage and pushing information involve little computation but have high real-time requirements, so they can be performed online in real time while the user browses; that is, steps S3 to S5 can be processed online.
In an embodiment of the present invention, a study of the common characteristics of HTML webpages shows that existing HTML webpages contain many tags that describe the main characteristics of the page, the most common being title, keywords, and description, and the words appearing in the description information of these tags are associated with one another according to certain category attributes. Further analysis of the tags shows that their words can be divided into category core words and category description words. A category core word is the name of a category, such as "sports" or "reading"; a category description word describes a category core word, for example "football" and "NBA" both describe "sports".
For example, the tags of the Sina sports channel page are:
<title>Sina Sports Storm_Sina</title>
<meta name="keywords" content="sports, sports news, Sina Sports Storm, Olympic Games, 2012, Olympics, NBA live, "/>
<meta name="description" content="Sina Sports provides the most professional sports news and match coverage, with the following main sections: domestic football, international football, basketball, NBA, general sports, Olympics, F1, tennis, golf, chess and cards, lottery, video, pictures, blogs, sports microblog, community forum"/>
The tags of the NetEase sports channel page are:
<title>NetEase Sports_The sports portal with attitude</title>
<meta name="keywords" content="sports, sports news, sports center, sports pictures, "/>
<meta name="description" content="Sports, sports channel: includes sports news, Premier League, Serie A, La Liga, Champions League, sports scores, football lottery, welfare lottery, sports beauties, tennis, F1, chess and cards, table tennis and badminton, sports forum, Chinese Super League, Chinese football, general sports, and more; a professional sports portal"/>
It can be seen that the words appearing in the description information of tags such as title, keywords, and description were chosen by hand to describe the main characteristics of a webpage and can be regarded as a natural form of manual labeling. Although they do not explicitly mark the category to which the webpage belongs, the selected words can be considered related to that category, so the implicit category can be inferred indirectly from them.
Preferably, in one or more embodiments of the present invention, the step of using the co-occurrence relation of the history page description words obtained in advance to obtain the type weights of each history page description word for different page types comprises:
building a word network using the co-occurrence relation of the history page description words obtained in advance, wherein the word network has a fully connected topology;
obtaining the strength of association between the history page description words in the word network according to the co-occurrence relation;
traversing the word network to obtain the distance between the history page description words;
obtaining the type weights of each history page description word for different page types according to the initial weight given in advance to each preset category core word, the distances, the strengths of association, and the preset decay intensity. The category core words can be set according to actual needs.
Preferably, in one or more embodiments of the present invention, the step of obtaining the strength of association between the history page description words in the word network according to the co-occurrence relation comprises:
obtaining, according to the co-occurrence relation, the number of times each pair of history page description words occurs together;
obtaining the strength of association between the history page description words by the following formula, so that the more often two words co-occur, the greater their strength of association:
$S_{ij} = C_{ij} / \max(C)$
where $C_{ij}$ is the number of co-occurrences of word $i$ and word $j$, and $\max(C)$ is the maximum co-occurrence count between any two words.
Preferably, in one or more embodiments of the present invention, the step of obtaining the type weights of each history page description word for different page types according to the initial weight given to each category core word, the distances, the strengths of association, and the preset decay intensity comprises:
obtaining the type weights of each history page description word for different page types using the following formula:
where $i$ and $j$ are two associated word nodes in the word network; $S_{ij}$ is the strength of association between node $i$ and node $j$; $\alpha$ is the preset decay intensity; $d_i$ is the distance between node $i$ and the category core word; $w_j$ is the type weight of node $j$ to be calculated; and $w_i$ is the already calculated type weight of node $i$. In the first calculation, $w_i$ is the initial weight given to the category core word.
The above type weight formula is applied recursively: in the first calculation, node $i$ is the category core word of the page type and $w_i$ is the initial weight of that core word; in each subsequent calculation, $w_i$ is the value obtained in the previous calculation.
The meanings of $w_j$ and $w_i$ depend on the corresponding page type. For example, if $w_i$ is the initial weight given to category core word $i$ relative to the first page type, then $w_j$ is the type weight of word $j$ relative to the first page type; if $w_i$ is the initial weight given to category core word $i$ relative to the second page type, then $w_j$ is the type weight of word $j$ relative to the second page type; and so on. In this way, the type weights of each history page description word for the different page types are obtained.
In the above embodiment, to save system resources, the process of obtaining type weights stops when $w_j$ falls below a certain threshold.
More specifically, in a preferred embodiment of the present invention, the history page description words can be obtained in advance as follows: description information is obtained from the tags of history webpages (such as title, keywords, and description), and words are then extracted from the description information to obtain the history page description words. Because this extraction can be done offline, more sophisticated methods can be used, such as statistical or semantic word segmentation.
In the above embodiment, after the word network is obtained, it is still necessary to determine which words are used to establish the page type categories. In a preferred embodiment of the present invention, the words appearing in webpage tags are divided into two kinds: category core words and category description words. In the Sina and NetEase examples above, "sports" is a category core word, while "score", "Premier League", "NBA", and the like are category description words. For a single webpage, however, it is difficult for a computer to distinguish category core words from category description words. Statistical analysis of a large number of webpages carried out for the present invention shows that category core words occur more frequently than category description words, so the two can be distinguished by counting how often each tag word occurs.
As a specific embodiment, the following are the results obtained after processing the description information of multiple webpages (the words in each page's description information after word segmentation):
Webpage 1: novel, romance, reading ...
Webpage 2: reading, novel, chapters, romance ...
Webpage 3: reading, magazine ...
Webpage 4: reading, time travel, novel ...
Webpage 5: Angry Birds, game, Need For Speed ...
Webpage 6: Wild Boar's Counterattack, Angry Birds ...
Webpage 7: game, Plants vs. Zombies ...
Webpage 8: software, Huajun ...
Webpage 9: software, tools ...
Webpage 10: automobile, Volkswagen ...
Webpage 11: automobile, Cruze, Chevrolet ...
The word network obtained after the processing described above is shown in Fig. 2. As Fig. 2 shows, edge weights are determined by the number of times words co-occur across different webpages: the more co-occurrences, the larger the weight of the connecting edge. Nodes with a high degree of association in Fig. 2, such as "reading", "game", "software", and "automobile", can be regarded as category core words, and the other words as category description words. Fig. 2 also shows that category core words are associated with other words far more than ordinary category description words are, which appears in the figure as nodes with more edges; category core words can therefore be identified by counting the edges of each node. Once the category core words are determined, each category core word defines a page type, and by default the type weight of each category core word is set to 1, serving as the basis for calculating the type weights of each category description word.
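Selecting category core words by edge count can be sketched as follows. The fixed `min_degree` cut-off is a hypothetical rule chosen for illustration: the text says only that core words have noticeably more edges, and in practice a ranking, a per-cluster maximum, or the word-frequency criterion mentioned earlier might be used instead. Each selected word is seeded with the default type weight 1.

```python
from itertools import combinations


def node_degrees(pages):
    """Degree of each word = number of distinct words it co-occurs with on any page."""
    neighbors = {}
    for words in pages:
        unique = set(words)
        for w in unique:
            neighbors.setdefault(w, set())
        for a, b in combinations(unique, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {w: len(n) for w, n in neighbors.items()}


def core_words(pages, min_degree):
    """Hypothetical selection rule: words with degree >= min_degree become category
    core words, each seeded with the default type weight 1.0 for its own page type."""
    return {w: 1.0 for w, d in node_degrees(pages).items() if d >= min_degree}
```

Using only the words listed for webpages 1 to 4 above, "reading" co-occurs with five distinct words and "novel" with four, so `core_words(pages, 5)` returns only "reading".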
Of course, the category core words calculated automatically by machine may be imperfect and sometimes do not correspond one-to-one to the categories the business requires. In that case, the page types can be further adjusted on the basis of the machine processing results: some category core words are assigned to more accurate page types, and the corresponding type weights (the initial weight of the category core word under that page type) are specified at the same time. For example, the following is one way of adjusting the category core words of Fig. 2:
After the page types of the category core words are determined, the page types are extended according to the type weights of the category core words: the type weight of each page type is diffused along the connecting edges according to their weights, with exponential decay governed by the preset decay intensity. Each word thus obtains a type weight for each page type according to its strength of association with, and distance from, the category core words.
Specifically, starting from the initial type weight of each category core word for its page type, a breadth-first search (BFS) graph algorithm computes the type weight for each page type of every word associated with the category core word, then of the words associated with those words, and so on, until the type weights of each page type have been diffused through the whole word network.
The type weights of page description words for different page types are calculated by the following formula:
where $w_j$ is the type weight of word $j$; $i$ and $j$ are two associated word nodes in the word network; $S_{ij}$ is the strength of association between node $i$ and node $j$; $w_i$ is the type weight of node $i$, which in the first calculation is the initial weight given to category core word $i$; $\alpha$ is the preset decay intensity; and $d_i$ is the distance between node $i$ and the category core word.
To save computing resources, keep the word classification attribute library from growing too large, and preserve the high relevance of the words it contains, the extension stops when the type weight $w_j$ of node $j$ calculated with the above formula falls below a certain threshold.
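Because the type weight formula itself is not reproduced in this text, the diffusion sketch below substitutes an assumed rule, $w_j = w_i \, S_{ij} \, e^{-\alpha d_i}$, chosen only because the text describes exponential decay governed by $\alpha$ and the distance $d_i$; it should not be read as the patented formula. The sketch performs a level-by-level breadth-first expansion from one category core word, keeps the largest candidate weight when several nodes on the same level reach a word, and stops expanding a branch once its weight falls below the threshold. Running it once per category core word yields one set of type weights per page type. The function name and parameter defaults are arbitrary.

```python
import math


def diffuse_type_weights(strength, core_word, init_weight=1.0, alpha=0.5, threshold=0.1):
    """Spread one page type's weight outward from its category core word.

    strength: symmetric {word: {neighbor: S_ij}} map of association strengths.
    ASSUMED rule (not the source formula): w_j = w_i * S_ij * exp(-alpha * d_i),
    where d_i is the hop distance of node i from the core word.
    Candidates below `threshold` are dropped and not expanded further.
    """
    weights = {core_word: init_weight}
    dist = {core_word: 0}
    frontier = [core_word]
    level = 0
    while frontier:
        candidates = {}
        for i in frontier:
            decay = math.exp(-alpha * dist[i])
            for j, s_ij in strength.get(i, {}).items():
                if j in dist:  # already settled at an equal or shorter distance
                    continue
                w_j = weights[i] * s_ij * decay
                if w_j >= threshold:
                    # several nodes on this level may reach j; keep the strongest
                    candidates[j] = max(candidates.get(j, 0.0), w_j)
        level += 1
        for j, w_j in candidates.items():
            weights[j] = w_j
            dist[j] = level
        frontier = list(candidates)
    return weights
```

Raising `threshold` prunes distant, weakly associated words earlier, which is the trade-off the text describes between library size and word relevance.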
The whole word network is traversed in this way, and the type weights of each word for the different page types are output. Again taking the word network of Fig. 2 as an example, once the category core words have been determined, the traversal yields:
Game: game: 1.0; software: 0.3
Angry Birds: game: 0.6; software: 0.2
Software: software: 1.0; game: 0.2
After the traversal, the word classification attribute library can be built (if none exists locally) or updated (if one already exists locally). This library records the type weights that each word has for the different page types.
Subsequently, as the user browses, the webpage the user is currently viewing is judged in real time to determine the type of the current page, following the steps described above.
In a preferred embodiment of the present invention, the description information of the webpage is first extracted from the tags of the webpage the user is currently browsing (such as title, keywords and description), and words are then extracted from the description information to obtain the current page-describing words. Because the current page-describing words must be extracted online, a simple dictionary-based matching segmentation method (mechanical Chinese word segmentation) can be used. After the current page-describing words are extracted, the word category attribute library is queried using them as the index, and the category attributes of each word are retrieved and collected, giving the type weights of each current page-describing word for each page type. The sum of the type weights of all current page-describing words is then calculated for each page type, and the page type with the largest sum is taken as the type of the currently browsed webpage.
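The online extraction step might be sketched as follows, using Python's standard `html.parser` module to read the title, keywords and description tags, and forward maximum matching, which is one common form of dictionary-based mechanical segmentation (the source does not say which matching direction is used). Keeping only tokens that appear in the dictionary is an assumption of this sketch; the class and function names, the sample HTML and the dictionary are hypothetical.

```python
from html.parser import HTMLParser


class PageTagParser(HTMLParser):
    """Collects the <title> text and the content of keywords/description meta tags."""

    def __init__(self):
        super().__init__()
        self.parts = []          # raw description strings, in document order
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() in ("keywords", "description"):
            self.parts.append(attrs.get("content") or "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)


def forward_max_match(text, dictionary, max_len):
    """Forward maximum matching: at each position take the longest dictionary
    word; a character matching no word becomes a one-character token."""
    tokens, pos = [], 0
    while pos < len(text):
        for size in range(min(max_len, len(text) - pos), 0, -1):
            piece = text[pos:pos + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                pos += size
                break
    return tokens


def page_description_words(html, dictionary):
    """Returns the dictionary words found in the page's title/keywords/description."""
    parser = PageTagParser()
    parser.feed(html)
    max_len = max((len(w) for w in dictionary), default=1)
    words = []
    for part in parser.parts:
        words += [t for t in forward_max_match(part, dictionary, max_len) if t in dictionary]
    return words


html = ('<html><head><title>愤怒的小鸟攻略</title>'
        '<meta name="keywords" content="游戏,下载"></head></html>')
print(page_description_words(html, {"愤怒的小鸟", "攻略", "游戏", "下载"}))
```

Forward maximum matching needs only dictionary lookups, which fits the online, low-latency requirement stated above; its known weakness is ambiguous overlapping words, which a real system might handle with a reverse or bidirectional pass.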
For example, suppose the description information of a webpage the user is viewing in real time is obtained, and extraction yields the following page-describing words:
entertainment, strategy guide, guidebook, download
Querying the word category attribute library gives the type weights of each page-describing word for each page type:
Entertainment: game: 0.7; video: 0.6
Strategy guide: game: 0.9; reading: 0.4
Guidebook: game: 0.6
Download: software: 0.9
Collecting the type weights for all page types (that is, summing separately within each page type) gives:
Game: 0.7 + 0.9 + 0.6 = 2.2
Video: 0.6
Reading: 0.4
Software: 0.9
The sum of type weights for this webpage is largest under the "game" page type, so the webpage is classified as the "game" type.
Those skilled in the art will understand that, corresponding to the method of the present invention, the present invention also includes an information push device based on webpage type, whose components correspond one-to-one with the steps of the above method. Fig. 3 is a schematic structural diagram of the information push device based on webpage type; the device includes:
A first weight acquisition module, configured to use the co-occurrence relation of the previously obtained historical page-describing words to obtain the type weights of each historical page-describing word for the different page types, where the co-occurrence relation represents the state of words occurring together;
An attribute library building module, configured to build the word category attribute library using the type weights as the attributes of the words;
A second weight acquisition module, configured to query the word category attribute library with the current page-describing words obtained in real time, to obtain the type weights of each current page-describing word for each page type;
A page type determination module, configured to calculate, for each page type, the sum of the type weights of the current page-describing words, and to set the page type with the largest sum as the type of the currently browsed webpage;
An information push module, configured to push network information into the webpage the user is currently browsing based on the type of that webpage.
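The summation and selection performed by the page type determination module can be sketched as follows; the usage lines reuse the weights from the worked example above. The library is assumed to be a plain dictionary mapping each word to its per-type weights, words absent from the library are assumed to contribute nothing, and ties go to whichever page type is encountered first, a detail the source does not specify. `classify_page` is a hypothetical name.

```python
from collections import defaultdict


def classify_page(description_words, library):
    """Sums each page type's weights over the page-describing words.

    library: {word: {page_type: type weight}}; words absent from it are skipped.
    Returns (page type with the largest sum, {page_type: sum}); ties go to the
    type encountered first, and (None, {}) is returned if nothing matched.
    """
    totals = defaultdict(float)
    for word in description_words:
        for page_type, weight in library.get(word, {}).items():
            totals[page_type] += weight
    if not totals:
        return None, {}
    best = max(totals, key=totals.get)
    return best, dict(totals)


library = {
    "entertainment": {"game": 0.7, "video": 0.6},
    "strategy guide": {"game": 0.9, "reading": 0.4},
    "guidebook": {"game": 0.6},
    "download": {"software": 0.9},
}
words = ["entertainment", "strategy guide", "guidebook", "download"]
best, totals = classify_page(words, library)
print(best)  # game
```

The "game" total is 0.7 + 0.9 + 0.6, which in floating point is only approximately 2.2, so any comparison against the expected sum should use a tolerance.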
Compared with the prior art, the present invention provides an information push method and device based on webpage type. The type weights of words relative to different page types are determined from the co-occurrence relation of previously obtained historical page-describing words, and a word attribute library is built from these type weights. When the user views a webpage in real time, the page-describing words are obtained in real time and used to query the word attribute library, giving the type weights of each such word for the different page types. These type weights are then summed within each page type, giving the total type weight of the page-describing words for each page type, and the page type with the largest total is set as the type of the current page, so that the current page type is determined more accurately. The corresponding network information is then selected for pushing according to the determined page type. Because the page type can be determined accurately, the page type judgment and information push need not be repeated, and accurate related information can be pushed to the user. The technical solution of the present invention therefore works in real time and judges accurately, substantially improving the accuracy and efficiency of information pushing.
Preferably, in one or more embodiments of the present invention, the first weight acquisition module includes:
A term network building unit, configured to build a term network using the co-occurrence relation of the previously obtained historical page-describing words;
A word association strength acquisition unit, configured to obtain, according to the co-occurrence relation, the association strength between the historical page-describing words in the term network;
A traversal unit, configured to traverse the term network to obtain the distances between the historical page-describing words;
An acquisition unit, configured to obtain the type weights of each historical page-describing word for the different page types according to the initial weight assigned in advance to each preset classification core word, the distances, the association strengths and the preset decay intensity.
Preferably, in one or more embodiments of the present invention, the word association strength acquisition unit obtains, according to the co-occurrence relation, the number of times each pair of historical page-describing words occurs together, and obtains the association strength between the historical page-describing words by the following formula:
$$S_{ij} = \frac{C_{ij}}{\max(C)}$$
where $C_{ij}$ is the co-occurrence count of word i and word j, and $\max(C)$ is the largest co-occurrence count between any two words.
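A sketch of the term network building and association strength units follows. It assumes that two words co-occur once for each historical page whose description contains both (the source does not define the co-occurrence window) and returns an undirected network in the same neighbour-to-strength layout a BFS traversal can consume. `association_strengths` and the sample pages are hypothetical.

```python
from collections import Counter
from itertools import combinations


def association_strengths(pages):
    """Builds {word: {neighbour: S_ij}} with S_ij = C_ij / max(C).

    pages: list of descriptive-word lists, one list per historical page.
    ASSUMPTION: a pair of words co-occurs once per page containing both.
    """
    counts = Counter()
    for words in pages:
        # sorted(set(...)) gives each unordered pair a single canonical key
        for a, b in combinations(sorted(set(words)), 2):
            counts[(a, b)] += 1
    network = {}
    if not counts:
        return network
    max_c = max(counts.values())  # Max(C): the largest co-occurrence count
    for (a, b), c in counts.items():
        s = c / max_c
        network.setdefault(a, {})[b] = s
        network.setdefault(b, {})[a] = s
    return network


pages = [
    ["game", "angry birds"],
    ["game", "angry birds", "download"],
    ["game", "software"],
]
print(association_strengths(pages)["game"])
```

Normalising by the global maximum keeps every strength in (0, 1], so multiplying strengths along a traversal path can only shrink a diffused weight.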
Preferably, in one or more embodiments of the present invention, the acquisition unit obtains the type weights of each historical page-describing word for the different page types by the following formula:
where $w_j$ is the type weight of term node j; i and j are two associated term nodes in the term network; $S_{ij}$ is the association strength between node i and node j; $w_i$ is the type weight of node i; $\alpha$ is the preset decay intensity; $d_i$ is the distance between node i and the classification core word; and in the first calculation, $w_i$ is the initial weight assigned to the classification core word.
Preferably, in one or more embodiments of the present invention, the information push module includes:
An information query unit, configured to query a preset network information database for network information whose type is the same as or close to the type of the currently browsed webpage;
An information push unit, configured to push the retrieved network information to the currently browsed webpage.
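A possible shape of the information query unit is sketched below. The source does not define when a type is "close" to the page type, so the code takes a hypothetical `related_types` mapping supplied by the caller and ranks exact-type records before close-type records; the record fields `type` and `content`, the `limit` parameter and the function name are likewise assumptions.

```python
def query_push_items(page_type, database, related_types=None, limit=3):
    """Selects push candidates whose type equals, or is 'close' to, page_type.

    database: list of records with hypothetical 'type' and 'content' keys.
    related_types: hypothetical {page_type: [close types]} mapping; the source
        does not define closeness, so it is supplied by the caller.
    Exact-type records are ranked before close-type records.
    """
    close_types = set((related_types or {}).get(page_type, [])) - {page_type}
    exact = [r for r in database if r["type"] == page_type]
    close = [r for r in database if r["type"] in close_types]
    return (exact + close)[:limit]


database = [
    {"type": "software", "content": "A"},
    {"type": "game", "content": "B"},
    {"type": "video", "content": "C"},
    {"type": "game", "content": "D"},
]
picked = query_push_items("game", database, {"game": ["video"]})
print([r["content"] for r in picked])  # ['B', 'D', 'C']
```

Without a `related_types` mapping the sketch falls back to exact-type matches only.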
Compared with the prior art, the technical solution of the present invention manages the category associations between webpage-describing words through a term network and determines the page type of a webpage from the type weights of its words, so that information is pushed accurately and efficiently. It has the following notable advantages:
1. The algorithm is simple and efficient to implement; tests on real data show that the method essentially meets the accuracy and performance requirements of online webpage type determination;
2. No manually labeled webpage classification samples are required, which greatly saves labor cost and workload;
3. The data model is simple to update and supports incremental updates; neither growth in sample data nor even the addition of new page types requires retraining, which is difficult for general classification algorithms to achieve.
Those of ordinary skill in the art will appreciate that all or part of the steps of the methods in the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the methods in the above embodiments; the storage medium may be a ROM/RAM, a magnetic disk, an optical disc, a memory card, or the like.
Although the present invention has been described above in connection with preferred embodiments, those skilled in the art should understand that the method and system of the present invention are not limited to the embodiments described in the detailed description, and that various modifications, additions and substitutions may be made to the present invention without departing from the spirit and scope of the invention as defined by the appended claims.