WO2019047849A1 - News processing method, apparatus, storage medium and computer device - Google Patents

News processing method, apparatus, storage medium and computer device

Info

Publication number
WO2019047849A1
Authority
WO
WIPO (PCT)
Prior art keywords
news
event
identified
time
time node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2018/104156
Other languages
English (en)
Chinese (zh)
Inventor
殷乐
花贵春
王丹丹
郎兵
赵林
胡博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Publication of WO2019047849A1
Anticipated expiration
Ceased

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Definitions

  • the present application relates to the field of Internet application technologies, and in particular, to a news processing method, apparatus, computer readable storage medium, and computer device.
  • The recommended news can be recent trending news, or news in a corresponding field recommended in a targeted manner to different users.
  • News needs an expiration time, and expired news must be handled promptly so that it is not recommended to the user and so that the news recommended to the user keeps pace with the development of the news event, thereby satisfying the user's reading needs.
  • In the related art, there is no effective solution to the above problem.
  • the embodiment of the present application provides a news processing method, device, computer readable storage medium, and computer device that can improve the timeliness of recommended news.
  • a news processing method executed by a server, comprising:
  • a news processing apparatus comprising: a first acquiring module, configured to acquire a word vector of the news to be identified; a second acquiring module, configured to acquire a word vector corresponding to an event and a time node of the event; and a determining module, configured to determine, according to a similarity between the word vector of the news to be identified and the word vector of the event, an associated event of the news to be identified, determine a time node corresponding to the news to be identified in the associated event, and determine whether the news is valid according to the time node.
  • a computer readable storage medium having stored thereon a computer program that, when executed by a processor, implements a news processing method.
  • the news processing method includes: acquiring a word vector of the news to be identified; acquiring a word vector of an event and a time node of the event; determining, based on the similarity between the word vector of the news to be identified and the word vector of the event, the associated event of the news to be identified; determining a time node corresponding to the news to be identified in the associated event; and determining whether the news is valid according to the time node.
  • a computer device comprising a memory, a processor, and a computer program stored on the memory, the processor implementing a news processing method when the program is executed.
  • the news processing method includes: acquiring a word vector of the news to be identified; acquiring a word vector of an event and a time node of the event; determining, based on the similarity between the word vector of the news to be identified and the word vector of the event, the associated event of the news to be identified; determining a time node corresponding to the news to be identified in the associated event; and determining whether the news is valid according to the time node.
  • FIG. 1 is an application environment diagram of a news processing method according to an embodiment of the present application.
  • FIG. 2 is a flow chart of a news processing method in an embodiment of the present application.
  • FIG. 3 is a flowchart of a news processing method in another embodiment of the present application.
  • FIG. 4 is a flow chart of a news processing method in still another embodiment of the present application.
  • FIG. 5 is a flowchart of a news processing method in still another embodiment of the present application.
  • FIG. 6 is a flow chart of a news processing method in still another embodiment of the present application.
  • FIG. 7 is a schematic diagram of an application scenario in which a news reading application provides news processing on a server during a news push service according to an embodiment of the present application.
  • FIG. 8 is a schematic diagram of an application scenario displayed by a news reading application on a terminal during a news push service according to an embodiment of the present disclosure.
  • FIG. 9 is a flowchart of main steps of a news processing method in which the game event A and the news B to be identified are taken as an example.
  • FIG. 10 is a schematic structural diagram of a news processing apparatus according to an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a news processing apparatus according to another embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of a news processing apparatus according to still another embodiment of the present application.
  • FIG. 13 is a schematic diagram showing the internal structure of a computer device according to an embodiment of the present application.
  • Browsing news through the Internet has become the habit of more and more users, and many news websites or news applications also have the function of actively recommending news to users.
  • the determination of the expiration time of the news includes two ways:
  • the corresponding expiration time is preset for the news including the corresponding keyword
  • the corresponding expiration time is preset for the news of the category.
  • These approaches can only set an expiration duration for news that contains specific keywords or belongs to the same category. For news fields that contain a large number of distinct events whose periodicity is not clear, such as sports news and movie news, setting the expiration time based on news keywords or categories is not applicable. For example, after a sports competition has ended, it is unreasonable to recommend pre-match or in-game news; after a movie has been released, it is not appropriate to recommend preview news from before the release. Recommending such news to the user is not meaningful, and the timeliness of the recommended news is poor.
  • FIG. 1 is a diagram showing an application environment of a news processing method according to an embodiment of the present application, including a terminal 100 and a server 200.
  • the server 200 is connected to the terminal 100 through a network.
  • the user downloads the news application through the terminal 100 or logs in to the news website for browsing.
  • The news application refers to application software dedicated to letting the user read news information, or application software that includes a function module for reading news information, such as the various commonly used news reading apps (application software) that include a news recommendation function.
  • the terminal 100 may be a smartphone, a tablet, a personal digital assistant (PDA), a personal computer, or the like.
  • the server 200 transmits the recommended news to the corresponding terminal 100 through the network for the user to display and view through the terminal.
  • Server 200 can be a standalone physical server or a cluster of physical servers.
  • FIG. 2 shows a news processing method according to an embodiment of the present application.
  • the method may be performed by the server 200, and the method includes the following steps.
  • Step 101 Acquire a word vector of the news to be identified.
  • News usually refers to the timely reporting of significant and valuable events by means of narrative text, images, video, and the like, so that a certain group of people learns of them.
  • News in a broad sense refers to any message containing text, images, video, or audio data that records events and disseminates information through media or network channels.
  • News in a broad sense includes not only the text, images, video, and audio data served by news websites and news applications in the usual sense, but also event-related messages served as articles in social applications.
  • news refers to news in a broad sense.
  • the to-be-identified news refers to the object to be processed in the news processing method provided by the embodiment of the present application.
  • In step 101, acquiring a word vector of the news to be identified includes: extracting keywords from the news to be identified; and mapping the extracted keywords to a word vector space to obtain the word vector corresponding to each keyword.
  • a keyword generally refers to information that describes a feature that is necessarily mentioned in an event process and that can reflect a unique event.
  • The description information of an event usually includes information related to four elements: time, place, person, and the event itself. The keywords can therefore be determined and extracted at least from the perspective of the information related to these four elements.
  • The keywords of the news to be identified may be extracted from structured information crawled from a vertical website for the news or from other related news web pages. The structured information may be crawled using any crawling method known in current Internet technology, for example, web crawler technology.
  • a vertical website is a website that focuses on certain areas or specific needs, providing a full range of in-depth information and related services about the field or needs.
  • the structured information means that after the information is analyzed, it can be decomposed into a plurality of interrelated components. Each component has a clear hierarchical structure. Its use and maintenance are managed through the database, and there are certain operational specifications.
  • the extraction of the keywords may be derived from the title of the news, the content of the report, and the comment corresponding to the news.
  • extracting a keyword based on the news to be identified includes extracting a keyword corresponding to the news to be identified from at least one of the following: first, information included in the content of the news to be identified; second, specific associated information of the news to be identified.
  • The information included in the content of the news to be identified refers to information contained in the news report itself, such as the news headline and the news body. For news in the form of video or audio data, in addition to extracting keywords from the news headline, keywords can be extracted by converting the speech into text through speech recognition.
  • The specific associated information of the news to be identified mainly refers to information contained in content related to the news report, such as the comments on the news. For news in the form of video or audio data, keywords can be extracted from the comments in addition to the news title.
  • Extracting keywords from both the content of the news report itself and its comments makes the extraction more comprehensive, so the keywords of the news can be identified more accurately, which helps in assessing the timeliness of news with rich content.
  • A word vector is a representation in which characters, words, phrases, etc. in a language are converted into numbers.
  • One form of word vector represents a word by a vector of a specific length equal to the size of the dictionary; exactly one component of the vector is 1 and all the others are 0, and the position of the 1 corresponds to the position of the word in the dictionary.
  • Another form maps each word in the language to a short vector of fixed length, shorter than the specific length above. All these vectors together form a word vector space in which each vector is a point. A distance measure is introduced in the space, and the lexical and semantic similarity between words is judged according to the distance between their corresponding short vectors.
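  • The two representations can be contrasted with a minimal sketch. The four-word dictionary, the two-dimensional vector values, and the choice of Euclidean distance are illustrative assumptions, not taken from the application.

```python
import math

# Hypothetical toy dictionary; a real dictionary would be far larger.
DICTIONARY = ["match", "goal", "movie", "premiere"]

def one_hot(word):
    """Vector as long as the dictionary, with a single 1 at the word's position."""
    vec = [0] * len(DICTIONARY)
    vec[DICTIONARY.index(word)] = 1
    return vec

# Hypothetical short dense vectors; the values are hand-picked for illustration.
DENSE = {
    "match": [0.9, 0.1],
    "goal": [0.8, 0.2],
    "movie": [0.1, 0.9],
    "premiere": [0.2, 0.8],
}

def euclidean(a, b):
    """Distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

  • With one-hot vectors every pair of distinct words is equally far apart, whereas in the dense space "match" lies closer to "goal" than to "movie", which is what allows distance to stand in for semantic similarity.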
  • The word vectors can be trained by means of a language model, and the extracted keywords are mapped into the word vector space to obtain the corresponding word vectors.
  • The word vector model (e.g., word2vec) is trained on samples, e.g., words and their corresponding word vectors, to obtain the parameters of the model. The extracted keywords are then mapped to the word vector space by inputting them into the word vector model, which outputs the word vector corresponding to each keyword.
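  • The mapping step can be sketched as a lookup. Here a plain dictionary named TOY_MODEL stands in for a trained word vector model; its contents, the function name, and the choice to skip unknown keywords are all assumptions made for illustration.

```python
# Hypothetical stand-in for a trained word vector model such as word2vec.
TOY_MODEL = {
    "final": [0.7, 0.3, 0.0],
    "beijing": [0.1, 0.2, 0.9],
    "team": [0.6, 0.4, 0.1],
}

def keywords_to_vectors(keywords, model):
    """Map each extracted keyword to its word vector.

    Keywords the model does not know are skipped (an assumption of this sketch).
    """
    return {kw: model[kw] for kw in keywords if kw in model}
```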
  • Step 103 Acquire a word vector of the event and a time node of the event.
  • There may be one or more events; when there are multiple events, the word vector of each event is obtained.
  • An event has multiple time nodes, which can be obtained by acquiring the sequence of time nodes of the event.
  • Events are things that are significant and can affect a certain group of people.
  • The description information of an event usually includes information related to four elements: time, place, person, and the event itself, where the event is described by content covering its development from start to end.
  • The time node of an event refers to a specific time point that divides the event into multiple development stages according to common characteristics of different time periods. Taking a sports event as an example, with the game start time and the game end time as time nodes, the game can be divided according to its development into three stages: pre-match, mid-game, and post-game.
  • Similarly, for a movie event, time points such as the premiere time, the release start time, and the release end time are used as time nodes to divide the event into three stages: before the release, during the release, and after the release.
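  • A minimal sketch of how a sequence of time nodes partitions an event into stages, using the sports example above. The dates, the names TIME_NODES, STAGES and stage_at, and the rule that a moment equal to a time node belongs to the stage starting there are assumptions of this sketch.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical sports event: game start and game end are its time nodes.
TIME_NODES = [datetime(2018, 9, 1, 19, 0), datetime(2018, 9, 1, 21, 0)]
STAGES = ["pre-match", "mid-game", "post-game"]

def stage_at(moment, time_nodes, stages):
    """Return the development stage containing `moment`.

    `time_nodes` must be sorted; n nodes delimit n + 1 stages.
    A moment equal to a node is assigned to the stage that starts at that node.
    """
    return stages[bisect_right(time_nodes, moment)]
```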
  • acquiring the word vector corresponding to the event includes: extracting a keyword based on the event; mapping the extracted keyword to the word vector space to obtain a word vector corresponding to the keyword.
  • the keyword of the extracted event is input into the word vector model, and the vector output by the word vector model is used as the word vector corresponding to the keyword.
  • a keyword generally refers to information that describes a feature that is necessarily mentioned in an event process and that can reflect the uniqueness of the event.
  • the description information of the event usually includes time, place, person, and information related to the four elements of the event.
  • the event itself has attribute information of the industry or domain category, and the category to which the event belongs is information related to another element of the event, so that the keywords of the event can be determined or extracted according to at least information related to the five elements.
  • For example, the keywords of an event can be extracted as follows: "XX" day from the perspective of the time element, "Beijing" from the perspective of the location element, the starring actor "XX" from the perspective of the person element, and the "entertainment" category from the perspective of the event category element.
  • News is a specific form of presentation of events, so the keywords of an event may also be extracted from multiple related news items known to belong to that event. Specifically, one or more news items associated with the event are acquired, and the keywords of the event are determined from the information in the content of those news items and from their specific associated information.
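  • The application does not say how keywords from several related news items are combined into event keywords. One possible heuristic, shown purely as an assumption, keeps the keywords that appear in at least a given share of the related news; the function name and the min_share parameter are hypothetical.

```python
from collections import Counter

def event_keywords(related_news_keywords, min_share=0.5):
    """Keep keywords occurring in at least `min_share` of the related news items.

    `related_news_keywords` is a list with one keyword list per news item.
    This voting rule is an illustrative heuristic, not the patented method.
    """
    counts = Counter(kw for kws in related_news_keywords for kw in set(kws))
    threshold = min_share * len(related_news_keywords)
    return sorted(kw for kw, n in counts.items() if n >= threshold)
```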
  • Step 105 Determine an associated event of the news to be identified based on the similarity between the word vector of the news to be identified and the word vector of the event, and determine a time node corresponding to the news to be identified in the associated event.
  • the similarity between the news to be identified and each event is determined, and one event is selected as the associated event of the news to be identified according to the similarity between the news to be identified and each event.
  • the time node corresponding to the news to be identified is determined in the sequence of time nodes of the associated event.
  • Similarity refers to the degree of association between two things.
  • The similarity between the news to be identified and the event may be determined from their word vectors in either of two ways: by matching the word vector of the news to be identified against the word vector of the event and deciding according to the matching result; or by calculating a similarity value between the two word vectors and deciding according to its magnitude.
  • The associated event of the news to be identified is automatically identified from the similarity between the news to be identified and the event; that is, it is identified whether the news to be identified is related news of a specific event.
  • The time node of the associated event corresponding to the news to be identified is automatically identified from the similarity between the news to be identified and the event; that is, the development stage of the associated event to which the news to be identified belongs is identified.
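  • The similarity-value variant can be sketched as follows. Cosine similarity, the 0.8 threshold, the reduction of each news item and event to a single vector, and the function names are assumptions of this sketch; the application leaves the similarity measure open.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def associated_event(news_vec, event_vecs, threshold=0.8):
    """Return the name of the most similar event, or None if none reaches `threshold`.

    `event_vecs` maps event names to one vector each (hypothetical layout).
    """
    best_name, best_score = None, threshold
    for name, vec in event_vecs.items():
        score = cosine(news_vec, vec)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

  • Returning None when no event is similar enough corresponds to the case in which the news to be identified is not related news of any known event.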
  • In the news processing method, time nodes are set for an event, the related information of the news to be identified is extracted, news related to the event is automatically identified, and the time node of the event corresponding to the news is determined according to the time information of the news.
  • Based on the time node of the event corresponding to the news, the event corresponding to the news to be identified, and whether the news to be identified belongs to the current development stage of that event, can be accurately identified, which helps improve the timeliness of the recommended news.
  • Step 106 Determine whether the news is valid according to the time node.
  • the news processing method further includes:
  • Step 107 When the corresponding time node is a specific time node associated with the failure, it is determined that the news to be identified is invalid.
  • the time node of an event is typically a sequence of multiple time nodes that are arranged in chronological order. Each time node represents the start time of one development phase of the event or the end time of another development phase of the event, and any two adjacent time nodes correspond to a development phase of the event. Therefore, after determining the time node corresponding to the news to be identified, the development stage of the event in which the news to be identified is located is determined, so that it can be determined according to the corresponding time node whether it is a specific time node associated with the failure.
  • The next time node after the time node corresponding to the news to be identified, that is, the end time of the development stage in which the news to be identified is located or the start time of the next development stage, may be taken as the specific time node associated with the failure, and this specific time node can be determined as the expiration time of the news to be identified.
  • Alternatively, a subsequent time node at a preset interval from the corresponding time node may be determined as the specific time node associated with the failure, and that specific time node is determined as the expiration time of the news to be identified.
  • Alternatively, the corresponding time node plus a preset duration may be taken as the specific time node associated with the failure, and that specific time node is determined as the expiration time of the news to be identified.
  • The specific time node associated with the failure may be a single time or a time period. When it is represented by a time period, the time period may be set according to actual application requirements.
  • When it is represented by a single time, that time is determined as the expiration time of the news to be identified.
  • When the start time of the next development stage of the event is set as the expiration time of the news to be identified, the specific time node associated with the failure refers to the start time of the development stage that follows the stage in which the news to be identified is located.
  • The time nodes divide the development of the event into multiple stages. After the development stage of the event to which the news belongs is identified, the expiration time of the news is set as the start time of the next development stage or of a specific subsequent development stage; which stage is selected depends on actual application needs.
  • The specific time node associated with the failure is determined from the time node corresponding to the news, so that only news belonging to the current development stage of the event is recommended to the user, and news that does not belong to the current development stage is removed in time, ensuring the timeliness of the news recommended to users.
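  • The three ways of deriving an expiration time described above (next time node, a later node at a preset interval, or the node plus a preset duration) can be sketched in one function. The function name, the skip and extra parameters, the sample dates, and the None return when no later node exists are assumptions of this sketch.

```python
from datetime import datetime, timedelta

# Hypothetical chronologically sorted time nodes of one event.
NODES = [
    datetime(2018, 9, 1, 19, 0),
    datetime(2018, 9, 1, 21, 0),
    datetime(2018, 9, 1, 23, 0),
]

def expiration_time(time_nodes, index, skip=1, extra=None):
    """Expiration time for news whose corresponding time node is time_nodes[index].

    extra given -> the corresponding node plus that preset duration.
    otherwise   -> the node `skip` positions later (skip=1 is the next node);
                   None if the sequence has no such node.
    """
    if extra is not None:
        return time_nodes[index] + extra
    target = index + skip
    return time_nodes[target] if target < len(time_nodes) else None
```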
  • in determining, according to the time node, whether the news is valid, the news processing method further includes:
  • Step 108 When the type of the corresponding time node is an end time node, and the preset failure time of the end time node arrives, it is determined that the news to be identified is invalid.
  • each time node can be used to indicate the start time of one development phase of the event or the end time of another development phase representing the event.
  • the time node at the forefront of the node sequence is the start time node
  • the time node at the end of the node sequence is the end time node
  • the time nodes between the front end and the last end are intermediate time nodes.
  • The last development stage of the event has no later time node to define its end time, and the time node of the event corresponding to the time information included in the news to be identified may be the end time node. Therefore, when each time node indicates the start time of a development stage of the event and the time node corresponding to the news to be identified is confirmed to be the start time node or an intermediate time node, the expiration time of the news may be determined as the next time node after the corresponding time node, a subsequent time node at a preset interval, or the corresponding time node plus a preset duration. When the time node corresponding to the news to be identified is confirmed to be the end time node, the expiration time of news belonging to the last development stage of the event is determined by setting a preset failure duration.
  • The preset failure duration is a preset time range within which the news remains valid; once the time elapsed since the news was released exceeds this range, the news is treated as invalid.
  • When the end time node is set to indicate the start time of the last development stage of the event and the time node of the event corresponding to the news to be identified is determined to be the end time node, the news belongs to the last development stage of the event.
  • In that case, the expiration time of the news can be determined as the corresponding time node plus the preset failure duration. With time nodes set in this way, an event is divided into multiple development stages, only the time of each development stage needs to be considered, and a preset failure duration can be set uniformly for the final development stage of events in different fields, which reduces the difficulty of setting the time nodes of an event.
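  • A minimal sketch of the validity check that treats the end time node specially. It assumes each time node marks the start of a stage, that news tied to a non-final node expires at the next node, and that news tied to the end time node expires at that node plus the preset failure duration; the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta

def is_valid(now, time_nodes, index, preset_failure_duration):
    """Return True while news tied to time_nodes[index] has not expired.

    Non-final node: the news expires at the next time node.
    End time node:  the news expires at that node plus the preset failure duration.
    """
    if index == len(time_nodes) - 1:
        expires = time_nodes[index] + preset_failure_duration
    else:
        expires = time_nodes[index + 1]
    return now < expires
```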
  • By means of its time nodes, the event is divided into multiple stages according to common characteristics of its different development stages. Determining the time node of the event corresponding to the news to be identified reveals the development stage of the event in which the news is located, so that it can be determined whether the news belongs to the current development stage of the event, and news that is not in the current development stage is determined to be invalid news.
  • Setting a reasonable life cycle for the news based on the time nodes of the event, and promptly determining news that is not in the current development stage of the event to be invalid, avoids recommending news that is out of step with the current development stage of the event and thereby improves the timeliness of the news recommended to the user.
  • acquiring the time node of the event includes:
  • The time nodes of an event can be set in a predefined manner. For example, by analyzing the common development characteristics of events in different domain categories, an event is divided into several development stages, the time points dividing these stages are determined, and these division time points are used as the predefined time nodes of events of the corresponding category. As another example, by analyzing the common development characteristics of events with different degrees of heated discussion, an event is divided into several heated-discussion stages, the time points dividing these stages are determined, and these time points are used as the predefined time nodes of events with the corresponding degree of heated discussion.
  • A division time point may be a single time or a time period.
  • Accordingly, a time node may also be a single time or a time period. When the division time point is a time period, any time within the period may be assigned, according to actual needs, to both of the adjacent development stages or to only one of them.
  • in step 103, acquiring the time node of the event includes:
  • the relevant news of the event is obtained and clustered, and the time node of the event is determined according to the time information included in the related news of different categories.
  • the setting of the time node of the event can be determined by clustering the relevant news of the event.
  • Clustering refers to the process of grouping data into different classes or clusters. Objects in the same class or cluster are highly similar, while objects in different classes or clusters are highly dissimilar.
  • the time information included in the related news includes the release time of the related news, the time when the news is involved, and the like. In this embodiment, the time information included in the related news refers to the release time of the news.
  • When clustering is performed, the related news is classified into different categories according to its keywords. Specifically, the word vectors corresponding to the keywords of the related news may be input into a classification model, which divides the related news into different categories; the classification model is pre-trained.
  • the split time node corresponding to the category is determined according to the release time of each related news in the category.
  • the split time point of the corresponding class may be determined according to the earliest release time and the latest release time in the related news included in different categories in the clustering result.
  • the split time points corresponding to different categories are used as the time nodes of the event.
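  • Deriving split time points from clustered news can be sketched as follows. The sketch assumes each related news item already carries a category label (for example, from the pre-trained classification model) and a release time; the sample data and the function name are hypothetical.

```python
from datetime import datetime

# Hypothetical related news of one event: (category, release time).
RELATED_NEWS = [
    ("pre-match", datetime(2018, 9, 1, 10, 0)),
    ("pre-match", datetime(2018, 9, 1, 18, 30)),
    ("mid-game", datetime(2018, 9, 1, 19, 5)),
    ("mid-game", datetime(2018, 9, 1, 20, 50)),
    ("post-game", datetime(2018, 9, 1, 21, 10)),
]

def time_nodes_from_clusters(related_news):
    """Collect the earliest and latest release time of each category.

    Returns the resulting split time points as a sorted list without duplicates.
    """
    bounds = {}
    for category, released in related_news:
        lo, hi = bounds.get(category, (released, released))
        bounds[category] = (min(lo, released), max(hi, released))
    return sorted({t for pair in bounds.values() for t in pair})
```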
  • the related news of the event is acquired and clustering is performed, and the time node of the event is determined according to time information included in the related news of different categories, including:
  • the time node of the event is determined based on the initial time node.
  • The time information included in the related news includes the release time of the related news, the time at which the event involved in the news occurs, and the like. Taking the release time of the news as the time information, the earliest release time and the latest release time among the related news of each category obtained by clustering are first used as the split time points of the corresponding category, and these split time points serve as the initial time nodes of the corresponding event.
  • An adjustment rule may be formulated according to personalized requirements, and the initial time nodes may be adjusted according to the rule to obtain the time nodes of the event; alternatively, starting from the initial time nodes, the user may adjust them in a custom manner according to experience or other conditions to obtain the time nodes of the event.
  • In step 105, determining the associated event of the news to be identified based on the similarity between the word vector of the news to be identified and the word vector of the event, and determining the time node corresponding to the news to be identified in the associated event, includes:
  • Step 1051 Construct a first feature corresponding to the news to be identified based on the similarity between the word vector of the news to be identified and the word vector of the event.
  • The similarity between the word vector of the news to be identified and the word vector of the event may be determined either by a matching probability value between the word vector of the news and the word vector of the event, or by calculating a similarity value between the word vector of the news and the word vector of the event.
  • the first feature refers to the similarity represented by the matching probability value or the similarity value of the word vector of the news to be recognized and the word vector of the event.
  • the similarity value between the word vector of the news and the word vector of the event is calculated as follows:
  • where a_i represents the word vector of the i-th keyword of the event in f_e, f_n represents the keywords of the news to be identified, b_j represents the word vector of the j-th keyword of the news in f_n, n represents the number of keywords of the news, and K represents the number of keywords of the event.
  • The word vectors of the event keywords and of the news keywords both express the corresponding information numerically. They can be obtained by a known method, for example with the word2vec language model.
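  • The similarity formula itself is not reproduced in this text, so the following is only an illustrative sketch under a stated assumption: the similarity value is taken to be the mean cosine similarity over all K × n pairs of event keyword vectors a_i and news keyword vectors b_j. The function names and the toy two-dimensional vectors are hypothetical:

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def keyword_similarity(event_vectors, news_vectors):
    """Assumed aggregation: mean cosine similarity over all (a_i, b_j) pairs."""
    pairs = [(a, b) for a in event_vectors for b in news_vectors]
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)

f_e = [[1.0, 0.0], [0.0, 1.0]]  # K = 2 event keyword vectors (toy values)
f_n = [[1.0, 0.0]]              # n = 1 news keyword vector (toy value)
sim = keyword_similarity(f_e, f_n)
```

  • Other aggregations (for example, the maximum over pairs) would fit the same symbol definitions equally well.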
  • the specific representation of the first feature corresponding to the news to be identified is constructed as follows:
  • In Equation 2, fea represents the first feature corresponding to the news to be identified, that is, the feature of the news to be identified relative to one event; when there are N events, there are N such fea.
  • Step 1052: Input the first feature as the sample feature into the first classification model to obtain the confidence that each event is the associated event of the news to be identified.
  • the first classification model may be a softmax regression model or a support vector machine (SVM) model.
  • the sample feature is represented by x, and the first feature is input as the sample feature into the first classification model to obtain a specific representation of the confidence that the different events are associated events of the news to be identified as follows:
  • In Equation 3, h_θ(x) represents the confidence, θ represents the model parameters obtained by training, and x represents the sample features.
  • Step 1053 determining that the event that the confidence meets the condition is an associated event of the news to be identified.
  • In Equation 4, J(θ) represents the cost function, x^(i) represents the input, y^(i) represents the output, and m represents the number of sample features.
  • An iterative optimization algorithm such as gradient descent is used to minimize the cost function. This determines the model parameters of the classification model, and hence the condition that the confidence must satisfy, and yields a usable classification model.
  • The first feature corresponding to the news to be identified is input into the first classification model to determine the probability (confidence) that the news to be identified is related news of a given event, that is, the probability that the event is the associated event of the news to be identified. The associated event of the news to be identified is determined according to the confidence, and the time node corresponding to the news to be identified in the associated event is then determined.
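  • As an illustration of how a softmax regression model turns a sample feature x into confidences, here is a minimal sketch. It uses the standard softmax form, which is assumed rather than taken from the equations of this document; the parameter values `theta` and the 0.5 threshold standing in for "the confidence meets the condition" are hypothetical:

```python
import math

def softmax_confidence(theta, x):
    """theta: one parameter vector per class; x: sample feature vector."""
    scores = [sum(t * xi for t, xi in zip(row, x)) for row in theta]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pick_class(confidences, threshold=0.5):
    """Index of the most confident class, or None if it is below the threshold."""
    best = max(range(len(confidences)), key=confidences.__getitem__)
    return best if confidences[best] >= threshold else None

theta = [[2.0, 0.0], [0.0, 2.0]]  # hypothetical trained parameters
x = [1.0, 0.0]                    # hypothetical sample feature
conf = softmax_confidence(theta, x)
chosen = pick_class(conf)
```

  • Returning None when no class reaches the threshold models the case in which the news is not associated with any known event.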
  • step 1051 based on the similarity between the word vector of the news to be identified and the word vector of the event, constructing the first feature corresponding to the news to be identified, including:
  • the following feature components are combined to obtain a first feature corresponding to the news to be identified: a similarity between the word vector of the news to be recognized and the word vector of the event; a relationship between the time of the news to be recognized and the time node of the event.
  • the time of the news to be identified refers to the release time of the news to be recognized, and the relationship between the time of the news to be identified and the time node of the event refers to the relationship between the release time of the news to be recognized and each time node of the event.
  • the time of the news to be identified includes the time of publication of the news to be identified, the time of occurrence of the event content involved in the news to be identified, and the like. Taking the time when the news to be recognized is the release time of the news to be identified as an example, the relationship between the time of the news to be identified and the time node of the event may be the difference between the release time of the news to be identified and the time when the event occurs. Based on the similarity between the word vector of the news to be identified and the word vector of the event, the first feature corresponding to the news to be identified is constructed as follows:
  • In Equation 5, fea represents the first feature corresponding to the news to be identified, Similar represents the similarity between the keywords of the news and the keywords of the event, newtime represents the release time of the news to be identified, and eventtime represents the time node of the event. In this example, eventtime may be the time at which the event occurs, which may be the time corresponding to the first time node of the event.
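  • A minimal sketch of combining the two feature components named for Equation 5, assuming the time relationship is the difference newtime - eventtime expressed in hours. The unit, the function name `first_feature`, and the sample values are assumptions made for illustration:

```python
from datetime import datetime

def first_feature(similarity, news_time, event_time):
    """Combine the keyword similarity with the time difference (in hours)."""
    hours = (news_time - event_time).total_seconds() / 3600.0
    return [similarity, hours]

# Hypothetical values: similarity 0.8, news released two hours after the event starts
fea = first_feature(0.8, datetime(2018, 9, 1, 20), datetime(2018, 9, 1, 18))
```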
  • a one-dimensional feature component the mean of the word vectors of the news to be identified.
  • determining a time node corresponding to the news to be identified in the associated event includes:
  • Step 1054 Construct a second feature corresponding to the news to be identified based on the relationship between the time of the news to be identified and the time node of the event.
  • The time of the news to be identified refers to the release time of the news to be identified, and the relationship between the time of the news to be identified and the time node of the event refers to the relationship between that release time and each time node of the event.
  • The time of the news to be identified mainly includes the release time of the news to be identified, the time at which the event content involved in the news to be identified occurs, and the like.
  • the relationship between the time of the news to be recognized and the time node of the event may be a difference between the time of the news to be recognized and the time node of the event, or a value given according to the magnitude of the difference, or the like.
  • When the time of the news to be identified is the release time of the news and the relationship between that time and the time nodes of the event is a difference, the time vector of the news to be identified is constructed as follows:
  • timefea = [newtime - e_time_0, ..., newtime - e_time_i, ..., newtime - e_time_n] (Equation 6)
  • where timefea represents the time vector of the news to be identified, e_time_i represents the i-th time node of the event, and newtime represents the release time of the news to be identified.
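  • Equation 6 can be sketched directly in code. The choice of hours as the unit of the differences and the sample timestamps are assumptions for illustration:

```python
from datetime import datetime

def time_vector(news_time, event_time_nodes):
    """timefea: difference between the news release time and each time node (hours)."""
    return [(news_time - node).total_seconds() / 3600.0 for node in event_time_nodes]

e_time = [datetime(2018, 9, 1, 18), datetime(2018, 9, 1, 19), datetime(2018, 9, 1, 21)]
newtime = datetime(2018, 9, 1, 20)
timefea = time_vector(newtime, e_time)
```

  • A positive component means the news was released after that time node, and a negative component means it was released before it.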
  • timefea alone can be used as the second feature; alternatively, [(1/M) Σ W_i, timefea] can be used as the second feature, where W_i is the word vector of the i-th keyword of the news to be identified, the sum runs over i = 1, ..., M, and M is the number of keywords in the news to be identified.
  • Step 1055: Input the second feature to the second classification model to obtain the confidence that the news to be identified corresponds to each time node of the associated event.
  • the second classification model can be a softmax regression model or an SVM model.
  • Inputting the second feature to the second classification model means using the second feature as the second sample feature. With the sample feature denoted by x, the confidence that the news to be identified corresponds to each time node of the associated event is expressed as follows:
  • In Equation 7, h_θ(x) represents the confidence, θ represents the model parameters obtained by training, and x represents the sample features.
  • Step 1056 Determine that the time node whose confidence meets the condition is a time node corresponding to the news to be identified.
  • In Equation 8, J(θ) represents the cost function, x^(i) represents the input, y^(i) represents the output, and m represents the number of sample features.
  • An iterative optimization algorithm such as gradient descent is used to minimize the cost function, which yields a usable classification model; that is, the model parameters of the second classification model are determined.
  • The second feature is input to the second classification model, and the probability that the news to be identified corresponds to each time node of the event is calculated; the time node corresponding to the news to be identified is determined from these probabilities.
  • The cost function is used to determine the parameters of the model, which are obtained through training.
  • During training, samples are input into formula (7) to obtain the confidence of each sample, expressed in terms of the model parameters. The confidence of the samples is then substituted into formula (8), and the cost function is solved to determine the parameters of the model.
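  • The training loop described above can be sketched as follows. The cost used here is the standard softmax cross-entropy (mean negative log-likelihood), which is an assumption, since the exact form of the cost function is not reproduced in this text; the learning rate, iteration count, and toy samples are likewise hypothetical:

```python
import math

def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(theta, x):
    """Confidence of each class for sample feature x."""
    return softmax([sum(t * xi for t, xi in zip(row, x)) for row in theta])

def cost(theta, samples):
    """Assumed J(theta): mean negative log-likelihood over m samples (x, y)."""
    return -sum(math.log(predict(theta, x)[y]) for x, y in samples) / len(samples)

def gradient_step(theta, samples, lr=0.5):
    """One batch gradient-descent update of theta."""
    m = len(samples)
    grad = [[0.0] * len(row) for row in theta]
    for x, y in samples:
        p = predict(theta, x)
        for k in range(len(theta)):
            err = p[k] - (1.0 if k == y else 0.0)
            for d, xd in enumerate(x):
                grad[k][d] += err * xd / m
    return [[t - lr * g for t, g in zip(row, grow)] for row, grow in zip(theta, grad)]

samples = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]  # toy (feature, time-node index) pairs
theta = [[0.0, 0.0], [0.0, 0.0]]
before = cost(theta, samples)
for _ in range(50):
    theta = gradient_step(theta, samples)
after = cost(theta, samples)
```

  • With all parameters at zero every class is equally likely, so the starting cost is ln 2 for two classes; each update then lowers it.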
  • step 1054 based on the relationship between the time of the news to be identified and the time node of the event, the second feature corresponding to the news to be identified is constructed, including:
  • the following feature components are combined to obtain the second feature corresponding to the news to be identified: the mean value of the word vector of the news to be recognized; the relationship between the time of the news to be recognized and the different time nodes of the associated event.
  • the mean value of the word vector of the news to be identified refers to the mean value of the word vector corresponding to the time node of the event to be identified by the news.
  • the relationship between the time of the news to be recognized and the time node of the event may be a difference between the time of the news to be recognized and the time node of the event, or a value given according to the magnitude of the difference, or the like.
  • When the relationship between the time of the news to be identified and the time nodes of the event is a difference, the second feature of the news to be identified is constructed as follows:
  • In Equation 9, fea represents the second feature, M represents the number of words of the news to be identified, W_i represents the word vector of the i-th word of the news to be identified, and timefea represents the relationship between the time of the news to be identified and the time nodes of the event.
  • This relationship may be characterized by the time vector of the news to be identified, which is formed from the differences between the time of the news to be identified and the time nodes of the event, as shown in Equation 6.
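  • A minimal sketch of assembling the second feature from its two components, the mean of the word vectors of the news and the time vector of Equation 6. Simple concatenation is assumed as the way of combining them, and the function names and toy vectors are hypothetical:

```python
def mean_vector(word_vectors):
    """Component-wise mean of M word vectors of equal dimension."""
    m = len(word_vectors)
    dim = len(word_vectors[0])
    return [sum(w[d] for w in word_vectors) / m for d in range(dim)]

def second_feature(word_vectors, timefea):
    """Concatenate the mean word vector with the time vector."""
    return mean_vector(word_vectors) + list(timefea)

W = [[1.0, 3.0], [3.0, 5.0]]  # M = 2 toy word vectors
fea2 = second_feature(W, [2.0, -1.0])
```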
  • Step 105, determining the associated event of the news to be identified based on the similarity between the word vector of the news to be identified and the word vector of the event, and determining the time node corresponding to the news to be identified in the associated event, may also be implemented in another manner, in which the time node corresponding to the news to be identified is determined directly by a third classification model.
  • The plurality of events correspond to a plurality of time nodes. The third classification model determines which of these time nodes corresponds to the news to be identified, and the event corresponding to the determined time node is then taken as the associated event of the news to be identified. This specifically includes the following steps:
  • Step 1057 Based on the similarity between the word vector of the news to be recognized and the word vector of the event, and the relationship between the time of the news to be recognized and the time node of the event, construct a third feature corresponding to the news to be identified.
  • In step 1057, constructing the third feature corresponding to the news to be identified, based on the similarity between the word vector of the news to be identified and the word vector of the event and on the relationship between the time of the news to be identified and the time node of the event, includes combining the following feature components to obtain the third feature: the similarity between the word vector of the news to be identified and the word vector of the event; the relationship between the time of the news to be identified and the occurrence time node of the event; the mean value of the word vectors of the news to be identified; and the relationship between the time of the news to be identified and the different time nodes of the associated event.
  • The feature components may be the same as the corresponding feature components in the foregoing embodiments: the similarity between the word vector of the news to be identified and the word vector of the event is as shown in formula (2); the combination of that similarity with the relationship between the time of the news to be identified and the time of occurrence of the event is as shown in formula (5); and the relationship between the time of the news to be identified and the time nodes of the event is as shown in formula (6).
  • Step 1058 Input a third feature to a third classification model, and obtain a confidence that the time of the news to be identified corresponds to different time nodes of different events.
  • the third classification model may be a softmax regression model or a SVM (Support Vector Machine) model.
  • Inputting the third feature to the third classification model means using the third feature as the third sample feature. With the sample feature denoted by x, the confidence that the time of the news to be identified corresponds to the different time nodes of different events is expressed as follows:
  • In Equation 10, h_θ(x) represents the confidence, θ represents the model parameters obtained by training, and x represents the sample features formed by the third feature.
  • The third feature of each news sample and the time node corresponding to that news are used as the input and the output, respectively, to train the third classification model.
  • The time nodes of the plurality of events constitute a set of time nodes, and each time node carries an identifier of its corresponding event. The third classification model determines which time node in the set corresponds to the news to be identified, and the event corresponding to the determined time node is taken as the associated event of the news to be identified.
  • Step 1059 Determine that the time node whose confidence meets the condition is a time node corresponding to the news to be identified, and the event corresponding to the determined time node is used as an associated event of the news to be identified.
  • In Equation 11, J(θ) represents the cost function, x^(i) represents the input, y^(i) represents the output, and m represents the number of sample features.
  • An iterative optimization algorithm such as gradient descent is used to minimize the cost function and thereby determine the model parameters of the third classification model, yielding a usable classification model. The third feature of the news to be identified is input into the third classification model to obtain the probability that the news to be identified corresponds to each time node; the time node whose confidence satisfies the condition is determined to be the time node corresponding to the news to be identified, and the event corresponding to the determined time node is determined to be the associated event of the news to be identified.
  • The development stages of the event are divided by the time nodes of the event, and the life cycle of the news related to the event is matched to the development stages of the event. As a result, the judgment of whether the news is related to the event, and of whether the time of the news corresponds to the current development stage of the event, is more scientific and precise; furthermore, the expiration time of the news can be calculated with better results.
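  • The last step of this variant, choosing a time node from the pooled set and reading the associated event from the identifier it carries, can be sketched as follows. The confidences are taken as given (they would come from the third classification model), and the 0.5 threshold, the function name, and the event and node labels are hypothetical:

```python
def resolve_time_node(time_nodes, confidences, threshold=0.5):
    """time_nodes: list of (event_id, node_id); confidences: parallel list.
    Returns the chosen time node and its event, or None if no node qualifies."""
    best = max(range(len(time_nodes)), key=confidences.__getitem__)
    if confidences[best] < threshold:
        return None
    event_id, node_id = time_nodes[best]
    return {"event": event_id, "time_node": node_id}

# Pooled time nodes of two hypothetical events, each tagged with its event identifier
nodes = [("A", "A1"), ("A", "A2"), ("C", "C1")]
result = resolve_time_node(nodes, [0.1, 0.7, 0.2])
```

  • Because every time node carries its event identifier, a single classification yields both the time node and the associated event.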
  • the news processing method can be applied to any news reading application software for users to read news information, such as Daily Express, Tencent News, and the like.
  • For example, a Daily Express client is installed on the terminal 100.
  • the news reading application provided by the embodiment of the present application is in the news push service.
  • FIG. 8 is a schematic diagram of an application scenario displayed by a news reading application on a terminal during a news push service according to an embodiment of the present application.
  • By installing a news reading application client on the terminal, the user can read the news pushed by the server. After the server determines, through the news processing method, the associated event of the news to be identified and the time node corresponding to the news in that event, it pushes the news corresponding to the current development stage of the event, and the user views it through the software interface of the news reading application on the terminal 100.
  • a specific application manner of determining the expiration time of the news by the news processing method provided by the embodiment of the present application is as follows, taking the sports event A and the news B to be identified as an example, including:
  • Obtaining the time nodes of event A by clustering the related news of the event, which specifically comprises: clustering the related news of sports event A and acquiring four time nodes A1, A2, A3 and A4 of sports event A, which divide the event into the pre-match stage (time nodes A1 to A2), the in-match stage (time nodes A2 to A3), and the post-match stage (time nodes A3 to A4).
  • the keyword of the news B to be identified and the keyword of the event A are obtained.
  • Determining whether the news B to be identified is related news of event A specifically includes: extracting structured information from the title, report content and comments of the news B to be identified as the keywords of news B; calculating the similarity between the keywords of news B and the predefined or pre-extracted keywords of event A; constructing sample features according to the similarity; and classifying them with the classification model to determine whether the news B to be identified is related news of sports event A.
  • The keywords of the news B to be identified may be extracted from the full text of the news and even from its comments, and the similarity covers the similarities between each of the multiple news keywords and the keywords of the event, so a more accurate judgment can be obtained.
  • News related to the recorded sports events among the news to be identified can thus be effectively identified and recalled, so that the correlation between news and events is determined more accurately.
  • the recall rate of news and competitions can reach 85%, and the correct rate can reach 98%.
  • Determining the time node includes: constructing a sample feature from the release time of the news to be identified and the time nodes of the event, and classifying it with the classification model to determine which time node of sports event A corresponds to the news B to be identified. For example, if the news B to be identified corresponds to the pre-match stage, the corresponding time node in the associated event is A1; if it corresponds to the in-match stage, the corresponding time node is A2; and if it corresponds to the post-match stage, the corresponding time node is A3.
  • The terminal 100 withdraws the news B to be identified when the expiration time node corresponding to it arrives.
  • The expiration time node corresponding to the news B to be identified is the next time node A_n+1 after its corresponding time node A_n.
  • That is, given the corresponding time node A_n, the next time node A_n+1 is determined as the expiration time of the news B to be identified.
  • Any two adjacent time nodes (A_n, A_n+1) represent the start and end times of one development stage of event A. By determining the development stage of the event in which the news to be identified is located, the related news belonging to the previous development stage can be invalidated when the current development stage starts, ensuring the timeliness of the news.
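  • The expiration rule above (news matched to time node A_n expires at A_n+1) can be sketched as follows. The function names and the sample times for A1 to A4 are hypothetical, and the handling of the last time node, for which there is no next node, is left to the caller:

```python
from datetime import datetime

def expiration_time(time_nodes, n):
    """time_nodes: chronologically ordered nodes; n: index of the matched node A_n.
    Returns A_{n+1}, or None when A_n is the last node."""
    return time_nodes[n + 1] if n + 1 < len(time_nodes) else None

def is_expired(time_nodes, n, now):
    """True once the next time node after A_n has been reached."""
    expiry = expiration_time(time_nodes, n)
    return expiry is not None and now >= expiry

# Hypothetical time nodes A1..A4 of sports event A
A = [datetime(2018, 9, 1, 12), datetime(2018, 9, 1, 18),
     datetime(2018, 9, 1, 21), datetime(2018, 9, 1, 23)]
now = datetime(2018, 9, 1, 19)       # the match is in progress
pre_match_expired = is_expired(A, 0, now)
in_match_expired = is_expired(A, 1, now)
```

  • In this example, pre-match news (matched to A1) has expired because A2 has passed, while in-match news (matched to A2) is still valid until A3.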
  • the related news belonging to the pre-match is pushed to the user before the mid-stage of the match event A, and is recalled when the time node A2 of the match event A arrives; the related news belonging to the match is in the post-competition phase.
  • the correct rate of news recognition before the game can reach 95%
  • the correct rate of news recognition in the game can reach 90%
  • the correct rate of news recognition after the game can reach 97%.
  • the above news processing method can improve the competitiveness of news reading application software by setting a reasonable life cycle for news and improving the timeliness of news recommendation.
  • a news processing apparatus including a first obtaining module 11, a second obtaining module 13, and a determining module 15.
  • the first obtaining module 11 is configured to acquire a word vector of the news to be identified.
  • the second obtaining module 13 is configured to acquire a word vector corresponding to the event and a time node of the event.
  • the determining module 15 is configured to determine an association event of the news to be identified based on the similarity between the word vector of the news to be identified and the word vector of the event, and determine a time node corresponding to the news to be identified in the associated event, according to the time node Determine if the news is valid.
  • the first obtaining module 11 includes a keyword extracting unit 111 and a word vector unit 113.
  • the keyword extraction unit is configured to extract keywords based on the news to be identified.
  • the word vector unit is used to map the extracted keywords into the word vector space to obtain a word vector corresponding to the keyword.
  • the keyword extracting unit is specifically configured to extract keywords corresponding to the to-be-identified news from at least one of the following: the to-be-identified news; the specific associated information of the to-be-identified news.
  • the second acquisition module 13 includes a predefined unit 131 or a clustering unit 133.
  • the predefined unit 131 is configured to acquire a predefined time node of the event.
  • the clustering unit 133 is configured to acquire related news of the event and perform clustering processing, and determine a time node of the event according to time information included in the related news of different categories.
  • the failure determination module 17 is further configured to determine that the news to be identified is invalid when the type of the corresponding time node is an end time node and the preset failure time of the end time node arrives.
  • the failure determination module 17 is configured to determine that the news to be identified is invalid when the corresponding time node is a specific time node associated with the failure.
  • the determination module 15 includes a first feature unit 151, a first classification unit 152, and an event determination unit 153.
  • the first feature unit 151 is configured to construct a first feature corresponding to the news to be identified based on the similarity between the word vector of the news to be recognized and the word vector of the event.
  • the first classification unit 152 is configured to input the first feature as a sample feature into the first classification model, and obtain a confidence that the different events are associated events of the news to be identified.
  • the event determining unit 153 is configured to determine that the event whose confidence meets the condition is an associated event of the news to be identified.
  • the first feature unit 151 is specifically configured to combine the following feature components to obtain a first feature corresponding to the news to be identified: a similarity between the word vector of the news to be recognized and the word vector of the event; a time of the news to be identified and a time node of the event Relationship.
  • the determining module further includes a second feature unit 154, a second classifying unit 155, and a time determining unit 156.
  • the second feature unit 154 is configured to construct a second feature corresponding to the news to be identified based on the relationship between the time of the news to be identified and the time node of the event.
  • the second classification unit 155 is configured to input the second feature to the second classification model, and obtain the confidence of the different time nodes of the related event corresponding to the news to be identified.
  • the time determining unit 156 is configured to determine that the time node whose confidence meets the condition is a time node corresponding to the news to be identified.
  • the second feature unit 154 is specifically configured to combine the following feature components to obtain a second feature corresponding to the news to be identified: an average value of the word vector of the news to be recognized; a relationship between the time of the news to be identified and different time nodes of the associated event.
  • the determining module 15 includes a third feature unit 157, a third classifying unit 158, and a determining unit 159.
  • the third feature unit 157 is configured to construct a third feature corresponding to the news to be identified based on the similarity between the word vector of the news to be recognized and the word vector of the event, and the relationship between the time of the news to be recognized and the time node of the event.
  • the third classification unit 158 is configured to input the third feature to the third classification model, and obtain the confidence that the time of the news to be identified corresponds to different time nodes of different events.
  • the determining unit 159 is configured to determine that the time node whose confidence meets the condition is the time node corresponding to the news to be identified, and to take the event corresponding to the determined time node as the associated event of the news to be identified.
  • The news processing apparatus uses the time nodes of the event to divide the event into a plurality of development stages according to characteristics shared within each time period. By determining the time node of the event corresponding to the news to be identified, it can learn the development stage of the event in which the news is located, determine whether the news belongs to the current development stage of the event, and determine news that is not in the current development stage of the event to be invalid news.
  • In this way, a reasonable life cycle is set for the news based on the time nodes of the event, and news that is not in the current development stage of the event is determined to be invalid in a timely manner. This avoids recommending to the user news that does not match the current development stage of the event and improves the timeliness of the news recommended to the user.
  • the embodiment of the present application further provides a computer device, including a processor and a memory for storing a computer program capable of running on a processor, wherein when the processor is configured to run the computer program, executing:
  • a news processing method comprising: acquiring a word vector of a news to be recognized; acquiring a word vector of an event, and a time node of the event; and similarity between a word vector of the news to be recognized and a word vector of the event, Determining an associated event of the to-be-identified news, and determining a time node corresponding to the to-be-identified news in the associated event, and determining, according to the time node, whether the news is valid.
  • the processor is further configured to, when the computer program is executed, perform: the acquiring a word vector of the news to be identified, including: extracting a keyword based on the news to be identified; and mapping the extracted keyword to a word vector space to obtain the word vector corresponding to the keyword.
  • the processor is further configured to: when the computer program is executed, the extracting a keyword based on the to-be-identified news, comprising: extracting a keyword corresponding to the news to be identified from at least one of the following: the news to be identified; The specific associated information of the news to be identified.
  • the processor is further configured to: when the computer program is executed, execute: the acquiring a time node of the event, comprising: acquiring a predefined time node of the event; or acquiring relevant news of the event and performing The clustering process determines a time node of the event according to time information included in related news of different categories.
  • the processor is further configured to execute the following when running the computer program: determining the associated event of the news to be identified based on the similarity between the word vector of the news to be identified and the word vector of the event comprises: constructing, based on the similarity between the word vector of the news to be identified and the word vector of the event, a first feature corresponding to the news to be identified; inputting the first feature as a sample feature into a first classification model to obtain confidences that different events are the associated event of the news to be identified; and determining an event whose confidence meets a condition as the associated event of the news to be identified.
  • the processor is further configured to execute the following when running the computer program: constructing the first feature corresponding to the news to be identified based on the similarity between the word vector of the news to be identified and the word vector of the event comprises: combining the following feature components to obtain the first feature corresponding to the news to be identified: the similarity between the word vector of the news to be identified and the word vector of the event; and the relationship between the time of the news to be identified and the time node of the event.
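  • One way to picture the first feature is as a flat list holding the similarity followed by an encoding of how the news time relates to each time node. The before/at/after encoding, the function names, and the `threshold` below are illustrative assumptions; the confidences would come from the first classification model, which is not implemented here.

```python
from datetime import datetime


def time_relation(news_time, node_time):
    """Return -1, 0 or 1 if the news is before, at, or after the time node."""
    if news_time < node_time:
        return -1
    if news_time > node_time:
        return 1
    return 0


def build_first_feature(similarity, news_time, node_times):
    """Combine the similarity with the relation to every time node of the event."""
    return [similarity] + [time_relation(news_time, t) for t in node_times]


def select_associated_events(confidences, threshold=0.5):
    """Keep the events whose confidence meets the (assumed) threshold condition."""
    return [event for event, conf in confidences.items() if conf >= threshold]


node_times = [datetime(2017, 9, 1), datetime(2017, 9, 5)]
feature = build_first_feature(0.8, datetime(2017, 9, 3), node_times)
# Hypothetical classifier outputs for two candidate events.
chosen = select_associated_events({"election": 0.9, "match": 0.2})
```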
  • the processor is further configured to execute the following when running the computer program: determining the time node corresponding to the news to be identified in the associated event comprises: constructing, based on the relationship between the time of the news to be identified and the time node of the event, a second feature corresponding to the news to be identified; inputting the second feature into a second classification model to obtain confidences that the news to be identified corresponds to different time nodes of the associated event; and determining a time node whose confidence meets a condition as the time node corresponding to the news to be identified.
  • the processor is further configured to execute the following when running the computer program: constructing the second feature corresponding to the news to be identified based on the relationship between the time of the news to be identified and the time node of the event comprises: combining the following feature components to obtain the second feature corresponding to the news to be identified: the mean of the word vectors of the news to be identified; and the relationship between the time of the news to be identified and the different time nodes of the associated event.
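  • The selection of a time node from the output of the second classification model can be sketched as follows, assuming the model emits one raw score per time node. The softmax normalisation, the `min_confidence` value, and the node names are assumptions for illustration and are not taken from the source.

```python
import math


def softmax(scores):
    """Turn raw per-node scores into confidences that sum to 1."""
    peak = max(scores.values())  # subtract the peak for numerical stability
    exps = {node: math.exp(value - peak) for node, value in scores.items()}
    total = sum(exps.values())
    return {node: value / total for node, value in exps.items()}


def pick_time_node(scores, min_confidence=0.5):
    """Return the most confident time node, or None if it is below the threshold."""
    confidences = softmax(scores)
    node = max(confidences, key=confidences.get)
    return node if confidences[node] >= min_confidence else None


# Hypothetical raw scores for three time nodes of one associated event.
picked = pick_time_node({"start": 0.0, "ongoing": 2.0, "end": 0.0})
```

  • When all nodes score equally, no node reaches the threshold and the function returns `None`, which leaves the news without an assigned time node.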
  • the processor is further configured to execute the following when running the computer program: determining the associated event of the news to be identified based on the similarity between the word vector of the news to be identified and the word vector of the event, and determining the time node corresponding to the news to be identified in the associated event, comprise: constructing, based on the similarity between the word vector of the news to be identified and the word vector of the event and on the relationship between the time of the news to be identified and the time node of the event, a third feature corresponding to the news to be identified; inputting the third feature into a third classification model to obtain confidences that the news to be identified corresponds to different time nodes of different events; and determining a time node whose confidence meets a condition as the time node corresponding to the news to be identified, and the event to which the determined time node belongs as the associated event of the news to be identified.
  • the processor is further configured to execute the following when running the computer program: the news processing method further comprises: when the type of the corresponding time node is an end time node and a preset invalidation duration measured from the end time node has elapsed, determining that the news to be identified is invalid.
  • the processor is further configured to execute the following when running the computer program: the news processing method further comprises: when the corresponding time node is a specific time node associated with invalidation, determining that the news to be identified is invalid.
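  • The two invalidation rules above can be sketched as a single check. The node type labels (`"end"`, `"superseded"`), the one-day default duration, and the function name are hypothetical choices for illustration only.

```python
from datetime import datetime, timedelta


def is_news_invalid(node_type, node_time, now,
                    expiry=timedelta(days=1),
                    invalidating_types=("superseded",)):
    """Apply the two invalidation rules to the time node matched by the news."""
    # Rule 1: the matched node is a specific node associated with invalidation.
    if node_type in invalidating_types:
        return True
    # Rule 2: the matched node is an end node and the preset duration has elapsed.
    return node_type == "end" and now - node_time >= expiry


end_time = datetime(2017, 9, 5, 18)
expired = is_news_invalid("end", end_time, datetime(2017, 9, 6, 19))
still_valid = is_news_invalid("end", end_time, datetime(2017, 9, 5, 20))
```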
  • FIG. 12 is a schematic diagram of the internal structure of a computer device, which may be the server 200 shown in FIG. 1 and includes a processor, an internal memory, a network interface, and a non-volatile storage medium connected through a system bus.
  • the processor is configured to implement a computing function and a function of controlling a server.
  • the processor is configured to execute the news processing method provided by the embodiment of the present application.
  • the non-volatile storage medium stores an operating system, a database, and a news processing apparatus for implementing the news processing method provided by the embodiments of the present application.
  • the network interface is used to connect to the terminal.
  • the memory can be implemented by any type of volatile or non-volatile storage device, or a combination thereof.
  • the non-volatile memory may be a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Ferromagnetic Random Access Memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory may be a disk storage or a tape storage.
  • the volatile memory may be a Random Access Memory (RAM) that acts as an external cache. Many forms of RAM may be used, for example: Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDRSDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM), and Direct Rambus Random Access Memory (DRRAM).
  • the memory is used to store various types of data to support the operation of the news processing device.
  • Examples of such data include any computer program operating on the news processing device, such as an operating system and applications, as well as the news to be identified, word vectors of the news to be identified, time nodes of events, word vectors of events, and the like.
  • the operating system includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, for implementing various basic services and processing hardware-based tasks.
  • the application can include various applications, such as a news application, a Media Player, a browser, etc., for implementing various application services.
  • a program implementing the method of the embodiment of the present application may be included in an application.
  • the network interface is used for wired or wireless communication between the news processing device and other devices.
  • the news processing device can access a wireless network based on a communication standard such as WiFi, 2G or 3G, or a combination thereof.
  • the network interface receives broadcast signals or broadcast associated information from an external broadcast management system via a broadcast channel.
  • the network interface further includes a Near Field Communication (NFC) module to facilitate short range communication.
  • the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, or other technologies.
  • the news processing method disclosed in the above embodiment of the present application may be applied to a processor or implemented by a processor.
  • the number of processors may be one or more to complete all or part of the steps of the above method.
  • the processor may be an integrated circuit chip with signal processing capabilities.
  • each step of the above method may be completed by an integrated logic circuit of hardware in a processor or an instruction in a form of software.
  • the above processor may be a general purpose processor, a digital signal processor (DSP), or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • the processor may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application.
  • a general purpose processor can be a microprocessor or any conventional processor or the like.
  • the steps of the method disclosed in the embodiment of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a storage medium, the storage medium being located in the memory, the processor reading the information in the memory, and completing the steps of the foregoing methods in combination with the hardware thereof.
  • the news processing device may be implemented by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field-Programmable Gate Arrays (FPGAs), general purpose processors, controllers, Micro Controller Units (MCUs), microprocessors, or other electronic components, for performing the aforementioned method.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a news processing method, an apparatus, a storage medium, and a computer device, the method comprising: acquiring a word vector of news to be identified (101); acquiring a word vector of an event and a time node of the event (103); determining, based on a degree of similarity between the word vector of the news to be identified and the word vector of the event, an associated event of the news to be identified, and determining a corresponding time node of the news to be identified in the associated event (105); and determining, according to the time node, whether the news is valid (106).
PCT/CN2018/104156 2017-09-05 2018-09-05 News processing method, apparatus, storage medium, and computer device Ceased WO2019047849A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710791715.7A CN110020104B (zh) 2017-09-05 2017-09-05 News processing method, apparatus, storage medium, and computer device
CN201710791715.7 2017-09-05

Publications (1)

Publication Number Publication Date
WO2019047849A1 true WO2019047849A1 (fr) 2019-03-14

Family

ID=65634737

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/104156 Ceased WO2019047849A1 (fr) 2017-09-05 2018-09-05 News processing method, apparatus, storage medium, and computer device

Country Status (2)

Country Link
CN (1) CN110020104B (fr)
WO (1) WO2019047849A1 (fr)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110990705A (zh) * 2019-12-06 2020-04-10 腾讯科技(深圳)有限公司 一种新闻处理方法、装置、设备及介质
CN111125429A (zh) * 2019-12-20 2020-05-08 腾讯科技(深圳)有限公司 一种视频推送方法、装置和计算机可读存储介质
CN111125520A (zh) * 2019-12-11 2020-05-08 东南大学 一种面向新闻文本的基于深度聚类模型的事件线抽取方法
CN112948528A (zh) * 2021-03-02 2021-06-11 北京秒针人工智能科技有限公司 一种基于关键词的数据归类方法及系统
CN113407714A (zh) * 2020-11-04 2021-09-17 腾讯科技(深圳)有限公司 基于时效的数据处理方法、装置、电子设备及存储介质
CN115048486A (zh) * 2022-05-24 2022-09-13 支付宝(杭州)信息技术有限公司 事件抽取方法、装置、计算机程序产品、存储介质及设备
CN118626653A (zh) * 2024-08-09 2024-09-10 西安康奈网络科技有限公司 一种基于特征识别的网络新闻归类管理系统

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110704603B (zh) * 2019-09-12 2022-09-09 武汉灯塔之光科技有限公司 一种通过资讯发掘当前热点事件的方法和装置
CN110889024A (zh) * 2019-10-25 2020-03-17 武汉灯塔之光科技有限公司 一种用于计算资讯关联股票的方法和装置
CN110888877A (zh) * 2019-11-13 2020-03-17 深圳市超视智慧科技有限公司 事件信息显示方法、装置、计算设备及存储介质
CN112257734B (zh) * 2019-11-15 2024-08-20 北京沃东天骏信息技术有限公司 一种信息处理方法及装置、存储介质
CN110929018B (zh) * 2019-12-04 2023-03-21 Oppo(重庆)智能科技有限公司 文本处理方法、装置、存储介质及电子设备
CN111324748B (zh) * 2020-02-28 2023-08-04 北京百度网讯科技有限公司 一种体育战报的生成方法、装置、电子设备及存储介质
CN113722593B (zh) * 2021-08-31 2024-01-16 北京百度网讯科技有限公司 事件数据处理方法、装置、电子设备和介质
CN114185922B (zh) * 2021-12-01 2025-04-18 维沃移动通信有限公司 信息检测方法、信息检测装置、电子设备和可读存储介质
CN116340639B (zh) * 2023-03-31 2023-12-12 北京百度网讯科技有限公司 新闻召回方法、装置、设备及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012092150A2 (fr) * 2010-12-30 2012-07-05 Pelco Inc. Moteur d'inférence pour la détection d'événement et la recherche légiste sur la base de métadonnées d'analyse vidéo
CN104915446A (zh) * 2015-06-29 2015-09-16 华南理工大学 基于新闻的事件演化关系自动提取方法及其系统
CN105468669A (zh) * 2015-10-13 2016-04-06 中国科学院信息工程研究所 一种融合用户关系的自适应微博话题追踪方法
CN106886567A (zh) * 2017-01-12 2017-06-23 北京航空航天大学 基于语义扩展的微博突发事件检测方法及装置
CN107122423A (zh) * 2017-04-06 2017-09-01 深圳Tcl数字技术有限公司 影视推介方法及装置

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8661025B2 (en) * 2008-11-21 2014-02-25 Stubhub, Inc. System and methods for third-party access to a network-based system for providing location-based upcoming event information
CN103324718B (zh) * 2013-06-25 2016-08-10 百度在线网络技术(北京)有限公司 基于海量搜索日志挖掘话题脉络的方法和系统
CN103473263B (zh) * 2013-07-18 2017-02-08 大连理工大学 一种面向新闻事件演变过程的可视化展现方法
CN104768131B (zh) * 2015-03-12 2018-10-19 中国科学技术大学苏州研究院 一种基于车车通信的中继节点告警消息转发方法
CN107016556B (zh) * 2016-01-27 2021-02-05 创新先进技术有限公司 数据处理方法及装置
CN105787095B (zh) * 2016-03-16 2019-09-27 广州索答信息科技有限公司 互联网新闻的自动生成方法和装置

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110990705A (zh) * 2019-12-06 2020-04-10 腾讯科技(深圳)有限公司 一种新闻处理方法、装置、设备及介质
CN110990705B (zh) * 2019-12-06 2024-04-12 深圳市雅阅科技有限公司 一种新闻处理方法、装置、设备及介质
CN111125520A (zh) * 2019-12-11 2020-05-08 东南大学 一种面向新闻文本的基于深度聚类模型的事件线抽取方法
CN111125520B (zh) * 2019-12-11 2023-04-21 东南大学 一种面向新闻文本的基于深度聚类模型的事件线抽取方法
CN111125429A (zh) * 2019-12-20 2020-05-08 腾讯科技(深圳)有限公司 一种视频推送方法、装置和计算机可读存储介质
CN111125429B (zh) * 2019-12-20 2023-05-30 腾讯科技(深圳)有限公司 一种视频推送方法、装置和计算机可读存储介质
CN113407714A (zh) * 2020-11-04 2021-09-17 腾讯科技(深圳)有限公司 基于时效的数据处理方法、装置、电子设备及存储介质
CN113407714B (zh) * 2020-11-04 2024-03-12 腾讯科技(深圳)有限公司 基于时效的数据处理方法、装置、电子设备及存储介质
CN112948528A (zh) * 2021-03-02 2021-06-11 北京秒针人工智能科技有限公司 一种基于关键词的数据归类方法及系统
CN115048486A (zh) * 2022-05-24 2022-09-13 支付宝(杭州)信息技术有限公司 事件抽取方法、装置、计算机程序产品、存储介质及设备
CN115048486B (zh) * 2022-05-24 2024-05-31 支付宝(杭州)信息技术有限公司 事件抽取方法、装置、计算机程序产品、存储介质及设备
CN118626653A (zh) * 2024-08-09 2024-09-10 西安康奈网络科技有限公司 一种基于特征识别的网络新闻归类管理系统

Also Published As

Publication number Publication date
CN110020104B (zh) 2023-04-07
CN110020104A (zh) 2019-07-16

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18854812

Country of ref document: EP

Kind code of ref document: A1