CN104424252B

CN104424252B - XML-based text information processing method and text content server

Info

Publication number: CN104424252B
Application number: CN201310381678.4A
Authority: CN
Inventors: 毕继安
Original assignee: Peking University Founder Group Co Ltd; Beijing Founder Electronics Co Ltd
Current assignee: BEIJING BEIDA FOUNDER ELECTRONICS Co Ltd; New Founder Holdings Development Co ltd
Priority date: 2013-08-28
Filing date: 2013-08-28
Publication date: 2017-12-15
Anticipated expiration: 2033-08-28
Also published as: CN104424252A

Abstract

The invention provides an XML-based text information processing method and a text content server. The method includes: the text content server obtains a piece of text information to be processed; the text content server uses at least two pre-stored evaluation rule description files to determine the evaluation information of the piece of text information to be processed, wherein each evaluation rule description file describes the evaluation information corresponding to one type of characteristic content of a piece of text information, and each evaluation rule description file is a file described in XML; the text content server marks the piece of text information with the obtained evaluation information. The invention enables a user, when searching, to obtain a comprehensive evaluation of the text from the evaluation information and to find the required content as quickly as possible.

Description

Text information processing method and text content server based on XML

Technical field

The present invention relates to computer technology, and in particular to a text information processing method and a text content server based on Extensible Markup Language (hereinafter: XML).

Background

With the continuous development of Internet technology, humanity is entering an era of information explosion. People can obtain a large amount of text information over the network, such as news, papers, and microblog posts.

Within this large amount of text information, a user searching for the text information they need is often disturbed by invalid information, uninteresting information, and even illegal information, which makes the search inconvenient.

Therefore, given the large amount of existing text information, how to let a user conveniently learn the comprehensive evaluation of a piece of text information, so that the user can find the required content more quickly, has become an urgent problem to be solved.

Summary of the invention

The present invention provides an XML-based text information processing method and a text content server.

The present invention provides an XML-based text information processing method, comprising:

a text content server obtains a piece of text information to be processed;

the text content server uses at least two pre-stored evaluation rule description files to determine the evaluation information of the piece of text information to be processed, wherein each evaluation rule description file describes the evaluation information corresponding to one type of characteristic content of a piece of text information, and each evaluation rule description file is a file described in XML;

the text content server marks the piece of text information with each piece of evaluation information obtained.

The present invention provides a text content server, comprising:

an acquisition module, configured to determine a piece of text information to be processed;

a determining module, configured to use at least two pre-stored evaluation rule description files to determine the evaluation information of the piece of text information to be processed, wherein each evaluation rule description file describes the evaluation information corresponding to one type of characteristic content of a piece of text information, and each evaluation rule description file is a file described in XML;

a marking module, configured to mark the piece of text information with each piece of evaluation information obtained.

In the present invention, the text content server can pre-store evaluation rule description files that respectively describe the evaluation information corresponding to at least two types of characteristic content of a piece of text information. With these evaluation rule description files, each type of characteristic content of a piece of text information can be evaluated, so that the piece of text information is comprehensively evaluated from multiple angles and is marked with the resulting evaluation information. As a result, when searching, a user can obtain the comprehensive evaluation of the piece of text from the evaluation information and find the required content as quickly as possible. Moreover, the evaluation rule description files are described in XML, which gives them good generality and extensibility.

Description of drawings

FIG. 1 is a flowchart of an embodiment of the XML-based text information processing method of the present invention;

FIG. 2 is a schematic structural diagram of an embodiment of the text content server of the present invention.

Detailed description

FIG. 1 is a flowchart of an embodiment of the XML-based text information processing method of the present invention. As shown in FIG. 1, the method of this embodiment may include:

Step 101: the text content server obtains a piece of text information to be processed;

Step 102: the text content server uses at least two pre-stored evaluation rule description files to determine the evaluation information of the text information to be processed, wherein each evaluation rule description file describes the evaluation information corresponding to one type of characteristic content of a piece of text information, and each evaluation rule description file is a file described in XML;

Step 103: the text content server marks the piece of text information with each piece of evaluation information obtained.

Specifically, the text content server may generate and store at least two evaluation rule description files in advance. Each evaluation rule description file is a file described in XML and can be used to describe the evaluation information corresponding to one type of characteristic content of a piece of text information.

For example, the word count, sensitive words, and the frequency of occurrence of keywords can each serve as characteristic content of a piece of text information. Correspondingly, the text content server can store an evaluation rule description file for each type of characteristic content.

Taking word count as the characteristic content, the corresponding evaluation rules may be, for example:

If the overall word count of the article is 0-100 words and the title is longer than 30 words, 60 points are awarded;

If the overall word count of the article is 100-200 words, 70 points are awarded;

If the overall word count of the article is 200-300 words, 80 points are awarded;

If the overall word count of the article is 300-400 words, 90 points are awarded;

If the overall word count of the article is more than 500 words, 100 points are awarded;

All other articles get 0 points.
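The word count rules above can be sketched in Python as follows. This is a minimal sketch, not the patent's implementation: the function name `word_count_score` and its parameters are hypothetical, and because the listed ranges share their end points (100, 200, 300), the sketch assumes each range includes its lower bound and excludes its upper bound. Under that assumption, counts from 400 to 500 fall under the "all other articles" rule.

```python
def word_count_score(word_count: int, title_length: int) -> int:
    """Score an article by its overall word count (hypothetical helper).

    Assumption: each range includes its lower bound and excludes its
    upper bound, because the listed ranges overlap at 100, 200 and 300.
    """
    if word_count < 0 or title_length < 0:
        raise ValueError("counts must be non-negative")
    if word_count < 100:
        # The first rule also requires a title longer than 30.
        return 60 if title_length > 30 else 0
    if word_count < 200:
        return 70
    if word_count < 300:
        return 80
    if word_count < 400:
        return 90
    if word_count > 500:
        return 100
    # 400 to 500 is not covered by any listed range.
    return 0


if __name__ == "__main__":
    print(word_count_score(250, 12))
```

Keeping the fall-through case explicit makes the gap between 400 and 500 visible instead of silently assigning it to a neighbouring range.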

To describe this evaluation rule, this embodiment uses an evaluation rule description file described in XML, for example a schema file, whose description framework may be, for example, as follows:

For an explanation of the above source program, refer to the relevant standards of the XML language; details are not repeated here.

Therefore, on the basis of the above framework, the evaluation rule description file for the above word count rules is as follows:
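The patent's own listing is not reproduced in this text. Purely as an illustration of the idea, and not as the patent's actual file, the sketch below encodes the word count rules in a hypothetical XML layout and reads it with Python's standard `xml.etree.ElementTree`. The element and attribute names (`evaluationRule`, `range`, `default`, `min`, `max`, `titleLongerThan`, `score`) and the helpers `load_rules` and `score` are all assumptions, as is the convention that `max` is exclusive.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule file; element and attribute names are illustrative only.
RULE_XML = """\
<evaluationRule id="0" feature="wordCount">
  <range min="0" max="100" titleLongerThan="30" score="60"/>
  <range min="100" max="200" score="70"/>
  <range min="200" max="300" score="80"/>
  <range min="300" max="400" score="90"/>
  <range min="501" score="100"/>
  <default score="0"/>
</evaluationRule>
"""


def load_rules(xml_text):
    """Parse the hypothetical rule file into (ranges, default_score)."""
    root = ET.fromstring(xml_text)
    ranges = []
    for node in root.findall("range"):
        max_attr = node.get("max")
        ranges.append({
            "min": int(node.get("min", "0")),
            "max": int(max_attr) if max_attr is not None else None,
            # -1 means "no title condition": any length is greater than -1.
            "title_longer_than": int(node.get("titleLongerThan", "-1")),
            "score": int(node.get("score")),
        })
    default_score = int(root.find("default").get("score"))
    return ranges, default_score


def score(word_count, title_length, ranges, default_score):
    """Return the score of the first matching range, else the default."""
    for r in ranges:
        below_max = r["max"] is None or word_count < r["max"]
        if r["min"] <= word_count and below_max \
                and title_length > r["title_longer_than"]:
            return r["score"]
    return default_score


if __name__ == "__main__":
    rules, fallback = load_rules(RULE_XML)
    print(score(250, 12, rules, fallback))
```

Keeping the thresholds in the XML file rather than in code is what lets new rules be added without changing the server program, which matches the generality and extensibility the description attributes to XML.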

In addition, more evaluation rules can be defined. For example, the following file provides evaluation rule descriptions for three types of characteristic content: the content with id 1 describes the evaluation rule for the "sensitive words" type of characteristic content, the content with id 2 describes the evaluation rule for the "five-character quatrain" type, and the content with id 3 describes the evaluation rule for the "paragraph" type. The specific file format is as follows:

Therefore, after the text content server obtains a piece of text information to be processed, it can use its pre-stored evaluation rule description files to determine the evaluation information of that text information, thereby obtaining evaluation information for the different types of characteristic content of the piece of text information. For example, if the overall word count of the piece of text information is 250 words, the evaluation information for the word count characteristic content is 80; if the sensitive words in the piece of text information are level 2, the corresponding evaluation information is 60; and so on. In this way, the evaluation rule description files yield evaluation information that describes the piece of text information from the angle of each characteristic.

The text content server can then mark the piece of text information with each piece of evaluation information obtained, for example by storing the evaluation information together with the piece of text information. Other marking methods may also be used; this embodiment imposes no limitation.
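The evaluate-then-mark flow described above could be sketched as follows, assuming that "marking" means storing the scores next to the text, which is the example the description gives. The names `MarkedText` and `mark_text` are hypothetical, and the two toy evaluators are placeholders standing in for rules loaded from XML files; in particular, the score of 100 for text without sensitive words is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class MarkedText:
    """A piece of text stored together with its evaluation information."""
    text: str
    evaluations: Dict[str, int] = field(default_factory=dict)


def mark_text(text: str, evaluators: Dict[str, Callable[[str], int]]) -> MarkedText:
    """Apply every evaluator to the text and attach the resulting scores."""
    if len(evaluators) < 2:
        # The method calls for at least two evaluation rule description files.
        raise ValueError("at least two evaluators are required")
    scores = {name: evaluate(text) for name, evaluate in evaluators.items()}
    return MarkedText(text=text, evaluations=scores)


# Placeholder evaluators (assumptions, not the patent's rules).
def toy_word_count_score(text: str) -> int:
    return 80 if 200 <= len(text) < 300 else 0


def toy_sensitive_word_score(text: str) -> int:
    return 60 if "forbidden" in text else 100


if __name__ == "__main__":
    marked = mark_text("a" * 250, {
        "word_count": toy_word_count_score,
        "sensitive_words": toy_sensitive_word_score,
    })
    print(marked.evaluations)
```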

After the above process is completed, the text content server can provide search services to users in either of the following two ways.

Method 1:

The text content server receives a search request input by the user, the search request containing a keyword;

The text content server determines the text information corresponding to the keyword;

The text content server pushes search results to the user, the search results including the text information and each piece of evaluation information corresponding to the text information.

In Method 1, the user receives the evaluation information of the text information together with the text information found by the search, so the user can quickly and conveniently determine the comprehensive evaluation of the text information from the evaluation information and find the required content more quickly.
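Method 1 might look like the sketch below. The description does not say how the text "corresponding to" a keyword is determined, so plain substring matching is used here as an assumption; the function name `search_with_evaluations` and the in-memory `store` of (text, evaluations) pairs are likewise hypothetical.

```python
from typing import Dict, List, Tuple

# Each stored entry pairs a text with its evaluation information.
Entry = Tuple[str, Dict[str, int]]


def search_with_evaluations(store: List[Entry], keyword: str) -> List[Entry]:
    """Return every matching text together with its evaluation information.

    Assumption: a text corresponds to a keyword when it contains it.
    """
    return [(text, evaluations) for text, evaluations in store if keyword in text]


if __name__ == "__main__":
    store = [
        ("XML rule files", {"word_count": 70, "sensitive_words": 100}),
        ("Plain notes", {"word_count": 60, "sensitive_words": 60}),
    ]
    for text, evaluations in search_with_evaluations(store, "XML"):
        print(text, evaluations)
```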

Method 2:

The text content server receives a search request input by the user, the search request containing a keyword and user requirement information;

The text content server determines the text information corresponding to the keyword and, according to each piece of evaluation information corresponding to the text information, determines the text information that satisfies the user requirement information;

The text content server pushes the text information that satisfies the user's requirements to the user.

In Method 2, the user can include user requirement information in the search request sent to the text content server, for example a requirement that the score of a certain type of characteristic content be higher than a preset value, so that the text content server pushes to the user only the part of the found text information that satisfies the user requirement information. Compared with Method 1, Method 2 lets the user find the required content more quickly.
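A sketch of Method 2 under the same assumptions as before (substring keyword matching, an in-memory `store`). The user requirement information is modelled as a hypothetical `min_scores` mapping from a characteristic name to a preset value, and a text qualifies only when each named score is strictly higher than its preset value, following the example in the description. Treating a missing score as 0 is an assumption.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, Dict[str, int]]


def search_with_requirements(
    store: List[Entry], keyword: str, min_scores: Dict[str, int]
) -> List[str]:
    """Return matching texts whose scores exceed every preset value."""
    results = []
    for text, evaluations in store:
        if keyword not in text:
            continue
        # A characteristic with no recorded score is treated as 0 (assumption).
        if all(evaluations.get(name, 0) > preset
               for name, preset in min_scores.items()):
            results.append(text)
    return results


if __name__ == "__main__":
    store = [
        ("XML rule files", {"word_count": 70, "sensitive_words": 100}),
        ("XML draft", {"word_count": 60, "sensitive_words": 60}),
    ]
    print(search_with_requirements(store, "XML", {"sensitive_words": 60}))
```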

In this embodiment, the text content server can pre-store evaluation rule description files that respectively describe the evaluation information corresponding to at least two types of characteristic content of a piece of text information. With these evaluation rule description files, each type of characteristic content of a piece of text information can be evaluated, so that the piece of text information is comprehensively evaluated from multiple angles and is marked with the resulting evaluation information. As a result, when searching, a user can obtain the comprehensive evaluation of the piece of text from the evaluation information and find the required content as quickly as possible. Moreover, the evaluation rule description files are described in XML, which gives them good generality and extensibility.

FIG. 2 is a schematic structural diagram of an embodiment of the text content server of the present invention. As shown in FIG. 2, the server of this embodiment may include an acquisition module 11, a determining module 12, and a marking module 13, wherein:

the acquisition module 11 is configured to determine a piece of text information to be processed;

the determining module 12 is configured to use at least two pre-stored evaluation rule description files to determine the evaluation information of the piece of text information to be processed, wherein each evaluation rule description file describes the evaluation information corresponding to one type of characteristic content of a piece of text information, and each evaluation rule description file is a file described in XML;

the marking module 13 is configured to mark the piece of text information with each piece of evaluation information obtained.

Further, the server may also include a generation module 14 and a search processing module 15, wherein:

the generation module 14 is configured to generate and store the at least two evaluation rule description files before the acquisition module 11 obtains the piece of text information to be processed.

the search processing module 15 is configured to receive a search request input by a user, the search request containing a keyword; determine the text information corresponding to the keyword; and push search results to the user, the search results including the text information and each piece of evaluation information corresponding to the text information; or, it is configured to receive a search request input by a user, the search request containing a keyword and user requirement information; determine the text information corresponding to the keyword and, according to each piece of evaluation information corresponding to the text information, determine the text information that satisfies the user requirement information; and push the text information that satisfies the user's requirements to the user.

The text content server of this embodiment can be used to execute the technical solution of the method embodiment shown in FIG. 1. Its implementation principle and technical effect are similar and are not repeated here.
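One way the modules of FIG. 2 could map onto code is sketched below. This is an assumption-laden illustration rather than the patent's design: the class `TextContentServer` and its method names are hypothetical, plain Python callables stand in for the XML evaluation rule description files, keyword matching is by substring, and search results are returned as records rather than pushed to a user.

```python
from typing import Callable, Dict, List, Optional

Evaluator = Callable[[str], int]


class TextContentServer:
    """Toy server whose methods mirror modules 11 to 15 (hypothetical)."""

    def __init__(self) -> None:
        self.rules: Dict[str, Evaluator] = {}
        self.store: List[dict] = []

    def generate_rules(self, rules: Dict[str, Evaluator]) -> None:
        """Generation module 14: create and store the rules in advance."""
        if len(rules) < 2:
            raise ValueError("at least two evaluation rules are required")
        self.rules = dict(rules)

    def process(self, text: str) -> dict:
        """Modules 11 to 13: take the text, evaluate it, mark it."""
        # Determining module 12: one score per characteristic.
        evaluations = {name: rule(text) for name, rule in self.rules.items()}
        # Marking module 13: store the scores together with the text.
        record = {"text": text, "evaluations": evaluations}
        self.store.append(record)
        return record

    def search(self, keyword: str,
               min_scores: Optional[Dict[str, int]] = None) -> List[dict]:
        """Search processing module 15: Method 1, or Method 2 with min_scores."""
        hits = [r for r in self.store if keyword in r["text"]]
        if min_scores is None:
            return hits
        return [r for r in hits
                if all(r["evaluations"].get(name, 0) > preset
                       for name, preset in min_scores.items())]


if __name__ == "__main__":
    server = TextContentServer()
    server.generate_rules({
        "length": len,
        "mentions_xml": lambda t: 100 if "XML" in t else 0,
    })
    server.process("XML")
    server.process("XML-based scoring")
    print(server.search("XML", {"length": 10}))
```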

Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the above method embodiments; the storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical discs.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A text information processing method based on Extensible Markup Language (XML), characterized by comprising:

a text content server obtains a piece of text information to be processed;

the text content server uses at least two pre-stored evaluation rule description files to determine the evaluation information of the piece of text information to be processed, wherein each evaluation rule description file describes the evaluation information corresponding to one type of characteristic content of a piece of text information, and each evaluation rule description file is a file described in XML;

the text content server marks the piece of text information with each piece of evaluation information obtained;

after the text content server marks the piece of text information with each piece of evaluation information obtained, the method further comprises:

the text content server receives a search request input by a user, the search request containing a keyword;

the text content server determines the text information corresponding to the keyword;

the text content server pushes a search result to the user, the search result including the text information and each piece of evaluation information corresponding to the text information;

or, after the text content server marks the piece of text information with each piece of evaluation information obtained, the method further comprises:

the text content server receives a search request input by a user, the search request containing a keyword and user requirement information;

the text content server determines the text information corresponding to the keyword and, according to each piece of evaluation information corresponding to the text information, determines the text information that satisfies the user requirement information;

the text content server pushes the text information that satisfies the user's requirements to the user.

2. The method according to claim 1, characterized in that, before the text content server obtains the piece of text information to be processed, the method further comprises:

generating and storing the at least two evaluation rule description files.

3. A text content server, characterized by comprising:

an acquisition module, configured to determine a piece of text information to be processed;

a determining module, configured to use at least two pre-stored evaluation rule description files to determine the evaluation information of the piece of text information to be processed, wherein each evaluation rule description file describes the evaluation information corresponding to one type of characteristic content of a piece of text information, and each evaluation rule description file is a file described in XML;

a marking module, configured to mark the piece of text information with each piece of evaluation information obtained;

the server further comprises: a search processing module, configured to receive a search request input by a user, the search request containing a keyword; determine the text information corresponding to the keyword; and push a search result to the user, the search result including the text information and each piece of evaluation information corresponding to the text information; or, configured to receive a search request input by a user, the search request containing a keyword and user requirement information; determine the text information corresponding to the keyword and, according to each piece of evaluation information corresponding to the text information, determine the text information that satisfies the user requirement information; and push the text information that satisfies the user's requirements to the user.

4. The server according to claim 3, characterized by further comprising:

a generation module, configured to generate and store the at least two evaluation rule description files before the acquisition module obtains the piece of text information to be processed.