CN113779235B

CN113779235B - Word document outline recognition processing method and device

Info

Publication number: CN113779235B
Application number: CN202111070726.9A
Authority: CN
Inventors: 麦天骥
Original assignee: BEIJING LEDICT TECHNOLOGY CO LTD
Current assignee: BEIJING LEDICT TECHNOLOGY CO LTD
Priority date: 2021-09-13
Filing date: 2021-09-13
Publication date: 2024-02-02
Anticipated expiration: 2041-09-13
Also published as: CN113779235A

Abstract

The invention discloses a Word document outline recognition and processing method and device. A Word file is acquired, saved and parsed locally, and converted into an HTML code file. All title tags in the HTML code file are looped over in JavaScript, and a recursive algorithm traverses them and organizes them into tree-structured data. Title directory data corresponding to the Word file is generated from the tree-structured data, a unique primary key is preset for each title in the HTML code file, and the unique primary key is used to link the content of the HTML code file with the title directory data. The invention can perform outline recognition on Word documents, link the directory with the Word document content, make it easy to grasp the outline of a Word document, and can be integrated into an application system to quickly generate help pages for browsing and editing.

Description

A Word document outline recognition and processing method and device

Technical field

The invention relates to the technical field of Word document processing, and in particular to a Word document outline recognition and processing method and device.

Background art

Word is a word processing application developed by Microsoft Corporation and a component of the Office suite. Microsoft Office Word can be used to create and edit text and graphics in letters, reports, web pages or emails. Compared with WordPad and Notepad, it is more powerful and full-featured, and can insert pictures, multimedia, artistic effects and so on. Word documents are widely used in all walks of life and bring great convenience to office work.

At present, with the continuous advancement of informatization, relevant departments run a variety of application systems through which the Word documents involved are processed and displayed. Administrative departments in particular need to process Word documents uploaded by users and display them in an optimized form. To improve the user experience, each application system is designed with its own help function, which assists in processing the Word documents uploaded by users. Because these help functions differ from system to system and are not unified, they are cumbersome for users and require a large development effort. Although Word itself can generate a directory of titles, this function cannot be integrated into a dedicated application system. Word documents usually have an outline, so quickly recognizing that outline in order to grasp the gist of a document is of practical significance.

Summary of the invention

To this end, the present invention provides a Word document outline recognition and processing method and device, which perform outline recognition on Word documents in a help system to generate help pages and thereby facilitate the display of Word documents.

To achieve the above object, the present invention provides the following technical solution: a Word document outline recognition and processing method, comprising the following steps:

obtaining a Word file, saving and parsing the Word file locally, and converting the Word file into an HTML code file;

looping over all title tags in the HTML code file in JavaScript, and using a recursive algorithm to traverse all the title tags of the HTML code file and organize them into tree-structured data;

generating title directory data corresponding to the Word file from the tree-structured data, presetting a unique primary key for each title in the HTML code file, and using the unique primary key to link the content of the HTML code file with the title directory data.
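The looping step can be pictured with a short browser-side sketch. It assumes the converted HTML has been rendered into a container element `root` and that each title's unique primary key is stored in the title element's `id` attribute; `collectTitles`, `nextKey` and the `title-N` key format are hypothetical names chosen for illustration, not taken from the patent.

```javascript
// Sketch (assumption): collect the h1 to h6 title tags from rendered HTML and
// preset a unique primary key for each title, stored in the element's id.
let keyCounter = 0;

// Hypothetical key generator; any scheme producing unique strings would do.
function nextKey() {
  keyCounter += 1;
  return `title-${keyCounter}`;
}

function collectTitles(root) {
  // querySelectorAll returns matches in document order, which preserves
  // the order of the titles in the original Word outline.
  const elements = Array.from(root.querySelectorAll('h1, h2, h3, h4, h5, h6'));
  return elements.map((el) => {
    if (!el.id) {
      el.id = nextKey(); // preset the key only once per title element
    }
    return {
      key: el.id,
      level: Number(el.tagName.charAt(1)), // "H2" -> 2
      text: el.textContent.trim(),
    };
  });
}
```

Because a title that already has an `id` keeps it, calling `collectTitles` again after an edit leaves existing keys untouched.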

As a preferred solution of the Word document outline recognition and processing method, the Word file is saved to a local server, converted into an HTML code file on the local server, and the generated HTML code file is returned to the front-end device that displays the Word file.

As a preferred solution of the Word document outline recognition and processing method, the display interface of the front-end device includes a directory window and a rich text editor window; the directory window displays the title directory data, and the rich text editor window displays the Word file content corresponding to the HTML code.
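One way to fill the directory window is to render the title tree as nested lists whose links point at the title keys. The sketch assumes tree nodes shaped like `{ key, text, children }`; `renderDirectory` and `escapeHtml` are hypothetical helpers, and the escaping is an added precaution so that title text cannot inject markup.

```javascript
// Sketch (assumption): render title directory nodes { key, text, children }
// as nested <ul> markup for the directory window.
function escapeHtml(s) {
  return s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

function renderDirectory(nodes) {
  if (nodes.length === 0) return ''; // leaf: no nested list
  const items = nodes.map(
    (node) =>
      `<li><a href="#${escapeHtml(node.key)}">${escapeHtml(node.text)}</a>` +
      renderDirectory(node.children) + // subdirectories nest inside the item
      '</li>'
  );
  return `<ul>${items.join('')}</ul>`;
}
```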

As a preferred solution of the Word document outline recognition and processing method, after the Word file content in the rich text editor window changes, generation of the title directory is re-triggered for the changed content.
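Re-triggering can be wired to the editor's change event. In this sketch, `extractTitles` and `rebuildDirectory` are hypothetical callbacks supplied by the host page, and skipping regeneration when the list of (key, level, text) entries is unchanged is an added optimization, not something the patent specifies.

```javascript
// Sketch (assumption): change handler that regenerates the title directory
// only when the outline itself changed, not on every keystroke in body text.
function createOutlineRefresher(extractTitles, rebuildDirectory) {
  let lastSignature = null;
  return function onEditorChange() {
    const titles = extractTitles();
    // Keys are included so that replacing a title with an identical one
    // (new key, same text) still refreshes the directory links.
    const signature = JSON.stringify(titles.map((t) => [t.key, t.level, t.text]));
    if (signature === lastSignature) {
      return false; // outline unchanged, nothing to rebuild
    }
    lastSignature = signature;
    rebuildDirectory(titles);
    return true;
  };
}
```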

As a preferred solution of the Word document outline recognition and processing method, the title directory before the Word file content in the rich text editor window changes is compared with the title directory after the change;

if a title tag has been deleted in the title directory after the change, the primary key corresponding to the deleted title tag is deleted as well;

if a title tag has been added in the title directory after the change, a new primary key is created for the added title tag;

if a title tag exists in the title directory both before and after the change, its primary key continues to be used in the title directory after the change.
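The three comparison rules map onto set operations over primary keys. This sketch assumes that a title which survives an edit still carries its key (for example in its element `id`), so any title without a key is new; `reconcileKeys` and the `makeKey` generator are hypothetical, and treating a duplicated key as new is an extra assumption covering titles copied and pasted inside the editor.

```javascript
// Sketch (assumption): compare primary keys before and after an edit.
// previousKeys: keys of the old title directory.
// currentTitles: titles after the edit, each possibly carrying a `key`.
// makeKey: generator returning a fresh unique key.
function reconcileKeys(previousKeys, currentTitles, makeKey) {
  const before = new Set(previousKeys);
  const after = new Set();
  for (const title of currentTitles) {
    // A title without a key, or whose key is already taken by an earlier
    // title (e.g. pasted duplicate), is treated as new and gets a new key.
    if (!title.key || after.has(title.key)) {
      title.key = makeKey();
    }
    after.add(title.key);
  }
  return {
    kept: [...after].filter((k) => before.has(k)), // keep using these keys
    added: [...after].filter((k) => !before.has(k)), // newly created keys
    deleted: previousKeys.filter((k) => !after.has(k)), // keys to delete
  };
}
```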

As a preferred solution of the Word document outline recognition and processing method, the title directory data generation step includes:

determining whether the tag level of a title is equal to 1:

if the tag level of the title is equal to 1, inserting it as a parent-level directory entry; if not, continuing to traverse the tag levels of the remaining titles;

determining whether the current level of the current title is greater than the parent level:

if the current level of the current title is greater than the parent level, inserting it as a subdirectory of the current directory, continuing to traverse the tag levels of the remaining titles, and repeating the determination until the traversal ends;

if the current level of the current title is not greater than the parent level, inserting it as a parent-level directory entry, continuing to traverse the tag levels of the remaining titles, and repeating the determination until the traversal ends.
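The level comparisons above amount to a recursive descent over the flat list of titles in document order. The following is a minimal sketch under that reading, with hypothetical names (`buildDirectory`, `collect`); unlike a strict reading of the first step, a title that appears before any level-1 title is kept at the top level rather than skipped.

```javascript
// Sketch (assumption): build a nested title directory from a flat,
// document-ordered list of titles { key, level, text } (level 1 to 6).
function buildDirectory(titles) {
  let index = 0;
  // Collect every following title deeper than parentLevel. A title at the
  // same or a shallower level ends this call, so it is inserted higher up.
  function collect(parentLevel) {
    const nodes = [];
    while (index < titles.length && titles[index].level > parentLevel) {
      const title = titles[index];
      index += 1;
      nodes.push({
        key: title.key,
        text: title.text,
        level: title.level,
        children: collect(title.level), // deeper titles become subdirectories
      });
    }
    return nodes;
  }
  return collect(0); // level 0 is a virtual root above all real titles
}
```

For titles at levels 1, 2, 3, 2, 1 this yields two top-level entries; the first holds two subdirectories, and the first of those holds one entry.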

The invention also provides a Word document outline recognition and processing device, comprising:

a Word file processing module, configured to obtain a Word file, save and parse the Word file locally, and convert the Word file into an HTML code file;

a title tag acquisition module, configured to loop over all title tags in the HTML code file in JavaScript;

a title tag traversal module, configured to traverse all title tags of the HTML code file using a recursive algorithm and organize them into tree-structured data;

a title directory generation module, configured to generate title directory data corresponding to the Word file from the tree-structured data;

a linkage processing module, configured to preset a unique primary key for each title in the HTML code file and use the unique primary key to link the content of the HTML code file with the title directory data.

As a preferred solution of the Word document outline recognition and processing device, the Word file is saved to a local server, converted into an HTML code file on the local server, and the generated HTML code file is returned to the front-end device that displays the Word file;

the display interface of the front-end device includes a directory window and a rich text editor window; the directory window displays the title directory data, and the rich text editor window displays the Word file content corresponding to the HTML code.

As a preferred solution of the Word document outline recognition and processing device, it further includes a title directory update module, configured to re-trigger generation of the title directory after the Word file content in the rich text editor window changes;

it further includes a title directory comparison module, configured to compare the title directory before the Word file content in the rich text editor window changes with the title directory after the change;

As a preferred solution of the Word document outline recognition and processing device, in the title directory generation module:

The invention has the following advantages: a Word file is obtained, saved and parsed locally, and converted into an HTML code file; all title tags in the HTML code file are looped over in JavaScript and organized into tree-structured data by a recursive algorithm; title directory data corresponding to the Word file is generated from the tree-structured data, a unique primary key is preset for each title in the HTML code file, and the unique primary key links the content of the HTML code file with the title directory data. The invention can perform outline recognition on Word documents, link the directory with the Word document content, make it easy to grasp the outline of a Word document, and can be integrated into application systems to quickly generate help pages for browsing and editing.

Brief description of the drawings

To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed to describe them are briefly introduced below. Obviously, the drawings described below are merely exemplary, and those of ordinary skill in the art can derive other drawings from them without creative effort.

The structures, proportions, sizes and the like shown in this specification are used only in conjunction with the content disclosed in the specification, for the understanding and reading of those familiar with this technology. They are not intended to limit the conditions under which the invention can be implemented and therefore have no substantive technical significance. Any modification of structure, change of proportion or adjustment of size that does not affect the effects the invention can produce or the objectives it can achieve still falls within the scope covered by the technical content disclosed in the invention.

Figure 1 is a schematic flow chart of the Word document outline recognition and processing method provided in an embodiment of the present invention;

Figure 2 is a schematic diagram of the technical route of the Word document outline recognition and processing method provided in an embodiment of the present invention;

Figure 3 is a schematic diagram of the display produced by the Word document outline recognition and processing method provided in an embodiment of the present invention;

Figure 4 is a schematic diagram of the Word document outline recognition and processing device provided in an embodiment of the present invention.

Detailed description of the embodiments

The implementation of the present invention is illustrated below by specific embodiments, and those familiar with this technology can readily understand other advantages and effects of the invention from the content disclosed in this specification. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

Embodiment 1

Referring to Figures 1, 2 and 3, a Word document outline recognition and processing method is provided, comprising the following steps:

S1. Obtain a Word file, save and parse it locally, and convert it into an HTML code file;

S2. Loop over all title tags in the HTML code file in JavaScript, and use a recursive algorithm to traverse all the title tags and organize them into tree-structured data;

S3. Generate title directory data corresponding to the Word file from the tree-structured data, preset a unique primary key for each title in the HTML code file, and use the unique primary key to link the content of the HTML code file with the title directory data.

In this embodiment, the Word file is saved to a local server, converted into an HTML code file on the local server, and the generated HTML code file is returned to the front-end device that displays the Word file. The display interface of the front-end device includes a directory window and a rich text editor window; the directory window displays the title directory data, and the rich text editor window displays the Word file content corresponding to the HTML code.

Specifically, the Word file uploaded by the user is saved on the local server configured for the application system, the conversion of the Word file into an HTML code file is executed on that local server, and the resulting HTML code file is returned to the front-end device for display, which improves processing efficiency.
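On the server side, the save-then-convert step might look like the Node.js sketch below. The patent does not name a conversion tool, so `convertToHtml` is a hypothetical injected function that takes the stored file path and resolves to an HTML string; `saveAndConvert` and the random file naming are likewise illustrative assumptions.

```javascript
const fs = require('fs');
const path = require('path');
const crypto = require('crypto');

// Sketch (assumption): store an uploaded Word file on the local server and
// convert it there. convertToHtml is a hypothetical, injected function
// (filePath) => Promise<string> wrapping whatever converter is deployed.
async function saveAndConvert(fileBuffer, originalName, uploadDir, convertToHtml) {
  await fs.promises.mkdir(uploadDir, { recursive: true });
  // A random name avoids collisions when different users upload "help.docx".
  const storedPath = path.join(uploadDir, crypto.randomUUID() + path.extname(originalName));
  await fs.promises.writeFile(storedPath, fileBuffer);
  const html = await convertToHtml(storedPath);
  return { storedPath, html }; // html is what gets returned to the front end
}
```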

Specifically, one implementation of step S1 is as follows:

In this embodiment, the generated HTML code file is returned to the front-end device; all titles (h tags) in the HTML code are then looped over in JavaScript, and a recursive algorithm organizes all titles of the current document into tree-structured data. One implementation is as follows:

In this embodiment, after the Word file content in the rich text editor window changes, generation of the title directory is re-triggered for the changed content.

The title directory before the Word file content in the rich text editor window changes is compared with the title directory after the change;

In this embodiment, the title directory data generation step includes:

Specifically, one implementation of title directory data generation is as follows:

Referring to Figure 3, a general help system is designed based on the technical solution of the present invention. The general help system is an online management system with a B/S (browser/server) architecture that can generate help pages, update notes, operation guides and the like through online editing. Each application system only needs to reference a single line of JS code to obtain the help function.

As a general help system, it supports quickly generating help pages from Word files. Help pages share a unified structure, with the outline on the left and the content on the right, and many existing help materials are kept in Word, so the system supports uploading Word documents.

After a Word document is uploaded, its outline is recognized online and displayed as a tree menu on the left, while the corresponding Word content is displayed on the right. Clicking an entry in the left directory locates the corresponding content on the right, and after the outline is edited in the right-hand content, the left outline can be updated.
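The click-to-locate behaviour can be sketched with a delegated click listener on the directory window. It assumes directory links use `href="#<key>"` and that each title element in the editor has that key as its `id`; `bindDirectoryNavigation` is a hypothetical name, while `closest`, `getElementById` and `scrollIntoView` are standard DOM methods.

```javascript
// Sketch (assumption): link the directory window to the editor content
// through the title keys. directoryEl and doc would be real DOM objects.
function bindDirectoryNavigation(directoryEl, doc) {
  // One listener on the container handles clicks on every directory link.
  directoryEl.addEventListener('click', (event) => {
    const link = event.target.closest('a[href^="#"]');
    if (!link) return; // click landed outside a directory link
    event.preventDefault();
    const key = link.getAttribute('href').slice(1); // "#title-2" -> "title-2"
    const target = doc.getElementById(key);
    if (target) {
      target.scrollIntoView({ behavior: 'smooth', block: 'start' });
    }
  });
}
```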

In summary, the present invention obtains a Word file, saves and parses it locally, and converts it into an HTML code file; loops over all title tags in the HTML code file in JavaScript and uses a recursive algorithm to organize them into tree-structured data; and generates title directory data corresponding to the Word file from the tree-structured data, presetting a unique primary key for each title in the HTML code file and using the unique primary key to link the content of the HTML code file with the title directory data. When the Word file content in the rich text editor window changes, generation of the title directory is re-triggered, and the title directory before the change is compared with the title directory after the change: if a title tag has been deleted, its primary key is deleted as well; if a title tag has been added, a new primary key is created for it; and if a title tag exists both before and after the change, its primary key continues to be used. The invention can perform outline recognition on Word documents, link the directory with the Word document content, make it easy to grasp the outline of a Word document, and can be integrated into application systems to quickly generate help pages for browsing and editing.

Embodiment 2

Referring to Figure 4, the present invention also provides a Word document outline recognition and processing device, comprising:

a Word file processing module 1, configured to obtain a Word file, save and parse it locally, and convert it into an HTML code file;

a title tag acquisition module 2, configured to loop over all title tags in the HTML code file in JavaScript;

a title tag traversal module 3, configured to traverse all title tags of the HTML code file using a recursive algorithm and organize them into tree-structured data;

a title directory generation module 4, configured to generate title directory data corresponding to the Word file from the tree-structured data;

a linkage processing module 5, configured to preset a unique primary key for each title in the HTML code file and use the unique primary key to link the content of the HTML code file with the title directory data.

In this embodiment, the Word file is saved to a local server, converted into an HTML code file on the local server, and the generated HTML code file is returned to the front-end device that displays the Word file;

In this embodiment, the device further includes a title directory update module 6, configured to re-trigger generation of the title directory after the Word file content in the rich text editor window changes;

and a title directory comparison module 7, configured to compare the title directory before the Word file content in the rich text editor window changes with the title directory after the change;

In this embodiment, in the title directory generation module 4:

It should be noted that the information exchange, execution processes and other details between the modules/units of the above device are based on the same concept as the method embodiment in Embodiment 1 of the present application and bring the same technical effects. For details, refer to the description of the method embodiment above; they are not repeated here.

Embodiment 3

Embodiment 3 of the present invention provides a computer-readable storage medium storing program code for the Word document outline recognition and processing method, the program code including instructions for executing the method of Embodiment 1 or any of its possible implementations.

The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).

Embodiment 4

Embodiment 4 of the present invention provides an electronic device including a processor coupled to a storage medium; when the processor executes the instructions in the storage medium, the electronic device performs the Word document outline recognition and processing method of Embodiment 1 or any of its possible implementations.

Specifically, the processor may be implemented in hardware or in software. When implemented in hardware, the processor may be a logic circuit, an integrated circuit or the like; when implemented in software, the processor may be a general-purpose processor that reads software code stored in a memory, and the memory may be integrated in the processor or located outside it as an independent component.

The above embodiments may be implemented wholly or partly in software, hardware, firmware or any combination thereof. When implemented in software, they may be implemented wholly or partly as a computer program product comprising one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present invention are generated wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server or data center to another by wired means (such as coaxial cable, optical fiber or digital subscriber line (DSL)) or wireless means (such as infrared, radio or microwave).

Obviously, those skilled in the art should understand that the modules or steps of the present invention described above may be implemented by general-purpose computing devices, which may be concentrated on a single computing device or distributed over a network of multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in a different order than here, or they may be made into separate integrated circuit modules, or several of the modules or steps may be made into a single integrated circuit module. Thus, the invention is not limited to any specific combination of hardware and software.

Although the present invention has been described in detail above with general descriptions and specific embodiments, it is obvious to those skilled in the art that modifications or improvements can be made on this basis. Therefore, such modifications or improvements made without departing from the spirit of the present invention fall within the scope of protection claimed by the present invention.

Claims

1. A Word document outline recognition processing method, characterized by comprising the following steps:

obtaining a Word file, storing and parsing the Word file locally, and converting the Word file into an HTML code file;

looping over all title tags in the HTML code file in JavaScript, traversing all title tags of the HTML code file with a recursive algorithm, and organizing them into tree-structured data;

generating title directory data corresponding to the Word file from the tree-structured data, presetting a unique primary key for each title in the HTML code file, and using the unique primary key to link the content of the HTML code file with the title directory data;

storing the Word file on a local server, converting the Word file into an HTML code file on the local server, and returning the generated HTML code file to the front-end device that displays the Word file;

the display interface of the front-end device comprises a directory window and a rich text editor window, wherein the directory window is used to display the title directory data, and the rich text editor window is used to display the Word file content corresponding to the HTML code;

when the Word file content in the rich text editor window changes, re-triggering generation of the title directory for the changed Word file content;

comparing the title directory before the Word file content in the rich text editor window changes with the title directory after the change;

if a deleted title tag exists in the title directory after the change, deleting the primary key corresponding to the deleted title tag;

if a newly added title tag exists in the title directory after the change, creating a new primary key for the newly added title tag;

if a title tag exists in the title directories both before and after the change, continuing to use the primary key of that title tag in the title directory after the change;

the title directory data generation procedure comprises the following steps:

judging whether the tag level of a title is equal to 1:

if the tag level of the title is equal to 1, inserting a parent-level directory entry; if not, continuing to traverse the tag levels of the remaining titles;

judging whether the current level of the current title is greater than the parent level:

if the current level of the current title is greater than the parent level, inserting a subdirectory of the current directory, continuing to traverse the tag levels of the remaining titles, and repeating the judgment until the traversal is finished;

if the current level of the current title is not greater than the parent level, inserting a parent-level directory entry, continuing to traverse the tag levels of the remaining titles, and repeating the judgment until the traversal is finished.

2. A Word document outline recognition processing device, comprising:

a Word file processing module, configured to acquire a Word file, store and parse the Word file locally, and convert the Word file into an HTML code file;

a title tag acquisition module, configured to loop over all title tags in the HTML code file in JavaScript;

a title tag traversal module, configured to traverse all title tags of the HTML code file with a recursive algorithm and organize them into tree-structured data;

a title directory generation module, configured to generate title directory data corresponding to the Word file from the tree-structured data;

a linkage processing module, configured to preset a unique primary key for each title in the HTML code file and link the content of the HTML code file with the title directory data using the unique primary key;

a title directory updating module, configured to re-trigger generation of the title directory after the Word file content in the rich text editor window changes;

a title directory comparison module, configured to compare the title directory before the Word file content in the rich text editor window changes with the title directory after the change;

the title directory generation module is configured as follows:

judging whether the tag level of a title is equal to 1: